In March 2023, a research paper titled "VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation" was published, presenting a new approach to video generation.<ref name="VideoFusion">{{Cite arXiv |eprint=2303.08320 |class=cs.CV |first1=Zhengxiong |last1=Luo |first2=Dayou |last2=Chen |title=VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation |date=2023 |last3=Zhang |first3=Yingya |last4=Huang |first4=Yan |last5=Wang |first5=Liang |last6=Shen |first6=Yujun |last7=Zhao |first7=Deli |last8=Zhou |first8=Jingren |last9=Tan |first9=Tieniu}}</ref> The VideoFusion model decomposes the per-frame diffusion noise into two components, a base noise shared across frames and a per-frame residual noise, to ensure temporal coherence. By using a pre-trained image diffusion model as a base generator, the model efficiently produces high-quality, coherent videos. Fine-tuning the pre-trained model on video data addresses the ___domain gap between image and video data, improving its ability to produce realistic and consistent video sequences.<ref name="VideoFusion" /> In the same month, [[Adobe Inc.|Adobe]] introduced Firefly, its family of generative AI models.<ref>{{Cite web |date=2024-10-10 |title=Adobe launches Firefly Video model and enhances image, vector and design models |url=https://news.adobe.com/news/2024/10/101424-adobe-launches-firefly-video-model |access-date=2024-11-18 |publisher=[[Adobe Inc.]]}}</ref>
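The noise decomposition described above can be illustrated in a few lines: each frame's noise is a weighted mix of one base sample shared by the whole clip and an independent residual sample per frame, so that neighbouring frames receive correlated noise. The following minimal PyTorch sketch shows the idea only; it is not the paper's implementation, and the fixed mixing weight <code>lam</code> and both function names are assumptions made for this example:

<syntaxhighlight lang="python">
import torch

def decomposed_noise(num_frames, frame_shape, lam=0.5):
    """Mix a base noise sample shared by all frames with an independent
    residual sample per frame, so neighbouring frames get correlated noise.
    (Illustrative sketch; the paper handles the mixing more flexibly.)"""
    base = torch.randn(1, *frame_shape)               # shared across the clip
    residual = torch.randn(num_frames, *frame_shape)  # independent per frame
    # Square-root weights keep each frame's noise a unit-variance Gaussian.
    return lam ** 0.5 * base + (1 - lam) ** 0.5 * residual

def noisy_clip(x0, alpha_bar_t, lam=0.5):
    """Apply a standard forward-diffusion step q(x_t | x_0) to a clip
    x0 of shape (frames, channels, height, width)."""
    eps = decomposed_noise(x0.shape[0], tuple(x0.shape[1:]), lam)
    return alpha_bar_t ** 0.5 * x0 + (1 - alpha_bar_t) ** 0.5 * eps, eps
</syntaxhighlight>

A larger <code>lam</code> shares more noise across frames, trading per-frame diversity for temporal coherence.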
In January 2024, [[Google]] announced the development of a text-to-video model named Lumiere, which was anticipated to integrate advanced video-editing capabilities.<ref>{{Cite web |last=Yirka |first=Bob |date=2024-01-26 |title=Google announces the development of Lumiere, an AI-based next-generation text-to-video generator |url=https://techxplore.com/news/2024-01-google-lumiere-ai-based-generation.html |access-date=2024-11-18 |website=Tech Xplore}}</ref> [[Matthias Niessner]] and [[Lourdes Agapito]] at the AI company [[Synthesia (company)|Synthesia]] work on 3D neural rendering techniques that synthesise realistic video from 2D and 3D neural representations of shape, appearance, and motion, enabling controllable video synthesis of avatars.<ref>{{Cite web |title=Text to Speech for Videos |url=https://www.synthesia.io/text-to-speech |access-date=2023-10-17 |website=Synthesia.io}}</ref> In June 2024, Luma Labs launched its [[Dream Machine (text-to-video model)|Dream Machine]] video tool.<ref>{{Cite web |last=Nuñez |first=Michael |date=2024-06-12 |title=Luma AI debuts 'Dream Machine' for realistic video generation, heating up AI media race |url=https://venturebeat.com/ai/luma-ai-debuts-dream-machine-for-realistic-video-generation-heating-up-ai-media-race/ |access-date=2024-11-18 |website=VentureBeat |language=en-US}}</ref><ref>{{Cite web |last=Fink |first=Charlie |title=Apple Debuts Intelligence, Mistral Raises $600 Million, New AI Text-To-Video |url=https://www.forbes.com/sites/charliefink/2024/06/13/apple-debuts-intelligence-mistral-raises-600-million-new-ai-text-to-video/ |access-date=2024-11-18 |website=Forbes |language=en}}</ref> That same month, [[Kuaishou]] extended its Kling AI text-to-video model to international users.<ref>{{Cite web |last=Franzen |first=Carl |date=2024-06-12 |title=What you need to know about Kling, the AI video generator rival to Sora that's wowing creators |url=https://venturebeat.com/ai/what-you-need-to-know-about-kling-the-ai-video-generator-rival-to-sora-thats-wowing-creators/ |access-date=2024-11-18 |website=VentureBeat |language=en-US}}</ref> In July 2024, [[TikTok]] owner [[ByteDance]] released Jimeng AI in China through its subsidiary Faceu Technology.<ref>{{Cite web |date=2024-08-06 |title=ByteDance joins OpenAI's Sora rivals with AI video app launch |url=https://www.reuters.com/technology/artificial-intelligence/bytedance-joins-openais-sora-rivals-with-ai-video-app-launch-2024-08-06/ |access-date=2024-11-18 |publisher=[[Reuters]]}}</ref> In September 2024, the Chinese AI company [[MiniMax (company)|MiniMax]] debuted its video-01 model, joining established AI model companies such as [[Zhipu AI]], [[Baichuan]], and [[Moonshot AI]] in China's growing AI sector.<ref>{{Cite web |date=2024-09-02 |title=Chinese AI "tiger" MiniMax launches text-to-video-generating model to rival OpenAI's Sora |url=https://finance.yahoo.com/news/chinese-ai-tiger-minimax-launches-093000322.html |access-date=2024-11-18 |website=Yahoo! Finance}}</ref> In December 2024, [[Lightricks]] released LTX Video as an open-source model.<ref>{{Cite web |last=Requiroso |first=Kelvene |date=2024-12-15 |title=Lightricks' LTXV Model Breaks Speed Records, Generating 5-Second AI Video Clips in 4 Seconds |url=https://www.eweek.com/news/lightricks-open-source-ai-video-generator/ |access-date=2025-07-24 |website=eWEEK |language=en-US}}</ref>
Alternative approaches to text-to-video models include<ref>{{Citation |title=Text2Video-Zero |date=2023-08-12 |url=https://github.com/Picsart-AI-Research/Text2Video-Zero |access-date=2023-08-12 |publisher=Picsart AI Research (PAIR)}}</ref> Google's Phenaki, Hour One, [[Colossyan]],<ref name=":5" /> [[Runway (company)|Runway]]'s Gen-3 Alpha,<ref>{{Cite web |last=Kemper |first=Jonathan |date=2024-07-01 |title=Runway's Sora competitor Gen-3 Alpha now available |url=https://the-decoder.com/runways-sora-competitor-gen-3-alpha-now-available/ |access-date=2024-11-18 |website=THE DECODER |language=en-US}}</ref><ref>{{Cite news |date=2023-03-20 |title=Generative AI's Next Frontier Is Video |url=https://www.bloomberg.com/news/articles/2023-03-20/generative-ai-s-next-frontier-is-video |access-date=2024-11-18 |work=Bloomberg.com |language=en}}</ref> and OpenAI's [[Sora (text-to-video model)|Sora]].<ref>{{Cite web |date=2024-02-15 |title=OpenAI teases 'Sora,' its new text-to-video AI model |url=https://www.nbcnews.com/tech/tech-news/openai-sora-video-artificial-intelligence-unveiled-rcna139065 |access-date=2024-11-18 |website=NBC News |language=en}}</ref><ref>{{Cite web |last=Kelly |first=Chris |date=2024-06-25 |title=Toys R Us creates first brand film to use OpenAI's text-to-video tool |url=https://www.marketingdive.com/news/toys-r-us-openai-sora-gen-ai-first-text-video/719797/ |access-date=2024-11-18 |website=Marketing Dive |publisher=[[Informa]] |language=en-US}}</ref> Several additional text-to-video models, such as Plug-and-Play, Text2LIVE, and TuneAVideo, have emerged.<ref>{{Cite book |last1=Jin |first1=Jiayao |last2=Wu |first2=Jianhang |last3=Xu |first3=Zhoucheng |last4=Zhang |first4=Hang |last5=Wang |first5=Yaxin |last6=Yang |first6=Jielong |chapter=Text to Video: Enhancing Video Generation Using Diffusion Models and Reconstruction Network |date=2023-08-04 |title=2023 2nd International Conference on Computing, Communication, Perception and Quantum Technology (CCPQT) |chapter-url=https://ieeexplore.ieee.org/document/10336607 |publisher=IEEE |pages=108–114 |doi=10.1109/CCPQT60491.2023.00024 |isbn=979-8-3503-4269-7}}</ref> [[FLUX.1]] developer Black Forest Labs has announced a state-of-the-art (SOTA) text-to-video model.<ref>{{Cite web |date=2024-08-01 |title=Announcing Black Forest Labs |url=https://blackforestlabs.ai/announcing-black-forest-labs/ |access-date=2024-11-18 |website=Black Forest Labs |language=en-US}}</ref> [[Google]] was preparing to launch a video-generation tool named [[Veo (text-to-video model)|Veo]] for [[YouTube Shorts]] in 2025.<ref>{{Cite web |last=Forlini |first=Emily Dreibelbis |date=2024-09-18 |title=Google's Veo text-to-video AI generator is coming to YouTube Shorts |url=https://www.pcmag.com/news/googles-veo-text-to-video-ai-generator-is-coming-to-youtube-shorts |access-date=2024-11-18 |website=[[PC Magazine]]}}</ref> In May 2025, Google launched the Veo 3 iteration of the model.
It was noted for its audio generation capabilities, which had been a limitation of earlier text-to-video models.<ref>{{Cite web |last1=Elias |first1=Jennifer |last2=Subin |first2=Samantha |date=2025-05-20 |title=Google launches Veo 3, an AI video generator that incorporates audio |url=https://www.cnbc.com/2025/05/20/google-ai-video-generator-audio-veo-3.html |access-date=2025-05-22 |website=CNBC |language=en}}</ref> In July 2025, Lightricks released an update to LTX Video capable of generating clips up to 60 seconds long.<ref>{{Cite web |last=Fink |first=Charlie |title=LTX Video Breaks The 60-Second Barrier, Redefining AI Video As A Longform Medium |url=https://www.forbes.com/sites/charliefink/2025/07/16/ltx-video-breaks-the-60-second-barrier-redefining-ai-video-as-a-longform-medium/ |access-date=2025-07-24 |website=Forbes |language=en}}</ref><ref>{{Cite web |date=2025-07-16 |title=Lightricks' latest release lets creators direct long-form AI-generated videos in real time |url=https://siliconangle.com/2025/07/16/lightricks-latest-release-allows-creators-direct-longform-ai-generated-videos-real-time/ |access-date=2025-07-24 |website=SiliconANGLE |language=en-US}}</ref>
== Architecture and training ==