== Models ==
{{Globalize|section|date=August 2024}}
There are different models, including [[open source]] models. CogVideo, which accepts Chinese-language input,<ref name=":5">{{Cite web |last=Wodecki |first=Ben |date=2023-08-11 |title=Text-to-Video Generative AI Models: The Definitive List |url=https://aibusiness.com/nlp/ai-video-generation-the-supreme-list |access-date=2024-11-18 |website=AI Business |publisher=[[Informa]]}}</ref> is the earliest text-to-video model to be developed, with 9.4 billion parameters; a demo version of its open-source code was first presented on [[GitHub]] in 2022.<ref>{{Citation |title=CogVideo |date=2022-10-12 |url=https://github.com/THUDM/CogVideo |publisher=THUDM |access-date=2022-10-12}}</ref> That year, [[Meta Platforms]] released a partial text-to-video model called "Make-A-Video",<ref>{{Cite web |last=Davies |first=Teli |date=2022-09-29 |title=Make-A-Video: Meta AI's New Model For Text-To-Video Generation |url=https://wandb.ai/telidavies/ml-news/reports/Make-A-Video-Meta-AI-s-New-Model-For-Text-To-Video-Generation--VmlldzoyNzE4Nzcx |access-date=2022-10-12 |website=Weights & Biases |language=en}}</ref><ref>{{Cite web |last=Monge |first=Jim Clyde |date=2022-08-03 |title=This
In March 2023, a research paper titled "VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation" was published, presenting a novel approach to video generation.<ref name="VideoFusion">{{Cite arXiv |eprint=2303.08320 |class=cs.CV |first1=Zhengxiong |last1=Luo |first2=Dayou |last2=Chen |title=VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation |date=2023 |last3=Zhang |first3=Yingya |last4=Huang |first4=Yan |last5=Wang |first5=Liang |last6=Shen |first6=Yujun |last7=Zhao |first7=Deli |last8=Zhou |first8=Jingren |last9=Tan |first9=Tieniu}}</ref> The VideoFusion model decomposes the per-frame noise in the diffusion process into two components: a base noise that is shared across frames, which promotes temporal coherence, and a per-frame residual noise. By using a pre-trained image diffusion model as a base generator, the model efficiently generates high-quality and coherent videos. Fine-tuning the pre-trained model on video data addresses the ___domain gap between image and video data, enhancing its ability to produce realistic and consistent video sequences.<ref name="VideoFusion" /> In the same month, [[Adobe Inc.|Adobe]] introduced Firefly AI as part of its product suite.<ref>{{Cite web |date=2024-10-10 |title=Adobe launches Firefly Video model and enhances image, vector and design models. Adobe Newsroom |url=https://news.adobe.com/news/2024/10/101424-adobe-launches-firefly-video-model |access-date=2024-11-18 |publisher=[[Adobe Inc.]]}}</ref>
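The noise decomposition can be illustrated with a minimal, hypothetical sketch (not code from the paper): each frame's noise is formed from a base component shared by all frames plus a per-frame residual component, with an assumed weighting parameter <code>mix</code> controlling how strongly frames are correlated.

<syntaxhighlight lang="python">
import numpy as np

def decomposed_video_noise(num_frames, frame_shape, mix=0.5, seed=0):
    """Illustrative sketch of a VideoFusion-style noise decomposition.

    Each frame's noise is a weighted sum of a base component shared by all
    frames (encouraging temporal coherence) and a per-frame residual
    component (allowing frame-to-frame variation). ``mix`` is an assumed
    illustrative parameter, not a value taken from the paper.
    """
    rng = np.random.default_rng(seed)
    base = rng.standard_normal(frame_shape)          # shared across all frames
    noises = []
    for _ in range(num_frames):
        residual = rng.standard_normal(frame_shape)  # unique to each frame
        # Square-root weights keep unit variance for i.i.d. N(0, 1) components.
        noises.append(np.sqrt(mix) * base + np.sqrt(1.0 - mix) * residual)
    return np.stack(noises)                          # shape: (num_frames, *frame_shape)

# Example: 16 frames of 64x64 noise with a strong shared (coherent) component.
video_noise = decomposed_video_noise(num_frames=16, frame_shape=(64, 64), mix=0.8)
print(video_noise.shape)  # (16, 64, 64)
</syntaxhighlight>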