== Models ==
{{Update section|date=February 2024}}
Several text-to-video models exist, including [[open source]] models. CogVideo is an early text-to-video model with 9.4 billion parameters; a demo version and its source code are available on [[GitHub]].<ref>{{Citation |title=CogVideo |date=2022-10-12 |url=https://github.com/THUDM/CogVideo |publisher=THUDM |access-date=2022-10-12}}</ref> [[Meta Platforms]] has a partial text-to-video{{NoteTag|It can also generate videos from images, insert video between two images, and produce video variations.|name=}} model called "Make-A-Video".<ref>{{Cite web |last=Davies |first=Teli |date=2022-09-29 |title=Make-A-Video: Meta AI's New Model For Text-To-Video Generation |url=https://wandb.ai/telidavies/ml-news/reports/Make-A-Video-Meta-AI-s-New-Model-For-Text-To-Video-Generation--VmlldzoyNzE4Nzcx |access-date=2022-10-12 |website=Weights & Biases |language=en}}</ref><ref>{{Cite web |last=Monge |first=Jim Clyde |date=2022-08-03 |title=This AI Can Create Video From Text Prompt |url=https://betterprogramming.pub/this-ai-can-create-video-from-text-prompt-6904439d7aba |access-date=2022-10-12 |website=Medium |language=en}}</ref><ref>{{Cite web |title=Meta's Make-A-Video AI creates videos from text |url=https://www.fonearena.com/blog/375627/meta-make-a-video-ai-create-videos-from-text.html |access-date=2022-10-12 |website=www.fonearena.com}}</ref> [[Google]]'s [[Google Brain|Brain]] team released a research paper introducing Imagen Video, a text-to-video model built on a 3D [[U-Net]].<ref>{{Cite web |title=google: Google takes on Meta, introduces own video-generating AI - The Economic Times |url=https://m.economictimes.indiatimes.com/tech/technology/google-takes-on-meta-introduces-own-video-generating-ai/amp_articleshowarticleshow/94681128.cms?amp_gsa=1&amp_js_v=a9&usqp=mq331AQKKAFQArABIIACAw==#amp_tf=From%20%251$s&aohfrom=16655942495197&referrer=https://www.google.com&ampshare=https://m.economictimes.com/tech/technology/google-takes-on-meta-introduces-own-video-generating-ai/articleshow/94681128.cmsmdr 
|access-date=2022-10-12 |website=m.economictimes.com}}</ref><ref>{{Cite web |last=Monge |first=Jim Clyde |date=2022-08-03 |title=This AI Can Create Video From Text Prompt |url=https://betterprogramming.pub/this-ai-can-create-video-from-text-prompt-6904439d7aba |access-date=2022-10-12 |website=Medium |language=en}}</ref><ref>{{Cite web |title=Nuh-uh, Meta, we can do text-to-video AI, too, says Google |url=https://www.theregister.com/AMP/2022/10/06/google_ai_imagen_video/ |access-date=2022-10-12 |website=www.theregister.com}}</ref><ref>{{Cite web |title=Papers with Code - See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction |url=https://paperswithcode.com/paper/see-plan-predict-language-guided-cognitive |access-date=2022-10-12 |website=paperswithcode.com |language=en}}</ref><ref>{{Cite web |title=Papers with Code - Text-driven Video Prediction |url=https://paperswithcode.com/paper/text-driven-video-prediction |access-date=2022-10-12 |website=paperswithcode.com |language=en}}</ref>
 
In March 2023, researchers at Alibaba published a paper applying many of the principles of latent image diffusion models to video generation.<ref>{{Cite web |title=Home - DAMO Academy |url=https://damo.alibaba.com/ |access-date=2023-08-12 |website=damo.alibaba.com}}</ref><ref>{{Cite arXiv |last1=Luo |first1=Zhengxiong |last2=Chen |first2=Dayou |last3=Zhang |first3=Yingya |last4=Huang |first4=Yan |last5=Wang |first5=Liang |last6=Shen |first6=Yujun |last7=Zhao |first7=Deli |last8=Zhou |first8=Jingren |last9=Tan |first9=Tieniu |date=2023 |title=VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation |class=cs.CV |eprint=2303.08320}}</ref> Services such as Kaiber and Reemix have since adopted similar approaches to video generation in their respective products.