Text-to-video model: Difference between revisions

Citation bot (talk | contribs)
Alter: template type. Add: eprint, class. Removed parameters. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | Suggested by Headbomb | #UCB_toolbar
Tidy up of existing company mentions and added new company (Synthesia)
Line 14:
Antonia Antonova presented another model.<ref>{{Cite web |title=Text to Video Generation |url=https://antonia.space/text-to-video-generation |access-date=2022-10-12 |website=Antonia Antonova |language=en-US}}</ref>
 
In March 2023, a landmark research paper from Alibaba was published, applying many of the principles of latent image diffusion models to video generation.<ref>{{Cite web |title=Home - DAMO Academy |url=https://damo.alibaba.com/ |access-date=2023-08-12 |website=damo.alibaba.com}}</ref><ref>{{Cite arXiv |last1=Luo |first1=Zhengxiong |last2=Chen |first2=Dayou |last3=Zhang |first3=Yingya |last4=Huang |first4=Yan |last5=Wang |first5=Liang |last6=Shen |first6=Yujun |last7=Zhao |first7=Deli |last8=Zhou |first8=Jingren |last9=Tan |first9=Tieniu |date=2023 |title=VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation |class=cs.CV |eprint=2303.08320}}</ref> Services like [https://kaiber.ai/ Kaiber] or [https://reemix.co/ Reemix] have since adopted similar approaches to video generation in their respective products.
 
[[Matthias Niessner]] (TUM) and [[Lourdes Agapito]] (UCL) at the AI company [[Synthesia (company)|Synthesia]] are developing 3D neural rendering techniques that synthesise realistic video. The goal is to improve existing text-to-video models by using 2D and 3D neural representations of shape, appearance, and motion for controllable video synthesis of avatars that look and sound like real people.<ref>{{Cite web |title=Text to Speech for Videos |url=https://www.synthesia.io/text-to-speech |access-date=2023-10-17}}</ref>
 
Although alternative approaches exist,<ref>{{Citation |title=Text2Video-Zero |date=2023-08-12 |url=https://github.com/Picsart-AI-Research/Text2Video-Zero |access-date=2023-08-12 |publisher=Picsart AI Research (PAIR)}}</ref> full latent diffusion models are currently regarded as the state of the art for video diffusion.