Text-to-video model

== Limitations ==
Despite rapid improvements in the performance of text-to-video models, a primary limitation is their high computational cost, which limits their ability to produce high-quality, lengthy outputs.<ref name=":03">{{Cite book |last1=Bhagwatkar |first1=Rishika |last2=Bachu |first2=Saketh |last3=Fitter |first3=Khurshed |last4=Kulkarni |first4=Akshay |last5=Chiddarwar |first5=Shital |chapter=A Review of Video Generation Approaches |date=2020-12-17 |title=2020 International Conference on Power, Instrumentation, Control and Computing (PICC) |chapter-url=https://ieeexplore.ieee.org/document/9362485 |publisher=IEEE |pages=1–5 |doi=10.1109/PICC51425.2020.9362485 |isbn=978-1-7281-7590-4}}</ref><ref name=":13">{{Cite book |last=Singh |first=Aditi |chapter=A Survey of AI Text-to-Image and AI Text-to-Video Generators |date=2023-05-09 |title=2023 4th International Conference on Artificial Intelligence, Robotics and Control (AIRC) |chapter-url=https://ieeexplore.ieee.org/document/10303174 |publisher=IEEE |pages=32–36 |doi=10.1109/AIRC57904.2023.10303174 |isbn=979-8-3503-4824-8|arxiv=2311.06329 }}</ref><ref>{{cite web | title= Evolution of Text To Video Models | url=https://deevid.ai/text-to-video | access-date= 26 April 2025}}</ref> These models also require large amounts of task-specific training data to generate high-quality, coherent outputs, which raises concerns about accessibility.<ref name=":13" /><ref name=":03" />
 
Moreover, models may misinterpret textual prompts, producing videos that deviate from the intended meaning. This can stem from limitations in capturing the semantic context of the text, which impairs the model's ability to align the generated video with the user's intended message.<ref name=":13" /><ref name=":32"/> Various models, including Make-A-Video, Imagen Video, Phenaki, CogVideo, GODIVA, and NUWA, are being tested and refined to improve their text–video alignment and overall performance.<ref name=":13" />