Revision as of 11:11, 4 May 2025 by Davey2010: Reverting edit(s) by PeoEspanol to rev. 1288606528 by Davey2010: Unexplained content removal (RW 16.1)
Revision as of 11:38, 4 May 2025 by 103.174.195.232: No edit summary
== Limitations ==
Although text-to-video models have improved rapidly, a primary limitation is that they are computationally expensive, which restricts their capacity to produce high-quality, lengthy outputs.<ref name=":03">{{Cite book |last1=Bhagwatkar |first1=Rishika |last2=Bachu |first2=Saketh |last3=Fitter |first3=Khurshed |last4=Kulkarni |first4=Akshay |last5=Chiddarwar |first5=Shital |chapter=A Review of Video Generation Approaches |date=2020-12-17 |title=2020 International Conference on Power, Instrumentation, Control and Computing (PICC) |chapter-url=https://ieeexplore.ieee.org/document/9362485 |publisher=IEEE |pages=1–5 |doi=10.1109/PICC51425.2020.9362485 |isbn=978-1-7281-7590-4}}</ref><ref name=":13">{{Cite book |last=Singh |first=Aditi |chapter=A Survey of AI Text-to-Image and AI Text-to-Video Generators |date=2023-05-09 |title=2023 4th International Conference on Artificial Intelligence, Robotics and Control (AIRC) |chapter-url=https://ieeexplore.ieee.org/document/10303174 |publisher=IEEE |pages=32–36 |doi=10.1109/AIRC57904.2023.10303174 |isbn=979-8-3503-4824-8|arxiv=2311.06329 }}</ref><ref>{{cite web | title= Evolution of Text To Video Models | url=https://deevid.ai/text-to-video | access-date= 26 April 2025}}</ref> These models also require large amounts of specific training data to generate high-quality, coherent outputs, which raises the issue of accessibility.<ref name=":13" /><ref name=":03" /> Moreover, models may misinterpret textual prompts, producing video outputs that deviate from the intended meaning. This can occur because of limitations in capturing the semantic context embedded in the text, which affect the model’s ability to align the generated video with the user’s intended message.<ref name=":13" /><ref name=":32"/> Various models, including Make-A-Video, Imagen Video, Phenaki, CogVideo, GODIVA, and NUWA, are being tested and refined to improve their alignment capabilities and overall performance in text-to-video generation.<ref name=":13" />

Text-to-video model: Difference between revisions