 
Moreover, models may misinterpret textual prompts, producing videos that deviate from the intended meaning. This stems from limitations in capturing the semantic context of the text, which weakens the model's ability to align the generated video with the prompt.<ref name=":13" /><ref name=":32"/> Various models, including Make-A-Video, Imagen Video, Phenaki, CogVideo, GODIVA, and NUWA, are being tested and refined to improve their alignment and overall performance in text-to-video generation.<ref name=":13" />
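Prompt–video alignment is often quantified with CLIP-based similarity scores between the prompt and frames sampled from the generated video. The following minimal sketch illustrates the idea, assuming the Hugging Face <code>transformers</code> library and a list of sampled frames as PIL images; the checkpoint and function name are illustrative rather than part of any particular model's evaluation pipeline:

<syntaxhighlight lang="python">
import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; any CLIP variant could be substituted.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def prompt_video_alignment(prompt, frames):
    """Average cosine similarity between a text prompt and sampled video frames."""
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
        )
        frame_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    # Normalise embeddings so the dot product is a cosine similarity.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    frame_emb = frame_emb / frame_emb.norm(dim=-1, keepdim=True)
    return (frame_emb @ text_emb.T).mean().item()
</syntaxhighlight>

A low score indicates that the generated video has drifted from the prompt's intended meaning.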
 
Another issue is that rendered text and fine details in AI-generated videos often appear garbled, a problem that [[Stable Diffusion]] models also struggle with. Common examples include distorted hands and illegible text.
 
== Ethics ==