 
Moreover, models may misinterpret textual prompts, producing videos that deviate from the intended meaning. This stems from limitations in capturing the semantic context of the text, which weakens the model's ability to align the generated video with the prompt.<ref name=":13" /><ref name=":32"/> Various models, including Make-A-Video, Imagen Video, Phenaki, CogVideo, GODIVA, and NUWA, are being tested and refined to improve their alignment and overall performance in text-to-video generation.<ref name=":13" />
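Prompt–video alignment is often quantified with CLIP-based similarity scores between the prompt and frames sampled from the generated video. The following minimal sketch illustrates the idea, assuming the Hugging Face <code>transformers</code> library and a list of sampled frames as PIL images; the checkpoint and function name are illustrative rather than part of any particular model's evaluation pipeline:

<syntaxhighlight lang="python">
import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; any CLIP variant could be substituted.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def prompt_video_alignment(prompt, frames):
    """Average cosine similarity between a text prompt and sampled video frames."""
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
        )
        frame_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    # Normalise embeddings so the dot product is a cosine similarity.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    frame_emb = frame_emb / frame_emb.norm(dim=-1, keepdim=True)
    return (frame_emb @ text_emb.T).mean().item()
</syntaxhighlight>

A low score indicates that the generated video has drifted from the prompt's intended meaning.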
 
Another issue is that rendered text and fine details in AI-generated videos often appear garbled, a problem that [[Stable Diffusion]] models also struggle with. Common examples include distorted hands and illegible text.
 
== Ethics ==