Revision as of 20:16, 24 July 2025 by Eurohunter (Autopatrolled, Extended confirmed users; 26,269 edits). Edit summary: →Comparison of models: -double bold
Revision as of 04:51, 26 July 2025 by 27.60.30.173. No edit summary. Tags: Reverted, Mobile edit, Mobile web edit
Line 3:
[[File:OpenAI Sora in Action- Tokyo Walk.webm|thumb|upright=1.35|A video generated using OpenAI's [[Sora (text-to-video model)|Sora]] text-to-video model, using the prompt: <code>A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.</code>]]
A '''text-to-video model''' is a [[machine learning model]] that uses a [[natural language]] description as input to produce a [[video]] relevant to the input text.<ref name="AIIR">{{cite report|url=https://aiindex.stanford.edu/wp-content/uploads/2023/04/HAI_AI-Index-Report_2023.pdf|title=Artificial Intelligence Index Report 2023|publisher=Stanford Institute for Human-Centered Artificial Intelligence|page=98|quote=Multiple high quality text-to-video models, AI systems that can generate video clips from prompted text, were released in 2022.}}</ref> Advancements during the 2020s in the generation of high-quality, text-conditioned videos have largely been driven by the development of video [[diffusion model]]s.<ref>{{cite arXiv |last1=Melnik |first1=Andrew |title=Video Diffusion Models: A Survey |date=2024-05-06 |eprint=2405.03150 |last2=Ljubljanac |first2=Michal |last3=Lu |first3=Cong |last4=Yan |first4=Qi |last5=Ren |first5=Weiming |last6=Ritter |first6=Helge |class=cs.CV }}</ref> A hyper-realistic cinematic close-up of a whole, full-shaped [pineapple red ] made of transparent glass with a soft light-colored outer hue, for example pale yellow for a banana, light red for an apple, gentle orange for a carrot. The glass fruit is perfectly centered on a wooden cutting board, glowing subtly under studio lighting. A human hand is clearly visible, holding a sharp stainless steel knife just above the fruit, ready to slice. In slow motion, the knife makes the first clean slice through the glass fruit; the front section breaks off cleanly with delicate glass-crack sounds. Then, the knife immediately makes a second slice, cutting another piece smoothly. Transparent shards scatter lightly from both cuts. ASMR slicing sounds only: no talking, no music. Only the hand, knife, and fruit are visible. Ultra-sharp macro lens, shallow depth of field, cinematic lighting, 1280x720 resolution, 30 FPS.

== Models ==

Text-to-video model: Difference between revisions