{{short description|Machine learning model}}
A '''text-to-video model''' is a [[machine learning]] model which takes a [[natural language]] description as input and produces a [[video]] or multiple videos matching that description.<ref name="AIIR">{{cite report|url=https://aiindex.stanford.edu/wp-content/uploads/2023/04/HAI_AI-Index-Report_2023.pdf|title=Artificial Intelligence Index Report 2023|publisher=Stanford Institute for Human-Centered Artificial Intelligence|page=98|quote=Multiple high quality text-to-video models, AI systems that can generate video clips from prompted text, were released in 2022.}}</ref>
 
Video prediction, which renders realistic objects against a stable background, can be performed with a [[recurrent neural network]] in a sequence-to-sequence model, with a [[convolutional neural network]] encoding and decoding each frame pixel by pixel,<ref>{{Cite web |title=Leading India |url=https://www.leadingindia.ai/downloads/projects/VP/vp_16.pdf}}</ref> creating video using [[deep learning]].<ref>{{Cite web |last=Narain |first=Rohit |date=2021-12-29 |title=Smart Video Generation from Text Using Deep Neural Networks |url=https://www.datatobiz.com/blog/smart-video-generation-from-text/ |access-date=2022-10-12 |language=en-US}}</ref> The [[data set]] can be tested in a conditional [[generative model]] for existing information from text using a [[variational autoencoder]] and a [[generative adversarial network]] (GAN).
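The encode–recur–decode pipeline described above can be sketched as a toy model. This is a minimal illustration only: random linear projections stand in for a trained convolutional encoder/decoder, and a simple Elman-style recurrence stands in for the sequence-to-sequence network; all dimensions and weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 8x8 grayscale frames, flattened to 64-d vectors.
H = W = 8
frame_dim = H * W
hidden_dim = 32

# Stand-ins for a learned convolutional encoder/decoder: random linear
# projections (a real system would use a trained CNN per frame).
W_enc = rng.normal(0, 0.1, (hidden_dim, frame_dim))
W_dec = rng.normal(0, 0.1, (frame_dim, hidden_dim))

# Simple recurrent cell carrying temporal state across frames.
W_h = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
W_x = rng.normal(0, 0.1, (hidden_dim, hidden_dim))

def predict_next_frames(frames, n_future):
    """Encode each observed frame, roll the recurrent state forward,
    then decode hidden states back to pixel space autoregressively."""
    h = np.zeros(hidden_dim)
    for f in frames:                       # encoder pass over observed frames
        x = W_enc @ f.reshape(-1)
        h = np.tanh(W_h @ h + W_x @ x)
    out = []
    for _ in range(n_future):              # autoregressive decoding
        frame = (W_dec @ h).reshape(H, W)
        out.append(frame)
        x = W_enc @ frame.reshape(-1)
        h = np.tanh(W_h @ h + W_x @ x)
    return out

clip = [rng.normal(size=(H, W)) for _ in range(5)]
future = predict_next_frames(clip, n_future=3)
print(len(future), future[0].shape)        # 3 (8, 8)
```

The key design point is that the CNN handles per-frame spatial structure while the recurrent state carries temporal continuity between frames.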
 
== Methodology ==
* Data collection and data set preparation using clear clips from kinetic human action videos.
* Training the [[convolutional neural network]] to generate video.
* Keyword extraction from text using [[natural language processing]].
* Testing of the data set in a conditional generative model for existing static and dynamic information from text, using a [[variational autoencoder]] and a [[generative adversarial network]].
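The keyword-extraction step above can be illustrated with a toy example. Production systems use learned language models rather than stop-word filtering; the stop-word list and function name here are hypothetical stand-ins.

```python
# Toy keyword extraction: lowercase, tokenize, drop common stop words.
# A real pipeline would use a trained NLP model instead.
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "on", "and"}

def extract_keywords(prompt):
    tokens = prompt.lower().replace(",", " ").split()
    return [t for t in tokens if t not in STOP_WORDS]

print(extract_keywords("A dog running on the beach"))
# ['dog', 'running', 'beach']
```

The extracted keywords would then condition the generative model on the static content (objects) and dynamic content (actions) of the prompt.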
 
== Models ==
{{Update section|date=February 2024}}
There are different models, including [[open source]] models. CogVideo is an early text-to-video model "of 9.4 billion parameters", with a demo version of its code presented on [[GitHub]].<ref>{{Citation |title=CogVideo |date=2022-10-12 |url=https://github.com/THUDM/CogVideo |publisher=THUDM |access-date=2022-10-12}}</ref> [[Meta Platforms]] has a partial text-to-video model{{NoteTag|It can also generate videos from images, insert video between two images, and produce variations of videos.|name=}} called "Make-A-Video".<ref>{{Cite web |last=Davies |first=Teli |date=2022-09-29 |title=Make-A-Video: Meta AI's New Model For Text-To-Video Generation |url=https://wandb.ai/telidavies/ml-news/reports/Make-A-Video-Meta-AI-s-New-Model-For-Text-To-Video-Generation--VmlldzoyNzE4Nzcx |access-date=2022-10-12 |website=Weights & Biases |language=en}}</ref><ref>{{Cite web |last=Monge |first=Jim Clyde |date=2022-08-03 |title=This AI Can Create Video From Text Prompt |url=https://betterprogramming.pub/this-ai-can-create-video-from-text-prompt-6904439d7aba |access-date=2022-10-12 |website=Medium |language=en}}</ref><ref>{{Cite web |title=Meta's Make-A-Video AI creates videos from text |url=https://www.fonearena.com/blog/375627/meta-make-a-video-ai-create-videos-from-text.html |access-date=2022-10-12 |website=www.fonearena.com}}</ref> [[Google]]'s [[Google Brain|Brain]] has released a research paper introducing Imagen Video, a text-to-video model with a 3D [[U-Net]].<ref>{{Cite web |title=google: Google takes on Meta, introduces own video-generating AI - The Economic Times |url=https://m.economictimes.com/tech/technology/google-takes-on-meta-introduces-own-video-generating-ai/amp_articleshow/94681128.cms?amp_gsa=1&amp_js_v=a9&usqp=mq331AQKKAFQArABIIACAw==#amp_tf=From%20%251$s&aoh=16655942495197&referrer=https://www.google.com&ampshare=https://m.economictimes.com/tech/technology/google-takes-on-meta-introduces-own-video-generating-ai/articleshow/94681128.cms |access-date=2022-10-12 |website=m.economictimes.com}}</ref><ref>{{Cite web |last=Monge |first=Jim Clyde |date=2022-08-03 |title=This AI Can Create Video From Text Prompt |url=https://betterprogramming.pub/this-ai-can-create-video-from-text-prompt-6904439d7aba |access-date=2022-10-12 |website=Medium |language=en}}</ref><ref>{{Cite web |title=Nuh-uh, Meta, we can do text-to-video AI, too, says Google |url=https://www.theregister.com/AMP/2022/10/06/google_ai_imagen_video/ |access-date=2022-10-12 |website=www.theregister.com}}</ref><ref>{{Cite web |title=Papers with Code - See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction |url=https://paperswithcode.com/paper/see-plan-predict-language-guided-cognitive |access-date=2022-10-12 |website=paperswithcode.com |language=en}}</ref><ref>{{Cite web |title=Papers with Code - Text-driven Video Prediction |url=https://paperswithcode.com/paper/text-driven-video-prediction |access-date=2022-10-12 |website=paperswithcode.com |language=en}}</ref>
 
Antonia Antonova presented another model.<ref>{{Cite web |title=Text to Video Generation |url=https://antonia.space/text-to-video-generation |access-date=2022-10-12 |website=Antonia Antonova |language=en-US}}</ref>
 
In March 2023, a landmark research paper by Alibaba Research was published, applying many of the principles found in latent image diffusion models to video generation.<ref>{{Cite web |title=Home - DAMO Academy |url=https://damo.alibaba.com/ |access-date=2023-08-12 |website=damo.alibaba.com}}</ref><ref>{{Cite arXiv |last1=Luo |first1=Zhengxiong |last2=Chen |first2=Dayou |last3=Zhang |first3=Yingya |last4=Huang |first4=Yan |last5=Wang |first5=Liang |last6=Shen |first6=Yujun |last7=Zhao |first7=Deli |last8=Zhou |first8=Jingren |last9=Tan |first9=Tieniu |date=2023 |title=VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation |class=cs.CV |eprint=2303.08320}}</ref> Services like Kaiber and Reemix have since adopted similar approaches to video generation in their respective products.
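The core idea behind latent diffusion, also used in the video setting, is to corrupt compressed latents with Gaussian noise according to a fixed schedule and train a network to reverse that corruption. A minimal sketch of the standard closed-form forward (noising) process follows; the latent shape and schedule values are toy assumptions, and a real model would obtain latents from a pretrained autoencoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent video: 4 frames of 8x8 latents (a real model would get
# these from a pretrained image autoencoder).
latents = rng.normal(size=(4, 8, 8))

# Linear beta schedule, as in standard DDPM-style diffusion.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

def add_noise(x0, t):
    """Closed-form forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

noisy = add_noise(latents, t=50)
print(noisy.shape)   # (4, 8, 8)
```

A denoising network (typically a U-Net extended with temporal layers for video) is then trained to predict the noise added at each step, and generation runs this process in reverse from pure noise.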
 
[[Matthias Niessner]] (TUM) and [[Lourdes Agapito]] (UCL) at AI company [[Synthesia (company)|Synthesia]] work on developing 3D neural rendering techniques that can synthesise realistic video. The goal is to improve existing text-to-video models by using 2D and 3D neural representations of shape, appearance, and motion for controllable video synthesis of avatars that look and sound like real people.<ref>{{Cite web |title=Text to Speech for Videos |url=https://www.synthesia.io/text-to-speech |access-date=2023-10-17}}</ref>
 
Although alternative approaches to text-to-video models exist,<ref>{{Citation |title=Text2Video-Zero |date=2023-08-12 |url=https://github.com/Picsart-AI-Research/Text2Video-Zero |access-date=2023-08-12 |publisher=Picsart AI Research (PAIR)}}</ref> full latent diffusion models are currently regarded as the state of the art for video diffusion.
 
== See also ==
* [[Text-to-image model]]
* [[VideoPoet]], an unreleased Google model, precursor of [[Lumiere (text-to-video model)|Lumiere]]
* [[Sora (text-to-video model)|Sora]], unreleased OpenAI model
* [[Runway (company)|Runway]], the company developing Gen-1 and Gen-2 models