A '''text-to-video model''' is a [[machine learning]] model that takes a [[natural language]] description as input and produces a [[video]] matching that description.<ref name="AIIR">{{cite report|url=https://aiindex.stanford.edu/wp-content/uploads/2023/04/HAI_AI-Index-Report_2023.pdf|title=Artificial Intelligence Index Report 2023|publisher=Stanford Institute for Human-Centered Artificial Intelligence|page=98|quote=Multiple high quality text-to-video models, AI systems that can generate video clips from prompted text, were released in 2022.}}</ref>
 
Video prediction, which aims to keep objects realistic against a stable background, can be performed with a [[recurrent neural network]] in a sequence-to-sequence model, paired with a [[convolutional neural network]] that encodes and decodes each frame pixel by pixel,<ref>{{Cite web |title=Leading India |url=https://www.leadingindia.ai/downloads/projects/VP/vp_16.pdf}}</ref> generating video using [[deep learning]].<ref>{{Cite web |last=Narain |first=Rohit |date=2021-12-29 |title=Smart Video Generation from Text Using Deep Neural Networks |url=https://www.datatobiz.com/blog/smart-video-generation-from-text/ |access-date=2022-10-12 |language=en-US}}</ref>
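The encode-recur-decode pattern described above can be sketched as follows. This is a minimal, untrained NumPy illustration, not code from any cited system: the convolutional encoder and decoder are stood in for by single linear projections, the recurrent cell is a plain Elman-style update, and all shapes, weights, and function names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the per-frame encoder -> RNN -> per-frame decoder
# pipeline for video prediction. Untrained; weights are random.
# All names and shapes here are illustrative assumptions.

rng = np.random.default_rng(0)

H, W = 8, 8   # frame height and width in pixels
D = 16        # latent dimension per encoded frame
T = 5         # number of frames in the sequence

# "Encoder": stands in for a convolutional encoder mapping each
# frame to a latent vector (simplified to one linear projection).
W_enc = rng.normal(scale=0.1, size=(H * W, D))

# Simple Elman-style recurrent cell over the encoded frame sequence.
W_h = rng.normal(scale=0.1, size=(D, D))
W_x = rng.normal(scale=0.1, size=(D, D))

# "Decoder": maps the hidden state back to a full frame of pixels.
W_dec = rng.normal(scale=0.1, size=(D, H * W))

def predict_frames(frames):
    """Encode each frame, update the recurrent state, decode a frame."""
    h = np.zeros(D)
    outputs = []
    for frame in frames:                           # one step per frame
        z = frame.reshape(-1) @ W_enc              # per-frame encoding
        h = np.tanh(h @ W_h + z @ W_x)             # recurrent update
        outputs.append((h @ W_dec).reshape(H, W))  # decode a frame
    return np.stack(outputs)

video = rng.normal(size=(T, H, W))  # dummy input video
pred = predict_frames(video)
print(pred.shape)  # one predicted frame per input frame
```

In a real system the linear projections would be replaced by trained convolutional networks, and the recurrent cell by an LSTM or GRU, but the sequence-to-sequence structure is the same.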
 
Although alternative approaches exist,<ref>{{Citation |title=Text2Video-Zero |date=2023-08-12 |url=https://github.com/Picsart-AI-Research/Text2Video-Zero |access-date=2023-08-12 |publisher=Picsart AI Research (PAIR)}}</ref> full latent diffusion models are currently regarded as the state of the art for video diffusion.
 
== See also ==
* [[Text-to-image model]]
 
== References ==