Text-to-video model


A text-to-video model is a machine learning model which takes as input a natural language description and produces a video matching that description.[1]

One approach to video prediction, which keeps objects realistic against a stable background, uses a recurrent neural network as a sequence-to-sequence model, with a convolutional neural network encoding and decoding each frame pixel by pixel,[2] generating video through deep learning.[3]
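The encode, recur, decode loop described above can be sketched as follows. This is an illustrative toy in NumPy, not the cited system: the tiny convolution kernel, latent sizes, and class name `FramePredictor` are all assumptions chosen for brevity, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kernel):
    """Valid 2-D convolution of a single-channel frame (stand-in for a CNN layer)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

class FramePredictor:
    """Toy encode -> RNN -> decode video predictor (illustrative only)."""

    def __init__(self, frame_hw=(8, 8), latent=16, hidden=32):
        H, W = frame_hw
        self.kernel = rng.standard_normal((3, 3)) * 0.1       # "CNN" encoder kernel
        feat = (H - 2) * (W - 2)                              # flattened conv output size
        self.W_enc = rng.standard_normal((latent, feat)) * 0.1
        self.W_xh = rng.standard_normal((hidden, latent)) * 0.1
        self.W_hh = rng.standard_normal((hidden, hidden)) * 0.1
        self.W_dec = rng.standard_normal((H * W, hidden)) * 0.1
        self.frame_hw = frame_hw
        self.hidden = hidden

    def encode(self, frame):
        z = conv2d(frame, self.kernel).ravel()                # conv feature map
        return np.tanh(self.W_enc @ z)                        # latent code per frame

    def step(self, h, z):
        return np.tanh(self.W_xh @ z + self.W_hh @ h)         # plain RNN cell

    def decode(self, h):
        return (self.W_dec @ h).reshape(self.frame_hw)        # frame, pixel by pixel

    def predict(self, frames, n_future=2):
        h = np.zeros(self.hidden)
        for f in frames:                                      # condition on context frames
            h = self.step(h, self.encode(f))
        out = []
        for _ in range(n_future):                             # autoregressive rollout
            frame = self.decode(h)
            out.append(frame)
            h = self.step(h, self.encode(frame))
        return out
```

A real system would replace the single kernel with a deep convolutional encoder/decoder and the plain RNN cell with an LSTM or ConvLSTM, but the data flow, per-frame encoding, a recurrent state over the sequence, and per-frame decoding, is the same.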

Methodology

Models

Several models exist, including open-source ones. CogVideo published its code on GitHub.[4] Meta Platforms performs text-to-video with its Make-A-Video system (makeavideo.studio).[5][6][7] Google used Imagen Video to convert text to video.[8][9][10][11][12]

Antonia Antonova presented another model.[13]

In March 2023, a landmark research paper by Alibaba Research was published that applied many of the principles found in latent image diffusion models to video generation.[14][15] Services such as https://kaiber.ai/ and https://reemix.co/ have since adopted similar approaches to video generation in their respective products.

Although alternative approaches exist,[16] full latent diffusion models are currently regarded as the state of the art for video diffusion.
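The diffusion approach discussed above can be illustrated with a minimal sketch of DDPM-style ancestral sampling over a video latent. This is a generic diffusion sampling loop under stated assumptions, not the cited VideoFusion method: the schedule values, tensor shape `(frames, H, W)`, and the stand-in `dummy_predictor` are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed values, common in DDPM implementations)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def sample_latent_video(noise_predictor, shape=(4, 8, 8), T=50):
    """DDPM-style ancestral sampling over a (frames, H, W) latent tensor."""
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(shape)                    # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps_hat = noise_predictor(x, t)               # predicted noise; a real model is
                                                      # a text-conditioned denoising network
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise          # one reverse diffusion step
    return x                                          # mapped to pixels by a decoder in practice

# Stand-in "network" for illustration; a trained model would condition on the prompt.
dummy_predictor = lambda x, t: 0.1 * x
video_latent = sample_latent_video(dummy_predictor)
```

The point of the latent formulation is that the loop runs in a compressed latent space shared across frames, so each denoising step is far cheaper than operating on raw pixels, and temporal consistency can be handled inside the noise predictor.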

References

  1. ^ Artificial Intelligence Index Report 2023 (PDF) (Report). Stanford Institute for Human-Centered Artificial Intelligence. p. 98. Multiple high quality text-to-video models, AI systems that can generate video clips from prompted text, were released in 2022.
  2. ^ "Leading India" (PDF).
  3. ^ Narain, Rohit (2021-12-29). "Smart Video Generation from Text Using Deep Neural Networks". Retrieved 2022-10-12.
  4. ^ CogVideo, THUDM, 2022-10-12, retrieved 2022-10-12
  5. ^ Davies, Teli (2022-09-29). "Make-A-Video: Meta AI's New Model For Text-To-Video Generation". W&B. Retrieved 2022-10-12.
  6. ^ Monge, Jim Clyde (2022-08-03). "This AI Can Create Video From Text Prompt". Medium. Retrieved 2022-10-12.
  7. ^ "Meta's Make-A-Video AI creates videos from text". www.fonearena.com. Retrieved 2022-10-12.
  8. ^ "google: Google takes on Meta, introduces own video-generating AI - The Economic Times". m.economictimes.com. Retrieved 2022-10-12.
  9. ^ Monge, Jim Clyde (2022-08-03). "This AI Can Create Video From Text Prompt". Medium. Retrieved 2022-10-12.
  10. ^ "Nuh-uh, Meta, we can do text-to-video AI, too, says Google". www.theregister.com. Retrieved 2022-10-12.
  11. ^ "Papers with Code - See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction". paperswithcode.com. Retrieved 2022-10-12.
  12. ^ "Papers with Code - Text-driven Video Prediction". paperswithcode.com. Retrieved 2022-10-12.
  13. ^ "Text to Video Generation". Antonia Antonova. Retrieved 2022-10-12.
  14. ^ "Home - DAMO Academy". damo.alibaba.com. Retrieved 2023-08-12.
  15. ^ Luo, Zhengxiong; Chen, Dayou; Zhang, Yingya; Huang, Yan; Wang, Liang; Shen, Yujun; Zhao, Deli; Zhou, Jingren; Tan, Tieniu (2023). "VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation". doi:10.48550/ARXIV.2303.08320.
  16. ^ Text2Video-Zero, Picsart AI Research (PAIR), 2023-08-12, retrieved 2023-08-12