Text-to-video model: Difference between revisions

Corrected punctuation and terminology
Corrected capitalization. Those are common nouns, not proper nouns.
'''Text-to-video''' is a state-of-the-art [[artificial intelligence]] technology that takes only text as input and produces video as output. It was inspired by [[text-to-image model]]s, which generate images from text input.
 
Video prediction, which aims to render realistic objects against a stable background, is performed using a [[recurrent neural network]] as a sequence-to-sequence model, with a connected [[convolutional neural network]] encoding and decoding each frame pixel by pixel,<ref>{{Cite web |title=Leading India |url=https://www.leadingindia.ai/downloads/projects/VP/vp_16.pdf}}</ref> creating video using [[deep learning]].<ref>{{Cite web |last=Narain |first=Rohit |date=2021-12-29 |title=Smart Video Generation from Text Using Deep Neural Networks |url=https://www.datatobiz.com/blog/smart-video-generation-from-text/ |access-date=2022-10-12 |language=en-US}}</ref>
 
== Methodology ==
* Data collection and data set preparation using clear videos from the Kinetics human action video dataset.
* Training the [[convolutional neural network]] to generate video.
* Keyword extraction from text using [[natural language processing]].
* Testing the data set in a conditional generative model that captures static and dynamic information from the text, using a [[variational autoencoder]] and a [[generative adversarial network]].
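
The keyword-extraction step listed above can be illustrated with a minimal sketch. It is not the method used by the cited sources: the function name <code>extract_keywords</code>, the tiny stop-word list, and the frequency-based ranking are all assumptions chosen for illustration, and a real system would more likely rely on a trained language model or a dedicated natural language processing library.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately small stop-word list (an assumption for this sketch).
STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "are", "in", "on", "of", "and", "to", "with", "at"}
)


def extract_keywords(prompt: str, top_k: int = 5) -> List[str]:
    """Return up to top_k of the most frequent non-stop-word tokens.

    Tokens with equal counts keep their order of first appearance,
    which is how Counter.most_common orders ties.
    """
    # Lower-case the prompt and keep only alphabetic runs as tokens.
    tokens = re.findall(r"[a-z]+", prompt.lower())
    # Count every token that is not a stop word.
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_k)]


if __name__ == "__main__":
    print(extract_keywords("A man is playing the guitar on a stage"))
```

In a pipeline like the one outlined, keywords of this kind would serve as the conditioning signal passed to the generative model; how that conditioning is actually encoded is not specified in the sources.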
 
== Models ==