Reworded first sentences for grammar. CogVideo shouldn't be in the introduction of the topic; it is already mentioned and referenced in the Models section.
'''Text-to-Video''' is a state-of-the-art technology that needs only text as input to produce video as output. It was inspired by [[text-to-image model]]s, which generate images as output from text as input.
Video prediction, which aims to render realistic objects against a stable background, is performed using a [[recurrent neural network]] as a sequence-to-sequence model, with a connecting [[convolutional neural network]] that encodes and decodes each frame pixel by pixel,<ref>{{Cite web |title=Leading India |url=https://www.leadingindia.ai/downloads/projects/VP/vp_16.pdf}}</ref> creating video using [[deep learning]].<ref>{{Cite web |last=Narain |first=Rohit |date=2021-12-29 |title=Smart Video Generation from Text Using Deep Neural Networks |url=https://www.datatobiz.com/blog/smart-video-generation-from-text/ |access-date=2022-10-12 |language=en-US}}</ref>
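
The data flow of the architecture described above can be illustrated with a minimal sketch in plain Python. It is not taken from the cited sources: the function names (<code>encode</code>, <code>decode</code>, <code>rnn_step</code>, <code>predict_next_frames</code>), the averaging kernel, and the fixed scalar weights are all hypothetical stand-ins for parameters that a real model would learn. The sketch only shows how a convolutional encoder, a recurrent state, and a decoder could be chained in a sequence-to-sequence loop; its output is not a meaningful prediction.

```python
import math

def encode(frame, kernel):
    """Toy convolutional encoder: slide a k x k kernel over the frame
    with stride k and return the flattened feature map."""
    k = len(kernel)
    feats = []
    for r in range(0, len(frame) - k + 1, k):
        for c in range(0, len(frame[0]) - k + 1, k):
            feats.append(sum(kernel[i][j] * frame[r + i][c + j]
                             for i in range(k) for j in range(k)))
    return feats

def decode(feats, height, width, k):
    """Toy decoder: copy each feature back into its k x k block of pixels."""
    cols = width // k
    return [[feats[(r // k) * cols + (c // k)] for c in range(width)]
            for r in range(height)]

def rnn_step(x, h, w_in, w_rec):
    """Elman-style recurrent update with elementwise (assumed) weights:
    h' = tanh(w_in * x + w_rec * h)."""
    return [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]

def predict_next_frames(frames, n_future, k=2, w_in=0.8, w_rec=0.5):
    """Sequence-to-sequence loop: read the observed frames, then roll the
    recurrent state forward to emit n_future frames.
    Assumes frame height and width are multiples of k."""
    height, width = len(frames[0]), len(frames[0][0])
    # Untrained averaging kernel; a real model would learn these weights.
    kernel = [[1.0 / (k * k)] * k for _ in range(k)]
    h = [0.0] * ((height // k) * (width // k))

    # Encoder pass: fold every observed frame into the hidden state.
    for frame in frames:
        h = rnn_step(encode(frame, kernel), h, w_in, w_rec)

    # Decoder pass: decode a frame, then feed it back in as the next input.
    outputs = []
    for _ in range(n_future):
        frame = decode(h, height, width, k)
        outputs.append(frame)
        h = rnn_step(encode(frame, kernel), h, w_in, w_rec)
    return outputs

# Three identical 4 x 4 frames: dark left half, bright right half.
observed = [[[0.0, 0.0, 1.0, 1.0] for _ in range(4)] for _ in range(3)]
future = predict_next_frames(observed, n_future=2)
```

Here <code>future</code> holds two 4 × 4 frames. Because the weights are fixed rather than trained, the example demonstrates only the encode, recur, decode structure, not the quality of a trained video predictor.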