Diffusion model: Difference between revisions

Muse (2023-01)<ref>{{cite arXiv |last1=Chang |first1=Huiwen |title=Muse: Text-To-Image Generation via Masked Generative Transformers |date=2023-01-02 |eprint=2301.00704 |last2=Zhang |first2=Han |last3=Barber |first3=Jarred |last4=Maschinot |first4=A. J. |last5=Lezama |first5=Jose |last6=Jiang |first6=Lu |last7=Yang |first7=Ming-Hsuan |last8=Murphy |first8=Kevin |last9=Freeman |first9=William T.|class=cs.CV }}</ref> is not a diffusion model, but an encoder-only Transformer that is trained to predict masked image tokens from unmasked image tokens.
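The masked-token objective described above can be sketched in miniature as follows. This is a toy illustration, not Muse's actual pipeline: the helper name and MASK-id convention are hypothetical, and the real model masks VQ-tokenizer outputs and predicts them with a large Transformer.

```python
import random

def mask_tokens(tokens, mask_id, mask_prob=0.5, rng=None):
    """Replace a random subset of image-token ids with a MASK id.

    Returns (masked_tokens, mask_positions). During training, the
    Transformer would be asked to predict the original ids at
    mask_positions from the unmasked tokens (cross-entropy loss on
    masked positions only).
    """
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    masked, positions = [], []
    for i, t in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_id)   # hide this token from the model
            positions.append(i)      # remember where to score the prediction
        else:
            masked.append(t)         # leave visible as conditioning context
    return masked, positions
```

At inference time, Muse starts from an all-masked grid and fills in tokens over a few parallel refinement steps, which is what distinguishes it from step-by-step diffusion sampling.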
 
Imagen 2 (2023-12) is also diffusion-based. It can generate images from a prompt that mixes images and text. No further details have been released.<ref>{{Cite web |title=Imagen 2 - our most advanced text-to-image technology |url=https://deepmind.google/technologies/imagen-2/ |access-date=2024-04-04 |website=Google DeepMind |language=en}}</ref> Imagen 3 (2024-05) is likewise diffusion-based; no further details have been released.<ref>{{Citation |last1=Imagen-Team-Google |title=Imagen 3 |date=2024-12-13 |url=https://arxiv.org/abs/2408.07009 |access-date=2024-12-23 |arxiv=2408.07009 |last2=Baldridge |first2=Jason |last3=Bauer |first3=Jakob |last4=Bhutani |first4=Mukul |last5=Brichtova |first5=Nicole |last6=Bunner |first6=Andrew |last7=Castrejon |first7=Lluis |last8=Chan |first8=Kelvin |last9=Chen |first9=Yichang}}</ref>
 
Veo (2024) generates videos by latent diffusion. The diffusion is conditioned on a vector that encodes both a text prompt and an image prompt.<ref>{{Cite web |date=2024-05-14 |title=Veo |url=https://deepmind.google/technologies/veo/ |access-date=2024-05-17 |website=Google DeepMind |language=en}}</ref>