Muse (2023-01)<ref>{{cite arXiv |last1=Chang |first1=Huiwen |title=Muse: Text-To-Image Generation via Masked Generative Transformers |date=2023-01-02 |eprint=2301.00704 |last2=Zhang |first2=Han |last3=Barber |first3=Jarred |last4=Maschinot |first4=A. J. |last5=Lezama |first5=Jose |last6=Jiang |first6=Lu |last7=Yang |first7=Ming-Hsuan |last8=Murphy |first8=Kevin |last9=Freeman |first9=William T.|class=cs.CV }}</ref> is not a diffusion model, but an encoder-only Transformer that is trained to predict masked image tokens from unmasked image tokens.
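Muse's training objective (predicting masked image tokens from the unmasked ones) can be illustrated with a minimal sketch. This is not Muse's actual architecture: the codebook size, grid shape, and the stand-in "encoder" (a random linear map with crude global mixing instead of a real Transformer) are all toy assumptions chosen for brevity.

```python
import numpy as np

# Toy sketch of masked image-token training (illustrative only, not Muse itself).
# Assumptions: a 4x4 grid of discrete image tokens from a codebook of 8 entries.
rng = np.random.default_rng(0)
V, H, W = 8, 4, 4                          # codebook size, token-grid height/width
tokens = rng.integers(0, V, size=H * W)    # ground-truth image tokens

# Randomly mask a fraction of the tokens; the model sees a [MASK] id (= V).
mask = rng.random(H * W) < 0.5
inputs = np.where(mask, V, tokens)

# Stand-in "encoder": embed each token id, mix globally, project to codebook logits.
embed = rng.normal(size=(V + 1, 16))
proj = rng.normal(size=(16, V))
hidden = embed[inputs]
hidden = hidden + hidden.mean(axis=0)      # crude global mixing (no attention)
logits = hidden @ proj

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# The loss is cross-entropy on the MASKED positions only, as in masked modeling.
probs = softmax(logits)
loss = -np.log(probs[mask, tokens[mask]]).mean()
print(float(loss) > 0.0)
```

Only the masked positions contribute to the loss; the unmasked tokens serve purely as conditioning context, which is what lets the model fill in missing tokens at generation time.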
Imagen 2 (2023-12) is also diffusion-based. It can generate images from a prompt that mixes images and text. No further details have been published.<ref>{{Cite web |title=Imagen 2 - our most advanced text-to-image technology |url=https://deepmind.google/technologies/imagen-2/ |access-date=2024-04-04 |website=Google DeepMind |language=en}}</ref> Imagen 3 (2024-05) is likewise diffusion-based; no further details have been published.<ref>{{Citation |
Veo (2024) generates videos by latent diffusion. The diffusion is conditioned on a vector that encodes both a text prompt and an image prompt.<ref>{{Cite web |date=2024-05-14 |title=Veo |url=https://deepmind.google/technologies/veo/ |access-date=2024-05-17 |website=Google DeepMind |language=en}}</ref>