Diffusion model: Difference between revisions

Muse (2023-01)<ref>{{cite arXiv |last1=Chang |first1=Huiwen |title=Muse: Text-To-Image Generation via Masked Generative Transformers |date=2023-01-02 |eprint=2301.00704 |last2=Zhang |first2=Han |last3=Barber |first3=Jarred |last4=Maschinot |first4=A. J. |last5=Lezama |first5=Jose |last6=Jiang |first6=Lu |last7=Yang |first7=Ming-Hsuan |last8=Murphy |first8=Kevin |last9=Freeman |first9=William T.|class=cs.CV }}</ref> is not a diffusion model, but an encoder-only Transformer that is trained to predict masked image tokens from unmasked image tokens.
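The masked-token objective described above can be sketched in miniature as follows. This is a toy illustration, not Muse's actual pipeline: the helper name and MASK-id convention are hypothetical, and the real model masks VQ-tokenizer outputs and predicts them with a large Transformer.

```python
import random

def mask_tokens(tokens, mask_id, mask_prob=0.5, rng=None):
    """Replace a random subset of image-token ids with a MASK id.

    Returns (masked_tokens, mask_positions). During training, the
    Transformer would be asked to predict the original ids at
    mask_positions from the unmasked tokens (cross-entropy loss on
    masked positions only).
    """
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    masked, positions = [], []
    for i, t in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_id)   # hide this token from the model
            positions.append(i)      # remember where to score the prediction
        else:
            masked.append(t)         # leave visible as conditioning context
    return masked, positions
```

At inference time, Muse starts from an all-masked grid and fills in tokens over a few parallel refinement steps, which is what distinguishes it from step-by-step diffusion sampling.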
 
Imagen 2 (2023-12) is also diffusion-based. It can generate images from a prompt that mixes images and text. No further details have been released.<ref>{{Cite web |title=Imagen 2 - our most advanced text-to-image technology |url=https://deepmind.google/technologies/imagen-2/ |access-date=2024-04-04 |website=Google DeepMind |language=en}}</ref> Imagen 3 (2024-05) is likewise diffusion-based; no further details have been released.<ref>{{Citation |last1=Imagen-Team-Google |title=Imagen 3 |date=2024-12-13 |url=https://arxiv.org/abs/2408.07009 |access-date=2024-12-23 |arxiv=2408.07009 |last2=Baldridge |first2=Jason |last3=Bauer |first3=Jakob |last4=Bhutani |first4=Mukul |last5=Brichtova |first5=Nicole |last6=Bunner |first6=Andrew |last7=Castrejon |first7=Lluis |last8=Chan |first8=Kelvin |last9=Chen |first9=Yichang}}</ref>
 
Veo (2024) generates videos by latent diffusion. The diffusion is conditioned on a vector that encodes both a text prompt and an image prompt.<ref>{{Cite web |date=2024-05-14 |title=Veo |url=https://deepmind.google/technologies/veo/ |access-date=2024-05-17 |website=Google DeepMind |language=en}}</ref>