Diffusion model

Diffusion models were introduced in 2015 as a method to train a model that can sample from a highly complex probability distribution. They used techniques from [[non-equilibrium thermodynamics]], especially [[diffusion]].<ref>{{Cite journal |last1=Sohl-Dickstein |first1=Jascha |last2=Weiss |first2=Eric |last3=Maheswaranathan |first3=Niru |last4=Ganguli |first4=Surya |date=2015-06-01 |title=Deep Unsupervised Learning using Nonequilibrium Thermodynamics |url=http://proceedings.mlr.press/v37/sohl-dickstein15.pdf |journal=Proceedings of the 32nd International Conference on Machine Learning |language=en |publisher=PMLR |volume=37 |pages=2256–2265|arxiv=1503.03585 }}</ref>
 
Consider, for example, how one might model the distribution of all naturally occurring photos. Each image is a point in the space of all images, and the distribution of naturally occurring photos is a "cloud" in this space which, by repeatedly adding noise to the images, diffuses out to the rest of the image space, until the cloud becomes all but indistinguishable from a [[Normal distribution|Gaussian distribution]] <math>\mathcal{N}(0, I)</math>. A model that can approximately undo the diffusion can then be used to sample from the original distribution. This is studied in "non-equilibrium" thermodynamics, as the starting distribution, unlike the final distribution, is not in equilibrium.
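The forward noising process described above can be sketched numerically. The following is a minimal illustration (not from the article; the cluster positions, noise rate, and step count are arbitrary choices): a tightly concentrated two-dimensional "cloud" is repeatedly mixed with Gaussian noise, after which it is statistically close to <math>\mathcal{N}(0, I)</math>.

```python
# Illustrative sketch: repeatedly add Gaussian noise to a concentrated 2-D
# point cloud; after many steps it is nearly indistinguishable from N(0, I).
import numpy as np

rng = np.random.default_rng(0)

# A sharply concentrated "data" distribution: two tight clusters.
x = np.concatenate([rng.normal(-3.0, 0.1, (500, 2)),
                    rng.normal(+3.0, 0.1, (500, 2))])

beta = 0.05           # noise mixed in per step (arbitrary small rate)
for _ in range(300):  # many small noising steps
    x = np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)

# The cloud now has mean ~ 0 and standard deviation ~ 1 in each coordinate.
print(x.mean(axis=0), x.std(axis=0))
```

The scaling by <math>\sqrt{1-\beta}</math> keeps the variance bounded, so the process converges to unit variance rather than spreading without limit.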
 
The equilibrium distribution is the Gaussian distribution <math>\mathcal{N}(0, I)</math>, with pdf <math>\rho(x) \propto e^{-\frac 12 \|x\|^2}</math>. This is just the [[Maxwell–Boltzmann distribution]] of particles in a potential well <math>V(x) = \frac 12 \|x\|^2</math> at temperature 1. The initial distribution, being very much out of equilibrium, diffuses towards the equilibrium distribution, taking biased random steps that are a sum of pure randomness (like a [[Brownian motion|Brownian walker]]) and gradient descent down the potential well. The randomness is necessary: if the particles were to undergo only gradient descent, they would all fall to the origin, collapsing the distribution.
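This combination of gradient descent and Brownian noise is discretized Langevin dynamics. A minimal sketch (illustrative constants, not from the article) contrasts the two cases for the quadratic potential <math>V(x) = \frac 12 \|x\|^2</math>, where <math>-\nabla V(x) = -x</math>: with noise, an out-of-equilibrium ensemble relaxes to the unit-variance Gaussian; without noise, every particle collapses to the origin.

```python
# Illustrative sketch: discretized Langevin dynamics in V(x) = ||x||^2 / 2.
# Each step is gradient descent (-x) plus Brownian noise of matched strength.
import numpy as np

rng = np.random.default_rng(1)
eta = 0.01                        # step size (arbitrary small value)
x = rng.normal(5.0, 0.1, 2000)    # ensemble started far out of equilibrium
y = x.copy()                      # same start, but noise-free gradient descent

for _ in range(2000):
    x = x - eta * x + np.sqrt(2 * eta) * rng.standard_normal(x.shape)
    y = y - eta * y  # pure gradient descent: no noise term

print(f"with noise:    mean={x.mean():+.3f}  std={x.std():.3f}")  # ~0, ~1
print(f"without noise: mean={y.mean():+.3f}  std={y.std():.3f}")  # collapsed to 0
```

The noise coefficient <math>\sqrt{2\eta}</math> is what sets the temperature to 1, so the stationary spread matches <math>\mathcal{N}(0, I)</math>.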
 
=== Other examples ===
Notable variants include<ref>{{Cite journal |last1=Cao |first1=Hanqun |last2=Tan |first2=Cheng |last3=Gao |first3=Zhangyang |last4=Xu |first4=Yilun |last5=Chen |first5=Guangyong |last6=Heng |first6=Pheng-Ann |last7=Li |first7=Stan Z. |date=July 2024 |title=A Survey on Generative Diffusion Models |url=https://ieeexplore.ieee.org/document/10419041 |journal=IEEE Transactions on Knowledge and Data Engineering |volume=36 |issue=7 |pages=2814–2830 |doi=10.1109/TKDE.2024.3361474 |issn=1041-4347|url-access=subscription }}</ref> the Poisson flow generative model,<ref>{{Cite journal |last1=Xu |first1=Yilun |last2=Liu |first2=Ziming |last3=Tian |first3=Yonglong |last4=Tong |first4=Shangyuan |last5=Tegmark |first5=Max |last6=Jaakkola |first6=Tommi |date=2023-07-03 |title=PFGM++: Unlocking the Potential of Physics-Inspired Generative Models |url=https://proceedings.mlr.press/v202/xu23m.html |journal=Proceedings of the 40th International Conference on Machine Learning |language=en |publisher=PMLR |pages=38566–38591|arxiv=2302.04265 }}</ref> the consistency model,<ref>{{Cite journal |last1=Song |first1=Yang |last2=Dhariwal |first2=Prafulla |last3=Chen |first3=Mark |last4=Sutskever |first4=Ilya |date=2023-07-03 |title=Consistency Models |url=https://proceedings.mlr.press/v202/song23a |journal=Proceedings of the 40th International Conference on Machine Learning |language=en |publisher=PMLR |pages=32211–32252}}</ref> critically-damped Langevin diffusion,<ref>{{Cite arXiv |last1=Dockhorn |first1=Tim |last2=Vahdat |first2=Arash |last3=Kreis |first3=Karsten |date=2021-10-06 |title=Score-Based Generative Modeling with Critically-Damped Langevin Diffusion |class=stat.ML |eprint=2112.07068 }}</ref> GenPhys,<ref>{{cite arXiv |last1=Liu |first1=Ziming |title=GenPhys: From Physical Processes to Generative Models |date=2023-04-05 |eprint=2304.02637 |last2=Luo |first2=Di |last3=Xu |first3=Yilun |last4=Jaakkola |first4=Tommi |last5=Tegmark |first5=Max|class=cs.LG }}</ref> cold diffusion,<ref>{{Cite journal |last1=Bansal |first1=Arpit |last2=Borgnia |first2=Eitan |last3=Chu |first3=Hong-Min |last4=Li |first4=Jie |last5=Kazemi |first5=Hamid |last6=Huang |first6=Furong |last7=Goldblum |first7=Micah |last8=Geiping |first8=Jonas |last9=Goldstein |first9=Tom |date=2023-12-15 |title=Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise |url=https://proceedings.neurips.cc/paper_files/paper/2023/hash/80fe51a7d8d0c73ff7439c2a2554ed53-Abstract-Conference.html |journal=Advances in Neural Information Processing Systems |language=en |volume=36 |pages=41259–41282|arxiv=2208.09392 }}</ref> and discrete diffusion.<ref>{{Cite journal |last1=Gulrajani |first1=Ishaan |last2=Hashimoto |first2=Tatsunori B. |date=2023-12-15 |title=Likelihood-Based Diffusion Language Models |url=https://proceedings.neurips.cc/paper_files/paper/2023/hash/35b5c175e139bff5f22a5361270fce87-Abstract-Conference.html |journal=Advances in Neural Information Processing Systems |language=en |volume=36 |pages=16693–16715|arxiv=2305.18619 }}</ref><ref>{{cite arXiv |last1=Lou |first1=Aaron |title=Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution |date=2024-06-06 |eprint=2310.16834 |last2=Meng |first2=Chenlin |last3=Ermon |first3=Stefano|class=stat.ML }}</ref>
 
== Flow-based diffusion model ==