{{Short description|Deep learning algorithm}}
{{Machine learning|Artificial neural network}}In [[machine learning]], '''diffusion models''', also known as '''diffusion probabilistic models''' or '''score-based generative models''', are a class of [[latent variable model|latent variable]] [[generative model|generative]] models. A diffusion model consists of three major components: the forward process, the reverse process, and the sampling procedure.<ref name="chang23design">{{cite arXiv |last1=Chang |first1=Ziyi |last2=Koulieris |first2=George Alex |last3=Shum |first3=Hubert P. H. |title=On the Design Fundamentals of Diffusion Models: A Survey |date=2023 |eprint=2306.04542 |class=cs.LG}}</ref> The goal of diffusion models is to learn a [[diffusion process]] that generates
In the case of [[computer vision]], diffusion models can be applied to a variety of tasks, including [[image denoising]], [[inpainting]], [[super-resolution]], and [[text-to-image model|image generation]]. This typically involves training a neural network to sequentially [[denoise]] images blurred with [[Gaussian noise]].<ref name="song">{{Cite arXiv |last1=Song |first1=Yang |last2=Sohl-Dickstein |first2=Jascha |last3=Kingma |first3=Diederik P. |last4=Kumar |first4=Abhishek |last5=Ermon |first5=Stefano |last6=Poole |first6=Ben |date=2021-02-10 |title=Score-Based Generative Modeling through Stochastic Differential Equations |class=cs.LG |eprint=2011.13456 }}</ref><ref name="gu">{{cite arXiv |last1=Gu |first1=Shuyang |last2=Chen |first2=Dong |last3=Bao |first3=Jianmin |last4=Wen |first4=Fang |last5=Zhang |first5=Bo |last6=Chen |first6=Dongdong |last7=Yuan |first7=Lu |last8=Guo |first8=Baining |title=Vector Quantized Diffusion Model for Text-to-Image Synthesis |date=2021 |class=cs.CV |eprint=2111.14822}}</ref> The model is trained to reverse the process of adding noise to an image. After training to convergence, it can be used for image generation by starting with an image composed of random noise for the network to iteratively denoise. 
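The training and generation procedure described above can be sketched in a few lines of code. The following is a minimal, illustrative DDPM-style sketch in NumPy: the learned denoising network is replaced by a placeholder function <code>predict_noise</code>, and the noise schedule and all hyperparameters are assumptions chosen for illustration, not values prescribed by any particular paper.

```python
import numpy as np

# Illustrative sketch of a DDPM-style diffusion model.
# `predict_noise` is a placeholder for a trained denoising network.

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # illustrative noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # cumulative product, \bar{alpha}_t

def forward_noise(x0, t, rng):
    """Forward process: sample a noised x_t from q(x_t | x_0) in closed form."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps                   # eps is the regression target in training

def predict_noise(xt, t):
    """Placeholder for the learned network eps_theta(x_t, t)."""
    return np.zeros_like(xt)

def sample(shape, rng):
    """Sampling procedure: start from pure noise and iteratively denoise."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps_hat = predict_noise(x, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                    # inject fresh noise except at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))
xt, eps = forward_noise(x0, T - 1, rng)
# At t = T-1, alpha_bar_t is nearly zero, so x_t is close to pure Gaussian noise.
```

In training, the network would be optimized to predict <code>eps</code> from <code>xt</code> and <code>t</code>; once trained, the sampling loop alone suffices to generate new data from random noise.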
Announced on 13 April 2022, [[OpenAI]]'s text-to-image model [[DALL-E 2]] is an example that uses diffusion models for both the model's prior (which produces an image embedding given a text caption) and the decoder that generates the final image.<ref name="dalle2"/> Diffusion models have recently found applications in natural language processing (NLP),<ref>{{Cite journal |last=Li |first=Yifan |last2=Zhou |first2=Kun |last3=Zhao |first3=Wayne Xin |last4=Wen |first4=Ji-Rong |date=August 2023 |title=Diffusion Models for Non-autoregressive Text Generation: A Survey |url=http://dx.doi.org/10.24963/ijcai.2023/750 |journal=Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence |___location=California |publisher=International Joint Conferences on Artificial Intelligence Organization |doi=10.24963/ijcai.2023/750|arxiv=2303.06574 }}</ref> particularly in areas like text generation<ref>{{Cite journal |last=Han |first=Xiaochuang |last2=Kumar |first2=Sachin |last3=Tsvetkov |first3=Yulia |date=2023 |title=SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control |url=http://dx.doi.org/10.18653/v1/2023.acl-long.647 |journal=Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) |___location=Stroudsburg, PA, USA |publisher=Association for Computational Linguistics |doi=10.18653/v1/2023.acl-long.647|arxiv=2210.17432 }}</ref><ref>{{Cite journal |last=Xu |first=Weijie |last2=Hu |first2=Wenxiang |last3=Wu |first3=Fanyou |last4=Sengamedu |first4=Srinivasan |date=2023 |title=DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM |url=http://dx.doi.org/10.18653/v1/2023.findings-emnlp.606 |journal=Findings of the Association for Computational Linguistics: EMNLP 2023 |___location=Stroudsburg, PA, USA |publisher=Association for Computational Linguistics |doi=10.18653/v1/2023.findings-emnlp.606|arxiv=2310.15296 }}</ref> and 
summarization.<ref>{{Cite journal |last=Zhang |first=Haopeng |last2=Liu |first2=Xiao |last3=Zhang |first3=Jiawei |date=2023 |title=DiffuSum: Generation Enhanced Extractive Summarization with Diffusion |url=http://dx.doi.org/10.18653/v1/2023.findings-acl.828 |journal=Findings of the Association for Computational Linguistics: ACL 2023 |___location=Stroudsburg, PA, USA |publisher=Association for Computational Linguistics |doi=10.18653/v1/2023.findings-acl.828|arxiv=2305.01735 }}</ref>