Variational autoencoder

In [[machine learning]], a '''variational autoencoder (VAE)'''<ref name=":0">{{cite arXiv |last1=Kingma |first1=Diederik P. |last2=Welling |first2=Max |title=Auto-Encoding Variational Bayes |date=2014-05-01 |class=stat.ML |eprint=1312.6114}}</ref> is an [[artificial neural network]] architecture introduced by Diederik P. Kingma and [[Max Welling]], belonging to the families of [[graphical model|probabilistic graphical models]] and [[variational Bayesian methods]].
 
Variational autoencoders are often associated with the [[autoencoder]] model because of their architectural affinity, but they differ significantly in goal and mathematical formulation. Variational autoencoders allow ''statistical inference'' problems (such as inferring the value of one [[random variable]] from another random variable) to be rewritten as ''statistical optimization'' problems (i.e., finding the parameter values that minimize some objective function).<ref>{{cite journal |last1=Kramer |first1=Mark A. |title=Nonlinear principal component analysis using autoassociative neural networks |journal=AIChE Journal |date=1991 |volume=37 |issue=2 |pages=233–243 |doi=10.1002/aic.690370209 |url=https://aiche.onlinelibrary.wiley.com/doi/abs/10.1002/aic.690370209 |language=en}}</ref><ref>{{cite journal |last1=Hinton |first1=G. E. |last2=Salakhutdinov |first2=R. R. |title=Reducing the Dimensionality of Data with Neural Networks |journal=Science |date=2006-07-28 |volume=313 |issue=5786 |pages=504–507 |doi=10.1126/science.1127647 |pmid=16873662 |bibcode=2006Sci...313..504H |s2cid=1658773 |url=https://www.science.org/doi/abs/10.1126/science.1127647?casa_token=ZLsQ9vPfFA4AAAAA%3A3iBJRtRFr9RzkbbGpAJQtghIAndmRGEPVxW-yixDgfiXqWuuaQs8WjDMf-fkzTIe8RKn_J9o1aFozD4 |language=en}}</ref><ref>{{cite web |title=A Beginner's Guide to Variational Methods: Mean-Field Approximation |url=https://blog.evjang.com/2016/08/variational-bayes.html |website=Eric Jang |language=en |date=2016-07-08}}</ref> They map the input variable to a multivariate latent distribution. Although this type of model was initially designed for [[unsupervised learning]],<ref>{{cite arXiv |last1=Dilokthanakul |first1=Nat |last2=Mediano |first2=Pedro A. M. |last3=Garnelo |first3=Marta |last4=Lee |first4=Matthew C. H. |last5=Salimbeni |first5=Hugh |last6=Arulkumaran |first6=Kai |last7=Shanahan |first7=Murray |title=Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders |date=2017-01-13 |class=cs.LG |eprint=1611.02648}}</ref><ref>{{cite book |last1=Hsu |first1=Wei-Ning |last2=Zhang |first2=Yu |last3=Glass |first3=James |title=2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |chapter=Unsupervised ___domain adaptation for robust speech recognition via variational autoencoder-based data augmentation |date=December 2017 |pages=16–23 |doi=10.1109/ASRU.2017.8268911 |arxiv=1707.06265 |isbn=978-1-5090-4788-8 |s2cid=22681625 |chapter-url=https://ieeexplore.ieee.org/abstract/document/8268911?casa_token=i8S9DzueB5gAAAAA:SnZUh5mfUYtRpusQLMJxN7eC_-6-qOQs9vpkEcA0Ai_ju-nJH7o1H1DN6nDFdeCY-LgGg3OVKQ}}</ref> its effectiveness has been proven for [[semi-supervised learning]]<ref>{{cite book |last1=Ehsan Abbasnejad |first1=M. 
|last2=Dick |first2=Anthony |last3=van den Hengel |first3=Anton |title=Infinite Variational Autoencoder for Semi-Supervised Learning |date=2017 |pages=5888–5897 |url=https://openaccess.thecvf.com/content_cvpr_2017/html/Abbasnejad_Infinite_Variational_Autoencoder_CVPR_2017_paper.html}}</ref><ref>{{cite journal |last1=Xu |first1=Weidi |last2=Sun |first2=Haoze |last3=Deng |first3=Chao |last4=Tan |first4=Ying |title=Variational Autoencoder for Semi-Supervised Text Classification |journal=Proceedings of the AAAI Conference on Artificial Intelligence |date=2017-02-12 |volume=31 |issue=1 |url=https://ojs.aaai.org/index.php/AAAI/article/view/10966 |language=en}}</ref> and [[supervised learning]].<ref>{{cite journal |last1=Kameoka |first1=Hirokazu |last2=Li |first2=Li |last3=Inoue |first3=Shota |last4=Makino |first4=Shoji |title=Supervised Determined Source Separation with Multichannel Variational Autoencoder |journal=Neural Computation |date=2019-09-01 |volume=31 |issue=9 |pages=1891–1914 |doi=10.1162/neco_a_01217 |pmid=31335290 |s2cid=198168155 |url=https://direct.mit.edu/neco/article/31/9/1891/8494/Supervised-Determined-Source-Separation-with}}</ref>
 
== Architecture ==
Expanding the [[Kullback–Leibler divergence]] between the approximate posterior <math>q_\phi(\cdot|x)</math> and the true posterior <math>p_\theta(\cdot|x)</math>:<math display="block">\begin{align}
D_{KL}(q_\phi({\cdot| x})\parallel p_\theta(\cdot | x)) &= \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{q_\phi({z| x})}{p_\theta(z | x)}\right]\\
&=\mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{q_\phi({z| x})\, p_\theta(x)}{p_\theta(x, z)}\right] \\
&=\ln p_\theta(x) + \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{q_\phi({z| x})}{p_\theta(x, z)}\right]
\end{align}</math>
 
 
Now define the [[evidence lower bound]] (ELBO):<math display="block">L_{\theta,\phi}(x) := \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right] = \ln p_\theta(x) - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta({\cdot | x})).</math>
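Since the Kullback–Leibler divergence is nonnegative, this identity can be rearranged into a bound on the log-evidence (a standard observation, stated here to make the purpose of the ELBO explicit):
<math display="block">\ln p_\theta(x) = L_{\theta,\phi}(x) + D_{KL}(q_\phi({\cdot| x})\parallel p_\theta({\cdot | x})) \geq L_{\theta,\phi}(x),</math>
so maximizing <math>L_{\theta,\phi}(x)</math> over <math>\theta</math> and <math>\phi</math> simultaneously pushes up a lower bound on <math>\ln p_\theta(x)</math> and drives the approximate posterior <math>q_\phi(\cdot|x)</math> toward the true posterior <math>p_\theta(\cdot|x)</math>.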
== Reparameterization ==
[[File:Reparameterization Trick.png|thumb|300x300px|Scheme of the reparameterization trick. The random variable <math>{\varepsilon}</math> is injected into the latent space <math>z</math> as external input. In this way, it is possible to backpropagate the gradient without involving the stochastic variable in the update.]]
To efficiently search for <math display="block">\theta^*,\phi^* = \underset{\theta,\phi}\operatorname{arg max} \, L_{\theta,\phi}(x) </math>the typical method is [[gradient descent]].
 
It is straightforward to see that<math display="block">\nabla_\theta \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right]
= \mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \nabla_\theta \ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right] </math>However, in<math display="block">\nabla_\phi \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right] </math>the gradient <math>\nabla_\phi </math> cannot be moved inside the expectation, since <math>\phi </math> appears in the probability distribution itself. The '''reparameterization trick''' (also known as stochastic backpropagation<ref>{{Cite journal |last1=Rezende |first1=Danilo Jimenez |last2=Mohamed |first2=Shakir |last3=Wierstra |first3=Daan |date=2014-06-18 |title=Stochastic Backpropagation and Approximate Inference in Deep Generative Models |url=https://proceedings.mlr.press/v32/rezende14.html |journal=International Conference on Machine Learning |language=en |publisher=PMLR |pages=1278–1286}}</ref>) bypasses this difficulty.<ref name=":0" /><ref>{{Cite journal|last1=Bengio|first1=Yoshua|last2=Courville|first2=Aaron|last3=Vincent|first3=Pascal|title=Representation Learning: A Review and New Perspectives|url=https://ieeexplore.ieee.org/abstract/document/6472238?casa_token=wQPK9gUGfCsAAAAA:FS5uNYCQVJGH-bq-kVvZeTdnQ8a33C6qQ4VUyDyGLMO13QewH3wcry9_Jh-5FATvspBj8YOXfw|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2013|volume=35|issue=8|pages=1798–1828|doi=10.1109/TPAMI.2013.50|pmid=23787338|issn=1939-3539|arxiv=1206.5538|s2cid=393948}}</ref><ref>{{Cite arXiv|last1=Kingma|first1=Diederik P.|last2=Rezende|first2=Danilo J.|last3=Mohamed|first3=Shakir|last4=Welling|first4=Max|date=2014-10-31|title=Semi-Supervised Learning with Deep Generative Models|class=cs.LG|eprint=1406.5298}}</ref>
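In general terms (the map <math>T_\phi</math> and noise density <math>p(\varepsilon)</math> below are generic notation introduced here for illustration, not taken from the cited papers), the trick expresses the latent variable as a deterministic, <math>\phi</math>-differentiable function of <math>x</math> and an auxiliary noise variable whose distribution does not depend on <math>\phi</math>:
<math display="block">z = T_\phi(x, \varepsilon), \quad \varepsilon \sim p(\varepsilon)
\quad\Longrightarrow\quad
\mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right]
= \mathbb E_{\varepsilon \sim p} \left[\ln \frac{p_\theta(x, T_\phi(x, \varepsilon))}{q_\phi({T_\phi(x, \varepsilon)| x})}\right],</math>
so that <math>\nabla_\phi</math> can now be moved inside the expectation on the right-hand side and estimated by Monte Carlo sampling of <math>\varepsilon</math>.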
 
 
The most important example is when <math>z \sim q_\phi(\cdot | x) </math> is normally distributed, as <math>\mathcal N(\mu_\phi(x), \Sigma_\phi(x)) </math>.
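The following minimal PyTorch sketch illustrates this Gaussian case with a diagonal covariance (the layer sizes, variable names, and toy data batch are illustrative assumptions, and a standard normal prior over <math>z</math> is assumed, as is common; this is not code from the cited papers). The noise <math>\varepsilon</math> is drawn independently of <math>\phi</math>, so gradients of the one-sample ELBO estimate flow through <math>z</math> back to the encoder parameters:

<syntaxhighlight lang="python">
# Minimal sketch of the Gaussian reparameterization trick (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=256, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)        # mean of q_phi(z|x)
        self.log_var = nn.Linear(h_dim, z_dim)   # log of the diagonal of Sigma_phi(x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, log_var = self.mu(h), self.log_var(h)
        eps = torch.randn_like(mu)               # external noise, independent of phi
        z = mu + torch.exp(0.5 * log_var) * eps  # reparameterized sample, differentiable in phi
        x_logits = self.dec(z)
        # Negative one-sample ELBO: reconstruction term plus the closed-form
        # KL divergence between q_phi(z|x) and a standard normal prior.
        rec = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
        return (rec + kl) / x.shape[0]

model = ToyVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)                          # stand-in data batch in [0, 1]
loss = model(x)                                  # negative ELBO estimate
loss.backward()                                  # gradient reaches mu/log_var through z
opt.step()
</syntaxhighlight>

Had the sample been drawn directly from <math>\mathcal N(\mu_\phi(x), \Sigma_\phi(x))</math> inside the computation graph, the sampling step would have blocked the gradient with respect to <math>\phi</math>; expressing it as <math>\mu_\phi(x) + \sigma_\phi(x) \odot \varepsilon</math> is what makes backpropagation through the sampling step possible.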