== Architecture ==
A variational autoencoder is a generative model with a prior distribution over the latent variables and a noise distribution over the data. Models of this kind (e.g. probabilistic PCA, (spike & slab) sparse coding) are usually trained with the expectation–maximization meta-algorithm. Such a scheme optimizes a lower bound of the data likelihood, which is itself usually intractable, and in doing so requires the discovery of q-distributions, or variational posteriors. These q-distributions are normally parameterized separately for each individual data point in its own optimization process. Variational autoencoders instead use a neural network as an amortized approach: a single network, optimized jointly across all data points, takes a data point as input and outputs the parameters of its variational distribution. Because it maps from the known input space to the low-dimensional latent space, it is called the encoder.
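The amortization idea above can be sketched in a few lines. The following is a minimal illustration, not a real encoder: the dimensions (784-dimensional inputs, 2-dimensional latents) and the single affine layer are arbitrary assumptions, and a practical encoder would be a deeper network. The point is that one shared set of weights maps any data point to the parameters (mean and log-variance) of its diagonal-Gaussian q-distribution, rather than fitting those parameters per data point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 784-dimensional inputs, 2-dimensional latent space.
x_dim, z_dim = 784, 2

# A single affine layer stands in for the encoder network. Real encoders are
# deeper, but the amortization idea is the same: one shared set of weights
# maps ANY data point to the parameters of its variational distribution.
W_mu = rng.normal(0.0, 0.01, (z_dim, x_dim))
W_logvar = rng.normal(0.0, 0.01, (z_dim, x_dim))

def encode(x):
    """Map a data point to the (mean, log-variance) of a diagonal Gaussian q(z|x)."""
    return W_mu @ x, W_logvar @ x

x = rng.normal(size=x_dim)      # a stand-in data point
mu, log_var = encode(x)         # both have shape (z_dim,)
```

Log-variance (rather than variance) is the usual output parameterization, since exponentiating it guarantees a positive variance without constraining the network.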
The decoder is the second neural network of this model. It is a function that maps from the latent space to the input space, e.g. producing the means of the noise distribution. It is possible to use another neural network that maps to the variance; however, this can be omitted for simplicity, in which case the variance can be optimized directly with gradient descent.
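A matching decoder sketch, under the same illustrative assumptions (an arbitrary single affine layer and arbitrary dimensions): its output is read as the mean of the noise distribution p(x|z), and the variance is kept as a separate shared parameter that could be optimized by gradient descent, as described above.

```python
import numpy as np

rng = np.random.default_rng(1)
z_dim, x_dim = 2, 784

# A single affine layer stands in for the decoder network (real decoders are
# deeper). Its output is interpreted as the mean of the noise distribution.
W_dec = rng.normal(0.0, 0.01, (x_dim, z_dim))

# Instead of a second network for the variance, a single shared log-variance
# vector can be treated as a free parameter and optimized by gradient descent.
log_var_x = np.zeros(x_dim)

def decode(z):
    """Map a latent code to the mean of the noise distribution p(x|z)."""
    return W_dec @ z

z = rng.normal(size=z_dim)      # a latent code, e.g. drawn from the prior
x_mean = decode(z)              # shape (x_dim,)
```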
To optimize this model, one needs to compute two terms: the "reconstruction error" and the [[Kullback–Leibler divergence]]. Both terms are derived from the free-energy expression of the probabilistic model, and therefore differ depending on the noise distribution and the assumed prior of the data. The KL term from the free-energy expression concentrates the probability mass of the q-distribution where it overlaps with the p-distribution, which unfortunately can result in mode-seeking behaviour. The "reconstruction" term is the remainder of the free-energy expression, and requires a sampling approximation to compute its expectation value.
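For the common special case of a standard-normal prior, a diagonal-Gaussian q-distribution, and unit-variance Gaussian noise, the two terms above can be written down concretely. This is a sketch under those stated assumptions: the KL divergence has a closed form, while the reconstruction term is estimated by sampling, with latents drawn via the reparameterization z = mu + sigma * eps so the sampling step stays differentiable.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def neg_elbo(x, mu, log_var, decode, n_samples=8, rng=None):
    """Negative free energy: sampled reconstruction error plus exact KL term.

    Assumes unit-variance Gaussian noise, so the reconstruction error is a
    squared error between the data point and the decoded mean.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    recon = 0.0
    for _ in range(n_samples):
        eps = rng.normal(size=mu.shape)
        z = mu + np.exp(0.5 * log_var) * eps       # reparameterization trick
        recon += 0.5 * np.sum((x - decode(z))**2)  # Gaussian reconstruction error
    return recon / n_samples + gaussian_kl(mu, log_var)

# Toy usage with an identity "decoder" in a 2-dimensional space:
x = np.array([1.0, -1.0])
loss = neg_elbo(x, mu=np.zeros(2), log_var=np.zeros(2), decode=lambda z: z)
```

When q equals the prior (zero mean, unit variance), the KL term vanishes and only the sampled reconstruction error remains.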
== Formulation ==