m Reverted edits by Materialscientist (talk) to last version by EugenioTL
m fixed typos
Line 9:
Variational autoencoders are variational Bayesian methods with a multivariate distribution as prior, and a posterior approximated by an artificial neural network, forming the so-called variational encoder-decoder structure.<ref name=":2">An, J., & Cho, S. (2015). Variational autoencoder based anomaly detection using reconstruction probability. ''Special Lecture on IE'', ''2''(1).</ref><ref name="1bitVAE">{{cite arxiv|eprint=1911.12410|class=eess.SP|author1=Khobahi, S.|first2=M.|last2=Soltanalian|title=Model-Aware Deep Architectures for One-Bit Compressive Variational Autoencoding|date=2019}}</ref><ref>{{Cite journal|last=Kingma|first=Diederik P.|last2=Welling|first2=Max|date=2019|title=An Introduction to Variational Autoencoders|url=http://arxiv.org/abs/1906.02691|journal=Foundations and Trends® in Machine Learning|volume=12|issue=4|pages=307–392|doi=10.1561/2200000056|issn=1935-8237}}</ref>
A vanilla encoder is an artificial neural network that compresses its input into a bottleneck representation known as the latent space. It constitutes the first half of the architecture of both the vanilla autoencoder and the variational autoencoder. For the former, the output is a fixed, deterministic code vector; for the latter, the outgoing information is compressed into a probabilistic latent space, typically described by a mean and a variance rather than by a single point.
A vanilla decoder is again an artificial neural network; it constitutes the second half of the architecture and maps a point of the latent space back to a reconstruction of the original input.
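The following is a minimal sketch, in PyTorch, of how such an encoder-decoder pair might be implemented; the framework, the layer sizes and the activation functions are illustrative assumptions rather than part of the description above. A vanilla encoder would emit a single deterministic code vector, whereas the variational encoder below emits the mean and log-variance of the probabilistic latent space.

<syntaxhighlight lang="python">
import torch
from torch import nn

class VariationalEncoder(nn.Module):
    """Compresses the input into the parameters of a Gaussian latent distribution."""
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=20):  # illustrative sizes
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.mean = nn.Linear(hidden_dim, latent_dim)     # mean of q(z|x)
        self.log_var = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        # A vanilla (non-variational) encoder would return a single code vector here.
        return self.mean(h), self.log_var(h)

class Decoder(nn.Module):
    """Maps a point of the latent space back to a reconstruction of the input."""
    def __init__(self, latent_dim=20, hidden_dim=256, output_dim=784):
        super().__init__()
        self.hidden = nn.Linear(latent_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, z):
        h = torch.relu(self.hidden(z))
        return torch.sigmoid(self.out(h))  # reconstruction with values in [0, 1]
</syntaxhighlight>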
From an architectural point of view, both the vanilla autoencoder and the variational autoencoder receive as input a set of high-dimensional data, adaptively compress it into a latent space (encoding), and finally try to reconstruct it as accurately as possible (decoding). Given the nature of its latent space, the variational autoencoder is characterized by a slightly different objective function: like the vanilla autoencoder, it has to minimize a reconstruction [[loss function]], but it also takes into account the [[Kullback–Leibler divergence]] between the distribution over the latent space and a standard multivariate Gaussian.
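Assuming, as is common, a diagonal Gaussian approximate posterior <math>q_\Phi(\mathbf{z|x}) = \mathcal{N}(\boldsymbol\mu, \operatorname{diag}(\boldsymbol\sigma^2))</math>, a standard Gaussian prior and, for instance, a squared-error reconstruction term, this objective can be written as

<math display="block">\mathcal{L}(\theta, \Phi; \mathbf{x}) = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 + D_{KL}\big(q_\Phi(\mathbf{z|x})\,||\,\mathcal{N}(\mathbf{0}, \mathbf{I})\big), \qquad D_{KL}\big(q_\Phi(\mathbf{z|x})\,||\,\mathcal{N}(\mathbf{0}, \mathbf{I})\big) = \frac{1}{2}\sum_{j=1}^{d}\left(\mu_j^2 + \sigma_j^2 - \log\sigma_j^2 - 1\right),</math>

where <math>\hat{\mathbf{x}}</math> is the reconstruction and <math>d</math> is the dimension of the latent space; the second, closed-form term is the regularizer that keeps the probabilistic latent space close to a vector of standard normal distributions.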
Line 50:
For variational autoencoders, the idea is to jointly optimize the generative model parameters <math>\theta</math>, so as to reduce the reconstruction error between the input and the output of the network, and the variational parameters <math>\Phi</math>, so as to make <math>q_\Phi(\mathbf{z|x})</math> as close as possible to <math>p_\theta(\mathbf{z}|\mathbf{x})</math>.
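These two goals are linked by a standard identity of variational inference (see the Kingma–Welling reference above): for any choice of <math>q_\Phi</math>,

<math display="block">\log p_\theta(\mathbf{x}) = \underbrace{\mathbb{E}_{q_\Phi(\mathbf{z|x})}\!\left[\log p_\theta(\mathbf{x|z})\right] - D_{KL}\big(q_\Phi(\mathbf{z|x})\,||\,p_\theta(\mathbf{z})\big)}_{\text{evidence lower bound (ELBO)}} + D_{KL}\big(q_\Phi(\mathbf{z|x})\,||\,p_\theta(\mathbf{z|x})\big),</math>

so that, since the left-hand side does not depend on <math>\Phi</math>, maximizing the tractable lower bound simultaneously improves the reconstruction term and shrinks the divergence between <math>q_\Phi(\mathbf{z|x})</math> and the intractable posterior <math>p_\theta(\mathbf{z}|\mathbf{x})</math>.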
As reconstruction loss, [[mean squared error]] and [[cross entropy]] are common choices.
As distance loss between the two distributions, the reverse Kullback–Leibler divergence <math>D_{KL}(q_\Phi(\mathbf{z|x})||p_\theta(\mathbf{z|x}))</math> is a good choice, because minimizing it squeezes <math>q_\Phi(\mathbf{z|x})</math> under <math>p_\theta(\mathbf{z}|\mathbf{x})</math>, i.e. it forces the approximate posterior to place negligible mass wherever the true posterior has negligible mass.<ref name=":0" /><ref>{{cite web |title=From Autoencoder to Beta-VAE |url=https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html |website=Lil'Log |language=en |date=2018-08-12}}</ref>
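A possible joint training step, reusing the modules sketched earlier and combining a cross-entropy reconstruction loss with the closed-form Kullback–Leibler term against the standard Gaussian prior (as in the objective above), might look as follows; the optimizer, the learning rate and the use of the reparameterization trick of Kingma and Welling to backpropagate through the sampling step are implementation choices, not prescribed by the text above.

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

# Reuses the VariationalEncoder and Decoder classes sketched earlier; the encoder
# parameters play the role of Phi, the decoder parameters the role of theta.
encoder, decoder = VariationalEncoder(), Decoder()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3  # illustrative
)

def training_step(x):
    """One joint update of Phi and theta for a batch x of shape (batch, 784) in [0, 1]."""
    mean, log_var = encoder(x)

    # Reparameterization trick: z = mean + sigma * eps with eps ~ N(0, I),
    # so gradients can flow back into the encoder through the sampling step.
    eps = torch.randn_like(mean)
    z = mean + torch.exp(0.5 * log_var) * eps
    x_hat = decoder(z)

    # Reconstruction term (cross entropy here; mean squared error is another option).
    reconstruction = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # Closed-form KL divergence between the diagonal Gaussian q_Phi(z|x)
    # and the standard Gaussian prior.
    kl = 0.5 * torch.sum(mean.pow(2) + log_var.exp() - log_var - 1)

    loss = reconstruction + kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
</syntaxhighlight>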