Variational autoencoder
Variational autoencoders are variational Bayesian methods with a multivariate distribution as the prior, and a posterior approximated by an artificial neural network, forming the so-called variational encoder-decoder structure.<ref name=":2">An, J., & Cho, S. (2015). Variational autoencoder based anomaly detection using reconstruction probability. ''Special Lecture on IE'', ''2''(1).</ref><ref name="1bitVAE">{{cite arxiv|eprint=1911.12410|class=eess.SP|author1=Khobahi, S.|first2=M.|last2=Soltanalian|title=Model-Aware Deep Architectures for One-Bit Compressive Variational Autoencoding|date=2019}}</ref><ref>{{Cite journal|last=Kingma|first=Diederik P.|last2=Welling|first2=Max|date=2019|title=An Introduction to Variational Autoencoders|url=http://arxiv.org/abs/1906.02691|journal=Foundations and Trends® in Machine Learning|volume=12|issue=4|pages=307–392|doi=10.1561/2200000056|issn=1935-8237}}</ref>
 
A vanilla encoder is an artificial neural network that compresses its input into a bottleneck representation called the latent space. It constitutes the first half of the architecture of both the autoencoder and the variational autoencoder. In the former, the output is a fixed-length vector of artificial neurons. In the latter, the outgoing information is compressed into a probabilistic latent space, still composed of artificial neurons; however, in the variational autoencoder architecture these neurons are treated as two distinct vectors of the same dimension, representing the vector of means and the vector of standard deviations, respectively.
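The two-vector output described above can be sketched as follows. This is a minimal illustration, not a real encoder: the sizes and the single linear layer are assumptions for the example (a practical encoder would be deep and non-linear), and the network outputs log-variances, from which standard deviations are recovered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: 784-dimensional input, 2-dimensional latent space.
input_dim, latent_dim = 784, 2

# A one-layer linear "encoder" stand-in; it maps the input to TWO vectors
# of the same length: means and log-variances.
W_mu = rng.normal(scale=0.01, size=(latent_dim, input_dim))
W_logvar = rng.normal(scale=0.01, size=(latent_dim, input_dim))

x = rng.normal(size=input_dim)      # a single input example
mu = W_mu @ x                       # vector of means
log_var = W_logvar @ x              # vector of log-variances
sigma = np.exp(0.5 * log_var)       # vector of standard deviations (always positive)

print(mu.shape, sigma.shape)        # two vectors of the same dimension, distinct roles
```

Parameterizing the standard deviations through their logarithm is a common design choice: it lets the network output any real number while guaranteeing the recovered standard deviations are positive.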
 
A vanilla decoder is again an artificial neural network, designed to mirror the architecture of the encoder. It takes as input the compressed information coming from the latent space, and expands it to produce an output as close as possible to the encoder's input. While for an autoencoder the decoder input is trivially a fixed-length vector of real values, for a variational autoencoder an intermediate step is necessary. Given the probabilistic nature of the latent space, it can be regarded as a multivariate Gaussian vector. Under this assumption, and through the technique known as the reparametrization trick, it is possible to sample from this latent space and treat the samples exactly as a fixed-length vector of real values.
 
From a systemic point of view, both the vanilla autoencoder and the variational autoencoder receive as input a set of high-dimensional data, adaptively compress it into a latent space (encoding), and finally try to reconstruct it as accurately as possible (decoding). Given the nature of its latent space, the variational autoencoder is characterized by a slightly different objective function: like the vanilla autoencoder, it has to minimize a reconstruction [[loss function]], but it also takes into account the [[Kullback–Leibler divergence]] between the latent space and a standard multivariate Gaussian.
For variational autoencoders, the idea is to jointly optimize the generative model parameters <math>\theta</math>, to reduce the reconstruction error between the input and the output of the network, and the variational parameters <math>\Phi</math>, to make <math>q_\Phi(\mathbf{z|x})</math> as close as possible to <math>p_\theta(\mathbf{z}|\mathbf{x})</math>.
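The combined objective can be sketched as a single loss function. The function name and toy values below are illustrative choices, not a library API; the KL term uses the standard closed form for a diagonal-Gaussian posterior against a standard normal prior.

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var):
    """Illustrative VAE objective: reconstruction error plus the KL divergence
    from N(mu, diag(exp(log_var))) to the standard normal prior N(0, I)."""
    reconstruction = np.sum((x - x_hat) ** 2)                   # squared-error term
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)  # closed-form KL(q || N(0, I))
    return reconstruction + kl

# Toy values: a near-perfect reconstruction with q already equal to the prior
x = np.array([1.0, 0.0])
x_hat = np.array([0.9, 0.1])
mu = np.zeros(2)
log_var = np.zeros(2)

print(vae_loss(x, x_hat, mu, log_var))  # KL term is 0 here, since q is already N(0, I)
```

Gradient descent on this quantity updates both parameter sets at once: the reconstruction term pushes on <math>\theta</math> (through <code>x_hat</code>) and the KL term pushes on <math>\Phi</math> (through <code>mu</code> and <code>log_var</code>).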
 
As reconstruction loss, [[mean squared error]] and [[cross entropy]] are commonly used.
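A quick sketch of the two reconstruction losses on toy data (the function names are ours). Mean squared error suits real-valued inputs; binary cross-entropy suits inputs in <math>[0, 1]</math>, such as normalized pixel intensities.

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error over the components of the reconstruction."""
    return np.mean((x - x_hat) ** 2)

def binary_cross_entropy(x, x_hat, eps=1e-12):
    """Binary cross-entropy for targets and reconstructions in [0, 1]."""
    x_hat = np.clip(x_hat, eps, 1 - eps)  # avoid log(0)
    return -np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

x = np.array([1.0, 0.0, 1.0])      # toy target
x_hat = np.array([0.9, 0.2, 0.8])  # toy reconstruction

print(mse(x, x_hat), binary_cross_entropy(x, x_hat))
```

Both losses are zero only when the reconstruction matches the input exactly, and both grow as the reconstruction degrades, which is what the objective needs from this term.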
 
As distance loss between the two distributions, the reverse Kullback–Leibler divergence <math>D_{KL}(q_\Phi(\mathbf{z|x})||p_\theta(\mathbf{z|x}))</math> is a good choice to squeeze <math>q_\Phi(\mathbf{z|x})</math> under <math>p_\theta(\mathbf{z}|\mathbf{x})</math>.<ref name=":0" /><ref>{{cite web |title=From Autoencoder to Beta-VAE |url=https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html |website=Lil'Log |language=en |date=2018-08-12}}</ref>
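This reverse divergence cannot be evaluated directly, because the true posterior <math>p_\theta(\mathbf{z|x})</math> is intractable; minimizing it is instead achieved by maximizing the evidence lower bound, where the divergence that actually appears is taken against the prior. For a diagonal-Gaussian posterior <math>q_\Phi(\mathbf{z|x}) = \mathcal{N}(\boldsymbol\mu, \operatorname{diag}(\boldsymbol\sigma^2))</math> and a standard Gaussian prior <math>p(\mathbf{z}) = \mathcal{N}(0, I)</math>, that term has the closed form:

```latex
D_{KL}\!\left(q_\Phi(\mathbf{z|x}) \,\|\, p(\mathbf{z})\right)
  = \frac{1}{2} \sum_{i=1}^{d} \left( \sigma_i^2 + \mu_i^2 - 1 - \ln \sigma_i^2 \right)
```

Each summand is non-negative and vanishes exactly when <math>\mu_i = 0</math> and <math>\sigma_i = 1</math>, i.e. when the approximate posterior coincides with the prior along that dimension.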