Variational autoencoder

== Formulation ==
[[File:VAE Basic.png|thumb|425x425px|The basic scheme of a variational autoencoder. The model receives <math>\mathbf{x}</math> as input. The encoder compresses it into the latent space. The decoder receives a sample from the latent space as input and produces <math>\mathbf{x'}</math>, as similar as possible to <math>\mathbf{x}</math>.]]
From a formal perspective, given an input dataset <math>\mathbf{x}</math> characterized by an unknown probability distribution <math>P(\mathbf{x})</math>, the objective is to model or approximate the data's true distribution <math>P</math> using a parametrized distribution <math>p_\theta</math> having parameters <math>\theta</math>. Let <math>\mathbf{z}</math> be a random vector jointly distributed with <math>\mathbf{x}</math>. Conceptually, <math>\mathbf{z}</math> will represent a latent encoding of <math>\mathbf{x}</math>. [[Marginal distribution|Marginalizing]] over <math>\mathbf{z}</math> gives
 
: <math>p_\theta(\mathbf{x}) = \int_{\mathbf{z}}p_\theta(\mathbf{x,z}) \, d\mathbf{z}, </math>

where <math>p_\theta(\mathbf{x,z})</math> is the [[joint distribution]] under <math>p_\theta</math> of the observable data <math>\mathbf{x}</math> and its latent representation or encoding <math>\mathbf{z}</math>, and <math>p_\theta(\mathbf{x})</math> is the [[Model evidence|evidence]] of the model. According to the [[Chain rule (probability)|chain rule]], the equation can be rewritten as
 
: <math>p_\theta(\mathbf{x}) = \int_{\mathbf{z}}p_\theta(\mathbf{x}\mid\mathbf{z})p_\theta(\mathbf{z}) \, d\mathbf{z}.</math>
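
To make the marginalization concrete, the evidence <math>p_\theta(\mathbf{x})</math> can be estimated by naive Monte Carlo: sample <math>\mathbf{z}</math> from the prior and average the conditional densities <math>p_\theta(\mathbf{x}\mid\mathbf{z})</math>. The following sketch assumes a standard normal prior, a Gaussian decoder with fixed variance, and a hypothetical <code>decoder_mean</code> function standing in for the decoder network; in practice this estimator has high variance, which is one motivation for the variational approach.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def decoder_mean(z):
    # Hypothetical stand-in for the decoder network: a fixed linear map.
    W = np.array([[1.0, 0.5], [-0.5, 1.0]])
    return z @ W.T

def log_gaussian(x, mean, var):
    # Log-density of an isotropic Gaussian N(mean, var * I) evaluated at x.
    d = x.shape[-1]
    return -0.5 * (d * np.log(2 * np.pi * var)
                   + np.sum((x - mean) ** 2, axis=-1) / var)

def estimate_log_evidence(x, n_samples=100_000, decoder_var=0.1):
    # p_theta(x) = E_{z ~ p(z)}[p_theta(x | z)], estimated by sampling z
    # from the prior and averaging the conditional densities.
    z = rng.standard_normal((n_samples, 2))            # z ~ N(0, I)
    log_px_given_z = log_gaussian(x, decoder_mean(z), decoder_var)
    m = log_px_given_z.max()                           # log-mean-exp trick
    return m + np.log(np.mean(np.exp(log_px_given_z - m)))

x = np.array([0.3, -0.2])
print("estimated log p(x):", estimate_log_evidence(x))
</syntaxhighlight>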
 
In the vanilla variational autoencoder, <math>\mathbf{z}</math> is usually taken to be a finite-dimensional vector of real numbers, and <math>p_\theta(\mathbf{x}\mid\mathbf{z})</math> to be a [[Gaussian distribution]]. Then <math>p_\theta(\mathbf{x})</math> is a mixture of Gaussian distributions.
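
Under these assumptions, sampling from the model is ancestral: draw <math>\mathbf{z}</math> from the prior, then draw <math>\mathbf{x}</math> from the Gaussian <math>p_\theta(\mathbf{x}\mid\mathbf{z})</math>, so each <math>\mathbf{z}</math> indexes one Gaussian component of the mixture. A minimal sketch, reusing the hypothetical <code>decoder_mean</code> stand-in from above:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

def decoder_mean(z):
    # Same hypothetical linear stand-in for the decoder network as above.
    W = np.array([[1.0, 0.5], [-0.5, 1.0]])
    return z @ W.T

def sample_x(n_samples=5, decoder_var=0.1):
    # Ancestral sampling: z ~ p(z), then x ~ N(decoder_mean(z), decoder_var * I).
    z = rng.standard_normal((n_samples, 2))  # each z selects a mixture component
    mean = decoder_mean(z)
    return mean + np.sqrt(decoder_var) * rng.standard_normal(mean.shape)

print(sample_x())
</syntaxhighlight>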
 
It is now possible to define the set of relationships between the input data and its latent representation as