Variational autoencoder: Difference between revisions

Stochastic gradient descend has nothing to do with taking expectations. Undid revision 1280088605 by G.S.Ray (talk)
m Evidence lower bound (ELBO): x|z to be consistent with z|x just below.
= \ln p_\theta(x) - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta({\cdot | x})) </math>Because the KL divergence is nonnegative, <math>L_{\theta,\phi}(x) \leq \ln p_\theta(x)</math>, with equality exactly when <math>q_\phi(\cdot|x) = p_\theta(\cdot|x)</math>. Maximizing the ELBO<math display="block">\theta^*,\phi^* = \underset{\theta,\phi}\operatorname{arg\,max} \, L_{\theta,\phi}(x) </math>amounts to simultaneously maximizing <math>\ln p_\theta(x) </math> and minimizing <math> D_{KL}(q_\phi({\cdot| x})\parallel p_\theta({\cdot| x})) </math>. That is, it increases the log-likelihood of the observed data while reducing the divergence of the approximate posterior <math>q_\phi(\cdot | x) </math> from the exact posterior <math>p_\theta(\cdot | x) </math>.
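The identity can be checked numerically in a model where every quantity has a closed form. The following Python sketch assumes a hypothetical one-dimensional linear-Gaussian model, <math>z \sim \mathcal N(0,1)</math> and <math>x|z \sim \mathcal N(z,1)</math>, for which <math>x \sim \mathcal N(0,2)</math> and the exact posterior is <math>\mathcal N(x/2, 1/2)</math>. The helper names (<code>kl_gauss</code>, <code>log_evidence</code>, <code>elbo</code>, <code>posterior_gap</code>) are illustrative and not taken from any library. For a Gaussian <math>q(z|x) = \mathcal N(m, v)</math>, it computes the ELBO as an expected log-likelihood minus a KL divergence to the prior, and compares it with <math>\ln p_\theta(x) - D_{KL}(q \parallel p_\theta(\cdot|x))</math>.

```python
import math

# Hypothetical toy model (an illustrative assumption, not part of the article):
#   prior       z ~ N(0, 1)
#   likelihood  x | z ~ N(z, 1)
# so the evidence is x ~ N(0, 2) and the exact posterior is z | x ~ N(x/2, 1/2).

LOG_2PI = math.log(2 * math.pi)


def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for 1-D Gaussians given means and variances."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def log_evidence(x):
    """ln p(x) for the toy model: the log-density of N(0, 2) at x."""
    return -0.5 * math.log(2 * math.pi * 2.0) - x ** 2 / 4.0


def elbo(x, m, v):
    """ELBO for q(z|x) = N(m, v): expected log-likelihood minus KL to the prior."""
    # E_{z~q}[ln N(x; z, 1)] = -1/2 ln(2 pi) - 1/2 ((x - m)^2 + v), in closed form
    recon = -0.5 * LOG_2PI - 0.5 * ((x - m) ** 2 + v)
    return recon - kl_gauss(m, v, 0.0, 1.0)


def posterior_gap(x, m, v):
    """KL( q(z|x) || p(z|x) ) against the exact posterior N(x/2, 1/2)."""
    return kl_gauss(m, v, x / 2.0, 0.5)


x = 1.0
for m, v in [(0.0, 1.0), (0.3, 0.2), (0.5, 0.5)]:
    lhs = elbo(x, m, v)
    rhs = log_evidence(x) - posterior_gap(x, m, v)
    print(f"m={m}, v={v}: ELBO={lhs:.6f}, ln p(x) - KL={rhs:.6f}")
```

Both columns agree for every choice of <code>(m, v)</code>, and the gap to <math>\ln p(x)</math> closes at <math>m = x/2</math>, <math>v = 1/2</math>, where <math>q</math> coincides with the exact posterior.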
 
The form given is not convenient for maximization, because both <math>\ln p_\theta(x)</math> and the exact posterior <math>p_\theta(\cdot|x)</math> are generally intractable. The following equivalent form is more convenient:<math display="block">L_{\theta,\phi}(x) = \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln p_\theta(x|z)\right] - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta(\cdot)) </math>where <math>\ln p_\theta(x|z)</math> is implemented as <math>-\frac{1}{2}\| x - D_\theta(z)\|^2_2</math>, since that is, up to an additive constant, the log-density of <math>x|z \sim \mathcal N(D_\theta(z), I)</math>. That is, the distribution of <math>x</math> conditional on <math>z</math> is modeled as a Gaussian distribution centered on <math>D_\theta(z)</math>. The approximate posterior <math>q_\phi(z |x)</math> and the prior <math>p_\theta(z)</math> are often also chosen to be Gaussian, <math>z|x \sim \mathcal N(E_\phi(x), \sigma_\phi(x)^2I)</math> and <math>z \sim \mathcal N(0, I)</math>, in which case the formula for the [[Kullback–Leibler divergence#Multivariate normal distributions|KL divergence of Gaussians]] gives:<math display="block">L_{\theta,\phi}(x) = -\frac 12\mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \|x - D_\theta(z)\|_2^2\right] - \frac 12 \left( N\sigma_\phi(x)^2 + \|E_\phi(x)\|_2^2 - 2N\ln\sigma_\phi(x) \right) + Const </math>Here <math> N </math> is the dimension of <math> z </math>, and <math>Const</math> collects the terms that depend on neither <math>\theta</math> nor <math>\phi</math>. For a more detailed derivation and further interpretations of the ELBO and its maximization, see [[Evidence lower bound|its main page]].
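A minimal NumPy sketch of this Gaussian objective, assuming the encoder outputs <math>E_\phi(x)</math> and a single shared <math>\sigma_\phi(x)</math> are already computed and the decoder <math>D_\theta</math> is an arbitrary callable. The function names and the linear decoder in the usage lines are hypothetical. The expectation is estimated by Monte Carlo sampling, and the KL term uses the closed form above; the KL function also subtracts <math>N/2</math> (a term the formula absorbs into <math>Const</math>) so that it returns the exact divergence, which is zero when <math>q</math> equals the prior.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, sigma):
    """Exact KL( N(mu, sigma^2 I) || N(0, I) ) for a mean vector mu and scalar sigma.

    Equals 1/2 (N sigma^2 + ||mu||^2 - 2 N ln sigma) - N/2: the bracketed KL term
    of the ELBO formula plus the -N/2 that the formula absorbs into Const.
    """
    n = mu.shape[-1]
    return 0.5 * (n * sigma**2 + np.sum(mu**2) - 2 * n * np.log(sigma)) - 0.5 * n


def elbo_estimate(x, decode, mu, sigma, n_samples, rng):
    """Monte Carlo ELBO estimate, omitting only the -(d/2) ln(2 pi) likelihood constant.

    mu and sigma stand in for E_phi(x) and sigma_phi(x), decode for D_theta
    (hypothetical placeholders here, not trained networks).
    """
    # Sample z ~ N(mu, sigma^2 I) by shifting and scaling standard normal noise.
    eps = rng.standard_normal((n_samples, mu.shape[-1]))
    z = mu + sigma * eps
    # The sample average of -1/2 ||x - D(z)||^2 estimates the expectation term.
    recon = -0.5 * np.mean(np.sum((x - decode(z)) ** 2, axis=-1))
    return recon - gaussian_kl_to_standard_normal(mu, sigma)


# Usage with a hypothetical linear decoder D(z) = A z (not from the article).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5], [0.0, 2.0], [1.0, -1.0]])
x = np.array([0.3, -1.0, 2.0])
mu, sigma = np.array([0.2, -0.4]), 0.7
print(elbo_estimate(x, lambda z: z @ A.T, mu, sigma, n_samples=10_000, rng=rng))
```

Because the estimate differs from <math>L_{\theta,\phi}(x)</math> only by a constant, maximizing it selects the same parameters. For a linear decoder the expectation also has a closed form, <math>\|x - A\mu\|_2^2 + \sigma^2\|A\|_F^2</math>, which is a convenient way to check the sampling estimate.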
 
== Reparameterization ==