= \ln p_\theta(x) - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta({\cdot | x})) </math>Maximizing the ELBO<math display="block">\theta^*,\phi^* = \underset{\theta,\phi}{\operatorname{arg\,max}} \, L_{\theta,\phi}(x) </math>is equivalent to simultaneously maximizing <math>\ln p_\theta(x) </math> and minimizing <math> D_{KL}(q_\phi({z| x})\parallel p_\theta({z| x})) </math>. That is, maximizing the log-likelihood of the observed data, and minimizing the divergence of the approximate posterior <math>q_\phi(\cdot | x) </math> from the exact posterior <math>p_\theta(\cdot | x) </math>.
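The identity above can be checked numerically. The following sketch (not from the article; the two-state latent variable and all probability values are hypothetical, chosen only for illustration) verifies that the ELBO equals <math>\ln p_\theta(x) - D_{KL}(q_\phi(\cdot|x)\parallel p_\theta(\cdot|x))</math> for a tiny discrete model:

```python
import numpy as np

# Hypothetical two-state latent variable z; values chosen arbitrarily.
p_z = np.array([0.4, 0.6])          # prior p(z)
p_x_given_z = np.array([0.7, 0.2])  # likelihood p(x|z) for one fixed observation x
q_z_given_x = np.array([0.5, 0.5])  # an arbitrary approximate posterior q(z|x)

p_xz = p_z * p_x_given_z            # joint p(x, z)
p_x = p_xz.sum()                    # evidence p(x)
posterior = p_xz / p_x              # exact posterior p(z|x)

# ELBO = E_q[ln p(x,z) - ln q(z|x)]
elbo = np.sum(q_z_given_x * (np.log(p_xz) - np.log(q_z_given_x)))
# KL(q(z|x) || p(z|x))
kl = np.sum(q_z_given_x * (np.log(q_z_given_x) - np.log(posterior)))

# The decomposition from the text: ELBO = ln p(x) - KL
assert np.isclose(elbo, np.log(p_x) - kl)
```

Because the KL divergence is nonnegative, the ELBO is indeed a lower bound on <math>\ln p_\theta(x)</math>, with equality exactly when <math>q_\phi(\cdot|x)</math> matches the exact posterior.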
The form given is not very convenient for maximization, but the following equivalent form is:<math display="block">L_{\theta,\phi}(x) = \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln p_\theta(x|z)\right] - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta(\cdot)) </math>where <math>\ln p_\theta(x|z)</math> is implemented as <math>-\frac 12\| x - D_\theta(z)\|^2_2</math>, since that is, up to an additive constant, what <math>x \sim \mathcal N(D_\theta(z), I)</math> yields. That is, we model the distribution of <math>x</math> conditional on <math>z</math> as a Gaussian distribution centered on <math>D_\theta(z)</math>. The distributions <math>q_\phi(z |x)</math> and <math>p_\theta(z)</math> are often also chosen to be Gaussians, as <math>z|x \sim \mathcal N(E_\phi(x), \sigma_\phi(x)^2I)</math> and <math>z \sim \mathcal N(0, I)</math>, with which we obtain, by the formula for [[Kullback–Leibler divergence#Multivariate normal distributions|KL divergence of Gaussians]]:<math display="block">L_{\theta,\phi}(x) = -\frac 12\mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \|x - D_\theta(z)\|_2^2\right] - \frac 12 \left( N\sigma_\phi(x)^2 + \|E_\phi(x)\|_2^2 - 2N\ln\sigma_\phi(x) \right) + Const</math>where <math>N</math> is the dimension of <math>z</math>.
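A minimal numeric sketch of this loss, assuming linear maps as stand-ins for the encoder <math>E_\phi</math> and decoder <math>D_\theta</math> (real VAEs use neural networks; the weights and dimensions here are arbitrary, and <math>\sigma_\phi(x)</math> is held fixed for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear encoder/decoder standing in for E_phi and D_theta.
d_x, d_z = 4, 2
W_enc = rng.normal(size=(d_z, d_x))
W_dec = rng.normal(size=(d_x, d_z))

def elbo_estimate(x):
    mu = W_enc @ x                       # E_phi(x)
    log_sigma = -1.0                     # fixed sigma_phi(x) = e^{-1}, for simplicity
    sigma = np.exp(log_sigma)
    z = mu + sigma * rng.normal(size=d_z)            # one sample z ~ q_phi(. | x)
    recon = -0.5 * np.sum((x - W_dec @ z) ** 2)      # Monte Carlo estimate of E[ln p(x|z)]
    # Closed-form KL(N(mu, sigma^2 I) || N(0, I)) in d_z dimensions:
    # (1/2)(N sigma^2 + ||mu||^2 - 2 N ln sigma) minus the constant N/2
    kl = 0.5 * (d_z * sigma**2 + np.sum(mu**2) - d_z - 2 * d_z * log_sigma)
    return recon - kl

x = rng.normal(size=d_x)
loss = -elbo_estimate(x)  # gradient ascent on the ELBO = descent on this loss
```

Note that the reconstruction term is estimated by sampling a single <math>z</math>, while the KL term is computed exactly from its closed form; the KL term vanishes precisely when <math>\mu = 0</math> and <math>\sigma = 1</math>, i.e. when the approximate posterior equals the prior.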
== Reparameterization ==