= \ln p_\theta(x) - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta({\cdot | x})) </math>Maximizing the ELBO<math display="block">\theta^*,\phi^* = \underset{\theta,\phi}{\operatorname{arg\,max}} \, L_{\theta,\phi}(x) </math>is equivalent to simultaneously maximizing <math>\ln p_\theta(x) </math> and minimizing <math> D_{KL}(q_\phi({z| x})\parallel p_\theta({z| x})) </math>. That is, maximizing the log-likelihood of the observed data, and minimizing the divergence of the approximate posterior <math>q_\phi(\cdot | x) </math> from the exact posterior <math>p_\theta(\cdot | x) </math>.
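The identity above can be checked numerically. The following sketch (not from the article; the two-state latent variable and all probability values are hypothetical, chosen only for illustration) verifies that the ELBO equals <math>\ln p_\theta(x) - D_{KL}(q_\phi(\cdot|x)\parallel p_\theta(\cdot|x))</math> for a tiny discrete model:

```python
import numpy as np

# Hypothetical two-state latent variable z; values chosen arbitrarily.
p_z = np.array([0.4, 0.6])          # prior p(z)
p_x_given_z = np.array([0.7, 0.2])  # likelihood p(x|z) for one fixed observation x
q_z_given_x = np.array([0.5, 0.5])  # an arbitrary approximate posterior q(z|x)

p_xz = p_z * p_x_given_z            # joint p(x, z)
p_x = p_xz.sum()                    # evidence p(x)
posterior = p_xz / p_x              # exact posterior p(z|x)

# ELBO = E_q[ln p(x,z) - ln q(z|x)]
elbo = np.sum(q_z_given_x * (np.log(p_xz) - np.log(q_z_given_x)))
# KL(q(z|x) || p(z|x))
kl = np.sum(q_z_given_x * (np.log(q_z_given_x) - np.log(posterior)))

# The decomposition from the text: ELBO = ln p(x) - KL
assert np.isclose(elbo, np.log(p_x) - kl)
```

Because the KL divergence is nonnegative, the ELBO is indeed a lower bound on <math>\ln p_\theta(x)</math>, with equality exactly when <math>q_\phi(\cdot|x)</math> matches the exact posterior.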
The form given is not very convenient for maximization, but the following equivalent form is:<math display="block">L_{\theta,\phi}(x) = \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln p_\theta(x|z)\right] - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta(\cdot)) </math>where <math>\ln p_\theta(x|z)</math> is implemented as <math>-\frac 12\| x - D_\theta(z)\|^2_2</math>, since that is, up to an additive constant, what <math>x \sim \mathcal N(D_\theta(z), I)</math> yields. That is, we model the distribution of <math>x</math> conditional on <math>z</math> as a Gaussian distribution centered on <math>D_\theta(z)</math>. The distributions <math>q_\phi(z |x)</math> and <math>p_\theta(z)</math> are often also chosen to be Gaussians, as <math>z|x \sim \mathcal N(E_\phi(x), \sigma_\phi(x)^2I)</math> and <math>z \sim \mathcal N(0, I)</math>, with which we obtain, by the formula for [[Kullback–Leibler divergence#Multivariate normal distributions|KL divergence of Gaussians]]:<math display="block">L_{\theta,\phi}(x) = -\frac 12\mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \|x - D_\theta(z)\|_2^2\right] - \frac 12 \left( N\sigma_\phi(x)^2 + \|E_\phi(x)\|_2^2 - 2N\ln\sigma_\phi(x) \right) + Const</math>where <math>N</math> is the dimension of <math>z</math>.
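A minimal numeric sketch of this loss, assuming linear maps as stand-ins for the encoder <math>E_\phi</math> and decoder <math>D_\theta</math> (real VAEs use neural networks; the weights and dimensions here are arbitrary, and <math>\sigma_\phi(x)</math> is held fixed for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear encoder/decoder standing in for E_phi and D_theta.
d_x, d_z = 4, 2
W_enc = rng.normal(size=(d_z, d_x))
W_dec = rng.normal(size=(d_x, d_z))

def elbo_estimate(x):
    mu = W_enc @ x                       # E_phi(x)
    log_sigma = -1.0                     # fixed sigma_phi(x) = e^{-1}, for simplicity
    sigma = np.exp(log_sigma)
    z = mu + sigma * rng.normal(size=d_z)            # one sample z ~ q_phi(. | x)
    recon = -0.5 * np.sum((x - W_dec @ z) ** 2)      # Monte Carlo estimate of E[ln p(x|z)]
    # Closed-form KL(N(mu, sigma^2 I) || N(0, I)) in d_z dimensions:
    # (1/2)(N sigma^2 + ||mu||^2 - 2 N ln sigma) minus the constant N/2
    kl = 0.5 * (d_z * sigma**2 + np.sum(mu**2) - d_z - 2 * d_z * log_sigma)
    return recon - kl

x = rng.normal(size=d_x)
loss = -elbo_estimate(x)  # gradient ascent on the ELBO = descent on this loss
```

Note that the reconstruction term is estimated by sampling a single <math>z</math>, while the KL term is computed exactly from its closed form; the KL term vanishes precisely when <math>\mu = 0</math> and <math>\sigma = 1</math>, i.e. when the approximate posterior equals the prior.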
== Reparameterization ==