Revision as of 15:18, 17 April 2025 82.102.110.228 (talk) Stochastic gradient descend has nothing to do with taking expectations. Undid revision 1280088605 by G.S.Ray (talk) Tag: Undo
Revision as of 05:53, 30 April 2025 Herbmuell (talk | contribs) m →Evidence lower bound (ELBO): x|z to be consistent with z|x just below.
Line 69: = \ln p_\theta(x) - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta({\cdot | x})) </math>Maximizing the ELBO<math display="block">\theta^*,\phi^* = \underset{\theta,\phi}\operatorname{arg max} \, L_{\theta,\phi}(x) </math>is equivalent to simultaneously maximizing <math>\ln p_\theta(x) </math> and minimizing <math> D_{KL}(q_\phi({z| x})\parallel p_\theta({z| x})) </math>. That is, it maximizes the log-likelihood of the observed data while minimizing the divergence of the approximate posterior <math>q_\phi(\cdot | x) </math> from the exact posterior <math>p_\theta(\cdot | x) </math>. The form given is not very convenient for maximization, but the following equivalent form is:<math display="block">L_{\theta,\phi}(x) = \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln p_\theta(x|z)\right] - D_{KL}(q_\phi({\cdot| x})\parallel p_\theta(\cdot)) </math>where <math>\ln p_\theta(x|z)</math> is implemented as <math>-\frac{1}{2}\| x - D_\theta(z)\|^2_2</math>, since that is, up to an additive constant, what <math>x|z \sim \mathcal N(D_\theta(z), I)</math> yields. That is, we model the distribution of <math>x</math> conditional on <math>z</math> as a Gaussian distribution centered on <math>D_\theta(z)</math>. The distributions <math>q_\phi(z |x)</math> and <math>p_\theta(z)</math> are often also chosen to be Gaussian, with <math>z|x \sim \mathcal N(E_\phi(x), \sigma_\phi(x)^2I)</math> and <math>z \sim \mathcal N(0, I)</math>, which gives, by the formula for [[Kullback–Leibler divergence#Multivariate normal distributions|KL divergence of Gaussians]]:<math display="block">L_{\theta,\phi}(x) = -\frac 12\mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \|x - D_\theta(z)\|_2^2\right] - \frac 12 \left( N\sigma_\phi(x)^2 + \|E_\phi(x)\|_2^2 - 2N\ln\sigma_\phi(x) \right) + Const </math>Here <math> N </math> is the dimension of <math> z </math>.
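The closed-form expression above can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: the function names, the <code>decode</code> callable (standing in for <math>D_\theta</math>) and the sample count are assumptions introduced here for illustration. It treats <math>\sigma_\phi(x)</math> as a positive scalar, as in the formula, and estimates the expectation by averaging over samples from <math>q_\phi(\cdot | x)</math>. Unlike the formula, it keeps the <math>-N</math> term of the exact KL divergence rather than folding it into the constant; it still omits the additive constant of <math>\ln p_\theta(x|z)</math>.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2 I) || N(0, I) ) for scalar sigma > 0 and mu of dimension N."""
    n = mu.shape[-1]
    return 0.5 * (n * sigma**2 + np.sum(mu**2, axis=-1) - n - 2.0 * n * np.log(sigma))


def elbo_estimate(x, mu, sigma, decode, num_samples=1000, rng=None):
    """Monte Carlo estimate of the ELBO, up to the additive constant of ln p(x|z).

    x      : observed data point, shape (d,)
    mu     : encoder mean E_phi(x), shape (N,)
    sigma  : encoder standard deviation sigma_phi(x), positive scalar
    decode : hypothetical decoder, maps latents of shape (S, N) to shape (S, d)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw S samples z ~ N(mu, sigma^2 I).
    eps = rng.standard_normal((num_samples, mu.shape[-1]))
    z = mu + sigma * eps
    # Reconstruction term: -1/2 E_z ||x - D(z)||^2, averaged over the samples.
    reconstruction = -0.5 * np.mean(np.sum((x - decode(z)) ** 2, axis=-1))
    return reconstruction - gaussian_kl_to_standard_normal(mu, sigma)


if __name__ == "__main__":
    # Toy linear decoder with fixed random weights, purely illustrative.
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((2, 3))
    x = np.array([0.5, -1.0, 2.0])
    mu = np.array([0.3, -0.2])
    print(elbo_estimate(x, mu, 0.8, lambda z: z @ weights, rng=rng))
```

The KL term is exact, so only the reconstruction term carries Monte Carlo noise; increasing <code>num_samples</code> reduces that noise.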
For a more detailed derivation and more interpretations of ELBO and its maximization, see [[Evidence lower bound|its main page]].

== Reparameterization ==

Variational autoencoder: Difference between revisions