\end{align}</math>
Now define the function<math display="block">L_{\theta,\phi}(x) :=
\mathbb E_{\mathbf{z} \sim q_\phi(\cdot \mid x)} \left[\log \frac{p_\theta(\mathbf{z,x})}{q_\phi(\mathbf{z\mid x})}\right]
= \mathbb E_{\mathbf{z} \sim q_\phi(\cdot \mid x)} \left[\log p_\theta(\mathbf{x\mid z})\right] - D_{KL}(q_\phi(\mathbf{z\mid x}) \parallel p_\theta(\mathbf{z})) </math>This is named the [[evidence lower bound]] (ELBO). Maximizing the ELBO<math display="block">\theta^*,\phi^* = \underset{\theta,\phi}\operatorname{arg max} \, L_{\theta,\phi}(x) </math>is equivalent to simultaneously maximizing <math>p_\theta(x) </math> and minimizing <math> D_{KL}(q_\phi(\mathbf{z\mid x})\parallel p_\theta(\mathbf{z\mid x})) </math>. That is, maximizing the log-likelihood of the observed data while minimizing the divergence of the approximate posterior <math>q_\phi(\cdot \mid x) </math> from the exact posterior <math>p_\theta(\cdot \mid x) </math>.
This equivalence follows from the identity<math display="block">L_{\theta,\phi}(x) = \log (p_\theta(\mathbf{x})) - D_{KL}(q_\phi(\mathbf{z\mid x})\parallel p_\theta(\mathbf{z\mid x})) \leq \log (p_\theta(\mathbf{x})), </math>which also shows that <math>L_{\theta,\phi}(x)</math> is a lower bound on the log-evidence <math>\log (p_\theta(\mathbf{x}))</math>, hence the name.
For a more detailed derivation and interpretation of the ELBO and its maximization, see [[Evidence lower bound|its main page]].
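As an illustration, the second form of the ELBO above can be estimated with a single Monte Carlo sample. The following is only a minimal sketch, assuming a diagonal-Gaussian encoder whose hypothetical outputs are <code>mu</code> and <code>log_var</code>, a standard normal prior, and a placeholder function <code>decoder_log_prob</code> returning <math>\log p_\theta(\mathbf{x\mid z})</math>; for such Gaussians the KL term has a closed form.
<syntaxhighlight lang="python">
import torch
from torch.distributions import Normal, kl_divergence

def elbo_estimate(x, mu, log_var, decoder_log_prob):
    # q_phi(z|x): diagonal Gaussian with mean mu and variance exp(log_var)
    q = Normal(mu, torch.exp(0.5 * log_var))
    # Prior p_theta(z): standard normal N(0, I)
    p = Normal(torch.zeros_like(mu), torch.ones_like(mu))
    z = q.sample()                              # one sample z ~ q_phi(z|x)
    reconstruction = decoder_log_prob(x, z)     # log p_theta(x|z), model-specific placeholder
    kl = kl_divergence(q, p).sum(dim=-1)        # analytic KL(q_phi(z|x) || p_theta(z))
    return reconstruction - kl                  # one-sample estimate of L(x)
</syntaxhighlight>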
== Reparameterization ==
[[File:Reparameterization Trick.png|thumb|300x300px|The scheme of the reparameterization trick. The random variable <math>\mathbf{\varepsilon}</math> is injected into the latent space <math>\mathbf{z}</math> as external input. In this way, it is possible to backpropagate the gradient without involving a stochastic variable in the update.]]{{Main|Reparametrization trick}}
To efficiently search for <math display="block">\theta^*,\phi^* = \underset{\theta,\phi}\operatorname{arg max} \, L_{\theta,\phi}(x), </math>the typical method is [[gradient descent]]. However, in the direct approach<math display="block">\nabla_\phi \mathbb E_{\mathbf{z} \sim q_\phi(\cdot \mid x)} \left[\log \frac{p_\theta(\mathbf{z,x})}{q_\phi(\mathbf{z\mid x})}\right], </math>the gradient <math>\nabla_\phi </math> cannot be moved inside the expectation, since <math>\phi </math> appears in the probability distribution itself. The '''reparameterization trick''' bypasses this difficulty.
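Concretely, when <math>q_\phi(\cdot \mid x)</math> is a diagonal Gaussian <math>\mathcal N(\mu, \operatorname{diag}(\sigma^2))</math>, a sample can be written as <math>\mathbf{z} = \mu + \sigma \odot \varepsilon</math> with <math>\varepsilon \sim \mathcal N(0, I)</math>, so the expectation is taken over <math>\varepsilon</math>, which does not depend on <math>\phi</math>. The following minimal PyTorch sketch illustrates this; the tensors <code>mu</code> and <code>log_var</code> merely stand in for encoder outputs, and the final loss is only a placeholder.
<syntaxhighlight lang="python">
import torch

# Hypothetical encoder outputs for one data point; requires_grad lets gradients
# with respect to the variational parameters phi flow back through them.
mu = torch.tensor([0.5, -1.0], requires_grad=True)
log_var = torch.tensor([0.1, 0.2], requires_grad=True)

# Reparameterization: the randomness enters only through the external epsilon,
# so z is a deterministic, differentiable function of (mu, log_var).
eps = torch.randn(2)                        # epsilon ~ N(0, I)
z = mu + torch.exp(0.5 * log_var) * eps     # z ~ N(mu, diag(exp(log_var)))

# Any objective built from z (here only a stand-in for the ELBO terms) can now
# be backpropagated to mu and log_var despite the sampling step.
loss = (z ** 2).sum()
loss.backward()
print(mu.grad, log_var.grad)                # well-defined gradients w.r.t. phi
</syntaxhighlight>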
To make the ELBO formulation suitable for training purposes, it is necessary to slightly modify the problem formulation and the VAE structure.<ref name=":0" /><ref>{{Cite journal|last1=Bengio|first1=Yoshua|last2=Courville|first2=Aaron|last3=Vincent|first3=Pascal|title=Representation Learning: A Review and New Perspectives|url=https://ieeexplore.ieee.org/abstract/document/6472238?casa_token=wQPK9gUGfCsAAAAA:FS5uNYCQVJGH-bq-kVvZeTdnQ8a33C6qQ4VUyDyGLMO13QewH3wcry9_Jh-5FATvspBj8YOXfw|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2013|volume=35|issue=8|pages=1798–1828|doi=10.1109/TPAMI.2013.50|pmid=23787338|issn=1939-3539|arxiv=1206.5538|s2cid=393948}}</ref><ref>{{Cite arXiv|last1=Kingma|first1=Diederik P.|last2=Rezende|first2=Danilo J.|last3=Mohamed|first3=Shakir|last4=Welling|first4=Max|date=2014-10-31|title=Semi-Supervised Learning with Deep Generative Models|class=cs.LG|eprint=1406.5298}}</ref>