Variational autoencoder: Difference between revisions

notation change
Reparameterization: some simplification
Line 49:
 
: <math>\begin{align}
D_{KL}(q_\phi(\mathbf{z\mid x})\parallel p_\theta(\mathbf{z\mid x})) &= \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})}{p_\theta(\mathbf{z\mid x})}\right]\\
&= \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})\,p_\theta(\mathbf{x})}{p_\theta(\mathbf{z,x})}\right]\\
&= \log p_\theta(\mathbf{x}) + \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})}{p_\theta(\mathbf{z,x})}\right]\\
&= \log p_\theta(\mathbf{x}) + \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})}{p_\theta(\mathbf{x\mid z})\,p_\theta(\mathbf{z})}\right]\\
&= \log p_\theta(\mathbf{x}) + \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})}{p_\theta(\mathbf{z})} - \log p_\theta(\mathbf{x\mid z})\right]\\
&= \log p_\theta(\mathbf{x}) + D_{KL}(q_\phi(\mathbf{z\mid x}) \parallel p_\theta(\mathbf{z})) - \mathbb E_{z \sim q_\phi(\cdot \mid x)}\left[\log p_\theta(\mathbf{x\mid z})\right]
\end{align}</math>
 
Now define the function<math display="block">L_{\theta,\phi}(x) := \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})}{p_\theta(\mathbf{z,x})}\right] = -\log p_\theta(\mathbf{x}) + D_{KL}(q_\phi(\mathbf{z\mid x})\parallel p_\theta(\mathbf{z\mid x})) = -\mathbb E_{z \sim q_\phi(\cdot \mid x)}\left[\log p_\theta(\mathbf{x\mid z})\right] + D_{KL}(q_\phi(\mathbf{z\mid x}) \parallel p_\theta(\mathbf{z}))</math>
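As an illustrative sketch (not part of the article), the second decomposition of <math>L_{\theta,\phi}(x)</math> can be computed in closed form when the encoder outputs a diagonal Gaussian <math>q_\phi(z\mid x) = \mathcal N(\mu, \operatorname{diag}(\sigma^2))</math> and the prior <math>p_\theta(z)</math> is standard normal; the function names below are hypothetical:

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    the KL term of L_{theta,phi} for a standard-normal prior p_theta(z)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def elbo_loss(x, x_recon_mu, mu, log_var):
    """Negative ELBO: closed-form KL plus a Gaussian reconstruction term
    standing in for -log p_theta(x | z) (unit variance, constant dropped)."""
    recon = 0.5 * np.sum((x - x_recon_mu)**2)  # -log p_theta(x|z) up to a constant
    return recon + gaussian_kl_to_standard_normal(mu, log_var)
```

Minimizing this quantity over <math>\theta, \phi</math> simultaneously tightens the KL term and maximizes the expected reconstruction likelihood.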
 
At this point, it is possible to rewrite the equation as
Line 83 ⟶ 81:
 
== Reparameterization ==
[[File:Reparameterization Trick.png|thumb|300x300px|The scheme of the reparameterization trick. The random variable <math>\mathbf{\varepsilon}</math> is injected into the latent space <math>\mathbf{z}</math> as external input. In this way, it is possible to backpropagate the gradient without involving the stochastic variable during the update.]]{{Main|Reparametrization trick}}
To make the ELBO formulation suitable for training purposes, it is necessary to slightly modify the problem formulation and the VAE structure.<ref name=":0" /><ref>{{Cite journal|last1=Bengio|first1=Yoshua|last2=Courville|first2=Aaron|last3=Vincent|first3=Pascal|title=Representation Learning: A Review and New Perspectives|url=https://ieeexplore.ieee.org/abstract/document/6472238|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2013|volume=35|issue=8|pages=1798–1828|doi=10.1109/TPAMI.2013.50|pmid=23787338|issn=1939-3539|arxiv=1206.5538|s2cid=393948}}</ref><ref>{{Cite arXiv|last1=Kingma|first1=Diederik P.|last2=Rezende|first2=Danilo J.|last3=Mohamed|first3=Shakir|last4=Welling|first4=Max|date=2014-10-31|title=Semi-Supervised Learning with Deep Generative Models|class=cs.LG|eprint=1406.5298}}</ref>
 
Stochastic sampling, the operation that draws a point from the latent space and feeds it to the probabilistic decoder, is not differentiable, so gradients cannot be backpropagated through it directly.
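A minimal sketch of the trick (an illustration, not the article's reference implementation): instead of sampling <math>z</math> directly, the noise is drawn as <math>\varepsilon \sim \mathcal N(0, I)</math> and deterministically transformed, so <math>z</math> becomes a differentiable function of the encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, eps=None):
    """Draw z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps, with
    eps ~ N(0, I). All randomness lives in eps, so z is a deterministic,
    differentiable function of the encoder outputs mu and log_var."""
    if eps is None:
        eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps
```

With <math>\varepsilon</math> fixed, the sample is fully determined by <math>\mu</math> and <math>\log \sigma^2</math>, which is what lets automatic differentiation propagate gradients through the sampling step.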