Variational autoencoder: Difference between revisions

notation change
Reparameterization: some simplification
Line 49:
 
: <math>\begin{align}
D_{KL}(q_\phi(\mathbf{z\mid x})\parallel p_\theta(\mathbf{z\mid x})) &= \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})}{p_\theta(\mathbf{z\mid x})}\right]\\
&= \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})\,p_\theta(\mathbf{x})}{p_\theta(\mathbf{z,x})}\right]\\
&= \log p_\theta(\mathbf{x}) + \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})}{p_\theta(\mathbf{z,x})}\right]\\
&= \log p_\theta(\mathbf{x}) + \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})}{p_\theta(\mathbf{x\mid z})\,p_\theta(\mathbf{z})}\right]\\
&= \log p_\theta(\mathbf{x}) + \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})}{p_\theta(\mathbf{z})} - \log p_\theta(\mathbf{x\mid z})\right]\\
&= \log p_\theta(\mathbf{x}) + D_{KL}(q_\phi(\mathbf{z\mid x}) \parallel p_\theta(\mathbf{z})) - \mathbb E_{z \sim q_\phi(\cdot \mid x)}\left[\log p_\theta(\mathbf{x\mid z})\right]
\end{align}</math>
 
Now define the function<math display="block">L_{\theta,\phi}(x) := \mathbb E_{z \sim q_\phi(\cdot \mid x)} \left[\log \frac{q_\phi(\mathbf{z\mid x})}{p_\theta(\mathbf{z,x})}\right] = -\log p_\theta(\mathbf{x}) + D_{KL}(q_\phi(\mathbf{z\mid x})\parallel p_\theta(\mathbf{z\mid x})) = -\mathbb E_{z \sim q_\phi(\cdot \mid x)}\left[\log p_\theta(\mathbf{x\mid z})\right] + D_{KL}(q_\phi(\mathbf{z\mid x}) \parallel p_\theta(\mathbf{z}))</math>
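As an illustrative sketch (not part of the article), the second decomposition of <math>L_{\theta,\phi}(x)</math> can be computed in closed form when the encoder outputs a diagonal Gaussian <math>q_\phi(z\mid x) = \mathcal N(\mu, \operatorname{diag}(\sigma^2))</math> and the prior <math>p_\theta(z)</math> is standard normal; the function names below are hypothetical:

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    the KL term of L_{theta,phi} for a standard-normal prior p_theta(z)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def elbo_loss(x, x_recon_mu, mu, log_var):
    """Negative ELBO: closed-form KL plus a Gaussian reconstruction term
    standing in for -log p_theta(x | z) (unit variance, constant dropped)."""
    recon = 0.5 * np.sum((x - x_recon_mu)**2)  # -log p_theta(x|z) up to a constant
    return recon + gaussian_kl_to_standard_normal(mu, log_var)
```

Minimizing this quantity over <math>\theta, \phi</math> simultaneously tightens the KL term and maximizes the expected reconstruction likelihood.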
 
At this point, it is possible to rewrite the equation as
Line 83 ⟶ 81:
 
== Reparameterization ==
[[File:Reparameterization Trick.png|thumb|300x300px|The scheme of the reparameterization trick. The random variable <math>\mathbf{\varepsilon}</math> is injected into the latent space <math>\mathbf{z}</math> as external input. In this way, it is possible to backpropagate the gradient without involving the stochastic variable during the update.]]{{Main|Reparametrization trick}}
To make the ELBO formulation suitable for training purposes, it is necessary to slightly modify the problem formulation and the VAE structure.<ref name=":0" /><ref>{{Cite journal|last1=Bengio|first1=Yoshua|last2=Courville|first2=Aaron|last3=Vincent|first3=Pascal|title=Representation Learning: A Review and New Perspectives|url=https://ieeexplore.ieee.org/abstract/document/6472238|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2013|volume=35|issue=8|pages=1798–1828|doi=10.1109/TPAMI.2013.50|pmid=23787338|issn=1939-3539|arxiv=1206.5538|s2cid=393948}}</ref><ref>{{Cite arXiv|last1=Kingma|first1=Diederik P.|last2=Rezende|first2=Danilo J.|last3=Mohamed|first3=Shakir|last4=Welling|first4=Max|date=2014-10-31|title=Semi-Supervised Learning with Deep Generative Models|class=cs.LG|eprint=1406.5298}}</ref>
 
Stochastic sampling, the operation that draws a point from the latent space and feeds it to the probabilistic decoder, is not differentiable, so gradients cannot be backpropagated through it directly.
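A minimal sketch of the trick (an illustration, not the article's reference implementation): instead of sampling <math>z</math> directly, the noise is drawn as <math>\varepsilon \sim \mathcal N(0, I)</math> and deterministically transformed, so <math>z</math> becomes a differentiable function of the encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, eps=None):
    """Draw z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps, with
    eps ~ N(0, I). All randomness lives in eps, so z is a deterministic,
    differentiable function of the encoder outputs mu and log_var."""
    if eps is None:
        eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps
```

With <math>\varepsilon</math> fixed, the sample is fully determined by <math>\mu</math> and <math>\log \sigma^2</math>, which is what lets automatic differentiation propagate gradients through the sampling step.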