== Reparameterization ==
[[File:Reparameterization Trick.png|thumb|300x300px|The scheme of the reparameterization trick. The random variable <math>\mathbf{\varepsilon}</math> is injected into the latent space <math>\mathbf{z}</math> as an external input. In this way, it is possible to backpropagate the gradient without involving the stochastic variable during the update.]]{{Main|Reparametrization trick}}
To efficiently search for <math display="block">\theta^*,\phi^* = \underset{\theta,\phi}\operatorname{arg max} \, L_{\theta,\phi}(x) </math>the typical method is [[gradient descent]].
 
It is straightforward to find<math display="block">\nabla_\theta \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\log \frac{p_\theta(\mathbf{z,x})}{q_\phi(\mathbf{z\mid x})}\right]
= \mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \nabla_\theta \log \frac{p_\theta(\mathbf{z,x})}{q_\phi(\mathbf{z\mid x})}\right] </math>However, <math display="block">\nabla_\phi \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\log \frac{p_\theta(\mathbf{z,x})}{q_\phi(\mathbf{z\mid x})}\right] </math>does not allow one to put the <math>\nabla_\phi </math> inside the expectation, since <math>\phi </math> appears in the probability distribution itself. The '''reparameterization trick''' (also known as stochastic backpropagation<ref>{{Cite journal |last1=Rezende |first1=Danilo Jimenez |last2=Mohamed |first2=Shakir |last3=Wierstra |first3=Daan |date=2014-06-18 |title=Stochastic Backpropagation and Approximate Inference in Deep Generative Models |url=https://proceedings.mlr.press/v32/rezende14.html |journal=International Conference on Machine Learning |language=en |publisher=PMLR |pages=1278–1286}}</ref>) bypasses this difficulty.<ref name=":0" /><ref>{{Cite journal|last1=Bengio|first1=Yoshua|last2=Courville|first2=Aaron|last3=Vincent|first3=Pascal|title=Representation Learning: A Review and New Perspectives|url=https://ieeexplore.ieee.org/abstract/document/6472238|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2013|volume=35|issue=8|pages=1798–1828|doi=10.1109/TPAMI.2013.50|pmid=23787338|issn=1939-3539|arxiv=1206.5538|s2cid=393948}}</ref><ref>{{Cite arXiv|last1=Kingma|first1=Diederik P.|last2=Rezende|first2=Danilo J.|last3=Mohamed|first3=Shakir|last4=Welling|first4=Max|date=2014-10-31|title=Semi-Supervised Learning with Deep Generative Models|class=cs.LG|eprint=1406.5298}}</ref>
 
Stochastic sampling is the non-differentiable operation that draws a sample from the latent space and feeds it to the probabilistic decoder.
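
The effect of this non-differentiability can be illustrated with automatic-differentiation software. The following minimal sketch uses [[PyTorch]] (the variable names are illustrative assumptions, not fixed notation): a draw obtained with <code>sample()</code> is detached from the parameters that define the distribution, whereas the reparameterized <code>rsample()</code> keeps the sample differentiable with respect to them.

<syntaxhighlight lang="python">
# Minimal sketch with assumed names; requires PyTorch.
import torch
from torch.distributions import Normal

phi = torch.tensor([0.0, 0.0], requires_grad=True)   # stands in for encoder parameters
mu, log_sigma = phi[0], phi[1]

# Direct ancestral sampling: the draw is treated as a constant,
# so no gradient reaches the parameters that defined the distribution.
z = Normal(mu, log_sigma.exp()).sample()
print(z.requires_grad)        # False: the stochastic node blocks backpropagation

# Reparameterized sampling: z = mu + sigma * eps with eps ~ N(0, 1),
# so the sample is a differentiable function of mu and sigma.
z = Normal(mu, log_sigma.exp()).rsample()
z.pow(2).backward()           # gradients now flow through the sampling step
print(phi.grad)
</syntaxhighlight>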
 
The most important example is when <math>z \sim q_\phi(\cdot | x) </math> is normally distributed, as <math>\mathcal N(\mu_\phi(x), \Sigma_\phi(x)) </math>. That is, the approximate posterior over the latent space is modeled as a multivariate Gaussian distribution,

: <math>\mathbf{z} \sim q_\phi(\mathbf{z}\mid\mathbf{x}) = \mathcal{N}(\boldsymbol{\mu}_\phi(\mathbf{x}), \boldsymbol{\Sigma}_\phi(\mathbf{x}))</math>.[[File:Reparameterized Variational Autoencoder.png|thumb|The scheme of a variational autoencoder after the reparameterization trick. |300x300px]]
 
This can be reparametrized by letting <math>\boldsymbol{\varepsilon} \sim \mathcal{N}(0, \boldsymbol{I})</math> be a "standard [[Random number generation|random number generator]]", and constructing <math>z </math> as <math>z = \mu_\phi(x) + L_\phi(x)\epsilon </math>. Here, <math>L_\phi(x) </math> is obtained by the [[Cholesky decomposition]]:<math display="block">\Sigma_\phi(x) = L_\phi(x)L_\phi(x)^T </math>Then we have<math display="block">\nabla_\phi \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\log \frac{p_\theta(\mathbf{z,x})}{q_\phi(\mathbf{z\mid x})}\right]
=
\mathbb {E}_{\epsilon}\left[ \nabla_\phi \log {\frac {p_{\theta }(x, \mu_\phi(x) + L_\phi(x)\epsilon)}{q_{\phi }(\mu_\phi(x) + L_\phi(x)\epsilon | x)}}\right] </math>and so we obtain an unbiased estimator of the gradient, allowing [[stochastic gradient descent]].

When the covariance <math>\Sigma_\phi(x)</math> is diagonal with standard deviations <math>\boldsymbol{\sigma}</math>, and with <math>\odot</math> denoting the element-wise product ([[Hadamard product (matrices)|Hadamard product]]), the reparameterization reduces to

: <math>\mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\sigma} \odot \boldsymbol{\varepsilon}. </math>
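
As a concrete illustration of the reparameterized sample, a short sketch (assuming the encoder has already produced <math>\mu</math> together with either a full covariance <math>\Sigma</math> or a diagonal standard deviation <math>\sigma</math>; the function names are illustrative):

<syntaxhighlight lang="python">
# Sketch of the reparameterization step; requires PyTorch.
import torch

def reparameterize_full(mu, Sigma):
    """General case: z = mu + L @ eps, with Sigma = L L^T from a Cholesky factorization."""
    L = torch.linalg.cholesky(Sigma)
    eps = torch.randn_like(mu)        # external randomness, eps ~ N(0, I)
    return mu + L @ eps

def reparameterize_diag(mu, sigma):
    """Diagonal case: z = mu + sigma ⊙ eps (element-wise product)."""
    eps = torch.randn_like(mu)
    return mu + sigma * eps
</syntaxhighlight>

Because <math>\boldsymbol{\varepsilon}</math> is drawn independently of <math>\phi</math>, either expression is a differentiable function of the encoder parameters and can be backpropagated through.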
 
Thanks to this transformation (which can be extended to non-Gaussian distributions), the VAE becomes trainable: the probabilistic encoder learns to map a compressed representation of the input into the two latent vectors <math>\boldsymbol{\mu} </math> and <math>\boldsymbol{\sigma} </math>, while the stochasticity is excluded from the parameter updates and is injected into the latent space as an external input through the random vector <math>\boldsymbol{\varepsilon} </math>.
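
Putting the pieces together, a single training step might look as follows (a minimal sketch in PyTorch; the network sizes, layer choices and loss terms are illustrative assumptions, not a reference implementation):

<syntaxhighlight lang="python">
# Minimal sketch of one VAE training step with the reparameterization trick; requires PyTorch.
import torch
from torch import nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.hidden = nn.Linear(x_dim, 256)
        self.mu = nn.Linear(256, z_dim)        # latent mean
        self.log_var = nn.Linear(256, z_dim)   # latent log-variance

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return self.mu(h), self.log_var(h)

encoder = Encoder()
decoder = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 784))
optimizer = torch.optim.SGD(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(32, 784)                        # stand-in mini-batch
mu, log_var = encoder(x)
eps = torch.randn_like(mu)                     # randomness injected as external input
z = mu + torch.exp(0.5 * log_var) * eps        # z = mu + sigma ⊙ eps

x_hat = decoder(z)
reconstruction = F.mse_loss(x_hat, x, reduction='sum')
# Closed-form KL divergence between N(mu, sigma^2) and the standard normal prior.
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
loss = reconstruction + kl    # negative ELBO, with MSE standing in for the log-likelihood term

optimizer.zero_grad()
loss.backward()               # gradients reach both decoder and encoder parameters
optimizer.step()
</syntaxhighlight>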
 
== Variations ==