Several approaches have been proposed to formulate the operation of the VAE in a more abstract way. In these approaches the loss function is composed of two parts:
* the usual reconstruction error part, which seeks to ensure that the encoder-then-decoder mapping <math>x \mapsto D_\theta(E_\phi(x))</math> is as close to the identity map as possible; the sampling is done at run time from the empirical distribution <math>\mathbb{P}^{real}</math> of the available objects (e.g., for MNIST or ImageNet this is the empirical probability law of all images in the dataset). This gives the term <math> \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right]</math>.
* a variational part that ensures that, when the empirical distribution <math>\mathbb{P}^{real}</math> is passed through the encoder <math>E_\phi</math>, we recover the target distribution, denoted here <math>\mu(dz)</math>, which is usually taken to be a [[multivariate normal distribution]]. We denote by <math>E_\phi \sharp \mathbb{P}^{real}</math> this [[Pushforward measure|pushforward measure]], which in practice is just the empirical distribution obtained by passing all dataset objects through the encoder <math>E_\phi</math>. In order to ensure that <math>E_\phi \sharp \mathbb{P}^{real}</math> is close to the target <math>\mu(dz)</math>, a [[statistical distance]] <math>d</math> is invoked and the term <math>d \left( \mu(dz), E_\phi \sharp \mathbb{P}^{real} \right)^2 </math> is added to the loss.
 
We obtain the final formula for the loss:
:<math> \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right] + d \left( \mu(dz), E_\phi \sharp \mathbb{P}^{real} \right)^2. </math>
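A minimal sketch of how such a loss can be assembled, assuming hypothetical PyTorch modules <code>encoder</code> and <code>decoder</code> standing in for <math>E_\phi</math> and <math>D_\theta</math>, and a sample-based distance function <code>distance</code> (all names are illustrative and not taken from the cited works):

<syntaxhighlight lang="python">
import torch

def abstract_vae_loss(encoder, decoder, x, distance):
    """Reconstruction error plus the squared statistical distance between
    the encoded (pushforward) latent distribution and the latent target."""
    z = encoder(x)                  # batch of samples from E_phi # P^real
    x_hat = decoder(z)              # reconstructions D_theta(E_phi(x))
    # E_x ||x - D_theta(E_phi(x))||_2^2, averaged over the batch
    recon = ((x - x_hat) ** 2).flatten(start_dim=1).sum(dim=1).mean()
    # batch of samples from the target mu, here a standard normal
    z_target = torch.randn_like(z)
    return recon + distance(z_target, z) ** 2
</syntaxhighlight>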
The statistical distance <math>d</math> requires special properties: for instance, it has to possess a formula as an expectation, because the loss function needs to be optimized by [[Stochastic gradient descent|stochastic optimization algorithms]] (see the sample-based sketch after the list below). Several distances can be chosen, and this gave rise to several flavors of VAEs:
* the sliced Wasserstein distance used by S. Kolouri et al. in their VAE<ref>{{Cite conference |last1=Kolouri |first1=Soheil |last2=Pope |first2=Phillip E. |last3=Martin |first3=Charles E. |last4=Rohde |first4=Gustavo K. |date=2019 |title=Sliced Wasserstein Auto-Encoders |url=https://openreview.net/forum?id=H1xaJn05FQ |conference=International Conference on Learning Representations |publisher=ICPR |book-title=International Conference on Learning Representations}}</ref>
* the [[Energy distance|energy distance]] implemented in the Radon Sobolev Variational Auto-Encoder<ref>{{Cite journal |last=Turinici |first=Gabriel |year=2021 |title=Radon-Sobolev Variational Auto-Encoders |url=https://www.sciencedirect.com/science/article/pii/S0893608021001556 |journal=Neural Networks |volume=141 |pages=294–305 |arxiv=1911.13135 |doi=10.1016/j.neunet.2021.04.018 |issn=0893-6080 |pmid=33933889}}</ref>
* the [[Maximum Mean Discrepancy]] distance used in the MMD-VAE<ref>{{Cite journal |arxiv=1705.02239 |first1=A. |last1=Gretton |first2=Y. |last2=Li |title=A Polya Contagion Model for Networks |date=2017 |last3=Swersky |first3=K. |last4=Zemel |first4=R. |last5=Turner |first5=R.|journal=IEEE Transactions on Control of Network Systems |volume=5 |issue=4 |pages=1998–2010 |doi=10.1109/TCNS.2017.2781467 }}</ref>
* the [[Wasserstein distance]] used in the Wasserstein Auto-Encoder (WAE)<ref>{{Cite arXiv |eprint=1711.01558 |first1=I. |last1=Tolstikhin |first2=O. |last2=Bousquet |title=Wasserstein Auto-Encoders |date=2018 |last3=Gelly |first3=S. |last4=Schölkopf |first4=B.|class=stat.ML }}</ref>
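For illustration, the energy distance between two batches of latent samples can be estimated as follows (a generic PyTorch sketch, not the implementation of any of the cited works); every term is an expectation over pairs of samples, so the estimate fits directly into a stochastic-gradient training loop and can be plugged in as the <code>distance</code> function of the loss sketch above:

<syntaxhighlight lang="python">
import torch

def energy_distance(z_target, z_encoded):
    """Simple plug-in estimate of the energy distance between the latent
    target distribution (samples z_target) and the encoded distribution
    (samples z_encoded), built from pairwise Euclidean distances."""
    d_xy = torch.cdist(z_target, z_encoded).mean()   # E ||X - Y||
    d_xx = torch.cdist(z_target, z_target).mean()    # E ||X - X'|| (diagonal included)
    d_yy = torch.cdist(z_encoded, z_encoded).mean()  # E ||Y - Y'|| (diagonal included)
    return torch.sqrt(torch.clamp(2.0 * d_xy - d_xx - d_yy, min=0.0))
</syntaxhighlight>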