== Statistical distance VAE variants ==
 
After the initial work of Diederik P. Kingma and [[Max Welling]],<ref>{{Cite arXiv |last=Kingma |first=Diederik P. |last2=Welling |first2=Max |title=Auto-Encoding Variational Bayes |date=2022-12-10 |class=stat.ML |arxiv=1312.6114}}</ref> several procedures were proposed to formulate the operation of the VAE in a more abstract way. In these approaches the loss function is composed of two parts:
* the usual reconstruction error part, which seeks to ensure that the encoder-then-decoder mapping <math>x \mapsto D_\theta(E_\phi(x))</math> is as close to the identity map as possible; the sampling is done at run time from the empirical distribution <math>\mathbb{P}^{real}</math> of objects available (e.g., for MNIST or ImageNet this will be the empirical probability law of all images in the dataset). This gives the term: <math> \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right]</math>;
* a "variational" part that seeks to ensure that, when the empirical distribution <math>\mathbb{P}^{real}</math> is passed through the encoder <math>E_\phi</math>, the resulting latent distribution is close to a fixed target distribution <math>\mu</math>, usually taken to be a multivariate normal distribution. To quantify this closeness, a statistical distance <math>d</math> is invoked and the term <math>d\left(\mu, E_\phi \sharp \mathbb{P}^{real}\right)^2</math> is added to the loss, where <math>E_\phi \sharp \mathbb{P}^{real}</math> denotes the [[Pushforward measure|pushforward]] of <math>\mathbb{P}^{real}</math> through <math>E_\phi</math>. A minimal illustrative sketch of the resulting two-part loss is given after this list.
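
For illustration, the resulting loss can be written compactly as a short program. The following is a minimal sketch in [[PyTorch]], assuming generic <code>encoder</code> and <code>decoder</code> networks, a sample-based estimator <code>statistical_distance</code> for the squared distance <math>d^2</math>, and a weighting coefficient <code>lam</code>; all of these names are illustrative and are not taken from the cited papers:

<syntaxhighlight lang="python">
import torch

def sdvae_loss(encoder, decoder, x, statistical_distance, lam=1.0):
    """Two-part loss of statistical-distance VAE variants (illustrative sketch).

    x is a mini-batch drawn from the empirical distribution P^real;
    statistical_distance(z, w) is assumed to return a sample estimate of the
    squared distance d(law(z), law(w))^2, so the whole loss is an expectation
    that can be optimized by stochastic gradient descent.
    """
    z = encoder(x)                       # latent codes E_phi(x)
    x_hat = decoder(z)                   # reconstructions D_theta(E_phi(x))

    # reconstruction term: E_{x ~ P^real} ||x - D_theta(E_phi(x))||_2^2
    reconstruction = ((x - x_hat) ** 2).flatten(1).sum(dim=1).mean()

    # variational term: d(mu, E_phi # P^real)^2, with mu taken as N(0, I)
    mu_samples = torch.randn_like(z)
    latent_term = statistical_distance(z, mu_samples)

    return reconstruction + lam * latent_term
</syntaxhighlight>

Both terms are estimated on mini-batches at each optimization step, which motivates the requirement, discussed next, that <math>d</math> admit a formula as an expectation.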
 
The statistical distance <math>d</math> requires special properties: for instance, it has to possess a formula as an expectation, because the loss function will need to be optimized by [[Stochastic gradient descent|stochastic optimization algorithms]]. Several distances can be chosen, and this gave rise to several flavors of VAEs (sample-based estimators for two of these distances are sketched after the list):
* the sliced Wasserstein distance used by S. Kolouri et al. in their VAE<ref>{{Cite conference |last1=Kolouri |first1=Soheil |last2=Pope |first2=Phillip E. |last3=Martin |first3=Charles E. |last4=Rohde |first4=Gustavo K. |date=2019 |title=Sliced Wasserstein Auto-Encoders |url=https://openreview.net/forum?id=H1xaJn05FQ |conference=International Conference on Learning Representations |publisher=ICPR |book-title=International Conference on Learning Representations}}</ref>
* the [[energy distance]] implemented in the Radon Sobolev Variational Auto-Encoder<ref>{{Cite journal |last=Turinici |first=Gabriel |year=2021 |title=Radon-Sobolev Variational Auto-Encoders |url=https://www.sciencedirect.com/science/article/pii/S0893608021001556 |journal=Neural Networks |volume=141 |pages=294–305 |arxiv=1911.13135 |doi=10.1016/j.neunet.2021.04.018 |issn=0893-6080 |pmid=33933889}}</ref>
* the [[Maximum Mean Discrepancy]] distance used in the MMD-VAE<ref>{{Cite arXiv |arxiv=1705.02239 |title=Maximum Mean Discrepancy Variational Autoencoders |last1=Gretton |first1=A. |last2=Li |first2=Y. |last3=Swersky |first3=K. |last4=Zemel |first4=R. |last5=Turner |first5=R. |date=2017}}</ref>
* the [[Wasserstein distance]] used in the WAEs<ref>{{Cite arXiv |arxiv=1711.01558 |title=Wasserstein Auto-Encoders |last1=Tolstikhin |first1=I. |last2=Bousquet |first2=O. |last3=Gelly |first3=S. |last4=Schölkopf |first4=B. |date=2018}}</ref>
* kernel-based distances used in the Kernelized Variational Autoencoder (K-VAE)<ref>{{Cite arXiv |arxiv=1901.02401 |title=Kernelized Variational Autoencoders |last1=Louizos |first1=C. |last2=Shi |first2=X. |last3=Swersky |first3=K. |last4=Li |first4=Y. |last5=Welling |first5=M. |date=2019}}</ref>
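
As an illustration of such sample-based formulas, the energy distance and the maximum mean discrepancy both admit estimators that are plain expectations over pairs of points, and either could serve as the <code>statistical_distance</code> of the sketch above. The following shows standard biased estimators, not the exact ones of the cited papers; in particular, the Gaussian kernel bandwidth is an arbitrary illustrative choice:

<syntaxhighlight lang="python">
import torch

def energy_distance_sq(z, w):
    # Squared energy distance between the laws of z and w, estimated from
    # samples as 2 E||Z-W|| - E||Z-Z'|| - E||W-W'|| (biased: the zero
    # diagonals of cdist(z, z) and cdist(w, w) are included in the means).
    return (2.0 * torch.cdist(z, w).mean()
            - torch.cdist(z, z).mean()
            - torch.cdist(w, w).mean())

def mmd_sq(z, w, bandwidth=1.0):
    # Squared maximum mean discrepancy with a Gaussian kernel
    # (biased V-statistic estimate over all pairs, diagonal included).
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2.0 * bandwidth ** 2))
    return k(z, z).mean() + k(w, w).mean() - 2.0 * k(z, w).mean()
</syntaxhighlight>

For example, <code>sdvae_loss(encoder, decoder, x, mmd_sq)</code> would yield an MMD-VAE-style objective, while <code>sdvae_loss(encoder, decoder, x, energy_distance_sq)</code> would use the energy distance; both estimators are differentiable, so the loss can be minimized by stochastic gradient descent.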
== See also ==
{{div col|colwidth=22em}}