Revision as of 03:04, 14 October 2024 by Snowman304 (→Statistical distance VAE variants: updated citations; tag: ProveIt edit)
Revision as of 03:05, 14 October 2024 by Citation bot (Alter: title, template type. Add: class, doi, pages, issue, volume, journal, eprint, authors 1-1. Removed parameters. Some additions/deletions were parameter name changes. Suggested by Snowman304; #UCB_toolbar)
Line 101:
== Statistical distance VAE variants ==

After the initial work of Diederik P. Kingma and [[Max Welling]],<ref>{{Cite arXiv |~~arxiv~~eprint=1312.6114 |class=stat.ML |~~first~~first1=Diederik P. |~~last~~last1=Kingma |first2=Max |last2=Welling |title=Auto-Encoding Variational Bayes |date=2022-12-10}}</ref> several procedures were proposed to formulate the operation of the VAE in a more abstract way. In these approaches the loss function is composed of two parts:
* the usual reconstruction error part, which seeks to ensure that the encoder-then-decoder mapping <math>x \mapsto D_\theta(E_\phi(x))</math> is as close to the identity map as possible; the sampling is done at run time from the empirical distribution <math>\mathbb{P}^{real}</math> of the objects available (e.g., for MNIST or ImageNet this will be the empirical probability law of all images in the dataset). This gives the term: <math> \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right]</math>.

Line 111:
The statistical distance <math>d</math> requires special properties; for instance, it has to possess a formula as an expectation, because the loss function will need to be optimized by [[Stochastic gradient descent|stochastic optimization algorithms]]. Several distances can be chosen, and this gave rise to several flavors of VAEs:
* the sliced Wasserstein distance used by S. Kolouri et al. in their VAE<ref>{{Cite conference |~~last~~last1=Kolouri |~~first~~first1=Soheil |last2=Pope |first2=Phillip E. |last3=Martin |first3=Charles E. |last4=Rohde |first4=Gustavo K. |date=2019 |title=Sliced Wasserstein Auto-Encoders |url=https://openreview.net/forum?id=H1xaJn05FQ |conference=International Conference on Learning Representations |publisher=ICPR |book-title=International Conference on Learning Representations}}</ref>
* the [[Energy distance|energy distance]] implemented in the Radon Sobolev Variational Auto-Encoder<ref>{{Cite journal |last=Turinici |first=Gabriel |year=2021 |title=Radon-Sobolev Variational Auto-Encoders |url=https://www.sciencedirect.com/science/article/pii/S0893608021001556 |journal=Neural Networks |volume=141 |pages=294–305 |arxiv=1911.13135 |doi=10.1016/j.neunet.2021.04.018 |issn=0893-6080 |pmid=33933889}}</ref>
* the [[Maximum Mean Discrepancy]] distance used in the MMD-VAE<ref>{{Cite ~~arXiv~~journal |arxiv=1705.02239 |~~first~~first1=A. |~~last~~last1=Gretton |first2=Y. |last2=Li |title=~~Maximum Mean Discrepancy Variational Autoencoders~~A Polya Contagion Model for Networks |date=2017 |last3=Swersky |first3=K. |last4=Zemel |first4=R. |last5=Turner |first5=R. |journal=IEEE Transactions on Control of Network Systems |volume=5 |issue=4 |pages=1998–2010 |doi=10.1109/TCNS.2017.2781467}}</ref>
* the [[Wasserstein distance]] used in the WAEs<ref>{{Cite arXiv |~~arxiv~~eprint=1711.01558 |~~first~~first1=I. |~~last~~last1=Tolstikhin |first2=O. |last2=Bousquet |title=Wasserstein Auto-Encoders |date=2018 |last3=Gelly |first3=S. |last4=Schölkopf |first4=B. |class=stat.ML}}</ref>
* kernel-based distances used in the Kernelized Variational Autoencoder (K-VAE)<ref>{{Cite arXiv |~~arxiv~~eprint=1901.02401 |~~first~~first1=C. |~~last~~last1=Louizos |first2=X. |last2=Shi |title=Kernelized Variational Autoencoders |date=2019 |last3=Swersky |first3=K. |last4=Li |first4=Y. |last5=Welling |first5=M. |class=astro-ph.CO}}</ref>

== See also ==

Variational autoencoder: Difference between revisions