Revision as of 03:11, 6 October 2024 by Citation bot (Bots; 5,861,751 edits): Altered template type. Add: arxiv, pmid, authors 1-1. Removed parameters. Some additions/deletions were parameter name changes. | Suggested by Jay8g | #UCB_toolbar
Revision as of 03:04, 14 October 2024 by Snowman304 (Extended confirmed users; 13,599 edits): →Statistical distance VAE variants: updated citations (Tag: ProveIt edit)
Line 101:
== Statistical distance VAE variants==
After the initial work of Diederik P. Kingma and [[Max Welling]],<ref>{{Cite arXiv |arxiv=1312.6114 |class=stat.ML |first=Diederik P. |last=Kingma |first2=Max |last2=Welling |title=Auto-Encoding Variational Bayes |date=2022-12-10}}</ref> several procedures were proposed to formulate the operation of the VAE in a more abstract way. In these approaches the loss function is composed of two parts:
* the usual reconstruction error part, which seeks to ensure that the encoder-then-decoder mapping <math>x \mapsto D_\theta(E_\phi(x))</math> is as close to the identity map as possible; the sampling is done at run time from the empirical distribution <math>\mathbb{P}^{real}</math> of objects available (e.g., for MNIST or IMAGENET this will be the empirical probability law of all images in the dataset). This gives the term: <math> \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right]</math>.
Line 111:
The statistical distance <math>d</math> requires special properties; for instance, it has to possess a formula as an expectation, because the loss function will need to be optimized by [[Stochastic gradient descent|stochastic optimization algorithms]]. Several distances can be chosen, and this gave rise to several flavors of VAEs:
* the sliced Wasserstein distance used by S Kolouri, et al. in their VAE<ref>{{Cite conference |last=Kolouri |first=Soheil |last2=Pope |first2=Phillip E. |last3=Martin |first3=Charles E. |last4=Rohde |first4=Gustavo K. |date=2019 |title=Sliced Wasserstein Auto-Encoders |url=https://openreview.net/forum?id=H1xaJn05FQ |conference=International Conference on Learning Representations |publisher=ICPR |book-title=International Conference on Learning Representations}}</ref>
* the [[Energy distance|energy distance]] implemented in the Radon Sobolev Variational Auto-Encoder<ref>{{Cite journal |last=Turinici |first=Gabriel |year=2021 |title=Radon-Sobolev Variational Auto-Encoders |url=https://www.sciencedirect.com/science/article/pii/S0893608021001556 |journal=Neural Networks |volume=141 |pages=294–305 |arxiv=1911.13135 |doi=10.1016/j.neunet.2021.04.018 |issn=0893-6080 |pmid=33933889}}</ref>
* the [[Maximum Mean Discrepancy]] distance used in the MMD-VAE<ref>{{Cite arXiv |arxiv=1705.02239 |first=A. |last=Gretton |first2=Y. |last2=Li |title=Maximum Mean Discrepancy Variational Autoencoders |date=2017 |last3=Swersky |first3=K. |last4=Zemel |first4=R. |last5=Turner |first5=R.}}</ref>
* the [[Wasserstein distance]] used in the WAEs<ref>{{Cite arXiv |arxiv=1711.01558 |first=I. |last=Tolstikhin |first2=O. |last2=Bousquet |title=Wasserstein Auto-Encoders |date=2018 |last3=Gelly |first3=S. |last4=Schölkopf |first4=B.}}</ref>
* kernel-based distances used in the Kernelized Variational Autoencoder (K-VAE)<ref>{{Cite arXiv |arxiv=1901.02401 |first=C. |last=Louizos |first2=X. |last2=Shi |title=Kernelized Variational Autoencoders |date=2019 |last3=Swersky |first3=K. |last4=Li |first4=Y. |last5=Welling |first5=M.}}</ref>
== See also ==
{{div col|colwidth=22em}}

Variational autoencoder: Difference between revisions