We obtain the final formula for the loss:
<math display="block"> L_{\theta,\phi} = \mathbb{E}_{x \sim \mathbb{P}^{real}} \left[ \|x - D_\theta(E_\phi(x))\|_2^2\right] + \lambda \cdot d\left( \mu, E_\phi \sharp \mathbb{P}^{real} \right)</math>
where <math>\mu</math> is the target latent distribution and <math>E_\phi \sharp \mathbb{P}^{real}</math> is the distribution of the encoded data.
The statistical distance <math>d</math> requires special properties: for instance, it has to possess a formula as an expectation, because the loss function has to be optimized by [[Stochastic gradient descent|stochastic optimization algorithms]]. Several distances can be chosen, and this gave rise to several flavors of VAEs:
* the sliced Wasserstein distance used by S. Kolouri et al. in their VAE<ref>{{Cite conference |last1=Kolouri |first1=Soheil |last2=Pope |first2=Phillip E. |last3=Martin |first3=Charles E. |last4=Rohde |first4=Gustavo K. |date=2019 |title=Sliced Wasserstein Auto-Encoders |url=https://openreview.net/forum?id=H1xaJn05FQ |conference=International Conference on Learning Representations |publisher=ICPR |book-title=International Conference on Learning Representations}}</ref>
* the [[energy distance]] implemented in the Radon Sobolev Variational Auto-Encoder<ref>{{Cite journal |last=Turinici |first=Gabriel |year=2021 |title=Radon-Sobolev Variational Auto-Encoders |url=https://www.sciencedirect.com/science/article/pii/S0893608021001556 |journal=Neural Networks |volume=141 |pages=294–305 |arxiv=1911.13135 |doi=10.1016/j.neunet.2021.04.018 |issn=0893-6080 |pmid=33933889}}</ref>
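
As an illustration, the following is a minimal sketch of such a loss with the sliced Wasserstein distance playing the role of <math>d</math>. It assumes a PyTorch encoder/decoder pair, a standard normal target latent distribution <math>\mu</math>, and an illustrative weight <code>lam</code>; the function names are not taken from the cited papers.

<syntaxhighlight lang="python">
import torch

def sliced_wasserstein(z, z_target, n_projections=50):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance
    between two sample clouds of shape (batch, latent_dim)."""
    latent_dim = z.shape[1]
    theta = torch.randn(n_projections, latent_dim, device=z.device)
    theta = theta / theta.norm(dim=1, keepdim=True)   # random unit directions
    proj_z = z @ theta.T                               # (batch, n_projections)
    proj_t = z_target @ theta.T
    # In one dimension, optimal transport matches sorted samples.
    proj_z, _ = torch.sort(proj_z, dim=0)
    proj_t, _ = torch.sort(proj_t, dim=0)
    return ((proj_z - proj_t) ** 2).mean()

def statistical_distance_vae_loss(encoder, decoder, x, lam=1.0):
    """Reconstruction error plus lam times d(target latent, encoded data)."""
    z = encoder(x)                                     # E_phi(x)
    x_hat = decoder(z)                                 # D_theta(E_phi(x))
    recon = ((x - x_hat) ** 2).sum(dim=1).mean()       # E ||x - D(E(x))||_2^2
    z_target = torch.randn_like(z)                     # samples from the N(0, I) target mu
    return recon + lam * sliced_wasserstein(z, z_target)
</syntaxhighlight>

Both terms of this loss are written as expectations over samples, which is precisely the property required for optimization by stochastic gradient methods.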