Variational autoencoder: Difference between revisions

Content deleted Content added
OAbot (talk | contribs)
m Open access bot: url-access updated in citation with #oabot.
Citation bot (talk | contribs)
Add: arxiv, bibcode. Removed URL that duplicated identifier. Removed parameters. | Use this bot. Report bugs. | Suggested by Headbomb | Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox | #UCB_webform_linked 819/967
 
(One intermediate revision by one other user not shown)
Line 12:
Thus, the encoder maps each point (such as an image) from a large, complex dataset to a distribution within the latent space, rather than to a single point in that space. The decoder performs the opposite function, mapping from the latent space back to the input space, again according to a distribution (although in practice, noise is rarely added during the decoding stage). By mapping each input to a distribution instead of a single point, the network avoids overfitting the training data. Both networks are typically trained together using the [[#Reparameterization|reparameterization trick]], although the variance of the noise model can be learned separately.{{cn|date=June 2024}}
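A minimal sketch of this encoder–decoder pair (written here in PyTorch, with the layer sizes, the diagonal Gaussian posterior, and the class names <code>Encoder</code> and <code>Decoder</code> chosen purely for illustration) might look as follows:

<syntaxhighlight lang="python">
from torch import nn


class Encoder(nn.Module):
    """Maps an input x to the mean and log-variance of a diagonal Gaussian q_phi(z | x)."""

    def __init__(self, input_dim: int, latent_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.mean = nn.Linear(hidden_dim, latent_dim)
        self.log_var = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = self.hidden(x)
        # The encoder outputs the parameters of a distribution, not a single latent point.
        return self.mean(h), self.log_var(h)


class Decoder(nn.Module):
    """Maps a latent code z back to the input space (here, the mean of p_theta(x | z))."""

    def __init__(self, latent_dim: int, output_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, z):
        return self.net(z)
</syntaxhighlight>

Sampling from the distribution returned by the encoder is what the reparameterization trick, discussed below, makes differentiable.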
 
Although this type of model was initially designed for [[unsupervised learning]],<ref>{{cite arXiv |last1=Dilokthanakul |first1=Nat |last2=Mediano |first2=Pedro A. M. |last3=Garnelo |first3=Marta |last4=Lee |first4=Matthew C. H. |last5=Salimbeni |first5=Hugh |last6=Arulkumaran |first6=Kai |last7=Shanahan |first7=Murray |title=Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders |date=2017-01-13 |class=cs.LG |eprint=1611.02648}}</ref><ref>{{cite book |last1=Hsu |first1=Wei-Ning |last2=Zhang |first2=Yu |last3=Glass |first3=James |title=2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |chapter=Unsupervised ___domain adaptation for robust speech recognition via variational autoencoder-based data augmentation |date=December 2017 |pages=16–23 |doi=10.1109/ASRU.2017.8268911 |arxiv=1707.06265 |isbn=978-1-5090-4788-8 |s2cid=22681625 |chapter-url=https://ieeexplore.ieee.org/document/8268911}}</ref> its effectiveness has been proven for [[semi-supervised learning]]<ref>{{cite book |last1=Ehsan Abbasnejad |first1=M. |last2=Dick |first2=Anthony |last3=van den Hengel |first3=Anton |title=Infinite Variational Autoencoder for Semi-Supervised Learning |date=2017 |pages=5888–5897 |url=https://openaccess.thecvf.com/content_cvpr_2017/html/Abbasnejad_Infinite_Variational_Autoencoder_CVPR_2017_paper.html}}</ref><ref>{{cite journal |last1=Xu |first1=Weidi |last2=Sun |first2=Haoze |last3=Deng |first3=Chao |last4=Tan |first4=Ying |title=Variational Autoencoder for Semi-Supervised Text Classification |journal=Proceedings of the AAAI Conference on Artificial Intelligence |date=2017-02-12 |volume=31 |issue=1 |doi=10.1609/aaai.v31i1.10966 |s2cid=2060721 |url=https://ojs.aaai.org/index.php/AAAI/article/view/10966 |language=en|doi-access=free }}</ref> and [[supervised learning]].<ref>{{cite journal |last1=Kameoka |first1=Hirokazu |last2=Li |first2=Li |last3=Inoue |first3=Shota |last4=Makino |first4=Shoji |title=Supervised Determined Source Separation with Multichannel Variational Autoencoder |journal=Neural Computation |date=2019-09-01 |volume=31 |issue=9 |pages=1891–1914 |doi=10.1162/neco_a_01217 |pmid=31335290 |s2cid=198168155 |url=https://direct.mit.edu/neco/article/31/9/1891/8494/Supervised-Determined-Source-Separation-with|url-access=subscription }}</ref>
 
== Overview of architecture and operation ==
Line 76:
 
It is straightforward to find<math display="block">\nabla_\theta \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right]
= \mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \nabla_\theta \ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right] </math>However, <math display="block">\nabla_\phi \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right] </math>does not allow one to put the <math>\nabla_\phi </math> inside the expectation, since <math>\phi </math> appears in the probability distribution itself. The '''reparameterization trick''' (also known as stochastic backpropagation<ref>{{Cite journal |last1=Rezende |first1=Danilo Jimenez |last2=Mohamed |first2=Shakir |last3=Wierstra |first3=Daan |date=2014-06-18 |title=Stochastic Backpropagation and Approximate Inference in Deep Generative Models |url=https://proceedings.mlr.press/v32/rezende14.html |journal=International Conference on Machine Learning |language=en |publisher=PMLR |pages=1278–1286|arxiv=1401.4082 }}</ref>) bypasses this difficulty.<ref name="Kingma2013"/><ref>{{Cite journal|last1=Bengio|first1=Yoshua|last2=Courville|first2=Aaron|last3=Vincent|first3=Pascal|title=Representation Learning: A Review and New Perspectives|url=https://ieeexplore.ieee.org/document/6472238|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2013|volume=35|issue=8|pages=1798–1828|doi=10.1109/TPAMI.2013.50|pmid=23787338|issn=1939-3539|arxiv=1206.5538|bibcode=2013ITPAM..35.1798B |s2cid=393948}}</ref><ref>{{Cite arXiv|last1=Kingma|first1=Diederik P.|last2=Rezende|first2=Danilo J.|last3=Mohamed|first3=Shakir|last4=Welling|first4=Max|date=2014-10-31|title=Semi-Supervised Learning with Deep Generative Models|class=cs.LG|eprint=1406.5298}}</ref>
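In general terms (stated here with a generic noise variable <math>\epsilon</math> and transformation <math>g_\phi</math>, extending the article's notation slightly for this sketch), the trick consists of writing the latent variable as a deterministic, differentiable function <math>z = g_\phi(x, \epsilon)</math> of a noise variable <math>\epsilon</math> whose distribution does not depend on <math>\phi</math>. The expectation can then be taken over <math>\epsilon</math>, which allows the gradient to move inside:<math display="block">\nabla_\phi \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right]
= \mathbb E_{\epsilon} \left[ \nabla_\phi \ln \frac{p_\theta(x, g_\phi(x, \epsilon))}{q_\phi(g_\phi(x, \epsilon) | x)}\right]</math>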
 
The most important example is when <math>z \sim q_\phi(\cdot | x) </math> is normally distributed, as <math>\mathcal N(\mu_\phi(x), \Sigma_\phi(x)) </math>.
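In that case (assuming, for simplicity of illustration, a diagonal covariance <math>\Sigma_\phi(x) = \operatorname{diag}(\sigma_\phi^2(x))</math>), a latent sample can be drawn as <math>z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon</math> with <math>\epsilon \sim \mathcal N(0, I)</math>. A minimal sketch in PyTorch (the function name <code>reparameterize</code> and the log-variance parameterization are illustrative choices, not part of any reference implementation):

<syntaxhighlight lang="python">
import torch


def reparameterize(mean: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Sample z ~ N(mean, diag(exp(log_var))) as a differentiable function of mean and log_var."""
    std = torch.exp(0.5 * log_var)  # standard deviation sigma_phi(x)
    eps = torch.randn_like(std)     # noise whose distribution does not depend on phi
    return mean + std * eps         # gradients flow through mean and std
</syntaxhighlight>

Because the randomness enters only through <code>eps</code>, whose distribution is fixed, gradients with respect to <math>\phi</math> propagate through <code>mean</code> and <code>std</code> by ordinary backpropagation.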
Line 97:
Some structures directly address the quality of the generated samples<ref>{{Cite arXiv|last1=Dai|first1=Bin|last2=Wipf|first2=David|date=2019-10-30|title=Diagnosing and Enhancing VAE Models|class=cs.LG|eprint=1903.05789}}</ref><ref>{{Cite arXiv|last1=Dorta|first1=Garoe|last2=Vicente|first2=Sara|last3=Agapito|first3=Lourdes|last4=Campbell|first4=Neill D. F.|last5=Simpson|first5=Ivor|date=2018-07-31|title=Training VAEs Under Structured Residuals|class=stat.ML|eprint=1804.01050}}</ref> or employ more than one latent space to further improve representation learning.
 
Some architectures mix VAE and [[generative adversarial network]]s to obtain hybrid models.<ref>{{Cite journal|last1=Larsen|first1=Anders Boesen Lindbo|last2=Sønderby|first2=Søren Kaae|last3=Larochelle|first3=Hugo|last4=Winther|first4=Ole|date=2016-06-11|title=Autoencoding beyond pixels using a learned similarity metric|url=http://proceedings.mlr.press/v48/larsen16.html|journal=International Conference on Machine Learning|language=en|publisher=PMLR|pages=1558–1566|arxiv=1512.09300}}</ref><ref>{{cite arXiv|last1=Bao|first1=Jianmin|last2=Chen|first2=Dong|last3=Wen|first3=Fang|last4=Li|first4=Houqiang|last5=Hua|first5=Gang|date=2017|title=CVAE-GAN: Fine-Grained Image Generation Through Asymmetric Training|pages=2745–2754|class=cs.CV|eprint=1703.10155}}</ref><ref>{{Cite journal|last1=Gao|first1=Rui|last2=Hou|first2=Xingsong|last3=Qin|first3=Jie|last4=Chen|first4=Jiaxin|last5=Liu|first5=Li|last6=Zhu|first6=Fan|last7=Zhang|first7=Zhao|last8=Shao|first8=Ling|date=2020|title=Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning|url=https://ieeexplore.ieee.org/document/8957359|journal=IEEE Transactions on Image Processing|volume=29|pages=3665–3680|doi=10.1109/TIP.2020.2964429|pmid=31940538|bibcode=2020ITIP...29.3665G|s2cid=210334032|issn=1941-0042|url-access=subscription}}</ref>
 
It is not necessary to use gradients to update the encoder. In fact, the encoder is not needed at all for the generative model.<ref>{{cite book | last1=Drefs | first1=J. | last2=Guiraud | first2=E. | last3=Panagiotou | first3=F. | last4=Lücke | first4=J. | chapter=Direct evolutionary optimization of variational autoencoders with binary latents | title=Joint European Conference on Machine Learning and Knowledge Discovery in Databases | series=Lecture Notes in Computer Science | pages=357–372 | year=2023 | volume=13715 | publisher=Springer Nature Switzerland | doi=10.1007/978-3-031-26409-2_22 | arxiv=2011.13704 | isbn=978-3-031-26408-5 | chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-26409-2_22 }}</ref>
 
== Statistical distance VAE variants ==
Line 146:
[[Category:Bayesian statistics]]
[[Category:Dimension reduction]]
[[Category:2013 in artificial intelligence]]