Variational autoencoder: Difference between revisions

Revision as of 14:55, 25 May 2025, by OAbot (bot, minor edit): Open access bot: url-access updated in citation with #oabot.
Latest revision as of 16:21, 27 August 2025, by Citation bot (bot): Add: arxiv, bibcode. Removed URL that duplicated identifier. Removed parameters. Suggested by Headbomb. Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox. #UCB_webform_linked 819/967
(One intermediate revision by one other user not shown)
Line 12:
Thus, the encoder maps each point (such as an image) from a large complex dataset into a distribution within the latent space, rather than to a single point in that space. The decoder has the opposite function, which is to map from the latent space to the input space, again according to a distribution (although in practice, noise is rarely added during the decoding stage). By mapping a point to a distribution instead of a single point, the network can avoid overfitting the training data. Both networks are typically trained together using the [[#Reparameterization|reparameterization trick]], although the variance of the noise model can be learned separately.{{cn|date=June 2024}} Although this type of model was initially designed for [[unsupervised learning]],<ref>{{cite arXiv |last1=Dilokthanakul |first1=Nat |last2=Mediano |first2=Pedro A. M. |last3=Garnelo |first3=Marta |last4=Lee |first4=Matthew C. H. |last5=Salimbeni |first5=Hugh |last6=Arulkumaran |first6=Kai |last7=Shanahan |first7=Murray |title=Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders |date=2017-01-13 |class=cs.LG |eprint=1611.02648}}</ref><ref>{{cite book |last1=Hsu |first1=Wei-Ning |last2=Zhang |first2=Yu |last3=Glass |first3=James |title=2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |chapter=Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation |date=December 2017 |pages=16–23 |doi=10.1109/ASRU.2017.8268911 |arxiv=1707.06265 |isbn=978-1-5090-4788-8 |s2cid=22681625 }}</ref> it has also proven effective for [[semi-supervised learning]]<ref>{{cite book |last1=Ehsan Abbasnejad |first1=M.
|last2=Dick |first2=Anthony |last3=van den Hengel |first3=Anton |title=Infinite Variational Autoencoder for Semi-Supervised Learning |date=2017 |pages=5888–5897 |url=https://openaccess.thecvf.com/content_cvpr_2017/html/Abbasnejad_Infinite_Variational_Autoencoder_CVPR_2017_paper.html}}</ref><ref>{{cite journal |last1=Xu |first1=Weidi |last2=Sun |first2=Haoze |last3=Deng |first3=Chao |last4=Tan |first4=Ying |title=Variational Autoencoder for Semi-Supervised Text Classification |journal=Proceedings of the AAAI Conference on Artificial Intelligence |date=2017-02-12 |volume=31 |issue=1 |doi=10.1609/aaai.v31i1.10966 |s2cid=2060721 |url=https://ojs.aaai.org/index.php/AAAI/article/view/10966 |language=en|doi-access=free }}</ref> and [[supervised learning]].<ref>{{cite journal |last1=Kameoka |first1=Hirokazu |last2=Li |first2=Li |last3=Inoue |first3=Shota |last4=Makino |first4=Shoji |title=Supervised Determined Source Separation with Multichannel Variational Autoencoder |journal=Neural Computation |date=2019-09-01 |volume=31 |issue=9 |pages=1891–1914 |doi=10.1162/neco_a_01217 |pmid=31335290 |s2cid=198168155 |url=https://direct.mit.edu/neco/article/31/9/1891/8494/Supervised-Determined-Source-Separation-with|url-access=subscription }}</ref>

== Overview of architecture and operation ==

Line 76:
It is straightforward to find<math display="block">\nabla_\theta \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right] = \mathbb E_{z \sim q_\phi(\cdot | x)} \left[ \nabla_\theta \ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right] </math>However, <math display="block">\nabla_\phi \mathbb E_{z \sim q_\phi(\cdot | x)} \left[\ln \frac{p_\theta(x, z)}{q_\phi({z| x})}\right] </math>does not allow one to put the <math>\nabla_\phi </math> inside the expectation, since <math>\phi </math> appears in the probability distribution itself.
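A toy numerical check can make this point concrete. The sketch below rests on assumptions that are not in the article: a one-dimensional sampling distribution N(mu, 1) whose only parameter is mu, and a hypothetical integrand f(z) = z**2, for which the expectation is mu**2 + 1 and its gradient with respect to mu is 2*mu. The function names are illustrative only. Moving the gradient inside the expectation while holding each sample fixed yields 0, which is not the gradient of the expectation.

```python
import random


def mc_expectation(mu, n=100_000, seed=0):
    """Monte Carlo estimate of E_{z ~ N(mu, 1)}[z**2] (exact value: mu**2 + 1)."""
    rng = random.Random(seed)
    return sum(rng.gauss(mu, 1.0) ** 2 for _ in range(n)) / n


def gradient_of_expectation(mu, h=1e-3):
    """Central finite difference of the expectation with respect to mu.

    Both evaluations reuse the same seed, so the sampling noise largely
    cancels and the estimate stays close to the exact gradient 2 * mu.
    """
    return (mc_expectation(mu + h) - mc_expectation(mu - h)) / (2 * h)


def naive_inside_gradient(mu, n=100_000, seed=0):
    """Average of d/dmu f(z) with each sample z treated as a constant.

    Once z is fixed, z**2 has no dependence on mu, so every term is zero.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(mu, 1.0)
        total += 0.0 * z  # d/dmu of z**2 with z held fixed
    return total / n


if __name__ == "__main__":
    mu = 1.5
    print("gradient of the expectation:", gradient_of_expectation(mu))
    print("naive gradient inside:      ", naive_inside_gradient(mu))
```

In this toy case the dependence on mu enters only through the sampling distribution, which is exactly the part the naive estimator ignores.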
The '''reparameterization trick''' (also known as stochastic backpropagation<ref>{{Cite journal |last1=Rezende |first1=Danilo Jimenez |last2=Mohamed |first2=Shakir |last3=Wierstra |first3=Daan |date=2014-06-18 |title=Stochastic Backpropagation and Approximate Inference in Deep Generative Models |url=https://proceedings.mlr.press/v32/rezende14.html |journal=International Conference on Machine Learning |language=en |publisher=PMLR |pages=1278–1286|arxiv=1401.4082 }}</ref>) bypasses this difficulty.<ref name="Kingma2013"/><ref>{{Cite journal|last1=Bengio|first1=Yoshua|last2=Courville|first2=Aaron|last3=Vincent|first3=Pascal|title=Representation Learning: A Review and New Perspectives|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2013|volume=35|issue=8|pages=1798–1828|doi=10.1109/TPAMI.2013.50|pmid=23787338|issn=1939-3539|arxiv=1206.5538|bibcode=2013ITPAM..35.1798B |s2cid=393948}}</ref><ref>{{Cite arXiv|last1=Kingma|first1=Diederik P.|last2=Rezende|first2=Danilo J.|last3=Mohamed|first3=Shakir|last4=Welling|first4=Max|date=2014-10-31|title=Semi-Supervised Learning with Deep Generative Models|class=cs.LG|eprint=1406.5298}}</ref> The most important example is when <math>z \sim q_\phi(\cdot | x) </math> is normally distributed, as <math>\mathcal N(\mu_\phi(x), \Sigma_\phi(x)) </math>.

Line 97:
Some structures directly deal with the quality of the generated samples<ref>{{Cite arXiv|last1=Dai|first1=Bin|last2=Wipf|first2=David|date=2019-10-30|title=Diagnosing and Enhancing VAE Models|class=cs.LG|eprint=1903.05789}}</ref><ref>{{Cite arXiv|last1=Dorta|first1=Garoe|last2=Vicente|first2=Sara|last3=Agapito|first3=Lourdes|last4=Campbell|first4=Neill D.
F.|last5=Simpson|first5=Ivor|date=2018-07-31|title=Training VAEs Under Structured Residuals|class=stat.ML|eprint=1804.01050}}</ref> or implement more than one latent space to further improve the representation learning. Some architectures combine VAEs and [[generative adversarial network]]s to obtain hybrid models.<ref>{{Cite journal|last1=Larsen|first1=Anders Boesen Lindbo|last2=Sønderby|first2=Søren Kaae|last3=Larochelle|first3=Hugo|last4=Winther|first4=Ole|date=2016-06-11|title=Autoencoding beyond pixels using a learned similarity metric|url=http://proceedings.mlr.press/v48/larsen16.html|journal=International Conference on Machine Learning|language=en|publisher=PMLR|pages=1558–1566|arxiv=1512.09300}}</ref><ref>{{cite arXiv|last1=Bao|first1=Jianmin|last2=Chen|first2=Dong|last3=Wen|first3=Fang|last4=Li|first4=Houqiang|last5=Hua|first5=Gang|date=2017|title=CVAE-GAN: Fine-Grained Image Generation Through Asymmetric Training|pages=2745–2754|class=cs.CV|eprint=1703.10155}}</ref><ref>{{Cite journal|last1=Gao|first1=Rui|last2=Hou|first2=Xingsong|last3=Qin|first3=Jie|last4=Chen|first4=Jiaxin|last5=Liu|first5=Li|last6=Zhu|first6=Fan|last7=Zhang|first7=Zhao|last8=Shao|first8=Ling|date=2020|title=Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning|journal=IEEE Transactions on Image Processing|volume=29|pages=3665–3680|doi=10.1109/TIP.2020.2964429|pmid=31940538|bibcode=2020ITIP...29.3665G|s2cid=210334032|issn=1941-0042}}</ref> It is not necessary to use gradients to update the encoder. In fact, the encoder is not necessary for the generative model.<ref>{{cite book | last1=Drefs | first1=J. | last2=Guiraud | first2=E. | last3=Panagiotou | first3=F. | last4=Lücke | first4=J.
| chapter=Direct evolutionary optimization of variational autoencoders with binary latents | title=Joint European Conference on Machine Learning and Knowledge Discovery in Databases | series=Lecture Notes in Computer Science | pages=357–372 | year=2023 | volume=13715 | publisher=Springer Nature Switzerland | doi=10.1007/978-3-031-26409-2_22 | arxiv=2011.13704 | isbn=978-3-031-26408-5 }}</ref>

== Statistical distance VAE variants ==

Line 146:
[[Category:Bayesian statistics]]
[[Category:Dimension reduction]]
[[Category:2013 in artificial intelligence]]