Autoencoder

 
In the ideal setting, the code dimension and the model capacity would be set according to the complexity of the data distribution being modeled. A standard way to control them is to add modifications to the basic autoencoder, detailed below.<ref name=":0" />
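For concreteness, the following minimal sketch (illustrative only; the framework, layer sizes and training loop are assumptions, not taken from the cited source) shows a basic autoencoder in which the bottleneck width <code>code_dim</code> plays the role of the code dimension:

<syntaxhighlight lang="python">
# Minimal autoencoder sketch (illustrative only; layer sizes are arbitrary assumptions).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim: int = 784, code_dim: int = 32):
        super().__init__()
        # Encoder: maps the input to a low-dimensional code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        # Decoder: maps the code back to a reconstruction of the input.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# Training minimizes the reconstruction error between x and its reconstruction.
model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(64, 784)          # stand-in batch; real data would go here
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), x)   # reconstruction loss
    loss.backward()
    optimizer.step()
</syntaxhighlight>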
 
Autoencoders have been interpreted as nonlinear [[principal component analysis]]<ref name=":12" /> and with the [[Minimum description length|minimum description length principle]].<ref name=":15" />
 
==Variations==
 
 
==== Minimum description length autoencoder ====
<ref name=":15">{{Cite journal |last1=Hinton |first1=Geoffrey E |last2=Zemel |first2=Richard |date=1993 |title=Autoencoders, Minimum Description Length and Helmholtz Free Energy |url=https://proceedings.neurips.cc/paper/1993/hash/9e3cfc48eccf81a0d57663e129aef3cb-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Morgan-Kaufmann |volume=6}}</ref>
{{Empty section|date=March 2024}}
 
 
== History ==
(Oja, 1982)<ref>{{Cite journal |last=Oja |first=Erkki |date=1982-11-01 |title=Simplified neuron model as a principal component analyzer |url=https://link.springer.com/article/10.1007/BF00275687 |journal=Journal of Mathematical Biology |language=en |volume=15 |issue=3 |pages=267–273 |doi=10.1007/BF00275687 |issn=1432-1416}}</ref> noted that [[Principal component analysis|PCA]] is equivalent to a neural network with one hidden layer and an identity activation function. In the language of autoencoding, the input-to-hidden module is the encoder, and the hidden-to-output module is the decoder. Subsequently, (Baldi and Hornik, 1989)<ref name="baldi1989">{{Cite journal |last1=Baldi |first1=Pierre |last2=Hornik |first2=Kurt |date=1989-01-01 |title=Neural networks and principal component analysis: Learning from examples without local minima |url=https://www.sciencedirect.com/science/article/abs/pii/0893608089900142 |journal=Neural Networks |volume=2 |issue=1 |pages=53–58 |doi=10.1016/0893-6080(89)90014-2 |issn=0893-6080}}</ref> and (Kramer, 1991)<ref name=":12" /> generalized PCA to autoencoders, which they termed "nonlinear PCA".
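This equivalence can be illustrated numerically: a linear autoencoder trained by gradient descent to minimize reconstruction error projects the data onto the same subspace as the leading principal components. The sketch below (illustrative only; the synthetic data, dimensions, learning rate and iteration count are assumptions, not taken from the cited papers) compares the two reconstructions:

<syntaxhighlight lang="python">
# Linear autoencoder vs. PCA: both recover the same principal subspace.
# Illustrative sketch; data, dimensions and training schedule are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 10, 3                      # samples, input dim, code dim

# Synthetic data with a dominant 3-dimensional structure, then centered.
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.05 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# PCA reconstruction onto the top-k principal components (via SVD).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:k].T @ Vt[:k]

# Linear autoencoder: encoder W_e (k x d), decoder W_d (d x k), no nonlinearity.
W_e = 0.1 * rng.normal(size=(k, d))
W_d = 0.1 * rng.normal(size=(d, k))
lr = 0.01
for _ in range(5000):
    Z = X @ W_e.T                         # codes
    R = Z @ W_d.T - X                     # reconstruction residual
    grad_W_d = 2.0 / n * R.T @ Z
    grad_W_e = 2.0 / n * W_d.T @ R.T @ X
    W_d -= lr * grad_W_d
    W_e -= lr * grad_W_e

X_ae = X @ W_e.T @ W_d.T
# Once training has converged, the two reconstructions should nearly coincide,
# since both project onto the top-k principal subspace.
print("relative difference:", np.linalg.norm(X_ae - X_pca) / np.linalg.norm(X_pca))
</syntaxhighlight>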
 
Immediately after the resurgence of neural networks in the 1980s, it was suggested in 1986<ref>{{Cite book |last=Rumelhart |first=David E. |url=https://direct.mit.edu/books/book/4424/Parallel-Distributed-ProcessingExplorations-in-the |title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations |last2=McClelland |first2=James L. |date=1986 |publisher=The MIT Press |isbn=978-0-262-29140-8 |language=en |chapter=2. A General Framework for Parallel Distributed Processing |doi=10.7551/mitpress/5236.001.0001}}</ref> that a neural network be put in "auto-association mode". This was then implemented in (Harrison, 1987)<ref>Harrison TD (1987) A Connectionist framework for continuous speech recognition. Cambridge University PhD dissertation</ref> and (Elman, Zipser, 1988)<ref>{{Cite journal |last1=Elman |first1=Jeffrey L. |last2=Zipser |first2=David |date=1988-04-01 |title=Learning the hidden structure of speech |url=https://pubs.aip.org/jasa/article/83/4/1615/826094/Learning-the-hidden-structure-of-speechLearning |journal=The Journal of the Acoustical Society of America |language=en |volume=83 |issue=4 |pages=1615–1626 |doi=10.1121/1.395916 |issn=0001-4966}}</ref> for speech and in (Cottrell, Munro, Zipser, 1987)<ref>{{Cite journal |last1=Cottrell |first1=Garrison W. |last2=Munro |first2=Paul |last3=Zipser |first3=David |date=1987 |title=Learning Internal Representation From Gray-Scale Images: An Example of Extensional Programming |url=https://escholarship.org/uc/item/2zs7w6z8 |journal=Proceedings of the Annual Meeting of the Cognitive Science Society |language=en |volume=9 |issue=0}}</ref> for images.<ref name=":14" /> In (Hinton, Salakhutdinov, 2006),<ref name=":72">{{cite journal |last1=Hinton |first1=G. E. |last2=Salakhutdinov |first2=R.R. |date=28 July 2006 |title=Reducing the Dimensionality of Data with Neural Networks |journal=Science |volume=313 |issue=5786 |pages=504–507 |bibcode=2006Sci...313..504H |doi=10.1126/science.1127647 |pmid=16873662 |s2cid=1658773}}</ref> [[Deep belief network|deep belief networks]] were developed. These train a pair of [[Restricted Boltzmann machine|restricted Boltzmann machines]] as an encoder-decoder pair, then train another pair on the latent representation of the first pair, and so on.<ref name="scholar">{{Cite journal |vauthors=Hinton G |year=2009 |title=Deep belief networks |journal=Scholarpedia |volume=4 |issue=5 |pages=5947 |bibcode=2009SchpJ...4.5947H |doi=10.4249/scholarpedia.5947 |doi-access=free}}</ref>
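The greedy, layer-by-layer scheme described above can be sketched as follows (a simplified illustration with binary units and one-step contrastive divergence; the layer sizes, toy data and hyperparameters are assumptions, and this is not the exact procedure of Hinton and Salakhutdinov):

<syntaxhighlight lang="python">
# Greedy layer-wise pretraining with stacked RBMs (simplified illustration).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.normal(size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def hidden_probs(self, v):           # encoder direction
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):          # decoder direction
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.05):
        # One step of contrastive divergence on a batch of visible vectors v0.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W   += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

# Toy binary data standing in for a real training set.
data = (rng.random((256, 64)) < 0.3).astype(float)

# Train the first RBM on the data, the second on the first RBM's hidden
# representation, and so on -- each trained RBM acts as one encoder/decoder stage.
layer_sizes = [64, 32, 16]
rbms, inputs = [], data
for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
    rbm = RBM(n_vis, n_hid)
    for _ in range(200):
        rbm.cd1_step(inputs)
    rbms.append(rbm)
    inputs = rbm.hidden_probs(inputs)    # latent representation feeds the next pair

# The stacked encoders map the data to a 16-dimensional code; the decoders,
# applied in reverse order, map the code back to a reconstruction.
code = data
for rbm in rbms:
    code = rbm.hidden_probs(code)
recon = code
for rbm in reversed(rbms):
    recon = rbm.visible_probs(recon)
print("code shape:", code.shape, " reconstruction shape:", recon.shape)
</syntaxhighlight>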
 
The first applications of AE date to the early 1990s.<ref name=":0" /><ref name="schmidhuber2015">{{Cite journal |last=Schmidhuber |first=Jürgen |date=January 2015 |title=Deep learning in neural networks: An overview |journal=Neural Networks |volume=61 |pages=85–117 |arxiv=1404.7828 |doi=10.1016/j.neunet.2014.09.003 |pmid=25462637 |s2cid=11715509}}</ref><ref name=":15" /> Their most traditional application was [[dimensionality reduction]] or [[feature learning]], but the concept became widely used for learning [[generative model]]s of data.<ref name=":0" /><ref name="schmidhuber2015" /><ref name="VAE">{{cite arXiv |eprint=1312.6114 |class=stat.ML |first1=Diederik P. |last1=Kingma |first2=Max |last2=Welling |title=Auto-Encoding Variational Bayes |date=2013}}</ref><ref name="gan_faces">Generating Faces with Torch, Boesen A., Larsen L. and Sonderby S.K., 2015 {{url|http://torch.ch/blog/2015/11/13/gan.html}}</ref> Some of the most powerful [[Artificial intelligence|AIs]] in the 2010s involved autoencoder modules as components of larger AI systems, such as the VAE in [[Stable Diffusion]] and the discrete VAE in Transformer-based image generators like [[DALL-E|DALL-E 1]].
 
In the early days, when the terminology was not yet settled, the autoencoder was also called identity mapping,<ref name="baldi1989" /><ref name=":12" /> auto-associating,<ref>{{Cite journal |last1=Ackley |first1=D. |last2=Hinton |first2=G. |last3=Sejnowski |first3=T. |date=March 1985 |title=A learning algorithm for Boltzmann machines |url=http://doi.wiley.com/10.1016/S0364-0213(85)80012-4 |journal=Cognitive Science |language=en |volume=9 |issue=1 |pages=147–169 |doi=10.1016/S0364-0213(85)80012-4}}</ref> self-supervised backpropagation,<ref name=":12" /> or the Diabolo network.<ref>{{Cite journal |last1=Schwenk |first1=Holger |last2=Bengio |first2=Yoshua |date=1997 |title=Training Methods for Adaptive Boosting of Neural Networks |url=https://proceedings.neurips.cc/paper/1997/hash/9cb67ffb59554ab1dabb65bcb370ddd9-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=MIT Press |volume=10}}</ref><ref name="bengio" />
* [[Sparse dictionary learning]]
* [[Deep learning]]
 
== Further reading ==
 
* {{cite book |last=Bank |first=Dor |title=Machine Learning for Data Science Handbook |last2=Koenigstein |first2=Noam |last3=Giryes |first3=Raja |publisher=Springer International Publishing |year=2023 |isbn=978-3-031-24627-2 |publication-place=Cham |chapter=Autoencoders |doi=10.1007/978-3-031-24628-9_16}}
* {{Cite book |last=Goodfellow |first=Ian |title=Deep learning |last2=Bengio |first2=Yoshua |last3=Courville |first3=Aaron |date=2016 |publisher=The MIT press |isbn=978-0-262-03561-3 |series=Adaptive computation and machine learning |___location=Cambridge, Mass |chapter=14. Autoencoders |chapter-url=https://www.deeplearningbook.org/contents/autoencoders.html}}
 
==References==