Autoencoder

 
In the ideal setting, the code dimension and the model capacity would be set according to the complexity of the data distribution being modeled. A standard way to control them is to add modifications to the basic autoencoder, detailed below.<ref name=":0" />
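For concreteness, the following minimal sketch (illustrative only; the framework, layer sizes and training loop are assumptions, not taken from the cited source) shows a basic autoencoder in which the bottleneck width <code>code_dim</code> plays the role of the code dimension:

<syntaxhighlight lang="python">
# Minimal autoencoder sketch (illustrative only; layer sizes are arbitrary assumptions).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim: int = 784, code_dim: int = 32):
        super().__init__()
        # Encoder: maps the input to a low-dimensional code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        # Decoder: maps the code back to a reconstruction of the input.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# Training minimizes the reconstruction error between x and its reconstruction.
model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(64, 784)          # stand-in batch; real data would go here
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), x)   # reconstruction loss
    loss.backward()
    optimizer.step()
</syntaxhighlight>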
 
Autoencoders have been interpreted as nonlinear [[principal component analysis]]<ref name=":12" /> and with the [[Minimum description length|minimum description length principle]].<ref name=":15" />
 
==Variations==
 
 
==== Minimum description length autoencoder ====
<ref name=":15">{{Cite journal |last1=Hinton |first1=Geoffrey E |last2=Zemel |first2=Richard |date=1993 |title=Autoencoders, Minimum Description Length and Helmholtz Free Energy |url=https://proceedings.neurips.cc/paper/1993/hash/9e3cfc48eccf81a0d57663e129aef3cb-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Morgan-Kaufmann |volume=6}}</ref>
{{Empty section|date=March 2024}}
 
 
== History ==
(Oja, 1982)<ref>{{Cite journal |last=Oja |first=Erkki |date=1982-11-01 |title=Simplified neuron model as a principal component analyzer |url=https://link.springer.com/article/10.1007/BF00275687 |journal=Journal of Mathematical Biology |language=en |volume=15 |issue=3 |pages=267–273 |doi=10.1007/BF00275687 |issn=1432-1416}}</ref> noted that [[Principal component analysis|PCA]] is equivalent to a neural network with one hidden layer and an identity activation function. In the language of autoencoding, the input-to-hidden module is the encoder, and the hidden-to-output module is the decoder. Subsequently, (Baldi and Hornik, 1989)<ref name="baldi1989">{{Cite journal |last1=Baldi |first1=Pierre |last2=Hornik |first2=Kurt |date=1989-01-01 |title=Neural networks and principal component analysis: Learning from examples without local minima |url=https://www.sciencedirect.com/science/article/abs/pii/0893608089900142 |journal=Neural Networks |volume=2 |issue=1 |pages=53–58 |doi=10.1016/0893-6080(89)90014-2 |issn=0893-6080}}</ref> and (Kramer, 1991)<ref name=":12" /> generalized PCA to autoencoders, which they termed "nonlinear PCA".
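This equivalence can be illustrated numerically: a linear autoencoder trained by gradient descent to minimize reconstruction error projects the data onto the same subspace as the leading principal components. The sketch below (illustrative only; the synthetic data, dimensions, learning rate and iteration count are assumptions, not taken from the cited papers) compares the two reconstructions:

<syntaxhighlight lang="python">
# Linear autoencoder vs. PCA: both recover the same principal subspace.
# Illustrative sketch; data, dimensions and training schedule are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 10, 3                      # samples, input dim, code dim

# Synthetic data with a dominant 3-dimensional structure, then centered.
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.05 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# PCA reconstruction onto the top-k principal components (via SVD).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:k].T @ Vt[:k]

# Linear autoencoder: encoder W_e (k x d), decoder W_d (d x k), no nonlinearity.
W_e = 0.1 * rng.normal(size=(k, d))
W_d = 0.1 * rng.normal(size=(d, k))
lr = 0.01
for _ in range(5000):
    Z = X @ W_e.T                         # codes
    R = Z @ W_d.T - X                     # reconstruction residual
    grad_W_d = 2.0 / n * R.T @ Z
    grad_W_e = 2.0 / n * W_d.T @ R.T @ X
    W_d -= lr * grad_W_d
    W_e -= lr * grad_W_e

X_ae = X @ W_e.T @ W_d.T
# Once training has converged, the two reconstructions should nearly coincide,
# since both project onto the top-k principal subspace.
print("relative difference:", np.linalg.norm(X_ae - X_pca) / np.linalg.norm(X_pca))
</syntaxhighlight>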
 
Immediately after the resurgence of neural networks in the 1980s, it was suggested in 1986<ref>{{Cite book |last=Rumelhart |first=David E. |url=https://direct.mit.edu/books/book/4424/Parallel-Distributed-ProcessingExplorations-in-the |title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations |last2=McClelland |first2=James L. |date=1986 |publisher=The MIT Press |isbn=978-0-262-29140-8 |language=en |chapter=2. A General Framework for Parallel Distributed Processing |doi=10.7551/mitpress/5236.001.0001}}</ref> that a neural network be put in "auto-association mode". This was then implemented in (Harrison, 1987)<ref>Harrison TD (1987) A Connectionist framework for continuous speech recognition. Cambridge University PhD dissertation</ref> and (Elman, Zipser, 1988)<ref>{{Cite journal |last1=Elman |first1=Jeffrey L. |last2=Zipser |first2=David |date=1988-04-01 |title=Learning the hidden structure of speech |url=https://pubs.aip.org/jasa/article/83/4/1615/826094/Learning-the-hidden-structure-of-speechLearning |journal=The Journal of the Acoustical Society of America |language=en |volume=83 |issue=4 |pages=1615–1626 |doi=10.1121/1.395916 |issn=0001-4966}}</ref> for speech and in (Cottrell, Munro, Zipser, 1987)<ref>{{Cite journal |last1=Cottrell |first1=Garrison W. |last2=Munro |first2=Paul |last3=Zipser |first3=David |date=1987 |title=Learning Internal Representation From Gray-Scale Images: An Example of Extensional Programming |url=https://escholarship.org/uc/item/2zs7w6z8 |journal=Proceedings of the Annual Meeting of the Cognitive Science Society |language=en |volume=9 |issue=0}}</ref> for images.<ref name=":14" /> In (Hinton, Salakhutdinov, 2006),<ref name=":72">{{cite journal |last1=Hinton |first1=G. E. |last2=Salakhutdinov |first2=R.R. |date=28 July 2006 |title=Reducing the Dimensionality of Data with Neural Networks |journal=Science |volume=313 |issue=5786 |pages=504–507 |bibcode=2006Sci...313..504H |doi=10.1126/science.1127647 |pmid=16873662 |s2cid=1658773}}</ref> [[Deep belief network|deep belief networks]] were developed. These train a pair of [[Restricted Boltzmann machine|restricted Boltzmann machines]] as an encoder-decoder pair, then train another pair on the latent representation of the first pair, and so on.<ref name="scholar">{{Cite journal |vauthors=Hinton G |year=2009 |title=Deep belief networks |journal=Scholarpedia |volume=4 |issue=5 |pages=5947 |bibcode=2009SchpJ...4.5947H |doi=10.4249/scholarpedia.5947 |doi-access=free}}</ref>
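The greedy, layer-by-layer scheme described above can be sketched as follows (a simplified illustration with binary units and one-step contrastive divergence; the layer sizes, toy data and hyperparameters are assumptions, and this is not the exact procedure of Hinton and Salakhutdinov):

<syntaxhighlight lang="python">
# Greedy layer-wise pretraining with stacked RBMs (simplified illustration).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.normal(size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def hidden_probs(self, v):           # encoder direction
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):          # decoder direction
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.05):
        # One step of contrastive divergence on a batch of visible vectors v0.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W   += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

# Toy binary data standing in for a real training set.
data = (rng.random((256, 64)) < 0.3).astype(float)

# Train the first RBM on the data, the second on the first RBM's hidden
# representation, and so on -- each trained RBM acts as one encoder/decoder stage.
layer_sizes = [64, 32, 16]
rbms, inputs = [], data
for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
    rbm = RBM(n_vis, n_hid)
    for _ in range(200):
        rbm.cd1_step(inputs)
    rbms.append(rbm)
    inputs = rbm.hidden_probs(inputs)    # latent representation feeds the next pair

# The stacked encoders map the data to a 16-dimensional code; the decoders,
# applied in reverse order, map the code back to a reconstruction.
code = data
for rbm in rbms:
    code = rbm.hidden_probs(code)
recon = code
for rbm in reversed(rbms):
    recon = rbm.visible_probs(recon)
print("code shape:", code.shape, " reconstruction shape:", recon.shape)
</syntaxhighlight>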
 
The first applications of AE date to the early 1990s.<ref name=":0" /><ref name="schmidhuber2015">{{Cite journal |last=Schmidhuber |first=Jürgen |date=January 2015 |title=Deep learning in neural networks: An overview |journal=Neural Networks |volume=61 |pages=85–117 |arxiv=1404.7828 |doi=10.1016/j.neunet.2014.09.003 |pmid=25462637 |s2cid=11715509}}</ref><ref name=":15" /> Their most traditional application was [[dimensionality reduction]] or [[feature learning]], but the concept became widely used for learning [[generative model]]s of data.<ref name=":0" /><ref name="schmidhuber2015" /><ref name="VAE">{{cite arXiv |eprint=1312.6114 |class=stat.ML |first1=Diederik P. |last1=Kingma |first2=Max |last2=Welling |title=Auto-Encoding Variational Bayes |date=2013}}</ref><ref name="gan_faces">Generating Faces with Torch, Boesen A., Larsen L. and Sonderby S.K., 2015 {{url|http://torch.ch/blog/2015/11/13/gan.html}}</ref> Some of the most powerful [[Artificial intelligence|AIs]] in the 2010s involved autoencoder modules as components of larger AI systems, such as the VAE in [[Stable Diffusion]] and the discrete VAE in Transformer-based image generators like [[DALL-E|DALL-E 1]].
 
In the early days, when the terminology was not yet settled, the autoencoder was also called identity mapping,<ref name="baldi1989" /><ref name=":12" /> auto-associating,<ref>{{Cite journal |last1=Ackley |first1=D. |last2=Hinton |first2=G. |last3=Sejnowski |first3=T. |date=March 1985 |title=A learning algorithm for Boltzmann machines |url=http://doi.wiley.com/10.1016/S0364-0213(85)80012-4 |journal=Cognitive Science |language=en |volume=9 |issue=1 |pages=147–169 |doi=10.1016/S0364-0213(85)80012-4}}</ref> self-supervised backpropagation,<ref name=":12" /> or the Diabolo network.<ref>{{Cite journal |last1=Schwenk |first1=Holger |last2=Bengio |first2=Yoshua |date=1997 |title=Training Methods for Adaptive Boosting of Neural Networks |url=https://proceedings.neurips.cc/paper/1997/hash/9cb67ffb59554ab1dabb65bcb370ddd9-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=MIT Press |volume=10}}</ref><ref name="bengio" />
* [[Sparse dictionary learning]]
* [[Deep learning]]
 
== Further reading ==
 
* {{cite book |last=Bank |first=Dor |title=Machine Learning for Data Science Handbook |last2=Koenigstein |first2=Noam |last3=Giryes |first3=Raja |publisher=Springer International Publishing |year=2023 |isbn=978-3-031-24627-2 |publication-place=Cham |chapter=Autoencoders |doi=10.1007/978-3-031-24628-9_16}}
* {{Cite book |last=Goodfellow |first=Ian |title=Deep learning |last2=Bengio |first2=Yoshua |last3=Courville |first3=Aaron |date=2016 |publisher=The MIT press |isbn=978-0-262-03561-3 |series=Adaptive computation and machine learning |___location=Cambridge, Mass |chapter=14. Autoencoders |chapter-url=https://www.deeplearningbook.org/contents/autoencoders.html}}
 
==References==