Content deleted Content added
No edit summary Tags: Reverted Visual edit |
m Reverted edits by 103.191.235.41 (talk) (AV) |
Line 1:
{{Short description|Neural network that learns efficient data encoding in an unsupervised manner}}
{{Distinguish|Autocoder|Autocode}}
{{Use dmy dates|date=March 2020|cs1-dates=y}}
{{Machine learning|Artificial neural network}}
An '''autoencoder''' is a type of [[artificial neural network]] used to learn [[Feature learning|efficient codings]] of unlabeled data ([[unsupervised learning]]).<ref name=":12">{{cite journal|doi=10.1002/aic.690370209|title=Nonlinear principal component analysis using autoassociative neural networks|journal=AIChE Journal|volume=37|issue=2|pages=233–243|date=1991|last1=Kramer|first1=Mark A.|url= https://www.researchgate.net/profile/Abir_Alobaid/post/To_learn_a_probability_density_function_by_using_neural_network_can_we_first_estimate_density_using_nonparametric_methods_then_train_the_network/attachment/59d6450279197b80779a031e/AS:451263696510979@1484601057779/download/NL+PCA+by+using+ANN.pdf}}</ref><ref name=":13">{{Cite journal |last=Kramer |first=M. A. |date=1992-04-01 |title=Autoassociative neural networks |url=https://dx.doi.org/10.1016/0098-1354%2892%2980051-A |journal=Computers & Chemical Engineering |series=Neural network applications in chemical engineering |language=en |volume=16 |issue=4 |pages=313–328 |doi=10.1016/0098-1354(92)80051-A |issn=0098-1354}}</ref> An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an [[Feature learning|efficient representation]] (encoding) for a set of data, typically for [[dimensionality reduction]].
Variants exist that aim to force the learned representations to assume useful properties.<ref name=":0" /> Examples are regularized autoencoders (''Sparse'', ''Denoising'' and ''Contractive''), which are effective in learning representations for subsequent [[Statistical classification|classification]] tasks,<ref name=":4" /> and ''Variational'' autoencoders, with applications as [[generative model]]s.<ref name=":11">{{cite journal |arxiv=1906.02691|doi=10.1561/2200000056|bibcode=2019arXiv190602691K|title=An Introduction to Variational Autoencoders|date=2019|last1=Kingma|first1=Diederik P.|last2=Welling|first2=Max|journal=Foundations and Trends in Machine Learning|volume=12|issue=4|pages=307–392|s2cid=174802445}}</ref> Autoencoders are applied to many problems, including [[face recognition|facial recognition]],<ref>Hinton GE, Krizhevsky A, Wang SD. [http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf Transforming auto-encoders.] In International Conference on Artificial Neural Networks 2011 Jun 14 (pp. 44–51). Springer, Berlin, Heidelberg.</ref> feature detection,<ref name=":2">{{Cite book|last=Géron|first=Aurélien|title=Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow|publisher=O’Reilly Media, Inc.|year=2019|location=Canada|pages=739–740}}</ref> anomaly detection and acquiring the meaning of words.<ref>{{cite journal|doi=10.1016/j.neucom.2008.04.030|title=Modeling word perception using the Elman network|journal=Neurocomputing|volume=71|issue=16–18|pages=3150|date=2008|last1=Liou|first1=Cheng-Yuan|last2=Huang|first2=Jau-Chi|last3=Yang|first3=Wen-Chie|url=http://ntur.lib.ntu.edu.tw//handle/246246/155195 }}</ref><ref>{{cite journal|doi=10.1016/j.neucom.2013.09.055|title=Autoencoder for words|journal=Neurocomputing|volume=139|pages=84–96|date=2014|last1=Liou|first1=Cheng-Yuan|last2=Cheng|first2=Wei-Chen|last3=Liou|first3=Jiun-Wei|last4=Liou|first4=Daw-Ran}}</ref> Some autoencoders can also serve as generative models that randomly generate new data similar to the training data.<ref name=":2" />
{{Toclimit|3}}
== Mathematical principles ==
=== Definition ===
An autoencoder is defined by the following components: <blockquote>Two sets: the space of decoded messages <math>\mathcal X</math>; the space of encoded messages <math>\mathcal Z</math>. Almost always, both <math>\mathcal X</math> and <math>\mathcal Z</math> are Euclidean spaces, that is, <math>\mathcal X = \R^m, \mathcal Z = \R^n</math> for some <math>m, n</math>. </blockquote><blockquote>Two parametrized families of functions: the encoder family <math>E_\phi:\mathcal{X} \rightarrow \mathcal{Z}</math>, parametrized by <math>\phi</math>; the decoder family <math>D_\theta:\mathcal{Z} \rightarrow \mathcal{X}</math>, parametrized by <math>\theta</math>.</blockquote>For any <math>x\in \mathcal X</math>, we usually write <math>z = E_\phi(x)</math>, and refer to it as the code, the [[latent variable]], latent representation, latent vector, etc. Conversely, for any <math>z\in \mathcal Z</math>, we usually write <math>x' = D_\theta(z)</math>, and refer to it as the (decoded) message.
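The components above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the affine-plus-sigmoid encoder, the affine decoder, the parameter dictionaries <code>phi</code> and <code>theta</code>, and the sizes <code>m = 8</code>, <code>n = 3</code> are all hypothetical choices, not part of the definition.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic function (an assumed choice of nonlinearity)."""
    return 1.0 / (1.0 + np.exp(-a))

def make_params(m, n, seed=0):
    """Random parameters: phi for the encoder R^m -> R^n, theta for the decoder R^n -> R^m."""
    rng = np.random.default_rng(seed)
    phi = {"W": rng.normal(scale=0.1, size=(n, m)), "b": np.zeros(n)}
    theta = {"W": rng.normal(scale=0.1, size=(m, n)), "b": np.zeros(m)}
    return phi, theta

def encode(phi, x):
    """E_phi: maps a message x in R^m to a code z in R^n."""
    return sigmoid(phi["W"] @ x + phi["b"])

def decode(theta, z):
    """D_theta: maps a code z in R^n to a decoded message x' in R^m."""
    return theta["W"] @ z + theta["b"]

m, n = 8, 3                      # dimensions of the message and code spaces
phi, theta = make_params(m, n)
x = np.ones(m)                   # a message in X = R^m
z = encode(phi, x)               # its code in Z = R^n
x_prime = decode(theta, z)       # the decoded message x' = D_theta(z)
```

With untrained random parameters, <code>x_prime</code> is not expected to be close to <code>x</code>; the sketch only fixes the shapes and the direction of the two maps.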
Usually, both the encoder and the decoder are defined as [[multilayer perceptron]]s. For example, a one-layer-MLP encoder <math>E_\phi</math> is:
Line 22 ⟶ 35:
The simplest way to perform the copying task perfectly would be to duplicate the signal. To suppress this behavior, the code space <math>\mathcal Z</math> usually has fewer dimensions than the message space <math>\mathcal{X}</math>.
Such an autoencoder is called ''undercomplete''. It can be interpreted as [[Data compression|compressing]] the message, or [[Dimensionality reduction|reducing its dimensionality]].<ref name=":12" />
At the limit of an ideal undercomplete autoencoder, every possible code <math>z</math> in the code space is used to encode a message <math>x</math> that really appears in the distribution <math>\mu_{ref}</math>, and the decoder is also perfect: <math>D_\theta(E_\phi(x)) = x</math>. This ideal autoencoder can then be used to generate messages indistinguishable from real messages, by feeding its decoder an arbitrary code <math>z</math> and obtaining <math>D_\theta(z)</math>, which is a message that really appears in the distribution <math>\mu_{ref}</math>.
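A toy case where this ideal is reached exactly is a linear autoencoder on data that lies in a low-dimensional linear subspace. The sketch below is an assumption-laden illustration, not a general recipe: the data are synthetic, the basis <code>B</code> is hypothetical, and the encoder and decoder are obtained from a singular value decomposition rather than by gradient training. It checks only that decoded codes fall in the support of the data, not that they follow the same distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, N = 5, 2, 200                  # message dim, code dim, number of samples
B = rng.normal(size=(n, m))          # hypothetical basis of the data subspace
X = rng.normal(size=(N, n)) @ B      # messages (one per row), all inside span(B)

# The top-n right singular vectors span the same subspace as the data.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:n]                           # shape (n, m), orthonormal rows

def encode(x):
    """Linear encoder E: R^m -> R^n."""
    return V @ x

def decode(z):
    """Linear decoder D: R^n -> R^m."""
    return V.T @ z

# Perfect reconstruction on the data: D(E(x)) = x up to rounding error.
recon = np.array([decode(encode(x)) for x in X])
max_err = np.max(np.abs(recon - X))

# Decoding an arbitrary code yields a point inside the data subspace.
z_new = rng.normal(size=n)
x_new = decode(z_new)
coeffs, *_ = np.linalg.lstsq(B.T, x_new, rcond=None)
off_subspace = np.linalg.norm(B.T @ coeffs - x_new)
```

Both <code>max_err</code> and <code>off_subspace</code> should be at the level of floating-point rounding in this construction.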
Line 69 ⟶ 82:
Given a task <math>(\mu_{ref}, d)</math>, the problem of training a DAE is the optimization problem:<math display="block">\min_{\theta, \phi}L(\theta, \phi) = \mathbb E_{x\sim \mu_{ref}, T\sim\mu_T}[d(x, (D_\theta\circ E_\phi \circ T)(x))]</math>That is, the optimal DAE should take any noisy message and attempt to recover the original message without noise, hence the name "denoising".
Usually, the noise process <math>T</math> is applied only during training and testing, not during downstream use.
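The DAE objective can be estimated by sampling. The following is a minimal sketch under assumptions that the text does not fix: <code>d</code> is taken to be the squared Euclidean distance, the noise process <code>T</code> is additive Gaussian noise with a hypothetical scale <code>sigma</code>, and the helper names <code>dae_loss</code> and <code>gaussian_noise</code> are illustrative. The key point is that the loss compares the reconstruction of the ''noisy'' input with the ''clean'' message.

```python
import numpy as np

def gaussian_noise(x, rng, sigma=0.1):
    """One draw of an assumed noise process T: additive isotropic Gaussian noise."""
    return x + sigma * rng.normal(size=x.shape)

def dae_loss(encode, decode, X, noise, rng, d=None):
    """Monte Carlo estimate of E_{x, T}[ d(x, D(E(T(x)))) ] over the rows of X."""
    if d is None:
        d = lambda x, y: float(np.sum((x - y) ** 2))   # assumed distance
    losses = []
    for x in X:
        x_noisy = noise(x, rng)              # corrupt the message
        x_rec = decode(encode(x_noisy))      # reconstruct from the corrupted input
        losses.append(d(x, x_rec))           # compare with the clean message
    return float(np.mean(losses))

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))

# An identity "autoencoder" cannot remove noise, so its loss equals the noise energy.
identity = lambda v: v
loss_noisy = dae_loss(identity, identity, X, gaussian_noise, rng)
loss_clean = dae_loss(identity, identity, X, lambda x, rng: x, rng)
```

For the identity map the estimate is close to <code>sigma**2 * 4 = 0.04</code>, the expected squared norm of the noise in four dimensions; a trained DAE would aim to push the loss below this baseline.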
The use of DAE depends on two assumptions:
Line 150 ⟶ 163:
=== Anomaly detection ===
Another application for autoencoders is [[anomaly detection]].<ref name=":13" />
Recent literature has shown, however, that certain autoencoding models can, counterintuitively, be very good at reconstructing anomalous examples and are consequently unable to perform anomaly detection reliably.<ref>{{cite arXiv|last1=Nalisnick|first1=Eric|last2=Matsukawa|first2=Akihiro|last3=Teh|first3=Yee Whye|last4=Gorur|first4=Dilan|last5=Lakshminarayanan|first5=Balaji|date=2019-02-24|title=Do Deep Generative Models Know What They Don't Know?|class=stat.ML|eprint=1810.09136}}</ref><ref>{{Cite journal|last1=Xiao|first1=Zhisheng|last2=Yan|first2=Qing|last3=Amit|first3=Yali|date=2020|title=Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder|url=https://proceedings.neurips.cc/paper/2020/hash/eddea82ad2755b24c4e168c5fc2ebd40-Abstract.html|journal=Advances in Neural Information Processing Systems|language=en|volume=33|arxiv=2003.02977}}</ref>
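A common way to use an autoencoder for anomaly detection is to score each input by its reconstruction error and flag inputs whose score exceeds a threshold. The sketch below illustrates both that scheme and the failure mode just described. It rests on several assumptions: a linear autoencoder obtained from a singular value decomposition stands in for a trained network, the data are synthetic, and the threshold rule (maximum training score plus a small margin) is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 2
B = rng.normal(size=(n, m))                  # hypothetical basis of the "normal" data subspace
X_train = rng.normal(size=(500, n)) @ B      # normal training messages

# Linear stand-in for a trained autoencoder (encoder V, decoder V.T).
_, _, Vt = np.linalg.svd(X_train, full_matrices=False)
V = Vt[:n]

def score(x):
    """Anomaly score: squared reconstruction error of x."""
    return float(np.sum((x - V.T @ (V @ x)) ** 2))

# Assumed threshold rule: just above the largest score seen on training data.
threshold = max(score(x) for x in X_train) + 1e-6

normal_point = X_train[0]
off_subspace = X_train[0] + 3.0 * rng.normal(size=m)   # leaves the data subspace
far_but_on_subspace = 100.0 * X_train[0]               # extreme, yet reconstructed well

flags = [score(x) > threshold
         for x in (normal_point, off_subspace, far_but_on_subspace)]
```

In this construction the point that leaves the data subspace is flagged, while the extreme point that stays on the subspace is reconstructed almost perfectly and is therefore missed, which mirrors the caveat raised in the cited literature.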