Autoencoder

{{Toclimit|3}}
 
== Basic architecture ==
An autoencoder has two main parts: an encoder that maps the input into the code, and a decoder that maps the code to a reconstruction of the input.
 
The simplest way to perform the copying task perfectly would be to duplicate the signal. Instead, autoencoders are typically forced to reconstruct the input approximately, preserving only the most relevant aspects of the data in the copy.
 
The idea of autoencoders has been popular for decades. The first applications date to the 1980s.<ref name=":0" /><ref>{{Cite journal|last=Schmidhuber|first=Jürgen|date=January 2015|title=Deep learning in neural networks: An overview|journal=Neural Networks|volume=61|pages=85–117|doi=10.1016/j.neunet.2014.09.003|pmid=25462637|arxiv=1404.7828|s2cid=11715509}}</ref><ref>Hinton, G. E., & Zemel, R. S. (1994). Autoencoders, minimum description length and Helmholtz free energy. In ''Advances in neural information processing systems 6'' (pp. 3-10).</ref> Their most traditional application was [[dimensionality reduction]] or [[feature learning]], but the concept became widely used for learning [[generative model]]s of data.<ref name="VAE">{{cite arxiv|eprint=1312.6114|author1=Diederik P Kingma|title=Auto-Encoding Variational Bayes|last2=Welling|first2=Max|class=stat.ML|date=2013}}</ref><ref name="gan_faces">Generating Faces with Torch, Boesen A., Larsen L. and Sonderby S.K., 2015 {{url|http://torch.ch/blog/2015/11/13/gan.html}}</ref> Some of the most powerful [[Artificial intelligence|AIs]] in the 2010s involved autoencoders stacked inside [[Deep learning|deep]] neural networks.<ref name="domingos">{{cite book|title=The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World|title-link=The Master Algorithm|last1=Domingos|first1=Pedro|publisher=Basic Books|date=2015|isbn=978-046506192-1|at="Deeper into the Brain" subsection|chapter=4|author-link=Pedro Domingos}}</ref>

[[File:Autoencoder schema.png|thumb|Schema of a basic Autoencoder]]The simplest form of an autoencoder is a [[feedforward neural network|feedforward]], non-[[recurrent neural network]] similar to single layer [[perceptrons]] that participate in [[multilayer perceptron]]s (MLP) – employing an input layer and an output layer connected by one or more hidden layers. The output layer has the same number of nodes (neurons) as the input layer. Its purpose is to reconstruct its inputs (minimizing the difference between the input and the output) instead of predicting a target value <math>Y</math> given inputs <math>X</math>. Therefore, autoencoders learn unsupervised.
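For illustration, a minimal version of such a network might be sketched in Python with PyTorch as follows; the layer sizes, optimizer, and random input batch are assumptions of the sketch rather than values from the literature:

<syntaxhighlight lang="python">
import torch
from torch import nn

# Minimal feedforward autoencoder sketch: the output layer has the same
# number of units as the input layer, and the network is trained to
# reconstruct its input. The sizes (784 inputs, 32-dimensional code)
# are illustrative assumptions.
class Autoencoder(nn.Module):
    def __init__(self, n_inputs=784, n_code=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, n_code), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_code, n_inputs), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(64, 784)      # a batch of placeholder inputs
x_hat = model(x)             # reconstruction, same shape as the input
loss = loss_fn(x_hat, x)     # the input itself is the target: no labels needed
optimizer.zero_grad()
loss.backward()
optimizer.step()
</syntaxhighlight>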
 
An autoencoder consists of two parts, the encoder and the decoder, which can be defined as transitions <math>\phi</math> and <math>\psi,</math> such that:
 
:<math>\phi:\mathcal{X} \rightarrow \mathcal{F}</math>
:<math>\psi:\mathcal{F} \rightarrow \mathcal{X}</math>
:<math>\phi,\psi = \underset{\phi,\psi}{\operatorname{arg\,min}}\, \|\mathcal{X}-(\psi \circ \phi) \mathcal{X}\|^2</math>
 
In the simplest case, given one hidden layer, the encoder stage of an autoencoder takes the input <math>\mathbf{x} \in \mathbb{R}^d = \mathcal{X}</math> and maps it to <math>\mathbf{h} \in \mathbb{R}^p = \mathcal{F}</math>:
:<math>\mathbf{h} = \sigma(\mathbf{Wx}+\mathbf{b})</math>
 
This image <math>\mathbf{h}</math> is usually referred to as code, [[latent variable]]s, or a latent representation. <math>\sigma</math> is an element-wise [[activation function]] such as a [[sigmoid function]] or a [[rectified linear unit]]. <math>\mathbf{W}</math> is a weight matrix and <math>\mathbf{b}</math> is a bias vector. Weights and biases are usually initialized randomly, and then updated iteratively during training through [[backpropagation]]. After that, the decoder stage of the autoencoder maps <math>\mathbf{h}</math> to the reconstruction <math>\mathbf{x'}</math> of the same shape as <math>\mathbf{x}</math>:
 
:<math>\mathbf{x'} = \sigma'(\mathbf{W'h}+\mathbf{b'})</math>
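
As a concrete illustration, the two mappings above can be written as a short NumPy sketch; the dimensions <math>d</math> and <math>p</math>, the random weights, and the use of the sigmoid for both activations are assumptions of the sketch:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
d, p = 8, 3                               # input dimension d and code dimension p (illustrative)

def sigma(z):
    # element-wise sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

W = rng.standard_normal((p, d)) * 0.1     # encoder weight matrix W and bias b
b = np.zeros(p)
W2 = rng.standard_normal((d, p)) * 0.1    # decoder weight matrix W' and bias b'
b2 = np.zeros(d)

x = rng.standard_normal(d)                # an input vector
h = sigma(W @ x + b)                      # code:           h  = sigma(W x + b)
x_prime = sigma(W2 @ h + b2)              # reconstruction: x' = sigma'(W' h + b')
</syntaxhighlight>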
====Sparse autoencoder (SAE)====
[[File:Autoencoder sparso.png|thumb|Simple schema of a single-layer sparse autoencoder. The hidden nodes in bright yellow are activated, while the light yellow ones are inactive. The activation depends on the input.]]
Learning [[Representation learning|representations]] in a way that encourages sparsity improves performance on classification tasks.<ref name=":5">{{Cite journal|last1=Frey|first1=Brendan|last2=Makhzani|first2=Alireza|date=2013-12-19|title=k-Sparse Autoencoders|arxiv=1312.5663|bibcode=2013arXiv1312.5663M}}</ref> Sparse autoencoders may include more (rather than fewer) hidden units than inputs, but only a small number of the hidden units are allowed to be active at the same time (thus, sparse).<ref name="domingos">{{cite book|last1=Domingos|first1=Pedro|title=The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World|title-link=The Master Algorithm|date=2015|publisher=Basic Books|isbn=978-046506192-1|at="Deeper into the Brain" subsection|chapter=4|author-link=Pedro Domingos}}</ref> This constraint forces the model to respond to the unique statistical features of the training data.
 
Specifically, a sparse autoencoder is an autoencoder whose training criterion involves a sparsity penalty <math>\Omega(\boldsymbol h)</math> on the code layer <math>\boldsymbol h</math>.
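
One common choice of penalty is the L1 norm of the code activations, added to the reconstruction loss with a small weight; a KL-divergence penalty toward a target activation level is another. The sketch below (PyTorch, with illustrative layer sizes and penalty weight, and L1 chosen as <math>\Omega(\boldsymbol h)</math> for concreteness) shows how such a training criterion might be assembled:

<syntaxhighlight lang="python">
import torch
from torch import nn

# Sparse autoencoder sketch: more hidden units than inputs (784 -> 1024),
# with a sparsity penalty Omega(h) on the code layer h. The L1 penalty and
# the weight sparsity_lambda = 1e-3 are illustrative choices.
encoder = nn.Sequential(nn.Linear(784, 1024), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(1024, 784), nn.Sigmoid())
sparsity_lambda = 1e-3

x = torch.rand(64, 784)                    # a batch of placeholder inputs
h = encoder(x)                             # code layer h
x_hat = decoder(h)                         # reconstruction

reconstruction_loss = nn.functional.mse_loss(x_hat, x)
omega = h.abs().sum(dim=1).mean()          # Omega(h): L1 norm of the code
loss = reconstruction_loss + sparsity_lambda * omega
loss.backward()
</syntaxhighlight>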