 
====Sparse autoencoder====
Inspired by the [[sparse coding]] hypothesis in neuroscience, sparse autoencoders (SAEs) are variants of autoencoders in which the codes <math>E_\phi(x)</math> for messages tend to be ''sparse codes'', that is, <math>E_\phi(x)</math> is close to zero in most entries. Sparse autoencoders may include more (rather than fewer) hidden units than inputs, but only a small number of the hidden units are allowed to be active at the same time.<ref name="domingos">{{cite book |last1=Domingos |first1=Pedro |author-link=Pedro Domingos |title=The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World |title-link=The Master Algorithm |date=2015 |publisher=Basic Books |isbn=978-046506192-1 |at="Deeper into the Brain" subsection |chapter=4}}</ref> Encouraging sparsity improves performance on classification tasks.<ref name=":51">{{Cite arxiv |last1=Makhzani |first1=Alireza |last2=Frey |first2=Brendan |date=2013-12-19 |title=k-Sparse Autoencoders |arxiv=1312.5663 |bibcode=2013arXiv1312.5663M}}</ref> [[File:Autoencoder sparso.png|thumb|Simple schema of a single-layer sparse autoencoder. The hidden nodes in bright yellow are activated, while the light yellow ones are inactive. The activation depends on the input.]]
There are two main ways to enforce sparsity. One way is simply to clamp all but the ''k'' highest activations of the latent code to zero. This is the '''k-sparse autoencoder'''.<ref name=":1">{{cite arXiv |eprint=1312.5663 |class=cs.LG |first1=Alireza |last1=Makhzani |first2=Brendan |last2=Frey |title=K-Sparse Autoencoders |date=2013}}</ref>
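A minimal sketch of this clamping step, written in PyTorch; the function name, the batch of random codes, and the choice of 64 latent units with <math>k = 10</math> are illustrative assumptions rather than part of the original formulation:

<syntaxhighlight lang="python">
import torch

def k_sparse(codes: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k highest activations of each latent code and zero out the rest."""
    # Indices of the k largest activations in each code vector (one per input).
    top_idx = torch.topk(codes, k, dim=-1).indices
    mask = torch.zeros_like(codes)
    mask.scatter_(-1, top_idx, 1.0)   # 1 where an activation is kept, 0 elsewhere
    return codes * mask

# Example: a batch of 4 codes with 64 latent units, of which only 10 stay active.
codes = torch.relu(torch.randn(4, 64))   # stand-in for encoder outputs E_phi(x)
sparse_codes = k_sparse(codes, k=10)
</syntaxhighlight>

Because the mask is multiplicative and treated as a constant, gradients during training flow only through the activations that were kept.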
 
For each input <math>x</math>, let the actual sparsity of activation in each layer <math>k</math> be<math display="block">\rho_k(x) = \frac 1n \sum_{i=1}^n a_{k, i}(x)</math>where <math>a_{k, i}(x)</math> is the activation of the <math>i</math>-th neuron in the <math>k</math>-th layer upon input <math>x</math>, and <math>n</math> is the number of neurons in that layer.
 
The sparsity loss upon input <math>x</math> for one layer is <math>s(\hat\rho_k, \rho_k(x))</math>, and the sparsity regularization loss for the entire autoencoder is the expected weighted sum of sparsity losses:<math display="block">L_{sparsity}(\theta, \phi) = \mathbb E_{x\sim\mu_X}\left[\sum_{k\in 1:K} w_k s(\hat\rho_k, \rho_k(x)) \right]</math>Typically, the function <math>s</math> is either the [[Kullback–Leibler divergence|Kullback-Leibler (KL) divergence]], as<ref name=":51" /><ref name=":6">Ng, A. (2011). [https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf Sparse autoencoder]. ''CS294A Lecture notes'', ''72''(2011), 1-19.</ref><ref>{{Cite journal|last1=Nair|first1=Vinod|last2=Hinton|first2=Geoffrey E.|date=2009|title=3D Object Recognition with Deep Belief Nets|url=http://dl.acm.org/citation.cfm?id=2984093.2984244|journal=Proceedings of the 22nd International Conference on Neural Information Processing Systems|series=NIPS'09|___location=USA|publisher=Curran Associates Inc.|pages=1339–1347|isbn=9781615679119}}</ref><ref>{{Cite journal|last1=Zeng|first1=Nianyin|last2=Zhang|first2=Hong|last3=Song|first3=Baoye|last4=Liu|first4=Weibo|last5=Li|first5=Yurong|last6=Dobaie|first6=Abdullah M.|date=2018-01-17|title=Facial expression recognition via learning deep sparse autoencoders|journal=Neurocomputing|volume=273|pages=643–649|doi=10.1016/j.neucom.2017.08.043|issn=0925-2312}}</ref>
 
::<math>s(\hat\rho, \rho) = KL(\hat\rho || \rho) = \hat\rho \log \frac{\hat\rho}{\rho}+(1- \hat\rho)\log \frac{1-\hat\rho}{1-\rho}</math>
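As a concrete illustration, the following sketch (in NumPy) computes this KL-based sparsity penalty for a single input; the target sparsity <math>\hat\rho = 0.05</math>, the layer weights <math>w_k</math>, and all function names are assumptions chosen for the example, and in practice the loss would be averaged over a minibatch to approximate the expectation over <math>x\sim\mu_X</math>.

<syntaxhighlight lang="python">
import numpy as np

def kl_sparsity(rho_hat: float, rho: float, eps: float = 1e-8) -> float:
    """KL divergence between the desired sparsity rho_hat and the actual sparsity rho."""
    rho = np.clip(rho, eps, 1.0 - eps)  # keep the logarithms finite
    return (rho_hat * np.log(rho_hat / rho)
            + (1.0 - rho_hat) * np.log((1.0 - rho_hat) / (1.0 - rho)))

def sparsity_loss(layer_activations, weights, rho_hat=0.05):
    """Weighted sum of per-layer KL sparsity losses for a single input x.

    layer_activations: one 1-D array a_k(x) per layer, with activations in
    [0, 1] (e.g. sigmoid units) so that their mean is a valid sparsity level.
    """
    loss = 0.0
    for w_k, a_k in zip(weights, layer_activations):
        rho_k = a_k.mean()                 # actual sparsity rho_k(x) of layer k
        loss += w_k * kl_sparsity(rho_hat, rho_k)
    return loss

# Example with two hidden layers of made-up activations.
activations = [np.random.rand(128) * 0.1, np.random.rand(64) * 0.1]
print(sparsity_loss(activations, weights=[1.0, 1.0]))
</syntaxhighlight>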