 
====Sparse autoencoder====
Inspired by the [[sparse coding]] hypothesis in neuroscience, sparse autoencoders (SAEs) are variants of autoencoders in which the codes <math>E_\phi(x)</math> for messages tend to be ''sparse codes'', that is, <math>E_\phi(x)</math> is close to zero in most entries. Sparse autoencoders may include more (rather than fewer) hidden units than inputs, but only a small number of the hidden units are allowed to be active at the same time.<ref name="domingos">{{cite book |last1=Domingos |first1=Pedro |author-link=Pedro Domingos |title=The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World |title-link=The Master Algorithm |date=2015 |publisher=Basic Books |isbn=978-046506192-1 |at="Deeper into the Brain" subsection |chapter=4}}</ref> Encouraging sparsity improves performance on classification tasks.<ref name=":51">{{Cite arxiv |last1=Makhzani |first1=Alireza |last2=Frey |first2=Brendan |date=2013-12-19 |title=k-Sparse Autoencoders |arxiv=1312.5663 |bibcode=2013arXiv1312.5663M}}</ref> [[File:Autoencoder sparso.png|thumb|Simple schema of a single-layer sparse autoencoder. The hidden nodes in bright yellow are activated, while the light yellow ones are inactive. The activation depends on the input.]]
There are two main ways to enforce sparsity. One way is simply to clamp all but the ''k'' highest activations of the latent code to zero. This is the '''k-sparse autoencoder'''.<ref name=":1">{{cite arXiv |eprint=1312.5663 |class=cs.LG |first1=Alireza |last1=Makhzani |first2=Brendan |last2=Frey |title=K-Sparse Autoencoders |date=2013}}</ref>
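A minimal sketch of this clamping step, written in PyTorch; the function name, the batch of random codes, and the choice of 64 latent units with <math>k = 10</math> are illustrative assumptions rather than part of the original formulation:

<syntaxhighlight lang="python">
import torch

def k_sparse(codes: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k highest activations of each latent code and zero out the rest."""
    # Indices of the k largest activations in each code vector (one per input).
    top_idx = torch.topk(codes, k, dim=-1).indices
    mask = torch.zeros_like(codes)
    mask.scatter_(-1, top_idx, 1.0)   # 1 where an activation is kept, 0 elsewhere
    return codes * mask

# Example: a batch of 4 codes with 64 latent units, of which only 10 stay active.
codes = torch.relu(torch.randn(4, 64))   # stand-in for encoder outputs E_phi(x)
sparse_codes = k_sparse(codes, k=10)
</syntaxhighlight>

Because the mask is multiplicative and treated as a constant, gradients during training flow only through the activations that were kept.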
 
For each input <math>x</math>, let the actual sparsity of activation in each layer <math>k</math> be<math display="block">\rho_k(x) = \frac 1n \sum_{i=1}^n a_{k, i}(x)</math>where <math>a_{k, i}(x)</math> is the activation of the <math>i</math>-th neuron in the <math>k</math>-th layer upon input <math>x</math>, and <math>n</math> is the number of neurons in that layer.
 
The sparsity loss upon input <math>x</math> for one layer is <math>s(\hat\rho_k, \rho_k(x))</math>, and the sparsity regularization loss for the entire autoencoder is the expected weighted sum of sparsity losses:<math display="block">L_{sparsity}(\theta, \phi) = \mathbb E_{x\sim\mu_X}\left[\sum_{k\in 1:K} w_k s(\hat\rho_k, \rho_k(x)) \right]</math>Typically, the function <math>s</math> is either the [[Kullback–Leibler divergence|Kullback-Leibler (KL) divergence]], as<ref name=":51" /><ref name=":6">Ng, A. (2011). [https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf Sparse autoencoder]. ''CS294A Lecture notes'', ''72''(2011), 1-19.</ref><ref>{{Cite journal|last1=Nair|first1=Vinod|last2=Hinton|first2=Geoffrey E.|date=2009|title=3D Object Recognition with Deep Belief Nets|url=http://dl.acm.org/citation.cfm?id=2984093.2984244|journal=Proceedings of the 22nd International Conference on Neural Information Processing Systems|series=NIPS'09|___location=USA|publisher=Curran Associates Inc.|pages=1339–1347|isbn=9781615679119}}</ref><ref>{{Cite journal|last1=Zeng|first1=Nianyin|last2=Zhang|first2=Hong|last3=Song|first3=Baoye|last4=Liu|first4=Weibo|last5=Li|first5=Yurong|last6=Dobaie|first6=Abdullah M.|date=2018-01-17|title=Facial expression recognition via learning deep sparse autoencoders|journal=Neurocomputing|volume=273|pages=643–649|doi=10.1016/j.neucom.2017.08.043|issn=0925-2312}}</ref>
 
::<math>s(\hat\rho, \rho) = KL(\hat\rho || \rho) = \hat\rho \log \frac{\hat\rho}{\rho}+(1- \hat\rho)\log \frac{1-\hat\rho}{1-\rho}</math>
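As a concrete illustration, the following sketch (in NumPy) computes this KL-based sparsity penalty for a single input; the target sparsity <math>\hat\rho = 0.05</math>, the layer weights <math>w_k</math>, and all function names are assumptions chosen for the example, and in practice the loss would be averaged over a minibatch to approximate the expectation over <math>x\sim\mu_X</math>.

<syntaxhighlight lang="python">
import numpy as np

def kl_sparsity(rho_hat: float, rho: float, eps: float = 1e-8) -> float:
    """KL divergence between the desired sparsity rho_hat and the actual sparsity rho."""
    rho = np.clip(rho, eps, 1.0 - eps)  # keep the logarithms finite
    return (rho_hat * np.log(rho_hat / rho)
            + (1.0 - rho_hat) * np.log((1.0 - rho_hat) / (1.0 - rho)))

def sparsity_loss(layer_activations, weights, rho_hat=0.05):
    """Weighted sum of per-layer KL sparsity losses for a single input x.

    layer_activations: one 1-D array a_k(x) per layer, with activations in
    [0, 1] (e.g. sigmoid units) so that their mean is a valid sparsity level.
    """
    loss = 0.0
    for w_k, a_k in zip(weights, layer_activations):
        rho_k = a_k.mean()                 # actual sparsity rho_k(x) of layer k
        loss += w_k * kl_sparsity(rho_hat, rho_k)
    return loss

# Example with two hidden layers of made-up activations.
activations = [np.random.rand(128) * 0.1, np.random.rand(64) * 0.1]
print(sparsity_loss(activations, weights=[1.0, 1.0]))
</syntaxhighlight>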