* Another way to achieve sparsity is by applying L1 or L2 regularization terms on the activation, scaled by a certain parameter <math>\lambda</math>.<ref>{{cite arXiv |eprint=1505.05561|last1=Arpit|first1=Devansh|last2=Zhou|first2=Yingbo|last3=Ngo|first3=Hung|last4=Govindaraju|first4=Venu|title=Why Regularized Auto-Encoders learn Sparse Representation?|class=stat.ML|date=2015}}</ref> For instance, in the case of L1 the [[loss function]] becomes
::<math>\mathcal{L}(\mathbf{x},\mathbf{x'}) + \lambda \sum_i |h_i|</math>
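The L1-regularized loss above can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from the cited paper; the function names and the squared-error reconstruction term are assumptions for the example.

```python
import numpy as np

def l1_sparse_loss(x, x_rec, h, lam=1e-3):
    """Autoencoder loss with an L1 sparsity penalty on the hidden activations.

    Sketch only: squared-error reconstruction term plus
    lambda * sum_i |h_i|, as in the formula above.
    """
    reconstruction = np.sum((x - x_rec) ** 2)
    sparsity_penalty = lam * np.sum(np.abs(h))
    return reconstruction + sparsity_penalty
```

During training, the gradient of the penalty term pushes hidden activations toward zero, so only a few units remain active for any given input.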
* A further proposed strategy to force sparsity is to manually zero all but the strongest hidden unit activations (''k-sparse autoencoder'').<ref name=":1">{{cite arXiv |eprint=1312.5663|last1=Makhzani|first1=Alireza|last2=Frey|first2=Brendan|title=K-Sparse Autoencoders|class=cs.LG|date=2013}}</ref> The k-sparse autoencoder is based on a linear autoencoder (i.e. with linear activation function) and tied weights. The identification of the strongest activations can be achieved by sorting the activities and keeping only the first ''k'' values, or by using [[Rectifier (neural networks)|ReLU]] hidden units with thresholds that are adaptively adjusted until the k largest activities are identified. This selection acts like the previously mentioned regularization terms in that it prevents the model from reconstructing the input using too many neurons.<ref name=":1" />
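The k-sparse selection step described above can be sketched as follows. This is a minimal illustration of the "keep the k strongest activations, zero the rest" idea, assuming top-k selection by value; it is not the adaptive-threshold procedure from the cited paper.

```python
import numpy as np

def k_sparse(h, k):
    """Zero all but the k largest hidden activations (sketch).

    h : 1-D array of hidden unit activations
    k : number of activations to keep
    """
    if k >= h.size:
        return h.copy()
    # Indices of the k largest activations (by value, as an assumption).
    idx = np.argpartition(h, -k)[-k:]
    out = np.zeros_like(h)
    out[idx] = h[idx]
    return out
```

Because only k units survive the selection, the decoder is forced to reconstruct the input from a small subset of neurons, which is the same effect the regularization penalties achieve implicitly.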