remove external dead link; version visible though https://web.archive.org/web/20210629102308/https://pythoncodingai.com/autoencoders-its-types-and-design/ seems to not contribute much that the article itself does not already cover. |
Citation bot (talk | contribs) Alter: pages, template type. Add: magazine, bibcode, website, authors 1-1. Removed parameters. Formatted dashes. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | Suggested by AManWithNoPlan | #UCB_webform 250/1776 |
Line 5:
An '''autoencoder''' is a type of [[artificial neural network]] used to learn [[Feature learning|efficient codings]] of unlabeled data ([[unsupervised learning]]).<ref>{{cite journal|doi=10.1002/aic.690370209|title=Nonlinear principal component analysis using autoassociative neural networks|journal=AIChE Journal|volume=37|issue=2|pages=233–243|date=1991|last1=Kramer|first1=Mark A.|url= https://www.researchgate.net/profile/Abir_Alobaid/post/To_learn_a_probability_density_function_by_using_neural_network_can_we_first_estimate_density_using_nonparametric_methods_then_train_the_network/attachment/59d6450279197b80779a031e/AS:451263696510979@1484601057779/download/NL+PCA+by+using+ANN.pdf}}</ref> The encoding is validated and refined by attempting to regenerate the input from the encoding. The autoencoder learns a [[Feature learning|representation]] (encoding) for a set of data, typically for [[dimensionality reduction]], by training the network to ignore insignificant data (“noise”).
Variants exist, aiming to force the learned representations to assume useful properties.<ref name=":0" /> Examples are regularized autoencoders (''Sparse'', ''Denoising'' and ''Contractive''), which are effective in learning representations for subsequent [[Statistical classification|classification]] tasks,<ref name=":4" /> and ''Variational'' autoencoders, with applications as [[generative model]]s.<ref name=":11">{{cite journal |arxiv=1906.02691|doi=10.1561/2200000056|bibcode=2019arXiv190602691K|title=An Introduction to Variational Autoencoders|date=2019|last1=Welling|first1=Max|last2=Kingma|first2=Diederik P.|journal=Foundations and Trends in Machine Learning|volume=12|issue=4|pages=307–392|s2cid=174802445}}</ref> Autoencoders are applied to many problems, from [[face recognition|facial recognition]],<ref>Hinton GE, Krizhevsky A, Wang SD. [http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf Transforming auto-encoders.] In International Conference on Artificial Neural Networks 2011 Jun 14 (pp. 44-51). Springer, Berlin, Heidelberg.</ref> feature detection,<ref name=":2">{{Cite book|last=Géron|first=Aurélien|title=Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow|publisher=O’Reilly Media, Inc.|year=2019|___location=Canada|pages=
{{Toclimit|3}}
Line 14:
The simplest way to perform the copying task perfectly would be to duplicate the signal. Instead, autoencoders are typically forced to reconstruct the input approximately, preserving only the most relevant aspects of the data in the copy.
The idea of autoencoders has been popular for decades. The first applications date to the 1980s.<ref name=":0" /><ref>{{Cite journal|last=Schmidhuber|first=Jürgen|date=January 2015|title=Deep learning in neural networks: An overview|journal=Neural Networks|volume=61|pages=85–117|doi=10.1016/j.neunet.2014.09.003|pmid=25462637|arxiv=1404.7828|s2cid=11715509}}</ref><ref>Hinton, G. E., & Zemel, R. S. (1994). Autoencoders, minimum description length and Helmholtz free energy. In ''Advances in neural information processing systems 6'' (pp. 3-10).</ref> Their most traditional application was [[dimensionality reduction]] or [[feature learning]], but the concept became widely used for learning [[generative model]]s of data.<ref name="VAE">{{cite
An autoencoder consists of two parts, the encoder and the decoder, which can be defined as transitions <math>\phi</math> and <math>\psi,</math> such that:
Line 69:
:where <math>j</math> indexes the <math>s</math> hidden nodes in the hidden layer, and <math>KL(\rho || \hat{\rho_j}) </math> is the KL-divergence between a [[Bernoulli distribution|Bernoulli random variable]] with mean <math>\rho</math> and a Bernoulli random variable with mean <math>\hat{\rho_j}</math>.<ref name=":6" />
* Another way to achieve sparsity is by applying L1 or L2 regularization terms on the activation, scaled by a certain parameter <math>\lambda</math>.<ref>{{cite
::<math>\mathcal{L}(\mathbf{x},\mathbf{x'}) + \lambda \sum_i |h_i|</math>
* A further proposed strategy to force sparsity is to manually zero all but the strongest hidden unit activations (''k-sparse autoencoder'').<ref name=":1">{{cite
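The three sparsity strategies listed above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names (<code>kl_bernoulli</code>, <code>kl_sparsity_penalty</code>, <code>l1_penalty</code>, <code>k_sparse</code>) and the default target sparsity <code>rho=0.05</code> are assumptions introduced here, and the "strongest" activations are taken to mean the largest values.

```python
import math


def kl_bernoulli(rho, rho_hat):
    """KL(rho || rho_hat) between two Bernoulli variables with these means.

    Both arguments must lie strictly between 0 and 1.
    """
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


def kl_sparsity_penalty(mean_activations, rho=0.05):
    """Sum of KL(rho || rho_hat_j) over the hidden units j.

    mean_activations[j] is the average activation of hidden unit j
    over the training examples. rho=0.05 is an illustrative default.
    """
    return sum(kl_bernoulli(rho, rho_hat_j) for rho_hat_j in mean_activations)


def l1_penalty(h, lam):
    """L1 penalty lam * sum_i |h_i| on one vector of hidden activations.

    This is only the regularization term; the reconstruction loss is
    added to it separately.
    """
    return lam * sum(abs(h_i) for h_i in h)


def k_sparse(h, k):
    """Keep the k largest activations in h and set all others to zero.

    Assumption: "strongest" means largest value, not largest magnitude.
    """
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    keep = set(order[:k])
    return [h_i if i in keep else 0.0 for i, h_i in enumerate(h)]
```

The KL penalty vanishes when every unit's mean activation equals the target <code>rho</code> and grows as the mean activations move away from it. The k-sparse rule enforces sparsity directly on each code vector instead of adding a term to the loss.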
====Denoising autoencoder (DAE)====
Line 107:
=== Concrete autoencoder ===
The concrete autoencoder is designed for discrete feature selection.<ref>{{cite
===Variational autoencoder (VAE)===
Line 127:
[[Geoffrey Hinton]] developed the [[deep belief network]] technique for training many-layered deep autoencoders. His method involves treating each neighbouring set of two layers as a [[restricted Boltzmann machine]] so that pretraining approximates a good solution, then using backpropagation to fine-tune the results.<ref name=":7">{{cite journal|last1=Hinton|first1=G. E.|last2=Salakhutdinov|first2=R.R.|title=Reducing the Dimensionality of Data with Neural Networks|journal=Science|date=28 July 2006|volume=313|issue=5786|pages=504–507|doi=10.1126/science.1127647|pmid=16873662|bibcode=2006Sci...313..504H|s2cid=1658773}}</ref>
Researchers have debated whether joint training (i.e. training the whole architecture together with a single global reconstruction objective to optimize) would be better for deep autoencoders.<ref name=":9">{{cite
== Applications ==
Line 133:
=== Dimensionality reduction ===
[[File:PCA vs Linear Autoencoder.png|thumb|Plot of the first two principal components (left) and the two-dimensional hidden layer of a linear autoencoder (right) applied to the [[Fashion MNIST dataset]].<ref name=":10">{{Cite web|url=https://github.com/zalandoresearch/fashion-mnist|title=Fashion MNIST|website=[[GitHub]]|date=2019-07-12}}</ref> Because both models are linear, they learn to span the same subspace. The projections of the data points are identical up to a rotation of the subspace, to which PCA is invariant.]][[Dimensionality reduction]] was one of the first [[deep learning]] applications.<ref name=":0" />
In his 2006 study,<ref name=":7" /> Hinton pretrained a multi-layer autoencoder with a stack of [[Restricted Boltzmann machine|RBMs]] and then used their weights to initialize a deep autoencoder with progressively smaller hidden layers, down to a bottleneck of 30 neurons. The resulting 30-dimensional code yielded a smaller reconstruction error than the first 30 components of a principal component analysis (PCA), and learned a representation that was qualitatively easier to interpret, clearly separating data clusters.<ref name=":0" /><ref name=":7" />
Line 141:
==== Principal component analysis ====
[[File:Reconstruction autoencoders vs PCA.png|thumb|Reconstruction of 28×28-pixel images by an autoencoder with a code size of two (two-unit hidden layer) and the reconstruction from the first two principal components of PCA. Images come from the [[Fashion MNIST dataset]].<ref name=":10" />]]
If linear activations are used, or only a single sigmoid hidden layer, then the optimal solution to an autoencoder is strongly related to [[principal component analysis]] (PCA).<ref>{{Cite journal|last1=Bourlard|first1=H.|last2=Kamp|first2=Y.|date=1988|title=Auto-association by multilayer perceptrons and singular value decomposition|journal=Biological Cybernetics|volume=59|issue=4–5|pages=291–294|doi=10.1007/BF00332918|pmid=3196773|s2cid=206775335|url=http://infoscience.epfl.ch/record/82601}}</ref><ref>{{cite book|title=Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics - BCB '14|last1=Chicco|first1=Davide|last2=Sadowski|first2=Peter|last3=Baldi|first3=Pierre|date=2014|isbn=9781450328944|pages=533|chapter=Deep autoencoder neural networks for gene ontology annotation predictions|doi=10.1145/2649387.2649442|hdl=11311/964622|s2cid=207217210|url=http://dl.acm.org/citation.cfm?id=2649442}}</ref> The weights of an autoencoder with a single hidden layer of size <math>p</math> (where <math>p</math> is less than the size of the input) span the same vector subspace as the one spanned by the first <math>p</math> principal components, and the output of the autoencoder is an orthogonal projection onto this subspace. The autoencoder weights are not equal to the principal components, and are generally not orthogonal, yet the principal components may be recovered from them using the [[singular value decomposition]].<ref>{{cite
However, the potential of autoencoders resides in their non-linearity, allowing the model to learn more powerful generalizations compared to PCA, and to reconstruct the input with significantly lower information loss.<ref name=":7" />
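The linear case described above can be made concrete with a short worked form. As a sketch, assume centered data arranged as the columns of a matrix <math>X</math> with singular value decomposition <math>X = U \Sigma V^T</math>, distinct singular values, and no bias terms; the symbols <math>U_p</math>, <math>A</math>, <math>W_\text{enc}</math> and <math>W_\text{dec}</math> are notation introduced here for illustration. Writing <math>U_p</math> for the first <math>p</math> columns of <math>U</math>, the weights minimizing the squared reconstruction error of a linear autoencoder with <math>p</math> hidden units take the form

:<math>W_\text{dec} = U_p A, \qquad W_\text{enc} = A^{-1} U_p^T,</math>

for an arbitrary invertible <math>p \times p</math> matrix <math>A</math>. The reconstruction is therefore

:<math>\mathbf{x'} = W_\text{dec} W_\text{enc} \mathbf{x} = U_p A A^{-1} U_p^T \mathbf{x} = U_p U_p^T \mathbf{x},</math>

which is the orthogonal projection of <math>\mathbf{x}</math> onto the subspace spanned by the first <math>p</math> principal components. Because <math>A</math> is arbitrary, the columns of <math>W_\text{dec}</math> are in general neither orthogonal nor equal to the principal components, but they span the same subspace as the columns of <math>U_p</math>.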
Line 149:
=== Anomaly detection ===
Another application for autoencoders is [[anomaly detection]].<ref>Morales-Forero, A., & Bassetto, S. (2019, December). Case Study: A Semi-Supervised Methodology for Anomaly Detection and Diagnosis. In ''2019 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM)'' (p. 4) (pp. 1031-1037). IEEE.</ref><ref>Sakurada, M., & Yairi, T. (2014, December). Anomaly detection using autoencoders with nonlinear dimensionality reduction. In ''Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis'' (p. 4). ACM.</ref><ref name=":8">An, J., & Cho, S. (2015). Variational autoencoder based anomaly detection using reconstruction probability. ''Special Lecture on IE'', ''2'', 1-18.</ref><ref>Zhou, C., & Paffenroth, R. C. (2017, August). Anomaly detection with robust deep autoencoders. In ''Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining'' (pp. 665-674). ACM.</ref><ref>{{Cite journal|doi=10.1016/j.patrec.2017.07.016|title=A study of deep convolutional auto-encoders for anomaly detection in videos|year=2018|last1=Ribeiro|first1=Manassés|last2=Lazzaretti|first2=André Eugênio|last3=Lopes|first3=Heitor Silvério|journal=Pattern Recognition Letters|volume=105|pages=13–22|bibcode=2018PaReL.105...13R}}</ref> By learning to replicate the most salient features in the training data under some of the constraints described previously, the model is encouraged to learn to precisely reproduce the most frequently observed characteristics. When presented with anomalies, the model's reconstruction performance should degrade. In most cases, only data with normal instances are used to train the autoencoder; in others, anomalies are so rare in the observation set that their contribution to the learned representation can be ignored.
After training, the autoencoder will accurately reconstruct "normal" data, while failing to do so with unfamiliar anomalous data.<ref name=":8" /> Reconstruction error (the error between the original data and its low-dimensional reconstruction) is used as an anomaly score to detect anomalies.<ref name=":8" />
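The use of reconstruction error as an anomaly score can be illustrated with a toy example. The sketch below stands in for a trained model with a hand-fixed linear autoencoder that has a one-dimensional code. The weight vector <code>W</code>, the squared-error score, the threshold of <code>0.1</code> and all function names are assumptions made for illustration; in practice the weights would be learned and the threshold chosen from held-out normal data.

```python
import math

# Hypothetical "learned" weights: a unit vector along the direction in
# which the normal data is assumed to vary (points close to the line y = x).
W = (1 / math.sqrt(2), 1 / math.sqrt(2))


def encode(x):
    """Project the 2-D input onto the 1-D code."""
    return sum(w_i * x_i for w_i, x_i in zip(W, x))


def decode(z):
    """Map the 1-D code back to the 2-D input space."""
    return [z * w_i for w_i in W]


def anomaly_score(x):
    """Squared reconstruction error between x and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((x_i - r_i) ** 2 for x_i, r_i in zip(x, x_hat))


def is_anomaly(x, threshold=0.1):
    """Flag x when its score exceeds an illustrative threshold."""
    return anomaly_score(x) > threshold
```

A point such as <code>[2.0, 2.1]</code> lies close to the assumed normal direction and is reconstructed almost exactly, whereas <code>[1.0, -1.0]</code> is orthogonal to it, is reconstructed as the origin, and receives a large score.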
Recent literature has shown, however, that certain autoencoding models can, counterintuitively, reconstruct anomalous examples very well and therefore cannot reliably perform anomaly detection.<ref>{{cite
=== Image processing ===
The characteristics of autoencoders are useful in image processing.
One example can be found in lossy [[image compression]], where autoencoders outperformed other approaches and proved competitive against [[JPEG 2000]].<ref>{{cite
Another useful application of autoencoders in image preprocessing is [[image denoising]].<ref>Cho, K. (2013, February). Simple sparsification improves sparse denoising autoencoders in denoising highly corrupted images. In ''International Conference on Machine Learning'' (pp. 432-440).</ref><ref>{{cite
Autoencoders found use in more demanding contexts such as [[medical imaging]] where they have been used for [[image denoising]]<ref>{{Cite journal|last=Gondara|first=Lovedeep|date=December 2016|title=Medical Image Denoising Using Convolutional Denoising Autoencoders|journal=2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)|___location=Barcelona, Spain|publisher=IEEE|pages=241–246|doi=10.1109/ICDMW.2016.0041|isbn=9781509059102|arxiv=1608.04667|bibcode=2016arXiv160804667G|s2cid=14354973}}</ref> as well as [[super-resolution]].<ref>{{Cite journal|last1=Zeng|first1=Kun|last2=Yu|first2=Jun|last3=Wang|first3=Ruxin|last4=Li|first4=Cuihua|last5=Tao|first5=Dacheng|s2cid=20787612|date=January 2017|title=Coupled Deep Autoencoder for Single Image Super-Resolution|journal=IEEE Transactions on Cybernetics|volume=47|issue=1|pages=27–37|doi=10.1109/TCYB.2015.2501373|pmid=26625442|issn=2168-2267}}</ref><ref>{{cite journal |last1=Tzu-Hsi |first1=Song |last2=Sanchez |first2=Victor |last3=Hesham |first3=EIDaly |last4=Nasir M. 
|first4=Rajpoot |title=Hybrid deep autoencoder with Curvature Gaussian for detection of various types of cells in bone marrow trephine biopsy images |journal=2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017) |date=2017 |pages=1040–1043 |doi=10.1109/ISBI.2017.7950694 |isbn=978-1-5090-1172-8 |s2cid=7433130 }}</ref> In image-assisted diagnosis, experiments have applied autoencoders for [[breast cancer]] detection<ref>{{cite journal |last1=Xu |first1=Jun |last2=Xiang |first2=Lei |last3=Liu |first3=Qingshan |last4=Gilmore |first4=Hannah |last5=Wu |first5=Jianzhong |last6=Tang |first6=Jinghai |last7=Madabhushi |first7=Anant |title=Stacked Sparse Autoencoder (SSAE) for Nuclei Detection on Breast Cancer Histopathology Images |journal=IEEE Transactions on Medical Imaging |date=January 2016 |volume=35 |issue=1 |pages=119–130 |doi=10.1109/TMI.2015.2458702 |pmid=26208307 |pmc=4729702 }}</ref> and for modelling the relation between the cognitive decline of [[Alzheimer's disease]] and the latent features of an autoencoder trained with [[MRI]].<ref>{{cite journal |last1=Martinez-Murcia |first1=Francisco J. |last2=Ortiz |first2=Andres |last3=Gorriz |first3=Juan M. |last4=Ramirez |first4=Javier |last5=Castillo-Barnes |first5=Diego |s2cid=195187846 |title=Studying the Manifold Structure of Alzheimer's Disease: A Deep Learning Approach Using Convolutional Autoencoders |journal=IEEE Journal of Biomedical and Health Informatics |volume=24 |issue=1 |pages=17–26 |doi=10.1109/JBHI.2019.2914970 |pmid=31217131 |date=2020 |doi-access=free }}</ref>
=== Drug discovery ===
In 2019 molecules generated with variational autoencoders were validated experimentally in mice.<ref>{{cite journal |last1=Zhavoronkov |first1=Alex|s2cid=201716327|date=2019|title=Deep learning enables rapid identification of potent DDR1 kinase inhibitors |journal=Nature Biotechnology |volume=37|issue=9|pages=1038–1040|doi=10.1038/s41587-019-0224-x |pmid=31477924}}</ref><ref>{{cite
=== Popularity prediction ===
Line 169:
=== Machine translation ===
Autoencoders have been applied to [[machine translation]], an approach usually referred to as [[neural machine translation]] (NMT).<ref>{{cite
==See also==