Autoencoder: Difference between revisions

Oha791 (talk | contribs)
adding one more application for Autoencoder which is "Communication Systems"
 
(16 intermediate revisions by 13 users not shown)
Line 7:
An '''autoencoder''' is a type of [[artificial neural network]] used to learn [[Feature learning|efficient codings]] of unlabeled data ([[unsupervised learning]]). An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for [[dimensionality reduction]], to generate lower-dimensional embeddings for subsequent use by other [[machine learning]] algorithms.<ref>{{Cite book|last1=Bank |first1=Dor |last2=Koenigstein |first2=Noam |last3=Giryes |first3=Raja |year=2023 |chapter=Autoencoders |editor-last1=Rokach |editor-first1=Lior |editor-last2=Maimon |editor-first2=Oded |editor-last3=Shmueli |editor-first3=Erez |title=Machine learning for data science handbook |chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-24628-9_16 |language=en |pages=353–374 |doi=10.1007/978-3-031-24628-9_16|isbn=978-3-031-24627-2 }}</ref>
 
Variants exist which aim to make the learned representations assume useful properties.<ref name=":0" /> Examples are regularized autoencoders (''sparse'', ''denoising'' and ''contractive'' autoencoders), which are effective in learning representations for subsequent [[Statistical classification|classification]] tasks,<ref name=":4" /> and [[Variational autoencoder|''variational'' autoencoders]], which can be used as [[generative model]]s.<ref name=":11">{{cite journal |arxiv=1906.02691|doi=10.1561/2200000056|bibcode=2019arXiv190602691K|title=An Introduction to Variational Autoencoders|date=2019|last1=Welling|first1=Max|last2=Kingma|first2=Diederik P.|journal=Foundations and Trends in Machine Learning|volume=12|issue=4|pages=307–392|s2cid=174802445}}</ref> Autoencoders are applied to many problems, including [[Facial recognition system|facial recognition]],<ref>Hinton GE, Krizhevsky A, Wang SD. [http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf Transforming auto-encoders.] In International Conference on Artificial Neural Networks 2011 Jun 14 (pp. 44-51). Springer, Berlin, Heidelberg.</ref> [[Feature (computer vision)|feature detection]],<ref name=":2">{{Cite book|last=Géron|first=Aurélien|title=Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow|publisher=O’Reilly Media, Inc.|year=2019|___location=Canada|pages=739–740}}</ref> [[anomaly detection]], and [[Word embedding|learning the meaning of words]].<ref>{{cite journal|doi=10.1016/j.neucom.2008.04.030|title=Modeling word perception using the Elman network|journal=Neurocomputing|volume=71|issue=16–18|pages=3150|date=2008|last1=Liou|first1=Cheng-Yuan|last2=Huang|first2=Jau-Chi|last3=Yang|first3=Wen-Chie|url=http://ntur.lib.ntu.edu.tw//handle/246246/155195 }}</ref><ref>{{cite journal|doi=10.1016/j.neucom.2013.09.055|title=Autoencoder for words|journal=Neurocomputing|volume=139|pages=84–96|date=2014|last1=Liou|first1=Cheng-Yuan|last2=Cheng|first2=Wei-Chen|last3=Liou|first3=Jiun-Wei|last4=Liou|first4=Daw-Ran}}</ref> In terms of [[Synthetic data|data synthesis]], autoencoders can also be used to randomly generate new data that is similar to the input (training) data.<ref name=":2" />
 
{{Toclimit|3}}
Line 29:
In most situations, the reference distribution is just the [[Empirical measure|empirical distribution]] given by a dataset <math>\{x_1, ..., x_N\} \subset \mathcal X</math>, so that<math display="block">\mu_{ref} = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}</math>
 
where <math>\delta_{x_i}</math> is the [[Dirac measure]], the quality function is just L2 loss: <math>d(x, x') = \|x - x'\|_2^2</math>, and <math>\|\cdot\|_2</math> is the [[Norm (mathematics)#Euclidean norm|Euclidean norm]]. Then the problem of searching for the optimal autoencoder is just a [[Least squares|least-squares]] optimization:<math display="block">\min_{\theta, \phi} L(\theta, \phi),\qquad \text{where } L(\theta, \phi) = \frac{1}{N}\sum_{i=1}^N \|x_i - D_\theta(E_\phi(x_i))\|_2^2</math>
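For illustration, the following is a minimal sketch (not taken from the cited literature) of this least-squares optimization in [[PyTorch]]; the layer sizes, learning rate, and random stand-in batch are illustrative assumptions.

<syntaxhighlight lang="python">
# Minimal sketch of the least-squares autoencoder objective above (PyTorch).
# Layer sizes, learning rate and the random stand-in batch are illustrative.
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # E_phi: message space -> code space
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        # D_theta: code space -> message space
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                              # stand-in batch from the empirical distribution
for _ in range(100):
    loss = ((x - model(x)) ** 2).sum(dim=1).mean()   # empirical L2 reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
</syntaxhighlight>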
 
=== Interpretation ===
Line 43:
 
In the ideal setting, the code dimension and the model capacity could be set on the basis of the complexity of the data distribution to be modeled. A standard way to do so is to add modifications to the basic autoencoder, to be detailed below.<ref name=":0" />
 
==Variations==
 
Line 78 ⟶ 79:
===Denoising autoencoder (DAE)===
 
[[File:Denoising-autoencoder.png|thumb|A schema of a denoising autoencoder.]]
''Denoising autoencoders'' (DAE) try to achieve a ''good'' representation by changing the ''reconstruction criterion''.<ref name=":0" /><ref name=":4" />
 
A DAE, originally called a "robust autoassociative network" by Mark A. Kramer,<ref name=":13">{{Cite journal |last=Kramer |first=M. A. |date=1992-04-01 |title=Autoassociative neural networks |url=https://dx.doi.org/10.1016/0098-1354%2892%2980051-A |journal=Computers & Chemical Engineering |series=Neutral network applications in chemical engineering |language=en |volume=16 |issue=4 |pages=313–328 |doi=10.1016/0098-1354(92)80051-A |issn=0098-1354|url-access=subscription }}</ref> is trained by intentionally corrupting the inputs of a standard autoencoder during training. A noise process is defined by a probability distribution <math>\mu_T</math> over functions <math>T:\mathcal X \to \mathcal X</math>. That is, the function <math>T</math> takes a message <math>x\in \mathcal X</math>, and corrupts it to a noisy version <math>T(x)</math>. The function <math>T</math> is selected randomly, with a probability distribution <math>\mu_T</math>.
 
Given a task <math>(\mu_{\text{ref}}, d)</math>, the problem of training a DAE is the optimization problem:<math display="block">\min_{\theta, \phi}L(\theta, \phi) = \mathbb E_{x\sim \mu_{\text{ref}}, T\sim\mu_T}[d(x, (D_\theta\circ E_\phi \circ T)(x))]</math>That is, the optimal DAE should take any noisy message and attempt to recover the original message without noise, thus the name "denoising".
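A minimal sketch of this objective is given below, assuming additive Gaussian corruption for <math>T</math> and squared Euclidean distance for <math>d</math>; both are illustrative choices, and <code>model</code> stands for any autoencoder such as the one sketched earlier.

<syntaxhighlight lang="python">
# Minimal sketch of the denoising objective: the input is corrupted by T,
# but the reconstruction error is measured against the clean original x.
# Additive Gaussian corruption and squared L2 distance are illustrative choices.
import torch

def dae_loss(model, x, noise_std=0.3):
    T_x = x + noise_std * torch.randn_like(x)    # T(x): randomly corrupted message
    x_hat = model(T_x)                           # (D_theta ∘ E_phi ∘ T)(x)
    return ((x - x_hat) ** 2).sum(dim=1).mean()  # d(x, ·) as squared L2 distance
</syntaxhighlight>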
Line 98 ⟶ 99:
 
=== Contractive autoencoder (CAE) ===
A ''contractive autoencoder'' (CAE) adds the contractive regularization loss to the standard autoencoder loss:<math display="block">\min_{\theta, \phi}L(\theta, \phi) + \lambda L_{\text{cont}} (\theta, \phi)</math>where <math>\lambda > 0</math> controls how strongly the contraction is enforced. The contractive regularization loss itself is defined as the expected squared [[Frobenius norm]] of the [[Jacobian matrix and determinant|Jacobian matrix]] of the encoder activations with respect to the input:<math display="block">L_{\text{cont}}(\theta, \phi) = \mathbb E_{x\sim \mu_{ref}} \|\nabla_x E_\phi(x) \|_F^2</math>To understand what <math>L_{\text{cont}}</math> measures, note that<math display="block">\|E_\phi(x + \delta x) - E_\phi(x)\|_2 \leq \|\nabla_x E_\phi(x) \|_F \|\delta x\|_2</math>for any message <math>x\in \mathcal X</math> and any small variation <math>\delta x</math> of it. Thus, if <math>\|\nabla_x E_\phi(x) \|_F^2</math> is small, a small neighborhood of the message maps to a small neighborhood of its code. This is a desired property: small variations in the message lead to small, perhaps even zero, variations in its code, just as two pictures may look the same even if they are not exactly identical.
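The contractive penalty can be sketched as follows; this is an illustrative PyTorch implementation that computes the Jacobian one example at a time for clarity, whereas practical implementations usually compute the term analytically or in vectorized form.

<syntaxhighlight lang="python">
# Minimal sketch of the contractive penalty: the expected squared Frobenius
# norm of the encoder Jacobian, computed one example at a time for clarity.
import torch

def contractive_loss(encoder, x_batch):
    penalties = []
    for x in x_batch:
        # Jacobian of E_phi at x; create_graph=True so the penalty itself
        # can be backpropagated into the encoder parameters during training.
        jac = torch.autograd.functional.jacobian(encoder, x, create_graph=True)
        penalties.append((jac ** 2).sum())       # squared Frobenius norm
    return torch.stack(penalties).mean()

# Total objective: reconstruction_loss + lam * contractive_loss(encoder, x_batch)
</syntaxhighlight>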
 
The two are closely connected: in the limit of small Gaussian input noise, the DAE objective becomes equivalent to a contractive penalty applied to the reconstruction function rather than to the encoder. DAEs thus make the reconstruction function resist small but finite-sized input perturbations, while CAEs make the extracted features resist infinitesimal input perturbations.
Line 125 ⟶ 126:
 
== History ==
(Oja, 1982)<ref>{{Cite journal |last=Oja |first=Erkki |date=1982-11-01 |title=Simplified neuron model as a principal component analyzer |url=https://link.springer.com/article/10.1007/BF00275687 |journal=Journal of Mathematical Biology |language=en |volume=15 |issue=3 |pages=267–273 |doi=10.1007/BF00275687 |pmid=7153672 |issn=1432-1416|url-access=subscription }}</ref> noted that [[Principal component analysis|PCA]] is equivalent to a neural network with one hidden layer with an identity activation function. In the language of autoencoding, the input-to-hidden module is the encoder, and the hidden-to-output module is the decoder. Subsequently, (Baldi and Hornik, 1989)<ref name="auto">{{Cite journal |last1=Baldi |first1=Pierre |last2=Hornik |first2=Kurt |date=1989-01-01 |title=Neural networks and principal component analysis: Learning from examples without local minima |url=https://www.sciencedirect.com/science/article/abs/pii/0893608089900142 |journal=Neural Networks |volume=2 |issue=1 |pages=53–58 |doi=10.1016/0893-6080(89)90014-2 |issn=0893-6080|url-access=subscription }}</ref> and (Kramer, 1991)<ref name=":12" /> generalized PCA to autoencoders, a technique which they termed "nonlinear PCA".
 
Immediately after the resurgence of neural networks in the 1980s, it was suggested in 1986<ref>{{Cite book |last1=Rumelhart |first1=David E. |url=https://direct.mit.edu/books/book/4424/Parallel-Distributed-ProcessingExplorations-in-the |title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations |last2=McClelland |first2=James L. |author3=PDP Research Group |date=1986 |publisher=The MIT Press |isbn=978-0-262-29140-8 |language=en |chapter=2. A General Framework for Parallel Distributed Processing |doi=10.7551/mitpress/5236.001.0001}}</ref> that a neural network be put in "auto-association mode". This was then implemented in (Harrison, 1987)<ref>Harrison TD (1987) A Connectionist framework for continuous speech recognition. Cambridge University Ph. D. dissertation</ref> and (Elman, Zipser, 1988)<ref>{{Cite journal |last1=Elman |first1=Jeffrey L. |last2=Zipser |first2=David |date=1988-04-01 |title=Learning the hidden structure of speech |url=https://pubs.aip.org/jasa/article/83/4/1615/826094/Learning-the-hidden-structure-of-speech |journal=The Journal of the Acoustical Society of America |language=en |volume=83 |issue=4 |pages=1615–1626 |doi=10.1121/1.395916 |pmid=3372872 |bibcode=1988ASAJ...83.1615E |issn=0001-4966|url-access=subscription }}</ref> for speech and in (Cottrell, Munro, Zipser, 1987)<ref>{{Cite journal |last1=Cottrell |first1=Garrison W. |last2=Munro |first2=Paul |last3=Zipser |first3=David |date=1987 |title=Learning Internal Representation From Gray-Scale Images: An Example of Extensional Programming |url=https://escholarship.org/uc/item/2zs7w6z8 |journal=Proceedings of the Annual Meeting of the Cognitive Science Society |language=en |volume=9 }}</ref> for images.<ref name=":14" /> In (Hinton, Salakhutdinov, 2006),<ref name=":72">{{cite journal |last1=Hinton |first1=G. E. |last2=Salakhutdinov |first2=R.R. |date=28 July 2006 |title=Reducing the Dimensionality of Data with Neural Networks |journal=Science |volume=313 |issue=5786 |pages=504–507 |bibcode=2006Sci...313..504H |doi=10.1126/science.1127647 |pmid=16873662 |s2cid=1658773}}</ref> [[deep belief network]]s were developed. These train a pair of [[restricted Boltzmann machine]]s as an encoder-decoder pair, then train another pair on the latent representation of the first pair, and so on.<ref name="scholar">{{Cite journal |vauthors=Hinton G |year=2009 |title=Deep belief networks |journal=Scholarpedia |volume=4 |issue=5 |pages=5947 |bibcode=2009SchpJ...4.5947H |doi=10.4249/scholarpedia.5947 |doi-access=free}}</ref>
 
The first applications of AE date to the early 1990s.<ref name=":0" /><ref>{{Cite journal |last=Schmidhuber |first=Jürgen |date=January 2015 |title=Deep learning in neural networks: An overview |journal=Neural Networks |volume=61 |pages=85–117 |arxiv=1404.7828 |doi=10.1016/j.neunet.2014.09.003 |pmid=25462637 |s2cid=11715509}}</ref><ref name=":5" /> Their most traditional application was [[dimensionality reduction]] or [[feature learning]], but the concept became widely used for learning [[generative model]]s of data.<ref name="VAE">{{cite arXiv |eprint=1312.6114 |class=stat.ML |author1=Diederik P Kingma |first2=Max |last2=Welling |title=Auto-Encoding Variational Bayes |date=2013}}</ref><ref name="gan_faces">Generating Faces with Torch, Boesen A., Larsen L. and Sonderby S.K., 2015 {{URL|http://torch.ch/blog/2015/11/13/gan.html}}</ref> Some of the most powerful [[Artificial intelligence|AIs]] in the 2010s involved autoencoder modules as components of larger AI systems, such as the VAE in [[Stable Diffusion]] and the discrete VAE in Transformer-based image generators like [[DALL-E|DALL-E 1]].
 
During the early days, when the terminology was uncertain, the autoencoder was also called an identity mapping,<ref name="auto" /><ref name=":12" /> auto-associating,<ref>{{Cite journal |last1=Ackley |first1=D |last2=Hinton |first2=G |last3=Sejnowski |first3=T |date=March 1985 |title=A learning algorithm for boltzmann machines |url=http://doi.wiley.com/10.1016/S0364-0213(85)80012-4 |journal=Cognitive Science |language=en |volume=9 |issue=1 |pages=147–169 |doi=10.1016/S0364-0213(85)80012-4}}</ref> [[self-supervised learning|self-supervised]] [[backpropagation]],<ref name=":12" /> or a Diabolo network.<ref>{{Cite journal |last1=Schwenk |first1=Holger |last2=Bengio |first2=Yoshua |date=1997 |title=Training Methods for Adaptive Boosting of Neural Networks |url=https://proceedings.neurips.cc/paper/1997/hash/9cb67ffb59554ab1dabb65bcb370ddd9-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=MIT Press |volume=10}}</ref><ref name="bengio" />
 
== Applications ==
Line 141 ⟶ 142:
In his 2006 study,<ref name=":7" /> Hinton pretrained a multi-layer autoencoder with a stack of [[Restricted Boltzmann machine|RBMs]] and then used their weights to initialize a deep autoencoder with gradually smaller hidden layers until hitting a bottleneck of 30 neurons. The resulting 30-dimensional code yielded a smaller reconstruction error than the first 30 components of a principal component analysis (PCA), and learned a representation that was qualitatively easier to interpret, clearly separating data clusters.<ref name=":0" /><ref name=":7" />
 
Reducing dimensions can improve performance on tasks such as classification.<ref name=":0" /> Indeed, the hallmark of dimensionality reduction is to place semantically related examples near each other.<ref name=":3">{{Cite journal|last1=Salakhutdinov|first1=Ruslan|last2=Hinton|first2=Geoffrey|date=2009-07-01|title=Semantic hashing|journal=International Journal of Approximate Reasoning|series=Special Section on Graphical Models and Information Retrieval|volume=50|issue=7|pages=969–978|doi=10.1016/j.ijar.2008.11.006|issn=0888-613X|doi-access=free}}</ref>
 
==== Principal component analysis ====
[[File:Reconstruction autoencoders vs PCA.png|thumb|Reconstruction of 28×28-pixel images by an autoencoder with a code size of two (a two-unit hidden layer) and the reconstruction from the first two principal components of PCA. Images come from the [[Fashion MNIST|Fashion MNIST dataset]].<ref name=":10" />]]
If linear activations are used, or only a single sigmoid hidden layer, then the optimal solution to an autoencoder is strongly related to [[principal component analysis]] (PCA).<ref name=":14">{{Cite journal|last1=Bourlard|first1=H.|last2=Kamp|first2=Y.|date=1988|title=Auto-association by multilayer perceptrons and singular value decomposition|journal=Biological Cybernetics|volume=59|issue=4–5|pages=291–294|doi=10.1007/BF00332918|pmid=3196773|s2cid=206775335|url=http://infoscience.epfl.ch/record/82601}}</ref><ref>{{cite book|title=Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics - BCB '14|last1=Chicco|first1=Davide|last2=Sadowski|first2=Peter|last3=Baldi|first3=Pierre|date=2014|isbn=9781450328944|pages=533|chapter=Deep autoencoder neural networks for gene ontology annotation predictions|doi=10.1145/2649387.2649442|hdl=11311/964622|s2cid=207217210|url=http://dl.acm.org/citation.cfm?id=2649442}}</ref> The weights of an autoencoder with a single hidden layer of size <math>p</math> (where <math>p</math> is less than the size of the input) span the same vector subspace as the one spanned by the first <math>p</math> principal components, and the output of the autoencoder is an orthogonal projection onto this subspace. The autoencoder weights are not equal to the principal components, and are generally not orthogonal, yet the principal components may be recovered from them using the [[singular value decomposition]].<ref>{{cite arXiv|last1=Plaut|first1=E|title=From Principal Subspaces to Principal Components with Linear Autoencoders|eprint=1804.10253|date=2018|class=stat.ML}}</ref>
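This relation can be illustrated with a small NumPy sketch (illustrative, not taken from the cited works): an optimal linear autoencoder is constructed analytically, its decoder columns are not orthogonal, yet an SVD of the decoder recovers the principal subspace.

<syntaxhighlight lang="python">
# Illustrative NumPy sketch: an optimal linear autoencoder reconstructs by
# projecting onto the top-p principal subspace; its decoder columns are not
# orthogonal, yet an SVD of the decoder recovers that subspace.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)) @ rng.normal(size=(20, 20))  # correlated data
X -= X.mean(axis=0)                                          # centre the data
p = 3

_, _, Vt = np.linalg.svd(X, full_matrices=False)
pca_basis = Vt[:p].T                          # first p principal directions, shape (20, p)

# One optimal linear autoencoder (not unique): any invertible mixing M of the
# code gives the same reconstruction map W_d @ W_e = projection onto the subspace.
M = rng.normal(size=(p, p))
W_e = np.linalg.inv(M) @ pca_basis.T          # encoder weights, shape (p, 20)
W_d = pca_basis @ M                           # decoder weights, shape (20, p)

U, _, _ = np.linalg.svd(W_d, full_matrices=False)            # orthonormal basis of span(W_d)
print(np.allclose(U @ U.T, pca_basis @ pca_basis.T))         # same projection -> True
</syntaxhighlight>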
 
However, the potential of autoencoders resides in their non-linearity, allowing the model to learn more powerful generalizations compared to PCA, and to reconstruct the input with significantly lower information loss.<ref name=":7" />
 
=== Information retrieval and search engine optimization ===
[[Information retrieval]] benefits particularly from [[dimensionality reduction]] in that search can become more efficient in certain kinds of low dimensional spaces. Autoencoders were indeed applied to semantic hashing, proposed by [[Russ Salakhutdinov|Salakhutdinov]] and Hinton in 2007.<ref name=":3" /> By training the algorithm to produce a low-dimensional binary code, all database entries could be stored in a [[hash table]] mapping binary code vectors to entries. This table would then support information retrieval by returning all entries with the same binary code as the query, or slightly less similar entries by flipping some bits from the query encoding.
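A minimal sketch of such a semantic-hashing index is shown below; it is illustrative, and <code>encode</code> stands for any encoder whose outputs have been binarized into a 0/1 code vector.

<syntaxhighlight lang="python">
# Minimal sketch of a semantic-hashing index: binary codes produced by an
# encoder serve as hash-table keys; a query probes its own bucket and the
# buckets whose codes differ by one flipped bit.
from collections import defaultdict

def to_code(bits):
    return tuple(int(b) for b in bits)

def build_index(items, encode):
    index = defaultdict(list)
    for item in items:
        index[to_code(encode(item))].append(item)
    return index

def query(index, query_code):
    code = to_code(query_code)
    results = list(index.get(code, []))
    for i in range(len(code)):                 # slightly less similar entries: flip one bit
        flipped = list(code)
        flipped[i] ^= 1
        results.extend(index.get(tuple(flipped), []))
    return results
</syntaxhighlight>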
 
The encoder-decoder architecture, often used in natural language processing, can also be applied in the field of search engine optimization (SEO) in various ways:
 
# '''Text Processing''': By using an autoencoder, it's possible to compress the text of web pages into a more compact vector representation. This can help reduce page loading times and improve indexing by search engines.
# '''Noise Reduction''': Autoencoders can be used to remove noise from the textual data of web pages. This can lead to a better understanding of the content by search engines, thereby enhancing ranking in search engine result pages.
# '''Meta Tag and Snippet Generation''': Autoencoders can be trained to automatically generate meta tags, snippets, and descriptions for web pages using the page content. This can optimize the presentation in search results, increasing the Click-Through Rate (CTR).
# '''Content Clustering''': Using an autoencoder, web pages with similar content can be automatically grouped together. This can help organize the website logically and improve navigation, potentially positively affecting user experience and search engine rankings.
# '''Generation of Related Content''': An autoencoder can be employed to generate content related to what is already present on the site. This can enhance the website's attractiveness to search engines and provide users with additional relevant information.
# '''Keyword Detection''': Autoencoders can be trained to identify keywords and important concepts within the content of web pages. This can assist in optimizing keyword usage for better indexing.
# '''Semantic Search''': By using autoencoder techniques, semantic representation models of content can be created. These models can be used to enhance search engines' understanding of the themes covered in web pages.
 
In essence, the encoder-decoder architecture of autoencoders can be leveraged in SEO to optimize web page content, improve its indexing, and enhance its appeal to both search engines and users.
 
=== Anomaly detection ===
Another application for autoencoders is [[anomaly detection]].<ref name=":13" /><ref>{{Cite book |last1=Morales-Forero |first1=A. |last2=Bassetto |first2=S. |title=2019 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM) |chapter=Case Study: A Semi-Supervised Methodology for Anomaly Detection and Diagnosis |date=December 2019 |chapter-url=https://ieeexplore.ieee.org/document/8978509 |___location=Macao, Macao |publisher=IEEE |pages=1031–1037 |doi=10.1109/IEEM44572.2019.8978509 |isbn=978-1-7281-3804-6|s2cid=211027131 }}</ref><ref>{{Cite book |last1=Sakurada |first1=Mayu |last2=Yairi |first2=Takehisa |title=Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis |chapter=Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction |date=December 2014 |chapter-url=http://dl.acm.org/citation.cfm?doid=2689746.2689747 |language=en |___location=Gold Coast, Australia QLD, Australia |publisher=ACM Press |pages=4–11 |doi=10.1145/2689746.2689747 |isbn=978-1-4503-3159-3|s2cid=14613395 }}</ref><ref name=":8">An, J., & Cho, S. (2015). [http://dm.snu.ac.kr/static/docs/TR/SNUDM-TR-2015-03.pdf Variational Autoencoder based Anomaly Detection using Reconstruction Probability]. ''Special Lecture on IE'', ''2'', 1-18.</ref><ref>{{Cite book |last1=Zhou |first1=Chong |last2=Paffenroth |first2=Randy C. |title=Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |chapter=Anomaly Detection with Robust Deep Autoencoders |date=2017-08-04 |chapter-url=https://dl.acm.org/doi/10.1145/3097983.3098052 |language=en |publisher=ACM |pages=665–674 |doi=10.1145/3097983.3098052 |isbn=978-1-4503-4887-4|s2cid=207557733 }}</ref><ref>{{Cite journal|doi=10.1016/j.patrec.2017.07.016|title=A study of deep convolutional auto-encoders for anomaly detection in videos|year=2018|last1=Ribeiro|first1=Manassés|last2=Lazzaretti|first2=André Eugênio|last3=Lopes|first3=Heitor Silvério|journal=Pattern Recognition Letters|volume=105|pages=13–22|bibcode=2018PaReL.105...13R}}</ref> By learning to replicate the most salient features in the training data under some of the constraints described previously, the model is encouraged to learn to precisely reproduce the most frequently observed characteristics. When facing anomalies, the model should worsen its reconstruction performance. In most cases, only data with normal instances are used to train the autoencoder; in others, the frequency of anomalies is small compared to the observation set so that its contribution to the learned representation could be ignored. After training, the autoencoder will accurately reconstruct "normal" data, while failing to do so with unfamiliar anomalous data.<ref name=":8" /> Reconstruction error (the error between the original data and its low dimensional reconstruction) is used as an anomaly score to detect anomalies.<ref name=":8" />
 
Typically, the empirical distribution of reconstruction errors is recorded on a validation set, and then (for example) the empirical 95th percentile <math>x_p</math> is taken as the threshold <math>t:=x_p</math> to flag anomalous data points: <math>\text{loss}(x, \text{reconstruction}(x))>t \implies \text{anomaly}</math>. Since the threshold is an empirical [[quantile]] estimate, there is an inherent difficulty with "correctly" setting it: in many cases the empirical quantile is asymptotically normally distributed, <math>\text{empirical } p\text{-quantile} \sim \mathcal{N}\left(\mu=x_p, \sigma^2=\frac{p( 1 - p )}{n f(x_p)^2}\right),</math> with <math>f(x_p)</math> the probability density at the quantile. The variance grows when an extreme quantile is considered (because <math>f(x_p)</math> is small there), so there is potentially large uncertainty in the right choice of threshold, since it is ''estimated'' from a validation set. A minimal sketch of this threshold-selection procedure is given at the end of this subsection.

It is advisable to check whether the anomalies flagged by the autoencoder are true anomalies; to this end, all the metrics in [[Evaluation of binary classifiers]] can be considered. The fundamental challenge of the unsupervised (self-supervised) setting is that labels for rare events either do not exist (in which case they first have to be gathered and the resulting data set will be imbalanced) or are very rare, introducing larger [[confidence interval]]s for these performance estimates.

Recent literature has however shown that certain autoencoding models can, counterintuitively, be very good at reconstructing anomalous examples and consequently not able to reliably perform anomaly detection.<ref>{{cite arXiv|last1=Nalisnick|first1=Eric|last2=Matsukawa|first2=Akihiro|last3=Teh|first3=Yee Whye|last4=Gorur|first4=Dilan|last5=Lakshminarayanan|first5=Balaji|date=2019-02-24|title=Do Deep Generative Models Know What They Don't Know?|class=stat.ML|eprint=1810.09136}}</ref><ref>{{Cite journal|last1=Xiao|first1=Zhisheng|last2=Yan|first2=Qing|last3=Amit|first3=Yali|date=2020|title=Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder|url=https://proceedings.neurips.cc/paper/2020/hash/eddea82ad2755b24c4e168c5fc2ebd40-Abstract.html|journal=Advances in Neural Information Processing Systems|language=en|volume=33|arxiv=2003.02977}}</ref> Intuitively, this can be understood by considering the single-layer autoencoders that are related to PCA: in that case, too, there can be perfect reconstructions of points that are far from the data region but lie on a principal component axis.
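The threshold selection described above can be sketched as follows; this is a minimal illustration in which <code>model</code> is assumed to be a fitted autoencoder exposed as a NumPy-compatible callable, and the 95% level is an illustrative choice.

<syntaxhighlight lang="python">
# Minimal sketch of the threshold selection described above: record
# reconstruction errors on a validation set of normal data and flag points
# whose error exceeds the empirical 95th percentile.
import numpy as np

def reconstruction_errors(model, X):
    return np.sum((X - model(X)) ** 2, axis=1)   # squared L2 error per example

def fit_threshold(model, X_val, p=0.95):
    return np.quantile(reconstruction_errors(model, X_val), p)

def is_anomaly(model, X_new, threshold):
    return reconstruction_errors(model, X_new) > threshold
</syntaxhighlight>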
 
=== Image processing ===
Line 182 ⟶ 171:
Another useful application of autoencoders in image preprocessing is [[image denoising]].<ref>Cho, K. (2013, February). Simple sparsification improves sparse denoising autoencoders in denoising highly corrupted images. In ''International Conference on Machine Learning'' (pp. 432-440).</ref><ref>{{cite arXiv |eprint=1301.3468|last1=Cho|first1=Kyunghyun|title=Boltzmann Machines and Denoising Autoencoders for Image Denoising|class=stat.ML|date=2013}}</ref><ref>{{Cite journal|doi = 10.1137/040616024|title = A Review of Image Denoising Algorithms, with a New One |url=https://hal.archives-ouvertes.fr/hal-00271141 |year = 2005|last1 = Buades|first1 = A.|last2 = Coll|first2 = B.|last3 = Morel|first3 = J. M.|journal = Multiscale Modeling & Simulation|volume = 4|issue = 2|pages = 490–530|s2cid = 218466166 }}</ref>
 
Autoencoders found use in more demanding contexts such as [[medical imaging]] where they have been used for [[image denoising]]<ref>{{Cite book|last=Gondara|first=Lovedeep|title=2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) |chapter=Medical Image Denoising Using Convolutional Denoising Autoencoders |date=December 2016|___location=Barcelona, Spain|publisher=IEEE|pages=241–246|doi=10.1109/ICDMW.2016.0041|isbn=9781509059102|arxiv=1608.04667|bibcode=2016arXiv160804667G|s2cid=14354973}}</ref> as well as [[super-resolution]].<ref>{{Cite journal|last1=Zeng|first1=Kun|last2=Yu|first2=Jun|last3=Wang|first3=Ruxin|last4=Li|first4=Cuihua|last5=Tao|first5=Dacheng|s2cid=20787612|date=January 2017|title=Coupled Deep Autoencoder for Single Image Super-Resolution|journal=IEEE Transactions on Cybernetics|volume=47|issue=1|pages=27–37|doi=10.1109/TCYB.2015.2501373|pmid=26625442|bibcode=2017ITCyb..47...27Z |issn=2168-2267}}</ref><ref>{{cite book |last1=Tzu-Hsi |first1=Song |last2=Sanchez |first2=Victor |last3=Hesham |first3=EIDaly |last4=Nasir M. |first4=Rajpoot |title=2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017) |chapter=Hybrid deep autoencoder with Curvature Gaussian for detection of various types of cells in bone marrow trephine biopsy images |date=2017 |pages=1040–1043 |doi=10.1109/ISBI.2017.7950694 |isbn=978-1-5090-1172-8 |s2cid=7433130 }}</ref> In image-assisted diagnosis, experiments have applied autoencoders for [[breast cancer]] detection<ref>{{cite journal |last1=Xu |first1=Jun |last2=Xiang |first2=Lei |last3=Liu |first3=Qingshan |last4=Gilmore |first4=Hannah |last5=Wu |first5=Jianzhong |last6=Tang |first6=Jinghai |last7=Madabhushi |first7=Anant |title=Stacked Sparse Autoencoder (SSAE) for Nuclei Detection on Breast Cancer Histopathology Images |journal=IEEE Transactions on Medical Imaging |date=January 2016 |volume=35 |issue=1 |pages=119–130 |doi=10.1109/TMI.2015.2458702 |pmid=26208307 |pmc=4729702 |bibcode=2016ITMI...35..119X }}</ref> and for modelling the relation between the cognitive decline of [[Alzheimer's disease]] and the latent features of an autoencoder trained with [[MRI]].<ref>{{cite journal |last1=Martinez-Murcia |first1=Francisco J. |last2=Ortiz |first2=Andres |last3=Gorriz |first3=Juan M. |last4=Ramirez |first4=Javier |last5=Castillo-Barnes |first5=Diego |s2cid=195187846 |title=Studying the Manifold Structure of Alzheimer's Disease: A Deep Learning Approach Using Convolutional Autoencoders |journal=IEEE Journal of Biomedical and Health Informatics |volume=24 |issue=1 |pages=17–26 |doi=10.1109/JBHI.2019.2914970 |pmid=31217131 |date=2020 |bibcode=2020IJBHI..24...17M |doi-access=free |hdl=10630/28806 |hdl-access=free }}</ref>
 
=== Drug discovery ===
Line 194 ⟶ 183:
 
=== Communication systems ===
In communication systems, autoencoders help encode data into representations that are resilient to channel impairments, which is crucial for transmitting information while minimizing errors. AE-based systems can also optimize end-to-end communication performance, addressing several limitations of traditional communication system design, such as the inherent difficulty of accurately modeling the complex behavior of real-world channels.<ref>{{cite arXiv |eprint=2412.13843|last1=Alnaseri|first1=Omar|last2=Alzubaidi|first2=Laith|last3=Himeur|first3=Yassine|last4=Timmermann|first4=Jens|title=A Review on Deep Learning Autoencoder in the Design of Next-Generation Communication Systems|class=eess.SP|date=2024}}</ref>
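A minimal sketch of such an end-to-end "channel autoencoder" is given below; it is an illustrative construction, not the design from the cited review, with the message set size, number of channel uses, noise level, and power constraint all chosen as assumptions.

<syntaxhighlight lang="python">
# Minimal sketch of an end-to-end channel autoencoder (illustrative): the
# encoder maps one of 16 messages to 7 channel uses, a fixed AWGN layer models
# the channel, and the decoder classifies which message was sent.
import torch
from torch import nn

M, n = 16, 7                                   # message set size, channel uses

encoder = nn.Sequential(nn.Linear(M, 32), nn.ReLU(), nn.Linear(32, n))
decoder = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, M))
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for _ in range(200):
    msgs = torch.randint(0, M, (128,))         # random messages
    x = nn.functional.one_hot(msgs, M).float()
    tx = encoder(x)
    tx = tx / tx.norm(dim=1, keepdim=True)     # per-codeword power constraint
    rx = tx + 0.1 * torch.randn_like(tx)       # AWGN channel (noise level illustrative)
    loss = nn.functional.cross_entropy(decoder(rx), msgs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
</syntaxhighlight>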
 
==See also==
* [[Representation learning]]
* [[Singular value decomposition]]
* [[Sparse dictionary learning]]
* [[Deep learning]]