Another application for autoencoders is [[anomaly detection]].<ref name=":13" /><ref>{{Cite book |last1=Morales-Forero |first1=A. |last2=Bassetto |first2=S. |title=2019 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM) |chapter=Case Study: A Semi-Supervised Methodology for Anomaly Detection and Diagnosis |date=December 2019 |location=Macao, Macao |publisher=IEEE |pages=1031–1037 |doi=10.1109/IEEM44572.2019.8978509 |isbn=978-1-7281-3804-6|s2cid=211027131 }}</ref><ref>{{Cite book |last1=Sakurada |first1=Mayu |last2=Yairi |first2=Takehisa |title=Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis |chapter=Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction |date=December 2014 |chapter-url=http://dl.acm.org/citation.cfm?doid=2689746.2689747 |language=en |location=Gold Coast, Australia QLD, Australia |publisher=ACM Press |pages=4–11 |doi=10.1145/2689746.2689747 |isbn=978-1-4503-3159-3|s2cid=14613395 }}</ref><ref name=":8">An, J., & Cho, S. (2015). [http://dm.snu.ac.kr/static/docs/TR/SNUDM-TR-2015-03.pdf Variational Autoencoder based Anomaly Detection using Reconstruction Probability]. ''Special Lecture on IE'', ''2'', 1–18.</ref><ref>{{Cite book |last1=Zhou |first1=Chong |last2=Paffenroth |first2=Randy C. 
|title=Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |chapter=Anomaly Detection with Robust Deep Autoencoders |date=2017-08-04 |chapter-url=https://dl.acm.org/doi/10.1145/3097983.3098052 |language=en |publisher=ACM |pages=665–674 |doi=10.1145/3097983.3098052 |isbn=978-1-4503-4887-4|s2cid=207557733 }}</ref><ref>{{Cite journal|doi=10.1016/j.patrec.2017.07.016|title=A study of deep convolutional auto-encoders for anomaly detection in videos|year=2018|last1=Ribeiro|first1=Manassés|last2=Lazzaretti|first2=André Eugênio|last3=Lopes|first3=Heitor Silvério|journal=Pattern Recognition Letters|volume=105|pages=13–22|bibcode=2018PaReL.105...13R}}</ref> By learning to replicate the most salient features of the training data under some of the constraints described previously, the model is encouraged to reproduce the most frequently observed characteristics precisely. When it is presented with anomalies, its reconstruction performance is expected to worsen. In most cases, only data with normal instances are used to train the autoencoder; in others, anomalies are so rare relative to the rest of the observations that their contribution to the learned representation can be ignored. After training, the autoencoder is expected to reconstruct "normal" data accurately while failing to do so for unfamiliar anomalous data.<ref name=":8" /> The reconstruction error (the error between the original data and its reconstruction from the low-dimensional representation) is used as an anomaly score to detect anomalies.<ref name=":8" />
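The following Python sketch illustrates reconstruction-error scoring under simplifying assumptions that do not come from the cited sources. A linear autoencoder with a one-dimensional bottleneck stands in for a deep autoencoder. It is trained by per-sample gradient descent on synthetic two-dimensional "normal" data lying near the line <math>x_1 = x_2</math>. The function names <code>train_linear_autoencoder</code> and <code>reconstruction_error</code> are hypothetical.

```python
import random


def train_linear_autoencoder(data, lr=0.05, epochs=200, seed=0):
    """Fit a hypothetical 2 -> 1 -> 2 linear autoencoder (code z = w.x,
    reconstruction z * v) by per-sample SGD on ||x - z * v||^2."""
    rng = random.Random(seed)
    w = [0.3, 0.2]  # encoder weights; non-zero start avoids the w = v = 0 saddle
    v = [0.2, 0.3]  # decoder weights
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = data[i]
            z = w[0] * x[0] + w[1] * x[1]            # 1-D bottleneck code
            r = [x[0] - z * v[0], x[1] - z * v[1]]   # residual: x - reconstruction
            v_dot_r = v[0] * r[0] + v[1] * r[1]
            for k in range(2):
                # Gradient steps: dL/dv = -2 r z, dL/dw = -2 (v . r) x
                v[k] += lr * 2 * r[k] * z
                w[k] += lr * 2 * v_dot_r * x[k]
    return w, v


def reconstruction_error(w, v, x):
    """Anomaly score: squared error between x and its reconstruction."""
    z = w[0] * x[0] + w[1] * x[1]
    return (x[0] - z * v[0]) ** 2 + (x[1] - z * v[1]) ** 2


# Hypothetical "normal" training data: noisy points near the line x1 = x2.
noise = random.Random(1)
normal_data = [
    (t / 10 + noise.gauss(0, 0.05), t / 10 + noise.gauss(0, 0.05))
    for t in range(-10, 11)
]
w, v = train_linear_autoencoder(normal_data)
```

After training, a point along the learned direction, such as (0.8, 0.8), should receive a small score. The point (0.8, -0.8) lies orthogonal to the training data, so the one-dimensional code cannot represent it, and its score should be close to its squared norm of 1.28.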
In practice, the empirical distribution of reconstruction errors on a validation set is recorded. A high empirical quantile of it, for example the 95th percentile <math>\hat{x}_p</math> with <math>p = 0.95</math>, is then taken as the threshold <math>t := \hat{x}_p</math> for flagging anomalous data points: <math>\text{loss}(x, \text{reconstruction}(x)) > t \implies \text{anomaly}</math>. Because the threshold is an empirical [[quantile]] estimate, setting it "correctly" is inherently difficult:
In many cases, the empirical <math>p</math>-quantile <math>\hat{x}_p</math> of <math>n</math> validation errors is approximately normally distributed for large <math>n</math>: <math>\hat{x}_p \sim \mathcal{N}\left(\mu = x_p, \sigma^2 = \frac{p(1 - p)}{n f(x_p)^2}\right),</math> where <math>x_p</math> is the true <math>p</math>-quantile and <math>f(x_p)</math> is the probability density at that quantile. For extreme quantiles the density <math>f(x_p)</math> is typically small, so the variance is large. As a result, there can be considerable uncertainty about the right choice of threshold.
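The Python sketch below shows the thresholding rule and the sampling variability of the threshold. For illustration only, it assumes that validation reconstruction errors follow an Exponential(1) distribution. In that case <math>x_p = -\ln(1-p)</math> and <math>f(x_p) = 1 - p</math>, so the asymptotic variance reduces to <math>p / (n(1-p))</math>. The sketch uses the "<math>\lceil pn \rceil</math>-th smallest value" estimator; other conventions, such as linear interpolation between order statistics, are also common. All helper names are hypothetical.

```python
import math
import random
import statistics


def empirical_quantile(values, p):
    """Order-statistic estimate of the p-quantile: the ceil(p * n)-th smallest value."""
    s = sorted(values)
    return s[max(1, math.ceil(p * len(s))) - 1]


def flag_anomalies(validation_errors, errors, p=0.95):
    """Threshold t := empirical p-quantile of validation errors; flag errors above t."""
    t = empirical_quantile(validation_errors, p)
    return t, [e > t for e in errors]


def quantile_sd_simulated(p, n=1000, reps=500, seed=0):
    """Standard deviation of the empirical p-quantile over repeated validation
    sets of n errors drawn from a hypothetical Exponential(1) model."""
    rng = random.Random(seed)
    estimates = [
        empirical_quantile([rng.expovariate(1.0) for _ in range(n)], p)
        for _ in range(reps)
    ]
    return statistics.stdev(estimates)


def quantile_sd_asymptotic(p, n=1000):
    """sqrt(p(1 - p) / (n f(x_p)^2)), using f(x_p) = 1 - p for Exponential(1)."""
    density = 1 - p
    return math.sqrt(p * (1 - p) / (n * density ** 2))


for p in (0.5, 0.95, 0.99):
    print(p, round(quantile_sd_simulated(p), 3), round(quantile_sd_asymptotic(p), 3))
```

With <math>n = 1000</math>, the formula gives standard deviations of about 0.032, 0.14 and 0.31 for <math>p</math> = 0.5, 0.95 and 0.99. The simulated values should be close to these, which shows the threshold becoming much less certain as <math>p</math> approaches 1.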