T-distributed stochastic neighbor embedding: Difference between revisions

m Fixing a grammatical error.
Yobot (talk | contribs)
m WP:CHECKWIKI error fixes using AWB (12041)
Line 5:
The t-SNE algorithm comprises two main stages. First, t-SNE constructs a [[probability distribution]] over pairs of high-dimensional objects in such a way that similar objects have a high probability of being picked, whilst dissimilar objects have an extremely small probability of being picked. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the [[Kullback–Leibler divergence]] between the two distributions with respect to the locations of the points in the map. Note that whilst the original algorithm uses the [[Euclidean distance]] between objects as the base of its similarity metric, another distance measure can be substituted where it better suits the data.
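
The first stage can be illustrated with a minimal sketch. It is a simplification under stated assumptions: a single shared Gaussian bandwidth <code>sigma</code> stands in for the per-point bandwidths that t-SNE derives from a perplexity setting, and the function name, parameter names and sample data are hypothetical. The <code>distance</code> argument shows how a non-Euclidean measure could be substituted.

```python
import math


def pairwise_affinities(points, distance=math.dist, sigma=1.0):
    """Return a probability distribution over ordered pairs (i, j), i != j.

    Simplification (assumption): one shared bandwidth `sigma` is used for
    every point, whereas t-SNE proper tunes a bandwidth per point.
    """
    n = len(points)
    weights = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                d = distance(points[i], points[j])
                # Gaussian kernel: close pairs get large weights.
                weights[(i, j)] = math.exp(-d * d / (2.0 * sigma ** 2))
    total = sum(weights.values())
    # Normalise so the weights over all pairs sum to 1.
    return {pair: w / total for pair, w in weights.items()}


def manhattan(a, b):
    """An alternative base distance, used here only as an illustration."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Made-up data: two nearby objects and one distant object.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
p_euclidean = pairwise_affinities(points)
p_manhattan = pairwise_affinities(points, distance=manhattan)
```

In both variants the pair of nearby objects receives almost all of the probability mass, while pairs involving the distant object receive a very small but non-zero share.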
 
t-SNE has been used in a wide range of applications, including [[computer security]] research,<ref>{{cite journal|last=Gashi|first=I.|author2=Stankovic, V. |author3=Leita, C. |author4=Thonnard, O. |title=An Experimental Study of Diversity with Off-the-shelf AntiVirus Engines|journal=Proceedings of the IEEE International Symposium on Network Computing and Applications|year=2009|pages=4–11}}</ref> [[music analysis]],<ref>{{cite journal|last=Hamel|first=P.|author2=Eck, D. |title=Learning Features from Music Audio with Deep Belief Networks|journal=Proceedings of the International Society for Music Information Retrieval Conference|year=2010|pages=339–344}}</ref> [[cancer research]],<ref>{{cite journal|last=Jamieson|first=A.R.|author2=Giger, M.L. |author3=Drukker, K. |author4=Lui, H. |author5=Yuan, Y. |author6=Bhooshan, N. |title=Exploring Nonlinear Feature Space Dimension Reduction and Data Representation in Breast CADx with Laplacian Eigenmaps and t-SNE|journal=Medical Physics |issue=1|year=2010|pages=339–351|doi=10.1118/1.3267037|volume=37}}</ref> [[bioinformatics]],<ref>{{cite journal|last=Wallach|first=I.|author2=Lilien, R. |title=The Protein-Small-Molecule Database, A Non-Redundant Structural Resource for the Analysis of Protein-Ligand Binding|journal=Bioinformatics |year=2009|pages=615–620|doi=10.1093/bioinformatics/btp035|volume=25|issue=5}}</ref> and biomedical signal processing.<ref>{{Cite journal|last=Birjandtalab|first=J.|last2=Pouyan|first2=M. B.|last3=Nourani|first3=M.|date=2016-02-01|title=Nonlinear dimension reduction for EEG-based epileptic seizure detection|url=http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7455968|journal=2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)|pages=595–598|doi=10.1109/BHI.2016.7455968}}</ref>
 
== Details ==
Line 22:
Herein a heavy-tailed [[Student-t distribution]] (with one degree of freedom, which is the same as a [[Cauchy distribution]]) is used to measure similarities between low-dimensional points in order to allow dissimilar objects to be modeled far apart in the map.
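
As a hedged illustration, the sketch below computes low-dimensional similarities with the usual one-degree-of-freedom Student-t kernel, in which each pair's weight is <math>(1 + \lVert \mathbf{y}_i - \mathbf{y}_j \rVert^2)^{-1}</math> normalised over all pairs. The function name and the map coordinates are hypothetical, and the final lines compare the kernel's tail with a unit-variance Gaussian purely for intuition.

```python
import math


def low_dim_similarities(ys):
    """Return similarities q[(i, j)] for i != j from a Student-t (Cauchy) kernel."""
    n = len(ys)
    weights = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(ys[i], ys[j]))
                # Heavy-tailed kernel: decays polynomially, not exponentially.
                weights[(i, j)] = 1.0 / (1.0 + sq_dist)
    total = sum(weights.values())
    # Normalise over all pairs so the similarities sum to 1.
    return {pair: w / total for pair, w in weights.items()}


# Made-up map coordinates: two neighbouring points and one far away.
ys = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
q = low_dim_similarities(ys)

# Unnormalised kernel weights at distance 10, for comparison only.
student_t_weight = 1.0 / (1.0 + 10.0 ** 2)
gaussian_weight = math.exp(-(10.0 ** 2) / 2.0)
```

At a distance of 10 the Student-t weight is many orders of magnitude larger than the Gaussian weight, which is why moderately dissimilar objects can sit far apart in the map without being penalised as heavily.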
 
The locations of the points <math>\mathbf{y}_i</math> in the map are determined by minimizing the (non-symmetric) [[Kullback–Leibler divergence]] of the distribution <math>Q</math> from the distribution <math>P</math>, that is:
 
: <math>KL(P||Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}</math>
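
The cost can be evaluated directly from the two sets of pairwise probabilities. The following is a minimal sketch, not an optimiser: the function name and the example distributions are hypothetical, and both dictionaries are assumed to be keyed by the same ordered pairs <math>(i, j)</math> with <math>i \neq j</math>.

```python
import math


def kl_divergence(p, q):
    """Sum p_ij * log(p_ij / q_ij) over all pairs; zero-probability terms contribute 0."""
    return sum(
        p_ij * math.log(p_ij / q[pair])
        for pair, p_ij in p.items()
        if p_ij > 0.0
    )


# Made-up distributions over the ordered pairs of three points.
p = {(0, 1): 0.4, (1, 0): 0.4, (0, 2): 0.05, (2, 0): 0.05, (1, 2): 0.05, (2, 1): 0.05}
q_matching = dict(p)                       # map reproduces P exactly
q_uniform = {pair: 1.0 / 6.0 for pair in p}  # map ignores the structure in P

cost_matching = kl_divergence(p, q_matching)
cost_uniform = kl_divergence(p, q_uniform)
cost_reversed = kl_divergence(q_uniform, p)
```

The cost is zero when the map reproduces <math>P</math> exactly and positive otherwise, and swapping the arguments gives a different value, which reflects the non-symmetric nature of the divergence.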