Content deleted Content added
Birjandtalab (talk | contribs) m Fixing a grammatical error. |
m WP:CHECKWIKI error fixes using AWB (12041) |
||
Line 5:
The t-SNE algorithm comprises two main stages. First, t-SNE constructs a [[probability distribution]] over pairs of high-dimensional objects in such a way that similar objects have a high probability of being picked, whilst dissimilar points have an [[infinitesimal]] probability of being picked. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the [[Kullback–Leibler divergence]] between the two distributions with respect to the locations of the points in the map. Note that whilst the original algorithm uses the [[Euclidean distance]] between objects as the base of its similarity metric, this should be changed as appropriate.
t-SNE has been used in a wide range of applications, including [[computer security]] research,<ref>{{cite journal|last=Gashi|first=I.|author2=Stankovic, V. |author3=Leita, C. |author4=Thonnard, O. |title=An Experimental Study of Diversity with Off-the-shelf AntiVirus Engines|journal=Proceedings of the IEEE International Symposium on Network Computing and Applications|year=2009|pages=4–11}}</ref> [[music analysis]],<ref>{{cite journal|last=Hamel|first=P.|author2=Eck, D. |title=Learning Features from Music Audio with Deep Belief Networks|journal=Proceedings of the International Society for Music Information Retrieval Conference|year=2010|pages=339–344}}</ref> [[cancer research]],<ref>{{cite journal|last=Jamieson|first=A.R.|author2=Giger, M.L. |author3=Drukker, K. |author4=Lui, H. |author5=Yuan, Y. |author6=Bhooshan, N. |title=Exploring Nonlinear Feature Space Dimension Reduction and Data Representation in Breast CADx with Laplacian Eigenmaps and t-SNE|journal=Medical Physics |issue=1|year=2010|pages=339–351|doi=10.1118/1.3267037|volume=37}}</ref> [[bioinformatics]],<ref>{{cite journal|last=Wallach|first=I.|author2=Liliean, R. |title=The Protein-Small-Molecule Database, A Non-Redundant Structural Resource for the Analysis of Protein-Ligand Binding|journal=Bioinformatics |year=2009|pages=615–620|doi=10.1093/bioinformatics/btp035|volume=25|issue=5}}</ref>
== Details ==
Line 22:
Herein a heavy-tailed [[Student-t distribution]] (with one-degree of freedom, which is the same as a [[Cauchy distribution]]) is used to measure similarities between low-dimensional points in order to allow dissimilar objects to be modeled far apart in the map.
The locations of the points <math>\mathbf{y}_i</math> in the map are determined by minimizing the (non-symmetric) [[Kullback–Leibler divergence]] of the distribution <math>Q</math> from the distribution <math>P</math>, that is:
: <math>KL(P||Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}</math>
|