T-distributed stochastic neighbor embedding

{{lower case title}}
{{Machine learning bar}}
'''t-distributed stochastic neighbor embedding (t-SNE)''' is a [[machine learning]] algorithm for [[dimensionality reduction]] developed by [[Laurens van der Maaten]] and [[Geoffrey Hinton]].<ref>{{cite journal|last=van der Maaten|first=L.J.P.|author2=Hinton, G.E. |title=Visualizing High-Dimensional Data Using t-SNE|journal=Journal of Machine Learning Research |volume=9|date=Nov 2008|pages=2579–2605|url=http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf}}</ref> It is a [[nonlinear dimensionality reduction]] technique that is particularly well suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a [[scatter plot]]. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points.
 
The t-SNE algorithm comprises two main stages. First, t-SNE constructs a [[probability distribution]] over pairs of high-dimensional objects in such a way that similar objects have a high probability of being picked, whilst dissimilar objects have an extremely small probability of being picked. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the [[Kullback–Leibler divergence]] between the two distributions with respect to the locations of the points in the map. Note that whilst the original algorithm bases its similarity metric on the [[Euclidean distance]] between objects, a different distance measure can be substituted where it suits the data better.
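
The two stages can be illustrated with a minimal Python sketch. It is not the reference implementation: it assumes a single fixed Gaussian bandwidth <code>sigma</code> for every point (the published algorithm instead tunes a bandwidth per point from a user-chosen perplexity), it only evaluates the Kullback–Leibler objective for a given map rather than minimizing it, and the function names are hypothetical ones chosen for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def high_dim_affinities(points, sigma=1.0):
    """Stage 1: joint probabilities P over pairs of high-dimensional objects.

    Assumption: one fixed bandwidth `sigma` for all points, which keeps the
    sketch short. Very distant points could underflow to zero weight.
    """
    n = len(points)
    cond = []
    for i in range(n):
        # Gaussian similarity of every other point j to point i.
        weights = [
            0.0 if i == j
            else math.exp(-sq_dist(points[i], points[j]) / (2.0 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        cond.append([w / total for w in weights])  # conditional p(j | i)
    # Symmetrize the conditionals into a joint distribution that sums to 1.
    return [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)]
            for i in range(n)]


def low_dim_affinities(embedding):
    """Stage 2: joint probabilities Q in the map, using a Student-t kernel
    with one degree of freedom."""
    n = len(embedding)
    weights = [
        [0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
         for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def kl_divergence(p, q):
    """KL(P || Q), summed over all pairs with nonzero p."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0.0
    )


# Two tight clusters in three dimensions.
data = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
P = high_dim_affinities(data)

# A map that keeps the clusters together, and one that splits them up.
good_map = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
bad_map = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]

kl_good = kl_divergence(P, low_dim_affinities(good_map))
kl_bad = kl_divergence(P, low_dim_affinities(bad_map))
print(kl_good < kl_bad)
```

In this toy setting the map that places similar objects close together yields the lower divergence, which is the quantity t-SNE reduces by moving the map points, typically with gradient descent.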