{{short description|Technique for dimensionality reduction}}
{{lower case title}}
{{redirect|TSNE|the Boston-based organization|Third Sector New England}}
'''t-Distributed Stochastic Neighbor Embedding (t-SNE)''' is a [[machine learning]] algorithm for [[dimensionality reduction]] developed by Laurens van der Maaten and [[Geoffrey Hinton]].<ref name=MaatenHinton>{{cite journal|last1=van der Maaten|first1=L.J.P.|last2=Hinton|first2=G.E.|title=Visualizing High-Dimensional Data Using t-SNE|journal=Journal of Machine Learning Research|volume=9|date=November 2008|pages=2579–2605|url=http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf}}</ref> It is a [[nonlinear dimensionality reduction]] technique that is particularly well suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points.
[[File:T-SNE visualisation of word embeddings generated using 19th century literature.png|thumb|T-SNE visualisation of [[word embedding]]s generated using 19th century literature]]
[[File:T-SNE Embedding of MNIST.png|thumb|T-SNE embeddings of [[MNIST]] dataset]]
{{Data Visualization}}
The t-SNE algorithm comprises two main stages. First, t-SNE constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects are assigned a higher probability while dissimilar points are assigned a lower probability. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the [[Kullback–Leibler divergence]] between the two distributions with respect to the locations of the points in the map.
t-SNE has been used for visualization in a wide range of applications, including [[genomics]], [[computer security]] research,<ref>{{cite conference|last1=Gashi|first1=I.|last2=Stankovic|first2=V.|last3=Leita|first3=C.|last4=Thonnard|first4=O.|year=2009|title=An Experimental Study of Diversity with Off-the-shelf AntiVirus Engines|conference=IEEE International Symposium on Network Computing and Applications|pages=4–11}}</ref> [[natural language processing]], music analysis, cancer research, [[bioinformatics]], and biomedical signal processing.
For a data set with ''n'' elements, t-SNE runs in {{math|O(''n''<sup>2</sup>)}} time and requires {{math|O(''n''<sup>2</sup>)}} space.<ref>{{cite arXiv|title=Approximated and User Steerable tSNE for Progressive Visual Analytics|last=Pezzotti|first=Nicola|date=2015 |class=cs.CV |eprint=1512.01655 }}</ref>
== Details ==
Given a set of <math>N</math> high-dimensional objects <math>\mathbf{x}_1, \dots, \mathbf{x}_N</math>, t-SNE first computes probabilities <math>p_{ij}</math> that are proportional to the similarity of objects <math>\mathbf{x}_i</math> and <math>\mathbf{x}_j</math>, as follows
For <math>i \neq j</math>, define
: <math>p_{j\mid i} = \frac{\exp(-\lVert\mathbf{x}_i - \mathbf{x}_j\rVert^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\lVert\mathbf{x}_i - \mathbf{x}_k\rVert^2 / 2\sigma_i^2)}</math>
and set <math>p_{i\mid i} = 0</math>.
Note the above denominator ensures <math>\sum_j p_{j\mid i} = 1</math> for all <math>i</math>.
As van der Maaten and Hinton explained: "The similarity of datapoint <math>x_j</math> to datapoint <math>x_i</math> is the conditional probability, <math>p_{j|i}</math>, that <math>x_i</math> would pick <math>x_j</math> as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at <math>x_i</math>."<ref name=MaatenHinton/>
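The conditional probabilities above can be sketched in a few lines of NumPy (an illustrative sketch, not code from the cited paper; a single fixed bandwidth `sigma` stands in here for the per-point bandwidths <math>\sigma_i</math> discussed below):

```python
import numpy as np

def conditional_probabilities(X, sigma=1.0):
    """p_{j|i} for data X (n x d), with a single fixed Gaussian bandwidth."""
    # Squared Euclidean distances ||x_i - x_j||^2 between all pairs
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Unnormalized Gaussian affinities; zero the diagonal so p_{i|i} = 0
    affinities = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(affinities, 0.0)
    # Normalize each row i so that sum_j p_{j|i} = 1
    return affinities / affinities.sum(axis=1, keepdims=True)

X = np.random.default_rng(0).normal(size=(5, 3))
P_cond = conditional_probabilities(X)
```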
Now define
: <math>p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N}</math>
Also note that <math>p_{ij} = p_{ji}</math>, <math>p_{ii} = 0</math>, and <math>\sum_{i,j} p_{ij} = 1</math>.
The bandwidth of the [[Gaussian function|Gaussian]] kernels <math>\sigma_i</math> is set in such a way that the [[perplexity]] of the conditional distribution equals a predefined perplexity using a [[binary search]]. As a result, the bandwidth is adapted to the [[density]] of the data: smaller values of <math>\sigma_i</math> are used in denser parts of the data space. Specifically, the perplexity of the conditional distribution <math>P_i</math> is defined as
: <math>\operatorname{Perp}(P_i) = 2^{H(P_i)}</math>
where <math>H(P_i)</math> is the [[Shannon entropy]]
: <math>H(P_i) = -\sum_j p_{j\mid i} \log_2 p_{j\mid i}.</math>
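The per-point bandwidth search can be sketched as follows (an illustrative NumPy sketch; the search bracket, iteration cap, and input format are choices made here, not prescribed by the paper):

```python
import numpy as np

def find_sigma(sq_dists_i, target_perplexity, max_iter=50):
    """Binary-search sigma_i so that Perp(P_i) = 2^{H(P_i)} matches the target.

    sq_dists_i holds the squared distances ||x_i - x_j||^2 for all j != i.
    """
    lo, hi = 1e-10, 1e10
    for _ in range(max_iter):
        sigma = 0.5 * (lo + hi)
        p = np.exp(-sq_dists_i / (2.0 * sigma ** 2))
        p /= p.sum()
        entropy = -np.sum(p * np.log2(p + 1e-12))  # H(P_i) in bits
        if 2.0 ** entropy > target_perplexity:
            hi = sigma  # distribution too flat: shrink the bandwidth
        else:
            lo = sigma  # distribution too peaked: widen the bandwidth
    return 0.5 * (lo + hi)

# Squared distances from one point to 19 neighbours, target perplexity 5
sq_d = np.random.default_rng(0).uniform(0.5, 4.0, size=19)
sigma_i = find_sigma(sq_d, target_perplexity=5.0)
```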
The perplexity is a hand-chosen parameter of t-SNE, and as the authors state, "perplexity can be interpreted as a smooth measure of the effective number of neighbors. The performance of SNE is fairly robust to changes in the perplexity, and typical values are between 5 and 50."<ref name=MaatenHinton/>
Since the Gaussian kernel uses the [[Euclidean distance]] <math>\lVert x_i-x_j \rVert</math>, it is affected by the [[curse of dimensionality]], and in high dimensional data when distances lose the ability to discriminate, the <math>p_{ij}</math> become too similar (asymptotically, they would converge to a constant). It has been proposed to adjust the distances with a power transform, based on the [[intrinsic dimension]] of each point, to alleviate this.<ref>{{Cite conference|last1=Schubert|first1=Erich|last2=Gertz|first2=Michael|date=2017-10-04|title=Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection|conference=SISAP 2017 – 10th International Conference on Similarity Search and Applications|pages=188–203|doi=10.1007/978-3-319-68474-1_13}}</ref>
t-SNE aims to learn a <math>d</math>-dimensional map <math>\mathbf{y}_1, \dots, \mathbf{y}_N</math> (with <math>\mathbf{y}_i \in \mathbb{R}^d</math> and <math>d</math> typically chosen as 2 or 3) that reflects the similarities <math>p_{ij}</math> as well as possible. To this end, it measures similarities <math>q_{ij}</math> between two points in the map <math>\mathbf{y}_i</math> and <math>\mathbf{y}_j</math>, using a very similar approach.
Specifically, for <math>i \neq j</math>, define <math>q_{ij}</math> as
: <math>q_{ij} = \frac{(1 + \lVert \mathbf{y}_i - \mathbf{y}_j\rVert^2)^{-1}}{\sum_k \sum_{l \neq k} (1 + \lVert \mathbf{y}_k - \mathbf{y}_l\rVert^2)^{-1}}</math>
and set <math> q_{ii} = 0 </math>.
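The map similarities can be computed analogously; for example, in NumPy (an illustrative sketch):

```python
import numpy as np

def low_dim_affinities(Y):
    """Student t-based q_{ij} for map points Y (n x d)."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq_dists)   # heavy-tailed kernel (1 + ||y_i - y_j||^2)^{-1}
    np.fill_diagonal(inv, 0.0)     # q_{ii} = 0
    return inv / inv.sum()         # normalize over all ordered pairs k != l

Y = np.random.default_rng(1).normal(size=(6, 2))
Q = low_dim_affinities(Y)
```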
Herein a heavy-tailed [[Student-t distribution]] (with one degree of freedom, which is the same as a [[Cauchy distribution]]) is used to measure similarities between low-dimensional points in order to allow dissimilar objects to be modeled far apart in the map.<ref name=MaatenHinton/>
The locations of the points <math>\mathbf{y}_i</math> in the map are determined by minimizing the (non-symmetric) [[Kullback–Leibler divergence]] of the distribution <math>P</math> from the distribution <math>Q</math>, that is:
: <math>\mathrm{KL}\left(P \parallel Q\right) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}</math>
The minimization of the Kullback–Leibler divergence with respect to the points <math>\mathbf{y}_i</math> is performed using [[gradient descent]]. The result of this optimization is a map that reflects the similarities between the high-dimensional inputs well.
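A single gradient-descent update can be sketched as follows (an illustrative NumPy sketch; practical implementations add momentum and "early exaggeration", which are omitted here). It uses the gradient <math>\partial C/\partial \mathbf{y}_i = 4\sum_j (p_{ij} - q_{ij})(\mathbf{y}_i - \mathbf{y}_j)(1 + \lVert \mathbf{y}_i - \mathbf{y}_j \rVert^2)^{-1}</math> given by van der Maaten and Hinton:

```python
import numpy as np

def tsne_step(Y, P, learning_rate=0.1):
    """One gradient-descent step on KL(P || Q) for map points Y (n x d)."""
    diff = Y[:, None, :] - Y[None, :, :]                # y_i - y_j
    inv = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))      # (1 + ||y_i - y_j||^2)^{-1}
    np.fill_diagonal(inv, 0.0)
    Q = inv / inv.sum()                                 # q_{ij}, with q_{ii} = 0
    # dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) (1 + ||y_i - y_j||^2)^{-1}
    grad = 4.0 * np.einsum('ij,ijk->ik', (P - Q) * inv, diff)
    return Y - learning_rate * grad

# Toy symmetric affinities P and a small random initial map
rng = np.random.default_rng(0)
n = 8
A = rng.uniform(size=(n, n))
A = A + A.T
np.fill_diagonal(A, 0.0)
P = A / A.sum()
Y = 1e-2 * rng.normal(size=(n, 2))
for _ in range(20):
    Y = tsne_step(Y, P)
```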
== Output ==
While t-SNE plots often seem to display [[cluster analysis|clusters]], the visual clusters can be strongly influenced by the chosen parameterization (especially the perplexity) and so a good understanding of the parameters for t-SNE is needed. Such "clusters" can be shown to even appear in structured data with no clear clustering,<ref>{{Cite web |title=K-means clustering on the output of t-SNE |url=https://stats.stackexchange.com/a/264647 |access-date=2018-04-16 |website=Cross Validated}}</ref> and so may be false findings. Similarly, the size of clusters produced by t-SNE is not informative, and neither is the distance between clusters.<ref>{{Cite journal |last1=Wattenberg |first1=Martin |last2=Viégas |first2=Fernanda |last3=Johnson |first3=Ian |date=2016-10-13 |title=How to Use t-SNE Effectively |url=http://distill.pub/2016/misread-tsne |journal=Distill |language=en |volume=1 |issue=10 |pages=e2 |doi=10.23915/distill.00002 |issn=2476-0757|doi-access=free }}</ref> Thus, interactive exploration may be needed to choose parameters and validate results.<ref>{{Cite journal |last1=Pezzotti |first1=Nicola |last2=Lelieveldt |first2=Boudewijn P. F. 
|last3=Maaten |first3=Laurens van der |last4=Hollt |first4=Thomas |last5=Eisemann |first5=Elmar |last6=Vilanova |first6=Anna |date=2017-07-01 |title=Approximated and User Steerable tSNE for Progressive Visual Analytics |journal=IEEE Transactions on Visualization and Computer Graphics |language=en-US |volume=23 |issue=7 |pages=1739–1752 |arxiv=1512.01655 |doi=10.1109/tvcg.2016.2570755 |issn=1077-2626 |pmid=28113434 |s2cid=353336}}</ref><ref>{{cite journal |last1=Wattenberg |first1=Martin |last2=Viégas |first2=Fernanda |last3=Johnson |first3=Ian |date=2016-10-13 |title=How to Use t-SNE Effectively |url=https://distill.pub/2016/misread-tsne/ |journal=Distill |language=en |volume=1 |issue=10 |doi=10.23915/distill.00002 |access-date=4 December 2017 |doi-access=free}}</ref> It has been shown that t-SNE can often recover well-separated clusters, and with special parameter choices, approximates a simple form of [[spectral clustering]].<ref>{{cite arXiv |eprint=1706.02582 |class=cs.LG |first1=George C. |last1=Linderman |first2=Stefan |last2=Steinerberger |title=Clustering with t-SNE, provably |date=2017-06-08}}</ref>
== Software ==
* A C++ implementation of Barnes-Hut t-SNE is available on the [https://github.com/lvdmaaten/bhtsne GitHub account] of one of the original authors.
* The R package [https://CRAN.R-project.org/package=Rtsne Rtsne] implements t-SNE in [[R (programming language)|R]].
* [[ELKI]] contains t-SNE, also with Barnes-Hut approximation
* [[scikit-learn]], a popular machine learning library in Python, implements t-SNE with both exact solutions and the Barnes-Hut approximation.
* TensorBoard, the visualization kit associated with [[TensorFlow]], also implements t-SNE ([https://projector.tensorflow.org/ online version])
* The [[Julia (programming language)|Julia]] package [https://juliapackages.com/p/tsne TSne] implements t-SNE
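As an example of the scikit-learn implementation listed above (an illustrative sketch; the random data and parameter values are arbitrary choices):

```python
import numpy as np
from sklearn.manifold import TSNE

# 100 random points in 50 dimensions stand in for real high-dimensional data
X = np.random.default_rng(0).normal(size=(100, 50))
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
Y_embedded = tsne.fit_transform(X)   # array of shape (100, 2): the 2-D map
```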
== References ==
{{reflist}}
== External links ==
* {{Cite journal |last1=Wattenberg |first1=Martin |last2=Viégas |first2=Fernanda |last3=Johnson |first3=Ian |date=2016-10-13 |title=How to Use t-SNE Effectively |url=https://distill.pub/2016/misread-tsne/ |journal=Distill |language=en |volume=1 |issue=10 |pages=e2 |doi=10.23915/distill.00002 |issn=2476-0757|doi-access=free }}. Interactive demonstration and tutorial.
* [https://www.youtube.com/watch?v=RJVL80Gg3lA Visualizing Data Using t-SNE], Google Tech Talk about t-SNE
* [https://lvdmaaten.github.io/tsne/ Implementations of t-SNE in various languages], A link collection maintained by Laurens van der Maaten
[[Category:Machine learning algorithms]]
[[Category:Dimension reduction]]