T-distributed stochastic neighbor embedding

Also note that <math>p_{ii} = 0 </math> and <math>\sum_{i, j} p_{ij} = 1</math>.
 
The bandwidth of the [[Gaussian kernel]]s <math>\sigma_i</math> is set, via the [[bisection method]], so that the [[Entropy (information theory)|entropy]] of the conditional distribution equals a predefined value. As a result, the bandwidth is adapted to the [[density]] of the data: smaller values of <math>\sigma_i</math> are used in denser parts of the data space. The entropy increases with the [[perplexity]] of this distribution <math>P_i</math>; the two are related by
 
: <math>Perp(P_i) = 2^{H(P_i)}</math>
 
where <math>H(P_i)</math> is the Shannon entropy <math>H(P_i) = -\sum_j p_{j|i}\log_2 p_{j|i}.</math>
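The bisection search described above can be sketched in Python as follows. The function and variable names are illustrative, not from a particular implementation; the code assumes the squared distances from point <math>i</math> to all other points are already available.

```python
import numpy as np

def conditional_probs(dists_sq_i, sigma):
    """p_{j|i} from point i's squared distances to all other points (self excluded)."""
    logits = -dists_sq_i / (2.0 * sigma**2)
    logits -= logits.max()                # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def find_sigma(dists_sq_i, perplexity, tol=1e-5, max_iter=50):
    """Bisection search for sigma_i so that 2^H(P_i) matches the target perplexity."""
    target = np.log2(perplexity)          # compare entropies in log2 space
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        p = conditional_probs(dists_sq_i, sigma)
        h = -np.sum(p * np.log2(p + 1e-12))   # Shannon entropy H(P_i)
        if abs(h - target) < tol:
            break
        if h > target:                    # entropy too high: shrink bandwidth
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:                             # entropy too low: grow bandwidth
            lo = sigma
            sigma = (lo + hi) / 2.0 if hi < 1e10 else sigma * 2.0
    return sigma
```

Because entropy increases monotonically with the bandwidth, bisection on <math>\sigma_i</math> reliably brackets the value at which the perplexity <math>2^{H(P_i)}</math> hits the target.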
 
The perplexity is a hand-chosen parameter of t-SNE, and as the authors state, "perplexity can be interpreted as a smooth measure of the effective number of neighbors. The performance of SNE is fairly robust to changes in the perplexity, and typical values are between 5 and 50."<ref name=MaatenHinton/>
 
Since the Gaussian kernel uses the [[Euclidean distance]] <math>\lVert x_i-x_j \rVert</math>, it is affected by the [[curse of dimensionality]]: in high-dimensional data, where distances lose the ability to discriminate, the <math>p_{ij}</math> become too similar (asymptotically, they would converge to a constant). It has been proposed to adjust the distances with a power transform, based on the [[intrinsic dimension]] of each point, to alleviate this.<ref>{{Cite conference|last1=Schubert|first1=Erich|last2=Gertz|first2=Michael|date=2017-10-04|title=Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection|conference=SISAP 2017 – 10th International Conference on Similarity Search and Applications|pages=188–203|doi=10.1007/978-3-319-68474-1_13}}</ref>
 
t-SNE aims to learn a <math>d</math>-dimensional map <math>\mathbf{y}_1, \dots, \mathbf{y}_N</math> (with <math>\mathbf{y}_i \in \mathbb{R}^d</math> and <math>d</math> typically chosen as 2 or 3) that reflects the similarities <math>p_{ij}</math> as well as possible. To this end, it measures similarities <math>q_{ij}</math> between two points in the map <math>\mathbf{y}_i</math> and <math>\mathbf{y}_j</math>, using a very similar approach.
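The map-space similarities can be sketched as follows. The formula is not shown in this excerpt; the sketch assumes the heavy-tailed Student t-distribution with one degree of freedom (a Cauchy kernel) used by the original t-SNE paper, <math>q_{ij} \propto (1 + \lVert \mathbf{y}_i - \mathbf{y}_j \rVert^2)^{-1}</math>, normalized over all pairs.

```python
import numpy as np

def low_dim_affinities(Y):
    """q_{ij} for map points Y (shape N x d), using a Student t-kernel
    with one degree of freedom, normalized so that sum_{i,j} q_{ij} = 1."""
    diff = Y[:, None, :] - Y[None, :, :]      # pairwise differences, N x N x d
    dist_sq = np.sum(diff**2, axis=-1)        # squared Euclidean distances
    inv = 1.0 / (1.0 + dist_sq)               # heavy-tailed kernel
    np.fill_diagonal(inv, 0.0)                # q_{ii} = 0, mirroring p_{ii} = 0
    return inv / inv.sum()
```

Like the <math>p_{ij}</math>, the resulting matrix is symmetric, has a zero diagonal, and sums to one over all pairs.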