T-distributed stochastic neighbor embedding

Also note that <math>p_{ii} = 0 </math> and <math>\sum_{i, j} p_{ij} = 1</math>.
 
The bandwidth of the [[Gaussian kernel]]s <math>\sigma_i</math> is set, via the [[bisection method]], so that the [[Entropy (information theory)|entropy]] of the conditional distribution equals a predefined value. As a result, the bandwidth is adapted to the [[density]] of the data: smaller values of <math>\sigma_i</math> are used in denser parts of the data space. The entropy increases with the [[perplexity]] of this distribution <math>P_i</math>; the two are related by
 
: <math>Perp(P_i) = 2^{H(P_i)}</math>
 
where <math>H(P_i)</math> is the Shannon entropy <math>H(P_i) = -\sum_j p_{j|i}\log_2 p_{j|i}.</math>
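The bisection search described above can be sketched in Python as follows. The function and variable names are illustrative, not from a particular implementation; the code assumes the squared distances from point <math>i</math> to all other points are already available.

```python
import numpy as np

def conditional_probs(dists_sq_i, sigma):
    """p_{j|i} from point i's squared distances to all other points (self excluded)."""
    logits = -dists_sq_i / (2.0 * sigma**2)
    logits -= logits.max()                # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def find_sigma(dists_sq_i, perplexity, tol=1e-5, max_iter=50):
    """Bisection search for sigma_i so that 2^H(P_i) matches the target perplexity."""
    target = np.log2(perplexity)          # compare entropies in log2 space
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        p = conditional_probs(dists_sq_i, sigma)
        h = -np.sum(p * np.log2(p + 1e-12))   # Shannon entropy H(P_i)
        if abs(h - target) < tol:
            break
        if h > target:                    # entropy too high: shrink bandwidth
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:                             # entropy too low: grow bandwidth
            lo = sigma
            sigma = (lo + hi) / 2.0 if hi < 1e10 else sigma * 2.0
    return sigma
```

Because entropy increases monotonically with the bandwidth, bisection on <math>\sigma_i</math> reliably brackets the value at which the perplexity <math>2^{H(P_i)}</math> hits the target.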
 
The perplexity is a hand-chosen parameter of t-SNE, and as the authors state, "perplexity can be interpreted as a smooth measure of the effective number of neighbors. The performance of SNE is fairly robust to changes in the perplexity, and typical values are between 5 and 50."<ref name=MaatenHinton/>
 
Since the Gaussian kernel uses the [[Euclidean distance]] <math>\lVert x_i-x_j \rVert</math>, it is affected by the [[curse of dimensionality]]: in high-dimensional data, where distances lose the ability to discriminate, the <math>p_{ij}</math> become too similar (asymptotically, they would converge to a constant). It has been proposed to adjust the distances with a power transform, based on the [[intrinsic dimension]] of each point, to alleviate this.<ref>{{Cite conference|last1=Schubert|first1=Erich|last2=Gertz|first2=Michael|date=2017-10-04|title=Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection|conference=SISAP 2017 – 10th International Conference on Similarity Search and Applications|pages=188–203|doi=10.1007/978-3-319-68474-1_13}}</ref>
 
t-SNE aims to learn a <math>d</math>-dimensional map <math>\mathbf{y}_1, \dots, \mathbf{y}_N</math> (with <math>\mathbf{y}_i \in \mathbb{R}^d</math> and <math>d</math> typically chosen as 2 or 3) that reflects the similarities <math>p_{ij}</math> as well as possible. To this end, it measures similarities <math>q_{ij}</math> between two points in the map <math>\mathbf{y}_i</math> and <math>\mathbf{y}_j</math>, using a very similar approach.
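The map-space similarities can be sketched as follows. The formula is not shown in this excerpt; the sketch assumes the heavy-tailed Student t-distribution with one degree of freedom (a Cauchy kernel) used by the original t-SNE paper, <math>q_{ij} \propto (1 + \lVert \mathbf{y}_i - \mathbf{y}_j \rVert^2)^{-1}</math>, normalized over all pairs.

```python
import numpy as np

def low_dim_affinities(Y):
    """q_{ij} for map points Y (shape N x d), using a Student t-kernel
    with one degree of freedom, normalized so that sum_{i,j} q_{ij} = 1."""
    diff = Y[:, None, :] - Y[None, :, :]      # pairwise differences, N x N x d
    dist_sq = np.sum(diff**2, axis=-1)        # squared Euclidean distances
    inv = 1.0 / (1.0 + dist_sq)               # heavy-tailed kernel
    np.fill_diagonal(inv, 0.0)                # q_{ii} = 0, mirroring p_{ii} = 0
    return inv / inv.sum()
```

Like the <math>p_{ij}</math>, the resulting matrix is symmetric, has a zero diagonal, and sums to one over all pairs.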