Revision as of 08:49, 27 November 2015 edit 109.28.97.194 (talk) Remove spurious "CITATION" author self-reminder ← Previous edit		Revision as of 17:31, 9 February 2016 edit undo 2601:445:4001:273e:3410:d115:c9ec:425 (talk) No edit summary Next edit →
Line 10: Given a set of <math>N</math> high-dimensional objects <math>\mathbf{x}_1, \dots, \mathbf{x}_N</math>, t-SNE first computes probabilities <math>p_{ij}</math> that are proportional to the similarity of objects <math>\mathbf{x}_i</math> and <math>\mathbf{x}_j</math>, as follows: : <math>p_{j\|\mid i} = \frac{\exp(-\lVert\mathbf{x}_i - \mathbf{x}_j\rVert^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\lVert\mathbf{x}_i - \mathbf{x}_k\rVert^2 / 2\sigma_i^2)},</math> : <math>p_{ij} = \frac{p_{j\|\mid i} + p_{i\|\mid j}}{2N}</math> The bandwidth of the [[Gaussian kernel]]s <math>\sigma_i</math>, is set in such a way that the [[perplexity]] of the conditional distribution equals a predefined perplexity using a [[binary search]]. As a result, the bandwidth is adapted to the [[density]] of the data: smaller values of <math>\sigma_i</math> are used in denser parts of the data space. Line 18: t-SNE aims to learn a <math>d</math>-dimensional map <math>\mathbf{y}_1, \dots, \mathbf{y}_N</math> (with <math>\mathbf{y}_i \in \mathbb{R}^d</math>) that reflects the similarities <math>p_{ij}</math> as well as possible. To this end, it measures similarities <math>q_{ij}</math> between two points in the map <math>\mathbf{y}_i</math> and <math>\mathbf{y}_j</math>, using a very similar approach. Specifically, <math>q_{ij}</math> is defined as: : <math>q_{ij} = \frac{(1 + \lVert \mathbf{y}_i - \mathbf{y}_j\rVert^2)^{-1}}{\sum_{k \neq l} (1 + \lVert \mathbf{y}_k - \mathbf{y}_l\rVert^2)^{-1}}</math> Herein a heavy-tailed [[Student-t distribution]] is used to measure similarities between low-dimensional points in order to allow dissimilar objects to be modeled far apart in the map. Line 24: The locations of the points <math>\mathbf{y}_i</math> in the map are determined by minimizing the (non-symmetric) [[Kullback–Leibler divergence]] of the distribution <math>Q</math> from the distribution <math>P</math>, that is: : <math>KL(P\|\|Q) = \sum_{i \neq j} p_{ij} \, \log \frac{p_{ij}}{q_{ij}}</math> The minimization of the Kullback–Leibler divergence with respect to the points <math>\mathbf{y}_i</math> is performed using [[gradient descent]]. The result of this optimization is a map that reflects the similarities between the high-dimensional inputs well. Line 33: == Software == * t-Distributed Stochastic Neighbor Embedding http://homepage.tudelft.nl/19j49/t-SNE.html [[Category:Machine learning algorithms]]

T-distributed stochastic neighbor embedding: Difference between revisions