Now define
: <math>p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N}</math>
 
and note that <math>p_{ij} = p_{ji}</math>, <math>p_{ii} = 0 </math>, and <math>\sum_{i, j} p_{ij} = 1</math>.
 
This is motivated as follows: because the marginals <math>p_{i}</math> and <math>p_{j}</math> from the <math>N</math> samples are each estimated as <math>1/N</math>, the conditional probabilities can be written as <math>p_{j\mid i} = Np_{ji}</math> and <math>p_{i\mid j} = Np_{ij}</math>. Since the joint probabilities satisfy <math>p_{ij} = p_{ji}</math>, averaging the two gives <math>p_{ij} = \tfrac{1}{2}(p_{ij} + p_{ji}) = \frac{Np_{ij} + Np_{ji}}{2N} = \frac{p_{i\mid j} + p_{j\mid i}}{2N}</math>, which is the formula above.
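A minimal NumPy sketch of this symmetrization (assuming a matrix <code>cond</code> whose entry <code>[i, j]</code> holds <math>p_{j\mid i}</math>, with rows summing to 1 and a zero diagonal; the function name is illustrative, not from any particular library):

<syntaxhighlight lang="python">
import numpy as np

def symmetrize(cond):
    """Turn conditional probabilities p_{j|i} into joint
    probabilities p_{ij} = (p_{j|i} + p_{i|j}) / (2N)."""
    N = cond.shape[0]
    P = (cond + cond.T) / (2 * N)
    # Properties stated in the text: symmetry, zero diagonal,
    # and total mass 1 over all pairs.
    assert np.allclose(P, P.T)
    assert np.allclose(np.diag(P), 0.0)
    assert np.isclose(P.sum(), 1.0)
    return P

# Example: conditionals for 3 points (rows sum to 1, zero diagonal).
cond = np.array([[0.0, 0.7, 0.3],
                 [0.4, 0.0, 0.6],
                 [0.5, 0.5, 0.0]])
P = symmetrize(cond)
</syntaxhighlight>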
 
The bandwidth of the [[Gaussian kernel]]s <math>\sigma_i</math> is set, by means of the [[bisection method]], so that the [[Entropy (information theory)|entropy]] of the conditional distribution equals a predefined value. As a result, the bandwidth adapts to the [[density]] of the data: smaller values of <math>\sigma_i</math> are used in denser parts of the data space.
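A hedged sketch of this calibration in NumPy, assuming <code>dist2_i</code> holds the squared distances from <math>x_i</math> to the other <math>N-1</math> points (the point itself is excluded, since <math>p_{i\mid i} = 0</math>); the helper names and bracket defaults are illustrative, not from any specific library:

<syntaxhighlight lang="python">
import numpy as np

def conditional_entropy(dist2_i, sigma):
    """Shannon entropy (in nats) of the conditional distribution
    p_{.|i} induced by a Gaussian kernel of bandwidth sigma."""
    # Subtract the minimum before exponentiating for numerical
    # stability; normalization cancels the constant factor.
    p = np.exp(-(dist2_i - dist2_i.min()) / (2.0 * sigma**2))
    p /= p.sum()
    return -np.sum(p * np.log(p + 1e-12))

def calibrate_sigma(dist2_i, target_entropy,
                    lo=1e-6, hi=1e6, iters=100, tol=1e-6):
    """Bisection search for sigma_i: the entropy increases
    monotonically with sigma, so the bracket [lo, hi] is halved
    until the entropy of p_{.|i} matches the predefined target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        h = conditional_entropy(dist2_i, mid)
        if abs(h - target_entropy) < tol:
            break
        if h > target_entropy:
            hi = mid   # distribution too flat: shrink sigma
        else:
            lo = mid   # distribution too peaked: widen sigma
    return mid
</syntaxhighlight>

In practice the target is usually specified as a perplexity <math>k</math>; with the entropy measured in nats as above, the corresponding target entropy is <math>\ln k</math>.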