Kernel embedding of distributions: Difference between revisions

==Definitions==
 
Let <math>X</math> denote a random variable with codomain <math>\Omega</math> and distribution <math>P(dx).</math> Given a positive-definite kernel <math>k</math> on <math>\Omega \times \Omega,</math> the [[Reproducing kernel Hilbert space#Moore–Aronszajn theorem|Moore–Aronszajn theorem]] asserts the existence of an RKHS <math>\mathcal{H}</math> (a [[Hilbert space]] of functions <math>f: \Omega \to \R</math> equipped with an inner product <math>\langle \cdot, \cdot \rangle_\mathcal{H}</math> and a norm <math>\| \cdot \|_\mathcal{H}</math>) in which the element <math>k(x,\cdot)</math> satisfies the reproducing property
 
:<math>\forall f \in \mathcal{H}, \forall x \in \Omega \qquad \langle f, k(x,\cdot) \rangle_\mathcal{H} = f(x).</math>
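The reproducing property can be checked numerically for functions that are finite kernel expansions, <math>f = \sum_i \alpha_i k(x_i, \cdot)</math>, whose inner products reduce to sums of kernel evaluations. The following is a minimal sketch, not a reference implementation: the Gaussian kernel, the bandwidth <code>sigma</code>, and the helper names <code>gaussian_kernel</code>, <code>rkhs_inner</code> and <code>evaluate</code> are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of points on the real line."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

def rkhs_inner(alpha, centers_a, beta, centers_b, sigma=1.0):
    """<f, g>_H for f = sum_i alpha_i k(a_i, .) and g = sum_j beta_j k(b_j, .)."""
    gram = gaussian_kernel(centers_a, centers_b, sigma)
    return float(np.asarray(alpha, dtype=float) @ gram @ np.asarray(beta, dtype=float))

def evaluate(alpha, centers, x, sigma=1.0):
    """Pointwise value f(x) of f = sum_i alpha_i k(centers_i, .)."""
    return float(np.asarray(alpha, dtype=float) @ gaussian_kernel(centers, [x], sigma)[:, 0])

# Illustrative coefficients and centers (arbitrary values).
alpha = np.array([0.5, -1.0, 2.0])
centers = np.array([-1.0, 0.0, 1.5])
x = 0.3

# k(x, .) is itself a one-term expansion with coefficient 1 and center x.
lhs = rkhs_inner(alpha, centers, [1.0], [x])  # <f, k(x, .)>_H
rhs = evaluate(alpha, centers, x)             # f(x)
```

Here <code>lhs</code> and <code>rhs</code> agree up to floating-point error, which is the reproducing property restricted to this finite-dimensional subspace of <math>\mathcal{H}</math>.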
 
===Kernel embedding===
The kernel embedding of the distribution <math>P(X)</math> in <math> \mathcal{H} </math> (also called the '''kernel mean''' or '''mean map''') is given by:<ref name = "Smola2007" />
 
:<math>\mu_X := \mathbb{E}_X [k(X, \cdot) ] = \mathbb{E}_X [\varphi(X) ] = \int_\Omega \varphi(x) \ \mathrm{d}P(x) </math>
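In practice the expectation is usually replaced by a sample average, giving the estimate <math>\widehat{\mu}_X = \tfrac{1}{n} \sum_{i=1}^n k(x_i, \cdot)</math>. The sketch below evaluates this estimate on a few points, assuming a Gaussian kernel with bandwidth 1 and samples drawn from a standard normal distribution; the function name <code>empirical_mean_embedding</code> is hypothetical. For this particular choice of kernel and distribution the exact embedding is <math>\mu_X(t) = e^{-t^2/4}/\sqrt{2}</math>, which is used for comparison.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of points on the real line."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

def empirical_mean_embedding(samples, sigma=1.0):
    """Return t -> (1/n) sum_i k(x_i, t), the sample estimate of mu_X."""
    samples = np.asarray(samples, dtype=float)

    def mu_hat(t):
        # Average the kernel sections k(x_i, .) over the sample.
        return gaussian_kernel(samples, np.atleast_1d(t), sigma).mean(axis=0)

    return mu_hat

# Assumed setup: 500 draws from N(0, 1), kernel bandwidth sigma = 1.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)
mu_hat = empirical_mean_embedding(samples)

grid = np.array([-2.0, 0.0, 2.0])
values = mu_hat(grid)

# Exact embedding for N(0, 1) under the Gaussian kernel with sigma = 1.
closed_form = np.exp(-grid ** 2 / 4.0) / np.sqrt(2.0)
```

The estimate is itself an element of <math>\mathcal{H}</math>, so it can be evaluated pointwise like any other function in the space; <code>values</code> should lie close to <code>closed_form</code>, with a gap that shrinks as the sample size grows.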
 
===Joint distribution embedding===
If <math>Y</math> denotes another random variable (for simplicity, assume the codomain of <math>Y</math> is also <math>\Omega</math>, with the same kernel <math>k</math>, which satisfies <math>\langle \varphi(x) \otimes \varphi(y), \varphi(x') \otimes \varphi(y') \rangle = k(x,x') \, k(y,y')</math>), then the [[Joint probability distribution|joint distribution]] <math>P(x,y)</math> can be mapped into a [[tensor product]] feature space <math>\mathcal{H} \otimes \mathcal{H}</math> via<ref name = "Song2013"/>
 
:<math> \mathcal{C}_{XY} = \mathbb{E}_{XY} [\varphi(X) \otimes \varphi(Y)] = \int_{\Omega \times \Omega} \varphi(x) \otimes \varphi(y) \ \mathrm{d} P(x,y) </math>
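When the feature map is finite-dimensional, <math>\varphi(x) \otimes \varphi(y)</math> is an outer product and the sample estimate of <math>\mathcal{C}_{XY}</math> is an ordinary matrix. The sketch below assumes the polynomial kernel <math>k(x,x') = (1 + xx')^2</math> on the real line, whose explicit feature map is <math>\varphi(x) = (1, \sqrt{2}x, x^2)</math>; this kernel and the helper names are chosen purely for illustration. It checks that the inner product of two tensor-product features equals the product of the two kernel values, and then forms the empirical estimate.

```python
import numpy as np

def phi(x):
    """Explicit feature map of k(x, x') = (1 + x x')**2 on the real line."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), np.sqrt(2.0) * x, x ** 2], axis=-1)

def k(x, x_prime):
    """Degree-2 polynomial kernel matching phi above."""
    return (1.0 + x * x_prime) ** 2

def empirical_cross_covariance(xs, ys):
    """Average of phi(x_i) (outer) phi(y_i) over the sample: a 3x3 matrix."""
    return phi(xs).T @ phi(ys) / len(xs)

# Inner product in the tensor-product space (Frobenius inner product of the
# outer products) versus the product of the two kernel values.
x, y, x_prime, y_prime = 0.5, -1.0, 2.0, 0.25
lhs = np.sum(np.outer(phi(x), phi(y)) * np.outer(phi(x_prime), phi(y_prime)))
rhs = k(x, x_prime) * k(y, y_prime)

# Empirical estimate from paired samples (synthetic, correlated data).
rng = np.random.default_rng(0)
xs = rng.normal(size=200)
ys = xs + 0.1 * rng.normal(size=200)
C_xy = empirical_cross_covariance(xs, ys)

# With f(x) = x, i.e. coefficients (0, 1/sqrt(2), 0) in this feature basis,
# the bilinear form f^T C_xy f equals the sample mean of x_i * y_i.
f = np.array([0.0, 1.0 / np.sqrt(2.0), 0.0])
bilinear = f @ C_xy @ f
```

For infinite-dimensional feature maps such as that of the Gaussian kernel, the matrix cannot be formed explicitly and computations are instead carried out through Gram matrices.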
 
===Conditional distribution embedding===
Given a [[conditional distribution]] <math>P(y \mid x),</math> one can define the corresponding RKHS embedding as<ref name = "Song2013"/>
 
:<math>\mu_{Y \mid x} = \mathbb{E}_{Y \mid x} [ \varphi(Y) ] = \int_\Omega \varphi(y) \ \mathrm{d}P(y \mid x) </math>
 
Note that the embedding of <math>P(y \mid x)</math> thus defines a family of points in the RKHS indexed by the values <math>x</math> taken by the conditioning variable <math>X</math>. Fixing <math>X</math> to a particular value yields a single element of <math>\mathcal{H}</math>, so it is natural to define the operator
 
:<math>\begin{cases} \mathcal{C}_{Y\mid X}: \mathcal{H} \to \mathcal{H} \\ \mathcal{C}_{Y\mid X} = \mathcal{C}_{YX} \mathcal{C}_{XX}^{-1} \end{cases}</math>
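With samples <math>(x_i, y_i)</math>, a commonly used regularized estimate replaces <math>\mathcal{C}_{XX}^{-1}</math> by a ridge-type inverse of the Gram matrix, which gives <math>\widehat{\mu}_{Y \mid x} = \sum_i \beta_i(x) \varphi(y_i)</math> with <math>\beta(x) = (K + \lambda I)^{-1} k_x</math>, where <math>K_{ij} = k(x_i, x_j)</math> and <math>(k_x)_i = k(x_i, x)</math>. The sketch below makes several assumptions that are not fixed by the text: a Gaussian kernel with bandwidth 0.5, the regularization value <math>\lambda = 0.1</math> (some references scale it by the sample size), synthetic data with <math>Y = \sin(X)</math> plus noise, and hypothetical function names.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.5):
    """Gaussian (RBF) kernel matrix between two sets of points on the real line."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

def conditional_weights(xs, x_query, lam=0.1, sigma=0.5):
    """beta(x) = (K + lam * I)^{-1} k_x, with K_ij = k(x_i, x_j)."""
    K = gaussian_kernel(xs, xs, sigma)
    k_x = gaussian_kernel(xs, [x_query], sigma)[:, 0]
    # Solve the linear system rather than forming an explicit inverse.
    return np.linalg.solve(K + lam * np.eye(len(K)), k_x)

def conditional_embedding(xs, ys, x_query, lam=0.1, sigma=0.5):
    """Return t -> sum_i beta_i(x_query) k(y_i, t), an estimate of mu_{Y|x}."""
    beta = conditional_weights(xs, x_query, lam, sigma)

    def mu_hat(t):
        return beta @ gaussian_kernel(ys, np.atleast_1d(t), sigma)

    return mu_hat

# Synthetic data (assumed for illustration): Y = sin(X) + small Gaussian noise.
rng = np.random.default_rng(0)
xs = rng.uniform(-3.0, 3.0, size=200)
ys = np.sin(xs) + 0.1 * rng.normal(size=200)

mu_hat = conditional_embedding(xs, ys, x_query=1.0)
grid = np.linspace(-2.0, 2.0, 81)
peak = grid[np.argmax(mu_hat(grid))]  # location where the embedding is largest
```

Evaluated on a grid, the estimated embedding for the query value 1.0 should be largest near <math>\sin(1)</math>, where the conditional distribution of <math>Y</math> concentrates. Unlike the mean embedding estimate, the weights <math>\beta_i(x)</math> need not be non-negative or sum to one.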