Revision as of 18:55, 27 March 2020 by Michael Hardy (→Definitions)
Revision as of 18:57, 27 March 2020 by Michael Hardy (→Definitions)
Line 14:
==Definitions==
Let <math>X</math> denote a random variable with domain <math>\Omega</math> and distribution <math>P.</math> Given a kernel <math>k</math> on <math>\Omega \times \Omega,</math> the [[Reproducing kernel Hilbert space#Moore–Aronszajn theorem|Moore–Aronszajn theorem]] asserts the existence of an RKHS <math>\mathcal{H}</math> (a [[Hilbert space]] of functions <math>f: \Omega \to \R</math> equipped with an inner product <math>\langle \cdot, \cdot \rangle_\mathcal{H}</math> and norm <math>\| \cdot \|_\mathcal{H}</math>) in which the element <math>k(x,\cdot)</math> satisfies the reproducing property
:<math>\forall f \in \mathcal{H}, \forall x \in \Omega \qquad \langle f, k(x,\cdot) \rangle_\mathcal{H} = f(x).</math>

Line 21:
===Kernel embedding===
The kernel embedding of the distribution <math>P</math> in <math>\mathcal{H}</math> (also called the '''kernel mean''' or '''mean map''') is given by:<ref name = "Smola2007" />
:<math>\mu_X := \mathbb{E}_X [k(X, \cdot) ] = \mathbb{E}_X [\varphi(X) ] = \int_\Omega \varphi(x) \ \mathrm{d}P(x) </math>
where <math>\varphi(x) := k(x,\cdot)</math> denotes the feature map associated with <math>k.</math>

Line 33:
===Joint distribution embedding===
If <math>Y</math> denotes another random variable (for simplicity, assume the co-domain of <math>Y</math> is also <math>\Omega</math> with the same kernel <math>k,</math> which satisfies <math>\langle \varphi(x) \otimes \varphi(y), \varphi(x') \otimes \varphi(y') \rangle = k(x,x') \, k(y,y')</math>), then the [[Joint probability distribution|joint distribution]] <math>P(x,y)</math> can be mapped into a [[tensor product]] feature space <math>\mathcal{H} \otimes \mathcal{H}</math> via<ref name = "Song2013"/>
:<math> \mathcal{C}_{XY} = \mathbb{E}_{XY} [\varphi(X) \otimes \varphi(Y)] = \int_{\Omega \times \Omega} \varphi(x) \otimes \varphi(y) \ \mathrm{d} P(x,y) </math>

Line 46:
===Conditional distribution embedding===
Given a [[conditional distribution]] <math>P(y \mid x),</math> one can define the corresponding RKHS embedding as<ref name = "Song2013"/>
:<math>\mu_{Y \mid x} = \mathbb{E}_{Y \mid x} [ \varphi(Y) ] = \int_\Omega \varphi(y) \ \mathrm{d}P(y \mid x) </math>
Note that the embedding of <math>P(y \mid x)</math> thus defines a family of points in the RKHS indexed by the values <math>x</math> taken by the conditioning variable <math>X</math>. Fixing <math>X</math> to a particular value yields a single element of <math>\mathcal{H}</math>, and thus it is natural to define the operator
:<math>\begin{cases} \mathcal{C}_{Y\mid X}: \mathcal{H} \to \mathcal{H} \\ \mathcal{C}_{Y\mid X} = \mathcal{C}_{YX} \mathcal{C}_{XX}^{-1} \end{cases}</math>
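
The three embeddings defined above can be illustrated with finite-sample estimates. The following is a minimal sketch, not a reference implementation: it assumes one-dimensional samples and a Gaussian kernel, and the function names (<code>gaussian_kernel</code>, <code>kernel_mean</code>, <code>joint_embedding</code>, <code>conditional_weights</code>) are hypothetical. The regularisation term <code>n * reg * I</code> used in place of the exact inverse <math>\mathcal{C}_{XX}^{-1}</math> is likewise an assumption that is not stated in this section; it is one common way to make the empirical inverse well defined.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gram matrix with entries k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 * bandwidth^2))."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return np.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def kernel_mean(sample, points, bandwidth=1.0):
    """Empirical kernel mean mu_hat(t) = (1/n) * sum_i k(x_i, t), evaluated at each t."""
    return gaussian_kernel(sample, points, bandwidth).mean(axis=0)


def joint_embedding(x_sample, y_sample, s, t, bandwidth=1.0):
    """Empirical C_XY evaluated at (s, t): (1/n) * sum_i k(x_i, s) * k(y_i, t)."""
    kx = gaussian_kernel(x_sample, [s], bandwidth)[:, 0]
    ky = gaussian_kernel(y_sample, [t], bandwidth)[:, 0]
    return float(np.mean(kx * ky))


def conditional_weights(x_sample, x_query, reg=1e-3, bandwidth=1.0):
    """Weights beta(x) = (K + n * reg * I)^{-1} k_x.

    The estimated conditional embedding is mu_hat_{Y|x} = sum_i beta_i(x) * k(y_i, .).
    The regularisation (reg) is an assumption of this sketch.
    """
    n = len(x_sample)
    K = gaussian_kernel(x_sample, x_sample, bandwidth)
    k_x = gaussian_kernel(x_sample, [x_query], bandwidth)[:, 0]
    return np.linalg.solve(K + n * reg * np.eye(n), k_x)


# Synthetic data (an illustrative assumption): Y depends linearly on X plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 0.1 * rng.normal(size=200)

mu_x_at_points = kernel_mean(x, [0.0, 1.0])   # mu_hat_X evaluated at t = 0 and t = 1
c_xy_at_point = joint_embedding(x, y, 0.0, 0.0)
beta = conditional_weights(x, x_query=1.0)
# By the reproducing property, mu_hat_{Y|x}(t) = sum_i beta_i * k(y_i, t).
mu_y_given_x_at_2 = float(beta @ gaussian_kernel(y, [2.0])[:, 0])
```

Each estimate is evaluated pointwise rather than stored as a function, which relies on the reproducing property <math>\langle f, k(t,\cdot) \rangle_\mathcal{H} = f(t)</math>: evaluating an embedding at <math>t</math> is the same as taking its inner product with <math>k(t,\cdot).</math>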

Kernel embedding of distributions: Difference between revisions