Revision as of 02:48, 17 September 2019 by Latex-yow (Extended confirmed users, 1,289 edits) m →Properties
Revision as of 02:56, 17 September 2019 by Latex-yow (Extended confirmed users, 1,289 edits) m →Example
== Example ==

In this simple example, which is taken from Song et al.,<ref name = "Song2013"/> <math>X, Y</math> are assumed to be [[Probability distribution#Discrete probability distribution|discrete random variables]] which take values in the set <math>\{1,\ldots,K\}</math>, and the kernel is chosen to be the [[Kronecker delta]] function, so <math>k(x,x') = \delta(x,x')</math>. The feature map corresponding to this kernel is the [[standard basis]] vector <math>\phi(x) = \mathbf{e}_x</math>. The kernel embeddings of such distributions are thus vectors of marginal probabilities, while the embeddings of joint distributions in this setting are <math>K\times K</math> matrices specifying joint probability tables. The explicit form of these embeddings is

:<math>\mu_X = \mathbb{E}_X [\mathbf{e}_X] = \begin{pmatrix} P(X=1) \\ \vdots \\ P(X=K) \\ \end{pmatrix}</math>

:<math>\mathcal{C}_{XY} = \mathbb{E}_{XY} [\mathbf{e}_X \otimes \mathbf{e}_Y] = ( P(X=s, Y=t) )_{s,t \in \{1,\ldots,K\}}</math>

The conditional distribution embedding operator,

:<math>\mathcal{C}_{Y\mid X} = \mathcal{C}_{YX} \mathcal{C}_{XX}^{-1},</math>

is in this setting a conditional probability table

:<math>\mathcal{C}_{Y \mid X} = ( P(Y=s \mid X=t) )_{s,t \in \{1,\ldots,K\}}</math>

and

:<math>\mathcal{C}_{XX} = \begin{pmatrix} P(X=1) & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & P(X=K) \\ \end{pmatrix}</math>

Thus, the embeddings of the conditional distribution under a fixed value of <math>X</math> may be computed as

:<math>\mu_{Y \mid x} = \mathcal{C}_{Y \mid X} \phi(x) = \begin{pmatrix} P(Y=1 \mid X = x) \\ \vdots \\ P(Y=K \mid X = x) \\ \end{pmatrix}</math>

In this discrete-valued setting with the Kronecker delta kernel, the [[#Rules of probability as operations in the RKHS|kernel sum rule]] becomes

:<math>\underbrace{\begin{pmatrix} Q(X=1) \\ \vdots \\ Q(X=K) \\ \end{pmatrix}}_{\mu_X^\pi} = \underbrace{\begin{pmatrix} \\ P(X=s \mid Y=t) \\ \\ \end{pmatrix}}_{\mathcal{C}_{X\mid Y}} \underbrace{\begin{pmatrix} \pi(Y=1) \\ \vdots \\ \pi(Y=K) \\ \end{pmatrix}}_{\mu_Y^\pi}</math>

The [[#Rules of probability as operations in the RKHS|kernel chain rule]] in this case is given by

:<math>\underbrace{\begin{pmatrix} \\ Q(X=s,Y=t) \\ \\ \end{pmatrix}}_{\mathcal{C}_{XY}^\pi} = \underbrace{\begin{pmatrix} \\ P(X=s \mid Y=t) \\ \\ \end{pmatrix}}_{\mathcal{C}_{X \mid Y}} \underbrace{\begin{pmatrix} \pi(Y=1) & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \pi(Y=K) \\ \end{pmatrix}}_{\mathcal{C}_{YY}^\pi}</math>

==References==
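The discrete example in the Example section can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the joint table values, the choice <math>K = 3</math>, and the names <code>joint_xy</code>, <code>embed_conditional</code>, <code>kernel_sum_rule</code> and <code>kernel_chain_rule</code> are hypothetical and do not come from Song et al. It builds <math>\mathcal{C}_{Y \mid X}</math> from a joint table and applies the sum and chain rules with a prior on <math>X</math>, which mirrors the formulas for <math>\mathcal{C}_{X \mid Y}</math> with the roles of <math>X</math> and <math>Y</math> swapped.

```python
from fractions import Fraction as F

# Hypothetical joint table P(X=s, Y=t) for K = 3 (rows index X, columns index Y).
joint_xy = [
    [F(1, 10), F(2, 10), F(0, 10)],
    [F(1, 10), F(1, 10), F(2, 10)],
    [F(0, 10), F(1, 10), F(2, 10)],
]
K = len(joint_xy)

# mu_X: with the Kronecker delta kernel the mean embedding is the marginal of X.
mu_x = [sum(row) for row in joint_xy]

# C_XX is diagonal with P(X=t) on the diagonal, so C_XX^{-1} = diag(1 / P(X=t)).
# C_{Y|X} = C_{YX} C_{XX}^{-1}: entry (s, t) = P(Y=s, X=t) / P(X=t).
# This assumes every P(X=t) is strictly positive.
c_y_given_x = [[joint_xy[t][s] / mu_x[t] for t in range(K)] for s in range(K)]


def embed_conditional(x):
    """mu_{Y|x} = C_{Y|X} e_x: column x of the conditional table (x is 1-based)."""
    return [c_y_given_x[s][x - 1] for s in range(K)]


def kernel_sum_rule(cond, prior):
    """Matrix-vector product of a conditional table with a prior vector."""
    return [sum(row[t] * prior[t] for t in range(K)) for row in cond]


def kernel_chain_rule(cond, prior):
    """Conditional table times diag(prior): entry (s, t) = cond[s][t] * prior[t]."""
    return [[row[t] * prior[t] for t in range(K)] for row in cond]


# Using the marginal of X as the prior recovers the marginal of Y (sum rule)
# and the joint table indexed as (Y, X) (chain rule).
marginal_y = kernel_sum_rule(c_y_given_x, mu_x)
joint_yx = kernel_chain_rule(c_y_given_x, mu_x)
```

With the marginal of <math>X</math> as the prior, <code>marginal_y</code> equals the column sums of <code>joint_xy</code> and <code>joint_yx</code> equals its transpose, which is what the two rules predict in the discrete case. Exact fractions are used so that these equalities hold without floating-point tolerance.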

Kernel embedding of distributions: Difference between revisions