== Example ==
In this simple example, taken from Song et al.,<ref name = "Song2013"/> <math>X, Y</math> are assumed to be [[Probability distribution#Discrete probability distribution|discrete random variables]] that take values in the set <math>\{1,\dots,K\}</math>, and the kernel is chosen to be the [[Kronecker delta]] function, so <math>k(x,x') = \delta(x,x')</math>. The feature map corresponding to this kernel is the [[standard basis]] vector <math>\phi(x) = \mathbf{e}_x</math>. The kernel embeddings of such distributions are thus vectors of marginal probabilities, while the embeddings of joint distributions in this setting are <math>K\times K</math> matrices specifying joint probability tables. The explicit form of these embeddings is
:<math>\mu_X = \mathbb{E}_X [\mathbf{e}_X] = \begin{pmatrix} P(X=1) \\ \vdots \\ P(X=K) \\ \end{pmatrix}</math>
:<math>\mathcal{C}_{XY} = \mathbb{E}_{XY} [\mathbf{e}_X \otimes \mathbf{e}_Y] = \bigg( P(X=s, Y=t) \bigg)_{s,t \in \{1,\dots,K\}} </math>
The conditional distribution embedding operator,
:<math>\mathcal{C}_{Y\mid X} = \mathcal{C}_{YX} \mathcal{C}_{XX}^{-1},</math>
is in this setting a conditional probability table
:<math>\mathcal{C}_{Y \mid X} = \bigg( P(Y=s \mid X=t) \bigg)_{s,t \in \{1,\dots,K\}} </math>
and
:<math>\mathcal{C}_{XX} =\begin{pmatrix} P(X=1) & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & P(X=K) \\ \end{pmatrix}</math>
The inverse <math>\mathcal{C}_{XX}^{-1}</math> exists only when <math>P(X=s) > 0</math> for every <math>s \in \{1,\dots,K\}</math>.
Thus, the embeddings of the conditional distribution under a fixed value of <math>X</math> may be computed as
:<math>\mu_{Y \mid x} = \mathcal{C}_{Y \mid X} \phi(x) = \begin{pmatrix} P(Y=1 \mid X = x) \\ \vdots \\ P(Y=K \mid X = x) \\ \end{pmatrix} </math>
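The construction above can be illustrated numerically. The following is a minimal sketch in Python with NumPy, not taken from Song et al.: the joint probability table <code>joint_xy</code> and the helper <code>phi</code> are hypothetical, chosen only for illustration with <math>K = 3</math>. Indices in the code are zero-based, so index <code>s</code> stands for the value <math>s+1</math>.

```python
import numpy as np

# Hypothetical joint table for K = 3: entry [s, t] is P(X=s+1, Y=t+1).
joint_xy = np.array([
    [0.10, 0.05, 0.05],
    [0.10, 0.20, 0.10],
    [0.05, 0.05, 0.30],
])
K = joint_xy.shape[0]


def phi(x):
    """Feature map of the Kronecker delta kernel: the x-th standard basis vector."""
    return np.eye(K)[x]


# Mean embedding mu_X: the vector of marginal probabilities P(X=s).
mu_x = joint_xy.sum(axis=1)

# Cross-covariance operators: C_XY is the joint table, C_YX is its transpose.
C_xy = joint_xy
C_yx = joint_xy.T

# C_XX is diagonal with the marginals of X on the diagonal.
C_xx = np.diag(mu_x)

# Conditional embedding operator C_{Y|X} = C_YX C_XX^{-1}.
# Entry [s, t] is P(Y=s+1 | X=t+1); this needs every marginal P(X=s) to be positive.
C_y_given_x = C_yx @ np.linalg.inv(C_xx)

# Embedding of P(Y | X = x) for the first value of X: column 0 of C_{Y|X}.
mu_y_given_x = C_y_given_x @ phi(0)
```

Each column of <code>C_y_given_x</code> sums to one, as expected of a conditional probability table, and applying the operator to <code>phi(x)</code> simply selects the column for <math>X = x</math>.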
In this discrete-valued setting with the Kronecker delta kernel, the [[#Rules of probability as operations in the RKHS|kernel sum rule]] becomes
:<math>\underbrace{\begin{pmatrix} Q(X=1) \\ \vdots \\ Q(X = K) \\ \end{pmatrix}}_{\mu_X^\pi} = \underbrace{\begin{pmatrix} \\ P(X=s \mid Y=t) \\ \\ \end{pmatrix}}_{\mathcal{C}_{X\mid Y}} \underbrace{\begin{pmatrix} \pi(Y=1) \\ \vdots \\ \pi(Y = K) \\ \end{pmatrix}}_{ \mu_Y^\pi}</math>
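In the discrete case the kernel sum rule is an ordinary matrix-vector product. The following Python sketch assumes a hypothetical conditional table <code>C_x_given_y</code> and a hypothetical prior <code>mu_y_pi</code>, neither of which comes from the source, with <math>K = 3</math>.

```python
import numpy as np

# Hypothetical conditional table: entry [s, t] is P(X=s+1 | Y=t+1).
# Each column is a distribution over X and therefore sums to one.
C_x_given_y = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
])

# Hypothetical prior pi over Y, i.e. the embedding mu_Y^pi.
mu_y_pi = np.array([0.5, 0.3, 0.2])

# Kernel sum rule: Q(X=s) = sum_t P(X=s | Y=t) * pi(Y=t).
mu_x_pi = C_x_given_y @ mu_y_pi
```

The result <code>mu_x_pi</code> is again a probability vector, since each column of the conditional table sums to one and the prior sums to one.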
The [[#Rules of probability as operations in the RKHS|kernel chain rule]] in this case is given by
:<math>\underbrace{\begin{pmatrix} \\ Q(X=s,Y=t) \\ \\ \end{pmatrix}}_{\mathcal{C}_{XY}^\pi} = \underbrace{\begin{pmatrix} \\ P(X=s \mid Y=t) \\ \\ \end{pmatrix}}_{\mathcal{C}_{X \mid Y}} \underbrace{\begin{pmatrix} \pi(Y=1) & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \pi(Y=K) \\ \end{pmatrix}}_{\mathcal{C}_{YY}^\pi}</math>
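The kernel chain rule can be checked in the same way: multiplying the conditional table by the diagonal matrix of prior probabilities scales each column by <math>\pi(Y=t)</math> and yields a joint table. The Python sketch below again uses hypothetical values for <code>C_x_given_y</code> and <code>mu_y_pi</code>, chosen only for illustration.

```python
import numpy as np

# Hypothetical conditional table: entry [s, t] is P(X=s+1 | Y=t+1).
C_x_given_y = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
])

# Hypothetical prior pi over Y.
mu_y_pi = np.array([0.5, 0.3, 0.2])

# C_YY^pi is diagonal with the prior probabilities on the diagonal.
C_yy_pi = np.diag(mu_y_pi)

# Kernel chain rule: Q(X=s, Y=t) = P(X=s | Y=t) * pi(Y=t).
C_xy_pi = C_x_given_y @ C_yy_pi

# Marginalising the joint table over X recovers the prior over Y,
# and marginalising over Y recovers the sum-rule result for X.
marginal_y = C_xy_pi.sum(axis=0)
marginal_x = C_xy_pi.sum(axis=1)
```

Summing the resulting joint table over <math>Y</math> gives the same vector as the kernel sum rule applied to the same conditional table and prior.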
==References==