=== Kernel sum rule ===
In probability theory, the marginal distribution of <math>X</math> can be computed by integrating out <math> Y </math> from the joint density (including the prior distribution on <math>Y</math>)
:<math> Q(X) = \int_\Omega P(X \mid Y) \, \mathrm{d} \pi(Y) </math>
The analog of this rule in the kernel embedding framework states that <math>\mu_X^\pi,</math> the RKHS embedding of <math>Q(X)</math>, can be computed via
:<math> \mu_X^\pi = \mathcal{C}_{X \mid Y} \mu_Y^\pi </math>
where <math>\mu_Y^\pi</math> is the kernel embedding of <math>\pi(Y)</math>.
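A minimal numerical sketch may make the empirical sum rule concrete. It assumes a Gaussian RBF kernel on <math>Y</math> and the finite-sample form <math>\widehat{\mu}_X^\pi = \boldsymbol{\Upsilon} (\mathbf{G} + \lambda \mathbf{I})^{-1} \widetilde{\mathbf{G}} \boldsymbol{\alpha}</math>, consistent with the chain-rule estimate below; the function names, kernel choice, and regularization value are illustrative, not part of the referenced algorithms.

<syntaxhighlight lang="python">
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    # Gram matrix of a Gaussian RBF kernel: entry (i, j) is k(A[i], B[j]).
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def kernel_sum_rule(y, y_prior, alpha, lam=1e-3):
    # Returns weights beta such that mu_X^pi is approximated by
    # sum_i beta_i phi(x_i), i.e. beta = (G + lam I)^{-1} G~ alpha,
    # where G_ij = k(y_i, y_j) and (G~)_ij = k(y_i, y~_j).
    n = y.shape[0]
    G = gaussian_gram(y, y)
    G_tilde = gaussian_gram(y, y_prior)
    return np.linalg.solve(G + lam * np.eye(n), G_tilde @ alpha)
</syntaxhighlight>

Here <code>alpha</code> holds the weights of the empirical prior embedding <math>\widehat{\mu}_Y^\pi = \sum_j \alpha_j \varphi(\widetilde{y}_j)</math>.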
=== Kernel chain rule ===
In probability theory, a joint distribution can be factorized into a product of conditional and marginal distributions
:<math> Q(X, Y) = P(X \mid Y) \pi(Y) </math>
The analog of this rule in the kernel embedding framework states that <math>\mathcal{C}_{XY}^\pi,</math> the joint embedding of <math>Q(X,Y)</math>, can be factorized as a composition of the conditional embedding operator with the auto-covariance operator associated with <math>\pi(Y)</math>
:<math> \mathcal{C}_{XY}^\pi = \mathcal{C}_{X \mid Y} \mathcal{C}_{YY}^\pi </math>
In practical implementations, the kernel chain rule takes the following form
:<math> \widehat{\mathcal{C}}_{XY}^\pi = \widehat{\mathcal{C}}_{X \mid Y} \widehat{\mathcal{C}}_{YY}^\pi = \boldsymbol{\Upsilon} (\mathbf{G} + \lambda \mathbf{I})^{-1} \widetilde{\mathbf{G}} \operatorname{diag}(\boldsymbol{\alpha}) \widetilde{\boldsymbol{\Phi}}^T </math>
where <math>\widetilde{\boldsymbol{\Phi}} = \left( \varphi(\widetilde{y}_1), \ldots, \varphi(\widetilde{y}_{\widetilde{n}}) \right)</math> is the feature matrix of samples drawn from <math>\pi(Y)</math>, <math>\widetilde{\mathbf{G}}</math> is the Gram matrix between the training samples <math>y_i</math> and the prior samples <math>\widetilde{y}_j</math>, and <math>\boldsymbol{\alpha}</math> holds the weights of the empirical prior embedding <math>\widehat{\mu}_Y^\pi = \widetilde{\boldsymbol{\Phi}} \boldsymbol{\alpha}</math>.
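Continuing the sketch above under the same assumptions, the finite-sample chain rule only requires forming the weight matrix <math>\boldsymbol{\Lambda} = (\mathbf{G} + \lambda \mathbf{I})^{-1} \widetilde{\mathbf{G}} \operatorname{diag}(\boldsymbol{\alpha})</math>, since <math>\widehat{\mathcal{C}}_{XY}^\pi = \boldsymbol{\Upsilon} \boldsymbol{\Lambda} \widetilde{\boldsymbol{\Phi}}^T</math>:

<syntaxhighlight lang="python">
import numpy as np

def kernel_chain_rule_weights(G, G_tilde, alpha, lam=1e-3):
    # Lambda = (G + lam I)^{-1} G~ diag(alpha); the broadcast product
    # scales column j of the solve result by alpha[j].
    n = G.shape[0]
    return np.linalg.solve(G + lam * np.eye(n), G_tilde) * alpha
</syntaxhighlight>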
=== Kernel Bayes' rule ===
In probability theory, a posterior distribution can be expressed in terms of a prior distribution and a likelihood function as
:<math>Q(Y\mid x) = \frac{P(x\mid Y) \pi(Y)}{Q(x)} </math> where <math> Q(x) = \int_\Omega P(x \mid y) \, \mathrm{d} \pi(y) </math>
The analog of this rule in the kernel embedding framework expresses the kernel embedding of the conditional distribution in terms of conditional embedding operators which are modified by the prior distribution
:<math> \mu_{Y \mid x}^\pi = \mathcal{C}_{Y \mid X}^\pi \varphi(x) = \mathcal{C}_{YX}^\pi \left( \mathcal{C}_{XX}^\pi \right)^{-1} \varphi(x) </math>
where, by the chain rule, <math> \mathcal{C}_{YX}^\pi = \left( \mathcal{C}_{X \mid Y} \mathcal{C}_{YY}^\pi \right)^T </math>. In practical implementations, the kernel Bayes' rule takes the following form
:<math> \widehat{\mu}_{Y \mid x}^\pi = \widehat{\mathcal{C}}_{YX}^\pi \left( \left( \widehat{\mathcal{C}}_{XX}^\pi \right)^2 + \widetilde{\lambda} \mathbf{I} \right)^{-1} \widehat{\mathcal{C}}_{XX}^\pi \varphi(x) = \widetilde{\boldsymbol{\Phi}} \boldsymbol{\Lambda}^T \left( (\mathbf{K} \mathbf{D})^2 + \widetilde{\lambda} \mathbf{I} \right)^{-1} \mathbf{K} \mathbf{D} \mathbf{K}_x </math>
where
:<math>\boldsymbol{\Lambda} = \left(\mathbf{G} + \lambda \mathbf{I} \right)^{-1} \widetilde{\mathbf{G}} \operatorname{diag}(\boldsymbol{\alpha}), \qquad \mathbf{D} = \operatorname{diag}\left( \left(\mathbf{G} + \lambda \mathbf{I} \right)^{-1} \widetilde{\mathbf{G}} \boldsymbol{\alpha} \right) </math>
Two regularization parameters are used in this framework: <math>\lambda </math> for the estimation of <math> \widehat{\mathcal{C}}_{YX}^\pi</math> and <math>\widehat{\mathcal{C}}_{XX}^\pi = \boldsymbol{\Upsilon} \mathbf{D} \boldsymbol{\Upsilon}^T</math>, and <math>\widetilde{\lambda}</math> for the estimation of the final conditional embedding operator.
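The two-stage estimate can be sketched directly from the finite-sample formulas above. The Gram matrices are taken as inputs, and the function name, argument names, and regularization values are illustrative assumptions:

<syntaxhighlight lang="python">
import numpy as np

def kernel_bayes_rule(K, G, G_tilde, K_x, alpha, lam=1e-3, lam_tilde=1e-3):
    # First stage (regularizer lam): Lambda and D via the chain/sum rules.
    n = K.shape[0]
    L = np.linalg.solve(G + lam * np.eye(n), G_tilde)  # (G + lam I)^{-1} G~
    Lam = L * alpha                                    # Lambda = L diag(alpha)
    D = np.diag(L @ alpha)                             # D = diag(L alpha)
    # Second stage (regularizer lam_tilde): posterior weights w with
    # mu_{Y|x}^pi ~ sum_j w_j phi(y~_j)
    #            = Phi~ Lambda^T ((K D)^2 + lam~ I)^{-1} K D K_x.
    KD = K @ D
    return Lam.T @ np.linalg.solve(KD @ KD + lam_tilde * np.eye(n), KD @ K_x)
</syntaxhighlight>

The returned vector weights the prior-sample features <math>\varphi(\widetilde{y}_j)</math>, so the posterior embedding never has to be formed explicitly.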
Filtering with kernel embeddings is thus implemented recursively using the following updates for the weights <math>\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_T)</math><ref name="Song2013"/>
:<math>\mathbf{D}^{t+1} = \operatorname{diag}\left( \left( \mathbf{G} + \lambda \mathbf{I} \right)^{-1} \widetilde{\mathbf{G}} \boldsymbol{\alpha}^t \right) </math>
:<math>\boldsymbol{\alpha}^{t+1} = \mathbf{D}^{t+1} \mathbf{K} \left( (\mathbf{D}^{t+1} \mathbf{K})^2 + \widetilde{\lambda} \mathbf{I} \right)^{-1} \mathbf{D}^{t+1} \mathbf{K}_{o^{t+1}} </math>
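One filtering step combines these two updates, as in the following sketch (same assumptions as the earlier sketches); <code>K_obs</code> stands for the column vector <math>\mathbf{K}_{o^{t+1}}</math> of kernel evaluations against the new observation:

<syntaxhighlight lang="python">
import numpy as np

def kbr_filter_step(alpha_t, K, G, G_tilde, K_obs, lam=1e-3, lam_tilde=1e-3):
    # D^{t+1} = diag((G + lam I)^{-1} G~ alpha^t)
    n = K.shape[0]
    D = np.diag(np.linalg.solve(G + lam * np.eye(n), G_tilde @ alpha_t))
    # alpha^{t+1} = D K ((D K)^2 + lam~ I)^{-1} D K_obs
    DK = D @ K
    return DK @ np.linalg.solve(DK @ DK + lam_tilde * np.eye(n), D @ K_obs)
</syntaxhighlight>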
In distribution regression, training bags are embedded into the RKHS and a real-valued response is fitted by ridge regression over the embeddings
:<math>\hat{f} = \underset{f \in \mathcal{H}(K)}{\operatorname{arg\,min}} \, \frac{1}{\ell} \sum_{i=1}^{\ell} \left[ f\left(\mu_{\hat{X}_i}\right) - y_i \right]^2 + \lambda \left\| f \right\|_{\mathcal{H}(K)}^2 </math>
where
:<math>\mu_{\hat{X}_i} = \int_\Omega k(\cdot, u) \, \mathrm{d} \hat{X}_i(u) = \frac{1}{N_i} \sum_{n=1}^{N_i} k\left(\cdot, x_{i,n}\right) </math>
where <math>k</math> is a kernel on the ___domain of the <math>X_i</math>-s <math>(k:\Omega\times \Omega \to \R)</math>, <math>K</math> is a kernel on the embedded distributions, and <math>\mathcal{H}(K)</math> is the RKHS determined by <math>K</math>. Examples for <math>K</math> include the linear kernel <math>\left[ K(\mu_P,\mu_Q) = \langle\mu_P,\mu_Q\rangle_{\mathcal{H}(k)} \right] </math>, the Gaussian kernel <math> \left[ K(\mu_P,\mu_Q) = e^{-\left\|\mu_P-\mu_Q\right\|_{\mathcal{H}(k)}^2/(2\sigma^2)} \right] </math>, the exponential kernel <math> \left[ K(\mu_P,\mu_Q) = e^{-\left\|\mu_P-\mu_Q\right\|_{\mathcal{H}(k)}/(2\sigma^2)} \right] </math>, the Cauchy kernel <math> \left[ K(\mu_P,\mu_Q) = \left(1+ \left\|\mu_P-\mu_Q\right\|_{\mathcal{H}(k)}^2/\sigma^2 \right)^{-1} \right] </math>, the generalized t-Student kernel <math> \left[ K(\mu_P,\mu_Q) = \left(1+ \left\|\mu_P-\mu_Q\right\|_{\mathcal{H}(k)}^{\sigma} \right)^{-1}, (\sigma \le 2) \right] </math>, or the inverse multiquadrics kernel <math> \left[ K(\mu_P,\mu_Q) = \left(\left\|\mu_P-\mu_Q\right\|_{\mathcal{H}(k)}^2 + \sigma^2 \right)^{-\frac{1}{2}} \right] </math>.
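For instance, the Gaussian choice of <math>K</math> can be evaluated purely from pairwise Gram matrices of <math>k</math>, since <math>\left\|\mu_P-\mu_Q\right\|_{\mathcal{H}(k)}^2</math> expands into means of kernel evaluations. The sketch below assumes a Gaussian <math>k</math> as well; all names and bandwidths are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    # Entry (i, j) is k(A[i], B[j]) for a Gaussian RBF kernel k.
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T)
    return np.exp(-sq / (2.0 * sigma**2))

def embedding_sq_dist(X1, X2, sigma=1.0):
    # ||mu_X1 - mu_X2||^2 in H(k) from the empirical embeddings:
    # mean k(X1, X1) - 2 mean k(X1, X2) + mean k(X2, X2).
    return (gaussian_gram(X1, X1, sigma).mean()
            - 2.0 * gaussian_gram(X1, X2, sigma).mean()
            + gaussian_gram(X2, X2, sigma).mean())

def embedding_gram(bags, sigma=1.0, Sigma=1.0):
    # Gram matrix of the Gaussian kernel K on the embedded bags.
    m = len(bags)
    Kmat = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            Kmat[i, j] = np.exp(-embedding_sq_dist(bags[i], bags[j], sigma)
                                / (2.0 * Sigma**2))
    return Kmat
</syntaxhighlight>

The resulting Gram matrix over bags can then be plugged into standard kernel ridge regression to obtain the estimator above.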