Kernel embedding of distributions

where <math>\odot</math> denotes the element-wise vector product, <math>N(t) \backslash s </math> is the set of nodes connected to ''t'' excluding node ''s'', <math> \boldsymbol{\beta}_{ut} = \left(\beta_{ut}^1, \dots, \beta_{ut}^n \right) </math>, <math>\mathbf{K}_t, \mathbf{K}_s </math> are the Gram matrices of the samples from variables <math>X_t, X_s </math>, respectively, and <math>\boldsymbol{\Upsilon}_s = \left(\varphi(x_s^1),\dots, \varphi(x_s^n)\right)</math> is the feature matrix for the samples from <math>X_s</math>.
 
Thus, if the incoming messages to node ''t'' are linear combinations of feature mapped samples from <math> X_t </math>, then the outgoing message from this node is also a linear combination of feature mapped samples from <math> X_s </math>. This RKHS function representation of message-passing updates therefore produces an efficient belief propagation algorithm in which the [[Markov random field#Clique factorization|potentials]] are nonparametric functions inferred from the data so that arbitrary statistical relationships may be modeled.<ref name = "Song2013"/>
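The following is an illustrative finite-sample sketch of this message update (not taken from the cited reference). It assumes scalar samples and a Gaussian kernel, and uses a regularized Gram-matrix inverse; the function names, bandwidth, and regularization parameter are arbitrary choices made for the example.

<syntaxhighlight lang="python">
import numpy as np

def gaussian_gram(x, y, bandwidth=1.0):
    """Gram matrix K[i, j] = k(x[i], y[j]) for a Gaussian RBF kernel (scalar samples)."""
    return np.exp(-np.subtract.outer(x, y) ** 2 / (2.0 * bandwidth ** 2))

def kernel_bp_message(K_t, incoming_betas, lam=1e-3):
    """Outgoing message coefficients from node t to node s (illustrative sketch).

    K_t            : (n, n) Gram matrix of the samples from X_t
    incoming_betas : list of (n,) coefficient vectors beta_ut, one for each
                     neighbour u of t other than s
    Returns an (n,) vector beta_ts, so that the outgoing message is the linear
    combination m_ts(.) = sum_i beta_ts[i] * k(x_s^i, .).
    """
    n = K_t.shape[0]
    # Element-wise product of the incoming messages evaluated at the samples of X_t
    product = np.ones(n)
    for beta_ut in incoming_betas:
        product *= K_t @ beta_ut
    # A regularized Gram-matrix inverse maps the product to outgoing coefficients
    return np.linalg.solve(K_t + lam * n * np.eye(n), product)

# Example usage with illustrative scalar samples from X_t:
# x_t = np.random.randn(50)
# K_t = gaussian_gram(x_t, x_t)
# beta_ts = kernel_bp_message(K_t, [np.ones(50) / 50, np.ones(50) / 50])
</syntaxhighlight>

The returned coefficients play the same role as the incoming <math>\boldsymbol{\beta}_{ut}</math>, so messages can be propagated through the graph by repeated calls of this form.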
 
=== Nonparametric filtering in hidden Markov models ===
In the [[hidden Markov model]] (HMM), two key quantities of interest are the transition probabilities between hidden states <math> P(S^t \mid S^{t-1})</math> and the emission probabilities <math>P(O^t \mid S^t)</math> for observations. Using the kernel conditional distribution embedding framework, these quantities may be expressed in terms of samples from the HMM. A serious limitation of the embedding methods in this ___domain is the need for training samples that contain the hidden states; otherwise, inference with arbitrary distributions in the HMM is not possible.
 
One common use of HMMs is [[Hidden Markov model#Filtering|filtering]], in which the goal is to estimate the posterior distribution over the hidden state <math>s^t</math> at time step ''t'' given a history of previous observations <math>h^t = (o^1, \dots, o^t)</math> from the system. In filtering, a '''belief state''' <math>P(S^{t+1} \mid h^{t+1})</math> is recursively maintained via a prediction step (where updates <math>P(S^{t+1} \mid h^t) = \mathbb{E}[P(S^{t+1} \mid S^t) \mid h^t]</math> are computed by marginalizing out the previous hidden state) followed by a conditioning step (where updates <math> P(S^{t+1} \mid h^t, o^{t+1}) \propto P(o^{t+1} \mid S^{t+1}) P(S^{t+1} \mid h^t) </math> are computed by applying Bayes' rule to condition on a new observation).<ref name = "Song2013"/> The RKHS embedding of the belief state at time ''t+1'' can be recursively expressed as
 
:<math>\mu_{S^{t+1} \mid h^{t+1}} = \mathcal{C}_{S^{t+1} O^{t+1}}^\pi \left(\mathcal{C}_{O^{t+1} O^{t+1}}^\pi \right)^{-1} \varphi(o^{t+1}) </math>
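An illustrative finite-sample sketch of this conditioning step (not taken from the cited reference) is given below. For simplicity it replaces the prior-modified operators <math>\mathcal{C}_{S^{t+1} O^{t+1}}^\pi</math> and <math>\mathcal{C}_{O^{t+1} O^{t+1}}^\pi</math> with ordinary empirical estimates built from training pairs of hidden states and observations, and it assumes scalar samples, a Gaussian kernel, and arbitrary bandwidth and regularization parameters.

<syntaxhighlight lang="python">
import numpy as np

def gaussian_gram(x, y, bandwidth=1.0):
    """Gram matrix K[i, j] = k(x[i], y[j]) for a Gaussian RBF kernel (scalar samples)."""
    return np.exp(-np.subtract.outer(x, y) ** 2 / (2.0 * bandwidth ** 2))

def belief_state_weights(obs_train, new_obs, lam=1e-3, bandwidth=1.0):
    """Weights w such that the belief-state embedding is approximated by
    sum_i w[i] * phi(s_i), where s_i are the hidden states paired with obs_train.

    Illustrative sketch: the prior-modified (pi-superscripted) operators of the
    recursion are replaced here by plain empirical estimates, so this computes
    the sample version of C_{SO} (C_{OO})^{-1} phi(o) for a new observation o.
    """
    n = len(obs_train)
    G = gaussian_gram(obs_train, obs_train, bandwidth)             # observation Gram matrix
    k_new = gaussian_gram(obs_train, [new_obs], bandwidth)[:, 0]   # kernel vector of the new observation
    return np.linalg.solve(G + lam * n * np.eye(n), k_new)
</syntaxhighlight>

The resulting weights are applied to the feature-mapped hidden-state training samples; for instance, a point estimate of the hidden state can be read off as the weighted sum <code>state_train @ w</code>.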