Forward–backward algorithm
The '''forward-backward algorithm''' is an [[inference]] [[algorithm]] for [[hidden Markov models]] which computes the [[posterior probability|posterior]] [[marginal probability|marginals]] of all hidden state variables given a sequence of observations/emissions <math>o_{1:t}:= o_1,\dots,o_t</math>, i.e. it computes, for all hidden state variables <math>X_k \in \{X_1, \dots, X_t\}</math>, the distribution <math>P(X_k\ |\ o_{1:t})</math>. This inference task is usually called ''smoothing''. The algorithm makes use of the principle of [[dynamic programming]] to efficiently compute the values that are required to obtain the posterior marginal distributions in two passes. The first pass goes forward in time while the second goes backward in time; hence the name ''forward-backward algorithm''.
 
The term ''forward-backward algorithm'' is also used to refer to any algorithm belonging to the general class of algorithms that operate on sequence models in a forward-backward manner. In this sense, the descriptions in the remainder of this article refer only to one specific instance of this class.
 
== Overview ==
 
In the first pass, the forward-backward algorithm computes a set of forward probabilities which provide, for all <math>k \in \{1, \dots, t\}</math>, the probability of ending up in any particular state given the first <math>k</math> observations in the sequence, i.e. <math>P(X_k\ |\ o_{1:k})</math>. In the second pass, the algorithm computes a set of backward probabilities which provide the probability of observing the remaining observations given any starting point <math>k</math>, i.e. <math>P(o_{k+1:t}\ |\ X_k)</math>. These two sets of probability distributions can then be combined to obtain the distribution over states at any specific point in time given the entire observation sequence:
 
:<math>P(X_k\ |\ o_{1:t}) = P(X_k\ |\ o_{1:k}, o_{k+1:t}) \propto P(o_{k+1:t}\ |\ X_k) P(X_k\ |\ o_{1:k})</math>
 
The last step follows from an application of [[Bayes' rule]] and the [[conditional independence]] of <math>o_{k+1:t}</math> and <math>o_{1:k}</math> given <math>X_k</math>.
 
As outlined above, the algorithm involves three steps:
 
# computing forward probabilities
# computing backward probabilities
# computing smoothed values
The forward-backward algorithm can be used to find the most likely state for any point in time. It cannot, however, be used to find the most likely sequence of states (see [[Viterbi algorithm]]).
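The three steps above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the function name, the matrix conventions (row-stochastic transition matrix, one observation-probability column per event), and the example numbers in the test are assumptions chosen for the sketch.

```python
import numpy as np

def forward_backward(T, B, pi, obs):
    """Posterior marginals P(X_k | o_{1:t}) of a hidden Markov model.

    T[i, j] = P(X_k = j | X_{k-1} = i), B[i, o] = P(o | X = i),
    pi = distribution of the initial state X_0, obs = observation indices.
    """
    n, S = len(obs), T.shape[0]
    f = np.zeros((n, S))            # scaled forward probabilities
    b = np.ones((n, S))             # scaled backward probabilities
    c = np.zeros(n)                 # scaling factors
    prev = pi
    for k, o in enumerate(obs):     # step 1: forward pass
        f[k] = B[:, o] * (prev @ T)
        c[k] = f[k].sum()
        f[k] /= c[k]
        prev = f[k]
    for k in range(n - 2, -1, -1):  # step 2: backward pass
        b[k] = T @ (B[:, obs[k + 1]] * b[k + 1]) / c[k + 1]
    g = f * b                       # step 3: combine and renormalize
    return g / g.sum(axis=1, keepdims=True)
```

Each row of the returned array is a smoothed distribution <math>P(X_k\ |\ o_{1:t})</math>; because every row is renormalized, the scaling factors cancel exactly as in the derivation below.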
 
== Forward probabilities ==
The following description takes as its base matrices of probability values rather than probability distributions. We transform the probability distributions related to a given [[hidden Markov model]] into matrix notation as follows.
The transition probabilities <math>\mathbf{P}(X_t\mid X_{t-1})</math> of a given random variable <math>X_t</math> representing all possible states in the hidden Markov model are represented by the matrix <math>\mathbf{T}</math>, where the row index <math>i</math> represents the start state and the column index <math>j</math> represents the target state. The example below represents a system in which the probability of staying in the same state after each step is 70% and the probability of transitioning to the other state is 30%. The transition matrix is then:
:<math>\mathbf{T} = \begin{pmatrix}
0.7 & 0.3 \\
0.3 & 0.7
\end{pmatrix}
</math>
The emission probabilities are collected in an observation matrix <math>\mathbf{B}</math>, where the row index represents the state and the column index represents the observed event:
:<math>\mathbf{B} = \begin{pmatrix}
0.9 & 0.1 \\
0.2 & 0.8
\end{pmatrix}
</math>
For each observation in the sequence, the probabilities of that observation in each state are placed on the diagonal of a matrix <math>\mathbf{O_t}</math>. For an observation of the first event this gives:
:<math>\mathbf{O_1} = \begin{pmatrix}
0.9 & 0.0 \\
0.0 & 0.2
\end{pmatrix}
</math>
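Constructing <math>\mathbf{O_t}</math> amounts to placing one column of <math>\mathbf{B}</math> on a diagonal. A minimal sketch using the matrices above (the variable names are illustrative):

```python
import numpy as np

B = np.array([[0.9, 0.1],   # observation probabilities, one row per state
              [0.2, 0.8]])

# O_1 is a diagonal matrix holding, for each state, the probability of
# the first observed event (column 0 of B)
O_1 = np.diag(B[:, 0])
```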
The forward probabilities describe the joint probability of the observations up to time <math>t</math> and the state at time <math>t</math>, given the initial state distribution <math>\mathbf{\pi}</math>:
 
:<math>
\mathbf{f_{0:t}}(i) = \mathbf{P}(o_1, o_2, \dots, o_t, X_t=x_i | \mathbf{\pi} )
</math>
 
To avoid numerical underflow, the forward vector is normalized at each step by a scaling factor <math>c_t</math> chosen so that its entries sum to one. The scaled values then satisfy:
 
:<math>
\mathbf{\hat{f}_{0:t}}(i) =
\frac{\mathbf{f_{0:t}}(i)}{\prod_{s=1}^t c_s} =
\frac{\mathbf{P}(o_1, o_2, \dots, o_t, X_t=x_i | \mathbf{\pi} )}{\mathbf{P}(o_1, o_2, \dots, o_t|\mathbf{\pi})} =
\mathbf{P}(X_t=x_i | o_1, o_2, \dots, o_t, \mathbf{\pi} )
</math>
We thus find that the product of the scaling factors provides us with the total probability for observing the given sequence up to time t and that the scaled probability vector provides us with the probability of being in each state at this time.
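This relationship can be checked numerically: running the scaled and unscaled recursions side by side, the product of the scaling factors equals the total probability of the observed sequence. A sketch under the matrix conventions of this article (the observation sequence is assumed for illustration):

```python
import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 0, 1, 0, 0]

f_unscaled = pi        # joint probabilities P(o_1..o_t, X_t | pi)
f_hat = pi             # scaled (normalized) forward vector
scale = []             # scaling factors c_t
for o in obs:
    f_unscaled = B[:, o] * (f_unscaled @ T)
    f_hat = B[:, o] * (f_hat @ T)
    c = f_hat.sum()
    scale.append(c)
    f_hat = f_hat / c

likelihood = f_unscaled.sum()   # P(o_1, ..., o_t | pi)
```

The final `f_hat` is the filtered distribution over states, while `np.prod(scale)` recovers `likelihood`.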
 
== Backward probabilities ==
A similar procedure can be constructed to find backward probabilities. These provide the probabilities:
 
:<math>
\mathbf{b_{t:T}}(i) = \mathbf{P}(o_{t+1}, o_{t+2}, \dots, o_{T} | X_t=x_i )
</math>
 
As in the forward pass, these values can be scaled using the same factors <math>c_s</math>:
 
:<math>
\mathbf{\hat{b}_{t:T}}(i) =
\frac{\mathbf{b_{t:T}}(i)}{\prod_{s=t+1}^T c_s}
</math>
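The backward recursion can be sketched analogously to the forward pass. This is an illustration under the same assumed matrix conventions; the scaling factors <math>c_s</math> are the ones produced by the forward pass:

```python
import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 0, 1, 0, 0]
n = len(obs)

# forward pass, run here only to obtain the scaling factors c_t
f, c = pi, []
for o in obs:
    f = B[:, o] * (f @ T)
    c.append(f.sum())
    f = f / c[-1]

# backward pass: b_hat[t](i) is the scaled P(o_{t+1:T} | X_t = x_i)
b_hat = [None] * n
b_hat[n - 1] = np.ones(len(pi))          # b_{T:T} = 1
for t in range(n - 2, -1, -1):
    b_hat[t] = T @ (B[:, obs[t + 1]] * b_hat[t + 1]) / c[t + 1]
```

Multiplying `b_hat[t]` back by the product of the scaling factors from <math>t+1</math> to <math>T</math> recovers the unscaled backward probabilities.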
The smoothed values, the posterior marginals of the states given the complete observation sequence, are then obtained by combining the scaled forward and backward probabilities:
 
:<math>
\mathbf{\gamma_t}(i) =
\mathbf{P}(X_t=x_i | o_1, o_2, \dots, o_T, \mathbf{\pi}) =
\frac{ \mathbf{P}(o_1, o_2, \dots, o_T, X_t=x_i | \mathbf{\pi} ) }{ \mathbf{P}(o_1, o_2, \dots, o_T | \mathbf{\pi} ) } =
\mathbf{\hat{f}_{0:t}}(i) \cdot \mathbf{\hat{b}_{t:T}}(i)
</math>

== Example ==

The following example applies the algorithm to a concrete sequence of five observations, using the transition matrix from above:
:<math>\mathbf{T} = \begin{pmatrix}
0.7 & 0.3 \\
0.3 & 0.7
\end{pmatrix}
</math>
and the observation matrix:
:<math>\mathbf{B} = \begin{pmatrix}
0.9 & 0.1 \\
0.2 & 0.8
\end{pmatrix}
</math>
 
We then observe the following sequence of events: {umbrella, umbrella, no umbrella, umbrella, umbrella} which we will represent in our calculations as:
 
:<math>
\mathbf{O_1} = \mathbf{O_2} = \mathbf{O_4} = \mathbf{O_5} = \begin{pmatrix}0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix} \qquad
\mathbf{O_3} = \begin{pmatrix}0.1 & 0.0 \\ 0.0 & 0.8 \end{pmatrix}
</math>
 
Note that <math>\mathbf{O_3}</math> differs from the others because of the "no umbrella" observation.
 
In computing the forward probabilities we begin with:
:<math>
\mathbf{f_{0:0}} = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}
</math>

the prior state distribution, and then compute the scaled forward vectors:
 
:<math>
(\mathbf{\hat{f}_{0:1}})^T =
c_1^{-1}\begin{pmatrix}0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.5000 \\ 0.5000 \end{pmatrix}=
c_1^{-1}\begin{pmatrix}0.4500 \\ 0.1000\end{pmatrix}=
\begin{pmatrix}0.8182 \\ 0.1818 \end{pmatrix}
</math>
 
:<math>
(\mathbf{\hat{f}_{0:2}})^T =
c_2^{-1}\begin{pmatrix}0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.8182 \\ 0.1818 \end{pmatrix}=
c_2^{-1}\begin{pmatrix}0.5645 \\ 0.0745\end{pmatrix}=
\begin{pmatrix}0.8834 \\ 0.1166 \end{pmatrix}
</math>
 
:<math>
(\mathbf{\hat{f}_{0:3}})^T =
c_3^{-1}\begin{pmatrix}0.1 & 0.0 \\ 0.0 & 0.8 \end{pmatrix}\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.8834 \\ 0.1166 \end{pmatrix}=
c_3^{-1}\begin{pmatrix}0.0653 \\ 0.2772\end{pmatrix}=
\begin{pmatrix}0.1907 \\ 0.8093 \end{pmatrix}
</math>
 
:<math>
(\mathbf{\hat{f}_{0:4}})^T =
c_4^{-1}\begin{pmatrix}0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.1907 \\ 0.8093 \end{pmatrix}=
c_4^{-1}\begin{pmatrix}0.3386 \\ 0.1247\end{pmatrix}=
\begin{pmatrix}0.7308 \\ 0.2692 \end{pmatrix}
</math>
 
:<math>
(\mathbf{\hat{f}_{0:5}})^T =
c_5^{-1}\begin{pmatrix}0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}\begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}\begin{pmatrix}0.7308 \\ 0.2692 \end{pmatrix}=
c_5^{-1}\begin{pmatrix}0.5331 \\ 0.0815\end{pmatrix}=
\begin{pmatrix}0.8673 \\ 0.1327 \end{pmatrix}
</math>
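The five scaled forward vectors above can be reproduced numerically. A sketch using NumPy, with the observation sequence of the example:

```python
import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
Os = [np.diag(B[:, o]) for o in (0, 0, 1, 0, 0)]   # O_1 ... O_5

f = np.array([0.5, 0.5])       # prior f_{0:0}
history = []
for O in Os:
    f = O @ T @ f              # O_t T f_{0:t-1}; T is symmetric here,
                               # so T and its transpose coincide
    f = f / f.sum()            # divide by the scaling factor c_t
    history.append(f)
```

`history` then holds the five vectors displayed above, up to rounding.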
== Performance ==

Several enhancements to the general forward-backward algorithm are known which allow the computation to take place in constant space. In addition, as the observation sequence grows, algorithms have been developed to compute <math>\mathbf{f_{0:t+1}}</math> efficiently through online smoothing, such as the fixed-lag smoothing (FLS) algorithm [[#RusselNorvig03|Russell & Norvig 2003 pp. 552]].
 
== Pseudocode ==
<pre>
ForwardBackward(guessState, sequenceIndex):
    if sequenceIndex is past the end of the sequence, return 1
    if (guessState, sequenceIndex) has been seen before, return saved result
    result = 0
    for each neighboring state n:
        result = result + (transition probability from guessState to n
                           given observation element at sequenceIndex) ×
                           ForwardBackward(n, sequenceIndex + 1)
    save result for (guessState, sequenceIndex)
    return result
</pre>
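The memoized recursion above can be translated into a runnable sketch. The model matrices and observation sequence below are the ones from the example in this article; `lru_cache` plays the role of the "save result" step, and the function name mirrors the pseudocode:

```python
from functools import lru_cache
import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])   # transition probabilities
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # observation probabilities
obs = [0, 0, 1, 0, 0]                    # the example observation sequence

@lru_cache(maxsize=None)
def forward_backward(guess_state, sequence_index):
    """Probability of the remaining observations obs[sequence_index:],
    given that the state just before them is guess_state."""
    if sequence_index == len(obs):       # past the end of the sequence
        return 1.0
    # sum over neighboring states n: transition x emission x recursion
    return sum(T[guess_state, n] * B[n, obs[sequence_index]]
               * forward_backward(n, sequence_index + 1)
               for n in range(T.shape[0]))
```

Summing over a uniform prior, `sum(0.5 * forward_backward(i, 0) for i in (0, 1))` gives the total probability of the observation sequence.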
 
== References ==
 
* [[Lawrence Rabiner|Lawrence R. Rabiner]], [http://www.caip.rutgers.edu/~lrr/Reprints/tutorial%20on%20hmm%20and%20applications.pdf A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition]. ''Proceedings of the [[IEEE]]'', 77 (2), p.&nbsp;257&ndash;286, February 1989.
*{{cite journal |author=Lawrence R. Rabiner, B. H. Juang|title=An introduction to hidden Markov models|journal=IEEE ASSP Magazine |month=January |pages=4–15 |year=1986}}