Forward–backward algorithm

{{Inline|date=April 2018}}
 
The '''forward–backward algorithm''' is an [[statistical inference|inference]] [[algorithm]] for [[hidden Markov model]]s which computes the [[posterior probability|posterior]] [[marginal probability|marginals]] of all hidden state variables given a sequence of observations/emissions <math>o_{1:T} := o_1,\dots,o_T</math>, i.e. it computes, for all hidden state variables <math>X_t \in \{X_1, \dots, X_T\}</math>, the distribution <math>P(X_t\ |\ o_{1:T})</math>. This inference task is usually called ''smoothing''. The algorithm makes use of the principle of [[dynamic programming]] to efficiently compute the values that are required to obtain the posterior marginal distributions in two passes. The first pass goes forward in time while the second goes backward in time; hence the name ''forward–backward algorithm''.
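
To make the objects involved concrete, the following minimal Python sketch sets up a toy two-state hidden Markov model; the variable names <code>A</code> (transition matrix), <code>B</code> (emission matrix) and <code>pi</code> (initial distribution), as well as all the numbers, are illustrative assumptions for this example and are not taken from the article.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical two-state HMM; all names and numbers here are
# illustrative assumptions, not part of the article's definitions.
A = np.array([[0.7, 0.3],    # A[i, j] = P(X_{t+1} = j | X_t = i)
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],    # B[i, k] = P(o_t = k | X_t = i)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial distribution P(X_1)

# Observation sequence o_{1:T}, encoded as emission indices (T = 5).
observations = [0, 0, 1, 0, 0]
</syntaxhighlight>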
 
The term ''forward–backward algorithm'' is also used to refer to any algorithm belonging to the general class of algorithms that operate on sequence models in a forward–backward manner. In this sense, the descriptions in the remainder of this article refer only to one specific instance of this class.
 
==Overview==
In the first pass, the forward–backward algorithm computes a set of forward probabilities which provide, for all <math>t \in \{1, \dots, T\}</math>, the probability of ending up in any particular state given the first <math>t</math> observations in the sequence, i.e. <math>P(X_t\ |\ o_{1:t})</math>. In the second pass, the algorithm computes a set of backward probabilities which provide the probability of observing the remaining observations given any starting point <math>t</math>, i.e. <math>P(o_{t+1:T}\ |\ X_t)</math>. These two sets of probability distributions can then be combined to obtain the distribution over states at any specific point in time given the entire observation sequence:
 
:<math>P(X_t\ |\ o_{1:T}) = P(X_t\ |\ o_{1:t}, o_{t+1:T}) \propto P(o_{t+1:T}\ |\ X_t) P(X_t\ |\ o_{1:t})</math>
 
The last step follows from an application of [[Bayes' rule]] and the [[conditional independence]] of <math>o_{t+1:T}</math> and <math>o_{1:t}</math> given <math>X_t</math>.
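
As a concrete (hypothetical) illustration of the two passes and of the combination step above, the following Python sketch, continuing the toy model defined earlier, computes normalized forward probabilities <math>P(X_t\ |\ o_{1:t})</math>, backward probabilities proportional to <math>P(o_{t+1:T}\ |\ X_t)</math>, and their normalized product. The function name and the per-step rescaling are assumptions of this sketch, not notation from the article.

<syntaxhighlight lang="python">
import numpy as np

def forward_backward(A, B, pi, observations):
    """Posterior marginals P(X_t | o_{1:T}) for t = 1, ..., T.

    A minimal, unoptimized sketch. Normalizing each forward vector
    keeps the recursion numerically stable and makes alpha[t] equal
    to P(X_t | o_{1:t}) rather than the joint P(X_t, o_{1:t}).
    """
    T, n = len(observations), len(pi)

    # Forward pass: alpha[t] = P(X_t | o_{1:t}) after normalization.
    alpha = np.zeros((T, n))
    alpha[0] = pi * B[:, observations[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, observations[t]]
        alpha[t] /= alpha[t].sum()

    # Backward pass: beta[t] proportional to P(o_{t+1:T} | X_t),
    # with base case beta[T-1] = 1.
    beta = np.ones((T, n))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, observations[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()  # rescaling for numerical stability only

    # Combine: P(X_t | o_{1:T}) is proportional to
    # P(o_{t+1:T} | X_t) * P(X_t | o_{1:t}), then normalize.
    posterior = alpha * beta
    posterior /= posterior.sum(axis=1, keepdims=True)
    return posterior
</syntaxhighlight>

Each row <code>posterior[t]</code> of the result is then the smoothed distribution <math>P(X_t\ |\ o_{1:T})</math> over the hidden states.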
 
As outlined above, the algorithm involves three steps: