Forward–backward algorithm: Difference between revisions

m changed O(1) to O_1 to match notation (also O(1) was undefined)
Bender the Bot (talk | contribs)
m External links: HTTP to HTTPS for Brown University
 
(11 intermediate revisions by 6 users not shown)
Line 1:
{{Short description|Inference algorithm for hidden Markov models}}
{{Inline|date=April 2018}}
 
The '''forward–backward algorithm''' is an [[Statistical_inference | inference]] [[algorithm]] for [[hidden Markov model]]s which computes the [[posterior probability|posterior]] [[marginal probability|marginals]] of all hidden state variables given a sequence of observations/emissions <math>o_{1:T}:= o_1,\dots,o_T</math>, i.e. it computes, for all hidden state variables <math>X_t \in \{X_1, \dots, X_T\}</math>, the distribution <math>P(X_t\ |\ o_{1:T})</math>. This inference task is usually called '''smoothing'''. The algorithm makes use of the principle of [[dynamic programming]] to efficiently compute the values that are required to obtain the posterior marginal distributions in two passes. The first pass goes forward in time while the second goes backward in time; hence the name ''forward–backward algorithm''.
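
In symbols, one standard way to write how the two passes combine is the following identity, given here as a sketch with the normalization constant omitted:

:<math>P(X_t \mid o_{1:T}) \propto P(X_t \mid o_{1:t}) \, P(o_{t+1:T} \mid X_t)</math>

The first factor is what the forward pass produces and the second is what the backward pass produces. The identity relies on the hidden Markov assumption that future observations are independent of past observations given <math>X_t</math>.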
 
The term ''forward–backward algorithm'' is also used to refer to any algorithm belonging to the general class of algorithms that operate on sequence models in a forward–backward manner. In this sense, the descriptions in the remainder of this article refer only to one specific instance of this class.
 
==Overview ==
Line 23 ⟶ 24:
 
==Forward probabilities==
The following description will use matrices of probability values instead of probability distributions. However, it is important to note that the forward–backward algorithm can generally be applied to both continuous and discrete probability models.
 
We transform the probability distributions related to a given [[hidden Markov model]] into matrix notation as follows.
Line 34 ⟶ 35:
</math>
 
In a typical Markov model, we would multiply a state vector by this matrix to obtain the probabilities for the subsequent state. In a hidden Markov model the state is unknown, and we instead observe events associated with the possible states. An event matrix of the form:
 
:<math>\mathbf{B} = \begin{pmatrix}
Line 60 ⟶ 61:
</math>
 
We can now make this general procedure specific to our series of observations. Assuming an initial state vector <math>\mathbf{\pi}_0</math> (which can be optimized as a parameter through repetitions of the forward–backward procedure), we begin with <math>\mathbf{f_{0:0}} = \mathbf{\pi}_0</math>, then update the state distribution and weight it by the likelihood of the first observation:
 
:<math>
Line 72 ⟶ 73:
</math>
 
This value is the unnormalized forward [[probability vector]]. The ''i''th entry of this vector provides:
 
:<math>
Line 114 ⟶ 115:
</math>
 
Notice that we are now using a [[Row and column vectors|column vector]] while the forward probabilities used row vectors. We can then work backwards using:
 
:<math>
Line 190 ⟶ 191:
</math>
 
Notice that the [[transformation matrix]] is also transposed, but in our example the transpose is equal to the original matrix. Performing these calculations and normalizing the results provides:
 
:<math>
Line 227 ⟶ 228:
</math>
 
For the backward probabilities, we start with:
 
:<math>
Line 286 ⟶ 287:
 
==Performance ==
The forward–backward algorithm runs in time <math> O(S^2 T) </math> and space <math> O(S T) </math>, where <math>T</math> is the length of the time sequence and <math>S</math> is the number of symbols in the state alphabet.<ref>[[#RussellNorvig10|Russell & Norvig 2010 p. 579]]</ref> The algorithm can also run in constant space with time complexity <math> O(S^2 T^2) </math> by recomputing values at each step.<ref>[[#RussellNorvig10|Russell & Norvig 2010 p. 575]]</ref> For comparison, a [[Brute-force search|brute-force procedure]] would generate all possible <math>S^T</math> state sequences and calculate the joint probability of each state sequence with the observed series of events, which would have [[time complexity]] <math> O(T \cdot S^T) </math>. Brute force is intractable for realistic problems, because the number of possible hidden state sequences grows exponentially with <math>T</math>.
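
The gap between the two costs can be illustrated with a minimal sketch that computes the likelihood of an observation sequence both ways. The model values, the function names <code>likelihood_brute_force</code> and <code>likelihood_forward</code>, and the convention that the start distribution applies directly to the first hidden state are assumptions made for this illustration only; they are not taken from the article's example code.

<syntaxhighlight lang="python">
from itertools import product

# Toy two-state HMM, assumed for illustration (no end state).
states = ("Healthy", "Fever")
start = {"Healthy": 0.6, "Fever": 0.4}
trans = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}


def likelihood_brute_force(obs):
    """Sum the joint probability over all S**T state sequences."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        # Probability of the first state and the first observation.
        p = start[path[0]] * emit[path[0]][obs[0]]
        # Multiply in each transition and the observation it leads to.
        for prev, cur, o in zip(path, path[1:], obs[1:]):
            p *= trans[prev][cur] * emit[cur][o]
        total += p
    return total


def likelihood_forward(obs):
    """Forward recursion: S**2 work per time step instead of S**T overall."""
    f = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        f = {
            cur: emit[cur][o] * sum(f[prev] * trans[prev][cur] for prev in states)
            for cur in states
        }
    return sum(f.values())
</syntaxhighlight>

Both functions return the same value up to floating-point rounding for a non-empty observation sequence, but only the forward recursion remains practical as the sequence grows.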
 
An enhancement to the general forward-backward algorithm, called the [[Island algorithm]], trades smaller memory usage for longer running time, taking <math> O(S^2 T \log T) </math> time and <math> O(S^2 \log T) </math> memory. Furthermore, it is possible to invert the process model to obtain an <math>O(S)</math> space, <math>O(S^2 T)</math> time algorithm, although the inverted process may not exist or be [[ill-conditioned]].<ref>{{cite journal |last1=Binder |first1=John |last2=Murphy |first2=Kevin |last3=Russell |first3=Stuart |title=Space-efficient inference in dynamic probabilistic networks |journal=Int'l, Joint Conf. On Artificial Intelligence |date=1997 |url=https://www.cs.ubc.ca/~murphyk/Papers/ijcai97.pdf |access-date=8 July 2020}}</ref>
 
In addition, algorithms have been developed to compute <math>\mathbf{f_{0:t+1}}</math> efficiently through online smoothing, such as the fixed-lag smoothing (FLS) algorithm.<ref>[[#RussellNorvig10|Russell & Norvig 2010 Figure 15.6 p. 580]]</ref>
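
The online update of the forward vector can be sketched as a single step that consumes one observation at a time. This covers only the filtering part, not the full fixed-lag smoothing algorithm of the cited reference. The function name <code>forward_step</code> and the two-state model below are hypothetical and serve only to exercise the update.

<syntaxhighlight lang="python">
def forward_step(f_prev, observation, states, transition, emission):
    """Advance a normalized forward vector by one observation.

    f_prev maps each state to P(X_t = state | o_{1:t}); the result maps
    each state to P(X_{t+1} = state | o_{1:t+1}).
    """
    unnormalized = {
        cur: emission[cur][observation]
        * sum(f_prev[prev] * transition[prev][cur] for prev in states)
        for cur in states
    }
    norm = sum(unnormalized.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / norm for s, p in unnormalized.items()}


# Hypothetical two-state model, assumed for illustration.
states = ("A", "B")
transition = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.3, "B": 0.7}}
emission = {"A": {"u": 0.9, "v": 0.1}, "B": {"u": 0.2, "v": 0.8}}

f = {"A": 0.5, "B": 0.5}  # prior over the initial state
for obs in ("u", "u"):
    f = forward_step(f, obs, states, transition, emission)
</syntaxhighlight>

Each call costs <math>O(S^2)</math> time and keeps only one vector of length <math>S</math>, so the forward estimate can be maintained indefinitely as observations arrive.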
Line 318 ⟶ 319:
Given an HMM (the same one used in the [[Viterbi algorithm]] article) represented in the [[Python programming language]]:
<syntaxhighlight lang="python">
states = ("Healthy", "Fever")
end_state = "E"

observations = ("normal", "cold", "dizzy")

start_probability = {"Healthy": 0.6, "Fever": 0.4}

transition_probability = {
    "Healthy": {"Healthy": 0.69, "Fever": 0.3, "E": 0.01},
    "Fever": {"Healthy": 0.4, "Fever": 0.59, "E": 0.01},
}

emission_probability = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
</syntaxhighlight>
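
Before running inference on a model written this way, it can be useful to confirm that every distribution sums to one. The helper below is a minimal sketch; the name <code>check_hmm</code> and the tolerance are assumptions and are not part of the article's implementation.

<syntaxhighlight lang="python">
import math


def check_hmm(start_probability, transition_probability, emission_probability):
    """Raise ValueError if any distribution in the model does not sum to 1."""
    rows = {"start": start_probability}
    rows.update({f"transition[{s}]": r for s, r in transition_probability.items()})
    rows.update({f"emission[{s}]": r for s, r in emission_probability.items()})
    for name, row in rows.items():
        total = sum(row.values())
        # Allow for floating-point rounding when adding the probabilities.
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"{name} sums to {total}, expected 1")
</syntaxhighlight>

Calling <code>check_hmm(start_probability, transition_probability, emission_probability)</code> on the dictionaries defined above returns silently, since each transition row, including the probability of moving to the end state, sums to one.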
 
Line 395 ⟶ 396:
<syntaxhighlight lang="python">
def example():
    return fwd_bkw(
        observations,
        states,
        start_probability,
        transition_probability,
        emission_probability,
        end_state,
    )
</syntaxhighlight>
<syntaxhighlight lang="pycon">
Line 418 ⟶ 421:
== References==
{{reflist}}
* [[Lawrence Rabiner|Lawrence R. Rabiner]], A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. ''Proceedings of the [[IEEE]]'', 77 (2), pp.&nbsp;257–286, February 1989. [https://dx.doi.org/10.1109/5.18626 10.1109/5.18626]
* {{cite journal |author=Lawrence R. Rabiner, B. H. Juang|title=An introduction to hidden Markov models|journal=IEEE ASSP Magazine |date=January 1986 |pages=4–15}}
* {{cite book | author = Eugene Charniak|title = Statistical Language Learning|publisher = MIT Press| location=Cambridge, Massachusetts|year = 1993|isbn=978-0-262-53141-2}}
* <cite id = RussellNorvig10>{{cite book | author = Stuart Russell and Peter Norvig|title = Artificial Intelligence A Modern Approach 3rd Edition|publisher = Pearson Education/Prentice-Hall|location = Upper Saddle River, New Jersey|year = 2010|isbn=978-0-13-604259-4}}</cite>
 
==External links ==
* [http://www.cs.jhu.edu/~jason/papers/#eisner-2002-tnlp An interactive spreadsheet for teaching the forward–backward algorithm] (spreadsheet and article with step-by-step walk-through)
* [https://www.cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html Tutorial of hidden Markov models including the forward–backward algorithm]
* [http://code.google.com/p/aima-java/ Collection of AI algorithms implemented in Java] (including HMM and the forward–backward algorithm)
 
{{DEFAULTSORT:Forward-backward algorithm}}
[[Category:Articles with example Python (programming language) code]]
[[Category:Dynamic programming]]
[[Category:Error detection and correction]]