Forward–backward algorithm: Difference between revisions

Revision as of 22:26, 3 May 2023 by Yfarjoun (minor edit: "changed O(1) to O_1 to match notation (also O(1) was undefined)")
Latest revision as of 07:56, 20 August 2025 by Bender the Bot (minor edit: "→External links: HTTP to HTTPS for Brown University"; tag: AWB)
(11 intermediate revisions by 6 users not shown)
Line 1:
{{Short description|Inference algorithm for hidden Markov models}}
{{Inline|date=April 2018}}
The '''forward–backward algorithm''' is an [[Statistical inference|inference]] [[algorithm]] for [[hidden Markov model]]s which computes the [[posterior probability|posterior]] [[marginal probability|marginals]] of all hidden state variables given a sequence of observations/emissions <math>o_{1:T}:= o_1,\dots,o_T</math>, i.e. it computes, for all hidden state variables <math>X_t \in \{X_1, \dots, X_T\}</math>, the distribution <math>P(X_t\ |\ o_{1:T})</math>. This inference task is usually called '''smoothing'''. The algorithm makes use of the principle of [[dynamic programming]] to efficiently compute the values that are required to obtain the posterior marginal distributions in two passes. The first pass goes forward in time while the second goes backward in time; hence the name ''forward–backward algorithm''.

The term ''forward–backward algorithm'' is also used to refer to any algorithm belonging to the general class of algorithms that operate on sequence models in a forward–backward manner. In this sense, the descriptions in the remainder of this article refer ~~but~~ only to one specific instance of this class.

==Overview ==

Line 23 ⟶ 24:
==Forward probabilities==
The following description will use matrices of probability values ~~rather than~~ instead of probability distributions. However, ~~although~~ it ~~in~~ is ~~general~~ important to note that the forward–backward algorithm can generally be applied to both continuous ~~as well as~~ and discrete probability models.

We transform the probability distributions related to a given [[hidden Markov model]] into matrix notation as follows.

Line 34 ⟶ 35:
</math>
In a typical Markov model, we would multiply a state vector by this matrix to obtain the probabilities for the subsequent state. In a hidden Markov model the state is unknown, and we instead observe events associated with the possible states.
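As a small illustration of the matrix step just described, the sketch below multiplies a row state vector by a transition matrix to get the distribution over the next state. The two-state matrix, the starting vector and the helper name <code>vec_mat</code> are hypothetical, chosen only for this example; it assumes the convention that entry (i, j) is the probability of moving from state i to state j, so each row sums to 1.

<syntaxhighlight lang="python">
# Illustrative sketch: one or two steps of a plain Markov chain.
# The transition matrix is an assumed example, not taken from the article.
# Assumed convention: trans[i][j] = P(next state = j | current state = i).
trans = [
    [0.7, 0.3],
    [0.3, 0.7],
]


def vec_mat(v, m):
    """Multiply a row vector v (length S) by an S x S matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


x0 = [1.0, 0.0]           # start with certainty in state 0
x1 = vec_mat(x0, trans)   # one step later: [0.7, 0.3]
x2 = vec_mat(x1, trans)   # two steps later: approximately [0.58, 0.42]
</syntaxhighlight>

In a hidden Markov model this propagation alone is not enough, because the state is never observed directly; each step must also be weighted by how likely the observed event is in each state.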
An event matrix of the form:
:<math>\mathbf{B} = \begin{pmatrix}

Line 60 ⟶ 61:
</math>
We can now make this general procedure specific to our series of observations. Assuming an initial state vector <math>\mathbf{\pi}_0</math> (which can be optimized as a parameter through repetitions of the ~~forward-back~~ forward–backward procedure), we begin with <math>\mathbf{f_{0:0}} = \mathbf{\pi}_0</math>, then update the state distribution and weight it by the likelihood of the first observation:
:<math>

Line 72 ⟶ 73:
</math>
This value is the forward unnormalized [[probability vector]]. The ''i''-th entry of this vector provides:
:<math>

Line 114 ⟶ 115:
</math>
Notice that we are now using a [[Row and column vectors|column vector]] while the forward probabilities used row vectors. We can then work backwards using:
:<math>

Line 190 ⟶ 191:
</math>
Notice that the [[transformation matrix]] is also transposed, but in our example the transpose is equal to the original matrix. Performing these calculations and normalizing the results provides:
:<math>

Line 227 ⟶ 228:
</math>
For the backward probabilities, we start with:
:<math>

Line 286 ⟶ 287:
==Performance ==
The forward–backward algorithm runs with time complexity <math>O(S^2 T)</math> in space <math>O(S T)</math>, where <math>T</math> is the length of the time sequence and <math>S</math> is the number of symbols in the state alphabet.<ref>[[#RussellNorvig10|Russell & Norvig 2010 p. 579]]</ref> The algorithm can also run in constant space (independent of <math>T</math>) with time complexity <math>O(S^2 T^2)</math> by recomputing values at each step.<ref>[[#RussellNorvig10|Russell & Norvig 2010 p. 575]]</ref> For comparison, a [[Brute-force search|brute-force procedure]] would generate all possible <math>S^T</math> state sequences and calculate the joint probability of each state sequence with the observed series of events, which would have [[time complexity]] <math>O(T \cdot S^T)</math>.
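The two passes and the brute-force comparison can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the article's own <code>fwd_bkw</code>: it assumes the row-vector forward recursion <math>\mathbf{f_{0:t}} = \mathbf{f_{0:t-1}} \mathbf{T} \mathbf{O_t}</math> starting from <math>\mathbf{f_{0:0}} = \mathbf{\pi}_0</math>, the column-vector backward recursion <math>\mathbf{b_{t-1:T}} = \mathbf{T} \mathbf{O_t} \mathbf{b_{t:T}}</math> starting from a vector of ones, and a diagonal <math>\mathbf{O_t}</math> holding the emission probabilities of the observed symbol. The function names <code>forward_backward</code> and <code>brute_force</code>, the model numbers and the observation sequence are all hypothetical. The brute-force function exists only to check the result on a tiny input.

<syntaxhighlight lang="python">
from itertools import product


def forward_backward(pi0, trans, emit, obs):
    """Return smoothed marginals P(X_t | o_{1:T}) for t = 0..T.

    pi0   -- initial state distribution (length S)
    trans -- S x S list, trans[i][j] = P(X_{t+1} = j | X_t = i)  (assumed convention)
    emit  -- S x K list, emit[i][k] = P(o = k | X = i)
    obs   -- observation indices o_1..o_T
    """
    S = len(pi0)

    # Forward pass (row vectors): f_{0:t} = f_{0:t-1} T O_t, starting from pi0.
    fwd = [list(pi0)]
    for o in obs:
        prev = fwd[-1]
        fwd.append([sum(prev[i] * trans[i][j] for i in range(S)) * emit[j][o]
                    for j in range(S)])

    # Backward pass (column vectors): b_{t-1:T} = T O_t b_{t:T}, starting from ones.
    bwd = [[1.0] * S]
    for o in reversed(obs):
        nxt = bwd[-1]
        bwd.append([sum(trans[i][j] * emit[j][o] * nxt[j] for j in range(S))
                    for i in range(S)])
    bwd.reverse()  # now bwd[t] holds b_{t:T}

    # Combine: f_{0:t}[i] * b_{t:T}[i] is proportional to P(X_t = i | o_{1:T}).
    result = []
    for f, b in zip(fwd, bwd):
        g = [fi * bi for fi, bi in zip(f, b)]
        z = sum(g)
        result.append([gi / z for gi in g])
    return result


def brute_force(pi0, trans, emit, obs):
    """Same marginals by summing the joint probability of every hidden path x_0..x_T."""
    S, T = len(pi0), len(obs)
    totals = [[0.0] * S for _ in range(T + 1)]
    for path in product(range(S), repeat=T + 1):
        p = pi0[path[0]]
        for t in range(1, T + 1):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t - 1]]
        for t, state in enumerate(path):
            totals[t][state] += p
    return [[v / sum(row) for v in row] for row in totals]


# Hypothetical two-state, two-symbol model; the numbers are illustrative only.
pi0 = [0.5, 0.5]
trans = [[0.7, 0.3], [0.3, 0.7]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 0, 1, 0, 0]

smoothed = forward_backward(pi0, trans, emit, obs)
reference = brute_force(pi0, trans, emit, obs)
</syntaxhighlight>

Both calls should agree up to floating-point rounding. The sketch keeps every forward and backward vector, so it uses <math>O(S T)</math> memory with <math>O(S^2)</math> work per time step. The brute-force check also sums over the initial state <math>X_0</math>, so it enumerates <math>S^{T+1}</math> paths and is only usable for very short sequences.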
Brute force is intractable for realistic problems, as the number of possible hidden node sequences typically is extremely high. An enhancement to the general forward–backward algorithm, called the [[Island algorithm]], trades smaller memory usage for longer running time, taking <math>O(S^2 T \log T)</math> time and <math>O(S^2 \log T)</math> memory. Furthermore, it is possible to invert the process model to obtain an <math>O(S)</math> space, <math>O(S^2 T)</math> time algorithm, although the inverted process may not exist or may be [[ill-conditioned]].<ref>{{cite journal |last1=Binder |first1=John |last2=Murphy |first2=Kevin |last3=Russell |first3=Stuart |title=Space-efficient inference in dynamic probabilistic networks |journal=Int'l, Joint Conf. On Artificial Intelligence |date=1997 |url=https://www.cs.ubc.ca/~murphyk/Papers/ijcai97.pdf |access-date=8 July 2020}}</ref>

In addition, algorithms have been developed to compute <math>\mathbf{f_{0:t+1}}</math> efficiently through online smoothing, such as the fixed-lag smoothing (FLS) algorithm.<ref>[[#RussellNorvig10|Russell & Norvig 2010 Figure 15.6 p. 580]]</ref>

Line 318 ⟶ 319:
Given an HMM (as in [[Viterbi algorithm]]) represented in the [[Python programming language]]:
<syntaxhighlight lang="python">
states = ("Healthy", "Fever")
end_state = "E"

observations = ("normal", "cold", "dizzy")

start_probability = {"Healthy": 0.6, "Fever": 0.4}

transition_probability = {
    "Healthy": {"Healthy": 0.69, "Fever": 0.3, "E": 0.01},
    "Fever": {"Healthy": 0.4, "Fever": 0.59, "E": 0.01},
}

emission_probability = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
</syntaxhighlight>

Line 395 ⟶ 396:
<syntaxhighlight lang="python">
def example():
    return fwd_bkw(
        observations,
        states,
        start_probability,
        transition_probability,
        emission_probability,
        end_state,
    )
</syntaxhighlight>
<syntaxhighlight lang="pycon">

Line 418 ⟶ 421:
== References==
{{reflist}}
* [[Lawrence Rabiner|Lawrence R. Rabiner]], A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. ''Proceedings of the [[IEEE]]'', 77 (2), pp. 257–286, February 1989. [https://dx.doi.org/10.1109/5.18626 10.1109/5.18626]
* {{cite journal |author=Lawrence R. Rabiner, B. H. Juang |title=An introduction to hidden Markov models |journal=IEEE ASSP Magazine |date=January 1986 |pages=4–15}}
* {{cite book |author=Eugene Charniak |title=Statistical Language Learning |publisher=MIT Press |location=Cambridge, Massachusetts |year=1993 |isbn=978-0-262-53141-2}}
* <cite id = RussellNorvig10>{{cite book |author=Stuart Russell and Peter Norvig |title=Artificial Intelligence A Modern Approach 3rd Edition |publisher=Pearson Education/Prentice-Hall |location=Upper Saddle River, New Jersey |year=2010 |isbn=978-0-13-604259-4}}</cite>

==External links ==
* [http://www.cs.jhu.edu/~jason/papers/#eisner-2002-tnlp An interactive spreadsheet for teaching the forward–backward algorithm] (spreadsheet and article with step-by-step walk-through)
* [https://www.cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html Tutorial of hidden Markov models including the forward–backward algorithm]
* [http://code.google.com/p/aima-java/ Collection of AI algorithms implemented in Java] (including HMM and the forward–backward algorithm)

{{DEFAULTSORT:Forward-backward algorithm}}
[[Category:Articles with example Python (programming language) code]]
[[Category:Dynamic programming]]
[[Category:Error detection and correction]]