{{Short description|Inference algorithm for hidden Markov models}}
{{Inline|date=April 2018}}
 
The '''forward–backward algorithm''' is an [[statistical inference|inference]] [[algorithm]] for [[hidden Markov model]]s, which computes the [[posterior probability|posterior]] [[marginal probability|marginals]] of all hidden state variables given a sequence of observations/emissions <math>o_{1:T}:= o_1,\dots,o_T</math>, i.e. it computes, for all hidden state variables <math>X_t \in \{X_1, \dots, X_T\}</math>, the distribution <math>P(X_t\ |\ o_{1:T})</math>. This inference task is usually called '''smoothing'''. The algorithm makes use of the principle of [[dynamic programming]] to efficiently compute the values that are required to obtain the posterior marginal distributions in two passes. The first pass goes forward in time while the second goes backward in time; hence the name ''forward–backward algorithm''.
 
The term ''forward–backward algorithm'' is also used to refer to any algorithm belonging to the general class of algorithms that operate on sequence models in a forward–backward manner. In this sense, the descriptions in the remainder of this article refer only to one specific instance of this class.
 
==Overview==
 
==Forward probabilities==
The following description uses matrices of probability values rather than probability distributions, although in general the forward–backward algorithm can be applied to both continuous and discrete probability models.
 
We transform the probability distributions related to a given [[hidden Markov model]] into matrix notation as follows.
</math>
 
In a typical Markov model, we would multiply a state vector by this matrix to obtain the probabilities for the subsequent state. In a hidden Markov model the state is unknown, and we instead observe events associated with the possible states. An event matrix of the form:
 
:<math>\mathbf{B} = \begin{pmatrix}
:<math>\mathbf{P}(O = j)=\sum_{i} \pi_i B_{i,j}</math>
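As an illustration, this row-vector/matrix product can be sketched in Python with NumPy, using hypothetical values for <math>\mathbf{\pi}</math> and <math>\mathbf{B}</math> (the numbers below are assumptions for the sketch, not taken from this article):

```python
import numpy as np

# Hypothetical two-state model (illustrative values):
pi = np.array([0.5, 0.5])        # initial state distribution, a row vector
B = np.array([[0.9, 0.1],        # B[i, j] = P(event j | state i)
              [0.2, 0.8]])

# P(O = j) = sum_i pi_i * B[i, j], i.e. the row vector pi times the matrix B
p_obs = pi @ B
print(p_obs)
```

Each entry of `p_obs` is the marginal probability of observing event ''j'' under the prior state distribution.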
 
The probability of a given state leading to the observed event j can be represented in matrix form by multiplying the state row vector (<math>\mathbf{\pi}</math>) with an observation matrix (<math>\mathbf{O_j} = \mathrm{diag}(B_{*,o_j})</math>) containing only diagonal entries. Each diagonal entry is the probability of the observed event given a particular state. Continuing the above example, the observation matrix for event 1 would be:
 
:<math>\mathbf{O_1} = \begin{pmatrix}
</math>
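Constructing such an observation matrix amounts to placing one column of <math>\mathbf{B}</math> on a diagonal. A minimal sketch, again with assumed example values for <math>\mathbf{B}</math>:

```python
import numpy as np

# Hypothetical event matrix (illustrative values, not the article's numbers):
# B[i, j] = P(event j | state i)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def observation_matrix(B, j):
    """Return O_j = diag(B[:, j]): a diagonal matrix that weights each
    state by the likelihood of the observed event j."""
    return np.diag(B[:, j])

O_0 = observation_matrix(B, 0)  # observation matrix for event 0
```

Multiplying a state row vector by `O_0` rescales each state's probability by how likely that state is to emit event 0.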
 
We can now make this general procedure specific to our series of observations. Assuming an initial state vector <math>\mathbf{\pi}_0</math> (which can be optimized as a parameter through repetitions of the forward–backward procedure), we begin with <math>\mathbf{f_{0:0}} = \mathbf{\pi}_0</math>, then update the state distribution and weight by the likelihood of the first observation:
 
:<math>
\mathbf{f_{0:1}} = \mathbf{\pi}_0 \mathbf{T} \mathbf{O_{o_1}}
</math>
 
 
:<math>
\mathbf{f_{0:t}} = \mathbf{f_{0:t-1}} \mathbf{T} \mathbf{O_{o_t}}
</math>
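The unnormalized forward recursion above is a plain chain of matrix products. A sketch in Python, under assumed example values for <math>\mathbf{T}</math>, <math>\mathbf{B}</math> and <math>\mathbf{\pi}_0</math> (these parameters are illustrative, not the article's worked example):

```python
import numpy as np

# Illustrative model parameters (assumed for this sketch)
T = np.array([[0.7, 0.3],        # state transition matrix
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],        # B[i, j] = P(event j | state i)
              [0.2, 0.8]])
pi0 = np.array([0.5, 0.5])       # initial state distribution

def forward_unnormalized(obs):
    """Compute the unnormalized forward vectors f_{0:t} = f_{0:t-1} T O_{o_t}
    for an observation sequence given as a list of event indices."""
    f = pi0
    history = []
    for o in obs:
        f = f @ T @ np.diag(B[:, o])   # one forward step
        history.append(f)
    return history
```

The entries of each vector shrink geometrically with ''t'', which is why the scaled variant below is normally used in practice.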
 
This value is the forward unnormalized [[probability vector]]. The i'th entry of this vector provides:
 
:<math>
 
:<math>
\mathbf{\hat{f}_{0:t}} = c_t^{-1}\ \mathbf{\hat{f}_{0:t-1}} \mathbf{T} \mathbf{O_{o_t}}
</math>
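The scaled recursion keeps each forward vector summing to 1 and records the scaling constants <math>c_t</math> for reuse in the backward pass. A sketch under the same assumed (illustrative) parameters as before:

```python
import numpy as np

# Illustrative parameters (assumed for this sketch)
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi0 = np.array([0.5, 0.5])

def forward_scaled(obs):
    """Forward pass with scaling: each f-hat vector is normalized to sum
    to 1, and the scaling constants c_t are returned for later use."""
    f_hat = pi0
    scales, history = [], []
    for o in obs:
        f = f_hat @ T @ np.diag(B[:, o])
        c = f.sum()                # c_t, the normalizing constant
        f_hat = f / c
        scales.append(c)
        history.append(f_hat)
    return history, scales
```

The product of the <math>c_t</math>'s gives the total likelihood of the observation sequence, which would otherwise underflow for long sequences.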
 
</math>
 
Notice that we are now using a [[Row and column vectors|column vector]] while the forward probabilities used row vectors. We can then work backwards using:
 
:<math>
</math>
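The backward pass can be sketched the same way, now multiplying matrices onto a column vector from the right end of the sequence. The parameters below are assumptions for illustration; the transition matrix is chosen symmetric, so (as the text notes for its own example) the transpose equals the original matrix:

```python
import numpy as np

# Illustrative parameters (assumed; T is symmetric, so T == T.T)
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def backward(obs):
    """Backward pass on column vectors: starting from a vector of ones at
    the final step, apply b_t = T O_{o_{t+1}} b_{t+1} going backwards."""
    b = np.ones(T.shape[0])        # column vector of ones at time T
    history = [b]
    for o in reversed(obs[1:]):    # consume observations back to front
        b = T @ np.diag(B[:, o]) @ b
        history.insert(0, b)       # prepend so history is in time order
    return history
```

Each `history[t]` is proportional to the probability of the future observations given the state at time ''t''.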
 
Notice that the [[transformation matrix]] is also transposed, but in our example the transpose is equal to the original matrix. Performing these calculations and normalizing the results provides:
 
:<math>
</math>
 
For the backward probabilities, we start with:
 
:<math>
</math>
 
Finally, we compute the smoothed probability values. These results must also be scaled so that their entries sum to 1, because we did not scale the backward probabilities with the <math>c_t</math>'s found earlier. The backward probability vectors above thus actually represent the likelihood of each state at time t given the future observations. Because these vectors are proportional to the actual backward probabilities, the result has to be scaled an additional time.
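The smoothing step itself is just an elementwise product of the forward and backward vectors at each time step, rescaled to sum to 1. A minimal sketch with assumed placeholder values (in practice these vectors come from the two passes):

```python
import numpy as np

# Illustrative forward and backward vectors for one time step (assumed
# values standing in for the outputs of the two passes)
f_hat = np.array([0.8182, 0.1818])   # scaled forward probabilities at time t
b_t = np.array([0.31, 0.59])         # values proportional to the backward
                                     # probabilities at time t

# Smoothed posterior: elementwise product, then an extra normalization
gamma = f_hat * b_t
gamma = gamma / gamma.sum()
```

After the final rescaling, `gamma[i]` approximates <math>P(X_t = i \mid o_{1:T})</math>.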
 
:<math>
 
==Performance==
The forward–backward algorithm runs with [[time complexity]] <math> O(S^2 T) </math> in space <math> O(S T) </math>, where <math>T</math> is the length of the time sequence and <math>S</math> is the number of symbols in the state alphabet.<ref>[[#RussellNorvig10|Russell & Norvig 2010 pp. 579]]</ref> The algorithm can also run in constant space with time complexity <math> O(S^2 T^2) </math> by recomputing values at each step.<ref>[[#RussellNorvig10|Russell & Norvig 2010 pp. 575]]</ref> For comparison, a [[Brute-force search|brute-force]] procedure would generate all possible <math>S^T</math> state sequences and calculate the joint probability of each state sequence with the observed series of events, which would have time complexity <math> O(T \cdot S^T) </math>. Brute force is intractable for realistic problems, as the number of possible hidden node sequences is typically extremely high.
 
An enhancement to the general forward–backward algorithm, called the [[Island algorithm]], trades smaller memory usage for longer running time, taking <math> O(S^2 T \log T) </math> time and <math> O(S \log T) </math> memory. Furthermore, it is possible to invert the process model to obtain an <math>O(S)</math> space, <math>O(S^2 T)</math> time algorithm, although the inverted process may not exist or may be [[ill-conditioned]].<ref>{{cite journal |last1=Binder |first1=John |last2=Murphy |first2=Kevin |last3=Russell |first3=Stuart |title=Space-efficient inference in dynamic probabilistic networks |journal=Int'l, Joint Conf. On Artificial Intelligence |date=1997 |url=https://www.cs.ubc.ca/~murphyk/Papers/ijcai97.pdf |access-date=8 July 2020}}</ref>
 
In addition, algorithms have been developed to compute <math>\mathbf{f_{0:t+1}}</math> efficiently through online smoothing, such as the fixed-lag smoothing (FLS) algorithm.<ref>[[#RussellNorvig10|Russell & Norvig 2010 Figure 15.6 pp. 580]]</ref>
 
==Pseudocode==
Given an HMM (just like in the [[Viterbi algorithm]]) represented in the [[Python programming language]]:
<syntaxhighlight lang="python">
states = ("Healthy", "Fever")
end_state = "E"

observations = ("normal", "cold", "dizzy")

start_probability = {"Healthy": 0.6, "Fever": 0.4}

transition_probability = {
    "Healthy": {"Healthy": 0.69, "Fever": 0.3, "E": 0.01},
    "Fever": {"Healthy": 0.4, "Fever": 0.59, "E": 0.01},
}

emission_probability = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
</syntaxhighlight>
 
                prev_f_sum = start_prob[st]
            else:
                prev_f_sum = sum(f_prev[k] * trans_prob[k][st] for k in states)

            f_curr[st] = emm_prob[st][observation_i] * prev_f_sum
    # Backward part of the algorithm
    bkw = []
    for i, observation_i_plus in enumerate(reversed(observations[1:] + (None,))):
        b_curr = {}
        for st in states:
    assert p_fwd == p_bkw
    return fwd, bkw, posterior
</syntaxhighlight>
 
<syntaxhighlight lang="python">
def example():
    return fwd_bkw(
        observations,
        states,
        start_probability,
        transition_probability,
        emission_probability,
        end_state,
    )
</syntaxhighlight>
<syntaxhighlight lang="pycon">
==References==
{{reflist}}
* [[Lawrence Rabiner|Lawrence R. Rabiner]], A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. ''Proceedings of the [[IEEE]]'', 77 (2), pp.&nbsp;257–286, February 1989. [https://dx.doi.org/10.1109/5.18626 10.1109/5.18626]
* {{cite journal |author=Lawrence R. Rabiner, B. H. Juang|title=An introduction to hidden Markov models|journal=IEEE ASSP Magazine |date=January 1986 |pages=4–15}}
* {{cite book | author = Eugene Charniak|title = Statistical Language Learning|publisher = MIT Press| ___location=Cambridge, Massachusetts|year = 1993|isbn=978-0-262-53141-2}}
* <cite id=RussellNorvig10>{{cite book |author=Stuart Russell and Peter Norvig |title=Artificial Intelligence: A Modern Approach |edition=3rd |publisher=Pearson Education/Prentice-Hall |___location=Upper Saddle River, New Jersey |year=2010 |isbn=978-0-13-604259-4}}</cite>
 
==External links==
* [http://www.cs.jhu.edu/~jason/papers/#eisner-2002-tnlp An interactive spreadsheet for teaching the forward–backward algorithm] (spreadsheet and article with step-by-step walk-through)
* [https://www.cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html Tutorial of hidden Markov models including the forward–backward algorithm]
* [http://code.google.com/p/aima-java/ Collection of AI algorithms implemented in Java] (including HMM and the forward–backward algorithm)
 
{{DEFAULTSORT:Forward-backward algorithm}}
[[Category:Articles with example Python (programming language) code]]
[[Category:Dynamic programming]]
[[Category:Error detection and correction]]