In [[computer science]], the '''forward-backward algorithm''' is a [[dynamic programming]] [[algorithm]] for computing the [[probability]] of a particular observation sequence, given the parameters of the model, in the context of [[hidden Markov model]]s.
==Overview==
The algorithm comprises three steps:
# computing forward probabilities
# computing backward probabilities
# computing smoothed values
The forward and backward steps are often called the "forward message pass" and the "backward message pass". The wording originates from the way the algorithm processes the given observation sequence: it first moves forward, starting with the first observation in the sequence and going to the last, and then returns back to the first. At each observation in the sequence, probabilities to be used for the calculations at the next observation are computed. During the backward pass the algorithm simultaneously performs the smoothing step. This step allows the algorithm to take all observations, including later ones, into account when estimating the state at each step, which yields more accurate results.
The following description is based on matrices of probability values rather than on probability distributions. We transform the probability distributions related to a given [[hidden Markov model]] into matrix notation as follows.
The transition probabilities <math>\mathbf{P}(X_t\mid X_{t-1})</math> of a given random variable <math>X_t</math>, representing all possible states in the [[hidden Markov model]], are represented by the matrix <math>\mathbf{T}</math>, where the row index i represents the start state and the column index j represents the target state. In the example below, <math>\mathbf{T}</math> is defined as:
<math>
\mathbf{T} = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}
</math>
In a similar manner, the observation probabilities for the evidence seen at time t are kept in a diagonal matrix <math>\mathbf{O_t}</math>, whose i-th diagonal entry is the probability of that observation given that the hidden state is i. This allows an elegant description of how the next forward probability is computed. Let the set of forward probabilities be stored in yet another matrix <math>\mathbf{f_{1:t+1}}</math>, where 1:t+1 indicates that the computed probabilities depend on all observations from 1 to t+1, including the current forward probabilities, which we denote <math>\mathbf{f_{1:t}}</math>. Hence, <math>\mathbf{f_{1:t+1}}</math> is obtained by multiplying the transposed transition matrix with the current forward probabilities and then multiplying by the observation matrix for the next evidence observation. Finally, the resulting matrix is normalized, i.e. the values in the matrix are divided by the sum of all values in the matrix. The normalization factor is denoted <math>\alpha</math>. The resulting forward computation is therefore described by the following general formula:
<math>\mathbf{f_{1:t+1}} = \alpha\mathbf{O_{t+1}}\mathbf{T^T}\mathbf{f_{1:t}}</math>
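As an illustration, the forward update above can be written in a few lines of Python with NumPy. This is only a sketch; the function and variable names are chosen for this description and are not taken from a reference implementation:

<code><pre>
import numpy as np

def forward_step(f_prev, T, O_next):
    """One forward update: f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}.

    f_prev : current forward message f_{1:t} as a 1-D state vector
    T      : transition matrix (row = start state, column = target state)
    O_next : diagonal observation matrix for the next evidence value
    """
    unnormalized = O_next @ T.T @ f_prev
    return unnormalized / unnormalized.sum()   # dividing by the sum is the alpha normalization
</pre></code>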
The backward computation <math>\mathbf{b_{k+1:t}}</math>, which starts at the end of the sequence, can be described in a similar manner. Let the end of the sequence be denoted by the index k, with indices starting at 0. Running from k down to t=0, each backward probability is calculated by the following formula:
<math>\mathbf{b_{k+1:t}} = \alpha(\mathbf{T}\mathbf{O_{k+1}}\mathbf{b_{k+2:t}})</math>
Note that the non-transposed matrix <math>\mathbf{T}</math> is used here and that the order of the terms has changed. Also note that the backward message is later combined with the forward message not by an ordinary matrix multiplication but by a point product, an operation that multiplies each value in one matrix with the corresponding value in the other. This combination is the smoothing step described next; the description in [[#RusselNorvig03|Russell & Norvig 2003 pp. 550]] likewise keeps the point product out of the backward recursion and applies it only during smoothing.
The third and final step is the computation of the smoothed probabilities <math>\mathbf{sv_t}</math>. These are the point product of the backward and forward probabilities for each t. The formula is therefore defined as:
<math>\mathbf{sv_t} = \alpha\mathbf{b_{k+1:t}}\times\mathbf{f_{1:t}}</math>
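A matching sketch for the backward update and the smoothing step is given below, again with illustrative names. The backward message is kept separate from the forward message, and the point product is applied once, in the smoothing step:

<code><pre>
import numpy as np

def backward_step(b_next, T, O_next):
    """One backward update: b_{k+1:t} = alpha * T * O_{k+1} * b_{k+2:t}."""
    unnormalized = T @ O_next @ b_next
    return unnormalized / unnormalized.sum()

def smooth(f, b):
    """Smoothed estimate sv_t = alpha * (b point-product f)."""
    sv = b * f                      # element-wise (point) product, not a matrix product
    return sv / sv.sum()            # normalize so the entries sum to 1
</pre></code>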
The example in the following section shows numerically how each of these matrices is calculated.
==Example==
This example is based on the umbrella world in [[#RusselNorvig03|Russell & Norvig 2003 pp. 540]]. It uses a longer observation sequence, {umbrella, umbrella, no umbrella, umbrella, umbrella}, than the example described in the book. Assume that the probability of rain is 90% if an umbrella is observed and 20% if no umbrella is observed. Furthermore, assume that the probability of rain at t+1, given that it is raining now (i.e. at t), is 70%, and that the probability that the rain stops is 30%. The following matrices can be extracted from the umbrella world given the above observations:
<math>
\mathbf{T} = \mathbf{T^T} = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}
\quad
\mathbf{O_1} = \mathbf{O_2} = \mathbf{O_4} = \mathbf{O_5} = \begin{pmatrix} 0.9 & 0.0 \\ 0.0 & 0.2 \end{pmatrix}
\quad
\mathbf{O_3} = \begin{pmatrix} 0.1 & 0.0 \\ 0.0 & 0.8 \end{pmatrix}
</math>
Note that <math>\mathbf{O_3}</math> differs from the others because of the "no umbrella" observation. Also note that <math>\mathbf{T}</math> and <math>\mathbf{T^T}</math> are equal because the matrix is symmetric. Before we can start to compute forward messages, we need to describe two special values: the first forward probability and the k+2 backward probability. The first forward probability at t=0 is defined by the prior of the random variable. The k+2 backward probability is defined by the "true" matrix. It therefore follows:
<math>
\mathbf{f_{1:0}} = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}
\quad
\mathbf{b_{k+2}} = \begin{pmatrix} 1.0 \\ 1.0 \end{pmatrix}
</math>
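These matrices and starting values can also be written down directly in NumPy. Treating the "no umbrella" observation as the complementary probabilities (0.1 and 0.8) and the prior as uniform are assumptions that follow the book's conventions:

<code><pre>
import numpy as np

T = np.array([[0.7, 0.3],
              [0.3, 0.7]])              # transition matrix; symmetric, so T equals its transpose

O_umbrella    = np.diag([0.9, 0.2])     # observation matrix for "umbrella"
O_no_umbrella = np.diag([0.1, 0.8])     # observation matrix for "no umbrella"

# O_1 ... O_5 for the sequence {umbrella, umbrella, no umbrella, umbrella, umbrella}
O = [O_umbrella, O_umbrella, O_no_umbrella, O_umbrella, O_umbrella]

f_prior = np.array([0.5, 0.5])          # prior over rain / no rain (assumed uniform)
b_true  = np.array([1.0, 1.0])          # the "true" matrix that starts the backward pass
</pre></code>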
Now we first iterate through all t and compute the forward probabilities. The following matrices appear in the same order as described in the forward formula above. Some calculations may be less accurate due to the limited number of decimals used in this example.
<math>
</math>
Now that the forward probabilities are defined, we continue by computing the backward probabilities. Again, the matrices appear in the same order as in the backward formula above.
<math>
</math>
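Putting the pieces together, the following self-contained script runs the forward pass and then the backward pass with smoothing over the five observations, printing the normalized vectors for each step. It is a sketch of the computation described in this section, not code taken from the cited sources:

<code><pre>
import numpy as np

def normalize(v):
    return v / v.sum()

T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
O_u, O_n = np.diag([0.9, 0.2]), np.diag([0.1, 0.8])
observations = [O_u, O_u, O_n, O_u, O_u]     # umbrella, umbrella, no umbrella, umbrella, umbrella

# forward pass: f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}
f = np.array([0.5, 0.5])                     # assumed uniform prior
forward = []
for O_t in observations:
    f = normalize(O_t @ T.T @ f)
    forward.append(f)

# backward pass with smoothing: sv_t = alpha * (b_{k+1:t} point-product f_{1:t})
b = np.array([1.0, 1.0])
smoothed = [None] * len(observations)
for t in reversed(range(len(observations))):
    smoothed[t] = normalize(forward[t] * b)
    b = normalize(T @ observations[t] @ b)

for t, (f_t, sv_t) in enumerate(zip(forward, smoothed), start=1):
    print(t, np.round(f_t, 3), np.round(sv_t, 3))
</pre></code>

Normalizing the backward message at every step does not change the smoothed result, since each smoothed vector is renormalized anyway.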
==Performance==
The brute-force procedure for the solution of this problem is the generation of all possible sequences of observed events and hidden states with their probabilities using the transition and observation matrices. The joint probability of two sequences, given the model, is calculated by multiplying the corresponding probabilities. The brute-force procedure has [[time complexity]] <math> O(T \cdot N^T) </math>, where <math>T</math> is the length of the sequence and <math>N</math> is the number of symbols in the state alphabet. This is intractable for realistic problems, as the number of possible hidden node sequences is typically extremely high. In contrast, the forward-backward algorithm has time complexity <math> O(N^2 T)\, </math>.
Several enhancements to the general forward-backward algorithm are known which allow computations to take place in constant space. In addition, algorithms have been developed to compute <math>\mathbf{f_{1:t+1}}</math> efficiently as t grows through online smoothing, such as the fixed-lag smoothing (FLS) algorithm [[#RusselNorvig03|Russell & Norvig 2003 pp. 552]].
==Pseudocode==
<code><pre>
ForwardBackward(guessState, sequenceIndex):
</pre></code>
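A generic rendering of the procedure in Python is sketched below, taking a list of diagonal observation matrices, a transition matrix and a prior, and returning the smoothed state distributions. The function name and signature are illustrative and are not taken from the listing above or from the cited sources:

<code><pre>
import numpy as np

def forward_backward(observation_matrices, T, prior):
    """Return one smoothed state distribution per observation."""
    def normalize(v):
        return v / v.sum()

    # forward pass: f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}
    f = normalize(np.asarray(prior, dtype=float))
    forward = []
    for O_t in observation_matrices:
        f = normalize(O_t @ T.T @ f)
        forward.append(f)

    # backward pass and smoothing: sv_t = alpha * (b point-product f)
    b = np.ones_like(f)
    smoothed = [None] * len(observation_matrices)
    for t in reversed(range(len(observation_matrices))):
        smoothed[t] = normalize(forward[t] * b)
        b = normalize(T @ observation_matrices[t] @ b)

    return smoothed
</pre></code>

Called with the umbrella matrices from the example section, this returns the same smoothed vectors as the script shown there.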
==See also==
* [[Baum-Welch algorithm]]
* [[Hidden Markov model]]
* [[Viterbi algorithm]]
==References==
* [[Lawrence Rabiner|Lawrence R. Rabiner]], [http://www.caip.rutgers.edu/~lrr/Reprints/tutorial%20on%20hmm%20and%20applications.pdf A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition]. ''Proceedings of the [[IEEE]]'', 77 (2), p. 257–286, February 1989.
<cite id=RusselNorvig03>
*{{cite book | author = Russell, S.; Norvig, P. | title = Artificial Intelligence: A Modern Approach | edition = 2nd | publisher = Pearson Education | address = Upper Saddle River, New Jersey | year = 2003 | id = ISBN 0-13-080302-2}}
==External links==
* [http://www.cs.jhu.edu/~jason/papers/#tnlp02 An Interactive Spreadsheet for Teaching the Forward-Backward Algorithm] (spreadsheet and article with step-by-step walkthrough)
* [http://www.cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html Tutorial of Hidden Markov Models including the forward-backward algorithm]
* [http://code.google.com/p/aima-java/ Collection of AI algorithms implemented in Java] (including HMM and the forward-backward algorithm)