Content deleted Content added
pi_0 for initial state probabilities |
m →External links: HTTP to HTTPS for Brown University |
||
(27 intermediate revisions by 17 users not shown) | |||
Line 1:
{{Short description|Inference algorithm for hidden Markov models}}
{{Inline|date=April 2018}}
The '''forward–backward algorithm''' is an [[Statistical_inference | inference]] [[algorithm]] for [[hidden Markov model]]s which computes the [[posterior probability|posterior]] [[marginal probability|marginals]] of all hidden state variables given a sequence of observations/emissions <math>o_{1:T}:= o_1,\dots,o_T</math>, i.e. it computes, for all hidden state variables <math>X_t \in \{X_1, \dots, X_T\}</math>, the distribution <math>P(X_t\ |\ o_{1:T})</math>. This inference task is usually called '''smoothing'''. The algorithm makes use of the principle of [[dynamic programming]] to efficiently compute the values that are required to obtain the posterior marginal distributions in two passes. The first pass goes forward in time while the second goes backward in time; hence the name ''forward–backward algorithm''.
The term ''forward–backward algorithm'' is also used to refer to any algorithm belonging to the general class of algorithms that operate on sequence models in a forward–backward manner. In this sense, the descriptions in the remainder of this article refer
==Overview ==
Line 23 ⟶ 24:
==Forward probabilities==
The following description will use matrices of probability values
We transform the probability distributions related to a given [[hidden Markov model]] into matrix notation as follows.
The transition probabilities <math>\mathbf{P}(X_t\mid X_{t-1})</math> of a given random variable <math>X_t</math> representing all possible states in the hidden Markov model will be represented by the matrix <math>\mathbf{T}</math> where the column index <math>
:<math>\mathbf{T} = \begin{pmatrix}
Line 34 ⟶ 35:
</math>
In a typical Markov model, we would multiply a state vector by this matrix to obtain the probabilities for the subsequent state. In a hidden Markov model the state is unknown, and we instead observe events associated with the possible states. An event matrix of the form:
:<math>\mathbf{B} = \begin{pmatrix}
Line 46 ⟶ 47:
:<math>\mathbf{P}(O = j)=\sum_{i} \pi_i B_{i,j}</math>
:<math>\mathbf{O_1} = \begin{pmatrix}
Line 60 ⟶ 61:
</math>
We can now make this general procedure specific to our series of observations. Assuming an initial state vector <math>\mathbf{\pi}_0</math>, (which can be optimized as a parameter through repetitions of the forward-
:<math>
\mathbf{f_{0:1}} = \mathbf{\pi}_0 \mathbf{T} \mathbf{O_{
</math>
Line 69 ⟶ 70:
:<math>
\mathbf{f_{0:t}} = \mathbf{f_{0:t-1}} \mathbf{T} \mathbf{O_{
</math>
This value is the forward unnormalized [[probability vector]]. The i'th entry of this vector provides:
:<math>
Line 81 ⟶ 82:
:<math>
\mathbf{\hat{f}_{0:t}} = c_t^{-1}\ \mathbf{\hat{f}_{0:t-1}} \mathbf{T} \mathbf{O_{
</math>
Line 114 ⟶ 115:
</math>
Notice that we are now using a [[Row and column vectors|column vector]] while the forward probabilities used row vectors. We can then work backwards using:
:<math>
Line 145 ⟶ 146:
To understand this, we note that <math>\mathbf{f_{0:t}}(i) \cdot \mathbf{b_{t:T}}(i)</math> provides the probability for observing the given events in a way that passes through state <math>x_i</math> at time t. This probability includes the forward probabilities covering all events up to time t as well as the backward probabilities which include all future events. This is the numerator we are looking for in our equation, and we divide by the total probability of the observation sequence to normalize this value and extract only the probability that <math>X_t=x_i</math>. These values are sometimes called the "smoothed values" as they combine the forward and backward probabilities to compute a final probability.
The values <math>\mathbf{\gamma_t}(i)</math> thus provide the probability of being in each state at time t. As such, they are useful for determining the most probable state at any time.
==Example ==
Line 190 ⟶ 191:
</math>
Notice that the [[transformation matrix]] is also transposed, but in our example the transpose is equal to the original matrix. Performing these calculations and normalizing the results provides:
:<math>
Line 227 ⟶ 228:
</math>
For the backward probabilities, we start with:
:<math>
Line 255 ⟶ 256:
</math>
Finally, we will compute the smoothed probability values. These
:<math>
Line 286 ⟶ 287:
==Performance ==
The
An enhancement to the general forward-backward algorithm, called the [[Island algorithm]], trades smaller memory usage for longer running time, taking <math> O(
In addition, algorithms have been developed to compute <math>\mathbf{f_{0:t+1}}</math> efficiently through online smoothing such as the fixed-lag smoothing (FLS) algorithm
==Pseudocode==
Backward(guessState, sequenceIndex):▼
if sequenceIndex is past the end of the sequence, return 1▼
if (guessState, sequenceIndex) has been seen before, return saved result▼
result = 0▼
for each neighboring state n:▼
result = result + (transition probability from guessState to ▼
n given observation element at sequenceIndex)▼
* Backward(n, sequenceIndex+1)▼
return result▼
'''algorithm''' forward_backward '''is'''
==Python example==▼
'''input:''' guessState
Given HMM (just like in [[Viterbi algorithm]]) represented in the [[Python programming language]]:▼
int ''sequenceIndex''
<source lang="python">▼
'''output:''' ''result''
states = ('Healthy', 'Fever')▼
end_state = 'E'▼
observations = ('normal', 'cold', 'dizzy')▼
'''return''' 1
'''return''' saved result
▲ ''result'' := 0
start_probability = {'Healthy': 0.6, 'Fever': 0.4}▼
▲ '''for each''' neighboring state n:
transition_probability = {▼
▲ ''result'' := result + (transition probability from ''guessState'' to
'Healthy' : {'Healthy': 0.69, 'Fever': 0.3, 'E': 0.01},▼
▲ n given observation element at ''sequenceIndex'')
'Fever' : {'Healthy': 0.4, 'Fever': 0.59, 'E': 0.01},▼
}▼
▲ '''return''' ''result''
▲==Python example==
▲Given HMM (just like in [[Viterbi algorithm]]) represented in the [[Python programming language]]:
▲transition_probability = {
}
emission_probability = {
</syntaxhighlight>
We can write the implementation of the forward-backward algorithm like this:
<
def fwd_bkw(observations, states, start_prob, trans_prob, emm_prob, end_st):
# Forward part of the algorithm
fwd = []
for i, observation_i in enumerate(observations):
f_curr = {}
Line 340 ⟶ 350:
prev_f_sum = start_prob[st]
else:
prev_f_sum = sum(f_prev[k] * trans_prob[k][st] for k in states)
f_curr[st] = emm_prob[st][observation_i] * prev_f_sum
Line 349 ⟶ 359:
p_fwd = sum(f_curr[k] * trans_prob[k][end_st] for k in states)
#
bkw = []
for i, observation_i_plus in enumerate(reversed(observations[1:] + (None,))):▼
▲ for i, observation_i_plus in enumerate(reversed(observations[1:]+(None,))):
b_curr = {}
for st in states:
Line 366 ⟶ 375:
p_bkw = sum(start_prob[l] * emm_prob[l][observations[0]] * b_curr[l] for l in states)
#
posterior = []
for i in range(len(observations)):
Line 373 ⟶ 382:
assert p_fwd == p_bkw
return fwd, bkw, posterior
</syntaxhighlight>
The function <code>fwd_bkw</code> takes the following arguments:
Line 386 ⟶ 394:
In the running example, the forward-backward algorithm is used as follows:
<
def example():
return fwd_bkw(
observations, </syntaxhighlight>
<syntaxhighlight lang="pycon">
>>> for line in example():
... print(*line)
Line 403 ⟶ 412:
{'Healthy': 0.0010418399999999998, 'Fever': 0.00109578} {'Healthy': 0.00249, 'Fever': 0.00394} {'Healthy': 0.01, 'Fever': 0.01}
{'Healthy': 0.8770110375573259, 'Fever': 0.1229889624426741} {'Healthy': 0.623228030950954, 'Fever': 0.3767719690490461} {'Healthy': 0.2109527048413057, 'Fever': 0.7890472951586943}
</syntaxhighlight>
==See also ==
* [[
* [[Viterbi algorithm]]
* [[BCJR algorithm]]
Line 412 ⟶ 421:
== References==
{{reflist}}
* [[Lawrence Rabiner|Lawrence R. Rabiner]], A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. ''Proceedings of the [[IEEE]]'', 77 (2), p. 257–286, February 1989. [https://dx.doi.org/10.1109/5.18626 10.1109/5.18626]
* {{cite journal |author=Lawrence R. Rabiner, B. H. Juang|title=An introduction to hidden Markov models|journal=IEEE ASSP Magazine |date=January 1986 |pages=4–15}}
* {{cite book | author = Eugene Charniak|title = Statistical Language Learning|publisher = MIT Press| ___location=Cambridge, Massachusetts|year = 1993|isbn=978-0-262-53141-2}}
* <cite id = RussellNorvig10>{{cite book | author = Stuart Russell and Peter Norvig|title = Artificial Intelligence A Modern Approach 3rd Edition|publisher = Pearson Education/Prentice-Hall|___location = Upper Saddle River, New Jersey|year = 2010|isbn=978-0-13-604259-4}}</cite>▼
▲*{{cite book | author = Stuart Russell and Peter Norvig|title = Artificial Intelligence A Modern Approach 3rd Edition|publisher = Pearson Education/Prentice-Hall|___location = Upper Saddle River, New Jersey|year = 2010|isbn=978-0-13-604259-4}}
==External links ==
* [http://www.cs.jhu.edu/~jason/papers/#eisner-2002-tnlp An interactive spreadsheet for teaching the forward–backward algorithm] (spreadsheet and article with step-by-step walk-through)
* [
* [http://code.google.com/p/aima-java/ Collection of AI algorithms implemented in Java] (including HMM and the forward–backward algorithm)
{{DEFAULTSORT:Forward-backward algorithm}}
[[Category:Articles with example Python (programming language) code]]
[[Category:Dynamic programming]]
[[Category:Error detection and correction]]
|