Forward–backward algorithm

{{Mergefrom|Forward algorithm|date=April 2010}}
 
In [[computer science]], the '''forward-backward algorithm''' is an [[inference]] [[algorithm]] for [[hidden Markov model]]s which computes the [[posterior probability|posterior]] [[marginal probability|marginals]] of all hidden state variables given a sequence of observations/emissions <math>o_{1:t}:= o_1,\dots,o_t</math>, i.e. it computes, for all hidden state variables <math>X_k \in \{X_1, \dots, X_t\}</math>, the distribution <math>P(X_k\ |\ o_{1:t})</math>. This inference task is usually called ''smoothing''. The algorithm makes use of the principle of [[dynamic programming]] to efficiently compute the values that are required to obtain the posterior marginal distributions in two passes. The first pass goes forward in time while the second goes backward in time; hence the name ''forward-backward algorithm''.
 
The term ''forward-backward algorithm'' is also used to refer to any algorithm belonging to the general class of algorithms that operate on sequence models in a forward-backward manner. In this sense, the descriptions in the remainder of this article refer only to one specific instance of this class.
 
== Overview ==
 
In the first pass, the forward-backward algorithm computes a set of forward probabilities which provide, for all <math>k \in \{1, \dots, t\}</math>, the probability of ending up in any particular state given the first <math>k</math> observations in the sequence, i.e. <math>P(X_k\ |\ o_{1:k})</math>. In the second pass, the algorithm computes a set of backward probabilities which provide the probability of observing the remaining observations given any starting point <math>k</math>, i.e. <math>P(o_{k+1:t}\ |\ X_k)</math>. These two sets of probability distributions can then be combined to obtain the distribution over states at any specific point in time given the entire observation sequence:
 
:<math>P(X_k\ |\ o_{1:t}) = P(X_k\ |\ o_{1:k}, o_{k+1:t}) \propto P(o_{k+1:t}\ |\ X_k) P(X_k\ |\ o_{1:k})</math>
 
The last step follows from an application of [[Bayes' rule]] and the [[conditional independence]] of <math>o_{k+1:t}</math> and <math>o_{1:k}</math> given <math>X_k</math>.
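Making the Bayes step explicit (an expanded intermediate step, not a new result), we condition on <math>o_{1:k}</math> throughout:

:<math>P(X_k\ |\ o_{1:k}, o_{k+1:t}) = \frac{P(o_{k+1:t}\ |\ X_k, o_{1:k})\ P(X_k\ |\ o_{1:k})}{P(o_{k+1:t}\ |\ o_{1:k})} = \frac{P(o_{k+1:t}\ |\ X_k)\ P(X_k\ |\ o_{1:k})}{P(o_{k+1:t}\ |\ o_{1:k})}</math>

where the second equality uses the conditional independence and the denominator does not depend on <math>X_k</math>, which gives the proportionality stated above.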
 
As outlined above, the algorithm involves three steps:
 
# computing forward probabilities
# computing backward probabilities
# computing smoothed values
 
The forward and backward steps are often called the "forward message pass" and "backward message pass". The wording originates from the way the algorithm processes the given observation sequence: it first moves forward, starting with the first observation in the sequence and going to the last, and then returns back to the first. At each observation in the sequence, the probabilities to be used for calculations at the next observation are computed. During the backward pass the algorithm simultaneously performs the smoothing step. This step allows the algorithm to take into account any past observations of output for computing more accurate results.
 
The forward-backward algorithm can be used to find the most likely state for any point in time. It cannot, however, be used to find the most likely sequence of states (see [[Viterbi algorithm]]).
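To make the two passes concrete, here is a minimal NumPy sketch of the unscaled algorithm. The names <code>A</code>, <code>B</code>, <code>pi</code> and the function itself are illustrative assumptions, not part of the article's formulation; the scaled variant discussed below is numerically preferable for long sequences.

<syntaxhighlight lang="python">
import numpy as np

def forward_backward(obs, A, B, pi):
    """Posterior marginals P(X_t | o_1, ..., o_T) for a discrete HMM.

    A  : (n, n) transition matrix, A[i, j] = P(next state j | state i)
    B  : (n, m) emission matrix,   B[i, o] = P(observation o | state i)
    pi : (n,)   initial state distribution
    obs: list of observation indices o_1, ..., o_T
    """
    T, n = len(obs), len(pi)
    fwd = np.zeros((T, n))               # forward probabilities
    bwd = np.zeros((T, n))               # backward probabilities

    # Forward pass: from the first observation to the last.
    fwd[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: from the last observation back to the first,
    # starting from a vector of ones.
    bwd[-1] = 1.0
    for t in range(T - 2, -1, -1):
        bwd[t] = A @ (B[:, obs[t + 1]] * bwd[t + 1])

    # Combine and normalize: gamma[t, i] = P(X_t = x_i | o_1, ..., o_T).
    gamma = fwd * bwd
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma
</syntaxhighlight>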
 
== Forward Probabilities ==
The forward probabilities are defined as:
 
:<math>
\mathbf{f_{0:t}}(i) = \mathbf{P}(o_1, o_2, \dots, o_t, X_t=x_i | \mathbf{\pi} )
</math>
 
The product of the scaling factors gives the total probability for observing the given sequence:
 
:<math>
\mathbf{P}(o_1, o_2, \dots, o_t|\mathbf{\pi}) = \prod_{s=1}^t c_s
</math>
 
This allows us to interpret the scaled probability vector as the conditional distribution over states given the observations so far:

:<math>
\mathbf{\hat{f}_{0:t}}(i) =
\frac{\mathbf{f_{0:t}}(i)}{\prod_{s=1}^t c_s} =
\frac{\mathbf{P}(o_1, o_2, \dots, o_t, X_t=x_i | \mathbf{\pi} )}{\mathbf{P}(o_1, o_2, \dots, o_t|\mathbf{\pi})} =
\mathbf{P}(X_t=x_i | o_1, o_2, \dots, o_t, \mathbf{\pi} )
</math>
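
As a sketch of this scaling scheme (same assumed <code>A</code>, <code>B</code>, <code>pi</code> as above; the helper name is hypothetical), normalizing at every step yields both the scaled forward vectors and the factors <math>c_s</math>:

<syntaxhighlight lang="python">
import numpy as np

def scaled_forward(obs, A, B, pi):
    """Forward pass with per-step scaling factors c_s.

    Returns the scaled vectors f_hat[t] = P(X_t | o_1, ..., o_t)
    and the factors c, whose product is P(o_1, ..., o_T).
    """
    T, n = len(obs), len(pi)
    f_hat = np.zeros((T, n))
    c = np.zeros(T)

    f = pi * B[:, obs[0]]
    c[0] = f.sum()                       # first scaling factor
    f_hat[0] = f / c[0]
    for t in range(1, T):
        f = (f_hat[t - 1] @ A) * B[:, obs[t]]
        c[t] = f.sum()                   # scaling factor for step t
        f_hat[t] = f / c[t]
    return f_hat, c                      # np.prod(c) = P(o_1, ..., o_T)
</syntaxhighlight>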
 
== Backward Probabilities ==

A similar procedure can be constructed to find backward probabilities:
 
:<math>
\mathbf{b_{t:T}}(i) = \mathbf{P}(o_{t+1}, o_{t+2}, \dots, o_{T} | X_t=x_i )
</math>
 
That is, we now assume that we start in a particular state (<math>X_t=x_i</math>) and ask for the probability of observing all future events from this state. Since the initial state is assumed given (i.e. its prior probability is 100%), we begin with:
 
:<math>
\mathbf{b_{T:T}} = [1\ 1\ \dots\ 1]^T
</math>
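
A corresponding sketch of the unscaled backward pass (again with the assumed <code>A</code> and <code>B</code> from above) starts from this vector of ones and recurses from the last observation back to the first:

<syntaxhighlight lang="python">
import numpy as np

def backward(obs, A, B):
    """Unscaled backward pass: row t holds P(o_{t+1}, ..., o_T | X_t = i)."""
    n, T = A.shape[0], len(obs)
    b = np.ones((T, n))                  # b_{T:T} = [1 1 ... 1]^T
    for t in range(T - 2, -1, -1):
        b[t] = A @ (B[:, obs[t + 1]] * b[t + 1])
    return b
</syntaxhighlight>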
:<math>
\mathbf{\gamma_t}(i) =
\mathbf{P}(X_t=x_i | o_1, o_2, \dots, o_T, \mathbf{\pi}) =
\frac{ \mathbf{P}(o_1, o_2, \dots, o_T, X_t=x_i | \mathbf{\pi} ) }{ \mathbf{P}(o_1, o_2, \dots, o_T | \mathbf{\pi} ) } =
\frac{ \mathbf{f_{0:t}}(i) \cdot \mathbf{b_{t:T}}(i) }{ \prod_{s=1}^T c_s } =
\mathbf{\hat{f}_{0:t}}(i) \cdot \mathbf{\hat{b}_{t:T}}(i)
</math>
To understand this, we note that <math>\mathbf{f_{0:t}}(i) \cdot \mathbf{b_{t:T}}(i)</math> provides the probability for observing the given events in a way that passes through state <math>x_i</math> at time t. This probability includes the forward probabilities covering all events up to time t as well as the backward probabilities which include all future events. This is the numerator we are looking for in our equation, and we divide by the total probability of the observation sequence to normalize this value and extract only the probability that <math>X_t=x_i</math>. These values are sometimes called the "smoothed values" as they combine the forward and backward probabilities to compute a final probability.
 
The values <math>\mathbf{\gamma_t}(i)</math> thus provide the probability of being in each state at time t. As such, they are useful for determining the most probable state at any time. The term "most probable state" is, however, somewhat ambiguous. While the most probable state is the most likely to be correct at a given point, the sequence of individually probable states is not likely to be the most probable sequence. This is because the probabilities for each point are calculated independently of each other. They do not take into account the transition probabilities between states, and it is thus possible to get states at two moments (t and t+1) that are both most probable at those time points but which have very little probability of occurring together, i.e. <math> \mathbf{P}(X_t=x_i,X_{t+1}=x_j) \neq \mathbf{P}(X_t=x_i) \mathbf{P}(X_{t+1}=x_j) </math>. The most probable sequence of states that produced an observation sequence can be found using the [[Viterbi algorithm]].
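To illustrate the distinction (toy numbers, purely hypothetical; <code>forward_backward</code> is the sketch from the overview above), taking the per-step argmax of the smoothed values gives the pointwise most probable states, which need not form the most probable sequence:

<syntaxhighlight lang="python">
import numpy as np

# Toy two-state HMM with illustrative (not article) parameters.
A  = np.array([[0.7, 0.3],
               [0.3, 0.7]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 0, 1, 0, 0]

gamma = forward_backward(obs, A, B, pi)  # smoothed values, one row per time step
pointwise_best = gamma.argmax(axis=1)    # most probable state at each t,
                                         # NOT the most probable state sequence
</syntaxhighlight>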
 
== Example ==