Forward–backward algorithm

{{Mergefrom|Forward algorithm|date=April 2010}}
 
In [[computer science]], the '''forward-backward algorithm''' is an [[inference]] [[algorithm]] for [[hidden Markov model]]s which computes the [[posterior probability|posterior]] [[marginal probability|marginals]] of all hidden state variables given a sequence of observations/emissions <math>o_{1:t}:= o_1,\dots,o_t</math>, i.e. it computes, for all hidden state variables <math>X_k \in \{X_1, \dots, X_t\}</math>, the distribution <math>P(X_k\ |\ o_{1:t})</math>. This inference task is usually called ''smoothing''. The algorithm makes use of the principle of [[dynamic programming]] to efficiently compute the values that are required to obtain the posterior marginal distributions in two passes. The first pass goes forward in time while the second goes backward in time; hence the name ''forward-backward algorithm''.
 
The term ''forward-backward algorithm'' is also used to refer to any algorithm belonging to the general class of algorithms that operate on sequence models in a forward-backward manner. In this sense, the descriptions in the remainder of this article refer only to one specific instance of this class.
 
== Overview ==
 
In the first pass, the forward-backward algorithm computes a set of forward probabilities which provide, for all <math>k \in \{1, \dots, t\}</math>, the probability of ending up in any particular state given the first <math>k</math> observations in the sequence, i.e. <math>P(X_k\ |\ o_{1:k})</math>. In the second pass, the algorithm computes a set of backward probabilities which provide the probability of observing the remaining observations given any starting point <math>k</math>, i.e. <math>P(o_{k+1:t}\ |\ X_k)</math>. These two sets of probability distributions can then be combined to obtain the distribution over states at any specific point in time given the entire observation sequence:
 
:<math>P(X_k\ |\ o_{1:t}) = P(X_k\ |\ o_{1:k}, o_{k+1:t}) \propto P(o_{k+1:t}\ |\ X_k) P(X_k\ |\ o_{1:k})</math>
 
The last step follows from an application of [[Bayes' rule]] and the [[conditional independence]] of <math>o_{k+1:t}</math> and <math>o_{1:k}</math> given <math>X_k</math>.
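Making the Bayes step explicit (an expanded intermediate step, not a new result), we condition on <math>o_{1:k}</math> throughout:

:<math>P(X_k\ |\ o_{1:k}, o_{k+1:t}) = \frac{P(o_{k+1:t}\ |\ X_k, o_{1:k})\ P(X_k\ |\ o_{1:k})}{P(o_{k+1:t}\ |\ o_{1:k})} = \frac{P(o_{k+1:t}\ |\ X_k)\ P(X_k\ |\ o_{1:k})}{P(o_{k+1:t}\ |\ o_{1:k})}</math>

where the second equality uses the conditional independence and the denominator does not depend on <math>X_k</math>, which gives the proportionality stated above.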
 
As outlined above, the algorithm involves three steps:
 
# computing forward probabilities
# computing backward probabilities
# computing smoothed values
 
The forward and backward steps are often called the "forward message pass" and "backward message pass". The wording originates from the way the algorithm processes the given observation sequence: it first moves forward, starting with the first observation in the sequence and going to the last, and then returns back to the first. At each observation in the sequence, the probabilities to be used for calculations at the next observation are computed. During the backward pass the algorithm simultaneously performs the smoothing step. This step allows the algorithm to take into account any past observations of output for computing more accurate results.
 
The forward-backward algorithm can be used to find the most likely state for any point in time. It cannot, however, be used to find the most likely sequence of states (see [[Viterbi algorithm]]).
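To make the two passes concrete, here is a minimal NumPy sketch of the unscaled algorithm. The names <code>A</code>, <code>B</code>, <code>pi</code> and the function itself are illustrative assumptions, not part of the article's formulation; the scaled variant discussed below is numerically preferable for long sequences.

<syntaxhighlight lang="python">
import numpy as np

def forward_backward(obs, A, B, pi):
    """Posterior marginals P(X_t | o_1, ..., o_T) for a discrete HMM.

    A  : (n, n) transition matrix, A[i, j] = P(next state j | state i)
    B  : (n, m) emission matrix,   B[i, o] = P(observation o | state i)
    pi : (n,)   initial state distribution
    obs: list of observation indices o_1, ..., o_T
    """
    T, n = len(obs), len(pi)
    fwd = np.zeros((T, n))               # forward probabilities
    bwd = np.zeros((T, n))               # backward probabilities

    # Forward pass: from the first observation to the last.
    fwd[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: from the last observation back to the first,
    # starting from a vector of ones.
    bwd[-1] = 1.0
    for t in range(T - 2, -1, -1):
        bwd[t] = A @ (B[:, obs[t + 1]] * bwd[t + 1])

    # Combine and normalize: gamma[t, i] = P(X_t = x_i | o_1, ..., o_T).
    gamma = fwd * bwd
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma
</syntaxhighlight>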
 
== Forward Probabilities ==
The forward probabilities are defined as:
 
:<math>
\mathbf{f_{0:t}}(i) = \mathbf{P}(o_1, o_2, \dots, o_t, X_t=x_i | \mathbf{\pi} )
</math>
 
The product of the scaling factors gives the total probability for observing the given sequence:
 
:<math>
\mathbf{P}(o_1, o_2, \dots, o_t|\mathbf{\pi}) = \prod_{s=1}^t c_s
</math>
 
This allows us to interpret the scaled probability vector as the conditional distribution over states given the observations so far:

:<math>
\mathbf{\hat{f}_{0:t}}(i) =
\frac{\mathbf{f_{0:t}}(i)}{\prod_{s=1}^t c_s} =
\frac{\mathbf{P}(o_1, o_2, \dots, o_t, X_t=x_i | \mathbf{\pi} )}{\mathbf{P}(o_1, o_2, \dots, o_t|\mathbf{\pi})} =
\mathbf{P}(X_t=x_i | o_1, o_2, \dots, o_t, \mathbf{\pi} )
</math>
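
As a sketch of this scaling scheme (same assumed <code>A</code>, <code>B</code>, <code>pi</code> as above; the helper name is hypothetical), normalizing at every step yields both the scaled forward vectors and the factors <math>c_s</math>:

<syntaxhighlight lang="python">
import numpy as np

def scaled_forward(obs, A, B, pi):
    """Forward pass with per-step scaling factors c_s.

    Returns the scaled vectors f_hat[t] = P(X_t | o_1, ..., o_t)
    and the factors c, whose product is P(o_1, ..., o_T).
    """
    T, n = len(obs), len(pi)
    f_hat = np.zeros((T, n))
    c = np.zeros(T)

    f = pi * B[:, obs[0]]
    c[0] = f.sum()                       # first scaling factor
    f_hat[0] = f / c[0]
    for t in range(1, T):
        f = (f_hat[t - 1] @ A) * B[:, obs[t]]
        c[t] = f.sum()                   # scaling factor for step t
        f_hat[t] = f / c[t]
    return f_hat, c                      # np.prod(c) = P(o_1, ..., o_T)
</syntaxhighlight>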
 
== Backward Probabilities ==

A similar procedure can be constructed to find backward probabilities:
 
:<math>
\mathbf{b_{t:T}}(i) = \mathbf{P}(o_{t+1}, o_{t+2}, \dots, o_{T} | X_t=x_i )
</math>
 
That is, we now assume that we start in a particular state (<math>X_t=x_i</math>) and ask for the probability of observing all future events from this state. Since the initial state is assumed given (i.e. its prior probability is 100%), we begin with:
 
:<math>
\mathbf{b_{T:T}} = [1\ 1\ \dots\ 1]^T
</math>
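
A corresponding sketch of the unscaled backward pass (again with the assumed <code>A</code> and <code>B</code> from above) starts from this vector of ones and recurses from the last observation back to the first:

<syntaxhighlight lang="python">
import numpy as np

def backward(obs, A, B):
    """Unscaled backward pass: row t holds P(o_{t+1}, ..., o_T | X_t = i)."""
    n, T = A.shape[0], len(obs)
    b = np.ones((T, n))                  # b_{T:T} = [1 1 ... 1]^T
    for t in range(T - 2, -1, -1):
        b[t] = A @ (B[:, obs[t + 1]] * b[t + 1])
    return b
</syntaxhighlight>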
:<math>
\mathbf{\gamma_t}(i) =
\mathbf{P}(X_t=x_i | o_1, o_2, \dots, o_T, \mathbf{\pi}) =
\frac{ \mathbf{P}(o_1, o_2, \dots, o_T, X_t=x_i | \mathbf{\pi} ) }{ \mathbf{P}(o_1, o_2, \dots, o_T | \mathbf{\pi} ) } =
\frac{ \mathbf{f_{0:t}}(i) \cdot \mathbf{b_{t:T}}(i) }{ \prod_{s=1}^T c_s } =
\mathbf{\hat{f}_{0:t}}(i) \cdot \mathbf{\hat{b}_{t:T}}(i)
</math>
To understand this, we note that <math>\mathbf{f_{0:t}}(i) \cdot \mathbf{b_{t:T}}(i)</math> provides the probability for observing the given events in a way that passes through state <math>x_i</math> at time t. This probability includes the forward probabilities covering all events up to time t as well as the backward probabilities which include all future events. This is the numerator we are looking for in our equation, and we divide by the total probability of the observation sequence to normalize this value and extract only the probability that <math>X_t=x_i</math>. These values are sometimes called the "smoothed values" as they combine the forward and backward probabilities to compute a final probability.
 
The values <math>\mathbf{\gamma_t}(i)</math> thus provide the probability of being in each state at time t. As such, they are useful for determining the most probable state at any time. The term "most probable state" is, however, somewhat ambiguous. While the most probable state is the most likely to be correct at a given point, the sequence of individually probable states is not likely to be the most probable sequence. This is because the probabilities for each point are calculated independently of each other. They do not take into account the transition probabilities between states, and it is thus possible to get states at two moments (t and t+1) that are both most probable at those time points but which have very little probability of occurring together, i.e. <math> \mathbf{P}(X_t=x_i,X_{t+1}=x_j) \neq \mathbf{P}(X_t=x_i) \mathbf{P}(X_{t+1}=x_j) </math>. The most probable sequence of states that produced an observation sequence can be found using the [[Viterbi algorithm]].
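To illustrate the distinction (toy numbers, purely hypothetical; <code>forward_backward</code> is the sketch from the overview above), taking the per-step argmax of the smoothed values gives the pointwise most probable states, which need not form the most probable sequence:

<syntaxhighlight lang="python">
import numpy as np

# Toy two-state HMM with illustrative (not article) parameters.
A  = np.array([[0.7, 0.3],
               [0.3, 0.7]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 0, 1, 0, 0]

gamma = forward_backward(obs, A, B, pi)  # smoothed values, one row per time step
pointwise_best = gamma.argmax(axis=1)    # most probable state at each t,
                                         # NOT the most probable state sequence
</syntaxhighlight>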
 
== Example ==