Talk:Forward–backward algorithm
WikiProject Robotics (Start‑class, Mid‑importance) · WikiProject Computing (Start‑class)
--- The description is very hard to follow. It is incomplete, being based upon and making reference to a book that is not available online. The one-dimensional vectors are written incorrectly in the matrix formulas (horizontal instead of vertical). IMO, this article needs a fundamental re-writing; it is extremely frustrating to read as it is.
—Preceding unsigned comment added by 82.181.190.197 (talk) 17:38, 22 October 2008 (UTC)
---
—Preceding unsigned comment added by Mihai preda (talk • contribs) 16:22, 15 July 2008 (UTC)
--- I am not comfortable making the edits, but I believe there are major errors here.
With respect to the backward messages b, it is stated:
Finally note that the description in Russell & Norvig 2003, p. 550 excludes the point product, though the procedure is required earlier.
I don't believe this is correct. It should be as in the book:
b_{k+1:t} = T O_{k+1} b_{k+2:t}
no point product with f, and no normalization. This is only the backward message; it implements the update shown on p. 545, eq. (15.7).
You multiply the b with the corresponding f in each time slice to obtain the smoothed vectors sv, as shown on p. 544, eq. (15.6). So as written, there are two multiplications by f taking place! Eq. (15.7) shows how the RHS of (15.6) breaks down, NOT the LHS. (BB)
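For reference, here are the two relations under discussion written side by side in the book's notation (T is the transition matrix, O_k the diagonal observation matrix for the evidence at step k; this is my own summary, not a quote from the book):
sv_k ∝ f_{1:k} × b_{k+1:t}          (15.6: pointwise product of the forward and backward messages, then normalize)
b_{k+1:t} = T O_{k+1} b_{k+2:t}     (15.7: backward recursion only; no f, no normalization)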
---
I did the modification (but I did not recompute the final result in the example). Indeed, this was not correct. Mayerwin (from Wikipedia FR).
---
Thanks for the sanity check Mayerwin.
If someone cares to do the formatting, this is what my implementation yields for this scenario:
i: 1 fv: {s:Rain=0.8181818181818181, s:No Rain=0.18181818181818182}
i: 2 fv: {s:Rain=0.883357041251778, s:No Rain=0.1166429587482219}
i: 3 fv: {s:Rain=0.19066793972352525, s:No Rain=0.8093320602764746}
i: 4 fv: {s:Rain=0.7307940045849821, s:No Rain=0.2692059954150179}
i: 5 fv: {s:Rain=0.8673388895754848, s:No Rain=0.13266111042451526}
i: 5 B: {s:Rain=1.0, s:No Rain=1.0}
i: 4 B: {s:Rain=0.69, s:No Rain=0.41000000000000003}
i: 3 B: {s:Rain=0.4593, s:No Rain=0.2437}
i: 2 B: {s:Rain=0.090639, s:No Rain=0.15025100000000002}
i: 1 B: {s:Rain=0.06611763, s:No Rain=0.04550767}
i: 5 sv: {s:Rain=0.8673388895754848, s:No Rain=0.13266111042451526}
i: 4 sv: {s:Rain=0.8204190536236754, s:No Rain=0.17958094637632463}
i: 3 sv: {s:Rain=0.3074835760066177, s:No Rain=0.6925164239933823}
i: 2 sv: {s:Rain=0.8204190536236753, s:No Rain=0.17958094637632466}
i: 1 sv: {s:Rain=0.8673388895754848, s:No Rain=0.13266111042451528}
The interpretation of these smoothed vectors depends on the use case, but for this example one conclusion we could draw is that the best explanation of {umbrella, umbrella, no umbrella, umbrella, umbrella} seems to be {Rain, Rain, No Rain, Rain, Rain} (the Viterbi algorithm would output this explicitly). (BB)
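For anyone who wants to reproduce these numbers, here is a minimal Python sketch, assuming the standard umbrella-world parameters from Russell & Norvig (P(Rain_t | Rain_{t-1}) = 0.7, P(umbrella | Rain) = 0.9, P(umbrella | No Rain) = 0.2, uniform prior) and the evidence sequence {umbrella, umbrella, no umbrella, umbrella, umbrella}; the variable names are mine, not from the article:

# Forward-backward on the two-state umbrella world (states 0 = Rain, 1 = No Rain).
trans = [[0.7, 0.3],        # row i: P(next state | current state i)
         [0.3, 0.7]]
emit  = [[0.9, 0.1],        # row i: P(umbrella), P(no umbrella) given state i
         [0.2, 0.8]]
evidence = [0, 0, 1, 0, 0]  # 0 = umbrella observed, 1 = no umbrella
prior = [0.5, 0.5]

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

# Forward pass: fv[k] is the filtered distribution given evidence up to time k+1.
fv, f = [], prior
for e in evidence:
    predicted = [sum(trans[j][i] * f[j] for j in range(2)) for i in range(2)]
    f = normalize([emit[i][e] * predicted[i] for i in range(2)])
    fv.append(f)

# Backward pass: bv[k] is the (unnormalized) likelihood of the evidence after time k+1.
bv, b = [[1.0, 1.0]], [1.0, 1.0]
for e in reversed(evidence[1:]):
    b = [sum(trans[i][j] * emit[j][e] * b[j] for j in range(2)) for i in range(2)]
    bv.insert(0, b)

# Smoothed values: pointwise product of forward and backward messages, renormalized.
sv = [normalize([fv[k][i] * bv[k][i] for i in range(2)]) for k in range(len(evidence))]
for k in range(len(evidence)):
    print(k + 1, "fv:", fv[k], "B:", bv[k], "sv:", sv[k])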
---
The word 'umbrella' appears suddenly in the middle of the example.
What is this t=1 'umbrella'?
—Preceding unsigned comment added by 134.174.140.104 (talk) 14:58, 3 July 2008 (UTC)
---
I am afraid this isn't complete? The forward-backward approach exists to avoid the time complexity of the brute-force method, but the method really isn't elaborated here. --huggie
Also, the time complexity formula should omit any constant factors (i.e., the 2). The forward-backward algorithm itself just exploits the Markov property to reduce the time complexity to something like O(s^2 n), where s is the number of symbols in the alphabet and n is the length of the sequence. Some rough pseudo-code is below:
ForwardBackward(guessState, sequenceIndex):
    if sequenceIndex is past the end of the sequence, return 1
    if (guessState, sequenceIndex) has been seen before, return saved result
    result = 0
    for each neighboring state n:
        result = result + (transition probability from guessState to n given observation element at sequenceIndex) * ForwardBackward(n, sequenceIndex + 1)
    save result for (guessState, sequenceIndex)
    return result
The initial call can be done either by looping over all the initial probabilities, or by creating a dummy start state whose transition probabilities equal the initial probabilities and calling with that. Mskinn99 (talk) 21:04, 13 January 2008 (UTC)
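A minimal Python sketch of that memoized recursion, for illustration only (the example model, the handling of the initial call, and all names are mine, not from the article; the transition and emission terms are folded into one factor, as in the pseudo-code above):

from functools import lru_cache

# Illustrative two-state model (umbrella world); symbols 'u' = umbrella, 'n' = none.
trans = {"Rain": {"Rain": 0.7, "NoRain": 0.3},
         "NoRain": {"Rain": 0.3, "NoRain": 0.7}}
emit = {"Rain": {"u": 0.9, "n": 0.1},
        "NoRain": {"u": 0.2, "n": 0.8}}
sequence = ["u", "u", "n", "u", "u"]

@lru_cache(maxsize=None)    # plays the role of "save result for (guessState, sequenceIndex)"
def forward_backward(guess_state, sequence_index):
    # Likelihood of the remaining observations, starting from guess_state.
    if sequence_index >= len(sequence):     # past the end of the sequence
        return 1.0
    result = 0.0
    for n in trans[guess_state]:            # each neighboring state
        step = trans[guess_state][n] * emit[n][sequence[sequence_index]]
        result += step * forward_backward(n, sequence_index + 1)
    return result

# Initial call: loop over the prior (equivalently, a dummy start state whose
# transitions are the prior probabilities). Yields the total likelihood of the sequence.
prior = {"Rain": 0.5, "NoRain": 0.5}
print(sum(prior[s] * emit[s][sequence[0]] * forward_backward(s, 1) for s in prior))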
Viterbi algorithm has better pseudo-code, and a better description. The two algorithms are so similar (they can both be implemented in a single function) that it might be worth merging the articles. Mskinn99 (talk) 23:24, 13 January 2008 (UTC)
Page Cleanup
I extended the page with a brief overview and a semi-formal description of the algorithm, referenced some performance improvements, and included an example. I personally feel that the matrix-based description is easiest to follow. Additionally, I felt that it is most helpful to understand the workings of this algorithm through a numerical example, so I included a (rather extensive) example based on one presented in the widely used AI textbook of Russell and Norvig. I also referenced a repository of source code where Java code implementing the algorithm may be found. I am generally new to editing at Wikipedia but tried to follow the guidelines I found to be relevant. If the content contains errors or is otherwise unsuitable, I certainly welcome feedback, critique and any edits, including deletions, deemed necessary :) BJJV (talk) 11:08, 26 May 2008 (UTC)
I just finished cleaning up the page a bit to try to make it easier to follow. Hopefully I didn't make things worse. I am new to the subject but described the problem as it is addressed in the first reference (Rabiner, 1989). I also modified the notation so that the forward probabilities use zero-based subscripts instead of one-based ones. This provides a consistent form that describes the problem and allows us to calculate the state vector at time t=0. I also used the "hat" notation to distinguish between the actual probabilities of an event sequence and the normalized values that describe the state vector. I hope the notation is OK, and I worry about my terminology (is "state vector" appropriate? Despite my reading, I can't find what this is actually called). Also, my opinion is that the matrix example is very helpful. I found it useful for checking my results when I developed an HMM program and think it should be left in. Bodhi.root (talk) —Preceding undated comment added 19:36, 15 September 2009 (UTC).
- Sorry, I had to modify things once more. The explanation I gave before was not technically correct. The new version explains the underlying probabilities and how to interpret the forward, backward, and smoothed probabilities. It is a little harder to follow, but it is now technically correct. I also modified the example section to use the new notation. The matrix examples should probably be reworked so that they use the formulas given earlier (i.e., scaling by the normalization constants), but I'm done for now and don't want to recalculate all that. Bodhi.root (talk)
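For readers following the scaling discussion above: the usual convention (my notation here, not necessarily the article's) is to divide each forward vector by the total probability mass it carries,
\hat{\alpha}_t(i) = \alpha_t(i) / \sum_j \alpha_t(j),
so that the hatted values sum to one at each time step and can be read directly as the filtered state distribution.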
Explaining this algorithm with Matrices
It was a terrible idea to explain this algorithm in terms of matrices. Yes, it makes it so that you can elegantly describe the algorithm with a single formula, but it has many disadvantages, including:
- It makes the algorithm inaccessible to people that aren't comfortable with matrix math.
- It performs superfluous computations that would not be done in an efficient implementation.
- It hides context, and makes it easy to forget the meaning of columns and rows.
- The matrix math shown here is broken anyway. It shows row vectors where column vectors should be, and implies impossible calculations with matrices that aren't compatible for multiplication. When the smoothed values are computed, the symbol "x" is used, which suggests a cross product, but plain element-wise multiplication is what is actually being done.
If we *really must* use matrix math to explain this algorithm, it needs to be cleaned up a lot. Frankly, I think the traditional explanation that uses probabilities and intuition is much better suited for an encyclopedic article. I propose that we rip out the matrix math completely. Also, I agree very much with the comment made earlier that this algorithm is so similar to the Viterbi algorithm that they should be explained together, and in a consistent manner. —Preceding unsigned comment added by 128.187.80.2 (talk) 20:33, 3 February 2009 (UTC)
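For anyone attempting that cleanup, one internally consistent column-vector formulation (my own sketch in Russell & Norvig-style notation, with T the transition matrix and O_t the diagonal observation matrix for the evidence at time t; not a quote from the article) would be:
f_{1:t+1} ∝ O_{t+1} T^T f_{1:t}     (forward message, column vector; normalize after each step)
b_{k+1:t} = T O_{k+1} b_{k+2:t}     (backward message, column vector; no normalization needed)
sv_k ∝ f_{1:k} ∘ b_{k+1:t}          (element-wise product of the two, then normalize)
Here ∘ denotes the element-wise (Hadamard) product, which avoids the ambiguity of writing "x".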