Knuth–Morris–Pratt algorithm: Difference between revisions

Content deleted Content added
Remove grammatically incorrect sentence that didn't add valuable information (and wasn't sourced) anyway
Line 182:
We pass to the subsequent <code>W[4]</code>, <code>'A'</code>. The same logic shows that the longest substring we need consider has length 1, and as in the previous case it fails since "D" is not a prefix of <code>W</code>. But instead of setting <code>T[4] = 0</code>, we can do better by noting that <code>W[4] = W[0]</code>, and also that a look-up of <code>T[4]</code> implies the corresponding <code>S</code> character, <code>S[m+4]</code>, was a mismatch and therefore <code>S[m+4] ≠ 'A'</code>. Thus there is no point in restarting the search at <code>S[m+4]</code>; we should begin 1 ahead. This means that we may shift pattern <code>W</code> by match length plus one character.
 
Considering now the next character, <code>W[5]</code>, which is <code>'B'</code>: though by inspection the longest substring would appear to be <code>'A'</code>, we still set <code>WT[5] = 0</code>. The reasoning is similar to why <code>WT[4] = -1</code>. <code>W[5]</code> itself extends the prefix match begun with <code>W[4]</code>, and we can assume that the corresponding character in <code>S</code>, <code>S[m+5] ≠ 'B'</code>. So backtracking before <code>W[5]</code> is pointless, but <code>S[m+5]</code> may be <code>'A'</code>, hence <code>T[5] = 0</code>.
 
Finally, we see that the next character in the ongoing segment starting at <code>W[4] = 'A'</code> would be <code>'B'</code>, and indeed this is also <code>W[5]</code>. Furthermore, the same argument as above shows that we need not look before <code>W[4]</code> to find a segment for <code>W[6]</code>, so that this is it, and we take <code>T[6] = 2</code>.