Knuth–Morris–Pratt algorithm: Difference between revisions

Content deleted Content added
3 edits removed
Fix Linter errors
Line 49:
In each step the algorithm compares <code>S[m+i]</code> with <code>W[i]</code> and increments <code>i</code> if they are equal. This is depicted, at the start of the run, like
 
<spandiv style="font-family:monospace;">
1 2
m: 01234567890123456789012
Line 55:
W: {{color|blue|ABC}}{{color|red|D}}{{color|gray|ABD}}
i: {{color|blue|012}}{{color|red|3}}{{color|gray|456}}
</spandiv>
 
The algorithm compares successive characters of <code>W</code> to "parallel" characters of <code>S</code>, moving from one to the next by incrementing <code>i</code> if they match. However, in the fourth step <code>S[3]&nbsp;=&nbsp;'&nbsp;'</code> does not match <code>W[3] = 'D'</code>. Rather than beginning to search again at <code>S[1]</code>, we note that no <code>'A'</code> occurs between positions 1 and 2 in <code>S</code>; hence, having checked all those characters previously (and knowing they matched the corresponding characters in <code>W</code>), there is no chance of finding the beginning of a match. Therefore, the algorithm sets <code>m = 3</code> and <code>i = 0</code>.
 
<spandiv style="font-family:monospace;">
1 2
m: 01234567890123456789012
Line 65:
W: {{color|red|A}}{{color|gray|BCDABD}}
i: {{color|red|0}}{{color|gray|123456}}
</spandiv>
 
This match fails at the initial character, so the algorithm sets <code>m = 4</code> and <code>i = 0</code>
 
<spandiv style="font-family:monospace;">
1 2
m: 01234567890123456789012
Line 75:
W: {{color|blue|ABCDAB}}{{color|red|D}}
i: {{color|blue|012345}}{{color|red|6}}
</spandiv>
 
Here, <code>i</code> increments through a nearly complete match <code>"ABCDAB"</code> until <code>i = 6</code> giving a mismatch at <code>W[6]</code> and <code>S[10]</code>. However, just prior to the end of the current partial match, there was that substring <code>"AB"</code> that could be the beginning of a new match, so the algorithm must take this into consideration. As these characters match the two characters prior to the current position, those characters need not be checked again; the algorithm sets <code>m = 8</code> (the start of the initial prefix) and <code>i = 2</code> (signaling the first two characters match) and continues matching. Thus the algorithm not only omits previously matched characters of <code>S</code> (the <code>"AB"</code>), but also previously matched characters of <code>W</code> (the prefix <code>"AB"</code>).
 
<spandiv style="font-family:monospace;">
1 2
m: 01234567890123456789012
Line 85:
W: {{color|gray|AB}}{{color|red|C}}{{color|gray|DABD}}
i: {{color|gray|01}}{{color|red|2}}{{color|gray|3456}}
</spandiv>
 
This search at the new position fails immediately because <code>W[2]</code> (a <code>'C'</code>) does not match <code>S[10]</code> (a <code>'&nbsp;'</code>). As in the first trial, the mismatch causes the algorithm to return to the beginning of <code>W</code> and begins searching at the mismatched character position of <code>S</code>: <code>m = 10</code>, reset <code>i = 0</code>.
 
<spandiv style="font-family:monospace;">
1 2
m: 01234567890123456789012
Line 95:
W: {{color|red|A}}{{color|gray|BCDABD}}
i: {{color|red|0}}{{color|gray|123456}}
</spandiv>
 
The match at <code>m=10</code> fails immediately, so the algorithm next tries <code>m = 11</code> and <code>i = 0</code>.
 
<spandiv style="font-family:monospace;">
1 2
m: 01234567890123456789012
Line 105:
W: {{color|blue|ABCDAB}}{{color|red|D}}
i: {{color|blue|012345}}{{color|red|6}}
</spandiv>
 
Once again, the algorithm matches <code>"ABCDAB"</code>, but the next character, <code>'C'</code>, does not match the final character <code>'D'</code> of the word <code>W</code>. Reasoning as before, the algorithm sets <code>m = 15</code>, to start at the two-character string <code>"AB"</code> leading up to the current position, set <code>i = 2</code>, and continue matching from the current position.
 
<spandiv style="font-family:monospace;">
1 2
m: 01234567890123456789012
Line 115:
W: {{color|blue|ABCDABD}}
i: {{color|blue|0123456}}
</spandiv>
 
This time the match is complete, and the first character of the match is <code>S[15]</code>.