Knuth–Morris–Pratt algorithm: Difference between revisions

Content deleted Content added
Ryan Reich (talk | contribs)
Add C code example to the table algorithm.
Ryan Reich (talk | contribs)
m A few typographical corrections.
Line 42:
This time we are able to complete the match, and return the position 17 as its origin.
 
===The searchingsearch algorithm===
 
The above example is completely instructive in this regard. We assume the existence of a "partial match" table, described below, which indicates where we need to look for the start of a new match in the event that the current one ends in a mismatch. For the moment the table, <math>T</math>, should be taken as a [[black box]] with the property that if we have a match starting at <math>S[m]</math> that fails when comparing <math>S[m + i]</math> to <math>P[i]</math>, then the next possible match will start at <math>m + i - T[i - 1]</math>. In particular, <math>T[-1]</math> is defined and equals <math>-1</math>. Knowing this, the algorithm is very simple:
Line 129:
|}
 
===The table-building algorithm===
 
The example above illustrates the general technique for assembling the table with a minimum of fuss. The principle is that of the overall search: most of the work was already done in getting to the current position, so very little needs to be done in leaving it. Here follows the algorithm; to eliminate special cases we will use the convention that <math>P[-1]</math> is defined and its value is unequal to any possible character in <math>P</math>.
Line 145:
The complexity of the table algorithm is <math>O(n)</math>, where <math>n</math> is the length of <math>P</math>. As except for some initialization all the work is done in step 3, it is sufficient to show that step 3 executes in <math>O(n)</math> time, which will be done by simultaneously examining the quantities <math>i</math> and <math>i - j</math>. In the first branch, <math>i - j</math> is preserved, as both <math>i</math> and <math>j</math> are incremented simultaneously, but naturally, <math>i</math> is increased. In the second branch, <math>j</math> is replaced by <math>T[j - 1]</math>, which we saw above is always strictly less than <math>j</math>, thus increasing <math>i - j</math>. In the third branch, <math>i</math> is incremented and <math>j</math> is not, so both <math>i</math> and <math>i - j</math> increase. Since <math>i \geq i - j</math>, this means that at each stage either <math>i</math> or a lower bound for <math>i</math> increases; therefore since the algorithm terminates once <math>i = n</math>, it must terminate after at most <math>n</math> iterations of step 3, since <math>i - j</math> begins at <math>1</math>. Therefore the complexity of the table algorithm is <math>O(n)</math>.
 
===CodeA code example forof the table-building algorithm===
 
A [[C programming language|C]] code example for the table-building algorithm is given here. As for the search algorithm, the bounds of <math>T</math> have been incremented by 1 to make the C code more natural. The additional variable <math>c</math> is employed to simulate the existence of <math>P[-1]</math>. It is also assumed that both this routine and the search routine are [[call]]ed as [[subroutine]]s of a [[wrapper]] which correctly [[Memory allocation|allocates]] [[Computer storage|memory]] for <math>T</math>.
Line 168:
T[i + 1] = 0;
++i;
j = 0;
}
c = T[j];
Line 175 ⟶ 176:
}
 
==EfficiencyThe efficiency of the KNP algorithm==
 
Since the two portions of the algorithm have, respectively, complexities of <math>O(l)</math> and <math>O(n)</math>, the complexity of the overall algorithm is <math>O(n + l)</math>.