Revision as of 18:19, 6 February 2005, Ryan Reich: Add C code example to the table algorithm.
Revision as of 18:26, 6 February 2005, Ryan Reich: m A few typographical corrections.
Line 42:
This time we are able to complete the match, and return the position 17 as its origin.

===The ~~searching~~search algorithm===

The above example illustrates every element of the search algorithm. We assume the existence of a "partial match" table, described below, which indicates where we need to look for the start of a new match in the event that the current one ends in a mismatch. For the moment the table, <math>T</math>, should be taken as a [[black box]] with the property that if we have a match starting at <math>S[m]</math> that fails when comparing <math>S[m + i]</math> to <math>P[i]</math>, then the next possible match will start at <math>m + i - T[i - 1]</math>. In particular, <math>T[-1]</math> is defined and equals <math>-1</math>. Knowing this, the algorithm is very simple:

Line 129:
\|}

===The table-building algorithm===

The example above illustrates the general technique for assembling the table with a minimum of fuss. The principle is that of the overall search: most of the work was already done in getting to the current position, so very little needs to be done in leaving it. Here follows the algorithm; to eliminate special cases we will use the convention that <math>P[-1]</math> is defined and its value is unequal to any possible character in <math>P</math>.

Line 145:
The complexity of the table algorithm is <math>O(n)</math>, where <math>n</math> is the length of <math>P</math>. Since, apart from some initialization, all the work is done in step 3, it is sufficient to show that step 3 executes in <math>O(n)</math> time, which we do by simultaneously examining the quantities <math>i</math> and <math>i - j</math>. In the first branch, <math>i - j</math> is preserved, as both <math>i</math> and <math>j</math> are incremented simultaneously, but <math>i</math> itself increases. In the second branch, <math>j</math> is replaced by <math>T[j - 1]</math>, which we saw above is always strictly less than <math>j</math>, thus increasing <math>i - j</math>. In the third branch, <math>i</math> is incremented and <math>j</math> is not, so both <math>i</math> and <math>i - j</math> increase. Since <math>i \geq i - j</math>, this means that at each stage either <math>i</math> or a lower bound for <math>i</math> increases. The algorithm terminates once <math>i = n</math>, and <math>i - j</math> begins at <math>1</math> and can never exceed <math>i</math>, so neither quantity can increase more than <math>n</math> times; step 3 therefore executes at most <math>2n</math> times. Therefore the complexity of the table algorithm is <math>O(n)</math>.

===~~Code~~A code example ~~for~~of the table-building algorithm===

A [[C programming language\|C]] code example for the table-building algorithm is given here. As with the search algorithm, the indices of <math>T</math> have been shifted up by 1 to make the C code more natural. The additional variable <math>c</math> is employed to simulate the existence of <math>P[-1]</math>. It is also assumed that both this routine and the search routine are [[call]]ed as [[subroutine]]s of a [[wrapper]] which correctly [[Memory allocation\|allocates]] [[Computer storage\|memory]] for <math>T</math>.

Line 168:
 T[i + 1] = 0;
 ++i;
 j = 0;
 }
 c = T[j];

Line 175 ⟶ 176:
 }

==~~Efficiency~~The efficiency of the KMP algorithm==

Since the two portions of the algorithm have, respectively, complexities of <math>O(l)</math> and <math>O(n)</math>, the complexity of the overall algorithm is <math>O(n + l)</math>.
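
As a rough illustration of how the two portions fit together, the following is a minimal C sketch, not the code from the revision being compared. It assumes the shifted indexing described above, so that <code>table[k]</code> plays the role of <math>T[k - 1]</math> and <code>table[0]</code> holds the value <math>-1</math>. The names <code>kmp_build_table</code>, <code>kmp_search</code> and <code>table</code> are hypothetical, and an explicit test on <math>j</math> is used in place of the auxiliary variable <math>c</math>. The caller is assumed to allocate <math>n + 1</math> entries for the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill table[0..n] for a pattern p of length n.
   table[k] stands for T[k - 1] of the text, so table[0] = -1 and, for
   k >= 1, table[k] is the length of the longest proper prefix of
   p[0..k-1] that is also a suffix of it. */
static void kmp_build_table(const char *p, size_t n, long *table)
{
    size_t i = 1;  /* pattern position whose table entry is being computed */
    long j = 0;    /* length of the prefix currently being extended */

    assert(p != NULL && table != NULL);
    table[0] = -1;
    if (n == 0)
        return;
    table[1] = 0;
    while (i < n) {
        if (p[i] == p[j]) {        /* first branch: the prefix grows */
            table[i + 1] = j + 1;
            ++i;
            ++j;
        } else if (j > 0) {        /* second branch: fall back, j = T[j - 1] */
            j = table[j];
        } else {                   /* third branch: no prefix ends here */
            table[i + 1] = 0;
            ++i;
        }
    }
}

/* Hypothetical helper: return the index in s (length l) of the first
   occurrence of p (length n), or -1 if there is none. */
static long kmp_search(const char *s, size_t l, const char *p, size_t n,
                       const long *table)
{
    size_t m = 0;  /* start of the current candidate match in s */
    long i = 0;    /* number of pattern characters matched so far */

    assert(s != NULL && p != NULL && table != NULL);
    if (n == 0)
        return 0;
    while (m + (size_t)i < l) {
        if (s[m + (size_t)i] == p[i]) {
            ++i;
            if ((size_t)i == n)
                return (long)m;
        } else {
            /* Next candidate starts at m + i - T[i - 1]. */
            m += (size_t)(i - table[i]);
            i = table[i] > 0 ? table[i] : 0;
        }
    }
    return -1;
}
```

Under these assumptions, calling <code>kmp_build_table("ABCDABD", 7, table)</code> fills the table with -1, 0, 0, 0, 0, 1, 2, 0, and searching the string "ABC ABCDAB ABCDABCDABDE" for that pattern returns the zero-based index 15. The table is built once in <math>O(n)</math> steps and then reused by the search, which is where the <math>O(n + l)</math> total comes from.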

Knuth–Morris–Pratt algorithm: Difference between revisions