Revision as of 04:04, 24 May 2014 edit 106.220.26.43 (talk) →Description of pseudocode for the search algorithm Tag: section blanking ← Previous edit		Revision as of 04:07, 24 May 2014 edit undo 106.220.26.43 (talk) →Efficiency of the search algorithm Tag: section blanking Next edit →
Line 67: </code> This time we are able to complete the match, whose first character is <code>S[15]</code>. ~~===Efficiency of the search algorithm===~~ Assuming the prior existence of the table <code>T</code>, the search portion of the Knuth–Morris–Pratt algorithm has [[Computational complexity theory\|complexity]] [[Linear_time#Linear_time\|O(n)]], where <code>n</code> is the length of <code>S</code> and the <code>O</code> is [[big-O notation]]. Except for the fixed overhead incurred in entering and exiting the function, all the computations are performed in the <code>'''while'''</code> loop. To bound the number of iterations of this loop; observe that <code>T</code> is constructed so that if a match which had begun at <code>S[m]</code> fails while comparing <code>S[m + i]</code> to <code>W[i]</code>, then the next possible match must begin at <code>S[m + (i - T[i])]</code>. In particular the next possible match must occur at a higher index than <code>m</code>, so that <code>T[i] < i</code>. This fact implies that the loop can execute at most <code>2n</code> times. For, in each iteration, it executes one of the two branches in the loop. The first branch invariably increases <code>i</code> and does not change <code>m</code>, so that the index <code>m + i</code> of the currently scrutinized character of <code>S</code> is increased. The second branch adds <code>i - T[i]</code> to <code>m</code>, and as we have seen, this is always a positive number. Thus the ___location <code>m</code> of the beginning of the current potential match is increased. Now, the loop ends if <code>m + i = n</code>; therefore each branch of the loop can be reached at most <code>k</code> times, since they respectively increase either <code>m + i</code> or <code>m</code>, and <code>m ≤ m + i</code>: if <code>m = n</code>, then certainly <code>m + i ≥ n</code>, so that since it increases by unit increments at most, we must have had <code>m + i = n</code> at some point in the past, and therefore either way we would be done. ~~Thus the loop executes at most <code>2n</code> times, showing that the time complexity of the search algorithm is <code>O(n)</code>.~~ ~~Here is another way to think about the runtime:~~ ~~Let us say we begin to match W and S at position i and p, if W exists as a substring of S at p, then W[0 through m] == S[p through p+m].~~ ~~Upon success, that is, the word and the text matched at the positions(W[i] == S[p+i]), we increase i by 1 (i++).~~ Upon failure, that is, the word and the text does not match at the positions(W[i] != S[p+i]), the text pointer is kept still, while the word pointer roll-back a certain amount(i = T[i], where T is the jump table) And we attempt to match <code>W[T[i]]</code> with <code>S[p+i]</code>. ~~The maximum number of roll-back of i is bounded by i, that is to say, for any failure, we can only roll-back as much as we have progressed up to the failure.~~ ~~Then it is clear the runtime is 2n.~~ =="Partial match" table (also known as "failure function")==

Knuth–Morris–Pratt algorithm: Difference between revisions