Knuth–Morris–Pratt algorithm: Difference between revisions

Content deleted Content added
Ryan Reich (talk | contribs)
Correct return value for the search algorithm
m added links to [Big-O notation]
Line 56:
===The efficiency of the search algorithm===
 
Assuming the prior existence of the table <math>T</math>, the search portion of the Knuth-Morris-Pratt algorithm has [[complexity]] <math>[[Big-O notation|O]](''l'')</math>, where <math>l</math> is the length of <math>S</math>. We compute this by counting the number of times each step can execute. The crucial stages are the two branches of step 2; the total number of times they execute is the number of times step 2 itself executes. The important fact about the second branch is that the value of <math>m</math> is guaranteed to increase by at least 1. Since we set <math>m = m + (i - e)</math>, <math>m</math> increases if and only if <math>e = T[i - 1] < i</math>. By definition, <math>T[i - 1]</math> is the length of a ''proper'' terminal substring of the <math>i</math> characters <math>P[0], \dots, P[i - 1]</math>; hence <math>T[i - 1]</math> is at most <math>i - 1 < i</math>. This argument holds when <math>i > 0</math>; when <math>i = 0</math> we have <math>T[i - 1] = -1 < 0 = i</math>, so the inequality again holds. Therefore <math>m</math> does indeed increase in the second branch, so this branch is executed at most <math>l</math> times, as the algorithm terminates when <math>m + i = l</math>.
 
The second branch of step 2 executes once each time a comparison is not a match; the first branch, likewise, executes once for each match. It is clear from the operations performed in the second branch that <math>m + i</math> is nondecreasing with successive iterations of step 2; therefore, since <math>i</math> actually increases each time the first branch executes, the index <math>m + i</math> of the character currently under scrutiny increases between successive iterations of the first branch. Hence a character which has once been matched is never subjected to comparison again. It follows that the first branch also executes at most once for each character of <math>S</math>, hence no more than <math>l</math> times. So step 2 executes at most <math>2l</math> times. As step 1 executes merely once, and step 3 executes each time step 2 does except when the algorithm exits, the number of steps performed by the algorithm in searching <math>S</math> is at most <math>4l</math>; hence the search algorithm exits in <math>O(l)</math> time.
Line 182:
==The efficiency of the KMP algorithm==
 
Since the two portions of the algorithm have, respectively, complexities of <math>O(l)</math> and <math>O(n)</math>, the complexity of the overall algorithm is [[Big-O notation|O]]<math>O(n + l)</math>.
 
==External links==