Knuth–Morris–Pratt algorithm: Difference between revisions

Content deleted Content added
Danilcha (talk | contribs)
Fix expected comparison count example (1G -> 1.04M)
Danilcha (talk | contribs)
Normalize the second example
Line 33:
Usually, the trial check will quickly reject the trial match. If the strings are uniformly distributed random letters, then the chance that characters match is 1 in 26. In most cases, the trial check will reject the match at the initial letter. The chance that the first two letters will match is 1 in 26<sup>2</sup> (1 in 676). So if the characters are random, then the expected complexity of searching string <code>S[]</code> of length ''k'' is on the order of ''k'' comparisons or ''O''(''k''). The expected performance is very good. If <code>S[]</code> is 1 million characters and <code>W[]</code> is 1000 characters, then the string search should complete after about 1.04 million character comparisons.
 
That expected performance is not guaranteed. If the strings are not random, then checking a trial <code>m</code> may take many character comparisons. The worst case is if the two strings match in all but the last letter. Imagine that the string <code>S[]</code> consists of 1 billionmillion characters that are all ''A'', and that the word <code>W[]</code> is 999 ''A'' characters terminating in a final ''B'' character. The simple string-matching algorithm will now examine 1000 characters at each trial position before rejecting the match and advancing the trial position. The simple string search example would now take about 1000 character comparisons times 1 billionmillion positions for 1 trillionbillion character comparisons. If the length of <code>W[]</code> is ''n'', then the worst-case performance is ''O''(''k''&sdot;''n'').
 
The KMP algorithm has a better worst-case performance than the straightforward algorithm. KMP spends a little time precomputing a table (on the order of the size of <code>W[]</code>, ''O''(''n'')), and then it uses that table to do an efficient search of the string in ''O''(''k'').