Revision as of 17:51, 1 January 2019 edit 188.146.78.4 (talk) →Background ← Previous edit		Revision as of 22:02, 5 January 2019 edit undo Danilcha (talk \| contribs) 37 edits Fix expected comparison count example (1G -> 1.04M) Next edit →
Line 31: The most straightforward algorithm is to look for a character match at successive values of the index <code>m</code>, the position in the string being searched, i.e. <code>S[m]</code>. If the index <code>m</code> reaches the end of the string then there is no match, in which case the search is said to "fail". At each position <code>m</code> the algorithm first checks for equality of the first character in the word being searched, i.e. <code>S[m] =? W[0]</code>. If a match is found, the algorithm tests the other characters in the word being searched by checking successive values of the word position index, <code>i</code>. The algorithm retrieves the character <code>W[i]</code> in the word being searched and checks for equality of the expression <code>S[m+i] =? W[i]</code>. If all successive characters match in <code>W</code> at position <code>m</code>, then a match is found at that position in the search string. Usually, the trial check will quickly reject the trial match. If the strings are uniformly distributed random letters, then the chance that characters match is 1 in 26. In most cases, the trial check will reject the match at the initial letter. The chance that the first two letters will match is 1 in 26<sup>2</sup> (1 in 676). So if the characters are random, then the expected complexity of searching string <code>S[]</code> of length ''k'' is on the order of ''k'' comparisons or ''O''(''k''). The expected performance is very good. If <code>S[]</code> is 1 million characters and <code>W[]</code> is 1000 characters, then the string search should complete after about ~~one~~1.04 ~~billion~~million character comparisons. That expected performance is not guaranteed. If the strings are not random, then checking a trial <code>m</code> may take many character comparisons. The worst case is if the two strings match in all but the last letter. Imagine that the string <code>S[]</code> consists of 1 billion characters that are all ''A'', and that the word <code>W[]</code> is 999 ''A'' characters terminating in a final ''B'' character. The simple string-matching algorithm will now examine 1000 characters at each trial position before rejecting the match and advancing the trial position. The simple string search example would now take about 1000 character comparisons times 1 billion positions for 1 trillion character comparisons. If the length of <code>W[]</code> is ''n'', then the worst-case performance is ''O''(''k''⋅''n'').

Knuth–Morris–Pratt algorithm: Difference between revisions