Content deleted Content added
Lots more details and paragraphing and explanation and pseudocode |
More details |
||
Line 4:
== Shifting substrings search and competing algorithms ==
The basic problem the algorithm addresses is finding a
'''1''' '''function''' NaiveSearch(''string'' s[1..n], ''string'' sub[1..m])
'''2''' '''for''' i '''from''' 1 '''to''' n
'''3''' '''for''' j '''from''' 1 '''to''' m
'''4''' '''if''' s[i+j-1] ≠ sub[j]
'''5''' jump to next iteration of outer loop
'''6''' '''return''' i
'''7''' '''return''' not found
This algorithm works well in many practical cases, but fails spectacularly on certain examples, such as searching for a string of 10,000 "a"s in a string of 10 million "a"s, in which case it exhibits its worst-case [[Big-O notation|Ω]](''mn'') time.
[[Knuth-Morris-Pratt algorithm]] is an elegant elaboration on this naive algorithm that uses precomputed data to skip forward not by 1 character, but by as many as possible for the search to succeed. However, Rabin-Karp focuses instead on speeding up lines 3-6, as we'll discuss in the next section.
== Use of hashing for shifting substring search ==
However, there are two problems with this. First, because there are so many different strings, to keep the hash values small we have to assign some strings the same number. This means that if the hash values match, the strings might not match; we have to verify that they do, which can take a long time for long substrings. Luckily, a good hash function promises us that on most reasonable inputs, this won't happen too often, which keeps the average search time good.
Line 14 ⟶ 24:
{{wikicode}}
'''1''' '''function''' RabinKarp(''string'' s[1..n], ''string'' sub[1..m])
'''2''' hsub := hash(sub[1..m])
'''3''' hs := hash(s[1..m])
Line 22 ⟶ 32:
'''7''' '''return''' i
'''8''' hs := hash(s[i+1..i+m])
'''9'''
Lines 2, 3, and 6 each require [[Big-O notation|Ω]](m) time. However, lines 2 and 3 are only executed once, and line 6 is only executed if the hash values match, which is unlikely to happen more than a few times. Line 5 is executed ''n'' times, but only requires constant time. We now come to the second problem: line 8.
Line 58 ⟶ 68:
'''return''' i
hs := hash(s[i+1..i+m])
'''return''' not found
}
|