Rabin–Karp algorithm: Difference between revisions

Content deleted Content added
ISee (talk | contribs)
Edemaine (talk | contribs)
Rabin-Karp and multiple pattern search: improve wording, correct runtimes, remove "worst case" claim
Line 57:
Rabin-Karp is inferior for single pattern searching to [[Knuth-Morris-Pratt algorithm]], [[Boyer-Moore string searching algorithm]] and other faster single pattern [[string searching algorithm]]s because of its slow worst case behavior. However, Rabin-Karp is an algorithm of choice for multiple pattern search.
 
That is, if we want to find any of a large number, say ''k'', fixed length patterns in a text, we can create a simple variant of Rabin-Karp that uses a [[hash table]] or any other [[set data structure]] to check whether the hash of a given string belongs to a set of hash values of patterns we are looking for:
 
'''function''' RabinKarpSet(''string'' s[1..n], ''set'' of ''string'' subs, m) {
Line 74:
Here we assume all the substrings have a fixed length ''m'', but this assumption can be eliminated. We simply compare the current hash value against the hash values of all the substrings simultaneously using a quick lookup in our set data structure, and then verify any match we find against all substrings with that hash value.
 
Other algorithms can search for a single pattern in time order O(''n'') time, and hence they willcan be used to search for ''k'' patterns in time order O(''n*'' ''k'') time. TheIn contrast, the variant Rabin-Karp above willcan stillfind workall in''k'' timepatterns orderin O(''n''+''k'') intime the best and averagein caseexpectation, because a hash table allows to checkchecks whether or nota substring hash equals any of the pattern hashes in time order of O(1). We can also ensure O(''mn''log ''k'') time in the worst case, where ''m'' is the length of the longest of the ''k'' strings, by storing the hashes in a [[self-balancing binary search tree]] instead of a hash table.
 
==References==