Hash function: Difference between revisions

Content deleted Content added
m ce
Line 212:
{{Main|Rolling hash}}
{{See also|Linear congruential generator}}
In some applications, such as [[string searching algorithm|substring search]], one can compute a hash function {{math|''h''}} for every {{math|''k''}}-character [[substring]] of a given {{math|''n''}}-character string by advancing a window of width {{math|''k''}} characters along the string, where {{math|''k''}} is a fixed integer, and {{Math|''n'' > ''k''}}. The straightforward solution, which is to extract such a substring at every character position in the text and compute {{math|''h''}} separately, requires a number of operations proportional to {{math|''k''·''n''}}. However, with the proper choice of {{math|''h''}}, one can use the technique of rolling hash to compute all those hashes with an effort proportional to {{math|''mk''&nbsp;+&nbsp;''n''}} where {{math|''m''}} is the number of occurrences of the substring.<ref>{{citationCite book needed|datelast=NovemberSingh 2022|first=N. B. |url=https://books.google.com/books?id=ALIMEQAAQBAJ&pg=PT102&dq=rolling+hash&hl=en&newbks=1&newbks_redir=0&sa=X&ved=2ahUKEwijuvSVy5qIAxUUGDQIHSU5Ma4Q6AF6BAgIEAI#v=onepage&q=rolling%20hash&f=false |title=A Handbook of Algorithms |publisher=N.B. Singh |language=en}}</ref>{{fix|text=what is the choice of h?}}
 
The most familiar algorithm of this type is [[Rabin-Karp]] with best and average case performance {{math|''O''(''n''+''mk'')}} and worst case {{math|''O''(''n''·''k'')}} (in all fairness, the worst case here is gravely pathological: both the text string and substring are composed of a repeated single character, such as {{math|''t''}}="AAAAAAAAAAA", and {{math|''s''}}="AAA"). The hash function used for the algorithm is usually the [[Rabin fingerprint]], designed to avoid collisions in 8-bit character strings, but other suitable hash functions are also used.