Rabin–Karp algorithm: Difference between revisions

Content deleted Content added
 
(2 intermediate revisions by the same user not shown)
Line 72:
 
We can then compute the hash of the next substring, "bra", from the hash of "abr" by subtracting the number added for the first 'a' of "abr", i.e. 97 &times; 256<sup>2</sup>, multiplying by the base and adding for the last a of "bra", i.e. 97 &times; 256<sup>0</sup>. Like so:
{{pre|style=font-size:95%|1=
 
// ''old hash (-ve avoider){{efn|name=ua}} old 'a' left base offset base shift new 'a''' prime modulus
hash("bra") = [ ( 4 + 101 - 97 * [(256%101)*256] % 101{{efn|name=mod101}} ) * 256{{efn|name=times256}} + 97 ] % 101 = 30
}}
 
If we are matching the search string "bra", using similar calculation of hash("abr"),
 
Line 108:
==Notes==
{{notelist|refs=
{{efn|name=mod|'{{char|%'}} is 'mod' or [[modulo]], or remainder after integer division, operator.}}
{{efn|name=ua|1=(-ve avoider) = "underflow avoider". Necessary if using unsigned integers for calculations. Because we know all hashes <math>h \leq p</math> for prime modulus ${{mvar|p$}}, we can ensure no underflow by adding {{mvar|p}} to the old hash before subtracting the value corresponding to the old 'a' (mod {{mvar|p}}).}}
{{efn|name=times256|the last '{{code|* 256'}} is the shift of the subtracted hash to the left.}}
{{efn|name=mod101|although {{code|((256%101)*256)%101}} is the same as 256<sup>2</sup> %mod 101, to avoid overflowing integer maximums when the pattern string is longer (e.g. 'Rabin-Karp' is 10 characters, 256<sup>9</sup> is the offset without modulation ), the pattern length base offset is pre-calculated in a loop, modulating the result each iteration.}}
}}