Talk:Rabin–Karp algorithm: Difference between revisions

Content deleted Content added
Rewrite: out of balance
Line 252:
:::Modular arithmetic is high school mathematics. You don't need to know Galois theory to understand it. —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 01:16, 28 September 2019 (UTC)
::::We shouldn't care too much about arithmetic, and more about the properties of the function. They used a modulus of 101 - if we're searching for a string of 3 characters out of a character set of 96 printable characters (but their base was actually 8-bit bytes), there are 96<sup>3</sup>/101 = ~10,000 strings that can map to each hash value. If we use their base, it's 2<sup>24</sup>/101 binary values = ~160,000 strings. What they used as a modulus close to the cardinality of the character set, is supposed to be the base, and the modulus is an irreducible polynomial over the field relatively prime to the pattern's polynomial. That polynomial is big and choosing it relies on some reduction properties of the field. They used numbers without understanding how operating in the field yields a unique representation of a string, i.e. it's fingerprint. The article is junk. [[User:Sbalfour|Sbalfour]] ([[User talk:Sbalfour|talk]]) 17:05, 28 September 2019 (UTC)
:::::Maybe, but edits like [[Special:Diff/918281340|this one]] of yours are no better. Putting polynomials and finite fields into the first sentence of the article is completely inappropriate, especially when the main point of Rabin–Karp has nothing to do with the specific rolling hash used (replace it with any other rolling hash and it would still be the Rabin–Karp algorithm). Also your replacement of asymptotic comparisons with vague and unsourced editorializations ("a marginal advantage") is again completely inappropriate. And your version makes roughly the first half of the article be general considerations about string search that have nothing to do with this specific algorithm; that's far out of balance. I have reverted to an older version before your disimprovements. Stop worrying about putting the details of the hash function into the lead and focus on the algorithm itself. It's not as complicated as you want to make it. Just use a rolling hash function to reduce the number of times you do a more careful exact match. That's what should be in the lead. The details are for later. —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 17:31, 28 September 2019 (UTC)