Content deleted Content added
→Rewrite: out of balance |
|||
Line 253:
::::We shouldn't care too much about arithmetic, and more about the properties of the function. They used a modulus of 101 - if we're searching for a string of 3 characters out of a character set of 96 printable characters (but their base was actually 8-bit bytes), there are 96<sup>3</sup>/101 = ~10,000 strings that can map to each hash value. If we use their base, it's 2<sup>24</sup>/101 binary values = ~160,000 strings. What they used as a modulus close to the cardinality of the character set, is supposed to be the base, and the modulus is an irreducible polynomial over the field relatively prime to the pattern's polynomial. That polynomial is big and choosing it relies on some reduction properties of the field. They used numbers without understanding how operating in the field yields a unique representation of a string, i.e. it's fingerprint. The article is junk. [[User:Sbalfour|Sbalfour]] ([[User talk:Sbalfour|talk]]) 17:05, 28 September 2019 (UTC)
:::::Maybe, but edits like [[Special:Diff/918281340|this one]] of yours are no better. Putting polynomials and finite fields into the first sentence of the article is completely inappropriate, especially when the main point of Rabin–Karp has nothing to do with the specific rolling hash used (replace it with any other rolling hash and it would still be the Rabin–Karp algorithm). Also your replacement of asymptotic comparisons with vague and unsourced editorializations ("a marginal advantage") is again completely inappropriate. And your version makes roughly the first half of the article be general considerations about string search that have nothing to do with this specific algorithm; that's far out of balance. I have reverted to an older version before your disimprovements. Stop worrying about putting the details of the hash function into the lead and focus on the algorithm itself. It's not as complicated as you want to make it. Just use a rolling hash function to reduce the number of times you do a more careful exact match. That's what should be in the lead. The details are for later. —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 17:31, 28 September 2019 (UTC)
::::::You're a distinguished Professor of Computer Science and an ACM fellow! (GULP) You're more qualified to write this article than me, and I wasn't very familiar with it when I started. Throw in something, and I'll do, too. Starting with the definition in the first sentence. Who says that it can find multiple patterns? The seminal paper listed below just says it finds the first occurrence of a pattern in text... i.e. not even multiple occurrences of one pattern.[[User:Sbalfour|Sbalfour]] ([[User talk:Sbalfour|talk]]) 18:24, 28 September 2019 (UTC)
|