Talk:Rabin–Karp algorithm: Difference between revisions

Content deleted Content added
Line 254:
:::::Maybe, but edits like [[Special:Diff/918281340|this one]] of yours are no better. Putting polynomials and finite fields into the first sentence of the article is completely inappropriate, especially when the main point of Rabin–Karp has nothing to do with the specific rolling hash used (replace it with any other rolling hash and it would still be the Rabin–Karp algorithm). Also your replacement of asymptotic comparisons with vague and unsourced editorializations ("a marginal advantage") is again completely inappropriate. And your version makes roughly the first half of the article be general considerations about string search that have nothing to do with this specific algorithm; that's far out of balance. I have reverted to an older version before your disimprovements. Stop worrying about putting the details of the hash function into the lead and focus on the algorithm itself. It's not as complicated as you want to make it. Just use a rolling hash function to reduce the number of times you do a more careful exact match. That's what should be in the lead. The details are for later. —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 17:31, 28 September 2019 (UTC)
::::::You're a distinguished Professor of Computer Science and an ACM fellow! (GULP) You're more qualified to write this article than me, and I wasn't very familiar with it when I started. Throw in something, and I'll do, too. Starting with the definition in the first sentence. Who says that it can find multiple patterns? The seminal paper listed below just says it finds the first occurrence of a pattern in text... i.e. not even multiple occurrences of one pattern.[[User:Sbalfour|Sbalfour]] ([[User talk:Sbalfour|talk]]) 18:24, 28 September 2019 (UTC)
:::::::Just keep running the algorithm and it will find multiple instances of the same pattern. Use a separate hash table to store the hashes of the patterns and it will be able to find multiple different patterns. The same idea, easily generalized. We need published sources for that of course but they don't have to be in the original paper. The subtlety is that the expected time becomes O(n + sum of lengths of matches), which could be larger than n for some inputs. (Also technically I'm not a distinguished professor; that's a specific job title that I don't currently have.) —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 18:32, 28 September 2019 (UTC)