Talk:Rabin–Karp algorithm: Difference between revisions

Content deleted Content added
MalnadachBot (talk | contribs)
m Fixed Lint errors. (Task 12)
AmirOnWiki (talk | contribs)
 
(9 intermediate revisions by 3 users not shown)
Line 1:
{{WikiProject banner shell|class=C|
{{WikiProject Computer science |importance=mid |class=C}}
{{WikiProject Computing |class=C |importance=Low |science=y |science-importance=Mid |software=y |software-importance=Low}}
}}
 
==Extending the algorithm==
Line 255 ⟶ 258:
::::::You're a distinguished Professor of Computer Science and an ACM fellow! (GULP) You're more qualified to write this article than me, and I wasn't very familiar with it when I started. Throw in something, and I'll do, too. Starting with the definition in the first sentence. Who says that it can find multiple patterns? The seminal paper listed below just says it finds the first occurrence of a pattern in text... i.e. not even multiple occurrences of one pattern.[[User:Sbalfour|Sbalfour]] ([[User talk:Sbalfour|talk]]) 18:24, 28 September 2019 (UTC)
:::::::Just keep running the algorithm and it will find multiple instances of the same pattern. Use a separate hash table to store the hashes of the patterns and it will be able to find multiple different patterns. The same idea, easily generalized. We need published sources for that of course but they don't have to be in the original paper. The subtlety is that the expected time becomes O(n + sum of lengths of matches), which could be larger than n for some inputs. (Also technically I'm not a distinguished professor; that's a specific job title that I don't currently have.) —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 18:32, 28 September 2019 (UTC)
 
== Why does this work? (resolved) ==
 
The hash function suggested in [[Rabin–Karp algorithm#Hash function used]] goes over each character. So I don't see how using the hash is faster than just comparing each character &nbsp;[[User:AltoStev|<span style='color: orange;'>'''AltoStev'''</span>]] ([[User Talk:AltoStev|<span style='color: #448cff'>talk</span>]]) 23:00, 28 September 2024 (UTC)
 
:Comparing each character to what? A naive search would compare each pair of (position in first string, position in second string). That could be a lot more pairs than characters. —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 23:40, 28 September 2024 (UTC)
::<s>That would be the same amount of pairs as characters, as the number of pairs is equal to the number of positions, which is equal to the number of characters in the string. &nbsp;[[User:AltoStev|<span style='color: orange;'>'''AltoStev'''</span>]] ([[User Talk:AltoStev|<span style='color: #448cff'>talk</span>]]) 12:09, 30 September 2024 (UTC)</s>
::Oh I see what you mean. I meant, for each index i, the pair (s[i], s_2[i]). &nbsp;[[User:AltoStev|<span style='color: orange;'>'''AltoStev'''</span>]] ([[User Talk:AltoStev|<span style='color: #448cff'>talk</span>]]) 12:20, 30 September 2024 (UTC)
:::But that would only test two strings for equality. This algorithm tests whether one is a substring of the other, at any position. —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 20:17, 30 September 2024 (UTC)
 
== Probabilistic analysis ==
 
There is a vauge statement that the expected time of the algorithm is linear. It would be nice if someone can flesh this out. [[User:AmirOnWiki|AmirOnWiki]] ([[User talk:AmirOnWiki|talk]]) 14:09, 23 June 2025 (UTC)