Content deleted Content added
→Rewrite: r |
AmirOnWiki (talk | contribs) →Probabilistic analysis: new section |
||
(11 intermediate revisions by 5 users not shown) | |||
Line 1:
{{WikiProject banner shell|class=C|
{{WikiProject Computer science |importance=mid {{WikiProject Computing }}
==Extending the algorithm==
Line 171 ⟶ 174:
*<s>'''Support''' With both forms in use, I'm fine defaulting to alphabetical order. --[[User:BDD|BDD]] ([[User talk:BDD|talk]]) 17:50, 29 April 2014 (UTC)</s>
*'''Neutral'''. Either is good for me. Cormen uses RK, and RK sounds better KR (which may be why author order was switched -- leading with a double stop is awkward). [[User:Glrx|Glrx]] ([[User talk:Glrx|talk]]) 23:37, 29 April 2014 (UTC)
*'''Strong, speedy support''' - dang, somewhere someone messed up big time to botch this order. Follow the order they wrote it in. [[User:Red Slash|<
* '''Oppose''' – in both scholar search and book search, Rabin–Karp is somewhat more common than Karp–Rabin (at least, when I look). I don't know why, but I think it's OK to stick with what it's more commonly called. Unless someone can show that these are due to wiki mirroring. [[User:Dicklyon|Dicklyon]] ([[User talk:Dicklyon|talk]]) 04:27, 2 May 2014 (UTC)
**FWIW, Google scholar gives me about 820 hits for "Karp-Rabin" (as a quoted string) and about 733 for "Rabin-Karp". I don't put much stock in general web searches for this sort of topic. —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 05:05, 2 May 2014 (UTC)
Line 180 ⟶ 183:
***My observation was based on a quick look at Google books. [https://www.google.com.au/search?q=%22Karp-Rabin%22+algorithm&tbm=bks] [https://www.google.com.au/search?q=%22Rabin-Karp%22+algorithm&tbm=bks] It's not an overwhelming result, but the trend looked clear to me. Certainly enough to require more evidence before moving. [[User:Andrewa|Andrewa]] ([[User talk:Andrewa|talk]]) 08:02, 7 May 2014 (UTC)
<hr />
:''The above discussion is preserved as an archive of a [[WP:RM|requested move]]. <span style="color:red">'''Please do not modify it.'''</span> Subsequent comments should be made in a new section on this talk page or in a [[WP:move review|move review]]. No further edits should be made to this section.''</div><!-- Template:RM bottom -->
== confusing example data in section 'Hash function used' ==
Line 255 ⟶ 258:
::::::You're a distinguished Professor of Computer Science and an ACM fellow! (GULP) You're more qualified to write this article than me, and I wasn't very familiar with it when I started. Throw in something, and I'll do, too. Starting with the definition in the first sentence. Who says that it can find multiple patterns? The seminal paper listed below just says it finds the first occurrence of a pattern in text... i.e. not even multiple occurrences of one pattern.[[User:Sbalfour|Sbalfour]] ([[User talk:Sbalfour|talk]]) 18:24, 28 September 2019 (UTC)
:::::::Just keep running the algorithm and it will find multiple instances of the same pattern. Use a separate hash table to store the hashes of the patterns and it will be able to find multiple different patterns. The same idea, easily generalized. We need published sources for that of course but they don't have to be in the original paper. The subtlety is that the expected time becomes O(n + sum of lengths of matches), which could be larger than n for some inputs. (Also technically I'm not a distinguished professor; that's a specific job title that I don't currently have.) —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 18:32, 28 September 2019 (UTC)
== Why does this work? (resolved) ==
The hash function suggested in [[Rabin–Karp algorithm#Hash function used]] goes over each character. So I don't see how using the hash is faster than just comparing each character [[User:AltoStev|<span style='color: orange;'>'''AltoStev'''</span>]] ([[User Talk:AltoStev|<span style='color: #448cff'>talk</span>]]) 23:00, 28 September 2024 (UTC)
:Comparing each character to what? A naive search would compare each pair of (position in first string, position in second string). That could be a lot more pairs than characters. —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 23:40, 28 September 2024 (UTC)
::<s>That would be the same amount of pairs as characters, as the number of pairs is equal to the number of positions, which is equal to the number of characters in the string. [[User:AltoStev|<span style='color: orange;'>'''AltoStev'''</span>]] ([[User Talk:AltoStev|<span style='color: #448cff'>talk</span>]]) 12:09, 30 September 2024 (UTC)</s>
::Oh I see what you mean. I meant, for each index i, the pair (s[i], s_2[i]). [[User:AltoStev|<span style='color: orange;'>'''AltoStev'''</span>]] ([[User Talk:AltoStev|<span style='color: #448cff'>talk</span>]]) 12:20, 30 September 2024 (UTC)
:::But that would only test two strings for equality. This algorithm tests whether one is a substring of the other, at any position. —[[User:David Eppstein|David Eppstein]] ([[User talk:David Eppstein|talk]]) 20:17, 30 September 2024 (UTC)
== Probabilistic analysis ==
There is a vauge statement that the expected time of the algorithm is linear. It would be nice if someone can flesh this out. [[User:AmirOnWiki|AmirOnWiki]] ([[User talk:AmirOnWiki|talk]]) 14:09, 23 June 2025 (UTC)
|