Talk:Rabin–Karp algorithm: Difference between revisions

Content deleted Content added
Assessment: +Computing: class=C, importance=Low, science=y, science-importance=Mid, software=y, software-importance=Low (assisted)
Magazine article: new section
Line 193:
 
Note that the pdf linked as a reference does NOT actually refer to the fingerprint function being used as a Rabin fingerprint, so it is correct. The Rabin Karp algorithm does not require using the Rabin Fingerprint, but any rolling hash will work. <!-- Template:Unsigned --><small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Chmduquesne|Chmduquesne]] ([[User talk:Chmduquesne#top|talk]] • [[Special:Contributions/Chmduquesne|contribs]]) 11:32, 30 November 2017 (UTC)</small>
 
== Magazine article ==
 
The whole article reads like a popular magazine article. For example, the text says, ''Luckily, a good hash function on reasonable strings usually does not have many collisions, so the expected search time will be acceptable.'' I hope I'm not relying on luck for reliability. We may presume, as scholars and computer scientists, that a good hash function is chosen, and the ''prima facie'' evidence of a good hash function is few collisions. A good hash function will minimize collisions even on unreasonable strings.
 
Then there's this:
 
'''Hash function used'''
Main article :Rabin fingerprint
 
The key to the Rabin–Karp algorithm's performance is the efficient computation of hash values of the successive substrings of the text. The Rabin fingerprint is a popular and effective rolling hash function. The hash function described here is not a Rabin fingerprint,
 
I'm expecting to read about Rabin fingerprint since the hatnote says this is what the section is about, but then the text it's not about that. A rolling hash doesn't require a Rabin fingerprint hash function , but that's the most common implementation, and Rabin fingerprint makes a very good hash function, to be recommended to those implementing a rolling hash and just wanting to get on with the task. Especially since earlier, the text actually said ''The essential benefit achieved by using a rolling hash such as the Rabin fingerprint...'' If you don't want to read and understand GF(2), used to construct the Rabin polynomial, then you probably shouldn't we editing on this topic...
 
Now look at this:
 
''The trick can be exploited using a rolling hash. A rolling hash is a hash function specially designed to enable this operation. A trivial (but not very good) rolling hash function just adds the values of each character in the substring. This rolling hash formula can compute the next hash value from the previous value in constant time:''
 
s[i+1..i+m] = s[i..i+m-1] - s[i] + s[i+m]
 
''This simple function works, but will result in statement 5 being executed more often than other more sophisticated rolling hash functions such as those discussed in the next section.''
 
Nobody should do that, nobody would do that. We're not tutoring grade schoolers; this is supposed to be a scholarly article. A rolling hash isn't a trick, it's a well-worn technique.
 
All that pseudocode is how-to (see [[WP:NOTHOWTO]]), and looks to me like original art (see [[WP:NOR]]), which can't be in the encyclopedia. If it's copied from a professional publication, where's the citation(see WP:RS]])? And if it is, I'd have to say it's copyrighted (GULP! See [[WP:COPYVIO]]) So we have a triple whammy here.
 
The section '''Shifting substrings search and competing algorithms''' isn't about Rabin-Karp, and shouldn't be in the article. All that may be needed is one sentence stating that the naive approach of scanning the text string a character at a time and then doing a string match (via a hash or any method) at that position, is O(nm). It's self-evident, no need to ramble on.
 
The text says: ''...using a [[hash function]]. A hash function is a function which converts every string into a numeric value, called its hash value; for example, we might have hash("hello")=5. The algorithm exploits the fact that if two strings are equal, their hash values are also equal.'' Hash function is wikilinked; we don't need to give a kindergarten definition of it in the text. If you don't know what a hash function is, you're probably not reading this article.
 
I could go on, but that's enough for now. The article needs to be redrafted entirely for [[WP:TONE]], and it ''must, must, must'' be cited inline, every paragraph, factoid, piece of code or pesudocode, formula, or anything else that might constitute researched and referenceable content. [[User:Sbalfour|Sbalfour]] ([[User talk:Sbalfour|talk]]) 15:46, 26 September 2019 (UTC)