Hash function: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 20:35, 27 May 2025 edit 149.143.73.179 (talk) No edit summary Tags: Reverted Mobile edit Mobile web edit ← Previous edit		Latest revision as of 08:08, 26 August 2025 edit undo Citation bot (talk \| contribs) Bots 5,866,601 edits Removed URL that duplicated identifier. Removed access-date with no URL. Removed parameters. \| Use this bot. Report bugs. \| Suggested by Headbomb \| Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox \| #UCB_webform_linked 128/990
(7 intermediate revisions by 7 users not shown)
Line 4: {{about\|a computer programming construct\|other meanings of "hash" and "hashing"\|Hash (disambiguation)}} {{More citations needed\|date=July 2010}} [[File:Hash table 4 1 1 0 0 1 0 LL.svg\|thumb\|240px\|right\|A hash function that maps names to integers from 0 to 15. There is a ~~[[hash~~ collision~~\|collision]]~~ between keys "John Smith" and "Sandra Dee".]] ~~[[Andor (TV series)\|~~A]] '''hash function''' is any [[Function (mathematics)\|function]] that can be used to map [[data (computing)\|data]] of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output.<ref>{{cite conference \|last1=Aggarwal \|first1=Kirti \|last2=Verma \|first2=Harsh K. \|date=March 19, 2015 ~~\|url=https://ieeexplore.ieee.org/document/7164747~~ \|title=Hash_RC6 — Variable length Hash algorithm using RC6 \|doi=10.1109/ICACEA.2015.7164747 \|conference=2015 International Conference on Advances in Computer Engineering and Applications (ICACEA) ~~\|access-date=January 24, 2023~~}}</ref> The values returned by a hash function are called ''hash values'', ''hash codes'', (''hash/message'') ''digests'',<ref> {{cite web\|url=https://csrc.nist.gov/glossary/term/hash_digest\|title=hash digest\|publisher=[[NIST]] \|website=Computer Security Resource Center - Glossary}} {{cite web \|url=https://csrc.nist.gov/glossary/term/message_digest \|title=message digest \|publisher=[[NIST]] \|website=Computer Security Resource Center - Glossary}}</ref> or simply ''hashes''. The values are usually used to index a fixed-size table called a ''[[hash table]]''. Use of a hash function to index a hash table is called ''hashing'' or ''scatter-storage addressing''. Line 79: === Universality === {{main\|Universal hashing}} A ''universal hashing'' scheme is a [[randomized algorithm]] that selects a hash function {{math\|''h''}} among a family of such functions, in such a way that the probability of a collision of any two distinct keys is {{math\|1/''m''}}, where {{math\|''m''}} is the number of distinct hash values desired—independently of the two keys. Universal hashing ensures (in a probabilistic sense) that the hash [[function application]] will behave as well as if it were using a random function, for any distribution of the input data. It will, however, have more collisions than perfect hashing and may require more operations than a special-purpose hash function. ===Applicability=== Line 93: === Deterministic === A hash procedure must be [[deterministic algorithm\|deterministic]]—for a given input value, it must always generate the same hash value. In other words, it must be a [[function (mathematics)\|function]] of the data to be hashed, in the mathematical sense of the term. This requirement excludes hash functions that depend on external variable parameters, such as [[pseudo-random number generator]]s or the time of day. It also excludes functions that depend on the [[memory address]] of the object being hashed, because the address may change during execution (as may happen on systems that use certain methods of [[garbage collection (computer science)\|garbage collection]]), although sometimes rehashing of the item is possible. The determinism is in the context of the reuse of the function. For example, [[Python (programming language)\|Python]] adds the feature that hash functions make use of a randomized seed that is generated once when the Python process starts in addition to the input to be hashed.<ref>{{Cite web\|url=https://docs.python.org/3/reference/datamodel.html#object.__hash__\|title=3. Data model — Python 3.6.1 documentation\|website=docs.python.org\|access-date=2017-03-24}}</ref> The Python hash ([[SipHash]]) is still a valid hash function when used within a single run, but if the values are persisted (for example, written to disk), they can no longer be treated as valid hash values, since in the next run the random value might differ. Line 162: Multiplicative hashing is susceptible to a "common mistake" that leads to poor diffusion—higher-value input bits do not affect lower-value output bits.<ref> {{cite web \|url=~~http~~https://www.cs.cornell.edu/courses/cs3110/2008fa/lectures/lec21.html \|title=CS 3110 Lecture 21: Hash functions \|at=Section "Multiplicative hashing"}}</ref> A transmutation on the input which shifts the span of retained top bits down and XORs or ADDs them to the key before the multiplication step corrects for this. The resulting function looks like:<ref name="fibonacci-hashing"/> <syntaxhighlight lang="c"> unsigned hash(unsigned K) { Line 181: === Zobrist hashing === {{main \| Tabulation hashing\|Zobrist hashing}} [[Tabulation hashing]], more generally known as ''Zobrist hashing'' after [[Albert Lindsey Zobrist\|Albert Zobrist]], is a method for constructing universal families of hash functions by combining table lookup with XOR operations. This algorithm has proven to be very fast and of high quality for hashing purposes (especially hashing of integer-number keys).<ref>{{citation\|first=Albert L.\|last=Zobrist\|author-link= Albert Lindsey Zobrist\|url=https://www.cs.wisc.edu/techreports/1970/TR88.pdf\|title=A New Hashing Method with Application for Game Playing\|series=Tech. Rep. 88\|publisher=Computer Sciences Department, University of Wisconsin\|___location=Madison, Wisconsin\|date=April 1970}}.</ref> Zobrist hashing was originally introduced as a means of compactly representing chess positions in computer game-playing programs. A unique random number was assigned to represent each type of piece (six each for black and white) on each space of the board. Thus a table of 64×12 such numbers is initialized at the start of the program. The random numbers could be any length, but 64 bits was natural due to the 64 squares on the board. A position was transcribed by cycling through the pieces in a position, indexing the corresponding random numbers (vacant spaces were not included in the calculation) and XORing them together (the starting value could be 0 (the identity value for XOR) or a random seed). The resulting value was reduced by modulo, folding, or some other operation to produce a hash table index. The original Zobrist hash was stored in the table as the representation of the position.