Content deleted Content added
Tags: Reverted Visual edit |
Citation bot (talk | contribs) Removed URL that duplicated identifier. Removed access-date with no URL. Removed parameters. | Use this bot. Report bugs. | Suggested by Headbomb | Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox | #UCB_webform_linked 128/990 |
||
(12 intermediate revisions by 12 users not shown) | |||
Line 4:
{{about|a computer programming construct|other meanings of "hash" and "hashing"|Hash (disambiguation)}}
{{More citations needed|date=July 2010}}
[[File:Hash table 4 1 1 0 0 1 0 LL.svg|thumb|240px|right|A hash function that maps names to integers from 0 to 15. There is a
A '''hash function''' is any [[Function (mathematics)|function]] that can be used to map [[data (computing)|data]] of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output.<ref>{{cite conference |last1=Aggarwal |first1=Kirti |last2=Verma |first2=Harsh K. |date=March 19, 2015
*{{cite web|url=https://csrc.nist.gov/glossary/term/hash_digest|title=hash digest|publisher=[[NIST]] |website=Computer Security Resource Center - Glossary}}
*{{cite web |url=https://csrc.nist.gov/glossary/term/message_digest |title=message digest |publisher=[[NIST]] |website=Computer Security Resource Center - Glossary}}</ref> or simply ''hashes''. The values are usually used to index a fixed-size table called a ''[[hash table]]''. Use of a hash function to index a hash table is called ''hashing'' or ''scatter-storage addressing''.
Line 16:
Hash functions are related to (and often confused with) [[checksums]], [[check digit]]s, [[Fingerprint (computing)|fingerprints]], [[lossy compression]], [[randomization function]]s, [[Error correction code|error-correcting codes]], and [[cipher]]s. Although the concepts overlap to some extent, each one has its own uses and requirements and is designed and optimized differently. The hash function differs from these concepts mainly in terms of [[data integrity]]. Hash tables may use [[non-cryptographic hash function]]s, while [[cryptographic hash function]]s are used in cybersecurity to secure sensitive data such as passwords.
== Overview
In a hash table, a hash function takes a key as an input, which is associated with a datum or record and used to identify it to the data storage and retrieval application. The keys may be fixed-length, like an integer, or variable-length, like a name. In some cases, the key is the datum itself. The output is a hash code used to index a hash table holding the data or records, or pointers to them.
Line 79:
=== Universality ===
{{main|Universal hashing}}
A ''universal hashing'' scheme is a [[randomized algorithm]] that selects a hash function {{math|''h''}} among a family of such functions, in such a way that the probability of a collision of any two distinct keys is {{math|1/''m''}}, where {{math|''m''}} is the number of distinct hash values desired—independently of the two keys. Universal hashing ensures (in a probabilistic sense) that the hash [[function application]] will behave as well as if it were using a random function, for any distribution of the input data. It will, however, have more collisions than perfect hashing and may require more operations than a special-purpose hash function.
===Applicability===
Line 93:
=== Deterministic ===
A hash procedure must be [[deterministic algorithm|deterministic]]—for a given input value, it must always generate the same hash value. In other words, it must be a [[function (mathematics)|function]] of the data to be hashed, in the mathematical sense of the term. This requirement excludes hash functions that depend on external variable parameters, such as [[pseudo-random number generator]]s or the time of day. It also excludes functions that depend on the [[memory address]] of the object being hashed, because the address may change during execution (as may happen on systems that use certain methods of [[garbage collection (computer science)|garbage collection]]), although sometimes rehashing of the item is possible.
The determinism is in the context of the reuse of the function. For example, [[Python (programming language)|Python]] adds the feature that hash functions make use of a randomized seed that is generated once when the Python process starts in addition to the input to be hashed.<ref>{{Cite web|url=https://docs.python.org/3/reference/datamodel.html#object.__hash__|title=3. Data model — Python 3.6.1 documentation|website=docs.python.org|access-date=2017-03-24}}</ref> The Python hash ([[SipHash]]) is still a valid hash function when used within a single run, but if the values are persisted (for example, written to disk), they can no longer be treated as valid hash values, since in the next run the random value might differ.
Line 154:
=== Multiplicative hashing ===
Standard multiplicative hashing uses the formula {{Math|1=''h''<sub>''a''</sub>(''K'') = {{floor|(''aK'' mod ''W'') / (''W''/''M'')}}}}, which produces a hash value in {{Math|{{mset|0, …, ''M'' − 1}}}}. The value {{Mvar|a}} is an appropriately chosen value that should be [[Coprime integers|relatively prime]] to {{Mvar|W}}; it should be large,{{Clarify|reason=How large?|date=January 2021}} and its binary representation a random mix{{Clarify|reason="a" is a constant, so it cannot be random|date=January 2021}} of 1s and 0s. An important practical special case occurs when {{Math|1=''W'' = 2<sup>''w''</sup>}} and {{Math|1=''M'' = 2<sup>''m''</sup>}} are powers of 2 and {{Mvar|w}} is the machine [[word size]]. In this case, this formula becomes {{Math|1=''h''<sub>''a''</sub>(''K'') = {{floor|(''aK'' mod 2<sup>''w''</sup>) / 2<sup>''w''−''m''</sup>}}}}. This is special because arithmetic modulo {{Math|2<sup>''w''</sup>}} is done by default in low-level programming languages and integer division by a power of 2 is simply a right-shift, so, in [[C (programming language)|C]], for example, this function becomes
<syntaxhighlight lang="c">
return (a * K) >> (w - m);
</syntaxhighlight>
and for fixed {{Mvar|m}} and {{Mvar|w}} this translates into a single integer multiplication and right-shift, making it one of the fastest hash functions to compute.
Multiplicative hashing is susceptible to a "common mistake" that leads to poor diffusion—higher-value input bits do not affect lower-value output bits.<ref>
{{cite web |url=
<syntaxhighlight lang="c">
K ^= K >> (w-m); ▼
</syntaxhighlight>
=== Fibonacci hashing ===
Line 177 ⟶ 181:
=== Zobrist hashing ===
{{main | Tabulation hashing|Zobrist hashing}}
[[Tabulation hashing]], more generally known as ''Zobrist hashing'' after [[Albert Lindsey Zobrist|Albert Zobrist]], is a method for constructing universal families of hash functions by combining table lookup with XOR operations. This algorithm has proven to be very fast and of high quality for hashing purposes (especially hashing of integer-number keys).<ref>{{citation|first=Albert L.|last=Zobrist|author-link= Albert Lindsey Zobrist|url=https://www.cs.wisc.edu/techreports/1970/TR88.pdf|title=A New Hashing Method with Application for Game Playing|series=Tech. Rep. 88|publisher=Computer Sciences Department, University of Wisconsin|___location=Madison, Wisconsin|date=April 1970}}.</ref>
Zobrist hashing was originally introduced as a means of compactly representing chess positions in computer game-playing programs. A unique random number was assigned to represent each type of piece (six each for black and white) on each space of the board. Thus a table of 64×12 such numbers is initialized at the start of the program. The random numbers could be any length, but 64 bits was natural due to the 64 squares on the board. A position was transcribed by cycling through the pieces in a position, indexing the corresponding random numbers (vacant spaces were not included in the calculation) and XORing them together (the starting value could be 0 (the identity value for XOR) or a random seed). The resulting value was reduced by modulo, folding, or some other operation to produce a hash table index. The original Zobrist hash was stored in the table as the representation of the position.
Line 248 ⟶ 252:
== External links ==
{{Wiktionary|hash}}
*[http://www.sinfocol.org/archivos/2009/11/Goulburn06.pdf The Goulburn Hashing Function] ([[Portable Document Format|PDF]]) by Mayur Patel
*[https://dspace5.zcu.cz/bitstream/11025/11784/1/Skala_2010_Corfu-NAUN-Hash.pdf Hash Function Construction for Textual and Geometrical Data Retrieval] ([[Portable Document Format|PDF]]) Latest Trends on Computers, Vol.2, pp. 483–489, CSCC Conference, Corfu, 2010
|