Hash function: Difference between revisions

Content deleted Content added
Citation bot (talk | contribs)
Altered url. URLs might have been anonymized. | Use this bot. Report bugs. | Suggested by Jay8g | Linked from User:Jay8g/sandbox | #UCB_webform_linked 16/1786
Get rid of unnecessary piping.
Line 21:
*Scramble the bits of the key so that the resulting values are uniformly distributed over the [[Key space (cryptography)|keyspace]], and
*Map the key values into ones less than or equal to the size of the table.
A good hash function satisfies two basic properties: it should be very fast to compute, and it should minimize duplication of output values ([[Hash collision|collisions]]). Hash functions rely on generating favorable [[Probabilityprobability distribution|probability distributions]]s for their effectiveness, reducing access time to nearly constant. High table loading factors, [[Pathological (mathematics)|pathological]] key sets, and poorly designed hash functions can result in access times approaching linear in the number of items in the table. Hash functions can be designed to give the best worst-case performance,<ref group=Notes>This is useful in cases where keys are devised by a malicious agent, for example in pursuit of a DOS attack.</ref> good performance under high table loading factors, and in special cases, perfect (collisionless) mapping of keys into hash codes. Implementation is based on parity-preserving bit operations (XOR and ADD), multiply, or divide. A necessary adjunct to the hash function is a collision-resolution method that employs an auxiliary data structure like [[linked list]]s, or systematic probing of the table to find an empty slot.
 
== Hash tables ==
Line 69:
Because collisions should be infrequent, and cause a marginal delay but are otherwise harmless, it is usually preferable to choose a faster hash function over one that needs more computation but saves a few collisions.
 
Division-based implementations can be of particular concern because the division is [[Microprogramming|microprogrammed]] on nearly all chip architectures. Division ([[Modulo operation|modulo]]) by a constant can be inverted to become a multiplication by the word-size multiplicative-inverse of that constant. This can be done by the programmer, or by the compiler. Division can also be reduced directly into a series of shift-subtracts and shift-adds, though minimizing the number of such operations required is a daunting problem; the number of assembly instructions resulting may be more than a dozen and swamp the pipeline. If the architecture has [[hardware multiply]] [[Functionalfunctional unit|functional units]]s, then the multiply-by-inverse is likely a better approach.
 
We can allow the table size {{math|''n''}} to not be a power of 2 and still not have to perform any remainder or division operation, as these computations are sometimes costly. For example, let {{math|''n''}} be significantly less than {{math|2<sup>''b''</sup>}}. Consider a [[pseudorandom number generator]] function {{math|''P''(key)}} that is uniform on the interval {{math|[0, 2<sup>''b''</sup>&nbsp;−&nbsp;1]}}. A hash function uniform on the interval {{math|[0, ''n'' &minus; 1]}} is {{math|''n'' ''P''(key) / 2<sup>''b''</sup>}}. We can replace the division by a (possibly faster) right [[bit shifting|bit shift]]: {{math|''n P''(key) >> ''b''}}.
Line 86:
* [[File verification|Integrity checking]]: Identical hash values for different files imply equality, providing a reliable means to detect file modifications.
* [[Key derivation function|Key derivation]]: Minor input changes result in a random-looking output alteration, known as the diffusion property. Thus, hash functions are valuable for key derivation functions.
* [[Message authentication code|Message Authentication Codes]]s (MACs): Through the integration of a confidential key with the input data, hash functions can generate MACs ensuring the genuineness of the data, such as in [[HMAC]]s.
* Password storage: The password's hash value does not expose any password details, emphasizing the importance of securely storing hashed passwords on the server.
* [[Digital signature|Signatures]]: Message hashes are signed rather than the whole message.