Hash function: Difference between revisions

Content deleted Content added
Get rid of unnecessary piping.
Division may or may not be microcoded; the issue is whether it takes multiple cycles, regardless of whether it's implemented with a hardware state machine or microcoded. That, and single-cycle vs. multi-cycle multiplication, are microarchitecture issues. And the division issue predated single-chip microprocessors. Unrolling a division loop ultimately generates machine instructions, whether they are first put into assembly language or not.
Line 69:
Because collisions should be infrequent, and cause a marginal delay but are otherwise harmless, it is usually preferable to choose a faster hash function over one that needs more computation but saves a few collisions.
 
Division-based implementations can be of particular concern because thea division isrequires [[Microprogramming|microprogrammed]]multiple cycles on nearly all chipprocessor architectures[[microarchitecture]]s. Division ([[Modulo operation|modulo]]) by a constant can be inverted to become a multiplication by the word-size multiplicative-inverse of that constant. This can be done by the programmer, or by the compiler. Division can also be reduced directly into a series of shift-subtracts and shift-adds, though minimizing the number of such operations required is a daunting problem; the number of assemblymachine-language instructions resulting may be more than a dozen and swamp the pipeline. If the architecturemicroarchitecture has [[hardware multiply]] [[functional unit]]s, then the multiply-by-inverse is likely a better approach.
 
We can allow the table size {{math|''n''}} to not be a power of 2 and still not have to perform any remainder or division operation, as these computations are sometimes costly. For example, let {{math|''n''}} be significantly less than {{math|2<sup>''b''</sup>}}. Consider a [[pseudorandom number generator]] function {{math|''P''(key)}} that is uniform on the interval {{math|[0, 2<sup>''b''</sup>&nbsp;−&nbsp;1]}}. A hash function uniform on the interval {{math|[0, ''n'' &minus; 1]}} is {{math|''n'' ''P''(key) / 2<sup>''b''</sup>}}. We can replace the division by a (possibly faster) right [[bit shifting|bit shift]]: {{math|''n P''(key) >> ''b''}}.
Line 134:
 
=== Division hashing ===
A standard technique is to use a modulo function on the key, by selecting a divisor {{Mvar|M}} which is a prime number close to the table size, so {{Math|''h''(''K'') &equiv; ''K'' (mod ''M'')}}. The table size is usually a power of 2. This gives a distribution from {{Math|{{mset|0, ''M'' &minus; 1}}}}. This gives good results over a large number of key sets. A significant drawback of division hashing is that division isrequires microprogrammedmultiple cycles on most modern architectures (including [[x86]]) and can be 10 times slower than multiplication. A second drawback is that it will not break up clustered keys. For example, the keys 123000, 456000, 789000, etc. modulo 1000 all map to the same address. This technique works well in practice because many key sets are sufficiently random already, and the probability that a key set will be cyclical by a large prime number is small.
 
=== Algebraic coding ===