Content deleted Content added
→Construction: {{math}} |
m Proper minus signs and other cleanup. Report bugs, errors, and suggestions at User talk:MinusBot |
||
Line 3:
In [[computer science]], a '''perfect hash function''' {{mvar|h}} for a set {{mvar|S}} is a [[hash function]] that maps distinct elements in {{mvar|S}} to a set of {{mvar|m}} integers, with no [[hash collision|collisions]]. In mathematical terms, it is an [[injective function]].
Perfect hash functions may be used to implement a [[lookup table]] with constant worst-case access time. A perfect hash function can, as any [[hash function]], be used to implement [[hash table
Disadvantages of perfect hash functions are that {{mvar|S}} needs to be known for the construction of the perfect hash function. Non-dynamic perfect hash functions need to be re-constructed if {{mvar|S}} changes. For frequently changing {{mvar|S}} [[dynamic perfect hashing|dynamic perfect hash functions]] may be used at the cost of additional space.<ref name="DynamicPerfectHashing" /> The space requirement to store the perfect hash function is in {{math|''O''(''n'')}}.
Line 16:
| doi = 10.1109/ISIT.2006.261567
| journal = 2006 [[IEEE International Symposium on Information Theory]]
| pages =
| title = Perfect Hashing for Network Applications
| year = 2006}}</ref>
==Performance of perfect hash functions==
The important performance parameters for perfect hashing are the representation size, the evaluation time, the construction time, and additionally the range requirement <math>\frac{m}{n}</math>.<ref name="CHD"/> The evaluation time can be as fast as {{math|''O''(''1'')}}, which is optimal.<ref name="inventor"/><ref name="CHD"/> The construction time needs to be at least {{math|''O''(''n'')}}, because each element in {{mvar|S}} needs to be considered, and {{mvar|S}} contains {{mvar|n}} elements. This lower bound can be achieved in practice.<ref name="CHD"/>
The lower bound for the representation size depends on {{mvar|m}} and {{mvar|n}}. Let {{math|''m'' {{=}} (1+ε) ''n''}} and {{mvar|h}} a perfect hash function. A good approximation for the lower bound is <math>\log e - \varepsilon \log \frac{1+\varepsilon}{\varepsilon}</math> Bits per element. For minimal perfect hashing, {{math|ε {{=}} 0}}, the lower bound is {{math|log e ≈ 1.44}} bits per element.<ref name="CHD"/>
Line 47:
The hash function itself requires storage space {{math|''O''(''n'')}} to store {{mvar|k}}, {{mvar|p}}, and all of the second-level linear modular functions. Computing the hash value of a given key {{mvar|x}} may be performed in constant time by computing {{math|''g''(''x'')}}, looking up the second-level function associated with {{math|''g''(''x'')}}, and applying this function to {{mvar|x}}.
A modified version of this two-level scheme with a larger number of values at the top level can be used to construct a perfect hash function that maps {{mvar|S}} into a smaller range of length {{math|''n'' + ''o''(''n'')}}.<ref name="inventor"/>
A more recent method for constructing a perfect hash function is described by {{harvtxt|Belazzougui|Botelho|Dietzfelbinger|2009}} as "hash, displace, and compress". Here a first-level hash function {{mvar|g}} is also used to map elements onto a range of {{mvar|r}} integers. An element {{math|''x'' ∈ ''S''}} is stored in the Bucket {{mvar|B<sub>g(x)</sub>}}.<ref name="CHD" />
Line 61 ⟶ 60:
===Pseudocode===
'''algorithm''' ''hash, displace, and compress'' '''is'''
(1) Split S into buckets {{math|B<sub>i</sub> :{{=}} g<sup>
(2) Sort buckets B<sub>i</sub> in falling order according to size |B<sub>i</sub>|
(3) Initialize array T[0...m-1] with 0's
Line 199 ⟶ 198:
==Further reading==
*Richard J. Cichelli. ''Minimal Perfect Hash Functions Made Simple'', Communications of the ACM, Vol. 23, Number 1, January 1980.
* [[Thomas H. Cormen]], [[Charles E. Leiserson]], [[Ronald L. Rivest]], and [[Clifford Stein]]. ''[[Introduction to Algorithms]]'', Third Edition. MIT Press, 2009. {{ISBN|978-0262033848}}. Section 11.5: Perfect hashing, pp. 267, 277–282.
* Fabiano C. Botelho, [[Rasmus Pagh]] and Nivio Ziviani. [https://arxiv.org/abs/cs/0702159 "Perfect Hashing for Data Management Applications"].
* Fabiano C. Botelho and [[Nivio Ziviani]]. [http://homepages.dcc.ufmg.br/~nivio/papers/cikm07.pdf "External perfect hashing for very large key sets"]. 16th ACM Conference on Information and Knowledge Management (CIKM07), Lisbon, Portugal, November 2007.
|