Perfect hash function: Difference between revisions

Content deleted Content added
Construction: {{math}}
MinusBot (talk | contribs)
m Proper minus signs and other cleanup. Report bugs, errors, and suggestions at User talk:MinusBot
Line 3:
In [[computer science]], a '''perfect hash function''' {{mvar|h}} for a set {{mvar|S}} is a [[hash function]] that maps distinct elements in {{mvar|S}} to a set of {{mvar|m}} integers, with no [[hash collision|collisions]]. In mathematical terms, it is an [[injective function]].
 
Perfect hash functions may be used to implement a [[lookup table]] with constant worst-case access time. A perfect hash function can, as any [[hash function]], be used to implement [[hash table|hash tables]]s, with the advantage that no [[Hash table#Collision resolution|collision resolution]] has to be implemented. In addition, if the keys are not the data and if it is known that queried keys will be valid, then the keys do not need to be stored in the lookup table, saving space.
 
Disadvantages of perfect hash functions are that {{mvar|S}} needs to be known for the construction of the perfect hash function. Non-dynamic perfect hash functions need to be re-constructed if {{mvar|S}} changes. For frequently changing {{mvar|S}} [[dynamic perfect hashing|dynamic perfect hash functions]] may be used at the cost of additional space.<ref name="DynamicPerfectHashing" /> The space requirement to store the perfect hash function is in {{math|''O''(''n'')}}.
Line 16:
| doi = 10.1109/ISIT.2006.261567
| journal = 2006 [[IEEE International Symposium on Information Theory]]
| pages = 2774-27782774–2778
| title = Perfect Hashing for Network Applications
| year = 2006}}</ref>
 
==Performance of perfect hash functions==
The important performance parameters for perfect hashing are the representation size, the evaluation time, the construction time, and additionally the range requirement <math>\frac{m}{n}</math>.<ref name="CHD"/> The evaluation time can be as fast as {{math|''O''(''1'')}}, which is optimal.<ref name="inventor"/><ref name="CHD"/> The construction time needs to be at least {{math|''O''(''n'')}}, because each element in {{mvar|S}} needs to be considered, and {{mvar|S}} contains {{mvar|n}} elements. This lower bound can be achieved in practice.<ref name="CHD"/>
 
The lower bound for the representation size depends on {{mvar|m}} and {{mvar|n}}. Let {{math|''m'' {{=}} (1+&epsilon;) ''n''}} and {{mvar|h}} a perfect hash function. A good approximation for the lower bound is <math>\log e - \varepsilon \log \frac{1+\varepsilon}{\varepsilon}</math> Bits per element. For minimal perfect hashing, {{math|&epsilon; {{=}} 0}}, the lower bound is {{math|log e ≈ 1.44}} bits per element.<ref name="CHD"/>
Line 47:
The hash function itself requires storage space {{math|''O''(''n'')}} to store {{mvar|k}}, {{mvar|p}}, and all of the second-level linear modular functions. Computing the hash value of a given key {{mvar|x}} may be performed in constant time by computing {{math|''g''(''x'')}}, looking up the second-level function associated with {{math|''g''(''x'')}}, and applying this function to {{mvar|x}}.
A modified version of this two-level scheme with a larger number of values at the top level can be used to construct a perfect hash function that maps {{mvar|S}} into a smaller range of length {{math|''n'' + ''o''(''n'')}}.<ref name="inventor"/>
 
 
A more recent method for constructing a perfect hash function is described by {{harvtxt|Belazzougui|Botelho|Dietzfelbinger|2009}} as "hash, displace, and compress". Here a first-level hash function {{mvar|g}} is also used to map elements onto a range of {{mvar|r}} integers. An element {{math|''x'' ∈ ''S''}} is stored in the Bucket {{mvar|B<sub>g(x)</sub>}}.<ref name="CHD" />
Line 61 ⟶ 60:
===Pseudocode===
'''algorithm''' ''hash, displace, and compress'' '''is'''
(1) Split S into buckets {{math|B<sub>i</sub> :{{=}} g<sup>-1−1</sup>({i})&cap;S,0 ≤ i < r}}
(2) Sort buckets B<sub>i</sub> in falling order according to size |B<sub>i</sub>|
(3) Initialize array T[0...m-1] with 0's
Line 199 ⟶ 198:
==Further reading==
*Richard J. Cichelli. ''Minimal Perfect Hash Functions Made Simple'', Communications of the ACM, Vol. 23, Number 1, January 1980.
* [[Thomas H. Cormen]], [[Charles E. Leiserson]], [[Ronald L. Rivest]], and [[Clifford Stein]]. ''[[Introduction to Algorithms]]'', Third Edition. MIT Press, 2009. {{ISBN|978-0262033848}}. Section 11.5: Perfect hashing, pp.&nbsp;267,&nbsp;277&ndash;282.
* Fabiano C. Botelho, [[Rasmus Pagh]] and Nivio Ziviani. [https://arxiv.org/abs/cs/0702159 "Perfect Hashing for Data Management Applications"].
* Fabiano C. Botelho and [[Nivio Ziviani]]. [http://homepages.dcc.ufmg.br/~nivio/papers/cikm07.pdf "External perfect hashing for very large key sets"]. 16th ACM Conference on Information and Knowledge Management (CIKM07), Lisbon, Portugal, November 2007.