Content deleted Content added
No edit summary Tags: Mobile edit Mobile web edit |
m HTTP to HTTPS for SourceForge |
||
(12 intermediate revisions by 10 users not shown) | |||
Line 6:
Perfect hash functions may be used to implement a [[lookup table]] with constant worst-case access time. A perfect hash function can, as any [[hash function]], be used to implement [[hash table]]s, with the advantage that no [[Hash table#Collision resolution|collision resolution]] has to be implemented. In addition, if the keys are not in the data and if it is known that queried keys will be valid, then the keys do not need to be stored in the lookup table, saving space.
Disadvantages of perfect hash functions are that {{mvar|S}} needs to be known for the construction of the perfect hash function. Non-dynamic perfect hash functions need to be re-constructed if {{mvar|S}} changes. For frequently changing {{mvar|S}} [[dynamic perfect hashing|dynamic perfect hash functions]] may be used at the cost of additional space.<ref name="DynamicPerfectHashing" /> The space requirement to store the perfect hash function is in {{math|''O''(''n'')}} where {{math|''n''}} is the number of keys in the structure.
The important performance parameters for perfect hash functions are the evaluation time, which should be constant, the construction time, and the representation size.
Line 14:
| last1 = Lu | first1 = Yi | author1-link = Yi Lu (computer scientist)
| last2 = Prabhakar | first2 = Balaji | author2-link = Balaji Prabhakar
| last3 = Bonomi | first3 = Flavio | title = 2006 IEEE International Symposium on Information Theory | chapter = Perfect Hashing for Network Applications | author3-link = Flavio Bonomi
| doi = 10.1109/ISIT.2006.261567
| pages = 2774–2778
| year = 2006| isbn = 1-4244-0505-X | s2cid = 1494710 }}</ref>
==Performance of perfect hash functions==
The important performance parameters for perfect hashing are the representation size, the evaluation time, the construction time, and additionally the range requirement <math>\frac{m}{n}</math> (average number of buckets per key in the hash table).<ref name="CHD"/> The evaluation time can be as fast as {{math|''O''(''1'')}}, which is optimal.<ref name="inventor"/><ref name="CHD"/> The construction time needs to be at least {{math|''O''(''n'')}}, because each element in {{mvar|S}} needs to be considered, and {{mvar|S}} contains {{mvar|n}} elements. This lower bound can be achieved in practice.<ref name="CHD"/>
The lower bound for the representation size depends on {{mvar|m}} and {{mvar|n}}. Let {{math|''m'' {{=}} (1+ε) ''n''}} and {{mvar|h}} a perfect hash function. A good approximation for the lower bound is <math>\log e - \varepsilon \log \frac{1+\varepsilon}{\varepsilon}</math> Bits per element. For minimal perfect hashing, {{math|ε {{=}} 0}}, the lower bound is {{math|log e ≈ 1.44}} bits per element.<ref name="CHD"/>
Line 42 ⟶ 40:
| title = Storing a Sparse Table with {{math|''O''(1)}} Worst Case Access Time
| volume = 31
| year = 1984| s2cid = 5399743 | doi-access = free
}}</ref> As {{harvtxt|Fredman|Komlós|Szemerédi|1984}} show, there exists a choice of the parameter {{mvar|k}} such that the sum of the lengths of the ranges for the {{mvar|n}} different values of {{math|''g''(''x'')}} is {{math|''O''(''n'')}}. Additionally, for each value of {{math|''g''(''x'')}}, there exists a linear modular function that maps the corresponding subset of {{mvar|S}} into the range associated with that value. Both {{mvar|k}}, and the second-level functions for each value of {{math|''g''(''x'')}}, can be found in [[polynomial time]] by choosing values randomly until finding one that works.<ref name="inventor"/>
Line 96 ⟶ 95:
==Extensions==
===Dynamic perfect hashing===
{{main article|Dynamic perfect hashing}}
Line 118 ⟶ 114:
===Minimal perfect hash function===
A minimal perfect hash function is a perfect hash function that maps {{mvar|n}} keys to {{mvar|n}} consecutive integers – usually the numbers from {{math|0}} to {{math|''n'' − 1}} or from {{math|1}} to {{mvar|n}}. A more formal way of expressing this is: Let {{mvar|j}} and {{mvar|k}} be elements of some finite set {{mvar|S}}. Then {{mvar|h}} is a minimal perfect hash function if and only if {{math|1=''h''(''j'') = ''h''(''k'')}} implies {{math|1=''j'' = ''k''}} ([[injectivity]]) and there exists an integer {{mvar|a}} such that the range of {{mvar|h}} is {{math|1=''a''..''a'' + {{!}}''S''{{!}} − 1}}. It has been proven that a general purpose minimal perfect hash scheme requires at least
| last1 = Belazzougui | first1 = Djamal
| last2 = Botelho | first2 = Fabiano C.
| last3 = Dietzfelbinger | first3 = Martin
| contribution = Hash, displace, and compress
| contribution-url =
| doi = 10.1007/978-3-642-04128-0_61
| ___location = Berlin
Line 130 ⟶ 126:
| publisher = Springer
| series = [[Lecture Notes in Computer Science]]
| title = Algorithms - ESA 2009
| volume = 5757
| isbn = 978-3-642-04127-3
| year = 2009| citeseerx = 10.1.1.568.130
| url =
}}.</ref> Assuming that <math>S</math> is a set of size <math>n</math> containing integers in the range <math>[1, 2^{o(n)}]</math>, it is known how to efficiently construct an explicit minimal perfect hash function from <math>S</math> to <math>\{1, 2, \ldots, n\}</math> that uses space <math>n \log_2 e + o(n)</math>bits and that supports constant evaluation time.<ref>{{Citation |last1=Hagerup |first1=Torben |title=Efficient Minimal Perfect Hashing in Nearly Minimal Space |date=2001 |url=http://dx.doi.org/10.1007/3-540-44693-1_28 |work=STACS 2001 |pages=317–326 |access-date=2023-11-12 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |isbn=978-3-540-41695-1 |last2=Tholey |first2=Torsten|doi=10.1007/3-540-44693-1_28 |url-access=subscription }}</ref> In practice, there are minimal perfect hashing schemes that use roughly 1.56 bits/key if given enough time.<ref name="RecSplit">{{citation
| last1 = Esposito | first1 = Emmanuel
| last2 = Mueller Graf | first2 = Thomas
Line 146 ⟶ 143:
| arxiv = 1910.06416
| doi-access = free
}}.</ref><ref>[https://github.com/iwiwi/minimal-perfect-hash minimal-perfect-hash (GitHub)]</ref>
===k-perfect hashing===
Line 158 ⟶ 155:
===Order preservation===
A minimal perfect hash function {{mvar|F}} is ''order preserving'' if keys are given in some order {{math|''a''<sub>1</sub>, ''a''<sub>2</sub>, ..., ''a''<sub>''n''</sub>}} and for any keys {{math|''a''<sub>''j''</sub>}} and {{math|''a''<sub>''k''</sub>}}, {{math|''j'' < ''k''}} implies {{math|''F''(''a''<sub>''j''</sub>) < F(''a''<sub>''k''</sub>)}}.<ref>{{Citation |first=Bob |last=Jenkins |contribution=order-preserving minimal perfect hashing |title=Dictionary of Algorithms and Data Structures |editor-first=Paul E. |editor-last=Black |publisher=U.S. National Institute of Standards and Technology |date=14 April 2009 |accessdate=2013-03-05 |url=https://xlinux.nist.gov/dads/HTML/orderPreservMinPerfectHash.html}}</ref> In this case, the function value is just the position of each key in the sorted ordering of all of the keys. A simple implementation of order-preserving minimal perfect hash functions with constant access time is to use an (ordinary) perfect hash function to store a lookup table of the positions of each key. This solution uses <math>O(n \log n)</math> bits, which is optimal in the setting where the comparison function for the keys may be arbitrary.<ref>{{citation |last1=Fox |first1=Edward A. |title=Order-preserving minimal perfect hash functions and information retrieval |date=July 1991 |url=http://eprints.cs.vt.edu/archive/00000248/01/TR-91-01.pdf |journal=ACM Transactions on Information Systems |volume=9 |issue=3 |pages=281–308 |___location=New York, NY, USA |publisher=ACM |doi=10.1145/125187.125200 |s2cid=53239140 |last2=Chen |first2=Qi Fan |last3=Daoud |first3=Amjad M. |last4=Heath |first4=Lenwood S.}}.</ref> However, if the keys {{math|''a''<sub>1</sub>, ''a''<sub>2</sub>, ..., ''a''<sub>''n''</sub>}} are integers drawn from a universe <math>\{1, 2, \ldots, U\}</math>, then it is possible to construct an order-preserving hash function using only <math>O(n \log \log \log U)</math> bits of space.<ref>{{citation |last1=Belazzougui |first1=Djamal |title=Theory and practice of monotone minimal perfect hashing |date=November 2008 |journal=Journal of Experimental Algorithmics |volume=16 |at=Art. no. 3.2, 26pp |doi=10.1145/1963190.2025378 |s2cid=2367401 |last2=Boldi |first2=Paolo |last3=Pagh |first3=Rasmus |last4=Vigna |first4=Sebastiano |author3-link=Rasmus Pagh}}.</ref> Moreover, this bound is known to be optimal.<ref>{{Citation |
==Related constructions==
Line 193 ⟶ 190:
* Marshall D. Brain and Alan L. Tharp. "Near-perfect Hashing of Large Word Sets". Software—Practice and Experience, vol. 19(10), 967-078, October 1989. John Wiley & Sons.
* Douglas C. Schmidt, [http://www.dre.vanderbilt.edu/~schmidt/PDF/gperf.pdf GPERF: A Perfect Hash Function Generator], C++ Report, SIGS, Vol. 10, No. 10, November/December, 1998.
* Hans-Peter Lehmann, Thomas Mueller, Rasmus Pagh, Giulio Ermanno Pibiri, Peter Sanders, Sebastiano Vigna, Stefan Walzer, "Modern Minimal Perfect Hashing: A Survey", {{arxiv|2506.06536}}, June 2025. Discusses post-1997 developments in the field.
==External links==
*[https://www.gnu.org/software/gperf/ gperf] is an [[
*[http://burtleburtle.net/bob/hash/perfect.html Minimal Perfect Hashing (bob algorithm)] by Bob Jenkins
*[
*[http://sux.di.unimi.it/ Sux4J]: open source monotone minimal perfect hashing in Java
*[https://web.archive.org/web/20130729211948/http://www.dupuis.me/node/9 MPHSharp]: perfect hashing methods in C#
|