Content deleted Content added
WhackTheWiki (talk | contribs) m →Variations: fix minor comma issue, and tighten/simplify prose for improved clarity |
m →External links: HTTP to HTTPS for Blogspot |
||
(27 intermediate revisions by 14 users not shown) | |||
Line 12:
===Lookup===
Cuckoo hashing uses two hash tables, <math>T_1</math> and <math>T_2</math>. Assuming <math>r</math> is the length of each table, the hash functions for the two tables is defined as, <math>h_1,\ h_2\ :\
{|
|- style="vertical-align:top"
Line 23:
===Deletion===
Deletion is performed in <math>O(1)</math> time since
===Insertion===
{|
|- style="vertical-align:top"
Line 50:
19 '''end function'''
|}
On lines 10 and 15, the "cuckoo approach" of kicking other
==Theory==
Line 57:
One method of proving this uses the theory of [[random graph]]s: one may form an [[undirected graph]] called the "cuckoo graph" that has a vertex for each hash table ___location, and an edge for each hashed value, with the endpoints of the edge being the two possible locations of the value. Then, the greedy insertion algorithm for adding a set of values to a cuckoo hash table succeeds if and only if the cuckoo graph for this set of values is a [[pseudoforest]], a graph with at most one cycle in each of its [[connected component (graph theory)|connected component]]s. Any vertex-induced subgraph with more edges than vertices corresponds to a set of keys for which there are an insufficient number of slots in the hash table. When the hash function is chosen randomly, the cuckoo graph is a [[random graph]] in the [[Erdős–Rényi model]]. With high probability, for load factor less than 1/2 (corresponding to a random graph in which the ratio of the number of edges to the number of vertices is bounded below 1/2), the graph is a pseudoforest and the cuckoo hashing algorithm succeeds in placing all keys. The same theory also proves that the expected size of a [[Connected component (graph theory)|connected component]] of the cuckoo graph is small, ensuring that each insertion takes constant expected time. However, also with high probability, a load factor greater than 1/2 will lead to a [[giant component]] with two or more cycles, causing the data structure to fail and need to be resized.<ref>{{Cite conference|first=Reinhard|last=Kutzelnigg|url=https://www.dmtcs.org/dmtcs-ojs/index.php/proceedings/article/viewFile/dmAG0133/1710.pdf|conference=Fourth Colloquium on Mathematics and Computer Science|title=Bipartite random graphs and cuckoo hashing|series=Discrete Mathematics and Theoretical Computer Science|year=2006|volume=AG|pages=403–406}}</ref>
Since a theoretical random hash function requires too much space for practical usage, an important theoretical question is which practical hash functions suffice for Cuckoo hashing. One approach is to use [[k-independent hashing]]. In 2009 it was shown<ref>Cohen, Jeffrey S., and Daniel M. Kane. "Bounds on the independence required for cuckoo hashing." ACM Transactions on Algorithms (2009).</ref> that <math>O(\log n)</math>-independence suffices, and at least 6-independence is needed. Another approach is to use [[
A third approach from 2014<ref>Aumüller, Martin, Martin Dietzfelbinger, and Philipp Woelfel. "Explicit and efficient hash families suffice for cuckoo hashing with a stash." Algorithmica 70.3 (2014): 428-456.</ref> is to slightly modify the cuckoo hashtable with a so-called stash, which makes it possible to use nothing more than 2-independent hash functions.
Line 63:
In practice, cuckoo hashing is about 20–30% slower than [[linear probing]], which is the fastest of the common approaches.<ref name="Cuckoo"/>
The reason is that cuckoo hashing often causes two cache misses per search, to check the two locations where a key might be stored, while linear probing usually causes only one cache miss per search.
However, because of its worst case guarantees on search time, cuckoo hashing can still be valuable when [[Real-time computing|real-time response rates]] are required
==Example==
The following hash functions are given (the two least significant digits of k in base 11):
<math>h\left(k\right)=k\bmod 11</math><br/>
<math>h'\left(k\right)=\left\lfloor\frac{k}{11}\right\rfloor\bmod 11</math>
The following two tables show the insertion of some example elements. Each column corresponds to the state of the two hash tables over time. The possible insertion locations for each new value are highlighted. The last column illustrates a failed insertion due to a cycle, details below.
<div style="float:left; margin-right:1em">
{| class="wikitable"
|+
|-
|colspan=2|
!colspan=10|
|-
!colspan=2|
|
|-▼
!colspan=2| Key inserted
| 53 || 50 || 20 || 75 || 100 || 67 || 105 || 3 || 36 || 45
|-
!colspan=2| h(k)
| 9 || 6 || 9 || 9 || 1 || 1 || 6 || 3 || 3 ||
|-
! rowspan=11 {{vert header|va=middle|Hash table entries}}
Line 91 ⟶ 94:
|-
! 1
| || || || ||bgcolor=#ffffcc| 100 ||bgcolor=#ffffcc| 67 || 67 || 67 || 67 ||bgcolor=#ffffcc|
|-
! 2
Line 106 ⟶ 109:
|-
! 6
| ||bgcolor=#ffffcc| 50 || 50 || 50 || 50 || 50 ||bgcolor=#ffffcc| 105 || 105 || 105 |
|-
! 7
Line 115 ⟶ 118:
|-
! 9
|bgcolor=#ffffcc|
|-
! 10
Line 124 ⟶ 127:
<div style="float:left">
{| class="wikitable"
|+
|-
|colspan=2|
!colspan=10|
|-
!colspan=2|
|
|-▼
!colspan=2| Key inserted
| 53 || 50 || 20 || 75 || 100 || 67 || 105 || 3 || 36 || 45
|-
!colspan=2| h′(k)
|
|-
! rowspan=11 {{vert header|va=middle|Hash table entries}}
Line 140 ⟶ 146:
|-
! 1
|
|-
! 2
Line 146 ⟶ 152:
|-
! 3
| || || || || || || || ||bgcolor=#ffffcc| |
|-
! 4
|-
! 5
Line 155 ⟶ 161:
|-
! 6
| || || ||bgcolor=#ffffcc| || ||bgcolor=#ffffcc| || 75 || 75 || 75 ||
|-
! 7
Line 164 ⟶ 170:
|-
! 9
| || || || ||bgcolor=#ffffcc| || 100 ||bgcolor=#ffffcc| 100 || 100 || 100 ||
|-
! 10
Line 173 ⟶ 179:
===Cycle===
If you
<math>h\left(
<math>h'\left(
{| class="wikitable"
|-
!
!
|-
|
|-
|
|-
|
|-
|
|-
|
|-
|
▲|-
▲|-
|-
|
|-
|
|}
Line 210:
Generalizations of cuckoo hashing that use more than two alternative hash functions can be expected to utilize a larger part of the capacity of the hash table efficiently while sacrificing some lookup and insertion speed. Using just three hash functions increases the load to 91%.<ref name="mitzenmacher2009survey">{{cite journal | last=Mitzenmacher | first=Michael | author-link=Michael Mitzenmacher |title=Some Open Questions Related to Cuckoo Hashing |journal=Proceedings of ESA 2009 | date=2009-09-09 |url=http://www.eecs.harvard.edu/~michaelm/postscripts/esa2009.pdf | access-date=2010-11-10 }}</ref>
Another generalization of cuckoo hashing called ''blocked cuckoo hashing'' uses more than one key per bucket
| last1 = Dietzfelbinger | first1 = Martin
| last2 = Weidling | first2 = Christoph
Line 270:
A survey by [[Michael Mitzenmacher|Mitzenmacher]]<ref name="mitzenmacher2009survey" /> presents open problems related to cuckoo hashing as of 2009.
==
Cuckoo
The TikTok recommendation system "Monolith" takes advantage cuckoo hashing's collision resolution to prevent different concepts from being mapped to the same vectors.<ref>{{
==See also==
Line 284:
==External links==
* [http://www.ru.is/faculty/ulfar/CuckooHash.pdf A cool and practical alternative to traditional hash tables] {{Webarchive|url=https://web.archive.org/web/20190407085017/http://www.ru.is/faculty/ulfar/CuckooHash.pdf |date=2019-04-07 }}, U. Erlingsson, M. Manasse, F. Mcsherry, 2006.
* [http://www.itu.dk/people/pagh/papers/cuckoo-undergrad.pdf Cuckoo Hashing for Undergraduates, 2006], R. Pagh, 2006.
* [
* {{cite conference |last=Naor |first=Moni |author2=Segev, Gil |author3=Wieder, Udi |title=History-Independent Cuckoo Hashing |book-title=International Colloquium on Automata, Languages and Programming (ICALP) |place=Reykjavik, Iceland |year=2008 |url=http://www.wisdom.weizmann.ac.il/~naor/PAPERS/cuckoo_hi_abs.html |access-date=2008-07-21}}
* [http://www.cs.princeton.edu/~mfreed/docs/cuckoo-eurosys14.pdf Algorithmic Improvements for Fast Concurrent Cuckoo Hashing], X. Li, D. Andersen, M. Kaminsky, M. Freedman. EuroSys 2014.
Line 292:
===Examples===
*[https://github.com/efficient/libcuckoo Concurrent high-performance Cuckoo hashtable written in C++]
* [
* [http://www.theiling.de/projects/lookuptable.html Static cuckoo hashtable generator for C/C++]
* [http://hackage.haskell.org/packages/archive/hashtables/latest/doc/html/Data-HashTable-ST-Cuckoo.html Cuckoo hash table written in Haskell]
|