Content deleted Content added
Tags: Reverted possible vandalism |
m →External links: HTTP to HTTPS for Blogspot |
||
(30 intermediate revisions by 15 users not shown) | |||
Line 8:
==Operations==
Cuckoo hashing is a form of [[open addressing]] in which each non-empty cell of a [[hash table]] contains a [[unique key|key]] or [[name–value pair|key–value pair]]. A [[hash function]] is used to determine the ___location for each key, and its presence in the table (or the value associated with it) can be found by examining that cell of the table. However, open addressing suffers from [[hash collision|collisions]], which happens when more than one key is mapped to the same cell.
The basic idea of cuckoo hashing is to resolve collisions by using two hash functions instead of only one. This provides two possible locations in the hash table for each key. In one of the commonly used variants of the algorithm, the hash table is split into two smaller tables of equal size, and each hash function provides an index into one of these two tables. It is also possible for both hash functions to provide indexes into a single table.{{r|Cuckoo|p=121-122}}
===Lookup===
Cuckoo hashing uses two hash tables, <math>T_1</math> and <math>T_2</math>. Assuming <math>r</math> is the length of each table, the hash functions for the two tables is defined as, <math>h_1,\ h_2\ :\
{|
|- style="vertical-align:top"
Line 24 ⟶ 23:
===Deletion===
Deletion is performed in <math>O(1)</math> time since
===Insertion===
{|
|- style="vertical-align:top"
Line 51 ⟶ 50:
19 '''end function'''
|}
On lines 10 and 15, the "cuckoo approach" of kicking other
==Theory==
Line 58 ⟶ 57:
One method of proving this uses the theory of [[random graph]]s: one may form an [[undirected graph]] called the "cuckoo graph" that has a vertex for each hash table ___location, and an edge for each hashed value, with the endpoints of the edge being the two possible locations of the value. Then, the greedy insertion algorithm for adding a set of values to a cuckoo hash table succeeds if and only if the cuckoo graph for this set of values is a [[pseudoforest]], a graph with at most one cycle in each of its [[connected component (graph theory)|connected component]]s. Any vertex-induced subgraph with more edges than vertices corresponds to a set of keys for which there are an insufficient number of slots in the hash table. When the hash function is chosen randomly, the cuckoo graph is a [[random graph]] in the [[Erdős–Rényi model]]. With high probability, for load factor less than 1/2 (corresponding to a random graph in which the ratio of the number of edges to the number of vertices is bounded below 1/2), the graph is a pseudoforest and the cuckoo hashing algorithm succeeds in placing all keys. The same theory also proves that the expected size of a [[Connected component (graph theory)|connected component]] of the cuckoo graph is small, ensuring that each insertion takes constant expected time. However, also with high probability, a load factor greater than 1/2 will lead to a [[giant component]] with two or more cycles, causing the data structure to fail and need to be resized.<ref>{{Cite conference|first=Reinhard|last=Kutzelnigg|url=https://www.dmtcs.org/dmtcs-ojs/index.php/proceedings/article/viewFile/dmAG0133/1710.pdf|conference=Fourth Colloquium on Mathematics and Computer Science|title=Bipartite random graphs and cuckoo hashing|series=Discrete Mathematics and Theoretical Computer Science|year=2006|volume=AG|pages=403–406}}</ref>
Since a theoretical random hash function requires too much space for practical usage, an important theoretical question is which practical hash functions suffice for Cuckoo hashing. One approach is to use [[k-independent hashing]]. In 2009 it was shown<ref>Cohen, Jeffrey S., and Daniel M. Kane. "Bounds on the independence required for cuckoo hashing." ACM Transactions on Algorithms (2009).</ref> that <math>O(\log n)</math>-independence suffices, and at least 6-independence is needed. Another approach is to use [[
A third approach from 2014<ref>Aumüller, Martin, Martin Dietzfelbinger, and Philipp Woelfel. "Explicit and efficient hash families suffice for cuckoo hashing with a stash." Algorithmica 70.3 (2014): 428-456.</ref> is to slightly modify the cuckoo hashtable with a so-called stash, which makes it possible to use nothing more than 2-independent hash functions.
Line 64 ⟶ 63:
In practice, cuckoo hashing is about 20–30% slower than [[linear probing]], which is the fastest of the common approaches.<ref name="Cuckoo"/>
The reason is that cuckoo hashing often causes two cache misses per search, to check the two locations where a key might be stored, while linear probing usually causes only one cache miss per search.
However, because of its worst case guarantees on search time, cuckoo hashing can still be valuable when [[Real-time computing|real-time response rates]] are required
==Example==
The following hash functions are given (the two least significant digits of k in base 11):
<math>h\left(k\right)=k\bmod 11</math><br/>
<math>h'\left(k\right)=\left\lfloor\frac{k}{11}\right\rfloor\bmod 11</math>
The following two tables show the insertion of some example elements. Each column corresponds to the state of the two hash tables over time. The possible insertion locations for each new value are highlighted. The last column illustrates a failed insertion due to a cycle, details below.
<div style="float:left; margin-right:1em">
{| class="wikitable"
|+
|-
|colspan=2|
!colspan=10|
|-
!colspan=2|
|
|-▼
!colspan=2| Key inserted
| 53 || 50 || 20 || 75 || 100 || 67 || 105 || 3 || 36 || 45
|-
!colspan=2| h(k)
| 9 || 6 || 9 || 9 || 1 || 1 || 6 || 3 || 3 ||
|-
! rowspan=11 {{vert header|va=middle|Hash table entries}}
Line 92 ⟶ 94:
|-
! 1
| || || || ||bgcolor=#ffffcc| 100 ||bgcolor=#ffffcc| 67 || 67 || 67 || 67 ||bgcolor=#ffffcc|
|-
! 2
Line 107 ⟶ 109:
|-
! 6
| ||bgcolor=#ffffcc| 50 || 50 || 50 || 50 || 50 ||bgcolor=#ffffcc| 105 || 105 || 105 |
|-
! 7
Line 116 ⟶ 118:
|-
! 9
|bgcolor=#ffffcc|
|-
! 10
Line 125 ⟶ 127:
<div style="float:left">
{| class="wikitable"
|+
|-
|colspan=2|
!colspan=10|
|-
!colspan=2|
|
|-▼
!colspan=2| Key inserted
| 53 || 50 || 20 || 75 || 100 || 67 || 105 || 3 || 36 || 45
|-
!colspan=2| h′(k)
|
|-
! rowspan=11 {{vert header|va=middle|Hash table entries}}
Line 141 ⟶ 146:
|-
! 1
|
|-
! 2
Line 147 ⟶ 152:
|-
! 3
| || || || || || || || ||bgcolor=#ffffcc| |
|-
! 4
|-
! 5
Line 156 ⟶ 161:
|-
! 6
| || || ||bgcolor=#ffffcc| || ||bgcolor=#ffffcc| || 75 || 75 || 75 ||
|-
! 7
Line 165 ⟶ 170:
|-
! 9
| || || || ||bgcolor=#ffffcc| || 100 ||bgcolor=#ffffcc| 100 || 100 || 100 ||
|-
! 10
Line 174 ⟶ 179:
===Cycle===
If you
<math>h\left(
<math>h'\left(
{| class="wikitable"
|-
!
!
|-
|
|-
|
|-
|
|-
|
|-
|
|-
|
|-
|
|-
|
▲|-
▲|-
|}
Line 210 ⟶ 209:
Generalizations of cuckoo hashing that use more than two alternative hash functions can be expected to utilize a larger part of the capacity of the hash table efficiently while sacrificing some lookup and insertion speed. Using just three hash functions increases the load to 91%.<ref name="mitzenmacher2009survey">{{cite journal | last=Mitzenmacher | first=Michael | author-link=Michael Mitzenmacher |title=Some Open Questions Related to Cuckoo Hashing |journal=Proceedings of ESA 2009 | date=2009-09-09 |url=http://www.eecs.harvard.edu/~michaelm/postscripts/esa2009.pdf | access-date=2010-11-10 }}</ref>
Another generalization of cuckoo hashing
| last1 = Dietzfelbinger | first1 = Martin
| last2 = Weidling | first2 = Christoph
Line 270:
A survey by [[Michael Mitzenmacher|Mitzenmacher]]<ref name="mitzenmacher2009survey" /> presents open problems related to cuckoo hashing as of 2009.
==
Cuckoo
The TikTok recommendation system "Monolith" takes advantage cuckoo hashing's collision resolution to prevent different concepts from being mapped to the same vectors.<ref>{{
==See also==
Line 284:
==External links==
* [http://www.ru.is/faculty/ulfar/CuckooHash.pdf A cool and practical alternative to traditional hash tables] {{Webarchive|url=https://web.archive.org/web/20190407085017/http://www.ru.is/faculty/ulfar/CuckooHash.pdf |date=2019-04-07 }}, U. Erlingsson, M. Manasse, F. Mcsherry, 2006.
* [http://www.itu.dk/people/pagh/papers/cuckoo-undergrad.pdf Cuckoo Hashing for Undergraduates, 2006], R. Pagh, 2006.
* [
* {{cite conference |last=Naor |first=Moni |author2=Segev, Gil |author3=Wieder, Udi |title=History-Independent Cuckoo Hashing |book-title=International Colloquium on Automata, Languages and Programming (ICALP) |place=Reykjavik, Iceland |year=2008 |url=http://www.wisdom.weizmann.ac.il/~naor/PAPERS/cuckoo_hi_abs.html |access-date=2008-07-21}}
* [http://www.cs.princeton.edu/~mfreed/docs/cuckoo-eurosys14.pdf Algorithmic Improvements for Fast Concurrent Cuckoo Hashing], X. Li, D. Andersen, M. Kaminsky, M. Freedman. EuroSys 2014.
Line 292:
===Examples===
*[https://github.com/efficient/libcuckoo Concurrent high-performance Cuckoo hashtable written in C++]
* [
* [http://www.theiling.de/projects/lookuptable.html Static cuckoo hashtable generator for C/C++]
* [http://hackage.haskell.org/packages/archive/hashtables/latest/doc/html/Data-HashTable-ST-Cuckoo.html Cuckoo hash table written in Haskell]
|