{{Short description|Decentralized distributed system with lookup service}}
{{More citations needed|date=September 2020}}
A '''distributed hash table''' ('''DHT''') is a [[Distributed computing|distributed system]] that provides a lookup service similar to a [[hash table]]. [[Key–value pair]]s are stored in a DHT, and any participating [[node (networking)|node]] can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimal work to redistribute keys. ''Keys'' are unique identifiers which map to particular ''values'', which in turn can be anything from addresses, to [[Electronic document|documents]], to arbitrary [[Data (computing)|data]].<ref name=StoicaEtAl2001>{{Cite journal | last1 = Stoica | first1 = I. | author-link1 = Ion Stoica | last2 = Morris | first2 = R. | last3 = Karger | first3 = D. | author-link3 = David Karger | last4 = Kaashoek | first4 = M. F. | last5 = Balakrishnan | first5 = H. | author-link5 = Hari Balakrishnan | title = Chord: A scalable peer-to-peer lookup service for internet applications | doi = 10.1145/964723.383071 | journal = ACM SIGCOMM Computer Communication Review | volume = 31 | issue = 4 | pages = 149 | year = 2001 | url = http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf | quote = A value can be an address, a document, or an arbitrary data item. | access-date = 2018-09-18 | archive-date = 2023-07-07 | archive-url = https://web.archive.org/web/20230707080145/https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf | url-status = live }}</ref> Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to [[scale (computing)|scale]] to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
 
DHTs form an infrastructure that can be used to build more complex services, such as [[anycast]], cooperative [[Web cache|web caching]], [[distributed file system]]s, [[Domain name system|___domain name services]], [[instant messaging]], [[multicast]], and also [[peer-to-peer file sharing]] and [[content distribution]] systems. Notable distributed networks that use DHTs include [[BitTorrent (protocol)|BitTorrent]]'s distributed tracker, the [[Kad network]], the [[Storm botnet]], the [[Tox (protocol)|Tox instant messenger]], [[Freenet]], the [[YaCy]] search engine, and the [[InterPlanetary File System]].
 
== History ==
DHT research was originally motivated, in part, by [[peer-to-peer]] (P2P) systems such as [[Freenet]], [[Gnutella]], [[BitTorrent]] and [[Napster]], which took advantage of resources distributed across the Internet to provide a single useful application. In particular, they exploited increased [[Bandwidth (computing)|bandwidth]] and [[hard disk]] capacity to provide a file-sharing service.<ref>{{cite journal |last1=Lua |first1=Eng Keong |last2=Crowcroft |first2=Jon |display-authors=et al |title=A survey and comparison of peer-to-peer overlay network schemes |journal=IEEE Communications Surveys & Tutorials |date=2005 |volume=7 |issue=2 |pages=72–93 |doi=10.1109/COMST.2005.1610546 |citeseerx=10.1.1.109.6124 |s2cid=7971188 |url=http://www.cl.cam.ac.uk/teaching/2005/AdvSysTop/survey.pdf |access-date=2019-09-24 |archive-date=2023-10-05 |archive-url=https://web.archive.org/web/20231005035522/https://www.cl.cam.ac.uk/teaching/2005/AdvSysTop/survey.pdf |url-status=live }}</ref>
 
These systems differed in how they located the data offered by their peers. Napster, the first large-scale P2P content delivery system, required a central index server: each node, upon joining, would send a list of locally held files to the server, which would perform searches and refer the queries to the nodes that held the results. This central component left the system vulnerable to attacks and lawsuits.
Distributed hash tables use a more structured key-based routing in order to attain both the decentralization of Freenet and Gnutella, and the efficiency and guaranteed results of Napster. One drawback is that, like Freenet, DHTs only directly support exact-match search, rather than keyword search, although Freenet's [[routing algorithm]] can be generalized to any key type where a closeness operation can be defined.<ref>{{citation |chapter-url=https://freenetproject.org/papers/ddisrs.pdf |title=A Distributed Decentralized Information Storage and Retrieval System |chapter=Section 5.2.2 |access-date=2012-01-10 |archive-date=2012-03-16 |archive-url=https://web.archive.org/web/20120316102156/https://freenetproject.org/papers/ddisrs.pdf |url-status=dead }}</ref>
 
In 2001, four systems&mdash;[[Content addressable network|CAN]],<ref name = "Ratnasamy01">{{cite journal |title=A Scalable Content-Addressable Network |publisher=In Proceedings of ACM SIGCOMM 2001 |author=Ratnasamy |year=2001 |url=http://www.eecs.berkeley.edu/~sylvia/papers/cans.pdf |access-date=2013-05-20 |display-authors=etal |archive-date=2016-03-04 |archive-url=https://web.archive.org/web/20160304200527/http://www.eecs.berkeley.edu/~sylvia/papers/cans.pdf |url-status=live }}</ref> [[Chord (peer-to-peer)|Chord]],<ref>[[Hari Balakrishnan]], [[M. Frans Kaashoek]], David Karger, [[Robert Tappan Morris|Robert Morris]], and Ion Stoica. [http://www.cs.berkeley.edu/~istoica/papers/2003/cacm03.pdf Looking up data in P2P systems] {{Webarchive|url=https://web.archive.org/web/20160519125101/http://www.cs.berkeley.edu/~istoica/papers/2003/cacm03.pdf |date=2016-05-19 }}. In [[Communications of the ACM]], February 2003.</ref> [[Pastry (DHT)|Pastry]], and [[Tapestry (DHT)|Tapestry]]&mdash;ignited DHTs as a popular research topic.
A project called the Infrastructure for Resilient Internet Systems (IRIS) was funded by a $12 million grant from the United States [[National Science Foundation]] in 2002.<ref>{{Cite news |title= New P2P network funded by US government |author= David Cohen |work= New Scientist |date= October 1, 2002 |url= https://www.newscientist.com/article.ns?id=dn2861 |access-date= November 10, 2013 |archive-date= April 6, 2008 |archive-url= https://web.archive.org/web/20080406123915/http://www.newscientist.com/article.ns?id=dn2861 |url-status= live }}</ref>
Researchers included [[Sylvia Ratnasamy]], [[Ion Stoica]], [[Hari Balakrishnan]] and [[Scott Shenker]].<ref>{{Cite news |title= MIT, Berkeley, ICSI, NYU, and Rice Launch the IRIS Project |work= Press release |publisher= MIT |date= September 25, 2002 |url= https://iris.pdos.csail.mit.edu/MITPressRelease1.doc |access-date= November 10, 2013 |url-status= dead |archive-url= https://web.archive.org/web/20150926070618/https://iris.pdos.csail.mit.edu/MITPressRelease1.doc |archive-date= September 26, 2015 }}</ref>
Outside academia, DHT technology has been adopted as a component of BitTorrent and in the Coral Content Distribution Network.
 
* [[Decentralized computing|Autonomy and decentralization]]: The nodes collectively form the system without any central coordination.
* [[Fault tolerance]]: The system should be reliable (in some sense) even with nodes continuously joining, leaving, and failing.<ref>R Mokadem, A Hameurlain and AM Tjoa. [https://www.irit.fr/~Riad.Mokadem/wp-content/uploads/sites/67/2020/12/Resource-discovery-service-while-minimizing-maintenance-overhead-in-hierarchical-DHT-systems-iiWas2010.pdf Resource discovery service while minimizing maintenance overhead in hierarchical DHT systems] {{Webarchive|url=https://web.archive.org/web/20220809224052/https://www.irit.fr/~Riad.Mokadem/wp-content/uploads/sites/67/2020/12/Resource-discovery-service-while-minimizing-maintenance-overhead-in-hierarchical-DHT-systems-iiWas2010.pdf |date=2022-08-09 }}. Proc. iiWas, 2010</ref>
* [[scale (computing)|Scalability]]: The system should function efficiently even with thousands or millions of nodes.
 
A key technique used to achieve these goals is that any one node needs to coordinate with only a few other nodes in the system – most commonly, [[Big O notation|O]](log ''n'') of the ''n'' participants (see below) – so that only a limited amount of work needs to be done for each change in membership.
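For example, in a network of <math>n = 2^{20}</math> (roughly one million) nodes, an <math>O(\log n)</math> design means each node needs to keep track of only on the order of <math>\log_2 n = 20</math> other nodes, and a lookup typically traverses a comparable number of hops, rather than every node having to know about every other participant.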
 
Some DHT designs seek to be [[Secure communication|secure]] against malicious participants<ref>Guido Urdaneta, Guillaume Pierre and Maarten van Steen. [http://www.globule.org/publi/SDST_acmcs2009.html A Survey of DHT Security Techniques] {{Webarchive|url=https://web.archive.org/web/20230601193203/http://www.globule.org/publi/SDST_acmcs2009.html |date=2023-06-01 }}. ACM Computing Surveys 43(2), January 2011.</ref> and to allow participants to remain [[Anonymity|anonymous]], though this is less common than in many other peer-to-peer (especially [[file sharing]]) systems; see [[anonymous P2P]].
 
== Structure ==
The structure of a DHT can be decomposed into several main components.<ref>Moni Naor and Udi Wieder. [http://www.wisdom.weizmann.ac.il/~naor/PAPERS/dh.pdf Novel Architectures for P2P Applications: the Continuous-Discrete Approach] {{Webarchive|url=https://web.archive.org/web/20191209032152/http://www.wisdom.weizmann.ac.il/~naor/PAPERS/dh.pdf |date=2019-12-09 }}. Proc. SPAA, 2003.</ref><ref>Gurmeet Singh Manku. [http://www-db.stanford.edu/~manku/phd/index.html Dipsea: A Modular Distributed Hash Table] {{webarchive|url=https://web.archive.org/web/20040910154927/http://www-db.stanford.edu/~manku/phd/index.html |date=2004-09-10 }}. Ph. D. Thesis (Stanford University), August 2004.</ref> The foundation is an abstract [[Keyspace (distributed data store)|keyspace]], such as the set of 160-bit [[string (computer science)|string]]s. A keyspace [[Partition (database)|partitioning]] scheme splits ownership of this keyspace among the participating nodes. An [[overlay network]] then connects the nodes, allowing them to find the owner of any given key in the keyspace.
 
Once these components are in place, a typical use of the DHT for storage and retrieval might proceed as follows. Suppose the keyspace is the set of 160-bit strings. To index a file with given {{mvar|filename}} and {{mvar|data}} in the DHT, the [[SHA-1]] hash of {{mvar|filename}} is generated, producing a 160-bit key {{mvar|k}}, and a message {{math|''put''(''k, data'')}} is sent to any node participating in the DHT. The message is forwarded from node to node through the overlay network until it reaches the single node responsible for key {{mvar|k}} as specified by the keyspace partitioning. That node then stores the key and the data. Any other client can then retrieve the contents of the file by again hashing {{mvar|filename}} to produce {{mvar|k}} and asking any DHT node to find the data associated with {{mvar|k}} with a message {{math|''get''(''k'')}}. The message will again be routed through the overlay to the node responsible for {{mvar|k}}, which will reply with the stored {{mvar|data}}.
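A minimal, single-process sketch of this flow is shown below. The node identifiers, data structures, and function names are illustrative assumptions only; a real DHT keeps this state on separate networked nodes and forwards the {{math|''put''}} and {{math|''get''}} messages through the overlay rather than consulting a local list.

<syntaxhighlight lang="python">
import hashlib
from bisect import bisect_right

def sha1_key(name: str) -> int:
    """Hash a filename to a 160-bit integer key, as described above."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

class ToyDHT:
    """Single-process stand-in for a DHT: all 'nodes' live in one dictionary."""
    def __init__(self, node_ids):
        self.ring = sorted(node_ids)                  # node identifiers placed on the key ring
        self.store = {nid: {} for nid in self.ring}   # each node's local key/value store

    def _responsible(self, k: int) -> int:
        # Keyspace partitioning: the first node whose ID follows k (wrapping around) owns k.
        i = bisect_right(self.ring, k)
        return self.ring[i % len(self.ring)]

    def put(self, filename: str, data: bytes) -> None:
        k = sha1_key(filename)
        self.store[self._responsible(k)][k] = data    # in a real DHT this is a routed message

    def get(self, filename: str):
        k = sha1_key(filename)
        return self.store[self._responsible(k)].get(k)

dht = ToyDHT(node_ids=[sha1_key(f"node-{i}") for i in range(8)])
dht.put("example-file.txt", b"contents")
assert dht.get("example-file.txt") == b"contents"
</syntaxhighlight>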
{{further|Locality-preserving hashing}}
 
Locality-preserving hashing ensures that similar keys are assigned to similar objects. This can enable more efficient execution of range queries; however, in contrast to consistent hashing, there is no longer any assurance that the keys (and thus the load) are uniformly and randomly distributed over the key space and the participating peers. DHT protocols such as Self-Chord and Oscar<ref>{{Cite journal|last1=Girdzijauskas|first1=Šarūnas|last2=Datta|first2=Anwitaman|last3=Aberer|first3=Karl|date=2010-02-01|title=Structured overlay for heterogeneous environments|journal=ACM Transactions on Autonomous and Adaptive Systems|volume=5|issue=1|pages=1–25|doi=10.1145/1671948.1671950|s2cid=13218263|issn=1556-4665|url=http://infoscience.epfl.ch/record/134972|access-date=2020-03-12|archive-date=2020-07-12|archive-url=https://web.archive.org/web/20200712230838/https://infoscience.epfl.ch/record/134972|url-status=live}}</ref> address such issues. Self-Chord decouples object keys from peer IDs and sorts keys along the ring with a statistical approach based on the [[swarm intelligence]] paradigm.<ref>{{cite journal |last1=Forestiero |first1=Agostino |last2=Leonardi |first2=Emilio |last3=Mastroianni |first3=Carlo |last4=Meo |first4=Michela |title=Self-Chord: A Bio-Inspired P2P Framework for Self-Organizing Distributed Systems |journal=IEEE/ACM Transactions on Networking |date=October 2010 |volume=18 |issue=5 |pages=1651–1664 |doi=10.1109/TNET.2010.2046745 |s2cid=14797120 |url=http://porto.polito.it/2370172/ |access-date=2019-07-28 |archive-date=2012-07-01 |archive-url=https://web.archive.org/web/20120701163158/http://porto.polito.it/2370172/ |url-status=live }}</ref> Sorting ensures that similar keys are stored by neighbour nodes and that discovery procedures, including [[Range query (data structures)|range queries]], can be performed in logarithmic time. Oscar constructs a navigable [[small-world network]] based on [[random walk]] sampling, also assuring logarithmic search time.
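The sketch below illustrates why an order-preserving key mapping helps range queries: only the nodes whose partitions overlap the requested interval need to be contacted. It is not the Self-Chord or Oscar algorithm; the one-dimensional keyspace, the equal-sized partitions, and the function names are assumptions made for the example only.

<syntaxhighlight lang="python">
KEYSPACE = 2**32          # assumed keyspace size for this example

def locality_hash(value: int, lo: int = 0, hi: int = 2**40) -> int:
    """Order-preserving mapping: nearby input values land on nearby keyspace positions."""
    return (value - lo) * (KEYSPACE - 1) // (hi - lo)

def nodes_for_range(start: int, end: int, partitions):
    """A range query only needs to visit nodes whose partitions overlap [start, end]."""
    k_start, k_end = locality_hash(start), locality_hash(end)
    return [node for node, (p_lo, p_hi) in enumerate(partitions)
            if p_lo <= k_end and p_hi >= k_start]

# Eight nodes owning equal, contiguous slices of the keyspace. Uniform load is assumed here,
# which is exactly the property that is no longer guaranteed without consistent hashing.
width = KEYSPACE // 8
partitions = [(i * width, (i + 1) * width - 1) for i in range(8)]

print(nodes_for_range(10_000, 20_000, partitions))   # a narrow range touches only a few nodes
</syntaxhighlight>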
 
=== Overlay network ===
|}
 
The most common choice, <math>O(\log n)</math> degree/route length, is not optimal in terms of degree/route length tradeoff, but such topologies typically allow more flexibility in choice of neighbors. Many DHTs use that flexibility to pick neighbors that are close in terms of latency in the physical underlying network. In general, all DHTs construct navigable small-world network topologies, which trade off route length against network degree.<ref>{{Cite book|url=https://infoscience.epfl.ch/record/130838?ln=en|title=Designing peer-to-peer overlays a small-world perspective|last=Girdzijauskas|first=Sarunas|date=2009|website=epfl.ch|publisher=EPFL|access-date=2019-11-11|archive-date=2020-03-03|archive-url=https://web.archive.org/web/20200303182938/https://infoscience.epfl.ch/record/130838?ln=en|url-status=live}}</ref>
 
Maximum route length is closely related to [[Diameter (graph theory)|diameter]]: the maximum number of hops in any shortest path between nodes. Clearly, the network's worst case route length is at least as large as its diameter, so DHTs are limited by the degree/diameter tradeoff<ref>{{citation |url=http://maite71.upc.es/grup_de_grafs/table_g.html |title=The (Degree, Diameter) Problem for Graphs |publisher=Maite71.upc.es |access-date=2012-01-10 |archive-url=https://web.archive.org/web/20120217054532/http://maite71.upc.es/grup_de_grafs/table_g.html/ |archive-date=2012-02-17 |url-status=dead }}</ref> that is fundamental in [[graph theory]]. Route length can be greater than diameter, since the greedy routing algorithm may not find shortest paths.<ref>Gurmeet Singh Manku, Moni Naor, and Udi Wieder. [http://citeseer.ist.psu.edu/naor04know.html "Know thy Neighbor's Neighbor: the Power of Lookahead in Randomized P2P Networks"] {{Webarchive|url=https://web.archive.org/web/20080420030133/http://citeseer.ist.psu.edu/naor04know.html |date=2008-04-20 }}. Proc. STOC, 2004.</ref>
 
=== Algorithms for overlay networks ===
Aside from routing, there exist many algorithms that exploit the structure of the overlay network for sending a message to all nodes, or a subset of nodes, in a DHT.<ref>{{cite web|author=[[Ali Ghodsi]]|url=http://www.sics.se/~ali/thesis/|title= Distributed k-ary System: Algorithms for Distributed Hash Tables |archive-url=https://web.archive.org/web/20070522060750/http://www.sics.se/~ali/thesis/ |archive-date=22 May 2007|date=22 May 2007 |url-status=dead}}. KTH-Royal Institute of Technology, 2006.</ref> These algorithms are used by applications to do [[overlay multicast]], range queries, or to collect statistics. Two systems that are based on this approach are Structella,<ref>{{cite journal |last1=Castro |first1=Miguel |last2=Costa |first2=Manuel |last3=Rowstron |first3=Antony |title=Should we build Gnutella on a structured overlay? |journal=ACM SIGCOMM Computer Communication Review |date=1 January 2004 |volume=34 |issue=1 |pages=131 |doi=10.1145/972374.972397 |citeseerx=10.1.1.221.7892 |s2cid=6587291 |url=http://nms.lcs.mit.edu/HotNets-II/papers/structella.pdf |access-date=25 September 2019 |archive-date=14 February 2021 |archive-url=https://web.archive.org/web/20210214011937/http://nms.lcs.mit.edu/HotNets-II/papers/structella.pdf |url-status=live }}</ref> which implements flooding and random walks on a Pastry overlay, and DQ-DHT, which implements a dynamic querying search algorithm over a Chord network.<ref>{{cite journal |last1=Talia |first1=Domenico |last2=Trunfio |first2=Paolo |title=Enabling Dynamic Querying over Distributed Hash Tables |journal=Journal of Parallel and Distributed Computing |date=December 2010 |volume=70 |issue=12 |pages=1254–1265 |doi=10.1016/j.jpdc.2010.08.012 }}</ref>
 
== Security ==
</ref>
 
A DHT system that is carefully designed to have [[Byzantine fault tolerance]] can defend against a security weakness, known as the [[Sybil attack]], which affects most current DHT designs.<ref>Maxwell Young; Aniket Kate; Ian Goldberg; Martin Karsten.
[http://www.cypherpunks.ca/~iang/pubs/robustMessagePassing.pdf "Practical Robust Communication in DHTs Tolerating a Byzantine Adversary"] {{Webarchive|url=https://web.archive.org/web/20160722030852/http://www.cypherpunks.ca/~iang/pubs/robustMessagePassing.pdf |date=2016-07-22 }}.</ref><ref>
Natalya Fedotova; Giordano Orzetti; Luca Veltri; Alessandro Zaccagnini.
"Byzantine agreement for reputation management in DHT-based peer-to-peer networks".
{{doi|10.1109/ICTEL.2008.4652638}}
</ref> Whanau is a DHT designed to be resistant to Sybil attacks.<ref>[https://pdos.csail.mit.edu/papers/whanau-nsdi10.pdf Whanau: A Sybil-proof Distributed Hash Table] {{Webarchive|url=https://web.archive.org/web/20220125025128/https://pdos.csail.mit.edu/papers/whanau-nsdi10.pdf |date=2022-01-25 }}</ref>
 
Petar Maymounkov, one of the original authors of [[Kademlia]], has proposed a way to circumvent the vulnerability to the Sybil attack by incorporating social trust relationships into the system design.<ref>{{cite journal |author=Chris Lesniewski-Laas |title=A Sybil-proof one-hop DHT |pages=20 |url=http://pdos.csail.mit.edu/papers/sybil-dht-socialnets08.pdf |access-date=2018-02-16 |archive-date=2017-05-15 |archive-url=https://web.archive.org/web/20170515220528/https://pdos.csail.mit.edu/papers/sybil-dht-socialnets08.pdf |url-status=live }}</ref> The new system, codenamed Tonika and also known by its ___domain name as 5ttt, is based on an algorithm design known as "electric routing", co-developed with the mathematician Jonathan Kelner.<ref>{{cite journal |author=Jonathan Kelner, Petar Maymounkov |title=Electric routing and concurrent flow cutting |url=https://archive.org/details/arxiv-0909.2859 |arxiv=0909.2859|bibcode=2009arXiv0909.2859K|year=2009 }}</ref> Maymounkov has now undertaken a comprehensive implementation effort of this new system. However, research into effective defences against Sybil attacks is generally considered an open question, and a wide variety of potential defences are proposed every year in top security research conferences.{{Citation needed|date=May 2020}}
 
== Implementations ==
* Redundancy can be added to improve reliability. The key–value pair {{var serif|1=(k, data)}} can be stored on more than one node corresponding to the key. Usually, rather than selecting just one node, real-world DHT algorithms select {{var serif|1=i}} suitable nodes, with {{var serif|1=i}} being an implementation-specific parameter of the DHT. In some DHT designs, nodes agree to handle a certain keyspace range, the size of which may be chosen dynamically, rather than hard-coded.
* Some advanced DHTs like [[Kademlia]] perform iterative lookups through the DHT first in order to select a set of suitable nodes and send {{var serif|1=put(k, data)}} messages only to those nodes, thus drastically reducing useless traffic, since published messages are only sent to nodes that seem suitable for storing the key {{var serif|1=k}}; and iterative lookups cover just a small set of nodes rather than the entire DHT, reducing useless forwarding. In such DHTs, forwarding of {{Var serif|put(k, data)}} messages may only occur as part of a self-healing algorithm: if a target node receives a {{var serif|1=put(k, data)}} message, but believes that {{var serif|1=k}} is out of its handled range and a closer node (in terms of DHT keyspace) is known, the message is forwarded to that node. Otherwise, data are indexed locally. This leads to a somewhat self-balancing DHT behavior. Of course, such an algorithm requires nodes to publish their presence data in the DHT so the iterative lookups can be performed.
* Since on most machines sending messages is much more expensive than local hash table accesses, it makes sense to bundle many messages concerning a particular node into a single batch. Assuming each node has a local batch consisting of at most {{var serif|1=b}} operations, the bundling procedure is as follows. Each node first sorts its local batch by the identifier of the node responsible for the operation. Using [[bucket sort]], this can be done in {{var serif|1=O(b + n)}}, where {{var serif|1=n}} is the number of nodes in the DHT. When there are multiple operations addressing the same key within one batch, the batch is condensed before being sent out. For example, multiple lookups of the same key can be reduced to one, or multiple increments can be reduced to a single add operation. This reduction can be implemented with the help of a temporary local hash table. Finally, the operations are sent to the respective nodes, as illustrated in the sketch after this list.<ref>{{Cite book|url=https://www.springer.com/gp/book/9783030252083|title=Sequential and Parallel Algorithms and Data Structures: The Basic Toolbox|last1=Sanders|first1=Peter|last2=Mehlhorn|first2=Kurt|last3=Dietzfelbinger|first3=Martin|last4=Dementiev|first4=Roman|date=2019|publisher=Springer International Publishing|isbn=978-3-030-25208-3|language=en|access-date=2020-01-22|archive-date=2021-08-17|archive-url=https://web.archive.org/web/20210817105142/https://www.springer.com/gp/book/9783030252083|url-status=live}}</ref>
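The following sketch illustrates the bundling step from the last item above. The operation encoding and the {{var serif|1=responsible_node}} function are illustrative assumptions; only the bucket-by-destination and condense-per-key structure follows the description.

<syntaxhighlight lang="python">
def bundle(batch, responsible_node, num_nodes):
    """batch: list of (kind, key, value) operations, e.g. ("lookup", k, None) or ("increment", k, 1).
    responsible_node(key) -> index of the node that owns the key (assumed to be known locally)."""
    # Bucket sort by destination node: O(b + n) for b operations and n nodes.
    buckets = [[] for _ in range(num_nodes)]
    for op in batch:
        buckets[responsible_node(op[1])].append(op)

    messages = {}
    for node, ops in enumerate(buckets):
        if not ops:
            continue
        condensed = {}                                    # temporary local hash table
        for kind, key, value in ops:
            if kind == "lookup":
                condensed[("lookup", key)] = None         # duplicate lookups collapse into one
            elif kind == "increment":
                total = condensed.get(("add", key), 0)
                condensed[("add", key)] = total + value   # many increments become one add
        messages[node] = condensed                        # one condensed message per destination
    return messages

# Example with 4 nodes and a trivial ownership rule (assumed for illustration only):
ops = [("increment", 7, 1), ("increment", 7, 2), ("lookup", 3, None), ("lookup", 3, None)]
print(bundle(ops, responsible_node=lambda k: k % 4, num_nodes=4))
</syntaxhighlight>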
 
== Examples ==