Content deleted Content added
Yonkeltron (talk | contribs) m Added link to planetlab and a citation of the Coral CDN paper #article-section-source-editor Tags: Mobile edit Mobile app edit iOS app edit |
DavidBrooks (talk | contribs) Reverted 1 edit by 112.215.244.250 (talk): Broke syntax |
||
(14 intermediate revisions by 8 users not shown) | |||
Line 1:
{{Short description|Decentralized distributed system with lookup service}}
A '''distributed hash table''' ('''DHT''') is a [[Distributed computing|distributed system]] that provides a lookup service similar to a [[hash table]]. [[Key–value pair]]s are stored in a DHT, and any participating [[node (networking)|node]] can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys.<ref>{{Cite book |last1=Hota |first1=Chittaranjan |url=https://books.google.com/books?id=5I25BQAAQBAJ |title=Distributed Computing and Internet Technology: 9th International Conference, ICDCIT 2013, Bhubaneswar, India, February 5-8, 2013, Proceedings |last2=Srimani |first2=Pradip K. |date=2013-01-11 |publisher=Springer |isbn=978-3-642-36071-8 |language=en}}</ref> ''Keys'' are unique identifiers which map to particular ''values'', which in turn can be anything from addresses, to [[Electronic document|documents]], to arbitrary [[Data (computing)|data]].<ref name=StoicaEtAl2001>{{Cite journal | last1 = Stoica | first1 = I. | author-link1 = Ion Stoica | last2 = Morris | first2 = R. | last3 = Karger | first3 = D. | author-link3 = David Karger | last4 = Kaashoek | first4 = M. F. | last5 = Balakrishnan | first5 = H. | author-link5 = Hari Balakrishnan | title = Chord: A scalable peer-to-peer lookup service for internet applications | doi = 10.1145/964723.383071 | journal = ACM SIGCOMM Computer Communication Review | volume = 31 | issue = 4 | pages = 149 | year = 2001 | url = http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf | quote = A value can be an address, a document, or an arbitrary data item. | access-date = 2018-09-18 | archive-date = 2023-07-07 | archive-url = https://web.archive.org/web/20230707080145/https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf | url-status = live }}</ref> Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to [[scale (computing)|scale]] to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.▼
▲A '''distributed hash table''' ('''DHT''') is a [[Distributed computing|distributed system]] that provides a lookup service similar to a [[hash table]]. [[Key–value pair]]s are stored in a DHT, and any participating [[node (networking)|node]] can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. ''Keys'' are unique identifiers which map to particular ''values'', which in turn can be anything from addresses, to [[Electronic document|documents]], to arbitrary [[Data (computing)|data]].<ref name=StoicaEtAl2001>{{Cite journal | last1 = Stoica | first1 = I. | author-link1 = Ion Stoica | last2 = Morris | first2 = R. | last3 = Karger | first3 = D. | author-link3 = David Karger | last4 = Kaashoek | first4 = M. F. | last5 = Balakrishnan | first5 = H. | author-link5 = Hari Balakrishnan | title = Chord: A scalable peer-to-peer lookup service for internet applications | doi = 10.1145/964723.383071 | journal = ACM SIGCOMM Computer Communication Review | volume = 31 | issue = 4 | pages = 149 | year = 2001 | url = http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf | quote = A value can be an address, a document, or an arbitrary data item. | access-date = 2018-09-18 | archive-date = 2023-07-07 | archive-url = https://web.archive.org/web/20230707080145/https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf | url-status = live }}</ref> Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to [[scale (computing)|scale]] to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
DHTs form an infrastructure that can be used to build more complex services, such as [[anycast]], cooperative [[Web cache|web caching]], [[distributed file system]]s, [[Domain name system|___domain name services]], [[instant messaging]], [[multicast]], and also [[peer-to-peer file sharing]] and [[content distribution]] systems. Notable distributed networks that use DHTs include [[BitTorrent (protocol)|BitTorrent]]'s distributed tracker, the [[Kad network]], the [[Storm botnet]], the [[Tox (protocol)|Tox instant messenger]], [[Freenet]], the [[YaCy]] search engine, and the [[InterPlanetary File System]].
Line 12 ⟶ 11:
These systems differed in how they located the data offered by their peers. Napster, the first large-scale P2P content delivery system, required a central index server: each node, upon joining, would send a list of locally held files to the server, which would perform searches and refer the queries to the nodes that held the results. This central component left the system vulnerable to attacks and lawsuits.
Gnutella and similar networks moved to a [[query flooding]] model{{spaced ndash}} in essence, each search would result in a message being broadcast to every machine in the network. While avoiding a [[single point of failure]], this method was significantly less efficient than Napster. Later versions of Gnutella clients moved to a
Freenet is fully distributed, but employs a [[Heuristic (computer science)|heuristic]] [[key-based routing]] in which each file is associated with a key, and files with similar keys tend to cluster on a similar set of nodes. Queries are likely to be routed through the network to such a cluster without needing to visit many peers.<ref>{{citation |url=https://freenetproject.org/papers/lic.pdf |title=Searching in a Small World Chapters 1 & 2 |access-date=2012-01-10 |archive-date=2012-03-16 |archive-url=https://web.archive.org/web/20120316102141/https://freenetproject.org/papers/lic.pdf |url-status=dead }}</ref> However, Freenet does not guarantee that data will be found.
Line 18 ⟶ 17:
Distributed hash tables use a more structured key-based routing in order to attain both the decentralization of Freenet and Gnutella, and the efficiency and guaranteed results of Napster. One drawback is that, like Freenet, DHTs only directly support exact-match search, rather than keyword search, although Freenet's [[routing algorithm]] can be generalized to any key type where a closeness operation can be defined.<ref>{{citation |chapter-url=https://freenetproject.org/papers/ddisrs.pdf |title=A Distributed Decentralized Information Storage and Retrieval System |chapter=Section 5.2.2 |access-date=2012-01-10 |archive-date=2012-03-16 |archive-url=https://web.archive.org/web/20120316102156/https://freenetproject.org/papers/ddisrs.pdf |url-status=dead }}</ref>
In 2001, four systems—[[Content addressable network|CAN]],<ref name
A project called the Infrastructure for Resilient Internet Systems (Iris) was funded by a $12 million grant from the United States [[National Science Foundation]] in 2002.<ref>{{Cite news |title= New P2P network funded by US government |author= David Cohen |work= New Scientist |date= October 1, 2002 |url= https://www.newscientist.com/article.ns?id=dn2861 |access-date= November 10, 2013 |archive-date= April 6, 2008 |archive-url= https://web.archive.org/web/20080406123915/http://www.newscientist.com/article.ns?id=dn2861 |url-status= live }}</ref>
Researchers included [[Sylvia Ratnasamy]], [[Ion Stoica]], [[Hari Balakrishnan]] and [[Scott Shenker]].<ref>{{Cite news |title= MIT, Berkeley, ICSI, NYU, and Rice Launch the IRIS Project |work= Press release |publisher= MIT |date= September 25, 2002 |url= https://iris.pdos.csail.mit.edu/MITPressRelease1.doc |access-date= November 10, 2013 |url-status= dead |archive-url= https://web.archive.org/web/20150926070618/https://iris.pdos.csail.mit.edu/MITPressRelease1.doc |archive-date= September 26, 2015 }}</ref>
Outside academia, DHT technology has been adopted as a component of BitTorrent and in [[PlanetLab]] projects such as the Coral Content Distribution Network
== Properties ==
Line 37 ⟶ 36:
The structure of a DHT can be decomposed into several main components.<ref>Moni Naor and Udi Wieder. [http://www.wisdom.weizmann.ac.il/~naor/PAPERS/dh.pdf Novel Architectures for P2P Applications: the Continuous-Discrete Approach] {{Webarchive|url=https://web.archive.org/web/20191209032152/http://www.wisdom.weizmann.ac.il/~naor/PAPERS/dh.pdf |date=2019-12-09 }}. Proc. SPAA, 2003.</ref><ref>Gurmeet Singh Manku. [http://www-db.stanford.edu/~manku/phd/index.html Dipsea: A Modular Distributed Hash Table] {{webarchive|url=https://web.archive.org/web/20040910154927/http://www-db.stanford.edu/~manku/phd/index.html |date=2004-09-10 }}. Ph. D. Thesis (Stanford University), August 2004.</ref> The foundation is an abstract [[Keyspace (distributed data store)|keyspace]], such as the set of 160-bit [[string (computer science)|string]]s. A keyspace [[Partition (database)|partitioning]] scheme splits ownership of this keyspace among the participating nodes. An [[overlay network]] then connects the nodes, allowing them to find the owner of any given key in the keyspace.
Once these components are in place, a typical use of the DHT for storage and retrieval might proceed as follows. Suppose the keyspace is the set of 160-bit strings. To index a file with given {{
The keyspace partitioning and overlay network components are described below with the goal of capturing the principal ideas common to most DHTs; many designs differ in the details.
Line 89 ⟶ 88:
|}
The most common choice, <math>O(\log n)</math> degree/route length, is not optimal in terms of degree/route length tradeoff, but such topologies typically allow more flexibility in choice of neighbors. Many DHTs use that flexibility to pick neighbors that are close in terms of latency in the physical underlying network. In general, all DHTs construct navigable small-world network topologies, which trade-off route length vs. network degree.<ref>{{Cite
Maximum route length is closely related to [[Diameter (graph theory)|diameter]]: the maximum number of hops in any shortest path between nodes. Clearly, the network's worst case route length is at least as large as its diameter, so DHTs are limited by the degree/diameter tradeoff<ref>{{citation |url=http://maite71.upc.es/grup_de_grafs/table_g.html |title=The (Degree, Diameter) Problem for Graphs |publisher=Maite71.upc.es |access-date=2012-01-10 |archive-url=https://web.archive.org/web/20120217054532/http://maite71.upc.es/grup_de_grafs/table_g.html/ |archive-date=2012-02-17 |url-status=dead }}</ref> that is fundamental in [[graph theory]]. Route length can be greater than diameter, since the greedy routing algorithm may not find shortest paths.<ref>Gurmeet Singh Manku, Moni Naor, and Udi Wieder. [http://citeseer.ist.psu.edu/naor04know.html "Know thy Neighbor's Neighbor: the Power of Lookahead in Randomized P2P Networks"] {{Webarchive|url=https://web.archive.org/web/20080420030133/http://citeseer.ist.psu.edu/naor04know.html |date=2008-04-20 }}. Proc. STOC, 2004.</ref>
Line 115 ⟶ 114:
https://pdos.csail.mit.edu/papers/whanau-nsdi10.pdf {{Webarchive|url=https://web.archive.org/web/20220125025128/https://pdos.csail.mit.edu/papers/whanau-nsdi10.pdf |date=2022-01-25 }}</ref>
Petar Maymounkov, one of the original authors of [[Kademlia]], has proposed a way to circumvent the weakness to the Sybil attack by incorporating social trust relationships into the system design.<ref>{{
== Implementations ==
Line 159 ⟶ 158:
* [[Oracle Coherence]]: an in-memory data grid built on top of a Java DHT implementation
* [[Perfect Dark (P2P)|Perfect Dark]]: a [[peer-to-peer]] [[file-sharing]] application from Japan
* [[Retroshare]]: a [[Friend-to-friend]] network<ref>[
* [[Jami (software)|Jami]]: a privacy-preserving voice, video and chat communication platform, based on a Kademlia-like DHT
* [[Tox (protocol)|Tox]]: an [[instant messaging]] system intended to function as a [[Skype]] replacement
|