Distributed hash table: Difference between revisions

Content deleted Content added
No edit summary
Reverted 1 edit by 112.215.244.250 (talk): Broke syntax
 
(7 intermediate revisions by 6 users not shown)
Line 1:
{{Short description|Decentralized distributed system with lookup service}}
A '''distributed hash table''' ('''DHT''') is a [[Distributed computing|distributed system]] that provides a lookup service similar to a [[hash table]]. [[Key–value pair]]s are stored in a DHT, and any participating [[node (networking)|node]] can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys.<ref>{{Cite book |lastlast1=Hota |firstfirst1=Chittaranjan |url=https://wwwbooks.google.com/books/edition/Distributed_Computing_and_Internet_Techn/?id=5I25BQAAQBAJ |title=Distributed Computing and Internet Technology: 9th International Conference, ICDCIT 2013, Bhubaneswar, India, February 5-8, 2013, Proceedings |last2=Srimani |first2=Pradip K. |date=2013-01-11 |publisher=Springer |isbn=978-3-642-36071-8 |language=en}}</ref> ''Keys'' are unique identifiers which map to particular ''values'', which in turn can be anything from addresses, to [[Electronic document|documents]], to arbitrary [[Data (computing)|data]].<ref name=StoicaEtAl2001>{{Cite journal | last1 = Stoica | first1 = I. | author-link1 = Ion Stoica | last2 = Morris | first2 = R. | last3 = Karger | first3 = D. | author-link3 = David Karger | last4 = Kaashoek | first4 = M. F. | last5 = Balakrishnan | first5 = H. | author-link5 = Hari Balakrishnan | title = Chord: A scalable peer-to-peer lookup service for internet applications | doi = 10.1145/964723.383071 | journal = ACM SIGCOMM Computer Communication Review | volume = 31 | issue = 4 | pages = 149 | year = 2001 | url = http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf | quote = A value can be an address, a document, or an arbitrary data item. | access-date = 2018-09-18 | archive-date = 2023-07-07 | archive-url = https://web.archive.org/web/20230707080145/https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf | url-status = live }}</ref> Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to [[scale (computing)|scale]] to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
 
DHTs form an infrastructure that can be used to build more complex services, such as [[anycast]], cooperative [[Web cache|web caching]], [[distributed file system]]s, [[Domain name system|___domain name services]], [[instant messaging]], [[multicast]], and also [[peer-to-peer file sharing]] and [[content distribution]] systems. Notable distributed networks that use DHTs include [[BitTorrent (protocol)|BitTorrent]]'s distributed tracker, the [[Kad network]], the [[Storm botnet]], the [[Tox (protocol)|Tox instant messenger]], [[Freenet]], the [[YaCy]] search engine, and the [[InterPlanetary File System]].
Line 11:
These systems differed in how they located the data offered by their peers. Napster, the first large-scale P2P content delivery system, required a central index server: each node, upon joining, would send a list of locally held files to the server, which would perform searches and refer the queries to the nodes that held the results. This central component left the system vulnerable to attacks and lawsuits.
 
Gnutella and similar networks moved to a [[query flooding]] model{{spaced ndash}} in essence, each search would result in a message being broadcast to every machine in the network. While avoiding a [[single point of failure]], this method was significantly less efficient than Napster. Later versions of Gnutella clients moved to a [[dynamic querying]] model which vastly improved efficiency.<ref>{{cite journal |last1=Richter, Stevenson|display-authors=et al |title=Analysis of the impact of dynamic querying models on client-server relationships |journal=Trends in Modern Computing |date=2009 |pages=682–701}}</ref>
 
Freenet is fully distributed, but employs a [[Heuristic (computer science)|heuristic]] [[key-based routing]] in which each file is associated with a key, and files with similar keys tend to cluster on a similar set of nodes. Queries are likely to be routed through the network to such a cluster without needing to visit many peers.<ref>{{citation |url=https://freenetproject.org/papers/lic.pdf |title=Searching in a Small World Chapters 1 & 2 |access-date=2012-01-10 |archive-date=2012-03-16 |archive-url=https://web.archive.org/web/20120316102141/https://freenetproject.org/papers/lic.pdf |url-status=dead }}</ref> However, Freenet does not guarantee that data will be found.
Line 17:
Distributed hash tables use a more structured key-based routing in order to attain both the decentralization of Freenet and Gnutella, and the efficiency and guaranteed results of Napster. One drawback is that, like Freenet, DHTs only directly support exact-match search, rather than keyword search, although Freenet's [[routing algorithm]] can be generalized to any key type where a closeness operation can be defined.<ref>{{citation |chapter-url=https://freenetproject.org/papers/ddisrs.pdf |title=A Distributed Decentralized Information Storage and Retrieval System |chapter=Section 5.2.2 |access-date=2012-01-10 |archive-date=2012-03-16 |archive-url=https://web.archive.org/web/20120316102156/https://freenetproject.org/papers/ddisrs.pdf |url-status=dead }}</ref>
 
In 2001, four systems&mdash;[[Content addressable network|CAN]],<ref name="Ratnasamy01">{{Cite journal |lastlast1=Ratnasamy |firstfirst1=Sylvia |last2=Francis |first2=Paul |last3=Handley |first3=Mark |last4=Karp |first4=Richard |last5=Shenker |first5=Scott |date=2001-08-27 |title=A scalable content-addressable network |url=https://dl.acm.org/doi/10.1145/964723.383072 |journal=SIGCOMM Comput. Commun. Rev. |volume=31 |issue=4 |pages=161–172 |doi=10.1145/964723.383072 |issn=0146-4833}}</ref> [[Chord (peer-to-peer)|Chord]],<ref>[[Hari Balakrishnan]], [[M. Frans Kaashoek]], David Karger, [[Robert Tappan Morris|Robert Morris]], and Ion Stoica. [http://www.cs.berkeley.edu/~istoica/papers/2003/cacm03.pdf Looking up data in P2P systems] {{Webarchive|url=https://web.archive.org/web/20160519125101/http://www.cs.berkeley.edu/~istoica/papers/2003/cacm03.pdf |date=2016-05-19 }}. In [[Communications of the ACM]], February 2003.</ref> [[Pastry (DHT)|Pastry]], and [[Tapestry (DHT)|Tapestry]]&mdash;ignitedbrought DHTsattention asto a popular research topicDHTs.
A project called the Infrastructure for Resilient Internet Systems (Iris) was funded by a $12 million grant from the United States [[National Science Foundation]] in 2002.<ref>{{Cite news |title= New P2P network funded by US government |author= David Cohen |work= New Scientist |date= October 1, 2002 |url= https://www.newscientist.com/article.ns?id=dn2861 |access-date= November 10, 2013 |archive-date= April 6, 2008 |archive-url= https://web.archive.org/web/20080406123915/http://www.newscientist.com/article.ns?id=dn2861 |url-status= live }}</ref>
Researchers included [[Sylvia Ratnasamy]], [[Ion Stoica]], [[Hari Balakrishnan]] and [[Scott Shenker]].<ref>{{Cite news |title= MIT, Berkeley, ICSI, NYU, and Rice Launch the IRIS Project |work= Press release |publisher= MIT |date= September 25, 2002 |url= https://iris.pdos.csail.mit.edu/MITPressRelease1.doc |access-date= November 10, 2013 |url-status= dead |archive-url= https://web.archive.org/web/20150926070618/https://iris.pdos.csail.mit.edu/MITPressRelease1.doc |archive-date= September 26, 2015 }}</ref>
Line 36:
The structure of a DHT can be decomposed into several main components.<ref>Moni Naor and Udi Wieder. [http://www.wisdom.weizmann.ac.il/~naor/PAPERS/dh.pdf Novel Architectures for P2P Applications: the Continuous-Discrete Approach] {{Webarchive|url=https://web.archive.org/web/20191209032152/http://www.wisdom.weizmann.ac.il/~naor/PAPERS/dh.pdf |date=2019-12-09 }}. Proc. SPAA, 2003.</ref><ref>Gurmeet Singh Manku. [http://www-db.stanford.edu/~manku/phd/index.html Dipsea: A Modular Distributed Hash Table] {{webarchive|url=https://web.archive.org/web/20040910154927/http://www-db.stanford.edu/~manku/phd/index.html |date=2004-09-10 }}. Ph. D. Thesis (Stanford University), August 2004.</ref> The foundation is an abstract [[Keyspace (distributed data store)|keyspace]], such as the set of 160-bit [[string (computer science)|string]]s. A keyspace [[Partition (database)|partitioning]] scheme splits ownership of this keyspace among the participating nodes. An [[overlay network]] then connects the nodes, allowing them to find the owner of any given key in the keyspace.
 
Once these components are in place, a typical use of the DHT for storage and retrieval might proceed as follows. Suppose the keyspace is the set of 160-bit strings. To index a file with given {{Var serifmvar|filename}} and {{mvar|data}} in the DHT, the [[SHA-1]] hash of {{mvar|filename}} is generated, producing a 160-bit key {{mvar|k}}, and a message {{math|''put''(''k, data'')}} is sent to any node participating in the DHT. The message is forwarded from node to node through the overlay network until it reaches the single node responsible for key {{mvar|k}} as specified by the keyspace partitioning. That node then stores the key and the data. Any other client can then retrieve the contents of the file by again hashing {{mvar|filename}} to produce {{mvar|k}} and asking any DHT node to find the data associated with {{mvar|k}} with a message {{math|''get''(''k'')}}. The message will again be routed through the overlay to the node responsible for {{mvar|k}}, which will reply with the stored {{mvar|data}}.
 
The keyspace partitioning and overlay network components are described below with the goal of capturing the principal ideas common to most DHTs; many designs differ in the details.
Line 114:
https://pdos.csail.mit.edu/papers/whanau-nsdi10.pdf {{Webarchive|url=https://web.archive.org/web/20220125025128/https://pdos.csail.mit.edu/papers/whanau-nsdi10.pdf |date=2022-01-25 }}</ref>
 
Petar Maymounkov, one of the original authors of [[Kademlia]], has proposed a way to circumvent the weakness to the Sybil attack by incorporating social trust relationships into the system design.<ref>{{Cite journalbook |last=Lesniewski-Laas |first=Chris |chapter=A Sybil-proof one-hop DHT |date=2008-04-01 |title=AProceedings Sybil-proofof one-hopthe DHT1st Workshop on Social Network Systems |chapter-url=https://dl.acm.org/doi/10.1145/1435497.1435501 |journal=Proceedings of the 1st Workshop on Social Network Systems |series=SocialNets '08 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=19–24 |doi=10.1145/1435497.1435501 |isbn=978-1-60558-124-8}}</ref> The new system, codenamed Tonika or also known by its ___domain name as 5ttt, is based on an algorithm design known as "electric routing" and co-authored with the mathematician Jonathan Kelner.<ref>{{Cite journal |lastlast1=Kelner |firstfirst1=Jonathan |last2=Maymounkov |first2=Petar |date=2011-07-22 |title=Electric routing and concurrent flow cutting |url=https://linkinghub.elsevier.com/retrieve/pii/S0304397510003476 |journal=Theoretical Computer Science |series=Algorithms and Computation |volume=412 |issue=32 |pages=4123–4135 |doi=10.1016/j.tcs.2010.06.013 |issn=0304-3975|hdl=1721.1/71604 |hdl-access=free }}</ref> Maymounkov has now undertaken a comprehensive implementation effort of this new system. However, research into effective defences against Sybil attacks is generally considered an open question, and wide variety of potential defences are proposed every year in top security research conferences.{{Citation needed|date=May 2020}}
 
== Implementations ==
Line 158:
* [[Oracle Coherence]]: an in-memory data grid built on top of a Java DHT implementation
* [[Perfect Dark (P2P)|Perfect Dark]]: a [[peer-to-peer]] [[file-sharing]] application from Japan
* [[Retroshare]]: a [[Friend-to-friend]] network<ref>[httphttps://retroshare.sourceforge.net/wiki/index.php/Frequently_Asked_Questions#4-1_How_does_RetroShare_know_my_friend.27s_IP_address_and_port.3F_Why_don.27t_I_need_a_static_IP_address.3F_What_is_DHT_for.3F Retroshare FAQ] {{Webarchive|url=https://web.archive.org/web/20130717094704/http://retroshare.sourceforge.net/wiki/index.php/Frequently_Asked_Questions#4-1_How_does_RetroShare_know_my_friend.27s_IP_address_and_port.3F_Why_don.27t_I_need_a_static_IP_address.3F_What_is_DHT_for.3F |date=2013-07-17 }} retrieved December 2011</ref>
* [[Jami (software)|Jami]]: a privacy-preserving voice, video and chat communication platform, based on a Kademlia-like DHT
* [[Tox (protocol)|Tox]]: an [[instant messaging]] system intended to function as a [[Skype]] replacement