Persistent data structure: Difference between revisions

 
====Complexity of fat node====
Using the fat node method requires O(1) space for every modification: just store the new data. Each modification takes O(1) additional time to store the modification at the end of the modification history. This is an [[Amortized analysis|amortized time]] bound, assuming the modification history is stored in a growable [[Array data structure|array]]. At [[access time]], the right version at each node must be found as the structure is traversed. If ''m'' modifications were made, then each access operation would have <math>O(\log m)</math> slowdown resulting from the cost of finding the nearest modification in the array. Alternatively, one can employ a [[van Emde Boas tree]] at each node (possibly the space-efficient version using hashing) to reduce the time for an access to <math>O(\log\log m)</math>, at the cost of increasing update time to <math>O(\log\log m)</math>.
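The version lookup described above can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not code from the cited literature: each fat node keeps its modification history in a growable array, appends in O(1) amortized time, and binary-searches it in O(log m) on access.

```python
import bisect

class FatNode:
    """Sketch of a 'fat node': every historical value of a field is
    kept, tagged with the version stamp at which it was written."""

    def __init__(self, value, version=0):
        self.versions = [version]  # sorted version stamps
        self.values = [value]      # values[i] was written at versions[i]

    def set(self, value, version):
        # O(1) amortized: append to the end of the modification history
        self.versions.append(version)
        self.values.append(value)

    def get(self, version):
        # O(log m): binary search for the latest write at or before `version`
        i = bisect.bisect_right(self.versions, version) - 1
        return self.values[i]

node = FatNode("a", version=0)
node.set("b", version=3)
node.get(0)  # "a" -- reads of old versions still see the old value
node.get(5)  # "b" -- reads after the modification see the new value
```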
 
===Path copying===
This method assumes that the data structure is a linked graph of nodes.
On update, a copy is made of all nodes on the path to any node which is about to be modified. These changes must then be cascaded back through the data structure: all nodes that pointed to the old node must be modified to point to the new node instead. These modifications cause more cascading changes, and so on, until the root node is reached.
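A minimal sketch of path copying on a binary search tree, assuming immutable nodes (hypothetical Python code, not a definitive implementation): inserting a key copies only the nodes on the root-to-leaf path, and every untouched subtree is shared between the old and new versions.

```python
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Return a new root; copies only the nodes on the search path."""
    if root is None:
        return Node(key)
    if key < root.key:
        # copy this node, share the untouched right subtree
        return Node(root.key, insert(root.left, key), root.right)
    elif key > root.key:
        # copy this node, share the untouched left subtree
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present: the old version is returned as-is

v0 = None
for k in (4, 2, 6):
    v0 = insert(v0, k)
v1 = insert(v0, 1)
# v0 persists unchanged, and v1 shares v0's entire right subtree:
assert v1.right is v0.right
```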
 
====Complexity of path copying====
 
===A combination===
Driscoll, Sarnak, [[Daniel Sleator|Sleator]], and [[Robert Tarjan|Tarjan]] came up<ref name=Driscoll /> with a way to combine the techniques of fat nodes and path copying, achieving O(1) access slowdown and O(1) amortized overhead in space and time per modification. Their method assumes a linked data structure with at most ''d'' incoming pointers to each node, where ''d'' is a known constant.
 
In each node, one modification box is stored. This box can hold one modification to the node—either a modification to one of the pointers, or to the node's key, or to some other piece of node-specific data—and a timestamp for when that modification was applied. Initially, every node's modification box is empty.
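A much-simplified sketch of such a node in Python (hypothetical names, not from the cited paper): the modification box holds at most one timestamped change, reads compare the requested version against the box's timestamp, and the node-copying step that occurs when the box is already full is omitted here.

```python
class BoxNode:
    """Node with one modification box holding at most one
    (version, field, new_value) triple."""

    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.mod = None  # empty modification box

    def get(self, field, version):
        # The box overrides the base field for reads at or after its stamp
        if self.mod is not None:
            stamp, mod_field, new_value = self.mod
            if mod_field == field and version >= stamp:
                return new_value
        return getattr(self, field)

    def set(self, field, value, version):
        if self.mod is None:
            self.mod = (version, field, value)  # O(1), no copying needed
        else:
            # A full box forces a copy of the node (omitted in this sketch)
            raise NotImplementedError("box full: copy the node")

n = BoxNode("x")
n.set("value", "y", version=2)
n.get("value", 1)  # "x" -- older versions see the base value
n.get("value", 3)  # "y" -- newer versions see the boxed modification
```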
*CHANGE-EDGE: There are two cases to consider. The first case occurs when there is still at least one empty row in the table. In this case one credit is assigned to the newly inserted row. The second case occurs when the table is full. In this case the old table becomes inactive and its <math>d+1</math> credits are transferred to the new table, in addition to the one credit acquired from calling CHANGE-EDGE. So in total we have <math>d+2</math> credits: one credit is used for the creation of the new table, another credit is used for the new row added to the table, and the {{mvar|d}} credits left are used for updating the tables of the other vertices that need to point to the new table. We conclude that the invariant is maintained.
*CHANGE-LABEL: It works exactly the same as CHANGE-EDGE.
In summary, we conclude that having <math>n_{1}</math> calls to CREATE-NODE and <math>n_{2}</math> calls to CHANGE-EDGE results in the creation of <math>2\cdot n_{1}+n_{2}</math> tables. Since each table has size <math>O(d)</math> without taking into account the recursive calls, filling in a table requires <math>O(d^{2})</math> work, where the additional factor of {{mvar|d}} comes from updating the inedges at other nodes. Therefore, the amount of work required to complete a sequence of operations is bounded by the number of tables created multiplied by <math>O(d^{2})</math>. Each access operation can be done in <math>O(\log d)</math>, and there are {{mvar|m}} edge and label operations, so these require <math>m\cdot O(\log d)</math> time. We conclude that there exists a data structure that can complete any sequence of {{mvar|n}} CREATE-NODE, CHANGE-EDGE and CHANGE-LABEL operations and {{mvar|m}} access operations in <math>O(n\cdot d^{2})+m\cdot O(\log d)</math>.
 
==Applications of persistent data structures==
 
==Examples of persistent data structures==
[[Purely functional data structure]]s are automatically persistent.
Perhaps the simplest persistent data structure is the [[Linked list|singly linked list]] or ''cons''-based list, a simple list of objects formed by each carrying a [[reference]] to the next in the list. This is persistent because the ''tail'' of the list can be taken, meaning the last ''k'' items for some ''k'', and new nodes can be added in front of it. The tail will not be duplicated, instead becoming shared between both the old list and the new list. So long as the contents of the tail are immutable, this sharing will be invisible to the program.
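The tail sharing described above can be illustrated with plain Python tuples standing in for immutable cons cells (an illustrative sketch; the names are hypothetical):

```python
# A cons cell as an immutable (head, tail) pair; None is the empty list.
def cons(head, tail):
    return (head, tail)

shared_tail = cons(2, cons(3, None))   # the list [2, 3]
xs = cons(1, shared_tail)              # the list [1, 2, 3]
ys = cons(0, shared_tail)              # the list [0, 2, 3]

# Both lists persist, and the tail is shared rather than duplicated:
assert xs[1] is ys[1]
```

Because the cells are immutable, neither list can observe that its tail is shared with the other.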
 
Many common reference-based data structures, such as [[red–black tree]]s,<ref name="sarnak2">{{cite journal |author=Neil Sarnak |author2=Robert E. Tarjan |year=1986 |title=Planar Point Location Using Persistent Search Trees |url=http://www.link.cs.cmu.edu/15859-f07/papers/point-___location.pdf |journal=Communications of the ACM |volume=29 |issue=7 |pages=669–679 |doi=10.1145/6138.6151 |s2cid=8745316 |author2-link=Robert Tarjan |access-date=2011-04-06 |archive-url=https://web.archive.org/web/20151010204956/http://www.link.cs.cmu.edu/15859-f07/papers/point-___location.pdf |archive-date=2015-10-10 |url-status=dead }}</ref> [[Stack (data structure)|stacks]],<ref name="okasaki2">{{Cite journal|author=Chris Okasaki|title=Purely Functional Data Structures (thesis)|url=https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf}}</ref> and [[treap]]s,<ref>{{cite journal |last=Liljenzin |first=Olle |title=Confluently Persistent Sets and Maps|arxiv=1301.3388|bibcode=2013arXiv1301.3388L|year=2013 }}</ref> can easily be adapted to create a persistent version. Some others need slightly more effort, for example: [[Queue (data structure)|queues]], [[Double-ended queue|deques]], and extensions including [[min-deque]]s (which have an additional ''O''(1) operation ''min'' returning the minimal element) and [[random access deque]]s (which have an additional operation of random access with sub-linear, most often logarithmic, complexity).
 
Persistent data structures which are based on immutable ("purely functional") structures should be contrasted with structures that use destructive updates (mutation) and are made persistent using the fat node or path copying techniques described above.
There also exist persistent data structures which use destructive{{clarify|date=June 2013}} operations, making them impossible to implement efficiently in purely functional languages (like Haskell outside specialized monads like state or IO), but possible in languages like C or Java. These types of data structures can often be avoided with a different design. One primary advantage to using purely persistent data structures is that they often behave better in multi-threaded environments.
 
===Linked lists===
 
Notice two points: first, the original tree (<code>xs</code>) persists. Second, many common nodes are shared between the old tree and the new tree. Such persistence and sharing is difficult to manage without some form of [[Garbage collection (computer science)|garbage collection]] (GC) to automatically free up nodes which have no live references, and this is why GC is a feature commonly found in [[Functional programming|functional programming languages]].
 
==== Code ====
A GitHub repository<ref>{{github|DesaultierMAKK/PersistentBST}}</ref> contains implementations of persistent BSTs using the fat node, copy-on-write, and path copying techniques. To use them, clone the repository and follow the instructions provided in the README file.
 
=== Persistent hash array mapped trie ===
* [http://wiki.edinburghhacklab.com/PersistentRedBlackTreeSet Lightweight Java implementation of Persistent Red-Black Trees]
* [https://persistent.codeplex.com/ Efficient persistent structures in C#]
* {{github|DesaultierMAKK/PersistentBST}} – implementations of persistent BSTs using the fat node, copy-on-write, and path copying techniques
 
 
[[Category:Data structures]]