Content deleted Content added
→Partially persistent data structure: fixed typo |
→Efficiency of the generalized persistent data structure: fix math parse error ? |
||
(11 intermediate revisions by 2 users not shown) | |||
Line 27:
=== Copy-on-write ===
One method for creating a persistent data structure is to use a platform provided ephemeral data structure such as an [[Mutable array|array]] to store the data in the data structure and copy the entirety of that
[[Copy-on-write]] memory management can reduce the price for an update from <math>\Theta(n)</math> to <math>O(Bu)</math>, where ''B'' is the memory block size and ''u'' the number of pages updated in an operation.{{citation needed|date=May 2019}}
===Fat node===
Line 33 ⟶ 34:
====Complexity of fat node====
With using fat node method, it requires O(1) space for every modification: just store the new data. Each modification takes O(1) additional time to store the modification at the end of the modification history. This is an [[Amortized analysis|amortized time]] bound, assuming modification history is stored in a growable [[Array data structure|array]]. At [[access time]], the right version at each node must be found as the structure is traversed. If
===Path copying===
This method assumes that the data structure is a linked graph of nodes.
====Complexity of path copying====
Line 42 ⟶ 44:
===A combination===
Driscoll, Sarnak, [[Daniel Sleator|Sleator]], [[Robert Tarjan|Tarjan]] came up<ref name=Driscoll /> with a way to combine the techniques of fat nodes and path copying, achieving O(1) access slowdown and O(1)
In each node, one modification box is stored. This box can hold one modification to the node—either a modification to one of the pointers, or to the node's key, or to some other piece of node-specific data—and a timestamp for when that modification was applied. Initially, every node's modification box is empty.
Line 53 ⟶ 55:
====Complexity of the combination====
Time and space for modifications require amortized analysis. A modification takes O(1) amortized space, and O(1) amortized time. To see why, use a [[Potential method|potential function]] ''ϕ'', where ''ϕ''(T) is the number of full live nodes in T . The live nodes of T are just the nodes that are reachable from the current root at the current time (that is, after the last modification). The full live nodes are the live nodes whose modification boxes are full.
Each modification involves some number of copies, say ''k'', followed by 1 change to a modification box. Consider each of the ''k'' copies. Each costs O(1) space and time, but decreases the potential function by one. (First, the node to be copied must be full and live, so it contributes to the potential function. The potential function will only drop, however, if the old node isn't reachable in the new tree. But it is known that it isn't reachable in the new tree—the next step in the algorithm will be to modify the node's parent to point at the copy. Finally, it is known that the copy's modification box is empty. Thus, replaced a full live node has been replaced with an empty live node, and ''ϕ'' goes down by one.) The final step fills a modification box, which costs O(1) time and increases ''ϕ'' by one.
Putting it
==Generalized form of persistence==
Line 86 ⟶ 88:
*CHANGE-EDGE: There are two cases to consider. The first case occurs when there is still at least one empty row in the table. In this case one credit is used to the newly inserted row. The second case occurs when the table is full. In this case the old table becomes inactive and the <math>d+1</math> credits are transformed to the new table in addition to the one credit acquired from calling the CHANGE-EDGE. So in total we have <math>d+2</math> credits. One credit will be used for the creation of the new table. Another credit will be used for the new row added to the table and the {{mvar|d}} credits left are used for updating the tables of the other vertices that need to point to the new table. We conclude that the invariant is maintained.
*CHANGE-LABEL: It works exactly the same as CHANGE-EDGE.
As a summary, we conclude that having <math>n_{1}</math> calls to CREATE_NODE and <math>n_{2}</math> calls to CHANGE_EDGE will result in the creation of <math>2\cdot n_{1}+n_{2}</math> tables. Since each table has size <math>O(d)</math> without taking into account the recursive calls, then filling in a table requires <math>O(
==Applications of persistent data structures==
===Next element search or point ___location===
One of the useful applications that can be solved efficiently using persistence is the Next Element Search. Assume that there are
====Naïve method====
We start with a vertical line segment that starts off at infinity and we sweep the line segments from the left to the right. We take a pause every time we encounter an end point of these segments. The vertical lines split the plane into vertical strips. If there are
====Persistent data structure method====
We can notice that what really takes time in the data structure used in the naïve method is that whenever we move from a strip to the next, we need to take a snap shot of whatever data structure we are using to keep things in sorted order. We can notice that once we get the segments that intersect <math>s_{i}</math>, when we move to <math>s_{i+1}</math> either one thing leaves or one thing enters. If the difference between what is in <math>s_{i}</math> and what is in <math>s_{i+1}</math> is only one insertion or deletion then it is not a good idea to copy everything from <math>s_{i}</math> to <math>s_{i+1}</math>. The trick is that since each copy differs from the previous one by only one insertion or deletion, then we need to copy only the parts that change. Let us assume that we have a tree rooted at
==Examples of persistent data structures==
[[Purely functional data structure]] are automatically persistent.
Perhaps the simplest persistent data structure is the [[Linked list|singly linked list]] or ''cons''-based list, a simple list of objects formed by each carrying a [[reference]] to the next in the list. This is persistent because the ''tail'' of the list can be taken, meaning the last ''k'' items for some ''k'', and new nodes can be added in front of it. The tail will not be duplicated, instead becoming shared between
Many common reference-based data structures, such as [[red–black tree]]s,<ref name="sarnak2">{{cite journal |author=Neil Sarnak |author2=Robert E. Tarjan |year=1986 |title=Planar Point Location Using Persistent Search Trees |url=http://www.link.cs.cmu.edu/15859-f07/papers/point-___location.pdf |journal=Communications of the ACM |volume=29 |issue=7 |pages=669–679 |doi=10.1145/6138.6151 |s2cid=8745316 |author2-link=Robert Tarjan |access-date=2011-04-06 |archive-url=https://web.archive.org/web/20151010204956/http://www.link.cs.cmu.edu/15859-f07/papers/point-___location.pdf |archive-date=2015-10-10 |url-status=dead }}</ref> [[Stack (data structure)|stacks]],<ref name="okasaki2">{{Cite journal|author=Chris Okasaki|title=Purely Functional Data Structures (thesis)|url=https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf}}</ref> and [[treap]]s,<ref>{{cite journal |last=Liljenzin |first=Olle |title=Confluently Persistent Sets and Maps|arxiv=1301.3388|bibcode=2013arXiv1301.3388L|year=2013 }}</ref> can easily be adapted to create a persistent version. Some others need slightly more effort, for example: [[Queue (data structure)|queues]], [[Double-ended queue|dequeues]], and extensions including [[min-deque]]s (which have an additional ''O''(1) operation ''min'' returning the minimal element) and [[random access deque]]s (which have an additional operation of random access with sub-linear, most often logarithmic, complexity).
Persistent data strctures which are based on immutable ("pure functional") structures should be constrasted with structures that used destructive updates (mutation) and are made persistent using the fat node or path copying techniques, described above.
===Linked lists===
Line 149 ⟶ 152:
Notice two points: first, the original tree (<code>xs</code>) persists. Second, many common nodes are shared between the old tree and the new tree. Such persistence and sharing is difficult to manage without some form of [[Garbage collection (computer science)|garbage collection]] (GC) to automatically free up nodes which have no live references, and this is why GC is a feature commonly found in [[Functional programming|functional programming languages]].
GitHub repo containing implementations of persistent BSTs using Fat Nodes, Copy-on-Write, and Path Copying Techniques.▼
=== Persistent hash array mapped trie ===
Line 220 ⟶ 216:
* [http://wiki.edinburghhacklab.com/PersistentRedBlackTreeSet Lightweight Java implementation of Persistent Red-Black Trees]
* [https://persistent.codeplex.com/ Efficient persistent structures in C#]
▲* {{github|DesaultierMAKK/PersistentBST}} - GitHub repo containing implementations of persistent BSTs using Fat Nodes, Copy-on-Write, and Path Copying Techniques. To use the persistent BST implementations, simply clone the repository and follow the instructions provided in the README file.
[[Category:Data structures]]
|