Persistent data structure
{{Short description|Data structure that always preserves the previous version of itself when it is modified}}
{{Distinguish|Persistent storage}}
 
In [[computing]], a '''persistent data structure''' or '''not ephemeral data structure''' is a [[data structure]] that always preserves the previous version of itself when it is modified. Such data structures are effectively [[Immutable object|immutable]], as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article.<ref name="Driscoll">{{Cite book |vauthors=Driscoll JR, Sarnak N, Sleator DD, Tarjan RE |title=Proceedings of the eighteenth annual ACM symposium on Theory of computing - STOC '86 |year=1986 |chapter=Making data structures persistent |isbn=978-0-89791-193-1 |doi=10.1145/12130.12142 |journal=Proceeding STOC '86. Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing |pages=109–121|citeseerx=10.1.1.133.4630 |s2cid=364871 }}</ref>
A data structure is '''partially persistent''' if all versions can be accessed but only the newest version can be modified. The data structure is '''fully persistent''' if every version can be both accessed and modified. If there is also a meld or merge operation that can create a new version from two previous versions, the data structure is called '''confluently persistent'''. Structures that are not persistent are called ''ephemeral''.<ref name="kaplan2">{{Cite journal |author=Kaplan, Haim |year=2001 |title=Persistent data structures |url=http://www.math.tau.ac.il/~haimk/papers/persistent-survey.ps |journal=Handbook on Data Structures and Applications}}</ref>
 
These types of data structures are particularly common in [[Logic programming|logical]] and [[functional programming]],<ref name="kaplan2" /> as languages in those paradigms discourage (or fully forbid) the use of mutable data.
 
==Partial versus full persistence==
In the partial persistence model, a programmer may query any previous version of a data structure, but may only update the latest version. This implies a [[Total order|linear ordering]] among the versions of the data structure.<ref>{{Citation|last1=Conchon|first1=Sylvain|chapter=Semi-persistent Data Structures|pages=322–336|publisher=Springer Berlin Heidelberg|isbn=9783540787389|last2=Filliâtre|first2=Jean-Christophe|doi=10.1007/978-3-540-78739-6_25|title=Programming Languages and Systems|volume=4960|series=Lecture Notes in Computer Science|year=2008|doi-access=free}}</ref> In the fully persistent model, both updates and queries are allowed on any version of the data structure. In some cases the [[Computer performance|performance characteristics]] of querying or updating older versions of a data structure may be allowed to degrade, as is true with the [[Rope (data structure)|Rope data structure]].<ref>{{Cite book|title=RRB-Trees: Efficient Immutable Vectors|last1=Bagwell|first1=Philip|last2=Rompf|first2=Tiark|date=2011|oclc=820379112}}</ref> A data structure is referred to as '''confluently persistent''' if, in addition to being fully persistent, two versions of the same data structure can be combined to form a new version that is still fully persistent.<ref>{{Citation|last1=Brodal|first1=Gerth Stølting|title=Purely Functional Worst Case Constant Time Catenable Sorted Lists|date=2006|work=Lecture Notes in Computer Science|pages=172–183|publisher=Springer Berlin Heidelberg|isbn=9783540388753|last2=Makris|first2=Christos|last3=Tsichlas|first3=Kostas|doi=10.1007/11841036_18|citeseerx=10.1.1.70.1493}}</ref>
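As a minimal illustration of full persistence (a sketch, not from the article; all names are illustrative), consider a stack built from immutable cons cells: every push or pop yields a new version without touching existing ones, so old versions remain readable and any version, not just the newest, can be extended, giving a tree of versions rather than a line.

```python
# A fully persistent stack of immutable cons cells: no operation
# mutates an existing cell, so every version remains valid forever.

class Cell:
    __slots__ = ("head", "tail")
    def __init__(self, head, tail):
        self.head = head
        self.tail = tail

EMPTY = None  # the empty version

def push(version, value):
    # O(1): the new version shares all of the old version's cells
    return Cell(value, version)

def pop(version):
    # returns (top value, previous version); the input version is untouched
    return version.head, version.tail

v1 = push(EMPTY, 1)
v2 = push(v1, 2)
v3 = push(v1, 3)        # branching from v1 is allowed under full persistence
assert pop(v2) == (2, v1)
assert pop(v3) == (3, v1)
assert pop(v1) == (1, EMPTY)
```

Because `v2` and `v3` both extend `v1`, this tiny example already exhibits the version tree that distinguishes full persistence from partial persistence, where only the newest version could have been pushed onto.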
 
==Techniques for preserving previous versions==
 
===A combination===
Driscoll, Sarnak, [[Daniel Sleator|Sleator]], and [[Robert Tarjan|Tarjan]] devised a way<ref name="Driscoll" /> to combine the techniques of fat nodes and path copying, achieving O(1) access slowdown and O(1) amortized modification space and time complexity.
 
In each node, one modification box is stored. This box can hold one modification to the node—either a modification to one of the pointers, or to the node's key, or to some other piece of node-specific data—and a timestamp for when that modification was applied. Initially, every node's modification box is empty.
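The modification box can be sketched for a single node field as follows. This is a simplified, illustrative sketch (not the article's code): the full technique also updates live pointers in parent nodes when a full node is copied, which is omitted here.

```python
# Sketch of a node with a single modification box: the first change is
# absorbed by the box; a second change forces a copy of the node.

class Node:
    def __init__(self, value, timestamp):
        self.value = value        # the field's original value
        self.created = timestamp  # version at which this node was made
        self.mod = None           # (timestamp, new_value) or None

    def read(self, version):
        # return the field as it appeared at the given version
        if self.mod is not None and self.mod[0] <= version:
            return self.mod[1]
        return self.value

    def write(self, version, new_value):
        # returns the node to use from `version` onward
        if self.mod is None:
            self.mod = (version, new_value)   # box empty: O(1), in place
            return self
        # box already full: copy the node with an empty box
        # (propagating this copy to parents is omitted in this sketch)
        return Node(new_value, version)

n = Node("a", timestamp=0)
n2 = n.write(1, "b")   # absorbed by the modification box
assert n2 is n and n.read(0) == "a" and n.read(1) == "b"
n3 = n.write(2, "c")   # box full, so the node is copied
assert n3 is not n and n3.read(2) == "c" and n.read(1) == "b"
```

The point of the box is that a node tolerates one modification for free; only the second modification pays for a copy, which is how the amortized O(1) bounds arise.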
==Applications of persistent data structures==
===Next Element Search or Point Location===
One of the useful applications that can be solved efficiently using persistence is next element search. Assume that there are <math>n</math> non-intersecting line segments, all parallel to the x-axis. We want to build a data structure that can be queried with a point <math>p</math> and return the segment above <math>p</math> (if any). We will first solve next element search using the naïve method, and then show how to solve it using the persistent data structure method.
 
====Naïve Method====
We start with a vertical line that starts off at infinity and sweep it across the line segments from left to right, pausing every time we encounter an endpoint of a segment. The vertical lines through these endpoints split the plane into vertical strips. If there are <math>n</math> line segments then we get <math>2n+1</math> vertical strips, since each segment has <math>2</math> endpoints. No segment begins and ends inside a strip: every segment either does not touch the strip or crosses it completely. We can therefore think of the segments crossing a strip as objects in sorted order from top to bottom; what we care about is where the query point fits in this order. We sort the endpoints of the segments by their <math>x</math> coordinate. For each strip <math>s_{i}</math>, we store the subset of segments that cross <math>s_{i}</math> in a dictionary. As the vertical line sweeps across the segments, whenever it passes the left endpoint of a segment we add that segment to the dictionary, and when it passes the right endpoint we remove it. At every endpoint we save a copy of the dictionary, and we store all the copies sorted by their <math>x</math> coordinates. Thus we have a data structure that can answer any query: to find the segment above a point <math>p</math>, we look at the <math>x</math> coordinate of <math>p</math> to determine which copy or strip it belongs to, then at the <math>y</math> coordinate to find the segment above it. So we need two binary searches, one on the <math>x</math> coordinate to find the strip, and another on the <math>y</math> coordinate to find the segment above the point; hence a query takes <math>O(\log n)</math> time.
In this data structure, space is the issue: if the segments are arranged so that every segment starts before any other segment ends, the structure built with the naïve method requires <math>O(n^{2})</math> space. Let us see how we can build a persistent data structure with the same query time but better space usage.
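The naïve sweep can be sketched as follows. This is an illustrative sketch under simplifying assumptions (all helper names are hypothetical): each horizontal segment is given as a triple `(x1, x2, y)`, and a query returns just the y-coordinate of the segment above the point rather than the segment object itself.

```python
# Naive next-element search: store a full sorted copy of the active
# segment set for every strip, yielding O(n^2) worst-case space.
import bisect

def build_naive(segments):
    # one event per endpoint, sorted by x coordinate
    events = sorted(
        [(x1, y, "add") for x1, _, y in segments] +
        [(x2, y, "del") for _, x2, y in segments])
    strips, active = [], []
    for x, y, kind in events:
        if kind == "add":
            bisect.insort(active, y)      # segment starts crossing strips
        else:
            active.remove(y)              # segment stops crossing strips
        strips.append((x, list(active)))  # O(n) copy per event -> O(n^2) space
    return strips

def query(strips, px, py):
    # binary search on x for the strip, then on y for the segment above
    i = bisect.bisect_right([x for x, _ in strips], px) - 1
    if i < 0:
        return None                       # left of every segment
    ys = strips[i][1]
    j = bisect.bisect_right(ys, py)
    return ys[j] if j < len(ys) else None # None if no segment is above

segs = [(0, 10, 5), (2, 8, 3)]
strips = build_naive(segs)
assert query(strips, 5, 4) == 5   # between the two segments
assert query(strips, 5, 1) == 3   # below both segments
assert query(strips, 5, 6) is None
```

The `list(active)` copy on every event is exactly the snapshotting the text describes, and it is what the persistent method below the heading replaces with structural sharing.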
 
====Persistent Data Structure Method====
Notice that what really costs in the naïve method is that whenever we move from one strip to the next, we take a snapshot of whatever data structure we are using to keep things in sorted order. But once we have the segments that intersect <math>s_{i}</math>, moving to <math>s_{i+1}</math> means that exactly one segment leaves or one segment enters. Since each strip differs from the previous one by only a single insertion or deletion, copying everything from <math>s_{i}</math> to <math>s_{i+1}</math> is wasteful; the trick is to copy only the parts that change. Let us assume that we have a balanced search tree rooted at <math>T</math>. When we insert a key <math>k</math> into the tree, we create a new leaf containing <math>k</math>. Performing rotations to rebalance the tree will only modify the nodes on the path from <math>k</math> to <math>T</math>. Before inserting the key <math>k</math> into the tree, we copy all the nodes on the path from <math>k</math> to <math>T</math>. Now we have two versions of the tree: the original one, which doesn't contain <math>k</math>, and the new tree, which contains <math>k</math> and whose root is a copy of the root of <math>T</math>. Since copying the path from <math>k</math> to <math>T</math> doesn't increase the insertion time by more than a constant factor, insertion in the persistent data structure takes <math>O(\log n)</math> time. For deletion, we need to find which nodes will be affected by the deletion. For each affected node <math>v</math>, we copy the path from the root to <math>v</math>. This provides a new tree whose root is a copy of the root of the original tree. Then we perform the deletion on the new tree. We end up with two versions of the tree.
The original one, which contains <math>k</math>, and the new one, which doesn't contain <math>k</math>. Since a deletion only modifies the paths from the root to the affected nodes, and any appropriate deletion algorithm runs in <math>O(\log n)</math> time, deletion in the persistent data structure also takes <math>O(\log n)</math>. Every sequence of insertions and deletions causes the creation of a sequence of dictionaries, or versions, or trees <math>S_{1}, S_{2}, \dots, S_{i}</math>, where each <math>S_{i}</math> is the result of applying the first <math>i</math> operations. If each <math>S_{i}</math> contains <math>m</math> elements, then a search in <math>S_{i}</math> takes <math>O(\log m)</math> time. Using this persistent data structure we can solve the next element search problem in <math>O(\log n)</math> query time and <math>O(n\log n)</math> space instead of <math>O(n^{2})</math>. The source code for an example related to the next element search problem can be found below.
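Path copying on a search tree can be sketched as follows. This is an illustrative sketch, not the article's code: it uses an unbalanced binary search tree and omits the rebalancing rotations, but it shows the essential point that an insertion copies only the root-to-leaf path while all untouched subtrees are shared between versions.

```python
# Path-copying insertion: each insert returns a brand-new root, old
# versions stay intact, and unchanged subtrees are shared, so each
# new version costs only O(depth) extra nodes.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    # copy only the nodes on the search path; share everything else
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    return Node(root.key, root.left, insert(root.right, key))

def contains(root, key):
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

v0 = None
v1 = insert(v0, 2)
v2 = insert(v1, 1)
v3 = insert(v2, 3)
assert contains(v3, 1) and contains(v3, 3)
assert not contains(v1, 1)      # the old version is unchanged
assert v3.left is v2.left       # the untouched subtree is shared
```

In the sweep-line application, each strip's tree is one such version; since consecutive strips differ by one insertion or deletion, the <math>2n</math> versions cost <math>O(n\log n)</math> total space instead of <math>O(n^{2})</math>.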
 
==Examples of persistent data structures==