{{Short description|Data structure that always preserves the previous version of itself when it is modified}}
{{Distinguish|Persistent storage}}
In [[computing]], a '''persistent data structure''' or '''not ephemeral data structure''' is a [[data structure]] that always preserves the previous version of itself when it is modified. Such data structures are effectively [[Immutable object|immutable]], as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article.<ref name="Driscoll">{{Cite book |vauthors=Driscoll JR, Sarnak N, Sleator DD, Tarjan RE |title=Proceedings of the eighteenth annual ACM symposium on Theory of computing - STOC '86 |year=1986 |chapter=Making data structures persistent |isbn=978-0-89791-193-1 |doi=10.1145/12130.12142 |journal=Proceeding STOC '86. Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing |pages=109–121|citeseerx=10.1.1.133.4630 |s2cid=364871 }}</ref>
Putting it all together, the change in <math>\phi</math> is <math>\Delta\phi = 1 - k</math>. Thus, the algorithm takes <math>O(k + \Delta\phi) = O(1)</math> space and <math>O(k + \Delta\phi + 1) = O(1)</math> time.
==Generalized form of persistence==
Path copying is one of the simple methods to achieve persistence in a particular data structure such as a binary search tree. It is desirable, however, to have a general strategy for implementing persistence that works with any given data structure. In order to achieve that, we consider a directed graph {{mvar|G}}. We assume that each vertex {{mvar|v}} in {{mvar|G}} has a constant number {{mvar|c}} of outgoing edges that are represented by pointers. Each vertex has a label representing the data. We consider that a vertex has a bounded number {{mvar|d}} of edges leading into it, which we define as inedges({{mvar|v}}). We allow the following operations on {{mvar|G}}.
* CHANGE-LABEL({{mvar|v}}, {{mvar|x}}): Changes the value of the data stored at {{mvar|v}} to {{mvar|x}}
Any of the above operations is performed at a specific time, and the purpose of the persistent graph representation is to be able to access any version of {{mvar|G}} at any given time. For this purpose we define a table for each vertex {{mvar|v}} in {{mvar|G}}. The table contains {{mvar|c}} columns and <math>d+1</math> rows. Each row contains, in addition to the pointers for the outgoing edges, a label which represents the data at the vertex and a time {{mvar|t}} at which the operation was performed. In addition there is an array inedges({{mvar|v}}) that keeps track of all the incoming edges to {{mvar|v}}. When a table is full, a new table with <math>d+1</math> rows can be created. The old table becomes inactive and the new table becomes the active table.
===CREATE-NODE===
A call to CREATE-NODE creates a new table and sets all the references to null.
===CHANGE-EDGE===
If we assume that CHANGE-EDGE({{mvar|v}}, {{mvar|i}}, {{mvar|u}}) is called, then there are two cases to consider.
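These operations can be sketched with a simplified fat-node representation, in which each vertex keeps every timestamped row in one growing list and versions are found by binary search on time. The bounded tables of <math>d+1</math> rows, and hence the case analysis above, are collapsed into a single append here; all names (<code>Vertex</code>, <code>change_edge</code>, etc.) are illustrative, not a standard API.

```python
import bisect

class Vertex:
    """Fat-node vertex: every update appends a timestamped row."""
    def __init__(self, time, c=2):
        # Each row holds (label, outgoing pointer list of length c),
        # valid from the corresponding entry in self.times onwards.
        self.times = [time]
        self.rows = [(None, [None] * c)]
        self.inedges = []            # vertices with an edge into this one

    def row_at(self, time):
        """Return the (label, edges) row valid at `time`."""
        i = bisect.bisect_right(self.times, time) - 1
        return self.rows[i]

    def _append(self, time, label, edges):
        self.times.append(time)
        self.rows.append((label, list(edges)))

def create_node(time, c=2):
    return Vertex(time, c)

def change_label(v, time, x):
    _, edges = v.row_at(time)
    v._append(time, x, edges)

def change_edge(v, i, u, time):
    label, edges = v.row_at(time)
    edges = list(edges)
    edges[i] = u                     # i-th outgoing edge now points to u
    if u is not None:
        u.inedges.append(v)
    v._append(time, label, edges)
```

Because old rows are never modified, <code>row_at</code> can answer queries against any past version; the price, compared with the bounded-table scheme, is that lookup in a heavily updated vertex costs <math>O(\log m)</math> for <math>m</math> updates rather than <math>O(1)</math>.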
===Next Element Search or Point Location===
One of the useful applications that can be solved efficiently using persistence is the next element search. Assume that there are <math>n</math> non-intersecting line segments parallel to the x-axis. We want to build a data structure that, for a query point <math>p</math>, returns the segment above <math>p</math> (if any). We will start by solving the next element search using the naïve method, then we will show how to solve it using the persistent data structure method.
====Naïve Method====
We start with a vertical line that begins at infinity and sweep it across the line segments from left to right, pausing every time we encounter an endpoint of a segment. The vertical lines through the endpoints split the plane into vertical strips: if there are <math>n</math> line segments, then we get <math>2\cdot n+1</math> vertical strips, since each segment has <math>2</math> endpoints. No segment begins and ends inside a strip; every segment either does not touch a strip or completely crosses it. We can think of the segments crossing a strip as objects in sorted order from top to bottom, and what we care about is where the query point fits in this order. We sort the endpoints of the segments by their <math>x</math> coordinate. For each strip <math>s_{i}</math>, we store the subset of segments that cross <math>s_{i}</math> in a dictionary. When the vertical line sweeps over the left endpoint of a segment, we add the segment to the dictionary; when it passes the right endpoint, we remove it. At every endpoint we save a copy of the dictionary, and we store all the copies sorted by <math>x</math> coordinate. Thus we have a data structure that can answer any query. In order to find the segment above a point <math>p</math>, we look at the <math>x</math> coordinate of <math>p</math> to determine which copy, or strip, it belongs to, and then at the <math>y</math> coordinate to find the segment above it. Thus we need two binary searches, one on the <math>x</math> coordinate to find the strip or copy, and another on the <math>y</math> coordinate to find the segment above it, so the query takes <math>O(\log n)</math> time.
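The sweep above can be sketched as follows, assuming segments are given as <code>(x_left, x_right, y)</code> triples of horizontal, pairwise non-crossing segments; the query returns the <math>y</math> coordinate of the segment above the point. Names here are illustrative.

```python
import bisect

def build(segments):
    """Sweep left to right, saving a copy of the active set per endpoint."""
    events = []                          # (x, is_left_endpoint, y)
    for xl, xr, y in segments:
        events.append((xl, True, y))
        events.append((xr, False, y))
    events.sort(key=lambda e: e[0])

    xs, copies = [], []                  # strip boundaries and saved copies
    active = []                          # y values of segments crossing here
    for x, is_left, y in events:
        if is_left:
            bisect.insort(active, y)     # segment starts: enters the order
        else:
            active.remove(y)             # segment ends: leaves the order
        xs.append(x)
        copies.append(list(active))      # O(n) copy -> O(n^2) space overall
    return xs, copies

def segment_above(xs, copies, p):
    """Two binary searches: one on x for the strip, one on y within it."""
    px, py = p
    i = bisect.bisect_right(xs, px) - 1
    if i < 0:
        return None                      # left of every strip
    ys = copies[i]
    j = bisect.bisect_right(ys, py)
    return ys[j] if j < len(ys) else None
```

The explicit <code>list(active)</code> copy at every endpoint is exactly the space cost discussed next; replacing those copies with versions of a persistent dictionary is what the persistent-data-structure solution improves.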
In this data structure, space is the issue: if the segments are arranged so that every segment starts before any other segment ends, then each of the <math>O(n)</math> saved copies can hold <math>O(n)</math> segments, and the space required by the naïve method is <math>O(n^{2})</math>. Let us see how we can build another persistent data structure with the same query time but better space.
===Linked lists===
Singly [[linked list]]s are the bread-and-butter data structure in functional languages.<ref name="auto">''This example is taken from Okasaki. See the bibliography.''</ref> Some [[ML programming language|ML]]-derived languages, like [[Haskell (programming language)|Haskell]], are purely functional because once a node in the list has been allocated, it cannot be modified, only copied, referenced or destroyed by the [[Garbage collection (computer science)|garbage collector]] when nothing refers to it.
Consider the two lists:
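The structural sharing that makes immutable lists persistent can be sketched with explicit cons cells; <code>Node</code>, <code>cons</code>, and <code>concat</code> are illustrative names, not a standard library API.

```python
class Node:
    """One immutable cons cell: a head value and a tail list."""
    __slots__ = ("head", "tail")
    def __init__(self, head, tail):
        self.head = head
        self.tail = tail               # never mutated after construction

def cons(x, xs):
    return Node(x, xs)

def from_list(items):
    result = None                      # None is the empty list
    for x in reversed(items):
        result = cons(x, result)
    return result

def to_list(xs):
    out = []
    while xs is not None:
        out.append(xs.head)
        xs = xs.tail
    return out

def concat(xs, ys):
    """xs ++ ys: copies only the cells of xs, shares ys entirely."""
    if xs is None:
        return ys
    return cons(xs.head, concat(xs.tail, ys))
```

Concatenating a list of length <math>k</math> onto another therefore allocates only <math>k</math> new cells: the second list is referenced, not copied, and both original lists remain valid, unmodified versions.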
=== Prolog ===
Prolog terms are naturally immutable and therefore data structures are typically persistent data structures. Their performance depends on sharing and garbage collection offered by the Prolog system.<ref>{{Citation|title= The Implementation of Prolog - Patrice Boizumault|date=1993|isbn=9780691637709|url=https://press.princeton.edu/books/hardcover/9780691637709/the-implementation-of-prolog|last1=Djamboulian|first1=Ara M.|last2=Boizumault|first2=Patrice}}</ref> Extensions to non-ground Prolog terms are not always feasible because of search space explosion. Delayed goals might mitigate the problem.
Some Prolog systems nevertheless provide destructive operations like setarg/3, which may come in different flavors, with or without copying and with or without backtracking of the state change. In some cases setarg/3 is used to provide a new declarative layer, such as a constraint solver.<ref>{{Citation|title=The Use of Mercury for the Implementation of a Finite Domain Solver - Henk Vandecasteele, Bart Demoen, Joachim Van Der Auwera|date=1999|url=https://lirias.kuleuven.be/retrieve/440562}}</ref>
<references />
==External links==
* [http://wiki.edinburghhacklab.com/PersistentRedBlackTreeSet Lightweight Java implementation of Persistent Red-Black Trees]