{{Short description|Data structure that always preserves the previous version of itself when it is modified}}
{{Distinguish|Persistent storage}}
In [[computing]], a '''persistent data structure''' or '''not ephemeral data structure''' is a [[data structure]] that always preserves the previous version of itself when it is modified. Such data structures are effectively [[Immutable object|immutable]], as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article.<ref name=Driscoll>{{cite journal |last1=Driscoll |first1=James R. |last2=Sarnak |first2=Neil |last3=Sleator |first3=Daniel D. |last4=Tarjan |first4=Robert E. |year=1989 |title=Making data structures persistent |journal=Journal of Computer and System Sciences |volume=38 |issue=1 |pages=86–124 |doi=10.1016/0022-0000(89)90034-2 |doi-access=free}}</ref>
A data structure is '''partially persistent''' if all versions can be accessed but only the newest version can be modified. The data structure is '''fully persistent''' if every version can be both accessed and modified. If there is also a meld or merge operation that can create a new version from two previous versions, the data structure is called '''confluently persistent'''. Structures that are not persistent are called ''ephemeral''.<ref name="kaplan2">{{Cite journal|author=Kaplan, Haim|year=2001|title=Persistent data structures|url=http://www.math.tau.ac.il/~haimk/papers/persistent-survey.ps|journal=Handbook on Data Structures and Applications}}</ref>
==Partial versus full persistence==
In the partial persistence model, a programmer may query any previous version of a data structure, but may only update the latest version. This implies a [[Total order|linear ordering]] among each version of the data structure.<ref>{{Citation|last1=Conchon|first1=Sylvain|chapter=Semi-persistent Data Structures|pages=322–336|publisher=Springer Berlin Heidelberg|isbn=9783540787389|last2=Filliâtre|first2=Jean-Christophe|doi=10.1007/978-3-540-78739-6_25|title=Programming Languages and Systems|volume=4960|series=Lecture Notes in Computer Science|year=2008|doi-access=free}}</ref> In the fully persistent model, both updates and queries are allowed on any version of the data structure. In some cases the [[Computer performance|performance characteristics]] of querying or updating older versions of a data structure may be allowed to degrade, as is true with the [[Rope (data structure)|rope]] data structure.
== Partially persistent data structure ==
A partially persistent data structure allows the user to query any version of the structure, but permits updates only to the latest version.
An ephemeral data structure can be converted to a partially persistent data structure using a few techniques.
One such technique uses a randomized version of a [[van Emde Boas tree]], which is created using dynamic perfect hashing. This data structure is created as follows:
* A stratified tree with m elements is implemented using dynamic perfect hashing.
* The tree is pruned by dividing the m elements into buckets of size log(log n), such that the elements of bucket 1 are smaller than the elements of bucket 2, and so on.
* The maximal element in each bucket is stored in the stratified tree and each bucket is stored in the structure as an unordered linked list.
The size of this data structure is bounded by the number of elements stored in the structure, which is O(m). A new maximal element can be inserted in O(1) expected and amortized time, and a query to find an element takes O(log(log n)) worst-case time, as in the sketch below.<ref>{{Cite journal |last=Lenhof |first=Hans-Peter |last2=Smid |first2=Michiel |date=1994 |title=Using persistent data structures for adding range restrictions to searching problems |url=http://dx.doi.org/10.1051/ita/1994280100251 |journal=RAIRO - Theoretical Informatics and Applications |volume=28 |issue=1 |pages=25–49 |doi=10.1051/ita/1994280100251 |issn=0988-3754|hdl=11858/00-001M-0000-0014-AD4F-B |hdl-access=free }}</ref>
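The following is a minimal sketch of the pruned bucket structure in Python. The class and method names are illustrative, and a sorted list searched with binary search stands in for the stratified tree, so lookups in this sketch cost O(log n) rather than the O(log log n) achieved with the real stratified tree.

<syntaxhighlight lang="python">
import bisect

class PrunedBuckets:
    """Bucket maxima are kept in a sorted list (standing in for the
    stratified tree); each bucket itself is an unordered list."""

    def __init__(self, bucket_cap):
        self.bucket_cap = bucket_cap  # intended to be about log log n
        self.maxima = []              # sorted list: the maximum of each bucket
        self.buckets = []             # buckets[i] holds the elements of bucket i

    def insert_max(self, x):
        """Insert a new overall maximum in O(1) amortized time."""
        if not self.buckets or len(self.buckets[-1]) >= self.bucket_cap:
            self.buckets.append([x])
            self.maxima.append(x)
        else:
            self.buckets[-1].append(x)
            self.maxima[-1] = x       # x is the new maximum of the last bucket

    def member(self, x):
        """Find the bucket via the maxima, then scan the unordered bucket."""
        i = bisect.bisect_left(self.maxima, x)
        return i < len(self.buckets) and x in self.buckets[i]
</syntaxhighlight>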
==Techniques for preserving previous versions==
=== Copy-on-write ===
One method for creating a persistent data structure is to use a platform provided ephemeral data structure such as an [[Mutable array|array]] to store the data in the data structure and copy the entirety of that data structure on any update. This is an inefficient technique because the entire backing data structure must be copied for each write, leading to worst-case <math>O(n\cdot m)</math> performance characteristics for <math>m</math> modifications of an array of size <math>n</math>.
[[Copy-on-write]] memory management can reduce the price for an update from <math>\Theta(n)</math> to <math>O(Bu)</math>, where ''B'' is the memory block size and ''u'' the number of pages updated in an operation.{{citation needed|date=May 2019}}
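As an illustration, here is a minimal sketch in Python of the full-copy approach described above (the class name <code>PersistentArray</code> and its methods are illustrative); every update copies the whole backing array, which is precisely the cost that copy-on-write memory management reduces.

<syntaxhighlight lang="python">
class PersistentArray:
    """Naive persistence: every write copies the whole backing array."""

    def __init__(self, data):
        self.versions = [list(data)]              # version 0

    def set(self, version, index, value):
        """Derive a new version from `version` with one slot changed.
        Costs Theta(n) time and space per update."""
        snapshot = list(self.versions[version])   # copy the entire array
        snapshot[index] = value
        self.versions.append(snapshot)
        return len(self.versions) - 1             # the new version's number

    def get(self, version, index):
        return self.versions[version][index]

# Older versions stay readable after updates.
pa = PersistentArray([1, 2, 3])
v1 = pa.set(0, 0, 99)
assert pa.get(0, 0) == 1 and pa.get(v1, 0) == 99
</syntaxhighlight>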
===Fat node===
The fat node method records all changes made to node fields in the nodes themselves, without erasing old values of the fields. This requires that nodes be allowed to become arbitrarily "fat": in addition to the same fields as an ephemeral node, a fat node holds space for an arbitrary number of extra field values, each with an associated field name and a version stamp indicating the version in which the named field was changed to the specified value.
====Complexity of fat node====
The fat node method requires O(1) space for every modification: just store the new data. Each modification takes O(1) additional time to store the modification at the end of the modification history. This is an [[Amortized analysis|amortized time]] bound, assuming modification history is stored in a growable [[Array data structure|array]]. At [[access time]], the right version at each node must be found as the structure is traversed. If m modifications were made, then each access operation has O(log m) slowdown resulting from the cost of finding the nearest modification in the history (for example, by binary search on the version stamps).
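A minimal sketch of a fat node in Python, assuming integer version stamps (the names <code>FatNode</code>, <code>set_field</code> and <code>get_field</code> are illustrative): writes append to a per-field history, and reads binary-search that history, exhibiting the O(1) modification cost and O(log m) access slowdown discussed above.

<syntaxhighlight lang="python">
import bisect

class FatNode:
    """Every field keeps its whole history: parallel lists of version
    stamps (in increasing order) and values, one pair per field name."""

    def __init__(self):
        self.stamps = {}   # field name -> sorted list of version stamps
        self.values = {}   # field name -> values parallel to stamps

    def set_field(self, field, version, value):
        # O(1) amortized: append the change at the end of the history.
        self.stamps.setdefault(field, []).append(version)
        self.values.setdefault(field, []).append(value)

    def get_field(self, field, version):
        # O(log m): binary search for the last change at or before `version`.
        stamps = self.stamps.get(field, [])
        i = bisect.bisect_right(stamps, version) - 1
        return self.values[field][i] if i >= 0 else None

node = FatNode()
node.set_field("value", 1, "a")
node.set_field("value", 5, "b")
assert node.get_field("value", 3) == "a"   # version 3 still sees "a"
assert node.get_field("value", 5) == "b"
</syntaxhighlight>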
===Path copying===
This method assumes that the data structure is a linked graph of nodes. A copy is made of every node on the path to the node that is about to be changed; the change is then made to the copy, and the copies cascade back until a new root is produced for the new version. Nodes not on the path remain shared between the old and new versions, as in the sketch below.
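A minimal sketch of path copying in Python for a binary search tree (unbalanced, for brevity; the names are illustrative). Insertion copies only the nodes on the search path and shares the rest with the previous version.

<syntaxhighlight lang="python">
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Return the root of a NEW tree that contains `key`; the old tree
    is untouched. Only the nodes on the search path are copied."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    return Node(root.key, root.left, insert(root.right, key))

def contains(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

# Every insert yields a new version; old versions remain queryable.
v1 = insert(None, 5)
v2 = insert(v1, 3)
assert not contains(v1, 3) and contains(v2, 3) and contains(v2, 5)
</syntaxhighlight>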
====Complexity of path copying====
Access requires finding the root of the desired version, which with m modifications adds O(log m) lookup time. Modification time and space are bounded by the length of the longest copied path plus the cost of the update in the ephemeral structure: in a balanced binary search tree without parent pointers the worst-case modification cost is O(log n), whereas in a linked list it can be O(n), since the whole list up to the modified node must be copied.
===A combination===
Driscoll, Sarnak, [[Daniel Sleator|Sleator]], [[Robert Tarjan|Tarjan]] came up<ref name=Driscoll /> with a way to combine the techniques of fat nodes and path copying, achieving O(1) access slowdown and O(1) modification space and time complexity.
In each node, one modification box is stored. This box can hold one modification to the node—either a modification to one of the pointers, or to the node's key, or to some other piece of node-specific data—and a timestamp for when that modification was applied. Initially, every node's modification box is empty.
Whenever a node is accessed, the modification box is checked, and its timestamp is compared against the access time. (The access time specifies the version of the data structure being considered.) If the modification box is empty, or the access time is before the modification time, then the modification box is ignored and only the normal part of the node is considered. On the other hand, if the access time is after the modification time, then the value in the modification box is used, overriding that value in the node.
Modifying a node works like this. (It is assumed that each modification touches one pointer or similar field.) If the node's modification box is empty, then it is filled with the modification. Otherwise, the modification box is full. A copy of the node is made, but using only the latest values. The modification is performed directly on the new node, without using the modification box. (One of the new node's fields is overwritten and its modification box stays empty.) Finally, this change is cascaded to the node's parent, just like path copying. (This may involve filling the parent's modification box, or making a copy of the parent recursively. If the node has no parent—it is the root—the new root is added to a [[sorted array]] of roots.)
With this [[algorithm]], given any time t, at most one modification box exists in the data structure with time t. Thus, a modification at time t splits the tree into three parts: one part contains the data from before time t, one part contains the data from after time t, and one part was unaffected by the modification.
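The following is a simplified sketch of this scheme in Python for a binary tree, with illustrative names. It elides the maintenance of the children's inverse (parent) pointers, which the full structure of Driscoll et al. handles with additional bookkeeping, and it assumes the node passed to <code>set_field</code> is current at the given version.

<syntaxhighlight lang="python">
class Node:
    """Tree node with one modification box."""
    def __init__(self, key, left=None, right=None, parent=None):
        self.key, self.left, self.right = key, left, right
        self.parent = parent
        self.mod = None                        # at most one (version, field, value)

    def get(self, field, version):
        """Read `field` ('key', 'left' or 'right') as of `version`."""
        if self.mod and self.mod[0] <= version and self.mod[1] == field:
            return self.mod[2]
        return getattr(self, field)

roots = []                                     # (version, root), in version order

def set_field(node, field, value, version):
    """Write `field := value` at `version`: fill the box if it is empty,
    otherwise copy the node and cascade the change to the parent."""
    if node.mod is None:
        node.mod = (version, field, value)
        return node
    # Box already full: copy the node with its latest values ...
    copy = Node(node.get("key", version),
                node.get("left", version),
                node.get("right", version),
                node.parent)
    setattr(copy, field, value)                # ... and modify the copy directly
    if node.parent is None:                    # the node was the root
        roots.append((version, copy))
    else:                                      # the parent must point at the copy
        side = "left" if node.parent.get("left", version) is node else "right"
        set_field(node.parent, side, copy, version)
    return copy

def root_at(version):
    """The latest root whose version is at or before `version`."""
    current = None
    for v, r in roots:
        if v <= version:
            current = r
    return current
</syntaxhighlight>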
====Complexity of the combination====
Time and space for modifications require amortized analysis. A modification takes O(1) amortized space, and O(1) amortized time. To see why, use a [[Potential method|potential function]] ''ϕ'', where ''ϕ''(''T'') is the number of full live nodes in ''T''. The live nodes of ''T'' are just the nodes that are reachable from the current root at the current time (that is, after the last modification). The full live nodes are the live nodes whose modification boxes are full.
Each modification involves some number of copies, say ''k'', followed by 1 change to a modification box. Consider each of the ''k'' copies. Each costs O(1) space and time, but decreases the potential function by one. (First, the node to be copied must be full and live, so it contributes to the potential function. The potential function will only drop, however, if the old node isn't reachable in the new tree. But it is known that it isn't reachable in the new tree—the next step in the algorithm will be to modify the node's parent to point at the copy. Finally, it is known that the copy's modification box is empty. Thus, a full live node has been replaced with an empty live node, and ''ϕ'' goes down by one.) The final step fills a modification box, which costs O(1) time and increases ''ϕ'' by one.
Putting it all together, the change in ''ϕ'' is Δ''ϕ'' = 1 − ''k''. Thus, the algorithm takes O(''k'' + Δ''ϕ'') = O(1) space and O(''k'' + Δ''ϕ'' + 1) = O(1) time.
==Generalized form of persistence==
Path copying is one of the simple methods to achieve persistence in a certain data structure such as binary search trees, but it is desirable to have a general strategy for implementing persistence that works with any given data structure. In order to achieve that, we consider a directed graph {{mvar|G}}. We assume that each vertex {{mvar|v}} in {{mvar|G}} has a constant number {{mvar|c}} of outgoing edges that are represented by pointers. Each vertex has a label representing the data. We consider that a vertex has a bounded number {{mvar|d}} of edges leading into it, which we define as inedges({{mvar|v}}). We allow the following different operations on {{mvar|G}}.
* CREATE-NODE(): Creates a new vertex with no incoming or outgoing edges
* CHANGE-EDGE({{mvar|v}}, {{mvar|i}}, {{mvar|u}}): Changes the {{mvar|i}}-th edge of {{mvar|v}} to point to {{mvar|u}}
* CHANGE-LABEL({{mvar|v}}, {{mvar|x}}): Changes the value of the data stored at {{mvar|v}} to {{mvar|x}}
Any of the above operations is performed at a specific time and the purpose of the persistent graph representation is to be able to access any version of {{mvar|G}} at any given time. For this purpose we define a table for each vertex {{mvar|v}} in {{mvar|G}}. The table contains {{mvar|c}} columns and <math>d+1</math> rows. Each row contains in addition to the pointers for the outgoing edges, a label which represents the data at the vertex and a time {{mvar|t}} at which the operation was performed. In addition to that there is an array inedges({{mvar|v}}) that keeps track of all the incoming edges to {{mvar|v}}. When a table is full, a new table with <math>d+1</math> rows can be created. The old table becomes inactive and the new table becomes the active table.
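A minimal sketch of the per-vertex table in Python, with illustrative names, assuming {{mvar|c}} outgoing pointers and <math>d+1</math> rows per table:

<syntaxhighlight lang="python">
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Row:
    time: int        # version at which this row was written
    label: Any       # the data stored at the vertex at that time
    edges: list      # the c outgoing pointers (references to tables)

@dataclass
class VertexTable:
    capacity: int                                  # d + 1 rows per table
    rows: list = field(default_factory=list)
    inedges: list = field(default_factory=list)    # tables with edges into v
    active: bool = True

    def row_at(self, time):
        """The newest row written at or before `time`."""
        best = None
        for row in self.rows:
            if row.time <= time and (best is None or row.time > best.time):
                best = row
        return best
</syntaxhighlight>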
===CREATE-NODE===
A call to CREATE-NODE creates a new table and sets all the references to null.
===CHANGE-EDGE===
If we assume that CHANGE-EDGE({{mvar|v}}, {{mvar|i}}, {{mvar|u}}) is called, then there are two cases to consider.
The first case occurs when there is still at least one empty row in the table of {{mvar|v}}. In this case a new row is filled with the time {{mvar|t}} at which the operation was performed, the {{mvar|i}}-th pointer changed to point to {{mvar|u}}, and the remaining values copied from the latest row. The second case occurs when the table of {{mvar|v}} is full. In this case a new active table is created for {{mvar|v}}, its first row is filled with the latest values with the {{mvar|i}}-th pointer changed to point to {{mvar|u}}, and every vertex in inedges({{mvar|v}}) is updated to point to the new table.
===Efficiency of the generalized persistent data structure===
In order to find the efficiency of the scheme proposed above, we use an accounting argument known as a credit scheme. The credit represents a currency; for example, a credit can be used to pay for the creation of a table. The argument states the following:
* The creation of one table requires one credit
* Each call to CREATE-NODE comes with two credits
* Each call to CHANGE-EDGE or CHANGE-LABEL comes with one credit
The invariant to maintain is that every used row of an active table stores one credit. We check that each operation preserves this invariant:
*CREATE-NODE: One of its two credits pays for the creation of the table and the other is stored on the row holding the initial values, so the invariant is maintained.
*CHANGE-EDGE: There are two cases to consider. The first case occurs when there is still at least one empty row in the table. In this case one credit is used for the newly inserted row. The second case occurs when the table is full. In this case the old table becomes inactive and the <math>d+1</math> credits are transferred to the new table, in addition to the one credit acquired from calling CHANGE-EDGE. So in total we have <math>d+2</math> credits. One credit will be used for the creation of the new table. Another credit will be used for the new row added to the table, and the {{mvar|d}} credits left are used for updating the tables of the other vertices that need to point to the new table. We conclude that the invariant is maintained.
*CHANGE-LABEL: It works exactly the same as CHANGE-EDGE.
As a summary, we conclude that having <math>n_{1}</math> calls to CREATE-NODE and <math>n_{2}</math> calls to CHANGE-EDGE will result in the creation of <math>2\cdot n_{1}+n_{2}</math> tables. Since each table has size <math>O(d)</math> without taking into account the recursive calls, filling in a table requires <math>O(d)</math> work, so the total work for the whole sequence of operations is bounded by the number of tables created multiplied by <math>O(d)</math>.
==Applications of persistent data structures==
===Next element search or point ___location===
One of the useful applications that can be solved efficiently using persistence is the Next Element Search. Assume that there are <math>n</math> line segments in the plane that do not intersect each other, and that we want to answer queries of the following form: given a query point <math>p</math>, find the segment lying immediately above it.
====Naïve method====
We start with a vertical line that starts off at infinity and sweep it across the line segments from left to right, pausing every time we encounter an endpoint of a segment. The vertical lines through the endpoints split the plane into vertical strips. If there are <math>n</math> line segments, then we get <math>2\cdot n+1</math> vertical strips, since each segment has <math>2</math> endpoints. No segment begins and ends inside a strip: each segment either does not touch a given strip or completely crosses it. We can think of the segments crossing a strip as objects in sorted order from top to bottom; what we care about is where the query point fits in this order. We sort the endpoints of the segments by their <math>x</math> coordinate. For each strip <math>s_{i}</math>, we store the subset of segments that cross <math>s_{i}</math> in a dictionary. When the vertical line sweeps the segments, whenever it passes over the left endpoint of a segment we add the segment to the dictionary, and when it passes over the right endpoint we remove the segment from the dictionary. At every endpoint, we save a copy of the dictionary, and we store all the copies sorted by <math>x</math> coordinate. Thus we have a data structure that can answer any query. In order to find the segment above a point <math>p</math>, we look at the <math>x</math> coordinate of <math>p</math> to know which copy, or strip, it belongs to, and then at the <math>y</math> coordinate to find the segment above it. Thus we need two binary searches, one on the <math>x</math> coordinate to find the strip, and another on the <math>y</math> coordinate to find the segment above the point, so a query takes <math>O(\log n)</math> time. The problem with this data structure is space: if the segments are arranged such that every segment starts before any other segment ends, the space required when built with this naïve method is <math>O(n^{2})</math>. A sketch of the naïve method appears below.
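A minimal sketch of the naïve method in Python, assuming each segment is represented as an (x1, x2, y) horizontal triple (an illustrative simplification); a full copy of the sorted dictionary is stored for every strip.

<syntaxhighlight lang="python">
import bisect

def build_naive(segments):
    """segments: (x1, x2, y) horizontal segments with x1 < x2. Returns the
    strip boundaries and, per strip, a full copy of the segments crossing
    it sorted by y -- which is what costs O(n^2) space."""
    events = sorted({x for x1, x2, _ in segments for x in (x1, x2)})
    snapshots, active = [], []
    for x in events:
        for x1, x2, y in segments:
            if x1 == x:
                bisect.insort(active, (y, x1, x2))   # segment enters the strip
            elif x2 == x:
                active.remove((y, x1, x2))           # segment leaves the strip
        snapshots.append(list(active))               # full copy per strip
    return events, snapshots

def segment_above(events, snapshots, p):
    """Two binary searches: one on x for the strip, one on y in the strip."""
    px, py = p
    i = bisect.bisect_right(events, px) - 1
    if i < 0:
        return None
    strip = snapshots[i]
    j = bisect.bisect_right(strip, (py, float("inf"), float("inf")))
    return strip[j] if j < len(strip) else None
</syntaxhighlight>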
====Persistent data structure method====
Let us see how we can build another persistent data structure with the same query time but with better space. What is really expensive in the naïve method is that whenever we move from one strip to the next, we save a complete snapshot of whatever data structure we are using to keep things in sorted order. However, once we have the segments that intersect <math>s_{i}</math>, when we move to <math>s_{i+1}</math> either one segment leaves or one segment enters. If the difference between what is in <math>s_{i}</math> and what is in <math>s_{i+1}</math> is only one insertion or deletion, it is wasteful to copy everything from <math>s_{i}</math> to <math>s_{i+1}</math>; the trick is that, since each copy differs from the previous one by only one insertion or deletion, we need to copy only the parts that change. Assume that we have a tree rooted at <math>T</math>. When we insert a key <math>k</math> into the tree, we create a new leaf containing <math>k</math>. Performing rotations to rebalance the tree will only modify the nodes on the path from <math>k</math> to <math>T</math>. Before inserting the key <math>k</math> into the tree, we copy all the nodes on the path from <math>k</math> to <math>T</math>. Now we have two versions of the tree: the original one, which does not contain <math>k</math>, and the new one, which contains <math>k</math> and whose root is a copy of the root of <math>T</math>. Since copying the path from <math>k</math> to <math>T</math> does not increase the insertion time by more than a constant factor, insertion in the persistent data structure takes <math>O(\log n)</math> time. For deletion, we need to find which nodes will be affected by the deletion, and for each such node <math>v</math> we copy the path from the root to <math>v</math>. This yields a new tree whose root is a copy of the root of the original tree, and we perform the deletion on the new tree. We end up with two versions of the tree: the original one, which contains <math>k</math>, and the new one, which does not. Since any deletion only modifies the path from the root to <math>v</math>, and any appropriate deletion algorithm runs in <math>O(\log n)</math> time, deletion in the persistent data structure takes <math>O(\log n)</math> time. Every sequence of insertions and deletions causes the creation of a sequence of dictionaries, or versions, or trees <math>S_{1}, S_{2}, \dots, S_{i}</math>, where each <math>S_{i}</math> is the result of the first <math>i</math> operations. If each <math>S_{i}</math> contains <math>m</math> elements, then a search in <math>S_{i}</math> takes <math>O(\log m)</math> time. Using this persistent data structure we can solve the next element search problem in <math>O(\log n)</math> query time using <math>O(n\cdot\log n)</math> space instead of <math>O(n^{2})</math>. A sketch of source code for this approach appears below.
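Below is a minimal sketch of this approach in Python, with illustrative names and an unbalanced tree for brevity; each sweep event stores only a new root produced by path copying, so unchanged subtrees are shared across versions.

<syntaxhighlight lang="python">
import bisect

class Node:
    __slots__ = ("seg", "left", "right")          # keyed by the segment's y
    def __init__(self, seg, left=None, right=None):
        self.seg, self.left, self.right = seg, left, right

def insert(root, seg):
    """Path-copying insert: returns a new root; the old version is untouched."""
    if root is None:
        return Node(seg)
    if seg[2] < root.seg[2]:
        return Node(root.seg, insert(root.left, seg), root.right)
    return Node(root.seg, root.left, insert(root.right, seg))

def delete(root, seg):
    """Path-copying delete (seg assumed present; successor-based)."""
    if root is None:
        return None
    if seg[2] < root.seg[2]:
        return Node(root.seg, delete(root.left, seg), root.right)
    if seg != root.seg:
        return Node(root.seg, root.left, delete(root.right, seg))
    if root.left is None:
        return root.right
    if root.right is None:
        return root.left
    succ = root.right
    while succ.left is not None:                  # in-order successor
        succ = succ.left
    return Node(succ.seg, root.left, delete(root.right, succ.seg))

def build_persistent(segments):
    """segments: (x1, x2, y) triples. Returns sweep xs and one tree root
    per strip -- O(n log n) space thanks to structural sharing."""
    events = []
    for seg in segments:
        events.append((seg[0], "add", seg))
        events.append((seg[1], "del", seg))
    events.sort()
    xs, roots, root = [], [], None
    for x, kind, seg in events:
        root = insert(root, seg) if kind == "add" else delete(root, seg)
        xs.append(x)
        roots.append(root)
    return xs, roots

def segment_above(xs, roots, p):
    """Binary search on x for the version, then descend the tree for the
    lowest segment with y above p's y."""
    px, py = p
    i = bisect.bisect_right(xs, px) - 1
    if i < 0:
        return None
    node, best = roots[i], None
    while node is not None:
        if node.seg[2] > py:
            best, node = node.seg, node.left
        else:
            node = node.right
    return best
</syntaxhighlight>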
==Examples of persistent data structures==
[[Purely functional data structure]]s are automatically persistent.
Perhaps the simplest persistent data structure is the [[Linked list|singly linked list]] or ''cons''-based list, a simple list of objects formed by each carrying a [[reference]] to the next in the list. This is persistent because the ''tail'' of the list can be taken, meaning the last ''k'' items for some ''k'', and new nodes can be added in front of it. The tail will not be duplicated, instead becoming shared between the old list and the new list.
Many common reference-based data structures, such as [[red–black tree]]s,<ref name="sarnak2">{{cite journal |author=Neil Sarnak |author2=Robert E. Tarjan |year=1986 |title=Planar Point Location Using Persistent Search Trees |url=http://www.link.cs.cmu.edu/15859-f07/papers/point-___location.pdf |journal=Communications of the ACM |volume=29 |issue=7 |pages=669–679 |doi=10.1145/6138.6151 |s2cid=8745316 |author2-link=Robert Tarjan |access-date=2011-04-06 |archive-url=https://web.archive.org/web/20151010204956/http://www.link.cs.cmu.edu/15859-f07/papers/point-___location.pdf |archive-date=2015-10-10 |url-status=dead }}</ref> [[Stack (data structure)|stacks]],<ref name="okasaki2">{{Cite journal|author=Chris Okasaki|title=Purely Functional Data Structures (thesis)|url=https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf}}</ref> and [[treap]]s,<ref>{{cite journal |last=Liljenzin |first=Olle |title=Confluently Persistent Sets and Maps|arxiv=1301.3388|bibcode=2013arXiv1301.3388L|year=2013 }}</ref> can easily be adapted to create a persistent version. Some others need slightly more effort, for example: [[Queue (data structure)|queues]], [[Double-ended queue|dequeues]], and extensions including [[min-deque]]s (which have an additional ''O''(1) operation ''min'' returning the minimal element) and [[random access deque]]s (which have an additional operation of random access with sub-linear, most often logarithmic, complexity).
Persistent data structures which are based on immutable ("purely functional") structures should be contrasted with structures that use destructive updates (mutation) and are made persistent using the fat node or path copying techniques described above.
===Linked lists===
Singly [[linked list]]s are the bread-and-butter data structure in functional languages.<ref name="auto">''This example is taken from Okasaki. See the bibliography.''</ref> Some [[ML programming language|ML]]-derived languages, like [[Haskell (programming language)|Haskell]], are purely functional because once a node in the list has been allocated, it cannot be modified, only copied, referenced or destroyed by the [[Garbage collection (computer science)|garbage collector]] when nothing refers to it.
Consider the two lists:
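A minimal sketch of this example in Python, assuming the lists xs = [0, 1, 2] and ys = [3, 4, 5] of Okasaki's presentation: appending them copies the cells of xs but shares ys between the input and the result.

<syntaxhighlight lang="python">
class Cons:
    __slots__ = ("head", "tail")
    def __init__(self, head, tail=None):
        self.head, self.tail = head, tail

def append(xs, ys):
    """xs ++ ys: copies the spine of xs, shares ys entirely."""
    if xs is None:
        return ys
    return Cons(xs.head, append(xs.tail, ys))

xs = Cons(0, Cons(1, Cons(2)))      # [0, 1, 2]
ys = Cons(3, Cons(4, Cons(5)))      # [3, 4, 5]
zs = append(xs, ys)                 # [0, 1, 2, 3, 4, 5]

# xs was copied (the first three cells of zs are new), ys is shared:
node = zs
for _ in range(3):
    node = node.tail
assert node is ys
</syntaxhighlight>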
=== Clojure ===
Like many programming languages in the [[Lisp (programming language)|Lisp]] family, Clojure contains an implementation of a linked list, but unlike other dialects its implementation of a linked list has enforced immutability. Clojure also has efficient implementations of persistent vectors, maps, and sets based on persistent [[hash array mapped trie]]s.
The designers of the Clojure language advocate the use of persistent data structures over mutable data structures because they have [[value semantics]], which makes them freely shareable between threads with cheap aliases, easy to fabricate, and language independent.<ref>{{Cite web|url=https://www.infoq.com/presentations/Value-Values|title=Keynote: The Value of Values|website=InfoQ|access-date=2018-10-23}}</ref>
=== Java ===
The [[Java (programming language)|Java programming language]] is not particularly functional. Despite this, the core JDK package java.util.concurrent includes CopyOnWriteArrayList and CopyOnWriteArraySet which are persistent structures, implemented using copy-on-write techniques. The usual concurrent map implementation in Java, ConcurrentHashMap, is not persistent, however. Fully persistent collections are available in third-party libraries,<ref>{{Cite web|url=https://github.com/sigbla/sigbla-pds/|title=Persistent (immutable) collections for Java and Kotlin|website=github.com|access-date=2023-12-13}}</ref> or other JVM languages.
=== JavaScript ===
One such library of persistent data structures, Immutable.js, is based on the data structures made available and popularized by Clojure and Scala.<ref>{{Cite web|url=https://facebook.github.io/immutable-js/|title=Immutable.js|website=facebook.github.io|access-date=2018-10-23|archive-url=https://web.archive.org/web/20150809185757/http://facebook.github.io/immutable-js/|archive-date=2015-08-09|url-status=dead}}</ref> It is mentioned by the documentation of Redux as being one of the possible libraries that can provide enforced immutability.<ref name=":0" /> Mori.js brings data structures similar to those in Clojure to JavaScript.<ref>{{Cite web|url=https://swannodette.github.io/mori/|title = Mori}}</ref> Immer.js brings an approach where one "creates the next immutable state by mutating the current one".
<ref>{{Cite web|url=https://github.com/immerjs/immer|title = Immer| website=[[GitHub]] |date = 26 October 2021}}</ref> Immer.js uses native JavaScript objects rather than efficient persistent data structures, which may cause performance issues when the data size is large.
=== Prolog ===
Prolog terms are naturally immutable and therefore data structures are typically persistent data structures. Their performance depends on sharing and garbage collection offered by the Prolog system.<ref>{{Citation|title= The Implementation of Prolog - Patrice Boizumault|date=1993|isbn=9780691637709|url=https://press.princeton.edu/books/hardcover/9780691637709/the-implementation-of-prolog|last1=Djamboulian|first1=Ara M.|last2=Boizumault|first2=Patrice|publisher=Princeton University Press }}</ref> Extensions to non-ground Prolog terms are not always feasible because of search space explosion. Delayed goals might mitigate the problem.
Some Prolog systems nevertheless do provide destructive operations like setarg/3, which might come in different flavors, with/without copying and with/without backtracking of the state change. There are cases where setarg/3 is used to provide a new declarative layer, such as a constraint solver.<ref>{{Citation|title=The Use of Mercury for the Implementation of a Finite Domain Solver - Henk Vandecasteele, Bart Demoen, Joachim Van Der Auwera|date=1999|url=https://lirias.kuleuven.be/retrieve/440562}}</ref>
=== Scala ===
The Scala programming language promotes the use of persistent data structures for implementing programs using "Object-Functional Style".<ref>{{Cite news|url=https://blog.codecentric.de/en/2015/08/essence-of-object-functional-programming-practical-potential-of-scala/|title=The Essence of Object-Functional Programming and the Practical Potential of Scala - codecentric AG Blog|date=2015-08-31|work=codecentric AG Blog|access-date=2018-10-23|language=en-US}}</ref> Scala contains implementations of many persistent data structures, including linked lists and [[red–black tree]]s, as well as persistent hash array mapped tries as introduced in Clojure.
==Garbage collection==
Because persistent data structures are often implemented in such a way that successive versions of a data structure share underlying memory,<ref>{{Cite web|url=https://kostyukov.net/posts/designing-a-pfds/|title=Vladimir Kostyukov - Posts / Slides|website=kostyukov.net|access-date=2018-11-30}}</ref> ergonomic use of such data structures generally requires some form of [[automatic garbage collection]] system such as [[reference counting]] or [[mark and sweep]].<ref>{{Cite web|url=http://wiki.c2.com/?ImmutableObjectsAndGarbageCollection|title=ImmutableObjectsAndGarbageCollection|website=wiki.c2.com}}</ref>
==See also==
==References==
{{Reflist}}
==External links==
* [http://wiki.edinburghhacklab.com/PersistentRedBlackTreeSet Lightweight Java implementation of Persistent Red-Black Trees]
* [https://persistent.codeplex.com/ Efficient persistent structures in C#]
* {{github|DesaultierMAKK/PersistentBST}} - Implementations of persistent BSTs using the fat node, copy-on-write, and path copying techniques
[[Category:Data structures]]
[[Category:Persistence]]