{{good article}}
{{Short description|Binary tree derived from a sequence of numbers}}
[[File:Cartesian tree.svg|thumb|240px|A sequence of numbers and the Cartesian tree derived from them.]]
{{for|Descartes' metaphor of tree of knowledge|Tree of knowledge (philosophy)}}
In [[computer science]], a '''Cartesian tree''' is a [[binary tree]] derived from a sequence of distinct numbers.
==Definition==
Cartesian trees are defined using [[binary tree]]s, which are a form of [[rooted tree]] in which each node may have up to two children, distinguished as its left and right child. The Cartesian tree for a sequence of distinct numbers can be defined recursively: its root represents the minimum value in the sequence, and its left and right subtrees are the Cartesian trees of the subsequences to the left and to the right of the minimum.
It is also possible to characterize the Cartesian tree directly rather than recursively, using its ordering properties. In any tree, the ''subtree'' rooted at any node consists of all other nodes that can reach it by repeatedly following parent pointers. The Cartesian tree for a sequence of distinct numbers is defined by the following properties:
#The Cartesian tree for a sequence is a [[binary tree]] with one node for each number in the sequence.
#A [[Tree_traversal#In-order,_LNR|symmetric (in-order) traversal]] of the tree results in the original sequence.
#The tree has the [[Binary heap|min-heap property]]: the parent of any non-root node has a smaller value than the node itself.<ref name=minmax/>
These two definitions are equivalent: the tree defined recursively as described above is the unique tree that has the properties listed above. If a sequence of numbers contains repetitions, a Cartesian tree can still be determined for it by following a consistent tie-breaking rule before applying the above construction, for instance by treating the first of two equal values as being smaller than the second.
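The recursive definition translates directly into code. The following is an illustrative sketch (the names are ours, not drawn from the references); it runs in quadratic time in the worst case, unlike the linear-time constructions described below.

```python
# Direct translation of the recursive definition: the minimum value
# becomes the root, and the subsequences before and after it form
# the left and right subtrees. Assumes distinct values.

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def cartesian_tree(seq):
    """Build the Cartesian tree of a sequence of distinct numbers."""
    if not seq:
        return None
    i = seq.index(min(seq))          # position of the minimum value
    root = Node(seq[i])
    root.left = cartesian_tree(seq[:i])
    root.right = cartesian_tree(seq[i + 1:])
    return root

def inorder(node):
    """Symmetric traversal; by the ordering property it recovers the
    original sequence."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)
```

Running `inorder` on the constructed tree returns the input sequence unchanged, illustrating the equivalence of the recursive and property-based definitions.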
==History==
Cartesian trees were introduced and named by {{harvtxt|Vuillemin|1980}}, who used them as an example of the interaction between [[geometric combinatorics]] and the design and analysis of [[data structure]]s. In particular, Vuillemin used these structures to analyze the [[average-case complexity]] of concatenation and splitting operations on [[binary search tree]]s. The name is derived from the [[Cartesian coordinate]] system for the plane: in one version of this structure, as in the two-dimensional range searching application discussed below, a Cartesian tree for a point set has the sorted order of the points by their <math>x</math>-coordinates as its symmetric traversal order, and it has the heap property according to the <math>y</math>-coordinates of the points. {{harvtxt|Vuillemin|1980}} described both this geometric version of the structure, and the definition here in which a Cartesian tree is defined from a sequence. Using sequences instead of point coordinates provides a more general setting that allows the Cartesian tree to be applied to non-geometric problems as well.{{sfnp|Vuillemin|1980}}
==Efficient construction==
A Cartesian tree may be constructed in [[linear time]] from its input sequence, by several different methods.
One method is to process the sequence values in left-to-right order, maintaining the Cartesian tree of the nodes processed so far, in a structure that allows both upward and downward traversal of the tree. To process each new value <math>a</math>, start at the node representing the value prior to <math>a</math> in the sequence and follow the path from this node to the root of the tree until finding a value smaller than <math>a</math>. This node becomes the parent of <math>a</math>, and its previous right child becomes the new left child of <math>a</math>. The total time for this procedure is linear, because the time spent searching for the parent of each new node can be charged against the number of nodes removed from the rightmost path of the tree.
An alternative linear-time construction algorithm is based on the [[all nearest smaller values]] problem. In the input sequence, one may define the ''left neighbor'' of a value <math>a</math> to be the value that occurs prior to <math>a</math>, is smaller than <math>a</math>, and is closer in position to <math>a</math> than any other smaller value. The ''right neighbor'' is defined symmetrically. The sequence of left neighbors may be found by an algorithm that maintains a [[stack (data structure)|stack]] containing a subsequence of the input. For each new sequence value <math>a</math>, the stack is popped until it is empty or its top element is smaller than <math>a</math>, and then <math>a</math> is pushed onto the stack. The left neighbor of <math>a</math> is the top element at the time <math>a</math> is pushed. The right neighbors may be found by applying the same stack algorithm to the reverse of the sequence. The parent of <math>a</math> in the Cartesian tree is either the left neighbor of <math>a</math> or the right neighbor of <math>a</math>, whichever exists and has a larger value. The left and right neighbors may also be constructed efficiently by [[parallel algorithm]]s, so this formulation may be used to develop efficient parallel algorithms for Cartesian tree construction.<ref>{{harvtxt|Berkman|Schieber|Vishkin|1993}}.</ref>
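The all-nearest-smaller-values method can be sketched as follows; this is an illustrative fragment (the function names are ours, not from the references), computing left and right neighbors with the stack algorithm and then choosing each element's parent as the larger-valued of its two neighbors.

```python
# Sketch of the all-nearest-smaller-values construction described
# above. Assumes distinct values; returns a parent-index array, with
# None marking the root.

def nearest_smaller_to_left(seq):
    """result[i] = index of the closest previous value smaller than
    seq[i], found with the stack algorithm described in the text."""
    result = [None] * len(seq)
    stack = []
    for i, a in enumerate(seq):
        while stack and seq[stack[-1]] >= a:
            stack.pop()
        result[i] = stack[-1] if stack else None
        stack.append(i)
    return result

def cartesian_tree_parents(seq):
    n = len(seq)
    left = nearest_smaller_to_left(seq)
    # Right neighbors: run the same algorithm on the reversed sequence,
    # then map the indices back to the original order.
    right = [None if r is None else n - 1 - r
             for r in reversed(nearest_smaller_to_left(seq[::-1]))]
    parent = [None] * n
    for i in range(n):
        l, r = left[i], right[i]
        if l is None:
            parent[i] = r                      # may itself be None (root)
        elif r is None:
            parent[i] = l
        else:
            parent[i] = l if seq[l] > seq[r] else r
    return parent
```

On the example sequence from the illustration, each non-root entry of the resulting array points to a strictly smaller value, as the min-heap property requires.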
Another linear-time algorithm for Cartesian tree construction is based on [[Divide-and-conquer algorithm|divide-and-conquer]]. The algorithm recursively constructs the tree on each half of the input, and then merges the two trees by taking the right spine of the left tree and left spine of the right tree (both of which are paths whose root-to-leaf order sorts their values from smallest to largest) and performing a standard [[Merge algorithm#Merging two lists|merging]] operation, replacing these two paths by a single path that contains the same nodes. In the merged path, the successor in the sorted order of each node from the left tree is placed in its right child, and the successor of each node from the right tree is placed in its left child, the same position that was previously used for its successor in the spine. The left children of nodes from the left tree and right children of nodes from the right tree are left unchanged. The algorithm is also parallelizable since on each level of recursion, each of the two sub-problems can be computed in parallel, and the merging operation can be [[Merge algorithm#Parallel merge|efficiently parallelized]] as well.{{sfnp|Shun|Blelloch|2014}}
Yet another linear-time algorithm, using a linked list representation of the input sequence, is based on ''locally maximum linking'': the algorithm repeatedly identifies a ''local maximum'' element, i.e., one that is larger than both its neighbors (or than its only neighbor, in case it is the first or last element in the list). This element is then removed from the list, and attached as the right child of its left neighbor, or the left child of its right neighbor, depending on which of the two neighbors has a larger value, breaking ties arbitrarily. This process can be implemented in a single left-to-right pass of the input, and it is easy to see that each element can gain at most one left child and at most one right child, and that the resulting binary tree is a Cartesian tree of the input sequence.{{sfnp|Kozma|Saranurak|2020}}{{sfnp|Hartmann|Kozma|Sinnamon|Tarjan|2021}}
It is possible to maintain the Cartesian tree of a dynamic input, subject to insertions of elements and [[lazy deletion]] of elements, in logarithmic [[amortized time]] per operation. Here, lazy deletion means that a deletion operation is performed by marking an element in the tree as being deleted, rather than actually removing it from the tree. When the number of marked elements reaches a constant fraction of the size of the whole tree, the tree is rebuilt, keeping only its unmarked elements.{{sfnp|Bialynicka-Birula|Grossi|2006}}
==Applications==
===Range searching and lowest common ancestors===
[[File:Cartesian tree range searching.svg|thumb|300px|Two-dimensional range-searching using a Cartesian tree: the bottom point (red in the figure) within a three-sided region with two vertical sides and one horizontal side (if the region is nonempty) may be found as the nearest common ancestor of the leftmost and rightmost points (the blue points in the figure) within the slab defined by the region's vertical boundaries.]]
Cartesian trees form part of an efficient data structure for [[range minimum queries]]. A range minimum query specifies two positions within a sequence, and asks for the minimum value occurring within the specified range. In a Cartesian tree, this minimum value can be found at the [[lowest common ancestor]] of the two endpoints of the range. For instance, in the subsequence (12,10,20,15,18) of the sequence shown in the first illustration, the minimum value of the subsequence (10) forms the lowest common ancestor of the leftmost and rightmost values (12 and 18). Because lowest common ancestors can be found in constant time per query, using a data structure that takes linear space to store and that can be constructed in linear time, the same bounds hold for the range minimization problem.
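The correspondence between range minima and lowest common ancestors can be sketched as follows. This is an illustrative fragment (names are ours): it answers a query by walking parent pointers, so each query takes time proportional to the tree depth rather than the constant time achievable with more elaborate preprocessing.

```python
# Range minimum query via the Cartesian tree: min(seq[i..j]) is the
# value at the lowest common ancestor of positions i and j.

def build_parents(seq):
    """Stack-based linear-time construction of the parent array;
    parent of the root is None."""
    parent = [None] * len(seq)
    stack = []
    for i, a in enumerate(seq):
        last = None
        while stack and seq[stack[-1]] > a:
            last = stack.pop()
        if last is not None:
            parent[last] = i       # popped chain becomes i's left subtree
        if stack:
            parent[i] = stack[-1]  # i hangs below its left neighbor
        stack.append(i)
    return parent

def range_min(seq, parent, i, j):
    """Return min(seq[i..j]) as the value at the LCA of i and j."""
    ancestors = set()
    node = i
    while node is not None:        # collect all ancestors of i
        ancestors.add(node)
        node = parent[node]
    node = j
    while node not in ancestors:   # climb from j until the paths meet
        node = parent[node]
    return seq[node]
```

For the subsequence (12,10,20,15,18) from the example, climbing from the two endpoints meets at the node holding 10, the subsequence minimum.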
{{harvtxt|Bender|Farach-Colton|2000}} reversed this relationship between the two data structure problems by showing that data structures for range minimization could also be used for finding lowest common ancestors. Their data structure associates with each node of the tree its distance from the root, and constructs a sequence of these distances in the order of an [[Euler tour]] of the (edge-doubled) tree. It then constructs a range minimization data structure for the resulting sequence. The lowest common ancestor of any two vertices in the given tree can be found as the minimum distance appearing in the interval between the initial positions of these two vertices in the sequence. Bender and Farach-Colton also provide a method for range minimization that can be used for the sequences resulting from this transformation, which have the special property that adjacent sequence values differ by one; their method answers these special queries in constant time each, after linear-time preprocessing.
The same range minimization problem may also be given an alternative interpretation in terms of two dimensional range searching. A collection of finitely many points in the [[Cartesian plane]] can be used to form a Cartesian tree, by sorting the points by their <math>x</math>-coordinates and using the <math>y</math>-coordinates in this order as the sequence of values from which this tree is formed. If <math>S</math> is the subset of the input points within some vertical slab defined by the inequalities <math>L\le x\le R</math>, <math>p</math> is the leftmost point in <math>S</math>, and <math>q</math> is the rightmost point in <math>S</math>, then the lowest common ancestor of <math>p</math> and <math>q</math> in the Cartesian tree is the bottommost point in the slab. A three-sided range query, in which the task is to list all points within a region bounded by the three inequalities <math>L\le x\le R</math> and <math>y\le T</math>, may be answered by finding this bottommost point <math>b</math>, comparing its <math>y</math>-coordinate to <math>T</math>, and (if the point lies within the three-sided region) continuing recursively in the two slabs bounded between <math>p</math> and <math>b</math> and between <math>b</math> and <math>q</math>. In this way, after the leftmost and rightmost points in the slab are identified, all points within the three-sided region may be listed in constant time per point.
The same construction, of lowest common ancestors in a Cartesian tree, makes it possible to construct a data structure with linear space that allows the distances between pairs of points in any [[ultrametric space]] to be queried in constant time per query. The distance within an ultrametric is the same as the [[widest path problem|minimax path]] weight in the [[minimum spanning tree]] of the metric.<ref>{{harvtxt|Hu|1961}}; {{harvtxt|Leclerc|1981}}</ref> From the minimum spanning tree, one can construct a Cartesian tree, the root node of which represents the heaviest edge of the minimum spanning tree. Removing this edge partitions the minimum spanning tree into two subtrees, and Cartesian trees recursively constructed for these two subtrees form the children of the root node of the Cartesian tree. The leaves of the Cartesian tree represent points of the metric space, and the lowest common ancestor of two leaves in the Cartesian tree is the heaviest edge between those two points in the minimum spanning tree, which has weight equal to the distance between the two points. Once the minimum spanning tree has been found and its edge weights sorted, the Cartesian tree can be constructed in linear time.{{sfnp|Demaine|Landau|Weimann|2009}}
===As a binary search tree===
{{main article|Treap}}
This idea was applied by {{harvtxt|Seidel|Aragon|1996}}, who suggested the use of random numbers as priorities. The self-balancing binary search tree resulting from this random choice is called a [[treap]], due to its combination of binary search tree and binary heap features.
If the priorities of each key are chosen randomly and independently once whenever the key is inserted into the tree, the resulting Cartesian tree will have the same properties as a [[random binary search tree]], a tree computed by inserting the keys in a randomly chosen [[permutation]] starting from an empty tree, with each insertion leaving the previous tree structure unchanged and inserting the new node as a leaf of the tree. Random binary search trees had been studied for much longer, and are known to behave well as search trees (they have logarithmic depth with high probability); the same good behavior carries over to treaps.
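Treap insertion can be sketched as follows; this is an illustrative implementation (not taken from Seidel and Aragon's paper), assigning each key a random priority once at insertion and restoring the min-heap property on priorities with rotations.

```python
import random

# A treap node: keys obey the binary search tree order, priorities
# obey the min-heap order, so the tree is the Cartesian tree of the
# keys sequenced by priority.

class TreapNode:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # chosen once, at insertion
        self.left = None
        self.right = None

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def insert(node, key):
    """Insert key as in a plain BST, then rotate the new node upward
    while its priority is smaller than its parent's."""
    if node is None:
        return TreapNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
        if node.left.priority < node.priority:
            node = rotate_right(node)
    else:
        node.right = insert(node.right, key)
        if node.right.priority < node.priority:
            node = rotate_left(node)
    return node

def inorder(node):
    """Symmetric traversal: returns the keys in sorted order."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

Regardless of the insertion order, the symmetric traversal yields the keys in sorted order, while the random priorities keep the expected depth logarithmic.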
===In sorting===
[[File:Bracketing pairs.svg|thumb|300px|Pairs of consecutive sequence values (shown as the thick red edges) that bracket a sequence value (the darker blue point). The cost of including this value in the sorted order produced by the Levcopoulos–Petersson algorithm is proportional to the [[logarithm]] of this number of bracketing pairs.]]
{{harvtxt|Levcopoulos|Petersson|1989}} describe a [[sorting algorithm]] based on Cartesian trees. They describe the algorithm as based on a tree with the maximum at the root,{{sfnp|Levcopoulos|Petersson|1989}} but it may be modified straightforwardly to support a Cartesian tree with the convention that the minimum value is at the root.
The Levcopoulos–Petersson algorithm can be viewed as a version of [[selection sort]] or [[heap sort]] that maintains a [[priority queue]] of candidate minima, and that at each step finds and removes the minimum value in this queue, moving this value to the end of an output sequence. In their algorithm, the priority queue consists only of elements whose parent in the Cartesian tree has already been found and removed. Thus, the algorithm consists of the following steps:{{sfnp|Levcopoulos|Petersson|1989}}
#Construct a Cartesian tree for the input sequence
#Initialize a priority queue, initially containing only the root of the Cartesian tree
#While the priority queue is non-empty:
#*Find and remove the minimum value in the priority queue
#*Add this value to the output sequence
#*Add the Cartesian tree children of the removed value to the priority queue
As Levcopoulos and Petersson show, for input sequences that are already nearly sorted, the size of the priority queue will remain small, allowing this method to take advantage of the nearly-sorted input and run more quickly. Specifically, the worst-case running time of this algorithm is <math>O(n\log k)</math>, where <math>n</math> is the sequence length and <math>k</math> is the average, over all values in the sequence, of the number of consecutive pairs of sequence values that bracket the given value (meaning that the given value is between the two sequence values). They also prove a lower bound stating that, for any <math>n</math> and (non-constant) <math>k</math>, any comparison-based sorting algorithm must use <math>\Omega(n\log k)</math> comparisons for some inputs.{{sfnp|Levcopoulos|Petersson|1989}}
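The priority-queue scheme described above can be sketched as follows; this is an illustrative Python sketch (the function name is ours, not from Levcopoulos and Petersson), combining a stack-based Cartesian tree construction with the extraction loop.

```python
import heapq

def cartesian_tree_sort(seq):
    """Sort a sequence of distinct numbers by building its Cartesian
    tree and repeatedly extracting the minimum from a priority queue
    seeded with the tree root; each removed node's children join the
    queue."""
    n = len(seq)
    if n == 0:
        return []
    # Stack-based construction of child indices in the Cartesian tree.
    left = [None] * n
    right = [None] * n
    stack = []
    for i in range(n):
        last = None
        while stack and seq[stack[-1]] > seq[i]:
            last = stack.pop()
        left[i] = last               # popped chain becomes i's left child
        if stack:
            right[stack[-1]] = i     # i becomes right child of stack top
        stack.append(i)
    root = stack[0]                  # bottom of stack holds the minimum
    # Selection phase: the queue only ever holds nodes whose parent
    # has already been output, so it stays small for nearly sorted input.
    out = []
    pq = [(seq[root], root)]
    while pq:
        val, i = heapq.heappop(pq)
        out.append(val)
        for c in (left[i], right[i]):
            if c is not None:
                heapq.heappush(pq, (seq[c], c))
    return out
```

Because a node enters the queue only after its (smaller) parent has been removed, every value still in the queue is a candidate minimum, and the output emerges in sorted order.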
===In pattern matching===
The problem of ''Cartesian tree matching'' has been defined as a generalized form of [[string matching]] in which one seeks a [[substring]] (or in some cases, a [[subsequence]]) of a given string that has a Cartesian tree of the same form as a given pattern. Fast algorithms for variations of the problem with a single pattern or multiple patterns have been developed, as well as data structures analogous to the [[suffix tree]] and other text indexing structures.<ref>{{harvtxt|Park|Amir|Landau|Park|2019}}; {{harvtxt|Park|Bataa|Amir|Landau|2020}}; {{harvtxt|Song|Gu|Ryu|Faro|2021}}; {{harvtxt|Kim|Cho|2021}}; {{harvtxt|Nishimoto|Fujisato|Nakashima|Inenaga|2021}}; {{harvtxt|Oizumi|Kai|Mieno|Inenaga|2022}}</ref>
==Notes==
| title = STACS 2006, 23rd Annual Symposium on Theoretical Aspects of Computer Science, Marseille, France, February 23-25, 2006, Proceedings
| volume = 3884
| year = 2006
}}
*{{citation|contribution=On Cartesian trees and range minimum queries|first1=Erik D.|last1=Demaine|author1-link=Erik Demaine|first2=Gad M.|last2=Landau|first3=Oren|last3=Weimann|series=Lecture Notes in Computer Science|volume=5555|year=2009|pages=341–353|doi=10.1007/978-3-642-02927-1_29|title=Automata, Languages and Programming, 36th International Colloquium, ICALP 2009, Rhodes, Greece, July 5-12, 2009|isbn=978-3-642-02926-4|hdl=1721.1/61963|hdl-access=free}}
*{{citation
| volume = 13
| year = 1984}}
*{{citation
| last1 = Hartmann | first1 = Maria
| last2 = Kozma | first2 = László
| last3 = Sinnamon | first3 = Corwin
| last4 = Tarjan | first4 = Robert E. | author4-link = Robert Tarjan
| doi =10.4230/LIPIcs.ICALP.2021.79
| journal = [[ICALP]]
| pages = 79:1–79:20
| title = Analysis of Smooth Heaps and Slim Heaps
| series = Leibniz International Proceedings in Informatics (LIPIcs)
| year = 2021| volume = 198
| doi-access = free
| isbn = 978-3-95977-195-5
}}
*{{citation|title=The maximum capacity route problem|first=T. C.|last=Hu|author-link=T. C. Hu|journal=Operations Research|volume=9|issue=6|year=1961|pages=898–900|jstor=167055|doi=10.1287/opre.9.6.898|doi-access=free}}
*{{citation
| last1 = Kim | first1 = Sung-Hwan
| title = 32nd Annual Symposium on Combinatorial Pattern Matching, CPM 2021, July 5-7, 2021, Wrocław, Poland
| volume = 191
| year = 2021| doi-access = free
| isbn = 9783959771863 }}
*{{citation
| last1 = Kozma | first1 = László
| last2 = Saranurak | first2 = Thatchaphol
| doi = 10.1137/18M1195188
| number = 5
| journal = [[SIAM J. Comput.]]
| publisher = SIAM
| title = Smooth Heaps and a Dual View of Self-Adjusting Data Structures
| volume = 49
| year = 2020
| arxiv = 1802.05471
}}
*{{citation
| title = WADS '89: Proceedings of the Workshop on Algorithms and Data Structures
| volume = 382
| year = 1989
}}
*{{citation
| last1 = Nishimoto | first1 = Akio
| title = String Processing and Information Retrieval - 28th International Symposium, SPIRE 2021, Lille, France, October 4-6, 2021, Proceedings
| volume = 12944
| year = 2021
| s2cid = 235313506
}}
*{{citation
| title = 33rd Annual Symposium on Combinatorial Pattern Matching, CPM 2022, June 27-29, 2022, Prague, Czech Republic
| volume = 223
| year = 2022| doi-access = free
| isbn = 9783959772341 | s2cid = 246679910
}}
| title = 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, June 18-20, 2019, Pisa, Italy
| volume = 128
| year = 2019| doi-access = free
| isbn = 9783959771030 | s2cid = 162168587
}}
| url = http://citeseer.ist.psu.edu/seidel96randomized.html
| volume = 16
| year = 1996| doi-broken-date = 1 July 2025 }}
*{{citation
| last1 = Schieber | first1 = Baruch
|