Content deleted Content added
split long sentence per GA1 |
can vs may per GA1 |
||
Line 4:
In [[computer science]], a '''Cartesian tree''' is a [[binary tree]] derived from a sequence of distinct numbers. The smallest number in the sequence is at the root of the tree; its left and right subtrees are constructed recursively from the subsequences to the left and right of this number. When all numbers are distinct, the Cartesian tree is uniquely defined from the properties that it is [[Heap (data structure)|heap]]-ordered and that a [[Tree traversal|symmetric (in-order) traversal]] of the tree returns the original sequence.
Cartesian trees were introduced by {{harvtxt|Vuillemin|1980}} in the context of geometric [[range searching]] data structures. They have also been used in the definition of the [[treap]] and [[randomized binary search tree]] data structures for [[binary search]] problems, in [[comparison sort]] algorithms that perform efficiently on nearly-sorted inputs, and as the basis for [[pattern matching]] algorithms. A Cartesian tree for a sequence
==Definition==
Cartesian trees are defined using [[binary tree]]s, which are a form of [[rooted tree]]. These are mathematical structures rather than computer [[data structure]]s, although Cartesian trees have been used as part of the definition of certain data structures. A ''rooted tree'' consists of a collection of nodes, with bidirectional links connecting ''parent'' and ''child'' nodes, such that repeatedly following parent links, starting from any node, will always reach a unique ''root'' node. In a ''binary tree'', each node
It is also possible to characterize the Cartesian tree directly rather than recursively, using its ordering properties. In any tree, the ''subtree'' rooted at any node consists of all other nodes that can reach it by repeatedly following parent pointers. The Cartesian tree for a sequence of distinct numbers is defined by the following properties:
Line 13:
#A [[Tree_traversal#In-order,_LNR|symmetric (in-order) traversal]] of the tree results in the original sequence. That is, the left subtree consists of the values earlier than the root in the sequence order, while the right subtree consists of the values later than the root, and a similar ordering constraint holds at each lower node of the tree.
#The tree has the [[Binary heap|min-heap property]]: the parent of any non-root node has a smaller value than the node itself.<ref name=minmax/>
These two definitions are equivalent: the tree defined recursively as described above is the unique tree that has the properties listed above. If a sequence of numbers contains repetitions, a Cartesian tree
==Efficient construction==
A Cartesian tree
One method is to simply process the sequence values in left-to-right order, maintaining the Cartesian tree of the nodes processed so far, in a structure that allows both upwards and downwards traversal of the tree. To process each new value <math>a</math>, start at the node representing the value prior to <math>a</math> in the sequence and follow the path from this node to the root of the tree until finding a value <math>b</math> smaller than <math>a</math>. The node <math>a</math> becomes the right child of <math>b</math>, and the previous right child of <math>b</math> becomes the new left child of <math>a</math>. The total time for this procedure is linear, because the time spent searching for the parent <math>b</math> of each new node <math>a</math> can be [[Potential method|charged]] against the number of nodes that are removed from the rightmost path in the tree.{{sfnp|Gabow|Bentley|Tarjan|1984}}
An alternative linear-time construction algorithm is based on the [[all nearest smaller values]] problem. In the input sequence,
Another linear-time algorithm for Cartesian tree construction is based on divide-and-conquer. The algorithm recursively constructs the tree on each half of the input, and then merges the two trees by taking the right spine of the left tree and left spine of the right tree (both of which are paths whose root-to-leaf order sorts their values from smallest to largest) and performs a standard [[Merge algorithm#Merging two lists|merging]] operation, replacing these two paths in two trees by a single path that contains the same nodes. In the merged path, the successor in the sorted order of each node from the left tree is placed in its right child, and the successor of each node from the right tree is placed in its left child, the same position that was previously used for its successor in the spine. The left children of nodes from the left tree and right children of nodes from the right tree are left unchanged. The algorithm is also parallelizable since on each level of recursion, each of the two sub-problems can be computed in parallel, and the merging operation can be [[Merge algorithm#Parallel merge|efficiently parallelized]] as well.{{sfnp|Shun|Blelloch|2014}}
Line 27:
==Applications==
===Range searching and lowest common ancestors===
[[File:Cartesian tree range searching.svg|thumb|300px|Two-dimensional range-searching using a Cartesian tree: the bottom point (red in the figure) within a three-sided region with two vertical sides and one horizontal side (if the region is nonempty)
Cartesian trees
{{harvtxt|Bender|Farach-Colton|2000}} reversed this relationship between the two data structure problems by showing that data structures for range minimization could also be used for finding lowest common ancestors. Their data structure associates with each node of the tree its distance from the root, and constructs a sequence of these distances in the order of an [[Euler tour]] of the (edge-doubled) tree. It then constructs a range minimization data structure for the resulting sequence. The lowest common ancestor of any two vertices in the given tree can be found as the minimum distance appearing in the interval between the initial positions of these two vertices in the sequence. Bender and Farach-Colton also provide a method for range minimization that can be used for the sequences resulting from this transformation, which have the special property that adjacent sequence values differ by ±1. As they describe, for range minimization in sequences that do not have this form, it is possible to use Cartesian trees to reduce the range minimization problem to lowest common ancestors, and then to use Euler tours to reduce lowest common ancestors to a range minimization problem with this special form.{{sfnp|Bender|Farach-Colton|2000}}
The same range minimization problem may also be given an alternative interpretation in terms of two dimensional range searching. A collection of finitely many points in the [[Cartesian plane]]
The same construction, of lowest common ancestors in a Cartesian tree, makes it possible to construct a data structure with linear space that allows the distances between pairs of points in any [[ultrametric space]] to be queried in constant time per query. The distance within an ultrametric is the same as the [[widest path problem|minimax path]] weight in the [[minimum spanning tree]] of the metric.<ref>{{harvtxt|Hu|1961}}; {{harvtxt|Leclerc|1981}}</ref> From the minimum spanning tree, one can construct a Cartesian tree, the root node of which represents the heaviest edge of the minimum spanning tree. Removing this edge partitions the minimum spanning tree into two subtrees, and Cartesian trees recursively constructed for these two subtrees form the children of the root node of the Cartesian tree. The leaves of the Cartesian tree represent points of the metric space, and the lowest common ancestor of two leaves in the Cartesian tree is the heaviest edge between those two points in the minimum spanning tree, which has weight equal to the distance between the two points. Once the minimum spanning tree has been found and its edge weights sorted, the Cartesian tree
===As a binary search tree===
Line 40:
Because a Cartesian tree is a binary tree, it is natural to use it as a [[binary search tree]] for an ordered sequence of values. However, defining a Cartesian tree based on the same values that form the search keys of a binary search tree does not work well: the Cartesian tree of a sorted sequence is just a [[path graph|path]], rooted at its leftmost endpoint, and binary searching in this tree degenerates to [[sequential search]] in the path. However, it is possible to generate more-balanced search trees by generating ''priority'' values for each search key that are different than the key itself, sorting the inputs by their key values, and using the corresponding sequence of priorities to generate a Cartesian tree. This construction may equivalently be viewed in the geometric framework described above, in which the <math>x</math>-coordinates of a set of points are the search keys and the <math>y</math>-coordinates are the priorities.{{sfnp|Seidel|Aragon|1996}}
This idea was applied by {{harvtxt|Seidel|Aragon|1996}}, who suggested the use of random numbers as priorities. The data structure resulting from this random choice is called a [[treap]], due to its combination of binary search tree and binary heap features. An insertion into a treap
If the priorities of each key are chosen randomly and independently once whenever the key is inserted into the tree, the resulting Cartesian tree will have the same properties as a [[random binary search tree]], a tree computed by inserting the keys in a randomly chosen [[permutation]] starting from an empty tree, with each insertion leaving the previous tree structure unchanged and inserting the new node as a leaf of the tree. Random binary search trees had been studied for much longer, and are known to behave well as search trees (they have [[logarithm]]ic depth with high probability); the same good behavior carries over to treaps. It is also possible, as suggested by Aragon and Seidel, to reprioritize frequently-accessed nodes, causing them to move towards the root of the treap and speeding up future accesses for the same keys.{{sfnp|Seidel|Aragon|1996}}
Line 46:
===In sorting===
[[File:Bracketing pairs.svg|thumb|300px|Pairs of consecutive sequence values (shown as the thick red edges) that bracket a sequence value (the darker blue point). The cost of including this value in the sorted order produced by the Levcopoulos–Petersson algorithm is proportional to the [[logarithm]] of this number of bracketing pairs.]]
{{harvtxt|Levcopoulos|Petersson|1989}} describe a [[sorting algorithm]] based on Cartesian trees. They describe the algorithm as based on a tree with the maximum at the root,{{sfnp|Levcopoulos|Petersson|1989}} but it
The Levcopoulos–Petersson algorithm can be viewed as a version of [[selection sort]] or [[heap sort]] that maintains a [[priority queue]] of candidate minima, and that at each step finds and removes the minimum value in this queue, moving this value to the end of an output sequence. In their algorithm, the priority queue consists only of elements whose parent in the Cartesian tree has already been found and removed. Thus, the algorithm consists of the following steps:{{sfnp|Levcopoulos|Petersson|1989}}
|