Smoothsort: Difference between revisions

Content deleted Content added
Overview: Rewrite overview to give a more conceptual view of the approach
Operations: Reformulate this section
Line 35:
==Operations==
 
While the two phases of the sorting procedure are opposite to each other as far as the evolution of the sequence-of-heaps structure is concerned, the operations they need to perform to ensure the invariants of the structure have much in common. All these operations are variations of the "sift" operation in a binary max-heap, which restores the heap invariant when it is possibly violated only at the root node. Restoring the increase among the roots of the stretches is obtained by considering the root of each stretch except the first to have as an additional child (Dijkstra uses the term "stepson") the root of the stretch to its left.
Ignoring (for the moment) Dijkstra's optimisations, two operations are necessary: enlarge the heap by adding one element to the right, and shrink the heap by removing the right most element (the root of the last heap), preserving the three types of invariant properties:
# The heap property within each tree,
# The order property, keeping the root elements of the trees in order, and
# The size properties relating the sizes of the various trees.
 
===GrowGrowing the heap region by addingincorporating an element to the right===
 
When an additional element is considered for incorporation into the sequence of stretches (list of disjoint heap structures) it either forms a new stretch of its own added to the sequence, or it combines the two rightmost stretches by becoming parent of both their roots and forming a new stretch that replaces the two in the sequence. Which of the two happens depends only on the sizes of the stretches currently present (and ultimately only on the index of the element added); Dijkstra stipulated that stretches are combined if and only if their sizes are {{nowrap|L(k+1)}} and L(k) for some k, i.e., consecutive Leonardo numbers; the new stretch will have size L(k+2).
Enlarging the heap while maintaining the size properties can be done without any data motion at all by just rearranging the boundaries of the component trees:
* If the last two trees are of size {{nowrap|L(k+1)}} and L(k) (i.e., consecutive Leonardo numbers), combine them with the new element as the root of a larger tree of size L(k+2). Because consecutive trees other than the last two may not have consecutive orders, the third-last tree must have been of size at least L(k+3), and therefore the size property is maintained. This new tree will probably not have the heap property.
* If the last two trees in the list are not consecutive Leonardo numbers, then we may simply add a new heap of size 1 to the list. This 1 is taken to be L(1), unless the rightmost tree is already of order 1, in which case the new one-element tree is taken to be of size L(0).
 
After a new element ''x'' is incorporated into the region of heap structures, as the root of a final stretch that is either a new singleton or obtained by combining two previous stretches, the following procedure called "trinkle" will restore the required relations. The element ''x'' is repeatedly swapped with the root of the previous stretches, as long as this condition holds: there is a stretch immediately to the left of the stretch of ''x'', whose root is greater than ''x'', and ''also'' greater than both of the children of ''x'', if ''x'' has them. Note that at this time no ordering relation between ''x'' and its children has yet been established, and if the largest of its children exceeds ''x'', then it is that child that will become the root once the max-heap property is ensured. After thus moving ''x'' into the appropriate stretch, the heap property of the tree for that stretch is established by "sifting down" the new element to its correct position. This is the same operation as used in heapsort, adapted to the different implicit tree structure used here. It proceeds as follows: while ''x'' is not at a leaf position, and its greater child exceeds ''x'', swap ''x'' with that child and continue sift-down.
After this, the newly added element must be moved to the correct ___location to maintain the heap and order properties. Because there are {{math|''O''(log ''n'')}} trees, each of depth {{math|''O''(log ''n'')}}, it is simple to do this in {{math|''O''(log ''n'')}} time, as follows:
 
Because there are {{math|''O''(log ''n'')}} stretches, whose tree structures have depth {{math|''O''(log ''n'')}}, the complexity of trinkle is bounded by {{math|''O''(log ''n'')}}.
# First, restore the order property by moving the new element left until it is at the root of the correct tree:
#* Begin with the rightmost tree (the one that has just been created) as the "current" tree.
#* While there is a tree to the left of the current tree and its root is greater than the new element (the current root) ''and'' both of its children,
#** swap the new element with the root of the tree to the left. This preserves the heap property of the current tree. The tree on the left then becomes the current tree, and we repeat this step.
# Second, restore the heap property to the current tree by "sifting down" the new element to its correct position. (This is a standard heap operation, as used in heapsort.)
#* While the current tree has a size greater than 1 and either child of is greater than the current root, then
#** Swap the greater child root with the current root. That child tree becomes the current tree and sift-down continues.
 
The trees used are systematically slightly unbalanced (any left subtree has depth one more than the corresponding right subtree, except when both are singletons); still, the sift-down operation isremains slightlysomewhat simpler than inone that can handle general binary heaps, becausesince each node has either two children or zero.none; there Oneare doesfewer not needcases to handle the condition of one of the child heaps not being presentdistinguish.
 
====OptimisationOptimization====
 
With respect to the description above, Dijkstra's algorithm implements the following improvement.
There are two opportunities to omit some operations during heap construction:
* If the new tree is going to become part of a larger tree before we are done, then don't bother establishing the order property: it only needs to be done when a tree has reached its final size.
** To do this, look at how many elements remain after the new tree of size L(k). If there are {{nowrap|L(k − 1) + 1}} or more, then this new tree is going to be merged.
* Do not maintain the heap property of the rightmost tree. If that tree becomes one of the final trees of the heap, then maintaining the order property will restore the heap property. Of course, whenever a new tree is added to the list, then the rightmost tree is no longer the rightmost and the heap property needs to be restored, but this can be done in {{math|''O''(''n'')}} time, whereas maintaining it at each step takes {{math|''O''(''n'' log ''n'')}} time.
 
The decision of how to handle the incorporation of a new element is postponed until the type of the next element is inspected: if either that next element is the parent of the current element (having it as right child), or is a leaf but the parent of the current element still is within the range of the array being sorted (when it comes along it will take the current element as left child), then instead of trinkle a simple sift within its subtree is done, avoiding comparison with previous roots. For the final element of the array trinkle is always applied. Thus during the beginning of the growing phase, one is most often just applying sift-down in a single stretch; only those "stepson" relations are considered (by trinkle) that will continue to hold until the end of the growing phase.
===Shrink the heap by removing the rightmost element===
This is the reverse of the grow process:
* If the rightmost tree has a size of 1 (i.e., L(1) or L(0)), then this is trivial; simply remove that rightmost tree.
* If the rightmost tree has size greater than 1, then remove its root, exposing the two sub-trees as members of the list. This preserves the size property. Restore the order property using the same algorithm as in construction; first on the left subtree, and then on the right one.
 
===ShrinkShrinking the heap region by removingseparating the rightmost element from it===
====Optimisation====
 
* Unlike when constructing the heap, when restoring the order property after removing a tree's root, we know that the newly exposed trees satisfy the heap property. Therefore, it is not necessary to compare the preceding tree's root to the children of the newly exposed root. Just compare it to the root. (This only applies to the first step. After an element has been swapped left once, it is necessary to compare to both children.)
During this phase, the form of the sequence of stretches goes through the changes of the growing phase in reverse. No work at all is needed when separating off a leaf node, but for a non-leaf node its two children become roots of new stretches, and need to be moved to their proper place in the sequence of roots of stretches. This can be obtained by applying trinkle first for the left child, and then for the right child.
 
====Optimization====
In this description, one knows that at the position where trinkle starts the heap property is already valid. So I can simplify by, instead of doing the tests that trinkle needs to do, just compare and possibly swap the element with the root before it (if any), and afterwards apply trinkle at its new position if a swap was actually done. Since a swap may invalidate the heap property at the new position where the smaller root moved to, the simplification only applies to the first step.
 
LocalWords: trinkle
 
==Analysis==