Selection algorithm: Difference between revisions

Content deleted Content added
AnomieBOT (talk | contribs)
m Dating maintenance tags: {{Section-stub}}
Line 39:
* [[Introselect]] ([[Introsort]])
 
== Incremental sorting by selection ==
== Selection as incremental sorting ==
Converse to selection by sorting, one can incrementally sort by repeated selection. Abstractly, selection only yields a single element, the ''k''th element. However, practical selection algorithms frequently involve partial sorting, or can be modified to do so. Selecting by partial sorting naturally does so, sorting the elements up to ''k'', and selecting by partitioning also sorts some elements: the pivots are sorted to the correct positions, with the ''k''th element being the final pivot, and the elements between the pivots have values between the pivot values. The difference between partition-based selection and partition-based sorting, as in quickselect versus quicksort, is that in selection one recurses on only one side of each pivot, sorting only the pivots (an average of log(''n'') pivots are used), rather than recursing on both sides of the pivot.
One of the advantages of the sort-and-index approach, as mentioned, is its ability to [[amortized analysis|amortize]] the sorting cost over many subsequent selections. However, sometimes the number of selections that will be done is not known in advance, and may be either small or large. In these cases, we can adapt the algorithms given above to simultaneously select an element while [[Partial sorting|partially sorting]] the list, thus accelerating future selections.
 
This can be used to speed up subsequent selections on the same data; in the extreme, a fully sorted array allows O(1) selection. Further, compared with first doing a full sort, incrementally sorting by repeated selection [[amortized analysis|amortizes]] the sorting cost over multiple selections.
Both the selection procedure based on minimum-finding and the one based on partitioning can be seen as a form of partial sort. The minimum-based algorithm sorts the list up to the given index, and so clearly speeds up future selections, especially of smaller indexes. The partition-based algorithm does not achieve the same behaviour automatically, but can be adapted to remember its previous pivot choices and reuse them wherever possible, avoiding costly partition operations, particularly the top-level one. The list becomes gradually more sorted as more partition operations are done incrementally; no pivots are ever "lost". If desired, this same pivot list could be passed on to quicksort to reuse, again avoiding many costly partition operations.
 
For partially sorted data (up to ''k''), so long as the partially sorted data and the index ''k'' up to which the data is sorted are recorded, subsequent selections of ''j'' less than or equal to ''k'' can simply select the ''j''th element, as it is already sorted, while selections of ''j'' greater than ''k'' only need to sort the elements above the ''k''th position.
 
For partitioned data, if the list of pivots is stored (for example, in a sorted list of the indices), then subsequent selections only need to select in the interval between two pivots (the nearest pivots below and above). The biggest gain is from the top-level pivots, which eliminate costly large partitions: a single pivot near the middle of the data cuts the time for future selections in half. The pivot list will grow over subsequent selections, as the data becomes more sorted, and can even be passed to a partition-based sort as the basis of a full sort.
 
== Using data structures to select in sublinear time ==