Content deleted Content added
clean up and general fixes, http: --> https:, typo(s) fixed: Indeed → Indeed, using AWB |
|||
Line 35:
== Partition-based selection ==
{{
Linear performance can be achieved by a partition-based selection algorithm, most basically [[quickselect]]. Quickselect is a variant of [[quicksort]] – in both one chooses a pivot and then partitions the data by it, but while Quicksort recurses on both sides of the partition, Quickselect only recurses on one side, namely the side on which the desired ''k''th element is. As with Quicksort, this has optimal average performance, in this case linear, but poor worst-case performance, in this case quadratic. This occurs for instance by taking the first element as the pivot and searching for the maximum element, if the data is already sorted. In practice this can be avoided by choosing a random element as pivot, which yields [[almost certain]] linear performance. Alternatively, a more careful deterministic pivot strategy can be used, such as [[median of medians]]. These are combined in the hybrid [[introselect]] algorithm (analogous to [[introsort]]), which starts with Quickselect but falls back to median of medians if progress is slow, resulting in both fast average performance and optimal worst-case performance. The average time complexity performance is O(''n'').
Line 42:
=== Median selection as pivot strategy ===
{{
A median-selection algorithm can be used to yield a general selection algorithm or sorting algorithm, by applying it as the pivot strategy in Quickselect or Quicksort; if the median-selection algorithm is asymptotically optimal (linear-time), the resulting selection or sorting algorithm is as well. In fact, an exact median is not necessary – an approximate median is sufficient. In the [[median of medians]] selection algorithm, the pivot strategy computes an approximate median and uses this as pivot, recursing on a smaller set to compute this pivot. In practice the overhead of pivot computation is significant, so these algorithms are generally not used, but this technique is of theoretical interest in relating selection and sorting algorithms.
Line 63:
The strategy to find an order statistic in [[sublinear time]] is to store the data in an organized fashion using suitable data structures that facilitate the selection. Two such data structures are tree-based structures and frequency tables.
When only the minimum (or maximum) is needed, a good approach is to use a [[
Another simple strategy is based on some of the same concepts as the [[hash table]]. When we know the range of values beforehand, we can divide that range into ''h'' subintervals and assign these to ''h'' buckets. When we insert an element, we add it to the bucket corresponding to the interval it falls in. To find the minimum or maximum element, we scan from the beginning or end for the first nonempty bucket and find the minimum or maximum element in that bucket. In general, to find the ''k''th element, we maintain a count of the number of elements in each bucket, then scan the buckets from left to right adding up counts until we find the bucket containing the desired element, then use the expected linear-time algorithm to find the correct element in that bucket.
Line 90:
[[online algorithm|Online]] selection may refer narrowly to computing the ''k''th smallest element of a stream, in which case partial sorting algorithms (with ''k'' + O(1) space for the ''k'' smallest elements so far) can be used, but partition-based algorithms cannot be.
Alternatively, selection itself may be required to be [[online algorithm|online]], that is, an element can only be selected from a sequential input at the instance of observation and each selection, respectively refusal, is irrevocable. The problem is to select, under these constraints, a specific element of the input sequence (as for example the largest or the smallest value) with largest probability. This problem can be tackled by the [[Odds algorithm]], which yields the optimal under an independence condition; it is also optimal itself as an algorithm with the number of computations being linear in the length of input.
The simplest example is the [[secretary problem]] of choosing the maximum with high probability, in which case optimal strategy (on random data) is to track the running maximum of the first ''n''/''e'' elements and reject them, and then select the first element that is higher than this maximum.
Line 106:
[[Python (programming language)|Python]]'s standard library (since 2.4) includes <code>[https://docs.python.org/library/heapq.html heapq].nsmallest()</code> and <code>nlargest()</code>, returning sorted lists, the former in O(''n'' + ''k'' log ''n'') time, the latter in O(''n'' log ''k'') time.
Because [[sorting algorithm#Language support|language support for sorting]] is more ubiquitous, the simplistic approach of sorting followed by indexing is preferred in many environments despite its disadvantage in speed. Indeed, for [[Lazy evaluation|lazy languages]], this simplistic approach can even achieve the best complexity possible for the ''k'' smallest/greatest sorted (with maximum/minimum as a special case) if the sort is lazy enough{{Citation needed|date=April 2014}}.
== See also ==
Line 117:
* {{Cite journal | last1 = Floyd | first1 = R. W. | authorlink1 = Robert W. Floyd | last2 = Rivest | first2 = R. L. | authorlink2 = Ron Rivest | doi = 10.1145/360680.360691 | title = Expected time bounds for selection | journal = Communications of the ACM | volume = 18 | issue = 3 | pages = 165–172 | date=March 1975 }}
* {{Cite journal | last1 = Kiwiel | first1 = K. C. | doi = 10.1016/j.tcs.2005.06.032 | title = On Floyd and Rivest's SELECT algorithm | journal = Theoretical Computer Science | volume = 347 | pages = 214–238 | year = 2005 | pmid = | pmc = }}
* [[Donald Knuth]]. ''[[The Art of Computer Programming]]'', Volume 3: ''Sorting and Searching'', Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Section 5.3.3: Minimum-Comparison Selection, pp. 207–219.
* [[Thomas H. Cormen]], [[Charles E. Leiserson]], [[Ronald L. Rivest]], and [[Clifford Stein]]. ''[[Introduction to Algorithms]]'', Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapter 9: Medians and Order Statistics, pp. 183–196. Section 14.1: Dynamic order statistics, pp. 302–308.
* {{DADS|Select|select}}
{{refend}}
|