 
The next simplest case is selecting the second-smallest. After several incorrect attempts, the first tight lower bound on this case was published in 1964 by Soviet mathematician [[Sergey Kislitsyn]]. The bound can be shown by observing that selecting the second-smallest also requires distinguishing the smallest value from the rest, and by considering the number <math>p</math> of comparisons involving the smallest value that an algorithm for this problem makes. Each of the <math>p</math> items that were compared to the smallest value is a candidate for second-smallest, and <math>p-1</math> of these values must be found larger than another value in a second comparison in order to rule them out as second-smallest.
With <math>n-1</math> values being the larger in at least one comparison, and <math>p-1</math> values being the larger in at least two comparisons, there are a total of at least <math>n+p-2</math> comparisons. An [[Adversary model|adversary argument]], in which the outcome of each comparison is chosen in order to maximize <math>p</math> (subject to consistency with at least one possible ordering) rather than by the numerical values of the given items, shows that it is possible to force <math>p</math> to be {{nowrap|at least <math>\log_2 n</math>.}} Therefore, the worst-case number of comparisons needed to select the second smallest {{nowrap|is <math>n+\lceil\log_2 n\rceil-2</math>,}} the same number that would be obtained by holding a single-elimination tournament with a run-off tournament among the values that lost to the smallest value. However, the expected number of comparisons of a randomized selection algorithm can be better than this bound; for instance, selecting the second-smallest of six elements requires seven comparisons in the worst case, but may be done by a randomized algorithm with an expected number of {{nowrap|6.5 comparisons.{{r|knuth}}}}
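The tournament scheme described above can be sketched in Python. This is an illustrative implementation, not taken from any cited source; it plays a single-elimination tournament (<math>n-1</math> comparisons) while recording which items lost directly to each winner, then holds a run-off among the at most <math>\lceil\log_2 n\rceil</math> items that lost to the champion:

```python
def second_smallest(items):
    """Return the second-smallest value using a single-elimination
    tournament followed by a run-off among the champion's direct losers."""
    # Map each index to the indices it has beaten directly.
    beaten_by = {i: [] for i in range(len(items))}
    contenders = list(range(len(items)))
    while len(contenders) > 1:
        next_round = []
        # Pair up contenders; the smaller value of each pair advances.
        for a, b in zip(contenders[::2], contenders[1::2]):
            winner, loser = (a, b) if items[a] <= items[b] else (b, a)
            beaten_by[winner].append(loser)
            next_round.append(winner)
        if len(contenders) % 2 == 1:
            next_round.append(contenders[-1])  # odd one out gets a bye
        contenders = next_round
    champion = contenders[0]
    # Only items that lost directly to the smallest can be second-smallest.
    runner_up = min(beaten_by[champion], key=lambda i: items[i])
    return items[runner_up]
```

The tournament uses <math>n-1</math> comparisons, and the run-off at most <math>\lceil\log_2 n\rceil-1</math> more, matching the worst-case bound stated above.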
 
More generally, selecting the {{nowrap|<math>k</math>th}} element out of <math>n</math> requires at least <math>n+\min(k,n-k)-O(1)</math> comparisons, in the average case, matching the number of comparisons of the Floyd–Rivest algorithm up to its <math>o(n)</math> term. The argument is made directly for deterministic algorithms, with a number of comparisons that is averaged over all possible [[permutation]]s of the input {{nowrap|values.{{r|cunmun}}}} By [[Yao's principle]], it also applies to the expected number of comparisons for a randomized algorithm on its worst-case {{nowrap|input.{{r|chan}}}}
 
For deterministic algorithms, it has been shown that selecting the {{nowrap|<math>k</math>th}} element requires <math>\bigl(1+H(k/n)\bigr)n+\Omega(\sqrt n)</math> comparisons, where <math display=block>H(x)=x\log_2\frac1x + (1-x)\log_2\frac1{1-x}</math> is the {{nowrap|[[binary entropy function]].{{r|benjoh}}}} The special case of median-finding has a slightly larger lower bound on the number of comparisons, {{nowrap|at least <math>(2+\varepsilon)n</math>,}} for {{nowrap|<math>\varepsilon\approx 2^{-80}</math>.{{r|dz01}}}}
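To make the entropy term concrete, the following small Python helper (an illustration written for this article, not part of any cited work) evaluates <math>H(k/n)</math>; for the median, <math>k/n=1/2</math> gives <math>H(1/2)=1</math>, so the bound becomes roughly <math>2n</math> comparisons:

```python
import math

def binary_entropy(x):
    """H(x) = x*log2(1/x) + (1-x)*log2(1/(1-x)), with H(0) = H(1) = 0."""
    if x in (0, 1):
        return 0.0
    return x * math.log2(1 / x) + (1 - x) * math.log2(1 / (1 - x))

# Median-finding corresponds to k/n = 1/2, where the entropy is maximal.
```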
 
== Exact numbers of comparisons ==
Knuth supplies the following triangle of numbers summarizing pairs of <math>n</math> and <math>k</math> for which the exact number of comparisons needed by an optimal selection algorithm is known. The {{nowrap|<math>n</math>th}} row of the triangle (starting with <math>n=1</math> in the top row) gives the numbers of comparisons for inputs of <math>n</math> values, and the {{nowrap|<math>k</math>th}} number within each row gives the number of comparisons needed to select the {{nowrap|<math>k</math>th}} smallest value from an input of that size. The rows are symmetric because selecting the {{nowrap|<math>k</math>th}} smallest requires exactly the same number of comparisons, in the worst case, as selecting the {{nowrap|<math>k</math>th}} {{nowrap|largest.{{r|knuth}}}}
 
{{center|0}}
Very few languages have built-in support for general selection, although many provide facilities for finding the smallest or largest element of a list. A notable exception is the [[Standard Template Library]] for [[C++]], which provides a templated <code>nth_element</code> method with a guarantee of expected linear time.{{r|skiena}}
 
[[Python (programming language)|Python]]'s standard library (since 2.4) includes <code>heapq.nsmallest</code> and <code>heapq.nlargest</code> subroutines for returning the smallest or largest elements from a collection, in sorted order. As of Python version 3.11, these subroutines use heapselect, taking time <math>O(n+k\log n)</math> to return a list of <math>k</math> elements. When <math>k</math> is large relative {{nowrap|to <math>n</math>,}} they revert to sorting the whole input and then returning a slice of the resulting sorted array. A linear-time selection algorithm is not {{nowrap|included.{{r|python}}}}
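For instance, both subroutines return their results in sorted order:

```python
import heapq

data = [9, 1, 7, 3, 8, 2, 6]
print(heapq.nsmallest(3, data))  # the 3 smallest values, ascending: [1, 2, 3]
print(heapq.nlargest(2, data))   # the 2 largest values, descending: [9, 8]
```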
 
Since 2017, [[Matlab]] has included <code>maxk()</code> and <code>mink()</code> functions, which return the maximal (minimal) <math>k</math> values in a vector as well as their indices. The Matlab documentation does not specify which algorithm these functions use or what their running {{nowrap|time is.{{r|matlab}}}}
 
== History ==
[[Quickselect]] was presented without analysis by [[Tony Hoare]] {{nowrap|in 1965,{{r|hoare}}}} and first analyzed in a 1971 technical report by {{nowrap|[[Donald Knuth]].{{r|floriv}}}} The first known linear time deterministic selection algorithm is the [[median of medians]] method, published in 1973 by [[Manuel Blum]], [[Robert W. Floyd]], [[Vaughan Pratt]], [[Ron Rivest]], and [[Robert Tarjan]]. They trace the formulation of the selection problem to work of [[Lewis Carroll]] in 1883, who pointed out that the usual design of single-elimination sports tournaments does not guarantee that the second-best player wins second place, and to work of [[Hugo Steinhaus]] circa 1930, who followed up this same line of thought by asking for a tournament design that can make this guarantee, with a minimum number of games played (that is, {{nowrap|comparisons).{{r|bfprt}}}}
 
== See also ==