 
The next simplest case is selecting the second-smallest. After several incorrect attempts, the first tight lower bound on this case was published in 1964 by Soviet mathematician [[Sergey Kislitsyn]]. The bound can be shown by observing that selecting the second-smallest also requires distinguishing the smallest value from the rest, and by considering the number <math>p</math> of comparisons involving the smallest value that an algorithm for this problem makes. Each of the <math>p</math> items that were compared to the smallest value is a candidate for second-smallest, and <math>p-1</math> of these values must be found larger than another value in a second comparison in order to rule them out as second-smallest.
With <math>n-1</math> values being the larger in at least one comparison, and <math>p-1</math> values being the larger in at least two comparisons, there are a total of at least <math>n+p-2</math> comparisons. An [[Adversary model|adversary argument]], in which the outcome of each comparison is chosen in order to maximize <math>p</math> (subject to consistency with at least one possible ordering) rather than by the numerical values of the given items, shows that it is possible to force <math>p</math> to be {{nowrap|at least <math>\log_2 n</math>.}} Therefore, the worst-case number of comparisons needed to select the second smallest {{nowrap|is <math>n+\lceil\log_2 n\rceil-2</math>,}} the same number that would be obtained by holding a single-elimination tournament with a run-off tournament among the values that lost to the smallest value. However, the expected number of comparisons of a randomized selection algorithm can be better than this bound; for instance, selecting the second-smallest of six elements requires seven comparisons in the worst case, but may be done by a randomized algorithm with an expected number of {{nowrap|6.5 comparisons.{{r|knuth}}}}
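The tournament scheme described above can be sketched in Python. This is an illustrative implementation, not taken from any cited source; it plays a single-elimination tournament (<math>n-1</math> comparisons) while recording which items lost directly to each winner, then holds a run-off among the at most <math>\lceil\log_2 n\rceil</math> items that lost to the champion:

```python
def second_smallest(items):
    """Return the second-smallest value using a single-elimination
    tournament followed by a run-off among the champion's direct losers."""
    # Map each index to the indices it has beaten directly.
    beaten_by = {i: [] for i in range(len(items))}
    contenders = list(range(len(items)))
    while len(contenders) > 1:
        next_round = []
        # Pair up contenders; the smaller value of each pair advances.
        for a, b in zip(contenders[::2], contenders[1::2]):
            winner, loser = (a, b) if items[a] <= items[b] else (b, a)
            beaten_by[winner].append(loser)
            next_round.append(winner)
        if len(contenders) % 2 == 1:
            next_round.append(contenders[-1])  # odd one out gets a bye
        contenders = next_round
    champion = contenders[0]
    # Only items that lost directly to the smallest can be second-smallest.
    runner_up = min(beaten_by[champion], key=lambda i: items[i])
    return items[runner_up]
```

The tournament uses <math>n-1</math> comparisons, and the run-off at most <math>\lceil\log_2 n\rceil-1</math> more, matching the worst-case bound stated above.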
 
More generally, selecting the {{nowrap|<math>k</math>th}} element out of <math>n</math> requires at least <math>n+\min(k,n-k)-O(1)</math> comparisons, in the average case, matching the number of comparisons of the Floyd–Rivest algorithm up to its <math>o(n)</math> term. The argument is made directly for deterministic algorithms, with a number of comparisons that is averaged over all possible [[permutation]]s of the input {{nowrap|values.{{r|cunmun}}}} By [[Yao's principle]], it also applies to the expected number of comparisons for a randomized algorithm on its worst-case {{nowrap|input.{{r|chan}}}}
 
For deterministic algorithms, it has been shown that selecting the {{nowrap|<math>k</math>th}} element requires <math>\bigl(1+H(k/n)\bigr)n+\Omega(\sqrt n)</math> comparisons, where <math display=block>H(x)=x\log_2\frac1x + (1-x)\log_2\frac1{1-x}</math> is the {{nowrap|[[binary entropy function]].{{r|benjoh}}}} The special case of median-finding has a slightly larger lower bound on the number of comparisons, {{nowrap|at least <math>(2+\varepsilon)n</math>,}} for {{nowrap|<math>\varepsilon\approx 2^{-80}</math>.{{r|dz01}}}}
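To make the entropy term concrete, the following small Python helper (an illustration written for this article, not part of any cited work) evaluates <math>H(k/n)</math>; for the median, <math>k/n=1/2</math> gives <math>H(1/2)=1</math>, so the bound becomes roughly <math>2n</math> comparisons:

```python
import math

def binary_entropy(x):
    """H(x) = x*log2(1/x) + (1-x)*log2(1/(1-x)), with H(0) = H(1) = 0."""
    if x in (0, 1):
        return 0.0
    return x * math.log2(1 / x) + (1 - x) * math.log2(1 / (1 - x))

# Median-finding corresponds to k/n = 1/2, where the entropy is maximal.
```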
 
== Exact numbers of comparisons ==
Knuth supplies the following triangle of numbers summarizing pairs of <math>n</math> and <math>k</math> for which the exact number of comparisons needed by an optimal selection algorithm is known. The {{nowrap|<math>n</math>th}} row of the triangle (starting with <math>n=1</math> in the top row) gives the numbers of comparisons for inputs of <math>n</math> values, and the {{nowrap|<math>k</math>th}} number within each row gives the number of comparisons needed to select the {{nowrap|<math>k</math>th}} smallest value from an input of that size. The rows are symmetric because selecting the {{nowrap|<math>k</math>th}} smallest requires exactly the same number of comparisons, in the worst case, as selecting the {{nowrap|<math>k</math>th}} {{nowrap|largest.{{r|knuth}}}}
 
{{center|0}}
Very few languages have built-in support for general selection, although many provide facilities for finding the smallest or largest element of a list. A notable exception is the [[Standard Template Library]] for [[C++]], which provides a templated <code>nth_element</code> method with a guarantee of expected linear time.{{r|skiena}}
 
[[Python (programming language)|Python]]'s standard library (since 2.4) includes <code>heapq.nsmallest</code> and <code>heapq.nlargest</code> subroutines for returning the smallest or largest elements from a collection, in sorted order. As of Python version 3.11, these subroutines use heapselect, taking time <math>O(n+k\log n)</math> to return a list of <math>k</math> elements. When <math>k</math> is large relative {{nowrap|to <math>n</math>,}} they revert to sorting the whole input and then returning a slice of the resulting sorted array. A linear-time selection algorithm is not {{nowrap|included.{{r|python}}}}
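For instance, both subroutines return their results in sorted order:

```python
import heapq

data = [9, 1, 7, 3, 8, 2, 6]
print(heapq.nsmallest(3, data))  # the 3 smallest values, ascending: [1, 2, 3]
print(heapq.nlargest(2, data))   # the 2 largest values, descending: [9, 8]
```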
 
Since 2017, [[Matlab]] has included <code>maxk()</code> and <code>mink()</code> functions, which return the maximal (minimal) <math>k</math> values in a vector as well as their indices. The Matlab documentation does not specify which algorithm these functions use or what their running {{nowrap|time is.{{r|matlab}}}}
 
== History ==
[[Quickselect]] was presented without analysis by [[Tony Hoare]] {{nowrap|in 1965,{{r|hoare}}}} and first analyzed in a 1971 technical report by {{nowrap|[[Donald Knuth]].{{r|floriv}}}} The first known linear time deterministic selection algorithm is the [[median of medians]] method, published in 1973 by [[Manuel Blum]], [[Robert W. Floyd]], [[Vaughan Pratt]], [[Ron Rivest]], and [[Robert Tarjan]]. They trace the formulation of the selection problem to work of [[Lewis Carroll]] in 1883, who pointed out that the usual design of single-elimination sports tournaments does not guarantee that the second-best player wins second place, and to work of [[Hugo Steinhaus]] circa 1930, who followed up this same line of thought by asking for a tournament design that can make this guarantee, with a minimum number of games played (that is, {{nowrap|comparisons).{{r|bfprt}}}}
 
== See also ==