Selection algorithm: Difference between revisions

 
== Lower bounds ==
The <math>O(n)</math> running time of the selection algorithms described above is necessary, because a selection algorithm that can handle inputs in an arbitrary order must take that much time to look at all of its inputs; if any one of its input values is never compared, that value could be the one that should have been selected, and the algorithm can be made to produce an incorrect answer. Beyond this simple argument, there has been a significant amount of research on the exact number of comparisons needed for selection, both in the randomized and deterministic cases.
 
Selecting the minimum of <math>n</math> values requires <math>n-1</math> comparisons, because the <math>n-1</math> values that are not selected must each have been determined to be non-minimal, by being the largest in some comparison, and no two of these values can be largest in the same comparison. The same argument applies symmetrically to selecting the maximum.{{r|knuth}}
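The argument above can be checked against a direct implementation. The following sketch (an illustration, not part of the cited sources) finds the minimum by a linear scan and counts comparisons, showing that exactly <math>n-1</math> comparisons are used:

```python
def select_minimum(values):
    """Return (minimum, number of comparisons performed).

    Every value other than the first enters exactly one comparison
    against the current minimum, so a list of n values costs n - 1
    comparisons, matching the lower bound.
    """
    iterator = iter(values)
    minimum = next(iterator)  # first value taken without a comparison
    comparisons = 0
    for value in iterator:
        comparisons += 1       # one comparison per remaining value
        if value < minimum:
            minimum = value
    return minimum, comparisons
```

For example, `select_minimum([5, 3, 8, 1, 4])` returns `(1, 4)`: five values, four comparisons.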
With <math>n-1</math> values being the larger in at least one comparison, and <math>p-1</math> values being the larger in at least two comparisons, there are a total of at least <math>n+p-2</math> comparisons. An [[Adversary model|adversary argument]], in which the outcome of each comparison is chosen in order to maximize <math>p</math> (subject to consistency with at least one possible ordering) rather than by the numerical values of the given items, shows that it is possible to force <math>p</math> to be at least <math>\log_2 n</math>. Therefore, the worst-case number of comparisons needed to select the second smallest is <math>n+\lceil\log_2 n\rceil-2</math>, the same number that would be obtained by holding a single-elimination tournament with a run-off among the values that lost to the smallest value. However, the expected number of comparisons of a randomized selection algorithm can be better than this bound; for instance, selecting the second-smallest of six elements requires seven comparisons in the worst case, but may be done by a randomized algorithm with an expected number of 6.5 comparisons.{{r|knuth}}
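The tournament scheme described above can be sketched as follows. This is an illustrative implementation (not from the cited sources), assuming distinct values; for <math>n</math> a power of two it performs exactly <math>n+\log_2 n-2</math> comparisons: <math>n-1</math> in the single-elimination rounds, then <math>\log_2 n - 1</math> in the run-off among the values that lost directly to the smallest.

```python
def second_smallest(values):
    """Tournament selection of the second-smallest value.

    Returns (second smallest, comparisons performed).
    Assumes the values are distinct (they are used as dict keys).
    """
    comparisons = 0
    losers_to = {v: [] for v in values}  # who lost directly to each value
    survivors = list(values)
    while len(survivors) > 1:
        next_round = []
        for i in range(0, len(survivors) - 1, 2):
            a, b = survivors[i], survivors[i + 1]
            comparisons += 1
            winner, loser = (a, b) if a < b else (b, a)
            losers_to[winner].append(loser)
            next_round.append(winner)
        if len(survivors) % 2 == 1:      # odd survivor gets a bye
            next_round.append(survivors[-1])
        survivors = next_round
    smallest = survivors[0]
    # Run-off: the second smallest must have lost directly to the smallest.
    candidates = losers_to[smallest]
    second = candidates[0]
    for c in candidates[1:]:
        comparisons += 1
        if c < second:
            second = c
    return second, comparisons
```

On eight distinct values the tournament uses seven comparisons and the run-off two more, for a total of <math>8+\lceil\log_2 8\rceil-2=9</math>.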
 
More generally, selecting the <math>k</math>th element out of <math>n</math> requires at least <math>n+\min(k,n-k)-O(1)</math> comparisons, in the average case, matching the number of comparisons of the Floyd–Rivest algorithm up to its <math>o(n)</math> term. The argument is made directly for deterministic algorithms, with a number of comparisons that is averaged over all possible [[permutation]]s of the input values.{{r|cunmun}} By [[Yao's principle]], it also applies to the expected number of comparisons for a randomized algorithm on its worst-case input.{{r|chan}}
 
For deterministic algorithms, it has been shown that selecting the <math>k</math>th element requires <math>\bigl(1+H(k/n)\bigr)n+\Omega(\sqrt n)</math> comparisons, where <math display=block>H(x)=x\log_2\frac1x + (1-x)\log_2\frac1{1-x}</math> is the [[binary entropy function]].{{r|benjoh}} The special case of median-finding has a slightly larger lower bound on the number of comparisons, at least <math>(2+\varepsilon)n</math>, for <math>\varepsilon\approx 2^{-80}</math>.{{r|dz01}}
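The leading term of this deterministic bound is easy to evaluate numerically. The sketch below (an illustration, with the conventional continuity assumption <math>H(0)=H(1)=0</math>) computes <math>\bigl(1+H(k/n)\bigr)n</math>; for the median, <math>k/n=1/2</math> gives <math>H(1/2)=1</math> and hence a bound of <math>2n</math>:

```python
import math

def entropy_bound(k, n):
    """Leading term (1 + H(k/n)) * n of the deterministic lower bound
    on comparisons for selecting the kth of n elements, where H is the
    binary entropy function (with H(0) = H(1) = 0 by convention)."""
    x = k / n
    if x in (0.0, 1.0):
        h = 0.0
    else:
        h = x * math.log2(1 / x) + (1 - x) * math.log2(1 / (1 - x))
    return (1 + h) * n
```

For example, `entropy_bound(50, 100)` evaluates to `200.0`, the <math>2n</math> bound for median-finding.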