Selection algorithm: Difference between revisions

Content deleted Content added
Cunto & Munro
factories and deterministic lower bounds
Line 31:
*The [[median of medians]] method partitions the input into sets of five elements, and then uses some other method (rather than a recursive call) to find the median of each of these sets in constant time per set. It then recursively calls the same selection algorithm to find the median of these <math>n/5</math> medians, using the result as its pivot. It can be shown that, for this choice of pivot, <math>\max(|L|,|R|)\le 7n/10</math>. Thus, a problem on <math>n</math> elements is reduced to two recursive problems on <math>n/5</math> and at most <math>7n/10</math> elements. The total size of these two recursive subproblems is at most <math>9n/10</math>, allowing the total time to be analyzed as a geometric series adding to <math>O(n)</math>. Unlike quickselect, this algorithm is deterministic, not randomized.{{r|clrs|bfprt}} It was the first linear-time deterministic selection algorithm known,{{r|bfprt}} and is commonly taught in undergraduate algorithms classes as an example of a [[divide and conquer]] algorithm that does not divide into two equal subproblems. However, the high constant factors in its <math>O(n)</math> time bound make it unsuitable for practical use.
*Hybrid algorithms such as [[introselect]] can be used to achieve the practical performance of quickselect with a fallback to medians of medians guaranteeing worst-case <math>O(n)</math> time.{{r|musser}}
 
===Factories===
The deterministic selection algorithms with the smallest known numbers of comparisons, for values of <math>k</math> that are far from <math>1</math> or <math>n</math>, are based on the concept of ''factories'', introduced in 1976 by [[Arnold Schönhage]], [[Mike Paterson]], and [[Nick Pippenger]].{{r|spp}} These are methods that build [[partial order]]s of certain specified types, on small subsets of input values, by combining smaller partial orders using comparisons on their elements. As a very simple case, for instance, one type of factory can take as input a sequence of single-element partial orders and produce as output two-element totally ordered sets, obtained by comparing the two elements of two input orders. The goal of such an algorithm is to combine together different factories, with the outputs of some factories going to the inputs of others, in order to eventually obtain a partial order in which one element (the <math>k</math> smallest) is larger than some <math>k-1</math> other elements and smaller than another <math>n-k</math> others. A careful design of these factories leads to an algorithm that, when applied to median-finding, uses at most <math>2.942n</math> comparisons. For other values of <math>k</math>, the number of comparisons is smaller.{{r|dz99}}
 
===Sublinear data structures===
Line 48 ⟶ 51:
 
More generally, selecting the <math>k</math> element out of <math>n</math> requires at least <math>n+\min(k,n-k)-O(1)</math> comparisons, in the average case, matching the number of comparisons of the Floyd–Rivest algorithm up to its <math>o(n)</math> term. The argument is made directly for deterministic algorithms, with a number of comparisons that is averaged over all possible [[permutation]]s of the input values.{{r|cunmun}} However, by [[Yao's principle]], it also applies to the expected number of comparisons for a randomized algorithm on its worst-case input.{{r|chan}}
 
For deterministic algorithms, it has been shown that selecting the <math>k</math> element requires <math>\bigl(1+H(k/n)\bigr)n+\Omega(\sqrt n)</math> comparisons, where <math display=block>H(x)=x\log_2\frac1x + (1-x)\log_2\frac1{1-x}</math> is the [[binary entropy function]].{{r|benjoh}} The special case of median-finding has a slightly larger lower bound on the number of comparisons, at least <math>(2+\varepsilon)n</math>, for <math>\varepsilon\approx 2^{-80}</math>.{{r|dz01}}
 
== Language support ==
Line 67 ⟶ 72:
== References ==
{{reflist|refs=
 
<ref name=benjoh>{{cite conference
| last1 = Bent | first1 = Samuel W.
| last2 = John | first2 = John W.
| editor-last = Sedgewick | editor-first = Robert
| contribution = Finding the median requires <math>2n</math> comparisons
| doi = 10.1145/22145.22169
| pages = 213–216
| publisher = Association for Computing Machinery
| title = Proceedings of the 17th Annual ACM Symposium on Theory of Computing, May 6–8, 1985, Providence, Rhode Island, USA
| year = 1985}}</ref>
 
<ref name=bfprt>{{cite journal
Line 117 ⟶ 133:
| volume = 36
| year = 1989}}</ref>
 
<ref name=dz99>{{cite journal
| last1 = Dor | first1 = Dorit
| last2 = Zwick | first2 = Uri | author2-link = Uri Zwick
| doi = 10.1137/S0097539795288611
| issue = 5
| journal = [[SIAM Journal on Computing]]
| mr = 1694164
| pages = 1722–1758
| title = Selecting the median
| volume = 28
| year = 1999}}</ref>
 
<ref name=dz01>{{cite journal
| last1 = Dor | first1 = Dorit
| last2 = Zwick | first2 = Uri
| doi = 10.1137/S0895480199353895
| issue = 3
| journal = [[SIAM Journal on Discrete Mathematics]]
| mr = 1857348
| pages = 312–325
| title = Median selection requires <math>(2+\varepsilon)N</math> comparisons
| volume = 14
| year = 2001}}</ref>
 
<ref name=floriv>{{cite journal
Line 206 ⟶ 246:
| volume = 26
| year = 1983}}</ref>
 
<ref name=spp>{{cite journal
| last1 = Schönhage | first1 = A. | author1-link = Arnold Schönhage
| last2 = Paterson | first2 = M. | author2-link = Mike Paterson
| last3 = Pippenger | first3 = N. | author3-link = Nick Pippenger
| doi = 10.1016/S0022-0000(76)80029-3
| issue = 2
| journal = [[Journal of Computer and System Sciences]]
| mr = 428794
| pages = 184–199
| title = Finding the median
| volume = 13
| year = 1976}}</ref>
 
<ref name=iso>Section 25.3.2 of ISO/IEC 14882:2003(E) and 14882:1998(E)</ref>