Content deleted Content added
source selection from a heap; COI edit, to make up for which I am dropping the link to my old lecture notes |
promote Blum et al to references |
||
Line 19:
The time for this method is dominated by the sorting step, which requires <math>\Theta(n\log n)</math> time using a [[comparison sort]].{{r|clrs}} Even when [[integer sorting]] algorithms may be used, these are generally slower than the linear time that may be achieved using specialized selection algorithms. Nevertheless, the simplicity of this approach makes it attractive, especially when a highly-optimized sorting routine is provided as part of a runtime library, but a selection algorithm is not.
For a sorting algorithm that generates one item at a time, such as [[selection sort]], the scan can be done in tandem with the sort, and the sort can be terminated once the <math>k</math> element has been found. One possible design of a consolation bracket in a [[single-elimination tournament]], in which the teams who lost to the eventual winner play another mini-tournament to determine second place, can be seen as an instance of this method.{{r|bfprt}} Applying this optimization to [[heapsort]] produces the [[heapselect]] algorithm, which can select the <math>k</math>th smallest value in time <math>O(n+k\log n)</math>. This is fast when <math>k</math> is small relative to <math>n</math>, but degenerates to <math>O(n\log n)</math> for larger values of <math>k</math>, such as the choice <math>k=n/2</math> used for median finding.
===Decision trees===
Line 31:
*If the pivot were exactly at the median of the input, then each recursive call would have at most half as many values as the previous call, and the total times would add in a [[geometric series]] to <math>O(n)</math>. However, finding the median is itself a selection problem, on the entire original input. Trying to find it by a recursive call to a selection algorithm would lead to an infinite recursion, because the problem size would not decrease in each call.
*[[Quickselect]] chooses the pivot uniformly at random from the input values. It can be described as a variant of [[quicksort]], with the same pivoting strategy, but where quicksort makes two recursive calls to sort the two subcollections <math>L</math> and <math>R</math>, quickselect only makes one of these two calls. Its [[expected time]] is <math>O(n)</math>.{{r|clrs}}
*The [[median of medians]] method partitions the input into sets of five elements, and then uses some other method (rather than a recursive call) to find the median of each of these sets in constant time per set. It then recursively calls the same selection algorithm to find the median of these <math>n/5</math> medians, using the result as its pivot. It can be shown that, for this choice of pivot, <math>\max(|L|,|R|)\le 7n/10</math>. Thus, a problem on <math>n</math> elements is reduced to two recursive problems on <math>n/5</math> and at most <math>7n/10</math> elements. The total size of these two recursive subproblems is at most <math>9n/10</math>, allowing the total time to be analyzed as a geometric series adding to <math>O(n)</math>. Unlike quickselect, this algorithm is deterministic, not randomized.{{r|clrs|bfprt}} It was the first linear-time selection algorithm known,{{r|bfprt}} and is commonly taught in undergraduate algorithms classes as an example of a [[divide and conquer]] algorithm that does not divide into two equal subproblems. However, the high constant factors in its <math>O(n)</math> time bound make it unsuitable for practical use.
*Hybrid algorithms such as [[introselect]] can be used to achieve the practical performance of quickselect with a fallback to medians of medians guaranteeing worst-case <math>O(n)</math> time.
Line 59:
[[Matlab]] includes <code>maxk()</code> and <code>mink()</code> functions, which return the maximal (minimal) k values in a vector as well as their indices.
== History==
The first known linear time selection algorithm is the median-of-medians method, published in 1973 by [[Manuel Blum]], [[Robert W. Floyd]], [[Vaughan Pratt]], [[Ron Rivest]], and [[Robert Tarjan]]. They trace the formulation of the selection problem to work of [[Lewis Carroll]] in 1883, who pointed out that the usual design of single-elimination sports tournaments does not guarantee that the second-best player wins second place, and to work of [[Hugo Steinhaus]] circa 1930, who followed up this same line of thought by asking for a tournament design that can make this guarantee, with a minimum number of games played (that is, comparisons).{{r|bfprt}}
== See also ==
Line 66 ⟶ 69:
== References ==
{{reflist|refs=
<ref name=bfprt>{{cite journal
| last1 = Blum | first1 = Manuel | author1-link = Manuel Blum
| last2 = Floyd | first2 = Robert W. | author2-link = Robert W. Floyd
| last3 = Pratt | first3 = Vaughan | author3-link = Vaughan Pratt
| last4 = Rivest | first4 = Ronald L. | author4-link = Ron Rivest
| last5 = Tarjan | first5 = Robert E. | author5-link = Robert Tarjan
| doi = 10.1016/S0022-0000(73)80033-9 | doi-access = free
| journal = [[Journal of Computer and System Sciences]]
| mr = 329916
| pages = 448–461
| title = Time bounds for selection
| url = http://people.csail.mit.edu/rivest/pubs/BFPRT73.pdf
| volume = 7
| year = 1973}}</ref>
<ref name=clrs>{{Introduction to Algorithms|edition=3|chapter=Chapter 9: Medians and order statistics|pages=213–227}}; "Section 14.1: Dynamic order statistics", pp. 339–345</ref>
Line 99 ⟶ 117:
==Further reading==
{{refbegin}}
* {{Cite journal | last1 = Floyd | first1 = R. W. | author-link1 = Robert W. Floyd | last2 = Rivest | first2 = R. L. | author-link2 = Ron Rivest | doi = 10.1145/360680.360691 | title = Expected time bounds for selection | journal = Communications of the ACM | volume = 18 | issue = 3 | pages = 165–172 | date=March 1975 | s2cid = 3064709 }}
* {{Cite journal | last1 = Kiwiel | first1 = K. C. | doi = 10.1016/j.tcs.2005.06.032 | title = On Floyd and Rivest's SELECT algorithm | journal = Theoretical Computer Science | volume = 347 | pages = 214–238 | year = 2005 | issue = 1–2 | doi-access = free }}
|