Selection algorithm: Difference between revisions

2nd devroye ref
Line 24:
As with the related pivoting-based [[quicksort]] algorithm, the partition of the input into <math>L</math> and <math>R</math> may be done by making new collections for these sets, or by a method that partitions a given list or array data type in-place. Details vary depending on how the input collection is {{nowrap|represented.<ref>For instance, Cormen et al. use an in-place array partition, while Kleinberg and Tardos describe the input as a set and use a method that partitions it into two new sets.</ref>}} The time to compare the pivot against all the other values {{nowrap|is <math>O(n)</math>.{{r|kletar}}}} However, pivoting methods differ in how they choose the pivot, which affects how big the subproblems in each recursive call will be. The efficiency of these methods depends greatly on the choice of the pivot. If the pivot is chosen badly, the running time of this method can be as slow {{nowrap|as <math>O(n^2)</math>.{{r|erickson}}}}
*If the pivot were exactly at the median of the input, then each recursive call would have at most half as many values as the previous call, and the total times would add in a [[geometric series]] {{nowrap|to <math>O(n)</math>.}} However, finding the median is itself a selection problem, on the entire original input. Trying to find it by a recursive call to a selection algorithm would lead to an infinite recursion, because the problem size would not decrease in each {{nowrap|call.{{r|kletar}}}}
*[[Quickselect]] chooses the pivot uniformly at random from the input values. It can be described as a [[prune and search]] algorithm,{{r|gootam}} a variant of [[quicksort]], with the same pivoting strategy, but where quicksort makes two recursive calls to sort the two subcollections <math>L</math> {{nowrap|and <math>R</math>,}} quickselect only makes one of these two calls. Its [[expected time]] {{nowrap|is <math>O(n)</math>.{{r|clrs|kletar|gootam}}}} For any constant <math>C</math>, the probability that its number of comparisons exceeds <math>Cn</math> is superexponentially small {{nowrap|in <math>C</math>.{{r|devroye}}}}
*The [[Floyd–Rivest algorithm]], a variation of quickselect, chooses a pivot by randomly sampling a subset of <math>r</math> data values, for some sample {{nowrap|size <math>r</math>,}} and then recursively selecting two elements somewhat above and below position <math>rk/n</math> of the sample to use as pivots. With this choice, it is likely that <math>k</math> is sandwiched between the two pivots, so that after pivoting only a small number of data values between the pivots are left for a recursive call. This method can achieve an expected number of comparisons that is {{nowrap|<math>n+\min(k,n-k)+o(n)</math>.{{r|floriv}}}} In their original work, Floyd and Rivest claimed that the <math>o(n)</math> term could be made as small as <math>O(\sqrt n)</math> by a recursive sampling scheme, but the correctness of their analysis has been {{nowrap|questioned.{{r|brown|prt}}}} Instead, more rigorous analysis has shown that a version of their algorithm achieves <math>O(\sqrt{n\log n})</math> for this {{nowrap|term.{{r|knuth}}}} Although the usual analysis of both quickselect and the Floyd–Rivest algorithm assumes the use of a [[true random number generator]], a version of the Floyd–Rivest algorithm using a [[pseudorandom number generator]] seeded with only logarithmically many true random bits has been proven to run in linear time with high probability.{{r|karrag}}
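The quickselect variant described in the list above can be illustrated with a minimal Python sketch. It assumes 0-indexed ranks and partitions into new lists rather than in place; the function name <code>quickselect</code> and the <code>rng</code> parameter are illustrative choices, not taken from the cited sources.

```python
import random


def quickselect(values, k, rng=random):
    """Return the k-th smallest element of values (k is 0-indexed).

    Illustrative sketch: the pivot is chosen uniformly at random, and
    the input is partitioned into new lists rather than in place.
    """
    items = list(values)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        pivot = rng.choice(items)
        smaller = [x for x in items if x < pivot]  # the set L
        larger = [x for x in items if x > pivot]   # the set R
        n_equal = len(items) - len(smaller) - len(larger)
        if k < len(smaller):
            # The answer lies in L; discard everything else.
            items = smaller
        elif k < len(smaller) + n_equal:
            # The answer is the pivot itself (or a value equal to it).
            return pivot
        else:
            # The answer lies in R; shift the rank accordingly.
            k -= len(smaller) + n_equal
            items = larger
```

For example, <code>quickselect([7, 2, 9, 4, 1, 8, 3], 3)</code> returns <code>4</code>. Unlike quicksort, only one of the two sides is kept after each partition, which is what brings the expected time down to linear.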
[[File:Mid-of-mid.png|thumb|upright=1.35|Visualization of pivot selection for the [[median of medians]] method. Each set of five elements is shown as a column of dots in the figure, sorted in increasing order from top to bottom. If their medians (the green and purple dots in the middle row) are sorted in increasing order from left to right, and the median of medians is chosen as the pivot, then the <math>3n/10</math> elements in the upper left quadrant will be less than the pivot, and the <math>3n/10</math> elements in the lower right quadrant will be greater than the pivot, showing that many elements will be eliminated by pivoting.]]
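The pivot choice shown in the figure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a tuned implementation: it uses 0-indexed ranks, groups of five, new lists instead of in-place partitioning, and the hypothetical function name <code>mom_select</code>.

```python
def mom_select(values, k):
    """Return the k-th smallest element of values (k is 0-indexed),
    using the median of medians as the pivot.

    Illustrative sketch only; assumes 0 <= k < len(values).
    """
    items = list(values)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    if len(items) <= 5:
        return sorted(items)[k]

    # Split into groups of five (the last group may be smaller)
    # and take the median of each group.
    groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]

    # Recursively select the median of the group medians as the pivot.
    pivot = mom_select(medians, (len(medians) - 1) // 2)

    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    n_equal = len(items) - len(smaller) - len(larger)

    if k < len(smaller):
        return mom_select(smaller, k)
    if k < len(smaller) + n_equal:
        return pivot
    return mom_select(larger, k - len(smaller) - n_equal)
```

Because the pivot is the exact median of the group medians, a constant fraction of the elements is guaranteed to fall on each side of it, so each recursive call on <code>smaller</code> or <code>larger</code> works on a strictly smaller input.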
Line 232:
 
<ref name=devroye>{{cite journal
| last = Devroye | first = Luc | author-link = Luc Devroye
| doi = 10.1016/0022-0000(84)90009-6
| issue = 1
| journal = [[Journal of Computer and System Sciences]]
| mr = 761047
| pages = 1–7
| title = Exponential bounds for the running time of a selection algorithm
| url = http://luc.devroye.org/devroye-selection1984.pdf
| volume = 29
| year = 1984}} {{cite journal
| last = Devroye | first = Luc
| doi = 10.1007/s00453-001-0046-2
| issue = 3
| journal = Algorithmica
| mr = 1855252
| pages = 291–303
| title = On the probabilistic worst-case time of 'find'
| url = https://luc.devroye.org/wcfind.pdf
| volume = 31
| year = 2001}}</ref>
 
<ref name=dieram>{{cite journal