Selection algorithm: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 19:41, 1 April 2023 edit David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,585 edits m →References: typo in authorlink param ← Previous edit		Latest revision as of 20:59, 28 January 2025 edit undo David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,585 edits Undid revision 1272462687 by 49.207.232.175 (talk) the values must have an ordering. They may not be presented in that ordering. Tag: Undo
(43 intermediate revisions by 13 users not shown)
Line 1: {{Short description\|~~An algorithm~~Method for finding ~~the~~ kth smallest ~~number in a list or array~~value}} {{~~for~~For\|simulated natural selection in genetic algorithms\|Selection (genetic algorithm)}} {{Good article}} {{Use mdy dates\|cs1-dates=ly\|date=April 2023}} {{Use list-defined references\|date=April 2023}} In [[computer science]], a '''selection algorithm''' is an [[algorithm]] for finding the <math>k</math>th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the {{nowrap\|<math>k</math>th}} [[order statistic]]. Selection includes as special cases the problems of finding the [[minimum]], [[median]], and [[maximum]] element in the collection. Selection algorithms include [[quickselect]], and the [[median of medians]] algorithm. When applied to a collection of <math>n</math> values, these algorithms take [[linear time]], <math>O(n)</math> as expressed using [[big O notation]]. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted [[Array (data structure)\|array]] takes {{nowrap\|time <math>O(1)</math>.}} ==Problem statement== An algorithm for the selection problem takes as input a collection of values, and a {{nowrap\|number <math>k</math>.}} It outputs the {{nowrap\|<math>k</math>th}} smallest of these values, or, in some versions of the problem, a collection of the <math>k</math> smallest values. For this ~~should~~to be well-defined, it should be possible to [[Sorting\|sort]] the values into an order from smallest to largest; for instance, they may be ~~numbers~~[[Integer (computer science)\|integers]], [[floating-point number]]s, or some other kind of [[Object (computer science)\|object]] with a numeric key. However, they are not assumed to have been already sorted. Often, selection algorithms are restricted to a comparison-based [[model of computation]], as in [[comparison sort]] algorithms, where the algorithm has access to a [[Relational operator\|comparison operation]] that can determine the relative ordering of any two values, but may not perform any other kind of [[Arithmetic\|arithmetic operations]] on these values.{{r\|cunmun}} To simplify the problem, some ~~sources~~works ~~may~~on this problem assume that the values are all distinct from each {{nowrap\|other,{{r\|clrs}}}} or that some consistent tie-breaking method has been used to assign an ordering to pairs of items with the same value as each other. Another variation in the problem definition concerns the numbering of the ordered values: is the smallest value obtained by {{nowrap\|setting <math>k=0</math>,}} as in [[zero-based numbering]] of arrays, or is it obtained by {{nowrap\|setting <math>k=1</math>,}} following the usual English-language conventions for the smallest, second-smallest, etc.? This article follows the conventions used by Cormen et al., according to which all values are distinct and the minimum value is obtained from {{nowrap\|<math>k=1</math>.{{r\|clrs}}}} With these conventions, the maximum value, among a collection of <math>n</math> values, is obtained by {{nowrap\|setting <math>k=n</math>.}} When <math>n</math> is an [[odd number]], the [[median]] of the collection is obtained by {{nowrap\|setting <math>k=(n+1)/2</math>.}} When <math>n</math> is even, there are two choices for the median, obtained by rounding this choice of <math>k</math> down or up, respectively: the ''lower median'' with <math>k=n/2</math> and the ''upper median'' with {{nowrap\|<math>k=n/2+1</math>.{{r\|clrs}}}} Line 12 ⟶ 15: ==Algorithms== ===Sorting and heapselect=== As a baseline algorithm, selection of the {{nowrap\|<math>k</math>th}} smallest value in a collection of values can be performed ~~very simply~~ by the following two steps: * [[Sorting\|Sort]] the collection * If the output of the sorting algorithm is an [[Array (data type)\|array]], ~~jump to~~retrieve its {{nowrap\|<math>k</math>th}} element; otherwise, scan the sorted sequence to find the {{nowrap\|<math>k</math>th}} element. The time for this method is dominated by the sorting step, which requires <math>\Theta(n\log n)</math> time using a {{nowrap\|[[comparison sort]].{{r\|clrs\|skiena}}}} Even when [[integer sorting]] algorithms may be used, these are generally slower than the linear time that may be achieved using specialized selection algorithms. Nevertheless, the simplicity of this approach makes it attractive, especially when a highly-optimized sorting routine is provided as part of a runtime library, but a selection algorithm is not. For inputs of moderate size, sorting can be faster than non-random selection algorithms, because of the smaller constant factors in its running {{nowrap\|time.{{r\|erickson}}}} This method also produces a sorted version of the collection, which may be useful for other later computations, and in particular for selection with other choices {{nowrap\|of <math>k</math>.{{r\|skiena}}}} Line 24 ⟶ 27: As with the related pivoting-based [[quicksort]] algorithm, the partition of the input into <math>L</math> and <math>R</math> may be done by making new collections for these sets, or by a method that partitions a given list or array data type in-place. Details vary depending on how the input collection is {{nowrap\|represented.<ref>For instance, Cormen et al. use an in-place array partition, while Kleinberg and Tardos describe the input as a set and use a method that partitions it into two new sets.</ref>}} The time to compare the pivot against all the other values {{nowrap\|is <math>O(n)</math>.{{r\|kletar}}}} However, pivoting methods differ in how they choose the pivot, which affects how big the subproblems in each recursive call will be. The efficiency of these methods depends greatly on the choice of the pivot. If the pivot is chosen badly, the running time of this method can be as slow {{nowrap\|as <math>O(n^2)</math>.{{r\|erickson}}}} If the pivot were exactly at the median of the input, then each recursive call would have at most half as many values as the previous call, and the total times would add in a [[geometric series]] {{nowrap\|to <math>O(n)</math>.}} However, finding the median is itself a selection problem, on the entire original input. Trying to find it by a recursive call to a selection algorithm would lead to an infinite recursion, because the problem size would not decrease in each {{nowrap\|call.{{r\|kletar}}}} [[Quickselect]] chooses the pivot uniformly at random from the input values. It can be described as a [[prune and search]] algorithm,{{r\|gootam}} a variant of [[quicksort]], with the same pivoting strategy, but where quicksort makes two recursive calls to sort the two subcollections <math>L</math> {{nowrap\|and <math>R</math>,}} quickselect only makes one of these two calls. Its [[expected time]] {{nowrap\|is <math>O(n)</math>.{{r\|clrs\|kletar\|gootam}}}} For any constant <math>C</math>, the probability that its number of comparisons exceeds <math>Cn</math> is superexponentially small {{nowrap\|in <math>C</math>.{{r\|devroye}}}} The [[Floyd–Rivest algorithm]], a variation of quickselect, chooses a pivot by randomly sampling a subset of <math>r</math> data values, for some sample {{nowrap\|size <math>r</math>,}} and then recursively selecting two elements somewhat above and below position <math>rk/n</math> of the sample to use as pivots. With this choice, it is likely that <math>k</math> is sandwiched between the two pivots, so that after pivoting only a small number of data values between the pivots are left for a recursive call. This method can achieve an expected number of comparisons that is {{nowrap\|<math>n+\min(k,n-k)+o(n)</math>.{{r\|floriv}}}} In their original work, Floyd and Rivest claimed that the <math>o(n)</math> term could be made as small as <math>O(\sqrt n)</math> by a recursive sampling scheme, but the correctness of their analysis has been {{nowrap\|questioned.{{r\|brown\|prt}}}} Instead, more rigorous analysis has shown that a version of their algorithm achieves <math>O(\sqrt{n\log n})</math> for this {{nowrap\|term.{{r\|knuth}}}} Although the usual analysis of both quickselect and the Floyd–Rivest algorithm assumes the use of a [[true random number generator]], a version of the Floyd–Rivest algorithm using a [[pseudorandom number generator]] seeded with only logarithmically many true random bits has been proven to run in linear time with high probability.{{r\|karrag}} [[File:Mid-of-mid.png\|thumb\|upright=1.35\|Visualization of pivot selection for the [[median of medians]] method. Each set of five elements is shown as a column of dots in the figure, sorted in increasing order from top to bottom. If their medians (the green and purple dots in the middle row) are sorted in increasing order from left to right, and the median of medians is chosen as the pivot, then the <math>3n/10</math> elements in the upper left quadrant will be less than the pivot, and the <math>3n/10</math> elements in the lower right quadrant will be greater than the pivot, showing that many elements will be eliminated by pivoting.]] The [[median of medians]] method partitions the input into sets of five elements, and ~~then~~ uses some other ~~method (rather than a~~ non-recursive ~~call)~~method to find the median of each of these sets in constant time per set. It then recursively calls ~~the same selection algorithm~~itself to find the median of these <math>n/5</math> medians,. ~~using~~Using the ~~result~~resulting asmedian ~~its~~of ~~pivot.~~medians Itas ~~can~~the bepivot ~~shown~~produces ~~that,~~a ~~for~~partition ~~this choice of pivot,~~with {{nowrap\|<math>\max(\|L\|,\|R\|)\le 7n/10</math>.}} Thus, a problem on <math>n</math> elements is reduced to two recursive problems on <math>n/5</math> elements (to find the pivot) and at most <math>7n/10</math> elements (after the pivot is used). The total size of these two recursive subproblems is at {{nowrap\|most <math>9n/10</math>,}} allowing the total time to be analyzed as a geometric series adding {{nowrap\|to <math>O(n)</math>.}} Unlike quickselect, this algorithm is deterministic, not {{nowrap\|randomized.{{r\|clrs\|erickson\|bfprt}}}} It was the first linear-time deterministic selection algorithm {{nowrap\|known,{{r\|bfprt}}}} and is commonly taught in undergraduate algorithms classes as an example of a [[Divide-and-conquer algorithm\|divide and conquer]] that does not divide into two equal {{nowrap\|subproblems.{{r\|clrs\|erickson\|gootam\|gurwitz}}}} However, the high constant factors in its <math>O(n)</math> time bound make it slower than quickselect in {{nowrap\|practice,{{r\|skiena\|gootam}}}} and slower even than sorting for inputs of moderate {{nowrap\|size.{{r\|erickson}}}} Hybrid algorithms such as [[introselect]] can be used to achieve the practical performance of quickselect with a fallback to medians of medians guaranteeing worst-case <math>O(n)</math> {{nowrap\|time.{{r\|musser}}}} Line 37 ⟶ 40: ===Sublinear data structures=== When data is already organized into a [[data structure]], it may be possible to perform selection in an amount of time that is sublinear in the number of values. As a simple case of this, for data already sorted into an array, selecting the {{nowrap\|<math>k</math>th}} element may be performed by a single array lookup, in constant {{nowrap\|time.{{r\|frejoh}}}} For values organized into a two-dimensional array of {{nowrap\|size <math>m\times n</math>,}} with sorted rows and columns, selection may be performed in time {{nowrap\|<math>O\bigl(m\log(2n/m)\bigr)</math>,}} or faster when <math>k</math> is small relative to the array {{nowrap\|dimensions.{{r\|frejoh\|kkzz}}}} For a collection of <math>m</math> one-dimensional sorted arrays, with <math>k_i</math> items less than the selected item in the {{nowrap\|<math>i</math>th}} array, the time is {{nowrap\|<math display=inline>O\bigl(m+\sum_{i=1}^m\log(k_i+1)\bigr)</math>.{{r\|kkzz}}}} Selection from data in a [[binary heap]] takes {{nowrap\|time <math>O(k)</math>.}} This is independent of the size <math>n</math> of the heap, and faster than the <math>O(k\log n)</math> time bound that would be obtained from {{nowrap\|[[best-first search]].{{r\|kkzz\|frederickson}}}} This same method can be applied more generally to data organized as any kind of heap-ordered tree (a tree in which each node stores one value in which the parent of each non-root node has a smaller value than its child). This method of performing selection in a heap has been applied to problems of listing multiple solutions to combinatorial optimization problems, such as finding the [[k shortest path routing\|{{mvar\|k}} shortest paths]] in a weighted graph, by defining a [[State space (computer science)\|state space]] of solutions in the form of an [[implicit graph\|implicitly defined]] heap-ordered tree, and then applying this selection algorithm to this {{nowrap\|tree.{{r\|kpaths}}}} In the other direction, linear time selection algorithms have been used as a subroutine in a [[priority queue]] data structure related to the heap, improving the time for extracting its {{nowrap\|<math>k</math>th}} item from <math>O(\log n)</math> to {{nowrap\|<math>O(\log^ n+\log k)</math>;}} here <math>\log^* n</math> is the {{nowrap\|[[iterated logarithm]].{{r\|bks}}}} For a collection of data values undergoing dynamic insertions and deletions, the [[order statistic tree]] augments a [[self-balancing binary search tree]] structure with a constant amount of additional information per tree node, allowing insertions, deletions, and selection queries that ask for the {{nowrap\|<math>k</math>th}} element in the current set to all be performed in <math>O(\log n)</math> time per {{nowrap\|operation.{{r\|clrs}}}} Going beyond the comparison model of computation, faster times per operation are possible for values that are small integers, on which binary arithmetic operations are {{nowrap\|allowed.{{r\|pattho}}}} It is not possible for a [[streaming algorithms\|streaming algorithm]] with memory sublinear in both <math>n</math> and <math>k</math> to solve selection queries exactly for dynamic data, but the [[count–min sketch]] can be used to solve selection queries approximately, by finding a value whose position in the ordering of the elements (if it were added to them) would be within <math>\varepsilon n</math> steps of <math>k</math>, for a sketch whose size is within logarithmic factors of <math>1/\varepsilon</math>.{{r\|cormut}} == Lower bounds == The <math>O(n)</math> running time of the selection algorithms described above is necessary, because a selection algorithm that can handle inputs in an arbitrary order must take that much time to look at all of its inputs;. ifIf any one of its input values is not compared, that one value could be the one that should have been selected, and the algorithm can be made to produce an incorrect answer.{{r\|kkzz}} Beyond this simple argument, there has been a significant amount of research on the exact number of comparisons needed for selection, both in the randomized and deterministic cases. Selecting the minimum of <math>n</math> values requires <math>n-1</math> comparisons, because the <math>n-1</math> values that are not selected must each have been determined to be non-minimal, by being the largest in some comparison, and no two of these values can be largest in the same comparison. The same argument applies symmetrically to selecting the {{nowrap\|maximum.{{r\|knuth}}}} Line 57 ⟶ 60: == Exact numbers of comparisons == [[File:Median of 5.svg\|thumb\|upright=0.9\|Finding the median of five values using six comparisons. Each step shows the comparisons to be performed next as yellow line segments, and a [[Hasse diagram]] of the order relations found so far (with smaller=lower and larger=higher) as blue line segments. The red elements have already been found to be greater than three others and so cannot be the median. The larger of the two elements in the final comparison is the median.]] [[Donald Knuth\|Knuth]] supplies the following triangle of numbers summarizing pairs of <math>n</math> and <math>k</math> for which the exact number of comparisons needed by an optimal selection algorithm is known. The {{nowrap\|<math>n</math>th}} row of the triangle (starting with <math>n=1</math> in the top row) gives the numbers of comparisons for inputs of <math>n</math> values, and the {{nowrap\|<math>k</math>th}} number within each row gives the number of comparisons needed to select the {{nowrap\|<math>k</math>th}} smallest value from an input of that size. The rows are symmetric because selecting the {{nowrap\|<math>k</math>th}} smallest requires exactly the same number of comparisons, in the worst case, as selecting the {{nowrap\|<math>k</math>th}} {{nowrap\|largest.{{r\|knuth}}}} {{center\|0}} Line 70 ⟶ 73: {{center\|9   12   14   15   16   16   15   14   12   9}} Most, but not all, of the entries on the left half of each row can be found using the formula <math display=block>n-k+(k-1)\bigl\lceil\log_2(n+2-k)\bigr\rceil.</math> ~~describing~~This describes the number of comparisons made by a method of Abdollah Hadian and [[Milton Sobel]], related to heapselect, that finds the smallest value using a single-elimination tournament and then repeatedly uses a smaller tournament among the values eliminated by the eventual tournament winners to find the next successive values until reaching the {{nowrap\|<math>k</math>th}} smallest.{{r\|knuth\|hadsob}} Some of the larger entries were proven to be optimal using a computer search.{{r\|knuth\|gkp}} == Language support == Very few languages have built-in support for general selection, although many provide facilities for finding the smallest or largest element of a list. A notable exception is the [[Standard Template Library]] for [[C++]], which provides a templated <code>nth_element</code> method with a guarantee of expected linear time.{{r\|skiena}} [[Python (programming language)\|Python]]'s standard library ~~(since 2.4)~~ includes <code>heapq.nsmallest</code> and <code>heapq.nlargest</code> ~~subroutines~~functions for returning the smallest or largest elements from a collection, in sorted order. AsThe ofimplementation ~~Python~~maintains ~~version~~a ~~3.11~~[[binary heap]], ~~these~~limited ~~subroutines~~to ~~use~~holding ~~heapselect~~<math>k</math> elements, ~~taking~~and ~~time~~initialized to the first <math>~~O(n+~~k~~\log n)~~</math> toelements ~~return~~in athe ~~list~~collection. Then, each subsequent item of the collection may replace the largest or smallest element in the heap if it is smaller or larger than this element. The algorithm's memory usage is superior to heapselect (the former only holds <math>k</math> elements in memory at a time while the latter requires manipulating the entire dataset into memory). ~~When~~ Running time depends on data ordering. The best case is <math>O((n - k) + k\log k)</math> isfor ~~large~~already ~~relative~~sorted ~~{{nowrap\|to~~data. The worst-case is <math>O(n\log k)</math> for reverse sorted data. In average cases,}} ~~they~~there ~~revert~~are likely to ~~sorting~~be ~~the~~few ~~whole~~heap ~~input~~updates and ~~then~~most ~~returning~~input elements are processed with only a ~~slice~~single ofcomparison. For example, extracting the ~~resulting~~100 ~~sorted~~largest ~~array.~~or Asmallest ~~linear-time~~values ~~selection~~out ~~algorithm~~of is10,000,000 ~~not~~random ~~{{nowrap\|included~~inputs makes 10,009,401 comparisons on average.{{r\|python}}}} Since 2017, [[Matlab]] has included <code>maxk()</code> and <code>mink()</code> functions, which return the maximal (minimal) <math>k</math> values in a vector as well as their indices. The Matlab documentation does not specify which algorithm these functions use or what their running {{nowrap\|time is.{{r\|matlab}}}} == History== [[Quickselect]] was presented without analysis by [[Tony Hoare]] {{nowrap\|in 1965,{{r\|hoare}}}} and first analyzed in a 1971 technical report by {{nowrap\|[[Donald Knuth]].{{r\|floriv}}}} The first known linear time deterministic selection algorithm is the [[median of medians]] method, published in 1973 by [[Manuel Blum]], [[Robert W. Floyd]], [[Vaughan Pratt]], [[Ron Rivest]], and [[Robert Tarjan]].{{r\|bfprt}} They trace the formulation of the selection problem to work of Charles L. Dodgson (better known as [[Lewis Carroll]]) who in 1883~~, who~~ pointed out that the usual design of single-elimination sports tournaments does not guarantee that the second-best player wins second place,{{r\|bfprt\|carroll}} and to work of [[Hugo Steinhaus]] circa 1930, who followed up this same line of thought by asking for a tournament design that can make this guarantee, with a minimum number of games played (that is, {{nowrap\|comparisons).{{r\|bfprt}}}} == See also == Line 107 ⟶ 110: \| last2 = Pippenger \| first2 = Nicholas \| author2-link = Nick Pippenger \| doi = 10.1016/0166-218X(90)90128-Y \| issue = ~~1-2~~1–2 \| journal = [[Discrete Applied Mathematics]] \| mr = 1055590 Line 124 ⟶ 127: \| publisher = Association for Computing Machinery \| title = Proceedings of the 17th Annual ACM Symposium on Theory of Computing, May 6–8, 1985, Providence, Rhode Island, USA \| year = 1985~~}}</ref>~~\| doi-access = free \| isbn = 0-89791-151-2 }}</ref> <ref name=bfprt>{{cite journal Line 139 ⟶ 144: \| url = http://people.csail.mit.edu/rivest/pubs/BFPRT73.pdf \| volume = 7 \| year = 1973\| issue = 4 }}</ref> <ref name=bks>{{cite journal Line 152 ⟶ 157: \| title = Cascade heap: towards time-optimal extractions \| volume = 63 \| year = 2019~~}}</ref>~~\| s2cid = 253740380 }}</ref> <ref name=brodal>{{cite conference \| last = Brodal \| first = Gerth Stølting \| author-link = Gerth Stølting Brodal \| editor1-last = Brodnik \| editor1-first = Andrej \| editor2-last = López-Ortiz \| editor2-first = Alejandro Line 167 ⟶ 173: \| title = Space-Efficient Data Structures, Streams, and Algorithms – Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday \| volume = 8066 \| year = 2013\| isbn = 978-3-642-40272-2 }}</ref> <ref name=brown>{{cite journal Line 177 ⟶ 183: \| pages = 301–304 \| title = Remark on Algorithm 489 \| volume = 2~~}}</ref>~~\| s2cid = 13985011 }}</ref> <ref name=carroll>{{cite book\|title=Lawn Tennis Tournaments: The True Method of Assigning Prizes with a Proof of the Fallacy of the Present Method\|first=Charles L.\|last=Dodgson\|author-link=Lewis Carroll\|year=1883\|___location=London\|publisher=Macmillan and Co.}} See also {{cite book\|title=The Mathematical World of Charles L. Dodgson (Lewis Carroll)\|editor1-first=Robin\|editor1-last=Wilson\|editor2-first=Amirouche\|editor2-last=Moktefi\|publisher=Oxford University Press\|year=2019\|isbn=9780192549013\|contribution=Lawn tennis tournaments\|page=129\|contribution-url=https://books.google.com/books?id=OBGIDwAAQBAJ&pg=PA129}}</ref> <ref name=chan>{{cite journal Line 188 ⟶ 197: \| title = Comparison-based time-space lower bounds for selection \| volume = 6 \| year = 2010\| s2cid = 11742607 }}</ref> <ref name=chr>{{cite conference Line 203 ⟶ 212: \| title = Mathematical Foundations of Computer Science 1993, 18th International Symposium, MFCS'93, Gdansk, Poland, August 30 – September 3, 1993, Proceedings \| volume = 711 \| year = 1993~~}}<~~\| hdl = 11858/~~ref>~~00-001M-0000-0014-B748-C \| isbn = 978-3-540-57182-7 \| hdl-access = free }}</ref> <ref name=clrs>{{Introduction to Algorithms\|edition=3\|chapter=Chapter 9: Medians and order statistics\|pages=213–227}}; "Section 14.1: Dynamic order statistics", pp. 339–345</ref> Line 229 ⟶ 241: \| title = Average case selection \| volume = 36 \| year = 1989~~}}</ref>~~\| s2cid = 10947879 \| doi-access = free }}</ref> <ref name=devroye>{{cite journal \| last = Devroye \| first = Luc \| doi = 10.1016/0022-0000(84)90009-6 \| issue = 1 \| journal = Journal of Computer and System Sciences \| mr = 761047 \| pages = 1–7 \| title = Exponential bounds for the running time of a selection algorithm \| url = http://luc.devroye.org/devroye-selection1984.pdf \| volume = 29 \| year = 1984}} {{cite journal \| last = Devroye \| first = Luc \| doi = 10.1007/s00453-001-0046-2 \| issue = 3 \| journal = Algorithmica \| mr = 1855252 \| pages = 291–303 \| title = On the probabilistic worst-case time of 'find' \| url = https://luc.devroye.org/wcfind.pdf \| volume = 31 \| year = 2001\| s2cid = 674040 }}</ref> <ref name=dieram>{{cite journal Line 244 ⟶ 281: <ref name=dz99>{{cite journal \| last1 = Dor \| first1 = Dorit \| author1-link = Dorit Dor \| last2 = Zwick \| first2 = Uri \| author2-link = Uri Zwick \| doi = 10.1137/S0097539795288611 Line 253 ⟶ 290: \| title = Selecting the median \| volume = 28 \| year = 1999~~}}</ref>~~\| s2cid = 2633282 }}</ref> <ref name=dz01>{{cite journal \| last1 = Dor \| first1 = Dorit \| author1-link = Dorit Dor \| last2 = Zwick \| first2 = Uri \| author2-link = Uri Zwick \| doi = 10.1137/S0895480199353895 \| issue = 3 Line 279 ⟶ 317: \| s2cid = 3064709 \| title = Expected time bounds for selection \| volume = 18\| doi-access = free ~~\| volume = 18~~}} See also "Algorithm 489: the algorithm SELECT—for finding the {{nowrap\|<math>i</math>th}} smallest of <math>n</math> elements", p. 173, {{doi\|10.1145/360680.360694}}.</ref> <ref name=frederickson>{{cite journal Line 290 ⟶ 329: \| title = An optimal algorithm for selection in a min-heap \| volume = 104 \| year = 1993~~}}</ref>~~\| doi-access = free }}</ref> <ref name=frejoh>{{cite journal Line 314 ⟶ 354: \| pages = 88–96 \| title = Finding the {{nowrap\|<math>i</math>th}} largest of <math>n</math> for small <math>i,n</math> \| volume = 27\| s2cid = 3133332 }}</ref> <ref name=gootam>{{cite book\|title=Algorithm Design and Applications\|first1=Michael T.\|last1=Goodrich\|author1-link= Michael T. Goodrich\|first2=Roberto\|last2=Tamassia\|author2-link=Roberto Tamassia\|publisher=Wiley\|year=2015\|chapter=9.2: Selection\|pages=270–275\|isbn=978-1-118-33591-8}}</ref> Line 326 ⟶ 366: \| title = On teaching median-finding algorithms \| volume = 35 \| year = 1992~~}}</ref>~~\| bibcode = 1992ITEdu..35..230G }}</ref> <ref name=hadsob>{{cite report\|first1=Abdollah\|last1=Hadian\|first2=Milton\|last2=Sobel\|author2-link=Milton Sobel\|hdl=11299/199105\|series=School of Statistics Technical Reports\|volume=121\|publisher=University of Minnesota\|title=Selecting the {{nowrap\|<math>t</math>-th}} largest using binary errorless comparisons\|date=May 1969}}</ref> <ref name=han>{{cite journal Line 339 ⟶ 380: \| title = Optimal parallel selection \| volume = 3 \| year = 2007~~}}</ref>~~\| s2cid = 9645870 }}</ref> <ref name=hoare>{{cite journal Line 361 ⟶ 403: \| title = Randomized algorithms and pseudorandom numbers \| volume = 40 \| year = 1993~~}}</ref>~~\| s2cid = 17956460 \| doi-access = free }}</ref> <ref name=kkzz>{{cite conference \| last1 = Kaplan \| first1 = Haim \| last2 = Kozma \| first2 = László \| last3 = Zamir \| first3 = Or \| last4 = Zwick \| first4 = Uri \| author4-link = Uri Zwick \| editor1-last = Fineman \| editor1-first = Jeremy T. \| editor2-last = Mitzenmacher \| editor2-first = Michael \| editor2-link = Michael Mitzenmacher \| arxiv = 1802.07041 \| contribution = Selection from heaps, row-sorted matrices, and <math>X+Y</math> using soft heaps \| doi = 10.4230/OASIcs.SOSA.2019.5 \| pages = 5:1–5:21 \| publisher = Schloss Dagstuhl – Leibniz-Zentrum für Informatik \| series = OASIcs \| title = 2nd Symposium on Simplicity in Algorithms, SOSA 2019, January 8–9, 2019, San Diego, CA, USA \| volume = 69 \| year = 2019\| doi-access = free }}</ref> <ref name=kletar>{{cite book Line 415 ⟶ 477: \| publisher = IEEE Computer Society \| title = 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18–21, 2014 \| year = 2014\| isbn = 978-1-4799-6517-5 }}</ref> <ref name=prt>{{cite journal Line 428 ⟶ 490: \| title = An efficient dynamic selection method \| volume = 26 \| year = 1983~~}}</ref>~~\| s2cid = 3211474 \| doi-access = free }}</ref> <ref name=reischuk>{{cite journal Line 452 ⟶ 516: \| series = Texts in Computer Science \| title = The Algorithm Design Manual \| year = 2020\| s2cid = 22382667 }}</ref> <ref name=spp>{{cite journal Line 465 ⟶ 529: \| title = Finding the median \| volume = 13 \| year = 1976\| s2cid = 29867292 }}</ref> <ref name=valiant>{{cite journal Line 480 ⟶ 544: <ref name=matlab>{{cite web\|url=https://www.mathworks.com/help/matlab/ref/mink.html\|title=mink: Find k smallest elements of array\|work=Matlab R2023a documentation\|publisher=Mathworks\|access-date=2023-03-30}}</ref> <ref name=python>{{cite web\|url=https://~~svn~~github.com/python~~.org~~/~~projects~~cpython/~~python~~blob/~~trunk~~main/Lib/heapq.py\|title=heapq package source code\|work=Python library\|access-date=2023-0308-2906}}; see also the linked [https://code.activestate.com/recipes/577573-compare-algorithms-for-heapqsmallest/ comparison of algorithm performance on best-case data].</ref> }}