Content deleted Content added
→Bibliography: I don't think any material here still resembles the DADS link |
Start rewrite, cutting a lot of unsourced cruft. New material is also unsourced for now but should be sourceable. |
||
Line 2:
{{for|simulated natural selection in genetic algorithms|Selection (genetic algorithm)}}
{{more footnotes|date=July 2017}}
In [[computer science]], a '''selection algorithm''' is an [[algorithm]] for finding the
==Algorithms==
===Sorting and heapselect===
As a baseline algorithm, selection of the <math>k</math>th smallest value in a collection of values can be performed very simply by the following two steps:
* [[Sorting|Sort]] the collection
* If the output of the sorting algorithm is an array, jump to its <math>k</math>th element; otherwise, scan the sorted sequence to find the <math>k</math>th element.
The time for this method is dominated by the sorting step, which requires <math>\Theta(n\log n)</math> time using a [[comparison sort]]. Even when [[integer sorting]] algorithms may be used, these are generally slower than the linear time that may be achieved using specialized selection algorithms. Nevertheless, the simplicity of this approach makes it attractive, especially when a highly-optimized sorting routine is provided as part of a runtime library, but a selection algorithm is not.
For a sorting algorithm that generates one item at a time, such as [[selection sort]], the scan can be done in tandem with the sort, and the sort can be terminated once the <math>k</math> element has been found. Applying this optimization to [[heapsort]] produces the [[heapselect]] algorithm, which can select the <math>k</math>th smallest value in time <math>O(n+k\log n)</math>. This is fast when <math>k</math> is small relative to <math>n</math>, but degenerates to <math>O(n\log n)</math> for larger values of <math>k</math>, such as the choice <math>k=n/2</math> used for median finding.
===Decision trees===
===Pivoting===
Many methods for selection are based on choosing a special "pivot" element from the input, and using comparisons with this element to divide the remaining <math>n-1</math> input values into two subsets: the set <math>L</math> of elements less than the pivot, and the set <math>R</math> of elements greater than the pivot. The algorithm can then determine where the <math>k</math>th smallest value is to be found, based on a comparison of <math>k</math> with the sizes of these sets. In particular, if <math>k\le|L|</math>, the <math>k</math>th smallest value is in <math>L</math>, and can be found recursively by applying the same selection algorithm to <math>L</math>. If <math>k=|L|+1</math>, then the <math>k</math>th smallest value is the pivot, and it can be returned immediately. In the remaining case, the <math>k</math>th smallest value is in <math>R</math>, and more specifically it is the element in position <math>k-|L|-1</math> of <math>R</math>. It can be found by applying a selection algorithm recursively, seeking the value in this position in <math>R</math>.
As with the related pivoting-based [[quicksort]] algorithm, the partition of the input into <math>L</math> and <math>R</math> may be done by making new collections for these sets, or by a method that partitions a given list or array data type in-place. Details vary depending on how the input collection is represented.
The time to compare the pivot against all the other values is <math>O(n)</math>. However, pivoting methods differ in how they choose the pivot, which affects how big the subproblems in each recursive call will be. The efficiency of these methods depends greatly on the choice of the pivot.
*If the pivot were exactly at the median of the input, then each recursive call would have at most half as many values as the previous call, and the total times would add in a [[geometric series]] to <math>O(n)</math>. However, finding the median is itself a selection problem, on the entire original input. Trying to find it by a recursive call to a selection algorithm would lead to an infinite recursion, because the problem size would not decrease in each call.
*[[Quickselect]] chooses the pivot uniformly at random from the input values. It can be described as a variant of [[quicksort]], with the same pivoting strategy, but where quicksort makes two recursive calls to sort the two subcollections <math>L</math> and <math>R</math>, quickselect only makes one of these two calls. Its [[expected time]] is <math>O(n)</math>.
*The [[median of medians]] method partitions the input into sets of five elements, and then uses a decision tree to find the median of each of these sets. It then recursively calls the same selection algorithm to find the median of these <math>n/5</math> medians, using the result as its pivot. It can be shown that, for this choice of pivot, <math>\max(|L|,|R|)\le 7n/10</math>. Thus, a problem on <math>n</math> elements is reduced to two recursive problems on <math>n/5</math> and at most <math>7n/10</math> elements. The total size of these two recursive subproblems is at most <math>9n/10</math>, allowing the total time to be analyzed as a geometric series adding to <math>O(n)</math>. Unlike quickselect, this algorithm is deterministic, not randomized. It is commonly taught in undergraduate algorithms classes as an example of a [[divide and conquer]] algorithm that does not divide into two equal subproblems. However, the high constant factors in its <math>O(n)</math> time bound make it unsuitable for practical use.
*Hybrid algorithms such as [[introselect]] can be used to achieve the practical performance of quickselect with a fallback to medians of medians guaranteeing worst-case <math>O(n)</math> time.
===Sublinear data structures===
Given an unorganized list of data, linear time (Ω(''n'')) is required to find the minimum element, because we have to examine every element (otherwise, we might miss it). If we organize the list, for example by keeping it sorted at all times, then selecting the ''k''th largest element is trivial, but then insertion requires linear time, as do other operations such as combining two lists.
Line 80 ⟶ 46:
This bound is achievable for ''t''=2 but better, more complex bounds are known for larger ''t''.{{citation needed|date=April 2018}}
== Language support ==
|