Partial sorting: Difference between revisions

Content deleted Content added
reorganize; add incremental sorting, remove long-unsourced thing about tournament selection
Line 3:
In terms of indices, in a partially sorted list, for every index ''i'' from 1 to ''k,'' the ''i''th element is in the same place as it would be in the fully sorted list: element ''i'' of the partially sorted list contains [[order statistic]] ''i'' of the input list.
 
==Offline problem==
== Solution by partitioning selection==
== =Heap-based solutions solution===
A further relaxation requiring only a list of the {{mvar|k}} smallest elements, but without requiring that these be ordered, makes the problem equivalent to [[Selection algorithm#Partition-based selection|partition-based selection]]; the original partial sorting problem can be solved by such a selection algorithm to obtain an array where the first {{mvar|k}} elements are the {{mvar|k}} smallest, and sorting these, at a total expected cost of {{math|''O''(''n'' + ''k'' log ''k'')}} operations. When [[quickselect]] and [[quicksort]] are used as the building blocks in this algorithm, the result is called "quickselsort".<ref name="aofa04slides"/>
A [[streamingHeap algorithm(data structure)|streamingHeaps]], admit a simple single-pass partial sort iswhen also{{mvar|k}} possibleis using heaps or other [[priority queue]] data structures. First,fixed: insert the first {{mvar|k}} elements of the input into thea structuremax-heap. Then make one pass over the remaining elements, add each to the structureheap in turn, and remove the largest element. Each insertion operation also takes {{math|''O''(log ''k'')}} time, resulting in {{math|''O''(''n'' log ''k'')}} time overall; this algorithm is practical for small values of {{mvar|k}} and in [[online algorithm|online]] settings.<ref name="aofa04slides"/>
 
== =Solution by partitioning selection===
== Heap-based solutions ==
A further relaxation requiring only a list of the {{mvar|k}} smallest elements, but without requiring that these be ordered, makes the problem equivalent to [[Selection algorithm#Partition-based selection|partition-based selection]]; the original partial sorting problem can be solved by such a selection algorithm to obtain an array where the first {{mvar|k}} elements are the {{mvar|k}} smallest, and sorting these, at a total expected cost of {{math|''O''(''n'' + ''k'' log ''k'')}} operations. WhenA popular choice to implement this algorithm scheme is to combine [[quickselect]] and [[quicksort]] are used as the building blocks in this algorithm,; the result is sometimes called "quickselsort".<ref name="aofa04slides"/>
 
== =Specialised sorting algorithms ===
[[Binary heap]]s lead to an {{math|''O''(''n'' + ''k'' log ''n'')}} solution to partial sorting: partial [[heapsort]]. First "heapify", in linear time, the complete input array. Then extract the minimum of the heap {{mvar|k}} times.<ref name="aofa04slides">{{cite conference |author=Conrado Martínez |year=2004 |title=On partial sorting |url=http://www.lsi.upc.edu/~conrado/research/talks/aofa04.pdf |conference=10th Seminar on the Analysis of Algorithms}}</ref>
More efficient than anythe of theseaforementioned are specialized partial sorting algorithms based on [[mergesort]] and [[quicksort]]. In the quicksort variant, there is no need to recursively sort partitions which only contain elements that would fall after the {{mvar|k}}'th place in the final sorted array (starting from the "left" boundary). Thus, if the pivot falls in position {{mvar|k}} or later, we recurse only on the left partition:<ref>{{cite conference |last=Martínez |first=Conrado |title=Partial quicksort |conference=Proc. 6th ACM-SIAM Workshop on Algorithm Engineering and Experiments and 1st ACM-SIAM Workshop on Analytic Algorithmics and Combinatorics |year=2004 |url=http://www.lsi.upc.edu/~conrado/research/reports/ALCOMFT-TR-03-50.pdf}}</ref>
 
A [[streaming algorithm|streaming]], single-pass partial sort is also possible using heaps or other [[priority queue]] data structures. First, insert the first {{mvar|k}} elements of the input into the structure. Then make one pass over the remaining elements, add each to the structure in turn, and remove the largest element. Each insertion operation also takes {{math|''O''(log ''k'')}} time, resulting in {{math|''O''(''n'' log ''k'')}} time overall; this algorithm is practical for small values of {{mvar|k}} and in [[online algorithm|online]] settings.<ref name="aofa04slides"/>
 
== Specialised sorting algorithms ==
More efficient than any of these are specialized partial sorting algorithms based on [[mergesort]] and [[quicksort]]. In the quicksort variant, there is no need to recursively sort partitions which only contain elements that would fall after the {{mvar|k}}'th place in the final sorted array (starting from the "left" boundary). Thus, if the pivot falls in position {{mvar|k}} or later, we recurse only on the left partition:<ref>{{cite conference |last=Martínez |first=Conrado |title=Partial quicksort |conference=Proc. 6th ACM-SIAM Workshop on Algorithm Engineering and Experiments and 1st ACM-SIAM Workshop on Analytic Algorithmics and Combinatorics |year=2004 |url=http://www.lsi.upc.edu/~conrado/research/reports/ALCOMFT-TR-03-50.pdf}}</ref>
 
'''function''' partial_quicksort(A, i, j, k)
Line 25 ⟶ 23:
The resulting algorithm is called partial quicksort and requires an ''expected'' time of only {{math|''O''(''n'' + ''k'' log ''k'')}}, and is quite efficient in practice, especially if a [[selection sort]] is used as a base case when {{mvar|k}} becomes small relative to {{mvar|n}}. However, the worst-case time complexity is still very bad, in the case of a bad pivot selection. Pivot selection along the lines of the worst-case linear time selection algorithm could be used to get better worst-case performance.
 
==Incremental sorting==
==Tournament algorithm==
Incremental sorting is an "online" version of the partial sorting problem, where the input is given up front but {{mvar|k}} is unknown: given a {{mvar|k}}-sorted array, it should be possible to extend the partially sorted part so that the array becomes {{math|(''k''+1)}}-sorted.{{r|paredes}}
{{unreferenced section|date=May 2014}}
 
Another method is the tournament algorithm. The idea is to conduct a knockout minimal round tournament to decide the ranks. It first organises the games (comparisons) between adjacent pairs and moves the winners to next round until championship (the first best) is decided. It also constructs the tournament tree along the way. Now the second best element must be among the direct losers to winner and these losers can be found out by walking in the binary tree in O(log ''n'') time. It organises another tournament to decide the second best among these potential elements. The third best must be one among the losers of the second best in either of the two tournament trees. The approach continues until we find ''k'' elements. This algorithm takes O(''n'' + k log ''n'') complexity, which for any fixed ''k'' independent of ''n'' is O(''n'').
[[BinaryHeap heap(data structure)|Heaps]]s lead to an {{math|''O''(''n'' + ''k'' log ''n'')}} solution to online partial sorting: partial [[heapsort]]. Firstfirst "heapify", in linear time, the complete input array to produce a min-heap. Then extract the minimum of the heap {{mvar|k}} times.<ref name="aofa04slides">{{cite conference |author=Conrado Martínez |year=2004 |title=On partial sorting |url=http://www.lsi.upc.edu/~conrado/research/talks/aofa04.pdf |conference=10th Seminar on the Analysis of Algorithms}}</ref>
 
An [[Asymptotic analysis|asymptotically]] faster incremental sort can be obtained by modifying quickselect. The version due to Paredes and Navarro maintains a [[stack (data structure)|stack]] of pivots across calls, so that incremental sorting can be accomplished by repeatedly requesting the smallest item of an array {{mvar|A}} from the following algorithm:<ref name="paredes">{{cite doi|10.1137/1.9781611972863.16}}</ref>
 
<div style="margin-left: 35px; width: 600px">
{{framebox|blue}}
Algorithm {{math|IQS(''A'' : array, ''i'' : integer, ''S'' : stack)}} returns the {{mvar|i}}'th smallest element in {{mvar|A}}
* If {{math|''i'' {{=}} top(''S'')}}:
** Pop {{mvar|S}}
** Return {{math|''A''[''i'']}}
* Let {{math|pivot ← random [''i'', top(''S''))}}
* Update {{math|pivot ← partition(''A''[''i'' : top(''S'')), ''A''[pivot])}}
* Push {{math|pivot}} onto {{mvar|S}}
* Return {{math|IQS(''A'', ''i'', ''S''}}
{{frame-footer}}
</div>
 
The stack {{mvar|S}} is initialized to contain only the length {{mvar|n}} of {{mvar|A}}. {{mvar|k}}-sorting the array is done by calling {{math|IQS(''A'', ''i'', ''S'')}} for {{math|''i'' {{=}} 0, 1, 2, ...}}; this sequence of calls has [[average-case complexity]] {{math|''O''(''n'' + ''k'' log ''k'')}}. The worst-case time is quadratic, but this can be fixed by replacing the random pivot selection by the [[median of medians]] algorithm.{{r|paredes}}
 
== Language/library support ==