Partial sorting: Difference between revisions

Content deleted Content added
m Bot: http → https
 
(7 intermediate revisions by 3 users not shown)
Line 5:
==Offline problems==
===Heap-based solution===
[[Heap (data structure)|Heaps]] admit a simple single-pass partial sort when {{mvar|k}} is fixed: insert the first {{mvar|k}} elements of the input into a max-heap. Then make one pass over the remaining elements, add each to the heap in turn, and remove the largest element. Each insertion operation takes {{math|''O''(log ''k'')}} time, resulting in {{math|''O''(''n'' log ''k'')}} time overall; this "partial heapsort" algorithm is practical for small values of {{mvar|k}} and in [[online algorithm|online]] settings.<ref name="aofa04slides"/> AnotherAn option"online isheapselect" toalgorithm builddescribed below, based on a min-heap for all the values (the build takes {{math|''O''(n)}}) and take out the head of the heap K times, each remove operation takes {{math|''O''(log ''n'')}}. In+ that case the algorithm takes {{math|''Ok''(n+klog log ''n'')}}.<ref name="aofa04slides"/>
 
===Solution by partitioning selection===
A further relaxation requiring only a list of the {{mvar|k}} smallest elements, but without requiring that these be ordered, makes the problem equivalent to [[Selection algorithm#Partition-based selection|partition-based selection]]; the original partial sorting problem can be solved by such a selection algorithm to obtain an array where the first {{mvar|k}} elements are the {{mvar|k}} smallest, and sorting these, at a total cost of {{math|''O''(''n'' + ''k'' log ''k'')}} operations. A popular choice to implement this algorithm scheme is to combine [[quickselect]] and [[quicksort]]; the result is sometimes called "quickselsort".<ref name="aofa04slides"/>
 
Common in current (as of 2022) C++ STL implementations is a pass of [[Heap (data structure)#Applications|heapselect]] for a list of ''k'' elements, followed by a [[heapsort]] for the final result.<ref>{{cite web |title=std::partial_sort |url=https://en.cppreference.com/w/cpp/algorithm/partial_sort |website=en.cppreference.com}}</ref>
 
==={{anchor|Partial quicksort}} Specialised sorting algorithms===
More efficient than the aforementioned are specialized partial sorting algorithms based on [[mergesort]] and [[quicksort]]. In the quicksort variant, there is no need to recursively sort partitions which only contain elements that would fall after the {{mvar|k}}'th place in the final sorted array (starting from the "left" boundary). Thus, if the pivot falls in position {{mvar|k}} or later, we recurse only on the left partition:<ref>{{cite conference |last=Martínez |first=Conrado |title=Partial quicksort |conference=Proc. 6th ACM-SIAM Workshop on Algorithm Engineering and Experiments and 1st ACM-SIAM Workshop on Analytic Algorithmics and Combinatorics |year=2004 |url=httphttps://www.lsi.upc.edu/~conrado/research/reports/ALCOMFT-TR-03-50.pdf}}</ref>
 
'''function''' partial_quicksort(A, i, j, k) '''is'''
Line 21 ⟶ 23:
partial_quicksort(A, p+1, j, k)
 
The resulting algorithm is called partial quicksort and requires an ''expected'' time of only {{math|''O''(''n'' + ''k'' log ''k'')}}, and is quite efficient in practice, especially if a [[selection sort]] is used as a base case when {{mvar|k}} becomes small relative to {{mvar|n}}. However, the worst-case time complexity is still very bad, in the case of a bad pivot selection. Pivot selection along the lines of the worst-case linear time selection algorithm (see {{section link|Quicksort|Choice of pivot}}) could be used to get better worst-case performance. Partial quicksort, quickselect (including the multiple variant), and quicksort can all be generalized into what is known as a ''chunksort''.<ref name="aofa04slides"/>
 
==Incremental sorting==
Incremental sorting is an "online"a version of the partial sorting problem, where the input is given up front but {{mvar|k}} is unknown: given a {{mvar|k}}-sorted array, it should be possible to extend the partially sorted part so that the array becomes {{math|(''k''+1)}}-sorted.{{r|paredes}}
 
[[Heap (data structure)|Heaps]] lead to an {{math|''O''(''n'' + ''k'' log ''n'')}} "online heapselect" solution to onlineincremental partial sorting: first "heapify", in linear time, the complete input array to produce a min-heap. Then extract the minimum of the heap {{mvar|k}} times.<ref name="aofa04slides">{{cite conference |author=Conrado Martínez |year=2004 |title=On partial sorting |url=httphttps://www.lsi.upc.edu/~conrado/research/talks/aofa04.pdf |conference=10th Seminar on the Analysis of Algorithms}}</ref>
 
A different incremental sort can be obtained by modifying quickselect. The version due to Paredes and Navarro maintains a [[stack (data structure)|stack]] of pivots across calls, so that incremental sorting can be accomplished by repeatedly requesting the smallest item of an array {{mvar|A}} from the following algorithm:<ref name="paredes">{{Cite conference| doi = 10.1137/1.9781611972863.16| chapter = Optimal Incremental Sorting| title = Proc. Eighth Workshop on Algorithm Engineering and Experiments (ALENEX)| pages = 171–182| year = 2006| last1 = Paredes | first1 = Rodrigo| last2 = Navarro | first2 = Gonzalo| isbn = 978-1-61197-286-3| citeseerx = 10.1.1.218.4119}}</ref>
Line 46 ⟶ 48:
 
== Language/library support ==
* The [[C++]] standard specifies a library function called <code>[httphttps://en.cppreference.com/w/cpp/algorithm/partial_sort std::partial_sort]</code>.
* The [[Python (programming language)|Python]] standard library includes functions <code>[https://docs.python.org/library/heapq.html#heapq.nlargest nlargest]</code> and <code>[https://docs.python.org/library/heapq.html#heapq.nsmallest nsmallest]</code> in its <code>heapq</code> module.
* The [[Julia_(programming_language)|Julia]] standard library includes a <code>[https://docs.julialang.org/en/stablev1/stdlibbase/sort/#Sorting-Algorithms-1Base.Sort.PartialQuickSort PartialQuickSort]</code> implementationalgorithm used in <code>[https://docs.julialang.org/en/v1/base/sort/#Base.Sort.partialsort! partialsort!]</code> and variants.
 
== See also ==