Partial sorting: Difference between revisions

Content deleted Content added
convert solution to and fro
Data structure-based solutions: O((n + k) log k) = O(n log k), and don't address the reader as "you"; wants citation and clarification for the second suggestion
Line 9:
== Data structure-based solutions ==
 
Another simple method is to add each element of the list into a [[priority queue]] data structure, such as a [[heap (data structure)|heap]] or [[self-balancing binary search tree]], with at most ''k'' elements. Whenever the data structure has more than ''k'' elements, we remove the largest element, which can be done in O(log ''k'') time. Each insertion operation also takes O(log ''k'') time, resulting in O(''n'' log ''k'') time overall. If [[heap (data structure)|heap]] is used, you may need to sort it further resulting in O((''n'' + ''k'')log''k'').
 
It is possible to transform the list into a [[binary heap]] in Θ(''n'') time, and then traverse the heap using a modified [[breadth-first search]] algorithm that places the elements in a [[Prioritypriority Queuequeue]]{{clarify|reason=a heap is a priority queue!}} (instead of the ordinary queue that is normally used in a BFS), and terminate the scan after traversing exactly ''k'' elements. As the queue size remains O(''k'') throughout the traversal, it would require O(''k'' log ''k'') time to complete, leading to a time bound of O(''n'' + ''k'' log ''k'') on this algorithm.{{citation needed}}
 
We can achieve an O(log ''n'') time solution using [[skip lists]]. Skip lists are sorted data structures that allow insertion, deletion and indexed retrieval in O(log ''n'') time. Thus, for any given percentile, we can insert a new element into (and possibly delete an old element from) the list in O(log ''n''), calculate the corresponding index(es) and finally access the percentile value in O(log ''n'') time. See, for example, this Python-based implementation for calculating [http://code.activestate.com/recipes/576930-efficient-running-median-using-an-indexable-skipli running median].