Quicksort: Difference between revisions

Line 37:
# [[Recursion (computer science)|Recursively]] apply quicksort to the sub-range up to the point of division and to the sub-range after it, possibly excluding from both ranges the element equal to the pivot at the point of division. (If the partition produces a possibly larger sub-range near the boundary where all elements are known to be equal to the pivot, these can be excluded as well.)
 
The choice of partition routine (including the pivot selection) and other details not entirely specified above can affect the algorithm's performance, possibly to a great extent for specific input arrays. In discussing the efficiency of quicksort, it is therefore necessary to specify these choices first. Here we mention two specific partition methods that are commonly used.
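
As a concrete illustration of the steps above, here is a minimal Python sketch (the function names and the last-element pivot choice are illustrative assumptions, not canonical), using a Lomuto-style partition of the kind described next:

<syntaxhighlight lang="python">
# Illustrative sketch only: Lomuto-style partition with last-element pivot.
def quicksort(a, lo, hi):
    if lo >= hi:                        # ranges of 0 or 1 elements are sorted
        return
    p = partition(a, lo, hi)            # divide: pivot ends up at index p
    quicksort(a, lo, p - 1)             # recurse on the sub-range before the pivot
    quicksort(a, p + 1, hi)             # recurse on the sub-range after it

def partition(a, lo, hi):
    pivot = a[hi]                       # one simple pivot choice among many
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:                # grow the "less than pivot" prefix
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]           # put the pivot at its final position
    return i
</syntaxhighlight>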
 
=== Lomuto partition scheme ===
Line 206:
 
==== Parallelization ====
Quicksort's divide-and-conquer formulation makes it amenable to [[parallel algorithm|parallelization]] using [[task parallelism]]. The partitioning step is accomplished through the use of a [[prefix sum|parallel prefix sum]] algorithm to compute an index for each array element in its section of the partitioned array.<ref>Umut A. Acar, Guy E. Blelloch, Margaret Reid-Miller, and Kanat Tangwongsan, [https://www.cs.cmu.edu/afs/cs/academic/class/15210-s13/www/lectures/lecture19.pdf Quicksort and Sorting Lower Bounds], ''Parallel and Sequential Data Structures and Algorithms''. 2013.</ref><ref>{{Cite journal | title=Quicksort Partition via Prefix Scan | year=2012 | journal=Dr. Dobb's | last=Breshears | first=Clay | url=http://www.drdobbs.com/parallel/quicksort-partition-via-prefix-scan/240003109}}</ref> Given an array of size {{mvar|n}}, the partitioning step performs {{math|''O''(''n'')}} work in {{math|''O''(log ''n'')}} time and requires {{math|''O''(''n'')}} additional scratch space. After the array has been partitioned, the two partitions can be sorted recursively in parallel. Assuming an ideal choice of pivots, parallel quicksort sorts an array of size {{mvar|n}} in {{math|''O''(''n'' log ''n'')}} work in {{math|''O''(log<sup>2</sup> ''n'')}} time using {{math|''O''(''n'')}} additional space.
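
The following Python sketch simulates this prefix-sum partition sequentially (the names are illustrative); in a parallel setting the flag construction, the two prefix scans, and the final scatter each admit {{math|''O''(log ''n'')}}-depth parallel implementations:

<syntaxhighlight lang="python">
from itertools import accumulate

# Sequential simulation of prefix-sum partitioning; illustrative only.
def prefix_sum_partition(a, pivot):
    less = [1 if x < pivot else 0 for x in a]   # flag: belongs to left block
    geq = [1 - f for f in less]                 # flag: belongs to right block
    pos_less = list(accumulate(less))           # prefix sums give each element
    pos_geq = list(accumulate(geq))             # its offset within its block
    n_left = pos_less[-1] if a else 0
    out = [None] * len(a)                       # O(n) scratch space
    for i, x in enumerate(a):                   # scatter: every destination is
        if less[i]:                             # precomputed, so this loop has
            out[pos_less[i] - 1] = x            # no loop-carried dependence and
        else:                                   # parallelizes directly
            out[n_left + pos_geq[i] - 1] = x
    return out, n_left
</syntaxhighlight>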
 
Quicksort has some disadvantages when compared to alternative sorting algorithms, like [[merge sort]], which complicate its efficient parallelization. The depth of quicksort's divide-and-conquer tree directly impacts the algorithm's scalability, and this depth is highly dependent on the algorithm's choice of pivot. Additionally, it is difficult to parallelize the partitioning step efficiently in-place. The use of scratch space simplifies the partitioning step, but increases the algorithm's memory footprint and constant overheads.
Line 224:
To sort an array of {{mvar|n}} distinct elements, quicksort takes {{math|''O''(''n'' log ''n'')}} time in expectation, averaged over all {{math|''n''!}} permutations of {{mvar|n}} elements with [[Uniform distribution (discrete)|equal probability]]. Alternatively, if the algorithm selects the pivot uniformly at random from the input array, the same analysis can be used to bound the expected running time for any input sequence; the expectation is then taken over the random choices made by the algorithm (Cormen ''et al.'', ''[[Introduction to Algorithms]]'',<ref name=":2"/> Section 7.3).
 
Three common proofs of this claim use percentiles, recurrences, and binary search trees, each providing different insight into quicksort's workings.
 
==== Using percentiles ====
Line 230:
 
When the input is a random permutation, the pivot has a random rank, and so it is not guaranteed to be in the middle 50 percent. However, when starting from a random permutation, each recursive call's pivot has a random rank in its list, and therefore is in the middle 50 percent approximately half the time. Imagine that a coin is flipped: heads means that the rank of the pivot is in the middle 50 percent, tails means that it is not. Now imagine that the coin is flipped over and over until it gets {{mvar|k}} heads. Although this could take a long time, on average only {{math|2''k''}} flips are required, and it is highly improbable that the coin does not get {{mvar|k}} heads after {{math|100''k''}} flips (this can be made rigorous using [[Chernoff bound]]s). By the same argument, quicksort's recursion terminates on average at a call depth of only <math>2 \log_{4/3} n</math>. But if its average call depth is {{math|''O''(log ''n'')}}, and each level of the call tree processes at most {{mvar|n}} elements, the total amount of work done on average is the product, {{math|''O''(''n'' log ''n'')}}. The algorithm does not need to verify that the pivot is in the middle half; it is enough that this happens a constant fraction of the time.
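
To see where the constant comes from, note that a "heads" pivot leaves each sub-range with at most {{math|3''n''/4}} of the elements, so after {{mvar|k}} heads a range of {{mvar|n}} elements has shrunk to size at most <math>n(3/4)^k</math>; the recursion along that path ends once

<math display=block>n\left(\tfrac{3}{4}\right)^{k} \le 1, \quad\text{i.e.}\quad k \ge \log_{4/3} n,</math>

and at an expected two flips per head this gives the expected call depth <math>2\log_{4/3} n</math>.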
 
Using more careful arguments, it is possible to extend this proof, for the version of quicksort where the pivot is chosen uniformly at random, to show a time bound that holds ''with high probability'': specifically, for any given <math>a\ge 4</math>, let <math>c=(a-4)/2</math>; then with probability at least <math>1-\frac{1}{n^c}</math>, the number of comparisons does not exceed <math>2an\log_{4/3}n</math>.<ref>{{cite book |last1=Motwani |first1=Rajeev |last2=Raghavan |first2=Prabhakar |year=1995 |title=Randomized Algorithms |publisher=Cambridge University Press |isbn=9780521474658}}</ref>
 
 
==== Using recurrences ====
Line 294 ⟶ 298:
From a bit-complexity viewpoint, variables such as ''lo'' and ''hi'' do not use constant space; it takes {{math|''O''(log ''n'')}} bits to index into a list of {{mvar|n}} items. Because such variables appear in every stack frame, quicksort using Sedgewick's trick requires {{math|''O''((log ''n'')<sup>2</sup>)}} bits of space. This space requirement is modest, however, since a list of {{mvar|n}} distinct elements already needs at least {{math|''O''(''n'' log ''n'')}} bits of space to store.
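
For reference, Sedgewick's trick is to recurse only into the smaller sub-range and iterate over the larger one, which caps the stack depth at {{math|''O''(log ''n'')}} frames; a Python sketch, assuming a partition routine as above:

<syntaxhighlight lang="python">
# Sketch of Sedgewick's trick: recurse on the smaller side, loop on the larger.
def quicksort(a, lo, hi):
    while lo < hi:
        p = partition(a, lo, hi)
        if p - lo < hi - p:          # left side is smaller:
            quicksort(a, lo, p - 1)  # recurse into it, then keep
            lo = p + 1               # iterating on the right side
        else:                        # right side is smaller (or equal):
            quicksort(a, p + 1, hi)  # recurse into it, then keep
            hi = p - 1               # iterating on the left side
</syntaxhighlight>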
 
Stack-free versions of Quicksort have been proposed. These use <math>O(1)</math> additional space (more precisely, one cell of the type of the sorted records, in order to exchange records, and a constant number of integer variables used as indices).<ref>{{cite conference |last=Ďurian |first=Branislav |year=1986 |title=Quicksort without a stack |book-title=Mathematical Foundations of Computer Science 1986: Proceedings of the 12th Symposium |conference=MFCS 1986 |___location=Bratislava, Czechoslovakia |publisher=Springer Berlin Heidelberg}}</ref>
 
Another, less common, not-in-place version of quicksort{{citation needed|date=July 2025}} uses {{math|''O''(''n'')}} space for working storage and can implement a stable sort. The working storage allows the input array to be easily partitioned in a stable manner and then copied back into the input array for successive recursive calls. Sedgewick's optimization is still appropriate.
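
A Python sketch of one such stable, out-of-place partition (the first-element pivot and the caller-supplied buffer are illustrative assumptions): elements smaller than the pivot are copied to the scratch buffer in order, then the pivot, then the remaining elements, and the buffer is copied back:

<syntaxhighlight lang="python">
# Illustrative stable partition using O(n) working storage.
def stable_partition(a, lo, hi, buf):
    pivot = a[lo]                        # illustrative pivot choice
    k = lo
    for i in range(lo + 1, hi + 1):      # first pass: smaller elements,
        if a[i] < pivot:                 # kept in their original order
            buf[k] = a[i]
            k += 1
    p = k                                # final position of the pivot
    buf[k] = pivot
    k += 1
    for i in range(lo + 1, hi + 1):      # second pass: remaining elements,
        if not a[i] < pivot:             # again in original order
            buf[k] = a[i]
            k += 1
    a[lo:hi + 1] = buf[lo:hi + 1]        # copy back before the recursive calls
    return p
</syntaxhighlight>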
 
== Relation to other algorithms ==
Line 376 ⟶ 383:
 
=== Generalization ===
[[Richard J. Cole|Richard Cole]] and David C. Kandathil, in 2004, discovered a one-parameter family of sorting algorithms, called partition sorts, which on average (with all input orderings equally likely) perform at most <math>n\log n + {O}(n)</math> comparisons (close to the information-theoretic lower bound) and <math>{\Theta}(n\log n)</math> operations; at worst they perform <math>{\Theta}(n\log^2 n)</math> comparisons (and also operations); these are in-place, requiring only an additional <math>{O}(\log n)</math> space. Practical efficiency and smaller variance in performance were demonstrated against optimized quicksorts (of [[Robert Sedgewick (computer scientist)|Sedgewick]] and [[Jon Bentley (computer scientist)|Bentley]]-[[Douglas McIlroy|McIlroy]]).<ref>Richard Cole, David C. Kandathil: [http://www.cs.nyu.edu/cole/papers/part-sort.pdf "The average case analysis of Partition sorts"], European Symposium on Algorithms, 14–17 September 2004, Bergen, Norway. Published: ''Lecture Notes in Computer Science'' 3221, Springer Verlag, pp. 240–251.</ref>
 
== See also ==