It puts a median into <code>A[hi]</code> first; that new value of <code>A[hi]</code> is then used as the pivot, as in the basic algorithm presented above.
 
Specifically, the expected number of comparisons needed to sort {{mvar|n}} elements (see {{Section link||Average-case analysis}}) with random pivot selection is {{math|1.386 ''n'' log ''n''}}. Median-of-three pivoting brings this down to {{math|[[Binomial coefficient|''C'']]<sub>''n'', 2</sub> ≈ 1.188 ''n'' log ''n''}}, at the expense of a three-percent increase in the expected number of swaps.{{r|engineering}} An even stronger pivoting rule, for larger arrays, is to pick the [[ninther]], a recursive median-of-three (Mo3), defined as{{r|engineering}}
 
:{{math|ninther(''a'') {{=}} median(Mo3(first {{sfrac|1|3}} of ''a''), Mo3(middle {{sfrac|1|3}} of ''a''), Mo3(final {{sfrac|1|3}} of ''a''))}}
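
For illustration, the following Python sketch computes a median-of-three index and the ninther of a subarray <code>a[lo..hi]</code>; the nine roughly evenly spaced sample positions are one possible choice, not prescribed by the definition above.

<syntaxhighlight lang="python">
def median_of_three(a, i, j, k):
    """Return the index (i, j, or k) holding the median of a[i], a[j], a[k]."""
    if a[i] < a[j]:
        if a[j] < a[k]:
            return j                      # a[i] < a[j] < a[k]
        return k if a[i] < a[k] else i    # median of a[i] and a[k]
    if a[i] < a[k]:
        return i                          # a[j] <= a[i] < a[k]
    return k if a[j] < a[k] else j        # median of a[j] and a[k]

def ninther(a, lo, hi):
    """Return the index of the ninther of a[lo..hi]: the median of the
    medians-of-three of nine roughly evenly spaced samples."""
    s = (hi - lo) // 8                    # spacing between sample positions
    p = [lo + t * s for t in range(9)]
    return median_of_three(a,
                           median_of_three(a, p[0], p[1], p[2]),
                           median_of_three(a, p[3], p[4], p[5]),
                           median_of_three(a, p[6], p[7], p[8]))
</syntaxhighlight>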
 
== Formal analysis ==
=== Worst-case analysis ===
The most unbalanced partition occurs when one of the sublists returned by the partitioning routine is of size {{math|''n'' − 1}}.<ref name="unbalanced">The other one may either have {{math|1}} element or be empty (have {{math|0}} elements), depending on whether the pivot is included in one of subpartitions, as in the Hoare's partitioning routine, or is excluded from both of them, like in the Lomuto's routine.</ref> This may occur if the pivot happens to be the smallest or largest element in the list, or in some implementations (e.g., the Lomuto partition scheme as described above) when all the elements are equal.
 
If this happens repeatedly in every partition, then each recursive call processes a list of size one less than the previous list. Consequently, it takes {{math|''n'' − 1}} nested calls to reach a list of size 1. This means that the [[Call stack|call tree]] is a linear chain of {{math|''n'' − 1}} nested calls. The {{mvar|i}}th call does {{math|''O''(''n'' − ''i'')}} work to do the partition, and <math>\textstyle\sum_{i=0}^n (n-i) = O(n^2)</math>, so in that case quicksort takes {{math|''O''(''n''<sup>2</sup>)}} time.
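
This quadratic behaviour can be observed directly. The following Python sketch (an illustration, using the Lomuto scheme with the last element as pivot and an explicit stack in place of recursion) counts exactly {{math|''n''(''n'' − 1)/2}} comparisons on an already sorted input:

<syntaxhighlight lang="python">
def lomuto_comparisons(a):
    """Count element comparisons made by quicksort with the Lomuto
    partition scheme, always taking the last element as the pivot."""
    a = list(a)
    count = 0
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            count += 1                    # one comparison per scanned element
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return count

n = 1000
print(lomuto_comparisons(range(n)))       # sorted input: 499500 comparisons
print(n * (n - 1) // 2)                   # = n(n-1)/2 = 499500
</syntaxhighlight>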
 
=== Best-case analysis ===
In the most balanced case, each partition divides the list into two nearly equal pieces. This means each recursive call processes a list of half the size. Consequently, only {{math|log<sub>2</sub> ''n''}} nested calls can be made before reaching a list of size 1. This means that the depth of the [[Call stack|call tree]] is {{math|log<sub>2</sub> ''n''}}. But no two calls at the same level of the call tree process the same part of the original list; thus, each level of calls needs only {{math|''O''(''n'')}} time all together (each call has some constant overhead, but since there are only {{math|''O''(''n'')}} calls at each level, this is subsumed in the {{math|''O''(''n'')}} factor). The result is that the algorithm uses only {{math|''O''(''n'' log ''n'')}} time.
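
The balanced case can be checked numerically. The sketch below (an illustration, modelling the partitioning cost of a size-{{mvar|n}} call as exactly {{mvar|n}}) evaluates the recurrence {{math|1=''T''(''n'') = ''n'' + 2''T''(''n''/2)}} and reproduces {{math|''n'' log<sub>2</sub> ''n''}} exactly for powers of two:

<syntaxhighlight lang="python">
import math

def balanced_work(n):
    """Total work for perfectly balanced quicksort: T(n) = n + 2*T(n/2), T(1) = 0."""
    return 0 if n <= 1 else n + 2 * balanced_work(n // 2)

for n in (2**10, 2**15, 2**20):
    print(n, balanced_work(n), n * math.log2(n))  # the last two columns agree
</syntaxhighlight>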
 
=== Average-case analysis ===
To sort an array of {{mvar|n}} distinct elements, quicksort takes {{math|''O''(''n'' log ''n'')}} time in expectation, averaged over all {{math|''n''!}} permutations of {{mvar|n}} elements with [[Uniform distribution (discrete)|equal probability]]. Alternatively, if the algorithm selects the pivot uniformly at random from the input array, the same analysis can be used to bound the expected running time for any input sequence; the expectation is then taken over the random choices made by the algorithm (Cormen ''et al.'', ''[[Introduction to Algorithms]]'',<ref name=":2"/> Section 7.3).
 
Three common proofs to this claim use percentiles, recurrences, and binary search trees, each providing different insights into quicksort's workings.
 
==== Using percentiles ====
If each pivot has rank somewhere in the middle 50 percent, that is, between the 25th [[percentile]] and the 75th percentile, then it splits the elements with at least 25% and at most 75% on each side. Consistently choosing such pivots would mean splitting the list at most <math>\log_{4/3} n</math> times before reaching lists of size 1, yielding an {{math|''O''(''n'' log ''n'')}} algorithm.
 
When the input is a random permutation, the pivot has a random rank, and so it is not guaranteed to be in the middle 50 percent. However, starting from a random permutation, each recursive call's pivot has a random rank in its list, and therefore is in the middle 50 percent approximately half the time. That is good enough. Imagine that a coin is flipped: heads means that the rank of the pivot is in the middle 50 percent, tails means that it isn't. Now imagine that the coin is flipped over and over until it gets {{mvar|k}} heads. Although this could take a long time, on average only {{math|2''k''}} flips are required, and the chance that the coin does not get {{mvar|k}} heads after {{math|100''k''}} flips is vanishingly small (this can be made rigorous using [[Chernoff bound]]s). By the same argument, quicksort's recursion will terminate on average at a call depth of only <math>2 \log_{4/3} n</math>. But if its average call depth is {{math|''O''(log ''n'')}}, and each level of the call tree processes at most {{mvar|n}} elements, the total amount of work done on average is the product, {{math|''O''(''n'' log ''n'')}}. The algorithm does not have to verify that the pivot is in the middle half; as long as a pivot lands there a constant fraction of the time, that is enough for the desired complexity.
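
The predicted depth bound is easy to test empirically. In the sketch below (an illustration; only the pivot's rank matters for the shape of the recursion, so the list itself is never materialised), the observed maximum recursion depth with uniformly random pivots stays well below <math>2\log_{4/3} n</math>:

<syntaxhighlight lang="python">
import math
import random

def depth(n, rng):
    """Maximum call depth of quicksort on n distinct elements with a
    uniformly random pivot, simulated via subproblem sizes only."""
    if n <= 1:
        return 0
    r = rng.randrange(n)                      # rank of the chosen pivot
    return 1 + max(depth(r, rng), depth(n - 1 - r, rng))

rng = random.Random(1)
n = 10_000
observed = max(depth(n, rng) for _ in range(100))
print(observed, 2 * math.log(n, 4 / 3))      # e.g. roughly 40 versus 64
</syntaxhighlight>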
 
Using more careful arguments, it is possible to extend this proof, for the version of quicksort where the pivot is chosen at random, to show a time bound that holds ''with high probability'': specifically, for any given <math>a\ge 4</math>, let <math>c=(a-4)/2</math>; then with probability at least <math>1-\frac{1}{n^c}</math>, the number of comparisons will not exceed <math>2an\log_{4/3}n</math>. For instance, taking {{math|1=''a'' = 6}} gives {{math|1=''c'' = 1}}, so the number of comparisons exceeds <math>12n\log_{4/3}n</math> with probability at most {{math|1/''n''}}.<ref>{{cite book |last1=Motwani |first1=Rajeev |last2=Raghavan |first2=Prabhakar |year=1995 |title=Randomized Algorithms |publisher=Cambridge University Press |isbn=9780521474658}}</ref>
 
==== Using recurrences ====
Since <math>(x_1,x_2,\ldots,x_n)</math> is a random permutation, <math>(x_1,x_2,\ldots,x_j,x_i)</math> is also a random permutation, so the probability that <math>x_i</math> is adjacent to <math>x_j</math> is exactly <math>\frac{2}{j+1}</math>.
 
This simplifies to a short calculation:
 
: <math>\operatorname{E}[C] = \sum_i \sum_{j<i} \frac{2}{j+1} = O\left(\sum_i \log i\right)=O(n \log n).</math>
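
The double sum can be evaluated and compared against simulation. The following sketch (an illustration; the simulation again tracks only subproblem sizes) shows the sum matching the average number of comparisons made by quicksort with uniformly random pivots:

<syntaxhighlight lang="python">
import random

def expected_sum(n):
    """E[C] = sum over i of sum over j < i of 2/(j+1), with 1-based i and j."""
    return sum(2.0 / (j + 1) for i in range(1, n + 1) for j in range(1, i))

def comparisons(n, rng):
    """Comparisons of quicksort with a uniformly random pivot: each call on a
    size-n subproblem makes n - 1 comparisons, then recurses on both sides."""
    if n <= 1:
        return 0
    r = rng.randrange(n)
    return (n - 1) + comparisons(r, rng) + comparisons(n - 1 - r, rng)

n, rng = 2000, random.Random(7)
average = sum(comparisons(n, rng) for _ in range(50)) / 50
print(expected_sum(n), average)  # both approximately 2(n+1)H_n - 4n, about 24,700
</syntaxhighlight>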

From a bit complexity viewpoint, variables such as ''lo'' and ''hi'' do not use constant space; it takes {{math|''O''(log ''n'')}} bits to index into a list of {{mvar|n}} items. Because there are such variables in every stack frame, quicksort using Sedgewick's trick requires {{math|''O''((log ''n'')<sup>2</sup>)}} bits of space. This space requirement is modest, however, since a list of {{mvar|n}} distinct elements already needs {{math|Ω(''n'' log ''n'')}} bits of space just to be stored.
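
A sketch of Sedgewick's trick follows (illustrative; any partition scheme works, here Lomuto with the last element as pivot): recursing only into the smaller partition and looping on the larger one caps the stack depth at {{math|log<sub>2</sub> ''n''}} frames, because each recursive call is made on at most half of the current range.

<syntaxhighlight lang="python">
def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around pivot a[hi]; returns the pivot's index."""
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def quicksort(a, lo=0, hi=None):
    """In-place quicksort that recurses only on the smaller partition."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = partition(a, lo, hi)
        if p - lo < hi - p:              # left side is smaller:
            quicksort(a, lo, p - 1)      # recurse on it ...
            lo = p + 1                   # ... and loop on the right side
        else:
            quicksort(a, p + 1, hi)
            hi = p - 1
</syntaxhighlight>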
 
Stack-free versions of quicksort have been proposed. These use <math>O(1)</math> additional space (more precisely, one cell of the type of the sorted records, in order to exchange records, and a constant number of integer variables used as indices).<ref>{{cite conference |last=Ďurian |first=Branislav |title=Quicksort without a stack |book-title=Mathematical Foundations of Computer Science 1986: Proceedings of the 12th Symposium |conference=MFCS 1986 |___location=Bratislava, Czechoslovakia |publisher=Springer Berlin Heidelberg}}</ref>
 
Another, less common, not-in-place version of quicksort{{citation needed|date=July 2025}} uses {{math|''O''(''n'')}} space for working storage and can implement a stable sort. The working storage allows the input array to be easily partitioned in a stable manner and then copied back to the input array for successive recursive calls. Sedgewick's optimization is still appropriate.
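
A minimal sketch of this variant (illustrative, not the cited implementation): each call partitions its range stably into a shared {{math|''O''(''n'')}} buffer in three passes, three-way so that equal keys stay together, then copies the buffer back before recursing.

<syntaxhighlight lang="python">
def stable_quicksort(a, lo=0, hi=None, buf=None):
    """Stable, not-in-place quicksort using O(n) working storage."""
    if hi is None:
        hi, buf = len(a) - 1, [None] * len(a)
    if lo >= hi:
        return
    pivot = a[(lo + hi) // 2]
    k = lo
    for i in range(lo, hi + 1):          # pass 1: keys smaller than the pivot
        if a[i] < pivot:
            buf[k] = a[i]
            k += 1
    mid_lo = k                           # start of the equal block
    for i in range(lo, hi + 1):          # pass 2: keys equal to the pivot
        if a[i] == pivot:
            buf[k] = a[i]
            k += 1
    mid_hi = k                           # start of the greater block
    for i in range(lo, hi + 1):          # pass 3: keys larger than the pivot
        if a[i] > pivot:
            buf[k] = a[i]
            k += 1
    a[lo:hi + 1] = buf[lo:hi + 1]        # copy back, then recurse on both sides
    stable_quicksort(a, lo, mid_lo - 1, buf)
    stable_quicksort(a, mid_hi, hi, buf)
</syntaxhighlight>

Each pass preserves the input order within its class, which is what makes the sort stable; keeping the equal keys in a separate middle block also means an all-equal input finishes after a single partition rather than degenerating to the quadratic worst case.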
 
== Relation to other algorithms ==