Reverted 1 edit by 2409:40E3:5017:664:4427:2C0B:1CC6:4BD6 (talk): Test edit
m Dating maintenance tags: {{Citation needed}}
(19 intermediate revisions by 14 users not shown)
Line 16:
'''Quicksort''' is an efficient, general-purpose [[sorting algorithm]]. Quicksort was developed by British computer scientist [[Tony Hoare]] in 1959<ref>{{cite web |title=Sir Antony Hoare |publisher=Computer History Museum |access-date=22 April 2015 |url=http://www.computerhistory.org/fellowawards/hall/bios/Antony,Hoare/ |url-status=dead |archive-url=https://web.archive.org/web/20150403184558/http://www.computerhistory.org/fellowawards/hall/bios/Antony%2CHoare/ |archive-date=3 April 2015}}</ref> and published in 1961.<ref name=alg64>{{Cite journal |last1 = Hoare |first1 = C. A. R. |author-link1 = Tony Hoare |title = Algorithm 64: Quicksort |doi = 10.1145/366622.366644 |journal = [[Communications of the ACM|Comm. ACM]] |volume = 4 |issue = 7 |pages = 321 |year = 1961 }}</ref> It is still a commonly used algorithm for sorting. Overall, it is slightly faster than [[merge sort]] and [[heapsort]] for randomized data, particularly on larger distributions.<ref name="skiena">{{cite book |first=Steven S. |last=Skiena |year=2008 |author-link=Steven Skiena |title=The Algorithm Design Manual |url=https://books.google.com/books?id=7XUSn0IKQEgC |publisher=Springer |isbn=978-1-84800-069-8 |page=129}}</ref>
Quicksort is a [[divide-and-conquer algorithm]]. It works by selecting a
Quicksort is a [[comparison sort]], meaning that it can sort items of any type for which a "less-than" relation (formally, a [[total order]]) is defined. It is a comparison-based sort since elements ''a'' and ''b'' are only swapped in case their relative order has been obtained in the transitive closure of prior comparison-outcomes. Most implementations of quicksort are not [[Sorting algorithm#Stability|stable]], meaning that the relative order of equal sort items is not preserved.
Line 23:
== History ==
The quicksort algorithm was developed in 1959 by [[Tony Hoare]] while he was a visiting student at [[Moscow State University]]. At that time, Hoare was working on a [[machine translation]] project for the [[National Physical Laboratory, UK|National Physical Laboratory]]. As a part of the translation process, he needed to sort the words in Russian sentences before looking them up in a Russian-English dictionary, which was in alphabetical order on [[magnetic tape data storage|magnetic tape]].<ref>{{Cite journal |last = Shustek |first = L. |title = Interview: An interview with C.A.R. Hoare |doi = 10.1145/1467247.1467261 |journal = [[Communications of the ACM|Comm. ACM]] |volume = 52 |issue = 3 |pages = 38–41 |year = 2009 |s2cid = 1868477 }}</ref> After recognizing that his first idea, [[insertion sort]], would be slow, he came up with a new idea. He wrote the partition part in Mercury [[Autocode]] but had trouble dealing with the list of unsorted segments. On return to England, he was asked to write code for [[Shellsort]]. Hoare mentioned to his boss that he knew of a faster algorithm and his boss bet a [[Sixpence (British coin)|sixpence]] that he did not. His boss ultimately accepted that he had lost the bet. Hoare published a paper about his algorithm in [[The Computer Journal]] [https://academic.oup.com/comjnl/article/5/1/10/395338?login=false Volume 5, Issue 1, 1962, Pages 10–16]. Later, Hoare learned about [[ALGOL]] and its ability to do recursion,
Quicksort gained widespread adoption, appearing, for example, in [[Unix]] as the default library sort subroutine, where it lent its name to the [[C standard library]] subroutine {{mono|[[qsort]]}},<ref name="engineering" /> and in the reference implementation of [[Java (programming language)|Java]].
Line 125:
It is worth expanding on the two segments that the main algorithm recurses on. Because the '''{{Mono|"do...while"}}''' loops use strict comparators (>, <) to keep the indices from running out of range, the pivot itself may be swapped with other elements inside the partition function. Therefore, '''the index returned by the partition function is not necessarily where the actual pivot is.''' Consider the example '''{{Mono|[5, 2, 3, 1, 0]}}''': following the scheme, after the first partition the array becomes '''{{Mono|[0, 2, 1, 3, 5]}}''' and the index returned is 2, which holds the number 1, whereas the real pivot, the one chosen to start the partition with, was the number 3. This example shows why the returned index must be included in the subsequent recursions. As a result, there are two choices: recursing on {{mono|(lo..p)}} and {{mono|(p+1..hi)}}, or on {{mono|(lo..p-1)}} and {{mono|(p..hi)}}. Which of the two to use depends on which index ('''i''' or '''j''') the partition function returns when the indices cross, and on how the pivot is chosen in the partition function ('''floor''' vs. '''ceiling''').
The choice of recursing on {{mono|(lo..p-1)}} and {{mono|(p..hi)}} follows the exact same logic as above. '''Because the right half of the recursion includes the returned index, it is the partition function's job to exclude the "head" in non-advancing scenarios.''' The index i (the "latter" index after the indices cross) in the partition function needs to be returned, and "ceiling" needs to be chosen as the pivot. The two nuances are clear, again, when considering the examples of sorting an array where multiple identical elements exist ('''{{Mono|[0, 0]}}'''), and an already sorted array '''{{Mono|[0, 1]}}''' respectively. It is noteworthy that with this version of recursion, for the same reason, choice of the first element as pivot must be avoided.
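The two pairings described above can be checked with a small sketch. The function names below are hypothetical, and the code assumes the Hoare-style scheme with {{mono|do...while}} loops that this passage refers to; it is an illustration, not the article's reference pseudocode.

```python
def partition_floor(a, lo, hi):
    """Hoare-style partition: floor pivot, returns j (pairs with lo..p, p+1..hi)."""
    pivot = a[(lo + hi) // 2]          # "floor" choice of the middle element
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:            # strict comparison, as in the text
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j                   # the "former" index after crossing
        a[i], a[j] = a[j], a[i]


def partition_ceil(a, lo, hi):
    """Mirror variant: ceiling pivot, returns i (pairs with lo..p-1, p..hi)."""
    pivot = a[(lo + hi + 1) // 2]      # "ceiling" choice of the middle element
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return i                   # the "latter" index after crossing
        a[i], a[j] = a[j], a[i]


def quicksort_floor(a, lo, hi):
    """Sort a[lo..hi] in place, recursing on (lo..p) and (p+1..hi)."""
    if lo < hi:
        p = partition_floor(a, lo, hi)
        quicksort_floor(a, lo, p)
        quicksort_floor(a, p + 1, hi)


def quicksort_ceil(a, lo, hi):
    """Sort a[lo..hi] in place, recursing on (lo..p-1) and (p..hi)."""
    if lo < hi:
        p = partition_ceil(a, lo, hi)
        quicksort_ceil(a, lo, p - 1)
        quicksort_ceil(a, p, hi)


example = [5, 2, 3, 1, 0]
returned = partition_floor(example, 0, len(example) - 1)
print(example, returned)  # [0, 2, 1, 3, 5] 2
```

The printed result reproduces the worked example: the returned index 2 holds the value 1, not the pivot value 3. Mixing the pairings (for instance, returning {{mono|j}} but recursing on {{mono|(lo..p-1)}} and {{mono|(p..hi)}}) can leave elements unsorted or fail to shrink the range.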
Line 145:
It first puts the median into <code>A[hi]</code>; the new value of <code>A[hi]</code> is then used as the pivot, as in the basic algorithm presented above.
Specifically, the expected number of comparisons needed to sort {{mvar|n}} elements (see {{Section link||
:{{math|ninther(''a'') {{=}} median(Mo3(first {{sfrac|1|3}} of ''a''), Mo3(middle {{sfrac|1|3}} of ''a''), Mo3(final {{sfrac|1|3}} of ''a''))}}
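The ninther formula can be sketched as follows. The helper names are hypothetical, and the sketch assumes that Mo3 of a segment means the median of its first, middle, and last elements, and that the array has at least three elements; the formula above does not fix either detail.

```python
def median3(x, y, z):
    """Return the median of three values."""
    return sorted((x, y, z))[1]


def mo3(segment):
    """Median of the first, middle and last element of a non-empty segment (assumed meaning of Mo3)."""
    return median3(segment[0], segment[len(segment) // 2], segment[-1])


def ninther(a):
    """Median of the Mo3 values of the first, middle and final thirds of a."""
    third = len(a) // 3                # requires len(a) >= 3
    return median3(mo3(a[:third]),
                   mo3(a[third:2 * third]),
                   mo3(a[2 * third:]))


print(ninther([9, 1, 8, 2, 7, 3, 6, 4, 5]))  # 5
```

In this run the three thirds yield 8, 3, and 5, whose median 5 happens to be the true median of the input; in general the ninther is only an estimate of the median.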
Line 216:
The most unbalanced partition occurs when one of the sublists returned by the partitioning routine is of size {{math|''n'' − 1}}.<ref name="unbalanced">The other one may either have {{math|1}} element or be empty (have {{math|0}} elements), depending on whether the pivot is included in one of the subpartitions, as in Hoare's partitioning routine, or is excluded from both of them, as in Lomuto's routine.</ref> This may occur if the pivot happens to be the smallest or largest element in the list, or in some implementations (e.g., the Lomuto partition scheme as described above) when all the elements are equal.
If this happens repeatedly in every partition, then each recursive call processes a list of size one less than the previous list. Consequently,
=== Best-case analysis ===
In the most balanced case, each
=== Average-case analysis ===
To sort an array of {{mvar|n}} distinct elements, quicksort takes {{math|''O''(''n'' log ''n'')}} time in expectation, averaged over all {{math|''n''!}} permutations of {{mvar|n}} elements with [[Uniform distribution (discrete)|equal probability]]. Alternatively, if the algorithm selects the pivot uniformly at random from the input array, the same analysis can be used to bound the expected running time for any input sequence; the expectation is then taken over the random choices made by the algorithm (Cormen ''et al.'', ''[[Introduction to Algorithms]]'',<ref name=":2"/> Section 7.3).
==== Using percentiles ====
If each pivot has rank somewhere in the middle 50 percent, that is, between the 25th [[percentile]] and the 75th percentile, then it splits the elements with at least 25% and at most 75% on each side.
When the input is a random permutation, the pivot has a random rank, and so it is not guaranteed to be in the middle 50 percent. However, when we start from a random permutation, in each recursive call the pivot has a random rank in its list, and so it is in the middle 50 percent about half the time. That is good enough. Imagine that a coin is flipped: heads means that the rank of the pivot is in the middle 50 percent, tails means that it isn't. Now imagine that the coin is flipped over and over until it gets {{mvar|k}} heads. Although this could take a long time, on average only {{math|2''k''}} flips are required, and it is highly improbable that the coin won't get {{mvar|k}} heads after {{math|100''k''}} flips (this can be made rigorous using [[Chernoff bound]]s). By the same argument, Quicksort's recursion will terminate on average at a call depth of only <math>2 \log_{4/3} n</math>. But if its average call depth is {{math|''O''(log ''n'')}}, and each level of the call tree processes at most {{mvar|n}} elements, the total amount of work done on average is the product, {{math|''O''(''n'' log ''n'')}}. The algorithm does not have to verify that the pivot is in the middle half; if the pivot lands there any constant fraction of the time, that is enough for the desired complexity.
Using more careful arguments, it is possible to extend this proof, for the version of Quicksort where the pivot is randomly chosen, to show a time bound that holds ''with high probability'': specifically, for any given <math>a\ge 4</math>, let <math>c=(a-4)/2</math>, then with probability at least <math>1-\frac{1}{n^c}</math>, the number of comparisons will not exceed <math>2an\log_{4/3}n</math>.<ref>{{cite book |last1=Motwani |first1= Rajeev |last2= Raghavan|first2= Prabhakar |date= |title= Randomized Algorithms|url= |location= |publisher= Cambridge University Press|page= |isbn=9780521474658 |access-date=}}</ref>
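The call-depth claim in the percentile argument can be explored empirically. The sketch below is a simplified, not-in-place quicksort with a uniformly random pivot; the function name, the input size of 300, and the seed are illustrative assumptions. It reports the deepest recursion level next to the quantity <math>2 \log_{4/3} n</math> from the argument above, without asserting any particular measured value.

```python
import math
import random


def quicksort_depth(items, rng):
    """Return (sorted copy, maximum recursion depth) for a random-pivot quicksort."""
    if len(items) <= 1:
        return list(items), 0
    pivot = items[rng.randrange(len(items))]       # uniformly random pivot
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    left, depth_left = quicksort_depth(smaller, rng)
    right, depth_right = quicksort_depth(larger, rng)
    return left + equal + right, 1 + max(depth_left, depth_right)


n = 300                                            # illustrative size
rng = random.Random(2025)                          # fixed seed for repeatability
data = list(range(n))
rng.shuffle(data)
result, depth = quicksort_depth(data, rng)
print("measured depth:", depth)
print("2*log_{4/3}(n):", round(2 * math.log(n, 4 / 3), 1))
```

Repeating the run with different seeds gives a feel for how rarely the depth strays far from logarithmic, which is what the coin-flipping argument predicts.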
==== Using recurrences ====
Line 278 ⟶ 282:
Observe that since <math>(x_1,x_2,\ldots,x_n)</math> is a random permutation, <math>(x_1,x_2,\ldots,x_j,x_i)</math> is also a random permutation, so the probability that <math>x_i</math> is adjacent to <math>x_j</math> is exactly <math>\frac{2}{j+1}</math>.
: <math>\operatorname{E}[C] = \sum_i \sum_{j<i} \frac{2}{j+1} = O\left(\sum_i \log i\right)=O(n \log n).</math>
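The double sum can be evaluated exactly for small {{mvar|n}}. The sketch below assumes 1-based indices with {{math|1 ≤ ''j'' < ''i'' ≤ ''n''}}, and compares the sum with the closed form <math>2(n+1)H_n - 4n</math>, where <math>H_n</math> is the {{mvar|n}}-th harmonic number; that closed form is a standard identity brought in here for comparison and is not stated in the text above. The function names are hypothetical.

```python
from fractions import Fraction


def expected_comparisons(n):
    """Exact value of sum_{i=1..n} sum_{j=1..i-1} 2/(j+1)."""
    return sum(Fraction(2, j + 1)
               for i in range(1, n + 1)
               for j in range(1, i))


def closed_form(n):
    """2(n+1)H_n - 4n, with H_n the n-th harmonic number."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n


print(expected_comparisons(3))   # 8/3
print(all(expected_comparisons(n) == closed_form(n) for n in range(1, 40)))  # True
```

Since <math>H_n</math> grows like <math>\ln n</math>, the closed form makes the {{math|''O''(''n'' log ''n'')}} growth of the expectation explicit.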
Line 294 ⟶ 298:
From a bit complexity viewpoint, variables such as ''lo'' and ''hi'' do not use constant space; it takes {{math|''O''(log ''n'')}} bits to index into a list of {{mvar|n}} items. Because there are such variables in every stack frame, quicksort using Sedgewick's trick requires {{math|''O''((log ''n'')<sup>2</sup>)}} bits of space. This space requirement isn't too terrible, though, since if the list contained distinct elements, it would need at least {{math|''O''(''n'' log ''n'')}} bits of space.
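The stack-frame count behind this estimate can be illustrated with a sketch. It assumes the usual reading of Sedgewick's trick (recurse into the smaller partition, then loop on the larger one) together with a Hoare-style partition; the function names and the depth counter are hypothetical and exist only to make the bound observable.

```python
def hoare_partition(a, lo, hi):
    """Hoare-style partition with the middle (floor) element as pivot; returns j."""
    pivot = a[(lo + hi) // 2]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]


def quicksort_small_first(a):
    """Sort a in place; return the largest number of simultaneously active frames."""
    max_depth = 0

    def sort(lo, hi, depth):
        nonlocal max_depth
        while lo < hi:
            max_depth = max(max_depth, depth)
            p = hoare_partition(a, lo, hi)
            if p - lo + 1 <= hi - p:       # left part (lo..p) is the smaller one
                sort(lo, p, depth + 1)     # recurse into the smaller part
                lo = p + 1                 # iterate on the larger part
            else:
                sort(p + 1, hi, depth + 1)
                hi = p

    sort(0, len(a) - 1, 1)
    return max_depth


values = list(range(1000, 0, -1))
print(quicksort_small_first(values), values[:5])
```

Each recursive call handles at most half of its caller's range, so the number of active frames stays within about {{math|log<sub>2</sub> ''n''}}; with {{math|''O''(log ''n'')}} bits per index, this gives the {{math|''O''((log ''n'')<sup>2</sup>)}} figure quoted above.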
Stack-free versions of Quicksort have been proposed. These use <math>O(1)</math> additional space (more precisely, one cell of the type of the sorted records, in order to exchange records, and a constant number of integer variables used as indices).<ref>{{cite conference |last= Ďurian|first= Branislav|date= |title=Quicksort without a stack |url= |work= |book-title= Mathematical Foundations of Computer Science 1986: Proceedings of the 12th Symposium |conference=MFCS 1986 |location= Bratislava, Czechoslovakia|publisher= Springer Berlin Heidelberg|access-date=}}</ref>
Another, less common, not-in-place, version of quicksort{{citation needed|date=July 2025}} uses {{math|''O''(''n'')}} space for working storage and can implement a stable sort. The working storage allows the input array to be easily partitioned in a stable manner and then copied back to the input array for successive recursive calls. Sedgewick's optimization is still appropriate.
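A stable, not-in-place variant along these lines might look like the following sketch. The function name, the {{mono|key}} parameter, and the choice of the middle element as pivot are assumptions made for illustration; the temporary lists play the role of the {{math|''O''(''n'')}} working storage.

```python
def stable_quicksort(a, key=lambda x: x):
    """Sort list a in place by key, preserving the order of equal-key items."""

    def sort(lo, hi):                       # sorts the half-open slice a[lo:hi]
        if hi - lo <= 1:
            return
        pivot = key(a[(lo + hi) // 2])
        chunk = a[lo:hi]                    # O(n) working storage
        less = [x for x in chunk if key(x) < pivot]
        equal = [x for x in chunk if key(x) == pivot]
        greater = [x for x in chunk if key(x) > pivot]
        a[lo:hi] = less + equal + greater   # copy back; relative order is kept
        sort(lo, lo + len(less))
        sort(hi - len(greater), hi)

    sort(0, len(a))


records = [(2, "a"), (1, "b"), (2, "c"), (1, "d"), (0, "e")]
stable_quicksort(records, key=lambda r: r[0])
print(records)  # [(0, 'e'), (1, 'b'), (1, 'd'), (2, 'a'), (2, 'c')]
```

Each pass scans the segment from left to right when building the three groups, so items with equal keys never overtake one another, which is what makes the sort stable.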
== Relation to other algorithms ==
Line 376 ⟶ 383:
=== Generalization ===
[[Richard J. Cole|Richard Cole]] and David C. Kandathil, in 2004, discovered a one-parameter family of sorting algorithms, called partition sorts, which on average (with all input orderings equally likely) perform at most <math>n\log n + {O}(n)</math> comparisons (close to the information theoretic lower bound) and <math>{\Theta}(n\log n)</math> operations; at worst they perform <math>{\Theta}(n\log^2 n)</math> comparisons (and also operations); these are in-place, requiring only additional <math>{O}(\log n)</math> space. Practical efficiency and smaller variance in performance were demonstrated against
== See also ==