Quicksort: Difference between revisions

Content deleted Content added
m Reverted 1 edit by 14.141.254.2 (talk) to last revision by CiaPan
Algorithm: Rewrote parts, indicating considerable variability within the "quicksort" family, and even within the "Hoare partition scheme"
Line 33:
== Algorithm ==
[[File:Quicksort-diagram.svg|right|200px|thumb|Full example of quicksort on a random set of numbers. The shaded element is the pivot. It is always chosen as the last element of the partition. However, always choosing the last element in the partition as the pivot in this way results in poor performance ({{math|''O''(''n''²)}}) on ''already sorted'' arrays, or arrays of identical elements. Since sub-arrays of sorted / identical elements crop up a lot towards the end of a sorting procedure on a large set, versions of the quicksort algorithm that choose the pivot as the middle element run much more quickly than the algorithm described in this diagram on large sets of numbers.]]
Quicksort is a type of [[divide and conquer algorithm]] for sorting an array, based on a partitioning routine; the details of this partitioning can vary somewhat, so that quicksort is really a family of closely related algorithms. Applied to a range of at least two elements, partitioning produces a division into two consecutive non empty sub-ranges, in such a way that no element of the first sub-range is greater than an element of the second sub-range. After applying this partition, quicksort then recursively sorts the sub-ranges, possibly after excluding from them an element at the point of division that is at this point known to be already in its final ___location. Due to its recursive nature, quicksort (like the partition routine) has to be formulated so as to be callable for a range within a larger array, even if the ultimate goal is to sort a complete array. The steps for [[in-place algorithm|in-place]] quicksort are:
Quicksort is a [[divide and conquer algorithm]]. It first divides the input array into two smaller sub-arrays: the low elements and the high elements. It then recursively sorts the sub-arrays. The steps for [[in-place algorithm|in-place]] Quicksort are:
# If the range has less than two elements, return immediately as there is nothing to do. Possibly for other very short lengths a special-purpose sorting method is applied and the remainder of these steps skipped.
# Pick an element, called a ''pivot'', from the array.
# Otherwise pick an value, called a ''pivot'', that occurs in the range (the precise manner of choosing depends on the partition routine, and can involve randomness).
# ''Partitioning'': reorder the array so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the ''partition'' operation.
# ''Partition'' the range: reorder its elements, while determining a point of division, so that all elements with values less than the pivot come before the division, while all elements with values greater than the pivot come after it; elements that are equal to the pivot can go either way. Since at least one instance of the pivot is present, most partition routines ensure that the value that ends up at the point of division is equal to the pivot, and is now in its final position (but termination of quicksort does not depend on this, as long as sub-ranges strictly smaller than the original are produced).
# [[Recursion (computer science)|Recursively]] apply the above steps to the sub-array of elements with smaller values and separately to the sub-array of elements with greater values.
# [[Recursion (computer science)|Recursively]] apply the quicksort to the sub-range up to the point of division and to the sub-range after it, possibly excluding form both ranges the element equal to the pivot at the point of division. (If the partition produces a possibly larger sub-range near the boundary where all elements are known to be equal to the pivot, these can be excluded as well.)
 
The choice of partition routine (including the pivot selection) and other details not entirely specified above can affect the algorithm's performance, possibly to a great extend for specific input arrays. In discussing the efficiency of quicksort, it is therefore necessary to specify these choices first. Here we mention two specific partition methods.
The base case of the recursion is arrays of size zero or one, which are in order by definition, so they never need to be sorted.
 
The pivot selection and partitioning steps can be done in several different ways; the choice of specific implementation schemes greatly affects the algorithm's performance.
 
=== Lomuto partition scheme ===
Line 64 ⟶ 63:
 
=== Hoare partition scheme ===
The original partition scheme described by [[Tony Hoare]] uses two pointers (indices into the range) that start at theboth ends of the array being partitioned, then move toward each other, until they detect an inversion: a pair of elements, one greater than orthe equalbound to(Hoare's terms for the pivot value) at the first pointer, and one less than orthe bound equalat the second pointer; if at this point the first pointer is still before the second, thatthese elements are in the wrong order relative to each other., Theand inverted elementsthey are then swappedexchanged.<ref>{{Cite journal | title = Quicksort | url = http://comjnl.oxfordjournals.org/content/5/1/10 | journal = The Computer Journal | date = 1962-01-01 | issn = 0010-4620 | pages = 10–16 | volume = 5 | issue = 1 | doi = 10.1093/comjnl/5.1.10 | first = C. A. R. | last = Hoare | author-link = Tony Hoare | doi-access = free }}</ref> WhenAfter this the indicespointers meet,are themoved algorithm stopsinwards, and returns the finalsearch index.for Hoare'san schemeinversion is more efficient than Lomuto's partition scheme because it does three times fewer swaps on average, and it creates efficient partitions evenrepeated; when alleventually valuesthe arepointers equal.<refcross name=":1" />{{self-published inline | date=August 2015}} Like Lomuto's partition scheme, Hoare's partitioning also would cause Quicksort to degrade to {{math|''O''(''n''<sup>2</sup>)}} for already sorted input, if the pivot was chosen as the first orpoints the last element. Withafter the middle element as the pivotsecond), however, sorted data results with (almost) no swaps in equally sized partitions leading to best case behavior of Quicksort, i.e. {{math|''O''(''n'' log(''n''))}}. Like others, Hoare's partitioning doesn't produce a stable sort. In this scheme, the pivot's final ___locationexchange is not necessarily atperformed; the index thatpartition is returned, as the pivot and elements equal to the pivot can end up anywhere within the partition after a partitionnow stepcomplete, and may not be sorted until the base casepoint of adivision partition with a single element is reached via recursion. The next two segments that(between the maincrossed algorithm recurs on are {{mono|(lo..ppointers)}} (elementsis &le; pivot) and {{mono|(p+1determined..hi)}} (elements &ge; pivot) as opposed to {{mono|(lo..p-1)}} and {{mono|(p+1..hi)}} as in Lomuto's scheme. However, the partitioning algorithm guarantees {{nowrap|{{mono|lo ≤ p < hi}}}} which implies both resulting partitions are non-empty, hence there's no risk of infinite recursion. In [[pseudocode]],<ref name=":2"/>
 
With respect to this original description, implementations often make minor but important variations. Notably, the scheme as presented below includes elements equal to the pivot among the candidates for an inversion (so "greater than or equal" and "less than or equal" tests are used instead of "greater than" respectively "less than"). While there is no reason to exchange elements equal to the bound, this change allows tests on the pointers themselves to be omitted, which are otherwise needed to ensure they do not run out of range. Indeed, since at least one instance of the pivot value is present in the range, the first advancement of either pointer cannot pass across this instance if an inclusive test is used; once an exchange is performed, these exchanged elements are now both strictly ahead of the pointer that found them, preventing that pointer from running off. (The latter is true independently of the test used, so it would be possible to use the inclusive test only when looking for the first inversion. However using an inclusive test throughout also ensures that a division near the middle is found when all elements in the range are equal, which gives an important efficiency gain for sorting arrays with many equal elements.) There is also a concern of never producing a division at one of the extremities of the given range, as that would prevent the algorithm from advancing, causing non termination; this can only happen the first search already produces a false inversion (crossed pointers). In the version below the division returned is after the returned final value of the second pointer; the worry is than that it could point at the last element of the range, which would happen if that element is chosen as the pivot and all other elements in the range are smaller (both pointers then advance to the final element, and being equal no exchange is done). Therefore the pivot choice must avoid the final element, which is done here by rounding down the middle position, using the <kbd>floor</kbd> function;<ref>in many languages this is the standard behavior of integer division</ref>. This illustrates that the argument for correctness of an implementation of the Hoare partition scheme can be subtle, and it is easy to get it wrong.
 
In [[pseudocode]],<ref name=":2"/>
 
'''algorithm''' quicksort(A, lo, hi) '''is'''
Line 86 ⟶ 89:
'''return''' j
swap A[i] with A[j]
 
The pivot division rounds down, as shown here by the <kbd>floor</kbd> function;<ref>in many languages this is the standard behavior of integer division</ref> this avoids using A[hi] as the pivot, which can result in infinite recursion.
 
The entire array is sorted by {{mono|quicksort(A, 0, length(A) - 1)}}.
 
Hoare's scheme is more efficient than Lomuto's partition scheme because it does three times fewer swaps on average. Also, as mentioned, the implementation given creates an balanced partition even when all values are equal.<ref name=":1" />{{self-published inline | date=August 2015}}, which Lomuto's scheme does not. Like Lomuto's partition scheme, Hoare's partitioning also would cause Quicksort to degrade to {{math|''O''(''n''<sup>2</sup>)}} for already sorted input, if the pivot was chosen as the first or the last element. With the middle element as the pivot, however, sorted data results with (almost) no swaps in equally sized partitions leading to best case behavior of Quicksort, i.e. {{math|''O''(''n'' log(''n''))}}. Like others, Hoare's partitioning doesn't produce a stable sort. In this scheme, the pivot's final ___location is not necessarily at the index that is returned, as the pivot and elements equal to the pivot can end up anywhere within the partition after a partition step, and may not be sorted until the base case of a partition with a single element is reached via recursion. The next two segments that the main algorithm recurs on are {{mono|(lo..p)}} (elements &le; pivot) and {{mono|(p+1..hi)}} (elements &ge; pivot) as opposed to {{mono|(lo..p-1)}} and {{mono|(p+1..hi)}} as in Lomuto's scheme.
 
=== Implementation issues ===