Content deleted Content added
Why the name; Knuth; summation formula |
add animation gif |
||
(45 intermediate revisions by 18 users not shown) | |||
Line 1:
{{Short description|Type of comparison sorting algorithm}}
In [[computer science]], '''merge-
[[File:Ford-janson.gif|thumb|An animation of the [[Merge algorithm|merge-algorithm]] sorting an array of randomized values.]]
==Algorithm==
Merge-
#Group the elements of <math>X</math> into <math>\lfloor n/2\rfloor</math> pairs of elements, arbitrarily, leaving one element unpaired if there is an odd number of elements.
#Perform <math>\lfloor n/2\rfloor</math> comparisons, one per pair, to determine the larger of the two elements in each pair.
#Recursively sort the <math>\lfloor n/2\rfloor</math> larger elements from each pair, creating a sorted sequence <math>S</math> of <math>\lfloor n/2\rfloor</math> of the input elements, in ascending order, using the merge-insertion sort.
#Insert at the start of <math>S</math> the element that was paired with the first and smallest element of <math>S</math>.
#Insert the remaining <math>\lceil n/2\rceil-1</math> elements of <math>X\setminus S</math> into <math>S</math>, one at a time, with a specially chosen insertion ordering described below. Use [[binary search]] in subsequences of <math>S</math> (as described below) to determine the position at which each element should be inserted.
Line 13 ⟶ 15:
:<math>S=(x_1,x_2,x_3,\dots),</math>
where each element <math>x_i</math> with <math>i\ge 3</math> is paired with an element <math>y_i < x_i</math> that has not yet been inserted. (There are no elements <math>y_1</math> or <math>y_2</math> because <math>x_1</math> and <math>x_2</math> were paired with each other.) If <math>n</math> is odd, the remaining unpaired element should also be numbered as <math>y_i</math> with <math>i</math> larger than the indexes of the paired elements.
Then, the final step of the outline above can be expanded into the following steps:{{r|fj|c4cs|distrib|knuth}}
*Partition the uninserted elements <math>y_i</math> into groups with contiguous indexes. There are two elements <math>y_3</math> and <math>y_4</math> in the first group, and the
*Order the uninserted elements by their groups (smaller indexes to larger indexes), but within each group order them from larger indexes to smaller indexes. Thus, the ordering becomes
::<math>y_4,y_3,y_6,y_5,y_{12},y_{11},y_{10},y_9,y_8,y_7,y_{
*Use this ordering to insert the elements <math>y_i</math> into <math>S</math>. For each element <math>y_i</math>, use a binary search from the start of <math>S</math> up to but not including <math>x_i</math> to determine where to insert <math>y_i</math>.
==Analysis==
Let <math>C(n)</math> denote the number of comparisons that merge-
This number of comparisons can be broken down as the sum of three terms:
*<math>\lfloor n/2\rfloor</math> comparisons among the pairs of items,
*<math>C(\lfloor n/2\rfloor)</math> comparisons for the recursive call, and
*
In the third term, the worst-case number of comparisons for the elements in the first group is two, because each is inserted into a subsequence of <math>S</math> of length at most three. First, <math>y_4</math> is inserted into the three-element subsequence <math>(x_1,x_2,x_3)</math>. Then, <math>y_3</math> is inserted into some permutation of the three-element subsequence <math>(x_1,x_2,y_4)</math>, or in some cases into the two-element subsequence <math>(x_1,x_2)</math>. Similarly, the elements <math>y_6</math> and <math>y_5</math> of the second group are each inserted into a subsequence of length at most seven, using three comparisons. More generally, the worst-case number of comparisons for the elements in the <math>i</math>th group is <math>i+1</math>, because each is inserted into a subsequence of length at most <math>2^{i+1}-1</math>.{{r|fj|c4cs|distrib|knuth}} By summing the number of comparisons used for all the elements and solving the resulting [[recurrence relation]],
this analysis can be used to compute the values of <math>C(n)</math>, giving the formula<ref>{{harvtxt|Knuth|1998}} credits
:<math>C(n)=\sum_{i=1}^n \left\lceil \log_2 \frac{3i}{4} \right\rceil \approx n\log_2 n - 1.415n</math>
or, in [[closed-form expression|closed form]],{{r|gn}}
:<math>C(n)=n\biggl\lceil\log_2\frac{3n}{4}\biggr\rceil-\biggl\lfloor\frac{2^{\lfloor \log_2 6n\rfloor}}{3}\biggr\rfloor+\biggl\lfloor\frac{\log_2 6n}{2}\biggr\rfloor.</math>
For <math>n=1,2,\dots</math> the numbers of comparisons are{{r|fj}}
:0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, 34, ... {{OEIS|A001768}}
For small inputs (up to <math>n=11</math>) these numbers of comparisons equal the [[lower bound]] on comparison sorting of <math>\lceil\log_2 n!\rceil\approx n\log_2 n - 1.443n</math>. However, for larger inputs the number of comparisons made by the merge-insert algorithm is bigger than this lower bound.▼
Merge-insert sort also performs fewer comparisons than the [[sorting number]]s, which count the comparisons made by binary insertion sort or merge sort in the worst case. The sorting numbers fluctuate between <math>n\log_2 n - 0.915n</math> and <math>n\log_2 n - n</math>, with the same leading term but a worse constant factor in the lower-order linear term.{{r|fj}}▼
==Relation to other comparison sorts==
The algorithm is called merge-
while the comparisons that it performs after the recursive call (using binary search to insert elements one by one into a sorted list) follow the same
▲For small inputs (up to <math>n=11</math>)
Merge insert sort is the sorting algorithm with the fewest possible comparisons for <math>n</math> items whenever <math>n\le 14</math>, and for <math>20\le n\le 22</math>.{{r|pec}}▼
▲Merge-
For 20 years, merge-insert sort was the sorting algorithm with the fewest comparisons known for all input lengths.▼
▲Merge
▲For 20 years, merge-
However, in 1979 Glenn Manacher published another sorting algorithm that used even fewer comparisons, for large enough inputs.{{r|distrib|nonopt}}
It remains unknown exactly how many comparisons are needed for sorting, for all <math>n</math>, but Manacher's algorithm
and later record-breaking sorting algorithms have all used modifications of the merge-
==References==
Line 81 ⟶ 84:
| volume = 66
| year = 1959}}</ref>
<ref name=gn>{{citation
| last1 = Guy | first1 = Richard K. | author1-link = Richard K. Guy
| last2 = Nowakowski | first2 = Richard J.
| date = December 1995
| doi = 10.2307/2975272
| issue = 10
| journal = [[American Mathematical Monthly]]
| pages = 921–926
| title = ''Monthly'' Unsolved Problems, 1969-1995
| volume = 102}}</ref>
<ref name=knuth>{{citation
Line 98 ⟶ 112:
| pages = 441–456
| title = The Ford-Johnson Sorting Algorithm Is Not Optimal
| volume = 26
}}</ref>
<ref name=pec>{{citation
Line 110 ⟶ 125:
| volume = 40
| year = 2004}}</ref>
<ref name=pec2>{{citation
| last = Peczarski | first = Marcin
| doi = 10.1016/j.ipl.2006.09.001
| issue = 3
| journal = Information Processing Letters
| mr = 2287331
| pages = 126–128
| title = The Ford-Johnson algorithm still unbeaten for less than 47 elements
| volume = 101
| year = 2007}}</ref>
}}
Line 115 ⟶ 141:
{{Sorting}}
[[Category:Comparison sorts]]
[[Category:1959 in computing]]
|