Merge-insertion sort: Difference between revisions

Content deleted Content added
optimality
Why the name; Knuth; summation formula
Line 1:
In [[computer science]], '''merge-insert sort''' or the '''Ford–Johnson algorithm''' is a [[comparison sort]]ing algorithm published in 1959 by [[L. R. Ford Jr.]] and [[Selmer M. Johnson]].{{r|fj|c4cs|distrib}} It uses fewer comparisons in the [[worst case analysis|worst case]] than the best previously known algorithms, [[insertion sort|binary insertion sort]] and [[merge sort]],{{r|fj}} and for 20 years it was the sorting algorithm with the fewest known comparisons.{{r|nonopt}} Although not of practical significance, it remains of theoretical interest in connection with the problem of sorting with a minimum number of comparisons.{{r|distrib}} The same algorithm may have also been independently discovered by S. Trybula and P. Czen.{{r|knuth}}
 
==Algorithm==
Merge-insert sort performs the following steps, on an input <math>X</math> of <math>n</math> elements:<ref>The original description by {{harvtxt|Ford|Johnson|1959}} sorted the elements in descending order. The steps listed here reverse the output, makingfollowing the description in {{harvtxt|Knuth|1998}}. The resulting algorithm makes the same comparisons but producingproduces ascending order instead.</ref>
#Group the elements of <math>X</math> into <math>\lfloor n/2\rfloor</math> pairs of elements, arbitrarily, leaving one element unpaired if there is an odd number of elements.
#Perform <math>\lfloor n/2\rfloor</math> comparisons, one per pair, to determine the larger of the two elements in each pair.
Line 25:
*<math>C(\lfloor n/2\rfloor)</math> comparisons for the recursive call,
*Some number of comparisons for the binary insertions used to insert the remaining elements.
In the third term, the worst-case number of comparisons for the elements in the first group is two, because each is inserted into a subsequence of <math>S</math> of length at most three. First, <math>y_4</math> is inserted into the three-element subsequence <math>(x_1,x_2,x_3)</math>. Then, <math>y_3</math> is inserted into some permutation of the three-element subsequence <math>(x_1,x_2,y_4)</math>, or in some cases into the two-element subsequence <math>(x_1,x_2)</math>. Similarly, the elements <math>y_6</math> and <math>y_5</math> of the second group are each inserted into a subsequence of length at most seven, using three comparisons. More generally, the worst-case number of comparisons for the elements in the <math>i</math>th group is <math>i+1</math>, because each is inserted into a subsequence of length at most <math>2^{i+1}-1</math>.{{r|fj|c4cs|distrib}} By summing the number of comparisons used for all the elements and solving the resulting [[recurrence relation]],
this analysis can be used to compute the values of <math>C(n)</math>, giving the formula<ref>{{harvtxt|Knuth|1998}} credits this formula to the 1960 Ph.D. thesis of A. Hadian.</ref>
 
:<math>C(n)=\sum_{i=1}^n \left\lceil \log_2 \frac{3i}{4} \right\rceil.</math>
By summing the number of comparisons used for all the elements and solving the resulting [[recurrence relation]],
this analysis can be used to compute the values of <math>C(n)</math>. For <math>n=1,2,\dots</math> thesethe numbers of valuescomparisons are{{r|fj}}
:0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, 34, ... {{OEIS|A001768}}
[[Asymptotic analysis|Asymptotically]], these numbers are approximately{{r|fj}}
Line 35:
Merge-insert sort also performs fewer comparisons than the [[sorting number]]s, which count the comparisons made by binary insertion sort or merge sort in the worst case. The sorting numbers fluctuate between <math>n\log_2 n - 0.915n</math> and <math>n\log_2 n - n</math>, with the same leading term but a worse constant factor in the lower-order linear term.{{r|fj}}
 
==Relation to other comparison sorts==
==Optimality==
The algorithm is called merge-insert sort because the initial comparisons that it performs before its recursive call (pairing up arbitrary items and comparing each pair) are the same as the initial comparisons of [[merge sort]],
For 20 years, merge-insert sort was the sorting algorithm with the fewest comparisons known (in the worst case, for each input length).
while the comparisons that it performs after the recursive call (using binary search to insert elements one by one into a sorted list) follow the same principal as [[insertion sort]]. In this sense, it is a [[hybrid algorithm]] that combines both merge sort and insertion sort.<ref>{{harvtxt|Knuth|1998}}, p. 184: "Since it involves some aspects of merging and some aspects of insertion, we call it ''merge insertion''."</ref>
 
Merge insert sort is the sorting algorithm with the fewest possible comparisons for <math>n</math> items whenever <math>n\le 14</math>, and for <math>20\le n\le 22</math>.{{r|pec}}
For 20 years, merge-insert sort was the sorting algorithm with the fewest comparisons known (in the worst case, for eachall input length)lengths.
However, in 1979 Glenn Manacher published another sorting algorithm that used even fewer comparisons, for large enough inputs.{{r|distrib|nonopt}}
It remains unknown exactly how many comparisons are needed for sorting, for all <math>n</math>, but Manacher's algorithm
Line 77 ⟶ 81:
| volume = 66
| year = 1959}}</ref>
 
<ref name=knuth>{{citation
| last = Knuth | first = Donald E. | author-link = Donald Knuth
| contribution = Merge insertion
| edition = 2nd
| pages = 184–186
| title = [[The Art of Computer Programming]], Vol. 3: Sorting and Searching
| year = 1998}}</ref>
 
<ref name=nonopt>{{citation
Line 87 ⟶ 99:
| title = The Ford-Johnson Sorting Algorithm Is Not Optimal
| volume = 26}}</ref>
 
<ref name=pec>{{citation
| last = Peczarski | first = Marcin
| doi = 10.1007/s00453-004-1100-7
| issue = 2
| journal = Algorithmica
| mr = 2072769
| pages = 133–145
| title = New results in minimum-comparison sorting
| volume = 40
| year = 2004}}</ref>
 
}}