Merge-insertion sort: Difference between revisions

Content deleted Content added
add animation gif
 
(51 intermediate revisions by 18 users not shown)
Line 1:
{{Short description|Type of comparison sorting algorithm}}
In [[computer science]], '''merge-insertinsertion sort''' or the '''Ford–Johnson algorithm''' is a [[comparison sort]]ing algorithm published in 1959 by [[L. R. Ford Jr.]] and [[Selmer M. Johnson]].{{r|fj|c4cs|distrib|knuth}} It uses fewer comparisons in the [[worst case analysis|worst case]] than the best previously known algorithms, [[insertion sort|binary insertion sort]] and [[merge sort]],{{r|fj}} and for 20 years it was the sorting algorithm with the fewest known comparisons.{{r|nonopt}} Although not of practical significance, it remains of theoretical interest in connection with the problem of sorting with a minimum number of comparisons.{{r|distrib}} The same algorithm may have also been independently discovered by [[Stanisław Trybuła]] and Czen Ping.{{r|knuth}}
[[File:Ford-janson.gif|thumb|An animation of the [[Merge algorithm|merge-algorithm]] sorting an array of randomized values.]]
 
==Algorithm==
Merge-insertinsertion sort performs the following steps, on an input <math>X</math> of <math>n</math> elements:<ref>The original description by {{harvtxt|Ford|Johnson|1959}} sorted the elements in descending order. The steps listed here reverse the output, makingfollowing the description in {{harvtxt|Knuth|1998}}. The resulting algorithm makes the same comparisons but producingproduces ascending order instead.</ref>
#Group the elements of <math>X</math> into <math>\lfloor n/2\rfloor</math> pairs of elements, arbitrarily, leaving one element unpaired if there is an odd number of elements.
#Perform <math>\lfloor n/2\rfloor</math> comparisons, one per pair, to determine the larger of the two elements in each pair.
#Recursively sort the <math>\lfloor n/2\rfloor</math> larger elements from each pair, creating a sorted sequence <math>S</math> of <math>\lfloor n/2\rfloor</math> of the input elements, in ascending order, using the merge-insertion sort.
#Insert at the start of <math>S</math> the element that was paired with the first and smallest element of <math>S</math>.
#Insert the remaining <math>\lceil n/2\rceil-1</math> elements of <math>X\setminus S</math> into <math>S</math>, one at a time, with a specially chosen insertion ordering described below. Use [[binary search]] in subsequences of <math>S</math> (as described below) to determine the position at which each element should be inserted.
Line 13 ⟶ 15:
:<math>S=(x_1,x_2,x_3,\dots),</math>
where each element <math>x_i</math> with <math>i\ge 3</math> is paired with an element <math>y_i < x_i</math> that has not yet been inserted. (There are no elements <math>y_1</math> or <math>y_2</math> because <math>x_1</math> and <math>x_2</math> were paired with each other.) If <math>n</math> is odd, the remaining unpaired element should also be numbered as <math>y_i</math> with <math>i</math> larger than the indexes of the paired elements.
Then, the final step of the outline above can be expanded into the following steps:{{r|fj|c4cs|distrib|knuth}}
*Partition the uninserted elements <math>y_i</math> into groups with contiguous indexes. There are two elements <math>y_3</math> and <math>y_4</math> in the first group, and the sizesums of eachsizes subsequentof groupevery equalstwo theadjacent numbergroups ofform elementsa insequence allof previouspowers of groupstwo. Thus, the sizes of the groups form a sequence of powers of twoare: 2, 2, 46, 10, 822, 1642, ...
*Order the uninserted elements by their groups (smaller indexes to larger indexes), but within each group order them from larger indexes to smaller indexes. Thus, the ordering becomes
::<math>y_4,y_3,y_6,y_5,y_{12},y_{11},y_{10},y_9,y_8,y_7,y_{1822},y_{21}\dots</math>
*Use this ordering to insert the elements <math>y_i</math> into <math>S</math>. For each element <math>y_i</math>, use a binary search from the start of <math>S</math> up to but not including <math>x_i</math> to determine where to insert <math>y_i</math>.
 
==Analysis==
IfLet this<math>C(n)</math> algorithmdenote isthe usednumber toof comparisons that merge-insertion sort makes, in the worst case, when sorting <math>n</math> elements,.
let <math>C(n)</math> denote theThis number of comparisons thatcan be broken down as the sum of itthree makes.terms:
Then*<math>\lfloor this number ofn/2\rfloor</math> comparisons can be analyzed asamong the sumpairs of three terms:items,
*<math>C(\lfloor n/2\rfloor)</math> comparisons amongfor the pairsrecursive of itemscall, and
<math>C(\lfloor*some n/2\rfloor)</math>number of comparisons for the recursivebinary insertions used to insert the remaining call,elements.
and some number of comparisons forIn the binarythird insertions used to insertterm, the remaining elements. The worst-case number of comparisons for the elements in the first group is two, because each is inserted into a subsequence of <math>S</math> of length at most three. First, <math>y_4</math> is inserted into the three-element subsequence <math>(x_1,x_2,x_3)</math>. Then, <math>y_3</math> is inserted into some permutation of the three-element subsequence <math>(x_1,x_2,y_4)</math>, or in some cases into the two-element subsequence <math>(x_1,x_2)</math>. Similarly, the elements <math>y_6</math> and <math>y_5</math> of the second group are each inserted into a subsequence of length at most seven, using three comparisons. More generally, the worst-case number of comparisons for the elements in the <math>i</math>th group is <math>i+1</math>, because each is inserted into a subsequence of length at most <math>2^{i+1}-1</math>.{{r|fj|c4cs|distrib|knuth}} By summing the number of comparisons used for all the elements and solving the resulting [[recurrence relation]],
this analysis can be used to compute the values of <math>C(n)</math>, giving the formula<ref>{{harvtxt|Knuth|1998}} credits the summation formula to the 1960 Ph.D. thesis of A. Hadian. The approximation formula was already given by {{harvtxt|Ford|Johnson|1959}}.</ref>
 
:<math>C(n)=\sum_{i=1}^n \left\lceil \log_2 \frac{3i}{4} \right\rceil \approx n\log_2 n - 1.415n</math>
Based on this, the number of comparisons used by the algorithm
or, in [[closed-form expression|closed form]],{{r|gn}}
on an <math>n</math>-element input
:<math>C(n)=n\biggl\lceil\log_2\frac{3n}{4}\biggr\rceil-\biggl\lfloor\frac{2^{\lfloor \log_2 6n\rfloor}}{3}\biggr\rfloor+\biggl\lfloor\frac{\log_2 6n}{2}\biggr\rfloor.</math>
can be described by the [[integer sequence]] (starting with <math>n=1</math>){{r|fj}}
For <math>n=1,2,\dots</math> the numbers of comparisons are{{r|fj}}
:0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, 34, ... {{OEIS|A001768}}
 
[[Asymptotic analysis|Asymptotically]], these numbers are approximately{{r|fj}}
==Relation to other comparison sorts==
:<math>n\log_2 n - 1.415n</math>
The algorithm is called merge-insertion sort because the initial comparisons that it performs before its recursive call (pairing up arbitrary items and comparing each pair) are the same as the initial comparisons of [[merge sort]],
The number of comparisons agrees with the [[lower bound]] on comparison sorting of <math>\lceil\log_2 n!\rceil\approx n\log_2 n - 1.443n</math> up to <math>n=11</math>, but diverges for larger values of <math>n</math>.
while the comparisons that it performs after the recursive call (using binary search to insert elements one by one into a sorted list) follow the same principle as [[insertion sort]]. In this sense, it is a [[hybrid algorithm]] that combines both merge sort and insertion sort.<ref>{{harvtxt|Knuth|1998}}, p. 184: "Since it involves some aspects of merging and some aspects of insertion, we call it ''merge insertion''."</ref>
It also compares favorably with the [[sorting number]]s, the numbers of comparisons made by binary insertion sort or merge sort in the worst case,
 
which are approximately <math>n\log_2 n - 0.915n</math> for certain values of <math>n</math>.{{r|fj}}
TheFor numbersmall inputs (up to <math>n=11</math>) its numbers of comparisons agrees withequal the [[lower bound]] on comparison sorting of <math>\lceil\log_2 n!\rceil\approx n\log_2 n - 1.443n</math>. up to <math>n=11</math>However, but diverges for larger valuesinputs the number of <math>n</math>comparisons made by the merge-insertion algorithm is bigger than this lower bound.
Merge-insertion sort also performs fewer comparisons than the [[sorting number]]s, which count the comparisons made by binary insertion sort or merge sort in the worst case. The sorting numbers fluctuate between <math>n\log_2 n - 0.915n</math> and <math>n\log_2 n - n</math>, with the same leading term but a worse constant factor in the lower-order linear term.{{r|fj}}
 
Merge-insertion sort is the sorting algorithm with the minimum possible comparisons for <math>n</math> items whenever <math>n\le 22</math>, and it has the fewest comparisons known for <math>n\le 46</math>.{{r|pec|pec2}}
For 20 years, merge-insertion sort was the sorting algorithm with the fewest comparisons known for all input lengths.
However, in 1979 Glenn Manacher published another sorting algorithm that used even fewer comparisons, for large enough inputs.{{r|distrib|nonopt}}
It remains unknown exactly how many comparisons are needed for sorting, for all <math>n</math>, but Manacher's algorithm
and later record-breaking sorting algorithms have all used modifications of the merge-insertion sort ideas.{{r|distrib}}
 
==References==
Line 73 ⟶ 84:
| volume = 66
| year = 1959}}</ref>
 
<ref name=gn>{{citation
| last1 = Guy | first1 = Richard K. | author1-link = Richard K. Guy
| last2 = Nowakowski | first2 = Richard J.
| date = December 1995
| doi = 10.2307/2975272
| issue = 10
| journal = [[American Mathematical Monthly]]
| pages = 921–926
| title = ''Monthly'' Unsolved Problems, 1969-1995
| volume = 102}}</ref>
 
<ref name=knuth>{{citation
| last = Knuth | first = Donald E. | author-link = Donald Knuth
| contribution = Merge insertion
| edition = 2nd
| pages = 184–186
| title = [[The Art of Computer Programming]], Vol. 3: Sorting and Searching
| year = 1998}}</ref>
 
<ref name=nonopt>{{citation
Line 82 ⟶ 112:
| pages = 441–456
| title = The Ford-Johnson Sorting Algorithm Is Not Optimal
| volume = 26}}</ref>| doi-access = free
}}</ref>
 
<ref name=pec>{{citation
| last = Peczarski | first = Marcin
| doi = 10.1007/s00453-004-1100-7
| issue = 2
| journal = Algorithmica
| mr = 2072769
| pages = 133–145
| title = New results in minimum-comparison sorting
| volume = 40
| year = 2004}}</ref>
 
<ref name=pec2>{{citation
| last = Peczarski | first = Marcin
| doi = 10.1016/j.ipl.2006.09.001
| issue = 3
| journal = Information Processing Letters
| mr = 2287331
| pages = 126–128
| title = The Ford-Johnson algorithm still unbeaten for less than 47 elements
| volume = 101
| year = 2007}}</ref>
 
}}
Line 88 ⟶ 141:
{{Sorting}}
[[Category:Comparison sorts]]
[[Category:1959 in computing]]