Merge-insertion sort

This is an old revision of this page, as edited by David Eppstein at 03:43, 13 August 2018.

In computer science, merge-insertion sort or the Ford–Johnson algorithm is a comparison sorting algorithm published in 1959 by L. R. Ford Jr. and Selmer M. Johnson.[1][2][3] It uses fewer comparisons in the worst case than the best previously known algorithms, binary insertion sort and merge sort,[1] and for 20 years it was the sorting algorithm with the fewest known comparisons.[4] Although not of practical significance, it remains of theoretical interest in connection with the problem of sorting with a minimum number of comparisons.[3]

Algorithm

Merge-insertion sort performs the following steps, on an input X of n elements:[5]

  1. Group the elements of X into ⌊n/2⌋ pairs of elements, arbitrarily, leaving one element unpaired if there is an odd number of elements.
  2. Perform ⌊n/2⌋ comparisons, one per pair, to determine the larger of the two elements in each pair.
  3. Recursively sort the ⌊n/2⌋ larger elements from each pair, creating a sorted sequence S of ⌊n/2⌋ of the input elements, in ascending order.
  4. Insert at the start of S the element that was paired with the first and smallest element of S.
  5. Insert the remaining ⌈n/2⌉ − 1 elements of X into S, one at a time, with a specially chosen insertion ordering described below. Use binary search in subsequences of S (as described below) to determine the position at which each element should be inserted.
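The five steps above can be sketched in Python. This is an illustrative sketch rather than the original presentation: the function name, the (larger, smaller) tuple bookkeeping, and the reuse of tuple comparison to drive the recursion are conveniences of this sketch, not part of the Ford–Johnson paper.

```python
import bisect

def merge_insertion_sort(x):
    """Sketch of merge-insertion sort; returns a new ascending list."""
    x = list(x)
    n = len(x)
    if n <= 1:
        return x
    # Steps 1-2: form floor(n/2) pairs, one comparison per pair,
    # stored as (larger, smaller); one element may be left over.
    pairs = []
    for i in range(0, n - 1, 2):
        a, b = x[i], x[i + 1]
        if a < b:
            a, b = b, a
        pairs.append((a, b))
    straggler = [x[-1]] if n % 2 else []
    # Step 3: recursively sort the pairs by their larger element
    # (tuples compare by first component, so the pair list can be
    # fed back into the same routine).
    pairs = merge_insertion_sort(pairs)
    # Step 4: the partner of the smallest larger element goes first.
    main = [pairs[0][1]] + [p[0] for p in pairs]
    # Step 5: insert the remaining partners group by group, using
    # the special ordering described below (group sizes 2, 2, 6,
    # 10, ..., reversed within each group); bound each binary
    # search by the position of the paired larger element.
    pending = [(p[1], p[0]) for p in pairs[1:]] + [(s, None) for s in straggler]
    start, size, power = 0, 2, 2
    while start < len(pending):
        for elem, partner in reversed(pending[start:start + size]):
            hi = len(main) if partner is None else bisect.bisect_left(main, partner)
            main.insert(bisect.bisect_left(main, elem, 0, hi), elem)
        start += size
        power *= 2
        size = power - size
    return main
```

The sketch favors readability over exact comparison accounting: the bisect calls that locate each paired element's position perform extra comparisons that the original algorithm avoids with positional bookkeeping.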

The algorithm is designed to take advantage of the fact that the binary searches used to insert elements into S are most efficient (from the point of view of worst-case analysis) when the length of the subsequence that is searched is one less than a power of two. This is because, for those lengths, all outcomes of the search use the same number of comparisons as each other.[1] To choose an insertion ordering that produces these lengths, consider the sorted sequence S after step 4 of the outline above (before inserting the remaining elements), and let x_i denote the ith element of this sorted sequence. Thus,

S = (x_1, x_2, x_3, ...)

where each element x_i with i ≥ 3 is paired with an element y_i < x_i that has not yet been inserted. (There are no elements y_1 or y_2, because x_1 and x_2 were paired with each other.) If n is odd, the remaining unpaired element should also be numbered as y_i, with i larger than the indexes of the paired elements. Then, the final step of the outline above can be expanded into the following steps:[1][2][3]

  • Partition the uninserted elements y_i into groups with contiguous indexes. There are two elements y_3 and y_4 in the first group, and the sums of the sizes of every two adjacent groups form a sequence of powers of two. Thus, the sizes of the groups are: 2, 2, 6, 10, 22, 42, ...
  • Order the uninserted elements by their groups (smaller indexes to larger indexes), but within each group order them from larger indexes to smaller indexes. Thus, the ordering becomes
y_4, y_3, y_6, y_5, y_12, y_11, y_10, y_9, y_8, y_7, y_22, y_21, ...
  • Use this ordering to insert the elements y_i into S. For each element y_i, use a binary search from the start of S up to but not including x_i to determine where to insert y_i.
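The grouping and reversal rule can be made concrete with a short Python helper (the function name and interface are invented for illustration) that lists the indexes i of the elements y_i in the order they are inserted, assuming group sizes 2, 2, 6, 10, ... in which each adjacent pair of sizes sums to a power of two:

```python
def insertion_order(num_uninserted):
    """Indexes i of the uninserted elements y_i, in insertion order,
    assuming the uninserted elements are y_3 .. y_(num_uninserted + 2)."""
    order = []
    start, size, power = 3, 2, 2          # first group starts at y_3
    last = num_uninserted + 2
    while start <= last:
        end = min(start + size - 1, last)  # last index in this group
        order.extend(range(end, start - 1, -1))  # reversed within group
        start += size
        power *= 2
        size = power - size                # sizes 2, 2, 6, 10, 22, ...
    return order
```

For ten uninserted elements this yields [4, 3, 6, 5, 12, 11, 10, 9, 8, 7]: the first two groups have size two, and the third group (y_7 through y_12) is traversed from its largest index down to its smallest.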

Analysis

Let C(n) denote the number of comparisons that merge-insertion sort makes, in the worst case, when sorting n elements. This number of comparisons can be broken down as the sum of three terms:

  • ⌊n/2⌋ comparisons among the pairs of items,
  • C(⌊n/2⌋) comparisons for the recursive call, and
  • some number of comparisons for the binary insertions used to insert the remaining elements.

In the third term, the worst-case number of comparisons for the elements in the first group is two, because each is inserted into a subsequence of S of length at most three. More generally, the worst-case number of comparisons for the elements in the ith group is i + 1, because each is inserted into a subsequence of length at most 2^(i+1) − 1.[1][2][3]

This analysis can be used to compute the values of C(n). For n = 1, 2, 3, ... they are[1]

0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, 34, ... (sequence A001768 in the OEIS)
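This breakdown can be checked numerically. The sketch below (the function name is invented for illustration) totals the three terms, using group sizes 2, 2, 6, 10, ... with a cost of i + 1 comparisons per insertion in the ith group, and reproduces the sequence of values above:

```python
def comparisons(n):
    """Worst-case comparison count C(n) of merge-insertion sort,
    computed from the three-term breakdown in the analysis."""
    if n <= 1:
        return 0
    total = n // 2 + comparisons(n // 2)  # pairing + recursive call
    remaining = (n + 1) // 2 - 1          # elements inserted in step 5
    size, power, cost = 2, 2, 2           # sizes 2, 2, 6, 10, ...; costs 2, 3, 4, ...
    while remaining > 0:
        took = min(remaining, size)
        total += took * cost              # each insertion in group i costs i + 1
        remaining -= took
        power *= 2
        size = power - size
        cost += 1
    return total
```

For n = 1 through 13 this returns 0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, 34, matching the listed values.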

Asymptotically, these numbers are approximately[1]

C(n) ≈ n log₂ n − 1.415n.

For small inputs (up to n = 11) these numbers of comparisons equal the lower bound on comparison sorting of ⌈log₂ n!⌉ ≈ n log₂ n − 1.443n. However, for larger inputs the number of comparisons made by the merge-insertion algorithm is larger than this lower bound. The algorithm also performs fewer comparisons than the sorting numbers, which count the comparisons made by binary insertion sort or merge sort in the worst case. The sorting numbers fluctuate between n log₂ n − n and n log₂ n − 0.915n, with the same leading term as merge-insertion sort but a worse constant factor in the lower-order linear term.[1]
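The sorting numbers in this comparison can be computed directly; the following sketch (names invented for illustration) evaluates the worst-case comparison count of binary insertion sort, in which the kth element is inserted with ⌈log₂ k⌉ comparisons, and the coefficient of its linear term:

```python
from math import log2

def sorting_number(n):
    """Worst-case comparisons of binary insertion sort on n items:
    the kth item is inserted with ceil(log2 k) comparisons."""
    return sum((k - 1).bit_length() for k in range(2, n + 1))

# Coefficient of the linear term, (n log2 n - B(n)) / n; for finite n
# it fluctuates between roughly 0.9 and 1, approaching the interval
# [0.915, 1] asymptotically.
coefficients = {n: (n * log2(n) - sorting_number(n)) / n
                for n in (64, 96, 128, 1000)}
```

The `(k - 1).bit_length()` idiom computes ⌈log₂ k⌉ exactly in integer arithmetic, avoiding floating-point rounding near powers of two.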

References

  1. ^ a b c d e f g h Ford, Lester R. Jr.; Johnson, Selmer M. (1959), "A tournament problem", American Mathematical Monthly, 66: 387–389, doi:10.2307/2308750, MR 0103159
  2. ^ a b c Williamson, Stanley Gill (2002), "2.31 Merge insertion (Ford–Johnson)", Combinatorics for Computer Science, Dover books on mathematics, Courier Corporation, pp. 66–68, ISBN 9780486420769
  3. ^ a b c d Mahmoud, Hosam M. (2011), "12.3.1 The Ford–Johnson algorithm", Sorting: A Distribution Theory, Wiley Series in Discrete Mathematics and Optimization, vol. 54, John Wiley & Sons, pp. 286–288, ISBN 9781118031131
  4. ^ Manacher, Glenn K. (July 1979), "The Ford-Johnson Sorting Algorithm Is Not Optimal", Journal of the ACM, 26 (3): 441–456, doi:10.1145/322139.322145
  5. ^ The original description by Ford & Johnson (1959) sorted the elements in descending order. The steps listed here reverse the output, making the same comparisons but producing ascending order instead.