Merge-insertion sort

In computer science, merge-insertion sort or the Ford–Johnson algorithm is a comparison sorting algorithm published in 1959 by L. R. Ford Jr. and Selmer M. Johnson.^[1]^[2]^[3] It uses fewer comparisons in the worst case than the best previously known algorithms, binary insertion sort and merge sort,^[1] and for 20 years it was the sorting algorithm known to use the fewest comparisons.^[4] Although not of practical significance, it remains of theoretical interest in connection with the problem of sorting with a minimum number of comparisons.^[3]

Algorithm

Merge-insertion sort performs the following steps, on an input $X$ of $n$ elements:^[1]^[2]^[3]

1. Group the elements of $X$ into $\lfloor n/2\rfloor$ pairs of elements, arbitrarily, leaving one element unpaired if there is an odd number of elements.
2. Perform $\lfloor n/2\rfloor$ comparisons, one per pair, to determine the larger of the two elements in each pair.
3. Recursively sort the $\lfloor n/2\rfloor$ elements found to be larger, creating a sorted sequence $S$ of $\lfloor n/2\rfloor$ of the input elements, in ascending order.
4. Insert at the start of $S$ the element that was paired with the first and smallest element of $S$.
5. Insert the remaining $\lceil n/2\rceil -1$ elements of $X\setminus S$ into $S$, one at a time, with a specially chosen insertion ordering described below. Use binary search in $S$ to determine the position at which each element should be inserted.
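The pairing phase (the first two steps above) can be sketched in Python as follows. This is only an illustration: the function name `pair_up` and the `(larger, smaller)` tuple layout are assumptions made here, not part of the published algorithm.

```python
def pair_up(items):
    """Pair adjacent elements, spending exactly one comparison per pair.

    Returns (pairs, leftover): pairs is a list of (larger, smaller) tuples,
    and leftover is a one-element list when len(items) is odd, else empty.
    The pairing is arbitrary in the algorithm; adjacent pairing is assumed
    here for simplicity.
    """
    pairs = []
    for k in range(0, len(items) - 1, 2):
        a, b = items[k], items[k + 1]
        pairs.append((a, b) if b < a else (b, a))  # the only comparison for this pair
    leftover = items[len(items) - len(items) % 2:]  # the unpaired element, if any
    return pairs, leftover


pairs, leftover = pair_up([5, 2, 9, 7, 1])
print(pairs, leftover)  # [(5, 2), (9, 7)] [1]
```

For an input of $n$ elements this yields $\lfloor n/2\rfloor$ pairs, and the first components of the pairs are the elements that are sorted recursively.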

The algorithm is designed to take advantage of the fact that the binary searches used to insert elements into $S$ are most efficient when the subsequence of $S$ that is searched has a length that is one less than a power of two. To order the elements in such a way as to get binary searches of these lengths, consider the sorted sequence $S$ after step 4 of the outline above (before inserting the remaining elements), and let $x_{i}$ denote the $i$th element of this sorted sequence. Thus,

$$S=(x_{1},x_{2},x_{3},\dots ),$$

and each element $x_{i}$ with $i\geq 3$ has a corresponding element $y_{i}$, known to be smaller than $x_{i}$, that has not yet been inserted into the sequence. If $n$ is odd, the remaining unpaired element should also be numbered as one of the elements $y_{i}$, with an index greater by one than the largest index of a paired element. With these indices defined, the final step of the outline above can be expanded into the following steps:^[1]^[2]^[3]

1. Partition the uninserted elements $y_{i}$ into groups with contiguous indexes. There are two elements $y_{3}$ and $y_{4}$ in the first group, and the sums of the sizes of every two adjacent groups form a sequence of powers of two. Thus, the sizes of the groups are: 2, 2, 6, 10, 22, 42, ...
2. Order the uninserted elements by their groups (smaller indexes to larger indexes), but within each group order them from larger indexes to smaller indexes. Thus, the ordering becomes

$$y_{4},y_{3},y_{6},y_{5},y_{12},y_{11},y_{10},y_{9},y_{8},y_{7},y_{22},y_{21},\dots$$

3. Use this ordering to insert the elements $y_{i}$ into $S$. For each element $y_{i}$, use a binary search from the start of $S$ up to but not including $x_{i}$ to determine where to insert $y_{i}$.
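The whole procedure can be put together in a short Python sketch. It is an illustration under stated assumptions rather than a reference implementation: all function names are hypothetical, elements are tracked by their positions in the input so that duplicates are handled, pairs are formed from adjacent elements, and the group sizes are taken to be 2, 2, 6, 10, 22, ... (each two adjacent sizes summing to a power of two). The call to `list.index` locates $x_{i}$ by identity and performs no comparisons between input elements.

```python
def insertion_order(m):
    """0-based positions of m pending elements, in insertion order.

    Position k stands for y_{k+3}. Groups have sizes 2, 2, 6, 10, 22, ...
    and each group is visited from its largest index down to its smallest.
    """
    order, start, size, power = [], 0, 2, 4
    while start < m:
        end = min(start + size, m)
        order.extend(range(end - 1, start - 1, -1))
        start = end
        size, power = power - size, power * 2  # adjacent sizes sum to 4, 8, 16, ...
    return order


def _binary_insert(chain, y, hi, less):
    """Insert y into the sorted prefix chain[:hi] using binary search."""
    lo = 0
    while lo < hi:
        mid = (lo + hi) // 2
        if less(y, chain[mid]):
            hi = mid
        else:
            lo = mid + 1
    chain.insert(lo, y)


def _sort_ids(ids, less):
    n = len(ids)
    if n < 2:
        return list(ids)
    # Pair the elements and compare once per pair.
    partner, larger = {}, []
    for k in range(0, n - 1, 2):
        a, b = ids[k], ids[k + 1]
        if less(a, b):
            a, b = b, a
        larger.append(a)
        partner[a] = b
    # Recursively sort the larger elements of the pairs.
    s = _sort_ids(larger, less)
    # Put the partner of the smallest element of S at the front.
    chain = [partner[s[0]]] + s
    # Pending (x_i, y_i) pairs for i >= 3; an unpaired element has no x_i.
    pending = [(x, partner[x]) for x in s[1:]]
    if n % 2:
        pending.append((None, ids[-1]))
    for k in insertion_order(len(pending)):
        x, y = pending[k]
        hi = len(chain) if x is None else chain.index(x)  # search strictly before x_i
        _binary_insert(chain, y, hi, less)
    return chain


def merge_insertion_sort(values, less=lambda a, b: a < b):
    """Return a sorted copy of values; less(a, b) is the only comparison used."""
    values = list(values)
    ids = _sort_ids(list(range(len(values))),
                    lambda i, j: less(values[i], values[j]))
    return [values[i] for i in ids]


print(insertion_order(12))                   # [1, 0, 3, 2, 9, 8, 7, 6, 5, 4, 11, 10]
print(merge_insertion_sort([5, 2, 9, 7, 1])) # [1, 2, 5, 7, 9]
```

Passing a `less` function that increments a counter makes it possible to check the comparison counts discussed in the analysis, at the cost of extra bookkeeping that a practical sort would avoid.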

Analysis

If this algorithm is used to sort $n$ elements, let $C(n)$ denote the number of comparisons that it makes. Then this number of comparisons can be analyzed as the sum of three terms: $\lfloor n/2\rfloor$ comparisons among the pairs of items, $C(\lfloor n/2\rfloor )$ comparisons for the recursive call, and some number of comparisons for the binary insertions used to insert the remaining elements. The worst-case number of comparisons for the elements in the first group is two, because each is inserted into a subsequence of $S$ of length at most three. Similarly, the worst-case number of comparisons for the elements in the $i$th group is $i+1$, because each is inserted into a subsequence of length at most $2^{i+1}-1$.^[1]^[2]^[3]
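The link between search length and cost can be checked with a small Python sketch. The function name `worst_case_probes` is hypothetical, and the midpoint rule `m // 2` is one assumed way of choosing the probe; the function counts the worst-case comparisons of a binary insertion into a sorted run of length $m$.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def worst_case_probes(m):
    """Worst-case comparisons to binary-insert into a sorted run of length m."""
    if m == 0:
        return 0
    mid = m // 2                 # index probed first
    left, right = mid, m - mid - 1   # lengths of the runs left after the probe
    return 1 + max(worst_case_probes(left), worst_case_probes(right))


for i in range(1, 5):
    full = 2 ** (i + 1) - 1
    # A run of length 2^(i+1) - 1 costs i + 1; one element more costs i + 2.
    print(i, full, worst_case_probes(full), worst_case_probes(full + 1))
```

A run of length 3 costs two comparisons in the worst case and a run of length 7 costs three, while lengths 4 and 8 already cost three and four. This is why the insertion order aims for searched lengths of the form $2^{i+1}-1$.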

Based on this, the number of comparisons used by the algorithm on an $n$-element input can be described by the integer sequence (starting with $n=1$)^[1]

0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, 34, ... (sequence A001768 in the OEIS)
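The three-term count from the analysis can be evaluated directly. The sketch below assumes group sizes 2, 2, 6, 10, 22, ... (each two adjacent sizes summing to a power of two) and charges $i+1$ comparisons to every element of the $i$th group; the function name `worst_case_comparisons` is hypothetical.

```python
def worst_case_comparisons(n):
    """Worst-case comparison count C(n) of merge-insertion sort."""
    if n < 2:
        return 0
    # Pair comparisons plus the recursive call on the larger elements.
    total = n // 2 + worst_case_comparisons(n // 2)
    remaining = (n + 1) // 2 - 1          # ceil(n/2) - 1 elements still to insert
    group, size, power = 1, 2, 4
    while remaining > 0:
        used = min(size, remaining)
        total += used * (group + 1)       # each element of group i costs i + 1
        remaining -= used
        group += 1
        size, power = power - size, power * 2
    return total


print([worst_case_comparisons(n) for n in range(1, 14)])
# [0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, 34]
```

The output reproduces the first thirteen terms of the sequence listed above.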

Asymptotically, these numbers are approximately^[1]

$$n\log _{2}n-1.415n.$$

The number of comparisons agrees with the lower bound on comparison sorting of $\lceil \log _{2}n!\rceil \approx n\log _{2}n-1.443n$ up to $n=11$, but diverges for larger values of $n$. It also compares favorably with the sorting numbers, the numbers of comparisons made by binary insertion sort or merge sort in the worst case, which are approximately $n\log _{2}n-0.915n$.^[1]
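These comparisons can be tabulated for small $n$ with exact integer arithmetic. In the sketch below the function names are hypothetical, the merge-insertion counts are the thirteen terms quoted earlier in this section, and the sorting numbers are computed from the closed form $n\lceil \log _{2}n\rceil -2^{\lceil \log _{2}n\rceil }+1$, which is stated here as an assumption rather than taken from the text.

```python
import math


def sorting_lower_bound(n):
    """ceil(log2(n!)), computed exactly: ceil(log2(k)) == (k - 1).bit_length()."""
    return (math.factorial(n) - 1).bit_length()


def sorting_number(n):
    """Worst-case comparisons of binary insertion sort (assumed closed form)."""
    c = (n - 1).bit_length()              # ceil(log2(n))
    return n * c - 2 ** c + 1


# Comparison counts of merge-insertion sort for n = 1, ..., 13, as listed above.
merge_insertion_counts = [0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, 34]

for n, count in enumerate(merge_insertion_counts, start=1):
    # Columns: n, lower bound, merge-insertion sort, sorting number.
    print(n, sorting_lower_bound(n), count, sorting_number(n))
```

In this table the first two columns of counts coincide through $n=11$ and first differ at $n=12$, where the lower bound is 29 and merge-insertion sort uses 30 comparisons; the sorting numbers are never smaller than the merge-insertion counts.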

References

[1] Ford, Lester R. Jr.; Johnson, Selmer M. (1959), "A tournament problem", American Mathematical Monthly, 66: 387–389, doi:10.2307/2308750, MR 0103159
[2] Williamson, Stanley Gill (2002), "2.31 Merge insertion (Ford–Johnson)", Combinatorics for Computer Science, Dover books on mathematics, Courier Corporation, pp. 66–68, ISBN 9780486420769
[3] Mahmoud, Hosam M. (2011), "12.3.1 The Ford–Johnson algorithm", Sorting: A Distribution Theory, Wiley Series in Discrete Mathematics and Optimization, vol. 54, John Wiley & Sons, pp. 286–288, ISBN 9781118031131
[4] Manacher, Glenn K. (July 1979), "The Ford-Johnson Sorting Algorithm Is Not Optimal", Journal of the ACM, 26 (3): 441–456, doi:10.1145/322139.322145