Merge-insertion sort

This is an old revision of this page, as edited by David Eppstein (talk | contribs) at 23:21, 12 August 2018 (New article). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff)

In computer science, merge-insert sort or the Ford–Johnson algorithm is a comparison sorting algorithm published in 1959 by L. R. Ford Jr. and Selmer M. Johnson.[1][2][3] It uses fewer comparisons in the worst case than the best previously known algorithms, binary insertion sort and merge sort,[1] and for 20 years it was the sorting algorithm with the fewest known comparisons.[4] Although not of practical significance, it remains of theoretical interest in connection with the problem of sorting with a minimum number of comparisons.[3]

Algorithm

Merge-insert sort performs the following steps, on an input   of   elements:[1][2][3]

  1. Group the elements of   into   pairs of elements, arbitrarily, leaving one element unpaired if there is an odd number of elements.
  2. Perform   comparisons, one per pair, to determine the larger of the two elements in each pair.
  3. Recursively sort the   elements found to be larger, creating a sorted sequence   of   of the input elements, in ascending order.
  4. Insert at the start of   the element that was paired with the first and smallest element of  .
  5. Insert the remaining   elements of   into  , one at a time, with a specially chosen insertion ordering described below. Use binary search in   to determine the position at which each element should be inserted.

The algorithm is designed to take advantage of the fact that the binary searches used to insert elements into   are most efficient when the subsequence of   that is searched has a length that is one less than a power of two. To order the elements in such a way as to get binary searches of these lengths, consider the sorted sequence   after step 4 of the outline above (before inserting the remaining elements), and let   denote the  th element of this sorted sequence. Thus,

 

and each element   with   has a corresponding element  , known to be smaller than  , that has not yet been inserted into the sequence. If   is odd, the remaining unpaired element should also be numbered as one of the elements  , with an index greater by one than the largest index of a paired element. With these indices defined, the final step of the outline above can be expanded into the following steps:[1][2][3]

  • Partition the uninserted elements   into groups with contiguous indexes. There are two elements   and   in the first group, and the size of each subsequent group equals the number of elements in all previous groups. Thus, the sizes of the groups form a sequence of powers of two: 2, 2, 4, 8, 16, ...
  • Order the uninserted elements by their groups (smaller indexes to larger indexes), but within each group order them from larger indexes to smaller indexes. Thus, the ordering becomes
 
  • Use this ordering to insert the elements   into  . For each element  , use a binary search from the start of   up to but not including   to determine where to insert  .

Analysis

If this algorithm is used to sort   elements, let   denote the number of comparisons that it makes. Then this number of comparisons can be analyzed as the sum of three terms:   comparisons among the pairs of items,   comparisons for the recursive call, and some number of comparisons for the binary insertions used to insert the remaining elements. The worst-case number of comparisons for the elements in the first group is two, because each is inserted into a subsequence of   of length at most three. Similarly, the worst-case number of comparisons for the elements in the  th group is  , because each is inserted into a subsequence of length at most  .[1][2][3]

Based on this, the number of comparisons used by the algorithm on an  -element input can be described by the integer sequence (starting with  )[1]

0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, 34, ... (sequence A001768 in the OEIS)

Asymptotically, these numbers are approximately[1]

 

The number of comparisons agrees with the lower bound on comparison sorting of   up to  , but diverges for larger values of  . It also compares favorably with the sorting numbers, the numbers of comparisons made by binary insertion sort or merge sort in the worst case, which are approximately  .[1]

References

  1. ^ a b c d e f g h Ford, Lester R. Jr.; Johnson, Selmer M. (1959), "A tournament problem", American Mathematical Monthly, 66: 387–389, doi:10.2307/2308750, MR 0103159
  2. ^ a b c d Williamson, Stanley Gill (2002), "2.31 Merge insertion (Ford–Johnson)", Combinatorics for Computer Science, Dover books on mathematics, Courier Corporation, pp. 66–68, ISBN 9780486420769
  3. ^ a b c d e Mahmoud, Hosam M. (2011), "12.3.1 The Ford–Johnson algorithm", Sorting: A Distribution Theory, Wiley Series in Discrete Mathematics and Optimization, vol. 54, John Wiley & Sons, pp. 286–288, ISBN 9781118031131
  4. ^ Manacher, Glenn K. (July 1979), "The Ford-Johnson Sorting Algorithm Is Not Optimal", Journal of the ACM, 26 (3): 441–456, doi:10.1145/322139.322145