Merge algorithm

Merge algorithms are a family of algorithms that run sequentially over multiple sorted lists to produce a single sorted list as output. These algorithms are used as subroutines in various sorting algorithms, most famously merge sort.

Binary merge

The simplest merge algorithm is the one employed in classical merge sort. This binary merge algorithm takes two sorted lists as input, and repeatedly picks off an element from either the "left" or "right" list, whichever is smaller. When one list is exhausted, it produces all elements from the other list.

K-way merging

$k$ -way merging generalizes binary merging to an arbitrary number $k$ of sorted input lists. Several solutions to this problem exist. A naive solution is to do a loop over the $k$ sequences to pick off the minimum element each time, and repeat this loop until all sequences are empty:

Input: a sequence of $k$ sequences.
While any of the sequences is non-empty:
- Loop over the sequences to find the one with the minimum first element.
- Output the minimum element and remove it from its sequence.

In the worst case, this algorithm performs $(k -1)(n - k /2$ element comparisons to perform its work if there are a total of $n$ elements in the sequences in total.^[1] It can be improved by storing the sequences in a priority queue (min-heap) keyed by their first element:

Input: a sequence of $k$ sequences.
Build a min-heap $h$ of the $k$ sequences, using the first element as the key.
While any of the sequences is non-empty:
- Let $i = find-min(h)$ .
- Output the first element of sequence $i$ and remove it from its sequence.
- Re-heapify $h$ .

Searching for the next smallest element to be output (find-min) and restoring heap order can now be done in $O (log k)$ time (more specifically, $2⌊log k ⌋$ comparisons^[1]), and the full problem can be solved in $O (n log k)$ time.^[2]

A third algorithm for the problem is a divide and conquer solution that builds on the binary merge algorithm:

Input: a sequence of $k$ sequences.
If $k = 1$ , output the single input sequence.
If $k = 2$ , perform a binary merge.
Else, recursively merge the first $⌊ k /2⌋$ sequences and the final $⌈ k /2⌉$ sequences, then binary merge these.

The divide and conquer algorithm requires fewer than $n ⌈log k ⌉$ comparisons, i.e., less than half the number used by the heap-based algorithm.^[1]

Applications

Applications of $k$ -way merging arise various sorting algorithms, including patience sorting^[3] and an external sorting algorithm that divides its input into $k = .mw-parser-output .sfrac{white-space:nowrap}.mw-parser-output .sfrac.tion,.mw-parser-output .sfrac .tion{display:inline-block;vertical-align:-0.5em;font-size:85%;text-align:center}.mw-parser-output .sfrac .num{display:block;line-height:1em;margin:0.0em 0.1em;border-bottom:1px solid}.mw-parser-output .sfrac .den{display:block;line-height:1em;margin:0.1em 0.1em}.mw-parser-output .sr-only{border:0;clip:rect(0,0,0,0);clip-path:polygon(0px 0px,0px 0px,0px 0px);height:1px;margin:-1px;overflow:hidden;padding:0;position:absolute;width:1px}⁠1/M⁠ − 1$ blocks that fit in memory, sorts these one by one, then merges these blocks.^[2]

A $k$ -way in-memory merge sort can also be constructed from a $k$ -way merge algorithm, but his provides no benefits over the ordinary binary merge sort.^[1]

Parallel merge

In parallel computing, arrays of sorted values may be merged efficiently using an all nearest smaller values computation.^[4]

Parallel merge can also be implemented using a divide-and-conquer algorithm, developed and shown in pseudo-code in.^[5] This algorithm performs well when combined with a fast sequential merge as a base case for merging of small arrays. Implementation using Intel's Threading Building Blocks (TBB) and Microsoft's Parallel Pattern Library (PPL) to run on multi-core processors is shown to perform well in practice.^[6]

Language support

Some computer languages provide built-in or library support for merging sorted collections.

C++

The C++'s Standard Template Library has the function std::merge, which merges two sorted ranges of iterators, and std::inplace_merge, which merges two consecutive sorted ranges in-place. In addition, the std::list (linked list) class has its own merge method which merges another list into itself. The type of the elements merged must support the less-than (<) operator, or it must be provided with a custom comparator.

Python

Python (programming language)'s standard library (since 2.6) also has a merge function in the heapq module, that takes multiple sorted iterables, and merges them into a single iterator.^[7]

References

^ ^a ^b ^c ^d Greene, William A. (1993). k-way Merging and k-ary Sorts. Proc. 31-st Annual ACM Southeast Conf. pp. 127–135.
^ ^a ^b Kurt Mehlhorn; Peter Sanders (2008). Algorithms and Data Structures: The Basic Toolbox (PDF). Springer. pp. 119–120. ISBN 978-3-540-77978-0.
^ Chandramouli, Badrish; Goldstein, Jonathan (2014). Patience is a Virtue: Revisiting Merge and Sort on Modern Processors. SIGMOD/PODS.
^ Berkman, Omer; Schieber, Baruch; Vishkin, Uzi (1993), "Optimal double logarithmic parallel algorithms based on finding all nearest smaller values", Journal of Algorithms, 14 (3): 344–370, doi:10.1006/jagm.1993.1018
^ Cormen et al. 2009, p. 800
^ V. J. Duvanenko, "Parallel Merge", Dr. Dobb's Journal, February 2011
^ https://docs.python.org/library/heapq.html#heapq.merge