K-way merge algorithm

This is an old revision of this page, as edited by Kl4bx (talk | contribs) at 06:00, 21 November 2015 (2-Way Merge). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This sandbox is in the article namespace. Either move this page into your userspace, or remove the {{User sandbox}} template.

Sequence Merge Algorithms are a family of algorithms that run sequentially over multiple sorted lists, typically producing more sorted lists as output. K-Way Merge Algorithms specifically deal with taking in an arbitrary number of sorted lists greater than one, and then merging them all together into a single sorted list.

2-Way Merge

A 2-Way Merge, or a binary merge, has been studied extensively due to its key role in Merge sort. The classic merge outputs the data item with the lowest key at each step; given some sorted lists, it produces a sorted list containing all the elements in any of the input lists, and it does so in time proportional to the sum of the lengths of the input lists.

Example[1]

Assume, that we have two arrays A[0..m-1] and B[0..n-1] that are sorted in ascending order and we want to merge them into an array C[0..m+n-1] with the same order. 

  1. Introduce read-indices ij to traverse arrays A and B, accordingly. Introduce write-index k to store position of the first free cell in the resulting array. By default i = j = k = 0.
  2. At each step: if both indices are in range (i < m and j < n), choose minimum of (A[i], B[j]) and write it to C[k]. Otherwise go to step 4.
  3. Increase k and index of the array, algorithm located minimal value at, by one. Repeat step 2.
  4. Copy the rest values from the array, which index is still in range, to the resulting array.

Hwang-Lin Merge

The classic merge is a simple sort that completes in linear time Ο(n + m).

Optimal 2-Way Merge

 
An example of Huffman Coding, which uses the same technique as in the optimal merge. The values shown would represent each list length.

The 2-Way Merge function found in merge sort can also be used as a tool to merge more than two lists.

Let D={n1, ... , nk} be the set of sequences to be merged. Pick ni, nj∈ D and then merge them together using the merge function. The new set D is then D' = (D - {ni, nj}) ∪ {ni+nj}. This process is repeated until |D| = 1. The question then becomes how to pick ni and nj. How the merge algorithm picks ni and nj determines the cost of the overall algorithm.

The optimal merge pattern is found by utilizing a Greedy algorithm that selects the two shortest lists at each time to merge. This technique is similar to the one used in Huffman coding. The algorithm picks  ni, nj∈ D such that |n| ≥ |ni| and |n| ≥ |nj| ∀ n∈ D[2].

Ideal Merge

The Ideal Merge technique is another merge method for merging greater than two lists. The ideal merging technique was first discussed as a part of UnShuffle Sort.[3]

Given a group of sorted lists S that we want to merge into list S', the algorithm is as follows:

  1. Each list is sorted by the value of its head element
  2. Then the head element of the first list is removed and placed into S'
  3. After the head element of the first list is removed, it is resorted according to the value of its new head element
  4. This repeats until all lists are empty

Depending on how ideal merge is implemented, the running time can vary. If ideal merge keeps its information in a sorted list, then inserting a new head element to the list would be done through a Linear search and the running time will be Θ(M•N) where M is the total number of elements in the sorted lists, and N is the total number of sorted lists.

On the other hand, if the sorted items in a heap, then the running time becomes Θ(M log N).

In-Place Multiway Merging

References

  • Knuth, Donald (1998). "Section 5.2.4: Sorting by Merging". Sorting and Searching. The Art of Computer Programming. Vol. 3 (2nd ed.). Addison-Wesley. pp. 158–168. ISBN 0-201-89685-0. {{cite book}}: Invalid |ref=harv (help)
  1. ^ "ALGORITHM TO MERGE SORTED ARRAYS (Java, C++) | Algorithms and Data Structures". www.algolist.net. Retrieved 2015-11-19.
  2. ^ KABAT, MANAS RANJAN (2013-08-21). DESIGN AND ANALYSIS OF ALGORITHMS. PHI Learning Pvt. Ltd. ISBN 9788120348066.
  3. ^ Art S. KagelUnshuffle Algorithm, Not Quite a Sort?, Computer Language Magazine, 3(11), November 1985.