K-way merge algorithm: Difference between revisions

Content deleted Content added
m Pzoxicuvybtnrm moved page K-way merge algorithms to K-way merge algorithm: should be singular
fix capitalisation
Line 1:
{{DISPLAYTITLE:''k''-way merge algorithm}}
 
{{orphan|date=November 2015}}
In [[computer science]], '''K''k''-Wayway Mergemerge Algorithmsalgorithms''' or Multiwaymultiway Mergesmerges are a specific type of [[Merge algorithm|Sequencesequence Mergemerge Algorithmsalgorithms]] that specialize in taking in multiple sorted lists and merging them into a single sorted list. These merge algorithms generally refer to merge algorithms that take in a number of sorted lists greater than two. 2-Wayway Mergesmerges are also referred to as binary merges.
 
== 2-Wayway Mergemerge ==
A 2-Wayway Mergemerge, or a binary merge, has been studied extensively due to its key role in [[Mergemerge sort]]. An example of such is the classic merge that appears frequently in merge sort examples. The classic merge outputs the data item with the lowest key at each step; given some sorted lists, it produces a sorted list containing all the elements in any of the input lists, and it does so in time proportional to the sum of the lengths of the input lists.
 
Denote by A[1..p] and B[1..q] two arrays sorted in increasing order.
Further, denote by C[1..n] the output array.
The canonical 2-Way Mergemerge algorithm<ref>{{cite book|author1=Thomas H. Cormen|authorlink1=Thomas H. Cormen|author2=Charles E. Leiserson|author3=Ronald L. Rivest|authorlink3=Ron Rivest|author4=Clifford Stein|title=Introduction To Algorithms|url=https://books.google.com/books?id=NLngYyWFl_YC&pg=PA11|year=2001|publisher=MIT Press|isbn=978-0-262-03293-3|pages=28–29}}</ref> stores indices i, j, and k into A, B, and C respectively.
Initially, these indices refer to the first element, i.e., are 1.
If A[i] < B[j], then the algorithm copies A[i] into C[k] and increases i and k.
Line 14 ⟶ 16:
In this case the algorithm copies the remaining elements of B or A into C and terminates.
 
== K''k''-Wayway Mergemerge ==
The K''k''-way merge problem consists of merging k sorted arrays to produce a single sorted array with the same elements.
Denote by n the total number of elements.
n is equal to the size of the output array and the sum of the sizes of the k input arrays.
Line 23 ⟶ 25:
Several algorithms that achieve this running time exist.
 
=== Iterative 2-Way Mergemerge ===
 
The problem can be merged by iteratively merging two of the k arrays using a 2-Wayway Mergemerge until only a single array is left. If the arrays are merged in arbitrary order, then the resulting running time is only O(kn). This is suboptimal.
 
The running time can be improved by iteratively merging the first with the second, the third with the fourth, and so on. As the number of arrays is halved in each iteration, there are only Θ(log k) iteration. In each iteration every element is moved exactly once. The running time per iteration is therefore in Θ(n) as n is the number of elements. The total running time is therefore in Θ(n log k).
Line 31 ⟶ 33:
We can further improve upon this algorithm, by iteratively merging the two shortest arrays. It is clear that this minimizes the running time and can therefore not be worse than the strategy described in the previous paragraph. The running time is therefore in O(n log k). Fortunately, in border cases the running time can be better. Consider for example the degenerate case, where all but one array contain only one element. The strategy explained in the previous paragraph needs Θ(n log k) running time, while the improved one only needs Θ(n) running time.
 
=== Direct K''k''-Wayway Mergemerge ===
 
The basic idea of a direct K''k''-Wayway Mergemerge consists of efficiently computing the minimum element of all k arrays and then transferring it to the output array.
A straightforward implementation would scan all k arrays to determine the minimum.
This straightforward implementation results in a running time of Θ(kn).
Line 67 ⟶ 69:
 
===== Example =====
now Suppose we have 4 sorted arrays:<blockquote><code>{5, 10, 15, 20}</code></blockquote><blockquote><code>{10, 13, 16, 19}</code></blockquote><blockquote><code>{2, 19, 26, 40}</code> </blockquote><blockquote><code>{18, 22, 23, 24}</code></blockquote>We start with the heads of each array and then build a binary tree from there.
 
[[File:Binary Ideal Merge 1.png|centre|thumb]]
Line 79 ⟶ 81:
=== Lower Running Time Bound ===
 
One can show that no comparison-based K''k''-Wayway Mergemerge algorithm exists with a running time in O(n f(k)) where f grows asymptotically slower than a logarithm.
The proof is a straightforward reduction from comparison-based sorting.
Suppose that such an algorithm existed, then we could construct a comparison-based sorting algorithm with running time O(n f(n)) as follows:
Chop the input array into n arrays of size 1.
Merge these n arrays with the K''k''-Wayway Mergemerge algorithm.
The resulting array is sorted and the algorithm has a running time in O(n f(n)).
This is a contradiction to the well-known result that no comparison-based sorting algorithm with a worst case running time below O(n log n) exists.
 
== External sorting ==
== External Sorting<ref>{{Cite book|title = Data Structures and Algorithm Analysis in C++, Third Edition|url = https://books.google.com/books?id=AijEAgAAQBAJ|publisher = Courier Corporation|date = 2012-07-26|isbn = 9780486172620|first = Clifford A.|last = Shaffer}}</ref> ==
{{See also|External sorting}}
K''k''-way merges findare great use inused external sorting procedures.<ref>{{Cite book|title = Data Structures and Algorithm Analysis in C++, Third Edition|url = https://books.google.com/books?id=AijEAgAAQBAJ|publisher = Courier Corporation|date = 2012-07-26|isbn = 9780486172620|first = Clifford A.|last = Shaffer}}</ref> [[External sorting|External sorting algorithms]] are a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive). K''k''-way merge algorithms usually take place in the second stage of external sorting algorithms, much like they do for merge sort.
 
A multiway merge allows for the files outside of memory to be merged in fewer passes than in a binary merge. If there are 6 runs that need be merged then a binary merge would need to take 3 merge passes, as opposed to a 6-way merges single merge pass. This reduction of merge passes is especially important considering the large amount of information that is usually being sorted in the first place, allowing for greater speed-ups while also reducing the amount of accesses to slower memory.