m SwisterTwister moved page Draft:K-Way Merge Algorithms to K-Way Merge Algorithms: Publishing accepted Articles for creation submission (AFCH 0.9) |
Happysailor (talk | contribs) complete accepted draft, minor fixes |
In computer science, '''K-Way Merge Algorithms''' are a specific type of [[Merge algorithm|sequence merge algorithms]] that take in multiple sorted lists and merge them into a single sorted list. The term generally refers to merge algorithms that take in more than two sorted lists. 2-way merges, by contrast, are referred to as binary merges, and they are also used as a building block in k-way merge algorithms. K-way merge algorithms often find use in
== 2-Way Merge ==
A 2-Way Merge, or binary merge, has been studied extensively due to its key role in [[Merge sort]]. An example is the classic merge that appears frequently in merge sort examples. The classic merge outputs the data item with the lowest key at each step; given sorted input lists, it produces a sorted list containing all the elements of the input lists, and it does so in time proportional to the sum of the lengths of the input lists. Some algorithms can do better than this in certain cases, such as the Hwang-Lin Merging Algorithm, which needs fewer comparisons when one list is much shorter than the other.<ref>F.K. Hwang and S. Lin, "A Simple Algorithm for Merging Two Disjoint Linearly Ordered Sets", SIAM Journal on Computing 1 (1972), 31-39.</ref>
=== Example<ref>{{Cite web|title = ALGORITHM TO MERGE SORTED ARRAYS (Java, C++) {{!}} Algorithms and Data Structures|url = http://www.algolist.net/Algorithms/Merge/Sorted_arrays|website = www.algolist.net|accessdate = 2015-11-19}}</ref> ===
Assume that we have two arrays A[0..m-1] and B[0..n-1] that are sorted in ascending order, and we want to merge them into an array C[0..m+n-1] with the same order.
# Introduce read-indices '''i''', '''j''' to traverse arrays A and B, respectively. Introduce write-index '''k''' to store the position of the first free cell in the resulting array. Initially '''i''' = '''j''' = '''k''' = 0.
# At each step: if both indices are in range ('''i''' < m and '''j''' < n), choose the minimum of (A['''i'''], B['''j''']) and write it to C['''k''']. Otherwise go to step 4.
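# Increase '''k''' and the read-index of the array from which the minimum was taken. Repeat step 2.
# Copy the remaining values from the array whose index is still in range to the resulting array.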
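The steps of this example can be sketched in Python as follows. This is a minimal illustration rather than the code from the cited source: the function name <code>merge_two</code> is an assumption, and the sketch appends to a result list instead of keeping an explicit write-index '''k'''.

```python
def merge_two(a, b):
    """Merge two ascending lists into a new ascending list."""
    i = j = 0          # read-indices into a and b
    c = []             # result; len(c) plays the role of the write-index k
    # While both indices are in range, copy the smaller head element.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # At most one of these slices is non-empty: copy the leftover tail.
    c.extend(a[i:])
    c.extend(b[j:])
    return c
```

Using <code><=</code> rather than <code><</code> takes equal elements from the first list first, which keeps the merge stable.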
=== Applying 2-Way Merge when k > 2 ===
If the number of sorted lists k is greater than 2, the 2-Way Merge function found in merge sort can still be used to merge everything into a single sorted list.
[[File:HuffmanCodeAlg.png|thumb|An example of Huffman Coding, which uses the same technique as in the optimal merge. The values shown would represent each list length.]]
==== Non-optimal ====
Let D = {n<sub>1</sub>, ... , n<sub>k</sub>} be the set of sequences to be merged. Pick n<sub>i</sub>, n<sub>j</sub> ∈ D and merge them using the merge function. The new set is then D' = (D - {n<sub>i</sub>, n<sub>j</sub>}) ∪ {n<sub>i</sub>+n<sub>j</sub>}, where n<sub>i</sub>+n<sub>j</sub> denotes the merged sequence. This process is repeated until |D| = 1. How the algorithm picks n<sub>i</sub> and n<sub>j</sub> at each step determines the cost of the overall algorithm. In the worst case the running time reaches O(m<sup>2</sup> • n), where m is taken to be the number of lists and n the length of the longest list.
==== Optimal merge pattern ====
The optimal merge pattern is found by using a [[Greedy algorithm]] that selects the two shortest lists to merge at each step. This technique is similar to the one used in [[Huffman coding]]. The algorithm picks n<sub>i</sub>, n<sub>j</sub> ∈ D such that |n| ≥ |n<sub>i</sub>| and |n| ≥ |n<sub>j</sub>| for every other n ∈ D.<ref>{{Cite book|title = DESIGN AND ANALYSIS OF ALGORITHMS|url = https://books.google.com/books?id=jYz1AAAAQBAJ|publisher = PHI Learning Pvt. Ltd.|date = 2013-08-21|isbn = 9788120348066|language = en|first = MANAS RANJAN|last = KABAT}}</ref>
The optimal merge pattern, on the other hand, can reduce the running time to O(m • n • log m).<ref>{{Cite book|title = Discrete Mathematics|url = https://books.google.com/books?id=tUAZAQAAIAAJ|publisher = Addison-Wesley|date = 1997-01-01|isbn = 9780673980397|language = en}}</ref> Choosing the shortest lists to merge each time minimizes how many times the same value has to be copied into a new merged list.
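The effect of the merge order can be illustrated with a small Python sketch. It works on list lengths only and assumes that merging two lists of lengths a and b costs a + b element copies; the function names <code>optimal_merge_cost</code> and <code>sequential_merge_cost</code> are hypothetical and chosen for this illustration.

```python
import heapq


def optimal_merge_cost(lengths):
    """Total copies when the two shortest lists are always merged first."""
    heap = list(lengths)
    heapq.heapify(heap)              # min-heap keyed on list length
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)      # shortest remaining list
        b = heapq.heappop(heap)      # second shortest
        total += a + b               # assumed cost of one 2-way merge
        heapq.heappush(heap, a + b)  # the merged list rejoins the set
    return total


def sequential_merge_cost(lengths):
    """Total copies when lists are merged left to right in the given order."""
    total = 0
    merged = 0
    for index, length in enumerate(lengths):
        merged += length
        if index > 0:                # the first list is not copied on its own
            total += merged
    return total
```

For lists of lengths 20, 30, 10, 5 and 30, the greedy order costs 205 copies under this cost model, while merging them in the order given costs 270.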
== Ideal Merge ==
The Ideal Merge technique is another method for merging more than two lists, but it does not rely on the 2-Way merge. The ideal merging technique was discussed and saw use as a part of UnShuffle Sort.<ref>'''Art S. Kagel''', ''Unshuffle Algorithm, Not Quite a Sort?'', Computer Language Magazine, 3(11), November 1985.</ref>
Given a group of sorted lists ''S'' that we want to merge into list ''S''', the algorithm is as follows:
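# Each list is sorted by the value of its head element.
# The head element of the first list is removed and placed into ''S'''.
# The first list is then put back in order among the other lists according to the value of its new head element, or dropped if it is now empty.
# This repeats until all lists are empty.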
The head elements are generally stored in a priority queue, and the running time depends on how that priority queue is implemented. If ideal merge keeps the heads in a sorted list, then inserting a new head element into the list is done through a [[Linear search]] and the running time is Θ(M • N), where N is the total number of elements in the sorted lists and M is the total number of sorted lists.
On the other hand, if the head elements are kept in a [[Heap (data structure)|heap]], then the running time becomes Θ(N log M).
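A minimal Python sketch of the heap-based variant is shown below, using the standard <code>heapq</code> module as the priority queue. The function name <code>k_way_merge</code> and the (value, list index, position) tuple layout are assumptions made for this illustration.

```python
import heapq


def k_way_merge(lists):
    """Merge any number of ascending lists into one ascending list."""
    # One heap entry per non-empty list: (head value, list index, position).
    # The list index breaks ties, so equal values never compare further.
    heap = [(lst[0], idx, 0) for idx, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, idx, pos = heapq.heappop(heap)   # smallest head overall
        merged.append(value)
        pos += 1
        if pos < len(lists[idx]):               # refill from the same list
            heapq.heappush(heap, (lists[idx][pos], idx, pos))
    return merged
```

The heap never holds more than M entries, so each of the N output elements costs O(log M) work. Python's standard library also offers <code>heapq.merge</code>, which performs the same kind of merge lazily over iterables.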
=== Binary Tree Implementation ===
The above technique can be implemented using a binary tree as its priority queue.<ref>{{Cite book|title = Fundamentals of data structures|url = https://books.google.com/books?id=kdRQAAAAMAAJ|publisher = Computer Science Press|date = 1983-01-01|isbn = 9780914894209|language = en|first = Ellis|last = Horowitz|first2 = Sartaj|last2 = Sahni}}</ref> This can help limit the number of comparisons between the head elements. The binary tree is built from the results of comparing the heads of the arrays: each leaf holds the head of one array, and each internal node holds the smaller of its two children. The value at the topmost node is then popped off, and the leaf it came from is refilled with the next element of its array.
As an example, take the following four sorted arrays:<ref>{{Cite web|title = Very fast merge sort - Google Project Hosting|url = https://code.google.com/p/kway/|website = code.google.com|accessdate = 2015-11-22}}</ref>
<blockquote><code>{5, 10, 15, 20}</code></blockquote>
<blockquote><code>{10, 13, 16, 19}</code></blockquote>
<blockquote><code>{2, 19, 26, 40}</code></blockquote>
<blockquote><code>{18, 22, 23, 24}</code></blockquote>
We start with the heads of each array and then build a binary tree from there.
[[File:Binary Ideal Merge 1.png|centre|thumb]]
The head elements of the arrays are compared with each other, and the value 2 is found to be the lowest. That value is then popped off, and its leaf is refilled with the next value in its list.
[[File:Binary Ideal Merge 2.png|centre|thumb]]
The leaf that held the value 2 is refilled with the next value in its sorted list, 19. The comparisons along the path from that leaf to the root are repeated and end with 5 as the smallest value, which is therefore the next value to be popped off. This continues until all of the sorted lists are empty.
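One possible reading of this binary tree is a winner (tournament) tree stored in an array, sketched below in Python. The function name <code>tournament_merge</code>, the padding of the leaf count to a power of two, and the flag used to mark exhausted arrays are all assumptions of this sketch rather than details taken from the cited sources.

```python
def tournament_merge(lists):
    """Merge ascending lists using a winner tree over their head elements."""
    k = len(lists)
    if k == 0:
        return []
    size = 1
    while size < k:                  # pad the leaf count to a power of two
        size *= 2
    pos = [0] * k                    # read position in each input list

    def head(i):
        # (0, value, i) for a live head; (1, 0, i) sorts after every live head.
        if i < k and pos[i] < len(lists[i]):
            return (0, lists[i][pos[i]], i)
        return (1, 0, i)

    # tree[1] is the root; the leaves occupy tree[size .. 2*size - 1].
    tree = [None] * (2 * size)
    for i in range(size):
        tree[size + i] = head(i)
    for node in range(size - 1, 0, -1):
        tree[node] = min(tree[2 * node], tree[2 * node + 1])

    merged = []
    while tree[1][0] == 0:           # stop once every list is exhausted
        _, value, i = tree[1]
        merged.append(value)         # pop the overall winner
        pos[i] += 1
        node = size + i
        tree[node] = head(i)         # refill the leaf it came from
        node //= 2
        while node >= 1:             # replay matches on the path to the root
            tree[node] = min(tree[2 * node], tree[2 * node + 1])
            node //= 2
    return merged
```

Applied to the four arrays of the example, the first value popped is 2 and the second is 5, matching the walkthrough above. After each pop, only the nodes on the path from the refilled leaf to the root are recomputed, which is where the saving in comparisons comes from.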
== References ==
{{Reflist}}
== Further reading ==
* {{cite book| last = Knuth| first = Donald| authorlink = Donald Knuth| series = [[The Art of Computer Programming]]| volume= 3| title= Sorting and Searching| edition = 2nd| publisher = Addison-Wesley| year= 1998| chapter = Section 5.2.4: Sorting by Merging| pages = 158–168| isbn = 0-201-89685-0| ref = harv}}
* {{cite book|author1=Thomas H Cormen|author2=Charles E Leiserson|author3=Ronald L Rivest|coauthors=Clifford Stein|title=Introduction To Algorithms|url=http://books.google.com/books?id=NLngYyWFl_YC&pg=PA11|year=2001|publisher=MIT Press|isbn=978-0-262-03293-3|pages=28–29}}