Binary heap: Difference between revisions

DGPickett (talk | contribs)
Tag: Reverted
Added missing space complexity - it was not only missing but produced a poorly formatted page.
 
(35 intermediate revisions by 19 users not shown)
Line 1:
{{Short description|Variant of heap data structure}}
{{Infobox data structure
| name = Binary (min) heap
| type = binary tree/heap
| invented_by = [[J. W. J. Williams]]
| invented_year = 1964
<!-- NOTE:
For the purposes of "Big O" notation, all bases are equivalent, because changing the base of a log only introduces a constant factor.
Since the base of logarithms doesn't matter, please do not write complexity expressions that indicate base-2 (or any other base).
DO: O(log n)
DON'T: O(lg n), O(log2 n), O(log_2 n), O(ln n), O(log10 n), etc.
-->
| search_avg = O(''n'')
| search_worst = O(''n'')
| insert_avg = O(1)
| insert_worst = O(log ''n'')
| delete_min_avg = O(log ''n'')
| delete_min_worst = O(log ''n'')
| decrease_key_avg = O(log ''n'')
| decrease_key_worst = O(log ''n'')
| find_min_avg = O(1)
| find_min_worst = O(1)
| merge_avg = O(''n'')
| merge_worst = O(''n'')
| space_avg = O(''n'')
| space_worst = O(''n'')
}}
[[File:Max-Heap.svg|thumb|right|Example of a complete binary max-heap]]
[[File:Min-heap.png|thumb|right|Example of a complete binary min heap]]
A '''binary heap''' is a [[heap (data structure)|heap]] [[data structure]] that takes the form of a [[binary tree]]. Binary heaps are a common way of implementing [[priority queue]]s.{{r|clrs|pp=162–163}} The binary heap was introduced by [[J. W. J. Williams]] in 1964, as a data structure for implementing [[heapsort]].<ref>{{Citation |first=J. W. J. |last=Williams |author-link=J. W. J. Williams |title=Algorithm 232 - Heapsort |year=1964 |journal=[[Communications of the ACM]] |volume=7 |issue=6 |pages=347–348 |doi= 10.1145/512274.512284}}</ref>
 
A binary heap is defined as a binary tree with two additional constraints:<ref>{{citation | author=Y Narahari | title=Data Structures and Algorithms | chapter=Binary Heaps | url=https://gtl.csa.iisc.ac.in/dsa/ | chapter-url=http://lcm.csa.iisc.ernet.in/dsa/node137.html}}</ref>
 
*Shape property: a binary heap is a ''[[complete binary tree]]''; that is, all levels of the tree, except possibly the last one (deepest) are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.
*Heap property: the key stored in each node is either greater than or equal to (≥) or less than or equal to (≤) the keys in the node's children, according to some [[total order]].
 
Heaps where the parent key is greater than or equal to (≥) the child keys are called ''max-heaps''; those where it is less than or equal to (≤) are called ''min-heaps''. Efficient (that is, [[logarithmic time]]) algorithms are known for the two operations needed to implement a priority queue on a binary heap:
*Inserting an element;
*Removing the smallest or largest element from a min-heap or max-heap, respectively.
Binary heaps are also commonly employed in the [[heapsort]] [[sorting algorithm]], which is an in-place algorithm, as binary heaps can be implemented as an [[implicit data structure]], storing keys in an array and using their relative positions within that array to represent child–parent relationships.
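
The implicit array representation can be illustrated with a minimal sketch. It assumes the heap is stored level by level in a 0-indexed Python list, so that the shape property holds automatically and the children of index <code>i</code> sit at <code>2i + 1</code> and <code>2i + 2</code>. The function name <code>is_max_heap</code> is illustrative only.

```python
def is_max_heap(a):
    """Return True if list a, read level by level, satisfies the max-heap property."""
    n = len(a)
    for i in range(n):
        # The children of index i, if they exist, live at 2i + 1 and 2i + 2.
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[i] < a[child]:
                return False   # a child is larger than its parent
    return True
```

No pointers are stored: the parent–child relationship is recovered purely from positions in the list.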
 
==Heap operations==
Both the insert and remove operations modify the heap to preserve the shape property first, by adding or removing from the end of the heap. Then the heap property is restored by traversing up or down the heap. Both operations take {{nowrap|O(log ''n'')}} time.
 
=== Insert ===
To insert an element into a heap, we perform the following steps:
 
# Add the element to the bottom level of the heap at the leftmost open space.
# Compare the added element with its parent; if they are in the correct order, stop.
# If not, swap the element with its parent and return to the previous step.
Steps 2 and 3, which restore the heap property by comparing and possibly swapping a node with its parent, are called the ''up-heap'' operation (also known as ''bubble-up'', ''percolate-up'', <!-- not a typo: -->''sift-up'', ''trickle-up'', ''swim-up'', ''heapify-up'', ''cascade-up'', or ''fix-up'').
The number of operations required depends only on the number of levels the new element must rise to satisfy the heap property. Thus, the insertion operation has a worst-case time complexity of {{nowrap|O(log ''n'')}}. For a random heap, and for repeated insertions, the insertion operation has an average-case complexity of O(1).<ref>{{Cite journal|last1=Porter|first1=Thomas|last2=Simon|first2=Istvan|date=Sep 1975|title=Random insertion into a priority queue structure|journal=IEEE Transactions on Software Engineering|volume=SE-1|issue=3|pages=292–298|doi=10.1109/TSE.1975.6312854|s2cid=18907513|issn=1939-3520}}</ref><ref>{{Cite journal |last1=Mehlhorn|first1=Kurt|last2=Tsakalidis|first2=A.|date=Feb 1989| title=Data structures |website=Universität des Saarlandes |url=https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26179 |language=en|page=27|publisher=Universität des Saarlandes |doi=10.22028/D291-26123 |quote=Porter and Simon [171] analyzed the average cost of inserting a random element into a random heap in terms of exchanges. They proved that this average is bounded by the constant 1.61. Their proof does not generalize to sequences of insertions since random insertions into random heaps do not create random heaps. The repeated insertion problem was solved by Bollobas and Simon [27]; they show that the expected number of exchanges is bounded by 1.7645. The worst-case cost of inserts and deletemins was studied by Gonnet and Munro [84]; they give log log n + O(1) and log n + log n* + O(1) bounds for the number of comparisons respectively.}}</ref>
 
As an example of binary heap insertion, say we have a max-heap
 
::[[File:Heap add step1.svg|150px|class=skin-invert-image]]
 
and we want to add the number 15 to the heap. We first place the 15 in the position marked by the X. However, the heap property is violated since {{nowrap|15 > 8}}, so we need to swap the 15 and the 8. After the first swap, the heap looks as follows:
 
::[[File:Heap add step2.svg|150px|class=skin-invert-image]]
 
However, the heap property is still violated since {{nowrap|15 > 11}}, so we need to swap again:
 
::[[File:Heap add step3.svg|150px|class=skin-invert-image]]
 
which is a valid max-heap. There is no need to check the left child after this final step: at the start, the max-heap was valid, meaning the root was already greater than its left child, so replacing the root with an even greater value will maintain the property that each node is greater than its children ({{nowrap|11 > 5}}; if {{nowrap|15 > 11}}, and {{nowrap|11 > 5}}, then {{nowrap|15 > 5}}, because of the [[transitive relation]]).
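
The insertion steps and the example above can be sketched in Python. This is a minimal illustration, not a reference implementation: it assumes a 0-indexed list in level order, takes the pictured heap to be stored as <code>[11, 5, 8, 3, 4]</code> (an assumption about the figure), and uses the illustrative name <code>heap_insert</code>.

```python
def heap_insert(a, key):
    """Insert key into max-heap a (a list in level order), in place."""
    a.append(key)                 # step 1: leftmost open slot on the bottom level
    i = len(a) - 1
    while i > 0:
        parent = (i - 1) // 2
        if a[parent] >= a[i]:     # step 2: parent and child are in order, stop
            break
        a[parent], a[i] = a[i], a[parent]   # step 3: swap, then repeat from the parent
        i = parent

heap = [11, 5, 8, 3, 4]   # assumed array form of the pictured heap
heap_insert(heap, 15)
print(heap)               # [15, 5, 11, 3, 4, 8]
```

The loop runs at most once per level, which is where the {{nowrap|O(log ''n'')}} worst-case bound comes from.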
 
An alternative to leaf insertion that supports balance is to insert recursively from the root towards a leaf. For a max-heap, if the new element's key is greater than that of the current root, the two elements are swapped; the remaining element is then inserted into one of the child subtrees. The choice of child controls balance, and any of the common balancing mechanisms can be used.

Height (AVL-style) balance control is a simple and very fast mechanism. On returning from each insertion into a subtree, the height of the current node is set to 1 + max(height of right, height of left). An empty subtree (a null reference or pointer) has height 0, so a leaf has height 1. Insertion goes to one side, say the left, unless the height of the right subtree is smaller. (On extraction, for equal keys, the left child is extracted first, as it is probably more heavily populated. This is the only control extraction has over tree balance.)

Heaps generally do not support a stable sort unless either the insertion order is added as a secondary (min) key, or all later equal keys are inserted to the right (temporarily increasing imbalance) and equal-key elements are always extracted left first (or vice versa).
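
One possible reading of the root-to-leaf insertion with height-based balance control is sketched below for a pointer-based max-heap. The <code>Node</code> class, its field names and the function names are assumptions made for illustration. Note that the resulting tree is kept roughly height-balanced but is not guaranteed to be a complete binary tree.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1                      # a leaf has height 1

def height(node):
    return node.height if node else 0        # an empty subtree has height 0

def insert(node, key):
    """Insert key into the max-heap rooted at node; return the subtree root."""
    if node is None:
        return Node(key)
    if key > node.key:
        node.key, key = key, node.key        # keep the larger key here, push the other down
    if height(node.right) < height(node.left):
        node.right = insert(node.right, key)  # right side is shorter: insert there
    else:
        node.left = insert(node.left, key)    # otherwise default to the left
    node.height = 1 + max(height(node.left), height(node.right))
    return node

root = None
for k in [5, 3, 8, 1, 9, 2, 7]:
    root = insert(root, k)
```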
 
=== Extract ===
Line 69 ⟶ 70:
#Compare the new root with its children; if they are in the correct order, stop.
#If not, swap the element with one of its children and return to the previous step. (Swap with its smaller child in a min-heap and its larger child in a max-heap.)
Steps 2 and 3, which restore the heap property by comparing and possibly swapping a node with one of its children, are called the ''down-heap'' (also known as ''bubble-down'', ''percolate-down'', <!-- not a typo: -->''sift-down'', ''sink-down'', ''trickle down'', ''heapify-down'', ''cascade-down'', ''fix-down'', ''extract-min'' or ''extract-max'', or simply ''heapify'') operation.
 
So, if we have the same max-heap as before
 
::[[File:Heap delete step0.svg|150px|class=skin-invert-image]]
 
We remove the 11 and replace it with the 4.
 
::[[File:Heap remove step1.svg|150px|class=skin-invert-image]]
 
Now the heap property is violated since 8 is greater than 4. In this case, swapping the two elements, 4 and 8, is enough to restore the heap property and we need not swap elements further:
 
::[[File:Heap remove step2.svg|150px|class=skin-invert-image]]
 
The downward-moving node is swapped with the ''larger'' of its children in a max-heap (in a min-heap it would be swapped with its smaller child), until it satisfies the heap property in its new position. This functionality is achieved by the '''Max-Heapify''' function as defined below in [[pseudocode]] for an [[Array data structure|array]]-backed heap ''A'' of length ''length''(''A''). ''A'' is indexed starting at 1.
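
A minimal Python sketch of the down-heap step and of extraction follows. Unlike the 1-indexed array ''A'' described above, it assumes a 0-indexed list, so the children of index <code>i</code> are at <code>2i + 1</code> and <code>2i + 2</code>. The names <code>max_heapify</code> and <code>extract_max</code>, and the array form <code>[11, 5, 8, 3, 4]</code> of the pictured heap, are illustrative assumptions.

```python
def max_heapify(a, i):
    """Sift a[i] down until the subtree rooted at index i is a max-heap."""
    n = len(a)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:          # both children are in order: done
            return
        a[i], a[largest] = a[largest], a[i]   # swap with the larger child
        i = largest

def extract_max(a):
    """Remove and return the root of a non-empty max-heap a."""
    top = a[0]
    last = a.pop()                # take the last element off the bottom level
    if a:
        a[0] = last               # move it to the root ...
        max_heapify(a, 0)         # ... and restore the heap property
    return top

heap = [11, 5, 8, 3, 4]           # assumed array form of the pictured heap
top = extract_max(heap)
print(top, heap)                  # 11 [8, 5, 4, 3]
```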
Line 142 ⟶ 143:
 
# Find the index <math>i</math> of the element we want to delete
# Swap this element with the last element. Remove the last element after the swap.
# Down-heapify or up-heapify to restore the heap property. In a max-heap (min-heap), up-heapify is only required when the new key of element <math>i</math> is greater (smaller) than the previous one, because only the heap property of the parent element might be violated. Assuming that the heap property held between element <math>i</math> and its children before the swap, it cannot be violated by a now larger (smaller) key value. When the new key is less (greater) than the previous one, only a down-heapify is required, because the heap property might only be violated in the child elements.
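
The three steps can be sketched for an array-backed max-heap as follows. This is an illustration under the assumption of a 0-indexed Python list whose index <code>i</code> is already known; the helper names <code>sift_up</code>, <code>sift_down</code> and <code>delete_at</code> are hypothetical. Instead of comparing the new key with the old one, the sketch compares it with the parent, which leads to the same choice of direction.

```python
def sift_up(a, i):
    while i > 0 and a[(i - 1) // 2] < a[i]:
        p = (i - 1) // 2
        a[p], a[i] = a[i], a[p]
        i = p

def sift_down(a, i):
    n = len(a)
    while True:
        largest = i
        for c in (2 * i + 1, 2 * i + 2):
            if c < n and a[c] > a[largest]:
                largest = c
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def delete_at(a, i):
    """Delete the element at index i from max-heap a and return it."""
    a[i], a[-1] = a[-1], a[i]     # step 2: swap with the last element ...
    removed = a.pop()             # ... and remove it
    if i < len(a):                # nothing to repair if the last slot was deleted
        if i > 0 and a[i] > a[(i - 1) // 2]:
            sift_up(a, i)         # new key exceeds its parent: move it up
        else:
            sift_down(a, i)       # otherwise only the children can be out of order
    return removed
```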
 
=== Decrease or increase key ===
<!-- section linked from [[Reheapification]] -->
 
The decrease key operation replaces the value of a node with a lower value, and the increase key operation does the same but with a higher value. This involves finding the node with the given value, changing the value, and then down-heapifying or up-heapifying to restore the heap property.
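
As a small sketch, decrease key on an array-backed min-heap only ever needs to move the changed element up. The function name <code>decrease_key</code> and the use of a linear search to find the node are assumptions made for illustration. The search itself costs O(''n''), so the logarithmic bound applies only when the position of the node is already known.

```python
def decrease_key(a, old, new):
    """In min-heap a, replace one occurrence of old with the smaller key new."""
    if new > old:
        raise ValueError("new key must not exceed the old key")
    i = a.index(old)              # finding the node by value is a linear scan
    a[i] = new
    # Up-heapify: in a min-heap a smaller key can only violate the parent.
    while i > 0 and a[(i - 1) // 2] > a[i]:
        parent = (i - 1) // 2
        a[parent], a[i] = a[i], a[parent]
        i = parent

heap = [1, 3, 6, 5, 9, 8]
decrease_key(heap, 9, 2)
print(heap)                       # [1, 2, 6, 5, 3, 8]
```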
 
Line 241 ⟶ 244:
The operation of merging two binary heaps takes Θ(''n'') for equal-sized heaps. The best that can be done (in the case of an array implementation) is to concatenate the two heap arrays and build a heap of the result.<ref>Chris L. Kuszmaul. [http://nist.gov/dads/HTML/binaryheap.html "binary heap"] {{Webarchive| url=https://web.archive.org/web/20080808141408/http://www.nist.gov/dads/HTML/binaryheap.html |date=2008-08-08 }}. Dictionary of Algorithms and Data Structures, Paul E. Black, ed., U.S. National Institute of Standards and Technology. 16 November 2009.</ref> A heap on ''n'' elements can be merged with a heap on ''k'' elements using O(log ''n'' log ''k'') key comparisons, or, in case of a pointer-based implementation, in O(log ''n'' log ''k'') time.<ref>[[Jörg-Rüdiger Sack|J.-R. Sack]] and T. Strothotte, [https://doi.org/10.1007%2FBF00264229 "An Algorithm for Merging Heaps"], Acta Informatica 22, 171–186 (1985).</ref> An algorithm for splitting a heap on ''n'' elements into two heaps on ''k'' and ''n'' − ''k'' elements, respectively, based on a new view of heaps as ordered collections of subheaps, has also been presented.<ref>{{Cite journal |doi = 10.1016/0890-5401(90)90026-E|title = A characterization of heaps and its applications|journal = Information and Computation|volume = 86|pages = 69–86|year = 1990|last1 = Sack|first1 = Jörg-Rüdiger|author1-link = Jörg-Rüdiger Sack| last2 = Strothotte|first2 = Thomas|doi-access = free}}</ref> The algorithm requires O(log ''n'' * log ''n'') comparisons. The view also yields a new and conceptually simple algorithm for merging heaps. When merging is a common task, a different heap implementation is recommended, such as [[binomial heap]]s, which can be merged in O(log ''n'') time.
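
The concatenate-and-rebuild approach for array-backed heaps can be sketched with Python's standard <code>heapq</code> module, which maintains min-heaps in plain lists. The function name <code>merge_heaps</code> is illustrative only.

```python
import heapq

def merge_heaps(h1, h2):
    """Return a new min-heap containing all keys of min-heaps h1 and h2."""
    merged = h1 + h2          # concatenate the two heap arrays
    heapq.heapify(merged)     # rebuild the heap property in linear time
    return merged

merged = merge_heaps([1, 4, 7], [2, 3, 9])
print(merged[0])              # 1, the smallest key of both heaps
```

Because <code>heapq.heapify</code> runs in linear time, the whole merge is linear in the combined size, in line with the Θ(''n'') bound for equal-sized heaps.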
 
Additionally, a binary heap can be implemented with a traditional binary tree data structure, but there is an issue with finding the adjacent element on the last level on the binary heap when adding an element. This element can be determined algorithmically or by adding extra data to the nodes, called "threading" the tree—instead of merely storing references to the children, we store the [[inorder]] successor of the node as well.
Line 250 ⟶ 253:
It is possible to modify the heap structure to make the extraction of both the smallest and largest element possible in [[Big O notation|<math>O</math>]]<math>(\log n)</math> time.<ref name="sym">{{cite web
| url = http://cg.scs.carleton.ca/~morin/teaching/5408/refs/minmax.pdf
| author1 = Atkinson, M.D.
| author1-link = Michael D. Atkinson
| author2 = J.-R. Sack
| author2-link = Jörg-Rüdiger Sack
Line 259 ⟶ 263:
| publisher = Programming techniques and Data structures. Comm. ACM, 29(10): 996–1000
| date = 1 October 1986
| access-date = 29 April 2008
| archive-date = 27 January 2007
| archive-url = https://web.archive.org/web/20070127093845/http://cg.scs.carleton.ca/%7Emorin/teaching/5408/refs/minmax.pdf
| url-status = dead
}}</ref> To do this, the rows alternate between min-heap and max-heap. The algorithms are roughly the same, but, in each step, one must consider the alternating rows with alternating comparisons. The performance is roughly the same as a normal single-direction heap. This idea can be generalized to a min-max-median heap.
 
Line 264 ⟶ 272:
In an array-based heap, the children and parent of a node can be located via simple arithmetic on the node's index. This section derives the relevant equations for heaps with their root at index 0, with additional notes on heaps with their root at index 1.
 
To avoid confusion, we'll define the '''level''' of a node as its distance from the root, such that the root itself occupies level 0.
 
=== Child nodes ===
Line 291 ⟶ 299:
\end{alignat}
</math>
 
As required.
 
Noting that the left child of any node is always 1 place before its right child, we get <math>\text{left} = 2i + 1</math>.
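
The child formulas for a heap rooted at index 0 can be checked numerically with a short sketch. The function names are illustrative, and the check assumes the right child sits one place after the left, at <code>2i + 2</code>.

```python
def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

# In a 0-rooted heap of n nodes, every index 1..n-1 should be the left or
# right child of exactly one node, so the formulas tile the array without gaps.
n = 15
children = sorted(c for i in range(n) for c in (left(i), right(i)) if c < n)
assert children == list(range(1, n))
```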
Line 300 ⟶ 306:
=== Parent node ===
 
Every non-root node is either the left or right child of its parent, so one of the following must hold:
 
* <math>i = 2 \times (\text{parent}) + 1</math>
* <math>i = 2 \times (\text{parent}) + 2</math>
 
Hence,
Line 330 ⟶ 336:
==Summary of running times==
{{Heap Running Times}}
 
==See also==
* [[Heap (data structure)|Heap]]
* [[Heapsort]]
 
==References==