Disjoint-set data structure
* '''Find''': Determine which group a particular element is in.
* '''Union''': Combine two groups into a single group.
The other important operation, '''MakeSet''', which makes a singleton set containing only a given element, is generally trivial. With these three operations, many practical partitioning problems can be solved (see the ''Applications'' section).
 
While we could represent the sets themselves as objects and have the operations operate on these sets, the more common approach is to choose an element from each set as a ''representative''; then, '''Find''' returns the representative of the set that the given element is in, and '''Union''' takes the representatives of two given sets.
== Linked lists ==
Perhaps the simplest approach to creating a disjoint-set data structure is to create a [[linked list]] for each group. We choose the element at the head of the list as the representative.
 
'''MakeSet''' is obvious, creating a list of one element. '''Union''' simply appends the two lists, a constant-time operation. Unfortunately, it seems that '''Find''' requires [[Big-O notation|Ω]](''n''), or linear, time with this approach.
 
We can avoid this by including in each linked list node a pointer to the head of the list; then '''Find''' takes constant time. However, we've now ruined the time of '''Union''', which has to go through the elements of the list being appended to make them point to the head of the new combined list, requiring [[Big-O notation|Ω]](n) time.
 
We can ameliorate this by always appending the smaller list to the longer, a rule called the ''weighted union heuristic''. To apply it efficiently, we must also keep track of the length of each list as we perform operations. Using this, a sequence of ''m'' '''MakeSet''', '''Union''', and '''Find''' operations on ''n'' elements requires O(''m'' + ''n'' log ''n'') time. To make any further progress, we need to start over with a different data structure.
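To make the discussion concrete, here is a Python sketch of the linked-list representation with the weighted union heuristic. The <code>rep</code> lookup table and the function names are illustrative choices, not part of any standard interface:

```python
# Linked-list disjoint sets with the weighted union heuristic.
# Each set is a Python list whose first element is the representative;
# `rep` maps every element to the list that currently contains it.

rep = {}

def make_set(x):
    rep[x] = [x]

def find(x):
    # Constant time: jump straight to the containing list's head.
    return rep[x][0]

def union(x, y):
    a, b = rep[x], rep[y]
    if a is b:
        return
    if len(a) < len(b):
        a, b = b, a              # always append the smaller list
    for elem in b:               # re-point each moved element
        rep[elem] = a
    a.extend(b)
```

Because only the shorter list's elements are re-pointed, each element can move at most O(log ''n'') times, which is where the ''n'' log ''n'' term in the bound comes from.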
 
== Disjoint-set forests ==
 
In a disjoint-set forest, each set is represented by a [[tree data structure]] in which each node holds a [[reference]] to its parent node. The representative of each set is the root of that set's tree. '''Find''' simply follows parent pointers until it reaches the root. '''Union''' combines two trees into one by making the root of one of them the child of the root of the other. One way of implementing these might be:
 
{{wikicode}}
 '''function''' MakeSet(x)
     x.parent := '''null'''
 
 '''function''' Find(x)
     p := x
     '''while''' p.parent ≠ '''null'''
         p := p.parent
     '''return''' p
 
 '''function''' Union(x, y)
     y.parent := x
 
In this naive form, this approach is no better than the linked-list approach, because the tree it creates can be highly unbalanced; however, it can be enhanced in two ways.
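As a concrete illustration, the naive pseudocode above translates almost line for line into Python, using a dictionary as the parent pointer with <code>None</code> marking a root (the names here are illustrative):

```python
# Naive disjoint-set forest: parent pointers only, no balancing.

parent = {}

def make_set(x):
    parent[x] = None            # a root has no parent

def find(x):
    while parent[x] is not None:
        x = parent[x]           # follow parent pointers up to the root
    return x

def union(x, y):
    # As in the pseudocode, x and y are assumed to be roots.
    parent[y] = x
```

Chaining unions in one direction (always making the old root a child of the new one) produces a degenerate tree of depth ''n'' − 1, which is why '''Find''' can degrade to linear time.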
 
The first way, called ''union by rank'', is to always attach the smaller tree to the root of the larger tree, rather than vice versa. To evaluate which tree is larger, we use a simple heuristic called ''rank'': one-element trees have a rank of zero, and whenever two trees of the same rank are unioned together, the result has a rank one greater. Applying this technique alone yields an O(log ''n'') running time per '''MakeSet''', '''Union''', or '''Find''' operation. Here are the improved <code>MakeSet</code> and <code>Union</code>:
 
 '''function''' MakeSet(x)
     x.parent := '''null'''
     x.rank := 0
 
 '''function''' Union(x, y)
     '''if''' x.rank > y.rank
         y.parent := x
     '''else if''' x.rank < y.rank
         x.parent := y
     '''else''' ''(x.rank = y.rank)''
         y.parent := x
         x.rank := x.rank + 1
 
The second improvement, called ''path compression'', is a way of flattening the structure of the tree whenever we use '''Find''' on it. The idea is that each node we visit on our way to a root node may as well be attached directly to the root node; they all share the same representative. To effect this, we make one traversal up to the root node to find out what it is, and then make another traversal, making this root node the immediate parent of all nodes along the path. The resulting tree is much flatter, speeding up future operations not only on these elements but also on those referencing them, directly or indirectly. Here is the improved <code>Find</code>:
 
 '''function''' Find(x)
     p := x
     '''while''' p.parent ≠ '''null'''
         p := p.parent
     root := p
     p := x
     '''while''' p ≠ root
         next := p.parent
         p.parent := root
         p := next
     '''return''' root
 
These two techniques complement each other; applied together, the amortized time per operation is only O(α(''n'')), where α(''n'') is the inverse of the function ''f''(''n'') = ''A''(''n'', ''n''), and ''A'' is the extremely quickly growing [[Ackermann function]]. Because ''A'' grows so quickly, its inverse α(''n'') is less than 5 for all remotely practical values of ''n''. Thus, the amortized running time per operation is effectively a small constant; we could hardly ask for better.
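Combining the two enhancements, a minimal self-contained Python sketch might look as follows; the <code>DisjointSet</code> class name and its interface are illustrative choices, not a fixed API:

```python
# Disjoint-set forest with union by rank and path compression.

class DisjointSet:
    def __init__(self):
        self.parent = {}   # parent pointer; None marks a root
        self.rank = {}     # upper bound on each root's tree height

    def make_set(self, x):
        self.parent[x] = None
        self.rank[x] = 0

    def find(self, x):
        # First traversal: locate the root.
        root = x
        while self.parent[root] is not None:
            root = self.parent[root]
        # Second traversal: attach every node on the path to the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        if self.rank[x] < self.rank[y]:
            x, y = y, x                  # keep the higher-ranked root
        self.parent[y] = x
        if self.rank[x] == self.rank[y]:
            self.rank[x] += 1            # equal ranks: result grows by one
```

Note that this '''Union''' calls '''Find''' itself, so it accepts arbitrary elements rather than requiring roots as the pseudocode above does.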
== Applications ==
=== Finding the [[connected component]]s of an [[undirected graph]] ===
 
Initially, we assume that every vertex in the graph is in its own connected component, not connected to any other vertex. To represent this, we use '''MakeSet''' to make a set for each vertex containing only that vertex.
 
Next, we simply visit each vertex and use '''Union''' to merge its set with the sets of all its neighbors. Once this is done, we will have one set for each connected component, and can use '''Find''' to determine which connected component a particular vertex is in, or whether two vertices lie in the same connected component.
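A Python sketch of this procedure, here iterating over the graph's edges (which visits each vertex's neighbors exactly once); the function name and input format are illustrative:

```python
# Connected components via union-find. `vertices` is an iterable of
# hashable labels; `edges` is an iterable of (u, v) pairs.

def connected_components(vertices, edges):
    parent = {v: v for v in vertices}       # here each root is its own parent

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:                      # Union along every edge
        parent[find(u)] = find(v)

    components = {}
    for v in vertices:                      # group vertices by representative
        components.setdefault(find(v), []).append(v)
    return list(components.values())
```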
=== Computing the contours of a 3D surface ===
When computing the contours of a 3D surface, one of the first steps is to compute the "shorelines," which surround local minima or "lake bottoms." We imagine we are sweeping a plane, which we refer to as the "water level," from below the surface upwards. We will form a series of contour lines as we move upwards, categorized by which local minima they contain. In the end, we will have a single contour containing all local minima.
 
Whenever the water level rises just above a new local minimum, it creates a small "lake," a new contour line that surrounds the local minimum; this is done with the '''MakeSet''' operation.
 
As the water level continues to rise, it may touch a saddle point, or "pass." When we reach such a pass, we follow the steepest downhill route from it on each side until we arrive at a local minimum. We use '''Find''' to determine which contours surround these two local minima, then use '''Union''' to combine them. Eventually, all contours will be combined into one, and we are done.
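A toy Python sketch of this sweep, with the terrain reduced to its local minima and its passes; the input format and all names here are hypothetical:

```python
# Contour merging as a rising water level crosses passes. Each pass is
# (height, min_a, min_b): a saddle whose two downhill routes reach the
# local minima min_a and min_b.

def merge_contours(minima, passes):
    parent = {m: m for m in minima}      # MakeSet: one contour per minimum

    def find(m):
        while parent[m] != m:
            m = parent[m]
        return m

    merges = []
    for height, a, b in sorted(passes):  # raise the water level
        ra, rb = find(a), find(b)        # Find the two surrounding contours
        if ra != rb:                     # the pass joins two distinct contours
            parent[rb] = ra              # Union
            merges.append((height, ra, rb))
    return merges
```

Processing passes in increasing height order means each '''Union''' happens exactly when the water first connects two lakes; after one fewer merge than there are minima, all contours have been combined into one.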