Disjoint-set data structure

In computer science, a disjoint-set data structure is a data structure that assigns each element in a set to one of a number of disjoint (nonoverlapping) groups of elements. A union-find algorithm is a way of performing two critical operations on such a data structure:

Find: Determine which group a particular element is in.
Union: Combine two groups into a single group.

The other important operation, Make-Set, which makes a singleton set containing only a given element, is generally trivial. With these three operations, many practical partitioning problems can be solved (see the Applications section).

While we could represent the sets themselves as objects and have the operations operate on these sets, the more common approach is to choose an element from each set as a representative; then, Find returns the representative of the set that the given element is in, and Union takes the representatives of two given sets.

Linked lists

Perhaps the simplest approach to creating a disjoint-set data structure is to create a linked list for each group. We choose the element at the head of the list as the representative.

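Make-Set is straightforward: it creates a list of one element. Union appends one list to the other, a constant-time operation provided each list keeps a pointer to its tail. The drawback is that Find takes O(n), or linear, time with this approach, because it may have to traverse the list backwards from the given element all the way to the head.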
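The trade-off described above can be made concrete with a minimal sketch. It assumes a doubly linked list in which the head node also stores a pointer to the tail; the names Node, make_set, find and union are illustrative choices, not taken from any particular library.

```python
class Node:
    """One element of a set, stored as a node in a doubly linked list."""

    def __init__(self, value):
        self.value = value
        self.prev = None   # previous node; None at the head
        self.next = None   # next node; None at the tail
        self.tail = self   # kept up to date only on head nodes


def make_set(value):
    """Create a singleton list; the node is both head and tail."""
    return Node(value)


def find(node):
    """Walk backwards to the head: linear time in the worst case."""
    while node.prev is not None:
        node = node.prev
    return node


def union(head_a, head_b):
    """Append list b to list a in constant time; both arguments are heads."""
    if head_a is head_b:
        return head_a
    head_a.tail.next = head_b
    head_b.prev = head_a.tail
    head_a.tail = head_b.tail
    return head_a
```

Here union touches only a fixed number of pointers, while find may have to visit every node of the list.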

We can avoid this by including in each linked list node a pointer to the head of the list; then Find takes constant time. However, Union is now slower, because it has to go through the elements of the list being appended and make each of them point to the head of the new combined list. We can ameliorate this by always appending the shorter list to the longer one, or by having Union update only the head pointer of the first node of the appended list and adapting Find to follow the resulting chains of head-of-list pointers, but neither of these approaches achieves the most effective balance. We choose instead to start over with a different data structure.
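As a rough illustration of the head-pointer variant combined with the "append the shorter list to the longer" rule, consider the following sketch. The field names head, tail and size, and the choice to store tail and size only on head nodes, are assumptions made for this example.

```python
class Node:
    """A list node that points directly at the head of its list."""

    def __init__(self, value):
        self.value = value
        self.next = None
        self.head = self   # representative of the set containing this node
        self.tail = self   # kept up to date only on head nodes
        self.size = 1      # kept up to date only on head nodes


def make_set(value):
    return Node(value)


def find(node):
    """Constant time: every node stores its representative."""
    return node.head


def union(head_a, head_b):
    """Append the shorter list to the longer one; both arguments are heads."""
    if head_a is head_b:
        return head_a
    if head_a.size < head_b.size:
        head_a, head_b = head_b, head_a
    # Repoint every node of the shorter list at the surviving head.
    node = head_b
    while node is not None:
        node.head = head_a
        node = node.next
    head_a.tail.next = head_b
    head_a.tail = head_b.tail
    head_a.size += head_b.size
    return head_a
```

With this rule, a node is repointed only when its list is merged into one at least as long, so the size of its list at least doubles each time. Each node is therefore repointed at most about log2(n) times over any sequence of unions on n elements.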

Disjoint-set forests

Applications

Disjoint-set data structures arise naturally in many applications, particularly where some kind of partitioning or equivalence relation is involved. This section discusses some of them.

Finding the connected components of an undirected graph

Initially, we assume that every vertex in the graph is in its own connected component and is not connected to any other vertex. To represent this, we use Make-Set to make a set for each vertex containing only that vertex.

Next, we visit each vertex and use Union to merge its set with the set of each of its neighbors. Once this is done, we will have one group for each connected component, and can use Find to determine which connected component a particular vertex is in.
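The procedure can be sketched as follows. This example assumes the graph is given as a dictionary mapping every vertex to its neighbors, and it uses a deliberately simple parent-pointer table for the sets, with no balancing; the function name connected_components is hypothetical.

```python
def connected_components(graph):
    """Group the vertices of an undirected graph into connected components.

    graph maps each vertex to an iterable of its neighbors; every vertex,
    including isolated ones, is assumed to appear as a key.
    """
    # Make-Set: every vertex starts as the representative of its own set.
    parent = {v: v for v in graph}

    def find(v):
        # Follow parent pointers until reaching a representative.
        while parent[v] != v:
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Merge each vertex's set with the set of each of its neighbors.
    for v in graph:
        for w in graph[v]:
            union(v, w)

    # Collect vertices by representative.
    groups = {}
    for v in graph:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())
```

Two vertices lie in the same connected component exactly when find returns the same representative for both.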

Computing shorelines of a terrain

When computing the contours of a 3D surface, one of the first steps is to compute the "shorelines," which surround local minima or "lake bottoms." We imagine we are sweeping a plane, which we refer to as the "water level," from below the surface upwards. We will form a series of contour lines as we move upwards, categorized by which local minima they contain. In the end, we will have a single contour containing all local minima.

Whenever the water level rises just above a new local minimum, it creates a small "lake," a new contour line that surrounds the local minimum; this is done with the Make-Set operation.

As the water level continues to rise, it may touch a saddle point, or "pass." When we reach such a pass, we follow the steepest downhill route from it on each side until we arrive at a local minimum. We use Find to determine which contours surround these two local minima, then use Union to combine them. Eventually, all contours will be combined into one, and we are done.
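A much simplified, one-dimensional analogue may help illustrate the idea. The sketch below assumes the terrain is a list of heights along a line, floods cells from lowest to highest, and merges lakes whenever a newly flooded cell touches an already flooded neighbor instead of following steepest-descent paths; the function lake_counts and its return value are hypothetical and are not part of the procedure described above.

```python
def lake_counts(heights):
    """Return the number of separate lakes after flooding each cell in turn.

    Cells are flooded in order of increasing height. A cell with no flooded
    neighbor starts a new lake (Make-Set); a cell that touches flooded
    neighbors is merged with their lakes (Find, then Union).
    """
    n = len(heights)
    parent = [None] * n          # None means the cell is still above water

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    lakes = 0
    history = []
    for i in sorted(range(n), key=lambda k: heights[k]):
        parent[i] = i            # Make-Set: tentatively a new lake
        lakes += 1
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri   # Union: two lakes become one
                    lakes -= 1
        history.append(lakes)
    return history
```

For a non-empty profile, the final entry of the returned list is 1, mirroring the single contour that eventually contains every local minimum.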

References

Chapter 21, Introduction to Algorithms, 2nd ed. Cormen, Leiserson, Rivest, Stein. ISBN 0262032937.