Disjoint-set data structure: Difference between revisions

Content deleted Content added
Dcoetzee (talk | contribs)
Added linked list approach
Dcoetzee (talk | contribs)
Added disjoint-set forests
Line 12:
'''Make-Set''' is obvious, creating a list of one element. '''Union''' simply appends the two lists, a constant-time operation. Unfortunately, it seems that '''Find''' requires [[Big-O notation|O]](n) or linear time with this approach.
 
We can avoid this by including in each linked list node a pointer to the head of the list; then '''Find''' takes constant time. However, we've now ruined the time of '''Union''', which has to go through the elements of the list being appended to make them point to the head of the new combined list. We can ameliorate this by always appending the smaller list to the longer, orbut byeven adaptingin this case '''FindUnion''' socan thattake itΩ(n) willtime followon chainsaverage. ofTo ''headsolve of list'' pointers and only modifying the first node's pointerthis, but none of these approaches achieve the most effective balance. We choosewe insteadneed to start over with a different data structure.
 
== Disjoint-set forests ==
 
In a disjoint-set forest, each set is represented by a [[tree data structure]] where each node holds a [[reference]] to its parent node. The representative of each set is the tree's root. '''Find-Set''' simply follows parent nodes until it reaches the root. '''Union''' combines two trees into one by making the root one of them the child of the root of the other. In this naive form, this approach is no better than the linked-list approach, because the tree it creates can be highly unbalanced, but it can be enhanced in two ways.
 
The first way, called ''union by rank'', is to always attach the smaller tree to the root of the larger tree, rather than vice versa. To evaluate which tree is larger, we use a simple heuristic called ''rank'': one-element trees have a rank of zero, and whenever two trees of the same rank are unioned together, the result has one greater rank. Just applying this technique alone yields an average running-time of O(log ''n'') per '''Make-Set''', '''Union''', or '''Find''' operation.
 
The second way, called ''path compression'', is a way of flattening the structure of the tree whenever we use '''Find''' on it. The idea is that each node we visit on our way to a root node may as well be attached directly to the root node; they all share the same representative. To effect this, we make one traversal up to the root node, to find out what it is, and then make another traversal, making this root node the immediate parent of all nodes along the path. The resulting tree is much flatter, speeding up future operations.
 
These two techniques complement each other; applied together, the average time per operation is only O(α(n)), where α(n) is the inverse of the function ''f''(''n'') = ''A''(''n'',''n''), where ''A'' is the extremely quickly-growing [[Ackermann function]]. Since α(''n'') is its inverse, it's less than 5 for all remotely practical values of ''n'', and so is effectively a small constant.
 
== Applications ==