{{Short description|Data structure for storing non-overlapping sets}}
{{Infobox data structure
| name = Disjoint-set/Union-find forest
| type = multiway tree
| invented_by = [[Bernard A. Galler]] and [[Michael J. Fischer]]
| invented_year = 1964
| space_avg = {{math|''O''(''n'')}}<ref name="tarjan1975">{{cite journal|last1=Tarjan|first1=Robert Endre|author1-link=Robert E. Tarjan|year=1975|title=Efficiency of a Good But Not Linear Set Union Algorithm|journal=Journal of the ACM|volume=22|issue=2|pages=215–225|doi=10.1145/321879.321884|hdl=1813/5942|s2cid=11105749|hdl-access=free}}</ref>
| merge_worst = {{math|''O''(''α''(''n''))}} (amortized)<ref name="tarjan1975"/>
}}
In [[computer science]], a '''disjoint-set data structure''', also called a '''union–find data structure''' or '''merge–find set''', is a [[data structure]] that stores a collection of [[Disjoint sets|disjoint]] (non-overlapping) [[Set (mathematics)|sets]]. Equivalently, it stores a [[partition of a set]] into disjoint [[subset]]s. It provides operations for adding new sets, merging sets (replacing them with their [[Union (set theory)|union]]), and finding a representative member of a set. The last operation makes it possible to determine efficiently whether any two elements belong to the same set or to different sets.

While there are several ways of implementing disjoint-set data structures, in practice they are often identified with a particular implementation called a '''disjoint-set forest'''. This is a specialized type of [[Forest (graph theory)|forest]] which performs unions and finds in near-constant [[Amortized analysis|amortized time]]. Performing a sequence of {{mvar|m}} addition, union, or find operations on a disjoint-set forest with {{mvar|n}} nodes requires total time {{math|[[Big O notation|''O'']](''m''α(''n''))}}, where {{math|α(''n'')}} is the extremely slow-growing [[inverse Ackermann function]]. Disjoint-set forests do not guarantee this performance on a per-operation basis: individual union and find operations can take longer than a constant times {{math|α(''n'')}} time, but each operation causes the forest to adjust itself so that successive operations are faster. Disjoint-set forests are both [[asymptotically optimal]] and practically efficient.

Disjoint-set data structures play a key role in [[Kruskal's algorithm]] for finding the [[minimum spanning tree]] of a graph. The importance of minimum spanning trees means that disjoint-set data structures underlie a wide variety of algorithms. In addition, disjoint-set data structures have applications in symbolic computation and in compilers, especially for [[register allocation]] problems.
== History ==
Disjoint-set forests were first described by [[Bernard A. Galler]] and [[Michael J. Fischer]] in 1964.<ref name="Galler1964">{{cite journal|first1=Bernard A.|last1=Galler|author1-link=Bernard A. Galler|first2=Michael J.|last2=Fischer|author2-link=Michael J. Fischer|title=An improved equivalence algorithm|journal=[[Communications of the ACM]]|volume=7|issue=5|date=May 1964|pages=301–303|doi=10.1145/364099.364331|s2cid=9034016|doi-access=free}}. The paper originating disjoint-set forests.</ref> In 1973, their time complexity was bounded to <math>O(\log^{*}(n))</math>, the [[iterated logarithm]] of <math>n</math>, by [[John Hopcroft|Hopcroft]] and [[Jeffrey Ullman|Ullman]].<ref name="Hopcroft1973">{{cite journal|last1=Hopcroft|first1=J. E.|author1-link=John Hopcroft|last2=Ullman|first2=J. D.|author2-link=Jeffrey Ullman|year=1973|title=Set Merging Algorithms|journal=SIAM Journal on Computing|volume=2|issue=4|pages=294–303|doi=10.1137/0202024}}</ref> In 1975, [[Robert Tarjan]] was the first to prove the <math>O(m\alpha(n))</math> ([[Ackermann function#Inverse|inverse Ackermann function]]) upper bound on the algorithm's time complexity.<ref name="tarjan1975"/>
In 1991, Galil and Italiano published a survey of data structures for disjoint sets.<ref name="Galil1991">{{cite journal|first1=Z.|last1=Galil|first2=G.|last2=Italiano|title=Data structures and algorithms for disjoint set union problems|journal=[[ACM Computing Surveys]]|volume=23|issue=3|year=1991|pages=319–344|doi=10.1145/116873.116878}}</ref>
In 1994, Richard J. Anderson and Heather Woll described a parallelized version of Union–Find that never needs to block.<ref name="Anderson1994">{{cite conference|first1=Richard J.|last1=Anderson|first2=Heather|last2=Woll|title=Wait-free Parallel Algorithms for the Union-Find Problem|conference=23rd ACM Symposium on Theory of Computing|year=1994|pages=370–380}}</ref>
In 2007, Sylvain Conchon and Jean-Christophe Filliâtre developed a semi-[[persistent data structure|persistent]] version of the disjoint-set forest data structure and formalized its correctness using the [[proof assistant]] [[Coq (software)|Coq]].<ref name="Conchon2007">{{cite conference|first1=Sylvain|last1=Conchon|first2=Jean-Christophe|last2=Filliâtre|contribution=A Persistent Union-Find Data Structure|title=ACM SIGPLAN Workshop on ML|___location=Freiburg, Germany|date=October 2007|url=https://www.lri.fr/~filliatr/puf/}}</ref> "Semi-persistent" means that previous versions of the structure are efficiently retained, but accessing previous versions of the data structure invalidates later ones. Their fastest implementation achieves performance almost as efficient as the non-persistent algorithm. They do not perform a complexity analysis.
Variants of disjoint-set data structures with better performance on a restricted class of problems have also been considered. Gabow and Tarjan showed that if the possible unions are restricted in certain ways, then a truly linear time algorithm is possible.<ref>{{cite journal|first1=Harold N.|last1=Gabow|first2=Robert Endre|last2=Tarjan|title=A linear-time algorithm for a special case of disjoint set union|journal=Journal of Computer and System Sciences|volume=30|issue=2|year=1985|pages=209–221|doi=10.1016/0022-0000(85)90014-5}}</ref> In particular, linear time is achievable if a "union tree" is given a priori. This is a tree that includes all elements of the sets. Let p[''v''] denote the parent of ''v'' in the tree; then the assumption is that union operations must have the form '''union'''(''v'', p[''v'']) for some ''v''.
== Representation ==
In this and the following section we describe the most common implementation of the disjoint-set data structure, as a forest of [[parent pointer tree]]s. This representation is known as '''Galler–Fischer trees'''.
Each node in a disjoint-set forest consists of a pointer and some auxiliary information, either a size or a rank (but not both). The pointers are used to make [[parent pointer tree]]s, where each node that is not the root of a tree points to its parent. To distinguish root nodes from others, their parent pointers have invalid values, such as a circular reference to the node or a sentinel value. Each tree represents a set stored in the forest, with the members of the set being the nodes in the tree. Root nodes provide set representatives: Two nodes are in the same set if and only if the roots of the trees containing the nodes are equal.
 '''function''' MakeSet(''x'') '''is'''
     '''if''' ''x'' is not already in the forest '''then'''
         ''x''.parent := ''x''
         ''x''.size := 1     ''// if nodes store size''
         ''x''.rank := 0     ''// if nodes store rank''
     '''end if'''
 '''end function'''
This operation has constant time complexity. In particular, initializing a disjoint-set forest with {{mvar|n}} nodes requires {{math|''O''(''n'')}} time.
If a node has no parent assigned, it is not present in the forest.
In practice, <code>MakeSet</code> must be preceded by an operation that allocates memory to hold {{mvar|x}}. As long as memory allocation is an amortized constant-time operation, as it is for a good [[dynamic array]] implementation, it does not change the asymptotic performance of the disjoint-set forest.
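As a concrete illustration, the following is a minimal Python sketch of <code>MakeSet</code> with a size field; the <code>Node</code> class and its field names are illustrative choices for this example, not part of the pseudocode above.

<syntaxhighlight lang="python">
class Node:
    """One element of the forest: a parent pointer plus a size field."""
    def __init__(self):
        self.parent = self  # a root points to itself (the sentinel convention)
        self.size = 1       # a freshly made set contains only this node

def make_set():
    # Allocation and initialization together; with a constant-time
    # allocator this is amortized O(1), as discussed above.
    return Node()
</syntaxhighlight>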
=== Merging two sets ===
{{multiple image
 | direction = vertical
 | width = 360
 | image1 = Dsu disjoint sets init.svg
 | caption1 = <code>MakeSet</code> creates 8 singletons.
 | image2 = Dsu disjoint sets final.svg
 | caption2 = After some operations of <code>Union</code>, some sets are grouped together.
}}
The operation <code>Union(''x'', ''y'')</code> replaces the set containing {{mvar|x}} and the set containing {{mvar|y}} with their union. <code>Union</code> first uses <code>Find</code> to determine the roots of the trees containing {{mvar|x}} and {{mvar|y}}. If the roots are the same, there is nothing more to do. Otherwise, the two trees must be merged. This is done by either setting the parent pointer of {{mvar|x}}'s root to {{mvar|y}}'s, or setting the parent pointer of {{mvar|y}}'s root to {{mvar|x}}'s.
The choice of which node becomes the parent has consequences for the complexity of future operations on the tree. If it is done carelessly, trees can become excessively tall. For example, suppose that <code>Union</code> always made the tree containing {{mvar|x}} a subtree of the tree containing {{mvar|y}}. Begin with a forest that has just been initialized with elements <math>1, 2, 3, \ldots, n,</math> and execute <code>Union(1, 2)</code>, <code>Union(2, 3)</code>, ..., <code>Union(''n'' − 1, ''n'')</code>. The resulting forest contains a single tree whose root is {{mvar|n}}, and the path from 1 to {{mvar|n}} passes through every node in the tree. For this forest, the time to run <code>Find(1)</code> is {{math|''O''(''n'')}}.
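A short Python sketch of this degenerate behavior (the helper names are hypothetical, and no balancing or path compression is used):

<syntaxhighlight lang="python">
n = 10
parent = list(range(n))            # each element starts as its own root

def find(x):
    while parent[x] != x:          # walk up to the root
        x = parent[x]
    return x

def careless_union(x, y):
    parent[find(x)] = find(y)      # always attach x's root under y's root

for i in range(n - 1):
    careless_union(i, i + 1)       # Union(1,2), Union(2,3), ..., Union(n-1,n)

# The forest is now the chain 0 -> 1 -> ... -> 9, so a single find(0)
# visits every node: Theta(n) work for one Find.
assert find(0) == n - 1
</syntaxhighlight>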
In an efficient implementation, tree height is controlled using '''union by size''' or '''union by rank'''. Both of these require a node to store information besides just its parent pointer. This information is used to decide which root becomes the new parent. Both strategies ensure that trees do not become too deep.
     '''end if'''
 
     ''// If necessary, swap variables to ensure that''
     ''// x has at least as many descendants as y''
     '''if''' ''x''.size < ''y''.size '''then'''
         (''x'', ''y'') := (''y'', ''x'')
     '''end if'''
It is clear from the above implementations that the size and rank of a node do not matter unless a node is the root of a tree. Once a node becomes a child, its size and rank are never accessed again.
There is a variant of the <code>Union</code> operation in which the user determines the representative of the formed set. It is not hard to add this functionality to the above algorithms without losing efficiency.
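Putting the pieces together, the following Python sketch (reusing the illustrative <code>Node</code> class from the earlier sketch) combines union by size with a path-compressing <code>Find</code>, and shows one way the representative-choosing variant could look. It is a sketch of the technique under those assumptions, not a reference implementation.

<syntaxhighlight lang="python">
def find(node):
    # First pass: locate the root.
    root = node
    while root.parent is not root:
        root = root.parent
    # Second pass (path compression): point visited nodes at the root.
    while node.parent is not root:
        node.parent, node = root, node.parent
    return root

def union(x, y):
    x, y = find(x), find(y)
    if x is y:
        return x               # already in the same set
    if x.size < y.size:
        x, y = y, x            # ensure x roots the larger tree
    y.parent = x               # attach the smaller root under the larger
    x.size += y.size
    return x                   # the representative of the merged set

def union_with_representative(x, y, rep):
    # Illustrative variant: make the node rep represent the merged set.
    root = union(x, y)
    if rep is not root:
        rep.size = root.size   # rep takes over the root's bookkeeping
        root.parent = rep      # old root now points at rep...
        rep.parent = rep       # ...and rep becomes the new self-parented root
    return rep
</syntaxhighlight>

Because the old root's stale size field is never read again once it has a parent, the re-rooting in the variant costs only constant time on top of the union itself.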
== Time complexity ==
=== Proof of O(m log* n) time complexity of Union-Find ===
The precise analysis of the performance of a disjoint-set forest is somewhat intricate. However, there is a much simpler analysis that proves that the amortized time for any {{mvar|m}} <code>Find</code> or <code>Union</code> operations on a disjoint-set forest containing {{mvar|n}} objects is {{math|''O''(''m'' log<sup>*</sup> ''n'')}}, where {{math|log<sup>*</sup>}} denotes the [[iterated logarithm]].
{{anchor|increasing rank lemma}}Lemma 1: As the <code>Find</code> function follows the path to the root, the ranks of the nodes it encounters are increasing.
{{math proof| We claim that as Find and Union operations are applied to the data set, this fact remains true over time. Initially when each node is the root of its own tree, it's trivially true. The only case when the rank of a node might be changed is when the '''Union by rank''' operation is applied. In this case, a tree with smaller rank will be attached to a tree with greater rank, rather than vice versa. And during the find operation, all nodes visited along the path will be attached to the root, which has larger rank than its children, so this operation won't change this fact either.}}
{{anchor|min subtree size lemma}}Lemma 2: A node {{mvar|u}} which is root of a subtree with rank {{mvar|r}} has at least <math>2^r</math> nodes.
{{math proof| Initially when each node is the root of its own tree, it's trivially true. Assume that a node {{mvar|u}} with rank {{mvar|r}} has at least {{math|2<sup>''r''</sup>}} nodes. Then when two trees with rank {{mvar|r}} are merged using '''Union by rank''', a tree with rank {{math|''r'' + 1}} results, the root of which has at least <math>2^r + 2^r = 2^{r+1}</math> nodes.}}
[[File:ProofOflogstarnRank.jpg|center]]
{{anchor|max nodes of rank lemma}}Lemma 3: The maximum number of nodes of rank {{mvar|r}} is at most <math>\frac{n}{2^r}.</math>
{{math proof| From [[#min subtree size lemma|lemma 2]], we know that a node {{mvar|u}} which is root of a subtree with rank {{mvar|r}} has at least <math>2^r</math> nodes. We will get the maximum number of nodes of rank {{mvar|r}} when each node with rank {{mvar|r}} is the root of a tree that has exactly <math>2^r</math> nodes. In this case, the number of nodes of rank {{mvar|r}} is <math>\frac{n}{2^r}.</math>}}
At any particular point in the execution, we can group the vertices of the graph into "buckets", according to their rank. We define the buckets' ranges inductively, as follows: Bucket 0 contains vertices of rank 0. Bucket 1 contains vertices of rank 1. Bucket 2 contains vertices of ranks 2 and 3.
For <math>B \in \mathbb{N}</math>, let <math>\text{tower}(B) = \underbrace{2^{2^{\cdots^2}}}_{B \text{ times}}</math>. Then bucket <math>B</math> contains vertices with ranks in the interval <math>[\text{tower}(B-1), \text{tower}(B)-1]</math>.
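For example (working out the definition above): <math>\text{tower}(1) = 2</math>, <math>\text{tower}(2) = 4</math>, <math>\text{tower}(3) = 16</math>, and <math>\text{tower}(4) = 65536</math>, so buckets 1 through 4 cover the rank intervals <math>[1, 1]</math>, <math>[2, 3]</math>, <math>[4, 15]</math>, and <math>[16, 65535]</math>. Since ranks never exceed <math>\log_2 n</math>, even an astronomically large forest needs only a handful of buckets.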
[[File:Proof_of_O(log*n)_Union_Find.jpg|center|frame|Proof of <math>O(\log^*n)</math> Union Find]]
We can make two observations about the buckets' sizes.
# {{anchor|max buckets}}The total number of buckets is at most {{math|log<sup>*</sup>''n''}}.
#: Proof: By [[#min subtree size lemma|lemma 2]], a node of rank {{mvar|r}} roots a subtree of at least <math>2^r</math> nodes, so every rank is at most <math>\log_2 n</math>. Since each successive bucket covers ranks up to one more level of the tower of exponentials, at most {{math|log<sup>*</sup>''n''}} buckets are needed to cover all possible ranks.
# {{anchor|max bucket size}}The maximum number of elements in the bucket covering ranks <math>\left[B, 2^B - 1\right]</math> is at most <math>\frac{2n}{2^B}</math>.
#: Proof: Summing the bound of [[#max nodes of rank lemma|lemma 3]] over every rank in the bucket gives at most <math>\frac{n}{2^B} + \frac{n}{2^{B+1}} + \frac{n}{2^{B+2}} + \cdots + \frac{n}{2^{2^B - 1}} \leq \frac{2n}{2^B},</math> where the last step bounds the geometric series by twice its first term.
Therefore, <math>T = T_1 + T_2 + T_3 = O(m \log^*n).</math>
== Other structures ==
===Better worst-case time per operation===
The worst-case time of the <code>Find</code> operation in trees with '''Union by rank''' or '''Union by weight''' is <math>\Theta(\log n)</math> (i.e., it is <math>O(\log n)</math> and this bound is tight).
In 1985, N. Blum gave an implementation of the operations that does not use path compression, but compresses trees during <code>Union</code>. His implementation runs in <math>O(\log n / \log\log n)</math> time per operation,<ref>{{cite journal |last1=Blum |first1=Norbert |title=On the Single-Operation Worst-Case Time Complexity of the Disjoint Set Union Problem |journal=2nd Symp. On Theoretical Aspects of Computer Science |date=1985 |pages=32–38}}</ref> and thus in comparison with Galler and Fischer's structure it has a better worst-case time per operation, but inferior amortized time. In 1999, Alstrup et al. gave a structure that has optimal worst-case time <math>O(\log n / \log\log n)</math> together with inverse-Ackermann amortized time.<ref>{{cite book |last1=Alstrup |first1=Stephen |last2=Ben-Amram |first2=Amir M. |last3=Rauhe |first3=Theis |title=Proceedings of the thirty-first annual ACM symposium on Theory of Computing |chapter=Worst-case and amortised optimality in union-find (Extended abstract) |date=1999 |pages=499–506 |doi=10.1145/301250.301383 |isbn=1581130678 |s2cid=100111}}</ref>
===Deletion===
The regular implementation as disjoint-set forests does not react favorably to the deletion of elements, in the sense that the time for <code>Find</code> will not improve as a result of the decrease in the number of elements. However, there exist modern implementations that allow for constant-time deletion and where the time-bound for <code>Find</code> depends on the ''current'' number of elements.<ref>{{cite journal |last1=Alstrup |first1=Stephen |last2=Thorup |first2=Mikkel |last3=Gørtz |first3=Inge Li |last4=Rauhe |first4=Theis |last5=Zwick |first5=Uri |title=Union-Find with Constant Time Deletions |journal=ACM Transactions on Algorithms |date=2014 |volume=11 |issue=1 |pages=6:1–6:28 |doi=10.1145/2636922 |s2cid=12767012}}</ref><ref>{{cite journal |last1=Ben-Amram |first1=Amir M. |last2=Yoffe |first2=Simon |title=A simple and efficient Union-Find-Delete algorithm |journal=Theoretical Computer Science |date=2011 |volume=412 |issue=4–5 |pages=487–492 |doi=10.1016/j.tcs.2010.11.005}}</ref>
==Backtracking==
It is possible to extend certain disjoint-set forest structures to allow backtracking. The basic form of backtracking is to allow a <code>Backtrack(1)</code> operation that undoes the last <code>Union</code>. A more advanced form allows <code>Backtrack(''i'')</code>, which undoes the last {{mvar|i}} unions. The following complexity result is known: there is a data structure which supports <code>Union</code> and <code>Find</code> in <math>O(\log n / \log \log n)</math> time per operation, and <code>Backtrack</code> in <math>O(1)</math> time.<ref name="WT">{{Cite journal |last1=Westbrook |first1=Jeffery R. |last2=Tarjan |first2=Robert E. |title=Amortized Analysis of Algorithms for Set Union with Backtracking |journal=SIAM Journal on Computing |volume=18 |issue=1 |pages=1–11 |year=1989 |doi=10.1137/0218001}}</ref> In this result, the freedom of <code>Union</code> to choose the representative of the formed set is essential. Better amortized time cannot be achieved within the class of separable [[pointer algorithm]]s.<ref name="WT"/>
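The cited structure is considerably more elaborate, but the basic idea of undoing unions can be sketched with the simpler folklore "rollback" technique shown below in Python: union by size ''without'' path compression, so that each union changes only two fields, which are recorded on a stack and restored by <code>Backtrack</code>. This gives <math>O(\log n)</math> finds and constant-time undo per union; it illustrates backtracking in general, not the Westbrook–Tarjan data structure.

<syntaxhighlight lang="python">
class RollbackUnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.history = []                # one entry per Union, for Backtrack

    def find(self, x):
        # No path compression: the forest only changes during Union,
        # so every change is recorded and reversible.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            self.history.append(None)   # record a no-op so undo stays uniform
            return
        if self.size[x] < self.size[y]:
            x, y = y, x                  # union by size keeps trees O(log n) deep
        self.parent[y] = x
        self.size[x] += self.size[y]
        self.history.append((y, x))

    def backtrack(self):
        # Undo the most recent Union in O(1).
        change = self.history.pop()
        if change is not None:
            y, x = change
            self.parent[y] = y
            self.size[x] -= self.size[y]
</syntaxhighlight>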
== Applications ==
This data structure is used by the [[Boost Graph Library]] to implement its [http://www.boost.org/libs/graph/doc/incremental_components.html Incremental Connected Components] functionality. It is also a key component in implementing [[Kruskal's algorithm]] to find the [[minimum spanning tree]] of a graph.
The [[Hoshen–Kopelman algorithm]] for cluster labeling also relies on a union–find data structure.
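To make the Kruskal connection concrete, here is a compact Python sketch (the edge-list format and function name are assumptions of this example): union–find decides in near-constant amortized time whether a candidate edge would close a cycle.

<syntaxhighlight lang="python">
def kruskal_mst(n, edges):
    """edges: iterable of (weight, u, v) tuples over vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):          # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # u and v lie in different components
            parent[ru] = rv                # merge them (union by size omitted for brevity)
            mst.append((u, v, w))
    return mst

# Example: a triangle plus one extra vertex.
print(kruskal_mst(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
# -> [(0, 1, 1), (1, 2, 2), (2, 3, 4)]
</syntaxhighlight>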
== See also ==
* [https://github.com/jgrapht/jgrapht/blob/master/jgrapht-core/src/main/java/org/jgrapht/alg/util/UnionFind.java Java implementation], part of [https://jgrapht.org/ JGraphT library]
* [https://the-algorithms.com/algorithm/union-find Javascript implementation]
* [http://code.activestate.com/recipes/215912-union-find-data-structure/ Python implementation]
{{data structures}}