Tarjan's strongly connected components algorithm: Difference between revisions

Content deleted Content added
dup ref
Stack invariant: ce: attempt to fix one gibberish sentence and removing a second
Tags: Mobile edit Mobile web edit
 
(10 intermediate revisions by 7 users not shown)
Line 1:
{{Short description|Graph algorithm}}
{{CS1 config|mode=cs2}}
{{Infobox algorithm
|class=
Line 12 ⟶ 13:
|complete=
}}
'''Tarjan's strongly connected components algorithm''' is an [[algorithm]] in [[graph theory]] for finding the [[strongly connected component]]s (SCCs) of a [[directed graph]]. It runs in [[linear time]], matching the time bound for alternative methods including [[Kosaraju's algorithm]] and the [[path-based strong component algorithm]]. The algorithm is named for its inventor, [[Robert Tarjan]].<ref name=Tarjan>{{citation|first=R. E.|last=Tarjan|author-link=Robert Tarjan|title=Depth-first search and linear graph algorithms|journal=[[SIAM Journal on Computing]]|volume=1|year=1972|issue=2|pages=146–160|doi=10.1137/0201010|citeseerx=10.1.1.327.8418|url=http://www.cs.ucsb.edu/~gilbert/cs240a/old/cs240aSpr2011/slides/TarjanDFS.pdf|access-date=2024-04-07|archive-date=2017-08-29|archive-url=https://web.archive.org/web/20170829214726id_20170829214726/http://www.cs.ucsb.edu/~gilbert/cs240a/old/cs240aSpr2011/slides/TarjanDFS.pdf|url-status=bot: unknown}}</ref>
 
== Overview ==
 
The algorithm takes a [[directed graph]] as input, and produces a [[Partition of a set|partition]] of the graph's [[Vertex (graph theory)|vertices]] into the graph's strongly connected components. Each vertex of the graph appears in exactly one of the strongly connected components. Any vertex that is not on a directed cycle forms a strongly connected component all by itself: for example, aany vertex whose in-degree or out-degree is 0, or anyevery vertex of ana [[directed acyclic graph]].
 
The basic idea of the algorithm is this: a depth-first search (DFS) begins from an arbitrary start node (and subsequent depth-first searches are conducted on any nodes that have not yet been found). As usual with depth-first search, the search visits every node of the graph exactly once, refusing to revisit any node that has already been visited. Thus, the collection of search trees is a [[Spanning forest#Spanning forests|spanning forest]] of the graph. The strongly connected components will be recovered as certain subtrees of this forest. The roots of these subtrees are called the "roots" of the strongly connected components. Any node of a strongly connected component might serve as a root, if it happens to be the first node of a component that is discovered by search.
Line 22 ⟶ 23:
=== Stack invariant ===
 
Nodes are placed on a [[Stack (data structure)|stack]] in the order in which they are visited. When the depth-first search recursively visits a node <code>v</code> and its descendants, those nodes are not all necessarily popped from the stack when this recursive call returns. The crucial [[Invariant (computer science)|invariant property]] is that a node remains on the stack after it has been visited if and only if there exists a path in the input graph from it to some node earlier on the stack. In other words, it means that in the DFS a node would beis only removed from the DFS stack afterwhen all of its connected paths have been traversed. When the DFS will backtrack it would remove the nodes on a single path and return to the root in order to start a new path.
 
At the end of the call that visits <code>v</code> and its descendants, we know whether <code>v</code> itself has a path to any node earlier on the stack. If so, the call returns, leaving <code>v</code> on the stack to preserve the invariant. If not, then <code>v</code> must be the root of its strongly connected component, which consists of <code>v</code> together with any nodes later on the stack than <code>v</code> (such nodes all have paths back to <code>v</code> but not to any earlier node, because if they had paths to earlier nodes then <code>v</code> would also have paths to earlier nodes which is false). The connected component rooted at <code>v</code> is then popped from the stack and returned, again preserving the invariant.
 
=== Bookkeeping ===
Line 80 ⟶ 81:
When each node finishes recursing, if its lowlink is still set to its index, then it is the root node of a strongly connected component, formed by all of the nodes above it on the stack. The algorithm pops the stack up to and including the current node, and presents all of these nodes as a strongly connected component.
 
In Tarjan's paper, when <code>''w''</code> is on the stack, <code>''v''.lowlink</code> is updated with the assignment <code>''v''.lowlink := min(''v''.lowlink, ''w''.index)</code>.<ref name=Tarjan/>{{rp|157}} A common variation is to instead use <code>''v''.lowlink := min(''v''.lowlink, ''w''.lowlink)</code>.<ref>{{cite journal |last1=Kordy |first1=Piotr |last2=Langerak |first2=Rom |last3=Mauw |first3=Sjouke |last4=Polderman |first4=Jan Willem |title=A Symbolic Algorithm for the Analysis of Robust Timed Automata |journal=FM 2014: Formal Methods |date=2014 |volume=8442 |pages=351–366 |doi=10.1007/978-3-319-06410-9_25 |url=https://satoss.uni.lu/members/sjouke/papers/KLMP14.pdf}}</ref><ref>{{cite web |title=Lecture 19: Tarjan’s Algorithm for Identifying Strongly Connected Components in the Dependency Graph |url=http://courses.cms.caltech.edu/cs130/lectures/CS130-Wi2024-Lec19.pdf |website=CS130 Software Engineering |publisher=Caltech |date=Winter 2024}}</ref> This modified algorithm does not compute the lowlink numbers as Tarjan defined them, but the test <code>''v''.lowlink = ''v''.index</code> still identifies root nodes of strongly connected components, and therefore the overall algorithm remains valid.<ref name="CMU2018">{{cite web |title=Lecture #19: Depth First Search and Strong Components |url=https://www.cs.cmu.edu/~15451-f18/lectures/lec19-DFS-strong-components.pdf |website=15-451/651: Design & Analysis of Algorithms |publisher=Carnegie Mellon |date=1 November 2018}}</ref>conference
| last1 = Kordy | first1 = Piotr
| last2 = Langerak | first2 = Rom
| last3 = Mauw | first3 = Sjouke
| last4 = Polderman | first4 = Jan Willem
| editor1-last = Jones | editor1-first = Cliff B.
| editor2-last = Pihlajasaari | editor2-first = Pekka
| editor3-last = Sun | editor3-first = Jun
| contribution = A symbolic algorithm for the analysis of robust timed automata
| contribution-url = https://satoss.uni.lu/members/sjouke/papers/KLMP14.pdf
| doi = 10.1007/978-3-319-06410-9_25
| isbn = 978-3-319-06409-3
| pages = 351–366
| publisher = Springer
| series = Lecture Notes in Computer Science
| title = FM 2014: Formal Methods – 19th International Symposium, Singapore, May 12–16, 2014. Proceedings
| volume = 8442
| year = 2014}}</ref><ref>{{cite web |title=Lecture 19: Tarjan's Algorithm for Identifying Strongly Connected Components in the Dependency Graph |url=http://courses.cms.caltech.edu/cs130/lectures-2024wi/CS130-Wi2024-Lec19.pdf |website=CS130 Software Engineering |publisher=Caltech |date=Winter 2024}}</ref> This modified algorithm does not compute the lowlink numbers as Tarjan defined them, but the test <code>''v''.lowlink = ''v''.index</code> still identifies root nodes of strongly connected components, and therefore the overall algorithm remains valid.<ref name="CMU2018">{{cite web |title=Lecture #19: Depth First Search and Strong Components |url=https://www.cs.cmu.edu/~15451-f18/lectures/lec19-DFS-strong-components.pdf |website=15-451/651: Design & Analysis of Algorithms |publisher=Carnegie Mellon |date=1 November 2018}}</ref>
 
== Complexity ==
Line 86 ⟶ 104:
 
In order to achieve this complexity, the test for whether <code>w</code> is on the stack should be done in constant time.
This maycan be done, foras example,in bythe storingpseudocode above: store a flag on each node that indicates whether it is on the stack, and performing this test by examining the flag.
 
''Space Complexity'': The Tarjan procedure requires two words of supplementary data per vertex for the <code>index</code> and <code>lowlink</code> fields, along with one bit for <code>onStack</code> and another for determining when <code>index</code> is undefined. In addition, one word is required on each stack frame to hold <code>v</code> and another for the current position in the edge list. Finally, the worst-case size of the stack <code>S</code> must be <math>|V|</math> (i.e. when the graph is one giant component). This gives a final analysis of <math>O(|V|\cdot(2+5w))</math> where <math>w</math> is the machine word size. The variation of Nuutila and Soisalon-Soininen reduced this to <math>O(|V|\cdot(1+4w))</math> and, subsequently, that of Pearce requires only <math>O(|V|\cdot(1+3w))</math>.<ref>{{cite journal|last=Nuutila|first=Esko|title=On Finding the Strongly Connected Components in a Directed Graph|journal=Information Processing Letters|pages=9–14|volume=49|number=1|doi=10.1016/0020-0190(94)90047-7|year=1994}}</ref><ref>{{cite journal|last=Pearce|first=David|title=A Space Efficient Algorithm for Detecting Strongly Connected Components|journal=Information Processing Letters|pages=47–52|number=1|volume=116|doi=10.1016/j.ipl.2015.08.010}}</ref>