Tarjan's strongly connected components algorithm: Difference between revisions

Revision as of 03:17, 22 August 2008 by Dcoetzee (37,529 edits). Edit summary: Link other algorithms

Latest revision as of 20:48, 26 August 2025 by 24.19.113.134. Edit summary: →Stack invariant: ce: attempt to fix one gibberish sentence and removing a second. Tags: Mobile edit, Mobile web edit
(280 intermediate revisions by more than 100 users not shown)
Line 1:

{{Short description|Graph algorithm}}

'''Tarjan's Algorithm''' (named for its discoverer, [[Robert Tarjan]]) is a [[graph theory]] [[algorithm]] for finding the [[strongly connected components]] of a [[Graph (data structure)|graph]]. It can be seen as an improved version of [[Kosaraju's algorithm|Kosaraju's algorithm]], and is comparable in efficiency to [[Gabow's algorithm]].

{{CS1 config|mode=cs2}}

{{Infobox algorithm
|class=
|image= [[File:Tarjan's Algorithm Animation.gif|250px]]
|caption = Tarjan's algorithm animation
|data=[[Graph (data structure)|Graph]]
|time= <math>O(|V|+|E|)</math>
|best-time=
|average-time=
|space=
|optimal=
|complete=
}}

'''Tarjan's strongly connected components algorithm''' is an [[algorithm]] in [[graph theory]] for finding the [[strongly connected component]]s (SCCs) of a [[directed graph]]. It runs in [[linear time]], matching the time bound for alternative methods including [[Kosaraju's algorithm]] and the [[path-based strong component algorithm]]. The algorithm is named for its inventor, [[Robert Tarjan]].<ref name=Tarjan>{{citation|first=R. E.|last=Tarjan|author-link=Robert Tarjan|title=Depth-first search and linear graph algorithms|journal=[[SIAM Journal on Computing]]|volume=1|year=1972|issue=2|pages=146–160|doi=10.1137/0201010|citeseerx=10.1.1.327.8418|url=http://www.cs.ucsb.edu/~gilbert/cs240a/old/cs240aSpr2011/slides/TarjanDFS.pdf|access-date=2024-04-07|archive-date=2017-08-29|archive-url=https://web.archive.org/web/20170829214726/http://www.cs.ucsb.edu/~gilbert/cs240a/old/cs240aSpr2011/slides/TarjanDFS.pdf|url-status=bot: unknown}}</ref>

== ~~Idea~~Overview ==

The basic idea of the algorithm is this: a [[depth-first search]] begins from a start node. The strongly connected components form the subtrees of the search tree, the roots of which are the roots of the strongly connected components.

The nodes are placed on a [[Stack (data structure)|stack]] in the order in which they are visited. When the search returns from a subtree, the nodes are taken from the stack and it is determined whether each node is the root of a strongly connected component. If a node is the root of a strongly connected component, then it and all of the nodes taken off before it form that strongly connected component.

The algorithm takes a [[directed graph]] as input, and produces a [[Partition of a set|partition]] of the graph's [[Vertex (graph theory)|vertices]] into the graph's strongly connected components. Each vertex of the graph appears in exactly one of the strongly connected components. Any vertex that is not on a directed cycle forms a strongly connected component all by itself: for example, any vertex whose in-degree or out-degree is 0, or every vertex of a [[directed acyclic graph]].

~~== The root property ==~~

The basic idea of the algorithm is this: a depth-first search (DFS) begins from an arbitrary start node (and subsequent depth-first searches are conducted on any nodes that have not yet been found). As usual with depth-first search, the search visits every node of the graph exactly once, refusing to revisit any node that has already been visited. Thus, the collection of search trees is a [[Spanning forest#Spanning forests|spanning forest]] of the graph. The strongly connected components will be recovered as certain subtrees of this forest. The roots of these subtrees are called the "roots" of the strongly connected components. Any node of a strongly connected component might serve as a root, if it happens to be the first node of a component that is discovered by search.

The crux of the algorithm comes in determining whether a node is the root of a strongly connected component. To do this, each node is given a depth search index <tt>v.index</tt>, which numbers the nodes consecutively in the order in which they are discovered. In addition, each node is assigned a value <tt>v.lowlink</tt> that satisfies <tt>v.lowlink := min {v'.index: v' is reachable from v}</tt>. Therefore <tt>v</tt> is the root of a strongly connected component if and only if <tt>v.lowlink = v.index</tt>. The value <tt>v.lowlink</tt> is computed during the depth first search such that it is always known when needed.

=== Stack invariant ===

~~== The algorithm in [[pseudocode]] ==~~

~~Input: Graph G = (V, E), Start node v0~~

Nodes are placed on a [[Stack (data structure)|stack]] in the order in which they are visited. When the depth-first search recursively visits a node <code>v</code> and its descendants, those nodes are not all necessarily popped from the stack when this recursive call returns. The crucial [[Invariant (computer science)|invariant property]] is that a node remains on the stack after it has been visited if and only if there exists a path in the input graph from it to some node earlier on the stack. In other words, a node is only removed from the DFS stack when all of its connected paths have been traversed. At the end of the call that visits <code>v</code> and its descendants, we know whether <code>v</code> itself has a path to any node earlier on the stack. If so, the call returns, leaving <code>v</code> on the stack to preserve the invariant. If not, then <code>v</code> must be the root of its strongly connected component, which consists of <code>v</code> together with any nodes later on the stack than <code>v</code> (such nodes all have paths back to <code>v</code> but not to any earlier node, because if they had paths to earlier nodes then <code>v</code> would also have paths to earlier nodes which is false). The connected component rooted at <code>v</code> is then popped from the stack and returned, again preserving the invariant.
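The invariant above is stated in terms of paths between nodes. When checking an implementation on small graphs, it can help to compare against the definition itself: two vertices belong to the same component exactly when each can reach the other. The following is a minimal sketch of that check in Python. It is not Tarjan's algorithm and makes no attempt at linear time. The adjacency-list dictionary format and the names <code>reachable</code> and <code>scc_by_definition</code> are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set

Graph = Dict[Hashable, Iterable[Hashable]]


def reachable(graph: Graph, start: Hashable) -> Set[Hashable]:
    """Return every vertex reachable from `start`, including `start` itself."""
    seen = {start}
    todo = [start]
    while todo:
        v = todo.pop()
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen


def scc_by_definition(graph: Graph) -> List[Set[Hashable]]:
    """Partition the vertices by mutual reachability (reference check only)."""
    # Collect every vertex once, including those that appear only as successors.
    vertices = list(dict.fromkeys(
        [v for v in graph] + [w for succs in graph.values() for w in succs]
    ))
    reach = {v: reachable(graph, v) for v in vertices}

    components: List[Set[Hashable]] = []
    assigned: Set[Hashable] = set()
    for v in vertices:
        if v in assigned:
            continue
        # w shares a component with v when v reaches w and w reaches v.
        component = {w for w in reach[v] if v in reach[w]}
        components.append(component)
        assigned |= component
    return components


example = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"], "f": []}
print(scc_by_definition(example))
```

In the example graph, <code>f</code> lies on no directed cycle and therefore forms a component by itself.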
=== Bookkeeping ===

Each node <code>v</code> is assigned a unique integer <code>v.index</code>, which numbers the nodes consecutively in the order in which they are discovered. It also maintains a value <code>v.lowlink</code> that represents the smallest index of any node on the stack known to be reachable from <code>v</code> through <code>v</code>'s DFS subtree, including <code>v</code> itself. Therefore <code>v</code> must be left on the stack if <code>v.lowlink < v.index</code>, whereas <code>v</code> must be removed as the root of a strongly connected component if <code>v.lowlink == v.index</code>. The value <code>v.lowlink</code> is computed during the depth-first search from <code>v</code>, as this finds the nodes that are reachable from <code>v</code>.

The lowlink is different from the lowpoint, which is the smallest index reachable from <code>v</code> through any part of the graph.<ref name=Tarjan/>{{rp|156}}<ref name="CMU2018"/>

== The algorithm in pseudocode ==

 ~~index = 0                       // DFS node number counter~~
 ~~S = empty                       // An empty stack of nodes~~
 ~~tarjan(v0)                      // Start a DFS at the start node~~
 
 ~~procedure tarjan(v)~~
 ~~  v.index = index               // Set the depth index for v~~
 ~~  v.lowlink = index~~
 ~~  index = index + 1~~
 ~~  S.push(v)                     // Push v on the stack~~
 ~~  forall (v, v') in E do        // Consider successors of v~~
 ~~    if (v'.index is undefined)  // Was successor v' visited?~~
 ~~      tarjan(v')                // Recurse~~
 ~~      v.lowlink = min(v.lowlink, v'.lowlink)~~
 ~~    elseif (v' in S)            // Is v' on the stack?~~
 ~~      v.lowlink = min(v.lowlink, v'.index)~~
 ~~  if (v.lowlink == v.index)     // Is v the root of an SCC?~~
 ~~    print "SCC:"~~
 ~~    repeat~~
 ~~      v' = S.pop~~
 ~~      print v'~~
 ~~    until (v' == v)~~

 '''algorithm''' tarjan '''is'''
     '''input:''' graph ''G'' = (''V'', ''E'')
     '''output:''' set of strongly connected components (sets of vertices)
 
     ''index'' := 0
     ''S'' := empty stack
     '''for each''' ''v'' '''in''' ''V'' '''do'''
         '''if''' ''v''.index is undefined '''then'''
             strongconnect(''v'')
 
     '''function''' strongconnect(''v'')
         ''// Set the depth index for v to the smallest unused index''
         ''v''.index := ''index''
         ''v''.lowlink := ''index''
         ''index'' := ''index'' + 1
         ''S''.push(''v'')
         ''v''.onStack := true
 
         ''// Consider successors of v''
         '''for each''' (''v'', ''w'') '''in''' ''E'' '''do'''
             '''if''' ''w''.index is undefined '''then'''
                 ''// Successor w has not yet been visited; recurse on it''
                 strongconnect(''w'')
                 ''v''.lowlink := min(''v''.lowlink, ''w''.lowlink)
             '''else if''' ''w''.onStack '''then'''
                 ''// Successor w is in stack S and hence in the current SCC''
                 ''// If ''w'' is not on stack, then (''v'', ''w'') is an edge pointing to an SCC already found and must be ignored''
                 ''// See below regarding the next line''
                 ''v''.lowlink := min(''v''.lowlink, ''w''.index)
 
         ''// If v is a root node, pop the stack and generate an SCC''
         '''if''' ''v''.lowlink = ''v''.index '''then'''
             start a new strongly connected component
             '''repeat'''
                 ''w'' := ''S''.pop()
                 ''w''.onStack := false
                 add ''w'' to current strongly connected component
             '''while''' ''w'' ≠ ''v''
             output the current strongly connected component

The <code>index</code> variable is the depth-first search node number counter. <code>S</code> is the node stack, which starts out empty and stores the history of nodes explored but not yet committed to a strongly connected component. This is not the normal depth-first search stack, as nodes are not popped as the search returns up the tree; they are only popped when an entire strongly connected component has been found.

The outermost loop searches each node that has not yet been visited, ensuring that nodes which are not reachable from the first node are still eventually traversed. The function <code>strongconnect</code> performs a single depth-first search of the graph, finding all successors from the node <code>v</code>, and reporting all strongly connected components of that subgraph.
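The pseudocode in the latest revision can be transcribed into Python fairly directly. The sketch below is one possible rendering, not a reference implementation. It assumes the graph is given as a dictionary mapping each vertex to a list of successors, stores <code>index</code> and <code>lowlink</code> in dictionaries instead of on node objects, and uses a set in place of the <code>onStack</code> flag. The name <code>tarjan_scc</code> is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def tarjan_scc(graph: Dict[Hashable, Iterable[Hashable]]) -> List[List[Hashable]]:
    """Return the strongly connected components of an adjacency-list graph."""
    index: Dict[Hashable, int] = {}    # v.index; a missing key means "undefined"
    lowlink: Dict[Hashable, int] = {}  # v.lowlink
    on_stack: Set[Hashable] = set()    # v.onStack, kept as a set for O(1) tests
    stack: List[Hashable] = []         # the node stack S
    components: List[List[Hashable]] = []
    counter = 0                        # the global `index` counter

    def strongconnect(v: Hashable) -> None:
        nonlocal counter
        # Give v the smallest unused index and push it on the stack.
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)

        for w in graph.get(v, ()):
            if w not in index:
                # w has not been visited yet; recurse on it.
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                # w is on the stack, hence in the current component.
                lowlink[v] = min(lowlink[v], index[w])
            # Otherwise (v, w) points into a finished component and is ignored.

        if lowlink[v] == index[v]:
            # v is a root: pop the stack down to and including v.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return components


example = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"], "f": []}
print(tarjan_scc(example))  # prints [['e', 'd'], ['c', 'b', 'a'], ['f']]
```

Because the sketch recurses once per vertex along a search path, Python's default recursion limit restricts it to graphs of modest depth; an explicit work stack would be needed for larger inputs.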
When each node finishes recursing, if its lowlink is still set to its index, then it is the root node of a strongly connected component, formed by all of the nodes above it on the stack. The algorithm pops the stack up to and including the current node, and presents all of these nodes as a strongly connected component.

In Tarjan's paper, when <code>''w''</code> is on the stack, <code>''v''.lowlink</code> is updated with the assignment <code>''v''.lowlink := min(''v''.lowlink, ''w''.index)</code>.<ref name=Tarjan/>{{rp|157}} A common variation is to instead use <code>''v''.lowlink := min(''v''.lowlink, ''w''.lowlink)</code>.<ref>{{cite conference | last1 = Kordy | first1 = Piotr | last2 = Langerak | first2 = Rom | last3 = Mauw | first3 = Sjouke | last4 = Polderman | first4 = Jan Willem | editor1-last = Jones | editor1-first = Cliff B. | editor2-last = Pihlajasaari | editor2-first = Pekka | editor3-last = Sun | editor3-first = Jun | contribution = A symbolic algorithm for the analysis of robust timed automata | contribution-url = https://satoss.uni.lu/members/sjouke/papers/KLMP14.pdf | doi = 10.1007/978-3-319-06410-9_25 | isbn = 978-3-319-06409-3 | pages = 351–366 | publisher = Springer | series = Lecture Notes in Computer Science | title = FM 2014: Formal Methods – 19th International Symposium, Singapore, May 12–16, 2014. Proceedings | volume = 8442 | year = 2014}}</ref><ref>{{cite web |title=Lecture 19: Tarjan's Algorithm for Identifying Strongly Connected Components in the Dependency Graph |url=http://courses.cms.caltech.edu/cs130/lectures-2024wi/CS130-Wi2024-Lec19.pdf |website=CS130 Software Engineering |publisher=Caltech |date=Winter 2024}}</ref> This modified algorithm does not compute the lowlink numbers as Tarjan defined them, but the test <code>''v''.lowlink = ''v''.index</code> still identifies root nodes of strongly connected components, and therefore the overall algorithm remains valid.<ref name="CMU2018">{{cite web |title=Lecture #19: Depth First Search and Strong Components |url=https://www.cs.cmu.edu/~15451-f18/lectures/lec19-DFS-strong-components.pdf |website=15-451/651: Design & Analysis of Algorithms |publisher=Carnegie Mellon |date=1 November 2018}}</ref>

== Complexity ==

''Time Complexity'': The Tarjan procedure is called once for each node; the forall statement considers each edge at most once. The algorithm's running time is therefore linear in the number of edges and nodes in G, i.e. <math>O(|V|+|E|)</math>. In order to achieve this complexity, the test for whether <code>w</code> is on the stack should be done in constant time. This can be done as in the pseudocode above: store a flag on each node that indicates whether it is on the stack, and perform the test by examining the flag.

''Space Complexity'': The Tarjan procedure requires two words of supplementary data per vertex for the <code>index</code> and <code>lowlink</code> fields, along with one bit for <code>onStack</code> and another for determining when <code>index</code> is undefined. In addition, one word is required on each stack frame to hold <code>v</code> and another for the current position in the edge list. Finally, the worst-case size of the stack <code>S</code> must be <math>|V|</math> (i.e. when the graph is one giant component).
This gives a final analysis of <math>O(|V|\cdot(2+5w))</math> where <math>w</math> is the machine word size. The variation of Nuutila and Soisalon-Soininen reduced this to <math>O(|V|\cdot(1+4w))</math> and, subsequently, that of Pearce requires only <math>O(|V|\cdot(1+3w))</math>.<ref>{{cite journal|last=Nuutila|first=Esko|title=On Finding the Strongly Connected Components in a Directed Graph|journal=Information Processing Letters|pages=9–14|volume=49|number=1|doi=10.1016/0020-0190(94)90047-7|year=1994}}</ref><ref>{{cite journal|last=Pearce|first=David|title=A Space Efficient Algorithm for Detecting Strongly Connected Components|journal=Information Processing Letters|pages=47–52|number=1|volume=116|doi=10.1016/j.ipl.2015.08.010}}</ref>

==Additional ~~Remarks~~ remarks==

While there is nothing special about the order of the nodes within each strongly connected component, one useful property of the algorithm is that no strongly connected component will be identified before any of its successors. Therefore, the order in which the strongly connected components are identified constitutes a reverse [[Topological sorting|topological sort]] of the [[Directed acyclic graph|DAG]] formed by the strongly connected components.<ref>{{cite web|last=Harrison|first=Paul|title=Robust topological sorting and Tarjan's algorithm in Python|url=http://www.logarithmic.net/pfh/blog/01208083168|access-date=9 February 2011}}</ref>

#Complexity: The tarjan procedure is called once for each node; the forall statement considers each edge at most twice. The algorithm's running time is therefore linear in the number of edges in G (O(|V|+|E|)).

~~#The test for whether v' is on the stack should be done in constant time, for example, by testing a flag stored on each node that indicates whether it is on the stack.~~

~~#The algorithm can only find those strongly connected components that are reachable from the start node. This can be overcome by starting the algorithm several times from different start nodes.~~

[[Donald Knuth]] described Tarjan's SCC algorithm as one of his favorite implementations in the book ''The Stanford GraphBase''.<ref>Knuth, ''The Stanford GraphBase'', pages 512–519.</ref>

~~== Literature ==~~

He also wrote:<ref>{{cite book|last=Knuth|first=Donald|title=Twenty Questions for Donald Knuth|url=http://www.informit.com/articles/article.aspx?p=2213858&WT.mc_id=Author_Knuth_20Questions|date=2014-05-20}}</ref>

{{quote|The data structures that he devised for this problem fit together in an amazingly beautiful way, so that the quantities you need to look at while exploring a directed graph are always magically at your fingertips. And his algorithm also does topological sorting as a byproduct.}}

* Robert Tarjan: ''Depth-first search and linear graph algorithms''. In: ''SIAM Journal on Computing''. Vol. 1 (1972), No. 2, P. 146-160.

== ~~Links~~References ==

<references />

~~[http://www.ics.uci.edu/~eppstein/161/960220.html#sca Description of Tarjan's Algorithm]~~

[[Category:Graph algorithms]]
[[Category:Graph connectivity]]
[[Category:Articles with example pseudocode]]

==External links==

~~[[de:Algorithmus von Tarjan zur Bestimmung starker Zusammenhangskomponenten]]~~

* [https://rosettacode.org/wiki/Tarjan Rosetta Code], showing implementations in different languages

~~[[ru:Алгоритм Тарьяна]]~~

* [https://github.com/Vacilando/php-tarjan PHP implementation of Tarjan's strongly connected components algorithm]
* [https://github.com/Vacilando/js-tarjan JavaScript implementation of Tarjan's strongly connected components algorithm]