Strongly Connected Components Explained — Kosaraju, Tarjan & When to Use Each
Social networks, airline route maps, compiler dependency graphs, deadlock detection in operating systems — every one of these systems hides a graph underneath, and inside that graph lurk tightly knit clusters where every node can reach every other node. Those clusters are Strongly Connected Components (SCCs), and identifying them is one of the most practically useful graph operations you'll ever implement. Miss an SCC and your compiler might mis-order function compilation. Ignore SCCs in a routing system and you'll miss that a subset of cities is completely isolated from the rest of the network.
The core problem SCCs solve is reachability symmetry in directed graphs. In an undirected graph, if A can reach B then B can reach A — trivially. But in a directed graph, edges have direction, so reachability isn't symmetric. You might be able to fly from New York to London but the direct return flight might not exist. SCCs carve a directed graph into maximal groups where every internal pair of nodes can mutually reach each other, giving you a clean decomposition of the graph's cyclic structure.
By the end of this article you'll understand exactly how Kosaraju's two-pass DFS and Tarjan's single-pass low-link algorithm work at the bit-level, you'll know which one to reach for in production and why, you'll have complete runnable Java implementations of both, and you'll be ready to handle the tricky follow-up questions interviewers throw at candidates who only memorised the surface-level answer.
What an SCC Actually Is — And the Condensation Graph Trick
A Strongly Connected Component is a maximal set of vertices in a directed graph such that there is a directed path from every vertex in the set to every other vertex in the set. The word 'maximal' is doing serious work here — you can't add one more vertex and keep the property. A single isolated vertex with no edges counts as an SCC of size 1, because the only path it needs is the trivial zero-length path to itself.
Once you identify all SCCs, something beautiful happens: you can collapse each SCC into a single super-node, and the edges between SCCs form a Directed Acyclic Graph (DAG). This is called the condensation graph, and it's the secret weapon SCCs give you. A DAG can be topologically sorted, which means problems that were intractable on the original cyclic graph become straightforward on the condensation. Compiler pass ordering, task scheduling with mutual dependencies, and finding the minimum number of edges to add to make a graph strongly connected — all of these reduce to operations on the condensation DAG.
Think of the condensation as compressing all the complexity of cycles into single nodes. The resulting DAG tells you the 'macro structure' of your graph: which clusters depend on which other clusters, with all the internal circular dependencies safely encapsulated.
import java.util.*; /** * Demonstrates what an SCC is and builds a condensation graph. * Graph used: * 0 -> 1, 1 -> 2, 2 -> 0 (cycle: SCC {0,1,2}) * 1 -> 3, 3 -> 4 (no back edges: SCCs {3}, {4}) */ public class SCCCondensationDemo { private final int totalVertices; private final List<List<Integer>> adjacencyList; public SCCCondensationDemo(int totalVertices) { this.totalVertices = totalVertices; this.adjacencyList = new ArrayList<>(); for (int i = 0; i < totalVertices; i++) { adjacencyList.add(new ArrayList<>()); } } public void addDirectedEdge(int from, int to) { adjacencyList.get(from).add(to); } /** * Runs a DFS and records the finish order of vertices. * This finish order is the secret sauce of Kosaraju's algorithm. */ private void dfsRecordFinish( int vertex, boolean[] visited, Deque<Integer> finishStack, List<List<Integer>> graph) { visited[vertex] = true; for (int neighbour : graph.get(vertex)) { if (!visited[neighbour]) { dfsRecordFinish(neighbour, visited, finishStack, graph); } } // A vertex is pushed AFTER all its descendants finish. // The last vertex to finish has the highest 'reach' in the original graph. finishStack.push(vertex); } /** * Collects all vertices reachable from 'startVertex' on the transposed graph. * Every vertex collected here belongs to the same SCC as 'startVertex'. */ private void dfsCollectComponent( int startVertex, boolean[] visited, List<Integer> currentComponent, List<List<Integer>> transposedGraph) { visited[startVertex] = true; currentComponent.add(startVertex); for (int neighbour : transposedGraph.get(startVertex)) { if (!visited[neighbour]) { dfsCollectComponent(neighbour, visited, currentComponent, transposedGraph); } } } /** * Builds the transposed graph (every edge reversed). * Reversing edges is what lets us 'check' reachability in both directions. */ private List<List<Integer>> buildTransposedGraph() { List<List<Integer>> transposed = new ArrayList<>(); for (int i = 0; i < totalVertices; i++) { transposed.add(new ArrayList<>()); } for (int from = 0; from < totalVertices; from++) { for (int to : adjacencyList.get(from)) { transposed.get(to).add(from); // flip the direction } } return transposed; } /** * Kosaraju's algorithm — returns each SCC as a list of vertex IDs. * Time complexity: O(V + E). Space: O(V + E) for the transposed graph. */ public List<List<Integer>> findSCCs() { boolean[] visited = new boolean[totalVertices]; Deque<Integer> finishStack = new ArrayDeque<>(); // Pass 1: DFS on original graph, push vertices by finish time for (int vertex = 0; vertex < totalVertices; vertex++) { if (!visited[vertex]) { dfsRecordFinish(vertex, visited, finishStack, adjacencyList); } } List<List<Integer>> transposedGraph = buildTransposedGraph(); Arrays.fill(visited, false); // reset for second DFS pass List<List<Integer>> allSCCs = new ArrayList<>(); // Pass 2: Process vertices in reverse finish order on the transposed graph. // If you can reach a vertex from the 'leader' in the transposed graph, // it means the original graph has a path back — confirming mutual reachability. while (!finishStack.isEmpty()) { int leader = finishStack.pop(); if (!visited[leader]) { List<Integer> component = new ArrayList<>(); dfsCollectComponent(leader, visited, component, transposedGraph); allSCCs.add(component); } } return allSCCs; } public static void main(String[] args) { SCCCondensationDemo graph = new SCCCondensationDemo(5); // Cycle among 0, 1, 2 — they form one SCC graph.addDirectedEdge(0, 1); graph.addDirectedEdge(1, 2); graph.addDirectedEdge(2, 0); // 1 reaches 3 and 4, but 3 and 4 can't get back into the cycle graph.addDirectedEdge(1, 3); graph.addDirectedEdge(3, 4); List<List<Integer>> sccs = graph.findSCCs(); System.out.println("Strongly Connected Components found: " + sccs.size()); for (int i = 0; i < sccs.size(); i++) { System.out.println(" SCC " + (i + 1) + ": " + sccs.get(i)); } System.out.println(); System.out.println("Condensation insight:"); System.out.println(" {0,1,2} -> {3} -> {4} forms a DAG — no cycles remain."); } }
SCC 1: [0, 2, 1]
SCC 2: [3]
SCC 3: [4]
Condensation insight:
{0,1,2} -> {3} -> {4} forms a DAG — no cycles remain.
Tarjan's Algorithm — One DFS, Low-Link Values, and Why It's Faster in Practice
Kosaraju's two-pass approach is elegant and easy to explain, but it requires building an entirely transposed graph in memory and running DFS twice. Tarjan's algorithm finds all SCCs in a single DFS pass using two clever bookkeeping arrays: discovery time (when was this vertex first visited?) and low-link value (what is the earliest discovery time reachable from this vertex's subtree, including back edges?).
The low-link value is the key insight. When you finish processing a vertex, if its low-link value equals its own discovery time, it means no back edge from its subtree escapes to an ancestor. That vertex is the 'root' of an SCC. Everything on the DFS stack between the root and the current top of the stack belongs to that SCC.
Tarjan's algorithm uses an explicit stack to track 'potentially still in the current SCC' vertices, and a boolean array to mark which vertices are currently on that stack. The on-stack check is critical — without it, you'd incorrectly update low-link values through cross-edges to already-completed SCCs, collapsing separate components into one.
In practice, Tarjan's wins on memory because it doesn't need the transposed graph. For sparse graphs (E close to V) they're equivalent, but for dense graphs Kosaraju's extra memory allocation becomes noticeable.
import java.util.*; /** * Tarjan's SCC algorithm — single DFS pass using low-link values. * Time: O(V + E). Space: O(V) auxiliary (no transposed graph needed). * * Same test graph as before: * 0 -> 1 -> 2 -> 0 (cycle) * 1 -> 3 -> 4 */ public class TarjanSCC { private final int totalVertices; private final List<List<Integer>> adjacencyList; // Discovery time of each vertex (order in which DFS visits it) private final int[] discoveryTime; // Low-link value: smallest discovery time reachable from this vertex's subtree private final int[] lowLink; // Tracks whether a vertex is currently on the DFS stack private final boolean[] onStack; // The explicit stack holding the 'current path' of the DFS private final Deque<Integer> dfsStack; // Monotonically increasing timer — gives each vertex a unique discovery stamp private int timer; private final List<List<Integer>> allSCCs; public TarjanSCC(int totalVertices) { this.totalVertices = totalVertices; this.adjacencyList = new ArrayList<>(); this.discoveryTime = new int[totalVertices]; this.lowLink = new int[totalVertices]; this.onStack = new boolean[totalVertices]; this.dfsStack = new ArrayDeque<>(); this.allSCCs = new ArrayList<>(); Arrays.fill(discoveryTime, -1); // -1 means 'not yet visited' Arrays.fill(lowLink, -1); for (int i = 0; i < totalVertices; i++) { adjacencyList.add(new ArrayList<>()); } } public void addDirectedEdge(int from, int to) { adjacencyList.get(from).add(to); } /** * Core DFS function. * After visiting all neighbours of 'current', we check: * if lowLink[current] == discoveryTime[current] * That means 'current' is the root of an SCC — no back edge escapes upward. */ private void tarjanDFS(int current) { // Stamp this vertex with its discovery time and initialise low-link to the same value discoveryTime[current] = lowLink[current] = timer++; // Push onto our tracking stack dfsStack.push(current); onStack[current] = true; for (int neighbour : adjacencyList.get(current)) { if (discoveryTime[neighbour] == -1) { // Tree edge — neighbour hasn't been visited yet tarjanDFS(neighbour); // After returning, propagate the lowest reachable time upward lowLink[current] = Math.min(lowLink[current], lowLink[neighbour]); } else if (onStack[neighbour]) { // Back edge — neighbour is an ancestor still on our stack. // Update low-link directly from neighbour's DISCOVERY time, not its low-link. // Using discoveryTime here (not lowLink) prevents wrongly crossing SCC boundaries. lowLink[current] = Math.min(lowLink[current], discoveryTime[neighbour]); } // If neighbour is visited but NOT on the stack, it belongs to a completed SCC. // We deliberately ignore it — updating low-link from it would merge separate SCCs. } // If low-link equals discovery time, 'current' is the SCC root. // Pop everything off the stack down to (and including) 'current' — that's the SCC. if (lowLink[current] == discoveryTime[current]) { List<Integer> newComponent = new ArrayList<>(); int poppedVertex; do { poppedVertex = dfsStack.pop(); onStack[poppedVertex] = false; newComponent.add(poppedVertex); } while (poppedVertex != current); // stop when we reach the root allSCCs.add(newComponent); } } public List<List<Integer>> findAllSCCs() { // Handle disconnected graphs by trying every unvisited starting point for (int vertex = 0; vertex < totalVertices; vertex++) { if (discoveryTime[vertex] == -1) { tarjanDFS(vertex); } } return allSCCs; } public static void main(String[] args) { TarjanSCC graph = new TarjanSCC(5); graph.addDirectedEdge(0, 1); graph.addDirectedEdge(1, 2); graph.addDirectedEdge(2, 0); graph.addDirectedEdge(1, 3); graph.addDirectedEdge(3, 4); List<List<Integer>> sccs = graph.findAllSCCs(); System.out.println("Tarjan's Algorithm — SCCs found: " + sccs.size()); for (int i = 0; i < sccs.size(); i++) { System.out.println(" SCC " + (i + 1) + ": " + sccs.get(i)); } System.out.println(); System.out.println("Discovery times: " + Arrays.toString(graph.discoveryTime)); System.out.println("Final low-links: " + Arrays.toString(graph.lowLink)); System.out.println(); System.out.println("Vertices 0,1,2 share low-link=0 (discoveryTime[0])."); System.out.println("That confirms they are all in the same SCC."); } }
SCC 1: [4]
SCC 2: [3]
SCC 3: [2, 1, 0]
Discovery times: [0, 1, 2, 3, 4]
Final low-links: [0, 0, 0, 3, 4]
Vertices 0,1,2 share low-link=0 (discoveryTime[0]).
That confirms they are all in the same SCC.
Production Gotchas — Stack Overflow, Iterative Tarjan, and Real-World SCC Bugs
Both recursive implementations above will cause a StackOverflowError on graphs with tens of thousands of vertices in a default JVM. The default Java thread stack is 512KB–1MB, which allows roughly 5,000–10,000 recursive calls before crashing. Production graphs — web crawls, social networks, dependency trees — routinely have millions of vertices.
The fix is to rewrite the recursive DFS iteratively using an explicit stack that stores not just the vertex but also the iterator position (which neighbour are we currently processing). This is the tricky part: a recursive DFS implicitly stores the 'current neighbour index' in the call frame. When you go iterative, you must track that explicitly, typically by storing an iterator or index alongside the vertex on the stack.
A second production concern is input validation. Real-world graph data often has duplicate edges, self-loops, or disconnected components. Self-loops (a vertex pointing to itself) are valid SCCs of size 1 in Tarjan's — the condition lowLink[v] == discoveryTime[v] still holds. Duplicate edges don't break correctness but add wasted iterations; deduplicate adjacency lists on ingestion if performance matters.
A third gotcha is integer overflow in the timer for enormous graphs. A plain int timer maxes out at ~2.1 billion — fine for most cases, but worth noting if you're processing a multi-billion-edge web graph.
import java.util.*; /** * Iterative version of Tarjan's SCC algorithm. * Eliminates recursion to avoid StackOverflowError on large graphs. * * The key challenge: a recursive DFS implicitly tracks 'which neighbour am I * currently processing' via the call stack frame. We replicate that with a * DfsFrame object that stores the vertex AND a neighbour iterator. */ public class TarjanSCCIterative { private final int totalVertices; private final List<List<Integer>> adjacencyList; public TarjanSCCIterative(int totalVertices) { this.totalVertices = totalVertices; adjacencyList = new ArrayList<>(); for (int i = 0; i < totalVertices; i++) { adjacencyList.add(new ArrayList<>()); } } public void addDirectedEdge(int from, int to) { adjacencyList.get(from).add(to); } /** * Represents one 'frame' on our simulated call stack. * Stores the vertex being processed and an iterator over its neighbours, * so we know where to resume when we return from a 'recursive call'. */ private static class DfsFrame { final int vertex; final Iterator<Integer> neighbourIterator; DfsFrame(int vertex, Iterator<Integer> neighbourIterator) { this.vertex = vertex; this.neighbourIterator = neighbourIterator; } } public List<List<Integer>> findAllSCCs() { int[] discoveryTime = new int[totalVertices]; int[] lowLink = new int[totalVertices]; boolean[] onStack = new boolean[totalVertices]; Arrays.fill(discoveryTime, -1); Deque<Integer> sccStack = new ArrayDeque<>(); // Tarjan's tracking stack Deque<DfsFrame> callStack = new ArrayDeque<>(); // simulated recursion stack List<List<Integer>> allSCCs = new ArrayList<>(); int[] timer = {0}; // array wrapper so we can mutate inside the loop for (int startVertex = 0; startVertex < totalVertices; startVertex++) { if (discoveryTime[startVertex] != -1) continue; // already visited // Simulate the initial 'call' to tarjanDFS(startVertex) discoveryTime[startVertex] = lowLink[startVertex] = timer[0]++; sccStack.push(startVertex); onStack[startVertex] = true; callStack.push(new DfsFrame(startVertex, adjacencyList.get(startVertex).iterator())); while (!callStack.isEmpty()) { DfsFrame frame = callStack.peek(); // peek, not pop — we're 'inside' this frame int current = frame.vertex; if (frame.neighbourIterator.hasNext()) { // Still have neighbours to process — equivalent to the for-loop body int neighbour = frame.neighbourIterator.next(); if (discoveryTime[neighbour] == -1) { // Tree edge: simulate recursive call by pushing a new frame discoveryTime[neighbour] = lowLink[neighbour] = timer[0]++; sccStack.push(neighbour); onStack[neighbour] = true; callStack.push(new DfsFrame(neighbour, adjacencyList.get(neighbour).iterator())); } else if (onStack[neighbour]) { // Back edge: update low-link with neighbour's discovery time lowLink[current] = Math.min(lowLink[current], discoveryTime[neighbour]); } // Cross edge to completed SCC: do nothing } else { // No more neighbours — simulate the 'return' from the recursive call callStack.pop(); if (!callStack.isEmpty()) { // Propagate low-link upward to the parent frame int parent = callStack.peek().vertex; lowLink[parent] = Math.min(lowLink[parent], lowLink[current]); } // Check if 'current' is the root of an SCC if (lowLink[current] == discoveryTime[current]) { List<Integer> component = new ArrayList<>(); int poppedVertex; do { poppedVertex = sccStack.pop(); onStack[poppedVertex] = false; component.add(poppedVertex); } while (poppedVertex != current); allSCCs.add(component); } } } } return allSCCs; } public static void main(String[] args) { // Test with a larger graph to prove no stack overflow int nodeCount = 100_000; TarjanSCCIterative graph = new TarjanSCCIterative(nodeCount); // One giant cycle across all 100k nodes — should be exactly 1 SCC for (int i = 0; i < nodeCount - 1; i++) { graph.addDirectedEdge(i, i + 1); } graph.addDirectedEdge(nodeCount - 1, 0); // close the cycle List<List<Integer>> sccs = graph.findAllSCCs(); System.out.println("Graph with " + nodeCount + " nodes in one giant cycle."); System.out.println("SCCs found: " + sccs.size()); // should be 1 System.out.println("SCC size: " + sccs.get(0).size()); // should be 100000 System.out.println("No StackOverflowError — iterative approach works at scale."); // Also test the original 5-node graph for correctness TarjanSCCIterative smallGraph = new TarjanSCCIterative(5); smallGraph.addDirectedEdge(0, 1); smallGraph.addDirectedEdge(1, 2); smallGraph.addDirectedEdge(2, 0); smallGraph.addDirectedEdge(1, 3); smallGraph.addDirectedEdge(3, 4); List<List<Integer>> smallSCCs = smallGraph.findAllSCCs(); System.out.println("\nSmall graph SCCs: " + smallSCCs.size()); smallSCCs.forEach(scc -> System.out.println(" " + scc)); } }
SCCs found: 1
SCC size: 100000
No StackOverflowError — iterative approach works at scale.
Small graph SCCs: 3
[4]
[3]
[2, 1, 0]
| Feature / Aspect | Kosaraju's Algorithm | Tarjan's Algorithm |
|---|---|---|
| DFS passes required | Two separate passes | One pass only |
| Extra memory needed | Full transposed graph (O(V+E) extra) | O(V) for discoveryTime, lowLink, onStack arrays |
| Conceptual simplicity | Easier to explain and remember | Requires understanding low-link values |
| Recursive stack depth | Same risk — two DFS traversals | Same risk — one DFS traversal |
| Output order | SCCs in reverse topological order of condensation | SCCs in reverse topological order of condensation |
| Handles self-loops | Yes — self-loop vertex is its own SCC | Yes — lowLink[v] == discoveryTime[v] still holds |
| Handles disconnected graphs | Yes — outer loop covers all components | Yes — outer loop covers all components |
| Preferred in production | When code clarity is the priority | When memory efficiency matters most |
| Interview recognition | Very commonly asked — easy to whiteboard | Slightly harder to whiteboard but more impressive |
🎯 Key Takeaways
⚠ Common Mistakes to Avoid
- ✕Mistake 1: Updating lowLink through cross-edges to completed SCCs — Symptom: multiple real SCCs get merged into one giant component — Fix: only update lowLink[current] from a neighbour if onStack[neighbour] is true; ignore neighbours that are visited but no longer on the stack.
- ✕Mistake 2: Using lowLink[neighbour] instead of discoveryTime[neighbour] for back-edge updates — Symptom: subtle test cases fail where an ancestor's low-link has already been compressed through another path, incorrectly pulling down the current vertex's low-link below the true SCC boundary — Fix: for back edges (neighbour already on stack), always use lowLink[current] = min(lowLink[current], discoveryTime[neighbour]), not lowLink[neighbour].
- ✕Mistake 3: Forgetting to handle disconnected graphs in the outer loop — Symptom: vertices unreachable from vertex 0 are never visited and don't appear in any SCC — Fix: wrap your DFS in a for-loop that iterates over all vertices and calls DFS on any vertex where discoveryTime[vertex] == -1, guaranteeing every component is discovered regardless of the starting point.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.