Strongly Connected Components — Kosaraju's and Tarjan's Algorithms
Imagine a group of friends where everyone can pass a message to everyone else — maybe not directly, but through mutual friends. That closed circle is a Strongly Connected Component. Now imagine two separate friend groups at a party who never mingle: those are two different SCCs. Tarjan's and Kosaraju's algorithms are just clever, systematic ways of finding every such circle in linear time.
Social networks, airline route maps, compiler dependency graphs, deadlock detection in operating systems — every one of these systems hides a graph underneath, and inside that graph lurk tightly knit clusters where every node can reach every other node. Those clusters are Strongly Connected Components (SCCs), and identifying them is one of the most practically useful graph operations you'll ever implement. Miss an SCC and your compiler might mis-order function compilation. Ignore SCCs in a routing system and you'll miss that a subset of cities is completely isolated from the rest of the network.
The core problem SCCs solve is reachability symmetry in directed graphs. In an undirected graph, if A can reach B then B can reach A — trivially. But in a directed graph, edges have direction, so reachability isn't symmetric. You might be able to fly from New York to London but the direct return flight might not exist. SCCs carve a directed graph into maximal groups where every internal pair of nodes can mutually reach each other, giving you a clean decomposition of the graph's cyclic structure.
By the end of this article you'll understand exactly how Kosaraju's two-pass DFS and Tarjan's single-pass low-link algorithm work step by step, you'll know which one to reach for in production and why, you'll have a complete runnable Python implementation, and you'll be ready to handle the tricky follow-up questions interviewers throw at candidates who only memorised the surface-level answer.
What are Strongly Connected Components? — Plain English
A Strongly Connected Component (SCC) is a maximal group of nodes in a directed graph where every node can reach every other node in the group. Think of it like social circles where everyone in the circle can directly or indirectly message everyone else. For example, in a graph where A→B→C→A (a triangle), all three form one SCC. If D→A but A cannot reach D, then D is its own SCC. SCCs partition the graph into groups. Contracting each SCC to a single super-node produces a DAG (Directed Acyclic Graph) — the condensation graph. Applications: compiler analysis (detecting circular dependencies), social network analysis (finding tightly-knit communities), and internet page ranking.
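The definition itself can be turned into a deliberately naive checker: compute each node's reachable set, then group nodes whose reachability is mutual. At O(V·(V+E)) this is far slower than Kosaraju's or Tarjan's, but it makes the definition concrete. A sketch (function names are illustrative, not standard library calls):

```python
from collections import defaultdict, deque

def reachable(adj, start):
    """Set of nodes reachable from start, via BFS."""
    seen = {start}
    q = deque([start])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

def sccs_by_definition(nodes, adj):
    """Group nodes with mutual reachability -- O(V*(V+E)), for intuition only."""
    reach = {u: reachable(adj, u) for u in nodes}
    comps, assigned = [], set()
    for u in nodes:
        if u in assigned:
            continue
        comp = {v for v in nodes if v in reach[u] and u in reach[v]}
        comps.append(sorted(comp))
        assigned |= comp
    return comps

# The example from above: A->B->C->A is a triangle; D->A but A cannot reach D
adj = defaultdict(list)
for u, v in [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]:
    adj[u].append(v)
print(sccs_by_definition(["A", "B", "C", "D"], adj))  # [['A', 'B', 'C'], ['D']]
```

The quadratic cost is exactly what Kosaraju's and Tarjan's algorithms avoid: they recover the same partition in a single O(V+E) sweep (or two, for Kosaraju's).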
How Kosaraju's Algorithm Works — Step by Step
Kosaraju's uses two DFS passes:
Pass 1 — Find finish order:
1. Run DFS on the original graph. When a node fully finishes (all its descendants explored), push it onto a stack.
2. The last node pushed — the top of the stack — is the node that finishes last in DFS order.

Pass 2 — Process in reverse finish order on the reversed graph:
3. Reverse all edges in the graph (the transpose).
4. Pop nodes from the stack one by one.
5. If the popped node is unvisited in the reversed graph, run DFS from it on the reversed graph.
6. All nodes visited in this DFS form one SCC.
7. Repeat until the stack is empty.
Why it works: the finish-order stack ensures we process sink SCCs of the transposed graph (source SCCs of the original) first. Each DFS on the transposed graph from such an SCC reaches exactly the nodes in that SCC and no others.
Worked Example — Kosaraju's on a 5-Node Graph
Directed graph: 0→1, 1→2, 2→0, 1→3, 3→4. SCCs should be: {0,1,2}, {3}, {4}.
- DFS(0): go to 1, go to 2, go to 0 (visited), finish 2 (push 2), go to 3, go to 4, finish 4 (push 4), finish 3 (push 3), finish 1 (push 1), finish 0 (push 0).
- Stack (bottom to top): [2, 4, 3, 1, 0].
Pass 2 on the reversed graph (edges: 1→0, 2→1, 0→2, 3→1, 4→3):
- Pop 0: DFS on reversed graph. 0→2→1→0 (visited). Visits {0,1,2} → SCC #1.
- Pop 1: already visited.
- Pop 3: DFS on reversed graph. 3→1 (visited). Visits {3} → SCC #2.
- Pop 4: DFS on reversed graph. 4→3 (visited). Visits {4} → SCC #3.
- Result: [{0,1,2}, {3}, {4}].
Implementation
Kosaraju's runs two DFS passes: the first on the original graph to collect finish order, the second on the transposed graph in reverse finish order. Each SCC is the set of nodes visited in one DFS of pass 2. Tarjan's algorithm finds SCCs in a single DFS pass using a stack and 'low-link' values — more complex but only one traversal. Both algorithms run in O(V+E) time.
```python
from collections import defaultdict

def kosaraju_scc(num_v, adj):
    """Returns list of SCCs (each SCC is a list of node indices)."""
    # Pass 1: DFS to get finish order
    visited = [False] * num_v
    finish_stack = []

    def dfs1(v):
        visited[v] = True
        for nb in adj[v]:
            if not visited[nb]:
                dfs1(nb)
        finish_stack.append(v)

    for v in range(num_v):
        if not visited[v]:
            dfs1(v)

    # Build transposed graph
    radj = defaultdict(list)
    for v in range(num_v):
        for nb in adj[v]:
            radj[nb].append(v)

    # Pass 2: DFS on transposed graph in reverse finish order
    visited2 = [False] * num_v
    sccs = []

    def dfs2(v, component):
        visited2[v] = True
        component.append(v)
        for nb in radj[v]:
            if not visited2[nb]:
                dfs2(nb, component)

    while finish_stack:
        v = finish_stack.pop()
        if not visited2[v]:
            comp = []
            dfs2(v, comp)
            sccs.append(comp)
    return sccs

# Graph: 0->1->2->0 (one SCC), 1->3->4 (two singleton SCCs)
adj = defaultdict(list)
for u, v in [(0, 1), (1, 2), (2, 0), (1, 3), (3, 4)]:
    adj[u].append(v)

result = kosaraju_scc(5, adj)
for scc in result:
    print(sorted(scc))
```

Output:

```
[0, 1, 2]
[3]
[4]
```
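For comparison, here is a sketch of Tarjan's single-pass version. It assigns each vertex a discovery index and a "low-link" (the lowest index reachable from that vertex's DFS subtree), keeps candidates on an explicit stack, and pops off one whole SCC whenever a vertex's low-link equals its own index (making it the root of its SCC). Variable names are illustrative:

```python
def tarjan_scc(num_v, adj):
    """Tarjan's algorithm: one DFS pass, low-link values, explicit stack."""
    index = [None] * num_v      # discovery time of each vertex
    low = [0] * num_v           # lowest discovery time reachable from subtree
    on_stack = [False] * num_v
    stack, sccs = [], []
    counter = [0]               # mutable so the nested function can bump it

    def dfs(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack[v] = True
        for w in adj[v]:
            if index[w] is None:            # tree edge: recurse, then pull up low-link
                dfs(w)
                low[v] = min(low[v], low[w])
            elif on_stack[w]:               # back/cross edge into the current stack
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in range(num_v):
        if index[v] is None:
            dfs(v)
    return sccs

# Same graph as above: 0->1->2->0, 1->3->4
adj = {0: [1], 1: [2, 3], 2: [0], 3: [4], 4: []}
for scc in tarjan_scc(5, adj):
    print(sorted(scc))  # prints [4], then [3], then [0, 1, 2]
```

Note the emission order: Tarjan's produces sink SCCs first ([4] before [0, 1, 2]), while the Kosaraju run above produced source SCCs first — a difference worth remembering if downstream code relies on the order.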
| Feature / Aspect | Kosaraju's Algorithm | Tarjan's Algorithm |
|---|---|---|
| DFS passes required | Two separate passes | One pass only |
| Extra memory needed | Full transposed graph (O(V+E) extra) | O(V) for discoveryTime, lowLink, onStack arrays |
| Conceptual simplicity | Easier to explain and remember | Requires understanding low-link values |
| Recursive stack depth | Same risk — two DFS traversals | Same risk — one DFS traversal |
| Output order | SCCs in topological order of the condensation (source SCCs first) | SCCs in reverse topological order of the condensation (sink SCCs first) |
| Handles self-loops | Yes — self-loop vertex is its own SCC | Yes — lowLink[v] == discoveryTime[v] still holds |
| Handles disconnected graphs | Yes — outer loop covers all components | Yes — outer loop covers all components |
| Preferred in production | When code clarity is the priority | When memory efficiency matters most |
| Interview recognition | Very commonly asked — easy to whiteboard | Slightly harder to whiteboard but more impressive |
🎯 Key Takeaways
- SCCs partition a directed graph into maximal groups where every node can reach every other.
- Kosaraju's uses two DFS passes: the first computes finish order, the second processes the transposed graph.
- Both Kosaraju's and Tarjan's algorithms run in O(V+E) time.
Interview Questions on This Topic
- Q: What is a strongly connected component?
- Q: Describe Kosaraju's algorithm.
- Q: What is the condensation graph?
Frequently Asked Questions
Why does Kosaraju's algorithm need two DFS passes?
The first pass computes the finish order, which reveals the SCC structure: the node that finishes last must be in a 'source SCC' (one with no incoming edges from other SCCs). Processing the transposed graph in reverse finish order therefore starts inside a source SCC of the original graph. In the transposed graph that SCC has no outgoing edges to other SCCs — its original outgoing cross edges now point inward — so each DFS visits exactly one SCC; later DFS runs can only cross into SCCs that have already been emitted and marked visited.
What is the difference between Kosaraju's and Tarjan's algorithms?
Kosaraju's uses two passes and is easier to understand and implement. Tarjan's uses a single DFS pass with a stack and 'low-link' values (the lowest discovery time reachable from a subtree). Tarjan's is more efficient in practice (one pass) but harder to implement correctly. Both are O(V+E).
What is the condensation graph?
After finding all SCCs, contract each SCC into a single super-node. Edges between super-nodes represent edges between different SCCs in the original graph. The condensation graph is always a DAG — no cycles, because a cycle would merge those SCCs into one. The condensation reveals the high-level structure of the directed graph.
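Building the condensation from an SCC list takes only a few lines: map each node to its component id, then keep only the edges that cross components. A sketch, assuming the SCC output format used in the implementation above (here hard-coded for self-containment):

```python
def condensation(edges, sccs):
    """Contract each SCC to a super-node; keep only cross-component edges."""
    comp_id = {v: i for i, scc in enumerate(sccs) for v in scc}
    dag_edges = {(comp_id[u], comp_id[v])
                 for u, v in edges
                 if comp_id[u] != comp_id[v]}     # drop intra-SCC edges
    return sorted(dag_edges)

# The 5-node example: 0->1->2->0 collapses into super-node 0
edges = [(0, 1), (1, 2), (2, 0), (1, 3), (3, 4)]
sccs = [[0, 1, 2], [3], [4]]        # e.g. Kosaraju's output from earlier
print(condensation(edges, sccs))    # [(0, 1), (1, 2)] -- a 3-node path, a DAG
```

Using a set for `dag_edges` also deduplicates parallel cross edges, which a condensation normally collapses anyway.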
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.