Skip to content
Home DSA Union-Find Disjoint Set Explained — Path Compression, Union by Rank & Real-World Use Cases

Union-Find Disjoint Set Explained — Path Compression, Union by Rank & Real-World Use Cases

Where developers are forged. · Structured learning · Free forever.
📍 Part of: Graphs → Topic 8 of 17
Union-Find (Disjoint Set Union) deep dive: path compression, union by rank, cycle detection, Kruskal's MST, and production gotchas every senior dev must know.
🔥 Advanced — solid DSA foundation required
In this tutorial, you'll learn
Union-Find (Disjoint Set Union) deep dive: path compression, union by rank, cycle detection, Kruskal's MST, and production gotchas every senior dev must know.
  • Union-Find tracks disjoint sets with near O(1) find and union operations.
  • Path compression flattens the tree during find — future finds on same nodes are O(1).
  • Union by rank attaches the shorter tree under the taller — prevents O(n) tree height.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer

Imagine a school with 1,000 students. On day one, everyone is a stranger — each student is their own 'friend group'. As the year goes on, students become friends and their groups merge. Union-Find is a data structure that answers two questions blazing fast: 'Are these two students in the same friend group?' and 'Merge these two groups into one.' That's it. Every graph connectivity problem is secretly this same question at scale.

Distributed systems crash, social networks grow to billions of edges, and compilers resolve symbol dependencies — all of these lean on one elegant primitive: the ability to track which elements belong to the same connected component and merge components together efficiently. Union-Find (also called Disjoint Set Union, or DSU) is the structure that makes this possible in near-constant time, and it shows up everywhere from Kruskal's minimum spanning tree algorithm to LeetCode hard problems to production network topology managers.

The naive solution to connectivity is a hash map or a boolean matrix — mark which nodes are connected and do a BFS every time you're asked a query. That works for toy problems. The moment your graph has millions of nodes and you're doing hundreds of thousands of union and find operations, that approach collapses. Union-Find solves the problem by maintaining a forest of trees where each tree represents one connected component, and two killer optimisations — path compression and union by rank — bring the amortised cost of each operation to effectively O(α(n)), where α is the inverse Ackermann function. For any input you'll encounter in the real world, α(n) ≤ 5. It's as close to O(1) as a non-trivial algorithm gets.

By the end of this article you'll be able to implement a production-grade Union-Find from scratch, explain exactly why path compression and union by rank work together (not just that they do), detect cycles in undirected graphs, build the skeleton of Kruskal's MST, and dodge the three most common bugs that trip up experienced engineers in interviews and on the job.

What is Union-Find (Disjoint Set Union)? — Plain English

Union-Find (also called Disjoint Set Union, DSU) is a data structure that tracks a set of elements partitioned into disjoint groups. It supports two core operations: find (which group does element x belong to?) and union (merge the groups of x and y).

Real-world uses: Kruskal's MST algorithm, detecting cycles in a graph, network connectivity problems (are computers A and B in the same network?), and clustering algorithms.

Naive implementation uses an array where parent[i] = i's direct parent. Find walks up the tree to the root. Two optimizations make it nearly O(1): path compression (flatten the tree during find) and union by rank (attach the smaller tree under the larger).

How Union-Find (Disjoint Set Union) Works — Step by Step

Initialization: 1. parent[i] = i for all i (each element is its own root). 2. rank[i] = 0 for all i.

Find with path compression (amortized O(alpha(n)) ≈ O(1)): 1. If parent[x] == x: return x (x is the root). 2. parent[x] = find(parent[x]) (path compression: directly connect x to root). 3. Return parent[x].

Union by rank (O(1)): 1. Find root of x: rx = find(x). 2. Find root of y: ry = find(y). 3. If rx == ry: already in same set, return False. 4. Attach smaller rank tree under larger: if rank[rx] < rank[ry]: swap rx,ry. 5. parent[ry] = rx. If rank[rx]==rank[ry]: rank[rx]++. 6. Return True.

With both optimizations: effectively O(alpha(n)) per operation where alpha is the inverse Ackermann function — practically constant.

Worked Example — Tracing the Algorithm

Elements: 0, 1, 2, 3, 4. Initially each is its own group. parent = [0, 1, 2, 3, 4]

union(0, 1): find(0)=0, find(1)=1. Different groups. parent[1]=0. rank[0]=1. parent = [0, 0, 2, 3, 4]

union(2, 3): find(2)=2, find(3)=3. parent[3]=2. rank[2]=1. parent = [0, 0, 2, 2, 4]

union(0, 2): find(0)=0, find(2)=2. rank[0]=rank[2]=1. Attach 2 under 0. rank[0]=2. parent = [0, 0, 0, 2, 4]

find(3): parent[3]=2. parent[2]=0. Path compress: parent[3]=0. Return 0. parent = [0, 0, 0, 0, 4] (3 now points directly to root 0)

find(1): parent[1]=0. Return 0. find(3)=0. Same group! Group containing {0,1,2,3}: root=0. Group {4}: root=4.

Implementation

find uses recursive path compression: each node on the path from x to its root is made to point directly to the root. union attaches the lower-rank root under the higher-rank root; equal ranks increment the winner's rank. components tracks the number of disjoint sets and decrements on each successful union. connected simply compares the roots of two nodes.

union_find.py · PYTHON
12345678910111213141516171819202122232425262728
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank   = [0] * n
        self.components = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry: return False  # already connected
        if self.rank[rx] < self.rank[ry]: rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]: self.rank[rx] += 1
        self.components -= 1
        return True

    def connected(self, x, y):
        return self.find(x) == self.find(y)

uf = UnionFind(5)
uf.union(0, 1); uf.union(2, 3); uf.union(0, 2)
print('0 and 3 connected:', uf.connected(0, 3))  # True
print('0 and 4 connected:', uf.connected(0, 4))  # False
print('Components:', uf.components)
▶ Output
0 and 3 connected: True
0 and 4 connected: False
Components: 2
AspectUnion by RankUnion by Size
What's trackedUpper bound on tree heightExact number of nodes in component
Merge ruleAttach lower-rank root under higher-rank rootAttach smaller component under larger component
Rank/size updateIncrement only when ranks are equalAlways add smaller size to larger
After path compressionRank becomes a proxy (no longer exact height)Size remains accurate — always exact
Extra utilityNone beyond merge decisionscomponentSize() query is free and always correct
Asymptotic boundO(α(n)) amortisedO(α(n)) amortised — identical
Which to preferTheoretical purists, textbooksPractical production code — size is queryable

🎯 Key Takeaways

  • Union-Find tracks disjoint sets with near O(1) find and union operations.
  • Path compression flattens the tree during find — future finds on same nodes are O(1).
  • Union by rank attaches the shorter tree under the taller — prevents O(n) tree height.
  • Combined, the amortized complexity per operation is O(alpha(n)) — the inverse Ackermann function.
  • Used in Kruskal's MST, cycle detection in undirected graphs, and network connectivity.

Interview Questions on This Topic

  • QWhat are path compression and union by rank, and why do they matter?
  • QHow does Union-Find detect cycles in an undirected graph?
  • QWhat is the amortized time complexity of Union-Find with both optimizations?

Frequently Asked Questions

What is path compression and why does it matter?

Path compression makes every node on the find path point directly to the root during the find operation. Without it, a chain of unions creates a tall tree and find takes O(n) per call. With path compression, the tree flattens over time and subsequent finds on the same nodes are O(1). Combined with union by rank, the amortized cost per operation becomes O(alpha(n)) — nearly constant.

What is union by rank and why not just always attach root A under root B?

Union by rank always attaches the shorter tree (smaller rank) under the taller tree. This prevents the tree from growing tall. If you always did A-under-B arbitrarily, a sequence of unions could create a linear chain of height n, making find O(n). Union by rank guarantees the tree height stays O(log n) even without path compression.

How is Union-Find used to detect cycles in an undirected graph?

For each edge (u, v): call find(u) and find(v). If they have the same root, adding this edge would create a cycle (u and v are already connected). If they are in different components, union them. After processing all edges, if no cycle was detected, the graph is acyclic. This is exactly how Kruskal's MST algorithm works.

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousTopological SortNext →Minimum Spanning Tree — Kruskal and Prim
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged