Union-Find Disjoint Set Explained — Path Compression, Union by Rank & Real-World Use Cases
- Union-Find tracks disjoint sets with near O(1) find and union operations.
- Path compression flattens the tree during find — future finds on same nodes are O(1).
- Union by rank attaches the shorter tree under the taller — prevents O(n) tree height.
Imagine a school with 1,000 students. On day one, everyone is a stranger — each student is their own 'friend group'. As the year goes on, students become friends and their groups merge. Union-Find is a data structure that answers two questions blazing fast: 'Are these two students in the same friend group?' and 'Merge these two groups into one.' That's it. Every graph connectivity problem is secretly this same question at scale.
Distributed systems crash, social networks grow to billions of edges, and compilers resolve symbol dependencies — all of these lean on one elegant primitive: the ability to track which elements belong to the same connected component and merge components together efficiently. Union-Find (also called Disjoint Set Union, or DSU) is the structure that makes this possible in near-constant time, and it shows up everywhere from Kruskal's minimum spanning tree algorithm to LeetCode hard problems to production network topology managers.
The naive solution to connectivity is a hash map or a boolean matrix — mark which nodes are connected and do a BFS every time you're asked a query. That works for toy problems. The moment your graph has millions of nodes and you're doing hundreds of thousands of union and find operations, that approach collapses. Union-Find solves the problem by maintaining a forest of trees where each tree represents one connected component, and two killer optimisations — path compression and union by rank — bring the amortised cost of each operation to effectively O(α(n)), where α is the inverse Ackermann function. For any input you'll encounter in the real world, α(n) ≤ 5. It's as close to O(1) as a non-trivial algorithm gets.
By the end of this article you'll be able to implement a production-grade Union-Find from scratch, explain exactly why path compression and union by rank work together (not just that they do), detect cycles in undirected graphs, build the skeleton of Kruskal's MST, and dodge the three most common bugs that trip up experienced engineers in interviews and on the job.
What is Union-Find (Disjoint Set Union)? — Plain English
Union-Find (also called Disjoint Set Union, DSU) is a data structure that tracks a set of elements partitioned into disjoint groups. It supports two core operations: find (which group does element x belong to?) and union (merge the groups of x and y).
Real-world uses: Kruskal's MST algorithm, detecting cycles in a graph, network connectivity problems (are computers A and B in the same network?), and clustering algorithms.
Naive implementation uses an array where parent[i] = i's direct parent. Find walks up the tree to the root. Two optimizations make it nearly O(1): path compression (flatten the tree during find) and union by rank (attach the smaller tree under the larger).
How Union-Find (Disjoint Set Union) Works — Step by Step
Initialization: 1. parent[i] = i for all i (each element is its own root). 2. rank[i] = 0 for all i.
Find with path compression (amortized O(alpha(n)) ≈ O(1)): 1. If parent[x] == x: return x (x is the root). 2. parent[x] = find(parent[x]) (path compression: directly connect x to root). 3. Return parent[x].
Union by rank (O(1)): 1. Find root of x: rx = find(x). 2. Find root of y: ry = find(y). 3. If rx == ry: already in same set, return False. 4. Attach smaller rank tree under larger: if rank[rx] < rank[ry]: swap rx,ry. 5. parent[ry] = rx. If rank[rx]==rank[ry]: rank[rx]++. 6. Return True.
With both optimizations: effectively O(alpha(n)) per operation where alpha is the inverse Ackermann function — practically constant.
Worked Example — Tracing the Algorithm
Elements: 0, 1, 2, 3, 4. Initially each is its own group. parent = [0, 1, 2, 3, 4]
union(0, 1): find(0)=0, find(1)=1. Different groups. parent[1]=0. rank[0]=1. parent = [0, 0, 2, 3, 4]
union(2, 3): find(2)=2, find(3)=3. parent[3]=2. rank[2]=1. parent = [0, 0, 2, 2, 4]
union(0, 2): find(0)=0, find(2)=2. rank[0]=rank[2]=1. Attach 2 under 0. rank[0]=2. parent = [0, 0, 0, 2, 4]
find(3): parent[3]=2. parent[2]=0. Path compress: parent[3]=0. Return 0. parent = [0, 0, 0, 0, 4] (3 now points directly to root 0)
find(1): parent[1]=0. Return 0. find(3)=0. Same group! Group containing {0,1,2,3}: root=0. Group {4}: root=4.
Implementation
find uses recursive path compression: each node on the path from x to its root is made to point directly to the root. union attaches the lower-rank root under the higher-rank root; equal ranks increment the winner's rank. components tracks the number of disjoint sets and decrements on each successful union. connected simply compares the roots of two nodes.
class UnionFind: def __init__(self, n): self.parent = list(range(n)) self.rank = [0] * n self.components = n def find(self, x): if self.parent[x] != x: self.parent[x] = self.find(self.parent[x]) # path compression return self.parent[x] def union(self, x, y): rx, ry = self.find(x), self.find(y) if rx == ry: return False # already connected if self.rank[rx] < self.rank[ry]: rx, ry = ry, rx self.parent[ry] = rx if self.rank[rx] == self.rank[ry]: self.rank[rx] += 1 self.components -= 1 return True def connected(self, x, y): return self.find(x) == self.find(y) uf = UnionFind(5) uf.union(0, 1); uf.union(2, 3); uf.union(0, 2) print('0 and 3 connected:', uf.connected(0, 3)) # True print('0 and 4 connected:', uf.connected(0, 4)) # False print('Components:', uf.components)
0 and 4 connected: False
Components: 2
| Aspect | Union by Rank | Union by Size |
|---|---|---|
| What's tracked | Upper bound on tree height | Exact number of nodes in component |
| Merge rule | Attach lower-rank root under higher-rank root | Attach smaller component under larger component |
| Rank/size update | Increment only when ranks are equal | Always add smaller size to larger |
| After path compression | Rank becomes a proxy (no longer exact height) | Size remains accurate — always exact |
| Extra utility | None beyond merge decisions | componentSize() query is free and always correct |
| Asymptotic bound | O(α(n)) amortised | O(α(n)) amortised — identical |
| Which to prefer | Theoretical purists, textbooks | Practical production code — size is queryable |
🎯 Key Takeaways
- Union-Find tracks disjoint sets with near O(1) find and union operations.
- Path compression flattens the tree during find — future finds on same nodes are O(1).
- Union by rank attaches the shorter tree under the taller — prevents O(n) tree height.
- Combined, the amortized complexity per operation is O(alpha(n)) — the inverse Ackermann function.
- Used in Kruskal's MST, cycle detection in undirected graphs, and network connectivity.
Interview Questions on This Topic
- QWhat are path compression and union by rank, and why do they matter?
- QHow does Union-Find detect cycles in an undirected graph?
- QWhat is the amortized time complexity of Union-Find with both optimizations?
Frequently Asked Questions
What is path compression and why does it matter?
Path compression makes every node on the find path point directly to the root during the find operation. Without it, a chain of unions creates a tall tree and find takes O(n) per call. With path compression, the tree flattens over time and subsequent finds on the same nodes are O(1). Combined with union by rank, the amortized cost per operation becomes O(alpha(n)) — nearly constant.
What is union by rank and why not just always attach root A under root B?
Union by rank always attaches the shorter tree (smaller rank) under the taller tree. This prevents the tree from growing tall. If you always did A-under-B arbitrarily, a sequence of unions could create a linear chain of height n, making find O(n). Union by rank guarantees the tree height stays O(log n) even without path compression.
How is Union-Find used to detect cycles in an undirected graph?
For each edge (u, v): call find(u) and find(v). If they have the same root, adding this edge would create a cycle (u and v are already connected). If they are in different components, union them. After processing all edges, if no cycle was detected, the graph is acyclic. This is exactly how Kruskal's MST algorithm works.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.