Advanced 7 min · March 05, 2026

Union Find — Disjoint Set

Union-Find - Recursion Causes StackOverflow in Production

Q: What is path compression and why does it matter?

Path compression makes every node on the find path point directly to the root during the find operation. Without it, a chain of unions creates a tall tree and find takes O(n) per call. With path compression, the tree flattens over time and subsequent finds on the same nodes are O(1). Combined with union by rank, the amortized cost per operation becomes O(alpha(n)) — nearly constant.

Q: What is union by rank and why not just always attach root A under root B?

Union by rank always attaches the shorter tree (smaller rank) under the taller tree. This prevents the tree from growing tall. If you always did A-under-B arbitrarily, a sequence of unions could create a linear chain of height n, making find O(n). Union by rank guarantees the tree height stays O(log n) even without path compression.

Q: How is Union-Find used to detect cycles in an undirected graph?

For each edge (u, v): call find(u) and find(v). If they have the same root, adding this edge would create a cycle (u and v are already connected). If they are in different components, union them. After processing all edges, if no cycle was detected, the graph is acyclic. This is exactly how Kruskal's MST algorithm works.

Q: What is the inverse Ackermann function alpha(n)?

It's the number of times you need to apply the logarithm function to reduce n to a constant. For any practical n, alpha(n) ≤ 5. More precisely, alpha(n) grows slower than log(log(...(n)...)) with a tower of 2s. It's defined via the inverse of the Ackermann function A(m,n), but you don't need to memorise the formula — just know it's effectively constant.

Q: Can Union-Find be used for directed graph cycle detection?

No. Union-Find works only for undirected graphs because connectivity in directed graphs is not symmetric: a path from u to v doesn't imply a path from v to u. For directed cycles, use DFS with back-edge detection (white-grey-black marking) or Tarjan's SCC algorithm.

Q: Should I use arrays or objects for parent and rank?

Use arrays (plain lists or fixed-size arrays) for performance. Object-based implementations create memory overhead that kills cache locality on large datasets. In Java, use int[] for parent and rank. In Python, use list of ints. Array-based DSU with linear memory access dominates object-based DSU on modern hardware.

Naive Union-Find recursion causes StackOverflowError on deep chains.

Naren Founder & Principal Engineer

20+ years shipping performance-critical code where algorithms decide the bill. Notes here come from systems that actually shipped.

✓ Production

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

Before you start⏱ 30 min

✓Deep production experience
✓Understanding of internals and trade-offs
✓Experience debugging complex systems

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

Union-Find tracks disjoint connected components with near-constant operations
Two operations: find(x) returns root, union(x,y) merges components
Path compression flattens trees during find
Union by rank keeps tree heights O(log n)
Combined: amortized O(alpha(n)) — effectively O(1) for all practical inputs
Real use: graph connectivity, Kruskal's MST, percolation, compiler symbol resolution

✦ Definition~90s read

What is Union Find?

Union-Find (also called Disjoint Set Union, DSU) is a data structure that tracks a set of elements partitioned into disjoint groups. It supports two core operations: find (which group does element x belong to?) and union (merge the groups of x and y).

★

Imagine a school with 1,000 students.

Real-world uses: Kruskal's MST algorithm, detecting cycles in a graph, network connectivity problems (are computers A and B in the same network?), and clustering algorithms.

Naive implementation uses an array where parent[i] = i's direct parent. Find walks up the tree to the root. Two optimizations make it nearly O(1): path compression (flatten the tree during find) and union by rank (attach the smaller tree under the larger).

Plain-English First

Imagine a school with 1,000 students. On day one, everyone is a stranger — each student is their own 'friend group'. As the year goes on, students become friends and their groups merge. Union-Find is a data structure that answers two questions blazing fast: 'Are these two students in the same friend group?' and 'Merge these two groups into one.' That's it. Every graph connectivity problem is secretly this same question at scale.

Distributed systems crash, social networks grow to billions of edges, and compilers resolve symbol dependencies — all of these lean on one elegant primitive: the ability to track which elements belong to the same connected component and merge components together efficiently. Union-Find (also called Disjoint Set Union, or DSU) is the structure that makes this possible in near-constant time, and it shows up everywhere from Kruskal's minimum spanning tree algorithm to LeetCode hard problems to production network topology managers.

The naive solution to connectivity is a hash map or a boolean matrix — mark which nodes are connected and do a BFS every time you're asked a query. That works for toy problems. The moment your graph has millions of nodes and you're doing hundreds of thousands of union and find operations, that approach collapses. Union-Find solves the problem by maintaining a forest of trees where each tree represents one connected component, and two killer optimisations — path compression and union by rank — bring the amortised cost of each operation to effectively O(α(n)), where α is the inverse Ackermann function. For any input you'll encounter in the real world, α(n) ≤ 5. It's as close to O(1) as a non-trivial algorithm gets.

By the end of this article you'll be able to implement a production-grade Union-Find from scratch, explain exactly why path compression and union by rank work together (not just that they do), detect cycles in undirected graphs, build the skeleton of Kruskal's MST, and dodge the three most common bugs that trip up experienced engineers in interviews and on the job.

What is Union-Find (Disjoint Set Union)? — Plain English

Real-world uses: Kruskal's MST algorithm, detecting cycles in a graph, network connectivity problems (are computers A and B in the same network?), and clustering algorithms.

📊 Production Insight

Without path compression, repeated finds on chain unions create O(n) trees.

A 50M-edge graph with naive DSU can cause minutes of GC pauses and stack overflows.

Rule: always pair both optimisations in any code path handling >10K elements.

🎯 Key Takeaway

Union-Find is the hammer for connectivity questions.

Two optimisations — path compression and union by rank — turn linear operations into amortised constant.

Never ship DSU without both; you'll regret it when the graph grows.

thecodeforge.io

Union Find Disjoint Set

How Union-Find (Disjoint Set Union) Works — Step by Step

Initialization: 1. parent[i] = i for all i (each element is its own root). 2. rank[i] = 0 for all i.

Find with path compression (amortized O(alpha(n)) ≈ O(1)): 1. If parent[x] == x: return x (x is the root). 2. parent[x] = find(parent[x]) (path compression: directly connect x to root). 3. Return parent[x].

Union by rank (O(1)): 1. Find root of x: rx = find(x). 2. Find root of y: ry = find(y). 3. If rx == ry: already in same set, return False. 4. Attach smaller rank tree under larger: if rank[rx] < rank[ry]: swap rx,ry. 5. parent[ry] = rx. If rank[rx]==rank[ry]: rank[rx]++. 6. Return True.

With both optimizations: effectively O(alpha(n)) per operation where alpha is the inverse Ackermann function — practically constant.

📊 Production Insight

Recursive find works but risks stack overflow on deep chains (depth > 1000).

Iterative find with while loop avoids stack entirely and is faster.

Union by rank doesn't need to be perfect after path compression — size is more practical for extra queries.

🎯 Key Takeaway

Path compression flattens: each node visited becomes a direct child of the root.

Union by rank bounds height: log n even without path compression.

Together they make DSU the most practically constant data structure.

Before and After: Path Compression Effect — Visual Diagram

Path compression is the most impactful optimisation in Union-Find. Without it, a series of unions can create a deep chain — finding the root of the last element means traversing every ancestor. With path compression, each find operation flattens the tree so that subsequent finds on any of those nodes become O(1). The diagram below shows the parent array before and after a single find(3) on a chain of four nodes. Note how nodes 1, 2, and 3 all point directly to root 0 after compression.

📊 Production Insight

Even after partial path compression, the iterative find in production code gradually flattens the tree. The worst-case chain can be flattened in a single find. Always measure parent array depth before and after processing large batches.

🎯 Key Takeaway

Path compression turns worst-case O(n) finds into amortised O(α(n)). The diagram proves that one find(3) on a chain of length 3 reduces depth to 1 for all nodes on the path.

Path Compression Effect on Parent Array

Step 1

0

0

1

2

Before find(3): parent array shows chain 0→0, 1→0, 2→1, 3→2. Depth to root for node 3 is 3.

Step 2

0

0

0

0

After find(3): all nodes along path now point directly to root 0. Depth is 1 for all.

thecodeforge.io

Union Find Disjoint Set

Worked Example — Tracing the Algorithm

Elements: 0, 1, 2, 3, 4. Initially each is its own group. parent = [0, 1, 2, 3, 4]

union(0, 1): find(0)=0, find(1)=1. Different groups. parent[1]=0. rank[0]=1. parent = [0, 0, 2, 3, 4]

union(2, 3): find(2)=2, find(3)=3. parent[3]=2. rank[2]=1. parent = [0, 0, 2, 2, 4]

union(0, 2): find(0)=0, find(2)=2. rank[0]=rank[2]=1. Attach 2 under 0. rank[0]=2. parent = [0, 0, 0, 2, 4]

find(3): parent[3]=2. parent[2]=0. Path compress: parent[3]=0. Return 0. parent = [0, 0, 0, 0, 4] (3 now points directly to root 0)

find(1): parent[1]=0. Return 0. find(3)=0. Same group! Group containing {0,1,2,3}: root=0. Group {4}: root=4.

📊 Production Insight

Path compression in find(3) made parent[3] point root 0 directly — next find is O(1).

Without union by rank, union(0,2) could have attached 0 under 2, creating height 2 instead of 1.

Production lesson: trace union order; worst-case chain unions happen when elements are added in increasing index order.

🎯 Key Takeaway

Trace one find with path compression to see the flattening effect.

Union by rank prevents tall trees even on adversarial input.

After enough operations, almost all nodes point directly to root.

Implementation

find uses recursive path compression: each node on the path from x to its root is made to point directly to the root. union attaches the lower-rank root under the higher-rank root; equal ranks increment the winner's rank. components tracks the number of disjoint sets and decrements on each successful union. connected simply compares the roots of two nodes.

union_find.pyPYTHON

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank   = [0] * n
        self.components = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry: return False  # already connected
        if self.rank[rx] < self.rank[ry]: rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]: self.rank[rx] += 1
        self.components -= 1
        return True

    def connected(self, x, y):
        return self.find(x) == self.find(y)

uf = UnionFind(5)
uf.union(0, 1); uf.union(2, 3); uf.union(0, 2)
print('0 and 3 connected:', uf.connected(0, 3))  # True
print('0 and 4 connected:', uf.connected(0, 4))  # False
print('Components:', uf.components)

Output

0 and 3 connected: True

0 and 4 connected: False

Components: 2

📊 Production Insight

Recursive find is clean but unsafe for large graphs — use iterative version in production.

Track components count: it's a free indicator of partition count; useful for monitoring.

If using union by size instead of rank, componentSize() query becomes free and accurate after path compression.

🎯 Key Takeaway

The code is short but powerful.

Always expose connected() and component count for debugging.

Consider iterative find if recursion depth is a concern.

C++ Implementation with Iterative Find

In production C++ code, iterative find is strongly preferred over recursion to avoid stack overflow. The implementation below uses a while loop for path compression and an integer array for rank. Note the use of std::vector for dynamic sizing.

union_find.cppCPP

#include <vector>

class UnionFind {
public:
    UnionFind(int n) : parent(n), rank(n, 0), components(n) {
        for (int i = 0; i < n; ++i) parent[i] = i;
    }

    int find(int x) {
        // iterative find with path compression
        int root = x;
        while (parent[root] != root)
            root = parent[root];
        // compress path
        while (x != root) {
            int next = parent[x];
            parent[x] = root;
            x = next;
        }
        return root;
    }

    bool union_sets(int x, int y) {
        int rx = find(x);
        int ry = find(y);
        if (rx == ry) return false;
        if (rank[rx] < rank[ry]) std::swap(rx, ry);
        parent[ry] = rx;
        if (rank[rx] == rank[ry]) rank[rx]++;
        components--;
        return true;
    }

    bool connected(int x, int y) { return find(x) == find(y); }
    int get_components() const { return components; }

private:
    std::vector<int> parent;
    std::vector<int> rank;
    int components;
};

Output

// Example usage

UnionFind uf(5);

uf.union_sets(0,1);

uf.union_sets(2,3);

uf.union_sets(0,2);

std::cout << uf.connected(0,3); // 1 (true)

std::cout << uf.connected(0,4); // 0 (false)

std::cout << uf.get_components(); // 2

📊 Production Insight

Iterative find is mandatory in C++ for large graphs due to limited stack per thread (default ~8MB on Linux, but deep recursion still risks overflow). The two‑pass iterative compression is slightly slower than the recursive one‑pass but completely safe.

🎯 Key Takeaway

Always use iterative find in production C++. The two‑path technique (find root, then compress) is straightforward and avoids any recursion risk.

Weighted DSU — Tracking Edge Weights in a Disjoint Set

Standard Union-Find tracks connectivity but not extra information. A weighted DSU variant stores additional data per component, such as sum of edge weights, number of nodes, or aggregated statistics. This is useful when you need to answer queries like “What is the total weight of edges in the component containing node v?” or “Maximum value in the component?”.

The core idea: store an array weight or value per root, and when merging components, combine the values. For example, to track total weight of edges added to each component, increment the root's weight by the edge weight on each union. The query component_weight(x) simply returns weight[find(x)].

weighted_dsu.pyPYTHON

class WeightedDSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n          # component size (union by size)
        self.total = [0] * n         # total weight per component (root only)
        self.components = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y, weight=0):
        """Merge components of x and y, adding 'weight' to the merged component's total."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            # already connected, but still add weight (if tracking cumulative edge weight)
            self.total[rx] += weight
            return False
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        # combine totals: the old total of ry is already in self.total[ry], add to rx
        self.total[rx] += self.total[ry] + weight
        self.components -= 1
        return True

    def component_total(self, x):
        return self.total[self.find(x)]

# Example: building a weighted graph, track total weight per component
dsu = WeightedDSU(5)
dsu.union(0,1,10)   # edge weight 10
print(dsu.component_total(0))  # 10
dsu.union(2,3,20)
dsu.union(0,2,5)    # merge components with an edge weight 5
print(dsu.component_total(0))  # 10 + 20 + 5 = 35

Output

📊 Production Insight

Weighted DSU is useful for clustering algorithms (e.g., aggregate similarity scores) and for monitoring total resource consumption per connected group. Ensure the combine operation is commutative and the weight update logic is correct for directed vs undirected edges.

🎯 Key Takeaway

With a simple extra array, Union-Find can track sums, max/min, or any aggregatable attribute per component. This extends its utility beyond pure connectivity.

Cycle Detection in an Undirected Graph Using Union-Find

Union-Find provides a clean linear-time algorithm for cycle detection in undirected graphs. Process each edge (u, v): find roots of u and v. If they are the same, a cycle exists (u and v are already connected). Otherwise, union them. If no edge produces same-root after all edges, the graph is acyclic.

Complexity: O(E * α(V)) — near-linear. This is used inside Kruskal's MST to skip edges that would create cycles.

Mental Model

Mental Model: Cycle Detection as Blue vs Red Teams

Think of each connected component as a team. If two members already belong to the same team, adding a connection between them creates a conflict — that's a cycle.

Each node starts on its own team.
Union merges two teams into one.
When processing an edge, if both nodes are already on the same team, the edge is redundant — cycle found.
This is exactly how Kruskal's MST builds a tree: it adds edges only if they connect two different teams.

📊 Production Insight

This algorithm works only for undirected graphs. For directed graphs, use DFS with back-edge detection.

In production, watch for self-loops (edge u->u): they always produce a cycle; your code should handle them.

Large graphs with >10M edges may need iterative find to avoid stack overflow.

🎯 Key Takeaway

Cycle detection with Union-Find is elegant and fast.

Use for undirected graphs only.

The same mechanism powers Kruskal's MST — keep it in mind for graph problems.

When to use Union-Find vs DFS for Cycle Detection

IfGraph is undirected, edges are streaming or dynamic

→

UseUnion-Find — supports incremental edge additions in near-constant time

IfGraph is directed

→

UseDFS with back-edge detection (Union-Find doesn't work for directed cycles)

IfNeed to detect cycle in a static graph once

→

UseUnion-Find or DFS — both O(V+E); Union-Find often simpler to implement

Kruskal's Minimum Spanning Tree Algorithm

Kruskal's algorithm finds a minimum spanning tree for a weighted undirected graph: 1. Sort all edges by weight. 2. Initialise Union-Find with V components. 3. For each edge (u, v, w) in sorted order: - If find(u) != find(v): add edge to MST, union(u, v). - If MST size reaches V-1, stop.

Complexity: O(E log E) for sorting + O(E α(V)) for Union-Find. Union-Find makes the cycle detection O(1) per edge.

kruskal_mst.pyPYTHON

def kruskal_mst(vertices, edges):
    """
    edges: list of (weight, u, v)
    returns: list of edges in MST, total weight
    """
    edges.sort()  # sort by weight
    uf = UnionFind(vertices)
    mst = []
    total_weight = 0
    for weight, u, v in edges:
        if uf.union(u, v):
            mst.append((u, v, weight))
            total_weight += weight
            if len(mst) == vertices - 1:
                break
    return mst, total_weight

Output

edges = [(1,0,1), (2,1,2), (3,0,2), (4,2,3)]

mst, total = kruskal_mst(4, edges)

print(mst) # [(0,1,1), (1,2,2), (2,3,4)]

print(total) # 7

📊 Production Insight

Sorting edges is the bottleneck — for dense graphs, Prim's algorithm with heap may be faster.

Union-Find ensures the algorithm runs in near-linear time after sorting.

In production, use indexed edges (list of tuples) rather than objects to reduce memory overhead for large graphs.

🎯 Key Takeaway

Kruskal + Union-Find = production-grade MST.

Union-Find eliminates the need for visited arrays in cycle detection.

For extremely dense graphs, consider Prim's instead.

Understanding O(α(n)) — The Inverse Ackermann Complexity

When both path compression and union by rank are used, the amortised time per operation is O(α(n)), where α is the inverse Ackermann function. The Ackermann function A(m, n) grows extremely fast — far faster than exponential or tower functions. Its inverse α(n) is the number of times you must apply the logarithm function (base 2) to reduce n to a constant. For all practical values of n, α(n) ≤ 5. In fact, α(n) exceeds 5 only when n > 2^65536, a number astronomically larger than the number of atoms in the observable universe. Thus, for any real‑world dataset, Union‑Find operations run in effectively constant time.

📊 Production Insight

When engineers say ‘DSU is O(1)’, they implicitly mean ‘amortised O(α(n)) with α(n) ≤ 5’. This bound holds only because both optimizations work together. Skipping either one raises the worst‑case to O(log n) or O(n) per operation — a dramatic difference at scale.

🎯 Key Takeaway

O(α(n)) is practically constant for any real dataset. The bound relies on the combination of path compression and union by rank.

Practice Problems for Union-Find

Sharpen your Union-Find skills with these carefully chosen coding problems. They cover fundamental connectivity, cycle detection, and advanced applications.

Problem	Platform	Difficulty	Core Concept
[Redundant Connection (LeetCode 684)](https://leetcode.com/problems/redundant-connection/)	LeetCode	Medium	Cycle detection in an undirected graph (edge that creates a cycle)
[Number of Islands (LeetCode 200)](https://leetcode.com/problems/number-of-islands/)	LeetCode	Medium	Counting connected components in a grid (2D DSU)
[Accounts Merge (LeetCode 721)](https://leetcode.com/problems/accounts-merge/)	LeetCode	Medium	Merging accounts with common emails
[Minimize Malware Spread (LeetCode 924)](https://leetcode.com/problems/minimize-malware-spread/)	LeetCode	Hard	Find the node whose removal minimises infected component size
[The Earliest Moment When Everyone Become Friends (LeetCode 1101)](https://leetcode.com/problems/the-earliest-moment-when-everyone-become-friends/)	LeetCode	Medium	Time‑based connectivity using DSU
[Graph Connectivity With Threshold (LeetCode 1627)](https://leetcode.com/problems/graph-connectivity-with-threshold/)	LeetCode	Hard	DSU with divisors — connect nodes that share a common divisor above a threshold
[Checking Existence of Edge Length Limited Paths (LeetCode 1697)](https://leetcode.com/problems/checking-existence-of-edge-length-limited-paths/)	LeetCode	Hard	Offline queries with DSU (process edges in sorted order)

Start with LeetCode 684 and 200 to solidify the basics, then move to the hard problems for advanced DSU patterns.

📊 Production Insight

These problems map directly to real scenarios: connectivity in distributed systems (Accounts Merge), network failure impact (Minimize Malware Spread), and dynamic threshold queries (Graph Connectivity With Threshold). Practicing them prepares you for both interviews and production challenges.

🎯 Key Takeaway

The 7 problems above cover every DSU pattern from simple connectivity to offline queries. Work through them in order of difficulty for deep mastery.

Advantages and Disadvantages of Union-Find

Union-Find is a powerful and elegant data structure, but it's not a silver bullet. Below we summarize its strengths and limitations.

Advantage	Disadvantage
Near‑constant amortised time per operation (O(α(n)))	Only works for undirected graphs — cannot detect cycles in directed graphs
Incremental — supports dynamic edge additions	No support for edge deletions (requires offline techniques like rollback DSU)
Extremely memory efficient (two arrays of ints)	Recursive implementations risk stack overflow on large inputs
Simple to implement and debug (roughly 20 lines)	Not suitable for queries like “what is the path between two nodes?” — only connectivity
Ideal for Kruskal's MST, network connectivity, and percolation	Cannot track most‑recent union time without extra data structures
Easily extended (weighted DSU, union by size, etc.)	Slower than adjacency‑list DFS for single static connectivity queries

In summary, Union-Find excels at dynamic connectivity problems in undirected graphs where the primary operations are union and find. For static graphs, a simple BFS/DFS might be faster. For directed graphs or queries involving paths, other data structures are needed.

📊 Production Insight

When evaluating DSU for a production system, check if your graph is undirected, if edges are only added (never removed), and if the main query is connectivity. If yes, DSU is unbeatable. Otherwise, consider alternative approaches like adjacency lists or DFS.

🎯 Key Takeaway

Union-Find is a specialized tool: fast, memory‑light, but limited to undirected incremental connectivity. Know when to use it and when to choose something else.

Check Set Membership: Are Two Elements in the Same Group?

The whole point of Union-Find is answering one question fast: "Are these two items connected?" Not "how" — just "yes" or "no".

You resolve this by comparing their root representatives. If find(x) == find(y), they're in the same set. If not, they're strangers.

The naive approach scans both chains to the root. That's O(n) in worst case — a linked list on steroids. But with path compression, the second call is practically O(1). Every find flattens the tree, caching the root directly.

Production reality: you almost never call find once. You call it in loops. You call it inside union. You call it in edge processing. If you skip path compression, your latency graph looks like a hockey stick. Always compress.

GroupMembership.javaJAVA

// io.thecodeforge — dsa tutorial

public class GroupMembership {
    private int[] parent;
    private int[] rank;

    public GroupMembership(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    public int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]); // path compression
        return parent[x];
    }

    public void union(int x, int y) {
        int px = find(x);
        int py = find(y);
        if (px == py) return;
        if (rank[px] < rank[py]) parent[px] = py;
        else if (rank[px] > rank[py]) parent[py] = px;
        else { parent[py] = px; rank[px]++; }
    }

    public boolean sameGroup(int x, int y) {
        return find(x) == find(y);
    }
}

Output

No output — utility class. Call sameGroup(3, 7) → true/false.

⚠ Production Trap:

Never call find() inside a system that's not path-compressed. Your O(n) becomes O(n²) across 100k elements. Seen a 30-second reconnect? This is why.

🎯 Key Takeaway

Two elements belong to the same set if and only if their roots are identical after compression.

thecodeforge.io

Union Find Disjoint Set

Core Data Structures: Arrays Are All You Need

Union-Find doesn't need linked lists, trees, or adjacency maps. Two flat arrays are enough: one for parent pointers, one for rank or size.

parent[i] stores the immediate parent of element i. Initially, each element is its own parent — i.e., a singleton set. Root nodes are self-referential.

rank[i] tracks the tree height (or an upper bound). When merging, attach the shorter tree under the taller one. Without rank, you'll degenerate into a linked list the moment someone unions in order.

You can also use size[i] instead of rank — that gives you weighted union semantics. Both work. Rank is cheaper to update. Size lets you query set cardinality. Pick one, be consistent.

The beauty? Zero allocations after init. No garbage collection spikes. No cache misses from pointer chasing. Just two contiguous int arrays.

DSUDataStructure.javaJAVA

// io.thecodeforge — dsa tutorial

public class DSUDataStructure {
    private int[] parent;
    private int[] rank;

    public DSUDataStructure(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;  // each node is its own parent
            rank[i] = 0;    // all trees start at height 0
        }
    }

    // For debugging: dump state
    public void printState() {
        System.out.print("parent: ");
        for (int p : parent) System.out.print(p + " ");
        System.out.print("\nrank:   ");
        for (int r : rank) System.out.print(r + " ");
        System.out.println();
    }
}

Output

No output unless printState() called. Example: parent: 0 1 2 3 4 rank: 0 0 0 0 0

💡Senior Shortcut:

Use int[] — not ArrayList, not HashMap. Boxed objects eat cache. Primitive arrays are 4 bytes per element and fit in L1 for 100k elements.

🎯 Key Takeaway

Two flat arrays of primitive ints — parent and rank — are all you need to represent the forest.

● Production incidentPOST-MORTEMseverity: high

The 40-Second Graph Build That Never Finished

Symptom

Graph connectivity job hangs for 40+ minutes then crashes with StackOverflowError.

Assumption

The Union-Find library from a third-party had both optimisations enabled by default.

Root cause

Naive find used recursion without path compression. On a deep chain of unions, find traversed O(n) nodes, and recursion depth exceeded JVM default stack (typically 1024).

Fix

Switched to iterative find with path compression: while(parent[x] != x) { int p = parent[x]; parent[x] = parent[parent[x]]; x = p; }. Also added union by rank to keep trees flat.

Key lesson

Always implement Union-Find with both optimisations in any production code path handling >10K elements.
Prefer iterative find when recursion depth could exceed 1,000 — JVM stack limits bite hard.
Profile your union-find under realistic data shapes: chain unions (worst case) should be in your test suite.

Production debug guideSymptom → Action for the three most common DSU failures3 entries

Symptom · 01

find(x) returns a node that is not the root (wrong component id)

→

Fix

Check if path compression is properly implemented. In recursive version, ensure parent[x] = find(parent[x]) is called. In iterative, verify the while loop updates ancestor links. Also check parent array initialisation: all elements must be self-parent at start.

Symptom · 02

union(x,y) returns True but the component count doesn't decrease

→

Fix

Verify that union returns False only when roots are equal. If ranks are used, ensure the rank increment only happens when both ranks equal. Common bug: rank[rx]++ inside else block instead of post-attach.

Symptom · 03

StackOverflowError during find on large datasets

→

Fix

Switch from recursive to iterative find. Alternatively increase JVM stack size (-Xss), but that masks the problem. The real fix is path compression and avoiding deep trees via union by rank.

★ Union-Find Quick Debug Cheat SheetUse these commands to diagnose the most common DSU production failures. Assumes a Python implementation. Adapt for other languages.

find returns wrong root after many unions−

Immediate action

Print parent array and trace path from suspect node to root manually.

Commands

def debug_find(uf, x): path = []; while uf.parent[x] != x: path.append(x); x = uf.parent[x]; print('Path:', '->'.join(map(str, path))); return x

uf = UnionFind(1000); uf.union(0,1); uf.union(2,3); print([uf.find(i) for i in range(5)])

Fix now

Ensure parent array initialised with list(range(n)). No element should have parent pointing outside [0,n-1].

component count not decrementing+

Union by Rank vs Union by Size in Path Compression DSU

Aspect	Union by Rank	Union by Size
What's tracked	Upper bound on tree height	Exact number of nodes in component
Merge rule	Attach lower-rank root under higher-rank root	Attach smaller component under larger component
Rank/size update	Increment only when ranks are equal	Always add smaller size to larger
After path compression	Rank becomes a proxy (no longer exact height)	Size remains accurate — always exact
Extra utility	None beyond merge decisions	componentSize() query is free and always correct
Asymptotic bound	O(α(n)) amortised	O(α(n)) amortised — identical
Which to prefer	Theoretical purists, textbooks	Practical production code — size is queryable

⚙ Quick Reference

6 commands from this guide

File	Command / Code	Purpose
union_find.py	class UnionFind:	Implementation
union_find.cpp	class UnionFind {	C++ Implementation with Iterative Find
weighted_dsu.py	class WeightedDSU:	Weighted DSU
kruskal_mst.py	def kruskal_mst(vertices, edges):	Kruskal's Minimum Spanning Tree Algorithm
GroupMembership.java	public class GroupMembership {	Check Set Membership
DSUDataStructure.java	public class DSUDataStructure {	Core Data Structures

Key takeaways

Union-Find tracks disjoint sets with near O(1) find and union operations.

Path compression flattens the tree during find

future finds on same nodes are O(1).

Union by rank attaches the shorter tree under the taller

prevents O(n) tree height.

Combined, the amortized complexity per operation is O(alpha(n))

the inverse Ackermann function.

Used in Kruskal's MST, cycle detection in undirected graphs, and network connectivity.

Iterative find avoids stack overflow

use for production systems with large datasets.

Union by size gives free componentSize() queries

prefer it for practical applications.

LEETCODE PRACTICE · 6 PROBLEMS

Practice These on LeetCode

Open LeetCode

#200Easy

Number of Islands

Exercises Union Find by connecting adjacent land cells and counting distinct connected components in a grid.

Solve on LeetCode

#684Medium

Redundant Connection

Exercises Union Find by detecting the first edge that creates a cycle in an undirected graph.

Solve on LeetCode

#721Medium

Accounts Merge

Exercises Union Find by merging accounts that share common emails into connected components.

Solve on LeetCode

#924Hard

Minimize Malware Spread

Exercises Union Find by computing component sizes and identifying the node whose removal minimizes infection spread.

Solve on LeetCode

#1101Medium

The Earliest Moment When Everyone Become Friends

Exercises Union Find by processing timestamps to determine when all nodes become connected.

Solve on LeetCode

#128Hard

Longest Consecutive Sequence

Exercises Union Find by unioning consecutive numbers to find the largest connected component.

Solve on LeetCode

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

What are path compression and union by rank, and why do they matter?

Q02SENIOR

How does Union-Find detect cycles in an undirected graph?

Q03SENIOR

What is the amortized time complexity of Union-Find with both optimizati...

Q04SENIOR

Why does union by rank guarantee O(log n) tree height even without path ...

Q05SENIOR

When would you choose union by size over union by rank?

Q01 of 05SENIOR

What are path compression and union by rank, and why do they matter?

ANSWER

Path compression makes each node on the find path point directly to the root during the find operation. Union by rank always attaches the shorter tree under the taller tree, bounding tree height to O(log n). Together they bring the amortised time per operation to O(α(n)), the inverse Ackermann function — practically constant. Without them, Union-Find can degrade to O(n) per operation on chain-like unions.

FAQ · 6 QUESTIONS

Frequently Asked Questions

What is path compression and why does it matter?

What is union by rank and why not just always attach root A under root B?

How is Union-Find used to detect cycles in an undirected graph?

What is the inverse Ackermann function alpha(n)?

Can Union-Find be used for directed graph cycle detection?

Should I use arrays or objects for parent and rank?

Naren Founder & Principal Engineer

20+ years shipping performance-critical code where algorithms decide the bill. Notes here come from systems that actually shipped.

✓ Verified

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

🔥

That's Graphs. Mark it forged?

7 min read · try the examples if you haven't