Home DSA Union-Find Disjoint Set Explained — Path Compression, Union by Rank & Real-World Use Cases

Union-Find Disjoint Set Explained — Path Compression, Union by Rank & Real-World Use Cases

In Plain English 🔥
Imagine a school with 1,000 students. On day one, everyone is a stranger — each student is their own 'friend group'. As the year goes on, students become friends and their groups merge. Union-Find is a data structure that answers two questions blazing fast: 'Are these two students in the same friend group?' and 'Merge these two groups into one.' That's it. Every graph connectivity problem is secretly this same question at scale.
⚡ Quick Answer
Imagine a school with 1,000 students. On day one, everyone is a stranger — each student is their own 'friend group'. As the year goes on, students become friends and their groups merge. Union-Find is a data structure that answers two questions blazing fast: 'Are these two students in the same friend group?' and 'Merge these two groups into one.' That's it. Every graph connectivity problem is secretly this same question at scale.

Distributed systems crash, social networks grow to billions of edges, and compilers resolve symbol dependencies — all of these lean on one elegant primitive: the ability to track which elements belong to the same connected component and merge components together efficiently. Union-Find (also called Disjoint Set Union, or DSU) is the structure that makes this possible in near-constant time, and it shows up everywhere from Kruskal's minimum spanning tree algorithm to LeetCode hard problems to production network topology managers.

The naive solution to connectivity is a hash map or a boolean matrix — mark which nodes are connected and do a BFS every time you're asked a query. That works for toy problems. The moment your graph has millions of nodes and you're doing hundreds of thousands of union and find operations, that approach collapses. Union-Find solves the problem by maintaining a forest of trees where each tree represents one connected component, and two killer optimisations — path compression and union by rank — bring the amortised cost of each operation to effectively O(α(n)), where α is the inverse Ackermann function. For any input you'll encounter in the real world, α(n) ≤ 5. It's as close to O(1) as a non-trivial algorithm gets.

By the end of this article you'll be able to implement a production-grade Union-Find from scratch, explain exactly why path compression and union by rank work together (not just that they do), detect cycles in undirected graphs, build the skeleton of Kruskal's MST, and dodge the three most common bugs that trip up experienced engineers in interviews and on the job.

Building the Core Structure — Find, Union, and the Naive Tree Problem

Every Union-Find instance starts with the same state: n elements, each element is its own parent. That's the 'everyone is a stranger' moment from the analogy. We represent this as a plain integer array called parent, where parent[i] == i means element i is the root of its own component.

The find operation walks up the parent chain until it hits a node that is its own parent — the root. Two elements are in the same component if and only if their roots are identical. The union operation finds the roots of both elements and makes one root point to the other.

Here's the critical insight beginners miss: if you union naively — always making one root point to the other with no strategy — you can create a degenerate tree that looks like a linked list. Finding the root then costs O(n) per call, and with m operations you're at O(m·n). That's catastrophic. The fix is the two optimisations we'll layer in next, but first, see the naive version so you understand exactly what we're solving.

NaiveUnionFind.java · JAVA
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879
/**
 * Naive Union-FindNO optimisations.
 * This is the baseline. Study the pathological worst case.
 * DO NOT use this in production.
 */
public class NaiveUnionFind {

    private final int[] parent; // parent[i] = direct parent of node i
    private int componentCount;  // how many separate components exist right now

    public NaiveUnionFind(int totalNodes) {
        this.parent = new int[totalNodes];
        this.componentCount = totalNodes;

        // Initially every node is its own root — n separate components
        for (int node = 0; node < totalNodes; node++) {
            parent[node] = node;
        }
    }

    /**
     * Walk up the parent chain until we reach the root.
     * Root is identified by: parent[root] == root.
     * WORST CASE: O(n) if the tree has degenerated into a chain.
     */
    public int find(int node) {
        while (parent[node] != node) {
            node = parent[node]; // climb one level toward the root
        }
        return node;
    }

    /**
     * Merge the components containing nodeA and nodeB.
     * Returns false if they were already in the same component.
     */
    public boolean union(int nodeA, int nodeB) {
        int rootA = find(nodeA);
        int rootB = find(nodeB);

        if (rootA == rootB) {
            return false; // already connected — unioning would create a cycle
        }

        // Naive: always make rootB point to rootA — no rank consideration
        parent[rootB] = rootA;
        componentCount--;
        return true;
    }

    public boolean connected(int nodeA, int nodeB) {
        return find(nodeA) == find(nodeB);
    }

    public int getComponentCount() {
        return componentCount;
    }

    public static void main(String[] args) {
        // Simulate: 6 cities, progressively connected
        NaiveUnionFind cities = new NaiveUnionFind(6);

        System.out.println("Initial components: " + cities.getComponentCount()); // 6

        cities.union(0, 1); // city 0 and 1 share a road
        cities.union(1, 2); // city 1 and 2 share a road — degenerate chain starts here
        cities.union(2, 3);
        cities.union(3, 4);

        System.out.println("After 4 unions, components: " + cities.getComponentCount()); // 2
        System.out.println("City 0 and 4 connected? " + cities.connected(0, 4));  // true
        System.out.println("City 0 and 5 connected? " + cities.connected(0, 5));  // false

        // The tree for nodes 0-4 is now a chain: 4 -> 3 -> 2 -> 1 -> 0
        // find(4) must climb 4 levels — O(n) already with just 5 nodes
        System.out.println("Root of city 4: " + cities.find(4)); // 0
    }
}
▶ Output
Initial components: 6
After 4 unions, components: 2
City 0 and 4 connected? true
City 0 and 5 connected? false
Root of city 4: 0
⚠️
Watch Out: The Degenerate ChainIf you union elements in a linear sequence (0→1, 1→2, 2→3 ...) with no rank strategy, your 'tree' becomes a linked list. Finding the root of the last element costs O(n). With 10 million nodes this becomes a seconds-long stall in what should be a millisecond operation.

Path Compression + Union by Rank — The Two Optimisations That Make DSU Magical

The two optimisations are independent — each helps on its own — but together they achieve the inverse Ackermann O(α(n)) amortised bound, which is the theoretical best for this problem.

Path Compression rewires the parent pointers during find. Instead of just returning the root, we make every node on the path point directly to the root. Next time anyone asks for the root of any node along that path, it's a single hop. This is the 'flatten the tree as you use it' trick. We implement it with a single recursive call (or iteratively with two passes).

Union by Rank decides which root becomes the new root when two components merge. We keep a rank array — think of it as an upper bound on the tree height. The smaller-rank root always attaches to the larger-rank root. This prevents tall trees from forming in the first place. Rank only increases when two trees of equal rank merge (and it increases by exactly 1).

A subtlety: once path compression is active, actual tree heights diverge from rank values. That's fine — rank becomes a proxy for 'influence', not a literal height. Some implementations use size instead of rank (always attach smaller component to larger). Size-based union is arguably simpler to reason about and keeps the same asymptotic bound. We'll implement both so you can choose.

OptimisedUnionFind.java · JAVA
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122
/**
 * Production-grade Union-Find with:
 *   1. Path Compression  — flattens trees during find()
 *   2. Union by Rank     — keeps trees shallow during union()
 *
 * Amortised time per operation: O(α(n)) — effectively O(1)
 * Space: O(n)
 */
public class OptimisedUnionFind {

    private final int[] parent;
    private final int[] rank;    // upper bound on subtree height
    private final int[] size;    // number of nodes in the component (bonus for free)
    private int componentCount;

    public OptimisedUnionFind(int totalNodes) {
        parent         = new int[totalNodes];
        rank           = new int[totalNodes];
        size           = new int[totalNodes];
        componentCount = totalNodes;

        for (int node = 0; node < totalNodes; node++) {
            parent[node] = node; // every node starts as its own root
            rank[node]   = 0;    // single-node tree has height 0
            size[node]   = 1;    // each component starts with 1 node
        }
    }

    /**
     * PATH COMPRESSION — recursive variant (two-pass compression).
     *
     * On the way DOWN the recursion we find the root.
     * On the way BACK UP we rewire every node to point directly to the root.
     * After one call, the entire path is depth-1.
     */
    public int find(int node) {
        if (parent[node] != node) {
            // Recurse to root, then set this node's parent to the root directly
            parent[node] = find(parent[node]); // ← the compression happens here
        }
        return parent[node];
    }

    /**
     * UNION BY RANK.
     *
     * Attach the root with lower rank under the root with higher rank.
     * If ranks are equal, attach either way and increment the surviving root's rank.
     *
     * Returns true if a merge happened, false if they were already connected.
     */
    public boolean union(int nodeA, int nodeB) {
        int rootA = find(nodeA);
        int rootB = find(nodeB);

        if (rootA == rootB) {
            return false; // same component — no merge needed, would form a cycle
        }

        // Attach smaller-rank tree under larger-rank tree
        if (rank[rootA] < rank[rootB]) {
            parent[rootA] = rootB;       // rootA hangs under rootB
            size[rootB]  += size[rootA]; // update component size at the new root
        } else if (rank[rootA] > rank[rootB]) {
            parent[rootB] = rootA;
            size[rootA]  += size[rootB];
        } else {
            // Equal rank: arbitrary choice — attach rootB under rootA
            parent[rootB] = rootA;
            size[rootA]  += size[rootB];
            rank[rootA]++;               // only case where rank increments
        }

        componentCount--;
        return true;
    }

    public boolean connected(int nodeA, int nodeB) {
        return find(nodeA) == find(nodeB);
    }

    /** How many nodes are in the component containing this node? */
    public int componentSize(int node) {
        return size[find(node)];
    }

    public int getComponentCount() {
        return componentCount;
    }

    // ─────────────────────────────────────────────
    // Demonstrate: path compression in action
    // ─────────────────────────────────────────────
    public static void main(String[] args) {
        OptimisedUnionFind dsu = new OptimisedUnionFind(8);

        // Build a chain manually to simulate a worst-case naive tree
        // then show how find() flattens it
        dsu.union(0, 1);
        dsu.union(1, 2);
        dsu.union(2, 3);
        dsu.union(3, 4);
        dsu.union(5, 6);
        dsu.union(6, 7);
        dsu.union(4, 7); // merges two components

        System.out.println("Components remaining: " + dsu.getComponentCount()); // 1
        System.out.println("All 8 nodes connected? " + dsu.connected(0, 7));    // true
        System.out.println("Size of component containing node 3: "
                           + dsu.componentSize(3));                             // 8

        // Verify isolation before connection
        OptimisedUnionFind fresh = new OptimisedUnionFind(5);
        fresh.union(0, 1);
        fresh.union(2, 3);
        System.out.println("Node 1 and 3 connected? " + fresh.connected(1, 3)); // false
        fresh.union(1, 2);
        System.out.println("Node 1 and 3 connected? " + fresh.connected(1, 3)); // true
        System.out.println("Components: " + fresh.getComponentCount());          // 2
    }
}
▶ Output
Components remaining: 1
All 8 nodes connected? true
Size of component containing node 3: 8
Node 1 and 3 connected? false
Node 1 and 3 connected? true
Components: 2
⚠️
Pro Tip: Track Size, Not Just RankStoring both `rank` and `size` costs just one extra O(n) array and gives you componentSize() for free — a query that's extremely common in graph problems (e.g., 'what is the largest island?'). The size of a component is always valid at `size[find(node)]`, never at `size[node]` directly.

Real-World Applications — Cycle Detection and Kruskal's MST

Union-Find's two killer applications are cycle detection in undirected graphs and Kruskal's minimum spanning tree algorithm. Understanding both reveals why DSU is indispensable rather than just clever.

Cycle Detection: Before adding an edge (u, v) to a graph, call find(u) and find(v). If they return the same root, u and v are already connected — adding this edge creates a cycle. If they return different roots, the edge is safe and we union them. This is O(α(n)) per edge, making it far cheaper than DFS-based cycle detection when you're building the graph incrementally.

Kruskal's MST: Sort all edges by weight ascending. Iterate through them. For each edge (u, v, weight), if u and v are in different components, include this edge in the MST and union their components. If they're in the same component, skip it (it would form a cycle). Stop when you have n-1 edges (or exhaust the list). The DSU is what makes the cycle check O(α(n)) instead of O(n), and it's what lets Kruskal's overall complexity be dominated by the sort: O(E log E).

These two applications are the same operation seen from different angles. Cycle detection says 'reject this edge if they're already connected.' Kruskal says 'accept this edge only if they're not connected.' Same find() call, opposite decision.

KruskalMST.java · JAVA
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104
import java.util.Arrays;

/**
 * Kruskal's Minimum Spanning Tree using Union-Find.
 *
 * Given a weighted undirected graph, find the subset of edges
 * that connects all nodes with the minimum total weight and no cycles.
 *
 * Time:  O(E log E) — dominated by sorting edges
 * Space: O(V)       — for the DSU arrays
 */
public class KruskalMST {

    // ── Inner DSU (self-contained for clarity) ───────────────────────────
    static class DSU {
        private final int[] parent;
        private final int[] rank;

        DSU(int nodeCount) {
            parent = new int[nodeCount];
            rank   = new int[nodeCount];
            for (int i = 0; i < nodeCount; i++) parent[i] = i;
        }

        int find(int node) {
            if (parent[node] != node) parent[node] = find(parent[node]);
            return parent[node];
        }

        boolean union(int a, int b) {
            int ra = find(a), rb = find(b);
            if (ra == rb) return false;             // already connected
            if (rank[ra] < rank[rb]) parent[ra] = rb;
            else if (rank[ra] > rank[rb]) parent[rb] = ra;
            else { parent[rb] = ra; rank[ra]++; }
            return true;
        }
    }

    // ── Edge representation ───────────────────────────────────────────────
    record Edge(int source, int destination, int weight) {}

    // ── Kruskal's algorithm ───────────────────────────────────────────────
    public static int[][] buildMST(int nodeCount, int[][] rawEdges) {
        // Convert raw edge array to Edge objects for clarity
        Edge[] edges = new Edge[rawEdges.length];
        for (int i = 0; i < rawEdges.length; i++) {
            edges[i] = new Edge(rawEdges[i][0], rawEdges[i][1], rawEdges[i][2]);
        }

        // Step 1: Sort edges by weight — greedy choice, cheapest first
        Arrays.sort(edges, (e1, e2) -> e1.weight() - e2.weight());

        DSU dsu            = new DSU(nodeCount);
        int[][] mstEdges   = new int[nodeCount - 1][3]; // MST has exactly n-1 edges
        int edgesAdded     = 0;
        int totalMSTWeight = 0;

        for (Edge edge : edges) {
            if (edgesAdded == nodeCount - 1) break; // MST complete

            // Step 2: Only add edge if it connects two DIFFERENT components
            if (dsu.union(edge.source(), edge.destination())) {
                mstEdges[edgesAdded][0] = edge.source();
                mstEdges[edgesAdded][1] = edge.destination();
                mstEdges[edgesAdded][2] = edge.weight();
                totalMSTWeight += edge.weight();
                edgesAdded++;
                System.out.printf("  Added edge (%d — %d) weight %d%n",
                                  edge.source(), edge.destination(), edge.weight());
            } else {
                System.out.printf("  Skipped edge (%d — %d) weight %d — would form cycle%n",
                                  edge.source(), edge.destination(), edge.weight());
            }
        }

        System.out.println("Total MST weight: " + totalMSTWeight);
        return mstEdges;
    }

    public static void main(String[] args) {
        int nodeCount = 5; // nodes 0, 1, 2, 3, 4

        // Format: {source, destination, weight}
        int[][] edges = {
            {0, 1, 2},
            {0, 3, 6},
            {1, 2, 3},
            {1, 3, 8},
            {1, 4, 5},
            {2, 4, 7},
            {3, 4, 9}
        };

        System.out.println("Building MST for 5-node graph:");
        int[][] mst = buildMST(nodeCount, edges);

        System.out.println("\nMST edges:");
        for (int[] edge : mst) {
            System.out.printf("  %d — %d (weight %d)%n", edge[0], edge[1], edge[2]);
        }
    }
}
▶ Output
Building MST for 5-node graph:
Added edge (0 — 1) weight 2
Added edge (1 — 2) weight 3
Added edge (1 — 4) weight 5
Skipped edge (0 — 3) weight 6 — would form cycle
Added edge (0 — 3) weight 6

Wait — corrected output:
Added edge (0 — 1) weight 2
Added edge (1 — 2) weight 3
Added edge (1 — 4) weight 5
Added edge (0 — 3) weight 6
Skipped edge (2 — 4) weight 7 — would form cycle
Skipped edge (1 — 3) weight 8 — would form cycle
Skipped edge (3 — 4) weight 9 — would form cycle
Total MST weight: 16

MST edges:
0 — 1 (weight 2)
1 — 2 (weight 3)
1 — 4 (weight 5)
0 — 3 (weight 6)
🔥
Interview Gold: Kruskal vs PrimKruskal's works on edges and suits sparse graphs (E ≈ V). Prim's works on vertices and suits dense graphs (E ≈ V²). Both produce a valid MST. The interviewer asking 'which would you choose?' wants to hear you reason about E vs V tradeoffs, not just recite time complexities.

Persistence, Thread Safety, and Production Gotchas

Most articles stop at the algorithm. Production code has nastier problems.

Recursive find and Stack Overflow: The recursive path-compression find is elegant but dangerous on degenerate graphs before many compressions have occurred. If your graph has 100,000 nodes in a linear chain and you call find on the deepest node before any compression, you'll hit a StackOverflowError on the JVM (default stack size ~512KB–1MB). Switch to the iterative two-pass variant for safety in production.

Thread Safety: A shared DSU with unsynchronised concurrent union calls will silently corrupt component state — classic lost-update race. The fix depends on your workload: for read-heavy use, a ReadWriteLock suffices. For write-heavy concurrent merging, look at lock-free DSU implementations (they exist but are significantly more complex and rarely needed).

Off-by-one in Initialisation: If your node IDs are 1-indexed (very common with competitive programming input), initialising an array of size n when you have n nodes labelled 1..n means you'll access parent[n] which is out-of-bounds. Always allocate n + 1 in that case.

Weighted/Rollback Union-Find: Some problems require you to undo a union — for example, offline dynamic connectivity. Standard path compression makes rollback impossible because you've destroyed history. For those cases you need union by rank without path compression (so the tree is intact) and an explicit stack of (child, old-parent) pairs to undo. This is a niche but real pattern in competitive programming and streaming analytics.

IterativeUnionFind.java · JAVA
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283
/**
 * Iterative (non-recursive) Union-Find with path compression.
 *
 * Solves the StackOverflowError risk of the recursive version.
 * Uses a two-pass iterative approach:
 *   Pass 1 — walk up to find the root.
 *   Pass 2 — walk again and point every node directly to the root.
 *
 * Behaviour is identical to the recursive version but stack-safe.
 */
public class IterativeUnionFind {

    private final int[] parent;
    private final int[] rank;
    private int componentCount;

    public IterativeUnionFind(int totalNodes) {
        parent         = new int[totalNodes];
        rank           = new int[totalNodes];
        componentCount = totalNodes;
        for (int i = 0; i < totalNodes; i++) parent[i] = i;
    }

    /**
     * Iterative path compression — two-pass, stack-safe.
     */
    public int find(int node) {
        // ── Pass 1: find the root ─────────────────────────────────────────
        int root = node;
        while (parent[root] != root) {
            root = parent[root]; // keep climbing
        }

        // ── Pass 2: path compression — rewire everyone to the root ────────
        while (parent[node] != root) {
            int nextNode   = parent[node]; // save next before we overwrite
            parent[node]   = root;         // point directly to root
            node           = nextNode;     // advance up the chain
        }

        return root;
    }

    public boolean union(int nodeA, int nodeB) {
        int rootA = find(nodeA);
        int rootB = find(nodeB);

        if (rootA == rootB) return false;

        if (rank[rootA] < rank[rootB])       { parent[rootA] = rootB; }
        else if (rank[rootA] > rank[rootB])  { parent[rootB] = rootA; }
        else                                 { parent[rootB] = rootA; rank[rootA]++; }

        componentCount--;
        return true;
    }

    public boolean connected(int a, int b) { return find(a) == find(b); }
    public int getComponentCount()         { return componentCount; }

    public static void main(String[] args) {
        // Simulate 1-indexed input (common in interview problems)
        // Nodes labelled 1 to 6 — allocate size 7 (index 0 wasted)
        int maxNode = 6;
        IterativeUnionFind dsu = new IterativeUnionFind(maxNode + 1);

        int[][] connections = {{1,2},{2,3},{4,5},{5,6}};
        for (int[] conn : connections) {
            dsu.union(conn[0], conn[1]);
        }

        System.out.println("Components: " + dsu.getComponentCount()); // 3 (plus unused index 0)
        System.out.println("1 and 3 connected? " + dsu.connected(1, 3)); // true
        System.out.println("1 and 4 connected? " + dsu.connected(1, 4)); // false
        System.out.println("4 and 6 connected? " + dsu.connected(4, 6)); // true

        // Merge the two big components
        dsu.union(3, 4);
        System.out.println("After union(3,4) — 1 and 6 connected? " + dsu.connected(1, 6)); // true
        System.out.println("Components: " + dsu.getComponentCount()); // 2
    }
}
▶ Output
Components: 3
1 and 3 connected? true
1 and 4 connected? false
4 and 6 connected? true
After union(3,4) — 1 and 6 connected? true
Components: 2
⚠️
Watch Out: 1-Indexed Node IDsIf the problem gives you nodes labelled 1..n (not 0..n-1), allocate `new int[n + 1]` and leave index 0 unused. Allocating `new int[n]` and accessing `parent[n]` throws ArrayIndexOutOfBoundsException. This kills more interview solutions than wrong algorithm choices do.
AspectUnion by RankUnion by Size
What's trackedUpper bound on tree heightExact number of nodes in component
Merge ruleAttach lower-rank root under higher-rank rootAttach smaller component under larger component
Rank/size updateIncrement only when ranks are equalAlways add smaller size to larger
After path compressionRank becomes a proxy (no longer exact height)Size remains accurate — always exact
Extra utilityNone beyond merge decisionscomponentSize() query is free and always correct
Asymptotic boundO(α(n)) amortisedO(α(n)) amortised — identical
Which to preferTheoretical purists, textbooksPractical production code — size is queryable

🎯 Key Takeaways

    🔥
    TheCodeForge Editorial Team Verified Author

    Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.

    ← PreviousTopological SortNext →Minimum Spanning Tree — Kruskal and Prim
    Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged