Homeβ€Ί DSAβ€Ί Closest Pair of Points β€” Divide and Conquer O(n log n)

Closest Pair of Points β€” Divide and Conquer O(n log n)

Where developers are forged. Β· Structured learning Β· Free forever.
πŸ“ Part of: Recursion β†’ Topic 7 of 9
Solve the closest pair of points problem using divide and conquer in O(n log n).
πŸ”₯ Advanced β€” solid DSA foundation required
In this tutorial, you'll learn:
  • Sort by x, divide at median, recurse on each half to get Ξ΄_L and Ξ΄_R. Ξ΄ = min(Ξ΄_L, Ξ΄_R). Check strip of width 2Ξ΄ around the dividing line.
  • The 7-point bound: in a 2δ×δ rectangle, at most 8 points can be placed with mutual distance β‰₯ Ξ΄ (2Γ—2 arrangement per half). So each point checks ≀ 7 neighbours in the strip.
  • Strip must be sorted by y-coordinate for the 7-point bound to apply. Pre-sorting by y (merging during the divide step) achieves T(n) = 2T(n/2) + O(n) β†’ O(n log n).
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
⚑ Quick Answer
Given n points on a plane, find the two closest ones. Naive: check all pairs β€” O(nΒ²). D&C: split points by x-coordinate, find closest pair in each half recursively, then check if a closer pair crosses the dividing line. The key insight: you only need to check a narrow strip around the dividing line, and within that strip, each point has at most 7 neighbours to check.

The closest pair problem is a beautiful example of computational geometry enabling an otherwise-quadratic combine step to run in linear time. The naive O(n^2) solution checks every pair. The D&C solution achieves O(n log n) by proving a remarkable geometric fact: in the critical 'strip' around the dividing line, each point has at most 7 other points it needs to check.

This seven-point bound β€” derived from packing arguments in a 2δ×δ rectangle β€” is the kind of geometric reasoning that separates algorithm designers from algorithm users. It is also a template for many computational geometry optimisations: use the structure of space to bound the number of interactions.

Algorithm Overview

  1. Sort points by x-coordinate: O(n log n)
  2. Recursively find closest pair in left half (Ξ΄_L) and right half (Ξ΄_R)
  3. Ξ΄ = min(Ξ΄_L, Ξ΄_R)
  4. Find all points within Ξ΄ of the dividing line (the strip)
  5. Check pairs in the strip sorted by y-coordinate
  6. The key: each point in the strip has at most 7 other points to check
closest_pair.py Β· PYTHON
123456789101112131415161718192021222324252627282930313233343536373839
from math import dist
from itertools import combinations

def closest_pair(points: list[tuple]) -> tuple[float, tuple, tuple]:
    points = sorted(points, key=lambda p: p[0])
    return _closest(points)

def _closest(pts):
    n = len(pts)
    if n <= 3:
        # Brute force for small input
        best = float('inf'), pts[0], pts[1]
        for a, b in combinations(pts, 2):
            d = dist(a, b)
            if d < best[0]: best = d, a, b
        return best
    mid = n // 2
    mid_x = pts[mid][0]
    d_left  = _closest(pts[:mid])
    d_right = _closest(pts[mid:])
    best = d_left if d_left[0] < d_right[0] else d_right
    delta = best[0]
    # Build strip: points within delta of dividing line
    strip = [p for p in pts if abs(p[0] - mid_x) < delta]
    strip.sort(key=lambda p: p[1])  # sort by y
    # Check strip pairs (at most 7 comparisons per point)
    for i in range(len(strip)):
        for j in range(i+1, min(i+8, len(strip))):
            if strip[j][1] - strip[i][1] >= delta:
                break
            d = dist(strip[i], strip[j])
            if d < delta:
                best = d, strip[i], strip[j]
                delta = d
    return best

points = [(2,3),(12,30),(40,50),(5,1),(12,10),(3,4)]
d, p1, p2 = closest_pair(points)
print(f'Closest: {p1} and {p2}, distance: {d:.4f}')
β–Ά Output
Closest: (2, 3) and (3, 4), distance: 1.4142

The '7 Neighbours' Proof

Why does each point in the strip have at most 7 other points to check? Consider a δ×2Ξ΄ rectangle straddling the midline. In each δ×δ half, the closest pair distance is β‰₯ Ξ΄ (we already found the best in each half). So in each δ×δ quarter, we can pack at most 4 points at distance β‰₯ Ξ΄ from each other (a 2Γ—2 grid). Total: 8 points maximum in the 2δ×2Ξ΄ strip rectangle including the reference point itself β€” so at most 7 others.

This bounds the inner loop to constant work per point β†’ O(n) combine step.

πŸ”₯
ComplexityT(n) = 2T(n/2) + O(n log n) [sorting strip] or O(n) [if strip pre-sorted]. With pre-sorting by y: T(n) = 2T(n/2) + O(n) β†’ O(n log n) total by Master Theorem Case 2.

Optimisation β€” Pre-Sort by Y

Re-sorting the strip each time adds a log factor. The optimisation: maintain two sorted arrays throughout β€” one sorted by x (for dividing) and one sorted by y (for strip checking). This gives T(n) = 2T(n/2) + O(n) β†’ O(n log n) with smaller constants.

🎯 Key Takeaways

  • Sort by x, divide at median, recurse on each half to get Ξ΄_L and Ξ΄_R. Ξ΄ = min(Ξ΄_L, Ξ΄_R). Check strip of width 2Ξ΄ around the dividing line.
  • The 7-point bound: in a 2δ×δ rectangle, at most 8 points can be placed with mutual distance β‰₯ Ξ΄ (2Γ—2 arrangement per half). So each point checks ≀ 7 neighbours in the strip.
  • Strip must be sorted by y-coordinate for the 7-point bound to apply. Pre-sorting by y (merging during the divide step) achieves T(n) = 2T(n/2) + O(n) β†’ O(n log n).
  • O(n log n) is optimal for comparison-based closest pair β€” same lower bound as sorting.
  • The packing argument generalises: closest pair in higher dimensions also bounds strip neighbours by a constant, though the constant grows with dimension.

Interview Questions on This Topic

  • QWhy does each point in the strip need at most 7 comparisons?
  • QWhat is the recurrence and complexity of the closest pair algorithm?
  • QWhat happens at the base case and why is brute force acceptable there?
  • QHow would you extend this to 3D?

Frequently Asked Questions

Is O(n log n) optimal for closest pair?

Yes β€” any comparison-based algorithm requires Ξ©(n log n) by reduction from element distinctness. The D&C algorithm is therefore asymptotically optimal.

πŸ”₯
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousCounting Inversions using Merge SortNext β†’Strassen's Matrix Multiplication Algorithm
Forged with πŸ”₯ at TheCodeForge.io β€” Where Developers Are Forged