Closest Pair of Points β Divide and Conquer O(n log n)
- Sort by x, divide at median, recurse on each half to get Ξ΄_L and Ξ΄_R. Ξ΄ = min(Ξ΄_L, Ξ΄_R). Check strip of width 2Ξ΄ around the dividing line.
- The 7-point bound: in a 2Ξ΄ΓΞ΄ rectangle, at most 8 points can be placed with mutual distance β₯ Ξ΄ (2Γ2 arrangement per half). So each point checks β€ 7 neighbours in the strip.
- Strip must be sorted by y-coordinate for the 7-point bound to apply. Pre-sorting by y (merging during the divide step) achieves T(n) = 2T(n/2) + O(n) β O(n log n).
The closest pair problem is a beautiful example of computational geometry enabling an otherwise-quadratic combine step to run in linear time. The naive O(n^2) solution checks every pair. The D&C solution achieves O(n log n) by proving a remarkable geometric fact: in the critical 'strip' around the dividing line, each point has at most 7 other points it needs to check.
This seven-point bound β derived from packing arguments in a 2Ξ΄ΓΞ΄ rectangle β is the kind of geometric reasoning that separates algorithm designers from algorithm users. It is also a template for many computational geometry optimisations: use the structure of space to bound the number of interactions.
Algorithm Overview
- Sort points by x-coordinate: O(n log n)
- Recursively find closest pair in left half (Ξ΄_L) and right half (Ξ΄_R)
- Ξ΄ = min(Ξ΄_L, Ξ΄_R)
- Find all points within Ξ΄ of the dividing line (the strip)
- Check pairs in the strip sorted by y-coordinate
- The key: each point in the strip has at most 7 other points to check
from math import dist from itertools import combinations def closest_pair(points: list[tuple]) -> tuple[float, tuple, tuple]: points = sorted(points, key=lambda p: p[0]) return _closest(points) def _closest(pts): n = len(pts) if n <= 3: # Brute force for small input best = float('inf'), pts[0], pts[1] for a, b in combinations(pts, 2): d = dist(a, b) if d < best[0]: best = d, a, b return best mid = n // 2 mid_x = pts[mid][0] d_left = _closest(pts[:mid]) d_right = _closest(pts[mid:]) best = d_left if d_left[0] < d_right[0] else d_right delta = best[0] # Build strip: points within delta of dividing line strip = [p for p in pts if abs(p[0] - mid_x) < delta] strip.sort(key=lambda p: p[1]) # sort by y # Check strip pairs (at most 7 comparisons per point) for i in range(len(strip)): for j in range(i+1, min(i+8, len(strip))): if strip[j][1] - strip[i][1] >= delta: break d = dist(strip[i], strip[j]) if d < delta: best = d, strip[i], strip[j] delta = d return best points = [(2,3),(12,30),(40,50),(5,1),(12,10),(3,4)] d, p1, p2 = closest_pair(points) print(f'Closest: {p1} and {p2}, distance: {d:.4f}')
The '7 Neighbours' Proof
Why does each point in the strip have at most 7 other points to check? Consider a Ξ΄Γ2Ξ΄ rectangle straddling the midline. In each Ξ΄ΓΞ΄ half, the closest pair distance is β₯ Ξ΄ (we already found the best in each half). So in each Ξ΄ΓΞ΄ quarter, we can pack at most 4 points at distance β₯ Ξ΄ from each other (a 2Γ2 grid). Total: 8 points maximum in the 2Ξ΄Γ2Ξ΄ strip rectangle including the reference point itself β so at most 7 others.
This bounds the inner loop to constant work per point β O(n) combine step.
Optimisation β Pre-Sort by Y
Re-sorting the strip each time adds a log factor. The optimisation: maintain two sorted arrays throughout β one sorted by x (for dividing) and one sorted by y (for strip checking). This gives T(n) = 2T(n/2) + O(n) β O(n log n) with smaller constants.
π― Key Takeaways
- Sort by x, divide at median, recurse on each half to get Ξ΄_L and Ξ΄_R. Ξ΄ = min(Ξ΄_L, Ξ΄_R). Check strip of width 2Ξ΄ around the dividing line.
- The 7-point bound: in a 2Ξ΄ΓΞ΄ rectangle, at most 8 points can be placed with mutual distance β₯ Ξ΄ (2Γ2 arrangement per half). So each point checks β€ 7 neighbours in the strip.
- Strip must be sorted by y-coordinate for the 7-point bound to apply. Pre-sorting by y (merging during the divide step) achieves T(n) = 2T(n/2) + O(n) β O(n log n).
- O(n log n) is optimal for comparison-based closest pair β same lower bound as sorting.
- The packing argument generalises: closest pair in higher dimensions also bounds strip neighbours by a constant, though the constant grows with dimension.
Interview Questions on This Topic
- QWhy does each point in the strip need at most 7 comparisons?
- QWhat is the recurrence and complexity of the closest pair algorithm?
- QWhat happens at the base case and why is brute force acceptable there?
- QHow would you extend this to 3D?
Frequently Asked Questions
Is O(n log n) optimal for closest pair?
Yes β any comparison-based algorithm requires Ξ©(n log n) by reduction from element distinctness. The D&C algorithm is therefore asymptotically optimal.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.