Skip to content
Home DSA Closest Pair of Points — Why Your Strip Takes O(n²)

Closest Pair of Points — Why Your Strip Takes O(n²)

Where developers are forged. · Structured learning · Free forever.
📍 Part of: Recursion → Topic 7 of 9
An unsorted strip turned 10K points into 30-second queries.
🔥 Advanced — solid DSA foundation required
In this tutorial, you'll learn
An unsorted strip turned 10K points into 30-second queries.
  • Sort by x, divide at median, recurse on each half to get δ_L and δ_R. δ = min(δ_L, δ_R). Check strip of width 2δ around the dividing line.
  • The 7-point bound: in a 2δ×δ rectangle, at most 8 points can be placed with mutual distance ≥ δ (2×2 arrangement per half). So each point checks ≤ 7 neighbours in the strip.
  • Strip must be sorted by y-coordinate for the 7-point bound to apply. Pre-sorting by y (merging during the divide step) achieves T(n) = 2T(n/2) + O(n) → O(n log n).
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • Divide and conquer splits points by median x, finds closest pair in each half recursively, then checks a narrow strip around the dividing line.
  • The strip’s width is δ (minimum distance from left/right halves). Only points within δ of the midline are considered.
  • Each point in the strip checks at most 7 neighbours, thanks to a geometric packing argument in a 2δ×δ rectangle.
  • Performance insight: the combine step runs in O(n) when the strip is sorted by y, giving overall O(n log n).
  • Production insight: forgetting to sort the strip by y can silently balloon the combine step to O(n²) on dense datasets.
  • Biggest mistake: assuming the 7-neighbour bound holds without sorting the strip by y-coordinate.
Production Incident

Closest Pair Blowup in Geospatial Query

A geospatial service processing millions of coordinates hit O(n²) runtime due to a missing y-sort in the strip checking step.
SymptomThe closest pair endpoint returned correct results but took over 30 seconds for 10K points, while the team expected sub-second latency.
AssumptionThe team assumed the 7-neighbour bound alone guaranteed O(n) combine, regardless of how the strip was processed.
Root causeThe strip was not sorted by y-coordinate. The inner loop compared each point against every other point in the strip, not just the nearest 7, leading to O(n²) behaviour for dense datasets.
FixAdded strip.sort(key=lambda p: p[1]) and enforced the 7-point check with break condition on y-difference. Profiling confirmed O(n log n) after the fix.
Key Lesson
The geometric bound only applies when points are sorted by y in the strip.Always verify assumptions about pre-sorting when using D&C combine steps.Profile before assuming O(n) — a missing sort can silently degrade performance.
Production Debug Guide

Symptom-driven actions for production and implementation issues

Algorithm takes more than O(n log n) time on moderate nCheck if the strip list is sorted by y before the inner loop. Add a runtime assertion that the strip's y-coordinates are non-decreasing. If absent, sort the strip.
Answer is incorrect or misses closest pair across dividing lineVerify that δ is correctly updated after every closer pair found within the strip. Also ensure the strip includes points exactly at distance δ from the midline (use < δ, not <=).
Stack overflow on recursive callsCheck the base case: for n ≤ 3, use brute force. If recursion goes deeper than log₂(n), the split may be uneven. Ensure sorting by x is stable and the median index is correct.

The closest pair problem is a beautiful example of computational geometry enabling an otherwise-quadratic combine step to run in linear time. The naive O(n^2) solution checks every pair. The D&C solution achieves O(n log n) by proving a remarkable geometric fact: in the critical 'strip' around the dividing line, each point has at most 7 other points it needs to check.

This seven-point bound — derived from packing arguments in a 2δ×δ rectangle — is the kind of geometric reasoning that separates algorithm designers from algorithm users. It is also a template for many computational geometry optimisations: use the structure of space to bound the number of interactions.

Algorithm Overview

  1. Sort points by x-coordinate: O(n log n)
  2. Recursively find closest pair in left half (δ_L) and right half (δ_R)
  3. δ = min(δ_L, δ_R)
  4. Find all points within δ of the dividing line (the strip)
  5. Check pairs in the strip sorted by y-coordinate
  6. The key: each point in the strip has at most 7 other points to check
closest_pair.py · PYTHON
12345678910111213141516171819202122232425262728293031323334353637383940
# Package: io.thecodeforge.closest_pair
from math import dist
from itertools import combinations

def closest_pair(points: list[tuple]) -> tuple[float, tuple, tuple]:
    points = sorted(points, key=lambda p: p[0])
    return _closest(points)

def _closest(pts):
    n = len(pts)
    if n <= 3:
        # Brute force for small input
        best = float('inf'), pts[0], pts[1]
        for a, b in combinations(pts, 2):
            d = dist(a, b)
            if d < best[0]: best = d, a, b
        return best
    mid = n // 2
    mid_x = pts[mid][0]
    d_left  = _closest(pts[:mid])
    d_right = _closest(pts[mid:])
    best = d_left if d_left[0] < d_right[0] else d_right
    delta = best[0]
    # Build strip: points within delta of dividing line
    strip = [p for p in pts if abs(p[0] - mid_x) < delta]
    strip.sort(key=lambda p: p[1])  # sort by y
    # Check strip pairs (at most 7 comparisons per point)
    for i in range(len(strip)):
        for j in range(i+1, min(i+8, len(strip))):
            if strip[j][1] - strip[i][1] >= delta:
                break
            d = dist(strip[i], strip[j])
            if d < delta:
                best = d, strip[i], strip[j]
                delta = d
    return best

points = [(2,3),(12,30),(40,50),(5,1),(12,10),(3,4)]
d, p1, p2 = closest_pair(points)
print(f'Closest: {p1} and {p2}, distance: {d:.4f}')
▶ Output
Closest: (2, 3) and (3, 4), distance: 1.4142
📊 Production Insight
Skipping the y-sort on the strip is the most common production mistake.
It turns the O(n) combine into an O(n²) disaster for dense datasets.
Rule: always sort strip by y before the inner loop — profile to confirm linear time.
🎯 Key Takeaway
Divide by x, recurse, then combine in O(n) by checking only 7 neighbours per point in a sorted strip.
That's the recipe that gives O(n log n).

The 7-Neighbour Proof

Why does each point in the strip have at most 7 other points to check? Consider a δ×2δ rectangle straddling the midline. In each δ×δ half, the closest pair distance is ≥ δ (we already found the best in each half). So in each δ×δ quarter, we can pack at most 4 points at distance ≥ δ from each other (a 2×2 grid). Total: 8 points maximum in the 2δ×2δ strip rectangle including the reference point itself — so at most 7 others.

This bounds the inner loop to constant work per point → O(n) combine step.

🔥Complexity
T(n) = 2T(n/2) + O(n log n) [sorting strip] or O(n) [if strip pre-sorted]. With pre-sorting by y: T(n) = 2T(n/2) + O(n) → O(n log n) total by Master Theorem Case 2.
📊 Production Insight
The proof only works if the strip is sorted by y.
Without that, you can't break the inner loop early, and the bound becomes O(n²).
Rule: the geometric bound depends on the ordering — enforce it in code.
🎯 Key Takeaway
Maximum 8 points fit in a 2δ×δ rectangle with pairwise distance ≥ δ.
Therefore, each point checks ≤ 7 neighbours — the combine step is O(n).

Optimisation — Pre-Sort by Y

Re-sorting the strip each time adds a log factor. The optimisation: maintain two sorted arrays throughout — one sorted by x (for dividing) and one sorted by y (for strip checking). This gives T(n) = 2T(n/2) + O(n) → O(n log n) with smaller constants.

Implementation detail: during recursion, merge the y-sorted lists of the left and right halves (like merge sort) to produce the parent's y-sorted list. The strip can then be filtered in O(|strip|) from the y-sorted list without a full sort.

📊 Production Insight
Pre-sorting trades memory for time. For memory-constrained environments (e.g., embedded systems), re-sorting the strip may be acceptable if n is small.
But for production at scale, the extra O(log n) factor becomes noticeable at millions of points.
Rule: always pre-sort by y when n > 10⁵ — the memory overhead is linear.
🎯 Key Takeaway
Maintain a separate y-sorted list to avoid O(n log n) per combine.
This reduces recurrence to T(n) = 2T(n/2) + O(n) = O(n log n).

Complexity Analysis and Recurrence

The recurrence for the naive version (no pre-sort): T(n) = 2T(n/2) + O(n log n). Solving via Master Theorem: a=2, b=2, f(n)=O(n log n) = O(n^(log₂2) log n) = O(n log n). This falls under Case 2 of the Master Theorem (actually a special case). The solution is O(n log² n). Because the combine step sorts the strip, it's O(n log n), leading to T(n) = 2T(n/2) + O(n log n) = O(n log² n). Wait — careful: if we sort the strip each time, the combine is O(k log k) where k ≤ n. That sums to O(n log² n) in the worst case. Pre-sorting brings combine to O(n) and recurrence to O(n log n).

Let's walk the math
  • Without pre-sort: T(n) = 2T(n/2) + O(n log n) → by Master Theorem Case 2, T(n) = Θ(n log² n).
  • With pre-sort: T(n) = 2T(n/2) + O(n) → by Master Theorem Case 2, T(n) = Θ(n log n).

So pre-sorting doesn't just reduce constants — it changes the asymptotic complexity.

📊 Production Insight
Engineers often claim O(n log n) without pre-sorting, but the correct bound is O(n log² n).
For n=10⁶, log² n ≈ 400 vs log n ≈ 20 — that's 20x slower.
Rule: always state the recurrence honestly and include pre-sorting in your implementation.
🎯 Key Takeaway
Without pre-sort: O(n log² n). With pre-sort: O(n log n).
The difference matters at scale — pre-sort is not a micro-optimisation.

Implementation Details and Edge Cases

Base Case: n ≤ 3. Brute force is efficient and avoids recursion overhead. Use combinations for clarity, but a triple-nested loop is fine for n=3.

Median Selection: Sort by x once, then the median is simply p[mid]. No need for linear-time selection — the sort dominates anyway.

Duplicate Points: If two points have identical coordinates, the distance is 0. The algorithm should handle this correctly because δ becomes 0 and the strip width becomes 0 — only points exactly on the midline are considered, but duplicates on the same side will be caught in the recursive step. However, if duplicates exist across halves, they may be missed because the strip is empty (δ=0). Fix: before divide, deduplicate or handle zero-distance case explicitly.

Vertical and Horizontal Lines: Sorting by x works fine. If many points share the same x, the median may be ambiguous. Use a stable sort or choose the middle index after sorting.

Floating-Point Precision: Distance calculations can suffer from floating-point errors. Use math.dist which is stable, but for comparison, prefer squared distances to avoid sqrt overhead.

Integer Coordinates: If all coordinates are integers, use squared distances to avoid sqrt entirely, only taking sqrt at the end.

closest_pair_optimized.py · PYTHON
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647
# Package: io.thecodeforge.closest_pair_optimized
from math import dist

def closest_pair_with_pre_sort(points: list[tuple]) -> tuple[float, tuple, tuple]:
    # Pre-sort by x for divide step
    px = sorted(points, key=lambda p: p[0])
    py = sorted(points, key=lambda p: p[1])
    return _closest(px, py)

def _closest(px: list, py: list) -> tuple[float, tuple, tuple]:
    n = len(px)
    if n <= 3:
        # Brute force
        best = float('inf'), px[0], px[1]
        for i in range(n):
            for j in range(i+1, n):
                d = dist(px[i], px[j])
                if d < best[0]:
                    best = d, px[i], px[j]
        return best
    mid = n // 2
    mid_x = px[mid][0]
    # Split y-sorted points into left and right based on x
    py_left = []
    py_right = []
    for p in py:
        if p[0] <= mid_x and len(py_left) < mid:
            py_left.append(p)
        else:
            py_right.append(p)
    # Recurse
    d_left = _closest(px[:mid], py_left)
    d_right = _closest(px[mid:], py_right)
    delta = min(d_left[0], d_right[0])
    # Build strip from py (already sorted by y)
    strip = [p for p in py if abs(p[0] - mid_x) < delta]
    # Check up to 7 neighbours
    best = d_left if d_left[0] < d_right[0] else d_right
    for i in range(len(strip)):
        for j in range(i+1, min(i+8, len(strip))):
            if strip[j][1] - strip[i][1] >= delta:
                break
            d = dist(strip[i], strip[j])
            if d < best[0]:
                best = d, strip[i], strip[j]
                delta = d
    return best
▶ Output
Same as above but handles pre-sorting.
📊 Production Insight
Duplicate points cause δ=0, which empties the strip and misses cross-half duplicates.
In production geospatial data, duplicates happen more often than you think.
Rule: deduplicate points before running the algorithm, or add a zero-distance check at the start.
🎯 Key Takeaway
Handle base case n≤3 by brute force. Handle duplicates explicitly.
Pre-sort by y to keep combine linear. Use squared distances for integers.

Extensions and Higher Dimensions

The D&C approach generalises to k dimensions. The strip becomes a slab of width 2δ, and the neighbour bound becomes O(3^k), which grows exponentially with dimension. For 3D, each point checks at most 31 neighbours (3^3 - 1 = 26? Actually the bound is more complex). But the combinatorial explosion makes D&C impractical for dimensions > 3. For high-dimensional data, use space-filling curves or KD-trees.

Related Problems: The packing argument also applies to finding the minimal distance between two convex polygons, or computing the diameter of a point set using rotating calipers.

Practical Note: In production, for 2D and 3D, the D&C algorithm is the gold standard. For higher dimensions, approximate methods (like LSH) are used.

📊 Production Insight
Extending to 3D naively increases the constant from 7 to ~32. Still O(n log n) but 4x slower per point.
For dimensions > 3, the constant factor makes D&C slower than brute force for reasonable n.
Rule: choose the algorithm based on dimensionality, not just asymptotic complexity.
🎯 Key Takeaway
Works for any fixed dimension, but constant grows exponentially.
For 3D, neighbour bound is ~31. For > 3D, use approximate methods.
🗂 Closest Pair Approaches Comparison
Naive vs D&C vs Sweep Line
AlgorithmTime ComplexitySpace ComplexityWhen to Use
Naive (Brute Force)O(n²)O(1)n ≤ 1000, prototyping
Divide and Conquer (Pre-sorted)O(n log n)O(n)General purpose, n up to 10⁷
Shamos-Hoey (Sweep Line)O(n log n)O(n)When points are streaming or you need incremental updates

🎯 Key Takeaways

  • Sort by x, divide at median, recurse on each half to get δ_L and δ_R. δ = min(δ_L, δ_R). Check strip of width 2δ around the dividing line.
  • The 7-point bound: in a 2δ×δ rectangle, at most 8 points can be placed with mutual distance ≥ δ (2×2 arrangement per half). So each point checks ≤ 7 neighbours in the strip.
  • Strip must be sorted by y-coordinate for the 7-point bound to apply. Pre-sorting by y (merging during the divide step) achieves T(n) = 2T(n/2) + O(n) → O(n log n).
  • O(n log n) is optimal for comparison-based closest pair — same lower bound as sorting.
  • The packing argument generalises: closest pair in higher dimensions also bounds strip neighbours by a constant, though the constant grows with dimension.
  • Production failure: missing y-sort turns O(n log n) into O(n²). Always sort your strip.

⚠ Common Mistakes to Avoid

    Forgetting to sort the strip by y-coordinate
    Symptom

    Algorithm runs in O(n²) for dense datasets; profiling shows the inner loop executing far more than 7 comparisons per point.

    Fix

    Add strip.sort(key=lambda p: p[1]) before the inner loop. Alternatively, maintain a y-sorted list during recursion to avoid the sort entirely.

    Using '<=' instead of '<' for strip width
    Symptom

    Misses the closest pair when it lies exactly on the dividing line; algorithm returns a slightly larger distance than correct.

    Fix

    Use < delta (strict less than) when filtering points within the strip. The strip should include points strictly less than δ away from the midline.

    Not handling duplicate points
    Symptom

    Algorithm returns distance > 0 when duplicate coordinates exist, or misses the true minimum distance of 0.

    Fix

    Preprocess to deduplicate points using a set, or check for zero distance in the base case and return immediately.

    Base case too small (n ≤ 1) or too large (n ≤ 10)
    Symptom

    Recursion error or performance hit. n ≤ 1 fails because two points are required; n > 3 adds unnecessary recursion overhead.

    Fix

    Set base case to n ≤ 3. Use brute force with O(n²) for those small subsets.

    Not updating delta after finding a closer pair in the strip
    Symptom

    Correct distance found but later comparisons use the old delta, possibly missing an even closer pair within the same strip.

    Fix

    Update delta after each closer pair found: if d < delta: best = ...; delta = d. Then the inner loop's break condition uses the updated delta.

Interview Questions on This Topic

  • QWhy does each point in the strip need at most 7 comparisons?SeniorReveal
    Consider a 2δ×δ rectangle centred on the reference point, extending δ left and right of the midline. The rectangle is split into two δ×δ squares. Within each square, the closest pair distance is at least δ (from the recursive result). In a δ×δ square, the maximum number of points with pairwise distance ≥ δ is 4 (arranged in a 2×2 grid). So the whole rectangle contains at most 8 points (including the reference), meaning at most 7 others to check. This packing argument holds only if the strip is sorted by y.
  • QWhat is the recurrence and complexity of the closest pair algorithm with and without pre-sorting by y?Mid-levelReveal
    Without pre-sorting: T(n) = 2T(n/2) + O(n log n) → O(n log² n) by Master Theorem. With pre-sorting (merging y-sorted lists): T(n) = 2T(n/2) + O(n) → O(n log n). The pre-sorting eliminates the log factor from the combine step.
  • QWhat happens at the base case and why is brute force acceptable there?JuniorReveal
    For n ≤ 3, brute force is used because the number of pairs is tiny (≤ 3). Recursion overhead exceeds the O(1) brute force cost. It also avoids the complexity of handling small subsets with the geometric bound.
  • QHow would you extend this to 3D?SeniorReveal
    The same D&C pattern works: sort by x, find δ recursively, then check a 3D slab of width 2δ. The packing argument becomes a 2δ×δ×δ box divided into 8 cubes. The maximum number of points with mutual distance ≥ δ in a δ×δ×δ cube is 8 (a 2×2×2 grid). So the slab contains at most 8×8 = 64 points, giving a neighbour bound of 63. The algorithm remains O(n log n) but with a much larger constant. For production, you'd pre-sort by y and z.
  • QHow can you optimise the closest pair algorithm for floating-point coordinates?Mid-levelReveal
    Use squared distances to avoid the sqrt operation in every comparison. Only take the sqrt once at the end. This significantly reduces floating-point overhead. Also, careful with epsilon comparisons: use < epsilon for equality to avoid precision issues, but for the strip width, use < delta (not ≤) to stay true to the geometric bound. Pre-sorting by y remains crucial.

Frequently Asked Questions

Is O(n log n) optimal for closest pair?

Yes — any comparison-based algorithm requires Ω(n log n) by reduction from element distinctness. The D&C algorithm is therefore asymptotically optimal.

Why 7 and not 8?

The reference point itself is one of the up-to-8 points in the rectangle, so it checks at most 7 others.

What if points have the same x-coordinate?

The median is chosen after sorting by x. If many points share the median x, the split may be uneven, but the algorithm still works. The strip includes points on both sides of the dividing line, so vertical lines are handled.

Can I use this algorithm for large-scale geospatial queries?

Yes, but pre-process to handle duplicates and watch out for floating-point precision. For millions of points, pre-sorting by y is essential. Also consider using a sweep-line variant if the points are indexed in a spatial database.

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousCounting Inversions using Merge SortNext →Strassen's Matrix Multiplication Algorithm
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged