Home DSA Interpolation Search Algorithm — O(log log n) for Uniform Data

Interpolation Search Algorithm — O(log log n) for Uniform Data

Where developers are forged. · Structured learning · Free forever.
📍 Part of: Searching → Topic 6 of 8
Learn interpolation search — how it uses value distribution to probe smarter than binary search, achieving O(log log n) for uniformly distributed data.
⚙️ Intermediate — basic DSA knowledge assumed
In this tutorial, you'll learn:
  • Probe position: lo + (target - arr[lo]) * (hi - lo) / (arr[hi] - arr[lo]). Linear interpolation assumes uniform distribution between boundaries.
  • O(log log n) average for uniform data — each probe reduces search space multiplicatively by more than 2. O(n) worst case for non-uniform (exponential) distributions.
  • Real-world version: database index range scans use histogram statistics to estimate probe positions — this is interpolation search applied to B-tree pages.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
⚡ Quick Answer
Binary search always checks the middle element. But if you're looking for 'Z' in a phone book, you don't open to the middle — you go near the end. Interpolation search does exactly this: estimate where the target is based on its value relative to the range, like proportional placement on a ruler. For uniformly distributed data, this gives O(log log n) — dramatically faster than binary search's O(log n).

Interpolation search achieves O(log log n) on uniformly distributed data — compared to binary search's O(log n). For n=10^9 uniformly distributed integers, that is ~5 probes versus ~30 probes. For the right data, that is a 6x speedup in comparison count.

The catch: 'uniformly distributed' is rarely guaranteed in real data. Database query optimisers use similar interpolation logic — estimating where a value falls in a range based on column statistics and histograms. That is interpolation search applied to index structures. Understanding interpolation search means understanding how query planners estimate probe costs.

The Interpolation Formula

Instead of mid = (lo + hi) // 2, interpolation search estimates: pos = lo + ((target - arr[lo]) × (hi - lo)) / (arr[hi] - arr[lo])

This is linear interpolation — projecting where the target would fall if values were uniformly distributed between arr[lo] and arr[hi].

interpolation_search.py · PYTHON
12345678910111213141516171819
def interpolation_search(arr: list, target) -> int:
    lo, hi = 0, len(arr) - 1
    while lo <= hi and arr[lo] <= target <= arr[hi]:
        if lo == hi:
            return lo if arr[lo] == target else -1
        # Interpolation probe
        pos = lo + int((target - arr[lo]) * (hi - lo) / (arr[hi] - arr[lo]))
        if arr[pos] == target:
            return pos
        elif arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1

# Uniformly distributed data — very fast
arr = list(range(0, 1000, 10))  # [0, 10, 20, ..., 990]
print(interpolation_search(arr, 680))  # 68
print(interpolation_search(arr, 777))  # -1 (not a multiple of 10)
▶ Output
68
-1

When it Excels vs When it Fails

Excels: Uniformly distributed sorted arrays — phone books, sorted numeric IDs, timestamps at regular intervals. O(log log n) expected.

Fails badly: Non-uniform distributions — geometric sequences, or data clustered at one end. Worst case O(n). Example: arr = [1, 2, 4, 8, 16, ..., 2^n] — interpolation probes near the start every time for large targets.

Rule of thumb: Use interpolation when you know the data is roughly uniform. Use binary search otherwise.

Complexity

Average (uniform data): O(log log n) — the probe reduces search space multiplicatively Worst case: O(n) — non-uniform distribution can make it probe one element at a time Space: O(1)

For n=10^9 uniform elements: log log n ≈ 5 probes vs binary search's log n ≈ 30 probes.

🎯 Key Takeaways

  • Probe position: lo + (target - arr[lo]) * (hi - lo) / (arr[hi] - arr[lo]). Linear interpolation assumes uniform distribution between boundaries.
  • O(log log n) average for uniform data — each probe reduces search space multiplicatively by more than 2. O(n) worst case for non-uniform (exponential) distributions.
  • Real-world version: database index range scans use histogram statistics to estimate probe positions — this is interpolation search applied to B-tree pages.
  • Never use in production without validating data distribution. A geometric sequence makes it O(n) — worse than binary search.
  • Good interview discussion topic: demonstrates that asymptotic complexity depends on data distribution assumptions, not just algorithm structure.

Interview Questions on This Topic

  • QDerive the interpolation formula and explain what it assumes about the data.
  • QWhen does interpolation search degrade to O(n)?
  • QCompare interpolation search with binary search — when would you choose each?
  • QWhat is the expected number of probes for interpolation search on uniform data?

Frequently Asked Questions

Does interpolation search work on non-numeric data?

No — it requires a meaningful distance metric between values to compute the interpolation. It works for integers, floats, and dates, but not arbitrary comparable types.

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousJump Search AlgorithmNext →Exponential Search and Fibonacci Search
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged