Interpolation Search Algorithm — O(log log n) for Uniform Data
- Probe position: lo + (target - arr[lo]) * (hi - lo) / (arr[hi] - arr[lo]). Linear interpolation assumes uniform distribution between boundaries.
- O(log log n) average for uniform data — each probe reduces search space multiplicatively by more than 2. O(n) worst case for non-uniform (exponential) distributions.
- Real-world version: database index range scans use histogram statistics to estimate probe positions — this is interpolation search applied to B-tree pages.
Interpolation search achieves O(log log n) on uniformly distributed data — compared to binary search's O(log n). For n=10^9 uniformly distributed integers, that is ~5 probes versus ~30 probes. For the right data, that is a 6x speedup in comparison count.
The catch: 'uniformly distributed' is rarely guaranteed in real data. Database query optimisers use similar interpolation logic — estimating where a value falls in a range based on column statistics and histograms. That is interpolation search applied to index structures. Understanding interpolation search means understanding how query planners estimate probe costs.
The Interpolation Formula
Instead of mid = (lo + hi) // 2, interpolation search estimates: pos = lo + ((target - arr[lo]) × (hi - lo)) / (arr[hi] - arr[lo])
This is linear interpolation — projecting where the target would fall if values were uniformly distributed between arr[lo] and arr[hi].
def interpolation_search(arr: list, target) -> int: lo, hi = 0, len(arr) - 1 while lo <= hi and arr[lo] <= target <= arr[hi]: if lo == hi: return lo if arr[lo] == target else -1 # Interpolation probe pos = lo + int((target - arr[lo]) * (hi - lo) / (arr[hi] - arr[lo])) if arr[pos] == target: return pos elif arr[pos] < target: lo = pos + 1 else: hi = pos - 1 return -1 # Uniformly distributed data — very fast arr = list(range(0, 1000, 10)) # [0, 10, 20, ..., 990] print(interpolation_search(arr, 680)) # 68 print(interpolation_search(arr, 777)) # -1 (not a multiple of 10)
-1
When it Excels vs When it Fails
Excels: Uniformly distributed sorted arrays — phone books, sorted numeric IDs, timestamps at regular intervals. O(log log n) expected.
Fails badly: Non-uniform distributions — geometric sequences, or data clustered at one end. Worst case O(n). Example: arr = [1, 2, 4, 8, 16, ..., 2^n] — interpolation probes near the start every time for large targets.
Rule of thumb: Use interpolation when you know the data is roughly uniform. Use binary search otherwise.
Complexity
Average (uniform data): O(log log n) — the probe reduces search space multiplicatively Worst case: O(n) — non-uniform distribution can make it probe one element at a time Space: O(1)
For n=10^9 uniform elements: log log n ≈ 5 probes vs binary search's log n ≈ 30 probes.
🎯 Key Takeaways
- Probe position: lo + (target - arr[lo]) * (hi - lo) / (arr[hi] - arr[lo]). Linear interpolation assumes uniform distribution between boundaries.
- O(log log n) average for uniform data — each probe reduces search space multiplicatively by more than 2. O(n) worst case for non-uniform (exponential) distributions.
- Real-world version: database index range scans use histogram statistics to estimate probe positions — this is interpolation search applied to B-tree pages.
- Never use in production without validating data distribution. A geometric sequence makes it O(n) — worse than binary search.
- Good interview discussion topic: demonstrates that asymptotic complexity depends on data distribution assumptions, not just algorithm structure.
Interview Questions on This Topic
- QDerive the interpolation formula and explain what it assumes about the data.
- QWhen does interpolation search degrade to O(n)?
- QCompare interpolation search with binary search — when would you choose each?
- QWhat is the expected number of probes for interpolation search on uniform data?
Frequently Asked Questions
Does interpolation search work on non-numeric data?
No — it requires a meaningful distance metric between values to compute the interpolation. It works for integers, floats, and dates, but not arbitrary comparable types.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.