To design a proximity service, you index locations using a spatial data structure like a geohash or quadtree, then query for points within a bounding box or radius. The key is balancing write throughput and query latency at scale.
✦ Definition~90s read
What is Design a Proximity Service?
A proximity service finds nearby points of interest (POIs) within a given radius from a user's location. It's the engine behind 'find restaurants near me' or 'nearest gas station' features.
★
Imagine you're at a music festival and want to find the nearest food truck.
Plain-English First
Imagine you're at a music festival and want to find the nearest food truck. You don't ask every truck in the entire festival—you look at the map, see which trucks are in your section, then check distances. A proximity service does the same: it organizes locations into 'sections' (like geohash cells or quadtree nodes) so it only checks nearby ones, not the whole world.
Most proximity services fail because they treat it as a simple distance calculation problem. They slap a Haversine formula on every POI and wonder why queries take seconds at 10 million locations. The real challenge isn't math—it's indexing. Without a spatial index, you're scanning the entire dataset for every request. That doesn't scale. By the end of this article, you'll know how to build a proximity service that handles millions of POIs with sub-100ms queries, using geohashes and quadtrees, and you'll learn the production gotchas that burn teams at 3AM.
Why Naive Distance Calculations Fail at Scale
The first thing every junior does is compute Haversine distance between the user and every POI. That's O(n) per query. At 10 million POIs and 1000 queries/second, you're doing 10 billion distance calculations per second. Even with vectorized math, that's a CPU meltdown. The fix: pre-filter with a spatial index. You only compute exact distances on a small candidate set (usually <100 POIs). Without this, your service dies under load.
NaiveProximity.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// io.thecodeforge — SystemDesign tutorial
// DON'T DOTHIS — naive O(n) query
function findNearbyNaive(userLat, userLng, radiusKm, allPois) {
let results = [];
for (let poi of allPois) {
let dist = haversine(userLat, userLng, poi.lat, poi.lng);
if (dist <= radiusKm) {
results.push(poi);
}
}
return results;
}
// For 10M POIs, this takes ~500ms per query on a single core.
// At1000QPS, you need 500 cores. Not happening.
Output
500ms per query on 10M POIs (single core).
Production Trap:
Haversine is CPU-heavy (trig functions). If you must compute it, pre-calculate a bounding box first to filter out obviously far points. But really, use a spatial index.
thecodeforge.io
Proximity Service Design: Geo-Hashing & Quadtrees
Design Proximity Service
Geohashing: The Simple Spatial Index That Works
Geohashing divides the world into a grid of cells, each with a string ID. Nearby points share a common prefix. To find POIs near a user, you compute the geohash of the user's location at a precision that gives cells of ~1km, then query all POIs with that prefix. This is a simple string prefix match—fast on any database with a B-tree index. The trade-off: edge cases near cell boundaries. A point just outside the user's cell but still within radius will be missed unless you also query neighboring cells.
GeohashQuery.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// io.thecodeforge — SystemDesign tutorial
// Geohash-based proximity query (pseudo-SQL)
// Assume table 'pois' has columns: id, lat, lng, geohash (varchar, indexed)
-- Step1: Compute user's geohash at precision 6 (~1.2km cells)
-- Step2: QueryPOIs with matching prefix
SELECT id, lat, lng, name
FROM pois
WHERE geohash LIKE'u4pruy%' -- prefix for user's cell
OR geohash LIKE'u4pruz%' -- neighbor cells (8 neighbors)
OR geohash LIKE'u4prv0%'
-- ... include all 9 cells
-- Step3: Filter by exact distance (Haversine on small set)
-- This step is optional if cell size is small enough.
-- Index: CREATEINDEX idx_geohash ONpois(geohash);
-- This is a B-tree index, so prefix match is O(log n).
Output
Query returns ~100 candidates from 10M POIs in <10ms.
Senior Shortcut:
Use geohash precision that matches your radius. For 1km radius, precision 6 (cell ~1.2km). For 10km, precision 5 (~4.8km). Always query the 9-cell grid (center + 8 neighbors) to avoid boundary misses.
thecodeforge.io
Geohash Query Pipeline
Design Proximity Service
Quadtrees: When You Need Dynamic Resolution
Geohashes have fixed grid size. If your POI density varies wildly (downtown vs rural), you waste memory in sparse areas and lose precision in dense areas. Quadtrees adapt: they recursively subdivide a region into four quadrants until each leaf contains at most a threshold number of POIs. Queries traverse the tree, pruning branches that fall outside the search radius. This gives O(log n) query time and memory proportional to POI density. The downside: writes can be expensive if the tree needs rebalancing.
QuadtreeProximity.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
// io.thecodeforge — SystemDesign tutorial
classQuadtree {
constructor(boundary, capacity) {
this.boundary = boundary; // {x, y, w, h}
this.capacity = capacity; // max POIs per node before split
this.points = [];
this.children = null;
}
insert(point) {
if (!this.boundary.contains(point)) returnfalse;
if (this.children === null && this.points.length < this.capacity) {
this.points.push(point);
returntrue;
}
if (this.children === null) this.subdivide();
for (let child of this.children) {
if (child.insert(point)) returntrue;
}
returnfalse; // shouldn't happen
}
query(range, found) {
if (!this.boundary.intersects(range)) return;
for (let p of this.points) {
if (range.contains(p)) found.push(p);
}
if (this.children !== null) {
for (let child of this.children) child.query(range, found);
}
}
subdivide() {
let {x, y, w, h} = this.boundary;
let halfW = w / 2, halfH = h / 2;
this.children = [
newQuadtree({x, y, w: halfW, h: halfH}, this.capacity),
newQuadtree({x: x+halfW, y, w: halfW, h: halfH}, this.capacity),
newQuadtree({x, y: y+halfH, w: halfW, h: halfH}, this.capacity),
newQuadtree({x: x+halfW, y: y+halfH, w: halfW, h: halfH}, this.capacity)
];
// Redistribute points to children
for (let p of this.points) {
for (let child of this.children) child.insert(p);
}
this.points = []; // clear parent
}
}
// Usage:
let tree = newQuadtree({x: -180, y: -90, w: 360, h: 180}, 10);
// Insert 10M POIs...
// Query: find points within 1km of (lat, lng)
let range = {x: userLng-0.01, y: userLat-0.01, w: 0.02, h: 0.02}; // approx 1km
let nearby = [];
tree.query(range, nearby);
Output
Query returns ~50 candidates from 10M POIs in <5ms.
Never Do This:
Don't rebuild the entire quadtree on every write. Use a balanced tree or batch updates. Otherwise, a single POI insertion triggers O(n) rebalancing, killing write throughput.
thecodeforge.io
Geohash vs Quadtree Tradeoffs
Design Proximity Service
Production Gotchas: Thread Pools, Memory, and Edge Cases
I've seen a proximity service bring down a payments system because the geohash query thread pool was exhausted at 3AM. The service used a single thread pool for both reads and writes. A bulk write operation (importing 100K POIs) blocked all read queries, causing cascading timeouts. Fix: separate read and write thread pools. Also, watch out for geohash precision at high latitudes—cells get distorted near poles. Use a library that handles this (e.g., geohash-int). Memory: quadtrees with full POI objects blow up. Store only IDs and coordinates in the tree; fetch details from cache or DB.
ThreadPoolSeparation.systemdesignSYSTEMDESIGN
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// io.thecodeforge — SystemDesign tutorial
// JavaExecutorService example
ExecutorService readPool = Executors.newFixedThreadPool(20); // read queries
ExecutorService writePool = Executors.newFixedThreadPool(5); // POI inserts
// Submit read tasks to readPool, write tasks to writePool
// This prevents bulk writes from starving reads.
// Also set a timeout on read tasks:
Future<List<POI>> future = readPool.submit(queryTask);
try {
List<POI> results = future.get(100, TimeUnit.MILLISECONDS);
} catch (TimeoutException e) {
// Return empty or fallback
future.cancel(true);
}
Output
Read queries never blocked by writes. Timeouts handled gracefully.
Interview Gold:
Q: 'How does your proximity service handle concurrent writes and reads?' A: 'Separate thread pools for reads and writes. Read pool has more threads. Write pool has a bounded queue to backpressure bulk imports.'
When Not to Use a Proximity Service — Simpler Alternatives
If your dataset is small (<10K POIs) and query rate is low (<10 QPS), just compute Haversine on the fly. No index needed. If you only need 'closest one' (not radius), use a nearest-neighbor search with a k-d tree. If your POIs are static (e.g., store locations), precompute distances and cache them. Don't over-engineer. A proximity service with geohashes and quadtrees is overkill for a local pizza shop's website.
Senior Shortcut:
Rule of thumb: if your dataset fits in memory (<1M POIs) and you can tolerate 100ms queries, use a simple in-memory list with a bounding box pre-filter. Only reach for geohash/quadtree when you exceed 1M POIs or need <10ms queries.
● Production incidentPOST-MORTEMseverity: high
The 4GB Container That Kept Dying
Symptom
A proximity service for a food delivery app started returning 500s every 15 minutes. The container had 4GB RAM and was OOM-killed.
Assumption
The team assumed a memory leak in the geohash library. They restarted the container and bumped RAM to 8GB.
Root cause
The quadtree implementation stored all POI objects in memory, including full address strings and metadata. With 2 million POIs, each ~2KB, the tree consumed 4GB. The OOM killer struck during peak traffic when new POIs were inserted, causing tree rebalancing that temporarily doubled memory usage.
Fix
Switched to a disk-backed R-tree using SQLite's R*Tree module. Reduced memory footprint to 200MB. Also added a memory limit on the quadtree's node cache.
Key lesson
Never store full objects in spatial indexes.
Store only IDs and coordinates; fetch details on demand from a cache or DB.
Production debug guideSystematic recovery paths for the failure modes engineers actually hit.3 entries
Symptom · 01
Queries return incomplete results (missing POIs near boundaries)
→
Fix
1. Check if you query neighbor geohash cells. 2. Verify geohash precision matches radius. 3. For quadtree, check if the query range correctly covers the radius (add margin).
Symptom · 02
High memory usage in quadtree
→
Fix
1. Check if full POI objects are stored in tree nodes. 2. Reduce to ID+coords only. 3. Set a max node capacity to prevent deep trees. 4. Consider disk-backed R-tree.
Symptom · 03
Write latency spikes during POI insertion
→
Fix
1. Check if quadtree rebalances on every insert. 2. Batch inserts and rebuild tree periodically. 3. Use a write-ahead log and async index update.
★ Proximity Service Triage Cheat SheetFirst-response commands for when things go wrong — copy-paste ready.
Query returns no results even though POIs exist−
Immediate action
Check geohash precision and neighbor cells
Commands
SELECT geohash FROM pois LIMIT 5;
SELECT COUNT(*) FROM pois WHERE geohash LIKE 'u4pruy%';
Fix now
Increase geohash precision or add neighbor cell query.
OOM kill on container+
Immediate action
Check memory usage of quadtree
Commands
top -p $(pgrep -f proximity)
jmap -histo <pid> | head -20
Fix now
Reduce POI object size in tree; use disk-backed index.
High CPU usage on queries+
Immediate action
Check if Haversine is computed on all POIs
Commands
perf top -p <pid>
strace -c -p <pid>
Fix now
Add spatial index (geohash/quadtree) to pre-filter.
Thread pool exhausted+
Immediate action
Check thread pool queue size
Commands
jstack <pid> | grep 'pool'
curl /actuator/metrics/jvm.threads.live
Fix now
Separate read/write pools; add backpressure on writes.
Feature / Aspect
Geohash
Quadtree
Query time
O(log n) with B-tree index
O(log n) average, O(n) worst-case if unbalanced
Write time
O(log n) (insert geohash string)
O(log n) average, O(n) worst-case on split
Memory overhead
Low (just string index)
Higher (tree nodes, but adaptive)
Boundary precision
Fixed grid size
Adaptive to density
Implementation complexity
Simple (string prefix match)
Moderate (recursive tree)
Best for
Uniform density, simple setup
Varying density, high precision needs
Key takeaways
1
Never compute Haversine on all POIs—use a spatial index to pre-filter.
2
Geohash is simple and works with any DB that has a B-tree index; always query neighbor cells.
3
Quadtrees adapt to density but can blow memory if you store full objects; store only IDs.
4
Separate read and write thread pools to avoid cascading failures during bulk imports.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01SENIOR
How does a quadtree handle concurrent inserts and queries in a multi-thr...
Q02SENIOR
When would you choose geohash over quadtree in a production system?
Q03SENIOR
What happens when you query a geohash at the International Date Line? Ho...
Q04JUNIOR
What is a spatial index and why is it needed for proximity services?
Q05SENIOR
Your proximity service returns 500s under load. You suspect the quadtree...
Q06SENIOR
Design a proximity service for a global food delivery app with 10M resta...
Q01 of 06SENIOR
How does a quadtree handle concurrent inserts and queries in a multi-threaded environment?
ANSWER
Use a read-write lock. Queries acquire read lock (shared), inserts acquire write lock (exclusive). For higher throughput, use a lock-free quadtree or partition the tree by region (e.g., separate trees for each continent).
Q02 of 06SENIOR
When would you choose geohash over quadtree in a production system?
ANSWER
Choose geohash when POI density is roughly uniform and you need simple, scalable indexing on a relational DB. Choose quadtree when density varies widely (e.g., city vs rural) and you need adaptive precision to keep query times consistent.
Q03 of 06SENIOR
What happens when you query a geohash at the International Date Line? How do you mitigate?
ANSWER
Geohash cells cross the date line, but prefix matching may fail because longitude wraps. Mitigation: handle longitude wrapping in application code—if the user's longitude is near ±180°, also query cells with wrapped longitude (e.g., add 360°).
Q04 of 06JUNIOR
What is a spatial index and why is it needed for proximity services?
ANSWER
A spatial index organizes data by location so you can quickly find points within a region without scanning all records. Without it, every query is O(n).
Q05 of 06SENIOR
Your proximity service returns 500s under load. You suspect the quadtree is the bottleneck. How do you diagnose?
ANSWER
Check CPU and memory. If CPU is high, profile the query method—likely the tree traversal is deep. If memory is high, check if full objects are stored. Use jstack to see if threads are stuck on tree locks. Mitigation: reduce node capacity, use read-write locks, or partition the tree.
Q06 of 06SENIOR
Design a proximity service for a global food delivery app with 10M restaurants and 100K queries/second.
ANSWER
Use geohash for simplicity and horizontal scaling. Partition by geohash prefix (e.g., first 3 chars) across shards. Each shard has a B-tree index on geohash. Cache popular queries (e.g., downtown areas) in Redis. Use separate read/write thread pools. Monitor geohash precision to avoid boundary misses.
01
How does a quadtree handle concurrent inserts and queries in a multi-threaded environment?
SENIOR
02
When would you choose geohash over quadtree in a production system?
SENIOR
03
What happens when you query a geohash at the International Date Line? How do you mitigate?
SENIOR
04
What is a spatial index and why is it needed for proximity services?
JUNIOR
05
Your proximity service returns 500s under load. You suspect the quadtree is the bottleneck. How do you diagnose?
SENIOR
06
Design a proximity service for a global food delivery app with 10M restaurants and 100K queries/second.
SENIOR
FAQ · 4 QUESTIONS
Frequently Asked Questions
01
What is the best way to find nearby points in a database?
Use a spatial index. Geohash is the easiest to implement on any SQL database—just index a geohash column and query with prefix matching. For more dynamic density, use a quadtree or R-tree.
Was this helpful?
02
What's the difference between geohash and quadtree?
Geohash uses a fixed grid; quadtree adapts to data density. Geohash is simpler and works with standard B-tree indexes. Quadtree is more memory-efficient for sparse data and more precise for dense clusters.
Was this helpful?
03
How do I handle geohash boundary issues?
Always query the 9-cell grid: the user's cell plus its 8 neighbors. This ensures you don't miss POIs just outside the cell but within the search radius.
Was this helpful?
04
What happens to geohash precision at high latitudes?
Geohash cells distort near the poles—they become narrower in longitude. Use a library that handles this (e.g., geohash-int) or switch to a quadtree for polar regions.