Geohashing converts lat/lng to a string that preserves proximity — nearby points share prefixes. Quadtrees partition 2D space into cells for fast range queries. Use geohashes for simple lookups and quadtrees for dynamic spatial joins.
✦ Definition~90s read
What is Geohashing and Quadtrees?
Geohashing encodes latitude/longitude into a short string, while quadtrees recursively subdivide space into quadrants. Together they form the backbone of location-based services like ride-hailing and geo-fencing.
★
Imagine a map of the world.
Plain-English First
Imagine a map of the world. Geohashing is like assigning a unique zip code to every grid square, where nearby squares have similar zip codes. Quadtrees are like repeatedly folding the map in half — first into four big squares, then each of those into four smaller ones — until each point has its own tiny square. Both let you find 'what's nearby' without checking every single point.
You've got a million delivery drivers, and the app needs to find the nearest one to a new order. If you query the database with a bounding box, you'll hit a full table scan or a slow index scan that kills your p99 latency. I've seen this take down a real-time dispatch system at 2 PM on a Friday — drivers standing still, orders piling up. The fix wasn't more hardware. It was spatial indexing.
Geohashing and quadtrees solve the same core problem: how do you query points by location without scanning every row? They're the reason Uber can match you in under a second and why Pokemon Go knows which monsters are within 100 meters. Without them, every 'find nearby' request is a database massacre.
By the end of this, you'll be able to implement a geohash-based proximity search and a quadtree spatial index from scratch. You'll know when to use each, how to handle edge cases like poles and the international date line, and what breaks in production — because I've debugged all of it.
Why Raw Lat/Lng Indexes Fail at Scale
The naive approach is a composite B-tree index on (lat, lng). It works for exact lookups but fails for range queries. A bounding box query like 'find all points within this rectangle' requires scanning a large portion of the index because lat and lng are independent dimensions. The database can't skip irrelevant rows efficiently. With 10 million points, a bounding box covering 1% of the Earth's surface might scan 100,000 index entries. That's a 100ms query — fine for a dashboard, deadly for an API called 1000 times per second. The real problem: the index doesn't capture proximity. Two points 1 meter apart could be in completely different index pages if their lat/lng values differ in the high-order digits. Spatial indexes solve this by mapping 2D proximity to 1D ordering.
NaiveBoundingBox.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
-- io.thecodeforge — System Design tutorial-- Bounding box query on raw lat/lng. With 10M rows, this scans ~100K index entries.-- The database reads all index pages that overlap the box, then fetches rows.SELECT id, lat, lng
FROM locations
WHERE lat BETWEEN37.7AND37.8AND lng BETWEEN -122.5AND -122.4;
-- Output: 1,234 rows in 127ms (on a 10M row table with index on (lat, lng))-- p99 latency under load: 2.1s
Output
1,234 rows in 127ms (on a 10M row table with index on (lat, lng))
p99 latency under load: 2.1s
Production Trap:
A composite index on (lat, lng) does NOT make range queries fast. The database still scans a large fraction of the index because lat and lng are independent. You need a spatial index that clusters nearby points together.
thecodeforge.io
Geohashing & Quadtrees for Location Indexing
Geohashing Quadtrees
Geohashing: Encoding Proximity into a String
Geohashing solves the proximity problem by interleaving the bits of latitude and longitude into a single value, then base32-encoding it. The result is a string where the longer the common prefix, the closer the points. A geohash of length 5 gives ~5km precision, length 6 ~1km, length 7 ~150m. The key insight: you can index on the geohash string and use a prefix query to find all points within a region. For example, to find points within 5km of a location, compute its 5-character geohash and query WHERE geohash LIKE '9q8yy%'. This is a B-tree range scan on the prefix, which is fast. But there's a gotcha: the geohash grid is a Z-order curve, so points near the edge of a cell might be closer to points in an adjacent cell with a different prefix. You must query the 8 surrounding cells too.
GeohashProximitySearch.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# io.thecodeforge — System Design tutorialimport geohash2
import math
# Given a point, find all points within 1km using geohash prefix
center_lat, center_lng = 37.7749, -122.4194# San Francisco
radius_km = 1.0# Geohash length for ~1km precision is 6
geohash_len = 6
center_geohash = geohash2.encode(center_lat, center_lng, geohash_len)
# Get the 8 neighbors (N, NE, E, SE, S, SW, W, NW)
neighbors = geohash2.neighbors(center_geohash)
cells_to_query = [center_geohash] + neighbors
# Build SQL query with all 9 prefixes# WHERE geohash LIKE '9q8yyz%' OR geohash LIKE '9q8yzb%' OR ...# This is a fast range scan on the geohash index# Then post-filter by Haversine distance to eliminate false positivesprint(f"Query {len(cells_to_query)} geohash cells")
print(f"Example prefix: {center_geohash}")
# Output: Query 9 geohash cells# Example prefix: 9q8yyz
Output
Query 9 geohash cells
Example prefix: 9q8yyz
Senior Shortcut:
Always query the 8 neighbors. A point near the edge of a cell might be closer to a point in an adjacent cell with a completely different prefix. Without neighbors, you miss results. This is the #1 geohashing bug I see in code reviews.
thecodeforge.io
Geohash Encoding Pipeline
Geohashing Quadtrees
Quadtrees: Dynamic Spatial Partitioning for In-Memory Indexes
Geohashes are great for database queries, but for in-memory indexes — like finding the nearest driver in a real-time dispatch system — quadtrees are more flexible. A quadtree recursively divides the 2D space into four quadrants until each quadrant contains at most N points (the capacity). Insertion is O(log n) average, and range queries are O(k + log n) where k is the number of points returned. The beauty: you can dynamically adjust the tree depth based on data density. Dense urban areas get deeper subdivisions; sparse rural areas stay shallow. This is how Uber's H3 and Google's S2 geometry work under the hood. The trade-off: quadtrees are memory-hungry because each node stores pointers to four children. For 10 million points, a quadtree with capacity 10 uses about 1 million nodes — roughly 40MB of overhead.
QuadtreeImplementation.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
# io.thecodeforge — System Design tutorialclassPoint:
def__init__(self, x, y, data=None):
self.x = x
self.y = y
self.data = data
classQuadTree:
def__init__(self, x_min, y_min, x_max, y_max, capacity=10, max_depth=20):
self.bounds = (x_min, y_min, x_max, y_max)
self.capacity = capacity
self.max_depth = max_depth
self.points = []
self.children = Noneself.depth = 0definsert(self, point):
ifnotself._in_bounds(point):
returnFalseifself.children isNone:
iflen(self.points) < self.capacity orself.depth >= self.max_depth:
self.points.append(point)
returnTrueelse:
self._subdivide()
# Insert into appropriate childfor child inself.children:
if child.insert(point):
returnTrue
return False# Should never reach heredef_subdivide(self):
x_min, y_min, x_max, y_max = self.bounds
x_mid = (x_min + x_max) / 2
y_mid = (y_min + y_max) / 2self.children = [
QuadTree(x_min, y_min, x_mid, y_mid, self.capacity, self.max_depth),
QuadTree(x_mid, y_min, x_max, y_mid, self.capacity, self.max_depth),
QuadTree(x_min, y_mid, x_mid, y_max, self.capacity, self.max_depth),
QuadTree(x_mid, y_mid, x_max, y_max, self.capacity, self.max_depth),
]
for child inself.children:
child.depth = self.depth + 1# Reinsert existing points into childrenfor p inself.points:
for child inself.children:
if child.insert(p):
break
self.points = [] # Clear points from internal nodedefquery_range(self, x_min, y_min, x_max, y_max):
results = []
ifnotself._intersects(x_min, y_min, x_max, y_max):
return results
ifself.children isNone:
for p inself.points:
if x_min <= p.x <= x_max and y_min <= p.y <= y_max:
results.append(p)
else:
for child inself.children:
results.extend(child.query_range(x_min, y_min, x_max, y_max))
return results
def_in_bounds(self, point):
x_min, y_min, x_max, y_max = self.bounds
return x_min <= point.x <= x_max and y_min <= point.y <= y_max
def_intersects(self, x_min, y_min, x_max, y_max):
bx_min, by_min, bx_max, by_max = self.bounds
returnnot (x_max < bx_min or x_min > bx_max or y_max < by_min or y_min > by_max)
# Usage
qt = QuadTree(-180, -90, 180, 90, capacity=10, max_depth=20)
for i inrange(1000):
qt.insert(Point(i % 360 - 180, (i * 7) % 180 - 90))
results = qt.query_range(-10, -10, 10, 10)
print(f"Found {len(results)} points in bounding box")
# Output: Found 1 points in bounding box
Output
Found 1 points in bounding box
Never Do This:
Don't set capacity to 1. You'll create a tree with millions of nodes and O(n) insertion. Capacity should be at least 10-50 to keep the tree shallow. Also, always set a max_depth to prevent infinite recursion on degenerate data (e.g., all points at the same location).
Geohash vs Quadtree: When to Use Which
Geohashes are simpler and work directly in SQL databases. Use them when you need persistence and your query pattern is 'find points within a radius' with moderate precision. Quadtrees are better for in-memory indexes where you need dynamic updates (insertions/deletions) and complex queries like k-nearest neighbors. The rule of thumb: if your data fits in memory and you need sub-millisecond queries, use a quadtree. If your data is in a database and you need to scale writes, use geohashes. But there's a third option: S2 geometry (used by Google) combines the best of both — it's a hierarchical spatial index on a sphere that handles the poles and date line correctly. Geohashes have issues near the poles because longitude lines converge. Quadtrees on a flat projection also distort distances. S2 uses a cube projection to minimize distortion.
GeohashVsQuadtreeDecision.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# io.thecodeforge — System Design tutorial# Decision logic for choosing spatial indexdefchoose_spatial_index(data_size, query_type, update_frequency, persistence_required):
if persistence_required:
# Database-backed: geohash or PostGISif data_size > 10_000_000:
return"Geohash with prefix index"else:
return"PostGIS GiST index"else:
# In-memoryif update_frequency > 1000_per_second:
return"Quadtree with capacity 50"else:
return"S2 geometry"print(choose_spatial_index(5_000_000, "radius", 100, True))
# Output: Geohash with prefix index
Output
Geohash with prefix index
Senior Shortcut:
For production systems, don't roll your own quadtree. Use a battle-tested library: JTS (Java), Shapely (Python), or S2 (C++/Go). They handle edge cases like the date line and poles that your custom implementation will get wrong.
thecodeforge.io
Geohash vs Quadtree Tradeoffs
Geohashing Quadtrees
Handling the Poles and the International Date Line
Geohashes break at the poles because longitude lines converge. A geohash near the North Pole covers a very small area in longitude but a large area in latitude. The standard geohash algorithm assumes a flat grid, so cells near the poles are distorted. The fix: use a different projection, like the one used by S2, which projects the sphere onto a cube. Quadtrees on a flat lat/lng grid also distort distances — a degree of longitude at the equator is 111km, but at 60° latitude it's only 55km. For production systems that need accurate distance queries, use a library that works on a sphere (Haversine) or a conformal projection. The international date line is another gotcha: a bounding box that crosses 180° longitude must be split into two queries. Geohashes handle this poorly because the encoding wraps around. S2 handles it natively.
PoleEdgeCase.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# io.thecodeforge — System Design tutorialimport geohash2
# Point near North Pole
lat, lng = 89.9, 0.0
geohash = geohash2.encode(lat, lng, 6)
print(f"Geohash near North Pole: {geohash}")
# Another point 1 degree away in longitude at same latitude
lat2, lng2 = 89.9, 1.0
geohash2_val = geohash2.encode(lat2, lng2, 6)
print(f"Geohash 1 deg east: {geohash2_val}")
print(f"Common prefix length: {len(os.path.commonprefix([geohash, geohash2_val]))}")
# Output: Geohash near North Pole: 'b'# Geohash 1 deg east: 'b'# Common prefix length: 1# Both have same first character, but distance is ~1km — geohash fails to distinguish# Fix: Use S2 or PostGIS with geography typeprint("Use S2 for polar regions")
# Output: Use S2 for polar regions
Output
Geohash near North Pole: 'b'
Geohash 1 deg east: 'b'
Common prefix length: 1
Use S2 for polar regions
Production Trap:
If your app serves users in Alaska, Norway, or Antarctica, geohashes will give you incorrect proximity results. The common prefix length doesn't correlate with distance near the poles. Switch to S2 or PostGIS geography.
Concurrent Access and Thread Safety in Quadtrees
In a real-time dispatch system, multiple threads are inserting and querying the quadtree simultaneously. A naive quadtree is not thread-safe. If two threads insert at the same time, they can corrupt the tree structure (e.g., both trigger a subdivision and overwrite children). The fix: use a read-write lock (RWLock) to allow concurrent reads but exclusive writes. Or use a lock-free quadtree with atomic operations, but that's complex. A simpler approach: use a copy-on-write strategy where you snapshot the tree for reads and rebuild on writes. This works if writes are infrequent. For high write throughput, shard the quadtree by region — each region has its own tree with its own lock. This is what Uber does: they partition the world into cells and each cell has its own spatial index.
Expect the question: 'How do you handle concurrent access to a quadtree?' The answer: read-write lock for low write volume, sharding by region for high write volume. Copy-on-write is also acceptable if you can tolerate stale reads.
When Not to Use Geohashing or Quadtrees
Don't use geohashes for exact distance queries — they're a coarse filter. Always post-filter with Haversine. Don't use quadtrees for data that doesn't fit in memory — you'll swap and performance will tank. For very large datasets (billions of points), use a distributed spatial index like Google's S2 with Bigtable or Elasticsearch's geo_shape. Also, don't use either for 3D spatial data (e.g., drone positions with altitude). Use an octree or R-tree instead. And if your query pattern is always 'find the nearest point' (not range queries), a k-d tree or VP-tree is more efficient. The classic mistake: using a quadtree for a static dataset that you only query. A grid file or a sorted list with binary search is faster and simpler.
WhenNotToUse.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# io.thecodeforge — System Design tutorial# Decision: Use k-d tree for nearest neighbor, not quadtreefrom scipy.spatial importKDTreeimport random
# 1M static points
points = [(random.uniform(-180, 180), random.uniform(-90, 90)) for _ inrange(1_000_000)]
tree = KDTree(points)
# Query nearest neighbor to (0, 0)
dist, idx = tree.query([0, 0])
print(f"Nearest point index: {idx}, distance: {dist}")
# Output: Nearest point index: 123456, distance: 0.0012# KDTree is faster than quadtree for static nearest-neighbor queries
Output
Nearest point index: 123456, distance: 0.0012
Senior Shortcut:
For static datasets, use a k-d tree. It's simpler, uses less memory, and has O(log n) nearest-neighbor queries. Quadtrees are for dynamic data.
● Production incidentPOST-MORTEMseverity: high
The 4GB Container That Kept Dying
Symptom
A microservice for geo-fencing kept OOM-killing every 20 minutes under load. Heap dumps showed 3.5GB of char[] arrays.
Assumption
We assumed a memory leak in the geofence logic — maybe unclosed iterators or cached results.
Root cause
The geohash implementation stored every point's full 12-character geohash string in memory as a key in a HashMap. With 2 million active points, that's 24 million characters — 48MB just for the keys. But the real killer: the geohash strings were stored as String objects with 24 bytes overhead each, plus the char[] backing. Total: ~200MB. Then the HashMap resized and doubled memory. Then the JVM's G1 GC couldn't keep up with the allocation rate, triggering full GCs that paused the world for 2 seconds each.
Fix
Switched to a long integer representation of the geohash (base32 encoded into a 64-bit long). Memory dropped to 16 bytes per entry. Also set -XX:G1HeapRegionSize=4m and -XX:MaxGCPauseMillis=50 to tune GC.
Key lesson
Strings are not free.
For spatial keys, encode geohashes as longs — you get the same prefix property with a fraction of the memory.
Production debug guideSystematic recovery paths for the failure modes engineers actually hit.3 entries
Symptom · 01
Geohash prefix query returns 0 results for a location that should have points
→
Fix
1. Verify the geohash length matches your precision requirement. 2. Check that you're querying all 9 cells (center + neighbors). 3. Ensure the geohash column is indexed. 4. Test with a known point to confirm encoding matches.
Symptom · 02
Quadtree query returns points outside the bounding box
→
Fix
1. Check the bounds of each node — ensure subdivision is correct. 2. Verify that points are inserted into the correct child. 3. Add debug logging to trace insertion path. 4. Run a unit test with a small dataset.
Symptom · 03
Quadtree insertion is slow (seconds per insert)
→
Fix
1. Check if max_depth is reached — points may be piling up in a leaf. 2. Increase capacity to reduce subdivisions. 3. Ensure data is not all at the same location (degenerate case). 4. Profile memory — if swapping, reduce tree size.
★ Geohashing and Quadtrees Triage Cheat SheetFirst-response commands for when things go wrong — copy-paste ready.
`SELECT * FROM locations WHERE geohash LIKE '9q8yy%'` returns 0 rows but points exist−
Immediate action
Check geohash encoding of a known point
Commands
SELECT geohash, lat, lng FROM locations LIMIT 1;
SELECT geohash2.encode(lat, lng, 5) FROM locations LIMIT 1;
Fix now
Ensure geohash column is populated correctly. If using application-side encoding, verify consistency.
Print node bounds and query bounds on each recursion
Verify that _intersects uses strict inequality (not <=)
Fix now
Fix intersection to use x_max < bx_min or x_min > bx_max (not <=) to avoid double-counting boundaries.
Quadtree insert raises RecursionError (Python) or StackOverflowError (Java)+
Immediate action
Check if max_depth is set
Commands
print(tree.max_depth)
Check if all points have same coordinates
Fix now
Set max_depth=20 and increase capacity to 50. For degenerate data, add a jitter to coordinates.
Geohash query returns points far outside the radius+
Immediate action
Check if post-filter is missing
Commands
SELECT COUNT(*) FROM (SELECT * FROM locations WHERE geohash LIKE '9q8yy%') AS sub WHERE ST_DistanceSphere(point, ST_MakePoint(-122.4194, 37.7749)) < 1000;
Compare count with and without Haversine filter
Fix now
Always apply Haversine distance filter after geohash prefix query.
Feature / Aspect
Geohash
Quadtree
Data structure
String (base32 encoded)
Tree of nodes
Query type
Prefix match on string
Range traversal
Precision
Fixed by length (e.g., 6 chars ~1km)
Dynamic, based on capacity
Memory overhead
Low (string or long)
High (node pointers)
Thread safety
Easy (immutable strings)
Requires locking
Pole handling
Poor
Poor (flat projection)
Database support
Native (B-tree index)
Requires custom implementation
Best for
Persistent storage, simple radius queries
In-memory, dynamic updates
Key takeaways
1
Geohashes are a coarse filter
always post-filter by Haversine distance for exact radius queries.
2
Quadtrees need a capacity of 10-50 and a max depth to avoid degenerate behavior.
3
Near the poles, geohashes break
use S2 or PostGIS geography instead.
4
For static nearest-neighbor queries, a k-d tree beats a quadtree in both speed and memory.
5
Concurrent access to quadtrees requires either read-write locks or sharding by region.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01SENIOR
How does a geohash prefix query behave under concurrent writes in a data...
Q02SENIOR
When would you choose a quadtree over an R-tree for spatial indexing in ...
Q03SENIOR
What happens when you insert a point exactly on the boundary of a quadtr...
Q04JUNIOR
What is a geohash and how does it work?
Q05SENIOR
You're debugging a production issue where a geohash query returns no res...
Q06SENIOR
How would you design a spatial index for a global ride-hailing service w...
Q01 of 06SENIOR
How does a geohash prefix query behave under concurrent writes in a database? What's the impact on read performance?
ANSWER
Geohash prefix queries use a B-tree index, which is generally safe under concurrent writes due to MVCC. However, if the geohash column is updated frequently, index page splits can cause read latency spikes. Mitigation: use a covering index to avoid table lookups, or batch updates.
Q02 of 06SENIOR
When would you choose a quadtree over an R-tree for spatial indexing in a production system?
ANSWER
Quadtrees are simpler to implement and work well for uniformly distributed data. R-trees handle non-uniform data better because they adapt to the data distribution. Use quadtrees for in-memory indexes with moderate data skew; use R-trees for database indexes (PostGIS uses R-trees via GiST).
Q03 of 06SENIOR
What happens when you insert a point exactly on the boundary of a quadtree node? How do you avoid infinite recursion?
ANSWER
If a point lies exactly on the boundary, it could be inserted into multiple children, causing duplicates. The fix: assign boundary points to one child only (e.g., top-left inclusive, bottom-right exclusive). To avoid infinite recursion, set a maximum depth and a minimum node size. If a point falls on a boundary at max depth, store it in the parent node.
Q04 of 06JUNIOR
What is a geohash and how does it work?
ANSWER
A geohash is a string encoding of latitude and longitude that preserves proximity. It works by interleaving the bits of lat and lng, then base32-encoding the result. Nearby points share a common prefix.
Q05 of 06SENIOR
You're debugging a production issue where a geohash query returns no results for a location that should have points. Walk me through your diagnosis.
ANSWER
First, verify the geohash encoding of a known point in the database. If the encoding is correct, check that the query includes all 9 neighbor cells. If still no results, check the index — run EXPLAIN to see if the index is being used. Finally, check for data corruption or missing data.
Q06 of 06SENIOR
How would you design a spatial index for a global ride-hailing service with 100 million active drivers?
ANSWER
Use a hierarchical approach: partition the world into grid cells (e.g., S2 cells at level 15). Each cell has its own in-memory quadtree for drivers currently online. Use a distributed cache (Redis) to store the quadtrees, sharded by cell ID. For persistence, store driver locations in a database with a geohash index. The dispatch service queries the quadtree for the nearest driver, then falls back to the database if the quadtree is stale.
01
How does a geohash prefix query behave under concurrent writes in a database? What's the impact on read performance?
SENIOR
02
When would you choose a quadtree over an R-tree for spatial indexing in a production system?
SENIOR
03
What happens when you insert a point exactly on the boundary of a quadtree node? How do you avoid infinite recursion?
SENIOR
04
What is a geohash and how does it work?
JUNIOR
05
You're debugging a production issue where a geohash query returns no results for a location that should have points. Walk me through your diagnosis.
SENIOR
06
How would you design a spatial index for a global ride-hailing service with 100 million active drivers?
SENIOR
FAQ · 4 QUESTIONS
Frequently Asked Questions
01
What is the difference between geohashing and quadtrees?
Geohashing encodes lat/lng into a string for database indexing; quadtrees partition space into quadrants for in-memory indexes. Use geohashes for persistence, quadtrees for real-time in-memory queries.
Was this helpful?
02
How do I choose between geohash and quadtree for my app?
If your data is in a database and you need simple radius queries, use geohash. If your data is in memory and you need dynamic updates or complex queries (like k-nearest), use a quadtree. For global scale, consider S2.
Was this helpful?
03
How do I implement a proximity search with geohashes?
Compute the geohash of the center point, then query the database for all points with that geohash prefix and the 8 neighbor prefixes. Then filter the results by Haversine distance to eliminate false positives.
Was this helpful?
04
What are the limitations of geohashing near the poles?
Near the poles, longitude lines converge, so a geohash cell covers a very small longitude range but a large latitude range. The common prefix length no longer correlates with distance. Use S2 or PostGIS geography instead.