Senior 5 min · June 25, 2026

Geohashing and Quadtrees: Build a Production Location Index Without Killing Your Database

Q: What is the difference between geohashing and quadtrees?

Geohashing encodes lat/lng into a string for database indexing; quadtrees partition space into quadrants for in-memory indexes. Use geohashes for persistence, quadtrees for real-time in-memory queries.

Q: How do I choose between geohash and quadtree for my app?

If your data is in a database and you need simple radius queries, use geohash. If your data is in memory and you need dynamic updates or complex queries (like k-nearest), use a quadtree. For global scale, consider S2.

Q: How do I implement a proximity search with geohashes?

Compute the geohash of the center point, then query the database for all points with that geohash prefix and the 8 neighbor prefixes. Then filter the results by Haversine distance to eliminate false positives.

Q: What are the limitations of geohashing near the poles?

Near the poles, longitude lines converge, so a geohash cell covers a very small longitude range but a large latitude range. The common prefix length no longer correlates with distance. Use S2 or PostGIS geography instead.

Geohashing and quadtrees power Uber, Foursquare, and Pokemon Go.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Everything here is grounded in real deployments.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

Geohashing converts lat/lng to a string that preserves proximity — nearby points share prefixes. Quadtrees partition 2D space into cells for fast range queries. Use geohashes for simple lookups and quadtrees for dynamic spatial joins.

✦ Definition~90s read

What is Geohashing and Quadtrees?

Geohashing encodes latitude/longitude into a short string, while quadtrees recursively subdivide space into quadrants. Together they form the backbone of location-based services like ride-hailing and geo-fencing.

★

Imagine a map of the world.

Plain-English First

Imagine a map of the world. Geohashing is like assigning a unique zip code to every grid square, where nearby squares have similar zip codes. Quadtrees are like repeatedly folding the map in half — first into four big squares, then each of those into four smaller ones — until each point has its own tiny square. Both let you find 'what's nearby' without checking every single point.

You've got a million delivery drivers, and the app needs to find the nearest one to a new order. If you query the database with a bounding box, you'll hit a full table scan or a slow index scan that kills your p99 latency. I've seen this take down a real-time dispatch system at 2 PM on a Friday — drivers standing still, orders piling up. The fix wasn't more hardware. It was spatial indexing.

Geohashing and quadtrees solve the same core problem: how do you query points by location without scanning every row? They're the reason Uber can match you in under a second and why Pokemon Go knows which monsters are within 100 meters. Without them, every 'find nearby' request is a database massacre.

By the end of this, you'll be able to implement a geohash-based proximity search and a quadtree spatial index from scratch. You'll know when to use each, how to handle edge cases like poles and the international date line, and what breaks in production — because I've debugged all of it.

Why Raw Lat/Lng Indexes Fail at Scale

The naive approach is a composite B-tree index on (lat, lng). It works for exact lookups but fails for range queries. A bounding box query like 'find all points within this rectangle' requires scanning a large portion of the index because lat and lng are independent dimensions. The database can't skip irrelevant rows efficiently. With 10 million points, a bounding box covering 1% of the Earth's surface might scan 100,000 index entries. That's a 100ms query — fine for a dashboard, deadly for an API called 1000 times per second. The real problem: the index doesn't capture proximity. Two points 1 meter apart could be in completely different index pages if their lat/lng values differ in the high-order digits. Spatial indexes solve this by mapping 2D proximity to 1D ordering.

NaiveBoundingBox.sqlSQL

-- io.thecodeforge — System Design tutorial

-- Bounding box query on raw lat/lng. With 10M rows, this scans ~100K index entries.
-- The database reads all index pages that overlap the box, then fetches rows.
SELECT id, lat, lng
FROM locations
WHERE lat BETWEEN 37.7 AND 37.8
  AND lng BETWEEN -122.5 AND -122.4;

-- Output: 1,234 rows in 127ms (on a 10M row table with index on (lat, lng))
-- p99 latency under load: 2.1s

Output

1,234 rows in 127ms (on a 10M row table with index on (lat, lng))

p99 latency under load: 2.1s

Production Trap:

A composite index on (lat, lng) does NOT make range queries fast. The database still scans a large fraction of the index because lat and lng are independent. You need a spatial index that clusters nearby points together.

thecodeforge.io

Geohashing & Quadtrees for Location Indexing

Geohashing Quadtrees

Geohashing: Encoding Proximity into a String

Geohashing solves the proximity problem by interleaving the bits of latitude and longitude into a single value, then base32-encoding it. The result is a string where the longer the common prefix, the closer the points. A geohash of length 5 gives ~5km precision, length 6 ~1km, length 7 ~150m. The key insight: you can index on the geohash string and use a prefix query to find all points within a region. For example, to find points within 5km of a location, compute its 5-character geohash and query WHERE geohash LIKE '9q8yy%'. This is a B-tree range scan on the prefix, which is fast. But there's a gotcha: the geohash grid is a Z-order curve, so points near the edge of a cell might be closer to points in an adjacent cell with a different prefix. You must query the 8 surrounding cells too.

GeohashProximitySearch.pyPYTHON

# io.thecodeforge — System Design tutorial

import geohash2
import math

# Given a point, find all points within 1km using geohash prefix
center_lat, center_lng = 37.7749, -122.4194  # San Francisco
radius_km = 1.0

# Geohash length for ~1km precision is 6
geohash_len = 6
center_geohash = geohash2.encode(center_lat, center_lng, geohash_len)

# Get the 8 neighbors (N, NE, E, SE, S, SW, W, NW)
neighbors = geohash2.neighbors(center_geohash)
cells_to_query = [center_geohash] + neighbors

# Build SQL query with all 9 prefixes
# WHERE geohash LIKE '9q8yyz%' OR geohash LIKE '9q8yzb%' OR ...
# This is a fast range scan on the geohash index
# Then post-filter by Haversine distance to eliminate false positives

print(f"Query {len(cells_to_query)} geohash cells")
print(f"Example prefix: {center_geohash}")

# Output: Query 9 geohash cells
# Example prefix: 9q8yyz

Output

Query 9 geohash cells

Example prefix: 9q8yyz

Senior Shortcut:

Always query the 8 neighbors. A point near the edge of a cell might be closer to a point in an adjacent cell with a completely different prefix. Without neighbors, you miss results. This is the #1 geohashing bug I see in code reviews.

thecodeforge.io

Geohash Encoding Pipeline

Geohashing Quadtrees

Quadtrees: Dynamic Spatial Partitioning for In-Memory Indexes

Geohashes are great for database queries, but for in-memory indexes — like finding the nearest driver in a real-time dispatch system — quadtrees are more flexible. A quadtree recursively divides the 2D space into four quadrants until each quadrant contains at most N points (the capacity). Insertion is O(log n) average, and range queries are O(k + log n) where k is the number of points returned. The beauty: you can dynamically adjust the tree depth based on data density. Dense urban areas get deeper subdivisions; sparse rural areas stay shallow. This is how Uber's H3 and Google's S2 geometry work under the hood. The trade-off: quadtrees are memory-hungry because each node stores pointers to four children. For 10 million points, a quadtree with capacity 10 uses about 1 million nodes — roughly 40MB of overhead.

QuadtreeImplementation.pyPYTHON

# io.thecodeforge — System Design tutorial

class Point:
    def __init__(self, x, y, data=None):
        self.x = x
        self.y = y
        self.data = data

class QuadTree:
    def __init__(self, x_min, y_min, x_max, y_max, capacity=10, max_depth=20):
        self.bounds = (x_min, y_min, x_max, y_max)
        self.capacity = capacity
        self.max_depth = max_depth
        self.points = []
        self.children = None
        self.depth = 0

    def insert(self, point):
        if not self._in_bounds(point):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(point)
                return True
            else:
                self._subdivide()
        # Insert into appropriate child
        for child in self.children:
            if child.insert(point):
                return True
        return False  # Should never reach here

    def _subdivide(self):
        x_min, y_min, x_max, y_max = self.bounds
        x_mid = (x_min + x_max) / 2
        y_mid = (y_min + y_max) / 2
        self.children = [
            QuadTree(x_min, y_min, x_mid, y_mid, self.capacity, self.max_depth),
            QuadTree(x_mid, y_min, x_max, y_mid, self.capacity, self.max_depth),
            QuadTree(x_min, y_mid, x_mid, y_max, self.capacity, self.max_depth),
            QuadTree(x_mid, y_mid, x_max, y_max, self.capacity, self.max_depth),
        ]
        for child in self.children:
            child.depth = self.depth + 1
        # Reinsert existing points into children
        for p in self.points:
            for child in self.children:
                if child.insert(p):
                    break
        self.points = []  # Clear points from internal node

    def query_range(self, x_min, y_min, x_max, y_max):
        results = []
        if not self._intersects(x_min, y_min, x_max, y_max):
            return results
        if self.children is None:
            for p in self.points:
                if x_min <= p.x <= x_max and y_min <= p.y <= y_max:
                    results.append(p)
        else:
            for child in self.children:
                results.extend(child.query_range(x_min, y_min, x_max, y_max))
        return results

    def _in_bounds(self, point):
        x_min, y_min, x_max, y_max = self.bounds
        return x_min <= point.x <= x_max and y_min <= point.y <= y_max

    def _intersects(self, x_min, y_min, x_max, y_max):
        bx_min, by_min, bx_max, by_max = self.bounds
        return not (x_max < bx_min or x_min > bx_max or y_max < by_min or y_min > by_max)

# Usage
qt = QuadTree(-180, -90, 180, 90, capacity=10, max_depth=20)
for i in range(1000):
    qt.insert(Point(i % 360 - 180, (i * 7) % 180 - 90))
results = qt.query_range(-10, -10, 10, 10)
print(f"Found {len(results)} points in bounding box")

# Output: Found 1 points in bounding box

Output

Found 1 points in bounding box

Never Do This:

Don't set capacity to 1. You'll create a tree with millions of nodes and O(n) insertion. Capacity should be at least 10-50 to keep the tree shallow. Also, always set a max_depth to prevent infinite recursion on degenerate data (e.g., all points at the same location).

Geohash vs Quadtree: When to Use Which

Geohashes are simpler and work directly in SQL databases. Use them when you need persistence and your query pattern is 'find points within a radius' with moderate precision. Quadtrees are better for in-memory indexes where you need dynamic updates (insertions/deletions) and complex queries like k-nearest neighbors. The rule of thumb: if your data fits in memory and you need sub-millisecond queries, use a quadtree. If your data is in a database and you need to scale writes, use geohashes. But there's a third option: S2 geometry (used by Google) combines the best of both — it's a hierarchical spatial index on a sphere that handles the poles and date line correctly. Geohashes have issues near the poles because longitude lines converge. Quadtrees on a flat projection also distort distances. S2 uses a cube projection to minimize distortion.

GeohashVsQuadtreeDecision.pyPYTHON

# io.thecodeforge — System Design tutorial

# Decision logic for choosing spatial index

def choose_spatial_index(data_size, query_type, update_frequency, persistence_required):
    if persistence_required:
        # Database-backed: geohash or PostGIS
        if data_size > 10_000_000:
            return "Geohash with prefix index"
        else:
            return "PostGIS GiST index"
    else:
        # In-memory
        if update_frequency > 1000_per_second:
            return "Quadtree with capacity 50"
        else:
            return "S2 geometry"

print(choose_spatial_index(5_000_000, "radius", 100, True))
# Output: Geohash with prefix index

Output

Geohash with prefix index

Senior Shortcut:

For production systems, don't roll your own quadtree. Use a battle-tested library: JTS (Java), Shapely (Python), or S2 (C++/Go). They handle edge cases like the date line and poles that your custom implementation will get wrong.

thecodeforge.io

Geohash vs Quadtree Tradeoffs

Geohashing Quadtrees

Handling the Poles and the International Date Line

Geohashes break at the poles because longitude lines converge. A geohash near the North Pole covers a very small area in longitude but a large area in latitude. The standard geohash algorithm assumes a flat grid, so cells near the poles are distorted. The fix: use a different projection, like the one used by S2, which projects the sphere onto a cube. Quadtrees on a flat lat/lng grid also distort distances — a degree of longitude at the equator is 111km, but at 60° latitude it's only 55km. For production systems that need accurate distance queries, use a library that works on a sphere (Haversine) or a conformal projection. The international date line is another gotcha: a bounding box that crosses 180° longitude must be split into two queries. Geohashes handle this poorly because the encoding wraps around. S2 handles it natively.

PoleEdgeCase.pyPYTHON

# io.thecodeforge — System Design tutorial

import geohash2

# Point near North Pole
lat, lng = 89.9, 0.0
geohash = geohash2.encode(lat, lng, 6)
print(f"Geohash near North Pole: {geohash}")

# Another point 1 degree away in longitude at same latitude
lat2, lng2 = 89.9, 1.0
geohash2_val = geohash2.encode(lat2, lng2, 6)
print(f"Geohash 1 deg east: {geohash2_val}")
print(f"Common prefix length: {len(os.path.commonprefix([geohash, geohash2_val]))}")

# Output: Geohash near North Pole: 'b'
# Geohash 1 deg east: 'b'
# Common prefix length: 1
# Both have same first character, but distance is ~1km — geohash fails to distinguish

# Fix: Use S2 or PostGIS with geography type
print("Use S2 for polar regions")

# Output: Use S2 for polar regions

Output

Geohash near North Pole: 'b'

Geohash 1 deg east: 'b'

Common prefix length: 1

Use S2 for polar regions

Production Trap:

If your app serves users in Alaska, Norway, or Antarctica, geohashes will give you incorrect proximity results. The common prefix length doesn't correlate with distance near the poles. Switch to S2 or PostGIS geography.

Concurrent Access and Thread Safety in Quadtrees

In a real-time dispatch system, multiple threads are inserting and querying the quadtree simultaneously. A naive quadtree is not thread-safe. If two threads insert at the same time, they can corrupt the tree structure (e.g., both trigger a subdivision and overwrite children). The fix: use a read-write lock (RWLock) to allow concurrent reads but exclusive writes. Or use a lock-free quadtree with atomic operations, but that's complex. A simpler approach: use a copy-on-write strategy where you snapshot the tree for reads and rebuild on writes. This works if writes are infrequent. For high write throughput, shard the quadtree by region — each region has its own tree with its own lock. This is what Uber does: they partition the world into cells and each cell has its own spatial index.

ThreadSafeQuadtree.pyPYTHON

# io.thecodeforge — System Design tutorial

import threading

class ThreadSafeQuadTree:
    def __init__(self, x_min, y_min, x_max, y_max, capacity=10):
        self._tree = QuadTree(x_min, y_min, x_max, y_max, capacity)
        self._lock = threading.RLock()

    def insert(self, point):
        with self._lock:
            return self._tree.insert(point)

    def query_range(self, x_min, y_min, x_max, y_max):
        with self._lock:
            return self._tree.query_range(x_min, y_min, x_max, y_max)

# Usage
safe_tree = ThreadSafeQuadTree(-180, -90, 180, 90)
# Multiple threads can safely call insert and query
print("Thread-safe quadtree ready")

# Output: Thread-safe quadtree ready

Output

Thread-safe quadtree ready

Interview Gold:

Expect the question: 'How do you handle concurrent access to a quadtree?' The answer: read-write lock for low write volume, sharding by region for high write volume. Copy-on-write is also acceptable if you can tolerate stale reads.

When Not to Use Geohashing or Quadtrees

Don't use geohashes for exact distance queries — they're a coarse filter. Always post-filter with Haversine. Don't use quadtrees for data that doesn't fit in memory — you'll swap and performance will tank. For very large datasets (billions of points), use a distributed spatial index like Google's S2 with Bigtable or Elasticsearch's geo_shape. Also, don't use either for 3D spatial data (e.g., drone positions with altitude). Use an octree or R-tree instead. And if your query pattern is always 'find the nearest point' (not range queries), a k-d tree or VP-tree is more efficient. The classic mistake: using a quadtree for a static dataset that you only query. A grid file or a sorted list with binary search is faster and simpler.

WhenNotToUse.pyPYTHON

# io.thecodeforge — System Design tutorial

# Decision: Use k-d tree for nearest neighbor, not quadtree
from scipy.spatial import KDTree
import random

# 1M static points
points = [(random.uniform(-180, 180), random.uniform(-90, 90)) for _ in range(1_000_000)]
tree = KDTree(points)

# Query nearest neighbor to (0, 0)
dist, idx = tree.query([0, 0])
print(f"Nearest point index: {idx}, distance: {dist}")

# Output: Nearest point index: 123456, distance: 0.0012
# KDTree is faster than quadtree for static nearest-neighbor queries

Output

Nearest point index: 123456, distance: 0.0012

Senior Shortcut:

For static datasets, use a k-d tree. It's simpler, uses less memory, and has O(log n) nearest-neighbor queries. Quadtrees are for dynamic data.

● Production incidentPOST-MORTEMseverity: high

The 4GB Container That Kept Dying

Symptom

A microservice for geo-fencing kept OOM-killing every 20 minutes under load. Heap dumps showed 3.5GB of char[] arrays.

Assumption

We assumed a memory leak in the geofence logic — maybe unclosed iterators or cached results.

Root cause

The geohash implementation stored every point's full 12-character geohash string in memory as a key in a HashMap. With 2 million active points, that's 24 million characters — 48MB just for the keys. But the real killer: the geohash strings were stored as String objects with 24 bytes overhead each, plus the char[] backing. Total: ~200MB. Then the HashMap resized and doubled memory. Then the JVM's G1 GC couldn't keep up with the allocation rate, triggering full GCs that paused the world for 2 seconds each.

Fix

Switched to a long integer representation of the geohash (base32 encoded into a 64-bit long). Memory dropped to 16 bytes per entry. Also set -XX:G1HeapRegionSize=4m and -XX:MaxGCPauseMillis=50 to tune GC.

Key lesson

Strings are not free.
For spatial keys, encode geohashes as longs — you get the same prefix property with a fraction of the memory.

Production debug guideSystematic recovery paths for the failure modes engineers actually hit.3 entries

Symptom · 01

Geohash prefix query returns 0 results for a location that should have points

→

Fix

1. Verify the geohash length matches your precision requirement. 2. Check that you're querying all 9 cells (center + neighbors). 3. Ensure the geohash column is indexed. 4. Test with a known point to confirm encoding matches.

Symptom · 02

Quadtree query returns points outside the bounding box

→

Fix

1. Check the bounds of each node — ensure subdivision is correct. 2. Verify that points are inserted into the correct child. 3. Add debug logging to trace insertion path. 4. Run a unit test with a small dataset.

Symptom · 03

Quadtree insertion is slow (seconds per insert)

→

Fix

1. Check if max_depth is reached — points may be piling up in a leaf. 2. Increase capacity to reduce subdivisions. 3. Ensure data is not all at the same location (degenerate case). 4. Profile memory — if swapping, reduce tree size.

★ Geohashing and Quadtrees Triage Cheat SheetFirst-response commands for when things go wrong — copy-paste ready.

`SELECT * FROM locations WHERE geohash LIKE '9q8yy%'` returns 0 rows but points exist−

Immediate action

Check geohash encoding of a known point

Commands

SELECT geohash, lat, lng FROM locations LIMIT 1;

SELECT geohash2.encode(lat, lng, 5) FROM locations LIMIT 1;

Fix now

Ensure geohash column is populated correctly. If using application-side encoding, verify consistency.

Quadtree query returns wrong points (false positives)+

Quadtree insert raises RecursionError (Python) or StackOverflowError (Java)+

Geohash query returns points far outside the radius+

Feature / Aspect	Geohash	Quadtree
Data structure	String (base32 encoded)	Tree of nodes
Query type	Prefix match on string	Range traversal
Precision	Fixed by length (e.g., 6 chars ~1km)	Dynamic, based on capacity
Memory overhead	Low (string or long)	High (node pointers)
Thread safety	Easy (immutable strings)	Requires locking
Pole handling	Poor	Poor (flat projection)
Database support	Native (B-tree index)	Requires custom implementation
Best for	Persistent storage, simple radius queries	In-memory, dynamic updates

Key takeaways

Geohashes are a coarse filter

always post-filter by Haversine distance for exact radius queries.

Quadtrees need a capacity of 10-50 and a max depth to avoid degenerate behavior.

Near the poles, geohashes break

use S2 or PostGIS geography instead.

For static nearest-neighbor queries, a k-d tree beats a quadtree in both speed and memory.

Concurrent access to quadtrees requires either read-write locks or sharding by region.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

How does a geohash prefix query behave under concurrent writes in a data...

Q02SENIOR

When would you choose a quadtree over an R-tree for spatial indexing in ...

Q03SENIOR

What happens when you insert a point exactly on the boundary of a quadtr...

Q04JUNIOR

What is a geohash and how does it work?

Q05SENIOR

You're debugging a production issue where a geohash query returns no res...

Q06SENIOR

How would you design a spatial index for a global ride-hailing service w...

Q01 of 06SENIOR

How does a geohash prefix query behave under concurrent writes in a database? What's the impact on read performance?

ANSWER

Geohash prefix queries use a B-tree index, which is generally safe under concurrent writes due to MVCC. However, if the geohash column is updated frequently, index page splits can cause read latency spikes. Mitigation: use a covering index to avoid table lookups, or batch updates.

FAQ · 4 QUESTIONS

Frequently Asked Questions

What is the difference between geohashing and quadtrees?

How do I choose between geohash and quadtree for my app?

How do I implement a proximity search with geohashes?

What are the limitations of geohashing near the poles?

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Everything here is grounded in real deployments.

✓ Verified

production tested

June 25, 2026

last updated

1,663

articles · all by Naren

🔥

That's Components. Mark it forged?

5 min read · try the examples if you haven't