Uber System Design — Cassandra Tombstone Staleness
Cassandra tombstones caused 10x dispatch delays in Uber's 2019 location blackout.
- Uber's backend is a microservices architecture handling 20M trips/day with 99.99% availability.
- Location tracking uses H3 geospatial indexing to store driver GPS pings sent every 4 seconds.
- Matching runs a real-time auction: finds nearby drivers via geohash prefix, then assigns based on ETA and surge.
- Surge pricing recalculates every 5 minutes from supply/demand curves, then broadcasts via push.
- Payment uses idempotency keys across sharded PostgreSQL shards with saga compensation on failures.
- Performance gotcha: Cassandra read-repair on stale replicas caused riders to see ghost drivers far away.
Imagine a city with thousands of taxi drivers all driving around, and millions of people raising their hands asking for a ride. Someone needs to constantly watch where every driver is, instantly find the closest one to each person, connect them, track the whole trip, and charge the right amount at the end — all in under 5 seconds, for millions of people at once. That 'someone' is the Uber backend. It's basically an incredibly fast, constantly-updated map crossed with a matchmaking engine crossed with a payment system — all stitched together without a single point of failure.
Uber processes roughly 20 million trips per day across 70+ countries. At peak hours in a city like New York, the system is simultaneously tracking hundreds of thousands of driver GPS pings per second, matching riders to drivers in under 2 seconds, calculating surge multipliers in real-time, and processing payments across dozens of currencies. Getting any one of those wrong at scale doesn't just cause a bug — it causes someone standing in the rain at 2am. That's the real pressure behind this design.
The core problem Uber solves isn't 'connecting riders to drivers' — that's too simple. The real problem is: how do you maintain a globally consistent, real-time view of moving objects (drivers), efficiently query that view by proximity, run a two-sided marketplace matching algorithm under millisecond constraints, handle partial failures gracefully, and do all of this across data centers on multiple continents while remaining cheap enough to be profitable? Each of those sub-problems alone is a PhD thesis. Together, they're one of the most instructive system design challenges you'll encounter.
By the end of this article you'll be able to walk into any senior engineering interview and articulate the full Uber architecture — from the geospatial indexing strategy that makes proximity search fast, to the matching algorithm trade-offs, to why Uber moved away from a monolith to a domain-oriented microservices architecture, and the exact database and messaging choices that make all of it work in production. More importantly, you'll understand why each decision was made, not just what it was.
High-Level Architecture Overview
Uber operates a domain-oriented microservices architecture. Each domain — location, matching, payment, pricing, trip management — owns its data and exposes APIs via an API gateway (Envoy). Services communicate asynchronously through Kafka topics for event-driven flows, and synchronously through gRPC for low-latency queries.
The architecture is regionally isolated: each city or metro area runs its own stack. Data centers are replicated across multiple regions, with Cassandra providing multi-master replication for location data and PostgreSQL shards for transactional trip/ payment data.
- Driver app → GPS pipeline — every 4 seconds, driver location sent via WebSocket to location-ingestion service.
- Rider app → request — HTTP request to matching service via gateway.
- Matching service — looks up nearby drivers via geospatial index, runs auction.
- Surge pricing service — consumes supply/demand Kafka topics, computes multipliers.
- Payment service — idempotent capture after trip end, uses saga pattern across payment providers.
Geospatial Indexing & Location Tracking
Every driver sends a GPS ping every 4 seconds. At 20 million trips per day, that's roughly 2.5 million pings per second at peak. Storing and querying these points in real time requires a geospatial indexing system that can answer "Who is within 500 meters of (lat, lng)?" in under 10 milliseconds.
Uber originally used Google S2 but later developed H3, a hexagonal hierarchical grid. Each driver's location is assigned an H3 cell at resolution 9 (hexagons ~0.1 km²). The matching service then queries all drivers in the same cell and adjacent cells (hex ring), then calculates ETA via OSRM (Open Source Routing Machine).
Storage: The location table in Cassandra uses driver_id as partition key and epoch_minute as clustering key, with a TTL of 10 minutes. A secondary table by h3_cell allows fast proximity searches: SELECT driver_id, lat, lng, ts FROM driver_location WHERE h3_cell = ? AND epoch_min = ?.
- Hexagons have uniform neighbor distance, unlike squares (grid distortion).
- Use resolution 9 (0.1 km²) for city-level accuracy; lower resolution for long-distance dispatch.
- TTL on location rows prevents stale data from living in Cassandra read-repair.
- Secondary index on h3_cell + epoch_min allows fast partition scans.
Matching Algorithm (Ride Dispatch)
When a rider requests a ride, the matching service must find the best driver within 2 seconds. The process is:
- Filter eligible drivers — those whose acceptance rate > 80%, not on a trip, within surge zone.
- Proximity query — find drivers in the same H3 hex ring (radius ~1 km). If too few, expand to ring 2.
- Cost computation — for each candidate, compute ETA (via OSRM routing service) and surge multiplier.
- Auction — Uber uses a second-price auction (Vickrey): the rider pays the lowest winning bid, the driver gets their bid price. This incentivizes truthful bidding.
- Dispatch — send the rider's request to the top 3 drivers simultaneously (but avoid over-dispatching by reserving the driver for 15 seconds).
The algorithm is optimized for throughput: most cities can dispatch in under 1 second at p99.
Surge Pricing Engine
Surge pricing adjusts fares based on real-time supply (available drivers) and demand (ride requests). The calculation runs every 5 minutes per geographic zone (a set of H3 cells).
Algorithm: - Compute surge_multiplier = max(1.0, demand / (supply * target_coverage)) - Where target_coverage is the desired driver-to-rider ratio (e.g., 0.5 for 1 driver per 2 riders). - The multiplier is smoothed using an exponential moving average to avoid sudden spikes. - If supply drops below a threshold, the zone is marked "surge".
Implementation: A separate Kafka stream processor consumes ride_request and driver_online events per zone, aggregates, then broadcasts the multiplier to a Redis cache. The matching service reads the multiplier from Redis, and the rider app displays the surge notification before confirming.
Uber also uses heatmaps to proactively send drivers notifications about potential surge areas.
Payment & Trip Execution
Uber's payment system processes tens of millions of transactions daily across 50+ currencies. The core challenge is exactly-once payment capture — you never want to charge a rider twice or miss a driver payout.
The solution: idempotency keys. Before initiating any payment, the client generates a UUID (the idempotency key) and sends it along with the request. The payment service stores this key in a Redis set with a short TTL. If the same key appears again, the service returns the previous response without re-executing.
For cross-region payments (e.g., rider in New York pays for a trip in Paris), the system uses a saga pattern with compensating actions. The steps: 1. Capture rider payment (source account) 2. Payout to driver (destination account) 3. Apply Uber commission 4. If any step fails, compensate: reverse capture, refund driver percentage.
The trip execution state machine runs on a Kafka-backed stream: states go from REQUESTED → MATCHED → EN_ROUTE → ON_TRIP → COMPLETED → SETTLED.
Scaling, Fault Tolerance & Real-World Incidents
Uber's system must survive: single data center failure, sudden demand spikes (New Year's Eve), driver app disconnections, network partitions, and rogue deployments. Key strategies:
- Regional isolation: each city runs independent stacks. If one region fails, others are unaffected.
- Graceful degradation: if the matching service cannot compute ETAs, it falls back to linear distance matching. If payment fails, riders can still complete the trip and pay later.
- Auto-scaling: all stateless services (matching, pricing, ETA) scale based on CPU and request queue depth. Cassandra and Redis clusters are sharded and replicated.
- Chaos engineering: Uber runs regular failure drills: kill random pods, inject latency into Kafka, throttle Cassandra nodes.
- Circuit breakers: every synchronous call (gRPC) has a circuit breaker. When error rate exceeds 50%, the circuit opens and the caller uses a fallback (e.g., cached data).
A real incident from 2020: A bug in the H3 library caused all new driver pings to be placed in the same hex cell. Suddenly, all riders in a city saw drivers at a single point. The fix required a hotpatch rolled out via the driver app's feature switch system.
- Assume every network call can fail, every dependency can slow down, every message can be lost.
- Design fallbacks that still offer a reasonable user experience (e.g., distance-only matching).
- Test failures proactively: kill a container, throttle a database, partition a network.
- Monitor the right metrics: p99 latency, error rates, consumer lag, cache hit ratio.
The 2019 Location Data Blackout
- Eventual consistency is not safe for geo-fencing queries that need seconds-fresh data.
- Always test read-repair behavior under compaction load before upgrades.
- Add defensive timestamp validation in downstream services.
Key takeaways
Common mistakes to avoid
4 patternsMemorising syntax before understanding the concept
Skipping practice and only reading theory
Ignoring constraints like latency and consistency
Assuming a monolith can't be scaled
Interview Questions on This Topic
How would you design Uber's location service to handle 5 million GPS pings per second?
Frequently Asked Questions
That's Real World. Mark it forged?
5 min read · try the examples if you haven't