Mid-level 5 min · March 06, 2026

Uber System Design — Cassandra Tombstone Staleness

Cassandra tombstones caused 10x dispatch delays in Uber's 2019 location blackout.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Uber's backend is a microservices architecture handling 20M trips/day with 99.99% availability.
  • Location tracking uses H3 geospatial indexing to store driver GPS pings sent every 4 seconds.
  • Matching runs a real-time auction: finds nearby drivers via H3 hex-ring lookup, then assigns based on ETA and surge.
  • Surge pricing recalculates every 5 minutes from supply/demand curves, then broadcasts via push.
  • Payment uses idempotency keys across sharded PostgreSQL with saga compensation on failures.
  • Performance gotcha: Cassandra read-repair on stale replicas caused riders to see ghost drivers far away.
Plain-English First

Imagine a city with thousands of taxi drivers all driving around, and millions of people raising their hands asking for a ride. Someone needs to constantly watch where every driver is, instantly find the closest one to each person, connect them, track the whole trip, and charge the right amount at the end — all in under 5 seconds, for millions of people at once. That 'someone' is the Uber backend. It's basically an incredibly fast, constantly-updated map crossed with a matchmaking engine crossed with a payment system — all stitched together without a single point of failure.

Uber processes roughly 20 million trips per day across 70+ countries. At peak hours in a city like New York, the system is simultaneously tracking hundreds of thousands of driver GPS pings per second, matching riders to drivers in under 2 seconds, calculating surge multipliers in real-time, and processing payments across dozens of currencies. Getting any one of those wrong at scale doesn't just cause a bug — it causes someone standing in the rain at 2am. That's the real pressure behind this design.

The core problem Uber solves isn't 'connecting riders to drivers' — that's too simple. The real problem is: how do you maintain a globally consistent, real-time view of moving objects (drivers), efficiently query that view by proximity, run a two-sided marketplace matching algorithm under millisecond constraints, handle partial failures gracefully, and do all of this across data centers on multiple continents while remaining cheap enough to be profitable? Each of those sub-problems alone is a PhD thesis. Together, they're one of the most instructive system design challenges you'll encounter.

By the end of this article you'll be able to walk into any senior engineering interview and articulate the full Uber architecture — from the geospatial indexing strategy that makes proximity search fast, to the matching algorithm trade-offs, to why Uber moved away from a monolith to a domain-oriented microservices architecture, and the exact database and messaging choices that make all of it work in production. More importantly, you'll understand why each decision was made, not just what it was.

High-Level Architecture Overview

Uber operates a domain-oriented microservices architecture. Each domain — location, matching, payment, pricing, trip management — owns its data and exposes APIs via an API gateway (Envoy). Services communicate asynchronously through Kafka topics for event-driven flows, and synchronously through gRPC for low-latency queries.

The architecture is regionally isolated: each city or metro area runs its own stack. Data centers are replicated across multiple regions, with Cassandra providing multi-master replication for location data and PostgreSQL shards for transactional trip/payment data.

Key components
  • Driver app → GPS pipeline — every 4 seconds, driver location sent via WebSocket to location-ingestion service.
  • Rider app → request — HTTP request to matching service via gateway.
  • Matching service — looks up nearby drivers via geospatial index, runs auction.
  • Surge pricing service — consumes supply/demand Kafka topics, computes multipliers.
  • Payment service — idempotent capture after trip end, uses saga pattern across payment providers.
io/thecodeforge/geo/LocationUpdater.java · Java
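package io.thecodeforge.geo;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.uber.h3core.H3Core;
import java.io.IOException;
import java.time.Instant;

public class LocationUpdater {
    private static final int H3_RESOLUTION = 9;

    private final CqlSession session;
    private final H3Core h3;
    private final PreparedStatement insertPing;

    public LocationUpdater(CqlSession session) throws IOException {
        this.session = session;
        this.h3 = H3Core.newInstance();
        // Prepare once and reuse: preparing on every ping costs an extra round trip per write.
        this.insertPing = session.prepare(
            "INSERT INTO driver_location (driver_id, epoch_min, h3_cell, lat, lng, ts) " +
            "VALUES (?, ?, ?, ?, ?, ?) USING TTL 600");
    }

    // DriverPing is assumed to be a record exposing driverId(), epochMinute(), lat() and lng().
    public void updateDriverLocation(DriverPing ping) {
        long h3Cell = h3.latLngToCell(ping.lat(), ping.lng(), H3_RESOLUTION);
        session.execute(insertPing.bind(
            ping.driverId(),
            ping.epochMinute(),
            h3Cell,
            ping.lat(),
            ping.lng(),
            Instant.now().getEpochSecond()));
    }
}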
Architecture Tip
Uber's move to microservices was driven by the need for independent scaling: during a city-specific surge, only the matching and pricing services need more resources — not the whole monolith.
Production Insight
API gateway failure takes down all traffic if not designed with circuit breakers.
Uber uses Envoy's circuit breaker and retry budgets to protect downstream services.
Rule: always deploy gateway with health-check-based failover to a standby cluster.
Key Takeaway
Microservices at Uber's scale are a necessity, not a luxury.
Start with a monolith and split only when boundaries are clear.
The API gateway is a single point of failure — harden it first.
Architecture Decision: Monolith vs Microservices
If: Fewer than 10 developers, 1 city
Use: Start with a monolith; microservices overhead is not justified
If: 10+ developers, multiple cities
Use: Domain-oriented microservices with a shared data bus
If: Need sub-2s matching globally
Use: Regional isolation and CQRS for read-optimized location data

Geospatial Indexing & Location Tracking

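Every driver sends a GPS ping every 4 seconds, so ingest volume scales with the number of drivers online rather than with the daily trip count: N online drivers produce N / 4 pings per second. One million concurrently online drivers, for example, generate about 250,000 pings per second. Storing and querying these points in real time requires a geospatial index that can answer "Who is within 500 meters of (lat, lng)?" in under 10 milliseconds.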
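
A quick way to sanity-check any ping-rate figure is to tie it back to the number of drivers online. The sketch below does that arithmetic; the class name and the fleet sizes are illustrative assumptions, not Uber figures.

```java
// Back-of-envelope ping-rate estimate. Fleet sizes are illustrative inputs.
final class PingRateEstimate {

    /** Pings per second produced by a fleet that reports once every intervalSeconds. */
    static long pingsPerSecond(long onlineDrivers, int intervalSeconds) {
        return onlineDrivers / intervalSeconds;
    }

    /** Online drivers needed to produce a given ping rate at that interval. */
    static long driversFor(long pingsPerSecond, int intervalSeconds) {
        return pingsPerSecond * intervalSeconds;
    }

    public static void main(String[] args) {
        System.out.println(pingsPerSecond(1_000_000, 4)); // prints 250000
        System.out.println(driversFor(2_500_000, 4));     // prints 10000000
    }
}
```

Read the second line as a plausibility check: sustaining 2.5 million pings per second at a 4-second interval would take ten million drivers online at the same moment.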

Uber originally used Google S2 but later developed H3, a hexagonal hierarchical grid. Each driver's location is assigned an H3 cell at resolution 9 (hexagons ~0.1 km²). The matching service then queries all drivers in the same cell and adjacent cells (hex ring), then calculates ETA via OSRM (Open Source Routing Machine).

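Storage: The primary location table in Cassandra uses driver_id as the partition key and epoch_min as the clustering key, with a TTL of 10 minutes. Cassandra can only look rows up efficiently by partition key, so proximity search uses a second, denormalized table partitioned by (h3_cell, epoch_min): SELECT driver_id, lat, lng, ts FROM driver_location WHERE h3_cell = ? AND epoch_min = ?. The snippets in this article reuse the name driver_location for both tables; run against the driver_id-partitioned table, the same query would need ALLOW FILTERING and a cluster-wide scan.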
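
The two-table layout can be illustrated without a database. The following is a minimal in-memory sketch, assuming hypothetical names (LocationTablesSketch, Ping, CellBucket); plain maps stand in for the Cassandra partitions, and TTL is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the two location tables. All names are hypothetical.
final class LocationTablesSketch {

    record Ping(String driverId, long h3Cell, double lat, double lng, long epochSecond) {
        long epochMinute() { return epochSecond / 60; }
    }

    // Partition key of the cell-keyed table: (h3_cell, epoch_min).
    record CellBucket(long h3Cell, long epochMinute) {}

    // Stand-in for the table partitioned by driver_id (latest ping only, for brevity).
    private final Map<String, Ping> latestByDriver = new HashMap<>();
    // Stand-in for the table partitioned by (h3_cell, epoch_min).
    private final Map<CellBucket, Map<String, Ping>> byCell = new HashMap<>();

    /** Every ping is written twice, once per access pattern. */
    void write(Ping ping) {
        latestByDriver.put(ping.driverId(), ping);
        byCell.computeIfAbsent(new CellBucket(ping.h3Cell(), ping.epochMinute()), k -> new HashMap<>())
              .put(ping.driverId(), ping);
    }

    /** Proximity lookup: reads exactly one partition of the cell-keyed table. */
    List<Ping> driversIn(long h3Cell, long epochMinute) {
        Map<String, Ping> bucket = byCell.get(new CellBucket(h3Cell, epochMinute));
        return bucket == null ? List.of() : new ArrayList<>(bucket.values());
    }

    Ping latest(String driverId) {
        return latestByDriver.get(driverId);
    }
}
```

One consequence of the dual write is worth noting: a driver who crosses a cell boundary within the same minute leaves a row behind in the old cell, so readers should keep only the newest ping per driver.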

io/thecodeforge/geo/DriverNearbyQuery.java · Java
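package io.thecodeforge.geo;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;
import com.uber.h3core.H3Core;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class DriverNearbyQuery {
    private static final int H3_RESOLUTION = 9;
    // Approximate center-to-center spacing of resolution-9 cells in meters (assumed value).
    private static final double CELL_SPACING_METERS = 350.0;

    private final CqlSession session;
    private final H3Core h3;
    private final PreparedStatement selectByCell;

    public DriverNearbyQuery(CqlSession session, H3Core h3) {
        this.session = session;
        this.h3 = h3;
        // Prepared once; assumes the queried table is partitioned by (h3_cell, epoch_min).
        this.selectByCell = session.prepare(
            "SELECT driver_id, lat, lng, ts FROM driver_location WHERE h3_cell = ? AND epoch_min = ?");
    }

    public List<DriverPing> findNearbyDrivers(double riderLat, double riderLng, int radiusMeters) {
        // Convert rider location to H3 cell at resolution 9
        long originCell = h3.latLngToCell(riderLat, riderLng, H3_RESOLUTION);
        // gridDisk(k) returns the origin plus every cell within k steps;
        // gridRingUnsafe would return only the hollow outer ring, without the origin.
        int k = Math.max(1, (int) Math.ceil(radiusMeters / CELL_SPACING_METERS));
        List<Long> cells = h3.gridDisk(originCell, k);
        // Only the current minute bucket is read, so results thin out just after a minute boundary.
        long epochMinute = Instant.now().getEpochSecond() / 60;

        List<DriverPing> result = new ArrayList<>();
        for (long cell : cells) {
            for (Row row : session.execute(selectByCell.bind(cell, epochMinute))) {
                result.add(new DriverPing(row.getUuid(0), row.getDouble(1), row.getDouble(2), row.getLong(3)));
            }
        }
        return result;
    }
}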
Output
List of the drivers found in the queried cells (the rider's cell plus its neighbors).
Mental Model: Grids & Indexes
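  • A hexagon's six neighbors are all the same distance away; a square's corner neighbors are farther than its edge neighbors.
  • Use resolution 9 (about 0.1 km² per cell) for city-level accuracy; use a lower (coarser) resolution for long-distance dispatch.
  • TTL expires old location rows automatically, but expired rows linger as tombstones until compaction, so TTL alone does not guarantee fresh reads.
  • A second table partitioned by (h3_cell, epoch_min) turns each proximity lookup into a single-partition read.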
Production Insight
Cassandra's eventual consistency once served a rider a driver that had already gone offline 3 minutes ago.
Fix: add client-side timestamp validation — discard any ping older than 30 seconds.
Also: TTL alone doesn't protect against stale tombstones; use a separate recency cutoff.
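
The 30-second cutoff can be sketched as a plain filter applied to whatever the database returns. This is an illustrative sketch only: RecencyFilter and its Ping record are hypothetical names, and the clock is passed in so the behavior is easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Reader-side staleness guard. Names are hypothetical.
final class RecencyFilter {

    record Ping(String driverId, double lat, double lng, Instant ts) {}

    /** Keeps only pings whose timestamp is no older than maxAge relative to now. */
    static List<Ping> freshOnly(List<Ping> pings, Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        return pings.stream()
                .filter(p -> !p.ts().isBefore(cutoff)) // a ping exactly at the cutoff is kept
                .collect(Collectors.toList());
    }
}
```

Because the check runs in the application, it holds regardless of which replica answered or how far compaction has fallen behind.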
Key Takeaway
Geospatial indexing is the backbone of any location-based service.
H3 gives uniform distance neighborhoods; Cassandra gives horizontal scale.
But always validate timestamp recency at the application layer — the database won't save you.

Matching Algorithm (Ride Dispatch)

When a rider requests a ride, the matching service must find the best driver within 2 seconds. The process is:

  1. Filter eligible drivers: those with an acceptance rate above 80%, not on a trip, and within the surge zone.
  2. Proximity query: find drivers in the rider's H3 cell and its adjacent cells (k = 1, which at resolution 9 reaches roughly 0.5 km). If too few, expand to k = 2.
  3. Cost computation: for each candidate, compute ETA (via the OSRM routing service) and surge multiplier.
  4. Auction: dispatch is modeled here as a second-price (Vickrey) auction. The best bid wins, but the price is set by the runner-up's bid rather than the winner's own, which is what makes truthful bidding the dominant strategy.
  5. Dispatch: send the rider's request to the top 3 drivers simultaneously, reserving each driver for 15 seconds to avoid over-dispatching.

The algorithm is tuned for latency: most cities can dispatch in under 1 second at p99.
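
The 15-second driver hold from the dispatch step can be sketched as an expiring reservation map. This is a single-process illustration under stated assumptions: DriverReservations is a hypothetical name, and a real deployment would likely keep holds in a shared store so that every matcher instance sees them.

```java
import java.util.concurrent.ConcurrentHashMap;

// Expiring per-driver hold to avoid over-dispatching. Names are hypothetical.
final class DriverReservations {
    private final ConcurrentHashMap<String, Long> heldUntilMillis = new ConcurrentHashMap<>();
    private final long holdMillis;

    DriverReservations(long holdMillis) {
        this.holdMillis = holdMillis;
    }

    /** Returns true if the driver was free (or the earlier hold had expired) and is now held. */
    boolean tryReserve(String driverId, long nowMillis) {
        boolean[] acquired = {false};
        // compute() runs atomically per key, so two requests cannot both win the same driver.
        heldUntilMillis.compute(driverId, (id, until) -> {
            if (until == null || until <= nowMillis) {
                acquired[0] = true;
                return nowMillis + holdMillis;
            }
            return until; // still held by an earlier request
        });
        return acquired[0];
    }

    /** Frees the driver early, for example when the driver declines the offer. */
    void release(String driverId) {
        heldUntilMillis.remove(driverId);
    }
}
```

Passing the clock in as nowMillis keeps the expiry logic deterministic; a caller would normally supply System.currentTimeMillis().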

io/thecodeforge/matching/DispatchEngine.java · Java
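package io.thecodeforge.matching;

import io.thecodeforge.geo.DriverNearbyQuery;
import io.thecodeforge.geo.DriverPing;
import io.thecodeforge.pricing.SurgeCalculator;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class DispatchEngine {
    private static final int MAX_PICKUP_ETA_SECONDS = 300; // only drivers within 5 minutes

    private final DriverNearbyQuery nearby;
    private final SurgeCalculator surge;
    private final OSRMClient routing;

    public DispatchEngine(DriverNearbyQuery nearby, SurgeCalculator surge, OSRMClient routing) {
        this.nearby = nearby;
        this.surge = surge;
        this.routing = routing;
    }

    // Returns empty when no driver qualifies, so callers must handle the "no match" case.
    public Optional<DispatchResult> dispatch(RideRequest request) {
        // 1. Find nearby drivers, widening the search once if the first pass is empty
        List<DriverPing> candidates = nearby.findNearbyDrivers(
            request.riderLat(), request.riderLng(), 1000);
        if (candidates.isEmpty()) {
            candidates = nearby.findNearbyDrivers(
                request.riderLat(), request.riderLng(), 2000);
        }
        // 2. Build one bid per driver and drop slow pickups.
        //    DriverBid is assumed to hold (driverId, eta, surge) and derive bidAmount() from them.
        List<DriverBid> bids = candidates.stream()
            .map(d -> new DriverBid(
                d.driverId(),
                routing.estimatePickupTime(request.riderLat(), request.riderLng(), d.lat(), d.lng()),
                surge.getMultiplier(d.driverId(), request.zoneId())))
            .filter(b -> b.eta() < MAX_PICKUP_ETA_SECONDS)
            .sorted(Comparator.comparingInt(DriverBid::bidAmount).reversed())
            .collect(Collectors.toList());
        if (bids.isEmpty()) {
            return Optional.empty();
        }
        // 3. Second-price rule: the top bid wins and the runner-up's bid sets the price.
        //    With a single bidder there is no runner-up, so use the winner's own bid.
        DriverBid winner = bids.get(0);
        int price = bids.size() > 1 ? bids.get(1).bidAmount() : winner.bidAmount();
        return Optional.of(new DispatchResult(winner.driverId(), price));
    }
}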
Output
DispatchResult with winning driver ID and rider price.
Production Insight
If the routing service (OSRM) is slow, matching can stall.
Uber uses a read-through local cache for common origin-destination pairs.
Rule: always set a timeout — fail fast and degrade to a simpler distance-only matching.
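
The timeout-then-degrade rule can be sketched with a CompletableFuture and a straight-line distance estimate. This is a sketch under stated assumptions: EtaWithFallback is a hypothetical name, the routing call is passed in as a Supplier, and the 8 m/s average speed is an illustrative constant rather than a measured value.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Fail-fast ETA lookup with a distance-only fallback. Names and constants are hypothetical.
final class EtaWithFallback {
    private static final double EARTH_RADIUS_M = 6_371_000.0;
    private static final double ASSUMED_SPEED_MPS = 8.0; // about 29 km/h, illustrative

    /** Great-circle distance in meters between two lat/lng points (haversine formula). */
    static double haversineMeters(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    /** Uses the routing result if it arrives in time; otherwise estimates from distance. */
    static double etaSeconds(Supplier<Double> routingCall, long timeoutMillis,
                             double lat1, double lng1, double lat2, double lng2) {
        double fallback = haversineMeters(lat1, lng1, lat2, lng2) / ASSUMED_SPEED_MPS;
        return CompletableFuture.supplyAsync(routingCall)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS) // slow call
                .exceptionally(e -> fallback)                                      // failed call
                .join();
    }
}
```

The timed-out routing call keeps running in the background here; a production client would also cancel the underlying request so slow calls do not pile up.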
Key Takeaway
Matching is a real-time auction, not a simple proximity search.
Balance latency vs accuracy: sub-2s dispatch needs OSRM caching and failover.
The second-price auction ensures fairness and efficiency at scale.
Matching Strategy Selection
If: Low driver density (< 10/sq km)
Use: Simple nearest-driver matching; accept longer ETAs
If: High density, but surge inactive
Use: Auction-based matching with distance as primary weight
If: High density + active surge
Use: Second-price auction; rider pays equilibrium price

Surge Pricing Engine

Surge pricing adjusts fares based on real-time supply (available drivers) and demand (ride requests). The calculation runs every 5 minutes per geographic zone (a set of H3 cells).

Algorithm:
  • Compute surge_multiplier = max(1.0, (demand * target_coverage) / supply).
  • target_coverage is the desired driver-to-rider ratio (e.g., 0.5 for 1 driver per 2 riders), so demand * target_coverage is the number of drivers the zone needs.
  • The multiplier is smoothed using an exponential moving average to avoid sudden spikes.
  • If supply drops below a threshold, the zone is marked "surge".
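
A worked example makes the formula easier to check. Taking target_coverage to mean drivers wanted per rider (0.5 is one driver per two riders), a zone needs demand × target_coverage drivers, and a balanced zone should price at 1.0. That requires the form (demand × target_coverage) / supply; dividing by (supply × target_coverage) instead would price the same balanced zone at 4.0. The class name and the smoothing weight below are assumptions for illustration.

```java
// Surge arithmetic as pure functions. Names and the alpha value are hypothetical.
final class SurgeMath {

    /** Drivers needed divided by drivers available, floored at 1.0. */
    static double rawMultiplier(int demand, int supply, double targetCoverage) {
        double requiredDrivers = demand * targetCoverage;
        // Clamp supply to 1 so an empty zone does not divide by zero.
        return Math.max(1.0, requiredDrivers / Math.max(1, supply));
    }

    /** Exponential moving average: alpha is the weight given to the newest sample. */
    static double smooth(double previous, double raw, double alpha) {
        return alpha * raw + (1 - alpha) * previous;
    }
}
```

With 200 requests and 100 drivers the raw multiplier is 1.0; with 600 requests it is 3.0. Smoothing with alpha = 0.3 from a previous value of 1.0 moves the published multiplier only about 30% of the way, to roughly 1.6, which is what damps sudden spikes.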

Implementation: A separate Kafka stream processor consumes ride_request and driver_online events per zone, aggregates, then broadcasts the multiplier to a Redis cache. The matching service reads the multiplier from Redis, and the rider app displays the surge notification before confirming.

Uber also uses heatmaps to proactively send drivers notifications about potential surge areas.

io/thecodeforge/pricing/SurgeCalculator.java · Java
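package io.thecodeforge.pricing;

public class SurgeCalculator {
    private static final double TARGET_COVERAGE = 0.5; // desired drivers per rider
    private static final double SMOOTHING = 0.3;       // EMA weight of the newest sample

    // SurgeStore (a thin Redis wrapper returning Optional<Double>) and NotificationService are assumed types.
    private final SurgeStore redis;
    private final NotificationService notificationService;

    public SurgeCalculator(SurgeStore redis, NotificationService notificationService) {
        this.redis = redis;
        this.notificationService = notificationService;
    }

    public void process(SupplyDemandEvent event) {
        // Drivers needed to hit the target ratio, divided by drivers available.
        // Supply is clamped to 1 so an empty zone does not divide by zero.
        double required = event.demand() * TARGET_COVERAGE;
        double rawSurge = Math.max(1.0, required / Math.max(1, event.supply()));
        // Exponential moving average; a zone with no history starts from 1.0
        String key = "surge:" + event.zoneId();
        double previousSurge = redis.get(key).orElse(1.0);
        double newSurge = SMOOTHING * rawSurge + (1 - SMOOTHING) * previousSurge;
        redis.set(key, newSurge);
        // Broadcast to riders via push notification
        notificationService.broadcastSurge(event.zoneId(), newSurge);
    }
}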
Output
Surge multiplier updated in Redis and broadcast.
Surge Pricing Pitfall
If the Kafka topic that feeds supply/demand events experiences lag, the surge calculation becomes stale. A lag of 5 minutes can cause multipliers to reflect old conditions, leading to either overpricing (rider churn) or underpricing (driver shortage). Monitor consumer lag with Burrow.
Production Insight
We once had a Kafka rebalance that caused a 10-minute gap in supply events.
The surge multiplier stayed at 1.0 while demand soared: rides were underpriced and drivers had no surge incentive.
Fix: use event-time windowing and ignore late-arriving events to avoid double-counting.
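
Event-time windowing with a lateness bound can be sketched in a few lines. Stream processors typically provide this as a built-in windowing feature; the version below is a hand-rolled illustration with hypothetical names (EventTimeWindows, allowedLatenessMillis) so the mechanics are visible.

```java
import java.util.HashMap;
import java.util.Map;

// Tumbling event-time windows that drop events arriving after their window closed.
// Names are hypothetical; this is not a stream-processing library API.
final class EventTimeWindows {
    private final long windowMillis;
    private final long allowedLatenessMillis;
    private final Map<Long, Integer> countsByWindowStart = new HashMap<>();
    private long maxEventTimeSeen = Long.MIN_VALUE;

    EventTimeWindows(long windowMillis, long allowedLatenessMillis) {
        this.windowMillis = windowMillis;
        this.allowedLatenessMillis = allowedLatenessMillis;
    }

    /** Counts the event in the window of its own timestamp; returns false if it was too late. */
    boolean add(long eventTimeMillis) {
        maxEventTimeSeen = Math.max(maxEventTimeSeen, eventTimeMillis);
        // The watermark trails the newest event time by the allowed lateness.
        long watermark = maxEventTimeSeen - allowedLatenessMillis;
        long windowStart = eventTimeMillis - Math.floorMod(eventTimeMillis, windowMillis);
        long windowEnd = windowStart + windowMillis;
        if (windowEnd <= watermark) {
            return false; // window already closed: drop instead of double-counting
        }
        countsByWindowStart.merge(windowStart, 1, Integer::sum);
        return true;
    }

    int count(long windowStart) {
        return countsByWindowStart.getOrDefault(windowStart, 0);
    }
}
```

Each event lands in the window its own timestamp belongs to, no matter when it is processed, so a backlog replayed after a rebalance does not inflate the current window.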
Key Takeaway
Surge pricing is a real-time feedback loop.
Stale data causes either lost revenue or lost riders.
Always window events by event-time, not processing-time.

Payment & Trip Execution

Uber's payment system processes tens of millions of transactions daily across 50+ currencies. The core challenge is exactly-once payment capture — you never want to charge a rider twice or miss a driver payout.

The solution: idempotency keys. Before initiating any payment, the client generates a UUID (the idempotency key) and sends it along with the request. The payment service stores this key, together with the result, in Redis with a TTL long enough to outlast client retries. If the same key appears again, the service returns the previous response without re-executing.

For cross-region payments (e.g., a rider from New York pays for a trip in Paris), the system uses a saga pattern with compensating actions. The steps:

  1. Capture rider payment (source account).
  2. Pay out to the driver (destination account).
  3. Apply Uber commission.
  4. If any step fails, compensate in reverse order: claw back the driver payout, then reverse the capture.
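
The saga steps above can be sketched as an ordered list of actions, each paired with its compensation. This is a minimal in-process illustration with hypothetical names (PaymentSaga, Step); a real coordinator would persist progress so that it survives a crash between steps.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Minimal saga runner: execute in order, compensate in reverse. Names are hypothetical.
final class PaymentSaga {

    interface Step {
        String name();
        void execute() throws Exception;
        void compensate();
    }

    /** Returns true if every step succeeded; otherwise undoes the completed steps and returns false. */
    static boolean run(List<Step> steps, List<String> log) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.execute();
                log.add("done:" + step.name());
                completed.push(step);
            } catch (Exception e) {
                log.add("failed:" + step.name());
                // Undo in reverse order, most recent step first.
                while (!completed.isEmpty()) {
                    Step undo = completed.pop();
                    undo.compensate();
                    log.add("undo:" + undo.name());
                }
                return false;
            }
        }
        return true;
    }

    /** Test helper: a step that optionally fails and has a no-op compensation. */
    static Step step(String name, boolean fails) {
        return new Step() {
            public String name() { return name; }
            public void execute() throws Exception {
                if (fails) throw new Exception(name + " failed");
            }
            public void compensate() { }
        };
    }
}
```

If the payout step fails after a successful capture, the log reads done:capture, failed:payout, undo:capture, and the commission step never runs.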

The trip execution state machine runs on a Kafka-backed stream: states go from REQUESTED → MATCHED → EN_ROUTE → ON_TRIP → COMPLETED → SETTLED.
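
The trip lifecycle can be sketched as an explicit transition table. The states come from the sequence above; the class name and the decision to ignore illegal transitions (rather than throw) are assumptions for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Trip lifecycle as an explicit transition table. Class name is hypothetical.
final class TripStateMachine {

    enum State { REQUESTED, MATCHED, EN_ROUTE, ON_TRIP, COMPLETED, SETTLED }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.REQUESTED, EnumSet.of(State.MATCHED));
        ALLOWED.put(State.MATCHED, EnumSet.of(State.EN_ROUTE));
        ALLOWED.put(State.EN_ROUTE, EnumSet.of(State.ON_TRIP));
        ALLOWED.put(State.ON_TRIP, EnumSet.of(State.COMPLETED));
        ALLOWED.put(State.COMPLETED, EnumSet.of(State.SETTLED));
        ALLOWED.put(State.SETTLED, EnumSet.noneOf(State.class)); // terminal
    }

    private State current = State.REQUESTED;

    State current() {
        return current;
    }

    /** Applies the transition if legal; duplicate or out-of-order events are ignored. */
    boolean apply(State next) {
        if (!ALLOWED.get(current).contains(next)) {
            return false;
        }
        current = next;
        return true;
    }
}
```

Rejecting illegal transitions makes the consumer tolerant of redelivered events: a second MATCHED event for the same trip is simply a no-op. Branches such as cancellation would add entries to the table.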

io/thecodeforge/payment/PaymentCapture.java · Java
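package io.thecodeforge.payment;

import java.util.UUID;

public class PaymentCapture {
    private static final int KEY_TTL_SECONDS = 86_400; // keep keys for 24 hours
    private static final String IN_PROGRESS = "IN_PROGRESS";

    private final RedisClient idempotencyStore;
    private final PaymentGateway gateway;

    public PaymentCapture(RedisClient idempotencyStore, PaymentGateway gateway) {
        this.idempotencyStore = idempotencyStore;
        this.gateway = gateway;
    }

    // amountMinorUnits is an integer count of cents/pence; floating point is unsafe for money.
    public CaptureResult capture(UUID idempotencyKey, long amountMinorUnits, String currency) {
        String key = "capture:" + idempotencyKey;
        // Claim the key atomically (SET NX EX semantics). A separate exists() check followed by a
        // write lets two concurrent retries both pass the check and charge twice.
        // setIfAbsent and delete are assumed methods on this RedisClient wrapper.
        if (!idempotencyStore.setIfAbsent(key, IN_PROGRESS, KEY_TTL_SECONDS)) {
            return CaptureResult.ALREADY_PROCESSED;
        }
        // Execute payment
        try {
            CaptureResult result = gateway.capture(amountMinorUnits, currency);
            idempotencyStore.setex(key, KEY_TTL_SECONDS, result.id());
            return result;
        } catch (NetworkException e) {
            // Outcome unknown: release the claim so the caller can retry with the same key.
            // This is only safe if the gateway also deduplicates on its side.
            idempotencyStore.delete(key);
            throw new RetryableException(e);
        }
    }
}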
Output
CaptureResult with provider transaction ID.
Production Insight
A race condition in saga coordination once paid a driver before capturing the rider's money.
The rider's card declined, but the driver was already paid.
Fix: use a transactional outbox pattern — write the saga steps to a database table, process them in order.
Key Takeaway
Idempotency keys are non-negotiable for payment systems.
Saga pattern works, but must be carefully ordered and compensated.
Always use transactional outboxes for coordination — not just Kafka.

Scaling, Fault Tolerance & Real-World Incidents

Uber's system must survive: single data center failure, sudden demand spikes (New Year's Eve), driver app disconnections, network partitions, and rogue deployments. Key strategies:

  1. Regional isolation: each city runs independent stacks. If one region fails, others are unaffected.
  2. Graceful degradation: if the matching service cannot compute ETAs, it falls back to linear distance matching. If payment fails, riders can still complete the trip and pay later.
  3. Auto-scaling: all stateless services (matching, pricing, ETA) scale based on CPU and request queue depth. Cassandra and Redis clusters are sharded and replicated.
  4. Chaos engineering: Uber runs regular failure drills: kill random pods, inject latency into Kafka, throttle Cassandra nodes.
  5. Circuit breakers: every synchronous call (gRPC) has a circuit breaker. When error rate exceeds 50%, the circuit opens and the caller uses a fallback (e.g., cached data).
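
The open, half-open, and closed behavior of a circuit breaker can be sketched without any library. This is an illustrative, single-threaded sketch with hypothetical names; it counts errors cumulatively rather than over a rolling window, and it is not thread-safe.

```java
import java.util.function.Supplier;

// Dependency-free circuit breaker sketch. Not thread-safe; names are hypothetical.
final class SimpleCircuitBreaker {
    private final int minCalls;            // do not trip before this many calls
    private final double errorThreshold;   // e.g. 0.5 for 50%
    private final long sleepWindowMillis;  // how long to stay open before a trial call
    private int calls;
    private int failures;
    private long openedAtMillis = -1;      // -1 means the circuit is closed

    SimpleCircuitBreaker(int minCalls, double errorThreshold, long sleepWindowMillis) {
        this.minCalls = minCalls;
        this.errorThreshold = errorThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    boolean isOpen(long nowMillis) {
        return openedAtMillis >= 0 && nowMillis - openedAtMillis < sleepWindowMillis;
    }

    <T> T call(Supplier<T> primary, Supplier<T> fallback, long nowMillis) {
        if (isOpen(nowMillis)) {
            return fallback.get();              // open: skip the remote call entirely
        }
        boolean trial = openedAtMillis >= 0;    // sleep window elapsed: half-open trial call
        try {
            T result = primary.get();
            if (trial) {                        // trial succeeded: close and start counting afresh
                openedAtMillis = -1;
                calls = 0;
                failures = 0;
            }
            calls++;
            return result;
        } catch (RuntimeException e) {
            calls++;
            failures++;
            boolean tripped = trial || (calls >= minCalls && failures >= errorThreshold * calls);
            if (tripped) {                      // open (or re-open) the circuit
                openedAtMillis = nowMillis;
                calls = 0;
                failures = 0;
            }
            return fallback.get();
        }
    }
}
```

The minimum-call guard matters in practice: without it, a single failure on the first request would count as a 100% error rate and open the circuit immediately.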

A real incident from 2020: A bug in the H3 library caused all new driver pings to be placed in the same hex cell. Suddenly, all riders in a city saw drivers at a single point. The fix required a hotpatch rolled out via the driver app's feature switch system.

io/thecodeforge/infra/MatchingServiceClient.java · Java
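package io.thecodeforge.infra;

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;
import java.util.List;

public class MatchingServiceClient {
    private final HystrixCommand.Setter config;
    // MatchingGrpcClient and DriverCache are assumed collaborator types, injected by the caller.
    private final MatchingGrpcClient grpcClient;
    private final DriverCache cache;

    public MatchingServiceClient(MatchingGrpcClient grpcClient, DriverCache cache) {
        this.grpcClient = grpcClient;
        this.cache = cache;
        this.config = HystrixCommand.Setter
            .withGroupKey(HystrixCommandGroupKey.Factory.asKey("MatchingService"))
            .andCommandPropertiesDefaults(
                HystrixCommandProperties.Setter()
                    .withCircuitBreakerErrorThresholdPercentage(50)      // open at 50% errors
                    .withCircuitBreakerSleepWindowInMilliseconds(10_000) // try again after 10 s
                    .withExecutionTimeoutInMilliseconds(500)             // fail calls slower than 500 ms
            );
    }

    public List<Driver> getNearbyDrivers(double lat, double lng) {
        // A HystrixCommand instance is single-use, so a new one is built per call.
        return new HystrixCommand<List<Driver>>(config) {
            @Override
            protected List<Driver> run() throws Exception {
                return grpcClient.findNearby(lat, lng);
            }
            @Override
            protected List<Driver> getFallback() {
                // Runs on failure, timeout, or open circuit: serve the cached driver list for this area
                return cache.getDrivers(lat, lng);
            }
        }.execute();
    }
}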
Output
List of nearby drivers from cache if matching service is unavailable.
Mental Model: Distributed Systems Are Hard
  • Assume every network call can fail, every dependency can slow down, every message can be lost.
  • Design fallbacks that still offer a reasonable user experience (e.g., distance-only matching).
  • Test failures proactively: kill a container, throttle a database, partition a network.
  • Monitor the right metrics: p99 latency, error rates, consumer lag, cache hit ratio.
Production Insight
During a wide-scale AWS us-east-1 outage, Uber's failover to other regions worked — but the surge pricing lagged 10 minutes behind because it was processing old supply events.
Rule: always have a mechanism to ignore stale input; use event-time semantics.
Key Takeaway
At Uber's scale, failure is not an if — it's a when.
Design for graceful degradation with circuit breakers and fallbacks.
Regular chaos engineering is the only way to validate resilience.
● Production incident · POST-MORTEM · severity: high

The 2019 Location Data Blackout

Symptom
Riders complained that available drivers were shown as kilometers away, or that no drivers appeared on the map. Dispatch times increased 10x.
Assumption
The location data served from a read replica was consistent after the upgrade.
Root cause
Cassandra's read-repair mechanism, combined with a long compaction backlog, returned stale tombstones for driver locations during a rolling upgrade. The matching service wasn't checking timestamp recency before using location data.
Fix
  1. Implemented read-repair throttling to prevent stale data propagation.
  2. Added a recency check in the matching service: discard any location ping older than 30 seconds.
  3. Moved to consistent reads (CL=LOCAL_QUORUM) for the primary location table.
Key lesson
  • Eventual consistency is not safe for geo-fencing queries that need seconds-fresh data.
  • Always test read-repair behavior under compaction load before upgrades.
  • Add defensive timestamp validation in downstream services.
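
The reasoning behind quorum reads can be checked with simple arithmetic: a read is guaranteed to reach at least one replica holding the latest acknowledged write only when the read and write replica counts overlap, that is, when R + W > RF. The sketch below encodes that rule; the class name is hypothetical, and with LOCAL_QUORUM the counts apply to the replicas within one data center.

```java
// Replica-overlap arithmetic for tunable consistency. Class name is hypothetical.
final class QuorumMath {

    /** Majority of the replicas: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read must touch at least one replica that acknowledged the latest write. */
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

With a replication factor of 3, quorum is 2, so quorum reads paired with quorum writes overlap (2 + 2 > 3), while reading and writing at ONE does not (1 + 1 ≤ 3). Quorum reads alone are not enough if writes are still acknowledged by a single replica.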
Production debug guide
Symptom → Action patterns for common ride-hailing failures · 4 entries
Symptom · 01
Rider sees no nearby drivers, but drivers are online
Fix
Check location service health: GET /_health. Verify Cassandra read latency for driver_location table. If >50ms, check compaction activity.
Symptom · 02
Matching timeout (>5s) during peak hours
Fix
Inspect Redis cache for hot keys (redis-cli --hotkeys). Increase matcher-service read replicas. Disable surge recalculations temporarily.
Symptom · 03
Payment double-charges reported by users
Fix
Check idempotency key store (Redis) for missing keys. Verify saga compensation logs in payment-service. Confirm Kafka consumer offset lag.
Symptom · 04
Surge multiplier stuck at 1.0 despite high demand
Fix
Check surge-pricing-service metrics: supply vs demand ratio. Verify Kafka topic feed of driver/rider counts. Restart surge-pricing-worker pods if consumer lag spikes.
★ Uber Quick Debug Cheat Sheet
Commands to diagnose the top 3 production incidents without escalating to SRE.
High matching latency (>5s)
Immediate action
Check matcher-service latency percentiles in Prometheus
Commands
kubectl exec matcher-service-0 -- curl -s localhost:8080/metrics | grep dispatch_latency
kubectl logs matcher-service-0 --tail=100 | grep 'TIMEOUT'
Fix now
Scale matcher-service replicas to 5 and disable surge recalculation in surge-service configmap
Stale driver location on rider map
Immediate action
Verify timestamp of driver's last location ping in Cassandra
Commands
cqlsh -e "SELECT driver_id, epoch_min, ts FROM driver_location WHERE driver_id='xxx';"
redis-cli get driver:xxx:location | jq '.timestamp'
Fix now
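Drop stale driver entries from Cassandra: DELETE FROM driver_location WHERE driver_id='xxx' AND epoch_min=... Each DELETE writes a new tombstone, so use this sparingly and rely on the reader-side recency check for routine staleness.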
Payment capture fails for certain currencies
Immediate action
Check payment-service logs for currency unsupported error
Commands
kubectl logs payment-service-b7d8f --tail=50 | grep 'currency'
curl payment-service:8080/currency-rates | grep 'EUR'
Fix now
Add EUR to currency_routes.json in payment-service config and restart deployment
Key Technology Choices
Component | Choice | Alternatives | Why This Choice?
Location store | Cassandra (multi-master) | PostgreSQL (single-master), DynamoDB | Cassandra provides multi-region write availability with tunable consistency, which is essential for global drivers updating their location from anywhere.
Message broker | Apache Kafka | RabbitMQ, AWS SQS | Kafka's partitioned log allows replay and per-partition ordering guarantees, critical for event-driven state machines (trip lifecycle).
Geospatial index | H3 hex grid | Google S2, Redis Geo | H3 gives uniform neighbor distances: all six neighbors of a hexagon are equidistant, whereas a square's corner neighbors are farther than its edge neighbors. Also open-source and efficient for proximity queries.
API gateway | Envoy proxy | NGINX, Kong | Envoy provides advanced circuit breaking, retry budgets, and hot reload, which are critical for managing inter-service traffic at 100k+ QPS.
Payment database | PostgreSQL (sharded) | MySQL, Cassandra | Payments require strong ACID for ledger entries. PostgreSQL's ability to handle complex joins and transactions per shard wins over NoSQL.

Key takeaways

  1. Uber's architecture is a masterclass in trade-offs: availability vs consistency, latency vs accuracy, monolith vs microservices.
  2. Geospatial indexing with H3 and Cassandra solves location tracking at global scale, but staleness must be handled at the application layer.
  3. Matching is a real-time auction that balances driver incentives and rider satisfaction: a second-price auction achieves both.
  4. Surge pricing is a feedback loop that requires fresh data; always use event-time windowing to avoid stale multipliers.
  5. Payment systems must be idempotent and fault-tolerant: the saga pattern with transactional outboxes prevents financial errors.
  6. Scale forces graceful degradation: circuit breakers, fallbacks, and chaos engineering are not optional.

Common mistakes to avoid

4 patterns

Memorising syntax before understanding the concept
Symptom
Cannot adapt when asked to modify the design; no grasp of trade-offs.
Fix
Focus on constraints (latency, consistency, cost) first, then the specific tools. Practice by explaining choices aloud.

Skipping practice and only reading theory
Symptom
Freeze in interviews when asked to whiteboard the flow; no muscle memory for drawing architecture diagrams.
Fix
Use real-world case studies (Uber tech blog, AWS re:Invent talks) and draw the architecture from memory.

Ignoring constraints like latency and consistency
Symptom
Design that is theoretically correct but impossible to implement at Uber's scale (e.g., synchronous ACID transactions across shards).
Fix
Always quantify latency, throughput, and consistency requirements before proposing solutions. Mention trade-offs explicitly.

Assuming a monolith can't be scaled
Symptom
Premature decomposition into microservices leads to distributed monolith hell.
Fix
Start with a monolith and split only when boundaries are proven by team structure and data access patterns.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
How would you design Uber's location service to handle 5 million GPS pin...
Q02 · SENIOR
Explain how Uber's surge pricing algorithm works. What happens if the Ka...
Q03 · SENIOR
How would you ensure exactly-once payment processing for Uber rides acro...
Q04 · SENIOR
Describe the trade-offs between using Cassandra and PostgreSQL for Uber'...
Q01 of 04 · SENIOR

How would you design Uber's location service to handle 5 million GPS pings per second?

ANSWER
I would use a wide-column store like Cassandra with a time-bucketed table keyed by (driver_id, epoch_minute) and a TTL. Drivers write pings every 4s. A geospatial index (e.g., H3 cells) is maintained in parallel so that proximity queries are efficient. For reads, the matching service queries the secondary table by H3 cell and epoch minute, caching recent results. At 5M/s, we'd need many Cassandra nodes per region with multi-homing for writes. I would also buffer writes in a lightweight in-memory queue and flush them to Cassandra in small batches to reduce write pressure.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
What is the biggest difference between Uber's system design and a typical web app?
02
Why did Uber move from a monolith to microservices?
03
How does Uber handle network partitions between data centers?
04
What metrics should Uber SRE monitor most closely?