Senior 6 min · June 25, 2026

Design Ticketmaster: The System Design That Handles 10M Concurrent Requests Without Crashing

Design a Ticketmaster-style system that handles 10M concurrent users.

Naren, Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.

Production tested · Last updated June 25, 2026 · 1,663 articles, all by Naren
Quick Answer

To design Ticketmaster, use a sharded relational database for inventory, Redis for hot data caching, a queue-based async order processing system, and a rate limiter at the API gateway to prevent stampedes. The key is to reserve inventory for a short window (e.g., 5 minutes) and release unconfirmed reservations back to the pool.

Definition (~90s read)
What is "Design Ticketmaster"?

"Design Ticketmaster" is a classic system design problem: build a high-concurrency, low-latency event ticketing platform that handles millions of concurrent users during on-sale events, never oversells, and gives buyers fair access.

Plain-English First

Imagine a stadium with 50,000 seats and 500,000 people trying to buy tickets at the exact same moment. You can't let everyone run to the box office at once — that's a riot. Instead, you let people line up virtually (queue), give each person a short time to pick seats (reservation), and if they don't pay in time, you kick them out and let the next person try. The trick is to never sell the same seat twice, even when millions are clicking simultaneously.

Here's the nightmare: 10 million people hit F5 at 10:00 AM for Taylor Swift tickets. Your database melts. The site goes down. Twitter explodes. This isn't hypothetical — I've seen it happen to a major ticketing platform that shall remain nameless. The root cause? They treated ticket sales like a regular e-commerce checkout. Wrong. Ticket sales are a stampede, not a shopping trip. The problem is simple: supply is fixed (50,000 seats), demand is insane (millions of users), and every seat must be sold exactly once. No overselling. No double-booking. No angry mobs. After this article, you'll be able to design a system that survives a Taylor Swift on-sale without a scratch. You'll understand database sharding for inventory, Redis-based seat reservations with TTLs, queue-based order processing to absorb spikes, and rate limiting that doesn't punish legitimate users. Let's get into it.

Why Your Database Will Die Without Sharding

The first thing everyone gets wrong: they put all event inventory in one database table on one primary. During a hot on-sale, every user queries that table to see available seats, then tries to lock a seat with SELECT FOR UPDATE. Those are row locks, not a table lock, but every contended row, every waiting transaction, and every held connection lands on the same primary, and the database grinds to a halt. The fix: shard by event, so writes for different events never touch the same database. Even within one event you may need finer granularity, so shard by section or even row. For a stadium with 50,000 seats, splitting into 10 shards of 5,000 seats each cuts the load on any one database roughly tenfold (it does not help two users fighting over the same seat; that row still lives in one place). Use a consistent hashing scheme to map event+section to a shard, and have the routing layer (API gateway or a lightweight proxy) read the shard map from ZooKeeper or etcd. If you would rather place each new event on the least-loaded shard, record that explicit assignment in the shard map instead of hashing. Pro tip: a simpler alternative is to pre-create a fixed number of logical shards (e.g., 64), map events with hash(event_id) % 64, and assign logical shards to physical databases in the shard map. Event IDs never rehash; growing the cluster only moves whole logical shards between databases.

ShardRouting.systemdesign
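// io.thecodeforge — System Design tutorial

// Shard routing logic for event inventory.
// 64 fixed logical shards are mapped onto physical pools. This is modulo hashing
// over logical shards, not a consistent-hash ring: event IDs never rehash, because
// only the logical-to-physical map changes when pools are added or moved.
// DatabasePool and SeatUnavailableException (unchecked) are this article's own types.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardRouter {
    private static final int NUM_SHARDS = 64;
    private final Map<Integer, DatabasePool> shardPools;

    public ShardRouter(List<DatabasePool> pools) {
        // Spread the 64 logical shards across however many physical pools exist
        shardPools = new HashMap<>();
        for (int i = 0; i < NUM_SHARDS; i++) {
            shardPools.put(i, pools.get(i % pools.size()));
        }
    }

    public DatabasePool getShardForEvent(String eventId) {
        // floorMod never goes negative (Math.abs(Integer.MIN_VALUE) does)
        int shardId = Math.floorMod(eventId.hashCode(), NUM_SHARDS);
        return shardPools.get(shardId);
    }

    public void reserveSeat(String eventId, String seatId, String userId) {
        DatabasePool pool = getShardForEvent(eventId);
        try (Connection conn = pool.getConnection()) {
            // FOR UPDATE locks only last until commit. JDBC defaults to autocommit,
            // so without this the SELECT and UPDATE would run in separate transactions.
            conn.setAutoCommit(false);
            try {
                // NOWAIT fails immediately instead of queueing behind another transaction
                String select = "SELECT status FROM seats WHERE event_id = ? AND seat_id = ? FOR UPDATE NOWAIT";
                String status;
                try (PreparedStatement stmt = conn.prepareStatement(select)) {
                    stmt.setString(1, eventId);
                    stmt.setString(2, seatId);
                    try (ResultSet rs = stmt.executeQuery()) {
                        status = rs.next() ? rs.getString("status") : null;
                    }
                }
                if (!"available".equals(status)) {
                    throw new SeatUnavailableException("Seat " + seatId + " is not available");
                }
                // Mark reserved; reserved_at lets a cleanup job expire stale reservations
                String update = "UPDATE seats SET status = 'reserved', reserved_by = ?, reserved_at = NOW() WHERE event_id = ? AND seat_id = ?";
                try (PreparedStatement stmt = conn.prepareStatement(update)) {
                    stmt.setString(1, userId);
                    stmt.setString(2, eventId);
                    stmt.setString(3, seatId);
                    stmt.executeUpdate();
                }
                conn.commit();
            } catch (SQLException | RuntimeException e) {
                conn.rollback(); // release the row lock before the connection goes back to the pool
                throw e;
            }
        } catch (SQLException e) {
            if ("55P03".equals(e.getSQLState())) { // PostgreSQL lock_not_available (NOWAIT)
                throw new SeatUnavailableException("Seat is locked by another transaction");
            }
            throw new RuntimeException(e);
        }
    }
}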
Output
No output; this is a design pattern. The key behavior: requests for different events usually hit different shards (unless two events hash to the same logical shard), so they never contend for the same locks. This router keys on event_id only; sending different sections of one event to different shards would mean routing on event_id plus section.
Production Trap: SELECT FOR UPDATE NOWAIT
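If you forget NOWAIT, every request for a hot seat queues behind the current lock holder. Thousands of transactions sit waiting until the lock timeout fires (the default varies by database: MySQL's innodb_lock_wait_timeout is 50 seconds, and PostgreSQL waits indefinitely unless lock_timeout is set), and your connection pool is exhausted in the meantime. Errors: 'Connection pool exhausted' or 'Lock wait timeout exceeded'. Always use NOWAIT and handle the lock-not-available error gracefully.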
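The paragraph above recommends consistent hashing on event+section, while the router uses a fixed 64-way modulo on event_id alone. For comparison, here is a minimal consistent-hash ring in Java. It is a sketch, not the article's implementation: the CRC32 hash, the number of virtual nodes per shard, and the shard names are arbitrary assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical consistent-hash ring keyed on "eventId:section".
// Each physical shard owns many virtual points on a 32-bit ring; a key belongs
// to the first point at or after its own hash, wrapping around at the end.
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // CRC32 is a fixed, documented checksum, so every router instance computes the same position
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // unsigned 32-bit value held in a long
    }

    void addShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(shard + "#" + i), shard);
        }
    }

    void removeShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(shard + "#" + i));
        }
    }

    String shardFor(String eventId, String section) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no shards registered");
        }
        long h = hash(eventId + ":" + section);
        Map.Entry<Long, String> owner = ring.ceilingEntry(h);
        // Past the last point: wrap around to the first one
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }
}
```

Adding a fourth shard only moves the keys that fall just before its new points on the ring (roughly a quarter of them with uniform hashing), and every moved key goes to the new shard. With a plain `hash % N`, changing N would remap most keys, which is why the modulo approach pins N to a fixed logical-shard count instead.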
[Figure: Ticketmaster System Design for 10M Concurrent Users. Flow from sharded DB to queue processing and payment: Sharded Database, Redis Reservation Buffer, Queue-Based Order Processing, Rate Limiting, Payment Gateway, Waiting Room.]

Redis: The Seat Reservation Buffer That Saves Your Database

Even with sharding, your database can't handle millions of SELECT FOR UPDATE calls per second. That's where Redis comes in. Use Redis as the hot layer for seat availability and check it before touching the database. But here's the trick: don't just cache the seat list. Reserve each seat atomically with a Lua script (or a WATCH/MULTI/EXEC transaction), which keeps seat selection to a single sub-millisecond server-side operation. Each hold has a TTL (e.g., 5 minutes). If the user doesn't complete payment in time, the hold expires and the seat is available again; a background job releases the matching database reservation. This is the 'soft reservation' pattern. The database is the source of truth, but Redis absorbs the burst. When a seat is held in Redis, write a reservation record to the database asynchronously. If Redis goes down, fall back to the database (with degraded performance). Pro tip: use Redis Cluster for high availability. Store each event's seat states in a single hash, event:{eventId}:seats (field = seatId, value = available or sold), and each active hold in its own key, event:{eventId}:hold:{seatId}, whose value is the userId and which carries the TTL. Don't put the TTL on the hash: an EXPIRE on event:{eventId}:seats would drop every seat in the event at once, and re-setting it on each reservation would keep stale holds alive for as long as sales continue. The shared {eventId} hash tag puts both keys in the same Cluster slot, which a multi-key script requires. The Lua script: if the seat is available and nobody holds it, create the hold with SET NX EX and return success; otherwise return failure. On payment success, HSET the seat to sold and DEL the hold.

RedisReservation.lua
-- io.thecodeforge — System Design tutorial

-- Lua script for atomic seat reservation in Redis
-- Called via EVALSHA for performance

-- KEYS[1] = event:{eventId}:seats          (hash: seatId -> 'available' | 'sold')
-- KEYS[2] = event:{eventId}:hold:{seatId}  (string: userId, expires after the TTL)
-- ARGV[1] = seatId
-- ARGV[2] = userId
-- ARGV[3] = ttlSeconds

local seatStatus = redis.call('HGET', KEYS[1], ARGV[1])
if seatStatus ~= 'available' then
    return 0  -- seat sold, or not loaded into Redis
end
-- SET NX EX creates the hold only if nobody holds this seat yet,
-- and the TTL applies to this one seat, not to the whole event
if redis.call('SET', KEYS[2], ARGV[2], 'NX', 'EX', ARGV[3]) then
    return 1  -- success
end
return 0  -- seat already held by another user
Output
Returns 1 if the hold was created, 0 if the seat is sold, not loaded, or already held by another user.
Senior Shortcut: Use Lua Scripts for Atomicity
Don't use WATCH/MULTI/EXEC for high contention — they cause retries and wasted round trips. Lua scripts execute atomically on the Redis server. One round trip, no race conditions. Use EVALSHA with script caching for maximum performance.
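To make the soft-reservation rules concrete (and testable without a Redis server), here is a pure-Java model of them: a hold expires on its own, an expired hold frees only its own seat, and only the current holder can turn a hold into a sale. It is a hypothetical test double with an explicit clock, not a Redis client and not how Redis implements TTLs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of soft reservations, with the clock passed in.
// Seat state ("available" / "sold") and holds are tracked separately, and each
// hold carries its own expiry, so one lapsed hold frees exactly one seat.
final class SeatHoldModel {
    private final Map<String, String> seatStatus = new HashMap<>();  // seatId -> "available" | "sold"
    private final Map<String, String> holdOwner = new HashMap<>();   // seatId -> userId
    private final Map<String, Long> holdExpiresAt = new HashMap<>(); // seatId -> epoch millis

    synchronized void addSeat(String seatId) {
        seatStatus.put(seatId, "available");
    }

    private boolean liveHold(String seatId, long nowMillis) {
        Long expiry = holdExpiresAt.get(seatId);
        return expiry != null && expiry > nowMillis;
    }

    // Succeeds only for an available seat with no live hold (like SET NX with a TTL)
    synchronized boolean reserve(String seatId, String userId, long ttlMillis, long nowMillis) {
        if (!"available".equals(seatStatus.get(seatId)) || liveHold(seatId, nowMillis)) {
            return false;
        }
        holdOwner.put(seatId, userId);
        holdExpiresAt.put(seatId, nowMillis + ttlMillis);
        return true;
    }

    // Payment succeeded: only the current, unexpired holder may convert the hold into a sale
    synchronized boolean confirm(String seatId, String userId, long nowMillis) {
        if (!liveHold(seatId, nowMillis) || !userId.equals(holdOwner.get(seatId))) {
            return false; // hold lapsed, or the seat was re-held by someone else
        }
        seatStatus.put(seatId, "sold");
        holdOwner.remove(seatId);
        holdExpiresAt.remove(seatId);
        return true;
    }
}
```

Note the check in confirm: a payment that finishes after its hold lapsed must not mark the seat sold, because someone else may already hold it. That late payment needs a refund instead.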
[Figure: Seat Reservation Flow with Redis. How Redis buffers seat locks before DB writes: User Requests Seat, Redis Transaction, Reserve in Redis (hold with TTL), Async DB Write.]

Queue-Based Order Processing: Absorbing the Tsunami

Once a user reserves seats and proceeds to checkout, you can't process the order synchronously. Payment gateways are slow (500ms-2s). If you block on payment, your web server threads will exhaust. Instead, enqueue an order processing message and return a 'processing' status to the user. The queue (RabbitMQ, Kafka, or SQS) acts as a shock absorber. Consumers pick up messages and process payments. If payment succeeds, mark seats as sold. If it fails, release seats back to the pool. The queue must be durable and have dead-letter queues for failed messages. Important: the reservation TTL in Redis should be longer than the expected queue processing time. If TTL is 5 minutes and queue processing takes 30 seconds, you're safe. But if the queue backs up to 10 minutes, reservations will expire before processing. Monitor queue depth and alert if it exceeds a threshold. Autoscale consumers based on queue depth. Pro tip: use a priority queue for VIP users or higher-priced tickets.

OrderProcessor.systemdesign
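// io.thecodeforge — System Design tutorial

// Order processing consumer that handles payment and seat confirmation.
// QueueClient, PaymentGateway, SeatInventoryClient, OrderMessage (including its
// assumed getOrderId()) and NotificationService are this article's own types.
// Logging uses SLF4J.

import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderProcessor implements Runnable {
    private static final Logger logger = LoggerFactory.getLogger(OrderProcessor.class);
    private static final int MAX_RETRIES = 3;

    private final QueueClient queue;
    private final PaymentGateway paymentGateway;
    private final SeatInventoryClient inventoryClient;

    public OrderProcessor(QueueClient queue, PaymentGateway paymentGateway, SeatInventoryClient inventoryClient) {
        this.queue = queue;
        this.paymentGateway = paymentGateway;
        this.inventoryClient = inventoryClient;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            OrderMessage msg = queue.dequeue(30, TimeUnit.SECONDS); // long poll
            if (msg == null) continue;
            try {
                // Same key on every retry: if an earlier attempt charged the card and then
                // failed, the gateway returns that charge instead of creating a second one
                String idempotencyKey = msg.getOrderId() + ":" + msg.getUserId();
                PaymentResult result = paymentGateway.charge(idempotencyKey, msg.getUserId(), msg.getAmount());
                if (result.isSuccess()) {
                    // Confirm seat sale in database (must also be idempotent: it can run again on retry)
                    inventoryClient.confirmSeats(msg.getEventId(), msg.getSeatIds(), msg.getUserId());
                    notifySafely(() -> NotificationService.sendConfirmation(msg.getUserId(), msg.getEventId(), msg.getSeatIds()));
                } else {
                    // Payment declined: release seats back to pool
                    inventoryClient.releaseSeats(msg.getEventId(), msg.getSeatIds());
                    notifySafely(() -> NotificationService.sendPaymentFailed(msg.getUserId()));
                }
            } catch (Exception e) {
                // Transient error: requeue with linear backoff (5 s, 10 s, 15 s)
                if (msg.getRetryCount() < MAX_RETRIES) {
                    msg.incrementRetry();
                    queue.enqueue(msg, 5 * msg.getRetryCount(), TimeUnit.SECONDS); // delayed retry
                } else {
                    // Dead letter: manual intervention
                    queue.sendToDeadLetter(msg);
                    logger.error("Order processing failed after {} retries: {}", MAX_RETRIES, msg, e);
                }
            }
        }
    }

    // A failed email must not push a paid order back onto the queue
    private void notifySafely(Runnable notification) {
        try {
            notification.run();
        } catch (Exception e) {
            logger.warn("Notification failed", e);
        }
    }
}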
Output
No direct output. But the system behavior: orders are processed asynchronously, seats are confirmed only after payment success, and failures are retried with backoff.
The Classic Bug: Reservation TTL Shorter Than Queue Delay
If your queue backs up and processing takes longer than the Redis TTL, seats are released back to the pool while their payment is still in flight. Result: overselling. Always set the TTL to at least 2x the expected maximum queue delay. Queue depth is a count and the TTL is a time, so convert one into the other: alert when the estimated queue delay (queue depth divided by drain rate) exceeds TTL/2.
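Converting queue depth into an expected delay is simple arithmetic. A back-of-envelope helper in Java, assuming each consumer handles one order at a time at a steady average processing time; the class and method names are hypothetical.

```java
// Hypothetical back-of-envelope helper: assumes every consumer processes one
// order at a time at a steady average rate.
final class QueueBudget {
    private QueueBudget() {}

    // Seconds until the last message currently in the queue is processed
    static double estimatedDelaySeconds(long queueDepth, int consumers, double avgProcessingSeconds) {
        if (consumers <= 0) {
            return Double.POSITIVE_INFINITY; // nothing is draining the queue
        }
        double drainPerSecond = consumers / avgProcessingSeconds;
        return queueDepth / drainPerSecond;
    }

    // The callout's rule: alert once the expected delay passes half the reservation TTL
    static boolean shouldAlert(long queueDepth, int consumers, double avgProcessingSeconds, long ttlSeconds) {
        return estimatedDelaySeconds(queueDepth, consumers, avgProcessingSeconds) > ttlSeconds / 2.0;
    }

    // Consumers needed to drain the current backlog within targetSeconds
    static int consumersNeeded(long queueDepth, double avgProcessingSeconds, double targetSeconds) {
        return (int) Math.ceil(queueDepth * avgProcessingSeconds / targetSeconds);
    }
}
```

For example, with 10,000 queued orders, 50 consumers, and 1.5 s per payment, the backlog takes about 300 s to drain, which already exceeds half of a 5-minute TTL. Draining it within 150 s takes 100 consumers, which is the kind of number a depth-based autoscaler should be computing.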
[Figure: Synchronous vs Queue-Based Payment. Why blocking on payment kills throughput: synchronous (bad) vs queue-based (good).]

Rate Limiting: Don't Let the Stampede Through

Without rate limiting, bots and aggressive users will overwhelm your system. But naive rate limiting (e.g., 10 requests per second per IP) blocks legitimate users who share an IP behind a NAT. The solution: multi-layer rate limiting. First layer: a global rate limit at the API gateway (e.g., 1 million requests per minute). Second layer: a per-user rate limit keyed on user ID (e.g., 100 requests per minute). Third layer: a per-event rate limit (e.g., 10,000 requests per second per event). Use a sliding window algorithm (not a fixed window) to avoid bursts at window boundaries. Store the state in Redis with a TTL equal to the window size. For per-user limits, use a sorted set with request timestamps as scores. For the global limit, a simple counter with EXPIRE is cheaper, but note that it is a fixed window and tolerates boundary bursts. Pro tip: a token bucket is a good alternative for per-user limits when you want to allow short bursts. And always return a Retry-After header so clients can back off.

RateLimiter.systemdesign
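// io.thecodeforge — System Design tutorial

// Sliding window (log) rate limiter using Redis sorted sets (Jedis client)

import java.util.List;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class SlidingWindowRateLimiter {
    // A Jedis instance is not thread-safe: use one per thread or borrow from a JedisPool
    private final Jedis jedis;
    private final int limit;
    private final long windowSizeMillis;

    public SlidingWindowRateLimiter(Jedis jedis, int limit, long windowSizeMillis) {
        this.jedis = jedis;
        this.limit = limit;
        this.windowSizeMillis = windowSizeMillis;
    }

    public boolean allowRequest(String userId) {
        String key = "ratelimit:" + userId;
        long now = System.currentTimeMillis();
        long windowStart = now - windowSizeMillis;
        // Members must be unique, or two requests in the same millisecond collapse into one
        String member = now + ":" + UUID.randomUUID();

        // MULTI/EXEC runs the queued commands atomically
        Transaction t = jedis.multi();
        // Remove entries outside the window
        t.zremrangeByScore(key, 0, windowStart);
        // Count requests already in the window, before this one
        t.zcard(key);
        // Record this request. Rejected requests are recorded too, so a client
        // that keeps hammering stays blocked until it backs off.
        t.zadd(key, now, member);
        // Set TTL to avoid memory leaks
        t.expire(key, (int) (windowSizeMillis / 1000) + 1);
        List<Object> results = t.exec();

        long countBefore = (Long) results.get(1); // zcard result
        // Strict '<': countBefore == limit means this would be request number limit + 1
        return countBefore < limit;
    }
}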
Output
Returns true if request is allowed, false if rate limited. The Redis sorted set contains timestamps of recent requests.
Interview Gold: Sliding Window vs Fixed Window
Fixed window rate limiting (e.g., reset the counter every minute) lets a client send up to twice the limit in a short span straddling the boundary: a full quota at 0:59 and another at 1:00. Sliding window smooths it out. In production, use a sliding window for anything that protects the backend. Fixed window is only acceptable for non-critical rate limits like email sending.
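The rate limiting section also suggests a token bucket for per-user limits. A minimal single-process sketch in Java; the class is hypothetical, time is passed in explicitly so the behavior is deterministic, and a multi-server deployment would need the same state kept in Redis and updated atomically (for instance in a Lua script).

```java
// Hypothetical single-process token bucket; callers pass the current time so
// the behaviour is deterministic in tests.
final class TokenBucket {
    private final double capacity;       // largest burst allowed
    private final double refillPerMilli; // sustained rate
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full so a new user can burst immediately
        this.lastRefillMillis = nowMillis;
    }

    // Add the tokens earned since the last call, capped at capacity
    private void refill(long nowMillis) {
        long elapsed = Math.max(0, nowMillis - lastRefillMillis);
        tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
    }

    synchronized boolean tryAcquire(long nowMillis) {
        refill(nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    // Milliseconds until the next token, a natural basis for Retry-After (round up to seconds)
    synchronized long millisUntilNextToken() {
        return tokens >= 1.0 ? 0 : (long) Math.ceil((1.0 - tokens) / refillPerMilli);
    }
}
```

Capacity sets the burst and the refill rate sets the sustained limit, so "100 requests per minute with bursts of 10" would be `new TokenBucket(10, 100 / 60.0, now)`. Unlike the sorted-set limiter, the state per user is two numbers rather than one entry per request.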

Payment Gateways: The Weakest Link

Payment gateways are the most failure-prone part of the system. They can be slow, return errors, or time out after actually charging the card, and a naive retry then charges the customer twice. So never treat a timeout as a clean failure, and always use idempotency keys. Generate a unique key per order attempt. If the payment gateway times out, retry with the same idempotency key; gateways that support idempotency keys will deduplicate. Also, implement a payment reconciliation job that runs every hour. It compares our order records with the gateway's transaction logs. If a payment succeeded on the gateway but we didn't confirm the seat (e.g., due to a crash), the job fixes it. Pro tip: put a circuit breaker in front of the payment gateway. If the error rate exceeds 50% in a 1-minute window, stop sending requests and fail orders gracefully. Let a trial request through after 30 seconds to re-check.
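The reconciliation job boils down to a set comparison. A minimal Java sketch of that core, assuming both sides are keyed by the idempotency key sent with each charge. The enum and class are hypothetical, the second "confirmed but not paid" bucket is an illustrative addition, and a real job would page through the gateway's transaction API rather than receive an in-memory set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical core of the hourly reconciliation job. Both inputs are keyed by
// the idempotency key sent with each charge.
final class PaymentReconciler {
    enum OrderState { PENDING, CONFIRMED, RELEASED }

    static final class Findings {
        // Gateway took the money but we never confirmed: confirm the seats if the
        // hold is still ours, otherwise refund
        final List<String> paidButNotConfirmed = new ArrayList<>();
        // We confirmed but the gateway has no successful charge: investigate
        final List<String> confirmedButNotPaid = new ArrayList<>();
    }

    static Findings reconcile(Map<String, OrderState> orders, Set<String> gatewayCharged) {
        Findings findings = new Findings();
        for (String key : gatewayCharged) {
            // Missing order rows (e.g., lost in a crash) count as unconfirmed too
            if (orders.get(key) != OrderState.CONFIRMED) {
                findings.paidButNotConfirmed.add(key);
            }
        }
        for (Map.Entry<String, OrderState> order : orders.entrySet()) {
            if (order.getValue() == OrderState.CONFIRMED && !gatewayCharged.contains(order.getKey())) {
                findings.confirmedButNotPaid.add(order.getKey());
            }
        }
        return findings;
    }
}
```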

PaymentService.systemdesign
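// io.thecodeforge — System Design tutorial

// Payment service with idempotency and circuit breaker.
// PaymentGateway, CircuitBreaker, IdempotencyStore, PaymentResult and the
// exception types are this article's own (unchecked) types.

import java.math.BigDecimal;
import java.util.concurrent.TimeUnit;

public class PaymentService {
    private final PaymentGateway gateway;
    private final CircuitBreaker circuitBreaker;
    private final IdempotencyStore idempotencyStore;

    public PaymentService(PaymentGateway gateway, CircuitBreaker circuitBreaker, IdempotencyStore idempotencyStore) {
        this.gateway = gateway;
        this.circuitBreaker = circuitBreaker;
        this.idempotencyStore = idempotencyStore;
    }

    public PaymentResult charge(String userId, String orderId, BigDecimal amount) {
        if (!circuitBreaker.isAllowed()) {
            throw new CircuitBreakerOpenException("Payment gateway is down, try again later");
        }
        // One key per order attempt; the gateway deduplicates charges that reuse it
        String idempotencyKey = orderId + ":" + userId;
        // Short-circuit if we already recorded a final result for this key. Two workers
        // can still race past this check; the gateway-side key covers that case.
        PaymentResult cached = idempotencyStore.get(idempotencyKey);
        if (cached != null) {
            return cached;
        }
        try {
            PaymentResult result = gateway.charge(idempotencyKey, userId, amount);
            idempotencyStore.set(idempotencyKey, result, 24, TimeUnit.HOURS);
            circuitBreaker.recordSuccess();
            return result;
        } catch (GatewayDeclineException e) {
            // Payment declined: a final answer, not a system failure, so no retry
            PaymentResult declined = PaymentResult.declined(e.getReason());
            idempotencyStore.set(idempotencyKey, declined, 24, TimeUnit.HOURS);
            circuitBreaker.recordSuccess();
            return declined;
        } catch (RuntimeException e) {
            // GatewayTimeoutException and other gateway errors: the charge may or may not
            // have happened. Count it against the breaker; the caller retries with the same key.
            circuitBreaker.recordFailure();
            throw e;
        }
    }
}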
Output
Returns PaymentResult with success/failure. Idempotency ensures no double charges. Circuit breaker prevents cascading failures.
Senior Shortcut: Idempotency Keys Save Your Weekend
I've seen a payment gateway double-charge 10,000 users because of a network retry without idempotency. The fix: always generate a unique idempotency key per operation. Store it in Redis with a 24-hour TTL. If you get a timeout, retry with the same key. The gateway will return the original response.
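PaymentService assumes a CircuitBreaker with isAllowed(), recordSuccess() and recordFailure(). Here is a minimal single-process sketch of one matching the pro tip: it opens when more than 50% of calls in the last minute failed and lets one trial call through after 30 seconds. The class name, the 20-call minimum before tripping, and the explicit-clock methods are assumptions; libraries such as Resilience4j provide production-grade breakers.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical breaker: CLOSED -> OPEN when >50% of calls in the last 60 s
// failed (after at least MIN_CALLS), OPEN -> HALF_OPEN after 30 s, and a
// single trial call then decides between CLOSED and OPEN.
final class SimpleCircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private static final long WINDOW_MILLIS = 60_000;
    private static final long COOL_DOWN_MILLIS = 30_000;
    private static final double FAILURE_RATE_THRESHOLD = 0.5;
    private static final int MIN_CALLS = 20; // avoid tripping on a handful of early errors

    private final Deque<long[]> outcomes = new ArrayDeque<>(); // {timestampMillis, 1 = failure}
    private State state = State.CLOSED;
    private long openedAt;
    private boolean probeInFlight;

    synchronized boolean isAllowed(long now) {
        if (state == State.OPEN && now - openedAt >= COOL_DOWN_MILLIS) {
            state = State.HALF_OPEN;
            probeInFlight = false;
        }
        if (state == State.HALF_OPEN && !probeInFlight) {
            probeInFlight = true; // let exactly one trial request through
            return true;
        }
        return state == State.CLOSED;
    }

    synchronized void record(boolean failure, long now) {
        if (state == State.HALF_OPEN) {
            // The trial call decides; start a fresh window either way
            outcomes.clear();
            probeInFlight = false;
            if (failure) {
                state = State.OPEN;
                openedAt = now;
            } else {
                state = State.CLOSED;
            }
            return;
        }
        outcomes.addLast(new long[] {now, failure ? 1 : 0});
        // Drop outcomes older than the window
        while (!outcomes.isEmpty() && now - outcomes.peekFirst()[0] > WINDOW_MILLIS) {
            outcomes.pollFirst();
        }
        long failures = outcomes.stream().filter(o -> o[1] == 1).count();
        if (state == State.CLOSED && outcomes.size() >= MIN_CALLS
                && (double) failures / outcomes.size() > FAILURE_RATE_THRESHOLD) {
            state = State.OPEN;
            openedAt = now;
        }
    }

    // Wall-clock overloads matching the calls PaymentService makes
    boolean isAllowed() { return isAllowed(System.currentTimeMillis()); }
    void recordSuccess() { record(false, System.currentTimeMillis()); }
    void recordFailure() { record(true, System.currentTimeMillis()); }
}
```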

The Waiting Room: Queueing Users Before They Hit the Site

When 10 million users hit the site at once, even your best infrastructure will struggle. The solution: a virtual waiting room. Before users can even see the event page, they are placed in a queue. The queue assigns a position number and an estimated wait time. Users are admitted to the site in batches (e.g., 10,000 per minute). This is essentially a distributed rate limiter at the application level. Implement the waiting room using a Redis sorted set with the user's arrival timestamp as score. A background job periodically pops users from the front of the queue and issues a token (stored in Redis with a TTL). The user's browser polls for the token. Once they have it, they can proceed to the event page. The token is single-use and expires after 30 seconds. Pro tip: use WebSocket or Server-Sent Events for real-time queue position updates instead of polling. This reduces load.

WaitingRoom.systemdesign
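// io.thecodeforge — System Design tutorial

// Waiting room queue management using a Redis sorted set (Jedis client).
// NotificationService is this article's own type.

import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.ZAddParams;
import redis.clients.jedis.resps.Tuple; // Jedis 4.x; redis.clients.jedis.Tuple in 3.x

public class WaitingRoom {
    private final Jedis jedis;
    private final int admissionRatePerMinute = 10000;

    public WaitingRoom(Jedis jedis) {
        this.jedis = jedis;
    }

    public void enqueueUser(String userId, String eventId) {
        String key = "waitingroom:" + eventId;
        long now = System.currentTimeMillis();
        // NX: a user who refreshes keeps their original place instead of moving to the back
        jedis.zadd(key, now, userId, ZAddParams.zAddParams().nx());
        // Expire the whole queue an hour after the last join. This does not prune
        // individual users who abandon the page; their unused tokens simply expire.
        jedis.expire(key, 3600);
    }

    public String getQueuePosition(String userId, String eventId) {
        String key = "waitingroom:" + eventId;
        Long rank = jedis.zrank(key, userId);
        if (rank == null) return null;
        return String.valueOf(rank + 1);
    }

    // Call once per second: 10,000 / 60 = 166 users per batch (integer division)
    public void admitUsers(String eventId) {
        String key = "waitingroom:" + eventId;
        // ZPOPMIN returns member/score tuples (a Set in Jedis 3.x, a List in 4.x)
        for (Tuple entry : jedis.zpopmin(key, admissionRatePerMinute / 60)) {
            String userId = entry.getElement();
            String tokenKey = "token:" + eventId + ":" + userId;
            String token = UUID.randomUUID().toString();
            jedis.setex(tokenKey, 30, token); // token expires in 30 seconds
            // Notify user via WebSocket or push
            NotificationService.notifyUser(userId, token);
        }
    }

    // Single use: GETDEL (Redis 6.2+) reads and deletes atomically, so a token
    // cannot be replayed. Note that a wrong guess also burns the stored token.
    public boolean consumeToken(String userId, String eventId, String presented) {
        String stored = jedis.getDel("token:" + eventId + ":" + userId);
        return stored != null && stored.equals(presented);
    }
}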
Output
Users are admitted in batches. Each gets a single-use token valid for 30 seconds. Queue positions are real-time.
Production Trap: Token Expiry Too Short
If your token expires in 10 seconds, users with slow connections will miss it and have to re-queue. Set token TTL to 30 seconds minimum. Also, allow users to request a new token if the previous one expired (but only once per minute to prevent abuse).
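The waiting room promises users an estimated wait time, which the code above does not compute. A minimal Java sketch, assuming the admission rate stays constant (in practice it varies with checkout capacity and abandonment); the helper is hypothetical.

```java
// Hypothetical helper: assumes a constant admission rate
final class WaitEstimator {
    private WaitEstimator() {}

    // position is 1-based, as returned by getQueuePosition
    static long estimatedWaitSeconds(long position, int admissionRatePerMinute) {
        if (position < 1 || admissionRatePerMinute < 1) {
            throw new IllegalArgumentException("position and rate must be positive");
        }
        long ahead = position - 1; // everyone in front is admitted first
        // ceil(ahead * 60 / rate) in integer arithmetic
        return (ahead * 60 + admissionRatePerMinute - 1) / admissionRatePerMinute;
    }
}
```

At 10,000 admissions per minute, the user at position 10,000,001 sees an estimate of 60,000 seconds, roughly 16.7 hours. Showing that number honestly is part of the design, not an afterthought.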

Monitoring and Alerting: What to Watch in Production

You can't fix what you don't measure. Here are the key metrics to monitor: 1) Queue depth of order processing queue — alert if > 10,000. 2) Redis memory usage — alert if > 80% of maxmemory. 3) Database connection pool utilization — alert if > 80%. 4) Payment gateway error rate — alert if > 5% in 5 minutes. 5) Seat reservation success rate — if it drops below 90%, something is wrong. 6) Waiting room queue length — if it grows faster than admission rate, you need to scale. Use a dashboard (Grafana) with these metrics. Set up PagerDuty alerts for critical thresholds. Pro tip: log every seat reservation and release with a unique trace ID. This allows you to debug overselling incidents by replaying the logs.

MonitoringConfig.systemdesign
# io.thecodeforge — System Design tutorial

# Prometheus metrics for the ticketing system (text exposition format;
# lines starting with '#' that are not HELP/TYPE are plain comments)

# HELP ticket_reservations_total Total number of seat reservations
# TYPE ticket_reservations_total counter
ticket_reservations_total{event="tswift2025", status="success"} 50000
ticket_reservations_total{event="tswift2025", status="failure"} 1200

# HELP ticket_reservation_duration_seconds Duration of seat reservation
# TYPE ticket_reservation_duration_seconds histogram
ticket_reservation_duration_seconds_bucket{le="0.1"} 45000
ticket_reservation_duration_seconds_bucket{le="0.5"} 49000
ticket_reservation_duration_seconds_bucket{le="1.0"} 50000
ticket_reservation_duration_seconds_bucket{le="+Inf"} 50000
ticket_reservation_duration_seconds_sum 2500
ticket_reservation_duration_seconds_count 50000

# HELP order_queue_depth Current depth of order processing queue
# TYPE order_queue_depth gauge
order_queue_depth{queue="orders"} 1500

# HELP payment_gateway_error_rate Error rate for payment gateway calls
# TYPE payment_gateway_error_rate gauge
payment_gateway_error_rate{gateway="stripe"} 0.02
Output
Prometheus metrics that can be scraped and visualized in Grafana. Alert rules can be set on these metrics.
Senior Shortcut: Trace IDs Save Hours of Debugging
Include a trace ID in every log line from reservation to payment. When a user complains their seat was sold to someone else, you can grep the trace ID and see exactly what happened. Without it, you're guessing.
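The six alert thresholds listed in the monitoring section are easy to misremember, so it can help to pin them down in code. A hypothetical Java evaluator over one snapshot of those metrics; in a real deployment they would live in Prometheus alerting rules, and the snapshot's field names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical evaluator for the monitoring section's thresholds over one
// snapshot of metrics. Field names are assumptions, not real metric names.
final class AlertEvaluator {
    static final class Snapshot {
        long orderQueueDepth;
        double redisUsedBytes;
        double redisMaxBytes;
        int dbPoolActive;
        int dbPoolMax;
        double paymentErrorRate5m;      // fraction of gateway calls failing over 5 minutes
        double reservationSuccessRate;  // successes / attempts
        double waitingRoomJoinsPerMin;
        double admissionRatePerMin;
    }

    static List<String> evaluate(Snapshot s) {
        List<String> alerts = new ArrayList<>();
        if (s.orderQueueDepth > 10_000) alerts.add("order queue depth > 10,000");
        if (s.redisUsedBytes > 0.8 * s.redisMaxBytes) alerts.add("Redis memory > 80% of maxmemory");
        if (s.dbPoolActive > 0.8 * s.dbPoolMax) alerts.add("DB connection pool > 80%");
        if (s.paymentErrorRate5m > 0.05) alerts.add("payment error rate > 5% over 5 minutes");
        if (s.reservationSuccessRate < 0.90) alerts.add("reservation success rate < 90%");
        if (s.waitingRoomJoinsPerMin > s.admissionRatePerMin) alerts.add("waiting room growing faster than admission");
        return alerts;
    }
}
```

Fed the sample values from the metrics output above (1,500 queued orders, a 2% payment error rate, 50,000 of 51,200 reservations succeeding) with the remaining fields at healthy values, it returns no alerts.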

When Not to Use This Architecture

This architecture is overkill for small events (< 1,000 attendees). For those, a simple database with optimistic locking (version column) and a single server is fine. Also, if your event doesn't have a fixed capacity (e.g., general admission with no seat numbers), you don't need the reservation system. Just use a counter. And if you're not expecting a stampede (e.g., niche event), skip the waiting room. The complexity of sharding, Redis, queues, and waiting rooms is justified only when you have millions of concurrent users. For most systems, a simpler approach with caching and a CDN is sufficient. Don't over-engineer.
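For general admission, "just use a counter" still needs one guarantee: the count must never go below zero under concurrency. A single-process Java sketch using a compare-and-set loop; the class is hypothetical, and across several servers the same check-and-decrement would typically be a single conditional UPDATE (... SET remaining = remaining - ? WHERE remaining >= ?) or a Redis script.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical general-admission inventory held in one atomic counter.
final class GeneralAdmissionCounter {
    private final AtomicInteger remaining;

    GeneralAdmissionCounter(int capacity) {
        remaining = new AtomicInteger(capacity);
    }

    // Takes n tickets only if at least n remain; the count never goes negative
    boolean tryTake(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        while (true) {
            int current = remaining.get();
            if (current < n) {
                return false; // not enough left for this request
            }
            // Succeeds unless another thread changed the count first; then re-read and retry
            if (remaining.compareAndSet(current, current - n)) {
                return true;
            }
        }
    }

    // Returns tickets from an expired or failed order
    void release(int n) {
        remaining.addAndGet(n);
    }

    int remaining() {
        return remaining.get();
    }
}
```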

Never Do This: Over-Engineering for Small Events
I've seen a startup use Kafka, Redis Cluster, and 10 microservices for a 500-person meetup. They spent 3 months building it. The event sold out in 2 minutes with zero load. Use the simplest thing that works. Add complexity only when you have evidence of need.
● Production incident · Post-mortem · Severity: high

The 4GB Container That Kept Dying

Symptom
During a major on-sale, the inventory service containers kept OOM-killing every 30 seconds. Users saw 'Service Unavailable' errors.
Assumption
We assumed a memory leak in the reservation code. Spent hours profiling heap dumps.
Root cause
The Redis client connection pool was set to 100 connections per container, and each connection held a 40MB buffer for pipelining. 100 × 40MB is 4GB: the container's entire 4GB limit went to buffers before the 2GB JVM heap was even counted, so the container was OOM-killed almost immediately.
Fix
Reduced Redis connection pool to 20 per container. Increased container memory to 8GB. Set JVM heap to 4GB. Added connection pool monitoring alert.
Key lesson
  • Always calculate the memory footprint of connection pools before setting limits.
  • A single connection's buffer can be larger than you think.
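The first lesson amounts to one line of arithmetic done before deployment. A hypothetical Java helper that sums the heap, per-connection buffers, and a fixed allowance for everything else (metaspace, thread stacks, native memory); the 512 MB allowance used below is an assumption, not a number from the incident.

```java
// Hypothetical container memory budget, all values in MB.
final class MemoryBudget {
    private MemoryBudget() {}

    static long requiredMb(long heapMb, int poolSize, long bufferMbPerConnection, long otherMb) {
        return heapMb + (long) poolSize * bufferMbPerConnection + otherMb;
    }

    static boolean fits(long containerMb, long heapMb, int poolSize, long bufferMbPerConnection, long otherMb) {
        return requiredMb(heapMb, poolSize, bufferMbPerConnection, otherMb) <= containerMb;
    }
}
```

With the incident's numbers, the original setup needs 2,048 + 100 × 40 + 512 = 6,560 MB in a 4,096 MB container, while the fixed one needs 4,096 + 20 × 40 + 512 = 5,408 MB of 8,192 MB.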
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit.
Symptom · 01
Users report 'Seat already taken' immediately after selecting
Fix
1. Check the Redis reservation TTL: is it too short, or missing so abandoned holds never clear?
2. Verify Lua script atomicity: are there race conditions?
3. Check if the reservation release job is running too frequently.
4. Review logs for duplicate reservation attempts.
Symptom · 02
Order processing queue growing faster than consumers can drain
Fix
1. Check consumer autoscaling: is it enabled?
2. Check payment gateway latency: is it slow?
3. Increase max consumers.
4. If the gateway is slow, use a circuit breaker to fail fast.
5. Add a dead-letter queue for poison messages if you don't have one.
Symptom · 03
Redis memory usage at 100% causing evictions
Fix
1. Check the maxmemory-policy. On an instance that holds seat reservations, prefer noeviction so writes fail loudly; allkeys-lru (or any volatile-* policy, since hold keys carry TTLs) can silently evict live holds and is only safe on a pure cache.
2. Reduce TTL on reservation keys.
3. Increase Redis cluster size.
4. Monitor key patterns: are there leaked keys without a TTL?
5. Use redis-cli --bigkeys to find large keys.
Symptom · 04
Waiting room queue not draining
Fix
1. Check the admission rate: is it too low?
2. Verify the token generation job is running.
3. Check that clients are polling correctly and actually using the token.
4. Increase the admission rate gradually.
5. Monitor the Redis sorted set size.
★ Design Ticketmaster Triage Cheat Sheet
First-response commands for when things go wrong, copy-paste ready.
Seat reservation failures — `SeatUnavailableException`
Immediate action
Check Redis key for event seats
Commands
redis-cli HSCAN event:{eventId}:seats 0 COUNT 20
redis-cli TTL event:{eventId}:seats
Fix now
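If TTL returns -1, the seat map has no expiry. Don't fix that with EXPIRE on the whole map: when it fires, every seat's state in the event disappears at once. If many seats are reserved by stale users, pause sales, DEL the key, and reload it from the DB. (HSCAN is used above instead of HGETALL, which blocks Redis while it serializes a 50,000-field hash.)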
Order queue backlog: `order_queue_depth > 10000`
Immediate action
Check consumer count and payment gateway health
Commands
kubectl get pods -l app=order-processor --no-headers | wc -l
curl -s http://payment-gateway/health | jq .
Fix now
If gateway is healthy, scale consumers: kubectl scale deployment order-processor --replicas=20. If unhealthy, enable circuit breaker and fail orders gracefully.
Redis OOM: `OOM command not allowed when used memory > 'maxmemory'`
Immediate action
Check memory policy and largest keys
Commands
redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human|evicted_keys'
redis-cli --bigkeys
Fix now
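This error means Redis could not free memory under its current policy (typically noeviction), which is the right policy when Redis holds seat reservations: switching to allkeys-lru would silently evict live holds. Free memory instead: raise the limit if the host has the RAM (redis-cli CONFIG SET maxmemory 8gb) and remove leaked keys. Use allkeys-lru (redis-cli CONFIG SET maxmemory-policy allkeys-lru) only on an instance that is purely a cache. Treat a restart as a last resort: without persistence it drops every in-memory hold.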
Waiting room not admitting users: `No token issued`
Immediate action
Check admission job logs
Commands
kubectl logs -l app=waiting-room-admitter --tail=50
redis-cli ZCARD waitingroom:{eventId}
Fix now
If admission job is stuck, restart it: kubectl delete pod -l app=waiting-room-admitter. If queue is empty, users may have abandoned — check frontend polling.
| Feature / Aspect | Database-Only Approach | Redis + Queue Approach |
| --- | --- | --- |
| Seat reservation latency | 10-50ms (with index) | <1ms (Redis Lua script) |
| Concurrent users supported | ~10,000 (single DB) | 1,000,000+ (sharded + Redis) |
| Overselling risk | High (race conditions) | Low (atomic Lua scripts) |
| Complexity | Low | High |
| Cost | Low | High (Redis cluster, queue infrastructure) |
| Failure mode | Database crash = total outage | Redis crash = degraded (fallback to DB) |

Key takeaways

1. Shard your inventory database by event to avoid write contention. Map events to shards with consistent hashing, or with a stable hash over a fixed set of logical shards.
2. Use Redis with Lua scripts for atomic seat reservations with TTL. This handles the burst without melting your database.
3. Process orders asynchronously via a queue to absorb spikes. Monitor queue depth and autoscale consumers.
4. Always use idempotency keys for payment processing to prevent double charges. Implement a circuit breaker for payment gateways.
5. Implement a virtual waiting room to control the flow of users into the site. Admit in batches to prevent a stampede.
6. Monitor key metrics: queue depth, Redis memory, DB connection pool, payment error rate. Set up alerts for thresholds.
Interview Questions on This Topic

Q01 · Senior: How does your system handle a scenario where a user reserves a seat but ...
Q02 · Senior: When would you choose a queue-based order processing over synchronous pr...
Q03 · Senior: What happens when Redis goes down during a hot on-sale? How do you preve...
Q04 · Junior: Explain how you would shard the seat inventory database. What sharding k...
Q05 · Senior: A user reports that they successfully paid for a ticket but the system s...
Q06 · Senior: How would you design the system to handle a 'Taylor Swift effect' where ...

Q01 of 06 · Senior

How does your system handle a scenario where a user reserves a seat but never completes payment? What prevents that seat from being locked forever?

ANSWER
The reservation has a TTL in Redis (e.g., 5 minutes). A background job runs every minute, scanning for expired reservations and releasing them back to the pool. The database also has a 'reserved_at' timestamp; a cron job releases seats where reservation is older than TTL. This ensures no seat is locked indefinitely.
Frequently Asked Questions

01. How does Ticketmaster handle millions of concurrent users during ticket sales?
02. What's the difference between a fixed window and sliding window rate limiter?
03. How do I prevent overselling tickets in a high-concurrency system?
04. What happens if Redis goes down during a ticket sale?