Senior 6 min · June 25, 2026

Design Ticketmaster: The System Design That Handles 10M Concurrent Requests Without Crashing

Q: How does Ticketmaster handle millions of concurrent users during ticket sales?

A Ticketmaster-style platform typically combines sharded databases, Redis caching, queue-based order processing, and a virtual waiting room. Users are placed in a queue before they reach the site. Seat reservations are made atomically in Redis with Lua scripts. Orders are processed asynchronously to absorb spikes. Rate limiting curbs abuse.

Q: What's the difference between a fixed window and sliding window rate limiter?

A fixed window rate limiter resets its counter at the end of each window (e.g., every minute), so a client can send up to double the limit in a short burst that straddles the boundary. A sliding window rate limiter counts requests over a rolling time window, which smooths out traffic and prevents those boundary bursts. For production systems, default to sliding window.

Q: How do I prevent overselling tickets in a high-concurrency system?

Use atomic seat reservations in Redis with Lua scripts. Set a TTL on reservations. Process payments asynchronously and confirm seats only after successful payment. Implement a reconciliation job to fix inconsistencies. Use idempotency keys to prevent duplicate charges.

Q: What happens if Redis goes down during a ticket sale?

Implement a fallback to the database. The system will be slower but operational. Use Redis Sentinel or Cluster for automatic failover. Test the fallback regularly. Also, consider using a multi-region Redis deployment for high availability.

Design Ticketmaster system to handle 10M concurrent users.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren


⚡Quick Answer

To design Ticketmaster, use a sharded relational database for inventory, Redis for hot data caching, a queue-based async order processing system, and a rate limiter at the API gateway to prevent stampedes. The key is to reserve inventory for a short window (e.g., 5 minutes) and release unconfirmed reservations back to the pool.

✦ Definition~90s read

What is Design Ticketmaster?

Design Ticketmaster is a system design pattern for building high-concurrency, low-latency event ticketing platforms that handle millions of concurrent users during on-sale events, preventing overselling and ensuring fair access.


Plain-English First

Imagine a stadium with 50,000 seats and 500,000 people trying to buy tickets at the exact same moment. You can't let everyone run to the box office at once — that's a riot. Instead, you let people line up virtually (queue), give each person a short time to pick seats (reservation), and if they don't pay in time, you kick them out and let the next person try. The trick is to never sell the same seat twice, even when millions are clicking simultaneously.

Here's the nightmare: 10 million people hit F5 at 10:00 AM for Taylor Swift tickets. Your database melts. The site goes down. Twitter explodes. This isn't hypothetical — I've seen it happen to a major ticketing platform that shall remain nameless. The root cause? They treated ticket sales like a regular e-commerce checkout. Wrong. Ticket sales are a stampede, not a shopping trip. The problem is simple: supply is fixed (50,000 seats), demand is insane (millions of users), and every seat must be sold exactly once. No overselling. No double-booking. No angry mobs. After this article, you'll be able to design a system that survives a Taylor Swift on-sale without a scratch. You'll understand database sharding for inventory, Redis-based seat reservations with TTLs, queue-based order processing to absorb spikes, and rate limiting that doesn't punish legitimate users. Let's get into it.

Why Your Database Will Die Without Sharding

The first thing everyone gets wrong: they put all event inventory in one database table on a single instance. During a hot on-sale, every user queries that table to see available seats, then tries to lock a seat with SELECT ... FOR UPDATE. With a proper index that takes row-level locks rather than a table lock, but thousands of transactions still pile up behind the same few hot rows, and each one holds a connection while it waits. Your database grinds to a halt. The fix: shard by event, so writes for different events land on different databases and never contend. Even within one event you need finer granularity: shard by section or even by row. For a stadium with 50,000 seats, split into 10 shards of 5,000 seats each. If demand is spread evenly, each shard sees roughly a tenth of the lock contention. Map event+section to a shard with a stable hash. The routing layer (API gateway or a lightweight proxy) reads the shard map from ZooKeeper or etcd. When a new event is created, you can also assign it explicitly to the shard with the least load and record that in the shard map. Pro tip: pre-create a fixed number of logical shards (e.g., 64) and map events to them via hash(event_id) % 64. Strictly speaking this is modulo hashing, not consistent hashing. It stays stable because the logical shard count never changes, and you scale by moving whole logical shards between physical databases instead of rehashing.

ShardRouting.systemdesign

// io.thecodeforge - System Design tutorial

// Shard routing logic for event inventory
// Maps an event to one of a fixed number of logical shards via hash modulo

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardRouter {
    private static final int NUM_SHARDS = 64;
    private final Map<Integer, DatabasePool> shardPools;

    public ShardRouter(List<DatabasePool> pools) {
        // Map the 64 logical shards onto the configured physical pools
        shardPools = new HashMap<>();
        for (int i = 0; i < NUM_SHARDS; i++) {
            shardPools.put(i, pools.get(i % pools.size()));
        }
    }

    public DatabasePool getShardForEvent(String eventId) {
        // floorMod stays in [0, NUM_SHARDS) for negative hash codes;
        // Math.abs(Integer.MIN_VALUE) is still negative and would break the lookup
        int shardId = Math.floorMod(eventId.hashCode(), NUM_SHARDS);
        return shardPools.get(shardId);
    }

    public void reserveSeat(String eventId, String seatId, String userId) {
        DatabasePool pool = getShardForEvent(eventId);
        // NOWAIT fails immediately instead of blocking when the row is already locked
        String selectSql = "SELECT status FROM seats WHERE event_id = ? AND seat_id = ? FOR UPDATE NOWAIT";
        // reserved_at lets a cleanup job release reservations older than the TTL
        String updateSql = "UPDATE seats SET status = 'reserved', reserved_by = ?, reserved_at = NOW() WHERE event_id = ? AND seat_id = ?";
        try (Connection conn = pool.getConnection()) {
            // The row lock only lives as long as the transaction, so autocommit must be off
            conn.setAutoCommit(false);
            try (PreparedStatement select = conn.prepareStatement(selectSql);
                 PreparedStatement update = conn.prepareStatement(updateSql)) {
                select.setString(1, eventId);
                select.setString(2, seatId);
                boolean available;
                try (ResultSet rs = select.executeQuery()) {
                    available = rs.next() && "available".equals(rs.getString("status"));
                }
                if (!available) {
                    throw new SeatUnavailableException("Seat " + seatId + " is not available");
                }
                update.setString(1, userId);
                update.setString(2, eventId);
                update.setString(3, seatId);
                update.executeUpdate();
                conn.commit();
            } catch (SQLException | RuntimeException e) {
                conn.rollback(); // release the row lock before reporting the failure
                throw e;
            }
        } catch (SQLException e) {
            // 55P03 is PostgreSQL's lock_not_available; getSQLState() may be null
            if ("55P03".equals(e.getSQLState())) {
                throw new SeatUnavailableException("Seat is locked by another transaction");
            }
            throw new RuntimeException(e);
        }
    }
}

Output

No output — this is a design pattern. But the key behavior: concurrent requests for different events hit different shards, so no lock contention. Requests for same event but different sections also hit different shards if section-based sharding is used.

Production Trap: SELECT FOR UPDATE NOWAIT

If you forget NOWAIT, your database will queue up thousands of waiting transactions. Each one holds a connection until the lock is granted or the lock timeout fires (if one is configured at all), so your connection pool runs dry. Typical errors: 'Connection pool exhausted' or, on MySQL, 'Lock wait timeout exceeded'. Always use NOWAIT and handle the lock-not-available exception gracefully.
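A minimal sketch of the event+section routing described in this section, assuming a fixed pool of 64 logical shards. The class name `SectionShardKey`, the `eventId:section` key format, and the choice of CRC32 are illustrative assumptions, not part of the original design; any hash that is stable across processes would do.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper: maps (eventId, section) to one of a fixed number of logical shards.
final class SectionShardKey {
    static final int NUM_LOGICAL_SHARDS = 64;

    private SectionShardKey() {}

    static int shardFor(String eventId, String section) {
        CRC32 crc = new CRC32();
        crc.update((eventId + ":" + section).getBytes(StandardCharsets.UTF_8));
        // getValue() is an unsigned 32-bit value held in a long, so the remainder is never negative
        return (int) (crc.getValue() % NUM_LOGICAL_SHARDS);
    }
}
```

Because the section is part of the key, two sections of the same event can land on different logical shards, which is what spreads a single hot event across databases.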

[Figure: Ticketmaster System Design for 10M Concurrent Users]

Redis: The Seat Reservation Buffer That Saves Your Database

Even with sharding, your database can't handle millions of SELECT FOR UPDATE calls per second. That's where Redis comes in. Use Redis as the hot path for seat availability: check Redis before touching the database. But here's the trick: don't just cache the seat list. Use Redis transactions (WATCH/MULTI/EXEC) or Lua scripts to reserve a seat atomically. This gives you sub-millisecond seat selection. The reservation has a TTL (e.g., 5 minutes). If the user doesn't complete payment within that time, the hold expires in Redis and a background job releases the matching database record back to the pool. This is the 'soft reservation' pattern. The database is the source of truth, but Redis handles the burst. When a seat is reserved in Redis, you also write a reservation record to the database asynchronously. If Redis goes down, you fall back to the database (with degraded performance). Pro tip: use Redis Cluster for high availability. Each event's seats are stored in a single hash key: event:{eventId}:seats. The hash field is seatId, value is status (available/sold/userId). The TTL must apply to the individual seat, not to the whole hash: EXPIRE on the hash key would wipe every seat for the event at once, including sold ones. The script below uses HEXPIRE, which sets a per-field TTL and requires Redis 7.4 or later. On older versions, keep each hold in its own key and create it with SET ... NX EX instead. Lua script: if status == 'available' (or the field is missing because an earlier hold expired) then set status = userId, set the TTL on that field, return success, else return failure.

RedisReservation.lua

-- io.thecodeforge - System Design tutorial

-- Lua script for atomic seat reservation in Redis
-- Called via EVALSHA for performance
-- Requires Redis 7.4+ for HEXPIRE (per-field TTL)

-- KEYS[1] = event:{eventId}:seats (hash)
-- ARGV[1] = seatId
-- ARGV[2] = userId
-- ARGV[3] = ttlSeconds

local seatStatus = redis.call('HGET', KEYS[1], ARGV[1])
-- A missing field (false) means an earlier hold expired, so the seat is free again.
-- Callers must validate seatId against the seat map first, or unknown IDs look free.
if seatStatus == 'available' or seatStatus == false then
    redis.call('HSET', KEYS[1], ARGV[1], ARGV[2])
    -- Expire only this seat's field; EXPIRE on KEYS[1] would drop the whole event
    redis.call('HEXPIRE', KEYS[1], ARGV[3], 'FIELDS', 1, ARGV[1])
    return 1  -- success
else
    return 0  -- seat already reserved or sold
end

Output

Returns 1 if reservation succeeded, 0 if seat already taken.

Senior Shortcut: Use Lua Scripts for Atomicity

Don't use WATCH/MULTI/EXEC for high contention — they cause retries and wasted round trips. Lua scripts execute atomically on the Redis server. One round trip, no race conditions. Use EVALSHA with script caching for maximum performance.
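To make the soft-reservation semantics concrete, here is an in-memory sketch of the same check-and-hold logic. It is a stand-in for reasoning and unit tests, not a replacement for the Redis script: `SeatHolds`, its injected clock, and the per-seat expiry timestamp are all assumptions made for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// In-memory stand-in for the Redis hold logic; names here are hypothetical.
final class SeatHolds {
    private static final class Hold {
        final String userId;
        final long expiresAtMillis;

        Hold(String userId, long expiresAtMillis) {
            this.userId = userId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Hold> holds = new ConcurrentHashMap<>();
    private final LongSupplier clock; // injected so tests can control time
    private final long ttlMillis;

    SeatHolds(LongSupplier clock, long ttlMillis) {
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    // Returns true if userId now holds the seat, false if a live hold already exists.
    boolean reserve(String seatId, String userId) {
        long now = clock.getAsLong();
        Hold fresh = new Hold(userId, now + ttlMillis);
        // compute() runs atomically per key, much like a Lua script runs atomically on the server
        Hold winner = holds.compute(seatId, (id, current) ->
                (current == null || current.expiresAtMillis <= now) ? fresh : current);
        return winner == fresh;
    }

    // Returns the current holder, or null if the seat is free or the hold has expired.
    String holder(String seatId) {
        Hold h = holds.get(seatId);
        return (h == null || h.expiresAtMillis <= clock.getAsLong()) ? null : h.userId;
    }
}
```

The important property is that the availability check and the write happen in one atomic step per seat, and that expiry is tracked per seat rather than per event.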

[Figure: Seat Reservation Flow with Redis]

Queue-Based Order Processing: Absorbing the Tsunami

Once a user reserves seats and proceeds to checkout, you can't process the order synchronously. Payment gateways are slow (500ms-2s). If you block on payment, you will exhaust your web server threads. Instead, enqueue an order processing message and return a 'processing' status to the user. The queue (RabbitMQ, Kafka, or SQS) acts as a shock absorber. Consumers pick up messages and process payments. If payment succeeds, mark the seats as sold. If it fails, release the seats back to the pool. The queue must be durable and have a dead-letter queue for messages that keep failing. Important: the reservation TTL in Redis must be longer than the worst-case queue processing time. If the TTL is 5 minutes and queue processing takes 30 seconds, you're safe. But if the queue backs up to 10 minutes, reservations will expire before their orders are processed. Monitor queue depth and alert if it exceeds a threshold. Autoscale consumers based on queue depth. Pro tip: use a priority queue for VIP users or higher-priced tickets.

OrderProcessor.systemdesign

// io.thecodeforge - System Design tutorial

// Order processing consumer that handles payment and seat confirmation

import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderProcessor implements Runnable {
    private static final int MAX_RETRIES = 3;
    // SLF4J is assumed here because the log call below uses {} placeholders
    private static final Logger logger = LoggerFactory.getLogger(OrderProcessor.class);

    private final QueueClient queue;
    private final PaymentGateway paymentGateway;
    private final SeatInventoryClient inventoryClient;

    public OrderProcessor(QueueClient queue, PaymentGateway paymentGateway, SeatInventoryClient inventoryClient) {
        this.queue = queue;
        this.paymentGateway = paymentGateway;
        this.inventoryClient = inventoryClient;
    }

    @Override
    public void run() {
        // Stop cleanly when the worker thread is interrupted during shutdown
        while (!Thread.currentThread().isInterrupted()) {
            OrderMessage msg = queue.dequeue(30, TimeUnit.SECONDS); // long poll
            if (msg == null) continue;
            try {
                // Process payment. The charge must be idempotent per order: a retry after a
                // failure between charge() and confirmSeats() would otherwise bill twice.
                PaymentResult result = paymentGateway.charge(msg.getUserId(), msg.getAmount());
                if (result.isSuccess()) {
                    // Confirm seat sale in database
                    inventoryClient.confirmSeats(msg.getEventId(), msg.getSeatIds(), msg.getUserId());
                    // Send confirmation email (async)
                    NotificationService.sendConfirmation(msg.getUserId(), msg.getEventId(), msg.getSeatIds());
                } else {
                    // Payment failed: release seats back to pool
                    inventoryClient.releaseSeats(msg.getEventId(), msg.getSeatIds());
                    // Notify user
                    NotificationService.sendPaymentFailed(msg.getUserId());
                }
            } catch (Exception e) {
                // Possibly transient error: requeue with a linearly growing delay (5s, 10s, 15s)
                if (msg.getRetryCount() < MAX_RETRIES) {
                    msg.incrementRetry();
                    queue.enqueue(msg, 5 * msg.getRetryCount(), TimeUnit.SECONDS); // delayed retry
                } else {
                    // Dead letter: needs manual intervention
                    queue.sendToDeadLetter(msg);
                    logger.error("Order processing failed after {} retries: {}", MAX_RETRIES, msg, e);
                }
            }
        }
    }
}

Output

No direct output. But the system behavior: orders are processed asynchronously, seats are confirmed only after payment success, and failures are retried with backoff.

The Classic Bug: Reservation TTL Shorter Than Queue Delay

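If your queue backs up and processing takes longer than the Redis TTL, the hold expires and the seat goes back to the pool while the payment is still in flight. Someone else reserves it, both payments succeed, and you have oversold. Always set the TTL to at least 2x the expected maximum queue delay, and have the consumer confirm the hold still belongs to the user before charging. Monitor queue delay (the age of the oldest message), not just depth, and alert when it exceeds TTL/2.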
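The 2x rule is easy to check with back-of-the-envelope arithmetic. The helper below is a sketch under a steady-throughput assumption; `QueueBudget` and its parameters are hypothetical names, and the per-consumer throughput is something you would measure rather than assume.

```java
// Hypothetical helper for reasoning about reservation TTL versus queue backlog.
final class QueueBudget {
    private QueueBudget() {}

    // Seconds until a message enqueued now is picked up, assuming steady throughput.
    static double estimatedDelaySeconds(long queueDepth, int consumers, double ordersPerSecondPerConsumer) {
        if (consumers <= 0 || ordersPerSecondPerConsumer <= 0) {
            return Double.POSITIVE_INFINITY; // nothing is draining the queue
        }
        return queueDepth / (consumers * ordersPerSecondPerConsumer);
    }

    // The rule from the text: the TTL must be at least 2x the expected queue delay.
    static boolean ttlIsSafe(long ttlSeconds, double delaySeconds) {
        return ttlSeconds >= 2 * delaySeconds;
    }
}
```

With 1,500 queued orders, 50 consumers, and one order per second per consumer, the estimated delay is 30 seconds, comfortably inside a 300-second TTL. At 30,000 queued orders the delay reaches 600 seconds and the same TTL is no longer safe.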

[Figure: Synchronous vs Queue-Based Payment]

Rate Limiting: Don't Let the Stampede Through

Without rate limiting, your system will be overwhelmed by bots and aggressive users. But naive rate limiting (e.g., 10 requests per second per IP) will block legitimate users who share an IP behind a NAT. The solution: multi-layer rate limiting. First layer: a global limit at the API gateway (e.g., 1 million requests per minute). Second layer: a per-user limit keyed on user ID (e.g., 100 requests per minute). Third layer: a per-event limit (e.g., 10,000 requests per second per event). Use a sliding window algorithm (not fixed window) to avoid bursts at window boundaries. Store counters in Redis with a TTL equal to the window size. For per-user limits, use a sorted set with timestamps as scores. For global limits, where an exact count matters less, a simple counter with EXPIRE is cheaper. Pro tip: if you want per-user limits to tolerate short bursts, use a token bucket instead of the sorted set. And always return a Retry-After header so clients can back off.

RateLimiter.systemdesign

// io.thecodeforge - System Design tutorial

// Sliding window rate limiter using Redis sorted sets

import java.util.List;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class SlidingWindowRateLimiter {
    private final Jedis jedis;
    private final int limit;
    private final long windowSizeMillis;

    public SlidingWindowRateLimiter(Jedis jedis, int limit, long windowSizeMillis) {
        this.jedis = jedis;
        this.limit = limit;
        this.windowSizeMillis = windowSizeMillis;
    }

    public boolean allowRequest(String userId) {
        String key = "ratelimit:" + userId;
        long now = System.currentTimeMillis();
        long windowStart = now - windowSizeMillis;
        // Unique member so two requests in the same millisecond are both counted
        String member = now + ":" + UUID.randomUUID();

        // MULTI/EXEC runs the queued commands as one atomic unit
        Transaction t = jedis.multi();
        // Remove entries outside the window
        t.zremrangeByScore(key, 0, windowStart);
        // Count remaining entries
        t.zcard(key);
        // Add current request. Rejected requests are recorded too, so a client that
        // keeps hammering while throttled stays throttled.
        t.zadd(key, now, member);
        // Set TTL to avoid memory leaks
        t.expire(key, (int) (windowSizeMillis / 1000) + 1);
        List<Object> results = t.exec();

        // zcard ran before zadd, so count excludes the current request
        long count = (Long) results.get(1);
        return count < limit;
    }
}

Output

Returns true if request is allowed, false if rate limited. The Redis sorted set contains timestamps of recent requests.
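For the token bucket mentioned as an alternative for per-user limits, a minimal single-process sketch looks like this. `TokenBucket` and its parameters are hypothetical, and the state lives in memory here; a production version would keep it in Redis alongside the other counters.

```java
// Hypothetical per-user token bucket; time is passed in so the behavior is easy to test.
final class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(int capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Refill in proportion to elapsed time, capped at capacity
        long elapsed = Math.max(0L, nowMillis - lastRefillMillis);
        tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

The capacity sets how large a burst is tolerated, while the refill rate sets the sustained limit, which is why this shape suits users who click quickly for a moment and then pause.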

Interview Gold: Sliding Window vs Fixed Window

Fixed window rate limiting (e.g., reset the counter every minute) allows up to double the limit in a short burst that straddles the boundary. Sliding window smooths it out. In production, default to sliding window. Fixed window is only acceptable for non-critical rate limits like email sending.
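The boundary problem is easiest to see with a toy fixed-window counter. This is an illustrative sketch only; `FixedWindowCounter` is a hypothetical name and the timestamps are supplied by the caller.

```java
// Toy fixed-window limiter used only to demonstrate the boundary burst.
final class FixedWindowCounter {
    private final int limit;
    private final long windowMillis;
    private long currentWindow = -1;
    private int used;

    FixedWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean allow(long nowMillis) {
        long window = nowMillis / windowMillis;
        if (window != currentWindow) {
            // A new window starts with a fresh budget, regardless of what just happened
            currentWindow = window;
            used = 0;
        }
        if (used >= limit) {
            return false;
        }
        used++;
        return true;
    }
}
```

With a limit of 100 per minute, a client that sends 100 requests at t = 59,999 ms and another 100 at t = 60,000 ms gets all 200 through within a single millisecond, which is exactly the burst a sliding window prevents.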

Handling the Payment Gateway: The Weakest Link

Payment gateways are the most failure-prone part of the system. They can be slow, return errors, or even double-charge. Never trust a synchronous payment response on its own: a timeout does not mean the charge failed. Always use idempotency keys. Generate one unique key per order and reuse it on every retry of that order. If the payment gateway times out, retry with the same idempotency key and the gateway will deduplicate. Also, implement a payment reconciliation job that runs every hour. It compares your order records with the gateway's transaction logs. If a payment succeeded on the gateway but the seat was never confirmed (e.g., due to a crash), the job fixes it. Pro tip: use a circuit breaker for the payment gateway. If the error rate exceeds 50% in a 1-minute window, stop sending requests and fail orders gracefully. Re-check after 30 seconds.

PaymentService.systemdesign

// io.thecodeforge - System Design tutorial

// Payment service with idempotency and circuit breaker

import java.math.BigDecimal;
import java.util.concurrent.TimeUnit;

public class PaymentService {
    private final PaymentGateway gateway;
    private final CircuitBreaker circuitBreaker;
    private final IdempotencyStore idempotencyStore;

    public PaymentService(PaymentGateway gateway, CircuitBreaker circuitBreaker, IdempotencyStore idempotencyStore) {
        this.gateway = gateway;
        this.circuitBreaker = circuitBreaker;
        this.idempotencyStore = idempotencyStore;
    }

    public PaymentResult charge(String userId, String orderId, BigDecimal amount) {
        if (!circuitBreaker.isAllowed()) {
            throw new CircuitBreakerOpenException("Payment gateway is down, try again later");
        }
        // One key per order, reused on every retry, so the gateway can deduplicate
        String idempotencyKey = orderId + ":" + userId;
        // Check if we already processed this idempotency key
        PaymentResult cached = idempotencyStore.get(idempotencyKey);
        if (cached != null) {
            return cached;
        }
        try {
            PaymentResult result = gateway.charge(idempotencyKey, userId, amount);
            idempotencyStore.set(idempotencyKey, result, 24, TimeUnit.HOURS);
            circuitBreaker.recordSuccess();
            return result;
        } catch (GatewayTimeoutException e) {
            circuitBreaker.recordFailure();
            // Outcome unknown: rethrow so the caller retries with the same idempotency key
            throw e;
        } catch (GatewayDeclineException e) {
            // Payment declined: a final answer, so cache it and do not retry
            PaymentResult declined = PaymentResult.declined(e.getReason());
            idempotencyStore.set(idempotencyKey, declined, 24, TimeUnit.HOURS);
            circuitBreaker.recordSuccess(); // Not a system failure
            return declined;
        }
    }
}

Output

Returns PaymentResult with success/failure. Idempotency ensures no double charges. Circuit breaker prevents cascading failures.

Senior Shortcut: Idempotency Keys Save Your Weekend

I've seen a payment gateway double-charge 10,000 users because of a network retry without idempotency. The fix: always generate a unique idempotency key per operation. Store it in Redis with a 24-hour TTL. If you get a timeout, retry with the same key. The gateway will return the original response.
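The circuit breaker used by the payment service can be sketched in a few lines. This version is deliberately simplified and rests on stated assumptions: it uses a tumbling 1-minute window rather than a true sliding one, it has no half-open probe state, and `MIN_CALLS` is a value invented here so that a single early failure cannot trip it. The method names mirror the calls in the listing above; the class itself is hypothetical.

```java
import java.util.function.LongSupplier;

// Simplified breaker: tumbling 1-minute window, 30-second cool-down, no half-open state.
final class SimpleCircuitBreaker {
    private static final long WINDOW_MILLIS = 60_000;
    private static final long COOL_DOWN_MILLIS = 30_000;
    private static final double MAX_ERROR_RATE = 0.5;
    private static final int MIN_CALLS = 10; // assumed floor, not from the article

    private final LongSupplier clock; // non-negative millisecond clock, injected for tests
    private long windowStart;
    private int successes;
    private int failures;
    private long openedAt = -1; // -1 means the breaker is closed

    SimpleCircuitBreaker(LongSupplier clock) {
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    synchronized boolean isAllowed() {
        if (openedAt < 0) return true;
        long now = clock.getAsLong();
        if (now - openedAt < COOL_DOWN_MILLIS) return false;
        // Cool-down elapsed: close the breaker and start counting from scratch
        openedAt = -1;
        resetWindow(now);
        return true;
    }

    synchronized void recordSuccess() {
        rollWindowIfNeeded();
        successes++;
    }

    synchronized void recordFailure() {
        rollWindowIfNeeded();
        failures++;
        int total = successes + failures;
        if (total >= MIN_CALLS && (double) failures / total > MAX_ERROR_RATE) {
            openedAt = clock.getAsLong();
        }
    }

    private void rollWindowIfNeeded() {
        long now = clock.getAsLong();
        if (now - windowStart >= WINDOW_MILLIS) resetWindow(now);
    }

    private void resetWindow(long now) {
        windowStart = now;
        successes = 0;
        failures = 0;
    }
}
```

A production breaker would usually admit only a handful of probe requests after the cool-down before fully closing, so that a gateway that is still down does not receive a full burst.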

The Waiting Room: Queueing Users Before They Hit the Site

When 10 million users hit the site at once, even your best infrastructure will struggle. The solution: a virtual waiting room. Before users can even see the event page, they are placed in a queue. The queue assigns a position number and an estimated wait time. Users are admitted to the site in batches (e.g., 10,000 per minute). This is essentially a distributed rate limiter at the application level. Implement the waiting room using a Redis sorted set with the user's arrival timestamp as score. A background job periodically pops users from the front of the queue and issues a token (stored in Redis with a TTL). The user's browser polls for the token. Once they have it, they can proceed to the event page. The token is single-use and expires after 30 seconds. Pro tip: use WebSocket or Server-Sent Events for real-time queue position updates instead of polling. This reduces load.

WaitingRoom.systemdesign

// io.thecodeforge - System Design tutorial

// Waiting room queue management using Redis sorted set

import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.ZAddParams;
import redis.clients.jedis.resps.Tuple; // Jedis 4+; in Jedis 3 this is redis.clients.jedis.Tuple

public class WaitingRoom {
    private final Jedis jedis;
    private final int admissionRatePerMinute = 10000;

    public WaitingRoom(Jedis jedis) {
        this.jedis = jedis;
    }

    public void enqueueUser(String userId, String eventId) {
        String key = "waitingroom:" + eventId;
        long now = System.currentTimeMillis();
        // NX keeps the original arrival time, so a page refresh does not send the user to the back
        jedis.zadd(key, now, userId, ZAddParams.zAddParams().nx());
        // Expire the whole queue 1 hour after the last arrival.
        // This does not remove individual abandoned users.
        jedis.expire(key, 3600);
    }

    public String getQueuePosition(String userId, String eventId) {
        String key = "waitingroom:" + eventId;
        Long rank = jedis.zrank(key, userId); // 0-based, null if the user is not queued
        if (rank == null) return null;
        return String.valueOf(rank + 1);
    }

    public void admitUsers(String eventId) {
        String key = "waitingroom:" + eventId;
        // Intended to run once per second: 10000 / 60 = 166 users per call
        // zpopmin returns (member, score) tuples, oldest arrivals first
        for (Tuple entry : jedis.zpopmin(key, admissionRatePerMinute / 60)) {
            String userId = entry.getElement();
            String tokenKey = "token:" + eventId + ":" + userId;
            String token = UUID.randomUUID().toString();
            jedis.setex(tokenKey, 30, token); // token expires in 30 seconds
            // Notify user via WebSocket or push
            NotificationService.notifyUser(userId, token);
        }
    }
}

Output

Users are admitted in batches. Each gets a single-use token valid for 30 seconds. Queue positions are real-time.

Production Trap: Token Expiry Too Short

If your token expires in 10 seconds, users with slow connections will miss it and have to re-queue. Set token TTL to 30 seconds minimum. Also, allow users to request a new token if the previous one expired (but only once per minute to prevent abuse).
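The estimated wait time shown to a queued user follows directly from the queue position and the admission rate. The sketch below assumes a constant admission rate and rounds up to whole minutes; `WaitEstimate` is a hypothetical helper, not part of the listing above.

```java
import java.time.Duration;

// Hypothetical helper: upper-bound wait estimate for a 1-based queue position.
final class WaitEstimate {
    private WaitEstimate() {}

    static Duration estimate(long position, int admissionRatePerMinute) {
        if (position <= 0 || admissionRatePerMinute <= 0) {
            throw new IllegalArgumentException("position and admission rate must be positive");
        }
        // Ceiling division: the user at position 10,001 waits for the second batch
        long minutes = (position + admissionRatePerMinute - 1) / admissionRatePerMinute;
        return Duration.ofMinutes(minutes);
    }
}
```

At 10,000 admissions per minute, the user at position 500,000 waits about 50 minutes, which is worth showing honestly so people do not keep refreshing.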

Monitoring and Alerting: What to Watch in Production

You can't fix what you don't measure. Here are the key metrics to monitor: 1) Queue depth of order processing queue — alert if > 10,000. 2) Redis memory usage — alert if > 80% of maxmemory. 3) Database connection pool utilization — alert if > 80%. 4) Payment gateway error rate — alert if > 5% in 5 minutes. 5) Seat reservation success rate — if it drops below 90%, something is wrong. 6) Waiting room queue length — if it grows faster than admission rate, you need to scale. Use a dashboard (Grafana) with these metrics. Set up PagerDuty alerts for critical thresholds. Pro tip: log every seat reservation and release with a unique trace ID. This allows you to debug overselling incidents by replaying the logs.

MonitoringConfig.systemdesign

# io.thecodeforge - System Design tutorial

# Prometheus metrics for the ticketing system (text exposition format)

# HELP ticket_reservations_total Total number of seat reservations
# TYPE ticket_reservations_total counter
ticket_reservations_total{event="tswift2025", status="success"} 50000
ticket_reservations_total{event="tswift2025", status="failure"} 1200

# HELP ticket_reservation_duration_seconds Duration of seat reservation
# TYPE ticket_reservation_duration_seconds histogram
ticket_reservation_duration_seconds_bucket{le="0.1"} 45000
ticket_reservation_duration_seconds_bucket{le="0.5"} 49000
ticket_reservation_duration_seconds_bucket{le="1.0"} 50000
ticket_reservation_duration_seconds_bucket{le="+Inf"} 50000
ticket_reservation_duration_seconds_sum 2500
ticket_reservation_duration_seconds_count 50000

# HELP order_queue_depth Current depth of order processing queue
# TYPE order_queue_depth gauge
order_queue_depth{queue="orders"} 1500

# HELP payment_gateway_error_rate Error rate for payment gateway calls
# TYPE payment_gateway_error_rate gauge
payment_gateway_error_rate{gateway="stripe"} 0.02

Output

Prometheus metrics that can be scraped and visualized in Grafana. Alert rules can be set on these metrics.

Senior Shortcut: Trace IDs Save Hours of Debugging

Include a trace ID in every log line from reservation to payment. When a user complains their seat was sold to someone else, you can grep the trace ID and see exactly what happened. Without it, you're guessing.

When Not to Use This Architecture

This architecture is overkill for small events (< 1,000 attendees). For those, a simple database with optimistic locking (version column) and a single server is fine. Also, if your event doesn't have a fixed capacity (e.g., general admission with no seat numbers), you don't need the reservation system. Just use a counter. And if you're not expecting a stampede (e.g., niche event), skip the waiting room. The complexity of sharding, Redis, queues, and waiting rooms is justified only when you have millions of concurrent users. For most systems, a simpler approach with caching and a CDN is sufficient. Don't over-engineer.

Never Do This: Over-Engineering for Small Events

I've seen a startup use Kafka, Redis Cluster, and 10 microservices for a 500-person meetup. They spent 3 months building it. The event sold out in 2 minutes with zero load. Use the simplest thing that works. Add complexity only when you have evidence of need.

● Production incident · POST-MORTEM · severity: high

The 4GB Container That Kept Dying

Symptom

During a major on-sale, the inventory service containers kept OOM-killing every 30 seconds. Users saw 'Service Unavailable' errors.

Assumption

We assumed a memory leak in the reservation code. Spent hours profiling heap dumps.

Root cause

The Redis client connection pool was set to 100 connections per container, but each connection held a 40MB buffer for pipelining. With 4GB RAM and 100 connections, that's 4GB just for buffers — no room for application heap. The JVM heap was 2GB, causing immediate OOM.

Fix

Reduced Redis connection pool to 20 per container. Increased container memory to 8GB. Set JVM heap to 4GB. Added connection pool monitoring alert.

Key lesson

Always calculate the memory footprint of connection pools before setting limits.
A single connection's buffer can be larger than you think.
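The arithmetic behind this incident is worth encoding as a pre-deploy check. The sketch below uses the figures from the post-mortem (40MB per connection, container and heap sizes in MB); `PoolMemoryBudget` is a hypothetical helper and it ignores other native memory such as thread stacks and metaspace.

```java
// Hypothetical pre-deploy check: do JVM heap and connection buffers fit in the container?
final class PoolMemoryBudget {
    private PoolMemoryBudget() {}

    static long poolBufferMb(int connections, long bufferMbPerConnection) {
        return connections * bufferMbPerConnection;
    }

    // True if heap plus pool buffers fit in the container, ignoring other native overhead.
    static boolean fits(long containerMb, long heapMb, int connections, long bufferMbPerConnection) {
        return heapMb + poolBufferMb(connections, bufferMbPerConnection) <= containerMb;
    }
}
```

Before the fix, 100 connections at 40MB each need 4,000MB on top of a 2,048MB heap, which cannot fit in 4,096MB. After the fix, 20 connections need 800MB on top of a 4,096MB heap, which fits in 8,192MB with about 3.2GB to spare.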

Production debug guide · Systematic recovery paths for the failure modes engineers actually hit. · 4 entries

Symptom · 01

Users report 'Seat already taken' immediately after selecting

→

Fix

1. Check Redis reservation TTL — is it too short? 2. Verify Lua script atomicity — are there race conditions? 3. Check if reservation release job is running too frequently. 4. Review logs for duplicate reservation attempts.

Symptom · 02

Order processing queue growing faster than consumers can drain

→

Fix

1. Check consumer autoscaling — is it enabled? 2. Verify payment gateway latency — is it slow? 3. Increase max consumers. 4. If gateway is slow, implement circuit breaker to fail fast. 5. Consider adding a dead-letter queue for poison messages.

Symptom · 03

Redis memory usage at 100% causing evictions

→

Fix

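1. Check the maxmemory policy. Use allkeys-lru only on instances that hold pure cache data. On the instance that stores seat holds, an eviction silently drops a reservation, so use noeviction there and size it with headroom. 2. Reduce TTL on reservation keys. 3. Increase Redis cluster size. 4. Monitor key patterns: are there leaked keys without TTL? 5. Use redis-cli --bigkeys to find large keys.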

Symptom · 04

Waiting room queue not draining

→

Fix

1. Check admission rate — is it too low? 2. Verify token generation job is running. 3. Check if users are polling correctly — maybe they're not using the token. 4. Increase admission rate gradually. 5. Monitor Redis sorted set size.

★ Design Ticketmaster Triage Cheat Sheet · First-response commands for when things go wrong, copy-paste ready.

Seat reservation failures: `SeatUnavailableException`

Immediate action

Check Redis key for event seats

Commands

redis-cli HGETALL event:{eventId}:seats | head -20

redis-cli TTL event:{eventId}:seats

Fix now

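TTL returns -1 when the key has no expiry and -2 when the key does not exist. It is never 0 for a live key. If the key is missing, reload it from the DB. If your setup relies on a key-level expiry and TTL returns -1, set it: redis-cli EXPIRE event:{eventId}:seats 300. If many seats are reserved by stale users, flushing with DEL and reloading from the DB is a last resort: it also drops every active hold, so avoid it while a sale is live.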

Order queue backlog: `order_queue_depth > 10000`

Redis OOM: `OOM command not allowed when used memory > 'maxmemory'`

Waiting room not admitting users: `No token issued`

Feature / Aspect	Database-Only Approach	Redis + Queue Approach
Seat reservation latency	10-50ms (with index)	<1ms (Redis Lua script)
Concurrent users supported	~10,000 (single DB)	1,000,000+ (sharded + Redis)
Overselling risk	High (race conditions)	Low (atomic Lua scripts)
Complexity	Low	High
Cost	Low	High (Redis cluster, queue infrastructure)
Failure mode	Database crash = total outage	Redis crash = degraded (fallback to DB)

Key takeaways

Shard your inventory database by event to avoid write contention. Use a stable hash over a fixed number of logical shards to map events to shards.

Use Redis with Lua scripts for atomic seat reservations with TTL. This handles the burst without melting your database.

Process orders asynchronously via a queue to absorb spikes. Monitor queue depth and autoscale consumers.

Always use idempotency keys for payment processing to prevent double charges. Implement a circuit breaker for payment gateways.

Implement a virtual waiting room to control the flow of users into the site. Admit in batches to prevent stampede.

Monitor key metrics: queue depth, Redis memory, DB connection pool, payment error rate. Set up alerts for thresholds.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

How does your system handle a scenario where a user reserves a seat but ...

Q02 · SENIOR

When would you choose a queue-based order processing over synchronous pr...

Q03 · SENIOR

What happens when Redis goes down during a hot on-sale? How do you preve...

Q04 · JUNIOR

Explain how you would shard the seat inventory database. What sharding k...

Q05 · SENIOR

A user reports that they successfully paid for a ticket but the system s...

Q06 · SENIOR

How would you design the system to handle a 'Taylor Swift effect' where ...

Q01 of 06 · SENIOR

How does your system handle a scenario where a user reserves a seat but never completes payment? What prevents that seat from being locked forever?

ANSWER

The reservation has a TTL in Redis (e.g., 5 minutes). A background job runs every minute, scanning for expired reservations and releasing them back to the pool. The database also has a 'reserved_at' timestamp; a cron job releases seats where reservation is older than TTL. This ensures no seat is locked indefinitely.

