To design Ticketmaster, use a sharded relational database for inventory, Redis for hot data caching, a queue-based async order processing system, and a rate limiter at the API gateway to prevent stampedes. The key is to reserve inventory for a short window (e.g., 5 minutes) and release unconfirmed reservations back to the pool.
What is Design Ticketmaster?
Design Ticketmaster is a system design pattern for building high-concurrency, low-latency event ticketing platforms that handle millions of concurrent users during on-sale events, preventing overselling and ensuring fair access.
Plain-English First
Imagine a stadium with 50,000 seats and 500,000 people trying to buy tickets at the exact same moment. You can't let everyone run to the box office at once — that's a riot. Instead, you let people line up virtually (queue), give each person a short time to pick seats (reservation), and if they don't pay in time, you kick them out and let the next person try. The trick is to never sell the same seat twice, even when millions are clicking simultaneously.
Here's the nightmare: 10 million people hit F5 at 10:00 AM for Taylor Swift tickets. Your database melts. The site goes down. Twitter explodes. This isn't hypothetical — I've seen it happen to a major ticketing platform that shall remain nameless. The root cause? They treated ticket sales like a regular e-commerce checkout. Wrong. Ticket sales are a stampede, not a shopping trip. The problem is simple: supply is fixed (50,000 seats), demand is insane (millions of users), and every seat must be sold exactly once. No overselling. No double-booking. No angry mobs. After this article, you'll be able to design a system that survives a Taylor Swift on-sale without a scratch. You'll understand database sharding for inventory, Redis-based seat reservations with TTLs, queue-based order processing to absorb spikes, and rate limiting that doesn't punish legitimate users. Let's get into it.
Why Your Database Will Die Without Sharding
The first thing everyone gets wrong: they put all event inventory in one database table. During a hot on-sale, every user queries that table to see available seats, then tries to lock a seat with SELECT ... FOR UPDATE. With an index on (event_id, seat_id) that is a row lock rather than a table lock, but thousands of transactions still queue behind the same popular rows while holding connections, and a query that cannot use the index can lock far more rows than intended. Your database grinds to a halt. The fix: shard by event, so that writes for different events land on different shards and never conflict. Even within one event you may need finer granularity: shard by section or even row. For a stadium with 50,000 seats, split into 10 shards of 5,000 seats each, which spreads lock and connection pressure across ten databases instead of one. Map event+section to a shard with a hash. The routing layer (API gateway or a lightweight proxy) reads the shard map from ZooKeeper or etcd. When a new event is created, assign it to the shard with the least load. Pro tip: pre-create a fixed number of logical shards (e.g., 64) and map events to them via hash(event_id) % 64. Strictly speaking this is modulo hashing, not consistent hashing; it avoids resharding only because the logical shard count never changes, and you rebalance by moving whole logical shards between physical databases.
ShardRouting.systemdesign
// io.thecodeforge - SystemDesign tutorial
// Shard routing logic for event inventory
// Maps each event to one of a fixed number of logical shards (hash modulo)
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardRouter {
    private static final int NUM_SHARDS = 64;
    private final Map<Integer, DatabasePool> shardPools;

    public ShardRouter(List<DatabasePool> pools) {
        // Spread the 64 logical shards across the configured physical pools
        shardPools = new HashMap<>();
        for (int i = 0; i < NUM_SHARDS; i++) {
            shardPools.put(i, pools.get(i % pools.size()));
        }
    }

    public DatabasePool getShardForEvent(String eventId) {
        // floorMod stays non-negative even when hashCode() is Integer.MIN_VALUE,
        // where Math.abs would return a negative number
        int shardId = Math.floorMod(eventId.hashCode(), NUM_SHARDS);
        return shardPools.get(shardId);
    }

    public void reserveSeat(String eventId, String seatId, String userId) {
        DatabasePool pool = getShardForEvent(eventId);
        // Use SELECT FOR UPDATE with NOWAIT to avoid blocking
        String selectSql = "SELECT status FROM seats WHERE event_id = ? AND seat_id = ? FOR UPDATE NOWAIT";
        String updateSql = "UPDATE seats SET status = 'reserved', reserved_by = ?, reserved_at = NOW() WHERE event_id = ? AND seat_id = ?";
        try (Connection conn = pool.getConnection()) {
            // The row lock lives only as long as the transaction, so autocommit must be off
            conn.setAutoCommit(false);
            try (PreparedStatement select = conn.prepareStatement(selectSql)) {
                select.setString(1, eventId);
                select.setString(2, seatId);
                try (ResultSet rs = select.executeQuery()) {
                    if (!rs.next() || !"available".equals(rs.getString("status"))) {
                        conn.rollback();
                        throw new SeatUnavailableException("Seat " + seatId + " is not available");
                    }
                }
            }
            // Mark the seat reserved; reserved_at lets a cleanup job expire stale holds
            try (PreparedStatement update = conn.prepareStatement(updateSql)) {
                update.setString(1, userId);
                update.setString(2, eventId);
                update.setString(3, seatId);
                update.executeUpdate();
            }
            conn.commit();
        } catch (SQLException e) {
            // 55P03 is PostgreSQL's lock_not_available, raised by NOWAIT; getSQLState() may be null
            if ("55P03".equals(e.getSQLState())) {
                throw new SeatUnavailableException("Seat is locked by another transaction");
            }
            throw new RuntimeException(e);
        }
    }
}
Output
No output — this is a design pattern. But the key behavior: concurrent requests for different events hit different shards, so no lock contention. Requests for same event but different sections also hit different shards if section-based sharding is used.
Production Trap: SELECT FOR UPDATE NOWAIT
If you forget NOWAIT, your database will queue up thousands of waiting transactions. They'll all sit there until the configured lock timeout (for example, 30 seconds), and your connection pool will be exhausted in the meantime. Error: 'Connection pool exhausted' or 'Lock wait timeout exceeded'. Always use NOWAIT and handle the lock-not-available exception gracefully.
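The pro tip in this section maps events with hash(event_id) % 64, which is plain modulo hashing. For comparison, here is a minimal sketch of an actual consistent-hash ring for event+section routing. The class name `ConsistentHashRing`, the virtual-node count, and the choice of CRC32 are illustrative assumptions, not part of the original design.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch of a consistent-hash ring; all names are illustrative.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // CRC32 is used only because it ships with the JDK; any well-mixed hash works.
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Each shard is placed on the ring many times to even out the key distribution.
    void addShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(shard + "#" + i), shard);
        }
    }

    // Walk clockwise from the key's hash to the first virtual node, wrapping at the end.
    String shardFor(String eventId, String section) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no shards registered");
        }
        Long node = ring.ceilingKey(hash(eventId + ":" + section));
        return ring.get(node != null ? node : ring.firstKey());
    }
}
```

With a ring like this, adding a shard only moves the keys that now belong to the new shard; every other event+section stays where it was. Modulo hashing gives the same stability only as long as the divisor never changes.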
thecodeforge.io
Ticketmaster System Design for 10M Concurrent Users
Design Ticketmaster
Redis: The Seat Reservation Buffer That Saves Your Database
Even with sharding, your database can't handle millions of SELECT FOR UPDATE per second. That's where Redis comes in. Use Redis as a hot cache for seat availability, and check it before touching the database. But here's the trick: don't just cache the seat list. Use a Lua script (or WATCH/MULTI/EXEC) to reserve a seat atomically, which gives you sub-millisecond seat selection. The reservation has a TTL (e.g., 5 minutes). If the user doesn't complete payment within that time, the seat goes back to the pool. This is the 'soft reservation' pattern. The database is the source of truth, but Redis handles the burst: when a seat is reserved in Redis, you also write a reservation record to the database asynchronously. If Redis goes down, you fall back to the database (with degraded performance). Pro tip: use Redis Cluster for high availability. Each event's seat map lives in a single hash key, event:{eventId}:seats, where the field is seatId and the value is the status (available or sold). EXPIRE applies to a whole key, not to one hash field, so the hold must not be expressed as EXPIRE on the seat hash: that would expire every seat of the event at once. Keep each hold in its own key instead, e.g. event:{eventId}:hold:{seatId}, with the userId as the value and the TTL on that key. Lua script: if the seat's status is 'available' and no hold key exists, create the hold key with the TTL and return success; otherwise return failure.
RedisReservation.lua
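-- io.thecodeforge - SystemDesign tutorial
-- Lua script for atomic seat reservation in Redis
-- Called via EVALSHA for performance
-- KEYS[1] = event:{eventId}:seats (hash: seatId -> 'available' | 'sold')
-- KEYS[2] = event:{eventId}:hold:<seatId> (string: userId, carries the TTL)
--           In Redis Cluster both keys need the same hash tag to share a slot
-- ARGV[1] = seatId
-- ARGV[2] = userId
-- ARGV[3] = ttlSeconds
local seatStatus = redis.call('HGET', KEYS[1], ARGV[1])
-- A missing field (false) is an unknown seat, not a free one
if seatStatus ~= 'available' then
    return 0 -- seat sold or unknown
end
-- SET NX EX creates the hold only if none exists, and the TTL expires
-- this one hold rather than the whole seat map
local held = redis.call('SET', KEYS[2], ARGV[2], 'NX', 'EX', ARGV[3])
if held then
    return 1 -- success
else
    return 0 -- seat already reserved
end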
Output
Returns 1 if reservation succeeded, 0 if seat already taken.
Senior Shortcut: Use Lua Scripts for Atomicity
Don't use WATCH/MULTI/EXEC for high contention — they cause retries and wasted round trips. Lua scripts execute atomically on the Redis server. One round trip, no race conditions. Use EVALSHA with script caching for maximum performance.
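To make the soft-reservation rules concrete, here is a small in-memory model of the state transitions: available, held with a TTL, sold. It is a sketch for reasoning and unit tests, not a Redis client; the class `SoftReservations` and its method names are hypothetical, and the clock is injected so expiry can be exercised without sleeping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// In-memory model of the soft-reservation rules. It only mirrors the
// state transitions: available -> held (with TTL) -> sold.
class SoftReservations {
    private static final class Hold {
        final String userId;
        final long expiresAtMillis;

        Hold(String userId, long expiresAtMillis) {
            this.userId = userId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Hold> holds = new HashMap<>();
    private final Map<String, String> sold = new HashMap<>();
    private final LongSupplier clockMillis; // injected so tests can control time
    private final long ttlMillis;

    SoftReservations(LongSupplier clockMillis, long ttlMillis) {
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    // Succeeds only if the seat is neither sold nor covered by a live hold.
    synchronized boolean reserve(String seatId, String userId) {
        long now = clockMillis.getAsLong();
        Hold current = holds.get(seatId);
        boolean liveHold = current != null && current.expiresAtMillis > now;
        if (sold.containsKey(seatId) || liveHold) {
            return false;
        }
        holds.put(seatId, new Hold(userId, now + ttlMillis));
        return true;
    }

    // Called after payment succeeds; fails if the hold expired or belongs to someone else.
    synchronized boolean confirm(String seatId, String userId) {
        Hold current = holds.get(seatId);
        if (current == null || !current.userId.equals(userId)
                || current.expiresAtMillis <= clockMillis.getAsLong()) {
            return false;
        }
        holds.remove(seatId);
        sold.put(seatId, userId);
        return true;
    }
}
```

The important property is in `confirm`: a payment that completes after the hold has expired does not get the seat, because another user may already hold it.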
Seat Reservation Flow with Redis
Queue-Based Order Processing: Absorbing the Tsunami
Once a user reserves seats and proceeds to checkout, you can't process the order synchronously. Payment gateways are slow (500ms-2s). If you block on payment, you will exhaust your web server threads. Instead, enqueue an order processing message and return a 'processing' status to the user. The queue (RabbitMQ, Kafka, or SQS) acts as a shock absorber. Consumers pick up messages and process payments. If payment succeeds, mark seats as sold. If it fails, release seats back to the pool. The queue must be durable and have dead-letter queues for failed messages. Important: the reservation TTL in Redis must be longer than the worst-case time a message spends in the queue plus the payment call itself. If TTL is 5 minutes and queue processing takes 30 seconds, you're safe. But if the queue backs up to 10 minutes, reservations will expire before processing. Monitor queue depth and alert if it exceeds a threshold. Autoscale consumers based on queue depth. Pro tip: use a priority queue for VIP users or higher-priced tickets.
OrderProcessor.systemdesign
// io.thecodeforge - SystemDesign tutorial
// Order processing consumer that handles payment and seat confirmation
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderProcessor implements Runnable {
    // SLF4J is an assumption; the original listing used an undeclared logger
    private static final Logger logger = LoggerFactory.getLogger(OrderProcessor.class);
    private static final int MAX_RETRIES = 3;

    private final QueueClient queue;
    private final PaymentGateway paymentGateway;
    private final SeatInventoryClient inventoryClient;

    public OrderProcessor(QueueClient queue, PaymentGateway paymentGateway, SeatInventoryClient inventoryClient) {
        this.queue = queue;
        this.paymentGateway = paymentGateway;
        this.inventoryClient = inventoryClient;
    }

    @Override
    public void run() {
        // Stop cleanly when the worker thread is interrupted (e.g., on shutdown)
        while (!Thread.currentThread().isInterrupted()) {
            OrderMessage msg = queue.dequeue(30, TimeUnit.SECONDS); // long poll
            if (msg == null) continue;
            try {
                // Process payment. A retried message reaches this line again, so the
                // gateway call must be idempotent or the user can be charged twice.
                PaymentResult result = paymentGateway.charge(msg.getUserId(), msg.getAmount());
                if (result.isSuccess()) {
                    // Confirm seat sale in database
                    inventoryClient.confirmSeats(msg.getEventId(), msg.getSeatIds(), msg.getUserId());
                    // Send confirmation email (async)
                    NotificationService.sendConfirmation(msg.getUserId(), msg.getEventId(), msg.getSeatIds());
                } else {
                    // Payment failed: release seats back to pool
                    inventoryClient.releaseSeats(msg.getEventId(), msg.getSeatIds());
                    // Notify user
                    NotificationService.sendPaymentFailed(msg.getUserId());
                }
            } catch (Exception e) {
                // Transient error: requeue with a linearly growing delay (5s, 10s, 15s)
                if (msg.getRetryCount() < MAX_RETRIES) {
                    msg.incrementRetry();
                    queue.enqueue(msg, 5 * msg.getRetryCount(), TimeUnit.SECONDS); // delayed retry
                } else {
                    // Dead letter: manual intervention
                    queue.sendToDeadLetter(msg);
                    logger.error("Order processing failed after {} retries: {}", MAX_RETRIES, msg, e);
                }
            }
        }
    }
}
Output
No direct output. But the system behavior: orders are processed asynchronously, seats are confirmed only after payment success, and failures are retried with backoff.
The Classic Bug: Reservation TTL Shorter Than Queue Delay
If your queue backs up and processing takes longer than the Redis TTL, seats will be released back to the pool while the payment is still processing. Result: overselling. Always set TTL to at least 2x the expected max queue delay. Queue depth is a count and TTL is a duration, so alert on time instead: monitor the age of the oldest queued message and alert when it exceeds TTL/2.
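The TTL rule can be written down as two small checks. This is a sketch with hypothetical names (`ReservationBudget`, `holdOutlivesProcessing`, `shouldAlert`); the inputs are whatever worst-case figures you measure for your own queue and gateway.

```java
import java.time.Duration;

// Hypothetical helper that encodes the TTL-versus-queue-delay rule.
class ReservationBudget {
    // The hold is safe only if it outlives the worst-case queue wait plus the payment call.
    static boolean holdOutlivesProcessing(Duration ttl, Duration maxQueueDelay, Duration maxPaymentLatency) {
        return ttl.compareTo(maxQueueDelay.plus(maxPaymentLatency)) > 0;
    }

    // Alert once the oldest queued message has waited longer than half the TTL.
    static boolean shouldAlert(Duration oldestMessageAge, Duration ttl) {
        return oldestMessageAge.compareTo(ttl.dividedBy(2)) > 0;
    }
}
```

With the numbers from this section, a 5-minute TTL comfortably covers a 30-second queue delay and a 2-second payment call, but not a 10-minute backlog.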
Synchronous vs Queue-Based Payment
Rate Limiting: Don't Let the Stampede Through
Without rate limiting, your system will be overwhelmed by bots and aggressive users. But naive rate limiting (e.g., 10 requests per second per IP) will block legitimate users behind a NAT. The solution: multi-layer rate limiting. First layer: global rate limit at the API gateway (e.g., 1 million requests per minute). Second layer: per-user rate limit based on user ID (e.g., 100 requests per minute). Third layer: per-event rate limit (e.g., 10,000 requests per second per event). Use a sliding window algorithm (not fixed window) to avoid burst traffic at window boundaries. Store counters in Redis with a TTL equal to the window size. For per-user limits, use a sorted set with timestamps as scores. For global limits, use a simple counter with EXPIRE. Pro tip: use token bucket for per-user limits to allow short bursts. And always return a Retry-After header so clients can back off.
RateLimiter.systemdesign
// io.thecodeforge - SystemDesign tutorial
// Sliding window rate limiter using Redis sorted sets
import java.util.List;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class SlidingWindowRateLimiter {
    private final Jedis jedis;
    private final int limit;
    private final long windowSizeMillis;

    public SlidingWindowRateLimiter(Jedis jedis, int limit, long windowSizeMillis) {
        this.jedis = jedis;
        this.limit = limit;
        this.windowSizeMillis = windowSizeMillis;
    }

    public boolean allowRequest(String userId) {
        String key = "ratelimit:" + userId;
        long now = System.currentTimeMillis();
        long windowStart = now - windowSizeMillis;
        // MULTI/EXEC runs the four commands as one atomic unit
        Transaction t = jedis.multi();
        // Remove entries outside the window
        t.zremrangeByScore(key, 0, windowStart);
        // Count the requests already in the window (before this one)
        t.zcard(key);
        // Add current request; the UUID suffix keeps same-millisecond requests
        // from overwriting each other as one sorted-set member
        t.zadd(key, now, now + ":" + UUID.randomUUID());
        // Set TTL to avoid memory leaks
        t.expire(key, (int) (windowSizeMillis / 1000) + 1);
        List<Object> results = t.exec();
        long count = (Long) results.get(1); // zcard result
        // count excludes the current request, so "<" admits exactly `limit` per window.
        // Rejected requests are still recorded, which keeps hammering clients limited.
        return count < limit;
    }
}
Output
Returns true if request is allowed, false if rate limited. The Redis sorted set contains timestamps of recent requests.
Interview Gold: Sliding Window vs Fixed Window
Fixed window rate limiting (e.g., reset counter every minute) allows double the limit at the boundary. Sliding window smooths it out. In production, always use sliding window. Fixed window is only acceptable for non-critical rate limits like email sending.
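The section recommends a token bucket for per-user limits and a Retry-After header, but only shows the sliding window. Below is a minimal single-process token bucket sketch. The class `TokenBucket` and its methods are hypothetical names; a production version would keep the state in Redis so that all gateway nodes share it.

```java
import java.util.function.LongSupplier;

// Hypothetical single-process token bucket; not shared across servers.
class TokenBucket {
    private final double capacity;
    private final double refillPerSecond;
    private final LongSupplier clockMillis; // injected so tests can control time
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(int capacity, double refillPerSecond, LongSupplier clockMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.clockMillis = clockMillis;
        this.tokens = capacity; // start full so a short burst is allowed immediately
        this.lastRefillMillis = clockMillis.getAsLong();
    }

    // Adds the tokens earned since the last call, capped at capacity.
    private void refill() {
        long now = clockMillis.getAsLong();
        double elapsedSeconds = (now - lastRefillMillis) / 1000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillMillis = now;
    }

    // Takes one token if available; otherwise the request should be rejected.
    synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    // Whole seconds until one token is available; usable as a Retry-After value.
    synchronized long retryAfterSeconds() {
        refill();
        double missing = Math.max(0.0, 1.0 - tokens);
        return (long) Math.ceil(missing / refillPerSecond);
    }
}
```

A bucket with capacity 5 and a refill rate of 1 token per second lets a user fire 5 requests at once, then sustains 1 request per second.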
Handling the Payment Gateway: The Weakest Link
Payment gateways are the most failure-prone part of the system. They can be slow, return errors, or even double-charge. Never trust a synchronous payment response. Always use idempotency keys. Generate a unique key per order attempt. If the payment gateway returns a timeout, retry with the same idempotency key. The gateway will deduplicate. Also, implement a payment reconciliation job that runs every hour. It compares our order records with the gateway's transaction logs. If a payment succeeded on the gateway but we didn't confirm the seat (e.g., due to a crash), the job will fix it. Pro tip: use a circuit breaker for the payment gateway. If error rate exceeds 50% in a 1-minute window, stop sending requests and fail orders gracefully. Re-check after 30 seconds.
PaymentService.systemdesign
// io.thecodeforge - SystemDesign tutorial
// Payment service with idempotency and circuit breaker
// The gateway exceptions are assumed to be unchecked
import java.math.BigDecimal;
import java.util.concurrent.TimeUnit;

public class PaymentService {
    private final PaymentGateway gateway;
    private final CircuitBreaker circuitBreaker;
    private final IdempotencyStore idempotencyStore;

    public PaymentService(PaymentGateway gateway, CircuitBreaker circuitBreaker, IdempotencyStore idempotencyStore) {
        this.gateway = gateway;
        this.circuitBreaker = circuitBreaker;
        this.idempotencyStore = idempotencyStore;
    }

    public PaymentResult charge(String userId, String orderId, BigDecimal amount) {
        String idempotencyKey = orderId + ":" + userId;
        // Check if we already processed this idempotency key. A cached result needs
        // no gateway call, so it is served even while the breaker is open.
        PaymentResult cached = idempotencyStore.get(idempotencyKey);
        if (cached != null) {
            return cached;
        }
        if (!circuitBreaker.isAllowed()) {
            throw new CircuitBreakerOpenException("Payment gateway is down, try again later");
        }
        try {
            PaymentResult result = gateway.charge(idempotencyKey, userId, amount);
            idempotencyStore.set(idempotencyKey, result, 24, TimeUnit.HOURS);
            circuitBreaker.recordSuccess();
            return result;
        } catch (GatewayTimeoutException e) {
            circuitBreaker.recordFailure();
            // Outcome unknown: the caller retries with the same idempotency key
            throw e;
        } catch (GatewayDeclineException e) {
            // Payment declined: no retry
            PaymentResult declined = PaymentResult.declined(e.getReason());
            idempotencyStore.set(idempotencyKey, declined, 24, TimeUnit.HOURS);
            circuitBreaker.recordSuccess(); // Not a system failure
            return declined;
        }
    }
}
Output
Returns PaymentResult with success/failure. Idempotency ensures no double charges. Circuit breaker prevents cascading failures.
Senior Shortcut: Idempotency Keys Save Your Weekend
I've seen a payment gateway double-charge 10,000 users because of a network retry without idempotency. The fix: always generate a unique idempotency key per operation. Store it in Redis with a 24-hour TTL. If you get a timeout, retry with the same key. The gateway will return the original response.
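The payment listing calls `isAllowed()`, `recordSuccess()`, and `recordFailure()` on a circuit breaker without showing one. Here is a minimal sketch that follows the thresholds from this section: open when more than 50% of calls in a 1-minute window fail, re-check after 30 seconds. The minimum call count and the tumbling (rather than rolling) window are simplifying assumptions of this sketch, and it has no separate half-open state.

```java
import java.util.function.LongSupplier;

// Hypothetical circuit breaker matching the method names used by the payment listing.
class CircuitBreaker {
    private static final long WINDOW_MILLIS = 60_000; // 1-minute measurement window
    private static final long OPEN_MILLIS = 30_000;   // re-check after 30 seconds
    private static final int MIN_CALLS = 10;          // assumption: ignore tiny samples

    private final LongSupplier clockMillis; // injected so tests can control time
    private long windowStart;
    private int successes;
    private int failures;
    private long openedAt = -1; // -1 means the breaker is closed

    CircuitBreaker(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    synchronized boolean isAllowed() {
        if (openedAt < 0) {
            return true;
        }
        if (clockMillis.getAsLong() - openedAt >= OPEN_MILLIS) {
            // Cool-down is over: close the breaker and start a fresh window.
            openedAt = -1;
            resetWindow();
            return true;
        }
        return false;
    }

    synchronized void recordSuccess() {
        rollWindowIfExpired();
        successes++;
    }

    synchronized void recordFailure() {
        rollWindowIfExpired();
        failures++;
        int total = successes + failures;
        // failures * 2 > total is the integer form of "error rate above 50%".
        if (total >= MIN_CALLS && failures * 2 > total) {
            openedAt = clockMillis.getAsLong();
        }
    }

    private void rollWindowIfExpired() {
        if (clockMillis.getAsLong() - windowStart >= WINDOW_MILLIS) {
            resetWindow();
        }
    }

    private void resetWindow() {
        windowStart = clockMillis.getAsLong();
        successes = 0;
        failures = 0;
    }
}
```

While the breaker is open, orders can be failed fast or parked for retry instead of tying up consumers on a gateway that is already struggling.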
The Waiting Room: Queueing Users Before They Hit the Site
When 10 million users hit the site at once, even your best infrastructure will struggle. The solution: a virtual waiting room. Before users can even see the event page, they are placed in a queue. The queue assigns a position number and an estimated wait time. Users are admitted to the site in batches (e.g., 10,000 per minute). This is essentially a distributed rate limiter at the application level. Implement the waiting room using a Redis sorted set with the user's arrival timestamp as score. A background job periodically pops users from the front of the queue and issues a token (stored in Redis with a TTL). The user's browser polls for the token. Once they have it, they can proceed to the event page. The token is single-use and expires after 30 seconds. Pro tip: use WebSocket or Server-Sent Events for real-time queue position updates instead of polling. This reduces load.
WaitingRoom.systemdesign
// io.thecodeforge - SystemDesign tutorial
// Waiting room queue management using Redis sorted set
// Written against the Jedis 4+ API, where zpopmin returns List<Tuple>
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.ZAddParams;
import redis.clients.jedis.resps.Tuple;

public class WaitingRoom {
    private final Jedis jedis;
    private final int admissionRatePerMinute = 10000;

    public WaitingRoom(Jedis jedis) {
        this.jedis = jedis;
    }

    public void enqueueUser(String userId, String eventId) {
        String key = "waitingroom:" + eventId;
        long now = System.currentTimeMillis();
        // NX: a user who refreshes keeps the original place in line
        jedis.zadd(key, now, userId, ZAddParams.zAddParams().nx());
        // Key-level TTL: the whole queue is dropped 1 hour after the last arrival
        jedis.expire(key, 3600);
    }

    public String getQueuePosition(String userId, String eventId) {
        String key = "waitingroom:" + eventId;
        Long rank = jedis.zrank(key, userId);
        if (rank == null) return null; // user is not in the queue
        return String.valueOf(rank + 1);
    }

    public void admitUsers(String eventId) {
        String key = "waitingroom:" + eventId;
        // Meant to run once per second: 10,000 / 60 = 166 users per call (integer division)
        for (Tuple entry : jedis.zpopmin(key, admissionRatePerMinute / 60)) {
            String userId = entry.getElement();
            String tokenKey = "token:" + eventId + ":" + userId;
            String token = UUID.randomUUID().toString();
            jedis.setex(tokenKey, 30, token); // token expires in 30 seconds
            // Notify user via WebSocket or push
            NotificationService.notifyUser(userId, token);
        }
    }
}
Output
Users are admitted in batches. Each gets a single-use token valid for 30 seconds. Queue positions are real-time.
Production Trap: Token Expiry Too Short
If your token expires in 10 seconds, users with slow connections will miss it and have to re-queue. Set token TTL to 30 seconds minimum. Also, allow users to request a new token if the previous one expired (but only once per minute to prevent abuse).
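The waiting room shows each user a position and an estimated wait time; the estimate is just position divided by admission rate. A minimal sketch, with the hypothetical helper `WaitEstimate.estimatedWaitSeconds`:

```java
// Hypothetical helper: converts a queue position into an estimated wait.
class WaitEstimate {
    // position is 1-based; admissionRatePerMinute is how many users are let in per minute.
    static long estimatedWaitSeconds(long position, int admissionRatePerMinute) {
        if (position <= 0 || admissionRatePerMinute <= 0) {
            throw new IllegalArgumentException("position and rate must be positive");
        }
        // Integer form of ceil(position * 60 / rate)
        return (position * 60 + admissionRatePerMinute - 1) / admissionRatePerMinute;
    }
}
```

At 10,000 admissions per minute, position 500,000 waits about 3,000 seconds (50 minutes). The same arithmetic shows that 10 million queued users would take roughly 1,000 minutes to drain at that rate, so the admission rate has to be sized against how fast inventory actually sells.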
Monitoring and Alerting: What to Watch in Production
You can't fix what you don't measure. Here are the key metrics to monitor: 1) Queue depth of order processing queue — alert if > 10,000. 2) Redis memory usage — alert if > 80% of maxmemory. 3) Database connection pool utilization — alert if > 80%. 4) Payment gateway error rate — alert if > 5% in 5 minutes. 5) Seat reservation success rate — if it drops below 90%, something is wrong. 6) Waiting room queue length — if it grows faster than admission rate, you need to scale. Use a dashboard (Grafana) with these metrics. Set up PagerDuty alerts for critical thresholds. Pro tip: log every seat reservation and release with a unique trace ID. This allows you to debug overselling incidents by replaying the logs.
MonitoringConfig.systemdesign
# io.thecodeforge - SystemDesign tutorial
# Prometheus metrics for the ticketing system
# HELP ticket_reservations_total Total number of seat reservations
# TYPE ticket_reservations_total counter
ticket_reservations_total{event="tswift2025", status="success"} 50000
ticket_reservations_total{event="tswift2025", status="failure"} 1200
# HELP ticket_reservation_duration_seconds Duration of seat reservation
# TYPE ticket_reservation_duration_seconds histogram
ticket_reservation_duration_seconds_bucket{le="0.1"} 45000
ticket_reservation_duration_seconds_bucket{le="0.5"} 49000
ticket_reservation_duration_seconds_bucket{le="1.0"} 50000
ticket_reservation_duration_seconds_bucket{le="+Inf"} 50000
ticket_reservation_duration_seconds_sum 2500
ticket_reservation_duration_seconds_count 50000
# HELP order_queue_depth Current depth of order processing queue
# TYPE order_queue_depth gauge
order_queue_depth{queue="orders"} 1500
# HELP payment_gateway_error_rate Error rate for payment gateway calls
# TYPE payment_gateway_error_rate gauge
payment_gateway_error_rate{gateway="stripe"} 0.02
Output
Prometheus metrics that can be scraped and visualized in Grafana. Alert rules can be set on these metrics.
Senior Shortcut: Trace IDs Save Hours of Debugging
Include a trace ID in every log line from reservation to payment. When a user complains their seat was sold to someone else, you can grep the trace ID and see exactly what happened. Without it, you're guessing.
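The reservation success rate from metric 5 is derived from the two `ticket_reservations_total` counters. As a worked example, here is the arithmetic as a small sketch; `ReservationHealth` is a hypothetical helper, and treating an empty sample as healthy is an assumption.

```java
// Hypothetical helper: derives the reservation success rate from two counters.
class ReservationHealth {
    static double successRate(long successes, long failures) {
        long total = successes + failures;
        // Assumption: with no traffic at all, report 1.0 instead of dividing by zero.
        return total == 0 ? 1.0 : (double) successes / total;
    }

    // True when the rate drops below the alert threshold (e.g., 0.90).
    static boolean shouldAlert(long successes, long failures, double minRate) {
        return successRate(successes, failures) < minRate;
    }
}
```

With the sample counters above, 50,000 successes and 1,200 failures give 50,000 / 51,200, about 97.7%, which is above the 90% threshold.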
When Not to Use This Architecture
This architecture is overkill for small events (< 1,000 attendees). For those, a simple database with optimistic locking (version column) and a single server is fine. Also, if your event doesn't have a fixed capacity (e.g., general admission with no seat numbers), you don't need the reservation system. Just use a counter. And if you're not expecting a stampede (e.g., niche event), skip the waiting room. The complexity of sharding, Redis, queues, and waiting rooms is justified only when you have millions of concurrent users. For most systems, a simpler approach with caching and a CDN is sufficient. Don't over-engineer.
Never Do This: Over-Engineering for Small Events
I've seen a startup use Kafka, Redis Cluster, and 10 microservices for a 500-person meetup. They spent 3 months building it. The event sold out in 2 minutes with zero load. Use the simplest thing that works. Add complexity only when you have evidence of need.
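For the general-admission case, "just use a counter" can look like the sketch below: a compare-and-set loop that never lets the remaining count go negative. `GeneralAdmissionCounter` is a hypothetical name, and this version only works inside one process; across several servers the same idea is usually a conditional database update or an atomic decrement in a shared store.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical single-process counter for general-admission tickets.
class GeneralAdmissionCounter {
    private final AtomicInteger remaining;

    GeneralAdmissionCounter(int capacity) {
        this.remaining = new AtomicInteger(capacity);
    }

    // CAS loop: decrement only if enough tickets remain, so the count never goes negative.
    boolean tryTake(int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        while (true) {
            int current = remaining.get();
            if (current < quantity) {
                return false; // not enough tickets left
            }
            if (remaining.compareAndSet(current, current - quantity)) {
                return true;
            }
            // Another thread changed the count in between; re-read and retry.
        }
    }

    int remaining() {
        return remaining.get();
    }
}
```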
Production incident · POST-MORTEM · severity: high
The 4GB Container That Kept Dying
Symptom
During a major on-sale, the inventory service containers kept OOM-killing every 30 seconds. Users saw 'Service Unavailable' errors.
Assumption
We assumed a memory leak in the reservation code. Spent hours profiling heap dumps.
Root cause
The Redis client connection pool was set to 100 connections per container, and each connection held a 40MB buffer for pipelining. 100 connections × 40MB is about 4GB of buffers on a 4GB container, before counting the 2GB JVM heap, so the container blew through its memory limit and was OOM-killed almost immediately.
Fix
Reduced Redis connection pool to 20 per container. Increased container memory to 8GB. Set JVM heap to 4GB. Added connection pool monitoring alert.
Key lesson
Always calculate the memory footprint of connection pools before setting limits.
A single connection's buffer can be larger than you think.
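The arithmetic behind this incident fits in a few lines. The sketch below uses the figures from the post-mortem; `PoolMemoryBudget` is a hypothetical helper, and it deliberately ignores other JVM overhead such as metaspace and thread stacks, so treat a passing check as necessary rather than sufficient.

```java
// Hypothetical helper: checks whether heap plus connection buffers fit in a container.
class PoolMemoryBudget {
    static long bufferMegabytes(int connections, int bufferMbPerConnection) {
        return (long) connections * bufferMbPerConnection;
    }

    // Ignores metaspace, thread stacks, and other off-heap use.
    static boolean fits(int containerMb, int heapMb, int connections, int bufferMbPerConnection) {
        return heapMb + bufferMegabytes(connections, bufferMbPerConnection) <= containerMb;
    }
}
```

Before the fix: 2,048MB heap + 100 × 40MB = 6,048MB against a 4,096MB container. After the fix: 4,096MB heap + 20 × 40MB = 4,896MB against 8,192MB.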
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit.
Symptom · 01
Users report 'Seat already taken' immediately after selecting
→
Fix
1. Check Redis reservation TTL — is it too short? 2. Verify Lua script atomicity — are there race conditions? 3. Check if reservation release job is running too frequently. 4. Review logs for duplicate reservation attempts.
Symptom · 02
Order processing queue growing faster than consumers can drain
→
Fix
1. Check consumer autoscaling — is it enabled? 2. Verify payment gateway latency — is it slow? 3. Increase max consumers. 4. If gateway is slow, implement circuit breaker to fail fast. 5. Consider adding a dead-letter queue for poison messages.
Symptom · 03
Redis memory usage at 100% causing evictions
→
Fix
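1. Check maxmemory-policy. For seat and reservation keys use noeviction: allkeys-lru would silently evict live seat data, and the volatile-* policies would drop holds early. Keep evictable cache data on a separate instance. 2. Reduce TTL on reservation keys. 3. Increase Redis cluster size. 4. Monitor key patterns: are there leaked keys without TTL? 5. Use redis-cli --bigkeys to find large keys.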
Symptom · 04
Waiting room queue not draining
→
Fix
1. Check admission rate — is it too low? 2. Verify token generation job is running. 3. Check if users are polling correctly — maybe they're not using the token. 4. Increase admission rate gradually. 5. Monitor Redis sorted set size.
Design Ticketmaster Triage Cheat Sheet
First-response commands for when things go wrong, copy-paste ready.
redis-cli HGETALL event:{eventId}:seats | head -20
redis-cli TTL event:{eventId}:seats
Fix now
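TTL returns -2 when the key is missing and -1 when it has no expiry; it does not report 0 for a live key. If the key is missing, reload the seat map from the DB. Do not put a 300-second EXPIRE on the seat hash itself: that expires every seat of the event at once. If many seats are reserved by stale users, flush with DEL and reload from DB.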
Order queue backlog: `order_queue_depth > 10000`
Immediate action
Check consumer count and payment gateway health
Commands
kubectl get pods -l app=order-processor | wc -l
curl -s http://payment-gateway/health | jq .
Fix now
If gateway is healthy, scale consumers: kubectl scale deployment order-processor --replicas=20. If unhealthy, enable circuit breaker and fail orders gracefully.
Redis OOM: `OOM command not allowed when used memory > 'maxmemory'`
Immediate action
Check memory policy and largest keys
Commands
redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human|evicted_keys'
redis-cli --bigkeys
Fix now
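Increase maxmemory if the host has the RAM: redis-cli CONFIG SET maxmemory 8gb. Switching the policy (redis-cli CONFIG SET maxmemory-policy allkeys-lru) makes the error go away, but it lets Redis evict live seat and reservation keys, so use it only on instances that hold pure cache data.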
Waiting room not admitting users: `No token issued`
If admission job is stuck, restart it: kubectl delete pod -l app=waiting-room-admitter. If queue is empty, users may have abandoned — check frontend polling.
| Feature / Aspect | Database-Only Approach | Redis + Queue Approach |
| --- | --- | --- |
| Seat reservation latency | 10-50ms (with index) | <1ms (Redis Lua script) |
| Concurrent users supported | ~10,000 (single DB) | 1,000,000+ (sharded + Redis) |
| Overselling risk | High (race conditions) | Low (atomic Lua scripts) |
| Complexity | Low | High |
| Cost | Low | High (Redis cluster, queue infrastructure) |
| Failure mode | Database crash = total outage | Redis crash = degraded (fallback to DB) |
Key takeaways
1. Shard your inventory database by event to avoid write contention. Use consistent hashing to map events to shards.
2. Use Redis with Lua scripts for atomic seat reservations with TTL. This handles the burst without melting your database.
3. Process orders asynchronously via a queue to absorb spikes. Monitor queue depth and autoscale consumers.
4. Always use idempotency keys for payment processing to prevent double charges. Implement a circuit breaker for payment gateways.
5. Implement a virtual waiting room to control the flow of users into the site. Admit in batches to prevent stampede.
6. Monitor key metrics: queue depth, Redis memory, DB connection pool, payment error rate. Set up alerts for thresholds.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01 of 06 · SENIOR
How does your system handle a scenario where a user reserves a seat but never completes payment? What prevents that seat from being locked forever?
ANSWER
The reservation has a TTL in Redis (e.g., 5 minutes). A background job runs every minute, scanning for expired reservations and releasing them back to the pool. The database also has a 'reserved_at' timestamp; a cron job releases seats where reservation is older than TTL. This ensures no seat is locked indefinitely.
Q02 of 06 · SENIOR
When would you choose a queue-based order processing over synchronous processing? What are the trade-offs?
ANSWER
Use async queue processing when you expect high concurrency and can't afford to block web server threads on slow payment gateways. Trade-offs: increased complexity (message durability, retries, dead-letter queues), eventual consistency (user sees 'processing' instead of immediate confirmation), and potential for duplicate payments if idempotency is not implemented. Synchronous is simpler but limits throughput.
Q03 of 06 · SENIOR
What happens when Redis goes down during a hot on-sale? How do you prevent a total outage?
ANSWER
Implement a fallback to the database. When Redis is unavailable, the reservation service reads seat availability directly from the database (with caching disabled). The database will be slower, but the system remains operational. Also, use Redis Sentinel or Cluster for automatic failover. The fallback should be tested regularly in chaos engineering exercises.
Q04 of 06 · JUNIOR
Explain how you would shard the seat inventory database. What sharding key would you use and why?
ANSWER
Shard by event_id. Each event's seats are stored in a separate shard. This isolates load from different events. For very large events, further shard by section or row. Use consistent hashing to map event_id to shard: adding a shard then moves only a small fraction of events instead of remapping almost all of them. The shard map is stored in ZooKeeper and cached by the application.
Q05 of 06 · SENIOR
A user reports that they successfully paid for a ticket but the system shows 'payment failed' and released the seat. How do you debug this?
ANSWER
First, check the payment gateway logs for the transaction using the idempotency key. If the gateway shows success, check our order processing logs — did the consumer crash after payment but before confirming the seat? If so, the reconciliation job should fix it. If not, manually confirm the seat and refund if double-charged. The root cause is likely a missing idempotency check or a crash between payment and seat confirmation.
Q06 of 06 · SENIOR
How would you design the system to handle a 'Taylor Swift effect' where 10 million users hit the site at exactly 10:00 AM?
ANSWER
Use a virtual waiting room to admit users in batches. Pre-warm caches with event data. Use Redis for seat reservations with Lua scripts. Shard the database by event. Use a queue for order processing. Rate limit at multiple layers. Autoscale all services. Conduct load testing to find bottlenecks. Have a detailed runbook for common failure modes.
FAQ · 4 QUESTIONS
Frequently Asked Questions
01
How does Ticketmaster handle millions of concurrent users during ticket sales?
A Ticketmaster-style system, as designed in this article, combines sharded databases, Redis caching, queue-based order processing, and a virtual waiting room. Users are placed in a queue before accessing the site. Seat reservations are handled atomically in Redis with Lua scripts. Orders are processed asynchronously to absorb spikes. Rate limiting prevents abuse.
02
What's the difference between a fixed window and sliding window rate limiter?
A fixed window rate limiter resets the counter at the end of each window (e.g., every minute), allowing double the limit at the boundary. A sliding window rate limiter uses a rolling time window, smoothing out traffic and preventing bursts. For production systems, always use sliding window.
03
How do I prevent overselling tickets in a high-concurrency system?
Use atomic seat reservations in Redis with Lua scripts. Set a TTL on reservations. Process payments asynchronously and confirm seats only after successful payment. Implement a reconciliation job to fix inconsistencies. Use idempotency keys to prevent duplicate charges.
04
What happens if Redis goes down during a ticket sale?
Implement a fallback to the database. The system will be slower but operational. Use Redis Sentinel or Cluster for automatic failover. Test the fallback regularly. Also, consider using a multi-region Redis deployment for high availability.