Mid-level 5 min · March 06, 2026

E-commerce System Design — Flash Sale Race Conditions

SELECT-then-UPDATE inventory causes double-bookings under load.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • E-commerce platforms are distributed systems managing product discovery, cart, checkout, payments, and inventory under high concurrency
  • Key components: product catalog service, cart service, checkout orchestration, payment gateway, inventory service
  • Performance insight: Product search must return in <200ms; use Elasticsearch with caching
  • Production insight: Without idempotency in payments, a single retry can charge a customer twice — use idempotency keys
  • Biggest mistake: Keeping catalog and inventory in the same service, which leads to tight coupling and checkout failures
Plain-English First

Imagine you're running the world's biggest flea market. You've got thousands of sellers, millions of buyers, and everyone wants to browse, pick something, pay, and get it delivered — all at the same time, without chaos. Building an e-commerce platform is exactly that: designing the invisible plumbing that makes sure the right product gets to the right buyer, the money moves safely, and nothing crashes when a flash sale hits at midnight.

Amazon processes over 66,000 orders per minute at peak. Shopify powers over 4 million stores. These aren't just databases with a shopping cart bolted on — they're distributed systems solving some of the hardest problems in engineering: consistency under concurrency, sub-second search over millions of products, payment reliability, and inventory accuracy across warehouses. If you're designing an e-commerce platform from scratch, every architectural decision you make will either hold up under that load or quietly become technical debt that kills you at scale.

The core problem e-commerce platforms solve is deceptively simple on the surface: let someone find a product, add it to a cart, pay for it, and receive it. But underneath that user flow are a dozen non-trivial challenges — you need to prevent two buyers from purchasing the last item simultaneously, ensure a failed payment never charges a card twice, serve product search results in under 200ms, and handle a 10x traffic spike the moment a celebrity tweets about your product. Each of these requires a deliberate architectural choice, and the wrong choice doesn't just slow things down — it loses money or erodes customer trust instantly.

By the end of this article, you'll be able to walk into a system design interview and confidently sketch the architecture of a production-grade e-commerce platform. You'll understand why the product catalog and inventory services must be separated, how to handle the distributed transaction problem at checkout, what caching strategy keeps product pages fast, and how to design a payment system that is both idempotent and fault-tolerant. This is the article I wish existed when I was preparing for those interviews.

Core Components & Service Separation

An e-commerce platform is a set of loosely coupled services, each responsible for one domain. The core services are:

  • Product Catalog Service: Manages product metadata (name, description, images, categories, prices). This is read-heavy and benefits from caching and Elasticsearch.
  • Cart Service: Manages user sessions, add/remove items, coupon application. It's write-heavy for the current session but read-only for historical data.
  • Checkout Orchestrator: Coordinates the actual purchase — validates cart, locks inventory, calls payment gateway, creates order. This is the most failure-sensitive service.
  • Inventory Service: Tracks stock levels across warehouses, reserves items during checkout, handles restocks.
  • Payment Service: Interacts with external gateways (Stripe, PayPal), stores idempotency keys, handles retries.
  • Order Service: Records completed orders, sends notifications, manages returns.

The mistake everyone makes is bundling inventory with the catalog. These have completely different access patterns: catalog is read-heavy and stale-ok, inventory is write-heavy and consistency-critical. Keep them separate from the start.

Service Boundaries Mental Model
  • Catalog: Fast reads, eventual consistency acceptable.
  • Cart: Temporary state, can be lost without financial impact.
  • Inventory: Strong consistency, cannot oversell.
  • Payment: Must be idempotent and auditable.
  • Order: Immutable after creation, source of truth.
Production Insight
A single-service monolith works for <10k products and <100 concurrent users.
Once you hit 100k products and 1k concurrent users, the catalog's read load kills inventory write throughput.
Rule: Strip inventory from catalog at the very start — you'll never untangle it later.
Key Takeaway
Service separation is about access pattern differences.
Read-heavy vs write-heavy vs consistency-critical — each belongs in its own service.
Never share a database between catalog and inventory.
When to Split Services
  • If product count < 10k and < 100 concurrent users: a monolith with separate modules is fine. Focus on clean code.
  • If product count > 50k and traffic spikes are expected: split catalog and inventory immediately, and use Elasticsearch for search.
  • If you have multiple payment gateways or complex promotions: split the checkout orchestrator from the order service so each can scale independently.

Product Search & Catalog Performance

Product search is the gateway to purchase. Users expect results in under 200ms, with filters for category, price range, rating, and sorting by relevance or newest. Achieving this at scale means you cannot query the primary database directly.

Architecture:
  • Use Elasticsearch as the search index. It supports full-text search, faceted aggregation, and fuzzy matching out of the box.
  • Keep a read-through cache (Redis) for product detail pages (PDP). The cache key is product_id:locale:version.
  • For autocomplete, use a prefix-based Trie in memory or Elasticsearch's completion suggester.

The search index is built from the product catalog database using change data capture (CDC) with Debezium. Updates propagate within seconds — eventual consistency is acceptable here because a stale product in search is better than a failing search.

Caveats:
  • Ranking by a combination of relevance and a field (e.g., relevance * price) cannot be done with a plain sort; it needs a function_score or script_score query.
  • Facet counts can be expensive; cache them separately and invalidate on product updates.
  • Avoid deep pagination (>100th page); use search_after instead of from/size.

elasticsearch-mapping.json
{
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "standard" },
      "description": { "type": "text", "analyzer": "english" },
      "category_id": { "type": "keyword" },
      "price": { "type": "float" },
      "rating": { "type": "float" },
      "stock_status": { "type": "keyword" },
      "created_at": { "type": "date" }
    }
  },
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  }
}
Search Performance Trap
Using 'wildcard' queries for partial matching is a common mistake. Wildcard queries in Elasticsearch are slow because they scan all terms. Use 'match_phrase_prefix' with 'index_prefixes' mapping instead. It's up to 10x faster.
Production Insight
A flash sale can cause the product cache to miss for newly popular items.
If the cache isn't warmed, the catalog DB sees a thundering herd of queries.
Rule: Pre-warm cache with top-1000 products before any known traffic spike.
Invest in a circuit breaker on catalog DB reads — 500 errors are better than a 30-minute DB outage.
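A minimal sketch of how a read-through cache can absorb that herd, assuming an in-process map in place of Redis: concurrent misses for the same key share one in-flight load instead of each querying the catalog DB. `ProductCache`, its loader function, and the `key` helper are hypothetical names, and TTLs and eviction are left out for brevity.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical in-process read-through cache; in production this role is usually played by Redis.
class ProductCache<V> {
    private final ConcurrentHashMap<String, CompletableFuture<V>> entries = new ConcurrentHashMap<>();
    private final Function<String, V> loader; // assumed: reads one product from the catalog DB

    ProductCache(Function<String, V> loader) {
        this.loader = loader;
    }

    // Key layout from the article: product_id:locale:version (bump version on product update).
    static String key(String productId, String locale, long version) {
        return productId + ":" + locale + ":" + version;
    }

    // computeIfAbsent is atomic per key, so concurrent misses share one in-flight load
    // instead of each sending its own query to the catalog DB (the thundering herd).
    V get(String key) {
        CompletableFuture<V> future =
                entries.computeIfAbsent(key, k -> CompletableFuture.supplyAsync(() -> loader.apply(k)));
        try {
            return future.join();
        } catch (RuntimeException e) {
            entries.remove(key, future); // don't cache failures; the next request retries the load
            throw e;
        }
    }

    // Pre-warm before a known spike, e.g. with the top-1000 product keys.
    void warm(List<String> keys) {
        keys.forEach(this::get);
    }

    // Invalidate on product update.
    void invalidate(String key) {
        entries.remove(key);
    }
}
```

Coalescing only removes duplicate loads of the same key. A flash sale usually hits many cold keys at once, so pre-warming still matters.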
Key Takeaway
Search is not a database query: use Elasticsearch fed by CDC.
Cache product pages aggressively, but invalidate on update.
Pre-warm, never let a spike hit the DB cold.

Cart & Checkout Consistency

The cart seems innocuous — items, quantities, maybe a promo code. But checkout is where distributed systems meet financial reality. The cart state must be consistent while the checkout orchestrator runs a mini-saga across inventory, payment, and order services.

Cart Design:
  • Store the cart in Redis as a hash with a TTL (e.g., 24 hours). This is fast and transient.
  • On checkout initiation, move cart data to a persistent checkout session in PostgreSQL.
  • Lock the cart to prevent modifications during checkout.
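The cart rules above can be sketched as follows, assuming an in-memory map standing in for the Redis hash so the TTL and lock behavior are visible. `CartStore` and its methods are hypothetical; a real implementation would rely on Redis key expiry and persist the snapshot returned by `lockForCheckout` to PostgreSQL.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical in-memory stand-in for the Redis cart hash: per-cart TTL plus a checkout lock.
class CartStore {
    private static final class Cart {
        final Map<String, Integer> items = new HashMap<>(); // sku -> quantity
        Instant expiresAt;
        boolean locked; // set when checkout starts
    }

    private final Map<String, Cart> carts = new HashMap<>();
    private final Duration ttl;
    private final Supplier<Instant> clock;

    CartStore(Duration ttl, Supplier<Instant> clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    synchronized void addItem(String cartId, String sku, int quantity) {
        Cart cart = liveCart(cartId);
        if (cart == null) {
            cart = new Cart();
            carts.put(cartId, cart);
        }
        if (cart.locked) {
            throw new IllegalStateException("cart " + cartId + " is locked for checkout");
        }
        cart.items.merge(sku, quantity, Integer::sum);
        cart.expiresAt = clock.get().plus(ttl); // sliding TTL, like calling EXPIRE after each write
    }

    // Locks the cart and returns an immutable snapshot to persist as the checkout session.
    synchronized Map<String, Integer> lockForCheckout(String cartId) {
        Cart cart = liveCart(cartId);
        if (cart == null) {
            throw new IllegalStateException("cart " + cartId + " expired or does not exist");
        }
        cart.locked = true;
        return Map.copyOf(cart.items);
    }

    // Called when checkout fails so the shopper can edit the cart again.
    synchronized void unlock(String cartId) {
        Cart cart = carts.get(cartId);
        if (cart != null) {
            cart.locked = false;
        }
    }

    private Cart liveCart(String cartId) {
        Cart cart = carts.get(cartId);
        if (cart != null && !clock.get().isBefore(cart.expiresAt)) {
            carts.remove(cartId); // TTL elapsed: behave as if Redis had expired the key
            return null;
        }
        return cart;
    }
}
```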

Checkout Orchestrator Steps:
  1. Validate cart (prices, stock, promo codes).
  2. Reserve inventory items (atomic decrement in inventory service).
  3. Call payment gateway with idempotency key.
  4. On payment success, create order record.
  5. If payment fails, release inventory reservations (compensating transaction).

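This is the Saga pattern: a sequence of local transactions, each paired with a compensating action. Avoid distributed transactions (2PC): every participant holds its locks until the coordinator decides, so throughput drops under load, and a coordinator failure can leave participants blocked. External payment gateways do not take part in 2PC anyway.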
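A minimal saga runner, as a sketch under the assumption that each step exposes a local action and a compensation. The `Saga` class and `Step` record are hypothetical, not a specific framework's API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical minimal saga runner. Each step is a local transaction in one service plus a
// compensating action that semantically undoes it (e.g. release an inventory reservation).
final class Saga {
    record Step(String name, Runnable action, Runnable compensation) {}

    private Saga() {}

    // Runs steps in order. On the first failure, compensates the completed steps in reverse
    // order and returns false. A production saga would also persist its progress and retry
    // compensations that fail; this sketch does neither.
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run(); // most recent step is undone first
                }
                return false;
            }
        }
        return true;
    }
}
```

For checkout, the steps would be: reserve inventory (compensate by releasing it), charge payment (compensate by refunding), then create the order. Only completed steps are compensated, so a declined charge triggers the inventory release but not a refund.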

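Consistency Guarantee:
  • Use an outbox pattern: the order service writes the order row and an event row to an outbox table in the same local database transaction, and a background worker publishes pending events to a message queue.
  • This ensures no order event is lost even if the message broker is down: the event waits in the outbox until publishing succeeds.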
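To show why the outbox write must share a transaction with the order, here is an in-memory sketch in which each `synchronized` method stands in for one database transaction. `OutboxDemo`, `OutboxRow`, and the `publish` callback are hypothetical; a real relay would poll a table and publish to Kafka or RabbitMQ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model of the outbox pattern. Each synchronized method stands in for
// one local database transaction spanning the orders table and the outbox table.
class OutboxDemo {
    record OutboxRow(long id, String payload) {}

    private final List<String> orders = new ArrayList<>();
    private final List<OutboxRow> outbox = new ArrayList<>();
    private long nextId = 1;

    // The order and its event are written together: either both exist or neither does.
    synchronized void createOrder(String orderId) {
        orders.add(orderId);
        outbox.add(new OutboxRow(nextId++, "OrderCreated:" + orderId));
    }

    // Background relay: publish pending rows in order and delete each one only after the broker
    // accepts it. If the broker is down, rows stay put and are retried on the next poll.
    // Delivery is at-least-once, so consumers should deduplicate by row id.
    synchronized int relay(Predicate<OutboxRow> publish) {
        int sent = 0;
        for (var it = outbox.iterator(); it.hasNext(); ) {
            OutboxRow row = it.next();
            if (!publish.test(row)) {
                break; // broker unavailable: stop and keep the remaining rows
            }
            it.remove();
            sent++;
        }
        return sent;
    }

    synchronized int orderCount() {
        return orders.size();
    }

    synchronized int pending() {
        return outbox.size();
    }
}
```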

io/thecodeforge/checkout/CheckoutOrchestrator.java
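package io.thecodeforge.checkout;

import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Checkout as a saga. The four collaborator services and the request/result types
// (CheckoutRequest, Cart, PaymentResult, Order, OrderResult) are defined elsewhere.
public class CheckoutOrchestrator {
    private final CartService cartService;
    private final InventoryService inventoryService;
    private final PaymentService paymentService;
    private final OrderService orderService;

    public CheckoutOrchestrator(CartService cartService, InventoryService inventoryService,
                                PaymentService paymentService, OrderService orderService) {
        this.cartService = cartService;
        this.inventoryService = inventoryService;
        this.paymentService = paymentService;
        this.orderService = orderService;
    }

    public OrderResult checkout(CheckoutRequest request) {
        // attemptId is created once per checkout attempt and reused on retries
        String idempotencyKey = generateIdempotencyKey(
            request.userId(), request.cartId(), request.attemptId());

        // 1. Validate cart: lock it, then re-check prices (validatePrices omitted for brevity)
        Cart cart = cartService.getLockedCart(request.cartId());
        validatePrices(cart);

        // 2. Reserve inventory (atomic conditional decrement inside the inventory service)
        boolean reserved = inventoryService.reserveItems(cart.items());
        if (!reserved) {
            return OrderResult.failure("Out of stock");
        }

        // 3. Charge payment; every retry inside PaymentService reuses the same key
        PaymentResult payment = paymentService.charge(
            request.paymentToken(),
            cart.totalCents(),   // money in minor units (cents), not a floating-point value
            idempotencyKey
        );
        if (!payment.isSuccess()) {
            // Compensating transaction: undo the reservation from step 2
            inventoryService.releaseReservation(cart.items());
            return OrderResult.failure("Payment declined");
        }

        // 4. Create the order; the order row and its outbox event commit in one local transaction
        Order order = orderService.createOrder(cart, payment.transactionId());
        return OrderResult.success(order);
    }

    // Deterministic: the same attempt always yields the same key (no per-request timestamps or
    // random values), so a retried charge is deduplicated instead of billed twice.
    private static String generateIdempotencyKey(String userId, String cartId, String attemptId) {
        byte[] seed = (userId + ":" + cartId + ":" + attemptId).getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(seed).toString();
    }
}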
Idempotency Key Mental Model
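  • Generate a deterministic key: hash(userId + cartId + attemptId), where the attempt ID (or timestamp) is generated once per checkout attempt and reused on every retry. Regenerating it per request would produce a new key on each retry.
  • The payment gateway must return the original result for a duplicate key with the same payload, and reject a reused key with a different payload.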
  • Store key in a database table with a unique constraint to enforce idempotency.
  • If the request times out, retry with the same key — no double charge.
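A sketch of deterministic key derivation, assuming the client creates an `attemptId` once per checkout attempt and resends it on every retry. `IdempotencyKeys.forPayment` is a hypothetical helper that uses only the JDK (`MessageDigest`, and `HexFormat` from Java 17).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper: derive a stable key from identifiers that do not change across retries.
final class IdempotencyKeys {
    private IdempotencyKeys() {}

    // attemptId is created once when the shopper clicks "Pay" and reused for every retry.
    static String forPayment(String userId, String cartId, String attemptId) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            // Length-prefix each part so ("ab", "c") and ("a", "bc") cannot produce the same input.
            for (String part : new String[] {userId, cartId, attemptId}) {
                byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
                sha256.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
                sha256.update((byte) ':');
                sha256.update(bytes);
            }
            return HexFormat.of().formatHex(sha256.digest()); // 64 hex characters
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```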
Production Insight
The most common checkout failure is not handling the gap between payment success and order creation.
If the payment callback is processed but order creation fails, the customer is charged but gets nothing.
Rule: Use an outbox pattern to make order creation durable before acknowledging payment.
Always log payment callback with raw payload for manual reconciliation.
Key Takeaway
Checkout is a saga — not a distributed transaction.
Idempotency keys prevent duplicate charges.
Outbox pattern guarantees order durability.

Payment System Reliability

Payment is the most critical subsystem: it moves real money. A successful payment must result in exactly one order and one charge. Payment systems at scale rely on three pillars: idempotency keys stored on your side, retries with backoff, and a gateway that enforces those same keys on its side.

Idempotency:
  • Before calling a payment gateway, generate a unique idempotency key (e.g., UUID per order attempt).
  • Store the key and the request payload in a database table with a unique constraint.
  • On a timeout or network error, retry with the same key. The gateway returns the original result.
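The unique-constraint behavior can be modeled in memory as below, assuming `putIfAbsent` on a concurrent map as a stand-in for an `INSERT` that hits a unique index. `IdempotencyStore` is hypothetical; rejecting a reused key with a different payload mirrors how Stripe treats a key reused with different parameters.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical in-memory model of an idempotency-key table. putIfAbsent plays the role of an
// INSERT that fails on the unique constraint; the stored future holds the first attempt's result.
class IdempotencyStore<R> {
    private record Entry<T>(String payload, CompletableFuture<T> result) {}

    private final ConcurrentHashMap<String, Entry<R>> table = new ConcurrentHashMap<>();

    R execute(String key, String payload, Supplier<R> charge) {
        Entry<R> mine = new Entry<>(payload, new CompletableFuture<>());
        Entry<R> existing = table.putIfAbsent(key, mine);
        if (existing != null) {
            if (!existing.payload().equals(payload)) {
                // Same key, different request: reject instead of charging again
                throw new IllegalArgumentException("idempotency key reused with a different payload");
            }
            return existing.result().join(); // wait for the first attempt, then replay its result
        }
        try {
            R result = charge.get();
            mine.result().complete(result);
            return result;
        } catch (RuntimeException e) {
            // Free the key so a retry can run. Retrying stays safe only because the
            // gateway receives the same key and deduplicates on its side.
            table.remove(key, mine);
            mine.result().completeExceptionally(e);
            throw e;
        }
    }
}
```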

Retry Strategy:
  • Use exponential backoff with jitter: first retry after 1s, then 2s, 4s, up to 60s max.
  • After 3 retries, escalate to a dead-letter queue for manual review.
  • Monitor the rate of payment timeouts; a sudden spike may indicate a gateway issue.

Failure Modes:
  • Double charge: happens when idempotency is missing or the gateway doesn't support it. Always use a payment provider that supports idempotency keys (Stripe, Braintree, Adyen).
  • Silent failures: the payment fails but the customer doesn't get an error, and the system leaves the order as pending. Use a reconciliation job that compares pending orders with gateway transactions daily.
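A sketch of the reconciliation step, assuming pending orders and gateway transactions can both be matched by the idempotency key sent with each charge. `Reconciler` and its record types are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical daily reconciliation: match pending orders to gateway transactions by idempotency key.
final class Reconciler {
    enum Action { MARK_PAID, MARK_FAILED }

    record PendingOrder(String orderId, String idempotencyKey) {}
    record GatewayTxn(String idempotencyKey, boolean succeeded) {}

    private Reconciler() {}

    // Returns orderId -> action. Orders with no gateway record get no action and are left
    // pending for manual review (the charge may never have reached the gateway).
    static Map<String, Action> reconcile(List<PendingOrder> pending, List<GatewayTxn> txns) {
        Map<String, GatewayTxn> byKey = new HashMap<>();
        for (GatewayTxn t : txns) {
            byKey.put(t.idempotencyKey(), t);
        }
        Map<String, Action> actions = new HashMap<>();
        for (PendingOrder o : pending) {
            GatewayTxn t = byKey.get(o.idempotencyKey());
            if (t != null) {
                actions.put(o.orderId(), t.succeeded() ? Action.MARK_PAID : Action.MARK_FAILED);
            }
        }
        return actions;
    }
}
```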

io/thecodeforge/payment/PaymentService.java
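package io.thecodeforge.payment;

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeoutException;

// Retries transient gateway failures with exponential backoff and jitter. PaymentGateway,
// DeadLetterQueue, PaymentResult, PaymentFailureEvent and NetworkException are defined elsewhere.
public class PaymentService {
    private static final int MAX_ATTEMPTS = 4;          // 1 initial call + 3 retries
    private static final long BASE_DELAY_MS = 1_000;
    private static final long MAX_DELAY_MS = 60_000;

    private final PaymentGateway gateway;
    private final DeadLetterQueue deadLetterQueue;

    public PaymentService(PaymentGateway gateway, DeadLetterQueue deadLetterQueue) {
        this.gateway = gateway;
        this.deadLetterQueue = deadLetterQueue;
    }

    // amountCents: money in minor units; a double cannot represent most decimal amounts exactly
    public PaymentResult charge(String token, long amountCents, String idempotencyKey) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                // Every attempt sends the SAME key, so the gateway can return the original result
                PaymentResult result = gateway.charge(token, amountCents, idempotencyKey);
                if (result.isSuccess() || result.isDecline()) {
                    return result; // success, or a non-retriable decline: stop immediately
                }
                // Any other failure is treated as transient and retried after the backoff below
            } catch (TimeoutException | NetworkException e) {
                if (attempt == MAX_ATTEMPTS) {
                    // Outcome unknown: the charge may have gone through. Park it for reconciliation.
                    deadLetterQueue.send(new PaymentFailureEvent(token, amountCents, idempotencyKey, e));
                    return PaymentResult.failure("Payment processing delayed; please contact support");
                }
            }
            if (attempt < MAX_ATTEMPTS) {
                sleepWithBackoff(attempt);
            }
        }
        return PaymentResult.failure("Max retries exceeded");
    }

    // Exponential backoff with jitter: 1s, 2s, 4s, each multiplied by a random factor in [1, 2), capped at 60s
    private static void sleepWithBackoff(int attempt) {
        long base = Math.min(MAX_DELAY_MS, BASE_DELAY_MS << (attempt - 1));
        long delay = Math.min(MAX_DELAY_MS, (long) (base * (1 + ThreadLocalRandom.current().nextDouble())));
        try {
            Thread.sleep(delay);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            throw new IllegalStateException("Interrupted while retrying payment", ie);
        }
    }
}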
Idempotency Key Storage
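Store the idempotency key with a unique constraint in a dedicated table. If a retry arrives before the first request has completed, the second insert blocks on the unique index until the first transaction commits, rolls back, or the lock wait times out. If the first commits, the second insert fails with a unique-violation error and should return the stored result instead of charging again; if the first rolls back, the second proceeds. Use a retry loop with a short timeout to handle this contention gracefully.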
Production Insight
A payment gateway outage can cascade into a full site crash if your payment service doesn't have a circuit breaker.
Without it, every checkout request blocks waiting for a timeout, exhausting the HTTP connection pool and affecting other services.
Rule: Wrap payment gateway calls with a circuit breaker — fail fast after 3 consecutive timeouts within 30 seconds.
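A minimal circuit breaker matching that rule (3 consecutive failures within 30 seconds), written as a sketch with an injectable clock so it can be tested. `CircuitBreaker` is hypothetical; libraries such as Resilience4j provide production-grade breakers with the same closed, open, and half-open states.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical minimal circuit breaker: opens after `threshold` consecutive failures inside
// `window`, fails fast while open, and lets one trial call through after `coolDown`.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int threshold;
    private final Duration window;
    private final Duration coolDown;
    private final Supplier<Instant> clock;

    private State state = State.CLOSED;
    private int consecutiveFailures;
    private Instant firstFailureAt;
    private Instant openedAt;

    CircuitBreaker(int threshold, Duration window, Duration coolDown, Supplier<Instant> clock) {
        this.threshold = threshold;
        this.window = window;
        this.coolDown = coolDown;
        this.clock = clock;
    }

    // Runs the gateway call unless the breaker is open. Any RuntimeException counts as a failure
    // here; a real breaker would usually count only timeouts and 5xx responses.
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get(); // fail fast: no thread waits on a dead gateway
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    synchronized State currentState() {
        return state;
    }

    private synchronized boolean allowRequest() {
        if (state == State.OPEN && !clock.get().isBefore(openedAt.plus(coolDown))) {
            state = State.HALF_OPEN; // exactly one trial request probes the gateway
            return true;
        }
        return state == State.CLOSED;
    }

    private synchronized void onSuccess() {
        state = State.CLOSED;
        consecutiveFailures = 0;
    }

    private synchronized void onFailure() {
        Instant now = clock.get();
        if (state == State.HALF_OPEN) {
            trip(now); // trial failed: stay open for another cool-down
            return;
        }
        if (consecutiveFailures == 0 || now.isAfter(firstFailureAt.plus(window))) {
            consecutiveFailures = 0; // start a fresh window at this failure
            firstFailureAt = now;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= threshold) {
            trip(now);
        }
    }

    private void trip(Instant now) {
        state = State.OPEN;
        openedAt = now;
        consecutiveFailures = 0;
    }
}
```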
Key Takeaway
Idempotency keys are non-negotiable.
Retry with exponential backoff and jitter.
Circuit breakers prevent gateway failures from cascading.

Scaling Strategies & Trade-offs

Scaling an e-commerce platform is not just adding more servers — it's about understanding where bottlenecks appear at each growth stage.

Stage 1: Up to 100k daily active users (DAU)
  • Monolithic architecture with separate read replicas for catalog.
  • Redis cache for product pages and session data.
  • Single PostgreSQL database with connection pooling.

Stage 2: 100k to 1M DAU
  • Break out catalog and inventory services (as discussed).
  • Use Elasticsearch for search, read replicas for orders.
  • Asynchronous payment callbacks (webhooks).
  • Message queue (RabbitMQ / Kafka) for order processing and inventory sync.

Stage 3: 1M+ DAU with flash sales
  • Full microservices architecture with event sourcing.
  • Each service has its own database (database-per-service).
  • CDN for static assets and cached product pages.
  • Pre-warming inventory cache for top products.
  • Auto-scaling infrastructure with Kubernetes.
  • Feature flags to quickly disable payment gateways or checkout during incidents.

The critical trade-off: consistency vs availability. During a flash sale, you might accept reduced availability for the checkout service to avoid overselling. Use a strong consistency model for inventory but allow reads from cache for product pages.

Database-Per-Service Pitfall
When each service has its own database, you lose the ability to do cross-service JOINs. Instead, use data duplication (denormalization) and eventual consistency. For example, the order service stores product snapshots (price, name at time of purchase) so order history doesn't depend on the catalog service being up.
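A sketch of such a snapshot, assuming prices are stored in minor units (cents). `OrderLine` and `CatalogView` are hypothetical types.

```java
import java.time.Instant;

// Hypothetical order line: copies what the order needs from the catalog at purchase time,
// so order history still renders if the product later changes or the catalog is down.
record OrderLine(String productId, String nameAtPurchase, long unitPriceCents, int quantity, Instant purchasedAt) {
    OrderLine {
        if (quantity <= 0) throw new IllegalArgumentException("quantity must be positive");
        if (unitPriceCents < 0) throw new IllegalArgumentException("price cannot be negative");
    }

    long lineTotalCents() {
        return Math.multiplyExact(unitPriceCents, quantity); // throws instead of silently overflowing
    }

    // Snapshot whatever the catalog returned at checkout time.
    static OrderLine snapshot(CatalogView view, int quantity, Instant now) {
        return new OrderLine(view.productId(), view.name(), view.priceCents(), quantity, now);
    }
}

// Hypothetical read model returned by the catalog service.
record CatalogView(String productId, String name, long priceCents) {}
```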
Production Insight
Auto-scaling works great for stateless services (catalog, cart) but backfires for stateful services (inventory, orders).
Inventory service stores stock counts in a database that cannot easily be resharded at runtime.
Rule: Overprovision inventory database capacity for peak and use connection pooling to absorb spikes.
Also use a fast in-memory gate (Redis) for the hot inventory items during flash sales, then reconcile with the database asynchronously.
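An in-process sketch of that gate, assuming an `AtomicLong` per hot SKU as a stand-in for a Redis counter. In Redis, the same check-and-decrement would have to run as one server-side step (for example a Lua script via `EVAL`) to stay atomic. `StockGate` is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-process analog of a Redis stock gate for hot SKUs during a flash sale.
class StockGate {
    private final ConcurrentHashMap<String, AtomicLong> stock = new ConcurrentHashMap<>();

    void load(String sku, long quantity) {
        stock.put(sku, new AtomicLong(quantity));
    }

    // Decrement only if stock is positive. getAndUpdate applies the function atomically
    // (retrying internally on contention), so the check and the decrement cannot interleave.
    boolean tryAcquire(String sku) {
        AtomicLong counter = stock.get(sku);
        if (counter == null) {
            return false; // not a gated hot item: this sketch simply reports no stock
        }
        long before = counter.getAndUpdate(v -> v > 0 ? v - 1 : v);
        return before > 0;
    }

    // Compensation when the database reservation or the payment fails afterwards.
    void release(String sku) {
        AtomicLong counter = stock.get(sku);
        if (counter != null) {
            counter.incrementAndGet();
        }
    }

    long remaining(String sku) {
        AtomicLong counter = stock.get(sku);
        return counter == null ? 0 : counter.get();
    }
}
```

Passing the gate is not yet a reservation: the atomic database UPDATE stays the source of truth, and `release` must run if that UPDATE or the payment fails.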
Key Takeaway
Scale in stages: don't build a full microservices architecture from day one.
Consistency vs availability: choose based on the operation.
Stateful services (inventory) cannot auto-scale as easily — plan capacity ahead.
When to Scale Which Component
  • If product pages load slowly: add a CDN and Redis cache, scale the catalog service horizontally, and add Elasticsearch nodes.
  • If cart saves/updates are slow: use a Redis cluster for cart data, sharded by user ID region.
  • If checkout fails under high load: scale checkout orchestration pods, add connection pooling to the inventory DB, and use a Redis gate for stock.
  • If the payment gateway times out: use a circuit breaker, a dead-letter queue, and a manual reconciliation script.
● Production incident · POST-MORTEM · severity: high

Flash Sale Double-Book Disaster

Symptom
More customers received order confirmation emails for the same limited-edition product than there were units in stock, and inventory showed zero within the first few seconds. Support tickets flooded in.
Assumption
Assumed that checking quantity > 0 with a SELECT and then decrementing with a separate UPDATE would prevent overselling. But two concurrent requests read the same row before either wrote.
Root cause
Read-modify-write race condition. The 'SELECT quantity then UPDATE ... SET quantity = quantity - 1' pattern is not atomic. Under MySQL's default REPEATABLE READ isolation, a plain SELECT is a non-locking snapshot read, so both transactions saw the last unit as available and both went on to decrement it. High load did not cause the bug; it made the overlap far more likely.
Fix
Switched to atomic decrement: 'UPDATE inventory SET quantity = quantity - 1 WHERE product_id = ? AND quantity > 0'. Then checked affected_rows in application code. Also added a Redis atomic counter as a fast gate before hitting the DB.
Key lesson
  • Inventory decrement must be atomic — never SELECT then UPDATE separately.
  • Use row-level locks or optimistic locking for critical stock operations.
  • Always test race conditions with simultaneous curl scripts before a flash sale.
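A JDBC sketch of the fix, assuming the `inventory(product_id, quantity)` table from this post-mortem. It generalizes `quantity > 0` to `quantity >= ?` so one statement can reserve several units; for a single unit the two conditions are equivalent. `InventoryDao` is hypothetical; only standard `java.sql` calls are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical DAO: one conditional UPDATE instead of SELECT-then-UPDATE.
final class InventoryDao {
    static final String RESERVE_SQL =
        "UPDATE inventory SET quantity = quantity - ? WHERE product_id = ? AND quantity >= ?";
    static final String RELEASE_SQL =
        "UPDATE inventory SET quantity = quantity + ? WHERE product_id = ?";

    private InventoryDao() {}

    // The stock check runs inside the UPDATE against the current row while the row is locked,
    // so two buyers cannot both take the last unit. Zero affected rows means "not enough stock".
    static boolean reserve(Connection conn, String productId, int qty) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(RESERVE_SQL)) {
            ps.setInt(1, qty);
            ps.setString(2, productId);
            ps.setInt(3, qty);
            return ps.executeUpdate() == 1;
        }
    }

    // Compensating action when payment fails after a successful reservation.
    static void release(Connection conn, String productId, int qty) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(RELEASE_SQL)) {
            ps.setInt(1, qty);
            ps.setString(2, productId);
            ps.executeUpdate();
        }
    }
}
```

Because the condition is evaluated by the UPDATE itself, there is no gap between reading the stock and writing it, which is exactly the gap the SELECT-then-UPDATE version left open.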
Production debug guide: Symptom → Action for the most frequent production issues
Symptom · 01
User gets 'Item out of stock' after adding to cart
Fix
Check if inventory reserve was released on cart expiry or failed payment. Look at inventory_reserve_logs for the user's session. Verify TTL on cart reservation.
Symptom · 02
Payment succeeded but order not created
Fix
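Check payment gateway callback logs. The webhook may have been delivered but the order service failed to process it. Check whether the idempotency key was recorded before order creation failed: if it was, redeliveries of the webhook are being discarded as duplicates and the order is never created. Record the key and the order in the same transaction.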
Symptom · 03
Duplicate charge on credit card
Fix
Your payment idempotency key is missing or not enforced on the gateway side. Check the Idempotency-Key sent with each charge request in the Stripe request logs. Ensure your payment service generates a deterministic key per order attempt.
Symptom · 04
Cart total is inconsistent with price after promo code
Fix
Promo code validation may have used stale product prices. Recalculate cart on checkout initiation, never rely on cached prices. Log all applied promos with timestamps.
★ Flash Sale Performance Debug Cheat Sheet
When traffic spikes 10x, these commands pinpoint the bottleneck in seconds.
Product page loads >1s
Immediate action
Check cache hit ratio on product catalog CDN and Redis
Commands
redis-cli INFO stats | grep keyspace
curl -s -w '%{time_total}' -o /dev/null https://your-cdn-endpoint/product/123
Fix now
Warm cache with popular product IDs one hour before sale. Pre-generate page HTML and serve from CDN.
Checkout API returning 503
Immediate action
Check number of active DB connections and queue depth
Commands
show processlist; (MySQL) or SELECT count(*) FROM pg_stat_activity; (PostgreSQL)
kubectl top pods | grep checkout
Fix now
Scale the checkout service horizontally. Increase the connection pool size (max 100 per instance), and serve display-only inventory reads from read replicas; keep reservations on the primary.
Orders failing due to inventory race
Immediate action
Check inventory atomic operation logs
Commands
tail -n100 /var/log/inventory/app.log | grep 'UPDATE inventory'
redis-cli get inventory:sku123 (check Redis gate)
Fix now
Implement the Redis atomic decrement as a Lua script. Fall back to pessimistic locking (SELECT ... FOR UPDATE) for high-value items.
E-commerce Architecture Comparison
| Architecture style | Best for | Consistency model | Operational complexity | Cost at scale |
|---|---|---|---|---|
| Monolith | Startups, <100k DAU, simple catalog | Strong (single DB) | Low | Low: single server or small cluster |
| Microservices (per domain) | Medium to large platforms, 100k-10M DAU | Eventual consistency across services | High: requires DevOps, monitoring, and CI/CD maturity | Medium: more services, but better scaling |
| Serverless (Lambda, Fargate, etc.) | Variable traffic, low ops overhead | Eventual (function per service) | Low to medium, but cold starts affect latency | Low for low traffic, high for steady high traffic |

Key takeaways

1. Service boundaries should follow access patterns: read-heavy (catalog) vs write-heavy (inventory) vs consistency-critical (payment).
2. Idempotency keys prevent duplicate payments: store them with a unique constraint and use them in every payment gateway call.
3. Checkout is a Saga: atomic inventory reservation, idempotent payment, compensating release on failure.
4. Search is not a database query: use Elasticsearch with CDC for product search; cache product pages aggressively.
5. Scaling is stage-dependent: start with a monolith, split services when bottlenecks appear, and never share a database between inventory and catalog.

Common mistakes to avoid


Keeping inventory and catalog in the same database

Symptom
Product page queries cause deadlocks on inventory rows during checkout. Checkout fails with 'could not serialize access' errors.
Fix
Split into two separate databases or schemas. Catalog uses read replicas, inventory uses strongly consistent writes. Consider different engines: MySQL for catalog, PostgreSQL for inventory if needed.

Not using idempotency keys for payment

Symptom
Users are charged twice for a single order. Customer support spends hours processing refunds.
Fix
Generate a unique idempotency key per payment attempt. Send it to the payment gateway. Store the key in a table with a unique constraint. Retry with the same key on timeout.

Implementing cart as a server-side session only

Symptom
Users lose their cart when switching devices or after session timeout. Cart abandonment rate increases by 30%.
Fix
Persist cart to a database (or Redis) linked to the user ID after login. For guest users, use a client-side cart stored in localStorage and sync on login.
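A sketch of the sync-on-login step, under the assumption that quantities for the same SKU are summed and capped at a per-SKU purchase limit; other merge policies (e.g., keeping the larger quantity) are equally valid. `CartMerge` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical merge of a guest cart (from localStorage) into the user's server-side cart.
final class CartMerge {
    private CartMerge() {}

    static Map<String, Integer> merge(Map<String, Integer> serverCart,
                                      Map<String, Integer> guestCart,
                                      int maxPerSku) {
        Map<String, Integer> merged = new HashMap<>(serverCart);
        // Same SKU in both carts: add the quantities together
        guestCart.forEach((sku, qty) -> merged.merge(sku, qty, Integer::sum));
        // Cap each line so a merge cannot exceed the per-SKU purchase limit
        merged.replaceAll((sku, qty) -> Math.min(qty, maxPerSku));
        return merged;
    }
}
```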

Interview Questions on This Topic

Q01 · SENIOR
How would you design the checkout flow in a high-traffic e-commerce plat...
Q02 · SENIOR
How do you handle a payment gateway timeout in a customer-facing checkou...
Q03 · SENIOR
Explain the trade-offs between using a single database for all e-commerc...
Q01 of 03 · SENIOR

How would you design the checkout flow in a high-traffic e-commerce platform to prevent overselling?

ANSWER
The checkout flow must guarantee that inventory is decremented atomically before payment, and that payment is idempotent. Use an atomic UPDATE on the inventory table: UPDATE inventory SET quantity = quantity - 1 WHERE product_id = ? AND quantity > 0. Then check the affected row count; if it is zero, reject the purchase. For high contention, run the decrement in Redis as a Lua script as a fast gate, with the database as the fallback and source of truth. After reserving inventory, call the payment gateway with an idempotency key. If payment fails, release the reservation by incrementing the quantity back. This is the Saga pattern with compensating transactions. For flash sales, consider a pre-reservation step where the cart holds items for a short time (e.g., 5 minutes) before finalizing.

Frequently Asked Questions

01
What is the most common cause of overselling in e-commerce platforms?
02
How do you ensure an order is not lost after payment succeeds?
03
Should I use a cache for inventory data?
04
What is the difference between a distributed transaction (2PC) and a Saga?