Design Amazon — S3 Blast Radius and Checkout Races
A mistyped S3 command took down Amazon.
- Amazon is a multi-service distributed system: catalog, cart, orders, payments, search, recs, logistics — all independent but coordinated
- Product catalog is read-optimised with caching; inventory is real-time with strong consistency in a separate store
- Cart lives in a low-latency KV store (DynamoDB); order history in a relational DB (Aurora)
- Search is powered by a dedicated search engine (Elasticsearch) separate from the OLTP DB
- Payments must be idempotent and strongly consistent: one duplicate charge can cost millions
- The biggest mistake: designing for consistency everywhere — you'll kill availability and latency
Imagine a massive warehouse with millions of shelves, thousands of cashiers, a personal shopping assistant who remembers everything you've ever bought, and a delivery network that spans the globe. Amazon is exactly that — but built from software. Every time you search for headphones, add them to a cart, pay, and track a package, dozens of separate systems are quietly talking to each other to make it feel seamless. This article is about how those systems are actually designed — and the real trade-offs that keep them running under peak load.
Amazon processes over 66,000 orders per minute at peak, serves hundreds of millions of customers across 20+ countries, and runs one of the most complex distributed systems ever built — all while most transactions complete in under a second. Understanding how to design a system at this scale isn't just an interview exercise; it's a masterclass in the real trade-offs that define modern software engineering: consistency vs. availability, latency vs. accuracy, operational simplicity vs. raw performance.
The core problem Amazon solves is multi-dimensional. It's not just a database with a shopping cart on top. It's a real-time inventory system, a personalization engine, a payments processor, a logistics orchestrator, a search engine, and a seller marketplace — all running simultaneously, all needing to agree on the state of the world, and all needing to survive individual component failures without the customer ever noticing. The challenge isn't writing any one of these systems; it's making them work together under crushing load.
By the end of this article, you'll be able to walk into a system design interview and articulate a coherent, production-realistic Amazon architecture. You'll understand why the product catalog is separated from inventory, why the cart lives in a different data store than order history, how search is decoupled from the relational database, and what actually happens between you clicking 'Buy Now' and your order appearing on screen. You'll know the real trade-offs, not just the happy path.
Core Architecture Principles
Amazon's architecture is built on a few non-negotiable principles. First, data ownership is absolute: each microservice owns its data exclusively — no shared tables between services. Second, communication is asynchronous where possible: use events (Kafka) for order creation, inventory updates, and shipping triggers. Synchronous calls are reserved for operations that need immediate confirmation, like payment gateway interaction. Third, cache everything that can be stale. The product catalog, search results, recommendations — all served from cached layers that accept minutes of staleness. Fourth, fail gracefully: if a downstream service is down, the system degrades, it doesn't crash. The homepage might show fewer recommendations, but the site stays up.
These principles are not theoretical — they were earned through real production failures. The 2017 S3 outage showed that shared infrastructure can bring down the entire site. The 2020 DynamoDB throttling event during Prime Day taught them to provision for 3x peak traffic. Every principle has a scar.
- Each service owns its data — no shared DB tables between services.
- Communicate through events for most flows; use synchronous calls only for idempotent, critical paths.
- If two services need to share a database table, merge them into one service.
- Design for partial failure: every external call can fail, and the system must survive.
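The "fail gracefully" principle can be sketched in a few lines. This is a minimal illustration, not Amazon's implementation: `with_fallback`, `render_homepage`, and the two fetch functions are hypothetical names, and catching every exception is a simplification that a real service would narrow to timeouts and connection errors.

```python
from typing import Callable, List


def with_fallback(call: Callable[[], List[str]], fallback: List[str]) -> List[str]:
    """Return the result of `call`, or `fallback` if the dependency fails."""
    try:
        return call()
    except Exception:
        # Simplification: a real service would log, emit a metric, and
        # catch only timeout/connection errors rather than everything.
        return fallback


def fetch_recommendations_ok() -> List[str]:
    # Hypothetical healthy dependency.
    return ["B001", "B002", "B003"]


def fetch_recommendations_down() -> List[str]:
    # Hypothetical failing dependency.
    raise TimeoutError("recommendation service unavailable")


def render_homepage(fetch_recs: Callable[[], List[str]]) -> dict:
    # The page still renders with an empty recommendation shelf.
    recs = with_fallback(fetch_recs, fallback=[])
    return {"status": 200, "recommendations": recs}


if __name__ == "__main__":
    print(render_homepage(fetch_recommendations_ok))
    print(render_homepage(fetch_recommendations_down))
```

The point of the design is that the fallback value is chosen by the caller, so each page decides what "degraded" means for it.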
Requirements & Estimation
Before drawing boxes, we need numbers. Amazon serves ~200M active customers, processes ~66,000 orders/min at peak. Every second, that's ~1,100 orders. Each order generates writes to cart, inventory, payment, order, and logistics services. Read-to-write ratio for the product catalog is roughly 100:1, while for cart it's 1:1 (every add is followed by a read during checkout). Storage: product catalog ~100M items, each with 10-50 KB metadata — that's ~5 TB in the DB. Images stored in object storage (S3), total petabyte-scale. Network bandwidth: each page load transfers ~2 MB (HTML, JS, images). At 200M DAU, average 10 pages per session = 2B page loads/day = ~4 PB/day outbound — that's why you need CDN and aggressive caching.
These numbers drive every architecture decision. You don't design for 66K orders/min without knowing your bottleneck: database writes per second, queue throughput, payment latency SLA. A common mistake: designing for average load, not peak. Prime Day traffic spikes 5-10x above average. So you need to provision for at least 2x your estimated peak, and then use auto-scaling to handle surges.
- Assume 2x growth over the next 2 years (~132K orders/min) plus headroom for spikes. Design for 200K orders/min.
- Every order creates 5 writes (cart, inventory, payment, order, shipping). So 5500 writes/sec at peak.
- Catalog reads: with a 100:1 read/write ratio, budget for roughly 200K reads/sec at peak. Cache the top 5% hottest items (Pareto).
- Bandwidth: 2 MB per page × 10 pages/user × 200M users = 4 PB/day; divided by 86,400 s, that is ~46 GB/s on average and higher at peak. CDN is non-negotiable.
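The back-of-envelope arithmetic is easy to get wrong by a factor of a thousand, so it helps to recompute it from the stated inputs. The sketch below uses only the figures given in this section (66,000 orders/min, 5 writes per order, 200M users, 10 pages each, 2 MB per page) and decimal units; the variable names are illustrative.

```python
# Inputs taken from the estimates in this section.
ORDERS_PER_MIN_PEAK = 66_000
WRITES_PER_ORDER = 5
DAILY_USERS = 200_000_000
PAGES_PER_USER = 10
PAGE_BYTES = 2 * 10**6          # 2 MB, decimal units
SECONDS_PER_DAY = 86_400

orders_per_sec = ORDERS_PER_MIN_PEAK / 60
writes_per_sec = orders_per_sec * WRITES_PER_ORDER
page_loads_per_day = DAILY_USERS * PAGES_PER_USER
bytes_per_day = page_loads_per_day * PAGE_BYTES
avg_bytes_per_sec = bytes_per_day / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"orders/sec at peak : {orders_per_sec:,.0f}")
    print(f"writes/sec at peak : {writes_per_sec:,.0f}")
    print(f"page loads/day     : {page_loads_per_day:,}")
    print(f"outbound per day   : {bytes_per_day / 1e15:.1f} PB")
    print(f"average outbound   : {avg_bytes_per_sec / 1e9:.1f} GB/s")
```

The outbound figure is a daily average; peak bandwidth is some multiple of it, depending on how concentrated traffic is during the day.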
High-Level Architecture — Service Decomposition
Amazon's architecture is a collection of hundreds of microservices. The core ones for an e-commerce platform:
- Product Catalog Service: read-heavy, exposes product details, categories, images. Uses a read replica with a CDN cache for images.
- Inventory Service: tracks stock per warehouse. Must be strongly consistent to avoid overselling. Usually a separate database (Aurora with row-level locking).
- Cart Service: low-latency, high-write. Uses DynamoDB with eventual consistency for add/remove operations; cart read during checkout uses strong consistency.
- Order Service: receives checkout request, orchestrates the saga: reserve inventory, process payment, create order, trigger shipping. Uses a queue for decoupling.
- Payment Service: idempotent, integrates with external gateways. Stores transaction logs in a relational DB.
- Search Service: Elasticsearch cluster indexed from catalog and inventory changes via CDC.
- Recommendation Service: ML pipeline producing real-time recommendations served via a separate read-optimised cache.
- Shipping Service: async, watches order completion events and sends to logistics.
Each service has its own database, communicates via HTTP/REST or async events (Kafka). API Gateway routes requests, handles authentication, rate limiting.
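The idempotency requirement on the Payment Service can be made concrete with a small sketch. It assumes the caller supplies an idempotency key per checkout attempt; `PaymentService`, `ChargeResult`, and the in-memory dict are hypothetical stand-ins for a real service and its transaction-log table, and the gateway call is simulated by a counter.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ChargeResult:
    transaction_id: str
    amount_cents: int


class PaymentService:
    """In-memory sketch; a real service would persist results in its DB."""

    def __init__(self) -> None:
        self._results: Dict[str, ChargeResult] = {}
        self.gateway_calls = 0  # counts simulated calls to the external gateway

    def charge(self, idempotency_key: str, amount_cents: int) -> ChargeResult:
        existing = self._results.get(idempotency_key)
        if existing is not None:
            # A retry with the same key returns the stored result and
            # never reaches the gateway a second time.
            if existing.amount_cents != amount_cents:
                raise ValueError("idempotency key reused with a different amount")
            return existing
        self.gateway_calls += 1  # stand-in for the real gateway request
        result = ChargeResult(f"txn-{self.gateway_calls}", amount_cents)
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    svc = PaymentService()
    first = svc.charge("order-42", 1999)
    retry = svc.charge("order-42", 1999)
    print(first == retry, svc.gateway_calls)
```

In a real deployment the lookup and the insert have to be atomic, for example via a unique constraint on the key, otherwise two concurrent retries can both miss and both charge.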
Data Consistency & Trade-offs Across Services
Amazon must maintain consistency where it matters (inventory, payments) and accepts eventual consistency where it doesn't (product catalog updates, recommendations). The key trade-offs:
- Product Catalog: writes are rare (admin updates), reads are massive. Use a single-writer primary with read replicas and the cache-aside pattern. A catalog update can take minutes to propagate to all edge caches, and that's fine.
- Inventory: overselling is unacceptable. When a customer adds an item to cart, we reserve inventory for 15 minutes. If not checked out, the reservation expires. The reservation is taken under row-level locking in Aurora on the inventory row, which is pessimistic rather than optimistic, and under high contention it risks lock waits and deadlocks. This limits throughput to ~1000 inventory reservations per second per row. Sharding inventory by product ID (each product gets its own partition) spreads load across products; a single hot product is still bounded by its row, so its stock may need to be split across several rows.
- Cart & Order: The cart service uses eventual consistency for add/remove, but during checkout, the order service reads the cart with strong consistency and then runs a saga: reserve inventory (idempotent), charge payment (idempotent), create order, decrement inventory. If any step fails, compensate: release inventory, void payment.
- Search: Elasticsearch is eventually consistent with the inventory DB. If you add an item, it might take seconds to appear in search results. Acceptable for most queries, but for sellers pushing inventory updates, we provide a synchronous fallback: if a seller uses 'update inventory API', we directly update a cache that search reads with low latency.
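The 15-minute inventory reservation described in the list above can be sketched as follows. This is an in-memory illustration under stated assumptions, not an Aurora implementation: `InventoryShard` is a hypothetical class, a `threading.Lock` stands in for the database row lock, and the injectable `clock` exists only so expiry can be demonstrated without waiting.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class InventoryShard:
    """One inventory partition with expiring, idempotent reservations."""

    def __init__(self, stock: Dict[str, int], ttl_seconds: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._stock = dict(stock)
        self._ttl = ttl_seconds          # 900 s = the 15-minute hold
        self._clock = clock
        self._lock = threading.Lock()    # stand-in for a DB row lock
        # reservation_id -> (product_id, quantity, expires_at)
        self._reservations: Dict[str, Tuple[str, int, float]] = {}

    def _release_expired(self) -> None:
        now = self._clock()
        for rid, (pid, qty, expires_at) in list(self._reservations.items()):
            if expires_at <= now:
                self._stock[pid] += qty  # return held units to stock
                del self._reservations[rid]

    def reserve(self, reservation_id: str, product_id: str, qty: int) -> bool:
        with self._lock:
            self._release_expired()
            if reservation_id in self._reservations:
                return True              # retry of the same reservation
            if self._stock.get(product_id, 0) < qty:
                return False             # would oversell, so refuse
            self._stock[product_id] -= qty
            expires_at = self._clock() + self._ttl
            self._reservations[reservation_id] = (product_id, qty, expires_at)
            return True

    def available(self, product_id: str) -> int:
        with self._lock:
            self._release_expired()
            return self._stock.get(product_id, 0)
```

Passing a reservation ID makes retries safe: a repeated call with the same ID neither fails nor holds a second unit.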
Where coordination-free eventual consistency is acceptable, such as merging cart edits made on different devices, use CRDTs, with gossip protocols to spread state between replicas.
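As one illustration of a CRDT, the sketch below models a cart as a last-writer-wins map, where each item carries a timestamp and replica ID and the merge keeps the newest entry per item. This is an assumption made for illustration, not a description of how Amazon's cart resolves conflicts; the function names are hypothetical, and last-writer-wins depends on timestamps (or logical clocks) that are comparable across replicas.

```python
from typing import Dict, Tuple

# item_id -> (timestamp, replica_id, quantity); quantity 0 is a tombstone
CartState = Dict[str, Tuple[int, str, int]]


def set_quantity(cart: CartState, item_id: str, qty: int,
                 ts: int, replica: str) -> CartState:
    """Return a new cart state with the item's quantity set at time `ts`."""
    updated = dict(cart)
    if item_id not in updated or (ts, replica) > updated[item_id][:2]:
        updated[item_id] = (ts, replica, qty)
    return updated


def merge(a: CartState, b: CartState) -> CartState:
    """Keep the newest entry per item; ties break on replica ID."""
    merged = dict(a)
    for item_id, entry in b.items():
        if item_id not in merged or entry[:2] > merged[item_id][:2]:
            merged[item_id] = entry
    return merged


def visible_items(cart: CartState) -> Dict[str, int]:
    """Hide tombstoned (removed) items from the rendered cart."""
    return {item: qty for item, (_, _, qty) in cart.items() if qty > 0}


if __name__ == "__main__":
    phone = set_quantity({}, "headphones", 1, ts=1, replica="phone")
    laptop = set_quantity({}, "cable", 2, ts=2, replica="laptop")
    laptop = set_quantity(laptop, "headphones", 0, ts=3, replica="laptop")
    print(visible_items(merge(phone, laptop)))
```

Because the merge is commutative, associative, and idempotent, replicas can exchange state in any order and still converge.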
Search & Recommendations — The Read-Optimised Path
Search and recommendations are the two features with the highest read load on Amazon. Both are served entirely from caches and search indices, never touching the main OLTP databases.
Search: Users type a query, API Gateway routes to Search Service, which queries Elasticsearch (ES). ES returns product IDs, then the service fetches product details from a local Redis cache (or falls back to catalog DB). The search index is updated asynchronously via Kafka Connect from the inventory and catalog databases. Latency target: under 100ms P99.
Recommendations: For each page load, the frontend sends user context (user ID, page category, recent searches). The Recommendation Service is backed by an ML model (e.g., collaborative filtering with matrix factorisation) retrained every 6 hours. Model outputs are pre-computed for each user and stored in Redis with a TTL of 12 hours, so the request path is a cache lookup rather than live inference. The service returns a list of product IDs, and the frontend fetches details from the same cache layer as search. Latency target: under 50ms P99.
To scale search, we use a tiered approach: popular queries are cached in an edge HTTP cache (e.g. Varnish) with 5-minute TTL. Hot product details are in Redis with sharding across nodes. Cold products go to Elasticsearch with a larger shard count.
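The step where the Search Service turns product IDs into product details is a cache-aside lookup with a fallback. A minimal sketch, assuming a plain dict in place of Redis and a hypothetical `load_from_catalog` callable in place of the catalog DB query:

```python
from typing import Callable, Dict, Iterable, List


def hydrate_products(
    product_ids: Iterable[str],
    cache: Dict[str, dict],
    load_from_catalog: Callable[[List[str]], Dict[str, dict]],
) -> List[dict]:
    """Resolve product IDs to details, reading the cache first."""
    ids = list(product_ids)
    # De-duplicate misses while keeping order, so the fallback is one batch.
    misses = list(dict.fromkeys(pid for pid in ids if pid not in cache))
    if misses:
        loaded = load_from_catalog(misses)  # single batched fallback call
        cache.update(loaded)                # populate the cache for next time
    # IDs the catalog does not know about are dropped from the result.
    return [cache[pid] for pid in ids if pid in cache]


if __name__ == "__main__":
    def fake_catalog(ids: List[str]) -> Dict[str, dict]:
        # Hypothetical stand-in for the catalog DB.
        return {pid: {"id": pid, "title": f"Product {pid}"} for pid in ids}

    cache: Dict[str, dict] = {"p1": {"id": "p1", "title": "Cached product"}}
    print(hydrate_products(["p1", "p2"], cache, fake_catalog))
```

Batching the misses matters at this latency target: one round trip to the catalog for all missing IDs is much cheaper than one per product.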
Caching & CDN Strategy
Amazon's read volume is staggering — millions of requests per second for catalog pages, images, search results. Without a multi-tier caching strategy, the origin databases would collapse. The caching layers, from edge to database:
- CDN (CloudFront): Caches static assets (product images, CSS, JS) at edge locations. TTL of 24 hours for assets, invalidated on new uploads. For dynamic content (search results, recommendations), CDN caches only popular queries with short TTL (5 minutes).
- API Gateway Cache: Regional cache for identical API responses. Works well for product details that don't change often.
- Service-level Cache (Redis): Each service has its own Redis cluster. Catalog service caches product details by ID (LRU eviction). Cart service uses Redis for session data. Recommendation service caches precomputed user recommendations.
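- Database Read Replicas: Aurora read replicas handle cache misses. In extreme cases, add more replicas to absorb read load (an Aurora cluster supports up to 15); promoting a replica is a failover mechanism, not a way to scale reads.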
The design principle: the top 5% of hottest products receive 80% of traffic (Pareto). Cache those aggressively. Long-tail products are served from Elasticsearch or read replicas with lower priority.
- Track access frequency per product. Promote hot items to faster cache tiers.
- Use Redis with maxmemory-policy allkeys-lru for automatic eviction of cold items.
- In CDN, cache popular query results but invalidate on inventory change.
- Warm the cache before major sales events by pre-loading top products.
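The eviction behaviour behind a bounded cache can be shown with a small exact-LRU sketch. `LRUCache` is a hypothetical in-process class built on `collections.OrderedDict`; Redis's `allkeys-lru` policy is an approximation based on sampling, so this illustrates the idea rather than Redis's internals.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)      # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the coldest entry


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("hot-1", {"title": "Headphones"})
    cache.put("cold-1", {"title": "Obscure cable"})
    cache.get("hot-1")                    # touch the hot item
    cache.put("hot-2", {"title": "Phone case"})
    print(cache.get("cold-1"), cache.get("hot-1"))
```

Pre-warming before a sale is then just a loop of `put` calls over the expected top products, sized to fit within the capacity.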
Checkout Flow — From Cart to Confirmation
When the user clicks 'Place Order', this is the most critical path. Here's the real sequence:
- Cart Service retrieves the user's cart with strong consistency (gets latest items and their IDs).
- Order Service receives the checkout request and starts a saga:
- - Reserve Inventory: for each item, call Inventory Service to reserve quantity. If any item is insufficient, fail the entire order (release other reservations).
- - Process Payment: call Payment Service with the total amount and an idempotency key. The payment service interacts with the external gateway. If timeout, retry (idempotency prevents double charge).
- - Create Order: insert order record into Order DB.
- - Decrement Inventory: final decrement of reserved quantities.
- - Send to Shipping: publish an order_created event to Kafka, which the Shipping Service picks up.
- If any step fails after payment, a compensation transaction is run: refund payment, release remaining inventory. This compensation is also idempotent.
- The frontend polls the Order Service for the order status (every 2 seconds until confirmed) and then redirects to the order confirmation page.
All services use asynchronous communication where possible to reduce end-to-end latency. The entire saga typically completes in under 500ms for 95% of orders.
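The saga with compensation described in this section can be sketched as an orchestrator that records completed steps and undoes them in reverse order on failure. This is a simplified, single-process illustration: `run_saga`, `SagaFailed`, and the step tuples are hypothetical, and a production orchestrator would persist saga state so it can resume compensation after a crash.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class SagaFailed(Exception):
    """Raised after compensations have run; carries the step log."""


def run_saga(steps: List[Step]) -> List[str]:
    completed: List[Step] = []
    log: List[str] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"failed:{name}")
            # Undo completed steps, most recent first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            raise SagaFailed(log) from exc
        completed.append(step)
        log.append(f"done:{name}")
    return log


if __name__ == "__main__":
    def order_db_down() -> None:
        # Hypothetical failure while creating the order record.
        raise RuntimeError("order DB unavailable")

    steps: List[Step] = [
        ("reserve_inventory", lambda: None, lambda: print("release inventory")),
        ("process_payment", lambda: None, lambda: print("refund payment")),
        ("create_order", order_db_down, lambda: None),
    ]
    try:
        run_saga(steps)
    except SagaFailed as failure:
        print(failure.args[0])
```

Compensating in reverse order means the payment is refunded before the inventory is released, which mirrors the order in which the steps were taken.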
S3 Outage That Took Down Amazon.com
- Blast radius: any admin command on shared infrastructure can take down unrelated services. Always use change management and runbooks.
- Defense in depth: the frontend should degrade gracefully when static assets are unavailable — show text-only product descriptions instead of failing entirely.
- Monitoring: alarm on sudden capacity loss in critical storage systems, not just traffic drops.
Common mistakes to avoid
- Designing for strong consistency everywhere
- Treating the cart as a simple key-value store without conflict resolution
- Not planning for idempotency in payment processing
- Building an unbounded cache without an eviction policy
- Building a monolith and decomposing too late
Interview Questions on This Topic
How would you design the product catalog service to handle 200M daily active users with a 100:1 read-to-write ratio?