System Design Interview - Cache Stampede in Production
In production, a cache stampede during peak hours drove database CPU to 100% and pushed P95 latency above 30 seconds.
- System design interviews test your ability to handle ambiguity and make trade-offs under pressure
- Core skill: requirement clarification defines the scope before drawing any boxes
- Key components: functional vs non-functional requirements, high-level design, deep-dive bottlenecks, wrap-up with failure modes
- Performance insight: a 10x scale miscalculation (e.g., QPS off by a factor of 10) can make your entire design irrelevant, so always verify numbers
- Production insight: the same mistake shows up in real systems — teams build for current load, then wonder why it crumbles at 10x
- Biggest mistake: jumping straight to a solution (Kafka! NoSQL!) without asking 'What problem are we solving?'
Imagine you're asked to design a city from scratch. You don't start by choosing the color of doorknobs — you start with roads, power grids, and water pipes. System design interviews work exactly the same way: interviewers want to see that you can think big, make smart trade-offs, and build something that won't collapse under pressure. It's less about memorizing answers and more about showing you can be the architect, not just the bricklayer.
Every senior engineering role at a top tech company has one brutal filter: the system design round. It's the interview that makes experienced developers freeze up, not because they lack knowledge, but because the question is deliberately open-ended. 'Design Twitter.' 'Design a URL shortener.' 'Design Netflix.' The candidate who answers these well isn't the one who memorized the most blog posts — it's the one who can think out loud, reason through trade-offs, and communicate at the level of a staff engineer.
The core problem this interview type solves — from the interviewer's perspective — is figuring out how you'll behave when given an ambiguous, high-stakes technical problem with no single right answer. At scale, every architectural decision has cascading consequences. Choosing the wrong database engine, ignoring read/write ratios, or failing to think about failure modes can mean millions in lost revenue or a 3am outage. The design interview is a compressed simulation of exactly that situation.
By the end of this guide, you'll have a repeatable framework you can apply to any system design prompt, understand the specific trade-offs interviewers are listening for (and the buzzwords that actually hurt you), know how to handle the moments where you genuinely don't know the answer, and walk away with a mental model that works in real production systems — not just whiteboards.
The 4-Step Framework for Architectural Clarity
A System Design Interview isn't a coding test; it's a conversation about trade-offs. To avoid the 'blank whiteboard' syndrome, you need a reliable framework. We recommend the following: 1. Understand Requirements (Functional & Non-Functional), 2. High-Level Design (The 'Boxes and Arrows'), 3. Deep Dive into Bottlenecks (Database Sharding, Caching), and 4. Wrap-up (Identifying SPOFs and Scaling).
Rather than starting with a dry definition, let's see a practical example of how you might handle a request-response flow in a distributed environment.
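As a minimal sketch of such a flow, assume a round-robin load balancer in front of two stateless app servers that share one cache and one database. The class names (`LoadBalancer`, `AppServer`) and the in-memory dicts standing in for Redis and the primary store are illustrative assumptions, not a prescribed design.

```python
import itertools


class AppServer:
    """One stateless application server sharing a cache and a database."""

    def __init__(self, name, cache, database):
        self.name = name
        self.cache = cache        # shared dict standing in for a cache such as Redis
        self.database = database  # shared dict standing in for the primary store

    def handle(self, key):
        # Cache-aside read: try the cache, fall back to the database on a miss.
        if key in self.cache:
            return {"server": self.name, "source": "cache", "value": self.cache[key]}
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later readers
        return {"server": self.name, "source": "database", "value": value}


class LoadBalancer:
    """Round-robin dispatch across the app servers."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, key):
        return next(self._cycle).handle(key)


database = {"user:1": "Ada"}
cache = {}
balancer = LoadBalancer([
    AppServer("app-1", cache, database),
    AppServer("app-2", cache, database),
])
first = balancer.route("user:1")   # miss: served from the database by app-1
second = balancer.route("user:1")  # hit: served from the shared cache by app-2
```

Because the servers keep no state of their own, any of them can answer any request, which is what lets the load balancer add or remove servers freely.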
Here's the thing most candidates miss: the framework is just a container. What matters is how you move through it. Don't treat it as a rigid checklist — treat it as a conversation. If the interviewer asks a deep question about caching mid-way through step 2, follow that thread. The framework keeps you from getting lost, not from getting interesting.
Infrastructure as Code: Deploying the Architecture
Interviewers love when you can bridge the gap between a whiteboard drawing and actual deployment. Understanding how to containerize and scale your components is key to demonstrating seniority.
But don't just name tools. Explain why you'd choose Docker over a VM, or Kubernetes over a single server. The 'how' is easy — the 'why' is what separates staff engineers.
The dirty secret: most senior engineers have been burned by overcomplicated infrastructure. If you're interviewing, showing you understand the pain of maintaining a 15-microservice nightmare is more impressive than listing every AWS service. Say 'I'd start with a monolith until I have a concrete bottleneck.' That's the answer that gets 'hire.'
Back-of-Envelope Estimations: The Numbers That Validate Your Design
In a system design interview, you can talk theory all day, but the moment you put numbers on the whiteboard, you demonstrate real-world experience. Interviewers want to see you can estimate QPS, storage, bandwidth, and cache size with reasonable accuracy.
Don't aim for precision; aim for orders of magnitude. A factor of 10 off is acceptable if you catch it and adjust. Here's the cheat sheet, with rough figures only: 1 million requests/day ≈ 12 QPS. 1 TB = 1000 GB. A single MySQL instance can handle ~1k writes/second. A single Redis node can handle ~100k reads/second.
Here's a real trick: always mention 'I'd add 30% headroom for traffic spikes.' It shows you've dealt with production surprises. If you forget to include replication factor (3x) and retention (say 18 months), your storage estimate will be off by 3x or more, and your architecture will be wrong.
- QPS is the surface area of your system. High QPS demands caching, partitioning, and asynchronous processing.
- Storage grows unbounded. Estimate 3x growth for data, indexes, and backups.
- Latency is the sharpest constraint. Every network hop adds ~1ms locally, ~50ms cross-region.
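The arithmetic behind these estimates can be sketched as a small function. The 3x replication factor and 30% headroom come from the text above; the workload numbers (100 million requests/day, 10% writes, 1 KB per record, 540 days of retention) and the 2x peak-to-average multiplier are assumptions chosen only for illustration.

```python
SECONDS_PER_DAY = 86_400


def estimate(daily_requests, write_fraction, bytes_per_record, retention_days,
             replication_factor=3, headroom=0.30, peak_multiplier=2.0):
    """Order-of-magnitude capacity estimate; every input is a guess to refine."""
    avg_qps = daily_requests / SECONDS_PER_DAY
    # Peak traffic (assumed multiple of the average) plus headroom for spikes.
    peak_qps = avg_qps * peak_multiplier * (1 + headroom)
    writes_per_day = daily_requests * write_fraction
    raw_bytes = writes_per_day * bytes_per_record * retention_days
    # Replication multiplies every stored byte.
    stored_bytes = raw_bytes * replication_factor
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "raw_tb": raw_bytes / 1e12,
        "stored_tb": stored_bytes / 1e12,
    }


result = estimate(
    daily_requests=100_000_000,  # hypothetical workload
    write_fraction=0.10,
    bytes_per_record=1_000,
    retention_days=540,          # roughly 18 months
)
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

With these inputs the average load is about 1,157 QPS and the raw data is about 5.4 TB, but the replicated footprint is about 16.2 TB, which is the gap that a forgotten replication factor leaves in an estimate.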
Trade-Off Decision Tree: SQL vs NoSQL, Cache vs DB, Sync vs Async
Every system design interview forces you to make choices. The ability to articulate the trade-offs is what separates a good answer from a great one. Let's create a mental decision tree:
- If your system requires strong consistency (e.g., banking), go SQL with a single writer and read replicas. Accept write latency.
- If you need high availability with eventual consistency for a global feed, choose NoSQL (Cassandra, DynamoDB).
- If you have a read-heavy workload (90% reads), add a cache layer (Redis) and consider CDN for static assets.
- If your workload is write-heavy (50%+ writes), you need a commit log (Kafka) and a write-optimised store (Cassandra).
The decision tree is not about picking the right answer — it's about showing you understand the constraints.
Don't forget to mention the hard part: consistency trade-offs. If you pick eventual consistency, say 'I accept that a user might see stale data for a few seconds — that's fine for this use case.' If you pick strong consistency, say 'I'm trading availability for correctness — I'll need to handle higher write latency and potential downtime during partitions.' That's the level of depth that gets 'strong hire.'
Failure Modes: Designing for When Things Go Wrong
The most overlooked part of system design interviews is discussing failure modes. Too many candidates describe a perfect system where everything works. The real world has partial failures — a network partition, a node crash, a buggy deployment. A senior engineer designs for these.
- Single point of failure (SPOF): every load balancer, database master, and queue broker is a candidate. Make everything redundant.
- Cascading failures: when one component fails and overloads the next. Example: a cache node goes down → all traffic hits DB → DB goes down → app fails.
- Recoverability: how do you return to normal after a failure? Blue-green deployments, circuit breakers, and graceful degradation are key concepts.
Don't just list them — explain the specific scenario. Say 'If the cache cluster goes down, I'd have a circuit breaker that falls back to the database with a pool limiter to prevent overload. The users would see slightly slower responses, but no outage.' That's a senior answer.
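That fallback can be sketched roughly as follows. This is a simplified circuit breaker under stated assumptions: `CircuitBreaker`, `read_with_fallback`, the threshold of 3 failures, and the 30-second cool-down are hypothetical names and values, and a bounded semaphore stands in for the pool limiter on database reads.

```python
import threading
import time


class CircuitBreaker:
    """Opens after repeated failures, then allows a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def allow(self):
        if self._opened_at is None:
            return True
        # Half-open: once the cool-down has passed, let a trial call through.
        return self._clock() - self._opened_at >= self.reset_after_s

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def read_with_fallback(key, cache_get, db_get, breaker, db_slots):
    """Read from the cache when the breaker allows it, else from a bounded DB path."""
    if breaker.allow():
        try:
            value = cache_get(key)
            breaker.record_success()
            if value is not None:
                return value
        except ConnectionError:
            breaker.record_failure()
    # Bound concurrent database reads so a dead cache cannot overload the DB.
    if not db_slots.acquire(blocking=False):
        raise RuntimeError("database fallback saturated; shedding load")
    try:
        return db_get(key)
    finally:
        db_slots.release()
```

A caller would pass something like `threading.BoundedSemaphore(20)` as `db_slots`; requests beyond that limit fail fast instead of queueing on the database, which is the graceful-degradation trade-off described above.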
The Wrap-Up: Single Points of Failure and Scaling Plans
The final step of your interview answer should be a structured wrap-up. Summarise what you've designed, then explicitly call out: 1. Single points of failure: 'Our load balancer is a SPOF. I'd make it redundant with active-passive and a floating IP.' 2. Scaling plan: 'Currently this handles 1M DAU. To reach 100M DAU, I'd shard the database by user ID, add a Redis cache for timelines, and introduce Kafka for async processing of tweets.' 3. Future improvements: things you'd add if time allowed (like telemetry, rate limiting, etc.)
This structured wrap-up leaves a strong final impression — it tells the interviewer you think holistically.
Pro tip: end with an open-ended question. 'Is there any requirement I missed that would change this design?' That shows you're not attached to your answer — you're collaborating.
Data Partitioning and Sharding Strategies: Consistent Hashing, Range Partitioning, and Rebalancing
When you outgrow a single database instance, partitioning becomes inevitable. The two most common strategies are range partitioning and consistent hashing. Range partitioning splits data by key ranges (e.g., user ID 1-1000 on shard A, 1001-2000 on shard B). It's simple and supports range queries, but hot keys can overload a single shard. Consistent hashing distributes data across a ring using a hash function. It minimizes data movement when nodes join or leave, but range queries become expensive because data is scattered.
In interviews, you need to choose based on query patterns. If you frequently query by user ID range, range partitioning is natural. If you need uniform load and dynamic scaling, consistent hashing wins. Many production systems use a hybrid: consistent hashing with virtual nodes (several ring positions per physical node) to spread load evenly, and secondary indexes for range queries.
Always discuss rebalancing: adding new nodes in a range-partitioned system requires splitting ranges and migrating data. Consistent hashing only moves keys within the affected segment. Systems like Cassandra use virtual nodes so that a joining or leaving node exchanges data with many peers instead of a single neighbour; hinted handoff is a separate mechanism that replays writes a node missed while it was temporarily down.
Here's the real-world truth: rebalancing is where most designs break. Say 'I'd use consistent hashing with 150 virtual nodes per physical node to distribute load evenly.' That level of detail signals you've done this before.
- Each node maps to multiple points on the ring (virtual nodes) for even load distribution.
- Data is assigned to the nearest clockwise node.
- Adding or removing a node only affects its neighbours, not the entire ring.
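The ring described above can be sketched in a few lines. This is an illustrative implementation, not how any particular database does it: the `HashRing` class, the `node#i` naming of virtual nodes, and the use of SHA-256 truncated to 64 bits are all assumptions; the 150 virtual nodes per physical node is the figure quoted earlier.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring where each physical node owns many virtual nodes."""

    def __init__(self, vnodes=150):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> physical node

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 as a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # Nearest clockwise position, wrapping around at the end of the ring.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(vnodes=150)
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"user:{i}" for i in range(2000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

Every key that changes owner moves to the new node; keys never shuffle between the existing nodes, which is the property that keeps rebalancing cheap compared with `hash(key) % node_count`.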
Observability and Monitoring: Logging, Metrics, and Distributed Tracing
Most system design descriptions skip observability, but in production you can't fix what you can't see. The three pillars are logging, metrics, and distributed tracing. Logs give you per-request detail but are expensive to store long-term. Metrics (latency, error rates, throughput) give you aggregated health. Distributed tracing connects a single request across multiple services.
In an interview, mentioning you'd integrate Prometheus for metrics, structured logging (JSON), and Jaeger for tracing shows operational maturity. Discuss how you'd monitor the key SLIs: latency (P50, P95, P99), error rate, throughput, and saturation (e.g., CPU, memory, connection pool). Define SLOs and set up alerts based on how fast the error budget is burning.
A common mistake is to design for zero latency but not instrument to measure it. Add a simple tracing middleware from day one. Without tracing, debugging a 500ms latency spike across 10 services becomes a guessing game.
Here's a concrete snippet: 'I'd set up structured logging with correlation IDs, then pipe logs into a centralised system (ELK). For metrics, I'd expose endpoints for Prometheus and create dashboards showing P99 latency, error rate, and throughput. For tracing, I'd use OpenTelemetry with Jaeger.' That's a senior-level answer.
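The logging part of that answer might look like the sketch below, which uses only Python's standard `logging` module. The `JsonFormatter` class, the logger name, and the `correlation_id` and `service` field names are assumptions for illustration; shipping the lines to a centralised system is out of scope here.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # Fields passed through `extra=` become attributes on the record.
            "correlation_id": getattr(record, "correlation_id", None),
            "service": getattr(record, "service", None),
        })


def build_logger(stream):
    logger = logging.getLogger("demo.request")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()  # stands in for stdout or a log shipper
logger = build_logger(stream)

# One ID is generated at the edge and passed along to every downstream service.
correlation_id = str(uuid.uuid4())
for service in ("gateway", "orders"):
    logger.info("request handled",
                extra={"correlation_id": correlation_id, "service": service})

log_lines = [json.loads(line) for line in stream.getvalue().splitlines()]
```

Searching the centralised logs for one `correlation_id` then returns every line a single request produced, across all services it touched.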
Idempotency and Retry Strategies: Preventing Double Charges and Data Corruption
In distributed systems, network failures are inevitable. A client sends a request, the server processes it, but the acknowledgment is lost. The client retries — and suddenly you have two orders, two payments, two emails. That's why idempotency is not optional.
Idempotency means performing the same operation multiple times produces the same result. For write operations, use an idempotency key: a unique token generated by the client and sent with the request. The server stores the result keyed by that token; if it sees the same key again, it returns the stored response without executing the operation again.
In interviews, mention idempotency early. Say 'Every write operation will include an idempotency key. The client generates a UUID, sends it with the request. The server deduplicates by that key.' This signals you've built payment systems.
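A minimal, single-threaded sketch of that deduplication is shown here. `PaymentService` and its in-memory dict are hypothetical; a real service would need the check-and-store step to be atomic, for example through a unique constraint on the key in a durable store.

```python
import uuid


class PaymentService:
    """Deduplicates writes by a client-supplied idempotency key."""

    def __init__(self):
        self._responses = {}  # idempotency key -> stored response
        self.charges = []     # side effects that were actually performed

    def charge(self, idempotency_key, account, amount_cents):
        # A key we have seen before returns the stored response, with no new charge.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges.append((account, amount_cents))
        response = {"status": "charged", "charge_number": len(self.charges)}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
key = str(uuid.uuid4())  # generated once by the client, reused on every retry

first = service.charge(key, "acct-42", 1999)
retry = service.charge(key, "acct-42", 1999)  # e.g. the first acknowledgment was lost
```

The retry gets the same response as the original call, and `service.charges` still holds exactly one entry.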
For retries, use exponential backoff with jitter — never retry instantly. A naive retry can bring down a struggling service. Say 'I'd use exponential backoff with a base of 1 second, doubling each time, capped at 30 seconds, plus random jitter (±20%) to spread retries.'
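The schedule in that quote can be computed as follows. The function name and the injectable `rng` parameter are assumptions made for testability; note that this sketch applies the jitter after the cap, so an individual delay can land up to 20% above 30 seconds.

```python
import random


def backoff_delays(attempts, base_s=1.0, cap_s=30.0, jitter=0.20, rng=random.random):
    """Delays in seconds: doubling from base_s, capped at cap_s, then jittered."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap_s, base_s * 2 ** attempt)
        # rng() is in [0, 1), so the factor falls in [1 - jitter, 1 + jitter).
        factor = 1 + jitter * (2 * rng() - 1)
        delays.append(delay * factor)
    return delays


# With the jitter pinned to its midpoint, the underlying schedule is visible.
schedule = backoff_delays(7, rng=lambda: 0.5)

# With real randomness, each client ends up with a slightly different schedule.
jittered = backoff_delays(7)
```

The jitter is what prevents thousands of clients that failed at the same moment from retrying at the same moment.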
The Cache Stampede That Cost a Weekend
- Always model your cache miss rate — a 50% spike can kill your database.
- Never trust default TTL values: they're designed for demo, not production.
- In an interview, discussing cache stampede shows depth — say 'I'd use a mutex around recompute and stagger TTLs.'
- Always include cache warm-up during deployment to avoid cold cache stampede.
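The "mutex around recompute and stagger TTLs" answer can be sketched as an in-process cache. `StampedeSafeCache`, the 300-second base TTL, and the ±10% TTL jitter are illustrative assumptions; across multiple app servers the per-key lock would have to be a distributed lock rather than a `threading.Lock`.

```python
import random
import threading
import time


class StampedeSafeCache:
    """Cache where only one caller recomputes an expired key at a time."""

    def __init__(self, base_ttl_s=300.0, ttl_jitter=0.10, clock=time.monotonic):
        self.base_ttl_s = base_ttl_s
        self.ttl_jitter = ttl_jitter
        self._clock = clock
        self._entries = {}    # key -> (value, expires_at)
        self._key_locks = {}  # key -> lock guarding recompute for that key
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry
        return None

    def get(self, key, recompute):
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]
        with self._lock_for(key):
            # Re-check: another thread may have refilled the key while we waited.
            entry = self._fresh(key)
            if entry is not None:
                return entry[0]
            value = recompute(key)
            # Staggered TTL so keys written together do not all expire together.
            ttl = self.base_ttl_s * (1 + random.uniform(-self.ttl_jitter, self.ttl_jitter))
            self._entries[key] = (value, self._clock() + ttl)
            return value
```

When many requests miss the same hot key at once, one of them runs `recompute` against the database and the rest wait on the lock and then read the freshly cached value.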
Common mistakes to avoid
Jumping into drawing boxes before defining functional and non-functional requirements
Failing to estimate scale (QPS, storage, bandwidth)
Ignoring the 'Failure Mode': Never assume a network call succeeds
Choosing technologies without trade-off analysis
Over-engineering for scale that doesn't exist
Ignoring data replication and consistency trade-offs
Designing without considering network latency between components
Interview Questions on This Topic
Design a URL shortener. Walk through your requirement gathering, then high-level design.