Cache Stampede Prevention — Mutex vs Background Refresh
One expired key triggered 20,000 DB queries during a flash sale.
- Caching stores frequently accessed data in fast memory (RAM) to reduce database load
- Core components: storage (hash map), eviction policy (LRU), consistency strategy (TTL)
- Distributed caching adds consistent hashing for sharding across nodes
- Performance: in-memory read ~1ms vs database ~10-50ms — a 10-50x improvement
- Production threat: a cache stampede can collapse your DB when a hot key expires under high concurrency
- Biggest mistake: treating cache as a database — no TTL leads to stale data and OOM crashes
Imagine you're a chef who gets asked for the same recipes dozens of times a day. Instead of flipping through your giant cookbook every single time, you write the ten most-requested recipes on a sticky note and pin it to the fridge. That sticky note is your cache — fast to read, always nearby, but limited in space. When someone asks for a new recipe that's not on the note, you look it up in the big book and decide which sticky note to replace. That's literally how a caching system works.
Every millisecond matters at scale. When Netflix serves 200 million subscribers or Twitter surfaces a trending tweet, a database query that takes 50ms repeated ten thousand times per second will buckle your infrastructure and light your AWS bill on fire. Caching is the single highest-leverage tool in a backend engineer's kit, yet most engineers treat it as an afterthought — a Redis call dropped in after the database is already struggling. The engineers who design caches thoughtfully are the ones who build systems that survive virality.
The core problem caching solves is the impedance mismatch between how fast your application needs data and how fast your storage layer can produce it. Disk-based databases are optimised for durability and complex queries, not for microsecond reads of the same user profile record fifty thousand times per minute. A well-placed cache absorbs that repeated read pressure, serves data from memory at nanosecond speed, and lets your database do what it's actually good at — handling writes and complex aggregations.
By the end of this article you'll be able to walk into a system design interview and explain cache placement strategies, eviction policies and their trade-offs, cache invalidation approaches and their consistency guarantees, how distributed caches like Redis Cluster work internally, and the production failure modes (stampedes, poisoned caches, thundering herds) that separate senior engineers from mid-levels. You'll have working Java code you can reason about, and you'll know exactly what the interviewer is fishing for when they ask 'how would you cache this?'
Anatomy of a Distributed Cache: More Than Just a Key-Value Store
In an interview, you're not just 'using Redis'; you're designing a high-availability, low-latency data layer. A production-grade caching system consists of three pillars: Storage (usually in-memory Hash Maps or B-Trees), an Eviction Policy to handle capacity limits, and a Consistency Strategy to ensure the cache doesn't serve stale data while the database has moved on.
When we talk about distributed caching, we introduce a fourth pillar: Partitioning (Sharding). Since a single node can't hold all the data or handle all the traffic, we use Consistent Hashing to distribute keys across multiple nodes. This minimizes data movement when a node is added or removed — a critical detail that distinguishes a Senior candidate from a Junior one.
Write Strategies: Balancing Speed and Safety
How you update the cache determines your system's consistency.
- Write-Through: Data is written to the cache and the database simultaneously. High consistency, but adds latency to writes.
- Write-Around: Data is written only to the database. The cache is only updated on a 'miss'. This prevents the cache from being flooded with data that is rarely read.
- Write-Back (Write-Behind): Data is written to the cache first, and the database is updated asynchronously. This provides the highest performance but risks data loss if the cache node crashes before the DB is updated.
Cache Invalidation: The Hardest Problem in Computer Science
There are only two hard things in computer science: cache invalidation and naming things. Invalidation ensures that stale data is removed or updated when the underlying source changes. Common approaches:
- TTL-based: Set an expiry time. Simple but can serve stale data within the TTL window.
- Event-based invalidation: Publish a cache invalidation event when the DB is updated (e.g., Redis Pub/Sub, Kafka). Near real-time consistency.
- Write-through (already covered): Synchronously update cache on write — strong consistency but higher write latency.
- Read-repair: In distributed caches, if a stale value is read, the reader triggers an update. Used in Dynamo-style systems.
In practice, most systems use a combination: TTL as a safety net, event-based for critical data.
Distributed Cache Topologies: Replication vs Sharding
When a single node can't handle traffic or memory, you need multiple nodes. Two primary patterns:
Replication: Data is copied to multiple nodes (e.g., Redis Sentinel). High read throughput, but write amplification. Writes go to primary, then async to replicas — eventual consistency.
Sharding: Data is partitioned across nodes (e.g., Redis Cluster, Memcached with client-side hashing). Each node owns a subset of keys. Requires consistent hashing to minimize rehashing.
Trade-off: Replication is simpler but wastes memory (each node stores all data). Sharding is memory-efficient but adds complexity for cross-node operations (e.g., multi-key commands fail if keys are on different nodes).
In interviews, start with replication for read-heavy workloads, then scale to sharding when memory of a single node becomes the bottleneck.
Cache Failure Modes: What Breaks Your Cache in Production
Three classic failure modes every senior engineer must explain:
Cache Stampede (Thundering Herd): A hot key expires, thousands of requests try to repopulate it simultaneously, overloading the database. Fix: Use a mutex/lease to allow only one thread to rebuild; others wait. Or use probabilistic early expiration (like Twitter's 'cache roulette').
Cache Penetration: Requests for non-existent keys (e.g., user IDs that don't exist) bypass cache and hit the DB every time. If these are common (e.g., DDOS attacks), the DB gets hammered. Fix: Cache 'null' results with a short TTL (e.g., 1 minute) or use a Bloom filter upfront.
Cache Avalanche: A large number of keys expire at the same time (e.g., all users cached at midnight with same TTL). DB gets a sudden load spike. Fix: Add random jitter to TTLs so they expire at different times. Or use a secondary local cache to absorb the initial misses.
Black Friday Cache Stampede: How a Single Expired Key Took Down the Checkout DB
- Always protect cache-miss paths with a mutex or lease to prevent stampedes.
- Hot keys should use a shorter TTL with background refresh, not a single long TTL.
- Monitor cache hit rate as a leading indicator of DB load.
Key takeaways
Common mistakes to avoid
6 patternsNot setting a TTL
EXPIRE in Redis or @Cacheable TTL configurations.Ignoring Cache Penetration
Hardcoding cache logic in Business Services
Over-caching small objects
Using `key % N` for distributed sharding
Not setting maxmemory on Redis
maxmemory (e.g., 80% of RAM) and maxmemory-policy (default: noeviction — change to allkeys-lru).Interview Questions on This Topic
Design a Least Frequently Used (LFU) cache with O(1) get and put operations. (Requires a nested doubly linked list or a frequency map + linked list.)
Frequently Asked Questions
That's System Design Interview. Mark it forged?
4 min read · try the examples if you haven't