Junior 4 min · March 06, 2026

Cache Stampede Prevention — Mutex vs Background Refresh

One expired key triggered 20,000 DB queries during a flash sale.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • Caching stores frequently accessed data in fast memory (RAM) to reduce database load
  • Core components: storage (hash map), eviction policy (LRU), consistency strategy (TTL)
  • Distributed caching adds consistent hashing for sharding across nodes
  • Performance: in-memory read ~1ms vs database ~10-50ms — a 10-50x improvement
  • Production threat: a cache stampede can collapse your DB when a hot key expires under high concurrency
  • Biggest mistake: treating cache as a database — no TTL leads to stale data and OOM crashes
Plain-English First

Imagine you're a chef who gets asked for the same recipes dozens of times a day. Instead of flipping through your giant cookbook every single time, you write the ten most-requested recipes on a sticky note and pin it to the fridge. That sticky note is your cache — fast to read, always nearby, but limited in space. When someone asks for a new recipe that's not on the note, you look it up in the big book and decide which sticky note to replace. That's literally how a caching system works.

Every millisecond matters at scale. When Netflix serves 200 million subscribers or Twitter surfaces a trending tweet, a database query that takes 50ms repeated ten thousand times per second will buckle your infrastructure and light your AWS bill on fire. Caching is the single highest-leverage tool in a backend engineer's kit, yet most engineers treat it as an afterthought — a Redis call dropped in after the database is already struggling. The engineers who design caches thoughtfully are the ones who build systems that survive virality.

The core problem caching solves is the impedance mismatch between how fast your application needs data and how fast your storage layer can produce it. Disk-based databases are optimised for durability and complex queries, not for microsecond reads of the same user profile record fifty thousand times per minute. A well-placed cache absorbs that repeated read pressure, serves data from memory at nanosecond speed, and lets your database do what it's actually good at — handling writes and complex aggregations.

By the end of this article you'll be able to walk into a system design interview and explain cache placement strategies, eviction policies and their trade-offs, cache invalidation approaches and their consistency guarantees, how distributed caches like Redis Cluster work internally, and the production failure modes (stampedes, poisoned caches, thundering herds) that separate senior engineers from mid-levels. You'll have working Java code you can reason about, and you'll know exactly what the interviewer is fishing for when they ask 'how would you cache this?'

Anatomy of a Distributed Cache: More Than Just a Key-Value Store

In an interview, you're not just 'using Redis'; you're designing a high-availability, low-latency data layer. A production-grade caching system consists of three pillars: Storage (usually in-memory Hash Maps or B-Trees), an Eviction Policy to handle capacity limits, and a Consistency Strategy to ensure the cache doesn't serve stale data while the database has moved on.

When we talk about distributed caching, we introduce a fourth pillar: Partitioning (Sharding). Since a single node can't hold all the data or handle all the traffic, we use Consistent Hashing to distribute keys across multiple nodes. This minimizes data movement when a node is added or removed — a critical detail that distinguishes a Senior candidate from a Junior one.

io.thecodeforge.cache.SimpleLruCache.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
package io.thecodeforge.cache;

import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A production-grade LRU (Least Recently Used) Cache implementation.
 * Using LinkedHashMap to maintain access order for O(1) eviction.
 */
public class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public SimpleLruCache(int capacity) {
        // true for access-order, false for insertion-order
        super(capacity, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently accessed entry when capacity is exceeded
        return size() > capacity;
    }

    public static void main(String[] args) {
        SimpleLruCache<String, String> cache = new SimpleLruCache<>(3);
        cache.put("user:1", "Alice");
        cache.put("user:2", "Bob");
        cache.put("user:3", "Charlie");
        
        cache.get("user:1"); // user:1 becomes the most recently used
        cache.put("user:4", "David"); // Capacity exceeded, user:2 is evicted

        System.out.println("Cache keys after eviction: " + cache.keySet());
    }
}
Output
Cache keys after eviction: [user:3, user:1, user:4]
Forge Tip: Consistent Hashing is the Secret Sauce
In a distributed environment, never use 'key % N' to find a node. If N changes (a server dies), almost every key will map to a different node, causing a cache miss storm. Consistent Hashing limits the impact to only 1/N of the keys.
Production Insight
In production, LinkedHashMap-based LRU is fine for single-node caches but doesn't scale across threads. Use ConcurrentHashMap with access ordering if you need thread safety, or move to Redis for distributed consistency.
Real-world failure: A team used this exact LRU inside a web server and got O(1) per request, but under load the synchronized put caused contention. They moved to a striped lock approach.
Rule: For multi-threaded access, either use ConcurrentLinkedHashMap (from Caffeine library) or wrap with a ReadWriteLock.
Key Takeaway
LRU eviction with LinkedHashMap is O(1) and simple for single-threaded use.
For production concurrency, Caffeine or Redis beats rolling your own.
Consistent hashing is non-negotiable for distributed caching.

Write Strategies: Balancing Speed and Safety

How you update the cache determines your system's consistency.

  1. Write-Through: Data is written to the cache and the database simultaneously. High consistency, but adds latency to writes.
  2. Write-Around: Data is written only to the database. The cache is only updated on a 'miss'. This prevents the cache from being flooded with data that is rarely read.
  3. Write-Back (Write-Behind): Data is written to the cache first, and the database is updated asynchronously. This provides the highest performance but risks data loss if the cache node crashes before the DB is updated.
docker-compose.ymlDOCKER
1
2
3
4
5
6
7
8
9
10
11
12
13
14
version: '3.8'
services:
  redis-cache:
    image: redis:7.2-alphine
    container_name: thecodeforge-redis
    ports:
      - "6379:6379"
    command: ["redis-server", "--maxmemory", "512mb", "--maxmemory-policy", "allkeys-lru"]
    networks:
      - forge-network

networks:
  forge-network:
    driver: bridge
Output
Redis container configured with 512MB limit and LRU eviction policy.
Watch Out: The Thundering Herd
When a hot cache key expires, thousands of concurrent requests might hit the database simultaneously to re-populate it. Use 'Locks' or 'Leases' to ensure only one request re-populates the cache while others wait.
Production Insight
Write-through adds latency to every write, which often surprises teams expecting zero-performance cost. Write-behind improves write throughput but risks data loss if cache node crashes before flush.
Real-world failure: A fintech startup used write-behind to speed up transactions. A Redis node died before flushing to DB, causing financial reconciliation errors.
Rule: Never use write-behind for financial data. Use write-through for critical writes, write-around for read-heavy workloads.
Key Takeaway
Write-through: safe but slower writes.
Write-behind: fast but unsafe — use only for non-critical, loss-tolerant data.
Cache aside: most flexible pattern — app controls both cache and DB.

Cache Invalidation: The Hardest Problem in Computer Science

There are only two hard things in computer science: cache invalidation and naming things. Invalidation ensures that stale data is removed or updated when the underlying source changes. Common approaches:

  1. TTL-based: Set an expiry time. Simple but can serve stale data within the TTL window.
  2. Event-based invalidation: Publish a cache invalidation event when the DB is updated (e.g., Redis Pub/Sub, Kafka). Near real-time consistency.
  3. Write-through (already covered): Synchronously update cache on write — strong consistency but higher write latency.
  4. Read-repair: In distributed caches, if a stale value is read, the reader triggers an update. Used in Dynamo-style systems.

In practice, most systems use a combination: TTL as a safety net, event-based for critical data.

TTL Alone Is Not Enough
If your data must be strongly consistent (e.g., user balances), TTL-based invalidation alone will serve stale data. Pair it with event-driven invalidation or use write-through.
Production Insight
Event-based invalidation sounds great until you have a network partition. If the invalidation event is lost, the cache stays stale until TTL expiry.
Real-world failure: A payments service used Redis Pub/Sub for invalidation. When the Redis connection dropped, no events reached the cache, and users saw old balances for up to 5 minutes — the TTL. They added a dead-letter queue and periodic full cache refresh.
Rule: Always combine event-based invalidation with a reasonably short TTL (a few minutes) as a safety net.
Key Takeaway
TTL is simple but eventually consistent.
Event-based invalidation gives near real-time consistency.
Combine both: TTL as safety net, events for immediate updates.

Distributed Cache Topologies: Replication vs Sharding

When a single node can't handle traffic or memory, you need multiple nodes. Two primary patterns:

Replication: Data is copied to multiple nodes (e.g., Redis Sentinel). High read throughput, but write amplification. Writes go to primary, then async to replicas — eventual consistency.

Sharding: Data is partitioned across nodes (e.g., Redis Cluster, Memcached with client-side hashing). Each node owns a subset of keys. Requires consistent hashing to minimize rehashing.

Trade-off: Replication is simpler but wastes memory (each node stores all data). Sharding is memory-efficient but adds complexity for cross-node operations (e.g., multi-key commands fail if keys are on different nodes).

In interviews, start with replication for read-heavy workloads, then scale to sharding when memory of a single node becomes the bottleneck.

Production Insight
Production gotcha: Redis Cluster imposes a 16K hash slot constraint. If you use Redis Cluster, multi-key operations fail with CROSSSLOT error unless keys share a hash tag. This broke our paginated cache scans until we added hash tags for related keys.
Real-world failure: A team used simple MOD sharding (key % N) and added a node during scaling — 90% of keys moved, causing a massive cache miss storm. They migrated to consistent hashing.
Rule: Always use consistent hashing for sharding. Never use key % N.
Key Takeaway
Replication: simple, good for reads, wastes memory.
Sharding: memory-efficient but complex — use consistent hashing.
In an interview: start with replication, then scale to sharding.
Replication vs Sharding Decision
IfRead-heavy workload, data fits in one node's memory
UseUse replication (primary-replica) — simpler, good read throughput.
IfData volume exceeds single node memory (e.g., >64GB)
UseUse sharding with consistent hashing — spreads data across nodes.
IfNeed high write throughput with strong consistency
UseReplication limits write throughput to primary. Sharding with single-key writes scales better.

Cache Failure Modes: What Breaks Your Cache in Production

Cache Stampede (Thundering Herd): A hot key expires, thousands of requests try to repopulate it simultaneously, overloading the database. Fix: Use a mutex/lease to allow only one thread to rebuild; others wait. Or use probabilistic early expiration (like Twitter's 'cache roulette').

Cache Penetration: Requests for non-existent keys (e.g., user IDs that don't exist) bypass cache and hit the DB every time. If these are common (e.g., DDOS attacks), the DB gets hammered. Fix: Cache 'null' results with a short TTL (e.g., 1 minute) or use a Bloom filter upfront.

Cache Avalanche: A large number of keys expire at the same time (e.g., all users cached at midnight with same TTL). DB gets a sudden load spike. Fix: Add random jitter to TTLs so they expire at different times. Or use a secondary local cache to absorb the initial misses.

Production Insight
A real-world e-commerce company faced a cache avalanche every hour on the hour — their homepage banners had a fixed 1-hour TTL. At :00, all banners expired, causing 5,000 DB queries at once. They added a random +-10% jitter to TTLs, and the spike flattened.
Rule: Always add random jitter to TTLs, especially for keys that are generated in batches (e.g., cached query results from scheduled jobs).
Key Takeaway
Cache stampede: mutex or lease required.
Cache penetration: Bloom filter saves your DB.
Cache avalanche: TTL jitter is your friend.
Know all three — interviewers love this.
Diagnose Your Cache Failure
IfDB spikes after specific key expires
UseCache stampede — implement mutex or early re-computation.
IfHigh DB load for keys that never exist
UseCache penetration — use Bloom filter or cache nulls.
IfDB spikes at regular interval (e.g., every hour)
UseCache avalanche — add jitter to TTLs or stagger expiry.
● Production incidentPOST-MORTEMseverity: high

Black Friday Cache Stampede: How a Single Expired Key Took Down the Checkout DB

Symptom
All checkout requests timeout. Database CPU at 100%. Cache hit rate drops from 95% to 5%. Alerts show a sudden surge in DB connections.
Assumption
Engineers assumed the cache would always be populated and that the DB could handle the fallback load. No rate limiting on cache miss repopulation.
Root cause
The inventory cache key had a fixed 5-minute TTL with no background refresh. When it expired during a flash sale, 20,000 concurrent threads all tried to rebuild it from the DB simultaneously.
Fix
Implement a mutex/lease mechanism so only one thread rebuilds the cache. Others wait for the new value. Also added a background job to pre-refresh high-traffic keys before TTL expiry.
Key lesson
  • Always protect cache-miss paths with a mutex or lease to prevent stampedes.
  • Hot keys should use a shorter TTL with background refresh, not a single long TTL.
  • Monitor cache hit rate as a leading indicator of DB load.
Production debug guideWhen your cache behaves like a liar instead of a lifesaver, use these symptom→action pairs to find the root cause fast.4 entries
Symptom · 01
Cache hit rate drops dramatically from 95% to 40%
Fix
Check redis-cli INFO stats for keyspace_hits and keyspace_misses. Look for recent deployment that changed cache keys or TTL. Verify consistent hash ring didn't redistribute keys.
Symptom · 02
Memory usage hits limit, evictions spike
Fix
Check evicted_keys in Redis INFO. Run memory-stats to analyse top consumer keys. Consider increasing maxmemory or switching eviction policy (allkeys-lru vs volatile-lru).
Symptom · 03
Stale data served even though TTL is set
Fix
Verify cache invalidation on writes — are you using write-through or cache-aside? Check if TTL is shorter than expected due to configuration override. Inspect Redis TTL with TTL key command.
Symptom · 04
Database still overloaded despite high cache hit rate
Fix
Check for cache penetration: requests for non-existent keys bypass cache. Implement Bloom filter or cache negative results with short TTL. Also verify that cache-aside doesn't miss on repeated failed lookups.
★ Quick Cache Debugging Cheat SheetCommands to diagnose common caching issues in production — Redis-focused but patterns apply to any cache system.
High miss rate
Immediate action
Check cache hit ratio immediately
Commands
redis-cli -h <host> INFO stats | grep -E 'keyspace_(hits|misses)'
redis-cli --bigkeys — find keys consuming most space
Fix now
If miss rate >20%, investigate key naming changes or TTL too short. Consider pre-warming cache after deploy.
Cache memory exhausted+
Immediate action
Check evicted_keys count
Commands
redis-cli INFO stats | grep evicted_keys
redis-cli MEMORY STATS — see per-key memory overhead
Fix now
Immediate: set maxmemory-policy to allkeys-lru. Long-term: compress values, reduce TTL, or add nodes.
Stale data served+
Immediate action
Check TTL on suspected key
Commands
redis-cli TTL <key>
redis-cli DEBUG OBJECT <key> — check last access time
Fix now
If TTL is too long, set it shorter. Ensure invalidation on write: if using cache-aside, delete key after DB update.
Sudden DB spike+
Immediate action
Check for cache stampede
Commands
redis-cli MONITOR | grep -E 'GET|SET' — identify keys being hammered
Check logs for 'cache miss' rate spike in application metrics
Fix now
Implement mutex for hot key rebuild or use Redis SET NX with lock TTL. Add rate limiter on cache miss.
PolicyMechanismIdeal Use Case
LRU (Least Recently Used)Discards items not used for the longest timeGeneral purpose web apps, user sessions
LFU (Least Frequently Used)Discards items with the lowest hit countAssets with stable popularity (e.g., static logos)
FIFO (First In First Out)Discards the oldest added item regardless of useShort-lived data with predictable lifespans
TTL (Time To Live)Expiration based on absolute timeNews feeds, price data, temporary tokens

Key takeaways

1
Caching is a trade-off
You gain microsecond latency at the cost of data consistency and infrastructure complexity.
2
Eviction is mandatory
Always choose an eviction policy (LRU is the standard) and set memory limits to prevent system crashes.
3
Know your failure modes
Cache Avalanche, Cache Penetration, and Cache Stampede separate Senior from Mid-level in interviews.
4
Distributed scaling requires Consistent Hashing to maintain high hit rates during cluster resizing.
5
Production rule
Always combine TTL with event-based invalidation for data that must be fresh.
6
Profile before caching
Not every query benefits from caching — small, rare datasets may be slower cached externally.

Common mistakes to avoid

6 patterns
×

Not setting a TTL

Symptom
Cache grows unboundedly, eventually causing OutOfMemoryError in the JVM or Redis reaching maxmemory and thrashing evictions.
Fix
Always set a TTL based on data freshness requirements. Use EXPIRE in Redis or @Cacheable TTL configurations.
×

Ignoring Cache Penetration

Symptom
Constant DB load from requests for keys that don't exist (e.g., invalid user IDs). Cache hit rate looks good, but DB is still under fire.
Fix
Cache null/negative results with a short TTL (30-60 seconds) or use a Bloom filter to reject non-existent keys upfront.
×

Hardcoding cache logic in Business Services

Symptom
Cache code scattered across services, making it impossible to switch providers or tune policies without modifying business logic.
Fix
Use an abstraction layer (e.g., Spring @Cacheable) or a dedicated caching service with configuration-driven eviction and TTL.
×

Over-caching small objects

Symptom
Small datasets (e.g., lookup tables) are cached remotely, adding network overhead. Measured latency sometimes increases instead of decreasing.
Fix
Profile before caching. For tiny, rarely-changing data, consider in-memory local cache (e.g., Guava, Caffeine) instead of Redis.
×

Using `key % N` for distributed sharding

Symptom
When a node is added or removed, most keys map to different nodes — 90% cache misses and a sudden DB spike.
Fix
Replace with consistent hashing. Use Redis Cluster or client-side consistent hashing libraries (e.g., Ketama).
×

Not setting maxmemory on Redis

Symptom
Redis fills up host memory, OOM killer kills redis-server, entire cache goes down.
Fix
Always configure maxmemory (e.g., 80% of RAM) and maxmemory-policy (default: noeviction — change to allkeys-lru).
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
Design a Least Frequently Used (LFU) cache with O(1) get and put operati...
Q02SENIOR
Your Redis cluster is healthy, but the database is crashing from load. Y...
Q03SENIOR
How does 'Consistent Hashing' solve the problem of node re-balancing com...
Q04SENIOR
Explain the 'Cache-Aside' pattern. Why is it often preferred over 'Write...
Q05SENIOR
How do you handle a 'Hot Key' problem where one key gets millions of hit...
Q06JUNIOR
When should you choose Redis over Memcached for a caching system?
Q01 of 06SENIOR

Design a Least Frequently Used (LFU) cache with O(1) get and put operations. (Requires a nested doubly linked list or a frequency map + linked list.)

ANSWER
Use a HashMap for key->value+frequency, and a HashMap of frequency->LinkedHashSet. On access, increase frequency and move node. On eviction, remove from the lowest non-empty frequency set. Keep a min frequency variable to track eviction candidate. Both operations are O(1) average.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What is the difference between Cache-Aside and Write-Through?
02
How do you handle a 'Hot Key' problem where one key gets millions of hits?
03
When should I choose Redis over Memcached?
04
What is the difference between Cache Stampede and Cache Avalanche?
05
How do you monitor cache health in production?
🔥

That's System Design Interview. Mark it forged?

4 min read · try the examples if you haven't

Previous
Design Instagram — Interview
5 / 7 · System Design Interview
Next
Design a Job Scheduler