Advanced 9 min · March 06, 2026

Design a Caching System

Cache Stampede Prevention — Mutex vs Background Refresh

Q: What is the difference between Cache-Aside and Write-Through?

In Cache-Aside, the application code is responsible for reading from the cache and updating it if there's a miss. In Write-Through, the cache acts as the primary data store for the app, and the cache provider handles the synchronous write to the database. Cache-Aside is more resilient to cache failures.

Q: How do you handle a 'Hot Key' problem where one key gets millions of hits?

For extremely hot keys, use 'Local Caching' (L1) on the application server itself to shield the distributed cache (L2). Alternatively, use 'Key Salting' where you replicate the hot key across multiple cache nodes (e.g., user:1:part1, user:1:part2).
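
To make the key-salting idea concrete, here is a minimal sketch. It assumes a generic cache client with `get`/`set`; the `DictCache` stand-in and the helper names `salted_keys`, `set_hot`, and `get_hot` are hypothetical, not part of any real library.

```python
import random

class DictCache:
    """Stand-in for a distributed cache client (hypothetical; swap in a real client)."""
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value

def salted_keys(key: str, copies: int) -> list[str]:
    # user:1 -> user:1:part1 ... user:1:partN, mirroring the naming used above
    return [f"{key}:part{i}" for i in range(1, copies + 1)]

def set_hot(cache, key, value, copies=4):
    # Write every replica so any salted read returns the same value
    for salted in salted_keys(key, copies):
        cache.set(salted, value)

def get_hot(cache, key, copies=4, rng=random):
    # Read a random replica; with hash-based sharding the replicas usually
    # land on different nodes, which spreads the read load
    return cache.get(rng.choice(salted_keys(key, copies)))
```

The cost is write fan-out: every update touches all copies, so this only pays off for keys that are read far more often than they are written.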

Q: When should I choose Redis over Memcached?

Choose Redis if you need data persistence, complex data structures (Lists, Sets, Sorted Sets), or built-in replication and clustering. Memcached is better for very simple, high-speed string caching where multithreading performance is the absolute priority.

Q: What is the difference between Cache Stampede and Cache Avalanche?

Cache Stampede is a single hot key expiring and causing many simultaneous DB requests. Cache Avalanche is many keys expiring at the same time, causing a widespread DB load spike. Both can be mitigated: stampede with mutexes, avalanche with random TTL jitter.

Q: How do you monitor cache health in production?

Track key metrics: cache hit rate (should be >90% for most workloads), eviction count (if rising, memory is too small), latency p99 of cache get/set commands, and TTL distribution. Use Redis INFO stats or cloud monitoring. Also monitor DB load as a leading indicator of cache problems.
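
As a rough illustration of the hit-rate check, the sketch below summarizes counters shaped like the Redis INFO stats section (`keyspace_hits`, `keyspace_misses`, `evicted_keys`). The `cache_health` function and the sample numbers are made up for the example; the 90% threshold is the one quoted above.

```python
def cache_health(stats: dict) -> dict:
    """Summarize counters shaped like the Redis INFO stats section."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    lookups = hits + misses
    # Guard against division by zero on a freshly started cache
    hit_rate = hits / lookups if lookups else 0.0
    return {
        "hit_rate": hit_rate,
        "evicted_keys": int(stats.get("evicted_keys", 0)),
        "healthy": hit_rate >= 0.90,
    }

# Illustrative sample, not real production data
sample = {"keyspace_hits": 9_500, "keyspace_misses": 500, "evicted_keys": 12}
report = cache_health(sample)
```

These counters are cumulative since server start, so in practice you would compare two snapshots and compute the rate over the interval between them.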

One expired key triggered 20,000 DB queries during a flash sale.

Naren Founder & Principal Engineer

20+ years shipping production code across the stack, with years spent interviewing engineers. Everything here is grounded in real deployments.

Last updated: July 18, 2026

Before you start ⏱ 30 min

✓ Deep production experience
✓ Understanding of internals and trade-offs
✓ Experience debugging complex systems

⚡Quick Answer

Caching stores frequently accessed data in fast memory (RAM) to reduce database load
Core components: storage (hash map), eviction policy (LRU), consistency strategy (TTL)
Distributed caching adds consistent hashing for sharding across nodes
Performance: in-memory read ~1ms vs database ~10-50ms — a 10-50x improvement
Production threat: a cache stampede can collapse your DB when a hot key expires under high concurrency
Biggest mistake: treating cache as a database — no TTL leads to stale data and OOM crashes

✦ Definition~90s read

What is Design a Caching System?

Cache stampede (also called thundering herd) is what happens when a popular cached key expires and hundreds of concurrent requests all miss the cache simultaneously, each racing to recompute the same expensive value. This can spike database load 100x in milliseconds, taking down your backend.

The two canonical prevention strategies are mutex locking (one request recomputes, others wait or get a stale value) and background refresh (a background job proactively refreshes the cache before TTL expiry, so live requests never see a miss). Mutex works well when recomputation is expensive and callers can tolerate a brief wait (or a stale value) on a miss; background refresh is better for read-heavy workloads where staleness is acceptable and you want zero latency impact.

In production systems like Redis with Redlock or Memcached with CAS, you'll often combine both — use a background refresh as the primary defense and a mutex as a fallback for the rare case the refresh fails. The key insight for a system design interview is that cache stampede is not a theoretical edge case; it's a predictable failure mode that you must explicitly design for, especially when your cache sits in front of a slow or expensive data source like a relational database or a third-party API.

Plain-English First

Imagine you're a chef who gets asked for the same recipes dozens of times a day. Instead of flipping through your giant cookbook every single time, you write the ten most-requested recipes on a sticky note and pin it to the fridge. That sticky note is your cache — fast to read, always nearby, but limited in space. When someone asks for a new recipe that's not on the note, you look it up in the big book and decide which sticky note to replace. That's literally how a caching system works.

Every millisecond matters at scale. When Netflix serves 200 million subscribers or Twitter surfaces a trending tweet, a database query that takes 50ms repeated ten thousand times per second will buckle your infrastructure and light your AWS bill on fire. Caching is the single highest-leverage tool in a backend engineer's kit, yet most engineers treat it as an afterthought — a Redis call dropped in after the database is already struggling. The engineers who design caches thoughtfully are the ones who build systems that survive virality.

The core problem caching solves is the impedance mismatch between how fast your application needs data and how fast your storage layer can produce it. Disk-based databases are optimised for durability and complex queries, not for microsecond reads of the same user profile record fifty thousand times per minute. A well-placed cache absorbs that repeated read pressure, serves data from memory in microseconds (plus a network round trip of roughly a millisecond for a remote cache such as Redis), and lets your database do what it's actually good at: handling writes and complex aggregations.

By the end of this article you'll be able to walk into a system design interview and explain cache placement strategies, eviction policies and their trade-offs, cache invalidation approaches and their consistency guarantees, how distributed caches like Redis Cluster work internally, and the production failure modes (stampedes, poisoned caches, thundering herds) that separate senior engineers from mid-levels. You'll have working Java code you can reason about, and you'll know exactly what the interviewer is fishing for when they ask 'how would you cache this?'

What a Caching System Interview Actually Tests

A caching system interview evaluates your ability to design a layer that stores frequently accessed data in fast memory (e.g., Redis, Memcached) to reduce latency and load on a primary datastore. The core mechanic is a trade-off: you accept eventual consistency in exchange for O(1) in-memory lookups instead of slower disk-backed database queries. The interviewer wants to see you reason about hit ratios, TTLs, eviction policies (LRU, LFU), and failure modes like thundering herds or cache stampedes.

In practice, you must decide between write-through, write-around, and write-back strategies. Write-through ensures consistency but adds write latency; write-back improves write throughput at the risk of data loss. Key properties that matter are cache invalidation granularity (do you evict a single key or a whole partition?) and the staleness window — how long can a stale value be served before the next refresh? A common benchmark: a 95% hit ratio reduces read latency from 50ms to under 5ms, but a stampede can spike database connections to 10x normal.
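
The hit-ratio benchmark is just a weighted average, which the sketch below works through. It assumes the round figures from the Quick Answer (about 1 ms per cache read, 50 ms per database read) and ignores the extra cache lookup a miss pays; `effective_latency_ms` is a hypothetical helper name.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    # Expected read latency = share of hits served from the cache
    # plus share of misses served from the database
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * db_ms

# 0.95 * 1 ms + 0.05 * 50 ms
print(round(effective_latency_ms(0.95, 1.0, 50.0), 2))
```

With those inputs the average comes to about 3.45 ms, which is where the "under 5ms" figure comes from. Note how the miss term dominates: dropping to a 90% hit ratio nearly doubles the average.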

Use caching when reads dominate writes and the data changes infrequently — think user sessions, product catalogs, or configuration. It's critical in systems where a single database query costs 100ms+ and the cache can serve in under 1ms. Avoid caching for rapidly mutating data (e.g., real-time stock ticks) unless you accept bounded staleness. In production, a poorly designed cache can cause cascading failures: a single key expiry triggers thousands of concurrent recomputations, overwhelming the database.

⚠ Cache Stampede Blind Spot

Most engineers assume a simple TTL is enough, but without mutex or background refresh, a single key expiry can trigger a thundering herd that takes down your database.

📊 Production Insight

A social media feed cache with a 60-second TTL expired at peak traffic, causing 5000 concurrent recomputations that saturated the PostgreSQL connection pool.

Symptom: database CPU spikes to 100%, query latency jumps from 10ms to 30s, and the cache remains empty for minutes as every request blocks on a slow recompute.

Rule of thumb: always add a mutex (lock on cache miss) or a background refresh (proactively recompute before TTL expiry) for any cache key that serves more than 100 requests per second.
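
Here is one way the "lock on cache miss" rule could look in a single process. This is a sketch under stated assumptions, not a distributed lock: `SingleFlightCache` and its `loader` callback are hypothetical names, and the per-key lock table is never pruned.

```python
import threading
import time

class SingleFlightCache:
    """In-process sketch: only one thread recomputes a missing key."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data = {}                  # key -> (value, expires_at)
        self._locks = {}                 # key -> threading.Lock
        self._guard = threading.Lock()   # protects the lock table itself

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key, loader):
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]
        with self._lock_for(key):        # other threads block here during the rebuild
            entry = self._fresh(key)     # double-check: another thread may have filled it
            if entry is not None:
                return entry[0]
            value = loader(key)          # the single expensive recomputation
            self._data[key] = (value, time.monotonic() + self.ttl)
            return value
```

Across processes the same idea is usually built on an atomic set-if-absent with an expiry (for example Redis `SET key value NX PX <ms>`), so that a crashed lock holder cannot block everyone forever.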

🎯 Key Takeaway

Cache stampede is not a theoretical edge case — it's a production incident waiting to happen on any high-traffic key.

Mutex prevents duplicate recomputations but adds latency on cache miss; background refresh eliminates miss latency but requires a dedicated worker.

Always measure your cache hit ratio and set TTLs with a jitter (e.g., ±10%) to avoid synchronized expirations.

thecodeforge.io

Design Caching System Interview

Anatomy of a Distributed Cache: More Than Just a Key-Value Store

In an interview, you're not just 'using Redis'; you're designing a high-availability, low-latency data layer. A production-grade caching system consists of three pillars: Storage (usually in-memory Hash Maps or B-Trees), an Eviction Policy to handle capacity limits, and a Consistency Strategy to ensure the cache doesn't serve stale data while the database has moved on.

When we talk about distributed caching, we introduce a fourth pillar: Partitioning (Sharding). Since a single node can't hold all the data or handle all the traffic, we use Consistent Hashing to distribute keys across multiple nodes. This minimizes data movement when a node is added or removed — a critical detail that distinguishes a Senior candidate from a Junior one.

io.thecodeforge.cache.SimpleLruCache.java · JAVA

package io.thecodeforge.cache;

import java.util.LinkedHashMap;
import java.util.Map;

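/**
 * A minimal LRU (Least Recently Used) cache for single-threaded use.
 * LinkedHashMap in access-order mode gives O(1) lookup and eviction.
 * Not thread-safe: guard it with a lock or use a concurrent cache library
 * before sharing it across threads.
 */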
public class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public SimpleLruCache(int capacity) {
        // true for access-order, false for insertion-order
        super(capacity, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently accessed entry when capacity is exceeded
        return size() > capacity;
    }

    public static void main(String[] args) {
        SimpleLruCache<String, String> cache = new SimpleLruCache<>(3);
        cache.put("user:1", "Alice");
        cache.put("user:2", "Bob");
        cache.put("user:3", "Charlie");
        
        cache.get("user:1"); // user:1 becomes the most recently used
        cache.put("user:4", "David"); // Capacity exceeded, user:2 is evicted

        System.out.println("Cache keys after eviction: " + cache.keySet());
    }
}

Output

Cache keys after eviction: [user:3, user:1, user:4]

🔥Forge Tip: Consistent Hashing is the Secret Sauce

In a distributed environment, never use 'key % N' to find a node. If N changes (a server dies), almost every key will map to a different node, causing a cache miss storm. Consistent Hashing limits the impact to only 1/N of the keys.
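
To see the difference between `key % N` and a hash ring, the sketch below measures how many keys change owner when a fifth node joins. It is a minimal illustration, assuming MD5 as the stable hash and 100 virtual nodes per server; the `HashRing` class and node names are hypothetical.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Stable hash (Python's built-in hash() is randomized per process)
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring many times (virtual nodes) to even out load
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

def moved_fraction(before, after, keys):
    # Share of keys whose owner differs between the two placements
    return sum(before(k) != after(k) for k in keys) / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
nodes = ["cache-a", "cache-b", "cache-c", "cache-d"]

ring4, ring5 = HashRing(nodes), HashRing(nodes + ["cache-e"])
ring_moved = moved_fraction(ring4.node_for, ring5.node_for, keys)
mod_moved = moved_fraction(lambda k: _hash(k) % 4, lambda k: _hash(k) % 5, keys)
```

With a uniform hash you would expect `ring_moved` to sit near one fifth (only the keys the new node takes over) and `mod_moved` near four fifths, which is the cache miss storm the tip warns about.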

📊 Production Insight

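In production, a LinkedHashMap-based LRU is fine for a single-threaded, single-node cache, but it is not thread-safe. ConcurrentHashMap does not help here because it keeps no access order, so for thread safety use a purpose-built cache such as Caffeine, or move to Redis when the cache must be shared across nodes.

Real-world failure: A team used this exact LRU inside a web server and got O(1) per request, but under load the synchronized put caused contention. They moved to a striped lock approach.

Rule: For multi-threaded access, either use Caffeine (the successor to the ConcurrentLinkedHashMap library) or guard the map with a single exclusive lock. A ReadWriteLock is not sufficient, because get() reorders entries in access-order mode and therefore counts as a structural modification.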

🎯 Key Takeaway

LRU eviction with LinkedHashMap is O(1) and simple for single-threaded use.

For production concurrency, Caffeine or Redis beats rolling your own.

Consistent hashing is non-negotiable for distributed caching.

Write Strategies: Balancing Speed and Safety

How you update the cache determines your system's consistency.

Write-Through: Data is written to the cache and the database simultaneously. High consistency, but adds latency to writes.
Write-Around: Data is written only to the database. The cache is only updated on a 'miss'. This prevents the cache from being flooded with data that is rarely read.
Write-Back (Write-Behind): Data is written to the cache first, and the database is updated asynchronously. This provides the highest performance but risks data loss if the cache node crashes before the DB is updated.
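
The three strategies differ only in which store is touched before the write returns. The toy model below uses plain dicts for the cache and the database; the `WriteStrategies` class is hypothetical, and dropping the stale cached copy in `write_around` is an assumption added for the sketch.

```python
from collections import deque

class WriteStrategies:
    """Toy model: `cache` and `db` are plain dicts standing in for real stores."""

    def __init__(self):
        self.cache, self.db = {}, {}
        self.pending = deque()       # write-back queue of keys awaiting flush

    def write_through(self, key, value):
        self.db[key] = value         # both stores are updated before returning
        self.cache[key] = value

    def write_around(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)    # assumption: drop any stale copy; next read repopulates

    def write_back(self, key, value):
        self.cache[key] = value      # acknowledged immediately
        self.pending.append(key)     # database write is deferred

    def flush(self):
        # Would run asynchronously in a real system; anything still queued
        # when the cache node dies is lost
        while self.pending:
            key = self.pending.popleft()
            self.db[key] = self.cache[key]
```

Between `write_back` and `flush` the database simply does not have the value, which is the data-loss window described above.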

docker-compose.yml · DOCKER

version: '3.8'
services:
  redis-cache:
    image: redis:7.2-alpine
    container_name: thecodeforge-redis
    ports:
      - "6379:6379"
    command: ["redis-server", "--maxmemory", "512mb", "--maxmemory-policy", "allkeys-lru"]
    networks:
      - forge-network

networks:
  forge-network:
    driver: bridge

Output

Redis container configured with 512MB limit and LRU eviction policy.

⚠ Watch Out: The Thundering Herd

When a hot cache key expires, thousands of concurrent requests might hit the database simultaneously to re-populate it. Use 'Locks' or 'Leases' to ensure only one request re-populates the cache while others wait.

📊 Production Insight

Write-through adds latency to every write, which often surprises teams expecting zero performance cost. Write-behind improves write throughput but risks data loss if the cache node crashes before the flush.

Real-world failure: A fintech startup used write-behind to speed up transactions. A Redis node died before flushing to DB, causing financial reconciliation errors.

Rule: Never use write-behind for financial data. Use write-through for critical writes, write-around for read-heavy workloads.

🎯 Key Takeaway

Write-through: safe but slower writes.

Write-behind: fast but unsafe — use only for non-critical, loss-tolerant data.

Cache aside: most flexible pattern — app controls both cache and DB.

Cache Invalidation: The Hardest Problem in Computer Science

There are only two hard things in computer science: cache invalidation and naming things. Invalidation ensures that stale data is removed or updated when the underlying source changes. Common approaches:

TTL-based: Set an expiry time. Simple but can serve stale data within the TTL window.
Event-based invalidation: Publish a cache invalidation event when the DB is updated (e.g., Redis Pub/Sub, Kafka). Near real-time consistency.
Write-through (already covered): Synchronously update cache on write — strong consistency but higher write latency.
Read-repair: In distributed caches, if a stale value is read, the reader triggers an update. Used in Dynamo-style systems.

In practice, most systems use a combination: TTL as a safety net, event-based for critical data.
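
A minimal sketch of that combination: entries expire on a TTL no matter what, and a change event removes them early. The `InvalidatingCache` class and the `on_change_event` handler are hypothetical; in a real system the handler would be wired to whatever delivers change events.

```python
import time

class InvalidatingCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl, self.clock = ttl_seconds, clock
        self._data = {}                       # key -> (value, expires_at)

    def put(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:        # TTL safety net
            del self._data[key]
            return None
        return value

    def on_change_event(self, key):
        # Called when the source of truth reports a change (hypothetical hook)
        self._data.pop(key, None)
```

If an event is lost, the worst case is that a reader sees the old value until the TTL runs out, which is why the TTL should be sized to the staleness you can tolerate.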

⚠ TTL Alone Is Not Enough

If your data must be strongly consistent (e.g., user balances), TTL-based invalidation alone will serve stale data. Pair it with event-driven invalidation or use write-through.

📊 Production Insight

Event-based invalidation sounds great until you have a network partition. If the invalidation event is lost, the cache stays stale until TTL expiry.

Real-world failure: A payments service used Redis Pub/Sub for invalidation. When the Redis connection dropped, no events reached the cache, and users saw old balances for up to 5 minutes — the TTL. They added a dead-letter queue and periodic full cache refresh.

Rule: Always combine event-based invalidation with a reasonably short TTL (a few minutes) as a safety net.

🎯 Key Takeaway

TTL is simple but eventually consistent.

Event-based invalidation gives near real-time consistency.

Combine both: TTL as safety net, events for immediate updates.

Distributed Cache Topologies: Replication vs Sharding

When a single node can't handle traffic or memory, you need multiple nodes. Two primary patterns:

Replication: Data is copied to multiple nodes (e.g., Redis Sentinel). High read throughput, but write amplification. Writes go to primary, then async to replicas — eventual consistency.

Sharding: Data is partitioned across nodes (e.g., Redis Cluster, Memcached with client-side hashing). Each node owns a subset of keys. Requires consistent hashing to minimize rehashing.

Trade-off: Replication is simpler but wastes memory (each node stores all data). Sharding is memory-efficient but adds complexity for cross-node operations (e.g., multi-key commands fail if keys are on different nodes).

In interviews, start with replication for read-heavy workloads, then scale to sharding when memory of a single node becomes the bottleneck.

📊 Production Insight

Production gotcha: Redis Cluster maps every key to one of 16,384 hash slots, and multi-key operations fail with a CROSSSLOT error unless all keys hash to the same slot, which you can force with a shared hash tag. This broke our paginated cache scans until we added hash tags for related keys.
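
The hash-tag behaviour is easier to reason about with the slot calculation in front of you. The sketch below follows the rule from the Redis Cluster specification (CRC16 of the key, or of the text between the first `{` and the next `}` when that text is non-empty, modulo 16384). It assumes Python's `binascii.crc_hqx` with an initial value of 0 matches the CRC16 variant Redis uses; `key_hash_slot` is a hypothetical helper, not a client API.

```python
import binascii

def key_hash_slot(key: str) -> int:
    """Slot for a key: CRC16 of the key (or its hash tag) modulo 16384."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:          # non-empty tag between the first { and the next }
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384

# Keys sharing the tag {user:1} are hashed on "user:1" only, so they share a slot
same_slot = key_hash_slot("{user:1}:profile") == key_hash_slot("{user:1}:orders")
```

The trade-off is that every key with the same tag lives on one node, so an over-broad tag recreates the hot-node problem that sharding was meant to solve.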

Real-world failure: A team used simple MOD sharding (key % N) and added a node during scaling — 90% of keys moved, causing a massive cache miss storm. They migrated to consistent hashing.

Rule: Always use consistent hashing for sharding. Never use key % N.

🎯 Key Takeaway

Replication: simple, good for reads, wastes memory.

Sharding: memory-efficient but complex — use consistent hashing.

In an interview: start with replication, then scale to sharding.

Replication vs Sharding Decision

If: Read-heavy workload, data fits in one node's memory → Use: Replication (primary-replica). Simpler, with good read throughput.

If: Data volume exceeds single-node memory (e.g., >64GB) → Use: Sharding with consistent hashing, which spreads data across nodes.

If: Need high write throughput with strong consistency → Use: Sharding with single-key writes. Replication limits write throughput to the primary.

Cache Failure Modes: What Breaks Your Cache in Production

Three classic failure modes every senior engineer must explain:

Cache Stampede (Thundering Herd): A hot key expires, thousands of requests try to repopulate it simultaneously, overloading the database. Fix: Use a mutex/lease to allow only one thread to rebuild while others wait, or use probabilistic early expiration, where each request has a small, growing chance of refreshing the value shortly before it expires (the XFetch algorithm is one well-known formulation).

Cache Penetration: Requests for non-existent keys (e.g., user IDs that don't exist) bypass cache and hit the DB every time. If these are common (e.g., DDoS attacks), the DB gets hammered. Fix: Cache 'null' results with a short TTL (e.g., 1 minute) or use a Bloom filter upfront.

Cache Avalanche: A large number of keys expire at the same time (e.g., all users cached at midnight with same TTL). DB gets a sudden load spike. Fix: Add random jitter to TTLs so they expire at different times. Or use a secondary local cache to absorb the initial misses.
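
Of the three fixes, caching 'null' results is the least obvious to implement, because a plain `None` cannot distinguish "not cached" from "cached as missing". The sketch below uses a sentinel for that; `NegativeCache`, its `loader` callback, and the `db_calls` counter are hypothetical names for illustration.

```python
import time

_MISSING = object()   # sentinel meaning "we checked, the row does not exist"

class NegativeCache:
    def __init__(self, loader, ttl=3600.0, null_ttl=60.0, clock=time.monotonic):
        self.loader, self.ttl, self.null_ttl, self.clock = loader, ttl, null_ttl, clock
        self._data = {}       # key -> (value or _MISSING, expires_at)
        self.db_calls = 0     # counts trips to the source of truth

    def get(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] > self.clock():
            return None if entry[0] is _MISSING else entry[0]
        self.db_calls += 1
        value = self.loader(key)              # returns None when the row is absent
        if value is None:
            # Remember the miss briefly so repeated lookups stay off the database
            self._data[key] = (_MISSING, self.clock() + self.null_ttl)
        else:
            self._data[key] = (value, self.clock() + self.ttl)
        return value
```

The short `null_ttl` matters: if the row is created later, readers see it within a minute instead of waiting out the full TTL.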

📊 Production Insight

A real-world e-commerce company faced a cache avalanche every hour on the hour because their homepage banners had a fixed 1-hour TTL. At :00, all banners expired, causing 5,000 DB queries at once. They added a random ±10% jitter to TTLs, and the spike flattened.

Rule: Always add random jitter to TTLs, especially for keys that are generated in batches (e.g., cached query results from scheduled jobs).
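
Jitter is a one-line change. A minimal sketch, assuming a uniform spread of ±10% around the base TTL; `jittered_ttl` is a hypothetical helper name.

```python
import random

def jittered_ttl(base_seconds: float, jitter: float = 0.10, rng=random) -> float:
    # Spread expiry uniformly over [base * (1 - jitter), base * (1 + jitter)]
    return base_seconds * (1.0 + rng.uniform(-jitter, jitter))

# A batch of keys cached together now expires over a 12-minute window
# (54 to 66 minutes) instead of all at the same instant
ttls = [jittered_ttl(3600) for _ in range(1000)]
```

Pass the result (rounded to whole seconds if your client requires it) wherever you currently pass the fixed TTL.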

🎯 Key Takeaway

Cache stampede: mutex or lease required.

Cache penetration: Bloom filter saves your DB.

Cache avalanche: TTL jitter is your friend.

Know all three — interviewers love this.

Diagnose Your Cache Failure

If: DB spikes after a specific key expires → Cause: Cache stampede. Implement a mutex or early re-computation.

If: High DB load for keys that never exist → Cause: Cache penetration. Use a Bloom filter or cache nulls.

If: DB spikes at a regular interval (e.g., every hour) → Cause: Cache avalanche. Add jitter to TTLs or stagger expiry.

Cache Eviction: Why Your "Hot" Data Stays Cold

You've got a cache. It's full. Something has to go. The eviction policy you choose decides which data survives and which gets nuked. This isn't academic — it's the difference between a snappy API and a thundering herd of cache misses that crush your database.

LRU (Least Recently Used) is the default for most production caches. It kicks out the item nobody touched longest. Works great for workloads with temporal locality — think news feeds or session data. But watch out: one-off batch jobs scanning your cache can poison LRU by touching old entries, making them look "recent" while your real hot data gets evicted.

LFU (Least Frequently Used) tracks access frequency instead of recency. Better for content that's popular consistently but accessed infrequently — like a config file read once per hour. The downside? LFU burns memory tracking counters for every key. Implement it with a min-heap or approximate counts (count-min sketch) to keep overhead sane.

Don't overthink this. Start with LRU. Set a TTL. Monitor your miss ratio. If you see hot keys cycling out, switch to LFU or add a small second-level cache. Your cache isn't smart — you have to feed it the right strategy.

LRUCache.java · JAVA

// io.thecodeforge
import java.util.*;

public class LRUCache<K, V> {
    private final int capacity;
    private final LinkedHashMap<K, V> map;

    public LRUCache(int capacity) {
        this.capacity = capacity;
        // accessOrder=true: rearranges on get()
        this.map = new LinkedHashMap<K, V>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LRUCache.this.capacity;
            }
        };
    }

    public V get(K key) {
        // get() returns null on a miss and moves the entry to the most-recent position
        return map.get(key);
    }

    public void put(K key, V value) {
        map.put(key, value);
    }
}

Output

// LRU auto-evicts the least recently accessed key

// when capacity is exceeded.

// No manual eviction logic needed.

⚠ Production Trap:

LRU with accessOrder=true on LinkedHashMap is NOT thread-safe. Wrap with Collections.synchronizedMap or use Caffeine cache. That batch job you wrote? It just evicted your user sessions.

🎯 Key Takeaway

LRU for temporal workloads, LFU for stable hot keys. Always monitor miss ratio before tuning.

Types of Cache: Where You Park Your Hot Data Matters

Every cache lives somewhere. Pick the wrong spot and you're adding latency instead of removing it. Here's the real breakdown.

Application Server Cache lives in-process — right inside your Java heap or Python dict. Fastest possible read (microseconds) because there's no network hop. Problem? You scale horizontally and now each node has its own cache. A request hits node A, caches a user profile. Same user hits node B — cache miss. You burn database queries warming up every new server.

Distributed Cache like Redis or Memcached sits outside your app servers. All nodes share one logical cache. Slower reads (milliseconds) but consistent hit rates. The trade-off: network calls add latency, and your cache becomes a single point of failure unless you shard or replicate. Every production system I've inherited had a Redis cluster that went down at 3 AM — plan for it.

CDN / Edge Cache is for static or semi-static content — images, CSS, API responses with long TTLs. Your users are global but your database is in us-east-1. Edge nodes cache at 200+ locations worldwide. User in Tokyo gets the response from Tokyo edge node — no cross-Pacific round trip. Use this for anything that doesn't change every second.

Global Cache is a hybrid — one logical cache but physically distributed. Usually a write-through to a central node with read replicas close to users. Complex to set up, but necessary for systems like Google or Netflix where consistency and speed both matter.

Rule of thumb: start with application cache for local hot data. Add distributed cache when you need consistency across servers. Add CDN when your users are global and your data is mostly read.

CacheTopologyConfig.java · JAVA

// io.thecodeforge
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
import com.github.benmanes.caffeine.cache.Caffeine;

public class CacheTopologyConfig {
    // L1: application cache (Caffeine) — microsecond reads
    public com.github.benmanes.caffeine.cache.Cache<String, String> buildL1() {
        return Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(5, java.util.concurrent.TimeUnit.MINUTES)
                .build();
    }

    // L2: distributed cache (Redis) — consistent across nodes
    public JedisPool buildL2() {
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        poolConfig.setMaxTotal(50);
        return new JedisPool(poolConfig, "redis-cluster.prod.internal", 6379);
    }
}

Output

// L1 (Caffeine) handles hot keys locally.

// L2 (Redis) catches misses from L1.

// Prevents thundering herd on single cache node.
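
The read path that ties L1 and L2 together is worth spelling out. The sketch below models both tiers as dicts and omits TTLs and eviction for brevity; `TwoTierCache` and its `loader` callback are hypothetical names, not part of Caffeine or Jedis.

```python
class TwoTierCache:
    """L1 is per-process, L2 is shared; both are dicts here for illustration."""

    def __init__(self, l2_store, loader):
        self.l1, self.l2, self.loader = {}, l2_store, loader

    def get(self, key):
        if key in self.l1:                 # fastest path: no network hop
            return self.l1[key]
        value = self.l2.get(key)           # shared cache: one network round trip
        if value is None:
            value = self.loader(key)       # fall back to the source of truth
            self.l2[key] = value
        self.l1[key] = value               # promote so the next read stays local
        return value
```

Two application nodes sharing one L2 only hit the database once per key: the second node misses in its own L1, finds the value in L2, and promotes it. The price is that each L1 can hold a stale copy until its own expiry, so keep L1 TTLs short.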

⚠ Production Trap:

Application cache per node means cold start after deploy. Pre-warm cache by replaying recent queries or seeding from Redis. Or watch your database get hammered for 10 minutes post-deploy.

🎯 Key Takeaway

In-process cache for speed, distributed cache for consistency, CDN for global reach. Layering them is better than relying on any one alone.

Distributed Caching: Redis Cluster vs Memcached vs Hazelcast

Choosing the right distributed cache is critical for performance and scalability. Redis Cluster, Memcached, and Hazelcast are three popular options, each with distinct trade-offs.

Redis Cluster is an in-memory data structure store supporting advanced data types (strings, hashes, lists, sets, sorted sets, bitmaps, HyperLogLog, streams). It offers built-in replication, automatic sharding via hash slots, and automatic failover handled by the cluster itself (Redis Sentinel plays that role only for non-clustered deployments). Redis Cluster is ideal for use cases requiring complex data operations, pub/sub messaging, or persistence. However, it has a higher memory footprint and operational complexity compared to Memcached.

Memcached is a simple, high-performance distributed memory object caching system. It supports only string values and has a minimal feature set (no persistence, no replication, no clustering). Memcached excels at caching simple key-value pairs with extremely low latency and high throughput. The servers do not know about each other; clients typically distribute keys with consistent hashing, which makes it easy to scale horizontally. However, it lacks advanced data structures and durability, making it suitable only for transient caching.

Hazelcast is an in-memory data grid (IMDG) that provides distributed data structures (maps, queues, topics) with strong consistency and high availability. It supports partitioning, replication, and near-cache for low-latency access. Hazelcast integrates tightly with Java applications and offers features like distributed computing (executor service), entry processors, and event listeners. It is ideal for applications requiring a distributed data store with compute capabilities. However, it has a steeper learning curve and is primarily Java-centric.

Example scenario: For a high-traffic e-commerce site, Redis Cluster can cache product details with complex queries, Memcached can cache session data, and Hazelcast can manage distributed counters for inventory.

Decision factors: Consider data complexity (Redis for complex, Memcached for simple), persistence needs (Redis/Hazelcast), latency requirements (Memcached lowest), and ecosystem fit (Hazelcast for Java).

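redis_vs_memcached_vs_hazelcast.py · PYTHON

# Example: connecting to each cache (requires the redis, pymemcache, and
# hazelcast-python-client packages plus locally running servers)
import hazelcast
from pymemcache.client.base import Client as MemcacheClient
from redis.cluster import RedisCluster

# Redis Cluster: the client discovers the other nodes from this seed node
rc = RedisCluster(host='localhost', port=7000)
rc.set('key', 'value')
print(rc.get('key'))

# Memcached
mc = MemcacheClient(('localhost', 11211))
mc.set('key', 'value')
print(mc.get('key'))

# Hazelcast: avoid naming the variable `map`, which shadows the built-in
client = hazelcast.HazelcastClient()
hz_map = client.get_map('my-distributed-map').blocking()
hz_map.put('key', 'value')
print(hz_map.get('key'))
client.shutdown()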

🔥Choosing the Right Cache

📊 Production Insight

In production, monitor memory usage and eviction rates. Redis Cluster requires careful slot management, Memcached is easy to scale horizontally, and Hazelcast benefits from JVM tuning.

🎯 Key Takeaway

Redis Cluster offers rich data types and persistence, Memcached provides simplicity and low latency, while Hazelcast excels in Java-centric distributed computing.

Cache Eviction Policies: LFU, LRU, FIFO, ARC Comparison

Cache eviction policies determine which entries to remove when the cache reaches capacity. The choice impacts hit rate and performance.

LRU (Least Recently Used) evicts the entry that has not been accessed for the longest time. It assumes that recently accessed data is likely to be accessed again. LRU is simple to implement and works well for workloads with temporal locality. However, it can be vulnerable to scan attacks where a one-time scan evicts hot data.

LFU (Least Frequently Used) evicts the entry with the lowest access frequency. It retains frequently accessed data, making it suitable for workloads with skewed access patterns. However, LFU can suffer from cache pollution by old data that was once popular but is no longer relevant. It also has higher overhead due to maintaining frequency counters.

FIFO (First In, First Out) evicts the oldest entry regardless of access patterns. It is simple and fair but performs poorly for workloads with temporal locality, as it may evict frequently accessed data that has been in the cache for a long time.

ARC (Adaptive Replacement Cache) dynamically balances between LRU and LFU by maintaining two lists: recent and frequent. It adapts to changing workloads, providing better hit rates than LRU or LFU alone. ARC is more complex but offers robust performance across diverse access patterns.

Example: For a social media feed, LRU works well for recent posts, LFU for viral content, FIFO for session data, and ARC for mixed workloads.

Implementation considerations: LRU can be implemented with a doubly linked list and hash map (O(1) operations). LFU requires a min-heap or frequency list. ARC uses two LRU lists (recent and frequent) plus two ghost lists that remember recently evicted keys.

Trade-offs: LRU is simple but not scan-resistant; LFU retains hot data but can hold on to stale entries; FIFO is fair but inefficient; ARC adapts but is complex.
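
For comparison with LRU, here is a deliberately simple LFU sketch. It finds the victim with an O(n) scan rather than the heap or frequency list mentioned above, so treat it as an illustration of the policy, not an efficient implementation; the `LFUCache` class is hypothetical.

```python
from collections import Counter

class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values = {}
        self.freq = Counter()        # key -> number of accesses

    def get(self, key):
        if key not in self.values:
            return None
        self.freq[key] += 1
        return self.values[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key not in self.values and len(self.values) >= self.capacity:
            # O(n) scan for the least-used key; real implementations keep
            # frequency buckets or a heap to avoid this
            victim = min(self.values, key=lambda k: self.freq[k])
            del self.values[victim]
            del self.freq[victim]
        self.values[key] = value
        self.freq[key] += 1
```

Because counts never decay here, a key that was popular last week can outlive today's hot data, which is the pollution problem described above. Production LFU variants age the counters to counter that.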

lru_cache.py · PYTHON

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

💡Choosing an Eviction Policy

📊 Production Insight

In production, monitor cache hit rates and adjust eviction policy based on access patterns. Use tools like Redis's maxmemory-policy to configure LRU, LFU, or other policies.

🎯 Key Takeaway

LRU is simple and effective for temporal locality, LFU retains frequently accessed data, FIFO is easy but inefficient, and ARC adapts to changing workloads for better hit rates.

Cache-aside vs Read-through vs Write-through: Decision Guide

Caching strategies define how data flows between the application, cache, and database. The three main patterns are cache-aside, read-through, and write-through.

Cache-aside (Lazy Loading): The application is responsible for loading data into the cache. On a cache miss, the application fetches data from the database, stores it in the cache, and returns it. On updates, the application invalidates the cache entry (or updates it). This pattern gives the application full control over caching logic. It is simple and works well for read-heavy workloads with infrequent updates. However, it can lead to stale data if invalidation is missed, and it adds latency on cache misses.

Read-through: The cache itself loads data from the database on a miss, so the application only talks to the cache. This simplifies application code and keeps loading logic in one place, although freshness still depends on how the cache handles invalidation and expiry. It requires a cache layer with a loader hook (e.g., Hazelcast's MapLoader, Caffeine's LoadingCache, or a custom wrapper); plain Redis and Memcached do not load from the database on their own. It can also cause cache stampedes if many requests miss the same key simultaneously and the loader does not coalesce them.

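Write-through: Every write goes to the cache and the database synchronously and is acknowledged only after both succeed. This keeps the cache in step with the database, so reads do not see stale entries for keys written this way. However, it adds write latency and can reduce throughput. It suits workloads where reads must see the latest write and write volume is manageable; for write-heavy workloads the added latency is usually the limiting factor.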
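The read-through and write-through flows described above can be sketched in a few lines. This is a minimal in-process illustration, not a production cache: the class names, the `loader` and `writer` hooks, and the dict standing in for the database are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Cache that loads missing keys itself via a caller-supplied loader."""

    def __init__(self, loader: Callable[[str], Optional[str]]):
        self._loader = loader          # hypothetical hook standing in for a DB query
        self._store: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key not in self._store:
            value = self._loader(key)  # the cache, not the caller, talks to the DB
            if value is None:
                return None            # unknown key: nothing is cached
            self._store[key] = value
        return self._store[key]


class WriteThroughCache(ReadThroughCache):
    """Adds synchronous writes: the database is updated before the cache."""

    def __init__(self, loader: Callable[[str], Optional[str]],
                 writer: Callable[[str, str], None]):
        super().__init__(loader)
        self._writer = writer          # hypothetical hook standing in for a DB write

    def put(self, key: str, value: str) -> None:
        self._writer(key, value)       # if this raises, the cache is left untouched
        self._store[key] = value


# Usage with a plain dict standing in for the database.
database = {"user:1": "Ada"}
cache = WriteThroughCache(loader=database.get, writer=database.__setitem__)

first = cache.get("user:1")    # miss: loaded from `database`
cache.put("user:1", "Grace")   # written to `database`, then cached
second = cache.get("user:1")   # hit: served from the cache
```

Writing to the database first means a failed write never leaves a value in the cache that the database does not have, which is the property write-through is chosen for.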

Decision guide

Use cache-aside when you need fine-grained control over caching logic, or when using a simple cache like Memcached.
Use read-through when you want to simplify application code and have a cache layer that supports it (e.g., a data grid with a loader hook such as Hazelcast, or a library like Spring Cache).
Use write-through when consistency is critical and write volume is manageable.

Example: For a product catalog with frequent reads and occasional updates, cache-aside with lazy loading is common. For a session store where consistency is less critical, read-through may suffice. For a financial system, write-through ensures no stale data.

Combined patterns: You can mix patterns, e.g., read-through for reads and write-behind for writes (asynchronous updates).

cache_aside.py · PYTHON

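import json

import psycopg2
import redis

cache = redis.Redis()
db = psycopg2.connect("dbname=test")

USER_TTL_SECONDS = 3600

def get_user(user_id):
    # Cache-aside read: try the cache first, fall back to the database on a miss.
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    with db.cursor() as cursor:
        cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
        row = cursor.fetchone()
    if row is None:
        return None  # unknown user: nothing to cache
    # Redis stores strings/bytes, not tuples, so serialise the row as JSON.
    # default=str covers column types JSON cannot encode (dates, decimals).
    payload = json.dumps(row, default=str)
    cache.setex(key, USER_TTL_SECONDS, payload)
    return json.loads(payload)  # same shape on a hit and on a miss

def update_user(user_id, data):
    # Update the database first. The SET clause is elided here;
    # build it from `data` for your schema.
    with db.cursor() as cursor:
        cursor.execute("UPDATE users SET ... WHERE id = %s", (user_id,))
    db.commit()
    # Invalidate the cache so the next read reloads fresh data.
    cache.delete(f"user:{user_id}")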

⚠ Consistency Considerations

📊 Production Insight

In production, monitor cache hit rates and write latency. For high write volumes, consider write-behind (asynchronous) patterns to reduce load on the database.

🎯 Key Takeaway

Cache-aside offers control and simplicity, read-through reduces application complexity, and write-through ensures strong consistency at the cost of write latency.

● Production incident · POST-MORTEM · severity: high

Black Friday Cache Stampede: How a Single Expired Key Took Down the Checkout DB

Symptom

All checkout requests time out. Database CPU at 100%. Cache hit rate drops from 95% to 5%. Alerts show a sudden surge in DB connections.

Assumption

Engineers assumed the cache would always be populated and that the DB could handle the fallback load. No rate limiting on cache miss repopulation.

Root cause

The inventory cache key had a fixed 5-minute TTL with no background refresh. When it expired during a flash sale, 20,000 concurrent threads all tried to rebuild it from the DB simultaneously.

Fix

Implemented a mutex/lease mechanism so that only one thread rebuilds the cache entry while the others wait for the new value. Also added a background job that pre-refreshes high-traffic keys before their TTL expires.
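The mutex part of the fix can be sketched as follows. This is an in-process illustration using `threading.Lock`, under the assumption that all callers share one process; across several application servers the same idea is usually built on a distributed lock (for example a Redis `SET` with the `NX` option and an expiry). The class name, the `slow_inventory_query` function, and the key are hypothetical, and TTL handling is left out to keep the locking visible.

```python
import threading
import time
from typing import Callable, Dict


class SingleFlightCache:
    """In-process cache where only one thread rebuilds a missing key."""

    def __init__(self, loader: Callable[[str], str]):
        self._loader = loader
        self._values: Dict[str, str] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()   # guards _key_locks

    def _lock_for(self, key: str) -> threading.Lock:
        with self._registry_lock:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        value = self._values.get(key)
        if value is not None:
            return value                      # fast path: no locking on a hit
        with self._lock_for(key):             # one rebuilder per key
            value = self._values.get(key)     # re-check: another thread may have filled it
            if value is None:
                value = self._loader(key)
                self._values[key] = value
            return value


db_calls = 0
db_calls_lock = threading.Lock()


def slow_inventory_query(key: str) -> str:
    """Hypothetical stand-in for the expensive database rebuild."""
    global db_calls
    with db_calls_lock:
        db_calls += 1
    time.sleep(0.05)
    return f"stock-for-{key}"


cache = SingleFlightCache(slow_inventory_query)
threads = [threading.Thread(target=cache.get, args=("inventory:42",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_calls)  # 1: the other 49 threads waited for the first rebuild
```

The second lookup inside the lock is what prevents the stampede: threads that queued behind the first rebuilder find the value already present and never reach the database.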

Key lesson

Always protect cache-miss paths with a mutex or lease to prevent stampedes.
Hot keys should use a shorter TTL with background refresh, not a single long TTL.
Monitor cache hit rate as a leading indicator of DB load.
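Background refresh, the second lesson above, can be sketched like this. It is a simplified single-threaded model under stated assumptions: `RefreshAheadCache`, `refresh_due`, and the injected `clock` are hypothetical names, and in a real system `refresh_due` would be called by a scheduler or worker rather than inline.

```python
import time
from typing import Callable, Dict, Tuple


class RefreshAheadCache:
    """Reloads entries shortly before their TTL runs out."""

    def __init__(self, loader: Callable[[str], str], ttl: float,
                 refresh_margin: float, clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._margin = refresh_margin
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now >= entry[1]:
            return self._reload(key, now)   # true miss or expired entry
        return entry[0]

    def refresh_due(self) -> int:
        """Meant for a background job: reload entries expiring within the margin."""
        now = self._clock()
        due = [k for k, (_, exp) in self._entries.items() if exp - now <= self._margin]
        for key in due:
            self._reload(key, now)
        return len(due)

    def _reload(self, key: str, now: float) -> str:
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value


# Usage with a fake clock so the timeline is explicit.
fake_now = [0.0]
loads = []


def load(key: str) -> str:
    loads.append(key)
    return f"v{len(loads)}"


cache = RefreshAheadCache(load, ttl=300, refresh_margin=30, clock=lambda: fake_now[0])
cache.get("inventory:42")          # initial load at t=0, expires at t=300
fake_now[0] = 280.0                # 20 s before expiry, inside the 30 s margin
refreshed = cache.refresh_due()    # background job reloads the key; new expiry t=580
fake_now[0] = 310.0                # past the original expiry
value = cache.get("inventory:42")  # still a hit: no reload on the request path
```

Because the reload happens in the background job, request threads never observe the expiry, so there is no moment at which thousands of them miss together.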

Production debug guide

When your cache behaves like a liar instead of a lifesaver, use these symptom→action pairs to find the root cause fast.

Symptom · 01

Cache hit rate drops dramatically from 95% to 40%

→

Fix

Check redis-cli INFO stats for keyspace_hits and keyspace_misses. Look for recent deployment that changed cache keys or TTL. Verify consistent hash ring didn't redistribute keys.

Symptom · 02

Memory usage hits limit, evictions spike

→

Fix

Check evicted_keys in redis-cli INFO stats. Run MEMORY STATS for an overall breakdown, and redis-cli --bigkeys or MEMORY USAGE <key> to find the largest keys. Consider increasing maxmemory or switching eviction policy (allkeys-lru vs volatile-lru).

Symptom · 03

Stale data served even though TTL is set

→

Fix

Verify cache invalidation on writes: are you using write-through or cache-aside, and does every write path invalidate or update the entry? Check whether the TTL is longer than expected due to a configuration override. Inspect the remaining TTL with the TTL <key> command.

Symptom · 04

Database still overloaded despite high cache hit rate

→

Fix

Check for cache penetration: requests for non-existent keys always miss the cache and fall through to the database. Implement a Bloom filter or cache negative results with a short TTL, so that repeated lookups for the same missing key stop reaching the database.
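Negative caching, one of the two mitigations named above, can be sketched as follows. This is an in-process illustration with assumed names (`NegativeCachingStore`, `lookup`, the `sku:` keys) and arbitrary TTL values; it treats a loader result of `None` as "the row does not exist".

```python
import time
from typing import Callable, Dict, Optional, Tuple

_MISSING = object()   # sentinel meaning "the database has no such row"


class NegativeCachingStore:
    """Caches both found values and confirmed absences."""

    def __init__(self, loader: Callable[[str], Optional[str]], ttl: float = 300.0,
                 negative_ttl: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._negative_ttl = negative_ttl   # short, so newly created rows show up soon
        self._clock = clock
        self._entries: Dict[str, Tuple[object, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return None if entry[0] is _MISSING else entry[0]
        value = self._loader(key)
        if value is None:
            self._entries[key] = (_MISSING, now + self._negative_ttl)
        else:
            self._entries[key] = (value, now + self._ttl)
        return value


# Usage with a dict standing in for the database.
database = {"sku:1": "widget"}
db_lookups = []


def lookup(key: str) -> Optional[str]:
    db_lookups.append(key)
    return database.get(key)


store = NegativeCachingStore(lookup)
results = [store.get("sku:999") for _ in range(3)]  # key does not exist
found = store.get("sku:1")
```

Only the first of the three lookups for `sku:999` reaches the database; the other two are answered from the cached absence until the short negative TTL runs out.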

★ Quick Cache Debugging Cheat Sheet

Commands to diagnose common caching issues in production. Redis-focused, but the patterns apply to any cache system.

High miss rate

Immediate action

Check cache hit ratio immediately

Commands

redis-cli -h <host> INFO stats | grep -E 'keyspace_(hits|misses)'

redis-cli --bigkeys — find keys consuming most space

Fix now

If miss rate >20%, investigate key naming changes or TTL too short. Consider pre-warming cache after deploy.
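To turn the two counters from the command above into a hit ratio, a small helper is enough. The function name and the sample numbers are made up for illustration; the parsing only assumes the `name:value` line format that `INFO` prints.

```python
def hit_ratio_from_info(info_text: str) -> float:
    """Compute hits / (hits + misses) from the text of `redis-cli INFO stats`."""
    stats = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue                      # skip blank lines and section headers
        name, _, value = line.partition(":")
        stats[name] = value
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0


# Illustrative sample, not real production output.
sample = """# Stats
keyspace_hits:9500
keyspace_misses:500
evicted_keys:12
"""
ratio = hit_ratio_from_info(sample)
miss_rate = 1.0 - ratio
```

Both counters are cumulative since the server started (or since the last `CONFIG RESETSTAT`), so comparing two samples taken a minute apart gives a more useful picture of the current miss rate than a single reading.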

Policy	Mechanism	Ideal Use Case
LRU (Least Recently Used)	Discards items not used for the longest time	General purpose web apps, user sessions
LFU (Least Frequently Used)	Discards items with the lowest hit count	Assets with stable popularity (e.g., static logos)
FIFO (First In First Out)	Discards the oldest added item regardless of use	Short-lived data with predictable lifespans
TTL (Time To Live)	Expiration based on absolute time	News feeds, price data, temporary tokens

Key takeaways

Caching is a trade-off

You gain microsecond latency at the cost of data consistency and infrastructure complexity.

Eviction is mandatory

Always choose an eviction policy (LRU is the standard) and set memory limits to prevent system crashes.

Know your failure modes

Cache Avalanche, Cache Penetration, and Cache Stampede separate Senior from Mid-level in interviews.

Distributed scaling requires Consistent Hashing to maintain high hit rates during cluster resizing.

Production rule

Always combine TTL with event-based invalidation for data that must be fresh.

Profile before caching

Not every query benefits from caching: small or rarely read datasets can be slower when fetched from an external cache than when read straight from the database.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 of 06 · SENIOR

Design a Least Frequently Used (LFU) cache with O(1) get and put operations. (Requires a nested doubly linked list or a frequency map + linked list.)

ANSWER

Use a HashMap for key->value+frequency, and a HashMap of frequency->LinkedHashSet. On access, increase frequency and move node. On eviction, remove from the lowest non-empty frequency set. Keep a min frequency variable to track eviction candidate. Both operations are O(1) average.
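The answer above can be sketched in Python as follows. This is one possible layout, not the only one: an `OrderedDict` per frequency plays the role of the LinkedHashSet, ties within a frequency are broken by least recent use, and `-1` is returned on a miss to match the LRU example earlier in this guide.

```python
from collections import OrderedDict, defaultdict


class LFUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.values = {}                          # key -> value
        self.freq_of = {}                         # key -> access count
        self.buckets = defaultdict(OrderedDict)   # count -> keys, oldest first
        self.min_freq = 0                         # lowest count currently present

    def _touch(self, key) -> None:
        """Move key from its current frequency bucket to the next one."""
        freq = self.freq_of[key]
        del self.buckets[freq][key]
        if not self.buckets[freq]:
            del self.buckets[freq]
            if self.min_freq == freq:
                self.min_freq = freq + 1          # key itself now lives at freq + 1
        self.freq_of[key] = freq + 1
        self.buckets[freq + 1][key] = None

    def get(self, key):
        if key not in self.values:
            return -1
        self._touch(key)
        return self.values[key]

    def put(self, key, value) -> None:
        if self.capacity <= 0:
            return
        if key in self.values:
            self.values[key] = value
            self._touch(key)
            return
        if len(self.values) >= self.capacity:
            # Evict the least recently used key among the least frequently used.
            victim, _ = self.buckets[self.min_freq].popitem(last=False)
            if not self.buckets[self.min_freq]:
                del self.buckets[self.min_freq]
            del self.values[victim]
            del self.freq_of[victim]
        self.values[key] = value
        self.freq_of[key] = 1
        self.buckets[1][key] = None
        self.min_freq = 1                         # a new key always has count 1


lfu = LFUCache(2)
lfu.put(1, 1)
lfu.put(2, 2)
a = lfu.get(1)   # 1; key 1 now has count 2
lfu.put(3, 3)    # evicts key 2 (count 1)
b = lfu.get(2)   # -1
c = lfu.get(3)   # 3; key 3 now has count 2
lfu.put(4, 4)    # keys 1 and 3 tie at count 2; key 1 is older, so it is evicted
d = lfu.get(1)   # -1
e = lfu.get(4)   # 4
```

Every step is a dictionary lookup or an `OrderedDict` insert/delete, which is where the O(1) average cost comes from; `min_freq` removes the need to search for the eviction bucket.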

Last updated: July 18, 2026
