Senior 3 min · June 25, 2026

Multi-Level Caching: Stop Thundering Herds and Slash Latency in Production

Multi-level caching reduces latency and prevents cache stampedes.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Lessons pulled from things that broke in production.

Last updated: June 25, 2026
Quick Answer

Multi-level caching places a small, fast cache (like an in-memory map) in front of a larger, slower cache (like Redis). This reduces load on the slow cache and database, and prevents thundering herds when a popular key expires.

Definition (~90s read)
What is Multi-Level Caching?

Multi-level caching uses two or more cache layers (e.g., L1 in-memory, L2 Redis) to reduce latency and protect backend systems. The L1 cache is fast but small; the L2 cache is larger but slower. Together they balance speed and capacity.

Plain-English First

Think of a busy coffee shop. The barista (L1 cache) keeps the most popular drinks ready in a small rack. If a customer orders something not on the rack, they check the fridge (L2 cache) — bigger but slower. If it's not there either, they brew fresh (database). This way, most orders are served instantly, and the fridge isn't slammed every time someone wants a latte.

You've got a Redis cluster that handles 100k ops/s. Then a viral tweet hits. Your cache miss rate spikes. Redis CPU goes to 100%. Your database connection pool is exhausted. The site goes down. The classic rookie mistake? A single-level cache. One layer to rule them all — and fail them all. Multi-level caching is the pattern that stops this madness. It's not just about speed; it's about survival under load. After this article, you'll be able to design a multi-level cache that handles traffic spikes without melting your backend, and you'll know exactly which consistency model to use so you don't serve stale data to paying customers.

Why One Cache Isn't Enough: The Thundering Herd Problem

A single cache layer (say Redis) works fine until a popular key expires and thousands of requests miss at the same moment. They all hit the database, which falls over. This is the thundering herd. Multi-level caching helps by putting a small, very fast L1 cache (in-process memory) in front, so most reads for a hot key never leave the process. The L1 cache has a short TTL (seconds). L1 alone does not collapse concurrent misses, though: when an entry expires, every thread that misses falls through unless a mutex (or single-flight load) lets one request per instance go to L2 or the DB while the others block briefly (or get a stale value). Without this, your database dies. I've seen it happen to a payment service at 3am: the cache TTL was 1 hour, and when it expired, 500 concurrent requests hit the DB. The fix: add an L1 cache with a 5-second TTL and use a mutex to prevent concurrent rebuilds.

L1CacheWithMutex.java
// io.thecodeforge — System Design tutorial

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class L1CacheWithMutex {
    // L1 cache: fast, small, short TTL
    private final Cache<String, String> l1 = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(5, TimeUnit.SECONDS)
            .build();

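    // L2 client (e.g., Redis). RedisClient is a placeholder type for this tutorial.
    private final RedisClient l2 = new RedisClient();

    // One global mutex shared by every key: it stops the herd, but it also
    // serializes misses on unrelated keys (see the trap below).
    private final ReentrantLock lock = new ReentrantLock();

    public String get(String key) {
        String value = l1.getIfPresent(key);
        if (value != null) return value;

        // Only one thread at a time rebuilds (the lock is global, not per key)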
        lock.lock();
        try {
            value = l1.getIfPresent(key); // double-check
            if (value != null) return value;

            value = l2.get(key);
            if (value == null) {
                value = loadFromDatabase(key);
                l2.set(key, value, 60, TimeUnit.SECONDS);
            }
            l1.put(key, value);
            return value;
        } finally {
            lock.unlock();
        }
    }

    private String loadFromDatabase(String key) {
        // Simulate DB call
        return "value_for_" + key;
    }
}
Output
Returns cached value quickly. Under high concurrency, only one thread at a time hits L2/DB.
Production Trap: Coarse-Grained Locking
Using a single lock for all keys serializes all cache misses. Use a striped lock or per-key lock (e.g., ConcurrentHashMap<String, Lock>) to avoid this. Otherwise, a miss on key 'A' blocks key 'B' requests.
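The striped-lock idea from the trap above can be sketched with JDK classes only. This is a minimal illustration, not the article's production code: `StripedLockCache`, the stripe count of 64, and the `loader` function (standing in for the L2/DB lookup) are all assumptions, and the plain `ConcurrentHashMap` is only a stand-in for a bounded L1 such as Caffeine. The loader is assumed to return a non-null value.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;
import java.util.stream.IntStream;

class StripedLockCache {
    private static final int STRIPES = 64; // assumption: tune to your concurrency

    // Stand-in for a bounded L1 such as Caffeine; a real L1 needs a size cap and TTL.
    private final ConcurrentHashMap<String, String> l1 = new ConcurrentHashMap<>();
    private final ReentrantLock[] stripes = new ReentrantLock[STRIPES];
    private final Function<String, String> loader; // hypothetical L2/DB lookup

    StripedLockCache(Function<String, String> loader) {
        this.loader = loader;
        for (int i = 0; i < STRIPES; i++) stripes[i] = new ReentrantLock();
    }

    String get(String key) {
        String value = l1.get(key);
        if (value != null) return value;

        // Keys only contend when they hash to the same stripe.
        ReentrantLock lock = stripes[Math.floorMod(key.hashCode(), STRIPES)];
        lock.lock();
        try {
            value = l1.get(key); // double-check after acquiring the lock
            if (value == null) {
                value = loader.apply(key);
                l1.put(key, value);
            }
            return value;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        StripedLockCache cache = new StripedLockCache(k -> {
            loads.incrementAndGet();
            return "value_for_" + k;
        });
        // 200 concurrent reads of one cold key
        IntStream.range(0, 200).parallel().forEach(i -> cache.get("hot"));
        System.out.println("backend loads: " + loads.get()); // prints "backend loads: 1"
    }
}
```

A fixed array of locks keeps memory bounded, unlike a lock-per-key map that grows with the key space. The trade-off is that two unrelated keys occasionally share a stripe and wait on each other.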
[Figure: Multi-Level Caching Architecture Flow. Client request → L1 cache (in-process: local memory, fast but small) → L2 cache (Redis/Memcached: distributed, larger capacity), with a consistency model (write-through or write-behind) and probabilistic early expiry to prevent cache stampedes. Note: multi-level caching can add complexity and latency; use it only when a single cache causes thundering herds.]
[Figure: Multi-Level Cache Read Flow. A request checks L1 (in-process) first. L1 hit: return data in nanoseconds, no DB call. L1 miss: fall through to L2 (Redis/Memcached). L2 hit: return data, optionally populate L1. L2 miss: query the database and populate both caches. Without L1, a single expired key can trigger thousands of DB hits.]

L1 Cache: In-Process vs. Distributed — Pick Your Poison

Your L1 cache lives in the same process as your application. That means it's fast (nanoseconds) but it's also per-instance. If you have 10 instances, each has its own copy. That's fine for read-heavy workloads where consistency isn't critical. But if you need strong consistency, you'll need to invalidate L1 across all instances — which is a distributed invalidation problem. The alternative is a distributed L1 (like Redis with read-replicas), but that adds latency. My rule of thumb: if your read-to-write ratio is >100:1 and you can tolerate seconds of staleness, use in-process L1. If writes are frequent or you need near-real-time consistency, skip L1 and go straight to L2 with a shorter TTL.

InProcessL1.java
// io.thecodeforge — System Design tutorial

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.TimeUnit;

public class InProcessL1 {
    // Per-instance cache: fast but not shared
    private final Cache<String, String> l1 = Caffeine.newBuilder()
            .maximumSize(5_000)
            .expireAfterWrite(2, TimeUnit.SECONDS)
            .recordStats()
            .build();

    public String get(String key) {
        String val = l1.getIfPresent(key);
        if (val != null) return val;
        // fallback to L2
        val = l2Get(key);
        if (val != null) l1.put(key, val);
        return val;
    }

    // Called by write path to invalidate L1 across all instances
    public void invalidate(String key) {
        l1.invalidate(key);
        // Also publish invalidation event to message bus for other instances
    }

    private String l2Get(String key) {
        // Redis get
        return null;
    }
}
Output
Returns cached value from local memory in <1ms. Invalidation requires message bus.
Senior Shortcut: Stats Are Your Friend
Enable cache stats (recordStats()) in production. Monitor hit rate, miss rate, and load time. If L1 hit rate is below 80%, your TTL is too short or cache size too small. If it's above 99%, you might be caching too aggressively — stale data risk.
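To make the thresholds above concrete, here is a small JDK-only sketch of how a hit rate is computed and checked. Caffeine reports the same ratio through `cache.stats().hitRate()` once `recordStats()` is on; the `HitRateGauge` class, its method names, and the verdict strings below are hypothetical and only illustrate the arithmetic.

```java
import java.util.concurrent.atomic.LongAdder;

class HitRateGauge {
    // LongAdder keeps counting cheap when many threads record at once
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit() { hits.increment(); }
    void recordMiss() { misses.increment(); }

    // Fraction of lookups served from cache; 1.0 when nothing was recorded yet
    double hitRate() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 1.0 : (double) h / total;
    }

    // Applies the rule-of-thumb thresholds from the text (80% and 99%)
    String verdict() {
        double rate = hitRate();
        if (rate < 0.80) return "LOW: raise TTL or maximumSize";
        if (rate > 0.99) return "VERY HIGH: check for stale data";
        return "OK";
    }

    public static void main(String[] args) {
        HitRateGauge gauge = new HitRateGauge();
        for (int i = 0; i < 90; i++) gauge.recordHit();
        for (int i = 0; i < 10; i++) gauge.recordMiss();
        System.out.println(gauge.hitRate() + " " + gauge.verdict()); // prints "0.9 OK"
    }
}
```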

L2 Cache: Redis, Memcached, or Something Else?

Your L2 cache is the shared cache that every instance talks to. Redis is the default choice for most: it's fast, supports data structures, and has built-in replication and optional persistence. Memcached is simpler and more memory-efficient for pure key-value, but lacks built-in persistence and replication. For most production systems, Redis wins because, with persistence enabled, you can recover from a restart without a cold cache. But don't use Redis as a primary database. I've seen teams store everything in Redis and then lose data on failover. Use Redis as a cache with a TTL, and always have a database behind it. Also, watch out for hot keys: a single key that gets hammered can saturate a Redis shard, and sharding by key doesn't help because that key still lives on one shard. Solution: spread reads for the hot key across replicas, or use local L1 to absorb the heat.

RedisL2Client.java
// io.thecodeforge — System Design tutorial

import redis.clients.jedis.JedisPool;
import redis.clients.jedis.Jedis;

public class RedisL2Client {
    private final JedisPool pool = new JedisPool("localhost", 6379);

    public String get(String key) {
        try (Jedis jedis = pool.getResource()) {
            return jedis.get(key);
        }
    }

    public void set(String key, String value, int ttlSeconds) {
        try (Jedis jedis = pool.getResource()) {
            jedis.setex(key, ttlSeconds, value);
        }
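    }

    // Remove a key from L2; the cache-aside write path calls this to invalidate.
    public void delete(String key) {
        try (Jedis jedis = pool.getResource()) {
            jedis.del(key);
        }
    }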

    // Use for write-through: set in L2 and invalidate L1
    public void setAndInvalidateL1(String key, String value, int ttlSeconds) {
        set(key, value, ttlSeconds);
        // Publish invalidation to L1 instances via Redis Pub/Sub
        try (Jedis jedis = pool.getResource()) {
            jedis.publish("cache-invalidation", key);
        }
    }
}
Output
Connects to Redis, sets/gets values with TTL. Publishes invalidation events.
Never Do This: Storing Java Objects Directly in Redis
Don't use Java serialization for Redis values. It's slow, bloated, and brittle. Use JSON (Jackson) or Protocol Buffers. And always compress large values (>10KB) with Snappy or Gzip.
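The "compress large values" advice can be sketched with the JDK's built-in Gzip streams. This is one possible layout, not a standard: the `ValueCodec` name, the one-byte marker in front of the payload, and treating the input as an already-serialized JSON string are assumptions made for the example. The 10KB threshold comes from the text above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ValueCodec {
    static final int COMPRESS_THRESHOLD_BYTES = 10 * 1024; // ">10KB" from the text
    private static final byte RAW = 0;   // marker byte: payload stored as-is
    private static final byte GZIP = 1;  // marker byte: payload is gzip-compressed

    // Turns a JSON string into the bytes to store in the cache.
    static byte[] encode(String json) {
        byte[] raw = json.getBytes(StandardCharsets.UTF_8);
        boolean compress = raw.length > COMPRESS_THRESHOLD_BYTES;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(compress ? GZIP : RAW);
        try {
            if (compress) {
                try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                    gz.write(raw);
                }
            } else {
                out.write(raw);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reverses encode(): reads the marker byte, then inflates if needed.
    static String decode(byte[] stored) {
        ByteArrayInputStream in = new ByteArrayInputStream(stored, 1, stored.length - 1);
        try {
            if (stored[0] == GZIP) {
                try (GZIPInputStream gz = new GZIPInputStream(in)) {
                    return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The marker byte lets small and large values share one key space, so readers never have to guess whether a value was compressed.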

Consistency Models: Write-Through vs. Write-Behind vs. Cache-Aside

You have three main patterns for keeping your multi-level cache consistent with the database. Cache-Aside: application reads from cache, on miss loads from DB and populates cache. Writes go to DB, then invalidate cache. This is the most common and simplest. Write-Through: every write goes through cache to DB — cache is always consistent but adds latency. Write-Behind: writes go to cache first, then asynchronously to DB — fast writes but risk data loss if cache crashes. For multi-level, I recommend Cache-Aside with L1 invalidation on writes. Write-Through is too slow for L1 (you'd block on every write). Write-Behind is acceptable only if you can tolerate losing the last few seconds of writes.

CacheAsidePattern.java
// io.thecodeforge — System Design tutorial

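// L1Cache and Database are placeholder types for this tutorial; RedisL2Client
// is assumed to expose get, set, and delete.
public class CacheAsidePattern {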
    private final L1Cache l1 = new L1Cache();
    private final RedisL2Client l2 = new RedisL2Client();
    private final Database db = new Database();

    public String read(String key) {
        // Check L1 first
        String val = l1.get(key);
        if (val != null) return val;

        // Check L2
        val = l2.get(key);
        if (val != null) {
            l1.put(key, val);
            return val;
        }

        // Load from DB
        val = db.get(key);
        if (val != null) {
            l2.set(key, val, 60);
            l1.put(key, val);
        }
        return val;
    }

    public void write(String key, String value) {
        // Write to DB first
        db.put(key, value);
        // Invalidate both caches
        l1.invalidate(key);
        l2.delete(key);
    }
}
Output
Reads from L1 → L2 → DB. Writes go to DB, then invalidate caches.
Interview Gold: Why Invalidate, Not Update?
Invalidating is safer than updating because it sidesteps write-write conflicts. If two concurrent writes happen, their cache updates can land in a different order than their DB commits and leave the cache holding the older value. Invalidation forces the next read to fetch the latest. This is called 'lazy loading' and is the standard pattern.
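The write-write conflict is easier to see when one specific interleaving is replayed step by step. The sketch below is a single-threaded simulation with plain maps, not real concurrency. The class name, the `price` key, and the values `A` and `B` are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class InvalidateVsUpdate {
    // Replays one fixed interleaving of two writers and returns what a reader then sees.
    static String readAfterRace(boolean updateCache) {
        Map<String, String> db = new HashMap<>();
        Map<String, String> cache = new HashMap<>();

        db.put("price", "A");        // writer 1 commits to the DB
        db.put("price", "B");        // writer 2 commits to the DB (B is the latest)

        // The cache steps arrive in the opposite order: writer 2 first, then writer 1.
        if (updateCache) {
            cache.put("price", "B");
            cache.put("price", "A"); // the older value overwrites the newer one
        } else {
            cache.remove("price");
            cache.remove("price");   // deleting twice is harmless
        }

        // Cache-aside read: use the cache if present, otherwise load from the DB.
        return cache.computeIfAbsent("price", db::get);
    }

    public static void main(String[] args) {
        System.out.println(readAfterRace(true));  // prints "A" (stale)
        System.out.println(readAfterRace(false)); // prints "B" (latest)
    }
}
```

Invalidation narrows the window for this kind of bug but does not remove every race. A read that loads an old value just before a write can still repopulate the cache afterwards, which is one reason to keep TTLs as a backstop.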
[Figure: Cache Consistency Patterns (write-through vs. cache-aside trade-offs). Cache-aside: on a read miss, load from the DB and populate the cache; on a write, update the DB and invalidate the cache key; simplest to implement and debug; stale data possible until the next read. Write-through: a write updates DB and cache atomically; reads always hit the cache (if cached); stronger consistency guarantee; higher write latency and more cache churn. Cache-aside is preferred for most multi-level setups.]

Handling Cache Stampedes with Probabilistic Early Expiration

Even with multi-level caching, a cache stampede can still happen if a key expires simultaneously across all L1 instances. The fix: add jitter to TTLs (for example, pass Redis' SETEX a base TTL plus a random offset; Redis does not randomize TTLs for you), or use probabilistic early expiration, where each read has a small, growing chance of refreshing the key before it actually expires. Better yet, implement 'recompute before expiry': when a key is within the last 10% of its TTL, allow one thread to refresh it while others get the stale value. This is called 'cache refresh-ahead'. Caffeine supports this natively with refreshAfterWrite. Use it.

RefreshAheadCache.java
// io.thecodeforge — System Design tutorial

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class RefreshAheadCache {
    private final LoadingCache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(10, TimeUnit.SECONDS)
            .refreshAfterWrite(8, TimeUnit.SECONDS) // refresh before expiry
            .build(key -> loadFromL2OrDB(key));

    public String get(String key) {
        return cache.get(key);
    }

    private String loadFromL2OrDB(String key) {
        // Check L2, then DB
        return "value";
    }
}
Output
Cache refreshes asynchronously before expiry, reducing stampede risk.
Production Trap: refreshAfterWrite Without expireAfterWrite
If you only set refreshAfterWrite, the cache never expires — it just refreshes. That means stale data lives forever if the refresh fails. Always pair with expireAfterWrite as a safety net.
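Probabilistic early expiration, mentioned at the start of this section, is a different mechanism from Caffeine's refreshAfterWrite. Here is a minimal sketch of the decision rule in the form often called XFetch. The `EarlyExpiry` class, its parameter names, and the default `beta` of 1.0 are assumptions for illustration: `recomputeMillis` is how long a rebuild usually takes, and `beta` above 1.0 makes early refreshes more likely.

```java
import java.util.concurrent.ThreadLocalRandom;

class EarlyExpiry {
    // Decides whether this reader should refresh the entry before it expires.
    // rand must be in (0, 1]; Math.log(rand) is then <= 0, so earlyBy is >= 0.
    static boolean shouldRefresh(long nowMillis, long expiryMillis,
                                 long recomputeMillis, double beta, double rand) {
        double earlyBy = -recomputeMillis * beta * Math.log(rand);
        return nowMillis + earlyBy >= expiryMillis;
    }

    // Convenience overload that draws the random number itself.
    static boolean shouldRefresh(long nowMillis, long expiryMillis, long recomputeMillis) {
        double rand = 1.0 - ThreadLocalRandom.current().nextDouble(); // shifts [0, 1) to (0, 1]
        return shouldRefresh(nowMillis, expiryMillis, recomputeMillis, 1.0, rand);
    }

    public static void main(String[] args) {
        long expiry = 2_000;   // entry expires at t = 2000 ms
        long recompute = 200;  // a rebuild takes about 200 ms
        // Same random draw, different distance from expiry:
        System.out.println(shouldRefresh(1_000, expiry, recompute, 1.0, 0.5)); // prints "false"
        System.out.println(shouldRefresh(1_900, expiry, recompute, 1.0, 0.5)); // prints "true"
    }
}
```

Because each reader rolls its own random number, refreshes tend to spread out across instances instead of all firing at the expiry instant. Pair this with a per-key lock so that only one of the early refreshers actually hits the backend.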

When Multi-Level Caching Is Overkill — And What to Use Instead

Multi-level caching adds complexity: you now have two caches to manage, invalidate, and monitor. For low-traffic services (<100 req/s), a single Redis cache with a reasonable TTL is fine. For read-heavy workloads with strict consistency, skip L1 and use Redis with read-replicas. For write-heavy workloads, caching might not help at all — consider a write-optimized database instead. My rule: if your L1 hit rate is below 50%, the overhead of maintaining L1 isn't worth it. Measure first, then optimize.

Never Do This: Multi-Level Caching for User Sessions
User sessions are write-heavy and need strong consistency. Using L1 for sessions means a user's session could be stale on a different instance. Use a distributed session store (Redis) with sticky sessions or a shared cache.
Production incident · Post-mortem · Severity: high

The 4GB Container That Kept Dying

Symptom
A microservice serving product recommendations crashed every 30 minutes under moderate load. Heap dump showed 90% of memory was a single ConcurrentHashMap.
Assumption
Memory leak in the recommendation algorithm.
Root cause
The team had implemented an L1 cache (ConcurrentHashMap) with no eviction policy. It grew unbounded until the OOM killer killed the container. The -Xmx flag was set to 2GB but the container had a 4GB limit, and the JVM never triggered GC aggressively enough.
Fix
Replaced the raw ConcurrentHashMap with a Caffeine cache with maximumSize(10000) and expireAfterWrite(5 minutes). Also set -XX:+UseG1GC and -XX:MaxGCPauseMillis=200.
Key lesson
  • Never use a plain ConcurrentHashMap as a cache.
  • Always set a maximum size and TTL, even in dev.
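For a feel of what "bounded" means, the JDK can already cap a map's size through `LinkedHashMap.removeEldestEntry`. This is only an illustration of the size-cap half of the lesson. The incident's actual fix was Caffeine, and this sketch has no TTL and is not thread-safe unless wrapped. `BoundedLruMap` is a hypothetical name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruMap(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration order is least-recently-used first
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after every insert; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        // Wrap for thread safety; LinkedHashMap itself is not safe for concurrent use.
        Map<String, String> cache = Collections.synchronizedMap(new BoundedLruMap<>(2));
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a" so "b" becomes the eldest
        cache.put("c", "3"); // exceeds the cap, so "b" is evicted
        System.out.println(cache.keySet()); // prints "[a, c]"
    }
}
```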
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit.
Symptom · 01
High L1 miss rate (>50%) under load
Fix
1. Check L1 hit rate via metrics (Caffeine stats). 2. Increase maximumSize. 3. Increase TTL. 4. Check if keys are being invalidated too aggressively.
Symptom · 02
Redis CPU at 100%, connections timing out
Fix
1. Identify hot keys with redis-cli --hotkeys (requires an LFU maxmemory-policy). 2. Add L1 cache for those keys. 3. Shard Redis or increase replicas. 4. Add jitter to TTLs so expirations, and the misses they cause, are spread out.
Symptom · 03
Stale data served after write
Fix
1. Check if L1 invalidation is called on write path. 2. Verify invalidation event reaches all instances (check message bus). 3. Reduce L1 TTL as fallback.
Multi-Level Caching Triage Cheat Sheet
First-response commands for when things go wrong, copy-paste ready.
L1 cache hit rate dropped suddenly
Immediate action
Check if a deployment changed cache config or invalidated keys
Commands
curl 'localhost:8080/actuator/metrics/cache.gets?tag=name:l1'
jstat -gcutil <pid> 1000
Fix now
Increase L1 maximumSize by 20% and redeploy
Redis `OOM command not allowed when used memory > 'maxmemory'`
Immediate action
Check maxmemory and eviction policy
Commands
redis-cli INFO memory | grep used_memory_human
redis-cli CONFIG GET maxmemory-policy
Fix now
Set maxmemory-policy allkeys-lru and increase maxmemory
Application throws `RedisConnectionException`
Immediate action
Check Redis server health and network
Commands
redis-cli PING
netstat -an | grep 6379
Fix now
Increase connection pool size or add retry logic
Cache stampede: DB CPU spikes every 10 minutes
Immediate action
Check cache TTLs — they likely all expire at the same time
Commands
redis-cli TTL <key>
grep 'expireAfterWrite' config
Fix now
Add jitter to TTL: base + random(0, 20%)
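The jitter formula above, base + random(0, 20%), might look like this in Java. `TtlJitter` and its method names are hypothetical. The explicit `rand` parameter exists only so the arithmetic can be checked with fixed inputs.

```java
import java.util.concurrent.ThreadLocalRandom;

class TtlJitter {
    // base + base * maxFraction * rand, with rand expected in [0, 1)
    static long withJitter(long baseSeconds, double maxFraction, double rand) {
        return baseSeconds + (long) (baseSeconds * maxFraction * rand);
    }

    // Default from the cheat sheet: up to 20% on top of the base TTL
    static long withJitter(long baseSeconds) {
        return withJitter(baseSeconds, 0.20, ThreadLocalRandom.current().nextDouble());
    }

    public static void main(String[] args) {
        System.out.println(withJitter(600, 0.20, 0.0)); // prints "600"
        System.out.println(withJitter(600, 0.20, 0.5)); // prints "660"
        System.out.println(withJitter(600));            // somewhere between 600 and 720
    }
}
```

Pass the jittered value wherever the TTL is set, for example as the seconds argument to SETEX, so keys written together do not expire together.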
Feature / Aspect | In-Process L1 (Caffeine) | Distributed L2 (Redis)
Latency | <1ms | 1-5ms
Capacity | Limited by heap (MB-GB) | Virtually unlimited (GB-TB)
Consistency | Per-instance, eventual | Shared across instances (replication is asynchronous)
Eviction | LRU, LFU, TTL | TTL, LRU (via maxmemory-policy)
Persistence | None (lost on restart) | Optional (RDB/AOF)
Best for | Hot keys, high read rate | Shared data, large datasets

Key takeaways

1. Always pair an L1 cache with a bounded size and TTL; never use a raw ConcurrentHashMap.
2. Use cache-aside with invalidation for most workloads; write-through only when consistency is critical.
3. Add jitter to TTLs or use refresh-ahead to prevent thundering herds.
4. Measure L1 hit rate in production; if below 50%, the complexity isn't worth it.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Senior): How does multi-level caching handle concurrent cache misses for the same...
Q02 (Senior): When would you choose write-through over cache-aside for multi-level cac...
Q03 (Senior): What happens when L1 cache is full and a new key is inserted? How does e...
Q04 (Junior): What is the difference between cache invalidation and cache eviction?
Q05 (Senior): You notice that after a write, some instances serve stale data for up to...
Q06 (Senior): Design a multi-level cache for a social media feed that handles 1M reads...

Q01 of 06 (Senior)

How does multi-level caching handle concurrent cache misses for the same key? What failure mode does it prevent?

ANSWER
It prevents the thundering herd problem. With a mutex or lock per key, only one thread fetches from L2/DB while others wait or get a stale value. Without it, all concurrent requests hit the backend simultaneously, causing overload.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01. What is multi-level caching in system design?
02. What's the difference between L1 and L2 cache?
03. How do I implement multi-level caching in Java?
04. How do you handle cache consistency in multi-level caching?