Senior 3 min · June 25, 2026

Multi-Level Caching: Stop Thundering Herds and Slash Latency in Production

Q: What is multi-level caching in system design?

Multi-level caching uses two or more cache layers (e.g., in-memory L1 and Redis L2) to reduce latency and protect backend systems. L1 is fast but small; L2 is larger but slower. Requests check L1 first, then L2, then the database.

Q: What's the difference between L1 and L2 cache?

L1 cache is in-process memory (nanosecond latency, limited size). L2 cache is a distributed store like Redis (millisecond latency, large capacity). L1 reduces load on L2; L2 reduces load on the database.

Q: How do I implement multi-level caching in Java?

Use Caffeine for L1 (in-process) and Jedis/Lettuce for L2 (Redis). Implement cache-aside: read from L1, on miss read from L2, on miss read from DB. On writes, write to DB then invalidate both caches.

Q: How do you handle cache consistency in multi-level caching?

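Use write-invalidate: on every write, update the database and invalidate the corresponding keys in both L1 and L2. For L1 across instances, publish invalidation events via a message bus (e.g., Redis Pub/Sub). This gives eventual consistency. Redis Pub/Sub is fire-and-forget: an instance that is disconnected when the event is published never receives it. Keep the L1 TTL short as a backstop.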

Multi-level caching reduces latency and prevents cache stampedes.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Lessons pulled from things that broke in production.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren


⚡Quick Answer

Multi-level caching places a small, fast cache (like an in-memory map) in front of a larger, slower cache (like Redis). This reduces load on the slow cache and database, and prevents thundering herds when a popular key expires.

✦ Definition

What is Multi-Level Caching?

Multi-level caching uses two or more cache layers (e.g., L1 in-memory, L2 Redis) to reduce latency and protect backend systems. The L1 cache is fast but small; the L2 cache is larger but slower. Together they balance speed and capacity.


Plain-English First

Think of a busy coffee shop. The barista (L1 cache) keeps the most popular drinks ready on a small rack. If a customer orders something that isn't on the rack, the barista checks the fridge (L2 cache), which holds more but takes longer to reach. If it's not there either, they brew it fresh (database). Most orders are served instantly, and the fridge isn't slammed every time someone wants a latte.

You've got a Redis cluster that handles 100k ops/s. Then a viral tweet hits. Your cache miss rate spikes. Redis CPU goes to 100%. Your database connection pool is exhausted. The site goes down. The classic rookie mistake? A single-level cache. One layer to rule them all — and fail them all. Multi-level caching is the pattern that stops this madness. It's not just about speed; it's about survival under load. After this article, you'll be able to design a multi-level cache that handles traffic spikes without melting your backend, and you'll know exactly which consistency model to use so you don't serve stale data to paying customers.

Why One Cache Isn't Enough: The Thundering Herd Problem

A single cache layer (say Redis) works fine until a popular key expires and thousands of requests miss at the same moment. They all hit the database, which falls over. This is the thundering herd. Multi-level caching helps by putting a small, very fast L1 cache (in-process memory) in front of L2 to absorb the herd. The L1 cache has a short TTL (seconds), and it is paired with a per-key lock. When an entry expires, only one request per key on each instance goes to L2 or the DB to repopulate it, while the others block briefly (or get a stale value). Without that coordination, every concurrent miss goes straight to the database, and your database dies. I've seen it happen to a payment service at 3am. The cache TTL was 1 hour, and when it expired, 500 concurrent requests hit the DB. The fix: add an L1 cache with a 5-second TTL and use a mutex to prevent concurrent rebuilds.

L1CacheWithMutex.java

// io.thecodeforge — System Design tutorial

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class L1CacheWithMutex {
    // Minimal L2 contract (e.g., implemented with Redis); adapt to your client
    public interface L2Store {
        String get(String key);
        void set(String key, String value, int ttlSeconds);
    }

    // L1 cache: fast, small, short TTL
    private final Cache<String, String> l1 = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(5, TimeUnit.SECONDS)
            .build();

    private final L2Store l2;

    // Striped locks: a fixed pool, so different keys rarely contend and memory stays bounded
    private final ReentrantLock[] stripes = new ReentrantLock[64];

    public L1CacheWithMutex(L2Store l2) {
        this.l2 = l2;
        for (int i = 0; i < stripes.length; i++) stripes[i] = new ReentrantLock();
    }

    public String get(String key) {
        String value = l1.getIfPresent(key);
        if (value != null) return value;

        // Only one thread per stripe (and therefore per key) rebuilds at a time
        ReentrantLock lock = stripes[Math.floorMod(key.hashCode(), stripes.length)];
        lock.lock();
        try {
            value = l1.getIfPresent(key); // double-check: another thread may have loaded it
            if (value != null) return value;

            value = l2.get(key);
            if (value == null) {
                value = loadFromDatabase(key);
                l2.set(key, value, 60);
            }
            l1.put(key, value);
            return value;
        } finally {
            lock.unlock();
        }
    }

    private String loadFromDatabase(String key) {
        // Simulate DB call
        return "value_for_" + key;
    }
}

Output

Returns cached value quickly. Under high concurrency, only one thread per key on each instance goes to L2/DB.

Production Trap: Coarse-Grained Locking

Using a single lock for all keys serializes all cache misses. Use a striped lock or per-key lock (e.g., ConcurrentHashMap<String, Lock>) to avoid this. Otherwise, a miss on key 'A' blocks key 'B' requests.
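A per-key lock map keeps a lock object around for every key it has ever seen. Another way to get per-key exclusion is to coalesce concurrent loads. The first caller for a key runs the load, and everyone who arrives while it is in flight waits on the same future. Below is a minimal JDK-only sketch. `SingleFlight` is a hypothetical helper name, and the loader stands in for your L2-then-DB lookup. Caffeine's own `Cache.get(key, mappingFunction)` gives similar at-most-once-per-key loading out of the box.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: coalesces concurrent loads of the same key into one call.
class SingleFlight {
    // Only keys with a load currently in progress live here, so the map stays small.
    private final ConcurrentHashMap<String, CompletableFuture<String>> inFlight = new ConcurrentHashMap<>();

    String get(String key, Function<String, String> loader) {
        CompletableFuture<String> mine = new CompletableFuture<>();
        CompletableFuture<String> leader = inFlight.putIfAbsent(key, mine);
        if (leader != null) {
            // Someone else is already loading this key: wait for their result.
            return leader.join();
        }
        try {
            String value = loader.apply(key);   // e.g., L2 lookup, then DB on miss
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);      // waiters see the same failure
            throw e;
        } finally {
            inFlight.remove(key, mine);         // the next miss after this starts a fresh load
        }
    }
}
```

`SingleFlight` does not cache anything itself. Callers would check L1 first, call `get` only on a miss, and put the result into L1 afterwards.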

Figure: Multi-Level Caching Architecture Flow

Figure: Multi-Level Cache Read Flow

L1 Cache: In-Process vs. Distributed — Pick Your Poison

Your L1 cache lives in the same process as your application. That means it's fast (nanoseconds) but it's also per-instance. If you have 10 instances, each has its own copy. That's fine for read-heavy workloads where consistency isn't critical. But if you need strong consistency, you'll need to invalidate L1 across all instances — which is a distributed invalidation problem. The alternative is a distributed L1 (like Redis with read-replicas), but that adds latency. My rule of thumb: if your read-to-write ratio is >100:1 and you can tolerate seconds of staleness, use in-process L1. If writes are frequent or you need near-real-time consistency, skip L1 and go straight to L2 with a shorter TTL.

InProcessL1.java

// io.thecodeforge — System Design tutorial

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.stats.CacheStats;
import java.util.concurrent.TimeUnit;

public class InProcessL1 {
    // Per-instance cache: fast but not shared with other instances
    private final Cache<String, String> l1 = Caffeine.newBuilder()
            .maximumSize(5_000)
            .expireAfterWrite(2, TimeUnit.SECONDS) // bounds staleness if an invalidation is missed
            .recordStats()
            .build();

    public String get(String key) {
        String val = l1.getIfPresent(key);
        if (val != null) return val;
        // L1 miss: fall back to L2 and populate L1 only on an L2 hit
        val = l2Get(key);
        if (val != null) l1.put(key, val);
        return val;
    }

    // Drops the key from THIS instance's L1 only. The write path must also publish
    // an invalidation event on the message bus so every other instance calls this too.
    public void invalidate(String key) {
        l1.invalidate(key);
    }

    // Hit rate, miss rate, evictions, load times: export these to your metrics system
    public CacheStats stats() {
        return l1.stats();
    }

    private String l2Get(String key) {
        // Placeholder for a Redis GET; always a miss in this stub
        return null;
    }
}

Output

Returns cached value from local memory in <1ms. Invalidation requires message bus.
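The message-bus half of that invalidation can be sketched with the JDK alone. Each instance keeps its own bounded L1, and a write publishes the key on a shared bus so every instance drops its copy. `InvalidationBus` and `AppInstance` are hypothetical names. The bus is an in-memory stand-in for something like Redis Pub/Sub, and an access-ordered `LinkedHashMap` stands in for Caffeine to keep the example dependency-free.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a message bus such as Redis Pub/Sub.
class InvalidationBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> onInvalidate) { subscribers.add(onInvalidate); }

    // Delivers the key to every subscriber, including the publishing instance.
    void publish(String key) { subscribers.forEach(s -> s.accept(key)); }
}

// One application instance with its own bounded, per-process L1.
class AppInstance {
    private final Map<String, String> l1;
    private final InvalidationBus bus;

    AppInstance(InvalidationBus bus, int maxEntries) {
        this.bus = bus;
        // Access-ordered map that evicts the least recently used entry past maxEntries.
        this.l1 = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
        bus.subscribe(this::evictLocal);
    }

    synchronized String getLocal(String key) { return l1.get(key); }
    synchronized void putLocal(String key, String value) { l1.put(key, value); }
    synchronized void evictLocal(String key) { l1.remove(key); }

    // Write path: after the DB write succeeds, tell every instance to drop the key.
    void afterDatabaseWrite(String key) { bus.publish(key); }
}
```

A real bus adds network delay and can drop messages. That is why the short `expireAfterWrite` in `InProcessL1` stays in place even with invalidation events.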

Senior Shortcut: Stats Are Your Friend

Enable cache stats (recordStats()) in production. Monitor hit rate, miss rate, and load time. If the L1 hit rate is below 80%, the TTL may be too short, the cache too small, or keys may be invalidated too often. If it's above 99%, you might be caching too aggressively and serving stale data.
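This article's hit-rate rules of thumb can be checked directly against the hit and miss counters your cache reports. The thresholds are: question L1 entirely below 50%, tune below 80%, and check staleness above 99%. With Caffeine, the counters come from `stats().hitCount()` and `stats().missCount()`. A minimal sketch: `HitRateCheck` is a hypothetical helper, and the thresholds are this article's heuristics, not universal limits.

```java
// Hypothetical helper encoding this article's L1 hit-rate rules of thumb.
class HitRateCheck {
    static double hitRate(long hits, long misses) {
        long total = hits + misses;
        // No traffic yet: report 1.0, matching how Caffeine's CacheStats.hitRate() behaves
        return total == 0 ? 1.0 : (double) hits / total;
    }

    static String verdict(long hits, long misses) {
        double rate = hitRate(hits, misses);
        if (rate < 0.50) return "L1 probably not worth its complexity";
        if (rate < 0.80) return "tune: raise maximumSize or TTL, check invalidation churn";
        if (rate > 0.99) return "check staleness: caching may be too aggressive";
        return "healthy";
    }
}
```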

L2 Cache: Redis, Memcached, or Something Else?

Your L2 cache is the shared cache that every instance reads. Redis is the default choice for most teams: it's fast, supports rich data structures, and has built-in replication. Memcached is simpler and more memory-efficient for pure key-value workloads, but lacks persistence and replication. For most production systems Redis wins, because with RDB or AOF persistence enabled it can come back from a restart without a cold cache. But don't use Redis as a primary database. I've seen teams store everything in Redis and then lose data on failover. Use Redis as a cache with a TTL, and always have a database behind it. Also, watch out for hot keys: a single key that gets hammered can saturate a Redis shard. Sharding the keyspace alone doesn't spread that load, because one key still maps to one shard. Instead, split the hot key into several copies that live on different shards, or use a local L1 to absorb the heat.
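One common way to spread a single hot key is to store several suffixed copies of it and have each read pick one at random. Writes and invalidations then have to touch every copy. A minimal JDK-only sketch of the key naming follows. The `#n` suffix scheme, the `HotKeySplitter` name, and the copy count are hypothetical choices you would tune to your shard layout. In Redis Cluster, the copies only hash to different slots if the key has no `{hash tag}`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for splitting one hot key into N copies.
class HotKeySplitter {
    private final int copies;

    HotKeySplitter(int copies) {
        if (copies < 1) throw new IllegalArgumentException("copies must be >= 1");
        this.copies = copies;
    }

    // Reads pick one copy at random, so load spreads across the shards holding the copies.
    String readKey(String key) {
        return key + "#" + ThreadLocalRandom.current().nextInt(copies);
    }

    // Writes and invalidations must cover every copy, or some readers keep seeing old data.
    List<String> allKeys(String key) {
        List<String> keys = new ArrayList<>(copies);
        for (int i = 0; i < copies; i++) keys.add(key + "#" + i);
        return keys;
    }
}
```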

RedisL2Client.java

// io.thecodeforge — System Design tutorial

import redis.clients.jedis.JedisPool;
import redis.clients.jedis.Jedis;

public class RedisL2Client {
    private final JedisPool pool = new JedisPool("localhost", 6379);

    public String get(String key) {
        try (Jedis jedis = pool.getResource()) {
            return jedis.get(key);
        }
    }

    public void set(String key, String value, int ttlSeconds) {
        try (Jedis jedis = pool.getResource()) {
            jedis.setex(key, ttlSeconds, value);
        }
    }

    // Used by cache-aside writes: drop the key so the next read reloads from the DB
    public void delete(String key) {
        try (Jedis jedis = pool.getResource()) {
            jedis.del(key);
        }
    }

    // Use for write-through: set in L2 and tell every instance to drop its L1 copy
    public void setAndInvalidateL1(String key, String value, int ttlSeconds) {
        set(key, value, ttlSeconds);
        // Publish invalidation to L1 instances via Redis Pub/Sub
        try (Jedis jedis = pool.getResource()) {
            jedis.publish("cache-invalidation", key);
        }
    }
}

Output

Connects to Redis, sets/gets values with TTL. Publishes invalidation events.

Never Do This: Storing Java Objects Directly in Redis

Don't use Java serialization for Redis values. It's slow, bloated, and brittle. Use JSON (Jackson) or Protocol Buffers. And always compress large values (>10KB) with Snappy or Gzip.
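For the compression half of that advice, the JDK's `java.util.zip` is enough for a sketch. The example gzips values above a 10KB threshold and prefixes a one-byte marker so the reader knows whether to decompress. The marker format and the `ValueCodec` name are assumptions. The input is assumed to be already serialized text (e.g., JSON from Jackson), and the resulting `byte[]` would be stored with a binary-safe client call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical codec: UTF-8 text, gzipped when large, with a 1-byte format marker.
class ValueCodec {
    static final int COMPRESS_THRESHOLD = 10 * 1024; // bytes
    private static final byte RAW = 0, GZIP = 1;

    static byte[] encode(String serialized) {
        byte[] raw = serialized.getBytes(StandardCharsets.UTF_8);
        if (raw.length <= COMPRESS_THRESHOLD) return prefix(RAW, raw);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed (and finished) before we read the buffer
        return prefix(GZIP, bos.toByteArray());
    }

    static String decode(byte[] stored) {
        byte[] body = Arrays.copyOfRange(stored, 1, stored.length);
        if (stored[0] == RAW) return new String(body, StandardCharsets.UTF_8);
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] prefix(byte marker, byte[] body) {
        byte[] out = new byte[body.length + 1];
        out[0] = marker;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }
}
```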

Consistency Models: Write-Through vs. Write-Behind vs. Cache-Aside

You have three main patterns for keeping a multi-level cache consistent with the database.

Cache-Aside: the application reads from the cache and, on a miss, loads from the DB and populates the cache. Writes go to the DB, then invalidate the cache. This is the most common and simplest.

Write-Through: every write goes through the cache to the DB. The cache stays in step with the DB, at the cost of extra write latency.

Write-Behind: writes go to the cache first, then asynchronously to the DB. Writes are fast, but you risk losing data if the cache crashes before flushing.

For multi-level, I recommend Cache-Aside with L1 invalidation on writes. Write-Through is a poor fit for L1. You'd block on every write, and writing through one instance's L1 still leaves every other instance's copy stale. Write-Behind is acceptable only if you can tolerate losing the last few seconds of writes.
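Write-behind is the least familiar of the three, so here is a minimal JDK-only sketch. Writes land in the cache and a queue, and a flush step later applies them to the database in order. The maps stand in for the real cache and database, and `WriteBehindCache` is a hypothetical name. `flush()` is called explicitly to keep the behavior easy to follow; a real implementation would drain on a background thread or scheduler. Anything still queued when the process dies is lost.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical write-behind cache; maps stand in for the cache and the database.
class WriteBehindCache {
    private static final class PendingWrite {
        final String key, value;
        PendingWrite(String key, String value) { this.key = key; this.value = value; }
    }

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database;
    private final LinkedBlockingQueue<PendingWrite> pending = new LinkedBlockingQueue<>();

    WriteBehindCache(Map<String, String> database) { this.database = database; }

    // Fast path: the caller returns as soon as the cache and queue are updated.
    void write(String key, String value) {
        cache.put(key, value);
        pending.add(new PendingWrite(key, value));
    }

    String read(String key) {
        String v = cache.get(key);
        return v != null ? v : database.get(key);
    }

    // Applies queued writes in arrival order; returns how many were flushed.
    int flush() {
        int n = 0;
        PendingWrite w;
        while ((w = pending.poll()) != null) {
            database.put(w.key, w.value);
            n++;
        }
        return n;
    }

    int pendingCount() { return pending.size(); }
}
```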

CacheAsidePattern.java

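// io.thecodeforge — System Design tutorial

public class CacheAsidePattern {
    // Minimal contracts: back L1 with Caffeine, L2 with RedisL2Client, and Database with your DAO
    public interface L1Cache {
        String get(String key);
        void put(String key, String value);
        void invalidate(String key);
    }

    public interface L2Cache {
        String get(String key);
        void set(String key, String value, int ttlSeconds);
        void delete(String key);
    }

    public interface Database {
        String get(String key);
        void put(String key, String value);
    }

    private final L1Cache l1;
    private final L2Cache l2;
    private final Database db;

    public CacheAsidePattern(L1Cache l1, L2Cache l2, Database db) {
        this.l1 = l1;
        this.l2 = l2;
        this.db = db;
    }

    public String read(String key) {
        // Check L1 first
        String val = l1.get(key);
        if (val != null) return val;

        // Check L2
        val = l2.get(key);
        if (val != null) {
            l1.put(key, val);
            return val;
        }

        // Load from DB
        val = db.get(key);
        if (val != null) {
            l2.set(key, val, 60);
            l1.put(key, val);
        }
        return val;
    }

    public void write(String key, String value) {
        // Write to DB first
        db.put(key, value);
        // Invalidate L2 before L1: if L1 went first, a read in between could refill L1 from the stale L2 entry
        l2.delete(key);
        l1.invalidate(key);
        // Other instances' L1 copies still need a published invalidation event
    }
}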

Output

Reads from L1 → L2 → DB. Writes go to DB, then invalidate caches.

Interview Gold: Why Invalidate, Not Update?

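Invalidating is safer than updating because it sidesteps write-write races. If two concurrent writes update the cache in a different order than they committed to the database, the cache ends up holding the older value until its TTL expires or the key is written again. Invalidation forces the next read to fetch the latest committed value. Populating the cache only on reads like this is often called 'lazy loading', and it is the standard pattern.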
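Here is a deterministic walk-through of that race, using plain maps as the database and cache. Two writers commit A then B to the database, but their cache updates arrive in the opposite order. The interleaving is scripted by hand for illustration; in production it comes from thread scheduling or network delays. `UpdateVsInvalidate` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Scripted interleaving of two concurrent writers; maps stand in for the DB and cache.
class UpdateVsInvalidate {
    static String afterUpdates() {
        Map<String, String> db = new HashMap<>(), cache = new HashMap<>();
        db.put("price", "A");       // writer 1 commits
        db.put("price", "B");       // writer 2 commits (latest)
        cache.put("price", "B");    // writer 2's cache update lands first
        cache.put("price", "A");    // writer 1's delayed update lands last
        return cache.get("price");  // stale: "A" although the DB holds "B"
    }

    static String afterInvalidations() {
        Map<String, String> db = new HashMap<>(), cache = new HashMap<>();
        db.put("price", "A");
        db.put("price", "B");
        cache.remove("price");      // writer 2 invalidates
        cache.remove("price");      // writer 1 invalidates; order no longer matters
        // Next read misses and reloads the latest committed value (cache-aside)
        return cache.computeIfAbsent("price", db::get);
    }
}
```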

Figure: Cache Consistency Patterns

Handling Cache Stampedes with Probabilistic Early Expiration

Even with multi-level caching, a cache stampede can still happen if a key expires at the same moment on every L1 instance. One fix is to add jitter to TTLs (for example, a randomized TTL on Redis SETEX) so expirations spread out. Another is probabilistic early expiration, where each read may recompute the value slightly before it expires, with a probability that rises as expiry approaches. Better yet, refresh before expiry: once a key is near the end of its TTL, let one thread reload it while others keep getting the current value. This is called refresh-ahead. Caffeine supports this natively with refreshAfterWrite. Use it.

RefreshAheadCache.java

// io.thecodeforge — System Design tutorial

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class RefreshAheadCache {
    private final LoadingCache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(10, TimeUnit.SECONDS)
            .refreshAfterWrite(8, TimeUnit.SECONDS) // refresh before expiry
            .build(key -> loadFromL2OrDB(key));

    public String get(String key) {
        return cache.get(key);
    }

    private String loadFromL2OrDB(String key) {
        // Check L2, then DB
        return "value";
    }
}

Output

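The first read after 8 seconds triggers an asynchronous reload while callers keep getting the current value. An entry that isn't read between 8 and 10 seconds simply expires.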

Production Trap: refreshAfterWrite Without expireAfterWrite

If you only set refreshAfterWrite, entries never expire; they are only refreshed when read. If a refresh fails, the old value stays in the cache until size-based eviction removes it. Always pair refreshAfterWrite with expireAfterWrite as a safety net.
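Caffeine handles refresh-ahead, but TTL jitter and probabilistic early expiration are easy to sketch with the JDK alone. `jitteredTtlSeconds` spreads expirations with a random TTL, the kind of value you'd pass to SETEX. `shouldRecomputeEarly` implements the XFetch rule, a published form of probabilistic early expiration. Each reader recomputes early with a probability that rises as expiry approaches and as the value gets more expensive to rebuild. The `StampedeGuards` name, the 10% spread, and `beta = 1.0` are illustrative assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helpers for spreading expirations; XFetch times are in milliseconds.
class StampedeGuards {
    // Base TTL +/- spreadFraction, e.g. 60s with 0.1 gives a value in [54, 66].
    static long jitteredTtlSeconds(long baseSeconds, double spreadFraction) {
        long spread = Math.round(baseSeconds * spreadFraction);
        return baseSeconds + ThreadLocalRandom.current().nextLong(-spread, spread + 1);
    }

    // XFetch: recompute early when now - delta * beta * ln(rand) >= expiry.
    // deltaMillis = how long the last recompute took; beta > 1 favors earlier refreshes.
    static boolean shouldRecomputeEarly(long nowMillis, long expiryMillis,
                                        long deltaMillis, double beta, double rand) {
        // rand must be in (0, 1]; ln(rand) <= 0, so subtracting it moves "now" forward
        return nowMillis - deltaMillis * beta * Math.log(rand) >= expiryMillis;
    }

    static boolean shouldRecomputeEarly(long nowMillis, long expiryMillis, long deltaMillis) {
        // 1.0 - nextDouble() maps [0, 1) to (0, 1], avoiding ln(0)
        double rand = 1.0 - ThreadLocalRandom.current().nextDouble();
        return shouldRecomputeEarly(nowMillis, expiryMillis, deltaMillis, 1.0, rand);
    }
}
```

A reader that gets `true` rebuilds the value (ideally behind the same per-key lock) and resets the expiry. Everyone else keeps serving the cached copy.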

When Multi-Level Caching Is Overkill — And What to Use Instead

Multi-level caching adds complexity: you now have two caches to manage, invalidate, and monitor. For low-traffic services (<100 req/s), a single Redis cache with a reasonable TTL is fine. For read-heavy workloads with strict consistency, skip L1 and use Redis with read-replicas. Read from the primary for anything that must reflect the latest write, because Redis replication is asynchronous. For write-heavy workloads, caching might not help at all; consider a write-optimized database instead. My rule: if your L1 hit rate is below 50%, the overhead of maintaining L1 isn't worth it. Measure first, then optimize.

Never Do This: Multi-Level Caching for User Sessions

User sessions are write-heavy and need strong consistency. With an L1 in front of them, a request routed to a different instance can see a stale session. Store sessions in a distributed session store (Redis) that every instance reads directly, optionally combined with sticky sessions.

● Production incidentPOST-MORTEMseverity: high

The 4GB Container That Kept Dying

Symptom

A microservice serving product recommendations crashed every 30 minutes under moderate load. Heap dump showed 90% of memory was a single ConcurrentHashMap.

Assumption

Memory leak in the recommendation algorithm.

Root cause

The team had implemented an L1 cache (ConcurrentHashMap) with no eviction policy. It grew unbounded until the OOM killer killed the container. The -Xmx flag was set to 2GB inside a 4GB container limit, but no GC tuning could have saved it: every entry was still reachable from the map, so the collector could never reclaim it.

Fix

Replaced the raw ConcurrentHashMap with a Caffeine cache using maximumSize(10000) and expireAfterWrite(5 minutes). Also set -XX:+UseG1GC and -XX:MaxGCPauseMillis=200.

Key lesson

Never use a plain ConcurrentHashMap as a cache.
Always set a maximum size and TTL, even in dev.

Production debug guide: systematic recovery paths for the failure modes engineers actually hit.

Symptom · 01

High L1 miss rate (>50%) under load

→

Fix

1. Check L1 hit rate via metrics (Caffeine stats). 2. Increase maximumSize. 3. Increase TTL. 4. Check if keys are being invalidated too aggressively.

Symptom · 02

Redis CPU at 100%, connections timing out

→

Fix

1. Identify hot keys with redis-cli --hotkeys (requires an LFU maxmemory-policy). 2. Add L1 cache for those keys. 3. Shard Redis or increase replicas. 4. Add jitter to TTLs so expirations don't cluster.

Symptom · 03

Stale data served after write

→

Fix

1. Check if L1 invalidation is called on write path. 2. Verify invalidation event reaches all instances (check message bus). 3. Reduce L1 TTL as fallback.

★ Multi-Level Caching Triage Cheat Sheet: first-response commands for when things go wrong, copy-paste ready.

L1 cache hit rate dropped suddenly

Immediate action

Check if a deployment changed cache config or invalidated keys

Commands

curl 'localhost:8080/actuator/metrics/cache.gets?tag=name:l1&tag=result:hit'

jstat -gcutil <pid> 1000

Fix now

Increase L1 maximumSize by 20% and redeploy

Redis `OOM command not allowed when used memory > 'maxmemory'`

Application throws `RedisConnectionException`

Cache stampede: DB CPU spikes every 10 minutes

Feature / Aspect	In-Process L1 (Caffeine)	Distributed L2 (Redis)
Latency	<1ms	1-5ms
Capacity	Limited by heap (MB-GB)	Virtually unlimited (GB-TB)
Consistency	Per-instance, eventual	Shared; replicas lag the primary (async replication)
Eviction	Size-based (W-TinyLFU), TTL	TTL, LRU or LFU (via maxmemory-policy)
Persistence	None (lost on restart)	Optional (RDB/AOF)
Best for	Hot keys, high read rate	Shared data, large datasets

Key takeaways

Always pair an L1 cache with a bounded size and TTL; never use a raw ConcurrentHashMap.

Use cache-aside with invalidation for most workloads; write-through only when consistency is critical.

Add jitter to TTLs or use refresh-ahead to prevent thundering herds.

Measure L1 hit rate in production; if it's below 50%, the complexity isn't worth it.


Interview Questions on This Topic

Q01 · SENIOR

How does multi-level caching handle concurrent cache misses for the same...

Q02 · SENIOR

When would you choose write-through over cache-aside for multi-level cac...

Q03 · SENIOR

What happens when L1 cache is full and a new key is inserted? How does e...

Q04 · JUNIOR

What is the difference between cache invalidation and cache eviction?

Q05 · SENIOR

You notice that after a write, some instances serve stale data for up to...

Q06 · SENIOR

Design a multi-level cache for a social media feed that handles 1M reads...

Q01 of 06 · SENIOR

How does multi-level caching handle concurrent cache misses for the same key? What failure mode does it prevent?

ANSWER

It prevents the thundering herd problem. With a mutex or lock per key, only one thread fetches from L2/DB while others wait or get a stale value. Without it, all concurrent requests hit the backend simultaneously, causing overload.

