Senior 4 min · June 25, 2026

Distributed Caching: How to Stop Your Database From Begging for Mercy

Q: What is the difference between Redis and Memcached for distributed caching?

Redis supports rich data structures (hashes, lists, sets) and persistence, while Memcached is a simple key-value store with multi-threading for higher throughput. Use Redis when you need complex operations or data durability; use Memcached for simple, ephemeral caching at scale.

Q: How do I choose between cache-aside and read-through caching?

Cache-aside gives you control over cache population and invalidation — you write the logic. Read-through hides the DB call behind the cache, simplifying application code but requiring the cache to support it (e.g., Redis with a custom module). Use cache-aside for most systems; use read-through when you want to centralize caching logic.

Q: How do I prevent cache stampede in Redis?

Use a distributed lock (SET NX) around the cache miss logic so only one thread fetches from DB. Alternatively, use probabilistic early expiration: refresh the cache before TTL expires if the key is likely to be accessed soon. Redis doesn't have built-in support, but you can implement it with a background job.

Q: What happens to a distributed cache when a node fails?

With consistent hashing, only the keys on the failed node are lost. Requests for those keys will miss and hit the database. To mitigate, replicate keys across multiple nodes (e.g., each key stored on primary and one replica). Redis Cluster automatically promotes a replica if the master fails.

Distributed caching explained with production patterns, failure modes, and code.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Drawn from code that ran under real load.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

Distributed caching stores hot data across a cluster of servers, reducing database load and latency. Use it when a single cache server can't handle your traffic or you need high availability. The go-to tools are Redis and Memcached, with Redis offering richer data structures and persistence.

✦ Definition~90s read

What is Distributed Caching?

Distributed caching is a system where cache data is spread across multiple machines, providing low-latency access to frequently used data while offloading backend databases. It uses techniques like consistent hashing to distribute keys and handle node failures gracefully.

★

Imagine a library where the most popular books are kept at a front desk.

Plain-English First

Imagine a library where the most popular books are kept at a front desk. A single desk can only serve so many people. Distributed caching is like having multiple front desks across the building, each holding copies of the most requested books. If one desk gets crowded, you go to another. If a desk burns down, the others still have the books. The librarian uses a map (consistent hashing) to decide which desk holds which book, so you always know where to go.

Your database is screaming. It's 2 AM, traffic spiked, and your primary DB is at 100% CPU serving the same 10,000 rows over and over. You've got a cache — a single Redis instance — but it's also melting because every request hits it for the same keys. You need distributed caching. Not because it's trendy, but because your architecture is begging for it.

Distributed caching solves a specific problem: your data access pattern has a hot set that exceeds what one machine can handle. It spreads the load across multiple cache nodes, so no single node becomes the bottleneck. It also gives you fault tolerance — when a node dies, you don't lose the entire cache.

By the end of this, you'll be able to design a distributed cache that survives traffic spikes, node failures, and even your own deployment mistakes. You'll know when to use Redis Cluster vs. Memcached, how consistent hashing actually works, and the exact failure modes that will bite you in production.

Why Your Single Redis Instance Won't Cut It

You've outgrown single-node caching. The problem isn't just capacity — it's throughput. A single Redis instance can handle ~100K ops/sec on a good day. When your traffic demands 500K ops/sec, you need to split the load. But naive sharding (like key % N) is a trap. When you add or remove a node, almost every key remaps to a different server. That means a cache storm — all clients miss simultaneously and hit the database. I've seen this take down a production system at 3 AM. The fix is consistent hashing, which minimizes remapping when the cluster changes.

ConsistentHashingExample.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// Consistent hashing with virtual nodes
// Each physical node gets multiple virtual nodes on the hash ring
// to distribute load evenly.

import java.util.*;

class ConsistentHash<T> {
    private final int numberOfReplicas;
    private final SortedMap<Integer, T> circle = new TreeMap<>();

    public ConsistentHash(int numberOfReplicas, Collection<T> nodes) {
        this.numberOfReplicas = numberOfReplicas;
        for (T node : nodes) {
            add(node);
        }
    }

    public void add(T node) {
        // Add virtual nodes for each physical node
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hash(node.toString() + i), node);
        }
    }

    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hash(node.toString() + i));
        }
    }

    public T get(Object key) {
        if (circle.isEmpty()) return null;
        int hash = hash(key);
        // Find the first node clockwise from the key's hash
        if (!circle.containsKey(hash)) {
            SortedMap<Integer, T> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }

    private int hash(Object key) {
        // Use a good hash function like Murmur3 or FNV-1a
        return key.hashCode() & 0x7fffffff; // Simple for demo, use better in prod
    }
}

// Usage: ConsistentHash<String> cacheNodes = new ConsistentHash<>(200, nodes);

Output

No direct output — this is a data structure. When you add/remove a node, only 1/N keys remap (N = number of nodes).

Production Trap: Hash Function Quality

Using Java's hashCode() for consistent hashing is a disaster. It's not uniformly distributed — you'll get hot nodes. Use Murmur3 or FNV-1a. Redis Cluster uses CRC16. Test your hash distribution with a representative key set before going to production.

Sharding Strategy Decision Tree

IfCluster size rarely changes (< 1 change/month)

→

UseSimple modulo (key % N) — but only if you can tolerate full cache flush on change.

IfCluster size changes frequently or needs auto-scaling

→

UseConsistent hashing with 100-200 virtual nodes per physical node.

IfNeed replication for high availability

→

UseRedis Cluster (built-in) or client-side consistent hashing with primary-replica pairs.

thecodeforge.io

Distributed Caching Topologies and Trade-Offs

Distributed Caching

thecodeforge.io

Sharding: Naive vs. Consistent

Distributed Caching

Cache Topologies: Side Cache vs. Read-Through vs. Write-Through

How your application talks to the cache matters as much as the cache itself. The most common pattern is side cache (aka cache-aside): your app checks cache first, on miss it loads from DB and populates cache. Simple, but you have to handle stale data yourself. Read-through cache (like Redis with a custom module or a proxy) hides the DB call behind the cache — your app just asks the cache, and the cache fetches from DB if missing. Write-through cache updates the cache and DB in the same transaction — ensures consistency but adds latency. Write-behind (write-back) updates cache immediately and DB asynchronously — fast writes but risk of data loss on crash. I've used write-behind for a high-throughput analytics pipeline where losing a few seconds of data was acceptable. For a payments system? Never. Use write-through or side cache with strong consistency checks.

CacheAsidePattern.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// Cache-aside (side cache) pattern for a checkout service
// Uses Redis for caching user cart data.

import redis.clients.jedis.Jedis;
import com.google.gson.Gson;

public class CartService {
    private final Jedis cache;
    private final Database db;
    private final Gson gson = new Gson();
    private static final String CART_PREFIX = "cart:";
    private static final int TTL_SECONDS = 3600; // 1 hour

    public CartService(Jedis cache, Database db) {
        this.cache = cache;
        this.db = db;
    }

    public Cart getCart(String userId) {
        String key = CART_PREFIX + userId;
        String cached = cache.get(key);
        if (cached != null) {
            return gson.fromJson(cached, Cart.class);
        }
        // Cache miss — load from DB
        Cart cart = db.loadCart(userId);
        if (cart != null) {
            // Populate cache with TTL to avoid stale data
            cache.setex(key, TTL_SECONDS, gson.toJson(cart));
        }
        return cart;
    }

    public void updateCart(String userId, Cart cart) {
        // Write-through: update DB first, then invalidate cache
        db.saveCart(userId, cart);
        cache.del(CART_PREFIX + userId);
        // Next read will miss and fetch fresh data
    }
}

// Output: When getCart is called, if cache miss, DB is queried and cache is populated.
// On update, cache is invalidated to ensure consistency.

Output

On first call for user 'abc': cache miss → DB query → cache set. Subsequent calls: cache hit. On update: cache deleted → next read fetches fresh.

Senior Shortcut: Cache Invalidation Strategy

Don't try to update cache on writes — just delete the key. Let the next read populate it. This avoids race conditions where a stale write overwrites a fresh one. It's called 'lazy invalidation' and it's the simplest correct strategy for most systems.

thecodeforge.io

Cache-Aside Flow

Distributed Caching

Eviction Policies: What Happens When Memory Runs Out

Your cache has finite memory. When it's full, something has to go. The default in Redis is noeviction — it just returns errors on writes. That's a production disaster waiting to happen. You must set maxmemory-policy. The most common is allkeys-lru: evict the least recently used key regardless of TTL. For caches with TTLs, volatile-lru evicts only keys with TTL set. allkeys-lfu (least frequently used) works well for workloads with skewed access patterns. I once saw a team use noeviction on a Redis instance that stored session data. When memory filled up, users couldn't log in. The fix was adding maxmemory 4gb and maxmemory-policy allkeys-lru. Also, monitor eviction rate via INFO command — if it's high, your cache is too small or your TTLs are too long.

RedisEvictionConfig.systemdesignSYSTEMDESIGN

# io.thecodeforge — System Design tutorial

# Redis configuration for production cache
# Set in redis.conf or via CONFIG SET

maxmemory 4gb
maxmemory-policy allkeys-lru
# Also consider:
# maxmemory-policy volatile-lru  # only evict keys with TTL
# maxmemory-policy allkeys-lfu   # evict least frequently used

# Monitor evictions:
# redis-cli INFO stats | grep evicted_keys
# If evicted_keys per second > 100, increase maxmemory or reduce TTLs.

Output

No direct output — configuration. Use INFO stats to see evicted_keys.

Never Do This: noeviction in Production

Setting maxmemory-policy to noeviction (the default!) means Redis will reject write commands when memory is full. Your cache becomes read-only. Applications that try to set keys will get OOM errors. Always set an eviction policy. If you need to guarantee certain keys are never evicted, use a separate Redis instance with noeviction for those keys only.

Consistency and Stale Data: The CAP Trade-Off

Distributed caches are eventually consistent by nature. When you update data in the DB, the cache might serve stale data for a while. How stale? That depends on your invalidation strategy. The simplest approach is TTL-based expiry: set a reasonable TTL (e.g., 5 minutes) and accept that data can be up to 5 minutes stale. For stricter consistency, use write-through or publish-subscribe invalidation (e.g., Redis Pub/Sub to broadcast cache invalidation events). But even then, there's a window between DB write and cache invalidation where a read could get stale data. In practice, for most systems, a short TTL (seconds to minutes) is good enough. For systems that can't tolerate any staleness (like inventory in an e-commerce checkout), you might skip the cache entirely for that data or use a distributed lock to synchronize reads and writes. I've seen a booking system that cached seat availability with a 30-second TTL — double bookings happened. They switched to a write-through cache with Redis transactions (MULTI/EXEC) to atomically update DB and cache.

WriteThroughConsistency.systemdesignSYSTEMDESIGN

// io.thecodeforge — System Design tutorial

// Write-through with Redis transaction for seat booking
// Ensures cache and DB are updated atomically.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class BookingService {
    private final Jedis cache;
    private final Database db;

    public boolean bookSeat(String eventId, String seatId, String userId) {
        String cacheKey = "seat:" + eventId + ":" + seatId;
        // Watch the key for optimistic locking
        cache.watch(cacheKey);
        String currentOwner = cache.get(cacheKey);
        if (currentOwner != null) {
            // Seat already booked (in cache)
            return false;
        }
        // Attempt to book in DB
        boolean dbSuccess = db.bookSeat(eventId, seatId, userId);
        if (!dbSuccess) {
            cache.unwatch();
            return false;
        }
        // Update cache atomically
        Transaction t = cache.multi();
        t.set(cacheKey, userId);
        t.expire(cacheKey, 3600); // 1 hour TTL
        List<Object> results = t.exec();
        if (results == null) {
            // Transaction failed (key was modified by another client)
            // Rollback DB? Or handle conflict
            return false;
        }
        return true;
    }
}

// Output: Returns true if booking succeeded. Cache and DB are consistent.

Output

Returns true if booking succeeded. Cache and DB are consistent.

Interview Gold: CAP Theorem and Caching

In a distributed cache, you typically sacrifice strong consistency for availability and partition tolerance. That's why TTLs exist — they bound the inconsistency window. If an interviewer asks 'how do you handle cache consistency?', say: 'I use TTLs for bounded staleness, and for critical data, I use write-through with atomic transactions or a distributed lock.'

When Not to Use Distributed Caching

Distributed caching isn't free. It adds complexity: network latency (even localhost has overhead), serialization costs, and operational burden (monitoring, backups, cluster management). Don't use it if your entire dataset fits in memory on a single node and your traffic is moderate. A single Redis instance with 32GB RAM can handle most small-to-medium applications. Also, don't cache data that changes every millisecond — the cache invalidation overhead will kill you. For example, real-time stock prices: by the time you cache and serve, the price is stale. Better to use a streaming system. Finally, if your read-to-write ratio is close to 1:1, caching might not help much — you're just adding latency to every write. Measure your actual access patterns before adding a cache layer.

Senior Shortcut: Measure Before You Cache

Add metrics to your application: cache hit rate, latency percentiles, and database query frequency. If your hit rate is below 80%, your cache is likely doing more harm than good. Use tools like Redis's INFO command or a metrics dashboard (Datadog, Grafana).

● Production incidentPOST-MORTEMseverity: high

The 4GB Container That Kept Dying

Symptom

Every 45 minutes, one Redis node in our cluster would OOM-kill. The rest would spike to 200% CPU, then cascade. The entire payments service went down for 3 minutes each time.

Assumption

We assumed a memory leak in our application code — maybe we weren't setting TTLs correctly.

Root cause

We used a simple modulo-based sharding (key_hash % N). When a node died, the hash space shifted, causing a thundering herd as every client recomputed shards and hammered the remaining nodes. Also, we had no maxmemory-policy set, so Redis used default noeviction — it just crashed instead of evicting.

Fix

Switched to consistent hashing with 200 virtual nodes per physical node. Set maxmemory-policy allkeys-lru. Added a second replica for each key range. Pre-warmed cache on restart by replaying last 5 minutes of traffic from a log.

Key lesson

Never use simple modulo for distributed caching.
Always set an eviction policy.
And always plan for node failure — it's not if, but when.

Production debug guideSystematic recovery paths for the failure modes engineers actually hit.3 entries

Symptom · 01

High cache miss rate (>30%) under load

→

Fix

1. Check eviction rate: redis-cli INFO stats | grep evicted_keys. If high, increase maxmemory or reduce TTLs. 2. Check if keys are expiring too fast: redis-cli INFO stats | grep expires. 3. Verify hash distribution: use redis-cli --cluster check for Redis Cluster.

Symptom · 02

Cache node OOM kill (out-of-memory)

→

Fix

1. Check maxmemory setting: redis-cli CONFIG GET maxmemory. 2. Check eviction policy: redis-cli CONFIG GET maxmemory-policy. 3. If noeviction, change to allkeys-lru. 4. Reduce TTLs or add more nodes.

Symptom · 03

Thundering herd on DB after cache node failure

→

Fix

1. Implement consistent hashing with replication. 2. Add circuit breaker on DB calls (e.g., Hystrix). 3. Pre-warm cache on node restart by replaying recent traffic from logs. 4. Use a local cache (L1) to absorb initial misses.

★ Distributed Caching Triage Cheat SheetFirst-response commands for when things go wrong — copy-paste ready.

Cache miss rate spike — `INFO stats` shows `keyspace_misses` high−

Immediate action

Check if evictions are happening

Commands

redis-cli INFO stats | grep evicted_keys

redis-cli INFO stats | grep keyspace_hits

Fix now

Increase maxmemory by 20% or reduce TTLs. Set maxmemory-policy allkeys-lru if not set.

Redis OOM error in logs — `OOM command not allowed when used memory > 'maxmemory'`+

High latency on cache gets — `SLOWLOG` shows many slow commands+

Cache cluster split-brain or inconsistent data+

Feature / Aspect	Redis Cluster	Memcached
Data Structures	Strings, hashes, lists, sets, sorted sets, streams	Strings only (key-value)
Persistence	RDB snapshots, AOF logs	None (volatile)
Replication	Built-in (master-replica, cluster)	No built-in replication (client-side)
Sharding	Automatic via hash slots (CRC16)	Client-side consistent hashing
Eviction Policies	Multiple (LRU, LFU, TTL, etc.)	Only LRU
Use Case	Rich data, persistence needed, complex operations	Simple key-value, high throughput, ephemeral data

Key takeaways

Distributed caching is about spreading load and surviving failures

not just adding memory. Use consistent hashing to minimize disruption when nodes change.

Always set an eviction policy (allkeys-lru) and a TTL. Without them, your cache becomes a memory leak that eventually crashes.

Cache consistency is a spectrum. Use TTLs for bounded staleness, write-through for strong consistency, and lazy invalidation (delete on write) for simplicity.

Measure before you cache. If your hit rate is below 80%, you're adding complexity without benefit. Profile your access patterns first.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

How does Redis Cluster handle a network partition? What happens to reads...

Q02SENIOR

When would you choose Memcached over Redis Cluster in a production syste...

Q03SENIOR

What happens when a consistent hashing ring is imbalanced? How do you mi...

Q04JUNIOR

What is cache stampede and how do you prevent it?

Q05SENIOR

A production incident: your cache hit rate dropped from 95% to 40% after...

Q06SENIOR

Design a distributed cache for a social media feed that supports 10 mill...

Q01 of 06SENIOR

How does Redis Cluster handle a network partition? What happens to reads and writes during a split?

ANSWER

Redis Cluster uses a gossip protocol to detect partitions. If a majority of master nodes can't reach a minority, the minority stops accepting writes (to prevent split-brain). Reads are still served from replicas if configured. When the partition heals, the minority nodes sync from the majority. This means during a partition, some data may be unavailable for writes, but consistency is preserved.

FAQ · 4 QUESTIONS

Frequently Asked Questions

What is the difference between Redis and Memcached for distributed caching?

How do I choose between cache-aside and read-through caching?

How do I prevent cache stampede in Redis?

What happens to a distributed cache when a node fails?

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Drawn from code that ran under real load.

✓ Verified

production tested

June 25, 2026

last updated

1,663

articles · all by Naren

🔥

That's Components. Mark it forged?

4 min read · try the examples if you haven't