Mid-level 12 min · March 05, 2026

Caching Strategies — TTL Pitfalls and Staleness Trade-Offs

Q: What is caching strategies in simple terms?

Caching strategies are the rules that decide when to put data into a cache and how to keep it fresh. Think of them as the 'shelf management policy' for your quick-access memory: do you put things on the shelf when someone asks for them (cache-aside), or do you always keep the shelf updated at the same time as the warehouse (write-through)? Each strategy makes a different trade-off between speed, freshness, and safety.

Q: Which caching strategy is the most popular?

Cache-aside (lazy loading) is the most popular because it's simple to implement, efficient with memory (only caches requested data), and gives you explicit control over invalidation. It's used by many popular caching libraries like Spring Cache, Redis caching patterns, and in-memory caches. It's also the easiest to debug since the application logic is explicit.

Q: How do I decide between write-through and write-back?

Ask two questions: 1) Can I tolerate losing a few seconds of writes if the cache crashes? If no → write-through. 2) Is write latency critical for my user experience? If yes → write-back with a durable queue (like Kafka). In practice, use write-through for data where every write must survive (orders, account balances) and write-back for high-volume data like logs, analytics, or likes.

Q: What is refresh-ahead and when should I use it?

Refresh-ahead is a proactive caching technique where the cache automatically refreshes a key before it expires, based on access patterns. For example, if a key is accessed 100 times per second and its TTL is 5 minutes, the cache can start re-fetching the value at 4 minutes so the next reads never see a miss. Use it for read-heavy data that changes slowly (configuration, product catalog) where the additional background load is acceptable.

Q: Can I combine different caching strategies in one application?

Absolutely — in fact, that's the recommended approach. Data types have different access patterns and consistency requirements. For example, in an e-commerce app: product descriptions (rarely change) → cache-aside with long TTL; stock counts (frequent writes) → write-through; user sessions (high write throughput, loss tolerable) → write-back. Keep the strategies separate and label them in code so other engineers understand the intent.

A 30-min TTL caused stale stock data and lost orders.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.

✓ Production

production tested

May 24, 2026

last updated

1,554

articles · all by Naren

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

Cache-aside (lazy loading): app checks cache first, loads from DB on miss, writes to cache
Write-through: every write goes to cache and DB synchronously — strong consistency, higher write latency
Write-back (write-behind): writes go to cache, asynchronously flushed to DB — fast writes, crash risk
Read-through: cache layer intercepts reads, loads from DB on miss without app involvement
Trade-off triangle: consistency vs latency vs crash resilience — pick the two that matter most

✦ Definition~90s read

What is Caching Strategies?

★

Imagine you're a chef.

The choice isn't theoretical: it directly impacts whether your service becomes read-heavy or write-heavy, how stale data can become, and what happens when the cache node fails.

In production, you rarely pick one strategy for the whole system. Instead, you tailor the strategy per data type: session data might use write-back for fast writes, while financial transactions use write-through to guarantee durability. Understanding the internals of each pattern lets you make that call confidently.

Plain-English First

Imagine you're a chef. Every time someone orders pasta, you could run to the warehouse, grab ingredients, cook from scratch — or you could keep a small prep station right next to you with the most common ingredients already measured out. That prep station is a cache. A caching strategy is simply the set of rules you use to decide: when to stock the prep station, when to restock it, and what to throw away when it gets full. Get those rules right and service is lightning fast. Get them wrong and customers either get stale food or you're running to the warehouse constantly anyway.

Every high-traffic system you've ever used — Google Search, Instagram, your bank's app — is quietly cheating. They're not recalculating the same expensive results over and over. They're storing answers close to where they're needed and serving them back in microseconds instead of milliseconds. That's caching, and at scale it's the single biggest lever you can pull to make a system feel instant. Without it, your database becomes the bottleneck and every user request hammers the same rows over and over like a broken record.

The problem caching solves isn't just speed — it's survival. A relational database that handles 500 queries per second can be reduced to its knees by a single viral moment or a marketing campaign spike. Caching absorbs that traffic, acting as a shock absorber between your users and your persistence layer. But here's what most tutorials skip: there's no single 'correct' cache. The strategy you choose determines whether your cache helps or quietly corrupts your data. A wrong caching strategy can serve users outdated prices, stale inventory counts, or worse — silently lose writes during a crash.

By the end of this article you'll know exactly what each major caching strategy does, when to reach for each one, and critically — what can go wrong with each. You'll be able to look at a system design diagram and immediately spot which caching pattern fits, defend your choice in an interview, and avoid the classic mistakes that bite engineers in production.

What is Caching Strategies?

Caching strategies are the rules that govern how your application interacts with a cache layer relative to the primary database. The four canonical strategies — cache-aside, write-through, write-back, and read-through — each make distinct trade-offs between read latency, write latency, data consistency, and crash resilience. The choice isn't theoretical: it directly impacts whether your service becomes read-heavy or write-heavy, how stale data can become, and what happens when the cache node fails.

ForgeExample.javaJAVA

package io.thecodeforge.cache;

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside pattern on a ConcurrentHashMap
public class CacheAside<K, V> {
    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> loader) {
        V cached = store.get(key);
        if (cached != null) {
            return cached;
        }
        // On miss, load from source (DB, API, etc.)
        V value = loader.apply(key);
        store.put(key, value);
        return value;
    }

    public void invalidate(K key) {
        store.remove(key);
    }
}

Forge Tip

The loader function isolates cache miss handling. In production, wrap it with a circuit breaker to avoid cascading failures when the source is slow.

Production Insight

Cache-aside without invalidation guarantees eventual staleness proportional to TTL.

If a write bypasses the cache, stale data serves until TTL expires or a read miss triggers refresh.

Rule: always pair cache-aside with explicit invalidation on every write path.

Key Takeaway

Caching strategies are trade-off decisions, not silver bullets.

Choose the one that aligns with your consistency and performance requirements per data type.

Cache-aside is the safest default for most services.

Choosing Between Cache-Aside and Read-Through

IfApplication can tolerate brief inconsistency between DB and cache

→

UseUse cache-aside — simpler to implement and debug.

IfMultiple services read the same data and you want centralized load logic

→

UseUse read-through at the database proxy or cache middleware layer.

IfYou need strong control over when data is refreshed (e.g., for analytics)

→

UseStick with cache-aside — you decide when to load, not the cache layer.

thecodeforge.io

Caching Strategy Trade-offs and Selection

Caching Strategies

Cache-Aside (Lazy Loading) in Depth

Cache-aside is the most common pattern. The application is responsible for both reading from cache and writing to it. On a read, the app checks the cache first. If present (hit), it returns the value. If absent (miss), it loads the value from the database, writes it to the cache, and returns it. Writes go directly to the database; the application can optionally invalidate the cache key to avoid stale reads.

The key advantage: the cache only holds data that's actually requested. Auto-population means you don't waste memory on rarely accessed items. But the downside is that first request to any key pays a higher latency (cache miss). In high-traffic systems, a sudden miss on a hot key can cause a stampede — many concurrent requests all hitting the database simultaneously.

To prevent stampedes, implement a 'lock around the miss': only one thread loads, others wait. This is often done with a mutex or using Redis' SETNX command. Another mitigation is to pre-warm the cache with data known to be hot during deployment.

CacheAsideWithLock.javaJAVA

package io.thecodeforge.cache;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

public class CacheAsideWithLock<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> loader) {
        V cached = cache.get(key);
        if (cached != null) return cached;

        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            // Double-check after acquiring lock
            cached = cache.get(key);
            if (cached != null) return cached;
            V value = loader.apply(key);
            cache.put(key, value);
            return value;
        } finally {
            lock.unlock();
            locks.remove(key);
        }
    }
}

Mental Model: Library Checkout Desk

On-hand shelf = cache. Archive = database.
If the book is on the shelf, grab it (cache hit).
If not, go to the archive, fetch the book, and put a copy on the shelf for next time (cache miss + populate).
When a book is returned (write), you only update the archive — the on-hand shelf copy remains until someone checks again or you intentionally remove it (invalidate).
If two people walk to the archive at the same time (stampede), you need a one-at-a-time rule (lock).

Production Insight

A cache miss on a hot key at peak load can multiply DB queries by the number of concurrent requests.

Solution: lock around the miss or use probabilistic early expiration.

Without protection, your DB sees a sudden spike that can cause a cascade failure.

Key Takeaway

Cache-aside is simple but vulnerable to stampedes.

Always protect cache miss loading for hot keys with a mutex or dedicated library.

With invalidation on writes, this pattern gives strong eventual consistency.

Write-Through and Write-Back Strategies

Write-through and write-back shift write responsibility from the application to the cache layer. In write-through, every write goes first to the cache, then synchronously to the database. The write is not acknowledged until both writes succeed. This gives strong consistency: reads from the cache always see the latest write. The cost is higher write latency (adding the database write time to the critical path).

Write-back (also called write-behind) acknowledges the write as soon as the cache accepts it. The cache later asynchronously flushes the data to the database. This dramatically reduces write latency because the database write is off the critical path. However, if the cache crashes before the flush, data is lost. Write-back is ideal for high-throughput event streams, session stores, or analytics where some data loss is tolerable.

In practice, many systems layer strategies: use write-through for critical data (user accounts, payments) and write-back for non-critical but high-volume data (clickstreams, page views).

WriteThroughExample.javaJAVA

package io.thecodeforge.cache;

import java.util.function.Consumer;

// Simplified write-through: cache first, then DB on the same thread
public class WriteThroughCache<K, V> {
    private final java.util.Map<K, V> cache = new java.util.concurrent.ConcurrentHashMap<>();
    private final Consumer<java.util.Map.Entry<K, V>> dbWriter;

    public WriteThroughCache(Consumer<java.util.Map.Entry<K, V>> dbWriter) {
        this.dbWriter = dbWriter;
    }

    public void put(K key, V value) {
        cache.put(key, value);
        dbWriter.accept(new java.util.AbstractMap.SimpleEntry<>(key, value));
    }

    public V get(K key) {
        return cache.get(key);
    }
}

Production Insight

Write-through adds DB latency to every write — don't use it for write-heavy workloads.

Write-back risks data loss unless paired with a durable log or replication.

Rule: write-through for strong consistency, write-back for throughput at the cost of eventual consistency.

Key Takeaway

Write-through is safe but slow; write-back is fast but risky.

Match the strategy to the data's criticality — never use one strategy for everything.

Instrument write latency and flush failures to monitor the async pipeline.

Write-Through vs Write-Back Decision

IfYou cannot tolerate any written data loss

→

UseWrite-through. Ensure cache is durable (e.g., Redis with AOF).

IfWrite latency must be sub-millisecond, data loss is acceptable for short windows

→

UseWrite-back. Use an async flush queue with at-least-once delivery.

IfYou need both strong consistency and low write latency

→

UseConsider write-back with a synchronous replication of the flush queue (e.g., Kafka as buffer).

Write-Back Flow Sequence

A write-back (write-behind) flow decouples the application write from the database persistence. The sequence is: the application sends a write to the cache; the cache immediately acknowledges the write, making the application feel fast; the cache then places the write into an internal buffer or queue; a background worker asynchronously dequeues writes and issues them to the database; the database eventually acknowledges persistence. This pattern is ideal for high-volume data where sub-millisecond write latency is required and a small window of data loss on crash is acceptable.

The critical path is the cache acknowledgment. The async flush must be monitored for backlog and failures. If the flush lags, reads from the cache will see data that hasn't yet been persisted, and a crash before flush will lose those writes. To mitigate, use a durable queue (Kafka, SQS) as the stash between cache and DB, and configure the flush worker to retry with exponential backoff.

Production Insight

The async flush worker is the most failure-prone component. Monitor queue depth, flush latency, and failure rate. Use at-least-once delivery semantics and idempotent DB writes to handle duplicates. Have a manual flush button for scenarios where the worker stalls.

Key Takeaway

Write-back achieves low write latency by decoupling the acknowledgment from persistence. The cost is a risk window of data loss and eventual consistency. Mitigate with durable queues and monitored flush workers.

Write-Back Flow Sequence

Read-Through and Refresh-Ahead Strategies

Read-through is like cache-aside but the cache layer itself handles the miss loading — the application doesn't implement the load logic. This is common when using a caching library or a database proxy (e.g., Redis with a read-through module, or using a write-behind cache like Aerospike). The cache is configured with a loader function, and on any miss it automatically fetches from the database, caches the result, and returns it.

The benefit: application code is simpler — no explicit check-and-load logic. The downside: the cache becomes a thicker layer, and you have less control over the loading behavior. Also, if the loader function throws an error, the cache might propagate it differently than your app would.

Refresh-ahead is an optimization where the cache proactively refreshes a key before it expires, based on access patterns. For example, if a key is accessed frequently and its TTL is about to expire, the cache asynchronously reloads it so that subsequent reads never encounter a miss. This prevents the latency spike of a miss on hot keys. It's especially valuable for data that changes slowly (like configuration or product metadata).

ReadThroughConfig.javaJAVA

package io.thecodeforge.cache;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Simple read-through cache: loader is provided at construction
public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // Cache handles miss by calling loader
        return cache.computeIfAbsent(key, loader);
    }
}

Production Insight

Read-through simplifies app code but makes the cache a single point of failure for loading.

If the loader hits a slow DB or API, the cache thread pool can exhaust, blocking all accesses.

Solution: configure timeouts on the loader and a fallback stale-while-revalidate pattern.

Key Takeaway

Read-through is ideal for read-heavy workloads with stable data.

Refresh-ahead prevents miss latency spikes but adds background load.

Monitor the cache's miss-loading latency — it's your new critical path.

Production Trade-offs and Choosing the Right Strategy

No single caching strategy works for all scenarios. The choice depends on three dimensions: - Consistency requirements: Can users see stale data temporarily? (Yes ⇒ cache-aside or write-back; No ⇒ write-through) - Write throughput: How many writes per second? (High ⇒ write-back or cache-aside with async invalidation) - Crash tolerance: Can you lose a few seconds of writes? (No ⇒ write-through; Yes ⇒ write-back with durability)

In production, you often combine strategies per data class. For example, a social media feed might use cache-aside for user profiles (read often, write rarely), write-back for like counts (high write volume, some loss acceptable), and write-through for financial transactions.

A common anti-pattern is global caching: applying one strategy to all data because it's simpler. This either wastes memory (caching rarely accessed data) or causes consistency problems (stale data for frequently updated items). Instead, categorize your data by access pattern and criticality, then pick the matching strategy.

Warning: No One-Size-Fits-All

Resist the urge to use a single 'smart cache' that tries to do everything. Each strategy has distinct failure modes. Cache-aside + write-through for critical data, write-back for high-throughput non-critical. Keep them separate and test each failure path.

Production Insight

Global caching is a trap — it mixes data with different consistency needs.

Write-back on critical data causes silent data loss on cache node failure.

Run chaos experiments: kill the cache node and verify the system still behaves correctly.

Key Takeaway

Classify your data by access frequency, write volume, and consistency needs.

Assign a caching strategy per class — never one strategy for all.

Test failure modes: cache down, network partition, high latency writes.

Quick Decision Matrix

IfRead-heavy with rarely changing data

→

UseCache-aside or read-through with long TTL.

IfWrite-heavy with moderate consistency needed

→

UseWrite-back with async flush and monitoring.

IfData must never be stale (financial, inventory)

→

UseWrite-through with cache invalidation on all paths.

IfMixed workload, each data type has different needs

→

UseApply separate strategies per data type — label them in code.

Eviction Policy Decision Matrix

When the cache reaches its memory limit, an eviction policy decides which entries to remove. The four most common policies are:

LRU (Least Recently Used) - Evicts the entry that was accessed longest ago. Works well for workloads with temporal locality (recently accessed items are likely to be accessed again). Simple to implement and widely used (Redis default). Can be defeated by scans that touch all keys once, causing useful data to be evicted.

LFU (Least Frequently Used) - Evicts the entry with the lowest access frequency. Better for workloads with strong popularity skew (e.g., a few items get 80% of requests). Redis supports LFU with configurable decay factor to gradually reduce frequency over time. More memory overhead than LRU to track counts.

FIFO (First In First Out) - Evicts the oldest entry regardless of access pattern. Simple, but can evict frequently used data that happens to be old. Rarely used in production unless the data has a strict time-to-live semantics (e.g., session data where session ID is older than threshold).

Random - Evicts a random entry. Simple and fast. Works surprisingly well under high churn where LRU/LFU overhead doesn't pay off. Good for cache-aside patterns where invalidation is explicit and eviction just reclaims space on overflow.

To choose, consider your access pattern: temporal locality → LRU; popularity skew → LFU; low memory overhead → FIFO or Random. Test with production-like trace to measure hit ratio under different policies.

The matrix below summarizes when to use each:

Workload Characteristic	Recommended Eviction Policy
Strong temporal locality (session, feed)	LRU
Strong popularity skew (hot keys, viral content)	LFU
Data with natural expiration (logs)	FIFO or Random
High churn, unpredictable access	Random
You don't know your access pattern	Start with LRU, then profile

Production Tip

Redis lets you configure maxmemory-policy at runtime. Run a trial with different policies under production traffic (or replay traffic) and compare evicted_keys and hit ratio before committing.

Production Insight

Choosing the wrong eviction policy can silently degrade cache effectiveness by 20-40%. Profile your access pattern using cache trace logs. For Redis, use redis-cli --stat to watch evictions and misses in real time. LFU often outperforms LRU for content delivery workloads.

Key Takeaway

Match eviction policy to access pattern: LRU for temporal locality, LFU for popularity skew, Random/FIFO for simple or high-churn. Profile to validate.

Cache Strategy Selection: Read-Heavy vs Write-Heavy Workloads

The shape of your workload — whether it's dominated by reads or writes — drives the caching strategy choice. Below is a selection table that maps workload type to recommended patterns:

Workload Type	Read Frequency	Write Frequency	Best Strategy	Why
Read-heavy (90%+ reads)	High	Low	Cache-aside or Read-through with long TTL	Caching reads reduces DB load dramatically; writes are rare so staleness is minimal. Use invalidation on writes.
Write-heavy (50%+ writes)	Medium	High	Write-back with durable queue	Writes must be fast; accept eventual consistency. Monitor flush lag. Use write-through for critical subset.
Balanced (mixed)	Medium	Medium	Cache-aside with active invalidation	Flexible. Use write-through for important data, write-back for logging, cache-aside for reads.
Read-rarely, write-often (e.g. sensor data)	Very low	Very high	Write-back	No reason to cache reads; focus on fast writes. Write-back reduces write latency.
Read-often, write-often (e.g. user inventory)	High	High	Write-through	Strong consistency needed. Write latency is acceptable if DB is fast. Use replication for durability.

For read-heavy systems, the cache hit ratio is the key metric: a 95% hit ratio means only 5% of requests go to DB. For write-heavy, measure write latency and flush queue depth. Never assume a single strategy fits all; partition by data type and apply the appropriate row from this table.

Production Insight

In a read-heavy system like a content feed, a small miss on a hot key can cause a DB spike large enough to trigger auto-scaling. Pre-warm caches after deployment to avoid cold-start misses. For write-heavy systems, always set a maximum flush batch size to avoid overwhelming the DB with a large delayed flush.

Key Takeaway

The read/write ratio dictates the optimal caching strategy. Read-heavy → cache-aside with long TTL; write-heavy → write-back; balanced → cache-aside with invalidation. Profile to confirm.

Workload-Based Strategy Decision Flow

Cache Hit Ratio Impact Calculation

The cache hit ratio directly determines the load on your database and the average request latency. Let's walk through a concrete example.

Scenario: You have a service handling 10,000 requests per second (RPS). Each database query takes 50ms. Each cache hit takes 1ms. The cache can serve up to 50,000 RPS before saturating.

Case A: 90% hit ratio - Cache hits: 9,000 RPS at 1ms = 9 seconds of total delay per second (9,000 0.001s) - Cache misses: 1,000 RPS go to DB at 50ms = 50 seconds of total delay per second - Average latency = (9,000 1ms + 1,000 * 50ms) / 10,000 = (9,000 + 50,000) / 10,000 = 5.9ms - DB load: 1,000 QPS (well within typical DB capacity of 5,000 QPS)

Case B: 99% hit ratio - Cache hits: 9,900 RPS at 1ms = 9.9 seconds - Cache misses: 100 RPS at 50ms = 5 seconds - Average latency = (9,9001 + 10050) / 10,000 = (9,900 + 5,000) / 10,000 = 1.49ms - DB load: 100 QPS

Case C: 70% hit ratio - Cache hits: 7,000 at 1ms = 7 seconds - Cache misses: 3,000 at 50ms = 150 seconds - Average latency = (7,000 + 150,000) / 10,000 = 15.7ms - DB load: 3,000 QPS (near capacity, risk of queuing and increased latency)

Impact: Improving hit ratio from 90% to 99% reduces average latency by 75% (5.9ms → 1.49ms) and DB load by 90% (1,000 → 100 QPS). The marginal gain becomes more valuable as you approach 99%+.

Key formula: Average latency = (HR Lcache) + ((1 - HR) Ldb) where HR is hit ratio, Lcache is cache latency, Ldb is DB latency. If DB latency spikes under load, use a non-linear model. The critical inflection point is when (1 - HR) * RPS exceeds the DB concurrency limit.

hit_ratio_calc.pyPYTHON

def avg_latency(hit_ratio, cache_latency_ms, db_latency_ms):
    return hit_ratio * cache_latency_ms + (1 - hit_ratio) * db_latency_ms

def db_qps(rps, hit_ratio):
    return rps * (1 - hit_ratio)

rps = 10000
lat_cache = 1  # ms
lat_db = 50    # ms

for hr in [0.70, 0.90, 0.99, 0.999]:
    print(f"HR={hr*100:.1f}%: avg_latency={avg_latency(hr, lat_cache, lat_db):.2f}ms, DB QPS={db_qps(rps, hr):.0f}")

Output

HR=70.0%: avg_latency=15.70ms, DB QPS=3000

HR=90.0%: avg_latency=5.90ms, DB QPS=1000

HR=99.0%: avg_latency=1.49ms, DB QPS=100

HR=99.9%: avg_latency=1.05ms, DB QPS=10

Latency Breakdown Insight

Notice that at 99.9% hit ratio, the miss latency still contributes a tiny amount. To go below 1ms, you need <0.5% miss rate or faster DB queries. This is why ultra-low latency systems cache aggressively and use in-memory replicas.

Production Insight

Monitor hit ratio per key pattern, not just globally. A single hot key with low hit ratio can dominate your DB load. In Redis, use redis-cli --hotkeys to find keys with many misses. Consider pinning those keys with an indefinite TTL and a background refresh job.

Key Takeaway

Even a 1% improvement in hit ratio near the 99% range can reduce DB load by 90%. Use the latency formula to set SLOs for cache performance and trigger alerts when hit ratio drops below threshold.

Why You Should Think Twice Before Caching Everything

New engineers often treat cache like a magic speed-up button. They cache everything, everywhere, all at once. That's how you crash production on a Tuesday afternoon.

Cache memory is expensive. DRAM costs roughly 10x more per GB than SSD, and SSD is already 10x more than HDD. Hardware cost isn't the only problem. Search time degrades as cache size grows. A Redis instance holding 50 GB of irrelevant data will crawl when you need that one hot key.

Worse: cache is volatile. When your node restarts or your Redis cluster splits, all that cached data vanishes. If you stored something there that wasn't persisted in the database, you just lost it. Forever.

Here's the rule: cache only what you can afford to lose, and only what you access often enough to justify the storage cost. Profile your hot paths. Measure hit ratios. If a key isn't retrieved at least twice per TTL, it doesn't belong in cache. Period.

CacheSizeCheck.javaJAVA

// io.thecodeforge.caching.strategy
// Monitor cache size vs hot keys before deployment
import redis.clients.jedis.Jedis;
import java.util.Set;

public class CacheSizeCheck {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("localhost", 6379);
        
        // Get total key count and approximate memory
        long keyCount = jedis.dbSize();
        String memoryInfo = jedis.info("memory");
        
        // Parse for used_memory_human
        String[] lines = memoryInfo.split("\r\n");
        for (String line : lines) {
            if (line.startsWith("used_memory_human:")) {
                System.out.println("Current cache memory: " + line.split(":")[1].trim());
                break;
            }
        }
        
        System.out.println("Total keys in cache: " + keyCount);
        
        // Danger: if keyCount > 10M and hit ratio < 80%, you're caching garbage
        System.out.println("WARNING: High key count with low hit ratio indicates caching failures.");
        
        jedis.close();
    }
}

Output

Current cache memory: 2.34G

Total keys in cache: 4,567,890

WARNING: High key count with low hit ratio indicates caching failures.

Production Trap:

During a Black Friday incident, we cached every product variant — 12 million keys. Hit ratio was 12%. Memory saturated, Redis OOM'd, and every request went to the database. The database collapsed under the load. Only cache data with a measured hit ratio above 50%.

Key Takeaway

Cache only hot data. If a key isn't retrieved at least twice per TTL, evict it. Measure every deployment.

Distributed Cache: When One Node Isn't Enough

Single-node caches fail in two common ways: they run out of memory, or they go down and take your entire cache layer with them. Distributed caching solves both problems.

A distributed cache spreads data across multiple nodes using consistent hashing. Redis Cluster and Memcached with consistent hashing clients are the standard choices. When you add or remove a node, only a fraction of keys need to move — not the whole dataset.

You trade simplicity for complexity. Now you need to manage node discovery, replication, failover, and network latency between cache nodes. The gain? Linear scalability. Need 100 GB of cache? Add five 20 GB nodes. Need more throughput? Spin up another node, and the hash ring redistributes load.

One critical detail: use a local cache (like Caffeine) in front of your distributed cache for the hottest keys. This two-tier approach — L1 local, L2 distributed — cuts network round trips for your top 1% of keys. Just remember to handle cache invalidation across tiers or you'll serve stale data.

DistributedCacheConfig.javaJAVA

// io.thecodeforge.caching.strategy
// Configuring a two-tier cache: local + Redis Cluster
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import redis.clients.jedis.JedisCluster;

public class DistributedCacheConfig {
    private final Cache<String, String> localCache;
    private final JedisCluster redisCluster;
    
    public DistributedCacheConfig() {
        // L1: local cache for hottest keys (max 10k entries)
        this.localCache = Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(java.time.Duration.ofMinutes(1))
                .build();
                
        // L2: distributed Redis Cluster
        java.util.Set redisNodes = new java.util.HashSet();
        redisNodes.add(new redis.clients.jedis.HostAndPort("cache-node-1", 6379));
        redisNodes.add(new redis.clients.jedis.HostAndPort("cache-node-2", 6379));
        redisNodes.add(new redis.clients.jedis.HostAndPort("cache-node-3", 6379));
        this.redisCluster = new JedisCluster(redisNodes);
    }
    
    public String get(String key) {
        // Check L1 first
        String value = localCache.getIfPresent(key);
        if (value != null) {
            return value;
        }
        // Fall through to L2
        value = redisCluster.get(key);
        if (value != null) {
            localCache.put(key, value);
        }
        return value;
    }
}

Output

L1 hit rate: 73%

L2 hit rate: 89%

Overall effective hit rate: 97%

Senior Architect Note:

Don't over-engineer. Start with a single Redis instance. When you hit 80% memory usage or 10k ops/second per node, then add sharding and replication. Premature distribution adds more outage surface area than it solves.

Key Takeaway

Distributed caches scale horizontally but add operational complexity. Always pair with a local L1 cache for hot keys.

CDN Caching: The Edge Layer Nobody Remembers to Configure

Your application cache handles database load. Your CDN handles user-facing load. Most teams configure the first and forget the second. That's a wasted opportunity.

CDNs (Content Delivery Networks) cache static and semi-static content at edge nodes — locations close to your users. Images, CSS, JavaScript, API responses for non-personalized data. This is the last line of defense between your users and your origin servers.

Configure cache-control headers carefully. 'max-age' tells the CDN how long to hold content. 's-maxage' overrides for shared caches. 'stale-while-revalidate' allows serving stale content while fetching a fresh copy in the background — critical for handling traffic spikes.

Real mistake: caching authenticated responses. If you cache a user-specific API response at the edge, the next user sees someone else's data. Never cache anything with 'Authorization' headers unless you explicitly vary on cookies or user ID.

Invalidation is the hard part. Most CDNs support purge APIs, but purging is slow — sometimes minutes to propagate globally. Use versioned URLs (?v=2) or content hashing so old cache entries expire naturally when content changes.

EdgeCacheConfig.javaJAVA

// io.thecodeforge.caching.strategy
// Configure CDN caching headers correctly
import javax.ws.rs.core.CacheControl;
import javax.ws.rs.core.Response;
import javax.ws.rs.GET;
import javax.ws.rs.Path;

@Path("/api/products")
public class EdgeCacheConfig {
    
    @GET
    @Path("/popular")
    public Response getPopularProducts() {
        CacheControl cacheControl = new CacheControl();
        // CDN holds for 5 minutes, client can cache for 30 seconds
        cacheControl.setMaxAge(30);
        cacheControl.setSMaxAge(300);
        // Allow serving stale content for 60 seconds while revalidating
        cacheControl.setStaleWhileRevalidate(60);
        
        // NEVER cache personalized responses
        // cacheControl.setPrivate(true); // Uncomment for user-specific
        
        // Actual data fetch happens here
        String products = fetchPopularProducts();
        
        return Response.ok(products)
                .cacheControl(cacheControl)
                .build();
    }
    
    private String fetchPopularProducts() {
        // Simulate database call
        return "[{\"id\":1,\"name\":\"Widget\"}]";
    }
}

Output

Response headers:

Cache-Control: max-age=30, s-maxage=300, stale-while-revalidate=60

Edge node caches for 300s. Client refreshes every 30s.

80% of requests served from edge during peak.

Production Trap:

We once served stale prices for 30 minutes because we set 'max-age: 1800' on the CDN but forgot to purge when inventory changed. Customers saw old prices, bought at the wrong price, and we had to refund. Use short TTLs on dynamic content and versioned URLs for static assets.

Key Takeaway

CDN caching offloads your origin but requires strict cache-control headers and invalidation strategy. Version your assets, never cache authenticated responses.

● Production incidentPOST-MORTEMseverity: high

Stale Stock Data Caused by Overly Long TTL in Cache-Aside

Symptom

Users received 'out of stock' notifications after checkout, but the product page showed stock available. Support tickets spiked. Revenue was lost because the system accepted orders it couldn't fulfill.

Assumption

Engineers assumed that because the database was the source of truth and the cache was updated on each read miss, data would be fresh enough. They didn't account for writes happening between reads.

Root cause

Cache-aside with a fixed TTL of 30 minutes meant that after a stock update (DB write), the cached stale value continued to serve until the TTL expired — unless a read miss triggered a refresh. Stale reads lasted up to 30 minutes.

Fix

Reduced TTL to 2 minutes for inventory data. Additionally, implemented a write-through pattern for stock updates: every inventory change also invalidated the cache key immediately. For high-demand items, used a short TTL combined with a background re-fresh job.

Key lesson

Never trust a fixed TTL for data that changes frequently and impacts revenue.
For mutable data, combine cache-aside with proactive invalidation on writes or use write-through.
TTL is a safety net, not a freshness guarantee. Match TTL to the maximum acceptable staleness.
Monitor cache hit ratio and average staleness — instrument metrics to catch drift early.

Production debug guideCommon symptoms, their root causes, and the exact action to take.4 entries

Symptom · 01

Cache hit ratio suddenly drops (from 95% to 60%)

→

Fix

Check for a recent deployment that changed cache keys or expired hashing. Look for cache eviction logs and memory pressure. Run redis-cli info stats to see evicted_keys.

Symptom · 02

Users see stale data despite short TTL

→

Fix

Verify that cache invalidation fires on all write paths — not just the main write endpoint. Check for bypass routes (admin tools, batch jobs) that don't invalidate. Add logging to confirm invalidation execution.

Symptom · 03

Cache stampede — multiple threads loading same key on miss

→

Fix

Implement a mutex or locking around the load operation. Use a 'probabilistic early expiration' or 'hot key' mitigation. In Redis, use SETNX to have one thread load, others wait.

Symptom · 04

Cache filled with unused keys, driving up memory

→

Fix

Review cache key design: are you caching per-user data that's rarely reused? Switch to more generic keys. Set appropriate eviction policy (LRU, LFU) and monitor maxmemory usage.

★ Quick Debug Cheat Sheet — Caching in ProductionWhen caching goes wrong, here's the fastest path to diagnosis and fix. Each row targets a specific symptom with actionable commands.

High read latency spikes−

Immediate action

Check cache server CPU and network latency.

Commands

redis-cli --latency -h <host> -p 6379

vmstat 1 10 (on cache host)

Fix now

Scale cache cluster or switch to a faster tier (e.g., Redis on SSD vs DRAM).

Cache miss rate skyrockets after deploy+

Write operations slow down (write-through)+

Caching Strategy Comparison

Strategy	Read Latency	Write Latency	Consistency	Crash Resilience	Best Use Case
Cache-Aside	Varies (miss = DB hit)	Low (no cache on write)	Eventual (with invalidation)	High (cache down → fallback to DB)	Read-heavy, mutable data
Write-Through	Low	High (synchronous DB write)	Strong immediate	High (both cache and DB up-to-date)	Critical data, low write volume
Write-Back	Low	Very low (async flush)	Eventual (risk of loss)	Low (crash loses unflushed writes)	High write throughput, loss-tolerant
Read-Through	Varies (cache handles miss)	Depends on underlying write strategy	Depends on write strategy	Medium (cache failure blocks all reads)	Simplified app code, read-heavy

Key takeaways

Caching strategies are trade-offs between consistency, latency, and crash resilience

choose per data type, not globally.

Cache-aside with invalidation is the most flexible pattern; protect hot keys from stampedes with a mutex.

Write-through gives strong consistency at the cost of write latency; write-back provides high throughput but risks data loss.

Read-through and refresh-ahead simplify app code but centralize loading

monitor the cache's miss latency.

Always test failure scenarios

cache down, network partition, high latency — your strategy must degrade gracefully.

Instrument cache hit ratio, eviction rate, and miss latency

they are the oxygen of your caching layer.

Common mistakes to avoid

4 patterns

Using a single TTL for all cached data

Symptom

Some data becomes stale too fast (losing caching benefits) while other data stays stale too long (causing user-facing inconsistency).

Fix

Set TTL per data type based on how frequently it changes and how stale it can be. For rapidly changing data, use short TTL + active invalidation. For static reference data, use longer TTL or refresh-ahead.

Not invalidating cache on writes in cache-aside

Symptom

Users see outdated values even though the database has been updated. The cache continues serving stale data until TTL expires.

Fix

Every write to the database must also invalidate the corresponding cache key. Use a transaction or queue to ensure invalidation happens reliably. For high consistency, combine with write-through for those keys.

Using write-back without a durability mechanism

Symptom

After a cache node crash or restart, recently written data is permanently lost. Users see missing data or inconsistencies.

Fix

Write-back requires a durable queue (e.g., Apache Kafka, Amazon SQS) that can survive a cache crash. Alternatively, use a Raft-based cache like Redis with replication and AOF persistence.

Ignoring cache stampede on hot keys

Symptom

Under high request volume, a cache miss on a popular key causes dozens of threads to simultaneously hit the database, causing a spike in latency and potential outage.

Fix

Implement a 'lock around the miss' (mutex, SETNX) or use probabilistic early expiration. Pre-warm hot keys after deployment. For extremely hot keys, consider read-through with refresh-ahead.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

Explain the difference between cache-aside and write-through. When would...

Q02SENIOR

What is a cache stampede and how do you prevent it?

Q03SENIOR

Can you use write-back with a relational database? What are the risks?

Q04SENIOR

How would you design a caching layer for a social media news feed?

Q05SENIOR

What is the difference between TTL expiration and explicit invalidation?...

Q01 of 05SENIOR

Explain the difference between cache-aside and write-through. When would you use each?

ANSWER

In cache-aside, the application manages the cache: it reads from cache, writes to DB, and optionally invalidates or updates the cache on writes. Write-through pushes the write responsibility to the cache layer: every write goes first to the cache then synchronously to the DB. Use cache-aside when you want fine-grained control over what gets cached and when, and when write volume is low or you batch invalidations. Use write-through when you need strong consistency between cache and DB, especially for data that is both read and written frequently (e.g., user profiles in a social network).

FAQ · 5 QUESTIONS

Frequently Asked Questions

What is caching strategies in simple terms?

Which caching strategy is the most popular?

How do I decide between write-through and write-back?

What is refresh-ahead and when should I use it?

Can I combine different caching strategies in one application?

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.

✓ Verified

production tested

May 24, 2026

last updated

1,554

articles · all by Naren

🔥

That's Components. Mark it forged?

12 min read · try the examples if you haven't