Caching Strategies — TTL Pitfalls and Staleness Trade-Offs
A 30-min TTL caused stale stock data and lost orders.
- Cache-aside (lazy loading): app checks cache first, loads from DB on miss, writes to cache
- Write-through: every write goes to cache and DB synchronously — strong consistency, higher write latency
- Write-back (write-behind): writes go to cache, asynchronously flushed to DB — fast writes, crash risk
- Read-through: cache layer intercepts reads, loads from DB on miss without app involvement
- Trade-off triangle: consistency vs latency vs crash resilience — pick the two that matter most
Imagine you're a chef. Every time someone orders pasta, you could run to the warehouse, grab ingredients, cook from scratch — or you could keep a small prep station right next to you with the most common ingredients already measured out. That prep station is a cache. A caching strategy is simply the set of rules you use to decide: when to stock the prep station, when to restock it, and what to throw away when it gets full. Get those rules right and service is lightning fast. Get them wrong and customers either get stale food or you're running to the warehouse constantly anyway.
Every high-traffic system you've ever used — Google Search, Instagram, your bank's app — is quietly cheating. They're not recalculating the same expensive results over and over. They're storing answers close to where they're needed and serving them back in microseconds instead of milliseconds. That's caching, and at scale it's the single biggest lever you can pull to make a system feel instant. Without it, your database becomes the bottleneck and every user request hammers the same rows over and over like a broken record.
The problem caching solves isn't just speed; it's survival. A relational database that handles 500 queries per second can be brought to its knees by a single viral moment or a marketing campaign spike. Caching absorbs that traffic, acting as a shock absorber between your users and your persistence layer. But here's what most tutorials skip: there's no single 'correct' cache. The strategy you choose determines whether your cache helps or quietly corrupts your data. A wrong caching strategy can serve users outdated prices, stale inventory counts, or worse, silently lose writes during a crash.
By the end of this article you'll know exactly what each major caching strategy does, when to reach for each one, and critically — what can go wrong with each. You'll be able to look at a system design diagram and immediately spot which caching pattern fits, defend your choice in an interview, and avoid the classic mistakes that bite engineers in production.
What Are Caching Strategies?
Caching strategies are the rules that govern how your application interacts with a cache layer relative to the primary database. The four canonical strategies (cache-aside, write-through, write-back, and read-through) each make distinct trade-offs between read latency, write latency, data consistency, and crash resilience. The choice isn't theoretical: it determines how much read and write traffic reaches the database, how stale cached data can become, and what happens when the cache node fails.
In production, you rarely pick one strategy for the whole system. Instead, you tailor the strategy per data type: session data might use write-back for fast writes, while financial transactions use write-through to guarantee durability. Understanding the internals of each pattern lets you make that call confidently.
Cache-Aside (Lazy Loading) in Depth
Cache-aside is the most common pattern. The application is responsible for both reading from the cache and writing to it. On a read, the app checks the cache first. If the value is present (a hit), it returns it. If it is absent (a miss), the app loads the value from the database, writes it to the cache, and returns it. Writes go directly to the database, and the application should then invalidate (delete) the cache key; otherwise readers keep getting the old value until the TTL expires.
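The read and write paths just described can be sketched in a few lines of Python. This is a minimal sketch, assuming plain dictionaries stand in for Redis and the database; `TTLCache` and `StockService` are hypothetical names, not a library API.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache server such as Redis (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


class StockService:
    """Cache-aside: the application, not the cache, owns the load and invalidate logic."""

    def __init__(self, db, cache, ttl_seconds=60):
        self.db = db              # dict standing in for the primary database
        self.cache = cache
        self.ttl_seconds = ttl_seconds
        self.db_reads = 0         # counts misses that reached the database

    def read(self, key):
        value = self.cache.get(key)
        if value is not None:                          # cache hit
            return value
        self.db_reads += 1                             # cache miss: load from the database...
        value = self.db[key]
        self.cache.set(key, value, self.ttl_seconds)   # ...and populate for next time
        return value

    def write(self, key, value):
        self.db[key] = value      # writes go straight to the database
        self.cache.delete(key)    # invalidate so the next read reloads fresh data


service = StockService(db={"sku:42": 10}, cache=TTLCache())
print(service.read("sku:42"), service.db_reads)  # 10 1  (miss, loaded from DB)
print(service.read("sku:42"), service.db_reads)  # 10 1  (hit)
service.write("sku:42", 7)
print(service.read("sku:42"), service.db_reads)  # 7 2   (invalidated, reloaded)
```

Deleting the key on write, rather than updating it in place, keeps the write path simple: the next read repopulates the cache from the source of truth.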
The key advantage: the cache only holds data that's actually requested. Because entries are populated on demand, you don't waste memory on rarely accessed items. The downside is that the first request for any key (and the first request after each expiry or eviction) pays the full database latency of a cache miss. In high-traffic systems, a sudden miss on a hot key can cause a stampede: many concurrent requests all hitting the database simultaneously.
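To prevent stampedes, implement a 'lock around the miss': only one thread loads the value while the others wait for it. Within a single process this can be a mutex; across processes, a common choice is the Redis SETNX command, or SET with the NX and PX options so the lock expires if its holder dies. Another mitigation is to pre-warm the cache during deployment with data known to be hot.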
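A lock around the miss can be sketched in-process with one `threading.Lock` per key and a second cache check after the lock is acquired (double-checked locking). This is an illustrative sketch, assuming a single Python process; `load_from_db` and its 50 ms delay are hypothetical. Across several servers the same pattern needs a distributed lock (for example, Redis SET with NX and PX), which is not shown here.

```python
import threading
import time
from collections import defaultdict

cache = {}                                  # stand-in for a shared cache (hypothetical)
key_locks = defaultdict(threading.Lock)     # one lock per cache key
key_locks_guard = threading.Lock()          # protects creation of per-key locks
db_loads = 0
db_loads_guard = threading.Lock()


def load_from_db(key):
    """Hypothetical slow database query (~50 ms)."""
    global db_loads
    with db_loads_guard:
        db_loads += 1
    time.sleep(0.05)
    return f"value-for-{key}"


def get_with_lock(key):
    value = cache.get(key)
    if value is not None:
        return value                        # fast path: cache hit, no locking
    with key_locks_guard:
        lock = key_locks[key]               # defaultdict creates the lock on first use
    with lock:                              # only one thread per key loads
        value = cache.get(key)              # re-check: another thread may have loaded it
        if value is None:
            value = load_from_db(key)
            cache[key] = value
    return value


# Simulate a stampede: 20 concurrent requests miss on the same hot key.
threads = [threading.Thread(target=get_with_lock, args=("hot-key",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_loads)  # 1: a single database load served all 20 requests
```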
- Think of a library: the on-hand shelf is the cache, and the archive is the database.
- If the book is on the shelf, grab it (cache hit).
- If not, go to the archive, fetch the book, and put a copy on the shelf for next time (cache miss + populate).
- When a book is returned (write), you only update the archive — the on-hand shelf copy remains until someone checks again or you intentionally remove it (invalidate).
- If two people walk to the archive at the same time (stampede), you need a one-at-a-time rule (lock).
Write-Through and Write-Back Strategies
Write-through and write-back shift write responsibility from the application to the cache layer. In write-through, every write goes first to the cache, then synchronously to the database. The write is not acknowledged until both writes succeed. This gives strong consistency: reads from the cache always see the latest write. The cost is higher write latency (adding the database write time to the critical path).
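The write-through ordering described above (cache first, then a synchronous database write, then acknowledgment) can be sketched as follows. This is a minimal single-threaded sketch; `FlakyDB` and `WriteThroughCache` are hypothetical names, and evicting the cache entry when the database write fails is one assumed way to keep the two stores consistent. Under concurrency a reader could briefly see the new value before the database commits, which is one reason some implementations write the database first.

```python
class FlakyDB:
    """Hypothetical database stand-in; set fail_next to simulate one failed write."""

    def __init__(self):
        self.rows = {}
        self.fail_next = False

    def write(self, key, value):
        if self.fail_next:
            self.fail_next = False
            raise OSError("database write failed")
        self.rows[key] = value


class WriteThroughCache:
    def __init__(self, db):
        self.db = db
        self.data = {}

    def put(self, key, value):
        self.data[key] = value            # 1. write to the cache
        try:
            self.db.write(key, value)     # 2. synchronous DB write on the critical path
        except OSError:
            self.data.pop(key, None)      # evict so the cache never serves an unpersisted value
            raise                         # the caller sees the failure; nothing is acknowledged
        # 3. returning normally is the acknowledgment: both writes succeeded

    def get(self, key):
        if key not in self.data and key in self.db.rows:
            self.data[key] = self.db.rows[key]  # reload from the DB after an eviction
        return self.data.get(key)


db = FlakyDB()
store = WriteThroughCache(db)
store.put("user:1", "alice")
db.fail_next = True
try:
    store.put("user:1", "bob")
except OSError as exc:
    print("write rejected:", exc)   # write rejected: database write failed
print(store.get("user:1"))          # alice: the cache agrees with the DB
```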
Write-back (also called write-behind) acknowledges the write as soon as the cache accepts it. The cache later asynchronously flushes the data to the database. This dramatically reduces write latency because the database write is off the critical path. However, if the cache crashes before the flush, data is lost. Write-back is ideal for high-throughput event streams, session stores, or analytics where some data loss is tolerable.
In practice, many systems layer strategies: use write-through for critical data (user accounts, payments) and write-back for non-critical but high-volume data (clickstreams, page views).
Write-Back Flow Sequence
A write-back (write-behind) flow decouples the application write from database persistence. The sequence is:
- The application sends a write to the cache.
- The cache stores the value and appends the write to an internal buffer or queue.
- The cache immediately acknowledges the write, so the application sees a fast response.
- A background worker asynchronously dequeues writes and issues them to the database.
- The database eventually acknowledges persistence.
This pattern is ideal for high-volume data where sub-millisecond write latency is required and a small window of data loss on crash is acceptable.
The critical path is the cache acknowledgment. The async flush must be monitored for backlog and failures. If the flush lags, reads from the cache will see data that hasn't yet been persisted, and a crash before flush will lose those writes. To mitigate, use a durable queue (Kafka, SQS) as the stash between cache and DB, and configure the flush worker to retry with exponential backoff.
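The write-back flow, including a flush worker that retries with exponential backoff, can be sketched with a background thread draining a queue. This is an illustrative sketch only: the in-memory `queue.Queue` is exactly the crash-vulnerable buffer the text warns about (a production version would use a durable queue such as Kafka or SQS), and `FlakyDB`, `WriteBackCache`, the retry limits, and the dead-letter message are hypothetical.

```python
import queue
import threading
import time


class FlakyDB:
    """Hypothetical database stand-in that fails its first `failures` writes."""

    def __init__(self, failures=0):
        self.rows = {}
        self.failures = failures

    def write(self, key, value):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("transient DB failure")
        self.rows[key] = value


class WriteBackCache:
    def __init__(self, db, max_retries=5, base_delay=0.01):
        self.db = db
        self.data = {}
        self.pending = queue.Queue()   # in-memory buffer: lost on crash
        self.max_retries = max_retries
        self.base_delay = base_delay
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def put(self, key, value):
        self.data[key] = value          # store in the cache...
        self.pending.put((key, value))  # ...buffer for async persistence...
        # ...and acknowledge by returning, without waiting on the DB

    def get(self, key):
        return self.data.get(key)

    def _flush_loop(self):
        while True:
            key, value = self.pending.get()
            for attempt in range(self.max_retries):
                try:
                    self.db.write(key, value)
                    break
                except ConnectionError:
                    time.sleep(self.base_delay * 2 ** attempt)  # exponential backoff
            else:
                print(f"giving up on {key}; send to a dead-letter queue")  # hypothetical handling
            self.pending.task_done()


db = FlakyDB(failures=2)
cache = WriteBackCache(db)
cache.put("pageviews:home", 1001)
print(cache.get("pageviews:home"))  # 1001, served from cache regardless of flush progress
cache.pending.join()                # wait for the flush (demo only)
print(db.rows)                      # {'pageviews:home': 1001}
```

A single flush worker also preserves write order per key; with several workers you would need to partition writes by key so an older write cannot overwrite a newer one in the database.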
Read-Through and Refresh-Ahead Strategies
Read-through is like cache-aside, but the cache layer itself handles loading on a miss; the application doesn't implement the load logic. This is common when using a caching library or data grid (e.g., Caffeine's or Guava's LoadingCache in Java, or Hazelcast's MapLoader). The cache is configured with a loader function, and on any miss it automatically fetches from the database, caches the result, and returns it.
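A read-through cache can be sketched as a wrapper that owns a loader function. This is a minimal sketch of the idea, not the API of any particular caching library; `ReadThroughCache`, `load_product`, and the product data are hypothetical. If the loader raises, nothing is cached and the exception reaches the caller unchanged, which is the kind of error-propagation behavior worth checking in whatever library you adopt.

```python
from collections.abc import Callable, Hashable


class ReadThroughCache:
    """The cache, not the caller, owns the miss path via `loader`."""

    def __init__(self, loader: Callable[[Hashable], object]):
        self._loader = loader
        self._data: dict = {}

    def get(self, key: Hashable) -> object:
        if key not in self._data:
            self._data[key] = self._loader(key)  # miss: the cache loads and stores
        return self._data[key]


db = {"product:7": {"name": "Desk lamp", "price": 39}}  # hypothetical database
db_calls = []


def load_product(key):
    db_calls.append(key)  # record each trip to the database
    return db[key]


products = ReadThroughCache(load_product)
print(products.get("product:7")["name"])  # Desk lamp (loaded via load_product)
print(products.get("product:7")["name"])  # Desk lamp (served from cache)
print(len(db_calls))                       # 1
```

The application code only ever calls `products.get`; the check-and-load logic from cache-aside has moved inside the cache.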
The benefit: application code is simpler — no explicit check-and-load logic. The downside: the cache becomes a thicker layer, and you have less control over the loading behavior. Also, if the loader function throws an error, the cache might propagate it differently than your app would.
Refresh-ahead is an optimization where the cache proactively refreshes a key before it expires, based on access patterns. For example, if a key is accessed frequently and its TTL is about to expire, the cache asynchronously reloads it so that subsequent reads rarely encounter a miss. This prevents the latency spike of a miss on hot keys. It's especially valuable for data that changes slowly (like configuration or product metadata).
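Refresh-ahead can be sketched by serving an entry as usual but starting a background reload once it passes a fraction of its TTL. This is a sketch under stated assumptions: the 80% refresh threshold, the injectable fake clock, and all names are hypothetical, and a real cache would typically also weigh access frequency before refreshing a key.

```python
import threading
import time


class RefreshAheadCache:
    """Once an entry passes refresh_ratio * ttl, keep serving it but reload it in
    the background. The 0.8 default ratio is an illustrative assumption."""

    def __init__(self, loader, ttl=60.0, refresh_ratio=0.8, clock=time.monotonic):
        self.loader = loader
        self.ttl = ttl
        self.refresh_ratio = refresh_ratio
        self.clock = clock
        self.data = {}            # key -> (value, loaded_at)
        self.refreshing = set()   # keys with a background reload in flight
        self.lock = threading.Lock()

    def _load(self, key):
        try:
            value = self.loader(key)
            with self.lock:
                self.data[key] = (value, self.clock())
            return value
        finally:
            with self.lock:
                self.refreshing.discard(key)

    def get(self, key):
        now = self.clock()
        with self.lock:
            entry = self.data.get(key)
        if entry is None or now - entry[1] >= self.ttl:
            return self._load(key)                  # true miss: load synchronously
        value, loaded_at = entry
        if now - loaded_at >= self.refresh_ratio * self.ttl:
            with self.lock:
                start = key not in self.refreshing
                self.refreshing.add(key)
            if start:                               # at most one refresh per key
                threading.Thread(target=self._load, args=(key,), daemon=True).start()
        return value                                # serve the current value immediately


# Demo with a fake clock so timing is deterministic.
fake_now = [0.0]
config = {"feature_flags": "v1"}                    # hypothetical slow-changing data
cache = RefreshAheadCache(config.get, ttl=10.0, clock=lambda: fake_now[0])
print(cache.get("feature_flags"))   # v1 (miss, loaded synchronously)
config["feature_flags"] = "v2"
fake_now[0] = 9.0                   # past 80% of the TTL, not yet expired
print(cache.get("feature_flags"))   # v1 (served now; refresh starts in background)
```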
Production Trade-offs and Choosing the Right Strategy
No single caching strategy works for all scenarios. The choice depends on three dimensions:
- Consistency requirements: Can users see stale data temporarily? (Yes ⇒ cache-aside or write-back; No ⇒ write-through)
- Write throughput: How many writes per second? (High ⇒ write-back or cache-aside with async invalidation)
- Crash tolerance: Can you lose a few seconds of writes? (No ⇒ write-through; Yes ⇒ write-back, ideally backed by a durable queue)
In production, you often combine strategies per data class. For example, a social media feed might use cache-aside for user profiles (read often, write rarely), write-back for like counts (high write volume, some loss acceptable), and write-through for financial transactions.
A common anti-pattern is global caching: applying one strategy to all data because it's simpler. This either wastes memory (caching rarely accessed data) or causes consistency problems (stale data for frequently updated items). Instead, categorize your data by access pattern and criticality, then pick the matching strategy.
Eviction Policy Decision Matrix
When the cache reaches its memory limit, an eviction policy decides which entries to remove. The four most common policies are:
LRU (Least Recently Used) - Evicts the entry that was accessed longest ago. Works well for workloads with temporal locality (recently accessed items are likely to be accessed again). Simple to implement and widely used; Redis offers an approximated LRU through the allkeys-lru and volatile-lru policies, although its default maxmemory-policy is noeviction. Can be defeated by scans that touch many keys once, pushing useful data out of the cache.
LFU (Least Frequently Used) - Evicts the entry with the lowest access frequency. Better for workloads with strong popularity skew (e.g., a few items get 80% of requests). Redis supports LFU with a configurable decay time (lfu-decay-time) that gradually reduces access counts so old popularity fades. Exact LFU implementations need more bookkeeping than LRU to track counts.
FIFO (First In First Out) - Evicts the oldest entry regardless of access pattern. Simple, but can evict frequently used data that happens to be old. Rarely used in production unless the data has strict time-to-live semantics (e.g., sessions that expire after a fixed age).
Random - Evicts a random entry. Simple and fast. Works surprisingly well under high churn where LRU/LFU overhead doesn't pay off. Good for cache-aside patterns where invalidation is explicit and eviction just reclaims space on overflow.
To choose, consider your access pattern: temporal locality → LRU; popularity skew → LFU; minimal bookkeeping overhead → FIFO or Random. Test with a production-like trace to measure the hit ratio under each policy.
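Replaying a trace against simplified LRU, LFU, and FIFO caches shows how much the policy choice can move the hit ratio. This is a toy simulator, assuming exact (not sampled) eviction and an in-cache frequency count for LFU with no decay; Redis approximates LRU and LFU by sampling, so real numbers will differ. The two tiny traces are hypothetical: one hot key interrupted by a one-off scan, and one where the working set shifts.

```python
from collections import OrderedDict


def simulate(policy: str, trace: list[str], capacity: int) -> float:
    """Replay `trace` against a cache holding `capacity` keys; return the hit ratio.
    Simplified, exact policies: 'lru', 'lfu' (in-cache counts, no decay), 'fifo'."""
    cache = OrderedDict()  # key -> access count since insertion; order = eviction order
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache[key] += 1
            if policy == "lru":
                cache.move_to_end(key)           # most recently used goes to the back
            continue
        if len(cache) >= capacity:
            if policy == "lfu":
                victim = min(cache, key=cache.get)  # lowest count; ties go to oldest insert
                del cache[victim]
            else:
                cache.popitem(last=False)        # lru: least recent; fifo: first inserted
        cache[key] = 1
    return hits / len(trace)


skewed = ["A", "A", "A", "B", "C", "A"]         # hot key A, then a one-off scan of B and C
shifting = ["A", "A", "A", "B", "C", "B", "C"]  # working set moves from A to B and C
for policy in ("lru", "lfu", "fifo"):
    print(policy, round(simulate(policy, skewed, 2), 2), round(simulate(policy, shifting, 2), 2))
# lru 0.33 0.57
# lfu 0.5 0.29
# fifo 0.33 0.57
```

LFU keeps the hot key through the scan, but with no decay it also clings to the old favorite after the workload moves on, which is the problem Redis's counter decay addresses.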
The matrix below summarizes when to use each:
| Workload Characteristic | Recommended Eviction Policy |
|---|---|
| Strong temporal locality (session, feed) | LRU |
| Strong popularity skew (hot keys, viral content) | LFU |
| Data with natural expiration (logs) | FIFO or Random |
| High churn, unpredictable access | Random |
| You don't know your access pattern | Start with LRU, then profile |
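You can change Redis's maxmemory-policy at runtime (for example, CONFIG SET maxmemory-policy allkeys-lfu). Run a trial with different policies under production traffic (or replayed traffic) and compare evicted_keys and hit ratio before committing. Use redis-cli --stat for a live view of memory and request rates, and INFO stats for the evicted_keys, keyspace_hits, and keyspace_misses counters. LFU can outperform LRU for content delivery workloads with strong popularity skew.
Cache Strategy Selection: Read-Heavy vs Write-Heavy Workloads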
The shape of your workload — whether it's dominated by reads or writes — drives the caching strategy choice. Below is a selection table that maps workload type to recommended patterns:
| Workload Type | Read Frequency | Write Frequency | Best Strategy | Why |
|---|---|---|---|---|
| Read-heavy (90%+ reads) | High | Low | Cache-aside or Read-through with long TTL | Caching reads reduces DB load dramatically; writes are rare so staleness is minimal. Use invalidation on writes. |
| Write-heavy (50%+ writes) | Medium | High | Write-back with durable queue | Writes must be fast; accept eventual consistency. Monitor flush lag. Use write-through for critical subset. |
| Balanced (mixed) | Medium | Medium | Cache-aside with active invalidation | Flexible. Use write-through for important data, write-back for logging, cache-aside for reads. |
| Read-rarely, write-often (e.g. sensor data) | Very low | Very high | Write-back | No reason to cache reads; focus on fast writes. Write-back reduces write latency. |
| Read-often, write-often (e.g. user inventory) | High | High | Write-through | Strong consistency needed. Write latency is acceptable if DB is fast. Use replication for durability. |
For read-heavy systems, the cache hit ratio is the key metric: a 95% hit ratio means only 5% of requests go to DB. For write-heavy, measure write latency and flush queue depth. Never assume a single strategy fits all; partition by data type and apply the appropriate row from this table.
Cache Hit Ratio Impact Calculation
The cache hit ratio directly determines the load on your database and the average request latency. Let's walk through a concrete example.
Scenario: You have a service handling 10,000 requests per second (RPS). Each database query takes 50ms. Each cache hit takes 1ms. The cache can serve up to 50,000 RPS before saturating.
Case A: 90% hit ratio
- Cache hits: 9,000 RPS × 1 ms = 9 s of cumulative request time per second
- Cache misses: 1,000 RPS × 50 ms = 50 s of cumulative request time per second
- Average latency = (9,000 × 1 ms + 1,000 × 50 ms) / 10,000 = (9,000 + 50,000) / 10,000 = 5.9 ms
- DB load: 1,000 QPS (well within an assumed DB capacity of 5,000 QPS)
Case B: 99% hit ratio
- Cache hits: 9,900 RPS × 1 ms = 9.9 s
- Cache misses: 100 RPS × 50 ms = 5 s
- Average latency = (9,900 × 1 ms + 100 × 50 ms) / 10,000 = (9,900 + 5,000) / 10,000 = 1.49 ms
- DB load: 100 QPS
Case C: 70% hit ratio
- Cache hits: 7,000 RPS × 1 ms = 7 s
- Cache misses: 3,000 RPS × 50 ms = 150 s
- Average latency = (7,000 × 1 ms + 3,000 × 50 ms) / 10,000 = (7,000 + 150,000) / 10,000 = 15.7 ms
- DB load: 3,000 QPS (60% of the assumed capacity, enough for queuing to start inflating DB latency, so real latency will be worse than this estimate)
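Impact: Improving the hit ratio from 90% to 99% reduces average latency by about 75% (5.9 ms → 1.49 ms) and DB load by 90% (1,000 → 100 QPS). The marginal gain grows as you approach 99%+: moving from 90% to 99% cuts miss traffic tenfold, and moving from 99% to 99.9% would cut it tenfold again.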
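Key formula:
$$\bar{L} = HR \cdot L_{cache} + (1 - HR) \cdot L_{db}$$
where $HR$ is the hit ratio, $L_{cache}$ is the cache latency, and $L_{db}$ is the DB latency. This model assumes $L_{db}$ is constant; if DB latency spikes under load, treat $L_{db}$ as a function of DB load instead. The critical inflection point is when the miss traffic, $(1 - HR) \cdot RPS$, approaches the database's throughput capacity.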
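The worked cases and the key formula can be checked with a few lines of arithmetic. This sketch uses the same linear model, so it inherits the assumption that DB latency stays constant; the function names are hypothetical, and the 5,000 QPS capacity is the figure used in Case A.

```python
def cache_model(rps, hit_ratio, l_cache_ms, l_db_ms):
    """Linear model: avg latency = HR * L_cache + (1 - HR) * L_db.
    Returns (average latency in ms, DB queries per second)."""
    avg_latency_ms = hit_ratio * l_cache_ms + (1 - hit_ratio) * l_db_ms
    db_qps = (1 - hit_ratio) * rps
    return avg_latency_ms, db_qps


def max_rps_before_db_saturates(hit_ratio, db_capacity_qps):
    """Traffic level at which miss traffic, (1 - HR) * RPS, reaches DB capacity."""
    return db_capacity_qps / (1 - hit_ratio)


for hr in (0.70, 0.90, 0.99):
    latency, qps = cache_model(10_000, hr, l_cache_ms=1, l_db_ms=50)
    print(f"HR={hr:.0%}: avg latency {latency:.2f} ms, DB load {qps:,.0f} QPS")
# HR=70%: avg latency 15.70 ms, DB load 3,000 QPS
# HR=90%: avg latency 5.90 ms, DB load 1,000 QPS
# HR=99%: avg latency 1.49 ms, DB load 100 QPS

print(f"{max_rps_before_db_saturates(0.90, 5_000):,.0f}")  # 50,000
```

At a 90% hit ratio, miss traffic reaches the assumed 5,000 QPS database limit at 50,000 RPS, the same point at which the cache itself saturates in this scenario.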
Use redis-cli --hotkeys to find the most frequently accessed keys (it requires an LFU maxmemory-policy). Consider pinning those keys with no TTL and refreshing them with a background job.
Stale Stock Data Caused by Overly Long TTL in Cache-Aside
- Never trust a fixed TTL for data that changes frequently and impacts revenue.
- For mutable data, combine cache-aside with proactive invalidation on writes or use write-through.
- TTL is a safety net, not a freshness guarantee. Match TTL to the maximum acceptable staleness.
- Monitor cache hit ratio and average staleness — instrument metrics to catch drift early.
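The lessons above can be combined in one sketch: a TTL per data class matched to that class's maximum acceptable staleness, explicit invalidation on writes, and counters for hit ratio and the age of served entries. The data classes, TTL values, and `InstrumentedCache` are hypothetical. Age is only an upper bound on staleness (the source may not have changed), and a real system would export these numbers to a metrics backend rather than keep them in a list.

```python
import time

# Hypothetical data classes, each with a TTL equal to its maximum acceptable staleness.
TTL_SECONDS = {"stock": 5.0, "price": 60.0, "product_meta": 3600.0}


class InstrumentedCache:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}          # key -> (value, stored_at, ttl)
        self.hits = 0
        self.misses = 0
        self.served_ages = []   # age of each value served from cache

    def get(self, key, data_class, loader):
        now = self.clock()
        entry = self.data.get(key)
        if entry is not None and now - entry[1] < entry[2]:
            self.hits += 1
            self.served_ages.append(now - entry[1])
            return entry[0]
        self.misses += 1                                   # missing or expired
        value = loader(key)
        self.data[key] = (value, now, TTL_SECONDS[data_class])
        return value

    def invalidate(self, key):
        self.data.pop(key, None)   # call on every write to the source of truth

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def avg_served_age(self):
        return sum(self.served_ages) / len(self.served_ages) if self.served_ages else 0.0


fake_now = [0.0]
db = {"sku:1": 3}                                   # hypothetical stock table
cache = InstrumentedCache(clock=lambda: fake_now[0])
cache.get("sku:1", "stock", db.get)                 # miss: loads 3
fake_now[0] = 2.0
cache.get("sku:1", "stock", db.get)                 # hit: served value is 2 s old
db["sku:1"] = 0                                     # stock sells out...
cache.invalidate("sku:1")                           # ...and the write invalidates the key
print(cache.get("sku:1", "stock", db.get))          # 0, not the stale 3
print(round(cache.hit_ratio(), 2), cache.avg_served_age())  # 0.33 2.0
```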
Check evicted_keys in the output of redis-cli info stats to see whether memory pressure is evicting entries. For stampedes on hot keys, use SETNX so one thread loads while the others wait.
Common mistakes to avoid
Using a single TTL for all cached data
Not invalidating cache on writes in cache-aside
Using write-back without a durability mechanism
Ignoring cache stampede on hot keys
Interview Questions on This Topic
Explain the difference between cache-aside and write-through. When would you use each?