Cache Eviction Policies: Stop Guessing, Start Predicting What Gets Kicked Out
Cache eviction policies explained with production trade-offs.
20+ years shipping large-scale distributed systems. Written from production experience, not tutorials.
The most common cache eviction policies are LRU (Least Recently Used), LFU (Least Frequently Used), FIFO (First In First Out), and TTL-based expiry. Choose LRU for general-purpose caching with temporal locality, LFU for content with skewed popularity, and TTL for data that becomes stale after a fixed time.
Imagine your desk has room for only 5 sticky notes. When you need to add a 6th, you have to throw one away. LRU means you toss the note you haven't looked at the longest. LFU means you toss the one you've touched the fewest times. FIFO means you toss the oldest note regardless of use. TTL means you set a timer on each note and it self-destructs when time's up.
Your cache is lying to you. It's full of data nobody asked for in hours, while the hot keys your users actually need are getting evicted every 30 seconds. That's not a cache — that's a memory leak with extra steps. The problem isn't your cache size. It's your eviction policy. Pick the wrong one and your hit rate tanks, your database gets hammered, and your pager goes off at 3 AM. This article gives you the decision framework to choose the right eviction policy for your workload, the gotchas that'll burn you in production, and the exact commands to debug when things go sideways.
Why Your Cache Needs a Bouncer (and Why FIFO Is the Worst Bouncer)
A cache without an eviction policy is a ticking time bomb. The moment you hit capacity, writes fail or the process OOMs. The simplest policy — FIFO — evicts the oldest entry regardless of how popular it is. That's fine for a queue, terrible for a cache. In production, I've seen FIFO cause a 90% cache miss rate because a batch job loaded a million cold keys, pushing out all the hot ones. The database got hammered, latency spiked, and the cache became a liability. The fix: never use FIFO for caching. Use it only for rate-limiting buckets or job queues where order matters.
LRU: The Default You Should Actually Use (But Tune)
LRU evicts the entry that hasn't been accessed the longest. It exploits temporal locality — if you accessed something recently, you'll likely access it again soon. This is the default for Redis (allkeys-lru) and memcached. But LRU has a blind spot: a one-hit wonder that was accessed once and never again will sit in the cache until it's the least recently used among all entries. That's fine for most workloads. The real gotcha: LRU requires tracking access order. In a naive implementation (linked list + hashmap), every get() causes a promotion — moving the node to the head. Under high concurrency, that's a mutex bottleneck. Redis uses an approximation (sampling) to avoid this. For your own implementation, consider using an approximate LRU with a clock algorithm or a concurrent skip list.
LFU: When Popularity Matters More Than Recency
LFU evicts the entry with the lowest access frequency. It's ideal for workloads where a small set of keys gets the vast majority of requests — think CDN caching, API rate limit counters, or leaderboard data. The problem: LFU has a 'cold start' issue. A new key starts with frequency 1, so it's immediately vulnerable to eviction. You need a frequency aging mechanism — periodically halve all counters to prevent old popular keys from dominating forever. Redis implements LFU via a probabilistic counter (Morris counter) with logarithmic aging. The gotcha: LFU consumes more memory per key (to store frequency) and can be tricked by bursty traffic — a key that gets 100 requests in 1 second then nothing for an hour might be kept while a steadily used key gets evicted.
TTL: The Self-Destruct Mechanism (and Why You Need a Purge Thread)
TTL (Time-To-Live) isn't strictly an eviction policy — it's an expiry mechanism. But combined with any eviction policy, it's essential for data that becomes stale. The classic mistake: relying on lazy expiration (checking TTL on access). If a key is never accessed, it stays in memory forever. Redis has an active expiry mechanism that samples keys periodically, but it's not guaranteed. In production, I've seen a cache grow unbounded because TTL was set but the active expiry couldn't keep up with writes. Fix: run a background purge thread that scans and deletes expired keys. Or use a time-ordered data structure (like a priority queue of expiry times) to evict precisely.
When Eviction Policies Collide: The Hybrid Approach
Real production systems often combine policies. For example, Redis allows 'volatile-lru' — evicts among keys with TTL set, leaving persistent keys untouched. Or you can implement a two-tier cache: L1 (in-memory LRU) and L2 (Redis with LFU). The L1 handles hot keys with low latency; L2 handles warm keys with frequency-based retention. The gotcha: consistency between tiers. If L1 evicts a key, L2 might still have it stale. Use write-through or invalidation messages. Another hybrid: use TTL as the primary eviction trigger, but if the cache is full before TTL expires, fall back to LRU. This prevents stale data from lingering while still handling capacity spikes.
The Concurrency Nightmare: Locking, Striping, and Lock-Free Approaches
Eviction policies that require promotion on every get() (like LRU and LFU) are a concurrency nightmare under high throughput. A naive synchronized LRU cache becomes a bottleneck — all threads queue up for the lock. The fix: use concurrent data structures. For LRU, consider a ConcurrentLinkedHashMap (from Caffeine library) or a striped lock where each stripe handles a subset of keys. For LFU, use a concurrent frequency counter with atomic increments and a lock-free min-heap (tricky). Redis avoids this by using a single-threaded event loop — no locks needed. If you're building your own, measure before optimizing. Often, a simple synchronized cache with a hit rate >95% is fast enough because the lock is held briefly.
When Not to Use an Eviction Policy: The Anti-Cache Pattern
Sometimes you don't want eviction at all. If your data is small enough to fit in memory, just use a hash map. If your data has a natural expiry (like session tokens), use TTL-only with no max size — let expiry handle it. Eviction policies add complexity and overhead. I've seen teams implement LRU for a cache of 100 entries — the overhead of tracking order was higher than the cost of recomputing the data. Rule of thumb: if your cache hit rate is >99% with a simple TTL, don't add eviction. If your cache is large (>1GB) and access pattern is unpredictable, then reach for LRU or LFU.
The 4GB Container That Kept Dying
- FIFO eviction is almost never the right choice for a cache — it doesn't respect access patterns.
redis-cli info memory | grep used_memory_human. 2. Check eviction count: redis-cli info stats | grep evicted_keys. 3. If evictions are high, increase maxmemory or switch to a more efficient policy (e.g., allkeys-lru). 4. Analyze access pattern with redis-cli --hotkeys.kubectl describe pod <redis-pod> | grep Limits. 2. Check Redis maxmemory config: redis-cli config get maxmemory. 3. Ensure maxmemory is set to 80% of container limit. 4. Set maxmemory-policy to allkeys-lru. 5. Add memory request/limit to pod spec.redis-cli ttl <key>. 2. If TTL is -1 (no expiry), fix code to set TTL. 3. If TTL is set but stale, check for lazy expiration: expired keys linger until accessed. 4. Enable active expiry: redis-cli config set activedefrag yes. 5. Consider using a background purge thread in your app.redis-cli info stats | grep evicted_keysredis-cli config get maxmemoryredis-cli config set maxmemory-policy allkeys-lruKey takeaways
Interview Questions on This Topic
How does Redis implement LRU without a linked list?
Frequently Asked Questions
20+ years shipping large-scale distributed systems. Written from production experience, not tutorials.
That's Components. Mark it forged?
4 min read · try the examples if you haven't