Senior 6 min · March 05, 2026

Redis Write Failures — Default Eviction Policy

OOM errors from the default 'noeviction' policy: how to use MEMORY USAGE to detect session leaks, plus production debugging steps that help you avoid late-night pager calls.

Naren · Founder
Quick Answer
  • Redis is an in-memory data structure server with single-threaded command execution; most commands complete in microseconds
  • Uses RAM for storage, avoiding disk I/O latency (RAM access takes nanoseconds; disk access takes microseconds to milliseconds)
  • Core types: Strings, Hashes, Lists, Sets, Sorted Sets, each designed for a specific access pattern
  • Key expiry (TTL) prevents memory bloat and bounds how stale cached data can get
  • Production risk: under the default noeviction policy, writes start failing with OOM errors once memory reaches the maxmemory limit, with no warning unless you monitor memory
Plain-English First

Imagine your brain vs a filing cabinet. When your teacher asks you the capital of France, you don't go rummaging through a cabinet; you just know it instantly because it's already in your head. Redis is that instant recall for your application. Your main database (PostgreSQL, MySQL) is the filing cabinet: thorough but slow. Redis keeps its data in RAM, so fetching a value takes microseconds instead of milliseconds. It's a supercharged sticky-note board your entire server cluster can share.

Every high-traffic application hits the same wall eventually: the database becomes the bottleneck. A query that takes 40ms feels invisible during development but becomes catastrophic when 10,000 users hit it simultaneously. Twitter, GitHub, Stack Overflow, and Shopify all reached this wall, and Redis is a big part of how they broke through it. Understanding Redis well is one of the clearest steps from writing features to designing systems that actually scale.

Redis (Remote Dictionary Server) is built for read-heavy workloads. Most web applications read data far more than they write it: a product page might be read 50,000 times a day but updated once. Hammering your relational database with 50,000 identical queries is wasteful and slow. Redis lets you compute the answer once, store it in memory, and serve each of those 50,000 requests from there in microseconds. But Redis isn't just a cache; it's a full data structure server that can power rate limiters, leaderboards, pub/sub messaging, session stores, and queues.

By the end of this article you'll understand not just what Redis commands look like, but WHY each data structure exists, WHEN to reach for each one, and how to wire Redis into a real application pattern. You'll also learn the subtle mistakes — wrong expiry strategies, cache stampedes, missing persistence configs — that trip up developers who learned Redis from a cheat sheet instead of from first principles.

Why Redis Lives in RAM and Why That Changes Everything

Traditional databases store data on disk. Disk access, even on an NVMe SSD, takes microseconds to milliseconds. RAM access takes nanoseconds. That's not a small difference: it's roughly two to five orders of magnitude, depending on the device. Redis keeps its entire dataset in memory (writing to disk is optional), which is the single most important architectural decision behind its speed.

But speed isn't the only trick. Redis is single-threaded for command execution. That sounds like a limitation until you understand what it eliminates: lock contention. In a multi-threaded database, threads fight over the same rows with locks. Redis sidesteps that fight entirely: one command runs to completion before the next starts. This makes every individual command atomic, which matters enormously for things like incrementing a counter with INCR. A check-then-set that spans two commands is not automatically atomic, though; express it as a single command (SET key value NX), a WATCH plus MULTI/EXEC transaction, or a Lua script.
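To see why one atomic command matters, here is a small pure-Python model (no Redis server involved) of two clients incrementing the same counter. `TinyStore` is a hypothetical stand-in, not redis-py: its `incr()` mimics INCR by doing the read, the +1, and the write as one indivisible step, while the GET-then-SET path interleaves two clients the way a real race would.

```python
class TinyStore:
    """Hypothetical in-memory stand-in for a Redis string keyspace (not redis-py)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

    def incr(self, key):
        # Models INCR: read, add, and write happen as one step, just as Redis
        # runs each command to completion before starting the next one.
        value = int(self.data.get(key, "0")) + 1
        self.data[key] = str(value)
        return value


def racy_double_increment(store, key):
    """Two clients each do GET, add 1, SET, with their steps interleaved."""
    seen_by_a = int(store.get(key) or 0)  # client A reads 0
    seen_by_b = int(store.get(key) or 0)  # client B also reads 0
    store.set(key, str(seen_by_a + 1))    # A writes 1
    store.set(key, str(seen_by_b + 1))    # B overwrites with 1: A's update is lost
    return int(store.get(key))


def atomic_double_increment(store, key):
    """The same two increments, each done with a single INCR-style call."""
    store.incr(key)
    store.incr(key)
    return int(store.get(key))


print(racy_double_increment(TinyStore(), "page:hits"))    # 1
print(atomic_double_increment(TinyStore(), "page:hits"))  # 2
```

With redis-py the atomic path is simply `redis_client.incr(key)`; the racy version is what you get when you GET a value, change it in application code, and SET it back.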

Redis also supports optional persistence. You can have it snapshot its RAM contents to disk at configurable intervals (RDB snapshotting) or log every write command to an append-only file (AOF). Many production setups use both. This means Redis isn't just a volatile cache: it can survive a restart and recover its data.

The practical takeaway: use Redis for data that is read far more than it's written, where milliseconds matter, and where you can tolerate the data being slightly stale or reproducible if lost.

redis_getting_started.sh (Bash)
# --- Step 1: Start the Redis server (run in one terminal) ---
# This launches Redis with default config on port 6379
redis-server

# --- Step 2: Connect with the Redis CLI (run in another terminal) ---
redis-cli

# --- Step 3: Ping the server to confirm it's alive ---
127.0.0.1:6379> PING
# Redis responds with PONG — the simplest health check you'll ever do

# --- Step 4: Store a string value (the most basic operation) ---
# SET key value
# We're caching a user's display name keyed by their user ID
127.0.0.1:6379> SET user:1001:display_name "Alice Nguyen"

# --- Step 5: Retrieve it ---
127.0.0.1:6379> GET user:1001:display_name

# --- Step 6: Store a value with an expiry (TTL = Time To Live) ---
# EX sets expiry in seconds — this key auto-deletes after 300 seconds (5 minutes)
# This is the pattern for caching: store it, let it expire, recompute if missing
127.0.0.1:6379> SET product:42:price "29.99" EX 300

# --- Step 7: Check how many seconds remain before expiry ---
127.0.0.1:6379> TTL product:42:price

# --- Step 8: Check if a key exists without fetching its value ---
127.0.0.1:6379> EXISTS user:1001:display_name
Output
# redis-server output (abbreviated):
# * Ready to accept connections on port 6379
# redis-cli responses:
PONG
OK
"Alice Nguyen"
OK
(integer) 298 # seconds remaining, decreasing in real time
(integer) 1 # 1 = key exists, 0 = it doesn't
Pro Tip: Use Colons as Namespace Separators
Redis has no concept of tables or schemas. The community convention is to structure keys like entity:id:field — e.g., user:1001:display_name or session:abc123. This makes keys self-documenting and lets tools like RedisInsight group them visually. Never use flat keys like displayname1001 — you'll hate yourself when you have 2 million keys to debug.
Production Insight
Single-threaded means one slow command blocks all others: a KEYS * over 10M keys can freeze Redis for seconds.
Latency jumps from microseconds to milliseconds when the host runs out of RAM and the OS starts swapping Redis to disk; hitting maxmemory instead triggers eviction (or OOM errors under noeviction).
Rule: in production, monitor used_memory against maxmemory, plus instantaneous_ops_per_sec; never use KEYS in app code (use SCAN instead).
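The "never use KEYS" rule usually translates to iterating with SCAN, which returns a cursor plus a small batch of keys per call, so no single call blocks the server for long. A minimal sketch, assuming a redis-py-style `client.scan(cursor=..., match=..., count=...)` that returns `(next_cursor, keys)`; `FakeScanClient` is a hypothetical stand-in so the example runs without a server. redis-py's `scan_iter()` wraps this same loop for you.

```python
import fnmatch


def keys_matching(client, pattern, batch=500):
    """Collect keys matching `pattern` with SCAN instead of KEYS."""
    found = set()   # SCAN may return a key more than once, so de-duplicate
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=batch)
        found.update(keys)
        if cursor == 0:   # Redis signals "iteration finished" with cursor 0
            return found


class FakeScanClient:
    """Hypothetical stand-in mimicking the shape of redis-py's scan() reply."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def scan(self, cursor=0, match=None, count=10):
        page = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0
        # Like Redis, MATCH filters a page after it is fetched, so pages can be empty
        return next_cursor, [k for k in page if match is None or fnmatch.fnmatchcase(k, match)]


demo = FakeScanClient(["session:a1", "session:b2", "user:1001", "user:1002"])
print(sorted(keys_matching(demo, "session:*", batch=2)))  # ['session:a1', 'session:b2']
```

Note that COUNT is only a hint to Redis about how much work to do per call; the loop is correct regardless of how many keys each page returns.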
Key Takeaway
Redis is fast because RAM + single thread eliminates disk I/O and lock contention.
Every command is atomic — a blessing for counters, a curse if you run an O(n) scan.
Use colons for key names, always set TTLs, and monitor memory before it bites you.

Redis Data Structures — Picking the Right Tool for Each Problem

Redis isn't just a key-value store in the boring sense. It offers five core data types (plus specialized ones such as Streams, Bitmaps, and HyperLogLogs), and choosing the right one is the difference between an elegant solution and a painful hack.

Strings — the default. Good for counters, cached HTML, serialized JSON blobs, and session tokens. The INCR command atomically increments a string-as-integer, making it perfect for rate limiting and hit counters.

Hashes — think of a Hash as a mini dictionary attached to one key. Instead of storing a user as one giant JSON blob, you store their fields separately. This lets you update a single field without fetching and re-serializing the entire object.

Lists: ordered, duplicates allowed. Internally a list is stored as linked compact nodes (a quicklist), so pushing or popping at either end is O(1). Ideal for queues (push to the tail, pop from the head) and activity feeds (push new events to the head, trim the list to keep only the last N).

Sets — unordered, unique members. Perfect for tracking unique visitors, tagging systems, or finding common followers between two users with SINTER.

Sorted Sets — the crown jewel. Every member has a floating-point score. Redis keeps members ordered by score automatically. This is how you build leaderboards, priority queues, and range-based queries without a single SQL ORDER BY.
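The activity-feed idiom from the Lists description (push to the head, then trim to the newest N) does not appear in the CLI walkthrough that follows, so here is a pure-Python model of what LPUSH followed by LTRIM key 0 N-1 does to a list. The `ltrim` helper is a hypothetical illustration that mirrors LTRIM's inclusive, negative-index-aware range rules; it runs without Redis.

```python
def ltrim(items, start, stop):
    """Model of LTRIM: keep items[start..stop], both ends inclusive, negatives count from the tail."""
    n = len(items)
    if start < 0:
        start = max(n + start, 0)
    if stop < 0:
        stop = n + stop
    if start > stop or start >= n:
        return []          # Redis empties the list when the range is out of bounds
    return items[start:min(stop, n - 1) + 1]


def lpush_and_trim(items, event, keep):
    """Model of LPUSH key event followed by LTRIM key 0 keep-1."""
    return ltrim([event] + items, 0, keep - 1)


feed = []
for event in ["login", "viewed:42", "added_to_cart:42", "checkout"]:
    feed = lpush_and_trim(feed, event, keep=3)
print(feed)  # ['checkout', 'added_to_cart:42', 'viewed:42']
```

In redis-py you would queue `lpush(key, event)` and `ltrim(key, 0, keep - 1)` on one `pipeline()` so both commands travel in a single round trip and the list never grows past N for long.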

redis_data_structures.sh (Bash)
# =========================================================
# STRINGS — atomic counter for API rate limiting
# =========================================================

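# Track how many API calls user 2055 makes during one clock minute
# (key suffix = YYYYMMDDHHMM, here 2024-06-15 14:32)
# INCR is atomic, so it's safe even with concurrent requests
127.0.0.1:6379> INCR api_calls:user:2055:minute:202406151432
127.0.0.1:6379> INCR api_calls:user:2055:minute:202406151432
127.0.0.1:6379> INCR api_calls:user:2055:minute:202406151432

# Expire the window key 60 seconds from now so old windows clean themselves up
# (EXPIRE counts from now, not to the end of the clock minute; in app code,
# set it right after the first INCR, i.e. when INCR returns 1)
127.0.0.1:6379> EXPIRE api_calls:user:2055:minute:202406151432 60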

# =========================================================
# HASHES — store a user profile without one giant JSON blob
# =========================================================

# HSET sets one or more fields on a hash key
# Updating just the email later only rewrites that one field
127.0.0.1:6379> HSET user:2055 username "bob_the_dev" email "bob@example.com" plan "pro" login_count 0

# Retrieve a single field — no need to deserialize a full JSON object
127.0.0.1:6379> HGET user:2055 email

# Retrieve all fields at once
127.0.0.1:6379> HGETALL user:2055

# Atomically increment just the login counter
127.0.0.1:6379> HINCRBY user:2055 login_count 1

# =========================================================
# SORTED SETS — real-time game leaderboard
# =========================================================

# ZADD leaderboard_key score member
# Score is the player's points — Redis sorts automatically
127.0.0.1:6379> ZADD game:leaderboard 4200 "player:alice"
127.0.0.1:6379> ZADD game:leaderboard 8750 "player:bob"
127.0.0.1:6379> ZADD game:leaderboard 6100 "player:carol"

# Fetch top 3 players, highest score first (WITHSCORES shows the score)
# ZREVRANGE = reverse order = highest to lowest
127.0.0.1:6379> ZREVRANGE game:leaderboard 0 2 WITHSCORES

# Get a specific player's rank (0-indexed, 0 = top)
127.0.0.1:6379> ZREVRANK game:leaderboard "player:alice"

# =========================================================
# LISTS — lightweight task queue
# =========================================================

# RPUSH adds to the RIGHT (tail) of the list
# Workers pop from the LEFT (head), which makes this a FIFO queue
127.0.0.1:6379> RPUSH email_queue "{\"to\":\"alice@example.com\",\"subject\":\"Welcome\"}"
127.0.0.1:6379> RPUSH email_queue "{\"to\":\"bob@example.com\",\"subject\":\"Reset\"}"

# BLPOP = blocking pop — worker waits up to 5 seconds for a job
# This is more efficient than polling in a loop
127.0.0.1:6379> BLPOP email_queue 5
Output
# INCR responses:
(integer) 1
(integer) 2
(integer) 3
(integer) 1 # EXPIRE confirmation
# HGET response:
"bob@example.com"
# HGETALL response:
1) "username"
2) "bob_the_dev"
3) "email"
4) "bob@example.com"
5) "plan"
6) "pro"
7) "login_count"
8) "0"
# HINCRBY response:
(integer) 1
# ZREVRANGE top 3:
1) "player:bob"
2) "8750"
3) "player:carol"
4) "6100"
5) "player:alice"
6) "4200"
# ZREVRANK alice (0-indexed from top):
(integer) 2 # alice is 3rd place
# BLPOP response (job dequeued):
1) "email_queue"
2) "{\"to\":\"alice@example.com\",\"subject\":\"Welcome\"}"
Watch Out: Don't Store Giant Objects in a Single String
A common mistake is serializing an entire user object — with 40 fields — into one JSON string and storing it as a Redis String. Every time you need to update the user's last_login field, you must GET the entire blob, deserialize it in your app, update the field, re-serialize, and SET it back. Under concurrent load this causes race conditions and unnecessary network traffic. Use a Hash instead — HSET lets you update one field atomically in a single round trip.
Production Insight
Large Strings (>10KB) increase memory fragmentation and network latency per operation.
Sorted Set inserts, updates, and rank lookups are O(log N), so they stay fast even with millions of members; the practical limits are memory and range commands that return many elements at once.
Rule of thumb: keep values under roughly 10KB; if they're larger, consider compression or separate storage.
Key Takeaway
Choose the data structure that matches your access pattern, not the one you're used to.
Strings are not for mutable objects — Hashes are.
Sorted Sets are Redis's superpower: real-time rankings, range queries, and priority queues in one command.
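The INCR + EXPIRE counter from the Strings example is the core of a fixed-window rate limiter. Below is a minimal sketch, assuming a client with redis-py-style `incr(key)` and `expire(key, seconds)` methods; `FakeCounterClient` is a hypothetical stand-in so the example runs without a server, and the limit and key layout are illustrative. It sets the TTL only when INCR returns 1, i.e. on the first hit of each window.

```python
from datetime import datetime, timezone
from typing import Optional
import time


def window_key(user_id: int, now: float) -> str:
    """Key for the UTC clock minute containing `now`, e.g. ...:minute:202406151432."""
    minute = datetime.fromtimestamp(now, tz=timezone.utc).strftime("%Y%m%d%H%M")
    return f"api_calls:user:{user_id}:minute:{minute}"


def allow_request(client, user_id: int, limit: int, now: Optional[float] = None) -> bool:
    """Fixed-window limiter: at most `limit` calls per user per clock minute."""
    now = time.time() if now is None else now
    key = window_key(user_id, now)
    count = client.incr(key)        # atomic: each concurrent request gets a distinct count
    if count == 1:
        client.expire(key, 60)      # first hit in this window: let the key clean itself up
    return count <= limit


class FakeCounterClient:
    """Hypothetical in-memory stand-in exposing incr() and expire() like redis-py."""

    def __init__(self):
        self.counts = {}
        self.ttls = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key, seconds):
        self.ttls[key] = seconds
        return True


client = FakeCounterClient()
t = 1718461920.0  # 2024-06-15 14:32:00 UTC
print([allow_request(client, 2055, limit=3, now=t) for _ in range(4)])  # [True, True, True, False]
print(allow_request(client, 2055, limit=3, now=t + 60))                  # True (a new window)
```

Two design caveats: INCR and EXPIRE are separate commands, so a crash between them could leave a key without a TTL (a MULTI/EXEC transaction or a Lua script closes that gap), and fixed windows allow a burst of up to twice the limit across a window boundary.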

The Cache-Aside Pattern — Wiring Redis Into a Real Application

Knowing Redis commands is one thing. Knowing how to integrate Redis into your application code without creating subtle bugs is another. The most widely-used pattern is Cache-Aside (also called Lazy Loading). The logic is elegantly simple: when your app needs data, check Redis first. If it's there (a cache hit), return it immediately. If it's not (a cache miss), fetch it from the database, store it in Redis with a TTL, then return it. Redis never gets data pushed to it — your application pulls it through.

This pattern is powerful because it's self-healing. If Redis goes down and loses all its data, your app can degrade gracefully, provided your code treats Redis errors as cache misses: everything just goes to the database until Redis is warm again. The cache populates itself organically based on what users actually request, not what you predict they'll request.

The critical detail most tutorials skip: always set a TTL. Without one, your cache grows forever and you'll eventually run out of RAM. More importantly, stale data lives forever. If a product's price changes in your database but the Redis entry never expires, customers see wrong prices indefinitely. Your TTL is your freshness guarantee.

The code below shows this pattern implemented in Python with redis-py, the widely used Python client for Redis.

cache_aside_pattern.py (Python)
import redis
import json
import time

# --- Connect to Redis ---
# decode_responses=True means Redis returns strings instead of bytes
redis_client = redis.Redis(
    host="localhost",
    port=6379,
    db=0,
    decode_responses=True
)

# --- Simulated database fetch (replace with your real DB query) ---
def fetch_product_from_database(product_id: int) -> dict:
    """
    Simulates a slow database query.
    In production this would be: cursor.execute('SELECT * FROM products WHERE id = %s', [product_id])
    """
    print(f"[DB] Querying database for product {product_id}...")
    time.sleep(0.05)  # simulate 50ms DB query latency
    return {
        "id": product_id,
        "name": "Mechanical Keyboard TKL",
        "price": 129.99,
        "stock": 42
    }

def get_product(product_id: int) -> dict:
    """
    Cache-Aside Pattern implementation.
    Always check Redis first. Fall back to DB on miss. Always set a TTL.
    """
    cache_key = f"product:{product_id}"  # namespaced key following the colon convention
    cache_ttl_seconds = 300              # cache is valid for 5 minutes

    # --- Step 1: Try the cache first ---
    cached_value = redis_client.get(cache_key)

    if cached_value is not None:
        # Cache HIT — data found in Redis, no DB query needed
        print(f"[CACHE] Hit for key '{cache_key}'")
        return json.loads(cached_value)  # deserialize from JSON string back to dict

    # --- Step 2: Cache MISS — go to the database ---
    print(f"[CACHE] Miss for key '{cache_key}'")
    product_data = fetch_product_from_database(product_id)

    # --- Step 3: Populate the cache for next time ---
    # json.dumps serializes the dict to a JSON string for storage
    # ex=cache_ttl_seconds ensures the key auto-expires — NEVER skip this
    redis_client.set(
        cache_key,
        json.dumps(product_data),
        ex=cache_ttl_seconds
    )
    print(f"[CACHE] Stored '{cache_key}' with TTL={cache_ttl_seconds}s")

    return product_data

def invalidate_product_cache(product_id: int):
    """
    Call this whenever a product is updated in the database.
    Removing the key forces the next request to re-fetch fresh data.
    """
    cache_key = f"product:{product_id}"
    deleted_count = redis_client.delete(cache_key)
    if deleted_count > 0:
        print(f"[CACHE] Invalidated key '{cache_key}'")
    else:
        print(f"[CACHE] Key '{cache_key}' wasn't in cache — nothing to invalidate")

# --- Demo ---
if __name__ == "__main__":
    print("=== First request — cold cache ===")
    product = get_product(product_id=7)
    print(f"Result: {product}\n")

    print("=== Second request — warm cache ===")
    product = get_product(product_id=7)
    print(f"Result: {product}\n")

    print("=== Simulating a product update ===")
    invalidate_product_cache(product_id=7)

    print("\n=== Third request — cache was invalidated ===")
    product = get_product(product_id=7)
    print(f"Result: {product}")
Output
=== First request — cold cache ===
[CACHE] Miss for key 'product:7'
[DB] Querying database for product 7...
[CACHE] Stored 'product:7' with TTL=300s
Result: {'id': 7, 'name': 'Mechanical Keyboard TKL', 'price': 129.99, 'stock': 42}
=== Second request — warm cache ===
[CACHE] Hit for key 'product:7'
Result: {'id': 7, 'name': 'Mechanical Keyboard TKL', 'price': 129.99, 'stock': 42}
=== Simulating a product update ===
[CACHE] Invalidated key 'product:7'
=== Third request — cache was invalidated ===
[CACHE] Miss for key 'product:7'
[DB] Querying database for product 7...
[CACHE] Stored 'product:7' with TTL=300s
Result: {'id': 7, 'name': 'Mechanical Keyboard TKL', 'price': 129.99, 'stock': 42}
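The self-healing claim only holds if a Redis outage is treated as a cache miss; in `get_product()` above, a connection error from `redis_client.get()` would propagate to the caller. Here is a minimal sketch of a more defensive read path, assuming a client with redis-py-style `get(key)` and `set(key, value, ex=...)`. With redis-py you would pass `cache_errors=(redis.exceptions.RedisError,)`; `DownCache` and `DictCache` are hypothetical stand-ins so the example runs without a server.

```python
import json
from typing import Callable, Tuple, Type


def get_or_load(cache, key: str, loader: Callable[[], dict], ttl_seconds: int,
                cache_errors: Tuple[Type[BaseException], ...]) -> dict:
    """Cache-aside read that degrades to the database when the cache is unavailable."""
    try:
        cached = cache.get(key)
    except cache_errors:
        cached = None                      # cache unreachable: behave as if it were a miss
    if cached is not None:
        return json.loads(cached)

    value = loader()                       # fall back to the source of truth
    try:
        cache.set(key, json.dumps(value), ex=ttl_seconds)
    except cache_errors:
        pass                               # failing to cache must not fail the request
    return value


class DownCache:
    """Hypothetical stand-in for a client whose Redis server is unreachable."""

    def get(self, key):
        raise ConnectionError("connection refused")

    def set(self, key, value, ex=None):
        raise ConnectionError("connection refused")


class DictCache:
    """Hypothetical stand-in that stores values in a dict and ignores the TTL."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, ex=None):
        self.data[key] = value


db_calls = []


def load_product():
    db_calls.append("product:7")           # record each simulated database query
    return {"id": 7, "price": 129.99}


print(get_or_load(DownCache(), "product:7", load_product, 300, (ConnectionError,)))
warm = DictCache()
get_or_load(warm, "product:7", load_product, 300, (ConnectionError,))
get_or_load(warm, "product:7", load_product, 300, (ConnectionError,))
print(len(db_calls))  # 2: one during the outage, one cold miss; the last call was a hit
```

Catching only the client library's error types, rather than bare `Exception`, keeps genuine bugs (such as a JSON decoding error) visible instead of silently turning them into database load.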
Interview Gold: Cache-Aside vs Write-Through
Cache-Aside loads data lazily (on first read). Write-Through updates the cache on every write, keeping it always warm but adding write latency. The tradeoff: Cache-Aside has slower first reads but only caches what's actually needed. Write-Through has faster reads but wastes memory caching data that may never be read again. Many production systems use Cache-Aside with explicit invalidation on writes, exactly the pattern shown above.
Production Insight
Cache-Aside with a uniform TTL can cause a thundering herd when many keys cached at the same moment expire together.
Without invalidation, stale data persists for TTL duration — can display wrong prices or broken states.
Rule: add random jitter to TTLs (±20%) and always have an invalidation path for critical data.
Key Takeaway
Check Redis first, DB as fallback, always set TTL with jitter.
Invalidate cache explicitly on writes — don't wait for TTL to expire.
This pattern is widely used in production for good reason.
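The ±20% jitter rule can be as small as one helper. A sketch under the assumption that a uniform spread is good enough for your traffic; `jittered_ttl` is a hypothetical name, and its result would be passed as `ex=` to `redis_client.set()` in the cache-aside code.

```python
import random
from typing import Optional


def jittered_ttl(base_seconds: int, spread: float = 0.2,
                 rng: Optional[random.Random] = None) -> int:
    """Return a TTL drawn uniformly from base*(1-spread) .. base*(1+spread)."""
    if not 0 <= spread < 1:
        raise ValueError("spread must be in [0, 1)")
    rng = rng or random.Random()
    low, high = base_seconds * (1 - spread), base_seconds * (1 + spread)
    return max(1, round(rng.uniform(low, high)))   # Redis EX needs a positive integer


rng = random.Random(42)   # seeded only so the demo is repeatable
ttls = [jittered_ttl(300, rng=rng) for _ in range(1000)]
print(min(ttls) >= 240, max(ttls) <= 360)   # True True
```

Usage in the cache-aside function would look like `redis_client.set(cache_key, json.dumps(product_data), ex=jittered_ttl(300))`, so keys written in the same second expire minutes apart instead of all at once.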

Redis Expiry, Eviction and Why Your Cache Will Betray You Without Them

TTLs are your first line of defense against stale data. But what happens when Redis runs out of memory before any keys expire? This is where eviction policies come in, and most developers don't think about them until Redis starts refusing writes in production — which is a very bad day.

Redis has several eviction policies, configured via maxmemory-policy in your redis.conf. The default policy is noeviction: once memory use reaches the maxmemory limit, Redis refuses new writes. That sounds safe, but it means your application starts throwing errors. For a cache, you almost always want allkeys-lru (evict the least recently used key across all keys) or volatile-lru (evict the least recently used key among those that have a TTL set).
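The difference between noeviction and allkeys-lru is easiest to see in a toy model. `BoundedCache` is hypothetical and simplified: it counts keys instead of bytes and uses exact LRU via `OrderedDict`, whereas Redis measures memory and approximates LRU by sampling keys (tunable with `maxmemory-samples`). `OutOfMemory` stands in for the "OOM command not allowed" error.

```python
from collections import OrderedDict


class OutOfMemory(Exception):
    """Stands in for the OOM error Redis returns when noeviction refuses a write."""


class BoundedCache:
    """Toy model of maxmemory + maxmemory-policy, counting keys instead of bytes."""

    def __init__(self, max_keys: int, policy: str = "noeviction"):
        if policy not in ("noeviction", "allkeys-lru"):
            raise ValueError(f"unsupported policy: {policy}")
        self.max_keys = max_keys
        self.policy = policy
        self.data = OrderedDict()   # key order doubles as least-to-most recently used
        self.evicted_keys = 0       # mirrors the evicted_keys counter in INFO stats

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # reading a key makes it most recently used
        return self.data[key]

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.max_keys:
            if self.policy == "noeviction":
                raise OutOfMemory("write refused: cache is full")
            self.data.popitem(last=False)   # drop the least recently used key
            self.evicted_keys += 1
        self.data[key] = value
        self.data.move_to_end(key)


strict = BoundedCache(2, "noeviction")
strict.set("a", 1)
strict.set("b", 2)
try:
    strict.set("c", 3)
except OutOfMemory as err:
    print("noeviction:", err)

lru = BoundedCache(2, "allkeys-lru")
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")          # touch "a" so "b" becomes the least recently used key
lru.set("c", 3)
print("allkeys-lru kept:", sorted(lru.data))  # allkeys-lru kept: ['a', 'c']
```

The noeviction cache keeps every key it already has but fails the new write; the LRU cache always accepts the write and silently drops whichever key was touched least recently.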

There's also the cache stampede problem — also called the thundering herd. Imagine 500 concurrent users all request the same popular product page. The cache entry expires at the exact same moment. All 500 requests find a cache miss simultaneously and all fire a database query at once. Your database gets hammered with 500 identical queries in the same millisecond. The fix is probabilistic early expiration or using a mutex lock in your cache-miss path so only one request rebuilds the cache while others wait.

The rule of thumb: if your cache powers any page that gets high traffic, you need to think about stampedes. If your cache serves data with truly random access patterns, you probably don't.
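Probabilistic early expiration, one of the stampede fixes mentioned above, can be sketched in a few lines. This follows the XFetch formulation: each reader refreshes early with a probability that grows as expiry approaches, scaled by how long a recompute takes. The function name and parameters are illustrative assumptions, not a Redis API; the cached entry would need to carry its recompute time `delta` and absolute `expiry` alongside the value.

```python
import math
import random
from typing import Optional


def should_recompute(now: float, expiry: float, delta: float, beta: float = 1.0,
                     rng: Optional[random.Random] = None) -> bool:
    """XFetch-style check: refresh early with a probability that rises near expiry.

    delta: how long a recompute takes, in seconds; beta > 1 refreshes earlier.
    """
    rng = rng or random.Random()
    u = 1.0 - rng.random()               # in (0, 1], so log(u) is defined and <= 0
    return now - delta * beta * math.log(u) >= expiry


rng = random.Random(7)
far = sum(should_recompute(0.0, 1000.0, delta=0.05, rng=rng) for _ in range(10_000))
near = sum(should_recompute(99.0, 100.0, delta=1.0, rng=rng) for _ in range(10_000))
print(far, near)   # far stays 0; near lands close to 10_000 * e**-1 (about 37%)
```

Because only a small, random fraction of readers refresh before the key actually expires, the rebuild is spread out instead of 500 requests missing at the same millisecond.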

redis_expiry_and_eviction.sh (Bash)
# =========================================================
# CONFIGURING EVICTION POLICY (redis.conf or at runtime)
# =========================================================

# Set max memory to 256MB — Redis will start evicting when this is reached
127.0.0.1:6379> CONFIG SET maxmemory 268435456

# allkeys-lru = evict least-recently-used key from the ENTIRE keyspace
# Best default for a pure cache where all keys have roughly equal value
127.0.0.1:6379> CONFIG SET maxmemory-policy allkeys-lru

# Confirm the config was applied
127.0.0.1:6379> CONFIG GET maxmemory-policy

# =========================================================
# TTL COMMANDS — managing key lifetime
# =========================================================

# SET with expiry in seconds
127.0.0.1:6379> SET session:user:8821 "eyJhbGciOiJIUzI1NiJ9" EX 3600

# SET with expiry in milliseconds (PX allows sub-second TTLs; 60000 ms = 60 s here)
127.0.0.1:6379> SET rate_check:ip:192.168.1.1 "1" PX 60000

# Check TTL in seconds (-1 = no expiry, -2 = key doesn't exist)
127.0.0.1:6379> TTL session:user:8821

# Check TTL in milliseconds (more precise)
127.0.0.1:6379> PTTL rate_check:ip:192.168.1.1

# PERSIST removes the TTL — key lives forever (use with caution)
127.0.0.1:6379> PERSIST session:user:8821
127.0.0.1:6379> TTL session:user:8821

# =========================================================
# MUTEX PATTERN — prevent cache stampedes
# NX = only set if key does NOT exist (atomic check-and-set)
# This is a distributed lock: only the first caller wins
# =========================================================

# First request tries to acquire the rebuild lock (TTL=10s to auto-release)
127.0.0.1:6379> SET lock:product:7:rebuild "1" NX EX 10

# Second concurrent request tries the same — gets nil (lock is taken)
127.0.0.1:6379> SET lock:product:7:rebuild "1" NX EX 10

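# After the first request rebuilds the cache, it releases the lock.
# Caveat: a bare DEL can delete a lock that another request acquired after yours
# expired; production code stores a unique token as the value and deletes the key
# only if it still holds that token (typically with a small Lua script)
127.0.0.1:6379> DEL lock:product:7:rebuild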

# =========================================================
# CHECK MEMORY USAGE
# =========================================================

# See overall memory stats
127.0.0.1:6379> INFO memory

# See exactly how much RAM one key is using (in bytes)
127.0.0.1:6379> MEMORY USAGE session:user:8821
Output
# CONFIG SET responses:
OK
OK
# CONFIG GET maxmemory-policy:
1) "maxmemory-policy"
2) "allkeys-lru"
# SET with EX/PX:
OK
OK
# TTL session:user:8821:
(integer) 3598 # approximately 3600, decreasing
# PTTL rate_check (milliseconds):
(integer) 59847
# After PERSIST:
(integer) 1 # 1 = the TTL was removed
(integer) -1 # -1 means no expiry — key lives forever now
# First SET NX (lock acquired):
OK
# Second SET NX (lock already held):
(nil) # nil = the SET was rejected — lock is taken
# DEL lock:
(integer) 1 # 1 = key was deleted
# MEMORY USAGE:
(integer) 88 # this specific key uses 88 bytes of RAM
Watch Out: The Default Eviction Policy Will Cause Production Outages
Redis ships with maxmemory-policy noeviction by default, and on 64-bit systems maxmemory defaults to 0 (no limit). If you don't set both before going to production, one of two things happens: with no limit, Redis keeps growing until the host starts swapping or the OS kills it; with a limit but noeviction, Redis fills up and starts returning "OOM command not allowed" errors to your application. For a cache, always set maxmemory and maxmemory-policy allkeys-lru in your redis.conf before deploying. Check it now, seriously.
Production Insight
noeviction is safe only for bounded, permanent data; it's a landmine for caches.
allkeys-lru works well when any key can be rebuilt, but it will also evict cold keys that still matter (such as long-lived sessions).
volatile-lru evicts only keys with a TTL and preserves permanent ones, which suits hybrid workloads; if no keys have a TTL, it behaves like noeviction.
Rule: for a pure cache, use allkeys-lru; for mixed usage, volatile-lru or volatile-ttl.
Key Takeaway
Always set maxmemory-policy to allkeys-lru for caches — never rely on defaults.
Cache stampedes are silent DB killers — use mutex locks or probabilistic early expiration.
TTL jitter prevents mass expiry; monitor evicted_keys and expired_keys in production.

Redis Persistence — RDB vs AOF and Production Trade-offs

Redis holds everything in RAM, so anything not yet persisted to disk is lost when the server restarts. For a cache, that's acceptable. But many teams use Redis as a session store, a rate-limiter state store, or even a primary database for high-frequency writes. In those cases, losing data on restart is catastrophic.

RDB (Redis Database): periodic snapshots of the entire dataset to disk. You configure how often; for example, save 900 1 means "take a snapshot once 900 seconds have passed and at least 1 key has changed". RDB files are compact and great for backups and disaster recovery. The downside: you lose every write made since the last snapshot. A crash 10 minutes after the last snapshot loses up to 10 minutes of writes.
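A `save <seconds> <changes>` rule fires only when both of its thresholds are met, and any one rule is enough to trigger a snapshot. Here is a small pure-Python model of that check (the function name is hypothetical), using the same rules as the redis.conf example later in this section.

```python
from typing import Iterable, Tuple

SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]   # save 900 1 / save 300 10 / save 60 10000


def snapshot_due(rules: Iterable[Tuple[int, int]], seconds_since_save: float, changes: int) -> bool:
    """True if any rule has both its time and its write-count threshold satisfied."""
    return any(seconds_since_save >= secs and changes >= min_changes
               for secs, min_changes in rules)


print(snapshot_due(SAVE_RULES, 120, 5))      # False: no rule has both thresholds met
print(snapshot_due(SAVE_RULES, 301, 10))     # True: "save 300 10" fires
print(snapshot_due(SAVE_RULES, 899, 1))      # False: a single write waits the full 900 s
```

The last case is the worst-case data-loss window described above: on a quiet instance, a lone write can sit unsaved for up to 900 seconds.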

AOF (Append Only File): logs every write command to an append-only file, which Redis replays on restart to reconstruct the dataset. AOF gives you finer granularity: appendfsync everysec loses at most about 1 second of writes, while appendfsync always syncs to disk on every write, giving near-zero data loss at the cost of considerably slower writes (how much slower depends on your disk's fsync latency).
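Replaying an AOF amounts to re-running the logged writes in order, and a rewrite (BGREWRITEAOF) replaces that history with a shorter command list that rebuilds the same state. A toy model supporting only SET, INCR, and DEL; real AOF files store commands in the Redis protocol format, not Python tuples, and the function names here are hypothetical.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, ...]


def replay(log: List[Command]) -> Dict[str, str]:
    """Rebuild the dataset by applying logged write commands in order."""
    state: Dict[str, str] = {}
    for op, key, *args in log:
        if op == "SET":
            state[key] = args[0]
        elif op == "INCR":
            state[key] = str(int(state.get(key, "0")) + 1)
        elif op == "DEL":
            state.pop(key, None)
        else:
            raise ValueError(f"unsupported command in log: {op}")
    return state


def rewrite(state: Dict[str, str]) -> List[Command]:
    """Compact history into one SET per surviving key (the idea behind BGREWRITEAOF)."""
    return [("SET", key, value) for key, value in sorted(state.items())]


aof = [("SET", "views", "10"), ("INCR", "views"), ("INCR", "views"),
       ("SET", "tmp", "x"), ("DEL", "tmp"), ("INCR", "logins")]
state = replay(aof)
compact = rewrite(state)
print(state)                         # {'views': '12', 'logins': '1'}
print(len(aof), "->", len(compact))  # 6 -> 2
```

This is why AOF files need periodic rewriting: the log grows with every write, while the state it describes may stay small.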

Many production setups use both. With both enabled, you get RDB files for fast restores and backups plus AOF for durability, and Redis loads the AOF on restart because it's the more complete record.

The choice matters for your data loss tolerance. Session stores need AOF everysec. Cache layers don't need persistence at all. A leaderboard that can rebuild from database can afford RDB-only. Match the persistence config to your data's criticality.

redis_persistence_config.sh (Bash)
# =========================================================
# CONFIGURING PERSISTENCE (redis.conf)
# =========================================================

# --- RDB Snapshotting ---
# Format: save <seconds> <changes>
# Trigger: at least 1 change in 900s (15 min) OR 10 in 300s (5 min) OR 10000 in 60s
save 900 1
save 300 10
save 60 10000

# RDB file location and name
dbfilename dump.rdb
dir /var/lib/redis/

# Compress the RDB file (yes/no)
rdbcompression yes

# --- AOF Persistence ---
# Enable AOF (yes/no)
appendonly yes

# AOF file name
appendfilename "appendonly.aof"

# Sync policy:
# always = sync every write (safest, slowest)
# everysec = sync once per second (default, best trade-off)
# no = let OS decide (fastest, least safe)
appendfsync everysec

# --- AOF Rewrite ---
# Auto-rewrite AOF when it grows 100% in size (auto-aof-rewrite-percentage 100)
# and reaches at least 64MB (auto-aof-rewrite-min-size 64mb)
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# --- Check current config ---
# From redis-cli
127.0.0.1:6379> CONFIG GET save
127.0.0.1:6379> CONFIG GET appendonly
127.0.0.1:6379> CONFIG GET appendfsync

# --- Force an RDB snapshot immediately ---
# BGSAVE forks a child to write the snapshot; plain SAVE blocks every client until it finishes
127.0.0.1:6379> BGSAVE

# --- Force an AOF rewrite ---
127.0.0.1:6379> BGREWRITEAOF

# --- Check persistence stats ---
127.0.0.1:6379> INFO persistence
Output
# CONFIG GET save:
1) "save"
2) "900 1 300 10 60 10000"
# CONFIG GET appendonly:
1) "appendonly"
2) "yes"
# CONFIG GET appendfsync:
1) "appendfsync"
2) "everysec"
# BGSAVE:
Background saving started
# BGREWRITEAOF:
Background append only file rewriting started
# INFO persistence (abbreviated):
# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_last_save_time:1718476800
rdb_last_save_status:ok
aof_enabled:1
aof_rewrite_in_progress:0
aof_last_bgrewrite_status:ok
aof_current_size:1234567
Persistence Mental Model: Backup Camera vs Dash Cam
  • RDB: Low overhead, fast recovery, but data loss window (up to your save interval). Best for cache layers and scenarios where data can be rebuilt.
  • AOF: Higher overhead (especially appendfsync always), but sub-second data loss. Best for session stores, rate limiters, critical counters.
  • Both: Use both for maximum safety. Redis prioritizes AOF on restart. The small disk cost is worth the peace of mind.
  • Production trap: AOF with appendfsync always can cause write latency spikes during disk syncs. Test with your write throughput before enabling.
Production Insight
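AOF appendfsync always can cut write throughput substantially compared to everysec; measure it on your own disks.
RDB snapshots fork the Redis process, and on busy instances with large datasets the fork can pause command processing for milliseconds or more.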
Large AOF files (>1GB) slow down startup — BGREWRITEAOF prevents unbounded growth.
Rule: session stores use AOF everysec; caches use RDB-only or no persistence.
Key Takeaway
RDB for fast restores and backups, AOF for near-zero data loss.
AOF everysec is the production sweet spot — lose at most 1 second.
Monitor aof_current_size and rewrite regularly; test recovery process quarterly.
Production incident · Post-mortem · Severity: high

The Late-Night Pager: Redis Write Failures from Default Eviction Policy

Symptom
Application logs show sporadic 'OOM command not allowed when used memory > maxmemory' errors. Some transactions succeed, others fail with no clear pattern. No alarms on database latency or CPU.
Assumption
The team assumed the Redis instance had enough memory (64 GB) and that eviction would happen automatically. They never touched the maxmemory-policy config.
Root cause
The default maxmemory-policy noeviction combined with a memory leak from unbounded session storage. When Redis hit its maxmemory limit (48 GB, set via a monitoring tool), it began refusing writes. No session key had a TTL, so nothing ever expired. The leak came from stale sessions that were never cleaned up.
Fix
Set maxmemory-policy allkeys-lru in redis.conf, added TTLs to session keys (EXPIRE 3600), and configured memory alerting at 80% of maxmemory. Ran CONFIG SET maxmemory-policy allkeys-lru as a live fix, then bounced the service to clear memory.
Key lesson
  • Always configure maxmemory-policy before going to production — never rely on defaults.
  • Every SET command must include a TTL unless the key is truly permanent (and documented).
  • Set memory usage alerts at 70% and 85% of maxmemory — don't wait for OOM errors.
  • Use MEMORY USAGE and MEMORY STATS in production monitoring to detect leaks early.
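The 70%/85% alert rule can be checked from `INFO memory`, which reports `used_memory` and `maxmemory` in bytes. A minimal sketch that parses the raw text reply (redis-py's `info("memory")` already returns a dict, so you would skip the parser there); the function names, thresholds, and sample numbers are illustrative, with the sample sized like the incident's 48 GB limit. A `maxmemory` of 0 means no limit is configured, which the lessons above treat as a problem in itself.

```python
def parse_info(text: str) -> dict:
    """Parse 'field:value' lines from an INFO reply, skipping '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        fields[name] = value
    return fields


def memory_alert_level(info: dict, warn: float = 0.70, critical: float = 0.85) -> str:
    """Classify used_memory against maxmemory: ok / warn / critical / no-limit."""
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", "0"))
    if limit == 0:
        return "no-limit"            # maxmemory 0 = unbounded: set one before production
    ratio = used / limit
    if ratio >= critical:
        return "critical"
    if ratio >= warn:
        return "warn"
    return "ok"


sample = """# Memory
used_memory:45097156608
used_memory_human:42.00G
maxmemory:51539607552
maxmemory_human:48.00G
maxmemory_policy:noeviction
"""
info = parse_info(sample)
print(memory_alert_level(info), info["maxmemory_policy"])  # critical noeviction
```

Feeding this from a scheduled job and alerting on "warn" gives you a window to find the leak before writes start failing.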
Production debug guideSymptom → Action patterns for the most common Redis failures5 entries
Symptom · 01
Commands return 'OOM command not allowed when used memory > maxmemory'
Fix
Check INFO memory to see used_memory_human vs maxmemory_human. Temporarily increase maxmemory with CONFIG SET maxmemory 2gb (only if the host has that much free RAM), then permanently fix the eviction policy and the memory leak.
Symptom · 02
Application reports high latency on Redis operations (>10ms)
Fix
Run SLOWLOG GET 10 to find slow commands. Check CLIENT LIST for long-running connections. Verify network latency between app and Redis with redis-cli --latency.
Symptom · 03
Keys disappear unexpectedly before TTL expiry
Fix
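Check whether eviction is active: run INFO stats and look at evicted_keys. If evicted_keys > 0, the maxmemory-policy is evicting keys. Review the maxmemory setting and add TTLs where missing. Also check INFO commandstats for cmdstat_flushdb or cmdstat_flushall entries, which show whether something has flushed the database.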
Symptom · 04
Cache miss rate spikes suddenly
Fix
Check for mass key expiry: run INFO stats and watch expired_keys. If many keys were written with the same TTL, they expire together and the resulting misses can trigger a cache stampede. Use random TTL jitter (±20%) on SETs to spread expiry times. Also check whether a deployment cleared the cache via FLUSHALL.
Symptom · 05
Redis connection refused or timeout
Fix
Run redis-cli ping. If it fails, check systemctl status redis and the logs at /var/log/redis/redis-server.log. Make sure the bind setting includes an interface the app can reach, and check firewall rules and security groups. Then run redis-cli -h <host> -p 6379 PING from the app server itself.
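For symptom 03 and the incident above, a quick audit is to walk the keyspace with SCAN and flag keys whose TTL is -1 (no expiry). A sketch assuming redis-py's `scan_iter(match=..., count=...)` and a `ttl()` that passes through Redis's integer reply; `FakeTTLClient` is a hypothetical stand-in so the example runs offline. On a large instance you would run this against a replica or cap it with `limit`.

```python
import fnmatch
from typing import Dict, List, Optional


def keys_without_ttl(client, pattern: str = "*", limit: Optional[int] = None,
                     batch: int = 1000) -> List[str]:
    """Return keys matching `pattern` that have no expiry (TTL reply of -1)."""
    found, seen = [], set()
    for key in client.scan_iter(match=pattern, count=batch):
        if key in seen:              # SCAN may yield a key more than once
            continue
        seen.add(key)
        if client.ttl(key) == -1:    # -1 = no expiry; -2 = key vanished meanwhile
            found.append(key)
            if limit is not None and len(found) >= limit:
                break
    return found


class FakeTTLClient:
    """Hypothetical stand-in: maps key -> TTL seconds, with -1 meaning no expiry."""

    def __init__(self, ttls: Dict[str, int]):
        self._ttls = ttls

    def scan_iter(self, match=None, count=None):
        for key in list(self._ttls):
            if match is None or fnmatch.fnmatchcase(key, match):
                yield key

    def ttl(self, key):
        return self._ttls.get(key, -2)


audit_client = FakeTTLClient({"session:abc": -1, "session:def": 3600, "user:1001": -1})
print(keys_without_ttl(audit_client, "session:*"))   # ['session:abc']
```

Pointing the pattern at a namespace that should always expire (such as `session:*`) turns this into a leak detector: any result is a key that will never leave memory on its own.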
★ Redis Quick Debug Cheat SheetThree-command diagnostics for the most painful Redis production issues
REDIS OOM ERROR - writes failing
Immediate action
Temporarily increase maxmemory to let writes through
Commands
redis-cli CONFIG SET maxmemory 2gb
redis-cli CONFIG SET maxmemory-policy allkeys-lru
Fix now
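After the live fix, make the settings permanent in redis.conf: for example maxmemory 1gb (size it to the host's RAM; the 2gb above is only a stopgap) and maxmemory-policy allkeys-lru. Alternatively, run CONFIG REWRITE to write the running values back to redis.conf. CONFIG SET has already applied the change, so no restart is needed.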
REDIS HIGH LATENCY - slow responses
Immediate action
Get the top slow commands to identify the bottleneck
Commands
redis-cli SLOWLOG GET 10
redis-cli --latency -h <host> -p 6379
Fix now
If SLOWLOG shows KEYS * or SMEMBERS on large sets, replace with SCAN or SSCAN. Use pipelining for batch operations.
REDIS DISCONNECTED - app can't connect
Immediate action
Test connectivity and check if Redis is up
Commands
redis-cli PING
systemctl status redis # or `ps aux | grep redis-server`
Fix now
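If Redis is not running, start it with systemctl start redis. If it's a network issue, check the bind directive in redis.conf and the firewall rules. Binding to 0.0.0.0 exposes Redis on every interface, so only do that behind a firewall and with authentication (requirepass or ACLs). If it still fails, tail the logs: tail -f /var/log/redis/redis-server.log.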
| Data Structure | Best Use Case | Key Commands | Stores Duplicates? | Ordered? |
| --- | --- | --- | --- | --- |
| String | Cached values, counters, session tokens | GET, SET, INCR, EXPIRE | N/A (single value) | N/A |
| Hash | Object/entity fields (user profiles, product data) | HGET, HSET, HGETALL, HINCRBY | N/A (field map) | No |
| List | Queues, activity feeds, job pipelines | RPUSH, LPOP, BLPOP, LRANGE | Yes | Insertion order |
| Set | Unique visitors, tags, friend graphs | SADD, SMEMBERS, SINTER, SUNION | No | No |
| Sorted Set | Leaderboards, priority queues, range queries | ZADD, ZREVRANGE, ZRANK, ZSCORE | No (by member) | By score (float) |

Key takeaways

1. Redis is fast because it lives in RAM and uses a single-threaded event loop, which also makes every command atomic without needing locks.
2. Sorted Sets are Redis's most underrated data structure: they let you build real-time leaderboards and priority queues with single commands and no application-side sorting.
3. Always set a TTL and always configure maxmemory-policy before production: under the default noeviction policy, Redis rejects writes with OOM errors once it reaches maxmemory.
4. Cache-Aside (lazy loading) is the most production-proven caching pattern: check the cache first, fall back to the database on a miss, always store with an expiry, and invalidate explicitly on writes.
5. Match your persistence strategy to data criticality: AOF with everysec for session stores, RDB-only or no persistence for caches, and both for maximum safety.
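Here is a minimal sketch of the Cache-Aside pattern from takeaway 4. It assumes a redis-py style client exposing get(name), set(name, value, ex=...) and delete(*names). The key scheme product:<id> and the load_from_db / save_to_db callables are hypothetical stand-ins for your own data layer.

```python
import json
from typing import Any, Callable


def get_product(client: Any, product_id: int,
                load_from_db: Callable[[int], dict],
                ttl_seconds: int = 300) -> dict:
    """Cache-aside read: try Redis first, fall back to the DB on a miss."""
    key = f"product:{product_id}"  # hypothetical key naming scheme
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit (json.loads accepts bytes too)
    product = load_from_db(product_id)  # cache miss: ask the database
    # Always store with an expiry so stale or orphaned entries age out.
    client.set(key, json.dumps(product), ex=ttl_seconds)
    return product


def update_product(client: Any, product_id: int, fields: dict,
                   save_to_db: Callable[[int, dict], None]) -> None:
    """Write path: update the source of truth, then invalidate the cache."""
    save_to_db(product_id, fields)
    # Deleting (rather than rewriting) the cached copy lets the next read
    # repopulate it from the database.
    client.delete(f"product:{product_id}")
```

Invalidating on write instead of updating the cached value in place keeps the database as the single source of truth; the cost is one extra cache miss after each update.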

Common mistakes to avoid


Not setting a TTL on cached keys

Symptom
Redis memory grows without bound until the server runs out of memory or Redis starts rejecting writes. Every cache miss adds keys that never expire.
Fix
Make it a rule in code review that every SET must include EX or PX. Write a wrapper function that makes TTL mandatory and throws an exception if the caller omits it. Monitor used_memory and set alerts at 70% of maxmemory.
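The wrapper described in the fix can be sketched as follows. It assumes a redis-py style client whose set(name, value, ex=...) maps to SET key value EX seconds. The name set_with_ttl is hypothetical; pick whatever fits your codebase.

```python
from typing import Any


def set_with_ttl(client: Any, key: str, value: Any, ttl_seconds: int) -> Any:
    """SET a key only when the caller passes a positive TTL in seconds.

    ttl_seconds has no default, so forgetting it raises TypeError at the
    call site; passing None, 0 or a negative number raises ValueError.
    """
    if ttl_seconds is None or ttl_seconds <= 0:
        raise ValueError(f"refusing to SET {key!r} without a positive TTL")
    # ex= becomes SET key value EX <seconds>: value and expiry are written
    # in one atomic command, so a key written through this wrapper never
    # exists without a TTL.
    return client.set(key, value, ex=ttl_seconds)
```

Because the TTL travels in the same SET command rather than in a follow-up EXPIRE, a crash between two calls can't leave a key that never expires.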

Using KEYS * in production to find matching keys

Symptom
KEYS is O(n): it walks the entire keyspace on Redis's single thread and blocks every other command until it finishes. On a server with 10 million keys this can freeze Redis for seconds while other clients' requests queue up. The application times out and user requests fail.
Fix
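Use SCAN instead: it walks the keyspace incrementally with a cursor, so no single call blocks for long. For example, SCAN 0 MATCH product:* COUNT 100 examines roughly 100 keys per call (COUNT is a hint, not a limit), returns the ones that match plus a cursor to continue from, and may return no matches at all on a given call. Keep calling with the returned cursor until it comes back as 0. Never use KEYS in production code.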
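The cursor loop described above looks roughly like this in Python. It assumes a redis-py style client whose scan(cursor=..., match=..., count=...) returns a (next_cursor, keys) tuple. redis-py already ships scan_iter, which does the same thing; the explicit loop here just makes the cursor protocol visible. scan_keys is a hypothetical helper name.

```python
from typing import Any, Iterator


def scan_keys(client: Any, pattern: str, count: int = 100) -> Iterator[Any]:
    """Yield every key matching pattern via cursor-based SCAN.

    Each call examines roughly `count` keys (a hint, not a limit) and may
    return no matches at all; iteration ends when the cursor returns to 0.
    SCAN can return the same key more than once, so dedupe if that matters.
    """
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        if cursor == 0:  # a cursor of 0 means the full iteration is done
            break
```

Because it is a generator, callers can process keys batch by batch (for example, deleting them with UNLINK in a pipeline) without ever holding the whole keyspace in memory.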

Storing large serialized objects as Strings and updating them non-atomically

Symptom
If two concurrent requests both GET a user JSON blob, update different fields in their own memory, then SET the blob back, one request's write silently overwrites the other's. This is a classic race condition causing data loss (last write wins).
Fix
Use a Hash and HSET to update individual fields atomically. If you must use a String, wrap the read-modify-write in a Lua script via EVAL to ensure atomicity. Better yet, never store mutable aggregates as Strings in Redis.
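To make the lost update concrete, here is a deterministic, pure-Python simulation in which a plain dict stands in for the Redis keyspace. Nothing here talks to a real server. With redis-py, the per-field writes would be calls like client.hset("user:42", "email", "ada@new.example"). The function names and the sample user are illustrative.

```python
import json

# A plain dict stands in for the Redis keyspace: this replays one bad
# interleaving deterministically and never talks to a real server.


def update_json_blob(initial: dict) -> dict:
    """Two requests do GET -> edit locally -> SET on one JSON String key."""
    keyspace = {"user:42": json.dumps(initial)}
    copy_a = json.loads(keyspace["user:42"])  # request A: GET
    copy_b = json.loads(keyspace["user:42"])  # request B: GET, before A writes
    copy_a["email"] = "ada@new.example"       # A changes one field...
    copy_b["plan"] = "pro"                    # ...B changes another
    keyspace["user:42"] = json.dumps(copy_a)  # A: SET
    keyspace["user:42"] = json.dumps(copy_b)  # B: SET, silently drops A's edit
    return json.loads(keyspace["user:42"])


def update_hash_fields(initial: dict) -> dict:
    """Same two requests, but each writes only its own field, like HSET.

    Order no longer matters because neither write touches the other's field.
    """
    keyspace = {"user:42": dict(initial)}
    keyspace["user:42"]["email"] = "ada@new.example"  # HSET user:42 email ...
    keyspace["user:42"]["plan"] = "pro"               # HSET user:42 plan ...
    return keyspace["user:42"]


if __name__ == "__main__":
    user = {"name": "Ada", "email": "ada@old.example", "plan": "free"}
    print(update_json_blob(user))    # the email change is lost
    print(update_hash_fields(user))  # both changes are kept
```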

Setting the same TTL for all cache keys in a high-traffic endpoint

Symptom
When keys that were written together share the same TTL, they all expire at the same moment, causing a thundering herd. The database is hit with thousands of identical queries at once, which can trigger a cascading outage.
Fix
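Add random jitter to the TTL: ttl = base_ttl + random.uniform(-0.2 * base_ttl, 0.2 * base_ttl). This spreads expiry over time. Also consider probabilistic early expiration: once the remaining TTL drops below about 10% of the original, let each request refresh the value with a probability that rises as expiry approaches, so a few requests rebuild the cache before it expires instead of all of them at once.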
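Both ideas fit in a few lines. This is a minimal sketch: jittered_ttl implements the ±20% formula from the fix, and should_refresh_early is one simple probabilistic variant (a linear ramp inside the last 10% of the TTL), not a specific published algorithm. Both function names and the window parameter are hypothetical.

```python
import random
from typing import Optional


def jittered_ttl(base_ttl: int, spread: float = 0.2,
                 rng: Optional[random.Random] = None) -> int:
    """Return base_ttl +/- spread * base_ttl seconds, as a whole number."""
    rng = rng or random.Random()
    ttl = base_ttl + rng.uniform(-spread * base_ttl, spread * base_ttl)
    return max(1, round(ttl))  # EX needs a positive integer number of seconds


def should_refresh_early(remaining_ttl: float, original_ttl: float,
                         window: float = 0.1,
                         rng: Optional[random.Random] = None) -> bool:
    """Decide whether this request should rebuild the value before expiry.

    Outside the last `window` of the key's life, never refresh. Inside it,
    the chance of refreshing rises linearly from 0 to 1 as expiry nears,
    so only a handful of requests rebuild the value at any one time.
    """
    rng = rng or random.Random()
    threshold = window * original_ttl
    if threshold <= 0:
        return remaining_ttl <= 0
    if remaining_ttl > threshold:
        return False
    probability = 1.0 - remaining_ttl / threshold
    return rng.random() < probability
```

On a cache hit you would read the key's remaining TTL (for example with the TTL command), call should_refresh_early, and if it returns True recompute the value and SET it again with jittered_ttl(base_ttl).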

Interview Questions on This Topic

Q01 · Senior: Redis is single-threaded, so how can it handle thousands of concurrent con...
Q02 · Senior: Explain the difference between Redis RDB snapshotting and AOF persistenc...
Q03 · Senior: What is a cache stampede and how would you prevent it in a high-traffic ...
Q01 of 03 · Senior

Redis is single-threaded, so how can it handle thousands of concurrent connections without being a bottleneck?

ANSWER
Redis uses an event-driven, non-blocking I/O model with a single thread for command processing. I/O multiplexing (epoll on Linux, kqueue on BSD/macOS) lets that one thread watch thousands of sockets at once and service whichever are ready, so an idle or slow connection doesn't block the others. Commands are executed one at a time, which eliminates lock contention and context switching and makes each command atomic. Redis is fast because it keeps everything in RAM and most operations are O(1) or O(log n), not because it multitasks. The bottleneck is rarely the number of connections; it is the cost of individual commands, so avoid O(n) commands such as KEYS, or SMEMBERS on large sets.
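A toy model can make the answer concrete. It uses Python's selectors module, which picks epoll or kqueue where available, to serve many connections from one thread and run INCR commands strictly one at a time. This is an illustration under simplifying assumptions (a made-up line protocol, tiny replies, no write buffering), not how Redis's own event loop is implemented.

```python
import selectors
import socket


def run_event_loop(conns: list, store: dict) -> None:
    """Toy single-threaded server that multiplexes many connections.

    Commands run one at a time to completion, so INCR needs no lock.
    Assumes replies are tiny: a real server would buffer writes and wait
    for EVENT_WRITE instead of calling sendall on a non-blocking socket.
    """
    sel = selectors.DefaultSelector()  # epoll/kqueue/select per platform
    buffers = {}
    for conn in conns:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)
        buffers[conn] = b""
    while buffers:  # run until every client has hung up
        for key, _ in sel.select(timeout=1.0):
            conn = key.fileobj
            chunk = conn.recv(4096)
            if not chunk:  # EOF: the client closed its end
                sel.unregister(conn)
                del buffers[conn]
                conn.close()
                continue
            buffers[conn] += chunk
            # Execute every complete line, strictly one command at a time.
            while b"\n" in buffers[conn]:
                line, buffers[conn] = buffers[conn].split(b"\n", 1)
                parts = line.decode().split()
                if len(parts) == 2 and parts[0] == "INCR":
                    store[parts[1]] = store.get(parts[1], 0) + 1
                    conn.sendall(b"%d\n" % store[parts[1]])
    sel.close()


if __name__ == "__main__":
    demo_store = {}
    demo_pairs = [socket.socketpair() for _ in range(3)]  # 3 fake clients
    for client, _ in demo_pairs:
        client.sendall(b"INCR hits\nINCR hits\n")
        client.shutdown(socket.SHUT_WR)  # signal "no more commands"
    run_event_loop([server for _, server in demo_pairs], demo_store)
    print(demo_store)  # {'hits': 6}
    for client, _ in demo_pairs:
        client.close()
```

Six increments arrive from three connections, yet the counter ends at exactly 6 without any lock, because only one command is ever executing at a time.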

Frequently Asked Questions

1. Is Redis a database or a cache?
2. What happens to Redis data when the server restarts?
3. When should I use a Hash instead of storing JSON as a String?
4. What eviction policy should I use for a production cache?
5. Can I run Redis in a Kubernetes cluster?