Intermediate 17 min · March 05, 2026

Redis Basics

Redis Write Failures — Default Eviction Policy

Q: Is Redis a database or a cache?

Redis is both, and that's not a cop-out. It's an in-memory data structure server that can act as a cache (with TTLs and eviction), a primary database (with RDB or AOF persistence), a message broker (with pub/sub or Streams), or a queue (with Lists and BLPOP). Most teams use it alongside a relational database — Redis handles high-frequency reads while the relational DB handles durable writes.

Q: What happens to Redis data when the server restarts?

By default, if you're using Redis purely as an in-memory cache with no persistence configured, all data is lost on restart, which is usually fine for a cache. For durability, enable RDB snapshotting (periodic disk snapshots) or AOF (append-only log of every write). With AOF and `appendfsync always`, you get near-zero data loss at a significant write-latency cost, because every write waits for an fsync. The default, `appendfsync everysec`, is much faster and risks losing about one second of writes.

Q: When should I use a Hash instead of storing JSON as a String?

Use a Hash whenever you need to update individual fields of an object independently and frequently. If you store a user as a JSON String and need to increment their login_count, you must fetch the entire blob, deserialize it, increment the field, and store it back — all in two round trips with a race condition window. With a Hash, HINCRBY user:1001 login_count 1 does this atomically in one command with no deserialization needed.

Q: What eviction policy should I use for a production cache?

For a pure cache where all data can be regenerated, use `allkeys-lru`. It evicts the least recently used keys across the entire dataset when memory fills up. For mixed workloads (some permanent keys, some TTL-based caches), use `volatile-ttl`, which evicts keys with shorter remaining TTLs first. Never leave the default `noeviction` on a cache: it makes writes fail when memory is full.

Q: Can I run Redis in a Kubernetes cluster?

Yes, but carefully. Redis is stateful — it needs persistent volumes for RDB/AOF files. Use a StatefulSet with PVCs. Configure `maxmemory-policy` appropriately because memory is a constrained resource in containers. Use `redis-sentinel` or Redis Cluster for high availability. Be aware that Kubernetes pod restarts will trigger data loss if persistence is not configured correctly. Many production teams run Redis outside Kubernetes (on dedicated VMs) to avoid these complexities.

OOM errors from the default 'noeviction' policy: use MEMORY USAGE to detect session leaks and avoid late-night pager calls. Debugging steps for production.

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Drawn from code that ran under real load.

Last updated: July 19, 2026

Before you start (⏱ 25 min)

✓ Solid grasp of fundamentals
✓ Comfortable reading code examples
✓ Basic production concepts

⚡Quick Answer

Redis is a single-threaded, in-memory data structure server that operates in microseconds
Uses RAM for storage — eliminates disk I/O latency (nanoseconds vs milliseconds)
Core types: Strings, Hashes, Lists, Sets, Sorted Sets — each designed for a specific access pattern
Key expiry (TTL) prevents memory bloat and ensures data freshness
Production risk: the default noeviction policy makes writes fail as soon as maxmemory is reached

✦ Definition~90s read

What Is Redis?

Redis is an in-memory key-value store that operates entirely in RAM, which gives it sub-millisecond latency but imposes a hard memory ceiling. When you hit that ceiling—say your 4GB Redis instance fills up—something has to give. The default eviction policy (noeviction) simply rejects any write that would exceed maxmemory, returning an error to your application.

This is fine for caches where you control TTLs tightly, but catastrophic for session stores or queues where writes must succeed. The alternative policies—like allkeys-lru or volatile-ttl—automatically delete keys to free space, trading data retention for availability.

You choose the policy based on whether your data is disposable (a cache you can rebuild) or must be retained (sessions, queues).

Redis's RAM-only design is both its superpower and its Achilles' heel. Unlike PostgreSQL or MySQL, which spill to disk and handle datasets larger than memory, Redis keeps everything hot. This means a single misconfigured eviction policy can silently drop your most recent user sessions while preserving stale data from last week.

The maxmemory setting (default 0 = unlimited) is the safety valve—without it, Redis will happily OOM your server. In production, you pair this with Redis data structures like sorted sets for leaderboards or hashes for user profiles, each with different memory footprints that affect how quickly you hit the limit.

The cache-aside pattern is where most teams encounter eviction: your app checks Redis first, falls back to the database on miss, then writes the result back to Redis with a TTL. If you set maxmemory-policy allkeys-lru, Redis will evict the least recently used keys when full—perfect for this pattern.

But if you're using Redis for Streams or List-based queues, eviction can silently delete a whole key and every message stored in it. (Pub/sub messages are never stored as keys, so eviction does not touch them.) The tradeoff is stark: noeviction never evicts data but risks write failures under load; allkeys-lru keeps writes flowing but may delete data you thought was safe.

Production incidents often trace back to mismatched eviction policies and memory limits.

Plain-English First

Imagine your brain vs a filing cabinet. When your teacher asks you the capital of France, you don't go rummaging through a cabinet — you just know it instantly because it's in your short-term memory. Redis is that short-term memory for your application. Your main database (PostgreSQL, MySQL) is the filing cabinet — thorough but slow. Redis sits in RAM, razor-close to your app, so fetching a value takes microseconds instead of milliseconds. It's a supercharged sticky-note board your entire server cluster can share.

Every high-traffic application hits the same wall eventually: the database becomes the bottleneck. A query that takes 40ms feels invisible during development but becomes catastrophic when 10,000 users hit it simultaneously. Twitter, GitHub, Stack Overflow, and Shopify all reached this wall — and Redis is a big part of how they broke through it. It's not an exaggeration to say that understanding Redis is what separates junior developers from engineers who can design systems that actually scale.

Redis (Remote Dictionary Server) solves the read-amplification problem. Most web applications read data far more than they write it — a product page might be read 50,000 times a day but updated once. Hammering your relational database with 50,000 identical queries is wasteful and slow. Redis lets you compute the answer once, store it in memory, and serve all 50,000 requests from there in microseconds. But Redis isn't just a cache — it's a full data structure server that can power rate limiters, leaderboards, pub/sub messaging, session stores, and queues.

By the end of this article you'll understand not just what Redis commands look like, but WHY each data structure exists, WHEN to reach for each one, and how to wire Redis into a real application pattern. You'll also learn the subtle mistakes — wrong expiry strategies, cache stampedes, missing persistence configs — that trip up developers who learned Redis from a cheat sheet instead of from first principles.

What Redis Eviction Policy Actually Does

Redis is an in-memory key-value store where all data lives in RAM. When memory fills up, Redis must decide which keys to remove to make room for new writes — that decision is the eviction policy. By default, Redis uses no eviction (noeviction), meaning writes fail with an OOM error when maxmemory is reached. This is not a bug; it's a deliberate safety mechanism to prevent data loss without explicit configuration.

The eviction policy is set via maxmemory-policy in redis.conf. With noeviction, Redis rejects SET, LPUSH, and other write commands, returning an error to the client. Other policies like allkeys-lru or volatile-ttl evict keys based on usage or TTL. The default choice forces you to acknowledge memory limits — a design that prevents silent data loss but requires proactive capacity planning.

Use noeviction when data integrity is critical and you can predict memory usage — for example, caching session tokens with fixed TTLs. In production, teams often switch to allkeys-lru for general caching, but the default catches engineers off guard when memory spikes cause sudden write failures. Understanding this default is the first step to designing resilient Redis deployments.

⚠ Default ≠ Safe

The default noeviction policy does not protect you from crashes — it protects you from silent eviction, but write failures can cascade into application errors.

📊 Production Insight

Teams using Redis for rate limiting hit noeviction when a traffic burst fills memory, causing every rate-limit increment to fail. If the application swallows the error, the limiter silently stops counting.

Symptom: clients receive 'OOM command not allowed when used memory > maxmemory' errors on every write, while reads still work.

Rule: always set maxmemory-policy to allkeys-lru for caching workloads; reserve noeviction only for data you cannot afford to lose.

🎯 Key Takeaway

Default noeviction causes write failures, not data eviction — plan for it.

Memory limits are enforced at write time, not proactively — monitor used_memory.

Choose eviction policy based on data criticality, not convenience — noeviction is for durable data, not caches.
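One way to act on the "monitor used_memory" advice is a small check over the fields Redis reports in INFO memory. This is a minimal sketch, not a monitoring solution: the function name `memory_pressure` and the 80% threshold are assumptions, and it works on the dictionary that redis-py's `info("memory")` returns rather than talking to a server itself.

```python
def memory_pressure(info: dict, warn_ratio: float = 0.8) -> str:
    """Classify memory pressure from an INFO memory dictionary.

    Expects the keys `used_memory`, `maxmemory` and `maxmemory_policy`,
    which Redis includes in the memory section of INFO.
    """
    used = info["used_memory"]
    limit = info.get("maxmemory", 0)
    policy = info.get("maxmemory_policy", "noeviction")

    # maxmemory 0 means "no limit": Redis never evicts and never rejects writes,
    # so the operating system is the only ceiling.
    if limit == 0:
        return "unbounded: maxmemory is 0, only the OS limits Redis"

    ratio = used / limit
    if ratio >= warn_ratio and policy == "noeviction":
        return f"danger: {ratio:.0%} used with noeviction, writes will start failing"
    if ratio >= warn_ratio:
        return f"warning: {ratio:.0%} used, expect evictions under {policy}"
    return f"ok: {ratio:.0%} used"
```

With the redis-py client used later in this article, a periodic job could call `memory_pressure(redis_client.info("memory"))` and alert on any result that starts with "danger".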

Why Redis Lives in RAM and Why That Changes Everything

Traditional databases store data on disk. Disk access — even an NVMe SSD — operates in the microseconds-to-milliseconds range. RAM access operates in nanoseconds. That's not a small difference; it's three orders of magnitude. Redis keeps its entire dataset in memory by default, which is the single most important architectural decision behind its speed.

But speed isn't the only trick. Redis is single-threaded for command execution. That sounds like a limitation until you understand what it eliminates: lock contention. In a multi-threaded database, threads fight over the same rows with locks. Redis sidesteps that fight entirely: one command runs to completion before the next starts. This makes every individual command atomic, which matters enormously for things like incrementing a counter (INCR) or setting a value only if it is absent (SET with NX). A sequence of separate commands is not atomic as a whole; for that you need MULTI/EXEC or a Lua script.

Redis also supports optional persistence. You can tell it to snapshot its RAM contents to disk once at least M keys have changed within N seconds (RDB snapshotting) or to log every write command to an append-only file (AOF). Many production setups enable both. This means Redis isn't just a volatile cache: it can survive a restart and recover its data.

The practical takeaway: use Redis for data that is read far more than it's written, where milliseconds matter, and where you can tolerate the data being slightly stale or reproducible if lost.

redis_getting_started.sh (Bash)

# --- Step 1: Start the Redis server (run in one terminal) ---
# This launches Redis with default config on port 6379
redis-server

# --- Step 2: Connect with the Redis CLI (run in another terminal) ---
redis-cli

# --- Step 3: Ping the server to confirm it's alive ---
127.0.0.1:6379> PING
# Redis responds with PONG — the simplest health check you'll ever do

# --- Step 4: Store a string value (the most basic operation) ---
# SET key value
# We're caching a user's display name keyed by their user ID
127.0.0.1:6379> SET user:1001:display_name "Alice Nguyen"

# --- Step 5: Retrieve it ---
127.0.0.1:6379> GET user:1001:display_name

# --- Step 6: Store a value with an expiry (TTL = Time To Live) ---
# EX sets expiry in seconds — this key auto-deletes after 300 seconds (5 minutes)
# This is the pattern for caching: store it, let it expire, recompute if missing
127.0.0.1:6379> SET product:42:price "29.99" EX 300

# --- Step 7: Check how many seconds remain before expiry ---
127.0.0.1:6379> TTL product:42:price

# --- Step 8: Check if a key exists without fetching its value ---
127.0.0.1:6379> EXISTS user:1001:display_name

Output

# redis-server output (abbreviated):
# * Ready to accept connections on port 6379

# redis-cli responses, in command order:
PONG
OK
"Alice Nguyen"
OK
(integer) 298 # seconds remaining, counting down in real time
(integer) 1 # 1 = key exists, 0 = it doesn't

💡Pro Tip: Use Colons as Namespace Separators

Redis has no concept of tables or schemas. The community convention is to structure keys like entity:id:field — e.g., user:1001:display_name or session:abc123. This makes keys self-documenting and lets tools like RedisInsight group them visually. Never use flat keys like displayname1001 — you'll hate yourself when you have 2 million keys to debug.
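The colon convention is easier to keep consistent when keys are built in one place. The helper below is a hypothetical sketch (the name `make_key` and its validation rules are assumptions, not part of Redis or redis-py): it joins the parts with colons and rejects parts that would blur the namespace boundaries.

```python
def make_key(*parts) -> str:
    """Build a namespaced Redis key such as 'user:1001:display_name'."""
    cleaned = [str(part) for part in parts]
    if not cleaned:
        raise ValueError("a key needs at least one part")
    for part in cleaned:
        # An empty part or an embedded colon would make the key ambiguous.
        if part == "" or ":" in part:
            raise ValueError(f"invalid key part: {part!r}")
    return ":".join(cleaned)
```

For example, `make_key("user", 1001, "display_name")` produces `user:1001:display_name`, the same key used in the walkthrough above.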

📊 Production Insight

Single-threaded means one slow command blocks all others — a KEYS * on 10M keys freezes Redis for seconds.

Latency jumps from microseconds to milliseconds when RAM is nearly full: the OS may start swapping Redis pages to disk, or Redis spends time evicting keys.

Rule: monitor used_memory against maxmemory, plus instantaneous_ops_per_sec, in production; never use KEYS in app code.
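The usual replacement for KEYS in application code is SCAN, which walks the keyspace in small batches so other commands can run in between. A minimal sketch with redis-py's `scan_iter` follows; the function name `count_keys` and the batch size are assumptions, and `client` is expected to be a `redis.Redis` instance or anything with the same method.

```python
def count_keys(client, pattern: str, batch_hint: int = 1000) -> int:
    """Count keys matching `pattern` without blocking Redis the way KEYS does."""
    total = 0
    # scan_iter issues repeated SCAN calls under the hood; `count` is only a
    # hint for how many keys Redis inspects per call, not a result limit.
    for _key in client.scan_iter(match=pattern, count=batch_hint):
        total += 1
    return total
```

SCAN is still O(n) over the whole keyspace in total, so this belongs in diagnostics and batch jobs rather than in a request path.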

🎯 Key Takeaway

Redis is fast because RAM + single thread eliminates disk I/O and lock contention.

Every command is atomic — a blessing for counters, a curse if you run an O(n) scan.

Use colons for key names, always set TTLs, and monitor memory before it bites you.

Redis Data Structures — Picking the Right Tool for Each Problem

Redis isn't just a key-value store in the boring sense. It stores five core data types, and choosing the right one is the difference between an elegant solution and a painful hack.

Strings — the default. Good for counters, cached HTML, serialized JSON blobs, and session tokens. The INCR command atomically increments a string-as-integer, making it perfect for rate limiting and hit counters.

Hashes — think of a Hash as a mini dictionary attached to one key. Instead of storing a user as one giant JSON blob, you store their fields separately. This lets you update a single field without fetching and re-serializing the entire object.

Lists — ordered, duplicates allowed. Built on a linked list internally. Ideal for queues (push to the tail, pop from the head) and activity feeds (push new events to the head, trim the list to keep only the last N).

Sets — unordered, unique members. Perfect for tracking unique visitors, tagging systems, or finding common followers between two users with SINTER.

Sorted Sets — the crown jewel. Every member has a floating-point score. Redis keeps members ordered by score automatically. This is how you build leaderboards, priority queues, and range-based queries without a single SQL ORDER BY.

redis_data_structures.sh (Bash)

# =========================================================
# STRINGS — atomic counter for API rate limiting
# =========================================================

# Track how many API calls user 2055 has made in the current minute
# The key embeds the minute bucket (YYYYMMDDHHMM), so each minute gets a fresh counter
# INCR is atomic, so it is safe even with concurrent requests
127.0.0.1:6379> INCR api_calls:user:2055:minute:202406151430
127.0.0.1:6379> INCR api_calls:user:2055:minute:202406151430
127.0.0.1:6379> INCR api_calls:user:2055:minute:202406151430

# Expire the counter 60 seconds from now so old buckets clean themselves up
127.0.0.1:6379> EXPIRE api_calls:user:2055:minute:202406151430 60

# =========================================================
# HASHES — store a user profile without one giant JSON blob
# =========================================================

# HSET sets one or more fields on a hash key
# Updating just the email later only rewrites that one field
127.0.0.1:6379> HSET user:2055 username "bob_the_dev" email "bob@example.com" plan "pro" login_count 0

# Retrieve a single field — no need to deserialize a full JSON object
127.0.0.1:6379> HGET user:2055 email

# Retrieve all fields at once
127.0.0.1:6379> HGETALL user:2055

# Atomically increment just the login counter
127.0.0.1:6379> HINCRBY user:2055 login_count 1

# =========================================================
# SORTED SETS — real-time game leaderboard
# =========================================================

# ZADD leaderboard_key score member
# Score is the player's points — Redis sorts automatically
127.0.0.1:6379> ZADD game:leaderboard 4200 "player:alice"
127.0.0.1:6379> ZADD game:leaderboard 8750 "player:bob"
127.0.0.1:6379> ZADD game:leaderboard 6100 "player:carol"

# Fetch top 3 players, highest score first (WITHSCORES shows the score)
# ZREVRANGE = reverse order = highest to lowest
127.0.0.1:6379> ZREVRANGE game:leaderboard 0 2 WITHSCORES

# Get a specific player's rank (0-indexed, 0 = top)
127.0.0.1:6379> ZREVRANK game:leaderboard "player:alice"

# =========================================================
# LISTS — lightweight task queue
# =========================================================

# RPUSH adds to the RIGHT (tail) of the list
# Workers BLPOP from the LEFT (head), so jobs come out in FIFO order
127.0.0.1:6379> RPUSH email_queue "{\"to\":\"alice@example.com\",\"subject\":\"Welcome\"}"
127.0.0.1:6379> RPUSH email_queue "{\"to\":\"bob@example.com\",\"subject\":\"Reset\"}"

# BLPOP = blocking pop — worker waits up to 5 seconds for a job
# This is more efficient than polling in a loop
127.0.0.1:6379> BLPOP email_queue 5

Output

# INCR responses:

(integer) 1

(integer) 2

(integer) 3

(integer) 1 # EXPIRE confirmation

# HGET response:

"bob@example.com"

# HGETALL response:

1) "username"

2) "bob_the_dev"

3) "email"

4) "bob@example.com"

5) "plan"

6) "pro"

7) "login_count"

8) "0"

# HINCRBY response:

(integer) 1

# ZREVRANGE top 3:

1) "player:bob"

2) "8750"

3) "player:carol"

4) "6100"

5) "player:alice"

6) "4200"

# ZREVRANK alice (0-indexed from top):

(integer) 2 # alice is 3rd place

# BLPOP response (job dequeued):

1) "email_queue"

2) "{\"to\":\"alice@example.com\",\"subject\":\"Welcome\"}"

⚠ Watch Out: Don't Store Giant Objects in a Single String

A common mistake is serializing an entire user object — with 40 fields — into one JSON string and storing it as a Redis String. Every time you need to update the user's last_login field, you must GET the entire blob, deserialize it in your app, update the field, re-serialize, and SET it back. Under concurrent load this causes race conditions and unnecessary network traffic. Use a Hash instead — HSET lets you update one field atomically in a single round trip.
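To make the difference concrete, here is a sketch of both approaches for a login counter. The function names and the `user_json:` key prefix are hypothetical; `client` is assumed to be a redis-py client created with `decode_responses=True`.

```python
import json


def record_login_json(client, user_id: int) -> int:
    """JSON-in-a-String approach: two round trips and a race window."""
    key = f"user_json:{user_id}"
    profile = json.loads(client.get(key))     # round trip 1: fetch the whole blob
    profile["login_count"] += 1               # another client may write in between
    client.set(key, json.dumps(profile))      # round trip 2: overwrite the whole blob
    return profile["login_count"]


def record_login_hash(client, user_id: int) -> int:
    """Hash approach: one atomic command, no deserialization."""
    # HINCRBY creates the field at 0 if it is missing, then increments it.
    return client.hincrby(f"user:{user_id}", "login_count", 1)
```

If two requests run `record_login_json` at the same time, both can read the same count and one increment is lost. `record_login_hash` has no such window because the increment happens inside Redis.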

📊 Production Insight

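Large Strings (>10KB) increase memory fragmentation and network latency per operation.

Sorted Sets insert and update in O(log n), so per-operation cost grows slowly, but a single key with tens of millions of members is a big key: it cannot be split across Redis Cluster shards, and any range query that returns many members blocks other commands while it runs.

Rule: keep values under 10KB; if larger, consider compression or separate storage.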

🎯 Key Takeaway

Choose the data structure that matches your access pattern, not the one you're used to.

Strings are not for mutable objects — Hashes are.

Sorted Sets are Redis's superpower: real-time rankings, range queries, and priority queues in one command.
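The atomic String counter described in this section can be wrapped into a fixed-window rate limiter in a few lines. This is a sketch under stated assumptions: the function name `allow_request`, the key layout, and the default limit are made up for illustration, and `client` is assumed to be a redis-py client.

```python
import time


def allow_request(client, user_id: int, limit: int = 100,
                  window_seconds: int = 60, now: float = None) -> bool:
    """Return True while the user is within `limit` calls for the current window."""
    now = time.time() if now is None else now
    bucket = int(now // window_seconds)          # same value for the whole window
    key = f"api_calls:user:{user_id}:window:{bucket}"

    # redis-py pipelines are wrapped in MULTI/EXEC by default, so the
    # increment and the expiry are applied together.
    pipe = client.pipeline()
    pipe.incr(key)
    pipe.expire(key, window_seconds)             # old buckets clean themselves up
    count, _expire_ok = pipe.execute()
    return count <= limit
```

A fixed window allows short bursts at window boundaries (up to twice the limit across two adjacent windows); a Sorted Set keyed by timestamp is a common choice when a sliding window is needed.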

The Cache-Aside Pattern — Wiring Redis Into a Real Application

Knowing Redis commands is one thing. Knowing how to integrate Redis into your application code without creating subtle bugs is another. The most widely-used pattern is Cache-Aside (also called Lazy Loading). The logic is elegantly simple: when your app needs data, check Redis first. If it's there (a cache hit), return it immediately. If it's not (a cache miss), fetch it from the database, store it in Redis with a TTL, then return it. Redis never gets data pushed to it — your application pulls it through.

This pattern is powerful because it's self-healing. If Redis goes down and loses all its data, your app degrades gracefully, provided your code treats Redis errors as cache misses: everything just goes to the database until Redis is warm again. The cache populates itself organically based on what users actually request, not what you predict they'll request.
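Graceful degradation only happens if a failed Redis call is handled like a cache miss instead of surfacing as an exception. The sketch below shows one way to do that. The name `get_or_load` and the `cache_errors` parameter are assumptions; with redis-py you would typically pass `(redis.exceptions.RedisError,)` rather than the deliberately broad default used here to keep the example dependency-free.

```python
import json


def get_or_load(client, cache_key: str, loader, ttl_seconds: int = 300,
                cache_errors: tuple = (Exception,)):
    """Cache-aside read that falls back to `loader()` when Redis is unavailable."""
    try:
        cached = client.get(cache_key)
        if cached is not None:
            return json.loads(cached)
    except cache_errors:
        pass  # an unreachable cache is treated as a miss

    value = loader()  # the slow path: database query or other source of truth

    try:
        client.set(cache_key, json.dumps(value), ex=ttl_seconds)
    except cache_errors:
        pass  # serving the value matters more than caching it
    return value
```

During an outage every request takes the slow path, so the database must be able to absorb that load, or the caller needs its own throttling.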

The critical detail most tutorials skip: always set a TTL. Without one, your cache grows forever and you'll eventually run out of RAM. More importantly, stale data lives forever. If a product's price changes in your database but the Redis entry never expires, customers see wrong prices indefinitely. Your TTL is your freshness guarantee.

The code below shows this pattern implemented in Python with the redis-py library — the same library used by Instagram and Pinterest in production.

cache_aside_pattern.py (Python)

import redis
import json
import time

# --- Connect to Redis ---
# decode_responses=True means Redis returns strings instead of bytes
redis_client = redis.Redis(
    host="localhost",
    port=6379,
    db=0,
    decode_responses=True
)

# --- Simulated database fetch (replace with your real DB query) ---
def fetch_product_from_database(product_id: int) -> dict:
    """
    Simulates a slow database query.
    In production this would be: cursor.execute('SELECT * FROM products WHERE id = %s', [product_id])
    """
    print(f"[DB] Querying database for product {product_id}...")
    time.sleep(0.05)  # simulate 50ms DB query latency
    return {
        "id": product_id,
        "name": "Mechanical Keyboard TKL",
        "price": 129.99,
        "stock": 42
    }

def get_product(product_id: int) -> dict:
    """
    Cache-Aside Pattern implementation.
    Always check Redis first. Fall back to DB on miss. Always set a TTL.
    """
    cache_key = f"product:{product_id}"  # namespaced key following the colon convention
    cache_ttl_seconds = 300              # cache is valid for 5 minutes

    # --- Step 1: Try the cache first ---
    cached_value = redis_client.get(cache_key)

    if cached_value is not None:
        # Cache HIT — data found in Redis, no DB query needed
        print(f"[CACHE] Hit for key '{cache_key}'")
        return json.loads(cached_value)  # deserialize from JSON string back to dict

    # --- Step 2: Cache MISS — go to the database ---
    print(f"[CACHE] Miss for key '{cache_key}'")
    product_data = fetch_product_from_database(product_id)

    # --- Step 3: Populate the cache for next time ---
    # json.dumps serializes the dict to a JSON string for storage
    # ex=cache_ttl_seconds ensures the key auto-expires — NEVER skip this
    redis_client.set(
        cache_key,
        json.dumps(product_data),
        ex=cache_ttl_seconds
    )
    print(f"[CACHE] Stored '{cache_key}' with TTL={cache_ttl_seconds}s")

    return product_data

def invalidate_product_cache(product_id: int):
    """
    Call this whenever a product is updated in the database.
    Removing the key forces the next request to re-fetch fresh data.
    """
    cache_key = f"product:{product_id}"
    deleted_count = redis_client.delete(cache_key)
    if deleted_count > 0:
        print(f"[CACHE] Invalidated key '{cache_key}'")
    else:
        print(f"[CACHE] Key '{cache_key}' wasn't in cache — nothing to invalidate")

# --- Demo ---
if __name__ == "__main__":
    print("=== First request — cold cache ===")
    product = get_product(product_id=7)
    print(f"Result: {product}\n")

    print("=== Second request — warm cache ===")
    product = get_product(product_id=7)
    print(f"Result: {product}\n")

    print("=== Simulating a product update ===")
    invalidate_product_cache(product_id=7)

    print("\n=== Third request — cache was invalidated ===")
    product = get_product(product_id=7)
    print(f"Result: {product}")

Output

=== First request — cold cache ===

[CACHE] Miss for key 'product:7'

[DB] Querying database for product 7...

[CACHE] Stored 'product:7' with TTL=300s

Result: {'id': 7, 'name': 'Mechanical Keyboard TKL', 'price': 129.99, 'stock': 42}

=== Second request — warm cache ===

[CACHE] Hit for key 'product:7'

Result: {'id': 7, 'name': 'Mechanical Keyboard TKL', 'price': 129.99, 'stock': 42}

=== Simulating a product update ===

[CACHE] Invalidated key 'product:7'

=== Third request — cache was invalidated ===

[CACHE] Miss for key 'product:7'

[DB] Querying database for product 7...

[CACHE] Stored 'product:7' with TTL=300s

Result: {'id': 7, 'name': 'Mechanical Keyboard TKL', 'price': 129.99, 'stock': 42}

🔥Interview Gold: Cache-Aside vs Write-Through

Cache-Aside loads data lazily (on first read). Write-Through updates the cache on every write, keeping it always warm but adding write latency. The tradeoff: Cache-Aside has slower first reads but only caches what's actually needed. Write-Through has faster reads but wastes memory caching data that may never be read again. Most production systems use Cache-Aside with explicit invalidation on writes — exactly the pattern shown above.
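For contrast with the cache-aside code above, a write-through update could look roughly like this. It is a sketch only: `save_product_write_through` and the `write_to_database` callback are hypothetical names, and `client` is assumed to be a redis-py client.

```python
import json


def save_product_write_through(client, write_to_database, product: dict,
                               ttl_seconds: int = 300) -> None:
    """Write-through: persist to the database, then refresh the cache right away."""
    # The database stays the source of truth, so it is written first. If this
    # raises, the cache is left untouched and keeps the previous value.
    write_to_database(product)

    # The extra Redis write on every update is the latency cost of keeping
    # the cache warm.
    cache_key = f"product:{product['id']}"
    client.set(cache_key, json.dumps(product), ex=ttl_seconds)
```

The cache entry here is refreshed whether or not anyone reads the product again, which is the memory cost the comparison above refers to.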

📊 Production Insight

Cache-Aside with uniform TTL causes thundering herd when all keys expire together.

Without invalidation, stale data persists for TTL duration — can display wrong prices or broken states.

Rule: add random jitter to TTLs (±20%) and always have an invalidation path for critical data.
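The jitter rule takes only a few lines to apply. A minimal sketch, assuming a helper named `jittered_ttl` (not part of redis-py) whose result is passed as the `ex` argument when writing a cache entry:

```python
import random


def jittered_ttl(base_seconds: int, spread: float = 0.2, rng=random) -> int:
    """Return base_seconds adjusted by a random factor within +/- spread."""
    factor = 1 + rng.uniform(-spread, spread)
    # Never return less than 1 second: Redis rejects a zero or negative EX value.
    return max(1, round(base_seconds * factor))
```

With a 300-second base, keys written in the same moment now expire somewhere between 240 and 360 seconds later instead of all at once.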

🎯 Key Takeaway

Check Redis first, DB as fallback, always set TTL with jitter.

Invalidate cache explicitly on writes — don't wait for TTL to expire.

This pattern is production-proven (Instagram, Pinterest, Twitter) for good reason.

Redis Expiry, Eviction and Why Your Cache Will Betray You Without Them

TTLs are your first line of defense against stale data. But what happens when Redis runs out of memory before any keys expire? This is where eviction policies come in, and most developers don't think about them until Redis starts refusing writes in production — which is a very bad day.

Redis has several eviction policies configured via maxmemory-policy in your redis.conf. The default policy is noeviction — Redis refuses new writes when full. That sounds safe but it means your application starts throwing errors. For a cache, you almost always want allkeys-lru (evict the least recently used key across all keys) or volatile-lru (evict the least recently used key that has a TTL set).

There's also the cache stampede problem — also called the thundering herd. Imagine 500 concurrent users all request the same popular product page. The cache entry expires at the exact same moment. All 500 requests find a cache miss simultaneously and all fire a database query at once. Your database gets hammered with 500 identical queries in the same millisecond. The fix is probabilistic early expiration or using a mutex lock in your cache-miss path so only one request rebuilds the cache while others wait.

The rule of thumb: if your cache powers any page that gets high traffic, you need to think about stampedes. If your cache serves data with truly random access patterns, you probably don't.
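The mutex approach can be sketched in Python as follows. Everything named here is an assumption for illustration: the function `get_with_rebuild_lock`, the lock key layout, and the polling parameters. `client` is assumed to be a redis-py client with `decode_responses=True`. The lock is released with a small Lua script so that a caller only deletes a lock it still owns.

```python
import json
import time
import uuid

# Delete the lock only if it still holds our token (runs atomically in Redis).
RELEASE_LOCK_LUA = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
end
return 0
"""


def get_with_rebuild_lock(client, cache_key, loader, ttl_seconds=300,
                          lock_ttl_seconds=10, wait_seconds=0.05, max_waits=20):
    """Cache-aside read where only one caller rebuilds a missing entry."""
    cached = client.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    lock_key = f"lock:{cache_key}:rebuild"
    token = uuid.uuid4().hex
    # SET NX EX: exactly one caller gets a truthy reply; the TTL releases the
    # lock automatically if that caller crashes mid-rebuild.
    if client.set(lock_key, token, nx=True, ex=lock_ttl_seconds):
        try:
            value = loader()
            client.set(cache_key, json.dumps(value), ex=ttl_seconds)
            return value
        finally:
            client.eval(RELEASE_LOCK_LUA, 1, lock_key, token)

    # Someone else is rebuilding: poll the cache briefly instead of hitting the DB.
    for _ in range(max_waits):
        time.sleep(wait_seconds)
        cached = client.get(cache_key)
        if cached is not None:
            return json.loads(cached)

    # Gave up waiting: load directly so the request still succeeds.
    return loader()
```

The lock TTL should comfortably exceed the slowest expected rebuild; if the rebuild outlives the lock, a second caller can start its own rebuild, which is wasteful but not incorrect.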

redis_expiry_and_eviction.sh (Bash)

# =========================================================
# CONFIGURING EVICTION POLICY (redis.conf or at runtime)
# =========================================================

# Set max memory to 256MB — Redis will start evicting when this is reached
127.0.0.1:6379> CONFIG SET maxmemory 268435456

# allkeys-lru = evict least-recently-used key from the ENTIRE keyspace
# Best default for a pure cache where all keys have roughly equal value
127.0.0.1:6379> CONFIG SET maxmemory-policy allkeys-lru

# Confirm the config was applied
127.0.0.1:6379> CONFIG GET maxmemory-policy

# =========================================================
# TTL COMMANDS — managing key lifetime
# =========================================================

# SET with expiry in seconds
127.0.0.1:6379> SET session:user:8821 "eyJhbGciOiJIUzI1NiJ9" EX 3600

# SET with expiry in milliseconds (for sub-second precision)
127.0.0.1:6379> SET rate_check:ip:192.168.1.1 "1" PX 60000

# Check TTL in seconds (-1 = no expiry, -2 = key doesn't exist)
127.0.0.1:6379> TTL session:user:8821

# Check TTL in milliseconds (more precise)
127.0.0.1:6379> PTTL rate_check:ip:192.168.1.1

# PERSIST removes the TTL — key lives forever (use with caution)
127.0.0.1:6379> PERSIST session:user:8821
127.0.0.1:6379> TTL session:user:8821

# =========================================================
# MUTEX PATTERN — prevent cache stampedes
# NX = only set if key does NOT exist (atomic check-and-set)
# This is a distributed lock: only the first caller wins
# =========================================================

# First request tries to acquire the rebuild lock (TTL=10s to auto-release)
127.0.0.1:6379> SET lock:product:7:rebuild "1" NX EX 10

# Second concurrent request tries the same — gets nil (lock is taken)
127.0.0.1:6379> SET lock:product:7:rebuild "1" NX EX 10

# After the first request rebuilds the cache and releases the lock:
127.0.0.1:6379> DEL lock:product:7:rebuild

# =========================================================
# CHECK MEMORY USAGE
# =========================================================

# See overall memory stats
127.0.0.1:6379> INFO memory

# See exactly how much RAM one key is using (in bytes)
127.0.0.1:6379> MEMORY USAGE session:user:8821

Output

# CONFIG SET responses:
OK
OK

# CONFIG GET maxmemory-policy:
1) "maxmemory-policy"
2) "allkeys-lru"

# SET with EX/PX:
OK
OK

# TTL session:user:8821:
(integer) 3598 # approximately 3600, decreasing

# PTTL rate_check (milliseconds):
(integer) 59847

# After PERSIST:
(integer) 1 # PERSIST removed the TTL
(integer) -1 # TTL now reports no expiry: the key lives forever

# First SET NX (lock acquired):
OK

# Second SET NX (lock already held):
(nil) # nil = the SET was rejected because the lock is taken

# DEL lock:
(integer) 1 # 1 = key was deleted

# MEMORY USAGE:
(integer) 88 # this specific key uses 88 bytes of RAM

⚠ Watch Out: The Default Eviction Policy Will Cause Production Outages

Redis ships with maxmemory-policy noeviction by default. If you don't set a maxmemory limit AND an eviction policy before going to production, one of two things happens: with no limit, Redis consumes RAM until the OS kills it; with a limit but no eviction, Redis fills up and starts returning OOM errors to your application. For a cache, always set maxmemory and maxmemory-policy allkeys-lru in your redis.conf before deploying. Check it now, seriously.
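That check is easy to automate at application startup. The sketch below assumes a redis-py client and a hypothetical helper named `cache_config_problems`; it relies on `config_get`, which returns a dictionary of setting names to string values.

```python
def cache_config_problems(client) -> list:
    """Return human-readable problems with a cache instance's memory settings."""
    problems = []

    # CONFIG GET returns a dict such as {"maxmemory": "268435456"}.
    maxmemory = int(client.config_get("maxmemory")["maxmemory"])
    policy = client.config_get("maxmemory-policy")["maxmemory-policy"]

    if maxmemory == 0:
        problems.append("maxmemory is 0 (unlimited): Redis can grow until the OS kills it")
    if policy == "noeviction":
        problems.append("maxmemory-policy is noeviction: writes fail once memory is full")
    return problems
```

An empty list means both settings were changed from their defaults. Managed Redis services sometimes restrict the CONFIG command, in which case this check has to be done through the provider's console instead.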

📊 Production Insight

noeviction is safe only for bounded, permanent data — it's a landmine for caches.

allkeys-lru works well when a subset of keys is accessed far more often than the rest, but it can evict keys that are rarely read yet important.

volatile-lru preserves permanent keys but evicts TTL keys — better for hybrid workloads.

Rule: for a pure cache, use allkeys-lru; for mixed usage, volatile-ttl or volatile-lru.

🎯 Key Takeaway

Always set maxmemory-policy to allkeys-lru for caches — never rely on defaults.

Cache stampedes are silent DB killers — use mutex locks or probabilistic early expiration.

TTL jitter prevents mass expiry; monitor evicted_keys and expired_keys in production.

Redis Eviction Policies — Technical Comparison Table

Choosing the right eviction policy is one of the most consequential Redis decisions you'll make in production. Each policy has a specific use case, and using the wrong one can silently delete data you meant to keep or cause writes to fail. Below is a technical comparison of all eight eviction policies available in Redis 7.x.

Policy	Scope	Algorithm	Evicts	Best Use Case	Data Loss Risk
noeviction	N/A	N/A	nothing	Bounded permanent stores (e.g., config keys)	None (but writes fail)
allkeys-lru	Entire keyspace	LRU approximation	Any key (LRU)	Pure cache where recently used keys are likely to be used again	Low (least recently used)
volatile-lru	Keys with TTL	LRU approximation	TTL keys (LRU)	Hybrid: permanent data + cache	Low for permanent keys, medium for TTL keys
allkeys-random	Entire keyspace	Random	Any key	Cache where all keys are equally valuable	Low
volatile-random	Keys with TTL	Random	TTL keys	Scenarios with TTL keys that can be regenerated	Medium
volatile-ttl	Keys with TTL	TTL value	Key with shortest TTL remaining	Cache-first workloads where you want to keep active data longest	Medium (may evict soon-to-expire keys)
allkeys-lfu	Entire keyspace	LFU approximation	Least frequently used	Workloads with skewed access patterns (popular content)	Low
volatile-lfu	Keys with TTL	LFU approximation	TTL keys with least frequency	Similar to volatile-lru but frequency-based	Low for permanent keys

How to choose:
- Pure cache: allkeys-lru, or allkeys-lfu if the access pattern is skewed.
- Session store with TTLs: volatile-ttl (keys closest to expiry are evicted first and are likely stale anyway) or volatile-lru.
- Permanent data only: noeviction is acceptable if you never hit the memory limit, but monitor used_memory against maxmemory.
- Random access: allkeys-random is simple and predictable.

Algorithm internals: Redis LRU/LFU are approximations, not exact. They use a sample pool of keys (default 5) to pick the one to evict. This is O(1) with low overhead. You can tune the sample size via maxmemory-samples in redis.conf. Larger samples improve eviction quality but use more CPU.
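The sampling idea described above can be sketched in a few lines. This is a simplified illustration, not Redis's implementation: the `last_access` dict, `pick_eviction_victim`, and `eviction_quality` are hypothetical names, and real Redis additionally keeps a small pool of good candidates between rounds. The sketch only shows why a larger sample picks victims closer to the true least recently used key.

```python
import random


def pick_eviction_victim(last_access, samples=5, rng=random):
    """Pick a key to evict using sampled (approximate) LRU.

    last_access maps key -> last-access timestamp. This dict is hypothetical
    bookkeeping for the sketch; Redis keeps a compact clock in each object.
    """
    if not last_access:
        return None
    sample_size = min(samples, len(last_access))
    candidates = rng.sample(list(last_access), sample_size)
    # Evict the sampled key that has been idle the longest.
    return min(candidates, key=lambda k: last_access[k])


def eviction_quality(num_keys=1000, samples=5, trials=2000, seed=42):
    """Mean age rank of the victim: 0.0 is true LRU, about 0.5 is random."""
    rng = random.Random(seed)
    last_access = {f"key:{i}": i for i in range(num_keys)}  # key:0 is oldest
    total_rank = 0
    for _ in range(trials):
        victim = pick_eviction_victim(last_access, samples, rng)
        total_rank += last_access[victim]
    return total_rank / trials / num_keys


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(f"samples={n}: mean victim rank {eviction_quality(samples=n):.3f}")
```

A sample size of 1 behaves like random eviction; raising it moves the chosen victim toward the oldest keys, at the cost of inspecting more keys per eviction, which is the trade-off maxmemory-samples controls.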

Production checklist:
1. Never rely on the default noeviction for any cache workload.
2. Set maxmemory based on available RAM, leaving headroom for other processes and for fork-time copy-on-write.
3. Monitor evicted_keys in INFO stats. It is a cumulative counter, so watch how fast it grows: a steadily rising value means the cache is under memory pressure.
4. Combine eviction with TTLs to reduce pressure on the LRU/LFU algorithms.

redis_eviction_policy_config.shBASH

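# Check current eviction policy and sampling (inside redis-cli)
127.0.0.1:6379> CONFIG GET maxmemory-policy
127.0.0.1:6379> CONFIG GET maxmemory-samples

# grep is a shell tool, so run INFO through redis-cli from the shell
redis-cli INFO stats | grep evicted_keys

# Change policy at runtime (also update redis.conf or run CONFIG REWRITE to persist it)
127.0.0.1:6379> CONFIG SET maxmemory-policy allkeys-lfu
127.0.0.1:6379> CONFIG SET maxmemory-samples 10

# Eviction and expiry counters are cumulative since startup (or CONFIG RESETSTAT)
redis-cli INFO stats | grep -E "evicted|expired"

# MEMORY DOCTOR reports memory problems and advice; it does not predict evictions
127.0.0.1:6379> MEMORY DOCTOR
# To inspect one key's eviction signal: OBJECT IDLETIME <key> (LRU) or OBJECT FREQ <key> (LFU)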

Output

# CONFIG GET maxmemory-policy

1) "maxmemory-policy"

2) "allkeys-lru"

# INFO stats | grep evicted_keys

evicted_keys:0

# CONFIG GET maxmemory-samples

1) "maxmemory-samples"

2) "5"

# After setting

# INFO stats | grep -E "evicted|expired"

evicted_keys:148

# MEMORY DOCTOR

# (detailed memory analysis)

🔥Production Tip: Monitor evicted_keys as a Health Metric

evicted_keys is a cumulative counter, so alert on its rate of change rather than its absolute value. If it keeps rising during normal traffic, your maxmemory is too low or your TTLs are too long. As a rough rule of thumb, a sustained rate above 100 keys/second suggests unbounded key growth or a working set that doesn't fit in RAM. Add memory alerting at 70% and 85% of maxmemory to catch this before evictions spike.
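Because the counter is cumulative, an eviction rate needs two samples. The sketch below parses the text of two INFO replies and computes evictions per second; `parse_info` and `evictions_per_second` are hypothetical helper names, and fetching the INFO text (via redis-cli or a client library) is left out on purpose.

```python
def parse_info(text):
    """Parse the 'field:value' lines of a Redis INFO reply into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blank lines and section headers such as "# Stats"
        name, _, value = line.partition(":")
        fields[name] = value
    return fields


def evictions_per_second(info_before, info_after, interval_seconds):
    """Eviction rate between two INFO samples taken interval_seconds apart."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    before = int(parse_info(info_before)["evicted_keys"])
    after = int(parse_info(info_after)["evicted_keys"])
    delta = after - before
    if delta < 0:
        # The counter restarted from zero (server restart or CONFIG RESETSTAT).
        delta = after
    return delta / interval_seconds


if __name__ == "__main__":
    first = "# Stats\r\nexpired_keys:10\r\nevicted_keys:100\r\n"
    second = "# Stats\r\nexpired_keys:12\r\nevicted_keys:600\r\n"
    print(evictions_per_second(first, second, 5))
```

Feeding this rate into an alert (rather than `evicted_keys > 0`) avoids paging on a cache that evicted a handful of keys weeks ago.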

📊 Production Insight

evicted_keys > 0 means data is being discarded — if those evicted keys were important, you lost data.

allkeys-lfu does slightly more bookkeeping per access than allkeys-lru but is better for skewed access.

If you need to keep some keys permanently, use volatile-lru and never set TTL on permanent keys.

Rule: for caches, prefer allkeys-lru; for instances that mix permanent keys with TTL-based cache keys, use volatile-ttl or volatile-lru.

🎯 Key Takeaway

Eight eviction policies exist, each with a distinct tradeoff between data retention and memory management.

Default noeviction is dangerous for caches; always change it in production.

Monitor evicted_keys as a leading indicator of memory pressure.

Redis Persistence — RDB vs AOF and Production Trade-offs

Redis keeps its working dataset in RAM. With persistence disabled, a restart loses everything. For a cache, that's acceptable. But many teams use Redis as a session store, a rate-limiter state store, or even a primary database for high-frequency writes. In those cases, losing data on restart is catastrophic.

Redis offers two persistence mechanisms:

RDB (Redis Database): periodic snapshots of the entire dataset to disk. You configure how often (e.g., save 900 1 means: snapshot if at least 1 key changed in the last 900 seconds). RDB files are compact and great for backups and disaster recovery. The downside: you lose everything written since the last snapshot. A crash 10 minutes after the last snapshot loses 10 minutes of writes.

AOF (Append Only File): logs every write command to an append-only file, which Redis replays on restart to reconstruct the dataset. AOF gives you finer granularity: appendfsync everysec loses at most about 1 second of writes, while always forces a disk sync on every write (near-zero data loss, but noticeably slower writes; the penalty depends on your disk's fsync latency).
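The RDB save-point rule described above (for example save 900 1) can be expressed as a small predicate. This is a sketch of the rule as documented, with hypothetical names (`SAVE_POINTS`, `snapshot_due`); the save points are the ones used in this article's redis.conf example.

```python
# Save points from the redis.conf example: (seconds, minimum changes).
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]


def snapshot_due(seconds_since_last_save, changes_since_last_save,
                 save_points=SAVE_POINTS):
    """Return True if any save point has both of its conditions met."""
    return any(
        seconds_since_last_save > seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


if __name__ == "__main__":
    print(snapshot_due(120, 50))      # False: under 10000 changes, 300 s not passed
    print(snapshot_due(301, 50))      # True: more than 300 s and at least 10 changes
    print(snapshot_due(61, 10_000))   # True: more than 60 s and at least 10000 changes
    print(snapshot_due(5000, 0))      # False: nothing changed, nothing to save
```

Reading the rule this way makes the data-loss window explicit: with only a trickle of writes, up to 900 seconds of changes can sit in memory before the next snapshot.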

Most production setups use both. Redis supports a combined mode: RDB for fast restores, AOF for durability. When both are enabled, Redis loads the AOF on restart because it's more complete.
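To make "log every write, replay on restart" concrete, here is a toy append-only log. It is a sketch under stated assumptions: `TinyAOF`, `replay`, and the tab-separated line format are invented for illustration (the real AOF stores commands in the Redis protocol format), and only SET is supported.

```python
import os
import tempfile


class TinyAOF:
    """Toy key-value store that appends every SET to a log file."""

    def __init__(self, path, fsync_always=False):
        self.data = {}
        self.fsync_always = fsync_always
        self._log = open(path, "a", encoding="utf-8")

    def set(self, key, value):
        # Hypothetical line format; keys and values must not contain tabs/newlines.
        self._log.write(f"SET\t{key}\t{value}\n")
        self._log.flush()
        if self.fsync_always:
            os.fsync(self._log.fileno())  # analogous to appendfsync always
        self.data[key] = value

    def close(self):
        self._log.close()


def replay(path):
    """Rebuild the dataset from the log, stopping at a truncated final line."""
    data = {}
    with open(path, encoding="utf-8") as log:
        for line in log:
            parts = line.rstrip("\n").split("\t")
            if not line.endswith("\n") or len(parts) != 3 or parts[0] != "SET":
                break  # incomplete tail, e.g. a crash in the middle of a write
            data[parts[1]] = parts[2]
    return data


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "tiny.aof")
    store = TinyAOF(path, fsync_always=True)
    store.set("user:1", "alice")
    store.set("user:1", "bob")
    store.close()
    print(replay(path))
```

Note how the log keeps both writes to user:1 even though only the last one matters. That redundancy is why AOF files grow and why Redis periodically rewrites them.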

The choice matters for your data loss tolerance. Session stores need AOF everysec. Cache layers don't need persistence at all. A leaderboard that can rebuild from database can afford RDB-only. Match the persistence config to your data's criticality.

redis_persistence_config.shBASH

# =========================================================
# CONFIGURING PERSISTENCE (redis.conf)
# =========================================================

# --- RDB Snapshotting ---
# Format: save <seconds> <changes>
# Trigger: at least 1 change in 900s (15 min) OR 10 in 300s (5 min) OR 10000 in 60s
save 900 1
save 300 10
save 60 10000

# RDB file location and name
dbfilename dump.rdb
dir /var/lib/redis/

# Compress the RDB file (yes/no)
rdbcompression yes

# --- AOF Persistence ---
# Enable AOF (yes/no)
appendonly yes

# AOF file name
appendfilename "appendonly.aof"

# Sync policy:
# always = sync every write (safest, slowest)
# everysec = sync once per second (default, best trade-off)
# no = let OS decide (fastest, least safe)
appendfsync everysec

# --- AOF Rewrite ---
# Auto-rewrite AOF when it grows 100% in size (auto-aof-rewrite-percentage 100)
# and reaches at least 64MB (auto-aof-rewrite-min-size 64mb)
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# --- Check current config ---
# From redis-cli
127.0.0.1:6379> CONFIG GET save
127.0.0.1:6379> CONFIG GET appendonly
127.0.0.1:6379> CONFIG GET appendfsync

# --- Force an RDB save immediately ---
# SAVE blocks every client until the snapshot finishes; prefer BGSAVE on a live server
127.0.0.1:6379> SAVE

# --- Force an AOF rewrite ---
127.0.0.1:6379> BGREWRITEAOF

# --- Check persistence stats ---
127.0.0.1:6379> INFO persistence

Output

# CONFIG GET save

1) "save"

2) "900 1 300 10 60 10000"

# CONFIG GET appendonly

1) "appendonly"

2) "yes"

# CONFIG GET appendfsync

1) "appendfsync"

2) "everysec"

# SAVE:

OK

# BGREWRITEAOF:

Background append only file rewriting started

# INFO persistence (abbreviated):

# Persistence

loading:0

rdb_changes_since_last_save:0

rdb_last_save_time:1718476800

rdb_last_save_status:ok

aof_enabled:1

aof_rewrite_in_progress:0

aof_last_bgrewrite_status:ok

aof_current_size:1234567

Mental Model

Persistence Mental Model: Backup Camera vs Dash Cam

RDB is like a backup camera — takes periodic snapshots. AOF is like a dash cam — records every event continuously.

RDB: Low overhead, fast recovery, but data loss window (up to your save interval). Best for cache layers and scenarios where data can be rebuilt.
AOF: Higher overhead (especially appendfsync always), but sub-second data loss. Best for session stores, rate limiters, critical counters.
Both: Use both for maximum safety. Redis prioritizes AOF on restart. The small disk cost is worth the peace of mind.
Production trap: AOF with appendfsync always can cause write latency spikes during disk syncs. Test with your write throughput before enabling.

📊 Production Insight

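AOF appendfsync always can cut write throughput sharply compared to everysec (roughly half is a common ballpark, but it depends on disk fsync latency).

The fork() that starts an RDB snapshot briefly blocks the main thread; the pause grows with dataset size and can reach tens of milliseconds or more on large instances.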

Large AOF files (>1GB) slow down startup — BGREWRITEAOF prevents unbounded growth.

Rule: session stores use AOF everysec; caches use RDB-only or no persistence.

🎯 Key Takeaway

RDB for fast restores and backups, AOF for near-zero data loss.

AOF everysec is the production sweet spot — lose at most 1 second.

Monitor aof_current_size and rewrite regularly; test recovery process quarterly.

RDB vs AOF — Decision Matrix

Choosing between RDB and AOF (or both) depends on your data loss tolerance, write throughput, and recovery speed requirements. This decision matrix helps you pick the right combination for your production workload.

Factor	RDB-Only	AOF-Only (everysec)	Both	None
Data loss on crash	Up to last snapshot (minutes)	≤1 second	≤1 second (AOF loads first)	All data lost
Recovery time	Very fast (binary file)	Slower (command replay)	Slower (AOF replay)	Instant (empty)
Write throughput impact	Minimal (fork + save)	~5-10% overhead (everysec)	~10-15% overhead	None
Disk space	Low (single compact file)	High (grows linearly)	Higher (both files)	None
Best for	Caches, ephemeral data	Session stores, counters	Critical data (orders, state)	Throwaway test data
Cross-version compatibility	Yes (RDB format stable)	Limited (AOF format changes)	Yes (both)	N/A
Backup strategy	Periodic RDB copy	AOF file copy + BGREWRITEAOF	Both	No backup
Incremental restore	Not possible (full restore)	Partial replay (risk)	Same as AOF	N/A

Decision rules:
1. Cache only → No persistence. If Redis crashes, the cache rebuilds from the database.
2. Session store → AOF everysec. Losing sessions forces mass logouts; 1-second loss is acceptable.
3. Primary data store → Both. RDB for fast recovery (if the AOF is corrupted) and AOF for durability.
4. Rate limiter / metadata → AOF everysec or RDB-only, depending on whether state can be rebuilt.
5. Cross-version upgrade → RDB files are generally readable by newer Redis versions; an AOF may need a rewrite (BGREWRITEAOF) around the upgrade.

Production recommendation: Use RDB snapshots for backup and AOF everysec for crash recovery. Configure AOF auto-rewrite (auto-aof-rewrite-percentage 100, auto-aof-rewrite-min-size 64mb) to keep AOF files manageable. Test your recovery procedure quarterly by simulating a server restart.

redis_persistence_decision.shBASH

# Enable both persistence methods
127.0.0.1:6379> CONFIG SET save "900 1 300 10 60 10000"
127.0.0.1:6379> CONFIG SET appendonly yes
127.0.0.1:6379> CONFIG SET appendfsync everysec
127.0.0.1:6379> CONFIG SET auto-aof-rewrite-percentage 100
127.0.0.1:6379> CONFIG SET auto-aof-rewrite-min-size 64mb

# Verify
127.0.0.1:6379> CONFIG GET appendfsync
# (run from the shell: redis-cli has no built-in grep)
redis-cli INFO persistence | grep aof_current_size

# Simulate a clean shutdown to flush all data
127.0.0.1:6379> SHUTDOWN SAVE

# After restart, check which file Redis loaded
# Look for "DB loaded from disk" (RDB) or "DB loaded from append only file" (AOF) in the Redis log
# Typically: tail -20 /var/log/redis/redis-server.log
# With AOF enabled, expect a line like: "DB loaded from append only file: 0.023 seconds"

Output

# CONFIG SET responses (all OK)

# CONFIG GET appendfsync

1) "appendfsync"

2) "everysec"

# INFO persistence snippet

aof_current_size:8435234

# After restart, log shows:

# * DB loaded from append only file: 0.023 seconds

# (with AOF enabled, the RDB snapshot is not loaded at startup)

⚠ Production Trap: AOF Corruption

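If the Redis server crashes while writing to the AOF file, the last command in the file may be truncated. With the default aof-load-truncated yes, Redis loads everything up to the last complete command, logs a warning, and you lose only the final few writes; with aof-load-truncated no, it refuses to start. If the file is corrupted in the middle rather than truncated at the end, Redis refuses to load it either way and you need to repair it with redis-check-aof --fix. Test AOF recovery with a crash simulation in staging.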

📊 Production Insight

Many teams start with no persistence, then add AOF when they realize session loss hurts revenue.

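RDB + AOF costs extra disk space (the original estimate here is ~100-200 MB for a 1 GB dataset, though AOF size varies with write volume between rewrites), which is a trivial cost for data safety.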

BGREWRITEAOF can spike CPU during AOF rewrite; schedule it during low traffic via cron or timed config changes.

Rule: for any Redis instance holding data you cannot rebuild from another source, use at least AOF everysec.

🎯 Key Takeaway

Use the decision matrix to match persistence to your data criticality.

AOF everysec + RDB is the gold standard for production durability.

Test recovery quarterly — a backup you've never restored is not a backup.

Redis Persistence Architecture — Visual Diagram

The diagram illustrates two separate paths for persistence. On the left (solid arrows), every write command is buffered and, depending on appendfsync, flushed to the AOF file on disk. On the right (dashed arrows), a background process BGSAVE forks and writes a snapshot (RDB file) at configured intervals. During restart, Redis checks for persistence files: AOF takes precedence if both exist. If only RDB exists, it loads the snapshot; if neither, the dataset starts empty.

Key architectural details:
- RDB uses copy-on-write: fork() creates a child process that sees memory as it was at that instant. The parent keeps serving requests, and each page it modifies is copied (COW), so memory usage can spike if many pages change during the save.
- AOF rewrite also forks: the child writes a compact AOF from its in-memory snapshot of the dataset (it does not read the old AOF) into a temp file, and Redis then swaps the new file in atomically.
- Both mechanisms rely on fsync() to ensure data is actually on disk. appendfsync everysec issues fsync() about once per second in a background thread; appendfsync always calls fsync() for every write, blocking the main thread until the disk confirms.

What this means in practice:
- If you enable both RDB and AOF, restart time is determined by AOF load speed, which grows with file size. Large AOF files (multiple GB) can take a noticeably long time to load.
- If you care about startup time and have AOF enabled, schedule regular BGREWRITEAOF to keep the file small.
- For read-heavy workloads, RDB alone may be sufficient because restarts are fast and data loss is acceptable.

🔥Visualising COW: RDB Save Memory Impact

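During BGSAVE, the child process sees a frozen snapshot of memory. Parent and child share pages until the parent writes to one; the first write to each shared page triggers a page copy (COW). The more distinct pages are written during a save, the more extra memory is used. Treat a figure like ~200 MB extra for a 1 GB dataset at 1000 writes/sec as a rough illustration, not a guarantee: the real overhead depends on how scattered the writes are and how long the save takes. Monitor used_memory_rss during saves and check rdb_last_cow_size in INFO persistence afterwards.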
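A back-of-the-envelope estimate helps sanity-check numbers like the one above. The sketch assumes every write dirties one distinct 4 KiB page that has not been copied yet, which is a simplification: large values, hash table resizes, and transparent huge pages can all push the real figure higher. `cow_estimate_bytes` is a hypothetical helper, not a Redis API.

```python
def cow_estimate_bytes(writes_per_sec, save_seconds, dataset_bytes, page_size=4096):
    """Rough copy-on-write overhead during a fork-based save.

    Assumes one distinct page dirtied per write (a simplification), capped at
    the dataset size because a page is copied at most once per save.
    """
    touched = writes_per_sec * save_seconds * page_size
    return min(touched, dataset_bytes)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    for seconds in (10, 60):
        extra = cow_estimate_bytes(1000, seconds, one_gib)
        print(f"{seconds} s save at 1000 writes/s: about {extra / 1024 ** 2:.0f} MiB extra")
```

Under these assumptions the overhead scales with save duration, which is one more reason to compare the estimate with the measured rdb_last_cow_size rather than trusting either in isolation.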

📊 Production Insight

RDB saves can cause latency spikes if the fork blocks the main thread for memory allocation.

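A full replica sync makes the master fork and produce an RDB snapshot; repl-diskless-sync yes streams that snapshot over the socket instead of writing it to disk first.

AOF rewrite forks just like BGSAVE and builds the new file from memory, so its copy-on-write memory impact is comparable to an RDB save.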

Rule: schedule BGSAVE/BGREWRITEAOF during low traffic (e.g., cron at 3 AM).

🎯 Key Takeaway

RDB is fast to write and load but loses data between snapshots.

AOF is slower but more durable; use both for full protection.

Visualise the architecture to understand when memory spikes happen during persistence operations.

Redis Persistence Architecture Flow

Redis Cluster — Architectural Overview

Redis Cluster provides automatic sharding and high availability without needing a separate proxy. It's the production architecture for any dataset exceeding a single server's RAM or any workload demanding horizontal scalability. Understanding its architecture is essential for engineers building large-scale caching and real-time data platforms.

Key Components:
- Nodes: Each node in a cluster is a regular Redis server running in cluster mode (cluster-enabled yes).
- Hash Slots: The keyspace is divided into 16,384 hash slots (hash function: CRC16(key) mod 16384). Each node is responsible for a subset of slots.
- Cluster Bus: A dedicated TCP channel (client port + 10000 by default) for inter-node communication: heartbeat, gossip, failover.
- Configuration: Every node stores the cluster configuration (nodes.conf) with the slot assignment and its view of all nodes.
- Replicas: Each master node can have one or more replicas that replicate its data. If a master fails, a replica is promoted.

How Data Is Distributed: When a client sends a command for a specific key, a cluster-aware client library (e.g., the RedisCluster client in redis-py, which absorbed the older redis-py-cluster project) computes the hash slot and sends the command directly to the node responsible for that slot. If the command reaches the wrong node, that node returns a -MOVED redirect error with the correct node's address. Smart clients cache the slot mapping to avoid redirects.
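The slot computation a smart client performs can be sketched with the standard library. The sketch relies on `binascii.crc_hqx` with an initial value of 0, which is the CRC16 variant (XMODEM) that the Redis Cluster specification describes; `key_hash_slot` is a hypothetical helper name. It also applies the hash tag rule: if the key contains a non-empty `{...}` section, only that part is hashed.

```python
import binascii


def key_hash_slot(key: bytes) -> int:
    """Return the Redis Cluster hash slot (0..16383) for a key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % 16384


if __name__ == "__main__":
    for name in (b"user:1001", b"{user:1001}:profile", b"{user:1001}:cart"):
        print(name.decode(), "->", key_hash_slot(name))
```

The last two keys share the tag `user:1001`, so they land in the same slot and can be used together in MGET, transactions, or a Lua script. On a live cluster, CLUSTER KEYSLOT <key> gives the authoritative answer to compare against.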

Resharding: Adding or removing nodes involves moving hash slots between nodes. This is done online via redis-cli --cluster reshard. During migration, the source node marks the slot as MIGRATING and the target node marks it as IMPORTING. Keys are moved in batches with the MIGRATE command, and a client asking for a key that has already moved receives an -ASK redirect. The cluster remains available during resharding.

Failover: Each node flags an unreachable master as PFAIL (possibly failing) after cluster-node-timeout. Once a majority of masters report it as PFAIL, it is marked FAIL. One of its replicas then requests votes over the cluster bus, increments currentEpoch, and is promoted once a majority of masters vote for it. Automatic failover therefore requires that a majority of masters are reachable.

Limitations:
- Multi-key operations (e.g., MGET, transactions, Lua scripts) only work if all keys hash to the same slot. Use hash tags {key} to force keys into the same slot.
- No support for multiple databases (only DB 0).
- Minimum 3 master nodes; for high availability, run 3 masters + 3 replicas in different racks/availability zones.
- Network latency between nodes adds overhead for the cluster bus and replication.

When to use Redis Cluster:
- The dataset is larger than one node's RAM (the exact threshold, e.g. 10 GB, depends on your hardware).
- You need automatic failover and high availability.
- Write throughput exceeds what a single Redis instance can handle.
- You need to scale out by adding nodes.

When not to use:
- Your dataset fits in one node and you only need failover: a single master with replicas and Sentinel is simpler.
- You rely heavily on multi-key operations across unrelated keys.
- Your workload is predominantly read-heavy with a small dataset: a single instance is faster.

redis_cluster_setup.shBASH

# --- Step 1: Configure each node in cluster mode (add to redis.conf) ---
# On each of 6 servers (3 masters + 3 replicas):
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
port 6379

# --- Step 2: Start each Redis instance ---
redis-server /path/to/redis.conf &

# --- Step 3: Create cluster (run from one node) ---
# Replace IPs with actual server IPs. --cluster-replicas 1 means one replica per master,
# so 3 masters need 6 nodes in total; with only 3 nodes redis-cli refuses to proceed.
redis-cli --cluster create \
  192.168.1.10:6379 \
  192.168.1.11:6379 \
  192.168.1.12:6379 \
  192.168.1.13:6379 \
  192.168.1.14:6379 \
  192.168.1.15:6379 \
  --cluster-replicas 1

# --- Step 4: Check cluster status ---
127.0.0.1:6379> CLUSTER INFO
127.0.0.1:6379> CLUSTER NODES
127.0.0.1:6379> CLUSTER SLOTS

# --- Step 5: Test key distribution ---
# Without -c, redis-cli does not follow redirects, so you can see the raw reply
127.0.0.1:6379> SET user:1001 "Alice"
# If this node does not own the key's slot, it responds:
# -MOVED 7629 192.168.1.11:6379
# Use redis-cli -c to follow redirects automatically

# --- Step 6: Reshard example (migrate 100 slots from master 1 to master 2) ---
# The target must be a master, not a replica
redis-cli --cluster reshard 192.168.1.10:6379 --cluster-from <node-id-1> --cluster-to <node-id-2> --cluster-slots 100

Output

# Step 3 output (abbreviated, after entering 'yes' to accept proposed layout; only one replica shown):

>>> Performing Cluster Check (using node 192.168.1.10:6379)

M: <node-id-1> 192.168.1.10:6379

slots:[0-5460] (5461 slots) master

M: <node-id-2> 192.168.1.11:6379

slots:[5461-10922] (5462 slots) master

M: <node-id-3> 192.168.1.12:6379

slots:[10923-16383] (5461 slots) master

S: <node-id-4> 192.168.1.13:6379

replicates <node-id-1>

[OK] All 16384 slots covered

# Step 4: CLUSTER NODES output (abbreviated):

<node-id-1> 192.168.1.10:6379@16379 master - 0 1718476800000 1 connected 0-5460

<node-id-2> 192.168.1.11:6379@16379 master - 0 1718476800000 2 connected 5461-10922

<node-id-3> 192.168.1.12:6379@16379 master - 0 1718476800000 3 connected 10923-16383

<node-id-4> 192.168.1.13:6379@16379 slave <node-id-1> 0 1718476800000 1 connected

# Step 5: SET response when the key's slot belongs to another node

-MOVED 7629 192.168.1.11:6379

⚠ Production Gotcha: Network Partitions and Split-Brain

Redis Cluster uses quorum-based failure detection. If a network partition splits the cluster, the side that can reach only a minority of masters stops accepting writes after cluster-node-timeout (to prevent split-brain), while the majority side can promote replicas and keep serving. In a cluster with 3 masters, a master isolated on its own stops serving writes. Use an odd number of masters so a clear majority can always form, and place replicas in different availability zones.
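The majority rule can be written as a one-line check, which makes the "odd number of masters" advice easy to verify. This is a deliberate simplification with a hypothetical function name: real availability also depends on slot coverage and on replicas being present on the surviving side.

```python
def partition_side_is_available(masters_reachable, total_masters):
    """True if this side of a partition still sees a strict majority of masters."""
    return masters_reachable > total_masters // 2


if __name__ == "__main__":
    # 3 masters split 2/1: the larger side keeps a majority.
    print(partition_side_is_available(2, 3), partition_side_is_available(1, 3))
    # 4 masters split 2/2: neither side has a majority, so both stop.
    print(partition_side_is_available(2, 4), partition_side_is_available(2, 4))
```

With an even number of masters, an even split leaves no side able to act, which is the case an odd master count avoids.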

📊 Production Insight

Redis Cluster automates failover, but adding nodes, assigning replicas, and rebalancing slots are operator-driven steps; it does not rebalance itself the way Cassandra does.

Client-side smart routing (e.g., the RedisCluster client in redis-py) is essential to avoid performance-impacting redirects.

Hash tags can cause hot spots if many keys share the same tag (e.g., {user}:1001 and {user}:1002 go to same slot).

Rule: test resharding in staging before production; monitor cluster_state and cluster_size.

🎯 Key Takeaway

Redis Cluster shards data across nodes using 16,384 hash slots.

Provides high availability through automatic replica promotion.

Use hash tags for multi-key operations but avoid hot spots.

Start with 3 masters + 3 replicas for production.

Cache Hit vs Cache Miss — The Two Paths That Decide Your Latency

Every Redis request takes one of two paths. Cache hit means data was in RAM. Cache miss means you're about to pay the disk tax. Simple, right? Yet I've seen teams flame out because they only tested the happy path.

A cache hit returns data in sub-millisecond time. The application gets its response, the database never sees the request. That's the whole point of Redis.

A cache miss is the expensive fallback. Redis returns nil, the application queries the primary database, writes the result back to Redis for next time, then serves the response. That's at least one network round trip plus a disk read. In high traffic, a sudden wave of misses — from a cache flush or deployment — can take down your database.

Design for cache misses. Set proper expiry. Pre-warm caches after restart. Your database doesn't have a second chance.
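The hit and miss paths described above are usually implemented as the cache-aside pattern. The sketch below uses an in-process `TTLCache` class as a stand-in for Redis so it runs without a server; `TTLCache`, `get_user_profile`, and `load_from_db` are hypothetical names, and a real application would replace the cache calls with GET and SET ... EX.

```python
import time


class TTLCache:
    """In-process stand-in for Redis GET / SET with expiry (sketch only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave like a missing key
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)


def get_user_profile(user_id, cache, load_from_db, ttl_seconds=3600):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"user:profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached, "hit"
    value = load_from_db(user_id)  # the expensive path
    if value is not None:
        cache.set(key, value, ttl_seconds)  # write back for the next request
    return value, "miss"


if __name__ == "__main__":
    cache = TTLCache()
    fake_db = {47291: {"id": 47291, "username": "jane_dev"}}
    print(get_user_profile(47291, cache, fake_db.get))
    print(get_user_profile(47291, cache, fake_db.get))
```

The first call reports a miss and populates the cache; the second is served from the cache. The sketch does nothing to stop many concurrent misses for the same key from hitting the database at once, which is the stampede problem mentioned earlier.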

CacheFlowDiagram.sqlSQL

-- io.thecodeforge: database tutorial

-- Simulating a cache miss + write-back in PostgreSQL
-- This is NOT Redis itself, it shows the app fallback logic

BEGIN;

-- Step 1: Check Redis (pseudo) — SELECT from cache table as stand-in
SELECT value FROM redis_cache WHERE key = 'user:profile:47291';

-- If returned 0 rows (CACHE MISS) → hit primary DB
SELECT id, username, email, preferences_json 
FROM users 
WHERE id = 47291;

-- Step 2: Write result back to cache for next request
INSERT INTO redis_cache (key, value, expiry_epoch)
VALUES ('user:profile:47291', '{"id":47291,"username":"jane_dev","prefs":{"theme":"dark"}}', extract(epoch from now()) + 3600)
ON CONFLICT (key) DO UPDATE 
SET value = EXCLUDED.value, expiry_epoch = EXCLUDED.expiry_epoch;

-- Step 3: Return to client
SELECT 'SUCCESS' AS response;

COMMIT;

Output

Step 1: 0 rows (CACHE MISS)

Step 2: 1 row returned from users

INSERT 0 1

SUCCESS

⚠ Production Trap:

If your cache-miss handler makes a synchronous DB call per request without connection pooling, you'll exhaust database connections in seconds during traffic spikes. Always pool, always time out.

🎯 Key Takeaway

Cache misses are the silent killer. Pre-warm caches. Pool your DB connections. Never fall back to the database synchronously without a circuit breaker.

Real-World Redis — Where the Big Shops Actually Use It

Competitor pages list 'Netflix uses Redis for caching' like it's a revelation. Let's get specific about where Redis earns its keep in production.

E-commerce: product catalog caches. A large retailer doesn't query its primary datastore for every page load. A common pattern is to store each product as a Redis hash, with the product ID in the key and the attributes as fields, so a page fetches what it needs in one round trip (HGETALL or HMGET). Fresh inventory at scale.

Real-time gaming leaderboards: sorted sets with scores. Every player action updates their score via ZINCRBY. The top 100 is ZREVRANGE leaderboard 0 99 WITHSCORES. No SQL JOINs. No page refreshes. This is why your mobile game updates ranks instantly.

Rate limiting: INCR with EXPIRE. A user hits an endpoint, you INCR a key like ratelimit:user:47291:minute, and when INCR returns 1 you set a 60-second TTL. If the value exceeds the threshold, reject. INCR itself is atomic, but INCR followed by EXPIRE is two commands, so wrap them in a Lua script or MULTI/EXEC if a key that never expires would be a problem. Fast. No database write. This pattern is widely used for API throttling.

Session store: key expiry (EXPIRE) ends idle sessions automatically. No cron jobs to clean stale sessions. No table scans.
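The INCR-plus-EXPIRE rate limiter described above is a fixed-window counter. The sketch below models it in process so it runs without Redis: `FixedWindowLimiter` is a hypothetical class, the per-user counter stands in for the Redis key, and starting a new window plays the role of the key expiring.

```python
import time


class FixedWindowLimiter:
    """Fixed-window rate limiter modelled on INCR + EXPIRE (in-process sketch)."""

    def __init__(self, limit, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._counters = {}  # user_id -> (window_index, count)

    def allow(self, user_id):
        window = int(self._clock() // self.window_seconds)
        stored_window, count = self._counters.get(user_id, (window, 0))
        if stored_window != window:
            count = 0  # the old counter "expired" when its window ended
        count += 1     # the INCR step
        self._counters[user_id] = (window, count)
        return count <= self.limit


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=3, window_seconds=60)
    print([limiter.allow(47291) for _ in range(5)])
```

A fixed window is simple but allows bursts at window boundaries (up to twice the limit across two adjacent windows); sliding-window variants trade extra state for smoother limiting.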

RealWorldRateLimiter.sqlSQL

-- Rate limit check pattern (pseudo-Redis logic in PostgreSQL)
-- NOT actual Redis, but shows the atomic increment structure

CREATE OR REPLACE FUNCTION rate_limit_user(p_user_id INT)
RETURNS BOOLEAN AS $$
DECLARE
  v_count INT;
  v_max_requests INT := 100;  -- per minute
BEGIN
  -- Simulate Redis INCR + EXPIRE
  INSERT INTO rate_limiter (user_id, counter, window_start)
  VALUES (p_user_id, 1, extract(epoch from now())::bigint / 60)
  ON CONFLICT (user_id, window_start) DO UPDATE
  SET counter = rate_limiter.counter + 1
  RETURNING counter INTO v_count;
  
  -- Check threshold
  IF v_count > v_max_requests THEN
    RETURN FALSE;  -- BLOCK
  END IF;
  
  RETURN TRUE;  -- ALLOW
END;
$$ LANGUAGE plpgsql;

Output

SELECT rate_limit_user(47291); → TRUE (call 1, under limit)

SELECT rate_limit_user(47291); → TRUE (call 2)

... 98 more calls, all TRUE ...

SELECT rate_limit_user(47291); → FALSE (call 101 in the same minute, blocked)

💡Senior Shortcut:

When using Redis for leaderboards, don't fetch the full sorted set every time. Use ZREVRANK to get a single user's position and ZREVRANGE for the top N. Two calls, zero bloat.

🎯 Key Takeaway

Rate limiting, leaderboards, sessions, and product caches — Redis handles these with atomic operations. Design your data structure to match the access pattern, not the relational model.

What Makes Redis Fast — The Three Pillars (Spoiler: Not Just RAM)

Everyone says 'Redis is fast because it's in-memory.' True, but lazy. Plenty of in-memory databases are slow. Redis has three architectural decisions that matter more than RAM.

Single-threaded event loop. No locks. No context switching between threads. One queue processes all commands sequentially. This means no race conditions on simple operations, no mutex overhead. Write a complex Lua script? It blocks everything else. Don't write slow Lua scripts.

Non-blocking I/O with epoll/kqueue. Redis doesn't spin waiting for network data. It registers file descriptors and gets notified when data arrives. This is why a single Redis instance handles 100K+ ops per second on modest hardware.

Data structures tuned for memory locality. Redis stores small lists, hashes, and sets in compact, contiguous encodings (such as listpack and intset) that are friendly to CPU caches. Compare that to PostgreSQL, which is optimised for disk pages. Redis data is already hot in memory and laid out for minimal pointer chasing.

Combine these: single-threaded + event-driven + cache-optimised structures. That's the real answer. Not just 'it's in RAM'.

ExplainRedisSpeed.sqlSQL

-- Benchmark simulation comparing in-memory vs disk latency
-- Shows the orders of magnitude difference

WITH latencies AS (
  SELECT 'L1 Cache' AS location, 1 AS nanoseconds_per_op, 1 AS relative_speed
  UNION ALL SELECT 'L2 Cache', 7, 7
  UNION ALL SELECT 'RAM', 100, 100
  UNION ALL SELECT 'SSD (NVMe)', 10000, 10000
  UNION ALL SELECT 'HDD (Spinning)', 5000000, 5000000  -- 5 ms
)
SELECT 
  location,
  nanoseconds_per_op,
  CASE 
    WHEN relative_speed < 100 THEN 'HOT — CPU Cache'
    WHEN relative_speed < 10000 THEN 'WARM — RAM'
    ELSE 'COLD — Disk'
  END AS redis_temperature
FROM latencies
ORDER BY relative_speed;

Output

location | nanoseconds_per_op | redis_temperature

------------------+-------------------+---------------------

L1 Cache | 1 | HOT — CPU Cache

L2 Cache | 7 | HOT — CPU Cache

RAM | 100 | WARM — RAM

SSD (NVMe) | 10000 | COLD — Disk

HDD (Spinning) | 5000000 | COLD — Disk

🔥Architecture Note:

Redis 6.0 introduced optional multi-threaded I/O (io-threads) for reading and writing network buffers. But command execution remains single-threaded. This means network-heavy workloads get faster, but your Lua scripts still block everything, so keep them short.

🎯 Key Takeaway

Redis speed is a product of its single-threaded event loop, non-blocking I/O, and cache-optimised data structures — not just RAM. Don't write slow scripts. Don't use it for complex joins.

Working With Numbers — Redis Isn't Just for Strings

Redis treats numbers as strings internally but provides atomic operations for integer arithmetic. When you store a numeric value, Redis sees a string that can be incremented or decremented using commands like INCR, DECR, INCRBY, and DECRBY. These operations are atomic: no race conditions, no lost updates. This makes Redis an excellent choice for real-time counters, rate limiters, and leaderboards. The reason is that atomicity removes the need for locks or transactions in single-instance scenarios. If you try to increment a non-numeric value (like a word), Redis returns the error 'ERR value is not an integer or out of range', so validate your data type before performing arithmetic. Redis integers are signed 64-bit values. Overflow does not wrap around: an increment that would exceed the range fails with 'ERR increment or decrement would overflow' and leaves the value unchanged, so handle that error in high-throughput counters.
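The INCRBY rules described above (string storage, integer parsing, 64-bit range check) can be modelled in a few lines. This is an approximation for illustration: `incrby`, `RedisError`, and the plain dict used as a store are hypothetical, and the integer pattern only approximates Redis's own parser.

```python
import re

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
_INTEGER = re.compile(r"-?(0|[1-9][0-9]*)")  # approximation of Redis's strict parsing


class RedisError(Exception):
    """Stand-in for an error reply from the server."""


def incrby(store, key, delta):
    """Model of INCRBY on a dict of strings: a missing key counts as 0."""
    raw = store.get(key, "0")
    if not _INTEGER.fullmatch(raw) or not INT64_MIN <= int(raw) <= INT64_MAX:
        raise RedisError("ERR value is not an integer or out of range")
    result = int(raw) + delta
    if not INT64_MIN <= result <= INT64_MAX:
        # The stored value is left untouched; nothing wraps around.
        raise RedisError("ERR increment or decrement would overflow")
    store[key] = str(result)  # values are stored as strings
    return result


if __name__ == "__main__":
    store = {"views:post:42": "0", "badkey": "hello"}
    print(incrby(store, "views:post:42", 1))
    print(incrby(store, "views:post:42", 10))
    try:
        incrby(store, "badkey", 1)
    except RedisError as exc:
        print("error:", exc)
```

The overflow branch is the important one for counters: the caller gets an error and the old value survives, so a long-lived counter near the limit fails loudly instead of silently turning negative.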

RedisNumbers.sqlSQL

# Atomic increment and decrement operations
SET views:post:42 0
INCR views:post:42
# Returns (integer) 1
DECR views:post:42
# Returns (integer) 0
INCRBY views:post:42 10
# Returns (integer) 10
SET badkey "hello"
INCR badkey
# Returns (error) ERR value is not an integer or out of range

Output

(integer) 1

(integer) 0

(integer) 10

(error) ERR value is not an integer or out of range

⚠ Production Trap:

INCR runs on Redis's single-threaded command loop, so it's atomic per key. But in a Redis Cluster, related counters may hash to different nodes, and multi-key commands, transactions, or scripts that touch several counters only work when all keys share a slot. Use hash tags to keep related counters together.

🎯 Key Takeaway

Use INCR/DECR for atomic counters; Redis returns an error on non-numeric values and on increments that would overflow the signed 64-bit range.

Saving and Retrieving Key-Value Pairs — The Foundation of Every Redis App

Redis stores data as key-value pairs. The key is always a (binary-safe) string, and the value can be one of several data types, including string, list, set, hash, and sorted set. The most basic operations are SET and GET. SET associates a key with a value; GET retrieves it. If the key doesn't exist, GET returns nil (null in most clients). The reason this is fast: the keyspace is a flat dictionary with no schema, no tables, and no joins, which keeps single-key lookups in the microsecond range on the server. When you SET a key that already exists, the old value is silently overwritten. To avoid accidental overwrites, use SET key value NX (or the older SETNX), which writes only if the key does not exist; checking for nil first and then writing is racy. Namespacing keys with colons (like 'user:1000:name') is standard practice. Redis handles long keys fine, but shorter keys mean less memory overhead. A TTL is often set at write time (for example SET key value EX 3600) to control cache lifetimes.

RedisKeyValue.sqlSQL

# Basic SET and GET operations
SET user:1000:name "Alice"
GET user:1000:name
# Returns "Alice"
GET user:9999:name
# Returns (nil)
SETNX user:1000:name "Bob"
# Returns (integer) 0: key already exists, not set
SETNX user:1001:name "Bob"
# Returns (integer) 1: new key set successfully

Output

"Alice"

(nil)

(integer) 0

(integer) 1

⚠ Production Trap:

Keys in Redis are case-sensitive and binary-safe. 'User:1' and 'user:1' are two different keys. Always enforce key naming conventions in your application layer.

🎯 Key Takeaway

SET overwrites existing keys; GET returns nil for missing keys; use SET ... NX (or SETNX) for safe creation and colons for namespacing.

Redis 7.x/8.x: New Features and Architecture in 2026

Redis 7.x and 8.x bring architectural improvements and new features aimed at performance, scalability, and developer experience. Redis 7.0 added Redis Functions (persistent, server-side Lua libraries), ACL improvements, and sharded Pub/Sub for clusters. Redis 8 brings the query engine (including vector search), JSON, time-series, and probabilistic data types, which previously shipped as separate modules, into the core distribution, along with memory-efficiency and defragmentation improvements. Multi-threaded I/O (available since Redis 6.0) raises throughput on connection-heavy workloads while command execution stays on the single-threaded event loop. The RESP3 protocol, also introduced in Redis 6, gives clients richer reply types.

Redis Functions let you update a user's score and read back their rank in one atomic step, which removes the race between the write and the read in leaderboard updates; the listing below registers and calls such a function. 'Redis Stack' was the earlier bundle of search, JSON, time-series, and graph modules; RedisGraph has since been discontinued, and Redis 8 folds most of the rest into core. Active-active geo-distribution with CRDT-based replication is a Redis Enterprise / Redis Cloud feature rather than part of open-source Redis. Together these features make Redis suitable for modern microservices, real-time analytics, and AI/ML workloads.

redis7_features.sqlSQL

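-- Register a Redis Function (the leaderboard key is passed in KEYS so the call is cluster-safe)
FUNCTION LOAD "#!lua name=update_score
redis.register_function('update_score', function(keys, args)
  local board = keys[1]
  local user = args[1]
  local score = tonumber(args[2])
  redis.call('ZADD', board, score, user)
  local rank = redis.call('ZREVRANK', board, user)
  return {score, rank}
end)"

-- Call the function: 1 key (the leaderboard), then member and score
FCALL update_score 1 leaderboard user123 100

-- Example: Vector search with the query engine (built into Redis 8; RediSearch module on earlier versions)
-- FLAT takes 6 attribute tokens here: TYPE, DIM, and DISTANCE_METRIC with their values
FT.CREATE idx ON HASH PREFIX 1 doc: SCHEMA title TEXT embedding VECTOR FLAT 6 TYPE FLOAT32 DIM 768 DISTANCE_METRIC L2
-- $query_vec is a placeholder for the raw FLOAT32 bytes of the query vector
FT.SEARCH idx "@embedding:[VECTOR_RANGE 0.1 $vec]" PARAMS 2 vec $query_vec DIALECT 2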

🔥Active-Active Geo-Distribution (Redis Enterprise / Redis Cloud only)

📊 Production Insight

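When upgrading to Redis 7.x or 8.x, existing EVAL scripts keep working. Redis Functions are an optional alternative, so migrate scripts deliberately and test each one rather than assuming a forced rewrite. Multi-threaded I/O (`io-threads` in redis.conf) can help workloads with many concurrent connections, but measure before and after: the extra threads cost CPU and may not help when operations are small and the instance is not network-bound.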

🎯 Key Takeaway

Redis 7.x/8.x bring server-side scripting with Functions and built-in search, JSON, time-series, and vector capabilities, making Redis a versatile platform for modern real-time applications. Active-active geo-distribution remains a Redis Enterprise feature.
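A vector query parameter has to reach the server as a binary blob, not as text, which is easy to get wrong from application code. The sketch below shows one way to build the arguments for a VECTOR_RANGE query. `to_float32_blob` and `vector_range_args` are hypothetical helper names, and the little-endian FLOAT32 layout is an assumption that matches an index created with TYPE FLOAT32 on common x86/ARM hosts.

```python
import struct
from typing import List, Sequence


def to_float32_blob(vector: Sequence[float]) -> bytes:
    """Pack floats as little-endian FLOAT32, four bytes per dimension."""
    return struct.pack(f"<{len(vector)}f", *vector)


def vector_range_args(index: str, field: str, radius: float,
                      vector: Sequence[float]) -> List[object]:
    """Build the argument list for an FT.SEARCH VECTOR_RANGE query.

    The vector travels as the PARAMS value named 'vec', referenced as $vec in the query.
    """
    query = f"@{field}:[VECTOR_RANGE {radius} $vec]"
    return ["FT.SEARCH", index, query,
            "PARAMS", 2, "vec", to_float32_blob(vector),
            "DIALECT", 2]
```

With a redis-py client, the list could be sent as `client.execute_command(*vector_range_args("idx", "embedding", 0.1, query_vector))`. The vector length must equal the DIM declared in FT.CREATE.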

Redis Persistence: RDB vs AOF vs No Persistence

Redis offers three persistence modes: RDB (snapshots), AOF (append-only file), and no persistence. Each has distinct trade-offs in durability, performance, and recovery time. RDB creates point-in-time snapshots at configurable intervals (e.g., every 5 minutes if at least 100 keys changed). The file is compact and fast to load, but any writes made after the last snapshot are lost in a crash. AOF logs every write operation and flushes the log to disk according to `appendfsync`: `always` (every write), `everysec` (once per second), or `no` (left to the operating system). It provides better durability at the cost of larger files and slower recovery. No persistence (`save ""` together with `appendonly no`) maximizes performance but loses all data on restart, which is usually acceptable for a pure cache. Example: configuring RDB and AOF together for hybrid persistence:

```
# redis.conf
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec
aof-use-rdb-preamble yes
```

The `aof-use-rdb-preamble` option combines RDB and AOF: when the AOF is rewritten, Redis writes the base in RDB format and then appends subsequent writes in AOF format. This speeds up recovery (load the RDB portion, then replay the AOF tail) while maintaining durability. For production, a hybrid approach is recommended: enable both RDB and AOF with `appendfsync everysec` for a balance between performance and safety. If you need maximum durability (e.g., financial data), use AOF with `appendfsync always`, but expect a substantial write-throughput penalty; the exact cost depends on the disk's fsync latency, so benchmark it on your hardware. For ephemeral caches, disable persistence entirely to avoid disk I/O overhead. Always monitor disk space, because AOF files can grow large. Redis rewrites the AOF automatically based on `auto-aof-rewrite-percentage` and `auto-aof-rewrite-min-size`, and `BGREWRITEAOF` triggers a compaction manually.

redis_persistence_config.sqlSQL

# redis.conf configuration for hybrid persistence (config directives, not commands)
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec
aof-use-rdb-preamble yes

-- Manual RDB save (BGSAVE snapshots in a background child; plain SAVE blocks all clients)
BGSAVE

-- Manual AOF rewrite
BGREWRITEAOF

-- Check persistence info
INFO persistence

⚠ AOF Rewrite Can Block Redis

📊 Production Insight

In production, always test your recovery time objective (RTO) and recovery point objective (RPO). For example, with AOF every second, you may lose up to 1 second of data. Use INFO persistence to monitor last save time and AOF size. Consider using Redis Enterprise or a managed service for automated failover and persistence tuning.

🎯 Key Takeaway

Choose RDB for fast recovery with some data loss, AOF for durability with slower recovery, or hybrid for a balance. No persistence for pure caching.
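Monitoring `INFO persistence` is easier to act on when the fields are turned into explicit warnings. The following is a minimal sketch under stated assumptions: `persistence_warnings` is a hypothetical helper, `info` is assumed to be the dict a redis-py client returns from `client.info("persistence")`, and the 900-second threshold is only an example tied to the `save 900 1` rule.

```python
import time
from typing import Any, List, Mapping, Optional


def persistence_warnings(info: Mapping[str, Any],
                         max_snapshot_age_s: int = 900,
                         now: Optional[float] = None) -> List[str]:
    """Return human-readable warnings derived from INFO persistence fields."""
    now = time.time() if now is None else now
    warnings: List[str] = []

    # rdb_last_bgsave_status is "ok" unless the last background save failed.
    if info.get("rdb_last_bgsave_status", "ok") != "ok":
        warnings.append("last RDB background save failed")

    # rdb_last_save_time is a Unix timestamp of the last successful save.
    age = now - int(info.get("rdb_last_save_time", 0))
    if age > max_snapshot_age_s:
        warnings.append(f"last RDB snapshot is {int(age)}s old")

    # Only inspect AOF status when AOF is actually enabled.
    if int(info.get("aof_enabled", 0)) == 1 and info.get("aof_last_write_status", "ok") != "ok":
        warnings.append("last AOF write failed")

    return warnings
```

The snapshot-age check is only meaningful when RDB is enabled. On a deliberately persistence-free cache it should be skipped, since the timestamp will never advance.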

Redis Security: ACL, TLS, and Best Practices

Securing Redis is critical for production deployments. Redis 6 introduced Access Control Lists (ACLs) for granular user permissions, along with TLS support for encrypted connections. ACLs let you define users with specific command and key permissions. The older `requirepass` setting still works, but it only sets the password of the `default` user. Example: creating a read-only user for analytics:

```sql
-- Create a user with read-only access to keys matching 'analytics:*'
ACL SETUSER analytics on >password +@read ~analytics:*

-- Verify user permissions
ACL LIST

-- Connect as the analytics user (run from a shell, not inside redis-cli)
redis-cli --user analytics --askpass
```

TLS can be enabled by configuring certificates in `redis.conf`:

```sql
tls-port 6379
port 0
tls-cert-file /etc/redis/redis.crt
tls-key-file /etc/redis/redis.key
tls-ca-cert-file /etc/redis/ca.crt
tls-auth-clients yes
```

Best practices include: block `FLUSHALL`, `CONFIG`, and other dangerous commands for non-admin users, use strong passwords, and bind to localhost or private IPs. ACL rules such as `-flushall -config` (or `-@dangerous`) are the preferred way to block commands. `rename-command` is the older mechanism: it applies to every user, including admins, and is deprecated in favor of ACLs. Example: disabling `FLUSHALL` and `CONFIG` entirely by renaming them to an empty string:

```sql
rename-command FLUSHALL ""
rename-command CONFIG ""
```

Additionally, use firewalls to restrict access, run Redis as a non-root user, and enable logging (`ACL LOG` records denied commands and failed authentications). For cloud deployments, use VPCs and security groups. Regularly audit ACLs and rotate passwords. Replication traffic can be encrypted as well by setting `tls-replication yes`.

redis_security_setup.sqlSQL

-- ACL: Create a read-only user
ACL SETUSER analytics on >StrongPass123 +@read ~analytics:*

-- ACL: Create an admin user with all permissions
ACL SETUSER admin on >AdminPass456 +@all ~*

-- List all users
ACL LIST

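# redis.conf: enable TLS (config directives, not commands)
tls-port 6379
port 0
tls-cert-file /etc/redis/redis.crt
tls-key-file /etc/redis/redis.key
tls-ca-cert-file /etc/redis/ca.crt
tls-auth-clients yes

# redis.conf: disable dangerous commands for every user (deprecated; prefer ACL rules)
# Note: disabling CONFIG also blocks the CONFIG SET live fixes used later in this guide
rename-command FLUSHALL ""
rename-command CONFIG ""
rename-command SHUTDOWN ""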

💡Use ACL Categories for Simplicity

📊 Production Insight

In production, integrate Redis ACLs with your identity provider (e.g., LDAP) using Redis Enterprise or a custom auth proxy. Regularly rotate passwords and audit ACLs. For compliance (PCI-DSS, HIPAA), enable TLS and audit logging. Use redis-cli --tls for encrypted connections.

🎯 Key Takeaway

Implement ACLs and TLS to secure Redis. Disable dangerous commands, use strong passwords, and restrict network access.
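On the client side, the ACL user and the TLS settings have to line up with the server configuration. The sketch below builds connection keyword arguments that could be passed to `redis.Redis(**kwargs)` in redis-py. `tls_client_kwargs` is a hypothetical helper, the `client.crt` and `client.key` file names are assumptions (with `tls-auth-clients yes` the client needs its own certificate signed by the same CA), and the password is read from an environment variable rather than hardcoded.

```python
import os
from typing import Any, Dict


def tls_client_kwargs(host: str, username: str,
                      password_env: str = "REDIS_PASSWORD",
                      cert_dir: str = "/etc/redis") -> Dict[str, Any]:
    """Build keyword arguments for a TLS connection as a specific ACL user."""
    password = os.environ.get(password_env)
    if not password:
        raise RuntimeError(f"environment variable {password_env} is not set")
    return {
        "host": host,
        "port": 6379,                      # matches tls-port in redis.conf
        "username": username,              # ACL user, e.g. the read-only analytics user
        "password": password,
        "ssl": True,
        "ssl_ca_certs": f"{cert_dir}/ca.crt",
        "ssl_certfile": f"{cert_dir}/client.crt",  # assumed client certificate path
        "ssl_keyfile": f"{cert_dir}/client.key",   # assumed client key path
        "ssl_cert_reqs": "required",       # verify the server certificate
    }
```

Keeping the credentials out of source code also makes password rotation a deployment change instead of a code change.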

● Production incident · POST-MORTEM · severity: high

The Late-Night Pager: Redis Write Failures from Default Eviction Policy

Symptom

Application logs show sporadic 'OOM command not allowed when used memory > maxmemory' errors. Some transactions succeed, others fail with no clear pattern. No alarms on database latency or CPU.

Assumption

The team assumed the Redis instance had enough memory (64 GB) and that eviction would happen automatically. They never touched the maxmemory-policy config.

Root cause

Default maxmemory-policy noeviction combined with a memory leak from unbounded session storage. When Redis hit the maxmemory limit (set to 48 GB via monitoring tool), it refused all writes. No key had a TTL, so no keys expired. The leak was caused by a session table that never cleaned stale sessions.

Fix

Ran CONFIG SET maxmemory-policy allkeys-lru as a live fix and persisted the same setting in redis.conf, added a one-hour TTL to session keys (EXPIRE <key> 3600), and configured memory alerting at 80% of maxmemory. The service was then bounced to clear memory.

Key lesson

Always configure maxmemory-policy before going to production — never rely on defaults.
Every SET command must include a TTL unless the key is truly permanent (and documented).
Set memory usage alerts at 70% and 85% of maxmemory — don't wait for OOM errors.
Use MEMORY USAGE and MEMORY STATS in production monitoring to detect leaks early.
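The alert thresholds from these lessons can be expressed as a small check over `INFO memory`. This is a sketch, not a monitoring product: `memory_alert_level` is a hypothetical helper, and `info` is assumed to be the dict a redis-py client returns from `client.info("memory")`, which includes the `used_memory` and `maxmemory` fields in bytes.

```python
from typing import Any, Mapping


def memory_alert_level(info: Mapping[str, Any],
                       warn_at: float = 0.70,
                       critical_at: float = 0.85) -> str:
    """Classify memory pressure as 'ok', 'warning', 'critical', or 'unbounded'."""
    maxmemory = int(info.get("maxmemory", 0))
    if maxmemory <= 0:
        # maxmemory 0 means no limit is configured, which deserves its own alert.
        return "unbounded"
    ratio = int(info["used_memory"]) / maxmemory
    if ratio >= critical_at:
        return "critical"
    if ratio >= warn_at:
        return "warning"
    return "ok"
```

Reporting "unbounded" separately matters because an instance without a maxmemory limit never triggers eviction and can be killed by the operating system instead.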

Production debug guide: Symptom → Action patterns for the most common Redis failures (5 entries)

Symptom · 01

Commands return 'OOM command not allowed when used memory > maxmemory'

→

Fix

Check INFO memory to see used_memory_human vs maxmemory. Temporarily increase maxmemory with CONFIG SET maxmemory 2gb, then permanently fix the eviction policy and memory leak.

Symptom · 02

Application reports high latency on Redis operations (>10ms)

→

Fix

Run SLOWLOG GET 10 to find slow commands. Check CLIENT LIST for long-running connections. Verify network latency between app and Redis with redis-cli --latency.

Symptom · 03

Keys disappear unexpectedly before TTL expiry

→

Fix

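Check whether eviction is active: run INFO stats and look at evicted_keys. If evicted_keys is greater than 0 and rising, the maxmemory-policy is evicting keys. Review the maxmemory setting and add TTLs where missing. Also check INFO commandstats for cmdstat_flushdb or cmdstat_flushall to rule out an explicit flush.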

Symptom · 04

Cache miss rate spikes suddenly

→

Fix

Check for mass key expiry: run INFO stats and watch expired_keys. If many keys were written with the same TTL, they expire together, and the resulting burst of misses can stampede the database. Add random TTL jitter (±20%) on SETs to spread expiry times. Also check whether a deployment cleared the cache via FLUSHALL.

Symptom · 05

Redis connection refused or timeout

→

Fix

Run redis-cli ping. If it fails, verify the service with systemctl status redis and check the logs at /var/log/redis/redis-server.log. Ensure the bind setting allows the app's IP, and check firewalls and security groups. Then run redis-cli -h <host> -p 6379 PING from the app server to confirm connectivity.
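The TTL jitter recommended for Symptom 04 takes only a few lines. This is a minimal sketch: `jittered_ttl` is a hypothetical helper name, and the ±20% default simply mirrors the figure in the debug guide.

```python
import random
from typing import Optional


def jittered_ttl(base_seconds: int, spread: float = 0.20,
                 rng: Optional[random.Random] = None) -> int:
    """Return base_seconds scaled by a random factor in [1 - spread, 1 + spread]."""
    if base_seconds <= 0 or not 0 <= spread < 1:
        raise ValueError("base_seconds must be positive and spread in [0, 1)")
    rng = rng or random.Random()
    factor = rng.uniform(1 - spread, 1 + spread)
    # Never return 0: a zero or negative expiry is rejected by SET ... EX.
    return max(1, round(base_seconds * factor))
```

A caller would pass the result as the expiry on each write, for example `client.set(key, value, ex=jittered_ttl(3600))` with a redis-py style client, so keys written in the same burst no longer expire in the same second.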

★ Redis Quick Debug Cheat Sheet: Three-command diagnostics for the most painful Redis production issues

REDIS OOM ERROR - writes failing

Immediate action

Temporarily increase maxmemory to let writes through

Commands

redis-cli CONFIG SET maxmemory 2gb

redis-cli CONFIG SET maxmemory-policy allkeys-lru

Fix now

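After the live fix, persist the same values in redis.conf (maxmemory 2gb and maxmemory-policy allkeys-lru), or run CONFIG REWRITE, so they survive a restart. CONFIG SET takes effect immediately, so no restart is needed to apply it.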

Data Structure	Best Use Case	Key Commands	Stores Duplicates?	Ordered?
String	Cached values, counters, session tokens	GET, SET, INCR, EXPIRE	N/A (single value)	N/A
Hash	Object/entity fields (user profiles, product data)	HGET, HSET, HGETALL, HINCRBY	N/A (field map)	No
List	Queues, activity feeds, job pipelines	RPUSH, LPOP, BLPOP, LRANGE	Yes	Insertion order
Set	Unique visitors, tags, friend graphs	SADD, SMEMBERS, SINTER, SUNION	No	No
Sorted Set	Leaderboards, priority queues, range queries	ZADD, ZREVRANGE, ZRANK, ZSCORE	No (by member)	By score (float)

⚙ Quick Reference

16 commands from this guide

File	Command / Code	Purpose
redis_getting_started.sh	redis-server	Why Redis Lives in RAM and Why That Changes Everything
redis_data_structures.sh	127.0.0.1:6379> INCR api_calls:user:2055:minute:2024061514	Redis Data Structures
cache_aside_pattern.py	redis_client = redis.Redis(	The Cache-Aside Pattern
redis_expiry_and_eviction.sh	127.0.0.1:6379> CONFIG SET maxmemory 268435456	Redis Expiry, Eviction and Why Your Cache Will Betray You Wi
redis_eviction_policy_config.sh	127.0.0.1:6379> CONFIG GET maxmemory-policy	Redis Eviction Policies
redis_persistence_config.sh	save 900 1	Redis Persistence
redis_persistence_decision.sh	127.0.0.1:6379> CONFIG SET save "900 1 300 10 60 10000"	RDB vs AOF
redis_cluster_setup.sh	cluster-enabled yes	Redis Cluster
CacheFlowDiagram.sql	BEGIN;	Cache Hit vs Cache Miss
RealWorldRateLimiter.sql	CREATE OR REPLACE FUNCTION rate_limit_user(p_user_id INT)	Real-World Redis
ExplainRedisSpeed.sql	WITH latencies AS (	What Makes Redis Fast
RedisNumbers.sql	SET views:post:42 0	Working With Numbers
RedisKeyValue.sql	SET user:1000:name "Alice"	Saving and Retrieving Key-Value Pairs
redis7_features.sql	FUNCTION LOAD "#!lua name=update_score	Redis 7.x/8.x
redis_persistence_config.sql	save 900 1	Redis Persistence
redis_security_setup.sql	ACL SETUSER analytics on >StrongPass123 +@read ~analytics:*	Redis Security

Key takeaways

Redis is fast because it lives in RAM and uses a single-threaded event loop, which makes every command atomic without needing locks.

Sorted Sets are Redis's most underrated data structure: they let you build real-time leaderboards and priority queues in a single command with no application-side sorting.

Always set a TTL and always configure maxmemory-policy before production: the default noeviction policy will cause your app to throw errors when Redis fills up.

Cache-Aside (lazy loading) is the most production-proven caching pattern: check cache first, miss falls back to DB, always store with expiry, invalidate explicitly on writes.

Match your persistence strategy to data criticality: AOF everysec for session stores, RDB-only or no persistence for caches, both for maximum safety.
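The cache-aside flow summarized in these takeaways can be sketched as follows. This is an illustration under stated assumptions rather than the guide's own listing: `get_or_load` and `invalidate` are hypothetical names, `client` is assumed to be a redis-py style object with `get`, `set(..., ex=)`, and `delete`, and values are assumed to be JSON-serializable.

```python
import json
from typing import Any, Callable


def get_or_load(client: Any, key: str, loader: Callable[[], Any],
                ttl_seconds: int = 300) -> Any:
    """Return the cached value, or load it from the source of truth and cache it."""
    cached = client.get(key)
    if cached is not None:
        # Cache hit: redis-py returns bytes or str, and json.loads accepts both.
        return json.loads(cached)
    # Cache miss: fall back to the database (or any slower source).
    value = loader()
    # Always store with an expiry so stale entries cannot live forever.
    client.set(key, json.dumps(value), ex=ttl_seconds)
    return value


def invalidate(client: Any, key: str) -> None:
    """Drop the cached entry after the underlying record is written."""
    client.delete(key)
```

Typical usage is `get_or_load(client, "user:42", lambda: fetch_user_from_db(42))`, where `fetch_user_from_db` stands in for the application's own query, followed by `invalidate(client, "user:42")` after every update to that user.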

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Redis is single-threaded — how can it handle thousands of concurrent con...

Q02 · SENIOR

Explain the difference between Redis RDB snapshotting and AOF persistenc...

Q03 · SENIOR

What is a cache stampede and how would you prevent it in a high-traffic ...

Q01 of 03 · SENIOR

Redis is single-threaded — how can it handle thousands of concurrent connections without being a bottleneck?

ANSWER

Redis uses an event-driven, non-blocking I/O model with a single thread for command execution. Sockets are multiplexed with epoll/kqueue, so idle or slow connections don't block each other (Redis 6+ can optionally offload socket reads and writes to I/O threads, but commands still execute on one thread). The single thread processes commands sequentially, which eliminates lock contention and context switching. Redis is fast because it keeps everything in RAM and most operations are O(1) or O(log n), not because it multitasks. The bottleneck is rarely the number of connections; it is the cost of individual commands (avoid O(n) commands like KEYS, or SMEMBERS on large sets).
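The advice to avoid KEYS usually translates into cursor-based iteration with SCAN, which returns the keyspace in small steps so the single command thread is never held for long. The sketch below assumes a redis-py style client exposing `scan_iter(match=..., count=...)`; `iter_key_batches` is a hypothetical helper name.

```python
from typing import Any, Iterator, List


def iter_key_batches(client: Any, pattern: str,
                     batch_size: int = 500) -> Iterator[List[Any]]:
    """Yield keys matching pattern in batches, using SCAN instead of KEYS."""
    batch: List[Any] = []
    # count is a hint for how much work each SCAN step does, not a hard limit.
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each batch can then be processed or deleted in its own short command, which keeps other clients responsive. SCAN may return a key more than once during rehashing, so downstream work should be idempotent.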

FAQ · 5 QUESTIONS

Frequently Asked Questions

Is Redis a database or a cache?

What happens to Redis data when the server restarts?

When should I use a Hash instead of storing JSON as a String?

What eviction policy should I use for a production cache?

Can I run Redis in a Kubernetes cluster?

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Drawn from code that ran under real load.

✓ Verified: production tested

Last updated: July 19, 2026

2,466 articles · all by Naren
