Mid-level 18 min · March 09, 2026

Missing @Cacheable — 40x Latency Spike in Spring Boot Redis

A missing @Cacheable pushed latency from 50ms to 2000ms and sent 50K DB queries/min to the database instead of one.

Naren Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Everything here is grounded in real deployments.

Last updated: May 23, 2026
Quick Answer
  • Spring Boot Caching abstracts cache operations via @Cacheable, @CachePut, and @CacheEvict annotations — Redis is the distributed backing store that all application instances share
  • @Cacheable checks cache first and skips method execution on hit — use unless="#result == null" to prevent caching the absence of data, which is a silent data integrity bug
  • @CachePut always executes the method and updates the cache — use for writes where you want zero cache miss penalty on the next read, accepting higher write cost for lower read latency
  • @CacheEvict removes entries — forgetting to evict related caches like list or summary caches after an update is the single most common source of stale data in production
  • Always use JSON serialization via GenericJackson2JsonRedisSerializer and configure per-cache TTL via RedisCacheManager — a global TTL applied uniformly to all cache namespaces is a blunt instrument that creates problems
  • Internal calls via this.method() bypass the AOP proxy entirely and skip caching with no error — extract cached methods into separate Spring beans injected as dependencies
Definition
What is Spring Boot Caching with Redis?

Spring Boot's @Cacheable annotation is not a convenience feature—it's the difference between a 2ms Redis hit and a 200ms database query that compounds into a 40x latency spike under load. When you skip caching on frequently accessed data, every request hits the database, saturating connection pools and driving response times into the seconds.


Redis, as an in-memory data store, can serve cached results in a millisecond or two over the network, but only if you explicitly declare what to cache and how. Without @Cacheable, you're paying for Redis but using it as a decorative sidecar.

In the Spring ecosystem, @Cacheable sits atop the Cache Abstraction, which decouples your business logic from any specific cache provider: Redis, Caffeine, Hazelcast, or even a simple ConcurrentHashMap. For production Redis, you pair it with spring-boot-starter-cache and spring-boot-starter-data-redis. Global settings such as spring.cache.redis.time-to-live go in application.yml; serialization (JSON or Kryo) and per-cache TTLs are configured on a RedisCacheManager bean.

The annotation itself is declarative: it wraps a method, checks Redis for a key derived from the method arguments, returns the cached value if found, or executes the method and stores the result. This lifecycle of check, hit-or-miss, and store is what makes distributed caching win over local caches: all application instances share the same Redis cluster, so a value cached by one node is a hit for every other node, and a write or eviction on any node is visible everywhere.

Where teams go wrong is treating @Cacheable as magic. They forget that Redis is a separate process with network latency, serialization overhead, and memory limits. A 40x spike often comes from using Java serialization (slow, bloated) instead of Jackson JSON or Kryo, or from omitting TTLs so stale data accumulates.

The annotation triad of @Cacheable (read), @CachePut (write-through), and @CacheEvict (invalidation) must be used deliberately. For methods with several arguments, explicit SpEL keys (e.g., @Cacheable(key="#userId + ':' + #page")) or a custom KeyGenerator prevent collisions and ensure cache hits.

When you skip this, you don't just lose performance—you lose the predictability that makes Redis caching worth the operational cost.
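The key-collision point above is easy to reproduce without Spring or Redis. The sketch below is plain Java, not the Spring key machinery: CompositeKeyDemo and both method names are hypothetical, and naiveKey only approximates what a SpEL key that concatenates arguments without a separator would produce.

```java
// Hypothetical helper, for illustration only: shows why a delimiter matters
// when several method arguments are combined into one cache key.
final class CompositeKeyDemo {

    // Concatenation without a separator, roughly what key = "#userId + '' + #page" yields.
    static String naiveKey(long userId, int page) {
        return "" + userId + page;
    }

    // Concatenation with a separator, matching key = "#userId + ':' + #page".
    static String delimitedKey(long userId, int page) {
        return userId + ":" + page;
    }

    public static void main(String[] args) {
        // User 1 on page 23 and user 12 on page 3 collapse to the same naive key "123".
        System.out.println(naiveKey(1, 23).equals(naiveKey(12, 3)));         // true
        // With the delimiter the keys are "1:23" and "12:3", so they stay distinct.
        System.out.println(delimitedKey(1, 23).equals(delimitedKey(12, 3))); // false
    }
}
```

A collision like the first one means one user is served another user's cached page, which is why the separator is worth the extra characters.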

Plain-English First

Imagine you are a librarian in a massive library. Every time someone asks for a popular book, you have to hike 10 flights of stairs to the deep archives, find it, and carry it back. That is your database query — correct, but slow and exhausting at scale. After making that same trip 50 times for the same book in one morning, you make a sensible decision: keep a copy of that bestseller right on your front desk. Now the next person gets it in two seconds.

Spring Boot Caching is the system that automates this without you manually managing the desk copy. It checks the desk first. If the book is there, it hands it over immediately. If not, it makes the archive trip, brings the book back, and leaves a copy on the desk for next time. Redis is the desk — shared between every librarian in the building so they all benefit from what any one of them fetched. If one librarian updates a book's information, the smart system knows to replace the desk copy so nobody hands out the old edition. That is the job of @CacheEvict and @CachePut.

Performance is a feature, not an afterthought. In high-traffic environments, hitting the database for every single read request is a reliable path to a bottleneck. Spring Boot Caching provides an abstraction layer that lets you add transparent caching to existing methods with a single annotation, while Redis acts as the high-performance distributed store where that data lives between requests.

I want to be direct about something most caching tutorials avoid: caching failures cause production incidents that are expensive, embarrassing, and genuinely hard to diagnose. A missing @Cacheable on a hot endpoint caused a 40x database latency spike during a product launch I was involved in. A serialization change deployed without a cache flush served structurally broken data to users for three hours before anyone noticed. A Redis instance that hit maxmemory on Black Friday took down checkout flow for 20 minutes because nobody had implemented graceful degradation.

All of these were preventable with knowledge that was not particularly advanced — it just was not in any tutorial I had read at the time.

This guide covers the full annotation triad, production-grade serialization, per-cache TTL strategy, custom key generation, Actuator monitoring, graceful degradation patterns when Redis goes down, the cache stampede problem and how to prevent it, and the testing approach that catches caching bugs in CI instead of production. By the end, you will have the complete picture, not just the happy path.

Why @Cacheable Is Not Optional for Redis in Spring Boot

Spring Boot caching with Redis is a declarative mechanism that stores method return values in a Redis key-value store, keyed by method parameters, so subsequent identical calls skip execution and return the cached result. The core mechanic is the @Cacheable annotation, which intercepts method invocations via AOP, checks the cache before execution, and writes the result after execution — turning a potentially expensive operation into a Redis GET/SET round trip.

In practice, the cache abstraction is annotation-driven: @Cacheable(cacheNames="users", key="#id") maps to a Redis key like "users::123". The default TTL is infinite unless configured via RedisCacheManagerBuilderCustomizer or application properties. Serialization defaults to JDK but should be switched to JSON (GenericJackson2JsonRedisSerializer) for cross-service compatibility. Without an explicit TTL, stale data lives forever, which is a common source of silent bugs.

Use this pattern when a method is read-heavy, idempotent, and tolerates eventual consistency — for example, fetching user profiles, product catalogs, or configuration data. The latency difference is dramatic: a database query taking 50ms becomes a Redis lookup at 1-5ms. In high-throughput systems, this can reduce database load by 90%+ and prevent cascading failures under traffic spikes.
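To put rough numbers on that claim, here is a back-of-the-envelope sketch. It assumes a simple model in which every call pays the Redis lookup and only misses also pay the database query; HitRatioMath and its method names are hypothetical, and the inputs (2ms cache, 50ms database, 50K requests/min, 90% hit ratio) are taken from figures quoted in this article rather than from a measurement.

```java
// Hypothetical arithmetic helper: a simplified latency and load model, not a benchmark.
final class HitRatioMath {

    // Average latency per call: every call pays the cache lookup, misses also pay the database.
    static double expectedLatencyMs(double hitRatio, double cacheMs, double dbMs) {
        return cacheMs + (1.0 - hitRatio) * dbMs;
    }

    // Database queries per minute that still get through the cache.
    static double dbQueriesPerMinute(double requestsPerMinute, double hitRatio) {
        return requestsPerMinute * (1.0 - hitRatio);
    }

    public static void main(String[] args) {
        System.out.printf("avg latency: %.1f ms%n", expectedLatencyMs(0.90, 2.0, 50.0));
        System.out.printf("db load: %.0f queries/min%n", dbQueriesPerMinute(50_000, 0.90));
    }
}
```

Under these assumptions the average call costs about 7ms instead of 50ms, and the database sees about 5,000 queries per minute instead of 50,000. The model ignores serialization cost and stampedes, so treat it as a sizing aid only.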

TTL Is Not Optional
Without an explicit TTL, cached data lives forever — a deployment that changes the underlying data will serve stale results until the cache is manually evicted.
Production Insight
A team deployed a new pricing algorithm but forgot to update the cache TTL — users saw old prices for 3 days until a forced cache clear.
Symptom: identical API responses across requests despite confirmed database updates; Redis keys show no expiry in TTL.
Rule: always set a TTL that matches your data's freshness SLA — start with 5 minutes and tune based on write frequency.
Key Takeaway
1. @Cacheable stores a result only when the method returns normally. If the method throws, nothing is cached and the next call executes the method again.
2. Cache keys are flat strings. The cache name is the prefix ("users::1" vs "orders::1"), so collisions happen when two different lookups share a cache name and produce the same key.
3. Redis caching does not solve write consistency — you need explicit cache eviction (@CacheEvict) or a write-through pattern for mutable data.
Figure: Spring Boot Redis Caching Lifecycle, from annotation to cache hit/miss with distributed Redis. Stages: @Cacheable annotation (method-level cache trigger), cache key generation (SpEL or custom KeyGenerator), Redis cache lookup (check Redis for an existing value), cache hit/miss (return the cached value or execute the method), cache put and TTL (store the result with an expiration), graceful degradation (fallback when Redis is down). Callout: a missing @Cacheable causes a 40x latency spike.

Getting Started: Dependencies and Configuration

Before writing a single annotation, you need the right dependencies and a working Redis connection with sane defaults. Most tutorials skip over the configuration details and leave you with a setup that works locally and fails under production load. That is where this section differs.

You need three dependencies: the cache abstraction starter, the Redis data starter, and the Actuator starter for monitoring. If you are on Spring Boot 3.x, these pull in Lettuce as the Redis client by default. Lettuce uses Netty for non-blocking I/O and is inherently thread-safe — it shares a single connection across all threads rather than requiring a connection per thread. That distinction matters more than most people realize.

Your application.yml also needs connection pool settings, and they are not optional for production. Spring Boot only gives Lettuce a pool when commons-pool2 is on the classpath; the relevant keys are spring.data.redis.lettuce.pool.max-active and max-wait on Spring Boot 3.x (spring.redis.lettuce.pool.* on 2.x). I debugged a latency issue on a service that was performing correctly under normal load but degrading every afternoon during peak hours. The root cause was Lettuce's connection pool exhausting at the default max-active=8 under concurrent burst traffic. Threads were blocking while they waited for a connection slot to open. Bumping max-active to 16 and setting max-wait to 2,000ms, so threads fail fast instead of hanging indefinitely, resolved it completely. None of that is visible without knowing to look.

pom.xml (XML)
<dependencies>
    <!-- Redis client and template support -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <!-- Cache abstraction — @Cacheable, @CachePut, @CacheEvict -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-cache</artifactId>
    </dependency>
    <!-- Actuator for /actuator/caches and /actuator/metrics/cache.gets -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <!-- Configuration metadata for IDE autocomplete on @ConfigurationProperties -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-configuration-processor</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>
Lettuce vs Jedis — Why the Default Client Choice Matters Under Load
  • Lettuce is thread-safe with shared connections via Netty — burst traffic does not exhaust a fixed pool because threads do not own connections
  • Jedis requires a connection pool with a hard max-active ceiling — default of 8 connections exhausts within seconds under launch traffic
  • Lettuce uses non-blocking I/O — Jedis uses blocking I/O which ties up a thread per in-flight Redis operation
  • In practice, Lettuce handles 3x more concurrent Redis operations with the same connection count under identical hardware
  • Choose Jedis only if you have existing infrastructure that requires it or you need specific Jedis-only commands — otherwise Lettuce is the correct default for every new project
Production Insight
A service used default Lettuce pool settings with max-active=8 and no max-wait configured. Under normal traffic the pool was never exhausted — all 8 connections handled the load. During a promotional event with 6x normal concurrent users, all 8 connections were consumed simultaneously and threads began queuing indefinitely waiting for a slot to open. Response time climbed to 30 seconds per request — not because Redis was slow, but because threads were not getting access to it. The fix took 10 minutes: increase max-active to 32 and set max-wait=2000ms. Threads now wait at most 2 seconds before failing fast with a timeout exception that the graceful degradation fallback catches and routes to the database.
Key Takeaway
Lettuce is the correct default Redis client in Spring Boot 2.x and 3.x — it shares connections via Netty and does not exhaust under burst traffic the way Jedis pools do.
Connection pool tuning is not optional for production — max-active=8 is a default that fits development, not real traffic.
Always configure max-wait with a finite timeout so threads fail fast and trigger your graceful degradation path instead of hanging indefinitely.
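The fail-fast behavior described above can be modeled in a few lines. This is a simplified stand-in built on a Semaphore, not the commons-pool2 or Lettuce implementation; BoundedPoolSketch and its parameters are hypothetical names that mirror the max-active and max-wait settings.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical model of a bounded connection pool with a finite wait.
final class BoundedPoolSketch {
    private final Semaphore permits;   // one permit per pooled connection (max-active)
    private final long maxWaitMillis;  // how long a caller may wait for a free slot (max-wait)

    BoundedPoolSketch(int maxActive, long maxWaitMillis) {
        this.permits = new Semaphore(maxActive);
        this.maxWaitMillis = maxWaitMillis;
    }

    // Runs cacheCall if a slot frees up within maxWaitMillis; otherwise runs the fallback.
    <T> T execute(Supplier<T> cacheCall, Supplier<T> fallback) {
        boolean acquired = false;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
            if (!acquired) {
                return fallback.get(); // pool exhausted: fail fast and degrade to the database
            }
            return cacheCall.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        } finally {
            if (acquired) {
                permits.release();
            }
        }
    }
}
```

With an unbounded wait the caller in the exhausted case would hang instead of reaching the fallback, which is the 30-second response time from the incident above.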

The Caching Lifecycle: Why Distributed Caching Wins

Spring's Cache Abstraction supports multiple providers, including Caffeine, Ehcache, and Redis, behind a common annotation interface. For microservices running multiple instances, Redis is the correct choice because it is distributed: all instances share the same cache. With a local cache like Caffeine, each instance maintains its own independent cache. A write handled by one instance evicts the entry from that instance's cache only. In a ten-instance deployment, the other nine keep serving their stale copy until it expires. I have seen this produce genuinely confusing user-facing bugs where refreshing the page returns different data depending on which server handled the request, the kind of bug that is nearly impossible to reproduce in development.

When you annotate a method with @Cacheable, Spring wraps it with an AOP proxy. On each invocation, the proxy generates a cache key from the method arguments, checks Redis for that key, and only if the key is absent does the proxy allow the method body to execute. The result is then stored in Redis under that key before being returned to the caller. This is the Cache-Aside pattern — the application manages its own cache rather than the database doing it — and it is the dominant caching strategy in distributed Java systems.
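The proxy steps just described can be sketched in plain Java. This is a simplified model, not Spring's interceptor: a ConcurrentHashMap stands in for Redis, the Function stands in for the annotated method body, and CacheAsideSketch is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache-aside model: key derivation, lookup, miss handling, store.
final class CacheAsideSketch<K, V> {
    private final Map<String, V> store = new ConcurrentHashMap<>(); // stand-in for Redis
    private final AtomicInteger loads = new AtomicInteger();        // counts method executions
    private final String cacheName;
    private final Function<K, V> methodBody;

    CacheAsideSketch(String cacheName, Function<K, V> methodBody) {
        this.cacheName = cacheName;
        this.methodBody = methodBody;
    }

    V get(K argument) {
        String key = cacheName + "::" + argument; // 1. derive the key from the argument
        V cached = store.get(key);                // 2. check the cache
        if (cached != null) {
            return cached;                        // 3. hit: the method body never runs
        }
        loads.incrementAndGet();
        V result = methodBody.apply(argument);    // 4. miss: run the method
        if (result != null) {
            store.put(key, result);               // 5. store the result for the next caller
        }
        return result;
    }

    int loads() {
        return loads.get();
    }
}
```

Calling get(1L) twice on a sketch built with the cache name "products" runs the method body once; the second call is answered from the map under the key "products::1".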

The unless parameter is one of those details that separates a working cache from a production-ready cache. In a real e-commerce system I worked on, we had @Cacheable on product lookups without unless configured. When a product was temporarily removed from the catalog, the method returned null and the cache stored that null under the product ID key. After the product was re-added to the database, every request still returned null from Redis because the key existed and the proxy never called the method again. The entry had a 2-hour TTL so the bug persisted for up to 2 hours per affected product. Adding unless = "#result == null" was a one-line fix, but diagnosing it took considerably longer.

io/thecodeforge/cache/service/ForgeProductService.java (Java)
package io.thecodeforge.cache.service;

import io.thecodeforge.cache.model.Product;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ForgeProductService {

    /**
     * condition gates entry into caching logic entirely — evaluated BEFORE method execution.
     * Negative condition means: if id <= 0, skip the cache check AND skip storing the result.
     *
     * unless filters the result AFTER method execution.
     * unless = "#result == null" means: execute the method, but if it returned null, do not cache it.
     *
     * Both can and should be used together when the input domain has invalid ranges
     * AND the output can legitimately be absent.
     */
    @Cacheable(
        value = "products",
        key = "#id",
        unless = "#result == null",
        condition = "#id > 0"
    )
    public Product getProductById(Long id) {
        simulateDatabaseRoundTrip();
        return new Product(id, "Forge Industrial Drill", 149.50);
    }

    /**
     * @CacheEvict removes the cached entry for this product ID.
     * The next read for this ID will be a cache miss and will re-fetch from the database.
     * Use when write cost is low and you are comfortable with one post-update cache miss.
     */
    @CacheEvict(value = "products", key = "#product.id")
    public void updateProduct(Product product) {
        persistToDatabase(product);
    }

    /**
     * @CachePut always executes the method AND updates the cache with the return value.
     * The next read for this ID is a guaranteed cache hit — zero miss penalty after update.
     * More expensive on write than @CacheEvict, but the right choice in read-heavy systems.
     */
    @CachePut(value = "products", key = "#product.id", unless = "#result == null")
    public Product updateAndRefreshProduct(Product product) {
        persistToDatabase(product);
        return product;
    }

    /**
     * allEntries = true is a nuclear option — evicts everything in the products namespace.
     * Use only for admin-triggered bulk invalidations, not on hot paths.
     * Every subsequent read until the cache warms up will be a DB hit.
     */
    @CacheEvict(value = "products", allEntries = true)
    public void clearAllProductCache() {
        // Method body intentionally empty — the annotation does all the work.
    }

    private void simulateDatabaseRoundTrip() {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void persistToDatabase(Product product) {
        // Database persistence logic
    }
}
Output
// First read — cache miss, method executes:
// GET /product/1 -> 2,005ms (simulated DB round trip)
//
// Subsequent reads — cache hit, method body skipped:
// GET /product/1 -> 4ms (returned from Redis)
// GET /product/1 -> 3ms (returned from Redis)
//
// Update with @CachePut — method executes, cache refreshed:
// PUT /product/1 -> 152ms (DB write + cache update in one shot)
// GET /product/1 -> 3ms (fresh data from cache, zero miss penalty)
//
// Admin cache clear with allEntries=true:
// POST /admin/clear-cache -> all products:: keys deleted
// GET /product/1 -> 2,003ms (cold cache, back to DB)
condition vs unless — Gate Before vs Filter After
These two parameters solve different problems and are frequently confused. condition is evaluated before the method executes, using only the method arguments. If it is false, the entire cache interaction is skipped: neither the lookup nor the store happens. Use condition to exclude inputs that should never be cached regardless of what the method returns; condition = "#id > 0" prevents caching calls with invalid IDs. unless is evaluated after the method executes, using the return value. If unless is true, the method ran normally but the result is not stored. Use unless to exclude specific output values; unless = "#result == null" lets the method run but prevents caching null returns. A common mistake is writing condition = "#result != null". On @Cacheable this cannot work, because the condition is evaluated before the method has produced a result. The compiler will not catch it, since SpEL expressions are plain strings, so the mistake only shows up at runtime as a cache that does not behave the way the annotation reads.
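The ordering of the two checks is easier to see in a stripped-down model. The sketch below is plain Java with a HashMap in place of Redis; ConditionUnlessSketch is a hypothetical class, and the two predicates play the roles of the condition and unless expressions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical model of condition (gate before) and unless (filter after).
final class ConditionUnlessSketch<K, V> {
    private final Map<K, V> store = new HashMap<>(); // stand-in for Redis
    private final Predicate<K> condition;            // sees only the argument
    private final Predicate<V> unless;               // sees only the result
    private final Function<K, V> method;

    ConditionUnlessSketch(Predicate<K> condition, Predicate<V> unless, Function<K, V> method) {
        this.condition = condition;
        this.unless = unless;
        this.method = method;
    }

    V call(K argument) {
        boolean cacheable = condition.test(argument);  // evaluated before the method runs
        if (cacheable && store.containsKey(argument)) {
            return store.get(argument);                // hit: method body skipped
        }
        V result = method.apply(argument);
        if (cacheable && !unless.test(result)) {       // evaluated after the method runs
            store.put(argument, result);
        }
        return result;
    }

    boolean isCached(K argument) {
        return store.containsKey(argument);
    }
}
```

With condition `id -> id > 0` and unless `result -> result == null`, a call with -1 bypasses the cache entirely, a call that returns null runs the method but stores nothing, and any other call is stored.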
Production Insight
A product catalog service cached lookups without unless = "#result == null". When a product was deleted from the database, the getProductById method returned null and the cache stored that null. When the product was subsequently re-added, every request for the next two hours returned null from Redis — the database had the data, the cache was lying. The fix was a one-line annotation change. The diagnosis took four hours because nobody initially thought to check what value was stored in Redis for that key. redis-cli GET 'products::42' returned a JSON null literal. That was the moment it became obvious.
Key Takeaway
Distributed Redis caching ensures consistency across all application instances — local Caffeine caching does not, and will produce intermittent stale data bugs under multi-instance deployments.
Never cache null values — use unless = "#result == null" on the annotation and disableCachingNullValues() in RedisCacheConfiguration as a defense-in-depth layer.
The condition parameter is evaluated before method execution using method arguments. The unless parameter is evaluated after method execution using the return value. They compose, and you should use both.
Choosing the Right Cache Annotation
If: Read-heavy method that returns the same data for the same input (product details, user profiles, configuration)
Use: @Cacheable. It checks the cache first and skips the method body entirely on a hit. Add unless = "#result == null" and condition = "#id > 0" for precision.
If: Write method where the next read must return fresh data with zero miss penalty
Use: @CachePut. It always executes the method and updates the cache with the result. Higher write cost, lower read cost after the write.
If: Delete or update method where you are comfortable with one post-update cache miss
Use: @CacheEvict. It removes the entry; the write is cheaper and the next read pays the full database cost.
If: Method that writes to the database and must both update one cache and evict another
Use: @Caching with both put and evict sub-annotations. One method call then touches several cache namespaces, though the operations are applied one after another and are not atomic across caches.
If: Admin-triggered bulk invalidation that must clear an entire cache namespace
Use: @CacheEvict(allEntries = true). Use it sparingly, as it forces every subsequent read to hit the database until the cache warms up again.

Production Configuration: Serialization, TTL, and Per-Cache Settings

Spring Boot's default cache serialization is Java serialization. For Redis this means your cached objects are stored as binary blobs that are unreadable from the Redis CLI, incompatible with any service not written in Java, and fragile across deployments that change field names or types. In a production environment where you need to inspect cached data during an incident, debug a serialization failure, or share cache entries between services, Java serialization is the wrong choice without exception.

GenericJackson2JsonRedisSerializer stores objects as JSON. This makes every cached entry inspectable via redis-cli GET, readable by services in any language, and resilient to backward-compatible schema changes like adding a nullable field. When you deploy a change that adds a new field to a cached class, JSON deserialization tolerates the missing field gracefully. Java deserialization throws an InvalidClassException if the serialVersionUID changes, which it does whenever you modify a class without explicitly declaring a fixed UID.
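To see why JDK-serialized entries are opaque in redis-cli, it helps to look at the bytes. The sketch below uses only the JDK; JdkSerializationPeek and its nested Product class are hypothetical stand-ins, not the article's model classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo: what Java serialization actually writes for a cached value.
final class JdkSerializationPeek {

    // Stand-in for a cached entity, with an explicitly fixed serialVersionUID.
    static final class Product implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final String name;

        Product(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Serializes a value the way a JDK-based serializer would, returning the raw bytes.
    static byte[] jdkSerialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = jdkSerialize(new Product(1L, "Forge Industrial Drill"));
        // Every Java serialization stream starts with the magic number 0xACED.
        System.out.printf("first bytes: %02X %02X%n", raw[0], raw[1]); // first bytes: AC ED
    }
}
```

Those leading bytes, followed by class descriptors and field data, are what redis-cli GET prints as escaped binary. Declaring serialVersionUID explicitly, as the nested class does, at least keeps the UID stable when the class changes, but it does not make the payload readable.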

I deployed a serialization configuration change on a Friday afternoon once — not my finest hour in terms of timing — and forgot to flush the affected cache. The running instances had new serializer configuration. The existing Redis keys held Java-serialized binary. Every deserialization attempt silently returned null. Half the site was serving empty product pages until I noticed the hit ratio had flatlined. Always flush affected caches after changing serialization strategy.

The per-cache TTL configuration is something I feel strongly about after having managed systems where a single global TTL caused repeated problems. Product catalog data that changes once a day does not need the same expiry window as user session data that must reflect changes within minutes. Setting a uniform 2-hour TTL across all caches because it is simpler means your session data is dangerously stale or your catalog data is thrashing the database. Get specific about TTL per namespace from the beginning.

io/thecodeforge/cache/config/ForgeRedisConfig.java (Java)
package io.thecodeforge.cache.config;

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.RedisSerializationContext;
import org.springframework.data.redis.serializer.StringRedisSerializer;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

@Configuration
@EnableCaching
public class ForgeRedisConfig {

    @Bean
    public CacheManager cacheManager(RedisConnectionFactory factory) {
        /*
         * Base configuration applied to all caches unless explicitly overridden.
         * Key serializer: String (human-readable in Redis CLI)
         * Value serializer: JSON (human-readable, cross-platform, tolerates schema evolution)
         * Null values: disabled — prevents caching the absence of data
         * Default TTL: 2 hours — safety net for caches not explicitly configured below
         */
        RedisCacheConfiguration defaultConfig = RedisCacheConfiguration.defaultCacheConfig()
            .entryTtl(Duration.ofHours(2))
            .disableCachingNullValues()
            .serializeKeysWith(
                RedisSerializationContext.SerializationPair.fromSerializer(
                    new StringRedisSerializer()
                )
            )
            .serializeValuesWith(
                RedisSerializationContext.SerializationPair.fromSerializer(
                    new GenericJackson2JsonRedisSerializer()
                )
            );

        /*
         * Per-cache TTL overrides.
         * Each entry creates a named cache with its own expiry window.
         * Caches not listed here use the defaultConfig TTL of 2 hours.
         *
         * TTL rationale:
         *   products     — 30 min: changes infrequently but reads are very high volume
         *   categories   — 6 hours: nearly static, catalog restructuring is rare
         *   userSessions — 15 min: must reflect permission changes quickly for security
         *   searchResults — 5 min: high variability, acceptable to serve slightly stale
         */
        Map<String, RedisCacheConfiguration> cacheConfigs = new HashMap<>();
        cacheConfigs.put("products",      defaultConfig.entryTtl(Duration.ofMinutes(30)));
        cacheConfigs.put("categories",    defaultConfig.entryTtl(Duration.ofHours(6)));
        cacheConfigs.put("userSessions",  defaultConfig.entryTtl(Duration.ofMinutes(15)));
        cacheConfigs.put("searchResults", defaultConfig.entryTtl(Duration.ofMinutes(5)));

        return RedisCacheManager.builder(factory)
            .cacheDefaults(defaultConfig)
            .withInitialCacheConfigurations(cacheConfigs)
            .transactionAware()  // Defers cache puts and evictions until the surrounding transaction commits
            .build();
    }
}
Output
// Redis keys stored as human-readable JSON — inspectable via CLI during incidents:
//
// products::1
// -> {"@class":"io.thecodeforge.cache.model.Product","id":1,"name":"Forge Industrial Drill","price":149.5}
//
// categories::all
// -> ["java.util.ArrayList",[{"@class":"io.thecodeforge.cache.model.Category","id":1,"name":"Hardware"}]]
//
// userSessions::abc-123-def
// -> {"@class":"io.thecodeforge.cache.model.UserSession","userId":42,"role":"ADMIN","expiresAt":"2026-04-18T14:30:00"}
//
// TTL verification via Redis CLI:
// redis-cli TTL 'products::1' -> 1742 (seconds remaining, ~29 minutes)
// redis-cli TTL 'userSessions::abc-123-def' -> 890 (seconds remaining, ~14 minutes)
// redis-cli TTL 'categories::all' -> 21387 (seconds remaining, ~5.9 hours)
transactionAware() Is Not Optional in Transactional Systems
The transactionAware() call on the RedisCacheManager builder makes cache put and evict operations transaction-aware: when a Spring-managed transaction is active, they are deferred until that transaction commits successfully. Without it, cache operations execute immediately regardless of whether the enclosing database transaction commits or rolls back. The failure scenario: a service method updates the database and uses @CachePut to store the updated object. The database transaction then rolls back due to a constraint violation. Without transactionAware(), the cache now holds data for a database state that was never committed. The next request reads that uncommitted data from the cache: the database says one thing, the cache says another, and the cache wins for the duration of the TTL. This is exactly the kind of bug that is extremely difficult to reproduce, because it only manifests when a transaction rolls back after the cache write for that key has already gone through.
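The deferral idea can be modeled without Spring. This is a simplified illustration of "buffer cache writes until commit", not Spring's transaction-aware cache decorator; DeferredCacheSketch and its begin/commit/rollback methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: cache writes are queued while a transaction is open
// and only applied if that transaction commits.
final class DeferredCacheSketch {
    private final Map<String, String> cache = new HashMap<>(); // stand-in for Redis
    private final List<Runnable> pending = new ArrayList<>();  // writes waiting for commit
    private boolean inTransaction;

    void begin() {
        inTransaction = true;
    }

    void put(String key, String value) {
        Runnable write = () -> cache.put(key, value);
        if (inTransaction) {
            pending.add(write); // deferred: not visible to readers yet
        } else {
            write.run();        // no transaction: write immediately
        }
    }

    void commit() {
        pending.forEach(Runnable::run); // apply queued writes after a successful commit
        pending.clear();
        inTransaction = false;
    }

    void rollback() {
        pending.clear();                // discard queued writes, the cache is untouched
        inTransaction = false;
    }

    String get(String key) {
        return cache.get(key);
    }
}
```

In this model a begin, put, rollback sequence leaves the key absent, while begin, put, commit makes it visible. A cache without the deferral would have written the value at put time and kept it after the rollback.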
Production Insight
A team used default Java serialization for cached entity objects. After a deployment that renamed a field from productName to name, every deserialization attempt against an existing Redis key threw an exception that the framework caught internally and returned as null. The application appeared to serve null product names for every cached entry. The Conditions Report showed nothing wrong. The logs showed nothing wrong. redis-cli GET 'products::1' returned unreadable binary that nobody could parse manually. It took 45 minutes to connect the deployment timing to the null values. The fix was switching to GenericJackson2JsonRedisSerializer and flushing the cache. With JSON, and an ObjectMapper configured to ignore unknown properties, the same scenario would have produced entries missing the name field rather than deserialization failures, and the new field name would populate on the next cache miss.
Key Takeaway
Always use GenericJackson2JsonRedisSerializer — Java serialization produces unreadable binary blobs, breaks on field renames between deployments, and is incompatible with non-Java consumers of your cache.
Per-cache TTL is a design decision, not a configuration detail — match each namespace's TTL to its data volatility and the cost of serving stale data in that context.
transactionAware() defers cache writes until the surrounding database transaction commits. Without it you can cache data for a transaction that never committed, and that data persists for the full TTL.

The Full Annotation Triad: @Cacheable, @CachePut, and @CacheEvict

Most caching tutorials demonstrate @Cacheable and treat the other two annotations as footnotes. In production systems that handle both reads and writes, you need all three and you need to understand when each one is the right tool. Getting this wrong does not produce errors — it produces stale data that is served with full confidence.

@Cacheable is the read-side annotation. It checks the cache before every invocation and short-circuits the method body on a cache hit. The method body only executes when the key is absent. This is the annotation you add to read-heavy methods where the result is deterministic for a given input.

@CachePut is the write-side update annotation. It always executes the method body and always writes the result to the cache afterward. No cache-check shortcut happens. The value is that after a write, the next read for that key gets fresh data from cache with zero miss penalty — the cache was updated in the same operation that updated the database.

@CacheEvict is the write-side deletion annotation. It removes the entry from the cache. The method body executes, the database is updated, and the cache entry is gone. The next read for that key is a cache miss and goes to the database. Cheaper on the write operation than @CachePut, but the trade-off is that one read after every write pays the full database cost.
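
The three behaviors above can be modeled without Spring at all. The sketch below is a hypothetical in-memory stand-in: TriadModel, its two maps, and the dbReads counter are illustration-only names, not Spring APIs. It mimics what the caching proxy does for each annotation so you can see which calls reach the database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the annotation semantics; not how Spring is implemented.
class TriadModel {
    private final Map<Long, String> cache = new HashMap<>();    // stands in for Redis
    private final Map<Long, String> database = new HashMap<>(); // stands in for the DB
    int dbReads = 0;

    // @Cacheable: return the cached value if present; otherwise run the
    // "method body" and store a non-null result
    String cacheableGet(long id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;              // hit: method body skipped
        }
        dbReads++;
        String loaded = database.get(id);
        if (loaded != null) {           // mirrors unless = "#result == null"
            cache.put(id, loaded);
        }
        return loaded;
    }

    // @CachePut: always run the method, then overwrite the cache entry
    String cachePutUpdate(long id, String name) {
        database.put(id, name);
        cache.put(id, name);
        return name;
    }

    // @CacheEvict: run the method, then remove the cache entry
    void cacheEvictUpdate(long id, String name) {
        database.put(id, name);
        cache.remove(id);
    }
}
```

Calling cachePutUpdate and then cacheableGet leaves dbReads unchanged, while cacheEvictUpdate followed by cacheableGet increments it once. That single extra read is the miss penalty the two write-side annotations trade against each other.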

The choice between @CachePut and @CacheEvict on update operations depends on your read-to-write ratio. In a system where a product is updated once a day and read 50,000 times, @CachePut is almost always the right choice — the slightly more expensive write is amortized over tens of thousands of reads that benefit from the warm cache.

@Caching is the annotation you need when a single method must affect multiple cache entries or namespaces. I have seen this mistake repeatedly: a developer adds @CacheEvict on an update method, targets the product detail entry, ships it, and then gets a bug report that the product listing page shows stale data. The product listing entry was not evicted. Product detail and product list are separate cache entries (in the example below, separate keys inside the same products namespace) containing representations of the same entity. When you modify an entity, every cache entry that holds any representation of it must be invalidated.

io/thecodeforge/cache/service/ForgeProductServiceAdvanced.javaJAVA
package io.thecodeforge.cache.service;

import io.thecodeforge.cache.model.Product;
import io.thecodeforge.cache.model.ProductSummary;
import org.springframework.cache.annotation.CacheConfig;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.Caching;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.stream.Collectors;

/**
 * @CacheConfig declares the default cache name for all annotations in this class.
 * Eliminates the value = "products" repetition on every annotation.
 * Methods can still override with an explicit value when needed.
 */
@Service
@CacheConfig(cacheNames = "products")
public class ForgeProductServiceAdvanced {

    // Read: check cache first, skip method on hit
    @Cacheable(key = "#id", unless = "#result == null")
    public Product getProductById(Long id) {
        return fetchFromDatabase(id);
    }

    // Read: cache the entire list under a fixed key
    // unless condition prevents caching an empty list that may be a transient state
    @Cacheable(key = "'list:all'", unless = "#result == null || #result.isEmpty()")
    public List<ProductSummary> getAllProducts() {
        return fetchAllFromDatabase().stream()
            .map(p -> new ProductSummary(p.getId(), p.getName()))
            .collect(Collectors.toList());
    }

    // Update: always execute, update detail cache — zero miss penalty on next detail read
    // Does NOT touch the list cache — use updateProductAndClearList when list must also be fresh
    @CachePut(key = "#product.id")
    public Product updateProduct(Product product) {
        saveToDatabase(product);
        return product;
    }

    /**
     * The correct update pattern when an entity appears in multiple cache namespaces:
     * - @CachePut on the detail cache: next detail read is a guaranteed hit
     * - @CacheEvict on the list cache: next list read re-fetches from DB (list is rebuilt fresh)
     *
     * Why evict the list instead of put? Rebuilding a list cache entry requires fetching
     * all items from the database — too expensive to do on every single product update.
     * Accept one list cache miss per update; pay the DB cost once to get a fresh list.
     */
    @Caching(
        put = { @CachePut(key = "#product.id") },
        evict = { @CacheEvict(key = "'list:all'") }
    )
    public Product updateProductAndClearList(Product product) {
        saveToDatabase(product);
        return product;
    }

    // Delete: remove detail cache entry, let next read re-fetch or confirm absence
    @CacheEvict(key = "#id")
    public void deleteProduct(Long id) {
        deleteFromDatabase(id);
    }

    // Nuclear option: clears the entire products namespace
    // Every subsequent read is a DB hit until the cache warms up — use with intent
    @CacheEvict(allEntries = true)
    public void clearEntireCache() {
        // Intentionally empty — annotation handles the eviction
    }

    private Product fetchFromDatabase(Long id) {
        return new Product(id, "Forge Industrial Drill", 149.50);
    }

    private List<Product> fetchAllFromDatabase() {
        return List.of(
            new Product(1L, "Forge Drill", 149.50),
            new Product(2L, "Forge Wrench", 29.99)
        );
    }

    private void saveToDatabase(Product product) { /* persistence logic */ }
    private void deleteFromDatabase(Long id) { /* deletion logic */ }
}
Output
// Initial read — cache miss:
// GET /product/1 -> 45ms (DB query, result stored in products::1)
//
// Subsequent reads — cache hit:
// GET /product/1 -> 3ms (from Redis, method body skipped)
//
// Update with @CachePut — no miss penalty on next read:
// PUT /product/1 -> 18ms (DB write + cache update)
// GET /product/1 -> 2ms (fresh data from updated cache entry)
//
// Update with @Caching — detail refreshed, list invalidated:
// PUT /product/1 via updateProductAndClearList -> 20ms
// GET /product/1 -> 2ms (detail cache hit — fresh)
// GET /products -> 38ms (list cache miss — rebuilt from DB, then cached)
// GET /products -> 4ms (list cache hit on subsequent request)
//
// Delete — detail cache cleared:
// DELETE /product/1 -> 12ms (DB delete + cache eviction)
// GET /product/1 -> 45ms (cache miss, DB returns null, not cached due to unless)
@CachePut vs @CacheEvict on Update — Making the Right Trade-off
@CachePut costs more on the write path: the method always executes, the result is serialized, and a Redis write happens. The benefit is that the next read for that key is guaranteed to be a cache hit with fresh data — zero miss penalty. @CacheEvict costs less on the write path: the method executes and Redis deletes the key. The next read is a guaranteed cache miss that pays the full database cost. The right choice depends on your read-to-write ratio for that specific entity. A product that is updated once per day and read 100,000 times should use @CachePut — one expensive write is nothing compared to 100,000 cheap cache hits. A user preference record that is updated frequently and read rarely should use @CacheEvict — the overhead of keeping the cache constantly warm is not worth it. If you are unsure, start with @CacheEvict. It is simpler, less error-prone, and you can always move to @CachePut later when you have read/write ratio data from production metrics.
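
One rough way to reason about this trade-off is to count the extra work each strategy adds per write. The sketch below is a deliberately simplified model, not a benchmark: the WriteStrategyCost class, its parameters, and the latencies used with it are all assumptions, and you should substitute numbers from your own production metrics.

```java
// Simplified, hypothetical cost model: extra milliseconds of work attributable to one write.
class WriteStrategyCost {

    // @CachePut: every write also serializes the value and stores it in Redis
    static double putCostPerWrite(double redisWriteMs) {
        return redisWriteMs;
    }

    // @CacheEvict: every write deletes the key; the first read afterwards (if there is one
    // before the next write) pays a database read plus the Redis write that repopulates the entry
    static double evictCostPerWrite(double readsPerWrite, double dbReadMs,
                                    double redisWriteMs, double redisDeleteMs) {
        double chanceOfReadBeforeNextWrite = Math.min(1.0, readsPerWrite);
        return redisDeleteMs + chanceOfReadBeforeNextWrite * (dbReadMs + redisWriteMs);
    }

    static String cheaperStrategy(double readsPerWrite, double dbReadMs,
                                  double redisWriteMs, double redisDeleteMs) {
        double put = putCostPerWrite(redisWriteMs);
        double evict = evictCostPerWrite(readsPerWrite, dbReadMs, redisWriteMs, redisDeleteMs);
        return put <= evict ? "@CachePut" : "@CacheEvict";
    }
}
```

With an assumed 45ms database read, 2ms Redis write, and 1ms Redis delete, a key read 50,000 times per write favors @CachePut (2ms versus 48ms of extra work per write), while a key read about once per hundred writes favors @CacheEvict (roughly 1.5ms versus 2ms).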
Production Insight
A platform team updated a product entity using @CacheEvict that correctly targeted the product detail cache by ID. The bug report came in two hours later: the product listing page was showing the old product name. The list cache lived under a separate key — 'list:all' — in the same products namespace. The @CacheEvict had no knowledge of it. Both the detail cache and the list cache contained the same product, but only one was being evicted on update. The fix was converting the update method to use @Caching with both @CachePut for the detail entry and @CacheEvict for the list key. The lesson was written into the team's code review checklist: when you modify an entity, list every cache namespace that holds any representation of it and verify each one is handled.
Key Takeaway
@Cacheable for reads, @CachePut for updates where zero miss penalty matters, @CacheEvict for deletions and updates where write simplicity matters more than read speed.
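@Caching is the correct tool when one write affects multiple cache entries or namespaces. Put both operations on the one method that performs the write rather than on two separate methods that callers must remember to invoke together. The operations run as part of the same call but are not atomic in Redis.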
Forgetting to evict related caches — list caches, summary caches, aggregated views — is the single most common source of stale data bugs in production caching implementations.

Custom Key Generation: Handling Complex Method Signatures

Spring's default key generator is SimpleKeyGenerator. For methods with a single parameter, it uses that parameter as the key directly. For methods with multiple parameters, it constructs a composite key from all parameters. This works for simple cases but creates real problems the moment you have methods with identical parameter signatures across the same cache namespace.

In a product service I worked on, we had two methods: getProductById(Long id) and getInventoryCount(Long id). Both accepted a single Long parameter. Both used the same cache namespace. SimpleKeyGenerator produced the key 42 for getProductById(42L) and also 42 for getInventoryCount(42L). In practice this meant that whichever method was called first would populate the cache, and the second method would read that result and serve it as if it were its own. Getting a Product object back from a method that should return an Integer inventory count causes an immediate ClassCastException — which is actually the best-case scenario because it surfaces the bug immediately. The subtle version is when the types are compatible and wrong data is served silently.

A custom key generator that includes the class name and method name in every key eliminates this class of bug entirely. It adds a small amount of key length overhead — the keys become more verbose — but the clarity and safety are worth it on any system with more than a handful of cached methods.
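
The collision is easy to reproduce outside Spring. The following sketch is a hypothetical stand-in: KeyCollisionDemo and its HashMap are illustration-only, with the map playing the role of one shared cache namespace. It shows the second method receiving the first method's cached value, and how a class-and-method prefix separates the two.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one cache namespace shared by two cached methods.
class KeyCollisionDemo {
    private final Map<Object, Object> productsCache = new HashMap<>();

    // Mimics a cached call: return the stored value for the key if one exists,
    // otherwise store and return the fresh value
    Object cachedLookup(Object key, Object freshValue) {
        return productsCache.computeIfAbsent(key, k -> freshValue);
    }

    // Mimics a generator that prefixes class name and method name to the parameters
    static String prefixedKey(String className, String methodName, Object... params) {
        return className + ":" + methodName + ":" + Arrays.toString(params);
    }
}
```

Looking up key 42L first with a product value and then with an inventory count returns the product both times, which is exactly the wrong-type result described above. Looking up the prefixed keys instead keeps the two values apart.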

io/thecodeforge/cache/config/ForgeKeyGeneratorConfig.javaJAVA
package io.thecodeforge.cache.config;

import org.springframework.cache.interceptor.KeyGenerator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.stream.Collectors;

@Configuration
public class ForgeKeyGeneratorConfig {

    /**
     * Custom key generator that prefixes every cache key with the class name and method name.
     * Prevents cache key collisions between methods that have identical parameter signatures.
     *
     * Key format: ClassName:methodName:[param1, param2, ...]
     * Example: ForgeProductService:getProductById:[42]
     *
     * Reference this generator in annotations: @Cacheable(keyGenerator = "forgeKeyGenerator")
     * Or make it the global default by returning it from CachingConfigurer#keyGenerator().
     */
    @Bean("forgeKeyGenerator")
    public KeyGenerator forgeKeyGenerator() {
        return (Object target, Method method, Object... params) -> {
            String className = target.getClass().getSimpleName();
            String methodName = method.getName();

            // Handle zero-parameter methods cleanly
            String paramsPart = (params == null || params.length == 0)
                ? "no-args"
                : Arrays.stream(params)
                    .map(p -> p == null ? "null" : p.toString())
                    .collect(Collectors.joining(",", "[", "]"));

            return className + ":" + methodName + ":" + paramsPart;
        };
    }
}
Output
// Generated cache keys — human-readable, collision-free:
//
// ForgeProductService:getProductById:[42]
// ForgeProductService:getInventoryCount:[42]
// ForgeProductService:getAllProducts:no-args
// ForgeProductServiceAdvanced:getProductById:[1]
//
// These keys are distinct even when the underlying parameters are identical.
// redis-cli --scan --pattern '*getProductById*' finds only product entries.
// redis-cli --scan --pattern '*getInventoryCount*' finds only inventory entries.
//
// Register in annotation:
// @Cacheable(value = "products", keyGenerator = "forgeKeyGenerator")
//
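// Or set as the global default by implementing CachingConfigurer in a @Configuration class
// and returning this generator from keyGenerator(). RedisCacheManager does not select the
// key generator; computePrefixWith(...) only changes the prefix placed in front of each key.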
When to Use Custom Keys vs SpEL Expressions
You have two options for controlling cache key format. SpEL expressions in the key attribute work well for simple, method-specific customization, for example key = "#id + ':' + #region". A hashed key such as key = "T(java.util.Objects).hash(#userId, #tenantId)" is also possible, but two different inputs can hash to the same value, so prefer concatenation wherever a collision would serve wrong data. SpEL keys are readable inline and do not require a separate bean. Use SpEL when the key logic is specific to one method. A custom KeyGenerator bean works better when you want consistent behavior across all cached methods without remembering to add a SpEL expression to each one. It is also easier to test in isolation. Use a custom generator when you have more than a handful of cached methods and want to enforce a naming convention globally. The two approaches can coexist: set a global custom generator as the default and override with explicit key SpEL on specific methods that need different behavior.
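
Making a generator the global default is done through Spring's CachingConfigurer interface rather than through the cache manager. The fragment below is a minimal sketch under stated assumptions: Spring Framework 6 or Spring Boot 3 (where CachingConfigurer has default methods), @EnableCaching already present elsewhere, and exactly one KeyGenerator bean in the context. ForgeCachingDefaults is a hypothetical class name.

```java
import org.springframework.cache.annotation.CachingConfigurer;
import org.springframework.cache.interceptor.KeyGenerator;
import org.springframework.context.annotation.Configuration;

// Hypothetical configuration class; assumes Spring Framework 6 / Spring Boot 3.
@Configuration
public class ForgeCachingDefaults implements CachingConfigurer {

    private final KeyGenerator forgeKeyGenerator;

    // Assumes a single KeyGenerator bean (for example the "forgeKeyGenerator" bean) is defined
    public ForgeCachingDefaults(KeyGenerator forgeKeyGenerator) {
        this.forgeKeyGenerator = forgeKeyGenerator;
    }

    // Applied to every caching annotation that declares neither key nor keyGenerator
    @Override
    public KeyGenerator keyGenerator() {
        return forgeKeyGenerator;
    }
}
```

Spring accepts only one CachingConfigurer per application context, so if your application already implements one, add the keyGenerator() override there instead of creating a second class.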
Production Insight
A services layer had getProductById(Long id) and getInventoryCount(Long id) both cached in the same namespace with default key generation. Under normal operation the bug was dormant — the two methods were rarely called in close succession for the same ID. During a load test that exercised both endpoints concurrently, the ClassCastException appeared intermittently. Intermittent ClassCastException from a cached method call is a reliable signal of a key collision — the method received a cached value that was stored by a different method. A custom key generator that prefixed class and method name to every key resolved all collisions in one change.
Key Takeaway
SimpleKeyGenerator uses method parameters directly as cache keys, so methods called with identical parameter values in the same namespace will collide and produce wrong data or a ClassCastException.
Always use a custom key generator when multiple cached methods in the same namespace accept parameters of the same type, or when you have overloaded methods.
Including the class name and method name in every generated key is the simplest and most reliable way to prevent collisions without requiring per-annotation SpEL expressions.

Monitoring and Observability: Know Your Cache Hit Ratio

A cache you cannot observe is a cache you cannot trust. You may think it is working. It may not be. And you will not find out until your database bills spike or your on-call rotation gets a 3 AM page.

The single most important caching metric is the hit ratio: the proportion of cache lookups that return a cached value versus those that fall through to the database. A hit ratio below 80% on a cache that is supposed to be saving you database calls is a signal that something is wrong — TTLs are too short for the access pattern, cache keys are not matching, eviction is happening too aggressively, or the cache is simply cold after a recent deployment.

Spring Boot Actuator with Micrometer registers cache metrics for every cache known at startup once spring-boot-starter-actuator is on the classpath. Redis-backed caches additionally need statistics turned on, either with spring.cache.redis.enable-statistics=true or by calling enableStatistics() on a custom RedisCacheManager builder; without that, hit and miss counts are not recorded. The cache.gets metric is tagged with result:hit and result:miss, giving you the raw counts to calculate the ratio. It is also tagged with cache:products, cache:categories, and so on, so you can see the ratio per namespace rather than aggregated across all caches. That matters because a problem in one namespace is invisible when its misses are averaged with hits from five healthy namespaces.

On a production dashboard I maintained, we had an alert set on per-namespace hit ratio dropping below 85% for more than five consecutive minutes. That alert fired once at 9:15 AM on a Monday — a deployment the previous Friday had changed how the products cache key was formatted. The old keys still existed in Redis but the new key format no longer matched them. The cache appeared full and healthy from a memory perspective. From a hit perspective, it was 0% — every request was a miss against a cache full of orphaned keys that would never be hit again. The alert fired in 5 minutes. Without it, we would have found out when the database team escalated CPU alarms at peak afternoon traffic.

application.ymlYAML
spring:
  cache:
    type: redis
    redis:
      enable-statistics: true  # Required for Redis caches to record cache.gets hit/miss counts
  redis:                       # Spring Boot 2.x prefix; on Spring Boot 3.x use spring.data.redis
    host: ${REDIS_HOST:localhost}
    port: ${REDIS_PORT:6379}
    password: ${REDIS_PASSWORD:}
    timeout: 2000ms
    lettuce:
      pool:               # Pooling requires commons-pool2 on the classpath
        max-active: 16    # Increase from default 8, which exhausts fast under burst traffic
        max-idle: 8
        min-idle: 4
        max-wait: 2000ms  # Fail fast: threads wait max 2s for a connection, then get an exception

management:
  endpoints:
    web:
      exposure:
        # Expose the endpoints needed for cache observability and Prometheus scraping
        include: caches, metrics, health, info, prometheus
  metrics:
    export:
      prometheus:
        enabled: true   # Boot 2.x key; on Boot 3.x use management.prometheus.metrics.export.enabled
  endpoint:
    health:
      show-details: always  # Shows Redis connectivity status in health response
Output
// GET /actuator/caches — lists all registered cache namespaces:
// {"cacheManagers":{"cacheManager":{"caches":{
// "products": {"target":"org.springframework.data.redis.cache.RedisCache"},
// "categories": {"target":"org.springframework.data.redis.cache.RedisCache"},
// "userSessions":{"target":"org.springframework.data.redis.cache.RedisCache"}
// }}}}
//
// GET /actuator/metrics/cache.gets — total lookup count across all caches:
// {"name":"cache.gets","measurements":[{"statistic":"COUNT","value":28419}],
// "availableTags":[
// {"tag":"result","values":["hit","miss"]},
// {"tag":"cache","values":["products","categories","userSessions"]}
// ]}
//
// GET /actuator/metrics/cache.gets?tag=result:hit&tag=cache:products
// {"measurements":[{"statistic":"COUNT","value":26203}]}
//
// GET /actuator/metrics/cache.gets?tag=result:miss&tag=cache:products
// {"measurements":[{"statistic":"COUNT","value":2216}]}
//
// Per-namespace hit ratio: 26203 / (26203 + 2216) = 92.2% — healthy
//
// Prometheus query for Grafana panel (sum() drops the result label so both sides of the division match):
// sum(rate(cache_gets_total{result="hit",cache="products"}[5m]))
// / sum(rate(cache_gets_total{cache="products"}[5m]))
Alert on Per-Namespace Hit Ratio Below 85%, Not Aggregate
A global cache hit ratio of 90% looks healthy. But if one cache namespace is at 40% and all others are at 98%, the global number hides the problem. Alert on per-namespace hit ratios using the cache tag in Micrometer metrics. Also monitor Redis memory pressure separately from hit ratio. A Redis instance approaching maxmemory will begin evicting keys based on the configured policy, and the evicted_keys counter reported by redis-cli INFO stats will start climbing. If evictions are happening, your hit ratio will degrade even if your TTL and key strategy are correct, because you are losing entries to memory pressure before they expire naturally.
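
The arithmetic behind that warning is worth seeing once. The sketch below is a hypothetical helper: HitRatioCheck and the sample counts are assumptions, and in practice the numbers would come from cache.gets filtered by the result and cache tags. It computes the aggregate ratio and the list of namespaces under a threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; each map value is {hits, misses} for one cache namespace.
class HitRatioCheck {

    static double ratio(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 1.0 : (double) hits / total; // no lookups yet: treat as healthy
    }

    // Ratio across all namespaces combined, which is what a global dashboard shows
    static double aggregate(Map<String, long[]> counts) {
        long hits = 0;
        long misses = 0;
        for (long[] c : counts.values()) {
            hits += c[0];
            misses += c[1];
        }
        return ratio(hits, misses);
    }

    // Namespaces whose own ratio is below the alert threshold
    static List<String> belowThreshold(Map<String, long[]> counts, double threshold) {
        List<String> unhealthy = new ArrayList<>();
        for (Map.Entry<String, long[]> e : counts.entrySet()) {
            if (ratio(e.getValue()[0], e.getValue()[1]) < threshold) {
                unhealthy.add(e.getKey());
            }
        }
        return unhealthy;
    }
}
```

With sample counts of 400 hits and 600 misses for products, and 4,900 hits and 100 misses each for categories and userSessions, the aggregate ratio is 10,200 / 11,000, about 92.7%, yet belowThreshold with 0.85 still flags products at 40%.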
Production Insight
A deployment on a Friday changed the products cache key format from products::42 to products::product:42 to support multi-entity namespacing in a future refactor.
The old keys already in Redis used the old format and were never matched by the new code — hit ratio dropped from 93% to 0% for the products namespace instantly.
Without per-namespace hit ratio alerting, this was discovered 4 hours later via a database CPU alarm — with a proper alert at 85%, it fires within 5 minutes of deployment.
Key Takeaway
Cache hit ratio is the primary health signal for caching — monitor it per namespace, not just in aggregate, and alert on it before it reaches your database team's inbox as a CPU spike.
Actuator exposes cache.gets with result and cache tags — calculate per-namespace hit ratio as hits / (hits + misses) and export to Prometheus for time-series alerting.
A sudden hit ratio drop immediately after a deployment almost always means a key format change that left orphaned keys in Redis that no longer match new requests.

Graceful Degradation: When Redis Goes Down

Here is an uncomfortable truth that most caching tutorials skip: Redis will go down. Not might — will. A network partition, a memory exhaustion event, a cloud provider maintenance window, a misconfigured deployment that sends the wrong credentials. The question is not whether Redis will be unavailable at some point, but whether your application handles it gracefully or returns a page of 500 errors.

If every Redis connection failure translates directly into an unhandled exception that propagates to your controllers, Redis is not a cache — it is a single point of failure. Your application has an undeclared hard dependency on a piece of infrastructure that you are presenting to users as optional performance optimization.

The correct architecture: when Redis is unreachable, fall back to the database directly. The application becomes slower — every request pays the full database cost — but it remains functional. Users experience degraded performance rather than a broken application. This is a measurably better user outcome.

I was on the team for a Black Friday incident where Redis hit its configured maxmemory limit at 11:47 AM and started rejecting new connections. We had a connection pool of 16 — all 16 slots were taken by threads trying to write to a Redis that was rejecting them. New requests queued behind those threads. Within 90 seconds, the checkout flow was returning 503s under load balancer timeout. We had implemented fallback logic in the payment service but not in the product cache service — the product cache service was considered lower risk. It turned out to be the bottleneck that took down checkout. After that incident, every service that touched Redis got explicit fallback logic regardless of perceived risk.

io/thecodeforge/cache/service/ForgeResilientProductService.javaJAVA
package io.thecodeforge.cache.service;

import io.thecodeforge.cache.model.Product;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.stereotype.Service;

/**
 * Demonstrates explicit cache interaction with graceful fallback.
 *
 * This pattern is used when you need finer control than @Cacheable provides —
 * for example, when you want to handle Redis failures differently per method,
 * or when you want to log cache miss reasons at different severity levels.
 *
 * For simpler cases, register a CacheErrorHandler via CachingConfigurer#errorHandler();
 * Spring's cache interceptor calls it on cache exceptions, so @Cacheable can stay in place.
 */
@Service
public class ForgeResilientProductService {

    private static final Logger log = LoggerFactory.getLogger(ForgeResilientProductService.class);
    private static final String PRODUCTS_CACHE = "products";

    private final CacheManager cacheManager;
    private final ForgeProductRepository productRepository;

    public ForgeResilientProductService(
        CacheManager cacheManager,
        ForgeProductRepository productRepository
    ) {
        this.cacheManager = cacheManager;
        this.productRepository = productRepository;
    }

    public Product getProductWithFallback(Long id) {
        Cache cache = null;
        try {
            cache = cacheManager.getCache(PRODUCTS_CACHE);
            if (cache != null) {
                Cache.ValueWrapper wrapper = cache.get(id);
                Object cached = (wrapper != null) ? wrapper.get() : null;
                if (cached != null) {
                    return (Product) cached;
                }
            }
        } catch (Exception readEx) {
            // Redis is unreachable: skip the cache for this request, including the write below
            log.warn("Redis unavailable, falling back to direct DB access for product id={}",
                id, readEx);
            cache = null;
        }

        // Cache miss or cache unavailable: fetch from the database.
        // This call sits outside the try block so a database failure is never
        // logged as a Redis failure or silently retried.
        Product product = productRepository.findById(id).orElse(null);

        // Only cache non-null results; do not cache absence
        if (cache != null && product != null) {
            try {
                cache.put(id, product);
            } catch (Exception writeEx) {
                // Redis write failure should not fail the request.
                // The data was fetched successfully, so return it even without caching.
                log.warn("Redis write failed for products::{} — serving DB result without caching",
                    id, writeEx);
            }
        }

        return product;
    }
}
Output
// Normal operation — Redis available:
// GET /api/product/1 -> Cache miss: 42ms (DB fetch + Redis write)
// GET /api/product/1 -> Cache hit: 3ms (Redis read, method short-circuited)
//
// Redis unreachable — graceful fallback:
// GET /api/product/1 ->
// WARN: Redis unavailable, falling back to direct DB access for product id=1
// -> 42ms (DB fetch, no cache write attempted)
// -> 200 OK with correct product data (slow but not broken)
//
// Redis write fails but read works (partial degradation):
// GET /api/product/1 ->
// WARN: Redis write failed for products::1 — serving DB result without caching
// -> 42ms (DB result returned, not cached this time)
//
// Redis recovers — normal operation resumes automatically:
// GET /api/product/1 -> Cache hit: 3ms (no restart needed, first successful write restored the entry)
Cache Is an Optimization, Not a Hard Dependency
  • Always size your database to handle 100% of read traffic with zero cache assistance — this is not pessimistic, it is the only safe design
  • Graceful degradation means your users experience increased latency, not a 500 error page — that is a categorically different user impact
  • Resilience4j circuit breakers can automate the fallback: after N consecutive Redis failures, stop trying Redis entirely and route all calls to the database until a health check probe succeeds
  • When Redis recovers after an outage, the circuit breaker allows a small number of probe requests through before fully restoring cache routing — prevents thundering herd on recovery
  • Log Redis failures at WARN level, not ERROR — they are operational events, not application bugs, and you do not want them triggering high-severity PagerDuty alerts at 3 AM
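
The circuit breaker idea in the list above can be sketched in plain Java. This is a minimal, single-threaded illustration and not Resilience4j's API: CacheCircuitBreaker, its constructor parameters, and the injected clock are all hypothetical names. It only shows the state transitions (closed, open, probe after cooldown).

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal, single-threaded illustration of the circuit breaker idea; not Resilience4j's API.
class CacheCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    CacheCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    boolean isOpen() {
        return openedAt >= 0 && clock.getAsLong() - openedAt < openMillis;
    }

    // Try the cache unless the circuit is open; on a miss (null) or failure, use the database
    <T> T get(Supplier<T> cacheRead, Supplier<T> dbRead) {
        if (!isOpen()) {
            try {
                T cached = cacheRead.get(); // after the cooldown this call acts as the probe
                consecutiveFailures = 0;
                openedAt = -1;              // cache answered: close the circuit
                if (cached != null) {
                    return cached;
                }
            } catch (RuntimeException cacheFailure) {
                consecutiveFailures++;
                if (consecutiveFailures >= failureThreshold) {
                    openedAt = clock.getAsLong(); // open (or re-open) the circuit
                }
            }
        }
        return dbRead.get();
    }
}
```

While the circuit is open, cache reads are skipped entirely, so no request thread waits on a Redis timeout. A production version would need thread safety and metrics, which is why a library such as Resilience4j is the usual choice.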
Production Insight
On Black Friday, a product cache service hit Redis maxmemory at 11:47 AM. Redis started rejecting connection requests. The product cache service had no graceful degradation — every Redis rejection became an unhandled exception that propagated as a 503. The payment flow depended on the product service to validate items in the cart before processing payment. With product lookups failing, checkout broke. The incident lasted 22 minutes. The post-mortem had one primary action item: implement Redis fallback in every service, regardless of perceived risk. The product cache service was considered non-critical. It was not.
Key Takeaway
Redis will become unavailable at some point — the only question is whether your application degrades gracefully or fails loudly.
Implement explicit fallback in every service that uses Redis, whether through try-catch around direct cache API calls or a CacheErrorHandler registered through CachingConfigurer.
Size your database to handle 100% of traffic without cache assistance — if it cannot, Redis is a hard dependency, not a cache, and must be treated with the same SLA obligations as your database.
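
For services that keep @Cacheable, the CacheErrorHandler route might look like the fragment below. It is a sketch under stated assumptions: Spring Framework 6 or Spring Boot 3 (where CachingConfigurer has default methods), with ForgeCacheErrorConfig as a hypothetical class name. Logging and swallowing each exception is one possible policy, not the only reasonable one.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.Cache;
import org.springframework.cache.annotation.CachingConfigurer;
import org.springframework.cache.interceptor.CacheErrorHandler;
import org.springframework.context.annotation.Configuration;

// Hypothetical configuration class; assumes Spring Framework 6 / Spring Boot 3.
@Configuration
public class ForgeCacheErrorConfig implements CachingConfigurer {

    private static final Logger log = LoggerFactory.getLogger(ForgeCacheErrorConfig.class);

    @Override
    public CacheErrorHandler errorHandler() {
        return new CacheErrorHandler() {
            @Override
            public void handleCacheGetError(RuntimeException ex, Cache cache, Object key) {
                // Not rethrowing makes the interceptor treat the lookup as a miss,
                // so the annotated method runs and the database serves the request
                log.warn("Cache get failed for {}::{}", cache.getName(), key, ex);
            }

            @Override
            public void handleCachePutError(RuntimeException ex, Cache cache, Object key, Object value) {
                log.warn("Cache put failed for {}::{}", cache.getName(), key, ex);
            }

            @Override
            public void handleCacheEvictError(RuntimeException ex, Cache cache, Object key) {
                // A failed evict can leave stale data in Redis until its TTL expires
                log.warn("Cache evict failed for {}::{}", cache.getName(), key, ex);
            }

            @Override
            public void handleCacheClearError(RuntimeException ex, Cache cache) {
                log.warn("Cache clear failed for {}", cache.getName(), ex);
            }
        };
    }
}
```

Spring accepts only one CachingConfigurer per application context, so if one already exists, add the errorHandler() override to it. Note that the handler runs only after the Redis call has failed, so connection and command timeouts still bound how long each request waits before falling back.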

Docker Setup for Local Development

Testing caching locally requires a Redis instance that behaves like production. The most common local development mistake is running Redis with no memory limit — which means it will never evict keys, never experience memory pressure, and will never reproduce the class of bugs that only appear when Redis starts making eviction decisions under load.

The docker-compose configuration below mirrors production behavior by setting maxmemory to 256MB and using the allkeys-lru eviction policy. Under this configuration, your local Redis behaves the same way as a production Redis under memory pressure. Keys that have not been accessed recently get evicted when memory fills up. If your application has a bug where it never re-fetches an evicted key correctly, this local configuration surfaces it before you ship.

allkeys-lru means: when Redis needs to free memory, evict the least recently accessed key regardless of whether it has a TTL. Other options are volatile-lru (only evict keys that have a TTL set, leave no-TTL keys alone), allkeys-lfu (evict least frequently used rather than least recently used), and noeviction (reject write commands when full, which causes Redis write failures). For caching, allkeys-lru is almost always the right choice because you want the cache to self-manage under pressure and retain the most actively accessed data automatically.
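
To make the "least recently accessed" rule concrete, here is a small exact-LRU map built on the JDK's LinkedHashMap. It is only an illustration of the eviction order: LruSketch is a hypothetical class, Redis uses an approximated LRU based on sampling, and Redis evicts by memory usage rather than by entry count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative exact-LRU map; Redis itself uses an approximated (sampled) LRU.
class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() marks an entry as most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // over capacity: drop the least recently accessed entry
    }
}
```

With a capacity of two, inserting products::1 and products::2, reading products::1, and then inserting products::3 evicts products::2, because it is the entry that was touched least recently.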

docker-compose.ymlYAML
services:
  redis:
    image: redis:7.2-alpine  # Pin to a specific minor version — alpine for smaller image footprint
    ports:
      - "6379:6379"
    command: >
      redis-server
      --maxmemory 256mb
      --maxmemory-policy allkeys-lru
      --appendonly yes
      --appendfsync everysec
    volumes:
      - redis-data:/data  # Persist data across docker-compose restarts
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 5s

  redis-commander:
    # Web UI for browsing cached keys and inspecting JSON values during development
    # Remove from production — use RedisInsight or Grafana dashboard instead
    image: rediscommander/redis-commander:latest
    ports:
      - "8081:8081"
    environment:
      - REDIS_HOSTS=local:redis:6379
    depends_on:
      redis:
        condition: service_healthy  # Wait for Redis health check to pass before starting

volumes:
  redis-data:
Output
// Start the stack:
// docker-compose up -d
//
// Verify Redis is up and responding:
// docker-compose exec redis redis-cli ping
// -> PONG
//
// Check memory configuration matches what you set:
// docker-compose exec redis redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human'
// -> used_memory_human:2.34M
// -> maxmemory_human:256.00M
//
// Verify eviction policy:
// docker-compose exec redis redis-cli CONFIG GET maxmemory-policy
// -> 1) "maxmemory-policy"
// -> 2) "allkeys-lru"
//
// Browse cached keys via Redis Commander:
// http://localhost:8081
//
// Verify application health including Redis connectivity:
// curl http://localhost:8080/actuator/health
// -> {"status":"UP","components":{"redis":{"status":"UP","details":{"version":"7.2.x"}}}}
Redis Commander for Development, RedisInsight for Production Investigation
Redis Commander provides a lightweight web UI for browsing cached keys, inspecting JSON values, and manually flushing caches during development. It is invaluable when debugging serialization issues — you can see exactly what is stored under a key without constructing a redis-cli command. For production investigation, Redis Commander is not appropriate — it has no authentication by default and exposes full read/write access to your cache. Use RedisInsight (the official Redis desktop application) or build a Grafana dashboard from Prometheus metrics for production observability. The Actuator endpoints provide all the runtime data you need without requiring direct Redis access in production.
Production Insight
A team ran local Redis without maxmemory configured. Their cache tests passed consistently.
In production with a 2GB limit, allkeys-lru evictions during peak traffic exposed a broken fallback path.
The fallback bug was never triggered locally because unlimited Redis never evicted anything — fix: add low-memory integration tests.
Key Takeaway
Configure maxmemory and maxmemory-policy locally to mirror production — unlimited Redis hides eviction-related bugs that only surface under load.
allkeys-lru is the correct eviction policy for caching — it retains recently accessed data and self-manages under memory pressure.
Redis Commander is a development tool for inspecting cached values — replace it with Actuator endpoints and Grafana for production observability.

Testing Cached Methods: Verify Before You Ship

Caching bugs have a property that makes them particularly expensive: they are usually invisible in development and only surface under production conditions. A cache hit ratio problem requires production-scale traffic to manifest. A TTL misconfiguration takes the full TTL duration to produce stale data. A null-caching bug requires a specific sequence of events — data absent, then present — that is hard to replicate in a unit test.

Despite this, a small set of integration tests catches the majority of caching bugs before they reach production. The three categories you need: cache hit verification (the second call is faster and comes from Redis), eviction verification (cache is empty after the appropriate update or delete operation), and null protection verification (null results are not stored in cache). These three test types cover the happy path, the write path, and the edge case that has bitten the most teams I have worked with.

Write these tests against a real Redis instance, not a mock. Embedded Redis libraries for tests exist, but a real Redis instance in Docker reveals serialization bugs, TTL configuration bugs, and connection pool behavior that mocks hide. Use Testcontainers in your CI pipeline to spin up a Redis container for integration tests: it adds a few seconds to test startup and is worth every one of them.

In one of our CI pipelines, we had a cache hit test that verified the second invocation was at least 10x faster than the first. After a refactor changed the cache key SpEL expression from #id to #product.id, the test failed because the key format changed and the second call was no longer a cache hit. The test caught it. The alternative was a 30% database load increase in production that would have taken hours to trace back to a cache key change.

io/thecodeforge/cache/test/ForgeProductServiceCacheTest.javaJAVA
package io.thecodeforge.cache.test;

import io.thecodeforge.cache.model.Product;
import io.thecodeforge.cache.service.ForgeProductService;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;

import static org.assertj.core.api.Assertions.assertThat;

/**
 * Integration tests for caching behavior.
 *
 * These tests run against a real Redis instance (local Docker or Testcontainers in CI).
 * Mocked cache managers do not catch serialization bugs, TTL bugs, or key format bugs.
 *
 * Test categories:
 *   1. Cache hit — second call returns cached value, method body not re-executed
 *   2. Eviction — update/delete operation correctly removes cache entry
 *   3. Null protection — null method results are not stored in Redis
 */
@SpringBootTest
class ForgeProductServiceCacheTest {

    @Autowired
    private ForgeProductService productService;

    @Autowired
    private CacheManager cacheManager;

    @BeforeEach
    void clearAllCaches() {
        // Isolation: start each test with an empty cache
        // Prevents one test's cache state from affecting another
        cacheManager.getCacheNames().forEach(name -> {
            Cache cache = cacheManager.getCache(name);
            if (cache != null) {
                cache.clear();
            }
        });
    }

    @Test
    @DisplayName("Second call should be served from cache — method body should not execute again")
    void shouldCacheProductAfterFirstCall() {
        // First call — cold cache, method body executes, result stored in Redis
        long start1 = System.currentTimeMillis();
        Product first = productService.getProductById(1L);
        long duration1 = System.currentTimeMillis() - start1;

        // Second call — warm cache, method body skipped, result from Redis
        long start2 = System.currentTimeMillis();
        Product second = productService.getProductById(1L);
        long duration2 = System.currentTimeMillis() - start2;

        assertThat(first).isNotNull();
        assertThat(second.getId()).isEqualTo(first.getId());

        // Cache hit should be at least 10x faster than DB call
        // Adjust threshold based on your simulated DB latency
        assertThat(duration2)
            .as("Cache hit should be significantly faster than DB call (first: %dms, second: %dms)",
                duration1, duration2)
            .isLessThan(duration1 / 10);

        // Verify the entry actually exists in Redis under the expected key
        Cache productsCache = cacheManager.getCache("products");
        assertThat(productsCache).isNotNull();
        Cache.ValueWrapper cached = productsCache.get(1L);
        assertThat(cached).isNotNull();
        assertThat(cached.get()).isInstanceOf(Product.class);
    }

    @Test
    @DisplayName("Cache entry should be absent after update triggers @CacheEvict")
    void shouldEvictCacheOnUpdate() {
        // Warm the cache
        Product product = productService.getProductById(1L);
        assertThat(cacheManager.getCache("products").get(1L)).isNotNull();

        // Trigger eviction
        productService.updateProduct(product);

        // Verify the entry is gone
        assertThat(cacheManager.getCache("products").get(1L)).isNull();
    }

    @Test
    @DisplayName("Null return values should not be stored in the cache")
    void shouldNotCacheNullResult() {
        // ID 999 does not exist — method returns null
        Product result = productService.getProductById(999L);

        assertThat(result).isNull();

        // Verify the cache entry does not exist — null should not be cached
        Cache.ValueWrapper cached = cacheManager.getCache("products").get(999L);
        assertThat(cached)
            .as("Null result should not be stored in cache — unless = '#result == null' should prevent it")
            .isNull();
    }

    @Test
    @DisplayName("@CachePut should update cache without requiring a subsequent cache miss")
    void shouldUpdateCacheWithCachePut() {
        // Initial fetch — cache miss
        productService.getProductById(1L);
        assertThat(cacheManager.getCache("products").get(1L)).isNotNull();

        // Update with @CachePut — cache should be updated, not evicted
        Product updated = new Product(1L, "Forge Updated Drill", 199.99);
        Product returned = productService.updateAndRefreshProduct(updated);

        // Cache entry should still exist — not evicted, updated
        Cache.ValueWrapper cached = cacheManager.getCache("products").get(1L);
        assertThat(cached).isNotNull();
        assertThat(((Product) cached.get()).getName()).isEqualTo("Forge Updated Drill");

        // Verify no extra DB call needed — next read is a cache hit
        long start = System.currentTimeMillis();
        Product afterUpdate = productService.getProductById(1L);
        long duration = System.currentTimeMillis() - start;
        assertThat(duration).isLessThan(50); // Should be a cache hit — sub-50ms
    }
}
Output
// Running ForgeProductServiceCacheTest...
//
// PASS: shouldCacheProductAfterFirstCall
// first call (DB): 2,014ms
// second call (cache): 4ms
// ratio: 503x speedup — cache hit confirmed
// cache entry exists in Redis under key products::1
//
// PASS: shouldEvictCacheOnUpdate
// cache entry found after initial fetch
// cache entry null after updateProduct() — @CacheEvict confirmed
//
// PASS: shouldNotCacheNullResult
// getProductById(999L) returned null
// cache.get(999L) is null — unless = '#result == null' working correctly
//
// PASS: shouldUpdateCacheWithCachePut
// cache updated to 'Forge Updated Drill' without eviction
// subsequent read returned in 4ms — confirmed cache hit after @CachePut
//
// Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
Three Tests That Catch 90% of Caching Bugs Before They Reach Production
If you write nothing else, write these three tests for every cached service: (1) cache-hit timing test — first call is slow, second call is at least 10x faster, and the cache entry exists in Redis after the first call. (2) Eviction test — cache entry is present before the update and null after the update or delete operation. (3) Null-caching test — when the method returns null, no entry is written to Redis. These three tests catch: wrong cache name in annotation, key format that does not match on second call, missing or broken @CacheEvict, and missing unless = "#result == null".
Production Insight
A refactor changed the @Cacheable key SpEL expression from #id (the Long parameter directly) to #product.id (accessing a field on an object parameter). The cache name and the underlying data were unchanged. The test that verified the second call was a cache hit failed immediately because the new key format products::product:1 did not match any existing cache entries. The developer caught it in CI within 30 seconds of the test run completing. Without that test, the change would have shipped, the cache would have been effectively disabled for every product lookup (every call would generate a new key and be a miss), and the database CPU alert would have fired several hours later when traffic peaked.
Key Takeaway
Write integration tests against a real Redis instance — mocks cannot catch serialization bugs, key format bugs, or TTL misconfiguration.
The three test categories to cover: cache hit speed verification, eviction correctness, and null result protection. These catch the vast majority of caching bugs at the annotation and configuration level.
A cache test that fails in CI for 30 seconds is worth more than a database CPU alert that fires hours after the problematic deployment ships.
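
Timing thresholds like "10x faster" can be flaky on a loaded CI runner, so it can help to pair them with a check that counts how often the underlying loader actually ran. The sketch below is plain Java with no Spring or Redis involved: `CountingReadThroughCache` is a hypothetical stand-in for a @Cacheable method, used only to show the three assertions (hit, eviction, null protection) in deterministic form. In a real Spring test the same idea would typically be expressed with a Mockito spy on the repository and a verification of the call count.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a @Cacheable method: a read-through cache in
// front of a loader, counting how many times the loader really executes.
final class CountingReadThroughCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    CountingReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V cached = store.get(key);
        if (cached != null) {
            return cached; // hit: the loader is skipped entirely
        }
        loads.incrementAndGet();
        V loaded = loader.apply(key);
        if (loaded != null) {
            store.put(key, loaded); // mirrors unless = "#result == null"
        }
        return loaded;
    }

    // Mirrors @CacheEvict for a single key
    void evict(K key) {
        store.remove(key);
    }

    boolean contains(K key) {
        return store.containsKey(key);
    }

    int loadCount() {
        return loads.get();
    }
}
```

Two reads of the same key should leave `loadCount()` at 1, an eviction followed by a read should raise it to 2, and a key whose loader returns null should never appear in the store. None of those checks depend on wall-clock time.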

Why Redis Beats EhCache When Your App Lives on Two Servers

EhCache is fine for a single JVM. But the moment you scale horizontally, you're hosed. Each instance holds its own copy of the cache. Instance A invalidates a record, instance B cheerfully serves stale data for the next 10 minutes.

Redis gives you a single source of truth for cached data. Every app instance talks to the same Redis cluster. Invalidation is instant and global. No more "works on my machine" bugs that turn into production data corruption.

This isn't academic. I've debugged midnight PagerDuty alerts where a user updated their profile on one pod, then got the old version from another. The fix was migrating from EhCache to Redis. The cost was a few hours of config work. The saved sleep was priceless.

Redis also brings data structures that EhCache can't touch. Need to cache a sorted leaderboard? Redis sorted sets handle that natively. Need to expire stale sessions? TTL per key, not per cache region.

The rule: if your app runs on one server and never will, use EhCache. Otherwise, stop pretending and add Redis.

RedisVsEhCache.javaJAVA
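// io.thecodeforge - java tutorial

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.ehcache.EhCacheCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;

import java.time.Duration;

// Option A: EhCache config, tied to JVM heap, no cross-instance sharing.
// EhCacheCacheManager is the Spring Framework 5.x adapter for Ehcache 2.x.
@Configuration
@EnableCaching
public class EhCacheConfig {
    @Bean
    public CacheManager cacheManager() {
        return new EhCacheCacheManager();
    }
}

// Option B: Redis config, single cluster, every instance reads the same data.
// Register one of the two configurations, not both.
@Configuration
@EnableCaching
public class RedisCacheConfig {
    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory redis) {
        return RedisCacheManager.builder(redis)
                // Library defaults for any cache without its own configuration
                .cacheDefaults(RedisCacheConfiguration.defaultCacheConfig())
                // Per-cache override: products expire after 15 minutes
                .withCacheConfiguration("products",
                    RedisCacheConfiguration.defaultCacheConfig()
                        .entryTtl(Duration.ofMinutes(15)))
                .build();
    }
}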
Output
# With EhCache: Each pod has its own cache. Output varies per instance.
# With Redis: All pods return the same cached value.
Production Trap:
EhCache with multiple instances is a ticking bomb. If cache hit and miss rates differ noticeably between pods, each pod is holding its own copy of the data, and some of those copies are probably stale. Add Redis before the incident postmortem forces you to.
Key Takeaway
Distributed caches (Redis) guarantee cache consistency across instances. Local caches (EhCache) do not. Pick based on deployment topology, not convenience.
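
To make the stale-read scenario concrete, here is a minimal simulation in plain Java. It is a sketch under stated assumptions, not a model of EhCache or Redis internals: `Pod` and `StaleReadDemo` are hypothetical names, and ordinary maps stand in for the database and the caches. The only thing it varies is whether the two pods share one cache map or each hold their own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one application instance: it reads through a cache
// map to a database map, and evicts from its own cache on update.
final class Pod {
    private final Map<String, String> cache;
    private final Map<String, String> database;

    Pod(Map<String, String> cache, Map<String, String> database) {
        this.cache = cache;
        this.database = database;
    }

    String read(String key) {
        // Cache miss falls through to the database and stores the result
        return cache.computeIfAbsent(key, database::get);
    }

    void update(String key, String value) {
        database.put(key, value);
        cache.remove(key); // evicts only from the cache this pod can see
    }
}

final class StaleReadDemo {
    private StaleReadDemo() {}

    // Returns what pod A and pod B read after pod A performs an update.
    static String[] run(boolean sharedCache) {
        Map<String, String> database = new HashMap<>();
        database.put("user:1", "old-name");

        Map<String, String> cacheA = new HashMap<>();
        // Shared: both pods use the same map (Redis-like).
        // Local: pod B gets its own map (one in-process cache per JVM).
        Map<String, String> cacheB = sharedCache ? cacheA : new HashMap<>();

        Pod podA = new Pod(cacheA, database);
        Pod podB = new Pod(cacheB, database);

        podA.read("user:1"); // warm pod A's cache
        podB.read("user:1"); // warm pod B's cache
        podA.update("user:1", "new-name");

        return new String[] {podA.read("user:1"), podB.read("user:1")};
    }
}
```

With `run(false)` pod B keeps returning the old value because pod A's eviction never reaches pod B's map. With `run(true)` both pods see the new value, which is the behavior a shared Redis cache provides.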

Spring's Cache Abstraction Was Built to Let You Swap Providers in a Day

Your service layer shouldn't care whether it's backed by Redis, EhCache, Hazelcast, or a hashmap in a shoebox. Spring's cache abstraction enforces that separation. You annotate your methods, then configure the provider in a single config class.

Here's the trap most devs fall into: they couple their code to Redis-specific APIs. They inject RedisTemplate everywhere. They write manual cache get/put calls. That's not caching, that's writing a NoSQL client with extra steps.

Stick to @Cacheable, @CachePut, @CacheEvict. These annotations abstract away the storage backend. When your CTO decides to migrate from Redis to Hazelcast because "the CEO read a blog post", you change one bean and redeploy. The rest of the codebase doesn't flinch.

This abstraction is battle-tested. It's not theoretical. I've swapped Redis for Hazelcast during a migration that took 90 minutes. The annotations never changed. The bulk of the work was testing TTL and serialization differences.

Your job is to write business logic that works regardless of the caching vendor. Let Spring handle the plumbing.

SwapCacheProvider.javaJAVA
// io.thecodeforge — java tutorial

// Service layer — zero Redis imports, just Spring annotations
@Service
public class InventoryService {
    private final ProductRepository repo;

    public InventoryService(ProductRepository repo) {
        this.repo = repo;
    }

    @Cacheable(value = "products", key = "#sku")
    public Product findBySku(String sku) {
        return repo.findBySku(sku);
    }
}

// Config that you change when the provider changes
@Configuration
@EnableCaching
public class CacheConfig {
    // Today: Redis
    // Tomorrow: Hazelcast — just swap this bean
    @Bean
    public CacheManager cacheManager(RedisConnectionFactory rcf) {
        return RedisCacheManager.builder(rcf).build();
    }
}
Output
# Before swap: Product returned from Redis cluster
# After swap: Same annotation, now from Hazelcast. Service layer unchanged.
Senior Shortcut:
Write repository or service methods that use @Cacheable with SpEL keys. Never inject RedisTemplate into business logic. You'll thank yourself when the caching provider changes.
Key Takeaway
Spring's cache abstraction decouples caching from storage. Annotate methods, configure provider once, swap vendors without touching business logic.

Overview: Why You Need a Shared Cache for Multi-Instance Apps

If your Spring Boot app runs on a single instance, an in-memory cache like EhCache or Caffeine works fine. The moment you scale to two servers, your cache splits into two isolated islands. Server A caches a result, Server B misses it and recomputes it. Your database still takes the hit.

Redis solves this by sitting outside your application. Every instance talks to the same cache store. A cached method result on Server A is immediately available to Server B. No duplicate work, no stale data from mismatched local caches.

The real win is consistency. With a shared cache, your application behaves identically regardless of which instance serves the request. That matters when you have rolling deployments, blue-green environments, or autoscaling groups coming up and down. Redis makes your caching strategy deployment-agnostic.

Do not mistake this for complexity. Spring's Cache Abstraction hides the distributed nature behind the same @Cacheable annotation you already know. Your code does not care whether the cache lives in local memory or on a remote server. The configuration changes, not the logic.

CacheOverviewExample.javaJAVA
// io.thecodeforge - java tutorial

// Single-instance: EhCache works fine
@Cacheable("products")
public Product findProduct(String id) {
    return repository.findById(id);
}

// Multi-instance: Switch provider, same annotation
@Configuration
@EnableCaching
public class CacheConfig {
    @Bean
    public CacheManager cacheManager(RedisConnectionFactory factory) {
        return RedisCacheManager.builder(factory).build();
    }
}
Output
No output — configuration class
Senior Shortcut:
Start all new projects with Redis caching from day one. Even if you only run one instance today, the annotation-driven abstraction means zero code change when you scale tomorrow. The cost is a single dependency and a few lines of config.
Key Takeaway
Distributed caching with Redis makes your application behavior instance-independent. Your code does not change — only your infrastructure.

Conclusion: Cache Abstraction Separates What Stays From What Changes

Production caching is not a feature. It is a survival strategy. Redis gives you the distributed backbone; Spring's cache abstraction gives you the independence to swap it out when a better option appears.

The patterns here are not optional suggestions. @CacheEvict on writes prevents stale reads. Custom key generators avoid cache collisions from overloaded method signatures. Proper serialization stops your server from burning CPU cycles converting objects at every request. And monitoring your cache hit ratio is the only way to know whether your caching strategy actually works or just adds complexity.

The real takeaway: Redis handles the distributed pain. Spring's abstraction handles the vendor lock-in fear. Together, they let you cache without compromise.

Stop overcomplicating it. Add Redis, annotate your expensive methods, and measure the results. Your database will thank you.

CacheSummaryExample.javaJAVA
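// io.thecodeforge - java tutorial

// Your production caching pattern in a few lines
@Service
public class OrderService {
    // OrderRepository is assumed to be a Spring Data repository keyed by String
    private final OrderRepository orderRepository;

    public OrderService(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }

    // Read path: skip the repository on a hit, never store a null result
    @Cacheable(value = "orders", key = "#orderId", unless = "#result == null")
    public Order getOrder(String orderId) {
        return orderRepository.findById(orderId).orElse(null);
    }

    // Write path: drop the cached entry so the next read reloads it
    @CacheEvict(value = "orders", key = "#order.orderId")
    public void updateOrder(Order order) {
        orderRepository.save(order);
    }
}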
Output
No output — service class
Production Trap:
Never cache null results unless you explicitly handle them. One null pointer later, you will be debugging a cache that silently propagates errors across every instance. Always use 'unless = "#result == null"' or a NullObject pattern.
Key Takeaway
Cache abstraction separates infrastructure concerns from business logic. Redis handles distribution; annotations handle correctness. Use both, and your system scales without rewrites.
Production incident · Post-mortem · Severity: high

The 40x Latency Spike — Missing @Cacheable on a Hot Path

Symptom
During a product launch that had been planned for weeks, the product detail page degraded from 50ms average response time to over 2,000ms within two minutes of the campaign going live. Database CPU pinned at 100%. The HikariCP connection pool exhausted within 30 seconds and threads began queuing. Users saw loading spinners turn into timeouts and 503 errors from the load balancer. The incident page lit up.
Assumption
The on-call engineer opened the RDS console first, which is the instinct when database metrics are spiking. The working theory for the first two hours was that the database was under-provisioned for launch traffic and needed a vertical scale. A second engineer was looking at slow query logs and adding composite indexes to tables that were not actually the problem. Nobody was looking at the application layer.
Root cause
The getProductById method had existed for months without a @Cacheable annotation. During normal traffic levels, the database handled the load without complaint — the absence of caching was invisible. Under launch traffic at 10x volume, the same product detail was being fetched from the database approximately 50,000 times per minute instead of once per cache TTL window. The database was not slow or under-provisioned. It was doing 50,000 times more work than it needed to, all of it redundant, all of it returning identical data. The entire incident was caused by a single missing annotation.
Fix
Added @Cacheable(value = "products", key = "#id", unless = "#result == null") to the getProductById method. Configured a 30-minute TTL for the products cache namespace via RedisCacheManager. Database CPU dropped from 100% to under 5% within 90 seconds of deployment. Response time returned to 3ms from Redis for subsequent requests. Total time from incident open to resolution: 2 hours and 40 minutes, of which the actual fix took under 10 minutes once the root cause was identified.
Key lesson
  • Every read-heavy endpoint that returns deterministic data for a given input should be evaluated for @Cacheable — the question is not whether to cache but whether you have a reason not to
  • A single missing annotation on a high-traffic path can cause a 40x latency spike under launch load — this kind of failure is invisible at normal traffic levels and only surfaces under pressure
  • Profile database query frequency before every major traffic event — if the same parameterized query executes more than 1,000 times per minute, it is a candidate for caching regardless of its individual execution time
  • Cache hit ratio monitoring would have surfaced this before launch — a brand new cache with a 0% hit ratio on a read-heavy endpoint is a signal worth investigating
  • The instinct to scale the database is usually wrong when the problem is application-layer repetition — always rule out caching gaps before ordering infrastructure
Production debug guide
When Redis caching behaves unexpectedly, here is how to go from an observable symptom to a verified resolution. Start at the symptom, follow the action, and do not skip steps.
Symptom · 01
Cache hit ratio dropped suddenly — from 95% to 50% or lower after a deployment
Fix
The most common cause is a cache key format change in the new deployment. Old keys in Redis still exist but no longer match what the new code generates — every request is a miss even though Redis is full of data. Flush the affected cache namespace: redis-cli --scan --pattern 'products::*' | xargs redis-cli DEL. Redeploy. Verify hit ratio recovers within one TTL window. If it does not recover, compare the key format before and after the deployment using redis-cli --scan to see what keys look like in the live instance.
Symptom · 02
Cached method always hits the database — cache appears to do nothing, no error is thrown
Fix
This is almost always the internal call gotcha. Check whether the @Cacheable method is being invoked from within the same class using this.method() or a direct method call without going through a Spring-injected reference. Spring AOP proxies cannot intercept calls that bypass the proxy. Add a log line inside the method to confirm it is executing on every call, then check the call site. Extract the cached method into a separate Spring bean, inject it as a dependency, and call through the injected reference.
Symptom · 03
Stale data served from cache after a database update — users see old values
Fix
Check whether the update method has @CacheEvict or @CachePut. Then check whether related caches are also being evicted — a product update that clears the product detail cache but not the product list cache leaves users seeing different data depending on which page they visit. Use @Caching to evict all affected cache namespaces from a single method. Map every entity to every cache namespace that holds any representation of it.
Symptom · 04
Application throws RedisConnectionException and returns 500 errors when Redis is unreachable
Fix
This means graceful degradation is not implemented — a cache infrastructure failure is cascading into an application failure. Immediate: check Redis connectivity with redis-cli ping. Check memory: redis-cli INFO memory to see if maxmemory was reached. Medium-term: implement try-catch fallback to the database on any Redis exception. Long-term: add a Resilience4j circuit breaker that stops attempting Redis calls after a failure threshold and resumes when Redis recovers.
Symptom · 05
Null values appearing in cache — subsequent requests return null even for data that exists in the database
Fix
A method returned null once and the cache stored that null value. Add unless = "#result == null" to the @Cacheable annotation to prevent caching null results. Add disableCachingNullValues() to your RedisCacheConfiguration as a safety net. To verify whether null is currently cached, check directly: redis-cli GET 'products::42' and inspect the value. If you see a JSON representation of null, flush that key and add the null protection.
Symptom · 06
Redis memory growing unbounded — keys are not expiring, memory climbs over hours or days
Fix
TTL is not configured, or the RedisCacheManager is not applying it correctly. Check Redis directly: redis-cli TTL 'products::42' — a result of -1 means no TTL is set on that key, which means it will persist indefinitely. Verify your RedisCacheManager bean has .entryTtl() configured. Check eviction policy: redis-cli CONFIG GET maxmemory-policy. If policy is noeviction, switch to allkeys-lru as an immediate safety net while you fix TTL configuration: redis-cli CONFIG SET maxmemory-policy allkeys-lru.
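
The try-catch fallback from Symptom 04 can be sketched in a few lines. This is an illustration in plain Java rather than the article's production code: `CacheFallback` and `readWithFallback` are hypothetical names, and the two suppliers stand in for a Redis read and a database read. For calls that go through @Cacheable, Spring exposes a similar hook through a CacheErrorHandler registered via CachingConfigurer.

```java
import java.util.function.Supplier;

// Hypothetical helper: treat any cache failure as a miss and fall back
// to the source of truth instead of failing the request.
final class CacheFallback {
    private CacheFallback() {}

    static <T> T readWithFallback(Supplier<T> cacheRead, Supplier<T> sourceRead) {
        try {
            T cached = cacheRead.get();
            if (cached != null) {
                return cached; // healthy cache, entry present
            }
        } catch (RuntimeException cacheFailure) {
            // A real service should log and count this so the outage is
            // visible on a dashboard; here the failure is only absorbed.
        }
        // Cache miss or cache outage: read from the database instead
        return sourceRead.get();
    }
}
```

The trade-off is that a Redis outage now shows up as database load rather than as 500 errors, which is why the circuit breaker mentioned above matters: without it, every request still pays the cost of a failed Redis call before falling back.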
Redis Cache Debug Cheat Sheet: Commands That Save Hours
Real commands for debugging Spring Boot Redis caching issues. These are the exact commands I use first when something is wrong with caching behavior in production. Copy them into your team runbook before you need them.
Need to see what is cached and inspect the actual stored values
Immediate action
Use Redis CLI to scan for cached keys by namespace pattern and inspect their raw stored values
Commands
redis-cli --scan --pattern 'products::*' | head -20
redis-cli GET 'products::42'
Fix now
If the values look like binary blobs starting with \xac\xed (Java serialization magic bytes), your application is using Java serialization instead of JSON. Switch to GenericJackson2JsonRedisSerializer in your RedisCacheManager configuration and flush the affected cache — old binary entries will not deserialize correctly with the new serializer.
Need to check current cache hit ratio via Actuator without touching Redis directly
Immediate action
Query the Actuator metrics endpoint for hit and miss counts, then calculate the ratio
Commands
curl -s http://localhost:8080/actuator/metrics/cache.gets?tag=result:hit | jq '.measurements[0].value'
curl -s http://localhost:8080/actuator/metrics/cache.gets?tag=result:miss | jq '.measurements[0].value'
Fix now
Calculate hit ratio as hits divided by the sum of hits and misses. Below 85% is a signal worth investigating. Below 50% means the overhead of going to Redis on every miss is likely worse than not caching at all. Common causes of a low ratio: TTLs too short for the read pattern, key format mismatch after a deployment, or a cache namespace that is being evicted faster than it is being populated.
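
The arithmetic is simple enough to put in a runbook script. The sketch below takes the two values returned by the Actuator queries above and applies the 85% and 50% thresholds from this section. `CacheHitRatio` and the three status labels are hypothetical names chosen for the example.

```java
// Hypothetical helper for turning Actuator hit/miss counters into a ratio
// and a coarse status, using the thresholds quoted in this guide.
final class CacheHitRatio {
    private CacheHitRatio() {}

    // hits / (hits + misses); a cache with no traffic yet reports 0.0
    static double ratio(double hits, double misses) {
        double total = hits + misses;
        return total == 0 ? 0.0 : hits / total;
    }

    static String assess(double ratio) {
        if (ratio < 0.50) {
            return "LIKELY_HARMFUL"; // Redis round-trips may cost more than they save
        }
        if (ratio < 0.85) {
            return "INVESTIGATE";    // check TTLs, key format, eviction pressure
        }
        return "HEALTHY";
    }
}
```

For example, 950 hits and 50 misses give a ratio of 0.95, while 400 hits and 600 misses give 0.40, which falls under the 50% line.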
Need to flush a specific cache namespace without touching other caches
Immediate action
Delete all keys matching the cache namespace pattern in batches to avoid blocking Redis on large keyspaces
Commands
redis-cli --scan --pattern 'products::*' | xargs -L 100 redis-cli DEL
redis-cli --scan --pattern 'products::*' | wc -l
Fix now
After the flush, the second command should return 0. If the application is still serving stale data, check whether it is using a different Redis database number than you expect — by default Spring Boot uses database 0. Run redis-cli -n 1 --scan --pattern 'products::*' to check database 1. Also verify the application is pointing to the same Redis host you are flushing.
Redis memory is full — keys are being silently evicted and cache hit ratio is dropping unpredictably
Immediate action
Check Redis memory status, current eviction count, and configured eviction policy
Commands
redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human|evicted_keys'
redis-cli CONFIG GET maxmemory-policy
Fix now
If evicted_keys is climbing and maxmemory is set, your cache is under memory pressure. Immediate options: increase maxmemory if you have headroom on the host, reduce TTLs on high-volume cache namespaces to turn over keys faster, or switch eviction policy to allkeys-lru if it is currently noeviction — noeviction causes write failures under pressure which is worse than eviction. Run redis-cli CONFIG SET maxmemory-policy allkeys-lru to change the policy without a restart.
Need to verify end-to-end whether a specific method's cache is actually working
Immediate action
Time two consecutive identical requests and compare durations — a working cache should show 10x or greater speedup on the second call
Commands
time curl -s http://localhost:8080/api/product/1 > /dev/null
time curl -s http://localhost:8080/api/product/1 > /dev/null
Fix now
If both calls take the same amount of time, caching is not working. Check four things in order: (1) is the method being called from within the same class via this.method(), (2) does the cache name in the annotation match the name configured in RedisCacheManager, (3) is Redis reachable and responding to redis-cli ping, (4) is there an exception being silently swallowed in your graceful degradation logic that is routing every call to the database.
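
The first item on that checklist, the this.method() call, is easier to believe once you see it happen. The sketch below reproduces it with a JDK dynamic proxy in plain Java. Spring's proxies are built differently, but a self-invocation bypasses them in the same way. All names here (`ProductLookup`, `ProductLookupImpl`, `CachingProxyDemo`) are hypothetical and exist only for this demonstration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical service interface; names are illustrative only.
interface ProductLookup {
    String getProduct(long id);        // pretend this method is @Cacheable
    String getProductViaSelf(long id); // calls getProduct through `this`
}

final class ProductLookupImpl implements ProductLookup {
    int bodyExecutions = 0; // how often the "expensive" body really ran

    @Override
    public String getProduct(long id) {
        bodyExecutions++;
        return "product-" + id;
    }

    @Override
    public String getProductViaSelf(long id) {
        // `this` is the raw target, not the proxy, so no caching happens here
        return this.getProduct(id);
    }
}

final class CachingProxyDemo {
    private CachingProxyDemo() {}

    // Wraps the target in a proxy that caches getProduct results by id.
    static ProductLookup wrap(ProductLookupImpl target) {
        Map<Long, String> cache = new HashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "getProduct":
                    // Only calls that arrive through the proxy reach this branch
                    return cache.computeIfAbsent((Long) args[0], target::getProduct);
                case "getProductViaSelf":
                    return target.getProductViaSelf((Long) args[0]);
                default:
                    // Object methods such as toString and hashCode
                    return method.invoke(target, args);
            }
        };
        return (ProductLookup) Proxy.newProxyInstance(
                ProductLookup.class.getClassLoader(),
                new Class<?>[] {ProductLookup.class},
                handler);
    }
}
```

Calling `getProduct(1L)` twice through the proxy runs the body once. Calling `getProductViaSelf(1L)` twice runs it two more times, because the inner call never passes through the handler. That is the behavior behind the advice to move the cached method into a separate bean.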
Local Caching vs. Distributed Caching
| Feature | Local Caching (Caffeine) | Distributed Caching (Redis) |
| --- | --- | --- |
| Data Location | Application JVM heap: zero network overhead, sub-millisecond access | External Redis server: 2 to 5ms network round-trip per operation |
| Consistency Across Instances | None. Each instance has an independent cache. A write on one instance does not evict from others, so users can see different data depending on which server handles their request. | Full. All instances share the same cache. A write on any instance updates the shared store, and all subsequent reads from any instance see the same value. |
| Persistence | Lost on application restart; the cache starts cold after every deployment | Survives application restarts and deployments because Redis runs outside the JVM. Surviving a restart of Redis itself requires RDB snapshots or appendonly (AOF) persistence. |
| Network Latency | Near-zero: in-process memory access | Low but real: 2 to 5ms per Redis operation on a well-networked cluster |
| Operational Complexity | Very low: embedded in the application, no external infrastructure | Moderate: requires Redis infrastructure, monitoring, backup, and memory management |
| Maximum Cache Size | Bounded by JVM heap; sharing heap with application objects creates GC pressure at large sizes | Bounded by Redis server memory; can be clustered horizontally for larger datasets |
| Serialization Requirement | None: objects stay in the same JVM and are not serialized | Required: objects must be serialized (JSON recommended) for network transfer and storage |
| Best Fit | Single-instance applications, reference data that never changes, read-only configuration; anywhere consistency across nodes is not a requirement | Any multi-instance deployment, session management, shared state, data that must be consistent across all instances immediately after a write |
| Combined L1+L2 Strategy | Caffeine as L1: catches hot keys in-process, sub-millisecond, no network. Reduces Redis call volume by handling the most frequently accessed entries locally. | Redis as L2: provides consistency across all nodes and handles keys that miss the local L1 cache. Together the layers give you both speed and correctness. |
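The combined L1+L2 row can be sketched in plain Java. This is a minimal illustration, not Caffeine or Redis code: `TwoTierCache`, its counters, and the loader function are hypothetical names, and both tiers are stand-in maps so the lookup order is easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical two-tier cache: a per-instance L1 map in front of a shared L2 map. */
class TwoTierCache<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>(); // stands in for Caffeine
    private final Map<K, V> l2;                              // stands in for Redis
    private final Function<K, V> loader;                     // stands in for the database query
    final AtomicInteger l2Reads = new AtomicInteger();
    final AtomicInteger loads = new AtomicInteger();

    TwoTierCache(Map<K, V> sharedL2, Function<K, V> loader) {
        this.l2 = sharedL2;
        this.loader = loader;
    }

    V get(K key) {
        V local = l1.get(key);
        if (local != null) {
            return local;                 // L1 hit: no network call at all
        }
        l2Reads.incrementAndGet();
        V value = l2.get(key);
        if (value == null) {              // L2 miss: fall through to the database
            loads.incrementAndGet();
            value = loader.apply(key);
            if (value == null) {
                return null;              // absence is not cached in either tier
            }
            l2.put(key, value);
        }
        l1.put(key, value);
        return value;
    }

    /** Evicts from L2 and from this instance's L1 only. */
    void evict(K key) {
        l2.remove(key);
        l1.remove(key);
    }
}

class TwoTierCacheDemo {
    public static void main(String[] args) {
        Map<Long, String> redisStandIn = new ConcurrentHashMap<>();
        TwoTierCache<Long, String> instanceA = new TwoTierCache<>(redisStandIn, id -> "product-" + id);
        TwoTierCache<Long, String> instanceB = new TwoTierCache<>(redisStandIn, id -> "product-" + id);

        instanceA.get(1L); // miss in both tiers, loads from the "database"
        instanceA.get(1L); // served from instance A's L1
        instanceB.get(1L); // misses B's L1, hits the shared L2
        System.out.println("A loads=" + instanceA.loads + ", B loads=" + instanceB.loads);
    }
}
```

Note that `evict` only clears the L1 of the instance that handled the write. In a real deployment the other instances' L1 entries stay stale until they expire, so an L1 tier usually needs a short TTL or some invalidation broadcast.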

Key takeaways

1
Redis is the correct choice for distributed caching in any multi-instance deployment
local caching with Caffeine produces inconsistent data across instances, which creates intermittent bugs that are extremely difficult to reproduce.
2
Always use GenericJackson2JsonRedisSerializer instead of default Java serialization
JSON is human-readable in Redis CLI and tolerant of additive schema changes such as new fields. Renamed or retyped fields still require a cache flush on deployment.
3
Per-cache TTL configuration is a design decision, not a detail
match each namespace's expiry to its data volatility and the business cost of serving stale data. A uniform global TTL is almost always the wrong choice.
4
Master the full annotation triad
@Cacheable for reads, @CachePut for updates where zero miss penalty on the next read matters, @CacheEvict for deletions and high-write updates. Use @Caching when one method must affect multiple cache namespaces simultaneously.
5
Forgetting to evict related cache namespaces
list caches, summary caches, aggregated views — after an entity update is the most common source of stale data in production caching implementations. Map every entity to every cache that holds any representation of it.
6
Cache hit ratio is the primary health signal for caching
monitor it per namespace using Actuator and Micrometer, export to Prometheus, alert on drops below 85% per namespace. A sudden hit ratio drop after deployment almost always means a key format change without cache flush.
7
Always implement graceful degradation
Redis will become unavailable at some point and your application must fall back to the database, slower but functional, rather than returning 500 errors. Size your database to handle 100% of traffic without Redis.
8
The internal call gotcha
calling a @Cacheable method via this.method() within the same class bypasses the AOP proxy and silently disables caching with no error. Extract cached methods into separate injected beans.
9
Write integration tests against real Redis for cache hit verification, eviction correctness, and null result protection. These three test types catch the majority of caching bugs at the annotation and configuration level before they reach production.
10
Never cache PII without field-level encryption and access controls. Never skip TTL configuration. Never rely on the cache being available
your database is the source of truth, Redis is the optimization layer.

Common mistakes to avoid

10 patterns
×

Caching sensitive PII without encryption or access controls

Symptom
Redis is commonly deployed without TLS or ACLs on internal networks. Cached JSON values containing user names, email addresses, payment tokens, or session data are readable in plain text by anyone with network access to Redis. A routine redis-cli --scan --pattern 'userSessions::*' followed by GET on any returned key exposes the full session payload.
Fix
Encrypt sensitive field values before storing them in the cache, using your application's encryption service rather than relying on Redis transport security alone. Enable Redis TLS for data in transit and configure Redis ACLs to restrict which application credentials can read which keyspaces. For particularly sensitive data, consider whether Redis is the right store at all — some PII categories should not leave the database regardless of performance pressure.
×

Calling a @Cacheable method from within the same class — the internal call gotcha

Symptom
The cache has zero effect — every invocation executes the method body and hits the database. No error is thrown. The method works correctly from a data perspective. Adding log statements inside the method confirms it executes on every call. No cache entries are created in Redis.
Fix
Spring caching works through AOP proxies. Calls to this.method() or direct method calls within the same class bypass the proxy entirely — the caching interceptor never runs. Extract the @Cacheable method into a separate Spring bean and inject it as a dependency. All calls through the injected reference go through the proxy and the caching interceptor fires correctly.
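The mechanism behind this gotcha can be reproduced without Spring. The sketch below uses a JDK dynamic proxy as a stand-in for Spring's caching proxy; `ProductLookup`, `ProductLookupImpl`, and `cachingProxy` are hypothetical names for illustration, and Spring's real interceptor is more involved. The point is only that a call made through `this` never passes through the proxy.

```java
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

interface ProductLookup {
    String findById(long id);            // the method the proxy caches
    String findViaInternalCall(long id); // delegates to findById through 'this'
}

class ProductLookupImpl implements ProductLookup {
    final AtomicInteger dbCalls = new AtomicInteger();

    @Override
    public String findById(long id) {
        dbCalls.incrementAndGet();       // stands in for a database query
        return "product-" + id;
    }

    @Override
    public String findViaInternalCall(long id) {
        return this.findById(id);        // 'this' is the raw target, not the proxy
    }
}

class SelfInvocationDemo {
    /** Wraps the target in a proxy that caches findById results by id. */
    static ProductLookup cachingProxy(ProductLookup target) {
        Map<Long, String> cache = new ConcurrentHashMap<>();
        return (ProductLookup) Proxy.newProxyInstance(
                ProductLookup.class.getClassLoader(),
                new Class<?>[] { ProductLookup.class },
                (proxy, method, methodArgs) -> {
                    if (method.getName().equals("findById")) {
                        return cache.computeIfAbsent((Long) methodArgs[0], id -> target.findById(id));
                    }
                    return method.invoke(target, methodArgs); // uncached methods pass straight through
                });
    }

    public static void main(String[] args) {
        ProductLookupImpl target = new ProductLookupImpl();
        ProductLookup proxy = cachingProxy(target);
        proxy.findById(1L);
        proxy.findById(1L);             // served by the proxy's cache
        proxy.findViaInternalCall(1L);
        proxy.findViaInternalCall(1L);  // reaches the target's findById both times
        System.out.println("database calls: " + target.dbCalls.get());
    }
}
```

The two direct calls cost one database call between them, while each internal call costs one. Moving `findById` into a separate bean means every caller holds the proxy reference, so the interceptor always runs.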
×

Cache stampede — popular cache key expires and dozens of simultaneous requests hit the database at once

Symptom
Database CPU spikes to 100% in a periodic pattern that exactly matches the TTL of a popular cache entry. The spike lasts for several seconds while one request populates the cache and the others pile up on the database. Latency spikes are predictable and repeatable every N minutes.
Fix
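Add sync = true to the @Cacheable annotation on the hot method. This uses a lock so only one thread fetches from the database on a cache miss; all other threads wait for that thread's result rather than independently querying the database. Two caveats apply. The lock is local to each application instance, so N instances can still issue up to N concurrent loads for the same key. Spring also does not support combining sync = true with the unless attribute, so the unless-based null guard cannot be used on the same annotation. For extremely high-volume scenarios, consider a background refresh job that proactively refreshes the cache entry before TTL expiration, keeping the cache continuously warm.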
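The single-loader behaviour can be illustrated in plain Java. This is a sketch of the idea within one JVM, not Spring's implementation: `SingleFlightCache` and `StampedeDemo` are hypothetical names, and the 50 ms sleep stands in for a slow database query.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical per-JVM cache where only one caller loads a missing key. */
class SingleFlightCache<K, V> {
    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();

    V get(K key, Function<K, V> loader) {
        // computeIfAbsent applies the loader at most once for an absent key;
        // concurrent callers for that key block until the value is present.
        return store.computeIfAbsent(key, loader);
    }
}

class StampedeDemo {
    /** Releases many threads at once against one cold key and returns how many loads ran. */
    static int loadsUnderContention(int threads) {
        SingleFlightCache<Long, String> cache = new SingleFlightCache<>();
        AtomicInteger dbCalls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                cache.get(42L, id -> {
                    dbCalls.incrementAndGet();
                    sleepQuietly(50);     // simulated slow database query
                    return "product-" + id;
                });
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return dbCalls.get();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("loads for 16 concurrent callers: " + loadsUnderContention(16));
    }
}
```

Without the single-flight guard, each of the 16 callers would have run the loader itself. The guard here is per process, which matches the per-instance scope of a local lock: a fleet of instances still produces one load per instance.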
×

Not configuring TTL — unbounded cache growth that eventually causes Redis failure

Symptom
Redis memory grows steadily over days or weeks. Eventually maxmemory is reached and the configured eviction policy begins removing keys, or if noeviction is configured, Redis rejects write commands with the error "OOM command not allowed when used memory > 'maxmemory'". Cache hit ratio becomes unpredictable. The operations team investigates a Redis infrastructure problem that is actually an application configuration problem.
Fix
Every cache namespace must have an explicit TTL configured via RedisCacheConfiguration.entryTtl(). Use per-namespace TTLs that match data volatility. Monitor Redis memory with redis-cli INFO memory and set an alert threshold at 75% of maxmemory so you have time to respond before eviction begins.
×

Using default Java serialization instead of JSON serialization

Symptom
Cached values in Redis are unreadable binary blobs — impossible to inspect during an incident. Any deployment that changes a field name, field type, or adds a non-serializable field causes deserialization failures on existing cached entries. The failure mode is a silent null return or a SerializationException that the framework may swallow, returning null to the caller as if the cache entry did not exist.
Fix
Configure GenericJackson2JsonRedisSerializer in your RedisCacheManager bean. Flush affected caches after any deployment that changes the structure of a cached class. JSON deserialization is tolerant of additive schema changes: a field added to the class is simply missing from older cached entries and takes null or its Java default when they are read. Breaking changes like field renames still require a cache flush.
×

Caching null values — a deleted entity returns null from cache long after being re-added to the database

Symptom
A product is temporarily removed from the database. The first request after removal fetches null from the database and stores it in the cache. For the duration of the TTL, every subsequent request returns null from cache even after the product is re-added. The database has the correct data but the cache wins for every read during the TTL window.
Fix
Add unless = "#result == null" to every @Cacheable annotation and add disableCachingNullValues() to your RedisCacheConfiguration as a defense-in-depth layer. These two controls together prevent null from ever being stored in the cache regardless of what the method returns.
×

Not implementing graceful degradation — Redis unavailability cascades to application failure

Symptom
When Redis becomes unreachable, the application throws RedisConnectionFailureException on every cache interaction. The exception propagates to the controller layer and returns 500 errors to users. A cache infrastructure problem becomes a complete application outage. The system that was supposed to improve reliability has instead introduced a new critical failure mode.
Fix
Implement try-catch fallback on all direct Redis interactions so that any Redis exception routes the call to the database. For @Cacheable annotated methods, supply a custom CacheErrorHandler by overriding errorHandler() in a CachingConfigurer implementation, so cache failures are logged and treated as misses instead of propagating. Size your database to handle 100% of read traffic without cache assistance. If it cannot, Redis is a hard dependency and must be treated with the same SLA obligations as your primary data store.
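A minimal sketch of the try-catch fallback, in plain Java rather than Spring Data Redis. `RemoteCache` and `FallbackReader` are hypothetical types introduced for illustration; the shape to notice is that cache failures are counted and swallowed while database failures still propagate.

```java
import java.util.function.Supplier;

/** Hypothetical cache client; either method may throw when the cache is unreachable. */
interface RemoteCache {
    String get(String key);
    void put(String key, String value);
}

class FallbackReader {
    private final RemoteCache cache;
    int cacheErrors = 0; // in production this would be a metric and a log line

    FallbackReader(RemoteCache cache) {
        this.cache = cache;
    }

    String read(String key, Supplier<String> dbLoader) {
        try {
            String cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
        } catch (RuntimeException e) {
            cacheErrors++;
            return dbLoader.get(); // cache is failing: go to the database and skip the write-back
        }
        String fresh = dbLoader.get(); // ordinary miss; database errors are not swallowed
        if (fresh != null) {
            try {
                cache.put(key, fresh);
            } catch (RuntimeException e) {
                cacheErrors++;         // a failed write-back must not fail the request
            }
        }
        return fresh;
    }
}

class FallbackDemo {
    public static void main(String[] args) {
        RemoteCache unreachable = new RemoteCache() {
            @Override
            public String get(String key) {
                throw new IllegalStateException("cache unreachable");
            }

            @Override
            public void put(String key, String value) {
                throw new IllegalStateException("cache unreachable");
            }
        };
        FallbackReader reader = new FallbackReader(unreachable);
        String value = reader.read("product::1", () -> "product-1 from database");
        System.out.println(value + " (cache errors: " + reader.cacheErrors + ")");
    }
}
```

The request succeeds at database speed instead of failing. That only holds if the database can absorb the full read load, which is why the sizing advice above matters as much as the error handling.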
×

Forgetting to evict related caches on entity updates — same data, different cache namespaces, different staleness

Symptom
A product update correctly evicts the product detail cache. The product list cache is a separate namespace. Users see the correct updated name on the product detail page and the old name on the product listing page. Same database entity, two cache namespaces, only one evicted. The bug report says the data is inconsistent depending on the page visited.
Fix
Use @Caching to handle all affected cache namespaces in a single method. Before adding @CacheEvict to any update method, list every cache namespace that contains any representation of the entity being updated. Product detail, product list, category product counts, search index representations — if it contains data derived from the entity, it must be evicted or updated when the entity changes.
×

Cache key collisions from the default SimpleKeyGenerator across methods with identical parameter signatures

Symptom
getProductById(42L) and getInventoryCount(42L) both generate the cache key 42 under the same namespace. Whichever method is called first populates the cache. The second method reads that entry and receives data intended for the first method. In the best case this throws ClassCastException immediately. In the worst case the types are compatible and wrong data is served silently.
Fix
Implement a custom KeyGenerator bean that includes the class name and method name in every generated key. Register it with @Bean("forgeKeyGenerator") and reference it in annotations with keyGenerator = "forgeKeyGenerator". Alternatively, make it the global default by overriding keyGenerator() in a CachingConfigurer implementation so it applies everywhere without per-annotation configuration.
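The key format itself is simple to sketch. The helper below is hypothetical and deliberately free of Spring types; in an application the same string could be built inside a KeyGenerator implementation from the target class, the method, and the parameters. The separator characters are an arbitrary choice for illustration.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper that qualifies a cache key with its class and method. */
class QualifiedKeys {
    static String key(String className, String methodName, Object... params) {
        String joined = Arrays.stream(params)
                .map(String::valueOf)            // null parameters become the text "null"
                .collect(Collectors.joining(","));
        return className + "." + methodName + ":" + joined;
    }

    public static void main(String[] args) {
        // Same parameter value, different methods: the keys no longer collide.
        System.out.println(key("ProductService", "getProductById", 42L));
        System.out.println(key("ProductService", "getInventoryCount", 42L));
    }
}
```

Because the method name is part of the key, renaming a cached method changes every key it produces. Treat such a rename like any other key format change and expect a cold cache after deployment.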
×

Not monitoring cache hit ratios — caching is counterproductive and nobody knows

Symptom
The cache is configured and appears to be running. Database query volume is higher than expected. Infrastructure costs are climbing. Nobody has checked whether the cache is actually serving requests or whether every call is a miss that pays both the Redis network cost and the database query cost.
Fix
Spring Boot Actuator publishes cache metrics through Micrometer. For Redis-backed caches, hit and miss counts are only recorded when statistics are enabled, for example with spring.cache.redis.enable-statistics=true or enableStatistics() on the RedisCacheManager builder. Query cache.gets with result:hit and result:miss tags per cache namespace. Build a Grafana panel for the per-namespace hit ratio. Set an alert for any namespace dropping below 85% for more than five minutes. A hit ratio near zero on a cache that is supposed to reduce database load means the cache is actively making things slower: the network round-trip cost of the miss is additional overhead on top of the database call you would have made anyway.
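The cost of a miss can be put into numbers. The sketch below assumes a simple model in which every read pays the cache round-trip and every miss additionally pays the database query; `CacheMath` is a hypothetical helper, and the 2 ms and 200 ms figures reuse the illustrative latencies from the definition at the top of this article.

```java
/** Hypothetical helper for reasoning about hit ratio and average read latency. */
class CacheMath {
    /** Hit ratio from the hit and miss counters of one cache namespace. */
    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    /** Average read latency: the cache round-trip always, the database only on a miss. */
    static double expectedLatencyMs(double hitRatio, double cacheMs, double dbMs) {
        return cacheMs + (1.0 - hitRatio) * dbMs;
    }

    public static void main(String[] args) {
        double ratio = hitRatio(850, 150);
        System.out.printf("hit ratio: %.2f%n", ratio);
        System.out.printf("average latency at that ratio: %.1f ms%n", expectedLatencyMs(ratio, 2.0, 200.0));
        System.out.printf("average latency with no hits: %.1f ms%n", expectedLatencyMs(0.0, 2.0, 200.0));
    }
}
```

Under this model a cache with no hits is strictly slower than having no cache, because the round-trip is pure overhead. The model ignores serialization and write-back cost, so real break-even points are somewhat worse than it suggests.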
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
What is the Cache-Aside pattern and how does Spring Boot implement it us...
Q02 · SENIOR
Explain the difference between @Cacheable, @CachePut, and @CacheEvict. W...
Q03 · SENIOR
How do you handle serialization issues when the class structure of a cac...
Q04 · SENIOR
What is cache hit ratio and how would you monitor it for a Spring Boot a...
Q05 · SENIOR
Describe the AOP proxy pattern in Spring and why it prevents caching fro...
Q06 · SENIOR
What is a cache stampede and how do you prevent it in Spring Boot?
Q07 · SENIOR
How would you configure different TTL values for different cache namespa...
Q08 · SENIOR
Explain the difference between Lettuce and Jedis as Redis clients and wh...
Q09 · SENIOR
What happens when Redis goes down and how would you design your caching ...
Q10 · SENIOR
How do you write effective tests for cached Spring beans and what specif...
Q01 of 10 · SENIOR

What is the Cache-Aside pattern and how does Spring Boot implement it using annotations?

ANSWER
Cache-Aside, also called Lazy Loading, is a caching strategy where the application manages the cache directly rather than the cache being a transparent layer between the application and the database. On a read: check the cache first, return the cached value on a hit, query the database on a miss, store the result, and return it. On a write: update the database and then either evict the cache entry (@CacheEvict) or update it (@CachePut). Spring Boot implements Cache-Aside through AOP proxies on annotated methods. @Cacheable generates a cache key from method parameters, checks the configured cache store, and short-circuits method execution on a hit. @CachePut always executes and writes the result to the cache after execution. @CacheEvict removes entries. The proxy is transparent — the caller has no knowledge of cache interactions. The limitation of AOP proxies is that internal calls within the same class bypass the proxy entirely, which is the most common implementation bug.
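The read and write paths in this answer can be sketched without any framework. `CacheAsideRepository` is a hypothetical class in which two in-memory maps stand in for the database and for Redis; each method is labelled with the annotation whose role it plays.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical cache-aside repository backed by two in-memory maps. */
class CacheAsideRepository {
    private final Map<Long, String> database = new ConcurrentHashMap<>();
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger dbReads = new AtomicInteger();

    /** Read path, the role @Cacheable plays: cache first, database on a miss. */
    String find(long id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        dbReads.incrementAndGet();
        String row = database.get(id);
        if (row != null) {
            cache.put(id, row);      // absent rows are not cached
        }
        return row;
    }

    /** Write then evict, the role @CacheEvict plays: the next read reloads from the database. */
    void updateAndEvict(long id, String value) {
        database.put(id, value);
        cache.remove(id);
    }

    /** Write then refresh, the role @CachePut plays: the next read is already a hit. */
    void updateAndPut(long id, String value) {
        database.put(id, value);
        cache.put(id, value);
    }

    public static void main(String[] args) {
        CacheAsideRepository repo = new CacheAsideRepository();
        repo.updateAndEvict(1, "v1");
        repo.find(1);                // miss, reads the database
        repo.find(1);                // hit
        repo.updateAndPut(1, "v2");
        repo.find(1);                // hit, already refreshed by the write
        System.out.println("database reads: " + repo.dbReads.get());
    }
}
```

The trade-off between the two write methods is visible in the counters: evicting costs one extra database read later, while putting costs a cache write on every update whether or not anyone reads the entry.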
FAQ · 10 QUESTIONS

Frequently Asked Questions

01
What is the difference between @Cacheable, @CachePut, and @CacheEvict?
02
How do I configure different TTLs for different cache namespaces?
03
Why is my @Cacheable method always hitting the database even though Redis is running?
04
What is a cache stampede and how do I prevent it?
05
Should I use Lettuce or Jedis as my Redis client?
06
How do I monitor cache performance in production?
07
What happens if Redis goes down and how do I prevent the application from returning 500 errors?
08
How do I handle serialization issues when my cached object class changes?
09
What Redis eviction policy should I use for a cache?
10
How do I test that my caching is working correctly?