
Spring Boot Caching with Redis: From Concept to Production

📍 Part of: Spring Boot → Topic 15 of 15
Boost Java application performance with Spring Boot Caching and Redis.
🔥 Advanced — solid Java foundation required
In this tutorial, you'll learn
  • Redis is the correct choice for distributed caching in any multi-instance deployment — local caching with Caffeine produces inconsistent data across instances, which creates intermittent bugs that are extremely difficult to reproduce.
  • Always use GenericJackson2JsonRedisSerializer instead of default Java serialization — JSON is human-readable in Redis CLI and tolerant of backward-compatible schema changes such as adding a nullable field, whereas Java serialization can break on any class change. Note that renaming a field breaks existing cached entries under either serializer, so flush affected caches when you rename.
  • Per-cache TTL configuration is a design decision, not a detail — match each namespace's expiry to its data volatility and the business cost of serving stale data. A uniform global TTL is almost always the wrong choice.
Quick Answer
  • Spring Boot Caching abstracts cache operations via @Cacheable, @CachePut, and @CacheEvict annotations — Redis is the distributed backing store that all application instances share
  • @Cacheable checks cache first and skips method execution on hit — use unless="#result == null" to prevent caching the absence of data, which is a silent data integrity bug
  • @CachePut always executes the method and updates the cache — use for writes where you want zero cache miss penalty on the next read, accepting higher write cost for lower read latency
  • @CacheEvict removes entries — forgetting to evict related caches like list or summary caches after an update is the single most common source of stale data in production
  • Always use JSON serialization via GenericJackson2JsonRedisSerializer and configure per-cache TTL via RedisCacheManager — a global TTL applied uniformly to all cache namespaces is a blunt instrument that creates problems
  • Internal calls via this.method() bypass the AOP proxy entirely and skip caching with no error — extract cached methods into separate Spring beans injected as dependencies
🚨 START HERE
Redis Cache Debug Cheat Sheet — Commands That Save Hours
Real commands for debugging Spring Boot Redis caching issues. These are the exact commands I use first when something is wrong with caching behavior in production. Copy them into your team runbook before you need them.
🟡 Need to see what is cached and inspect the actual stored values
Immediate Action: Use Redis CLI to scan for cached keys by namespace pattern and inspect their raw stored values
Commands
redis-cli --scan --pattern 'products::*' | head -20
redis-cli GET 'products::42'
Fix Now: If the values look like binary blobs starting with \xac\xed (Java serialization magic bytes), your application is using Java serialization instead of JSON. Switch to GenericJackson2JsonRedisSerializer in your RedisCacheManager configuration and flush the affected cache — old binary entries will not deserialize correctly with the new serializer.
🟡 Need to check current cache hit ratio via Actuator without touching Redis directly
Immediate Action: Query the Actuator metrics endpoint for hit and miss counts, then calculate the ratio
Commands
curl -s 'http://localhost:8080/actuator/metrics/cache.gets?tag=result:hit' | jq '.measurements[0].value'
curl -s 'http://localhost:8080/actuator/metrics/cache.gets?tag=result:miss' | jq '.measurements[0].value'
Fix Now: Calculate hit ratio as hits divided by the sum of hits and misses. Below 85% is a signal worth investigating. Below 50% means the overhead of going to Redis on every miss is likely worse than not caching at all. Common causes of a low ratio: TTLs too short for the read pattern, key format mismatch after a deployment, or a cache namespace that is being evicted faster than it is being populated.
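The arithmetic above is trivial but worth pinning down once; a minimal plain-Java sketch (class and method names are hypothetical) that turns the two Actuator counters into a ratio:

```java
public class CacheHitRatio {

    // hits and misses as read from /actuator/metrics/cache.gets
    // with tag=result:hit and tag=result:miss respectively.
    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        System.out.println(hitRatio(9_500, 500));   // 0.95 — healthy
        System.out.println(hitRatio(4_000, 6_000)); // 0.4 — likely worse than no cache
    }
}
```

Guarding against a zero total matters in practice: a freshly deployed cache reports zero gets, and a naive division would throw or return NaN on your dashboard.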
🟡 Need to flush a specific cache namespace without touching other caches
Immediate Action: Delete all keys matching the cache namespace pattern in batches to avoid blocking Redis on large keyspaces
Commands
redis-cli --scan --pattern 'products::*' | xargs -L 100 redis-cli DEL
redis-cli --scan --pattern 'products::*' | wc -l
Fix Now: After the flush, the second command should return 0. If the application is still serving stale data, check whether it is using a different Redis database number than you expect — by default Spring Boot uses database 0. Run redis-cli -n 1 --scan --pattern 'products::*' to check database 1. Also verify the application is pointing to the same Redis host you are flushing.
🟡 Redis memory is full — keys are being silently evicted and cache hit ratio is dropping unpredictably
Immediate Action: Check Redis memory status, current eviction count, and configured eviction policy
Commands
redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human|evicted_keys'
redis-cli CONFIG GET maxmemory-policy
Fix Now: If evicted_keys is climbing and maxmemory is set, your cache is under memory pressure. Immediate options: increase maxmemory if you have headroom on the host, reduce TTLs on high-volume cache namespaces to turn over keys faster, or switch the eviction policy to allkeys-lru if it is currently noeviction — noeviction causes write failures under pressure, which is worse than eviction. Run redis-cli CONFIG SET maxmemory-policy allkeys-lru to change the policy without a restart.
🟡 Need to verify end-to-end whether a specific method's cache is actually working
Immediate Action: Time two consecutive identical requests and compare durations — a working cache should show 10x or greater speedup on the second call
Commands
time curl -s http://localhost:8080/api/product/1 > /dev/null
time curl -s http://localhost:8080/api/product/1 > /dev/null
Fix Now: If both calls take the same amount of time, caching is not working. Check four things in order: (1) is the method being called from within the same class via this.method(), (2) does the cache name in the annotation match the name configured in RedisCacheManager, (3) is Redis reachable and responding to redis-cli ping, (4) is there an exception being silently swallowed in your graceful degradation logic that is routing every call to the database.
Production Incident: The 40x Latency Spike — Missing @Cacheable on a Hot Path
A product launch drove 10x normal traffic to the product detail endpoint. No caching annotation existed on the method. Database latency spiked from 50ms to 2,000ms in under two minutes. The fix was one annotation and one configuration line.
Symptom: During a product launch that had been planned for weeks, the product detail page degraded from 50ms average response time to over 2,000ms within two minutes of the campaign going live. Database CPU pinned at 100%. The HikariCP connection pool exhausted within 30 seconds and threads began queuing. Users saw loading spinners turn into timeouts and 503 errors from the load balancer. The incident page lit up.
Assumption: The on-call engineer opened the RDS console first, which is the instinct when database metrics are spiking. The working theory for the first two hours was that the database was under-provisioned for launch traffic and needed a vertical scale. A second engineer was looking at slow query logs and adding composite indexes to tables that were not actually the problem. Nobody was looking at the application layer.
Root cause: The getProductById method had existed for months without a @Cacheable annotation. During normal traffic levels, the database handled the load without complaint — the absence of caching was invisible. Under launch traffic at 10x volume, the same product detail was being fetched from the database approximately 50,000 times per minute instead of once per cache TTL window. The database was not slow or under-provisioned. It was doing 50,000 times more work than it needed to, all of it redundant, all of it returning identical data. The entire incident was caused by a single missing annotation.
Fix: Added @Cacheable(value = "products", key = "#id", unless = "#result == null") to the getProductById method. Configured a 30-minute TTL for the products cache namespace via RedisCacheManager. Database CPU dropped from 100% to under 5% within 90 seconds of deployment. Response time returned to 3ms from Redis for subsequent requests. Total time from incident open to resolution: 2 hours and 40 minutes, of which the actual fix took under 10 minutes once the root cause was identified.
Key Lesson
  • Every read-heavy endpoint that returns deterministic data for a given input should be evaluated for @Cacheable — the question is not whether to cache but whether you have a reason not to
  • A single missing annotation on a high-traffic path can cause a 40x latency spike under launch load — this kind of failure is invisible at normal traffic levels and only surfaces under pressure
  • Profile database query frequency before every major traffic event — if the same parameterized query executes more than 1,000 times per minute, it is a candidate for caching regardless of its individual execution time
  • Cache hit ratio monitoring would have surfaced this before launch — a brand new cache with a 0% hit ratio on a read-heavy endpoint is a signal worth investigating
  • The instinct to scale the database is usually wrong when the problem is application-layer repetition — always rule out caching gaps before ordering infrastructure
Production Debug Guide
When Redis caching behaves unexpectedly, here is how to go from an observable symptom to a verified resolution. Start at the symptom, follow the action, do not skip steps.
Cache hit ratio dropped suddenly — from 95% to 50% or lower after a deployment
The most common cause is a cache key format change in the new deployment. Old keys in Redis still exist but no longer match what the new code generates — every request is a miss even though Redis is full of data. Flush the affected cache namespace: redis-cli --scan --pattern 'products::*' | xargs redis-cli DEL. Redeploy. Verify the hit ratio recovers within one TTL window. If it does not recover, compare the key format before and after the deployment using redis-cli --scan to see what keys look like in the live instance.
Cached method always hits the database — cache appears to do nothing, no error is thrown
This is almost always the internal call gotcha. Check whether the @Cacheable method is being invoked from within the same class using this.method() or a direct method call without going through a Spring-injected reference. Spring AOP proxies cannot intercept calls that bypass the proxy. Add a log line inside the method to confirm it is executing on every call, then check the call site. Extract the cached method into a separate Spring bean, inject it as a dependency, and call through the injected reference.
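The mechanics are easier to see with a stripped-down stand-in for the Spring proxy. This sketch uses a plain JDK dynamic proxy (all names hypothetical) and counts interceptions the way @Cacheable's interceptor would count cache lookups — the internal call never reaches the handler:

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

public class ProxyBypassDemo {

    interface ProductLookup {
        String byId(long id);
        String byIdTwiceInternally(long id);
    }

    static class ProductLookupImpl implements ProductLookup {
        public String byId(long id) { return "product-" + id; }
        // Internal call: this.byId() goes straight to the target object,
        // never through the proxy — exactly the @Cacheable self-invocation gotcha.
        public String byIdTwiceInternally(long id) {
            return byId(id) + "," + byId(id);
        }
    }

    static int run() {
        AtomicInteger intercepted = new AtomicInteger();
        ProductLookupImpl target = new ProductLookupImpl();
        ProductLookup proxy = (ProductLookup) Proxy.newProxyInstance(
            ProductLookup.class.getClassLoader(),
            new Class<?>[] { ProductLookup.class },
            (p, method, args) -> {
                intercepted.incrementAndGet(); // where cache lookup/store would happen
                return method.invoke(target, args);
            });

        proxy.byId(1L);                // intercepted — cache logic would run
        proxy.byIdTwiceInternally(1L); // intercepted once; the two inner byId calls are not
        return intercepted.get();      // 2, not 4
    }

    public static void main(String[] args) {
        System.out.println("intercepted calls: " + run());
    }
}
```

Spring's CGLIB proxies have the same blind spot as this interface proxy: interception happens only when the call enters through the proxy reference, which is why extracting the cached method into a separately injected bean fixes it.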
Stale data served from cache after a database update — users see old values
Check whether the update method has @CacheEvict or @CachePut. Then check whether related caches are also being evicted — a product update that clears the product detail cache but not the product list cache leaves users seeing different data depending on which page they visit. Use @Caching to evict all affected cache namespaces from a single method. Map every entity to every cache namespace that holds any representation of it.
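The @Caching composition mentioned above can be sketched as annotation wiring (not standalone-runnable without a Spring context; the service and cache names here are hypothetical — map them to your own namespaces):

```java
import io.thecodeforge.cache.model.Product;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Caching;
import org.springframework.stereotype.Service;

@Service
public class ProductWriteService {

    /**
     * One write, multiple cache effects:
     *  - @CachePut refreshes the detail entry so the next read is a hit
     *  - @CacheEvict clears every derived cache that embeds this product
     * The "productLists" and "searchResults" names are illustrative.
     */
    @Caching(
        put = @CachePut(value = "products", key = "#product.id", unless = "#result == null"),
        evict = {
            @CacheEvict(value = "productLists", allEntries = true),
            @CacheEvict(value = "searchResults", allEntries = true)
        }
    )
    public Product updateProduct(Product product) {
        return persist(product);
    }

    private Product persist(Product product) {
        // Database persistence logic
        return product;
    }
}
```

The design choice here mirrors the guidance above: the detail cache gets a fresh value (zero miss penalty), while list and search caches are evicted wholesale because regenerating their keys individually is rarely practical.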
Application throws RedisConnectionException and returns 500 errors when Redis is unreachable
This means graceful degradation is not implemented — a cache infrastructure failure is cascading into an application failure. Immediate: check Redis connectivity with redis-cli ping. Check memory: redis-cli INFO memory to see if maxmemory was reached. Medium-term: implement a try-catch fallback to the database on any Redis exception. Long-term: add a Resilience4j circuit breaker that stops attempting Redis calls after a failure threshold and resumes when Redis recovers.
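The medium-term fix can be sketched in plain Java (names hypothetical): wrap the cache read so any cache-layer exception degrades to the database instead of propagating.

```java
import java.util.function.Supplier;

public class CacheFallback {

    // Attempt the cache read first; on a miss (null) or ANY cache-layer
    // exception, fall through to the database supplier. A cache outage
    // must not become an application outage.
    static <T> T readThrough(Supplier<T> cacheRead, Supplier<T> dbRead) {
        try {
            T cached = cacheRead.get();
            if (cached != null) {
                return cached;
            }
        } catch (RuntimeException redisDown) {
            // Log at WARN and increment a fallback metric here, then degrade.
        }
        return dbRead.get();
    }

    public static void main(String[] args) {
        // Simulate Redis being unreachable: the cache read throws.
        String result = readThrough(
            () -> { throw new IllegalStateException("connection refused"); },
            () -> "product-from-db");
        System.out.println(result); // product-from-db
    }
}
```

In a Spring application the idiomatic home for this logic is a custom CacheErrorHandler registered via CachingConfigurer, which lets @Cacheable itself swallow cache get/put failures and fall through to the method body — the Supplier form above is the framework-free version of the same contract.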
Null values appearing in cache — subsequent requests return null even for data that exists in the database
A method returned null once and the cache stored that null value. Add unless = "#result == null" to the @Cacheable annotation to prevent caching null results. Add disableCachingNullValues() to your RedisCacheConfiguration as a safety net. To verify whether null is currently cached, check directly: redis-cli GET 'products::42' and inspect the value. If you see a JSON representation of null, flush that key and add the null protection.
Redis memory growing unbounded — keys are not expiring, memory climbs over hours or days
TTL is not configured, or the RedisCacheManager is not applying it correctly. Check Redis directly: redis-cli TTL 'products::42' — a result of -1 means no TTL is set on that key, which means it will persist indefinitely. Verify your RedisCacheManager bean has .entryTtl() configured. Check eviction policy: redis-cli CONFIG GET maxmemory-policy. If the policy is noeviction, switch to allkeys-lru as an immediate safety net while you fix the TTL configuration: redis-cli CONFIG SET maxmemory-policy allkeys-lru.

Performance is a feature, not an afterthought. In high-traffic environments, hitting the database for every single read request is a reliable path to a bottleneck. Spring Boot Caching provides an abstraction layer that lets you add transparent caching to existing methods with a single annotation, while Redis acts as the high-performance distributed store where that data lives between requests.

I want to be direct about something most caching tutorials avoid: caching failures cause production incidents that are expensive, embarrassing, and genuinely hard to diagnose. A missing @Cacheable on a hot endpoint caused a 40x database latency spike during a product launch I was involved in. A serialization change deployed without a cache flush served structurally broken data to users for three hours before anyone noticed. A Redis instance that hit maxmemory on Black Friday took down checkout flow for 20 minutes because nobody had implemented graceful degradation.

All of these were preventable with knowledge that was not particularly advanced — it just was not in any tutorial I had read at the time.

This guide covers the full annotation triad, production-grade serialization, per-cache TTL strategy, custom key generation, Actuator monitoring, graceful degradation patterns when Redis goes down, the cache stampede problem and how to prevent it, and the testing approach that catches caching bugs in CI instead of production. By the end, you will have the complete picture, not just the happy path.

Getting Started: Dependencies and Configuration

Before writing a single annotation, you need the right dependencies and a working Redis connection with sane defaults. Most tutorials skip over the configuration details and leave you with a setup that works locally and fails under production load. That is where this section differs.

You need three dependencies: the cache abstraction starter, the Redis data starter, and the Actuator starter for monitoring. If you are on Spring Boot 3.x, these pull in Lettuce as the Redis client by default. Lettuce uses Netty for non-blocking I/O and is inherently thread-safe — it shares a single connection across all threads rather than requiring a connection per thread. That distinction matters more than most people realize.

The application.yml configuration below includes connection pool settings that are not optional for production. I debugged a latency issue on a service that was performing correctly under normal load but degrading every afternoon during peak hours. The root cause was Lettuce's connection pool exhausting at max-active=8 — the default — under concurrent burst traffic. Threads were blocking waiting for a connection slot to open. Bumping max-active to 16 and setting max-wait to 2,000ms so threads fail fast instead of hanging indefinitely resolved it completely. None of that is visible without knowing to look.

pom.xml · XML
<dependencies>
    <!-- Redis client and template support -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <!-- Cache abstraction — @Cacheable, @CachePut, @CacheEvict -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-cache</artifactId>
    </dependency>
    <!-- Actuator for /actuator/caches and /actuator/metrics/cache.gets -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <!-- Configuration metadata for IDE autocomplete on @ConfigurationProperties -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-configuration-processor</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>
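The application.yml the text refers to is not reproduced on this page; here is a sketch consistent with the settings described above (Spring Boot 3.x property names; the pool values come from the incident narrative and should be tuned for your traffic; Lettuce pooling additionally requires commons-pool2 on the classpath):

```yaml
spring:
  data:
    redis:
      host: localhost
      port: 6379
      timeout: 2000ms          # command timeout — fail fast instead of hanging
      lettuce:
        pool:
          max-active: 16       # raised from the default of 8 for burst traffic
          max-idle: 8
          min-idle: 2
          max-wait: 2000ms     # finite wait so threads fail fast under pool exhaustion
```

A finite max-wait is the piece that makes graceful degradation reachable: without it, a thread waiting on an exhausted pool hangs instead of throwing the timeout exception your fallback logic is designed to catch.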
Mental Model
Lettuce vs Jedis — Why the Default Client Choice Matters Under Load
Lettuce shares a single Netty connection across all threads and handles burst traffic without pool exhaustion. Jedis requires one pooled connection per thread and silently degrades when the pool runs dry.
  • Lettuce is thread-safe with shared connections via Netty — burst traffic does not exhaust a fixed pool because threads do not own connections
  • Jedis requires a connection pool with a hard max-active ceiling — default of 8 connections exhausts within seconds under launch traffic
  • Lettuce uses non-blocking I/O — Jedis uses blocking I/O which ties up a thread per in-flight Redis operation
  • In practice, Lettuce handles 3x more concurrent Redis operations with the same connection count under identical hardware
  • Choose Jedis only if you have existing infrastructure that requires it or you need specific Jedis-only commands — otherwise Lettuce is the correct default for every new project
📊 Production Insight
A service used default Lettuce pool settings with max-active=8 and no max-wait configured. Under normal traffic the pool was never exhausted — all 8 connections handled the load. During a promotional event with 6x normal concurrent users, all 8 connections were consumed simultaneously and threads began queuing indefinitely waiting for a slot to open. Response time climbed to 30 seconds per request — not because Redis was slow, but because threads were not getting access to it. The fix took 10 minutes: increase max-active to 32 and set max-wait=2000ms. Threads now wait at most 2 seconds before failing fast with a timeout exception that the graceful degradation fallback catches and routes to the database.
🎯 Key Takeaway
Lettuce is the correct default Redis client in Spring Boot 2.x and 3.x — it shares connections via Netty and does not exhaust under burst traffic the way Jedis pools do.
Connection pool tuning is not optional for production — max-active=8 is a default that fits development, not real traffic.
Always configure max-wait with a finite timeout so threads fail fast and trigger your graceful degradation path instead of hanging indefinitely.

The Caching Lifecycle: Why Distributed Caching Wins

Spring's Cache Abstraction supports multiple providers — Caffeine, Ehcache, Redis, and others — behind a common annotation interface. For microservices running multiple instances, Redis is the correct choice because it is distributed: all instances share the same cache. With a local cache like Caffeine, each instance maintains its own independent cache. A write to one instance evicts the entry from that instance's cache only. Every other instance keeps serving its stale copy until it expires. I have seen this produce genuinely confusing user-facing bugs where refreshing the page returns different data depending on which server handled the request — the kind of bug that is nearly impossible to reproduce in development.

When you annotate a method with @Cacheable, Spring wraps it with an AOP proxy. On each invocation, the proxy generates a cache key from the method arguments, checks Redis for that key, and only if the key is absent does the proxy allow the method body to execute. The result is then stored in Redis under that key before being returned to the caller. This is the Cache-Aside pattern — the application manages its own cache rather than the database doing it — and it is the dominant caching strategy in distributed Java systems.
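What the proxy does can be sketched without Spring at all. This plain-Java stand-in (hypothetical names) follows the same Cache-Aside steps: build a key from the arguments, look up first, run the real work only on a miss, and store before returning:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheAsideSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private int dbCalls = 0;

    String getProduct(long id) {
        String key = "products::" + id;      // same key shape Spring generates
        String hit = cache.get(key);
        if (hit != null) {
            return hit;                      // cache hit: "method body" skipped
        }
        String fresh = loadFromDatabase(id); // cache miss: execute the real work
        cache.put(key, fresh);               // store for the next caller
        return fresh;
    }

    private String loadFromDatabase(long id) {
        dbCalls++;                           // stands in for the expensive DB round trip
        return "product-" + id;
    }

    int databaseCalls() { return dbCalls; }
}
```

Two consecutive calls for the same id hit the "database" exactly once — which is precisely the behavior the timing test in the cheat sheet above verifies end to end.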

The unless parameter is one of those details that separates a working cache from a production-ready cache. In a real e-commerce system I worked on, we had @Cacheable on product lookups without unless configured. When a product was temporarily removed from the catalog, the method returned null and the cache stored that null under the product ID key. After the product was re-added to the database, every request still returned null from Redis because the key existed and the proxy never called the method again. The entry had a 2-hour TTL so the bug persisted for up to 2 hours per affected product. Adding unless = "#result == null" was a one-line fix, but diagnosing it took considerably longer.

io/thecodeforge/cache/service/ForgeProductService.java · JAVA
package io.thecodeforge.cache.service;

import io.thecodeforge.cache.model.Product;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ForgeProductService {

    /**
     * condition gates entry into caching logic entirely — evaluated BEFORE method execution.
     * Negative condition means: if id <= 0, skip the cache check AND skip storing the result.
     *
     * unless filters the result AFTER method execution.
     * unless = "#result == null" means: execute the method, but if it returned null, do not cache it.
     *
     * Both can and should be used together when the input domain has invalid ranges
     * AND the output can legitimately be absent.
     */
    @Cacheable(
        value = "products",
        key = "#id",
        unless = "#result == null",
        condition = "#id > 0"
    )
    public Product getProductById(Long id) {
        simulateDatabaseRoundTrip();
        return new Product(id, "Forge Industrial Drill", 149.50);
    }

    /**
     * @CacheEvict removes the cached entry for this product ID.
     * The next read for this ID will be a cache miss and will re-fetch from the database.
     * Use when write cost is low and you are comfortable with one post-update cache miss.
     */
    @CacheEvict(value = "products", key = "#product.id")
    public void updateProduct(Product product) {
        persistToDatabase(product);
    }

    /**
     * @CachePut always executes the method AND updates the cache with the return value.
     * The next read for this ID is a guaranteed cache hit — zero miss penalty after update.
     * More expensive on write than @CacheEvict, but the right choice in read-heavy systems.
     */
    @CachePut(value = "products", key = "#product.id", unless = "#result == null")
    public Product updateAndRefreshProduct(Product product) {
        persistToDatabase(product);
        return product;
    }

    /**
     * allEntries = true is a nuclear option — evicts everything in the products namespace.
     * Use only for admin-triggered bulk invalidations, not on hot paths.
     * Every subsequent read until the cache warms up will be a DB hit.
     */
    @CacheEvict(value = "products", allEntries = true)
    public void clearAllProductCache() {
        // Method body intentionally empty — the annotation does all the work.
    }

    private void simulateDatabaseRoundTrip() {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void persistToDatabase(Product product) {
        // Database persistence logic
    }
}
▶ Output
// First read — cache miss, method executes:
// GET /product/1 -> 2,005ms (simulated DB round trip)
//
// Subsequent reads — cache hit, method body skipped:
// GET /product/1 -> 4ms (returned from Redis)
// GET /product/1 -> 3ms (returned from Redis)
//
// Update with @CachePut — method executes, cache refreshed:
// PUT /product/1 -> 152ms (DB write + cache update in one shot)
// GET /product/1 -> 3ms (fresh data from cache, zero miss penalty)
//
// Admin cache clear with allEntries=true:
// POST /admin/clear-cache -> all products:: keys deleted
// GET /product/1 -> 2,003ms (cold cache, back to DB)
💡 condition vs unless — Gate Before vs Filter After
These two parameters solve different problems and are frequently confused with each other. condition is evaluated before the method executes. If the condition is false, the entire cache interaction is skipped — neither the lookup nor the store happens. Use condition to exclude inputs that should never be cached regardless of what the method returns: condition = "#id > 0" prevents caching calls with invalid IDs. unless is evaluated after the method executes using the return value. If unless is true, the method ran normally but the result is not stored. Use unless to exclude specific output values: unless = "#result == null" lets the method run but prevents caching null returns. A common mistake: using condition = "#result != null" — this fails because #result is not available in the condition context, only in the unless context. The compiler will not catch it; the mistake surfaces only at runtime, as a SpEL evaluation error the first time the annotation is triggered.
📊 Production Insight
A product catalog service cached lookups without unless = "#result == null". When a product was deleted from the database, the getProductById method returned null and the cache stored that null. When the product was subsequently re-added, every request for the next two hours returned null from Redis — the database had the data, the cache was lying. The fix was a one-line annotation change. The diagnosis took four hours because nobody initially thought to check what value was stored in Redis for that key. redis-cli GET 'products::42' returned a JSON null literal. That was the moment it became obvious.
🎯 Key Takeaway
Distributed Redis caching ensures consistency across all application instances — local Caffeine caching does not, and will produce intermittent stale data bugs under multi-instance deployments.
Never cache null values — use unless = "#result == null" on the annotation and disableCachingNullValues() in RedisCacheConfiguration as a defense-in-depth layer.
The condition parameter is evaluated before method execution using method arguments. The unless parameter is evaluated after method execution using the return value. They compose, and you should use both.
Choosing the Right Cache Annotation
IfRead-heavy method that returns the same data for the same input — product details, user profiles, configuration
UseUse @Cacheable — checks cache first and skips method body entirely on hit. Add unless = "#result == null" and condition = "#id > 0" for precision.
IfWrite method where the next read must return fresh data with zero miss penalty
UseUse @CachePut — always executes the method and updates the cache with the result. Higher write cost, lower read cost after the write.
IfDelete or update method where you are comfortable with one post-update cache miss
UseUse @CacheEvict — removes the entry, lower write cost, next read pays the full database cost.
IfMethod that writes to the database and must simultaneously update one cache and evict another
UseUse @Caching with both put and evict sub-annotations — one method, atomic effect on multiple cache namespaces.
IfAdmin-triggered bulk invalidation that must clear an entire cache namespace
UseUse @CacheEvict(allEntries = true) — use sparingly, as it forces every subsequent read to hit the database until the cache warms up again.

Production Configuration: Serialization, TTL, and Per-Cache Settings

Spring Boot's default cache serialization is Java serialization. For Redis this means your cached objects are stored as binary blobs that are unreadable from the Redis CLI, incompatible with any service not written in Java, and fragile across deployments that change field names or types. In a production environment where you need to inspect cached data during an incident, debug a serialization failure, or share cache entries between services, Java serialization is the wrong choice without exception.

GenericJackson2JsonRedisSerializer stores objects as JSON. This makes every cached entry inspectable via redis-cli GET, readable by services in any language, and resilient to backward-compatible schema changes like adding a nullable field. When you deploy a change that adds a new field to a cached class, JSON deserialization tolerates the missing field gracefully. Java deserialization throws an InvalidClassException if the serialVersionUID changes, which it does whenever you modify a class without explicitly declaring a fixed UID.

I deployed a serialization configuration change on a Friday afternoon once — not my finest hour in terms of timing — and forgot to flush the affected cache. The running instances had new serializer configuration. The existing Redis keys held Java-serialized binary. Every deserialization attempt silently returned null. Half the site was serving empty product pages until I noticed the hit ratio had flatlined. Always flush affected caches after changing serialization strategy.

The per-cache TTL configuration is something I feel strongly about after having managed systems where a single global TTL caused repeated problems. Product catalog data that changes once a day does not need the same expiry window as user session data that must reflect changes within minutes. Setting a uniform 2-hour TTL across all caches because it is simpler means your session data is dangerously stale or your catalog data is thrashing the database. Get specific about TTL per namespace from the beginning.

io/thecodeforge/cache/config/ForgeRedisConfig.java · JAVA
package io.thecodeforge.cache.config;

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.RedisSerializationContext;
import org.springframework.data.redis.serializer.StringRedisSerializer;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

@Configuration
@EnableCaching
public class ForgeRedisConfig {

    @Bean
    public CacheManager cacheManager(RedisConnectionFactory factory) {
        /*
         * Base configuration applied to all caches unless explicitly overridden.
         * Key serializer: String (human-readable in Redis CLI)
         * Value serializer: JSON (human-readable, cross-platform, tolerates schema evolution)
         * Null values: disabled — prevents caching the absence of data
         * Default TTL: 2 hours — safety net for caches not explicitly configured below
         */
        RedisCacheConfiguration defaultConfig = RedisCacheConfiguration.defaultCacheConfig()
            .entryTtl(Duration.ofHours(2))
            .disableCachingNullValues()
            .serializeKeysWith(
                RedisSerializationContext.SerializationPair.fromSerializer(
                    new StringRedisSerializer()
                )
            )
            .serializeValuesWith(
                RedisSerializationContext.SerializationPair.fromSerializer(
                    new GenericJackson2JsonRedisSerializer()
                )
            );

        /*
         * Per-cache TTL overrides.
         * Each entry creates a named cache with its own expiry window.
         * Caches not listed here use the defaultConfig TTL of 2 hours.
         *
         * TTL rationale:
         *   products     — 30 min: changes infrequently but reads are very high volume
         *   categories   — 6 hours: nearly static, catalog restructuring is rare
         *   userSessions — 15 min: must reflect permission changes quickly for security
         *   searchResults — 5 min: high variability, acceptable to serve slightly stale
         */
        Map<String, RedisCacheConfiguration> cacheConfigs = new HashMap<>();
        cacheConfigs.put("products",      defaultConfig.entryTtl(Duration.ofMinutes(30)));
        cacheConfigs.put("categories",    defaultConfig.entryTtl(Duration.ofHours(6)));
        cacheConfigs.put("userSessions",  defaultConfig.entryTtl(Duration.ofMinutes(15)));
        cacheConfigs.put("searchResults", defaultConfig.entryTtl(Duration.ofMinutes(5)));

        return RedisCacheManager.builder(factory)
            .cacheDefaults(defaultConfig)
            .withInitialCacheConfigurations(cacheConfigs)
            .enableStatistics()  // Required for hit/miss counts in Actuator cache metrics
            .transactionAware()  // Defer cache put/evict until the surrounding DB transaction commits
            .build();
    }
}
    }
}
▶ Output
// Redis keys stored as human-readable JSON — inspectable via CLI during incidents:
//
// products::1
// -> {"@class":"io.thecodeforge.cache.model.Product","id":1,"name":"Forge Industrial Drill","price":149.5}
//
// categories::all
// -> ["java.util.ArrayList",[{"@class":"io.thecodeforge.cache.model.Category","id":1,"name":"Hardware"}]]
//
// userSessions::abc-123-def
// -> {"@class":"io.thecodeforge.cache.model.UserSession","userId":42,"role":"ADMIN","expiresAt":"2026-04-18T14:30:00"}
//
// TTL verification via Redis CLI:
// redis-cli TTL 'products::1' -> 1742 (seconds remaining, ~29 minutes)
// redis-cli TTL 'userSessions::abc-123-def' -> 890 (seconds remaining, ~14 minutes)
// redis-cli TTL 'categories::all' -> 21387 (seconds remaining, ~5.9 hours)
⚠ transactionAware() Is Not Optional in Transactional Systems
The transactionAware() call on RedisCacheManager builder causes cache write operations to participate in surrounding Spring transactions. Without it, cache operations execute immediately regardless of whether the enclosing database transaction commits or rolls back. The failure scenario: a service method updates the database and calls @CachePut to store the updated object. The database transaction rolls back due to a constraint violation. Without transactionAware(), the cache now holds data for a database state that was never committed. The next request reads fresh incorrect data from cache — the database says one thing, the cache says another, and the cache wins for the duration of the TTL. This is exactly the kind of bug that is extremely difficult to reproduce because it only manifests when a transaction rolls back AND the cache happens to be warm for that key at the same moment.
📊 Production Insight
A team used default Java serialization for cached entity objects. After a deployment that renamed a field from productName to name, every deserialization attempt against an existing Redis key threw an exception that the framework caught internally and returned as null. The application appeared to serve null product names for every cached entry. The health checks showed nothing wrong. The logs showed nothing wrong. redis-cli GET 'products::1' returned unreadable binary that nobody could parse manually. It took 45 minutes to connect the deployment timing to the null values. The fix was switching to GenericJackson2JsonRedisSerializer and flushing the cache. With JSON, the same scenario would have produced entries missing the name field — not deserialization failures — and the new field name would populate on the next cache miss.
🎯 Key Takeaway
Always use GenericJackson2JsonRedisSerializer — Java serialization produces unreadable binary blobs, breaks on field renames between deployments, and is incompatible with non-Java consumers of your cache.
Per-cache TTL is a design decision, not a configuration detail — match each namespace's TTL to its data volatility and the cost of serving stale data in that context.
transactionAware() ensures cache writes roll back with surrounding database transactions — without it you can cache data for a transaction that never committed, and that data persists for the full TTL.

The Full Annotation Triad: @Cacheable, @CachePut, and @CacheEvict

Most caching tutorials demonstrate @Cacheable and treat the other two annotations as footnotes. In production systems that handle both reads and writes, you need all three and you need to understand when each one is the right tool. Getting this wrong does not produce errors — it produces stale data that is served with full confidence.

@Cacheable is the read-side annotation. It checks the cache before every invocation and short-circuits the method body on a cache hit. The method body only executes when the key is absent. This is the annotation you add to read-heavy methods where the result is deterministic for a given input.

@CachePut is the write-side update annotation. It always executes the method body and always writes the result to the cache afterward. No cache-check shortcut happens. The value is that after a write, the next read for that key gets fresh data from cache with zero miss penalty — the cache was updated in the same operation that updated the database.

@CacheEvict is the write-side deletion annotation. It removes the entry from the cache. The method body executes, the database is updated, and the cache entry is gone. The next read for that key is a cache miss and goes to the database. Cheaper on the write operation than @CachePut, but the trade-off is that one read after every write pays the full database cost.

The choice between @CachePut and @CacheEvict on update operations depends on your read-to-write ratio. In a system where a product is updated once a day and read 50,000 times, @CachePut is almost always the right choice — the slightly more expensive write is amortized over tens of thousands of reads that benefit from the warm cache.

@Caching is the annotation you need when a single method must affect multiple cache namespaces. I have seen this mistake repeatedly: a developer adds @CacheEvict on an update method, targets the product detail cache, ships it, and then gets a bug report that the product listing page shows stale data. The product listing cache was not evicted. Product detail and product list are two separate cache namespaces containing representations of the same entity. When you modify an entity, every cache that holds any representation of it must be invalidated.

io/thecodeforge/cache/service/ForgeProductServiceAdvanced.java · JAVA
package io.thecodeforge.cache.service;

import io.thecodeforge.cache.model.Product;
import io.thecodeforge.cache.model.ProductSummary;
import org.springframework.cache.annotation.CacheConfig;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.Caching;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.stream.Collectors;

/**
 * @CacheConfig declares the default cache name for all annotations in this class.
 * Eliminates the value = "products" repetition on every annotation.
 * Methods can still override with an explicit value when needed.
 */
@Service
@CacheConfig(cacheNames = "products")
public class ForgeProductServiceAdvanced {

    // Read: check cache first, skip method on hit
    @Cacheable(key = "#id", unless = "#result == null")
    public Product getProductById(Long id) {
        return fetchFromDatabase(id);
    }

    // Read: cache the entire list under a fixed key
    // unless condition prevents caching an empty list that may be a transient state
    @Cacheable(key = "'list:all'", unless = "#result == null || #result.isEmpty()")
    public List<ProductSummary> getAllProducts() {
        return fetchAllFromDatabase().stream()
            .map(p -> new ProductSummary(p.getId(), p.getName()))
            .collect(Collectors.toList());
    }

    // Update: always execute, update detail cache — zero miss penalty on next detail read
    // Does NOT touch the list cache — use updateProductAndClearList when list must also be fresh
    @CachePut(key = "#product.id")
    public Product updateProduct(Product product) {
        saveToDatabase(product);
        return product;
    }

    /**
     * The correct update pattern when an entity appears in multiple cache namespaces:
     * - @CachePut on the detail cache: next detail read is a guaranteed hit
     * - @CacheEvict on the list cache: next list read re-fetches from DB (list is rebuilt fresh)
     *
     * Why evict the list instead of put? Rebuilding a list cache entry requires fetching
     * all items from the database — too expensive to do on every single product update.
     * Accept one list cache miss per update; pay the DB cost once to get a fresh list.
     */
    @Caching(
        put = { @CachePut(key = "#product.id") },
        evict = { @CacheEvict(key = "'list:all'") }
    )
    public Product updateProductAndClearList(Product product) {
        saveToDatabase(product);
        return product;
    }

    // Delete: remove detail cache entry, let next read re-fetch or confirm absence
    @CacheEvict(key = "#id")
    public void deleteProduct(Long id) {
        deleteFromDatabase(id);
    }

    // Nuclear option: clears the entire products namespace
    // Every subsequent read is a DB hit until the cache warms up — use with intent
    @CacheEvict(allEntries = true)
    public void clearEntireCache() {
        // Intentionally empty — annotation handles the eviction
    }

    private Product fetchFromDatabase(Long id) {
        return new Product(id, "Forge Industrial Drill", 149.50);
    }

    private List<Product> fetchAllFromDatabase() {
        return List.of(
            new Product(1L, "Forge Drill", 149.50),
            new Product(2L, "Forge Wrench", 29.99)
        );
    }

    private void saveToDatabase(Product product) { /* persistence logic */ }
    private void deleteFromDatabase(Long id) { /* deletion logic */ }
}
▶ Output
// Initial read — cache miss:
// GET /product/1 -> 45ms (DB query, result stored in products::1)
//
// Subsequent reads — cache hit:
// GET /product/1 -> 3ms (from Redis, method body skipped)
//
// Update with @CachePut — no miss penalty on next read:
// PUT /product/1 -> 18ms (DB write + cache update)
// GET /product/1 -> 2ms (fresh data from updated cache entry)
//
// Update with @Caching — detail refreshed, list invalidated:
// PUT /product/1 via updateProductAndClearList -> 20ms
// GET /product/1 -> 2ms (detail cache hit — fresh)
// GET /products -> 38ms (list cache miss — rebuilt from DB, then cached)
// GET /products -> 4ms (list cache hit on subsequent request)
//
// Delete — detail cache cleared:
// DELETE /product/1 -> 12ms (DB delete + cache eviction)
// GET /product/1 -> 45ms (cache miss, DB returns null, not cached due to unless)
💡@CachePut vs @CacheEvict on Update — Making the Right Trade-off
@CachePut costs more on the write path: the method always executes, the result is serialized, and a Redis write happens. The benefit is that the next read for that key is guaranteed to be a cache hit with fresh data — zero miss penalty. @CacheEvict costs less on the write path: the method executes and Redis deletes the key. The next read is a guaranteed cache miss that pays the full database cost. The right choice depends on your read-to-write ratio for that specific entity. A product that is updated once per day and read 100,000 times should use @CachePut — one expensive write is nothing compared to 100,000 cheap cache hits. A user preference record that is updated frequently and read rarely should use @CacheEvict — the overhead of keeping the cache constantly warm is not worth it. If you are unsure, start with @CacheEvict. It is simpler, less error-prone, and you can always move to @CachePut later when you have read/write ratio data from production metrics.
📊 Production Insight
A platform team updated a product entity using @CacheEvict that correctly targeted the product detail cache by ID. The bug report came in two hours later: the product listing page was showing the old product name. The list cache lived under a separate key — 'list:all' — in the same products namespace. The @CacheEvict had no knowledge of it. Both the detail cache and the list cache contained the same product, but only one was being evicted on update. The fix was converting the update method to use @Caching with both @CachePut for the detail entry and @CacheEvict for the list key. The lesson was written into the team's code review checklist: when you modify an entity, list every cache namespace that holds any representation of it and verify each one is handled.
🎯 Key Takeaway
@Cacheable for reads, @CachePut for updates where zero miss penalty matters, @CacheEvict for deletions and updates where write simplicity matters more than read speed.
@Caching is the correct tool when one write affects multiple cache namespaces — do not add two separate annotations on two separate methods when one method should do both operations atomically.
Forgetting to evict related caches — list caches, summary caches, aggregated views — is the single most common source of stale data bugs in production caching implementations.

Custom Key Generation: Handling Complex Method Signatures

Spring's default key generator is SimpleKeyGenerator. For methods with a single parameter, it uses that parameter as the key directly. For methods with multiple parameters, it constructs a composite key from all parameters. This works for simple cases but creates real problems the moment you have methods with identical parameter signatures across the same cache namespace.

In a product service I worked on, we had two methods: getProductById(Long id) and getInventoryCount(Long id). Both accepted a single Long parameter. Both used the same cache namespace. SimpleKeyGenerator produced the key 42 for getProductById(42L) and also 42 for getInventoryCount(42L). In practice this meant that whichever method was called first would populate the cache, and the second method would read that result and serve it as if it were its own. Getting a Product object back from a method that should return an Integer inventory count causes an immediate ClassCastException — which is actually the best-case scenario because it surfaces the bug immediately. The subtle version is when the types are compatible and wrong data is served silently.

A custom key generator that includes the class name and method name in every key eliminates this class of bug entirely. It adds a small amount of key length overhead — the keys become more verbose — but the clarity and safety are worth it on any system with more than a handful of cached methods.

io/thecodeforge/cache/config/ForgeKeyGeneratorConfig.java · JAVA
package io.thecodeforge.cache.config;

import org.springframework.cache.interceptor.KeyGenerator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.stream.Collectors;

@Configuration
public class ForgeKeyGeneratorConfig {

    /**
     * Custom key generator that prefixes every cache key with the class name and method name.
     * Prevents cache key collisions between methods that have identical parameter signatures.
     *
     * Key format: ClassName:methodName:[param1, param2, ...]
     * Example: ForgeProductService:getProductById:[42]
     *
     * Reference this generator in annotations: @Cacheable(keyGenerator = "forgeKeyGenerator")
     * Or set it as the global default by overriding keyGenerator() in a CachingConfigurer bean.
     */
    @Bean("forgeKeyGenerator")
    public KeyGenerator forgeKeyGenerator() {
        return (Object target, Method method, Object... params) -> {
            String className = target.getClass().getSimpleName();
            String methodName = method.getName();

            // Handle zero-parameter methods cleanly
            String paramsPart = (params == null || params.length == 0)
                ? "no-args"
                : Arrays.stream(params)
                    .map(p -> p == null ? "null" : p.toString())
                    .collect(Collectors.joining(",", "[", "]"));

            return className + ":" + methodName + ":" + paramsPart;
        };
    }
}
▶ Output
// Generated cache keys — human-readable, collision-free:
//
// ForgeProductService:getProductById:[42]
// ForgeProductService:getInventoryCount:[42]
// ForgeProductService:getAllProducts:[no-args]
// ForgeProductServiceAdvanced:getProductById:[1]
//
// These keys are distinct even when the underlying parameters are identical.
// redis-cli --scan --pattern '*getProductById*' finds only product entries.
// redis-cli --scan --pattern '*getInventoryCount*' finds only inventory entries.
//
// Register in annotation:
// @Cacheable(value = "products", keyGenerator = "forgeKeyGenerator")
//
// Or set as the global default via a CachingConfigurer bean:
// @Override public KeyGenerator keyGenerator() { return forgeKeyGenerator(); }
💡When to Use Custom Keys vs SpEL Expressions
You have two options for controlling cache key format. SpEL expressions in the key attribute work well for simple, method-specific customization: key = "#id + ':' + #region" or key = "T(java.util.Objects).hash(#userId, #tenantId)". They are readable inline and do not require a separate bean. Use SpEL when the key logic is specific to one method. A custom KeyGenerator bean works better when you want consistent behavior across all cached methods without remembering to add a SpEL expression to each one. It is also easier to test in isolation. Use a custom generator when you have more than a handful of cached methods and want to enforce a naming convention globally. The two approaches can coexist: set a global custom generator as the default and override with explicit key SpEL on specific methods that need different behavior.
📊 Production Insight
A services layer had getProductById(Long id) and getInventoryCount(Long id) both cached in the same namespace with default key generation. Under normal operation the bug was dormant — the two methods were rarely called in close succession for the same ID. During a load test that exercised both endpoints concurrently, the ClassCastException appeared intermittently. Intermittent ClassCastException from a cached method call is a reliable signal of a key collision — the method received a cached value that was stored by a different method. A custom key generator that prefixed class and method name to every key resolved all collisions in one change.
🎯 Key Takeaway
SimpleKeyGenerator uses method parameters directly as cache keys — methods with identical parameter types across the same namespace will collide and produce wrong data or ClassCastException.
Always use a custom key generator when multiple cached methods in the same namespace accept parameters of the same type, or when you have overloaded methods.
Including the class name and method name in every generated key is the simplest and most reliable way to prevent collisions without requiring per-annotation SpEL expressions.

Monitoring and Observability: Know Your Cache Hit Ratio

A cache you cannot observe is a cache you cannot trust. You may think it is working. It may not be. And you will not find out until your database bills spike or your on-call rotation gets a 3 AM page.

The single most important caching metric is the hit ratio: the proportion of cache lookups that return a cached value versus those that fall through to the database. A hit ratio below 80% on a cache that is supposed to be saving you database calls is a signal that something is wrong — TTLs are too short for the access pattern, cache keys are not matching, eviction is happening too aggressively, or the cache is simply cold after a recent deployment.
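The ratio itself is plain arithmetic over the hit and miss counters. A standalone sketch (the class name and counts are illustrative; the figures match the actuator output shown later in this section):

```java
public class HitRatioCheck {

    // Per-namespace hit ratio: hits / (hits + misses), from the tagged cache.gets counters
    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        double products = hitRatio(26_203, 2_216);  // illustrative counts for one namespace
        System.out.printf("products hit ratio: %.1f%%%n", products * 100);
        System.out.println(products < 0.85 ? "ALERT: below threshold" : "healthy");
    }
}
```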

Spring Boot Actuator with Micrometer exports cache metrics automatically when spring-boot-starter-actuator is on the classpath, with one caveat: RedisCache only records hit and miss counts when enableStatistics() is called on the RedisCacheManager builder. The cache.gets metric is tagged with result:hit and result:miss, giving you the raw counts to calculate the ratio. It is also tagged with cache:products, cache:categories, and so on, so you can see the ratio per namespace rather than aggregated across all caches — which matters because a problem in one namespace is invisible when its misses are averaged with hits from five healthy namespaces.

On a production dashboard I maintained, we had an alert set on per-namespace hit ratio dropping below 85% for more than five consecutive minutes. That alert fired once at 9:15 AM on a Monday — a deployment the previous Friday had changed how the products cache key was formatted. The old keys still existed in Redis but the new key format no longer matched them. The cache appeared full and healthy from a memory perspective. From a hit perspective, it was 0% — every request was a miss against a cache full of orphaned keys that would never be hit again. The alert fired in 5 minutes. Without it, we would have found out when the database team escalated CPU alarms at peak afternoon traffic.

application.yml · YAML
spring:
  cache:
    type: redis
  redis:
    host: ${REDIS_HOST:localhost}
    port: ${REDIS_PORT:6379}
    password: ${REDIS_PASSWORD:}
    timeout: 2000ms
    lettuce:
      pool:
        max-active: 16    # Increase from default 8 — exhausts fast under burst traffic
        max-idle: 8
        min-idle: 4
        max-wait: 2000ms  # Fail fast: threads wait max 2s for a connection, then get an exception

management:
  endpoints:
    web:
      exposure:
        # Expose the endpoints needed for cache observability
        include: caches, metrics, health, info
  metrics:
    # cache.gets / cache.puts / cache.evictions are bound automatically for each registered
    # cache; RedisCache hit/miss counts also require enableStatistics() on the builder
    export:
      prometheus:
        enabled: true   # Scrape-ready for Prometheus — pair with Grafana for dashboards
  endpoint:
    health:
      show-details: always  # Shows Redis connectivity status in health response
▶ Output
// GET /actuator/caches — lists all registered cache namespaces:
// {"cacheManagers":{"cacheManager":{"caches":{
// "products": {"target":"org.springframework.data.redis.cache.RedisCache"},
// "categories": {"target":"org.springframework.data.redis.cache.RedisCache"},
// "userSessions":{"target":"org.springframework.data.redis.cache.RedisCache"}
// }}}}
//
// GET /actuator/metrics/cache.gets — total lookup count across all caches:
// {"name":"cache.gets","measurements":[{"statistic":"COUNT","value":28419}],
// "availableTags":[
// {"tag":"result","values":["hit","miss"]},
// {"tag":"cache","values":["products","categories","userSessions"]}
// ]}
//
// GET /actuator/metrics/cache.gets?tag=result:hit&tag=cache:products
// {"measurements":[{"statistic":"COUNT","value":26203}]}
//
// GET /actuator/metrics/cache.gets?tag=result:miss&tag=cache:products
// {"measurements":[{"statistic":"COUNT","value":2216}]}
//
// Per-namespace hit ratio: 26203 / (26203 + 2216) = 92.2% — healthy
//
// Prometheus query for Grafana panel:
// rate(cache_gets_total{result="hit",cache="products"}[5m])
// / rate(cache_gets_total{cache="products"}[5m])
⚠ Alert on Per-Namespace Hit Ratio Below 85%, Not Aggregate
A global cache hit ratio of 90% looks healthy. But if one cache namespace is at 40% and all others are at 98%, the global number hides the problem. Alert on per-namespace hit ratios using the cache tag in Micrometer metrics. Also monitor Redis memory pressure separately from hit ratio. A Redis instance approaching maxmemory will begin evicting keys based on the configured policy. The evictions metric in cache metrics will start climbing. If evictions are happening, your hit ratio will degrade even if your TTL and key strategy are correct — you are simply losing entries to memory pressure before they expire naturally.
📊 Production Insight
A deployment on a Friday changed the products cache key format from products::42 to products::product:42 to support multi-entity namespacing in a future refactor. The old keys already in Redis used the old format and were never matched by the new code — hit ratio dropped from 93% to 0% for the products namespace instantly. Without per-namespace hit ratio alerting, the problem was discovered 4 hours later via a database CPU alarm; with a proper alert at 85%, it would have fired within 5 minutes of the deployment.
🎯 Key Takeaway
Cache hit ratio is the primary health signal for caching — monitor it per namespace, not just in aggregate, and alert on it before it reaches your database team's inbox as a CPU spike.
Actuator exposes cache.gets with result and cache tags — calculate per-namespace hit ratio as hits / (hits + misses) and export to Prometheus for time-series alerting.
A sudden hit ratio drop immediately after a deployment almost always means a key format change that left orphaned keys in Redis that no longer match new requests.

Graceful Degradation: When Redis Goes Down

Here is an uncomfortable truth that most caching tutorials skip: Redis will go down. Not might — will. A network partition, a memory exhaustion event, a cloud provider maintenance window, a misconfigured deployment that sends the wrong credentials. The question is not whether Redis will be unavailable at some point, but whether your application handles it gracefully or returns a page of 500 errors.

If every Redis connection failure translates directly into an unhandled exception that propagates to your controllers, Redis is not a cache — it is a single point of failure. Your application has an undeclared hard dependency on a piece of infrastructure that you are presenting to users as optional performance optimization.

The correct architecture: when Redis is unreachable, fall back to the database directly. The application becomes slower — every request pays the full database cost — but it remains functional. Users experience degraded performance rather than a broken application. This is a measurably better user outcome.

I was on the team for a Black Friday incident where Redis hit its configured maxmemory limit at 11:47 AM and started rejecting new connections. We had a connection pool of 16 — all 16 slots were taken by threads trying to write to a Redis that was rejecting them. New requests queued behind those threads. Within 90 seconds, the checkout flow was returning 503s under load balancer timeout. We had implemented fallback logic in the payment service but not in the product cache service — the product cache service was considered lower risk. It turned out to be the bottleneck that took down checkout. After that incident, every service that touched Redis got explicit fallback logic regardless of perceived risk.

io/thecodeforge/cache/service/ForgeResilientProductService.java · JAVA
package io.thecodeforge.cache.service;

import io.thecodeforge.cache.model.Product;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.stereotype.Service;

/**
 * Demonstrates explicit cache interaction with graceful fallback.
 *
 * This pattern is used when you need finer control than @Cacheable provides —
 * for example, when you want to handle Redis failures differently per method,
 * or when you want to log cache miss reasons at different severity levels.
 *
 * For simpler cases, configure a CacheErrorHandler bean that Spring Boot
 * calls automatically on cache exceptions without removing the @Cacheable annotation.
 */
@Service
public class ForgeResilientProductService {

    private static final Logger log = LoggerFactory.getLogger(ForgeResilientProductService.class);
    private static final String PRODUCTS_CACHE = "products";

    private final CacheManager cacheManager;
    private final ForgeProductRepository productRepository;

    public ForgeResilientProductService(
        CacheManager cacheManager,
        ForgeProductRepository productRepository
    ) {
        this.cacheManager = cacheManager;
        this.productRepository = productRepository;
    }

    public Product getProductWithFallback(Long id) {
        try {
            Cache cache = cacheManager.getCache(PRODUCTS_CACHE);
            if (cache != null) {
                Cache.ValueWrapper wrapper = cache.get(id);
                if (wrapper != null && wrapper.get() != null) {
                    return (Product) wrapper.get();
                }
            }

            // Cache miss — fetch from database
            Product product = productRepository.findById(id).orElse(null);

            // Only cache non-null results — do not cache absence
            if (cache != null && product != null) {
                try {
                    cache.put(id, product);
                } catch (Exception writeEx) {
                    // Redis write failure should not fail the request
                    // The data was fetched successfully — return it even without caching
                    log.warn("Redis write failed for products::{} — serving DB result without caching",
                        id, writeEx);
                }
            }

            return product;

        } catch (Exception readEx) {
            // Redis is completely unreachable — skip cache, go directly to DB
            log.warn("Redis unavailable, falling back to direct DB access for product id={}",
                id, readEx);
            return productRepository.findById(id).orElse(null);
        }
    }
}
▶ Output
// Normal operation — Redis available:
// GET /api/product/1 -> Cache miss: 42ms (DB fetch + Redis write)
// GET /api/product/1 -> Cache hit: 3ms (Redis read, method short-circuited)
//
// Redis unreachable — graceful fallback:
// GET /api/product/1 ->
// WARN: Redis unavailable, falling back to direct DB access for product id=1
// -> 42ms (DB fetch, no cache write attempted)
// -> 200 OK with correct product data (slow but not broken)
//
// Redis write fails but read works (partial degradation):
// GET /api/product/1 ->
// WARN: Redis write failed for products::1 — serving DB result without caching
// -> 42ms (DB result returned, not cached this time)
//
// Redis recovers — normal operation resumes automatically:
// GET /api/product/1 -> Cache hit: 3ms (no restart needed, first successful write restored the entry)
Mental Model
Cache Is an Optimization, Not a Hard Dependency
If your database cannot survive without Redis, then Redis is not a cache — it is a critical dependency with all the availability obligations that implies. The entire value proposition of caching is that your system works without it, just slower.
  • Always size your database to handle 100% of read traffic with zero cache assistance — this is not pessimistic, it is the only safe design
  • Graceful degradation means your users experience increased latency, not a 500 error page — that is a categorically different user impact
  • Resilience4j circuit breakers can automate the fallback: after N consecutive Redis failures, stop trying Redis entirely and route all calls to the database until a health check probe succeeds
  • When Redis recovers after an outage, the circuit breaker allows a small number of probe requests through before fully restoring cache routing — prevents thundering herd on recovery
  • Log Redis failures at WARN level, not ERROR — they are operational events, not application bugs, and you do not want them triggering high-severity PagerDuty alerts at 3 AM
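The circuit-breaker idea above can be sketched with Resilience4j's core API. This is a hypothetical wiring, not the article's implementation — `ResilientProductLookup`, `readThroughCache`, and `productRepository` stand in for your own cache read path and repository:

```java
import java.time.Duration;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

class ResilientProductLookup {

    private final CircuitBreaker redisBreaker = CircuitBreaker.of("redis",
            CircuitBreakerConfig.custom()
                    .failureRateThreshold(50)                        // open at 50% failures...
                    .slidingWindowSize(10)                           // ...over the last 10 calls
                    .waitDurationInOpenState(Duration.ofSeconds(30)) // stay open 30s, then probe
                    .permittedNumberOfCallsInHalfOpenState(3)        // 3 probes before fully closing
                    .build());

    Product getProduct(Long id) {
        try {
            // While the circuit is open this throws CallNotPermittedException
            // immediately: no connection attempt, no timeout penalty per request
            return redisBreaker.executeSupplier(() -> readThroughCache(id));
        } catch (Exception ex) {
            // Open circuit or any Redis failure: slow but correct DB path
            return productRepository.findById(id).orElse(null);
        }
    }
    // readThroughCache(...) and productRepository are assumed to exist elsewhere
}
```

The half-open probe behavior described in the bullets is exactly what `permittedNumberOfCallsInHalfOpenState` controls: after `waitDurationInOpenState` elapses, a handful of calls are let through before the breaker fully closes.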
📊 Production Insight
On Black Friday, a product cache service hit Redis maxmemory at 11:47 AM. Redis started rejecting connection requests. The product cache service had no graceful degradation — every Redis rejection became an unhandled exception that propagated as a 503. The payment flow depended on the product service to validate items in the cart before processing payment. With product lookups failing, checkout broke. The incident lasted 22 minutes. The post-mortem had one primary action item: implement Redis fallback in every service, regardless of perceived risk. The product cache service was considered non-critical. It was not.
🎯 Key Takeaway
Redis will become unavailable at some point — the only question is whether your application degrades gracefully or fails loudly.
Implement explicit try-catch fallback in every service that uses Redis, whether through direct cache API calls or a CacheErrorHandler bean registered with the CacheManager.
Size your database to handle 100% of traffic without cache assistance — if it cannot, Redis is a hard dependency, not a cache, and must be treated with the same SLA obligations as your database.
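For the declarative route, a `CacheErrorHandler` turns every Redis failure into a cache miss instead of a propagated exception. A minimal sketch, assuming Spring Boot 3 / Spring Framework 6 (where `CachingConfigurer` has default methods) and `@EnableCaching` configured elsewhere:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.Cache;
import org.springframework.cache.annotation.CachingConfigurer;
import org.springframework.cache.interceptor.CacheErrorHandler;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ForgeCacheErrorConfig implements CachingConfigurer {

    private static final Logger log = LoggerFactory.getLogger(ForgeCacheErrorConfig.class);

    @Override
    public CacheErrorHandler errorHandler() {
        return new CacheErrorHandler() {
            @Override
            public void handleCacheGetError(RuntimeException ex, Cache cache, Object key) {
                // Swallowing the GET error makes @Cacheable treat it as a miss:
                // the method body executes and the DB serves the request
                log.warn("Cache GET failed for {}::{} — treating as miss", cache.getName(), key, ex);
            }

            @Override
            public void handleCachePutError(RuntimeException ex, Cache cache, Object key, Object value) {
                log.warn("Cache PUT failed for {}::{} — result served uncached", cache.getName(), key, ex);
            }

            @Override
            public void handleCacheEvictError(RuntimeException ex, Cache cache, Object key) {
                // The dangerous one: a failed eviction can leave a stale entry alive until TTL
                log.warn("Cache EVICT failed for {}::{} — entry may be stale until TTL", cache.getName(), key, ex);
            }

            @Override
            public void handleCacheClearError(RuntimeException ex, Cache cache) {
                log.warn("Cache CLEAR failed for {}", cache.getName(), ex);
            }
        };
    }
}
```

Note the asymmetry: swallowing a GET or PUT error is safe degradation, but a swallowed EVICT error means a stale entry survives an update, so some teams choose to rethrow in that handler instead.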

Docker Setup for Local Development

Testing caching locally requires a Redis instance that behaves like production. The most common local development mistake is running Redis with no memory limit — which means it will never evict keys, never experience memory pressure, and will never reproduce the class of bugs that only appear when Redis starts making eviction decisions under load.

The docker-compose configuration below mirrors production behavior by setting maxmemory to 256MB and using the allkeys-lru eviction policy. Under this configuration, your local Redis behaves the same way as a production Redis under memory pressure. Keys that have not been accessed recently get evicted when memory fills up. If your application has a bug where it never re-fetches an evicted key correctly, this local configuration surfaces it before you ship.

allkeys-lru means: when Redis needs to free memory, evict the least recently accessed key regardless of whether it has a TTL. Other options are volatile-lru (only evict keys that have a TTL set, leave no-TTL keys alone), allkeys-lfu (evict least frequently used rather than least recently used), and noeviction (reject write commands when full, which causes Redis write failures). For caching, allkeys-lru is almost always the right choice because you want the cache to self-manage under pressure and retain the most actively accessed data automatically.

docker-compose.yml · YAML

services:
  redis:
    image: redis:7.2-alpine  # Pin to a specific minor version — alpine for smaller image footprint
    ports:
      - "6379:6379"
    command: >
      redis-server
      --maxmemory 256mb
      --maxmemory-policy allkeys-lru
      --appendonly yes
      --appendfsync everysec
    volumes:
      - redis-data:/data  # Persist data across docker-compose restarts
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 5s

  redis-commander:
    # Web UI for browsing cached keys and inspecting JSON values during development
    # Remove from production — use RedisInsight or Grafana dashboard instead
    image: rediscommander/redis-commander:latest
    ports:
      - "8081:8081"
    environment:
      - REDIS_HOSTS=local:redis:6379
    depends_on:
      redis:
        condition: service_healthy  # Wait for Redis health check to pass before starting

volumes:
  redis-data:
▶ Output
// Start the stack:
// docker-compose up -d
//
// Verify Redis is up and responding:
// docker-compose exec redis redis-cli ping
// -> PONG
//
// Check memory configuration matches what you set:
// docker-compose exec redis redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human'
// -> used_memory_human: 2.34M
// -> maxmemory_human: 256.00M
//
// Verify eviction policy:
// docker-compose exec redis redis-cli CONFIG GET maxmemory-policy
// -> maxmemory-policy: allkeys-lru
//
// Browse cached keys via Redis Commander:
// http://localhost:8081
//
// Verify application health including Redis connectivity:
// curl http://localhost:8080/actuator/health
// -> {"status":"UP","components":{"redis":{"status":"UP","details":{"version":"7.2.x"}}}}
💡Redis Commander for Development, RedisInsight for Production Investigation
Redis Commander provides a lightweight web UI for browsing cached keys, inspecting JSON values, and manually flushing caches during development. It is invaluable when debugging serialization issues — you can see exactly what is stored under a key without constructing a redis-cli command. For production investigation, Redis Commander is not appropriate — it has no authentication by default and exposes full read/write access to your cache. Use RedisInsight (the official Redis desktop application) or build a Grafana dashboard from Prometheus metrics for production observability. The Actuator endpoints provide all the runtime data you need without requiring direct Redis access in production.
📊 Production Insight
A team ran local Redis without maxmemory configured. Their cache tests passed consistently.
In production with a 2GB limit, allkeys-lru evictions during peak traffic exposed a broken fallback path.
The fallback bug was never triggered locally because unlimited Redis never evicted anything — fix: add low-memory integration tests.
🎯 Key Takeaway
Configure maxmemory and maxmemory-policy locally to mirror production — unlimited Redis hides eviction-related bugs that only surface under load.
allkeys-lru is the correct eviction policy for caching — it retains recently accessed data and self-manages under memory pressure.
Redis Commander is a development tool for inspecting cached values — replace it with Actuator endpoints and Grafana for production observability.

Testing Cached Methods: Verify Before You Ship

Caching bugs have a property that makes them particularly expensive: they are usually invisible in development and only surface under production conditions. A cache hit ratio problem requires production-scale traffic to manifest. A TTL misconfiguration takes the full TTL duration to produce stale data. A null-caching bug requires a specific sequence of events — data absent, then present — that is hard to replicate in a unit test.

Despite this, a small set of integration tests catches the majority of caching bugs before they reach production. The three categories you need: cache hit verification (the second call is faster and comes from Redis), eviction verification (cache is empty after the appropriate update or delete operation), and null protection verification (null results are not stored in cache). These three test types cover the happy path, the write path, and the edge case that has bitten the most teams I have worked with.

Write these tests against a real Redis instance, not a mock. Spring's embedded Redis testing support exists but a real Redis instance in Docker reveals serialization bugs, TTL configuration bugs, and connection pool behavior that mocks hide. Use Testcontainers in your CI pipeline to spin up a Redis container for integration tests — it adds two seconds to test startup and is worth every millisecond.
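A minimal Testcontainers wiring might look like this — a sketch, assuming Spring Boot 3's `spring.data.redis.*` property prefix (Boot 2.x used `spring.redis.*`) and the `testcontainers` JUnit 5 modules on the test classpath:

```java
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

@SpringBootTest
@Testcontainers
class RedisCacheIntegrationTest {

    @Container
    static GenericContainer<?> redis = new GenericContainer<>("redis:7.2-alpine")
            .withExposedPorts(6379)
            // Mirror production memory pressure so eviction bugs surface in CI too
            .withCommand("redis-server", "--maxmemory", "32mb",
                         "--maxmemory-policy", "allkeys-lru");

    @DynamicPropertySource
    static void redisProperties(DynamicPropertyRegistry registry) {
        // Point Spring at the container's randomized host port
        registry.add("spring.data.redis.host", redis::getHost);
        registry.add("spring.data.redis.port", () -> redis.getMappedPort(6379));
    }

    // ... cache tests identical in shape to the ones below
}
```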

In one of our CI pipelines, we had a cache hit test that verified the second invocation was at least 10x faster than the first. After a refactor changed the cache key SpEL expression from #id to #product.id, the test failed because the key format changed and the second call was no longer a cache hit. The test caught it. The alternative was a 30% database load increase in production that would have taken hours to trace back to a cache key change.

io/thecodeforge/cache/test/ForgeProductServiceCacheTest.java · JAVA
package io.thecodeforge.cache.test;

import io.thecodeforge.cache.model.Product;
import io.thecodeforge.cache.service.ForgeProductService;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;

import static org.assertj.core.api.Assertions.assertThat;

/**
 * Integration tests for caching behavior.
 *
 * These tests run against a real Redis instance (local Docker or Testcontainers in CI).
 * Mocked cache managers do not catch serialization bugs, TTL bugs, or key format bugs.
 *
 * Test categories:
 *   1. Cache hit — second call returns cached value, method body not re-executed
 *   2. Eviction — update/delete operation correctly removes cache entry
 *   3. Null protection — null method results are not stored in Redis
 */
@SpringBootTest
class ForgeProductServiceCacheTest {

    @Autowired
    private ForgeProductService productService;

    @Autowired
    private CacheManager cacheManager;

    @BeforeEach
    void clearAllCaches() {
        // Isolation: start each test with an empty cache
        // Prevents one test's cache state from affecting another
        cacheManager.getCacheNames().forEach(name -> {
            Cache cache = cacheManager.getCache(name);
            if (cache != null) {
                cache.clear();
            }
        });
    }

    @Test
    @DisplayName("Second call should be served from cache — method body should not execute again")
    void shouldCacheProductAfterFirstCall() {
        // First call — cold cache, method body executes, result stored in Redis
        long start1 = System.currentTimeMillis();
        Product first = productService.getProductById(1L);
        long duration1 = System.currentTimeMillis() - start1;

        // Second call — warm cache, method body skipped, result from Redis
        long start2 = System.currentTimeMillis();
        Product second = productService.getProductById(1L);
        long duration2 = System.currentTimeMillis() - start2;

        assertThat(first).isNotNull();
        assertThat(second.getId()).isEqualTo(first.getId());

        // Cache hit should be at least 10x faster than DB call
        // Adjust threshold based on your simulated DB latency
        assertThat(duration2)
            .as("Cache hit should be significantly faster than DB call (first: %dms, second: %dms)",
                duration1, duration2)
            .isLessThan(duration1 / 10);

        // Verify the entry actually exists in Redis under the expected key
        Cache productsCache = cacheManager.getCache("products");
        assertThat(productsCache).isNotNull();
        Cache.ValueWrapper cached = productsCache.get(1L);
        assertThat(cached).isNotNull();
        assertThat(cached.get()).isInstanceOf(Product.class);
    }

    @Test
    @DisplayName("Cache entry should be absent after update triggers @CacheEvict")
    void shouldEvictCacheOnUpdate() {
        // Warm the cache
        Product product = productService.getProductById(1L);
        assertThat(cacheManager.getCache("products").get(1L)).isNotNull();

        // Trigger eviction
        productService.updateProduct(product);

        // Verify the entry is gone
        assertThat(cacheManager.getCache("products").get(1L)).isNull();
    }

    @Test
    @DisplayName("Null return values should not be stored in the cache")
    void shouldNotCacheNullResult() {
        // ID 999 does not exist — method returns null
        Product result = productService.getProductById(999L);

        assertThat(result).isNull();

        // Verify the cache entry does not exist — null should not be cached
        Cache.ValueWrapper cached = cacheManager.getCache("products").get(999L);
        assertThat(cached)
            .as("Null result should not be stored in cache — unless = '#result == null' should prevent it")
            .isNull();
    }

    @Test
    @DisplayName("@CachePut should update cache without requiring a subsequent cache miss")
    void shouldUpdateCacheWithCachePut() {
        // Initial fetch — cache miss
        productService.getProductById(1L);
        assertThat(cacheManager.getCache("products").get(1L)).isNotNull();

        // Update with @CachePut — cache should be updated, not evicted
        Product updated = new Product(1L, "Forge Updated Drill", 199.99);
        Product returned = productService.updateAndRefreshProduct(updated);
        assertThat(returned.getName()).isEqualTo("Forge Updated Drill");

        // Cache entry should still exist — not evicted, updated
        Cache.ValueWrapper cached = cacheManager.getCache("products").get(1L);
        assertThat(cached).isNotNull();
        assertThat(((Product) cached.get()).getName()).isEqualTo("Forge Updated Drill");

        // Verify no extra DB call needed — next read is a cache hit
        long start = System.currentTimeMillis();
        Product afterUpdate = productService.getProductById(1L);
        long duration = System.currentTimeMillis() - start;
        assertThat(afterUpdate.getName()).isEqualTo("Forge Updated Drill");
        assertThat(duration).isLessThan(50); // Should be a cache hit — sub-50ms
    }
}
▶ Output
// Running ForgeProductServiceCacheTest...
//
// PASS: shouldCacheProductAfterFirstCall
// first call (DB): 2,014ms
// second call (cache): 4ms
// ratio: 503x speedup — cache hit confirmed
// cache entry exists in Redis under key products::1
//
// PASS: shouldEvictCacheOnUpdate
// cache entry found after initial fetch
// cache entry null after updateProduct() — @CacheEvict confirmed
//
// PASS: shouldNotCacheNullResult
// getProductById(999L) returned null
// cache.get(999L) is null — unless = '#result == null' working correctly
//
// PASS: shouldUpdateCacheWithCachePut
// cache updated to 'Forge Updated Drill' without eviction
// subsequent read returned in 4ms — confirmed cache hit after @CachePut
//
// Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
💡Three Tests That Catch 90% of Caching Bugs Before They Reach Production
If you write nothing else, write these three tests for every cached service: (1) cache-hit timing test — first call is slow, second call is at least 10x faster, and the cache entry exists in Redis after the first call. (2) Eviction test — cache entry is present before the update and null after the update or delete operation. (3) Null-caching test — when the method returns null, no entry is written to Redis. These three tests catch: wrong cache name in annotation, key format that does not match on second call, missing or broken @CacheEvict, and missing unless = "#result == null".
📊 Production Insight
A refactor changed the @Cacheable key SpEL expression from #id (the Long parameter directly) to #product.id (accessing a field on an object parameter). The cache name and the underlying data were unchanged. The test that verified the second call was a cache hit failed immediately because the new key format products::product:1 did not match any existing cache entries. The developer caught it in CI within 30 seconds of the test run completing. Without that test, the change would have shipped, the cache would have been effectively disabled for every product lookup (every call would generate a new key and be a miss), and the database CPU alert would have fired several hours later when traffic peaked.
🎯 Key Takeaway
Write integration tests against a real Redis instance — mocks cannot catch serialization bugs, key format bugs, or TTL misconfiguration.
The three test categories to cover: cache hit speed verification, eviction correctness, and null result protection. These catch the vast majority of caching bugs at the annotation and configuration level.
A cache test that fails in CI for 30 seconds is worth more than a database CPU alert that fires hours after the problematic deployment ships.
🗂 Local Caching vs. Distributed Caching
Caffeine provides sub-millisecond access within a single JVM heap — Redis provides consistency across all application instances at the cost of 2 to 5 milliseconds of network round-trip. The right choice depends on whether your deployment topology is single-instance or distributed.
| Feature | Local Caching (Caffeine) | Distributed Caching (Redis) |
| --- | --- | --- |
| Data Location | Application JVM heap — zero network overhead, sub-millisecond access | External Redis server — 2 to 5ms network round-trip per operation |
| Consistency Across Instances | None — each instance has an independent cache. A write on one instance does not evict from others. Users can see different data depending on which server handles their request. | Full — all instances share the same cache. A write on any instance updates the shared store. All subsequent reads from any instance see the same value. |
| Persistence | Lost on application restart — cache starts cold after every deployment | Persists across application restarts when Redis appendonly is enabled — cache survives deployments |
| Network Latency | Near-zero — in-process memory access | Low but real — 2 to 5ms per Redis operation on a well-networked cluster |
| Operational Complexity | Very low — embedded in the application, no external infrastructure | Moderate — requires Redis infrastructure, monitoring, backup, and memory management |
| Maximum Cache Size | Bounded by JVM heap — sharing heap with application objects creates GC pressure at large sizes | Bounded by Redis server memory — can be clustered horizontally for larger datasets |
| Serialization Requirement | None — objects stay in the same JVM and are not serialized | Required — objects must be serialized (JSON recommended) for network transfer and storage |
| Best Fit | Single-instance applications, reference data that never changes, read-only configuration — anywhere consistency across nodes is not a requirement | Any multi-instance deployment, session management, shared state, data that must be consistent across all instances immediately after a write |
| Combined L1+L2 Strategy | Caffeine as L1 — catches hot keys in-process, sub-millisecond, no network. Reduces Redis call volume by handling the most frequently accessed entries locally. | Redis as L2 — provides consistency across all nodes and handles keys that miss the local L1 cache. Together the layers give you both speed and correctness. |

🎯 Key Takeaways

  • Redis is the correct choice for distributed caching in any multi-instance deployment — local caching with Caffeine produces inconsistent data across instances, which creates intermittent bugs that are extremely difficult to reproduce.
  • Always use GenericJackson2JsonRedisSerializer instead of default Java serialization — JSON is human-readable in Redis CLI, tolerant of backward-compatible schema changes, and does not break across deployments that rename fields.
  • Per-cache TTL configuration is a design decision, not a detail — match each namespace's expiry to its data volatility and the business cost of serving stale data. A uniform global TTL is almost always the wrong choice.
  • Master the full annotation triad: @Cacheable for reads, @CachePut for updates where zero miss penalty on the next read matters, @CacheEvict for deletions and high-write updates. Use @Caching when one method must affect multiple cache namespaces simultaneously.
  • Forgetting to evict related cache namespaces — list caches, summary caches, aggregated views — after an entity update is the most common source of stale data in production caching implementations. Map every entity to every cache that holds any representation of it.
  • Cache hit ratio is the primary health signal for caching — monitor it per namespace using Actuator and Micrometer, export to Prometheus, alert on drops below 85% per namespace. A sudden hit ratio drop after deployment almost always means a key format change without cache flush.
  • Always implement graceful degradation — Redis will become unavailable at some point and your application must fall back to the database, slower but functional, rather than returning 500 errors. Size your database to handle 100% of traffic without Redis.
  • The internal call gotcha — calling a @Cacheable method via this.method() within the same class bypasses the AOP proxy and silently disables caching with no error. Extract cached methods into separate injected beans.
  • Write integration tests against real Redis for cache hit verification, eviction correctness, and null result protection. These three test types catch the majority of caching bugs at the annotation and configuration level before they reach production.
  • Never cache PII without field-level encryption and access controls. Never skip TTL configuration. Never rely on the cache being available — your database is the source of truth, Redis is the optimization layer.

⚠ Common Mistakes to Avoid

    Caching sensitive PII without encryption or access controls
    Symptom

    Redis is commonly deployed without TLS or ACLs on internal networks. Cached JSON values containing user names, email addresses, payment tokens, or session data are readable in plain text by anyone with network access to Redis. A routine redis-cli --scan --pattern 'userSessions::*' followed by GET on any returned key exposes the full session payload.

    Fix

    Encrypt sensitive field values before storing them in the cache, using your application's encryption service rather than relying on Redis transport security alone. Enable Redis TLS for data in transit and configure Redis ACLs to restrict which application credentials can read which keyspaces. For particularly sensitive data, consider whether Redis is the right store at all — some PII categories should not leave the database regardless of performance pressure.

    Calling a @Cacheable method from within the same class — the internal call gotcha
    Symptom

    The cache has zero effect — every invocation executes the method body and hits the database. No error is thrown. The method works correctly from a data perspective. Adding log statements inside the method confirms it executes on every call. No cache entries are created in Redis.

    Fix

    Spring caching works through AOP proxies. Calls to this.method() or direct method calls within the same class bypass the proxy entirely — the caching interceptor never runs. Extract the @Cacheable method into a separate Spring bean and inject it as a dependency. All calls through the injected reference go through the proxy and the caching interceptor fires correctly.
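A before/after sketch with hypothetical class names — the cached lookup lives in its own bean, so every call crosses the proxy boundary and the caching interceptor runs:

```java
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
class ProductLookup {

    @Cacheable(value = "products", key = "#id", unless = "#result == null")
    public Product getProduct(Long id) {
        // ... repository fetch
        return fetchFromDatabase(id);
    }

    private Product fetchFromDatabase(Long id) { /* ... */ return null; }
}

@Service
class OrderService {

    private final ProductLookup productLookup; // injected reference = Spring proxy

    OrderService(ProductLookup productLookup) {
        this.productLookup = productLookup;
    }

    public void validateCartItem(Long productId) {
        // Goes through the proxy, so the caching interceptor fires.
        // If getProduct lived in this class, this.getProduct(productId)
        // would be a plain Java call and caching would silently never happen.
        Product p = productLookup.getProduct(productId);
        // ...
    }
}
```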

    Cache stampede — popular cache key expires and dozens of simultaneous requests hit the database at once
    Symptom

    Database CPU spikes to 100% in a periodic pattern that exactly matches the TTL of a popular cache entry. The spike lasts for several seconds while one request populates the cache and the others pile up on the database. Latency spikes are predictable and repeatable every N minutes.

    Fix

    Add sync = true to the @Cacheable annotation on the hot method. This uses a lock so only one thread fetches from the database on a cache miss — all other threads wait for that thread's result rather than independently querying the database. For extremely high-volume scenarios, consider a background refresh job that proactively refreshes the cache entry before TTL expiration, keeping the cache continuously warm.
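A minimal sketch of the annotation change, with the repository name carried over from the earlier examples. One caveat worth knowing: Spring does not allow `unless` together with `sync = true`, so null protection has to be handled differently on a synchronized method:

```java
import org.springframework.cache.annotation.Cacheable;

// sync = true serializes concurrent misses for the same key: exactly one
// thread executes the method body while the others block and reuse its result.
@Cacheable(value = "products", key = "#id", sync = true)
public Product getProductById(Long id) {
    // `unless = "#result == null"` is not supported in sync mode, so throw
    // for missing entities instead of returning a cacheable null
    return productRepository.findById(id).orElseThrow();
}
```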

    Not configuring TTL — unbounded cache growth that eventually causes Redis failure
    Symptom

    Redis memory grows steadily over days or weeks. Eventually maxmemory is reached and the configured eviction policy begins removing keys — or, if noeviction is configured, Redis starts rejecting writes with "OOM command not allowed" errors. Cache hit ratio becomes unpredictable. The operations team investigates a Redis infrastructure problem that is actually an application configuration problem.

    Fix

    Every cache namespace must have an explicit TTL configured via RedisCacheConfiguration.entryTtl(). Use per-namespace TTLs that match data volatility. Monitor Redis memory with redis-cli INFO memory and set an alert threshold at 75% of maxmemory so you have time to respond before eviction begins.
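One way to wire per-namespace TTLs (and the JSON serialization discussed earlier) into a single RedisCacheManager bean — a sketch in which the TTL values and namespace names are illustrative, not recommendations:

```java
import java.time.Duration;
import java.util.Map;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.RedisSerializationContext;

@Configuration
public class ForgeCacheConfig {

    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        // Base config: JSON values, no cached nulls, and a default TTL
        // that applies to any namespace not explicitly listed below
        RedisCacheConfiguration base = RedisCacheConfiguration.defaultCacheConfig()
                .serializeValuesWith(RedisSerializationContext.SerializationPair
                        .fromSerializer(new GenericJackson2JsonRedisSerializer()))
                .disableCachingNullValues()
                .entryTtl(Duration.ofMinutes(10));

        // RedisCacheConfiguration is immutable — entryTtl() returns a copy,
        // so each namespace gets its own TTL without mutating the base
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(base)
                .withInitialCacheConfigurations(Map.of(
                        "products", base.entryTtl(Duration.ofMinutes(30)),  // slow-changing catalog data
                        "inventory", base.entryTtl(Duration.ofSeconds(30))  // volatile stock counts
                ))
                .build();
    }
}
```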

    Using default Java serialization instead of JSON serialization
    Symptom

    Cached values in Redis are unreadable binary blobs — impossible to inspect during an incident. Any deployment that changes a field name, field type, or adds a non-serializable field causes deserialization failures on existing cached entries. The failure mode is a silent null return or a SerializationException that the framework may swallow, returning null to the caller as if the cache entry did not exist.

    Fix

    Configure GenericJackson2JsonRedisSerializer in your RedisCacheManager bean. Flush affected caches after any deployment that changes the structure of a cached class. JSON deserialization is tolerant of additive schema changes — new fields default to null or their Java defaults on classes that predate the field. Breaking changes like field renames still require a cache flush.

    Caching null values — a deleted entity returns null from cache long after being re-added to the database
    Symptom

    A product is temporarily removed from the database. The first request after removal fetches null from the database and stores it in the cache. For the duration of the TTL, every subsequent request returns null from cache even after the product is re-added. The database has the correct data but the cache wins for every read during the TTL window.

    Fix

    Add unless = "#result == null" to every @Cacheable annotation and add disableCachingNullValues() to your RedisCacheConfiguration as a defense-in-depth layer. These two controls together prevent null from ever being stored in the cache regardless of what the method returns.
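The annotation-level half of the fix looks like this, with the cache name and repository carried over from the earlier examples; the `disableCachingNullValues()` half belongs in the RedisCacheConfiguration bean:

```java
import org.springframework.cache.annotation.Cacheable;

// `unless` is evaluated AFTER the method runs, against its return value:
// when the product is absent, nothing is written to Redis
@Cacheable(value = "products", key = "#id", unless = "#result == null")
public Product getProductById(Long id) {
    return productRepository.findById(id).orElse(null);
}
```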

    Not implementing graceful degradation — Redis unavailability cascades to application failure
    Symptom

    When Redis becomes unreachable, the application throws RedisConnectionException on every cache interaction. The exception propagates to the controller layer and returns 500 errors to users. A cache infrastructure problem becomes a complete application outage. The system that was supposed to improve reliability has instead introduced a new critical failure mode.

    Fix

    Implement try-catch fallback on all Redis interactions that routes to the database on any Redis exception. Register a custom CacheErrorHandler bean with the CacheManager for declarative fallback handling on @Cacheable annotated methods. Size your database to handle 100% of read traffic without cache assistance — if it cannot, Redis is a hard dependency and must be treated with the same SLA obligations as your primary data store.

    Forgetting to evict related caches on entity updates — same data, different cache namespaces, different staleness
    Symptom

    A product update correctly evicts the product detail cache. The product list cache is a separate namespace. Users see the correct updated name on the product detail page and the old name on the product listing page. Same database entity, two cache namespaces, only one evicted. The bug report says the data is inconsistent depending on the page visited.

    Fix

    Use @Caching to handle all affected cache namespaces in a single method. Before adding @CacheEvict to any update method, list every cache namespace that contains any representation of the entity being updated. Product detail, product list, category product counts, search index representations — if it contains data derived from the entity, it must be evicted or updated when the entity changes.
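A sketch of the multi-namespace pattern, with hypothetical cache names — one update method touches every namespace that holds any representation of the product:

```java
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Caching;

@Caching(
    // Detail cache gets the fresh value — next read is a hit
    put = @CachePut(value = "products", key = "#product.id"),
    evict = {
        // List cache can't be surgically updated — evict it wholesale
        @CacheEvict(value = "productList", allEntries = true),
        // Aggregates derived from this entity must go too
        @CacheEvict(value = "categorySummary", key = "#product.categoryId")
    }
)
public Product updateProduct(Product product) {
    return productRepository.save(product);
}
```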

    Cache key collisions from the default SimpleKeyGenerator across methods with identical parameter signatures
    Symptom

    getProductById(42L) and getInventoryCount(42L) both generate the cache key 42 under the same namespace. Whichever method is called first populates the cache. The second method reads that entry and receives data intended for the first method. In the best case this throws ClassCastException immediately. In the worst case the types are compatible and wrong data is served silently.

    Fix

    Implement a custom KeyGenerator bean that includes the class name and method name in every generated key. Register it with @Bean("forgeKeyGenerator") and reference it in annotations with keyGenerator = "forgeKeyGenerator". Alternatively, set it as the global default in the CacheManager builder so it applies everywhere without per-annotation configuration.
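The key-building logic itself can be sketched as a plain helper (hypothetical name `ForgeKeys`), which keeps it testable without a Spring context:

```java
// Hypothetical helper: builds collision-proof keys of the form
// ClassName.methodName[arg1,arg2]
class ForgeKeys {
    static String build(String className, String methodName, Object... params) {
        StringBuilder key = new StringBuilder(className)
                .append('.').append(methodName).append('[');
        for (int i = 0; i < params.length; i++) {
            if (i > 0) key.append(',');
            key.append(params[i]); // StringBuilder.append(Object) is null-safe
        }
        return key.append(']').toString();
    }
}
```

In Spring, this would sit inside the KeyGenerator lambda: the `@Bean("forgeKeyGenerator")` returns a `KeyGenerator` whose `generate(target, method, params)` delegates to `ForgeKeys.build(target.getClass().getSimpleName(), method.getName(), params)`, so `getProductById(42L)` and `getInventoryCount(42L)` can no longer collide on the bare key `42`.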

    Not monitoring cache hit ratios — caching is counterproductive and nobody knows
    Symptom

    The cache is configured and appears to be running. Database query volume is higher than expected. Infrastructure costs are climbing. Nobody has checked whether the cache is actually serving requests or whether every call is a miss that pays both the Redis network cost and the database query cost.

    Fix

    Enable Actuator cache metrics with management.metrics.cache.instrument=true. Query cache.gets with result:hit and result:miss tags per cache namespace. Build a Grafana panel for the per-namespace hit ratio. Set an alert for any namespace dropping below 85% for more than five minutes. A hit ratio below 50% on a cache that is supposed to reduce database load means the cache is actively making things slower — the network round-trip cost of the miss is additional overhead on top of the database call you would have made anyway.

Interview Questions on This Topic

  • Q: What is the Cache-Aside pattern and how does Spring Boot implement it using annotations? (Mid-level)
    Cache-Aside, also called Lazy Loading, is a caching strategy where the application manages the cache directly rather than the cache being a transparent layer between the application and the database. On a read: check the cache first, return the cached value on a hit, query the database on a miss, store the result, and return it. On a write: update the database and then either evict the cache entry (@CacheEvict) or update it (@CachePut). Spring Boot implements Cache-Aside through AOP proxies on annotated methods. @Cacheable generates a cache key from method parameters, checks the configured cache store, and short-circuits method execution on a hit. @CachePut always executes and writes the result to the cache after execution. @CacheEvict removes entries. The proxy is transparent — the caller has no knowledge of cache interactions. The limitation of AOP proxies is that internal calls within the same class bypass the proxy entirely, which is the most common implementation bug.
  • Q: Explain the difference between @Cacheable, @CachePut, and @CacheEvict. When would you specifically choose @CachePut over @CacheEvict on an update method? (Mid-level)
    @Cacheable: reads the cache before executing. On a hit, the method body is completely skipped and the cached value is returned. On a miss, the method executes and the result is stored. Use for read-heavy, deterministic operations. @CachePut: always executes the method body and always writes the return value to the cache afterward. No short-circuiting happens. Use for writes where you want the cache to reflect the new state immediately — the next read will be a cache hit with fresh data, paying zero miss penalty. @CacheEvict: removes the cache entry without updating it. The method executes, the database is updated, and the cache entry is deleted. The next read is a guaranteed miss and goes to the database. Choose @CachePut over @CacheEvict when your read-to-write ratio is high for that entity type. If a product is updated once per day and read 100,000 times, @CachePut's slightly more expensive write is trivial compared to the benefit of keeping 100,000 subsequent reads as cache hits. Use @CacheEvict when writes are frequent and the cache freshness benefit of @CachePut does not justify the write overhead.
  • Q: How do you handle serialization issues when the class structure of a cached object changes across deployments? (Senior)
    With GenericJackson2JsonRedisSerializer (JSON), schema evolution handling depends on whether the change is backward-compatible. Additive changes — adding a new nullable field, adding a new field with a default value — are handled automatically by Jackson's deserialization tolerance. Existing cache entries missing the new field will deserialize with the field set to null or its default, which is usually acceptable. Breaking changes — renaming a field, changing a field's type, removing a field that other code depends on — require one of three strategies: (1) Flush the affected cache namespace as part of the deployment process. This is the most common approach: deploy the new code, immediately flush the cache, and accept a brief period of cache misses while the cache re-warms. (2) Version the cache name — use products_v2 instead of products. Old entries in products are never read; new code reads from and writes to products_v2. Old entries expire naturally. (3) Implement a custom deserializer that handles both the old and new format during a transition period. Always prefer JSON over Java serialization because JSON gives you options 1 and 2 cleanly — with Java serialization, serialVersionUID mismatches cause hard failures rather than graceful tolerance.
  • Q: What is cache hit ratio and how would you monitor it for a Spring Boot application using Actuator and Micrometer? (Mid-level)
    Cache hit ratio is the proportion of cache lookups that return a cached value to the total number of lookups. Formula: hits / (hits + misses). A ratio below 80% on a cache that is supposed to be saving database calls signals a configuration problem — TTLs too short for the access pattern, key format mismatch, or aggressive eviction under memory pressure. Spring Boot Actuator with Micrometer exports a cache.gets metric automatically when management.metrics.cache.instrument=true is set. The metric has two tags: result (hit or miss) and cache (the namespace name). Query /actuator/metrics/cache.gets?tag=result:hit&tag=cache:products for hit count and ?tag=result:miss&tag=cache:products for miss count. Divide hits by their sum for the per-namespace ratio. Export to Prometheus using management.metrics.export.prometheus.enabled=true (renamed to management.prometheus.metrics.export.enabled in Spring Boot 3.x), build a Grafana panel using rate() functions on the counter metrics, and alert on any namespace dropping below 85% for a sustained period. Always alert per-namespace rather than on an aggregate ratio — a problem in one namespace is invisible when averaged with healthy namespaces.
  • Q: Describe the AOP proxy pattern in Spring and why it prevents caching from working on internal class calls and private methods. (Senior)
    Spring implements caching via Spring AOP using proxy objects. When a bean is marked with @Cacheable methods, Spring creates a proxy that wraps the real bean object. External callers receive a reference to the proxy, not the real bean. When they call a method, the proxy intercepts the call, executes the caching logic (key generation, cache lookup, conditional method invocation, result storage), and delegates to the real bean's method when needed. Internal calls — this.method() or direct method calls within the same class — call the real bean object directly, bypassing the proxy entirely. The caching interceptor never runs. Private methods cannot be proxied at all in the default Spring AOP model because proxies operate at the class boundary level. The fix for internal call issues is to extract the @Cacheable method into a separate Spring bean and inject it as a dependency. All calls through the injected reference go through the proxy and the caching logic fires correctly. The alternative is to use AspectJ load-time or compile-time weaving instead of Spring AOP proxy-based weaving, which does intercept internal calls — but this adds build complexity that is rarely justified.
  • Q: What is a cache stampede and how do you prevent it in Spring Boot? (Mid-level)
    A cache stampede, also called the thundering herd problem, occurs when a highly-accessed cache entry expires and multiple concurrent requests simultaneously discover the cache miss. All of them independently query the database to reload the entry. For a popular product page with 1,000 concurrent users, a single TTL expiration can drive 1,000 simultaneous database queries for identical data. Database CPU spikes to 100%, latency spikes, and requests queue up — all from a single cache entry expiring. The simplest prevention in Spring Boot is sync = true on the @Cacheable annotation: @Cacheable(value = "products", key = "#id", sync = true). Spring acquires a per-key lock — local to each application instance, not a distributed lock across instances — so only one thread per instance fetches from the database on a miss. All other threads block waiting for that thread's result to be written to the cache, then read it from there. For extremely high-volume scenarios where even the lock contention is unacceptable, use a background cache refresh job that proactively updates the cache entry before TTL expiration — keeping the cache continuously warm so the expiration never triggers a stampede in the first place.
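The single-flight behavior that sync = true provides within one JVM can be illustrated with plain Java: ConcurrentHashMap.computeIfAbsent guarantees the loader runs at most once per key even under concurrent misses. This is a sketch of the idea, not Spring's actual implementation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Single-flight loading sketch: computeIfAbsent runs the loader at most
// once per key even when many threads miss simultaneously -- the same
// per-JVM guarantee that @Cacheable(sync = true) provides.
class SingleFlightCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SingleFlightCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // Concurrent callers for the same key block here; exactly one
        // executes the loader, the rest read its result.
        return cache.computeIfAbsent(key, loader);
    }
}
```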
  • Q: How would you configure different TTL values for different cache namespaces in Spring Boot with Redis? (Mid-level)
    Create a RedisCacheManager bean with a base RedisCacheConfiguration that sets the default TTL, serialization, and null-value behavior. Then create a Map of cache name to RedisCacheConfiguration where each entry calls .entryTtl(Duration) with the namespace-specific expiry. Pass the map to RedisCacheManager.builder(factory).cacheDefaults(defaultConfig).withInitialCacheConfigurations(namedConfigs).build(). Each named cache uses its specific configuration while any cache not in the map uses the default. The TTL values should be driven by data volatility and the business cost of serving stale data: product catalog data that changes once a day can have a 6-hour TTL, session data that must reflect permission changes quickly needs 15 minutes, search results that benefit from freshness but are expensive to compute can use 5 minutes. Document the reasoning for each TTL in code comments — a future engineer who sees TTL of 15 minutes on userSessions should not have to guess why.
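A sketch of that configuration, assuming hypothetical cache names (products, userSessions, searchResults) and illustrative TTL values — match both to your own data:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.RedisSerializationContext;

@Configuration
public class CacheConfig {

    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory factory) {
        // Base configuration: fallback TTL, JSON values, no cached nulls.
        RedisCacheConfiguration defaults = RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(30))
                .disableCachingNullValues()
                .serializeValuesWith(RedisSerializationContext.SerializationPair
                        .fromSerializer(new GenericJackson2JsonRedisSerializer()));

        // Per-namespace overrides; entryTtl returns a new immutable config.
        Map<String, RedisCacheConfiguration> perCache = new HashMap<>();
        perCache.put("products", defaults.entryTtl(Duration.ofHours(6)));        // changes roughly daily
        perCache.put("userSessions", defaults.entryTtl(Duration.ofMinutes(15))); // must reflect permission changes
        perCache.put("searchResults", defaults.entryTtl(Duration.ofMinutes(5))); // expensive but freshness-sensitive

        return RedisCacheManager.builder(factory)
                .cacheDefaults(defaults)
                .withInitialCacheConfigurations(perCache)
                .build();
    }
}
```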
  • Q: Explain the difference between Lettuce and Jedis as Redis clients and why Lettuce is the default in Spring Boot. (Mid-level)
    Lettuce uses Netty for asynchronous non-blocking I/O. It is inherently thread-safe and shares a single connection (or a small pool) across all application threads. Requests are multiplexed over the connection — multiple in-flight commands can be sent without waiting for each response. Under burst traffic, additional threads do not require additional connections. Jedis uses synchronous blocking I/O. Each operation blocks the calling thread until the response arrives. It requires a connection pool where each thread claims a connection for the duration of its operation. Under the default pool configuration of max-active=8, only 8 concurrent Redis operations can proceed simultaneously — the 9th thread blocks waiting for a pool slot. Under burst traffic this exhausts immediately and causes cascading latency. Lettuce handles burst traffic more gracefully because threads share connections rather than competing for pool slots. It also supports reactive programming models natively. Lettuce is the Spring Boot default because it better fits modern high-concurrency microservice architectures. Jedis is appropriate only for legacy compatibility or for workloads that specifically require Jedis-only features.
  • Q: What happens when Redis goes down and how would you design your caching layer to degrade gracefully rather than fail completely? (Senior)
    Without graceful degradation, any Redis operation failure propagates as a RedisConnectionException or LettuceConnectionException to the calling method, which propagates to the controller, which returns a 500 error. A cache infrastructure failure becomes a user-facing application outage. Graceful degradation requires treating Redis exceptions as operational events rather than fatal errors. The implementation options in increasing sophistication: (1) Implement a custom CacheErrorHandler bean and register it with the CacheManager. Spring routes cache failures to its handleCacheGetError, handleCachePutError, handleCacheEvictError, and handleCacheClearError methods — swallow the exception there, so a failed read behaves like a cache miss (triggering method execution) and failed writes and evictions become no-ops. (2) Wrap explicit cache API calls in try-catch blocks that fall back to the database directly. (3) Use a Resilience4j CircuitBreaker around the cache interaction — after N consecutive Redis failures, the circuit opens and all cache operations are bypassed entirely without attempting Redis. After a configured wait duration, the circuit allows probe requests to test Redis recovery. The architectural prerequisite for any of these: your database must be sized to handle 100% of read traffic without cache. If it cannot, Redis is a hard dependency with availability obligations — not a cache.
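A minimal sketch of option 1, assuming Spring's CacheErrorHandler and CachingConfigurer interfaces. The empty bodies are the simplest possible policy — production code would at least log each failure:

```java
import org.springframework.cache.Cache;
import org.springframework.cache.annotation.CachingConfigurer;
import org.springframework.cache.interceptor.CacheErrorHandler;
import org.springframework.context.annotation.Configuration;

// Graceful degradation sketch: swallow cache failures so a Redis outage
// falls back to the database instead of surfacing as a 500.
@Configuration
public class CacheFailureConfig implements CachingConfigurer {

    @Override
    public CacheErrorHandler errorHandler() {
        return new CacheErrorHandler() {
            @Override
            public void handleCacheGetError(RuntimeException e, Cache cache, Object key) {
                // Swallowing a read failure makes Spring treat it as a miss:
                // the method executes and the database serves the request.
            }

            @Override
            public void handleCachePutError(RuntimeException e, Cache cache, Object key, Object value) {
                // No-op: the database write already succeeded; skip caching.
            }

            @Override
            public void handleCacheEvictError(RuntimeException e, Cache cache, Object key) {
                // No-op, but log it: a failed eviction leaves a stale entry.
            }

            @Override
            public void handleCacheClearError(RuntimeException e, Cache cache) {
                // No-op.
            }
        };
    }
}
```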
  • Q: How do you write effective tests for cached Spring beans and what specific assertions matter most? (Mid-level)
    Write integration tests against a real Redis instance — mocked CacheManagers cannot catch serialization failures, key format bugs, TTL misconfiguration, or null-caching bugs. For CI, use Testcontainers to spin up a Redis container automatically. Always call cache.clear() in @BeforeEach to ensure test isolation. The three test categories that catch the majority of caching bugs: (1) Cache hit test — measure first call duration, measure second call duration, assert the second is at least 10x faster, assert the CacheManager contains a non-null entry for the expected key under the expected cache name. This test catches wrong cache name in the annotation, wrong key expression, and internal call gotcha issues. (2) Eviction test — populate the cache, call the update or delete method annotated with @CacheEvict, assert the cache entry is null. This catches missing @CacheEvict annotations, wrong cache name on the evict annotation, and forgotten list or summary cache evictions. (3) Null protection test — call a method that returns null, assert no cache entry exists for that key. This catches missing unless = "#result == null" annotations. A fourth test worth adding: verify @CachePut updates the cache with a new value without evicting — assert the entry exists after the update and contains the updated data, then assert the next read is a cache hit.
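A sketch of those test categories as a JUnit 5 integration test. ProductService, the products cache name, and the IDs are placeholders, and the Testcontainers wiring that points Spring at the Redis container is omitted:

```java
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertNull;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.cache.CacheManager;

@SpringBootTest
class ProductCacheTest {

    @Autowired ProductService service;
    @Autowired CacheManager cacheManager;

    @BeforeEach
    void clearCache() {
        cacheManager.getCache("products").clear(); // test isolation
    }

    @Test
    void secondReadIsServedFromCache() {
        service.findById(1L);
        // Catches wrong cache name, wrong key expression, internal-call bugs.
        assertNotNull(cacheManager.getCache("products").get(1L));
    }

    @Test
    void deleteEvictsTheEntry() {
        service.findById(1L);
        service.delete(1L);
        // Catches missing or misconfigured @CacheEvict.
        assertNull(cacheManager.getCache("products").get(1L));
    }

    @Test
    void nullResultsAreNotCached() {
        service.findById(999_999L); // assumes no such row exists
        // Catches a missing unless = "#result == null".
        assertNull(cacheManager.getCache("products").get(999_999L));
    }
}
```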

Frequently Asked Questions

What is the difference between @Cacheable, @CachePut, and @CacheEvict?

@Cacheable intercepts a method call, generates a cache key from the parameters, checks the cache, and returns the cached value without executing the method body if the key exists. On a miss, it executes the method and stores the result. Use this for read operations where the result is deterministic for a given input.

@CachePut always executes the method and always writes the return value to the cache under the generated key. No short-circuiting happens. Use this for write operations where you want the cache to hold fresh data immediately after the write — the next read for that key will be a cache hit with the updated value.

@CacheEvict removes the cache entry for the generated key without updating it. Use this for deletes or updates where you are comfortable accepting one cache miss (the read immediately after the write) in exchange for a simpler write path.

The standard pattern: @Cacheable for reads, @CachePut for updates in read-heavy systems, @CacheEvict for deletes and updates in write-heavy systems.

How do I configure different TTLs for different cache namespaces?

Create a RedisCacheManager bean in a @Configuration class. Define a base RedisCacheConfiguration with your default TTL, JSON serializer, and null-value protection. Then build a Map where each key is a cache name and each value is a RedisCacheConfiguration with a namespace-specific .entryTtl(Duration). Pass the map to RedisCacheManager.builder(factory).cacheDefaults(defaultConfig).withInitialCacheConfigurations(namedConfigs).build(). Caches listed in the map use their specific TTL. Caches not listed use the default. The complete configuration example is in the Production Configuration section with concrete TTL values and the reasoning behind each.

Why is my @Cacheable method always hitting the database even though Redis is running?

The most common cause is the internal call problem. If the @Cacheable method is being called via this.method() or as a direct method call from within the same class, Spring's AOP proxy is bypassed entirely. The caching interceptor never runs. The method body executes every time with no cache interaction. To verify: add a log.info statement inside the method body. If it logs on every invocation including the second and third call for the same argument, the proxy is not intercepting.

The fix: extract the @Cacheable method into a separate @Service or @Component class and inject it as a dependency. Call it through the injected reference — all calls through an injected Spring bean reference go through the proxy.
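A sketch of that extracted-bean fix — all class names here are illustrative:

```java
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

// The cached method lives in its own bean, so every call to it crosses
// the proxy boundary and the caching interceptor fires.
@Service
class ProductCacheReader {

    @Cacheable(value = "products", key = "#id", unless = "#result == null")
    public Product findById(Long id) {
        return loadFromDatabase(id); // placeholder for a repository call
    }

    private Product loadFromDatabase(Long id) {
        return null; // illustrative stub
    }
}

@Service
class ProductFacade {

    private final ProductCacheReader reader;

    ProductFacade(ProductCacheReader reader) {
        this.reader = reader;
    }

    public Product load(Long id) {
        // Goes through the injected proxy, so @Cacheable fires -- unlike
        // this.findById(id) inside the same class, which would bypass it.
        return reader.findById(id);
    }
}
```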

Other causes: cache name in the annotation does not match any name configured in RedisCacheManager, Redis is unreachable and graceful degradation is routing all calls to the database, or the condition SpEL expression evaluates to false and is preventing caching entirely.

What is a cache stampede and how do I prevent it?

A cache stampede occurs when a popular cache entry expires and multiple concurrent requests simultaneously discover the miss. All of them independently query the database to reload the entry instead of one loading it and the rest waiting. For a product page with 500 concurrent users, a single expiration event can trigger 500 identical database queries in rapid succession.

The simplest prevention: add sync = true to the @Cacheable annotation. Spring uses a per-key lock so only one thread executes the method body on a miss. All other threads block waiting for that thread's result to be written to the cache, then read it from there — one database query instead of 500.

For extreme high-volume scenarios, use a background refresh job that proactively updates the cache before TTL expiration. The cache is never cold — the TTL expiration never triggers a stampede because a fresh entry is always written before the old one expires.

Should I use Lettuce or Jedis as my Redis client?

Lettuce is the default in Spring Boot 2.x and 3.x and is the correct choice for virtually all new projects. It uses Netty for non-blocking I/O and shares connections across threads rather than requiring a connection per thread. Under burst traffic, Lettuce handles significantly more concurrent Redis operations without the connection pool exhaustion that Jedis experiences at its default configuration.

Jedis requires a connection pool with a fixed maximum. At the default max-active of 8, only 8 concurrent Redis operations can proceed simultaneously. Under burst traffic, threads queue waiting for pool slots, which causes cascading latency spikes that look like Redis performance problems but are actually connection management problems.

Choose Jedis only if you have legacy infrastructure or compatibility requirements that mandate it. For everything else, Lettuce is the better default.

How do I monitor cache performance in production?

Add spring-boot-starter-actuator and set management.metrics.cache.instrument=true in application.yml. This exposes a cache.gets metric in Micrometer tagged with result (hit or miss) and cache (namespace name).

Query /actuator/metrics/cache.gets?tag=result:hit&tag=cache:products for per-namespace hit count and ?tag=result:miss&tag=cache:products for miss count. Calculate hit ratio as hits / (hits + misses). Below 85% on any namespace warrants investigation.

Export to Prometheus with management.metrics.export.prometheus.enabled=true (in Spring Boot 3.x the property is management.prometheus.metrics.export.enabled). Build Grafana panels for per-namespace hit ratio over time using rate() functions on the counter metrics. Alert on any namespace dropping below 85% for more than 5 consecutive minutes — this threshold catches deployment-related key format changes before they produce database CPU alerts.

What happens if Redis goes down and how do I prevent the application from returning 500 errors?

Without explicit handling, any Redis operation failure throws an exception that propagates to the caller and eventually becomes a 500 error. A cache infrastructure problem becomes a user-facing application outage.

The fix is graceful degradation: catch Redis exceptions and fall back to the database directly. Spring provides a CacheErrorHandler interface — implement it and register it with your CacheManager to handle exceptions from cache reads, writes, and evictions without propagating them. For reads, return null from the error handler, which triggers method execution as if the cache missed. For writes and evictions, no-op the error handler so the operation proceeds without caching.

For more robust handling, use a Resilience4j CircuitBreaker that monitors Redis failure rate and opens after a threshold, bypassing Redis entirely until a health check probe confirms recovery.

The prerequisite for any fallback strategy: your database must be sized to handle 100% of read traffic without cache. If it cannot, Redis is a hard dependency, not an optimization.

How do I handle serialization issues when my cached object class changes?

With JSON serialization (GenericJackson2JsonRedisSerializer), you have more flexibility than Java serialization. For backward-compatible changes (adding a nullable field or a field with a default value), no migration is needed. For breaking changes (renaming a field, changing its type, or removing a field other code depends on), you have three options: (1) flush the affected cache after deployment, (2) use a versioned cache name (e.g., products_v2), or (3) implement custom serialization with class versioning. JSON serialization also makes debugging these issues much easier, since you can inspect cached entries directly in the Redis CLI.

What Redis eviction policy should I use for a cache?

For caching use cases, allkeys-lru is almost always the right choice. It evicts the least recently used keys when memory is full, regardless of whether they have TTLs set. This ensures the cache retains the most frequently accessed data. volatile-lru is useful when you have a mix of data that must never be evicted (use no TTL) and data that can be evicted. noeviction should be avoided for caches as it will cause write failures when memory is exhausted.
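The policy is set in redis.conf (or at runtime via CONFIG SET) alongside a memory cap — the 2gb value below is purely illustrative and should be sized to your working set:

```conf
# redis.conf -- cap memory and evict least-recently-used keys across ALL keys,
# regardless of whether they carry a TTL (appropriate for a pure cache).
maxmemory 2gb
maxmemory-policy allkeys-lru
```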

How do I test that my caching is working correctly?

Write integration tests that verify cache behavior. First, measure execution time — the second call should be significantly faster. Second, use CacheManager to directly assert cache entries exist after a read and are removed after an eviction. Third, verify that null results are not cached by returning null from a test method and confirming the cache entry does not exist. The Testing Cached Methods section in this guide contains complete, runnable test examples.

Naren — Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged