Intermediate 6 min · May 23, 2026

Retry Mechanism in Spring Boot

Q: Can I use Spring Retry and Resilience4j in the same application?

Yes, but avoid applying both to the same method. You'll get multiplied retry behavior: for example, 3 Spring Retry attempts, each wrapping 3 Resilience4j attempts, means up to 9 calls to the downstream. Use one or the other per method. Resilience4j is generally preferred for new code because of its metrics integration.

Q: What's the difference between maxAttempts in @Retryable and Resilience4j?

Both count total attempts including the first. maxAttempts=3 means one initial attempt plus two retries. In Resilience4j, maxAttempts=3 in the YAML config means the same thing. Check your configuration carefully — off-by-one errors here affect your retry budget and backoff window.

Q: Should I retry on all RuntimeExceptions by default?

No. Only retry on exceptions that represent transient failures: network timeouts, connection refused, server errors (5xx). Never retry on client errors (4xx such as bad request or unauthorized), validation exceptions, or business rule violations. These are deterministic failures that will not resolve on retry. The usual exceptions among 4xx codes are 429 Too Many Requests and 408 Request Timeout, which do signal transient conditions. Use retryFor (Spring Retry) or retryExceptions (Resilience4j) to allowlist specific exception types.
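A minimal sketch of that rule, using nothing beyond the JDK: the retry loop consults a predicate before every retry, and anything the predicate rejects propagates on the first failure. TransientFailure, ClientError, and RetryOnlyTransient are hypothetical names (not Spring Retry or Resilience4j types), and the sleep function is injected so the loop can run without real delays.

```java
import java.util.function.LongConsumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical stand-in for timeouts, connection refused, 5xx.
class TransientFailure extends RuntimeException {
    TransientFailure(String message) { super(message); }
}

// Hypothetical stand-in for 4xx, validation and business-rule failures.
class ClientError extends RuntimeException {
    ClientError(String message) { super(message); }
}

// Hypothetical helper: a bare retry loop that only retries what isTransient accepts.
final class RetryOnlyTransient {
    private RetryOnlyTransient() {}

    /**
     * Calls op up to maxAttempts times (the first call counts as an attempt).
     * A failure rejected by isTransient is rethrown at once, without another attempt.
     */
    static <T> T call(Supplier<T> op, int maxAttempts, long delayMillis,
                      Predicate<RuntimeException> isTransient, LongConsumer sleeper) {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                if (!isTransient.test(e) || attempt >= maxAttempts) {
                    throw e; // deterministic failure, or retry budget spent
                }
                sleeper.accept(delayMillis); // wait before the next attempt
            }
        }
    }
}
```

In real code the sleeper would wrap Thread.sleep with interrupt handling; Spring Retry and Resilience4j express the same predicate declaratively through their exception lists.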

Q: How do I test retry behavior in unit tests?

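Use a mock that throws the expected exception a configurable number of times before succeeding: when(mockClient.call()).thenThrow(new TimeoutException()).thenThrow(new TimeoutException()).thenReturn(success). Mockito only accepts a checked exception such as java.util.concurrent.TimeoutException if the mocked method declares it; otherwise throw an unchecked exception. Verify the mock was called the expected number of times, e.g. verify(mockClient, times(3)).call(). For Resilience4j, keep real delays out of tests by setting a near-zero waitDuration (such as 1ms) in the test profile configuration.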
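The same consecutive-stubbing idea works without Mockito, and it makes it easy to assert on the backoff schedule without waiting for it. In this sketch, ScriptedClient (a hand-rolled test double) and BackoffRetry (a tiny exponential-backoff loop standing in for the code under test) are hypothetical; the point is that an injected sleep function lets the test record the requested delays instead of sleeping through them.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical test double: each call consumes the next scripted outcome,
// like thenThrow(..).thenThrow(..).thenReturn(..) in Mockito.
final class ScriptedClient implements Supplier<String> {
    private final Deque<Object> script = new ArrayDeque<>();
    int calls;

    ScriptedClient(Object... outcomes) {
        for (Object outcome : outcomes) {
            script.add(outcome);
        }
    }

    @Override
    public String get() {
        calls++;
        Object next = script.remove(); // NoSuchElementException if the script runs out
        if (next instanceof RuntimeException failure) {
            throw failure;
        }
        return (String) next;
    }
}

// Hypothetical code under test: exponential backoff with an injectable sleep function.
final class BackoffRetry {
    private BackoffRetry() {}

    static <T> T call(Supplier<T> op, int maxAttempts, long initialDelayMillis,
                      double multiplier, LongConsumer sleeper) {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                sleeper.accept(delay); // a test records this instead of sleeping
                delay = (long) (delay * multiplier);
            }
        }
    }
}
```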

Q: Does retry work with Spring WebFlux reactive endpoints?

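Spring Retry annotations don't work reactively. They retry the method invocation, but a method returning Mono or Flux usually fails later, at subscription time, after the call has already returned; their backoff also blocks the calling thread. For reactive retry, use Project Reactor's built-in .retry() and .retryWhen(Retry.backoff(...)) operators, or Resilience4j's Reactor operators (for example .transformDeferred(RetryOperator.of(retry)) from the resilience4j-reactor module). Resilience4j has first-class reactive support.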

Q: What happens when retry exhausts and there's no @Recover method?

Spring Retry rethrows the last exception from the final failed attempt. Resilience4j also propagates the last exception (wrapped if necessary). Always provide a @Recover method or fallbackMethod in Resilience4j for production services — never let retry exhaustion surface raw infrastructure exceptions to end users.

Master Spring Boot retry: @Retryable, @Recover, Resilience4j with exponential backoff and jitter, RetryTemplate, idempotency, and retry vs circuit breaker patterns.

Naren Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Notes here come from systems that actually shipped.

✓ Production

production tested

July 04, 2026

last updated

1,697

articles · all by Naren

Before you start · ⏱ 25 min

✓Solid grasp of fundamentals
✓Comfortable reading code examples
✓Basic production concepts


⚡Quick Answer

Use @Retryable for declarative retry on specific exceptions with configurable backoff and max attempts
Always pair @Retryable with @Recover to handle exhausted retries gracefully without bubbling exceptions
Add jitter to exponential backoff to prevent a retry thundering herd when many clients fail simultaneously
Idempotency is mandatory before enabling retry — retrying non-idempotent operations causes double charges, duplicate records, and data corruption
Use circuit breaker alongside retry: retry handles transient failures, circuit breaker stops retrying when a downstream is systemically down

✦ Definition · ~90s read

What is Retry Mechanism in Spring Boot?

A retry mechanism is a resilience pattern that automatically re-executes a failed operation under the assumption that the failure is transient — a network blip, a momentary service overload, a short database unavailability. Rather than propagating failures to the caller immediately, retry gives the operation multiple chances to succeed with configurable delays between attempts.


Spring Boot supports two complementary retry libraries. Spring Retry (part of the Spring ecosystem) provides annotation-driven retry via @Retryable and @Recover, plus a programmatic RetryTemplate API. It integrates naturally with Spring AOP. Resilience4j is a standalone fault tolerance library with a more functional API, richer configuration options (especially for bulkhead and rate limiting), and first-class Micrometer metrics integration through spring-boot-starter-actuator.

Retry is a local resilience pattern — it handles transient failures at the call site. It pairs with circuit breaker (which handles systemic failures by opening the circuit after too many failures) and bulkhead (which limits concurrent calls to a downstream service).

In production, these three patterns work together: retry for the optimistic transient case, circuit breaker to fail fast when retry is futile, bulkhead to protect your thread pool from a slow downstream.

Plain-English First

Retry is like redialing a phone number when the line is busy. You don't give up on the first busy signal — you wait a moment and try again. Exponential backoff means each wait is longer than the last, so you're not hammering a struggling service. Jitter is like adding a random few seconds to your wait so you and 10,000 other callers don't all redial at exactly the same millisecond and crash the exchange again.

Your payment service calls a bank API. Network blips happen. The bank's load balancer hiccups for 200ms. Without retry, your customer sees 'Payment failed — please try again' for a transient error that would have resolved itself in a fifth of a second. With naive retry, all 10,000 concurrent failed requests retry simultaneously and take down the bank's already-struggling service.

Getting retry right is one of the most impactful resilience improvements you can make to a distributed system. Spring Boot provides two battle-tested options: Spring Retry (declarative, annotation-driven, lightweight) and Resilience4j (feature-rich, metrics-integrated, production-hardened). Both support exponential backoff, jitter, custom retry conditions, and fallback logic.

But retry is a sharp knife. Retry a non-idempotent operation and you get double charges. Retry without a circuit breaker and you amplify load on an already-down service. Retry with fixed backoff across synchronized clients and you create a thundering herd. Every retry decision involves trade-offs that experienced engineers sweat over.

This guide walks through both Spring Retry and Resilience4j with complete production configurations, explains when to use each, covers the idempotency requirements that must come first, and draws the precise boundary between retry and circuit breaker. Code examples run on Spring Boot 3.x with Java 17+.

Spring Retry: @Retryable and @Recover

Spring Retry's annotation-driven API is the fastest path from zero to production retry. Add spring-retry and spring-boot-starter-aop to your classpath, put @EnableRetry on your configuration, and annotate methods with @Retryable. Spring AOP wraps the bean in a proxy that intercepts calls, catches specified exceptions, and retries according to your configuration.

The @Retryable annotation has several key attributes: retryFor (value or include in older code) specifies which exception types trigger retry; noRetryFor (formerly exclude) lists exceptions that should propagate immediately without retry; maxAttempts (default 3) controls total attempts including the first; backoff configures the delay strategy between attempts.

The @Backoff annotation controls timing: delay is the initial delay in milliseconds, multiplier enables exponential growth (delay × multiplier^n), and maxDelay caps the delay. Setting random=true alongside a multiplier adds jitter: each wait is picked uniformly between the deterministic value and that value times the multiplier, still capped by maxDelay. Always set random=true in production.
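To make the arithmetic concrete, here is a sketch of how delay, multiplier, maxDelay, and random combine, modeled on our reading of Spring Retry's ExponentialBackOffPolicy and ExponentialRandomBackOffPolicy. It is not the library's code; BackoffSchedule is a hypothetical helper that only lists the waits between attempts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper approximating the @Backoff timing rules.
final class BackoffSchedule {
    private BackoffSchedule() {}

    /** Deterministic waits between attempts: delay, delay*m, delay*m^2, ..., each capped at maxDelay. */
    static List<Long> exponential(long delay, double multiplier, long maxDelay, int maxAttempts) {
        List<Long> waits = new ArrayList<>();
        double next = delay;
        for (int i = 0; i < maxAttempts - 1; i++) { // n attempts have n - 1 waits between them
            waits.add((long) Math.min(next, maxDelay));
            next *= multiplier;
        }
        return waits;
    }

    /** random = true: scale each wait by a uniform factor in [1, multiplier), then cap again. */
    static List<Long> exponentialRandom(long delay, double multiplier, long maxDelay,
                                        int maxAttempts, Random rnd) {
        List<Long> waits = new ArrayList<>();
        for (long base : exponential(delay, multiplier, maxDelay, maxAttempts)) {
            double factor = 1 + rnd.nextDouble() * (multiplier - 1);
            waits.add((long) Math.min(base * factor, maxDelay));
        }
        return waits;
    }
}
```

With delay = 2000, multiplier = 2.0, maxDelay = 30000 and maxAttempts = 5, the deterministic waits are 2s, 4s, 8s, and 16s; with random = true each wait would land between that value and double it, still capped at 30s.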

@Recover is the fallback for exhausted retries. It must be in the same class as @Retryable, have a compatible return type, and accept the exception as its first parameter (with the same additional parameters as the retried method). Without @Recover, Spring Retry rethrows the last exception after exhausting attempts.

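A common gotcha: @Retryable doesn't work when the method is called from within the same class. Spring AOP creates a proxy around the bean, but a self-call goes straight to the target object and bypasses the proxy. The cleanest fix is to extract the retryable method into a dedicated infrastructure bean. The alternative is self-injection: the bean receives a reference to its own proxy (typically marked @Lazy to break the circular dependency) and calls the method through that reference.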

Self-invocation bypasses @Retryable

Calling a @Retryable method from within the same bean bypasses the AOP proxy and disables retry entirely. Extract retryable methods to a separate @Service or inject the bean into itself with @Lazy.
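The self-invocation trap is easy to reproduce with the JDK's own java.lang.reflect.Proxy. Spring may use a JDK dynamic proxy or a CGLIB subclass proxy, but in both cases the advice lives in the proxy while `this` inside the bean still points at the target object. Greeter, GreeterImpl, and SelfInvocationDemo are hypothetical names used only for the demonstration; the invocation handler counts calls to inner() the way retry advice would intercept them.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical service interface.
interface Greeter {
    String outer();
    String inner();
}

// Plain target object: outer() calls inner() on `this`, never on the proxy.
class GreeterImpl implements Greeter {
    @Override
    public String outer() {
        return "outer->" + inner();
    }

    @Override
    public String inner() {
        return "inner";
    }
}

// Hypothetical demo: the proxy plays the role of Spring's AOP proxy.
final class SelfInvocationDemo {
    private SelfInvocationDemo() {}

    /** Wraps target so every call to inner() that arrives through the proxy is counted. */
    static Greeter proxy(Greeter target, AtomicInteger innerInterceptions) {
        InvocationHandler handler = (proxyInstance, method, args) -> {
            if (method.getName().equals("inner")) {
                innerInterceptions.incrementAndGet(); // where retry advice would run
            }
            return method.invoke(target, args);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
    }
}
```

Calling inner() through the proxy is intercepted, but outer() returns "outer->inner" without the counter moving, which is exactly why a @Retryable method called from a sibling method in the same bean never retries.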

Production Insight

We discovered our @Retryable was silently doing nothing because the caller and callee were in the same class. Added an integration test that mocks the downstream to throw twice then succeed — caught the self-invocation bug immediately.

Key Takeaway

@Retryable is declarative and powerful, but requires AOP proxy — method must be called from a different Spring bean, and always enable random jitter on backoff.


Resilience4j Retry: Exponential Backoff with Jitter

Resilience4j is the production-grade choice when you need rich configurability, Micrometer metrics integration, reactive support, or composability with circuit breaker and bulkhead. It works well with Spring Boot's auto-configuration through the resilience4j-spring-boot3 starter.

Resilience4j Retry wraps a Supplier, Callable, Runnable, or Function and retries it on configurable exceptions. The annotation API (@Retry, supplied through the resilience4j-spring-boot3 starter) is the most convenient for Spring services. It works via AOP like Spring Retry, so it also needs spring-boot-starter-aop and has the same self-invocation limitation.

The power of Resilience4j is in its IntervalFunction options. Simple fixed delay is available, but the real value is randomized exponential backoff: IntervalFunction.ofExponentialRandomBackoff(initialInterval, multiplier, randomizationFactor) computes the base delay as initialInterval × multiplier^(attempt-1), then picks a random value within ±randomizationFactor of it (the default factor is 0.5, so 50% to 150% of the base). This breaks synchronization between retrying clients.
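The formula fits in a few lines. This approximates what ofExponentialRandomBackoff computes as described above; Resilience4j's exact rounding may differ, and RandomizedExponentialInterval is a hypothetical helper, not part of the library.

```java
import java.util.Random;

// Hypothetical helper approximating randomized exponential backoff.
final class RandomizedExponentialInterval {
    private RandomizedExponentialInterval() {}

    /** Base interval for a 1-based attempt number: initial * multiplier^(attempt - 1). */
    static double base(long initialMillis, double multiplier, int attempt) {
        return initialMillis * Math.pow(multiplier, attempt - 1);
    }

    /** Uniform pick in [base * (1 - factor), base * (1 + factor)]. */
    static long next(long initialMillis, double multiplier, double randomizationFactor,
                     int attempt, Random rnd) {
        double b = base(initialMillis, multiplier, attempt);
        double delta = randomizationFactor * b;
        return Math.round(b - delta + rnd.nextDouble() * 2 * delta);
    }
}
```

With 500ms, a multiplier of 2.0, and a factor of 0.5, the third interval has a base of 2000ms and lands anywhere from about 1000ms to 3000ms, so two clients that failed together rarely retry together.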

Resilience4j integrates with Micrometer out of the box — every retry instance exposes metrics: resilience4j.retry.calls (tagged with kind=successful_with_retry, failed_with_retry, successful_without_retry, failed_without_retry). This gives you precise visibility into retry rates in production without adding instrumentation code.

The most powerful production pattern is composing Retry with CircuitBreaker: wrap the retry in a circuit breaker so that when failure rate exceeds the threshold, the circuit opens and retry stops immediately rather than burning retry budget on a genuinely down service. Resilience4j's decorator API makes this composition clean.

Decorator order matters: Retry inside CircuitBreaker

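When composing, the circuit breaker should wrap the retry. This way, retry handles transient failures within a closed circuit, and the circuit breaker opens after too many retry exhaustions. If you invert the order, each retry attempt counts as a separate circuit breaker event, so the circuit opens sooner than the number of failed calls suggests. Note that with the Spring Boot annotations, Resilience4j's default aspect order is the inverse, Retry(CircuitBreaker(...)); to get this composition, adjust the retryAspectOrder and circuitBreakerAspectOrder properties, or use the functional Decorators API, where each with...() call wraps the ones added before it.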
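Counting how many failure events the circuit breaker sees makes the ordering difference visible. In this JDK-only sketch, FailureRecorder stands in for the circuit breaker's sliding window (it only counts outcomes and never opens) and SimpleRetry is a bare retry wrapper; both are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a circuit breaker's sliding window: it only records outcomes.
final class FailureRecorder {
    int failures;
    int successes;

    <T> Supplier<T> wrap(Supplier<T> op) {
        return () -> {
            try {
                T result = op.get();
                successes++;
                return result;
            } catch (RuntimeException e) {
                failures++;
                throw e;
            }
        };
    }
}

// Hypothetical bare retry wrapper: up to maxAttempts calls, rethrowing the last failure.
final class SimpleRetry {
    private SimpleRetry() {}

    static <T> Supplier<T> wrap(Supplier<T> op, int maxAttempts) {
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return op.get();
                } catch (RuntimeException e) {
                    last = e;
                }
            }
            throw last; // assumes maxAttempts >= 1
        };
    }
}
```

With an always-failing downstream and 3 attempts, the recorder wrapped around the retry sees 1 failure per call, while the recorder placed inside the retry sees 3.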

Production Insight

Switching from Spring Retry to Resilience4j gave us per-retry-instance Grafana dashboards via Micrometer. We caught that our payment retry rate had doubled week-over-week — the bank was degraded — 3 days before it became a full outage. Visibility is worth the migration.

Key Takeaway

Resilience4j's value over Spring Retry is Micrometer metrics out of the box and clean circuit breaker composition — if you're running Prometheus/Grafana, the metrics alone justify the switch.

Idempotency: The Prerequisite for Retry

No retry discussion is complete without idempotency — it's the non-negotiable prerequisite. Retrying a non-idempotent operation is worse than not retrying at all, because you get silent data corruption instead of visible failures. Before enabling retry on any operation, ask: 'If this executes twice, what happens?' For database reads and idempotent updates (set status = X where status = Y), retry is safe. For inserts, payment charges, or email sends, you must implement idempotency first.

Idempotency implementation has three layers. First, use the idempotency features built into external APIs: Stripe accepts an Idempotency-Key header that deduplicates charges for 24 hours; AWS S3 PUT operations are naturally idempotent; most modern payment processors support this. Always use these native capabilities.

Second, for your own APIs, implement idempotency keys at the application layer. The client generates a stable key (ideally from the business request: SHA256 of orderId + amount + currency), includes it in the request, and your service stores the key alongside the result in a database table. On retry, you detect the key exists and return the cached result without re-executing.

Third, use database-level idempotency for database operations: unique constraints prevent duplicate inserts; INSERT ... ON CONFLICT DO NOTHING (PostgreSQL) or INSERT IGNORE (MySQL) handle concurrent retries safely. For state transitions, WHERE clauses on current state (UPDATE orders SET status='CONFIRMED' WHERE id=? AND status='PENDING') make updates safe to retry.

For distributed systems, the idempotency key table should have an expiry TTL matched to your retry window. Don't keep idempotency records forever — 24-48 hours covers all reasonable retry scenarios and prevents unbounded growth.
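The application-layer pattern (store the key with the result, replay on a repeat, expire after the retry window) can be sketched in memory. IdempotencyStore is hypothetical: a production version would be a database table with a unique constraint on the key and a TTL cleanup job, and the injected clock exists only so expiry can be tested.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical in-memory stand-in for an idempotency_keys table.
final class IdempotencyStore<V> {
    private record Entry<R>(R result, Instant expiresAt) {}

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Supplier<Instant> clock;

    IdempotencyStore(Duration ttl, Supplier<Instant> clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    /**
     * Returns the stored result for key while it is unexpired; otherwise runs the operation
     * and stores its result. compute() serializes concurrent calls for the same key, and if
     * the operation throws, nothing is stored, so a later retry can try again.
     */
    V execute(String key, Supplier<V> operation) {
        Instant now = clock.get();
        Entry<V> entry = entries.compute(key, (k, existing) -> {
            if (existing != null && existing.expiresAt().isAfter(now)) {
                return existing; // replay: the side effect does not run again
            }
            return new Entry<>(operation.get(), now.plus(ttl));
        });
        return entry.result();
    }
}
```

Failures are not cached here; whether a retry of a failed request should replay the stored error instead is a decision for your API contract.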

Never derive idempotency keys from random UUIDs generated at call time

If you generate a new UUID each attempt, every retry looks like a new request — your idempotency key provides zero protection. Derive keys from stable business identifiers: orderId, customerId, amount, and action type.
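A sketch of a deterministic key builder using only the JDK (HexFormat needs Java 17). IdempotencyKeys and the exact field list are assumptions; the two details worth copying are the delimiter, which keeps ("12", "3") and ("1", "23") from producing the same input, and normalizing the amount so 10.0 and 10.00 hash the same.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper: same business request -> same key, on every attempt.
final class IdempotencyKeys {
    private IdempotencyKeys() {}

    /**
     * No timestamps, UUIDs or attempt numbers go in. The "|" delimiter keeps field
     * boundaries unambiguous, and stripTrailingZeros() makes 10.0 and 10.00 identical.
     */
    static String forCharge(String orderId, BigDecimal amount, String currency, String action) {
        String canonical = String.join("|",
                action, orderId, amount.stripTrailingZeros().toPlainString(), currency);
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(canonical.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```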

Production Insight

We had a nasty bug where retry + UUID idempotency keys charged customers twice. The Stripe dashboard showed two charges with different idempotency keys for the same order. Switched to SHA256(orderId + amount + currency) — problem eliminated permanently.

Key Takeaway

Idempotency is the prerequisite, not the afterthought. Implement it before enabling retry. Derive keys from stable business data, not random values generated at call time.


RetryTemplate: Programmatic Retry

While annotations cover most use cases, RetryTemplate gives you programmatic control over retry logic — useful for batch jobs, complex retry conditions, dynamic retry configuration based on runtime state, or testing retry behavior directly.

RetryTemplate is configurable with RetryPolicy (how many times and on what conditions) and BackOffPolicy (delay strategy). Common policies include SimpleRetryPolicy (max attempts), ExceptionClassifierRetryPolicy (different policies per exception type), and TimeoutRetryPolicy (retry until a wall-clock deadline). BackOff policies include FixedBackOffPolicy, ExponentialBackOffPolicy, and ExponentialRandomBackOffPolicy.

RetryTemplate's execute() method takes a RetryCallback (the operation) and optionally a RecoveryCallback (fallback). The RetryContext passed to the callback contains retry count and the last exception — useful for logging or conditional logic within the retried operation.
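The execute(RetryCallback, RecoveryCallback) contract can be mirrored in plain Java to show what the context carries. This is a hypothetical analogue, not Spring Retry's RetryTemplate: MiniRetryTemplate and SimpleRetryContext are invented names, and there is no backoff or exception classification.

```java
import java.util.function.Function;

/** Hypothetical analogue of RetryContext: failed attempts so far and the last failure. */
final class SimpleRetryContext {
    private int retryCount;
    private Throwable lastThrowable;

    int getRetryCount() { return retryCount; }

    Throwable getLastThrowable() { return lastThrowable; }

    void registerFailure(Throwable failure) {
        retryCount++;
        lastThrowable = failure;
    }
}

/** Hypothetical analogue of RetryTemplate.execute(RetryCallback, RecoveryCallback); no backoff. */
final class MiniRetryTemplate {
    private final int maxAttempts;

    MiniRetryTemplate(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    <T> T execute(Function<SimpleRetryContext, T> retryCallback,
                  Function<SimpleRetryContext, T> recoveryCallback) {
        SimpleRetryContext context = new SimpleRetryContext();
        while (context.getRetryCount() < maxAttempts) {
            try {
                return retryCallback.apply(context); // callback can read count and last failure
            } catch (RuntimeException e) {
                context.registerFailure(e);
            }
        }
        return recoveryCallback.apply(context); // all attempts failed
    }
}
```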

A powerful pattern is ExceptionClassifierRetryPolicy: map specific exception types to specific retry policies. Retry ServerException up to 5 times with exponential backoff; retry ThrottledException up to 10 times with longer delays; immediately propagate ValidationException without any retry.
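Here is the classification idea in plain Java. It is a hypothetical sketch, not ExceptionClassifierRetryPolicy itself: it maps exception types to a RetryRule (max attempts plus initial delay) and walks up the class hierarchy so the most specific registered type wins, in the spirit of Spring's subclass-aware classifiers. The exception names come from the example above; making ThrottledException a subclass of ServerException is an assumption to show the lookup order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical exception types from the example above.
class ServerException extends RuntimeException {
    ServerException(String message) { super(message); }
}

class ThrottledException extends ServerException {
    ThrottledException(String message) { super(message); }
}

class ValidationException extends RuntimeException {
    ValidationException(String message) { super(message); }
}

/** Hypothetical: how to retry one class of failure; maxAttempts = 1 means "no retry". */
record RetryRule(int maxAttempts, long initialDelayMillis) {}

// Hypothetical classifier.
final class ExceptionClassifier {
    private final Map<Class<? extends Throwable>, RetryRule> rules = new HashMap<>();
    private final RetryRule defaultRule;

    ExceptionClassifier(RetryRule defaultRule) {
        this.defaultRule = defaultRule;
    }

    ExceptionClassifier map(Class<? extends Throwable> type, RetryRule rule) {
        rules.put(type, rule);
        return this;
    }

    /** Walks up the class hierarchy so the most specific registered type wins. */
    RetryRule classify(Throwable failure) {
        for (Class<?> c = failure.getClass(); c != null; c = c.getSuperclass()) {
            RetryRule rule = rules.get(c);
            if (rule != null) {
                return rule;
            }
        }
        return defaultRule;
    }
}
```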

Use ExceptionClassifierRetryPolicy for mixed exception types

Different exceptions deserve different retry behavior: server errors (5xx) are worth retrying multiple times; throttling errors need longer delays and more attempts; other client errors (4xx) should never be retried. ExceptionClassifierRetryPolicy expresses this cleanly without if-else chains.

Production Insight

RetryTemplate with a RetryListener gave us per-attempt logging with retry count, exception message, and elapsed time — invaluable for debugging intermittent failures in batch jobs where annotation-based retry's logs weren't granular enough.

Key Takeaway

RetryTemplate for batch jobs and dynamic retry configurations; @Retryable for service-level declarative retry. Both support the same policies — choose based on whether you need runtime configurability.

Retry vs. Circuit Breaker: Drawing the Boundary

Retry and circuit breaker are complementary resilience patterns that solve different problems and must be combined to handle the full spectrum of failures. Understanding exactly where one ends and the other begins prevents over-retrying and amplifying load on already-stressed services.

Retry addresses transient failures: the downstream succeeded a moment ago and will succeed again soon. The failure is temporary — a brief network blip, a momentary pod restart, a short GC pause on the downstream. Retry waits and tries again, with the expectation of eventual success within a handful of attempts.

Circuit breaker addresses systemic failures: the downstream has been failing consistently for a meaningful period. Retrying in this state is counterproductive — you're adding load to an already-struggling system and burning your thread pool waiting for timeouts. The circuit breaker tracks failure rate over a sliding window, and when it exceeds a threshold (typically 50%), the circuit opens. In open state, calls fail immediately (without attempting the downstream) for a configured cooldown period. After cooldown, the circuit enters half-open, allows a small number of probe calls, and closes if they succeed.
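The three states can be sketched as a small state machine. This is a hypothetical, single-threaded illustration (MiniCircuitBreaker is not Resilience4j's CircuitBreaker): it uses a count-based window, allows a single probe call in half-open (Resilience4j lets you configure several), and takes an injected clock so the cooldown can be tested.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/**
 * Hypothetical count-based circuit breaker (single-threaded, one probe in HALF_OPEN).
 * CLOSED -> OPEN when the failure rate over the last windowSize calls reaches the threshold;
 * OPEN -> HALF_OPEN once the cooldown has elapsed; HALF_OPEN -> CLOSED or OPEN on the probe.
 */
final class MiniCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    static final class OpenCircuitException extends RuntimeException {
        OpenCircuitException() { super("circuit open: failing fast"); }
    }

    private final int windowSize;
    private final double failureRateThreshold;
    private final Duration cooldown;
    private final Supplier<Instant> clock;
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private Instant openedAt;

    MiniCircuitBreaker(int windowSize, double failureRateThreshold, Duration cooldown, Supplier<Instant> clock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.cooldown = cooldown;
        this.clock = clock;
    }

    State state() {
        if (state == State.OPEN && !clock.get().isBefore(openedAt.plus(cooldown))) {
            state = State.HALF_OPEN; // cooldown over: allow a probe
        }
        return state;
    }

    <T> T call(Supplier<T> op) {
        if (state() == State.OPEN) {
            throw new OpenCircuitException(); // the downstream is not called at all
        }
        try {
            T result = op.get();
            onResult(false);
            return result;
        } catch (RuntimeException e) {
            onResult(true);
            throw e;
        }
    }

    private void onResult(boolean failed) {
        if (state == State.HALF_OPEN) {
            if (failed) {
                trip();
            } else {
                state = State.CLOSED;
                window.clear();
            }
            return;
        }
        window.addLast(failed);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
        long failures = window.stream().filter(f -> f).count();
        if (window.size() == windowSize && (double) failures / windowSize >= failureRateThreshold) {
            trip();
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.get();
        window.clear();
    }
}
```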

The correct composition: circuit breaker wraps retry. For each call: the circuit breaker checks if the circuit is open (fail fast if so); if closed, the retry logic executes; if the retry exhausts without success, that counts as a circuit breaker failure event. This way retry handles transient failures within a healthy circuit, and the circuit breaker detects when retry is systematically failing and opens to stop the bleeding.

Key metrics to monitor: retry rate (what % of calls require at least one retry), retry success rate (what % of retried calls eventually succeed), circuit breaker state transitions (closed → open means systemic failure), and time in open state (how long services are failing fast). If retry success rate drops below 50%, your retry budget is being wasted on a systemic failure — the circuit breaker threshold needs tuning.
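The first two metrics fall straight out of the four resilience4j.retry.calls counters described earlier. RetryCallCounts is a hypothetical helper; how you read the counter values (a PromQL query, a MeterRegistry lookup) is left out.

```java
/** Hypothetical holder for resilience4j.retry.calls values, one field per `kind` tag. */
record RetryCallCounts(long successfulWithoutRetry, long successfulWithRetry,
                       long failedWithRetry, long failedWithoutRetry) {

    long total() {
        return successfulWithoutRetry + successfulWithRetry + failedWithRetry + failedWithoutRetry;
    }

    /** Share of all calls that needed at least one retry. */
    double retryRate() {
        long retried = successfulWithRetry + failedWithRetry;
        return total() == 0 ? Double.NaN : (double) retried / total();
    }

    /** Share of retried calls that eventually succeeded (NaN when nothing was retried). */
    double retrySuccessRate() {
        long retried = successfulWithRetry + failedWithRetry;
        return retried == 0 ? Double.NaN : (double) successfulWithRetry / retried;
    }

    /** The heuristic from the text: below 50%, retries are mostly hitting a systemic failure. */
    boolean retryBudgetWasted() {
        return retrySuccessRate() < 0.5;
    }
}
```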

Slow calls trigger circuit breaker too

A downstream that responds in 10 seconds instead of timing out is just as dangerous as one that fails. Configure slowCallRateThreshold and slowCallDurationThreshold alongside failure rate to open the circuit on slow responses — thread pool exhaustion from slow calls is as deadly as hard failures.

Production Insight

Before we added circuit breakers, our retry logic was making the problem worse during outages — 3 retries × 5-second timeouts × thousands of concurrent requests = thread pool exhaustion in 30 seconds. Circuit breaker cut that to zero thread waste in open state.

Key Takeaway

Retry for transient failures; circuit breaker for systemic failures. Always compose them: circuit breaker wraps retry. If retry success rate is low, your circuit breaker threshold needs tuning.

The Missing Piece: Backoff Policies That Don't Kill Your Backend

The default retry is a blunt instrument. Three attempts with a 1-second delay might work for a local database deadlock, but against a flaky external API under load it turns into a denial-of-service attack. Production incidents taught me to never retry without exponential backoff.

Spring Retry gives you multiple backoff strategies via the @Backoff annotation and, programmatically, ExponentialBackOffPolicy. The key setting is multiplier: each wait is the previous one times the multiplier (e.g., 2s, 4s, 8s), and maxDelay caps the ceiling. For high-throughput systems, combine this with jitter to avoid thundering herd problems on your dependencies.

@Retryable's default backoff is a fixed 1-second delay. That's fine for testing. In production, fixed backoff is how you accidentally DDoS your own database. Always pair @Retryable with an explicit @Backoff, for example @Backoff(delay = 1000, multiplier = 2.0, maxDelay = 10000). Your ops team will thank you when that AWS RDS failover happens at 3 AM.

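RetryWithBackoff.java

// io.thecodeforge — java tutorial
import java.util.concurrent.TimeoutException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

// Needs @EnableRetry on a configuration class. PaymentGateway, PaymentRequest and PaymentResponse are
// application types; PaymentGateway.charge is assumed to declare java.util.concurrent.TimeoutException.
@Service
public class PaymentClient {
    private static final Logger log = LoggerFactory.getLogger(PaymentClient.class);
    private final PaymentGateway paymentGateway;

    public PaymentClient(PaymentGateway paymentGateway) {
        this.paymentGateway = paymentGateway;
    }

    // 5 attempts in total, with waits of 2s, 4s, 8s and 16s between them (maxDelay caps at 30s).
    @Retryable(
        retryFor = TimeoutException.class,
        maxAttempts = 5,
        backoff = @Backoff(delay = 2000, multiplier = 2.0, maxDelay = 30000)
    )
    public PaymentResponse charge(PaymentRequest request) throws TimeoutException {
        return paymentGateway.charge(request);
    }

    // Runs after the last failed attempt: same return type, exception first, then the method's arguments.
    @Recover
    public PaymentResponse fallback(TimeoutException e, PaymentRequest request) {
        log.error("Payment failed after 5 attempts: {}", request.id(), e);
        return PaymentResponse.failed(request.id(), "GATEWAY_TIMEOUT");
    }
}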

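Output (illustrative, assuming every attempt throws TimeoutException)

Attempt 1 fails → wait 2s

Attempt 2 fails → wait 4s

Attempt 3 fails → wait 8s

Attempt 4 fails → wait 16s

Attempt 5 fails → attempts exhausted, @Recover fallback is called (the 30s maxDelay is never reached)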

Production Trap:

Backoff without jitter, fixed or exponential, causes synchronized retries across all instances. Add random = true to @Backoff to spread retry windows. For Kafka consumers, this prevents the entire cluster from hammering the DB at the same second.

Key Takeaway

Always specify multiplier in @Backoff. Default fixed delay is for demos, not production. Exponential backoff + jitter = survival.

Retry Configuration: Externalize It, Don't Hardcode It

How many times have you hotfixed a retry count at 2 AM because your payment provider started throttling? Hardcoded @Retryable(maxAttempts = 3) means a recompile and deploy cycle. Stop doing that.

Spring Retry supports externalized values through SpEL expression attributes: maxAttemptsExpression on @Retryable, plus delayExpression, maxDelayExpression, and multiplierExpression on @Backoff. You feed these from application.yml or environment variables, which pays off in multi-region or multi-tenant setups where environments have different SLOs. The pattern is simple: use expressions like #{${retry.payment.max-attempts}} and define the defaults in your properties file.

When your API starts returning 429s, you change the delay in configuration, not in code: no recompile, no pipeline. This is what separates a production battle station from a toy app. One caveat: the placeholder is resolved from the Spring Environment, and a templated #{...} expression is evaluated when the bean is initialized, so a running instance typically needs a restart (or a refresh that re-creates the bean) before the new value applies. Always externalize retry parameters. Your pager will appreciate the difference between a config change and a full deploy at 3 AM.

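ConfigurableRetry.java

// io.thecodeforge — java tutorial
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

// DatabaseException and NetworkException are application exceptions assumed to extend RuntimeException;
// StockRepository, CacheService and Stock are application types.
@Service
public class InventoryService {
    private static final Logger log = LoggerFactory.getLogger(InventoryService.class);
    private final StockRepository stockRepository;
    private final CacheService cacheService;

    public InventoryService(StockRepository stockRepository, CacheService cacheService) {
        this.stockRepository = stockRepository;
        this.cacheService = cacheService;
    }

    // Values come from application.yml; the part after ':' is the default when the property is missing.
    @Retryable(
        retryFor = {DatabaseException.class, NetworkException.class},
        maxAttemptsExpression = "#{${retry.inventory.max-attempts:4}}",
        backoff = @Backoff(
            delayExpression = "#{${retry.inventory.delay:2000}}",
            multiplierExpression = "#{${retry.inventory.multiplier:2.0}}"
        )
    )
    public Stock getStock(String sku) {
        return stockRepository.findById(sku).orElseThrow();
    }

    // Matches DatabaseException and NetworkException (and any other RuntimeException that reaches recovery).
    @Recover
    public Stock fallback(RuntimeException e, String sku) {
        log.warn("Returning stale stock for {} due to: {}", sku, e.getMessage());
        return Stock.stale(sku, cacheService.getLastKnownStock(sku));
    }
}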

Output

application.yml:

retry:
  inventory:
    max-attempts: 5
    delay: 1000
    multiplier: 3.0

→ New values apply without recompilation, after a restart (or a refresh that re-creates the bean).

Pro Tip:

For Spring Cloud Config or Kubernetes ConfigMap users, bind these properties to a @ConfigurationProperties class. Then reference that bean from @Retryable expressions (for example #{@retryProperties.maxAttempts}, where retryProperties is a hypothetical bean name). Avoid stringly-typed magic numbers in code.

Key Takeaway

Use @Retryable with SpEL expressions to externalize retry parameters. Config change > code change when the database is burning.

● Production incident · POST-MORTEM · severity: high

The Thundering Herd: Synchronized Retries Took Down the Auth Service

Symptom

Auth service returned to normal after a 30-second hardware issue. Traffic monitoring showed the service immediately received 50x normal request volume and crashed again within 10 seconds. The outage extended from 30 seconds to 8 minutes.

Assumption

The team assumed retry would make their services more resilient. They configured maxAttempts=3 with fixedDelay=1000ms across 80 microservices, all making auth calls.

Root cause

All 80 services had been simultaneously failing for 30 seconds. When auth recovered, all services retried their queued failures simultaneously — 80 services × 3 retry attempts × thousands of requests/second = a request storm that overwhelmed auth. Fixed delay (no jitter) meant all retries were synchronized to the same 1-second marks.

Fix

Switched to exponential backoff with jitter: initialInterval=500ms, multiplier=2.0, maxInterval=30s, and a random ±50% spread around each computed delay. Added a Resilience4j circuit breaker with a 50% failure-rate threshold to stop retrying when auth was truly down. The next auth outage lasted 35 seconds; services degraded gracefully and recovered smoothly.

Key lesson

Jitter is not optional — it's mandatory.
Without randomization in retry delays, synchronized failures create synchronized retry storms.
Always add jitter to backoff, and always pair retry with a circuit breaker to stop retrying a service that's genuinely down.
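A toy simulation shows why the fix worked. It does not model the incident's real traffic: RetryStormSimulation is hypothetical, every client fails at the same instant, and only the time of each client's first retry is examined. A randomization factor of 0 reproduces the fixed-delay setup; 0.5 mirrors the ±50% spread from the fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical simulation of synchronized vs jittered first retries.
final class RetryStormSimulation {
    private RetryStormSimulation() {}

    /**
     * All clients fail at t=0 and schedule one retry at baseDelay +/- (factor * baseDelay).
     * Returns the largest number of retries landing in any single bucketMillis-wide window.
     */
    static int peakRetriesPerBucket(int clients, long baseDelayMillis, double randomizationFactor,
                                    long bucketMillis, Random rnd) {
        Map<Long, Integer> perBucket = new HashMap<>();
        double delta = randomizationFactor * baseDelayMillis;
        int peak = 0;
        for (int i = 0; i < clients; i++) {
            long retryAt = Math.round(baseDelayMillis - delta + rnd.nextDouble() * 2 * delta);
            int count = perBucket.merge(retryAt / bucketMillis, 1, Integer::sum);
            peak = Math.max(peak, count);
        }
        return peak;
    }
}
```

With 10,000 clients and a 1-second base delay, the fixed setup puts every retry into the same 100ms window, while a ±50% spread caps any single window at roughly a tenth of that.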

Production debug guide · Symptom → root cause → fix · 5 entries

Symptom · 01

@Retryable has no effect — method fails immediately without retrying

→

Fix

Spring Retry requires the method to be called through a Spring proxy: calling a @Retryable method from within the same class bypasses the AOP proxy. Move the retryable method to a separate @Service bean and inject it. Also verify spring-retry and spring-boot-starter-aop are on the classpath, and @EnableRetry is on a @Configuration class or the main application class. Check that the exception thrown matches the retryFor list (value or include in older code) on @Retryable.

Symptom · 02

Resilience4j retry not recording metrics in Actuator

→

Fix

Confirm resilience4j-micrometer is on the classpath alongside spring-boot-starter-actuator. Add management.endpoints.web.exposure.include=health,metrics,retries to application.properties. Check that the instance name configured under resilience4j.retry.instances matches the name used in @Retry(name=...). Use curl http://localhost:8080/actuator/retries to list the registered retry instances, and curl http://localhost:8080/actuator/metrics/resilience4j.retry.calls to see the call counters.

Symptom · 03

Retry causes duplicate records or double charges

→

Fix

The retried operation is not idempotent. Immediately disable retry for this operation. Implement idempotency: derive a stable idempotency key from business data (for example order ID + action type; never include the attempt number, or every retry looks like a new request), persist it to a database before the external call, and check for an existing result on each attempt. For payment APIs, use the provider's idempotency key header. Only re-enable retry after verifying idempotency end-to-end.

Symptom · 04

Every retry exhausts, and a failing call takes roughly maxAttempts × timeout

→

Fix

The per-attempt timeout is too high relative to the overall latency budget. If each attempt has a 30-second connection timeout and you have maxAttempts=3, the worst case is 90 seconds plus the backoff waits. Reduce individual call timeouts to a small fraction of the retry budget. Alternatively, use Resilience4j TimeLimiter to bound execution time; it caps whatever it wraps, so it limits the total including retries only when it sits outside the retry. Check whether the downstream service is actually down (a circuit breaker is more appropriate than retry here).

Symptom · 05

Exponential backoff reaches maxInterval immediately and stays there

→

Fix

Once the computed delay exceeds maxInterval, every later wait is capped at maxInterval; that part is expected. If the cap is hit on the first retry or two, initialInterval or multiplier is too large relative to maxInterval, so verify all three values. If retries still exhaust without success, the issue is systemic (circuit breaker territory) rather than transient. Add a Resilience4j circuit breaker on top of retry, with a half-open probe after the cooldown window.
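The arithmetic behind symptom 04 is worth writing down for your own settings. RetryLatencyBudget is a hypothetical helper; it assumes the worst case, where every attempt runs until its timeout and every backoff wait is served in full.

```java
import java.util.List;

// Hypothetical helper for sizing timeouts against a retry policy.
final class RetryLatencyBudget {
    private RetryLatencyBudget() {}

    /**
     * Worst-case time a caller can be blocked: every attempt runs to its timeout and
     * every backoff wait between attempts is served in full.
     */
    static long worstCaseMillis(int maxAttempts, long perAttemptTimeoutMillis, List<Long> waitsBetweenAttempts) {
        if (waitsBetweenAttempts.size() != maxAttempts - 1) {
            throw new IllegalArgumentException("expected " + (maxAttempts - 1) + " waits");
        }
        long totalWaits = waitsBetweenAttempts.stream().mapToLong(Long::longValue).sum();
        return maxAttempts * perAttemptTimeoutMillis + totalWaits;
    }
}
```

With 3 attempts and waits of 1s and 2s, cutting the per-attempt timeout from 30s to 2s turns a 93-second worst case into 9 seconds under the same retry policy.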

★ Debug Cheat Sheet · Fast triage for Spring Boot retry issues

@Retryable not retrying

Immediate action

Check self-invocation and @EnableRetry annotation

Commands

grep -r '@EnableRetry' src/main/java/ --include='*.java'

grep -r 'spring-retry\|spring-boot-starter-aop' pom.xml build.gradle

Fix now

Add @EnableRetry to main class; move @Retryable method to separate @Service bean


Spring Retry vs. Resilience4j Retry

Feature	Spring Retry	Resilience4j
Annotation API	@Retryable / @Recover	@Retry(name=)
Programmatic API	RetryTemplate	RetryRegistry + `Retry.decorateSupplier()`
Backoff options	Fixed, exponential, random	Fixed, exponential, random, custom IntervalFunction
Micrometer metrics	None built-in	First-class — per-instance metrics
Circuit breaker	Basic @CircuitBreaker annotation; full-featured via Spring Cloud Circuit Breaker	Built-in, same library
Reactive support	Limited	Full (Mono/Flux decorators)
Setup complexity	Low — spring-retry + @EnableRetry	Medium — yaml config + dependency
Exception classification	retryFor / noRetryFor lists	retryExceptions / ignoreExceptions, or a retryOnException predicate
Best for	Simple service retry, quick setup	Production systems needing metrics and CB composition


Key takeaways

Idempotency is the prerequisite for retry

derive keys from stable business identifiers, never from random UUIDs generated at call time

Always enable jitter (random=true in @Backoff, ofExponentialRandomBackoff in Resilience4j) to prevent synchronized retry thundering herd

@Retryable requires AOP proxy

self-invocation silently disables retry; always call from a different Spring bean

Resilience4j + Micrometer gives you retry rate dashboards for free

if you're running Prometheus/Grafana, the migration from Spring Retry pays for itself in observability

Circuit breaker wraps retry in the composition order

circuit breaker detects systemic failure and opens; retry handles transient failures within a closed circuit

Common mistakes to avoid

6 patterns

Using @Retryable with self-invocation (calling from the same class)

Symptom

Method fails immediately without retrying; no retry log entries

Fix

Extract the @Retryable method to a separate @Service bean and inject it; self-calls bypass the AOP proxy

Fixed backoff delay without jitter across many clients

Symptom

After a downstream outage, all clients retry simultaneously creating a thundering herd

Fix

Add random=true to @Backoff or use IntervalFunction.ofExponentialRandomBackoff() — jitter desynchronizes retry waves

Retrying non-idempotent operations (payments, email sends, record creation)

Symptom

Duplicate charges, duplicate emails, or duplicate database records

Fix

Implement idempotency before enabling retry; use provider idempotency keys (Stripe) or an idempotency_keys table

Not configuring noRetryFor for client errors (4xx HTTP status)

Symptom

Retrying bad requests (400), unauthorized (401), not found (404) — wasting retry budget on non-transient failures

Fix

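Add HttpClientErrorException to noRetryFor/ignoreExceptions: most 4xx errors are not transient and should fail immediately. Handle 429 Too Many Requests (and usually 408 Request Timeout) separately, since throttling and request timeouts are transient and deserve a slower retry.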

Missing @Recover method — retries exhaust and throw the raw exception

Symptom

After retries exhaust, callers get raw exception types instead of graceful degradation

Fix

Always add @Recover for production retry logic; return a degraded result or throw a domain-specific exception with context

Retry without circuit breaker — amplifying load on a downed service

Symptom

During outages, retries create 3× the load on already-struggling downstream, extending the outage

Fix

Wrap retry with a circuit breaker; when failure rate exceeds threshold, stop retrying entirely and fail fast

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR

What is exponential backoff with jitter and why is jitter critical?

Q02 · JUNIOR

Why doesn't @Retryable work when calling the annotated method from withi...

Q03 · SENIOR

What must you implement before enabling retry on a payment charge operat...

Q04 · SENIOR

Explain the difference between retry and circuit breaker and how they co...

Q05 · SENIOR

How does Resilience4j retry composition differ from Spring Retry in term...

Q06 · SENIOR

Describe how you would implement exactly-once semantics for retry on a c...

Q07 · SENIOR

How would you use ExceptionClassifierRetryPolicy with RetryTemplate?

Q08 · SENIOR

What is the thundering herd problem in retry, and how do you prevent it?

Q01 of 08 · JUNIOR

What is exponential backoff with jitter and why is jitter critical?

ANSWER

Exponential backoff increases the delay between retry attempts geometrically — if initial delay is 500ms and multiplier is 2, delays are 500ms, 1s, 2s, 4s. Without jitter, all clients that failed simultaneously retry at the same exponential intervals, creating synchronized retry waves that can overwhelm a recovering service. Jitter adds randomness — multiplying the computed delay by a random factor — so clients retry at different times, distributing load instead of concentrating it.

