Senior · 8 min · May 23, 2026

Retry Mechanism in Spring Boot

Master Spring Boot retry: @Retryable, @Recover, Resilience4j with exponential backoff and jitter, RetryTemplate, idempotency, and retry vs circuit breaker patterns.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Use @Retryable for declarative retry on specific exceptions with configurable backoff and max attempts
  • Always pair @Retryable with @Recover to handle exhausted retries gracefully without bubbling exceptions
  • Add jitter to exponential backoff so that many clients failing at the same moment do not retry in lockstep and create a thundering herd
  • Idempotency is mandatory before enabling retry — retrying non-idempotent operations causes double charges, duplicate records, and data corruption
  • Use circuit breaker alongside retry: retry handles transient failures, circuit breaker stops retrying when a downstream is systemically down
What is a Retry Mechanism in Spring Boot?

A retry mechanism is a resilience pattern that automatically re-executes a failed operation under the assumption that the failure is transient — a network blip, a momentary service overload, a short database unavailability. Rather than propagating failures to the caller immediately, retry gives the operation multiple chances to succeed with configurable delays between attempts.
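Without any library, the core idea is a loop. The sketch below is plain Java with hypothetical names (SimpleRetry, retry) and a fixed delay. It is not the Spring Retry or Resilience4j API, only the behaviour both of them wrap: re-run the operation, pause between attempts, and rethrow the last failure once attempts run out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

final class SimpleRetry {
    private SimpleRetry() {}

    /**
     * Runs the operation up to maxAttempts times (the first call counts as attempt 1),
     * sleeping delayMillis between attempts. Rethrows the last failure when attempts run out.
     */
    static <T> T retry(Callable<T> operation, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // no wait after the final attempt
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Fails twice with a "transient" error, then succeeds on the third attempt
        String result = retry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 3, 10);
        System.out.println(result + " after " + calls.get() + " attempts"); // ok after 3 attempts
    }
}
```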


Spring Boot supports two complementary retry libraries. Spring Retry (part of the Spring ecosystem) provides annotation-driven retry via @Retryable and @Recover, plus a programmatic RetryTemplate API. It integrates naturally with Spring AOP. Resilience4j is a standalone fault tolerance library with a more functional API, richer configuration options (especially for bulkhead and rate limiting), and first-class Micrometer metrics that Spring Boot Actuator (spring-boot-starter-actuator) exposes.

Retry is a local resilience pattern — it handles transient failures at the call site. It pairs with circuit breaker (which handles systemic failures by opening the circuit after too many failures) and bulkhead (which limits concurrent calls to a downstream service).

In production, these three patterns work together: retry for the optimistic transient case, circuit breaker to fail fast when retry is futile, bulkhead to protect your thread pool from a slow downstream.

Plain-English First

Retry is like redialing a phone number when the line is busy. You don't give up on the first busy signal — you wait a moment and try again. Exponential backoff means each wait is longer than the last, so you're not hammering a struggling service. Jitter is like adding a random few seconds to your wait so you and 10,000 other callers don't all redial at exactly the same millisecond and crash the exchange again.
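A toy simulation makes the jitter argument concrete. It is plain Java with hypothetical names, not a library API. In this model, 10,000 clients fail at the same instant, and each picks its retry time either as a fixed 1,000 ms or as a "full jitter" delay drawn uniformly from [0, 1000) ms. The peak number of clients that land in any single 10 ms bucket shows how concentrated the retry wave is.

```java
import java.util.Arrays;
import java.util.Random;

final class JitterDemo {
    private JitterDemo() {}

    /** Largest number of clients whose retry lands in the same 10 ms bucket. */
    static int peakPerBucket(long[] retryAtMillis, long horizonMillis) {
        int[] buckets = new int[(int) (horizonMillis / 10) + 1];
        int peak = 0;
        for (long t : retryAtMillis) {
            int b = (int) (t / 10);
            buckets[b]++;
            peak = Math.max(peak, buckets[b]);
        }
        return peak;
    }

    /** Every client waits exactly the same delay. */
    static long[] fixedDelays(int clients, long delayMillis) {
        long[] out = new long[clients];
        Arrays.fill(out, delayMillis);
        return out;
    }

    /** Each client waits a uniform random delay in [0, capMillis). */
    static long[] fullJitterDelays(int clients, long capMillis, Random random) {
        long[] out = new long[clients];
        for (int i = 0; i < clients; i++) {
            out[i] = (long) (random.nextDouble() * capMillis);
        }
        return out;
    }

    public static void main(String[] args) {
        int clients = 10_000;
        long delay = 1_000;
        // Fixed delay: all 10,000 clients hit the same bucket
        System.out.println("fixed peak:  " + peakPerBucket(fixedDelays(clients, delay), delay));
        System.out.println("jitter peak: "
            + peakPerBucket(fullJitterDelays(clients, delay, new Random(42)), delay));
    }
}
```

With a fixed delay, every client lands in one bucket, so the peak is 10,000. With full jitter, the peak drops to roughly the average of 100 per bucket, plus some random spread.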

Your payment service calls a bank API. Network blips happen. The bank's load balancer hiccups for 200ms. Without retry, your customer sees 'Payment failed — please try again' for a transient error that would have resolved itself in a third of a second. With naive retry, all 10,000 concurrent failed requests retry simultaneously and take down the bank's already-struggling service.

Getting retry right is one of the most impactful resilience improvements you can make to a distributed system. Spring Boot provides two battle-tested options: Spring Retry (declarative, annotation-driven, lightweight) and Resilience4j (feature-rich, metrics-integrated, production-hardened). Both support exponential backoff, jitter, custom retry conditions, and fallback logic.

But retry is a sharp knife. Retry a non-idempotent operation and you get double charges. Retry without a circuit breaker and you amplify load on an already-down service. Retry with a fixed backoff across synchronized clients and you create a thundering herd. Every retry decision involves trade-offs that experienced engineers sweat over.

This guide walks through both Spring Retry and Resilience4j with complete production configurations, explains when to use each, covers the idempotency requirements that must come first, and draws the precise boundary between retry and circuit breaker. Code examples run on Spring Boot 3.x with Java 17+.

Spring Retry: @Retryable and @Recover

Spring Retry's annotation-driven API is the fastest path from zero to production retry. Add spring-retry and spring-boot-starter-aop to your classpath, put @EnableRetry on a @Configuration class (or the main application class), and annotate methods with @Retryable. Spring AOP wraps the bean in a proxy that intercepts calls, catches specified exceptions, and retries according to your configuration.

The @Retryable annotation has several key attributes: retryFor (also available as value, or include in older code) specifies which exception types trigger retry; noRetryFor (exclude in older code) lists exceptions that should propagate immediately without retry; maxAttempts (default 3) controls total attempts including the first; backoff configures the delay strategy between attempts.

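The @Backoff annotation controls timing: delay is the initial delay in milliseconds, multiplier enables exponential growth (delay × multiplier^n), and maxDelay caps the delay. Setting random=true adds jitter. With a multiplier, Spring Retry switches to ExponentialRandomBackOffPolicy, which stretches each computed delay by a random factor between 1 and the multiplier (still capped by maxDelay). Without a multiplier, delay plus maxDelay with random=true gives a uniformly random delay between the two. Always set random=true in production.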
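The arithmetic behind @Backoff(delay, multiplier, maxDelay, random = true) fits in a few lines. This plain-Java model uses hypothetical names and is not Spring's class. It follows the ExponentialRandomBackOffPolicy behaviour described above: each deterministic exponential delay is stretched by a uniform factor between 1 and the multiplier, then capped at maxDelay.

```java
import java.util.Random;

final class SpringStyleBackoff {
    private SpringStyleBackoff() {}

    /** Deterministic wait before retry n (n = 0 is the wait after the first failed attempt). */
    static long exponential(long delay, double multiplier, long maxDelay, int n) {
        double raw = delay * Math.pow(multiplier, n);
        return (long) Math.min(raw, maxDelay);
    }

    /** Same wait stretched by a uniform factor in [1, multiplier), then capped at maxDelay. */
    static long exponentialRandom(long delay, double multiplier, long maxDelay, int n, Random random) {
        long base = exponential(delay, multiplier, maxDelay, n);
        double factor = 1 + random.nextDouble() * (multiplier - 1);
        return (long) Math.min(base * factor, maxDelay);
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int n = 0; n < 4; n++) {
            System.out.printf("retry %d: deterministic %d ms, randomized %d ms%n",
                n + 1,
                exponential(1000, 2.0, 10000, n),
                exponentialRandom(1000, 2.0, 10000, n, random));
        }
    }
}
```

With delay = 1000, multiplier = 2.0 and maxDelay = 10000, the deterministic waits are 1s, 2s, 4s, 8s, and then 10s (capped). The randomized variant lands each wait somewhere between that value and twice that value, never above the cap.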

@Recover is the fallback for exhausted retries. It must be in the same class as @Retryable, have a compatible return type, and accept the exception as its first parameter (with the same additional parameters as the retried method). Without @Recover, Spring Retry rethrows the last exception after exhausting attempts.

A common gotcha: @Retryable doesn't work when the method is called from within the same class. Spring AOP creates a proxy around the bean, but self-calls go through this and bypass the proxy. Either inject the bean into itself (a self-reference marked @Lazy) and call through that reference, or, better, extract the retryable method into a dedicated infrastructure class.
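The self-invocation trap is easy to reproduce without Spring. The sketch below uses a JDK dynamic proxy (java.lang.reflect.Proxy) as a stand-in for the AOP proxy Spring builds around a @Retryable bean. Spring may use a CGLIB subclass proxy instead, but the self-call problem is the same. The interface and class names are hypothetical. The handler counts intercepted calls, which is where retry advice would run: a call arriving through the proxy is counted, while the internal inner() self-call goes straight to the target.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

interface StockClient {
    String outer();
    String inner();
}

final class StockClientImpl implements StockClient {
    @Override
    public String outer() {
        return "outer->" + inner(); // self-call: 'this' is the target, not the proxy
    }

    @Override
    public String inner() {
        return "inner";
    }
}

final class SelfInvocationDemo {
    private SelfInvocationDemo() {}

    /** Wraps the target in a proxy that counts every call it intercepts. */
    static StockClient proxy(StockClient target, AtomicInteger intercepted) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.incrementAndGet(); // retry advice would wrap the call here
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the target's own exception
            }
        };
        return (StockClient) Proxy.newProxyInstance(
            StockClient.class.getClassLoader(), new Class<?>[] {StockClient.class}, handler);
    }

    public static void main(String[] args) {
        AtomicInteger intercepted = new AtomicInteger();
        StockClient client = proxy(new StockClientImpl(), intercepted);
        client.outer(); // intercepted once; its inner() self-call is not
        System.out.println("after outer(): " + intercepted.get()); // 1
        client.inner(); // called through the proxy, so it is intercepted
        System.out.println("after inner(): " + intercepted.get()); // 2
    }
}
```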

Self-invocation bypasses @Retryable
Calling a @Retryable method from within the same bean bypasses the AOP proxy and disables retry entirely. Extract retryable methods to a separate @Service or inject the bean into itself with @Lazy.
Production Insight
We discovered our @Retryable was silently doing nothing because the caller and callee were in the same class. Added an integration test that mocks the downstream to throw twice then succeed — caught the self-invocation bug immediately.
Key Takeaway
@Retryable is declarative and powerful, but requires AOP proxy — method must be called from a different Spring bean, and always enable random jitter on backoff.

Resilience4j Retry: Exponential Backoff with Jitter

Resilience4j is the production-grade choice when you need rich configurability, Micrometer metrics integration, reactive support, or composability with circuit breaker and bulkhead. It works well with Spring Boot's auto-configuration through the resilience4j-spring-boot3 starter.

Resilience4j Retry wraps a Supplier, Callable, or function and retries it on configurable exceptions. The Java annotation API (@Retry from resilience4j-spring-boot3) is the most convenient for Spring services — it works via AOP similarly to Spring Retry.

The power of Resilience4j is in its IntervalFunction options. Simple fixed delay is available, but the real value is exponential randomized backoff. IntervalFunction.ofExponentialRandomBackoff(initialInterval, multiplier, randomizationFactor) computes the delay as initialInterval × multiplier^n, then picks a random value within ±randomizationFactor of it. With a factor of 0.5, that means anywhere from 50% to 150% of the computed delay. This breaks synchronization between retrying clients.
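Here is a plain-Java model of that formula with hypothetical names; the real entry point is IntervalFunction.ofExponentialRandomBackoff. The model assumes attempts are numbered from 1, so attempt 1 waits initialInterval, and it leaves out any maximum-interval cap. Each wait is drawn uniformly between interval × (1 - randomizationFactor) and interval × (1 + randomizationFactor).

```java
import java.util.Random;

final class RandomizedBackoff {
    private RandomizedBackoff() {}

    /** Exponential interval for attempt >= 1: initial * multiplier^(attempt - 1). */
    static double exponential(long initialMillis, double multiplier, int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        return initialMillis * Math.pow(multiplier, attempt - 1);
    }

    /** Uniform pick within +/- randomizationFactor of the exponential interval. */
    static long randomized(long initialMillis, double multiplier, double randomizationFactor,
                           int attempt, Random random) {
        double interval = exponential(initialMillis, multiplier, attempt);
        double delta = randomizationFactor * interval;
        double min = interval - delta;
        double max = interval + delta;
        return (long) (min + random.nextDouble() * (max - min));
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int attempt = 1; attempt <= 4; attempt++) {
            System.out.printf("attempt %d: base %.0f ms, randomized %d ms%n", attempt,
                exponential(500, 2.0, attempt), randomized(500, 2.0, 0.5, attempt, random));
        }
    }
}
```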

Resilience4j integrates with Micrometer out of the box — every retry instance exposes metrics: resilience4j.retry.calls (tagged with kind=successful_with_retry, failed_with_retry, successful_without_retry, failed_without_retry). This gives you precise visibility into retry rates in production without adding instrumentation code.

The most powerful production pattern is composing Retry with CircuitBreaker: wrap the retry in a circuit breaker so that when failure rate exceeds the threshold, the circuit opens and retry stops immediately rather than burning retry budget on a genuinely down service. Resilience4j's decorator API makes this composition clean.

Decorator order matters: Retry inside CircuitBreaker
When composing, the circuit breaker should wrap the retry. This way: retry handles transient failures within a closed circuit; circuit breaker opens after too many retry exhaustions. If you invert the order, each retry attempt counts as a separate circuit breaker event, causing the circuit to open prematurely.
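A small experiment makes the ordering argument concrete. Everything here is hypothetical and deliberately simplified: CountingBreaker only counts the failures it observes (it has no opening logic), and retry() is a bare loop. Wrapping a permanently failing call in both orders shows how many failure events the breaker records for a single logical request. If you use Resilience4j's Spring annotations rather than its decorator API, the nesting is decided by aspect order, which is configurable through properties such as resilience4j.retry.retryAspectOrder and resilience4j.circuitbreaker.circuitBreakerAspectOrder. It is worth confirming which order your setup actually produces.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class DecoratorOrderDemo {
    private DecoratorOrderDemo() {}

    /** Stand-in for a circuit breaker: records one failure event per exception it sees. */
    static final class CountingBreaker {
        final AtomicInteger failures = new AtomicInteger();

        <T> Supplier<T> decorate(Supplier<T> call) {
            return () -> {
                try {
                    return call.get();
                } catch (RuntimeException e) {
                    failures.incrementAndGet();
                    throw e;
                }
            };
        }
    }

    /** Bare retry decorator: up to maxAttempts calls, rethrowing the last failure. */
    static <T> Supplier<T> retry(Supplier<T> call, int maxAttempts) {
        return () -> {
            RuntimeException last = null;
            for (int i = 0; i < maxAttempts; i++) {
                try {
                    return call.get();
                } catch (RuntimeException e) {
                    last = e;
                }
            }
            throw last;
        };
    }

    /** Makes one logical request against a downstream that always fails. */
    static int failuresSeen(boolean breakerOutside) {
        CountingBreaker breaker = new CountingBreaker();
        Supplier<String> downstream = () -> { throw new IllegalStateException("down"); };
        Supplier<String> decorated = breakerOutside
            ? breaker.decorate(retry(downstream, 3))   // CircuitBreaker(Retry(call))
            : retry(breaker.decorate(downstream), 3);  // Retry(CircuitBreaker(call))
        try {
            decorated.get();
        } catch (IllegalStateException expected) {
            // every attempt fails by design in this experiment
        }
        return breaker.failures.get();
    }

    public static void main(String[] args) {
        System.out.println("breaker wraps retry: " + failuresSeen(true));  // 1
        System.out.println("retry wraps breaker: " + failuresSeen(false)); // 3
    }
}
```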
Production Insight
Switching from Spring Retry to Resilience4j gave us per-retry-instance Grafana dashboards via Micrometer. We caught that our payment retry rate had doubled week-over-week — the bank was degraded — 3 days before it became a full outage. Visibility is worth the migration.
Key Takeaway
Resilience4j's value over Spring Retry is Micrometer metrics out of the box and clean circuit breaker composition — if you're running Prometheus/Grafana, the metrics alone justify the switch.

Idempotency: The Prerequisite for Retry

No retry discussion is complete without idempotency — it's the non-negotiable prerequisite. Retrying a non-idempotent operation is worse than not retrying at all, because you get silent data corruption instead of visible failures. Before enabling retry on any operation, ask: 'If this executes twice, what happens?' For database reads and idempotent updates (set status = X where status = Y), retry is safe. For inserts, payment charges, or email sends, you must implement idempotency first.

Idempotency implementation has three layers. First, use the idempotency features built into external APIs: Stripe accepts an Idempotency-Key header that deduplicates charges for 24 hours; AWS S3 PUT operations are naturally idempotent; most modern payment processors support this. Always use these native capabilities.

Second, for your own APIs, implement idempotency keys at the application layer. The client generates a stable key (ideally from the business request: SHA256 of orderId + amount + currency), includes it in the request, and your service stores the key alongside the result in a database table. On retry, you detect the key exists and return the cached result without re-executing.
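Here is a minimal sketch of that application-layer scheme. It assumes an in-memory map standing in for the database table; a real implementation needs a durable store, a unique constraint on the key, and an insert of the key row before the external call, rather than computeIfAbsent around it. IdempotentPayments, chargeOnce and the field layout are hypothetical. The key is SHA-256 over orderId, amount and currency, joined with a separator so that "12" + "34" cannot collide with "1" + "234". The amount is normalized first so that 49.99 and 49.990 produce the same key.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class IdempotentPayments {
    /** Stand-in for an idempotency_keys table: key -> stored result. */
    private final Map<String, String> results = new ConcurrentHashMap<>();
    final AtomicInteger executions = new AtomicInteger();

    /** Stable key from business data: identical requests always hash to the same key. */
    static String idempotencyKey(String orderId, BigDecimal amount, String currency) {
        String canonical = orderId + "|" + amount.stripTrailingZeros().toPlainString() + "|" + currency;
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(sha256.digest(canonical.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** Executes the charge at most once per key; a retry gets the stored result back. */
    String chargeOnce(String orderId, BigDecimal amount, String currency, Supplier<String> charge) {
        String key = idempotencyKey(orderId, amount, currency);
        return results.computeIfAbsent(key, k -> {
            executions.incrementAndGet();
            return charge.get();
        });
    }

    public static void main(String[] args) {
        IdempotentPayments payments = new IdempotentPayments();
        BigDecimal amount = new BigDecimal("49.99");
        String first = payments.chargeOnce("order-42", amount, "USD", () -> "charge-001");
        String retried = payments.chargeOnce("order-42", amount, "USD", () -> "charge-002");
        // charge-001 / charge-001 / executions=1
        System.out.println(first + " / " + retried + " / executions=" + payments.executions.get());
    }
}
```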

Third, use database-level idempotency for database operations: unique constraints prevent duplicate inserts; INSERT ... ON CONFLICT DO NOTHING (PostgreSQL) or INSERT IGNORE (MySQL) handle concurrent retries safely. For state transitions, WHERE clauses on current state (UPDATE orders SET status='CONFIRMED' WHERE id=? AND status='PENDING') make updates safe to retry.
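The guarded UPDATE is retry-safe because it is a compare-and-set: the write happens only if the row is still in the expected state. An in-memory analogue using ConcurrentHashMap.replace(key, expectedValue, newValue), which has exactly that semantics, shows why replaying it is harmless. OrderStates and its status values are hypothetical. In production, the database enforces this guarantee, not application memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class OrderStates {
    enum Status { PENDING, CONFIRMED }

    private final Map<Long, Status> orders = new ConcurrentHashMap<>();

    void create(long id) {
        orders.put(id, Status.PENDING);
    }

    /**
     * Analogue of: UPDATE orders SET status='CONFIRMED' WHERE id=? AND status='PENDING'.
     * Returns true if this call made the transition (like "1 row updated") and false if
     * an earlier attempt already did it or the order does not exist (like "0 rows updated").
     */
    boolean confirm(long id) {
        return orders.replace(id, Status.PENDING, Status.CONFIRMED);
    }

    Status status(long id) {
        return orders.get(id);
    }

    public static void main(String[] args) {
        OrderStates store = new OrderStates();
        store.create(7L);
        System.out.println(store.confirm(7L)); // true: the first attempt performs the transition
        System.out.println(store.confirm(7L)); // false: a retry finds nothing left to do
        System.out.println(store.status(7L));  // CONFIRMED
    }
}
```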

For distributed systems, the idempotency key table should have an expiry TTL matched to your retry window. Don't keep idempotency records forever — 24-48 hours covers all reasonable retry scenarios and prevents unbounded growth.

Never derive idempotency keys from random UUIDs generated at call time
If you generate a new UUID each attempt, every retry looks like a new request — your idempotency key provides zero protection. Derive keys from stable business identifiers: orderId, customerId, amount, and action type.
Production Insight
We had a nasty bug where retry + UUID idempotency keys charged customers twice. The Stripe dashboard showed two charges with different idempotency keys for the same order. Switched to SHA256(orderId + amount + currency) — problem eliminated permanently.
Key Takeaway
Idempotency is the prerequisite, not the afterthought. Implement it before enabling retry. Derive keys from stable business data, not random values generated at call time.

RetryTemplate: Programmatic Retry

While annotations cover most use cases, RetryTemplate gives you programmatic control over retry logic — useful for batch jobs, complex retry conditions, dynamic retry configuration based on runtime state, or testing retry behavior directly.

RetryTemplate is configurable with RetryPolicy (how many times and on what conditions) and BackOffPolicy (delay strategy). Common policies include SimpleRetryPolicy (max attempts), ExceptionClassifierRetryPolicy (different policies per exception type), and TimeoutRetryPolicy (retry until a wall-clock deadline). BackOff policies include FixedBackOffPolicy, ExponentialBackOffPolicy, and ExponentialRandomBackOffPolicy.

RetryTemplate's execute() method takes a RetryCallback (the operation) and optionally a RecoveryCallback (fallback). The RetryContext passed to the callback contains retry count and the last exception — useful for logging or conditional logic within the retried operation.
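The callback shapes are easier to see in a stripped-down model. MiniRetryTemplate, Context, Callback and Recovery below are hypothetical plain-Java stand-ins that mimic the roles of RetryCallback, RecoveryCallback and RetryContext (retry count plus last exception). They are not Spring's types. The real RetryTemplate also applies a BackOffPolicy between attempts, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniRetryTemplate {
    /** What the callbacks can inspect: attempts failed so far and the last failure. */
    record Context(int retryCount, Throwable lastThrowable) {}

    @FunctionalInterface
    interface Callback<T> {
        T doWithRetry(Context context) throws Exception;
    }

    @FunctionalInterface
    interface Recovery<T> {
        T recover(Context context);
    }

    private final int maxAttempts;

    MiniRetryTemplate(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    <T> T execute(Callback<T> callback, Recovery<T> recovery) {
        Context context = new Context(0, null);
        while (context.retryCount() < maxAttempts) {
            try {
                return callback.doWithRetry(context);
            } catch (Exception e) {
                context = new Context(context.retryCount() + 1, e); // record the failure
            }
        }
        return recovery.recover(context); // attempts exhausted: fall back
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        String result = new MiniRetryTemplate(3).execute(
            ctx -> {
                log.add("attempt " + (ctx.retryCount() + 1));
                throw new IllegalStateException("downstream unavailable");
            },
            ctx -> "fallback after " + ctx.retryCount() + " failures: " + ctx.lastThrowable().getMessage());
        // attempt 1, attempt 2, attempt 3, then "fallback after 3 failures: downstream unavailable"
        log.forEach(System.out::println);
        System.out.println(result);
    }
}
```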

A powerful pattern is ExceptionClassifierRetryPolicy: map specific exception types to specific retry policies. Retry ServerException up to 5 times with exponential backoff; retry ThrottledException up to 10 times with longer delays; immediately propagate ValidationException without any retry.
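The core of what ExceptionClassifierRetryPolicy does can be modeled with a map from exception type to an attempt budget, looked up by walking up the class hierarchy so that the most specific registered type wins. This is a hypothetical plain-Java analogue, not Spring's classifier. ServerException, ThrottledException and ValidationException are the illustrative types named above, and BadGatewayException is an extra hypothetical subclass that shows inheritance. The budgets read "up to 5 times" and "up to 10 times" as total attempts, and treat unknown exceptions as non-retryable (1 attempt).

```java
import java.util.HashMap;
import java.util.Map;

class ServerException extends RuntimeException {
    ServerException(String message) { super(message); }
}

class BadGatewayException extends ServerException {
    BadGatewayException(String message) { super(message); }
}

class ThrottledException extends RuntimeException {
    ThrottledException(String message) { super(message); }
}

class ValidationException extends RuntimeException {
    ValidationException(String message) { super(message); }
}

final class ExceptionBudget {
    private final Map<Class<? extends Throwable>, Integer> maxAttempts = new HashMap<>();
    private final int defaultAttempts;

    ExceptionBudget(int defaultAttempts) {
        this.defaultAttempts = defaultAttempts;
    }

    ExceptionBudget on(Class<? extends Throwable> type, int attempts) {
        maxAttempts.put(type, attempts);
        return this;
    }

    /** Walks from the thrown class up to Object; the first registered type wins. */
    int attemptsFor(Throwable error) {
        for (Class<?> c = error.getClass(); c != null; c = c.getSuperclass()) {
            Integer attempts = maxAttempts.get(c);
            if (attempts != null) {
                return attempts;
            }
        }
        return defaultAttempts;
    }

    public static void main(String[] args) {
        ExceptionBudget budget = new ExceptionBudget(1)
            .on(ServerException.class, 5)
            .on(ThrottledException.class, 10)
            .on(ValidationException.class, 1);
        System.out.println(budget.attemptsFor(new BadGatewayException("502"))); // 5, inherited
        System.out.println(budget.attemptsFor(new ThrottledException("429"))); // 10
        System.out.println(budget.attemptsFor(new ValidationException("bad"))); // 1, no retry
    }
}
```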

Use ExceptionClassifierRetryPolicy for mixed exception types
Different exceptions deserve different retry behavior: server errors (5xx) are worth retrying multiple times; throttling errors need longer delays and more attempts; other client errors (4xx) should not be retried. ExceptionClassifierRetryPolicy expresses this cleanly without if-else chains.
Production Insight
RetryTemplate with a RetryListener gave us per-attempt logging with retry count, exception message, and elapsed time — invaluable for debugging intermittent failures in batch jobs where annotation-based retry's logs weren't granular enough.
Key Takeaway
RetryTemplate for batch jobs and dynamic retry configurations; @Retryable for service-level declarative retry. Both support the same policies — choose based on whether you need runtime configurability.

Retry vs. Circuit Breaker: Drawing the Boundary

Retry and circuit breaker are complementary resilience patterns that solve different problems and must be combined to handle the full spectrum of failures. Understanding exactly where one ends and the other begins prevents over-retrying and amplifying load on already-stressed services.

Retry addresses transient failures: the downstream succeeded a moment ago and will succeed again soon. The failure is temporary — a brief network blip, a momentary pod restart, a short GC pause on the downstream. Retry waits and tries again, with the expectation of eventual success within a handful of attempts.

Circuit breaker addresses systemic failures: the downstream has been failing consistently for a meaningful period. Retrying in this state is counterproductive — you're adding load to an already-struggling system and burning your thread pool waiting for timeouts. The circuit breaker tracks failure rate over a sliding window, and when it exceeds a threshold (typically 50%), the circuit opens. In open state, calls fail immediately (without attempting the downstream) for a configured cooldown period. After cooldown, the circuit enters half-open, allows a small number of probe calls, and closes if they succeed.
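The state machine described above can be sketched in plain Java. This is a deliberately small, hypothetical model, not Resilience4j's implementation. It uses a count-based window, runs single-threaded, takes the clock from the caller, and lets one probe decide the half-open outcome. Resilience4j adds time-based windows, slow-call tracking, thread safety and configurable half-open probe counts. The parameter names echo Resilience4j's concepts, but the class is illustrative only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;              // number of recent calls evaluated
    private final double failureRateThreshold; // 0.5 means open at >= 50% failures
    private final long openDurationMillis;     // cooldown spent in OPEN before probing
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;

    TinyCircuitBreaker(int windowSize, double failureRateThreshold, long openDurationMillis) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openDurationMillis = openDurationMillis;
    }

    State state() {
        return state;
    }

    /** May a call reach the downstream at time now? OPEN turns HALF_OPEN after the cooldown. */
    boolean tryAcquire(long now) {
        if (state == State.OPEN && now - openedAt >= openDurationMillis) {
            state = State.HALF_OPEN;
        }
        return state != State.OPEN; // OPEN fails fast without touching the downstream
    }

    /** Records the outcome of a permitted call. */
    void onResult(boolean failed, long now) {
        if (state == State.HALF_OPEN) {
            if (failed) {
                open(now); // probe failed: back to OPEN for another cooldown
            } else {
                state = State.CLOSED; // probe succeeded: resume normal traffic
                window.clear();
            }
            return;
        }
        window.addLast(failed);
        if (window.size() > windowSize) {
            window.removeFirst(); // slide the count-based window
        }
        if (window.size() == windowSize) {
            long failures = window.stream().filter(f -> f).count();
            if ((double) failures / windowSize >= failureRateThreshold) {
                open(now);
            }
        }
    }

    private void open(long now) {
        state = State.OPEN;
        openedAt = now;
        window.clear();
    }

    public static void main(String[] args) {
        TinyCircuitBreaker breaker = new TinyCircuitBreaker(4, 0.5, 1_000);
        for (int i = 0; i < 4; i++) {
            breaker.onResult(i % 2 == 0, 0); // 2 of 4 calls fail
        }
        System.out.println(breaker.state());           // OPEN (50% >= threshold)
        System.out.println(breaker.tryAcquire(500));   // false: still cooling down
        System.out.println(breaker.tryAcquire(1_000)); // true: HALF_OPEN probe allowed
    }
}
```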

The correct composition: circuit breaker wraps retry. For each call: the circuit breaker checks if the circuit is open (fail fast if so); if closed, the retry logic executes; if the retry exhausts without success, that counts as a circuit breaker failure event. This way retry handles transient failures within a healthy circuit, and the circuit breaker detects when retry is systematically failing and opens to stop the bleeding.

Key metrics to monitor: retry rate (what % of calls require at least one retry), retry success rate (what % of retried calls eventually succeed), circuit breaker state transitions (closed → open means systemic failure), and time in open state (how long services are failing fast). If retry success rate drops below 50%, your retry budget is being wasted on a systemic failure — the circuit breaker threshold needs tuning.
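Given the four call counters that Resilience4j publishes under resilience4j.retry.calls, the two retry ratios in that paragraph follow directly. The helper below is a hypothetical plain-Java calculation with made-up sample numbers, not an API. Retry rate is (successful_with_retry + failed_with_retry) / all calls. Retry success rate is successful_with_retry / (successful_with_retry + failed_with_retry).

```java
final class RetryStats {
    private RetryStats() {}

    /** The four kinds tagged on resilience4j.retry.calls. */
    record Counts(long successfulWithoutRetry, long successfulWithRetry,
                  long failedWithoutRetry, long failedWithRetry) {
        long total() {
            return successfulWithoutRetry + successfulWithRetry + failedWithoutRetry + failedWithRetry;
        }

        long retried() {
            return successfulWithRetry + failedWithRetry;
        }
    }

    /** Share of all calls that needed at least one retry. */
    static double retryRate(Counts c) {
        return c.total() == 0 ? 0.0 : (double) c.retried() / c.total();
    }

    /** Share of retried calls that eventually succeeded (undefined when nothing was retried). */
    static double retrySuccessRate(Counts c) {
        return c.retried() == 0 ? Double.NaN : (double) c.successfulWithRetry() / c.retried();
    }

    public static void main(String[] args) {
        // Made-up sample: 10,000 calls, 600 needed a retry, 240 of those still failed
        Counts counts = new Counts(9_350, 360, 50, 240);
        // prints "retry rate 6.0%, retry success rate 60.0%"
        System.out.printf("retry rate %.1f%%, retry success rate %.1f%%%n",
            100 * retryRate(counts), 100 * retrySuccessRate(counts));
    }
}
```

By the paragraph's rule of thumb, 60% is above the 50% line, so the retry budget in this sample is still mostly buying successes.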

Slow calls trigger circuit breaker too
A downstream that responds in 10 seconds instead of timing out is just as dangerous as one that fails. Configure slowCallRateThreshold and slowCallDurationThreshold alongside failure rate to open the circuit on slow responses — thread pool exhaustion from slow calls is as deadly as hard failures.
Production Insight
Before we added circuit breakers, our retry logic was making the problem worse during outages — 3 retries × 5-second timeouts × thousands of concurrent requests = thread pool exhaustion in 30 seconds. Circuit breaker cut that to zero thread waste in open state.
Key Takeaway
Retry for transient failures; circuit breaker for systemic failures. Always compose them: circuit breaker wraps retry. If retry success rate is low, your circuit breaker threshold needs tuning.

The Missing Piece: Backoff Policies That Don't Kill Your Backend

The default retry is a blunt instrument. Three retries with a 1-second delay might work for a local database deadlock, but against a flaky external API it becomes a denial-of-service attack. Production incidents taught me to never retry without exponential backoff.

Spring Retry gives you multiple backoff strategies via the @Backoff annotation and ExponentialBackOffPolicy. The key setting is multiplier: each delay is the previous one multiplied (e.g., 2s, 4s, 8s). Add maxDelay to cap the ceiling, and for high-throughput systems combine this with jitter to avoid thundering herd problems on your dependencies.

The default backoff is fixed. That's fine for testing. In production, fixed backoff is how you accidentally DDoS your own database. Always pair @Retryable with something like @Backoff(delay = 1000, multiplier = 2.0, maxDelay = 10000, random = true). Your ops team will thank you when that AWS RDS failover happens at 3 AM.

PaymentClient.java
// io.thecodeforge — java tutorial
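import java.util.concurrent.TimeoutException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

@Service
public class PaymentClient {
    private static final Logger log = LoggerFactory.getLogger(PaymentClient.class);
    // PaymentGateway, PaymentRequest and PaymentResponse are application types (not shown)
    private final PaymentGateway paymentGateway;

    public PaymentClient(PaymentGateway paymentGateway) {
        this.paymentGateway = paymentGateway;
    }

    // 5 attempts in total, with waits of 2s, 4s, 8s and 16s between them
    @Retryable(
        retryFor = TimeoutException.class,
        maxAttempts = 5,
        backoff = @Backoff(delay = 2000, multiplier = 2.0, maxDelay = 30000)
    )
    public PaymentResponse charge(PaymentRequest request) throws TimeoutException {
        return paymentGateway.charge(request);
    }

    // Invoked once the 5th attempt also throws TimeoutException
    @Recover
    public PaymentResponse fallback(TimeoutException e, PaymentRequest request) {
        log.error("Payment failed after 5 attempts: {}", request.id(), e);
        return PaymentResponse.failed(request.id(), "GATEWAY_TIMEOUT");
    }
}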
Output
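Attempt 1 fails → wait 2s
Attempt 2 fails → wait 4s
Attempt 3 fails → wait 8s
Attempt 4 fails → wait 16s
Attempt 5 fails → no attempts left
→ Exhausted after 5 attempts (30s of waiting in total), calls @Recover. The 30s maxDelay cap would only matter from a 6th attempt onward.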
Production Trap:
Fixed backoff without jitter causes synchronized retries across all instances. Add random = true to @Backoff to spread retry windows. For Kafka consumers, this prevents the entire cluster from hammering the DB at the same second.
Key Takeaway
Always specify multiplier in @Backoff. Default fixed delay is for demos, not production. Exponential backoff + jitter = survival.

Retry Configuration: Externalize It, Don't Hardcode It

How many times have you hotfixed a retry count at 2 AM because your payment provider started throttling? Hardcoded @Retryable(maxAttempts = 3) means a recompile and deploy cycle. Stop doing that.

Spring Retry supports externalized values through SpEL expression attributes: maxAttemptsExpression on @Retryable, and delayExpression, maxDelayExpression and multiplierExpression on @Backoff. You feed these from application.yml or environment variables, which is a game-changer for multi-region or multi-tenant setups where different environments have different SLOs. The pattern is simple: use expressions like #{${retry.payment.max-attempts}} and define the defaults in your properties file.

When your API starts returning 429s, you bump the delay in configuration instead of in code. No code change. No pipeline. Expressions in the #{${...}} form are evaluated once, when the retry interceptor is built, so plan on a restart or context refresh for new values to apply. Check your Spring Retry version's documentation before relying on live updates. This is what separates a production battle station from a toy app. Always externalize retry parameters. Your pager will appreciate the difference between a config change and a full deploy at 3 AM.

InventoryService.java
// io.thecodeforge — java tutorial
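import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

@Service
public class InventoryService {
    private static final Logger log = LoggerFactory.getLogger(InventoryService.class);
    // StockRepository, CacheService, Stock and the exception types are application classes (not shown)
    private final StockRepository stockRepository;
    private final CacheService cacheService;

    public InventoryService(StockRepository stockRepository, CacheService cacheService) {
        this.stockRepository = stockRepository;
        this.cacheService = cacheService;
    }

    // Values come from properties; the part after ':' is the default when a property is missing
    @Retryable(
        retryFor = {DatabaseException.class, NetworkException.class},
        maxAttemptsExpression = "#{${retry.inventory.max-attempts:4}}",
        backoff = @Backoff(
            delayExpression = "#{${retry.inventory.delay:2000}}",
            multiplierExpression = "#{${retry.inventory.multiplier:2.0}}"
        )
    )
    public Stock getStock(String sku) {
        return stockRepository.findById(sku).orElseThrow();
    }

    // Fallback once retries are exhausted: serve the last known stock level
    @Recover
    public Stock fallback(RuntimeException e, String sku) {
        log.warn("Returning stale stock for {} due to: {}", sku, e.getMessage());
        return Stock.stale(sku, cacheService.getLastKnownStock(sku));
    }
}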
Output
application.yml:
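retry:
  inventory:
    max-attempts: 5
    delay: 1000
    multiplier: 3.0
→ New values take effect without recompiling; because #{${...}} expressions are resolved when the retry proxy is built, a restart or context refresh applies them.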
Pro Tip:
For Spring Cloud Config or Kubernetes ConfigMap users, bind these properties to a @ConfigurationProperties class. Then inject them as a bean and reference them in @Retryable expressions. Avoid stringly-typed magic numbers in code.
Key Takeaway
Use @Retryable with SpEL expressions to externalize retry parameters. Config change > code change when the database is burning.
Production incident · Post-mortem · Severity: high

The Thundering Herd: Synchronized Retries Took Down the Auth Service

Symptom
Auth service returned to normal after a 30-second hardware issue. Traffic monitoring showed the service immediately received 50x normal request volume and crashed again within 10 seconds. The outage extended from 30 seconds to 8 minutes.
Assumption
The team assumed retry would make their services more resilient. They configured maxAttempts=3 with fixedDelay=1000ms across 80 microservices, all making auth calls.
Root cause
All 80 services had been simultaneously failing for 30 seconds. When auth recovered, all services retried their queued failures simultaneously — 80 services × 3 retry attempts × thousands of requests/second = a request storm that overwhelmed auth. Fixed delay (no jitter) meant all retries were synchronized to the same 1-second marks.
Fix
Switched to exponential backoff with jitter: initialInterval=500ms, multiplier=2.0, maxInterval=30s, with each computed delay randomized by ±50%. Added a Resilience4j circuit breaker with a 50% failure threshold to stop retrying when auth was truly down. The next auth outage lasted 35 seconds; services degraded gracefully and recovered smoothly.
Key lesson
  • Jitter is not optional — it's mandatory.
  • Without randomization in retry delays, synchronized failures create synchronized retry storms.
  • Always add jitter to backoff, and always pair retry with a circuit breaker to stop retrying a service that's genuinely down.
Production debug guide · Symptom → root cause → fix · 5 entries
Symptom · 01
@Retryable has no effect — method fails immediately without retrying
Fix
Spring Retry requires the method to be called through a Spring proxy; calling a @Retryable method from within the same class bypasses the AOP proxy. Move the retryable method to a separate @Service bean and inject it. Also verify spring-retry and spring-boot-starter-aop are on the classpath, and @EnableRetry is on a @Configuration class or the main application class. Check that the exception thrown matches the retryFor (or include) list on @Retryable.
Symptom · 02
Resilience4j retry not recording metrics in Actuator
Fix
Confirm resilience4j-micrometer is on the classpath alongside spring-boot-starter-actuator. Add management.endpoints.web.exposure.include=health,metrics,retries to application.properties. Check that the instance name configured under resilience4j.retry.instances matches the name referenced in @Retry(name=). Use curl http://localhost:8080/actuator/retries to list the registered retry instances, and /actuator/metrics/resilience4j.retry.calls to see their counts.
Symptom · 03
Retry causes duplicate records or double charges
Fix
The retried operation is not idempotent. Immediately disable retry for this operation. Implement idempotency: generate a stable idempotency key from business data (order ID, amount, currency; never an attempt number or a fresh UUID, since those change on every retry), persist it to a database before the external call, and check for existing results on each attempt. For payment APIs, use the provider's idempotency key header. Only re-enable retry after verifying idempotency end-to-end.
Symptom · 04
Retries take far too long to exhaust: a call takes as long as maxAttempts × timeout
Fix
The per-attempt timeout is too high relative to the retry budget. If each attempt has a 30-second timeout and you have maxAttempts=3, the worst case is 90 seconds plus backoff delays. Reduce individual call timeouts to be much smaller than the retry budget. Alternatively, wrap the retried call in a Resilience4j TimeLimiter to bound total execution time. Check whether the downstream service is actually down; if it is, a circuit breaker is more appropriate than retry.
Symptom · 05
Exponential backoff reaches maxInterval almost immediately and stays there
Fix
maxInterval caps the computed delay, so once initialInterval × multiplier^n exceeds it, every remaining wait equals the cap. With a large initialInterval or multiplier, that happens after one or two attempts. Verify initialInterval, multiplier, and maxInterval together. If retries are still exhausting without success, the issue is systemic (circuit breaker territory) rather than transient. Add a Resilience4j circuit breaker on top of retry, with a half-open probe after the cooldown window.
Debug cheat sheet: fast triage for Spring Boot retry issues
@Retryable not retrying
Immediate action
Check self-invocation and @EnableRetry annotation
Commands
grep -r '@EnableRetry' src/main/java/ --include='*.java'
grep -r 'spring-retry\|spring-boot-starter-aop' pom.xml build.gradle
Fix now
Add @EnableRetry to main class; move @Retryable method to separate @Service bean
Resilience4j retry metrics missing
Immediate action
Check actuator exposure and dependency
Commands
curl -s http://localhost:8080/actuator/retries | jq '.'
curl -s http://localhost:8080/actuator/metrics | jq '.names[] | select(startswith("resilience4j"))'
Fix now
Add resilience4j-micrometer dependency; expose metrics endpoint in application.properties
Retry causing thundering herd
Immediate action
Add jitter to backoff configuration immediately
Commands
grep -r 'multiplier\|fixedDelay\|backoff' src/main/resources/ --include='*.yml' --include='*.properties'
curl -s http://localhost:8080/actuator/metrics/resilience4j.retry.calls | jq '.measurements'
Fix now
Set random=true in @Backoff or use Resilience4j IntervalFunction.ofExponentialRandomBackoff()
Spring Retry vs. Resilience4j Retry
Feature | Spring Retry | Resilience4j
Annotation API | @Retryable / @Recover | @Retry(name = ...)
Programmatic API | RetryTemplate | RetryRegistry + Retry.decorateSupplier()
Backoff options | Fixed, exponential, random | Fixed, exponential, random, custom IntervalFunction
Micrometer metrics | Not auto-configured (instrument via a RetryListener) | First-class, per-instance metrics
Circuit breaker | Basic @CircuitBreaker in spring-retry; fuller support via Spring Cloud Circuit Breaker | Built in, same library
Reactive support | Limited | Full (Mono/Flux decorators)
Setup complexity | Low: spring-retry + @EnableRetry | Medium: YAML config + dependency
Exception classification | retryFor / noRetryFor lists | retryExceptions / ignoreExceptions lists, retryOnException predicate
Best for | Simple service retry, quick setup | Production systems needing metrics and CB composition

Key takeaways

1. Idempotency is the prerequisite for retry: derive keys from stable business identifiers, never from random UUIDs generated at call time.
2. Always enable jitter (random=true in @Backoff, ofExponentialRandomBackoff in Resilience4j) to prevent a synchronized retry thundering herd.
3. @Retryable requires an AOP proxy: self-invocation silently disables retry, so always call from a different Spring bean.
4. Resilience4j + Micrometer gives you retry rate dashboards for free: if you're running Prometheus/Grafana, the migration from Spring Retry pays for itself in observability.
5. Circuit breaker wraps retry in the composition order: the circuit breaker detects systemic failure and opens; retry handles transient failures within a closed circuit.

Common mistakes to avoid


Using @Retryable with self-invocation (calling from the same class)

Symptom
Method fails immediately without retrying; no retry log entries
Fix
Extract the @Retryable method to a separate @Service bean and inject it; self-calls bypass the AOP proxy

Fixed backoff delay without jitter across many clients

Symptom
After a downstream outage, all clients retry simultaneously creating a thundering herd
Fix
Add random=true to @Backoff or use IntervalFunction.ofExponentialRandomBackoff() — jitter desynchronizes retry waves

Retrying non-idempotent operations (payments, email sends, record creation)

Symptom
Duplicate charges, duplicate emails, or duplicate database records
Fix
Implement idempotency before enabling retry; use provider idempotency keys (Stripe) or an idempotency_keys table

Not configuring noRetryFor for client errors (4xx HTTP status)

Symptom
Retrying bad requests (400), unauthorized (401), not found (404) — wasting retry budget on non-transient failures
Fix
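Add HttpClientErrorException to noRetryFor/ignoreExceptions: errors such as 400, 401 and 404 are not transient and should fail immediately. Treat 429 (Too Many Requests) separately, since throttling is exactly the case where retry with longer backoff helps.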

Missing @Recover method — retries exhaust and throw the raw exception

Symptom
After retries exhaust, callers get raw exception types instead of graceful degradation
Fix
Always add @Recover for production retry logic; return a degraded result or throw a domain-specific exception with context

Retry without circuit breaker — amplifying load on a downed service

Symptom
During outages, retries create 3× the load on already-struggling downstream, extending the outage
Fix
Wrap retry with a circuit breaker; when failure rate exceeds threshold, stop retrying entirely and fail fast

Interview Questions on This Topic

Q01 · JUNIOR: What is exponential backoff with jitter and why is jitter critical?
Q02 · JUNIOR: Why doesn't @Retryable work when calling the annotated method from withi...
Q03 · SENIOR: What must you implement before enabling retry on a payment charge operat...
Q04 · SENIOR: Explain the difference between retry and circuit breaker and how they co...
Q05 · SENIOR: How does Resilience4j retry composition differ from Spring Retry in term...
Q06 · SENIOR: Describe how you would implement exactly-once semantics for retry on a c...
Q07 · SENIOR: How would you use ExceptionClassifierRetryPolicy with RetryTemplate?
Q08 · SENIOR: What is the thundering herd problem in retry, and how do you prevent it?
Q01 of 08 · JUNIOR

What is exponential backoff with jitter and why is jitter critical?

ANSWER
Exponential backoff increases the delay between retry attempts geometrically — if initial delay is 500ms and multiplier is 2, delays are 500ms, 1s, 2s, 4s. Without jitter, all clients that failed simultaneously retry at the same exponential intervals, creating synchronized retry waves that can overwhelm a recovering service. Jitter adds randomness — multiplying the computed delay by a random factor — so clients retry at different times, distributing load instead of concentrating it.
FAQ · 6 QUESTIONS

Frequently Asked Questions

01 · Can I use Spring Retry and Resilience4j in the same application?
02 · What's the difference between maxAttempts in @Retryable and Resilience4j?
03 · Should I retry on all RuntimeExceptions by default?
04 · How do I test retry behavior in unit tests?
05 · Does retry work with Spring WebFlux reactive endpoints?
06 · What happens when retry exhausts and there's no @Recover method?