Resilience4j in Spring Boot — CircuitBreaker, Retry, RateLimiter, Bulkhead, TimeLimiter
Master Resilience4j in Spring Boot 3.
- Annotate service methods with @CircuitBreaker(name="svc", fallbackMethod="fallback") — zero XML required
- Configure thresholds in application.yml under resilience4j.circuitbreaker.instances.
- Expose metrics via Actuator: GET /actuator/circuitbreakers shows state, failure rate, slow call rate
- Use @Retry for transient failures and @TimeLimiter + @CircuitBreaker together for async calls
- Never share a CircuitBreaker instance across unrelated dependencies — isolation is the entire point
Think of a CircuitBreaker like the fuse box in your house. When one appliance draws too much power (a downstream service keeps failing), the fuse trips to protect everything else. After a cooldown, it tries again cautiously. Resilience4j is that fuse box for your microservices — it stops cascading failures before they take down your entire system.
At 2 AM on Black Friday, your payment service starts timing out. Within 90 seconds, the timeout propagates upstream: the order service threads fill up waiting for payment, the API gateway runs out of connections, and your entire platform is down. Not because payment was broken — because nothing stopped the cascade.
This is the problem Resilience4j solves. It is the successor to Netflix Hystrix (which has been in maintenance mode since 2018) and is now the de-facto resilience library for Spring Boot microservices. Unlike Hystrix's thread-pool-per-command model, Resilience4j uses decorators over functional interfaces, making it lightweight, composable, and natural to use with modern Java.
Spring Boot 3.x integrates Resilience4j through the spring-boot-starter-aop and resilience4j-spring-boot3 starters. Configuration lives in application.yml, annotations handle instrumentation, and Spring Boot Actuator exposes circuit breaker states as live metrics you can alert on. The library ships five core patterns: CircuitBreaker (stop calling a broken service), Retry (handle transient blips), RateLimiter (protect yourself from overload), Bulkhead (isolate thread or semaphore pools), and TimeLimiter (never hang indefinitely).
The key insight most teams miss: these patterns compose. A production-grade remote call should layer, from outermost to innermost, CircuitBreaker → Retry → TimeLimiter → the actual HTTP call. Each layer adds a different protection dimension. Getting the order wrong, for example wrapping a CircuitBreaker inside a Retry, means you hammer a half-open circuit with retries, defeating the entire cooldown mechanism.
This guide is written from real incident experience: war stories from services whose error rates dropped from 40% to under 0.1% after these patterns were tuned correctly, and horror stories from misconfigured bulkheads that caused more damage than the original outage.
CircuitBreaker: Configuration That Actually Works in Production
The default Resilience4j CircuitBreaker configuration will get you killed in production. With the defaults (slidingWindowSize: 100 and minimumNumberOfCalls: 100), the breaker needs 100 recorded calls before it even calculates a failure rate. If those calls arrive one after another at 500ms each, that's 50 seconds of failures propagating through your system before any protection can kick in.
These are the settings I use in production for a typical microservice-to-microservice call with moderate traffic:
The sliding window type should almost always be COUNT_BASED for microservices. TIME_BASED is useful when call volume is very high and you care about the failure rate over a recent time period, but COUNT_BASED gives more predictable behavior. Set slidingWindowSize between 10 and 20 for most services.
failureRateThreshold: 50 means at least 50% of the calls in the window must fail before the circuit opens. Don't set this too low: transient blips (a GC pause, a brief network hiccup) will trigger false positives and open the circuit unnecessarily. 50% is a good starting point; tune it based on your baseline error rate.
slowCallRateThreshold is the underrated config. A slow call isn't a failed call, but it's often worse — it holds threads and connections. Setting slowCallDurationThreshold: 2s and slowCallRateThreshold: 50 means if 50% of calls take more than 2 seconds, the circuit opens. This catches the 'service is up but broken' scenario that failure rate alone misses.
waitDurationInOpenState: 30s is how long the circuit stays OPEN before trying HALF_OPEN. In production you want this short enough to recover quickly (30-60s) but long enough that the downstream service has had time to recover. Don't set it under 10 seconds.
automaticTransitionFromOpenToHalfOpenEnabled: true means you don't need a probe call to trigger HALF_OPEN; the circuit transitions automatically once the wait duration has elapsed. It is false by default, so enable it in production to make recovery automatic.
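To see how these settings interact, here is a simplified, JDK-only sketch of a count-based breaker. It is not Resilience4j's implementation: the class name, the injectable clock and the exact evaluation steps are assumptions made for illustration, and only the parameter names mirror the configuration keys above (plus a half-open trial count modelled on permittedNumberOfCallsInHalfOpenState, which the text does not discuss). Like a breaker with automatic transition disabled, it only moves to HALF_OPEN when a call arrives after the wait.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Simplified count-based circuit breaker for illustration only (not Resilience4j's code).
final class CountBasedBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int slidingWindowSize;
    private final int minimumNumberOfCalls;
    private final double failureRateThreshold;      // percent, e.g. 50
    private final double slowCallRateThreshold;     // percent, e.g. 50
    private final long slowCallDurationMillis;      // e.g. 2000
    private final long waitDurationInOpenMillis;    // e.g. 30000
    private final int permittedCallsInHalfOpen;     // trial calls judged before closing again
    private final LongSupplier clockMillis;         // injected so the example can be tested without sleeping

    private final Deque<boolean[]> window = new ArrayDeque<>(); // each entry: {failed, slow}
    private State state = State.CLOSED;
    private long openedAtMillis;

    CountBasedBreaker(int slidingWindowSize, int minimumNumberOfCalls,
                      double failureRateThreshold, double slowCallRateThreshold,
                      long slowCallDurationMillis, long waitDurationInOpenMillis,
                      int permittedCallsInHalfOpen, LongSupplier clockMillis) {
        this.slidingWindowSize = slidingWindowSize;
        this.minimumNumberOfCalls = minimumNumberOfCalls;
        this.failureRateThreshold = failureRateThreshold;
        this.slowCallRateThreshold = slowCallRateThreshold;
        this.slowCallDurationMillis = slowCallDurationMillis;
        this.waitDurationInOpenMillis = waitDurationInOpenMillis;
        this.permittedCallsInHalfOpen = permittedCallsInHalfOpen;
        this.clockMillis = clockMillis;
    }

    synchronized State state() {
        return state;
    }

    // false means "reject the call" (Resilience4j would throw CallNotPermittedException here).
    synchronized boolean tryAcquirePermission() {
        if (state == State.OPEN
                && clockMillis.getAsLong() - openedAtMillis >= waitDurationInOpenMillis) {
            transitionTo(State.HALF_OPEN); // lazy transition, as with automatic transition disabled
        }
        return state != State.OPEN;
    }

    synchronized void onResult(boolean failed, long durationMillis) {
        if (state == State.OPEN) {
            return; // late results from calls started before the circuit opened are ignored
        }
        window.addLast(new boolean[] {failed, durationMillis > slowCallDurationMillis});
        if (window.size() > slidingWindowSize) {
            window.removeFirst(); // count-based: keep only the last N outcomes
        }
        int needed = state == State.HALF_OPEN ? permittedCallsInHalfOpen : minimumNumberOfCalls;
        if (window.size() < needed) {
            return; // not enough calls recorded to compute meaningful rates yet
        }
        long failures = window.stream().filter(r -> r[0]).count();
        long slowCalls = window.stream().filter(r -> r[1]).count();
        double failureRate = 100.0 * failures / window.size();
        double slowCallRate = 100.0 * slowCalls / window.size();
        if (failureRate >= failureRateThreshold || slowCallRate >= slowCallRateThreshold) {
            transitionTo(State.OPEN);
        } else if (state == State.HALF_OPEN) {
            transitionTo(State.CLOSED);
        }
    }

    private void transitionTo(State next) {
        state = next;
        window.clear(); // every state starts judging from a clean window
        if (next == State.OPEN) {
            openedAtMillis = clockMillis.getAsLong();
        }
    }
}
```

With a window of 10 and a minimum of 5 calls, five straight failures open the circuit; under the defaults, the same decision would wait for 100 recorded calls.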
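Note that these annotations only work when the call goes through the Spring proxy: calling an annotated method from another method in the same class bypasses the aspect entirely. Move the annotated method into a separate bean, or call it through the proxied bean obtained via self-injection or ApplicationContext.getBean().
Retry: Backoff Strategies and When NOT to Retry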
Retry is the most dangerous resilience pattern if misused. Blindly retrying a failed call can turn a struggling service into a completely dead one. The golden rule: only retry idempotent operations, and always use exponential backoff with jitter.
Idempotent operations safe to retry: GET requests, PUT with full resource replacement, DELETE. Never blindly retry: POST (a retry can create duplicate resources), payment processing, order placement, anything with side effects. If you must retry a non-idempotent operation, your service must implement idempotency keys.
Exponential backoff multiplies the wait time by a fixed factor after each retry; with a multiplier of 2 you get 100ms, 200ms, 400ms, 800ms. This gives the downstream service time to recover. Jitter adds randomness to the wait time to prevent a thundering herd: without jitter, all retrying clients hit the server simultaneously after the same backoff period, potentially causing another overload.
The maxAttempts includes the first attempt. maxAttempts: 3 means 1 original call + 2 retries. Don't be fooled — I've seen teams set maxAttempts: 10 wondering why their p99 latency is 10 seconds.
retryExceptions should be explicit. Don't catch all exceptions — only retry on transient failures: SocketTimeoutException, ConnectException, ServiceUnavailableException. Business exceptions like IllegalArgumentException or validation errors should be in ignoreExceptions — retrying them is futile and wastes time.
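The three retry rules above (maxAttempts counts the first call, backoff grows exponentially with jitter, and only explicitly listed exceptions are retried) fit in one small loop. This is a JDK-only sketch, not Resilience4j's Retry: BackoffRetry and its parameters are hypothetical, jitter is modelled as a uniform ± randomization factor around the exponential delay, and the sleeper is injected so the example can be tested without actually waiting.

```java
import java.util.List;
import java.util.Random;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Illustrative retry loop (not Resilience4j's code): maxAttempts includes the first call,
// waits grow exponentially with jitter, and only listed exception types are retried.
final class BackoffRetry {

    // Wait before retry number `retry` (1-based): initial * multiplier^(retry - 1),
    // scaled by a random factor drawn uniformly from [1 - randomization, 1 + randomization).
    static long backoffMillis(int retry, long initialMillis, double multiplier,
                              double randomization, Random random) {
        double base = initialMillis * Math.pow(multiplier, retry - 1);
        double jitter = 1 + randomization * (2 * random.nextDouble() - 1);
        return Math.round(base * jitter);
    }

    static <T> T call(Supplier<T> operation, int maxAttempts,
                      List<Class<? extends Throwable>> retryOn,
                      long initialMillis, double multiplier, double randomization,
                      Random random, LongConsumer sleeper) {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                // HTTP clients often wrap I/O errors, so look at the direct cause too.
                boolean retryable = retryOn.stream()
                        .anyMatch(type -> type.isInstance(e) || type.isInstance(e.getCause()));
                if (!retryable || attempt >= maxAttempts) {
                    throw e; // business errors fail fast; exhausted retries surface the last error
                }
                sleeper.accept(backoffMillis(attempt, initialMillis, multiplier, randomization, random));
            }
        }
    }
}
```

Passing a randomization of 0 disables jitter, which is handy in tests but brings back the synchronized retry waves described above.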
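The @Retry annotation stacks beautifully with @CircuitBreaker. The correct composition order (outermost to innermost) is: @CircuitBreaker → @Retry → @TimeLimiter → actual call. This way, if the circuit is open, retries don't happen. If retries are exhausted, the circuit records the failure and may open. TimeLimiter ensures each individual attempt has a hard ceiling. Be aware that Resilience4j's default aspect order puts Retry outermost, so getting this order with annotations means changing the aspect order (or composing the decorators in code).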
RateLimiter: Protect Yourself, Not Just Others
Most developers think of rate limiting as something you do to external clients hitting your API. In microservices, the more critical use case is rate-limiting your own outbound calls to protect downstream services. If inventory-service can handle 1000 RPS and you run 10 instances that can each make 500 RPS of calls, your fleet can send up to 5,000 RPS, five times what inventory-service can absorb. Client-side rate limiting of about 100 RPS per instance keeps the total within its budget.
Resilience4j's default RateLimiter implementation (AtomicRateLimiter) splits time into cycles of limitRefreshPeriod. At the start of each cycle, the number of available permissions is reset to limitForPeriod, much like a token bucket that refills all at once rather than gradually. timeoutDuration is how long a caller waits for a permission before giving up.
The subtle production issue: with timeoutDuration > 0, requests queue waiting for tokens. Under a traffic spike, you can accumulate thousands of queued threads — this is often worse than just failing fast. For outbound HTTP calls, set timeoutDuration: 0 and handle RequestNotPermitted with a fallback that returns 429 to the caller. Let the caller's retry mechanism handle backoff.
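The fail-fast behaviour can be sketched without Resilience4j. The hypothetical CycleRateLimiter below resets its permissions at the start of every limitRefreshPeriod cycle and never waits for a permit, which corresponds to timeoutDuration: 0. It illustrates the idea only; the real AtomicRateLimiter is lock-free and also supports waiting.

```java
import java.util.function.LongSupplier;

// Sketch of the "permissions per cycle" idea (not AtomicRateLimiter's code):
// time is split into cycles of limitRefreshPeriod, each starting with limitForPeriod permissions.
final class CycleRateLimiter {
    private final int limitForPeriod;
    private final long limitRefreshPeriodNanos;
    private final LongSupplier nanoClock; // e.g. System::nanoTime in real use
    private long currentCycle = Long.MIN_VALUE;
    private int availablePermissions;

    CycleRateLimiter(int limitForPeriod, long limitRefreshPeriodNanos, LongSupplier nanoClock) {
        this.limitForPeriod = limitForPeriod;
        this.limitRefreshPeriodNanos = limitRefreshPeriodNanos;
        this.nanoClock = nanoClock;
    }

    // Fail-fast acquisition, i.e. timeoutDuration = 0: the calling thread is never parked.
    synchronized boolean tryAcquirePermission() {
        long cycle = Math.floorDiv(nanoClock.getAsLong(), limitRefreshPeriodNanos);
        if (cycle != currentCycle) {
            currentCycle = cycle;                  // new cycle: permissions are reset and
            availablePermissions = limitForPeriod; // unused ones from the last cycle do not carry over
        }
        if (availablePermissions == 0) {
            return false; // caller maps this to RequestNotPermitted / a 429 fallback
        }
        availablePermissions--;
        return true;
    }
}
```

Because each instance keeps its own counter, ten instances configured with limitForPeriod: 100 admit up to 1,000 calls per period between them.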
For inbound rate limiting (protecting your own API), prefer an API gateway. Resilience4j's RateLimiter is a per-instance, in-memory limiter: in a 10-instance deployment, each instance allows limitForPeriod requests per period, so the effective cluster-wide limit is 10 × limitForPeriod. For a global limit you need a shared store, for example Bucket4j with Redis or Spring Cloud Gateway's Redis-backed rate limiter.
Combining RateLimiter with Bulkhead gives you both throughput control (RateLimiter) and concurrency control (Bulkhead). They solve different problems: RateLimiter limits requests per time period; Bulkhead limits simultaneous in-flight requests.
Bulkhead: Thread Pool and Semaphore Isolation
Bulkhead prevents a slow downstream dependency from consuming all available threads in your service, taking down unrelated features. The name comes from ship design — bulkheads partition a ship into watertight compartments so a breach in one doesn't sink the whole vessel.
Resilience4j offers two bulkhead types. SemaphoreBulkhead limits concurrent calls with a semaphore. The call runs in the calling thread, which waits up to maxWaitDuration (0 by default) for a permit before a BulkheadFullException is thrown. This is lightweight and appropriate for most use cases. ThreadPoolBulkhead runs calls in a separate, bounded thread pool: the calling thread is released immediately, and results come back via CompletableFuture. Use ThreadPoolBulkhead when you need true isolation and your stack is non-reactive.
The key decision: semaphore vs thread pool. Semaphore is simpler and has lower overhead — use it for most HTTP calls where you're already running in a request thread. Thread pool is better when you need to isolate CPU-heavy operations or when you want calling threads to remain responsive.
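Here is a JDK-only sketch of the semaphore approach, built on java.util.concurrent.Semaphore. It is not Resilience4j's SemaphoreBulkhead; the class and exception names are hypothetical, and maxWaitMillis plays the role of maxWaitDuration. The call runs on the caller's thread, which waits at most maxWaitMillis for a permit before being rejected.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Semaphore-style bulkhead sketch, not Resilience4j's SemaphoreBulkhead.
final class SemaphoreBulkheadSketch {
    // Hypothetical stand-in for Resilience4j's BulkheadFullException.
    static final class BulkheadRejectedException extends RuntimeException {
        BulkheadRejectedException(String message) {
            super(message);
        }
    }

    private final Semaphore permits;
    private final long maxWaitMillis;

    SemaphoreBulkheadSketch(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls, true); // fair: waiting callers served in arrival order
        this.maxWaitMillis = maxWaitMillis;
    }

    <T> T execute(Supplier<T> call) {
        boolean acquired;
        try {
            // The caller's own thread waits here for at most maxWaitMillis.
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new BulkheadRejectedException("interrupted while waiting for a permit");
        }
        if (!acquired) {
            throw new BulkheadRejectedException("bulkhead full: no permit within " + maxWaitMillis + " ms");
        }
        try {
            return call.get(); // runs on the caller's thread, no hand-off to another pool
        } finally {
            permits.release(); // always give the permit back, even if the call throws
        }
    }

    int availableConcurrentCalls() {
        return permits.availablePermits();
    }
}
```

availableConcurrentCalls() is the analogue of what the resilience4j.bulkhead.available.concurrent.calls metric reports.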
Sizing the bulkhead correctly is non-trivial. Too small and you get excessive BulkheadFullException under normal load. Too large and you lose isolation benefits. A practical heuristic: set maxConcurrentCalls to (expected peak concurrent calls × 1.5), never higher than what the downstream can handle. Monitor resilience4j.bulkhead.available.concurrent.calls in Grafana — if it's consistently near zero, you need to increase the bulkhead size or fix the downstream latency.
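The heuristic needs an estimate of expected peak concurrent calls. One common way to get it, which is an assumption rather than part of the heuristic above, is Little's law: concurrency ≈ peak requests per second × latency in seconds. Using a high-percentile latency makes the estimate conservative. The helper below is hypothetical and uses integer math so the rounding is predictable.

```java
// Hypothetical sizing helper, not a Resilience4j API: Little's law estimate,
// then the 1.5x headroom from the text, capped at what the downstream can handle.
final class BulkheadSizing {
    static int maxConcurrentCalls(long peakRequestsPerSecond, long latencyMillis, int downstreamLimit) {
        long expectedConcurrent = ceilDiv(peakRequestsPerSecond * latencyMillis, 1_000); // rate x time in system
        long withHeadroom = ceilDiv(expectedConcurrent * 3, 2);                          // x 1.5, rounded up
        return (int) Math.min(withHeadroom, downstreamLimit);                            // never above downstream capacity
    }

    private static long ceilDiv(long dividend, long divisor) {
        return (dividend + divisor - 1) / divisor; // valid for positive inputs only
    }
}
```

For example, 200 requests per second at 100 ms gives 20 concurrent calls, or 30 after the 1.5× headroom; if the downstream can only take 25, the cap wins.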
Bulkhead isolates failure domains. Your payment integration can have a tight bulkhead (10 concurrent calls), while your product catalog (reads, cheap, fast) can have a loose one (50 concurrent calls). If payment service goes slow, it can't steal threads from catalog lookups.
Actuator Metrics and Alerting on Circuit State Changes
Resilience4j's Actuator integration is one of its best features. Without it, you're flying blind — you don't know the circuit is open until customers call support. With it, you get real-time state visibility and can build alerts that fire before the circuit opens, giving you time to respond proactively.
The key Actuator endpoints: /actuator/circuitbreakers (all instances, states, metrics), /actuator/circuitbreakerevents (last N events — failures, successes, state transitions), /actuator/retryevents, /actuator/bulkheadevents. These endpoints are invaluable during incident response.
Resilience4j auto-registers Micrometer metrics, and with io.micrometer:micrometer-registry-prometheus on the classpath they can be scraped by Prometheus. Key metrics to alert on:
- resilience4j.circuitbreaker.state: a gauge tagged with state (closed, open, half_open, plus a few special states) that reads 1 for the breaker's current state and 0 for the others. Alert when the series tagged state="open" equals 1.
- resilience4j.circuitbreaker.failure.rate: alert when it exceeds 30% for a proactive warning.
- resilience4j.circuitbreaker.slow.call.rate: alert when it exceeds 40%.
- resilience4j.bulkhead.available.concurrent.calls: alert when it stays near 0.
Set up a Grafana dashboard with state timeline per circuit breaker. The moment you see a transition CLOSED→OPEN is the moment the incident started — this gives you a precise timestamp to correlate with other signals (deployment, traffic spike, upstream alert).
Event consumers let you hook into state transitions for custom actions: PagerDuty alerts, Slack notifications, or logging to your incident management system. Register a consumer on the breaker's event publisher, for example with circuitBreaker.getEventPublisher().onStateTransition(...), and act when the transition's target state is OPEN.
Composing Patterns: The Complete Production-Grade Service Call
In production, you rarely use a single resilience pattern. A mature microservice call uses them all, correctly composed. The composition order matters enormously and is a source of many subtle bugs.
Correct order (outermost to innermost): Bulkhead (limits concurrent calls) → CircuitBreaker (stops calls when things are broken) → RateLimiter (throttles call rate) → Retry (handles transient failures) → TimeLimiter (hard timeout per attempt) → actual call.
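The declaration order of the annotations on the method does not control this. Spring AOP orders aspects by their Ordered value, and Resilience4j's default aspect order is Retry(CircuitBreaker(RateLimiter(TimeLimiter(Bulkhead(call))))), with Retry outermost. To get the order above with annotations, adjust resilience4j.retry.retryAspectOrder, resilience4j.circuitbreaker.circuitBreakerAspectOrder, resilience4j.ratelimiter.rateLimiterAspectOrder, resilience4j.timelimiter.timeLimiterAspectOrder and resilience4j.bulkhead.bulkheadAspectOrder. An aspect with a lower order value wraps those with higher values, so, for example, retryAspectOrder must be higher than circuitBreakerAspectOrder for CircuitBreaker to sit outside Retry. Alternatively, compose the decorators explicitly in code with Resilience4j's functional API.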
Why this order? Bulkhead goes outermost so it caps every concurrent call, including the time spent evaluating fallbacks. CircuitBreaker comes next so that, when the circuit is open, calls are rejected immediately without consuming rate-limit permits or triggering retries. TimeLimiter must be innermost so it times out each individual retry attempt, not the entire retry sequence.
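The point about TimeLimiter being innermost can be demonstrated with two hand-rolled decorators. This is a JDK-only sketch rather than Resilience4j's Decorators API: AttemptTimeouts, timeLimited and retried are hypothetical names, and the retry layer retries every RuntimeException purely for brevity. The first attempt hangs; because the 200 ms limit applies per attempt, the second attempt still runs and succeeds.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hand-rolled decorators (not Resilience4j's API) showing why the time limit belongs inside the retry.
final class AttemptTimeouts {
    private static final ExecutorService POOL = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable);
        thread.setDaemon(true); // do not keep the JVM alive for abandoned attempts
        return thread;
    });

    // TimeLimiter-like layer: bounds ONE invocation and interrupts it on timeout.
    static <T> Supplier<T> timeLimited(Supplier<T> call, long timeoutMillis) {
        return () -> {
            Callable<T> task = call::get;
            Future<T> future = POOL.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupts the worker thread (this is an executor-backed Future)
                throw new IllegalStateException("attempt timed out after " + timeoutMillis + " ms", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        };
    }

    // Retry-like layer: every attempt re-enters the inner supplier, so each gets a fresh time budget.
    // Retries any RuntimeException only to keep the sketch short. Requires maxAttempts >= 1.
    static <T> Supplier<T> retried(Supplier<T> inner, int maxAttempts) {
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return inner.get();
                } catch (RuntimeException e) {
                    last = e;
                }
            }
            throw last;
        };
    }
}
```

Had the limit wrapped the whole retry sequence instead, the hanging first attempt would have consumed the entire 200 ms budget and the healthy second attempt would never have run.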
For reactive stacks (WebFlux + WebClient), use Spring Cloud CircuitBreaker's ReactiveResilience4JCircuitBreakerFactory and the io.github.resilience4j:resilience4j-reactor module. The composition works the same way but through reactive operators, for example transformDeferred(CircuitBreakerOperator.of(circuitBreaker)).
Configuration externalization: in production, never hardcode thresholds. Use Spring Cloud Config to push circuit breaker configuration changes without restarting services. With @RefreshScope, resilience4j instance configs can be updated dynamically — invaluable during incidents when you need to temporarily loosen a threshold.
TimeLimiter: Your Unsung Hero Against Slow Downstream Death
You have a circuit breaker. It catches failures. But what about the service that doesn't fail and just hangs for 30 seconds? That is the silent killer in production. It ties up your thread pool, kills your throughput, and holds connections that may never return. The TimeLimiter is the timeout guard you need: it throws a TimeoutException when a downstream call exceeds your threshold. Without it, your retries fire on dead horses, your bulkhead fills with zombies, and your circuit breaker never opens because technically the call didn't "fail"; it just hasn't finished.
Configure TimeLimiter with a hard timeout (500ms for most APIs) and cancelRunningFuture: true, which is also the default. With that flag, the TimeLimiter calls cancel(true) on the timed-out future. For a Future backed by an executor, that interrupts the worker thread. For a CompletableFuture it only completes the future exceptionally, because CompletableFuture ignores the interrupt flag, so the underlying work keeps running unless it has its own timeout (for example, the HTTP client's read timeout).
Combine it with Retry on TimeoutException for transient blips, but cap retries at 1 or 2: retrying a call that keeps timing out only multiplies the wait. Remember: a slow service is a dead service from the caller's perspective. Don't let it rot your system.
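The difference between interrupting a thread and merely abandoning a future is easy to check with plain JDK code. The experiment below is hypothetical and does not use Resilience4j: it times out a task, calls cancel(true) as a cancel-on-timeout policy would, and reports whether the task saw an interrupt.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// JDK-only experiment: does cancel(true) after a timeout actually interrupt the running task?
final class CancelOnTimeout {
    static boolean taskWasInterrupted(boolean useCompletableFuture, long timeoutMillis) {
        AtomicBoolean interrupted = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Runnable hangingCall = () -> {
            started.countDown();
            try {
                Thread.sleep(1_000); // simulated downstream call that outlives the timeout
            } catch (InterruptedException e) {
                interrupted.set(true);
            } finally {
                finished.countDown();
            }
        };
        Future<?> future = useCompletableFuture
                ? CompletableFuture.runAsync(hangingCall, pool)
                : pool.submit(hangingCall);
        try {
            started.await(); // make sure the task is really running before timing it out
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // what a cancel-on-timeout policy does
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        try {
            finished.await(); // let the task end, interrupted or not
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return interrupted.get();
    }
}
```

An executor-backed Future interrupts the sleeping task; a CompletableFuture does not, so the simulated call runs for its full second anyway. Since annotated TimeLimiter methods return CompletableFuture, rely on client-level timeouts to actually stop the work.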
Cache: The Pattern Nobody Talks About But Everyone Needs
Resilience4j has a Cache module. Most tutorials skip it. That is a mistake. Cache is resilience: it reduces load on fragile downstream systems during recovery. When your circuit breaker is half-open, every request that hits cache instead of the real service is a free win.
The module wraps your functional call with a JCache (JSR-107) implementation such as Ehcache or Caffeine's JCache adapter. It works transparently: on a cache hit, it skips the remote call entirely; on a miss, it executes the call and stores the result. The key insight: never cache failures. Resilience4j's Cache decorator only caches successful results; exceptions propagate without being stored. That behavior alone has saved my team during multi-minute circuit breaker recovery windows.
Configure a small, bounded cache (1000 entries, 60-second TTL). Big caches cause GC pressure for rarely-used records.
Cache the last known good response. When the circuit is open, you return stale data instead of errors. Your users prefer a 2-second-old response to a 500. This is not a suggestion; it's a production pattern that distinguishes resilient services from fragile ones.
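The "last known good" idea can be sketched in a few lines. This is not Resilience4j's Cache module, which delegates storage to a JCache provider. The class below is hypothetical, uses a plain access-ordered LinkedHashMap, needs Java 16+ for the record (Spring Boot 3 requires Java 17 anyway), and, as a variation on a plain TTL, separates freshness from retention: entries older than freshForMillis trigger a remote call but are kept so they can be served if that call fails.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical "last known good" cache (not Resilience4j's Cache module).
final class LastKnownGoodCache<K, V> {
    private record Cached<T>(T value, long storedAtMillis) {}

    private final int maxEntries;
    private final long freshForMillis;
    private final LongSupplier clockMillis;
    private final Map<K, Cached<V>> entries;

    LastKnownGoodCache(int maxEntries, long freshForMillis, LongSupplier clockMillis) {
        this.maxEntries = maxEntries;
        this.freshForMillis = freshForMillis;
        this.clockMillis = clockMillis;
        // Access-ordered map that evicts the least recently used entry once it is over capacity.
        this.entries = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Cached<V>> eldest) {
                return size() > LastKnownGoodCache.this.maxEntries;
            }
        };
    }

    synchronized V get(K key, Function<K, V> remoteCall) {
        long now = clockMillis.getAsLong();
        Cached<V> cached = entries.get(key);
        if (cached != null && now - cached.storedAtMillis() < freshForMillis) {
            return cached.value(); // fresh hit: skip the remote call entirely
        }
        try {
            V value = remoteCall.apply(key);
            entries.put(key, new Cached<>(value, now)); // only successful results are stored
            return value;
        } catch (RuntimeException e) {
            if (cached != null) {
                return cached.value(); // stale, but better than an error while the dependency recovers
            }
            throw e; // nothing to fall back on; the failure itself is never cached
        }
    }
}
```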
The Circuit That Never Opened: A $2M Black Friday Outage
Optional.empty(). Those weren't recorded as failures; only the timeout exceptions after 45s were. So the effective failure rate in Resilience4j's view was low despite 100% of calls hanging.
- The default minimumNumberOfCalls: 100 is lethal during incidents.
- For microservices, use 5-20.
- Always wrap CircuitBreaker with TimeLimiter — a hanging call is a failure, but Resilience4j doesn't know that unless the TimeLimiter throws.
- Never catch and swallow exceptions inside a @CircuitBreaker boundary.
curl -s http://localhost:8080/actuator/circuitbreakers | jq .
curl -s http://localhost:8080/actuator/circuitbreakerevents/inventoryService | jq '.circuitBreakerEvents[-10:]'
Common mistakes to avoid
- Setting minimumNumberOfCalls too high (default 100)
- Catching and swallowing exceptions inside a @CircuitBreaker boundary
- Using @Retry on non-idempotent POST operations without idempotency keys
- Nesting CircuitBreaker inside Retry (wrong composition order)
- Setting RateLimiter timeoutDuration too high (e.g., 5s)
- Using @CircuitBreaker on @Scheduled tasks or batch jobs
Interview Questions on This Topic
What's the difference between COUNT_BASED and TIME_BASED sliding windows in Resilience4j CircuitBreaker?