Spring Cloud LoadBalancer: Stop Using Netflix Ribbon — Here's What Actually Breaks
Real production failures from Spring Cloud LoadBalancer: retry storms, sticky sessions, and circuit breakers.
- Spring Cloud LoadBalancer replaces Netflix Ribbon in Spring Boot 3.x
- Default is round-robin — but that breaks on cache-heavy services
- Retry logic is disabled by default — you must explicitly configure it
- Health checks are lazy — first request after a restart may hit a dead node
- Reactive and Servlet stacks have different LoadBalancer implementations
Imagine a busy restaurant with one menu and many tables. A good server balances who gets served first so nobody waits too long. If one table's food burns, the host must skip it. Spring Cloud LoadBalancer is that host for your microservices — sending requests to healthy instances.
It was 2:47 AM. My phone buzzed with a PagerDuty alert. 502 errors on the payment service. By the time I logged in, 30% of users couldn't complete checkout. The on-call junior had already restarted all pods. Twice.
The team had migrated from Ribbon to Spring Cloud LoadBalancer two weeks earlier. The PR looked clean. Tests passed. They waved the old spring-cloud-starter-netflix-ribbon goodbye and added spring-cloud-starter-loadbalancer. Deployment went smooth. Metrics looked normal.
But production doesn't care about your green pipeline. After restart, traffic surged. One payment instance started returning 5xx errors. The load balancer happily kept sending it traffic. Round-robin doesn't skip sick nodes. It just rotates.
The incident taught me something painful: Spring Cloud LoadBalancer gives you a simple default that works in demos but kills in production. You must explicitly configure retry, health checks, and failure thresholds. The defaults assume your instances are perfectly healthy zombies — they never die.
This article is my debrief from that night. I'll show you what breaks, why it breaks, and the exact configs you need. No theory. Just production scars.
Why Spring Cloud LoadBalancer (And Not Ribbon)
Let's get one thing straight: Ribbon died because Netflix stopped maintaining it in 2017. Spring kept it on life support through Hoxton. By 2021, they pulled the plug. Spring Cloud LoadBalancer is the successor, and it's better — for the right reasons.
Ribbon was a monolith. It bundled load balancing with service discovery and configuration. Spring Cloud LoadBalancer is a leaf dependency. If you use Eureka, you add spring-cloud-starter-netflix-eureka-client. If you use Consul, you add spring-cloud-starter-consul-discovery. The load balancer is just a tiny layer on top.
But the big win? Reactive support. Ribbon was a blocking implementation. In the WebFlux world, that breaks your async pipeline. Spring Cloud LoadBalancer has both BlockingLoadBalancerClient (for RestTemplate) and a reactive path built on ReactiveLoadBalancer and ReactorLoadBalancerExchangeFilterFunction (for WebClient). Choose your stack, get the right implementation.
Here's the trade-off no one talks about: Ribbon had a rich ecosystem of custom rules. You could write complex traffic-splitting rules. Spring Cloud LoadBalancer is simpler. That's good for 95% of cases. But if you need canary routing based on request headers, you must write a custom ReactiveLoadBalancer implementation. I've done it. It's about 50 lines of code, but the first time you forget to handle null requests, your whole service goes down.
Another pain point: caching. Ribbon cached instance lists aggressively. Spring Cloud LoadBalancer also caches them by default, with a 35-second TTL, but if the cache is disabled it goes back to the discovery client on every request. That's a performance hit under high load. You need to tune caching. I'll show you how.
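To make the caching trade-off concrete, here is a minimal sketch of a time-based instance-list cache in plain Java. It is not the library's cache: `TtlInstanceCache`, the `Supplier` standing in for a discovery lookup, and the injectable clock are all hypothetical names I picked for illustration.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch: keeps an instance list for a fixed TTL before asking discovery again. */
final class TtlInstanceCache {
    private final Supplier<List<String>> discovery; // stands in for a discovery-client lookup
    private final long ttlMillis;
    private final LongSupplier clock;               // injectable so tests can control time
    private List<String> cached;
    private long loadedAt;

    TtlInstanceCache(Supplier<List<String>> discovery, long ttlMillis, LongSupplier clock) {
        this.discovery = discovery;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached list, reloading it only when it is missing or older than the TTL. */
    synchronized List<String> instances() {
        long now = clock.getAsLong();
        if (cached == null || now - loadedAt >= ttlMillis) {
            cached = List.copyOf(discovery.get());
            loadedAt = now;
        }
        return cached;
    }
}
```

The trade-off is visible in the `if`: a longer TTL means fewer discovery lookups, but a dead instance can stay in the list for up to one full TTL.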
Production verdict: Use Spring Cloud LoadBalancer. It's the right choice. But treat it like a sharp knife — don't assume it's safe out of the box.
Calling loadBalancerClient.choose() directly in the reactive stack blocks the thread. The reactive implementation ReactiveLoadBalancer is injected automatically when using @LoadBalanced WebClient.Builder. Don't call choose() manually unless you absolutely need custom instance selection logic.
Wrapping the choose() call in Mono.fromCallable() still caused thread contention. The fix: use @LoadBalanced on a WebClient.Builder bean. Let Spring wire the reactive client for you.
The Five Configurations You Must Set Before Going Live
The default Spring Cloud LoadBalancer configuration is a get-to-production trap. Here's my checklist. Every single item has burned me or someone on my team.
- Enable retry. spring.cloud.loadbalancer.retry.enabled=true. Without it, one slow request means a failed request. With it, the client tries another instance. But don't stop there. Set maxRetriesOnSameServiceInstance=1 and maxRetriesOnNextServiceInstance=2. More than that and your retry storm will kill downstream databases.
- Add backoff. spring.cloud.loadbalancer.retry.backoff.enabled=true, with minBackoff=500ms and maxBackoff=5s. Without backoff, retries happen instantly. I've seen a retry loop execute 50 requests in 200ms. The downstream service doesn't recover; it drowns.
- Enable health-check filtering. The load balancer can skip instances that fail health checks. Set spring.cloud.loadbalancer.configurations=health-check and spring.cloud.loadbalancer.health-check.path.default=/actuator/health. But here's the catch: this is lazy filtering. The first request after a health check passes may still hit a node that just failed. Combine with circuit breakers for real protection.
- Configure instance caching. Instance lists are cached by default with a 35-second TTL. If the cache is disabled, the LoadBalancer fetches instances from the discovery client on every request, which means a discovery lookup for each incoming request. Keep spring.cloud.loadbalancer.cache.enabled=true and set spring.cloud.loadbalancer.cache.ttl=30s explicitly. This cuts discovery load dramatically compared with a lookup per request.
- Enable circuit breaker integration. Spring Cloud LoadBalancer integrates with Resilience4j. Add spring.cloud.loadbalancer.circuitbreaker.enabled=true on the client. This wraps the load balancer call in a circuit breaker. After N failures, it trips open and stops sending requests to the broken service entirely. Without this, the load balancer keeps rotating through failing instances.
I've seen teams skip all five and go live. They all had an incident within the first month. Don't be that team.
Use RandomLoadBalancer instead of RoundRobinLoadBalancer for cache-heavy services. If every request goes through a cache layer, round-robin creates a thundering herd on cold caches. Random spreads the load more evenly. Swap it in by declaring a RandomLoadBalancer bean in a @LoadBalancerClient configuration class.
The health-check initial delay (spring.cloud.loadbalancer.health-check.initial-delay) should match your pod startup time plus Eureka registration delay. On a Kubernetes cluster with slow image pulls, that's 30-45 seconds. Failure to set this means the load balancer marks new pods as down before they even process a request.
Sticky Sessions: When Affinity Becomes Your Enemy
Sticky sessions (session affinity) seem like a good idea. Pin a user to one instance. Cache their session data. Avoid re-authentication. But in microservices, sticky sessions are a distributed systems nightmare.
Spring Cloud LoadBalancer has a built-in sticky session implementation. It uses the Set-Cookie header (e.g., SC-LB-SESSION) to pin a request to a specific instance. Enable it with spring.cloud.loadbalancer.sticky-session.enabled=true. Sounds convenient. It's not.
Here's the problem: if that instance goes down, the user's session is gone. They're pinned to a dead node. The load balancer will either throw an error or, if retry is enabled, keep retrying the dead instance until max retries. That's a terrible user experience.
I saw this at a fintech startup. They used sticky sessions for authentication tokens. One instance crashed. All users pinned to that instance got logged out. Support tickets exploded. They disabled sticky sessions and moved to Redis-based session storage. Problem solved.
When would you actually want stickiness? Two scenarios: first, if your service handles large file uploads and you need to keep the connection on one node for performance. Second, if you can't switch to distributed caching and must use JVM-local caches. But both cases are architecture smells.
If you must use sticky sessions, at least configure a fallback. Spring Cloud LoadBalancer's StickySessionLoadBalancer wraps another load balancer instance. When the sticky target is down, it delegates to the underlying balancer. You control this with spring.cloud.loadbalancer.sticky-session.fallback-strategy. The default is FAIL, meaning it errors out. Change it to ROUND_ROBIN to survive node failures.
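The delegate-on-failure idea is easier to see in code. This is a minimal plain-Java sketch, not the library's implementation: `StickyWithFallback` and its `choose` signature are hypothetical, and instances are reduced to string ids.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: honor the pinned instance while it is alive, otherwise fall back to round-robin. */
final class StickyWithFallback {
    private final AtomicInteger position = new AtomicInteger();

    /**
     * @param pinnedInstanceId id taken from the sticky cookie, or null when the request has none
     * @param liveInstances    instances currently considered healthy
     */
    String choose(String pinnedInstanceId, List<String> liveInstances) {
        if (liveInstances.isEmpty()) {
            throw new IllegalStateException("no live instances to choose from");
        }
        // Sticky path: only honored while the pinned instance is still in the live list.
        if (pinnedInstanceId != null && liveInstances.contains(pinnedInstanceId)) {
            return pinnedInstanceId;
        }
        // Fallback path: plain round-robin over whatever is alive.
        int index = Math.floorMod(position.getAndIncrement(), liveInstances.size());
        return liveInstances.get(index);
    }
}
```

The important detail is the `contains` check: a pinned id that is no longer in the live list is treated as "no pin" instead of an error, which is what keeps users off a dead node.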
My advice: avoid sticky sessions in production. Use stateless services with a shared cache (Redis, Hazelcast). Your life will be simpler.
Retry Storms: How Round-Robin + No Backoff Killed a Database
Round-robin is the default load-balancing algorithm. It's simple, fair, and terrible for error scenarios. Here's why.
Imagine you have three payment instances. One instance's database connection pool is exhausted — it's returning 503s on every third request. With round-robin, your client sends a request to instance 1 (success), instance 2 (success), instance 3 (fail). The failure triggers a retry, which goes to instance 1 (success), instance 2 (success), instance 3 (fail again). Repeat. You get a steady 33% failure rate. Users feel it.
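Here is a small simulation of that rotation, as a sketch under simple assumptions: plain round-robin, one instance that always fails, and only first attempts counted (no retry). `RoundRobinFailureRate` is a made-up helper, not part of any library.

```java
/** Hypothetical sketch: share of first attempts that land on one permanently failing instance. */
final class RoundRobinFailureRate {

    /**
     * @param instanceCount number of instances in the rotation
     * @param failingIndex  index of the instance that fails every request
     * @param requests      number of requests to simulate
     * @return fraction of requests whose first attempt hits the failing instance
     */
    static double failureRate(int instanceCount, int failingIndex, int requests) {
        int failures = 0;
        for (int i = 0; i < requests; i++) {
            int chosen = i % instanceCount; // round-robin: rotate regardless of health
            if (chosen == failingIndex) {
                failures++;
            }
        }
        return (double) failures / requests;
    }

    public static void main(String[] args) {
        System.out.println(failureRate(3, 2, 300));
    }
}
```

With three instances and one sick node, a third of first attempts fail no matter how long the simulation runs, because the rotation never learns.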
Now add retry without backoff. The retry happens instantly — on the same thread. If you have 100 concurrent requests, your thread pool quickly exhausts. Then threads queue. Then connection timeouts cascade to upstream services. The entire system slows down.
I saw this take down a payment gateway. The team had set maxRetriesOnNextServiceInstance=3. No backoff. The database connection pool on one instance hit its limit. That instance started failing. The retry storm sent 300% more traffic to the other two instances. Their connection pools also filled up. All three died within 90 seconds.
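The arithmetic behind that amplification is worth writing down. This sketch assumes the worst case, where every attempt fails and every instance tried gets its full same-instance retry budget; `RetryAmplification` is a hypothetical helper, not a library class.

```java
/** Hypothetical sketch: worst-case request amplification for a given retry budget. */
final class RetryAmplification {

    /** Total attempts one logical request can generate when every attempt fails. */
    static int worstCaseAttempts(int maxRetriesOnSameInstance, int maxRetriesOnNextInstance) {
        int attemptsPerInstance = 1 + maxRetriesOnSameInstance;
        int instancesTried = 1 + maxRetriesOnNextInstance;
        return attemptsPerInstance * instancesTried;
    }

    /** Extra traffic relative to a no-retry baseline, in percent. */
    static int extraTrafficPercent(int maxRetriesOnSameInstance, int maxRetriesOnNextInstance) {
        return (worstCaseAttempts(maxRetriesOnSameInstance, maxRetriesOnNextInstance) - 1) * 100;
    }
}
```

Under these assumptions, three next-instance retries with no same-instance retries means up to four attempts per request, which is the 300% extra traffic from the incident.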
The fix: set backoff with exponential delay. And reduce maxRetriesOnNextServiceInstance to 1. One retry is often enough to find a healthy instance. If the first retry also fails, something else is broken — more retries won't fix it.
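For intuition, this is roughly what a capped exponential schedule looks like. It is a simplified sketch in plain Java that doubles the delay per attempt and ignores jitter, so it will not match the library's actual timing; `BackoffSchedule` is a hypothetical name.

```java
import java.time.Duration;

/** Hypothetical sketch: capped exponential backoff without jitter. */
final class BackoffSchedule {

    /**
     * @param attempt zero-based retry number (0 = first retry)
     * @param min     delay before the first retry
     * @param max     upper bound for any delay
     */
    static Duration delayFor(int attempt, Duration min, Duration max) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        long delay = min.toMillis();
        long cap = max.toMillis();
        // Double once per attempt, but stop as soon as the cap is reached.
        for (int i = 0; i < attempt && delay < cap; i++) {
            delay *= 2;
        }
        return Duration.ofMillis(Math.min(delay, cap));
    }
}
```

With a 500ms minimum and a 5s maximum, the delays run 500ms, 1s, 2s, 4s, then stay at 5s, which gives a struggling instance seconds of breathing room instead of milliseconds.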
Also: configure a circuit breaker with a sliding window of failures. Resilience4j's TimeLimiter can cut off calls that take too long. I set timeoutDuration=5s on all LoadBalancer-backed calls. If the instance doesn't respond in 5 seconds, fail fast and try another.
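If "sliding window of failures" sounds abstract, this is the core idea in plain Java. It is a deliberately simplified, count-based sketch with no half-open state and no time limiter, so treat it as an illustration of the mechanism and not as a stand-in for Resilience4j. `SlidingWindowBreaker` is a hypothetical class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: opens once the failure rate over the last N calls reaches a threshold. */
final class SlidingWindowBreaker {
    private final int windowSize;
    private final double failureRateThreshold; // e.g. 0.5 = open at 50% failures
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private int failures;
    private boolean open;

    SlidingWindowBreaker(int windowSize, double failureRateThreshold) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
    }

    synchronized boolean allowsCalls() {
        return !open;
    }

    synchronized void record(boolean failed) {
        outcomes.addLast(failed);
        if (failed) {
            failures++;
        }
        // Slide the window: forget the oldest outcome once we exceed the size.
        if (outcomes.size() > windowSize) {
            boolean evictedWasFailure = outcomes.removeFirst();
            if (evictedWasFailure) {
                failures--;
            }
        }
        // Evaluate only on a full window so a single early failure cannot trip the breaker.
        boolean windowFull = outcomes.size() == windowSize;
        if (windowFull && (double) failures / windowSize >= failureRateThreshold) {
            open = true;
        }
    }

    /** Manual reset; a real breaker would probe recovery through a half-open state instead. */
    synchronized void reset() {
        outcomes.clear();
        failures = 0;
        open = false;
    }
}
```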
Production metric to monitor: loadbalancer.retry.attempts. If this spikes above 1.0 average, your instances are sick or your retry policy is too aggressive. Alert on it.
Setting maxRetriesOnSameServiceInstance high (e.g., 3) without backoff creates a tight retry loop on a single failing instance. The thread blocks for N retries, each failing instantly. Thread pool exhaustion follows. Set it to 0 and let the next retry go to a different node.
Use X-LoadBalancer-Instance-Id headers to trace retries. Add that header value to your logs. It'll save you hours in debugging which specific host went bad.
Custom Load Balancing: When Defaults Fail Your Use Case
Round-robin and random cover most cases. But sometimes you need custom logic. For example: routing requests to instances in the same availability zone, or implementing a weighted load balancer based on CPU usage.
Spring Cloud LoadBalancer makes custom implementations straightforward. You implement ReactorLoadBalancer<ServiceInstance> and provide it as a bean. The framework calls choose(Request) where Request contains hints (headers, query params, cookies).
I wrote a custom zone-aware load balancer for a fintech client. They had instances in us-east-1a, us-east-1b, and us-west-2a. Cross-region latency was killing their API call times. The custom balancer checked the X-Forwarded-For headers and mapped IP ranges to zones. Requests from Europe went to us-east-1a (faster from Europe than us-west-2a). Requests from Asia went to us-west-2a. Latency dropped by 40%.
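Stripped of the Spring plumbing, the selection step of a zone-aware balancer can be sketched like this. It is not the code from that project: `ZonePreference`, its nested `Instance` type, and the "fall back to everything" rule are assumptions made for illustration, and resolving the caller's preferred zone is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: prefer instances in one zone, but never return an empty candidate list. */
final class ZonePreference {

    /** Minimal stand-in for a service instance: an id plus the zone it runs in. */
    static final class Instance {
        final String id;
        final String zone;

        Instance(String id, String zone) {
            this.id = id;
            this.zone = zone;
        }
    }

    /** Returns same-zone instances when any exist, otherwise all instances. */
    static List<Instance> candidates(List<Instance> all, String preferredZone) {
        List<Instance> sameZone = new ArrayList<>();
        for (Instance instance : all) {
            if (instance.zone.equals(preferredZone)) {
                sameZone.add(instance);
            }
        }
        // Falling back to the full list trades latency for availability when a zone is empty.
        return sameZone.isEmpty() ? all : sameZone;
    }
}
```

The fallback line is the part people forget. Without it, losing every instance in one zone turns a latency optimization into an outage.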
Here's the gotcha: your Request object must carry the context. By default, DefaultRequest is empty. You must populate it when making the call. For RestTemplate, set the request context via RequestInterceptor. For WebClient, pass headers in the request.
Another common custom requirement: weighted load balancing based on health metrics. I've seen teams expose custom metrics via GET /actuator/loadbalance/weight. The custom LoadBalancer reads this endpoint from each instance and assigns higher weight to healthier nodes. This is powerful but dangerous. Misconfigured weights can send 100% traffic to one node if your metric calculation is wrong.
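A weighted pick, with a guard for the bad-metric case, might look like the sketch below. Everything here is hypothetical: `WeightedChooser` is not a library class, weights are plain integers keyed by instance id, and the random point is passed in so the behavior can be tested deterministically.

```java
import java.util.Map;

/** Hypothetical sketch: weighted selection that refuses to run on unusable weights. */
final class WeightedChooser {

    /**
     * @param weights instance id to non-negative weight; iteration order decides the ranges
     * @param point   a value in [0, 1), e.g. ThreadLocalRandom.current().nextDouble()
     */
    static String choose(Map<String, Integer> weights, double point) {
        if (point < 0.0 || point >= 1.0) {
            throw new IllegalArgumentException("point must be in [0, 1)");
        }
        int total = 0;
        for (int weight : weights.values()) {
            if (weight < 0) {
                throw new IllegalArgumentException("negative weight");
            }
            total += weight;
        }
        // Guard: a broken metric calculation that zeroes every weight must fail loudly.
        if (total == 0) {
            throw new IllegalStateException("all weights are zero");
        }
        double target = point * total;
        int cumulative = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            if (target < cumulative) {
                return entry.getKey();
            }
        }
        throw new AssertionError("unreachable: target is always below the total weight");
    }
}
```

Note what the guard does not catch: if the metric zeroes every node except one, this code will send all traffic to that node. That failure mode needs a floor on weights or an alert, not just a check for zero.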
The trick: test your custom LoadBalancer in a staging environment with fault injection. Use Chaos Monkey to kill random instances. Verify the balancer handles the failure correctly. Don't trust unit tests — they don't capture real network conditions.
Use ServiceInstanceListSupplier instead of the raw discovery client. It integrates caching and health-check filtering automatically. Just provide your metadata with DiscoveryClientServiceInstanceListSupplier.
Testing LoadBalancer Configs: What CI Usually Misses
Unit tests won't catch load balancing bugs. You need integration tests with a real instance registry. Here's my approach after being burned too many times.
First, use @SpringBootTest(webEnvironment = WebEnvironment.RANDOM_PORT) with a WireMockServer or TestContainers to simulate multiple instances. Register them with a local Eureka or Consul server. Then make requests and verify they round-robin or randomize correctly.
Second, test retry behavior. Start with one healthy instance and one that returns 503. Verify the LoadBalancer fails over correctly. Test backoff timing with Thread.sleep() in the unhealthy instance's response. Make sure the retry respects the configured backoff.
Third, test circuit breaker integration. Simulate repeated failures and verify the circuit opens. Then make the instance healthy and verify the circuit closes. This catches misconfigured circuit breaker thresholds.
Fourth, test health-check filtering. Start a new instance that reports DOWN via /actuator/health. Verify the LoadBalancer never sends traffic to it. Then change it to UP and verify traffic resumes.
Fifth — and this is the one everyone misses — test without any healthy instances. Your LoadBalancer should throw a specific exception, not hang forever. Set spring.cloud.loadbalancer.no-available-instances-behavior=THROW_EXCEPTION. Otherwise, it blocks on an empty Mono, causing thread pool exhaustion.
I maintain a test suite with these five scenarios. It catches regressions every sprint.
Don't mock DiscoveryClient with a fixed list of instances that never fail. Real instances crash, become slow, or return partial responses. Your tests must include failure modes: 503, 500, timeout, and connection refused.
The Round-Robin Death Spiral
1) Added spring.cloud.loadbalancer.retry.enabled=true to application.yml.
2) Configured retry backoff: spring.cloud.loadbalancer.retry.maxRetriesOnSameServiceInstance=2.
3) Enabled health-check filtering:
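```yaml
spring:
  cloud:
    loadbalancer:
      configurations: health-check    # use the health-check instance supplier
      health-check:
        initial-delay: 1s             # wait before the first health check
        interval: 5s                  # time between health checks
        path:
          default: /actuator/health   # endpoint probed on each instance
```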
4) Added a circuit breaker using Resilience4j on the Feign client.
5) Deployed the fixed config and scaled up instances. Errors dropped to zero within 30 seconds.
- Default round-robin is a trap.
- Always configure retry, health checking, and circuit breakers client-side.
- The load balancer is not a magic wand — it's a dispatcher.
- If you don't tell it to skip sick nodes, it won't.
- Set spring.cloud.loadbalancer.health-check.initial-delay to match your Eureka heartbeat interval (default 30s). Also verify /actuator/health returns 200 from the new pod.
- Check for X-Forwarded-For or cookie-based affinity. If you didn't configure stickiness, your code might be setting it implicitly. Clear all cookies and try again. Set spring.cloud.loadbalancer.sticky-session.enabled=false to disable.
- Set spring.cloud.loadbalancer.retry.maxRetriesOnSameServiceInstance=1 and maxRetriesOnNextServiceInstance=1. Also add exponential backoff: spring.cloud.loadbalancer.retry.backoff.enabled=true.
- Set a response timeout on the client, e.g. .responseTimeout(Duration.ofSeconds(5)). Also check the LoadBalancer's cache TTL; it may be holding a stale instance list.
kubectl logs -l app=payment-service -c payment --tail=100 | grep 'sticky\|affinity'
curl -I http://payment-service/api/v1/order | grep -i 'set-cookie'
Key takeaways
Common mistakes to avoid
- Not enabling retry before going live: add spring.cloud.loadbalancer.retry.enabled=true and set maxRetriesOnNextServiceInstance=1
- Using round-robin with cache-heavy services: switch to RandomLoadBalancer via a @LoadBalancerClient configuration class
- Enabling sticky sessions without a fallback strategy: set spring.cloud.loadbalancer.sticky-session.fallback-strategy=ROUND_ROBIN
- Setting maxRetriesOnSameServiceInstance > 1 without backoff
- Forgetting to set instance cache TTL: set spring.cloud.loadbalancer.cache.enabled=true and ttl=30s
Interview Questions on This Topic
What happens when all instances of a service are down and you have a Feign client with LoadBalancer + retry?
The Feign client calls LoadBalancerClient.choose(), which throws ServiceInstanceNotFoundException. If retry is enabled (default: 2 retries), it retries the choose() call. After exhausting retries, the Feign client throws a RetryableException. The circuit breaker (if configured) counts this as a failure and may open the circuit. The default behavior without a circuit breaker is to retry until timeout, which can block threads indefinitely. Best practice: configure a short timeout (5s) and a circuit breaker that opens after 3 consecutive failures.
Frequently Asked Questions