Intermediate 8 min · May 23, 2026

Spring Cloud Inter-Service Communication: Don't Let Your Microservices Talk Behind Your Back

Q: How do I set a Feign client timeout globally?

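Create a @Configuration class with a @Bean of type Request.Options and set the connect and read timeouts there. Spring Cloud applies it to every Feign client that doesn't override it. Example: new Request.Options(2000, TimeUnit.MILLISECONDS, 5000, TimeUnit.MILLISECONDS, true), where the final argument is followRedirects. The same values can be set in configuration through spring.cloud.openfeign.client.config.default.connectTimeout and readTimeout.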

Q: Eureka vs Consul vs Kubernetes native — which should I use?

If you're on pure Spring Boot with no K8s, use Eureka. If you're on K8s and want faster failover, use Kubernetes native discovery (spring.cloud.kubernetes.discovery). Consul is for polyglot environments where services are written in different languages. Avoid Eureka on K8s unless you tune heartbeats aggressively — it's too slow for K8s pod churn.

Q: How do I handle a 404 response from a downstream service without triggering the circuit breaker?

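Write a custom ErrorDecoder that maps 4xx responses to a dedicated client-error exception type and 5xx responses to a different one. ErrorDecoder.decode must return an Exception, so the decoder alone cannot keep a 404 away from the breaker: by default Resilience4j records every exception as a failure. Add the client-error exception to the breaker's ignoreExceptions list, or restrict recordExceptions to server errors and timeouts. If a 404 simply means "not found", setting dismiss404 = true on @FeignClient (decode404 in older releases) makes Feign decode the response instead of throwing.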

Q: Why does my Feign client sometimes hit a dead instance?

Your client-side view of the registry is stale. The default Eureka registry fetch interval is 30 seconds, so after an instance is de-registered, callers can keep a cached reference for up to 30 seconds. Spring Cloud LoadBalancer's own instance cache (35-second TTL by default) can add to that. Reduce eureka.client.registry-fetch-interval-seconds to 5. Consider Kubernetes native discovery, which watches the API server and picks up changes much faster.

Q: How do I propagate trace context across asynchronous calls?

Use Micrometer's Observation API to wrap your async code. For @Async methods, make sure the task executor propagates the trace context. A plain executor does not carry thread-local context across threads, so in Spring Boot 3 this typically means registering a TaskDecorator such as ContextPropagatingTaskDecorator. For CompletableFuture, pass a context-propagating executor to CompletableFuture.supplyAsync(). For message queues, add the trace headers to the message envelope.
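The "wrapped executor" idea can be sketched in plain JDK code. This is a minimal illustration, not Micrometer's implementation: TraceContext and its single thread-local trace id are hypothetical stand-ins for a real tracing context, and a real setup would rely on the tracing library's own context-propagation support.

```java
import java.util.concurrent.Executor;

// Hypothetical stand-in for a tracing context: one trace id per thread.
final class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    static void set(String traceId) { TRACE_ID.set(traceId); }
    static String get() { return TRACE_ID.get(); }
    static void clear() { TRACE_ID.remove(); }

    // Wraps an executor so each task runs with the submitter's trace id.
    static Executor propagating(Executor delegate) {
        return task -> {
            String captured = get(); // read on the submitting thread
            delegate.execute(() -> {
                String previous = get(); // whatever the worker thread held
                set(captured);
                try {
                    task.run();
                } finally {
                    // Restore the worker's state so ids never leak between tasks
                    if (previous == null) clear(); else set(previous);
                }
            });
        };
    }
}
```

With this in place, CompletableFuture.supplyAsync(TraceContext::get, TraceContext.propagating(pool)) sees the caller's id, while the same call on the bare pool sees null.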

Master Spring Cloud inter-service communication with OpenFeign, Resilience4j, and service discovery.

Naren Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Lessons pulled from things that broke in production.

✓ Production

production tested

July 04, 2026

last updated

1,697

articles · all by Naren

Before you start⏱ 25 min

✓Solid grasp of fundamentals
✓Comfortable reading code examples
✓Basic production concepts


⚡Quick Answer

Spring Cloud OpenFeign is the standard for declarative REST clients; avoid raw RestTemplate in new code.
Service discovery (Eureka) decouples callers from instance IPs; never hardcode URLs in production.
Circuit breakers (Resilience4j) are mandatory for fault isolation; one cascading failure kills your whole system.
Load balancing at the client side (Spring Cloud LoadBalancer) replaces Ribbon; know your defaults.
Distributed tracing (Sleuth on Spring Boot 2, Micrometer Tracing on Spring Boot 3) is non-negotiable for debugging across service boundaries.

✦ Definition~90s read

What is Inter-service Communication in Microservices?

Spring Cloud inter-service communication is the set of patterns and tools for one microservice to call another across a network. It wraps synchronous HTTP calls with service discovery, client-side load balancing, circuit breakers, retries, and distributed tracing.


The core components: OpenFeign for declarative REST clients, Spring Cloud LoadBalancer for picking the right instance, Resilience4j for fault tolerance, and Eureka/Consul/ZooKeeper for service registry. You don't call IPs. You call logical service names.

The framework resolves them.

This is opinionated. There are other ways — gRPC, message queues, raw HTTP — but if you're in a Spring Boot shop, Spring Cloud is your default path. It's battle-tested. It handles the boilerplate you'd otherwise copy-paste and get wrong. The trade-off: you buy into the Spring ecosystem, which means version compatibility hell if you mix releases.

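Stick to the Spring Cloud release-train BOM (spring-cloud-dependencies) that matches your Spring Boot version, and let it manage every Spring Cloud artifact. Do not mix Hoxton with 2020.x. I've seen it. It ends badly.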

The real value isn't the pretty annotation. It's the fallback, the retry, the circuit breaker. Your service will fail. Your network will drop packets. Your downstream will take 30 seconds. Spring Cloud gives you sane defaults and knobs to tune. Ignore them at your peril.

Plain-English First

Think of microservices like a kitchen brigade. Each station (service) needs to pass orders and ingredients to the next. Spring Cloud is the intercom system and the rulebook for how they shout, listen, and handle it when someone burns the steak. Without it, you get dropped orders, burnt food, and a shouting match.

You've heard this story. Two in the morning. Pager goes off. Users are getting 500s on checkout. You SSH into the box, tail the logs, and see a tsunami of HTTP connection timeouts from the order service to the payment service. The payment service is healthy. Its CPU is at 12%. What gives?

I joined that call. The junior on-call had deployed a new version of the inventory service an hour earlier. Inventory used to call payment directly via a hardcoded IP. The IP was still in an old environment variable. The new payment instance got a different pod IP. Inventory kept trying the dead IP. Every request blocked for 30 seconds. Thread pool exhausted. Order service waited for inventory. API gateway queued up. The failure backed up all the way to the user.

That's the cascading failure pattern. One bad URL. No circuit breaker. No retry limit. No timeout. This is what we're here to fix.

Spring Cloud gives you a structured way to wire services together so that a simple deployment doesn't take down your entire platform. It's not magic. It's configuration discipline. You decide the timeout. You decide the retry count. You decide what happens when a service is slow — fail fast or degrade gracefully.

But it's also easy to get wrong. You can misconfigure a timeout so that your circuit breaker never opens. You can forget to add resilience annotations. You can version your dependencies wrong and spend three days on a classpath hell. I've done all of these. This article is so you don't have to.

OpenFeign: The Right Way to Write REST Clients in 2026

Stop writing RestTemplate. Stop copying HttpClient wrappers. OpenFeign is the declarative way to call another microservice. You write an interface. You annotate it with @FeignClient. The framework generates the implementation as a dynamic proxy at runtime. No boilerplate. No manual serialization. The code is clean and testable.

Here's the trap: Feign's default configuration is terrible for production. The default connect timeout is 10 seconds and read timeout is 60 seconds. That's an eternity in a high-volume system. One slow downstream will eat your thread pool alive. You must set explicit timeouts.

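Another trap: Feign's error handling. By default, it throws FeignException on any non-2xx response. That's fine for 500s. But what about 404? What about 409? You need a custom ErrorDecoder that maps client errors to their own exception type, plus a circuit breaker configured to ignore that type. Without both, every 409 Conflict counts as a failure toward opening the breaker, even though the caller can usually recover from it.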

The biggest win: Feign integrates with service discovery and load balancer out of the box. You name the service, Spring resolves the address. No DNS hell. No service mesh required.

I had a service call another via a raw URL for two years. One day the URL changed. The Ops team didn't tell us. Everything broke. Feign with Eureka would have caught it immediately. Don't be that team.

Production Trap:

Never use the default Feign error decoder. A 404 will throw an exception and trip your circuit breaker. You'll take down your whole service because of a missing resource. Write a custom ErrorDecoder that only treats 5xx as failures.
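The decision an ErrorDecoder and a breaker's failure predicate make together can be sketched without any framework. This is a plain-Java illustration under stated assumptions: DownstreamErrors, ClientErrorException, and ServerErrorException are hypothetical names, not Feign or Resilience4j types.

```java
// Sketch of the status-code triage; all class names here are hypothetical.
final class DownstreamErrors {
    // 4xx: the request was wrong or the resource is missing; the downstream is healthy.
    static final class ClientErrorException extends RuntimeException {
        final int status;
        ClientErrorException(int status) {
            super("client error " + status);
            this.status = status;
        }
    }

    // 5xx: the downstream itself is in trouble.
    static final class ServerErrorException extends RuntimeException {
        final int status;
        ServerErrorException(int status) {
            super("server error " + status);
            this.status = status;
        }
    }

    // Mirrors what an error decoder does: turn a status code into an exception.
    static RuntimeException decode(int status) {
        if (status >= 500 && status <= 599) return new ServerErrorException(status);
        if (status >= 400 && status <= 499) return new ClientErrorException(status);
        throw new IllegalArgumentException("not an error status: " + status);
    }

    // Mirrors a breaker's record-failure predicate: everything except client errors counts.
    static boolean countsAsFailure(Throwable error) {
        return !(error instanceof ClientErrorException);
    }
}
```

Here countsAsFailure(decode(404)) is false while countsAsFailure(decode(503)) and an IOException are true, which is the behaviour the ignore list on the real breaker should reproduce.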

Production Insight

In a high-throughput system, I set Feign connect timeout to 500ms and read timeout to 2s. If your downstream can't respond in 2 seconds, you need a queue, not a synchronous call.
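To make the two budgets concrete, here is the same 500ms / 2s split expressed with the JDK's built-in java.net.http client rather than Feign. It is only a sketch: TimeoutBudget is a hypothetical helper, and HttpRequest.timeout bounds the wait for the response rather than each socket read, so it approximates Feign's read timeout instead of matching it exactly.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper showing explicit connect and response time budgets.
final class TimeoutBudget {
    static final Duration CONNECT_TIMEOUT = Duration.ofMillis(500);
    static final Duration RESPONSE_TIMEOUT = Duration.ofSeconds(2);

    // Connection establishment gives up after CONNECT_TIMEOUT.
    static HttpClient client() {
        return HttpClient.newBuilder()
            .connectTimeout(CONNECT_TIMEOUT)
            .build();
    }

    // Each request fails with a timeout if no response arrives within RESPONSE_TIMEOUT.
    static HttpRequest get(String url) {
        return HttpRequest.newBuilder(URI.create(url))
            .timeout(RESPONSE_TIMEOUT)
            .GET()
            .build();
    }
}
```

The point of the split is that a dead address should fail in well under a second, while a live but slow downstream gets a little longer before the caller gives up.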

Key Takeaway

Always configure explicit timeouts and a custom ErrorDecoder for your Feign clients. Defaults are for demos, not production.

thecodeforge.io

Spring Microservices Communication

Service Discovery: Why Hardcoding IPs Is a War Crime

You wouldn't hardcode a database password. Don't hardcode a service URL. Service discovery is the directory that tells Service A where Service B lives. Eureka is the most common in Spring Cloud. Consul and ZooKeeper work too. Pick one and stick with it.

The flow: Service B starts and registers its IP and port with Eureka. It sends heartbeats every 30 seconds. If it misses three heartbeats, Eureka evicts it. Service A asks Eureka for 'payment-service' and gets a list of IPs. The load balancer picks one.
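The register, heartbeat, evict, lookup cycle described above can be sketched as a tiny in-memory registry. This is an illustration only, not Eureka's implementation: LeaseRegistry is a hypothetical class, the clock is passed in explicitly, and the 90-second lease stands for three missed 30-second heartbeats.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lease-based registry; time is injected to keep the sketch deterministic.
final class LeaseRegistry {
    static final long LEASE_MILLIS = 90_000; // three missed 30s heartbeats

    // service name -> (instance address -> time of last heartbeat)
    private final Map<String, Map<String, Long>> leases = new ConcurrentHashMap<>();

    // Registration and renewal are the same operation: record the latest heartbeat.
    void heartbeat(String service, String address, long nowMillis) {
        leases.computeIfAbsent(service, s -> new ConcurrentHashMap<>()).put(address, nowMillis);
    }

    // Drops every instance whose last heartbeat is older than the lease.
    void evictExpired(long nowMillis) {
        for (Map<String, Long> instances : leases.values()) {
            instances.values().removeIf(last -> nowMillis - last > LEASE_MILLIS);
        }
    }

    // What a caller receives when it asks for a logical service name.
    List<String> lookup(String service) {
        List<String> addresses = new ArrayList<>(leases.getOrDefault(service, Map.of()).keySet());
        Collections.sort(addresses);
        return addresses;
    }
}
```

Until evictExpired runs past the lease, lookup still returns an instance that stopped sending heartbeats, which is exactly the staleness window callers have to tolerate.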

The pain point: Eureka's eventual consistency. When you scale up a service, there's a lag before all callers see it. If you scale down, callers might hit a dead instance for up to 30 seconds. That's fine for most systems. If it's not fine, you need readiness gates and smarter health checks.

I once saw a team set eureka.instance.lease-renewal-interval-in-seconds to 3 and lease-expiration-duration-in-seconds to 10. They wanted fast failover. What they got was a constant flapping of instances being de-registered and re-registered because of network hiccups. Stick to defaults (30s and 90s) unless you have a specific reason.

The other mistake: not using service discovery at all. Just using Kubernetes DNS. That works until you have a blue-green deployment and the old service name points to the old version. Eureka gives you instance metadata — version, zone, canary flag. Use it.

Senior Shortcut:

Set eureka.client.registry-fetch-interval-seconds to 5 in production. The default 30 seconds means when you scale up, callers don't see new instances for half a minute. That's 30 seconds of overload on existing instances.

Production Insight

If you're on Kubernetes, consider using the Kubernetes-native service discovery via spring.cloud.kubernetes.discovery. It watches the API server instead of heartbeats. Faster failover, no extra infrastructure.

Key Takeaway

Service discovery isn't optional. It decouples deployment from configuration. One config change in Eureka ripples everywhere. That's the feature, not the bug.

Resilience4j: Your Last Line of Defense Against Cascading Failures

Resilience4j replaced Hystrix after Netflix stopped active development. It's a lightweight, modular library for circuit breakers, retries, rate limiters, bulkheads, and time limiters. You need at least three: circuit breaker, retry, and time limiter.

The circuit breaker has three states: Closed (normal), Open (failing, reject fast), Half-Open (testing if downstream recovered). You configure a failure rate threshold and a sliding window. Defaults: a 50% failure rate over a count-based window of 100 calls opens the breaker, and the rate is not evaluated until 100 calls have been recorded (minimumNumberOfCalls). After 60 seconds in the open state, it goes half-open and lets trial calls through.
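A stripped-down state machine makes the three states and the window arithmetic easier to follow. This is a sketch, not Resilience4j's code: SimpleCircuitBreaker is a hypothetical class, it is not thread-safe, time is passed in by the caller, and half-open is simplified to a single trial call where the real library permits several.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical count-based breaker; single-threaded sketch with an injected clock.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;
    private final int minimumCalls;
    private final double failureRateThreshold; // percent, e.g. 50.0
    private final long waitInOpenMillis;
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAtMillis;

    SimpleCircuitBreaker(int windowSize, int minimumCalls,
                         double failureRateThreshold, long waitInOpenMillis) {
        this.windowSize = windowSize;
        this.minimumCalls = minimumCalls;
        this.failureRateThreshold = failureRateThreshold;
        this.waitInOpenMillis = waitInOpenMillis;
    }

    // True if a call may proceed; moves OPEN to HALF_OPEN once the wait has elapsed.
    boolean tryAcquire(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= waitInOpenMillis) {
            state = State.HALF_OPEN;
        }
        return state != State.OPEN;
    }

    // Records the outcome of a permitted call and updates the state.
    void record(boolean failed, long nowMillis) {
        if (state == State.HALF_OPEN) {
            // Simplification: one trial call decides whether to close or re-open
            if (failed) open(nowMillis); else state = State.CLOSED;
            return;
        }
        window.addLast(failed);
        if (window.size() > windowSize) window.removeFirst();
        // The failure rate is only evaluated once enough calls were seen
        if (window.size() >= minimumCalls && failureRate() >= failureRateThreshold) {
            open(nowMillis);
        }
    }

    private double failureRate() {
        long failures = window.stream().filter(Boolean::booleanValue).count();
        return 100.0 * failures / window.size();
    }

    private void open(long nowMillis) {
        state = State.OPEN;
        openedAtMillis = nowMillis;
        window.clear();
    }

    State state() { return state; }
}
```

With a window of 10, a minimum of 5 calls, and a 50% threshold, two early failures leave the breaker closed; it opens only when the fifth recorded call brings the rate to 3 out of 5.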

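The trap: people shrink the sliding window and the minimum call count too far. In a low-traffic service, 10 calls might take an hour, so old failures linger in a count-based window and a handful of errors can open the circuit. Set minimumNumberOfCalls high enough that the breaker only evaluates the failure rate on a meaningful sample, and consider a time-based sliding window for quiet services.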

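Retries are dangerous without a circuit breaker. If the downstream is slow, retries make it worse. You'll drown it in requests. Make every retry attempt pass through the circuit breaker, so that once the breaker opens, the remaining attempts fail fast instead of reaching the downstream. Order matters, reading from the call outward: TimeLimiter → CircuitBreaker → Retry. That matches the default aspect order of Resilience4j's Spring Boot integration, Retry(CircuitBreaker(RateLimiter(TimeLimiter(Bulkhead(call))))).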

Rate limiting: protect your service from being overwhelmed by a single client. In inter-service communication, this is often per Feign client. If Service A calls Service B 100 times per second and B's rate limit is 50, you'll see 429s. Handle them with backoff.

Bulkheads: limit the number of concurrent threads calling a downstream. Prevents one slow downstream from consuming all threads. Use ThreadPoolBulkhead for async calls, SemaphoreBulkhead for sync.
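The semaphore variant is small enough to sketch with java.util.concurrent.Semaphore. This is an illustration of the idea, not Resilience4j's Bulkhead: SemaphoreBulkheadSketch and BulkheadFullException are hypothetical names, and the sketch rejects immediately instead of waiting for a permit.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical semaphore bulkhead: caps concurrent calls to one downstream.
final class SemaphoreBulkheadSketch {
    static final class BulkheadFullException extends RuntimeException {
        BulkheadFullException(String name) {
            super("bulkhead '" + name + "' is full");
        }
    }

    private final String name;
    private final Semaphore permits;

    SemaphoreBulkheadSketch(String name, int maxConcurrentCalls) {
        this.name = name;
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the call if a permit is free; otherwise rejects without blocking the caller.
    <T> T execute(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            throw new BulkheadFullException(name);
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always hand the permit back, even on failure
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Rejecting fast is the design choice that matters here: a caller that cannot get a permit fails in microseconds instead of parking another thread behind a slow downstream.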

I had a payment service that started responding in 5 seconds after a database migration. No one noticed because the circuit breaker was open for 60 seconds, then tried one request, failed, and opened again. The retry mechanism with backoff eventually got through. Without it, every request would have timed out. The circuit breaker saved us.

Interview Gold:

Order of decorators in Resilience4j matters. Reading from the call outward: TimeLimiter first, then CircuitBreaker, then Retry. Each retry attempt should pass through the circuit breaker. If retries bypass the breaker, you'll hammer the failing service and make recovery slower. An immediate retry right after a failure is useless; back off first.

Production Insight

Keep minimumNumberOfCalls at 5 or more on CircuitBreakerConfig, even when you shrink the window for a low-traffic service (the default is 100). Set it to 1 and a single failure in a quiet service opens the breaker. You'll get flapping (open, close, open, close), which is worse than no circuit breaker.

Key Takeaway

Circuit breakers, retries, and time limiters must be combined. Any one alone is dangerous. The correct order, from the call outward: time limiter → circuit breaker → retry.
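Why the order matters shows up when you count how often the downstream is actually hit. The sketch below uses plain Supplier wrappers rather than Resilience4j: DecoratorOrder, its consecutive-failure breaker that never recovers, and its retry without backoff are all hypothetical simplifications.

```java
import java.util.function.Supplier;

// Hypothetical decorators, just enough to compare the two nesting orders.
final class DecoratorOrder {
    static final class CallNotPermitted extends RuntimeException {}

    // Opens after `threshold` consecutive failures and then rejects every call.
    static <T> Supplier<T> breaker(int threshold, Supplier<T> call) {
        int[] consecutiveFailures = {0};
        return () -> {
            if (consecutiveFailures[0] >= threshold) {
                throw new CallNotPermitted(); // fail fast, downstream untouched
            }
            try {
                T result = call.get();
                consecutiveFailures[0] = 0;
                return result;
            } catch (RuntimeException e) {
                consecutiveFailures[0]++;
                throw e;
            }
        };
    }

    // Tries the call up to maxAttempts times and rethrows the last failure.
    static <T> Supplier<T> retry(int maxAttempts, Supplier<T> call) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return call.get();
                } catch (RuntimeException e) {
                    last = e;
                }
            }
            throw last;
        };
    }
}
```

Against a downstream that always fails, retry(5, breaker(2, call)) reaches it twice and then fails fast, while breaker(2, retry(5, call)) reaches it five times on the first request alone.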


Why You Still Need RestTemplate (And When to Burn It)

Every junior asks: 'RestTemplate is thread-blocking garbage, so I should only use WebClient, right?' Wrong. RestTemplate isn't dead; it's a specialized tool for synchronous, blocking workflows. WebClient buys you nothing when all you need is a simple blocking call in a synchronous service, so don't reach for it on every CRUD endpoint. The real production sin is calling RestTemplate inside a WebFlux reactive stream, where it blocks an event-loop thread, or on virtual threads without isolation. That kills throughput faster than a DDoS. Use RestTemplate for legacy integrations, batch jobs, or simple request-reply patterns where you control the thread pool. Use WebClient when you need non-blocking calls, streaming responses, or reactive backpressure. The rule: if you block, own your thread. If you're reactive, never block.

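RestTemplateVsWebClient.java (Java)

// io.thecodeforge - java tutorial
// RestTemplate wins for blocking batch work
import java.time.Duration;

import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

// PaymentRequest and PaymentResponse are the application's own DTOs (not shown).
@Service
public class LegacyPaymentService {
    private final RestTemplate restTemplate;

    public LegacyPaymentService(RestTemplateBuilder builder) {
        // Explicit timeouts: never rely on the client's defaults
        this.restTemplate = builder
            .setConnectTimeout(Duration.ofSeconds(2))
            .setReadTimeout(Duration.ofSeconds(5))
            .build();
    }

    public PaymentResponse process(PaymentRequest request) {
        // Blocking call: fine here, this is a batch processor
        return restTemplate.postForObject(
            "https://legacy-gateway.internal/payments",
            request,
            PaymentResponse.class
        );
    }
}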

Output

PaymentResponse{ id='txn-3847', status='SETTLED' }

Production Trap:

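Be careful running RestTemplate on virtual threads. Blocking I/O is normally cheap there, but if the underlying HTTP client blocks inside synchronized code, the virtual thread can pin its carrier thread. Pin enough of them and you silently exhaust the carrier pool: response times spike and callers start seeing HTTP 503s. Give those calls a bounded, dedicated executor.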

Key Takeaway

RestTemplate for blocking work you control; WebClient when you embrace async. Mixing them without isolation is a production incident waiting to happen.

Processing Streaming Responses Without Blowing Your Heap

You have a service that returns 50MB of customer orders as JSON. If you read the whole body into memory with bodyToMono(), you just killed your heap. The fix: streaming. WebClient supports streaming responses out of the box via Flux. This is essential for large datasets, real-time feeds, or event streams. When the server emits a line-delimited format such as NDJSON, you consume the stream record by record, processing each element as it arrives. No giant list sitting in memory. The catch? You must respect backpressure or your downstream consumer will get overwhelmed. Spring WebFlux handles backpressure for you if you return Flux<Order> and let the subscriber's demand set the pace. But if you force a .collectList(), you're back to memory land and you've lost the streaming advantage.

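StreamingExample.java (Java)

// io.thecodeforge - java tutorial
// Stream a large response without blowing the heap
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

// Order is the application's own DTO (not shown).
@Service
public class OrderStreamingService {

    private static final Logger log = LoggerFactory.getLogger(OrderStreamingService.class);

    private final WebClient webClient;

    public OrderStreamingService(WebClient.Builder builder) {
        this.webClient = builder.baseUrl("https://orders.internal").build();
    }

    public Flux<Order> streamOrders(String customerId) {
        return webClient.get()
            .uri("/orders/{customerId}", customerId)
            .accept(MediaType.APPLICATION_NDJSON) // Newline-delimited JSON
            .retrieve()
            .bodyToFlux(Order.class) // decoded one record at a time
            .doOnError(e -> log.error("Stream failed for {}", customerId, e));
    }
}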

Output

2026-03-21 10:15:32.123 INFO [orders-service] Processing order #8273

2026-03-21 10:15:32.124 INFO [orders-service] Processing order #8274

(stream continues without heap spike)

Production Trap:

Using .bodyToFlux() but then calling .collectList() anywhere in the pipeline defeats the streaming purpose. Always return Flux directly from your controller or service method.

Key Takeaway

Stream responses with Flux for large datasets. Never read the entire body into memory when you can process items one at a time.
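The memory argument does not depend on WebFlux; it comes from handling one record at a time. As a framework-free sketch, the hypothetical NdjsonRecords helper below reads newline-delimited records from any Reader and hands each line to a callback without building a list. JSON parsing is left to the handler.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper: processes newline-delimited records one at a time.
final class NdjsonRecords {
    // Returns the number of records handled; only the current line is held in memory.
    static long forEachRecord(Reader source, Consumer<String> handler) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) continue; // tolerate blank separator lines
                handler.accept(line);         // process, then let the line be collected
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

Collecting every line into a list inside the handler would recreate the problem that collectList() causes in the reactive version.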

Retries With Exponential Backoff: Don't Hammer A Dying Service

Your order service calls the inventory service. Inventory has a hiccup. You configured a simple retry with a 100ms delay. Now your service is pummeling a failing inventory service with retries every 100ms, making the outage worse. That's a cascading failure. The fix: exponential backoff with jitter. Resilience4j's Retry annotation supports this natively. You specify an initial delay, a multiplier, and a maximum delay. The jitter randomizes the retry interval so your retries don't land on the same clock tick. This is non-negotiable for any production microservice. Configure it globally via application.yml so every service gets the same sensible defaults. If you hardcode retry policies in individual @Retry annotations, you'll get drift and surprises during an incident.

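RetryConfig.java (Java)

// io.thecodeforge - java tutorial
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.TimeoutException;

import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.retry.RetryConfig;
import io.github.resilience4j.retry.RetryRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClientRequestException;

@Configuration
public class RetryConfiguration {

    @Bean
    public RetryRegistry retryRegistry() {
        RetryConfig config = RetryConfig.custom()
            .maxAttempts(4) // 1 initial call + up to 3 retries
            // Exponential backoff with jitter: nominal waits of 200ms, 400ms, 800ms,
            // each randomized by a factor of 0.5 and capped at 5s.
            // waitDuration is redundant once an interval function is set.
            .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(
                Duration.ofMillis(200), 2.0, 0.5, Duration.ofSeconds(5)))
            // WebClient reports connection-level failures as WebClientRequestException
            .retryExceptions(IOException.class, TimeoutException.class,
                WebClientRequestException.class)
            .build();
        return RetryRegistry.of(config);
    }
}

// Usage in service
@Retry(name = "inventoryRetry", fallbackMethod = "inventoryFallback")
public InventoryResponse checkStock(String sku) {
    return webClient.get()
        .uri("/stock/{sku}", sku)
        .retrieve()
        .bodyToMono(InventoryResponse.class)
        .block();
}

// Fallback: same parameters plus the exception; runs after the last attempt fails.
// InventoryResponse.outOfStock(...) is a hypothetical factory on the application's DTO.
private InventoryResponse inventoryFallback(String sku, Throwable cause) {
    return InventoryResponse.outOfStock(sku);
}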

Output

2026-03-21 10:15:35.001 WARN [orders-service] Retry attempt 1/4 for checkStock(sku=12345), waiting 200ms

2026-03-21 10:15:35.435 WARN [orders-service] Retry attempt 2/4, waiting 400ms

2026-03-21 10:15:36.085 WARN [orders-service] Retry attempt 3/4, waiting 800ms

2026-03-21 10:15:37.135 INFO [orders-service] Fallback inventory returned: OutOfStock

Production Trap:

Never retry on 4xx errors. A 401 or 400 won't magically become a 200 on retry. Only retry on 5xx, network timeouts, and IOExceptions.

Key Takeaway

Exponential backoff with jitter is the only acceptable retry strategy. If you don't have it, you're part of the problem during an outage.
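The arithmetic behind exponential backoff with jitter fits in two small functions. This is a sketch of the idea rather than Resilience4j's IntervalFunction: Backoff and its method names are hypothetical, and the jitter here is a uniform spread around the nominal delay.

```java
import java.util.Random;

// Hypothetical helper showing the delay arithmetic only.
final class Backoff {
    // Nominal wait before retry number `retry` (1-based):
    // initial * multiplier^(retry - 1), capped at maxMillis.
    static long nominalDelayMillis(int retry, long initialMillis, double multiplier, long maxMillis) {
        double raw = initialMillis * Math.pow(multiplier, retry - 1);
        return (long) Math.min(raw, (double) maxMillis);
    }

    // Spreads the nominal delay uniformly over [nominal * (1 - factor), nominal * (1 + factor)).
    static long jitteredDelayMillis(long nominalMillis, double factor, Random random) {
        double delta = nominalMillis * factor;
        double low = nominalMillis - delta;
        return (long) (low + random.nextDouble() * 2 * delta);
    }
}
```

With an initial delay of 200ms, a multiplier of 2.0, and a 5-second cap, the nominal waits are 200ms, 400ms, 800ms, and so on up to the cap; a factor of 0.5 turns the 400ms wait into something between 200ms and 600ms, so clients that failed together do not retry together.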

● Production incident · POST-MORTEM · severity: high

The cascading timeout from a missing circuit breaker

Symptom

Checkout page returns 500 after 45 seconds. Payment service logs show no requests. Inventory service logs show 'Connection timed out' on a single endpoint. All other services healthy.

Assumption

The payment service is down or overloaded. Restarted it. No change.

Root cause

Inventory service had a Feign client calling payment-service with a hardcoded URL from an old config map. New payment pod had a different IP. Feign client had no circuit breaker and a connection timeout of 30 seconds. Each checkout request blocked one of inventory's threads for 30 seconds. Tomcat thread pool exhausted in minutes.

Fix

1) Fixed the Feign client to resolve payment-service through service discovery (@FeignClient(name = "payment-service") with no hardcoded url). 2) Added a Resilience4j circuit breaker with a 2-second timeout and 5 retries with exponential backoff. 3) Set a global Feign connect timeout of 2 seconds and read timeout of 5 seconds. 4) Wrote a fallback method returning a cached response for the payment health check.

Key lesson

Every cross-service call MUST have a circuit breaker and a timeout. No exceptions. Without them, you're one deployment away from a platform-wide outage.

Production debug guide: symptom → root cause → fix for the failures that actually happen (4 entries)

Symptom · 01

Feign client logs 'Load balancer does not have available server for client: my-service'

→

Fix

Service discovery is failing. Check the service registry (Eureka dashboard or Consul API) to confirm the target service is registered. Verify the Feign client name matches the spring.application.name of the target service. Check whether the service failed a health check and got de-registered. Common cause: the target service itself is fine, but its health endpoint reports DOWN because a downstream dependency is missing, so the registry stops advertising it. Fix: keep non-critical dependencies out of the health status that feeds the registry, or set a longer eviction interval.

Symptom · 02

Random 500s with no clear pattern — some requests work, some don't

→

Fix

This is often a timeout or circuit breaker opening under load. Check /actuator/health of the calling service for circuit breaker state. Look for 'CircuitBreaker 'my-feign' is OPEN' in logs. Check Resilience4j metrics. Likely cause: the target service has a slow path that occasionally exceeds the timeout. Fix: set a timeout for that specific Feign client with @FeignClient(configuration = MyConfig.class), where MyConfig supplies its own Request.Options, or raise the time limit configured for that call.

Symptom · 03

Feign client logs 'Read timed out' but target service logs show request processed in 100ms

→

Fix

When the target is fast but the caller still times out, the delay is usually on the network path between the services. Check if they are on different Kubernetes nodes or have network policies. Also check SSL/TLS termination: if the target is behind a proxy that buffers before responding, it can appear to time out. Fix: increase read timeout to 30 seconds temporarily to confirm, then investigate the network path. If a proxy sits in between, check its response buffering and idle-timeout settings.

Symptom · 04

Service A calls Service B, but Service B gets a different IP than expected in its logs

→

Fix

Your load balancer is distributing traffic, but you're looking at the wrong instance. Check if Service A is using client-side load balancing (Spring Cloud LoadBalancer) or has a hardcoded URL. If using Eureka, verify Service B has multiple instances. Run 'curl <eureka-server>/eureka/apps/SERVICE-B' to see registered instances. Fix: ensure Service A uses a service name, not an IP, in its Feign client.

★ Debug Cheat Sheet: commands for fast diagnosis in production

Feign client gets 'Connection refused'

Immediate action

Check if target service is running and its port is correct

Commands

kubectl get pods -n your-namespace | grep target-service

kubectl exec -it caller-pod -- curl -v http://target-service:8080/actuator/health

Fix now

Update the target service deployment to match the expected port. In application.yml, set server.port: 8080 (or server.port=8080 in application.properties).

Circuit breaker half-open state fluctuating

Feign client logs 'No instances available for service' but Eureka shows it up

Service Discovery Options in Spring Cloud

Feature	Eureka	Consul	Kubernetes Native
CAP Theorem	AP (availability + partition tolerance)	CP (consistency + partition tolerance)	CP
Setup complexity	Simple — one JAR, one config	Medium — separate binary + config	Low — built into K8s API server
Heartbeat mechanism	Client (service sends heartbeat every 30s)	Agent (Consul agent on each node)	Kubelet + liveness/readiness probes
Metadata support	Yes — key-value metadata map	Yes — rich key-value + tags	Yes — annotations and labels
Health check integration	Heartbeat only by default; actuator health propagated when eureka.client.healthcheck.enabled=true	Script, HTTP, or TTL checks at a configured interval	Readiness probe gates traffic to the pod
Staleness window	Up to 90s (lease expiration), plus client cache refresh	Up to 5s (deregister critical service)	Near-instant (API server watches)
Best for	Spring Boot clusters without K8s	Polyglot environments, service mesh	Teams already on K8s
Anti-pattern	Using Eureka on K8s without extra tuning	Using Consul with Spring Cloud when already on Anthos	Relying solely on DNS for service discovery

⚙ Quick Reference

3 commands from this guide

File	Command / Code	Purpose
RestTemplateVsWebClient.java	@Service	Why You Still Need RestTemplate (And When to Burn It)
StreamingExample.java	@Service	Processing Streaming Responses Without Blowing Your Heap
RetryConfig.java	@Configuration	Retries With Exponential Backoff

Key takeaways

Every inter-service call must have a circuit breaker, timeout, and fallback. No exceptions. Defaults are not production-ready.

Hardcoded service URLs are a ticking time bomb. Use service discovery with a registry. Your future self will thank you.

Distributed tracing is the only way to debug latency across service boundaries. Sample 1% in prod, 100% for errors.

Load balancer and bulkheads protect your service, not the downstream. Monitor their metrics as leading indicators of trouble.

Symptom

Zipkin/Jaeger backend overwhelmed; costs skyrocket; trace storage fills up

Fix

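Set management.tracing.sampling.probability to 0.01. Probability sampling decides when a trace starts, before anyone knows whether it will fail, so capturing errors at 100% needs a tail-based sampling rule in the collector that keeps every trace containing an error.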
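The difference between deciding at the start of a trace and deciding at the end can be sketched in a few lines. This is an illustration only: TraceSampler is a hypothetical class, and real tracers and collectors use their own sampling algorithms.

```java
// Hypothetical sampler contrasting head-based and tail-based decisions.
final class TraceSampler {
    // Head-based: keep a fixed fraction of traces, chosen from the trace id so
    // every service makes the same choice for the same trace.
    static boolean sampleAtStart(long traceId, double probability) {
        long bucket = Math.floorMod(traceId, 10_000L);
        return bucket < Math.round(probability * 10_000);
    }

    // Tail-based: decided after the trace has finished, so errors can always be kept.
    static boolean keepAtEnd(long traceId, boolean hadError, double probability) {
        return hadError || sampleAtStart(traceId, probability);
    }
}
```

At a probability of 0.01, sampleAtStart keeps 100 out of every 10,000 consecutive trace ids, while keepAtEnd additionally keeps any trace that recorded an error.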

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

Your Feign client to the payment service is timing out under load. The p...

Q02JUNIOR

What happens when a circuit breaker is in the OPEN state and a new reque...

Q03SENIOR

Explain the difference between client-side and server-side load balancin...

Q04SENIOR

You need to call a downstream service that sometimes takes 10 seconds. U...

Q05SENIOR

Why does Spring Cloud recommend using @LoadBalanced RestTemplate instead...

Q06JUNIOR

Sampling rate for distributed tracing: what do you set for a high-traffi...

Q07SENIOR

Your service calls two downstream services: A takes 200ms average, B tak...

Q08SENIOR

What happens if you put Retry before Circuit Breaker in the Resilience4j...

Q01 of 08SENIOR

Your Feign client to the payment service is timing out under load. The payment service logs show it's responding in 100ms. What do you check first?

ANSWER

The network path. Check if there's a proxy or load balancer between them. The proxy could be buffering requests. Also check the Feign client's connection pool — if it's exhausted, new requests queue up. Use curl from the caller pod to test the actual latency. Then check the read timeout config on the Feign client — it might be too low for queued requests.

