Service Discovery with Spring Cloud Eureka
Master Spring Cloud Eureka service discovery: @EnableEurekaServer, client registration, heartbeat tuning, self-preservation pitfalls, and zone-aware routing.
- Annotate your server app with @EnableEurekaServer and add the spring-cloud-starter-netflix-eureka-server dependency
- Clients register with @EnableDiscoveryClient (or auto-configured) and spring-cloud-starter-netflix-eureka-client
- Tune heartbeat with eureka.instance.lease-renewal-interval-in-seconds (default 30s) and lease-expiration-duration-in-seconds (default 90s)
- Self-preservation mode prevents mass de-registration during network partitions — disable only in dev (eureka.server.enable-self-preservation=false)
- Zone-aware routing uses eureka.instance.metadata-map.zone to prefer same-zone instances
Eureka is like a hotel concierge who keeps a live directory of every service currently open for business. When your microservice starts up, it checks in with the concierge; when it shuts down, it checks out. Any other service that wants to call it asks the concierge for the current room number instead of hardcoding an address.
In a monolithic application you know exactly where your database and downstream APIs live — they have fixed hostnames in a config file and they stay there. Microservices shattered that comfort zone. A single product page might need to call inventory, pricing, review, and recommendation services, each deployed as an auto-scaling group that assigns a new IP every time a container restarts. Hardcoding those addresses is a recipe for 3 AM pages.
Spring Cloud Eureka solves this with a service registry and client-side discovery. Every microservice registers itself on startup, renews its lease every 30 seconds, and de-registers on graceful shutdown. Consumers fetch the registry, cache it locally, and refresh that cache every 30 seconds by default, falling back to the cached copy if the Eureka server is briefly unavailable. That local cache is what gives Eureka its resilience story.
The production pain point most teams discover too late is self-preservation mode. When the Eureka server stops receiving enough heartbeats — maybe because of a network hiccup, not because services actually died — it refuses to expire registrations to avoid a cascading de-registration event. This is the right behavior in production but baffling in development when you kill a service and it stays in the registry for minutes.
Zone-aware routing is Eureka's answer to latency. In a multi-AZ or multi-region deployment, you want service A in us-east-1a to prefer instances of service B also in us-east-1a before crossing AZ boundaries. Eureka's metadata map and Spring Cloud LoadBalancer's ZonePreferenceServiceInstanceListSupplier wire this together with minimal config.
Health check integration bridges the gap between Eureka's heartbeat mechanism and Spring Boot Actuator's richer health indicators. By default Eureka only knows a service is alive if it can send a heartbeat; with health check callbacks enabled it also considers the /actuator/health status, so a service with a broken database connection is correctly marked DOWN in the registry even though the JVM is running.
This guide covers all of these concerns with production-tested configuration, real incident post-mortems, and the exact commands you need when something goes wrong at 2 AM.
Setting Up the Eureka Server
The Eureka Server is a standalone Spring Boot application. You need spring-cloud-starter-netflix-eureka-server on the classpath and the @EnableEurekaServer annotation on your main application class. By default, the server also tries to register itself as a client; in a standalone setup, disable this by setting eureka.client.register-with-eureka=false and eureka.client.fetch-registry=false.
The server exposes a dashboard at / and the REST API at /eureka/apps. The dashboard is invaluable for debugging but should be secured in production with Spring Security. The REST API is what clients use and should be accessible from your service network.
For high availability, run at least two Eureka server instances and have each register with the other. This peer-to-peer replication means each server maintains a full copy of the registry. Use DNS round-robin or a load balancer in front of the Eureka cluster and point all clients at that address. In AWS, a common pattern is one Eureka instance per AZ with cross-AZ peer replication.
Server configuration requires careful tuning of the eviction interval. The eviction task runs every 60 seconds by default (eureka.server.eviction-interval-timer-in-ms=60000), so with the default 90-second lease expiration a dead instance can stay in the registry for up to 90 + 60 = 150 seconds. In latency-sensitive systems, reduce eureka.instance.lease-expiration-duration-in-seconds (set on each client) and eureka.server.eviction-interval-timer-in-ms (set on the server) together.
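The worst-case arithmetic above can be written down as a small plain-Java helper. This is only a sketch of the reasoning, not Eureka code: the class and method names are hypothetical, and it assumes the simple model in which an instance is removed on the first eviction run after its lease has expired.

```java
import java.time.Duration;

/** Simplified model of the lag between a missed lease and eviction (hypothetical helper). */
final class EvictionWindow {

    private EvictionWindow() {}

    /**
     * @param leaseExpiration  value of eureka.instance.lease-expiration-duration-in-seconds
     * @param evictionInterval value of eureka.server.eviction-interval-timer-in-ms
     * @return the longest time a dead instance can remain listed under this model
     */
    static Duration worstCase(Duration leaseExpiration, Duration evictionInterval) {
        // The lease has to expire first; the eviction task may then need up to
        // one full interval before its next run removes the instance.
        return leaseExpiration.plus(evictionInterval);
    }
}
```

With the defaults, worstCase(Duration.ofSeconds(90), Duration.ofMillis(60_000)) gives 150 seconds; lowering both values to 30 seconds and 10 seconds brings it down to 40 seconds.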
Registering Eureka Clients
The Eureka client is auto-configured when spring-cloud-starter-netflix-eureka-client is on the classpath and spring.application.name is set. The @EnableDiscoveryClient annotation is optional in modern Spring Cloud versions but serves as explicit documentation of intent.
At startup, the client sends a POST /eureka/apps/{appName} request with a JSON payload containing the instance metadata: hostname, IP address, port, secure port, health check URL, homepage URL, and the metadata map. This registration happens in a background thread after the application context is fully started.
The most important client-side tuning parameters are the heartbeat interval (eureka.instance.lease-renewal-interval-in-seconds) and the registry fetch interval (eureka.client.registry-fetch-interval-seconds), both 30 seconds by default. Reducing the heartbeat interval means faster detection of dead instances, at the cost of more network traffic. Reducing the registry fetch interval means consumers see new registrations faster, also at the cost of more traffic. In a large deployment with hundreds of instances, be conservative with these values.
The preferIpAddress setting is important in containerized environments where hostname resolution is unreliable. Setting eureka.instance.prefer-ip-address=true makes the client register with its IP address, and instance-id should be set to something unique like ${spring.application.name}:${spring.application.instance_id:${random.value}} to avoid collisions when multiple instances run on the same host.
For Kubernetes deployments, you often want to register with the pod IP rather than the node hostname. Combine prefer-ip-address=true with a liveness probe that matches your health check URL, and set the initial registration delay to give the pod time to become ready before it starts receiving traffic from the registry.
Self-Preservation Mode Deep Dive
Self-preservation is Eureka's defense against network partitions between the server and its clients. The server tracks the number of heartbeats it expects to receive per minute based on all registered instances. When the actual heartbeat rate drops below 85% of the expected rate (the default value of eureka.server.renewal-percent-threshold), the server assumes it is experiencing a network issue and stops expiring instances.
The math: if you have 100 instances each sending a heartbeat every 30 seconds, the server expects 200 heartbeats per minute, so the 85% threshold is 170 heartbeats per minute. If only 150 arrive (75%), self-preservation activates. The server now holds all 100 instances in the registry even if some have genuinely died.
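The same arithmetic as a plain-Java sketch. The class and method names are hypothetical, and the comparison follows the rule as described in this section (activation when the observed rate drops below the threshold); it is a model of the calculation, not the server's actual implementation.

```java
/** Simplified model of the self-preservation threshold (hypothetical helper). */
final class SelfPreservationMath {

    private SelfPreservationMath() {}

    /** Heartbeats per minute the server expects from all registered instances. */
    static int expectedRenewsPerMinute(int instances, int renewalIntervalSeconds) {
        return (int) (instances * (60.0 / renewalIntervalSeconds));
    }

    /** Minimum heartbeats per minute before the server suspects a network problem. */
    static int renewalThreshold(int instances, int renewalIntervalSeconds, double percentThreshold) {
        return (int) (expectedRenewsPerMinute(instances, renewalIntervalSeconds) * percentThreshold);
    }

    /** True when the observed rate is below the threshold, so eviction is suspended. */
    static boolean selfPreservationActive(int renewsLastMinute, int threshold) {
        return renewsLastMinute < threshold;
    }
}
```

For 100 instances on a 30-second interval with a 0.85 threshold, renewalThreshold returns 170, so 150 observed heartbeats activate self-preservation while 190 do not.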
Why this is correct behavior: without self-preservation, a temporary network partition between the Eureka server and a subset of healthy services would cause those services to be evicted, even though they're actively serving traffic. Consumers would then stop sending them traffic, causing unnecessary downtime.
Why this causes confusion: during a rolling deployment when you're intentionally terminating old instances, self-preservation can cause Eureka to hold the dead instances for much longer than the lease-expiration-duration-in-seconds value suggests, because the eviction task checks if self-preservation is active before evicting.
The correct approach for deployments is to ensure graceful shutdown so instances de-register themselves before being terminated. This does not trigger self-preservation because de-registration is an explicit API call, not a missed heartbeat. Only use eureka.server.enable-self-preservation=false in development or staging environments where you need fast cleanup after testing.
Zone-Aware Routing with Eureka
Zone-aware routing reduces cross-AZ latency and data transfer costs by preferring service instances in the same availability zone. In AWS, cross-AZ traffic within a region costs $0.01/GB in each direction — in a high-throughput system this adds up quickly and also adds 1-3ms of latency compared to same-AZ calls.
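To make "adds up quickly" concrete, here is a rough cost sketch in plain Java. It assumes every gigabyte that crosses an AZ boundary is billed at $0.01 on both the sending and the receiving side, which is how "in each direction" is read here; the class name and the traffic volume in the usage note are hypothetical.

```java
import java.math.BigDecimal;

/** Back-of-the-envelope cross-AZ transfer cost (hypothetical helper, assumed pricing model). */
final class CrossAzCost {

    // Rate quoted in the text; check current AWS pricing before relying on it.
    private static final BigDecimal USD_PER_GB_EACH_DIRECTION = new BigDecimal("0.01");

    private CrossAzCost() {}

    static BigDecimal monthlyUsd(long gigabytesPerMonth) {
        // Assumption: each gigabyte is charged once on egress and once on ingress.
        return USD_PER_GB_EACH_DIRECTION
                .multiply(BigDecimal.valueOf(2))
                .multiply(BigDecimal.valueOf(gigabytesPerMonth));
    }
}
```

Under that assumption, a service pair exchanging 50,000 GB per month across zones would cost about $1,000 per month in transfer fees alone, which same-zone routing avoids.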
Eureka implements zone awareness through instance metadata. Each instance registers with a zone label in its metadata map, and the load balancer (Spring Cloud LoadBalancer in modern Spring Cloud, previously Ribbon) uses this metadata to prefer same-zone instances. The fallback when no same-zone instances are available is to use instances from other zones.
Configuration requires setting the zone on both the provider (so Eureka knows which zone the instance is in) and on the consumer (so LoadBalancer knows which zone to prefer). In AWS, the zone should match the EC2 availability zone label (us-east-1a, us-east-1b, etc.). In ECS and EKS, inject this via environment variables from the instance metadata service.
The ZonePreferenceServiceInstanceListSupplier in Spring Cloud LoadBalancer is the modern implementation, and it can be enabled with spring.cloud.loadbalancer.configurations=zone-preference. It wraps another ServiceInstanceListSupplier and filters the list to prefer same-zone instances. If no same-zone instances are available, it falls back to the full list. This fallback is important: never configure zone routing in a way that leaves consumers with no instances to call when a zone is down.
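The filter-then-fall-back behavior described above can be modeled in a few lines of plain Java. This sketch does not use the Spring Cloud LoadBalancer API; the ZonePreference class and its Instance type are hypothetical stand-ins that only illustrate the selection rule.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Plain-Java model of zone preference with fallback (not the Spring Cloud API). */
final class ZonePreference {

    /** Minimal stand-in for a registered instance: an id plus its zone metadata. */
    static final class Instance {
        final String id;
        final String zone;

        Instance(String id, String zone) {
            this.id = id;
            this.zone = zone;
        }
    }

    private ZonePreference() {}

    static List<Instance> prefer(List<Instance> all, String consumerZone) {
        if (consumerZone == null) {
            return all; // Consumer has no zone configured: nothing to prefer.
        }
        List<Instance> sameZone = all.stream()
                .filter(instance -> consumerZone.equals(instance.zone))
                .collect(Collectors.toList());
        // Fall back to every instance rather than returning an empty list.
        return sameZone.isEmpty() ? all : sameZone;
    }
}
```

A consumer in us-east-1a gets only the us-east-1a instances while any exist; a consumer in a zone with no local instances gets the full list, so a zone outage degrades to cross-zone calls instead of failures.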
Health Check Integration and Instance Status Management
Eureka's default heartbeat mechanism only confirms that the JVM process is alive and the heartbeat thread is running. It does not verify that the application is actually capable of serving requests. A service with a broken database connection, a saturated thread pool, or a crashed internal executor will still send heartbeats and remain in the registry as UP.
Spring Boot Actuator's /actuator/health endpoint aggregates health indicators from all registered components: datasource health, Redis connectivity, disk space, custom business health checks, and more. By enabling eureka.client.healthcheck.enabled=true, you instruct the Eureka client to periodically check the local /actuator/health endpoint and update the instance status in the registry accordingly. If health returns DOWN, the instance status in Eureka is updated to DOWN and consumers will not route traffic to it.
You can also programmatically control instance status using the EurekaClient or ApplicationInfoManager. This is useful for implementing blue-green deployments, graceful pre-shutdown draining, or maintenance mode. Setting the status to OUT_OF_SERVICE removes the instance from consumer registries without de-registering it — it comes back as UP when the status is reset.
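The difference between taking an instance out of service and de-registering it can be illustrated with a toy registry in plain Java. This is a model of the semantics only: StatusAwareRegistry is a hypothetical class, not part of the Eureka client, and it assumes consumers route only to instances whose status is UP.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Toy registry showing status changes without de-registration (hypothetical model). */
final class StatusAwareRegistry {

    enum Status { UP, DOWN, STARTING, OUT_OF_SERVICE, UNKNOWN }

    private final Map<String, Status> instances = new LinkedHashMap<>();

    void register(String instanceId) {
        instances.put(instanceId, Status.UP);
    }

    void setStatus(String instanceId, Status status) {
        // Only known instances change status; this never adds or removes an entry.
        instances.computeIfPresent(instanceId, (id, old) -> status);
    }

    int registeredCount() {
        return instances.size();
    }

    /** Instances a consumer would route to under the UP-only assumption. */
    List<String> routable() {
        return instances.entrySet().stream()
                .filter(entry -> entry.getValue() == Status.UP)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

After setStatus("orders-1", Status.OUT_OF_SERVICE), the instance is still counted by registeredCount() but no longer appears in routable(); setting it back to UP restores it without a new registration.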
Custom health indicators let you define exactly what 'healthy' means for your service. A payment service might include a health check that verifies it can reach the payment gateway. An inventory service might check that its local cache is warm. These business-level health checks, surfaced through Actuator and propagated to Eureka, give you much more accurate traffic routing than a simple JVM heartbeat.
High Availability and Production Hardening
A single Eureka server is a single point of failure. All consumers cache the registry locally, so they can survive a brief Eureka outage, but new instances cannot register and the registry cannot be updated. For true HA, run a cluster of 2-3 Eureka servers with peer replication enabled.
In AWS, the recommended topology is one Eureka instance per AZ, each registered with the others. Use an internal Application Load Balancer or Route 53 round-robin DNS in front of the cluster. Clients should have the ALB/DNS address in their service-url — this way clients are not affected if one Eureka server fails and is replaced with a new IP address.
Eureka server memory requirements are modest — typically 512MB-1GB heap for a deployment of 100-200 services with multiple instances each. However, GC pauses can interrupt heartbeat processing. Use G1GC with a max pause time target of 200ms, and monitor GC pause times as a leading indicator of Eureka health.
Rate limiting on the Eureka server is often overlooked. At startup, all service instances attempt to register simultaneously, creating a thundering herd. The server can handle this, but it is worth staggering startup in your deployment pipeline. Similarly, if a Eureka server restarts, all clients attempt to re-fetch the full registry at roughly the same time. The client's exponential backoff for registry fetches is capped by eureka.client.cache-refresh-executor-exponential-back-off-bound; eureka.client.initial-instance-info-replication-interval-seconds is a different setting that only controls the initial delay before instance info is first replicated to the server.
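For readers unfamiliar with capped exponential backoff, here is a generic plain-Java sketch. It assumes the delay doubles after each consecutive failure and is capped at the base interval multiplied by a bound; the class and parameter names are hypothetical, and the Eureka client's real scheduling may differ in detail.

```java
import java.time.Duration;

/** Generic capped exponential backoff (hypothetical helper, not Eureka client code). */
final class CappedBackoff {

    private CappedBackoff() {}

    /**
     * @param baseInterval        normal delay between attempts, e.g. the registry fetch interval
     * @param backOffBound        multiplier that caps the delay at baseInterval * backOffBound
     * @param consecutiveFailures number of failed attempts in a row
     */
    static Duration delayAfterFailures(Duration baseInterval, int backOffBound, int consecutiveFailures) {
        Duration max = baseInterval.multipliedBy(backOffBound);
        Duration delay = baseInterval;
        for (int i = 0; i < consecutiveFailures; i++) {
            delay = delay.multipliedBy(2);
            if (delay.compareTo(max) >= 0) {
                return max; // Never wait longer than the cap.
            }
        }
        return delay;
    }
}
```

With a 30-second base interval and a bound of 10, the delay grows 30, 60, 120, 240 seconds and then stays at 300 seconds, which spreads retries out instead of letting every client hammer a recovering server on the same schedule.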
Security hardening requires Spring Security on the Eureka server with at minimum HTTP Basic auth. Use TLS everywhere — between clients and server, and between peer Eureka servers. In a zero-trust network architecture, use mutual TLS (mTLS) so only authorized services can register.
Ghost Instances Serving 503s After Blue-Green Deployment
- Eureka de-registration only happens on graceful JVM shutdown.
- In cloud environments where VMs or containers are terminated externally, you must either ensure graceful shutdown hooks run or reduce lease TTLs and accept the tradeoff of more aggressive eviction.
EurekaClient.shutdown() or a /actuator/service-registry?status=OUT_OF_SERVICE was made. If status is UP but consumers still fail, the consumer's local cache may be stale; check eureka.client.registry-fetch-interval-seconds and wait one TTL cycle.
curl -s -H 'Accept: application/json' http://eureka-server:8761/eureka/apps | python3 -m json.tool | grep -A5 'YOUR-SERVICE'
curl -s http://your-service:8080/actuator/health | python3 -m json.tool
Common mistakes to avoid
- Disabling self-preservation in production
- Not setting spring.application.name
- Pointing all clients directly at individual Eureka server IPs instead of a load balancer
- Forgetting to disable CSRF for /eureka/** when adding Spring Security
- Using the same Eureka URL for both peer replication and client registration in HA mode
- Not enabling health check callbacks (eureka.client.healthcheck.enabled=false, the default)
Interview Questions on This Topic
What is self-preservation mode in Eureka and when does it activate?