Senior 8 min · May 23, 2026

Spring Boot Production Deployment on Kubernetes

Deploy Spring Boot on Kubernetes with production-grade YAML, readiness/liveness probes, graceful shutdown, HPA, resource limits, and zero-downtime strategies.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Use /actuator/health/readiness for readiness probe and /actuator/health/liveness for liveness probe — never swap them
  • Set server.shutdown=graceful, spring.lifecycle.timeout-per-shutdown-phase=30s, and terminationGracePeriodSeconds=60 for graceful shutdown
  • Always set both requests AND limits for CPU and memory: the scheduler places pods by their requests, and HPA computes utilisation as a percentage of the request
  • HPA requires metrics-server installed; for custom metrics use Prometheus Adapter or KEDA
  • ConfigMap for non-sensitive config, Secret for credentials — never bake secrets into container images
What is Spring Boot Production Deployment on Kubernetes?

Kubernetes is a container orchestration platform that manages the deployment, scaling, and health of containerised applications. A Spring Boot application is packaged as a Docker image and declared in a Kubernetes Deployment resource, which ensures a specified number of pod replicas are always running.

Kubernetes continuously reconciles the desired state (defined in YAML manifests) with the actual state of the cluster.

Key Kubernetes resources for a Spring Boot deployment: Deployment (manages pod replicas and rolling update strategy), Service (stable DNS name and load-balanced VIP for a set of pods), ConfigMap (environment configuration injected as env vars or mounted files), Secret (base64-encoded sensitive values — credentials, API keys), HorizontalPodAutoscaler (adjusts replica count based on CPU, memory, or custom metrics), and PodDisruptionBudget (ensures minimum availability during voluntary disruptions like node drains).

Spring Boot 2.3+ includes first-class Kubernetes support: the /actuator/health/readiness and /actuator/health/liveness endpoints in Spring Boot Actuator, and graceful shutdown built on the SmartLifecycle abstraction. ConfigMap reloading is provided separately by the Spring Cloud Kubernetes project.

Plain-English First

Running Spring Boot on Kubernetes is like managing a restaurant chain. Each pod is a kitchen. The readiness probe checks if the kitchen is ready to take orders (DB connected, caches warm). The liveness probe checks if the kitchen hasn't caught fire. HPA is the manager who opens more kitchens during lunch rush and closes them at night. Graceful shutdown is the 'last orders' call — no new customers, finish the current tables.

Deploying a Spring Boot application to Kubernetes for the first time typically takes an afternoon. Getting that deployment production-ready — with proper health checks, graceful shutdown, resource sizing, autoscaling, and zero-downtime rollouts — takes considerably longer, often because the gap between 'it runs' and 'it is production-safe' is invisible until an incident exposes it.

The most common first mistake is missing or misconfigured health probes. A liveness probe that points to /actuator/health (which includes database health) will kill a healthy pod the moment the database becomes temporarily unavailable — creating a cascade where the loss of DB connectivity triggers pod restarts, which surge connection pool creation, which worsens the DB overload. Kubernetes probes require precise configuration of which health indicator reports to which endpoint.

The second common mistake is omitting graceful shutdown. Without it, the JVM exits as soon as it receives SIGTERM, while requests are still in flight and while the pod's removal from Service endpoints is still propagating through the cluster. With a 5-10 second request timeout and no graceful shutdown, requests in flight at deployment time fail with connection resets or 502/503 responses from the proxy in front of the pod. At 10 deployments per day, that is 10 error spikes that accumulate in SLO burn rate dashboards.

Resource requests and limits are equally critical. Missing resource requests means the Kubernetes scheduler cannot make informed placement decisions, and the HPA has no baseline for scaling decisions. Missing limits means a memory leak in one pod can consume all node memory, triggering OOM kills of unrelated pods on the same node.

This guide covers the full production deployment stack: Deployment/Service/ConfigMap/Secret YAML, probe configuration, graceful shutdown setup, HPA with both CPU and custom Prometheus metrics, resource sizing formulas, and the operational checklist for zero-downtime deploys.

Complete Kubernetes Deployment YAML

A production Spring Boot Kubernetes deployment requires eight distinct resource types working together: Deployment, Service, ConfigMap, Secret, HorizontalPodAutoscaler, PodDisruptionBudget, ServiceAccount, and (optionally) a NetworkPolicy. Omitting any of these creates operational gaps.

The Deployment spec defines the pod template, container configuration, resource requests/limits, health probes, lifecycle hooks, and rolling update strategy. The strategy.rollingUpdate section should specify maxUnavailable=0 (zero pods taken down before new ones are ready) and maxSurge=1 (one extra pod spun up during rollout) for true zero-downtime deploys.

Resource requests must be set accurately based on observed metrics, not guessed. Under-setting requests causes the scheduler to pack too many pods on one node, leading to noisy-neighbour CPU throttling. Over-setting requests wastes capacity and prevents efficient bin-packing. The target is to set requests at the p50 (median) observed usage and limits at the p99 plus a 20% buffer.
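The rolling update strategy and resource settings described above could be written roughly as follows. This is a sketch under assumptions, not a definitive manifest: the names my-service and production, the image reference, the replica count, the port, and the resource figures are all illustrative placeholders to be replaced with measured values.

```yaml
# Sketch only: names, image, port, and sizes are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never remove an old pod before a new one is Ready
      maxSurge: 1         # allow one extra pod during the rollout
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: app
          # Hypothetical image reference; pin a digest in production.
          image: registry.example.com/my-service:1.4.2
          ports:
            - name: http
              containerPort: 8080
          resources:
            requests:       # what the scheduler reserves on the node
              cpu: 250m
              memory: 512Mi
            limits:         # hard ceiling enforced at runtime
              cpu: "1"
              memory: 1Gi
```

With maxUnavailable set to 0, the rollout only proceeds when each new pod reports Ready, so the readiness probe effectively gates the whole deployment.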

The ServiceAccount with minimal RBAC permissions follows the principle of least privilege. Spring Cloud Kubernetes uses the service account to read ConfigMaps for live configuration reload — scope this permission tightly.
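A least-privilege setup for that ServiceAccount might look like the sketch below. It assumes the application only needs to read ConfigMaps in its own namespace; the resource names are hypothetical, and the exact verbs your Spring Cloud Kubernetes version needs should be checked against its documentation.

```yaml
# Sketch only: names are hypothetical; verify the required verbs for your setup.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role                      # namespaced Role, not a ClusterRole
metadata:
  name: my-service-config-reader
  namespace: production
rules:
  - apiGroups: [""]             # core API group
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]   # read-only access
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-service-config-reader
  namespace: production
subjects:
  - kind: ServiceAccount
    name: my-service
    namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: my-service-config-reader
```

The Deployment then references the account through spec.template.spec.serviceAccountName.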

A PodDisruptionBudget with minAvailable=1 (or maxUnavailable=1 for larger deployments) ensures that node drains during cluster maintenance do not take down all pods simultaneously. This is frequently overlooked and causes unnecessary downtime during infrastructure maintenance.
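As an illustration, a PodDisruptionBudget for a small deployment could be declared like this. The name and label selector are assumptions and must match the labels on your own pods.

```yaml
# Sketch only: the name and the app label are illustrative assumptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
  namespace: production
spec:
  minAvailable: 1          # evictions are refused if they would drop below this
  selector:
    matchLabels:
      app: my-service      # must match the Deployment's pod labels
```

A PDB only constrains voluntary evictions such as kubectl drain; it does not limit what a rolling update or a crashing container does.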

terminationGracePeriodSeconds Must Exceed App Shutdown Timeout
If spring.lifecycle.timeout-per-shutdown-phase=30s (default), your terminationGracePeriodSeconds must be at least 30 + preStop sleep (5s) + buffer = 45s minimum. Kubernetes sends SIGKILL after terminationGracePeriodSeconds regardless, abruptly killing in-flight requests.
Production Insight
Always pin image tags to a specific digest (image@sha256:...) in production — mutable tags like :latest can silently pull a different image on pod restart.
Key Takeaway
maxUnavailable=0 + maxSurge=1 is the zero-downtime rolling update strategy — never set maxUnavailable>0 if you care about availability during deployments.

Health Probes: Readiness vs Liveness vs Startup

Kubernetes provides three probe types that serve distinct purposes. Confusing them is one of the most common and damaging production misconfigurations.

The liveness probe answers: 'Is this container worth keeping alive?' Kubernetes kills and restarts a container that fails the liveness probe. Therefore, liveness should only check application-internal state — is the JVM running, is the main thread alive, is there no deadlock. It must NOT check external dependencies like databases. If the database goes down, the application is still alive and should remain alive, waiting for the database to recover. Checking external health in the liveness probe causes pod restarts on infrastructure failures, worsening the situation.

The readiness probe answers: 'Is this container ready to receive traffic?' Kubernetes removes a pod from the Service's endpoint pool when it fails the readiness probe, stopping new traffic from routing to it. Readiness SHOULD check external dependencies: is the database connection pool healthy, is the cache warm, are required downstream services reachable. A pod that is failing readiness is still alive and counted in replica count — it just does not receive traffic.

The startup probe answers: 'Has the application finished starting up?' It runs instead of liveness and readiness until it succeeds, giving slow-starting applications (Spring Boot with many auto-configurations, database migration on startup) time to initialise without triggering liveness failures. Once the startup probe succeeds, Kubernetes switches to liveness and readiness probes.
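The three probes could be wired into the container spec along these lines. This is a hedged sketch: it assumes Actuator is served on port 8080 alongside the application, and the period and threshold values are starting points rather than recommendations for every workload.

```yaml
# Fragment of spec.template.spec.containers[] in a Deployment.
# Assumes Actuator listens on port 8080; image and timings are illustrative.
- name: app
  image: registry.example.com/my-service:1.4.2   # hypothetical image
  startupProbe:
    httpGet:
      path: /actuator/health/liveness
      port: 8080
    periodSeconds: 10
    failureThreshold: 30     # up to 30 x 10s = 300s allowed for startup
  livenessProbe:
    httpGet:
      path: /actuator/health/liveness
      port: 8080
    periodSeconds: 10
    failureThreshold: 3      # container restarted after 3 consecutive failures
  readinessProbe:
    httpGet:
      path: /actuator/health/readiness
      port: 8080
    periodSeconds: 5
    failureThreshold: 3      # removed from Service endpoints, not restarted
```

Because the startup probe holds back the other two, no initialDelaySeconds is needed on liveness or readiness in this arrangement.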

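Spring Boot Actuator exposes two probe endpoints when management.endpoint.health.probes.enabled=true (this is switched on automatically when Spring Boot detects a Kubernetes environment); there is no dedicated startup endpoint, so the startup probe usually reuses the liveness endpoint. The /actuator/health/liveness endpoint reports LivenessState (CORRECT or BROKEN). The /actuator/health/readiness endpoint reports ReadinessState (ACCEPTING_TRAFFIC or REFUSING_TRAFFIC). Your application can read these states via ApplicationAvailability and change them by publishing an AvailabilityChangeEvent.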
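On the Spring Boot side, the probe endpoints can be configured in application.yml roughly as follows. One assumption is labelled in the comments: the readiness group is extended with the db indicator, which only exists when a DataSource is configured. By default the readiness group contains readinessState alone.

```yaml
# application.yml sketch. Adding "db" to readiness is an illustrative choice.
management:
  endpoint:
    health:
      probes:
        enabled: true            # exposes /actuator/health/liveness and /readiness
      group:
        readiness:
          include: readinessState,db   # assumption: gate traffic on DB health too
```

The liveness group is deliberately left at its default so that a database outage never triggers a container restart.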

Never Check External Dependencies in the Liveness Probe
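Including database, Redis, or Kafka health in the liveness probe causes pod restarts on infrastructure failures, which is the worst time to restart. Use /actuator/health/liveness (JVM-only checks) for liveness and /actuator/health/readiness for readiness. By default the readiness group contains only readinessState, so add dependency indicators to it explicitly, for example management.endpoint.health.group.readiness.include=readinessState,db.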
Production Insight
Add a startupProbe with failureThreshold=30 and periodSeconds=10 — this gives 5 minutes for Spring Boot startup without triggering false liveness failures on slow nodes.
Key Takeaway
Liveness = JVM alive (never external deps). Readiness = all deps healthy (stops traffic). Startup = slow init allowed (prevents premature liveness failure).

Graceful Shutdown

Graceful shutdown is the process by which a Spring Boot application, upon receiving SIGTERM from Kubernetes, stops accepting new requests, allows in-flight requests to complete, then cleanly shuts down its connections and resources. Without it, every rolling deployment or node drain causes visible errors for the requests in flight at the moment of pod termination.

The graceful shutdown sequence in Kubernetes is: (1) Kubernetes marks the pod as Terminating and, in parallel, starts removing it from Service endpoints (so new traffic stops arriving) and runs the preStop hook; (2) once the preStop hook returns, the kubelet sends SIGTERM to the container; (3) the application's shutdown hooks run: with server.shutdown=graceful the web server stops accepting new requests and waits for in-flight requests to complete (up to the timeout), then Spring stops the remaining lifecycle beans and closes infrastructure components (Kafka consumers, JDBC pool, caches); (4) once terminationGracePeriodSeconds has elapsed, counted from the start of termination and including the preStop hook, Kubernetes sends SIGKILL regardless.

Critical timing issue: endpoint removal and container termination start concurrently, and endpoint updates take time to propagate to kube-proxy, ingress controllers, and load balancers. Without a preStop hook, SIGTERM arrives immediately and the pod can still receive requests for a short window while it is already shutting down. A preStop sleep of 5 seconds delays SIGTERM so that this window closes before the application stops accepting requests.
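The preStop delay and the grace period could be set in the pod template as sketched below. The 5-second sleep and the 60-second grace period are the values used elsewhere in this guide; the image is hypothetical, and the hook assumes the image contains /bin/sh.

```yaml
# Fragment of spec.template.spec in a Deployment.
# Assumes the container image ships /bin/sh; image name is hypothetical.
terminationGracePeriodSeconds: 60   # total budget, includes the preStop hook
containers:
  - name: app
    image: registry.example.com/my-service:1.4.2
    lifecycle:
      preStop:
        exec:
          # Hold the container open while endpoint removal propagates;
          # SIGTERM is only sent after this command returns.
          command: ["/bin/sh", "-c", "sleep 5"]
```

Distroless or other shell-less images cannot run this command, so the hook needs a different mechanism there.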

Spring Boot 2.3+ supports graceful shutdown natively. Set server.shutdown=graceful and spring.lifecycle.timeout-per-shutdown-phase=30s. The timeout is per phase: SmartLifecycle beans are grouped by an integer phase value, phases are stopped in descending order, and each phase gets up to 30 seconds. Total shutdown time can therefore exceed 30 seconds, and reach minutes, for complex applications with several slow lifecycle phases.
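In application.yml form, the two settings named above would look like this minimal sketch. The 30s value simply mirrors the figure used in this guide.

```yaml
# application.yml sketch for graceful shutdown
server:
  shutdown: graceful                  # stop accepting requests, drain in-flight ones
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s   # upper bound per lifecycle phase
```

The same properties can be supplied as environment variables (SERVER_SHUTDOWN, SPRING_LIFECYCLE_TIMEOUT_PER_SHUTDOWN_PHASE) if you prefer to keep them in a ConfigMap.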

Shell Form ENTRYPOINT Breaks SIGTERM
ENTRYPOINT java -jar app.jar (shell form) spawns /bin/sh -c which receives SIGTERM but does not forward it to the JVM. The JVM is force-killed after terminationGracePeriodSeconds. Always use exec form: ENTRYPOINT ["java", "-jar", "app.jar"].
Production Insight
Set terminationGracePeriodSeconds to (spring.lifecycle.timeout-per-shutdown-phase × number of lifecycle phases) + preStop sleep + 10s buffer — calculate it, do not guess.
Key Takeaway
Graceful shutdown requires three coordinated settings: server.shutdown=graceful, terminationGracePeriodSeconds>shutdown timeout, and a preStop sleep to cover the endpoint removal race.

HorizontalPodAutoscaler: CPU and Custom Metrics

The HorizontalPodAutoscaler (HPA) automatically adjusts the replica count of a Deployment based on observed metrics. The most common trigger is CPU utilisation, but production systems often need to scale on custom metrics like request queue depth, Kafka consumer lag, or active WebSocket connections.

CPU-based HPA uses the Kubernetes Metrics Server, which scrapes CPU usage from the kubelet every 15 seconds. HPA calculates the desired replica count as: ceil(currentMetricValue / desiredMetricValue × currentReplicas). For example, with targetCPUUtilizationPercentage=70, 3 replicas at 90% CPU → ceil(90/70 × 3) = ceil(3.86) = 4 replicas.
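A CPU-based HPA matching the 70% example could be declared with the autoscaling/v2 API as sketched here. The object names and the replica bounds are assumptions; averageUtilization is the v2 equivalent of the older targetCPUUtilizationPercentage field.

```yaml
# Sketch only: names and replica bounds are illustrative assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' CPU request
```

Utilisation is measured against the CPU request, so the target Deployment's containers must declare one for this to work.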

Custom metrics HPA requires additional infrastructure. The Prometheus Adapter translates Prometheus metrics into the Kubernetes custom metrics API. KEDA (Kubernetes Event-Driven Autoscaling) is a more powerful alternative that supports scaling to zero and a wider range of metric sources including Kafka consumer lag, RabbitMQ queue depth, and AWS SQS queue length.
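For the Kafka consumer lag case, a KEDA ScaledObject might look like the following sketch. It assumes KEDA is installed in the cluster; the Deployment name, broker address, consumer group, topic, and lag threshold are all hypothetical values.

```yaml
# Sketch only: requires KEDA; every name and value below is hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: my-consumer              # Deployment to scale
  minReplicaCount: 0               # allow scale to zero when there is no lag
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.example.internal:9092
        consumerGroup: orders-consumer
        topic: orders
        lagThreshold: "100"        # target lag per replica, passed as a string
```

KEDA creates and manages an HPA for the target behind the scenes, so a separate hand-written HPA for the same Deployment should not be applied alongside it.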

HPA has configurable scale-up and scale-down stabilisation windows to prevent flapping. The default scale-up stabilisation is 0 seconds (scale up immediately) and scale-down is 300 seconds (wait 5 minutes before scaling down). This asymmetry is intentional — scale up fast to handle load spikes, scale down slowly to avoid premature termination.
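The stabilisation windows are set in the behavior section of an autoscaling/v2 HPA. The sketch below writes out the default window values explicitly; the policies shown are illustrative assumptions, not the Kubernetes defaults.

```yaml
# Fragment of spec in an autoscaling/v2 HorizontalPodAutoscaler.
# Window values match the defaults; the policies are illustrative assumptions.
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0      # react to load spikes immediately
    policies:
      - type: Percent
        value: 100                     # at most double the replicas...
        periodSeconds: 15              # ...per 15-second period
  scaleDown:
    stabilizationWindowSeconds: 300    # wait 5 minutes before scaling down
    policies:
      - type: Pods
        value: 1                       # remove at most one pod...
        periodSeconds: 60              # ...per minute
```

A slow scale-down policy like this trades some idle capacity for fewer terminations while traffic is still settling.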

VerticalPodAutoscaler (VPA) adjusts resource requests/limits based on observed usage, complementing HPA. Use VPA in recommendation mode first (never in auto mode in production) to understand actual resource needs before setting requests and limits manually.
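Recommendation mode corresponds to updateMode "Off" in the VPA spec, as in this sketch. It assumes the VPA components are installed in the cluster, and the object names are hypothetical.

```yaml
# Sketch only: requires the VPA components; names are hypothetical.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  updatePolicy:
    updateMode: "Off"    # compute recommendations only, never evict or resize pods
```

The recommendations then appear in the object's status, for example via kubectl describe vpa my-service-vpa -n production.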

Use KEDA for Event-Driven Scaling
KEDA can scale to zero (0 replicas when no events) and back up on demand — impossible with standard HPA. It supports Kafka lag, RabbitMQ queue depth, AWS SQS, Azure Service Bus, and 50+ other triggers. For async workloads, KEDA provides much more accurate scaling signals than CPU utilisation.
Production Insight
Set minReplicas=2 for any production service — a single-replica deployment has zero tolerance for pod evictions, and node drains will cause downtime.
Key Takeaway
CPU-based HPA is the starting point; migrate to custom metrics (KEDA with Kafka lag) for event-driven workloads where CPU is a poor proxy for actual load.

ConfigMap and Secret Management

Kubernetes ConfigMaps and Secrets decouple application configuration from container images, enabling the same image to run in development, staging, and production with different configuration values. This is a core Twelve-Factor App principle.

ConfigMaps store non-sensitive configuration: feature flags, tuning parameters, service URLs, Spring profiles. Secrets store sensitive values: database passwords, API keys, TLS certificates, Kafka credentials. Secrets are base64-encoded in the API server (not encrypted by default — enable encryption at rest for production clusters).

Spring Boot reads Kubernetes configuration via environment variables (preferred for simple values) or mounted files (preferred for complex configuration like Java keystore files or multi-line YAML). For dynamic configuration reload without pod restart, Spring Cloud Kubernetes can detect ConfigMap changes, either by polling or by watching events, and refresh @RefreshScope beans.

Never bake secrets into Docker images or commit them to source control. Use external secret management: AWS Secrets Manager + External Secrets Operator, HashiCorp Vault + Vault Agent Injector, or Sealed Secrets for GitOps. The External Secrets Operator syncs secrets from external stores into Kubernetes Secrets automatically, including rotation.
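With the External Secrets Operator, the sync from an external store is declared as an ExternalSecret resource. The sketch below is an assumption-heavy illustration: the store name, remote key, and property are hypothetical, a matching SecretStore must already exist, and the apiVersion depends on the operator version installed (v1beta1 is shown here).

```yaml
# Sketch only: requires External Secrets Operator and an existing SecretStore.
# Store name, remote key, and property are hypothetical; check your CRD version.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-service-db
  namespace: production
spec:
  refreshInterval: 1h              # how often to re-read the external store
  secretStoreRef:
    kind: SecretStore
    name: aws-secrets-manager      # hypothetical SecretStore
  target:
    name: my-service-db            # Kubernetes Secret created by the operator
  data:
    - secretKey: password          # key inside the Kubernetes Secret
      remoteRef:
        key: prod/my-service/db    # hypothetical entry in the external store
        property: password
```

The resulting Secret can be consumed by the Deployment like any other Kubernetes Secret.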

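For sensitive Spring Boot properties, prefer environment variable injection over mounted files, since mounted files may be accessible to other processes in the container. Use Kubernetes Secret references in env.valueFrom.secretKeyRef for fine-grained per-key injection. Keep in mind that environment variables are fixed when the container starts, so a rotated Secret only takes effect after the pod restarts.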
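Injecting ConfigMap values and a single Secret key as environment variables could look like this sketch. The ConfigMap and Secret names are hypothetical; SPRING_DATASOURCE_PASSWORD relies on Spring Boot's relaxed binding to map onto spring.datasource.password.

```yaml
# Fragment of spec.template.spec.containers[] in a Deployment.
# ConfigMap and Secret names are hypothetical.
- name: app
  image: registry.example.com/my-service:1.4.2   # hypothetical image
  envFrom:
    - configMapRef:
        name: my-service-config      # every key becomes an environment variable
  env:
    - name: SPRING_DATASOURCE_PASSWORD   # binds to spring.datasource.password
      valueFrom:
        secretKeyRef:
          name: my-service-db
          key: password              # inject only this one key from the Secret
```

Per-key secretKeyRef entries keep the container's environment limited to the credentials it actually uses.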

Enable Kubernetes Secret Encryption at Rest
By default, Kubernetes Secrets are stored base64-encoded (not encrypted) in etcd. Enable Envelope Encryption using a KMS provider (AWS KMS, GCP KMS) to encrypt Secrets at rest. Without this, anyone with etcd access can read all Secrets in plaintext.
Production Insight
Use External Secrets Operator to sync secrets from AWS Secrets Manager or HashiCorp Vault — it handles rotation automatically and keeps Kubernetes Secrets current without manual intervention.
Key Takeaway
ConfigMaps for non-sensitive config (reload without restart via Spring Cloud Kubernetes); Secrets for credentials (use External Secrets Operator, never commit to Git).

Resource Sizing and Zero-Downtime Deploy Checklist

Resource sizing for Spring Boot on Kubernetes requires measuring actual usage, not guessing. Start with VPA in recommendation mode for one week, then set requests at the p50 of observed usage and limits at p99 + 20%.

For a typical Spring Boot REST service with moderate traffic: memory request 512Mi, memory limit 1Gi, CPU request 250m, CPU limit 1000m. Adjust based on your JVM heap configuration (-XX:MaxRAMPercentage=75 → 768Mi heap for 1Gi limit), number of threads (each thread uses ~1MB stack by default), and HikariCP pool size (each connection uses ~10MB in PostgreSQL).
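One way to tie the JVM heap to the container limit is to pass the flag through the JAVA_TOOL_OPTIONS environment variable, which the JVM reads at startup. The sketch below reuses the example sizes from the paragraph above; the image name is hypothetical and the numbers are a starting point, not measured values.

```yaml
# Fragment of spec.template.spec.containers[] in a Deployment.
# Sizes repeat the example figures above; the image name is hypothetical.
- name: app
  image: registry.example.com/my-service:1.4.2
  env:
    - name: JAVA_TOOL_OPTIONS
      value: "-XX:MaxRAMPercentage=75.0"   # heap ceiling = 75% of the memory limit
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi      # 75% of 1Gi gives a 768Mi maximum heap
```

The remaining 256Mi has to cover Metaspace, thread stacks, direct buffers, and other native memory, so a service with many threads may need a lower percentage or a higher limit.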

Zero-downtime deployment requires all of the following to be true simultaneously: (1) readiness probe passes before pod receives traffic, (2) maxUnavailable=0 so old pods stay until new pods are ready, (3) preStop sleep covers load balancer propagation delay, (4) terminationGracePeriodSeconds > shutdown time, (5) PodDisruptionBudget keeps a minimum number of replicas available if a node drain coincides with the rollout (the rolling update itself is governed by maxUnavailable and maxSurge, not by the PDB), (6) new version is backward-compatible with the current database schema (no breaking migrations before deployment completes).

Database migration strategy for zero-downtime: use Flyway or Liquibase with non-breaking migrations. The rule: never remove or rename columns in the same migration as deploying new code that stops using them. Phase it: (1) add new column, deploy new code that writes to both old and new column; (2) backfill new column; (3) deploy code that reads from new column only; (4) remove old column in a later migration after all pods are on the new version.

Use CPU Limits Conservatively — Throttling is Invisible
CPU limits cause CPU throttling (not OOM kill): the application slows down but does not crash. Throttling is easy to miss because CPU usage graphs look normal; it shows up in the cAdvisor metrics container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total. Set CPU limits generously (2-4x request) or not at all for latency-sensitive services. Use resource quotas at the namespace level to bound total cluster consumption instead.
Production Insight
Run kubectl top pods every hour for a week after a new deployment to establish accurate resource baselines — do not trust sizing estimates made before load testing.
Key Takeaway
Zero-downtime deployment is a system property requiring maxUnavailable=0, preStop sleep, graceful shutdown, PDB, and backward-compatible DB migrations — any missing piece causes visible errors.
Production incident · POST-MORTEM · severity: high

Liveness Probe Pointed at /actuator/health Causes Pod Restart Storm During RDS Maintenance

Symptom
At 03:12 AM: PagerDuty alert — order service availability dropped to 23%. Logs showed hundreds of pod restarts. RDS event log showed a planned maintenance failover at 03:10 AM.
Assumption
The on-call engineer assumed the RDS failover caused the database connections to fail and pods to crash. In fact, pods were healthy — they were being killed by Kubernetes.
Root cause
All liveness probes were configured to httpGet: path: /actuator/health. Spring Boot's /actuator/health includes a DataSourceHealthIndicator that reports DOWN when the database is unreachable. When RDS failed over (2-minute window), all pods reported DOWN to the liveness probe. Kubernetes interpreted this as 'the application is crashed' and killed/restarted all pods. The restart surge created 200+ new HikariCP connections trying to reach still-unreachable RDS, worsening connection exhaustion when RDS came back online.
Fix
Immediate: kubectl patch deployments to point liveness probe at /actuator/health/liveness (which only checks application-internal state, not DB). Long-term: added management.endpoint.health.probes.enabled=true and management.health.livenessstate.enabled=true to all Spring Boot apps. Added a PodDisruptionBudget to prevent >1 pod being disrupted simultaneously.
Key lesson
  • The liveness probe must ONLY check if the JVM process is alive and not deadlocked — never external dependencies.
  • Use /actuator/health/liveness for liveness, /actuator/health/readiness for readiness.
  • They exist for exactly this reason.
Production debug guide · Symptom → root cause → fix · 5 entries
Symptom · 01
Pods stuck in CrashLoopBackOff
Fix
Run kubectl describe pod <pod-name> and check the 'Last State' section for the exit code. Exit code 1: application threw an uncaught exception at startup; check kubectl logs <pod> --previous for the stack trace. Exit code 137: the container was killed with SIGKILL, most often an OOM kill (Reason: OOMKilled) because it exceeded its memory limit; check kubectl top pod and increase the memory limit or fix a memory leak. It can also mean the app did not shut down within terminationGracePeriodSeconds; in that case check the graceful shutdown configuration. Exit code 143: the JVM exited after receiving SIGTERM, which is the normal result of a container being stopped; if it appears in a restart loop, check the Events section for failing liveness or startup probes. For startup failures, temporarily increase the startupProbe failureThreshold to 30 to give the app more time to start.
Symptom · 02
Rolling update causes traffic errors — 502/503 during deployment
Fix
This is usually a readiness probe misconfiguration or missing preStop hook. First check: is the readiness probe passing before the pod receives traffic? Add readinessProbe.initialDelaySeconds matching your app's startup time (use startupProbe instead for variable startup times). Second check: does the pod stop receiving traffic before it starts shutting down? Add a preStop hook: lifecycle.preStop.exec.command=['/bin/sh','-c','sleep 5'] — this gives the Kubernetes Service endpoint controller time to remove the pod from the load balancer before the app starts shutting down connections.
Symptom · 03
HPA not scaling up despite high CPU
Fix
Run kubectl describe hpa <name> and read the 'Conditions' section. Common causes: (1) metrics-server not installed: check kubectl top pods; if it fails, install metrics-server. (2) resource requests not set: HPA calculates CPU utilisation as actual/requested; without requests, the calculation fails. (3) deployment already at maxReplicas: increase maxReplicas. (4) stabilisation window: scale-up has no stabilisation delay by default, but a custom behavior.scaleUp setting may add one, while scale-down waits 5 minutes by default; check kubectl describe hpa for the last scale event.
Symptom · 04
Pods not receiving SIGTERM — forceful kill after terminationGracePeriodSeconds
Fix
Check if the application is handling SIGTERM correctly. The JVM runs Spring's shutdown hook when it receives SIGTERM, and Spring Boot 2.3+ drains in-flight requests when server.shutdown=graceful is set (bounded by spring.lifecycle.timeout-per-shutdown-phase). Verify by checking the logs for 'Shutting down ExecutorService' or 'Closing JPA EntityManagerFactory' messages. If absent, the JVM is not receiving SIGTERM: verify your Docker ENTRYPOINT uses exec form: ENTRYPOINT ["java", "-jar", "app.jar"] not shell form: ENTRYPOINT java -jar app.jar. Shell form spawns a shell that receives SIGTERM but does not forward it to the JVM.
Symptom · 05
OOMKilled pods — memory limit exceeded
Fix
Run kubectl top pods --sort-by=memory to find the memory-hungry pods. Check if the JVM heap is configured: the heap ceiling should be roughly 75% of the container memory limit. Without an explicit setting, a container-aware JVM (Java 10+, or 8u191+) caps the heap at 25% of the container memory limit, while older JVMs size the heap from total node memory, which can exceed the container limit. Prefer -XX:MaxRAMPercentage=75.0 over hardcoding -Xmx. Also check for off-heap memory: Metaspace (-XX:MaxMetaspaceSize=256m), direct buffers, and native libraries. Use -XX:+HeapDumpOnOutOfMemoryError to capture heap dumps on OOM.
Debug Cheat Sheet: Immediate Kubernetes diagnostic commands for Spring Boot deployments
Pod CrashLoopBackOff or failing health probes
Immediate action
Describe the pod and get previous container logs
Commands
kubectl describe pod -l app=my-service -n production | grep -A 20 'Last State\|Events'
kubectl logs -l app=my-service --previous -n production --tail=100
Fix now
Exit code 137 = SIGKILL, usually OOM (increase memory limit) or a shutdown that overran terminationGracePeriodSeconds; Exit code 1 = startup failure (check stack trace in previous logs); Exit code 143 = terminated by SIGTERM (check Events for failing liveness or startup probes)
HPA not scaling
Immediate action
Check HPA status and metrics availability
Commands
kubectl describe hpa my-service-hpa -n production
kubectl top pods -l app=my-service -n production
Fix now
If 'unknown' metrics: check metrics-server is running and resource requests are set. If at maxReplicas: increase maxReplicas. If ScalingLimited: check conditions for stabilization window.
Rolling update causing 503 errors
Immediate action
Check rollout status and probe configuration
Commands
kubectl rollout status deployment/my-service -n production
kubectl get endpoints my-service -n production -w
Fix now
Watch endpoints — if pod IP appears before readiness passes, add readinessProbe.initialDelaySeconds. If IP removed too fast at shutdown, add preStop sleep hook.
Pod consuming more memory than expected
Immediate action
Check live memory usage and JVM heap
Commands
kubectl top pods -l app=my-service -n production --sort-by=memory
kubectl exec -it <pod-name> -n production -- jcmd 1 VM.native_memory summary
Fix now
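Add -XX:MaxRAMPercentage=75.0 and -XX:MaxMetaspaceSize=256m to JVM flags; increase container memory limit if legitimate growth. Note that jcmd VM.native_memory only reports data when the JVM was started with -XX:NativeMemoryTracking=summary, and the image must include the JDK's jcmd tool.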
Probe Types Comparison
Probe | Spring Endpoint | Checks | Failure Action | When to Use
Startup | /actuator/health/liveness | JVM alive (no DB) | Restart pod | Slow startup apps; replaces liveness during init
Liveness | /actuator/health/liveness | JVM alive, no deadlock | Restart pod | Detect hung/deadlocked JVM; never external deps
Readiness | /actuator/health/readiness | DB, Kafka, Redis | Remove from LB | Detect unhealthy dependencies; stops traffic routing

Key takeaways

1. Liveness probe must NEVER check external dependencies: use only /actuator/health/liveness (JVM-only); external deps go in the readiness probe only
2. Graceful shutdown requires server.shutdown=graceful, terminationGracePeriodSeconds > total shutdown time, preStop sleep, and exec-form ENTRYPOINT
3. maxUnavailable=0 + maxSurge=1 is the zero-downtime rolling update strategy: any maxUnavailable>0 risks traffic errors during deployment
4. Always set both resource requests and limits: missing requests breaks HPA and causes noisy-neighbour scheduling
5. PodDisruptionBudget with minAvailable=1 is non-negotiable for production services with SLAs
6. Database migrations must be backward-compatible across deployment windows: phase schema changes across two deployments

Common mistakes to avoid

6 patterns
×

Checking external dependencies (DB, Redis) in the liveness probe

Symptom
Pod restarts cascade during infrastructure maintenance — DB failover triggers pod restart storm
Fix
Use /actuator/health/liveness for liveness (JVM-only checks); use /actuator/health/readiness for readiness (all dependencies)
×

Using Docker ENTRYPOINT in shell form instead of exec form

Symptom
SIGTERM not forwarded to JVM — app is forcefully killed after terminationGracePeriodSeconds; in-flight requests lost
Fix
Use exec form: ENTRYPOINT ["java", "-jar", "app.jar"] — the JVM is PID 1 and receives SIGTERM directly
×

Not setting CPU/memory resource requests

Symptom
HPA shows 'unknown' metrics; scheduler places too many pods on one node; noisy-neighbour CPU throttling
Fix
Always set both requests and limits; use VPA recommendation mode for one week to calibrate realistic values
×

Setting terminationGracePeriodSeconds less than spring.lifecycle.timeout-per-shutdown-phase

Symptom
SIGKILL arrives before graceful shutdown completes; in-flight requests killed; data potentially corrupted
Fix
terminationGracePeriodSeconds must be > (lifecycle timeout × phases) + preStop sleep + 10s buffer
×

Running with a single replica (replicas: 1) in production

Symptom
Any pod eviction, node drain, or rolling update causes complete service downtime
Fix
Always run minReplicas: 2; add a PodDisruptionBudget with minAvailable: 1 to protect against concurrent disruptions
×

No preStop hook — race condition between SIGTERM and load balancer endpoint removal

Symptom
Requests routed to terminating pod return connection refused errors during rolling deployments
Fix
Add lifecycle.preStop.exec.command=['/bin/sh','-c','sleep 5'] to cover the load balancer propagation window
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR
What is the difference between a liveness probe and a readiness probe in...
Q02 · SENIOR
What is graceful shutdown in Spring Boot and how does it work with Kuber...
Q03 · SENIOR
How does Kubernetes HPA calculate the number of replicas to scale to?
Q04 · SENIOR
Why must terminationGracePeriodSeconds exceed the spring.lifecycle.timeo...
Q05 · SENIOR
How do you achieve zero-downtime deployments on Kubernetes?
Q06 · SENIOR
How would you scale a Kafka consumer on Kubernetes based on consumer lag...
Q07 · SENIOR
What is a PodDisruptionBudget and when is it needed?
Q08 · SENIOR
How do you handle database schema migrations in a zero-downtime Kubernet...
Q01 of 08 · JUNIOR

What is the difference between a liveness probe and a readiness probe in Kubernetes?

ANSWER
The liveness probe answers 'is the container alive?' — Kubernetes restarts a container that fails it. It should only check application-internal state (JVM alive, no deadlock). The readiness probe answers 'is the container ready for traffic?' — Kubernetes removes a failing pod from the Service endpoint pool, stopping traffic routing, but does not restart it. It should check external dependencies (DB connectivity, cache health). A pod failing readiness is still alive and waiting for dependencies to recover.
FAQ · 6 QUESTIONS

Frequently Asked Questions

01
Should I use cpu limits in production?
02
How do I handle slow Spring Boot startup (>60 seconds) in Kubernetes?
03
What is the difference between ConfigMap and Secret in Kubernetes?
04
How many replicas should a production Spring Boot service run?
05
How do I set JVM heap size correctly for a Kubernetes container?
06
Can I use Spring Cloud Kubernetes to reload configuration without restarting pods?