Intermediate 7 min · May 23, 2026

Spring Boot Production Deployment on Kubernetes

Q: Should I use CPU limits in production?

CPU limits cause throttling (slowing the app, not killing it). Most dashboards do not show it unless you chart throttling metrics such as cAdvisor's container_cpu_cfs_throttled_periods_total. For latency-sensitive services, consider setting CPU limits generously (2-4x the request) or omitting them entirely and using a namespace ResourceQuota to bound total consumption. Memory limits should always be set: a memory limit breach causes an OOM kill, which is more disruptive but at least visible.
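A minimal sketch of that approach, assuming a hypothetical namespace `payments` and illustrative quota sizes. The container keeps CPU and memory requests plus a memory limit but has no CPU limit, and a namespace ResourceQuota bounds the total.

```yaml
# Part 1: container fragment (goes inside a Deployment's pod template)
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    memory: 1Gi          # no cpu entry: no CFS quota, so no limit-induced throttling
---
# Part 2: namespace-wide bound on total consumption (illustrative sizes)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: payments    # hypothetical namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.memory: 32Gi
```

Because this quota tracks requests.cpu, requests.memory and limits.memory, Kubernetes rejects new pods in the namespace that do not declare those values (unless a LimitRange supplies defaults). It does not track limits.cpu, so pods without a CPU limit are still admitted.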

Q: How do I handle slow Spring Boot startup (>60 seconds) in Kubernetes?

Use a startupProbe instead of relying on liveness initialDelaySeconds. Set failureThreshold=30 and periodSeconds=10 to allow up to 5 minutes (30 × 10s) for startup. Kubernetes disables the liveness and readiness probes until the startupProbe succeeds, then hands off to them. This prevents premature liveness failures during Spring Boot's context loading, Flyway migration, and cache warmup.

Q: What is the difference between ConfigMap and Secret in Kubernetes?

ConfigMaps store non-sensitive configuration as plain text. Secrets store sensitive data as base64-encoded values; they are NOT encrypted by default, just encoded. Enable encryption at rest (envelope encryption via KMS) for true secret protection. In practice, assume a Secret is no better protected than a ConfigMap unless encryption at rest is enabled and RBAC restricts who can read it. For production secrets, use External Secrets Operator with AWS Secrets Manager or HashiCorp Vault.

Q: How many replicas should a production Spring Boot service run?

Minimum 2 replicas for any service with an SLA: a single replica has zero tolerance for pod evictions or rolling updates. 3 replicas is the common baseline for reliability under node failure (losing one node leaves 2 replicas serving). Scale based on load via HPA. Set PodDisruptionBudget.minAvailable=1 (for 2 replicas) or maxUnavailable=1 (for 3+ replicas).

Q: How do I set JVM heap size correctly for a Kubernetes container?

Use -XX:MaxRAMPercentage=75.0 instead of a hardcoded -Xmx. This automatically calculates the heap as 75% of the container's memory limit, adapting correctly when you change the limit. Also set -XX:MaxMetaspaceSize=256m to cap Metaspace growth. The total JVM memory footprint is: Heap (75%) + Metaspace (256Mi) + Thread stacks (threads × 1Mi) + Direct buffers + JVM overhead. Size your memory limit to accommodate all of these, not just the heap.
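A sketch of passing these flags through the JVM's standard JAVA_TOOL_OPTIONS environment variable (setting them in the Dockerfile ENTRYPOINT works equally well), with the memory budget for an illustrative 1Gi limit written out in comments:

```yaml
# Container fragment (values illustrative)
env:
- name: JAVA_TOOL_OPTIONS
  value: >-
    -XX:MaxRAMPercentage=75.0
    -XX:MaxMetaspaceSize=256m
resources:
  limits:
    memory: 1Gi
# Budget at a 1Gi limit:
#   heap ceiling    = 0.75 x 1024Mi = 768Mi
#   everything else = 1024Mi - 768Mi = 256Mi for Metaspace (capped at 256Mi),
#                     thread stacks (~1Mi each), direct buffers and JVM overhead
# If those non-heap areas together can exceed 256Mi, lower the percentage
# or raise the limit.
```

The folded scalar (`>-`) joins the two flags into one space-separated string, which is the format the JVM expects in JAVA_TOOL_OPTIONS.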

Q: Can I use Spring Cloud Kubernetes to reload configuration without restarting pods?

Yes. Add the spring-cloud-starter-kubernetes-client-config dependency, import the ConfigMap with spring.config.import=kubernetes:, and set spring.cloud.kubernetes.reload.enabled=true. Beans annotated with @RefreshScope are re-created when the ConfigMap changes. Use mode: event for immediate reload on ConfigMap update, or mode: polling for periodic checks. This enables configuration changes (feature flags, tuning parameters) without a rolling deployment.
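A sketch of the corresponding application.yml, assuming Spring Cloud Kubernetes 3.x (which uses spring.config.import rather than bootstrap configuration) and a hypothetical ConfigMap named payment-config. Property names and defaults differ between releases, so check the reference documentation for your version.

```yaml
# application.yml (sketch)
spring:
  application:
    name: payment-service
  config:
    import: "kubernetes:"       # load ConfigMaps/Secrets via the Kubernetes API
  cloud:
    kubernetes:
      config:
        name: payment-config    # hypothetical ConfigMap name
      reload:
        enabled: true
        mode: event             # or "polling"
        strategy: refresh       # refresh @RefreshScope beans in place
        # period: 15s           # polling interval, used only when mode is polling
```

Event mode relies on the Kubernetes watch API, so the pod's service account needs get, list and watch permissions on ConfigMaps.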

Deploy Spring Boot on Kubernetes with production-grade YAML, readiness/liveness probes, graceful shutdown, HPA, resource limits, and zero-downtime strategies.

Naren Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Drawn from code that ran under real load.

✓ Production tested

Last updated: July 04, 2026

1,697 articles · all by Naren

Before you start (⏱ 25 min)

✓Solid grasp of fundamentals
✓Comfortable reading code examples
✓Basic production concepts


⚡Quick Answer

Use /actuator/health/readiness for readiness probe and /actuator/health/liveness for liveness probe — never swap them
Set server.shutdown=graceful, spring.lifecycle.timeout-per-shutdown-phase=30s and terminationGracePeriodSeconds=60 for graceful shutdown
Always set CPU and memory requests plus a memory limit: missing requests break HPA utilisation calculations and scheduler placement
HPA requires metrics-server installed; for custom metrics use Prometheus Adapter or KEDA
ConfigMap for non-sensitive config, Secret for credentials — never bake secrets into container images

✦ Definition (~90s read)

What is Spring Boot Production Deployment?

Kubernetes is a container orchestration platform that manages the deployment, scaling, and health of containerised applications. A Spring Boot application is packaged as a Docker image and declared in a Kubernetes Deployment resource, which ensures a specified number of pod replicas are always running.


Kubernetes continuously reconciles the desired state (defined in YAML manifests) with the actual state of the cluster.

Key Kubernetes resources for a Spring Boot deployment: Deployment (manages pod replicas and rolling update strategy), Service (stable DNS name and load-balanced VIP for a set of pods), ConfigMap (environment configuration injected as env vars or mounted files), Secret (base64-encoded sensitive values — credentials, API keys), HorizontalPodAutoscaler (adjusts replica count based on CPU, memory, or custom metrics), and PodDisruptionBudget (ensures minimum availability during voluntary disruptions like node drains).

Spring Boot 2.3+ includes first-class Kubernetes support: Actuator's /actuator/health/readiness and /actuator/health/liveness probe endpoints, and graceful web-server shutdown (server.shutdown=graceful) coordinated through the SmartLifecycle abstraction. ConfigMap reloading comes from the separate Spring Cloud Kubernetes project.

Plain-English First

Running Spring Boot on Kubernetes is like managing a restaurant chain. Each pod is a kitchen. The readiness probe checks if the kitchen is ready to take orders (DB connected, caches warm). The liveness probe checks if the kitchen hasn't caught fire. HPA is the manager who opens more kitchens during lunch rush and closes them at night. Graceful shutdown is the 'last orders' call — no new customers, finish the current tables.

Deploying a Spring Boot application to Kubernetes for the first time typically takes an afternoon. Getting that deployment production-ready — with proper health checks, graceful shutdown, resource sizing, autoscaling, and zero-downtime rollouts — takes considerably longer, often because the gap between 'it runs' and 'it is production-safe' is invisible until an incident exposes it.

The most common first mistake is missing or misconfigured health probes. A liveness probe that points to /actuator/health (which includes database health) will kill a healthy pod the moment the database becomes temporarily unavailable. That creates a cascade: lost DB connectivity triggers pod restarts, each restarted pod opens a fresh connection pool, and the burst of new connections worsens the DB overload. Each Kubernetes probe must point at an endpoint whose health indicators match that probe's purpose.

The second common mistake is omitting graceful shutdown. When a pod is terminated, Kubernetes removes it from Service endpoints and sends SIGTERM, but requests already in flight are still being processed. Without graceful shutdown, the web server stops as soon as the JVM begins shutting down, and those requests fail, typically surfacing to clients as 502/503 errors. The longer your requests take (a 5-10 second request timeout is common), the more of them are in flight at that moment. At 10 deployments per day, that is 10 bursts of errors a day, each of which shows up in SLO burn-rate dashboards.

Resource requests and limits are equally critical. Missing resource requests mean the Kubernetes scheduler cannot make informed placement decisions, and the HPA has no baseline for utilisation-based scaling. Missing memory limits mean a memory leak in one pod can consume all node memory, triggering OOM kills of unrelated pods on the same node.

This guide covers the full production deployment stack: Deployment/Service/ConfigMap/Secret YAML, probe configuration, graceful shutdown setup, HPA with both CPU and custom Prometheus metrics, resource sizing formulas, and the operational checklist for zero-downtime deploys.

Complete Kubernetes Deployment YAML

A production Spring Boot Kubernetes deployment typically combines seven resource types: Deployment, Service, ConfigMap, Secret, HorizontalPodAutoscaler, PodDisruptionBudget and ServiceAccount, plus an optional NetworkPolicy. Omitting any of the seven creates operational gaps.

The Deployment spec defines the pod template, container configuration, resource requests/limits, health probes, lifecycle hooks, and rolling update strategy. The strategy.rollingUpdate section should specify maxUnavailable=0 (zero pods taken down before new ones are ready) and maxSurge=1 (one extra pod spun up during rollout) for true zero-downtime deploys.
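A condensed sketch of such a Deployment follows. The name order-service, the image reference, port 8080 (assumed to serve both the application and Actuator) and the sizing numbers are illustrative assumptions borrowed from the typical values in this guide, not taken from a real service.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0          # keep every old pod until a new one is Ready
      maxSurge: 1                # start one extra pod at a time
  template:
    metadata:
      labels:
        app: order-service
    spec:
      serviceAccountName: order-service
      terminationGracePeriodSeconds: 60   # >= 30s shutdown phase + 5s preStop + buffer
      containers:
      - name: app
        # Hypothetical image; in production pin it as image@sha256:...
        image: registry.example.com/order-service:1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: 1000m           # generous (4x request); omit for latency-sensitive services
            memory: 1Gi
        startupProbe:            # liveness/readiness are disabled until this succeeds
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          periodSeconds: 10
          failureThreshold: 30   # 30 x 10s = up to 5 minutes to start
        livenessProbe:           # JVM-internal state only
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:          # dependencies included in the readiness group
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          periodSeconds: 5
          failureThreshold: 3
        lifecycle:
          preStop:               # delay SIGTERM while endpoint removal propagates
            exec:
              command: ["/bin/sh", "-c", "sleep 5"]
```

Apply it with kubectl apply -f deployment.yaml and follow the rollout with kubectl rollout status deployment/order-service. The preStop command assumes the image contains /bin/sh, which Alpine-based images do.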

Resource requests must be set accurately based on observed metrics, not guessed. Under-setting requests causes the scheduler to pack too many pods on one node, leading to noisy-neighbour CPU throttling. Over-setting requests wastes capacity and prevents efficient bin-packing. The target is to set requests at the p50 (median) observed usage and limits at the p99 plus a 20% buffer.

The ServiceAccount with minimal RBAC permissions follows the principle of least privilege. Spring Cloud Kubernetes uses the service account to read ConfigMaps for live configuration reload — scope this permission tightly.
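A sketch of a least-privilege setup, assuming a hypothetical production namespace: a namespaced Role that can only read ConfigMaps, bound to the application's ServiceAccount. get, list and watch are the verbs typically needed for loading ConfigMaps and for event-mode reload; no write verbs are granted.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: order-service-config-reader
  namespace: production
rules:
- apiGroups: [""]                 # core API group
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"] # read-only
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: order-service-config-reader
  namespace: production
subjects:
- kind: ServiceAccount
  name: order-service
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: order-service-config-reader
```

Using a Role (not a ClusterRole) keeps the permission confined to the application's own namespace.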

A PodDisruptionBudget with minAvailable=1 (or maxUnavailable=1 for larger deployments) ensures that node drains during cluster maintenance do not take down all pods simultaneously. This is frequently overlooked and causes unnecessary downtime during infrastructure maintenance.
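For the two-replica case, a minimal PodDisruptionBudget looks like the sketch below. The order-service label is a hypothetical value that must match the Deployment's pod labels.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: order-service
spec:
  minAvailable: 1        # for 2 replicas; for 3+ use maxUnavailable: 1 instead
  selector:
    matchLabels:
      app: order-service
```

A PDB may set minAvailable or maxUnavailable, but not both, which is why the alternative is a comment rather than a second field.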

terminationGracePeriodSeconds Must Exceed App Shutdown Timeout

If spring.lifecycle.timeout-per-shutdown-phase=30s (the default), your terminationGracePeriodSeconds must be at least 30s + preStop sleep (5s) + a 10s buffer = 45s, because the preStop hook runs inside the grace period. Once terminationGracePeriodSeconds elapses, Kubernetes sends SIGKILL regardless, abruptly killing any requests still in flight.

Production Insight

Always pin image tags to a specific digest (image@sha256:...) in production — mutable tags like :latest can silently pull a different image on pod restart.

Key Takeaway

maxUnavailable=0 + maxSurge=1 is the zero-downtime rolling update strategy — never set maxUnavailable>0 if you care about availability during deployments.

thecodeforge.io

Spring Boot Production Deployment

Health Probes: Readiness vs Liveness vs Startup

Kubernetes provides three probe types that serve distinct purposes. Confusing them is one of the most common and damaging production misconfigurations.

The liveness probe answers: 'Is this container worth keeping alive?' Kubernetes kills and restarts a container that fails the liveness probe. Therefore, liveness should only check application-internal state — is the JVM running, is the main thread alive, is there no deadlock. It must NOT check external dependencies like databases. If the database goes down, the application is still alive and should remain alive, waiting for the database to recover. Checking external health in the liveness probe causes pod restarts on infrastructure failures, worsening the situation.

The readiness probe answers: 'Is this container ready to receive traffic?' Kubernetes removes a pod from the Service's endpoint pool when it fails the readiness probe, stopping new traffic from routing to it. Readiness SHOULD check external dependencies: is the database connection pool healthy, is the cache warm, are required downstream services reachable. A pod that is failing readiness is still alive and counted in replica count — it just does not receive traffic.

The startup probe answers: 'Has the application finished starting up?' It runs instead of liveness and readiness until it succeeds, giving slow-starting applications (Spring Boot with many auto-configurations, database migration on startup) time to initialise without triggering liveness failures. Once the startup probe succeeds, Kubernetes switches to liveness and readiness probes.

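Spring Boot Actuator exposes two probe endpoints when management.endpoint.health.probes.enabled=true (they are enabled automatically when Spring Boot detects it is running on Kubernetes); the startup probe usually reuses the liveness endpoint. /actuator/health/liveness reports LivenessState (CORRECT or BROKEN), and /actuator/health/readiness reports ReadinessState (ACCEPTING_TRAFFIC or REFUSING_TRAFFIC). By default each group contains only its availability state: dependency checks such as the database appear in the readiness group only if you add them with management.endpoint.health.group.readiness.include. Your application can read these states via ApplicationAvailability and change them by publishing an AvailabilityChangeEvent.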
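A sketch of the Spring side, assuming a JDBC DataSource is configured (so the db health indicator exists) and that you want it to gate readiness. The readiness group is extended with db, while the liveness group keeps only livenessState.

```yaml
# application.yml (sketch): probe endpoints and health-group membership
management:
  endpoint:
    health:
      probes:
        enabled: true        # explicit here; auto-enabled when running on Kubernetes
      group:
        readiness:
          # Add dependency indicators explicitly; "db" is the id of the
          # auto-configured DataSource health indicator
          include: readinessState,db
  # The liveness group is left at its default (livenessState only)
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
```

With this configuration, /actuator/health/readiness should report DOWN (HTTP 503) while the database is unreachable, and /actuator/health/liveness should stay UP, so Kubernetes stops routing traffic without restarting the pod.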

Never Check External Dependencies in the Liveness Probe

Including database, Redis, or Kafka health in the liveness probe causes pod restarts on infrastructure failures, the worst time to restart. Use /actuator/health/liveness (JVM-only checks) for liveness and /actuator/health/readiness (with the dependency indicators you add to the readiness group) for readiness.

Production Insight

Add a startupProbe with failureThreshold=30 and periodSeconds=10 — this gives 5 minutes for Spring Boot startup without triggering false liveness failures on slow nodes.

Key Takeaway

Liveness = JVM alive (never external deps). Readiness = all deps healthy (stops traffic). Startup = slow init allowed (prevents premature liveness failure).

Graceful Shutdown

Graceful shutdown is the process by which a Spring Boot application, upon receiving SIGTERM from Kubernetes, stops accepting new requests, allows in-flight requests to complete, then cleanly shuts down its connections and resources. Without it, every rolling deployment or node drain causes visible errors for the requests in flight at the moment of pod termination.

The graceful shutdown sequence in Kubernetes is: (1) the pod is marked Terminating and two things start in parallel: the endpoints controller removes it from Service endpoints (load balancers stop routing new traffic once they observe the change), and the kubelet runs the preStop hook; (2) when the preStop hook finishes, the kubelet sends SIGTERM to the container; (3) the application's shutdown runs in order: with server.shutdown=graceful the web server stops accepting new requests and waits for in-flight requests to complete (up to the timeout), then the remaining components close (Kafka consumers, JDBC pool, caches); (4) if the container is still running when terminationGracePeriodSeconds expires (counted from the start of termination, including the preStop hook), Kubernetes sends SIGKILL.

Critical timing issue: endpoint removal propagates asynchronously while steps 1-2 proceed. Without a preStop hook, SIGTERM arrives immediately, and load balancers may keep routing requests to the pod for a few hundred milliseconds to a few seconds after the application has started shutting down. A preStop sleep of 5 seconds delays SIGTERM until that propagation has most likely finished.

Spring Boot 2.3+ supports graceful shutdown natively: set server.shutdown=graceful and spring.lifecycle.timeout-per-shutdown-phase=30s. The timeout applies per phase: SmartLifecycle beans are grouped by phase and stopped in descending phase order, and each phase gets up to 30 seconds. Total shutdown time could therefore stretch to several minutes for complex applications with many lifecycle phases.
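The two Spring Boot properties above, as an application.yml sketch. The matching pod-level values are noted in comments and use this guide's example numbers.

```yaml
# application.yml (sketch): graceful shutdown
server:
  shutdown: graceful                   # stop accepting requests, drain in-flight ones
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s    # upper bound for each SmartLifecycle phase
# The pod spec must allow for this, e.g. terminationGracePeriodSeconds: 60
# (30s phase + 5s preStop sleep + buffer)
```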

Shell Form ENTRYPOINT Breaks SIGTERM

ENTRYPOINT java -jar app.jar (shell form) spawns /bin/sh -c which receives SIGTERM but does not forward it to the JVM. The JVM is force-killed after terminationGracePeriodSeconds. Always use exec form: ENTRYPOINT ["java", "-jar", "app.jar"].

Production Insight

Set terminationGracePeriodSeconds to (spring.lifecycle.timeout-per-shutdown-phase × number of lifecycle phases) + preStop sleep + 10s buffer — calculate it, do not guess.

Key Takeaway

Graceful shutdown requires three coordinated settings: server.shutdown=graceful, terminationGracePeriodSeconds>shutdown timeout, and a preStop sleep to cover the endpoint removal race.


HorizontalPodAutoscaler: CPU and Custom Metrics

The HorizontalPodAutoscaler (HPA) automatically adjusts the replica count of a Deployment based on observed metrics. The most common trigger is CPU utilisation, but production systems often need to scale on custom metrics like request queue depth, Kafka consumer lag, or active WebSocket connections.

CPU-based HPA uses the Kubernetes Metrics Server, which scrapes CPU usage from the kubelet every 15 seconds. HPA calculates the desired replica count as: ceil(currentMetricValue / desiredMetricValue × currentReplicas). For example, with targetCPUUtilizationPercentage=70, 3 replicas at 90% CPU → ceil(90/70 × 3) = ceil(3.86) = 4 replicas.

Custom metrics HPA requires additional infrastructure. The Prometheus Adapter translates Prometheus metrics into the Kubernetes custom metrics API. KEDA (Kubernetes Event-Driven Autoscaling) is a more powerful alternative that supports scaling to zero and a wider range of metric sources including Kafka consumer lag, RabbitMQ queue depth, and AWS SQS queue length.

HPA has configurable scale-up and scale-down stabilisation windows to prevent flapping. The default scale-up stabilisation is 0 seconds (scale up immediately) and scale-down is 300 seconds (wait 5 minutes before scaling down). This asymmetry is intentional — scale up fast to handle load spikes, scale down slowly to avoid premature termination.
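A sketch of a CPU-based HPA in the autoscaling/v2 API. The default stabilisation windows are written out explicitly so they are visible in review; the target name order-service and the replica bounds are hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70          # percent of the CPU *request*
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # default: react immediately
    scaleDown:
      stabilizationWindowSeconds: 300   # default: wait 5 minutes
# Worked example: 3 replicas averaging 90% -> ceil(3 x 90/70) = ceil(3.86) = 4
```

Because utilisation is measured against the request, this HPA reports unknown metrics for pods that have no CPU request.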

VerticalPodAutoscaler (VPA) adjusts resource requests/limits based on observed usage, complementing HPA. Use VPA in recommendation mode first (never in auto mode in production) to understand actual resource needs before setting requests and limits manually.
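Recommendation mode corresponds to updateMode: "Off" in the VerticalPodAutoscaler API. A sketch, assuming the VPA add-on (not part of core Kubernetes) is installed and targeting a hypothetical order-service Deployment:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: order-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  updatePolicy:
    updateMode: "Off"     # recommend only; never evicts or resizes pods
```

Read the recommendations with kubectl describe vpa order-service-vpa. The value is quoted because YAML 1.1 parsers read an unquoted Off as a boolean.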

Use KEDA for Event-Driven Scaling

KEDA can scale to zero (0 replicas when no events) and back up on demand — impossible with standard HPA. It supports Kafka lag, RabbitMQ queue depth, AWS SQS, Azure Service Bus, and 50+ other triggers. For async workloads, KEDA provides much more accurate scaling signals than CPU utilisation.
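A sketch of a KEDA ScaledObject that scales a hypothetical Kafka consumer Deployment on consumer lag. The broker address, topic, consumer group and thresholds are assumptions.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-consumer
spec:
  scaleTargetRef:
    name: order-consumer          # Deployment name (hypothetical)
  minReplicaCount: 0              # scale to zero when there is no lag
  maxReplicaCount: 12
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.kafka.svc:9092   # hypothetical address
      consumerGroup: order-consumer
      topic: orders
      lagThreshold: "50"          # target lag per replica
```

KEDA creates and manages the HPA for the target itself, so do not define a separate HPA for the same Deployment.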

Production Insight

Set minReplicas=2 for any production service — a single-replica deployment has zero tolerance for pod evictions, and node drains will cause downtime.

Key Takeaway

CPU-based HPA is the starting point; migrate to custom metrics (KEDA with Kafka lag) for event-driven workloads where CPU is a poor proxy for actual load.

ConfigMap and Secret Management

Kubernetes ConfigMaps and Secrets decouple application configuration from container images, enabling the same image to run in development, staging, and production with different configuration values. This is a core Twelve-Factor App principle.

ConfigMaps store non-sensitive configuration: feature flags, tuning parameters, service URLs, Spring profiles. Secrets store sensitive values: database passwords, API keys, TLS certificates, Kafka credentials. Secrets are base64-encoded in the API server (not encrypted by default — enable encryption at rest for production clusters).

Spring Boot reads Kubernetes configuration via environment variables (preferred for simple values) or mounted files (preferred for complex configuration like Java keystore files or multi-line YAML). For dynamic configuration reload without a pod restart, Spring Cloud Kubernetes (with reload enabled) watches or polls the ConfigMap for changes and refreshes @RefreshScope beans.

Never bake secrets into Docker images or commit them to source control. Use external secret management: AWS Secrets Manager + External Secrets Operator, HashiCorp Vault + Vault Agent Injector, or Sealed Secrets for GitOps. The External Secrets Operator syncs secrets from external stores into Kubernetes Secrets automatically, including rotation.
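A sketch of an ExternalSecret, assuming External Secrets Operator is installed and a ClusterSecretStore for AWS Secrets Manager already exists under the hypothetical name aws-secrets-manager. The example uses the external-secrets.io/v1beta1 API; check which API version your installed operator release serves.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: order-db-credentials
spec:
  refreshInterval: 1h              # how often ESO re-reads the external store
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager      # hypothetical store, configured separately
  target:
    name: order-db-credentials     # Kubernetes Secret that ESO creates and updates
  data:
  - secretKey: password            # key inside the generated Secret
    remoteRef:
      key: prod/order-service/db   # hypothetical Secrets Manager entry
      property: password           # JSON field within that entry
```

ESO re-reads the source every refreshInterval, so a rotated value reaches the Kubernetes Secret without manual intervention. Pods that consume it as environment variables still need a restart to see the new value.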

For sensitive Spring Boot properties, prefer environment variable injection over mounted files — mounted files may be accessible to other processes in the container. Use Kubernetes Secret references in env.valueFrom.secretKeyRef for fine-grained per-key injection.
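A sketch of per-key injection with hypothetical ConfigMap and Secret names. Non-sensitive settings arrive in bulk via envFrom, and the database password is injected as a single key via secretKeyRef.

```yaml
# Container fragment
envFrom:
- configMapRef:
    name: order-service-config          # hypothetical ConfigMap
env:
- name: SPRING_DATASOURCE_PASSWORD      # relaxed binding -> spring.datasource.password
  valueFrom:
    secretKeyRef:
      name: order-db-credentials        # hypothetical Secret
      key: password
```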

Enable Kubernetes Secret Encryption at Rest

By default, Kubernetes Secrets are stored base64-encoded (not encrypted) in etcd. Enable Envelope Encryption using a KMS provider (AWS KMS, GCP KMS) to encrypt Secrets at rest. Without this, anyone with etcd access can read all Secrets in plaintext.
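On self-managed control planes, envelope encryption is configured with an EncryptionConfiguration file passed to kube-apiserver via --encryption-provider-config. Managed services such as EKS and GKE expose it as a cluster setting instead. A sketch using the KMS v2 provider, where the plugin name and socket path are hypothetical and depend on the KMS plugin you deploy:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - kms:                                   # first provider encrypts new writes
      apiVersion: v2
      name: cloud-kms                      # hypothetical plugin name
      endpoint: unix:///var/run/kms-plugin/socket.sock   # hypothetical socket
      timeout: 3s
  - identity: {}                           # still reads Secrets stored unencrypted
```

Only Secrets written after the change are encrypted. Existing ones are encrypted when rewritten, for example with kubectl get secrets --all-namespaces -o json | kubectl replace -f -.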

Production Insight

Use External Secrets Operator to sync secrets from AWS Secrets Manager or HashiCorp Vault — it handles rotation automatically and keeps Kubernetes Secrets current without manual intervention.

Key Takeaway

ConfigMaps for non-sensitive config (reload without restart via Spring Cloud Kubernetes); Secrets for credentials (use External Secrets Operator, never commit to Git).

Resource Sizing and Zero-Downtime Deploy Checklist

Resource sizing for Spring Boot on Kubernetes requires measuring actual usage, not guessing. Start with VPA in recommendation mode for one week, then set requests at the p50 of observed usage and limits at p99 + 20%.

For a typical Spring Boot REST service with moderate traffic: memory request 512Mi, memory limit 1Gi, CPU request 250m, CPU limit 1000m. Adjust based on your JVM heap configuration (-XX:MaxRAMPercentage=75 → 768Mi heap for a 1Gi limit), number of threads (each thread uses ~1MB of stack by default), and HikariCP pool size (each connection costs roughly 10MB of memory on the PostgreSQL server, so pool size drives database-side sizing as well).

Zero-downtime deployment requires all of the following to be true simultaneously: (1) readiness probe passes before pod receives traffic, (2) maxUnavailable=0 so old pods stay until new pods are ready, (3) preStop sleep covers load balancer propagation delay, (4) terminationGracePeriodSeconds > shutdown time, (5) PodDisruptionBudget ensures minimum replicas during the rollout, (6) new version is backward-compatible with the current database schema (no breaking migrations before deployment completes).

Database migration strategy for zero-downtime: use Flyway or Liquibase with non-breaking migrations. The rule: never remove or rename columns in the same migration as deploying new code that stops using them. Phase it: (1) add new column, deploy new code that writes to both old and new column; (2) backfill new column; (3) deploy code that reads from new column only; (4) remove old column in a later migration after all pods are on the new version.

Use CPU Limits Conservatively — Throttling is Invisible

CPU limits cause CPU throttling (not OOM kill) — the application slows down but does not crash. Throttling is invisible in most metrics dashboards. Set CPU limits generously (2-4x request) or not at all for latency-sensitive services. Use resource quotas at the namespace level to bound total cluster consumption instead.

Production Insight

Run kubectl top pods every hour for a week after a new deployment to establish accurate resource baselines — do not trust sizing estimates made before load testing.

Key Takeaway

Zero-downtime deployment is a system property requiring maxUnavailable=0, preStop sleep, graceful shutdown, PDB, and backward-compatible DB migrations — any missing piece causes visible errors.

Why Your Dockerfile Is Sabotaging Your Deployments

Most Spring Boot Dockerfiles copy the entire fat JAR into a single layer. That means every code change forces a full image rebuild and a full layer download on every node in your cluster. In production, that's minutes of cold start time you don't have.

Spring Boot 3.x ships with native layered JAR support. When you build with Maven or Gradle, the JAR is split into four layers: dependencies, spring-boot-loader, snapshot-dependencies, and application code. Your Dockerfile should extract these layers into separate image layers. Now, when you change one Java file, only the application layer is rebuilt and pushed. Kubernetes nodes pull only the changed layer — not the entire 200MB artifact.

The payoff: deployments go from 90 seconds to 15. Zero-downtime rollouts actually work because your pods are ready before the old ones are killed.

Dockerfile

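# io.thecodeforge: docker best practices
# Stage 1: unpack the layered Spring Boot JAR
FROM eclipse-temurin:17-jre-alpine AS builder
WORKDIR /app
ARG JAR_FILE=target/*.jar
COPY ${JAR_FILE} app.jar
# layertools jarmode; newer Spring Boot releases offer -Djarmode=tools instead
RUN java -Djarmode=layertools -jar app.jar extract

# Stage 2: copy layers from least to most frequently changed so Docker caches the large ones
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=builder /app/dependencies/ ./
COPY --from=builder /app/spring-boot-loader/ ./
COPY --from=builder /app/snapshot-dependencies/ ./
COPY --from=builder /app/application/ ./
# Exec form: the JVM is PID 1 and receives SIGTERM directly.
# This launcher class name applies to Spring Boot 3.2+ (older: org.springframework.boot.loader.JarLauncher)
ENTRYPOINT ["java", "org.springframework.boot.loader.launch.JarLauncher"]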

Output

Layer sizes: dependencies (85MB), app code (12MB). Image pull time: 8 seconds vs 45.

Production Trap:

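Never ship the fat JAR as a single COPY layer in the final image: every code change invalidates that whole layer and forces nodes to pull it again. Extract the layers in a multi-stage build and copy them from least to most frequently changed.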

Key Takeaway

Fat JARs are for devs. Layered images are for production.

Stop Hardcoding Environment Names in Spring Boot Config

Competitor tutorials show you how to use application-dev.yml and application-prod.yml. That works until you need to deploy the same artifact to staging, canary, and prod from one CI pipeline. Hardcoded profiles break reproducibility.

Spring Boot 3.x supports externalized configuration via Kubernetes ConfigMaps and Secrets. But the real power is in Spring Cloud Kubernetes: it can reload configuration without restarting the pod. Import the ConfigMap with spring.config.import=kubernetes: and enable reload; Spring Cloud Kubernetes then reads the ConfigMap through the Kubernetes API, watches (or polls) it for changes, and refreshes beans annotated with @RefreshScope.

Why this matters: you can roll back a config change in seconds by editing the ConfigMap. No redeploy. No image rebuild. In event mode the pod picks up the new values as soon as the change event arrives; in polling mode, within the polling period (15 seconds by default).

Map your environment-specific secrets to Spring properties using spring.config.import. Your code stays clean. Your ops team sleeps better.

deployment.yaml (YAML)

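# io.thecodeforge: spring cloud kubernetes config
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      # Service account needs get/list/watch on ConfigMaps
      serviceAccountName: payment-service
      containers:
      - name: app
        image: registry.example.com/payment-service:1.0.0   # hypothetical image
        env:
        # Read the ConfigMap through the Kubernetes API (Spring Cloud Kubernetes)
        - name: SPRING_CONFIG_IMPORT
          value: "kubernetes:"
        - name: SPRING_CLOUD_KUBERNETES_CONFIG_NAME
          value: payment-config
        # Refresh @RefreshScope beans when the ConfigMap changes
        - name: SPRING_CLOUD_KUBERNETES_RELOAD_ENABLED
          value: "true"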

Output

Pod reads the ConfigMap at startup. With reload enabled, changes are applied on the next change event (or polling cycle). No restart needed.

The Smart Way:

Map your Spring profiles to Kubernetes namespaces, not file names. One image. Many environments.

Key Takeaway

Config is data. Treat it like a database, not a compile-time artifact.

● Production incident · POST-MORTEM · severity: high

Liveness Probe Pointed at /actuator/health Causes Pod Restart Storm During RDS Maintenance

Symptom

At 03:12 AM: PagerDuty alert — order service availability dropped to 23%. Logs showed hundreds of pod restarts. RDS event log showed a planned maintenance failover at 03:10 AM.

Assumption

The on-call engineer assumed the RDS failover caused the database connections to fail and pods to crash. In fact, pods were healthy — they were being killed by Kubernetes.

Root cause

All liveness probes were configured to httpGet: path: /actuator/health. Spring Boot's /actuator/health includes a DataSourceHealthIndicator that reports DOWN when the database is unreachable. When RDS failed over (2-minute window), all pods reported DOWN to the liveness probe. Kubernetes interpreted this as 'the application is crashed' and killed/restarted all pods. The restart surge created 200+ new HikariCP connections trying to reach still-unreachable RDS, worsening connection exhaustion when RDS came back online.

Fix

Immediate: kubectl patch deployments to point the liveness probe at /actuator/health/liveness (which only checks application-internal state, not the DB). Long-term: added management.endpoint.health.probes.enabled=true and management.health.livenessstate.enabled=true to all Spring Boot apps. Added a PodDisruptionBudget so that voluntary disruptions such as node drains never evict more than one pod at a time (a PDB does not limit probe-triggered restarts; the liveness fix addresses those).

Key lesson

The liveness probe must ONLY check if the JVM process is alive and not deadlocked — never external dependencies.
Use /actuator/health/liveness for liveness, /actuator/health/readiness for readiness.
They exist for exactly this reason.

Production debug guide: Symptom → root cause → fix (5 entries)

Symptom · 01

Pods stuck in CrashLoopBackOff


Fix

Run kubectl describe pod <pod-name> and check the 'Last State' section for the exit code and reason. Exit code 1: the application threw an uncaught exception at startup; check kubectl logs <pod> --previous for the stack trace. Exit code 137 (128 + SIGKILL): if the reason is OOMKilled, the pod exceeded its memory limit, so check kubectl top pod and increase the memory limit or fix a memory leak; otherwise the container was force-killed after terminationGracePeriodSeconds, so check the graceful shutdown configuration. Exit code 143 (128 + SIGTERM): the container was asked to stop; in a CrashLoopBackOff this usually means a failing liveness or startup probe triggered the restart, so check the Events section. For startup failures, temporarily increase the startupProbe failureThreshold to 30 to give the app more time to start.

Symptom · 02

Rolling update causes traffic errors — 502/503 during deployment


Fix

This is usually a readiness probe misconfiguration or missing preStop hook. First check: is the readiness probe passing before the pod receives traffic? Add readinessProbe.initialDelaySeconds matching your app's startup time (use startupProbe instead for variable startup times). Second check: does the pod stop receiving traffic before it starts shutting down? Add a preStop hook: lifecycle.preStop.exec.command=['/bin/sh','-c','sleep 5'] — this gives the Kubernetes Service endpoint controller time to remove the pod from the load balancer before the app starts shutting down connections.

Symptom · 03

HPA not scaling up despite high CPU


Fix

Run kubectl describe hpa <name> and read the 'Conditions' section. Common causes: (1) metrics-server not installed: check kubectl top pods; if it fails, install metrics-server. (2) resource requests not set: HPA calculates CPU utilisation as actual/requested, so without requests the calculation fails. (3) the Deployment is already at maxReplicas: increase maxReplicas. (4) scaling behaviour: scale-up has no stabilisation window by default (the 5-minute default applies to scale-down), but a custom behavior.scaleUp stabilisation window or restrictive scale-up policies will delay scaling; check kubectl describe hpa for the last scale event.

Symptom · 04

Pods not receiving SIGTERM — forceful kill after terminationGracePeriodSeconds


Fix

Check whether the application handles SIGTERM correctly. Spring Boot registers a JVM shutdown hook that closes the application context on SIGTERM; with server.shutdown=graceful, the web server first drains in-flight requests, bounded by spring.lifecycle.timeout-per-shutdown-phase. Verify by checking the logs for 'Commencing graceful shutdown', 'Shutting down ExecutorService' or 'Closing JPA EntityManagerFactory' messages. If they are absent, the JVM is not receiving SIGTERM: verify your Docker ENTRYPOINT uses exec form, ENTRYPOINT ["java", "-jar", "app.jar"], not shell form, ENTRYPOINT java -jar app.jar. Shell form spawns a shell that receives SIGTERM but does not forward it to the JVM.

Symptom · 05

OOMKilled pods — memory limit exceeded


Fix

Run kubectl top pods --sort-by=memory to find the memory-hungry pods. Check whether the JVM heap is sized for the container: the maximum heap should be about 75% of the container memory limit. On container-aware JVMs (JDK 10+ and 8u191+), the default maximum heap is only 25% of the container limit, which wastes memory; older JVMs derive it from total node memory, which can exceed the container limit. Add -XX:MaxRAMPercentage=75.0 instead of hardcoding -Xmx. Also check for off-heap memory: Metaspace (-XX:MaxMetaspaceSize=256m), direct buffers, and native libraries. Use -XX:+HeapDumpOnOutOfMemoryError to capture heap dumps on a Java OutOfMemoryError (it does not fire when the kernel OOM-kills the container).

★ Debug Cheat Sheet: Immediate Kubernetes diagnostic commands for Spring Boot deployments

Pod CrashLoopBackOff or failing health probes

Immediate action

Describe the pod and get previous container logs

Commands

kubectl describe pod -l app=my-service -n production | grep -A 20 'Last State\|Events'

kubectl logs -l app=my-service --previous -n production --tail=100

Fix now

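Exit code 137 = SIGKILL (reason OOMKilled: increase the memory limit; otherwise the grace period expired: check terminationGracePeriodSeconds); Exit code 1 = startup failure (check the stack trace in previous logs); Exit code 143 = SIGTERM (often a failing liveness or startup probe: check Events)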

HPA not scaling

Rolling update causing 503 errors

Pod consuming more memory than expected

Probe Types Comparison

Probe	Spring Endpoint	Checks	Failure Action	When to Use
Startup	/actuator/health/liveness	JVM alive (no DB)	Restart pod	Slow startup apps; replaces liveness during init
Liveness	/actuator/health/liveness	JVM alive, no deadlock	Restart pod	Detect hung/deadlocked JVM — never external deps
Readiness	/actuator/health/readiness	DB, Kafka, Redis	Remove from LB	Detect unhealthy dependencies; stops traffic routing

⚙ Quick Reference

2 commands from this guide

File	Command / Code	Purpose
Dockerfile	FROM eclipse-temurin:17-jre-alpine AS builder	Why Your Dockerfile Is Sabotaging Your Deployments
deployment.yaml	apiVersion: apps/v1	Stop Hardcoding Environment Names in Spring Boot Config

Key takeaways

Liveness probe must NEVER check external dependencies: use only /actuator/health/liveness (JVM-only); external dependencies go in the readiness probe

Graceful shutdown requires server.shutdown=graceful, terminationGracePeriodSeconds > total shutdown time, a preStop sleep, and an exec-form ENTRYPOINT

maxUnavailable=0 + maxSurge=1 is the zero-downtime rolling update strategy: any maxUnavailable>0 lets Kubernetes stop a serving pod before its replacement is Ready, reducing capacity and risking errors during the rollout

Always set resource requests and memory limits: missing CPU requests break HPA utilization targets, and the scheduler treats a pod without requests as needing nothing, overpacking nodes with noisy neighbours

PodDisruptionBudget with minAvailable=1 is non-negotiable for production services with SLAs

Database migrations must be backward-compatible across deployment windows: phase schema changes across two deployments so old and new pods can run against the same schema
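The rolling-update takeaway comes down to pod counts. The sketch below (RollingUpdateBounds is a hypothetical helper, not a Kubernetes API) computes the minimum number of available pods and the maximum total pods a RollingUpdate allows, using the documented rounding for percentage values: maxSurge rounds up, maxUnavailable rounds down.

```java
// Hypothetical helper: pod-count bounds during a Deployment RollingUpdate.
// Percentages resolve as documented: maxSurge rounds up, maxUnavailable rounds down.
final class RollingUpdateBounds {

    static int resolvePercent(int replicas, int percent, boolean roundUp) {
        double exact = replicas * percent / 100.0;
        return (int) (roundUp ? Math.ceil(exact) : Math.floor(exact));
    }

    /** Fewest pods that must stay available while old pods are replaced. */
    static int minAvailableDuringRollout(int replicas, int maxUnavailable) {
        return replicas - maxUnavailable;
    }

    /** Most pods (old + new) that may exist at once. */
    static int maxPodsDuringRollout(int replicas, int maxSurge) {
        return replicas + maxSurge;
    }

    public static void main(String[] args) {
        int replicas = 3;
        // Recommended setting: maxUnavailable=0, maxSurge=1.
        System.out.println("0/1: available >= " + minAvailableDuringRollout(replicas, 0)
                + ", pods <= " + maxPodsDuringRollout(replicas, 1));
        // Deployment default 25%/25%: on 3 replicas, 0.75 floors to 0 and ceils to 1.
        int unavailable = resolvePercent(replicas, 25, false);
        int surge = resolvePercent(replicas, 25, true);
        System.out.println("25%/25%: available >= " + minAvailableDuringRollout(replicas, unavailable)
                + ", pods <= " + maxPodsDuringRollout(replicas, surge));
    }
}
```

With the Deployment default of 25%/25%, three replicas happen to resolve to maxUnavailable=0 and maxSurge=1; from four replicas upward, 25% allows one pod to be unavailable, so setting maxUnavailable: 0 explicitly keeps the guarantee at any scale.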

Common mistakes to avoid

Checking external dependencies (DB, Redis) in the liveness probe

Symptom

Pod restarts cascade during infrastructure maintenance — DB failover triggers pod restart storm

Fix

Use /actuator/health/liveness for liveness (JVM-only checks); use /actuator/health/readiness for readiness (all dependencies)

Using Docker ENTRYPOINT in shell form instead of exec form

Symptom

SIGTERM not forwarded to JVM — app is forcefully killed after terminationGracePeriodSeconds; in-flight requests lost

Fix

Use exec form: ENTRYPOINT ["java", "-jar", "app.jar"] — the JVM is PID 1 and receives SIGTERM directly

Not setting CPU/memory resource requests

Symptom

HPA shows <unknown> for CPU utilization; the scheduler packs too many pods onto one node; pods lose CPU to noisy neighbours under contention

Fix

Always set requests, plus memory limits; run VPA in recommendation mode (updateMode: "Off") for about a week to calibrate realistic values
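The HPA symptom follows from how the controller computes utilization: it divides pod CPU usage by the pods' CPU requests, so without requests there is nothing to divide by and the target shows <unknown>. The sketch below is a simplified, hypothetical model of the documented formula desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), including the default 0.1 tolerance. HpaMath is not a Kubernetes API, and min/max replicas, stabilization windows and unready-pod handling are deliberately left out; the usage and request values are assumed.

```java
import java.util.OptionalDouble;

// Simplified, hypothetical model of the HPA formula for a CPU utilization target:
//   desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
// Ignores min/max replicas, stabilization windows and unready/missing pods.
final class HpaMath {

    static final double TOLERANCE = 0.1; // HPA controller default tolerance

    /** Utilization in percent of requests, or empty when any pod has no CPU request. */
    static OptionalDouble utilizationPercent(int[] usageMillicores, int[] requestMillicores) {
        if (usageMillicores.length == 0 || usageMillicores.length != requestMillicores.length) {
            throw new IllegalArgumentException("need one usage and one request value per pod");
        }
        long used = 0;
        long requested = 0;
        for (int i = 0; i < usageMillicores.length; i++) {
            if (requestMillicores[i] <= 0) {
                return OptionalDouble.empty(); // nothing to divide by: the HPA target shows <unknown>
            }
            used += usageMillicores[i];
            requested += requestMillicores[i];
        }
        return OptionalDouble.of(100.0 * used / requested);
    }

    static int desiredReplicas(int currentReplicas, double currentUtilization, double targetUtilization) {
        double ratio = currentUtilization / targetUtilization;
        if (Math.abs(ratio - 1.0) <= TOLERANCE) {
            return currentReplicas; // close enough to target: no scaling
        }
        return (int) Math.ceil(currentReplicas * ratio);
    }

    public static void main(String[] args) {
        int[] usage = {450, 500, 550};    // assumed per-pod CPU usage (millicores)
        int[] requests = {500, 500, 500}; // assumed per-pod CPU requests (millicores)
        double utilization = utilizationPercent(usage, requests).orElseThrow();
        System.out.printf("utilization %.0f%% -> %d replicas%n",
                utilization, desiredReplicas(3, utilization, 70.0));
    }
}
```

With a 70% target, three pods running at 100% of their requests scale to ceil(3 × 100 / 70) = 5 replicas.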

Setting terminationGracePeriodSeconds less than spring.lifecycle.timeout-per-shutdown-phase

Symptom

SIGKILL arrives before graceful shutdown completes; in-flight requests killed; data potentially corrupted

Fix

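terminationGracePeriodSeconds must exceed preStop sleep + (timeout-per-shutdown-phase × lifecycle phases) + a 10s buffer; the grace-period countdown starts when the preStop hook begins, so the sleep consumes part of the budget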

Running with a single replica (replicas: 1) in production

Symptom

Any pod eviction, node drain, or rolling update causes complete service downtime

Fix

Always run at least 2 replicas (replicas: 2, or minReplicas: 2 when an HPA manages the Deployment); add a PodDisruptionBudget with minAvailable: 1 to limit voluntary disruptions such as node drains

No preStop hook — race condition between SIGTERM and load balancer endpoint removal

Symptom

Requests routed to terminating pod return connection refused errors during rolling deployments

Fix

Add lifecycle.preStop.exec.command=['/bin/sh','-c','sleep 5'] to cover the load balancer propagation window
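The grace-period rule from the mistakes above is plain arithmetic. The sketch below is a hypothetical helper (ShutdownBudget is not a Spring or Kubernetes API); the inputs are assumed values: a 5s preStop sleep, Spring Boot's default spring.lifecycle.timeout-per-shutdown-phase of 30s, one lifecycle phase, and the 10s buffer.

```java
// Hypothetical helper: checks terminationGracePeriodSeconds against the rule
//   grace period > preStop sleep + (timeout-per-shutdown-phase x phases) + buffer
final class ShutdownBudget {

    static int minimumGracePeriodSeconds(int preStopSleepSeconds, int timeoutPerPhaseSeconds,
                                         int shutdownPhases, int bufferSeconds) {
        // The kubelet starts the countdown before the preStop hook runs,
        // so the sleep is part of the same budget.
        return preStopSleepSeconds + timeoutPerPhaseSeconds * shutdownPhases + bufferSeconds;
    }

    static boolean isSafe(int terminationGracePeriodSeconds, int preStopSleepSeconds,
                          int timeoutPerPhaseSeconds, int shutdownPhases, int bufferSeconds) {
        return terminationGracePeriodSeconds > minimumGracePeriodSeconds(
                preStopSleepSeconds, timeoutPerPhaseSeconds, shutdownPhases, bufferSeconds);
    }

    public static void main(String[] args) {
        // Assumed: 5s preStop sleep, 30s per phase (Spring Boot default), 1 phase, 10s buffer.
        int budget = minimumGracePeriodSeconds(5, 30, 1, 10);
        System.out.println("terminationGracePeriodSeconds must exceed " + budget + "s");
        System.out.println("30s (Kubernetes default) safe? " + isSafe(30, 5, 30, 1, 10));
        System.out.println("60s safe? " + isSafe(60, 5, 30, 1, 10));
    }
}
```

With those inputs the minimum is 45s, so the Kubernetes default terminationGracePeriodSeconds of 30 is too short, while 60 leaves headroom.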


Interview Questions on This Topic

Q01 · JUNIOR

What is the difference between a liveness probe and a readiness probe in...

Q02 · SENIOR

What is graceful shutdown in Spring Boot and how does it work with Kuber...

Q03 · SENIOR

How does Kubernetes HPA calculate the number of replicas to scale to?

Q04 · SENIOR

Why must terminationGracePeriodSeconds exceed the spring.lifecycle.timeo...

Q05 · SENIOR

How do you achieve zero-downtime deployments on Kubernetes?

Q06 · SENIOR

How would you scale a Kafka consumer on Kubernetes based on consumer lag...

Q07 · SENIOR

What is a PodDisruptionBudget and when is it needed?

Q08 · SENIOR

How do you handle database schema migrations in a zero-downtime Kubernet...

Q01 of 08 · JUNIOR

What is the difference between a liveness probe and a readiness probe in Kubernetes?

ANSWER

The liveness probe answers 'is the container alive?' — Kubernetes restarts a container that fails it. It should only check application-internal state (JVM alive, no deadlock). The readiness probe answers 'is the container ready for traffic?' — Kubernetes removes a failing pod from the Service endpoint pool, stopping traffic routing, but does not restart it. It should check external dependencies (DB connectivity, cache health). A pod failing readiness is still alive and waiting for dependencies to recover.


Frequently Asked Questions

Should I use cpu limits in production?

How do I handle slow Spring Boot startup (>60 seconds) in Kubernetes?

What is the difference between ConfigMap and Secret in Kubernetes?

How many replicas should a production Spring Boot service run?

How do I set JVM heap size correctly for a Kubernetes container?

Can I use Spring Cloud Kubernetes to reload configuration without restarting pods?

Naren, Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Drawn from code that ran under real load.

✓ Verified: production tested

Last updated: July 04, 2026

1,697 articles, all by Naren
