Kubernetes Pods and Deployments - Missing Startup Probe 502
A readiness probe with initialDelaySeconds=5 caused 8 minutes of 502 errors in production.
20+ years shipping production infrastructure and CI/CD at scale. Everything here is grounded in real deployments.
- Pod: Ephemeral, non-self-healing. Shares a network namespace (localhost) and volumes across containers.
- Deployment: Manages ReplicaSets which manage Pods. Handles rolling updates, rollbacks, and scaling.
- ReplicaSet: The intermediate controller that ensures N replicas exist. You rarely interact with it directly.
- Probes: Liveness (restart if deadlocked) vs Readiness (remove from service endpoints if not ready).
- maxSurge: More surge = faster rollout but higher peak resource usage.
- maxUnavailable: 0 = zero-downtime but slower rollout. Higher = faster but riskier.
- Creating Pods directly instead of Deployments. Direct Pods are not self-healing. If the node dies, the Pod stays dead forever.
Think of a Pod as a shipping container — it holds your application and its immediate companions (like a logging sidecar). A Deployment is the shipping company's logistics system — it makes sure the right number of containers are always at the dock, replaces any that get damaged, and can swap out old cargo for new cargo without stopping operations.
Most production Kubernetes workloads run as Pods managed by Deployments. The Pod is the atomic scheduling unit: a group of containers sharing a network namespace and volumes. The Deployment is the declarative controller that ensures the right number of Pods exist, handles rolling updates, and supports rollback when a release goes wrong.
Misconfiguring either object causes production incidents: Pods without resource limits cause noisy-neighbor OOMKills, Deployments without proper probes cause 502 errors during rollouts, and direct Pod creation bypasses self-healing entirely. Understanding the reconciliation loop — how the Deployment controller continuously drives current state toward desired state — is the foundation for debugging every higher-level Kubernetes object.
Why Pods Need Deployments — and Probes
A Kubernetes Pod is the smallest deployable unit: one or more containers sharing a network namespace, storage volumes, and a lifecycle. A Deployment is a controller that declares a desired state — number of replicas, update strategy, rollback policy — and drives the cluster toward it. Without a Deployment, a Pod is ephemeral: if its node dies, it's gone. With a Deployment, the ReplicaSet ensures the count stays correct, and the controller handles rolling updates and self-healing.
In practice, a Deployment creates a ReplicaSet, which creates Pods. The Deployment's spec defines a Pod template — image, ports, resource limits, and probes. The key properties: replicas (target count), strategy (RollingUpdate vs Recreate), and revisionHistoryLimit (how many old ReplicaSets to keep for rollback). RollingUpdate uses maxSurge and maxUnavailable to control the pace of change — default 25% each, which means during an update you may have 125% of desired Pods briefly.
Use a Deployment for any stateless or lightly stateful service that needs to survive failures, scale horizontally, or update without downtime. StatefulSets exist for ordered, persistent workloads (databases). Deployments are the default choice for APIs, workers, web servers — anything that can be replaced without data loss. The critical nuance: a Deployment only guarantees Pods are running, not that they're ready to serve traffic. That's where probes come in.
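The Deployment, ReplicaSet, and Pod template relationship described above can be sketched as a single manifest. This is a minimal illustration, not a production-ready spec: the name `web`, the image `registry.example.com/web:1.0.0`, and port 8080 are hypothetical placeholders.

```yaml
# Minimal Deployment sketch. Names, image, and port are assumptions for illustration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                 # target Pod count the controller maintains
  revisionHistoryLimit: 10    # old ReplicaSets kept for rollback (this is the default)
  selector:
    matchLabels:
      app: web                # must match the template labels below
  template:                   # Pod template the ReplicaSet creates Pods from
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0.0
          ports:
            - containerPort: 8080
```

Probes and resource settings are deliberately left out here so the object hierarchy stays visible; a real workload would add both to the container.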
Size the startup probe so that failureThreshold * periodSeconds exceeds your worst-case startup time, typically 60-90s for JVM apps with connection pools.
Pod Basics: The Atomic Unit
In the world of Kubernetes, the Pod is the atomic unit of scheduling. While you might be used to thinking in terms of 'containers,' Kubernetes thinks in 'Pods.' A Pod can host a single container, or a tightly coupled group of containers (like an app container and a 'sidecar' logging agent) that need to share the same local network (localhost) and storage volumes.
Crucially, Pods are ephemeral. They are born, they live, and they die. They are never 'repaired'; they are replaced.
- A Pod's IP is assigned at creation and stays with it for its lifetime, but a replacement Pod gets a new one. Use Service DNS for discovery.
- Container filesystem is ephemeral. Use PersistentVolumes for data that must survive restarts.
- Containers in the same Pod share localhost networking. Containers in different Pods do not.
- Sidecar containers (logging, proxy) share the Pod's lifecycle. If one crashes, the kubelet restarts that container in place according to the Pod's restartPolicy.
- Init containers run before the main container. They block Pod startup until they complete successfully.
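The points above about shared volumes, sidecars, and init containers can be seen together in one Pod spec. This is a sketch for illustration only (a production workload would wrap the same template in a Deployment); the application image and the `/var/log/app` path are hypothetical, and the app is assumed to write its log to `app.log`.

```yaml
# Illustration only: image name and log path are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  initContainers:
    - name: prepare-logs            # must exit 0 before the app containers start
      image: busybox:1.36
      command: ["sh", "-c", "touch /var/log/app/app.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
  containers:
    - name: web                     # main application, assumed to write to app.log
      image: registry.example.com/web:1.0.0
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-tailer              # sidecar: same volume, same network namespace
      image: busybox:1.36
      command: ["sh", "-c", "tail -f /var/log/app/app.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
  volumes:
    - name: logs
      emptyDir: {}                  # lives exactly as long as the Pod
```

The `emptyDir` volume is what lets the three containers see the same file; it is deleted with the Pod, which is why durable data belongs in a PersistentVolume instead.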
Do not create Pods directly (kubectl run or a Pod manifest) for production workloads. Direct Pods have no self-healing: if the node crashes, the Pod is gone forever with no replacement. The Deployment controller is what provides self-healing, rolling updates, and rollback. Every production workload must use a Deployment, StatefulSet, or DaemonSet, never a raw Pod.
Deployments: Orchestrating the Desired State
A Deployment is a high-level object that manages a ReplicaSet, which in turn manages Pods. Its job is to ensure the 'Desired State' matches the 'Current State.' If you tell a Deployment you want 3 replicas, and a node crashes taking one Pod with it, the Deployment controller notices the discrepancy and immediately schedules a new Pod on a healthy node.
Deployments are also the primary vehicle for Rolling Updates. By manipulating the maxSurge and maxUnavailable parameters, you can swap out version 1.0 for 2.0 without dropping a single user request.
- Deployment creates ReplicaSets. ReplicaSets create Pods. You interact with Deployments.
- Each rollout creates a new ReplicaSet. Old ReplicaSets are kept for rollback (default: 10 revisions).
- The Deployment controller manages the transition: scale up new ReplicaSet, scale down old ReplicaSet.
- maxSurge: How many extra Pods can exist above the desired count during rollout.
- maxUnavailable: How many Pods can be missing below the desired count during rollout.
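The two rollout parameters in the list above sit under the Deployment's `spec.strategy`. The fragment below is a sketch of the zero-downtime variant; the percentages are example values, not a recommendation for every cluster.

```yaml
# Fragment of a Deployment: place under the top-level `spec:` key.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%        # extra Pods allowed above the desired count (rounded up)
    maxUnavailable: 0    # never drop below the desired number of ready Pods
```

Swapping the values to `maxSurge: 0` and `maxUnavailable: 25%` gives the capacity-saving variant, at the price of a temporary dip in serving Pods.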
The maxSurge and maxUnavailable parameters directly control rollout speed vs resource usage. maxSurge: 25%, maxUnavailable: 0 means: during rollout, create up to 25% extra new Pods before terminating old Pods. This ensures zero-downtime but requires 125% of normal resource capacity. For resource-constrained clusters, set maxSurge: 0, maxUnavailable: 25% to terminate old Pods first (less resource usage, but a brief capacity reduction). The one combination that cannot work is maxSurge: 0, maxUnavailable: 0: the controller could neither create new Pods (no surge) nor terminate old ones (no unavailability allowed), so the API server rejects the manifest at validation.
Probe Comparison: Liveness vs Readiness vs Startup
Kubernetes provides three distinct probe types (liveness, readiness, and startup), each serving a different purpose in the Pod lifecycle. Misunderstanding their roles is the root cause of many production incidents, including the 502 incident analyzed in this article.
Liveness Probe: Determines if the container is alive. If it fails, the kubelet kills the container and restarts it (according to the Pod's restartPolicy). Use it to recover from deadlocks or infinite loops. Never check external dependencies in a liveness probe — if the database is slow, the probe fails, the container restarts, and the restart increases load on the database, causing cascading failure.
Readiness Probe: Determines if the container is ready to serve traffic. If it fails, the Pod's IP is removed from the Service endpoints — traffic stops, but the container is NOT restarted. Use it for applications that need to load cache, connect to databases, or run startup migrations before handling requests. The readiness probe controls traffic flow, not container lifecycle.
Startup Probe: Gates both liveness and readiness probes. While the startup probe has not yet succeeded, liveness and readiness probes are disabled. Use it for applications with startup times greater than 30 seconds. Set a high failureThreshold (e.g., 60) with a short periodSeconds (e.g., 5) to allow up to 300 seconds for startup. Once the startup probe succeeds, liveness and readiness probes begin their normal checks.
All three probes support the same handler types: httpGet, tcpSocket, exec, and grpc.
In the 502 incident, a startup probe with failureThreshold: 60 and periodSeconds: 5 would have prevented traffic from reaching the Pod until the startup probe succeeded, completely eliminating the error window.
Liveness/Readiness/Startup Probe Comparison Table
While the previous section explained each probe's purpose, a comparison table helps you quickly decide which probe to use in any scenario. The table below summarizes the differences across key dimensions: behavior on failure, impact on the Pod, typical use cases, handler types, and best practices for configuration.
| Aspect | Liveness Probe | Readiness Probe | Startup Probe |
|---|---|---|---|
| Purpose | Is the container alive? | Is the container ready to serve traffic? | Has the container finished booting? |
| On failure | Kubelet kills container, restarts per restartPolicy | Pod removed from Service endpoints (no restart) | Until it succeeds, liveness and readiness stay disabled; if failureThreshold is exhausted, kubelet kills the container and restarts per restartPolicy |
| Impact | Container restarts (may cause CrashLoopBackOff) | Traffic stops, but container stays running | Holds back liveness and readiness checks during boot; once it passes, they start |
| Typical use case | Deadlock detection, infinite loop recovery | Cache warmup, DB connection, migration completion | Apps with >30s boot time, legacy apps, heavy initialization |
| Handler support | httpGet, tcpSocket, exec, grpc | httpGet, tcpSocket, exec, grpc | httpGet, tcpSocket, exec, grpc |
| Configuration advice | Keep simple: high threshold, long period; avoid dependency checks | Set initialDelaySeconds to 0 when using startup probe; check real readiness | High failureThreshold (e.g., 60) × short periodSeconds (e.g., 5) to cover max startup |
| Common mistakes | Using to check database (cascading failure) | Setting initialDelaySeconds too low (adds Pod to endpoints before ready) | Not used at all for slow-start apps (liveness kills during boot) |
When to combine: Always pair readiness probes with startup probes for applications that warm up slowly. The startup probe disables liveness and readiness during boot, preventing premature restarts and premature traffic. For fast-starting apps (<30s), a simple readiness probe with initialDelaySeconds may suffice, but adding a startup probe costs nothing and adds safety.
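As a minimal sketch of that pairing, the container below uses a startup probe to cover a slow boot, then hands over to readiness and liveness. The container name, image, port 8080, and the `/ready` and `/healthz` paths are assumptions for illustration; your application has to expose equivalent endpoints.

```yaml
# Fragment of a Pod template: place under spec.template.spec of a Deployment.
# Hypothetical: container name, image, port, and both HTTP paths.
containers:
  - name: web
    image: registry.example.com/web:1.0.0
    ports:
      - containerPort: 8080
    startupProbe:
      httpGet:
        path: /ready             # should succeed only once boot is really done
        port: 8080
      periodSeconds: 5
      failureThreshold: 60       # 60 x 5s = up to 300s allowed for startup
    readinessProbe:
      httpGet:
        path: /ready             # gates Service traffic, never restarts
        port: 8080
      initialDelaySeconds: 0     # the startup probe already covers the delay
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:
      httpGet:
        path: /healthz           # process-local check, no external dependencies
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
```

Keeping `/healthz` free of database or downstream checks follows the advice in the table: a slow dependency should take the Pod out of rotation, not restart it.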
Deployment Strategies: RollingUpdate vs Recreate
When updating a Deployment, Kubernetes supports two strategies: RollingUpdate (default) and Recreate. The choice between them directly impacts availability, resource usage, and rollout speed.
RollingUpdate: Replaces old Pods with new Pods incrementally. Controlled by two parameters:
- maxSurge: How many extra Pods (count or percentage) can be created above the desired replica count during the update.
- maxUnavailable: How many Pods (count or percentage) can be unavailable during the update.
During a rolling update, the Deployment controller creates new Pods in a new ReplicaSet, waits for them to become ready, then scales down the old ReplicaSet. The process repeats until all old Pods are replaced. This strategy enables zero-downtime deployments, at the cost of extra cluster capacity during the rollout (up to maxSurge additional Pods).
Recreate: Terminates all old Pods simultaneously, then creates new Pods. This is a simple 'kill all, start all' pattern. During the update, the application is completely unavailable: no traffic can be served until the first new Pods become ready. Use this strategy only when the application cannot have multiple versions running concurrently (e.g., stateful applications with incompatible schemas, or when file locks prevent coexistence). Recreate is also useful for cost-constrained environments where you cannot afford the overhead of extra Pods during a rollout.
- RollingUpdate: Stateless APIs, microservices, web frontends — anything that needs 100% uptime during deploys.
- Recreate: Stateful databases (during schema migrations), batch jobs, or any application that enforces exclusive access to a resource.
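For the Recreate cases listed above, the strategy block shrinks to a single field. This is a sketch of the relevant fragment only.

```yaml
# Fragment of a Deployment: place under the top-level `spec:` key.
strategy:
  type: Recreate   # all old Pods are terminated before any new Pod is created
  # A rollingUpdate block must not be set together with type: Recreate.
```

Because the old version is fully gone before the new one starts, plan the deploy for a window where a short outage is acceptable.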
Rolling updates need spare capacity for the maxSurge proportion of replicas. For a 10-replica Deployment with maxSurge: 25%, the surge is 2.5 Pods, which rounds up to 3, so you need capacity for 13 Pods during the rollout. In resource-constrained clusters, you can trade availability for capacity by setting maxUnavailable: 25% and maxSurge: 0, which terminates old Pods before creating new ones (brief capacity dip but less overhead). Always calculate the peak resource requirement for rolling updates and ensure the cluster can handle it.
RollingUpdate vs Recreate Visual Comparison
While the previous section explained when to use each strategy, this comparison highlights the key differences in resource usage, timeline, and availability during a deployment. Use it to communicate rollout behavior to your team and to decide which strategy fits your workload.
Timeline Comparison (3 replicas, 10s per Pod startup):
- Recreate: 0s: all 3 old pods terminated. 10s: first new pod ready. 20s: all new pods ready. Total downtime: ~10s (from 0s to first pod ready).
- RollingUpdate (maxSurge=1, maxUnavailable=0): 0s: 3 old pods serving, 1 new pod created. 10s: new pod ready, old pod terminated. 15s: second new pod created. 25s: second new pod ready, second old pod terminated. 30s: third new pod created. 40s: third new pod ready, third old pod terminated. Total downtime: 0s (with maxUnavailable=0, at least 3 pods are serving at all times).
Resource Peak Comparison (3 replicas, each requesting 256Mi memory and 500m CPU):
- Recreate: 0s-10s: 0 pods serving. 10s-20s: 3 pods, 768Mi/1.5 CPU. Peak: same as steady state.
- RollingUpdate (maxSurge=1, maxUnavailable=0): During each overlap phase, 3 serving pods plus 1 surge pod = 4 pods simultaneously. Peak memory: 4×256Mi = 1024Mi. Peak CPU: 4×500m = 2000m. Requires 33% more capacity than steady state.
Resource Management: Requests, Limits, and QoS Classes
Every container in Kubernetes should specify CPU and memory requests and limits. These settings directly affect scheduling, runtime performance, and cluster stability. Misconfiguring them is a top cause of production incidents.
Requests are the minimum amount of resources guaranteed to the container. The Kubernetes scheduler uses requests to make placement decisions: it only places a Pod on a node whose unreserved allocatable capacity covers the sum of that Pod's container requests. Requests also ensure the container gets at least that much CPU and memory under contention.
Limits are the maximum amount of resources a container is allowed to consume. If a container exceeds its CPU limit, it gets throttled (not killed). If it exceeds its memory limit, it gets OOMKilled (exit code 137). Limits prevent a single container from starving other containers on the same node.
QoS Classes: Kubernetes categorizes Pods into three Quality of Service classes based on request and limit settings:
- Guaranteed: every container sets CPU and memory requests equal to its limits (e.g., requests cpu: 500m, limits cpu: 500m). These Pods are least likely to be evicted and get the most predictable performance.
- Burstable: the Pod does not qualify as Guaranteed, but at least one container sets a CPU or memory request or limit. These Pods get their requested minimum but can burst to their limit if node capacity is available.
- BestEffort: No requests or limits set on any container. These Pods receive no guarantees and are the first to be evicted under node pressure. Avoid BestEffort in production.
Production Best Practices:
1. Always set requests and limits for every container. Never leave them unset.
2. Set requests equal to limits for critical workloads to achieve Guaranteed QoS and predictable performance.
3. Base request values on steady-state resource usage observed in production over 7 days. Do not guess.
4. Set memory limits to 1.5x the observed maximum to handle transient spikes without OOMKill.
5. CPU limits are less critical than memory limits, but still set them to prevent noisy neighbors. CPU throttling is better than OOMKill.
6. Use Vertical Pod Autoscaler (VPA) in recommendation mode to generate initial request/limit values, then refine manually.
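To make the QoS rules concrete, here is a sketch of a container whose requests equal its limits, which places a single-container Pod in the Guaranteed class. The values reuse the 500m CPU and 256Mi memory figures from the comparison above; the container name and image are hypothetical.

```yaml
# Fragment of a Pod template: place under spec.template.spec of a Deployment.
# Hypothetical: container name and image.
containers:
  - name: web
    image: registry.example.com/web:1.0.0
    resources:
      requests:
        cpu: 500m        # used by the scheduler for placement
        memory: 256Mi
      limits:
        cpu: 500m        # above this the container is throttled
        memory: 256Mi    # above this the container is OOMKilled (exit code 137)
```

Raising either limit above its request would move the Pod to Burstable. If the Pod has sidecars, each of them needs matching requests and limits too, or the Pod as a whole loses the Guaranteed class.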
Resource Management Best Practices (Requests/Limits)
While the previous section explained the mechanics of requests, limits, and QoS classes, this section provides a concise list of best practices that you can apply immediately to your production Deployments. These recommendations come from real-world incidents and thousands of production clusters.
1. Never leave resources unset. A Pod whose containers set no requests or limits runs as BestEffort QoS: it gets no guarantees and is first to be evicted. A container without a memory limit can consume all node memory and get other Pods OOM-killed or evicted. Always set both.
2. Base requests on steady-state usage, not peak. Monitor your application for at least 7 days using kubectl top pod or a metrics system (Prometheus). Set requests to the 50th percentile of observed usage. Set limits to 1.5x the 95th percentile for memory (to handle spikes) and 2x the 95th percentile for CPU (since throttling is acceptable).
3. Use Guaranteed QoS for all critical Deployments. Setting requests == limits makes the Pod the last candidate for eviction under node pressure and gives it the most stable CPU scheduling. The only cost is that you cannot overcommit, but for critical services this is a feature, not a bug.
4. Test resource configurations under load. A common mistake: setting memory too low based on idle measurement. During traffic spikes, memory usage can double. Run load tests that mimic peak traffic and verify that memory stays below the limit.
5. Avoid setting identical CPU and memory values for all containers. Each container type has different resource profiles. For example, a sidecar proxy (Envoy) needs different requests than the main application container. Use per-container settings.
6. Implement resource quotas at the namespace level. Even with per-Pod limits, a runaway Deployment can create many Pods that collectively use too many resources. Use ResourceQuota objects to enforce aggregate limits per namespace.
7. Monitor for OOMKill events and CPU throttling. Set up alerts on kube_pod_container_status_terminated_reason (OOMKilled) and container_cpu_cfs_throttled_seconds_total. These events indicate misconfigured resource limits.
8. Use VPA in recommendation mode (not auto) for initial tuning. The Vertical Pod Autoscaler can analyze historical usage and suggest request/limit values. Review its recommendations before applying. Do not enable auto mode for stateless workloads unless you understand the disruption it causes (Pod restart on every recommendation).
9. Consider using alternative scheduling policies. For batch workloads that can tolerate lower priority, use priorityClassName to distinguish critical vs. best-effort. Combined with Guaranteed QoS, priority ensures your most important workloads survive contention.
10. Document your resource strategy. Include typical request/limit values for each service, the reasoning behind them, and instructions for adjusting. This prevents future engineers from guessing or copying incorrect values from other services.
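Practice 6 above (namespace-level quotas) could be sketched as follows. The namespace `team-a` and every number here are hypothetical; derive real values from your own capacity planning.

```yaml
# Aggregate caps for one namespace. Namespace name and all values are assumptions.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"        # sum of CPU requests across all Pods
    requests.memory: 20Gi
    limits.cpu: "20"          # sum of CPU limits across all Pods
    limits.memory: 40Gi
    pods: "50"                # hard cap on Pod count
```

Once a quota constrains CPU or memory, Pods in that namespace that omit the corresponding requests or limits are rejected at admission, which also enforces practice 1.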
Production Operations: Essential kubectl Patterns
Managing Deployments in production requires more than just kubectl apply. You need to be able to inspect the rollout history, trigger instant rollbacks, and scale on demand.
- Rollback = scale up old ReplicaSet + scale down new ReplicaSet. No image pull needed if cached.
- revisionHistoryLimit (default 10): How many old ReplicaSets to keep. Increase to 20-50 for critical services.
- kubectl rollout pause/resume: Pause a rollout mid-way to test a subset of new Pods before completing.
- kubectl rollout undo --to-revision=N: Rollback to a specific revision, not just the previous one.
- kubectl rollout restart: Triggers a rolling restart without changing the image. Useful for picking up ConfigMap changes.
The kubectl rollout pause command is the foundation of canary deployments without a service mesh. Pause the rollout after one new Pod is created and let the Service route a small share of traffic to it (roughly one Pod's worth of the ready endpoints), monitor error rates, then either resume or undo. This gives you canary behavior with standard Kubernetes objects. Combine with maxSurge: 1 to ensure only one canary Pod is created during the pause.
Production operations go beyond kubectl apply. Use rollout undo for instant rollbacks, rollout pause for canary deployments, and rollout restart for ConfigMap-driven restarts. Keep old images available: rollback fails if the image has been garbage-collected.
Automated Rollbacks: When Your Deploy Goes Sideways at 3 AM
You pushed a bad image. Users are screaming. The deployment is bleeding. Kubernetes will not roll back for you, and most teams never automate it. They rely on a manual kubectl rollout undo after paging someone. That's slow. That's stupid.
Here's the real deal: Deployments track revision history. Keep revisionHistoryLimit at something sane (10-20; the default is 10). Set progressDeadlineSeconds to 60-120 (the default is 600). When new Pods fail to become ready within that window, the Deployment controller marks the rollout as failed (condition Progressing=False, reason ProgressDeadlineExceeded) and kubectl rollout status exits non-zero. It does not undo anything on its own, so have your pipeline or on-call tooling run kubectl rollout undo when that happens.
The catch? Your probes must be tight. If your readiness probe passes before the app is actually ready, the rollout looks healthy, the deadline never trips, and you serve 500s. Test rollbacks in staging with a deliberately broken image. Put the command kubectl rollout undo deployment/nginx-proxy --to-revision=3 into a runbook. Never trust a deployment that hasn't been rolled back in anger.
If revisionHistoryLimit is set too low, the revision you need may no longer exist; check what is still available with kubectl rollout history deployment/nginx-proxy. Always keep at least 10 revisions.
Set progressDeadlineSeconds to 120 or less. Couple it with aggressive probes. Test rollbacks with a bad image in a canary deployment first.
Horizontal Scaling: Why CPU-Based Autoscaling Is a Lie
Everyone starts with HorizontalPodAutoscaler (HPA) based on CPU. It's easy. It's wrong. CPU metrics are noisy. A pod can spike CPU during startup, trigger scale-out, then drop — wasting resources. Worse: your app might be memory-bound or I/O-bound. Scaling on CPU tells you nothing about actual request latency. The fix? Use custom or external metrics. Expose http_requests_per_second or queue_depth as a Prometheus metric, then let the HPA scale on that. You want the HPA to react to real load, not a system counter. Set behavior.scaleDown.stabilizationWindowSeconds to 300 to avoid thrashing. For bursty traffic, use behavior.scaleUp.policies to allow fast scale-up (e.g., add 100% in 30 seconds). Memory-based scaling? Only use it if your app leaks or caches aggressively. Otherwise, it's a fool's errand. Also: never set minReplicas to 1 in production. If that pod crashes, you're dark. Start at 2. Always.
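The settings described above could be combined in an `autoscaling/v2` HPA like the sketch below. It assumes a Deployment named `web` and a per-Pod custom metric called `http_requests_per_second` that is already served through the custom metrics API by a metrics adapter; the target of 100 and the replica bounds are placeholder values.

```yaml
# Sketch only: Deployment name, metric name, and all numbers are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2                      # never run a single Pod in production
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"         # desired average per Pod
  behavior:
    scaleUp:
      policies:
        - type: Percent
          value: 100                  # allow doubling...
          periodSeconds: 30           # ...every 30 seconds
    scaleDown:
      stabilizationWindowSeconds: 300 # wait 5 minutes before shrinking
```

Without an adapter exposing the metric, the HPA reports the metric as unavailable and does not scale, so verify the metric pipeline before relying on this.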
Run kubectl get hpa -w to watch HPA decisions live. If you see flapping (scale up then immediately down), your stabilizationWindowSeconds is too low or your metric is noisy. Fix the metric, not the window.
Set minReplicas to 2. Use stabilization to kill thrashing.
Exposing Pods: Why Services Are a Deployment's Better Half
A Deployment that nobody can reach is a data-center paperweight. Pods are ephemeral — they die, restart, and shift IPs like a moving target. That's why a Deployment without a Service is irresponsible. Services give you a stable endpoint (DNS name, ClusterIP, or LoadBalancer) that abstracts the chaos underneath.
The HOW is simple: a Service uses label selectors to match Pods from a Deployment. Your app doesn't care which Pod handles the request — the Service load-balances across healthy replicas. This separation lets you scale or rollback the Deployment without touching the network config.
Production trap: never hardcode Pod IPs. Always route through a Service. When you're debugging a 3 AM outage, the last thing you want is a client pointing to a dead Pod.
ConfigMaps and Secrets: Don't Hardcode the Chaos
Hardcoding database URLs, API keys, or feature flags directly into your Deployment YAML is a one-way ticket to disaster. When you rotate credentials or change a config value, you don't want to rebuild and redeploy the entire image. That's where ConfigMaps and Secrets come in — they decouple configuration from the immutable container image.
ConfigMaps store non-sensitive data (like environment variables, config files, or command-line arguments). Secrets handle the sensitive stuff (tokens, passwords, TLS certs) with base64 encoding and optional encryption at rest. Mount them as volumes or inject them as env vars in your Pod spec. This keeps your Deployment portable across environments — dev, staging, prod — without touching the container.
Senior shortcut: don't stuff 20 env vars into the Deployment spec. Use envFrom with a ConfigMap or Secret to keep your manifests clean and auditable.
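A sketch of the `envFrom` shortcut follows. The ConfigMap name `web-config`, its keys, and the Secret name `web-secrets` are hypothetical; the Secret is assumed to exist already.

```yaml
# Hypothetical ConfigMap with two settings.
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-config
data:
  LOG_LEVEL: info
  FEATURE_NEW_CHECKOUT: "false"   # values are always strings
---
# Fragment of a Pod template: place under spec.template.spec of a Deployment.
containers:
  - name: web
    image: registry.example.com/web:1.0.0
    envFrom:
      - configMapRef:
          name: web-config        # every key becomes an environment variable
      - secretRef:
          name: web-secrets       # assumed to be created separately
```

Environment variables are read when the container starts, so a changed ConfigMap only takes effect after the Pods are restarted, for example with kubectl rollout restart.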
Use kubectl create secret generic from the CLI with --from-literal or --from-file instead of hand-writing base64 strings. Less human error, faster iteration.
Pod Templates: The Blueprint Deployments Copy
A Deployment does not manage pods directly. It manages a ReplicaSet, and the ReplicaSet creates pods from a template embedded in the Deployment spec. This pod template is the only place you define containers, volumes, environment variables, and resource limits. Changing the template triggers a new ReplicaSet with updated pods, following your configured rollout strategy.
Why care? Because editing a running pod's spec is pointless: most pod fields are immutable, and every replacement pod is created from the template, so any change made to a single pod is lost. To update an application, edit the Deployment's pod template, never the pod itself. Templates also enforce consistency: every pod from the same template is identical, preventing configuration drift across replicas.
Common mistake: trying to add a sidecar container directly to a pod via kubectl edit. The API rejects the change because a pod's container list cannot be modified after creation, and even a permitted edit would vanish with the next replacement pod. Always modify the template in the Deployment manifest.
Pod Networking: IPs Are Ephemeral, Services Are Permanent
Every pod in Kubernetes gets a unique IP address from the cluster network, assigned before any container starts. But pod IPs are ephemeral: when a pod dies or is rescheduled, its replacement gets a new IP. This is why Deployments pair with Services. A Service provides a stable virtual IP and DNS name that routes traffic to healthy pods, regardless of pod IP changes. Inside the pod, containers share the same network namespace: they see each other on localhost. Pods can also reach each other directly by IP, but because those IPs change, stable communication goes through a Service, or through an Ingress for external HTTP traffic. Why does this matter? Direct pod IP references break the moment a Deployment rolls out a new version. Always address pods through a Service. For advanced isolation, use NetworkPolicies to control traffic between pods. The default is to allow everything, which is insecure for multi-tenant clusters.
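The move from allow-all to explicit rules could be sketched with two NetworkPolicies: one that denies all ingress in a namespace, and one that re-opens a single path. The namespace `team-a`, the labels `app: web` and `app: frontend`, and port 8080 are hypothetical, and the cluster's network plugin must support NetworkPolicy for these objects to have any effect.

```yaml
# 1) Deny all ingress to every Pod in the namespace (names are assumptions).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a
spec:
  podSelector: {}            # empty selector = all Pods in the namespace
  policyTypes:
    - Ingress
---
# 2) Allow only frontend Pods to reach web Pods on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-web
  namespace: team-a
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Policies are additive: a Pod selected by both objects accepts exactly the traffic that at least one of them allows.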
What Is Kubernetes? A Beginner's Guide to Container Orchestration
Kubernetes, often abbreviated as K8s, is an open-source platform designed to automate the deployment, scaling, and management of containerized applications. Before Kubernetes, developers ran containers on single hosts, manually handling failures, resource contention, and networking. Container orchestration solves these problems by treating a cluster of machines as a single pool of resources. At its core, Kubernetes schedules containers onto nodes based on declared desired states, monitors health, and restarts failed components automatically. Why this matters: without orchestration, a crashed container means downtime. Kubernetes ensures your application runs as intended, even when nodes fail or traffic spikes. Key primitives include Pods (the smallest deployable unit), Deployments (managing Pod replicas and updates), and Services (stable networking). For beginners, think of Kubernetes as a distributed operating system for containers—it abstracts away individual server complexities, letting you focus on application logic rather than infrastructure babysitting.
ClusterIP, NodePort, LoadBalancer, ExternalName, Headless Kubernetes Services with Hands-on Sample
Services provide stable networking for ephemeral Pods. ClusterIP (default) exposes a service on an internal IP that is only reachable within the cluster, ideal for inter-service communication. NodePort opens a static port on every node's IP, so external clients can reach the service at any node address, usually with an external load balancer in front. LoadBalancer provisions a cloud load balancer (e.g., AWS ELB) and routes traffic to NodePorts, which makes it the usual choice for production HTTP apps. ExternalName maps a service to a DNS CNAME, useful for integrating external databases (e.g., mydb.example.com). Headless services (clusterIP: None) return all Pod IPs via DNS for stateful workloads like databases; paired with a StatefulSet, each Pod gets its own stable DNS entry. Why this order? Start with ClusterIP for internal microservices, then expose via LoadBalancer for clients. Use headless only when you need direct Pod addressing. A ClusterIP Service is the usual starting point; changing its type to NodePort or LoadBalancer exposes it externally.
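A minimal ClusterIP sample might look like the sketch below. It assumes Pods labeled `app: web` that listen on port 8080; both the label and the ports are placeholders.

```yaml
# ClusterIP Service sketch. Name, label, and ports are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: ClusterIP        # change to NodePort or LoadBalancer for external access
  selector:
    app: web             # matches the Pod template labels of the Deployment
  ports:
    - name: http
      port: 80           # port clients inside the cluster connect to
      targetPort: 8080   # containerPort on the selected Pods
```

Inside the cluster, other workloads in the same namespace can then reach the Pods at `http://web`, whichever replicas happen to be ready.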
Rolling Update Caused 502 Errors for 8 Minutes During Production Deploy
A production rolling update was triggered with kubectl apply. The rollout completed in 3 minutes but 502 errors persisted for 5 additional minutes. kubectl get pods showed all Pods in Running state. No CrashLoopBackOff. No OOMKill events.
The Deployment used maxUnavailable: 0 (correct for zero-downtime) but the readiness probe was misconfigured: initialDelaySeconds: 5 and periodSeconds: 10 with failureThreshold: 1. The application took 45 seconds to warm its cache and connect to the database. During this 45-second window, the readiness probe failed once after 5 seconds, but Kubernetes had already added the Pod to the Service endpoints because the first probe check had not yet run. The kube-proxy rules were updated to route traffic to the new Pod before it was actually ready. Additionally, the old Pods were terminated immediately when the new Pods passed their first readiness check, creating a window where neither old nor new Pods were fully serving traffic.
The fix:
1. Added a startup probe with failureThreshold: 60 and periodSeconds: 5 (300 seconds max startup time) to gate liveness and readiness probes until the app is fully booted.
2. Changed readiness probe initialDelaySeconds to 0 (startup probe handles the delay) and set failureThreshold: 3 with periodSeconds: 5.
3. Added terminationGracePeriodSeconds: 60 with a pre-stop hook that sleeps 15 seconds to drain in-flight connections before the container is killed.
4. Set maxSurge: 1 to ensure at least one new Pod is fully ready before old Pods are terminated.
5. Added a PodDisruptionBudget with minAvailable: 2 to prevent simultaneous termination of multiple Pods.
Lessons:
- Readiness probes must reflect actual readiness, not just process liveness. A Pod that is 'running' is not the same as a Pod that is 'ready'.
- Startup probes are mandatory for applications with warm-up times greater than 30 seconds. Without them, liveness probes kill the container during boot.
- Always combine terminationGracePeriodSeconds with a pre-stop hook to drain in-flight connections before container shutdown.
- Test rolling updates in staging with real traffic patterns. The first time you see a rollout failure should not be in production.
- PodDisruptionBudgets cap how many Pods voluntary disruptions such as node drains can evict at once. During rollouts, maxSurge and maxUnavailable set the pace.
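Steps 3 and 5 of the fix could be sketched as follows. The container name, image, and the `app: web` label are hypothetical, and the pre-stop command assumes the image ships `sh` and `sleep`.

```yaml
# Fragment of a Pod template: place under spec.template.spec of a Deployment.
terminationGracePeriodSeconds: 60   # total time allowed between SIGTERM and SIGKILL
containers:
  - name: web
    image: registry.example.com/web:1.0.0
    lifecycle:
      preStop:
        exec:
          # Keep serving for 15s while endpoints and proxies stop routing here.
          command: ["sh", "-c", "sleep 15"]
---
# PodDisruptionBudget: keep at least 2 matching Pods up during voluntary evictions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```

The 15-second sleep counts against the 60-second grace period, so the application still has about 45 seconds to finish in-flight requests after it receives SIGTERM.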
- Pod keeps crashing: run kubectl logs <pod> --previous to see why it crashed. Check for OOMKill (exit code 137), missing environment variables, or failed startup dependencies. If the crash happens during startup, add a startup probe.
- Pod stuck in Pending: run kubectl describe nodes | grep -A 5 Allocatable. Check for insufficient CPU/memory, PVC binding failures, or taints/tolerations mismatches. If no nodes can schedule the Pod, it stays Pending indefinitely.
- Rollout stalls: run kubectl rollout status deployment/<name>. If maxUnavailable is 0 and a new Pod cannot become ready, the rollout blocks. Check readiness probe failures. Check for PDB conflicts that prevent Pod termination.
- Liveness restarts: if failureThreshold is too low or periodSeconds is too aggressive, healthy-but-slow Pods are killed. Check liveness probe endpoint latency. Consider using a startup probe to gate liveness.
- Errors during rollout: check terminationGracePeriodSeconds and pre-stop hooks. Verify maxSurge and maxUnavailable settings. Check Service endpoints during rollout: kubectl get endpoints <service> -w.
- Pod evicted: run kubectl describe node <node> | grep -A 8 Conditions. Check if the Pod's QoS class is BestEffort (first to be evicted). Set resource requests and limits to achieve Guaranteed QoS.
Quick commands:
- kubectl logs <pod> --previous --tail=50
- kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'