Kubernetes Pods and Deployments - Missing Startup Probe 502
A readiness probe with initialDelaySeconds=5 caused 8 minutes of 502 errors in production.
- Pod: Ephemeral, non-self-healing. Shares a network namespace (localhost) and volumes across containers.
- Deployment: Manages ReplicaSets which manage Pods. Handles rolling updates, rollbacks, and scaling.
- ReplicaSet: The intermediate controller that ensures N replicas exist. You rarely interact with it directly.
- Probes: Liveness (restart if deadlocked) vs Readiness (remove from service endpoints if not ready).
- maxSurge: More surge = faster rollout but higher peak resource usage.
- maxUnavailable: 0 = zero-downtime but slower rollout. Higher = faster but riskier.
- Creating Pods directly instead of Deployments. Direct Pods are not self-healing. If the node dies, the Pod stays dead forever.
Think of a Pod as a shipping container — it holds your application and its immediate companions (like a logging sidecar). A Deployment is the shipping company's logistics system — it makes sure the right number of containers are always at the dock, replaces any that get damaged, and can swap out old cargo for new cargo without stopping operations.
Every production Kubernetes workload ultimately runs as Pods managed by Deployments. The Pod is the atomic scheduling unit — a group of containers sharing a network namespace and volumes. The Deployment is the declarative controller that ensures the right number of Pods exist, handles rolling updates, and rolls back on failure.
Misconfiguring either object causes production incidents: Pods without resource limits cause noisy-neighbor OOMKills, Deployments without proper probes cause 502 errors during rollouts, and direct Pod creation bypasses self-healing entirely. Understanding the reconciliation loop — how the Deployment controller continuously drives current state toward desired state — is the foundation for debugging every higher-level Kubernetes object.
Pod Basics: The Atomic Unit
In the world of Kubernetes, the Pod is the atomic unit of scheduling. While you might be used to thinking in terms of 'containers,' Kubernetes thinks in 'Pods.' A Pod can host a single container, or a tightly coupled group of containers (like an app container and a 'sidecar' logging agent) that need to share the same local network (localhost) and storage volumes.
Crucially, Pods are ephemeral. They are born, they live, and they die. They are never 'repaired'; they are replaced.
- Pod IP is assigned at creation and stays with the Pod across container restarts, but a replacement Pod gets a new IP. Use Service DNS for discovery.
- Container filesystem is ephemeral. Use PersistentVolumes for data that must survive restarts.
- Containers in the same Pod share localhost networking. Containers in different Pods do not.
- Sidecar containers (logging, proxy) share the Pod's lifecycle. If one crashes, the kubelet restarts that container according to the Pod's restartPolicy; the Pod itself is not recreated.
- Init containers run before the main container. They block Pod startup until they complete successfully.
Never create Pods directly (with kubectl run or a Pod manifest) for production workloads. Direct Pods have no self-healing: if the node crashes, the Pod is gone forever with no replacement. The Deployment controller is what provides self-healing, rolling updates, and rollback. Every production workload must use a Deployment, StatefulSet, or DaemonSet, never a raw Pod.
Deployments: Orchestrating the Desired State
A Deployment is a high-level object that manages a ReplicaSet, which in turn manages Pods. Its job is to ensure the 'Current State' matches the 'Desired State.' If you tell a Deployment you want 3 replicas, and a node crashes taking one Pod with it, the Deployment's ReplicaSet notices the discrepancy and creates a replacement Pod, which the scheduler places on a healthy node.
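The desired-versus-current comparison described above can be sketched as a single pass of a toy control loop. This is an illustration only, not the controller's actual code; the `Pod` class, the `reconcile` function, and the Pod names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    node_healthy: bool = True  # hypothetical flag: False once the Pod's node has failed


def reconcile(desired_replicas: int, pods: list[Pod]) -> dict:
    """One pass of a toy control loop: compare desired state with current state."""
    healthy = [p for p in pods if p.node_healthy]
    lost = [p.name for p in pods if not p.node_healthy]
    diff = desired_replicas - len(healthy)
    return {
        "lost": lost,             # Pods on failed nodes are replaced, never repaired
        "create": max(diff, 0),   # too few healthy Pods: create the difference
        "delete": max(-diff, 0),  # too many Pods: remove the surplus
    }


pods = [Pod("web-a"), Pod("web-b"), Pod("web-c", node_healthy=False)]
print(reconcile(3, pods))
```

Running the pass repeatedly until `create` and `delete` are both zero is the essence of reconciliation: the loop never repairs a Pod, it only adds or removes Pods until the counts match.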
Deployments are also the primary vehicle for Rolling Updates. By manipulating the maxSurge and maxUnavailable parameters, you can swap out version 1.0 for 2.0 without dropping a single user request.
- Deployment creates ReplicaSets. ReplicaSets create Pods. You interact with Deployments.
- Each rollout creates a new ReplicaSet. Old ReplicaSets are kept for rollback (default: 10 revisions).
- The Deployment controller manages the transition: scale up new ReplicaSet, scale down old ReplicaSet.
- maxSurge: How many extra Pods can exist above the desired count during rollout.
- maxUnavailable: How many Pods can be missing below the desired count during rollout.
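The two bounds in the list above can be worked out with a little arithmetic. The sketch below assumes the documented rounding rules (a percentage maxSurge rounds up, a percentage maxUnavailable rounds down); the function names `resolve` and `rollout_bounds` are hypothetical, and rejecting the 0/0 combination is a simplification of what Kubernetes does.

```python
def resolve(value, replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage string such as '25%' into a Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer arithmetic avoids floating-point surprises when rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%") -> dict:
    """Return the most Pods that may exist and the fewest that must stay available."""
    surge = resolve(max_surge, replicas, round_up=True)                # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)   # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        # Simplification: this sketch just rejects a combination that cannot make progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


print(rollout_bounds(10, "25%", 0))   # surge-first rollout
print(rollout_bounds(10, 0, "25%"))   # terminate-first rollout
```

For 10 replicas, the surge-first settings allow up to 13 Pods while keeping all 10 available; the terminate-first settings never exceed 10 Pods but may drop to 8 available.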
The maxSurge and maxUnavailable parameters directly control rollout speed versus resource usage. maxSurge: 25%, maxUnavailable: 0 means: during rollout, create up to 25% extra new Pods before terminating old Pods. This ensures zero downtime but requires up to 125% of normal resource capacity. For resource-constrained clusters, set maxSurge: 0, maxUnavailable: 25% to terminate old Pods first (faster, less resource usage, but a brief capacity reduction). The one combination that cannot work is maxSurge: 0 with maxUnavailable: 0: the rollout could neither create new Pods (no surge) nor terminate old Pods (no unavailability allowed), so the API server rejects the Deployment as invalid.
Probe Comparison: Liveness vs Readiness vs Startup
Kubernetes provides three distinct probe types — liveness, readiness, and startup — each serving a different purpose in the Pod lifecycle. Misunderstanding their roles is the root cause of many production incidents, including the 502 error scenario described earlier.
Liveness Probe: Determines if the container is alive. If it fails, the kubelet kills the container and restarts it (according to the Pod's restartPolicy). Use it to recover from deadlocks or infinite loops. Never check external dependencies in a liveness probe — if the database is slow, the probe fails, the container restarts, and the restart increases load on the database, causing cascading failure.
Readiness Probe: Determines if the container is ready to serve traffic. If it fails, the Pod's IP is removed from the Service endpoints — traffic stops, but the container is NOT restarted. Use it for applications that need to load cache, connect to databases, or run startup migrations before handling requests. The readiness probe controls traffic flow, not container lifecycle.
Startup Probe: Gates both liveness and readiness probes. While the startup probe has not yet succeeded, liveness and readiness probes are disabled. Use it for applications with startup times greater than 30 seconds. Set a high failureThreshold (e.g., 60) with a short periodSeconds (e.g., 5) to allow up to 300 seconds for startup. Once the startup probe succeeds, liveness and readiness probes begin their normal checks.
All three probes support the same handler types: httpGet, tcpSocket, exec, and grpc.
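The startup window mentioned above is simply failureThreshold multiplied by periodSeconds. The sketch below computes it for the values in the text; the `/healthz` path, port 8080, and the helper name `startup_budget_seconds` are assumptions for illustration, while the probe field names are the standard Kubernetes ones.

```python
def startup_budget_seconds(failure_threshold: int, period_seconds: int,
                           initial_delay_seconds: int = 0) -> int:
    """Approximate time a container gets to start before the kubelet gives up on it."""
    return initial_delay_seconds + failure_threshold * period_seconds


# Probe settings from the text; the endpoint itself is hypothetical.
startup_probe = {
    "httpGet": {"path": "/healthz", "port": 8080},
    "failureThreshold": 60,
    "periodSeconds": 5,
}

budget = startup_budget_seconds(
    startup_probe["failureThreshold"], startup_probe["periodSeconds"]
)
warm_up_seconds = 45  # warm-up time reported in the incident described in this article
print(budget, budget >= warm_up_seconds)
```

A budget well above the observed warm-up time leaves headroom for slow nodes and cold caches without delaying fast starts, because the probe stops as soon as it succeeds.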
A startup probe with failureThreshold: 60 and periodSeconds: 5 would have prevented traffic from reaching the Pod until the startup probe succeeded, completely eliminating the error window.
Liveness/Readiness/Startup Probe Comparison Table
While the previous section explained each probe's purpose, a comparison table helps you quickly decide which probe to use in any scenario. The table below summarizes the differences across key dimensions: behavior on failure, impact on the Pod, typical use cases, handler types, and best practices for configuration.
| Aspect | Liveness Probe | Readiness Probe | Startup Probe |
|---|---|---|---|
| Purpose | Is the container alive? | Is the container ready to serve traffic? | Has the container finished booting? |
| On failure | Kubelet kills container, restarts per restartPolicy | Pod removed from Service endpoints (no restart) | Kubelet kills container once failureThreshold is exceeded, restarts per restartPolicy; until the probe succeeds, liveness/readiness stay disabled |
| Impact | Container restarts (may cause CrashLoopBackOff) | Traffic stops, but container stays running | Delays liveness/readiness checks during boot; once it passes, they start |
| Typical use case | Deadlock detection, infinite loop recovery | Cache warmup, DB connection, migration completion | Apps with >30s boot time, legacy apps, heavy initialization |
| Handler support | httpGet, tcpSocket, exec, grpc | httpGet, tcpSocket, exec, grpc | httpGet, tcpSocket, exec, grpc |
| Configuration advice | Keep simple: high threshold, long period; avoid dependency checks | Set initialDelaySeconds to 0 when using startup probe; check real readiness | High failureThreshold (e.g., 60) × short periodSeconds (e.g., 5) to cover max startup |
| Common mistakes | Using to check database (cascading failure) | Setting initialDelaySeconds too low (adds Pod to endpoints before ready) | Not used at all for slow-start apps (liveness kills during boot) |
When to combine: Always pair readiness probes with startup probes for applications that warm up slowly. The startup probe disables liveness and readiness during boot, preventing premature restarts and premature traffic. For fast-starting apps (<30s), a simple readiness probe with initialDelaySeconds may suffice, but adding a startup probe costs nothing and adds safety.
Deployment Strategies: RollingUpdate vs Recreate
When updating a Deployment, Kubernetes supports two strategies: RollingUpdate (default) and Recreate. The choice between them directly impacts availability, resource usage, and rollout speed.
RollingUpdate: Replaces old Pods with new Pods incrementally. Controlled by two parameters:
- maxSurge: How many extra Pods (count or percentage) can be created above the desired replica count during the update.
- maxUnavailable: How many Pods (count or percentage) can be unavailable during the update.
During a rolling update, the Deployment controller creates new Pods in a new ReplicaSet, waits for them to become ready, then scales down the old ReplicaSet. The process repeats until all old Pods are replaced. This strategy enables zero-downtime deployments, but at the cost of requiring extra cluster capacity (at least maxSurge proportion of overhead).
Recreate: Terminates all old Pods simultaneously, then creates new Pods. This is a simple 'kill all, start all' pattern. During the update, the application is completely unavailable — no traffic can be served until all new Pods are ready. Use this strategy only when the application cannot have multiple versions running concurrently (e.g., stateful applications with incompatible schemas, or when file locks prevent coexistence). Recreate is also useful for cost-constrained environments where you cannot afford the overhead of extra Pods during a rollout.
- RollingUpdate: Stateless APIs, microservices, web frontends — anything that needs 100% uptime during deploys.
- Recreate: Stateful databases (during schema migrations), batch jobs, or any application that enforces exclusive access to a resource.
A rolling update needs spare capacity equal to the maxSurge proportion of replicas. For a 10-replica Deployment with maxSurge: 25%, the surge is 2.5 Pods, rounded up to 3, so you need capacity for 13 Pods during the rollout. In resource-constrained clusters, you can trade availability for capacity by setting maxUnavailable: 25% and maxSurge: 0, which terminates old Pods before creating new ones (brief capacity dip but less overhead). Always calculate the peak resource requirement for rolling updates and ensure the cluster can handle it.
RollingUpdate vs Recreate Visual Comparison
While the previous section explained when to use each strategy, this visual comparison highlights the key differences in resource usage, timeline, and availability during a deployment. Use this diagram and table to communicate rollout behavior to your team and to decide which strategy fits your workload.
Timeline Comparison (3 replicas, 10s per Pod startup):
- Recreate: 0s: all 3 old pods terminated. 10s: first new pod ready. 20s: all new pods ready. Total downtime: ~10s (from 0s until the first pod is ready).
- RollingUpdate (maxSurge=1, maxUnavailable=0): 0s: 3 old pods serving, 1 new pod created. 10s: new pod ready, old pod terminated. 15s: second new pod created. 25s: second new pod ready, second old pod terminated. 30s: third new pod created. 40s: third new pod ready, third old pod terminated. Total downtime: 0s (at least 3 ready pods at all times, because an old pod is terminated only after its replacement is ready).
Resource Peak Comparison (3 replicas, each requesting 256Mi memory and 500m CPU):
- Recreate: 0s-10s: 0 pods, 0 resources used. 10s-20s: 3 pods, 768Mi/1.5 CPU. Peak: same as steady state.
- RollingUpdate (maxSurge=1, maxUnavailable=0): During each surge step, 4 pods run simultaneously (the desired 3 plus 1 surge pod). Peak memory: 4×256Mi = 1024Mi. Peak CPU: 4×500m = 2000m. Requires 33% more capacity than steady state.
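The peak figures above can be reproduced with a short calculation. This is a minimal sketch; the helper name `peak_requests` is hypothetical, and it counts only resource requests, not actual usage.

```python
def peak_requests(replicas: int, surge: int, memory_mi: int, cpu_millicores: int) -> dict:
    """Peak requested resources while `surge` extra Pods run alongside `replicas`."""
    peak_pods = replicas + surge
    return {
        "peak_pods": peak_pods,
        "memory_mi": peak_pods * memory_mi,
        "cpu_millicores": peak_pods * cpu_millicores,
        "extra_capacity_pct": round(100 * surge / replicas),
    }


rolling = peak_requests(replicas=3, surge=1, memory_mi=256, cpu_millicores=500)
recreate = peak_requests(replicas=3, surge=0, memory_mi=256, cpu_millicores=500)
print(rolling)
print(recreate)
```

Comparing the two results against the free capacity of the cluster before a deploy shows whether a surge-based rollout can be scheduled at all.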
Resource Management: Requests, Limits, and QoS Classes
Every container in Kubernetes should specify CPU and memory requests and limits. These settings directly affect scheduling, runtime performance, and cluster stability. Misconfiguring them is a top cause of production incidents.
Requests are the minimum amount of resources guaranteed to the container. The Kubernetes scheduler uses requests to make placement decisions: it places a Pod only on a node whose unreserved allocatable capacity covers the sum of that Pod's container requests. Requests also ensure the container gets at least that much CPU and memory under contention.
Limits are the maximum amount of resources a container is allowed to consume. If a container exceeds its CPU limit, it gets throttled (not killed). If it exceeds its memory limit, it gets OOMKilled (exit code 137). Limits prevent a single container from starving other containers on the same node.
QoS Classes: Kubernetes categorizes Pods into three Quality of Service classes based on request and limit settings:
- Guaranteed: every container sets CPU and memory requests equal to its limits (e.g., requests.cpu: 500m and limits.cpu: 500m). These Pods are least likely to be evicted and get the most predictable performance.
- Burstable: the Pod does not meet the Guaranteed criteria, but at least one container sets a CPU or memory request or limit (typically requests < limits). These Pods get their requested minimum but can burst to their limit if node capacity is available.
- BestEffort: No requests or limits set. These Pods receive no guarantees and are the first to be evicted under node pressure. Avoid BestEffort in production.
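The classification rules above can be expressed as a small function. This is a simplified sketch: the function name `qos_class` and the dict layout are hypothetical, and quantities are compared as plain strings, so "500m" and "0.5" would be treated as different even though Kubernetes considers them equal.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list[dict]) -> str:
    """Classify a Pod from its containers' 'requests' and 'limits' dicts."""
    any_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # When only a limit is given, Kubernetes defaults the request to the limit.
        requests = {**limits, **container.get("requests", {})}
        if requests or limits:
            any_set = True
        for resource in RESOURCES:
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


equal = {"requests": {"cpu": "500m", "memory": "256Mi"},
         "limits": {"cpu": "500m", "memory": "256Mi"}}
bursty = {"requests": {"cpu": "250m"}, "limits": {"cpu": "500m"}}
print(qos_class([equal]), qos_class([bursty]), qos_class([{}]))
```

Note that a single container without matching requests and limits is enough to move the whole Pod out of the Guaranteed class, which matters when a sidecar is injected automatically.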
Production Best Practices:
1. Always set requests and limits for every container. Never leave them unset.
2. Set requests equal to limits for critical workloads to achieve Guaranteed QoS and predictable performance.
3. Base request values on steady-state resource usage observed in production over 7 days. Do not guess.
4. Set memory limits to 1.5x the observed maximum to handle transient spikes without OOMKill.
5. CPU limits are less critical than memory limits, but still set them to prevent noisy neighbors. CPU throttling is better than OOMKill.
6. Use Vertical Pod Autoscaler (VPA) in recommendation mode to generate initial request/limit values, then refine manually.
Resource Management Best Practices (Requests/Limits)
While the previous section explained the mechanics of requests, limits, and QoS classes, this section provides a concise list of best practices that you can apply immediately to your production Deployments. These recommendations come from real-world incidents and thousands of production clusters.
1. Never leave resources unset. A Pod whose containers set no requests and no limits runs as BestEffort QoS: it gets no guarantees and is first to be evicted. A container without a memory limit can consume all node memory and trigger OOM kills of other Pods. Always set both.
2. Base requests on steady-state usage, not peak. Monitor your application for at least 7 days using kubectl top pod or a metrics system (Prometheus). Set requests to the 50th percentile of observed usage. Set limits to 1.5x the 95th percentile for memory (to handle spikes) and 2x the 95th percentile for CPU (since throttling is acceptable).
3. Use Guaranteed QoS for all critical Deployments. Setting requests == limits makes the Pod the last candidate for eviction under node pressure and gives it the most stable CPU scheduling. The only cost is that you cannot overcommit, but for critical services this is a feature, not a bug.
4. Test resource configurations under load. A common mistake: setting memory too low based on idle measurement. During traffic spikes, memory usage can double. Run load tests that mimic peak traffic and verify that memory stays below the limit.
5. Avoid setting identical CPU and memory values for all containers. Each container type has different resource profiles. For example, a sidecar proxy (Envoy) needs different requests than the main application container. Use per-container settings.
6. Implement resource quotas at the namespace level. Even with per-Pod limits, a runaway Deployment can create many Pods that collectively use too many resources. Use ResourceQuota objects to enforce aggregate limits per namespace.
7. Monitor for OOMKill events and CPU throttling. Set up alerts on kube_pod_container_status_terminated_reason (OOMKilled) and container_cpu_cfs_throttled_seconds_total. These events indicate misconfigured resource limits.
8. Use VPA in recommendation mode (not auto) for initial tuning. The Vertical Pod Autoscaler can analyze historical usage and suggest request/limit values. Review its recommendations before applying. Do not enable auto mode for stateless workloads unless you understand the disruption it causes (Pod restart on every recommendation).
9. Consider using alternative scheduling policies. For batch workloads that can tolerate lower priority, use priorityClassName to distinguish critical vs. best-effort. Combined with Guaranteed QoS, priority ensures your most important workloads survive contention.
10. Document your resource strategy. Include typical request/limit values for each service, the reasoning behind them, and instructions for adjusting. This prevents future engineers from guessing or copying incorrect values from other services.
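The percentile rule in item 2 of the list above (request at the 50th percentile, memory limit at 1.5x and CPU limit at 2x the 95th percentile) can be sketched as follows. The helper names and the synthetic samples are hypothetical; real samples would come from a metrics system, and the nearest-rank percentile used here is only one of several common definitions.

```python
def percentile(samples, p: int):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceiling of p * n / 100
    return ordered[rank - 1]


def suggest(samples, limit_factor: float) -> dict:
    """Request at p50, limit at limit_factor times p95, following the rule in the text."""
    return {
        "request": percentile(samples, 50),
        "limit": limit_factor * percentile(samples, 95),
    }


# Synthetic usage samples for illustration only (MiB and millicores).
memory_samples_mi = list(range(1, 101))
cpu_samples_m = list(range(1, 101))

memory = suggest(memory_samples_mi, limit_factor=1.5)
cpu = suggest(cpu_samples_m, limit_factor=2)
print(memory, cpu)
```

Treat the output as a starting point to validate under load, as item 4 recommends, rather than as final values.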
Production Operations: Essential kubectl Patterns
Managing Deployments in production requires more than just kubectl apply. You need to be able to inspect the rollout history, trigger instant rollbacks, and scale on demand.
- Rollback = scale up old ReplicaSet + scale down new ReplicaSet. No image pull needed if cached.
- revisionHistoryLimit (default 10): How many old ReplicaSets to keep. Increase to 20-50 for critical services.
- kubectl rollout pause/resume: Pause a rollout mid-way to test a subset of new Pods before completing.
- kubectl rollout undo --to-revision=N: Rollback to a specific revision, not just the previous one.
- kubectl rollout restart: Triggers a rolling restart without changing the image. Useful for picking up ConfigMap changes.
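For scripted operations, the rollout commands listed above can be assembled as argument lists before being handed to `subprocess.run`. This is a minimal sketch: the helper name `rollout`, the Deployment name `web`, and the `default` namespace are assumptions, and the block only prints the commands instead of executing them.

```python
import shlex

ROLLOUT_ACTIONS = {"status", "history", "pause", "resume", "undo", "restart"}


def rollout(action: str, deployment: str, namespace: str = "default", to_revision=None) -> list:
    """Build the argv for a `kubectl rollout <action>` call against a Deployment."""
    if action not in ROLLOUT_ACTIONS:
        raise ValueError(f"unsupported rollout action: {action}")
    argv = ["kubectl", "rollout", action, f"deployment/{deployment}", "--namespace", namespace]
    if to_revision is not None:
        if action != "undo":
            raise ValueError("--to-revision only applies to undo")
        argv.append(f"--to-revision={to_revision}")
    return argv


# Print the commands; pass an argv to subprocess.run(argv, check=True) to execute one.
for argv in (
    rollout("history", "web"),
    rollout("undo", "web", to_revision=2),
    rollout("restart", "web"),
):
    print(shlex.join(argv))
```

Building an argument list rather than a shell string avoids quoting problems when Deployment names or namespaces come from user input.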
The kubectl rollout pause command is the foundation of canary deployments without a service mesh. Pause the rollout after one new Pod is created, send a small percentage of traffic to it via Service endpoint selection, monitor error rates, then either resume or undo. This gives you canary behavior with standard Kubernetes objects. Combine with maxSurge: 1 to ensure only one canary Pod is created during the pause.
Production operations need more than kubectl apply. Use rollout undo for instant rollbacks, rollout pause for canary deployments, and rollout restart for ConfigMap-driven restarts. Keep old images available: rollback fails if the image has been garbage-collected.
Rolling Update Caused 502 Errors for 8 Minutes During Production Deploy
Symptom: A new version was deployed with kubectl apply. The rollout completed in 3 minutes, but 502 errors persisted for 5 additional minutes. kubectl get pods showed all Pods in the Running state, with no CrashLoopBackOff and no OOMKill events.
Root cause: The Deployment set maxUnavailable: 0 (correct for zero-downtime), but the readiness probe was misconfigured: initialDelaySeconds: 5 and periodSeconds: 10 with failureThreshold: 1. The application took 45 seconds to warm its cache and connect to the database. During this 45-second window, the readiness probe failed once after 5 seconds, but Kubernetes had already added the Pod to the Service endpoints because the first probe check had not yet run. The kube-proxy rules were updated to route traffic to the new Pod before it was actually ready. Additionally, the old Pods were terminated immediately when the new Pods passed their first readiness check, creating a window where neither old nor new Pods were fully serving traffic.
Fix:
1. Added a startup probe with failureThreshold: 60 and periodSeconds: 5 (300 seconds max startup time) to gate liveness and readiness probes until the app is fully booted.
2. Changed readiness probe initialDelaySeconds to 0 (startup probe handles the delay) and set failureThreshold: 3 with periodSeconds: 5.
3. Added terminationGracePeriodSeconds: 60 with a pre-stop hook that sleeps 15 seconds to drain in-flight connections before the container is killed.
4. Set maxSurge: 1 to ensure at least one new Pod is fully ready before old Pods are terminated.
5. Added a PodDisruptionBudget with minAvailable: 2 to prevent simultaneous eviction of multiple Pods during voluntary disruptions such as node drains.
Lessons:
- Readiness probes must reflect actual readiness, not just process liveness. A Pod that is 'running' is not the same as a Pod that is 'ready'.
- Startup probes are mandatory for applications with warm-up times greater than 30 seconds. Without them, liveness probes kill the container during boot.
- Always combine terminationGracePeriodSeconds with a pre-stop hook to drain in-flight connections before container shutdown.
- Test rolling updates in staging with real traffic patterns. The first time you see a rollout failure should not be in production.
- PodDisruptionBudgets limit how many Pods voluntary disruptions such as node drains can evict at once. They do not constrain a Deployment's own rolling update; maxUnavailable does that.
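The settings from the incident write-up above can be collected into one manifest. The sketch below builds it as Python dictionaries and prints JSON, which kubectl accepts as well as YAML. The name `web`, the image `example/web:2.0`, the `/healthz` path, port 8080, the replica count of 3, and the assumption that the image ships a `sleep` binary are all hypothetical; only the probe, grace-period, strategy, and PodDisruptionBudget values come from the text.

```python
import json

APP_LABELS = {"app": "web"}                         # hypothetical label
HEALTH = {"path": "/healthz", "port": 8080}         # hypothetical health endpoint

container = {
    "name": "web",
    "image": "example/web:2.0",                     # hypothetical image
    # Startup probe: up to 60 x 5 = 300 seconds for the app to boot.
    "startupProbe": {"httpGet": HEALTH, "failureThreshold": 60, "periodSeconds": 5},
    # Readiness probe: no initial delay, since the startup probe gates it.
    "readinessProbe": {"httpGet": HEALTH, "initialDelaySeconds": 0,
                       "failureThreshold": 3, "periodSeconds": 5},
    # Pre-stop hook: sleep so in-flight connections can drain (assumes `sleep` exists in the image).
    "lifecycle": {"preStop": {"exec": {"command": ["sleep", "15"]}}},
}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,                              # hypothetical replica count
        "selector": {"matchLabels": APP_LABELS},
        "strategy": {"type": "RollingUpdate",
                     "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0}},
        "template": {
            "metadata": {"labels": APP_LABELS},
            "spec": {"terminationGracePeriodSeconds": 60, "containers": [container]},
        },
    },
}

pdb = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "web"},
    "spec": {"minAvailable": 2, "selector": {"matchLabels": APP_LABELS}},
}

manifest = {"apiVersion": "v1", "kind": "List", "items": [deployment, pdb]}
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`. Keeping the pre-stop sleep (15 seconds) well below terminationGracePeriodSeconds (60 seconds) leaves the application time to shut down cleanly after the hook returns.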
- Crashing container: Run kubectl logs <pod> --previous to see why it crashed. Check for OOMKill (exit code 137), missing environment variables, or failed startup dependencies. If the crash happens during startup, add a startup probe.
- Pending Pod: Run kubectl describe nodes | grep -A 5 Allocatable. Check for insufficient CPU/memory, PVC binding failures, or taints/tolerations mismatches. If no nodes can schedule the Pod, it stays Pending indefinitely.
- Stalled rollout: Run kubectl rollout status deployment/<name>. If maxUnavailable is 0 and a new Pod cannot become ready, the rollout blocks. Check readiness probe failures. PodDisruptionBudgets do not block a Deployment's own rollout, but they can stall a node drain that is running at the same time.
- Unexpected restarts: If failureThreshold is too low or periodSeconds is too aggressive, healthy-but-slow Pods are killed. Check liveness probe endpoint latency. Consider using a startup probe to gate liveness.
- Errors during rollout: Check terminationGracePeriodSeconds and pre-stop hooks. Verify maxSurge and maxUnavailable settings. Check Service endpoints during rollout: kubectl get endpoints <service> -w.
- Evicted Pods: Run kubectl describe node <node> | grep Conditions. Check if the Pod's QoS class is BestEffort (first to be evicted). Set resource requests and limits to achieve Guaranteed QoS.