Kubernetes Namespace Terminating — Finalizer Debug Strategy
A missing IAM permission caused the cloud controller manager (CCM) to fail to remove a finalizer, blocking namespace deletion for 3 days. Here is how to debug it, in production and in interviews.
20+ years shipping production code across the stack, with years spent interviewing engineers. Everything here is grounded in real deployments.
- Control Plane request lifecycle: Auth -> Mutating Webhook -> Validation -> etcd -> Controllers -> Scheduler -> Kubelet
- etcd: Raft consensus, split-brain scenarios, compaction, and why disk latency kills clusters
- Networking: CNI overlay vs flat networking, kube-proxy iptables vs IPVS, NetworkPolicy enforcement
- Resource Management: Requests vs Limits, QoS classes, OOMKill behavior, CPU throttling
- Autoscaling: HPA algorithm, stabilization windows, KEDA, HPA/VPA conflict
- RBAC and Admission: Webhook chains, OPA/Gatekeeper, service account token risks
Imagine a massive airport with hundreds of flights (your apps), gates (servers), ground crew (Kubernetes components), and air traffic control (the scheduler). Kubernetes is the entire airport management system — it decides which plane parks at which gate, reroutes flights when a gate breaks, and makes sure no single runway gets overloaded. When an interviewer asks about Kubernetes internals, they're asking you to explain how the airport actually runs — not just that planes land and take off.
Kubernetes has become the de facto operating system for cloud-native infrastructure. At senior and staff-level interviews, nobody is going to ask you what a Pod is. They want to know what happens inside the API server when you run kubectl apply, why your HPA isn't scaling when CPU is clearly spiking, or how etcd consistency guarantees affect your cluster's behaviour under partition.
The gap between 'I know Kubernetes' and 'I understand Kubernetes' comes down to internals. When something breaks at 3am — a node drains but Pods stay Pending, a Deployment rolls out but traffic never shifts, a namespace hangs in Terminating forever — the engineers who can diagnose and fix fast are the ones who understand the watch-loop reconciliation model, the scheduler predicates and priorities, and how the CNI interacts with kube-proxy.
This guide covers the failure modes, edge cases, and architectural decisions that surface in real senior/staff-level interviews at companies running Kubernetes at scale. Every question maps to a production incident you will eventually encounter.
What Kubernetes Interview Questions Actually Test
Kubernetes interview questions are not trivia — they probe your understanding of distributed system mechanics under pressure. The core mechanic is simple: an interviewer presents a scenario (e.g., a namespace stuck in Terminating) and expects you to trace the control loop, identify the blocking condition, and state the exact command to resolve it. This is a test of mental model, not memorization.
In practice, these questions focus on three properties: how finalizers block deletion until a controller completes cleanup, how the garbage collector propagates deletion through owner references, and how to force-remove a stuck resource (kubectl patch for ordinary objects, the finalize subresource for namespaces). For example, a namespace stuck in Terminating usually means its kubernetes finalizer cannot be cleared: the namespace controller is still waiting for objects inside the namespace to be deleted, typically because one of them carries its own finalizer owned by a controller that is down or misconfigured. The last-resort fix is kubectl get namespace <name> -o json | jq '.spec.finalizers = []' | kubectl replace --raw /api/v1/namespaces/<name>/finalize -f -.
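To make the jq step concrete, here is a minimal Python sketch of the same transformation, assuming the namespace JSON has the usual metadata/spec shape. It only builds the request path and body for the finalize subresource; it does not talk to a cluster, and the namespace name legacy-team is hypothetical.

```python
import json


def build_finalize_request(namespace_json: str) -> tuple[str, str]:
    """Mimic `jq '.spec.finalizers = []'` and return (path, body) for the
    namespace's finalize subresource, which `kubectl replace --raw` sends to."""
    ns = json.loads(namespace_json)
    name = ns["metadata"]["name"]
    # Only spec.finalizers is cleared; metadata.finalizers is a separate list.
    ns.setdefault("spec", {})["finalizers"] = []
    return f"/api/v1/namespaces/{name}/finalize", json.dumps(ns)


stuck = json.dumps({
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {"name": "legacy-team"},  # hypothetical namespace
    "spec": {"finalizers": ["kubernetes"]},
    "status": {"phase": "Terminating"},
})

path, body = build_finalize_request(stuck)
print(path)                                    # /api/v1/namespaces/legacy-team/finalize
print(json.loads(body)["spec"]["finalizers"])  # []
```

Because this bypasses the namespace controller's cleanup, anything still left in the namespace can be orphaned, so treat it as a last resort after the blocking controller has been investigated.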
You use this knowledge when debugging production clusters where a namespace won't delete, blocking CI/CD pipelines or resource reclamation. It matters because a single stuck namespace can cascade into failed deployments, leaking resources, and alert fatigue. Senior engineers don't guess — they read the finalizer list, check the controller logs, and decide whether to patch or restart.
kubectl get ns shows the namespace in Terminating state for hours, and kubectl describe ns reveals a finalizer referencing a custom resource that no longer exists.
The kubernetes finalizer can be removed if the controller responsible for the remaining content is gone, but any objects still in the namespace may be left orphaned; custom finalizers require controller recovery.
Use kubectl replace --raw against the finalize endpoint. Never rely on deleting the namespace with --force alone: --force does not remove finalizers.
The Anatomy of a Request: What Happens When You Run 'kubectl apply'?
A senior candidate must articulate the journey of a manifest from the CLI to the Kubelet. It isn't just 'the API server saves it.' The lifecycle involves Authentication/Authorization, Mutating Admission Webhooks (which might inject sidecars like Istio or Linkerd), Schema Validation, and finally, Validating Admission Webhooks (like OPA/Gatekeeper).
Once the object is persisted in etcd, the control plane controllers see the change through a watch on the API server (they never read etcd directly). The Deployment controller creates a ReplicaSet, which the ReplicaSet controller turns into Pod objects. These Pods remain in a 'Pending' state with an empty nodeName until the kube-scheduler performs its two-step dance: Filtering (Predicates) to find capable nodes, and Scoring (Priorities) to rank them, after which it binds the Pod to the best node. Only then does the kubelet on the target node see the Pod and instruct the container runtime (via the CRI) to pull images and start containers.
- Authentication: Service account tokens, OIDC, certificates.
- Authorization: RBAC, ABAC, Webhook authorizers.
- Mutating Webhooks: Istio sidecar injection, default resource limits, label injection.
- Validating Webhooks: OPA/Gatekeeper policies, image signature verification, custom quota rules (the built-in ResourceQuota check is an admission plugin, not a webhook).
- etcd: Only persisted after all gates pass. The API Server is the only component that writes to etcd.
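The ordering of the admission gates, and what failurePolicy changes, can be sketched as a toy pipeline. This is a simplified model, not the API server's code: authentication, authorization and schema validation are left out, and the webhook names and handlers are hypothetical. Only the "Fail"/"Ignore" values mirror the real failurePolicy field.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class WebhookUnavailable(Exception):
    """Stands in for a timeout or connection error when calling a webhook."""


@dataclass
class Webhook:
    name: str
    # Returns the (possibly mutated) object to admit, or None to deny it.
    handler: Callable[[dict], Optional[dict]]
    failure_policy: str = "Fail"  # same two values as the real field: "Fail" or "Ignore"


def admit(obj: dict, mutating: list, validating: list) -> tuple[bool, str]:
    """Run mutating webhooks in order, then validating ones; return (allowed, reason).

    Validating webhooks run sequentially here; the real API server calls them in parallel."""
    for hooks, mutates in ((mutating, True), (validating, False)):
        for hook in hooks:
            try:
                result = hook.handler(obj)
            except WebhookUnavailable:
                if hook.failure_policy == "Ignore":
                    continue  # fail open: skip the unreachable webhook
                return False, f"{hook.name} unreachable (failurePolicy=Fail)"
            if result is None:
                return False, f"denied by {hook.name}"
            if mutates:
                obj = result  # later webhooks see the mutated object
    return True, "persisted"  # only at this point would the object be written to etcd


def inject_sidecar(pod: dict) -> dict:
    return {**pod, "containers": pod["containers"] + ["istio-proxy"]}


def policy_down(pod: dict) -> Optional[dict]:
    raise WebhookUnavailable()


pod = {"name": "web", "containers": ["app"]}
injector = Webhook("sidecar-injector", inject_sidecar)
print(admit(pod, [injector], [Webhook("policy", policy_down, "Ignore")]))  # (True, 'persisted')
print(admit(pod, [injector], [Webhook("policy", policy_down, "Fail")]))
# (False, 'policy unreachable (failurePolicy=Fail)')
```

The same unreachable policy webhook either silently lets the request through or blocks it entirely, depending only on failurePolicy.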
A webhook with failurePolicy: Fail that goes down blocks every API request it matches. Mitigations: use failurePolicy: Ignore for non-critical webhooks, run webhooks highly available (multiple replicas), and monitor webhook latency. Never set failurePolicy: Fail on a webhook that is not absolutely critical.
Networking Internals: Services, Kube-Proxy, and the CNI
A Service in Kubernetes is not a process; it's a virtual IP (VIP) that kube-proxy programs into packet-forwarding rules on every node. You should be prepared to explain the difference between iptables mode (the long-standing default) and IPVS mode. iptables matches rules sequentially (O(n) in the number of Services), while IPVS uses hash tables (O(1) lookup), making it significantly more performant for clusters with thousands of Services.
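The O(n) versus O(1) difference is easy to see in a toy cost model. This sketch is not how the kernel implements either mode (real kube-proxy iptables rules jump through per-Service chains and pick endpoints with probability rules); it only counts comparisons for a linear rule walk versus a hash lookup, using made-up ClusterIPs and pod names.

```python
import random


def build_rules(n_services: int):
    """One (ClusterIP, port) -> backends entry per Service; IPs and pod names are made up."""
    return [
        ((f"10.96.{i // 256}.{i % 256}", 80), [f"pod-{i}-a", f"pod-{i}-b"])
        for i in range(n_services)
    ]


def iptables_style_lookup(rules, dst):
    """Walk the rules top to bottom like a chain of match rules: O(n) comparisons."""
    comparisons = 0
    for match, backends in rules:
        comparisons += 1
        if match == dst:
            return random.choice(backends), comparisons
    return None, comparisons


def ipvs_style_lookup(table, dst):
    """One hash lookup keyed on the virtual service: O(1) on average."""
    backends = table.get(dst)
    return (random.choice(backends) if backends else None), 1


rules = build_rules(5000)
table = dict(rules)
last_service = ("10.96.19.135", 80)  # the 5000th Service: worst case for a linear scan
print(iptables_style_lookup(rules, last_service)[1])  # 5000
print(ipvs_style_lookup(table, last_service)[1])      # 1
```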
Furthermore, the CNI (Container Network Interface) is responsible for the 'plumbing': assigning IPs to Pods and ensuring they can talk across nodes. If an interviewer asks why a Pod can't reach another Pod, your answer should start with the CNI data path (Calico/Cilium, overlay or routed) and move on to NetworkPolicies, rather than just 'checking the app logs.'
- iptables: Simple, well-understood, but O(n) rule matching. No native load balancing algorithms.
- IPVS: O(1) hash matching, native LB algorithms (rr, lc, sh), but more complex debugging.
- eBPF (Cilium): Bypasses both iptables and IPVS entirely. Kernel-level packet processing. The future.
- kube-proxy is being replaced by eBPF-based CNIs in high-performance clusters.
externalTrafficPolicy: Cluster (the default) lets any node accept external traffic and forward it to a pod on any node; that extra hop is SNATed, so the pod loses the client source IP. externalTrafficPolicy: Local only delivers traffic to pods on the node that received it (nodes without a local pod fail the load balancer's health check), preserving the source IP but risking uneven load distribution if pods are not evenly spread. This is a common interview question and a common production misconfiguration.
etcd Internals: Raft, Consistency, and Failure Modes
etcd is the single source of truth for all Kubernetes cluster state. It uses the Raft consensus algorithm to replicate data across an odd number of members (typically 3 or 5). Understanding Raft is essential for diagnosing cluster-wide failures.
- Raft leader: Elected by members. All writes go through the leader.
- Heartbeat interval: The leader sends heartbeats (default 100ms). If a follower hears no heartbeat within the election timeout (default 1000ms), it starts a new election.
- Disk latency: etcd requires fsync on every write. Slow disks cause leader elections and cluster instability.
- Compaction: Old revisions accumulate. Periodic compaction and defragmentation are required to prevent unbounded growth.
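Raft's majority rule explains why etcd clusters use an odd number of members. A short sketch of the arithmetic: a 4-member cluster needs 3 votes and therefore tolerates only one failure, the same as a 3-member cluster.

```python
def quorum(members: int) -> int:
    """Votes needed to elect a leader or commit a write: a strict majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can be lost while a majority can still be formed."""
    return members - quorum(members)


def side_can_write(side_size: int, members: int) -> bool:
    """During a network partition only the side holding a majority can elect a leader."""
    return side_size >= quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")

# A 5-member cluster split 3/2: only the 3-member side keeps accepting writes.
print(side_can_write(3, 5), side_can_write(2, 5))  # True False
```

This is also why Raft has no split-brain: two sides of a partition can never both hold a majority.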
--quota-backend-bytes (default 2GB; 8GB is the suggested maximum) is the hard limit on the database size. If it is exceeded, etcd raises a NOSPACE alarm and enters a maintenance mode that accepts only reads and deletes until space is reclaimed and the alarm is disarmed, effectively halting the cluster. Monitor etcd_mvcc_db_total_size_in_bytes and alert at 75% of the quota. Run compaction and defragmentation regularly. In large clusters with many ConfigMaps/Secrets, etcd can grow quickly. Consider externalizing large data (e.g., Helm charts) to object storage.
Resource Management: Requests, Limits, and QoS Classes
Resource requests and limits are not just about preventing OOMKills. They define the contract between the application and the scheduler. Requests are used for scheduling decisions (can this Pod fit on this node?). Limits are enforced by the kernel cgroup (can this Pod use more than allocated?).
- Guaranteed: requests == limits for both CPU and memory in every container. Highest priority; last to be evicted.
- Burstable: requests < limits (or only requests set). Medium priority.
- BestEffort: No requests or limits. Lowest priority. First to be evicted.
- CPU throttling: If CPU limit is set, the container is throttled when it exceeds the limit. This is NOT an eviction — it is a performance penalty.
- Memory OOMKill: If memory usage exceeds the limit, the kernel kills the container (OOMKill, exit code 137).
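The QoS rules above can be expressed as a small classifier. This is a simplified sketch: it looks only at regular containers (init containers are ignored) and compares quantities as plain strings, so "1" and "1000m" would wrongly count as different. It does apply the real defaulting rule that a limit without a request gets a request equal to the limit.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list) -> str:
    """Classify a Pod from its containers' cpu/memory requests and limits.

    Each container is a dict like {"requests": {...}, "limits": {...}}."""
    any_set = False
    guaranteed = True
    for c in containers:
        requests = dict(c.get("requests", {}))
        limits = c.get("limits", {})
        for r in RESOURCES:
            # A limit with no request gets a request equal to the limit.
            if r in limits and r not in requests:
                requests[r] = limits[r]
            if r in requests or r in limits:
                any_set = True
            if r not in limits or requests.get(r) != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{"limits": {"cpu": "500m", "memory": "256Mi"}}]))  # Guaranteed
print(qos_class([{"requests": {"cpu": "100m"}}]))                    # Burstable
print(qos_class([{}]))                                               # BestEffort
```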
CPU throttling shows up as container_cpu_cfs_throttled_periods_total in cAdvisor metrics.
RBAC, Service Accounts, and Admission Control
RBAC (Role-Based Access Control) is the primary authorization mechanism in Kubernetes. It defines who (Subject) can do what (Verb) on which resources (Resource) in which scope (Namespace or Cluster). Understanding RBAC is critical for security and for debugging 'access denied' errors.
- Role: Namespace-scoped. RoleBinding binds it to subjects within the namespace.
- ClusterRole: Cluster-scoped. ClusterRoleBinding binds it to subjects across all namespaces.
- ServiceAccount: The identity for a Pod. The default ServiceAccount's token is mounted into every Pod unless automountServiceAccountToken: false is set.
- Aggregated ClusterRoles: Combine multiple ClusterRoles using label selectors. Used by operators to extend permissions dynamically.
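The core of RBAC rule matching (verbs, resources, API groups, with "*" as a wildcard) fits in a few lines. This is a deliberately reduced sketch: subjects, bindings, namespace scope, resourceNames and subresources are all omitted, and the pod-reader rule is a hypothetical example.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyRule:
    verbs: list
    resources: list
    api_groups: list = field(default_factory=lambda: [""])  # "" is the core API group


def rule_allows(rule: PolicyRule, verb: str, resource: str, api_group: str = "") -> bool:
    def matches(allowed: list, wanted: str) -> bool:
        return "*" in allowed or wanted in allowed
    return (matches(rule.verbs, verb)
            and matches(rule.resources, resource)
            and matches(rule.api_groups, api_group))


def can_i(rules: list, verb: str, resource: str, api_group: str = "") -> bool:
    """RBAC is additive: a request is allowed if any bound rule matches; there are no deny rules."""
    return any(rule_allows(r, verb, resource, api_group) for r in rules)


pod_reader = [PolicyRule(verbs=["get", "list", "watch"], resources=["pods"])]
print(can_i(pod_reader, "list", "pods"))                # True
print(can_i(pod_reader, "delete", "pods"))              # False
print(can_i(pod_reader, "get", "deployments", "apps"))  # False
```

Because there are no deny rules, removing access always means removing a binding, never adding a rule.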
Harden service account usage by setting automountServiceAccountToken: false on each namespace's default ServiceAccount, creating dedicated ServiceAccounts per workload, and auditing ClusterRoleBindings regularly with kubectl auth can-i --list --as=system:serviceaccount:<ns>:<sa>.
Scheduler Internals: Filtering, Scoring, and Custom Schedulers
The Kubernetes scheduler is a control loop that watches for Pods with an empty nodeName and assigns them to nodes. It does not actually run Pods — it only sets the nodeName field, and the kubelet on that node picks up the Pod. The scheduler's decision process has two phases: Filtering (formerly Predicates) and Scoring (formerly Priorities).
- NodeResourcesFit: Checks if the node has enough CPU/memory for the Pod's requests.
- NodeAffinity: Matches nodeSelector and nodeAffinity rules.
- TaintToleration: Ensures the Pod tolerates all taints on the node.
- PodTopologySpread: Enforces topology spread constraints (zone, hostname).
- VolumeBinding: Ensures required PVs can be bound on the node.
- ImageLocality (scoring only): Prefers nodes that already have the container image cached.
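The two phases can be sketched as "filter, then pick the highest score". The checks below are loosely modeled on NodeResourcesFit, TaintToleration and ImageLocality, but the scoring weights are made up; the real scheduler normalizes each plugin's score to 0-100 and combines them with configurable weights. Node names, the image tag and the taint string are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int   # allocatable CPU minus what is already requested, in millicores
    free_mem_mi: int  # same for memory, in MiB
    taints: set = field(default_factory=set)
    images: set = field(default_factory=set)


@dataclass
class Pod:
    cpu_m: int
    mem_mi: int
    image: str
    tolerations: set = field(default_factory=set)


def feasible(pod: Pod, node: Node) -> bool:
    """Filtering: a resource-fit check plus a taint/toleration check."""
    fits = node.free_cpu_m >= pod.cpu_m and node.free_mem_mi >= pod.mem_mi
    return fits and node.taints <= pod.tolerations  # every taint must be tolerated


def score(pod: Pod, node: Node) -> float:
    """Scoring with made-up weights: leftover CPU in cores, plus a bonus if the image is cached."""
    leftover_cores = (node.free_cpu_m - pod.cpu_m) / 1000
    image_bonus = 2.0 if pod.image in node.images else 0.0
    return leftover_cores + image_bonus


def schedule(pod: Pod, nodes: list) -> Optional[str]:
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None  # no feasible node: the Pod stays Pending
    return max(candidates, key=lambda n: score(pod, n)).name


nodes = [
    Node("node-a", free_cpu_m=4000, free_mem_mi=8192),
    Node("node-b", free_cpu_m=3000, free_mem_mi=8192, images={"web:1.2"}),
    Node("node-gpu", free_cpu_m=8000, free_mem_mi=16384, taints={"gpu=true:NoSchedule"}),
]
print(schedule(Pod(cpu_m=500, mem_mi=256, image="web:1.2"), nodes))   # node-b
print(schedule(Pod(cpu_m=9000, mem_mi=256, image="web:1.2"), nodes))  # None
```

The tainted node is never scored even though it has the most free CPU, which is exactly how filtering differs from scoring.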
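If replicas pile onto the same node or zone, use topologySpreadConstraints or podAntiAffinity to force spread. Also, the scheduler's percentageOfNodesToScore setting (in KubeSchedulerConfiguration) makes it stop searching once it has found that share of feasible nodes, and only those are scored. The default is adaptive (about 50% at 100 nodes, 10% at 5,000 nodes, never below 5%), and clusters with fewer than 100 nodes are always fully evaluated, so small clusters need no tuning; raise it only when placement quality in a large cluster matters more than scheduling throughput.
Probes Deep Dive: Liveness, Readiness, and Startup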
Probes are the kubelet's mechanism for monitoring container health. Misconfigured probes are one of the most common causes of production incidents: liveness probes that kill healthy-but-slow containers, readiness probes that flap during cache warm-up, and missing startup probes that cause crash loops on legacy applications.
- Startup probe: Only runs during boot. Gates liveness/readiness.
- Liveness probe: Runs continuously. Failure = container restart.
- Readiness probe: Runs continuously. Failure = remove from Service endpoints.
- Probe types: httpGet, tcpSocket, exec (command).
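- timeoutSeconds: Defaults to 1 second. A probe that does not answer within it counts as a failure, so a tight timeout on a liveness probe can restart a container that is merely slow. Keep it comfortably below periodSeconds.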
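Probe failures only matter once failureThreshold consecutive failures are reached (default 3, with periodSeconds defaulting to 10). A small replay sketch makes this visible; it assumes, for simplicity, that the failure count resets after each restart, and the probe history is hypothetical.

```python
def restart_points(results: list, failure_threshold: int = 3) -> list:
    """Replay liveness probe results (True = pass) and return the indexes at which
    failure_threshold consecutive failures are reached, i.e. where the kubelet
    would restart the container. Assumes the count resets after each restart."""
    triggers, consecutive = [], 0
    for i, ok in enumerate(results):
        consecutive = 0 if ok else consecutive + 1
        if consecutive == failure_threshold:
            triggers.append(i)
            consecutive = 0
    return triggers


def startup_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Roughly how long a startup probe gives a slow app to boot before the kubelet
    restarts it (ignoring initialDelaySeconds and probe timeouts)."""
    return failure_threshold * period_seconds


# Two failures, a pass, then three failures: only the second run triggers a restart.
print(restart_points([True, False, False, True, False, False, False]))  # [6]
print(startup_budget_seconds(failure_threshold=30, period_seconds=10))  # 300
```

Sizing a startup probe this way (a generous failureThreshold times periodSeconds) is what keeps slow legacy applications out of crash loops without loosening the liveness probe.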
ConfigMaps and Secrets: The 'We Pushed Creds to Git' Interview Question
Most juniors can recite what ConfigMaps and Secrets are. The interview question isn't about definitions—it's about whether you've been on call at 3 AM because someone base64-encoded a production database password and committed it.
ConfigMaps store non-sensitive configuration (environment variables, config files). Secrets store sensitive data, but the name is not a guarantee: by default, Secret data is stored unencrypted in etcd unless you enable encryption at rest, ideally with a KMS provider. The base64 encoding is not encryption; it's obfuscation.
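To show why base64 is not protection, here is a minimal Python round trip. The password is made up; the point is that decoding needs no key at all.

```python
import base64

# A Secret's data field holds base64 text; this password is made up.
encoded = base64.b64encode(b"s3cr3t-db-pass").decode("ascii")
print(encoded)

# No key is needed to reverse it: anyone who can read the manifest or the Secret recovers it.
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)  # s3cr3t-db-pass
```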
The real gotcha: Secrets are exposed to containers as mounted files or env vars. Anyone who can exec into a running pod can read them, and env vars are also visible in /proc/<pid>/environ. Don't treat Secrets as bulletproof. For production, use external secret stores (Vault, AWS Secrets Manager) with CSI drivers. Interviewers want to hear you've thought about the attack surface, not just the API object.
Namespaces: The 'Why Is My Pod Missing?' Trap
Namespaces are how you isolate resources in a cluster—think virtual clusters inside a physical one. The question isn't 'What is a namespace?' It's 'What happens when you forget to specify one?'
Every kubectl command targets the default namespace unless you pass -n or change your context. That's fine for dev, but in production you'll have namespaces for teams, environments, or feature flags. The failure mode: you kubectl get pods after a deployment, see nothing, and panic. Then you realize you're in default and the pod is in production.
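The precedence behind the "missing pod" trap can be written as a one-line rule. This is a sketch of kubectl's behavior outside a cluster, not its source: an explicit -n/--namespace wins, then the current kubeconfig context's namespace (set with kubectl config set-context --current --namespace=<ns>), then "default". The namespace names are hypothetical.

```python
from typing import Optional


def effective_namespace(flag_ns: Optional[str], context_ns: Optional[str]) -> str:
    """-n/--namespace first, then the current context's namespace, then "default"."""
    return flag_ns or context_ns or "default"


print(effective_namespace(None, None))               # default  (the "missing pod" trap)
print(effective_namespace(None, "production"))       # production
print(effective_namespace("staging", "production"))  # staging
```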
Namespaces provide scope for resource quotas, network policies, and RBAC bindings. They don't isolate network traffic by default—you need NetworkPolicies for that. They also don't create a security boundary; a compromised pod in one namespace can still reach another namespace's service unless you restrict egress.
The interview hot take: Namespaces are organizational, not security. Use them to avoid naming collisions and enforce resource limits, but don't rely on them for isolation without NetworkPolicies.
Use explicit -n flags on all kubectl commands.
Persistent Volumes & Claims: Storage That Survives Pod Death
Containers are ephemeral; their filesystems vanish with the pod. Persistent Volumes (PVs) are cluster-wide storage resources provisioned by an admin or dynamically via a StorageClass. A Persistent Volume Claim (PVC) is a request for storage by a user, specifying size and access mode (ReadWriteOnce, ReadOnlyMany, ReadWriteMany). Kubernetes binds a PVC to a matching PV, and pods then reference the PVC as a volume. Why this matters in interviews: they test whether you understand that PVs and PVCs decouple storage consumption from provisioning. A common trap: a ReadWriteOnce volume can be attached to only one node, so if two pods sharing the PVC land on different nodes, the second one cannot start (typically it hangs in ContainerCreating with a Multi-Attach error). Always match access modes to your workload's concurrency needs.
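The access-mode trap reduces to a check on how many distinct nodes the pods sharing a claim run on. This simplified sketch assumes the storage backend actually supports the requested mode and ignores the newer single-pod mode; the node names are hypothetical.

```python
def can_share(access_mode: str, pod_nodes: list) -> bool:
    """Can pods running on these nodes all mount the same PVC at once?
    pod_nodes holds the node name of each pod using the claim."""
    if access_mode == "ReadWriteOnce":
        # Attachable to a single node; several pods on that same node may share it.
        return len(set(pod_nodes)) <= 1
    if access_mode in ("ReadOnlyMany", "ReadWriteMany"):
        # Many nodes (read-only in the ReadOnlyMany case).
        return True
    raise ValueError(f"unknown access mode: {access_mode}")


print(can_share("ReadWriteOnce", ["node-a", "node-a"]))  # True
print(can_share("ReadWriteOnce", ["node-a", "node-b"]))  # False: the second pod cannot start
print(can_share("ReadWriteMany", ["node-a", "node-b"]))  # True
```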
Rolling Updates & Rollbacks: Zero-Downtime Deployments
A Deployment manages ReplicaSets, which create pods. When you change the pod template (image, env, etc.), Kubernetes performs a rolling update: it creates a new ReplicaSet and scales it up while scaling the old one down, keeping a configurable number of pods available. The maxSurge and maxUnavailable fields (both 25% by default) control the pace. If the update fails (e.g., ImagePullBackOff), you roll back with kubectl rollout undo. The gotcha interviewers probe is revision history: .spec.revisionHistoryLimit keeps 10 old ReplicaSets by default, and you cannot roll back past that limit because older ReplicaSets are garbage-collected. Always test rollbacks in staging; container start failures often look healthy until traffic hits.
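The pace of a rolling update follows from how the percentages are resolved: maxSurge rounds up and maxUnavailable rounds down. A short sketch of that arithmetic, showing the bounds the Deployment controller works within:

```python
import math


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%") -> tuple:
    """Return (max total pods, min available pods) during a rolling update.
    Percentages: maxSurge rounds up, maxUnavailable rounds down."""
    def resolve(value, round_up: bool) -> int:
        if isinstance(value, str) and value.endswith("%"):
            raw = replicas * int(value[:-1]) / 100
            return math.ceil(raw) if round_up else math.floor(raw)
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas + surge, replicas - unavailable


print(rollout_bounds(10))                                 # (13, 8)
print(rollout_bounds(3))                                  # (4, 3): 25% of 3 rounds down to 0 unavailable
print(rollout_bounds(4, max_surge=1, max_unavailable=0))  # (5, 4)
```

With small replica counts the defaults quietly become "surge one, never go below full capacity", which is why a rollout on a full cluster can stall waiting for room for the surge pod.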
Pod Disruption Budgets: Surviving Node Failures Gracefully
Voluntary disruptions (node drains, cluster upgrades) can kill pods. Without controls, you can lose every replica at once. A PodDisruptionBudget (PDB) specifies the minimum available (minAvailable) or maximum unavailable (maxUnavailable) pods for a set of labels. When a node is drained, the eviction API checks PDBs: if removing a pod would violate the budget, the eviction is refused and the drain keeps retrying. Why this matters: interviewers test whether you separate voluntary from involuntary disruptions (node crashes). Involuntary disruptions ignore PDBs, so you also need multiple replicas spread across nodes via anti-affinity or topology spread constraints. Common mistake: setting minAvailable too high (for example, equal to the replica count) blocks drains entirely. Aim to allow 1-2 unavailable pods per service, especially for stateful workloads.
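Whether a drain can make progress comes down to a PDB's status.disruptionsAllowed, which is the number of currently healthy pods minus the number that must stay healthy. A minimal sketch for integer budgets only (percentage budgets are left out):

```python
def disruptions_allowed(current_healthy: int, expected_pods: int,
                        min_available=None, max_unavailable=None) -> int:
    """Approximate a PDB's status.disruptionsAllowed for integer budgets."""
    if min_available is not None:
        desired_healthy = min_available
    elif max_unavailable is not None:
        desired_healthy = expected_pods - max_unavailable
    else:
        raise ValueError("set min_available or max_unavailable")
    return max(0, current_healthy - desired_healthy)


print(disruptions_allowed(3, 3, min_available=3))    # 0: every eviction is refused, the drain stalls
print(disruptions_allowed(3, 3, max_unavailable=1))  # 1: one pod may be evicted at a time
print(disruptions_allowed(2, 3, max_unavailable=1))  # 0: a pod is already down
```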
Namespace Stuck in Terminating: Finalizer Blocking Cluster Decommission
The namespace contained Services of type LoadBalancer carrying a finalizer (service.kubernetes.io/load-balancer-cleanup). The cloud controller manager (CCM) was responsible for removing this finalizer after deleting the cloud load balancer. However, the CCM had been redeployed with a new service account that lacked IAM permissions to delete load balancers. The CCM silently failed to remove the finalizer, and Kubernetes could not complete namespace deletion because finalizers were still present on resources within the namespace.
The resolution:
1. Removed the finalizer from each stuck Service with kubectl patch service <name> -p '{"metadata":{"finalizers":null}}'.
2. Verified the cloud load balancer was already deleted (no orphaned resources).
3. Namespace deletion completed immediately after finalizer removal.
4. Added monitoring for namespaces in Terminating state for more than 5 minutes.
- Finalizers block deletion until the responsible controller acknowledges cleanup. If the controller is broken, deletion hangs indefinitely.
- Never manually delete cloud resources (load balancers, volumes) without ensuring the controller can reconcile. Orphaned resources cost money.
- Monitor for resources stuck in Terminating state. It almost always signals a broken controller or missing permissions.
- When debugging Terminating hangs, check kubectl get <resource> -o json | jq .metadata.finalizers to identify which controller is blocking.
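The same check can be scripted when you need to scan a whole list of objects. This sketch parses the JSON List that kubectl get <resource> -n <ns> -o json prints and keeps only objects with a non-empty metadata.finalizers; unlike jq, it treats an empty finalizer list as "no finalizers". The Service names in the sample are hypothetical.

```python
import json


def blocking_finalizers(list_json: str) -> list:
    """Return kind, name and finalizers for every object in a kubectl JSON List
    that still carries metadata.finalizers."""
    items = json.loads(list_json).get("items", [])
    return [
        {
            "kind": item.get("kind"),
            "name": item["metadata"]["name"],
            "finalizers": item["metadata"]["finalizers"],
        }
        for item in items
        if item.get("metadata", {}).get("finalizers")
    ]


sample = json.dumps({
    "kind": "List",
    "items": [
        {"kind": "Service", "metadata": {"name": "web-lb",
         "finalizers": ["service.kubernetes.io/load-balancer-cleanup"]}},
        {"kind": "Service", "metadata": {"name": "internal-api"}},
    ],
})
print(blocking_finalizers(sample))
# [{'kind': 'Service', 'name': 'web-lb', 'finalizers': ['service.kubernetes.io/load-balancer-cleanup']}]
```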
Namespace stuck in Terminating: list every namespaced object that still carries finalizers with kubectl api-resources --verbs=list --namespaced=true -o name | xargs -n 1 kubectl get -n <ns> --ignore-not-found -o json | jq '.items[] | select(.metadata.finalizers) | {kind: .kind, name: .metadata.name, finalizers: .metadata.finalizers}'. Patch or investigate each blocking resource.
Service traffic issues: check externalTrafficPolicy; if it is set to Local, traffic only routes to nodes with local pods. Check the kube-proxy mode and logs.
Rollout stuck: run kubectl rollout status deployment/<name>. If maxUnavailable is 0 and a new pod cannot be scheduled, the rollout stalls indefinitely (progressDeadlineSeconds only marks it as failed; it does not roll back). Check ResourceQuota limits and node capacity. PDBs do not gate Deployment rollouts; they only apply to evictions.
etcd unhealthy: run etcdctl endpoint health --cluster. Check disk latency on etcd nodes (iostat -x 1); high fsync latency causes Raft timeouts. Check network connectivity between etcd members.
Pod not starting: kubectl describe pod <pod> | grep -A 20 Events
Node capacity: kubectl describe nodes | grep -A 5 Allocatable -B 2