etcd Disk Latency — How 800ms Killed the Kubernetes Cluster
- Kubernetes is a declarative container orchestration platform that continuously reconciles observed state with desired state.
- Control plane: kube-apiserver, etcd, kube-scheduler, kube-controller-manager — each has a distinct role and failure mode.
- etcd is the single source of truth — its disk latency is the cluster's performance ceiling.
- The scheduler filters then scores nodes; it does NOT rebalance or predict load.
- kubelet on each node runs the actual containers and reports status back to the API server.
- Most production outages trace back to etcd misconfiguration, not application code.
Kubernetes Triage Cheat Sheet
Pod not starting: no events visible.
kubectl get pods -n kube-system | grep scheduler
kubectl describe nodes | grep -A 5 'Allocated resources'

Service returns 502/503 intermittently.
kubectl get endpoints <service-name>
kubectl get pods -l app=<selector> -o wide

Node marked NotReady: Pods being evicted.
kubectl describe node <node-name> | grep -A 10 Conditions
systemctl status kubelet

PersistentVolumeClaim stuck in Pending.
kubectl get pv
kubectl describe pvc <pvc-name>

Pod evicted due to node pressure.
kubectl describe node <node-name> | grep -i pressure
kubectl top node <node-name>

Production Incident
etcdctl defrag).
4. Configure etcd auto-compaction (--auto-compaction-retention=8) to prevent unbounded data growth.
5. Monitor etcd member health with etcdctl endpoint health and etcdctl endpoint status.
Production Debug Guide
Symptom-driven investigation paths for the most common failure modes.
1. kubectl describe pod <name> and read the Events section.
2. Common causes: insufficient CPU/memory on any node (check kubectl describe nodes for Allocatable vs Allocated), PersistentVolumeClaim not bound, node affinity/taint mismatches, resource quotas exceeded.
3. If no events appear, the scheduler may be down: check kubectl get pods -n kube-system for kube-scheduler.

1. kubectl logs <pod> --previous to see the logs from the crashed container (current logs may be empty).
2. Common causes: missing environment variables, failed health checks, OOMKill (check kubectl describe pod for Last State), misconfigured entrypoint.
3. If OOMKill, increase memory limits or fix the memory leak. Check kubectl get pod <name> -o jsonpath='{.status.containerStatuses[0].lastState}'.

1. kubectl get pods -n kube-system | grep calico (or flannel/weave).
2. Check if Pod CIDR ranges overlap between nodes: kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'.
3. Verify kube-proxy is running: kubectl get pods -n kube-system | grep kube-proxy.
4. Test from within a Pod: kubectl exec -it <pod> -- curl <service-ip>:<port>.

1. kubectl describe rs <new-rs-name>.
2. Look for Pods that are Pending or CrashLoopBackOff.
3. Check if the new image exists in the registry and if imagePullSecrets are configured.
4. If using rolling update with maxUnavailable=0 and the cluster has no spare capacity, new Pods cannot be scheduled.
5. Rollback: kubectl rollout undo deployment/<name>.

1. etcdctl endpoint health --write-out=table.
2. Check disk I/O on etcd nodes: iostat -x 1.
3. Check etcd database size: etcdctl endpoint status --write-out=table.
4. If disk is the bottleneck, migrate to local SSDs.
5. If database is large, run defragmentation: etcdctl defrag.

1. kubectl get pv.
2. Describe the PVC: kubectl describe pvc <name>. Common causes: no PV available with matching accessModes and storageClassName, or the StorageClass has no provisioner.
3. If using dynamic provisioning, verify the storage provisioner pod is running and hasn't hit a quota or permission error.

Kubernetes is not a deployment tool. It is a distributed state reconciliation engine. Every component, from the scheduler to the kubelet, operates on the same principle: watch the desired state (stored in etcd and served through the API server), compare it with observed state, and act to close the gap. This is the mental model that unlocks real debugging capability.
The control plane is the brain. etcd is the memory. The kubelet is the muscle on each node. The scheduler decides placement. When any of these components degrades, the symptoms are often misleading — a Pod stuck in Pending looks like a scheduling problem but is frequently an etcd latency issue or a resource quota misconfiguration.
The common misconception is that Kubernetes 'runs containers.' It does not. Kubernetes manages the desired state of workloads. The container runtime (containerd, CRI-O) runs containers. Kubernetes tells the runtime what to run, monitors whether it is running, and corrects deviations. This distinction matters when debugging crashes, image pull failures, and networking issues.
Control Plane Architecture: The Brain of the Cluster
The Kubernetes control plane consists of four components that work together to maintain cluster state. Understanding each component's role — and its failure modes — is essential for production operations.
kube-apiserver is the front door. Every kubectl command, every controller reconciliation, every kubelet status report goes through the API server. It validates requests, persists state to etcd, and serves as the watch endpoint for all controllers. It is stateless — you can run multiple replicas behind a load balancer for HA.
etcd is the single source of truth. It is a distributed, consistent key-value store built on the Raft consensus protocol. All cluster state — Pod definitions, ConfigMaps, Secrets, node registrations — lives in etcd. If etcd loses quorum, the cluster cannot make any state changes. etcd is the most critical component and the most commonly under-provisioned.
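The quorum rule behind that statement is plain arithmetic: a Raft cluster of n members needs a majority, floor(n/2) + 1, to commit a write. The following is a minimal Python sketch of that arithmetic only; the function names are illustrative and are not part of etcd or etcdctl.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a Raft majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can be lost while the cluster still accepts writes."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members: quorum={quorum_size(n)}, "
              f"tolerates {failure_tolerance(n)} failure(s)")
```

Note that an even member count buys nothing: a 4-member cluster tolerates one failure, the same as a 3-member cluster, which is why odd sizes are the usual choice.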
kube-scheduler watches for unscheduled Pods and assigns them to nodes. It does not run Pods — it only writes the nodeName field. The kubelet on the assigned node then pulls the image and starts the container. The scheduler uses a two-phase process: filtering (eliminate infeasible nodes) and scoring (rank feasible nodes, pick the highest score).
kube-controller-manager runs the control loops. Each controller watches a specific resource type and reconciles actual state with desired state. The Deployment controller ensures the right number of replicas exist. The Node controller detects when nodes go unhealthy. The Endpoint controller updates Service endpoints as Pods come and go.
# Control Plane Health Check: run this to verify all components are healthy
# Save as check-control-plane.sh

# 1. API Server health (returns 200 if healthy)
curl -k https://localhost:6443/healthz
# Expected: "ok"

# 2. etcd cluster health
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
# Expected: "is healthy"

# 3. etcd cluster member status
ETCDCTL_API=3 etcdctl endpoint status \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  --write-out=table
# Shows: ID, Status, Version, DB Size, Raft Term, Raft Index

# 4. Scheduler and Controller-Manager leader election
kubectl get endpoints kube-scheduler -n kube-system -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'
kubectl get endpoints kube-controller-manager -n kube-system -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'
# Recent Kubernetes versions record the leader in Lease objects instead.
# If the annotation above is empty, try:
#   kubectl get lease kube-scheduler kube-controller-manager -n kube-system

# 5. All control plane components running
kubectl get pods -n kube-system -o wide
127.0.0.1:2379 is healthy: successfully committed proposal: took = 2.145ms
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://127.0.0.1:2379 | 8e9e05c52164694d | 3.5.9 | 25 MB | true | false | 4 | 18234 | 18234 | |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
{"holderIdentity":"master-1_xxxxx","leaseDurationSeconds":15,"acquireTime":"2026-03-01T10:00:00Z","renewTime":"2026-04-07T14:30:00Z","leaderTransitions":3}
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system coredns-5d78c9869d-abc12 1/1 Running 0 30d 10.244.0.5 master-1
kube-system etcd-master-1 1/1 Running 0 30d 192.168.1.10 master-1
kube-system kube-apiserver-master-1 1/1 Running 0 30d 192.168.1.10 master-1
kube-system kube-controller-manager-master-1 1/1 Running 0 30d 192.168.1.10 master-1
kube-system kube-proxy-xyz78 1/1 Running 0 30d 192.168.1.10 master-1
kube-system kube-scheduler-master-1 1/1 Running 0 30d 192.168.1.10 master-1
- The API server is the only component that talks to etcd. All other components go through the API server.
- Controllers are level-triggered, not edge-triggered. They care about the current state, not the event that caused it.
- This is why Kubernetes is self-healing. It does not remember what happened — it only checks what is true right now.
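The level-triggered idea in the points above can be sketched in a few lines of Python. This is a toy model, not the real controller code: the Cluster class, its methods, and the Pod naming are hypothetical stand-ins for observed state. The point it illustrates is that reconcile looks only at the current difference between desired and observed state, never at an event history.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for observed state: a list of running Pod names."""
    pods: list = field(default_factory=list)
    _counter: int = 0

    def start_pod(self) -> None:
        self._counter += 1
        self.pods.append(f"pod-{self._counter}")

    def stop_pod(self) -> None:
        self.pods.pop()


def reconcile(desired_replicas: int, cluster: Cluster) -> int:
    """Compare desired vs observed state and act only on the difference.

    Returns the number of actions taken. No record of past events is used.
    """
    diff = desired_replicas - len(cluster.pods)
    for _ in range(diff):       # too few Pods: start the missing ones
        cluster.start_pod()
    for _ in range(-diff):      # too many Pods: stop the surplus
        cluster.stop_pod()
    return abs(diff)


if __name__ == "__main__":
    cluster = Cluster()
    print(reconcile(3, cluster), cluster.pods)
    cluster.pods.remove("pod-2")            # simulate a crash nobody reported
    print(reconcile(3, cluster), cluster.pods)
```

Because the loop never asks why a Pod is missing, it recovers from a crash, a manual deletion, or a missed event in exactly the same way.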
The Scheduler: How Kubernetes Decides Where Pods Run
The kube-scheduler is the component that assigns Pods to nodes. It does not run Pods — it only writes the spec.nodeName field on the Pod object. The kubelet on the assigned node then pulls the image and starts the container.
The scheduler uses a two-phase process:
Filtering (Feasibility): Eliminate nodes that cannot run the Pod. Filter reasons include: insufficient CPU/memory, node taints the Pod cannot tolerate, node affinity mismatches, volume zone constraints, and Pod topology spread constraints. After filtering, if zero nodes remain, the Pod stays in Pending.
Scoring (Ranking): Rank the feasible nodes by a set of scoring plugins. Default scoring includes: NodeResourcesBalancedAllocation (prefer nodes with balanced CPU/memory usage), ImageLocality (prefer nodes that already have the container image), InterPodAffinity (prefer nodes where preferred affinity rules are satisfied), and TaintToleration (prefer nodes with fewer PreferNoSchedule taints that the Pod does not tolerate). Each plugin's score is multiplied by its weight, and the node with the highest weighted total wins.
The scheduler makes decisions based on the state of the cluster at scheduling time. It does not predict future load. It does not rebalance existing Pods. Once a Pod is scheduled, only explicit actions (eviction, deletion, preemption) can move it.
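The two phases can be sketched as a filter followed by a max over scores. This Python sketch is a deliberately simplified model: the Node and Pod classes, the taint representation as plain strings, and the scoring weights are all assumptions made for illustration and do not reproduce the real scheduler plugins.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int                      # unreserved CPU in millicores
    free_mem_mi: int                     # unreserved memory in MiB
    taints: set = field(default_factory=set)
    images: set = field(default_factory=set)


@dataclass
class Pod:
    cpu_m: int
    mem_mi: int
    image: str
    tolerations: set = field(default_factory=set)


def feasible(pod: Pod, node: Node) -> bool:
    """Filtering phase: hard yes/no checks."""
    fits = pod.cpu_m <= node.free_cpu_m and pod.mem_mi <= node.free_mem_mi
    tolerated = node.taints <= pod.tolerations   # every taint must be tolerated
    return fits and tolerated


def score(pod: Pod, node: Node) -> int:
    """Scoring phase: toy weights (CPU headroom plus an image-cache bonus)."""
    headroom = (node.free_cpu_m - pod.cpu_m) // 100
    locality = 20 if pod.image in node.images else 0
    return headroom + locality


def schedule(pod: Pod, nodes: list) -> Optional[str]:
    """Return the chosen node name, or None if the Pod would stay Pending."""
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(pod, n)).name
```

The decision uses only the snapshot passed in. Nothing in schedule revisits earlier placements, which mirrors the point that the scheduler neither predicts load nor rebalances.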
# Example: Pod with scheduling constraints
# This Pod will ONLY be scheduled on nodes with the label 'disktype=ssd'
# and will prefer nodes in zone 'us-east-1a'
apiVersion: v1
kind: Pod
metadata:
  name: io-thecodeforge-payment-service
  namespace: production
spec:
  # Hard requirement: node MUST have this label
  nodeSelector:
    disktype: ssd
  # Soft preference: scheduler tries to place here, but can choose elsewhere
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a
    # Pod affinity: prefer to run near other payment-service Pods
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: payment-service
            topologyKey: kubernetes.io/hostname
  # Tolerations: allow scheduling on nodes with the 'dedicated=high-cpu' taint
  tolerations:
    - key: dedicated
      operator: Equal
      value: high-cpu
      effect: NoSchedule
  # Topology spread: distribute replicas evenly across zones
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: payment-service
  containers:
    - name: payment-service
      image: registry.thecodeforge.io/payment-service:v2.4.1
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "1000m"
          memory: "1Gi"
# Verify scheduling decision
kubectl describe pod io-thecodeforge-payment-service -n production | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5s default-scheduler Successfully assigned production/io-thecodeforge-payment-service to node-3
- Guaranteed QoS (requests=limits): Pod is last to be evicted under resource pressure.
- Burstable QoS (requests < limits): Pod can burst but is evicted before Guaranteed Pods.
- BestEffort QoS (no requests, no limits): First to be evicted. Never use in production.
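The three classes above follow mechanically from the requests and limits on each container. A minimal Python sketch of that classification, under simplifying assumptions: containers are plain dicts, only cpu and memory are considered, and quantities are compared as literal strings (so "500m" and "0.5" would not be treated as equal here, although Kubernetes treats them as the same quantity).

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a Pod from its containers' resource settings (simplified)."""
    any_set = False
    all_guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request falls back to the limit for that resource.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


if __name__ == "__main__":
    payment = [{
        "requests": {"cpu": "500m", "memory": "512Mi"},
        "limits": {"cpu": "1000m", "memory": "1Gi"},
    }]
    print(qos_class(payment))
```

A container with requests below its limits, such as 500m/512Mi requested against 1000m/1Gi allowed, lands in Burstable under this rule.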
nodeSelector or nodeAffinity required mode. Hard constraint: Pod stays Pending if no node matches.
nodeAffinity preferred mode. Soft constraint: scheduler tries to match but places elsewhere if needed.
topologySpreadConstraints. More flexible and performant than pod anti-affinity.
podAffinity (co-locate) or podAntiAffinity (spread). At scale prefer topologySpreadConstraints.
Add tolerations to the Pod spec. Without a matching toleration, the Pod won't schedule on the tainted node.

Pod Networking: How Containers Talk to Each Other
Kubernetes networking has three fundamental requirements, enforced by the CNI (Container Network Interface) plugin:
- Every Pod gets its own IP address, unique across the cluster.
- Pods on any node can communicate with Pods on any other node without NAT.
- Agents on a node (kubelet, system daemons) can communicate with all Pods on that node.
These requirements are simple to state but complex to implement. The CNI plugin (Calico, Cilium, Flannel, AWS VPC CNI) is responsible for wiring this up. It allocates IP addresses from the node's Pod CIDR range, sets up network interfaces inside the Pod's network namespace, and configures routing rules so Pods can reach each other across nodes.
kube-proxy handles Service networking. It watches the API server for Service and Endpoint objects, then programs iptables rules (or IPVS rules) on each node. When a Pod connects to a Service's ClusterIP, the kernel's iptables rules intercept the connection and DNAT it to one of the backend Pod IPs. This is why Service IPs are virtual — they do not exist on any network interface.
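In iptables mode, the per-Service chain is typically a sequence of rules, one per backend, each matched with a random probability. The Python sketch below works through why those probabilities are 1/n, 1/(n-1), and so on down to 1. It models only the arithmetic; the function names are illustrative, and the exact rules on a given node should be confirmed with iptables-save.

```python
from fractions import Fraction


def rule_probabilities(backends: int) -> list[Fraction]:
    """Per-rule match probability for a chain of `backends` DNAT rules.

    Rule i is evaluated only if rules 0..i-1 did not match, so it must match
    with probability 1/(rules remaining) to keep the overall choice uniform.
    """
    if backends < 1:
        raise ValueError("a Service needs at least one endpoint to route to")
    return [Fraction(1, backends - i) for i in range(backends)]


def effective_share(backends: int) -> list[Fraction]:
    """Overall chance that each backend receives a new connection."""
    shares = []
    not_yet_matched = Fraction(1)
    for p in rule_probabilities(backends):
        shares.append(not_yet_matched * p)
        not_yet_matched *= 1 - p
    return shares


if __name__ == "__main__":
    print(rule_probabilities(3))
    print(effective_share(3))
```

The last rule always matches, which is why a connection to a ClusterIP with at least one endpoint is never left without a backend.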
# Debugging Pod networking step by step

# 1. Verify Pod has an IP address
kubectl get pods -n production -o wide
# If Pod IP is <none>, the CNI plugin failed to assign an address

# 2. Check if the CNI plugin is healthy
kubectl get pods -n kube-system | grep -E 'calico|cilium|flannel|aws-node'

# 3. Verify Pod CIDR allocation per node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# Each node must have a unique, non-overlapping CIDR

# 4. Test Pod-to-Pod connectivity across nodes
kubectl exec -it pod-on-node-a -- ping <pod-ip-on-node-b>
# If this fails but intra-node works, the CNI cross-node routing is broken

# 5. Check Service endpoints
kubectl get endpoints payment-service -n production
# If endpoints are empty, no Pods match the Service's selector

# 6. Test Service DNS resolution
kubectl exec -it <pod> -- nslookup payment-service.production.svc.cluster.local
# If DNS fails, check CoreDNS pods: kubectl get pods -n kube-system | grep coredns

# 7. Inspect iptables rules for a Service
# (run on the node where your Pod is running)
iptables-save | grep <service-cluster-ip>
payment-service-7d8f9-abc12 1/1 Running 10.244.1.45 node-2
payment-service-7d8f9-def34 1/1 Running 10.244.2.78 node-3
NAME READY STATUS RESTARTS AGE
calico-node-abc12 1/1 Running 0 30d
calico-kube-controllers-5d78-def34 1/1 Running 0 30d
node-1 10.244.0.0/24
node-2 10.244.1.0/24
node-3 10.244.2.0/24
PING 10.244.2.78 (10.244.2.78): 56 data bytes
64 bytes from 10.244.2.78: seq=0 ttl=62 time=0.456 ms
NAME ENDPOINTS AGE
payment-service 10.244.1.45:8080,10.244.2.78:8080 15d
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: payment-service.production.svc.cluster.local
Address 1: 10.96.45.12 payment-service.production.svc.cluster.local
- Pod IP works but Service IP fails: kube-proxy or iptables issue.
- Service IP works but DNS fails: CoreDNS issue.
- DNS works but external access fails: Ingress controller or cloud LB issue.
Kubernetes Storage: PersistentVolumes, Claims, and StorageClasses
Kubernetes storage decouples the Pod lifecycle from the data lifecycle. A Pod can be deleted and recreated, but its data persists if it uses a PersistentVolume (PV) and PersistentVolumeClaim (PVC). This is critical for stateful workloads like databases.
PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically by a StorageClass. It is a cluster resource, like a node. PVs have a capacity and access mode (ReadWriteOnce, ReadOnlyMany, ReadWriteMany).
PersistentVolumeClaim (PVC) is a request for storage by a user. It specifies size and access mode. Kubernetes binds a PVC to a PV that meets the requirements. If no matching PV exists, the PVC remains Pending — unless a StorageClass with a dynamic provisioner is referenced.
StorageClass defines a class of storage. It specifies the provisioner (e.g., ebs.csi.aws.com), parameters (type, IOPS), and reclaim policy. When a PVC requests a StorageClass, the provisioner automatically creates a PV that satisfies the claim.
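The static binding step, where a claim is matched against pre-provisioned volumes, can be sketched as a filter plus a tie-break. This Python sketch is a simplified model: the PV and PVC classes are hypothetical, capacities are whole GiB integers, and the smallest-fit tie-break is an assumption of this example rather than a statement of the exact controller behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PV:
    name: str
    capacity_gi: int
    access_modes: frozenset
    storage_class: str
    bound: bool = False


@dataclass
class PVC:
    request_gi: int
    access_modes: frozenset
    storage_class: str


def find_match(claim: PVC, volumes: list) -> Optional[PV]:
    """Return a volume that satisfies the claim, or None (claim stays Pending)."""
    candidates = [
        pv for pv in volumes
        if not pv.bound
        and pv.storage_class == claim.storage_class
        and pv.capacity_gi >= claim.request_gi
        and claim.access_modes <= pv.access_modes   # all requested modes offered
    ]
    # Assumed tie-break: smallest volume that fits, to waste the least capacity.
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)
```

When find_match returns None and the claim names a StorageClass with a working provisioner, dynamic provisioning is what creates a new volume instead of leaving the claim Pending.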
# StorageClass for AWS gp3 volumes with 3000 IOPS
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: io-thecodeforge-fast
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# PVC that uses the StorageClass above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: io-thecodeforge-payment-db-pvc
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: io-thecodeforge-fast
---
# Pod using the PVC
apiVersion: v1
kind: Pod
metadata:
  name: io-thecodeforge-payment-db
  namespace: production
spec:
  containers:
    - name: postgres
      image: postgres:16
      env:
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: io-thecodeforge-payment-db-pvc
kubectl get sc
# NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE
# io-thecodeforge-fast ebs.csi.aws.com Delete WaitForFirstConsumer
kubectl get pvc -n production
# NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS
# io-thecodeforge-payment-db-pvc Bound pvc-abc123 100Gi RWO io-thecodeforge-fast
kubectl get pv
# NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM
# pvc-abc123 100Gi RWO Delete Bound production/io-thecodeforge-payment-db-pvc
If a PVC is deleted, what happens to the underlying PV depends on the persistentVolumeReclaimPolicy:
- Retain: PV remains but is in Released state; you must manually reclaim it.
- Delete: PV and underlying storage are deleted. This is the default for dynamic provisioners.
- Recycle: Deprecated. Attempts to scrub and re-use.
Production gotcha: If you delete a PVC with a Delete reclaim policy without first taking a snapshot, you lose all data. Always set Retain for critical databases, or use a backup solution.
volumeClaimTemplates to automatically generate unique PVCs per replica.
emptyDir volume. Data is lost when the Pod is deleted, which is expected.
ReadWriteOnce access mode.
ReadWriteMany access mode. Not all provisioners support it; consider NFS, EFS, or GlusterFS.
ReadWriteMany for databases.

Namespaces, Resource Quotas, and Multi-Tenancy
Namespaces are virtual clusters within a physical cluster. They provide isolation boundaries for resources, RBAC, and network policies. Every resource lives in a namespace — except cluster-scoped resources like Nodes and PersistentVolumes.
ResourceQuota limits aggregate resource consumption within a namespace. You can set quotas on CPU, memory, Pod count, PVC storage, and even the number of Services. Without quotas, a single misconfigured application can consume all cluster resources and starve others.
LimitRange sets default requests/limits and min/max constraints for Pods in a namespace. This prevents a Pod from requesting an absurd amount of resources or running without any limits.
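How the two objects interact at admission time can be sketched as two steps: fill in missing values from the LimitRange, then check the result against the quota. The Python sketch below is a simplified model with assumed conventions: CPU is in millicores, memory in MiB, and the dictionaries are illustrative shapes rather than the real API objects.

```python
def apply_defaults(container: dict, limit_range: dict) -> dict:
    """Fill in missing requests and limits the way LimitRange defaults would."""
    requests = dict(container.get("requests", {}))
    limits = dict(container.get("limits", {}))
    for resource in ("cpu", "memory"):
        had_limit = resource in limits
        if not had_limit:
            limits[resource] = limit_range["default"][resource]
        if resource not in requests:
            # An explicit limit with no request: the request follows the limit.
            requests[resource] = (
                limits[resource] if had_limit
                else limit_range["defaultRequest"][resource]
            )
    return {"requests": requests, "limits": limits}


def admit(pod_requests: dict, used: dict, hard: dict) -> bool:
    """Reject the Pod if any tracked resource would exceed its hard quota."""
    return all(
        used.get(resource, 0) + amount <= hard[resource]
        for resource, amount in pod_requests.items()
        if resource in hard
    )


if __name__ == "__main__":
    lr = {"default": {"cpu": 500, "memory": 512},
          "defaultRequest": {"cpu": 100, "memory": 128}}
    filled = apply_defaults({}, lr)
    print(filled)
    print(admit(filled["requests"], {"cpu": 3500, "memory": 7168},
                {"cpu": 10000, "memory": 20480}))
```

The ordering matters: defaults are applied first, so a Pod submitted with no resources at all still counts against the quota with its defaulted requests.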
Multi-tenancy with Namespaces is common: each team gets its own namespace, with RBAC restricting cross-namespace access. But true multi-tenancy (running untrusted workloads) requires additional isolation — consider virtual clusters (vClusters) or sandbox containers (gVisor, Kata Containers).
# ResourceQuota for a namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: io-thecodeforge-team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "10"
    requests.storage: 500Gi
    pods: "50"
    services: "10"
---
# LimitRange to enforce default resource boundaries
apiVersion: v1
kind: LimitRange
metadata:
  name: io-thecodeforge-default-limits
  namespace: team-a
spec:
  limits:
    - default:
        cpu: "500m"
        memory: 512Mi
      defaultRequest:
        cpu: "100m"
        memory: 128Mi
      max:
        cpu: "2"
        memory: 4Gi
      min:
        cpu: "50m"
        memory: 64Mi
      type: Container
kubectl describe resourcequota -n team-a
# Name: io-thecodeforge-team-quota
# Namespace: team-a
# Resource Used Hard
# -------- --- ---
# pods 12 50
# requests.cpu 3.5 10
# requests.memory 7Gi 20Gi
# limits.cpu 8 20
# limits.memory 18Gi 40Gi
# persistentvolumeclaims 2 10
# requests.storage 120Gi 500Gi
# services 4 10
kubectl describe limitrange -n team-a
# Limits:
# Type Resource Min Max Default Request Default Limit Max Limit/Request Ratio
# ---- -------- --- --- --------------- ------------- -----------------------
# Container cpu 50m 2 100m 500m -
# Container memory 64Mi 4Gi 128Mi 512Mi -
kubectl top and Prometheus alerts to catch quota exhaustion before it causes deployment failures.

| Component | Role | Failure Impact | Recovery |
|---|---|---|---|
| kube-apiserver | Validates and serves all API requests. Gateway to etcd. | No new deployments, scaling, or config changes. Existing Pods continue running. | Restart the process. If HA, load balancer routes to healthy replica. |
| etcd | Distributed key-value store. Single source of truth for all cluster state. | Cluster freezes: no state changes possible. If quorum is lost, writes are rejected until a majority of members is restored. | Restore from snapshot or replace failed member. Requires etcdctl expertise. |
| kube-scheduler | Assigns unscheduled Pods to nodes based on resource availability and constraints. | New Pods stuck in Pending. Existing Pods unaffected. | Restart the process. If leader election fails, check lease in etcd. |
| kube-controller-manager | Runs reconciliation loops for Deployments, ReplicaSets, Nodes, Endpoints, etc. | No self-healing. Crashed Pods not restarted. Scaling stops. Node failures not detected. | Restart the process. Controllers resume reconciliation from current state. |
| kubelet | Node agent. Pulls images, starts containers, reports node status to API server. | Pods on that node stop being managed. Node marked NotReady after 40s (default). Pods evicted after 5 minutes. | Restart kubelet. If node is unhealthy, cordoning and replacing the node may be necessary. |
| kube-proxy | Programs iptables/IPVS rules for Service load balancing on each node. | Rules already programmed on that node keep working but go stale: new Services and endpoint changes are not applied there, so traffic can be sent to Pods that no longer exist. Other nodes are unaffected. | Restart the process. Rules are rebuilt from current Service/Endpoint state. |
| CoreDNS | Cluster DNS. Resolves Service names to ClusterIPs. | Service DNS resolution fails. Pods can still reach other Pods by direct IP. | Restart CoreDNS Pods. Check ConfigMap for misconfiguration. |
🎯 Key Takeaways
- The reconciliation loop is the fundamental operating principle of every Kubernetes controller. Understanding it transforms debugging from trial-and-error to systematic investigation.
- etcd is the single point of truth and the most common root cause of cluster-wide issues. Its disk latency is the cluster's ceiling.
- The scheduler scores nodes — it does not bin-pack, predict load, or rebalance. Scheduling decisions are permanent until the Pod is explicitly moved.
- Kubernetes networking is layered (CNI, kube-proxy, Ingress). Debug from the bottom up: Pod IP, ClusterIP, DNS, Ingress.
- Resource requests drive scheduling; resource limits drive runtime enforcement. Setting requests=limits (Guaranteed QoS) gives the most predictable behavior.
- Storage is decoupled from Pod lifecycle via PVs and PVCs. The reclaim policy determines whether data survives PVC deletion; set it to Retain for irreplaceable data.
- Namespaces provide isolated environments, but true security requires RBAC, NetworkPolicies, and ResourceQuotas. Without quotas, a single app can starve the cluster.
Interview Questions on This Topic
- QExplain the Kubernetes reconciliation loop. How does it apply to a Deployment managing a ReplicaSet managing Pods?Mid-levelReveal
- QWhat happens when you delete a Pod that belongs to a Deployment? Trace the full sequence of events through every controller involved.SeniorReveal
- QHow does the kube-scheduler decide which node to place a Pod on? What are the two phases, and what plugins participate in each?SeniorReveal
- QWhat is the difference between a Service's ClusterIP and the Pod IPs it routes to? How does kube-proxy implement this?Mid-levelReveal
- QA Pod is stuck in Pending. Walk me through your debugging process, from the first command you would run to identifying the root cause.Mid-levelReveal
- QExplain etcd's role in the cluster. What happens if etcd loses quorum? How would you recover?SeniorReveal
- QWhat is the difference between requests and limits, and how do they affect scheduling vs runtime behavior?Mid-levelReveal
- QHow would you design a zero-downtime deployment strategy using Kubernetes primitives (Deployments, PDBs, health checks)?SeniorReveal
Frequently Asked Questions
What is Kubernetes in simple terms?
Kubernetes is a declarative container orchestration platform: you describe the state you want, and its controllers continuously reconcile the observed state of the cluster with that desired state.
What is the difference between a Deployment, a ReplicaSet, and a Pod?
A Pod is the smallest unit — one or more containers sharing a network namespace. A ReplicaSet ensures a specified number of Pod replicas are running at all times. A Deployment manages ReplicaSets and provides declarative updates (rolling updates, rollbacks). The hierarchy is: Deployment -> ReplicaSet -> Pod. You almost never create ReplicaSets or Pods directly — you create Deployments, and the Deployment controller creates the ReplicaSet, which creates the Pods.
What happens if the control plane node goes down?
Existing Pods on worker nodes continue running — the kubelet on each node operates independently of the control plane for running workloads. However, you cannot deploy new workloads, scale existing workloads, update configurations, or modify any cluster state until the control plane recovers. This is why production clusters need at least 3 control plane nodes for high availability.
How does Kubernetes handle node failures?
The Node controller in kube-controller-manager monitors node heartbeats. If a node stops sending heartbeats (default: every 10s), the node is marked NotReady after 40 seconds. After a further 5 minutes, the control plane evicts Pods from the unreachable node and reschedules them on healthy nodes. Older releases controlled this with the pod-eviction-timeout flag; current releases use taint-based eviction, where each Pod gets a default 300-second toleration for the not-ready and unreachable taints. During this 5-minute window, the Pods are still listed as running but are unreachable if the node is truly down. You can tune the timeout per Pod through tolerationSeconds, but setting it too low causes unnecessary evictions during temporary network blips.
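The timing in that answer adds up as follows. This Python sketch only does the arithmetic, using the defaults quoted above as parameters; it assumes the eviction countdown starts at the moment the node is marked NotReady, and the function name is illustrative.

```python
def node_failure_timeline(grace_period_s: int = 40, toleration_s: int = 300) -> dict:
    """Seconds after the last successful heartbeat at which each step happens.

    Assumes the eviction countdown starts when the node is marked NotReady.
    """
    not_ready_at = grace_period_s
    eviction_at = not_ready_at + toleration_s
    return {"not_ready_at_s": not_ready_at, "eviction_at_s": eviction_at}


if __name__ == "__main__":
    print(node_failure_timeline())
    print(node_failure_timeline(toleration_s=30))
```

Lowering the toleration shortens the outage for a truly dead node, at the cost of evicting Pods during network interruptions that would have healed on their own.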
What is the difference between a ConfigMap and a Secret?
Functionally, they are identical — both inject configuration data into Pods as environment variables or mounted files. The difference is intent and handling: Secrets are base64-encoded (not encrypted by default), stored separately in etcd, and can be encrypted at rest with an EncryptionConfiguration. ConfigMaps are for non-sensitive configuration. In production, use an external secrets manager (Vault, AWS Secrets Manager) with the Secrets Store CSI Driver instead of Kubernetes Secrets for sensitive data.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.