Senior 13 min · March 06, 2026
AWS EKS — Elastic Kubernetes Service

AWS EKS ENI Limits — Why Healthy Nodes Reject Pods

EC2 ENI slots cap EKS pod count per node, even with free CPU and memory.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Notes here come from systems that actually shipped.

Last updated: May 23, 2026
Quick Answer
  • EKS is a managed Kubernetes control plane on AWS.
  • Control plane runs in a separate AWS account, multi-AZ for HA.
  • VPC CNI assigns each pod a VPC IP address directly.
  • Node groups: managed, self-managed, or Fargate.
  • IRSA uses OIDC to grant IAM roles to pods.
  • Biggest gotcha: hitting ENI limits causes pod scheduling failures.
Definition
What is AWS EKS?

AWS EKS ENI (Elastic Network Interface) limits are the hard cap on how many pods a Kubernetes node can run, determined by the node instance type's maximum ENI count and IPs per ENI. This isn't a bug—it's a fundamental constraint of the AWS VPC CNI plugin, which assigns each pod a real VPC IP address from the node's ENI.

When a node appears healthy (CPU, memory, disk all fine) but rejects new pods, it's almost certainly hitting this ENI/IP limit. The VPC CNI allocates IPs from a warm pool attached to each node's ENIs; once exhausted, the node can't schedule more pods until IPs are released or the node is scaled up.

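This is a common production gotcha because the Kubernetes scheduler does not natively understand AWS networking limits: it sees only each node's resource requests and its advertised pod capacity, not the ENI/IP ceiling behind that number. You'll hit this most often with smaller instance types (e.g., t3.medium, which supports 3 ENIs with 6 IPs each; one IP per ENI is reserved as the ENI's primary address, leaving 15 pod IPs and a recommended max-pods of 17) or when running many DaemonSets, since every DaemonSet pod that is not on the host network consumes an IP on every node.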

The fix involves choosing larger instance types with higher ENI limits, enabling prefix delegation (on Nitro instances, each secondary IP slot on an ENI holds a /28 prefix of 16 addresses instead of a single IP), or switching to Fargate, where each pod gets its own ENI at the price of higher cost and cold-start latency. IRSA (IAM Roles for Service Accounts) is orthogonal but often confused here: it handles pod AWS credential management via OIDC, not networking.

In production, monitor your ENI/IP utilization with tools like the VPC CNI's aws-node metrics or custom Prometheus exporters, and consider using node groups with mixed instance types or Karpenter for automatic scaling that respects these limits.

Plain-English First

Imagine you run a massive food court with hundreds of stalls. You need someone to manage which stalls open when, reroute customers if one stall closes, and make sure nobody runs out of supplies. EKS is like hiring AWS to be your food court manager — they handle all the hard 'keep everything running' work while you just worry about what food your stalls serve. The stalls are your containers, the food court is Kubernetes, and AWS owns the building.

Kubernetes is the de facto standard for running containerised workloads at scale, but running a production-grade Kubernetes control plane yourself is genuinely brutal. etcd upgrades, API server HA, certificate rotation, audit log pipelines — it's a full-time job before you've written a single line of application code. That's the gap AWS EKS was built to fill, and in 2024 it powers thousands of production systems from fintech to streaming to machine learning pipelines.

The problem EKS solves isn't just 'run Kubernetes for me.' It's the deep integration question: how do your pods get IAM permissions without storing static credentials? How does pod networking interact with AWS VPC routing tables? How do you autoscale nodes without leaving zombie instances behind? These are the questions that burn teams at 2 AM, and they all have specific EKS answers that differ from vanilla Kubernetes.

By the end of this article you'll understand exactly how the EKS control plane is architected and why, how VPC CNI assigns IPs to pods and where it breaks under load, how IAM Roles for Service Accounts (IRSA) works at the token level, how to choose between managed node groups, self-managed nodes, and Fargate, and which production gotchas have silently broken real deployments. This is the article you'll come back to before your next EKS architecture review.

Why AWS EKS Nodes Reject Pods Despite Being Healthy

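AWS EKS ENI limits are the maximum number of Elastic Network Interfaces (ENIs) and IP addresses per ENI that an instance type supports; the AWS VPC CNI plugin works within them. Each pod gets its own IP from a secondary IP on one of the node's ENIs. The kubelet advertises a pod capacity (max-pods) derived from those limits. Once a node reaches it, the node still reports Ready but the scheduler cannot bind new pods: they remain Pending with a scheduler event such as "Too many pods" ("Insufficient pods" on older Kubernetes versions). This is not a resource (CPU/memory) issue; it's a hard network interface cap.

Each instance type has a fixed ENI count and IPs per ENI. For example, a t3.medium supports 3 ENIs × 6 IPs. The first IP on each ENI is the ENI's own primary address, so 3 × 5 = 15 IPs are available to pods, and AWS's recommended max-pods is 17 (the extra 2 cover host-network pods such as aws-node and kube-proxy, which do not need a pod IP). If the kubelet's max-pods flag is set higher than that (e.g., 20), the scheduler keeps placing pods on the node, but they stall in ContainerCreating because the CNI has no IP to hand out. The VPC CNI plugin manages IP allocation via a warm pool; if the pool is empty and the node is at its ENI limit, new pods cannot be assigned an IP even if the node has free CPU and memory.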

Use ENI limits to size node groups correctly and avoid mysterious pod scheduling failures. In production, always set max-pods to match the instance's ENI/IP capacity, and monitor the VPC CNI's IP pool metrics. Overprovisioning pods beyond the ENI limit leads to unpredictable scaling behavior and wasted debugging time.
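The capacity arithmetic for the default secondary-IP mode can be sketched as a small shell function. This is a minimal sketch, assuming AWS's published formula of ENIs × (IPs per ENI - 1) + 2; `max_pods` is a hypothetical helper name, and the ENI and IP counts passed in are the per-instance-type limits from the EC2 documentation.

```bash
#!/usr/bin/env bash
# max_pods <enis> <ips_per_eni>
# Secondary-IP mode: each ENI gives up one IP as its own primary address,
# and 2 is added back for host-network pods (aws-node, kube-proxy).
max_pods() {
  local enis="$1" ips_per_eni="$2"
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

max_pods 3 6     # t3.medium: prints 17
max_pods 3 10    # m5.large:  prints 29
```

Passing the result to the kubelet's max-pods setting keeps the scheduler's view of a node in line with the IPs the CNI can actually hand out.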

ENI Limits Are Not Resource Limits
A node can have 60% free CPU and memory but still reject pods because its ENI/IP pool is exhausted — always check the VPC CNI metrics first.
Production Insight
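Teams that raise max-pods on t3.medium nodes above the ENI-derived value of 17 (for example, to 20) see pods fail once the node's 15 pod IPs are used up, even though the node shows 70% free resources.
The symptom: pods stuck in ContainerCreating with a 'failed to assign an IP address to container' event, or Pending with 'Too many pods' when max-pods is set correctly. No CPU/memory pressure, no taints.
Rule of thumb: set max-pods = ENIs per instance × (IPs per ENI - 1) + 2. The -1 is each ENI's primary IP; the +2 covers host-network system pods such as aws-node and kube-proxy.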
Key Takeaway
ENI limits are a hard cap on pod density per node — not CPU or memory.
Always align max-pods with the instance's ENI/IP capacity to avoid silent scheduling failures.
Monitor VPC CNI metrics (for example, awscni_assigned_ip_addresses against awscni_total_ip_addresses) to detect IP exhaustion before it blocks deployments.
Figure: AWS EKS ENI Limits and Pod Rejection Flow (why healthy nodes reject pods due to ENI/IP limits). Flow: EKS node launch (managed or self-managed node group) → VPC CNI pod IP allocation (ENI and secondary IP limits per instance) → ENI limit reached (no available IPs for new pods) → pod rejection (node healthy but pod stuck in Pending) → Cluster Autoscaler or Karpenter scales out or replaces the node → pod scheduled successfully on a new node with available ENI/IP capacity. Note: ENI limits vary by instance type; max pods = ENIs × (IPs per ENI - 1) + 2. Use larger instances or prefix delegation to avoid hitting limits.

EKS Control Plane Internals

AWS EKS runs the Kubernetes control plane (etcd, API server, controller manager, scheduler) in a separate AWS account that you never see. It's fully managed: AWS handles patching and multi-AZ high availability as part of the per-cluster fee, and you trigger Kubernetes version upgrades. But 'managed' doesn't mean you can ignore it. The control plane exposes a public or private API server endpoint. You interact with it exactly like a vanilla Kubernetes API server, but there are subtle differences. For example, you cannot directly access etcd; AWS abstracts it completely. Kubernetes audit logs are off by default and must be enabled through control plane logging, which ships them to CloudWatch Logs (CloudTrail records calls to the EKS API itself, not Kubernetes API requests). The API server is fronted by a Network Load Balancer (NLB), and its endpoint can be public, private, or both. The control plane also includes AWS-specific admission webhooks, like the pod identity webhook that injects IRSA configuration into pods whose service account carries a role annotation.

Under the hood, AWS runs the Kubernetes control plane components on EC2 instances in its own infrastructure, isolated per cluster. The etcd data is encrypted at rest and in transit. To inspect control plane health, you rely on the Prometheus-format metrics the API server exposes on its /metrics endpoint, such as apiserver_request_duration_seconds or etcd_request_duration_seconds, which you can scrape into Prometheus, Amazon Managed Service for Prometheus (AMP), or CloudWatch. In production, the most common control plane issue is throttling from the API server when your clients' requests or watches generate too many requests per second. The API server enforces its own rate limits and answers excess requests with HTTP 429.

Debugging hint: If you see 429 Too Many Requests from kubectl, you've hit the API server rate limit. Enable retries with backoff in your automation tools.
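One way to add that client-side backoff to shell automation is a small wrapper like the one below. This is a minimal sketch, not a kubectl feature: `retry_with_backoff` is a hypothetical helper name, it retries on any non-zero exit status (it cannot tell a 429 from other failures), and the attempt count and delays are arbitrary choices.

```bash
#!/usr/bin/env bash
# retry_with_backoff <max_attempts> <base_delay_seconds> <command...>
# Runs the command until it succeeds, doubling the wait after each failure.
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))   # exponential backoff: 1s, 2s, 4s, ...
  done
}

# Example (needs cluster access, so it is left commented out):
#   retry_with_backoff 5 1 kubectl get nodes
```

Doubling the delay gives the API server room to recover instead of hammering it at a fixed interval; adding random jitter would spread out retries from many clients further.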

check_control_plane_health.sh (Bash)
# TheCodeForge: Check EKS control plane health via CloudWatch
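# Prerequisites: AWS CLI, jq, GNU date (the -d flag below is GNU-specific)
# Replace <cluster-name>. The metric and dimension names below may not match what
# your cluster publishes; list the real ones first with:
#   aws cloudwatch list-metrics --namespace AWS/EKS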

aws cloudwatch get-metric-statistics \
  --namespace AWS/EKS \
  --metric-name cluster_failed_request_count \
  --dimensions Name=cluster_name,Value=<cluster-name> \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Sum \
  --output json | jq '.Datapoints[] | {timestamp: .Timestamp, failures: .Sum}'
Mental Model: Managed Control Plane as a Black Box with Windows
  • You cannot modify etcd, Certificate Authority, or scheduler directly.
  • Amazon provides CloudWatch metrics, audit logs, and event streams to peer inside.
  • You are responsible for client-side retries, network latency, and API request rates.
  • You start Kubernetes version upgrades yourself; AWS applies platform-version patches to the control plane automatically.
Production Insight
If your API server becomes slow or starts timing out, etcd may be the bottleneck. Check etcd request duration in the API server metrics.
Rule: Monitor etcd_request_duration_seconds p99 — if >1s, investigate controller load.
Key Takeaway
EKS control plane is fully managed, not invisible.
Monitor etcd and API server metrics.
Don't assume AWS handles everything — you must watch for throttling.

VPC CNI Networking: Pod IP Allocation and Limits

The AWS VPC CNI is what makes EKS unique among Kubernetes distributions. Instead of using an overlay network (like Flannel or Calico in IPIP mode), each pod gets a real IP address from your VPC subnet. This means pods can communicate with other VPC resources (RDS, EC2, Lambda) without any NAT or proxy. The CNI achieves this by attaching multiple Elastic Network Interfaces (ENIs) to each EC2 node and assigning secondary IP addresses from those ENIs to pods. When a pod is created, the CNI plugin picks an unused IP from a warm pool and assigns it to the pod's network namespace. This setup gives native VPC integration, but it comes with hard limits.

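The first limit is per-instance ENI and IP capacity. An m5.large has a maximum of 3 ENIs, each with up to 10 IPs; after subtracting each ENI's primary IP and adding 2 for host-network pods, that is 29 pods. An m5.4xlarge supports 8 ENIs × 30 IPs, or 234 pods. You hit these limits faster than you think. The second limit is IP address exhaustion in your VPC subnet. A /24 subnet has 256 addresses, of which AWS reserves 5, so it can never hold 1000 pods. The CNI also competes with other services (nodes, load balancers, VPC endpoints) for those IPs, and its warm pool holds addresses that no pod is using yet.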
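The subnet side of that capacity check is easy to script. A minimal sketch, relying on the fact that AWS reserves 5 addresses in every subnet; `subnet_usable_ips` and `subnet_fits` are hypothetical helper names, and the sketch ignores IPs taken by nodes, load balancers, and the CNI's warm pool.

```bash
#!/usr/bin/env bash
# subnet_usable_ips <prefix_length>
# Total addresses in the CIDR minus the 5 that AWS reserves per subnet.
subnet_usable_ips() {
  local prefix_len="$1"
  echo $(( (1 << (32 - prefix_len)) - 5 ))
}

# subnet_fits <prefix_length> <ips_needed>: succeeds if the subnet has room.
subnet_fits() {
  [ "$2" -le "$(subnet_usable_ips "$1")" ]
}

subnet_usable_ips 24    # prints 251
subnet_fits 24 1000 || echo "a /24 cannot hold 1000 pod IPs"
```

Treat the result as an upper bound: real headroom is lower once node primary IPs and pre-allocated warm IPs are counted.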

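To scale beyond these limits, AWS offers custom networking (assign pod IPs from different subnets than the node's, typically in a secondary VPC CIDR) and prefix delegation (assign /28 prefixes to ENI slots, giving 16 IPs per slot instead of one). Prefix delegation is enabled by setting the ENABLE_PREFIX_DELEGATION environment variable to true on the aws-node daemonset. It requires Nitro-based instances and is off by default, and the kubelet's max-pods value must be raised to match before nodes can actually use the extra addresses.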
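To see what prefix delegation does to per-node capacity, the same arithmetic can be extended. This is a minimal sketch, assuming AWS's published guidance: each non-primary slot on an ENI holds a /28 (16 addresses), and the recommended max-pods is capped at 110 for instance types with fewer than 30 vCPUs and 250 otherwise. `max_pods_prefix_delegation` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# max_pods_prefix_delegation <enis> <ips_per_eni> <vcpus>
max_pods_prefix_delegation() {
  local enis="$1" ips_per_eni="$2" vcpus="$3"
  # Every slot except the ENI's primary IP carries a /28 prefix (16 addresses).
  local raw=$(( enis * (ips_per_eni - 1) * 16 + 2 ))
  # Recommended ceiling, independent of how many IPs are available.
  local cap=110
  if [ "$vcpus" -ge 30 ]; then
    cap=250
  fi
  if [ "$raw" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$raw"
  fi
}

max_pods_prefix_delegation 3 10 2    # m5.large: 434 addresses available, prints 110
```

The cap exists because beyond roughly that density the kubelet and container runtime become the bottleneck, not the IP pool.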

Another common production trap: The CNI plugin uses the EC2 API to attach/detach ENIs, which means the IAM role it runs under (the node role, or an IRSA role for aws-node) needs specific permissions (ec2:AttachNetworkInterface, ec2:CreateNetworkInterface, etc.). If those permissions are missing, the CNI cannot grow its IP pool, and pods stay in ContainerCreating with FailedCreatePodSandBox events.

check_eni_usage.sh (Bash)
# TheCodeForge: Check ENI and IP usage per node
# Replace <node-name> with the Kubernetes node name.
# providerID looks like aws:///<az>/<instance-id>, so take the last path segment.

INSTANCE_ID=$(kubectl get node <node-name> -o jsonpath='{.spec.providerID}' | awk -F/ '{print $NF}')
aws ec2 describe-network-interfaces \
  --filters "Name=attachment.instance-id,Values=$INSTANCE_ID" \
  --query 'NetworkInterfaces[*].{ENI_ID: NetworkInterfaceId, PrivateIPs: PrivateIpAddresses}' \
  --output json
Don't Ignore ENI Limits During Capacity Planning
Many teams choose node instance types based on CPU and memory only, ignoring the max pods per node. A node with plenty of resources may refuse to schedule new pods because it has run out of ENI slots. Always use the max-pods calculator (available in AWS docs) to determine the right instance size.
Production Insight
Hitting ENI limits is the #1 cause of mysterious pod scheduling failures.
You'll see events such as '0/6 nodes are available: 6 Too many pods.'
Rule: Use prefix delegation or custom networking to increase pod density per node.
Key Takeaway
VPC CNI assigns IPs from the VPC, not an overlay.
ENI limits are the bottleneck.
Design subnet size for max pods per node.
Choosing an IP Allocation Strategy
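If: Your node runs fewer than 50 total pods
Use: Default secondary IP mode is sufficient. Just plan subnet size accordingly.
If: Your node runs between 50 and 110 pods
Use: Enable prefix delegation (ENABLE_PREFIX_DELEGATION=true) and raise max-pods. AWS recommends capping at 110 pods per node on instances with fewer than 30 vCPUs.
If: Your node needs more than 110 pods
Use: Prefix delegation on a larger instance type; AWS's recommended cap rises to 250 pods per node at 30 vCPUs and above.
If: Your subnets are running out of IPs or you need strict IP separation
Use: Custom networking with secondary CIDRs, assigning a different pod subnet per node group.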

Node Group Strategies: Managed vs Self-Managed vs Fargate

EKS offers three modes to run worker nodes: managed node groups, self-managed node groups, and AWS Fargate. Each has trade-offs in operational overhead, cost, and flexibility.

Managed node groups are the default choice. You specify an AMI type (for example Amazon Linux or Bottlerocket), instance types, and scaling config. AWS provisions the Auto Scaling group, publishes patched AMIs, and performs rolling updates that cordon and drain nodes gracefully when you trigger them. You also get automatic security group rules for the control plane. The downside is less control: custom AMIs, custom kernel parameters, and pre-installed agents are possible only through a launch template, and the update process follows AWS's rollout logic rather than yours.

Self-managed node groups give you full control. You launch EC2 instances, install kubelet, and join them to the cluster. You control the AMI, the bootstrap script, and the lifecycle policies. This suits heavily customized GPU workloads (like ML training) or cases where you need to pin specific kernel versions. The trade-off is operational burden: you must manage AMI updates, security patches, and node replacement yourself.

Fargate is the serverless option. You define a Fargate profile, and pods that match its selectors run on Fargate instead of EC2. This is ideal for batch jobs, CI/CD runners, or sporadic workloads that don't justify always-on instances. The catch: you pay per second for the vCPU and memory each pod requests, with a minimum charge of 1 minute. Fargate also limits a pod to a maximum of 16 vCPU and 120 GB memory, and you cannot run DaemonSets or privileged containers.

A real-world pattern: Run critical microservices on managed node groups (steady state), bursty data processing on Fargate, and GPU training on a self-managed node group with custom AMI.

create_managed_nodegroup.sh (Bash)
# TheCodeForge: Create a managed node group
# Replace <cluster-name> and <node-role-arn>

aws eks create-nodegroup \
  --cluster-name <cluster-name> \
  --nodegroup-name core-services \
  --scaling-config minSize=2,maxSize=10,desiredSize=3 \
  --instance-types t3.medium m5.large \
  --node-role <node-role-arn> \
  --subnets subnet-abc123 subnet-def456 \
  --disk-size 50 \
  --ami-type AL2_x86_64 \
  --tags Environment=production
Forge Tip: Hybrid Node Group Strategy
Don't force all workloads into one type. Use managed node groups for your 'always on' services, Fargate for jobs with predictable short durations, and self-managed for GPU or custom AMI needs. Label nodes accordingly.
Production Insight
Managing self-managed node groups is a full-time job — we had a team of three doing AMI rotations.
Rule: Start with managed node groups, escalate to self-managed only when necessary.
Key Takeaway
Managed node groups simplify ops.
Fargate works for bursty, non-long-running workloads.
Self-managed gives full control at cost of ops.

IAM Roles for Service Accounts (IRSA) — How Pods Get AWS Credentials

Before IRSA, you either stored AWS access keys in Kubernetes Secrets, let every pod inherit the node's instance profile, or ran kube2iam or kiam to intercept metadata calls. All were fragile. IRSA solves this by using OpenID Connect (OIDC) federation. Every EKS cluster has an OIDC issuer URL (e.g., oidc.eks.<region>.amazonaws.com/id/XXXXX). You register that issuer as an IAM OIDC identity provider, then create an IAM role with a trust policy that allows identities from that provider to assume the role for a specific service account in a specific namespace. The Kubernetes API server issues a signed token (a JWT) that includes the pod's service account and namespace. The EKS pod identity mutating webhook adds a projected token volume (named aws-iam-token) to the pod, so the token appears as a file (by default at /var/run/secrets/eks.amazonaws.com/serviceaccount/token). The AWS SDK uses a credential chain that picks up this token and calls sts:AssumeRoleWithWebIdentity to get temporary AWS credentials.

Common pitfalls: The cluster's OIDC issuer must be registered as an IAM OIDC identity provider in the account; if you register it by hand, its thumbprint must match the issuer's certificate (eksctl and the console set this for you). The trust policy's sub condition must match system:serviceaccount:<namespace>:<service-account> exactly, and if its aud condition doesn't match sts.amazonaws.com, the token is rejected. The projected token also expires. The kubelet rotates the file before expiry and the AWS SDK re-reads it automatically, but refresh only works as long as the pod can reach an STS endpoint. If your VPC doesn't have internet access and you're using a private cluster, add an STS interface VPC endpoint (com.amazonaws.<region>.sts) and have the SDK use the regional STS endpoint.

Another gotcha: The token is issued by the Kubernetes API server, not by AWS STS. It's literally a Kubernetes service account token. STS verifies the token's signature using the public keys published at the cluster's OIDC issuer URL. If the IAM OIDC identity provider's registration goes stale (for example, an outdated thumbprint, or an issuer URL that changed because the cluster was recreated), token validation fails, and pods lose all AWS permissions until you update the IAM OIDC identity provider and any role trust policies that reference it.
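Because the token is an ordinary JWT, you can inspect its claims (iss, aud, sub) when a trust policy refuses to match. The helper below is a minimal sketch, assuming GNU coreutils `base64`; `jwt_payload` is a hypothetical name, and it only decodes the payload segment without verifying the signature.

```bash
#!/usr/bin/env bash
# jwt_payload <token>
# Prints the decoded payload (second segment) of a JWT. No signature check.
jwt_payload() {
  local token="$1" payload pad
  # A JWT is header.payload.signature; take the middle segment.
  payload=$(printf '%s' "$token" | cut -d '.' -f 2)
  # Convert base64url to standard base64 ('_' -> '/', '-' -> '+').
  payload=$(printf '%s' "$payload" | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d
}

# Example, from a shell inside an IRSA-enabled pod (left commented out):
#   jwt_payload "$(cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token)"
```

Compare the decoded sub and aud values against the conditions in the role's trust policy; a mismatch in namespace or service account name is a common cause of AccessDenied from STS.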

test_irsa_from_pod.sh (Bash)
# TheCodeForge: Test IRSA from inside a pod
# First, create a service account annotated with the IAM role
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-eks-role
EOF

# Then, from a shell inside a pod that runs as my-app-sa, exchange the token by hand
TOKEN=$(cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token)
aws sts assume-role-with-web-identity \
  --role-arn arn:aws:iam::123456789012:role/my-eks-role \
  --role-session-name test \
  --web-identity-token "$TOKEN"
Mental Model: The Token Relay Race
  • The K8s API server signs a JWT containing the service account identity.
  • The pod presents this token to STS through the AWS SDK.
  • STS verifies the token's signature with the OIDC provider's public keys.
  • Result: temporary AWS credentials (valid for 1 hour by default, auto-refreshed by the SDK).
Production Insight
IRSA broke silently after an EKS upgrade because the thumbprint on the cluster's IAM OIDC identity provider was outdated.
Rule: After every cluster upgrade, verify that pods can still assume their roles, and update the IAM OIDC identity provider if needed.
Key Takeaway
IRSA eliminates static AWS keys.
Token is short-lived, automatically refreshed.
Misconfigured trust policy breaks pod permissions.

Production Gotchas and How to Avoid Them

Even with managed services, EKS clusters suffer from specific failure patterns. Here are the ones we've seen destroy weekends.

CNI plugin race condition: When nodes start, the aws-node daemonset must be running before pods can get IPs. If pods are scheduled onto a node before the CNI is ready, they fail with FailedCreatePodSandBox. This is especially common during node group scaling events. The kubelet retries sandbox creation, so a brief failure usually clears on its own; if it persists, check the aws-node pod's logs on that node and keep the VPC CNI add-on current.

Cluster Autoscaler not scaling down: First check what is blocking scale-down (e.g., pods annotated with cluster-autoscaler.kubernetes.io/safe-to-evict: false). A common culprit is kube-system pods that are not backed by a PDB (Pod Disruption Budget). By default, the autoscaler will not remove a node running such a pod because it can't guarantee safe eviction. Fix: Always add PDBs for critical system pods.

AWS Load Balancer Controller (ALB/NLB) misconfig: The controller needs IAM permissions to create Target Groups, Listeners, etc. If the integration test passes but production times out, check for subnet tags — the subnets where ALBs are created must be tagged with kubernetes.io/role/elb (public) or kubernetes.io/role/internal-elb (private). Without those tags, the controller silently fails to provision load balancers.

DNS resolution issues with CoreDNS: CoreDNS pods sometimes get scheduled on nodes under memory pressure, causing them to be OOMKilled. This leads to intermittent DNS failures across the cluster. Fix: Set resource requests and limits on CoreDNS, and consider deploying two replicas on different node types.

EBS CSI driver not installed: If you run stateful workloads, you need the EBS CSI driver. Many teams miss this and see pods stuck in ContainerCreating with an event regarding volumes. Fix: Verify the EBS CSI add-on is enabled in the EKS console or via eksctl.

Security group limits: By default, a network interface can have at most 5 security groups (an adjustable AWS quota). If you attach many security groups per pod (via the security groups for pods feature), you'll hit that limit. Plan your security groups carefully.

The Most Dangerous Alert: 'NodeHasSufficientMemory' but pods are Pending
If a node shows healthy but pods are Pending, check not only CPU/memory but also IP pool, disk pressure (PVs), and volume attachment limits. The EBS maximum attachment per instance type can throttle new pods requiring volumes.
Production Insight
In three years of running EKS, 70% of our production incidents were network-related — ENI limits, DNS, or security groups.
Rule: Always have a chaos engineering schedule that tests network failures.
Second rule: Use the eks-node-viewer tool to visualize pod and node capacity in real-time.
Key Takeaway
Most EKS failures are network-related.
Check IAM, subnet, and ENI limits first.
Use EKS console and CloudWatch to preempt issues.

EBS CSI Driver: Why Your Stateful Workloads Randomly Die

You deployed a StatefulSet with persistent volumes. Everything works in dev. Then in production, pods start crashing with "Volume is already attached to another node" or "disk full" errors that don't match your monitoring. That's the EBS CSI driver not being configured properly. Default storage classes are your enemy. They use gp2 volumes with no IOPS guarantees and attach volumes to nodes via the legacy in-tree driver, which doesn't handle reattachment gracefully.

The fix: install the EBS CSI driver as an EKS managed add-on (via the EKS console or eksctl) rather than a hand-rolled deployment. Then create a StorageClass that explicitly sets volumeBindingMode: WaitForFirstConsumer. This delays volume creation until Kubernetes schedules the pod, ensuring the volume is created in the same AZ as the node. Without this, volumes can be created in a different AZ, then fail to attach because an EBS volume can only attach to instances in its own AZ.

Use gp3 as your default. It's cheaper, faster, and you can increase baseline IOPS without increasing size. Never use the default gp2 StorageClass in production.

ProductionEbsStorageClass.yml (YAML)
# io.thecodeforge: devops tutorial

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-fast
provisioner: ebs.csi.aws.com  # Must be CSI driver, not in-tree
volumeBindingMode: WaitForFirstConsumer  # Ensures volume is in same AZ as pod
allowVolumeExpansion: true  # Resize volumes without restarting pods
parameters:
  type: gp3
  iops: "6000"  # Baseline for gp3 is 3000, bump it for database workloads
  throughput: "250"  # Baseline is 125 MB/s
reclaimPolicy: Retain  # Never auto-delete PVC data on accident
Output
$ kubectl get sc
gp3-fast   ebs.csi.aws.com   Retain   WaitForFirstConsumer   true   2m
Production Trap:
If you see 'Volume is already attached to another node' errors, check your StorageClass has WaitForFirstConsumer. 90% of EBS detachment failures are AZ misalignment between volume and node.
Key Takeaway
Always set volumeBindingMode: WaitForFirstConsumer in your EBS StorageClass — it prevents the #1 cause of persistent volume attachment failures in production.

Cluster Autoscaler vs Karpenter: When to Throw Out the Default

Cluster Autoscaler is the default choice for scaling EKS node groups, but it has a fundamental limitation: it can only resize node groups you have already defined, so a pending pod waits for an Auto Scaling group to launch whatever instance type that group was built around. That means slow scale-ups, wasted spot capacity, and little awareness of pod resource fragmentation. Karpenter also reacts to unschedulable pods, but it reads their specs directly and requests a right-sized instance within seconds, with no node group management required. The real question isn't which tool to use, but when. Stick with Cluster Autoscaler if you have static node groups, strict compliance zones, or your cluster lives in accounts where Karpenter's IAM permissions are blocked. Switch the moment you run spot instances, burst workloads, or any scenario where delaying a Pod by 60 seconds costs money. Karpenter wins on speed and cost; Cluster Autoscaler wins on simplicity and auditability. Pick the one that matches your operational risk tolerance, not the one that markets better.

karpenter-provisioner-example.yml (YAML)
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-pool
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c5", "c6i", "m5"]
      nodeClassRef:
        name: default
      taints:
        - key: spot
          value: "true"
          effect: NoSchedule
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
Output
nodepool.karpenter.sh/spot-pool created
Production Trap:
Never mix Cluster Autoscaler and Karpenter on the same node group—they fight over instance termination and can cause cascading scale-up loops.
Key Takeaway
Cluster Autoscaler for compliance, Karpenter for speed and spot savings.

Common Mistakes That Burn Junior Teams (And How to Avoid Them)

You think deploying on EKS is just spinning up nodes and running kubectl? Wrong. The most expensive mistake is treating EKS as if AWS runs the whole cluster for you. The control plane is managed, but you still own networking, IAM, and storage. Ignore that, and you get mysterious pod evictions and cost overruns.

Second mistake: sizing node groups without understanding VPC CNI limits. Each node has a max pod count based on instance type ENI limits. Exceed that and pods hang in Pending state instead of failing visibly. Third: skipping pod resource requests. Without them, pods run in the BestEffort QoS class: the scheduler can overpack a node because it has no idea how much memory they need, and they are the first to be evicted or OOM-killed during spikes. Fix these three, and you stop debugging symptoms and start shipping features.

common-mistakes-eks.yml (YAML)
# io.thecodeforge: devops tutorial

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: my-app:1.0
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
Output
pod/my-app created
Production Trap:
Never skip VPC CNI version upgrades. The VPC CNI add-on is versioned separately from the control plane, so upgrading the cluster does not necessarily upgrade it. An outdated version can miss fixes to IP allocation and lead to unexplained pod startup failures.
Key Takeaway
Always set resource requests and limits, and monitor VPC CNI IP pool exhaustion.

Best Practices That Drastically Reduce EKS Complexity

Stop treating EKS like a pet cluster. Replace nodes with blue/green node group upgrades, and prefer Karpenter over Cluster Autoscaler for scaling. Why? Karpenter provisions instances based on pod resource requirements, not node counts. It also supports spot instances natively, which can cut costs 60-70% for stateless workloads.

Second: enforce IRSA (IAM Roles for Service Accounts) for every pod that calls AWS. No hardcoded keys, no application permissions on the node IAM role. IRSA eliminates static credential leaks and limits blast radius. Third: use the AWS EBS CSI driver with gp3 volumes and snapshot schedules. This prevents the random storage failures that plague stateful workloads in production. And always tag nodes with lifecycle and team labels. Without tags, autoscaling groups become black boxes for cost allocation.

eks-best-practices.yml (YAML)
# io.thecodeforge: devops tutorial
# Legacy Provisioner API (karpenter.sh/v1alpha5); newer Karpenter releases use NodePool instead.

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  limits:
    resources:
      cpu: 1000
  provider:
    subnetSelector:
      Name: my-cluster-subnet
Output
provisioner.karpenter.sh/default created
Senior Shortcut:
Pair Karpenter with node-level taints and tolerations for spot instances. This ensures critical workloads never land on spot nodes, while batch jobs get the cheap compute.
Key Takeaway
Automate node provisioning with Karpenter, enforce IRSA, and tag everything for cost visibility.

EKS Pricing: Why Your Bill Explodes and How to Cap It

EKS pricing is deceptively simple: $0.10 per hour per cluster control plane. The real cost is the infrastructure you attach. Fargate pods cost $0.02–$0.06 per vCPU-hour plus memory, while managed node groups bill you for EC2 instances (t3.medium at ~$0.0416/hr), EBS volumes ($0.08/GB-month), and data transfer. The trap: idle nodes, oversized instance types, and NAT gateway data processing ($0.045/GB) for private subnets. A 3-node cluster running 24/7 can cost $400–$1,200/month before storage. To cap costs: use Karpenter with spot instances (60–90% discount), right-size pods via VPA, and delete unused load balancers. Always set a budget alert at 80% of your projected spend.

eks-cost-estimate.yml (YAML)
# io.thecodeforge: devops tutorial

# Minimal EKS cluster cost per month (us-east-1)
# Control plane: $0.10/hr * 730 = $73
# 3x t3.medium (on-demand): $0.0416 * 3 * 730 = $91
# 3x 20GB gp3 EBS: $0.08 * 60 = $4.80
# NAT Gateway: $0.045/hr * 730 = $32.85
# Data transfer (1TB out): $0.09/GB first 10TB = $90
# Total baseline: ~$291.65/month
Output
Control plane: $73.00
EC2 nodes: $91.00
EBS: $4.80
NAT: $32.85
Data xfer: $90.00
Total: ~$291.65
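The same estimate can be turned into a small calculator so you can plug in your own node count and instance price. This is a minimal sketch using the unit prices quoted in this article (check current AWS pricing before relying on them); `eks_monthly_cost` is a hypothetical helper name, and it leaves out NAT data processing and load balancers.

```bash
#!/usr/bin/env bash
# eks_monthly_cost <nodes> <node_price_per_hour> <total_ebs_gb> <egress_gb>
# Prints an estimated monthly bill in USD, assuming 730 hours per month.
eks_monthly_cost() {
  awk -v nodes="$1" -v node_hr="$2" -v ebs_gb="$3" -v egress_gb="$4" 'BEGIN {
    hours   = 730
    control = 0.10 * hours             # EKS control plane
    ec2     = nodes * node_hr * hours  # worker nodes, on-demand
    ebs     = 0.08 * ebs_gb            # gp3 storage
    nat     = 0.045 * hours            # one NAT gateway, hourly charge only
    egress  = 0.09 * egress_gb         # data transfer out
    printf "%.2f\n", control + ec2 + ebs + nat + egress
  }'
}

eks_monthly_cost 3 0.0416 60 1000    # prints 291.75
```

The result is ten cents above the figure in the listing because the listing rounds the EC2 line down to $91 before summing.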
Production Trap:
Leaving an unused ALB or NLB attached to an old ingress will burn $20–$30/month per load balancer. Audit with kubectl get ingress -A and delete orphaned resources.
Key Takeaway
Most of your EKS bill is the infrastructure around the control plane, so use spot instances, right-size pods, and delete unused cloud resources.

Use Cases of AWS EKS: Where It Wins and Where It Doesn't

EKS shines in hybrid workloads: batch processing with millions of short-lived jobs, real-time inference serving at scale, and multi-tenant SaaS backends needing strict IAM isolation. Top use cases: 1) CI/CD pipelines — spin up Jenkins agents or GitLab runners on spot instances, tear down instantly. 2) Microservices with service mesh (Istio/App Mesh) for observability and traffic splitting. 3) Data processing with Spark on Kubernetes — EKS handles autoscaling of 1,000+ nodes. Where it fails: single-instance stateful apps (use ECS Fargate instead for lower cost), latency-sensitive trading systems (EC2 with Elastic Fabric Adapter wins), and teams with no Kubernetes expertise (you'll burn budget on cluster misconfiguration). EKS is production-grade for organizations that already containerize.

eks-use-case-ingress.yml · YAML
# io.thecodeforge - devops tutorial

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: multi-tenant-app
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
  - host: tenant1.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-service
            port:
              number: 80
Output
Ingress created for multi-tenant routing with AWS ALB in front of EKS.
Production Trap:
Running one small pod per 32-core node wastes resources. Set CPU and memory requests on every pod so the scheduler can bin-pack many workloads onto each node.
Key Takeaway
EKS is for elastic, multi-service workloads, not for single-instance stateful apps or teams new to Kubernetes.
● Production incident · POST-MORTEM · severity: high

The Great Pod Scheduling Freeze: ENI Limit Nightmare

Symptom
New deployments and scale-ups fail. kubectl describe pod shows events like '0/6 nodes are available: 2 Insufficient cpu, 4 Too many pods.'
Assumption
The cluster was out of CPU or memory. Autoscaler was not launching new nodes because of a misconfiguration.
Root cause
Each EC2 instance type has a limit on the number of Elastic Network Interfaces (ENIs) it can attach and on the number of IP addresses per ENI. The EKS VPC CNI gives each pod a secondary IP address from one of the node's ENIs. The nodes had filled every ENI slot and every IP on those ENIs, so no more pod IPs could be assigned. The nodes looked healthy but were out of IP addresses.
Fix
Switched the VPC CNI to prefix delegation mode (ENABLE_PREFIX_DELEGATION=true on the aws-node DaemonSet, tuned with WARM_PREFIX_TARGET), which assigns /28 prefixes instead of single IPs and so raises pod density per node. Also added subnets to spread IP pressure.
Key lesson
  • Never assume node resource metrics tell the whole story — check IP address pool size.
  • ENI limits per instance type are a hard ceiling; plan your max pods per node before choosing instance types.
  • Use prefix delegation or custom networking to scale beyond the default IP allocation model.
Production debug guide · Quick reference for the most common EKS failures · 4 entries
Symptom · 01
New pods stay Pending with 'Too many pods' in the scheduler event
Fix
Check node IP usage: aws ec2 describe-network-interfaces --filters Name=attachment.instance-id,Values=<instance-id> --query 'NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress' --output text and compare the count to the node's max pods. Run kubectl describe node <node> for allocatable pods.
Symptom · 02
Nodes not joining cluster after AMI update
Fix
Verify node IAM role trust policy allows EC2 to assume the role. Check kubelet logs via SSM or user data output. Common cause: outdated CNI plugin version on the AMI.
Symptom · 03
Service DNS resolution fails intermittently
Fix
Check CoreDNS pods are running and have network connectivity. Use kubectl exec -it <pod> -- nslookup <service> from a test pod. Review VPC security group rules and network policy — CoreDNS must be able to reach the API server.
Symptom · 04
Pod cannot assume IAM role with IRSA
Fix
Verify the OIDC provider URL in the cluster matches the one in the trust policy. Check the token file path and the service account annotation. Use aws sts assume-role-with-web-identity manually from a pod to test the token.
★ EKS Quick Debug Cheat Sheet · Run these commands first when something breaks in an EKS cluster
Pod stuck in Pending state
Immediate action
Describe the pod and node
Commands
kubectl describe pod <pod>
kubectl get nodes -o custom-columns=NODE:.metadata.name,PODS:.status.allocatable.pods,CPU:.status.allocatable.cpu,MEM:.status.allocatable.memory
Fix now
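If the pod limit is hit, enable prefix delegation or use an instance type with more ENI capacity; if the subnet itself is out of IPs, add subnets. If CPU or memory is short, add nodes or lower pod resource requests.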
Node unreachable / NotReady
Immediate action
Check node conditions and cloud provider readiness
Commands
kubectl get node <node> -o yaml | grep -A5 'conditions'
aws eks describe-nodegroup --cluster-name <cluster> --nodegroup-name <nodegroup>
Fix now
Restart the kubelet on the node via SSM: sudo systemctl restart kubelet. If persistent, replace the node.
Cluster autoscaler not scaling
Immediate action
Check autoscaler logs and IAM permissions
Commands
kubectl logs -n kube-system -l app.kubernetes.io/name=aws-cluster-autoscaler --tail=50
aws sts assume-role --role-arn <autoscaler-role-arn> --role-session-name test
Fix now
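Grant the autoscaler role the Auto Scaling permissions it needs, such as autoscaling:DescribeAutoScalingGroups, autoscaling:SetDesiredCapacity, and autoscaling:TerminateInstanceInAutoScalingGroup. Set correct min/max node group sizes.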
EKS Node Group Types Comparison

| Feature | Managed Node Groups | Self-Managed Node Groups | Fargate |
| --- | --- | --- | --- |
| AMI control | EKS-optimized AMIs by default; custom AMI via launch template | Full control, custom AMI | No control, uses AWS-managed infra |
| Upgrades | Rolling update run by AWS once you trigger it | Manual (AMIs, kubelet) | Automatic (infra is ephemeral) |
| Instance types | Any EC2 instance type | Any EC2 instance type | No instance choice; per-pod sizing up to 16 vCPU, 120 GB memory |
| Daemonsets | Full support | Full support | Not supported |
| Cost model | EC2 instances (per hour) | EC2 instances (per hour) | Per pod-second (min 1 min) |
| Pod networking | VPC CNI (native) | VPC CNI (native) or overlay | VPC CNI (native, limited) |

Key takeaways

1. EKS abstracts the control plane but exposes metrics: monitor etcd and API server request latency.
2. VPC CNI is the most powerful EKS feature and the most dangerous: plan subnet size per node type.
3. IRSA eliminates static credentials but requires precise OIDC thumbprint management after upgrades.
4. Most production incidents are network-related: ENI limits, DNS, or security group caps.
5. Use a hybrid node group strategy: managed for steady workloads, Fargate for bursty jobs, self-managed for custom needs.

Common mistakes to avoid

3 patterns
×

Not planning IP address exhaustion in subnets

Symptom
Cluster runs out of IPs and new pods cannot be scheduled. No errors in CloudWatch, just Pending events.
Fix
Monitor IP usage per subnet with VPC CNI metrics. Use prefix delegation or add secondary CIDRs. Plan pod density per instance type before deployment.
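A quick capacity check makes the planning step concrete. The sketch below assumes the worst case, where every node fills all of its ENIs, and uses a hypothetical 10.0.0.0/24 subnet with the m5.large limits quoted elsewhere in this article (3 ENIs, 10 IPs each). The function names are illustrative, not part of any AWS tool.

```python
import ipaddress

AWS_RESERVED_IPS_PER_SUBNET = 5  # AWS reserves the first four addresses and the last one


def usable_ips(cidr: str) -> int:
    """Addresses in a subnet that ENIs can actually use."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_IPS_PER_SUBNET, 0)


def nodes_per_subnet(cidr: str, enis: int, ips_per_eni: int) -> int:
    """Worst case: each node fills every ENI, consuming enis * ips_per_eni addresses."""
    return usable_ips(cidr) // (enis * ips_per_eni)


if __name__ == "__main__":
    # Hypothetical /24 subnet with m5.large nodes (3 ENIs x 10 IPs each)
    print(usable_ips("10.0.0.0/24"))
    print(nodes_per_subnet("10.0.0.0/24", enis=3, ips_per_eni=10))
```

Under these assumptions a /24 leaves 251 usable addresses, enough for 8 fully loaded m5.large nodes. Other consumers in the same subnet (load balancers, VPC endpoints, RDS) shrink that further.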
×

Assuming managed node groups handle all security patching

Symptom
Node AMIs become outdated with severe CVEs. AWS patches the control plane but not the nodes automatically unless you update the node group.
Fix
Set up regular node group updates (via eksctl or console). Use Bottlerocket for automated OS updates. Enable security scanning tools.
×

Running critical workloads without Pod Disruption Budgets

Symptom
During node group updates or scale-downs, too many replicas are evicted at once, causing service disruption. With PDBs set too strictly (minAvailable equal to the replica count), Cluster Autoscaler cannot drain nodes.
Fix
Define PDBs for all production services: minAvailable: 2 for replicas >= 3. Test node group updates in staging first.
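The minAvailable rule above is simple arithmetic, sketched here under the assumption that all replicas are healthy. The helper names are hypothetical; Kubernetes computes the real value from the pods that are currently healthy and reports it in the PDB status.

```python
def allowed_disruptions(healthy_replicas: int, min_available: int) -> int:
    """Voluntary evictions a PDB with minAvailable permits right now."""
    return max(healthy_replicas - min_available, 0)


def blocks_node_drain(replicas: int, min_available: int) -> bool:
    """True when no pod can be evicted voluntarily, so node drains stall."""
    return allowed_disruptions(replicas, min_available) == 0


if __name__ == "__main__":
    print(allowed_disruptions(3, 2))  # one pod may be evicted at a time
    print(blocks_node_drain(3, 3))    # minAvailable equals replicas: drains stall
```

With 3 replicas and minAvailable: 2, one pod can move at a time, which keeps the service up while still letting node updates proceed.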
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
How does EKS VPC CNI assign IPs to pods? What limit does it have?
Q02 · SENIOR
Explain IRSA (IAM Roles for Service Accounts) and how it works at the to...
Q03 · SENIOR
Your EKS cluster's pods are stuck in Pending but nodes have free CPU and...
Q04 · SENIOR
What are the trade-offs between managed node groups and self-managed nod...
Q01 of 04 · SENIOR

How does EKS VPC CNI assign IPs to pods? What limit does it have?

ANSWER
The VPC CNI attaches ENIs to each EC2 node and assigns secondary IPs from those ENIs to pods. The limit is per-instance ENI count and IPs per ENI. For example, an m5.large can attach 3 ENIs with 10 IPs each; the first IP on each ENI belongs to the ENI itself, so the default max is 3 × (10 − 1) + 2 = 29 pods (the +2 covers host-network pods such as aws-node and kube-proxy). You can use prefix delegation to assign /28 prefixes per ENI slot, dramatically increasing pod density. Without it, you'll hit the limit quickly.
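Worked as arithmetic, the per-node ceiling comes out slightly below the raw ENIs × IPs product, because each ENI keeps its primary IP for itself and two host-network pods need no pod IP. The sketch below follows the commonly published max-pods formula. The function name is illustrative, and the 110/250 cap under prefix delegation is my understanding of AWS's recommended ceiling, so treat it as an assumption and confirm against the current EKS documentation.

```python
def max_pods(enis: int, ips_per_eni: int, prefix_delegation: bool = False,
             vcpus: int = 2) -> int:
    """Estimate the pod ceiling for one node under the VPC CNI."""
    slots_per_eni = ips_per_eni - 1  # the primary IP of each ENI is not given to pods
    if not prefix_delegation:
        # +2 covers host-network pods (aws-node, kube-proxy)
        return enis * slots_per_eni + 2
    raw = enis * slots_per_eni * 16 + 2  # each slot holds a /28 prefix = 16 addresses
    recommended_cap = 110 if vcpus < 30 else 250  # assumed recommendation, see lead-in
    return min(raw, recommended_cap)


if __name__ == "__main__":
    print(max_pods(3, 10))                          # m5.large, secondary-IP mode
    print(max_pods(3, 10, prefix_delegation=True))  # m5.large, prefix delegation
```

For an m5.large this gives 29 pods in the default mode and 110 with prefix delegation, where the cap rather than the address count becomes the limit.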
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
Can I use EKS without any VPC experience?
02
How does EKS pricing work?
03
Can I run a private EKS cluster without internet access?
04
Why do my pods fail to start after an EKS cluster upgrade?