AWS EKS ENI Limits — Why Healthy Nodes Reject Pods
EC2 ENI slots cap EKS pod count per node, even with free CPU and memory.
20+ years shipping production infrastructure and CI/CD at scale. Notes here come from systems that actually shipped.
- EKS is a managed Kubernetes control plane on AWS.
- Control plane runs in a separate AWS account, multi-AZ for HA.
- VPC CNI assigns each pod a VPC IP address directly.
- Node groups: managed, self-managed, or Fargate.
- IRSA uses OIDC to grant IAM roles to pods.
- Biggest gotcha: hitting ENI limits causes pod scheduling failures.
Imagine you run a massive food court with hundreds of stalls. You need someone to manage which stalls open when, reroute customers if one stall closes, and make sure nobody runs out of supplies. EKS is like hiring AWS to be your food court manager — they handle all the hard 'keep everything running' work while you just worry about what food your stalls serve. The stalls are your containers, the food court is Kubernetes, and AWS owns the building.
Kubernetes is the de facto standard for running containerised workloads at scale, but running a production-grade Kubernetes control plane yourself is genuinely brutal. etcd upgrades, API server HA, certificate rotation, audit log pipelines — it's a full-time job before you've written a single line of application code. That's the gap AWS EKS was built to fill, and in 2024 it powers thousands of production systems from fintech to streaming to machine learning pipelines.
The problem EKS solves isn't just 'run Kubernetes for me.' It's the deep integration question: how do your pods get IAM permissions without storing static credentials? How does pod networking interact with AWS VPC routing tables? How do you autoscale nodes without leaving zombie instances behind? These are the questions that burn teams at 2 AM, and they all have specific EKS answers that differ from vanilla Kubernetes.
By the end of this article you'll understand exactly how the EKS control plane is architected and why, how VPC CNI assigns IPs to pods and where it breaks under load, how IAM Roles for Service Accounts (IRSA) works at the token level, how to choose between managed node groups, self-managed nodes, and Fargate, and which production gotchas have silently broken real deployments. This is the article you'll come back to before your next EKS architecture review.
Why AWS EKS Nodes Reject Pods Despite Being Healthy
AWS EKS ENI limits are the maximum number of Elastic Network Interfaces (ENIs) and IP addresses per ENI that an EC2 instance type supports. The AWS VPC CNI plugin works within those limits: each pod gets a secondary IP address on one of the node's ENIs. Once a node holds as many pods as its max-pods setting allows, it still reports Ready, but the scheduler cannot bind new pods to it; they remain Pending with a "Too many pods" scheduling event (older Kubernetes versions report "Insufficient pods"). This is not a resource (CPU/memory) issue; it's a hard network interface cap.
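Each instance type has a fixed ENI count and a fixed number of IPs per ENI. For example, a t3.medium supports 3 ENIs × 6 IPs. One IP on each ENI is the ENI's own primary address, so 3 × (6 − 1) = 15 IPs are available to pods, and the recommended max-pods value is 17 (15 plus 2 for host-network pods such as aws-node and kube-proxy). If the kubelet's max-pods flag is set higher (e.g., 20), the scheduler keeps placing pods on the node, but they get stuck in ContainerCreating because the CNI has no IP left to hand out. The VPC CNI plugin manages IP allocation via a warm pool; if the pool is empty and the node is at its ENI limit, new pods cannot be assigned an IP even if the node has free CPU and memory.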
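The per-node arithmetic can be sketched in a few lines. This is a minimal sketch, assuming the formula used by the AWS max-pods calculator for secondary-IP mode: ENIs × (IPs per ENI − 1) + 2, where one address per ENI is the ENI's primary IP and the +2 covers host-network pods. The ENI_LIMITS table and the function name are illustrative; look up the real limits for your instance type in the EC2 documentation.

```python
# Illustrative ENI limits: instance type -> (max ENIs, IPv4 addresses per ENI).
# Only three instance types are listed here; extend from the EC2 documentation.
ENI_LIMITS = {
    "t3.medium": (3, 6),
    "m5.large": (3, 10),
    "m5.4xlarge": (8, 30),
}


def max_pods(instance_type: str) -> int:
    """Recommended max-pods in secondary-IP mode (no prefix delegation)."""
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    # One IP per ENI is the ENI's primary address and is not given to pods.
    # The +2 accounts for host-network pods (aws-node, kube-proxy).
    return enis * (ips_per_eni - 1) + 2


if __name__ == "__main__":
    for name in ENI_LIMITS:
        print(f"{name}: {max_pods(name)} pods")
```

Comparing this number with the node's actual max-pods setting is a quick way to spot nodes that will accept more pods than they have IPs for.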
Use ENI limits to size node groups correctly and avoid mysterious pod scheduling failures. In production, always set max-pods to match the instance's ENI/IP capacity, and monitor the VPC CNI's IP pool metrics. Overprovisioning pods beyond the ENI limit leads to unpredictable scaling behavior and wasted debugging time.
EKS Control Plane Internals
AWS EKS runs the Kubernetes control plane (etcd, API server, controller manager, scheduler) in a separate AWS account that you never see. It's fully managed, meaning AWS handles upgrades, patching, and multi-AZ high availability as part of the per-cluster fee. But 'managed' doesn't mean you can ignore it. The control plane exposes a public or private API server endpoint. You interact with it exactly like a vanilla Kubernetes API server, but there are subtle differences. For example, you cannot directly access etcd; AWS abstracts it completely. Audit logs are off by default: you enable them in the cluster's control plane logging settings, and they are delivered to CloudWatch Logs. And the API server is fronted by an Elastic Load Balancer (NLB) that can be internal or public. The control plane also includes AWS-specific admission webhooks, like the pod identity webhook that injects IRSA credentials configuration into pods whose service account carries an IAM role annotation.
Under the hood, AWS runs the Kubernetes control plane components as containers on EC2 instances in its own infrastructure. They're isolated per tenant. The etcd nodes are encrypted at rest and in transit. If you need to inspect control plane health, you rely on Prometheus-format metrics like apiserver_request_duration_seconds or etcd_request_duration_seconds, which the API server exposes on its /metrics endpoint. You can scrape them into Amazon Managed Service for Prometheus (AMP), and AWS surfaces a subset in CloudWatch. In production, the most common control plane issue is throttling from the API server when your clients or watches generate too many requests per second. AWS applies rate limits at the NLB and the API server itself.
Debugging hint: If you see 429 Too Many Requests from kubectl, you've hit the API server rate limit. Enable retries with backoff in your automation tools.
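Retrying with backoff can be sketched as below. This is a generic illustration, not tied to any particular Kubernetes client: TooManyRequests is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and call_with_backoff is a name made up for this example.

```python
import random
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP 429 raised by your API client."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(); on TooManyRequests, wait with exponential backoff and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TooManyRequests:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: sleep a random time up to the capped exponential delay,
            # so many throttled clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

The sleep parameter is injectable so the behaviour can be tested without real waiting. Jitter matters here because synchronized retries from many controllers tend to re-trigger the same throttling.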
- You cannot modify etcd, Certificate Authority, or scheduler directly.
- Amazon provides CloudWatch metrics, audit logs, and event streams to peer inside.
- You are responsible for client-side retries, network latency, and API request rates.
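- AWS applies platform patches to the control plane automatically; Kubernetes version upgrades are initiated by you, although AWS eventually upgrades clusters whose version has reached end of support.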
etcd_request_duration_seconds p99: if >1s, investigate controller load.
VPC CNI Networking: Pod IP Allocation and Limits
The AWS VPC CNI is what makes EKS unique among Kubernetes distributions. Instead of using an overlay network (like Flannel or Calico in IPIP mode), each pod gets a real IP address from your VPC subnet. This means pods can communicate with other VPC resources (RDS, EC2, Lambda) without any NAT or proxy. The CNI achieves this by attaching multiple Elastic Network Interfaces (ENIs) to each EC2 node and assigning secondary IP addresses from those ENIs to pods. When a pod is created, the CNI plugin picks an unused IP from a warm pool and assigns it to the pod's network namespace. This setup gives native VPC integration, but it comes with hard limits.
The first limit is per-instance ENI count. An m5.large has a maximum of 3 ENIs, each with up to 10 IPs; because each ENI's primary IP is not handed to pods, that works out to a max-pods value of 29 (3 × 9 + 2). An m5.4xlarge supports 8 ENIs × 30 IPs, for a max-pods value of 234. You hit these limits faster than you think. The second limit is IP address exhaustion in your VPC subnet. If you try to run 1000 pods on a /24 subnet (256 addresses, 251 usable after the 5 that AWS reserves), you'll quickly run out. The CNI also competes with other services for IPs.
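Subnet capacity is easy to check before you hit it. The sketch below uses Python's standard ipaddress module and assumes the usual VPC rule that 5 addresses per subnet are reserved by AWS; the function names are illustrative, and the estimate ignores the IPs consumed by nodes, load balancers, and warm pools.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Addresses in a VPC subnet that can actually be assigned."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def subnets_needed(pod_count: int, cidr: str) -> int:
    """Lower bound on subnets of this size needed for pod_count pod IPs."""
    per_subnet = usable_ips(cidr)
    return -(-pod_count // per_subnet)  # ceiling division


if __name__ == "__main__":
    print(usable_ips("10.0.1.0/24"))
    print(subnets_needed(1000, "10.0.1.0/24"))
```

Treat the result as a floor: real clusters need headroom for rolling updates, where old and new pods hold IPs at the same time.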
To scale beyond these limits, AWS offers custom networking (assign pods IPs from different subnet ranges) and prefix delegation (assign /28 prefixes to ENIs, giving many more IPs per ENI). Prefix delegation is enabled by setting the ENABLE_PREFIX_DELEGATION environment variable to true on the aws-node daemonset. It requires Nitro-based instances and is off by default in the VPC CNI, so enable it explicitly.
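The effect of prefix delegation on pod density can be estimated as follows. This is a sketch under stated assumptions: each address slot on an ENI holds a /28 prefix (16 IPs) instead of a single IP, and the result is capped at 110 pods for instances with 30 or fewer vCPUs and 250 for larger ones, mirroring the recommendation built into the AWS max-pods calculator. The function name and the cap thresholds as written here are assumptions, not values read from your cluster.

```python
IPS_PER_PREFIX = 16  # a /28 prefix holds 16 addresses


def max_pods_prefix_delegation(enis: int, ips_per_eni: int, vcpus: int) -> int:
    """Estimated max-pods with prefix delegation enabled."""
    # Each non-primary slot on an ENI carries a /28 prefix instead of one IP.
    raw = enis * (ips_per_eni - 1) * IPS_PER_PREFIX + 2
    # Assumed recommended ceiling: 110 pods up to 30 vCPUs, 250 above that.
    cap = 250 if vcpus > 30 else 110
    return min(raw, cap)


if __name__ == "__main__":
    # m5.large: 3 ENIs, 10 IPs per ENI, 2 vCPUs
    print(max_pods_prefix_delegation(3, 10, 2))
```

With prefix delegation the ENI arithmetic is rarely the binding constraint; the recommended ceiling and the subnet's free /28 blocks usually are.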
Another common production trap: The CNI plugin uses the EC2 API to attach/detach ENIs, which means the node's IAM role needs specific permissions (ec2:AttachNetworkInterface, ec2:CreateNetworkInterface, etc.). If those permissions are missing, the CNI silently fails, and pods stay Pending with a FailedCreatePodSandBox error.
- Use the max-pods calculator (available in AWS docs) to determine the right instance size.
- Enable prefix delegation (ENABLE_PREFIX_DELEGATION=true). This gives up to 110 pods per node.
Node Group Strategies: Managed vs Self-Managed vs Fargate
EKS offers three modes to run worker nodes: managed node groups, self-managed node groups, and AWS Fargate. Each has trade-offs in operational overhead, cost, and flexibility.
Managed node groups are the default choice. You specify an AMI family (Amazon Linux 2 or Bottlerocket), instance type, and scaling config. AWS publishes patched AMIs and, when you trigger a node group update, rolls them out by draining and replacing nodes gracefully. You also get automatic security group rules for the control plane. The downside? Customization is narrower. You can supply a custom AMI through a launch template, but then AMI patching becomes your job again, and deep changes such as custom kernel parameters or pre-installed agents are more awkward than on nodes you build yourself.
Self-managed node groups give you full control. You launch EC2 instances, install kubelet, and join them to the cluster. You control the AMI, the bootstrap script, and the lifecycle policies. This is necessary for GPU-heavy workloads (like ML training) or when you need to pin specific kernel versions. The trade-off is operational burden: you must manage AMI updates, security patches, and node replacement yourself.
Fargate is the serverless option. You define a Fargate profile, and pods that match its namespace and label selectors run on Fargate instead of EC2. This is ideal for batch jobs, CI/CD runners, or sporadic workloads that don't justify always-on instances. The catch: you pay per pod-second, with a minimum charge of 1 minute. Fargate also caps a pod at 16 vCPU and 120 GB of memory, and you cannot run daemonsets or privileged containers.
A real-world pattern: Run critical microservices on managed node groups (steady state), bursty data processing on Fargate, and GPU training on a self-managed node group with custom AMI.
IAM Roles for Service Accounts (IRSA) — How Pods Get AWS Credentials
Before IRSA, you either stored AWS access keys in Kubernetes Secrets or leaned on the node's instance profile, often with kube2iam or kiam to proxy credentials. Both were fragile. IRSA solves this by using OpenID Connect (OIDC) federation. Every EKS cluster has an OIDC issuer URL (e.g., oidc.eks.<region>.amazonaws.com/id/XXXXX). You register that issuer as an IAM OIDC identity provider and create an IAM role with a trust policy that allows identities from the provider to assume the role, restricted to a specific service account in a specific namespace. The Kubernetes API server issues a signed token (a JWT) that includes the pod's service account and namespace. The EKS pod identity mutating webhook mounts the token into the pod as a projected volume named aws-iam-token (by default at /var/run/secrets/eks.amazonaws.com/serviceaccount/token). The AWS SDK uses a credential chain that picks up this token and calls sts:AssumeRoleWithWebIdentity to get temporary AWS credentials.
Common pitfalls: The IAM OIDC provider must be registered with the correct thumbprint for the cluster's issuer certificate (you set this during IRSA setup). If the token's aud claim doesn't match the audience the trust policy expects (sts.amazonaws.com), the token is rejected. The token is short-lived (24 hours by default with IRSA), but the kubelet rotates the file and the AWS SDK re-reads it automatically, as long as the pod can reach the STS endpoint. If your VPC doesn't have internet access and you're using a private cluster, you need an interface VPC endpoint for STS (com.amazonaws.<region>.sts).
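When a token is rejected, inspecting its claims is usually the fastest check. The sketch below decodes a JWT payload using only the standard library and checks the aud claim. It deliberately does not verify the signature, so it is a debugging aid only; the function names are illustrative.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def audience_ok(claims: dict, expected: str = "sts.amazonaws.com") -> bool:
    """True if the expected audience appears in the token's aud claim."""
    aud = claims.get("aud", [])
    if isinstance(aud, str):  # aud may be a single string or a list
        aud = [aud]
    return expected in aud
```

Inside a pod you would read the token from the projected file path mentioned above and pass its contents to decode_jwt_claims; the sub claim should read system:serviceaccount:<namespace>:<service-account>.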
Another gotcha: The token is issued by the Kubernetes API server, not by AWS STS. It's literally a Kubernetes service account token. STS verifies the token's signature using the public keys published at the cluster's OIDC issuer URL. If the thumbprint registered on the IAM OIDC provider stops matching the issuer's certificate, token validation fails, and pods lose all AWS permissions until you update the thumbprint on the IAM OIDC provider (it does not live in the role's trust policy).
- The K8s API server signs a JWT containing the service account identity.
- The pod presents this token to STS through the AWS SDK.
- STS verifies the token's signature with the OIDC provider's public keys.
- Result: temporary AWS credentials (valid for up to 1 hour, auto-refreshed).
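The trust policy that ties these steps together can be sketched as a small builder. The account ID, issuer, namespace, and service account used in the example call are placeholders, and the function name is made up for illustration; the policy shape (federated principal, sts:AssumeRoleWithWebIdentity, conditions on the issuer's aud and sub claims) follows the standard IRSA pattern.

```python
import json


def irsa_trust_policy(account_id: str, oidc_issuer: str,
                      namespace: str, service_account: str) -> dict:
    """Build an IAM trust policy scoped to one Kubernetes service account.

    oidc_issuer is the issuer host and path WITHOUT the https:// prefix.
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_issuer}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                # Both claims must match, or STS rejects the token.
                f"{oidc_issuer}:aud": "sts.amazonaws.com",
                f"{oidc_issuer}:sub":
                    f"system:serviceaccount:{namespace}:{service_account}",
            }},
        }],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = irsa_trust_policy(
        "111122223333", "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
        "default", "my-app")
    print(json.dumps(policy, indent=2))
```

Pinning the sub condition to a single namespace and service account is what limits blast radius; a wildcard there lets any pod in the cluster assume the role.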
Production Gotchas and How to Avoid Them
Even with managed services, EKS clusters suffer from specific failure patterns. Here are the ones we've seen destroy weekends.
CNI plugin race condition: When nodes start, the aws-node daemonset must be running before pods can get IPs. If the node's kubelet starts too fast, pods may try to launch before the CNI is ready, leading to a FailedCreatePodSandBox. This is especially common during node group scaling events. Fix: Add an init container or delay kubelet readiness until the CNI is ready.
Cluster Autoscaler not scaling down: Start by finding what is blocking scale-down (e.g., pods annotated with cluster-autoscaler.kubernetes.io/safe-to-evict: false). A common culprit is kube-system pods that are not backed by a PDB (Pod Disruption Budget). By default the autoscaler will not remove a node running non-DaemonSet kube-system pods unless a PDB covers them, because it can't guarantee safe eviction. Fix: Always add PDBs for critical system pods.
AWS Load Balancer Controller (ALB/NLB) misconfig: The controller needs IAM permissions to create Target Groups, Listeners, etc. If the integration test passes but production times out, check for subnet tags — the subnets where ALBs are created must be tagged with kubernetes.io/role/elb (public) or kubernetes.io/role/internal-elb (private). Without those tags, the controller silently fails to provision load balancers.
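A quick audit for the missing-tag case can look like this. The sketch assumes subnet records in the shape returned by the EC2 DescribeSubnets API (a SubnetId plus a Tags list of Key/Value pairs); fetching them is left out, and the function name is illustrative.

```python
ELB_ROLE_TAGS = ("kubernetes.io/role/elb", "kubernetes.io/role/internal-elb")


def subnets_missing_elb_tag(subnets: list) -> list:
    """Return IDs of subnets that carry neither load balancer role tag."""
    missing = []
    for subnet in subnets:
        keys = {tag["Key"] for tag in subnet.get("Tags", [])}
        if not keys.intersection(ELB_ROLE_TAGS):
            missing.append(subnet["SubnetId"])
    return missing


if __name__ == "__main__":
    # Hypothetical sample data in DescribeSubnets shape.
    sample = [
        {"SubnetId": "subnet-aaa",
         "Tags": [{"Key": "kubernetes.io/role/elb", "Value": "1"}]},
        {"SubnetId": "subnet-bbb", "Tags": [{"Key": "Name", "Value": "db"}]},
    ]
    print(subnets_missing_elb_tag(sample))
```

Running a check like this in CI against the cluster's subnets catches the problem before an Ingress sits without an address.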
DNS resolution issues with CoreDNS: CoreDNS pods sometimes get scheduled on nodes under memory pressure, causing them to be OOMKilled. This leads to intermittent DNS failures across the cluster. Fix: Set resource requests and limits on CoreDNS, and consider deploying two replicas on different node types.
EBS CSI driver not installed: If you run stateful workloads, you need the EBS CSI driver. Many teams miss this and see pods stuck in ContainerCreating with an event regarding volumes. Fix: Verify the EBS CSI add-on is enabled in the EKS console or via eksctl.
Security group limits: By default a network interface can have up to 5 security groups attached (an adjustable quota). If you use many security groups per pod (via the VPC CNI's security groups for pods feature), you'll hit that limit. Plan your security groups carefully.
eks-node-viewer tool to visualize pod and node capacity in real-time.
EBS CSI Driver: Why Your Stateful Workloads Randomly Die
You deployed a StatefulSet with persistent volumes. Everything works in dev. Then in production, pods start crashing with "Volume is already attached to another node" or "disk full" errors that don't match your monitoring. That's the EBS CSI driver not being configured properly. Default storage classes are your enemy. They use gp2 volumes with no IOPS guarantees and attach volumes to nodes via the legacy in-tree driver, which doesn't handle reattachment gracefully.
The fix: install the EBS CSI driver as an EKS managed add-on (via the EKS console or eksctl) rather than as a hand-applied manifest. Then create a StorageClass that explicitly sets volumeBindingMode: WaitForFirstConsumer. This delays volume creation until Kubernetes schedules the pod, ensuring the volume is created in the same AZ as the node. Without this, a volume can be created in a different AZ from the node the pod lands on, and because an EBS volume can only attach to instances in its own AZ, the attach fails.
Use gp3 as your default. It's cheaper, faster, and you can increase baseline IOPS without increasing size. Never use the default gp2 StorageClass in production.
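A gp3 StorageClass along those lines can be generated as a plain manifest. This sketch builds the object as a Python dict and prints JSON, which kubectl apply -f accepts; the class name and the optional iops and throughput values are illustrative choices, and ebs.csi.aws.com is the provisioner name used by the EBS CSI driver.

```python
import json


def gp3_storage_class(name: str = "gp3", iops=None, throughput=None) -> dict:
    """Build a StorageClass manifest for gp3 volumes via the EBS CSI driver."""
    params = {"type": "gp3"}
    # StorageClass parameters must be strings, so numbers are converted.
    if iops is not None:
        params["iops"] = str(iops)
    if throughput is not None:
        params["throughput"] = str(throughput)
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        # Create the volume only once the pod is scheduled, in the node's AZ.
        "volumeBindingMode": "WaitForFirstConsumer",
        "parameters": params,
    }


if __name__ == "__main__":
    print(json.dumps(gp3_storage_class(iops=4000), indent=2))
```

Making this class the cluster default is a separate step (an annotation on the StorageClass) and is not shown here.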
Cluster Autoscaler vs Karpenter: When to Throw Out the Default
Cluster Autoscaler is the default choice for scaling EKS node groups, but it has a fundamental flaw: it waits for unschedulable pods before it acts. That means cold starts, wasted spot capacity, and no awareness of pod resource fragmentation. Karpenter solves this by watching Pod specs directly and provisioning optimal instances within seconds—no node group management required. The real question isn't which tool to use, but when. Stick with Cluster Autoscaler if you have static node groups, strict compliance zones, or your cluster lives in accounts where Karpenter's IRSA permissions are blocked. Throw it out the moment you run spot instances, burst workloads, or any scenario where delaying a Pod by 60 seconds costs money. Karpenter wins on speed and cost—Cluster Autoscaler wins on simplicity and auditability. Pick the one that matches your operational risk tolerance, not the one that markets better.
Common Mistakes That Burn Junior Teams (And How to Avoid Them)
You think deploying on EKS is just spinning up nodes and running kubectl? Wrong. The most expensive mistake is treating EKS like a DIY Kubernetes cluster. The control plane is managed, but you still own networking, IAM, and storage. Ignore that, and you get mysterious pod evictions and cost overruns.
Second mistake: overprovisioning node groups without understanding VPC CNI limits. Each node has a max pod count based on instance type ENI limits. Exceed that and pods hang in Pending state, not failing visibly. Third: skipping pod resource requests. Without them, pods run in the BestEffort QoS class: the scheduler cannot account for their memory when placing them, and they are the first to be evicted or OOM-killed during spikes. Fix these three, and you stop debugging symptoms and start shipping features.
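How requests and limits map to a pod's QoS class can be sketched as below. This is a simplified model of the Kubernetes rules, considering only cpu and memory and comparing quantity strings literally (so "1" and "1000m" are treated as different); the input shape and the function name are illustrative.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list) -> str:
    """Classify a pod from its containers' requests and limits.

    Each container is a dict like {"requests": {...}, "limits": {...}}.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if any(r in requests or r in limits for r in RESOURCES):
            any_set = True
        for resource in RESOURCES:
            limit = limits.get(resource)
            # A request defaults to the limit when only the limit is given.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"  # first in line for eviction under node pressure
    return "Guaranteed" if guaranteed else "Burstable"
```

A pod that sets nothing lands in BestEffort, which is why workloads without requests are the ones that disappear first when a node runs short of memory.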
Best Practices That Drastically Reduce EKS Complexity
Stop treating EKS like a pet cluster. Use blue/green node group upgrades with Karpenter instead of Cluster Autoscaler for zero-downtime scaling. Why? Karpenter provisions instances in seconds based on pod resource requirements, not node counts. It also supports spot instances natively, cutting costs 60-70% for stateless workloads.
Second: enforce IRSA (IAM Roles for Service Accounts) for every pod that talks to AWS. No hardcoded keys, and no application permissions on the node IAM role. IRSA eliminates credential leaks and limits blast radius. Third: use AWS EBS CSI driver with gp3 volumes and snapshot schedules. This prevents the random storage failures that plague stateful workloads in production. And always tag nodes with lifecycle and team labels. Without tags, autoscaling groups become black boxes for cost allocation.
EKS Pricing: Why Your Bill Explodes and How to Cap It
EKS pricing is deceptively simple: $0.10 per hour per cluster control plane. The real cost is the infrastructure you attach. Fargate pods cost $0.02–$0.06 per vCPU-hour plus memory, while managed node groups bill you for EC2 instances (t3.medium at ~$0.0416/hr), EBS volumes ($0.08/GB-month), and data transfer. The trap: idle nodes, oversized instance types, and NAT gateway data processing ($0.045/GB) for private subnets. A 3-node cluster running 24/7 can cost $400–$1,200/month before storage. To cap costs: use Karpenter with spot instances (60–90% discount), right-size pods via VPA, and delete unused load balancers. Always set a budget alert at 80% of your projected spend.
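The line items above can be combined into a rough monthly estimate. This sketch uses the prices quoted in this section as defaults and assumes 730 hours per month; it leaves out NAT gateway hourly charges, load balancers, and Fargate, and the function name is illustrative. Check current regional pricing before relying on any number.

```python
HOURS_PER_MONTH = 730  # average hours in a month, a common approximation


def monthly_cost(node_count: int, node_hourly_usd: float,
                 control_plane_hourly_usd: float = 0.10,
                 ebs_gb: float = 0, ebs_usd_per_gb_month: float = 0.08,
                 nat_gb: float = 0, nat_usd_per_gb: float = 0.045) -> float:
    """Rough monthly cost in USD for one cluster, from the listed line items."""
    control_plane = control_plane_hourly_usd * HOURS_PER_MONTH
    nodes = node_count * node_hourly_usd * HOURS_PER_MONTH
    storage = ebs_gb * ebs_usd_per_gb_month
    nat_processing = nat_gb * nat_usd_per_gb
    return round(control_plane + nodes + storage + nat_processing, 2)


if __name__ == "__main__":
    # Three t3.medium nodes, 300 GB of EBS, 1 TB through the NAT gateway.
    print(monthly_cost(3, 0.0416, ebs_gb=300, nat_gb=1000))
```

Plugging in your own node sizes and traffic shows quickly which line item dominates, which is where right-sizing effort should go first.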
kubectl get ingress -A and delete orphaned resources.
Use Cases of AWS EKS: Where It Wins and Where It Doesn't
EKS shines in hybrid workloads: batch processing with millions of short-lived jobs, real-time inference serving at scale, and multi-tenant SaaS backends needing strict IAM isolation. Top use cases: 1) CI/CD pipelines — spin up Jenkins agents or GitLab runners on spot instances, tear down instantly. 2) Microservices with service mesh (Istio/App Mesh) for observability and traffic splitting. 3) Data processing with Spark on Kubernetes — EKS handles autoscaling of 1,000+ nodes. Where it fails: single-instance stateful apps (use ECS Fargate instead for lower cost), latency-sensitive trading systems (EC2 with Elastic Fabric Adapter wins), and teams with no Kubernetes expertise (you'll burn budget on cluster misconfiguration). EKS is production-grade for organizations that already containerize.
The Great Pod Scheduling Freeze: ENI Limit Nightmare
kubectl describe pod shows events like '0/6 nodes are available: 2 Insufficient cpu, 4 pod does not fit into any node due to pod limit.'
Prefix delegation was enabled on the VPC CNI (ENABLE_PREFIX_DELEGATION), which assigns /28 prefixes instead of single IPs, effectively increasing the pod density per node. The number of subnets was also increased to distribute IP pressure.
- Never assume node resource metrics tell the whole story; check IP address pool size.
- ENI limits per instance type are a hard ceiling; plan your max pods per node before choosing instance types.
- Use prefix delegation or custom networking to scale beyond the default IP allocation model.
aws ec2 describe-network-interfaces --filters Name=status,Values=in-use --query 'NetworkInterfaces[*].[PrivateIpAddress]' --output text and compare to node max pods. Run kubectl describe node <node> for allocatable pods.
kubectl exec -it <pod> -- nslookup <service> from a test pod. Review VPC security group rules and network policy: CoreDNS must be able to reach the API server.
aws sts assume-role-with-web-identity manually from a pod to test the token.
kubectl describe pod <pod>
kubectl describe node $(kubectl get pod <pod> -o jsonpath='{.spec.nodeName}') | grep -A5 'Allocated resources'
Key takeaways
Common mistakes to avoid
3 patterns
Not planning IP address exhaustion in subnets
Assuming managed node groups handle all security patching
Running critical workloads without Pod Disruption Budgets
minAvailable: 2 for replicas >= 3. Test node group updates in staging first.
Interview Questions on This Topic
How does EKS VPC CNI assign IPs to pods? What limit does it have?
Frequently Asked Questions