Fargate is AWS's serverless compute engine for running containers without managing servers
You define CPU and memory per task — Fargate provisions and scales infrastructure automatically
Works with both Amazon ECS and Amazon EKS for orchestration
You pay per second for vCPU and memory resources allocated to running tasks
Production use requires careful networking, IAM, and logging configuration
Biggest mistake: over-provisioning CPU and memory per task, inflating costs by 3-5x
What is AWS Fargate — ENI IP Exhaustion Blocks Deployments Silently?
Plain-English First
Fargate is like renting individual apartments instead of buying an entire building. With EC2, you own the building and manage everything — plumbing, electricity, maintenance. With Fargate, you rent just the space you need, and AWS handles the building. You bring your containers, specify how much CPU and memory they need, and Fargate runs them without you ever seeing a server.
AWS Fargate is a serverless compute engine for containers that eliminates the need to provision, configure, or scale virtual machine clusters. You package your application as a container image, define resource requirements, and Fargate runs it on infrastructure managed entirely by AWS.
Fargate shifts operational burden from managing EC2 instance fleets to defining task-level resource requirements. This simplifies capacity planning but introduces new challenges around networking configuration, IAM task roles, cold start latency, and cost optimization at scale. Production deployments require understanding these trade-offs before committing to Fargate over EC2 launch type.
What Is AWS Fargate?
AWS Fargate is a serverless compute engine for containers that works with both Amazon ECS and Amazon EKS. It removes the need to manage EC2 instances — you define container images, CPU, memory, and networking requirements, and Fargate provisions the underlying infrastructure to run your containers.
Fargate assigns each task its own kernel runtime environment and elastic network interface (ENI). This provides task-level isolation comparable to running containers on dedicated EC2 instances, without the operational overhead of managing the instance fleet.
The core abstraction is the task — a set of one or more containers that share a network namespace and storage volumes. You define tasks in a task definition, which specifies the container image, resource requirements, IAM roles, logging configuration, and networking mode. ECS or EKS schedules these tasks onto Fargate-managed infrastructure.
Each task gets its own ENI, kernel, and resource isolation — no shared host contention
You define CPU and memory per task, not per cluster — capacity planning is task-level
Fargate works with ECS and EKS — same serverless model for both orchestrators
You pay per second for vCPU and memory allocated to running tasks only
No SSH access to underlying infrastructure — all debugging happens through logs and APIs
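To make the task definition concrete, here is a minimal sketch that builds the payload as a plain dictionary and rejects CPU/memory pairs Fargate does not accept. The function name `build_task_definition` and every argument value are hypothetical; the keys follow the ECS RegisterTaskDefinition request shape, and the size table reflects the published Fargate task sizes as best understood here, so check it against current AWS documentation before relying on it.

```python
"""Sketch: build a Fargate task definition payload and validate its size."""

# Fargate CPU units -> allowed memory values in MiB (assumed from AWS docs).
VALID_MEMORY_MIB = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),
}


def build_task_definition(family, image, cpu, memory_mib,
                          execution_role_arn, task_role_arn,
                          log_group, region):
    """Return a dict shaped like an ECS RegisterTaskDefinition request."""
    if memory_mib not in VALID_MEMORY_MIB.get(cpu, []):
        raise ValueError(f"cpu={cpu} does not support memory={memory_mib} MiB")
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",            # the only mode Fargate supports
        "cpu": str(cpu),                    # task-level sizes are strings
        "memory": str(memory_mib),
        "executionRoleArn": execution_role_arn,  # pulls images, writes logs
        "taskRoleArn": task_role_arn,            # application AWS API access
        "containerDefinitions": [{
            "name": family,
            "image": image,
            "essential": True,
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": log_group,
                    "awslogs-region": region,
                    "awslogs-stream-prefix": family,
                },
            },
        }],
    }
```

With boto3 installed and credentials configured, the result could be passed along as `boto3.client("ecs").register_task_definition(**payload)`; that call is left out of the sketch so it runs without AWS access.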
Production Insight
Fargate tasks are immutable and there is no host to SSH into.
Troubleshooting happens through CloudWatch Logs, ECS APIs, and ECS Exec (aws ecs execute-command) where it is enabled.
Rule: invest in structured logging and health checks before deploying to Fargate.
Key Takeaway
Fargate runs containers without managing servers — you define tasks, AWS handles infrastructure.
Each task gets its own network interface and resource isolation.
Choose Fargate for operational simplicity, EC2 for cost optimization at steady scale.
Fargate vs EC2 Launch Type
If: Workload has predictable, steady-state traffic → Use: Consider EC2 with Savings Plans for lower cost at consistent utilization
If: Workload is bursty or has variable scaling patterns → Use: Fargate, paying only for running tasks with no idle instance cost
If: Tasks require GPU, memory beyond the 120 GB per-task limit, or specific instance types → Use: EC2, since Fargate has CPU/memory limits and no GPU support
If: Team wants minimal operational overhead → Use: Fargate, with no instance patching, AMI management, or capacity planning
Fargate Networking and Security
Fargate tasks run in awsvpc mode — each task receives its own elastic network interface (ENI) with a private IP address in your VPC subnet. This provides VPC-level security controls through security groups and network ACLs, but requires careful subnet planning to avoid IP exhaustion.
Networking decisions have cost and performance implications. Tasks in private subnets require a NAT gateway for outbound internet access, which adds data processing charges. VPC endpoints for AWS services (S3, ECR, CloudWatch, Secrets Manager) eliminate NAT gateway costs for service-to-service communication.
Security follows the principle of least privilege through two IAM roles per task: the execution role (pulling images, writing logs, fetching secrets) and the task role (application-level AWS API access). Separating these roles ensures the task can only access resources it actually needs.
Each Fargate task consumes one ENI with a private IP — plan subnet CIDR blocks for peak task count
Tasks in private subnets without NAT gateway or VPC endpoints cannot pull images from ECR
Security groups on Fargate tasks must allow outbound HTTPS (443) for ECR, Secrets Manager, and CloudWatch
Cross-AZ traffic between Fargate tasks and RDS incurs data transfer charges
Public IP on Fargate tasks exposes them directly to the internet — always use private subnets with ALB
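As a rough illustration of the endpoint planning above, the sketch below lists the VPC endpoint service names a private-subnet Fargate task typically needs. The helper name `required_vpc_endpoints` is hypothetical; the service names follow the `com.amazonaws.REGION.SERVICE` pattern this article uses for ECR, and the S3 entry is a gateway endpoint because ECR stores image layers in S3.

```python
"""Sketch: list the VPC endpoints a private-subnet Fargate task relies on."""


def required_vpc_endpoints(region):
    """Return (service_name, endpoint_type) pairs for the given region."""
    interface_services = ["ecr.api", "ecr.dkr", "logs", "secretsmanager"]
    endpoints = [
        (f"com.amazonaws.{region}.{service}", "Interface")
        for service in interface_services
    ]
    # S3 uses a gateway endpoint, attached to route tables rather than subnets.
    endpoints.append((f"com.amazonaws.{region}.s3", "Gateway"))
    return endpoints


if __name__ == "__main__":
    for name, kind in required_vpc_endpoints("us-east-1"):
        print(f"{kind:<9} {name}")
```

Interface endpoints also need a security group that allows inbound HTTPS (443) from the task security group, which mirrors the outbound rule listed above.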
Production Insight
VPC endpoints eliminate NAT gateway data processing charges for AWS service access.
NAT gateway data processing runs about $0.045/GB (us-east-1), so every ECR image pull through NAT is billed by image size; at scale this adds up fast.
Rule: create VPC endpoints for ECR, S3, CloudWatch, and Secrets Manager immediately.
Key Takeaway
Fargate tasks run in awsvpc mode with dedicated ENIs in your VPC subnets.
VPC endpoints for AWS services eliminate NAT gateway costs and improve reliability.
Separate execution role (infrastructure) from task role (application) for least privilege.
Fargate Pricing and Cost Optimization
Fargate pricing is based on vCPU and memory resources allocated to running tasks, billed per second with a one-minute minimum. This model eliminates idle capacity costs but requires right-sizing tasks to avoid over-provisioning.
Cost optimization in Fargate centers on three levers: right-sizing CPU and memory allocations, using Fargate Spot for fault-tolerant workloads, and consolidating containers into fewer, larger tasks. Most production Fargate deployments overspend by 30-50% due to inflated resource requests that do not match actual utilization.
Fargate Spot provides up to 70% cost reduction for interrupt-tolerant workloads like batch processing, CI/CD pipelines, and stateless workers. Spot tasks can be interrupted with two minutes' notice, requiring graceful shutdown handling in your application.
Right-size tasks using CloudWatch Container Insights — most deployments over-provision by 2-3x
Use Fargate Spot for batch jobs, CI/CD workers, and stateless background tasks (70% savings)
Consolidate sidecar containers into the main task to reduce per-task overhead
Schedule non-production tasks to stop outside business hours using EventBridge + Lambda
Use Compute Savings Plans for predictable Fargate workloads (up to 17% savings)
Production Insight
Fargate Spot interruptions come with a two-minute warning via ECS.
Applications must handle SIGTERM gracefully to avoid data loss.
Rule: implement graceful shutdown hooks before using Fargate Spot in production.
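A graceful shutdown hook can be as small as the sketch below. It assumes a polling worker; `fetch_job` and `process_job` are hypothetical callables you would supply. The handler only sets a flag, so the job in flight finishes before the loop exits.

```python
"""Sketch: finish the current job and exit cleanly when SIGTERM arrives."""
import signal
import threading

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; the worker loop checks the flag."""
    shutdown_requested.set()


def install_sigterm_handler():
    """Call once at start-up, from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(fetch_job, process_job, poll_seconds=1.0):
    """Process jobs until shutdown is requested; return the number processed."""
    processed = 0
    while not shutdown_requested.is_set():
        job = fetch_job()
        if job is None:
            # Idle: sleep, but wake immediately if SIGTERM arrives.
            shutdown_requested.wait(poll_seconds)
            continue
        process_job(job)  # never interrupted halfway by the handler
        processed += 1
    return processed
```

The design choice is deliberate: doing real work inside a signal handler is fragile, so the handler flips an event and the main loop decides when it is safe to stop.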
Key Takeaway
Fargate charges per second for allocated vCPU and memory — not actual usage.
Right-sizing tasks based on real utilization saves 30-50% of compute costs.
Fargate Spot provides 70% savings for interrupt-tolerant workloads.
Deploying ECS Services on Fargate
Production ECS services on Fargate require a deployment configuration that handles rolling updates, health checks, auto-scaling, and service discovery. The ECS service abstraction manages task placement, desired count, and deployment strategy across Fargate-managed infrastructure.
Rolling updates with the circuit breaker pattern prevent failed deployments from replacing healthy tasks. The circuit breaker monitors task health and automatically rolls back if new tasks fail to start. Combined with health check grace periods, this prevents deployment cascading failures.
Auto-scaling on Fargate adjusts the desired task count based on CloudWatch metrics — CPU utilization, memory utilization, request count, or custom metrics via Application Auto Scaling. Scaling policies should use target tracking for steady-state adjustments and step scaling for rapid traffic spikes.
Circuit breaker monitors new task health during rolling deployments
If tasks fail to start, ECS automatically rolls back to the previous task definition
Health check grace period gives containers time to initialize before health checks begin
MinimumHealthyPercent: 100 ensures zero-downtime deployments — old tasks stay until new ones are healthy
Auto-scaling adjusts task count based on CPU, memory, or custom CloudWatch metrics
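The settings in this list map onto two request payloads, sketched here as plain dictionaries. The helper names and argument values are hypothetical; the keys follow the ECS CreateService and Application Auto Scaling PutScalingPolicy request shapes. The 65% CPU target is this article's suggestion, not an AWS default.

```python
"""Sketch: deployment and auto-scaling settings for an ECS Fargate service."""


def build_service_config(cluster, service, task_definition, subnets,
                         security_groups, desired_count=2,
                         load_balancer=None, grace_seconds=60):
    """Return a dict shaped like an ECS CreateService request."""
    config = {
        "cluster": cluster,
        "serviceName": service,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        "launchType": "FARGATE",
        "deploymentConfiguration": {
            # Roll back automatically when new tasks fail to start.
            "deploymentCircuitBreaker": {"enable": True, "rollback": True},
            # Keep every old task until its replacement is healthy.
            "minimumHealthyPercent": 100,
            "maximumPercent": 200,
        },
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "DISABLED",
            }
        },
    }
    if load_balancer is not None:
        # The grace period only applies to services behind a load balancer.
        config["loadBalancers"] = [load_balancer]
        config["healthCheckGracePeriodSeconds"] = grace_seconds
    return config


def build_cpu_target_tracking(cluster, service, target_percent=65.0):
    """Return a dict shaped like a PutScalingPolicy request."""
    return {
        "PolicyName": f"{service}-cpu-target",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_percent,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
        },
    }
```

The scaling policy only takes effect after the service has been registered as a scalable target with minimum and maximum task counts.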
Production Insight
Deployment circuit breaker is the most important Fargate production feature.
Without it, a bad image push replaces all healthy tasks with failing ones.
Rule: enable circuit breaker with rollback on every ECS Fargate service.
Key Takeaway
ECS services on Fargate need circuit breaker, health checks, and auto-scaling.
Circuit breaker with rollback prevents bad deployments from taking down production.
Auto-scaling on CPU utilization at 65% target provides headroom for traffic spikes.
Fargate Logging and Observability
Production Fargate workloads require structured logging, distributed tracing, and container-level metrics. Since Fargate provides no SSH access, all observability must be configured through the task definition and external services before deployment.
CloudWatch Logs is the default log driver, but production systems benefit from FireLens — a Fluent Bit-based log router that supports structured JSON output, multi-destination routing, and log filtering. FireLens sends logs to CloudWatch, Datadog, Splunk, or Elasticsearch simultaneously.
Container Insights provides CPU, memory, disk, and network metrics per task and container. Combined with X-Ray for distributed tracing, this creates a complete observability stack for Fargate microservices.
Use FireLens (Fluent Bit) as log router — supports structured JSON and multi-destination output
Enable Container Insights on the ECS cluster for per-task CPU, memory, and network metrics
Add X-Ray sidecar for distributed tracing across microservices
Emit structured JSON logs with correlation IDs — never plain text log lines
Set log retention policy on CloudWatch log groups — default is infinite, which is expensive
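Structured JSON logging with correlation IDs needs nothing beyond the standard library. The sketch below is one possible shape; the field names (`correlation_id`, `service`) are assumptions rather than a required schema, and a log router such as FireLens would pick these lines up from stdout.

```python
"""Sketch: emit one JSON object per log line, with a correlation ID."""
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON document."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Supplied per call through logging's `extra` argument.
            "correlation_id": getattr(record, "correlation_id", None),
            "service": getattr(record, "service", None),
        }
        return json.dumps(payload)


def get_logger(name="app"):
    """Return a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = get_logger()
    log.info("order accepted",
             extra={"correlation_id": "req-123", "service": "checkout"})
```

Writing to stdout rather than a file keeps the container stateless and lets the task definition decide where logs go.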
Production Insight
Fargate has no SSH — logs are the only window into running containers.
Structured JSON logs with correlation IDs are mandatory for debugging microservices.
Rule: configure logging and tracing in the task definition before first deployment.
Key Takeaway
Fargate requires pre-configured observability — no SSH means no after-the-fact debugging.
FireLens routes structured logs to multiple destinations with filtering.
Container Insights and X-Ray provide the metrics and tracing stack for production monitoring.
Fargate Task Lifecycle: The State Machine You Can't Ignore
When a Fargate task dies (and it will), most engineers sit there staring at CloudWatch logs asking why. The real problem isn't debugging the crash. It's understanding the explicit lifecycle states a task moves through on its way to RUNNING and on its way back to STOPPED. Ignore this, and you'll waste hours chasing ghosts.
A Fargate task moves from PROVISIONING to PENDING to ACTIVATING to RUNNING, then through DEACTIVATING, STOPPING, and DEPROVISIONING to STOPPED when it fails or completes. The stages before RUNNING are where the magic, and the pain, lives. That's when Fargate allocates the underlying compute, attaches the ENI, and pulls the container image. If the image pull takes more than two minutes, your task gets recycled.
Here's the production gotcha: you don't get real-time feedback before the container starts. CloudWatch logs aren't streaming yet because the container isn't running. The only signal is the event stream from ECS. Wire that into your alerting pipeline. If a task sits in PENDING for more than 30 seconds, something is starving: usually vCPU quota, subnet capacity, or ECR rate limits.
The state machine is unforgiving. A STOPPED transition with stop code "ServiceSchedulerInitiated" and a stopped reason beginning "Scaling activity initiated by" means your task was healthy but a deployment or scale-in replaced it. That's fine. But "EssentialContainerExited" means your entrypoint blew up. Treat states as signals, not noise.
"Task arn:aws:ecs:us-east-1:123456789012:task/production-backend/abc12345 transitioned to STOPPED. Reason: EssentialContainerExited"
Production Trap:
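Fargate does not quietly retry a crashed container for you. By default, when an essential container exits, the whole task stops; an ECS service then launches a replacement task to hold the desired count, and a standalone task simply stays stopped. There is no RetryLimit setting, and the execution role is an IAM role, not a place for restart settings. If you want a task to crash once and stay dead for debugging, run it as a standalone task with aws ecs run-task or scale the service to zero after the first failure. Otherwise, you'll see ephemeral flashes in CloudWatch and never catch the root cause.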
Key Takeaway
Every Fargate task moves through the same fixed sequence of lifecycle states. Wire the ECS event stream to your logging system on day one, not after the third incident.
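Wiring the ECS event stream into alerting mostly comes down to reading a few fields from each "ECS Task State Change" event. The sketch below assumes the event arrives as a dictionary (for example in a Lambda handler behind an EventBridge rule); the function name and the "page"/"info" severities are illustrative choices, not part of any AWS API.

```python
"""Sketch: turn an ECS task state change event into an alert decision."""

# Stop codes that usually mean the application or its start-up failed.
FAILURE_STOP_CODES = {"EssentialContainerExited", "TaskFailedToStart"}


def classify_task_event(event):
    """Return an alert dict for STOPPED tasks, or None for other states."""
    detail = event.get("detail", {})
    if detail.get("lastStatus") != "STOPPED":
        return None
    stop_code = detail.get("stopCode", "")
    exit_codes = {
        container.get("name"): container.get("exitCode")
        for container in detail.get("containers", [])
    }
    return {
        "task": detail.get("taskArn"),
        "stop_code": stop_code,
        "reason": detail.get("stoppedReason"),
        "exit_codes": exit_codes,
        "severity": "page" if stop_code in FAILURE_STOP_CODES else "info",
    }
```

A scheduler-initiated stop during a deployment comes back as "info", so routine replacements do not wake anyone up.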
Ephemeral Storage Limits: Why Your 20 GB Ephemeral Task Will Silently Fail
AWS Fargate gives you 20 GB of ephemeral storage by default per task. That sounds generous until your ETL job downloads a 10 GB model file, unpacks it, and tries to write a 15 GB result. The task runs for an hour, then exits with no obvious error — just a disk-full message buried in the kernel logs that you'll never see unless you dig into /var/log/messages inside the container.
The fix isn't more storage — it's knowing your data. Fargate supports up to 200 GB per task if you explicitly set ephemeralStorage.sizeInGiB in your task definition. But every extra gig costs you. Budget it like you budget memory and CPU.
Here's the worst part: ephemeral storage is shared across all containers in the same task. If you run a sidecar proxy and a main container, the proxy's log files eat into the same pool as your processing workspace. Set log rotation on every container. Pin your working directory to a mount point you monitor.
And under no circumstance should you rely on ephemeral storage for persistent data. Fargate tasks get recycled on deploy, scale-in, or failure. That 20 GB is gone with the task. If you need durable state, use EFS or S3. If you need scratch space for transient workloads, bump the storage to 60 GB and put a lifecycle check in your entrypoint that bails if disk usage exceeds 80%.
Don't guess storage needs. Add a 10-second health check in your container that writes a marker file, then checks disk usage. If it exceeds 90%, log a WARN and stay alive. You'll catch silent failures before they become production outages. Also: never use docker cp on Fargate — it doesn't work. Always debug with aws ecs execute-command and SSM Session Manager.
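Both disk checks described above fit in a few lines of standard-library Python. This is a sketch under assumptions: the function names are hypothetical, the 80% and 90% thresholds are the ones suggested in this section, and the path you pass should be the scratch directory your task actually writes to.

```python
"""Sketch: entrypoint guard and health probe for ephemeral storage."""
import shutil


def percent_used(used_bytes, total_bytes):
    """Convert raw byte counts into a usage percentage."""
    return 100.0 * used_bytes / total_bytes


def scratch_usage_percent(path):
    """Measure how full the filesystem holding `path` is."""
    usage = shutil.disk_usage(path)
    return percent_used(usage.used, usage.total)


def entrypoint_guard(path, bail_at=80.0):
    """Refuse to start work when the scratch disk is already too full."""
    pct = scratch_usage_percent(path)
    if pct > bail_at:
        raise SystemExit(f"scratch disk at {pct:.1f}%, above {bail_at}% limit")
    return pct


def disk_health(pct, warn_at=90.0):
    """Return 'WARN' past the threshold; the caller logs it and stays alive."""
    return "WARN" if pct > warn_at else "OK"


if __name__ == "__main__":
    current = scratch_usage_percent(".")
    print(f"{disk_health(current)} disk usage {current:.1f}%")
```

Keeping the percentage math separate from the filesystem call makes the thresholds easy to unit test without filling a disk.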
Key Takeaway
Ephemeral storage is shared, limited to 20 GB by default, and cleared on task stop. Always set explicit size, pin logs to EFS or CloudWatch, and monitor disk usage inside the container.
Why Fargate Sprawl Costs You Thousands: Right-Sizing From Day One
Most Fargate bills explode because teams deploy without a sizing strategy. Fargate charges per vCPU and per GB of memory provisioned, not used. A single idle 4-vCPU, 16-GB task costs roughly $170 a month at us-east-1 Linux on-demand rates, and a fleet of them multiplies that quickly. The root cause? Developers default to oversized tasks assuming safety buffers. Stop guessing. Use AWS Compute Optimizer recommendations or run your tasks with CloudWatch Container Insights to observe peak CPU and memory. Right-size each task definition: batch jobs tolerate lower memory, while web servers need memory headroom for request bursts. For spiky workloads, pair Fargate with Application Auto Scaling using target tracking on CPU or request count. This can cut costs by 40-60% without code changes. Remember: Fargate’s elasticity is useless if every task eats capacity you never use. Standardize a sizing review in your CI/CD pipeline and reject any task definition without a justified resource spec.
Auto Scaling policy created. Fargate service will maintain 50% CPU target.
Production Trap:
Setting MinCapacity too high wastes money. Use an Application Auto Scaling scheduled action to drop to 0 at night for dev environments.
Key Takeaway
Always right-size tasks; idle vCPUs burn money. Auto Scale by CPU, not memory.
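Because billing follows provisioned vCPU and memory, the cost of an idle task is simple arithmetic. The sketch below assumes us-east-1 Linux/x86 on-demand rates of $0.04048 per vCPU-hour and $0.004445 per GB-hour and a 730-hour month; those numbers are assumptions that change by region and over time, so treat the output as an estimate.

```python
"""Sketch: estimate monthly Fargate cost and the share spent on idle capacity."""

HOURS_PER_MONTH = 730
VCPU_HOUR_USD = 0.04048   # assumed us-east-1 on-demand rate
GB_HOUR_USD = 0.004445    # assumed us-east-1 on-demand rate


def monthly_task_cost(vcpu, memory_gb, hours=HOURS_PER_MONTH):
    """Cost of one task that runs for `hours` at the provisioned size."""
    return (vcpu * VCPU_HOUR_USD + memory_gb * GB_HOUR_USD) * hours


def monthly_idle_cost(vcpu, memory_gb, cpu_utilization, memory_utilization):
    """Dollars paid each month for capacity the task never touches."""
    idle_vcpu = vcpu * (1.0 - cpu_utilization)
    idle_gb = memory_gb * (1.0 - memory_utilization)
    return monthly_task_cost(idle_vcpu, idle_gb)


if __name__ == "__main__":
    print(f"4 vCPU / 16 GB: ${monthly_task_cost(4, 16):.2f} per month")
    idle = monthly_idle_cost(4, 16, cpu_utilization=0.15, memory_utilization=0.25)
    print(f"idle share at 15% CPU, 25% memory: ${idle:.2f} per month")
```

Multiply by the task count of a service to see why a small per-task oversize turns into a large line item.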
Clean Up Fargate Leftovers Before AWS Bills You for Ghost Resources
Fargate tasks that complete exit instantly, but resources created around them can linger. Elastic network interfaces (ENIs) that were not released cleanly, CloudWatch log groups, and Application Load Balancer (ALB) target groups remain until something removes them. An ENI is not billed by itself, but it holds a subnet IP address, and a public IPv4 address attached to one costs about $3.60/month. Worse, a load balancer left behind bills per hour even with zero healthy targets, and log groups without a retention policy accrue storage charges indefinitely. The why: stopping a task removes the task, not the infrastructure provisioned around it. Fix it by tagging every Fargate task definition and service with an expiry date or environment tag. Then run a nightly Lambda function that finds ENIs left in the 'available' state by stopped tasks and deletes them via the EC2 API. For ephemeral jobs, set a 30-day retention policy on the CloudWatch log groups; retention is a log group setting, not something the task execution role controls. Finally, add a post-deployment step to your CI/CD pipeline that deletes ALB target groups no longer attached to a listener. Cleaning isn't optional: it's a cost discipline.
cleanup-orphaned-enis.yml (YAML)
# io.thecodeforge - devops tutorial
# Lambda to delete orphaned ENIs from stopped Fargate tasks
# CleanupRole is assumed to be defined elsewhere in this template, with
# ec2:DescribeNetworkInterfaces and ec2:DeleteNetworkInterface permissions.
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  CleanupFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Role: !GetAtt CleanupRole.Arn
      Runtime: python3.11
      Code:
        ZipFile: |
          import boto3
          ec2 = boto3.client('ec2')
          def handler(event, context):
              # Only ENIs whose description marks them as ECS-created
              enis = ec2.describe_network_interfaces(
                  Filters=[{'Name': 'description', 'Values': ['AWS ECS *']}]
              )['NetworkInterfaces']
              for eni in enis:
                  # 'available' means the ENI is not attached to anything
                  if eni['Status'] == 'available':
                      eni_id = eni['NetworkInterfaceId']
                      ec2.delete_network_interface(NetworkInterfaceId=eni_id)
                      print(f'Deleted orphaned ENI {eni_id}')
Output
Orphaned ENIs deleted. Logs show removed interface IDs.
Production Trap:
Deleting an ENI attached to a running task breaks the task. Always filter by status 'available'.
Key Takeaway
Orphaned ENIs and ALB target groups bleed costs. Automate nightly cleanup with Lambda.
Production incident · POST-MORTEM · severity: high
Fargate Task ENI Exhaustion Blocked All New Deployments
Symptom
New Fargate tasks stuck in PENDING status indefinitely. ECS service deployments hung at 0% progress. No error messages in CloudWatch — tasks simply never transitioned to RUNNING.
Assumption
Fargate capacity was temporarily unavailable in the us-east-1a Availability Zone.
Root cause
Each Fargate task requires an elastic network interface (ENI) with a private IP address in the VPC subnet. The team used /24 subnets (251 usable IPs) across two Availability Zones. With 120 tasks running and each task consuming one ENI, plus 30 ENIs consumed by NAT gateways, ALBs, and other VPC resources, the subnets were exhausted. New tasks could not be placed because no IP addresses were available. The team had no monitoring on subnet IP utilization.
Fix
Moved the services to /20 subnets (4091 usable IPs) per AZ, carved from a secondary CIDR block, because an existing subnet cannot be resized. Published each subnet's AvailableIpAddressCount (from the EC2 DescribeSubnets API) as a custom CloudWatch metric and added an alarm on it. Set the alarm threshold at 20% remaining IPs. Added subnet IP utilization to the weekly capacity review dashboard.
Key lesson
Each Fargate task consumes one ENI with a private IP — plan subnet sizing for peak task count plus infrastructure overhead
Monitor subnet AvailableIpAddressCount and alert before exhaustion
Use /20 or larger subnets for Fargate workloads to avoid IP exhaustion
Consider AWS VPC Lattice or awsvpc mode alternatives for high-density task deployments
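The subnet arithmetic in this post-mortem is easy to check with the standard library. This sketch assumes AWS reserves five addresses in every subnet, which is where the 251 and 4091 figures come from; the function names and the 20% alarm threshold default are illustrative.

```python
"""Sketch: usable IPs per subnet and a 20%-remaining alarm check."""
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # network, router, DNS, future use, broadcast


def usable_ips(cidr):
    """Addresses in the subnet that tasks and other ENIs can actually use."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def needs_alarm(available_ips, cidr, threshold=0.20):
    """True when the remaining share of usable IPs drops below the threshold."""
    return available_ips / usable_ips(cidr) < threshold


if __name__ == "__main__":
    for cidr in ("10.0.0.0/24", "10.0.16.0/20"):
        print(f"{cidr}: {usable_ips(cidr)} usable IPs")
```

In practice `available_ips` would come from the subnet's AvailableIpAddressCount field, polled on a schedule and compared with this threshold.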
Production debug guide
Common symptoms and actions for Fargate production issues (5 entries)
Symptom · 01
Fargate task stuck in PENDING status
→
Fix
Check subnet available IPs, security group rules, and task execution role permissions. Run: aws ecs describe-tasks --cluster CLUSTER --tasks TASK_ARN --query 'tasks[0].stoppedReason'
Symptom · 02
Fargate task starts then exits immediately
→
Fix
Check CloudWatch Logs for the container. Verify the entrypoint and command in the task definition. Ensure the image exists in ECR with correct permissions.
Symptom · 03
Fargate tasks cannot reach RDS or other AWS services
→
Fix
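Verify the task is in a subnet with a NAT gateway or VPC endpoint. Check the task's security group outbound rules and that the RDS security group allows inbound traffic from it. Verify the task role (not the execution role) has the permissions the application needs.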
Symptom · 04
Fargate deployment takes 5-10 minutes to replace tasks
→
Fix
Check health check grace period and deregistration delay on the target group. Reduce health check interval to 10s and healthy threshold to 2 for faster detection.
Symptom · 05
Fargate costs higher than expected
→
Fix
Review task CPU and memory allocation vs actual usage in CloudWatch Container Insights. Right-size tasks by analyzing p95 utilization over 14 days.
Quick check (execution role): aws iam list-attached-role-policies --role-name ecsTaskExecutionRole --query 'AttachedPolicies[].PolicyArn'
Fix now: Ensure AmazonECSTaskExecutionRolePolicy is attached. If in a private subnet, create ECR VPC endpoints (com.amazonaws.REGION.ecr.api and com.amazonaws.REGION.ecr.dkr).
Fix now (cost): If utilization is consistently below 40%, reduce task CPU/memory. Consider Fargate Spot for non-critical workloads (up to 70% savings).
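The right-sizing advice in this guide (look at p95 utilization, then leave some headroom) can be sketched as a small calculation. The 1.3x headroom and the list of valid vCPU sizes come from this article; the function names are hypothetical and the samples are assumed to be vCPUs actually used, for example exported from Container Insights.

```python
"""Sketch: pick the smallest Fargate vCPU size that covers p95 usage."""

VALID_VCPU = [0.25, 0.5, 1, 2, 4, 8, 16]


def p95(samples):
    """Nearest-rank 95th percentile, using integer math for the rank."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def recommend_vcpu(vcpu_used_samples, headroom=1.3):
    """Return the smallest valid size at or above p95 usage times headroom."""
    target = p95(vcpu_used_samples) * headroom
    for size in VALID_VCPU:
        if size >= target:
            return size
    raise ValueError(f"needs {target:.2f} vCPU, above the Fargate maximum")


if __name__ == "__main__":
    # A task that idles near 0.1 vCPU with occasional spikes to 0.9.
    samples = [0.1] * 95 + [0.9] * 5
    print(f"recommended size: {recommend_vcpu(samples)} vCPU")
```

Using p95 rather than the maximum is a trade-off: brief spikes above the recommendation are throttled instead of paid for around the clock.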
Fargate vs EC2 vs Lambda for Container Workloads
Feature | Fargate | EC2 Launch Type | Lambda (Container Images)
Server Management | Fully managed by AWS | You manage instances | Fully managed by AWS
Max Memory | Up to 120 GB per task | Depends on instance type | Up to 10 GB
Max vCPU | Up to 16 per task | Depends on instance type | Up to 6 vCPU
Execution Duration | Unlimited | Unlimited | 15 minutes max
Networking | ENI per task in VPC | Shared ENI on instance | VPC optional
Cold Start | 30-90 seconds for new tasks | None (instances running) | 1-3 seconds
Cost Model | Per second for vCPU + memory | Per running instance, used or not | Per invocation + duration
Best For | Steady microservices, APIs | Cost-optimized at scale | Event-driven, short tasks
Key takeaways
1. Fargate runs containers without managing servers: you define tasks, AWS handles infrastructure.
2. Each task gets its own ENI and resource isolation, so plan subnet CIDR blocks for peak task count.
3. Right-sizing CPU and memory based on actual utilization saves 30-50% of compute costs.
4. Enable deployment circuit breaker with rollback on every ECS Fargate service.
5. VPC endpoints for ECR, S3, and CloudWatch eliminate NAT gateway charges.
6. Structured logging via FireLens is mandatory: Fargate has no SSH access for debugging.
Common mistakes to avoid
×
Over-provisioning CPU and memory per Fargate task
Symptom
CloudWatch shows 15% CPU utilization and 25% memory utilization — paying for 85% unused resources across all tasks
Fix
Review Container Insights metrics for p95 utilization over 14 days. Right-size tasks to 1.3x actual utilization. Valid Fargate CPU values: 0.25, 0.5, 1, 2, 4, 8, 16 vCPU.
×
Running Fargate tasks in subnets too small for peak task count
Symptom
New tasks stuck in PENDING status — no error message, tasks never transition to RUNNING
Fix
Each task requires one ENI with a private IP. Use /20 subnets (4091 IPs) minimum. Monitor AvailableIpAddressCount with CloudWatch alarms at 20% remaining threshold.
×
Not enabling deployment circuit breaker on ECS services
Symptom
Bad deployment replaces all healthy tasks with failing ones — entire service goes down with no automatic recovery
Fix
Enable DeploymentCircuitBreaker with rollback in the ECS service definition. This automatically reverts to the previous task definition when new tasks fail.
×
Using default log driver without structured logging
Symptom
Debugging production issues requires grep through raw text logs — no correlation IDs, no structured fields, no multi-destination routing
Fix
Use FireLens with Fluent Bit as log router. Emit structured JSON logs with correlation IDs, service name, and request context. Set log retention policy on CloudWatch log groups.
×
Not creating VPC endpoints for AWS services
Symptom
All ECR image pulls and Secrets Manager fetches route through NAT gateway — high data processing charges and single point of failure
Fix
Create VPC endpoints for ECR (api + dkr), S3 (gateway), CloudWatch Logs, and Secrets Manager. This eliminates NAT gateway charges for AWS service communication.
×
Running production and development Fargate tasks in the same cluster without namespace isolation
Symptom
Development task OOM kills affect production task placement. Noisy neighbor issues in shared cluster capacity.
Fix
Use separate ECS clusters for production and development. Tag resources with environment labels. Use cluster capacity providers to isolate Fargate and Fargate Spot workloads.
Interview Questions on This Topic
Q01 of 03 · JUNIOR
What is AWS Fargate and how does it differ from running containers on EC2?
ANSWER
AWS Fargate is a serverless compute engine for containers that works with Amazon ECS and Amazon EKS. The key differences from EC2 launch type:
1. Server management: With Fargate, AWS manages the underlying infrastructure — no EC2 instances to provision, patch, or scale. With EC2, you manage the instance fleet.
2. Resource model: Fargate allocates resources per task (vCPU and memory). EC2 allocates resources per instance, and tasks share instance capacity.
3. Isolation: Each Fargate task gets its own kernel runtime and ENI. On EC2, tasks share the host kernel, and in bridge or host networking mode they also share the instance's network interface.
4. Pricing: Fargate charges per second for allocated task resources. EC2 charges for running instances, regardless of how much of each instance your tasks use.
5. Scaling: With Fargate you scale only the task count, through Application Auto Scaling. EC2 also requires Auto Scaling groups to scale the instance fleet underneath.
Fargate is best for variable workloads and operational simplicity. EC2 is better for cost optimization at steady, predictable scale.
Q02 of 03 · SENIOR
How would you optimize Fargate costs for a production microservices architecture?
ANSWER
Fargate cost optimization follows a systematic approach:
1. Right-sizing: Use Container Insights to measure actual CPU and memory utilization per task over 14 days. Most deployments over-provision by 2-3x. Reduce task resources to 1.3x p95 utilization.
2. Fargate Spot: Use Spot for fault-tolerant workloads — background workers, batch jobs, CI/CD pipelines. Spot provides up to 70% savings. Implement graceful shutdown to handle two-minute interruption notices.
3. Container consolidation: Combine sidecar containers (logging, monitoring) into fewer tasks to reduce per-task overhead. Each task has a base cost regardless of how many containers it runs.
4. VPC endpoints: Create endpoints for ECR, S3, CloudWatch, and Secrets Manager to eliminate NAT gateway data processing charges.
5. Scheduling: Stop non-production tasks outside business hours using EventBridge-triggered Lambda functions that update ECS service desired count to zero.
6. Compute Savings Plans: Purchase 1-year or 3-year Savings Plans for predictable baseline workloads — up to 17% savings.
The most impactful single action is right-sizing — it typically saves 30-50% without any functionality changes.
Q03 of 03 · SENIOR
A production Fargate service is experiencing intermittent task placement failures with tasks stuck in PENDING. Walk through your diagnosis process.
ANSWER
Systematic diagnosis for Fargate PENDING tasks:
1. Check task stop reason: aws ecs describe-tasks --cluster CLUSTER --tasks TASK_ARN. Look at stoppedReason, stopCode, and attachments for ENI creation failures.
2. Subnet IP exhaustion: Each task needs one ENI with a private IP. Check available IPs: aws ec2 describe-subnets --subnet-ids SUBNET_ID. If below 10%, this is the issue. Fix by expanding CIDR or adding subnets.
3. Task execution role: Verify the ecsTaskExecutionRole has AmazonECSTaskExecutionRolePolicy attached. Without it, tasks cannot pull images from ECR or fetch secrets.
4. Security group: Check that the task security group allows outbound HTTPS (443) for ECR, Secrets Manager, and CloudWatch Logs.
5. VPC endpoints: In private subnets without NAT gateway, tasks need VPC endpoints for ECR (api + dkr) and CloudWatch Logs.
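6. Resource constraints: Fargate clusters have no instance capacity of their own to check, so look at account-level service quotas (such as the Fargate On-Demand vCPU quota) and verify none are hit.
7. Image availability: Verify the container image exists in ECR and the task execution role has ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, and ecr:BatchGetImage permissions.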
The most common cause in production is subnet IP exhaustion — teams underestimate how many ENIs Fargate tasks consume at scale.
Frequently Asked Questions
01
Is AWS Fargate really serverless?
Fargate is serverless in the sense that you do not provision, manage, or patch any servers. AWS manages the underlying compute infrastructure entirely. However, unlike Lambda, Fargate tasks run continuously and you are billed for the duration they run, not per invocation. You still need to define networking, IAM, and logging configuration. Fargate removes server management but not infrastructure configuration.
02
What is the maximum size of a Fargate task?
Fargate supports up to 16 vCPU and 120 GB of memory per task. The valid CPU values are 0.25, 0.5, 1, 2, 4, 8, and 16 vCPU. Each CPU value has a set of valid memory configurations — for example, 1 vCPU supports 2-8 GB of memory. A single task can run up to 10 containers that share the task's CPU and memory allocation.
03
Can Fargate tasks communicate with each other?
Yes, Fargate tasks communicate through standard networking since each task has its own ENI in the VPC. Tasks can reach each other using private IP addresses, service discovery (Cloud Map), or an internal ALB. For ECS, AWS Service Connect provides service mesh capabilities with automatic service discovery and traffic management.
04
How does Fargate handle persistent storage?
Fargate supports two storage options: Amazon EFS (Elastic File System) for persistent shared storage across tasks, and ephemeral storage up to 200 GB per task. EFS volumes mount inside containers like a regular filesystem and persist across task restarts. For stateful workloads, EFS is the recommended approach — it provides shared, durable storage without managing EBS volumes.
05
Should I use Fargate Spot for production workloads?
Fargate Spot can be used in production for fault-tolerant workloads. AWS provides a two-minute interruption warning before reclaiming Spot capacity. Your application must handle SIGTERM gracefully and drain connections. Use a mixed capacity provider strategy — run baseline capacity on regular Fargate and burst capacity on Spot. This provides cost savings while maintaining availability for critical workloads.