Intermediate 14 min · March 06, 2026

Lambda Timeout at 15 Minutes — Migration Nightmare

Lambda's 15-minute hard timeout aborts migrations; most tutorials miss it.

Naren · Founder
Quick Answer
  • EC2: virtual machine for continuous workloads; charges per second
  • Lambda: event-driven function; charges per 1ms execution, zero at rest
  • S3: object storage with 11 nines durability
  • RDS: managed relational database with automated backups
  • IAM: identity and permissions layer — everything is denied by default
  • Performance insight: Lambda cold start adds 100ms–1s to first request
  • Production insight: Lambda + RDS without a connection pool exhausts DB connections
  • Biggest mistake: using the root account for daily work
Plain-English First

Imagine you're opening a pizza restaurant. You could buy your own building, ovens, and delivery vans — or you could rent a kitchen by the hour, use a shared delivery fleet, and only pay when orders come in. AWS is that rental model, but for computing. Instead of buying servers, storage, and networking hardware, you rent exactly what you need, scale it up on a busy Friday night, and scale it back down when things are quiet. You pay for what you use, nothing more.

Every app you use daily (Netflix streaming your show, Airbnb finding you a room, even NASA processing telescope images) runs on someone else's hardware. More often than not, that hardware belongs to a cloud provider, and the largest of them is Amazon Web Services. AWS holds roughly 31% of the global cloud market, and understanding it isn't optional for a modern developer. Whether you're deploying your first side project or designing a system that serves millions, AWS is the environment you'll be working in.

Before cloud computing existed, launching a product meant buying physical servers, installing them in a data centre, estimating your peak traffic years in advance, and paying for that capacity whether you used it or not. A startup that went viral overnight would crash under load with no way to recover quickly. AWS solved this by turning infrastructure into software: things you provision with an API call in seconds, pay for by the second, and throw away when you're done.

By the end of this article you'll understand the five services every AWS project touches — EC2, S3, RDS, Lambda, and IAM — why each one exists, when to reach for it over the alternatives, and how they wire together into a real production architecture. You'll also walk away with the vocabulary and mental models that make AWS job interviews approachable.

AWS Global Infrastructure: Regions, Availability Zones, and Edge Locations

Before you provision a single resource, understand where it lives. AWS operates more than 30 geographic Regions worldwide, each containing at least three Availability Zones (AZs). An AZ is one or more data centres, physically separate from the other AZs, with independent power, cooling, and networking. AZs are connected by high-speed, low-latency links, but a disaster that takes out one AZ leaves the others functional.

When you deploy an EC2 instance or an RDS database, you choose both a Region (e.g., us-east-1 in Northern Virginia) and an AZ within that Region. For high availability, you spread across multiple AZs. Multi-AZ architectures are the standard for production workloads. If one AZ goes offline, traffic shifts to the others.
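The multi-AZ idea can be sketched in a few lines of Python. This is an illustration only, not an AWS API call: `spread_across_azs`, `surviving_capacity`, and the instance names are hypothetical, and the sketch simply assigns instances round-robin so that losing one AZ removes roughly 1/N of your capacity.

```python
from collections import Counter
from itertools import cycle


def spread_across_azs(instance_names, azs):
    """Assign each instance to an AZ in round-robin order."""
    if not azs:
        raise ValueError("at least one Availability Zone is required")
    az_cycle = cycle(azs)
    return {name: next(az_cycle) for name in instance_names}


def surviving_capacity(placement, failed_az):
    """Fraction of instances still running if one AZ goes offline."""
    alive = sum(1 for az in placement.values() if az != failed_az)
    return alive / len(placement)


# Six hypothetical web servers spread over three example AZ names.
placement = spread_across_azs(
    [f"web-{i}" for i in range(6)],
    ["us-east-1a", "us-east-1b", "us-east-1c"],
)
print(Counter(placement.values()))
print(surviving_capacity(placement, "us-east-1a"))
```

With three AZs, an outage in one of them leaves about two thirds of the fleet serving traffic, which is why capacity planning for multi-AZ usually includes some headroom.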

Edge Locations extend AWS's footprint beyond Regions. These are points of presence (POPs) in major cities around the world, used by CloudFront (the CDN) and Route 53 (DNS) to cache content and respond to DNS queries from the closest edge. There are over 400 Edge Locations — far more than Regions — because it's cheaper to deploy a cache than a full data centre.

When building for global audiences, pick Regions closest to your users, and use CloudFront to cache static assets at edge locations. For disaster recovery, replicate data to a second Region hundreds of miles away.

Production Insight
Choosing a Region matters for cost, latency, and compliance. AWS pricing varies by Region; for example, eu-west-1 (Ireland) is typically 10-15% more expensive than us-east-1 (N. Virginia). Some data must stay within a particular jurisdiction (for example, personal data covered by GDPR in Europe). Within those constraints, select the Region physically closest to your users to minimise latency. For fault tolerance, deploy across at least 2 AZs in the same Region; for disaster recovery, replicate to a second Region.
Key Takeaway
Regions are geographic areas; AZs are isolated groups of data centres within a Region. Edge Locations accelerate content delivery. For production, spread across at least two AZs in the same Region.

The Five Core Services Every AWS Project Uses — and Why They Were Built

AWS has over 200 services, which is overwhelming until you realise that almost every architecture starts with the same five building blocks. Think of them as the five trades in construction: electricity, plumbing, walls, a roof, and a lock on the door. Everything else is finishing work.

EC2 (Elastic Compute Cloud) is your rented computer. It runs your application code exactly as a physical server would, but you can resize it, clone it, or delete it in minutes.

S3 (Simple Storage Service) is unlimited file storage. Not a database — a place to put files. Images, videos, backups, static websites, data exports. It's so reliable (eleven 9s of durability) that AWS themselves use it internally.

RDS (Relational Database Service) runs a managed PostgreSQL, MySQL, or other SQL engine. You don't patch it, back it up, or handle failover — AWS does. You just query it.

Lambda runs a function without a server. Upload code, define a trigger, done. No EC2 instance sitting idle waiting for work.

IAM (Identity and Access Management) is the lock on the door. Every call to every AWS service checks IAM first. Get this wrong and either nothing works or everything is exposed.

aws_core_services_setup.sh (Bash)
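#!/bin/bash
# Prerequisites: AWS CLI installed and configured with `aws configure`
# This script creates the skeleton of a real web app infrastructure:
# it checks your IAM identity, then creates a locked-down S3 bucket
# for static assets and uploads a first file to it.

# ── Step 1: Confirm who you are (IAM) ────────────────────────────────
# Always run this first. If the wrong profile is active you'll create
# resources in the wrong account, which is a very expensive mistake.
echo "Current IAM identity:"
aws sts get-caller-identity
# Output shows Account ID, IAM User ARN, and User ID.
# If you see 'Unable to locate credentials', run `aws configure` first.

# ── Step 2: Create a private S3 bucket ───────────────────────────────
# Reconstructed from the output below (an assumption); names are globally unique.
BUCKET_NAME="myapp-static-assets-$(date +%s)"
echo "Creating S3 bucket: $BUCKET_NAME"
aws s3api create-bucket --bucket "$BUCKET_NAME" --region us-east-1
aws s3api put-public-access-block --bucket "$BUCKET_NAME" \
  --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
echo "Bucket $BUCKET_NAME created and locked down."
printf '<h1>Hello from S3</h1>' > index.html
aws s3 cp index.html "s3://$BUCKET_NAME/" --quiet && echo "File uploaded successfully."
echo "Bucket contents:"
aws s3 ls "s3://$BUCKET_NAME/"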
Output
Current IAM identity:
{
"UserId": "AIDA4EXAMPLE7USERID",
"Account": "123456789012",
"Arn": "arn:aws:iam::123456789012:user/sarah-dev"
}
Creating S3 bucket: myapp-static-assets-1718123456
{
"Location": "/myapp-static-assets-1718123456"
}
Bucket myapp-static-assets-1718123456 created and locked down.
File uploaded successfully.
Bucket contents:
2024-06-11 14:22:01 22 index.html

EC2 vs Lambda: Choosing the Right Compute Model Before You Write a Line of Code

The most consequential architectural decision in AWS isn't which database to use or how to structure your VPC. It's whether your code runs on EC2 or Lambda. Getting this wrong means either paying for idle servers 24/7 or hitting cold-start timeouts on user-facing requests.

Use EC2 when: your workload is continuous and predictable, you need full OS control, you're running long-running processes (video encoding, ML training), or you're lifting-and-shifting an existing app. An EC2 instance is just a VM — it starts up and stays up until you stop it.

Use Lambda when: your workload is event-driven and intermittent. An API endpoint that gets 50 requests per minute, a function that fires when a file lands in S3, a nightly data transform. Lambda charges per 1ms of execution. If the function doesn't run, you pay nothing.

The trap beginners fall into is using Lambda for everything because it sounds cheaper and more modern. Lambda has a hard 15-minute execution timeout. Put a 20-minute database migration in a Lambda and it will die mid-run, potentially leaving your schema in a broken state. Put a CPU-intensive image processor behind a user-facing request and cold-start latency, on top of the processing time itself, will frustrate your users.

The sweet spot is using Lambda for glue — the code that reacts to events and orchestrates other services — while EC2 or containers handle the persistent, long-running workloads.
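To make the "pay only when it runs" trade-off concrete, here is a rough cost sketch in Python. The prices are assumptions (approximate us-east-1 list prices for x86 Lambda and a hypothetical small On-Demand instance), the free tier is ignored, and the claim that four instances could absorb the heavy load is purely illustrative. Treat it as a back-of-the-envelope model, not a pricing calculator.

```python
# Assumed list prices (us-east-1, x86); verify against the current AWS pricing pages.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667
LAMBDA_PRICE_PER_MILLION_REQUESTS = 0.20
EC2_HOURLY_PRICE = 0.0208   # hypothetical small On-Demand instance
HOURS_PER_MONTH = 730


def lambda_monthly_cost(requests_per_month, avg_duration_ms, memory_mb):
    """Compute charge (GB-seconds) plus request charge, ignoring the free tier."""
    gb_seconds = requests_per_month * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * LAMBDA_PRICE_PER_GB_SECOND
    requests = requests_per_month / 1_000_000 * LAMBDA_PRICE_PER_MILLION_REQUESTS
    return compute + requests


def ec2_monthly_cost(instance_count=1):
    """An always-on instance is billed for every hour, busy or idle."""
    return instance_count * EC2_HOURLY_PRICE * HOURS_PER_MONTH


# 50 requests per minute, as in the example above, over a 30-day month.
light_traffic = 50 * 60 * 24 * 30
# 500 requests per second sustained: the idle-time advantage disappears.
heavy_traffic = 500 * 60 * 60 * 24 * 30

light_lambda = lambda_monthly_cost(light_traffic, avg_duration_ms=200, memory_mb=512)
heavy_lambda = lambda_monthly_cost(heavy_traffic, avg_duration_ms=200, memory_mb=512)

print(f"light traffic: Lambda ${light_lambda:.2f} vs one EC2 instance ${ec2_monthly_cost(1):.2f}")
print(f"heavy traffic: Lambda ${heavy_lambda:.2f} vs four EC2 instances ${ec2_monthly_cost(4):.2f}")
```

Under these assumptions the intermittent workload costs a few dollars a month on Lambda, while the sustained workload is far cheaper on always-on instances. The crossover point depends entirely on your own duration, memory, and traffic numbers.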

lambda_s3_image_thumbnail.py (Python)
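# This is a complete AWS Lambda function that triggers whenever a new image
# is uploaded to an S3 bucket, generates a thumbnail, and saves it to a
# second 'thumbnails' bucket. This is one of the most common Lambda patterns.

import io
import json
import os
from urllib.parse import unquote_plus

import boto3            # AWS SDK for Python, available in the Lambda runtime by default
from PIL import Image   # Requires a Lambda Layer or packaging Pillow with your deploy

# boto3 clients are created outside the handler so they are reused across
# warm invocations. This is a real performance optimisation, not just style.
s3_client = boto3.client('s3')

# The THUMBNAILS_BUCKET env var is set in the Lambda config, not hardcoded.
# Hardcoding bucket names is a common mistake that breaks staging/prod parity.
THUMBNAILS_BUCKET = os.environ['THUMBNAILS_BUCKET']
THUMBNAIL_SIZE = (128, 128)  # maximum width x height in pixels


def lambda_handler(event, context):
    """
    AWS calls this function automatically when a new object is created in
    the source S3 bucket. 'event' contains the bucket name and object key.
    """
    # Extract the source bucket and file key from the S3 event payload.
    # Keys arrive URL-encoded (spaces become '+'), so decode them first.
    record = event['Records'][0]
    source_bucket = record['s3']['bucket']['name']
    source_key = unquote_plus(record['s3']['object']['key'])

    # Skip files that are already thumbnails. This avoids infinite loops if
    # thumbnails land in the same bucket as originals (a classic footgun).
    if source_key.startswith('thumbnails/'):
        print(f"Skipping thumbnail file to prevent recursion: {source_key}")
        return {'statusCode': 200, 'body': json.dumps('Skipped thumbnail file')}

    # The steps below are a minimal sketch of the flow described in the header
    # comment (download, resize, upload); adapt error handling to your needs.
    original = io.BytesIO()
    s3_client.download_fileobj(source_bucket, source_key, original)
    original.seek(0)

    thumbnail = io.BytesIO()
    with Image.open(original) as image:
        image_format = image.format or 'PNG'
        image.thumbnail(THUMBNAIL_SIZE)   # resizes in place, keeps aspect ratio
        image.save(thumbnail, format=image_format)
    thumbnail.seek(0)

    thumbnail_key = f"thumbnails/{source_key}"
    s3_client.upload_fileobj(thumbnail, THUMBNAILS_BUCKET, thumbnail_key)
    return {'statusCode': 200, 'body': json.dumps(f'Created {thumbnail_key}')}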
Pro Tip: Initialise boto3 clients outside the handler
Boto3 client initialisation takes ~50ms. On a warm Lambda invocation, code outside the handler function is NOT re-executed — AWS reuses the same execution environment. Moving client creation outside the handler is free performance. On a high-traffic Lambda processing 10,000 requests/day, this saves roughly 8 minutes of billed compute time per day.
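The effect is easy to see in a small simulation. The sketch below uses a hypothetical `make_client` stand-in instead of a real boto3 client, so it runs anywhere; it counts how often the "client" is built across several warm invocations and then reproduces the arithmetic behind the 8-minute figure, assuming every invocation is warm and each initialisation costs about 50ms.

```python
INIT_CALLS = 0


def make_client():
    """Stand-in for boto3.client(...); counts how often it is constructed."""
    global INIT_CALLS
    INIT_CALLS += 1
    return object()


# Module level: runs once per execution environment, not once per invocation.
CLIENT = make_client()


def handler(event, context=None):
    # Every warm invocation sees the same CLIENT object.
    return {"statusCode": 200, "client_id": id(CLIENT)}


results = [handler({}) for _ in range(3)]


def daily_savings_minutes(requests_per_day, init_seconds):
    """Billed time avoided per day if initialisation is skipped on each request."""
    return requests_per_day * init_seconds / 60


savings = daily_savings_minutes(requests_per_day=10_000, init_seconds=0.050)
print(f"client constructed {INIT_CALLS} time(s) for {len(results)} invocations")
print(f"approximate savings: {savings:.1f} minutes of billed time per day")
```

The real saving is somewhat lower than this estimate, because each cold start still pays the initialisation cost once.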
Production Insight
A team once used Lambda for a daily ETL job that processed 500MB CSV files. The function ran for 14 minutes each time, barely under the 15-minute limit. When data volume grew, jobs started timing out. The team switched to AWS Batch with EC2 Spot Instances and cut costs by 60%.
Cold starts: For latency-sensitive APIs, a Lambda cold start (often more than 500ms) can frustrate users. Provisioned Concurrency gives predictable latency, but you pay for it whether or not requests arrive.
Know your workload profile before choosing compute.
Key Takeaway
Continuous + predictable → EC2. Intermittent + event-driven → Lambda.
If it runs longer than 15 minutes, it can't be Lambda.
If it's latency-critical, cold starts matter.
EC2 vs Lambda decision guide
If: Workload runs longer than 15 minutes
Use: EC2 or containers
If: Workload runs intermittently, with idle periods
Use: Lambda, to avoid paying for idle time
If: Need to control the OS or install custom software
Use: EC2
If: Just want to react to events (S3 upload, API call)
Use: Lambda (glue code is its sweet spot)

IAM Done Right: Why Least-Privilege Access Is Not Optional

IAM is the part of AWS that most tutorials rush through to get to the 'interesting' stuff, and it's the part that causes the most expensive real-world incidents. The 2019 Capital One breach, which exposed data on roughly 100 million customers, hinged on an over-permissive IAM role that an attacker reached through a misconfigured firewall. Understanding IAM isn't bureaucracy; it's engineering.

Every entity in AWS (a user, an EC2 instance, a Lambda function) has an identity. Every action on every resource is authorised by checking IAM policies attached to that identity. By default, everything is denied. You grant access explicitly.

The three concepts you must internalise are: Users (humans), Roles (services and applications — an EC2 instance assumes a role, not a user), and Policies (JSON documents that say what is allowed or denied on which resources).

The golden rule is least privilege: grant only the exact permissions needed for a specific task, scoped to the specific resource. Not s3:* on *, which is every S3 action on every bucket in your account. Instead: s3:GetObject on arn:aws:s3:::myapp-assets/*.

If a Lambda function only needs to read from one S3 bucket, its execution role should be able to do exactly that — nothing else. If that Lambda is compromised, the blast radius is one bucket in read-only mode, not your entire AWS account.
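The evaluation rules described above (default deny, explicit allow, explicit deny wins) can be sketched as a toy policy checker. This is a simplification for intuition, not how AWS implements IAM: `is_allowed` is a hypothetical helper, it ignores conditions, principals, and other policy types, and it uses Python's `fnmatchcase` for wildcards, which is stricter about case than real IAM action matching.

```python
from fnmatch import fnmatchcase


def _matches(patterns, value):
    """True if value matches any pattern; accepts a string or a list of strings."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def is_allowed(policy, action, resource):
    """Explicit Deny wins, then explicit Allow, otherwise implicit deny."""
    allowed = False
    for statement in policy["Statement"]:
        if _matches(statement["Action"], action) and _matches(statement["Resource"], resource):
            if statement["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


least_privilege = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::myapp-assets/*"}
    ],
}

too_broad = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}

logo = "arn:aws:s3:::myapp-assets/img/logo.png"
other = "arn:aws:s3:::payroll-exports/2024.csv"   # hypothetical unrelated bucket

print(is_allowed(least_privilege, "s3:GetObject", logo))      # read the intended object
print(is_allowed(least_privilege, "s3:DeleteObject", logo))   # different action
print(is_allowed(least_privilege, "s3:GetObject", other))     # different bucket
print(is_allowed(too_broad, "s3:DeleteObject", other))        # wildcard policy
```

The scoped policy answers yes only to the first question, while the wildcard policy says yes to everything. That difference is the blast radius.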

lambda_s3_readonly_policy.json (JSON)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadingFromSpecificBucketOnly",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::myapp-static-assets-1718123456",
        "arn:aws:s3:::myapp-static-assets-1718123456/*"
      ]
    },
    {
      "Sid": "AllowCloudWatchLoggingForDebugging",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/thumbnail-processor:*"
    }
  ]
}

# To attach this policy to a Lambda execution role via CLI:
#
# 1. Create the policy in IAM:
# aws iam create-policy \
#   --policy-name LambdaThumbnailS3ReadPolicy \
#   --policy-document file://lambda_s3_readonly_policy.json
#
# 2. Create the role (Lambda needs permission to assume it):
# aws iam create-role \
#   --role-name LambdaThumbnailRole \
#   --assume-role-policy-document '{
#     "Version": "2012-10-17",
#     "Statement": [{
#       "Effect": "Allow",
#       "Principal": {"Service": "lambda.amazonaws.com"},
#       "Action": "sts:AssumeRole"
#     }]
#   }'
#
# 3. Attach the policy to the role:
# aws iam attach-role-policy \
#   --role-name LambdaThumbnailRole \
#   --policy-arn arn:aws:iam::123456789012:policy/LambdaThumbnailS3ReadPolicy
Output
# After running create-policy:
{
"Policy": {
"PolicyName": "LambdaThumbnailS3ReadPolicy",
"PolicyId": "ANPA4EXAMPLEPOLICYID",
"Arn": "arn:aws:iam::123456789012:policy/LambdaThumbnailS3ReadPolicy",
"CreateDate": "2024-06-11T14:30:00+00:00",
"AttachmentCount": 0,
"IsAttachable": true
}
}
# After attach-role-policy: (no output means success — this is intentional CLI behaviour)
Interview Gold: The Two-ARN S3 Pattern
Notice the first statement lists two Resource ARNs for the same bucket: one without a trailing /* and one with it. The ListBucket action applies to the bucket itself (no /*), while GetObject applies to the objects inside it (with /*). Listing only the /* ARN is a very common mistake: GetObject keeps working, but every ListBucket call fails with AccessDenied, which is confusing because the policy looks correct at a glance. Mentioning this in an interview signals real hands-on experience.
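A quick way to catch this mistake before deploying is a small lint check over the policy document. The helper below is a hypothetical sketch (the name `list_bucket_is_usable` is not part of any AWS tool): it flags a statement that grants s3:ListBucket but contains only object-level ARNs.

```python
def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def list_bucket_is_usable(statement):
    """True unless s3:ListBucket is granted without any bucket-level ARN."""
    if "s3:ListBucket" not in _as_list(statement["Action"]):
        return True  # nothing to check in this statement
    # A bucket-level ARN has no "/" after the bucket name.
    return any("/" not in arn.split(":::", 1)[-1] for arn in _as_list(statement["Resource"]))


correct = {
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
        "arn:aws:s3:::myapp-static-assets-1718123456",
        "arn:aws:s3:::myapp-static-assets-1718123456/*",
    ],
}

broken = {
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": "arn:aws:s3:::myapp-static-assets-1718123456/*",
}

print(list_bucket_is_usable(correct))
print(list_bucket_is_usable(broken))
```

The check is deliberately narrow. It only covers the ListBucket case from the policy above, but the same idea extends to any action that targets the bucket rather than its objects.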
Production Insight
The Capital One breach in 2019 involved an overly permissive IAM role attached to an EC2 instance running a web application firewall. The role could list and read S3 buckets that stored customer data, and the attacker used an SSRF vulnerability to obtain the role's temporary credentials from the instance metadata service.
Least privilege sounds bureaucratic until it's your breach. Scope actions to specific resources.
Always enable AWS CloudTrail to audit who did what. Without it, you're blind.
Key Takeaway
IAM is the only service that can grant or deny every other service.
Default deny is the only safe starting point.
Wildcard actions and wildcard resources together = disaster waiting to happen.
When to use an IAM Role vs User
If: An EC2 instance needs to access S3
Use: Create an IAM Role and attach it to the EC2 instance profile
If: A human developer needs AWS Console access
Use: Create an IAM User with MFA
If: A Lambda function needs to write to DynamoDB
Use: Create an IAM Role with a trust policy for lambda.amazonaws.com

AWS Shared Responsibility Model: What AWS Secures and What You Must Secure

A common misconception for new AWS users is that AWS is fully responsible for security. In reality, security is shared: AWS secures the cloud infrastructure (data centres, hardware, networking, hypervisors), while you secure everything inside that infrastructure — your data, applications, operating systems, network configurations, IAM policies, and encryption.

This is often described as Security OF the Cloud (AWS's responsibility) vs Security IN the Cloud (your responsibility). The boundary shifts depending on the service. For EC2, you manage the OS, patches, and firewall; for RDS, AWS manages the OS and database engine patching, but you manage database access, user accounts, and data encryption at rest. With Lambda, AWS manages the runtime environment, but you manage function code, environment variables, and execution role permissions.

In practice, this means: always encrypt data at rest (S3 SSE, EBS encryption, RDS encryption), encrypt data in transit (SSL/TLS), use IAM roles instead of long-lived access keys, regularly patch your EC2 AMIs, and never open more ports than necessary. Assume AWS will protect the physical data centre; assume everything else is your problem.

Common Mistake: Assuming AWS Encrypts Everything by Default
New EBS volumes are NOT encrypted by default unless you turn on account-level default encryption in each Region, so enable it in the EC2 settings. S3 has applied SSE-S3 encryption to all new objects automatically since January 2023, but that default does not use your own KMS keys, and it does nothing to stop a misconfigured bucket policy from exposing data. For RDS, encryption can only be enabled at creation time; you cannot encrypt an existing unencrypted RDS instance without migrating to a new one.
Production Insight
A real incident: a company stored unencrypted logs in S3 containing customer PII. An S3 bucket policy misconfiguration made the logs publicly readable, and anyone who accessed the bucket got plaintext data. Note that SSE-S3 alone would not have prevented this, because S3 decrypts transparently for any request the bucket policy allows; blocking public access is the primary control. The shared responsibility model means you own data classification and encryption. Use S3 Block Public Access, enable S3 Server-Side Encryption, and use CloudTrail to monitor bucket operations.
Key Takeaway
AWS secures the infrastructure; you secure your data, access controls, and configurations. Never assume encryption is on by default. Always audit your resources with AWS Config and use IAM least privilege.

AWS Pricing Models: On-Demand, Reserved, Spot, and Savings Plans — When to Use Each

AWS offers four primary pricing models, and choosing the wrong one is like paying first-class for a cargo flight — you get the same seat but at a wildly different price. Understanding when to use each can cut your compute costs by 50-70% with zero architectural change.

On-Demand: pay per hour or per second with no commitment. Best for short-term, spiky, or unpredictable workloads — development environments, new applications still being evaluated, or workloads that cannot tolerate interruption. You pay a premium for flexibility.

Reserved Instances (RIs): commit to 1 or 3 years of specific instance usage in a specific region. You save up to 72% compared to On-Demand. Best for steady-state, predictable workloads — your production web server running 24/7, an RDS instance for a core database. Convertible RIs allow some flexibility in instance family.

Spot Instances: run on spare EC2 capacity at up to 90% off the On-Demand price. You no longer bid; you pay the current Spot price, and AWS can reclaim the instance with a 2-minute warning. Best for fault-tolerant, stateless, or batch workloads such as data processing, image rendering, CI/CD workers, or any workload that can be interrupted and resumed. Never use Spot for databases or stateful app servers.

Savings Plans: a flexible discount model in exchange for a commitment to a consistent amount of compute usage (measured in $/hour) for 1 or 3 years. Compute Savings Plans apply across EC2, Lambda, and Fargate regardless of instance family or Region, while EC2 Instance Savings Plans give a deeper discount but are tied to one instance family in one Region. They offer similar savings to RIs but with more flexibility. Best for organisations with diverse compute usage across multiple services.

The practical strategy: Use On-Demand for anything temporary or variable. For baseline, always-on workloads, buy Reserved Instances or Savings Plans. For batch processing, test environments, or non-critical services, use Spot. Never use On-Demand for predictable, long-running workloads — you're throwing money away.
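The strategy above comes down to one question: how many hours a month does the workload actually run? The sketch below works through that arithmetic. The hourly rate and the 40% discount are hypothetical placeholders, not AWS prices, and the model assumes a commitment is billed for every hour of the month whether the instance is used or not.

```python
HOURS_PER_MONTH = 730


def on_demand_cost(hourly_rate, hours_used):
    """On-Demand: pay only for the hours the instance runs."""
    return hourly_rate * hours_used


def committed_cost(hourly_rate, discount):
    """Reserved Instance or Savings Plan: discounted, but billed for every hour."""
    return hourly_rate * (1 - discount) * HOURS_PER_MONTH


def break_even_hours(discount):
    """Monthly usage above which the commitment becomes the cheaper option."""
    return HOURS_PER_MONTH * (1 - discount)


RATE = 0.10       # hypothetical On-Demand price in $/hour
DISCOUNT = 0.40   # hypothetical commitment discount

always_on = on_demand_cost(RATE, HOURS_PER_MONTH)
business_hours = on_demand_cost(RATE, 220)   # roughly 10 hours x 22 working days
committed = committed_cost(RATE, DISCOUNT)

print(f"break-even at {break_even_hours(DISCOUNT):.0f} hours/month")
print(f"24/7 On-Demand ${always_on:.2f} vs committed ${committed:.2f}")
print(f"business-hours On-Demand ${business_hours:.2f} vs committed ${committed:.2f}")
```

With these placeholder numbers the commitment wins for the always-on server and loses for the business-hours one, which matches the rule of thumb: commit for the baseline, stay On-Demand (or Spot) for everything that can be switched off.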

Production Insight
A common cost pitfall: leaving a large On-Demand EC2 instance running 24/7 when it's only used during business hours. The fix: use AWS Instance Scheduler to stop instances overnight, or switch to a Spot instance if the workload is tolerant. Another: using On-Demand for a multi-AZ RDS deployment that never scales down. Convert to Reserved Instances or Savings Plan to cut 30-40%. Use AWS Cost Explorer and Athena on Cost and Usage Reports to identify the biggest savings opportunities.
Key Takeaway
On-Demand for flexibility, Reserved/Savings Plans for baseline, Spot for batch/cost-sensitive. A mix of these, aligned to workload patterns, optimises cost without sacrificing reliability.

EC2 vs Lambda vs Fargate: When to Use Each Compute Service

Beyond EC2 and Lambda, Fargate is a third compute option that sits between them — it runs containers without managing servers. Here's how to decide among all three.

EC2 (Elastic Compute Cloud): Full control over the OS and runtime. You manage everything from the kernel up. Best when you need custom AMIs, specific kernel modules, or direct hardware access (GPU). Also ideal for lift-and-shift migrations, long-running jobs (>15 min), and workloads requiring persistent storage attachments (EBS).

Lambda: Zero server management. You upload code, set a trigger, and AWS runs it. Best for event-driven, short-lived tasks (<15 min), intermittent workloads with idle periods, and functions that need to scale out to many concurrent invocations quickly (within your account's concurrency limit). Cold start can be an issue for latency-sensitive APIs.

Fargate: Run Docker containers without managing EC2 instances. You define the task (CPU, memory, container image), and AWS provisions and manages the underlying servers. Best for microservices that run continuously, batch jobs longer than Lambda's timeout, or workloads that need consistent performance without the overhead of EC2 management. Fargate is more expensive than EC2 for large, predictable workloads (you pay a ~10-20% markup for the managed infrastructure).

The practical guidance: use Lambda as glue for event-driven tasks, use Fargate for persistent containerised microservices (especially if you already use ECS/EKS), and use EC2 for any workload that needs full control, runs very long, or is cost-sensitive at scale. Many architectures mix all three: Lambda for processing S3 uploads, Fargate for the API server, and EC2 spot fleets for data processing jobs.

Production Insight
A team ran a Python ML inference service on Lambda, but inference took 10-12 minutes per request — often timing out. They moved to Fargate, which handled the long-running tasks with stable CPU and memory. Costs went up slightly, but reliability improved. Another team used Fargate for a low-traffic API, but the “always running” cost exceeded what Lambda would have charged for the same volume of requests. Always calculate cost: Lambda is cheapest at low utilisation, Fargate is competitive at moderate utilisation, EC2 is cheapest at high utilisation (especially with Reserved Instances).
Key Takeaway
Lambda: short event-driven. Fargate: managed containers for persistent services. EC2: full control, long-running, or cost-optimised at scale. Mix and match based on workload profile.
EC2, Lambda, or Fargate Decision Guide
If: Need full OS or hardware control?
Use: EC2
If: Workload runs less than 15 min, event-driven, intermittent?
Use: Lambda
If: Running containers, don't want to manage servers, workload longer than 15 min?
Use: Fargate
If: Need GPU or custom kernel?
Use: EC2
If: Large, predictable, cost-sensitive container workload?
Use: EC2 (self-managed ECS/EKS)
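The decision guide can also be written down as a small function, which is a handy way to check that the rules don't contradict each other. Everything here is a hypothetical sketch: the `Workload` fields and the order of the checks are one reasonable reading of the guide, not an official AWS rule set.

```python
from dataclasses import dataclass

LAMBDA_TIMEOUT_MINUTES = 15


@dataclass
class Workload:
    max_runtime_minutes: float
    event_driven: bool = False
    containerised: bool = False
    needs_os_or_gpu: bool = False
    large_and_predictable: bool = False


def choose_compute(workload):
    """Apply the decision guide from top to bottom; the first matching rule wins."""
    if workload.needs_os_or_gpu:
        return "EC2"
    if workload.containerised and workload.large_and_predictable:
        return "EC2 (self-managed ECS/EKS)"
    if workload.event_driven and workload.max_runtime_minutes < LAMBDA_TIMEOUT_MINUTES:
        return "Lambda"
    if workload.containerised:
        return "Fargate"
    return "EC2"


thumbnailer = Workload(max_runtime_minutes=0.5, event_driven=True)
migration = Workload(max_runtime_minutes=20)
api_service = Workload(max_runtime_minutes=60 * 24, containerised=True)
ml_training = Workload(max_runtime_minutes=600, needs_os_or_gpu=True)

for name, workload in [("thumbnailer", thumbnailer), ("migration", migration),
                       ("api_service", api_service), ("ml_training", ml_training)]:
    print(name, "->", choose_compute(workload))
```

Real decisions also weigh cost and team experience, so treat the output as a starting point for the conversation rather than the answer.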

How a Real Production Architecture Wires These Services Together

Seeing services in isolation is useful for learning. But AWS's real power emerges when services compose. Here's how a production web application typically connects the pieces we've covered.

A user hits your domain. Route 53 (AWS DNS) resolves it to a CloudFront distribution (CDN). CloudFront serves static assets (HTML, CSS, JS) directly from S3: no application server involved, globally cached, and inexpensive per request. For dynamic API requests, CloudFront forwards to an Application Load Balancer, which distributes traffic across EC2 instances (or ECS containers) running your application.

The application reads and writes to RDS for structured data, and stores uploaded files directly to S3 using pre-signed URLs (so files go direct from the browser to S3 — never through your server). When a file lands in S3, an event triggers a Lambda function for async processing: thumbnail generation, virus scanning, metadata extraction.
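The S3-to-Lambda hand-off is worth seeing at the payload level. The sketch below parses an S3 event notification the way an async processor would, using the same `Records[].s3.bucket.name` and `Records[].s3.object.key` fields as the thumbnail function earlier. The helper name `extract_uploaded_objects` and the sample event are hypothetical; the sample is trimmed to the fields the code reads.

```python
from urllib.parse import unquote_plus


def extract_uploaded_objects(event):
    """Return (bucket, key) pairs for every object-created record in the event."""
    uploaded = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue  # ignore deletions and other notification types
        s3_info = record["s3"]
        # Keys are URL-encoded in the notification: '+' is a space, %28 is '('.
        key = unquote_plus(s3_info["object"]["key"])
        uploaded.append((s3_info["bucket"]["name"], key))
    return uploaded


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "myapp-uploads"},
                "object": {"key": "photos/summer+trip/beach%281%29.jpg"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "myapp-uploads"},
                "object": {"key": "photos/old.jpg"},
            },
        },
    ]
}

for bucket, key in extract_uploaded_objects(sample_event):
    print(f"process s3://{bucket}/{key}")
```

Looping over all records, rather than reading only `Records[0]`, keeps the function correct if a single invocation ever carries more than one record.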

Everything runs inside a VPC (Virtual Private Cloud) — a private network. The RDS instance has no public IP. The EC2 instances live in private subnets. Only the Load Balancer is internet-facing. IAM roles control exactly which service can talk to which resource.

This pattern — static assets on S3/CloudFront, compute on EC2/Lambda, data on RDS, files on S3, security via IAM and VPC — handles everything from a startup's MVP to a Fortune 500 platform without fundamentally changing shape.

rds_setup_and_connect.sh (Bash)
#!/bin/bash
# This script creates a minimal RDS PostgreSQL instance for a web app
# and demonstrates connecting to it securely from an EC2 instance.
# Cost note: even the smallest RDS instance (~$15/month) is NOT free tier
# eligible after the first 12 months. Use RDS Proxy in production for
# connection pooling; Lambda functions can exhaust DB connections instantly.

DB_IDENTIFIER="myapp-postgres-prod"
DB_NAME="myapp"
DB_USER="myapp_admin"
# In real usage, pull this from AWS Secrets Manager. Never hardcode passwords.
DB_PASSWORD="$(aws secretsmanager get-secret-value \
  --secret-id myapp/db/password \
  --query SecretString \
  --output text)"

# ── Create the RDS instance ───────────────────────────────────────────
echo "Creating RDS PostgreSQL instance..."
aws rds create-db-instance \
  --db-instance-identifier "$DB_IDENTIFIER" \
  --db-instance-class db.t3.micro \
  --engine postgres \
  --engine-version "15.4" \
  --master-username "$DB_USER" \
  --master-user-password "$DB_PASSWORD" \
  --db-name "$DB_NAME" \
  --allocated-storage 20 \
  --storage-type gp3 \
  --no-publicly-accessible \
  --backup-retention-period 7 \
  --deletion-protection \
  --region us-east-1

# --no-publicly-accessible: the DB only accepts connections from within the VPC
# --deletion-protection: prevents accidental deletion with a single CLI call
# --backup-retention-period 7: keeps 7 days of automated backups

echo "Waiting for instance to become available (this takes ~5 minutes)..."
aws rds wait db-instance-available \
  --db-instance-identifier "$DB_IDENTIFIER"

# ── Retrieve the endpoint once the instance is ready ──────────────────
DB_ENDPOINT=$(aws rds describe-db-instances \
  --db-instance-identifier "$DB_IDENTIFIER" \
  --query 'DBInstances[0].Endpoint.Address' \
  --output text)

echo "RDS instance ready at: $DB_ENDPOINT"

# ── Connect (run this from inside your EC2 instance, not your laptop) ─
# Install the psql client first, e.g. on Amazon Linux 2023: sudo dnf install -y postgresql15
echo "Connecting to database..."
PGPASSWORD="$DB_PASSWORD" psql \
  --host="$DB_ENDPOINT" \
  --port=5432 \
  --username="$DB_USER" \
  --dbname="$DB_NAME" \
  --command="SELECT version();"
Output
Creating RDS PostgreSQL instance...
{
"DBInstance": {
"DBInstanceIdentifier": "myapp-postgres-prod",
"DBInstanceClass": "db.t3.micro",
"Engine": "postgres",
"DBInstanceStatus": "creating",
"Endpoint": null
}
}
Waiting for instance to become available (this takes ~5 minutes)...
RDS instance ready at: myapp-postgres-prod.cxyz1234abcd.us-east-1.rds.amazonaws.com
Connecting to database...
version
------------------------------------------------------------------------
PostgreSQL 15.4 on x86_64-pc-linux-gnu, compiled by gcc 7.3.1, 64-bit
(1 row)
Watch Out: Lambda + RDS Without a Connection Pool
Each concurrent Lambda execution environment opens its own database connection. At 1,000 concurrent executions, that is 1,000 simultaneous DB connections. PostgreSQL's stock default for max_connections is 100, and RDS derives its default from instance memory, so small instance classes land in the same range. Once the limit is hit, the database refuses new connections and requests start failing. The fix: put RDS Proxy between Lambda and RDS. It pools and reuses connections and is purpose-built for this scenario. This is a very common production incident for teams new to serverless.
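The numbers are simple enough to check in code. The sketch below does two things: it compares concurrency against a connection limit, and it shows the "open once, reuse on warm invocations" pattern using a fake connection so it runs without a database. `fake_connect`, `get_connection`, and the limit values are illustrative assumptions; a real function would call its database driver instead.

```python
def fits_connection_limit(concurrent_executions, max_connections, reserved=3):
    """One connection per execution environment; a few slots are reserved for admins."""
    return concurrent_executions <= max_connections - reserved


OPENED = 0
_connection = None


def fake_connect():
    """Stand-in for a real driver's connect(); counts how many connections are opened."""
    global OPENED
    OPENED += 1
    return {"connection_number": OPENED}


def get_connection():
    """Open the connection lazily, then reuse it for later warm invocations."""
    global _connection
    if _connection is None:
        _connection = fake_connect()
    return _connection


def handler(event, context=None):
    connection = get_connection()
    return connection["connection_number"]


# Five warm invocations in the same execution environment share one connection.
served_by = [handler({}) for _ in range(5)]

print(fits_connection_limit(50, max_connections=100))
print(fits_connection_limit(1000, max_connections=100))
print(f"connections opened in this environment: {OPENED}")
```

Reuse inside one environment reduces connection churn, but it does not cap the total: 1,000 concurrent environments still mean 1,000 connections. Capping that total is the job of a pooler such as RDS Proxy.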
Production Insight
When using Lambda + RDS without RDS Proxy, a traffic spike can open thousands of database connections and hit PostgreSQL's max_connections (around 100 on small instances). The database refuses new connections, causing application errors.
For Lambda workloads with real concurrency, treat RDS Proxy as mandatory: it reuses connections and reduces database load.
VPC design: placing RDS in a private subnet with no public IP is essential. An EC2 in the same VPC can connect via internal DNS.
Key Takeaway
Static content → S3 + CloudFront. API → ALB + EC2/Lambda. Database → RDS with RDS Proxy.
Network isolation via VPC subnets. IAM policies enforce service-to-resource permissions.
This pattern scales from MVP to enterprise.
Database connection strategy with Lambda
If: Lambda + RDS with high concurrency
Use: RDS Proxy
If: Lambda + DynamoDB
Use: No connection pooling needed (DynamoDB is serverless)
If: EC2 + RDS
Use: A standard connection pool (HikariCP in the application, or PgBouncer alongside it)

Networking and Security: VPC, Security Groups, and NACLs — The Invisible Backbone

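Resources such as EC2 instances, RDS databases, and load balancers live inside a Virtual Private Cloud (VPC), your private slice of the AWS network. (Services like S3 and DynamoDB sit outside it and are reached over public endpoints or VPC endpoints.) Without understanding VPC basics, you'll struggle to connect services securely. The VPC is where you define subnets (public and private), route tables, internet gateways, and NAT gateways.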
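Subnet planning is mostly CIDR arithmetic, and Python's standard `ipaddress` module is enough to sanity-check a layout before you create anything. The sketch below uses the example ranges from this section's VPC script (10.0.0.0/16 with two /24 subnets) and accounts for the five addresses AWS reserves in every subnet. The helper name `usable_addresses` is made up for this example.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5   # AWS keeps the first four addresses and the last one

vpc = ipaddress.ip_network("10.0.0.0/16")
public_subnet = ipaddress.ip_network("10.0.1.0/24")
private_subnet = ipaddress.ip_network("10.0.2.0/24")


def usable_addresses(subnet):
    """Addresses you can actually assign to resources in an AWS subnet."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


# Every subnet must sit inside the VPC range, and subnets must not overlap.
for subnet in (public_subnet, private_subnet):
    print(subnet, "inside VPC:", subnet.subnet_of(vpc), "usable:", usable_addresses(subnet))

print("subnets overlap:", public_subnet.overlaps(private_subnet))
print("possible /24 subnets in the VPC:", len(list(vpc.subnets(new_prefix=24))))
```

A /16 VPC leaves room for 256 subnets of this size, so there is plenty of space to add per-AZ public and private subnets later.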

Security Groups are stateful firewalls attached to individual resources (EC2 instances, RDS databases, VPC-attached Lambda functions). If you allow inbound traffic on port 443, the response traffic for those connections is automatically allowed out, regardless of the outbound rules. This is convenient but can mask misconfigurations.

Network ACLs (NACLs) are stateless firewalls applied to entire subnets. You must define both inbound and outbound rules explicitly. If you allow inbound HTTP but forget an outbound rule for the return traffic (which goes back to the client's ephemeral port, typically somewhere in 1024-65535), the connection fails silently.
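The stateful versus stateless distinction is easier to remember once you have seen it fail. The toy model below is a deliberately simplified simulation, not AWS behaviour in full: the two classes and `round_trip` are hypothetical, rules are reduced to port numbers, and the client address is a documentation-range example.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityGroup:
    """Stateful: remembers accepted connections and lets their replies out."""
    inbound_ports: set
    tracked: set = field(default_factory=set)

    def accept_inbound(self, client_ip, client_port, server_port):
        if server_port in self.inbound_ports:
            self.tracked.add((client_ip, client_port, server_port))
            return True
        return False

    def allow_reply(self, client_ip, client_port, server_port):
        return (client_ip, client_port, server_port) in self.tracked


@dataclass
class NetworkAcl:
    """Stateless: every packet is checked against the rules for its direction."""
    inbound_ports: set
    outbound_port_ranges: list = field(default_factory=list)

    def accept_inbound(self, client_ip, client_port, server_port):
        return server_port in self.inbound_ports

    def allow_reply(self, client_ip, client_port, server_port):
        # The reply is addressed to the client's ephemeral port.
        return any(low <= client_port <= high for low, high in self.outbound_port_ranges)


def round_trip(firewall, client_ip="203.0.113.10", client_port=50123, server_port=443):
    """True only if the request gets in AND the response gets back out."""
    return (firewall.accept_inbound(client_ip, client_port, server_port)
            and firewall.allow_reply(client_ip, client_port, server_port))


print("security group, inbound 443 only:", round_trip(SecurityGroup({443})))
print("NACL, inbound 443 only:", round_trip(NetworkAcl({443})))
print("NACL, plus outbound ephemeral range:", round_trip(NetworkAcl({443}, [(1024, 65535)])))
```

The middle case is the silent failure described above: the request arrives, the server answers, and the answer is dropped on the way out.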

The standard pattern: place your load balancer in a public subnet with a Security Group allowing HTTP/HTTPS from the internet. Place EC2 instances and RDS in private subnets with Security Groups that only allow traffic from the load balancer's security group. A bastion host (jump box) in a public subnet provides secure SSH access for administrators.

Enable VPC Flow Logs to capture metadata about the IP traffic flowing through your VPC (source, destination, ports, protocol, and whether it was accepted or rejected). They are invaluable for debugging connectivity issues and security incidents.
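Flow log records are plain space-separated text, so a few lines of Python are enough to start asking questions of them. The sketch below assumes the default (version 2) record format with its 14 fields; the sample records, account ID, and interface ID are invented for illustration, and `parse_flow_record` is a hypothetical helper.

```python
from collections import namedtuple

# Field order of the default flow log format (version 2).
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
FlowRecord = namedtuple("FlowRecord", FIELDS)


def parse_flow_record(line):
    """Split one default-format record into named fields (all kept as strings)."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return FlowRecord(*parts)


def rejected_flows(lines):
    """Flows that a security group or NACL refused."""
    return [record for record in map(parse_flow_record, lines) if record.action == "REJECT"]


sample_lines = [
    "2 123456789012 eni-0abc1234 10.0.1.15 10.0.2.20 50123 5432 6 12 3400 1718116800 1718116860 ACCEPT OK",
    "2 123456789012 eni-0abc1234 198.51.100.7 10.0.2.20 44321 22 6 3 180 1718116800 1718116860 REJECT OK",
]

for record in rejected_flows(sample_lines):
    print(f"blocked: {record.srcaddr}:{record.srcport} -> {record.dstaddr}:{record.dstport}")
```

Filtering on REJECT is usually the fastest way to confirm whether a connectivity problem is a firewall rule or something further up the stack.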

create_vpc_basic.sh (Bash)
#!/bin/bash
# Creates a VPC with one public and one private subnet, an Internet Gateway,
# and a NAT Gateway (costs ~$0.045/hour when running).

VPC_NAME="myapp-vpc"
VPC_CIDR="10.0.0.0/16"
PUBLIC_SUBNET_CIDR="10.0.1.0/24"
PRIVATE_SUBNET_CIDR="10.0.2.0/24"
REGION="us-east-1"

# ── Create VPC ────────────────────────────────────────────────────────
VPC_ID=$(aws ec2 create-vpc \
  --cidr-block "$VPC_CIDR" \
  --region "$REGION" \
  --query 'Vpc.VpcId' \
  --output text)
aws ec2 create-tags --resources "$VPC_ID" --tags Key=Name,Value="$VPC_NAME"
echo "Created VPC: $VPC_ID"

# ── Enable DNS hostnames ──────────────────────────────────────────────
aws ec2 modify-vpc-attribute \
  --vpc-id "$VPC_ID" \
  --enable-dns-hostnames "{\"Value\":true}"

# ── Create subnets ────────────────────────────────────────────────────
PUBLIC_SUBNET_ID=$(aws ec2 create-subnet \
  --vpc-id "$VPC_ID" \
  --cidr-block "$PUBLIC_SUBNET_CIDR" \
  --region "$REGION" \
  --query 'Subnet.SubnetId' \
  --output text)
aws ec2 create-tags --resources "$PUBLIC_SUBNET_ID" --tags Key=Name,Value="${VPC_NAME}-public"

echo "Created public subnet: $PUBLIC_SUBNET_ID"

PRIVATE_SUBNET_ID=$(aws ec2 create-subnet \
  --vpc-id "$VPC_ID" \
  --cidr-block "$PRIVATE_SUBNET_CIDR" \
  --region "$REGION" \
  --query 'Subnet.SubnetId' \
  --output text)
aws ec2 create-tags --resources "$PRIVATE_SUBNET_ID" --tags Key=Name,Value="${VPC_NAME}-private"
echo "Created private subnet: $PRIVATE_SUBNET_ID"

# ── Internet Gateway ───────────────────────────────────────────────────
IGW_ID=$(aws ec2 create-internet-gateway \
  --region "$REGION" \
  --query 'InternetGateway.InternetGatewayId' \
  --output text)
aws ec2 attach-internet-gateway \
  --internet-gateway-id "$IGW_ID" \
  --vpc-id "$VPC_ID"
echo "Attached Internet Gateway: $IGW_ID"

# ── Route table for public subnet ─────────────────────────────────────
PUBLIC_RT_ID=$(aws ec2 create-route-table \
  --vpc-id "$VPC_ID" \
  --region "$REGION" \
  --query 'RouteTable.RouteTableId' \
  --output text)
aws ec2 create-route \
  --route-table-id "$PUBLIC_RT_ID" \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id "$IGW_ID"
aws ec2 associate-route-table \
  --route-table-id "$PUBLIC_RT_ID" \
  --subnet-id "$PUBLIC_SUBNET_ID"
echo "Public route table configured."

# ── NAT Gateway (for private subnet outbound) ─────────────────────────
# First allocate an Elastic IP
EIP_ALLOC=$(aws ec2 allocate-address \
  --domain vpc \
  --region "$REGION" \
  --query 'AllocationId' \
  --output 
text)\n\nNAT_GW_ID=$(aws ec2 create-nat-gateway \\\n  --subnet-id \"$PUBLIC_SUBNET_ID\" \\\n  --allocation-id \"$EIP_ALLOC\" \\\n  --region \"$REGION\" \\\n  --query 'NatGateway.NatGatewayId' \\\n  --output text)\n# NAT Gateway takes time to become available; do not proceed until it's active\necho \"Waiting for NAT Gateway to become available...\"\naws ec2 wait nat-gateway-available --nat-gateway-ids \"$NAT_GW_ID\"\necho \"NAT Gateway ready: $NAT_GW_ID\"\n\n# Private subnet route table (default one created with VPC, but we'll create a dedicated one)\nPRIVATE_RT_ID=$(aws ec2 create-route-table \\\n  --vpc-id \"$VPC_ID\" \\\n  --region \"$REGION\" \\\n  --query 'RouteTable.RouteTableId' \\\n  --output text)\naws ec2 create-route \\\n  --route-table-id \"$PRIVATE_RT_ID\" \\\n  --destination-cidr-block 0.0.0.0/0 \\\n  --nat-gateway-id \"$NAT_GW_ID\"\naws ec2 associate-route-table \\\n  --route-table-id \"$PRIVATE_RT_ID\" \\\n  --subnet-id \"$PRIVATE_SUBNET_ID\"\necho \"Private route table with NAT Gateway configured.\"\n\necho \"VPC setup complete. Summary:\"\necho \"  VPC: $VPC_ID\"\necho \"  Public subnet: $PUBLIC_SUBNET_ID\"\necho \"  Private subnet: $PRIVATE_SUBNET_ID\"\necho \"  Internet Gateway: $IGW_ID\"\necho \"  NAT Gateway: $NAT_GW_ID\"",
        "output": "Created VPC: vpc-0a1b2c3d4e5f67890\nCreated public subnet: subnet-12345678\nCreated private subnet: subnet-87654321\nAttached Internet Gateway: igw-12345678\nPublic route table configured.\nWaiting for NAT Gateway to become available...\nNAT Gateway ready: nat-0abcdef1234567890\nPrivate route table with NAT Gateway configured.\nVPC setup complete.\n  VPC: vpc-0a1b2c3d4e5f67890\n  Public subnet: subnet-12345678\n  Private subnet: subnet-87654321\n  Internet Gateway: igw-12345678\n  NAT Gateway: nat-0abcdef1234567890"
      }

AWS Certification Roadmap: Which Exams to Take and in What Order

If you're a working developer aiming to validate your AWS skills, the certification path can be confusing. Here's a brief, opinionated roadmap based on what matters for real engineering roles.

Start with: AWS Certified Cloud Practitioner (CLF-C02). This foundational exam covers basic cloud concepts, pricing, and core services. It's non-technical but gives you a broad overview. Skip it if you already have 6+ months of hands-on experience — go straight to Associate.

Then: AWS Certified Solutions Architect – Associate (SAA-C03). This is the gold standard for developers and architects. It tests your ability to design secure, resilient, cost-optimised architectures using core services. Many AWS-focused job postings list it as a preferred certification. Study focus: VPC, S3, EC2, Lambda, RDS, IAM, CloudFront, Route 53, and the right patterns for each use case.

Optional but valuable: AWS Certified Developer – Associate (DVA-C02). Overlaps with Solutions Architect but dives deeper into CI/CD, CloudFormation, Lambda, DynamoDB, and application deployment. If you write code on AWS, this validates development-specific skills.

Advanced: AWS Certified Solutions Architect – Professional (SAP-C02). For senior engineers who design multi-account, hybrid, and large-scale architectures. Expect scenario-based questions about migration, cost control, and security at enterprise scale.

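Specialty certifications (for example Security and Advanced Networking) are for focused roles. The lineup changes over time (the Data Analytics and Database specialties were retired in 2024), so check the current list before committing. Don't chase them unless your daily work demands it.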

The practical path: Cloud Practitioner (optional) → Solutions Architect Associate (mandatory) → Developer Associate (if you build apps) → Solutions Architect Professional (after 2+ years of AWS experience). This sequence gives you the vocabulary, design principles, and confidence to architect and debug production systems.

Production Insight
Certifications don't replace experience. The most valuable learning comes from breaking something in a dev account and fixing it. Use the exam as a structured study guide, then apply the concepts by building real projects. The AWS re:Invent videos, official documentation, and labs at AWS Skill Builder are excellent preparation resources. Combine certifications with a side project — like hosting a static site on S3+CloudFront or building a serverless API — to cement the concepts.
Key Takeaway
Start with Solutions Architect Associate (SAA-C03) for the broadest ROI. Add Developer Associate if you write code. Professional is for senior architects. Use exams as a study roadmap, not end goals.

Hands-On Practice: 5 AWS Exercises to Build Real Skills

Reading about AWS is not enough. You must provision resources, misconfigure them, break them, and fix them. These five exercises cover the services discussed in this article and will give you the hands-on confidence to tackle production issues.

Exercise 1: S3 Bucket Policy – Public vs Private
Create an S3 bucket and upload a file. New buckets have S3 Block Public Access enabled by default, so turn it off for this bucket first, then make the file publicly readable by adding a bucket policy and confirm that it loads without credentials. Turn Block Public Access back on and verify that public access now fails. Finally, create a pre-signed URL that grants temporary access. This exercise teaches bucket policies, public access blocks, and pre-signed URLs, a common production pattern for file sharing.
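For the last step of Exercise 1, a pre-signed URL can be generated straight from the CLI. This is a small sketch; the function name, bucket, and key are hypothetical, and the URL is signed with whatever credentials your CLI session is using.

```bash
# Hypothetical helper: print a pre-signed GET URL valid for N minutes.
# $1 = bucket name, $2 = object key, $3 = lifetime in minutes
presign_for_minutes() {
  bucket="$1"
  key="$2"
  minutes="$3"

  # --expires-in takes seconds, so convert from minutes.
  aws s3 presign "s3://$bucket/$key" --expires-in "$((minutes * 60))"
}
```

For example, `presign_for_minutes my-exercise-bucket report.pdf 5` prints a URL that stops working after five minutes, even though the bucket itself stays private.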

Exercise 2: Create an IAM Role and Attach a Policy via CLI
Use the AWS CLI to create an IAM role for EC2 with a trust policy that allows ec2.amazonaws.com to assume it. Attach a managed policy (e.g., AmazonS3ReadOnlyAccess) and add the role to an instance profile. Launch an EC2 instance with that instance profile and verify it can list S3 buckets without any access key. This exercise demonstrates instance profiles and role assumption, the foundation of secure AWS usage.
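The role setup in Exercise 2 can be sketched as two small functions. The function names and the role name you pass in are assumptions for illustration; the trust policy and the managed policy ARN are the standard ones for this scenario.

```bash
# Print the trust policy that lets the EC2 service assume the role.
ec2_trust_policy() {
  cat <<'JSON'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
JSON
}

# Hypothetical helper: create the role, attach read-only S3 access,
# and wrap the role in an instance profile of the same name.
# $1 = role name (placeholder)
create_s3_readonly_instance_role() {
  role_name="$1"

  aws iam create-role \
    --role-name "$role_name" \
    --assume-role-policy-document "$(ec2_trust_policy)"

  aws iam attach-role-policy \
    --role-name "$role_name" \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

  # EC2 attaches instance profiles, not roles, so create one and link them.
  aws iam create-instance-profile --instance-profile-name "$role_name"
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$role_name" \
    --role-name "$role_name"
}
```

After running `create_s3_readonly_instance_role exercise2-role`, launch the instance with that instance profile and run `aws s3 ls` on it to confirm no access key is needed.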

Exercise 3: Trigger a Lambda Function from an S3 Upload
Create a simple Lambda function (e.g., in Python) that logs the bucket and key of uploaded objects. Create an S3 bucket and add a trigger that invokes the Lambda function on s3:ObjectCreated:*. Upload a file and check CloudWatch Logs to confirm the invocation. This exercise is the building block for event-driven architectures.
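If you wire the Exercise 3 trigger from the CLI rather than the console, two steps are needed: grant S3 permission to invoke the function, then attach the notification to the bucket. The sketch below assumes the function already exists; the function names, the statement ID `s3-invoke`, and all arguments are hypothetical.

```bash
# Print the bucket notification document for one Lambda target.
# $1 = Lambda function ARN
s3_notification_config() {
  function_arn="$1"
  cat <<JSON
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "$function_arn",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}
JSON
}

# Hypothetical helper: connect a bucket to a Lambda function.
# $1 = bucket name, $2 = function name, $3 = function ARN
wire_s3_trigger() {
  bucket="$1"
  function_name="$2"
  function_arn="$3"

  # Step 1: resource-based policy so S3 may invoke the function.
  aws lambda add-permission \
    --function-name "$function_name" \
    --statement-id s3-invoke \
    --action lambda:InvokeFunction \
    --principal s3.amazonaws.com \
    --source-arn "arn:aws:s3:::$bucket"

  # Step 2: tell the bucket to send ObjectCreated events to the function.
  aws s3api put-bucket-notification-configuration \
    --bucket "$bucket" \
    --notification-configuration "$(s3_notification_config "$function_arn")"
}
```

The order matters: S3 validates that it can invoke the target when the notification is saved, so the permission has to exist first.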

Exercise 4: Launch an RDS Instance and Connect from an EC2 Instance
Create an RDS PostgreSQL instance in private subnets (RDS requires a DB subnet group that spans at least two Availability Zones). Create an EC2 instance in the same VPC (public subnet) and install the PostgreSQL client. Connect to the RDS instance using its endpoint hostname. Then enable deletion protection and attempt to delete the instance via CLI to see the error. This exercise covers VPC networking, security groups, and database management basics.

Exercise 5: Build a Two-Tier VPC with Public and Private Subnets
Create a VPC with CIDR 10.0.0.0/16. Add a public subnet and a private subnet. Set up an Internet Gateway for the public subnet and a NAT Gateway for the private subnet. Test connectivity: launch an EC2 instance in the public subnet (should have internet access) and another in the private subnet (should have outbound internet via NAT but no direct inbound). This exercise is the foundation of any secure network in AWS.

exercise5_vpc_two_tier.sh · BASH

```bash
#!/bin/bash
# Complete VPC with public and private subnets, Internet and NAT Gateways.
# Reuses logic from the VPC section but combines into a single script.
set -euo pipefail

VPC_CIDR="10.0.0.0/16"
PUBLIC_CIDR="10.0.1.0/24"
PRIVATE_CIDR="10.0.2.0/24"
REGION="us-east-1"

# Create VPC
VPC_ID=$(aws ec2 create-vpc --cidr-block "$VPC_CIDR" --region "$REGION" --query 'Vpc.VpcId' --output text)

# Enable DNS hostnames
aws ec2 modify-vpc-attribute --vpc-id "$VPC_ID" --region "$REGION" --enable-dns-hostnames '{"Value":true}'

# Remaining steps: subnets (PUBLIC_CIDR, PRIVATE_CIDR), Internet Gateway, NAT Gateway,
# and route tables, as in create_vpc_basic.sh from the VPC section.
```
Production incident · POST-MORTEM · severity: high

Lambda Timeout Wrecks Database Migration

Symptom
Migration job fails after exactly 15 minutes. Logs show: "Task timed out after 900.00 seconds". No partial rollback; the database is left in a mid-migration state.
Assumption
Lambda can run any workload because it's serverless and scales automatically.
Root cause
Lambda has a hard 15-minute execution timeout. The migration script took 20 minutes to complete. The function was killed before finishing.
Fix
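Switch to EC2, ECS, or AWS Batch for long-running tasks. Alternatively, break the migration into smaller, idempotent chunks processed by sequential Lambda invocations orchestrated with Step Functions, so that each invocation finishes well inside the limit and a failed chunk can be retried safely.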
Key lesson
  • Know Lambda's limits before choosing compute. 15-minute max is a hard wall.
  • Long-running batch jobs belong on persistent compute — EC2 or containers.
  • Always test with realistic data volume — development migrations are fast, production ones are not.
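To make the chunking idea from the fix concrete, here is a minimal sketch that splits a migration into ID ranges. It assumes the migration can be partitioned by a numeric row ID, which is not something the incident report states; the function name and the numbers are hypothetical. Each printed range would become the input of one Lambda invocation in a Step Functions loop.

```bash
# plan_chunks TOTAL_ROWS CHUNK_SIZE
# Prints one "start end" pair per line (1-based, inclusive).
plan_chunks() {
  total="$1"
  size="$2"

  # A chunk size below 1 would never advance, so refuse it.
  if [ "$size" -lt 1 ]; then
    echo "chunk size must be at least 1" >&2
    return 1
  fi

  start=1
  while [ "$start" -le "$total" ]; do
    end=$((start + size - 1))
    # The final chunk may be shorter than the others.
    if [ "$end" -gt "$total" ]; then
      end="$total"
    fi
    echo "$start $end"
    start=$((end + 1))
  done
}

plan_chunks 10 4
# prints:
# 1 4
# 5 8
# 9 10
```

Pick the chunk size by timing one chunk against production-sized data and leaving generous headroom under the 15-minute limit.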
Production debug guide
When AWS returns AccessDenied errors, use these steps to find the missing policy.
Symptom · 01
AccessDenied when calling s3:GetObject on bucket my-bucket
Fix
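Check the identity's attached policies using aws iam list-attached-role-policies --role-name YourRole, and its inline policies using aws iam list-role-policies --role-name YourRole. Look for a policy that allows s3:GetObject on arn:aws:s3:::my-bucket/*, and check the bucket policy for an explicit Deny.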
Symptom · 02
AccessDenied on ec2:StartInstances for a specific instance
Fix
Check if the policy uses resource-level permissions. EC2 actions may require specifying the instance ARN. Also check for explicit Deny statements in the same policy or in Service Control Policies (SCPs).
Symptom · 03
Role assumed from EC2 cannot write to CloudWatch Logs
Fix
Verify the trust policy allows sts:AssumeRole for ec2.amazonaws.com. Then check the permissions policy includes logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents on the correct log group ARN.
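Instead of reading policies by eye, you can ask IAM how it would evaluate a specific call. The sketch below wraps the policy simulator; the function name and the ARNs in the example are placeholders. Treat the result as a hint rather than a verdict, since the simulator evaluates the role's identity policies and does not by default account for bucket policies or every organization-level control.

```bash
# Hypothetical helper: simulate one action on one resource for a role.
# $1 = role ARN, $2 = action (e.g. s3:GetObject), $3 = resource ARN
simulate_access() {
  role_arn="$1"
  action="$2"
  resource_arn="$3"

  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names "$action" \
    --resource-arns "$resource_arn" \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output text
}
```

For example, `simulate_access arn:aws:iam::123456789012:role/YourRole s3:GetObject 'arn:aws:s3:::my-bucket/*'` prints the action next to a decision of `allowed`, `implicitDeny` (no policy grants it), or `explicitDeny` (a Deny statement matched).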
★ Lambda Debug Cheat Sheet
Quick commands to diagnose Lambda execution issues.
Lambda function times out
Immediate action
Check CloudWatch logs for "Task timed out" message at the exact timeout mark.
Commands
aws logs filter-log-events --log-group-name /aws/lambda/your-function --filter-pattern '"Task timed out"'
aws lambda get-function-configuration --function-name your-function | jq .Timeout
Fix now
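Increase the timeout via console or CLI (value in seconds, maximum 900): aws lambda update-function-configuration --function-name your-function --timeout 30. If the work cannot finish within 900 seconds, move it off Lambda.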
Lambda returns AccessDenied on S3
Immediate action
Identify the Lambda execution role ARN from the function configuration.
Commands
aws lambda get-function-configuration --function-name your-function | jq .Role
aws iam list-attached-role-policies --role-name RoleNameFromARN
Fix now
Add the missing S3 permission to the role's policy, or update the bucket policy to allow the Lambda role.
Lambda invocation fails with "ResourceNotFoundException"
Immediate action
Check that the event source (S3 bucket, SQS queue, etc.) still exists and the Lambda trigger is configured correctly.
Commands
aws lambda get-event-source-mapping --uuid your-uuid
aws s3api get-bucket-notification-configuration --bucket your-bucket
Fix now
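Reconfigure the trigger. For queue and stream sources (SQS, Kinesis, DynamoDB Streams), delete and recreate the event source mapping. For S3, reapply the bucket notification configuration and ensure the function's resource-based policy allows s3.amazonaws.com to invoke it.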
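One small gap in the AccessDenied entry above: `get-function-configuration` returns the execution role as a full ARN, while `list-attached-role-policies` expects just the role name. A short sketch of the conversion, with a hypothetical function name and example ARN:

```bash
# Print the role name from a role ARN.
# Stripping everything up to the last "/" also drops any role path,
# e.g. "service-role/" in arn:aws:iam::123456789012:role/service-role/my-fn-role.
role_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

role_name_from_arn "arn:aws:iam::123456789012:role/service-role/my-fn-role"
# prints: my-fn-role
```

The two cheat-sheet commands can then be chained: capture the ARN with `aws lambda get-function-configuration --function-name your-function --query Role --output text`, pass it through `role_name_from_arn`, and feed the result to `aws iam list-attached-role-policies --role-name`.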