Lambda Timeout at 15 Minutes — Migration Nightmare
Lambda's 15-minute hard timeout aborts migrations; most tutorials miss it.
20+ years shipping production infrastructure and CI/CD at scale. Notes here come from systems that actually shipped.
- EC2: virtual machine for continuous workloads; charges per second
- Lambda: event-driven function; charges per 1ms execution, zero at rest
- S3: object storage with 11 nines durability
- RDS: managed relational database with automated backups
- IAM: identity and permissions layer — everything is denied by default
- Performance insight: Lambda cold start adds 100ms–1s to first request
- Production insight: Lambda + RDS without a connection pool exhausts DB connections
- Biggest mistake: using the root account for daily work
Imagine you're opening a pizza restaurant. You could buy your own building, ovens, and delivery vans — or you could rent a kitchen by the hour, use a shared delivery fleet, and only pay when orders come in. AWS is that rental model, but for computing. Instead of buying servers, storage, and networking hardware, you rent exactly what you need, scale it up on a busy Friday night, and scale it back down when things are quiet. You pay for what you use, nothing more.
Every app you use daily, whether Netflix streaming your show, Airbnb finding you a room, or NASA processing telescope images, runs on someone else's hardware. That hardware is often Amazon Web Services. AWS holds roughly 31% of the global cloud market, more than any other provider, and understanding it isn't optional for a modern developer. Whether you're deploying your first side project or designing a system that serves millions, AWS is an environment you'll very likely be working in.
Before cloud computing existed, launching a product meant buying physical servers, installing them in a data centre, estimating your peak traffic years in advance, and paying for that capacity whether you used it or not. A startup that went viral overnight would crash under load with no way to recover quickly. AWS solved this by turning infrastructure into software — things you provision with an API call in seconds, pay for by the minute, and throw away when you're done.
By the end of this article you'll understand the five services every AWS project touches — EC2, S3, RDS, Lambda, and IAM — why each one exists, when to reach for it over the alternatives, and how they wire together into a real production architecture. You'll also walk away with the vocabulary and mental models that make AWS job interviews approachable.
What AWS Lambda's 15-Minute Timeout Really Means
AWS Lambda caps every function invocation at 15 minutes (900 seconds). The timeout is configurable per function, with a default of 3 seconds, but 900 seconds is the platform ceiling and no setting raises it. The clock starts when the function begins handling an invocation event and stops when it returns a response or raises an unhandled error. When the timeout fires, Lambda stops the invocation, discards any in-flight work, and reports an error whose message begins with "Task timed out after". It does not return an HTTP 408. Whether the event is retried depends on how the function was invoked: a synchronous caller simply gets the error back, while asynchronous invocations and queue-based event sources may retry according to their own settings. From the function's point of view, a timeout is a kill, not a pause.
The timeout is enforced by the Lambda service, outside your code. The function's allocated CPU and memory do not pause or slow down as the deadline approaches; execution continues until the wall clock hits the limit, then the invocation is terminated. Any cleanup logic, database commits, or external API calls that haven't completed by that point will not finish. The function's response payload is also lost; the caller receives only the timeout error.
You must design for this constraint from day one. If your workload regularly exceeds 15 minutes — such as large ETL jobs, video transcoding, or bulk database migrations — Lambda is the wrong compute service. Use AWS Fargate, Batch, or EC2 instead. For workloads that fit within the limit, implement idempotency and checkpointing so that a timeout does not corrupt state or lose data. The 15-minute ceiling is not negotiable; it is a design boundary.
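The checkpointing idea above can be sketched as a handler that watches its own deadline and stops cleanly before Lambda kills it. This is a minimal sketch, not a definitive implementation: `get_remaining_time_in_millis()` is the method the Lambda Python runtime exposes on the context object, while the event shape (`items`, `checkpoint`), the 30-second safety margin, and the `FakeContext` class used for local testing are assumptions made for illustration.

```python
import time

# Assumed buffer: stop 30 s before Lambda would terminate the invocation.
SAFETY_MARGIN_MS = 30_000


def process_batch(items, start_index, context, work=lambda item: None):
    """Process items from start_index until finished or nearly out of time.

    Returns the index to resume from (len(items) when everything is done).
    """
    index = start_index
    while index < len(items):
        # Ask the runtime how much wall-clock time is left for this invocation.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break  # stop cleanly; the caller persists the checkpoint
        work(items[index])
        index += 1
    return index


def handler(event, context):
    """Hypothetical event shape: {"items": [...], "checkpoint": int}."""
    items = event["items"]
    checkpoint = event.get("checkpoint", 0)
    next_index = process_batch(items, checkpoint, context)
    return {"checkpoint": next_index, "done": next_index == len(items)}


class FakeContext:
    """Local stand-in for the Lambda context object, for tests outside AWS."""

    def __init__(self, budget_ms):
        self._deadline = time.monotonic() + budget_ms / 1000

    def get_remaining_time_in_millis(self):
        return int((self._deadline - time.monotonic()) * 1000)
```

Whatever re-invokes the function (a queue, a state machine, a scheduler) passes the returned `checkpoint` back in, so each unit of work must be safe to repeat if the previous run died mid-item.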
AWS Global Infrastructure: Regions, Availability Zones, and Edge Locations
Before you provision a single resource, understand where it lives. AWS runs out of 33 geographic Regions worldwide (as of 2026), each containing at least three Availability Zones (AZs). An AZ is one or more data centres — physically separate, each with independent power, cooling, and networking. AZs are connected by high-speed, low-latency links, but a disaster that takes out one AZ leaves the others functional.
When you deploy an EC2 instance or an RDS database, you choose both a Region (e.g., us-east-1 in Northern Virginia) and an AZ within that Region. For high availability, you spread across multiple AZs. Multi-AZ architectures are the standard for production workloads. If one AZ goes offline, traffic shifts to the others.
Edge Locations extend AWS's footprint beyond Regions. These are points of presence (POPs) in major cities around the world, used by CloudFront (the CDN) and Route 53 (DNS) to cache content and respond to DNS queries from the closest edge. There are over 400 Edge Locations — far more than Regions — because it's cheaper to deploy a cache than a full data centre.
When building for global audiences, pick Regions closest to your users, and use CloudFront to cache static assets at edge locations. For disaster recovery, replicate data to a second Region hundreds of miles away.
The Five Core Services Every AWS Project Uses — and Why They Were Built
AWS has over 200 services, which is overwhelming until you realise that almost every architecture starts with the same five building blocks. Think of them as the five trades in construction: electricity, plumbing, walls, a roof, and a lock on the door. Everything else is finishing work.
EC2 (Elastic Compute Cloud) is your rented computer. It runs your application code exactly as a physical server would, but you can resize it, clone it, or delete it in minutes.
S3 (Simple Storage Service) is unlimited file storage. Not a database — a place to put files. Images, videos, backups, static websites, data exports. It's so reliable (eleven 9s of durability) that AWS themselves use it internally.
RDS (Relational Database Service) runs a managed PostgreSQL, MySQL, or other SQL engine. You don't patch it, back it up, or handle failover — AWS does. You just query it.
Lambda runs a function without a server. Upload code, define a trigger, done. No EC2 instance sitting idle waiting for work.
IAM (Identity and Access Management) is the lock on the door. Every call to every AWS service checks IAM first. Get this wrong and either nothing works or everything is exposed.
EC2 vs Lambda: Choosing the Right Compute Model Before You Write a Line of Code
The most consequential architectural decision in AWS isn't which database to use or how to structure your VPC. It's whether your code runs on EC2 or Lambda. Getting this wrong means either paying for idle servers 24/7 or hitting cold-start timeouts on user-facing requests.
Use EC2 when: your workload is continuous and predictable, you need full OS control, you're running long-running processes (video encoding, ML training), or you're lifting-and-shifting an existing app. An EC2 instance is just a VM — it starts up and stays up until you stop it.
Use Lambda when: your workload is event-driven and intermittent. An API endpoint that gets 50 requests per minute, a function that fires when a file lands in S3, a nightly data transform. Lambda charges per 1ms of execution. If the function doesn't run, you pay nothing.
The trap beginners fall into is using Lambda for everything because it sounds cheaper and more modern. Lambda has a hard 15-minute execution timeout. Put a 20-minute database migration in a Lambda and it will die mid-run, leaving your schema in a broken state. Put a CPU-intensive image processor behind a user-facing Lambda and the combination of cold starts and CPU that scales only with configured memory will frustrate your users.
The sweet spot is using Lambda for glue — the code that reacts to events and orchestrates other services — while EC2 or containers handle the persistent, long-running workloads.
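To make the cost trade-off concrete, here is a rough back-of-the-envelope calculation. All three prices are assumptions used for illustration (they resemble published list prices but vary by Region and change over time, and the free tier is ignored), so check the current pricing pages before relying on the numbers.

```python
# Assumed illustrative rates; verify against current AWS pricing.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667
LAMBDA_PRICE_PER_MILLION_REQUESTS = 0.20
EC2_PRICE_PER_HOUR = 0.0416  # one small always-on instance (hypothetical)
HOURS_PER_MONTH = 730


def lambda_monthly_cost(requests_per_month, avg_duration_ms, memory_mb):
    """Compute charge (GB-seconds) plus request charge for one month."""
    gb_seconds = requests_per_month * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * LAMBDA_PRICE_PER_GB_SECOND
    requests = requests_per_month / 1_000_000 * LAMBDA_PRICE_PER_MILLION_REQUESTS
    return compute + requests


def ec2_monthly_cost(instance_count=1):
    """An instance is billed for every hour it is up, busy or idle."""
    return instance_count * EC2_PRICE_PER_HOUR * HOURS_PER_MONTH


# The "50 requests per minute" API from the text, over a 30-day month.
requests = 50 * 60 * 24 * 30
print(f"Lambda: ${lambda_monthly_cost(requests, 100, 512):.2f}")
print(f"EC2:    ${ec2_monthly_cost():.2f}")
```

The crossover point moves quickly as request volume or duration grows, which is why steady, heavy traffic tends to favour always-on compute.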
IAM Done Right: Why Least-Privilege Access Is Not Optional
IAM is the part of AWS that most tutorials rush through to get to the 'interesting' stuff, and it's the part that causes the most expensive real-world incidents. The 2019 Capital One breach, which exposed roughly 100 million customer records, hinged on an over-permissive IAM role reached through a misconfigured firewall. Understanding IAM isn't bureaucracy; it's engineering.
Every entity in AWS (a user, an EC2 instance, a Lambda function) has an identity. Every action on every resource is authorised by checking IAM policies attached to that identity. By default, everything is denied. You grant access explicitly.
The three concepts you must internalise are: Users (humans), Roles (services and applications — an EC2 instance assumes a role, not a user), and Policies (JSON documents that say what is allowed or denied on which resources).
The golden rule is **least privilege**: grant only the exact permissions needed for a specific task, scoped to the specific resource. Not s3:* on *, which is every S3 action on every bucket in your account. Instead: s3:GetObject on arn:aws:s3:::myapp-assets/*.
If a Lambda function only needs to read from one S3 bucket, its execution role should be able to do exactly that — nothing else. If that Lambda is compromised, the blast radius is one bucket in read-only mode, not your entire AWS account.
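As a sketch of what that looks like in practice, the snippet below builds the read-only policy document described above and includes a small lint that flags wildcard grants. The policy JSON structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) is the standard IAM format; the helper names and the wildcard heuristic are hypothetical and deliberately simple.

```python
import json


def read_only_bucket_policy(bucket_name, prefix=""):
    """Policy allowing only s3:GetObject on objects in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/{prefix}*"],
            }
        ],
    }


def find_wildcards(policy):
    """Return (statement_index, field, value) for overly broad Allow entries.

    Heuristic only: flags "*" and values ending in ":*" (e.g. "s3:*").
    """
    findings = []
    for index, statement in enumerate(policy["Statement"]):
        if statement.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = statement.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                if value == "*" or value.endswith(":*"):
                    findings.append((index, field, value))
    return findings


policy = read_only_bucket_policy("myapp-assets")
print(json.dumps(policy, indent=2))
```

A check like this can run in CI against policy files before they are deployed; IAM Access Analyzer does a far more thorough job, so treat this only as a first tripwire.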
AWS Shared Responsibility Model: What AWS Secures and What You Must Secure
A common misconception for new AWS users is that AWS is fully responsible for security. In reality, security is shared: AWS secures the cloud infrastructure (data centres, hardware, networking, hypervisors), while you secure everything inside that infrastructure — your data, applications, operating systems, network configurations, IAM policies, and encryption.
This is often described as Security OF the Cloud (AWS's responsibility) vs Security IN the Cloud (your responsibility). The boundary shifts depending on the service. For EC2, you manage the OS, patches, and firewall; for RDS, AWS manages the OS and database engine patching, but you manage database access, user accounts, and data encryption at rest. With Lambda, AWS manages the runtime environment, but you manage function code, environment variables, and execution role permissions.
In practice, this means: always encrypt data at rest (S3 SSE, EBS encryption, RDS encryption), encrypt data in transit (SSL/TLS), use IAM roles instead of long-lived access keys, regularly patch your EC2 AMIs, and never open more ports than necessary. Assume AWS will protect the physical data centre; assume everything else is your problem.
AWS Pricing Models: On-Demand, Reserved, Spot, and Savings Plans — When to Use Each
AWS offers four primary pricing models, and choosing the wrong one is like paying first-class for a cargo flight — you get the same seat but at a wildly different price. Understanding when to use each can cut your compute costs by 50-70% with zero architectural change.
On-Demand: pay per hour or per second with no commitment. Best for short-term, spiky, or unpredictable workloads — development environments, new applications still being evaluated, or workloads that cannot tolerate interruption. You pay a premium for flexibility.
Reserved Instances (RIs): commit to 1 or 3 years of specific instance usage in a specific region. You save up to 72% compared to On-Demand. Best for steady-state, predictable workloads — your production web server running 24/7, an RDS instance for a core database. Convertible RIs allow some flexibility in instance family.
Spot Instances: use spare EC2 capacity at up to 90% off the On-Demand price. There is no bidding any more; you pay the current Spot price, optionally capped by a maximum price you set. AWS can reclaim the instance with a 2-minute warning. Best for fault-tolerant, stateless, or batch workloads: data processing, image rendering, CI/CD workers, or any workload that can be interrupted and resumed. Never use Spot for databases or stateful app servers.
Savings Plans: a flexible discount model in exchange for a commitment to a consistent amount of compute usage (measured in $/hour) for 1 or 3 years. Compute Savings Plans apply across EC2, Lambda, and Fargate regardless of instance family or Region; EC2 Instance Savings Plans give a deeper discount but are tied to one instance family in one Region. They offer similar savings to RIs but with more flexibility. Best for organisations with diverse compute usage across multiple services.
The practical strategy: Use On-Demand for anything temporary or variable. For baseline, always-on workloads, buy Reserved Instances or Savings Plans. For batch processing, test environments, or non-critical services, use Spot. Never use On-Demand for predictable, long-running workloads — you're throwing money away.
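A quick way to see how much is at stake is to price the same always-on instance under each model. The hourly rate below is hypothetical, and the discounts are the "up to" figures quoted above, so the results are best-case numbers rather than a quote.

```python
ON_DEMAND_HOURLY = 0.10  # hypothetical On-Demand rate in $/hour
HOURS_PER_MONTH = 730

# Best-case discounts taken from the figures in the text ("up to").
MAX_DISCOUNT = {"on_demand": 0.0, "reserved": 0.72, "spot": 0.90}


def monthly_cost(model, hours=HOURS_PER_MONTH, on_demand_hourly=ON_DEMAND_HOURLY):
    """Monthly cost of one instance under the given pricing model."""
    return hours * on_demand_hourly * (1 - MAX_DISCOUNT[model])


for model in MAX_DISCOUNT:
    print(f"{model:>10}: ${monthly_cost(model):.2f}/month")
```

Real Spot prices float with demand and Reserved discounts depend on term and payment option, so the gap in your account will be smaller than this sketch suggests.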
EC2 vs Lambda vs Fargate: When to Use Each Compute Service
Beyond EC2 and Lambda, Fargate is a third compute option that sits between them — it runs containers without managing servers. Here's how to decide among all three.
EC2 (Elastic Compute Cloud): Full control over the OS and runtime. You manage everything from the kernel up. Best when you need custom AMIs, specific kernel modules, or direct hardware access (GPU). Also ideal for lift-and-shift migrations, long-running jobs (>15 min), and workloads requiring persistent storage attachments (EBS).
Lambda: Zero management. You upload code, set a trigger, and AWS runs it. Best for event-driven, short-lived tasks (<15 min), intermittent workloads with idle periods, and functions that need to scale to thousands of concurrent invocations instantly. Cold start can be an issue for latency-sensitive APIs.
Fargate: Run Docker containers without managing EC2 instances. You define the task (CPU, memory, container image), and AWS provisions and manages the underlying servers. Best for microservices that run continuously, batch jobs longer than Lambda's timeout, or workloads that need consistent performance without the overhead of EC2 management. Fargate is more expensive than EC2 for large, predictable workloads (you pay a ~10-20% markup for the managed infrastructure).
The practical guidance: use Lambda as glue for event-driven tasks, use Fargate for persistent containerised microservices (especially if you already use ECS/EKS), and use EC2 for any workload that needs full control, runs very long, or is cost-sensitive at scale. Many architectures mix all three: Lambda for processing S3 uploads, Fargate for the API server, and EC2 spot fleets for data processing jobs.
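The guidance above can be written down as a small decision function. This is only one way to encode the rules of thumb in this section; the `Workload` fields and the order of the checks are assumptions, and cost sensitivity at scale is left out for brevity.

```python
from dataclasses import dataclass

LAMBDA_MAX_SECONDS = 900  # Lambda's 15-minute ceiling


@dataclass
class Workload:
    max_runtime_seconds: int
    event_driven: bool
    needs_os_control: bool = False  # custom AMI, kernel modules, GPU access
    containerised: bool = False


def choose_compute(workload):
    """Map a workload to EC2, Lambda, or Fargate using the section's rules."""
    if workload.needs_os_control:
        return "EC2"
    if workload.event_driven and workload.max_runtime_seconds < LAMBDA_MAX_SECONDS:
        return "Lambda"
    if workload.containerised:
        return "Fargate"
    return "EC2"


# The 20-minute migration from earlier in the article does not fit Lambda.
print(choose_compute(Workload(max_runtime_seconds=20 * 60, event_driven=True)))
```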
How a Real Production Architecture Wires These Services Together
Seeing services in isolation is useful for learning. But AWS's real power emerges when services compose. Here's how a production web application typically connects the pieces we've covered.
A user hits your domain. Route 53 (AWS DNS) resolves it to a CloudFront distribution (CDN). CloudFront serves static assets (HTML, CSS, JS) directly from S3: no server involved, globally cached, and cheap at scale. For dynamic API requests, CloudFront forwards to an Application Load Balancer, which distributes traffic across EC2 instances (or ECS containers) running your application.
The application reads and writes to RDS for structured data, and stores uploaded files directly to S3 using pre-signed URLs (so files go direct from the browser to S3 — never through your server). When a file lands in S3, an event triggers a Lambda function for async processing: thumbnail generation, virus scanning, metadata extraction.
Everything runs inside a VPC (Virtual Private Cloud) — a private network. The RDS instance has no public IP. The EC2 instances live in private subnets. Only the Load Balancer is internet-facing. IAM roles control exactly which service can talk to which resource.
This pattern — static assets on S3/CloudFront, compute on EC2/Lambda, data on RDS, files on S3, security via IAM and VPC — handles everything from a startup's MVP to a Fortune 500 platform without fundamentally changing shape.
Networking and Security: VPC, Security Groups, and NACLs — The Invisible Backbone
Most compute and database resources (EC2 instances, RDS databases, containers) live inside a Virtual Private Cloud (VPC), your private slice of the AWS network. Without understanding VPC basics, you'll struggle to connect services securely. The VPC is where you define subnets (public and private), route tables, internet gateways, and NAT gateways.
Security Groups are stateful firewalls attached to individual resources (EC2 instances, RDS databases, Lambda functions running in a VPC). If you allow inbound traffic on port 443, the response traffic for those connections is automatically allowed out, regardless of outbound rules. This is convenient but can mask misconfigurations.
Network ACLs (NACLs) are stateless firewalls applied to entire subnets. You must define both inbound and outbound rules explicitly. If you allow inbound HTTP but forget outbound return traffic, the connection fails silently.
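The difference between stateful and stateless filtering is easier to see in a toy model. The functions below are a simplified illustration, not how AWS evaluates rules (real NACLs also have numbered allow and deny entries, protocols, and CIDR ranges); rules here are just inclusive port ranges, and all names are hypothetical.

```python
def port_allowed(rules, port):
    """rules is a list of (low, high) inclusive port ranges."""
    return any(low <= port <= high for low, high in rules)


def security_group_round_trip(inbound_rules, request_port):
    """Stateful: if the request is allowed in, the response is allowed out."""
    return port_allowed(inbound_rules, request_port)


def nacl_round_trip(inbound_rules, outbound_rules, request_port, client_port):
    """Stateless: the response needs its own outbound rule.

    Responses go back to the client's ephemeral port, so the outbound rules
    must cover that port as well.
    """
    return port_allowed(inbound_rules, request_port) and port_allowed(
        outbound_rules, client_port
    )


https_in = [(443, 443)]
print(security_group_round_trip(https_in, 443))
print(nacl_round_trip(https_in, [], 443, 50000))
print(nacl_round_trip(https_in, [(1024, 65535)], 443, 50000))
```

The second call is the silent failure described above: the request arrives, but the reply has no outbound rule to leave through.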
The standard pattern: place your load balancer in a public subnet with a Security Group allowing HTTP/HTTPS from the internet. Place EC2 instances and RDS in private subnets with Security Groups that only allow traffic from the load balancer's security group. A bastion host (jump box) in a public subnet provides secure SSH access for administrators.
Enable VPC Flow Logs to capture metadata about the IP traffic flowing through your VPC (source, destination, ports, and whether it was accepted or rejected). This is invaluable for debugging connectivity issues and security incidents.
AWS Certification Roadmap: Which Exams to Take and in What Order
If you're a working developer aiming to validate your AWS skills, the certification path can be confusing. Here's a brief, opinionated roadmap based on what matters for real engineering roles.
Start with: AWS Certified Cloud Practitioner (CLF-C02). This foundational exam covers basic cloud concepts, pricing, and core services. It's non-technical but gives you a broad overview. Skip it if you already have 6+ months of hands-on experience — go straight to Associate.
Then: AWS Certified Solutions Architect – Associate (SAA-C03). This is the gold standard for developers and architects. It tests your ability to design secure, resilient, cost-optimised architectures using core services. Most job postings list this as a preferred certification. Study focus: VPC, S3, EC2, Lambda, RDS, IAM, CloudFront, Route 53, and the right patterns for each use case.
Optional but valuable: AWS Certified Developer – Associate (DVA-C02). Overlaps with Solutions Architect but dives deeper into CI/CD, CloudFormation, Lambda, DynamoDB, and application deployment. If you write code on AWS, this validates development-specific skills.
Advanced: AWS Certified Solutions Architect – Professional (SAP-C02). For senior engineers who design multi-account, hybrid, and large-scale architectures. Expect scenario-based questions about migration, cost control, and security at enterprise scale.
Specialty certifications (Security, Data Analytics, Machine Learning, Networking, Database) are for focused roles. Don't chase them unless your daily work demands it.
The practical path: Cloud Practitioner (optional) → Solutions Architect Associate (mandatory) → Developer Associate (if you build apps) → Solutions Architect Professional (after 2+ years of AWS experience). This sequence gives you the vocabulary, design principles, and confidence to architect and debug production systems.
Hands-On Practice: 5 AWS Exercises to Build Real Skills
Reading about AWS is not enough. You must provision resources, misconfigure them, break them, and fix them. These five exercises cover the services discussed in this article and will give you the hands-on confidence to tackle production issues.
Exercise 1: S3 Bucket Policy – Public vs Private
Create an S3 bucket, upload a file, and make it publicly readable by adding a bucket policy (new buckets have Block Public Access enabled by default, so you will need to turn it off first). Then re-enable S3 Block Public Access and verify that public access fails. Finally, create a pre-signed URL that grants temporary access. This exercise teaches bucket policies, public access blocks, and signed URLs, a common production pattern for file sharing.
Exercise 2: Create an IAM Role and Attach a Policy via CLI
Use the AWS CLI to create an IAM role for EC2 with a trust policy that allows ec2.amazonaws.com to assume it. Attach a managed policy (e.g., AmazonS3ReadOnlyAccess). Launch an EC2 instance with this role and verify it can list S3 buckets without any access key. This exercise demonstrates instance profiles and role assumption, the foundation of secure AWS usage.
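For Exercise 2, the trust policy is just a small JSON document. The sketch below generates it in Python so you can inspect it before saving it to a file; the function names are hypothetical, while the document structure and the `sts:AssumeRole` action are the standard form for a service trust policy.

```python
import json


def ec2_trust_policy():
    """Trust policy letting the EC2 service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def trust_policy_json():
    """Serialise the policy, ready to save as trust-policy.json."""
    return json.dumps(ec2_trust_policy(), indent=2)


print(trust_policy_json())
```

Saved to a file, this can be passed to `aws iam create-role` through its `--assume-role-policy-document file://trust-policy.json` option.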
Exercise 3: Trigger a Lambda Function from an S3 Upload
Create a simple Lambda function (e.g., in Python) that logs the bucket and key of uploaded objects. Create an S3 bucket and add a trigger that invokes the Lambda function on s3:ObjectCreated:*. Upload a file and check CloudWatch Logs to confirm the invocation. This exercise is the building block for event-driven architectures.
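A minimal handler for Exercise 3 might look like the sketch below. It assumes the standard S3 notification event shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`); object keys arrive URL-encoded in these events, so they are decoded before logging. The helper name `extract_objects` is hypothetical.

```python
import logging
from urllib.parse import unquote_plus

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def extract_objects(event):
    """Return (bucket, key) pairs from an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Log every uploaded object; output lands in CloudWatch Logs."""
    objects = extract_objects(event)
    for bucket, key in objects:
        logger.info("New object: s3://%s/%s", bucket, key)
    return {"processed": len(objects)}
```

You can exercise it locally by calling `handler` with a hand-written event dictionary and `None` as the context before wiring up the real trigger.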
Exercise 4: Launch an RDS Instance and Connect from an EC2 Instance
Create an RDS PostgreSQL instance in a private subnet. Create an EC2 instance in the same VPC (public subnet) and install the PostgreSQL client. Connect to the RDS instance using its endpoint hostname. Then enable deletion protection and attempt to delete the instance via CLI to see the error. This exercise covers VPC networking, security groups, and database management basics.
Exercise 5: Build a Two-Tier VPC with Public and Private Subnets
Create a VPC with CIDR 10.0.0.0/16. Add a public subnet and a private subnet. Attach an Internet Gateway and route the public subnet through it. Place a NAT Gateway in the public subnet and route the private subnet's outbound traffic through it. Test connectivity: launch an EC2 in the public subnet (should have internet access) and another in the private subnet (should have outbound internet via NAT but no direct inbound). This exercise is the foundation of any secure network in AWS.
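Before creating anything for Exercise 5, it helps to plan the address space. The sketch below carves the 10.0.0.0/16 VPC into /24 subnets with Python's standard `ipaddress` module. The /24 size and the helper names are assumptions; the constant reflects the five addresses AWS reserves in every subnet.

```python
import ipaddress
from itertools import islice

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves five addresses in each subnet


def plan_subnets(vpc_cidr, count, new_prefix=24):
    """Return the first `count` non-overlapping subnets of the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return list(islice(vpc.subnets(new_prefix=new_prefix), count))


def usable_hosts(subnet):
    """Addresses left for your resources after AWS's reserved ones."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


public_subnet, private_subnet = plan_subnets("10.0.0.0/16", 2)
print(public_subnet, usable_hosts(public_subnet))
print(private_subnet, usable_hosts(private_subnet))
```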
What is DevOps — And Why AWS Doesn't Give a Damn About Your Job Titles
DevOps isn't a role you hire for. It's a contract between developers and operators that says: 'We stop throwing code over the wall and start owning what we ship together.' The core principles — automation, CI/CD, monitoring, feedback loops — existed long before someone put a buzzword on a slide deck.
Here's what actually matters: you automate everything that hurts. You build pipelines that catch failures before they hit production. You monitor in real-time because users won't file tickets — they'll just leave. AWS doesn't care if your team is called 'DevOps' or 'SRE' or 'Site Reliability Wizards.' It gives you the tools to enforce this contract. CodePipeline for CI/CD. CloudWatch for monitoring. Systems Manager for operational automation.
The real question isn't 'what is DevOps.' It's: does your team own the outcome, or just the code? AWS forces you to answer that question the first time a deployment breaks at 3 AM.
Getting Started: Your First AWS DevOps Account Without Getting Fired
Setting up an AWS account is the easy part. Keeping it from becoming a security incident waiting to happen is where most people fail. Here's the bare minimum: create the account, enable MFA on the root user immediately, do not generate access keys for root (delete any that already exist), then lock the root user in a drawer. You don't deploy from root. Period.
Next: create an IAM user for yourself with administrator access — but only for initial setup. Then build a least-privilege user for your actual work. The principle is simple: if a deployment doesn't need S3 delete permissions, it shouldn't have them. AWS IAM Access Analyzer will tell you when you're being sloppy. Listen to it.
For the actual DevOps setup: enable CloudTrail from day one. It's your audit log when something breaks at 2 AM and you need to know who deleted the production database. Set up a budget alert. Trust me — the first month's bill will be a shock if you don't. Start with the Free Tier, learn in a single region (us-east-1 is fine), and never leave a playground environment running overnight.
Accessing AWS: The API, Console, and CLI Are Not Interchangeable
You will access AWS through three doors: the web console, the CLI, and the SDK/API. They all talk to the same backend, but they are not interchangeable for production work.
The console is for exploration, debugging, and one-off tasks. It lulls you into clicking through wizard UIs. That is fine for learning. Dangerous for operations. Every click is a manual step that cannot be version-controlled, audited, or repeated reliably. If you are building infrastructure, you should be writing code, not clicking buttons.
The CLI and SDK are your production tools. The CLI is for scripting and ad-hoc automation. The SDK is for embedding AWS calls into your application code. Both authenticate through the same IAM credentials — never hardcode them. Use environment variables, instance profiles, or AWS Secrets Manager. The moment you paste an access key into a config file committed to Git, you have a security incident waiting to happen.
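One cheap safeguard against committed credentials is a scan for strings that look like access key IDs. The pattern below is an assumption based on the commonly seen format (a four-letter prefix such as AKIA or ASIA followed by 16 uppercase letters or digits); it is a tripwire, not a complete secret scanner, and the function name is hypothetical.

```python
import re

# Assumed format: "AKIA" (long-term) or "ASIA" (temporary) plus 16 characters.
ACCESS_KEY_PATTERN = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_hardcoded_keys(text):
    """Return every substring of text that looks like an access key ID."""
    return ACCESS_KEY_PATTERN.findall(text)


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
bad_config = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'
good_config = 'aws_access_key_id = os.environ["AWS_ACCESS_KEY_ID"]'

print(find_hardcoded_keys(bad_config))
print(find_hardcoded_keys(good_config))
```

Run over staged files in a pre-commit hook, a check like this catches the most common mistake before it reaches the remote repository.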
IAM Users vs Roles: Stop Creating Users for Machines
IAM users have long-term credentials. IAM roles have temporary credentials that rotate automatically. If you are tempted to create an IAM user for your EC2 instance or Lambda function, stop.
IAM roles are the only correct way to give AWS resources permissions. You attach a role to the resource, and AWS hands it temporary credentials valid for up to 12 hours. If those credentials leak, they expire. An IAM user's access keys live until you revoke them — and you will forget to rotate them.
The same logic applies to cross-account access. Never create a user in Account A and share keys with Account B. Instead, establish a trust policy on Account A's role that allows Account B's role to assume it. Now Account B can act in Account A without any long-lived secrets. This is the foundation of AWS Organizations and multi-account security.
One exception: human developers doing local testing. They need long-lived keys. Use an IAM user with a strong password and MFA. But the moment that code runs on a server, the user goes away and the role takes over.
Why AWS CodeCommit Is Not Just a Git Clone
AWS CodeCommit is a fully managed source control service that hosts secure Git repositories. Unlike GitHub or GitLab, it integrates natively with AWS IAM for access control, so repository access is governed by IAM policies rather than a separate user database, and with the credential helper or git-remote-codecommit your AWS credentials become your Git credentials. This reduces attack surface and simplifies compliance audits. CodeCommit automatically encrypts repositories at rest and in transit, and scales without provisioning servers. Its tight integration with CodeBuild, CodeDeploy, and CodePipeline makes it a natural choice for AWS-native CI/CD. However, the tradeoff is a thinner feature set: pull requests and approval rules exist, but there is no forking and the ecosystem of third-party integrations is much smaller. Use CodeCommit when your team already operates within AWS and needs audit trails, VPC isolation, or cross-account access. Avoid it if you rely on GitHub Actions or third-party integrations that AWS doesn't mirror. The real value is not the Git features; it's the IAM-powered security boundary that eliminates credential sprawl.
AWS CodeDeploy: Why You Deploy to Instances, Not to Environments
AWS CodeDeploy automates application deployments to EC2, Lambda, or on-premises servers. It does not care about your environment labels; it deploys to compute targets based on deployment groups. You define a deployment group (e.g., production-asg) containing auto scaling group instances or Lambda aliases. CodeDeploy then rolls out your revision (AppSpec + artifacts) with configurable traffic shifting strategies: AllAtOnce for dev, Rolling for production, or Blue/Green for zero-downtime. The AppSpec file defines lifecycle hooks (BeforeInstall, AfterInstall, ValidateService) where you run validation scripts. If a hook fails, the deployment fails, and CodeDeploy rolls back automatically when automatic rollback is enabled on the deployment group. This removes manual SSH and error-prone bash scripts from deployments. Critical detail: deployment groups can be updated without redeploying, and you can stop a deployment mid-rollout. Use with CodePipeline for fully automated rollouts. Avoid manual tagging or environment checks; let CodeDeploy manage the target mapping. The missing link most engineers overlook: hook scripts must be idempotent, because CodeDeploy re-runs them on retries.
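Idempotent simply means that running the script twice leaves the system in the same state as running it once. The sketch below shows the idea with a hypothetical hook step that adds a line to a config file; nothing here is CodeDeploy-specific, and the file name and setting are made up for illustration.

```python
import tempfile
from pathlib import Path


def ensure_line(path, line):
    """Append line to the file only if it is missing.

    Returns True if the file was changed, False if it was already correct.
    """
    file = Path(path)
    text = file.read_text(encoding="utf-8") if file.exists() else ""
    if line in text.splitlines():
        return False  # already applied; a retry is a no-op
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with file.open("a", encoding="utf-8") as handle:
        handle.write(separator + line + "\n")
    return True


def demo():
    """Run the step twice, as a retried hook would, in a scratch directory."""
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "app.conf"
        first = ensure_line(config, "feature_flag=on")
        second = ensure_line(config, "feature_flag=on")  # simulated retry
        occurrences = config.read_text(encoding="utf-8").count("feature_flag=on")
        return first, second, occurrences


print(demo())
```

The naive version, a blind append, would duplicate the setting on every retry; checking current state before changing it is the whole trick.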
1. Introduction
DevOps is not a tool — it's a cultural shift that collapses the wall between development and operations. AWS provides the raw infrastructure to automate that collapse, but understanding the why is critical before touching any button. The core promise of DevOps on AWS is this: infrastructure becomes code, deployments become automated, and feedback loops shrink from weeks to minutes. When you define your entire stack in CloudFormation or Terraform, you eliminate configuration drift and manual errors. AWS services like CodePipeline, CodeBuild, and CodeDeploy exist to formalize that automation, but they are meaningless without first understanding the problem they solve — namely that manual deployment is the single biggest source of production downtime. The real metric of success is not how many AWS services you know, but how fast you can recover from a failed deployment. Before you build pipelines, ask yourself: what does 'done' mean for your team? If you cannot answer that without mentioning a business outcome, you are not ready for AWS DevOps.
6. Implementing CI/CD on AWS
Continuous Integration and Continuous Deployment on AWS is not a pipeline — it is a feedback mechanism. The why is simple: every code commit should be deployable to production without human intervention, because hands on keyboards introduce variability and delay. AWS CodePipeline orchestrates the flow, but the real intelligence lives in the triggers. Instead of polling Git every hour, configure webhooks so that CodePipeline activates on every push to main. CodeBuild compiles, tests, and packages your application in isolated containers — this catches dependency hell before it reaches staging. CodeDeploy then pushes artifacts to EC2, Lambda, or ECS using the same deployment strategy every time, eliminating the 'works on my machine' syndrome. The key trap: don't deploy to environments; deploy to instances. Use blue/green or rolling deployments with health checks. A failed deployment should automatically roll back, not require a midnight Slack message. Remember: CI/CD is not about speed — it is about repeatable, auditable, safe delivery. If your pipeline takes 10 minutes but never breaks production, you win.
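The rolling-deployment-with-rollback behaviour described above can be simulated in a few lines. This is a toy model, not the CodeDeploy API: the fleet is a plain dictionary of instance IDs to versions, and `is_healthy` is a hypothetical health-check callback.

```python
def rolling_deploy(fleet, new_version, is_healthy):
    """Update instances one at a time; restore all of them on any failure.

    fleet maps instance_id -> version and is modified in place.
    Returns "succeeded" or "rolled_back".
    """
    previous = dict(fleet)  # snapshot to restore from
    for instance_id in fleet:
        fleet[instance_id] = new_version
        if not is_healthy(instance_id, new_version):
            fleet.update(previous)  # automatic rollback, no human involved
            return "rolled_back"
    return "succeeded"


fleet = {"i-aaa": "v1", "i-bbb": "v1", "i-ccc": "v1"}
status = rolling_deploy(fleet, "v2", lambda instance_id, version: instance_id != "i-bbb")
print(status, fleet)
```

Because the health check runs after each instance, a bad release is caught while most of the fleet is still on the old version.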
10. Hands-On Projects for Learning
Theory evaporates without practice. Do not start with a complex microservice architecture — start with a single Lambda function behind API Gateway, triggered by a CodeCommit push. Project 1: Build a static site on S3 behind CloudFront with a CI pipeline that invalidates the cache on deploy. You will learn CloudFormation, IAM roles, and the pain of SSL certs. Project 2: Containerize a Node.js app with Docker, push to ECR, and deploy on ECS Fargate using CodePipeline. You will touch task definitions, service auto-scaling, and load balancer health checks. Project 3: implement blue/green deployment on EC2 with CodeDeploy — deliberately break the health check to see an automatic rollback. Each project should be fully destroyed and rebuilt from code in under 30 minutes. If it takes longer, your automation is wrong. The goal is not the application — it is the infrastructure as code that deploys it. Track your time: every minute spent clicking in the AWS console is a minute stolen from learning real automation.
12. Conclusion
AWS DevOps is not a certification or a resume bullet — it is the discipline of eliminating trust in humans and placing trust in code. If you walk away with one thing, let it be this: every manual action you take in AWS is a future incident waiting to happen. Script everything. Automate rollbacks before you automate deployments. Never use root credentials, never hardcode secrets, never approve a production deploy on a Friday. The tools — CodePipeline, CloudFormation, EKS — are just syntax. The real skill is knowing when to say 'no' to a process that cannot be automated. Production architectures fail not because of bad code, but because of bad processes. AWS gives you the raw power to build anything; it also gives you the power to destroy your entire account in one command. Use Infrastructure as Code, enforce least-privilege IAM, and treat every deployment like it will fail. Do that, and you will not just be a DevOps engineer on AWS — you will be someone who never gets paged at 3 AM.
Kubernetes on AWS: The Orchestrator You Didn't Know You Needed
Kubernetes (K8s) is not a deployment tool; it is a declarative operating system for your containers. On AWS, you have two paths: EKS (managed) or self-hosted on EC2. Choose EKS unless you enjoy patching control plane nodes at 2 AM. The why is simple: Kubernetes gives you self-healing, scaling, and rolling updates out of the box. When a pod dies, K8s restarts it. When traffic spikes, the Horizontal Pod Autoscaler spins up replicas. When you push a new image, a rolling update replaces old pods without downtime. But here is the trap: Kubernetes adds complexity. You now manage Ingress controllers, RBAC, ConfigMaps, and persistent volumes. Do not run K8s just to run containers. Use it when you need multi-service orchestration, blue/green deploys across nodes, or fine-grained resource limits. On AWS, integrate with the AWS Load Balancer Controller (formerly the ALB Ingress Controller) for traffic routing, and store secrets in AWS Secrets Manager, not plaintext YAML. The golden rule: if your app fits on one ECS service with Fargate, do not touch Kubernetes.
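The Horizontal Pod Autoscaler's core scaling rule is short enough to write out. The function below follows the formula given in the Kubernetes documentation, desired = ceil(current replicas × current metric / target metric); it deliberately ignores the tolerance band and the min/max replica bounds the real controller also applies.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric):
    """Core HPA rule: scale replicas in proportion to metric overshoot."""
    return math.ceil(current_replicas * current_metric / target_metric)


# Three pods averaging 200m CPU against a 100m target.
print(desired_replicas(3, 200, 100))
```

Rounding up rather than down is what makes the autoscaler err on the side of capacity when the ratio is not a whole number.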
Lambda Timeout Wrecks Database Migration
- Know Lambda's limits before choosing compute. 15-minute max is a hard wall.
- Long-running batch jobs belong on persistent compute — EC2 or containers.
- Always test with realistic data volume — development migrations are fast, production ones are not.
aws iam list-attached-role-policies --role-name YourRole
Look for a policy that allows s3:GetObject on arn:aws:s3:::my-bucket/*.
aws logs get-log-events --log-group-name /aws/lambda/your-function --log-stream-name 'LATEST'
aws lambda get-function-configuration --function-name your-function | jq .Timeout
Common mistakes to avoid
Using root account for daily operations
Leaving S3 buckets publicly accessible
Not setting up CloudTrail and monitoring
Putting RDS in a public subnet
Interview Questions on This Topic
What is the difference between a Security Group and a Network ACL?