AWS Lambda and Serverless Explained — Architecture, Triggers, and Real-World Patterns
Every application needs compute power — something has to run your code. Traditionally, that meant renting a virtual machine or physical server that runs 24/7, even at 3 a.m. when zero users are online. You're paying for potential, not actual work. As cloud adoption exploded, this idle-cost problem became impossible to ignore, especially for startups and teams with unpredictable traffic spikes.
AWS Lambda, launched in 2014, flipped the model. Instead of managing servers, you upload a function — a single, focused piece of logic — and AWS handles everything else: provisioning, scaling, patching, and availability. The term 'serverless' doesn't mean there are no servers; it means YOU don't manage them. The servers exist, they're just Amazon's problem. This lets your team focus entirely on business logic instead of infrastructure operations.
By the end of this article you'll understand how Lambda executes code, how to wire it to real-world triggers like API Gateway and S3, how to avoid the cold start trap that kills performance, and how to structure a production-worthy serverless workflow. You'll also know exactly when Lambda is the right tool — and when it absolutely isn't.
How AWS Lambda Actually Executes Your Code — The Execution Model
Lambda's execution model is the foundation everything else builds on. When a trigger fires — say, an HTTP request hits API Gateway — Lambda needs to run your function. If a pre-warmed container exists from a recent invocation, Lambda reuses it. This is a 'warm start' and it's fast. If no container is available, Lambda has to bootstrap one from scratch: download your code package, spin up a runtime environment, run any initialisation code outside your handler, then finally invoke your handler. That bootstrap phase is the dreaded cold start.
Cold starts typically add 100ms–1000ms of latency depending on the runtime (.NET and Java are heavier; Node.js and Python are lighter). For a background job this is irrelevant. For a user-facing API call, it's noticeable.
Your handler function receives two objects: the event (the payload that triggered the invocation — could be an HTTP body, an S3 event, a queue message) and the context (metadata about the invocation itself — function name, memory limit, request ID). Understanding this distinction is critical: the event is about WHAT happened; the context is about the environment RUNNING your code.
Code outside the handler runs once per container lifecycle. That's where you put database connections, SDK clients, and config loading — doing it inside the handler means re-initialising on every single invocation, which is both slow and wasteful.
```python
import boto3
import json
import os
import io

from PIL import Image

# ✅ Initialise the S3 client OUTSIDE the handler.
# This runs once when the container boots (cold start),
# then gets reused across all warm invocations — saving ~50ms per call.
s3_client = boto3.client('s3')

# Target width for all resized thumbnails
THUMBNAIL_WIDTH = 200


def handler(event, context):
    """
    Triggered by an S3 PUT event whenever a new image is uploaded to the
    'uploads-raw' bucket. Resizes it and saves a thumbnail to the
    'uploads-thumbnails' bucket.
    """
    # The event payload from S3 contains a list of records —
    # each record represents one file upload.
    for record in event['Records']:
        source_bucket = record['s3']['bucket']['name']
        object_key = record['s3']['object']['key']  # e.g. 'photos/sunset.jpg'
        print(f"Processing: s3://{source_bucket}/{object_key}")

        # Download the original image bytes into memory (no temp file needed)
        response = s3_client.get_object(Bucket=source_bucket, Key=object_key)
        image_bytes = response['Body'].read()

        # Open image with Pillow and calculate proportional height
        original_img = Image.open(io.BytesIO(image_bytes))
        original_w, original_h = original_img.size
        ratio = THUMBNAIL_WIDTH / original_w
        new_height = int(original_h * ratio)
        thumbnail = original_img.resize((THUMBNAIL_WIDTH, new_height))

        # Save resized image to an in-memory buffer — Lambda has no persistent disk
        output_buffer = io.BytesIO()
        thumbnail.save(output_buffer, format='JPEG', quality=85)
        output_buffer.seek(0)  # Rewind buffer to the start before uploading

        # Write thumbnail to the destination bucket under the same key name
        destination_bucket = os.environ['THUMBNAIL_BUCKET']  # Read from env vars, not hardcoded
        s3_client.put_object(
            Bucket=destination_bucket,
            Key=object_key,
            Body=output_buffer,
            ContentType='image/jpeg'
        )
        print(f"Thumbnail saved: s3://{destination_bucket}/{object_key} ({THUMBNAIL_WIDTH}x{new_height})")

    # Lambda expects a return value when invoked synchronously (e.g. via API Gateway).
    # For async triggers like S3, the return value is ignored — but it's good practice.
    return {
        'statusCode': 200,
        'body': json.dumps({'processed': len(event['Records'])})
    }
```
```
Processing: s3://uploads-raw/photos/sunset.jpg
Thumbnail saved: s3://uploads-thumbnails/photos/sunset.jpg (200x133)
END RequestId: 7f3a1c2b-...
REPORT RequestId: 7f3a1c2b-...  Duration: 312.45 ms  Billed Duration: 313 ms  Memory Size: 256 MB  Max Memory Used: 89 MB
```
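Before wiring a function like this to a real bucket, it helps to sanity-check the event parsing in isolation. The sketch below feeds a hand-built, trimmed-down S3 event (not the full notification schema) through a small extraction helper — both the helper name and the sample event are illustrative, not part of the handler above:

```python
# A minimal sketch of how the bucket/key fields used in an S3-triggered
# handler are extracted. The sample event is trimmed down for illustration.

def extract_uploads(event):
    """Return (bucket, key) pairs from an S3 notification event."""
    return [
        (r['s3']['bucket']['name'], r['s3']['object']['key'])
        for r in event.get('Records', [])
    ]

sample_event = {
    'Records': [
        {'s3': {'bucket': {'name': 'uploads-raw'},
                'object': {'key': 'photos/sunset.jpg'}}}
    ]
}

print(extract_uploads(sample_event))
# → [('uploads-raw', 'photos/sunset.jpg')]
```

Keeping parsing logic in a plain function like this makes it trivially unit-testable without any AWS mocking.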
Wiring Lambda to the Real World — Triggers, Events, and API Gateway
A Lambda function sitting alone does nothing. It needs a trigger — an AWS service that says 'hey, something happened, go run'. The trigger determines the shape of the event object your handler receives, which is why reading the AWS event schema docs for each trigger type matters.
The most common triggers in production are: API Gateway (HTTP requests), S3 (file uploads/deletions), SQS (queue messages for async processing), EventBridge (scheduled cron jobs and event routing), DynamoDB Streams (react to database changes), and SNS (fan-out notifications).
API Gateway is the one you'll use for building REST APIs or webhooks. When a request hits your endpoint, API Gateway wraps it into a structured event object and hands it to Lambda. Your function returns a response object with a statusCode, headers, and body, and API Gateway translates that back into a real HTTP response.
The Lambda Proxy Integration model (the default and recommended approach) passes the raw request to your function and expects you to construct the full HTTP response yourself. This gives you complete control over status codes, CORS headers, and response bodies. Older tutorials show Lambda custom integrations — avoid them; they're fiddly and add complexity for no gain.
For async workloads, SQS is your best friend. Rather than calling Lambda directly (which creates tight coupling), push messages to a queue and let Lambda poll and process them in batches. This naturally handles traffic bursts without rate-limit errors.
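As a rough sketch of what that looks like, an SQS-triggered handler receives a batch of records whose `body` field holds the raw message string you enqueued. The event below is hand-built for local testing, and the per-message work is a placeholder:

```python
import json

def handler(event, context):
    """
    SQS delivers messages in batches (up to 10 per invocation by default).
    Each record's 'body' is the raw string that was pushed onto the queue.
    """
    processed = 0
    for record in event['Records']:
        message = json.loads(record['body'])
        # ... real work goes here: send an email, update a row, call an API ...
        print(f"Processing message {record.get('messageId')}: {message}")
        processed += 1
    return {'processed': processed}

# Local smoke test with a hand-built, SQS-shaped event
fake_event = {'Records': [{'messageId': 'm-1', 'body': json.dumps({'order_id': 'abc'})}]}
print(handler(fake_event, None))
# → {'processed': 1}
```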
```python
import json
import boto3
import uuid
import os
from datetime import datetime, timezone

# DynamoDB resource initialised at cold-start — reused on warm invocations
dynamodb = boto3.resource('dynamodb')
orders_table = dynamodb.Table(os.environ['ORDERS_TABLE_NAME'])


def handler(event, context):
    """
    Handles POST /orders from API Gateway (Lambda Proxy Integration).
    Creates a new order record in DynamoDB and returns the order ID.

    API Gateway event shape (key fields):
        event['httpMethod']     -> 'POST'
        event['path']           -> '/orders'
        event['body']           -> Raw JSON string of the request body
        event['requestContext'] -> Metadata including caller identity
    """
    http_method = event.get('httpMethod', '')

    # Route guard — this function only handles order creation
    if http_method != 'POST':
        return _build_response(405, {'error': f'Method {http_method} not allowed'})

    # API Gateway sends the body as a raw string — we must parse it
    try:
        request_body = json.loads(event.get('body') or '{}')
    except json.JSONDecodeError:
        return _build_response(400, {'error': 'Request body must be valid JSON'})

    # Validate required fields before touching the database
    required_fields = ['customer_id', 'items', 'total_amount']
    missing_fields = [f for f in required_fields if f not in request_body]
    if missing_fields:
        return _build_response(400, {'error': f'Missing required fields: {missing_fields}'})

    # Build the order record
    order_id = str(uuid.uuid4())  # Unique ID for this order
    created_at = datetime.now(timezone.utc).isoformat()  # ISO 8601, always UTC
    order_record = {
        'order_id': order_id,
        'customer_id': request_body['customer_id'],
        'items': request_body['items'],
        'total_amount': str(request_body['total_amount']),  # DynamoDB doesn't support float natively
        'status': 'PENDING',
        'created_at': created_at
    }

    # Write to DynamoDB — put_item overwrites if the key already exists,
    # so ConditionExpression ensures we never silently stomp an existing order
    orders_table.put_item(
        Item=order_record,
        ConditionExpression='attribute_not_exists(order_id)'
    )

    print(f"Order created: {order_id} for customer {request_body['customer_id']}")

    return _build_response(201, {
        'order_id': order_id,
        'status': 'PENDING',
        'created_at': created_at
    })


def _build_response(status_code, body_dict):
    """
    Constructs the response object API Gateway expects.
    CORS headers are included so browser-based clients can call this API.
    Without these headers, browsers silently block the response.
    """
    return {
        'statusCode': status_code,
        'headers': {
            'Content-Type': 'application/json',
            'Access-Control-Allow-Origin': '*'  # Tighten to your domain in production
        },
        'body': json.dumps(body_dict)
    }
```
```
Order created: 3c8b1a2f-4e9d-4f3a-b1c2-7d8e9f0a1b2c for customer cust_00847
END RequestId: 9a2e4f7d-...
REPORT RequestId: 9a2e4f7d-...  Duration: 187.22 ms  Billed Duration: 188 ms  Memory Size: 128 MB  Max Memory Used: 54 MB

# HTTP Response seen by the client:
# Status: 201 Created
# Body: {"order_id": "3c8b1a2f-4e9d-4f3a-b1c2-7d8e9f0a1b2c", "status": "PENDING", "created_at": "2024-03-15T14:22:01.483921+00:00"}
```
Cold Starts, Memory Tuning, and the Performance Levers You Actually Control
Lambda gives you one direct performance dial: memory. You set it anywhere from 128 MB to 10,240 MB. What most developers don't realise is that CPU allocation scales proportionally with memory. A 1,024 MB Lambda function gets roughly 8x the CPU of a 128 MB one. If your function is CPU-bound (image processing, data transformation, encryption), doubling the memory can halve the execution time — and since you pay for duration × memory, the cost often stays the same or even drops.
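A back-of-envelope calculation makes this trade-off concrete. Using AWS's published per-GB-second duration rate, a CPU-bound function that halves its runtime when memory doubles costs exactly the same per invocation:

```python
PRICE_PER_GB_SECOND = 0.0000166667  # AWS Lambda duration price (x86, most regions)

def invocation_cost(memory_mb, duration_ms):
    """Duration cost of one invocation: GB-seconds × price per GB-second."""
    return (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND

# CPU-bound function: 512 MB for 400 ms vs 1024 MB for 200 ms
cost_low  = invocation_cost(512, 400)   # 0.2 GB-seconds
cost_high = invocation_cost(1024, 200)  # also 0.2 GB-seconds
print(f"{cost_low:.10f} vs {cost_high:.10f}")  # identical — double the CPU, same price
```

If the higher-memory run finishes in *less* than half the time (common for CPU-bound work), it's actually cheaper. The per-request charge is unaffected by memory, so it drops out of the comparison.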
Cold starts are the other major lever. Three strategies exist: Provisioned Concurrency, keeping functions warm with scheduled EventBridge pings, and minimising package size.
Provisioned Concurrency is the only AWS-supported solution. You pay for a set number of pre-warmed containers to stay alive at all times. It costs more than on-demand but eliminates cold starts entirely for that concurrency slot. Use it for customer-facing APIs where tail latency matters.
Package size matters because Lambda has to download your deployment package before running it. A 50 MB Python package with unnecessary dependencies cold-starts noticeably slower than a 3 MB lean package. Use Lambda Layers to separate large dependencies (like numpy or Pillow) from your application code, and use .zip deployment packages rather than container images unless you specifically need Docker tooling.
Finally, watch your timeout setting. The default is 3 seconds. Downstream API calls, DB queries, and S3 operations can easily exceed this. Set it realistically (15 minutes max) and always handle partial failures gracefully.
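One defensive pattern worth sketching: check `context.get_remaining_time_in_millis()` (a method the Lambda runtime provides on the context object) before each slow step, and bail out cleanly instead of being killed mid-write. The `FakeContext` class and `process` stub below are stand-ins for local experimentation, not real Lambda objects:

```python
class FakeContext:
    """Stand-in for the real Lambda context object, for local testing only."""
    def __init__(self, remaining_ms):
        self._remaining = remaining_ms
    def get_remaining_time_in_millis(self):
        return self._remaining

def process(item):
    pass  # placeholder for a slow downstream call

def handler(event, context):
    SAFETY_BUFFER_MS = 2000  # leave enough time to log and return cleanly

    for item in event.get('items', []):
        if context.get_remaining_time_in_millis() < SAFETY_BUFFER_MS:
            # Bail out gracefully rather than being killed mid-write
            return {'status': 'PARTIAL', 'stopped_at': item}
        process(item)
    return {'status': 'COMPLETE'}

print(handler({'items': ['a', 'b']}, FakeContext(remaining_ms=500)))
# → {'status': 'PARTIAL', 'stopped_at': 'a'}
```

A caller (or a retry mechanism like SQS) can then pick up from `stopped_at` instead of losing all progress.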
```yaml
# AWS SAM (Serverless Application Model) template — the standard way to
# define Lambda functions as Infrastructure-as-Code.
# Run: sam build && sam deploy --guided
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Orders API — production-grade Lambda configuration

Globals:
  Function:
    # Runtime for all functions in this template unless overridden
    Runtime: python3.12
    # Timeout generous enough for DynamoDB + downstream calls, not infinite
    Timeout: 30
    # Environment variables available to all functions
    Environment:
      Variables:
        LOG_LEVEL: INFO
        ORDERS_TABLE_NAME: !Ref OrdersTable

Resources:
  OrdersApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: orders-api-handler
      CodeUri: src/
      Handler: orders_api_handler.handler
      # 512 MB gives ~4x CPU vs 128 MB — worth it for JSON parsing + DynamoDB calls
      # Run the AWS Lambda Power Tuning tool to find YOUR optimal memory setting
      MemorySize: 512
      # Provisioned concurrency: 5 containers always warm for the production alias.
      # This eliminates cold starts for the first 5 concurrent requests.
      # Cost: ~$0.0000041 per GB-second × 5 containers × all hours in the month
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5
      # IAM permissions — principle of least privilege:
      # only grant what this specific function actually needs
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref OrdersTable
      # API Gateway trigger — Lambda Proxy Integration (recommended)
      Events:
        CreateOrder:
          Type: Api
          Properties:
            Path: /orders
            Method: POST
        # OPTIONS method needed for browser CORS preflight requests
        CreateOrderOptions:
          Type: Api
          Properties:
            Path: /orders
            Method: OPTIONS

  OrdersTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: orders
      BillingMode: PAY_PER_REQUEST  # Serverless billing — no provisioned capacity to manage
      AttributeDefinitions:
        - AttributeName: order_id
          AttributeType: S
      KeySchema:
        - AttributeName: order_id
          KeyType: HASH
      # Point-in-time recovery — always enable for production data
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true

Outputs:
  OrdersApiEndpoint:
    Description: "API Gateway endpoint for the Orders API"
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/orders"
```
```
Building codeuri: src/ runtime: python3.12
Build Succeeded
Built Artifacts: .aws-sam/build

# sam deploy output (key lines):
Deploying with following values
Stack name      : orders-api-stack
Region          : eu-west-1
Confirm changes : Yes

CloudFormation events:
CREATE_COMPLETE  AWS::DynamoDB::Table      OrdersTable
CREATE_COMPLETE  AWS::Lambda::Function     OrdersApiFunction
CREATE_COMPLETE  AWS::ApiGateway::RestApi  ServerlessRestApi

Outputs:
OrdersApiEndpoint: https://x7k2mn3p4q.execute-api.eu-west-1.amazonaws.com/Prod/orders

Successfully created/updated stack - orders-api-stack in eu-west-1
```
| Aspect | AWS Lambda (Serverless) | EC2 Instance (Traditional) |
|---|---|---|
| Billing model | Per 1ms of execution + invocation count | Per hour the instance runs (even idle) |
| Scaling | Automatic — up to 1,000 concurrent by default | Manual or Auto Scaling Group (minutes to scale) |
| Max execution time | 15 minutes per invocation | Unlimited — process runs indefinitely |
| Cold start latency | 100ms–1s for first request after idle period | None (process stays resident in memory) |
| State management | Stateless — no persistent memory between calls | Stateful — in-memory state survives between requests |
| Long-running workloads | Not suitable (15 min cap) | Ideal — batch jobs, ML training, websockets |
| Operational overhead | Near zero — AWS patches, scales, monitors | High — OS updates, capacity planning, monitoring setup |
| Best for | Event-driven, spiky, short-duration tasks | Steady, high-throughput, stateful applications |
🎯 Key Takeaways
- Cold starts are a real cost — put all initialisation (DB clients, SDK objects, config) outside your handler so Lambda reuses it across warm invocations. This alone can cut average latency by 40-200ms.
- Memory is your CPU dial — increasing Lambda memory from 128 MB to 1024 MB allocates 8x the CPU. For compute-heavy functions this can reduce duration enough that the higher-memory run is cheaper per invocation.
- Lambda is stateless by design — never rely on in-memory state surviving between invocations. Use DynamoDB, ElastiCache, or S3 for any state that must persist. Any 'persistence' you observe in /tmp or module-level variables is a side effect of container reuse, not a guarantee.
- Match the trigger to the job — API Gateway for synchronous HTTP APIs, SQS for reliable async processing with retry semantics, EventBridge for scheduled tasks and event routing. Picking the wrong trigger means fighting the tool instead of building your feature.
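The container-reuse caveat from the statelessness takeaway is easy to demonstrate with a module-level counter — a sketch of the behaviour, not a pattern to rely on:

```python
# Module scope: runs once per container, NOT once per invocation.
invocation_count = 0

def handler(event, context):
    global invocation_count
    invocation_count += 1
    # On a warm container this counter climbs across invocations;
    # after a cold start it silently resets to 1. Treat any observed
    # value > 1 as an accident of container reuse, never a guarantee.
    return {'count_in_this_container': invocation_count}

# Simulating two warm invocations landing on the same container (process):
print(handler({}, None))  # → {'count_in_this_container': 1}
print(handler({}, None))  # → {'count_in_this_container': 2}
```

Under concurrent load, each parallel container has its own copy of `invocation_count`, which is exactly why real state belongs in DynamoDB, ElastiCache, or S3.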
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Initialising DB connections inside the handler — Every invocation opens a new connection, exhausting your RDS connection pool within minutes under load. The symptom is 'too many connections' errors that appear fine locally but explode in production. Fix: move the connection client instantiation outside the handler function so Lambda reuses the same connection across warm invocations.
- ✕ Mistake 2: Ignoring the 512 MB /tmp storage limit and assuming a clean filesystem — Lambda does NOT guarantee a clean /tmp directory between warm invocations. A previous call's temp files may still be there, causing 'file already exists' errors or stale data bugs. Fix: always generate unique filenames (use uuid or the request ID from context.aws_request_id) and clean up /tmp explicitly at the end of your handler.
- ✕ Mistake 3: Setting Lambda timeout lower than the slowest downstream dependency — If your function calls an external API that sometimes takes 8 seconds and your timeout is 3 seconds, Lambda kills the invocation and returns a 504 to the caller with no useful error message. Fix: set your Lambda timeout to at least 2x your expected worst-case downstream latency, implement retries with exponential backoff for transient failures, and use AWS X-Ray tracing to measure where time is actually spent.
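The /tmp mistake is cheap to avoid. Below is a sketch of the unique-filename-plus-cleanup pattern, with a stand-in context object; in Lambda the only writable directory is /tmp, and `tempfile.gettempdir()` resolves to it there while keeping this snippet runnable on any machine:

```python
import os
import tempfile

class FakeContext:
    """Stand-in for the real context object, for local testing only."""
    aws_request_id = 'req-123'

def handler(event, context):
    # Unique path per invocation — no clashes with a previous warm call's files.
    # In Lambda, tempfile.gettempdir() returns '/tmp'.
    tmp_path = os.path.join(tempfile.gettempdir(), f"{context.aws_request_id}-download.bin")
    try:
        with open(tmp_path, 'wb') as f:
            f.write(event.get('payload', b''))
        return {'bytes_written': os.path.getsize(tmp_path)}
    finally:
        # Clean up explicitly so /tmp doesn't fill up across warm invocations
        if os.path.exists(tmp_path):
            os.remove(tmp_path)

print(handler({'payload': b'hello'}, FakeContext()))
# → {'bytes_written': 5}
```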
Interview Questions on This Topic
- Q: A Lambda function handles user logins and is experiencing high tail latency during morning traffic spikes. The p99 latency is 1.2 seconds but the p50 is 180ms. What's likely causing this and how would you fix it?
- Q: Explain the difference between synchronous and asynchronous Lambda invocation models. Give a concrete example of when you'd choose one over the other, and what happens to errors in each model.
- Q: Your team wants to use Lambda to process DynamoDB Stream events. A batch of 100 records comes in, your function processes 60 successfully, then fails on record 61. What happens to all 100 records, and how would you implement partial batch failure handling to avoid reprocessing the first 60?
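The third question maps to Lambda's partial batch response feature: enable ReportBatchItemFailures on the event source mapping and return the sequence numbers of the records that failed. A minimal sketch (the `process` stub and event shape are illustrative; a real DynamoDB Stream record carries many more fields):

```python
def process(record):
    # Hypothetical per-record work — fails on records flagged as 'poison'
    if record.get('poison'):
        raise ValueError('bad record')

def handler(event, context):
    failures = []
    for record in event['Records']:
        try:
            process(record)
        except Exception:
            # With ReportBatchItemFailures enabled, Lambda checkpoints past
            # the successes and retries from the first reported failure,
            # instead of replaying the entire batch of 100.
            failures.append({'itemIdentifier': record['dynamodb']['SequenceNumber']})
    return {'batchItemFailures': failures}

fake_event = {'Records': [
    {'dynamodb': {'SequenceNumber': '100'}},
    {'dynamodb': {'SequenceNumber': '101'}, 'poison': True},
]}
print(handler(fake_event, None))
# → {'batchItemFailures': [{'itemIdentifier': '101'}]}
```

Without this feature, any single failure causes Lambda to retry the whole batch, reprocessing the 60 already-successful records — which is why idempotent record handling matters too.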
Frequently Asked Questions
How much does AWS Lambda cost in production?
Lambda charges on two axes: number of requests ($0.20 per 1 million requests) and duration rounded to the nearest 1ms ($0.0000166667 per GB-second). The free tier covers 1 million requests and 400,000 GB-seconds per month permanently — not just the first year. A function using 512 MB running for 200ms, invoked 5 million times a month, costs roughly $8. Compare that to a t3.small EC2 at ~$15/month that sits idle most of the time.
What is a Lambda cold start and can it be completely eliminated?
A cold start is the initialisation delay when Lambda has to provision a fresh execution environment because no warm container is available. It includes downloading your code, starting the runtime, and running module-level initialisation code. Provisioned Concurrency is the only way to fully eliminate cold starts — you pay to keep N containers permanently warm. Keeping package sizes small (under 5 MB) and using lighter runtimes (Python, Node.js) minimises cold start duration but doesn't eliminate the occurrence.
Can Lambda handle long-running background jobs?
Lambda has a hard 15-minute maximum execution timeout. For jobs that run longer than that — nightly batch reports, large file processing, ML model training — you need a different tool. AWS Step Functions can chain multiple Lambda calls to work around the timeout for sequential tasks. For truly long-running jobs, AWS Fargate (containerised tasks) or AWS Batch are the right choices. Trying to hack around Lambda's timeout with recursive self-invocation is an anti-pattern and will create billing surprises.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.