Intermediate 15 min · March 06, 2026

AWS Lambda and Serverless

AWS Lambda Cold Starts — Why P99 Spikes to 1.2s at 9 AM

Q: How much does AWS Lambda cost in production?

Lambda charges on two axes: number of requests ($0.20 per 1 million requests) and duration, rounded up to the nearest 1 ms ($0.0000166667 per GB-second on x86). The free tier covers 1 million requests and 400,000 GB-seconds per month permanently, not just for the first year. A function using 512 MB that runs for 200 ms and is invoked 5 million times a month consumes 500,000 GB-seconds: about $8.33 for compute plus $1.00 for requests, before the free tier is applied. Compare that to a t3.small EC2 instance at ~$15/month that sits idle most of the time.
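The arithmetic behind that estimate can be sketched in a few lines. This is a rough calculator, not a billing tool: the prices and free-tier figures are the ones quoted in the answer above and vary by region and architecture, and `monthly_cost` is a hypothetical helper name.

```python
# Prices and free-tier allowances as quoted in the answer above.
# Assumption: x86 on-demand pricing; check the AWS pricing page for your region.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000


def monthly_cost(memory_mb, duration_ms, invocations, apply_free_tier=True):
    """Estimate the monthly Lambda bill in USD for a single function."""
    # GB-seconds = memory in GB x duration in seconds x number of invocations
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * invocations
    requests = invocations
    if apply_free_tier:
        gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
        requests = max(0, requests - FREE_REQUESTS)
    compute_fee = gb_seconds * PRICE_PER_GB_SECOND
    request_fee = (requests / 1_000_000) * PRICE_PER_MILLION_REQUESTS
    return round(compute_fee + request_fee, 2)


print(monthly_cost(512, 200, 5_000_000, apply_free_tier=False))
print(monthly_cost(512, 200, 5_000_000))
```

The first call prices the workload without any allowance; the second subtracts the free tier, which matters a great deal at this volume.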

Q: What is a Lambda cold start and can it be completely eliminated?

A cold start is the initialisation delay when Lambda has to provision a fresh execution environment because no warm container is available. It includes downloading your code, starting the runtime, and running module-level initialisation code. Provisioned Concurrency is the only way to fully eliminate cold starts — you pay to keep N containers permanently warm. Keeping package sizes small (under 5 MB) and using lighter runtimes (Python, Node.js) minimises cold start duration but doesn't eliminate the occurrence.

Q: Can Lambda handle long-running background jobs?

Lambda has a hard 15-minute maximum execution timeout. For jobs that run longer than that — nightly batch reports, large file processing, ML model training — you need a different tool. AWS Step Functions can chain multiple Lambda calls to work around the timeout for sequential tasks. For truly long-running jobs, AWS Fargate (containerised tasks) or AWS Batch are the right choices. Trying to hack around Lambda's timeout with recursive self-invocation is an anti-pattern and will create billing surprises.

Q: How do I debug a Lambda function that's timing out?

Start with the CloudWatch Logs stream for that invocation and look for the 'Task timed out after X seconds' line. Temporarily raise the function's timeout (the maximum is 15 minutes) to see whether the function simply needs more wall-clock time. If it still times out, use AWS X-Ray to trace the request and identify which downstream call (database, external API, S3) is slow. Set explicit timeouts on individual client calls so that a stalled dependency fails fast instead of hanging until Lambda kills the invocation, and check whether the function is waiting on a synchronous SDK call that has no timeout configured.

Q: Should I use container images or .zip packages for Lambda?

Use .zip deployment packages unless you specifically need a container image (e.g., large dependencies, a custom runtime, or existing Docker tooling). Container images can have slower cold starts, particularly when the image is large, because there is more to fetch before the first invocation. .zip packages are smaller and faster to download. Use Lambda Layers to separate large dependencies from your code. If you must use containers, use multi-stage builds and a minimal base image (e.g., public.ecr.aws/lambda/python:latest).

Lambda cold starts added 800-1200ms to our /orders API every morning.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Everything here is grounded in real deployments.

July 27, 2026

last updated

Before you start⏱ 25 min

✓Solid grasp of DevOps fundamentals
✓Comfortable with command-line tools
✓Basic Linux administration knowledge

⚡Quick Answer

AWS Lambda runs your code on demand without provisioning or managing servers
Three core components: Functions (your code), Triggers (event sources), Execution Environment (isolated container)
Cold starts add 100ms–1s latency when a new container spins up
Performance insight: More memory = more CPU; tuning memory can reduce both cost and duration for compute-heavy tasks
Production insight: Lambda bills for actual execution time, not the configured timeout, but a function that hangs is billed until the timeout fires, so always set timeouts realistically
Biggest mistake: Assuming /tmp is clean between invocations — it persists across warm starts, causing silent data corruption

✦ Definition~90s read

What is AWS Lambda and Serverless?

AWS Lambda is a function-as-a-service (FaaS) compute service that runs your code in response to events without provisioning or managing servers. You upload a function (a zip or container image), configure a trigger (like an API Gateway HTTP request, S3 bucket event, or SQS message), and AWS handles scaling, patching, and availability.

The core trade-off: you pay only for compute time consumed (billed in 1 ms increments, with no 100 ms minimum) and get automatic scaling from zero to thousands of concurrent executions, but that elasticity comes with a hidden cost called the cold start. Lambda is ideal for bursty, event-driven workloads (webhooks, image processing, real-time file transforms) but becomes problematic for latency-sensitive, steady-state traffic where the cold start tax dominates P99 response times.

The execution model is deceptively simple: when an event arrives, Lambda spins up a sandbox (a micro-VM using Firecracker), loads your runtime (Node.js, Python, Java, .NET, or custom), runs your handler code, then freezes the sandbox and keeps it around for a while (typically 5-15 minutes of idle time, though AWS does not guarantee a figure). Subsequent requests that land on the same sandbox reuse the warm environment. That is a warm start, typically with <10ms overhead.

But at 9 AM when traffic spikes, the fleet of warm sandboxes is exhausted, and every new concurrent request forces a cold start: provisioning a new sandbox, downloading your code, initializing the runtime, and executing your init logic. For Java or .NET functions with heavy dependency loading, that cold start can hit 1-2 seconds, while Node.js or Python might stay under 200ms.

The P99 spike you see at 9 AM is the tail latency from a batch of concurrent cold starts hitting users simultaneously.

You control cold start performance through three levers: memory allocation (which proportionally allocates vCPU — 1,769 MB gives one full vCPU), runtime choice (avoid Java/.NET for latency-critical paths), and Provisioned Concurrency (pre-warms a set number of sandboxes, billed per hour even when idle). Provisioned Concurrency eliminates cold starts for predictable traffic patterns but costs roughly the same as keeping EC2 instances running — it's a hedge against the cold start tax, not a free lunch.

For most serverless architectures, the pragmatic approach is to tune memory to the point where your function's compute time plateaus (usually 1-2 GB for I/O-bound work), use async invocation or SQS buffering to absorb cold start latency, and reserve Provisioned Concurrency only for the top 5% of your traffic that drives P99. The alternative — running a container on ECS Fargate or a fixed pool of EC2 instances — gives you predictable sub-10ms latency but requires capacity planning and pays for idle time.

Plain-English First

Imagine you own a pizza shop but you only pay the chef when someone actually orders a pizza. The chef doesn't sit around waiting — they appear the moment an order comes in, make the pizza, then disappear. AWS Lambda is exactly that chef. You write a function, AWS runs it only when something triggers it, and you pay only for the milliseconds it runs. No server to babysit, no idle hours billed, no infrastructure to patch.

Every application needs compute power — something has to run your code. Traditionally, that meant renting a virtual machine or physical server that runs 24/7, even at 3 a.m. when zero users are online. You're paying for potential, not actual work. As cloud adoption exploded, this idle-cost problem became impossible to ignore, especially for startups and teams with unpredictable traffic spikes.

AWS Lambda, launched in 2014, flipped the model. Instead of managing servers, you upload a function — a single, focused piece of logic — and AWS handles everything else: provisioning, scaling, patching, and availability. The term 'serverless' doesn't mean there are no servers; it means YOU don't manage them. The servers exist, they're just Amazon's problem. This lets your team focus entirely on business logic instead of infrastructure operations.

By the end of this article you'll understand how Lambda executes code, how to wire it to real-world triggers like API Gateway and S3, how to avoid the cold start trap that kills performance, and how to structure a production-worthy serverless workflow. You'll also know exactly when Lambda is the right tool — and when it absolutely isn't.

AWS Lambda Serverless — The Execution Model That Bites at Scale

AWS Lambda is a function-as-a-service (FaaS) platform that runs your code in ephemeral containers that you should treat as stateless. You upload a function, specify a trigger (API Gateway, SQS, S3, etc.), and AWS manages the underlying compute. The core mechanic: each invocation runs in a fresh or recycled sandbox, with no guarantee that local state survives from one invocation to the next. This is not a long-running process. It is a request-scoped execution that starts, runs, and is frozen or discarded within minutes.

When a Lambda function is invoked, the service either reuses a warm sandbox (if one is available) or creates a new one; the latter is the cold start. A cold start includes downloading your code, initializing the runtime (the JVM, for a Java function), and running your static initializers. For Java, this adds 500ms–1.2s of latency before your handler even executes. The sandbox lifecycle is opaque: you cannot pin a container, and AWS recycles them aggressively (typically after 5–15 minutes of idle time).

Use Lambda when you need elastic scaling with zero idle cost — bursty workloads, event-driven pipelines, or microservices that can tolerate sub-second startup latency. It's not for latency-sensitive user-facing endpoints at the 99th percentile unless you pre-warm or use Provisioned Concurrency. In production, the 9 AM spike is a classic pattern: a wave of concurrent requests hits cold containers simultaneously, amplifying P99 latency by 3–10x.

⚠ Cold Start ≠ Slow Code

A 1.2s cold start is not a code performance issue — it's the JVM initialization and class loading. Your handler runs in <10ms after that.

📊 Production Insight

Teams using Java Lambda for synchronous API endpoints see P99 latency jump from 200ms to 1.4s during morning traffic spikes.

The symptom: intermittent timeouts from API Gateway (504s) that correlate with low invocation counts in CloudWatch.

Rule: For any endpoint requiring <500ms P99, use Provisioned Concurrency or switch to a runtime with faster startup (Python, Node, or GraalVM native).

🎯 Key Takeaway

Lambda is not a server — it's a request-scoped sandbox with no sticky state.

Cold starts are a deployment-time and scaling-time tax, not a runtime tax.

Java's JVM startup dominates cold start latency; use SnapStart or GraalVM to mitigate.

thecodeforge.io

Aws Lambda Serverless

How AWS Lambda Actually Executes Your Code — The Execution Model

Lambda's execution model is the foundation everything else builds on. When a trigger fires — say, an HTTP request hits API Gateway — Lambda needs to run your function. If a pre-warmed container exists from a recent invocation, Lambda reuses it. This is a 'warm start' and it's fast. If no container is available, Lambda has to bootstrap one from scratch: download your code package, spin up a runtime environment, run any initialisation code outside your handler, then finally invoke your handler. That bootstrap phase is the dreaded cold start.

Cold starts typically add 100ms–1000ms of latency depending on the runtime (.NET and Java are heavier; Node.js and Python are lighter). For a background job this is irrelevant. For a user-facing API call, it's noticeable.

Your handler function receives two objects: the event (the payload that triggered the invocation — could be an HTTP body, an S3 event, a queue message) and the context (metadata about the invocation itself — function name, memory limit, request ID). Understanding this distinction is critical: the event is about WHAT happened, the context is about WHO is running.

Code outside the handler runs once per container lifecycle. That's where you put database connections, SDK clients, and config loading — doing it inside the handler means re-initialising on every single invocation, which is both slow and wasteful.

image_resize_handler.py (Python)

import boto3
import json
import os
from urllib.parse import unquote_plus
from PIL import Image
import io

# ✅ Initialise the S3 client OUTSIDE the handler.
# This runs once when the container boots (cold start),
# then gets reused across all warm invocations, saving ~50ms per call.
s3_client = boto3.client('s3')

# Target width for all resized thumbnails
THUMBNAIL_WIDTH = 200


def handler(event, context):
    """
    Triggered by an S3 PUT event whenever a new image is uploaded
    to the 'uploads-raw' bucket. Resizes it and saves a thumbnail
    to the 'uploads-thumbnails' bucket.
    """

    # The event payload from S3 contains a list of records;
    # each record represents one file upload.
    for record in event['Records']:
        source_bucket = record['s3']['bucket']['name']
        # S3 URL-encodes object keys in event notifications (a space arrives as '+'),
        # so decode the key before passing it back to the S3 API
        object_key    = unquote_plus(record['s3']['object']['key'])  # e.g. 'photos/sunset.jpg'

        print(f"Processing: s3://{source_bucket}/{object_key}")

        # Download the original image bytes into memory (no temp file needed)
        response      = s3_client.get_object(Bucket=source_bucket, Key=object_key)
        image_bytes   = response['Body'].read()

        # Open image with Pillow and calculate proportional height
        original_img  = Image.open(io.BytesIO(image_bytes))
        original_w, original_h = original_img.size
        ratio         = THUMBNAIL_WIDTH / original_w
        new_height    = int(original_h * ratio)

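        # Convert to RGB first: JPEG cannot store an alpha channel or a palette,
        # so saving an RGBA or palette-mode PNG as JPEG would otherwise raise an error
        thumbnail     = original_img.convert('RGB').resize((THUMBNAIL_WIDTH, new_height))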

        # Save resized image to an in-memory buffer — Lambda has no persistent disk
        output_buffer = io.BytesIO()
        thumbnail.save(output_buffer, format='JPEG', quality=85)
        output_buffer.seek(0)  # Rewind buffer to the start before uploading

        # Write thumbnail to the destination bucket under the same key name
        destination_bucket = os.environ['THUMBNAIL_BUCKET']  # Read from env vars, not hardcoded
        s3_client.put_object(
            Bucket      = destination_bucket,
            Key         = object_key,
            Body        = output_buffer,
            ContentType = 'image/jpeg'
        )

        print(f"Thumbnail saved: s3://{destination_bucket}/{object_key} ({THUMBNAIL_WIDTH}x{new_height})")

    # Lambda expects a return value when invoked synchronously (e.g. via API Gateway).
    # For async triggers like S3, the return value is ignored — but it's good practice.
    return {
        'statusCode': 200,
        'body': json.dumps({'processed': len(event['Records'])})
    }

Output

START RequestId: 7f3a1c2b-... Version: $LATEST

Processing: s3://uploads-raw/photos/sunset.jpg

Thumbnail saved: s3://uploads-thumbnails/photos/sunset.jpg (200x133)

END RequestId: 7f3a1c2b-...

REPORT RequestId: 7f3a1c2b-... Duration: 312.45 ms Billed Duration: 313 ms Memory Size: 256 MB Max Memory Used: 89 MB

💡Pro Tip: The Container Reuse Rule

Anything initialised outside your handler (DB connections, SDK clients, parsed config) is cached for the lifetime of the container, potentially minutes or hours. This is a feature, not a bug. Put expensive initialisation there. But never assume a clean slate between invocations: a previous call's temp files or in-memory state might still exist. Give every file you write to /tmp a unique name and delete it when you are done.
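One way to treat /tmp defensively is sketched below, using only the standard library. `process_payload` is a hypothetical helper, and the "work" it does (measuring the file size) stands in for whatever your function really does with the scratch file.

```python
import os
import tempfile


def process_payload(data: bytes) -> int:
    """Write data to a unique scratch file and always clean it up afterwards."""
    # mkstemp creates a new, uniquely named file, so a leftover file from an
    # earlier invocation in the same warm container can never be picked up.
    # On Lambda, tempfile.gettempdir() resolves to /tmp.
    fd, scratch_path = tempfile.mkstemp(prefix="job-", dir=tempfile.gettempdir())
    try:
        with os.fdopen(fd, "wb") as scratch:
            scratch.write(data)
        # Stand-in for real processing of the scratch file
        return os.path.getsize(scratch_path)
    finally:
        # Remove the file even if processing raised, so /tmp does not fill up
        os.remove(scratch_path)
```

The `finally` block is the important part: a warm container that keeps accumulating files will eventually exhaust its ephemeral storage.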

📊 Production Insight

Cold starts burn the most time on first invocation after idle periods.

Use the Init Duration field in CloudWatch logs to measure it.

If your init duration exceeds 200ms, consider reducing package size or switching runtimes.
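As a rough illustration of measuring it, the sketch below pulls Init Duration out of a REPORT log line. The two sample lines are made up for the example (the field only appears on cold starts), and `init_duration_ms` is a hypothetical helper.

```python
import re

# Matches only the "Init Duration" field, not "Duration" or "Billed Duration"
INIT_DURATION_RE = re.compile(r"Init Duration: (?P<ms>[\d.]+) ms")


def init_duration_ms(report_line: str):
    """Return the Init Duration in ms, or None if the invocation was warm."""
    match = INIT_DURATION_RE.search(report_line)
    return float(match.group("ms")) if match else None


# Illustrative log lines, not real measurements
cold_line = (
    "REPORT RequestId: 7f3a1c2b-... Duration: 312.45 ms Billed Duration: 313 ms "
    "Memory Size: 256 MB Max Memory Used: 89 MB Init Duration: 412.10 ms"
)
warm_line = (
    "REPORT RequestId: 7f3a1c2b-... Duration: 12.30 ms Billed Duration: 13 ms "
    "Memory Size: 256 MB Max Memory Used: 89 MB"
)

print(init_duration_ms(cold_line))
print(init_duration_ms(warm_line))
```

Running the same extraction over a day of logs gives you both the cold start rate (lines with the field divided by all REPORT lines) and the distribution of init times.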

🎯 Key Takeaway

Initialise everything outside the handler.

Measure Init Duration to quantify cold start.

Use Provisioned Concurrency only for latency-critical endpoints.

Cold Start Mitigation Decision

If the function is latency-sensitive (user-facing API) → use Provisioned Concurrency for pre-warmed containers.

If the function runs less than once per 15 minutes → the cold start penalty rarely matters at that volume; optimise code instead.

If the deployment package is larger than 10 MB with heavy dependencies → externalise dependencies to a Lambda Layer to reduce download time.

Wiring Lambda to the Real World — Triggers, Events, and API Gateway

A Lambda function sitting alone does nothing. It needs a trigger — an AWS service that says 'hey, something happened, go run'. The trigger determines the shape of the event object your handler receives, which is why reading the AWS event schema docs for each trigger type matters.

The most common triggers in production are: API Gateway (HTTP requests), S3 (file uploads/deletions), SQS (queue messages for async processing), EventBridge (scheduled cron jobs and event routing), DynamoDB Streams (react to database changes), and SNS (fan-out notifications).

API Gateway is the one you'll use for building REST APIs or webhooks. When a request hits your endpoint, API Gateway wraps it into a structured event object and hands it to Lambda. Your function returns a response object with a statusCode, headers, and body, and API Gateway translates that back into a real HTTP response.

The Lambda Proxy Integration model (the default and recommended approach) passes the raw request to your function and expects you to construct the full HTTP response yourself. This gives you complete control over status codes, CORS headers, and response bodies. Older tutorials show Lambda custom integrations — avoid them, they're fiddly and add complexity for no gain.

For async workloads, SQS is your best friend. Rather than calling Lambda directly (which creates tight coupling), push messages to a queue and let Lambda poll and process them in batches. This naturally handles traffic bursts without rate-limit errors.

orders_api_handler.py (Python)

import json
import boto3
import uuid
import os
from datetime import datetime, timezone
from decimal import Decimal

# DynamoDB resource initialised at cold start, reused on warm invocations
dynamodb    = boto3.resource('dynamodb')
orders_table = dynamodb.Table(os.environ['ORDERS_TABLE_NAME'])


def handler(event, context):
    """
    Handles POST /orders from API Gateway (Lambda Proxy Integration).
    Creates a new order record in DynamoDB and returns the order ID.

    API Gateway event shape (key fields):
      event['httpMethod']         -> 'POST'
      event['path']               -> '/orders'
      event['body']               -> Raw JSON string of the request body
      event['requestContext']     -> Metadata including caller identity
    """

    http_method = event.get('httpMethod', '')

    # Route guard: this function only handles order creation
    if http_method != 'POST':
        return _build_response(405, {'error': f'Method {http_method} not allowed'})

    # API Gateway sends the body as a raw string, so we must parse it.
    # parse_float=Decimal matters: boto3 rejects Python floats, so a price
    # such as 9.99 nested inside 'items' would otherwise fail in put_item.
    try:
        request_body = json.loads(event.get('body') or '{}', parse_float=Decimal)
    except json.JSONDecodeError:
        return _build_response(400, {'error': 'Request body must be valid JSON'})

    # Validate required fields before touching the database
    required_fields = ['customer_id', 'items', 'total_amount']
    missing_fields  = [f for f in required_fields if f not in request_body]
    if missing_fields:
        return _build_response(400, {'error': f'Missing required fields: {missing_fields}'})

    # Build the order record
    order_id    = str(uuid.uuid4())  # Unique ID for this order
    created_at  = datetime.now(timezone.utc).isoformat()  # ISO 8601, always UTC

    order_record = {
        'order_id':      order_id,
        'customer_id':   request_body['customer_id'],
        'items':         request_body['items'],
        'total_amount':  str(request_body['total_amount']),  # DynamoDB doesn't support float natively
        'status':        'PENDING',
        'created_at':    created_at
    }

    # Write to DynamoDB — put_item overwrites if the key already exists,
    # so ConditionExpression ensures we never silently stomp an existing order
    orders_table.put_item(
        Item=order_record,
        ConditionExpression='attribute_not_exists(order_id)'
    )

    print(f"Order created: {order_id} for customer {request_body['customer_id']}")

    return _build_response(201, {
        'order_id':   order_id,
        'status':     'PENDING',
        'created_at': created_at
    })


def _build_response(status_code, body_dict):
    """
    Constructs the response object API Gateway expects.
    CORS headers are included so browser-based clients can call this API.
    Without these headers, browsers silently block the response.
    """
    return {
        'statusCode': status_code,
        'headers': {
            'Content-Type':                'application/json',
            'Access-Control-Allow-Origin': '*'  # Tighten to your domain in production
        },
        'body': json.dumps(body_dict)
    }

Output

START RequestId: 9a2e4f7d-... Version: $LATEST

Order created: 3c8b1a2f-4e9d-4f3a-b1c2-7d8e9f0a1b2c for customer cust_00847

END RequestId: 9a2e4f7d-...

REPORT RequestId: 9a2e4f7d-... Duration: 187.22 ms Billed Duration: 188 ms Memory Size: 128 MB Max Memory Used: 54 MB

# HTTP Response seen by the client:

# Status: 201 Created

# Body: {"order_id": "3c8b1a2f-4e9d-4f3a-b1c2-7d8e9f0a1b2c", "status": "PENDING", "created_at": "2024-03-15T14:22:01.483921+00:00"}

⚠ Watch Out: The Missing CORS Headers Trap

If your Lambda-backed API works perfectly in Postman but fails in the browser with a CORS error, check two things: the Access-Control-Allow-Origin header in your Lambda response, and the OPTIONS preflight method in API Gateway. You need both. Lambda supplies the headers on real requests, but the preflight OPTIONS request must also be answered with CORS headers (enable CORS on the resource in the API Gateway console or via CloudFormation).

📊 Production Insight

SQS-based async invocation removes tight coupling and handles bursts gracefully.

With an SQS event source mapping, Lambda delivers messages in batches (10 per invocation by default), so keep the handler fast and idempotent: a failed batch is redelivered.

Always set a dead-letter queue for failed messages. Without one, a poison message is retried until its retention period expires and is then dropped.
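A batch handler that fails one message without redelivering the whole batch might look like the sketch below. It assumes the event source mapping has partial batch responses enabled (the `ReportBatchItemFailures` function response type); `process_order` is a hypothetical stand-in for your business logic.

```python
import json


def process_order(order: dict) -> None:
    """Hypothetical business logic; raises on a message it cannot handle."""
    if "order_id" not in order:
        raise ValueError("order_id is required")


def handler(event, context):
    """Process an SQS batch and report only the messages that failed."""
    failures = []
    for record in event["Records"]:
        try:
            process_order(json.loads(record["body"]))
        except Exception as exc:
            print(f"Failed message {record['messageId']}: {exc}")
            failures.append({"itemIdentifier": record["messageId"]})
    # Only the listed messages return to the queue; the rest are deleted.
    # An empty list tells Lambda the whole batch succeeded.
    return {"batchItemFailures": failures}
```

Without partial batch responses, raising an exception for one bad message sends every message in the batch back to the queue, which is why idempotency matters so much with this trigger.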

🎯 Key Takeaway

Pick the trigger that fits the job.

API Gateway for sync calls, SQS for async reliability.

Never use SNS direct to Lambda without a queue — retries are weak.

Choosing the Right Trigger Pattern

If you need a RESTful HTTP API → use API Gateway with Proxy Integration.

If you need reliable async processing with retry → use SQS with Lambda as an event source mapping.

If you need to react to file uploads or changes in a bucket → use S3 event notifications directly to Lambda.

If you need scheduled execution or cron → use an EventBridge (formerly CloudWatch Events) rule as the trigger.

Lambda Event Source Reference Table — What Triggers Your Function

The following table catalogs the most common Lambda event sources, their invocation model, payload size limits, retry behavior, and best-fit use cases. Knowing these details helps you design reliable, cost-efficient serverless workflows. For each source, the event structure is fixed by AWS — you cannot change the schema — so you must parse the documented fields correctly in your handler.

Event Source	Invocation Type	Max Payload	Retry Behavior	Best For
API Gateway	Synchronous	6 MB effective (Lambda's synchronous invoke limit; API Gateway itself accepts 10 MB)	No automatic retries; client handles	HTTP/REST APIs, webhooks
S3 (Event Notifications)	Asynchronous	128 KB (event record)	2 retries (async)	File processing (image resize, logs, analytics)
DynamoDB Streams	Stream-based	1 MB (batch)	Indefinite retry until data expires (24h)	React to DB changes (materialized views, sync)
Kinesis Data Streams	Stream-based	1 MB (per record)	Indefinite retry until data expires (retention defaults to 24 hours and is configurable)	Real-time data processing (clickstreams, logs)
SQS (Standard)	Poll-based (event source mapping)	256 KB per message	Retries based on redrive policy	Async decoupling, buffering, batch processing
SQS (FIFO)	Poll-based (event source mapping)	256 KB per message	Retries with exactly-once semantics	Ordered processing, deduplication
SNS (topic subscription)	Asynchronous	256 KB	2 retries (async)	Fan-out notifications to multiple subscribers
EventBridge (scheduled or event)	Asynchronous	256 KB	2 retries (async)	Cron jobs, event routing between AWS services
CloudFront (Lambda@Edge)	Synchronous	1 MB	No automatic retries	Modify HTTP request/response at edge
Lambda Function URL	Synchronous	6 MB (request/response)	No automatic retries	Simple HTTP endpoints without API Gateway

Key details to remember:
- Asynchronous invocations (S3, SNS, EventBridge) retry twice with 1–2 minute delays. Always configure a dead-letter queue (DLQ) for these triggers.
- Stream-based triggers (DynamoDB, Kinesis) retry until the data record expires, so a persistent bug will block the entire shard. Use bisectBatchOnFunctionError to split batches on failure.
- Synchronous triggers (API Gateway, Lambda Function URL) do not retry; your client or upstream service must implement retry logic.
- Payload size limits are hard: an invocation whose event exceeds the limit is rejected, not trimmed to fit. S3 event notifications contain object metadata (bucket, key, size), not the object itself, so the handler must download the object separately.

For a full list of event sources and their exact event schemas, refer to the [AWS Lambda Developer Guide — Using AWS Lambda with other services](https://docs.aws.amazon.com/lambda/latest/dg/lambda-services.html).

💡Quick Reference: Async vs Sync Invocation

Async invocation returns '202 Accepted' immediately and runs the function in the background. Sync invocation waits for the function to complete and returns the result. If you need the response, use sync. If you don't need the response and want decoupling, use async. Stream-based triggers are a hybrid — they poll the stream and invoke your function synchronously with batches of records.
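The difference is a single parameter when you call Lambda directly. The sketch below wraps the boto3 `invoke` call; the `StubLambdaClient` class is only a local stand-in so the example runs without AWS credentials, and in real code you would pass `boto3.client('lambda')` instead.

```python
import json


def invoke(client, function_name, payload, wait_for_result):
    """Invoke a function synchronously or asynchronously; return the HTTP status."""
    response = client.invoke(
        FunctionName=function_name,
        # 'RequestResponse' blocks until the function returns;
        # 'Event' queues the invocation and returns immediately.
        InvocationType="RequestResponse" if wait_for_result else "Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return response["StatusCode"]


class StubLambdaClient:
    """Stand-in for boto3.client('lambda'); mimics only the status codes."""

    def invoke(self, FunctionName, InvocationType, Payload):
        return {"StatusCode": 200 if InvocationType == "RequestResponse" else 202}


print(invoke(StubLambdaClient(), "orders-api-handler", {"ping": True}, True))
print(invoke(StubLambdaClient(), "orders-api-handler", {"ping": True}, False))
```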

📊 Production Insight

Payload size limits are a hidden trap. S3 event notifications describe objects (bucket, key, size) rather than containing them, and a multipart upload produces a single ObjectCreated:CompleteMultipartUpload event for the finished object, not one event per part. Always validate that the event object contains the records and fields you expect, and route notifications through SQS when bulk uploads arrive in bursts.

🎯 Key Takeaway

Match the event source to your reliability requirements.

Async sources need a DLQ.

Stream sources need idempotent handlers.

Synchronous sources need client-side retry.

Cold Starts, Memory Tuning, and the Performance Levers You Actually Control

Lambda gives you one direct performance dial: memory. You set it anywhere from 128 MB to 10,240 MB. What most developers don't realise is that CPU allocation scales proportionally with memory. A 1,024 MB Lambda function gets roughly 8x the CPU of a 128 MB one. If your function is CPU-bound (image processing, data transformation, encryption), doubling the memory can halve the execution time — and since you pay for duration × memory, the cost often stays the same or even drops.
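To see why more memory can cost less, compare duration × memory across configurations. The durations below are hypothetical measurements for a CPU-bound function, and the price is the on-demand x86 figure quoted earlier in this article; `cost_per_million` is an illustrative helper.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumption: on-demand x86 price quoted earlier


def cost_per_million(memory_mb: int, duration_ms: float) -> float:
    """Compute cost in USD of one million invocations, excluding request fees."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return round(gb_seconds * PRICE_PER_GB_SECOND * 1_000_000, 2)


# Hypothetical measurements: duration roughly halves as memory doubles,
# then improves a little further at 1,024 MB.
measurements = {128: 2400, 256: 1200, 512: 600, 1024: 280}

for memory_mb, duration_ms in measurements.items():
    cost = cost_per_million(memory_mb, duration_ms)
    print(f"{memory_mb:>5} MB  {duration_ms:>5} ms  ${cost}")
```

With these numbers the first three settings cost the same per invocation while latency drops fourfold, and the largest setting is both the fastest and the cheapest. Real functions stop scaling at some point, which is what a tuning run is for.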

Cold starts are the other major lever. Three strategies exist: Provisioned Concurrency, keeping functions warm with scheduled EventBridge pings, and minimising package size.

Provisioned Concurrency is the AWS-supported way to eliminate cold starts rather than merely shorten them. You pay for a set number of pre-warmed containers to stay alive at all times. It costs more than on-demand but eliminates cold starts entirely for those concurrency slots. Use it for customer-facing APIs where tail latency matters.

Package size matters because Lambda has to download your deployment package before running it. A 50 MB Python package with unnecessary dependencies cold-starts noticeably slower than a 3 MB lean package. Use Lambda Layers to separate large dependencies (like numpy or Pillow) from your application code, and use .zip deployment packages rather than container images unless you specifically need Docker tooling.

Finally, watch your timeout setting. The default is 3 seconds. Downstream API calls, DB queries, and S3 operations can easily exceed this. Set it realistically (15 minutes max) and always handle partial failures gracefully.
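One way to handle partial failure gracefully is to watch the remaining time budget and stop early instead of being killed mid-item. The sketch below relies on `get_remaining_time_in_millis()`, which the Lambda context object provides; the safety margin, the event shape, and the `FakeContext` test double are assumptions made for illustration.

```python
SAFETY_MARGIN_MS = 2_000  # assumption: time reserved for cleanup and the response


def handler(event, context):
    """Process items until the time budget runs low, then report what is left."""
    items = event.get("items", [])
    processed = []
    for item in items:
        # Stop before Lambda terminates the invocation at the timeout
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        processed.append(item.upper())  # stand-in for real work
    remaining = items[len(processed):]
    # The caller (or a queue) can re-submit 'remaining' in a new invocation
    return {"processed": processed, "remaining": remaining}


class FakeContext:
    """Local stand-in for the Lambda context, for running outside AWS."""

    def __init__(self, budgets_ms):
        self._budgets = iter(budgets_ms)

    def get_remaining_time_in_millis(self):
        return next(self._budgets)


print(handler({"items": ["a", "b", "c"]}, FakeContext([5000, 3000, 1000])))
```

Returning the unprocessed remainder turns a hard timeout into a resumable job, which is usually easier to reason about than a half-finished batch.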

serverless-function-config.yaml (YAML)

# AWS SAM (Serverless Application Model) template — the standard way to
# define Lambda functions as Infrastructure-as-Code.
# Run: sam build && sam deploy --guided

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Description: Orders API — production-grade Lambda configuration

Globals:
  Function:
    # Runtime for all functions in this template unless overridden
    Runtime: python3.12
    # Timeout generous enough for DynamoDB + downstream calls, not infinite
    Timeout: 30
    # Environment variables available to all functions
    Environment:
      Variables:
        LOG_LEVEL:          INFO
        ORDERS_TABLE_NAME:  !Ref OrdersTable

Resources:

  OrdersApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: orders-api-handler
      CodeUri:       src/
      Handler:       orders_api_handler.handler

      # 512 MB gives ~4x CPU vs 128 MB — worth it for JSON parsing + DynamoDB calls
      # Run AWS Lambda Power Tuning tool to find YOUR optimal memory setting
      MemorySize: 512

      # Provisioned concurrency: 5 containers always warm for the production alias
      # This eliminates cold starts for the first 5 concurrent requests
      # Cost: ~$0.0000041 per GB-second × 0.5 GB × 5 containers × every second in the month
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5

      # IAM permissions — principle of least privilege
      # Only grant what this specific function actually needs
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref OrdersTable

      # API Gateway trigger — Lambda Proxy Integration (recommended)
      Events:
        CreateOrder:
          Type:  Api
          Properties:
            Path:   /orders
            Method: POST

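        # OPTIONS method needed for browser CORS preflight requests.
        # The handler must answer OPTIONS with CORS headers; a handler that
        # rejects every non-POST method will fail the preflight.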
        CreateOrderOptions:
          Type:  Api
          Properties:
            Path:   /orders
            Method: OPTIONS

  OrdersTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName:    orders
      BillingMode:  PAY_PER_REQUEST  # Serverless billing — no provisioned capacity to manage
      AttributeDefinitions:
        - AttributeName: order_id
          AttributeType: S
      KeySchema:
        - AttributeName: order_id
          KeyType:        HASH

      # Point-in-time recovery — always enable for production data
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true

Outputs:
  OrdersApiEndpoint:
    Description: "API Gateway endpoint for the Orders API"
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/orders"

Output

# sam build output:

Building codeuri: src/ runtime: python3.12

Build Succeeded

Built Artifacts: .aws-sam/build

# sam deploy output (key lines):

Deploying with following values

Stack name : orders-api-stack

Region : eu-west-1

Confirm changes: Yes

CloudFormation events:

CREATE_COMPLETE AWS::DynamoDB::Table OrdersTable

CREATE_COMPLETE AWS::Lambda::Function OrdersApiFunction

CREATE_COMPLETE AWS::ApiGateway::RestApi ServerlessRestApi

Outputs:

OrdersApiEndpoint: https://x7k2mn3p4q.execute-api.eu-west-1.amazonaws.com/Prod/orders

Successfully created/updated stack - orders-api-stack in eu-west-1

🔥Interview Gold: Lambda Power Tuning

AWS Lambda Power Tuning (github.com/alexcasalboni/aws-lambda-power-tuning) is an open-source Step Functions state machine. It runs your function at a list of memory configurations you choose (anywhere from 128 MB to 10 GB), measures cost and duration, and plots the results so you can pick the optimal setting. Mentioning this tool in an interview signals you understand production cost optimisation, not just configuration.

📊 Production Insight

Memory is the main performance lever because CPU scales with it; use Power Tuning to find the sweet spot.

Provisioned Concurrency eliminates cold starts but adds cost.

Container images are slower to deploy than .zip packages; prefer .zip unless you need large binaries.

🎯 Key Takeaway

Memory scales CPU, not just RAM.

Run Power Tuning once per function.

Prefer .zip over container images for faster cold starts.

Memory & Cold Start Trade-off Decision

If: Function is CPU-bound and cost-sensitive
→ Use: Run Power Tuning. Often 1024 MB is cheaper than 256 MB because it finishes faster.

If: p99 latency must stay under 300ms
→ Use: Provisioned Concurrency. Do not rely on ping warming.
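To see why a larger memory setting can cost less, the arithmetic can be sketched in a few lines. The two durations below are hypothetical measurements for a CPU-bound function, not benchmarks; the prices are the on-demand rates quoted earlier in this article.

```python
# Per-invocation cost at two memory settings.
# Prices are the on-demand rates quoted in this article; durations are made up.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Return the on-demand cost in USD of a single invocation."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST


# Hypothetical CPU-bound function: 4x the memory, so roughly 4x the CPU share
slow = invocation_cost(memory_mb=256, duration_ms=1200)
fast = invocation_cost(memory_mb=1024, duration_ms=250)

print(f"256 MB:  ${slow * 1_000_000:.2f} per million invocations")
print(f"1024 MB: ${fast * 1_000_000:.2f} per million invocations")
```

The larger setting wins here only because the assumed duration drops by more than the memory grows. For an I/O-bound function the duration barely changes, and the cheaper setting stays cheaper, which is why measuring beats guessing.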

Provisioned Concurrency vs Cold Start — Visual Breakdown

Provisioned Concurrency is the only AWS-native mechanism that guarantees zero cold starts for a fixed number of concurrent invocations. The diagram below contrasts the request flow for an on-demand function (which may incur a cold start) versus a function with Provisioned Concurrency.

How it works: When you enable Provisioned Concurrency, Lambda pre-initialises a specified number of execution environments and keeps them warm. Incoming invocations are routed to these warm environments instantly. On-demand environments are still used for invocations beyond the provisioned count, so cold starts still occur when the provisioned pool is exhausted. The visual logic flow:

On-Demand Path: Request arrives → check for warm container → if none found → cold start (init + handler delay).
Provisioned Concurrency Path: Request arrives → route to pre-warmed container → warm start (handler only, no init delay).

The benefit is a 100% elimination of cold start latency for the initial set of concurrent requests. The cost is paying for those environments even when idle.

When to use it: Only for latency-critical production endpoints where p99 must stay below, say, 500ms. For batch processing or background jobs, on-demand is sufficient and cheaper.

When NOT to use it: If your function is rarely invoked (once per hour), the cost of keeping a container warm 24/7 will far exceed any performance benefit. A simple scheduled EventBridge ping (every 5 minutes) is cheaper and nearly as effective — though not guaranteed, as AWS may reclaim containers during maintenance.

Alternative warming patterns: A common pattern is to set up an EventBridge rule that invokes your function every 5 minutes with a synthetic event (e.g., a 'warmup' field). This keeps 1–2 containers warm without Provisioned Concurrency cost. However, this is unreliable under burst traffic — if multiple concurrent requests arrive simultaneously, only one container may be warm. Provisioned Concurrency guarantees capacity.
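On the function side, the warming pattern needs a short-circuit so synthetic pings never reach business logic. A minimal sketch, assuming the scheduled rule sends an event containing a `warmup` field; the field name and the `handle_order` helper are illustrative, not part of any AWS API.

```python
import json


def handle_order(event):
    """Hypothetical stand-in for the real business logic."""
    return {"received": True}


def handler(event, context):
    # Synthetic pings from the scheduled rule carry a 'warmup' field
    # (the field name is an assumption; match whatever your rule sends).
    if isinstance(event, dict) and event.get("warmup"):
        return {"statusCode": 200, "body": json.dumps({"warmed": True})}

    # Real traffic falls through to the normal code path
    return {"statusCode": 200, "body": json.dumps(handle_order(event))}
```

Returning early keeps the ping cheap: it is billed for a few milliseconds and never touches downstream services.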

provisioned-concurrency-snippet.yaml (YAML)

# Snippet: enabling Provisioned Concurrency in SAM
# Full template in previous section; this is the critical part

      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5

# After deploy, verify with:
# aws lambda get-provisioned-concurrency-config --function-name orders-api-handler --qualifier live
# Expected: Status: READY, AllocatedProvisionedConcurrentExecutions: 5

Output

# Output:

{
  "AllocatedProvisionedConcurrentExecutions": 5,
  "AvailableProvisionedConcurrentExecutions": 5,
  "Status": "READY"
}

⚠ Provisioned Concurrency Costs Money Even When Idle

You pay for Provisioned Concurrency on a per-GB-second basis, even when no requests are being processed. For 5 containers at 512 MB for a full 30-day month, that's roughly $27 at the us-east-1 x86 rate of $0.0000041667 per GB-second (5 × 0.5 GB × 2,592,000 s), before any request or duration charges. If your endpoint handles <1 req/min, a scheduled warmer may be more cost-effective. Always monitor the ProvisionedConcurrencySpilloverInvocations metric to see how many requests exceed the provisioned pool.

📊 Production Insight

Provisioned Concurrency is the only way to guarantee zero cold starts for latency-critical endpoints. Use it sparingly — only for the minimum number of concurrent executions needed to cover p95 traffic. For everything else, optimize package size and runtime to keep cold starts under 200ms.

🎯 Key Takeaway

Provisioned Concurrency eliminates cold starts but at a cost. Use only for latency-sensitive paths where p99 latency must stay under a threshold. Monitor spillover to adjust the count.

Cold Start vs Provisioned Concurrency Flow

Lambda Resource Limits & Constraints Table — What You Can't Change

Lambda has specific hard limits that constrain how you design your serverless applications. Exceeding these limits results in deployment failures, throttling, or runtime errors. The table below shows the most important limits — know them before you architect your system.

Resource	Limit	Notes
Memory per function	128 MB – 10,240 MB (in 1 MB increments)	CPU scales with memory; more memory = more CPU
Ephemeral storage `/tmp`	512 MB by default, configurable up to 10,240 MB	Shared across warm invocations; not reset on reuse
Maximum execution timeout	15 minutes (900 seconds)	Hard limit; cannot be increased
Deployment package size (.zip)	250 MB (unzipped), 50 MB (zipped for direct upload)	The 250 MB unzipped limit includes all layers; use a container image if you need more
Container image size	10 GB (ECR image)	Larger images cause slower cold starts
Concurrent executions per region (default)	1,000	Can be increased via service quota request
Concurrent executions per function (default)	1,000 (unreserved)	Can be limited with reserved concurrency
Request/response payload size	6 MB (synchronous); asynchronous events are capped far lower (historically 256 KB)	For larger payloads, use S3 or response streaming
Function environment variables	4 KB total (unencrypted)	Use AWS Secrets Manager or Parameter Store for secrets
Lambda Layers per function	5	Layer size counts toward total unzipped limit (250 MB)
Event source mappings per function	10 (for SQS, DynamoDB, Kinesis)	Add more by using multiple triggers
Reserved concurrency per function	0 – regional limit	Setting reserved concurrency guarantees capacity but blocks other functions
Provisioned Concurrency per function	0 – regional limit	Drawn from the same regional concurrency pool (1,000 by default)
Function execution role	AWS IAM role	Lambda attaches this role to the execution environment

How to work around limits:
- Package size: Layers do not raise the 250 MB unzipped limit, because the function package and all of its layers count toward it together. Use layers to share large libraries (pandas, OpenCV, etc.) across functions, and switch to a container image (up to 10 GB) when the dependencies themselves exceed 250 MB.
- Timeout: Lambda supports up to 15 minutes. For longer jobs, use AWS Step Functions to orchestrate multiple Lambda calls, or switch to Fargate/Batch.
- Concurrency: If you anticipate more than 1,000 concurrent executions, request a limit increase in the AWS Service Quotas console. Also consider using SQS buffering to smooth traffic.
- Payload size: For payloads that exceed the invocation limit (6 MB synchronous, far less for asynchronous events), upload to S3 and pass the object key in the event. Lambda reads from S3 instead of the event body.

These limits are not negotiable — building against them from day one avoids costly refactors later.
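The payload workaround is often called the claim-check pattern: send a pointer instead of the data. A minimal sketch, assuming a 200 KB inline threshold chosen to stay under the smallest invocation limit; the `upload` callable is hypothetical and would wrap an S3 put-object call in a real producer.

```python
import json
import uuid
from typing import Callable

# Assumed threshold: stay well under the smallest invocation payload limit
MAX_INLINE_BYTES = 200 * 1024


def build_event(payload: dict, upload: Callable[[str, bytes], None]) -> dict:
    """Return an event that carries the payload inline or as a storage key.

    `upload(key, body)` is a hypothetical callable supplied by the caller.
    """
    body = json.dumps(payload).encode("utf-8")
    if len(body) <= MAX_INLINE_BYTES:
        return {"inline": payload}

    # Too large for the event: store it and pass only the pointer
    key = f"payloads/{uuid.uuid4()}.json"
    upload(key, body)
    return {"s3_key": key, "size_bytes": len(body)}


# Usage with a dict standing in for a bucket
bucket = {}
small_event = build_event({"order_id": "42"}, bucket.__setitem__)
large_event = build_event({"blob": "x" * 300_000}, bucket.__setitem__)
```

The consuming function checks for `s3_key` and fetches the object before processing, so both paths end with the same payload in memory.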

🔥The 50 MB Zip Limit Is for Direct Upload Only

If you use AWS SAM, CloudFormation, or the AWS CLI to deploy from S3, the 50 MB zip limit doesn't apply — the package is stored in S3 and Lambda downloads it from there. The 250 MB unzipped limit still applies. For large packages, always deploy via S3, or use container images (up to 10 GB).

📊 Production Insight

Most production incidents involving Lambda stem from hitting a limit unexpectedly: timeout too low, payload too large, concurrency exhausted. Include limit checks in your CI/CD pipeline. Use the AWS Lambda API to query configured limits and alert when you approach thresholds.
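One limit check that is easy to automate in a pipeline is the unzipped package size. A minimal sketch, assuming the build produces a .zip artifact; the 80% warning threshold is an arbitrary choice, not an AWS recommendation.

```python
import io
import zipfile

MAX_UNZIPPED_BYTES = 250 * 1024 * 1024  # .zip deployment limit, layers included
WARN_RATIO = 0.8  # assumed alert threshold


def unzipped_size(zip_file) -> int:
    """Sum the uncompressed size of every entry in a deployment archive."""
    with zipfile.ZipFile(zip_file) as archive:
        return sum(info.file_size for info in archive.infolist())


def check_package(zip_file) -> str:
    """Return 'ok', 'warn', or 'fail' for a build artifact."""
    size = unzipped_size(zip_file)
    if size > MAX_UNZIPPED_BYTES:
        return "fail"
    if size > MAX_UNZIPPED_BYTES * WARN_RATIO:
        return "warn"
    return "ok"


# Demo with an in-memory archive standing in for a real build artifact
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("handler.py", "def handler(event, context): ...")
print(check_package(demo))
```

In a pipeline, `check_package` would receive the path to the built archive, and a "fail" result would stop the deploy before CloudFormation rejects it. Layer sizes would need to be added to the total, which this sketch does not do.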

🎯 Key Takeaway

Know the hard limits before building. Package size, timeout, payload size, and concurrency are the top constraints. Design within them or use layers/step functions to extend.

Production Patterns: Error Handling, Retries, and Observability

Lambda's default retry behaviour depends on invocation type. Synchronous invocations (API Gateway, custom apps) do NOT retry automatically — your client must handle errors. Asynchronous invocations (S3, SNS, EventBridge) retry twice using built-in retry logic, then discard the event unless you configure a dead-letter queue (DLQ). Stream-based triggers (DynamoDB Streams, Kinesis) retry until the data expires (default 24 hours) and block the shard — meaning a permanently failing function stalls your stream.

For synchronous APIs, implement your own retry with exponential backoff inside Lambda. For async triggers, always attach a DLQ (SQS or SNS) to capture failed events. Without a DLQ, failed events vanish after two retries — you'll never know.
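The retry-with-backoff advice can be sketched as a small helper. This is an illustration under stated assumptions: the attempt count and base delay are arbitrary, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import random
import time


def call_with_backoff(func, max_attempts=4, base_delay=0.2, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: wait a random time up to base_delay * 2^(attempt - 1)
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

Keep the total worst-case wait well below the function timeout, since time spent sleeping inside a handler is billed and counts toward the 15-minute cap.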

Observability in Lambda is driven by CloudWatch Logs, CloudWatch Metrics, and AWS X-Ray. Every invocation writes a REPORT line showing duration, billed duration, memory used, and init duration. X-Ray traces show downstream calls to DynamoDB, S3, and other services — essential for debugging latency.

Structured logging is critical. Use JSON-formatted logs with a correlation ID (often the X-Ray trace ID) so you can correlate invocations. Avoid print() statements without context.

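error_handling_logging.py (PYTHON)

import json
import logging
import traceback

# Emit one JSON object per log line.
# The Lambda runtime pre-installs a root handler, so basicConfig() only takes
# effect in local runs; setLevel() is what enables INFO logs inside Lambda.
logging.basicConfig(level=logging.INFO, format='%(message)s')
logger = logging.getLogger()
logger.setLevel(logging.INFO)


class ValidationError(Exception):
    """Placeholder: raised when the incoming order is invalid."""


class ExternalServiceError(Exception):
    """Placeholder: raised when a downstream dependency fails."""


def process_order(event):
    """Placeholder for the real business logic."""
    if 'customer_id' not in event:
        raise ValidationError('Invalid field: customer_id')
    return {'status': 'accepted'}


def handler(event, context):
    # Use the request ID as the correlation ID. When X-Ray tracing is active,
    # the trace ID is available in the _X_AMZN_TRACE_ID environment variable.
    trace_id = context.aws_request_id
    logger.info(json.dumps({
        'trace_id': trace_id,
        'event_type': type(event).__name__,
        'message': 'Function invoked'
    }))

    try:
        # business logic
        result = process_order(event)
        return _build_response(200, result)
    except ValidationError as e:
        # Caller error: retrying will not help, so answer with a 400
        logger.warning(json.dumps({
            'trace_id': trace_id,
            'error': str(e),
            'message': 'Validation failed'
        }))
        return _build_response(400, {'error': str(e)})
    except ExternalServiceError as e:
        logger.error(json.dumps({
            'trace_id': trace_id,
            'error': str(e),
            'message': 'Downstream service failed'
        }))
        # Do not sleep and retry here: it only burns billed time.
        # Re-raise so Lambda's built-in retries and the DLQ handle async
        # invocations; synchronous callers receive the error and must retry.
        raise
    except Exception:
        logger.critical(json.dumps({
            'trace_id': trace_id,
            'error': traceback.format_exc(),
            'message': 'Unhandled exception'
        }))
        return _build_response(500, {'error': 'Internal server error'})


def _build_response(status_code, body):
    # Shape expected by API Gateway proxy integrations
    return {
        'statusCode': status_code,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(body)
    }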

Output

# CloudWatch Logs output:

{"trace_id": "9a2e4f7d-...", "event_type": "dict", "message": "Function invoked"}

{"trace_id": "9a2e4f7d-...", "error": "Invalid field: customer_id", "message": "Validation failed"}

# Lambda REPORT line:

REPORT RequestId: 9a2e4f7d-... Duration: 45.23 ms Billed Duration: 46 ms Memory Size: 128 MB Max Memory Used: 32 MB
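Because the REPORT line has a stable "Name: value unit" shape, it can be parsed for dashboards or alerts. The sketch below is an illustration only: the regular expression is an assumption based on the sample line above, and the Init Duration field appears only on cold starts.

```python
import re

# Matches fields such as "Duration: 45.23 ms" or "Max Memory Used: 32 MB"
REPORT_FIELD = re.compile(
    r"(Billed Duration|Init Duration|Duration|Memory Size|Max Memory Used): "
    r"([\d.]+) (?:ms|MB)"
)


def parse_report(line: str) -> dict:
    """Extract the numeric fields from a Lambda REPORT log line."""
    return {name: float(value) for name, value in REPORT_FIELD.findall(line)}


def is_cold_start(line: str) -> bool:
    """A REPORT line includes Init Duration only when the environment was new."""
    return "Init Duration" in parse_report(line)


sample = (
    "REPORT RequestId: 9a2e4f7d-... Duration: 45.23 ms Billed Duration: 46 ms "
    "Memory Size: 128 MB Max Memory Used: 32 MB"
)
fields = parse_report(sample)
```

Comparing `Max Memory Used` against `Memory Size` across many invocations is a quick way to spot functions that are over- or under-provisioned.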

Mental Model

The Async Retry Model Mental Model

Lambda treats async invocations as fire-and-forget with two retries. Think of it like a bus that picks up events, tries three times to deliver, then drops them at the dead-letter stop.

Synchronous invocations: no automatic retries. The caller must handle errors.
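Asynchronous invocations: two automatic retries, with a wait of about 1 minute before the first and 2 minutes before the second.
Stream-based triggers: retry until the record expires (24 hours for DynamoDB Streams; 24 hours by default for Kinesis, longer if you extend retention).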
Always configure a dead-letter queue (DLQ) for async triggers to catch failures.
DLQ can be an SQS queue (for processing later) or an SNS topic (for alerting).

📊 Production Insight

Never assume your Lambda will succeed on first try.

Always attach a DLQ to async triggers — otherwise failures vanish silently.

Use X-Ray to trace every downstream call; it's the only way to see where time is spent.

🎯 Key Takeaway

Synchronous: no retries — you own the failure.

Async: two retries, then DLQ.

Stream: indefinite retry — fix fast or skip gracefully.

Retry Strategy by Invocation Type

If: Invocation via API Gateway (sync)
→ Use: Implement retry in client or inside Lambda with circuit breaker.

If: S3, SNS, EventBridge (async)
→ Use: Lambda retries twice; configure DLQ to capture remaining failures.

If: DynamoDB Streams, Kinesis (stream)
→ Use: Lambda retries until data expires; fix the bug or implement filter to skip bad records.
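For the stream case, one way to keep a poison record from stalling the shard is to separate records that can never succeed from transient failures. The sketch below assumes a Kinesis event source mapping with the `ReportBatchItemFailures` response type enabled; `process_record` and its `simulate_failure` flag are hypothetical stand-ins for real business logic.

```python
import base64
import json


def process_record(payload: dict) -> None:
    """Hypothetical business logic; raises on a transient downstream failure."""
    if payload.get("simulate_failure"):
        raise RuntimeError("downstream unavailable")


def handler(event, context):
    for record in event.get("Records", []):
        seq = record["kinesis"]["sequenceNumber"]
        try:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        except ValueError:
            # Poison record: it will never parse, so log it and move on
            print(json.dumps({"skipped_sequence_number": seq}))
            continue
        try:
            process_record(payload)
        except Exception:
            # Transient failure: stop here so Lambda retries from this record
            # and ordering within the shard is preserved
            return {"batchItemFailures": [{"itemIdentifier": seq}]}
    # Empty list tells Lambda the whole batch succeeded
    return {"batchItemFailures": []}
```

In production the skipped record would usually be written to a queue or bucket for inspection rather than only logged, so nothing is lost silently.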

When Lambda is the Wrong Tool — Alternatives and Trade-offs

Lambda excels at short-lived, event-driven, bursty workloads. But it's not a general-purpose compute platform. If your workload contradicts any of the following, reach for another service.

First, long-running processes: Lambda's hard 15-minute timeout means you cannot run a nightly batch job that takes an hour. Use AWS Batch or ECS/Fargate for that.

Second, stateful applications: Lambda is stateless by design. If your application needs to hold client connections (WebSockets), maintain session state in memory, or use files that persist beyond a single invocation, you'll fight the architecture. Use EC2 or ECS with sticky sessions instead.

Third, predictable, steady traffic: If your load is constant 24/7, Lambda's per-ms billing is more expensive than a low-cost EC2 instance or a reserved instance. A t3.small running 24 hours costs $15/month; 5 million Lambda invocations at 200ms average could cost $8, but steady traffic at 100 req/s would push cost higher than an EC2.

Fourth, heavy GPU/compute: Lambda has no GPU support. ML training, 3D rendering, or video transcoding with high compute needs are better on EC2 GPU instances or SageMaker.

Fifth, very low latency requirements (<10ms): Lambda's cold starts and per-invocation network overhead make it unsuitable for single-digit-millisecond use cases like real-time trading. Use containers on EC2 or custom hardware.

Finally, large binary processing: Lambda's deployment package limit is 250 MB (unzipped) and 50 MB (zipped) for direct upload, and memory tops out at 10,240 MB. If you're processing multi-GB files, you'll hit storage and timeout limits. Use ECS or Batch with EFS.
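The steady-traffic point is easiest to judge with numbers. The sketch below estimates monthly on-demand Lambda cost for constant load and compares it with the approximate $15/month t3.small figure used in this article. It ignores the free tier and assumes a 30-day month; whether a single small instance could actually serve the load is a separate question the sketch does not answer.

```python
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20
SECONDS_PER_MONTH = 30 * 24 * 3600
EC2_MONTHLY = 15.0  # approximate t3.small figure used in this article


def lambda_monthly_cost(req_per_sec: float, duration_ms: float, memory_mb: int) -> float:
    """On-demand Lambda cost in USD for constant traffic, free tier ignored."""
    requests = req_per_sec * SECONDS_PER_MONTH
    gb_seconds = requests * (duration_ms / 1000) * (memory_mb / 1024)
    request_cost = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return request_cost + gb_seconds * PRICE_PER_GB_SECOND


for rps in (0.1, 1, 10, 100):
    cost = lambda_monthly_cost(rps, duration_ms=100, memory_mb=512)
    cheaper = "Lambda" if cost < EC2_MONTHLY else "EC2"
    print(f"{rps:>5} req/s -> ${cost:8.2f}/month  (cheaper: {cheaper})")
```

With these assumptions the crossover sits somewhere between 1 and 10 requests per second, which is why the same function can be a bargain for a webhook and a budget problem for a busy API.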

lambda-vs-alternatives.yaml (YAML)

# Quick decision reference for choosing Lambda or an alternative

deployment_type:
  - name: Lambda
    good_for:
      - Event-driven functions
      - Spiky HTTP APIs
      - Scheduled tasks under 15 minutes
    bad_for:
      - Long-running batch jobs (over 15 min)
      - Stateful applications
      - Predictable steady traffic
  - name: ECS/Fargate
    good_for:
      - Containerized workloads (stateful or stateless)
      - Long-running services
      - WebSocket servers
      - Background workers needing >15 min
    bad_for:
      - Very short-lived functions (cold start is higher)
      - Simple event reactions (overkill)
  - name: EC2
    good_for:
      - Full control of OS/GPU
      - Stable traffic patterns
      - Applications needing persistent storage
    bad_for:
      - Variable traffic (idle cost)
      - Need minimal ops overhead

# Cost comparison example:
# Scenario: 100 req/s steady, average duration 100ms, 512 MB memory
# Requests: 100 * 86400 = 8.64M invocations/day -> $1.728/day
# Compute:  8.64M * 0.1 s * 0.5 GB = 432,000 GB-s/day -> $7.20/day
# Lambda total: ~$8.93/day = ~$268/month (30 days, free tier ignored)
# t3.small (2 vCPU, 2 GB) on-demand: ~$15/month (24/7)
# Lambda is roughly 18x more expensive for steady traffic.
# Lambda wins when traffic is spiky and idle periods exist.

Output

# Decision output:

choose: ECS/Fargate for long-running, stateful, or steady traffic.

choose: Lambda for event-driven, short, and spiky workloads only.

⚠ Don't Force Lambda Where It Doesn't Belong

I've seen teams migrate a steady-traffic e-commerce backend to Lambda and see their monthly bill triple. Lambda's per-request cost is excellent for sporadic traffic, but for constant load, a small EC2 or Fargate instance is cheaper. Always run a cost simulation before migrating.

📊 Production Insight

Lambda is not cheaper than EC2 for steady loads — run the numbers.

The 15-minute timeout is a hard limit: any job that needs longer must be split into chunks or moved off Lambda.

If you need GPU, use SageMaker or EC2 G family.

🎯 Key Takeaway

Lambda is a scalpel, not a Swiss Army knife.

Use it for short, event-driven tasks.

For everything else, pick the right compute service.

Lambda vs Alternative Decision

If: Execution time > 15 minutes
→ Use: Fargate or Batch.

If: Requires persistent state across calls
→ Use: EC2 or ECS with sticky sessions.

If: Steady traffic 24/7
→ Use: EC2 or ECS, which are cheaper than Lambda for constant load.

If: GPU or heavy compute required
→ Use: EC2 GPU or SageMaker.

If: Event-driven, spiky, short-lived
→ Use: Lambda is the perfect fit.

The Core Concepts: Serverless & Event-Driven — What Your Manager Actually Means

Your manager says 'serverless'. You hear 'no ops work'. Both are wrong.

Serverless doesn't mean servers vanish. It means you stop caring about kernel patches, SSH keys, and OS upgrades. AWS runs the hypervisor, the runtime, and the scaling plane. Your job shrinks to code and IAM permissions. That's the trade: you give up control over the execution environment in exchange for not paging at 3 AM when a disk fills up.

Event-driven is the engine behind that trade. Your Lambda function does nothing until something pokes it. An S3 upload. An API Gateway request. A DynamoDB stream. That event arrives as a JSON payload, your function processes it, and then it dies. No daemons. No polling loops. No idle costs.

The mental model: Lambda is a stateless worker pool that only exists while handling a single request. If you write code that assumes long-lived connections, local file state, or sticky sessions, you will fail in production. Design for stateless idempotent handlers or don't deploy it.

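EventFlow.yml (YAML)

# io.thecodeforge — devops tutorial
# Minimal event-driven pipeline: S3 → Lambda → DynamoDB

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31  # required for AWS::Serverless::* resources
Resources:
  ImageProcessorFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./                # directory containing handler.py
      Handler: handler.handler   # assumes handler.py defines handler(event, context)
      Runtime: python3.12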
      Events:
        S3UploadEvent:
          Type: S3
          Properties:
            Bucket: !Ref InputImagesBucket
            Events: s3:ObjectCreated:*
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref MetadataTable
  InputImagesBucket:
    Type: AWS::S3::Bucket
  MetadataTable:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
        - AttributeName: image_id
          AttributeType: S
      KeySchema:
        - AttributeName: image_id
          KeyType: HASH
      BillingMode: PAY_PER_REQUEST

Output

CloudFormation stack creates S3 bucket, Lambda function triggered on object creation, DynamoDB table for metadata. No servers visible. No SSH required.

⚠ Production Trap:

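Event sources can retry. S3 invokes Lambda asynchronously, so a failed invocation is retried twice by default, and S3 event notifications can occasionally be delivered more than once. If your handler isn't idempotent, you'll duplicate records or corrupt state. Always use idempotency keys or upsert logic.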
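The idempotency-key idea can be sketched with a put-if-absent write. The `InMemoryTable` class below is a stand-in for a real table with conditional writes, and building the key from bucket, object key, and sequencer is one possible choice, not a prescribed scheme.

```python
class InMemoryTable:
    """Stand-in for a table that supports conditional (put-if-absent) writes."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, key, item) -> bool:
        if key in self.items:
            return False  # duplicate delivery: nothing written
        self.items[key] = item
        return True


table = InMemoryTable()


def handler(event, context):
    """Record each S3 object event once, even if it is delivered twice."""
    written = 0
    for record in event["Records"]:
        s3 = record["s3"]
        obj = s3["object"]
        # Bucket + key + sequencer identifies one specific object event
        key = f'{s3["bucket"]["name"]}/{obj["key"]}#{obj.get("sequencer", "")}'
        if table.put_if_absent(key, {"size": obj.get("size")}):
            written += 1
    return {"written": written}
```

A retried or duplicated event produces the same key, so the second write is rejected and downstream state stays consistent.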

🎯 Key Takeaway

Serverless means you own the code and the permissions. AWS owns the rest. Event-driven means your function is a reaction, not a service.

Use Cases That Won't Burn Your Budget — And Two That Will

Lambda shines when the work is asynchronous, bursty, or short-lived. It bleeds money when you try to force it into a container-shaped hole.

The safe bets

Image/video processing on upload. S3 event triggers Lambda, you resize, transcode, or extract metadata. Perfect fit: milliseconds of CPU per file, scales to zero when no uploads happen.
Webhook handlers. Stripe, GitHub, Slack — they send JSON, you validate a signature, update a database, return 200. No keepalive costs.
Scheduled batch jobs. An EventBridge schedule every 15 minutes to purge stale records or aggregate metrics. 96 invocations a day, 500ms each, costs pennies.
Real-time file transformation. CSV → Parquet before loading into Athena. Lambda grabs the S3 object, transforms in memory, writes to a target bucket.
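For the webhook case, signature validation usually comes down to recomputing an HMAC over the raw request body. The sketch below is generic: each provider defines its own header name, prefix, and signed content, so treat the function signature here as an assumption and check the provider's documentation for the exact scheme.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Compare an HMAC-SHA256 of the raw body against the received signature."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid timing side channels
    return hmac.compare_digest(expected, received_hex)


# Usage with made-up values
secret = b"example-shared-secret"
body = b'{"event": "payment.succeeded"}'
good_signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
```

Validate against the raw bytes exactly as received. Parsing the JSON first and re-serialising it can change whitespace or key order and make a genuine signature fail.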

The traps

Synchronous request-response APIs with tight latency SLAs (<100ms p99). Cold starts kill you. Yes, Provisioned Concurrency exists. Yes, you pay for it around the clock, whether or not a request arrives.
Long-running data processing (>15 minutes runtime). Lambda hard caps at 15 minutes. If your ETL job runs 20 minutes and you can't split it, Lambda is the wrong tool. Use EMR or Fargate.
WebSocket connections with 10k+ concurrent users. Lambda per-connection costs scale linearly with active connections. A single t3.medium handling WebSockets costs less at scale.

ScheduledCleanup.yml (YAML)

# io.thecodeforge — devops tutorial
# Lambda scheduled to purge expired session tokens every 15 minutes

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31  # required for AWS::Serverless::* resources
Resources:
  SessionCleanupFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./                # directory containing cleanup.py
      Handler: cleanup.handler   # assumes cleanup.py defines handler(event, context)
      Runtime: python3.12
      Events:
        ScheduledPurge:
          Type: Schedule
          Properties:
            Schedule: rate(15 minutes)
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref SessionTable
      Timeout: 120
  SessionTable:
    Type: AWS::DynamoDB::Table
    Properties:
      # Only key attributes are declared here; expires_at is not part of the key schema
      AttributeDefinitions:
        - AttributeName: session_id
          AttributeType: S
      KeySchema:
        - AttributeName: session_id
          KeyType: HASH
      BillingMode: PAY_PER_REQUEST
      TimeToLiveSpecification:
        AttributeName: expires_at
        Enabled: true

Output

Creates a DynamoDB table with TTL enabled. Lambda runs every 15 minutes to process expired items. Cost: ~$0.30/month at typical load.

🔥Senior Shortcut:

Before building a Lambda for any use case, calculate the cost at 10x your expected traffic. If the bill makes you flinch, look at Fargate Spot or ECS with capacity providers.

🎯 Key Takeaway

Use Lambda for short, asynchronous, bursty work. Avoid it for stateful, long-lived, or latency-sensitive synchronous workloads unless you have budget to burn on Provisioned Concurrency.

💰 Pricing: Pay-Per-Use — The Bill That Sneaks Up on You

Lambda pricing sounds simple: pay per request and compute duration. But the details matter when your traffic goes from zero to a million requests overnight.

Requests cost $0.20 per million. Compute charges by GB-second — memory allocation times execution time. The cheaper your memory tier, the longer your function runs, and sometimes a slightly higher memory setting finishes faster and costs less overall. Always benchmark with realistic payloads.

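The real budget killer is not the free tier, which covers 1 million requests and 400,000 GB-seconds every month with no expiry, but sustained traffic beyond it. A 128MB function running 500ms, hit 10 million times per month, runs roughly $6 after the free tier. Peanuts. But a 3GB function that runs for 30 seconds per invocation costs about $0.0015 each time, so a few hundred thousand invocations a month, plus retries, pushes the bill past $500 real quick.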

Watch for data transfer costs too. Lambda talking to RDS or S3 in different regions racks up per-GB charges. Your serverless bill isn't just Lambda — it's the entire egress chain.

LambdaPricingExample.yml (YAML)

# io.thecodeforge — devops tutorial

# Monthly cost estimate for 128MB, 500ms avg duration, 10M requests
pricing:
  region: us-east-1
  requests:
    count: 10_000_000
    free_tier_requests: 1_000_000
    billable_requests: 9_000_000
    cost_per_million: 0.20
    subtotal: 1.80
  compute:
    memory_gb: 0.125
    duration_seconds: 0.5
    gb_seconds_per_request: 0.0625
    total_gb_seconds: 625_000
    free_tier_gb_seconds: 400_000
    billable_gb_seconds: 225_000
    cost_per_gb_second: 0.0000166667
    subtotal: 3.75
  total_monthly: 5.55

Output

Calculated monthly cost ≈ $5.55 with the monthly free tier applied (≈ $12.42 without it).

⚠ Production Trap:

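Never assume the free tier will cover you: the monthly allowance is shared by every function in the account. Set AWS Budgets alerts at $10 and $50. AWS has no built-in hard spending cap, so without alerts a Lambda bill stays silent until the statement arrives.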

🎯 Key Takeaway

Lambda pricing isn't complex — but ignoring GB-second interaction with memory tuning and data transfer will cost you real money at scale.

⚙️ Key Features — What Makes Lambda Worth the Headache

Lambda exists because nobody wants to manage servers. The core feature is automatic scaling: zero to thousands of concurrent executions in seconds, no provisioning, no load balancers. Each execution environment runs in its own isolated micro-VM containing only your code and dependencies, and handles one request at a time.

Event-driven execution is the architectural win. Lambda sits downstream of 200+ AWS services as a native event target. S3 object creation, DynamoDB streams, SNS, SQS, API Gateway — just drop a Lambda in the flow and you're done. No polling, no workers, no crons.

Built-in observability through CloudWatch logs, metrics, and traces. Every invocation gets a request ID, duration, memory used, and billing breakdown. You can catch failures, retry with backoff, and DLQ dead letters to SQS or SNS for reprocessing.

But don't mistake simplicity for power. Lambda is stateless by design: anything kept in memory or /tmp may survive into the next invocation on a warm environment, but you cannot rely on it. Any state that matters must live in external services. That's a feature, not a bug. It forces a stateless architecture that scales horizontally without thinking.

LambdaFeatureChecklist.yml (YAML)

# io.thecodeforge — devops tutorial

features:
  scaling: auto, concurrency up to 1000 per region (default)
  runtime: Node.js, Python, Java, Go, .NET, Ruby, custom runtime
  triggers: S3, DynamoDB, Kinesis, SQS, SNS, API Gateway, EventBridge
  max_execution: 15 minutes per invocation
  ephemeral_storage: 512 MB /tmp by default, configurable up to 10,240 MB
  network: VPC support via ENI attachments
  logging: CloudWatch Logs, X-Ray traces, metrics
  security: IAM roles per function, no shared credentials

Output

Lambda scales automatically — but you must understand per-account concurrency limits and VPC cold start penalties.

🔥Senior Shortcut:

Use Lambda for event-driven glue, not for long-running batch jobs. If your function runs >5 minutes regularly, you need Step Functions or Fargate.

🎯 Key Takeaway

Lambda's killer feature is event-driven auto-scaling with zero server management — but only if your workload fits stateless, short-lived execution.

Securing Your Account with IAM

Identity and Access Management (IAM) is the front door to your AWS account. Before writing a single Lambda function, you must understand why IAM matters: it prevents accidental data leaks, stops unauthorized cost spikes, and enforces least-privilege access. The principle is simple—every action your Lambda performs, from reading an S3 object to writing logs in CloudWatch, requires explicit permission. Start by creating dedicated IAM roles for each function rather than using a shared admin role. Attach AWS managed policies like AWSLambdaBasicExecutionRole initially, then scope down to custom inline policies that specify exact ARNs of resources your function touches. Use IAM Access Analyzer to validate your policy statements against actual usage. Avoid hardcoding credentials in environment variables; rely on the execution role's temporary credentials. Enable CloudTrail to audit all IAM actions, and rotate keys regularly for any human users. This discipline prevents the all-too-common production incident where a misconfigured policy exposes a database. Treat IAM as your first security line, not an afterthought.

iam-lambda-role.yml (YAML)

# io.thecodeforge — devops tutorial
# IAM role for Lambda with least privilege
Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: WriteLogs
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action: logs:CreateLogGroup
                Resource: !Sub arn:aws:logs:${AWS::Region}:${AWS::AccountId}:*
              - Effect: Allow
                Action:
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: !Sub arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*:*
        - PolicyName: ReadS3Data
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action: s3:GetObject
                Resource: !Sub arn:aws:s3:::my-secure-bucket/*

⚠ Production Trap:

Never attach the AdministratorAccess policy to a Lambda role. A single vulnerable function becomes an open door to your entire account.

🎯 Key Takeaway

Lock down each Lambda with a purpose-built IAM role that grants only the actions it needs.

Computing in AWS

Lambda is one compute option among many, and choosing it blindly leads to cost overruns or performance headaches. Understanding why is straightforward: compute models in AWS fall on a spectrum of control versus overhead. EC2 gives you full control over the OS, runtime, and scaling but requires managing patching, capacity planning, and failover. ECS/EKS removes the underlying server management while letting you control the container orchestration. Fargate abstracts away the infrastructure entirely but still gives you per-task pricing and no cold starts. Lambda sits at the extreme end—zero infrastructure management, automatic scaling from zero to thousands of concurrent executions, but with hard limits on execution time (15 minutes max), memory (10,240 MB), and storage (512 MB /tmp). The why: If your workload is bursty, event-driven, and short-lived, Lambda is ideal. If you need persistent connections, long-running processes, or predictable latency for user-facing APIs under load, consider Fargate or EC2 with an Auto Scaling Group. Many teams build a hybrid architecture—Lambda for glue logic and ingestion pipelines, ECS for the heavy-lifting backend.

fargate-vs-lambda.yml (YAML)

# io.thecodeforge — devops tutorial
# Fargate service for long-running compute
Resources:
  ComputeService:
    Type: AWS::ECS::Service
    Properties:
      ServiceName: heavy-processor
      TaskDefinition: !Ref ComputeTaskDefinition
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: DISABLED
          SecurityGroups:
            - sg-12345
          Subnets:
            - subnet-abcdef
      DesiredCount: 2
      DeploymentConfiguration:
        MinimumHealthyPercent: 50
        MaximumPercent: 200
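  ComputeTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: compute-td
      RequiresCompatibilities:
        - FARGATE
      NetworkMode: awsvpc   # required for Fargate tasks
      Cpu: 1024
      Memory: 2048
      # TaskExecutionRole is assumed to be defined elsewhere in the template
      ExecutionRoleArn: !GetAtt TaskExecutionRole.Arn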
      ContainerDefinitions:
        - Name: app-container
          Image: nginx:alpine
          Essential: true
          PortMappings:
            - ContainerPort: 80
              Protocol: tcp

🔥Why Fargate?

For APIs with sustained traffic > 100 req/s, Fargate avoids cold start variance and offers predictable sub-100ms latency.

🎯 Key Takeaway

Match your compute model to workload duration and latency needs—Lambda is not always the cheapest or fastest option.

● Production incident · POST-MORTEM · severity: high

The Cold Start P99 Spike That Killed Our API Response Times

Symptom

Every morning at 9 AM, the /orders API endpoint's p99 latency jumped from 200ms to over 1.2 seconds while p50 stayed below 200ms. Newly deployed functions also showed slow first requests.

Assumption

The team assumed Lambda automatically handled warm containers and that the latency was due to database queries. They tuned DynamoDB but saw no improvement.

Root cause

Lambda's default on-demand concurrency doesn't keep containers warm during idle periods. After a stretch of inactivity (commonly observed at around 15 minutes, though AWS does not document or guarantee a figure), containers are reclaimed. Morning traffic caused a burst of cold starts: the first request for each concurrent container had to spin up the Python runtime, import boto3 and Pillow, and initialise the S3 client. This added 800–1200ms to those first requests.

Fix

Implemented Provisioned Concurrency for the production alias with 10 pre-warmed containers. Also reduced deployment package size from 18 MB to 4 MB by separating Pillow into a Lambda Layer. Cold start duration dropped to ~200ms.

Key lesson

Measure p50 and p99 separately — if p99 is much higher than p50, cold starts or throttling are the likely cause.
Use Provisioned Concurrency for latency-sensitive endpoints, but only for the minimum number needed.
Minimise package size and externalise heavy dependencies to Lambda Layers.
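
To make the first lesson concrete, here is a minimal sketch of computing p50 and p99 separately from raw latency samples. The sample values are hypothetical (97 warm invocations and 3 cold starts), and the nearest-rank method is one common percentile definition, not necessarily the one your monitoring tool uses.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly warm, a few cold starts.
latencies_ms = [180] * 97 + [1150, 1200, 1250]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

# A large p99/p50 ratio is the signal to look at cold starts or throttling.
print(f"p50={p50}ms p99={p99}ms ratio={p99 / p50:.1f}x")
```

An average over the same samples would hide the problem, which is why the two percentiles are tracked separately.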

Production debug guide · Symptom → Action Grid · 5 entries

Symptom · 01

Function times out before completing

→

Fix

Check CloudWatch Logs for the 'Task timed out' line. If the work legitimately needs longer, increase the timeout in the function config (max 15 min). Use AWS X-Ray to identify the slowest downstream call.

Symptom · 02

High latency on first request after a period of inactivity

→

Fix

Enable Provisioned Concurrency, or keep a container warm with a scheduled EventBridge ping every 5 minutes (a single ping warms only one container). Check the 'Init Duration' field in the REPORT line of the Lambda logs to confirm that cold starts are the cause.
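
If you go the scheduled-ping route, the handler should recognise the ping and return early so it does no real work. The sketch below assumes the ping is a standard EventBridge scheduled event (which carries `source` set to `aws.events` and `detail-type` set to `Scheduled Event`); `process_order` is a hypothetical placeholder for the real business logic.

```python
def is_warmup_ping(event):
    """Return True for EventBridge scheduled events used as warm-up pings."""
    return (
        isinstance(event, dict)
        and event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def process_order(event):
    """Hypothetical placeholder for the real request handling."""
    return {"statusCode": 200, "order_id": event.get("order_id")}


def handler(event, context):
    if is_warmup_ping(event):
        # Return immediately: the goal is only to keep this container alive.
        return {"warmed": True}
    return process_order(event)
```

This keeps the ping cheap, but it does not help when several concurrent containers are needed at once.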

Symptom · 03

Function fails with out-of-memory error (Process exited before completing)

→

Fix

Increase memory allocation. Monitor 'Max Memory Used' in CloudWatch Logs. Use the open-source Lambda Power Tuning tool to find the optimal memory setting for cost and speed.

Symptom · 04

Throttling errors (429 'TooManyRequestsException')

→

Fix

Request a concurrency quota increase through Service Quotas or AWS Support. Buffer requests in SQS so bursts queue up instead of being rejected. Implement exponential backoff in the caller.
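
A minimal sketch of exponential backoff in the caller, using the "full jitter" variant (a random wait up to a capped exponential delay). `ThrottledError` is a hypothetical stand-in for whatever exception your client raises on a 429, and the default delays are illustrative assumptions, not recommended values.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 / TooManyRequestsException."""


def call_with_backoff(invoke, max_attempts=5, base_delay=0.2, max_delay=5.0, sleep=time.sleep):
    """Call invoke(), retrying on ThrottledError with capped exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
```

The `sleep` parameter is injectable so the retry logic can be tested without real waiting.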

Symptom · 05

Database connections exhausted under load

→

Fix

Move DB client initialisation outside the handler (reuse across invocations). Reduce connection pool size — Lambda works best with 1-5 connections per container. Switch to RDS Proxy if using RDS.

★ Lambda Debug Cheat Sheet

Quick commands and immediate fixes for the most common Lambda production issues.

Cold start latency spike

Immediate action

Check Init Duration in the most recent CloudWatch log REPORT line.

Commands

aws lambda get-function-configuration --function-name your-function

aws logs filter-log-events --log-group-name /aws/lambda/your-function --filter-pattern '"Init Duration"'

Fix now

Add Provisioned Concurrency: aws lambda put-provisioned-concurrency-config --function-name your-function --qualifier PROD --provisioned-concurrent-executions 5
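
The Init Duration check can also be scripted once you have the log lines. The sketch below parses a REPORT line into numeric fields; the sample line is hypothetical and only mimics the field names Lambda writes, so treat the regular expression as an assumption to verify against your own logs.

```python
import re

# Longer field names are listed before the bare "Duration" so each field matches in full.
REPORT_FIELD = re.compile(
    r"(Init Duration|Billed Duration|Max Memory Used|Memory Size|Duration): ([\d.]+) (?:ms|MB)"
)


def parse_report_line(line):
    """Return a dict of numeric REPORT fields, e.g. {'Duration': 102.25, ...}."""
    return {name: float(value) for name, value in REPORT_FIELD.findall(line)}


def is_cold_start(report):
    """Init Duration only appears on invocations that initialised a new environment."""
    return "Init Duration" in report


# Hypothetical sample line for illustration.
SAMPLE = (
    "REPORT RequestId: abc-123\tDuration: 102.25 ms\tBilled Duration: 103 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 98 MB\tInit Duration: 843.20 ms"
)

report = parse_report_line(SAMPLE)
print(is_cold_start(report), report.get("Init Duration"))
```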

Function throttling (429 TooManyRequests)

Out of memory (Process exited before completing)

Timeout (Task timed out after X seconds)

AWS Lambda vs EC2 vs Fargate

Aspect	AWS Lambda (Serverless)	EC2 Instance (Traditional)	AWS Fargate (Serverless Containers)
Billing model	Per 1ms of execution + invocation count	Per second (60-second minimum) while the instance runs, even when idle	Per second of vCPU and memory provisioned
Scaling	Automatic, up to 1,000 concurrent executions per Region by default	Manual or Auto Scaling Group (minutes to scale)	ECS Service Auto Scaling, configured per service
Max execution time	15 minutes per invocation	Unlimited: process runs indefinitely	Unlimited: tasks run until stopped
Cold start latency	100ms–1s for first request after idle period	None (process stays resident)	None per request once a task is running; new tasks take time to launch
State management	Stateless — no persistent memory between calls	Stateful — in-memory state survives between requests	Stateful by design (container runs persistently)
Long-running workloads	Not suitable (15 min cap)	Ideal — batch jobs, ML training, websockets	Ideal for long-running services and workers
Operational overhead	Near zero — AWS patches, scales, monitors	High — OS updates, capacity planning, monitoring setup	Low — no OS patching, but container management needed
Best for	Event-driven, spiky, short-duration tasks	Steady, high-throughput, stateful applications	Containerized apps that need to scale and are long-running

⚙ Quick Reference

12 commands from this guide

File	Command / Code	Purpose
image_resize_handler.py	from PIL import Image	How AWS Lambda Actually Executes Your Code
orders_api_handler.py	from datetime import datetime, timezone	Wiring Lambda to the Real World
serverless-function-config.yaml	AWSTemplateFormatVersion: '2010-09-09'	Cold Starts, Memory Tuning, and the Performance Levers You A
provisioned-concurrency-snippet.yaml	AutoPublishAlias: live	Provisioned Concurrency vs Cold Start
error_handling_logging.py	logging.basicConfig(level=logging.INFO, format='%(message)s')	Production Patterns
lambda-vs-alternatives.yaml	deployment_type:	When Lambda is the Wrong Tool
EventFlow.yml	AWSTemplateFormatVersion: '2010-09-09'	The Core Concepts: Serverless & Event-Driven
ScheduledCleanup.yml	AWSTemplateFormatVersion: '2010-09-09'	Use Cases That Won't Burn Your Budget
LambdaPricingExample.yml	pricing:	💰 Pricing: Pay-Per-Use
LambdaFeatureChecklist.yml	features:	⚙️ Key Features
iam-lambda-role.yml	Resources:	Securing Your Account with IAM
fargate-vs-lambda.yml	Resources:	Computing in AWS

Key takeaways

Cold starts are a real cost

put all initialisation (DB clients, SDK objects, config) outside your handler so Lambda reuses it across warm invocations. This alone can cut average latency by 40-200ms.

Memory is your CPU dial

increasing Lambda memory from 128 MB to 1024 MB allocates 8x the CPU. For compute-heavy functions this can reduce duration enough that the higher-memory run is cheaper per invocation.
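
A worked example of that trade-off, as a sketch. The per GB-second price is the one quoted earlier in this guide; the two durations are hypothetical and only illustrate the case where 8x the CPU cuts duration by more than 8x.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # duration price quoted in this guide


def duration_cost(memory_mb, duration_ms):
    """Duration charge for one invocation (excludes the per-request fee)."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND


# Hypothetical measurements for the same compute-heavy job.
small = duration_cost(128, 1600)   # 0.125 GB x 1.6 s  = 0.20 GB-seconds
large = duration_cost(1024, 150)   # 1.000 GB x 0.15 s = 0.15 GB-seconds

print(f"128 MB:  ${small:.10f} per invocation")
print(f"1024 MB: ${large:.10f} per invocation")
```

The break-even point is an 8x speed-up: if the larger setting shortens the run by less than that, the smaller setting stays cheaper per invocation even though it is slower.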

Lambda is stateless by design

never rely on in-memory state surviving between invocations. Use DynamoDB, ElastiCache, or S3 for any state that must persist. Any 'persistence' you observe in /tmp or module-level variables is a side effect of container reuse, not a guarantee.

Match the trigger to the job

API Gateway for synchronous HTTP APIs, SQS for reliable async processing with retry semantics, EventBridge for scheduled tasks and event routing. Picking the wrong trigger means fighting the tool instead of building your feature.

Always configure a dead-letter queue for async triggers

without it, failed events vanish after two retries. Monitor the DLQ's queue depth and alert on message arrival.

Common mistakes to avoid

5 patterns

Initialising DB connections inside the handler

Symptom

Every invocation opens a new connection, exhausting your RDS connection limit within minutes under load. The symptom is 'too many connections' errors that never show up locally but appear in production.

Fix

Move the connection client instantiation outside the handler function so Lambda reuses the same connection across warm invocations. Use connection pooling with a small pool size (e.g., max 5 connections per container).
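
A minimal sketch of the pattern, using lazy initialisation at module scope (a variant of "outside the handler" that opens the connection on first use). `sqlite3` is only a stand-in so the example runs anywhere; in a real function you would substitute your own database driver. The `connections_opened` counter exists purely for illustration.

```python
import sqlite3

_connection = None        # lives at module scope, so it survives warm invocations
connections_opened = 0    # illustration only: counts how often we actually connect


def get_connection():
    """Open the connection once per container and reuse it afterwards."""
    global _connection, connections_opened
    if _connection is None:
        _connection = sqlite3.connect(":memory:")  # stand-in for a real DB driver
        connections_opened += 1
    return _connection


def handler(event, context):
    conn = get_connection()
    row = conn.execute("SELECT 1").fetchone()
    return {"ok": row[0] == 1, "connections_opened": connections_opened}
```

Calling `handler` repeatedly in the same process reuses one connection, which mirrors what happens across warm invocations in a single container.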

Ignoring the /tmp storage limit (512 MB by default) and assuming a clean filesystem

Symptom

A previous call's temp files may still be there, causing 'file already exists' errors or stale data bugs. Lambda does NOT guarantee a clean /tmp directory between warm invocations.

Fix

Always generate unique filenames (use uuid or the request ID from context.aws_request_id) and clean up /tmp explicitly at the end of your handler. Never rely on /tmp being empty.
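
A sketch of that fix: the scratch file is named after `context.aws_request_id` and removed in a `finally` block. The `payload` event field and the `-upload.bin` suffix are hypothetical. `tempfile.gettempdir()` is used so the example also runs off Lambda; inside Lambda it normally resolves to /tmp.

```python
import os
import tempfile


def handler(event, context):
    # Unique per invocation, so leftovers from an earlier call can never collide.
    scratch_path = os.path.join(
        tempfile.gettempdir(), f"{context.aws_request_id}-upload.bin"
    )
    try:
        with open(scratch_path, "wb") as fh:
            fh.write(event.get("payload", "").encode("utf-8"))
        return {"bytes_written": os.path.getsize(scratch_path)}
    finally:
        # Clean up even when processing fails, so a warm container stays tidy.
        if os.path.exists(scratch_path):
            os.remove(scratch_path)
```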

Setting Lambda timeout lower than the slowest downstream dependency

Symptom

If your function calls an external API that sometimes takes 8 seconds and your timeout is 3 seconds, Lambda kills the invocation and the caller receives a generic error (a 5xx when the function sits behind API Gateway) with no useful message.

Fix

Set your Lambda timeout to at least 2x your expected worst-case downstream latency, implement retries with exponential backoff for transient failures, and use AWS X-Ray tracing to measure where time is actually spent.
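
One way to keep downstream calls inside the function's own deadline is to derive each client timeout from the time Lambda has left. The sketch below relies on `context.get_remaining_time_in_millis()`, which the Lambda context object provides; the safety margin and cap are hypothetical defaults to tune for your workload.

```python
def downstream_timeout_seconds(context, safety_margin_ms=500, cap_seconds=10.0):
    """Return a client timeout that leaves room to log and respond before Lambda's deadline."""
    remaining_ms = context.get_remaining_time_in_millis()
    budget = (remaining_ms - safety_margin_ms) / 1000
    if budget <= 0:
        # Fail fast with a clear error instead of being killed mid-call.
        raise TimeoutError("not enough time left to start a downstream call")
    return min(budget, cap_seconds)
```

The returned value can be passed as the `timeout` argument of the HTTP or database client you use, so a slow dependency raises an error you can log rather than triggering a silent 'Task timed out'.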

Using synchronous invocation for long-polling or cron tasks

Symptom

If a scheduler or script invokes a Lambda synchronously and the function runs longer than the caller is willing to wait, the caller times out while the function continues to run: you get no result, no error, and no retry. Remember that the function's own default timeout is only 3 seconds.

Fix

Let EventBridge trigger scheduled work, since it invokes Lambda asynchronously, instead of calling the function synchronously from your own scheduler. For custom callers, use the asynchronous invocation type (InvocationType 'Event'), or use Step Functions to orchestrate longer-running tasks.

Forgotten DLQ for async triggers

Symptom

An S3 event triggers a Lambda that fails every time (e.g., a malformed file). Lambda retries twice and then silently drops the event. You never know a file was not processed.

Fix

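Always configure a dead-letter queue (an SQS queue or SNS topic) or an on-failure destination on the function itself for asynchronous triggers such as S3. Event source mappings apply only to poll-based sources like SQS and streams. Monitor the DLQ for failed events and alert when messages appear.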

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

A Lambda function handles user logins and is experiencing high tail late...

Q02 · SENIOR

Explain the difference between synchronous and asynchronous Lambda invoc...

Q03 · SENIOR

Your team wants to use Lambda to process DynamoDB Stream events. A batch...

Q04 · SENIOR

What is Provisioned Concurrency and when would you use it? What are the ...

Q05 · SENIOR

How does Lambda's scaling work? What are the concurrency limits and how ...

Q01 of 05 · SENIOR

A Lambda function handles user logins and is experiencing high tail latency during morning traffic spikes. The p99 latency is 1.2 seconds but the p50 is 180ms. What's likely causing this and how would you fix it?

ANSWER

This pattern points to cold starts or throttling. The low p50 shows that warm invocations are fast; the p99 spike means some requests hit cold starts (new containers) or are throttled. First, check Init Duration in the REPORT lines in CloudWatch Logs and the Throttles metric in CloudWatch. Likely cause: after an idle period with no traffic, all containers are reclaimed, so the first requests of the morning each pay for a cold start. Fixes: (1) Use Provisioned Concurrency to keep a few containers warm. (2) Minimise package size and use Lambda Layers to reduce init time. (3) Use a warmer: an EventBridge scheduled event every 5 minutes pre-warms a container (less reliable than Provisioned Concurrency).

FAQ · 5 QUESTIONS

Frequently Asked Questions

How much does AWS Lambda cost in production?

What is a Lambda cold start and can it be completely eliminated?

Can Lambda handle long-running background jobs?

How do I debug a Lambda function that's timing out?

Should I use container images or .zip packages for Lambda?

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Everything here is grounded in real deployments.

Last updated: July 27, 2026
