Mid-level 12 min · March 06, 2026

AWS Lambda Cold Starts — Why P99 Spikes to 1.2s at 9 AM

Lambda cold starts added 800-1200ms to our /orders API every morning.

Naren · Founder
Plain-English first. Then code. Then the interview question.
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • AWS Lambda runs your code on demand without provisioning or managing servers
  • Three core components: Functions (your code), Triggers (event sources), Execution Environment (isolated container)
  • Cold starts add 100ms–1s latency when a new container spins up
  • Performance insight: More memory = more CPU; tuning memory can reduce both cost and duration for compute-heavy tasks
  • Production insight: Lambda bills only for the time your code actually runs, but a function stuck on a slow downstream call is billed right up to its timeout, so always set timeouts realistically
  • Biggest mistake: Assuming /tmp is clean between invocations. It persists across warm starts and can cause silent data corruption
Plain-English First

Imagine you own a pizza shop but you only pay the chef when someone actually orders a pizza. The chef doesn't sit around waiting — they appear the moment an order comes in, make the pizza, then disappear. AWS Lambda is exactly that chef. You write a function, AWS runs it only when something triggers it, and you pay only for the milliseconds it runs. No server to babysit, no idle hours billed, no infrastructure to patch.

Every application needs compute power — something has to run your code. Traditionally, that meant renting a virtual machine or physical server that runs 24/7, even at 3 a.m. when zero users are online. You're paying for potential, not actual work. As cloud adoption exploded, this idle-cost problem became impossible to ignore, especially for startups and teams with unpredictable traffic spikes.

AWS Lambda, launched in 2014, flipped the model. Instead of managing servers, you upload a function — a single, focused piece of logic — and AWS handles everything else: provisioning, scaling, patching, and availability. The term 'serverless' doesn't mean there are no servers; it means YOU don't manage them. The servers exist, they're just Amazon's problem. This lets your team focus entirely on business logic instead of infrastructure operations.

By the end of this article you'll understand how Lambda executes code, how to wire it to real-world triggers like API Gateway and S3, how to avoid the cold start trap that kills performance, and how to structure a production-worthy serverless workflow. You'll also know exactly when Lambda is the right tool — and when it absolutely isn't.

How AWS Lambda Actually Executes Your Code — The Execution Model

Lambda's execution model is the foundation everything else builds on. When a trigger fires — say, an HTTP request hits API Gateway — Lambda needs to run your function. If a pre-warmed container exists from a recent invocation, Lambda reuses it. This is a 'warm start' and it's fast. If no container is available, Lambda has to bootstrap one from scratch: download your code package, spin up a runtime environment, run any initialisation code outside your handler, then finally invoke your handler. That bootstrap phase is the dreaded cold start.

Cold starts typically add 100ms–1000ms of latency depending on the runtime (.NET and Java are heavier; Node.js and Python are lighter). For a background job this is irrelevant. For a user-facing API call, it's noticeable.

Your handler function receives two objects: the event (the payload that triggered the invocation — could be an HTTP body, an S3 event, a queue message) and the context (metadata about the invocation itself — function name, memory limit, request ID). Understanding this distinction is critical: the event is about WHAT happened, the context is about WHO is running.
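Here is a minimal sketch of that split, using only the standard library. The handler reads payload fields only from `event` and invocation metadata only from `context`. `FakeContext` is a hypothetical stand-in so the handler can run locally. The attribute and method names it mimics (`function_name`, `memory_limit_in_mb`, `aws_request_id`, `get_remaining_time_in_millis()`) are the ones the Python Lambda context object provides.

```python
import json


def handler(event, context):
    """Report WHAT happened (event) separately from WHO is running it (context)."""
    return {
        # WHAT: fields supplied by the trigger's payload
        "what": {"keys": sorted(event.keys())},
        # WHO: metadata about this invocation and its execution environment
        "who": {
            "function": context.function_name,
            "memory_mb": int(context.memory_limit_in_mb),  # int() accepts str or int
            "request_id": context.aws_request_id,
            "ms_left": context.get_remaining_time_in_millis(),
        },
    }


class FakeContext:
    """Hypothetical local stand-in for the Lambda context object."""

    function_name = "image-resize-handler"
    memory_limit_in_mb = 256
    aws_request_id = "local-test-request"

    def get_remaining_time_in_millis(self):
        return 3000


if __name__ == "__main__":
    print(json.dumps(handler({"Records": []}, FakeContext()), indent=2))
```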

Code outside the handler runs once per container lifecycle. That's where you put database connections, SDK clients, and config loading — doing it inside the handler means re-initialising on every single invocation, which is both slow and wasteful.
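Module-level code runs only when a new environment is created, so a module-level flag is a common way to tell cold and warm invocations apart in your own logs. This is a sketch of that pattern, not an AWS API. `_cold_start` and `INIT_STARTED_AT` are hypothetical names.

```python
import time

# Module scope: runs once per execution environment, i.e. during the cold start
INIT_STARTED_AT = time.monotonic()
_cold_start = True  # flipped to False after the first invocation in this environment


def handler(event, context):
    global _cold_start
    was_cold = _cold_start
    _cold_start = False  # every later call served by this environment is warm
    return {
        "cold_start": was_cold,
        "seconds_since_init": round(time.monotonic() - INIT_STARTED_AT, 3),
    }
```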

image_resize_handler.pyPYTHON
import io
import json
import os
from urllib.parse import unquote_plus

import boto3
from PIL import Image

# ✅ Initialise the S3 client OUTSIDE the handler.
# This runs once when the container boots (cold start),
# then gets reused across all warm invocations, so warm calls skip client setup.
s3_client = boto3.client('s3')

# Read configuration once at init; a missing variable fails fast on the cold start
DESTINATION_BUCKET = os.environ['THUMBNAIL_BUCKET']

# Target width for all resized thumbnails
THUMBNAIL_WIDTH = 200


def handler(event, context):
    """
    Triggered by an S3 PUT event whenever a new image is uploaded
    to the 'uploads-raw' bucket. Resizes it and saves a thumbnail
    to the bucket named by THUMBNAIL_BUCKET (e.g. 'uploads-thumbnails').
    """

    # The event payload from S3 contains a list of records;
    # each record represents one file upload.
    for record in event['Records']:
        source_bucket = record['s3']['bucket']['name']
        # S3 URL-encodes keys in events ('my photo.jpg' arrives as 'my+photo.jpg')
        object_key    = unquote_plus(record['s3']['object']['key'])  # e.g. 'photos/sunset.jpg'

        print(f"Processing: s3://{source_bucket}/{object_key}")

        # Download the original image bytes into memory (no temp file needed)
        response      = s3_client.get_object(Bucket=source_bucket, Key=object_key)
        image_bytes   = response['Body'].read()

        # Open image with Pillow and calculate proportional height
        original_img  = Image.open(io.BytesIO(image_bytes))
        original_w, original_h = original_img.size
        ratio         = THUMBNAIL_WIDTH / original_w
        new_height    = max(1, int(original_h * ratio))  # never round down to 0 px

        # JPEG has no alpha channel, so convert RGBA/palette images (e.g. PNGs) to RGB
        thumbnail     = original_img.convert('RGB').resize((THUMBNAIL_WIDTH, new_height))

        # Save to an in-memory buffer, so nothing is left behind in /tmp
        output_buffer = io.BytesIO()
        thumbnail.save(output_buffer, format='JPEG', quality=85)
        output_buffer.seek(0)  # Rewind buffer to the start before uploading

        # Write thumbnail to the destination bucket under the same key name
        s3_client.put_object(
            Bucket      = DESTINATION_BUCKET,
            Key         = object_key,
            Body        = output_buffer,
            ContentType = 'image/jpeg'
        )

        print(f"Thumbnail saved: s3://{DESTINATION_BUCKET}/{object_key} ({THUMBNAIL_WIDTH}x{new_height})")

    # Lambda expects a return value when invoked synchronously (e.g. via API Gateway).
    # For async triggers like S3, the return value is ignored, but it's good practice.
    return {
        'statusCode': 200,
        'body': json.dumps({'processed': len(event['Records'])})
    }
Output
START RequestId: 7f3a1c2b-... Version: $LATEST
Processing: s3://uploads-raw/photos/sunset.jpg
Thumbnail saved: s3://uploads-thumbnails/photos/sunset.jpg (200x133)
END RequestId: 7f3a1c2b-...
REPORT RequestId: 7f3a1c2b-... Duration: 312.45 ms Billed Duration: 313 ms Memory Size: 256 MB Max Memory Used: 89 MB
Pro Tip: The Container Reuse Rule
Anything initialised outside your handler (DB connections, SDK clients, parsed config) is cached for the lifetime of the container — potentially minutes or hours. This is a feature, not a bug. Put expensive initialisation there. But never assume a clean slate between invocations: a previous call's temp files or in-memory state might still exist. Always write to /tmp explicitly and defensively.
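One defensive pattern suits invocations that only need scratch space for their own lifetime: create a uniquely named directory per invocation with Python's `tempfile` module and let it delete itself. The sketch below assumes that case. The `payload` event field is hypothetical.

```python
import os
import tempfile


def handler(event, context):
    # A fresh, uniquely named directory for this invocation only.
    # tempfile's default location is normally /tmp on Lambda.
    with tempfile.TemporaryDirectory(prefix="invocation-") as work_dir:
        scratch_path = os.path.join(work_dir, "scratch.bin")
        with open(scratch_path, "wb") as scratch:
            scratch.write(event.get("payload", "").encode("utf-8"))
        size = os.path.getsize(scratch_path)
    # Leaving the with-block deletes the directory and its contents, so the
    # next warm invocation in this environment cannot read leftovers.
    return {"bytes_written": size, "cleaned_up": not os.path.exists(work_dir)}
```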
Production Insight
Cold starts burn the most time on first invocation after idle periods.
Use the Init Duration field in CloudWatch logs to measure it.
If your init duration exceeds 200ms, consider reducing package size or switching runtimes.
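Lambda appends Init Duration to the REPORT line only when an invocation triggered a cold start, so a missing field marks a warm start. The small parser below assumes you have already exported REPORT lines from CloudWatch Logs as plain strings. The 200 ms threshold mirrors the rule of thumb above, and the cold-start sample line in the check is invented for illustration.

```python
import re

# Lambda only appends "Init Duration" to the REPORT line on a cold start
INIT_DURATION = re.compile(r"Init Duration:\s*([\d.]+)\s*ms")
SLOW_INIT_MS = 200.0  # threshold from the rule of thumb above; tune per workload


def init_duration_ms(report_line):
    """Return the cold-start init time in ms, or None for a warm start."""
    match = INIT_DURATION.search(report_line)
    return float(match.group(1)) if match else None


def slow_cold_starts(report_lines, threshold_ms=SLOW_INIT_MS):
    """Collect init durations above the threshold from a batch of REPORT lines."""
    durations = (init_duration_ms(line) for line in report_lines)
    return [d for d in durations if d is not None and d > threshold_ms]
```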
Key Takeaway
Initialise everything outside the handler.
Measure Init Duration to quantify cold start.
Use Provisioned Concurrency only for latency-critical endpoints.
Cold Start Mitigation Decision
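If: Function is latency-sensitive (user-facing API)
Use: Provisioned Concurrency for pre-warmed containers.
If: Function runs less than once per 15 minutes
Use: Expect most invocations to be cold, but few requests are affected. Provisioned Concurrency rarely pays off here, so trim package size and init code instead.
If: Deployment package > 10 MB with heavy deps
Use: Remove unused dependencies first. Moving the rest into a Lambda Layer keeps deploys small and reusable, but layer contents still load at init, so a layer alone does not shorten the cold start.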

Wiring Lambda to the Real World — Triggers, Events, and API Gateway

A Lambda function sitting alone does nothing. It needs a trigger — an AWS service that says 'hey, something happened, go run'. The trigger determines the shape of the event object your handler receives, which is why reading the AWS event schema docs for each trigger type matters.

The most common triggers in production are: API Gateway (HTTP requests), S3 (file uploads/deletions), SQS (queue messages for async processing), EventBridge (scheduled cron jobs and event routing), DynamoDB Streams (react to database changes), and SNS (fan-out notifications).
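Each trigger delivers a differently shaped event, so a shared handler often needs to tell them apart. The helper below is a hypothetical, best-effort sketch. It relies on well-known top-level fields: `Records[].eventSource`, SNS's capitalised `EventSource`, `httpMethod` for REST API proxy events, `version: "2.0"` for HTTP APIs and function URLs, and `detail-type` for EventBridge. Check each trigger's event schema docs before relying on it.

```python
SOURCE_BY_EVENT_SOURCE = {
    "aws:s3": "s3",
    "aws:sqs": "sqs",
    "aws:sns": "sns",
    "aws:dynamodb": "dynamodb-streams",
    "aws:kinesis": "kinesis",
}


def detect_trigger(event):
    """Best-effort guess at which service produced a Lambda event (hypothetical helper)."""
    records = event.get("Records")
    if isinstance(records, list) and records:
        first = records[0]
        # SNS records use "EventSource"; S3, SQS, DynamoDB and Kinesis use "eventSource"
        source = first.get("eventSource") or first.get("EventSource")
        return SOURCE_BY_EVENT_SOURCE.get(source, "unknown-records")
    if "httpMethod" in event:
        return "api-gateway-rest"  # REST API proxy event (payload format 1.0)
    if event.get("version") == "2.0" and "requestContext" in event:
        return "http-api-or-function-url"  # payload format 2.0
    if "detail-type" in event:
        return "eventbridge"
    return "unknown"
```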

API Gateway is the one you'll use for building REST APIs or webhooks. When a request hits your endpoint, API Gateway wraps it into a structured event object and hands it to Lambda. Your function returns a response object with a statusCode, headers, and body, and API Gateway translates that back into a real HTTP response.

The Lambda Proxy Integration model (the default and recommended approach) passes the raw request to your function and expects you to construct the full HTTP response yourself. This gives you complete control over status codes, CORS headers, and response bodies. Older tutorials show Lambda custom integrations — avoid them, they're fiddly and add complexity for no gain.

For async workloads, SQS is your best friend. Rather than calling Lambda directly (which creates tight coupling), push messages to a queue and let Lambda poll and process them in batches. This naturally handles traffic bursts without rate-limit errors.
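With SQS as the event source, one failing message should not force the whole batch to be retried. The sketch below shows the partial batch response pattern and assumes `ReportBatchItemFailures` is enabled on the event source mapping. `process` is a hypothetical stand-in for your business logic.

```python
import json


def process(order):
    """Hypothetical business logic; raise to mark the message as failed."""
    if "order_id" not in order:
        raise ValueError("order_id missing")


def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed; the rest of the batch is deleted
            failures.append({"itemIdentifier": record["messageId"]})
    # Requires ReportBatchItemFailures on the event source mapping. Without it,
    # a successful return counts as success for the whole batch, so the failed
    # messages above would be deleted instead of retried.
    return {"batchItemFailures": failures}
```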

orders_api_handler.pyPYTHON
import json
import os
import uuid
from datetime import datetime, timezone
from decimal import Decimal

import boto3

# DynamoDB resource initialised at cold-start, reused on warm invocations
dynamodb     = boto3.resource('dynamodb')
orders_table = dynamodb.Table(os.environ['ORDERS_TABLE_NAME'])


def handler(event, context):
    """
    Handles POST /orders from API Gateway (Lambda Proxy Integration).
    Creates a new order record in DynamoDB and returns the order ID.

    API Gateway event shape (key fields):
      event['httpMethod']         -> 'POST'
      event['path']               -> '/orders'
      event['body']               -> Raw JSON string of the request body
      event['requestContext']     -> Metadata including caller identity
    """

    http_method = event.get('httpMethod', '')

    # Browser CORS preflight: the SAM template routes OPTIONS here too,
    # so answer it with the CORS headers instead of a 405
    if http_method == 'OPTIONS':
        return _build_response(204, None)

    # Route guard: this function only handles order creation
    if http_method != 'POST':
        return _build_response(405, {'error': f'Method {http_method} not allowed'})

    # API Gateway sends the body as a raw string, so we must parse it.
    # parse_float=Decimal because boto3 rejects Python floats (e.g. item prices).
    try:
        request_body = json.loads(event.get('body') or '{}', parse_float=Decimal)
    except json.JSONDecodeError:
        return _build_response(400, {'error': 'Request body must be valid JSON'})

    if not isinstance(request_body, dict):
        return _build_response(400, {'error': 'Request body must be a JSON object'})

    # Validate required fields before touching the database
    required_fields = ['customer_id', 'items', 'total_amount']
    missing_fields  = [f for f in required_fields if f not in request_body]
    if missing_fields:
        return _build_response(400, {'error': f'Missing required fields: {missing_fields}'})

    # Build the order record
    order_id    = str(uuid.uuid4())  # Unique ID for this order
    created_at  = datetime.now(timezone.utc).isoformat()  # ISO 8601, always UTC

    order_record = {
        'order_id':      order_id,
        'customer_id':   request_body['customer_id'],
        'items':         request_body['items'],
        'total_amount':  str(request_body['total_amount']),  # Stored as a string; boto3 rejects Python floats
        'status':        'PENDING',
        'created_at':    created_at
    }

    # Write to DynamoDB: put_item overwrites if the key already exists,
    # so ConditionExpression ensures we never silently stomp an existing order
    orders_table.put_item(
        Item=order_record,
        ConditionExpression='attribute_not_exists(order_id)'
    )

    print(f"Order created: {order_id} for customer {request_body['customer_id']}")

    return _build_response(201, {
        'order_id':   order_id,
        'status':     'PENDING',
        'created_at': created_at
    })


def _build_response(status_code, body_dict):
    """
    Constructs the response object API Gateway expects.
    CORS headers are included so browser-based clients can call this API.
    Without these headers, browsers block the response from reaching your JavaScript.
    """
    return {
        'statusCode': status_code,
        'headers': {
            'Content-Type':                 'application/json',
            'Access-Control-Allow-Origin':  '*',  # Tighten to your domain in production
            'Access-Control-Allow-Methods': 'POST,OPTIONS',
            'Access-Control-Allow-Headers': 'Content-Type'
        },
        'body': json.dumps(body_dict) if body_dict is not None else ''
    }
Output
START RequestId: 9a2e4f7d-... Version: $LATEST
Order created: 3c8b1a2f-4e9d-4f3a-b1c2-7d8e9f0a1b2c for customer cust_00847
END RequestId: 9a2e4f7d-...
REPORT RequestId: 9a2e4f7d-... Duration: 187.22 ms Billed Duration: 188 ms Memory Size: 128 MB Max Memory Used: 54 MB
# HTTP Response seen by the client:
# Status: 201 Created
# Body: {"order_id": "3c8b1a2f-4e9d-4f3a-b1c2-7d8e9f0a1b2c", "status": "PENDING", "created_at": "2024-03-15T14:22:01.483921+00:00"}
Watch Out: The Missing CORS Headers Trap
If your Lambda-backed API works perfectly in Postman but fails in the browser with a CORS error, check two things: the Access-Control-Allow-Origin header in your Lambda response, and the OPTIONS preflight. You need both. Lambda sets the headers on real requests, but preflight OPTIONS requests must also get a CORS response. That response can come from API Gateway itself (enable CORS on the resource in the API Gateway console or via CloudFormation). It can also come from a function you route OPTIONS to, as long as that function returns the CORS headers instead of an error.
Production Insight
SQS-based async invocation removes tight coupling and handles bursts gracefully.
If your function is idempotent, batching lets one invocation process up to 10 messages by default (standard queues allow up to 10,000 with a batching window).
Always set a dead-letter queue for failed messages. Without one, a failing message is retried until the queue's retention period expires and is then deleted.
Key Takeaway
Pick the trigger that fits the job.
API Gateway for sync calls, SQS for async reliability.
Prefer SNS → SQS → Lambda over SNS → Lambda when you need durable retries.
Choosing the Right Trigger Pattern
If: Need a RESTful HTTP API
Use: API Gateway with Proxy Integration.
If: Need reliable async processing with retry
Use: SQS with Lambda as an event source mapping.
If: React to file uploads or changes in a bucket
Use: S3 event notifications sent directly to Lambda.
If: Need scheduled execution or cron
Use: An EventBridge (formerly CloudWatch Events) rule as the trigger.

Lambda Event Source Reference Table — What Triggers Your Function

The following table catalogs the most common Lambda event sources, their invocation model, payload size limits, retry behavior, and best-fit use cases. Knowing these details helps you design reliable, cost-efficient serverless workflows. For each source, the event structure is fixed by AWS — you cannot change the schema — so you must parse the documented fields correctly in your handler.

| Event source | Invocation type | Max payload | Retry behavior | Best for |
|---|---|---|---|---|
| API Gateway | Synchronous | 6 MB request / 6 MB response (Lambda's synchronous limit; API Gateway itself allows 10 MB) | No automatic retries; client handles | HTTP/REST APIs, webhooks |
| S3 (event notifications) | Asynchronous | 128 KB (event record) | 2 retries (async) | File processing (image resize, logs, analytics) |
| DynamoDB Streams | Stream-based | 1 MB (batch) | Retries until the record expires (24 h) | React to DB changes (materialized views, sync) |
| Kinesis Data Streams | Stream-based | 1 MB (per record) | Retries until the record expires (24 h retention by default, extendable up to 365 days) | Real-time data processing (clickstreams, logs) |
| SQS (Standard) | Poll-based (event source mapping) | 256 KB per message | Retries based on redrive policy | Async decoupling, buffering, batch processing |
| SQS (FIFO) | Poll-based (event source mapping) | 256 KB per message | Retries with exactly-once semantics | Ordered processing, deduplication |
| SNS (topic subscription) | Asynchronous | 256 KB | 2 retries (async) | Fan-out notifications to multiple subscribers |
| EventBridge (scheduled or event) | Asynchronous | 256 KB | 2 retries (async) | Cron jobs, event routing between AWS services |
| CloudFront (Lambda@Edge) | Synchronous | 1 MB | No automatic retries | Modify HTTP request/response at edge |
| Lambda Function URL | Synchronous | 6 MB (request/response) | No automatic retries | Simple HTTP endpoints without API Gateway |

Key details to remember:
  • Asynchronous invocations (S3, SNS, EventBridge) are retried twice, after roughly 1 and then 2 minutes. Always configure a dead-letter queue (DLQ) or an on-failure destination for these triggers.
  • Stream-based triggers (DynamoDB, Kinesis) retry until the data record expires, so a persistent bug blocks the entire shard. Use BisectBatchOnFunctionError to split failing batches and isolate the bad record.
  • Synchronous triggers (API Gateway, Lambda Function URL) do not retry; your client or upstream service must implement retry logic.
  • Payload size limits are hard: an oversized event is rejected rather than trimmed to fit, so pass large data by reference (for example, an S3 object key) rather than inline.

For a full list of event sources and their exact event schemas, refer to the [AWS Lambda Developer Guide — Using AWS Lambda with other services](https://docs.aws.amazon.com/lambda/latest/dg/lambda-services.html).

Quick Reference: Async vs Sync Invocation
Async invocation returns '202 Accepted' immediately and runs the function in the background. Sync invocation waits for the function to complete and returns the result. If you need the response, use sync. If you don't need the response and want decoupling, use async. Stream-based triggers are a hybrid — they poll the stream and invoke your function synchronously with batches of records.
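From code, the difference comes down to the `InvocationType` argument of the Lambda `Invoke` API. `RequestResponse` waits for the result, while `Event` queues the payload and returns HTTP 202. The sketch below uses boto3's `invoke` call. The client is passed in (e.g. `boto3.client("lambda")`) so the function can be exercised with a stub, and `invoke_function` is a hypothetical helper name.

```python
import json


def invoke_function(lambda_client, function_name, payload, wait_for_result=True):
    """Invoke a Lambda function synchronously or asynchronously via boto3."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse" if wait_for_result else "Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    if not wait_for_result:
        # Async: Lambda only confirms the event was queued (StatusCode 202)
        return {"status": response["StatusCode"], "result": None}
    # Sync: the handler's return value comes back in the Payload stream
    result = json.loads(response["Payload"].read())
    if "FunctionError" in response:
        # A handler exception still yields StatusCode 200; this field flags it
        raise RuntimeError(f"{function_name} failed: {result}")
    return {"status": response["StatusCode"], "result": result}


# Usage (requires AWS credentials):
#   import boto3
#   invoke_function(boto3.client("lambda"), "orders-api-handler", {"httpMethod": "GET"})
```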
Production Insight
Payload size limits are a hidden trap. Events do not stretch to fit your data: synchronous invocations cap requests and responses at 6 MB each, asynchronous ones at far less, and anything over the limit fails outright instead of being trimmed. Keep large data in S3 and pass a reference. Always validate that the event contains the records and fields you expect before processing.
Key Takeaway
Match the event source to your reliability requirements.
Async sources need a DLQ.
Stream sources need idempotent handlers.
Synchronous sources need client-side retry.

Cold Starts, Memory Tuning, and the Performance Levers You Actually Control

Lambda gives you one direct performance dial: memory. You set it anywhere from 128 MB to 10,240 MB. What most developers don't realise is that CPU allocation scales proportionally with memory. A 1,024 MB Lambda function gets roughly 8x the CPU of a 128 MB one. If your function is CPU-bound (image processing, data transformation, encryption), doubling the memory can halve the execution time — and since you pay for duration × memory, the cost often stays the same or even drops.
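To see why more memory can cost the same, work through the GB-second arithmetic. The sketch assumes the x86 on-demand rate of $0.0000166667 per GB-second. Its durations are hypothetical: a CPU-bound job that speeds up in proportion to its extra CPU. That is plausible below roughly 1,769 MB, where a function gets less than one full vCPU, so even single-threaded code benefits.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86 on-demand rate; check your region


def gb_seconds(memory_mb, duration_ms):
    """Billed compute for one invocation: memory in GB times duration in seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def compute_cost(memory_mb, duration_ms, invocations):
    """Duration charge only; the flat per-request charge is the same either way."""
    return gb_seconds(memory_mb, duration_ms) * PRICE_PER_GB_SECOND * invocations


# Hypothetical CPU-bound job: 8x the memory (and CPU) runs 8x faster
for memory_mb, duration_ms in [(128, 800), (1024, 100)]:
    cost = compute_cost(memory_mb, duration_ms, 1_000_000)
    print(f"{memory_mb:>5} MB, {duration_ms:>3} ms: ${cost:.2f} per million invocations")
```

Both configurations consume 0.1 GB-seconds per call, so the faster one is free latency. If the speed-up is less than proportional, the larger setting costs more, which is exactly what Power Tuning measures.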

Cold starts are the other major lever. Three strategies exist: Provisioned Concurrency, keeping functions warm with scheduled EventBridge pings, and minimising package size.

Provisioned Concurrency is the only AWS-native option that removes cold starts outright. You pay for a set number of pre-warmed containers to stay alive at all times. It costs more than on-demand but eliminates cold starts entirely for that concurrency slot. Use it for customer-facing APIs where tail latency matters.

Package size matters because Lambda has to download your deployment package before running it. A 50 MB Python package with unnecessary dependencies cold-starts noticeably slower than a 3 MB lean package. Use Lambda Layers to separate large dependencies (like numpy or Pillow) from your application code, and use .zip deployment packages rather than container images unless you specifically need Docker tooling.

Finally, watch your timeout setting. The default is 3 seconds. Downstream API calls, DB queries, and S3 operations can easily exceed this. Set it realistically (15 minutes max) and always handle partial failures gracefully.
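A long job does not have to hit the timeout and lose its work: the handler can check its remaining budget through `context.get_remaining_time_in_millis()` and stop early. The sketch below assumes the event carries a list under a hypothetical `items` key and that a 2-second safety margin is enough to return cleanly.

```python
SAFETY_MARGIN_MS = 2000  # hypothetical buffer for returning before the hard stop


def handler(event, context):
    processed, remaining = [], list(event.get("items", []))
    while remaining:
        # Stop while there is still time to return a clean, partial result
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        item = remaining.pop(0)
        processed.append(item)  # stand-in for real per-item work
    # Returning the unprocessed items lets a caller (or a Step Functions loop)
    # resume, instead of losing everything to a timeout
    return {"processed": len(processed), "unprocessed": remaining}
```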

serverless-function-config.yamlYAML
# AWS SAM (Serverless Application Model) template — the standard way to
# define Lambda functions as Infrastructure-as-Code.
# Run: sam build && sam deploy --guided

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Description: Orders API — production-grade Lambda configuration

Globals:
  Function:
    # Runtime for all functions in this template unless overridden
    Runtime: python3.12
    # Timeout generous enough for DynamoDB + downstream calls, not infinite
    Timeout: 30
    # Environment variables available to all functions
    Environment:
      Variables:
        LOG_LEVEL:          INFO
        ORDERS_TABLE_NAME:  !Ref OrdersTable

Resources:

  OrdersApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: orders-api-handler
      CodeUri:       src/
      Handler:       orders_api_handler.handler

      # 512 MB gives ~4x CPU vs 128 MB — worth it for JSON parsing + DynamoDB calls
      # Run AWS Lambda Power Tuning tool to find YOUR optimal memory setting
      MemorySize: 512

      # Provisioned concurrency: 5 containers always warm for the production alias
      # This eliminates cold starts for the first 5 concurrent requests
      # Cost (us-east-1, x86): ~$0.0000041667 per GB-second × 0.5 GB × 5 environments, billed around the clock
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5

      # IAM permissions — principle of least privilege
      # Only grant what this specific function actually needs
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref OrdersTable

      # API Gateway trigger — Lambda Proxy Integration (recommended)
      Events:
        CreateOrder:
          Type:  Api
          Properties:
            Path:   /orders
            Method: POST

        # OPTIONS method needed for browser CORS preflight requests
        CreateOrderOptions:
          Type:  Api
          Properties:
            Path:   /orders
            Method: OPTIONS

  OrdersTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName:    orders
      BillingMode:  PAY_PER_REQUEST  # Serverless billing — no provisioned capacity to manage
      AttributeDefinitions:
        - AttributeName: order_id
          AttributeType: S
      KeySchema:
        - AttributeName: order_id
          KeyType:        HASH

      # Point-in-time recovery — always enable for production data
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true

Outputs:
  OrdersApiEndpoint:
    Description: "API Gateway endpoint for the Orders API"
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/orders"
Output
# sam build output:
Building codeuri: src/ runtime: python3.12
Build Succeeded
Built Artifacts: .aws-sam/build
# sam deploy output (key lines):
Deploying with following values
Stack name : orders-api-stack
Region : eu-west-1
Confirm changes: Yes
CloudFormation events:
CREATE_COMPLETE AWS::DynamoDB::Table OrdersTable
CREATE_COMPLETE AWS::Lambda::Function OrdersApiFunction
CREATE_COMPLETE AWS::ApiGateway::RestApi ServerlessRestApi
Outputs:
OrdersApiEndpoint: https://x7k2mn3p4q.execute-api.eu-west-1.amazonaws.com/Prod/orders
Successfully created/updated stack - orders-api-stack in eu-west-1
Interview Gold: Lambda Power Tuning
AWS Lambda Power Tuning is an open-source Step Functions state machine (github.com/alexcasalboni/aws-lambda-power-tuning). It runs your function at each memory setting you list, anywhere from 128 MB to 10 GB, measures cost and duration, and plots the optimal setting. Mentioning this tool in an interview signals you understand production cost optimisation, not just configuration.
Production Insight
Memory is the only direct compute dial. Use Power Tuning to find the sweet spot.
Provisioned Concurrency eliminates cold starts but adds cost.
Container images are slower to deploy than .zip; prefer .zip unless you need large binaries.
Key Takeaway
Memory scales CPU, not just RAM.
Run Power Tuning once per function.
Prefer .zip over container images for faster cold starts.
Memory & Cold Start Trade-off Decision
If: Function is CPU-bound and cost-sensitive
Use: Run Power Tuning. 1,024 MB can end up cheaper than 256 MB because it finishes faster.
If: p99 latency must stay under 300ms
Use: Provisioned Concurrency. Do not rely on ping warming.

Provisioned Concurrency vs Cold Start — Visual Breakdown

Provisioned Concurrency is the only AWS-native mechanism that guarantees zero cold starts for a fixed number of concurrent invocations. The diagram below contrasts the request flow for an on-demand function (which may incur a cold start) versus a function with Provisioned Concurrency.

How it works: When you enable Provisioned Concurrency, Lambda pre-initialises a specified number of execution environments and keeps them warm. Incoming invocations are routed to these warm environments instantly. On-demand environments are still used for invocations beyond the provisioned count, so cold starts still occur when the provisioned pool is exhausted. The visual logic flow:

  • On-Demand Path: Request arrives → check for warm container → if none found → cold start (init + handler delay).
  • Provisioned Concurrency Path: Request arrives → route to pre-warmed container → warm start (handler only, no init delay).

The benefit is a 100% elimination of cold start latency for the initial set of concurrent requests. The cost is paying for those environments even when idle.

When to use it: Only for latency-critical production endpoints where p99 must stay below, say, 500ms. For batch processing or background jobs, on-demand is sufficient and cheaper.

When NOT to use it: If your function is rarely invoked (once per hour), the cost of keeping a container warm 24/7 will far exceed any performance benefit. A simple scheduled EventBridge ping (every 5 minutes) is cheaper and often good enough at low concurrency. It is not guaranteed, though, because Lambda can recycle environments at any time.

Alternative warming patterns: A common pattern is to set up an EventBridge rule that invokes your function every 5 minutes with a synthetic event (e.g., a 'warmup' field). This keeps 1–2 containers warm without Provisioned Concurrency cost. However, this is unreliable under burst traffic — if multiple concurrent requests arrive simultaneously, only one container may be warm. Provisioned Concurrency guarantees capacity.
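If you do use scheduled pings, the handler should recognise them and return before doing real work. The sketch below assumes the EventBridge rule's target passes a constant input such as `{"warmup": true}`. The field name is a convention, not an AWS requirement, and `do_real_work` is a hypothetical placeholder.

```python
def do_real_work(event):
    """Hypothetical placeholder for the function's normal logic."""
    return {"warmed": False, "path": event.get("path")}


def handler(event, context):
    # Short-circuit warm-up pings: they only exist to keep this environment
    # alive, so skip downstream calls and return immediately
    if isinstance(event, dict) and event.get("warmup"):
        return {"warmed": True}
    return do_real_work(event)
```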

provisioned-concurrency-snippet.yamlYAML
# Snippet: enabling Provisioned Concurrency in SAM
# Full template in previous section; this is the critical part

      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5

# After deploy, verify with:
# aws lambda get-provisioned-concurrency-config --function-name orders-api-handler --qualifier live
# Expected: Status: READY, AllocatedProvisionedConcurrentExecutions: 5
Output
{
"AllocatedProvisionedConcurrentExecutions": 5,
"AvailableProvisionedConcurrentExecutions": 5,
"Status": "READY"
}
Provisioned Concurrency Costs Money Even When Idle
You pay for Provisioned Concurrency per GB-second of allocated capacity, even when no requests are being processed. For 5 containers at 512 MB for a full month, that's roughly $27 in us-east-1 (x86), before any invocation charges. If your endpoint handles <1 req/min, a scheduled warmer may be more cost-effective. Always monitor the ProvisionedConcurrencySpilloverInvocations metric to see how many requests exceed the provisioned pool.
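The idle cost is plain GB-second arithmetic. The sketch below assumes the us-east-1 x86 Provisioned Concurrency rate of $0.0000041667 per GB-second and AWS's usual 730-hour month. Check current pricing for your region and architecture.

```python
PC_PRICE_PER_GB_SECOND = 0.0000041667  # assumed rate: us-east-1, x86
HOURS_PER_MONTH = 730                  # AWS pricing convention


def provisioned_concurrency_monthly_cost(environments, memory_mb):
    """Idle cost of keeping `environments` instances allocated all month."""
    allocated_gb = environments * memory_mb / 1024
    seconds = HOURS_PER_MONTH * 3600
    return allocated_gb * seconds * PC_PRICE_PER_GB_SECOND


# 5 environments x 0.5 GB x 2,628,000 s = 6,570,000 GB-seconds
print(f"${provisioned_concurrency_monthly_cost(5, 512):.2f}")  # prints $27.38
```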
Production Insight
Provisioned Concurrency is the only way to guarantee zero cold starts for latency-critical endpoints. Use it sparingly — only for the minimum number of concurrent executions needed to cover p95 traffic. For everything else, optimize package size and runtime to keep cold starts under 200ms.
Key Takeaway
Provisioned Concurrency eliminates cold starts but at a cost. Use only for latency-sensitive paths where p99 latency must stay under a threshold. Monitor spillover to adjust the count.

Lambda Resource Limits & Constraints Table — What You Can't Change

Lambda has specific hard limits that constrain how you design your serverless applications. Exceeding these limits results in deployment failures, throttling, or runtime errors. The table below shows the most important limits — know them before you architect your system.

| Resource | Limit | Notes |
|---|---|---|
| Memory per function | 128 MB – 10,240 MB (in 1 MB increments) | CPU scales with memory; more memory = more CPU |
| Ephemeral storage /tmp | 512 MB by default, configurable up to 10,240 MB | Shared across warm invocations; not reset on reuse |
| Maximum execution timeout | 15 minutes (900 seconds) | Hard limit; cannot be increased |
| Deployment package size (.zip) | 50 MB zipped (direct upload), 250 MB unzipped | The 250 MB unzipped limit covers function code plus all layers combined |
| Container image size | 10 GB (ECR image) | Larger images can lengthen cold starts |
| Concurrent executions per region (default) | 1,000 | Can be increased via service quota request |
| Concurrent executions per function (default) | Up to the regional unreserved pool (1,000 by default) | Cap or guarantee it with reserved concurrency |
| Invocation payload size | 6 MB request and 6 MB response (synchronous); 256 KB (asynchronous) | For larger payloads, use S3 or streaming |
| Function environment variables | 4 KB total | Use AWS Secrets Manager or Parameter Store for secrets |
| Lambda Layers per function | 5 | Layer size counts toward total unzipped limit (250 MB) |
| Event source mappings per function | 10 (for SQS, DynamoDB, Kinesis) | Add more by using multiple triggers |
| Reserved concurrency per function | 0 – regional limit minus 100 | Guarantees capacity and caps the function at that value; shrinks the pool left for other functions (Lambda keeps 100 unreserved) |
| Provisioned Concurrency per function | 0 – available regional concurrency | Drawn from the account's regional concurrency pool, not a separate quota |
| Function execution role | AWS IAM role | Lambda attaches this role to the execution environment |

How to work around limits:
  • Package size: If you exceed 250 MB unzipped, layers will not help, because function code and all layers share that limit. Trim unused dependencies (pandas, OpenCV, etc. are common culprits) or switch to a container image, which can be up to 10 GB.
  • Timeout: Lambda supports up to 15 minutes. For longer jobs, use AWS Step Functions to orchestrate multiple Lambda calls, or switch to Fargate/Batch.
  • Concurrency: If you anticipate more than 1,000 concurrent executions, request a limit increase in the AWS Service Quotas console. Also consider using SQS buffering to smooth traffic.
  • Payload size: For payloads larger than the invocation limit, upload to S3 and pass the object key in the event. Lambda reads from S3 instead of the event body.

The hard limits (timeout, package size, payload size) are not negotiable. Building against them from day one avoids costly refactors later.

The 50 MB Zip Limit Is for Direct Upload Only
If you use AWS SAM, CloudFormation, or the AWS CLI to deploy from S3, the 50 MB zip limit doesn't apply — the package is stored in S3 and Lambda downloads it from there. The 250 MB unzipped limit still applies. For large packages, always deploy via S3, or use container images (up to 10 GB).
Production Insight
Most production incidents involving Lambda stem from hitting a limit unexpectedly: timeout too low, payload too large, concurrency exhausted. Include limit checks in your CI/CD pipeline. Use the AWS Lambda API to query configured limits and alert when you approach thresholds.
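A good starting point is the Lambda `GetAccountSettings` API (boto3: `get_account_settings()`), which returns account-level limits and usage. The sketch below turns that response into simple headroom checks for code storage and reserved concurrency. The `limit_report` helper, the alert ratio and the sample numbers are hypothetical. Live concurrent usage still has to come from the CloudWatch `ConcurrentExecutions` metric.

```python
def limit_report(settings, alert_ratio=0.8):
    """Hypothetical helper: flag account limits that are close to exhaustion."""
    limits, usage = settings["AccountLimit"], settings["AccountUsage"]
    storage_ratio = usage["TotalCodeSize"] / limits["TotalCodeSize"]
    reserved = limits["ConcurrentExecutions"] - limits["UnreservedConcurrentExecutions"]
    reserved_ratio = reserved / limits["ConcurrentExecutions"]
    ratios = {"code_storage": storage_ratio, "reserved_concurrency": reserved_ratio}
    return {
        "ratios": {name: round(value, 3) for name, value in ratios.items()},
        "alerts": [name for name, value in ratios.items() if value >= alert_ratio],
    }


# In production: settings = boto3.client("lambda").get_account_settings()
sample = {
    "AccountLimit": {
        "TotalCodeSize": 80530636800,  # 75 GB default code storage
        "ConcurrentExecutions": 1000,
        "UnreservedConcurrentExecutions": 900,
    },
    "AccountUsage": {"TotalCodeSize": 70000000000, "FunctionCount": 240},
}
print(limit_report(sample))
```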
Key Takeaway
Know the hard limits before building. Package size, timeout, payload size, and concurrency are the top constraints. Design within them or use layers/step functions to extend.

Production Patterns: Error Handling, Retries, and Observability

Lambda's default retry behaviour depends on invocation type. Synchronous invocations (API Gateway, custom apps) do NOT retry automatically — your client must handle errors. Asynchronous invocations (S3, SNS, EventBridge) retry twice using built-in retry logic, then discard the event unless you configure a dead-letter queue (DLQ). Stream-based triggers (DynamoDB Streams, Kinesis) retry until the data expires (default 24 hours) and block the shard — meaning a permanently failing function stalls your stream.

For synchronous APIs, implement your own retry with exponential backoff inside Lambda. For async triggers, always attach a DLQ (SQS or SNS) to capture failed events. Without a DLQ, failed events vanish after two retries — you'll never know.
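Here is a minimal sketch of retrying a downstream call with exponential backoff and full jitter inside a handler. `TransientError` is a hypothetical marker for failures worth retrying, and the delays are illustrative. Whatever values you choose, keep the total backoff well under the function timeout. `sleep` is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, 5xx, throttling)."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 0.1, max_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying TransientError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the handler log and fail
            # Full jitter: wait a random time up to base * 2^attempt, capped.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Jitter spreads retries from many concurrent environments over time, so they do not all hit a recovering dependency at the same instant.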

Observability in Lambda is driven by CloudWatch Logs, CloudWatch Metrics, and AWS X-Ray. Every invocation writes a REPORT line showing duration, billed duration, memory size, and max memory used. Cold starts also add an Init Duration field. X-Ray traces show downstream calls to DynamoDB, S3, and other services, which is essential for debugging latency.
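To turn REPORT lines into numbers (for example, to count cold starts in exported logs), a small parser is enough. This sketch assumes the field names shown in this article's REPORT output. Real log lines separate fields with tabs, and the patterns below do not depend on the separator. `parse_report` is a hypothetical helper, not an AWS API.

```python
import re

# Plain "Duration" must not match inside "Billed Duration" or "Init Duration",
# hence the two negative lookbehinds.
_PATTERNS = {
    "duration_ms": r"(?<!Billed )(?<!Init )Duration:\s*([\d.]+)",
    "billed_ms": r"Billed Duration:\s*([\d.]+)",
    "memory_size_mb": r"Memory Size:\s*([\d.]+)",
    "max_memory_used_mb": r"Max Memory Used:\s*([\d.]+)",
    "init_ms": r"Init Duration:\s*([\d.]+)",
}


def parse_report(line: str) -> dict:
    """Parse a REPORT line; 'cold_start' is True only when Init Duration is present."""
    if not line.startswith("REPORT"):
        raise ValueError("not a REPORT line")
    out = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, line)
        if match:
            out[key] = float(match.group(1))
    out["cold_start"] = "init_ms" in out
    return out
```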

Structured logging is critical. Use JSON-formatted logs with a correlation ID (often the X-Ray trace ID) so you can correlate invocations. Avoid print() statements without context.

error_handling_logging.py (Python)
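import json
import logging
import os
import traceback

# The Lambda Python runtime pre-installs a handler on the root logger, so
# logging.basicConfig() is a no-op there; set the level directly instead.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


class ValidationError(Exception):
    """Hypothetical: raised when the order payload is invalid (client error)."""


class ExternalServiceError(Exception):
    """Hypothetical: raised when a downstream dependency fails."""


def process_order(event):
    """Hypothetical business logic; replace with your own."""
    if 'customer_id' not in event:
        raise ValidationError('Invalid field: customer_id')
    return {'status': 'accepted', 'customer_id': event['customer_id']}


def handler(event, context):
    # Correlation IDs: the request ID is always present; the X-Ray trace ID
    # is exposed through the _X_AMZN_TRACE_ID environment variable.
    ids = {
        'request_id': context.aws_request_id,
        'trace_id': os.environ.get('_X_AMZN_TRACE_ID', 'unavailable'),
    }
    logger.info(json.dumps({**ids, 'event_type': type(event).__name__,
                            'message': 'Function invoked'}))

    try:
        result = process_order(event)
        return _build_response(200, result)
    except ValidationError as e:
        # Bad input will not succeed on retry: answer 400 immediately.
        logger.warning(json.dumps({**ids, 'error': str(e),
                                   'message': 'Validation failed'}))
        return _build_response(400, {'error': str(e)})
    except ExternalServiceError as e:
        logger.error(json.dumps({**ids, 'error': str(e),
                                 'message': 'Downstream service failed'}))
        # Don't sleep here (it burns billed time). Re-raise instead: async
        # invocations are retried by Lambda and then sent to the DLQ;
        # sync callers receive an error and own the retry.
        raise
    except Exception:
        logger.critical(json.dumps({**ids, 'error': traceback.format_exc(),
                                    'message': 'Unhandled exception'}))
        return _build_response(500, {'error': 'Internal server error'})


def _build_response(status_code, body):
    return {
        'statusCode': status_code,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(body)
    }
Output
# CloudWatch Logs output (message portion; the runtime's default text format
# prefixes each line with level, timestamp and request ID):
{"request_id": "9a2e4f7d-...", "trace_id": "Root=1-...", "event_type": "dict", "message": "Function invoked"}
{"request_id": "9a2e4f7d-...", "trace_id": "Root=1-...", "error": "Invalid field: customer_id", "message": "Validation failed"}
# Lambda REPORT line:
REPORT RequestId: 9a2e4f7d-... Duration: 45.23 ms Billed Duration: 46 ms Memory Size: 128 MB Max Memory Used: 32 MB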
Retry Behaviour: A Mental Model
  • Synchronous invocations: no automatic retries. The caller must handle errors.
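  • Asynchronous invocations: two automatic retries. Lambda waits about 1 minute before the first retry and 2 minutes before the second.
  • Stream-based triggers: retry until the record expires unless you cap retries or record age on the event source mapping. Records expire after 24 hours for DynamoDB Streams, or after the stream's retention period for Kinesis (24 hours by default).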
  • Always configure a dead-letter queue (DLQ) for async triggers to catch failures.
  • DLQ can be an SQS queue (for processing later) or an SNS topic (for alerting).
Production Insight
Never assume your Lambda will succeed on first try.
Always attach a DLQ to async triggers — otherwise failures vanish silently.
Use X-Ray to trace every downstream call; it is the most direct way to see where time is spent.
Key Takeaway
Synchronous: no retries — you own the failure.
Async: two retries, then DLQ.
Stream: indefinite retry — fix fast or skip gracefully.
Retry Strategy by Invocation Type
If: Invocation via API Gateway (sync)
Use: Implement retry in the client, or inside Lambda with a circuit breaker.
If: S3, SNS, EventBridge (async)
Use: Lambda retries twice; configure a DLQ to capture remaining failures.
If: DynamoDB Streams, Kinesis (stream)
Use: Lambda retries until the data expires. Fix the bug, or cap retries (maximum retry attempts, maximum record age) and route bad records to an on-failure destination.
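For stream triggers, one way to avoid retrying an entire batch is Lambda's partial batch response. The sketch assumes the event source mapping is configured with `FunctionResponseTypes=['ReportBatchItemFailures']`; without it, Lambda ignores the return value. `handle_record` is a hypothetical placeholder for your per-record logic. Stream records are ordered, so the handler stops at the first failure and reports that record's sequence number. Lambda then checkpoints before it and retries from there.

```python
from typing import Callable


def handle_record(record: dict) -> None:
    """Hypothetical business logic: replace with your own per-record work."""
    print(record["eventName"], record["dynamodb"]["Keys"])


def process_stream_batch(event: dict,
                         process_record: Callable[[dict], None]) -> dict:
    """Process DynamoDB Stream records in order and report the first failure."""
    for record in event.get("Records", []):
        try:
            process_record(record)
        except Exception:
            # Report only the first failed sequence number: everything before it
            # is checkpointed, and Lambda retries from this record onward.
            return {"batchItemFailures": [
                {"itemIdentifier": record["dynamodb"]["SequenceNumber"]}]}
    return {"batchItemFailures": []}  # whole batch succeeded


def handler(event, context):
    return process_stream_batch(event, handle_record)
```

On its own this still retries a poison record until it expires. Pair it with `MaximumRetryAttempts` (or `MaximumRecordAgeInSeconds`) and an on-failure destination on the event source mapping. The bad record is then set aside and the shard moves on.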

When Lambda is the Wrong Tool — Alternatives and Trade-offs

Lambda excels at short-lived, event-driven, bursty workloads. But it's not a general-purpose compute platform. If your workload has any of the following traits, reach for another service.

First, long-running processes: Lambda's hard 15-minute timeout means you cannot run a nightly batch job that takes an hour. Use AWS Batch or ECS/Fargate for that.

Second, stateful applications: Lambda is stateless by design. If your application needs to hold client connections (WebSockets), maintain session state in memory, or use files that persist beyond a single invocation, you'll fight the architecture. Use EC2 or ECS with sticky sessions instead.

Third, predictable, steady traffic: If your load is constant 24/7, Lambda's per-ms billing becomes more expensive than a small EC2 instance, on-demand or reserved. A t3.small running 24/7 costs about $15/month. At 512 MB, 5 million Lambda invocations averaging 200ms cost roughly $9 before the free tier (about $8.33 of compute plus $1 of request charges). Steady traffic at 100 req/s, however, pushes Lambda well past the cost of that EC2 instance.
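The break-even point is easy to estimate. Below is a minimal calculator that assumes us-east-1 x86 on-demand list prices ($0.0000166667 per GB-second and $0.20 per million requests, before the free tier); check these against the current pricing page. It also ignores per-invocation rounding up to the next millisecond.

```python
SECONDS_PER_DAY = 86_400

# Assumed list prices (us-east-1, x86, on-demand); verify before relying on them.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20


def lambda_monthly_cost(req_per_sec: float, avg_duration_ms: float,
                        memory_mb: int, days: int = 30) -> dict:
    """Estimate a month of steady Lambda traffic (compute + request charges)."""
    requests = req_per_sec * SECONDS_PER_DAY * days
    # GB-seconds: each invocation is billed for duration x configured memory.
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * PRICE_PER_GB_SECOND
    request_charges = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return {
        "requests": requests,
        "gb_seconds": gb_seconds,
        "compute": compute,
        "request_charges": request_charges,
        "total": compute + request_charges,
    }
```

Take a steady 100 req/s at 100 ms and 512 MB. `lambda_monthly_cost(100, 100, 512)['total']` comes to about $268 for a 30-day month, against roughly $15 for the t3.small mentioned above.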

Fourth, heavy GPU/compute: Lambda has no GPU support. ML training, 3D rendering, or video transcoding with high compute needs are better on EC2 GPU instances or SageMaker.

Fifth, very low latency requirements (<10ms): Lambda's cold start and network overhead make it unsuitable for sub-millisecond use cases like real-time trading. Use containers on EC2 or custom hardware.

Finally, large binary processing: Lambda's deployment package limit is 250 MB (unzipped) and 50 MB (zipped) for direct upload. If you're processing multi-GB files, you'll hit storage and timeout limits. Use ECS or Batch with EFS.

lambda-vs-alternatives.yaml (YAML)
# Quick decision reference for choosing Lambda or an alternative

deployment_type:
  - name: Lambda
    good_for:
      - Event-driven functions
      - Spiky HTTP APIs
      - Scheduled tasks under 15 minutes
    bad_for:
      - Long-running batch jobs (over 15 min)
      - Stateful applications
      - Predictable steady traffic
  - name: ECS/Fargate
    good_for:
      - Containerized workloads (stateful or stateless)
      - Long-running services
      - WebSocket servers
      - Background workers needing >15 min
    bad_for:
      - Very short-lived functions (cold start is higher)
      - Simple event reactions (overkill)
  - name: EC2
    good_for:
      - Full control of OS/GPU
      - Stable traffic patterns
      - Applications needing persistent storage
    bad_for:
      - Variable traffic (idle cost)
      - Need minimal ops overhead

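# Cost comparison example (us-east-1 x86 on-demand list prices, before the free tier):
# Scenario: 100 req/s steady, average duration 100ms, 512 MB memory
# Invocations: 100 * 86400 = 8.64M/day
# Requests: 8.64M * $0.20 per 1M = $1.728/day
# Compute: 8.64M * 0.1 s * 0.5 GB = 432,000 GB-s * $0.0000166667 = $7.20/day
# Lambda total: ~$8.93/day = ~$268/month (30 days)
# t3.small (2 vCPU, 2 GB) on-demand: ~$15/month (24/7)
# Lambda is roughly 18x more expensive for steady traffic.
# Lambda wins when traffic is spiky and idle periods exist.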
Output
# Decision output:
choose: ECS/Fargate for long-running, stateful, or steady traffic.
choose: Lambda for event-driven, short, and spiky workloads only.
Don't Force Lambda Where It Doesn't Belong
I've seen teams migrate a steady-traffic e-commerce backend to Lambda and see their monthly bill triple. Lambda's per-request cost is excellent for sporadic traffic, but for constant load, a small EC2 or Fargate instance is cheaper. Always run a cost simulation before migrating.
Production Insight
Lambda is not cheaper than EC2 for steady loads — run the numbers.
The 15-minute timeout is a hard limit. A single invocation cannot process a large file if the work takes longer than that.
If you need GPU, use SageMaker or EC2 G family.
Key Takeaway
Lambda is a scalpel, not a Swiss Army knife.
Use it for short, event-driven tasks.
For everything else, pick the right compute service.
Lambda vs Alternative Decision
If: Execution time > 15 minutes
Use: Fargate or Batch.
If: Requires persistent state across calls
Use: EC2 or ECS with sticky sessions.
If: Steady traffic 24/7
Use: EC2 or ECS; both are cheaper than Lambda for constant load.
If: GPU or heavy compute required
Use: EC2 GPU instances or SageMaker.
If: Event-driven, spiky, short-lived
Use: Lambda is the perfect fit.
Production incident · POST-MORTEM · severity: high

The Cold Start P99 Spike That Killed Our API Response Times

Symptom
Every morning at 9 AM, the /orders API endpoint's p99 latency jumped from 200ms to over 1.2 seconds while p50 stayed below 200ms. Newly deployed functions also showed slow first requests.
Assumption
The team assumed Lambda automatically handled warm containers and that the latency was due to database queries. They tuned DynamoDB but saw no improvement.
Root cause
Lambda's default on-demand concurrency doesn't keep execution environments warm during idle periods. After roughly 15 minutes of inactivity, all containers had been reclaimed (AWS does not publish a fixed idle timeout). Morning traffic caused a burst of cold starts. The first request routed to each new container had to spin up the Python runtime, import boto3 and Pillow, and initialise the S3 client. This added 800–1200ms to those first requests.
Fix
Implemented Provisioned Concurrency for the production alias with 10 pre-warmed containers. Also reduced deployment package size from 18 MB to 4 MB by separating Pillow into a Lambda Layer. Cold start duration dropped to ~200ms.
Key lesson
  • Measure p50 and p99 separately — if p99 is much higher than p50, cold starts or throttling are the likely cause.
  • Use Provisioned Concurrency for latency-sensitive endpoints, but only for the minimum number needed.
  • Minimise package size and heavy imports. Lambda Layers help organise dependencies, but they still count toward the 250 MB unzipped limit and are loaded at init.
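The first lesson, comparing p50 with p99, can be automated over a list of request durations (for example, durations exported from logs or metrics). Below is a minimal sketch using the nearest-rank percentile. The 3x ratio threshold is an illustrative assumption, not an AWS recommendation.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (no interpolation)."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def tail_report(latencies_ms: Sequence[float], ratio_threshold: float = 3.0) -> dict:
    """Flag a tail that is far slower than the median (cold starts or throttling)."""
    p50 = percentile(latencies_ms, 50)
    p99 = percentile(latencies_ms, 99)
    return {
        "p50": p50,
        "p99": p99,
        "suspect_cold_starts_or_throttling": p99 > ratio_threshold * p50,
    }
```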
Production debug guide: Symptom → Action grid (5 entries)
Symptom · 01
Function times out before completing
Fix
Check CloudWatch Logs for 'Task timed out' line; increase timeout in function config (max 15 min). Use AWS X-Ray to identify the slowest downstream call.
Symptom · 02
High latency on first request after a period of inactivity
Fix
Enable Provisioned Concurrency or warm containers via scheduled EventBridge ping every 5 minutes. Check init duration in Lambda logs under 'Init Duration' in the REPORT line.
Symptom · 03
Function fails with out-of-memory error (Process exited before completing)
Fix
Increase memory allocation. Monitor 'Max Memory Used' in CloudWatch Logs. Use the open-source Lambda Power Tuning tool to find the optimal memory setting for cost and speed.
Symptom · 04
Throttling errors (429 'TooManyRequestsException')
Fix
Request a concurrency limit increase via Service Quotas (or AWS Support). Use SQS to buffer requests and decouple invocations (async invocation). Implement exponential backoff in the caller.
Symptom · 05
Database connections exhausted under load
Fix
Move DB client initialisation outside the handler (reuse across invocations). Reduce connection pool size — Lambda works best with 1-5 connections per container. Switch to RDS Proxy if using RDS.
★ Lambda Debug Cheat SheetQuick commands and immediate fixes for the most common Lambda production issues.
Cold start latency spike
Immediate action
Check Init Duration in the most recent CloudWatch log REPORT line.
Commands
aws lambda get-function-configuration --function-name your-function
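aws logs start-query --log-group-name /aws/lambda/your-function --start-time <unix> --end-time <unix> --query-string 'filter @type = "REPORT" and ispresent(@initDuration) | stats count() as coldStarts, avg(@initDuration), max(@initDuration)'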
Fix now
Add Provisioned Concurrency: aws lambda put-provisioned-concurrency-config --function-name your-function --qualifier PROD --provisioned-concurrent-executions 5
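Function throttling (429 TooManyRequests)
Immediate action
Check the CloudWatch Throttles metric; verify the concurrency limit.
Commands
aws cloudwatch get-metric-statistics --metric-name Throttles --namespace AWS/Lambda --dimensions Name=FunctionName,Value=your-function --start-time <iso8601> --end-time <iso8601> --period 300 --statistics Sum
aws lambda get-account-settings --query 'AccountLimit.[ConcurrentExecutions,UnreservedConcurrentExecutions]'
Fix now
Raise this function's reserved concurrency if it is capped too low. Alternatively, lower other functions' reservations that starve the shared pool, or request a limit increase. Add an SQS queue for async decoupling.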
Out of memory (Process exited before completing)
Immediate action
Look for 'Runtime exited with error: signal: killed' in the logs. Also look for a REPORT line where Max Memory Used equals Memory Size.
Commands
aws logs filter-log-events --log-group-name /aws/lambda/your-function --filter-pattern '"signal: killed"'
aws lambda update-function-configuration --function-name your-function --memory-size 1024
Fix now
Increase memory via CLI; use Power Tuning to find optimal setting.
Timeout (Task timed out after X seconds)
Immediate action
Identify the slow call in CloudWatch X-Ray traces.
Commands
aws xray get-trace-summaries --start-time <unix> --end-time <unix> --filter-expression 'service("your-function")'
aws lambda update-function-configuration --function-name your-function --timeout 30
Fix now
Extend timeout, implement retry with backoff, and add circuit breaker for downstream dependencies.
AWS Lambda vs EC2 vs Fargate
| Aspect | AWS Lambda (Serverless) | EC2 Instance (Traditional) | AWS Fargate (Serverless Containers) |
|---|---|---|---|
| Billing model | Per 1 ms of execution (scaled by memory) + per request | Per second the instance runs (60-second minimum on Linux), even when idle | Per second of vCPU and memory used (1-minute minimum) |
| Scaling | Automatic, up to 1,000 concurrent executions by default | Manual or Auto Scaling Group (minutes to scale) | ECS Service Auto Scaling, configured per service |
| Max execution time | 15 minutes per invocation | Unlimited; processes run indefinitely | Unlimited (services are long-running) |
| Cold start latency | 100ms–1s for the first request after an idle period | None (process stays resident) | None once a task is running; launching a new task takes longer than a Lambda cold start |
| State management | Stateless: no persistent memory between calls | Stateful: in-memory state survives between requests | In-memory state survives while the task runs; tasks are replaceable, so persist important state externally |
| Long-running workloads | Not suitable (15 min cap) | Ideal: batch jobs, ML training, WebSockets | Ideal for long-running services and workers |
| Operational overhead | Near zero: AWS patches, scales, monitors | High: OS updates, capacity planning, monitoring setup | Low: no OS patching, but container management needed |
| Best for | Event-driven, spiky, short-duration tasks | Steady, high-throughput, stateful applications | Containerized apps that need to scale and are long-running |

Key takeaways

1. Cold starts are a real cost: Put all initialisation (DB clients, SDK objects, config) outside your handler so Lambda reuses it across warm invocations. This alone can cut average latency by 40-200ms.
2. Memory is your CPU dial: Lambda allocates CPU in proportion to memory, so increasing memory from 128 MB to 1024 MB allocates 8x the CPU. For compute-heavy functions this can reduce duration enough that the higher-memory run is cheaper per invocation.
3. Lambda is stateless by design: Never rely on in-memory state surviving between invocations. Use DynamoDB, ElastiCache, or S3 for any state that must persist. Any 'persistence' you observe in /tmp or module-level variables is a side effect of container reuse, not a guarantee.
4. Match the trigger to the job: Use API Gateway for synchronous HTTP APIs, SQS for reliable async processing with retry semantics, and EventBridge for scheduled tasks and event routing. Picking the wrong trigger means fighting the tool instead of building your feature.
5. Always configure a dead-letter queue for async triggers: Without it, failed events vanish after two retries. Alert as soon as messages arrive in the DLQ; for an SQS DLQ, watch ApproximateNumberOfMessagesVisible.

Common mistakes to avoid

5 patterns
×

Initialising DB connections inside the handler

Symptom
Every invocation opens a new connection, exhausting your RDS connection pool within minutes under load. The symptom is 'too many connections' errors that appear fine locally but explode in production.
Fix
Move the connection client instantiation outside the handler function so Lambda reuses the same connection across warm invocations. Use connection pooling with a small pool size (e.g., max 5 connections per container).
×

Ignoring the /tmp storage limit (512 MB by default, configurable up to 10 GB) and assuming a clean filesystem

Symptom
A previous call's temp files may still be there, causing 'file already exists' errors or stale data bugs. Lambda does NOT guarantee a clean /tmp directory between warm invocations.
Fix
Always generate unique filenames (use uuid or the request ID from context.aws_request_id) and clean up /tmp explicitly at the end of your handler. Never rely on /tmp being empty.
×

Setting Lambda timeout lower than the slowest downstream dependency

Symptom
If your function calls an external API that sometimes takes 8 seconds and your timeout is 3 seconds, Lambda kills the invocation mid-call. The caller gets a generic error (typically a 5xx when fronted by API Gateway), and the logs show only 'Task timed out'.
Fix
Set your Lambda timeout to at least 2x your expected worst-case downstream latency, implement retries with exponential backoff for transient failures, and use AWS X-Ray tracing to measure where time is actually spent.
×

Using synchronous invocation for long-polling or cron tasks

Symptom
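A scheduler or script that invokes a Lambda synchronously waits on the response. If the job outlives the client's read timeout, the client gives up while the function keeps running: you get no result, and an SDK retry can start a duplicate run. Scheduled jobs also often hit the 3-second default function timeout, at which point Lambda kills the invocation with 'Task timed out'.
Fix
Invoke scheduled work asynchronously. EventBridge rules already invoke Lambda asynchronously, and from the SDK you can pass InvocationType='Event'. Set an explicit function timeout sized to the job, attach a DLQ, and use Step Functions to orchestrate longer-running tasks.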
×

Forgotten DLQ for async triggers

Symptom
An S3 event triggers a Lambda that fails every time (e.g., a malformed file). Lambda retries twice and then silently drops the event. You never know a file was not processed.
Fix
Always attach an SQS dead-letter queue (or an on-failure destination) to the function's asynchronous invocation configuration. Monitor the DLQ for failed events and alert when messages appear.
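For the /tmp mistake above, a per-invocation scratch directory handles both unique naming and cleanup. This sketch assumes the request ID is passed in from `context.aws_request_id`, and `write_scratch` is a hypothetical helper. `tempfile` uses the platform temp directory, which on Lambda should resolve to /tmp. `TemporaryDirectory` deletes the directory when the `with` block exits, even if processing raises.

```python
import os
import tempfile


def write_scratch(request_id: str, data: bytes) -> tuple[str, int]:
    """Write data into a per-invocation scratch directory that is always removed."""
    # The request-ID prefix keeps leftovers from earlier warm invocations
    # from ever being confused with this invocation's files.
    with tempfile.TemporaryDirectory(prefix=f"{request_id}-") as scratch:
        path = os.path.join(scratch, "payload.bin")
        with open(path, "wb") as fh:
            fh.write(data)
        size = os.path.getsize(path)
        # ... process the file here; cleanup happens on exit, even on errors.
    # The path is returned only so callers/tests can confirm it is gone.
    return scratch, size
```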
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
A Lambda function handles user logins and is experiencing high tail late...
Q02SENIOR
Explain the difference between synchronous and asynchronous Lambda invoc...
Q03SENIOR
Your team wants to use Lambda to process DynamoDB Stream events. A batch...
Q04SENIOR
What is Provisioned Concurrency and when would you use it? What are the ...
Q05SENIOR
How does Lambda's scaling work? What are the concurrency limits and how ...
Q01 of 05 · SENIOR

A Lambda function handles user logins and is experiencing high tail latency during morning traffic spikes. The p99 latency is 1.2 seconds but the p50 is 180ms. What's likely causing this and how would you fix it?

ANSWER
This pattern is classic cold start or throttling. The low p50 shows warm invocations are fast. The p99 spike indicates that some requests hit cold starts (new execution environments) or are throttled. First, check the Init Duration field in the REPORT log lines and the Throttles metric in CloudWatch. Likely cause: after idle periods (e.g., minutes with no traffic), idle environments are reclaimed, so the first requests of the morning spike each pay for a cold start. Fixes: (1) Use Provisioned Concurrency to keep a few environments initialised. (2) Minimise package size and trim or defer heavy imports to reduce init time. (3) Use a warmer: an EventBridge scheduled rule that invokes the function every 5 minutes. This is less reliable than Provisioned Concurrency, because each ping keeps only one environment warm.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
How much does AWS Lambda cost in production?
02
What is a Lambda cold start and can it be completely eliminated?
03
Can Lambda handle long-running background jobs?
04
How do I debug a Lambda function that's timing out?
05
Should I use container images or .zip packages for Lambda?