AWS Lambda Cold Starts — Why P99 Spikes to 1.2s at 9 AM
Lambda cold starts added 800-1200ms to our /orders API every morning.
- AWS Lambda runs your code on demand without provisioning or managing servers
- Three core components: Functions (your code), Triggers (event sources), Execution Environment (isolated container)
- Cold starts add 100ms–1s latency when a new container spins up
- Performance insight: More memory = more CPU; tuning memory can reduce both cost and duration for compute-heavy tasks
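- Production insight: Lambda bills for the time your function actually runs (rounded up to the nearest millisecond), not for the configured timeout. A hung invocation is still billed until it times out, so always set timeouts realistically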
- Biggest mistake: Assuming /tmp is clean between invocations — it persists across warm starts, causing silent data corruption
Imagine you own a pizza shop but you only pay the chef when someone actually orders a pizza. The chef doesn't sit around waiting — they appear the moment an order comes in, make the pizza, then disappear. AWS Lambda is exactly that chef. You write a function, AWS runs it only when something triggers it, and you pay only for the milliseconds it runs. No server to babysit, no idle hours billed, no infrastructure to patch.
Every application needs compute power — something has to run your code. Traditionally, that meant renting a virtual machine or physical server that runs 24/7, even at 3 a.m. when zero users are online. You're paying for potential, not actual work. As cloud adoption exploded, this idle-cost problem became impossible to ignore, especially for startups and teams with unpredictable traffic spikes.
AWS Lambda, launched in 2014, flipped the model. Instead of managing servers, you upload a function — a single, focused piece of logic — and AWS handles everything else: provisioning, scaling, patching, and availability. The term 'serverless' doesn't mean there are no servers; it means YOU don't manage them. The servers exist, they're just Amazon's problem. This lets your team focus entirely on business logic instead of infrastructure operations.
By the end of this article you'll understand how Lambda executes code, how to wire it to real-world triggers like API Gateway and S3, how to avoid the cold start trap that kills performance, and how to structure a production-worthy serverless workflow. You'll also know exactly when Lambda is the right tool — and when it absolutely isn't.
How AWS Lambda Actually Executes Your Code — The Execution Model
Lambda's execution model is the foundation everything else builds on. When a trigger fires — say, an HTTP request hits API Gateway — Lambda needs to run your function. If a pre-warmed container exists from a recent invocation, Lambda reuses it. This is a 'warm start' and it's fast. If no container is available, Lambda has to bootstrap one from scratch: download your code package, spin up a runtime environment, run any initialisation code outside your handler, then finally invoke your handler. That bootstrap phase is the dreaded cold start.
Cold starts typically add 100ms–1000ms of latency depending on the runtime (.NET and Java are heavier; Node.js and Python are lighter). For a background job this is irrelevant. For a user-facing API call, it's noticeable.
Your handler function receives two objects: the event (the payload that triggered the invocation — could be an HTTP body, an S3 event, a queue message) and the context (metadata about the invocation itself — function name, memory limit, request ID). Understanding this distinction is critical: the event is about WHAT happened, the context is about WHO is running.
Code outside the handler runs once per container lifecycle. That's where you put database connections, SDK clients, and config loading — doing it inside the handler means re-initialising on every single invocation, which is both slow and wasteful.
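A minimal sketch of that split, assuming a hypothetical `create_db_client` helper that stands in for any expensive setup (a real function would open a database connection or build an SDK client here). The counter exists only to make the once-per-container behaviour visible.

```python
import json
import time

# Module scope: runs once per execution environment, during the cold start.
INIT_COUNT = 0


def create_db_client():
    """Hypothetical stand-in for expensive setup (DB connection, SDK client)."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"created_at": time.time()}


DB_CLIENT = create_db_client()  # reused by every warm invocation


def handler(event, context):
    # event: WHAT happened (the trigger payload).
    # context: WHO is running (invocation metadata supplied by Lambda).
    request_id = getattr(context, "aws_request_id", "local-test")
    return {
        "statusCode": 200,
        "body": json.dumps({
            "order_id": event.get("order_id"),
            "request_id": request_id,
            "client_inits": INIT_COUNT,
        }),
    }
```

Calling `handler` repeatedly in the same process leaves `client_inits` unchanged, which mirrors what happens across warm invocations.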
Use the Init Duration field in CloudWatch logs to measure cold start initialisation time.
Wiring Lambda to the Real World — Triggers, Events, and API Gateway
A Lambda function sitting alone does nothing. It needs a trigger — an AWS service that says 'hey, something happened, go run'. The trigger determines the shape of the event object your handler receives, which is why reading the AWS event schema docs for each trigger type matters.
The most common triggers in production are: API Gateway (HTTP requests), S3 (file uploads/deletions), SQS (queue messages for async processing), EventBridge (scheduled cron jobs and event routing), DynamoDB Streams (react to database changes), and SNS (fan-out notifications).
API Gateway is the one you'll use for building REST APIs or webhooks. When a request hits your endpoint, API Gateway wraps it into a structured event object and hands it to Lambda. Your function returns a response object with a statusCode, headers, and body, and API Gateway translates that back into a real HTTP response.
The Lambda Proxy Integration model (the default and recommended approach) passes the raw request to your function and expects you to construct the full HTTP response yourself. This gives you complete control over status codes, CORS headers, and response bodies. Older tutorials show Lambda custom integrations — avoid them, they're fiddly and add complexity for no gain.
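As a rough illustration of the proxy contract, the sketch below parses the request body and builds the full response by hand. The `item` field, the status codes chosen, and the wide-open CORS header are assumptions made for the example, not part of any real API.

```python
import json


def handler(event, context):
    # With proxy integration the request body arrives as a string (or None).
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body must be valid JSON"})

    if not isinstance(payload, dict) or "item" not in payload:
        return _response(422, {"error": "missing 'item'"})
    return _response(201, {"ordered": payload["item"]})


def _response(status_code, body):
    # API Gateway expects statusCode, headers, and a string body.
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*",  # assumption: open CORS for the demo
        },
        "body": json.dumps(body),
    }
```

Centralising the response shape in one helper keeps every code path returning a body that API Gateway can translate.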
For async workloads, SQS is your best friend. Rather than calling Lambda directly (which creates tight coupling), push messages to a queue and let Lambda poll and process them in batches. This naturally handles traffic bursts without rate-limit errors.
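A hedged sketch of a batch consumer follows. It assumes the event source mapping has partial batch responses (ReportBatchItemFailures) enabled, so that only the failed messages return to the queue; `process_order` is a hypothetical placeholder for real business logic.

```python
import json


def process_order(order):
    """Hypothetical business logic; raises on bad input."""
    if "order_id" not in order:
        raise ValueError("order_id is required")


def handler(event, context):
    failures = []
    for record in event.get("Records", []):
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed so the rest of the batch
            # is deleted from the queue instead of being retried.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without partial batch responses, one bad message would cause the whole batch to be retried, so the processing logic should be idempotent either way.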
Lambda Event Source Reference Table — What Triggers Your Function
The following table catalogs the most common Lambda event sources, their invocation model, payload size limits, retry behavior, and best-fit use cases. Knowing these details helps you design reliable, cost-efficient serverless workflows. For each source, the event structure is fixed by AWS — you cannot change the schema — so you must parse the documented fields correctly in your handler.
| Event Source | Invocation Type | Max Payload | Retry Behavior | Best For |
|---|---|---|---|---|
| API Gateway | Synchronous | 6 MB request and response (Lambda's synchronous limit; API Gateway itself accepts up to 10 MB) | No automatic retries; client handles | HTTP/REST APIs, webhooks |
| S3 (Event Notifications) | Asynchronous | 128 KB (event record) | 2 retries (async) | File processing (image resize, logs, analytics) |
| DynamoDB Streams | Stream-based | 1 MB (batch) | Indefinite retry until data expires (24h) | React to DB changes (materialized views, sync) |
| Kinesis Data Streams | Stream-based | 1 MB (per record) | Retries until the record expires (24 hours by default; retention can be extended) | Real-time data processing (clickstreams, logs) |
| SQS (Standard) | Poll-based (event source mapping) | 256 KB per message | Retries based on redrive policy | Async decoupling, buffering, batch processing |
| SQS (FIFO) | Poll-based (event source mapping) | 256 KB per message | Retries with exactly-once semantics | Ordered processing, deduplication |
| SNS (topic subscription) | Asynchronous | 256 KB | 2 retries (async) | Fan-out notifications to multiple subscribers |
| EventBridge (scheduled or event) | Asynchronous | 256 KB | 2 retries (async) | Cron jobs, event routing between AWS services |
| CloudFront (Lambda@Edge) | Synchronous | 1 MB | No automatic retries | Modify HTTP request/response at edge |
| Lambda Function URL | Synchronous | 6 MB (buffered request/response) | No automatic retries | Simple HTTP endpoints without API Gateway |
Key details to remember:
- Asynchronous invocations (S3, SNS, EventBridge) retry twice, waiting about 1 minute before the first retry and 2 minutes before the second. Always configure a dead-letter queue (DLQ) for these triggers.
- Stream-based triggers (DynamoDB, Kinesis) retry until the data record expires, so a persistent bug will block the entire shard. Use bisectBatchOnFunctionError to split batches on failure.
- Synchronous triggers (API Gateway, Lambda Function URL) do not retry; your client or upstream service must implement retry logic.
- Payload size limits are hard. S3 event notifications carry only metadata about the object (bucket name, key, size), never the object contents, so the handler reads the file from S3 using the key in the event.
For a full list of event sources and their exact event schemas, refer to the [AWS Lambda Developer Guide — Using AWS Lambda with other services](https://docs.aws.amazon.com/lambda/latest/dg/lambda-services.html).
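Because the schema is fixed, parsing usually comes down to walking the documented fields. The sketch below pulls bucket, key, and size out of an S3 notification; `extract_s3_objects` is a name invented for this example, and the sample event in the usage note is trimmed to the fields it reads.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key, size) for each record in an S3 notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys arrive URL-encoded, e.g. "my+file.png" for "my file.png".
        key = unquote_plus(s3["object"]["key"])
        objects.append((s3["bucket"]["name"], key, s3["object"].get("size")))
    return objects
```

A handler would then fetch each object from S3 by bucket and key, since the notification describes the object rather than containing it.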
Cold Starts, Memory Tuning, and the Performance Levers You Actually Control
Lambda gives you one direct performance dial: memory. You set it anywhere from 128 MB to 10,240 MB. What most developers don't realise is that CPU allocation scales proportionally with memory. A 1,024 MB Lambda function gets roughly 8x the CPU of a 128 MB one. If your function is CPU-bound (image processing, data transformation, encryption), doubling the memory can halve the execution time — and since you pay for duration × memory, the cost often stays the same or even drops.
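The duration × memory trade-off can be checked with a few lines of arithmetic. The per-GB-second rate below is an assumption (roughly the x86 on-demand rate at the time of writing), and the durations are invented to illustrate the scaling, so treat the numbers as a sketch rather than a quote.

```python
import math

PRICE_PER_GB_SECOND = 0.0000166667  # assumption: check current regional pricing


def invocation_cost(memory_mb, duration_ms):
    """Compute charge for one invocation: memory (GB) x duration (s) x rate."""
    return (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND


small = invocation_cost(memory_mb=512, duration_ms=800)
large = invocation_cost(memory_mb=1024, duration_ms=400)   # duration halves
better = invocation_cost(memory_mb=1024, duration_ms=350)  # better than linear

# Doubling memory while halving duration leaves the charge unchanged.
same_cost = math.isclose(small, large)
```

If the function speeds up by more than the memory increase, as in the `better` case, the larger setting is both faster and cheaper. Measuring real durations at several memory sizes is the only way to know which case applies.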
Cold starts are the other major lever. Three strategies exist: Provisioned Concurrency, keeping functions warm with scheduled EventBridge pings, and minimising package size.
Provisioned Concurrency is the only AWS-supported solution. You pay for a set number of pre-warmed containers to stay alive at all times. It costs more than on-demand but eliminates cold starts entirely for that concurrency slot. Use it for customer-facing APIs where tail latency matters.
Package size matters because Lambda has to download your deployment package before running it. A 50 MB Python package with unnecessary dependencies cold-starts noticeably slower than a 3 MB lean package. Use Lambda Layers to separate large dependencies (like numpy or Pillow) from your application code, and use .zip deployment packages rather than container images unless you specifically need Docker tooling.
Finally, watch your timeout setting. The default is 3 seconds. Downstream API calls, DB queries, and S3 operations can easily exceed this. Set it realistically (15 minutes max) and always handle partial failures gracefully.
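One way to handle partial failures gracefully is to stop before the timeout hits and report what is left. This sketch relies on the context object's `get_remaining_time_in_millis()` method; the 2-second safety margin and the `items` field are assumptions for the example.

```python
SAFETY_MARGIN_MS = 2_000  # assumption: leave 2 s to return cleanly


def handler(event, context):
    items = event.get("items", [])
    done = []
    for item in items:
        # Stop before Lambda terminates the invocation mid-item.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        done.append(item.upper())  # placeholder for real work
    return {"processed": done, "remaining": items[len(done):]}
```

The caller (or a follow-up invocation) can then pick up the `remaining` items instead of losing the whole batch to a timeout.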
Provisioned Concurrency vs Cold Start — Visual Breakdown
Provisioned Concurrency is the only AWS-native mechanism that guarantees zero cold starts for a fixed number of concurrent invocations. The flow below contrasts the request path for an on-demand function (which may incur a cold start) with that of a function using Provisioned Concurrency.
How it works: When you enable Provisioned Concurrency, Lambda pre-initialises a specified number of execution environments and keeps them warm. Incoming invocations are routed to these warm environments instantly. On-demand environments are still used for invocations beyond the provisioned count, so cold starts still occur when the provisioned pool is exhausted. The visual logic flow:
- On-Demand Path: Request arrives → check for warm container → if none found → cold start (init + handler delay).
- Provisioned Concurrency Path: Request arrives → route to pre-warmed container → warm start (handler only, no init delay).
The benefit is a 100% elimination of cold start latency for the initial set of concurrent requests. The cost is paying for those environments even when idle.
When to use it: Only for latency-critical production endpoints where p99 must stay below, say, 500ms. For batch processing or background jobs, on-demand is sufficient and cheaper.
When NOT to use it: If your function is rarely invoked (once per hour), the cost of keeping a container warm 24/7 will far exceed any performance benefit. A simple scheduled EventBridge ping (every 5 minutes) is cheaper and nearly as effective — though not guaranteed, as AWS may reclaim containers during maintenance.
Alternative warming patterns: A common pattern is to set up an EventBridge rule that invokes your function every 5 minutes with a synthetic event (e.g., a 'warmup' field). This keeps 1–2 containers warm without Provisioned Concurrency cost. However, this is unreliable under burst traffic — if multiple concurrent requests arrive simultaneously, only one container may be warm. Provisioned Concurrency guarantees capacity.
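On the function side, the warming pattern only needs an early return. The sketch assumes the EventBridge rule is configured to send a payload containing a `warmup` field, as in the example above; the field name is arbitrary.

```python
def handler(event, context):
    # Assumption: the scheduled rule sends {"warmup": true} as its payload.
    if isinstance(event, dict) and event.get("warmup"):
        # Return immediately so pings stay cheap and skip business logic.
        return {"warmed": True}
    return {"statusCode": 200, "body": "real work done"}
```

Keeping the check at the very top of the handler means a ping costs only a few milliseconds of billed time.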
Monitor the ProvisionedConcurrencySpilloverInvocations metric to see how many requests exceed the provisioned pool.
Lambda Resource Limits & Constraints Table — What You Can't Change
Lambda has specific hard limits that constrain how you design your serverless applications. Exceeding these limits results in deployment failures, throttling, or runtime errors. The table below shows the most important limits — know them before you architect your system.
| Resource | Limit | Notes |
|---|---|---|
| Memory per function | 128 MB – 10,240 MB (in 1 MB increments) | CPU scales with memory; more memory = more CPU |
| Ephemeral storage (/tmp) | 512 MB by default (configurable up to 10,240 MB) | Shared across warm invocations; not reset on reuse |
| Maximum execution timeout | 15 minutes (900 seconds) | Hard limit; cannot be increased |
| Deployment package size (.zip) | 250 MB (unzipped), 50 MB (zipped for direct upload) | The 250 MB unzipped limit includes all layers; use a container image for larger dependencies |
| Container image size | 10 GB (ECR image) | Larger images cause slower cold starts |
| Concurrent executions per region (default) | 1,000 | Can be increased via service quota request |
| Concurrent executions per function (default) | 1,000 (unreserved) | Can be limited with reserved concurrency |
| Request/response payload size | 6 MB (synchronous), 256 KB (asynchronous) | For larger payloads, use S3 or streaming |
| Function environment variables | 4 KB total (unencrypted) | Use AWS Secrets Manager or Parameter Store for secrets |
| Lambda Layers per function | 5 | Layer size counts toward total unzipped limit (250 MB) |
| Event source mappings per function | 10 (for SQS, DynamoDB, Kinesis) | Add more by using multiple triggers |
| Reserved concurrency per function | 0 – regional limit | Guarantees capacity for this function but subtracts it from the pool shared by other functions |
| Provisioned Concurrency per function | 0 – regional limit | Counts toward the regional concurrency limit (1,000 by default) |
| Function execution role | AWS IAM role | Lambda attaches this role to the execution environment |
How to work around limits:
- Package size: Layers do not raise the ceiling. Function code plus all layers (up to 5) must fit within 250 MB unzipped. Use layers to separate and share large libraries (pandas, OpenCV, etc.), and switch to a container image (up to 10 GB) if the total exceeds 250 MB.
- Timeout: Lambda supports up to 15 minutes. For longer jobs, use AWS Step Functions to orchestrate multiple Lambda calls, or switch to Fargate/Batch.
- Concurrency: If you anticipate more than 1,000 concurrent executions, request a limit increase in the AWS Service Quotas console. Also consider using SQS buffering to smooth traffic.
- Payload size: For payloads larger than the invocation limit (6 MB synchronous, 256 KB asynchronous), upload to S3 and pass the object key in the event. Lambda reads from S3 instead of the event body.
These limits are not negotiable — building against them from day one avoids costly refactors later.
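The S3 workaround for oversized payloads can be sketched as a small helper on the sending side. Everything here is illustrative: `build_event` and the `put_object(key, data)` callable are hypothetical names, with the callable standing in for whatever uploads bytes to S3 in a real system.

```python
import json
import uuid

MAX_INLINE_BYTES = 256 * 1024  # the asynchronous payload limit cited above


def build_event(payload, put_object):
    """Return the event to send: the payload itself, or a pointer to storage.

    put_object(key, data) is a hypothetical callable that stores bytes,
    for example a thin wrapper around an S3 upload.
    """
    data = json.dumps(payload).encode("utf-8")
    if len(data) <= MAX_INLINE_BYTES:
        return {"inline": payload}
    key = f"payloads/{uuid.uuid4()}.json"
    put_object(key, data)
    return {"s3_key": key, "size_bytes": len(data)}
```

The receiving function checks for `s3_key` and downloads the object when it is present, so small and large payloads share one code path.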
Production Patterns: Error Handling, Retries, and Observability
Lambda's default retry behaviour depends on invocation type. Synchronous invocations (API Gateway, custom apps) do NOT retry automatically — your client must handle errors. Asynchronous invocations (S3, SNS, EventBridge) retry twice using built-in retry logic, then discard the event unless you configure a dead-letter queue (DLQ). Stream-based triggers (DynamoDB Streams, Kinesis) retry until the data expires (default 24 hours) and block the shard — meaning a permanently failing function stalls your stream.
For synchronous APIs, the caller must retry with exponential backoff; inside Lambda, apply the same pattern to flaky downstream calls. For async triggers, always attach a DLQ (SQS or SNS) to capture failed events. Without a DLQ, failed events vanish after two retries and you'll never know.
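A minimal sketch of exponential backoff with jitter, usable around any callable. The attempt count, base delay, and jitter range are assumptions chosen for illustration; the `sleep` parameter exists so the delays can be observed or replaced in tests.

```python
import random
import time


def call_with_backoff(func, max_attempts=4, base_delay=0.2, sleep=time.sleep):
    """Call func(), retrying failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Base delay doubles each attempt (0.2 s, 0.4 s, 0.8 s) plus jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

In production code it is usually better to catch only errors known to be transient (timeouts, throttling) rather than every exception, and to keep the total retry time well below the function timeout.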
Observability in Lambda is driven by CloudWatch Logs, CloudWatch Metrics, and AWS X-Ray. Every invocation writes a REPORT line showing duration, billed duration, memory used, and init duration. X-Ray traces show downstream calls to DynamoDB, S3, and other services — essential for debugging latency.
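To make the REPORT line concrete, the sketch below extracts its numeric fields with a regular expression. The sample lines used with it are illustrative values, not real log output, and `parse_report` is a name made up for this example.

```python
import re

# Longer field names are matched at their own starting position, so
# "Billed Duration" and "Init Duration" are not mistaken for "Duration".
FIELD = re.compile(
    r"(Duration|Billed Duration|Memory Size|Max Memory Used|Init Duration)"
    r": ([\d.]+) (?:ms|MB)"
)


def parse_report(line):
    """Extract numeric fields from a Lambda REPORT log line."""
    fields = {name: float(value) for name, value in FIELD.findall(line)}
    # Init Duration appears only when the invocation was a cold start.
    fields["cold_start"] = "Init Duration" in fields
    return fields
```

Running this over exported logs and filtering on `cold_start` gives a quick count of how many invocations paid the initialisation cost.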
Structured logging is critical. Use JSON-formatted logs with a correlation ID (often the X-Ray trace ID) so you can correlate invocations. Avoid print() statements without context.
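A small sketch of what that can look like with only the standard library. The `log_event` helper is hypothetical; it tags each line with the request ID from the context and, when X-Ray tracing is active, the trace header that the Lambda runtime exposes in the `_X_AMZN_TRACE_ID` environment variable.

```python
import json
import os


def log_event(message, context, **fields):
    """Emit one JSON log line tagged with the invocation's identifiers."""
    entry = {
        "message": message,
        "request_id": getattr(context, "aws_request_id", "local-test"),
        "trace_id": os.environ.get("_X_AMZN_TRACE_ID"),  # None outside Lambda
        **fields,
    }
    line = json.dumps(entry)
    print(line)  # stdout is shipped to CloudWatch Logs
    return line
```

A call such as `log_event("order created", context, order_id="a1", duration_ms=42)` produces a single line that CloudWatch Logs Insights can filter by field instead of by substring.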
- Synchronous invocations: no automatic retries. The caller must handle errors.
- Asynchronous invocations: two automatic retries (about 1 minute before the first, 2 minutes before the second).
- Stream-based triggers: retry until the record expires (24 hours for DynamoDB Streams; 24 hours by default for Kinesis, longer if retention is extended).
- Always configure a dead-letter queue (DLQ) for async triggers to catch failures.
- DLQ can be an SQS queue (for processing later) or an SNS topic (for alerting).
When Lambda is the Wrong Tool — Alternatives and Trade-offs
Lambda excels at short-lived, event-driven, bursty workloads. But it's not a general-purpose compute platform. If your workload contradicts any of the following, reach for another service.
First, long-running processes: Lambda's hard 15-minute timeout means you cannot run a nightly batch job that takes an hour. Use AWS Batch or ECS/Fargate for that.
Second, stateful applications: Lambda is stateless by design. If your application needs to hold client connections (WebSockets), maintain session state in memory, or use files that persist beyond a single invocation, you'll fight the architecture. Use EC2 or ECS with sticky sessions instead.
Third, predictable, steady traffic: If your load is constant 24/7, Lambda's per-ms billing is more expensive than a low-cost EC2 instance or a reserved instance. A t3.small running 24/7 costs about $15/month; 5 million Lambda invocations at 200ms average could cost about $8 in compute (roughly what a 512 MB function would incur), but steady traffic at 100 req/s would push the cost far above that of an EC2 instance.
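The comparison is easy to reproduce. The sketch below assumes a 512 MB function, an x86 on-demand compute rate, a per-request charge, and a 30-day month, and it ignores the free tier; all four are assumptions to verify against current pricing.

```python
PRICE_PER_GB_SECOND = 0.0000166667   # assumption: x86 on-demand compute rate
PRICE_PER_MILLION_REQUESTS = 0.20    # assumption: request charge
SECONDS_PER_MONTH = 30 * 24 * 3600


def lambda_monthly_cost(invocations, avg_ms, memory_mb):
    """Estimate monthly cost as compute (GB-seconds) plus request charges."""
    gb_seconds = invocations * (avg_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * PRICE_PER_GB_SECOND
    requests = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute + requests


# Bursty workload: 5 million invocations per month.
bursty = lambda_monthly_cost(5_000_000, avg_ms=200, memory_mb=512)

# Steady workload: 100 requests per second, all month.
steady = lambda_monthly_cost(100 * SECONDS_PER_MONTH, avg_ms=200, memory_mb=512)
```

Under these assumptions the bursty case stays under $10 a month while the steady case lands in the hundreds of dollars, which is the gap the paragraph above describes.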
Fourth, heavy GPU/compute: Lambda has no GPU support. ML training, 3D rendering, or video transcoding with high compute needs are better on EC2 GPU instances or SageMaker.
Fifth, very low latency requirements (<10ms): Lambda's cold start and network overhead make it unsuitable for single-digit-millisecond use cases like real-time trading. Use containers on EC2 or custom hardware.
Finally, large binary processing: /tmp defaults to 512 MB (configurable up to 10,240 MB) and memory tops out at 10,240 MB. If you're processing multi-GB files, you'll hit storage and timeout limits. Use ECS or Batch with EFS.
The Cold Start P99 Spike That Killed Our API Response Times
- Measure p50 and p99 separately — if p99 is much higher than p50, cold starts or throttling are the likely cause.
- Use Provisioned Concurrency for latency-sensitive endpoints, but only for the minimum number needed.
- Minimise package size and externalise heavy dependencies to Lambda Layers.
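Measuring p50 and p99 separately takes only the standard library. The sample below is invented for illustration (97 warm invocations at 180 ms and 3 cold starts at 1,200 ms); in practice the durations would come from CloudWatch.

```python
import statistics


def latency_summary(durations_ms):
    """Return p50, p99, and their ratio for a list of invocation durations."""
    # n=100 yields 99 cut points; index 49 is p50 and index 98 is p99.
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    p50, p99 = cuts[49], cuts[98]
    return {"p50": p50, "p99": p99, "ratio": p99 / p50}


# Illustrative sample: mostly warm invocations plus a few cold starts.
sample = [180] * 97 + [1200] * 3
summary = latency_summary(sample)
```

A large `ratio` with a healthy p50 points at a small set of slow invocations, which is the signature of cold starts or throttling rather than slow code.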
Common mistakes to avoid
- Initialising DB connections inside the handler
- Ignoring the /tmp storage limit (512 MB by default) and assuming a clean filesystem
- Setting Lambda timeout lower than the slowest downstream dependency
- Using synchronous invocation for long-polling or cron tasks
- Forgetting to configure a DLQ for async triggers
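For the /tmp mistake in particular, one defensive pattern is to give each invocation its own scratch directory and delete it on exit. This is a sketch, assuming the payload arrives in a `data` field; in Lambda, Python's `tempfile` module places the directory under /tmp by default.

```python
import os
import tempfile


def handler(event, context):
    # A fresh directory per invocation avoids reading files that an earlier
    # warm invocation left behind.
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "upload.bin")
        with open(path, "wb") as fh:
            fh.write(event.get("data", "").encode("utf-8"))
        size = os.path.getsize(path)
    # The directory and its contents are deleted here, freeing /tmp space.
    return {"bytes_written": size, "cleaned_up": not os.path.exists(workdir)}
```

Cleaning up on exit also keeps a long-lived warm container from slowly filling its ephemeral storage.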
Interview Questions on This Topic
A Lambda function handles user logins and is experiencing high tail latency during morning traffic spikes. The p99 latency is 1.2 seconds but the p50 is 180ms. What's likely causing this and how would you fix it?