
AWS Lambda and Serverless Explained — Architecture, Triggers, and Real-World Patterns

In Plain English 🔥
Imagine you own a pizza shop but you only pay the chef when someone actually orders a pizza. The chef doesn't sit around waiting — they appear the moment an order comes in, make the pizza, then disappear. AWS Lambda is exactly that chef. You write a function, AWS runs it only when something triggers it, and you pay only for the milliseconds it runs. No server to babysit, no idle hours billed, no infrastructure to patch.

Every application needs compute power — something has to run your code. Traditionally, that meant renting a virtual machine or physical server that runs 24/7, even at 3 a.m. when zero users are online. You're paying for potential, not actual work. As cloud adoption exploded, this idle-cost problem became impossible to ignore, especially for startups and teams with unpredictable traffic spikes.

AWS Lambda, launched in 2014, flipped the model. Instead of managing servers, you upload a function — a single, focused piece of logic — and AWS handles everything else: provisioning, scaling, patching, and availability. The term 'serverless' doesn't mean there are no servers; it means YOU don't manage them. The servers exist, they're just Amazon's problem. This lets your team focus entirely on business logic instead of infrastructure operations.

By the end of this article you'll understand how Lambda executes code, how to wire it to real-world triggers like API Gateway and S3, how to avoid the cold start trap that kills performance, and how to structure a production-worthy serverless workflow. You'll also know exactly when Lambda is the right tool — and when it absolutely isn't.

How AWS Lambda Actually Executes Your Code — The Execution Model

Lambda's execution model is the foundation everything else builds on. When a trigger fires — say, an HTTP request hits API Gateway — Lambda needs to run your function. If a pre-warmed container exists from a recent invocation, Lambda reuses it. This is a 'warm start' and it's fast. If no container is available, Lambda has to bootstrap one from scratch: download your code package, spin up a runtime environment, run any initialisation code outside your handler, then finally invoke your handler. That bootstrap phase is the dreaded cold start.

Cold starts typically add 100ms–1000ms of latency depending on the runtime (.NET and Java are heavier; Node.js and Python are lighter). For a background job this is irrelevant. For a user-facing API call, it's noticeable.

Your handler function receives two objects: the event (the payload that triggered the invocation — could be an HTTP body, an S3 event, a queue message) and the context (metadata about the invocation itself — function name, memory limit, request ID). Understanding this distinction is critical: the event is about WHAT happened, the context is about WHO is running.

Code outside the handler runs once per container lifecycle. That's where you put database connections, SDK clients, and config loading — doing it inside the handler means re-initialising on every single invocation, which is both slow and wasteful.

image_resize_handler.py · PYTHON
import boto3
import json
import os
from PIL import Image
import io

# ✅ Initialise the S3 client OUTSIDE the handler.
# This runs once when the container boots (cold start),
# then gets reused across all warm invocations — saving ~50ms per call.
s3_client = boto3.client('s3')

# Target width for all resized thumbnails
THUMBNAIL_WIDTH = 200


def handler(event, context):
    """
    Triggered by an S3 PUT event whenever a new image is uploaded
    to the 'uploads-raw' bucket. Resizes it and saves a thumbnail
    to the 'uploads-thumbnails' bucket.
    """

    # The event payload from S3 contains a list of records —
    # each record represents one file upload.
    for record in event['Records']:
        source_bucket = record['s3']['bucket']['name']
        object_key    = record['s3']['object']['key']  # e.g. 'photos/sunset.jpg'

        print(f"Processing: s3://{source_bucket}/{object_key}")

        # Download the original image bytes into memory (no temp file needed)
        response      = s3_client.get_object(Bucket=source_bucket, Key=object_key)
        image_bytes   = response['Body'].read()

        # Open image with Pillow and calculate proportional height
        original_img  = Image.open(io.BytesIO(image_bytes))
        original_w, original_h = original_img.size
        ratio         = THUMBNAIL_WIDTH / original_w
        new_height    = int(original_h * ratio)

        thumbnail     = original_img.resize((THUMBNAIL_WIDTH, new_height))

        # Save resized image to an in-memory buffer — Lambda has no persistent disk
        output_buffer = io.BytesIO()
        thumbnail.save(output_buffer, format='JPEG', quality=85)
        output_buffer.seek(0)  # Rewind buffer to the start before uploading

        # Write thumbnail to the destination bucket under the same key name
        destination_bucket = os.environ['THUMBNAIL_BUCKET']  # Read from env vars, not hardcoded
        s3_client.put_object(
            Bucket      = destination_bucket,
            Key         = object_key,
            Body        = output_buffer,
            ContentType = 'image/jpeg'
        )

        print(f"Thumbnail saved: s3://{destination_bucket}/{object_key} ({THUMBNAIL_WIDTH}x{new_height})")

    # Lambda expects a return value when invoked synchronously (e.g. via API Gateway).
    # For async triggers like S3, the return value is ignored — but it's good practice.
    return {
        'statusCode': 200,
        'body': json.dumps({'processed': len(event['Records'])})
    }
▶ Output
START RequestId: 7f3a1c2b-... Version: $LATEST
Processing: s3://uploads-raw/photos/sunset.jpg
Thumbnail saved: s3://uploads-thumbnails/photos/sunset.jpg (200x133)
END RequestId: 7f3a1c2b-...
REPORT RequestId: 7f3a1c2b-... Duration: 312.45 ms Billed Duration: 313 ms Memory Size: 256 MB Max Memory Used: 89 MB
⚠️ Pro Tip: The Container Reuse Rule
Anything initialised outside your handler (DB connections, SDK clients, parsed config) is cached for the lifetime of the container — potentially minutes or hours. This is a feature, not a bug: put expensive initialisation there. But never assume a clean slate between invocations — a previous call's temp files or in-memory state might still exist. Always write to /tmp explicitly and defensively.
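The reuse rule is easy to see in a minimal sketch. The counter and scratch-file names below are illustrative, not part of the article's image-resize example; the point is that module-level state survives warm invocations while every temp file gets a collision-proof name:

```python
import os
import tempfile
import uuid

# Module-level state lives as long as the container does: 0 on a cold start,
# then climbing across warm invocations. A side effect of reuse — observe it,
# but never depend on it for correctness.
invocation_count = 0

def handler(event, context):
    global invocation_count
    invocation_count += 1

    # Defensive temp-file handling: a unique name per invocation means a
    # previous call's leftovers can never collide with this one.
    # (On Lambda, tempfile.gettempdir() resolves to /tmp.)
    scratch_path = os.path.join(tempfile.gettempdir(), f"scratch-{uuid.uuid4().hex}.dat")
    with open(scratch_path, 'wb') as f:
        f.write(b'intermediate work product')

    try:
        size = os.path.getsize(scratch_path)  # stand-in for real processing
        return {'invocation': invocation_count, 'bytes_processed': size}
    finally:
        os.remove(scratch_path)  # free /tmp so a reused container doesn't fill up
```

Run it twice in the same process and `invocation` climbs from 1 to 2 — exactly what a warm container does, and exactly what you must not rely on.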

Wiring Lambda to the Real World — Triggers, Events, and API Gateway

A Lambda function sitting alone does nothing. It needs a trigger — an AWS service that says 'hey, something happened, go run'. The trigger determines the shape of the event object your handler receives, which is why reading the AWS event schema docs for each trigger type matters.

The most common triggers in production are: API Gateway (HTTP requests), S3 (file uploads/deletions), SQS (queue messages for async processing), EventBridge (scheduled cron jobs and event routing), DynamoDB Streams (react to database changes), and SNS (fan-out notifications).

API Gateway is the one you'll use for building REST APIs or webhooks. When a request hits your endpoint, API Gateway wraps it into a structured event object and hands it to Lambda. Your function returns a response object with a statusCode, headers, and body, and API Gateway translates that back into a real HTTP response.

The Lambda Proxy Integration model (the default and recommended approach) passes the raw request to your function and expects you to construct the full HTTP response yourself. This gives you complete control over status codes, CORS headers, and response bodies. Older tutorials show Lambda custom integrations — avoid them, they're fiddly and add complexity for no gain.

For async workloads, SQS is your best friend. Rather than calling Lambda directly (which creates tight coupling), push messages to a queue and let Lambda poll and process them in batches. This naturally handles traffic bursts without rate-limit errors.
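A minimal sketch of an SQS-triggered handler using Lambda's partial batch response feature. The `process_order` logic here is hypothetical, and the `batchItemFailures` response shape only takes effect when `ReportBatchItemFailures` is enabled on the event source mapping — without it, one bad message forces the whole batch back onto the queue:

```python
import json

def process_order(order: dict) -> None:
    # Hypothetical business logic — raises on bad messages.
    if order.get('total_amount', 0) <= 0:
        raise ValueError(f"invalid amount in order {order.get('order_id')}")

def handler(event, context):
    # Only the messages listed in batchItemFailures return to the queue for
    # retry; the rest of the batch is deleted as successfully processed.
    failures = []
    for record in event['Records']:
        try:
            process_order(json.loads(record['body']))
        except Exception as exc:
            print(f"Message {record['messageId']} failed: {exc}")
            failures.append({'itemIdentifier': record['messageId']})
    return {'batchItemFailures': failures}
```

The same pattern answers the DynamoDB Streams interview question later in this article: report only the failed item identifiers so successful records aren't reprocessed.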

orders_api_handler.py · PYTHON
import json
import boto3
import uuid
import os
from datetime import datetime, timezone

# DynamoDB resource initialised at cold-start — reused on warm invocations
dynamodb    = boto3.resource('dynamodb')
orders_table = dynamodb.Table(os.environ['ORDERS_TABLE_NAME'])


def handler(event, context):
    """
    Handles POST /orders from API Gateway (Lambda Proxy Integration).
    Creates a new order record in DynamoDB and returns the order ID.

    API Gateway event shape (key fields):
      event['httpMethod']         -> 'POST'
      event['path']               -> '/orders'
      event['body']               -> Raw JSON string of the request body
      event['requestContext']     -> Metadata including caller identity
    """

    http_method = event.get('httpMethod', '')

    # Route guard — this function only handles order creation
    if http_method != 'POST':
        return _build_response(405, {'error': f'Method {http_method} not allowed'})

    # API Gateway sends the body as a raw string — we must parse it
    try:
        request_body = json.loads(event.get('body') or '{}')
    except json.JSONDecodeError:
        return _build_response(400, {'error': 'Request body must be valid JSON'})

    # Validate required fields before touching the database
    required_fields = ['customer_id', 'items', 'total_amount']
    missing_fields  = [f for f in required_fields if f not in request_body]
    if missing_fields:
        return _build_response(400, {'error': f'Missing required fields: {missing_fields}'})

    # Build the order record
    order_id    = str(uuid.uuid4())  # Unique ID for this order
    created_at  = datetime.now(timezone.utc).isoformat()  # ISO 8601, always UTC

    order_record = {
        'order_id':      order_id,
        'customer_id':   request_body['customer_id'],
        'items':         request_body['items'],
        'total_amount':  str(request_body['total_amount']),  # DynamoDB doesn't support float natively
        'status':        'PENDING',
        'created_at':    created_at
    }

    # Write to DynamoDB — put_item overwrites if the key already exists,
    # so ConditionExpression ensures we never silently stomp an existing order
    orders_table.put_item(
        Item=order_record,
        ConditionExpression='attribute_not_exists(order_id)'
    )

    print(f"Order created: {order_id} for customer {request_body['customer_id']}")

    return _build_response(201, {
        'order_id':   order_id,
        'status':     'PENDING',
        'created_at': created_at
    })


def _build_response(status_code, body_dict):
    """
    Constructs the response object API Gateway expects.
    CORS headers are included so browser-based clients can call this API.
    Without these headers, browsers silently block the response.
    """
    return {
        'statusCode': status_code,
        'headers': {
            'Content-Type':                'application/json',
            'Access-Control-Allow-Origin': '*'  # Tighten to your domain in production
        },
        'body': json.dumps(body_dict)
    }
▶ Output
START RequestId: 9a2e4f7d-... Version: $LATEST
Order created: 3c8b1a2f-4e9d-4f3a-b1c2-7d8e9f0a1b2c for customer cust_00847
END RequestId: 9a2e4f7d-...
REPORT RequestId: 9a2e4f7d-... Duration: 187.22 ms Billed Duration: 188 ms Memory Size: 128 MB Max Memory Used: 54 MB

# HTTP Response seen by the client:
# Status: 201 Created
# Body: {"order_id": "3c8b1a2f-4e9d-4f3a-b1c2-7d8e9f0a1b2c", "status": "PENDING", "created_at": "2024-03-15T14:22:01.483921+00:00"}
⚠️ Watch Out: The Missing CORS Headers Trap
If your Lambda-backed API works perfectly in Postman but fails in the browser with a CORS error, you're missing Access-Control-Allow-Origin in your Lambda response AND you haven't configured the OPTIONS preflight method in API Gateway. You need both — Lambda handles the headers on real requests, but API Gateway must respond to preflight OPTIONS requests independently (enable CORS on the resource in the API Gateway console or via CloudFormation).
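If you route OPTIONS to the same proxy-integrated function (as the SAM template in this article does), the preflight branch takes only a few lines. A sketch — the wildcard origin and allowed headers are placeholders you'd tighten for production:

```python
def handler(event, context):
    if event.get('httpMethod') == 'OPTIONS':
        # Preflight response: the browser only inspects the headers,
        # so an empty 204 body is enough.
        return {
            'statusCode': 204,
            'headers': {
                'Access-Control-Allow-Origin':  '*',   # tighten to your domain
                'Access-Control-Allow-Methods': 'POST,OPTIONS',
                'Access-Control-Allow-Headers': 'Content-Type,Authorization',
            },
            'body': ''
        }

    # ... normal request handling would continue here ...
    return {
        'statusCode': 200,
        'headers': {'Access-Control-Allow-Origin': '*'},
        'body': '{"ok": true}'
    }
```

Note the real-request branch still needs its own Access-Control-Allow-Origin header — the preflight grant alone doesn't exempt the actual response.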

Cold Starts, Memory Tuning, and the Performance Levers You Actually Control

Lambda gives you one direct performance dial: memory. You set it anywhere from 128 MB to 10,240 MB. What most developers don't realise is that CPU allocation scales proportionally with memory. A 1,024 MB Lambda function gets roughly 8x the CPU of a 128 MB one. If your function is CPU-bound (image processing, data transformation, encryption), doubling the memory can halve the execution time — and since you pay for duration × memory, the cost often stays the same or even drops.
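The cost arithmetic behind that claim is worth seeing once. Assuming an idealised CPU-bound function whose duration halves when memory (and therefore CPU) doubles — real workloads won't scale this cleanly — the per-invocation duration cost is unchanged:

```python
PRICE_PER_GB_SECOND = 0.0000166667  # published on-demand rate

def invocation_cost(memory_mb: int, duration_ms: int) -> float:
    """Duration cost of one invocation: GB-seconds x price."""
    return (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND

# CPU-bound job: 800 ms at 512 MB vs ~400 ms at 1024 MB (idealised halving)
cost_512  = invocation_cost(512, 800)    # 0.4 GB-seconds
cost_1024 = invocation_cost(1024, 400)   # also 0.4 GB-seconds

print(f"512 MB  x 800 ms: ${cost_512:.10f}")
print(f"1024 MB x 400 ms: ${cost_1024:.10f}")
# Identical cost — but the caller waits half as long at 1024 MB.
```

When duration shrinks by more than the memory multiplier (common for heavily CPU-bound code), the bigger configuration is outright cheaper.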

Cold starts are the other major lever. Three strategies exist: Provisioned Concurrency, keeping functions warm with scheduled EventBridge pings, and minimising package size.

Provisioned Concurrency is the only mechanism AWS itself supports for eliminating cold starts: you pay for a set number of pre-warmed execution environments to stay alive at all times. It costs more than on-demand, but cold starts disappear entirely for requests served within that concurrency. Use it for customer-facing APIs where tail latency matters.

Package size matters because Lambda has to download your deployment package before running it. A 50 MB Python package with unnecessary dependencies cold-starts noticeably slower than a 3 MB lean package. Use Lambda Layers to separate large dependencies (like numpy or Pillow) from your application code, and use .zip deployment packages rather than container images unless you specifically need Docker tooling.

Finally, watch your timeout setting. The default is 3 seconds. Downstream API calls, DB queries, and S3 operations can easily exceed this. Set it realistically (15 minutes max) and always handle partial failures gracefully.
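One defensive pattern worth knowing here: the context object exposes `get_remaining_time_in_millis()`, which counts down toward your configured timeout. A sketch — the budget numbers and `call_external_api` are illustrative stand-ins for your own dependency:

```python
DOWNSTREAM_BUDGET_MS = 8_000   # assumed worst-case latency of the external call
SAFETY_MARGIN_MS     = 1_000   # headroom to build and return our own response

def call_external_api(payload: dict) -> dict:
    # Hypothetical slow dependency — in reality an HTTP call, DB query, etc.
    return {'result': 'ok'}

def handler(event, context):
    # Fail fast with a meaningful error instead of letting Lambda kill the
    # invocation mid-call and surface an opaque 504 to the client.
    remaining = context.get_remaining_time_in_millis()
    if remaining < DOWNSTREAM_BUDGET_MS + SAFETY_MARGIN_MS:
        return {'statusCode': 503, 'body': 'Insufficient time budget, retry later'}

    return {'statusCode': 200, 'body': str(call_external_api(event))}
```

The same check is useful mid-loop in batch handlers: stop taking new work when the remaining budget can't cover another item.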

serverless-function-config.yaml · YAML
# AWS SAM (Serverless Application Model) template — the standard way to
# define Lambda functions as Infrastructure-as-Code.
# Run: sam build && sam deploy --guided

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Description: Orders API — production-grade Lambda configuration

Globals:
  Function:
    # Runtime for all functions in this template unless overridden
    Runtime: python3.12
    # Timeout generous enough for DynamoDB + downstream calls, not infinite
    Timeout: 30
    # Environment variables available to all functions
    Environment:
      Variables:
        LOG_LEVEL:          INFO
        ORDERS_TABLE_NAME:  !Ref OrdersTable

Resources:

  OrdersApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: orders-api-handler
      CodeUri:       src/
      Handler:       orders_api_handler.handler

      # 512 MB gives ~4x CPU vs 128 MB — worth it for JSON parsing + DynamoDB calls
      # Run AWS Lambda Power Tuning tool to find YOUR optimal memory setting
      MemorySize: 512

      # Provisioned concurrency: 5 containers always warm for the production alias
      # This eliminates cold starts for the first 5 concurrent requests
      # Cost: ~$0.0000041 per GB-second × 5 containers × all hours in month
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5

      # IAM permissions — principle of least privilege
      # Only grant what this specific function actually needs
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref OrdersTable

      # API Gateway trigger — Lambda Proxy Integration (recommended)
      Events:
        CreateOrder:
          Type:  Api
          Properties:
            Path:   /orders
            Method: POST

        # OPTIONS method needed for browser CORS preflight requests
        CreateOrderOptions:
          Type:  Api
          Properties:
            Path:   /orders
            Method: OPTIONS

  OrdersTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName:    orders
      BillingMode:  PAY_PER_REQUEST  # Serverless billing — no provisioned capacity to manage
      AttributeDefinitions:
        - AttributeName: order_id
          AttributeType: S
      KeySchema:
        - AttributeName: order_id
          KeyType:        HASH

      # Point-in-time recovery — always enable for production data
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true

Outputs:
  OrdersApiEndpoint:
    Description: "API Gateway endpoint for the Orders API"
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/orders"
▶ Output
# sam build output:
Building codeuri: src/ runtime: python3.12
Build Succeeded
Built Artifacts: .aws-sam/build

# sam deploy output (key lines):
Deploying with following values
Stack name : orders-api-stack
Region : eu-west-1
Confirm changes: Yes

CloudFormation events:
CREATE_COMPLETE AWS::DynamoDB::Table OrdersTable
CREATE_COMPLETE AWS::Lambda::Function OrdersApiFunction
CREATE_COMPLETE AWS::ApiGateway::RestApi ServerlessRestApi

Outputs:
OrdersApiEndpoint: https://x7k2mn3p4q.execute-api.eu-west-1.amazonaws.com/Prod/orders

Successfully created/updated stack - orders-api-stack in eu-west-1
🔥 Interview Gold: Lambda Power Tuning
AWS publishes an open-source Step Functions state machine called 'Lambda Power Tuning' (github.com/alexcasalboni/aws-lambda-power-tuning). It runs your function at every memory configuration from 128 MB to 10 GB, measures cost and duration, and plots the optimal setting. Mentioning this tool in an interview signals you understand production cost optimisation, not just configuration.
| Aspect | AWS Lambda (Serverless) | EC2 Instance (Traditional) |
|---|---|---|
| Billing model | Per 1 ms of execution + invocation count | Per hour the instance runs (even idle) |
| Scaling | Automatic — up to 1,000 concurrent by default | Manual or Auto Scaling Group (minutes to scale) |
| Max execution time | 15 minutes per invocation | Unlimited — process runs indefinitely |
| Cold start latency | 100 ms–1 s for first request after idle period | None (process stays resident in memory) |
| State management | Stateless — no persistent memory between calls | Stateful — in-memory state survives between requests |
| Long-running workloads | Not suitable (15 min cap) | Ideal — batch jobs, ML training, websockets |
| Operational overhead | Near zero — AWS patches, scales, monitors | High — OS updates, capacity planning, monitoring setup |
| Best for | Event-driven, spiky, short-duration tasks | Steady, high-throughput, stateful applications |

🎯 Key Takeaways

  • Cold starts are a real cost — put all initialisation (DB clients, SDK objects, config) outside your handler so Lambda reuses it across warm invocations. This alone can cut average latency by 40-200ms.
  • Memory is your CPU dial — increasing Lambda memory from 128 MB to 1024 MB allocates 8x the CPU. For compute-heavy functions this can reduce duration enough that the higher-memory run is cheaper per invocation.
  • Lambda is stateless by design — never rely on in-memory state surviving between invocations. Use DynamoDB, ElastiCache, or S3 for any state that must persist. Any 'persistence' you observe in /tmp or module-level variables is a side effect of container reuse, not a guarantee.
  • Match the trigger to the job — API Gateway for synchronous HTTP APIs, SQS for reliable async processing with retry semantics, EventBridge for scheduled tasks and event routing. Picking the wrong trigger means fighting the tool instead of building your feature.

⚠ Common Mistakes to Avoid

  • Mistake 1: Initialising DB connections inside the handler — Every invocation opens a new connection, exhausting your RDS connection pool within minutes under load. The symptom is 'too many connections' errors that appear fine locally but explode in production. Fix: move the connection client instantiation outside the handler function so Lambda reuses the same connection across warm invocations.
  • Mistake 2: Ignoring the /tmp storage limit (512 MB by default, configurable up to 10,240 MB) and assuming a clean filesystem — Lambda does NOT guarantee a clean /tmp directory between warm invocations. A previous call's temp files may still be there, causing 'file already exists' errors or stale data bugs. Fix: always generate unique filenames (use uuid or the request ID from context.aws_request_id) and clean up /tmp explicitly at the end of your handler.
  • Mistake 3: Setting Lambda timeout lower than the slowest downstream dependency — If your function calls an external API that sometimes takes 8 seconds and your timeout is 3 seconds, Lambda kills the invocation and returns a 504 to the caller with no useful error message. Fix: set your Lambda timeout to at least 2x your expected worst-case downstream latency, implement retries with exponential backoff for transient failures, and use AWS X-Ray tracing to measure where time is actually spent.

Interview Questions on This Topic

  • Q: A Lambda function handles user logins and is experiencing high tail latency during morning traffic spikes. The p99 latency is 1.2 seconds but the p50 is 180ms. What's likely causing this and how would you fix it?
  • Q: Explain the difference between synchronous and asynchronous Lambda invocation models. Give a concrete example of when you'd choose one over the other, and what happens to errors in each model.
  • Q: Your team wants to use Lambda to process DynamoDB Stream events. A batch of 100 records comes in, your function processes 60 successfully, then fails on record 61. What happens to all 100 records, and how would you implement partial batch failure handling to avoid reprocessing the first 60?

Frequently Asked Questions

How much does AWS Lambda cost in production?

Lambda charges on two axes: number of requests ($0.20 per 1 million requests) and duration rounded up to the nearest 1ms ($0.0000166667 per GB-second). The free tier covers 1 million requests and 400,000 GB-seconds per month permanently — not just the first year. A function using 512 MB running for 200ms, invoked 5 million times a month, costs roughly $9 before the free tier and under $3 after it. Compare that to a t3.small EC2 at ~$15/month that sits idle most of the time.
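The arithmetic behind that ballpark, as a quick sanity check using the prices quoted above:

```python
REQUEST_PRICE   = 0.20 / 1_000_000   # per request
GB_SECOND_PRICE = 0.0000166667       # per GB-second
FREE_REQUESTS   = 1_000_000          # monthly free tier
FREE_GB_SECONDS = 400_000

invocations = 5_000_000
gb_seconds  = invocations * 0.200 * (512 / 1024)   # 200 ms at 512 MB = 500,000 GB-s

before_free_tier = invocations * REQUEST_PRICE + gb_seconds * GB_SECOND_PRICE
after_free_tier  = (max(invocations - FREE_REQUESTS, 0) * REQUEST_PRICE
                    + max(gb_seconds - FREE_GB_SECONDS, 0) * GB_SECOND_PRICE)

print(f"Before free tier: ${before_free_tier:.2f}")   # ≈ $9.33
print(f"After free tier:  ${after_free_tier:.2f}")    # ≈ $2.47
```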

What is a Lambda cold start and can it be completely eliminated?

A cold start is the initialisation delay when Lambda has to provision a fresh execution environment because no warm container is available. It includes downloading your code, starting the runtime, and running module-level initialisation code. Provisioned Concurrency is the only way to fully eliminate cold starts — you pay to keep N containers permanently warm. Keeping package sizes small (under 5 MB) and using lighter runtimes (Python, Node.js) minimises cold start duration but doesn't eliminate the occurrence.

Can Lambda handle long-running background jobs?

Lambda has a hard 15-minute maximum execution timeout. For jobs that run longer than that — nightly batch reports, large file processing, ML model training — you need a different tool. AWS Step Functions can chain multiple Lambda calls to work around the timeout for sequential tasks. For truly long-running jobs, AWS Fargate (containerised tasks) or AWS Batch are the right choices. Trying to hack around Lambda's timeout with recursive self-invocation is an anti-pattern and will create billing surprises.

TheCodeForge Editorial Team

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.
