FastAPI Deployment — Docker, Uvicorn and Gunicorn

📍 Part of: Python Libraries → Topic 48 of 51
Production-grade FastAPI deployment guide.
🔥 Advanced — solid Python foundation required
In this tutorial, you'll learn
  • Gunicorn is the right process manager on bare VMs and EC2 — it handles worker crash recovery, graceful shutdown, and multi-core utilisation without an external orchestrator. On Kubernetes, Uvicorn alone with multiple pod replicas is equally valid and often simpler to reason about.
  • The (2×CPU)+1 worker formula is for synchronous WSGI workers. For async FastAPI, start with 2–4 workers and benchmark under load — the event loop provides concurrency, not worker count. Adding workers beyond CPU saturation multiplies memory and connection consumption with no throughput gain.
  • Use a gunicorn.conf.py file for all Gunicorn configuration — versionable, readable, and supports dynamic values. Set max_requests and max_requests_jitter to recycle workers and prevent thundering herd restarts from slow memory leaks.
Quick Answer
  • FastAPI requires an ASGI server; Uvicorn handles the async HTTP event loop
  • Gunicorn acts as a process manager on bare VMs and EC2: it spawns, monitors, and restarts Uvicorn workers; on Kubernetes, Uvicorn alone with multiple pod replicas is equally valid
  • Worker count formula: (2 × CPU cores) + 1 is a heuristic for CPU-bound WSGI apps — for async FastAPI, start with 2–4 workers per container and benchmark; the event loop handles concurrency, not worker count
  • Gunicorn's --timeout prevents stuck workers from hanging requests indefinitely — set it to at least 2× your slowest endpoint's p99 latency
  • Use --max-requests and --max-requests-jitter to periodically recycle workers and mitigate slow memory leaks without thundering herd restarts
  • Biggest mistake: using --reload in production. It runs a file watcher that consumes CPU even with no traffic, and it is a developer-only feature that has no place in a production image
  • In 2026, @app.on_event is deprecated — use the lifespan context manager for startup and shutdown logic
  • Secrets do not belong in docker run -e flags or Dockerfiles — use Kubernetes Secrets, AWS Secrets Manager, or Vault with a CSI driver
🚨 START HERE

FastAPI Deployment Troubleshooting Quick Guide

Exact commands for the five failures you will encounter in production. Run these before changing any code.
🟡

Container exits immediately after start

Immediate Action: Check container logs for Python import errors, missing modules, or Pydantic ValidationError on settings.
Commands
docker logs <container_id> --tail 100 2>&1 | grep -E '(Error|Exception|Traceback|ValidationError)'
docker run --rm --entrypoint python <image> -c "from io.thecodeforge.main import app; print('import OK')"
Fix Now: If ValidationError: check that all FORGE_ environment variables are set correctly in your docker run command or compose file. If ImportError: rebuild the image, because a dependency is missing from requirements.txt or the pip install layer is stale. Run 'docker build --no-cache' to force a clean install.
🟡

Gunicorn starts but workers die immediately with 'address already in use'

Immediate Action: Check if port 8000 is already bound by another process or a previous container that didn't exit cleanly.
Commands
docker exec <container> sh -c "ss -tulpn | grep 8000"
docker ps -a | grep 8000 # look for exited containers still holding the port mapping
Fix Now: Stop and remove any container with a -p 8000:8000 mapping: 'docker rm -f <old_container>'. If on a bare VM, find and kill the process holding the port: 'fuser -k 8000/tcp'. Then restart your container.
🟡

Health check returns 200 but readiness returns 503

Immediate Action: The app is alive but cannot reach its database. Test database connectivity directly from inside the container.
Commands
docker exec <container> sh -c "python -c \"from io.thecodeforge.config import settings; print(settings.database_url)\""
docker exec <container> python -c "import asyncio, os, databases; db = databases.Database(os.environ['FORGE_DATABASE_URL']); asyncio.run(db.connect()); print('DB OK')" # reads the URL inside the container, not from the host shell
Fix Now: If the URL prints correctly but connection fails: check network connectivity between the container and the database host. In Docker: verify both are on the same network ('docker network inspect <network>'). In Kubernetes: check NetworkPolicy rules and verify the database service DNS resolves ('kubectl exec <pod> -- nslookup <db-service-name>').
🟠

Slow response times under load — p99 latency climbing even with low CPU

Immediate Action: Check if workers are saturated or if the event loop is blocked by synchronous code.
Commands
docker exec <container> sh -c "ps aux | grep '[g]unicorn' | wc -l" # counts the Gunicorn master plus its workers; the Uvicorn loop runs inside each worker
docker stats <container> --no-stream # check CPU% per container; if it's near 100%, workers are CPU-saturated
Fix Now: If CPU is low but latency is high: the event loop is blocked. Search your codebase for synchronous blocking calls in async functions: time.sleep, requests.get, synchronous DB drivers. Replace with async equivalents. If CPU is saturated: increase workers in gunicorn.conf.py up to (available container CPU cores × 2), but watch memory, since each worker adds 50-150MB.
🟡

Environment variables not loading inside container — settings validation fails

Immediate Action: Verify what environment the running container actually sees, not what you think you passed.
Commands
docker exec <container> env | grep FORGE_ # shows what the process actually receives
docker inspect <container> --format='{{range .Config.Env}}{{println .}}{{end}}' | grep FORGE_
Fix Now: If FORGE_ variables are missing: check your docker-compose.yml environment section or docker run -e flags. For Kubernetes: run 'kubectl describe pod <pod-name>' and look under 'Environment'. A secretKeyRef is rendered as '<set to the key X in secret Y>' whether or not that key exists, so also read the Events section, where a missing Secret or key on a non-optional reference typically appears as CreateContainerConfigError. Confirm the key exists and is non-empty with: 'kubectl get secret <secret-name> -o jsonpath="{.data.<key>}" | base64 -d'.
Production Incident

Missing Environment Variable Causes Silent Startup — Then Full Outage on First Request

A staging deployment started without errors, passed all health checks, and served 200s for 4 minutes before every API call began returning 500. The root cause: an empty DATABASE_URL that Pydantic silently accepted because the field was typed as str, not PostgresDsn.
Symptom: FastAPI app starts successfully. Health check returns 200. Kubernetes marks the pod as ready and routes traffic. Four minutes later, every endpoint returns HTTP 500 with ConnectTimeoutError in the logs: 'could not connect to server: Name or service not known.' The logs show 'NoneType object has no attribute connect' on the database call.
Assumption: The team assumed Pydantic would raise a ValidationError if DATABASE_URL was missing or empty, preventing the app from starting. They had validated this assumption in local development where the .env file was always present. In the staging environment, the Kubernetes Secret reference was misconfigured: the env var was present in the pod spec but mapped to a non-existent Secret key, so the pod received an empty string. An empty string passes str validation.
Root cause: The Pydantic model used database_url: str instead of database_url: PostgresDsn. An empty string is a valid str. Pydantic accepted it, the app started, the health check passed, the readiness probe (which was the same lightweight /health endpoint, not a separate DB-checking /ready endpoint) returned 200, and Kubernetes routed traffic. The connection failure only surfaced when the first actual database call was made, four minutes into the deployment when the smoke test hit a data-fetching endpoint.
Fix: Two changes, both required. First: changed database_url: str to database_url: PostgresDsn in the Pydantic model. An empty string now raises ValidationError at startup, so the app refuses to start and Kubernetes never marks it as ready. Second: separated /health (lightweight, no DB call) from /ready (calls SELECT 1 against the database). Now even if a future misconfiguration produces a valid-looking but wrong URL, the readiness probe catches the connectivity failure before traffic is routed. Also added a Kubernetes Secret validator in the CI pipeline that verifies all referenced Secret keys exist in the target namespace before a deployment is applied.
Key Lesson
  • Never use plain str for connection strings in Pydantic models. Use PostgresDsn, HttpUrl, or the appropriate specialised type. The specialised types validate format at startup; str validates nothing.
  • Separate /health (liveness, no dependencies) from /ready (readiness, checks real dependencies including database connectivity). A single combined endpoint used for both purposes will either restart pods during DB outages or fail to catch connectivity problems before traffic is routed.
  • Test your Kubernetes Secret references in CI before deploying. A misconfigured secretKeyRef can leave the container with an empty or missing value, and Pydantic won't catch it unless your field type enforces format constraints.
  • Measure your startup time and set initialDelaySeconds on the readiness probe accordingly. If startup takes 4 seconds and initialDelaySeconds is 3, you will fail the first readiness check on every deployment and briefly drop from rotation.
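To make the first lesson concrete, here is a minimal sketch of the difference between the two field types. It assumes Pydantic is installed and uses a plain BaseModel with hypothetical class names (LooseSettings, StrictSettings) rather than the project's actual settings class, so treat it as an illustration of the validation behaviour only.

```python
from pydantic import BaseModel, PostgresDsn, ValidationError


class LooseSettings(BaseModel):
    database_url: str          # accepts any string, including ""


class StrictSettings(BaseModel):
    database_url: PostgresDsn  # must parse as a PostgreSQL URL


def accepts(model: type[BaseModel], value: str) -> bool:
    """Return True if the model validates the given database_url."""
    try:
        model(database_url=value)
    except ValidationError:
        return False
    return True


# The URL below is a made-up example, not a real credential.
print(accepts(LooseSettings, ""))    # the empty string slips through str
print(accepts(StrictSettings, ""))   # rejected when the model is built
print(accepts(StrictSettings, "postgresql://forge:secret@db:5432/forge"))
```

With the strict type, the failure moves from the first database call to model construction, which is the point at which a container should refuse to start.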
Production Debug Guide

Symptoms and actions for the failures that actually appear in production — not the ones that are easy to demo.

Symptom: Container memory grows steadily over hours and never stabilises.
Action: This is almost always either unclosed async generators, database sessions that aren't returned to the pool, or a genuine Python object leak. First, confirm it's not just Python's allocator holding freed memory (normal behaviour). Set max_requests = 500 in gunicorn.conf.py temporarily: if memory resets after each worker restart, you have a per-request leak, not a structural one. For deeper diagnosis, use tracemalloc from a debug endpoint: call tracemalloc.start() once at startup, then call tracemalloc.take_snapshot() in a route and compare snapshots across requests. Check all async context managers: every 'async with db.transaction()' must exit cleanly or the session leaks.

Symptom: Workers are killed with SIGKILL by Gunicorn. Logs show 'WORKER TIMEOUT (pid:N)' before the kill.
Action: A worker has stopped sending heartbeats to the Gunicorn master. It's either stuck in a synchronous blocking call, a CPU-bound loop with no await, or it hit the timeout limit on a legitimately slow request. Check for synchronous blocking operations called from async code: time.sleep() instead of asyncio.sleep(), synchronous database drivers called from an async function, or a requests.get() instead of httpx.AsyncClient. If the endpoint is legitimately slow (batch processing, large file generation), increase timeout in gunicorn.conf.py to at least 2x the endpoint's p99 latency, or move the work to a background task queue (Celery, ARQ, FastAPI BackgroundTasks for lightweight cases).

Symptom: Connection pool exhaustion: database throws 'too many connections' during traffic spikes.
Action: Calculate your total connection consumption: gunicorn_workers × pod_count × db_pool_max_size. This must be less than PostgreSQL's max_connections (default 100). With 4 workers, 10 pods, and pool max 5: 4×10×5=200 connections, double the PostgreSQL default. Fix: reduce db_pool_max_size in ForgeSettings, add a connection pooler (PgBouncer in transaction mode in front of PostgreSQL), or increase PostgreSQL's max_connections if your instance has the memory to support it. PgBouncer is the correct long-term solution: it multiplexes hundreds of application connections onto a smaller number of PostgreSQL connections.

Symptom: Rolling update in Kubernetes causes 30-60 seconds of 502 errors during every deployment.
Action: This is a graceful shutdown race condition. The old pods are receiving SIGTERM and stopping, but the load balancer is still routing traffic to them for a few seconds while its endpoint list updates. Fix: add a preStop lifecycle hook to sleep for 5 seconds before Gunicorn receives SIGTERM, giving the load balancer time to remove the pod from rotation. In the Kubernetes pod spec: lifecycle.preStop.exec.command: ['sleep', '5']. Also verify graceful_timeout in gunicorn.conf.py is long enough for your slowest in-flight request to complete.
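The connection arithmetic from the pool-exhaustion entry above can be sketched as a small helper you might run in CI. The function names and the reserved figure (slots held back for admin and monitoring sessions) are assumptions made for illustration; they do not come from the guide.

```python
def total_db_connections(workers_per_pod: int, pods: int, pool_max_size: int) -> int:
    """Worst-case number of connections the application tier can open."""
    return workers_per_pod * pods * pool_max_size


def max_safe_pool_size(workers_per_pod: int, pods: int,
                       max_connections: int = 100, reserved: int = 10) -> int:
    """Largest per-worker pool that stays under the server limit.

    reserved: slots kept back for migrations, monitoring, and admin
              sessions (an assumed figure, tune it for your setup).
    """
    usable = max_connections - reserved
    return max(0, usable // (workers_per_pod * pods))


demand = total_db_connections(workers_per_pod=4, pods=10, pool_max_size=5)
print(demand)                     # 200, double the PostgreSQL default of 100
print(max_safe_pool_size(4, 10))  # 2
```

A result this small is usually the signal that a pooler such as PgBouncer is a better fix than shrinking every pool.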

FastAPI is an ASGI application. That single fact determines your entire deployment stack. Unlike traditional WSGI frameworks where a synchronous worker handles one request at a time, FastAPI's async model means a single Uvicorn worker can hold hundreds of concurrent connections open — waiting on database responses, external APIs, file I/O — without blocking. That's the performance story. The deployment story is what happens when that worker crashes, when your VM has 8 cores sitting idle, when a Kubernetes rolling update sends SIGTERM to your process, or when a junior engineer accidentally commits a .env file with production credentials.
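A minimal sketch of that concurrency claim, using only the standard library. The handler names are hypothetical and asyncio.sleep stands in for a database or HTTP call; the second handler shows the anti-pattern of blocking the loop with time.sleep.

```python
import asyncio
import time


async def fake_io_call(delay: float) -> None:
    # Stands in for a database query or HTTP call that yields to the loop.
    await asyncio.sleep(delay)


async def blocking_call(delay: float) -> None:
    # Anti-pattern: time.sleep never yields, so the whole loop stalls.
    time.sleep(delay)


async def run_concurrently(handler, count: int, delay: float) -> float:
    """Run `count` handlers on one event loop and return elapsed seconds."""
    started = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(count)))
    return time.perf_counter() - started


async_elapsed = asyncio.run(run_concurrently(fake_io_call, 100, 0.05))
blocking_elapsed = asyncio.run(run_concurrently(blocking_call, 10, 0.05))
print(f"100 awaited waits: {async_elapsed:.2f}s")
print(f"10 blocking waits: {blocking_elapsed:.2f}s")
```

The hundred awaited waits overlap and finish in roughly the time of one, while the ten blocking waits run back to back. That is why one blocking call inside an async route hurts every request sharing the worker.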

This guide covers the full production stack: Gunicorn as the process manager on bare infrastructure, Uvicorn as the ASGI worker, multi-stage Docker builds that don't ship your build toolchain to production, Pydantic Settings for configuration that fails loudly at startup rather than silently at 2 AM, and the graceful shutdown sequence that prevents connection pool exhaustion during every deployment.

One thing to clarify upfront before we start: on Kubernetes, you don't always need Gunicorn. Kubernetes itself is a process manager — it handles pod restarts, rolling updates, and health-based traffic routing. Running Uvicorn directly with multiple pod replicas, each with a single worker, is a legitimate and increasingly common pattern in 2026. Where Gunicorn earns its place is on bare VMs, EC2 instances, and any environment where you don't have an external orchestrator managing process lifecycle. We'll cover both. Know which one you're in before you copy a deployment command.

The Production Stack: Gunicorn + Uvicorn

Uvicorn is fast. It implements the ASGI specification cleanly, handles HTTP/1.1 and WebSockets, and processes async requests efficiently. What it doesn't do is manage its own process lifecycle. If the Uvicorn process crashes, it's gone. If your VM has 8 cores, a single Uvicorn process uses one of them. If a worker gets stuck in a CPU-bound loop that never yields to the event loop, every request queued behind it waits indefinitely.

Gunicorn solves all three problems. It operates as a master process — what the Gunicorn docs call the arbiter — that forks multiple worker processes and monitors them via a heartbeat mechanism. If a worker stops sending heartbeats (because it's crashed, hung, or exceeded the timeout), Gunicorn sends SIGKILL and forks a replacement. The master process itself never handles HTTP traffic. It just manages the workers.

The key configuration flag is --worker-class uvicorn.workers.UvicornWorker. This tells Gunicorn to fork workers that run Uvicorn's async event loop rather than Gunicorn's default synchronous worker. Without this flag, Gunicorn spawns sync workers that can't run FastAPI's async code correctly.

On the worker count question: the (2 × CPU) + 1 formula comes from Gunicorn's own documentation and was designed for synchronous, CPU-bound WSGI workers like Django or Flask with sync views. For async FastAPI applications, it's the wrong starting point. Each Uvicorn worker runs a full asyncio event loop that can handle hundreds of concurrent connections. The bottleneck is almost never CPU cores — it's your database connection pool size, external API rate limits, or available memory. Start with 2 workers on a 2-core machine, load test with realistic traffic patterns, watch CPU utilisation and memory, and add workers only when CPU is actually saturated. On a 4-core machine running a typical API-heavy FastAPI app, 4 workers is usually the ceiling before memory pressure outweighs concurrency gains.
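One way to encode that advice is a small helper that starts low and never exceeds a ceiling you have established by load testing. This is a sketch under stated assumptions: the function name, the FORGE_WORKERS override variable, and the default ceiling of 4 are all hypothetical choices, not Gunicorn features.

```python
import os
from typing import Optional


def pick_worker_count(cpu_cores: int, requested: Optional[str], ceiling: int = 4) -> int:
    """Choose a Gunicorn worker count for an async app.

    requested: raw value of an override env var (hypothetical FORGE_WORKERS).
    ceiling:   upper bound taken from load testing, not from a formula.
    """
    if requested is not None and requested.strip().isdigit() and int(requested) > 0:
        # An explicit override wins, but never exceeds the tested ceiling.
        return min(int(requested), ceiling)
    # Default: one worker per core, at least 2, capped at the ceiling.
    return max(2, min(cpu_cores, ceiling))


# In gunicorn.conf.py this could back the `workers` setting.
workers = pick_worker_count(os.cpu_count() or 1, os.environ.get("FORGE_WORKERS"))
print(workers)
```

Because gunicorn.conf.py is ordinary Python, a helper like this can live in the config file itself, and the ceiling stays under version control next to the load-test results that justified it.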

For production Gunicorn configuration, use a gunicorn.conf.py file rather than a long command-line flag string. It's versionable, readable, and supports Python expressions for dynamic values.

gunicorn.conf.py · PYTHON
# gunicorn.conf.py — version-control this alongside your application code
# Reference: https://docs.gunicorn.org/en/stable/settings.html

import multiprocessing

# -----------------------------------------------------------------
# Server socket
# -----------------------------------------------------------------
bind = "0.0.0.0:8000"

# -----------------------------------------------------------------
# Worker processes
# For async FastAPI: start low (2-4) and benchmark.
# The (2*CPU)+1 formula is for sync WSGI workers, not async ASGI.
# Each UvicornWorker runs a full asyncio event loop — concurrency
# comes from the event loop, not from worker count.
# -----------------------------------------------------------------
workers = 2  # tune based on load testing, not the formula
worker_class = "uvicorn.workers.UvicornWorker"

# -----------------------------------------------------------------
# Timeouts
# timeout: kill and restart a worker that hasn't responded in N seconds.
# graceful_timeout: time allowed for in-flight requests to complete
#                   after SIGTERM before workers are force-killed.
# Set timeout to at least 2x your slowest endpoint's p99 latency.
# -----------------------------------------------------------------
timeout = 60
graceful_timeout = 30
keepalive = 5

# -----------------------------------------------------------------
# Worker recycling — critical for long-running services
# max_requests: restart each worker after this many requests.
#               Mitigates slow memory leaks without a full redeploy.
# max_requests_jitter: randomises the restart point per worker.
#                      Without jitter, all workers restart simultaneously
#                      (thundering herd) causing a brief availability dip.
# -----------------------------------------------------------------
max_requests = 1000
max_requests_jitter = 100

# -----------------------------------------------------------------
# Logging — always route to stdout/stderr in containers
# accesslog "-" and errorlog "-" send to stdout/stderr so your
# container runtime (Docker, Kubernetes) captures them.
# -----------------------------------------------------------------
accesslog = "-"
errorlog = "-"
loglevel = "info"
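# Structured access log format for JSON log aggregators (Datadog, Loki, CloudWatch)
# Fields: remote (X-Forwarded-For header), method, path, status, duration_ms
# %(M)s is request time in milliseconds; %(D)s would be microseconds.
# Assumption to verify: UvicornWorker may emit access logs through Uvicorn's
# own logger, in which case this format is not applied.
access_log_format = '{"remote": "%({X-Forwarded-For}i)s", "method": "%(m)s", "path": "%(U)s", "status": %(s)s, "duration_ms": %(M)s}'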

# -----------------------------------------------------------------
# Process naming — visible in ps aux, htop, and monitoring dashboards
# -----------------------------------------------------------------
proc_name = "thecodeforge-api"
▶ Output
[2026-01-15 10:00:00] [INFO] Starting gunicorn 22.0.0
[2026-01-15 10:00:00] [INFO] Listening at: http://0.0.0.0:8000
[2026-01-15 10:00:00] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2026-01-15 10:00:00] [INFO] Booting worker with pid: 42
[2026-01-15 10:00:00] [INFO] Booting worker with pid: 43
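To see why max_requests_jitter matters, here is a small simulation of the restart thresholds. It assumes the behaviour described in Gunicorn's settings documentation, where each worker adds a random integer between 0 and max_requests_jitter to max_requests. The function name is hypothetical and the seed exists only to make the sketch repeatable.

```python
import random


def restart_thresholds(workers: int, max_requests: int, jitter: int,
                       rng: random.Random) -> list[int]:
    """Request count at which each worker would be recycled.

    Each worker adds a random value between 0 and `jitter`
    (inclusive) to max_requests.
    """
    return [max_requests + rng.randint(0, jitter) for _ in range(workers)]


rng = random.Random(7)  # seeded only so the sketch is repeatable
no_jitter = restart_thresholds(4, 1000, 0, rng)
with_jitter = restart_thresholds(4, 1000, 100, rng)
print("without jitter:", no_jitter)
print("with jitter:   ", with_jitter)
```

Without jitter every worker hits its limit at the same request count, so under evenly balanced traffic they recycle together. With jitter the thresholds spread across the 1000 to 1100 range and the restarts are staggered.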
📊 Production Insight
The --preload flag looks attractive — it loads application code in the master process before forking workers, which on Linux can reduce memory via copy-on-write. Do not use it with async FastAPI if you initialise database connection pools at module import time.
Here is what happens: asyncpg or SQLAlchemy async creates connection objects tied to an asyncio event loop in the master process. After fork(), each worker has its own event loop, but the connection objects reference the master's loop — which no longer exists in the worker. The first database call produces a cryptic 'Future attached to a different loop' error. We've debugged this at 2 AM more than once.
Rule: if you use --preload, initialise all connection pools inside the lifespan context manager, not at module level. If you can't guarantee that, skip --preload entirely. The memory savings rarely justify the debugging cost.
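The same class of error can be reproduced without forking, which makes the failure mode easier to recognise. This standard-library sketch creates a future on one event loop and awaits it from another; the names are hypothetical, and the bare future stands in for a connection object created at import time.

```python
import asyncio

# Simulates a resource created at import time on some other loop
# (in the --preload case, the master's loop).
import_time_loop = asyncio.new_event_loop()
stale_future = import_time_loop.create_future()


async def use_stale_resource() -> None:
    # Awaiting a future owned by a different loop is rejected by asyncio.
    await stale_future


def try_in_worker_loop() -> str:
    """Run the coroutine on a fresh loop, as a forked worker would."""
    try:
        asyncio.run(use_stale_resource())
    except RuntimeError as exc:
        return str(exc)
    return "no error"


message = try_in_worker_loop()
import_time_loop.close()
print(message)
```

The printed message ends with "attached to a different loop", the same wording that shows up in worker logs. Creating the resource inside the lifespan function avoids it because the object is then built on the worker's own loop.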
🎯 Key Takeaway
Gunicorn manages worker lifecycle on bare VMs; Kubernetes manages it at the pod level — know which environment you're in before choosing. For async FastAPI, worker count is about memory budget and CPU saturation, not the (2×CPU)+1 formula. Always use gunicorn.conf.py in production, set --max-requests with jitter to recycle workers, and never use --preload with connection pools initialised at import time.

Production-Grade Multi-Stage Dockerfile

A naive Dockerfile copies your entire project directory, installs dependencies, and produces an image that ships your build toolchain, compiler headers, pip cache, .git history, and local .env files to every environment it runs in. In a security audit this is a finding. In a CI pipeline it's a slow build. In production it's an unnecessarily large attack surface.

The solution is a multi-stage build. The builder stage installs system dependencies and compiles any C extensions (psycopg2, cryptography, numpy). The runtime stage starts clean from a slim base image and copies only the compiled packages from the builder: no gcc, no build headers, no pip internals. How much smaller the final image ends up depends on how heavy the build toolchain and its headers are, so measure it for your own dependency set rather than assuming a fixed percentage.

Alongside the Dockerfile, a .dockerignore file is not optional — it's the first line of defence against shipping things that should never leave your workstation. Without it, COPY . . sends your .git directory, your local venv, your .env file with real credentials, your editor configuration, and your __pycache__ to the Docker build context. Some of those end up in image layers. Image layers are readable by anyone with docker history access.

One more thing about user creation: the adduser instruction should appear before the COPY instructions, not after. Docker caches layers sequentially. If adduser runs after COPY, any change to your application code invalidates the adduser cache layer, which forces Docker to re-run user creation on every build that touches app code. Put adduser early, before any COPY that changes frequently.

Dockerfile · DOCKERFILE
# =================================================================
# STAGE 1: Builder
# Installs dependencies and compiles C extensions.
# This stage is discarded — nothing from it ships to production
# except the installed packages we explicitly copy.
# =================================================================
FROM python:3.12-slim AS builder

# Install build tools needed for compiled extensions (psycopg2, cryptography)
# These are NOT copied to the runtime stage
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /build

# Copy only requirements first — layer cache means pip only re-runs
# when requirements.txt changes, not on every code change
COPY requirements.txt .

# Install into an isolated prefix so we can copy the exact set of
# installed packages to the runtime stage cleanly
RUN pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir --prefix=/install -r requirements.txt

# =================================================================
# STAGE 2: Runtime
# Clean slim image — no build tools, no compiler, no pip cache.
# Only the application code and the installed packages.
# Size reduction varies with how heavy the build toolchain is.
# =================================================================
FROM python:3.12-slim AS runtime

# Prevents Python from writing .pyc files to disk
# (not useful in containers — the filesystem is ephemeral)
ENV PYTHONDONTWRITEBYTECODE=1

# Forces Python stdout/stderr to be unbuffered.
# Without this, logs may be lost on crash because they're sitting
# in Python's internal buffer when the process exits.
# Critical for CloudWatch, Stackdriver, and Loki log capture.
ENV PYTHONUNBUFFERED=1

# No PYTHONPATH override is needed: the builder's /install prefix is copied
# into /usr/local below, which is already on the interpreter's default sys.path.

# Install only the runtime system libraries (not build tools)
# libpq is the PostgreSQL client library — needed at runtime for psycopg2
RUN apt-get update && apt-get install -y --no-install-recommends \
    libpq5 \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Create a non-privileged user BEFORE copying application code.
# Reason: adduser is layer-cached independently of app code.
# If adduser runs after COPY, a code change invalidates the adduser
# layer and Docker re-runs user creation on every build.
RUN addgroup --system forgegroup \
    && adduser --system --ingroup forgegroup --no-create-home forgeuser

WORKDIR /app

# Copy installed packages from builder stage
COPY --from=builder /install /usr/local

# Copy application code — changes here do NOT invalidate earlier layers
COPY --chown=forgeuser:forgegroup . .

# Switch to non-privileged user
# Running as root in a container is a container escape risk:
# a vulnerability in your app or a dependency could give an attacker
# root access to the host if the container runtime is misconfigured.
USER forgeuser

# Document the port — this is metadata for tooling and humans.
# EXPOSE does NOT control network access. The -p flag in docker run
# or the containerPort in Kubernetes does that.
EXPOSE 8000

# Use gunicorn.conf.py for all configuration — cleaner than
# a long flag string and versionable alongside the application.
# Workers and timeout are set in gunicorn.conf.py, not hardcoded here.
CMD ["gunicorn", "-c", "gunicorn.conf.py", "io.thecodeforge.main:app"]
▶ Output
[+] Building 24.3s (14/14) FINISHED
=> [builder 1/4] FROM python:3.12-slim
=> [builder 2/4] RUN apt-get update && apt-get install gcc libpq-dev
=> [builder 3/4] COPY requirements.txt .
=> [builder 4/4] RUN pip install --prefix=/install -r requirements.txt
=> [runtime 1/5] FROM python:3.12-slim
=> [runtime 2/5] RUN apt-get install libpq5 curl
=> [runtime 3/5] RUN addgroup && adduser
=> [runtime 4/5] COPY --from=builder /install /usr/local
=> [runtime 5/5] COPY . .
=> exporting to image
Successfully built thecodeforge-api:latest
Image size: 187MB (vs 312MB single-stage)
📊 Production Insight
Always ship a .dockerignore file alongside your Dockerfile. Without it, COPY . . sends everything — including your local .env file with real credentials — to the Docker build daemon. In CI systems that cache build contexts, those credentials can persist in the cache layer longer than you expect.
Minimum .dockerignore for a Python project:
.git/
.gitignore
.env
.env.*
__pycache__/
*.pyc
*.pyo
.pytest_cache/
venv/
.venv/
*.egg-info/
dist/
build/
.vscode/
.idea/
docs/
tests/
Note that tests/ is excluded from the production image — your test suite and its fixtures have no business in a production container. If you need to run tests in CI, run them against the builder stage before the runtime stage is built, not inside the final image.
🎯 Key Takeaway
Multi-stage builds separate the build environment from the runtime image — the builder stage compiles dependencies, the runtime stage ships nothing but what's needed to run. The adduser instruction belongs before COPY for correct layer caching. A .dockerignore file is not optional — without it you risk shipping credentials and inflating image size. EXPOSE is documentation; -p and containerPort control actual network access.

Startup and Shutdown: The lifespan Context Manager

Before FastAPI 0.93.0 (early 2023), the way to run code at startup and shutdown was @app.on_event('startup') and @app.on_event('shutdown'). Both decorators are now deprecated and marked for removal in a future version. In 2026, they should not appear in any new code or tutorial. If you see them in a codebase you're inheriting, put them on your refactoring list.

The replacement is the lifespan context manager — a single async generator function that wraps your application's entire lifecycle. Everything before the yield runs at startup. Everything after the yield runs at shutdown. The yield itself is when your application accepts traffic.

This pattern is better for several reasons. Startup and shutdown logic lives in one function, so it's impossible to initialise a resource in startup without a corresponding cleanup in shutdown — they're in the same scope. It composes naturally with async context managers from your database libraries (databases, SQLAlchemy async, motor). And it works correctly with Gunicorn's graceful timeout: when Gunicorn receives SIGTERM, it sets a timer equal to graceful_timeout, drains in-flight requests, then calls the ASGI lifespan shutdown event — which triggers everything after the yield.
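The ordering can be checked with a standard-library sketch that leaves FastAPI out entirely. The simulate_server function is a hypothetical stand-in for what an ASGI server does around the application's lifetime, and the try/finally is an optional hardening choice rather than something the pattern requires.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator

events: list[str] = []


@asynccontextmanager
async def lifespan(app: object) -> AsyncIterator[None]:
    events.append("startup: open pool")        # runs before traffic
    try:
        yield                                  # app serves requests here
    finally:
        events.append("shutdown: close pool")  # runs on the way out


async def simulate_server() -> None:
    # Stand-in for the server entering and leaving the app's lifespan.
    async with lifespan(app=None):
        events.append("serving requests")


asyncio.run(simulate_server())
print(events)
```

The list comes out in startup, serving, shutdown order, which is the guarantee the rest of this section relies on.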

The graceful shutdown sequence in a Kubernetes rolling update is worth understanding completely, because getting it wrong means connection pool exhaustion on every deployment:

  1. Kubernetes sends SIGTERM to the Gunicorn master process
  2. Gunicorn stops accepting new connections and signals workers to finish in-flight requests
  3. Workers complete their current requests within graceful_timeout (default 30s)
  4. FastAPI's lifespan runs the shutdown section — closes database pools, flushes caches, deregisters from service discovery
  5. Workers exit cleanly
  6. Gunicorn master exits

If step 4 doesn't happen — because there's no shutdown handler — every rolling update leaks database connections. With SQLAlchemy's default pool of 5 connections per worker, 4 workers per pod, and 10 pods, a deployment cycle leaks 200 connection slots. PostgreSQL's default max_connections is 100. You can see where this ends.

io/thecodeforge/main.py · PYTHON
from contextlib import asynccontextmanager
from typing import AsyncGenerator

from fastapi import FastAPI
from fastapi.responses import JSONResponse
import databases

from io.thecodeforge.config import settings

# -----------------------------------------------------------------
# Database setup
# Using the 'databases' library for async PostgreSQL.
# Constructing Database() here only stores configuration; the
# connection pool is opened by database.connect() inside lifespan,
# NOT at module level. Opening it at module level breaks --preload and
# causes 'Future attached to a different loop' errors after fork().
# -----------------------------------------------------------------
database = databases.Database(
    str(settings.database_url),
    min_size=2,
    max_size=10  # align with your PostgreSQL max_connections / (workers * pods)
)


@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    """
    Manages the full application lifecycle.
    Replaces the deprecated @app.on_event('startup') and @app.on_event('shutdown').

    Startup (before yield):
      - Connect to the database and verify connectivity
      - Warm any caches
      - Register with service discovery if applicable

    Shutdown (after yield):
      - Disconnect from the database (returns connections to the pool,
        then closes pool connections to PostgreSQL)
      - Flush metrics buffers
      - Deregister from service discovery

    Gunicorn's graceful_timeout controls how long workers have to reach
    the shutdown section before being force-killed.
    """
    # --- STARTUP ---
    await database.connect()

    # Verify the database is actually reachable before accepting traffic.
    # This prevents a Kubernetes pod from passing its readiness check
    # before the connection pool is healthy.
    try:
        await database.execute("SELECT 1")
    except Exception as exc:
        # Log and re-raise — the app should not start if the DB is unreachable
        # Gunicorn will log the exception and the worker will exit
        raise RuntimeError(f"Database connectivity check failed at startup: {exc}") from exc

    yield  # Application is live and accepting traffic here

    # --- SHUTDOWN ---
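    # This runs after the worker receives SIGTERM and its in-flight requests
    # finish, whether the signal comes from the Gunicorn master or directly
    # from the container runtime (Kubernetes rolling update). It must complete
    # within graceful_timeout, or the worker is force-killed first.
    # Closing the pool here prevents connection leak on every deployment.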
    await database.disconnect()


app = FastAPI(
    title="TheCodeForge API",
    lifespan=lifespan
)


# -----------------------------------------------------------------
# Liveness probe: is the process alive and the event loop running?
# Kubernetes restarts the pod if this returns non-200 or times out.
# Keep this lightweight — no database calls. If the event loop is
# blocked, this endpoint won't respond and Kubernetes will act.
# -----------------------------------------------------------------
@app.get("/health", tags=["observability"])
async def health() -> dict:
    return {"status": "healthy"}


# -----------------------------------------------------------------
# Readiness probe: is this pod ready to receive traffic?
# Kubernetes stops sending traffic to this pod if this returns non-200.
# Unlike /health, this SHOULD check dependencies — a pod that has
# started but cannot reach its database should not receive traffic.
# -----------------------------------------------------------------
@app.get("/ready", tags=["observability"])
async def ready() -> JSONResponse:
    try:
        await database.execute("SELECT 1")
        return JSONResponse(status_code=200, content={"status": "ready"})
    except Exception:
        # Return 503: Kubernetes removes this pod from the load balancer
        # rotation until the database becomes reachable again.
        # Log the exception server-side rather than echoing it to the caller.
        return JSONResponse(
            status_code=503,
            content={"status": "not_ready", "detail": "database unreachable"}
        )
▶ Output
[2026-01-15 10:00:00] [INFO] Application startup: connecting to database
[2026-01-15 10:00:00] [INFO] Database connectivity check passed
[2026-01-15 10:00:00] [INFO] Application ready — accepting traffic
...
[2026-01-15 10:05:00] [INFO] SIGTERM received — beginning graceful shutdown
[2026-01-15 10:05:00] [INFO] Finishing in-flight requests
[2026-01-15 10:05:01] [INFO] Database pool disconnected cleanly
[2026-01-15 10:05:01] [INFO] Worker exited
📊 Production Insight
The /health and /ready endpoints serve fundamentally different purposes and should never be the same endpoint.
/health (liveness): Is this process alive? Kubernetes restarts the pod if this fails. It should be fast and dependency-free — just return 200. If your event loop is blocked by a CPU-bound task, /health won't respond, and Kubernetes will correctly restart the pod.
/ready (readiness): Can this pod serve traffic right now? Kubernetes removes the pod from the load balancer if this fails. It should check actual dependencies — database connectivity, cache availability, any external service this pod needs to function. A pod that's alive but can't reach its database should return 503 from /ready so traffic routes to healthy pods.
The mistake we see regularly: a single /health endpoint that does a database ping. During a database outage, Kubernetes interprets a failing liveness probe as 'restart this pod,' which causes a restart loop. The pod keeps restarting, never staying up long enough to recover when the database comes back. Separate liveness and readiness, and only ping the database from readiness.
🎯 Key Takeaway
The lifespan context manager is the only correct way to handle startup and shutdown in FastAPI as of 2026 — @app.on_event is deprecated. Initialise connection pools inside lifespan, not at module level, to avoid post-fork event loop conflicts. The /health and /ready endpoints serve different Kubernetes probe purposes — liveness must be lightweight and dependency-free; readiness should check actual connectivity.
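The startup/shutdown ordering described above can be seen in isolation with a standard-library sketch. This is not FastAPI's own code: `FakePool` and `run_once` are hypothetical stand-ins, and the context manager here receives the pool rather than the app. Wrapping the `yield` in `try`/`finally` is a general context-manager habit that keeps cleanup running if the body raises.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List


class FakePool:
    """Stand-in for a real async connection pool (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.events: List[str] = []

    async def connect(self) -> None:
        self.events.append("connect")

    async def disconnect(self) -> None:
        self.events.append("disconnect")


@asynccontextmanager
async def lifespan(pool: FakePool) -> AsyncIterator[None]:
    await pool.connect()          # startup: runs before any work is done
    try:
        yield                     # the application would serve requests here
    finally:
        await pool.disconnect()   # shutdown: runs even if the body raised


async def run_once(fail: bool) -> List[str]:
    pool = FakePool()
    try:
        async with lifespan(pool):
            pool.events.append("serving")
            if fail:
                raise RuntimeError("simulated crash while serving")
    except RuntimeError:
        pool.events.append("crashed")
    return pool.events


if __name__ == "__main__":
    print(asyncio.run(run_once(fail=False)))
    print(asyncio.run(run_once(fail=True)))
```

In both runs the pool is disconnected before control leaves the context manager, which is the property that prevents leaked connections.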

Robust Configuration with Pydantic Settings

Hardcoding database URLs is a security problem. Passing them as plain str fields in Pydantic is a validation problem — an empty string passes str validation and your app starts fine, then crashes on the first database call with a connection error that takes 10 minutes to trace back to a missing environment variable.

Pydantic Settings solves both: it reads configuration from environment variables (and optionally .env files for local development), and it validates the types strictly. Using PostgresDsn instead of str means an empty string, a typo, or a URL that omits the postgresql:// scheme raises a ValidationError at startup, before your app accepts a single request.

The env_prefix configuration prevents collisions with unrelated environment variables. Without a prefix, a setting named DEBUG would pick up any DEBUG variable already present in the process environment, whether or not it was meant for your application. With FORGE_ as the prefix, your settings are namespaced: FORGE_DATABASE_URL, FORGE_SECRET_KEY, FORGE_DEBUG.

The @lru_cache pattern on the settings factory function ensures the .env file is read and validated exactly once, even if get_settings() is called from multiple modules. It also makes the settings injectable via FastAPI's dependency injection system, which makes testing significantly cleaner — you can override get_settings in tests to return a test-specific configuration without modifying environment variables.
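The two ideas above, prefix namespacing and a cached factory, can be illustrated without Pydantic at all. The following is a standard-library sketch of the concepts only, not how pydantic-settings is implemented; `DemoSettings` and `load_prefixed` are hypothetical names.

```python
import os
from dataclasses import dataclass
from functools import lru_cache
from typing import Dict, Mapping

PREFIX = "FORGE_"


@dataclass(frozen=True)
class DemoSettings:
    environment: str
    debug: bool


def load_prefixed(environ: Mapping[str, str], prefix: str = PREFIX) -> Dict[str, str]:
    """Keep only variables that start with the prefix, strip it, lowercase the key."""
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


@lru_cache(maxsize=1)
def get_settings() -> DemoSettings:
    """Build settings once; later calls return the same cached object."""
    raw = load_prefixed(os.environ)
    return DemoSettings(
        environment=raw.get("environment", "production"),
        debug=raw.get("debug", "false").lower() in {"1", "true", "yes"},
    )


if __name__ == "__main__":
    os.environ["DEBUG"] = "true"          # unprefixed: ignored
    os.environ["FORGE_DEBUG"] = "false"   # prefixed: used
    get_settings.cache_clear()
    first = get_settings()
    os.environ["FORGE_DEBUG"] = "true"    # changed after the first call
    print(first.debug, get_settings() is first)
    get_settings.cache_clear()            # what a test can do to force a re-read
    print(get_settings().debug)
```

The cached object does not notice later environment changes, which is the intended behaviour in production and the reason tests call `cache_clear()` or override the factory.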

io/thecodeforge/config.py · PYTHON
from functools import lru_cache
from typing import Literal

from pydantic import Field, PostgresDsn, SecretStr, field_validator
from pydantic_settings import BaseSettings, SettingsConfigDict


class ForgeSettings(BaseSettings):
    """
    Application configuration loaded from environment variables.

    All fields with FORGE_ prefix are read from the environment.
    In local development, values are loaded from a .env file.
    In production (Kubernetes, ECS), they come from the orchestrator's
    secret injection — never from a file committed to version control.

    Validation happens at instantiation time. If any required field
    is missing or malformed, the app raises ValidationError immediately
    and refuses to start — which is the correct behaviour.
    """

    # PostgresDsn validates the full URL format including scheme, host, port, dbname.
    # An empty string or a URL without 'postgresql://' raises ValidationError.
    # Plain str would accept '' and let your app start, then crash on first DB call.
    database_url: PostgresDsn

    # SecretStr prevents the value from appearing in logs, repr(), or tracebacks.
    # settings.secret_key returns SecretStr — call .get_secret_value() to use the raw string.
    secret_key: SecretStr = Field(min_length=32)

    # Literal type restricts valid values — 'development', 'staging', 'production'.
    # Passing FORGE_ENVIRONMENT=prod (typo) raises ValidationError at startup.
    environment: Literal["development", "staging", "production"] = "production"

    debug: bool = False

    # Database pool sizing — align with (gunicorn workers × pods × this value)
    # to stay under PostgreSQL's max_connections limit
    db_pool_min_size: int = Field(default=2, ge=1, le=10)
    db_pool_max_size: int = Field(default=10, ge=1, le=50)

    # Worker configuration — read by gunicorn.conf.py if you want
    # dynamic worker count based on container CPU allocation
    workers: int = Field(default=2, ge=1, le=16)

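    @field_validator("debug", mode="after")
    @classmethod
    def debug_must_be_false_in_production(cls, v: bool, info) -> bool:
        """Prevent debug mode from running in production accidentally."""
        # Fields validate in declaration order, so 'environment' (declared
        # above 'debug') is already in info.data unless it failed validation.
        if v and info.data.get("environment") == "production":
            raise ValueError("debug must be False when environment is 'production'")
        return v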

    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        env_prefix="FORGE_",
        # extra="ignore" silently skips unrecognised keys. Switch to
        # extra="forbid" if unrecognised entries in the .env file should
        # raise a ValidationError, which catches typos in variable names.
        extra="ignore",
        case_sensitive=False,
    )


@lru_cache(maxsize=1)
def get_settings() -> ForgeSettings:
    """
    Returns a cached singleton of the application settings.

    lru_cache ensures the .env file is read and validated exactly once
    at startup, regardless of how many modules call get_settings().

    In tests, override this with:
        app.dependency_overrides[get_settings] = lambda: ForgeSettings(
            database_url='postgresql://test:test@localhost/testdb',
            secret_key='a' * 32,
        )
    """
    return ForgeSettings()


# Module-level singleton for use outside FastAPI's DI system
# (e.g., in gunicorn.conf.py or CLI scripts)
settings = get_settings()
▶ Output
# When FORGE_DATABASE_URL is missing:
ValidationError: 1 validation error for ForgeSettings
database_url
  Field required [type=missing]

# When FORGE_DATABASE_URL is an empty string:
ValidationError: 1 validation error for ForgeSettings
database_url
  Input should be a valid URL, input is empty [type=url_parsing]

# When all values are valid:
Settings validated: environment=production debug=False workers=2
📊 Production Insight
The most dangerous configuration mistake is not a missing variable — it's a present but wrong variable that passes validation. A DATABASE_URL pointing to a staging database in a production deployment will pass PostgresDsn validation, start the app, and silently write production traffic to staging data.
Defence layers:
1. Use SecretStr for the database URL in environments where you want to prevent accidental logging of the connection string (it contains the password).
2. Add a startup check in the lifespan that compares settings.environment against a value written to the database at provisioning time — if they disagree, refuse to start.
3. In Kubernetes, use separate ServiceAccounts with separate Secret access per namespace. The production pod role cannot access staging secrets and vice versa.
No amount of Pydantic validation catches a valid URL pointing to the wrong server. The defence is IAM and network policy, not field validators.
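Defence layer 2 can be sketched as a small guard called from the lifespan startup section. Everything here is hypothetical: the function and exception names are invented for illustration, and how the provisioned tag is stored and fetched (for example, a one-row metadata table written at provisioning time) is an assumption left to the reader.

```python
class EnvironmentMismatchError(RuntimeError):
    """Raised when the configured and provisioned environment tags disagree."""


def assert_environment_matches(configured: str, provisioned: str) -> None:
    """Refuse to continue if the app's environment differs from the database's tag.

    `configured` would come from settings.environment; `provisioned` from a
    value written into the database when it was created (assumed to exist).
    """
    if configured.strip().lower() != provisioned.strip().lower():
        raise EnvironmentMismatchError(
            f"Configured for {configured!r} but database is tagged "
            f"{provisioned!r}; refusing to start."
        )


if __name__ == "__main__":
    assert_environment_matches("production", "production")  # passes silently
    try:
        assert_environment_matches("production", "staging")
    except EnvironmentMismatchError as exc:
        print(exc)
```

Raising during startup means the worker exits before it serves traffic, which is the same fail-fast behaviour as the connectivity check.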
🎯 Key Takeaway
Use PostgresDsn (not str) for connection URLs — it validates format at startup rather than at first use. Use SecretStr for keys and passwords to prevent them appearing in logs and tracebacks. The @lru_cache pattern makes settings injectable and testable. No validator catches a valid URL pointing to the wrong database — defence in depth means network policy and IAM, not just Pydantic.

Environment Separation and Secrets Management

Local development uses a .env file. Staging and production do not. That sentence should settle most debates about secret management, but the details of how production secrets are delivered matter more than most articles acknowledge.

The wrong pattern, and one that still appears in deployment tutorials in 2026, is passing secrets as docker run -e DATABASE_URL=postgresql://... in a CI/CD script. This exposes the secret in:
- Shell history on the CI runner
- The process list (ps aux shows the full docker run command line, including the -e value, while the command is running)
- docker inspect output, which is readable by anyone with Docker socket access
- CI logs if the command is echoed

Kubernetes: Store secrets in Kubernetes Secrets (base64-encoded, which is an encoding rather than encryption, and stored in etcd). Mount them as environment variables in the pod spec using secretKeyRef, or mount them as files using a volume. For stronger security, use the Secrets Store CSI Driver to pull secrets directly from AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault into the pod's filesystem, so the secret never touches etcd.

AWS ECS: Use AWS Secrets Manager or Parameter Store. Reference secrets in the task definition's secrets block — ECS injects them as environment variables at task startup. The secret value is fetched at runtime by the ECS agent, not baked into the task definition.

Bare VMs: Use HashiCorp Vault with the AppRole auth method. The VM has a role ID and secret ID (injected by your provisioning system), authenticates to Vault at startup, and receives a short-lived token to fetch secrets. The actual credentials are never on disk.
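Whichever delivery mechanism is used, the application usually ends up reading either a mounted file or an environment variable. The helper below is a minimal sketch of that lookup using only the standard library; `read_secret` and the directory layout (one file per secret, named after the variable) are assumptions for illustration, not a Kubernetes or Vault API.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def read_secret(name: str, secrets_dir: Path,
                environ: Optional[Mapping[str, str]] = None) -> str:
    """Return a secret from a mounted file, falling back to an environment variable."""
    environ = os.environ if environ is None else environ
    secret_file = secrets_dir / name
    if secret_file.is_file():
        # Mounted secret files often end with a newline; strip it.
        return secret_file.read_text(encoding="utf-8").strip()
    try:
        return environ[name]
    except KeyError:
        # Report the name only, never a value.
        raise RuntimeError(
            f"Secret {name!r} not found in {secrets_dir} or the environment"
        ) from None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mount = Path(tmp)  # stands in for a mounted secrets volume
        (mount / "FORGE_SECRET_KEY").write_text("example-value\n", encoding="utf-8")
        print(read_secret("FORGE_SECRET_KEY", mount, {}))
```

Preferring the file over the variable keeps the value out of the process environment when a volume mount is available.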

For local development, .env files are fine — with two non-negotiable rules: .env is in .gitignore, and .env.example (with placeholder values) is in version control as documentation.

.env.example · BASH
# .env.example — commit this to version control
# Copy to .env and fill in real values for local development
# NEVER commit .env — it is in .gitignore
#
# In production (Kubernetes), these come from Kubernetes Secrets
# mounted as environment variables via secretKeyRef in the pod spec.
# In ECS, they come from AWS Secrets Manager referenced in the task definition.
# On bare VMs, they come from HashiCorp Vault injected at startup.

# PostgreSQL connection string
# Format: postgresql+asyncpg://user:password@host:port/database
FORGE_DATABASE_URL=postgresql+asyncpg://forge_user:change_me@localhost:5432/forge_db

# Must be at least 32 characters — used for JWT signing and session encryption
# Generate with: python -c "import secrets; print(secrets.token_hex(32))"
FORGE_SECRET_KEY=change-me-to-a-64-character-random-string-generated-with-secrets-module

# Environment tag — controls logging level, error detail exposure, and feature flags
# Valid values: development | staging | production
FORGE_ENVIRONMENT=development

# Debug mode — NEVER set to true in staging or production
FORGE_DEBUG=false

# Database connection pool sizing
# Max connections consumed = FORGE_DB_POOL_MAX_SIZE × gunicorn_workers × pod_count
# Must be less than PostgreSQL's max_connections (default: 100)
FORGE_DB_POOL_MIN_SIZE=2
FORGE_DB_POOL_MAX_SIZE=5
📊 Production Insight
A junior developer committed a .env file containing real staging credentials to a public GitHub repository. Within 90 minutes — before anyone noticed — a bot had found the credentials, authenticated to the database, exfiltrated the users table, and spun up EC2 instances in the same AWS account for crypto mining. The database credentials were also used as the SECRET_KEY, so all existing sessions were invalidated when the key was rotated.
The cost: two days of incident response, mandatory password resets for all users, a security disclosure, and one very difficult conversation with the CTO.
The prevention was three lines: .env in .gitignore, a pre-commit hook running git-secrets or truffleHog to scan for credential patterns before every commit, and separate SECRET_KEY and DATABASE_URL values so rotating one doesn't invalidate the other.
Rule: treat a committed .env file as a full security incident, not an 'oops, force-push'. Rotate every credential in that file immediately, assume they have already been exfiltrated, and check your cloud provider's billing dashboard for unexpected resources.
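To make the idea of scanning for credential patterns concrete, here is a toy checker. It is a sketch only: the two patterns are deliberately small assumptions, the function name is hypothetical, and it is no substitute for a maintained scanner such as git-secrets or truffleHog.

```python
import re
from typing import List

# Deliberately small, illustrative patterns; real scanners ship far more.
SUSPICIOUS = [
    # URL with inline user:password, e.g. scheme://user:pass@host
    re.compile(r"://[^/\s:@]+:[^@\s]+@"),
    # NAME containing a sensitive keyword, assigned a value of 8+ characters
    re.compile(r"(secret|password|token|api_key)\w*\s*=\s*\S{8,}", re.IGNORECASE),
]


def find_suspicious_lines(text: str) -> List[int]:
    """Return 1-based line numbers that look like they contain credentials.

    Only line numbers are returned so the caller never echoes a secret to logs.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and comments
        if any(pattern.search(stripped) for pattern in SUSPICIOUS):
            hits.append(number)
    return hits


if __name__ == "__main__":
    sample = (
        "FORGE_DATABASE_URL=postgresql://user:hunter2@db:5432/app\n"
        "FORGE_DEBUG=false\n"
    )
    print(find_suspicious_lines(sample))  # [1]
```

A pre-commit hook could run a check like this over staged files and exit non-zero when the list is not empty.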
🎯 Key Takeaway
Local development uses .env files; production secrets come from the orchestrator — Kubernetes secretKeyRef, ECS Secrets Manager references, or Vault AppRole. Never pass production secrets via docker run -e flags — they appear in shell history, process lists, and docker inspect output. A committed .env file is a security incident, not a mistake to revert.

Health Checks, Readiness, and Graceful Shutdown

Health checks are where the gap between 'it works on my machine' and 'it works in production' is most visible. Getting them wrong costs you downtime on every deployment — which in a team doing continuous delivery might mean several times a day.

The distinction between liveness and readiness is not semantic pedantry. It maps directly to different Kubernetes behaviours with different consequences:

Liveness probe failure → Kubernetes restarts the pod. Use this for: detecting a deadlocked event loop, a process that's running but not processing requests, or a hung worker. The probe itself must be cheap and dependency-free. If your liveness probe calls the database and the database goes down, Kubernetes will restart all your pods in a loop — making an outage worse.

Readiness probe failure → Kubernetes removes the pod from the load balancer's endpoint set. Traffic stops going to that pod, but the pod itself is not restarted. Use this for: verifying database connectivity, checking that connection pools are initialised, confirming that any required cache warmup has completed. A pod that's alive but can't reach its database should return 503 from readiness so traffic routes to healthy pods — not 503 from liveness, which would trigger a restart loop.
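The mapping from dependency state to probe response can be sketched without a web framework. The functions below are hypothetical and use only asyncio; the one addition beyond the text is a timeout around the dependency check, an assumption made here so that a hanging database call produces a prompt 503 instead of a probe timeout.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Check = Callable[[], Awaitable[None]]


async def readiness(check: Check, timeout: float = 1.0) -> Tuple[int, Dict[str, str]]:
    """Map a dependency check to a probe response: 200 if it passes in time, else 503."""
    try:
        await asyncio.wait_for(check(), timeout=timeout)
    except asyncio.TimeoutError:
        return 503, {"status": "not_ready", "detail": "dependency check timed out"}
    except Exception:
        # Log the exception server-side; keep the response body generic.
        return 503, {"status": "not_ready", "detail": "dependency check failed"}
    return 200, {"status": "ready"}


def liveness() -> Tuple[int, Dict[str, str]]:
    """No dependencies: if the process can answer at all, it is alive."""
    return 200, {"status": "healthy"}


async def _demo() -> None:
    async def database_down() -> None:
        raise ConnectionError("simulated outage")

    print(liveness())
    print(await readiness(database_down))


if __name__ == "__main__":
    asyncio.run(_demo())
```

During the simulated outage liveness still reports 200 while readiness reports 503, which is the split the two probes rely on.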

For Kubernetes deployment configuration, set both probes with appropriate delays and thresholds. The initialDelaySeconds on the readiness probe should account for your application's startup time: the database connectivity check in the lifespan context manager, any cache warming, and service registration. If your startup takes 5 seconds and initialDelaySeconds is 3, the first readiness check fails, and the pod joins the rotation only after a later check passes.

io/thecodeforge/main.py · PYTHON
# This file shows the complete main.py — combining lifespan, health,
# and readiness into the single application entry point.
# The lifespan implementation from the previous section is included
# here in full context.

from contextlib import asynccontextmanager
from typing import AsyncGenerator

import databases
from fastapi import FastAPI
from fastapi.responses import JSONResponse

from io.thecodeforge.config import get_settings

settings = get_settings()

database = databases.Database(
    str(settings.database_url),
    min_size=settings.db_pool_min_size,
    max_size=settings.db_pool_max_size,
)


@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    # STARTUP
    await database.connect()
    try:
        await database.execute("SELECT 1")
    except Exception as exc:
        raise RuntimeError(f"Database unreachable at startup: {exc}") from exc

    yield  # serving traffic

    # SHUTDOWN: runs after SIGTERM, within Gunicorn's graceful_timeout
    await database.disconnect()


app = FastAPI(title="TheCodeForge API", lifespan=lifespan)


@app.get("/health", tags=["observability"])
async def health() -> dict:
    """
    Liveness probe — is this process alive and the event loop running?

    Kubernetes action on failure: RESTART the pod.
    Keep this dependency-free. If this endpoint calls the database
    and the DB goes down, Kubernetes will restart all pods — turning
    a DB outage into a total availability incident.

    A blocked event loop will prevent this endpoint from responding,
    which is exactly the signal Kubernetes needs to restart the pod.
    """
    return {"status": "healthy"}


@app.get("/ready", tags=["observability"])
async def ready() -> JSONResponse:
    """
    Readiness probe — can this pod receive traffic?

    Kubernetes action on failure: REMOVE from load balancer (no restart).
    This SHOULD check actual dependencies. A pod that's alive but
    disconnected from its database should not receive traffic.

    Return 503 to signal 'alive but not ready': Kubernetes stops
    routing traffic to this pod without restarting it.
    """
    try:
        await database.execute("SELECT 1")
        return JSONResponse(status_code=200, content={"status": "ready"})
    except Exception as exc:
        return JSONResponse(
            status_code=503,
            content={
                "status": "not_ready",
                "detail": "database unreachable",
                # Do not expose exc details in production — log them instead
            }
        )
▶ Output
# Normal operation:
GET /health -> 200 {"status": "healthy"}
GET /ready -> 200 {"status": "ready"}

# During database outage:
GET /health -> 200 {"status": "healthy"} # pod stays up
GET /ready -> 503 {"status": "not_ready"} # pod removed from LB rotation

# During SIGTERM (Kubernetes rolling update):
[INFO] SIGTERM received
[INFO] Finishing 3 in-flight requests
[INFO] Database pool disconnected
[INFO] Worker exited cleanly
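The SIGTERM sequence in the output above can be simulated with plain asyncio to show what a grace period does to in-flight work. This is a toy model under stated assumptions, not how Gunicorn or Uvicorn implement shutdown; every name in it is hypothetical and the durations are arbitrary.

```python
import asyncio
from typing import List


async def handle_request(request_id: int, duration: float, log: List[str]) -> None:
    await asyncio.sleep(duration)            # simulated request work
    log.append(f"request {request_id} finished")


async def graceful_shutdown(in_flight: List[asyncio.Task], graceful_timeout: float,
                            log: List[str]) -> None:
    """Wait for in-flight work up to graceful_timeout, cancel the rest, then clean up."""
    log.append("SIGTERM received")
    if in_flight:
        done, pending = await asyncio.wait(in_flight, timeout=graceful_timeout)
        for task in pending:
            task.cancel()                     # out of time: abandon this request
        await asyncio.gather(*pending, return_exceptions=True)
        log.append(f"{len(done)} finished, {len(pending)} cancelled")
    log.append("database pool disconnected")  # stands in for the lifespan shutdown


async def simulate(graceful_timeout: float) -> List[str]:
    log: List[str] = []
    tasks = [
        asyncio.create_task(handle_request(1, 0.01, log)),
        asyncio.create_task(handle_request(2, 0.02, log)),
        asyncio.create_task(handle_request(3, 0.50, log)),
    ]
    await graceful_shutdown(tasks, graceful_timeout, log)
    return log


if __name__ == "__main__":
    for line in asyncio.run(simulate(graceful_timeout=0.1)):
        print(line)
```

With a 0.1 second grace period the two short requests complete and the long one is cancelled; raising the grace period above 0.5 seconds lets all three finish, which is the trade-off graceful_timeout controls.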
📊 Production Insight
During a Kubernetes rolling update, the sequence that prevents downtime is:
1. New pods start, pass their readiness probe, join the load balancer
2. Kubernetes sends SIGTERM to old pods
3. Gunicorn receives SIGTERM, stops accepting new connections
4. In-flight requests complete within graceful_timeout
5. lifespan shutdown section closes the database pool
6. Old pods exit cleanly
If step 1 completes before step 2 starts — which requires correct readiness probe configuration and a non-zero minReadySeconds — there is zero downtime. If your readiness probe is wrong (always returns 200 before the DB is connected, for example), new pods join the load balancer before they can serve requests, and users get errors during the window before the startup database check completes.
The initialDelaySeconds on your readiness probe should be set to 110% of your measured startup time. Measure it. Don't guess.
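The 110% rule is easy to get subtly wrong with floating-point rounding, so the sketch below does the ceiling in integer arithmetic. The field names (httpGet, initialDelaySeconds, periodSeconds, failureThreshold) are Kubernetes probe fields; the helper name, the /ready path default, port 8000, and the period and threshold values are assumptions to adjust.

```python
from typing import Any, Dict


def readiness_probe(startup_ms: int, path: str = "/ready", port: int = 8000) -> Dict[str, Any]:
    """Build a readinessProbe mapping with initialDelaySeconds at 110% of startup time."""
    if startup_ms < 0:
        raise ValueError("startup_ms must be non-negative")
    # Integer ceiling of (startup_ms * 1.10) / 1000, avoiding float rounding surprises.
    initial_delay = (startup_ms * 11 + 9999) // 10000
    return {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": initial_delay,
        "periodSeconds": 5,       # placeholder; tune for your service
        "failureThreshold": 3,    # placeholder; tune for your service
    }


if __name__ == "__main__":
    # A measured startup of 5000 ms gives 5.5 s, rounded up to 6.
    print(readiness_probe(5000)["initialDelaySeconds"])  # 6
```

The returned mapping has the same shape as the readinessProbe section of a container spec, so it can be serialised into a manifest by whatever templating you already use.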
🎯 Key Takeaway
Liveness failure means restart the pod — keep it dependency-free. Readiness failure means remove from load balancer — check real dependencies. The lifespan shutdown section is what prevents connection pool exhaustion on every rolling update. initialDelaySeconds on your readiness probe should match your actual measured startup time.
🗂 Gunicorn + Uvicorn vs Uvicorn Alone in Production
The right choice depends on your infrastructure, not a universal rule
| Feature | Uvicorn Alone | Gunicorn + Uvicorn |
| --- | --- | --- |
| Process management | None from Uvicorn itself; single process per invocation | Master process (arbiter) spawns, monitors, and restarts workers via heartbeat |
| Worker crash recovery | Pod exits; Kubernetes or systemd must restart it | Gunicorn detects failed heartbeat, sends SIGKILL, forks replacement; no external restart needed |
| Graceful shutdown | Handle SIGTERM in lifespan; works correctly but no worker draining | Built-in graceful_timeout: workers finish in-flight requests before shutdown |
| Multi-core utilisation | Use multiple pod replicas in Kubernetes; each pod one worker | Multiple workers per process; suited for bare VMs with no external orchestrator |
| Zero-downtime reload | Rolling update via Kubernetes; replace pods not processes | SIGHUP for config reload; USR2 for zero-downtime binary upgrade on bare VMs |
| Best fit | Kubernetes, Cloud Run, ECS, where the orchestrator manages restarts, scaling, and health | Bare VMs, EC2 without ECS, GCE, where no external process manager exists |
| Memory per worker | 50-150MB per pod for a typical FastAPI app | Same per worker, plus ~10-20MB for the Gunicorn master process |
| Configuration | uvicorn main:app --workers N or gunicorn.conf.py | gunicorn.conf.py with worker_class=uvicorn.workers.UvicornWorker |
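For the Gunicorn + Uvicorn column, a configuration file might look like the sketch below. The setting names are Gunicorn settings; every value is a placeholder to benchmark rather than a recommendation, and FORGE_WORKERS is an assumed variable name that mirrors the FORGE_ prefix used for the application settings.

```python
# gunicorn.conf.py (sketch; all values are placeholders, not recommendations)
import os

bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"

# FORGE_WORKERS is an assumed variable name; default to 2 workers if unset.
workers = int(os.environ.get("FORGE_WORKERS", "2"))

timeout = 60               # seconds of worker silence before the master kills it
graceful_timeout = 30      # seconds workers get to finish in-flight requests on SIGTERM
max_requests = 1000        # recycle a worker after this many requests
max_requests_jitter = 100  # randomise the recycle point so workers do not restart together

accesslog = "-"            # "-" sends access logs to stdout
errorlog = "-"             # "-" sends error logs to stderr
```

It would be loaded with `gunicorn -c gunicorn.conf.py main:app`, where `main:app` is the module path used in the table above.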

🎯 Key Takeaways

  • Gunicorn is the right process manager on bare VMs and EC2 — it handles worker crash recovery, graceful shutdown, and multi-core utilisation without an external orchestrator. On Kubernetes, Uvicorn alone with multiple pod replicas is equally valid and often simpler to reason about.
  • The (2×CPU)+1 worker formula is for synchronous WSGI workers. For async FastAPI, start with 2–4 workers and benchmark under load — the event loop provides concurrency, not worker count. Adding workers beyond CPU saturation multiplies memory and connection consumption with no throughput gain.
  • Use a gunicorn.conf.py file for all Gunicorn configuration — versionable, readable, and supports dynamic values. Set max_requests and max_requests_jitter to recycle workers and prevent thundering herd restarts from slow memory leaks.
  • The lifespan context manager is the only correct pattern for startup and shutdown in FastAPI as of 2026. @app.on_event is deprecated. Initialise database pools inside lifespan (never at module level) to avoid post-fork event loop conflicts with --preload.
  • Use PostgresDsn (not str) for database URLs in Pydantic Settings — empty strings and malformed URLs fail at startup, not at runtime under real traffic. Use SecretStr for passwords and keys to prevent them appearing in logs and stack traces.
  • Separate /health (liveness: no dependencies, must be fast) from /ready (readiness: calls SELECT 1, returns 503 when DB is unreachable). A single combined endpoint serving both Kubernetes probes will either cause restart loops during DB outages or fail to catch connectivity problems before traffic is routed.
  • Production secrets do not belong in docker run -e flags — they appear in shell history, process lists, and docker inspect output. Use Kubernetes secretKeyRef, AWS Secrets Manager references in ECS task definitions, or HashiCorp Vault with AppRole on bare VMs.
  • Multi-stage Dockerfiles separate the build environment from the runtime image. The builder stage compiles C extensions; the runtime stage ships nothing but what's needed to run. Add adduser before COPY for correct layer caching. Ship a .dockerignore alongside every Dockerfile.

⚠ Common Mistakes to Avoid

    Using --reload in production
    Symptom

    CPU usage sits at 15-30% with zero traffic because the reload file watcher is monitoring every file in your project directory. On containers with large dependency trees, this can consume a meaningful fraction of your CPU budget before serving a single request. Also, --reload requires Uvicorn to know your source file paths, which can expose them in certain error messages.

    Fix

    Remove --reload from every production CMD and entrypoint. It is a development convenience that has no place in a production image. Use separate Dockerfiles (Dockerfile.dev, Dockerfile) or override the CMD in docker-compose.override.yml for local development. In CI, lint your Dockerfile to fail on --reload: 'grep -q -e "--reload" Dockerfile && exit 1 || exit 0'. The -e flag matters: without it, grep treats --reload as one of its own options and exits with an error instead of searching.

    Using plain str instead of PostgresDsn for database URLs in Pydantic Settings
    Symptom

    An empty string or a malformed URL passes validation. The app starts, passes health checks, joins the load balancer, and crashes on the first database call — potentially minutes later. This is the worst failure mode: silent startup followed by runtime errors under real traffic.

    Fix

    Use PostgresDsn (or the appropriate specialised type) for any URL field in your Pydantic Settings model. For async drivers, use 'postgresql+asyncpg://' as the scheme — PostgresDsn validates this correctly. Add a startup connectivity check in the lifespan context manager so even a valid-format URL pointing to an unreachable host fails fast before traffic is routed.

    Using the deprecated @app.on_event decorator for startup and shutdown logic
    Symptom

    No immediate runtime error: @app.on_event still works in current FastAPI versions. The mistake is accruing technical debt. The lifespan parameter has been available since FastAPI 0.93.0 (2023), @app.on_event is marked deprecated in its favour, and deprecated APIs are candidates for removal in a future release. Code written today with @app.on_event may break on a later FastAPI upgrade.

    Fix

    Use the lifespan context manager for all startup and shutdown logic. Initialise database pools, warm caches, and register with service discovery before the yield. Disconnect pools, flush metrics, and deregister after the yield. The lifespan pattern also makes it impossible to initialise a resource without a corresponding cleanup — they're in the same function scope.

    Not handling SIGTERM — or using a liveness probe that calls the database
    Symptom

    Two separate problems with the same root cause (not thinking about the full shutdown sequence). Without SIGTERM handling via lifespan, every rolling update leaks database connections — with 4 workers per pod and 10 pods, a deployment cycle can exhaust PostgreSQL's connection limit. With a database-calling liveness probe, a database outage triggers a pod restart loop — Kubernetes restarts every pod repeatedly during the outage, making recovery slower when the database comes back.

    Fix

    Handle SIGTERM via the lifespan shutdown section — close database pools, flush any write buffers. Separate /health (liveness, no dependencies, just return 200) from /ready (readiness, call SELECT 1). Set Gunicorn's graceful_timeout to at least 30 seconds and add a Kubernetes preStop lifecycle hook with a 5-second sleep to give the load balancer time to drain before Gunicorn starts refusing connections.

    Applying the (2×CPU)+1 worker formula blindly to async FastAPI deployments
    Symptom

    On a 16-core machine, this formula produces 33 workers. A typical FastAPI worker starts at 50-150MB. At 33 workers you're consuming 1.5-5GB at startup before handling any traffic. Context switching across 33 OS-level processes degrades throughput. For async workloads, the event loop provides concurrency — not worker count. More workers than needed multiplies your database connection consumption and memory pressure with no throughput benefit.

    Fix

    Start with 2 workers on containers with 2 CPU cores, 4 workers on 4-core machines. Load test with realistic traffic. Watch CPU utilisation per worker via 'docker stats'. Add workers only when CPU is consistently above 80% — not based on the formula. For most async FastAPI applications serving typical API traffic, the bottleneck is database connection pool size or external API rate limits, not CPU cores or worker count.
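The numbers in this last mistake are simple to reproduce. The helper names below are hypothetical; the 50-150MB per-worker range and the pool maximum of 5 are the figures used elsewhere in this guide, not measurements of your application.

```python
from typing import Dict


def sync_formula_workers(cpu_cores: int) -> int:
    """The (2 x CPU) + 1 heuristic intended for synchronous workers."""
    return 2 * cpu_cores + 1


def footprint(workers: int, pool_max: int,
              mb_low: int = 50, mb_high: int = 150) -> Dict[str, int]:
    """Rough startup cost of a worker count, using the per-worker range quoted above."""
    return {
        "workers": workers,
        "memory_mb_low": workers * mb_low,
        "memory_mb_high": workers * mb_high,
        "db_connections": workers * pool_max,
    }


if __name__ == "__main__":
    by_formula = sync_formula_workers(16)
    print(footprint(by_formula, pool_max=5))
    print(footprint(4, pool_max=5))
```

For 16 cores the formula yields 33 workers, roughly 1650-4950MB of memory and up to 165 database connections, compared with 200-600MB and 20 connections for a 4-worker starting point.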

Interview Questions on This Topic

  • Q: Explain the master-worker architecture of Gunicorn. What happens to the master process if a worker encounters a segmentation fault, and what happens if the master itself crashes? (Senior)
    Gunicorn uses a master process — called the arbiter — that forks multiple worker processes. The master monitors workers via a heartbeat mechanism: each worker periodically touches a file in a temporary directory. If a worker stops updating its heartbeat file within the --timeout window, the master assumes it's hung and sends SIGKILL. If a worker crashes with a segfault, the kernel sends SIGCHLD to the master. The master logs the crash and immediately forks a new worker to replace it. The master itself never handles HTTP traffic — it only manages worker lifecycle. If the master crashes, all workers become orphaned processes and the application stops accepting new connections (existing workers may finish in-flight requests but no new workers are spawned). This is why you run Gunicorn under an external supervisor: systemd on bare VMs, Docker's --restart policy in containers, or Kubernetes Deployment controller in clusters. On Kubernetes specifically, a crashed master means the entire pod exits and Kubernetes restarts it — which is why some teams prefer running Uvicorn directly in Kubernetes with multiple pod replicas rather than Gunicorn with multiple workers in a single pod.
  • Q: In a Docker context, why is .dockerignore important for Python applications, and what specifically should it always contain? (Mid-level)
    Without .dockerignore, the COPY . . instruction sends everything in your project directory to the Docker build daemon, including files that should never leave your workstation. For Python projects this includes: .git/ (potentially hundreds of MB of history, plus any credentials committed in past commits), .env (real credentials), venv/ or .venv/ (your local virtual environment; reinstalling this from requirements.txt in the image produces the correct platform-specific packages, while copying from venv copies your OS-specific binaries), __pycache__/ and *.pyc files (bytecode tied to the interpreter version that produced it), .pytest_cache/, test fixtures, and editor configuration. The security concern is real: a .git directory in your Docker build context can expose the full commit history, including any secrets that were committed and removed from the working tree but still exist in git history. The performance concern is also real: sending a 500MB venv to the build daemon on every build adds minutes to CI build times. Minimum .dockerignore for Python: .git/, .env, .env.*, venv/, .venv/, __pycache__/, *.pyc, .pytest_cache/, *.egg-info/, dist/, build/, tests/, docs/.
  • Q: How does PYTHONUNBUFFERED affect log visibility in containerised environments, and what happens to logs if a container crashes without it set? (Mid-level)
    When stdout is a pipe rather than a terminal (which is what a container runtime reads), Python block-buffers it, typically in 8KB chunks; stderr is flushed far more eagerly (it is line-buffered since Python 3.9). In a container, your application's stdout is captured by the runtime's logging driver (for example Docker's json-file driver) and then picked up by a shipping agent such as Fluentd or Fluent Bit. Without PYTHONUNBUFFERED=1, anything written to stdout sits in Python's internal buffer until the buffer fills or the process flushes explicitly. If the process is killed abruptly (an OOM kill, a SIGKILL, a segfault), whatever has not been flushed is lost permanently. An ordinary unhandled exception is less dangerous, because the interpreter still flushes stdout during normal shutdown and the traceback goes to stderr. In production this means the last few seconds of stdout output before a hard crash disappear, which is exactly when you most need them. Setting PYTHONUNBUFFERED=1 (or, equivalently, running Python with the -u flag) forces stdout and stderr to be written immediately, so collectors such as AWS CloudWatch, GCP Cloud Logging, and Datadog receive each line as it is written. The performance cost is negligible for typical API logging volumes.
  • Q: You're deploying a FastAPI application to a 16-core machine. The (2n+1) formula suggests 33 workers. Why might this degrade performance, and how would you determine the correct worker count? (Senior)
    The (2n+1) formula was designed for synchronous WSGI workers (Django, Flask sync views), where each worker handles exactly one request at a time and worker count is the concurrency limit. For async FastAPI with UvicornWorker, each worker runs an asyncio event loop that can hold hundreds of concurrent connections open while waiting on database responses, external APIs, or other I/O, without blocking. The concurrency comes from the event loop, not from worker count. Spawning 33 workers on a 16-core machine for an async workload has three costs: each worker consumes 50-150MB at startup (33 workers is roughly 1.6-5GB before any traffic), the database connection pool is multiplied by 33 (a pool max of 5 means up to 165 connections against PostgreSQL's default max_connections of 100), and the OS context-switches across 33 processes that are mostly idle waiting on I/O. The correct approach: start with 2-4 workers, well below the core count, load test with realistic production traffic patterns, and observe per-worker CPU utilisation via docker stats or a metrics exporter. Add workers only when CPU is consistently above 80%. For most FastAPI applications, the real constraint surfaces long before CPU saturation: the database connection pool fills, or an external API rate limit is hit. Solve those constraints before increasing worker count.
  • Q: What is the difference between EXPOSE and publishing a port in Docker, and which one actually controls whether traffic reaches your FastAPI application? (Junior)
    EXPOSE in a Dockerfile is documentation: it records that the container is designed to listen on a particular port. On its own it has no effect on network access. No firewall rule is created, no port is opened, nothing is mapped. It exists so that humans reading the Dockerfile, and tools such as docker run -P (which publishes every exposed port to a random host port), know which port the image expects. Publishing a port with the -p flag in docker run (e.g., -p 8000:8000) is what actually makes the application reachable from outside the container: it maps a host port to a container port. In Kubernetes, containerPort in the pod spec is likewise primarily documentation; actual traffic routing is controlled by the Service resource, its selector, and its targetPort. In production, you typically do not expose Gunicorn directly to the internet. It sits behind a reverse proxy (Nginx, Caddy, Traefik) or a cloud load balancer that handles TLS termination, request buffering, and rate limiting.
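
The PYTHONUNBUFFERED answer above can be checked locally without a container. The sketch below is an illustration under stated assumptions, not a production tool: the helper name captured_output and the log message are hypothetical, and os._exit stands in for a hard kill (OOM kill, SIGKILL) because it ends the process without flushing buffers. The parent reads the child's stdout through a pipe, which is also how a container runtime sees it.

```python
import os
import subprocess
import sys

# Child process: write one log line, then die with no interpreter cleanup.
# os._exit skips buffer flushing, standing in for an OOM kill or SIGKILL.
CHILD_CODE = "import os; print('last log line before crash'); os._exit(137)"


def captured_output(unbuffered: bool) -> str:
    """Run the child with stdout attached to a pipe and return what arrived."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # start from a known state
    if unbuffered:
        env["PYTHONUNBUFFERED"] = "1"
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    print("buffered:  ", repr(captured_output(unbuffered=False)))
    print("unbuffered:", repr(captured_output(unbuffered=True)))
```

In the buffered run the line never leaves Python's internal buffer, so the parent receives an empty string; with PYTHONUNBUFFERED=1 the same line arrives intact.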

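The arithmetic in the 16-core worker-count answer above is easy to reproduce. The sketch below is a back-of-the-envelope calculator, not a sizing tool: the function name estimate_footprint is hypothetical, and the 50-150MB per worker and pool size of 5 are the illustrative figures from that answer, so replace them with numbers measured on your own application.

```python
from dataclasses import dataclass

# PostgreSQL's default max_connections setting.
POSTGRES_DEFAULT_MAX_CONNECTIONS = 100


@dataclass(frozen=True)
class Footprint:
    workers: int
    min_memory_mb: int
    max_memory_mb: int
    db_connections: int


def estimate_footprint(
    workers: int,
    mem_per_worker_mb: tuple = (50, 150),  # illustrative startup range
    pool_size_per_worker: int = 5,         # illustrative pool max
) -> Footprint:
    """Multiply per-worker costs by the worker count."""
    low, high = mem_per_worker_mb
    return Footprint(
        workers=workers,
        min_memory_mb=workers * low,
        max_memory_mb=workers * high,
        db_connections=workers * pool_size_per_worker,
    )


if __name__ == "__main__":
    for count in (4, 33):
        fp = estimate_footprint(count)
        over = fp.db_connections > POSTGRES_DEFAULT_MAX_CONNECTIONS
        print(
            f"{fp.workers} workers: {fp.min_memory_mb}-{fp.max_memory_mb} MB, "
            f"{fp.db_connections} DB connections"
            f"{' (exceeds PostgreSQL default)' if over else ''}"
        )
```

Every per-worker cost scales linearly with the worker count, which is why the connection pool usually becomes the limit before CPU does.
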
Frequently Asked Questions

Why use Gunicorn with Uvicorn workers instead of just Uvicorn?

On bare VMs, EC2 instances, and any environment without an external process manager: Gunicorn handles worker crash recovery, graceful shutdown, and multi-core utilisation that Uvicorn alone doesn't provide. If your single Uvicorn process crashes, the app is down until something restarts it. Gunicorn restarts the worker in seconds. On Kubernetes: the answer flips. Kubernetes is already a process manager — it handles pod restarts, rolling updates, autoscaling, and health-based traffic routing. Running Uvicorn directly in a Kubernetes Deployment with multiple replicas (each pod one Uvicorn process) is a completely valid 2026 pattern. The Gunicorn master process becomes redundant overhead when Kubernetes is doing the same job at the pod level. Know your infrastructure before choosing.

How do I handle database migrations in a Docker deployment?

Never put migration commands (alembic upgrade head) inside your Docker CMD or the lifespan startup handler. If you scale to multiple pods and all of them run migrations on startup, you get concurrent migration attempts. Alembic does not take a migration lock of its own, so concurrent runs can race on the same DDL and fail partway; even when the database serialises them, you pay for it in startup latency and occasional lock timeouts. The correct pattern: run migrations as a Kubernetes Job before the Deployment is updated. An init container is a weaker alternative: it runs to completion before the main container starts, but it runs once per pod, so it still needs idempotent migrations or an explicit lock. In ECS, run migrations as a one-off task in your CI/CD pipeline before the new task definition is deployed. The rule is: migrations run exactly once per deployment, not once per pod replica.

What is the 'zombie worker' problem in Gunicorn, and how does async FastAPI reduce its likelihood?

In this context, a zombie worker is a Gunicorn worker that has stopped sending heartbeats to the master, either because it's stuck in a CPU-bound synchronous loop that never yields, or because it's handling a request that exceeds the --timeout limit. (This is different from a Unix zombie process, which has already exited and is waiting to be reaped.) Gunicorn detects the missing heartbeat and kills the worker. For synchronous WSGI workers, any blocking operation (a slow database query, a network call, a file read) can cause this because the worker is blocked and can't yield. For async FastAPI workers, the event loop naturally yields during every I/O wait: a slow database query suspends the current coroutine and lets the event loop handle other requests. The zombie worker problem becomes a code quality issue: it occurs when someone calls a synchronous blocking function (time.sleep, requests.get, a sync database driver) from an async context. The fix depends on the call: use the async equivalent where one exists, and push calls with no async equivalent into a thread pool, for example with asyncio.to_thread or by declaring the endpoint with plain def so FastAPI runs it in its thread pool.
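
The difference between blocking and yielding can be measured with the standard library alone. The sketch below is a minimal illustration, not FastAPI code: the handler names and the 0.2-second delay are hypothetical stand-ins for real endpoints, and time.sleep plays the role of any synchronous call (requests.get, a sync database driver).

```python
import asyncio
import time

DELAY = 0.2  # seconds of simulated I/O per request (illustrative value)


async def blocking_handler() -> None:
    # Anti-pattern: time.sleep blocks the event loop thread, so every
    # other coroutine on this worker stalls until it returns.
    time.sleep(DELAY)


async def async_handler() -> None:
    # asyncio.sleep suspends only this coroutine; the loop keeps serving.
    await asyncio.sleep(DELAY)


async def offloaded_handler() -> None:
    # A sync call with no async equivalent can run in a worker thread.
    await asyncio.to_thread(time.sleep, DELAY)


async def time_concurrent(handler, n: int = 3) -> float:
    """Run n copies of handler concurrently and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


def measure() -> dict:
    async def _run() -> dict:
        return {
            "blocking": await time_concurrent(blocking_handler),
            "async": await time_concurrent(async_handler),
            "offloaded": await time_concurrent(offloaded_handler),
        }

    return asyncio.run(_run())


if __name__ == "__main__":
    for name, seconds in measure().items():
        print(f"{name}: {seconds:.2f}s")
```

Three concurrent blocking handlers take about three times the delay because they run one after another, while the async and offloaded versions overlap and finish in roughly one delay. During the blocking run the event loop cannot do anything else, which is the condition that lets a worker miss its heartbeat.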

Should I use a reverse proxy like Nginx in front of Gunicorn?

In production on bare infrastructure: yes. Gunicorn is not hardened against slow HTTP attacks (Slowloris and similar), does not handle TLS termination efficiently, cannot serve static files without loading Python, and has no built-in rate limiting. Nginx handles all of these before a request reaches Gunicorn. The typical bare-VM stack: Nginx (TLS, rate limiting, static files, request buffering) → Gunicorn (process management) → Uvicorn workers (async HTTP) → FastAPI. In Kubernetes or managed container platforms: your ingress controller (Nginx Ingress, Traefik, Caddy, or a cloud-native load balancer) takes the place of a local Nginx. You don't typically run a per-pod Nginx sidecar — the ingress handles TLS and routing at the cluster level, and you deploy Gunicorn + Uvicorn in the application pod.

How do I enable Prometheus metrics for FastAPI without blocking the event loop?

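Install prometheus-fastapi-instrumentator: 'pip install prometheus-fastapi-instrumentator'. Add to your application: 'from prometheus_fastapi_instrumentator import Instrumentator; Instrumentator().instrument(app).expose(app)'. This exposes a /metrics endpoint with request counts and latency histograms (and, optionally, in-progress request gauges). The metrics are recorded in-process by lightweight middleware, so collecting them adds no blocking I/O to the event loop. Configure Prometheus to scrape /metrics with a 15-second interval. With several Gunicorn workers, each scrape reaches only one worker's counters unless you enable prometheus_client's multiprocess mode (the PROMETHEUS_MULTIPROC_DIR environment variable). For Gunicorn-level metrics (such as worker count, request rate, and request duration), set 'statsd_host = "localhost:8125"' in gunicorn.conf.py (or pass '--statsd-host=localhost:8125' on the command line) and run a StatsD exporter sidecar. In Kubernetes, annotate the pod with 'prometheus.io/scrape: "true"' and 'prometheus.io/path: /metrics'. These annotations are a convention, so Prometheus discovers the pod only if its scrape configuration is set up to honour them.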
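
As a configuration sketch, the StatsD settings can live in gunicorn.conf.py next to the worker settings. The values below are illustrative assumptions rather than recommendations: the STATSD_HOST environment variable and the fastapi_app prefix are hypothetical names chosen for this example, while bind, workers, worker_class, statsd_host, and statsd_prefix are Gunicorn setting names.

```python
# gunicorn.conf.py (sketch; values are illustrative, not recommendations)
import os

bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"

# Start small for async workloads and raise only after load testing.
workers = int(os.environ.get("WEB_CONCURRENCY", "2"))

# Config-file equivalent of --statsd-host on the command line.
# STATSD_HOST is a hypothetical variable name used for this example.
statsd_host = os.environ.get("STATSD_HOST", "localhost:8125")

# Prefix prepended to every metric name Gunicorn emits (hypothetical value).
statsd_prefix = "fastapi_app"
```

Gunicorn would load this with 'gunicorn -c gunicorn.conf.py main:app', where main:app is a placeholder for your own module and application object.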

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
