Python Advanced

FastAPI Deployment — Docker, Uvicorn and Gunicorn


📅 March 05, 2026 ⏱ 3 min read 🎯 Advanced


Production-grade FastAPI deployment guide.

🔥 Advanced — solid Python foundation required

In this tutorial, you'll learn


Never use --reload in production; it watches your source files and restarts the server on every change, which adds overhead and is meant only for development.
Gunicorn handles the 'Manager' role (restarting dead workers), while Uvicorn handles the 'Worker' role (parsing async HTTP).
Worker calculation: Standard heuristic is (2 x CPU Cores) + 1. On a 2-core cloud VM, use 5 workers.


⚡Quick Answer

FastAPI requires an ASGI server; Uvicorn handles the async HTTP loop.
Gunicorn acts as a process manager: it spawns, monitors, and restarts Uvicorn workers.
Worker count formula: (2 × CPU cores) + 1 — 5 workers on a 2-core VM.
Gunicorn's --timeout prevents stuck workers from hanging requests indefinitely.
Biggest mistake: using --reload in production. It watches source files for changes, wastes CPU, and restarts the server whenever a file changes.
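
The worker-count heuristic above is simple arithmetic. A minimal sketch, assuming the core count reported by the operating system is the one you want to size for (inside a CPU-limited container it may not be); `recommended_workers` is a hypothetical helper name, not part of Gunicorn:

```python
import os
from typing import Optional


def recommended_workers(cpu_cores: Optional[int] = None) -> int:
    """Return the (2 x CPU cores) + 1 starting point for the worker count."""
    # Fall back to the detected core count; os.cpu_count() can return None.
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return 2 * cores + 1


if __name__ == "__main__":
    print(recommended_workers(2))   # 5, matching the 2-core VM example
    print(recommended_workers(16))  # 33
```

Treat the result as a starting value to benchmark against, not a fixed answer.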

🚨 START HERE

FastAPI Deployment Troubleshooting Quick Guide

Five common failures and the exact commands to diagnose them.

🟡Container exits immediately after start

Immediate Action: Check container logs for Python import errors or missing modules.

Commands

docker logs <container_id> --tail 50

docker run --rm <image> sh -c "python -c 'from io.thecodeforge.main import app'"

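Fix Now: Add the missing dependencies to requirements.txt and rebuild the image. Running pip install inside a running container is only useful as a quick check, because the change is lost when the container is recreated.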

🟡Gunicorn workers not starting (permission denied)

Immediate Action: Check whether the port is already in use or the user lacks permission to bind it.

Commands

docker exec <container> sh -c "ps aux | grep gunicorn"

docker exec <container> sh -c "netstat -tulpn | grep 8000"

Fix Now: Bind to a non-privileged port (1024 or higher); this is the preferred fix. If you must use a low port, add `--cap-add=NET_BIND_SERVICE`. Running the container with `--user root` works only as a temporary diagnostic and is bad practice.

🟡Health check endpoint returns 500

Immediate Action: Test the health endpoint manually with curl from inside the container.

Commands

docker exec <container> sh -c "curl -f http://localhost:8000/health"

docker logs <container_id> --since 5m | grep ERROR

Fix Now: Check which database URL the app actually loaded: `docker exec <container> sh -c "python -c 'from io.thecodeforge.config import settings; print(settings.database_url)'"`. Then fix the connection string or DB hostname.

🟠Slow response times under load

Immediate Action: Check Gunicorn worker count and request queue.

Commands

curl -s http://localhost:8000/health | jq .

gunicorn io.thecodeforge.main:app --statsd-host=localhost:8125 -w 4 -k uvicorn.workers.UvicornWorker (requires a StatsD server or exporter listening on that port)

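Fix Now: Increase the worker count toward (2×CPU)+1 and benchmark after each change. `--worker-class uvicorn.workers.UvicornH11Worker` switches to the pure-Python h11 parser and the standard asyncio loop; measure memory and latency before assuming it helps.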

🟡Environment variables not loading inside container

Immediate Action: Print environment from inside the running container.

Commands

docker exec <container> env | grep FORGE_

docker inspect <container> --format='{{range .Config.Env}}{{print . "\n"}}{{end}}'

Fix Now: Verify the `docker run -e` flags or the `environment` section of `docker-compose.yml`. For Kubernetes: check the Environment section of each container in the `kubectl describe pod` output. Add missing variables.

Production Incident: Missing Environment Variable Causes Silent Failure

A staging deployment started without errors, but every API call returned 500. The root cause: `DATABASE_URL` was set to an empty string, and Pydantic accepted it because the field was typed as a plain str.

Symptom: FastAPI app starts successfully, but all endpoints return HTTP 500 with ConnectTimeoutError: could not connect to server. Logs show 'NoneType' object has no attribute 'connect'.

Assumption: The team assumed that a missing environment variable would make Pydantic raise a ValidationError and stop the app from starting. But the field was declared as database_url: str with no format or length constraint, so an empty string was accepted.

Root cause: The Pydantic model used database_url: str instead of database_url: PostgresDsn. An empty string passed validation because str allows empty values. The database driver then tried to connect to an empty hostname, causing a timeout.

Fix: Changed the field to database_url: PostgresDsn (imported from pydantic) to enforce a valid URL format. Also added a startup check that pings the database before the app accepts traffic.

Key Lesson

Never use plain str for connection strings in Pydantic; use the specialized types like PostgresDsn.

Add a startup health check that fails the app if dependencies are unreachable.

Use a separate validate_on_startup flag to skip validation in tests.
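
The difference between the two field types in this incident can be reproduced in a few lines. This is a minimal sketch, assuming Pydantic is installed; `LooseSettings`, `StrictSettings`, and `accepts` are hypothetical names used only for the comparison, and plain `BaseModel` is used instead of `BaseSettings` to keep the example independent of the environment:

```python
from pydantic import BaseModel, PostgresDsn, ValidationError


class LooseSettings(BaseModel):
    # Plain str: any string passes, including an empty one.
    database_url: str


class StrictSettings(BaseModel):
    # PostgresDsn: the value must parse as a PostgreSQL URL.
    database_url: PostgresDsn


def accepts(model, value: str) -> bool:
    """Return True if the model accepts the value, False if validation fails."""
    try:
        model(database_url=value)
        return True
    except ValidationError:
        return False


if __name__ == "__main__":
    print(accepts(LooseSettings, ""))   # True: the bug from the incident
    print(accepts(StrictSettings, ""))  # False: rejected at load time
    print(accepts(StrictSettings, "postgresql://user:pass@localhost:5432/db"))  # True
```

With the strict model, the empty value is rejected when settings are loaded rather than at the first database call.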

Production Debug Guide: Three symptoms and their fixes

Container memory grows steadily over time, never decreasing. → Check for unclosed async generators or database sessions. Run docker stats to confirm growth is not due to worker count. Add async with context managers in all resource-heavy endpoints.

Workers are killed with SIGKILL after running out of memory. → Set Gunicorn's --max-requests to 1000 to force worker restarts after a fixed number of requests. Combine with --max-requests-jitter so workers do not all restart at the same moment.

High memory usage only after traffic spikes. → Adding workers is tempting, but each worker is a full Python process, so more workers means more memory. Profile per-worker memory first. --worker-class uvicorn.workers.UvicornH11Worker swaps in the pure-Python h11 parser and the standard asyncio loop; benchmark it before relying on it for memory savings.
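
The first fix above (wrapping resources in async with) can be sketched with the standard library alone. `FakeSession` and `get_session` are hypothetical stand-ins for a real database session and its factory; the point is only that the `finally` block runs even when the handler raises:

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in for a database session."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


opened_sessions: list = []


@asynccontextmanager
async def get_session():
    session = FakeSession()
    opened_sessions.append(session)
    try:
        yield session
    finally:
        # Runs on success, on exceptions, and on cancellation.
        await session.close()


async def failing_endpoint() -> None:
    async with get_session():
        raise RuntimeError("simulated handler failure")


async def main() -> bool:
    try:
        await failing_endpoint()
    except RuntimeError:
        pass
    # Every session that was opened has also been closed.
    return all(session.closed for session in opened_sessions)


if __name__ == "__main__":
    print(asyncio.run(main()))  # True
```

Without the context manager, the exception would skip any close call placed after the failing line, and the session would stay open until garbage collection.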

The Production Stack: Gunicorn + Uvicorn

Uvicorn is incredibly fast, but it lacks the advanced process management features required for 99.9% uptime. Gunicorn acts as the master process that manages the lifecycle of several Uvicorn worker processes. This architecture allows you to utilize multiple CPU cores effectively while maintaining a single entry point for your traffic.

deployment_commands.sh · BASH


# Install the production stack
pip install fastapi gunicorn uvicorn

# 1. Development Mode (Single process, auto-reload)
uvicorn io.thecodeforge.main:app --reload --host 127.0.0.1 --port 8000

# 2. Production Mode (Gunicorn as Process Manager)
# --workers 4 (short form -w 4): spawns 4 worker processes
# --worker-class (short form -k): tells Gunicorn to use the Uvicorn worker class
gunicorn io.thecodeforge.main:app \
    --workers 4 \
    --worker-class uvicorn.workers.UvicornWorker \
    --bind 0.0.0.0:8000 \
    --access-logfile - \
    --error-logfile -

▶ Output

[INFO] Starting gunicorn 22.0.0
[INFO] Listening at: http://0.0.0.0:8000
[INFO] Booting worker with pid: 42

📊 Production Insight

If a worker gets stuck in an infinite while True loop, it stops sending heartbeats to the Gunicorn master. Once the worker has been silent for --timeout seconds (default 30), Gunicorn kills and restarts it.

With the timeout disabled (--timeout 0), a stuck worker hangs forever; if every worker gets stuck, nothing processes new requests.

Rule: always set --timeout to at least 2× your longest endpoint latency.

🎯 Key Takeaway

Gunicorn manages worker lifecycle; Uvicorn handles async I/O.

Worker count = (2×CPUs)+1.

Set --timeout or risk silent worker-freeze.
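
The flags discussed in this section can also live in a Gunicorn configuration file, which is plain Python. The following is a sketch of such a file, assuming it is saved as gunicorn.conf.py; the numeric values are illustrative starting points (the jitter value in particular is an assumption), not tuned recommendations:

```python
# gunicorn.conf.py: a sketch; adjust the values after benchmarking.
import multiprocessing

# Same address as the --bind flag in the command above.
bind = "0.0.0.0:8000"

# (2 x CPU cores) + 1 heuristic from this article.
workers = 2 * multiprocessing.cpu_count() + 1
worker_class = "uvicorn.workers.UvicornWorker"

# Seconds a worker may stay silent before the master kills and restarts it.
timeout = 30
# Seconds workers get to finish in-flight requests after a shutdown signal.
graceful_timeout = 30

# Recycle workers periodically; the jitter staggers the restarts.
max_requests = 1000
max_requests_jitter = 100  # assumed value for illustration

# Log to stdout/stderr so the container runtime can collect the logs.
accesslog = "-"
errorlog = "-"
```

It would be loaded with gunicorn -c gunicorn.conf.py io.thecodeforge.main:app, which keeps the container CMD short and puts the tuning values under version control.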

Production-Grade Dockerfile

A naive Dockerfile creates a bloated image. At TheCodeForge, we use a 'slim' base image and careful layer ordering. By copying only requirements.txt first, we ensure that Docker only re-installs your libraries if the dependencies change, drastically speeding up your CI/CD pipeline.

Dockerfile · DOCKERFILE


# Use an official Python slim image for a smaller footprint
FROM python:3.12-slim

# Prevents Python from writing pyc files and buffering stdout/stderr
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

WORKDIR /app

# Install system dependencies if needed (e.g., libpq for Postgres)
# RUN apt-get update && apt-get install -y --no-install-recommends gcc && rm -rf /var/lib/apt/lists/*

# Layer caching: Install dependencies first
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade -r requirements.txt

# Copy the application code
COPY . .

# Create a non-privileged user for security
RUN adduser --disabled-password forgeuser
USER forgeuser

# Expose the port FastAPI will run on
EXPOSE 8000

# Run Gunicorn with Uvicorn workers
CMD ["gunicorn", "io.thecodeforge.main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000"]

▶ Output

Successfully built and tagged thecodeforge-api:latest

📊 Production Insight

If you COPY the entire app before installing dependencies, any code change invalidates the pip layer — every CI build re-downloads packages.

These builds take 3-5 minutes longer and fail more often due to network timeouts.

Rule: always copy requirements.txt first to leverage Docker layer caching.

🎯 Key Takeaway

Optimise layer order: dependencies before app code.

Always create a non-root user.

Use slim base images to reduce attack surface.

Robust Configuration with Pydantic Settings

Hardcoding database strings is a security hazard. We use pydantic-settings to create a strictly typed configuration object that pulls from environment variables. This ensures that if a required variable like FORGE_DATABASE_URL is missing or malformed, the application will fail loudly at startup rather than crashing later in the middle of a request.

io/thecodeforge/config.py · PYTHON


from pydantic import Field, PostgresDsn
from pydantic_settings import BaseSettings, SettingsConfigDict

class ForgeSettings(BaseSettings):
    # Automatically reads from FORGE_DATABASE_URL or .env
    database_url: PostgresDsn
    secret_key: str = Field(min_length=32)
    debug: bool = False

    # Configuration for environment file loading
    model_config = SettingsConfigDict(env_file=".env", env_prefix="FORGE_")

# Instantiate once for singleton usage
settings = ForgeSettings()

▶ Output

Settings validated and loaded successfully.

📊 Production Insight

If database_url were optional or loosely typed, a forgotten FORGE_DATABASE_URL would let FastAPI start anyway and then crash on the first DB call.

This causes a 5xx error for your users and a confusing stack trace in the logs.

Rule: validate required settings at startup, and verify real connectivity in a readiness check.

🎯 Key Takeaway

Use Pydantic Settings with env_prefix to avoid collisions.

Validation on startup prevents silent failures.

Never hardcode secrets in code or Dockerfiles.

Environment Separation and .env Files

Different environments (dev, staging, production) need different configurations. Pydantic Settings loads from .env files locally, but in production you inject environment variables via your container orchestrator (Kubernetes, ECS). Keep a .env.example in version control — never commit the actual .env file.

.env.example · BASH


# Example .env file — copy to .env and fill in values
FORGE_DATABASE_URL=postgresql://user:pass@localhost:5432/db
FORGE_SECRET_KEY=change-me-to-a-long-random-string
FORGE_DEBUG=false

📊 Production Insight

A junior dev committed a .env file with real staging credentials to GitHub. Within two hours, crypto miners were running on that server.

Rule: always add .env to .gitignore and use .env.example as the committed template.

🎯 Key Takeaway

Never commit .env files.

Use .env.example for documentation.

Secrets in production come from the orchestrator, not files.
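
Because production values arrive from the orchestrator rather than a file, it can help to check for them before the server starts. Below is a minimal standard-library sketch of such a pre-flight check; `REQUIRED_VARS`, `missing_settings`, and `check_environment` are hypothetical names, and the variable list simply mirrors the .env.example above:

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumed to match the required entries in .env.example.
REQUIRED_VARS = ("FORGE_DATABASE_URL", "FORGE_SECRET_KEY")


def missing_settings(required: Iterable[str], env: Mapping[str, str]) -> List[str]:
    """Return the names that are unset or contain only whitespace."""
    return [name for name in required if not env.get(name, "").strip()]


def check_environment(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise RuntimeError listing every missing or empty required variable."""
    env = os.environ if env is None else env
    missing = missing_settings(REQUIRED_VARS, env)
    if missing:
        raise RuntimeError("Missing or empty settings: " + ", ".join(missing))


if __name__ == "__main__":
    sample = {"FORGE_DATABASE_URL": "", "FORGE_SECRET_KEY": "x" * 32}
    print(missing_settings(REQUIRED_VARS, sample))  # ['FORGE_DATABASE_URL']
```

This complements the typed settings class rather than replacing it: the check reports all missing names at once, while Pydantic validates their format.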

Health Checks and Graceful Shutdown

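Kubernetes and Docker rely on health checks to know if your app is alive and ready. FastAPI does not ship health endpoints; you define /health and /ready yourself, and you must also handle SIGTERM gracefully. Gunicorn's --graceful-timeout sets how long workers get to finish in-flight requests after a shutdown signal. Be careful with --preload: it loads the app in the master process before forking, so database connections opened at import time end up shared across workers.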

io/thecodeforge/main.py · PYTHON


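from fastapi import FastAPI, Response, status

app = FastAPI()


async def check_database() -> bool:
    # Placeholder: replace with a real probe, e.g. running SELECT 1.
    return True


@app.get("/health")
async def health(response: Response):
    # Report 503 when the database probe fails so orchestrators notice.
    db_ok = await check_database()
    if not db_ok:
        response.status_code = status.HTTP_503_SERVICE_UNAVAILABLE
    return {"status": "healthy" if db_ok else "degraded"}


@app.get("/ready")
async def ready():
    return {"status": "ready"}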

▶ Output

Health check endpoint returns 200 OK.

📊 Production Insight

A team deployed without a readiness check. The old version was terminated before the new one was ready — 45 seconds of downtime during every rolling update.

The fix: add separate liveness and readiness endpoints with proper DB checks.

🎯 Key Takeaway

Implement /health and /ready endpoints.

Handle SIGTERM: close DB connections in an app shutdown handler.

Gunicorn's --graceful-timeout gives workers time to finish requests.

🗂 Gunicorn + Uvicorn vs Uvicorn Alone in Production

Why you need a process manager

Feature	Uvicorn Alone	Gunicorn + Uvicorn
Process management	Single process by default; --workers adds a simpler built-in supervisor	Master process spawns, monitors, restarts workers
Worker crash recovery	In single-process mode a crash takes the app down	Gunicorn auto-restarts crashed worker
Graceful shutdown	Drains connections on SIGTERM, with fewer tuning options	Built-in graceful timeout and worker draining
Worker concurrency	One core unless --workers is set	Multiple workers across CPU cores
Zero-downtime reload	Not supported	HUP reloads workers gracefully; TTIN/TTOU adjust the worker count
Production maturity	Newer, less battle-tested	Industry standard for 10+ years

🎯 Key Takeaways

Never use --reload in production; it watches your source files and restarts the server on every change, which adds overhead and is meant only for development.
Gunicorn handles the 'Manager' role (restarting dead workers), while Uvicorn handles the 'Worker' role (parsing async HTTP).
Worker calculation: Standard heuristic is (2 x CPU Cores) + 1. On a 2-core cloud VM, use 5 workers.
Containerization: Always run as a non-root user (USER forgeuser) in your Dockerfile to mitigate container escape vulnerabilities.
Pydantic Settings: Use env_prefix to prevent collision with system environment variables.
Health checks: Add separate /health and /ready endpoints for liveness and readiness probes in orchestrators.
Graceful shutdown: Handle SIGTERM to close database connections; set Gunicorn's --graceful-timeout.

⚠ Common Mistakes to Avoid

✕Using `--reload` in production

Symptom

Elevated CPU usage even with no traffic because of the file watcher, and the server restarts whenever a file under the watched directory changes.

Fix

Never include --reload in the production CMD. Use separate Dockerfiles for development and production, or override CMD locally.

✕Hardcoding environment variables in Dockerfile

Symptom

Secrets exposed in image layers (visible via docker history), and cannot change without rebuild.

Fix

Use Pydantic Settings to load from environment variables. Pass real values at container runtime: docker run -e FORGE_DATABASE_URL=... or kubectl create secret generic.

✕Setting `--workers` to a very high number (e.g., 33 workers on a 16-core machine)

Symptom

Memory exhaustion, increased context switching, and degraded throughput.

Fix

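Start with (2×CPU)+1 as an upper bound and benchmark. For async, I/O-bound apps, each worker already multiplexes many connections, so extra workers mostly add memory. Move CPU-heavy work to background tasks or a task queue instead of adding workers.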

✕Not handling SIGTERM in FastAPI app

Symptom

During Kubernetes rolling updates, old pods are killed but database connections remain open, causing connection pool exhaustion.

Fix

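Add a shutdown handler that closes DB connections. Recent FastAPI releases recommend the lifespan context manager for this; the older @app.on_event("shutdown") async def shutdown(): ... form still works but is deprecated. Also set Gunicorn's --graceful-timeout (30s is the default).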

Interview Questions on This Topic

Q: Explain the 'Master-Worker' architecture of Gunicorn. What happens to the Master process if a Worker process encounters a Segmentation Fault? (Senior)
Gunicorn uses a master process (the arbiter) that forks multiple worker processes. The master monitors workers via heartbeats. If a worker segfaults, the kernel sends SIGCHLD to the master, which logs the crash and forks a new worker to replace it. The master itself never handles requests; if the master crashes, the workers are orphaned and the app goes down. That's why you should run Gunicorn under a process supervisor like systemd or Docker's restart policy.
Q: In a Docker context, why is it considered a 'best practice' to use a .dockerignore file for Python applications, and which specific folders should be included? (Mid-level)
A .dockerignore prevents unnecessary files from being sent to the Docker build context, speeding up builds and reducing image size. For Python apps, you should include: __pycache__/, *.pyc, *.pyo, .env, .git/, .gitignore, .vscode/, venv/, *.egg-info/, dist/, build/. Without it, local caches and virtual environments can bloat the image by hundreds of MB.
Q: How does the PYTHONUNBUFFERED environment variable affect log visibility in containerized environments like AWS ECS or Google Cloud Run? (Mid-level)
By default, Python buffers stdout when it is not attached to a terminal, so you may see logs in bursts or miss them entirely in CloudWatch or Cloud Logging. Setting PYTHONUNBUFFERED=1 disables buffering, so logs are written immediately. Without it, a crash may lose the last few log lines. In ECS and Cloud Run, the platform collects whatever the container writes to stdout and stderr, so unbuffered output is what lets logs show up in real time.
Q: Scenario: You are deploying to a 16-core machine. Following the (2n + 1) rule gives you 33 workers. Why might this actually degrade performance for a high-concurrency FastAPI app, and how would you tune it? (Senior)
If each worker uses around 1 GB, 33 Python processes would consume roughly 33 GB of RAM before handling any traffic, and context-switching overhead degrades throughput. For FastAPI (async), each worker can handle many concurrent connections, so you don't need many workers. Start with 8–16 workers and benchmark. Measure per-worker memory for your own app rather than assuming a figure. The real limit is often I/O bandwidth or database connection pool size, not CPU cores.
Q: What is the difference between EXPOSE and PUBLISH in Docker, and which one actually controls network access to your FastAPI app? (Junior)
EXPOSE in the Dockerfile is documentation-only: it tells readers (and tools) that the container listens on that port. It does not publish the port on its own. Publishing (the -p flag in docker run) maps a host port to a container port, making the app accessible from outside. In production, you typically publish port 8000 (or 80 if behind a reverse proxy). Keep EXPOSE accurate for security scanners and automated tools.
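
The tuning argument in the 16-core question can be turned into arithmetic: take the CPU heuristic, then cap it by how many workers fit in the memory you are willing to spend. A minimal sketch; `capped_workers` is a hypothetical helper, and the 1,000 MB per worker figure is the rough assumption from the answer above, not a measurement:

```python
def capped_workers(cpu_cores: int, memory_budget_mb: int, per_worker_mb: int) -> int:
    """Worker count limited by both the CPU heuristic and a memory budget."""
    by_cpu = 2 * cpu_cores + 1
    by_memory = memory_budget_mb // per_worker_mb
    # Never return fewer than one worker.
    return max(1, min(by_cpu, by_memory))


if __name__ == "__main__":
    # 16 cores allow 33 workers by CPU, but a 16,000 MB budget fits only 16.
    print(capped_workers(16, 16_000, 1_000))  # 16
    # On a small VM the CPU heuristic is the tighter limit.
    print(capped_workers(2, 16_000, 1_000))   # 5
```

The per-worker figure should come from observing your own app under load, since it varies widely with dependencies and workload.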

Frequently Asked Questions

Why use Gunicorn with Uvicorn workers instead of just Uvicorn?

While Uvicorn has a --workers flag, Gunicorn is a battle-tested process manager that has been the industry standard for over a decade. It offers more sophisticated heartbeats, worker timeouts, and signal handling. If you are deploying on 'bare' servers or VMs, Gunicorn is the safer default. However, if you are using Kubernetes, you can often run Uvicorn alone because Kubernetes itself acts as the process manager, handling replicas and health checks at the pod level.

How do I handle database migrations in a Docker deployment?

Never put migration commands (like alembic upgrade head) inside your Docker CMD. This causes race conditions if you scale to multiple containers. Instead, run migrations as a separate 'init-container' in Kubernetes or a one-off task in your CI/CD pipeline before the new image goes live.

What is the 'Zombie Worker' problem in FastAPI deployments?

If a worker process gets stuck in a CPU-heavy loop (like an infinite while loop), it may stop responding to Gunicorn's heartbeats. Gunicorn will eventually kill and restart it. This is why we use async—to ensure the worker is always 'yielding' back to the manager during I/O waits.
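
The effect of blocking the event loop can be observed with the standard library alone. In this sketch, `heartbeat` is a hypothetical stand-in for a worker's periodic liveness signal and `blocking_work` for any blocking call; it compares how often the heartbeat runs when the blocking call sits directly in a coroutine versus when it is moved to a thread with `asyncio.to_thread` (Python 3.9+):

```python
import asyncio
import time


async def heartbeat(ticks: list) -> None:
    # Stand-in for the worker's periodic "I am alive" signal.
    while True:
        ticks[0] += 1
        await asyncio.sleep(0.05)


def blocking_work() -> None:
    # Stand-in for a blocking call that never yields to the event loop.
    time.sleep(0.3)


async def count_heartbeats(offload: bool) -> int:
    ticks = [0]
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat task start
    if offload:
        await asyncio.to_thread(blocking_work)  # loop stays responsive
    else:
        blocking_work()  # loop is frozen for the whole call
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ticks[0]


async def main() -> tuple:
    blocked = await count_heartbeats(offload=False)
    offloaded = await count_heartbeats(offload=True)
    return blocked, offloaded


if __name__ == "__main__":
    blocked, offloaded = asyncio.run(main())
    print(f"heartbeats while blocked: {blocked}, while offloaded: {offloaded}")
```

The blocked run records far fewer heartbeats, which is the same starvation that makes a worker look dead to its manager. Offloading to a thread helps with blocking I/O; truly CPU-bound work still competes for the GIL and is better handled in a separate process.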

Should I use a reverse proxy like Nginx in front of Gunicorn?

Yes, in production. Nginx handles SSL termination, static file serving, rate limiting, and request buffering. Gunicorn alone is not protected against slow HTTP attacks. The typical stack: Nginx → Gunicorn + Uvicorn → FastAPI. For container deployments, you can use Caddy or Traefik as alternatives.

How do I enable Prometheus metrics for FastAPI?

Install prometheus-fastapi-instrumentator and add it to your app: from prometheus_fastapi_instrumentator import Instrumentator; Instrumentator().instrument(app).expose(app). This exposes metrics at /metrics. Configure Prometheus to scrape that endpoint. Combine with Gunicorn's --statsd-host for worker-level metrics.


Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.


Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged