FastAPI Deployment — Docker, Uvicorn and Gunicorn
- Never use `--reload` in production: it watches your source files for changes, which wastes CPU and can leak source paths.
- FastAPI requires an ASGI server; Uvicorn plays the 'Worker' role and runs the async HTTP loop.
- Gunicorn plays the 'Manager' role: it spawns, monitors, and restarts Uvicorn workers.
- Worker count heuristic: (2 × CPU cores) + 1, so 5 workers on a 2-core cloud VM.
- Gunicorn's `--timeout` prevents stuck workers from hanging requests indefinitely.
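The worker heuristic above is simple arithmetic, and a small helper makes it concrete. This is a minimal sketch: `recommended_workers` is a hypothetical name, and inside a container `os.cpu_count()` may report the host's cores rather than the container's CPU limit, so passing the core count explicitly is often safer.

```python
import os
from typing import Optional


def recommended_workers(cpu_cores: Optional[int] = None) -> int:
    """Apply the (2 x CPU cores) + 1 heuristic from the list above."""
    if cpu_cores is None:
        # Assumption: fall back to whatever the OS reports, or 1 if unknown.
        cpu_cores = os.cpu_count() or 1
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return 2 * cpu_cores + 1


print(recommended_workers(2))   # 2-core VM
print(recommended_workers(16))  # 16-core machine
```

Treat the result as a starting point and tune it against real load and memory measurements.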
Container exits immediately after start
```bash
docker logs <container_id> --tail 50
docker run --rm <image> sh -c "python -c 'from app.main import app'"
```

Gunicorn workers not starting (permission denied)
```bash
docker exec <container> sh -c "ps aux | grep gunicorn"
docker exec <container> sh -c "netstat -tulpn | grep 8000"
```

Health check endpoint returns 500
```bash
docker exec <container> sh -c "curl -f http://localhost:8000/health"
docker logs <container_id> --since 5m | grep ERROR
```

Slow response times under load
```bash
curl -s http://localhost:8000/health | jq .
# Requires a statsd exporter
gunicorn --statsd-host=localhost:8125 -w 4 -k uvicorn.workers.UvicornWorker
```

Environment variables not loading inside container
```bash
docker exec <container> env | grep FORGE_
docker inspect <container> --format='{{range .Config.Env}}{{print . "\n"}}{{end}}'
```

Production Incident
The service failed with `ConnectTimeoutError: could not connect to server`, and the logs showed `'NoneType' object has no attribute 'connect'`.

The team expected a `ValidationError` and a failed startup. But they had declared `DATABASE_URL: str` with no `Field(...)` constraint, so an empty string was accepted.

The settings class used `database_url: str` instead of `database_url: PostgresDsn`. An empty string passed validation because `str` allows empty values. The database driver then tried to connect to an empty hostname, causing a timeout.

The fix was to use `database_url: PostgresDsn` from pydantic to enforce a valid URL format. The team also added a startup check that pings the database before the app accepts traffic.

- Never use plain `str` for connection strings in Pydantic; use the specialized types like `PostgresDsn`.
- Add a startup health check that fails the app if dependencies are unreachable.
- Use a separate `validate_on_startup` flag to skip validation in tests.

Production Debug Guide
Three symptoms and their fixes
- Run `docker stats` to confirm growth is not due to worker count. Add `async with` context managers in all resource-heavy endpoints.
- Set `--max-requests` to 1000 to force worker restarts after a fixed number of requests. Combine it with `--max-requests-jitter` to avoid thundering-herd restarts.
- Tempted to raise `--workers` to handle concurrency? Too many workers increase memory. Consider `--worker-class uvicorn.workers.UvicornH11Worker` instead of the default Uvicorn worker; it uses the pure-Python h11 library, which may have a lower memory footprint per connection. Measure before and after to confirm.

The Production Stack: Gunicorn + Uvicorn
Uvicorn is incredibly fast, but it lacks the advanced process management features required for 99.9% uptime. Gunicorn acts as the master process that manages the lifecycle of several Uvicorn worker processes. This architecture allows you to utilize multiple CPU cores effectively while maintaining a single entry point for your traffic.
```bash
# Install the production stack
pip install fastapi gunicorn uvicorn

# 1. Development mode (single process, auto-reload)
uvicorn io.thecodeforge.main:app --reload --host 127.0.0.1 --port 8000

# 2. Production mode (Gunicorn as process manager)
# --workers 4 (-w): spawns 4 worker processes
# --worker-class (-k): tells Gunicorn to use the Uvicorn worker class
gunicorn io.thecodeforge.main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --access-logfile - \
  --error-logfile -
```
[INFO] Listening at: http://0.0.0.0:8000
[INFO] Booting worker with pid: 42
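Gunicorn can also read these settings from a Python config file, which keeps long commands out of your Dockerfile. The following is a minimal sketch of a `gunicorn.conf.py` that mirrors the production command above. The `timeout` and `max_requests_jitter` values are assumptions to tune for your own latency and traffic, not recommendations.

```python
# gunicorn.conf.py (sketch): each name below is a standard Gunicorn setting.

bind = "0.0.0.0:8000"                           # same as --bind
workers = 4                                     # same as --workers 4
worker_class = "uvicorn.workers.UvicornWorker"  # same as --worker-class

# Assumed value: pick something comfortably above your slowest endpoint.
timeout = 60
# Seconds a worker gets to finish in-flight requests on shutdown.
graceful_timeout = 30

# Recycle each worker after roughly 1000 requests; the jitter (an assumed
# value) staggers restarts so workers do not all recycle at the same moment.
max_requests = 1000
max_requests_jitter = 100

accesslog = "-"  # "-" sends access logs to stdout
errorlog = "-"   # "-" sends error logs to stderr
```

You would then start the server with `gunicorn -c gunicorn.conf.py io.thecodeforge.main:app`.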
If a worker gets stuck in a `while True` loop, it never yields to Gunicorn's heartbeat. Gunicorn's `--timeout` (default 30s) kills and restarts it.

Set `--timeout` to at least 2× your longest endpoint latency.

Always set `--timeout` explicitly, or risk a silent worker freeze.

Production-Grade Dockerfile
A naive Dockerfile creates a bloated image. At TheCodeForge, we use a 'slim' base image and careful layer ordering. By copying only requirements.txt first, we ensure that Docker only re-installs your libraries if the dependencies change, drastically speeding up your CI/CD pipeline.
```dockerfile
# Use an official Python slim image for a smaller footprint
FROM python:3.12-slim

# Prevents Python from writing pyc files and buffering stdout/stderr
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

WORKDIR /app

# Install system dependencies if needed (e.g., libpq for Postgres)
# RUN apt-get update && apt-get install -y --no-install-recommends gcc && rm -rf /var/lib/apt/lists/*

# Layer caching: install dependencies first
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade -r requirements.txt

# Copy the application code
COPY . .

# Create a non-privileged user for security
RUN adduser --disabled-password forgeuser
USER forgeuser

# Expose the port FastAPI will run on
EXPOSE 8000

# Run Gunicorn with Uvicorn workers
CMD ["gunicorn", "io.thecodeforge.main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000"]
```
Robust Configuration with Pydantic Settings
Hardcoding database strings is a security hazard. We use pydantic-settings to create a strictly typed configuration object that pulls from environment variables. This ensures that if a required variable like DATABASE_URL is missing, the application will fail loudly at startup rather than crashing later in the middle of a request.
```python
from pydantic import Field, PostgresDsn
from pydantic_settings import BaseSettings, SettingsConfigDict


class ForgeSettings(BaseSettings):
    # Automatically reads from FORGE_DATABASE_URL or .env
    database_url: PostgresDsn
    secret_key: str = Field(min_length=32)
    debug: bool = False

    # Configuration for environment file loading
    model_config = SettingsConfigDict(env_file=".env", env_prefix="FORGE_")


# Instantiate once for singleton usage
settings = ForgeSettings()
```
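To see why a typed URL field matters more than a plain `str`, it helps to spell out the kind of check involved. The sketch below uses only the standard library and is not pydantic's implementation; `validate_database_url` and `ALLOWED_SCHEMES` are hypothetical names, and the accepted schemes are an assumption for illustration.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only these two schemes count as valid.
ALLOWED_SCHEMES = {"postgres", "postgresql"}


def validate_database_url(raw: str) -> str:
    """Fail fast on values that a plain `str` field would happily accept."""
    parts = urlsplit(raw)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported or missing scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("database URL has no host")
    return raw


# A well-formed URL passes through unchanged.
validate_database_url("postgresql://user:pass@localhost:5432/db")

# An empty string is rejected at startup instead of timing out later.
try:
    validate_database_url("")
except ValueError as exc:
    print(f"rejected: {exc}")
```

In the real application the `PostgresDsn` type does this job for you; the point is only that the failure moves from the first database call to startup.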
If you forget to set `DATABASE_URL` in the container environment and nothing validates it, FastAPI starts anyway, then crashes on the first DB call.

Environment Separation and .env Files
Different environments (dev, staging, production) need different configurations. Pydantic Settings loads from .env files locally, but in production you inject environment variables via your container orchestrator (Kubernetes, ECS). Keep a .env.example in version control — never commit the actual .env file.
```bash
# Example .env file: copy to .env and fill in values
FORGE_DATABASE_URL=postgresql://user:pass@localhost:5432/db
FORGE_SECRET_KEY=change-me-to-a-long-random-string
FORGE_DEBUG=false
```
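The `FORGE_` prefix is what keeps these values from colliding with unrelated variables such as a system-wide `DEBUG`. As a rough illustration of the idea (not how pydantic-settings is implemented, and simplified to be case-sensitive), a hypothetical `read_prefixed` helper could filter the environment like this:

```python
import os
from typing import Dict, Mapping, Optional


def read_prefixed(prefix: str, environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return only the variables that start with `prefix`, keyed by field name."""
    source = os.environ if environ is None else environ
    return {
        name[len(prefix):].lower(): value
        for name, value in source.items()
        if name.startswith(prefix)
    }


# Hypothetical environment: the unprefixed DEBUG belongs to some other tool.
sample_env = {"FORGE_DEBUG": "false", "DEBUG": "true", "PATH": "/usr/bin"}
print(read_prefixed("FORGE_", sample_env))
```

Only `FORGE_DEBUG` survives the filter, so the stray `DEBUG=true` cannot flip your application into debug mode.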
A `.env` file with real staging credentials was pushed to GitHub. Within two hours, crypto miners were running on that server.

Add `.env` to `.gitignore` and use `.env.example` as the committed template.

Health Checks and Graceful Shutdown
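Kubernetes and Docker rely on health checks to know if your app is alive and ready. FastAPI does not provide these endpoints for you; you define routes such as /health yourself, and you must also handle SIGTERM gracefully. Gunicorn's `--graceful-timeout` bounds how long workers get to finish in-flight requests, and `--preload` can cause issues if database connections are opened before the workers fork, because the forked workers then share them.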
```python
from fastapi import FastAPI

app = FastAPI()


def check_database() -> bool:
    # Placeholder: replace with a real connectivity check, e.g. running SELECT 1.
    return True


@app.get("/health")
async def health():
    # Report whether the database is reachable
    db_ok = check_database()
    return {"status": "healthy" if db_ok else "degraded"}


@app.get("/ready")
async def ready():
    return {"status": "ready"}
```
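Slim Python images often do not ship with `curl`, so a container health check written in Python can be more portable. Here is a minimal sketch using only the standard library. The names `probe` and `main` are hypothetical, and the default URL assumes the port and `/health` path used in this article.

```python
import urllib.error
import urllib.request
from typing import List

DEFAULT_URL = "http://localhost:8000/health"  # assumption: port and path from this article


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def main(argv: List[str]) -> int:
    """Exit-code style result: 0 when healthy, 1 otherwise."""
    url = argv[0] if argv else DEFAULT_URL
    return 0 if probe(url) else 1
```

Saved as a script that calls `sys.exit(main(sys.argv[1:]))`, this could back a Dockerfile `HEALTHCHECK` instruction. Note that it only checks the status code, so a handler that returns HTTP 200 with a "degraded" body still counts as healthy; return a 5xx status if you want the probe to fail in that case.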
Gunicorn's `--graceful-timeout` gives workers time to finish requests.

| Feature | Uvicorn Alone | Gunicorn + Uvicorn |
|---|---|---|
| Process management | None — single process | Master process spawns, monitors, restarts workers |
| Worker crash recovery | App goes down | Gunicorn auto-restarts crashed worker |
| Graceful shutdown | Manual SIGTERM handling required | Built-in graceful timeout and worker draining |
| Worker concurrency | One process by default (its `--workers` flag offers a basic multi-process mode) | Multiple workers across CPU cores |
| Zero-downtime reload | Not supported | HUP signal reloads workers gracefully; TTIN/TTOU adjust the worker count |
| Production maturity | Newer, less battle-tested | Industry standard for 10+ years |
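
Worker draining only helps if your application releases its own resources when it is told to stop. One common way to do that in FastAPI is a lifespan context manager passed as `FastAPI(lifespan=lifespan)`. The sketch below shows the pattern with the standard library only: `DemoPool` is a hypothetical stand-in for a real connection pool, and `demo` drives the lifespan with a fake app object so the example runs without a server.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace
from typing import Tuple


class DemoPool:
    """Hypothetical stand-in for a real database connection pool."""

    def __init__(self) -> None:
        self.is_open = False

    async def connect(self) -> None:
        self.is_open = True

    async def close(self) -> None:
        self.is_open = False


@asynccontextmanager
async def lifespan(app):
    # Startup: runs once per worker, after the fork, before traffic is served.
    pool = DemoPool()
    await pool.connect()
    app.state.pool = pool
    try:
        yield
    finally:
        # Shutdown: runs when the worker is asked to stop.
        await pool.close()


async def demo() -> Tuple[bool, bool]:
    """Drive the lifespan by hand and report the pool state during and after."""
    fake_app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(fake_app):
        during = fake_app.state.pool.is_open
    after = fake_app.state.pool.is_open
    return during, after


print(asyncio.run(demo()))
```

Opening the pool inside the lifespan, rather than at import time, also sidesteps the shared-connection problem that `--preload` can introduce.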
🎯 Key Takeaways
- Never use `--reload` in production; it watches source files for changes, which causes significant overhead and potential security leaks.
- Gunicorn handles the 'Manager' role (restarting dead workers), while Uvicorn handles the 'Worker' role (parsing async HTTP).
- Worker calculation: the standard heuristic is `(2 x CPU Cores) + 1`. On a 2-core cloud VM, use 5 workers.
- Containerization: always run as a non-root user (`USER forgeuser`) in your Dockerfile to mitigate container escape vulnerabilities.
- Pydantic Settings: use `env_prefix` to prevent collision with system environment variables.
- Health checks: add separate /health and /ready endpoints for liveness and readiness probes in orchestrators.
- Graceful shutdown: handle SIGTERM to close database connections; set Gunicorn's `--graceful-timeout`.
⚠ Common Mistakes to Avoid
Interview Questions on This Topic
- Explain the 'Master-Worker' architecture of Gunicorn. What happens to the Master process if a Worker process encounters a Segmentation Fault? (Senior)
- In a Docker context, why is it considered a 'best practice' to use a .dockerignore file for Python applications, and which specific folders should be included? (Mid-level)
- How does the `PYTHONUNBUFFERED` environment variable affect log visibility in containerized environments like AWS ECS or Google Cloud Run? (Mid-level)
- Scenario: You are deploying to a 16-core machine. Following the `(2n + 1)` rule gives you 33 workers. Why might this actually degrade performance for a high-concurrency FastAPI app, and how would you tune it? (Senior)
- What is the difference between `EXPOSE` and `PUBLISH` in Docker, and which one actually controls network access to your FastAPI app? (Junior)
Frequently Asked Questions
Why use Gunicorn with Uvicorn workers instead of just Uvicorn?
While Uvicorn has a `--workers` flag, Gunicorn is a battle-tested process manager that has been the industry standard for over a decade. It offers more sophisticated heartbeats, worker timeouts, and signal handling. If you are deploying on 'bare' servers or VMs, Gunicorn is the safer default. However, if you are using Kubernetes, you can often run Uvicorn alone because Kubernetes itself acts as the process manager, handling replicas and health checks at the pod level.
How do I handle database migrations in a Docker deployment?
Never put migration commands (like alembic upgrade head) inside your Docker CMD. This causes race conditions if you scale to multiple containers. Instead, run migrations as a separate 'init-container' in Kubernetes or a one-off task in your CI/CD pipeline before the new image goes live.
What is the 'Zombie Worker' problem in FastAPI deployments?
If a worker process gets stuck in a CPU-heavy loop (like an infinite `while` loop), it stops responding to Gunicorn's heartbeats, and Gunicorn will eventually kill and restart it. Async code helps only during I/O waits, where each `await` yields control back to the event loop. CPU-bound or blocking work should be moved off the event loop, for example to a thread or process pool.
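The difference between blocking the event loop and yielding to it can be sketched with the standard library alone. In this hypothetical demo, a background ticker plays the role of the heartbeat, and `time.sleep` stands in for any code that never awaits. Note the assumption: real CPU-bound Python code holds the GIL, so for that case a process pool is usually a better fit than the thread used here.

```python
import asyncio
import time
from typing import List


def blocking_work(seconds: float) -> None:
    """Stand-in for code that never awaits (blocking I/O or a tight loop)."""
    time.sleep(seconds)


async def ticker(counter: List[int], interval: float) -> None:
    """Plays the heartbeat: it can only tick while the event loop is free."""
    while True:
        await asyncio.sleep(interval)
        counter[0] += 1


async def measure_ticks(offload: bool) -> int:
    counter = [0]
    task = asyncio.create_task(ticker(counter, 0.01))
    await asyncio.sleep(0)  # give the ticker a chance to start
    if offload:
        # The loop stays responsive while the work runs in a thread.
        await asyncio.to_thread(blocking_work, 0.2)
    else:
        # The loop is frozen for the whole call; nothing else can run.
        blocking_work(0.2)
    ticks = counter[0]
    task.cancel()
    return ticks


print("ticks while blocked:  ", asyncio.run(measure_ticks(offload=False)))
print("ticks while offloaded:", asyncio.run(measure_ticks(offload=True)))
```

The blocked run records no ticks at all, which is what a frozen worker looks like from the outside.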
Should I use a reverse proxy like Nginx in front of Gunicorn?
Yes, in production. Nginx handles SSL termination, static file serving, rate limiting, and request buffering. Gunicorn alone is vulnerable to slow-client HTTP attacks. The typical stack: Nginx → Gunicorn + Uvicorn → FastAPI. For container deployments, you can use Caddy or Traefik as alternatives.
How do I enable Prometheus metrics for FastAPI?
Install `prometheus-fastapi-instrumentator` and add it to your app: `from prometheus_fastapi_instrumentator import Instrumentator; Instrumentator().instrument(app).expose(app)`. This exposes metrics at /metrics. Configure Prometheus to scrape that endpoint. Combine with Gunicorn's `--statsd-host` for worker-level metrics.