Performance insight: stateless processes enable horizontal scaling. Any instance can handle any request, so autoscaling becomes close to linear and predictable.
Production insight: violating Factor III (keeping config in code) leads to credential leaks; secrets hardcoded and committed to git are a well-known, recurring source of cloud breaches.
Biggest mistake: treating the factors as optional. Each one prevents a specific class of production failure; skipping even one leaves a brittle point.
Plain-English First
Imagine a fast-food franchise. Every location makes the same burger the same way, using the same recipe, the same equipment, and the same training manual. A new employee in Tokyo can follow the same steps as one in Toronto and get the same result. The Twelve-Factor App is that franchise manual — but for software. It's a set of twelve rules that makes your app behave predictably no matter where it runs: your laptop, a test server, or a cloud cluster with a thousand instances.
Every developer has felt the dread of 'it works on my machine.' You deploy to staging and something breaks. You scale up and the app starts behaving differently under load. You hand the codebase to a new teammate and it takes them two days just to run it locally. These aren't bad-luck problems — they're architecture problems. And they have a name: tightly coupled, environment-dependent software.
In 2011, the engineers at Heroku distilled years of operating thousands of production apps into a document called the Twelve-Factor App. It's not a framework or a library — it's a methodology. Twelve principles that, when followed together, produce apps that are portable between environments, scalable without re-architecture, and maintainable by any competent developer who picks up the codebase. Cloud platforms like Heroku, AWS Elastic Beanstalk, and Google Cloud Run are essentially built around these ideas.
By the end of this article you'll understand not just what each factor is, but exactly WHY it exists — what specific failure mode it prevents. You'll see concrete code-level and config-level examples, know which factors trip up most teams in production, and be able to speak fluently about this methodology in a system design interview. Let's build something that actually scales.
Factors I–IV: Your Codebase, Dependencies, Config, and Backing Services
The first four factors are about the foundation: how you store your code, how you declare what it needs, where you put your secrets, and how you talk to external things like databases.
Factor I — Codebase: One codebase, tracked in version control, deployed many times. If you have two apps sharing code via copy-paste, that's two codebases — extract the shared part into a library. If one codebase powers multiple apps, that's a monorepo (a different, legitimate pattern), but the factor still applies per deployable unit.
Factor II — Dependencies: Explicitly declare every dependency. Never rely on system-wide installed packages. A Python app should have a requirements.txt. A Node app, package.json. This means a fresh clone + one install command = runnable app. No 'oh you also need to brew install libpq globally' surprises.
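The "one install command" promise only holds if every version is exact. Here is a minimal sketch that flags requirements lines not pinned with ==. It assumes simple one-requirement-per-line files and ignores environment markers, hashes and -r includes. The function name is hypothetical; tools such as pip-tools or Poetry handle far more cases.

```python
import re

# Matches "package==1.2.3", optionally with extras such as "uvicorn[standard]==0.29.0".
# Assumption: simple files; markers, hashes and "-r other.txt" includes are not parsed.
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+(\[[A-Za-z0-9_,\-]+\])?==[^=<>~!\s]+$")


def find_unpinned(requirements_text: str) -> list:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r / --index-url
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = """
    fastapi==0.110.0
    django>=4.0
    requests
    python-dotenv==1.0.1  # local dev only
    """
    print(find_unpinned(sample))  # ['django>=4.0', 'requests']
```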
Factor III — Config: Anything that changes between deploys (dev, staging, prod) lives in environment variables — not in code, not in a config file committed to git. Database URLs, API keys, feature flags: all env vars. The test is simple — could you open-source your codebase right now without leaking credentials? If yes, your config is correctly separated.
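Here is a minimal Python sketch of Factor III, assuming the variable names from this section's Compose example (APP_ENV, SECRET_KEY, LOG_LEVEL). Settings are read from the environment once at startup. Essential values have no hardcoded fallback, so a missing variable stops the app instead of letting it run with a dev value. The Settings class and from_env method are illustrative names, not a library API.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    app_env: str
    secret_key: str
    log_level: str

    @classmethod
    def from_env(cls, environ: Mapping[str, str] = os.environ) -> "Settings":
        """Build settings from environment variables; fail fast if essentials are missing."""
        required = ("APP_ENV", "SECRET_KEY")
        missing = [name for name in required if not environ.get(name)]
        if missing:
            raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
        return cls(
            app_env=environ["APP_ENV"],
            secret_key=environ["SECRET_KEY"],
            # Assumption: a harmless default is acceptable for a non-essential setting.
            log_level=environ.get("LOG_LEVEL", "info"),
        )
```

Passing a plain dict instead of os.environ makes the same code easy to test, and it never needs to know which environment it runs in.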
Factor IV — Backing Services: Treat every external resource (database, cache, message queue, email service) as an attached resource accessed via URL. Swapping your local Postgres for a managed RDS instance should require only changing an environment variable, not touching code. This is the plugin model applied to infrastructure.
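To make the "swap by changing a URL" point concrete, here is a sketch that splits a DATABASE_URL into connection parameters using only urllib.parse from the standard library. The code never names a host, so pointing the variable at a local container or at a managed database is purely a deploy-time change. The function name, the default port of 5432, and the example hostnames are assumptions for illustration.

```python
from urllib.parse import unquote, urlsplit


def connection_params(database_url: str) -> dict:
    """Split a URL like postgresql://user:pass@host:5432/dbname into its parts."""
    parts = urlsplit(database_url)
    return {
        "scheme": parts.scheme,
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),  # percent-encoded characters decoded
        "host": parts.hostname,
        "port": parts.port or 5432,  # assumption: fall back to Postgres' standard port
        "dbname": parts.path.lstrip("/"),
    }


local = connection_params("postgresql://app_user:app_pass@postgres_db:5432/appdb")
prod = connection_params("postgresql://app_user:s3cret@prod-db.example.com:5432/appdb")
print(local["host"])  # postgres_db
print(prod["host"])   # prod-db.example.com
```

A driver such as psycopg2 can also take the URL directly. Splitting it here just shows that host, credentials and database name all come from one environment variable.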
twelve-factor-foundation.yaml (YAML)
# ============================================================
# Demonstrating Factors I-IV in a real Docker Compose setup
# This file shows how a twelve-factor app wires its foundation
# ============================================================
version: '3.9'

services:
  # --- The application (Factor I: one codebase, one image) ---
  web_api:
    build:
      context: .               # Build from THIS repo: one codebase, one image
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    # Factor III: ALL config via environment variables.
    # Nothing here is hardcoded in application source code.
    environment:
      - APP_ENV=development
      - SECRET_KEY=dev-only-secret-replace-in-prod   # In prod, inject via a secrets manager
      - LOG_LEVEL=debug
      # Factor IV: backing services treated as attached resources via URL.
      # Point DATABASE_URL at RDS in prod: zero code changes needed.
      - DATABASE_URL=postgresql://app_user:app_pass@postgres_db:5432/appdb
      - CACHE_URL=redis://cache_store:6379/0
      - EMAIL_API_URL=https://api.sendgrid.com/v3/mail/send
      - EMAIL_API_KEY=SG.placeholder-replace-with-real-key
    depends_on:
      - postgres_db
      - cache_store

  # --- Postgres (a backing service, Factor IV) ---
  postgres_db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_USER=app_user
      - POSTGRES_PASSWORD=app_pass
      - POSTGRES_DB=appdb
    volumes:
      - postgres_data:/var/lib/postgresql/data   # Persist data outside the container

  # --- Redis cache (another backing service, Factor IV) ---
  cache_store:
    image: redis:7-alpine

volumes:
  postgres_data:

# ============================================================
# Factor II: explicit dependencies live in requirements.txt
# (shown below; never pip install globally without pinning)
# ============================================================
# requirements.txt (referenced by the Dockerfile)
# fastapi==0.110.0
# uvicorn==0.29.0
# psycopg2-binary==2.9.9
# redis==5.0.3
# httpx==0.27.0
# python-dotenv==1.0.1   # Only for local dev; prod uses real env vars
Output
$ docker compose up --build
[+] Building web_api (12 layers) — DONE
[+] Running 3/3
✔ Container postgres_db Started
✔ Container cache_store Started
✔ Container web_api Started
web_api | INFO: Started server process [1]
web_api | INFO: Waiting for application startup.
web_api | INFO: Application startup complete.
web_api | INFO: Uvicorn running on http://0.0.0.0:8000
# In production, only DATABASE_URL changes — zero code changes needed
Watch Out: The .env file trap
Using a .env file locally is fine — but NEVER commit it to git. Add .env to .gitignore immediately. Provide a .env.example with placeholder values instead. Leaked API keys in git history are one of the most common (and costly) security incidents in real teams.
Production Insight
The secret's in the environment, not the code.
Config leaks happen when a team commits a .env.development (or any other env file) to git, even temporarily.
Rule: gitignore all .env* patterns (committing only a .env.example template) and rotate any credential that has ever been committed.
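That rule can be checked mechanically. Here is a minimal sketch, assuming you feed it the paths printed by git ls-files. It reports tracked files named .env or .env.something and allows only .env.example as the committed template. The allow-list and function names are assumptions to adapt to your own conventions.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List

ALLOWED = {".env.example"}  # assumption: the only env file that may be committed


def tracked_env_files(paths: List[str]) -> List[str]:
    """Return tracked paths that look like real env files (.env, .env.production, ...)."""
    flagged = []
    for path in paths:
        name = PurePosixPath(path).name
        if (name == ".env" or name.startswith(".env.")) and name not in ALLOWED:
            flagged.append(path)
    return flagged


def git_tracked_paths() -> List[str]:
    """List the files git currently tracks in this repository."""
    result = subprocess.run(["git", "ls-files"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Inside a repository, tracked_env_files(git_tracked_paths()) returning a non-empty list is a signal to remove the file from git and rotate whatever it contained.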
Key Takeaway
Config in code is the root of many evils.
A twelve-factor app treats config as environment, not code — because environment can be changed without a deploy.
Codebase decisions (Factor I): shared code and monorepos
If: Two apps need to share a common config schema or validation logic
→
Use: Extract the shared logic into a library package and include it as a dependency in both apps. Do not copy-paste config files between repos.
If: One repo contains both frontend and backend codebases that are deployed separately
→
Use: A monorepo tool (Nx, Turborepo) that allows independent build and deploy per codebase. Each deployable unit still follows Factor I.
Factors V–VIII: Build, Process Model, Port Binding, and Concurrency
These four factors define how your app runs. They're the reason cloud platforms can scale you from 1 instance to 1,000 without you writing special scaling code.
Factor V — Build, Release, Run: Strictly separate these three stages. The build stage compiles code and assets. The release stage combines the build with config (env vars). The run stage executes the release. Code is never changed inside a running process; hot-patching a live server is an anti-pattern because the running code no longer matches any release. Every release gets a unique ID, so you can roll back to, say, release #47 at any time.
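One way to picture the three stages is as data. A build is an immutable artifact. A release pairs that artifact with a read-only config snapshot under a unique ID. Rollback means re-selecting an earlier release. The sketch below is a toy in-memory model; the class and field names are illustrative, not any platform's API.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Release:
    """A build plus a config snapshot; never modified once created."""
    release_id: int
    build_image: str            # e.g. an immutable, tagged Docker image
    config: Mapping[str, str]   # read-only copy of the env vars for this release


@dataclass
class ReleaseLog:
    releases: List[Release] = field(default_factory=list)
    current: Optional[Release] = None

    def create(self, build_image: str, config: Mapping[str, str]) -> Release:
        """Release stage: pair a build with a frozen copy of the current config."""
        release = Release(
            release_id=len(self.releases) + 1,
            build_image=build_image,
            config=MappingProxyType(dict(config)),  # copy, then make read-only
        )
        self.releases.append(release)
        self.current = release
        return release

    def rollback(self, release_id: int) -> Release:
        """Re-select an earlier release exactly as it was built and configured."""
        for release in self.releases:
            if release.release_id == release_id:
                self.current = release
                return release
        raise ValueError(f"Unknown release #{release_id}")
```

In this model, rollback restores the old build together with its old config. If you want the old build with today's config, create a new release from the old image instead: log.create(old.build_image, current_config).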
Factor VI — Processes: Run your app as one or more stateless processes. No sticky sessions. No storing user data in memory between requests. If your app needs to remember something, it stores it in a backing service (Redis, Postgres). This is what makes horizontal scaling possible — any instance can handle any request.
Factor VII — Port Binding: Your app is self-contained and exports its service by binding to a port. It doesn't rely on a web server like Apache being injected at runtime. A FastAPI app declares Uvicorn as a dependency; run uvicorn main:app --port 8000 and the process itself is the web server. This also lets the app be consumed as a backing service by other apps.
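Here is a dependency-free sketch of port binding using only the standard library. The process reads its port from a PORT environment variable and serves HTTP itself, with no external web server injected at runtime. Heroku and Cloud Run both set PORT; the fallback of 8000 and the handler are assumptions for illustration.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


class HealthHandler(BaseHTTPRequestHandler):
    """Answers every GET with a tiny JSON health payload."""

    def do_GET(self):
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request stderr lines to keep the sketch quiet;
        # a real app would log each request to stdout (Factor XI).
        pass


def make_server(port: Optional[int] = None) -> HTTPServer:
    """Bind to 0.0.0.0 on $PORT (8000 if unset): the process itself is the web server."""
    if port is None:
        port = int(os.environ.get("PORT", "8000"))
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

In a container, make_server().serve_forever() would be the process's entry point. The platform only needs to know which port to route traffic to.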
Factor VIII — Concurrency: Scale out via the process model, not up via bigger machines. Use a process type hierarchy: web processes handle HTTP, worker processes handle background jobs, scheduler processes handle cron jobs. Scale each type independently. Ten web processes + two worker processes is better than one giant machine doing everything.
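To make the process-type hierarchy concrete, here is a sketch that reads a Procfile (Heroku's one-line-per-type "name: command" format). It then expands a formation, meaning the number of processes to run per type, into the individual processes a platform would start. The function names are hypothetical and the counts in the usage are illustrative.

```python
from typing import Dict, List, Tuple


def parse_procfile(text: str) -> Dict[str, str]:
    """Map each process type (web, worker, scheduler, ...) to its command."""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, command = line.partition(":")  # split on the first colon only
        processes[name.strip()] = command.strip()
    return processes


def expand_formation(processes: Dict[str, str], formation: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return one (instance_name, command) pair per process to start."""
    instances = []
    for proc_type, count in formation.items():
        if proc_type not in processes:
            raise KeyError(f"No such process type in Procfile: {proc_type}")
        for n in range(1, count + 1):
            instances.append((f"{proc_type}.{n}", processes[proc_type]))
    return instances
```

Scaling web from 3 to 10 changes only the formation numbers; the commands and the code stay the same.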
stateless_web_process.py (Python)
# ============================================================
# Factor VI: Stateless processes, the right way
# This FastAPI app stores ALL session state in Redis,
# so ANY running instance can handle ANY request.
# Scale to 50 instances and every one works identically.
# ============================================================
import json
import os
import uuid
from datetime import datetime, timezone
from typing import Optional

import redis
from fastapi import Cookie, FastAPI, HTTPException
from fastapi.responses import JSONResponse
from pydantic import BaseModel

app = FastAPI()

# Factors III + IV: backing service URL comes from an environment variable
SESSION_STORE = redis.from_url(
    os.environ["CACHE_URL"],  # e.g. redis://cache_store:6379/0
    decode_responses=True,
)
SESSION_TTL_SECONDS = 3600  # Sessions expire after 1 hour


class LoginRequest(BaseModel):
    username: str
    password: str  # Sent in the JSON body, never in the URL/query string


# Plain `def` (not `async def`): the redis client used here is synchronous,
# so FastAPI runs these handlers in a thread pool instead of blocking the event loop.
@app.post("/login")
def login(credentials: LoginRequest):
    """
    Factor VI in action: create a session token and store ALL session
    data in Redis. Nothing lives in process memory, so any instance
    of this app can validate the session later.
    """
    # Placeholder check: a real app verifies a password hash from the database
    if credentials.password != "correct-horse-battery-staple":
        raise HTTPException(status_code=401, detail="Invalid credentials")

    session_id = str(uuid.uuid4())  # Unique, unguessable session ID
    session_data = {
        "username": credentials.username,
        "role": "editor",
        "login_timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Store the session in the backing service, NOT in process memory
    SESSION_STORE.setex(
        name=f"session:{session_id}",    # Namespaced key
        time=SESSION_TTL_SECONDS,         # Auto-expire old sessions
        value=json.dumps(session_data),   # Serialised to a string for Redis
    )

    response = JSONResponse({"message": "Login successful", "session_id": session_id})
    # Return the session ID to the client as an HttpOnly cookie
    response.set_cookie(key="session_id", value=session_id, httponly=True)
    return response


@app.get("/dashboard")
def dashboard(session_id: Optional[str] = Cookie(default=None)):
    """
    Any of the 50 running instances can serve this request because
    session state lives in Redis, not in process memory.
    This is what makes horizontal scaling work.
    """
    if not session_id:
        raise HTTPException(status_code=401, detail="No session cookie")

    # Look up the session in Redis: works no matter which instance handles this
    raw_session = SESSION_STORE.get(f"session:{session_id}")
    if not raw_session:
        raise HTTPException(status_code=401, detail="Session expired or invalid")

    session_data = json.loads(raw_session)
    return {
        "welcome": f"Hello, {session_data['username']}!",
        "role": session_data["role"],
        "instance_note": "Any instance served this: stateless processes working correctly",
    }

# ============================================================
# Factor VII: Port binding; the app is self-contained
# Run with: uvicorn stateless_web_process:app --host 0.0.0.0 --port 8000
# No Apache/Nginx dependency at runtime. The app IS the server.
# ============================================================

# ============================================================
# Factor VIII: Concurrency; Procfile for the process type hierarchy
# web: uvicorn stateless_web_process:app --host 0.0.0.0 --port $PORT --workers 4
# worker: celery -A tasks worker --concurrency=8
# scheduler: celery -A tasks beat
# Scale web and worker independently; no single giant process
# ============================================================
Output
# GET /dashboard (with cookie, served by a DIFFERENT instance)
# Response: {
#   "welcome": "Hello, alice!",
#   "role": "editor",
#   "instance_note": "Any instance served this: stateless processes working correctly"
# }
Pro Tip: The Sticky Session Smell Test
If someone on your team says 'we need sticky sessions because our app stores X in memory', that's Factor VI being violated. The fix isn't to configure sticky sessions in your load balancer — it's to move X into Redis or Postgres. Sticky sessions are a band-aid that makes horizontal scaling fragile and breaks when an instance restarts.
Production Insight
Sticky sessions hide the real problem.
When a web instance restarts, any session data held in its memory is lost — users get logged out or see partial state.
Rule: if you need sticky sessions, you don't have stateless processes — fix that first.
Key Takeaway
Statelessness is the prerequisite for scale.
Any in-memory state ties a user to a single instance. Move it out, and scaling becomes a matter of adding instances.
Build/Release/Run separation decisions
If: You need to roll back to a previous version of the app with the same config as current production
→
Use: A release ID (e.g., git tag or Docker image tag), with the config snapshot stored separately. Roll back by deploying the old release image with the same config.
If: You want to run a database migration as a one-off process
→
Use: The same release image with an alternative command (e.g., docker run release:v47 python manage.py migrate). Never SSH into a running container to run migrations.
Factors IX–XII: Disposability, Dev/Prod Parity, Logs, and Admin Tasks
The final four factors are about operational maturity — how your app behaves under real production conditions: restarts, failures, debugging, and maintenance.
Factor IX — Disposability: Processes start fast and shut down gracefully. On SIGTERM, a web process stops accepting new requests, finishes in-flight requests, then exits. A worker process returns its current job to the queue before dying. This means you can deploy new versions, auto-scale down, or recover from crashes without data loss or user-facing errors. If your app takes 3 minutes to start, you can't rapidly scale or deploy.
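For the web side of Factor IX, here is a standard-library sketch of "stop accepting, finish in-flight, exit". It relies on ThreadingMixIn behaviour in Python 3.7+: with daemon_threads set to False, server_close() waits for request threads to finish. shutdown() is called from a helper thread because it blocks until serve_forever() returns and must not run on that same thread. The handler, its 0.5 s delay, and the helper names are illustrative assumptions.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DrainingHTTPServer(ThreadingHTTPServer):
    # ThreadingHTTPServer uses daemon threads by default; non-daemon threads
    # make server_close() join them, i.e. wait for in-flight requests.
    daemon_threads = False
    block_on_close = True


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(0.5)  # simulate a request still in flight when SIGTERM arrives
        body = b"done"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def install_sigterm_handler(server: DrainingHTTPServer) -> None:
    """On SIGTERM, stop accepting connections (call this from the main thread)."""
    def on_sigterm(signum, frame):
        # shutdown() waits for serve_forever() to return, so run it off this thread.
        threading.Thread(target=server.shutdown, daemon=True).start()
    signal.signal(signal.SIGTERM, on_sigterm)


def serve(server: DrainingHTTPServer) -> None:
    """Serve until shutdown, then close the socket and wait for in-flight requests."""
    try:
        server.serve_forever()
    finally:
        server.server_close()  # closes the listener, then joins request threads
```

In a real entry point you would create the server, call install_sigterm_handler(server), then serve(server). When the platform sends SIGTERM, new connections are refused while requests already being handled run to completion.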
Factor X — Dev/Prod Parity: Keep development, staging, and production as similar as possible: the same OS, the same kinds of backing services, and the same versions of them. The classic violation is using SQLite locally but Postgres in prod, which lets Postgres-specific bugs slip all the way to production. Use Docker Compose locally to run the real Postgres version.
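Version drift is easy to spot once both environments are described the same way. Here is a minimal sketch of a parity check, assuming you have already extracted a service-to-version map for each environment (for example from image tags in your Compose file and production manifests). That extraction step is not shown, and the function name is hypothetical.

```python
from typing import Dict, List


def parity_gaps(dev: Dict[str, str], prod: Dict[str, str]) -> List[str]:
    """Compare service -> version maps and describe every mismatch."""
    gaps = []
    for service in sorted(dev.keys() | prod.keys()):
        dev_version = dev.get(service, "missing")
        prod_version = prod.get(service, "missing")
        if dev_version != prod_version:
            gaps.append(f"{service}: dev={dev_version}, prod={prod_version}")
    return gaps
```

Run against the Redis 5 versus Redis 7 case described in the dev/prod parity guidance, it reports redis as a gap. A service that exists in only one environment (like SQLite in dev) shows up as "missing" on the other side.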
Factor XI — Logs: Treat logs as event streams. Your app writes to stdout — period. It does NOT manage log files, rotate them, or decide where they go. The execution environment captures stdout and routes it to wherever you've configured (Datadog, Splunk, CloudWatch). This separation lets ops teams change log routing without touching application code.
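Writing to stdout is half of Factor XI; making each line machine-parseable is the other half. A hand-written JSON format string breaks as soon as a message contains a double quote, so a small logging.Formatter subclass that calls json.dumps is safer. Here is a minimal sketch; the field names are an assumption, so match whatever your aggregator expects.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # json.dumps escapes quotes and newlines
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_stdout_logging(level: int = logging.INFO) -> None:
    """Send all logs to stdout only; routing them is the platform's job."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any existing (e.g. file) handlers
    root.setLevel(level)
```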
Factor XII — Admin Processes: Run one-off admin tasks (database migrations, console sessions, data backups) as one-off processes in the same environment as the app. heroku run python manage.py migrate — same release, same config, same codebase. Don't ssh into a production box and run scripts by hand.
graceful_shutdown_and_logging.py (Python)
# ============================================================
# Factor IX: Disposability (graceful shutdown)
# Factor XI: Logs as event streams (write to stdout only)
# ============================================================
import logging
import signal
import sys
import time
from threading import Event
from typing import Optional

# Factor XI: configure logging to stdout ONLY.
# The platform (Heroku/K8s/ECS) captures this stream and routes it.
# Your app NEVER writes to /var/log/app.log or manages log rotation.
logging.basicConfig(
    stream=sys.stdout,  # stdout only, no file handlers
    level=logging.INFO,
    # JSON-shaped lines that aggregators such as Datadog can parse.
    # Caveat: this format string does not escape quotes inside messages.
    format='{"timestamp": "%(asctime)s", "level": "%(levelname)s", "message": "%(message)s"}',
    datefmt="%Y-%m-%d %H:%M:%S",
)
logger = logging.getLogger(__name__)

WORK_STEP_SECONDS = 0.1  # How often an in-progress job checks for shutdown


class OrderProcessingWorker:
    """
    A background worker that processes orders from a queue.
    Demonstrates Factor IX: fast startup + graceful SIGTERM handling.
    """

    def __init__(self):
        self.shutdown_event = Event()
        self.current_job_id: Optional[str] = None
        # SIGTERM is sent by Heroku/Kubernetes when scaling down or deploying
        signal.signal(signal.SIGTERM, self._handle_shutdown_signal)
        signal.signal(signal.SIGINT, self._handle_shutdown_signal)  # Ctrl+C locally

    def _handle_shutdown_signal(self, signum, frame):
        """
        Factor IX: don't die mid-job. Record the shutdown request and let the
        main loop hand unfinished work back to the queue, then exit cleanly.
        """
        logger.info(f"Received {signal.Signals(signum).name}, starting graceful shutdown")
        self.shutdown_event.set()

    def process_order(self, order_id: str, order_data: dict) -> bool:
        """Simulate processing one order; return False if interrupted by shutdown."""
        self.current_job_id = order_id
        logger.info(f"Processing order {order_id} for customer {order_data['customer_email']}")
        # Simulate ~2 s of work in small steps so a shutdown request is noticed quickly
        for _ in range(20):
            if self.shutdown_event.is_set():
                logger.info(f"Returning job {order_id} to queue before shutdown")
                # In a real app: queue.nack(order_id) (hypothetical queue client),
                # so another worker picks it up. No lost orders.
                self.current_job_id = None
                return False
            time.sleep(WORK_STEP_SECONDS)
        logger.info(f"Order {order_id} processed successfully, total: ${order_data['total_cents'] / 100:.2f}")
        self.current_job_id = None
        return True

    def run(self):
        """Main processing loop."""
        logger.info("Order worker started, listening for jobs")  # Logs to stdout
        # Simulate picking up jobs from a queue
        pending_orders = [
            {"id": "ord_001", "customer_email": "alice@example.com", "total_cents": 4999},
            {"id": "ord_002", "customer_email": "bob@example.com", "total_cents": 12500},
            {"id": "ord_003", "customer_email": "carol@example.com", "total_cents": 899},
        ]
        for order in pending_orders:
            if self.shutdown_event.is_set():
                break  # Don't start a new job while shutting down
            if not self.process_order(order["id"], order):
                break  # Job was handed back to the queue; stop here
        logger.info("Worker shut down cleanly")  # This reaches your log aggregator
        sys.exit(0)  # Clean exit code: the platform knows this was intentional


# ============================================================
# Factor XII: admin process example
# Run database migrations as a one-off process:
#   heroku run python manage.py db upgrade
# or in Docker:
#   docker run --env-file .env myapp:v1.2 python manage.py db upgrade
# Same image, same config, same codebase as production: guaranteed consistency
# ============================================================
if __name__ == "__main__":
    worker = OrderProcessingWorker()
    worker.run()
Output
# SIGTERM sent about 3 seconds after start, while ord_002 is in progress
{"timestamp": "2024-01-15 09:30:01", "level": "INFO", "message": "Order worker started, listening for jobs"}
{"timestamp": "2024-01-15 09:30:01", "level": "INFO", "message": "Processing order ord_001 for customer alice@example.com"}
{"timestamp": "2024-01-15 09:30:03", "level": "INFO", "message": "Order ord_001 processed successfully, total: $49.99"}
{"timestamp": "2024-01-15 09:30:03", "level": "INFO", "message": "Processing order ord_002 for customer bob@example.com"}
{"timestamp": "2024-01-15 09:30:04", "level": "INFO", "message": "Received SIGTERM, starting graceful shutdown"}
{"timestamp": "2024-01-15 09:30:04", "level": "INFO", "message": "Returning job ord_002 to queue before shutdown"}
{"timestamp": "2024-01-15 09:30:04", "level": "INFO", "message": "Worker shut down cleanly"}
# ord_002 is picked up by another worker instance: zero data loss
Interview Gold: Why Logs to Stdout?
Interviewers love this one. The answer isn't just 'it's a best practice'. It's a separation of concerns: the app's job is to produce events; the platform's job is to route them. In a Kubernetes cluster you might have 40 pods across 10 nodes — you can't have each writing to local files. stdout means a single log aggregator (Fluentd, Filebeat) can collect everything centrally. It also means log routing config lives in the platform, not scattered across dozens of application codebases.
Production Insight
Log files inside containers disappear with the container.
When Kubernetes kills a pod for scaling down, all log files on the pod are gone. You lose that data.
Rule: always write logs to stdout, and use the platform's log driver to persist them.
Key Takeaway
Stdout is the universal log interface.
Let your app write events, let the platform decide where they go. That separation is what makes centralized logging possible at scale.
Dev/Prod parity enforcement
If: Your local dev uses SQLite but production uses Postgres
→
Use: Switch to Docker Compose with a Postgres container. Use environment variables to switch between local Postgres and production RDS.
If: Production uses Redis 7 but dev has Redis 5 (system package)
→
Use: Always use the same version in dev: run Redis 7 in Docker locally. Dev/prod parity includes versions, not just technology.
Common Twelve-Factor Violations in Production (and How to Fix Them)
Even teams that know the twelve factors often get them wrong in subtle ways. Here are the most frequent violations we see in production audits — and exactly how to fix each one.
Violation: Config in code with fallback defaults. Many codebases have os.getenv('DATABASE_URL', 'sqlite:///dev.db'). That fallback means the app silently runs against SQLite in a production environment where the env var is missing. Instead of a fast failure, you get data loss. Fix: remove fallback values. If the env var is missing, crash early with a clear error.
Violation: Dependencies not pinned. A requirements.txt with django>=4.0 rather than django==4.2.7 lets a fresh build pull in a newer release that changes behaviour (e.g., Django 4.2 dropped support for PostgreSQL 11). The build that worked yesterday might fail today. Fix: pin exact versions and use a lockfile (e.g., pip freeze > requirements.txt, or Poetry's poetry.lock).
Violation: Backing services accessed via IP rather than URL. Hardcoding 192.168.1.5:5432 instead of using a DNS name or service URL. When the database is migrated or scaled, the IP changes. Fix: always connect via a resolvable hostname (e.g., postgres_db in Docker Compose, or a full URL like postgresql://host:port/db).
Violation: Process stores local state in the filesystem. Uploaded images stored in ./uploads/ on the local disk. When you scale to multiple instances, an image uploaded to instance A is not available on instance B. Fix: use object storage (S3, GCS) or a shared file system (NFS, EFS).
violations_fixes.py (Python)
# ============================================================
# Fixes for common twelve-factor violations
# ============================================================
import os


def require_env(name: str) -> str:
    """Return a required environment variable, or stop the app with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} environment variable is not set. App cannot start.")
    return value


# VIOLATION: config with a fallback (silently uses dev defaults in prod)
# Bad:
#   DATABASE_URL = os.getenv('DATABASE_URL', 'sqlite:///dev.db')
# Fix: crash early if essential config is missing
DATABASE_URL = require_env("DATABASE_URL")

# VIOLATION: storing uploads on local disk (lost on restart, not shared between instances)
# Bad:
#   upload_path = '/tmp/uploads/'
# Fix: point at object storage (S3/GCS) and access it with boto3, s3fs or similar
UPLOAD_BUCKET_URL = require_env("UPLOAD_BUCKET_URL")  # e.g. s3://myapp-uploads/

# VIOLATION: hardcoded backing service IP
# Bad:
#   REDIS_HOST = '192.168.1.10'
# Fix: take the full URL (with a resolvable hostname) from the environment
REDIS_URL = require_env("REDIS_URL")  # e.g. redis://redis_service:6379/0
Output
$ python violations_fixes.py
Traceback (most recent call last):
  ...
RuntimeError: DATABASE_URL environment variable is not set. App cannot start.
# Good: early failure prevents silent data loss
Seven Sins of Twelve-Factor Violations
Config in code → Factor III (Config) – fix by moving to env vars
Sticky sessions needed → Factor VI (Processes) – move state to backing service
Slow startup (>30s) → Factor IX (Disposability) – lazy load and defer
Log files in container → Factor XI (Logs) – switch to stdout
SQLite locally, Postgres in prod → Factor X (Dev/Prod parity) – use Docker Compose with same DB version
SSH into prod to run migrations → Factor XII (Admin processes) – use one-off containers with same image
Relies on global npm packages → Factor II (Dependencies) – declare in package.json and use lockfile
Production Insight
Most production incidents trace back to one of these seven violations.
The fix is usually a targeted change (an env var, a session store, a log handler), not a code rewrite.
Rule: run a weekly automated compliance check using a script that verifies each factor (e.g., check no .env files in git, check lockfile exists, check stdout logging).
Key Takeaway
Compliance is a habit, not a project.
Automate it. Every deploy should include a twelve-factor compliance gate that blocks violations before they reach production.
Auditing Your App for Twelve-Factor Compliance
You can't fix what you don't measure. Here's a systematic way to audit your codebase against all twelve factors. Run this checklist before every major deploy or at least once a quarter.
1. Codebase (I): Is there exactly one deployable unit per repository? Check for multiple apps in one repo with no separation. If using a monorepo, verify each app has its own package.json or pom.xml.
2. Dependencies (II): Does the project have an explicit dependency file (requirements.txt, package.json, pom.xml, Gemfile)? Does it include transitive dependencies via a lockfile? Run pip freeze or npm list and compare to the declared file.
3. Config (III): Search the entire codebase for any hardcoded strings that look like secrets or environment-specific values. Use grep -rn 'password\|secret\|api_key\|DATABASE_URL' . --exclude-dir=.git. Check that no .env files are in git history.
4. Backing Services (IV): Identify all external services (database, cache, queue, mail). Verify that each is accessed via a URL from an environment variable — not a hardcoded hostname or IP.
5. Build/Release/Run (V): Can you build a release artifact (Docker image, JAR) that is immutable? Does the CI/CD pipeline separate build from release? Is there a way to roll back to a previous release by changing the release tag?
6. Processes (VI): Test by logging in, then connecting to a different instance and checking that the session persists. Also test by killing one instance while a request is in-flight — does the load balancer retry on another instance without user-visible error?
7. Port Binding (VII): Does the app expose itself by binding to a port (e.g., 0.0.0.0:8000) without requiring an external web server? Can you access it directly on that port?
8. Concurrency (VIII): Is there a process type hierarchy (web, worker, scheduler)? Can each scale independently? Check the Procfile or equivalent.
9. Disposability (IX): How long does it take to start from cold? (Target <5 seconds). Graceful shutdown: send SIGTERM while processing a request and verify the request completes.
10. Dev/Prod Parity (X): Are the dev, staging, and production environments running the same OS and same backing service versions? Check Docker images and database versions.
11. Logs (XI): Are there any file-based log handlers in the code? Check logging configuration. All logs should go to stdout.
12. Admin Processes (XII): Can you run a one-off admin task (e.g., migration) using the same release image and config? Test with a dry run.
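For the Disposability item, "time to start" is best measured as time until the app actually answers, not time until the process exists. Here is a minimal sketch that polls a health URL until it returns HTTP 200. The URL, timeout and poll interval are assumptions; you would start the container or process just before calling it.

```python
import time
import urllib.request


def wait_until_ready(url: str, timeout_s: float = 30.0, interval_s: float = 0.2) -> float:
    """Poll url until it returns HTTP 200; return seconds waited, or raise TimeoutError."""
    start = time.monotonic()
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s * 5) as resp:
                if resp.status == 200:
                    return time.monotonic() - start
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts are all
            # OSError subclasses: the app is not listening or not ready yet.
            pass
        if time.monotonic() - start > timeout_s:
            raise TimeoutError(f"{url} not ready after {timeout_s:.1f}s")
        time.sleep(interval_s)
```

Comparing the returned value against the checklist's target (under 5 seconds) gives a number you can track across releases rather than a one-off impression.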
check_twelve_factor.sh (Shell)
#!/bin/bash
# Quick twelve-factor compliance check script
# Run from the project root after setting up the environment
set -euo pipefail

echo "=== Twelve-Factor Compliance Check ==="

echo "[1] CODEBASE — One repo per app?"
if [ -f "package.json" ] && [ -f "server.py" ]; then
  echo " Found both package.json and server.py — check if monorepo structure is intentional."
fi

echo "[2] DEPENDENCIES — Pinned with lockfile?"
if [ -f "requirements.txt" ]; then
  if grep -qE '>=|<=|~=' requirements.txt; then
    echo " WARNING: Unpinned dependency found (e.g., >=) — use == for exact version."
  fi
fi
if [ -f "package.json" ]; then
  if [ ! -f "package-lock.json" ] && [ ! -f "yarn.lock" ]; then
    echo " WARNING: No lockfile found for Node.js project."
  fi
fi

echo "[3] CONFIG — Secrets in code?"
# Heuristic only: prints matching lines so you can inspect them
if grep -rn 'password' src/ --include='*.py' --include='*.js' 2>/dev/null; then
  echo " FAILED: Found potential password in source code."
fi

echo "[4] BACKING SERVICES — URLs in env?"
# -E is required: in a basic regex, | is a literal character
if grep -rqE 'postgresql://|redis://|mysql://' src/ --include='*.py' --include='*.js' 2>/dev/null; then
  echo " WARNING: Backing service URL appears hardcoded in code."
fi

echo "[9] DISPOSABILITY — Startup time test"
echo " Run: time docker run --rm <image> python -c 'import sys; sys.exit(0)'"

echo "[11] LOGS — Stdout only?"
if grep -rq 'FileHandler' . --include='*.py' 2>/dev/null; then
  echo " FAILED: File-based log handler found. Must use stdout."
fi

echo "=== Check complete ==="
Output
$ bash check_twelve_factor.sh
=== Twelve-Factor Compliance Check ===
[1] CODEBASE — One repo per app?
[2] DEPENDENCIES — Pinned with lockfile?
WARNING: Unpinned dependency found (e.g., >=) — use == for exact version.
[3] CONFIG — Secrets in code?
[4] BACKING SERVICES — URLs in env?
[9] DISPOSABILITY — Startup time test
Run: time docker run --rm <image> python -c 'import sys; sys.exit(0)'
[11] LOGS — Stdout only?
FAILED: File-based log handler found. Must use stdout.
=== Check complete ===
Automate compliance in CI/CD
Add this check script to your CI pipeline as a non-blocking advisory step at first, then gradually enforce failures. Start with the high-impact violations: config in code (Factor III), sticky sessions (Factor VI), and log files (Factor XI). The others can be gated later.
Production Insight
Auditing once a year is not enough. Violations creep in with every commit.
We've seen teams pass compliance review with flying colours, then a new developer adds a .env file and commits it within a week.
Rule: run the compliance script on every pull request. Reject PRs that violate Factor III or XI.
Key Takeaway
Compliance is a continuous process, not a checkbox.
Make it part of your development workflow: every PR gets a twelve-factor review, every deploy confirms the checklist.
Production incident · post-mortem · severity: high
Hardcoded Credentials in config/database.yml: The Breach That Shouldn't Have Happened
Symptom
An internal security scan found a public fork of the repository with real credentials still in its git history, even after config/database.yml had been removed.
Assumption
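The team assumed that adding the file to .gitignore after it had already been committed would protect the secrets. They didn't realize that git history retains every committed version, and that .gitignore has no effect on files that are already tracked.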
Root cause
Factor III (Config) was violated: credentials lived in code instead of environment variables. No automated secret scanning was in place.
Fix
1. Rotate all exposed credentials immediately; rewriting history does not remove copies already in forks or clones.
2. Purge the file from every commit with git filter-repo (or the older git filter-branch), then force-push.
3. Move every secret to environment variables.
4. Add a pre-commit hook using git-secrets or TruffleHog to block future commits.
5. Provide a .env.example file with placeholder values for local dev.
Key lesson
Config must never be in code — not even temporarily. The moment it touches git, it's forever.
Treat secrets with the same paranoia as production data. Use secrets managers (Vault, AWS Secrets Manager, GitHub Secrets) from day one.
Automated guardrails (pre-commit hooks, CI scans) catch mistakes before they become breaches.
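The pre-commit guardrail can be approximated in a few lines. This is a hypothetical sketch, not git-secrets or TruffleHog: it scans only lines being added in the staged diff, and its patterns (an AWS access key ID shape, private key headers, password-style assignments, and user:password@host URLs) are illustrative assumptions. Dedicated scanners ship far larger, tuned rule sets and can also scan history.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit secret scan: checks only lines added in the staged diff."""
import re
import subprocess
import sys

# Illustrative patterns only; real scanners use much larger rule sets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
    re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),  # private key header
    # password/secret/api_key assigned a literal value (not an env lookup call)
    re.compile(r"""(?i)\b(?:password|passwd|secret|secret_key|api_key)\s*[:=]\s*['"]?[A-Za-z0-9!@#$%^&*_+\-]{6,}(?=['"\s]|$)"""),
    re.compile(r"\w+://[^/\s:@]+:[^/\s@]+@"),  # user:password@host inside a URL
]


def find_secrets(diff_text):
    """Return added diff lines that match any secret pattern."""
    hits = []
    for line in diff_text.splitlines():
        # Inspect only added lines, skipping the '+++ b/path' file header.
        if line.startswith("+") and not line.startswith("+++"):
            if any(p.search(line) for p in SECRET_PATTERNS):
                hits.append(line[1:].strip())
    return hits


def main():
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = find_secrets(diff)
    for hit in hits:
        print(f"Possible secret in staged change: {hit}", file=sys.stderr)
    return 1 if hits else 0  # non-zero exit aborts the commit


if __name__ == "__main__":
    sys.exit(main())
```

Saved as an executable .git/hooks/pre-commit (or wired in through a hook manager), a non-zero exit aborts the commit. Note that reading the value through os.environ or getenv is deliberately not flagged.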
Production debug guide: common symptoms and the exact commands to diagnose each factor violation
Symptom · 01
App starts with wrong database URL in production — config hardcoded in code
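Fix
Check source code for hardcoded values: grep -rn 'DATABASE_URL' src/ --include='*.py' --include='*.js', and look for literal connection strings with grep -rnE 'postgres(ql)?://' src/. Verify that the environment variable is actually injected: docker exec <container> printenv DATABASE_URL. If it is absent, add it to the deployment manifest (Docker Compose environment, or a Kubernetes Secret for credentials and a ConfigMap for non-secret settings).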
Symptom · 02
Horizontal scaling breaks: users' session data is lost when a request hits a different instance
Fix
Check whether sessions are kept in process memory (e.g., req.session in Node.js backed by express-session's default MemoryStore). Switch to Redis: docker run -d --name redis redis:7, then configure Redis as the session store. Verify by logging in through one instance, sending the next request to a different instance, and listing session keys with redis-cli --scan --pattern 'session:*' (adjust the pattern to your session library's key prefix).
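To see why moving sessions out of process memory fixes this symptom, here is a minimal sketch with two simulated app instances. SessionStore, FakeSharedStore, and AppInstance are hypothetical names; in production the shared store would be Redis accessed through your framework's session library, and the dict-backed fake exists only so the example runs without a Redis server.

```python
"""Sketch: session state in a shared backing store, not in process memory."""
import secrets
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    def get(self, sid: str) -> Optional[dict]: ...
    def set(self, sid: str, data: dict) -> None: ...


class FakeSharedStore:
    """Stand-in for Redis: one object shared by every app instance."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, sid: str) -> Optional[dict]:
        return self._data.get(sid)

    def set(self, sid: str, data: dict) -> None:
        self._data[sid] = dict(data)


class AppInstance:
    """A stateless process: it holds no session data of its own."""

    def __init__(self, store: SessionStore) -> None:
        self.store = store

    def login(self, user: str) -> str:
        sid = secrets.token_urlsafe(16)
        self.store.set(sid, {"user": user})
        return sid  # returned to the client, typically as a cookie

    def whoami(self, sid: str) -> Optional[str]:
        session = self.store.get(sid)
        return session["user"] if session else None
```

Because AppInstance keeps nothing but a reference to the store, a login handled by one instance is visible to every other instance. Give each instance its own store and you recreate exactly the sticky-session problem.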
Symptom · 03
Each deploy causes an outage because the app takes 3 minutes to start
Fix
Measure startup time: record how long it takes from docker run myapp:latest until the app's health endpoint responds. For JVM apps, profile startup with -verbose:class (or -Xlog:class+load on JDK 9+) and look for slow initialization. Apply lazy loading, defer connections until first use, and reduce classpath scanning. Target startup under 10 seconds so new instances can join quickly when scaling out.
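Lazy loading and deferred connections can be as simple as wrapping an expensive factory so it runs on first use instead of at import time. A minimal sketch, assuming a hypothetical slow factory such as connect_to_database; the Lazy class is illustrative, not a library API.

```python
"""Sketch: defer expensive start-up work until first use (Factor IX)."""
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Create a resource on first access, at most once, thread-safely."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._created = False
        self._lock = threading.Lock()

    def get(self) -> T:
        if not self._created:
            with self._lock:
                if not self._created:  # re-check: another thread may have won the race
                    self._value = self._factory()
                    self._created = True
        return self._value  # type: ignore[return-value]


# Usage (connect_to_database is hypothetical):
#   db = Lazy(connect_to_database)   # cheap at import time
#   def handle_request(): conn = db.get()   # first call pays the cost
```

The trade-off is that the first request pays the initialization cost. If that is unacceptable, warm the resource in a background thread once the process is already accepting traffic.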
Quick Twelve-Factor Compliance Checklist: run these commands to verify your app follows each factor in production
Factor I: Codebase not linked to a single repo
Immediate action
Check if repo contains multiple deployable units
Commands
git ls-files '*Dockerfile' '*package.json' (more than one deployable unit usually means more than one app in this repo)
Check CI config for multiple deploy targets from same repo
Fix now
Split the monorepo into separate repos per app, or use a monorepo tool such as Nx with separate build and deploy configs per app
Factor III: Secrets in code
Commands
Check if any config file is committed: git ls-files config/ (a tracked file such as config/database.yml that is not in .gitignore is a violation)
Fix now
Move all secrets to environment variables, rotate exposed credentials, add .gitignore entry
Factor VI: Sticky sessions needed
Immediate action
Identify session data type and location
Commands
Check load balancer config for session affinity setting
Inspect code for in-memory session store: grep -rn 'session' src/
Fix now
Move sessions to Redis, remove sticky session config, verify: kill one instance — user should not be logged out
Factor IX: Slow shutdown causes data loss
Immediate action
Test graceful shutdown with SIGTERM
Commands
kill -TERM <pid> while app processes a request
Check logs for 'shutdown' or 'SIGTERM' handling
Fix now
Add signal handler that drains in-flight requests, returns jobs to queue, then exits with 0
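A minimal sketch of such a SIGTERM handler for a Python queue worker, assuming an in-process queue.Queue as a stand-in for a real job queue (with a real broker you would acknowledge only completed jobs so unfinished ones are redelivered). The handler only sets a flag; the loop finishes its current job, leaves unstarted jobs in the queue, and exits with 0.

```python
"""Sketch: SIGTERM-driven graceful shutdown for a queue worker (Factor IX)."""
import queue
import signal
import threading


class Worker:
    def __init__(self, jobs: "queue.Queue[str]") -> None:
        self.jobs = jobs
        self.stopping = threading.Event()
        self.done = []

    def handle_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: set a flag and let the main loop do the draining.
        self.stopping.set()

    def run(self) -> int:
        while not self.stopping.is_set():
            try:
                job = self.jobs.get(timeout=0.1)
            except queue.Empty:
                continue  # idle: loop back and re-check the stop flag
            self.done.append(job)  # placeholder for real work; it completes before the flag is re-checked
            self.jobs.task_done()
        return 0  # jobs never taken remain queued for another instance


if __name__ == "__main__":
    jobs: "queue.Queue[str]" = queue.Queue()
    worker = Worker(jobs)
    signal.signal(signal.SIGTERM, worker.handle_sigterm)
    raise SystemExit(worker.run())
```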
Factor XI: Log files inside container
Immediate action
Check log driver configuration
Commands
docker inspect <container> | grep -i log
Inside the container: ls /var/log/app/ (if the directory exists and holds log files, that's a violation)
Fix now
Redirect all logging to stdout: set logging handler to StreamHandler(sys.stdout), remove file handlers, update Docker logging driver to json-file or awslogs
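In Python, the fix comes down to a single stdout handler configured once at start-up. This is a minimal sketch; the format string is an arbitrary choice. Note that logging.StreamHandler() with no argument writes to stderr, so sys.stdout has to be passed explicitly.

```python
"""Sketch: route all application logs to stdout (Factor XI)."""
import logging
import sys
from typing import Optional, TextIO


def configure_logging(level: int = logging.INFO, stream: Optional[TextIO] = None) -> logging.Logger:
    """Send every log record to stdout (or to `stream`, e.g. in tests)."""
    root = logging.getLogger()
    # Drop handlers configured elsewhere, including any FileHandler.
    for existing in list(root.handlers):
        root.removeHandler(existing)
        existing.close()
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    root.addHandler(handler)
    root.setLevel(level)
    return root


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("app").info("service started")
```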
Twelve-Factor vs Traditional App: Key Differences
| Aspect | Traditional (Non-12-Factor) App | Twelve-Factor App |
|---|---|---|
| Config storage | Hardcoded in source files or committed config files | Exclusively in environment variables, never in code |
| Session state | Stored in process memory (sticky sessions required) | Stored in an external backing service (Redis, DB) |
| Scaling strategy | Scale up: buy a bigger server | Scale out: add more identical stateless instances |
| Log handling | App writes and rotates its own log files | App writes to stdout; the platform handles routing |
| Dev/Prod parity | SQLite locally, Postgres in prod; bugs hide until deploy | Same Postgres version in dev and prod via Docker Compose |
| Shutdown behaviour | Process killed immediately; in-flight work lost | SIGTERM triggers a graceful drain; in-flight work completes |
| Database migrations | SSH into server, run scripts manually | One-off process with the same image and config as the running app |
| Dependency management | Relies on globally installed system packages | Fully declared in requirements.txt / package.json |
| Backing service swaps | Requires code changes to swap DB or cache | Change one environment variable; zero code changes |
| New developer onboarding | Hours of setup docs, tribal knowledge required | Clone + set env vars + one command = running app |
Key takeaways
1. Config belongs in environment variables. The test is whether you could open-source the repo right now without exposing any credentials. If not, you're violating Factor III.
2. Stateless processes are the entire reason horizontal scaling works. If your app can't run 50 identical instances behind a load balancer, you have state living in process memory; move it to Redis or Postgres.
3. Dev/prod parity isn't aesthetic, it's economic. Every difference between your dev and prod environments is a category of bug that only surfaces in production, where it's most expensive to fix.
4. Graceful shutdown (Factor IX) is the difference between 'deploy at 2pm, users see errors' and 'deploy continuously, users notice nothing'. Handle SIGTERM, drain in-flight work, then exit cleanly.
5. Send logs to stdout. It's not a style choice; it's a separation of concerns that makes centralised logging possible at scale. File-based logging in containers is a production incident waiting to happen.
Common mistakes to avoid
Storing secrets in a committed config file
Symptom
Credentials exposed in git history (even after deletion — history is forever). Attackers scan public repos for these patterns.
Fix
Move ALL secrets to environment variables immediately, rotate any exposed credentials, stop tracking the config file (git rm --cached <file>) and add it to .gitignore, and provide a .env.example with placeholder values for onboarding. Use a secrets manager (Vault, AWS Secrets Manager) in production.
Violating Dev/Prod parity with SQLite locally
Symptom
Your app works perfectly in development but fails in production with cryptic Postgres errors (e.g., column type mismatches, JSON operator syntax errors, case-sensitivity differences).
Fix
Run the exact same Postgres version locally via Docker Compose that you use in production. The five-minute setup cost saves hours of 'but it worked locally' debugging.
Writing application logs to a file inside the container
Symptom
Logs disappear when the container restarts (containers are ephemeral), or your log aggregator sees nothing because it's watching stdout, not a file.
Fix
Remove all file handlers from your logger config and replace them with logging.StreamHandler(sys.stdout). Let the platform's log pipeline (Docker's json-file driver, a Fluentd DaemonSet on Kubernetes) handle the rest.
Interview Questions on This Topic
Q01 of 03 (Senior)
A candidate claims their app follows the Twelve-Factor methodology, but it uses sticky sessions in the load balancer. Which factor does this violate and why does it make horizontal scaling fragile?
ANSWER
Factor VI (Processes): processes must be stateless and share-nothing. Sticky sessions are a sign that state is held in process memory, so requests must always route to the same instance. That makes scaling fragile: if the instance restarts or is scaled away, its in-memory sessions are lost, and newly added instances only pick up new sessions, so load stays uneven. The correct approach is to move session state to a backing service like Redis, so any instance can handle any request; that's true horizontal scaling.
Q02 of 03 (Senior)
Walk me through how you would migrate a legacy application that hardcodes database credentials in `config.py` to be compliant with Factor III. What are the steps and what risks do you need to manage?
ANSWER
1. Identify all hardcoded credentials in the codebase (search for passwords, API keys, database URLs).
2. Replace the values with os.getenv('VAR_NAME') calls.
3. Define the environment variables in the deployment environment (e.g., Docker Compose, a Kubernetes Secret or ConfigMap, CI/CD pipeline secrets).
4. Add a .env.example file for local development, and make sure .env is in .gitignore.
5. Rotate all existing credentials as soon as the migration lands: the old values remain in git history, and anyone who has cloned the repo already has them. Purging history with git filter-branch, git filter-repo, or BFG Repo-Cleaner reduces future exposure but cannot undo past copies.
6. Add a pre-commit hook to block future commits containing credentials.
7. Test in a staging environment first.
Risks: credentials missed in less obvious places (comment blocks, README files, test fixtures), and broken local dev setups if fallback values are missing or inappropriate. Mitigate the latter by crashing early with a clear error when an essential env var is missing.
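Step 2 and the "crash early" mitigation can live in one settings loader. A minimal sketch, assuming hypothetical variable names DATABASE_URL, REDIS_URL, and LOG_LEVEL: required secrets get no defaults, so a missing variable stops the process at start-up with a message naming it, while non-secret knobs may fall back to safe defaults.

```python
"""Sketch: Factor III config loading that fails fast on missing variables."""
import os
from dataclasses import dataclass
from typing import Mapping, Optional


class ConfigError(RuntimeError):
    pass


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env
    required = ["DATABASE_URL", "REDIS_URL"]  # hypothetical names
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail at start-up with a clear message instead of on the first query.
        raise ConfigError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),  # non-secret, so a default is safe
    )
```

Passing a plain dict as env keeps the loader testable without touching the real process environment.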
Q03 of 03 (Junior)
Factor XI says to treat logs as event streams and write to stdout. A junior engineer argues it's easier to just write to a log file directly. What specific operational problems arise in a containerised, horizontally-scaled environment when you take the log-to-file approach?
ANSWER
Three critical problems:
1. Ephemeral storage: containers are transient. When one restarts or is scaled down, log files written to its local filesystem are gone, and with them the evidence you need for debugging.
2. Aggregation complexity: with 50+ instances each writing their own files, you still need a centralised log collector, but now it has to reach into every container's filesystem (sidecars, shared volumes, per-path config). That is more complex and fragile than collecting stdout, which Docker and Kubernetes capture automatically.
3. Disk pressure: files inside the container's writable layer grow without bound unless the app rotates them itself, and they can fill the node's disk and affect other containers on the same node. When logs go to stdout, rotation becomes the runtime's job (e.g., the Docker json-file driver's max-size and max-file options, or the kubelet's container log rotation).
Frequently Asked Questions
01
Is the Twelve-Factor App only relevant for apps deployed on Heroku?
Not at all — Heroku engineers wrote it because they operated thousands of apps, but the principles apply anywhere: AWS ECS, Kubernetes, Google Cloud Run, or even a plain VPS. Any environment where you want portability, scalability, and maintainability benefits from these factors. Kubernetes in particular is designed around many of these same assumptions.
02
Do I need to implement all twelve factors at once?
No, and most teams don't. Factors III (Config), VI (Stateless Processes), and XI (Logs) tend to deliver the most immediate value and are the easiest to start with. Treat it as a maturity model — assess which factors you're currently violating, prioritise by impact, and improve incrementally. Even hitting eight of twelve factors puts you well ahead of most production codebases.
03
What's the difference between the Twelve-Factor App and microservices architecture?
They're complementary, not the same thing. Microservices is about how you split your system into independent services. The Twelve-Factor App is about how each individual service (or monolith) should be built and operated. You can have a twelve-factor monolith or a non-twelve-factor microservices mess. Ideally, each of your microservices is itself a twelve-factor app.
04
How do I handle state that can't be moved to a database (e.g., ML model weights)?
Store the weights in a backing service (S3, shared filesystem) and load them at startup. The process itself remains stateless — the model is loaded from the service, not embedded in the code. This also allows you to update the model without redeploying the application.
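A minimal sketch of loading weights at start-up, assuming a hypothetical MODEL_URI environment variable. urllib.request.urlopen handles http(s):// and file:// URLs; for an s3:// location you would swap in an S3 client such as boto3.

```python
"""Sketch: fetch model weights from a backing service when the process starts."""
import os
import urllib.request
from typing import Mapping, Optional


def load_model_bytes(env: Optional[Mapping[str, str]] = None) -> bytes:
    env = os.environ if env is None else env
    uri = env.get("MODEL_URI")  # hypothetical variable name
    if not uri:
        raise RuntimeError("MODEL_URI is not set")
    # The weights live outside the image; changing MODEL_URI points at a new model.
    with urllib.request.urlopen(uri) as response:
        return response.read()
```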
05
Does Factor II mean I should avoid all system packages?
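No. Factor II says an app must never rely on the implicit existence of system-wide packages or tools; the methodology itself cites shelling out to curl or ImageMagick as examples. In a container, the Dockerfile is where you make those dependencies explicit: install system packages (like openssl, curl, fontconfig) there rather than assuming the host has them, and declare language-level dependencies in requirements.txt or package.json. Pin the base image too: a tag like python:3.11-slim can change when it is rebuilt upstream, so pin by digest if you need repeatable builds.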