
Docker :latest Tag Broke Production — Pin Your Base Images

A python:3 tag silently upgraded from Debian 11 to 12, crashing production with libffi errors.
🧑‍💻 Beginner-friendly — no prior DevOps experience needed
In this tutorial, you'll learn
  • Docker Architecture — How It All Fits Together
  • What Containers Actually Are (Not VMs)
  • Installing Docker and Your First Commands
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • Docker Client: CLI that sends commands via REST API
  • Docker Daemon: Background service that builds, runs, and manages containers
  • Docker Engine: Client + Daemon + API layer combined
  • Docker Registry: Stores and distributes container images (Docker Hub, ECR, etc.)
🚨 START HERE

Docker Container Triage Cheat Sheet

First-response commands when a container issue is reported. No theory — just actions.

Container crashed or is restarting in a loop.

Immediate Action: Check logs and exit code.
Commands
docker logs --tail 50 <container>
docker inspect <container> --format='{{.State.ExitCode}} {{.State.Error}}'
Fix Now: Exit code 137 = OOM killed (set --memory higher). Exit code 1 = application error (check logs). Exit code 139 = segfault (check base image compatibility).
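If you see exit code 137, confirm the OOM kill directly: docker inspect exposes an OOMKilled state flag (container name is a placeholder).

oom_check.sh · BASH
# "true" means the kernel OOM killer terminated the main process
docker inspect --format='{{.State.OOMKilled}}' <container>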

Container is running but not responding to requests.

Immediate Action: Verify port mapping and process status inside the container.
Commands
docker port <container>
docker exec <container> ps aux
Fix Now: If the process is not running, the CMD failed silently. If the process is running but the port is wrong, check EXPOSE and the -p mapping. On Docker Compose, verify service-name DNS: docker exec <container> nslookup <service>

Docker build fails with 'no space left on device'.

Immediate Action: Clean up Docker disk usage.
Commands
docker system df
docker system prune -a --volumes
Fix Now: Warning: docker system prune -a removes ALL unused images and volumes. In production, use docker image prune and docker container prune selectively.

Container cannot connect to database or other services.

Immediate Action: Verify network and DNS resolution.
Commands
docker network inspect bridge
docker exec <container> ping -c 2 <service-name>
Fix Now: If using Docker Compose, ensure both services are in the same docker-compose.yml. If using custom networks, verify both containers are attached: docker network connect <network> <container>
Production Incident

Production API Crash After Docker Image Rebuild — Silent :latest Upgrade

A payments API whose Dockerfile used FROM python:3 silently picked up a new base image during a routine CI rebuild (the underlying OS jumped from Debian 11 to Debian 12), breaking a C-extension dependency and causing 47 minutes of downtime during peak traffic.
Symptom: Service health check failed after automated deployment. Container started but crashed within 30 seconds with ImportError: libffi.so.7: cannot open shared object file. No code changes in the last 2 weeks. Previous deployment worked fine.
Assumption: The team assumed a corrupted Docker image cache or a flaky CI runner. They retried the deployment 3 times. Each attempt failed with the same error. Second assumption: a dependency in requirements.txt had a breaking update.
Root cause: The Dockerfile used FROM python:3 (not pinned to a patch version). Between the last successful deployment and this one, the official Python image updated from 3.11.7 to 3.11.8, which changed the base OS from Debian 11 to Debian 12. Debian 12 ships libffi8, not libffi7. The cffi package compiled against libffi7 could not load. The image was rebuilt from scratch (no layer cache on the new CI runner), so Docker pulled the latest python:3 image.
Fix:
1. Pinned the base image to FROM python:3.11.7-slim-bookworm — exact version, exact OS codename.
2. Added --platform linux/amd64 to all FROM instructions to prevent ARM/AMD64 mismatches.
3. Added a CI step that runs docker inspect on the built image and fails if the base image digest changed unexpectedly.
4. Added Trivy vulnerability scanning to the CI pipeline.
5. Documented the rule: every FROM instruction must pin to an exact version and OS codename.
Key Lesson
  • Never use :latest or unversioned tags (like python:3) in production Dockerfiles.
  • Pin both the version AND the OS codename: python:3.11.7-slim-bookworm, not python:3.11-slim.
  • Docker layer caching means a stale cache can hide a base image change — always test with --no-cache periodically.
  • CI runners without warm caches will pull the latest base image on every build.
  • A one-line change (FROM python:3 → FROM python:3.11.7-slim-bookworm) prevents hours of incident response.
Production Debug Guide

From failed health check to root cause — systematic debugging paths.

Container starts but crashes immediately. Check container logs first: docker logs <container> shows stdout/stderr. If the process exits before writing logs, run the container interactively: docker run -it <image> bash and execute the CMD manually.
Container runs but health check fails. Exec into the container and run the health check command manually: docker exec <container> curl -f http://localhost:8000/health. If it works inside but fails from the host, check port mapping and network configuration.
Container runs out of disk space. Check if volumes are mounted correctly. Check Docker's own disk usage: docker system df. Clean up stopped containers and unused images: docker system prune (add -a to also remove all unused images; see the warning in the triage cheat sheet above).
Container cannot reach other containers or external services. Check Docker network configuration: docker network ls and docker network inspect <network>. Verify DNS resolution inside the container: docker exec <container> nslookup <service-name>. Check if the container is on the same Docker network as the target service.
Container uses too much memory or CPU, affecting other services. Check resource usage: docker stats <container>. If no resource limits are set, the container can consume all host resources. Set --memory and --cpus flags, or use deploy.resources in Docker Compose.
Docker build is extremely slow (5+ minutes for a simple app). Check layer caching: docker history <image>. If code changes invalidate the dependency install layer, move COPY requirements.txt and RUN pip install before COPY . . Use .dockerignore to exclude large directories from the build context.

Docker containerization solves the 'works on my machine' problem at the infrastructure level. Your application runs in development. You deploy it and it crashes — different Python version, different library, different timezone. Docker eliminates environment drift by packaging the application with its entire dependency graph into a portable image.

The architecture is client-server. The Docker client sends commands to the Docker Daemon via REST API. The Daemon manages all Docker objects — images, containers, networks, volumes. This separation means the client and daemon can run on different machines, enabling remote builds and CI/CD integration.

Containers are not VMs. They share the host Linux kernel and use namespaces for isolation and cgroups for resource limits. This gives millisecond startup times and sub-megabyte overhead — but it also means every container makes syscalls directly against the shared host kernel, so a kernel exploit from inside a container can compromise the host. Understanding this trade-off is essential for production security decisions.

Docker Architecture — How It All Fits Together

Docker containerization uses a client-server architecture — before writing a single Dockerfile, understand what you are actually talking to. Docker has four core components:

Docker Client: The CLI you interact with. When you run docker build or docker run, the Docker client sends these commands via REST API to the Docker Daemon. The client and daemon can run on the same machine or on different machines.

Docker Daemon (dockerd): The background service that does the actual work. The Docker Daemon listens for Docker API requests and manages Docker objects — images, containers, networks, and volumes. It is the engine that builds, runs, and distributes containers.

Docker Engine: The collective name for the client + daemon + REST API layer. When people say 'install Docker', they mean install Docker Engine (or Docker Desktop on macOS/Windows, which bundles Docker Engine inside a lightweight Linux VM).

Docker Registry: A storage and distribution system for container images. Docker Hub is the default public registry — it hosts official images for postgres, nginx, python, node, and thousands more. Companies run private registries (Amazon ECR, GitHub Container Registry, or a self-hosted Docker registry) to store proprietary images. When you run docker pull postgres:16, Docker Engine contacts Docker Hub and downloads the image layers.

The flow for every docker run command: Docker client → REST API → Docker Daemon → checks local image cache → pulls from registry if not cached → creates container → starts process.

Failure scenario — Daemon unreachable: If the Docker Daemon is down or the socket is misconfigured, every docker command fails with 'Cannot connect to the Docker daemon'. In production, this means your CI/CD pipeline stops, health checks cannot exec into containers, and log collection breaks. Always monitor the Docker Daemon process and socket permissions.
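A first-response sequence for the daemon-down case, as a sketch assuming a systemd-based Linux host and the default socket path:

daemon_triage.sh · BASH
# Is dockerd running? (systemd hosts)
systemctl status docker --no-pager

# Can the client reach a daemon at all?
docker info

# Check that the socket exists and your user can access it
ls -l /var/run/docker.sock

# Restart the daemon. Note: without live-restore enabled in
# /etc/docker/daemon.json, this also stops running containers.
sudo systemctl restart docker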

Mental Model
Docker Objects as the Core Abstraction
Why does Docker use a client-server architecture instead of a monolithic CLI?
  • Separation allows remote management — build on one machine, deploy on another.
  • The Daemon can manage multiple containers simultaneously without blocking the CLI.
  • CI/CD systems interact with the Daemon via the same REST API as the CLI.
  • Multiple clients (CLI, Docker Compose, IDE plugins) can connect to the same Daemon.
📊 Production Insight
In production, the Docker Daemon is a single point of failure. If dockerd crashes, all running containers continue to run (they are separate processes), but you cannot start, stop, or inspect containers until the Daemon restarts. Monitor dockerd with systemd and alert on failures. In Kubernetes, the equivalent is the kubelet — same principle, different abstraction.
🎯 Key Takeaway
Docker is a client-server system. The CLI is a thin client. The Daemon does all the work — building, running, networking, storage. If the Daemon is unreachable, your entire container management surface is down. Monitor it like you monitor your database.
Choosing a Docker Registry
  • If: open-source project or public images → Use: Docker Hub — free for public repositories, largest image library
  • If: private images in AWS infrastructure → Use: Amazon ECR — integrates with IAM, no egress costs within AWS
  • If: private images, multi-cloud or on-premise → Use: GitHub Container Registry or a self-hosted Harbor instance
  • If: regulatory requirements (data sovereignty, air-gapped environments) → Use: a self-hosted registry (Harbor, Docker Registry) with image signing and scanning
[Diagram: Docker architecture, client → daemon → registry. The Docker client (docker build, docker run, docker pull, docker push) sends commands over the REST API to the Docker Daemon (dockerd) on the host. The daemon manages images (read-only layers), containers (running instances), volumes (persistent data), and networks (inter-container), and delegates container lifecycle to containerd and the OCI runtime runc. Images are pulled from and pushed to a registry: Docker Hub by default, or ECR, GCR, GHCR, and private registries.]

What Containers Actually Are (Not VMs)

The most common misunderstanding about Docker containerization: Linux containers are not virtual machines. A virtual machine emulates hardware — it has its own kernel, its own memory management, its own full operating system. Booting a VM takes seconds to minutes and uses hundreds of MB of RAM just for the OS overhead.

Containers share the host's operating system kernel. They use Linux kernel features — namespaces (for isolation: each running container sees its own filesystem, network, and process tree) and cgroups (for resource utilization limits: CPU caps, memory limits) — to create isolated environments without the overhead of a separate OS.

Practical difference: a VM running Ubuntu might use 512MB RAM just for the OS. A Docker container running Ubuntu uses <1MB overhead for isolation — the processes inside see an Ubuntu-like environment but share the host's Linux kernel. This is why you can run 50 containers on a machine that could only run 3 VMs, with dramatically better resource utilization.

This architecture is also why Docker Desktop on macOS and Windows runs a lightweight Linux VM internally — macOS and Windows have different kernels, so Docker Engine needs a Linux kernel to host Linux containers.

Security trade-off: Because containers share the host kernel, a container escape vulnerability gives the attacker root access to the host. VMs have a much stronger isolation boundary (the hypervisor). For multi-tenant workloads where you run untrusted code, use microVMs (Firecracker), user-space kernel sandboxes (gVisor), or sandboxed containers (Kata Containers). For trusted application deployment, containers are the right choice.

Performance impact: The shared-kernel architecture means container-to-container communication on the same host uses localhost networking — no hypervisor overhead. Inter-container latency is sub-millisecond. In a VM-based architecture, network traffic between VMs on the same host still goes through virtual network interfaces, adding 10-50 microseconds per packet.

Mental Model
Containers as Process Isolation, Not Machine Emulation
If containers share the host kernel, how is isolation actually enforced?
  • PID namespace: each container sees PID 1 as its init process. The host sees the real PID.
  • Network namespace: each container gets its own network stack (interfaces, routes, iptables).
  • Mount namespace: each container sees its own filesystem root. The host sees the real paths under /var/lib/docker/overlay2.
  • Cgroups: limit CPU shares, memory (hard limit), and I/O bandwidth per container.
  • Seccomp and AppArmor: restrict which syscalls a container process can make.
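To make the PID namespace point from the list above concrete, a quick demo assuming a local Docker install (busybox used as a throwaway image):

pid_namespace_demo.sh · BASH
# Start a container whose main process just sleeps
docker run -d --name ns_demo busybox sleep 300

# Inside the container's PID namespace, sleep runs as PID 1
docker exec ns_demo ps

# On the host, the same process has an ordinary high PID
ps aux | grep '[s]leep 300'

# Clean up
docker rm -f ns_demo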
📊 Production Insight
The container-vs-VM decision is a security boundary decision, not a performance decision. Containers win on density and startup time. VMs win on isolation strength. In production, use containers for trusted application workloads. Use VMs or gVisor/Firecracker for multi-tenant SaaS where you run customer code. Never run privileged containers (--privileged flag) in production — it disables all isolation.
🎯 Key Takeaway
Containers are process isolation, not machine emulation. They share the host kernel via namespaces and cgroups. This gives them millisecond startup and sub-megabyte overhead — but also means a kernel vulnerability can compromise the host. Choose containers for trusted workloads, VMs for untrusted multi-tenant isolation. Never use --privileged in production.
Container vs VM — When to Use Each
  • If: deploying your own trusted application code → Use: containers — better density, faster startup, simpler management
  • If: running untrusted customer code (SaaS, CI runners) → Use: microVMs (Firecracker), gVisor, or sandboxed containers (Kata) for stronger isolation
  • If: you need different OS kernels (Windows + Linux workloads) → Use: VMs — containers share the host kernel, cannot run cross-OS
  • If: you need sub-second scaling and high density (microservices) → Use: containers — millisecond startup, <1MB overhead per container
  • If: regulatory compliance requires hardware-level isolation → Use: VMs — auditors often require a hypervisor-level boundary
[Diagram: VM stack vs container stack. A VM runs your application on top of guest libraries, a full guest OS and kernel, a hypervisor (VMware, KVM, VirtualBox), and the physical hardware: roughly 512MB+ of OS overhead, 30-60s boot time, ~5GB images, and a handful of VMs per machine. A container runs your app and its dependencies directly on the shared host kernel, isolated by namespaces (pid, net, mnt, uts) and limited by cgroups: <1MB isolation overhead, <100ms start time, ~180MB images, and 50+ containers per machine.]

Installing Docker and Your First Commands

Installing Docker on Linux (Ubuntu/Debian):

install_docker.sh · BASH
# Install Docker Engine on Ubuntu
curl -fsSL https://get.docker.com | sh

# Add your user to the docker group (avoids sudo on every command)
sudo usermod -aG docker $USER
newgrp docker

# Verify install
docker --version
# Docker version 26.1.3, build b72abbb

# ── Essential Docker commands ─────────────────────────────────────
# Pull an image from Docker Hub
docker pull python:3.12-slim

# Run a container interactively
docker run -it python:3.12-slim bash

# Run a container in the background (detached)
docker run -d --name myapp -p 8000:8000 myapp:latest

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

# Stop a running container
docker stop myapp

# View container logs
docker logs -f myapp

# Execute a command inside a running container
docker exec -it myapp bash

# Remove a stopped container
docker rm myapp

# List Docker images
docker images

# Remove an image
docker rmi python:3.12-slim
▶ Output
Docker version 26.1.3, build b72abbb

# docker ps output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a1b2c3d4e5f6 myapp:latest ... 1m ago Up 1m 0.0.0.0:8000->8000/tcp myapp
📊 Production Insight
The docker group grants root-equivalent access. Any user in the docker group can mount the host filesystem into a container and read/write any file as root. In production, never add application users to the docker group. Use rootless Docker mode (dockerd-rootless) or restrict access via Docker socket proxy.
🎯 Key Takeaway
Docker installation is straightforward on Linux — curl and add your user to the docker group. But the docker group is root-equivalent. In production, use rootless mode or socket proxies. Never grant docker group access to service accounts.
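The root-equivalence claim is easy to demonstrate. Any docker-group member can read files only root should see; try this only on a machine you administer:

docker_group_demo.sh · BASH
# Mount the host's root filesystem into a throwaway container and
# read a root-only file. No sudo involved.
docker run --rm -v /:/host alpine cat /host/etc/shadow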

Your First Dockerfile — Building Container Images

A Dockerfile is a recipe for building a container image — the fundamental unit of Docker containerization. Docker reads it top to bottom, executing each instruction as a layer. Docker caches layers and only rebuilds what changed — understanding this is the difference between 30-second builds and 5-minute builds.

When you run docker build, the Docker client sends your build context (your project files) to the Docker Daemon, which executes each Dockerfile instruction in sequence, creating a new image layer for each one.

Layer caching mechanics: Each instruction in a Dockerfile creates a layer. Docker caches layers and reuses them if the instruction and all preceding layers are unchanged. If you COPY your entire application before installing dependencies, every code change invalidates the pip install layer — forcing a full reinstall on every build. The fix: COPY requirements.txt first, RUN pip install, then COPY the rest.

Build context size matters: The docker build command sends your entire build context (current directory by default) to the Daemon. Without a .dockerignore file, this includes .git (often 100MB+), node_modules, __pycache__, and potentially .env files with secrets. A large build context slows every build, even with layer caching.
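Two rough ways to gauge the context size; the grep assumes BuildKit's plain-progress output format, so treat it as a sketch:

build_context_size.sh · BASH
# Upper bound: raw directory size before .dockerignore filtering
du -sh .

# BuildKit prints the bytes actually sent to the daemon
docker build --progress=plain . 2>&1 | grep -i 'transferring context'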

Dockerfile · DOCKERFILE
# ── Base image ───────────────────────────────────────────────────
# Always pin versions — 'python:latest' will break your build when
# a new Python version releases. For production, pin the patch
# version and OS codename too: python:3.12.3-slim-bookworm.
FROM python:3.12-slim

# ── Metadata ────────────────────────────────────────────────────
LABEL maintainer="your@email.com"
LABEL version="1.0.0"

# ── Set working directory ────────────────────────────────────────
WORKDIR /app

# ── Copy requirements FIRST (layer caching trick) ────────────────
# If requirements.txt doesn't change, Docker caches this layer.
# If you copied all files first, every code change would
# invalidate the pip install layer — rebuilds take forever.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# ── Now copy the rest of your code ──────────────────────────────
COPY . .

# ── Create non-root user (security best practice) ───────────────
# Running as root inside a container is a security risk.
RUN useradd --create-home appuser
USER appuser

# ── Expose port and set default command ──────────────────────────
EXPOSE 8000
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
▶ Output
# Build the container image:
docker build -t myapp:1.0.0 .

# Run it (docker run maps host port 8000 to container port 8000):
docker run -d -p 8000:8000 --name myapp myapp:1.0.0

# Push to Docker Hub (or private registry):
docker tag myapp:1.0.0 yourusername/myapp:1.0.0
docker push yourusername/myapp:1.0.0
Mental Model
Docker Layers as a Stack of Diffs
Why does the order of Dockerfile instructions matter for build speed?
  • Docker caches layers top-to-bottom. If an instruction changes, all subsequent layers are rebuilt.
  • Dependencies change rarely. Code changes frequently. Put rare changes first.
  • COPY requirements.txt before COPY . . ensures pip install is cached when only code changes.
  • Each RUN command creates a layer. Combining commands with && reduces layer count and image size.
📊 Production Insight
The layer caching insight is the single highest-impact optimization for Docker build speed. In a typical Python project, dependencies change once per sprint but code changes every commit. Without the requirements.txt-first pattern, every commit triggers a full pip install — adding 30-120 seconds to every build. With it, code-only changes rebuild in 2-5 seconds. This compounds across CI/CD pipelines running hundreds of builds per day.
🎯 Key Takeaway
Dockerfile layer ordering is a build speed optimization, not just a style choice. Copy dependencies before code. Combine RUN commands. Use .dockerignore. These three changes can turn a 5-minute build into a 30-second build. In CI/CD, that difference compounds into hours of developer time saved per week.
Dockerfile Optimization Decision Tree
  • If: the build takes > 2 minutes and dependencies rarely change → Use: move COPY requirements.txt and RUN pip install before COPY . . — cache the dependency layer
  • If: the image is > 1GB and includes build tools (gcc, make, node_modules) → Use: multi-stage builds — copy only runtime artifacts to the final image
  • If: the build context upload is slow (> 10 seconds) → Use: add .dockerignore to exclude .git, node_modules, __pycache__, .env
  • If: the image contains secrets (API keys, passwords in ENV or ARG) → Use: Docker BuildKit secrets: --mount=type=secret,id=mysecret — never bake secrets into layers (sketch below)
  • If: multiple services share a base image → Use: create a shared base image with common dependencies, extend it per service
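A sketch of the BuildKit secret flow from the table above. It requires BuildKit (the default builder in current Docker); the secret id, file name, and install script are illustrative:

buildkit_secret.sh · BASH
# Dockerfile side: the secret is mounted at /run/secrets/<id> for this
# RUN instruction only and never written into an image layer:
#   RUN --mount=type=secret,id=pip_token \
#       PIP_TOKEN=$(cat /run/secrets/pip_token) ./install_private_deps.sh

# Build side: point the secret id at a local file
docker build --secret id=pip_token,src=./pip_token.txt -t myapp .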
[Diagram: layer caching, wrong order vs correct order. Wrong order (COPY . . before RUN pip install): any code edit invalidates the COPY layer, so pip install re-runs on every build and 300+ packages are reinstalled, roughly 4 minutes per change. Correct order (COPY requirements.txt, RUN pip install, then COPY . .): the install layer stays cached until requirements.txt changes, so code-only builds finish in seconds. The rule: put the things that change least at the top, because a changed layer invalidates every layer after it.]

Multi-Stage Builds — Shrinking Production Images

The single biggest mistake in beginner Dockerfiles: shipping build tools to production. A Python application that compiles some C extensions needs gcc, make, and build headers during the build — but not at runtime. A Go application needs the entire Go toolchain to compile — but the final binary needs nothing.

Multi-stage builds solve this: use a heavy 'builder' image with all your build tools, then copy only the finished artifact into a minimal 'runtime' image. Production images shrink from gigabytes to tens of megabytes, reducing attack surface and improving pull times across different environments.

Why this matters for security: Every tool in your production image is an attack surface. gcc, make, curl, wget — if an attacker gets shell access to your container, these tools let them compile exploits, download payloads, and pivot. A slim runtime image with no build tools gives an attacker almost nothing to work with.

Why this matters for deployment speed: Container images must be pulled to every node before they can run. A 1.2GB image takes 30-60 seconds to pull over a fast network. A 180MB image pulls in 5-10 seconds. During rolling deployments across 20 nodes, that difference is minutes of deployment time.

Failure scenario — bloated image causes deployment timeout: A team deployed a 2.4GB Python image (single-stage, with gcc, build headers, and test dependencies). During a rolling update on Kubernetes, the image pull took 90 seconds per node. With 15 nodes and a 2-minute readiness timeout, 8 nodes failed to pull the image in time, causing the rollout to fail. The fix was a multi-stage build that reduced the image to 220MB — pulls now complete in 8 seconds.

Dockerfile.multistage · DOCKERFILE
# ── Stage 1: Builder ─────────────────────────────────────────────
FROM python:3.12 AS builder
WORKDIR /build

# Install build dependencies (won't be in final image)
RUN apt-get update && apt-get install -y gcc libpq-dev

COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# ── Stage 2: Runtime (minimal) ───────────────────────────────────
FROM python:3.12-slim AS runtime
WORKDIR /app

RUN useradd --create-home appuser

# Copy only the installed packages from builder — not gcc, not apt cache.
# Copy into appuser's home: /root is mode 700, so a non-root user
# cannot read packages left under /root/.local.
COPY --from=builder /root/.local /home/appuser/.local
COPY . .
RUN chown -R appuser /app /home/appuser/.local

# Note: if your packages link against system libraries at runtime
# (e.g. psycopg2 needs libpq), install those libs here (apt-get install -y libpq5)
USER appuser

ENV PATH=/home/appuser/.local/bin:$PATH
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
▶ Output
# Build and check size:
docker build -f Dockerfile.multistage -t myapp:slim .
docker images myapp

# REPOSITORY TAG SIZE
# myapp slim 178MB ← runtime only (no build tools)
# vs single-stage: 1.24GB
Mental Model
Multi-Stage Builds as a Conveyor Belt
Why not just delete build tools in a RUN command at the end of a single-stage Dockerfile?
  • Deleted files in a RUN command still exist in the previous layer — the image size does not shrink.
  • Docker layers are additive. A file added then deleted in a later layer still occupies space in the earlier layer.
  • Multi-stage builds start fresh — the runtime stage never contains build tools in any layer.
  • This is the only way to genuinely reduce image size, not just hide files.
📊 Production Insight
Multi-stage builds are not optional for production. A single-stage image with build tools has a larger attack surface, slower pull times, and higher storage costs. The security benefit alone justifies the effort — every unnecessary binary in your production image is a tool an attacker can use post-compromise. Scan your images with Trivy or Docker Scout to verify your runtime image contains no build tools.
🎯 Key Takeaway
Multi-stage builds separate build-time and runtime dependencies. The builder stage compiles everything. The runtime stage copies only the artifact. This reduces image size by 80-95%, shrinks attack surface, and speeds up deployments. Never ship gcc, make, or test dependencies to production.
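A quick way to verify the claim against the image built above (tag taken from the earlier build output):

verify_runtime_image.sh · BASH
# Layer-by-layer history: no compiler installs should appear
docker history myapp:slim

# Direct probe: which gcc should fail in the runtime stage
docker run --rm myapp:slim which gcc || echo "no gcc in runtime image"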
[Diagram: multi-stage build, 1.24GB to 178MB. Stage 1 (FROM python:3.12 AS builder) carries gcc and build-essential (~350MB), full Python with headers (~650MB), and the pip/wheel toolchain (~80MB); only the installed packages in /root/.local (~120MB) are copied out via COPY --from=builder, and the build tools are never shipped. Stage 2 (FROM python:3.12-slim AS runtime) ships the ~50MB slim base, the ~120MB of installed packages, and ~8MB of app code: no gcc, no pip toolchain, no build tools. Result: 178MB vs 1.24GB, roughly 85% smaller, with faster pulls, a smaller attack surface, fewer CVEs, and lower registry storage costs.]

Volumes — Persisting Data Beyond Container Lifetime

Containers are ephemeral by design — when a running container stops, everything written to its filesystem is lost. For stateful applications (databases, file uploads, logs), you need volumes.

Named volumes: Docker Daemon manages the storage location. Survives container restarts and removals. Best for databases and multiple containers that need to share data.

Bind mounts: Mount a host directory into the container. Great for development, where code changes need to be reflected in the container immediately without rebuilding the image. Not recommended for production — ties the container to host filesystem paths.

tmpfs mounts: Stored in host memory only. Useful for sensitive temporary data that must not persist to disk.

Failure scenario — bind mount in production causes data loss: A team ran PostgreSQL in Docker with a bind mount: -v /data/postgres:/var/lib/postgresql/data. During a server migration, they copied the container but forgot to copy /data/postgres on the host. The new container started with an empty bind mount — PostgreSQL initialized a fresh database, overwriting nothing (the old data was on the old host). But the team thought the data was 'in Docker' and deleted the old server. All production data was lost. The fix: use named volumes (docker volume create) which are managed by Docker and backed up explicitly, not bind mounts that depend on host filesystem awareness.

Performance impact: Named volumes live under /var/lib/docker/volumes and bypass the container's copy-on-write layer (overlay2 by default), so writes go straight to the host filesystem. Bind mounts depend on wherever the host path lives, which may use different I/O schedulers and caching. For database workloads, named volumes on SSD-backed storage outperform bind mounts by 10-20% on write-heavy benchmarks.

docker_volumes.sh · BASH
# ── Named volume (production databases) ──────────────────────────
docker volume create postgres_data

docker run -d \
  --name postgres \
  -e POSTGRES_PASSWORD=secret \
  -v postgres_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:16

# Data persists even after removing the container:
docker rm postgres
docker run -d \
  --name postgres_new \
  -e POSTGRES_PASSWORD=secret \
  -v postgres_data:/var/lib/postgresql/data \
  postgres:16
# Same data — the volume survived the container removal

# ── Bind mount (software development) ───────────────────────────
docker run -d \
  --name dev_app \
  -v $(pwd):/app \
  -p 8000:8000 \
  myapp:latest
# Edit code on the docker host → changes reflected in container immediately
# No docker build required during development
▶ Output
# List volumes:
docker volume ls
# DRIVER VOLUME NAME
# local postgres_data

# Inspect:
docker volume inspect postgres_data
# Mountpoint: /var/lib/docker/volumes/postgres_data/_data
Mental Model
Volumes as External Hard Drives
Why are bind mounts discouraged in production?
  • Bind mounts tie the container to a specific host path — breaks portability across machines.
  • Host filesystem permissions can conflict with container user permissions.
  • No Docker-managed backup or migration — you must handle host directory lifecycle yourself.
  • Security risk: a compromised container with a bind mount can read/write any host file in the mounted directory.
📊 Production Insight
Named volumes for production, bind mounts for development. This is a hard rule. Named volumes are portable — Docker manages the storage location, and docker volume commands let you back up, restore, and migrate data. Bind mounts are host-path-dependent — they break when you move the container to a different machine. If you use bind mounts in production, you are coupling your container to a specific server, defeating the purpose of containerization.
🎯 Key Takeaway
Containers are ephemeral — their filesystem dies with them. Named volumes for production persistence. Bind mounts for development convenience. tmpfs for sensitive temporary data. Never use bind mounts in production — they break portability and couple containers to specific hosts.
Volume Type Selection
  • If: database or persistent state in production → Use: named volumes (docker volume create); backup with docker run --rm -v vol:/data -v $(pwd):/backup alpine tar czf /backup/backup.tar.gz /data (expanded below)
  • If: development with live code reloading → Use: bind mounts (-v $(pwd):/app) — fast iteration, no rebuild needed
  • If: sensitive temporary data (session tokens, encryption keys in memory) → Use: tmpfs mounts (--tmpfs /tmp/secrets:size=10m) — data never touches disk
  • If: multiple containers sharing the same data → Use: named volumes with shared mount points; ensure proper file locking if there are concurrent writes
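The backup one-liner from the table, expanded into a backup/restore pair. A sketch using a throwaway alpine container; the volume name matches the PostgreSQL example above:

volume_backup_restore.sh · BASH
# Back up the named volume to a tarball in the current directory
docker run --rm -v postgres_data:/data -v "$(pwd)":/backup alpine \
  tar czf /backup/postgres_data.tar.gz -C /data .

# Restore into a fresh volume (e.g. on a new host)
docker volume create postgres_data
docker run --rm -v postgres_data:/data -v "$(pwd)":/backup alpine \
  tar xzf /backup/postgres_data.tar.gz -C /data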

Docker Compose — Orchestrating Multiple Containers

Real applications are rarely one container. A REST API needs a database. A background worker needs a message queue. A web application needs a cache layer. Docker Compose defines and runs multi-container applications with a single YAML file and a single command.

Docker Compose handles networking between containers automatically: every service defined in docker-compose.yml can reach every other service by its service name. Your web container reaches the database at db:5432, not localhost:5432 — Docker's internal DNS resolves service names to container IP addresses.

Failure scenario — depends_on does not wait for readiness: A team used depends_on: db to ensure the database started before the web service. But depends_on only waits for the container to start, not for the database to accept connections. The web service started, tried to connect to PostgreSQL before it was ready, and crashed. The team saw intermittent failures on every docker compose up. The fix: add a healthcheck to the database service and use depends_on: condition: service_healthy.

Networking gotcha — default bridge network isolation: Docker Compose creates a default network for all services in the same docker-compose.yml. But containers from different docker-compose.yml files are on different networks and cannot communicate by default. To connect services across compose files, create an external network and attach both compose files to it.
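A sketch of the external-network fix; the network name shared_net is illustrative:

compose_external_network.sh · BASH
# Create the shared network once, outside any compose file
docker network create shared_net

# Then, in EACH docker-compose.yml, declare the network as external
# and attach services to it:
#   networks:
#     shared_net:
#       external: true
#   services:
#     web:
#       networks: [shared_net]

# Or attach an already-running container by hand
docker network connect shared_net <container>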

docker-compose.yml · YAML
services:
  # ── Web application ──────────────────────────────────────────
  web:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://user:password@db:5432/myapp
      - REDIS_URL=redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    restart: unless-stopped

  # ── PostgreSQL database ───────────────────────────────────────
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: myapp
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"]
      interval: 10s
      timeout: 5s
      retries: 5

  # ── Redis cache ───────────────────────────────────────────────
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru

volumes:
  postgres_data:
▶ Output
# Start all services in the background:
docker compose up -d

# Check status:
docker compose ps

# View logs:
docker compose logs -f web

# Stop everything:
docker compose down

# Stop and delete volumes (wipes database):
docker compose down -v
Mental Model
Docker Compose as an Orchestra Conductor
Why can containers in the same docker-compose.yml reach each other by service name?
  • Docker Compose creates a shared network for all services in the file.
  • Docker's embedded DNS server resolves service names to container IPs on that network.
  • This is automatic — no manual IP configuration or /etc/hosts editing needed.
  • Services in different compose files need an explicit external network to communicate.
📊 Production Insight
Docker Compose is ideal for local development and small single-host deployments. For production multi-host deployments, migrate to Docker Swarm or Kubernetes. The docker-compose.yml file translates directly to Kubernetes manifests (use kompose convert). The healthcheck + depends_on pattern is essential — without it, services start in arbitrary order and fail randomly on cold starts.
🎯 Key Takeaway
Docker Compose orchestrates multi-container applications on a single host. Service name DNS handles inter-container networking. Healthchecks with depends_on: condition: service_healthy prevent startup race conditions. For multi-host production, graduate to Swarm or Kubernetes.
[Diagram: Compose service discovery on the myapp_default bridge network. Docker's embedded DNS maps service names to container IPs automatically (web 172.20.0.2, db 172.20.0.3, redis 172.20.0.4 in the example). The web service reaches PostgreSQL at db:5432 and Redis at redis:6379, never localhost, because each container has its own localhost. Only web's port 8000 is published to the host; db and redis declare no ports and stay network-internal.]

Container Orchestration — Docker Swarm and Beyond

Docker Compose handles multiple containers on a single docker host. When you need containers running across multiple machines — for high availability or scale — you need container orchestration.

Docker Swarm: Docker's built-in container orchestration mode. Turn multiple Docker hosts into a cluster with docker swarm init. Supports service scaling, rolling updates, and automatic container rescheduling when a node fails. Simpler than Kubernetes, suitable for smaller deployments.

docker_swarm.sh · BASH
# Initialize a swarm
docker swarm init

# Deploy a service across the swarm (3 replicas)
docker service create --replicas 3 --name myapp -p 8000:8000 myapp:1.0.0

# Scale up
docker service scale myapp=5

# Rolling update
docker service update --image myapp:2.0.0 myapp

Kubernetes vs Docker Swarm: Docker Swarm is simpler to operate. Kubernetes has a larger ecosystem and is the standard for production container orchestration at scale, available as managed services such as Amazon EKS, Google GKE, and Azure AKS. AWS Fargate takes this further: run containers without managing any servers or clusters at all. You define the container; AWS Fargate handles the infrastructure.

Docker containerization at scale requires orchestration. For most application development teams: start with Docker Compose locally, Docker Swarm for small production deployments, Kubernetes (or a managed service like Amazon ECS or AWS Fargate) for large-scale production.

When to graduate from Swarm to Kubernetes: If you need custom resource definitions, advanced networking (service mesh, network policies), sophisticated autoscaling (HPA, VPA, KEDA), or a large ecosystem of operators and tools — Kubernetes is the answer. If you need simple rolling updates and basic scaling on 3-10 nodes, Swarm is sufficient and far simpler to operate.

📊 Production Insight
The orchestration decision is an operational complexity decision. Docker Swarm has a flat learning curve — a team can go from zero to production in a day. Kubernetes has a steep learning curve — expect 2-4 weeks for a team to become productive. But Kubernetes scales to thousands of nodes and has a massive ecosystem. Start simple (Swarm), graduate when complexity demands it (Kubernetes), or skip the operational burden entirely (AWS Fargate, Amazon ECS).
🎯 Key Takeaway
Docker Compose for single-host. Swarm for simple multi-host. Kubernetes for complex multi-host at scale. AWS Fargate and Amazon ECS for managed options. The right choice depends on your team size, node count, and operational maturity — not on what is trendy.
Choosing an Orchestration Platform
  • If: single machine, local development or small deployment → Use: Docker Compose — no orchestration needed
  • If: 2-10 machines, simple scaling and rolling updates → Use: Docker Swarm — built into Docker, minimal learning curve
  • If: 10+ machines, complex networking, autoscaling, multi-team → Use: Kubernetes (GKE, EKS, AKS) — industry standard, large ecosystem
  • If: you want containers without managing any infrastructure → Use: AWS Fargate or Google Cloud Run — serverless containers
  • If: already on AWS and want managed orchestration → Use: Amazon ECS on Fargate — AWS-native, simpler than EKS

Production Best Practices — What Separates Senior from Junior Docker Usage

Experienced developers follow these Docker containerization rules religiously.

1. Never use :latest in production. FROM python:latest or image: postgres:latest will silently upgrade on your next deployment and potentially break your application. Always pin exact versions: python:3.12.3-slim, postgres:16.2-alpine.

2. Scan images for vulnerabilities. docker scout quickview myimage or integrate Trivy into your CI pipeline. Docker images accumulate CVEs as base operating system packages age.

3. Use .dockerignore. Excluding node_modules, .git, __pycache__, .env from the build context prevents accidentally shipping secrets and dramatically speeds up docker build.

4. Set resource limits. A running container with no resource limits can consume all docker host resources and crash other services. Always set --memory and --cpus in production, or use Docker Compose deploy.resources limits.

5. Implement health checks. The Docker Daemon and Kubernetes use health checks to know when a running container is ready to receive traffic and when it needs to be restarted.

6. Store secrets in secrets managers, not env vars or images. ENV SECRET_KEY=abc123 in a Dockerfile bakes the secret into the image configuration — it appears in docker history and docker inspect for anyone who can pull the image. Use Docker secrets, AWS Secrets Manager, or Vault.

7. Run as non-root. The USER instruction in a Dockerfile is not optional in production. Running as root inside a container means a container escape gives the attacker root on the host.

8. Use read-only filesystems where possible. --read-only flag makes the container filesystem read-only. Writable paths (tmp, logs) use tmpfs mounts. This prevents an attacker from writing binaries or modifying application code inside the container.

9. Log to stdout/stderr, not files. Docker captures stdout/stderr and makes it available via docker logs. Logging to files inside the container requires a volume mount and a log rotation strategy. Let Docker handle log collection.
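Rules 4, 5, 7, and 8 can be combined on a single docker run. A sketch: the image, user, and health endpoint are illustrative, the user must already exist in the image, and the health command assumes curl is installed in it:

hardened_run.sh · BASH
# --memory/--cpus: rule 4 · --read-only/--tmpfs: rule 8
# --user: rule 7 · --health-cmd/--health-interval: rule 5
docker run -d --name api \
  --memory 512m --cpus 1.5 \
  --read-only \
  --tmpfs /tmp:size=64m \
  --user appuser \
  --health-cmd 'curl -f http://localhost:8000/health || exit 1' \
  --health-interval 30s \
  -p 8000:8000 \
  myapp:1.0.0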

.dockerignore · TEXT
# .dockerignore — what NOT to send to Docker Daemon during docker build
.git
.gitignore
__pycache__
*.pyc
*.pyo
.pytest_cache
.coverage
htmlcov/
.env
.env.*
*.env
node_modules/
npm-debug.log
.DS_Store
docker-compose*.yml
Dockerfile*
README.md
docs/
tests/
*.test.py
Coverage/
▶ Output
# Effect: build context sent to Docker Daemon goes from 847MB to 12MB
# Faster docker build, no accidental secret leaks, smaller attack surface
Mental Model
Production Docker as a Security Boundary
Why is running as non-root so important if the container is 'isolated'?
  • Container isolation is namespace-based, not hardware-based. Kernel vulnerabilities can break namespaces.
  • Root inside a container has UID 0 — the same as root on the host. A namespace escape gives full host access.
  • Non-root containers limit the damage of a compromise — the attacker cannot modify system files or install packages.
  • Many Kubernetes security policies (PodSecurityStandards) require non-root containers.
📊 Production Insight
These 9 rules are not suggestions — they are production requirements. Every one of them has a corresponding failure story: silent upgrades from :latest, secret leaks from ENV, OOM from missing resource limits, data loss from missing health checks. Enforce them in CI with linting tools like hadolint (Dockerfile linter) and docker-compose config validation.
🎯 Key Takeaway
Production Docker is a security and reliability discipline. Pin versions. Scan for CVEs. Use .dockerignore. Set resource limits. Implement health checks. Manage secrets externally. Run as non-root. Log to stdout. These rules exist because every one has a corresponding production incident.
🗂 Containers vs Virtual Machines
Isolation model, performance characteristics, and when to use each.
  • Isolation level · Container: process-level (namespaces + cgroups) · VM: hardware-level (hypervisor)
  • Kernel · Container: shared host kernel · VM: separate kernel per VM
  • Startup time · Container: milliseconds · VM: seconds to minutes
  • Memory overhead · Container: < 1MB per container · VM: 100s of MB per VM (OS overhead)
  • Density · Container: 50+ per host · VM: 3-10 per host
  • Security boundary · Container: weaker (shared kernel) · VM: stronger (hypervisor isolated)
  • Image size · Container: 10MB - 500MB typical · VM: 1GB - 50GB typical
  • Use case · Container: application deployment, microservices · VM: multi-tenant isolation, different OS requirements
  • Orchestration · Container: Docker Compose, Swarm, Kubernetes · VM: vSphere, OpenStack, Proxmox

🎯 Key Takeaways

    ⚠ Common Mistakes to Avoid

      Using :latest or unversioned tags in FROM instructions
      Symptom: builds break silently when the base image updates; deployments that worked yesterday fail today with no code changes.
      Fix: always pin the exact version and OS codename: python:3.12.3-slim-bookworm, not python:3 or python:3.12-slim. Use a tool like Renovate or Dependabot to manage version updates explicitly.

      Not using .dockerignore
      Symptom: docker build is slow (> 2 minutes for a simple app); .env secrets appear in the built image layers; the .git directory (100MB+) is sent to the Daemon on every build.
      Fix: create a .dockerignore file that excludes .git, node_modules, __pycache__, .env, tests, docs, and any large non-essential directories.

      Running containers as root in production
      Symptom: a container escape vulnerability gives the attacker root access to the host; Kubernetes PodSecurityPolicy rejects the deployment.
      Fix: add RUN useradd --create-home appuser and USER appuser to your Dockerfile. Ensure file permissions are set correctly with chown.

      No resource limits on containers
      Symptom: one container consumes all host memory, OOM-killing other containers; the Docker host becomes unresponsive.
      Fix: set --memory and --cpus flags on docker run, or deploy.resources.limits in Docker Compose. Always set limits in production.

      Storing secrets in Dockerfile ENV or ARG
      Symptom: secrets appear in docker history and docker inspect output; anyone with image pull access can extract credentials.
      Fix: use Docker secrets, AWS Secrets Manager, HashiCorp Vault, or BuildKit --mount=type=secret for build-time secrets. Never put secrets in ENV or ARG, or COPY them into the image.

      Using depends_on without healthchecks
      Symptom: the application fails intermittently on cold start because the database is not ready; docker compose up works sometimes but not always.
      Fix: add a healthcheck to the dependency service and use depends_on with condition: service_healthy.

    Interview Questions on This Topic

    • Q: Explain the Docker architecture — what are the Docker client, Docker Daemon, and Docker Engine? How do they interact?
    • Q: What is the difference between a Docker container and a virtual machine? How does container isolation actually work at the kernel level?
    • Q: Why should you copy requirements.txt and run pip install before copying your application code in a Dockerfile?
    • Q: What is a multi-stage build and when would you use it? What happens if you delete files in a RUN command instead of using multi-stage?
    • Q: How do Docker volumes differ from bind mounts? When would you use each? What happens to data in each when the container is removed?
    • Q: What is the difference between Docker Compose and Docker Swarm? When would you choose each?
    • Q: What are five security best practices for production Docker images?
    • Q: A container starts but crashes immediately with exit code 137. What happened and how do you debug it?
    • Q: Your Docker build takes 5 minutes for a simple Python app. Walk me through how you would optimize it.

    Frequently Asked Questions

    What is Docker Hub and do I need it?

    Docker Hub is Docker Inc.'s public container image registry — the default source when you run docker pull. It hosts official images for postgres, nginx, python, node, and thousands of community images. You need Docker Hub (or another registry like Amazon ECR or GitHub Container Registry) to store and share container images between different environments — your laptop, CI pipeline, and production servers. For private images, use a private registry rather than Docker Hub's public repositories.

    Does Docker work on Windows and macOS?

    Linux containers require a Linux kernel. On macOS and Windows, Docker Desktop runs a lightweight Linux VM transparently — the Docker Daemon and all running containers operate inside that VM. This is why container startup time on macOS/Windows is slightly slower than native Linux, and why installing Docker Desktop is the standard approach for developers on those operating systems.

    What is the difference between Docker and Kubernetes?

    Docker packages and runs containers — either on a single machine (Docker Engine) or a small cluster (Docker Swarm). Kubernetes is a full container orchestration platform for running containers across large clusters of machines, handling scheduling, auto-scaling, rolling deployments, service discovery, and self-healing. A common pattern: develop and test locally with Docker Compose, deploy to production on Kubernetes or a managed service like Amazon ECS or AWS Fargate.

    What is Docker used for in data science?

    Docker containerization is widely used in data science for reproducibility — packaging a Jupyter notebook environment with specific Python, CUDA, and library versions so results are reproducible across different environments and team members. Data science teams use Docker images to standardise environments, run training jobs on cloud infrastructure, and deploy ML models as REST API services using containers.

    How do I reduce my Docker image size?

    Four main techniques: (1) Use slim or alpine base images (python:3.12-slim vs python:3.12 saves ~600MB). (2) Multi-stage builds — don't ship build tools to production. (3) Combine RUN commands with && to reduce layers. (4) Use .dockerignore to exclude large directories from the build context. A well-optimised Python image is typically 100-200MB vs 1GB+ naively.

    Should I run multiple processes in one container?

    Generally no — the container philosophy is one process per container. Multiple processes require a process supervisor (supervisord), make the container harder to monitor and scale, and blur the boundaries between services. Use Docker Compose to run multiple containers instead of cramming multiple processes into one.

    What does exit code 137 mean for a Docker container?

    Exit code 137 means the container was killed by signal 9 (SIGKILL), typically by the Linux OOM killer. The container exceeded its memory limit (--memory flag) or the host ran out of memory. Debug with docker inspect to check OOMKilled status, then either increase the memory limit or fix the memory leak in your application.

    Naren · Founder & Author

    Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
