Docker Security Best Practices: Hardening Containers in Production
- Image layer: minimal base, non-root user, no secrets in layers, scanned for CVEs
- Runtime layer: read-only filesystem, seccomp profile, no --privileged, resource limits
- Network layer: no published DB ports, custom bridge networks, TLS everywhere
- Secrets layer: never in ENV/ARG/COPY, use secrets managers or tmpfs mounts
Production Debug Guide — from exposed sockets to root containers: systematic security audit paths.

Suspected container breakout or host compromise:
docker ps -a --format '{{.Names}} {{.Image}} {{.CreatedAt}}' | sort -k3
docker inspect --format='{{.Name}} Mounts={{.Mounts}}' $(docker ps -q) | grep docker.sock

Container running as root in production:
for c in $(docker ps -q); do docker inspect --format='{{.Name}} user={{.Config.User}}' $c; done
docker inspect --format='{{.Name}} Privileged={{.HostConfig.Privileged}} CapAdd={{.HostConfig.CapAdd}}' $(docker ps -q)

Image with known vulnerabilities deployed to production:
trivy image --severity CRITICAL,HIGH <image>
docker history --no-trunc <image> | grep -i 'secret\|password\|key'

Docker daemon exposed to the network without TLS:
ss -tlnp | grep -E '2375|2376'
curl -s http://<host>:2375/version

Container has writable root filesystem in production:
docker inspect --format='{{.Name}} ReadOnly={{.HostConfig.ReadonlyRootfs}}' $(docker ps -q)
docker inspect --format='{{.Name}} Tmpfs={{.HostConfig.Tmpfs}}' $(docker ps -q)

Docker daemon configuration is insecure:
cat /etc/docker/daemon.json
docker info --format '{{json .SecurityOptions}}'
Docker containers share the host kernel. A misconfigured container can escape its namespace, read host secrets, or pivot laterally across your cluster. The attack surface spans the image build pipeline, the runtime configuration, the network, the daemon itself, and your secrets management. Miss one layer and the rest does not matter.
Docker's defaults are built for developer convenience, not production hardening. Containers run as root by default. The seccomp profile blocks ~44 of 300+ syscalls but allows the rest. The daemon socket has no authentication by default. Understanding these defaults and how to override them is the foundation of Docker security.
Common misconceptions: containers are not VMs (they share the kernel, so kernel vulnerabilities affect all containers), --privileged is not 'a little extra access' (it disables all isolation), and deleting a secret from a Dockerfile layer does not remove it from the image (layers are additive). Every one of these misconceptions has caused a production breach.
Non-Root Containers — The Single Most Important Security Practice
Docker containers run as root by default. This means the application process inside the container has UID 0 — the same UID as root on the host. If the container's namespace isolation is broken (via a kernel vulnerability), the attacker gains root access to the host.
Running as a non-root user does not prevent container escape, but it limits the damage. A process running as UID 1000 inside the container, even after escaping the namespace, runs as UID 1000 on the host — an unprivileged user who cannot modify system files, install packages, or access other users' data.
- RUN useradd --create-home appuser
- USER appuser
The USER instruction must come before CMD/ENTRYPOINT. Any RUN instructions after USER execute as the non-root user, which may cause permission errors for operations that require root (apt-get install, chown). The common pattern is to perform all root operations first, then switch to the non-root user at the end.
Failure scenario — root container exploited via RCE: A web application container running as root had an RCE vulnerability in its image upload endpoint. The attacker uploaded a webshell and gained shell access as root inside the container. Because the container ran as root, the attacker could read /etc/shadow (if mounted), install tools (curl, ncat), and attempt container escape. If the container had run as a non-root user, the attacker would have been UID 1000 — unable to install packages, read protected files, or escalate privileges on the host.
# ── Secure Dockerfile: non-root user, minimal base, no secrets ──
FROM python:3.12-slim-bookworm
WORKDIR /app

# All root operations first
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Create non-root user
RUN groupadd --gid 1000 appgroup && \
    useradd --uid 1000 --gid appgroup --create-home appuser

# Copy application code and set ownership
COPY --chown=appuser:appgroup . .

# Switch to non-root user — everything after this runs as appuser
USER appuser

EXPOSE 8000
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
docker build -f io/thecodeforge/secure-app.dockerfile -t io.thecodeforge/secure-app:v1 .
# Run with additional security flags:
docker run -d \
--name secure-app \
--read-only \
--cap-drop=ALL \
--cap-add=NET_BIND_SERVICE \
--security-opt=no-new-privileges \
--tmpfs /tmp:size=64m \
-p 8000:8000 \
io.thecodeforge/secure-app:v1
# Verify non-root:
docker exec secure-app whoami
# Output: appuser
docker exec secure-app id
# Output: uid=1000(appuser) gid=1000(appgroup) groups=1000(appgroup)
- Docker was designed for developer convenience. Running as root avoids permission issues during development.
- Many base image instructions (apt-get install, chown) require root. Non-root by default would break most Dockerfiles.
- The USER instruction puts the responsibility on the developer — Docker provides the mechanism, not the policy.
- Kubernetes enforces non-root via Pod Security Standards. Docker does not — you must enforce it yourself.
Image Scanning and Supply Chain Security
Every dependency in your Docker image is an attack surface. The base OS packages, the runtime (Python, Node, Java), the application dependencies (pip, npm, Maven packages) — each can contain known CVEs. Image scanning identifies these vulnerabilities before they reach production.
Scanning tools: - Trivy: open-source, fast, scans OS packages and language dependencies. Integrates with CI/CD. - Grype: open-source, Syft-based, good for SBOM generation. - Docker Scout: Docker's built-in scanner, available in Docker Desktop and Docker Hub. - Snyk Container: commercial, deep integration with CI/CD and container registries.
When to scan: - In CI/CD: scan every image build. Fail the build on critical CVEs. - In the registry: scan on push. Block pulls of images with critical CVEs. - In production: scan running images periodically. Alert on newly discovered CVEs.
Supply chain attacks: Beyond CVEs, consider supply chain attacks — malicious packages injected into public registries. Mitigate with: - Image signing (Docker Content Trust, cosign) - SBOM (Software Bill of Materials) generation - Base image pinning to specific digests - Private registries for internal images
SBOM generation: An SBOM lists every package in your image with its version. It is required for compliance (SBOM Executive Order, SOC 2) and enables rapid response when a new CVE is disclosed — you can query your SBOM database to find all affected images without rescanning.
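This query pattern can be sketched without a registry at hand. The toy SPDX-style JSON below stands in for a real syft-generated SBOM (the file name and package entries are made up for illustration); `jq` is the usual query tool, but plain `grep` answers a quick yes/no:

```shell
# Toy stand-in for an SPDX SBOM as syft would emit (illustrative data only)
cd "$(mktemp -d)"
cat > sample-sbom.json <<'EOF'
{"packages":[{"name":"openssl","versionInfo":"3.0.11"},{"name":"libxml2","versionInfo":"2.9.14"}]}
EOF

# 'Does this image contain openssl, and at what version?'
grep -o '"name":"openssl"[^}]*' sample-sbom.json
# "name":"openssl","versionInfo":"3.0.11"
```

At fleet scale the same idea runs against an SBOM database: one query per new CVE, no image rescans.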
#!/bin/bash
# Image scanning and supply chain security

# ── Scan with Trivy ───────────────────────────────────────────────
# Install: curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh

# Scan a local image for critical and high CVEs
trivy image --severity CRITICAL,HIGH --exit-code 1 io.thecodeforge/secure-app:v1
# --exit-code 1 returns non-zero if vulnerabilities are found (fails CI)

# Scan with JSON output for CI integration
trivy image --format json --output trivy-report.json io.thecodeforge/secure-app:v1

# ── Generate SBOM with Syft ───────────────────────────────────────
# Install: curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh
syft io.thecodeforge/secure-app:v1 -o spdx-json > sbom.spdx.json
# SBOM can be queried later: 'which images contain openssl 3.0.2?'

# ── Sign image with cosign ────────────────────────────────────────
# Install: go install github.com/sigstore/cosign/v2/cmd/cosign@latest

# Generate a key pair (once)
cosign generate-key-pair

# Sign the image
cosign sign --key cosign.key io.thecodeforge/secure-app:v1

# Verify the signature
cosign verify --key cosign.pub io.thecodeforge/secure-app:v1

# ── CI/CD integration example (GitHub Actions) ────────────────────
# .github/workflows/docker-security.yml
# - name: Scan image
#   run: trivy image --severity CRITICAL,HIGH --exit-code 1 $IMAGE
# - name: Generate SBOM
#   run: syft $IMAGE -o spdx-json > sbom.spdx.json
# - name: Sign image
#   run: cosign sign --key env://COSIGN_PRIVATE_KEY $IMAGE
2026-04-05T12:00:00.000Z INFO Detecting OS...
2026-04-05T12:00:00.000Z INFO Detecting python-pkg...
io.thecodeforge/secure-app:v1 (debian 12.5)
===========================================
Total: 2 (CRITICAL: 0, HIGH: 2)
┌─────────┬────────────────┬──────────┬───────────┐
│ Library │ Vulnerability  │ Severity │ Installed │
├─────────┼────────────────┼──────────┼───────────┤
│ openssl │ CVE-2024-XXXX  │ HIGH     │ 3.0.11    │
│ libxml2 │ CVE-2024-YYYY  │ HIGH     │ 2.…       │
└─────────┴────────────────┴──────────┴───────────┘
- When a new CVE is disclosed, you can query your SBOM database to find all affected images in seconds — without rescanning every image.
- Compliance frameworks (SOC 2, PCI-DSS, SBOM Executive Order) require an inventory of all software components.
- SBOM enables rapid incident response — you know exactly what is in every image without forensic analysis.
- SBOM generation is a one-time cost per build. The benefits compound over time as your image library grows.
Secrets Management — Never Bake Secrets Into Images
Secrets (API keys, database passwords, TLS certificates) must never be baked into Docker image layers. Three exposure vectors make this critical:
Vector 1: ENV in Dockerfile. ENV SECRET_KEY=abc123 is visible in docker inspect, docker history, and every container derived from the image. Anyone with image pull access can extract the secret.
Vector 2: ARG in Dockerfile. ARG is build-time only, but it is visible in docker history. If used in a RUN instruction that writes to a file, the secret ends up in that layer.
Vector 3: COPY secrets into the image. COPY .env /app/.env bakes the entire .env file into a layer. Even if a later RUN rm /app/.env removes it, the file still exists in the earlier layer — layers are additive.
The right patterns: - Build-time secrets: Use BuildKit --mount=type=secret. The secret is available during the build but never written to any layer. - Runtime secrets: Use Docker secrets (Swarm), Kubernetes secrets, or mount a tmpfs volume with the secret file. - Environment variables: Acceptable for non-sensitive config. Never for secrets.
Docker Content Trust (DCT): DCT uses digital signatures to verify that an image has not been tampered with. When DOCKER_CONTENT_TRUST=1 is set, Docker only pulls signed images. This prevents supply chain attacks where a malicious image is pushed to a registry with the same tag as a legitimate image.
# syntax=docker/dockerfile:1
# (the syntax directive must be the first line of the Dockerfile)
# ── Safe secrets handling with BuildKit ──────────────────────────
# Requires BuildKit: DOCKER_BUILDKIT=1 docker build ...
FROM python:3.12-slim-bookworm
WORKDIR /app

COPY requirements.txt .

# WRONG: This bakes the secret into the image layer
# RUN pip install --no-cache-dir -r requirements.txt --index-url https://user:password@private.pypi/simple

# RIGHT: Mount the secret as a file during build, never stored in a layer
RUN --mount=type=secret,id=pypi_token \
    pip install --no-cache-dir -r requirements.txt \
    --index-url https://$(cat /run/secrets/pypi_token)@private.pypi/simple

# Build command:
# DOCKER_BUILDKIT=1 docker build \
#   --secret id=pypi_token,src=$HOME/.pypi_token \
#   -t io.thecodeforge/app:v1 .

COPY . .
RUN useradd --create-home appuser
USER appuser
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
DOCKER_BUILDKIT=1 docker build \
--secret id=pypi_token,src=$HOME/.pypi_token \
-t io.thecodeforge/app:v1 .
# Verify the secret is NOT in the image:
docker history --no-trunc io.thecodeforge/app:v1 | grep pypi_token
# No output — the secret was never written to a layer
docker save io.thecodeforge/app:v1 | tar -xO 2>/dev/null | grep -c 'password'
# 0 — no secrets in any layer
- Docker layers are additive. Each layer is a filesystem diff on top of the previous one.
- COPY .env adds the file to layer N. RUN rm .env adds a whiteout marker to layer N+1.
- The file still exists in layer N. Anyone who extracts the layers can read it.
- Only BuildKit --mount=type=secret avoids writing the secret to any layer.
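The whiteout mechanic is easy to demonstrate without Docker. This sketch fakes two image layers as tarballs, roughly the way `docker save` stores them (file names here are illustrative):

```shell
cd "$(mktemp -d)"
mkdir layer1 layer2

# Layer N: 'COPY .env /app/.env' writes the secret into this layer
echo "SUPER_SECRET=abc123" > layer1/.env
tar -cf layer1.tar -C layer1 .

# Layer N+1: 'RUN rm .env' does not touch layer N; it only records an
# overlayfs-style whiteout marker in the new layer
touch layer2/.wh..env
tar -cf layer2.tar -C layer2 .

# The union view hides .env, but anyone who extracts the layers
# can still read the secret straight out of layer N:
tar -xOf layer1.tar ./.env
# SUPER_SECRET=abc123
```

This is exactly what an attacker does to a pulled image: unpack the layer tars and grep them, whiteouts notwithstanding.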
Runtime Hardening — Seccomp, AppArmor, Read-Only Filesystems, and Capabilities
Runtime hardening reduces the attack surface of a running container by restricting what the container process can do. Four mechanisms work together:
1. seccomp (Secure Computing Mode): Filters syscalls at the kernel level. Docker's default seccomp profile blocks ~44 dangerous syscalls (mount, reboot, kexec_load) but allows the rest. Custom profiles can block more syscalls for defense in depth.
2. AppArmor / SELinux: Mandatory Access Control (MAC) frameworks that restrict file access, network access, and capability usage at the process level. AppArmor is default on Ubuntu/Debian. SELinux is default on RHEL/CentOS.
3. Read-only filesystem: --read-only makes the container's root filesystem read-only. The application can only write to tmpfs mounts. This prevents an attacker from installing tools, modifying application code, or writing a backdoor to the filesystem.
4. Linux capabilities: Fine-grained privilege control. Instead of granting full root (all capabilities), grant only what is needed. --cap-drop=ALL removes all capabilities. --cap-add=NET_BIND_SERVICE adds back only the ability to bind to privileged ports.
Performance impact: seccomp adds <1% CPU overhead per syscall (the kernel checks the filter before executing the syscall). AppArmor adds similar negligible overhead. Read-only filesystems can actually improve performance by preventing unnecessary writes. There is no performance reason to skip these security measures.
Failure scenario — writable filesystem exploited: An attacker gained RCE in a web application container through a deserialization vulnerability. Because the filesystem was writable, the attacker wrote a PHP webshell to /app/uploads/shell.php and used it for persistent access. With --read-only, the write would have failed, and the attacker would have been limited to in-memory exploitation (much harder).
#!/bin/bash
# Runtime hardening flags for production containers

# ── Full hardened container run command ──────────────────────────
# Non-root user comes from the Dockerfile USER instruction.
# --read-only:             read-only root filesystem
# --tmpfs:                 writable tmpfs for temp files
# --cap-drop/--cap-add:    drop ALL capabilities, add back only what's needed
# --security-opt=no-new-privileges: prevent escalation via setuid binaries
# --security-opt=seccomp:  custom seccomp profile (optional — default is good
#                          enough for most)
# --security-opt=apparmor: AppArmor profile (Ubuntu/Debian)
# --memory/--cpus/--pids-limit: resource limits
# Note: inter-container communication is disabled daemon-wide with
# "icc": false in /etc/docker/daemon.json (docker run has no --icc flag).
docker run -d \
  --name secure-api \
  --read-only \
  --tmpfs /tmp:size=64m,noexec,nosuid \
  --tmpfs /var/run:size=16m,noexec,nosuid \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --security-opt=no-new-privileges:true \
  --security-opt=seccomp=/etc/docker/seccomp-profile.json \
  --security-opt=apparmor=docker-default \
  --memory=512m \
  --cpus=1.0 \
  --pids-limit=256 \
  --network app-network \
  -p 8000:8000 \
  io.thecodeforge/secure-app:v1

# ── Verify capabilities ──────────────────────────────────────────
docker exec secure-api cat /proc/1/status | grep Cap
# CapInh: 0000000000000000  (inherited — should be 0)
# CapPrm: 0000000000000400  (permitted — only NET_BIND_SERVICE)
# CapEff: 0000000000000400  (effective — only NET_BIND_SERVICE)
# CapBnd: 0000000000000400  (bounding — only NET_BIND_SERVICE)

# ── Verify seccomp profile ───────────────────────────────────────
docker exec secure-api grep Seccomp /proc/1/status
# Seccomp: 2  (filter mode — seccomp profile is active)

# ── Verify read-only filesystem ──────────────────────────────────
docker exec secure-api touch /test-file 2>&1
# touch: cannot touch '/test-file': Read-only file system
# Hardening applied:
# - Read-only filesystem
# - Dropped capabilities (only NET_BIND_SERVICE)
# - No privilege escalation
# - seccomp and AppArmor profiles
# - Resource limits (memory, CPU, PIDs)
# - Non-root user
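The Cap* lines above are 64-bit hex bitmasks with one bit per capability number. A minimal sketch of decoding such a mask by hand (capsh --decode does this properly when libcap is installed; the mask value here is copied from the expected output above):

```shell
# CapEff mask observed in /proc/1/status of the hardened container
mask=$(( 0x0000000000000400 ))

# CAP_NET_BIND_SERVICE is capability number 10, i.e. bit 1<<10 = 0x400
cap_net_bind_service=10
if [ $(( mask & (1 << cap_net_bind_service) )) -ne 0 ]; then
  echo "CAP_NET_BIND_SERVICE: present"
fi

# CAP_SYS_ADMIN (capability number 21) must be absent after --cap-drop=ALL
cap_sys_admin=21
if [ $(( mask & (1 << cap_sys_admin) )) -eq 0 ]; then
  echo "CAP_SYS_ADMIN: absent"
fi
# CAP_NET_BIND_SERVICE: present
# CAP_SYS_ADMIN: absent
```

The same arithmetic works for any Cap* line: a set bit at position n means capability number n is granted.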
- Non-root limits the UID. Capabilities limit the privileges. They are complementary, not redundant.
- A non-root process with CAP_NET_RAW can sniff network traffic. Dropping ALL capabilities prevents this.
- A non-root process with CAP_SYS_PTRACE can debug other processes. Dropping ALL prevents this.
- The principle of least privilege demands both: minimum UID AND minimum capabilities.
Docker Daemon Security — Protecting the Control Plane
The Docker daemon (dockerd) is the control plane for all container operations. If the daemon is compromised, every container on the host is compromised. Five critical daemon security practices:
1. Never expose the daemon socket without TLS. The default daemon listens on a Unix socket (/var/run/docker.sock) which requires local access. If configured to listen on TCP (port 2375), it accepts unauthenticated connections from the network. Anyone who can reach port 2375 can create, stop, and delete containers — effectively root access to the host.
2. Enable TLS with client certificate authentication. If remote daemon access is required (for CI/CD, monitoring), configure TLS on port 2376 with client certificates. Only clients with a valid certificate can connect. This is the equivalent of SSH key authentication for the Docker daemon.
3. Enable user namespace remapping. By default, UID 0 inside the container maps to UID 0 on the host. User namespace remapping maps container UIDs to unprivileged host UIDs (e.g., container UID 0 -> host UID 100000). This means even a container escape results in an unprivileged host user, not root.
4. Enable live-restore. If the Docker daemon restarts, running containers are killed by default. live-restore=true keeps containers running during daemon restarts, improving availability. This also means a daemon crash does not take down your production workloads.
5. Disable the legacy registry (v1). Docker Registry v1 is deprecated and has known security issues. Ensure the daemon only interacts with v2 registries.
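The remapping arithmetic behind practice 3 is simple. With the common default subordinate range of `100000:65536` for the dockremap user in /etc/subuid (your range may differ), container UID u maps to host UID 100000 + u — a sketch:

```shell
# Start of the subordinate UID range assigned to dockremap (common default;
# check /etc/subuid on your host)
base=100000

for container_uid in 0 1000; do
  echo "container UID $container_uid -> host UID $(( base + container_uid ))"
done
# container UID 0 -> host UID 100000
# container UID 1000 -> host UID 101000
```

So even "root" inside the container (UID 0) is an unprivileged UID 100000 on the host, which is exactly why an escape under remapping lands without host root.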
{
"userns-remap": "default",
"live-restore": true,
"no-new-privileges": true,
"userland-proxy": false,
"icc": false,
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
},
"storage-driver": "overlay2",
"default-ulimits": {
"nofile": {
"Name": "nofile",
"Hard": 65536,
"Soft": 65536
}
},
"tls": true,
"tlscacert": "/etc/docker/tls/ca.pem",
"tlscert": "/etc/docker/tls/server-cert.pem",
"tlskey": "/etc/docker/tls/server-key.pem",
"tlsverify": true,
"hosts": [
"unix:///var/run/docker.sock",
"tcp://0.0.0.0:2376"
]
}
sudo systemctl restart docker
# Verify user namespace remapping:
docker info | grep -i "docker root dir"
# Docker Root Dir: /var/lib/docker/100000.100000
# The 100000.100000 indicates remapping is active
# Verify TLS:
docker --tlsverify \
--tlscacert=/etc/docker/tls/ca.pem \
--tlscert=/etc/docker/tls/client-cert.pem \
--tlskey=/etc/docker/tls/client-key.pem \
-H=tcp://localhost:2376 version
# Must use TLS flags to connect — unauthenticated connections are rejected
- User namespace remapping breaks some workflows — file permissions between host and container become mismatched.
- Volume mounts with specific UID/GID expectations may fail because the container UID maps to a different host UID.
- Some applications that need to interact with host resources (Docker-in-Docker, monitoring agents) break with remapping.
- Docker chose developer convenience over security by default. Production environments should enable it.
| Mechanism | Layer | Overhead | What It Protects | Default State |
|---|---|---|---|---|
| Non-root user (USER) | Image/Runtime | None | Limits damage after container escape | Off (runs as root) |
| seccomp profile | Kernel | <1% per syscall | Blocks dangerous syscalls (mount, reboot) | On (default profile) |
| AppArmor / SELinux | Kernel | <1% per access check | Restricts file/network/capability access | On (docker-default on Ubuntu) |
| Read-only filesystem | Runtime | Can improve perf (fewer writes) | Prevents filesystem modification | Off (writable) |
| Capabilities (cap-drop) | Kernel | None | Limits kernel-level privileges | On (default grants ~14 of ~40+ capabilities, drops the rest) |
| no-new-privileges | Kernel | None | Prevents setuid/setgid escalation | Off |
| User namespace remapping | Daemon | Negligible | Maps container UID 0 to unprivileged host UID | Off |
| Image scanning (Trivy) | CI/CD | Build time only | Identifies known CVEs in image | Off (must be configured) |
| Image signing (cosign) | Supply chain | Push/pull time only | Verifies image integrity and provenance | Off (must be configured) |
🎯 Key Takeaways
- Docker containers run as root by default. Add USER nonroot to every production Dockerfile. Pair with --security-opt=no-new-privileges and --cap-drop=ALL.
- The --privileged flag disables ALL container isolation. Never use it in production. Use --cap-drop=ALL and --cap-add for specific capabilities.
- Never bake secrets into Docker image layers. Use BuildKit --mount=type=secret for build-time and secrets managers for runtime.
- The Docker daemon socket (/var/run/docker.sock) is the most dangerous thing to expose. A container with socket access can take over the host.
- Image scanning with Trivy, SBOM generation with Syft, and image signing with cosign form a complete supply chain security pipeline.
- Runtime hardening is layered: seccomp + AppArmor + read-only filesystem + capabilities. Each adds <1% overhead. There is no reason to skip them.
⚠ Common Mistakes to Avoid
Interview Questions on This Topic
- Q: Explain why running a Docker container as root is dangerous. What is the blast radius if a root container is compromised versus a non-root container?
- Q: What is the --privileged flag and why should it never be used in production? What alternatives provide the specific capabilities a container needs without full privilege escalation?
- Q: A developer pushes a Docker image with a database password hardcoded as an ENV variable to a public Docker Hub repository. Walk me through the exposure vectors and the remediation steps.
- Q: How do seccomp profiles and AppArmor work together to harden a container at runtime? What is the performance impact?
- Q: What is user namespace remapping in Docker? Why is it not enabled by default, and when should you enable it in production?
- Q: Your team needs to give a monitoring container access to the Docker API to collect metrics. How do you do this securely without mounting the raw Docker socket?
Frequently Asked Questions
What is the difference between seccomp and AppArmor?
seccomp filters syscalls at the kernel level — it blocks specific system calls like mount, reboot, or kexec_load. AppArmor restricts what resources a process can access — files, network sockets, capabilities. seccomp is syscall-level filtering. AppArmor is access control. They are complementary: seccomp blocks dangerous operations, AppArmor restricts access to specific resources.
Is Docker secure enough for production?
Docker's defaults are not production-secure. Containers run as root, the seccomp profile is permissive, and the daemon socket has no authentication. But with proper hardening — non-root users, custom seccomp profiles, read-only filesystems, TLS on the daemon, image scanning, and secrets management — Docker containers can meet PCI-DSS and SOC 2 compliance requirements. The hardening is your responsibility, not Docker's.
What happens if I use --privileged in production?
--privileged disables ALL container isolation: namespaces, cgroups, seccomp, AppArmor, and capability restrictions. The container can access all host devices, load kernel modules, modify the host filesystem, and create new namespaces. It is equivalent to giving the container root access to the host. A compromised privileged container IS a compromised host.
How do I scan my Docker images for vulnerabilities?
Use Trivy: trivy image --severity CRITICAL,HIGH --exit-code 1 <image>. Integrate this into your CI pipeline to fail builds on critical CVEs. Trivy scans both OS packages (apt, apk) and language dependencies (pip, npm, Maven). For continuous monitoring, scan images in your registry and alert on newly discovered CVEs.
What is the safest way to pass secrets to a Docker container?
For build-time secrets, use BuildKit --mount=type=secret. The secret is available during the build but never written to any image layer. For runtime secrets, use Docker secrets (Swarm), Kubernetes secrets, or mount a tmpfs volume. Never use ENV, ARG, or COPY to pass secrets — they are permanently visible in image layers.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.