Advanced 13 min · March 06, 2026
Docker Security Best Practices

Docker Socket Mounts — Why Attackers Get Root in 3 Minutes

Mounting /var/run/docker.sock gives containers full host root access.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Lessons pulled from things that broke in production.

Last updated: May 23, 2026
Quick Answer
  • Image layer: minimal base, non-root user, no secrets in layers, scanned for CVEs
  • Runtime layer: read-only filesystem, seccomp profile, no --privileged, resource limits
  • Network layer: no published DB ports, custom bridge networks, TLS everywhere
  • Secrets layer: never in ENV/ARG/COPY, use secrets managers or tmpfs mounts
What is Docker Security?

Docker socket mounts are a common but catastrophic misconfiguration that hands attackers root-level access to the Docker host in under three minutes. The Docker socket (/var/run/docker.sock) is the Unix socket the Docker daemon listens on — it’s the control plane for all container operations.


When you mount this socket into a container (e.g., -v /var/run/docker.sock:/var/run/docker.sock), that container gains the ability to issue arbitrary commands to the daemon, including creating new containers with host-level privileges, mounting the host filesystem, and executing code as root on the host. This is not a vulnerability; it’s a deliberate feature abused by attackers.

Commands like docker run --privileged or docker exec become trivial to issue from inside the container, and real-world exploits (e.g., via compromised CI/CD pipelines or developer images) routinely use this to escape containers and pivot to the host. The only safe practice is to never mount the raw Docker socket into containers that run application or untrusted code. Use alternatives such as Docker’s remote API over TLS with client certificates (which authenticates callers but still gives full daemon control to anyone holding a valid certificate), or a filtering proxy such as docker-socket-proxy that exposes only an allow-listed subset of API endpoints (read-only by default).

If you absolutely must expose Docker control from a container, run a dedicated proxy container that filters commands (e.g., tecnativa/docker-socket-proxy) and never grant --privileged or --cap-add=SYS_ADMIN to the container holding the socket.
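The proxy pattern can be sketched as a `docker run` invocation whose arguments are built in an array, so they can be printed and reviewed before anything starts. This is a minimal sketch, not a reference deployment: `CONTAINERS` and `POST` are environment toggles documented by tecnativa/docker-socket-proxy (which serves the filtered API on port 2375), while the network name `socket-proxy-net` and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: run tecnativa/docker-socket-proxy so other containers reach a filtered
# API instead of the raw socket. "socket-proxy-net" is a hypothetical network name.

# Prints one docker-run argument per line (no argument may contain whitespace).
socket_proxy_args() {
  local -a args
  args=(
    -d --name socket-proxy
    --restart unless-stopped
    # Only the proxy sees the real socket. ":ro" does NOT make the API read-only;
    # the environment toggles below are what restrict callers.
    -v /var/run/docker.sock:/var/run/docker.sock:ro
    # Allow read-only container endpoints; keep every POST (write) endpoint disabled.
    -e CONTAINERS=1
    -e POST=0
    # Reach the proxy over a private network; do not publish its port on the host.
    --network socket-proxy-net
    tecnativa/docker-socket-proxy
  )
  printf '%s\n' "${args[@]}"
}

# Starts the proxy using the reviewed argument list.
start_socket_proxy() {
  local -a args
  mapfile -t args < <(socket_proxy_args)
  docker run "${args[@]}"
}
```

Clients attached to `socket-proxy-net` would then set `DOCKER_HOST=tcp://socket-proxy:2375`. That endpoint is unauthenticated, which is why it stays on a private network and why POST remains disabled.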

Plain-English First

Think of a Docker container like a rental apartment in a giant building. The building is your host server, and each apartment is a container. Bad security means a tenant can pick the lock between apartments, mess with the boiler room (the kernel), or leave the front door wide open to strangers. This article is the building code — the rules that keep every apartment isolated, the boiler room locked, and strangers out.

Docker containers share the host kernel. A misconfigured container can escape its namespace, read host secrets, or pivot laterally across your cluster. The attack surface spans the image build pipeline, the runtime configuration, the network, the daemon itself, and your secrets management. Miss one layer and the rest does not matter.

Docker's defaults are built for developer convenience, not production hardening. Containers run as root by default. The seccomp profile blocks ~44 of 300+ syscalls but allows the rest. The daemon socket has no authentication beyond Unix file permissions: root and members of the docker group get full control. Understanding these defaults and how to override them is the foundation of Docker security.

Common misconceptions: containers are not VMs (they share the kernel, so kernel vulnerabilities affect all containers), --privileged is not 'a little extra access' (it disables all isolation), and deleting a secret from a Dockerfile layer does not remove it from the image (layers are additive). Every one of these misconceptions has caused a production breach.

Why Docker Socket Mounts Are a Root Backdoor

Docker socket mounts are a security anti-pattern where the host's /var/run/docker.sock is bind-mounted into a container, giving that container direct access to the Docker daemon API. This effectively grants root-equivalent privileges on the host, because the Docker API allows creating privileged containers, mounting host filesystems, and executing arbitrary commands as root. Attackers exploit this in under three minutes by spinning up a privileged container that mounts the host root filesystem.

When a container holds the Docker socket, it can issue any Docker API call, including docker run -v /:/host --privileged. The new container is created by the daemon on the host, so it is not bound by the isolation, seccomp profile, or capability limits of the container that made the request. The socket is a Unix domain socket owned by root:docker, and the daemon performs no authentication beyond those file permissions, so any process inside the container that can open it (root by default, or any user whose group ID matches the socket's group) can drive the daemon. No additional exploits are needed.
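Because the exposure is just a bind mount, it is easy to audit. The sketch below assumes a Linux host with GNU `xargs` (for `-r`) and uses only `docker ps` and `docker inspect` with a Go template over each mount's `.Source`. The helper names are hypothetical, the check is a simple suffix match on `docker.sock`, and paths containing spaces are not handled.

```bash
#!/usr/bin/env bash
# Sketch: list running containers that bind-mount the Docker socket.

# Reads "<name> <mount-source> <mount-source> ..." lines on stdin and prints the
# name of every container that has a mount source ending in docker.sock.
flag_socket_mounts() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i ~ /docker\.sock$/) { print $1; break }
    }
  }'
}

# Feeds live data from the local daemon into flag_socket_mounts.
audit_socket_mounts() {
  docker ps -q \
    | xargs -r docker inspect --format '{{.Name}} {{range .Mounts}}{{.Source}} {{end}}' \
    | flag_socket_mounts
}
```

Running `audit_socket_mounts` on a host prints one container name per offending container; an empty result means no running container mounts the socket.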

Teams often mount the socket for legitimate use cases like CI/CD runners, monitoring agents, or management tools. The risk is acceptable only when the container runs trusted code in a controlled environment. In production, any container with the socket mount is a single point of compromise — if an attacker gains code execution inside it, they own the host and every other container on it.

Not Just a Permission Problem
Mounting the Docker socket is not a privilege escalation — it is a direct root backdoor. No exploit required; the API itself is the vulnerability.
Production Insight
A CI runner container with the socket mount gets compromised via a malicious npm package — attacker creates a privileged container, mounts /etc/shadow, and exfiltrates host password hashes.
Symptom: unexpected containers appear in docker ps, or the host SSH key is suddenly used from an unknown IP.
Rule of thumb: never mount the Docker socket in a container that runs untrusted code or user-submitted payloads — use rootless Docker or a remote API with TLS instead.
Key Takeaway
Mounting /var/run/docker.sock is equivalent to granting root on the host — treat it as a root credential.
Never mount the socket in containers that process untrusted input, including CI runners, web apps, or job executors.
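If you must manage containers from a container, use the Docker API over TLS with client certificates, not the socket, and protect the client certificate like a root credential: whoever holds it has full control of the daemon.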
[Figure: Docker Socket Mount Attack Chain. How attackers escalate from container access to root on host: container with socket mount (attacker gains a shell via an exploit) → Docker daemon API access via /var/run/docker.sock → privileged container launch with host root privileges → host root shell. Never mount docker.sock into any container; use rootless mode and read-only root filesystems.]

Non-Root Containers — The Single Most Important Security Practice

Docker containers run as root by default. This means the application process inside the container has UID 0 — the same UID as root on the host. If the container's namespace isolation is broken (via a kernel vulnerability), the attacker gains root access to the host.

Running as a non-root user does not prevent container escape, but it limits the damage. A process running as UID 1000 inside the container, even after escaping the namespace, runs as UID 1000 on the host: an unprivileged user who cannot modify system files, install packages, or access other users' data. Pick a UID that no host account uses, though; on many hosts UID 1000 belongs to the first human user, whose files an escaped process could then read.

The fix is a two-line addition to the Dockerfile:
  • RUN useradd --create-home appuser
  • USER appuser

Place USER after all root-level steps and before CMD/ENTRYPOINT: at runtime, the container runs as the user set by the last USER instruction in the final stage. Any RUN instructions after USER execute as the non-root user, which may cause permission errors for operations that require root (apt-get install, chown). The common pattern is to perform all root operations first, then switch to the non-root user at the end.
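Because the runtime user is whatever the final stage's last USER instruction sets, the rule can be checked in CI before an image is ever built. A minimal sketch, assuming awk is available: it tracks the last USER per stage and resets at each FROM, and it reports a stage without USER as root, which is deliberately conservative because some base images already set a non-root user. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: fail a build if a Dockerfile's final stage does not switch to a non-root user.

# Prints the user set by the last USER instruction of the final stage (stdin = Dockerfile).
# A stage without USER is reported as "root" (conservative: the base image may set one).
final_user() {
  awk 'toupper($1) == "FROM" { user = "" }
       toupper($1) == "USER" { user = $2 }
       END { print (user == "" ? "root" : user) }'
}

# Succeeds (exit 0) only when the final user is neither root nor UID 0.
is_nonroot_dockerfile() {
  local user
  user=$(final_user)
  case "$user" in
    root | 0 | root:* | 0:*) return 1 ;;
    *) return 0 ;;
  esac
}
```

In a pipeline this might look like `is_nonroot_dockerfile < io/thecodeforge/secure-app.dockerfile || exit 1`.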

Failure scenario — root container exploited via RCE: A web application container running as root had an RCE vulnerability in its image upload endpoint. The attacker uploaded a webshell and gained shell access as root inside the container. Because the container ran as root, the attacker could read /etc/shadow (if mounted), install tools (curl, ncat), and attempt container escape. If the container had run as a non-root user, the attacker would have been UID 1000 — unable to install packages, read protected files, or escalate privileges on the host.

io/thecodeforge/secure-app.dockerfile (DOCKERFILE)
# ── Secure Dockerfile: non-root user, minimal base, no secrets ──
FROM python:3.12-slim-bookworm

WORKDIR /app

# All root operations first
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Create non-root user
RUN groupadd --gid 1000 appgroup && \
    useradd --uid 1000 --gid appgroup --create-home appuser

# Copy application code and set ownership
COPY --chown=appuser:appgroup . .

# Switch to non-root user — everything after this runs as appuser
USER appuser

EXPOSE 8000

CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Output
# Build:
docker build -f io/thecodeforge/secure-app.dockerfile -t io.thecodeforge/secure-app:v1 .
# Run with additional security flags:
docker run -d \
--name secure-app \
--read-only \
--cap-drop=ALL \
--cap-add=NET_BIND_SERVICE \
--security-opt=no-new-privileges \
--tmpfs /tmp:size=64m \
-p 8000:8000 \
io.thecodeforge/secure-app:v1
# Verify non-root:
docker exec secure-app whoami
# Output: appuser
docker exec secure-app id
# Output: uid=1000(appuser) gid=1000(appgroup) groups=1000(appgroup)
Non-Root as a Seatbelt, Not a Cage
  • Docker was designed for developer convenience. Running as root avoids permission issues during development.
  • Many base image instructions (apt-get install, chown) require root. Non-root by default would break most Dockerfiles.
  • The USER instruction puts the responsibility on the developer — Docker provides the mechanism, not the policy.
  • Kubernetes can enforce non-root through the Restricted Pod Security Standard (runAsNonRoot). Docker has no equivalent policy; you must enforce it yourself.
Production Insight
The no-new-privileges flag is a critical companion to non-root users. Without it, a non-root process that executes a setuid binary (such as sudo or su) can escalate to root. The --security-opt=no-new-privileges flag prevents this escalation. Always pair it with a non-root USER in production.
Key Takeaway
Containers run as root by default. This is the single most dangerous default in Docker. Add USER nonroot to every production Dockerfile. Pair with --security-opt=no-new-privileges and --cap-drop=ALL. Non-root does not prevent escape but limits the blast radius.
User Configuration Decisions
  • If the application needs to bind to a port below 1024: use --cap-add=NET_BIND_SERVICE instead of running as root, or use a reverse proxy that binds to the privileged port. Test the bind under your real settings: for a non-root USER the kernel clears effective capabilities on exec, so the added capability only applies through a file capability on the binary, which no-new-privileges also blocks.
  • If the application needs to write to specific directories: set ownership with COPY --chown=appuser:appgroup and use --tmpfs for temporary writable paths.
  • If the application needs to install packages at runtime: this is an anti-pattern. Pre-install all dependencies in the Dockerfile. If dynamic installation is needed, use a separate init container.
  • If the container needs to interact with the Docker daemon: use a socket proxy (tecnativa/docker-socket-proxy) that restricts API access. Never mount the raw socket into the application container.

Image Scanning and Supply Chain Security

Every dependency in your Docker image is an attack surface. The base OS packages, the runtime (Python, Node, Java), the application dependencies (pip, npm, Maven packages) — each can contain known CVEs. Image scanning identifies these vulnerabilities before they reach production.

Scanning tools:
  • Trivy: open-source, fast, scans OS packages and language dependencies. Integrates with CI/CD.
  • Grype: open-source scanner from Anchore that uses Syft's package cataloging and can scan Syft-generated SBOMs.
  • Docker Scout: Docker's built-in scanner, available in Docker Desktop and Docker Hub.
  • Snyk Container: commercial, deep integration with CI/CD and container registries.

When to scan:
  • In CI/CD: scan every image build. Fail the build on critical CVEs.
  • In the registry: scan on push. Block pulls of images with critical CVEs.
  • In production: scan running images periodically. Alert on newly discovered CVEs.

Supply chain attacks: Beyond CVEs, consider supply chain attacks, where malicious packages are injected into public registries. Mitigate with:
  • Image signing (Docker Content Trust, cosign)
  • SBOM (Software Bill of Materials) generation
  • Base image pinning to specific digests
  • Private registries for internal images
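Digest pinning is easy to enforce with a small lint step. A minimal sketch, assuming awk: it skips `--platform=`-style flags, ignores references to earlier stage names (`FROM build`), treats `scratch` as pinned, and accepts any `@sha256:<hex>` suffix without checking its length. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print every FROM line whose image is not pinned to a digest (stdin = Dockerfile).
unpinned_from_lines() {
  awk 'toupper($1) == "FROM" {
         i = 2
         while (i <= NF && $i ~ /^--/) i++        # skip flags such as --platform=...
         img = $i
         if (img != "scratch" && !(img in stages) && img !~ /@sha256:[0-9a-f]+$/)
           print NR ": " img
         if (toupper($(i + 1)) == "AS") stages[$(i + 2)] = 1   # remember stage names
       }'
}
```

A CI job can fail when the output is non-empty, e.g. `[ -z "$(unpinned_from_lines < Dockerfile)" ]`.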

SBOM generation: An SBOM lists every package in your image with its version. It is increasingly required for compliance (for example, under US Executive Order 14028 for software supplied to the US federal government) and enables rapid response when a new CVE is disclosed: you can query your SBOM database to find all affected images without rescanning.

io/thecodeforge/image-scanning.sh (BASH)
#!/bin/bash
# Image scanning and supply chain security
set -euo pipefail

IMAGE="io.thecodeforge/secure-app:v1"

# ── Scan with Trivy ───────────────────────────────────────────────
# Install: curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh

# Scan a local image for critical and high CVEs
trivy image --severity CRITICAL,HIGH --exit-code 1 "$IMAGE"
# --exit-code 1 returns non-zero if vulnerabilities are found; with set -e the script stops (fails CI)

# Scan with JSON output for CI integration
trivy image --format json --output trivy-report.json "$IMAGE"

# ── Generate SBOM with Syft ───────────────────────────────────────
# Install: curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh

syft "$IMAGE" -o spdx-json > sbom.spdx.json
# SBOM can be queried later: 'which images contain openssl 3.0.2?'

# ── Sign image with cosign ────────────────────────────────────────
# Install: go install github.com/sigstore/cosign/v2/cmd/cosign@latest
# cosign stores signatures next to the image in its registry, so push the image first.
# Signing by digest (repo@sha256:...) is safer than signing a mutable tag.

# Generate a key pair (once, outside CI; keep cosign.key and its password secret)
# cosign generate-key-pair

# Sign the image
cosign sign --key cosign.key "$IMAGE"

# Verify the signature
cosign verify --key cosign.pub "$IMAGE"

# ── CI/CD integration example (GitHub Actions) ────────────────────
# .github/workflows/docker-security.yml
# - name: Scan image
#   run: trivy image --severity CRITICAL,HIGH --exit-code 1 "$IMAGE"
# - name: Generate SBOM
#   run: syft "$IMAGE" -o spdx-json > sbom.spdx.json
# - name: Sign image
#   run: cosign sign --key env://COSIGN_PRIVATE_KEY "$IMAGE"
Output
# trivy image output:
2026-04-05T12:00:00.000Z INFO Detecting OS...
2026-04-05T12:00:00.000Z INFO Detecting python-pkg...
io.thecodeforge/secure-app:v1 (debian 12.5)
===========================================
Total: 2 (CRITICAL: 0, HIGH: 2)
┌─────────┬───────────────┬──────────┬───────────┐
│ Library │ Vulnerability │ Severity │ Installed │
├─────────┼───────────────┼──────────┼───────────┤
│ openssl │ CVE-2024-XXXX │ HIGH     │ 3.0.11    │
│ libxml2 │ CVE-2024-YYYY │ HIGH     │ 2.…       │
└─────────┴───────────────┴──────────┴───────────┘
Image Scanning as a Health Check for Your Supply Chain
  • When a new CVE is disclosed, you can query your SBOM database to find all affected images in seconds — without rescanning every image.
  • Compliance frameworks (SOC 2, PCI-DSS, SBOM Executive Order) require an inventory of all software components.
  • SBOM enables rapid incident response — you know exactly what is in every image without forensic analysis.
  • SBOM generation is a one-time cost per build. The benefits compound over time as your image library grows.
Production Insight
The most common CI/CD security gap is scanning only the application dependencies, not the base image. A team scanned their Python packages with pip-audit but missed a critical CVE in the base OS (Debian's libssl). The fix: use Trivy, which scans both OS packages and language dependencies in a single pass. Integrate it into CI with --exit-code 1 to fail the build on critical CVEs.
Key Takeaway
Image scanning is not optional in production. Integrate Trivy into CI with --exit-code 1. Generate SBOMs for compliance and rapid incident response. Sign images with cosign to prevent tampering. Scan both OS packages and language dependencies — partial scanning misses critical CVEs.

Secrets Management — Never Bake Secrets Into Images

Secrets (API keys, database passwords, TLS certificates) must never be baked into Docker image layers. Three exposure vectors make this critical:

Vector 1: ENV in Dockerfile. ENV SECRET_KEY=abc123 is visible in docker inspect, docker history, and every container derived from the image. Anyone with image pull access can extract the secret.

Vector 2: ARG in Dockerfile. ARG is build-time only, but it is visible in docker history. If used in a RUN instruction that writes to a file, the secret ends up in that layer.

Vector 3: COPY secrets into the image. COPY .env /app/.env bakes the entire .env file into a layer. Even if a later RUN rm /app/.env removes it, the file still exists in the earlier layer — layers are additive.

The right patterns:
  • Build-time secrets: Use BuildKit --mount=type=secret. The secret is available during the build but never written to any layer.
  • Runtime secrets: Use Docker secrets (Swarm), Kubernetes secrets, or mount a tmpfs volume with the secret file.
  • Environment variables: Acceptable for non-sensitive config. Never for secrets.
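The three exposure vectors can be caught early with a heuristic scan of the Dockerfile. This is a minimal sketch, not a replacement for a dedicated secret scanner: the name list (PASSWORD, SECRET, TOKEN, API_KEY, PRIVATE_KEY) and the file list are assumptions to adapt, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: flag Dockerfile lines that look like they bake secrets into a layer.
# Heuristic only: matches on names and file extensions, so review every hit by hand.
flag_secret_instructions() {
  awk '
    { upper = toupper($0) }
    # ENV or ARG whose text mentions a secret-looking name
    upper ~ /^[ \t]*(ENV|ARG)[ \t]/ && upper ~ /(PASSWORD|SECRET|TOKEN|API_?KEY|PRIVATE_KEY)/ {
      print NR ": " $0
      next
    }
    # COPY or ADD of common secret files
    upper ~ /^[ \t]*(COPY|ADD)[ \t]/ && $0 ~ /(\.env|\.pem|\.key|credentials\.json)([ \t]|$)/ {
      print NR ": " $0
    }
  '
}
```

BuildKit secret mounts (`RUN --mount=type=secret,...`) are intentionally not flagged, since they never write the secret to a layer.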

Docker Content Trust (DCT): DCT uses digital signatures to verify that an image has not been tampered with. When DOCKER_CONTENT_TRUST=1 is set, Docker only pulls signed images. This prevents supply chain attacks where a malicious image is pushed to a registry with the same tag as a legitimate image.

io/thecodeforge/secrets-safe.dockerfile (DOCKERFILE)
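# syntax=docker/dockerfile:1
# ── Safe secrets handling with BuildKit ──────────────────────────
# The syntax directive must be the first line of the file, before any comment.
# Requires BuildKit (the default builder since Docker Engine 23.0; on older engines: DOCKER_BUILDKIT=1 docker build ...)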

FROM python:3.12-slim-bookworm

WORKDIR /app

COPY requirements.txt .

# WRONG: This bakes the secret into the image layer
# RUN pip install --no-cache-dir -r requirements.txt --index-url https://user:password@private.pypi/simple

# RIGHT: Mount the secret as a file during build, never stored in a layer
RUN --mount=type=secret,id=pypi_token \
    pip install --no-cache-dir -r requirements.txt \
    --index-url https://$(cat /run/secrets/pypi_token)@private.pypi/simple

# Build command:
# DOCKER_BUILDKIT=1 docker build \
#   --secret id=pypi_token,src=$HOME/.pypi_token \
#   -t io.thecodeforge/app:v1 .

COPY . .

RUN useradd --create-home appuser
USER appuser

CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Output
# Build with secret:
DOCKER_BUILDKIT=1 docker build \
--secret id=pypi_token,src=$HOME/.pypi_token \
-t io.thecodeforge/app:v1 .
# Verify the secret is NOT in the image:
docker history --no-trunc io.thecodeforge/app:v1 | grep pypi_token
# No output — the secret was never written to a layer
docker save io.thecodeforge/app:v1 | tar -xO 2>/dev/null | grep -c 'password'
# 0 — no secrets in any layer
Secrets as Nuclear Launch Codes
  • Docker layers are additive. Each layer is a filesystem diff on top of the previous one.
  • COPY .env adds the file to layer N. RUN rm .env adds a whiteout marker to layer N+1.
  • The file still exists in layer N. Anyone who extracts the layers can read it.
  • Only BuildKit --mount=type=secret avoids writing the secret to any layer.
Production Insight
The .dockerignore file is your first line of defense against accidental secret exposure. Add .env, .pem, .key, credentials.json, and any file that might contain secrets. But .dockerignore is not sufficient alone — a developer might rename the secret file or pass it as an ARG. The only reliable solution is BuildKit secrets for build-time and secrets managers for runtime.
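A CI step can at least confirm that `.dockerignore` lists the patterns named above. A minimal sketch, assuming the file is passed on stdin; it matches whole lines exactly, so an equivalent broader pattern such as `**/.env` would still be reported as missing. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print each required secret pattern that .dockerignore (stdin) does not list verbatim.
missing_dockerignore_patterns() {
  local content pattern
  content=$(cat)
  for pattern in .env '*.pem' '*.key' credentials.json; do
    if ! printf '%s\n' "$content" | grep -Fxq -- "$pattern"; then
      printf '%s\n' "$pattern"
    fi
  done
}
```

Usage: `missing_dockerignore_patterns < .dockerignore`; any output means a pattern should be added.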
Key Takeaway
Never bake secrets into Docker image layers. ENV, ARG, and COPY all expose secrets permanently. Use BuildKit --mount=type=secret for build-time secrets and secrets managers for runtime secrets. The .dockerignore file is your first line of defense but not sufficient alone. If a secret is ever baked in, rotate it immediately.
Secret Handling Strategy
  • If a secret is needed during docker build (private registry auth, API keys): use BuildKit --mount=type=secret. Never use ARG or ENV for secrets.
  • If a secret is needed at runtime (database password, API key): use Docker secrets (Swarm), Kubernetes secrets, or mount a tmpfs volume.
  • If the secret is a TLS certificate: mount it as a volume from the host or a secrets manager. Never COPY it into the image.
  • If a secret was accidentally baked into a pushed image: rotate the secret immediately, delete the tag, rebuild with BuildKit secrets, and audit all layers.

Runtime Hardening — Seccomp, AppArmor, Read-Only Filesystems, and Capabilities

Runtime hardening reduces the attack surface of a running container by restricting what the container process can do. Four mechanisms work together:

1. seccomp (Secure Computing Mode): Filters syscalls at the kernel level. Docker's default seccomp profile blocks ~44 dangerous syscalls (mount, reboot, kexec_load) but allows the rest. Custom profiles can block more syscalls for defense in depth.

2. AppArmor / SELinux: Mandatory Access Control (MAC) frameworks that restrict file access, network access, and capability usage at the process level. AppArmor is default on Ubuntu/Debian. SELinux is default on RHEL/CentOS.

3. Read-only filesystem: --read-only makes the container's root filesystem read-only. The application can only write to tmpfs mounts and explicitly mounted volumes. This prevents an attacker from installing tools, modifying application code, or writing a backdoor to the filesystem.

4. Linux capabilities: Fine-grained privilege control. Instead of granting full root (all capabilities), grant only what is needed. --cap-drop=ALL removes all capabilities. --cap-add=NET_BIND_SERVICE adds back only the ability to bind to privileged ports.
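To check which capabilities a process actually holds, `/proc/<pid>/status` reports each capability set (CapInh, CapPrm, CapEff, CapBnd, CapAmb) as a 64-bit hex mask, where bit N is capability number N from `linux/capability.h`. A minimal decoder sketch, assuming bash 4+; `capsh --decode` from libcap does the same job if it is installed, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: decode a capability hex mask from /proc/<pid>/status into names.

# Index N in this array is capability number N (CAP_CHOWN = 0 ... CAP_CHECKPOINT_RESTORE = 40).
CAP_NAMES=(CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID
  SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN NET_RAW
  IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE SYS_PACCT
  SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME SYS_TTY_CONFIG MKNOD
  LEASE AUDIT_WRITE AUDIT_CONTROL SETFCAP MAC_OVERRIDE MAC_ADMIN SYSLOG
  WAKE_ALARM BLOCK_SUSPEND AUDIT_READ PERFMON BPF CHECKPOINT_RESTORE)

# $1 = hex mask without 0x (e.g. 0000000000000400); prints one CAP_* name per set bit.
decode_caps() {
  local mask=$((16#$1)) i
  for i in "${!CAP_NAMES[@]}"; do
    if (( (mask >> i) & 1 )); then
      echo "CAP_${CAP_NAMES[$i]}"
    fi
  done
}
```

For example, `decode_caps 0000000000000400` prints `CAP_NET_BIND_SERVICE`, and Docker's default bounding set `00000000a80425fb` decodes to its 14 default capabilities.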

Performance impact: seccomp adds a small cost to each syscall (the kernel evaluates the filter before executing it), typically well under 1% of CPU for ordinary workloads. AppArmor overhead is similarly negligible, and a read-only root filesystem adds no runtime cost. There is no performance reason to skip these security measures.

Failure scenario — writable filesystem exploited: An attacker gained RCE in a web application container through a deserialization vulnerability. Because the filesystem was writable, the attacker wrote a PHP webshell to /app/uploads/shell.php and used it for persistent access. With --read-only, the write would have failed, and the attacker would have been limited to in-memory exploitation (much harder).

io/thecodeforge/runtime-hardening.sh (BASH)
#!/bin/bash
# Runtime hardening flags for production containers
set -euo pipefail

# Comments cannot sit between backslash-continued lines (a '#' line ends the
# command early), so the flags are collected in an array, where comments are allowed.
run_args=(
  -d
  --name secure-api

  # Non-root user comes from the Dockerfile USER instruction

  # Read-only root filesystem
  --read-only

  # Writable tmpfs for temp files
  --tmpfs /tmp:size=64m,noexec,nosuid
  --tmpfs /var/run:size=16m,noexec,nosuid

  # Drop ALL capabilities, add back only what's needed
  --cap-drop=ALL
  --cap-add=NET_BIND_SERVICE

  # Prevent privilege escalation via setuid binaries
  --security-opt=no-new-privileges:true

  # Custom seccomp profile (optional: delete this line to keep Docker's default profile)
  --security-opt=seccomp=/etc/docker/seccomp-profile.json

  # AppArmor profile (Ubuntu/Debian hosts with AppArmor enabled)
  --security-opt=apparmor=docker-default

  # Resource limits
  --memory=512m
  --cpus=1.0
  --pids-limit=256

  # Network. --icc is a daemon option, not a docker run flag. To block
  # container-to-container traffic on a user-defined bridge, create it with:
  #   docker network create -o com.docker.network.bridge.enable_icc=false app-network
  --network app-network
  -p 8000:8000
)

docker run "${run_args[@]}" io.thecodeforge/secure-app:v1

# ── Verify capabilities ───────────────────────────────────────────
docker exec secure-api cat /proc/1/status | grep Cap
# CapInh: 0000000000000000  (inherited: should be 0)
# CapPrm: 0000000000000000  (permitted: 0 for a non-root USER; 0000000000000400 if PID 1 runs as root)
# CapEff: 0000000000000000  (effective: same as CapPrm)
# CapBnd: 0000000000000400  (bounding: only NET_BIND_SERVICE can ever be acquired)

# ── Verify seccomp profile ────────────────────────────────────────
docker exec secure-api grep Seccomp /proc/1/status
# Seccomp: 2  (filter mode: a seccomp profile is active)

# ── Verify read-only filesystem ───────────────────────────────────
docker exec secure-api touch /test-file 2>&1 || true
# touch: cannot touch '/test-file': Read-only file system
Output
# All security flags applied. Container runs with:
# - Read-only filesystem
# - Dropped capabilities (only NET_BIND_SERVICE)
# - No privilege escalation
# - seccomp and AppArmor profiles
# - Resource limits (memory, CPU, PIDs)
# - Non-root user
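The seccomp and no-new-privileges checks can be folded into one summary line for dashboards or CI smoke tests. A minimal sketch, assuming the text of `/proc/<pid>/status` arrives on stdin; `Seccomp` (0 disabled, 1 strict, 2 filter) and `NoNewPrivs` (present since Linux 4.10) are standard fields of that file, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: summarise seccomp and no-new-privileges state from /proc/<pid>/status text.

# Reads status text on stdin and prints "seccomp=<mode> no_new_privs=<0|1>".
# Seccomp modes: 0 = disabled, 1 = strict, 2 = filter (a profile is loaded).
proc_security_summary() {
  awk -F':[ \t]*' '
    $1 == "Seccomp"    { seccomp = $2 }
    $1 == "NoNewPrivs" { nnp = $2 }
    END { printf "seccomp=%s no_new_privs=%s\n", seccomp, nnp }'
}
```

Against the container above: `docker exec secure-api cat /proc/1/status | proc_security_summary` should report `seccomp=2 no_new_privs=1` when both protections are active.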
Runtime Hardening as Apartment Security Layers
  • Non-root limits the UID. Capabilities limit the privileges. They are complementary, not redundant.
  • A non-root process with CAP_NET_RAW can sniff network traffic. Dropping ALL capabilities prevents this.
  • A non-root process with CAP_SYS_PTRACE can debug other processes. Dropping ALL prevents this.
  • The principle of least privilege demands both: minimum UID AND minimum capabilities.
Production Insight
The default seccomp profile is a good starting point but not sufficient for high-security environments: it blocks only around 44 syscalls and allows everything else. For containers that do not need network access, block socket, bind, connect, and listen. Be careful with process-creation syscalls: clone is also used to create threads, and execve is needed to launch the entrypoint itself after the filter is installed, so blocking either usually breaks the container. Custom profiles are typically built by copying Docker's default profile and removing syscalls the workload never uses.
Key Takeaway
Runtime hardening is layered defense: seccomp filters syscalls, AppArmor restricts access, read-only filesystem prevents writes, capabilities limit privileges. Each layer adds <1% overhead. There is no performance reason to skip them. The default seccomp profile is a starting point — customize it for high-security environments.
Runtime Hardening Decisions
  • If it is a standard web application (API, web server): use the default seccomp profile + --read-only + --cap-drop=ALL + --cap-add=NET_BIND_SERVICE + --security-opt=no-new-privileges.
  • If the application needs to write to specific directories: use --read-only with --tmpfs for writable paths. Never make the entire filesystem writable.
  • If the container needs to interact with other containers (service mesh sidecar): use a custom seccomp profile that keeps the network syscalls it needs and removes the ones it never uses.
  • If it runs in a high-security environment (PCI-DSS, SOC 2): use all hardening measures + a custom seccomp profile + AppArmor/SELinux + image signing + SBOM.

Docker Daemon Security — Protecting the Control Plane

The Docker daemon (dockerd) is the control plane for all container operations. If the daemon is compromised, every container on the host is compromised. Five critical daemon security practices:

1. Never expose the daemon socket without TLS. The default daemon listens on a Unix socket (/var/run/docker.sock) which requires local access. If configured to listen on TCP (port 2375), it accepts unauthenticated connections from the network. Anyone who can reach port 2375 can create, stop, and delete containers — effectively root access to the host.

2. Enable TLS with client certificate authentication. If remote daemon access is required (for CI/CD, monitoring), configure TLS on port 2376 with client certificates. Only clients with a valid certificate can connect. This is the equivalent of SSH key authentication for the Docker daemon.

3. Enable user namespace remapping. By default, UID 0 inside the container maps to UID 0 on the host. User namespace remapping maps container UIDs to unprivileged host UIDs (e.g., container UID 0 -> host UID 100000). This means even a container escape results in an unprivileged host user, not root.

4. Enable live-restore. If the Docker daemon restarts, running containers are killed by default. live-restore=true keeps containers running during daemon restarts, improving availability. This also means a daemon crash does not take down your production workloads.

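5. Do not use legacy (v1) registries. Docker Registry v1 is deprecated and has known security issues. Current Docker Engine releases have removed v1 push/pull support entirely, so this mainly matters on old engines, where you should ensure the daemon only interacts with v2 registries.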
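Items 1 and 2 are easy to violate from the client side, for example when a CI job exports `DOCKER_HOST=tcp://...:2375`. A minimal preflight sketch that inspects only the standard `DOCKER_HOST` and `DOCKER_TLS_VERIFY` variables honoured by the docker CLI; it does not contact the daemon, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: warn when the docker CLI is configured to reach a daemon over plain TCP.

# $1 = daemon address (defaults to $DOCKER_HOST, then the local Unix socket).
# Succeeds (exit 0) when the address is TCP without client-certificate verification.
docker_host_is_insecure() {
  local host=${1:-${DOCKER_HOST:-unix:///var/run/docker.sock}}
  case "$host" in
    unix://* | ssh://*) return 1 ;;             # local socket or SSH transport
    tcp://*) [ -z "${DOCKER_TLS_VERIFY:-}" ] ;;  # TCP: insecure unless TLS verification is on
    *) return 0 ;;                               # unknown scheme: treat as insecure
  esac
}
```

In a CI preflight step: `if docker_host_is_insecure; then echo "refusing: plain-TCP Docker daemon" >&2; exit 1; fi`.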

/etc/docker/daemon.json (JSON)
{
  "userns-remap": "default",
  "live-restore": true,
  "no-new-privileges": true,
  "userland-proxy": false,
  "icc": false,
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "storage-driver": "overlay2",
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 65536,
      "Soft": 65536
    }
  },
  "tls": true,
  "tlscacert": "/etc/docker/tls/ca.pem",
  "tlscert": "/etc/docker/tls/server-cert.pem",
  "tlskey": "/etc/docker/tls/server-key.pem",
  "tlsverify": true,
  "hosts": [
    "unix:///var/run/docker.sock",
    "tcp://0.0.0.0:2376"
  ]
}
Output
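# After updating daemon.json, restart Docker.
# On systemd installs, the stock unit starts dockerd with -H fd://, and dockerd refuses to start
# when "hosts" is set both as a flag and in daemon.json; override ExecStart in a drop-in first.
sudo systemctl restart docker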
# Verify user namespace remapping:
docker info | grep -i "docker root dir"
# Docker Root Dir: /var/lib/docker/100000.100000
# The 100000.100000 indicates remapping is active
# Verify TLS:
docker --tlsverify \
--tlscacert=/etc/docker/tls/ca.pem \
--tlscert=/etc/docker/tls/client-cert.pem \
--tlskey=/etc/docker/tls/client-key.pem \
-H=tcp://localhost:2376 version
# Must use TLS flags to connect — unauthenticated connections are rejected
Docker Daemon as the Building Superintendent
  • User namespace remapping breaks some workflows — file permissions between host and container become mismatched.
  • Volume mounts with specific UID/GID expectations may fail because the container UID maps to a different host UID.
  • Some applications that need to interact with host resources (Docker-in-Docker, monitoring agents) break with remapping.
  • Docker chose developer convenience over security by default. Production environments should enable it.
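The UID mismatch comes from simple offset arithmetic: with remapping, container UID n lands on host UID start + n, where start and count come from the remap user's entry in `/etc/subuid` (for example `dockremap:100000:65536`). A minimal sketch, assuming a single subordinate range; Docker can combine several ranges, which this helper ignores, and its name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compute which host UID a container UID lands on under userns-remap.

# $1 = a subordinate-ID entry in /etc/subuid format ("user:start:count")
# $2 = UID inside the container
# Prints the host UID, or fails if the container UID is outside the mapped range.
host_uid_for() {
  local entry=$1 container_uid=$2 name start count
  IFS=: read -r name start count <<< "$entry"
  if (( container_uid < 0 || container_uid >= count )); then
    echo "UID $container_uid is not mapped for $name" >&2
    return 1
  fi
  echo $(( start + container_uid ))
}
```

On a host using `"userns-remap": "default"`, `host_uid_for "$(grep '^dockremap:' /etc/subuid)" 1000` shows which host UID must own a bind-mounted directory for the container's appuser to write to it.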
Production Insight
The daemon.json configuration is the most impactful security hardening step because it applies globally. userns-remap and no-new-privileges=true apply to every container, and icc=false applies to every container on the default bridge network (user-defined networks need the per-network com.docker.network.bridge.enable_icc=false option), all without modifying individual Dockerfiles. This is defense in depth at the infrastructure level, not the application level.
Key Takeaway
The Docker daemon is the control plane. Exposing it without TLS is equivalent to giving root access to anyone on the network. Enable TLS with client certificates, user namespace remapping, live-restore, and disable inter-container communication (icc=false). These daemon-level settings apply globally and are the highest-impact security hardening steps.

Rootless Mode — Stop Running Containers as Root (Even Inside the Container)

You've already told your team to never run containers as root inside the container. Good. But what about the container runtime itself? The Docker daemon still runs as root on the host. That means a container escape, or even a misconfigured volume mount, is a direct line to root on your host. Rootless mode fixes this. It runs the entire Docker daemon and containers under a user namespace, mapping the container's root (UID 0) to a non-root user on the host. If someone escapes the container, they get a user account, not root. This is the difference between 'oops, logs are readable' and 'oops, the attacker has your SSH keys'. It's not a silver bullet (some networking and storage drivers don't work in rootless mode), but for most microservices workloads, it's a free upgrade to your security posture. CPU overhead is negligible in practice, though the default user-mode networking stack can noticeably reduce network throughput, so benchmark network-heavy services before switching. The isolation gain is massive.
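After switching, the CLI has to be pointed at the per-user daemon. A minimal sketch, assuming the rootless daemon uses its default socket at `$XDG_RUNTIME_DIR/docker.sock` (the path the rootless setup tool tells you to export) and that `docker info --format '{{.SecurityOptions}}'` lists `name=rootless` on a rootless daemon; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: point the docker CLI at a rootless daemon and confirm it reports rootless mode.

# Rootless dockerd listens on a per-user socket under XDG_RUNTIME_DIR
# (usually /run/user/<uid>); fall back to that path if the variable is unset.
rootless_docker_host() {
  printf 'unix://%s/docker.sock\n' "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
}

# Succeeds when a SecurityOptions string (from docker info) lists rootless mode.
reports_rootless() {
  case "$1" in
    *name=rootless*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `export DOCKER_HOST="$(rootless_docker_host)"`, then `reports_rootless "$(docker info --format '{{.SecurityOptions}}')" && echo "rootless daemon"`.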

RootlessCheck.sh · Shell
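# io.thecodeforge — devops tutorial

# Verify rootless mode is active on a Docker host
$ docker info --format '{{.SecurityOptions}}'

# Expected output if rootless is enabled. The seccomp profile shows as
# "builtin" on current releases ("default" on older ones), and extra
# entries such as name=cgroupns may also appear:
[name=seccomp,profile=builtin name=rootless]

# If 'name=rootless' is missing, the daemon is running as root.
# Set up the rootless daemon as your unprivileged user (not with sudo):
$ dockerd-rootless-setuptool.sh install
$ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock

# After setup, confirm the daemon runs under your user:
$ ps aux | grep '[d]ockerd-rootless'
rootless   1234  ...  /usr/bin/dockerd-rootless.sh
Output
[name=seccomp,profile=builtin name=rootless]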
Production Trap:
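Rootless mode has real limits: --privileged only grants the privileges your unprivileged user already holds (good riddance), and features such as AppArmor and overlay networks are not supported. Test your networking stack, including any ports below 1024 you publish, before switching. And avoid --userns=host workarounds in a rootless setup; they strip away a layer of the isolation you switched for.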
Key Takeaway
Rootless mode turns container root into host non-root. A container escape becomes a local privilege escalation, not a free pass to your host.
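To check rootless mode from a script instead of reading `docker info` by eye, ask for the JSON form: `docker info --format '{{json .SecurityOptions}}'` returns entries such as `"name=seccomp,profile=builtin"` and `"name=rootless"`. The Python sketch below parses that list. It assumes only the `name=<option>,key=value` entry format shown above, nothing else about the daemon.

```python
import json


def parse_security_options(raw):
    """Turn SecurityOptions JSON into {"seccomp": {"profile": "builtin"}, "rootless": {}, ...}."""
    options = {}
    for entry in json.loads(raw):
        fields = {}
        for part in entry.split(","):
            key, _, value = part.partition("=")
            fields[key] = value
        # The "name" field identifies the option; the remaining pairs are its settings.
        options[fields.pop("name", entry)] = fields
    return options


def is_rootless(raw):
    """True when the daemon reports the rootless security option."""
    return "rootless" in parse_security_options(raw)
```

In a CI health check, feed it `subprocess.run(["docker", "info", "--format", "{{json .SecurityOptions}}"], capture_output=True, text=True, check=True).stdout` and fail the job when `is_rootless` returns False.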

Read-Only Root Filesystems — Your Containers Don't Need to Write to /usr

Every time a container writes to its own root filesystem, you're creating a potential persistence vector for an attacker. Ransomware? Writes to disk. Crypto miners? Writes to disk. Command & control scripts? Writes to disk. The fix is stupidly simple: make the root filesystem read-only. Your application only needs to write to specific mount points — logs, caches, uploads. Mount those as writable tmpfs or volumes. Everything else stays immutable. In production, this also means you catch config issues early: if a container tries to write to /usr/share/nginx/html (should be a volume), it fails immediately during CI, not during a pentest. Enable it with --read-only in Docker or readOnlyRootFilesystem: true in Kubernetes. Pair it with an explicit tmpfs mount for /tmp and /run and you've just made your containers read-only in practice, not just theory.

ReadOnlyDeployment.yml · YAML
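# io.thecodeforge — devops tutorial

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  selector:                     # required for apps/v1 Deployments
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api       # must match spec.selector
    spec:
      containers:
      - name: payments
        image: payments-api:2.4.1
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true    # the image needs a numeric non-root USER, or set runAsUser
          allowPrivilegeEscalation: false
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: run
          mountPath: /var/run
      volumes:
      - name: tmp
        emptyDir:
          medium: Memory        # tmpfs-backed scratch space
          sizeLimit: 64Mi
      - name: run
        emptyDir:
          medium: Memory
          sizeLimit: 16Mi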
Output
kubectl apply -f ReadOnlyDeployment.yml
# Pod comes up, rootfs is read-only.
# Try writing: touch /opt/app/foo -> fails with 'Read-only file system'
Senior Shortcut:
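Enforce readOnlyRootFilesystem: true with an admission policy (Kyverno or OPA Gatekeeper); the built-in Pod Security Standards do not check it, even at the restricted level. It catches the 'but it works on my machine' bugs that only show up when the writable layer fills the node's disk.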
Key Takeaway
Read-only root filesystems are free immutable infrastructure for containers. The only things that need write access are explicitly declared volumes.
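The same check works outside Kubernetes. `docker inspect` reports the flag as `HostConfig.ReadonlyRootfs` and any `--tmpfs` mounts under `HostConfig.Tmpfs`. The Python sketch below reads that JSON and lists containers that still have a writable root filesystem. It assumes you pass it the array printed by `docker inspect <container>...`.

```python
import json


def readonly_report(inspect_json):
    """Map container name -> (read-only rootfs?, sorted tmpfs mount points)."""
    report = {}
    for container in json.loads(inspect_json):
        host_config = container.get("HostConfig") or {}
        tmpfs = host_config.get("Tmpfs") or {}          # null when no --tmpfs was given
        name = container["Name"].lstrip("/")            # docker inspect prefixes names with "/"
        report[name] = (bool(host_config.get("ReadonlyRootfs")), sorted(tmpfs))
    return report


def writable_rootfs(inspect_json):
    """Names of containers that still have a writable root filesystem."""
    return sorted(
        name for name, (read_only, _) in readonly_report(inspect_json).items()
        if not read_only
    )
```

Anything `writable_rootfs` returns is a candidate for `--read-only` plus explicit tmpfs mounts.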

Cgroups — The Resource Guardrails Nobody Audits

Control groups (cgroups) are the Linux kernel feature that Docker uses to limit CPU, memory, process count, and I/O per container. Most teams set a memory limit and call it done. That's like locking your front door but leaving the windows open. Without cgroup constraints, a runaway container can starve the host, causing OOM kills in neighbouring containers or even getting critical system processes picked off by the kernel's OOM killer. More insidious: cryptominers often run in bursts to stay under CPU alerting thresholds, and only a CPU limit caps how much they can take. Set --memory, --cpus, and --pids-limit on every container. The --pids-limit flag stops fork bombs, whether they come from a bug or from a malicious payload baked into an image layer. In Kubernetes, map memory and CPU to resources.limits and resources.requests; PID limits are set per node through the kubelet's podPidsLimit setting, not per container. And for god's sake, keep cgroupfs read-only inside the container. If a compromised container can write to cgroupfs, it can escape the cgroup constraints entirely. Docker mounts it read-only for unprivileged containers by default, but check your custom runtime configs and anything running with --privileged.

CgroupLimits.yml · Shell + YAML
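# io.thecodeforge — devops tutorial

# Docker run with cgroup guards
$ docker run -d \
  --memory=512m \
  --memory-swap=512m \
  --cpus=0.5 \
  --pids-limit=100 \
  nginx:alpine
# Docker applies its built-in seccomp profile automatically; add
# --security-opt seccomp=/path/to/profile.json only for a custom profile.

# Kubernetes equivalent:
apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
  - name: api
    image: api-server:1.0   # hypothetical image name
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "500m"
# There is no per-container PID resource in Kubernetes; cap PIDs per pod
# with podPidsLimit in the kubelet configuration.
Output
$ docker stats --no-stream --format 'table {{.ID}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.PIDs}}'
CONTAINER ID   CPU %     MEM USAGE / LIMIT   PIDS
abc123def456   23.40%    187MiB / 512MiB     67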
Production Trap:
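Set --memory-swap to the same value as --memory to stop the container from swapping. If you set --memory alone and the host has swap enabled, Docker lets the container use as much swap again as its memory limit, so --memory=512m effectively becomes 1 GiB of memory plus swap, and the swapping degrades neighbour latency under pressure.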
Key Takeaway
Cgroups are the difference between a noisy neighbour and a full-blown resource exhaustion DoS. Set CPU, memory, and PID limits on every container. Audit them quarterly.
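Auditing these limits every quarter is easier with a script. In `docker inspect` output, the flags above land in `HostConfig` as `Memory` and `MemorySwap` (bytes), `NanoCpus` (or `CpuQuota` when set via `--cpu-quota`), and `PidsLimit`. The Python sketch below flags containers missing any of them. It treats 0, -1, and null as "unlimited", and its swap check simply encodes this section's advice to set `--memory-swap` equal to `--memory`.

```python
import json


def missing_limits(host_config):
    """Findings for one container's HostConfig block from `docker inspect`."""
    findings = []
    memory = host_config.get("Memory") or 0            # bytes; 0 means unlimited
    if memory <= 0:
        findings.append("no memory limit (--memory)")
    elif (host_config.get("MemorySwap") or 0) != memory:
        # Equal values mean "no swap on top of the memory limit".
        findings.append("swap not disabled (--memory-swap should equal --memory)")
    if not (host_config.get("NanoCpus") or host_config.get("CpuQuota")):
        findings.append("no CPU limit (--cpus)")
    if (host_config.get("PidsLimit") or 0) <= 0:      # null, 0 and -1 all mean unlimited
        findings.append("no PID limit (--pids-limit)")
    return findings


def audit_limits(inspect_json):
    """Map container name -> findings, skipping containers with none."""
    report = {}
    for container in json.loads(inspect_json):
        findings = missing_limits(container.get("HostConfig") or {})
        if findings:
            report[container["Name"].lstrip("/")] = findings
    return report
```

Pipe in `docker inspect $(docker ps -q)` and review anything `audit_limits` returns.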

Monitor Containers Like They're Hostile — Real-Time Threat Detection

Why: Attackers exploit silent drift: a container pulling malicious libraries or establishing unexpected outbound connections. Standard Docker logs miss kernel-level anomalies. How: Deploy Falco, the CNCF runtime security tool, as a DaemonSet or systemd service. Falco hooks syscalls (via eBPF or a kernel module) and triggers alerts on rule violations: reverse shells, privilege escalation, or unexpected file writes under /etc. Pair it with a log aggregator (Loki, Elastic) and set up alerting on critical rules. Example: a process in a container reading /etc/shadow trips the default "Read sensitive file untrusted" rule, which warrants immediate investigation. Validate custom rules locally with falco -V <rules-file> before shipping them to production. Keep rules updated: the Falco community rulesets flag cryptominer patterns and Kubernetes pod escapes. Do not rely solely on logs; syscall-level monitoring catches what docker logs misses. One misconfigured volume plus no runtime detection can mean a breach that goes unnoticed for months.

docker-falco-rules.yml · YAML
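# io.thecodeforge — devops tutorial

# Falco custom rule: alert on outbound SSH from inside a container
- rule: Outbound SSH Connection
  desc: Detect an ssh client inside a container opening an outbound connection
  # Match the connect() syscall itself. The spawned_process macro matches
  # execve events, so combining it with evt.type = connect could never fire.
  condition: >
    evt.type = connect and evt.dir = < and
    container.id != host and
    proc.name = ssh and
    fd.type = ipv4
  output: >
    SSH connection from container (user=%user.name command=%proc.cmdline
    container_id=%container.id container=%container.name dest=%fd.name)
  priority: WARNING
  tags: [network, mitre_exfiltration]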
Production Trap:
Falco's default rules can flood your SIEM in high-traffic clusters. Tune rule priorities, add exceptions for known-noisy workloads, and throttle repeated alerts per container in your alerting pipeline.
Key Takeaway
Always combine syscall-level monitoring with traditional logs — attackers bypass Docker logs, not kernel traces.
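One way to throttle repeated alerts is a small filter between Falco and your SIEM. The Python sketch below makes two assumptions. First, Falco runs with `json_output: true`, so it emits one JSON alert per line. Second, your rule's output string includes `%container.id`, so the ID appears in `output_fields`. The filter forwards the first alert for each (rule, container) pair and drops repeats inside a time window. The 300-second default is an arbitrary example.

```python
import json


class AlertThrottle:
    """Forward at most one Falco alert per (rule, container) within a time window."""

    def __init__(self, window_seconds=300.0):
        self.window = window_seconds
        self._last_forwarded = {}   # (rule, container id) -> timestamp of last forwarded alert

    def should_forward(self, line, now):
        alert = json.loads(line)
        # Alerts without a container ID (host-level events) are grouped under "host".
        container_id = (alert.get("output_fields") or {}).get("container.id", "host")
        key = (alert.get("rule", "unknown"), container_id)
        last = self._last_forwarded.get(key)
        if last is not None and now - last < self.window:
            return False            # duplicate inside the window: drop it
        self._last_forwarded[key] = now
        return True
```

Wrap it in a loop that reads Falco's output line by line, calls `should_forward(line, time.monotonic())`, and sends only the alerts that return True. Passing `now` in explicitly keeps the class easy to test.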

Two-Factor Authentication for Docker Hub — Your Registry Is a Supply Chain Target

Why: Compromised Docker Hub accounts push malicious images that propagate to thousands of downstream users. Static passwords are the weakest link. How: Enable two-factor authentication (2FA) under Docker Hub Settings > Security, using an authenticator app (Google Authenticator, Authy). For teams, enforce 2FA for every organization member if your Docker plan supports it, rather than relying on individual opt-in. This blocks attackers even if a user's password leaks in a breach. After enabling 2FA, generate a personal access token for CLI logins; never use your password. Update CI/CD pipelines to use tokens with scoped permissions (read-only for pull, read-write only when pushing). Give tokens an expiration date and rotate them monthly. Audit member 2FA status via the Docker Hub API: GET /v2/orgs/{org}/members is expected to expose a two_factor_authentication field per member (verify the field against Docker's current API reference). Remove members who disable it. Without 2FA, a single phished credential can backdoor your entire image supply chain.

docker-login-token.yml · YAML
# io.thecodeforge — devops tutorial

# GitHub Actions step: log in to Docker Hub with a scoped access token
- name: Login to Docker Hub with token
  uses: docker/login-action@v3
  with:
    username: ${{ secrets.DOCKER_USER }}
    password: ${{ secrets.DOCKER_TOKEN }}   # personal access token, never the account password
# Token scope: Read-only for pull-only jobs; Read & Write only for jobs that push.
# Set an expiration (e.g. 30 days) when creating the token and rotate it before it lapses.
Production Trap:
Personal access tokens bypass 2FA: if one leaks, an attacker gets everything the token's scope allows, with no second-factor challenge. Store tokens in a vault or your CI's secret store, never in source code.
Key Takeaway
Enforce 2FA at the organization level — individual user opt-in leaves supply chain gaps.
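The member audit reduces to a filter over the API response. The Python sketch below works on member records you have already fetched. The `two_factor_authentication` field comes from the text above. The `username` key, and the idea that you have a flat list of member objects, are assumptions; check both against Docker Hub's API reference before relying on this.

```python
def members_without_2fa(members):
    """Return sorted usernames whose 2FA flag is not exactly True.

    Assumption: each member is a dict with a "username" key and the
    "two_factor_authentication" field; a missing flag counts as "no 2FA".
    """
    return sorted(
        member.get("username", "<unknown>")
        for member in members
        if member.get("two_factor_authentication") is not True
    )
```

Treating a missing field as a failure is deliberate: if the API shape changes, the audit fails loudly instead of silently passing everyone.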

macOS Installation — Not Just Docker Desktop

macOS developers often default to Docker Desktop, but this choice has security and licensing implications. Docker Desktop runs a Linux VM under the hood, which means container isolation depends on that VM's integrity. For production-like environments, consider using Colima or Lima with the Docker CLI instead. These tools provide a lightweight, open-source VM and avoid Docker Desktop's proprietary licensing terms. When installing on macOS, always verify the checksum of the downloaded binary against the official Docker or Colima releases. Never install using curl | sh scripts without validation. Where your setup supports it, run the daemon inside the VM in rootless mode so it does not run with elevated privileges. This matters because macOS cannot run Linux containers natively, so the VM's security boundary is your last line of defense. Use docker context to switch between local and remote Docker hosts, keeping your laptop from becoming a production attack vector.

docker-macos-setup.sh · Shell
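# io.thecodeforge — devops tutorial

# Install Colima and the Docker CLI with Homebrew (no Docker Desktop needed)
$ brew install colima docker

# Start a Linux VM: 4 CPUs, 4 GiB RAM, 50 GiB disk, using Apple's
# Virtualization.framework (--vm-type vz requires macOS 13 or later)
$ colima start --cpu 4 --memory 4 --disk 50 --vm-type vz

# Point the Docker CLI at the Colima VM and confirm
$ docker context use colima
$ docker context ls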
Output
colima start completed; docker context points to colima VM
Production Trap:
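Docker Desktop's Linux VM (HyperKit on older releases, Apple's Virtualization framework on newer ones) runs its own kernel, but it shares host directories into the VM so bind mounts work. A container escape into the VM can reach every directory you share, so limit Docker Desktop's file-sharing list to the project directories you actually need, and review Colima's mounts the same way.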
Key Takeaway
Never run Docker Desktop in production-adjacent workflows; Colima provides auditable, minimal VM security.
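Checksum verification is easy to skip when done by hand, so script it. The Python sketch below hashes a downloaded file with SHA-256 and compares it to the checksum published alongside the release. Where you get that checksum is up to you, and nothing here downloads anything.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in 1 MiB chunks so large binaries stay cheap."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """True only if the file matches the published checksum (case and whitespace tolerant)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Run `verify_download("path/to/downloaded-binary", "<published sha256>")` before moving the binary onto your PATH, and stop the install if it returns False.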

Managing Multi-Container Applications with Docker Compose

Docker Compose simplifies multi-container orchestration, but misconfigurations here create broad attack surfaces. Always define explicit networks instead of relying on the project's default network. Use internal: true on networks whose services should have no route to the outside world. Define read-only root filesystems for each service that doesn't require persistent writes, and set deploy.resources.limits to prevent one compromised container from starving others. Never use depends_on alone: in its short form it only controls startup order, not health. Combine it with condition: service_healthy and a healthcheck on the dependency to enforce that dependent services are actually responsive. For secrets in Compose, use the top-level secrets element with file-based or external secrets. Avoid passing secrets as environment variables: they show up in docker inspect output and /proc/<pid>/environ, and they tend to leak into logs. Pin images to specific digests (image: myapp@sha256:...) rather than mutable tags. This prevents drift when upstream images are rebuilt with new vulnerabilities.

docker-compose-secure.yml · YAML
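# io.thecodeforge — devops tutorial
# The top-level "version" key is obsolete in Compose v2 and is omitted.
services:
  app:
    # Placeholder digest: replace with your image's full 64-hex-character sha256
    image: myapp@sha256:abc123def456
    read_only: true
    tmpfs:
      - /tmp
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    networks:
      - internal
    depends_on:
      db:
        condition: service_healthy   # wait for the healthcheck, not just startup
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
  db:
    image: postgres:15@sha256:7890   # placeholder digest, as above
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    networks:
      - internal
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
networks:
  internal:
    internal: true
secrets:
  db_password:
    file: ./secrets/db_password.txt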
Output
app starts only after db's healthcheck passes; the internal network gives neither service a route to external networks.
Production Trap:
Compose 'depends_on' without health checks lets containers start before their dependencies are ready, which shows up as connection errors at startup and can cascade into wider failures.
Key Takeaway
Treat Compose files as infrastructure-as-code; each service must have explicit network, resource, and health constraints.
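Digest pinning and health-gated `depends_on` are easy to lose in review, so they are worth linting in CI. The Python sketch below takes a parsed Compose file, for example the output of `docker compose config --format json`, which normalizes `depends_on` to the long form. It reports images not pinned to a full `sha256` digest and dependencies that don't wait on `service_healthy`. It covers only those two rules; networks, limits, and secrets still need review.

```python
import re

# A pinned reference ends in @sha256: followed by exactly 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def lint_compose(config):
    """Findings for a parsed Compose file (a dict with a "services" mapping)."""
    findings = []
    services = config.get("services") or {}
    for name, service in sorted(services.items()):
        image = service.get("image", "")
        if image and not DIGEST_RE.search(image):
            findings.append(f"{name}: image '{image}' is not pinned to a sha256 digest")
        depends_on = service.get("depends_on") or {}
        if isinstance(depends_on, list):              # short syntax: no condition at all
            depends_on = {dep: {} for dep in depends_on}
        for dep, options in depends_on.items():
            condition = (options or {}).get("condition", "service_started")
            if condition != "service_healthy":
                findings.append(f"{name}: depends_on {dep} uses '{condition}', not service_healthy")
            elif "healthcheck" not in (services.get(dep) or {}):
                findings.append(f"{name}: {dep} has no healthcheck to wait for")
    return findings
```

Note that placeholder digests such as `myapp@sha256:abc123def456` fail the check on purpose, so a template can't slip into production unedited.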

Best Practices for Maintaining and Securing Containerized Applications

Containerized applications require proactive maintenance beyond initial hardening. Implement multi-stage builds to minimize final image size and reduce the attack surface: each stage isolates build tools from runtime dependencies. Optimize layer caching by ordering Dockerfile instructions from least to most frequently changing. Install only production dependencies in the final stage. Use .dockerignore to exclude secrets, .git, and node_modules from the build context, preventing them from leaking into layers. Set HEALTHCHECK on every service so Docker can tell a broken container from one that is merely running; plain Docker only marks it unhealthy, Swarm replaces it, and Kubernetes ignores HEALTHCHECK in favor of its own probes. For image integrity, enable Docker Content Trust (DOCKER_CONTENT_TRUST=1) to sign and verify image tags. Schedule weekly image rescans against CVE databases. Rotate secrets every 90 days using external vaults like HashiCorp Vault, never environment files. Finally, pick up operating system patches by rebuilding from updated base images at least monthly: Alpine's musl libc and Ubuntu's glibc both get patch updates that close privilege escalation vectors.

Dockerfile.multi-stage · Dockerfile
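# io.thecodeforge — devops tutorial
# Build stage: full Go toolchain, never shipped
FROM golang:1.21 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Static binary so it can run on an empty (scratch) base image
RUN CGO_ENABLED=0 go build -o app .

# Runtime stage: only the binary and CA certificates
FROM scratch
COPY --from=builder /src/app /app
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# 65534 is "nobody"; scratch has no /etc/passwd, so use numeric IDs
USER 65534:65534
# Assumes the binary implements a "health" subcommand that exits non-zero on failure
HEALTHCHECK --interval=30s --timeout=3s CMD ["/app", "health"]
LABEL maintainer="devops@thecodeforge.io" security-check="2024-01-15"
ENTRYPOINT ["/app"]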
Output
Final image size: ~12MB; only the Go binary and CA certs; HEALTHCHECK runs /app health every 30s
Production Trap:
Skipping .dockerignore sends .git folders and local secrets into the build context, and COPY . . then writes them into an image layer. Deleting them in a later instruction does not remove them from that earlier layer.
Key Takeaway
Treat maintenance as continuous: rescan images weekly, rebuild them at least monthly, rotate secrets, and include HEALTHCHECK so Swarm or your monitoring can detect and replace broken containers.
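A cheap CI guard for the USER and HEALTHCHECK advice is to look at the last stage of the Dockerfile, since only that stage ships. The Python sketch below is a line-based approximation, not a real Dockerfile parser. It ignores line continuations, ARG substitution, and anything inherited from the base image, so treat its output as a hint.

```python
def lint_dockerfile(text):
    """Check the final build stage for a non-root USER and an active HEALTHCHECK."""
    final_stage = []                      # (INSTRUCTION, arguments) of the last stage
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        arguments = parts[1].strip() if len(parts) > 1 else ""
        if instruction == "FROM":
            final_stage = []              # a new stage starts; only the last one ships
        final_stage.append((instruction, arguments))

    findings = []
    users = [args for instr, args in final_stage if instr == "USER"]
    if not users:
        findings.append("final stage has no USER instruction (runs as root)")
    elif users[-1].split(":", 1)[0] in ("root", "0"):
        findings.append(f"final stage runs as '{users[-1]}'")
    checks = [args for instr, args in final_stage if instr == "HEALTHCHECK"]
    if not checks or checks[-1].upper() == "NONE":
        findings.append("final stage has no HEALTHCHECK")
    return findings
```

Resetting at every `FROM` is the important design choice: a `USER` in the builder stage says nothing about what the runtime image runs as.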
● Production incident · POST-MORTEM · severity: high

Cryptominer Deployed via Exposed Docker Daemon Socket — Full Host Compromise in 3 Minutes

Symptom
Host CPU utilization spiked to 100% overnight. The ops team noticed the spike during morning standup. Initial investigation showed a container named 'system-monitor' running at 98% CPU. The team did not recognize this container — it was not in their docker-compose.yml or deployment manifests.
Assumption
Team assumed a runaway process in one of their application containers. They ran docker stats and identified the 'system-monitor' container. They stopped it, but a new container named 'health-checker' appeared within seconds. They stopped that one too, and a third appeared. The team assumed a Docker daemon bug causing ghost containers.
Root cause
A monitoring container (Prometheus node-exporter) was deployed with -v /var/run/docker.sock:/var/run/docker.sock to enable Docker metrics collection. The node-exporter had an unpatched RCE vulnerability (CVE-2024-XXXX). The attacker exploited the RCE to gain shell access inside the container, then used the mounted Docker socket to create new containers with --privileged and --pid=host. The privileged container mounted the host filesystem at /host, installed a cryptominer, added an SSH key to /host/root/.ssh/authorized_keys, and modified /host/etc/crontab for persistence. The 'system-monitor' container was the attacker's cryptominer.
Fix
1. Removed the Docker socket mount from the monitoring container and switched to cAdvisor (scraped by Prometheus), which can read container metrics from cgroups without socket access.
2. Rebuilt all containers with USER nonroot.
3. Enabled Docker daemon TLS with client certificate authentication.
4. Added a Falco runtime security rule that alerts on any new container creation outside of the CI/CD pipeline.
5. Rotated all SSH keys on the compromised host.
6. Implemented Pod Security Standards (restricted) in Kubernetes to prevent privileged containers.
Key lesson
  • Never mount /var/run/docker.sock into a container unless absolutely necessary. If you must, use a socket proxy that restricts the API calls the container can make.
  • A container with Docker socket access is equivalent to root access on the host. Treat it with the same security rigor as SSH access.
  • Cryptomining is one of the most common payloads for Docker socket exploitation: the attacker wants compute, not data.
  • Runtime security monitoring (Falco, Sysdig) detects anomalous container creation. Without it, the attack is invisible until the CPU spike is noticed.
  • Patch all containers, including monitoring and utility containers. They are part of your attack surface.
Production debug guide: from exposed sockets to root containers, systematic security audit paths
Symptom · 01
Docker daemon socket is accessible from inside a container.
Fix
Check if any container has /var/run/docker.sock mounted: docker inspect --format='{{.Mounts}}' <container>. If found, assess whether the container truly needs it. If yes, replace with a socket proxy (Tecnativa/docker-socket-proxy) that restricts API access. If no, remove the mount immediately.
Symptom · 02
Containers are running as root in production.
Fix
Audit all running containers: for c in $(docker ps -q); do docker inspect --format='{{.Name}} {{.Config.User}}' $c; done. Any container whose User field is empty, root, or 0 is running as root. Add USER instructions to Dockerfiles and rebuild.
Symptom · 03
Image has known CVEs that were not caught before deployment.
Fix
Scan the image with Trivy: trivy image <image-name>. Review critical and high CVEs. Check if the CVEs are in the base image (update the base) or in application dependencies (update the dependency). Integrate scanning into CI to prevent future deployments of vulnerable images.
Symptom · 04
Container has --privileged flag or excessive capabilities.
Fix
Audit all containers: docker inspect --format='{{.Name}} Privileged={{.HostConfig.Privileged}} CapAdd={{.HostConfig.CapAdd}}' $(docker ps -q). Remove --privileged. Drop all capabilities with --cap-drop=ALL and add back only what is needed with --cap-add.
Symptom · 05
Secrets found in image layers.
Fix
Inspect image history: docker history --no-trunc <image>. Search for secrets: docker save <image> | tar -xO 2>/dev/null | grep -a -i -E 'password|secret|key|token' (the -a flag makes grep print matching text from the binary layer tarballs instead of only reporting a match). Rotate exposed credentials immediately. Rebuild with .dockerignore excluding all secret files. Use BuildKit --mount=type=secret for build-time secrets.
Symptom · 06
Docker daemon is accessible from the network without TLS.
Fix
Check if the daemon is listening on a TCP port: ss -tlnp | grep 2375. If port 2375 is open, the daemon is exposed without authentication. Immediately restrict access via firewall. Configure TLS with client certificates on port 2376. Never expose port 2375 to the public internet.
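Symptoms 01, 02, and 04 can be checked in one pass over `docker inspect` JSON instead of three separate format strings. The Python sketch below flags four things: Docker socket bind mounts (matched on the mount's `Source`), an empty, `root`, or `0` user, `--privileged`, and any added capabilities. It assumes the standard `docker inspect` field names (`Mounts`, `Config.User`, `HostConfig.Privileged`, `HostConfig.CapAdd`) and only reads data you have already collected.

```python
import json

# Both paths are common: /var/run is a symlink to /run on most distributions.
SOCKET_PATHS = {"/var/run/docker.sock", "/run/docker.sock"}


def audit_container(container):
    """Findings for one object from the `docker inspect` JSON array."""
    findings = []
    for mount in container.get("Mounts") or []:
        if mount.get("Source") in SOCKET_PATHS:
            findings.append(f"Docker socket mounted at {mount.get('Destination')}")
    user = (container.get("Config") or {}).get("User") or ""
    if user.split(":", 1)[0] in ("", "root", "0"):
        findings.append("runs as root")
    host_config = container.get("HostConfig") or {}
    if host_config.get("Privileged"):
        findings.append("runs with --privileged")
    if host_config.get("CapAdd"):
        findings.append("adds capabilities: " + ", ".join(host_config["CapAdd"]))
    return findings


def audit_all(inspect_json):
    """Map container name -> findings, omitting containers with none."""
    report = {}
    for container in json.loads(inspect_json):
        findings = audit_container(container)
        if findings:
            report[container["Name"].lstrip("/")] = findings
    return report
```

Pipe in `docker inspect $(docker ps -q)` and alert on any non-empty result; a socket mount finding alone is enough to start the post-mortem playbook above.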
★ Docker Security Triage Cheat Sheet: first-response commands when a Docker security issue is suspected.
Suspected container breakout or host compromise.
Immediate action
Check for unauthorized containers and socket access.
Commands
docker ps -a --format '{{.Names}} {{.Image}} {{.CreatedAt}}' | sort -k3
docker inspect --format='{{.Name}} Mounts={{.Mounts}}' $(docker ps -q) | grep docker.sock
Fix now
Stop unauthorized containers. If docker.sock is mounted in an unexpected container, treat as a breach — rotate credentials, audit host.
Container running as root in production.
Immediate action
Audit user configuration across all containers.
Commands
for c in $(docker ps -q); do docker inspect --format='{{.Name}} user={{.Config.User}}' $c; done
docker inspect --format='{{.Name}} Privileged={{.HostConfig.Privileged}} CapAdd={{.HostConfig.CapAdd}}' $(docker ps -q)
Fix now
Add a USER instruction to the Dockerfile. Rebuild and redeploy. Drop all capabilities with --cap-drop=ALL and add back only what the service needs, e.g. --cap-add=NET_BIND_SERVICE if it binds a port below 1024.
Image with known vulnerabilities deployed to production.
Immediate action
Scan the deployed image for CVEs.
Commands
trivy image --severity CRITICAL,HIGH <image>
docker history --no-trunc <image> | grep -i 'secret\|password\|key'
Fix now
If critical CVEs exist, patch the base image or dependency. If secrets found, rotate credentials immediately. Add trivy to CI pipeline.
Docker daemon exposed to the network without TLS.
Immediate action
Check if daemon is listening on TCP port 2375.
Commands
ss -tlnp | grep -E '2375|2376'
curl -s http://<host>:2375/version
Fix now
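If port 2375 responds, block it with iptables immediately. Configure TLS on port 2376. To disable TCP entirely, set "hosts": ["unix:///var/run/docker.sock"] in daemon.json; on systemd hosts, also remove the -H flag from the docker.service ExecStart line (via a drop-in), because dockerd refuses to start when hosts are set both as a flag and in daemon.json.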
Container has writable root filesystem in production.
Immediate action
Check if containers are using --read-only flag.
Commands
docker inspect --format='{{.Name}} ReadOnly={{.HostConfig.ReadonlyRootfs}}' $(docker ps -q)
docker inspect --format='{{.Name}} Tmpfs={{.HostConfig.Tmpfs}}' $(docker ps -q)
Fix now
Add --read-only to container run command. Mount tmpfs for writable paths: --tmpfs /tmp:size=64m --tmpfs /var/run:size=16m.
Docker daemon configuration is insecure.
Immediate action
Audit daemon.json for dangerous settings.
Commands
cat /etc/docker/daemon.json
docker info --format '{{json .SecurityOptions}}'
Fix now
Ensure daemon.json sets "userland-proxy": false, "icc": false, "no-new-privileges": true, "live-restore": true, and "userns-remap" (for example "default"). Remove any insecure-registries entries for production hosts.
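The `ss` and `curl` checks for an exposed daemon can also be scripted from a second machine, which tells you what the network actually sees. The Python sketch below does two things. It tries a TCP connection to port 2375, and it can send an unauthenticated `GET /version`, the same request as the curl command above. Only point it at hosts you are responsible for.

```python
import http.client
import json
import socket


def port_is_open(host, port=2375, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                       # refused, unreachable, or timed out
        return False


def docker_api_version(host, port=2375, timeout=2.0):
    """Return the daemon's Version string if unauthenticated GET /version works, else None."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/version")
        response = conn.getresponse()
        if response.status != 200:
            return None
        body = json.loads(response.read())
        return body.get("Version") if isinstance(body, dict) else None
    except (OSError, ValueError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

If `docker_api_version` returns anything at all, the daemon is answering API calls without authentication; treat it as the incident described above, not as a configuration nit.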
Docker Security Mechanisms Compared

| Mechanism | Layer | Overhead | What It Protects | Default State |
|---|---|---|---|---|
| Non-root user (USER) | Image/Runtime | None | Limits damage after container escape | Off (runs as root) |
| seccomp profile | Kernel | <1% per syscall | Blocks dangerous syscalls (mount, reboot) | On (default profile) |
| AppArmor / SELinux | Kernel | <1% per access check | Restricts file/network/capability access | On (docker-default on Ubuntu) |
| Read-only filesystem | Runtime | Can improve perf (fewer writes) | Prevents filesystem modification | Off (writable) |
| Capabilities (cap-drop) | Kernel | None | Limits kernel-level privileges | On (default set keeps 14 of ~41 capabilities) |
| no-new-privileges | Kernel | None | Prevents setuid/setgid escalation | Off |
| User namespace remapping | Daemon | Negligible | Maps container UID 0 to unprivileged host UID | Off |
| Image scanning (Trivy) | CI/CD | Build time only | Identifies known CVEs in image | Off (must be configured) |
| Image signing (cosign) | Supply chain | Push/pull time only | Verifies image integrity and provenance | Off (must be configured) |

Key takeaways

1. Docker containers run as root by default. Add USER nonroot to every production Dockerfile. Pair with --security-opt=no-new-privileges and --cap-drop=ALL.
2. The --privileged flag strips away nearly all container isolation: it grants every capability, lifts seccomp and AppArmor confinement, and exposes host devices. Never use it in production. Use --cap-drop=ALL and --cap-add for specific capabilities.
3. Never bake secrets into Docker image layers. Use BuildKit --mount=type=secret for build-time and secrets managers for runtime.
4. The Docker daemon socket (/var/run/docker.sock) is the most dangerous thing to expose. A container with socket access can take over the host.
5. Image scanning with Trivy, SBOM generation with Syft, and image signing with cosign form the core of a supply chain security pipeline.
6. Runtime hardening is layered: seccomp + AppArmor + read-only filesystem + capabilities. Each adds <1% overhead. There is no reason to skip them.
Frequently Asked Questions

1. What is the difference between seccomp and AppArmor?
2. Is Docker secure enough for production?
3. What happens if I use --privileged in production?
4. How do I scan my Docker images for vulnerabilities?
5. What is the safest way to pass secrets to a Docker container?