Senior 13 min · April 05, 2026

Docker Architecture Explained

Docker Daemon Bottleneck — 50 Concurrent Builds Crash

Q: What is the difference between containerd and dockerd?

dockerd (the Docker daemon) is the user-facing server that manages the Docker API, image building, networking, and volumes. containerd is the container runtime that manages the container lifecycle: pulling images, creating containers, and handling execution. dockerd delegates to containerd for container operations. containerd was split out of Docker and donated to the CNCF in 2017; it is now used independently by Kubernetes and other platforms.

Q: What is the OCI spec and why does it matter?

The OCI (Open Container Initiative) spec defines two standards: the image spec (how images are packaged as layers + manifest) and the runtime spec (how a container is described in config.json and then created). This standardization means any OCI-compliant runtime (runc, crun, kata-runtime, runsc) can run any OCI-compliant image. It enables runtime replaceability: you can swap runc for gVisor without changing Docker.

Q: Can I use containerd directly without dockerd?

Yes. containerd provides its own CLI (ctr) and API (gRPC). Kubernetes uses containerd directly via the CRI plugin, bypassing dockerd entirely. You can use ctr to pull images, create containers, and manage snapshots. This reduces overhead and removes dockerd as a single point of failure, although containerd is itself still a daemon that must stay healthy.

Q: Why is my docker build so slow at 'Sending build context'?

The build context is the directory you pass as the final argument to docker build (usually ., the current directory); the -f flag only selects the Dockerfile. Without a .dockerignore file, the context includes node_modules (500MB+), .git history (100MB+), and other large files. The CLI tars this directory and sends it to the daemon over the Unix socket. Create a .dockerignore file to exclude unnecessary files. This alone can reduce build time from minutes to seconds.

50 concurrent docker builds consumed 12GB RAM and froze the daemon for 20 minutes.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Drawn from code that ran under real load.

July 04, 2026

last updated

Before you start⏱ 25 min

✓Solid grasp of DevOps fundamentals
✓Comfortable with command-line tools
✓Basic Linux administration knowledge

⚡Quick Answer

docker build: CLI sends context to dockerd -> dockerd executes Dockerfile instructions -> each instruction creates a cached layer -> layers stored under /var/lib/docker/overlay2/
docker push: dockerd uploads layers to a registry -> registry stores layers by digest -> tags point to manifests
docker run: CLI sends API request to dockerd -> dockerd delegates to containerd -> containerd invokes runc -> runc configures namespaces/cgroups/filesystem -> exec starts the application
Docker CLI: HTTP client that talks to the daemon via Unix socket
dockerd (daemon): manages images, networks, volumes, and the REST API
containerd: manages container lifecycle, image pulling, and snapshot management
runc: creates containers by calling kernel syscalls (clone, pivot_root, exec)

✦ Definition~90s read

What is Docker Architecture?

Docker's architecture is a layered client-server model where the Docker CLI communicates with a central daemon (dockerd) that manages containers, images, and builds. The daemon delegates low-level container operations to containerd (a container runtime supervisor) and runc (the OCI-compliant container spawner), while the daemon itself handles the Docker API, image management, and storage orchestration.

This design centralizes control but creates a single point of failure: all CLI commands, image pulls, and concurrent builds funnel through one daemon process, which can saturate CPU, memory, or I/O under load (for example, 50 parallel builds contending for layer cache locks and filesystem operations). The OCI spec standardizes the image format and runtime behavior, allowing runc to be swapped for alternatives like crun, Kata Containers, or gVisor, but the daemon remains the bottleneck in Docker's default stack.

When you run docker build, the daemon parses the Dockerfile, executes each instruction in a temporary container via containerd, commits layers as overlay2 diffs, and caches them in /var/lib/docker. The storage stack uses overlay2 for copy-on-write union mounts, where each layer is a read-only filesystem diff, and a container's writable layer sits on top.

Volumes bypass this stack by mounting host directories directly, avoiding the performance overhead of the layered filesystem. For distribution, the daemon interacts with a registry (e.g., Docker Hub) using the OCI Distribution Spec: it fetches a manifest listing layer digests (SHA256 hashes), deduplicates layers already cached locally, and streams missing layers as compressed tarballs.

This architecture works well for single-host workflows but degrades under concurrent load because every build and pull contends for the same daemon, layer store, and registry connections, making it a poor fit for high-scale CI/CD without external orchestration or daemon-per-build patterns.

Plain-English First

Think of Docker architecture as a shipping company. The Docker CLI is the customer placing an order. The daemon (dockerd) is the dispatch center that receives orders and coordinates everything. containerd is the warehouse manager that tracks inventory (images) and assembles shipments (containers). runc is the forklift operator who physically moves items into the shipping container. The kernel is the warehouse floor — the physical space where everything is built. Each layer has a specific job, and the handoff between layers is standardized (the OCI spec) so you can swap one forklift brand (runc) for another (crun, gVisor) without redesigning the warehouse.

Most Docker documentation treats the architecture as a black box — run a command, get a container. This abstraction breaks down when containers fail to start, images pull slowly, or the daemon crashes under load. Understanding the component stack and the data flow between components is essential for production debugging.

Docker is not one program. It is a chain of specialized components: the CLI sends API requests to the daemon, the daemon delegates to containerd, containerd invokes runc, and runc configures the Linux kernel to create an isolated process. Each handoff is a potential failure point. The OCI (Open Container Initiative) spec standardizes the interface between containerd and runc, enabling runtime replaceability.

This article traces the complete end-to-end flow: what happens when you run docker build, how images are stored and distributed, what happens when you run docker run, how networking and storage are wired, and where each component lives on the filesystem. Every section includes production failure scenarios and debugging commands.

Why Docker Daemon Architecture Becomes a Single Point of Failure

Docker uses a client-server architecture where the Docker daemon (dockerd) is the central orchestrator. It manages images, containers, networks, and volumes via a REST API. The daemon is a single process: it serves requests concurrently in goroutines, but all of them share one state store, one image cache, and one storage driver. This means every docker build, run, or pull command goes through the same bottleneck. Under load, especially with concurrent builds, the daemon's filesystem operations (layer management, image extraction) and the locks that guard shared state become the limiting factor. The daemon processes each build step sequentially for a given image, but across builds it must synchronize access to shared resources like the image cache and storage driver. With 50 concurrent builds, lock contention on the layer store and the overlay filesystem makes latency climb steeply. The result: builds time out, the daemon runs out of file descriptors, or the entire host becomes unresponsive. This architecture works for low concurrency but fails when you treat Docker as a CI build orchestrator without understanding its single-process limits.

Docker Is Not a Build Orchestrator

Docker daemon was designed for interactive use, not for 50 concurrent builds. Each build adds its own share of daemon memory, file descriptors, and I/O, so plan for 1-2 concurrent builds per host and treat 4 as a hard ceiling.

Production Insight

Teams running 50+ concurrent builds on a single Docker host see the daemon's goroutine count spike to 10k+ and its file descriptors run out, causing 'connection refused' errors.

The exact symptom: builds hang at 'Sending build context to Docker daemon' or fail with 'Cannot connect to the Docker daemon at unix:///var/run/docker.sock' — the daemon has stopped accepting new connections.

Rule of thumb: never exceed 4 concurrent builds per Docker host; use build farms or remote builders with separate daemon instances for CI pipelines.
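
One way to enforce that cap from a CI script is to push the build directories through xargs with a fixed parallelism. This is a minimal sketch, not a recommended pipeline: the helper name build_capped and the services/*/ layout in the usage line are hypothetical, and the cap only applies to the one script that uses it.

```bash
#!/bin/bash
# build_capped MAX CMD...: read one build directory per line from stdin and
# run "CMD... <dir>" for each, never more than MAX at a time.
# (Hypothetical helper; the name and calling convention are illustrative.)
build_capped() {
  local max=$1
  shift
  # -P caps the number of parallel jobs; -I{} substitutes each input line.
  xargs -P "$max" -I{} "$@" {}
}
```

A call such as `printf '%s\n' services/*/ | build_capped 4 docker build -q` would then keep at most four builds in flight against the local daemon.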

Key Takeaway

Docker daemon is a single-process bottleneck — concurrent builds compete for the same image cache, storage driver, and network resources.

50 concurrent builds can crash the daemon through file descriptor exhaustion, lock contention, and memory pressure. The cause is not build logic but architectural limits.

Mitigate by using Docker-in-Docker (dind) per build, remote builders, or standalone BuildKit (buildkitd) instances so that builds do not share a single daemon.
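
File descriptor exhaustion can be watched for before it becomes an outage. The sketch below reads the open-descriptor count and the soft limit for any process from /proc; it assumes a Linux host, and the function name fd_usage is illustrative.

```bash
#!/bin/bash
# fd_usage PID: print "<open_fds> <soft_limit>" for a process, read from /proc.
# Inspecting another user's process (such as dockerd) requires root.
fd_usage() {
  local pid=$1 open limit
  # Each entry in /proc/<pid>/fd is one open descriptor.
  open=$(ls "/proc/$pid/fd" | wc -l)
  # Column 4 of the "Max open files" row is the soft limit.
  limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
  echo "$open $limit"
}
```

Run as root, `fd_usage "$(pidof dockerd)"` shows how close the daemon is to its limit while builds are in flight.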

Component Stack: CLI, Daemon, containerd, runc, and the OCI Spec

Docker is a chain of four components (CLI, dockerd, containerd, runc), each with a specific responsibility, plus the OCI spec that standardizes the handoff to the runtime. Understanding this chain is the foundation for debugging any Docker issue.

Docker CLI (docker): A Go binary that sends HTTP requests to the Docker daemon via a Unix socket (/var/run/docker.sock) or TCP. The CLI does not create containers, build images, or manage networks — it is a thin client. You can replace it with curl: curl --unix-socket /var/run/docker.sock http://localhost/containers/json.

Docker daemon (dockerd): A long-running Go process that manages the Docker API, image storage, network configuration, volume management, and build orchestration. The daemon listens on the Unix socket and processes all API requests. It delegates container lifecycle operations to containerd. By default the daemon runs as root and has full access to the host (rootless mode is the exception).

containerd: A container runtime daemon that manages the complete container lifecycle: pulling images, managing snapshots, creating containers, and handling execution. containerd was originally part of Docker, was split out as a standalone project, and was donated to the CNCF in 2017. It is now used independently by Docker, Kubernetes (via CRI), AWS ECS, GKE, and other platforms. containerd starts a containerd-shim for each container, and the shim invokes runc to actually create it.

runc: A lightweight CLI tool that creates a single container from an OCI runtime specification (config.json). runc calls clone() to create a new process with namespaces, configures cgroups, pivots into the root filesystem that Docker has already assembled as an overlay2 mount, drops privileges, and execs the application process. runc exits after creating the container; it does not manage the lifecycle, which is the shim's job.

OCI spec: The Open Container Initiative defines two standards: the image spec (how images are packaged as layers + manifest) and the runtime spec (how containers are created as config.json). This standardization enables runtime replaceability — you can swap runc for crun, kata-runtime, or runsc without changing Docker or containerd.

The handoff chain: docker run -> dockerd (API) -> containerd (lifecycle) -> runc (creation) -> kernel (namespaces, cgroups, overlay2). Each handoff is a potential failure point. If dockerd crashes, all operations fail. If containerd crashes, new containers cannot be created but existing ones keep running. If runc fails, the specific container creation fails but the stack above is unaffected.

io/thecodeforge/architecture_flow.sh (Bash)

#!/bin/bash
# Trace the complete Docker architecture flow

# ── 1. CLI -> Daemon communication ───────────────────────────────────────────
# The CLI sends HTTP requests to the daemon socket
curl --unix-socket /var/run/docker.sock http://localhost/version | python3 -m json.tool
# Shows: Version, ApiVersion, GoVersion, Os, Arch, KernelVersion

# List containers via the API (same as docker ps)
curl --unix-socket /var/run/docker.sock http://localhost/containers/json | python3 -m json.tool

# ── 2. Daemon -> containerd communication ────────────────────────────────────
# containerd runs as a separate process, communicating via gRPC
ps aux | grep containerd
# root  1234  0.3  0.5  ... /usr/bin/containerd

# Check the containerd socket
ls -la /run/containerd/containerd.sock
# srw-rw---- 1 root containerd /run/containerd/containerd.sock

# List containers managed by containerd (via ctr)
sudo ctr -n moby containers ls
# Shows containers that containerd is managing on behalf of Docker

# ── 3. containerd -> runc communication ──────────────────────────────────────
# runc is invoked by containerd to create each container
which runc
# /usr/bin/runc

runc --version
# runc version 1.1.9

# List containers managed by runc (Docker keeps runc state under its own root)
sudo runc --root /run/docker/runtime-runc/moby list
# Shows: container ID, PID, status, bundle path

# ── 4. Inspect the OCI runtime spec for a running container ──────────────────
CONTAINER_ID=$(docker ps -q | head -1)

# Find the container's bundle directory (Docker uses the "moby" namespace)
sudo find /run/containerd -path "*${CONTAINER_ID}*" -name config.json 2>/dev/null
# /run/containerd/io.containerd.runtime.v2.task/moby/<id>/config.json

# ── 5. Check the container process tree ──────────────────────────────────────
# Each container is supervised by its own containerd-shim, not by dockerd
pstree -p "$(pgrep -f containerd-shim | head -1)"
# containerd-shim(5678)───node(5679)

# ── 6. Check all components are running ──────────────────────────────────────
echo "dockerd: $(systemctl is-active docker)"
echo "containerd: $(systemctl is-active containerd)"
echo "runc: $(command -v runc >/dev/null && echo 'installed' || echo 'missing')"

# ── 7. Check the daemon's storage driver and root directory ──────────────────
docker info --format '{{.Driver}} {{.DockerRootDir}}'
# overlay2 /var/lib/docker

# ── 8. Check the daemon's configured runtimes ────────────────────────────────
docker info --format '{{json .Runtimes}}' | python3 -m json.tool
# Shows: runc (default), and any custom runtimes (runsc, kata)

Output

# Daemon version:
{
  "Version": "24.0.7",
  "ApiVersion": "1.43",
  "GoVersion": "go1.20.10",
  "Os": "linux",
  "Arch": "amd64",
  "KernelVersion": "6.1.0-18-amd64"
}

# Component status:
dockerd: active
containerd: active
runc: installed

# Storage driver:
overlay2 /var/lib/docker

# Process tree:
containerd-shim(5678)───node(5679)

The Component Chain as an Assembly Line

containerd was split out of Docker and donated to the CNCF in 2017 as a standalone project.
Kubernetes can use containerd directly (via CRI) without dockerd, reducing overhead and complexity.
Separation allows independent upgrades: containers keep running while containerd is restarted, and while dockerd is restarted if live-restore is enabled.
containerd manages the lifecycle; dockerd manages the user-facing API and image building.

Production Insight

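The daemon is a single point of failure. If dockerd crashes, all Docker operations (build, run, stop, ps, logs) fail. Containers that are already running are supervised by their containerd-shim processes rather than by dockerd, so they stay up while the daemon is down, but you cannot interact with them via the Docker CLI. Set "live-restore": true in /etc/docker/daemon.json if they must also survive the daemon restart; without it, dockerd stops them when it shuts down or comes back. For high-availability environments, consider using containerd directly (via ctr or crictl) to bypass the daemon for container lifecycle operations.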

Key Takeaway

Docker is a chain: CLI -> dockerd -> containerd -> runc -> kernel. Each component has a specific role. The OCI spec standardizes the interface between containerd and runc. If dockerd crashes, CLI operations fail, but running containers stay up because their shims, not dockerd, supervise them; enable live-restore to keep them across the daemon restart.

Component Failure Impact

If dockerd crashes → All CLI operations fail. Running containers stay up while the daemon is down (their shims supervise them); they survive the restart only with live-restore enabled. Restart dockerd to recover.

If containerd crashes → New containers cannot be created. Existing containers keep running (processes are still alive). Restart containerd to recover.

If runc fails to create a container → The specific container creation fails. Other containers are unaffected. Check OCI spec and kernel logs.

If the Docker socket (/var/run/docker.sock) is deleted → All CLI operations fail with 'Cannot connect to the Docker daemon'. Restart dockerd to recreate the socket.
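
The failure cases above can be turned into a first-pass triage helper. This is only a sketch under stated assumptions: the function name triage_stack and its three-argument interface are invented for illustration, and it takes the component states as inputs instead of probing the host itself.

```bash
#!/bin/bash
# triage_stack DOCKERD CONTAINERD SOCKET
#   DOCKERD, CONTAINERD: "active" or anything else (as printed by `systemctl is-active`)
#   SOCKET: "present" or "missing"
# Prints the first matching diagnosis, checking the lowest layer first.
triage_stack() {
  local dockerd=$1 containerd=$2 socket=$3
  if [ "$containerd" != "active" ]; then
    echo "containerd down: new containers cannot be created; restart containerd"
  elif [ "$dockerd" != "active" ]; then
    echo "dockerd down: CLI operations fail; restart dockerd"
  elif [ "$socket" != "present" ]; then
    echo "socket missing: CLI cannot connect; restart dockerd to recreate it"
  else
    echo "stack healthy: check the OCI spec and kernel logs for per-container failures"
  fi
}
```

On a systemd host the inputs could come from `systemctl is-active docker`, `systemctl is-active containerd`, and a `[ -S /var/run/docker.sock ]` test.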

Image Build Flow: From Dockerfile to Cached Layers

When you run docker build, a precise sequence of operations transforms a Dockerfile into a cached, layered image. Understanding this flow explains why builds are slow, why layers are cached, and why image size matters.

Step 1: Send build context. The CLI tars the build context (the directory passed as the final argument to docker build, usually the current directory; -f only selects the Dockerfile) and sends it to the daemon via the Unix socket. This is the 'Sending build context to Docker daemon' message. The .dockerignore file filters out excluded files before sending. Without .dockerignore, the entire directory (including .git, node_modules) is sent.

Step 2: Parse the Dockerfile. The daemon parses the Dockerfile and executes each instruction sequentially. Each instruction is evaluated against the layer cache.

Step 3: Cache lookup. For each instruction, the daemon checks if a cached layer exists with the same instruction text and the same parent layer; for COPY and ADD it also compares checksums of the source files. On a cache hit, the layer is reused (no execution). On a cache miss, the instruction is executed and a new layer is created. The cache is sequential: a miss invalidates all subsequent layers.

Step 4: Execute the instruction. For RUN, the daemon creates a temporary container from the previous layer, executes the command, and captures the filesystem diff as a new layer. For COPY/ADD, the daemon copies files from the build context into a new layer. For ENV/EXPOSE/LABEL, the daemon creates a metadata-only layer (no filesystem change).

Step 5: Commit the layer. The filesystem diff is committed as a new layer under /var/lib/docker/overlay2/. Each layer is a directory containing only the files that changed from the previous layer. The layer's content is identified by a SHA256 digest, while its directory under overlay2/ gets a separate internal ID.

Step 6: Tag the image. After all instructions are executed, the resulting image is tagged with the image name and tag (e.g., my-app:1.0.0). The tag points to a manifest, a JSON file that lists all layers in order.

BuildKit vs legacy builder: The legacy builder executes instructions sequentially. BuildKit builds a dependency graph and executes independent instructions in parallel (for example, separate stages of a multi-stage build). BuildKit also supports --mount=type=secret for build-time secrets without baking them into layers. BuildKit is the default builder in Docker Desktop and in Docker Engine 23.0 and later; on older engines, enable it with DOCKER_BUILDKIT=1. It is recommended for all builds.
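
The sequential cache rule from Step 3 is easier to see in a toy model. The sketch below is not Docker's actual cache-key algorithm (real keys also cover COPY/ADD file checksums and build arguments, and it ignores comments and line continuations); it only chains a hash of each instruction onto its parent's key, so that one changed line changes every key after it. The function names are illustrative.

```bash
#!/bin/bash
# Simplified model of layer-cache chaining, NOT Docker's real algorithm.
# cache_key PARENT INSTRUCTION: derive a key from the parent key and the instruction text.
cache_key() {
  printf '%s\n%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

# chain_keys: read Dockerfile instructions from stdin, print one key per line.
# Each key depends on every instruction before it.
chain_keys() {
  local parent="scratch" line
  while IFS= read -r line; do
    parent=$(cache_key "$parent" "$line")
    echo "$parent"
  done
}
```

Feeding two versions of a Dockerfile through chain_keys and diffing the results shows identical keys up to the first edited instruction and different keys from there on, which is why frequently changing instructions belong near the end of the file.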

io/thecodeforge/build_flow.sh (Bash)

#!/bin/bash
# Trace the complete image build flow

# ── 1. Build context size (before and after .dockerignore) ───────────────────
# Without .dockerignore:
tar -cf - . | wc -c
# May be 500MB+ if node_modules and .git are included

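# With .dockerignore:
cat .dockerignore
# node_modules/
# .git/
# *.log

# Rough estimate only: tar and .dockerignore use different pattern rules,
# so strip trailing slashes before handing the file to tar
tar -cf - --exclude-from=<(sed 's#/$##' .dockerignore) . | wc -c
# Should be <10MB for a typical project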

# ── 2. Build with cache inspection ───────────────────────────────────────────
# Build with BuildKit and progress=plain to see every step
DOCKER_BUILDKIT=1 docker build --progress=plain -t io.thecodeforge/api:1.0 . 2>&1 | tee /tmp/build.log

# Count cached vs executed steps:
grep -c 'CACHED' /tmp/build.log
grep -c 'RUN\|COPY' /tmp/build.log

# ── 3. Inspect the image layers ──────────────────────────────────────────────
# List layers in the image
docker inspect io.thecodeforge/api:1.0 --format '{{json .RootFS.Layers}}' | python3 -m json.tool
# Each entry is a SHA256 digest of a layer

# Show layer sizes
docker history io.thecodeforge/api:1.0 --format '{{.Size}}\t{{.CreatedBy}}' | head -10
# Shows the size contribution of each instruction

# ── 4. Find layers on disk ───────────────────────────────────────────────────
sudo ls /var/lib/docker/overlay2/ | head -10
# Each directory is a layer. A shared layer is stored once and referenced by every image that uses it.

# Check disk usage per layer (the directory is readable by root only):
sudo sh -c 'du -sh /var/lib/docker/overlay2/* | sort -hr | head -10'

# ── 5. Inspect the image manifest ────────────────────────────────────────────
# Save the image and inspect its manifest
docker save io.thecodeforge/api:1.0 | tar -xOf - manifest.json | python3 -m json.tool
# Shows: Config (image config), RepoTags, Layers (ordered list of layer tar files)

# ── 6. Compare BuildKit vs legacy builder performance ────────────────────────
# Legacy builder:
time DOCKER_BUILDKIT=0 docker build -t test:legacy .
# Sequential execution — slower for multi-step builds

# BuildKit:
time DOCKER_BUILDKIT=1 docker build -t test:buildkit .
# Parallel execution — faster for independent steps

# ── 7. Check the build cache ─────────────────────────────────────────────────
docker buildx du
# Shows disk usage of the build cache (docker system df gives a summary)

docker builder prune
# Removes unused build cache entries

Output

# Build context size:
524288000 (500MB without .dockerignore)
8543232 (8.1MB with .dockerignore)

# Build with cache:
#5 [2/6] COPY package.json ./ CACHED
#6 [3/6] RUN npm ci CACHED
#7 [4/6] COPY src/ ./src/ 0.3s
# Only the COPY src/ step was rebuilt

# Image layers:
[
  "sha256:abc123...",
  "sha256:def456...",
  "sha256:ghi789..."
]

# Layer sizes:
142MB COPY --from=builder /app/node_modules ./node_modules
12MB COPY --from=builder /app/dist ./dist
7MB FROM node:20-alpine

# BuildKit vs legacy:
Legacy: 42.3s
BuildKit: 28.1s (33% faster)

Build Context as a Delivery Truck

The daemon needs access to files referenced by COPY and ADD instructions.
The daemon runs on the host (or a remote machine) — it cannot access the CLI's local filesystem directly.
The CLI tars the context and sends it over the Unix socket, which is why .dockerignore is critical for build speed.
BuildKit reduces this cost by transferring only the context files the build actually needs, not the entire context.

Production Insight

The build context is the most common cause of slow builds. A 500MB build context (including node_modules and .git) takes minutes to transfer over the Unix socket before any instruction executes. The fix: add .dockerignore with at minimum: node_modules/, .git/, *.log, coverage/, .env. This alone can reduce build time from 5 minutes to 10 seconds.
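
To see how much a given exclusion saves before touching .dockerignore, the context can be measured directly. The helper below is a sketch: context_bytes is a hypothetical name, and it uses tar's exclude patterns, which only approximate .dockerignore matching.

```bash
#!/bin/bash
# context_bytes DIR [PATTERN...]: size in bytes of the tar stream for DIR,
# skipping any paths that match the given tar exclude patterns.
# Approximation only: tar patterns are not identical to .dockerignore syntax.
context_bytes() {
  local dir=$1 pattern
  shift
  local excludes=()
  for pattern in "$@"; do
    excludes+=("--exclude=$pattern")
  done
  # The ${excludes[@]+...} form avoids an unbound-variable error on an empty array.
  tar -cf - ${excludes[@]+"${excludes[@]}"} -C "$dir" . | wc -c | tr -d ' '
}
```

Comparing `context_bytes .` with `context_bytes . node_modules .git '*.log'` gives a quick before-and-after figure for the project at hand.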

Key Takeaway

The build flow is: CLI sends context -> daemon parses Dockerfile -> cache lookup per instruction -> execute miss -> commit layer -> tag image. The build context is the most common bottleneck. .dockerignore is mandatory. BuildKit parallelizes independent steps and is often 30-50% faster than the legacy builder, depending on how much of the build can run in parallel.

Image Distribution: Registry, Manifest, and Layer Deduplication

Once an image is built, it needs to be distributed to other machines — CI servers, staging environments, production clusters. This is the registry's job.

Image format: An image is not a single file. It is a collection of:
- Layers: compressed tar archives, each identified by a SHA256 digest
- Manifest: a JSON file that lists the layers in order and points to the image config
- Image config: a JSON file that defines the runtime configuration (env vars, entrypoint, exposed ports, user)

The registry protocol: Docker registries implement the OCI Distribution Spec, an HTTP API for pushing and pulling images. The push flow:
1. Client asks the registry, by digest, which layers it already has
2. Client uploads only the missing layers
3. Client uploads the manifest last, once every layer it references is present
4. Registry stores layers by digest and links them to the manifest

Layer deduplication: This is the key efficiency mechanism. If two images share the same base layer (e.g., both use node:20-alpine), the layer is stored once on the registry and once on the local machine. When you pull a second image that shares layers with an existing image, only the unique layers are downloaded. This is why pulling a new version of your app is fast — only the top layers (containing your code) change.
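
Deduplication falls out of naming content by its hash. The sketch below models that with a plain directory: it is not the registry's or Docker's real storage layout, and store_layer is an illustrative name, but it shows why storing the same base layer twice costs nothing the second time.

```bash
#!/bin/bash
# store_layer STORE FILE: copy FILE into STORE under its SHA256 digest.
# Prints "stored <digest>" for new content and "exists <digest>" for content
# that is already present, which is deduplication in miniature.
# (Illustrative model; not the registry's actual storage layout.)
store_layer() {
  local store=$1 file=$2 digest
  digest=$(sha256sum "$file" | cut -d' ' -f1)
  mkdir -p "$store/sha256"
  if [ -e "$store/sha256/$digest" ]; then
    echo "exists sha256:$digest"
  else
    cp "$file" "$store/sha256/$digest"
    echo "stored sha256:$digest"
  fi
}
```

Storing a base layer, an app layer, and then the same base layer again leaves two files in the store, mirroring the "Layer already exists" lines that docker push prints for shared layers.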

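Docker Hub pull-rate limits: Docker Hub limits how many images you can pull in a time window. The figures used in this article are 100 pulls per 6 hours for anonymous users and 200 for authenticated free users; Docker has changed these numbers over time, so check the current limits for your plan. Anonymous limits are counted per IP, so a NAT gateway makes multiple machines appear as one client, while authenticated limits are counted per account. For CI/CD pipelines, the anonymous limit is hit quickly. The fix: authenticate with docker login, use a pull-through cache, or mirror images to a private registry.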

Content trust (DCT): Docker Content Trust uses digital signatures to verify image integrity. When DOCKER_CONTENT_TRUST=1, Docker only pulls signed images. This prevents supply chain attacks where a malicious image is pushed with the same tag as a legitimate image.

io/thecodeforge/registry_flow.sh (Bash)

#!/bin/bash
# Trace the image distribution flow

# ── 1. Inspect the local image manifest ──────────────────────────────────────
# Save the image and extract it into a scratch directory
docker save io.thecodeforge/api:1.0 -o /tmp/api-image.tar
mkdir -p /tmp/api-image && cd /tmp/api-image && tar -xf /tmp/api-image.tar

# The manifest.json lists all components:
cat manifest.json | python3 -m json.tool
# [
#   {
#     "Config": "sha256:abc123...json",      <- image config
#     "RepoTags": ["io.thecodeforge/api:1.0"],
#     "Layers": [                              <- ordered layer list
#       "sha256:def456.../layer.tar",
#       "sha256:ghi789.../layer.tar"
#     ]
#   }
# ]

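# ── 2. Inspect the image config ──────────────────────────────────────────────
# Read the config path from the manifest instead of guessing the file name
CONFIG_PATH=$(python3 -c 'import json; print(json.load(open("manifest.json"))[0]["Config"])')
python3 -m json.tool "$CONFIG_PATH" | head -30
# Shows: architecture, os, config (env, cmd, entrypoint), rootfs (diff_ids)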

# ── 3. Push to a registry ────────────────────────────────────────────────────
# Login to Docker Hub
docker login

# Tag the image for the registry
docker tag io.thecodeforge/api:1.0 youruser/io-thecodeforge-api:1.0

# Push — watch which layers are pushed vs already exist
docker push youruser/io-thecodeforge-api:1.0
# Output shows:
# Layer already exists (shared with base image)
# Pushing layer (unique to this image)

# ── 4. Pull from a registry ──────────────────────────────────────────────────
# Pull on a different machine
docker pull youruser/io-thecodeforge-api:1.0
# Output shows:
# Already exists (layers shared with local images)
# Downloading (unique layers)

# ── 5. Check pull-rate limit status ──────────────────────────────────────────
curl -s -I \
  -H "Authorization: Bearer $(curl -s 'https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull' | python3 -c 'import sys,json; print(json.load(sys.stdin)["token"])')" \
  https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest \
  | grep -i ratelimit
# ratelimit-limit: 100;w=21600
# ratelimit-remaining: 76;w=21600

# ── 6. Enable Docker Content Trust ───────────────────────────────────────────
export DOCKER_CONTENT_TRUST=1

# Now docker pull only fetches signed images
docker pull youruser/io-thecodeforge-api:1.0
# If the image is not signed, the pull fails with a trust error

# ── 7. Check layer deduplication ─────────────────────────────────────────────
# Print the layer digests the two images have in common
comm -12 \
  <(docker inspect node:20-alpine --format '{{join .RootFS.Layers "\n"}}' | sort) \
  <(docker inspect io.thecodeforge/api:1.0 --format '{{join .RootFS.Layers "\n"}}' | sort)
# Every digest printed is a base layer stored once and shared by both images

Output

# Manifest:
[
  {
    "Config": "sha256:a1b2c3d4e5f6...json",
    "RepoTags": ["io.thecodeforge/api:1.0"],
    "Layers": [
      "sha256:f1e2d3c4b5a6.../layer.tar",
      "sha256:a7b8c9d0e1f2.../layer.tar",
      "sha256:d3e4f5a6b7c8.../layer.tar"
    ]
  }
]

# Push output:
The push refers to repository [docker.io/youruser/io-thecodeforge-api]
f1e2d3c4b5a6: Mounted from library/node (shared layer)
a7b8c9d0e1f2: Pushed (unique layer)
d3e4f5a6b7c8: Pushed (unique layer)
1.0: digest: sha256:abc123... size: 1570

# Rate limit:
ratelimit-limit: 100;w=21600
ratelimit-remaining: 76;w=21600

Registry as a Library with ISBN Numbers

Without deduplication, every image would store a full copy of its base OS — wasting disk and bandwidth.
With deduplication, shared layers (like node:20-alpine) are stored once and referenced by multiple images.
Pulling a new app version only downloads the changed layers (typically your code — a few MB), not the entire image.
This is why Docker images are practical at scale — the overhead per image is only the unique layers.

Production Insight

Anonymous pulls are rate-limited per IP, not per user. A CI server behind a NAT gateway shared with 50 engineers hits the 100-pull limit in minutes. The fix: always authenticate CI runners with docker login (which moves them to a separate per-account limit, 200 in the figures above), deploy a pull-through cache registry, or mirror critical base images to a private registry (AWS ECR, GCR).
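
A CI job can check its remaining quota before starting a pull-heavy stage. The sketch below only parses the ratelimit-remaining header in the `<count>;w=<seconds>` form shown earlier in this section; the function name ratelimit_remaining is illustrative, and it expects the raw HTTP headers on standard input.

```bash
#!/bin/bash
# ratelimit_remaining: read HTTP response headers on stdin and print
# "<remaining_pulls> <window_seconds>" from the ratelimit-remaining header.
ratelimit_remaining() {
  # Strip carriage returns, then split on ':', ';', '=' and spaces so that
  # "ratelimit-remaining: 76;w=21600" yields fields: name, 76, w, 21600.
  tr -d '\r' | awk -F'[:;= ]+' 'tolower($1) == "ratelimit-remaining" { print $2, $4 }'
}
```

Piping the output of the curl -I request from the script above into ratelimit_remaining gives two numbers a pipeline can compare against a threshold of its own choosing.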

Key Takeaway

An image is layers + manifest + config. The registry protocol deduplicates layers by SHA256 digest. Pulling a new version only downloads changed layers. Docker Hub rate limits for anonymous pulls are per IP, so authenticate CI runners and consider a pull-through cache. Docker Content Trust verifies image signatures to prevent supply chain attacks.

Container Creation Flow: From Image to Running Process

When you run docker run, a precise sequence of operations creates an isolated process from an image. This is the most critical flow to understand for production debugging.

Step 1: API request. docker run is a create followed by a start: the CLI sends a POST /containers/create request to dockerd, then POST /containers/<container-id>/start. The create request includes the image name, command, environment variables, port mappings, volume mounts, and resource limits.

Step 2: Image resolution. dockerd checks if the image exists locally. If not, it pulls the image from the registry. The image's layers are unpacked into /var/lib/docker/overlay2/.

Step 3: Create container metadata. dockerd creates a container configuration (container JSON) that includes the merged overlay2 directory, network settings, volume mounts, and resource limits. This metadata is stored in /var/lib/docker/containers/<container-id>/.

Step 4: Delegate to containerd. dockerd builds the OCI runtime spec, a JSON document that defines namespaces, cgroups, mounts, and the process to execute, and sends it to containerd over gRPC. containerd writes it as config.json in the container's bundle directory.

Step 5: Invoke runc. containerd, through the container's shim, invokes runc create with the OCI spec. runc reads config.json and executes kernel syscalls:
- clone(CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC): creates a new process with namespaces
- mount(): mounts /proc, /sys, /dev inside the container
- pivot_root(): changes the root to the overlay2 merged directory
- setuid()/setgid(): drops privileges (if non-root)
- execve(): starts the application process

Step 6: Network setup. dockerd (via libnetwork) creates a veth pair — one end in the container's network namespace, one end on the Docker bridge (docker0). The container gets an IP address from the bridge's subnet. iptables rules are added for port publishing (-p) and inter-container communication.

Step 7: Monitor the process. containerd-shim monitors the container process, captures stdout/stderr, and handles signals. Docker does not run a pause process (that belongs to Kubernetes pods); the namespaces belong to the container's own main process. When the application process exits, containerd-shim reports the exit code to containerd, which reports to dockerd.
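
The link between the spec from Step 4 and the clone flags from Step 5 is the namespaces list in config.json. The sketch below writes a heavily trimmed spec and lists the namespace types it requests. The field names follow the OCI runtime spec, but the values are placeholders and both function names are illustrative; specs generated by Docker carry many more fields (mounts, cgroups, capabilities).

```bash
#!/bin/bash
# write_min_spec FILE: write a trimmed-down OCI runtime spec for illustration.
# Placeholder values only; a real spec from Docker is far larger.
write_min_spec() {
  cat > "$1" <<'EOF'
{
  "ociVersion": "1.0.2",
  "process": { "args": ["sleep", "3600"], "cwd": "/", "user": { "uid": 0, "gid": 0 } },
  "root": { "path": "rootfs" },
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "network" },
      { "type": "mount" },
      { "type": "uts" },
      { "type": "ipc" }
    ]
  }
}
EOF
}

# spec_namespaces FILE: list the namespace types the spec asks the runtime to create.
# Text matching is enough for this trimmed file; a full spec also uses "type"
# inside its mounts, so use a JSON-aware tool there.
spec_namespaces() {
  grep -o '"type": *"[a-z]*"' "$1" | cut -d'"' -f4
}
```

Each entry corresponds to one flag in the clone() call: pid to CLONE_NEWPID, network to CLONE_NEWNET, mount to CLONE_NEWNS, uts to CLONE_NEWUTS, and ipc to CLONE_NEWIPC.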

io/thecodeforge/container_creation_flow.sh (Bash)

#!/bin/bash
# Trace the complete container creation flow

# ── 1. Create a container (without starting it) ─────────────────────────────
docker create --name flow-demo \
  --cpus=1.0 \
  --memory=256m \
  -p 8080:3000 \
  -v demo-data:/app/data \
  alpine:3.19 sleep 3600

# ── 2. Inspect the container metadata ────────────────────────────────────────
# Container config stored by the daemon (the directory is readable by root only):
sudo ls /var/lib/docker/containers/$(docker inspect flow-demo --format '{{.Id}}')/
# config.v2.json  hostconfig.json  hostname  hosts  resolv.conf  ...

# ── 3. Start the container and trace the flow ────────────────────────────────
docker start flow-demo

# Get the container's host PID:
CONTAINER_PID=$(docker inspect --format '{{.State.Pid}}' flow-demo)
echo "Container process PID on host: $CONTAINER_PID"

# ── 4. Inspect the OCI runtime spec ──────────────────────────────────────────
# containerd keys its state by container ID (not name); Docker uses the "moby" namespace:
CONTAINER_ID=$(docker inspect --format '{{.Id}}' flow-demo)
sudo find /run/containerd -path "*${CONTAINER_ID}*" -name config.json 2>/dev/null

# Inspect the OCI spec (namespaces, mounts, process):
sudo cat "/run/containerd/io.containerd.runtime.v2.task/moby/${CONTAINER_ID}/config.json" 2>/dev/null | python3 -m json.tool | head -60

# ── 5. Inspect the overlay2 filesystem ───────────────────────────────────────
docker inspect flow-demo --format '{{json .GraphDriver.Data}}' | python3 -m json.tool
# MergedDir: what the container sees as /
# UpperDir: writable layer (container-specific changes)
# LowerDir: read-only image layers

# ── 6. Inspect the network setup ─────────────────────────────────────────────
# Container's network config:
docker inspect flow-demo --format '{{json .NetworkSettings}}' | python3 -m json.tool | head -20
# Shows: IPAddress, Gateway, Ports, Networks

# veth pair on the host:
ip link show | grep veth
# vethXXXX@if4: <BROADCAST,MULTICAST,UP> ... master docker0

# iptables rules for port publishing:
sudo iptables -t nat -L -n | grep 8080
# DNAT rule forwarding host:8080 to container:3000

# ── 7. Find the containerd-shim that supervises the container ────────────────
SHIM_PID=$(ps -o ppid= -p "$CONTAINER_PID" | tr -d ' ')
ps -o pid,ppid,args -p "$SHIM_PID"
# The parent of the container process is containerd-shim-runc-v2

# The container process has its own network namespace, distinct from the host's:
sudo ls -la /proc/$CONTAINER_PID/ns/net
ls -la /proc/$$/ns/net
# The two links point to different namespace inodes

# ── Cleanup ──────────────────────────────────────────────────────────────────
docker rm -f flow-demo

Output

# Container created:
flow-demo

# Host PID:
Container process PID on host: 5679

# Overlay2:
{
  "LowerDir": "/var/lib/docker/overlay2/.../layers",
  "MergedDir": "/var/lib/docker/overlay2/.../merged",
  "UpperDir": "/var/lib/docker/overlay2/.../diff",
  "WorkDir": "/var/lib/docker/overlay2/.../work"
}

# Network:
{
  "IPAddress": "172.17.0.2",
  "Gateway": "172.17.0.1",
  "Ports": {"3000/tcp": [{"HostIp": "0.0.0.0", "HostPort": "8080"}]}
}

# iptables:
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:3000

Container Creation as Furnishing a Room

runc's job is to create the container, not to manage it. After execve(), runc is replaced by the application process.
containerd-shim monitors the application process, captures output, and handles signals.
A container's namespaces live as long as a process runs inside them. Plain Docker has no pause process; the pause container is a Kubernetes pod convention.
This separation allows containerd to manage the lifecycle without being PID 1 in the container.
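Namespace membership is visible from the host through /proc, which gives a quick way to check what runc set up. The following is a minimal sketch, assuming a Linux host; the helper names `ns_id` and `same_namespace` are hypothetical.

```bash
#!/bin/bash
# Sketch: compare namespace membership of two processes via /proc (Linux only).

# Print the namespace identifier of a process, e.g. "net:[4026531840]".
ns_id() {   # usage: ns_id <pid> <type>   (type: pid, net, mnt, uts, ipc)
  readlink "/proc/$1/ns/$2"
}

# Succeed when both processes are in the same namespace of the given type.
# Returns 2 when either namespace link cannot be read.
same_namespace() {   # usage: same_namespace <pid1> <pid2> <type>
  local a b
  a=$(ns_id "$1" "$3") || return 2
  b=$(ns_id "$2" "$3") || return 2
  [ "$a" = "$b" ]
}

# The current shell trivially shares a network namespace with itself.
same_namespace $$ $$ net && echo "same net namespace"
```

Run as root with the host PID of a bridge-networked container as the first argument, the same check against the host shell should fail, because the container was cloned into its own network namespace.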

Production Insight

The iptables rules for port publishing (-p) are created by dockerd when the container starts. If dockerd crashes and restarts, it recreates the rules for running containers. But if iptables is flushed (iptables -F) while dockerd is running, the port forwarding breaks and containers become unreachable from the host. The fix: restart dockerd to recreate the rules, or use docker network connect to re-attach containers to networks.

Key Takeaway

Container creation flow: CLI -> dockerd API -> containerd -> runc -> kernel syscalls. runc calls clone() for namespaces, pivot_root() for filesystem, execve() for the application. Network setup creates veth pairs and iptables rules. The namespaces exist for as long as the container's process does. Every container is a real Linux process on the host.

Container Creation Failure Points

If: Image not found locally or in registry
→ docker pull fails. Check image name, tag, and registry credentials. Check network connectivity.

If: Port already allocated
→ Container creation fails. Check docker ps for conflicting port mappings. Change the host port.

If: Volume mount path does not exist
→ For bind mounts, -v creates the missing host path as a directory, while --mount type=bind fails with an error. A named volume that does not exist is created automatically. Check the mount source before starting.

If: runc fails to create namespaces
→ Kernel error: check dmesg. May indicate cgroup or namespace limits. Check /proc/sys/user/max_user_namespaces.

Storage Architecture: overlay2, Volumes, and the Filesystem Stack

Docker's storage architecture has three layers: the image layers (read-only, cached), the container layer (writable, per-container), and volumes (persistent, managed separately). Understanding this stack explains why containers start fast, why data disappears, and why database performance differs between containers and bare metal.

overlay2 driver: The default storage driver. It stacks directories (layers) and presents a merged view. Each image layer is a directory under /var/lib/docker/overlay2/. The container's writable layer is a separate directory. The merged view is what the container sees as its root filesystem.

Layer sharing: Multiple containers from the same image share the same read-only layers. Each container has its own writable layer. This is why starting a second container from the same image is nearly instant — no data is copied, only a new writable directory is created.
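The merged view and layer sharing can be illustrated without mounting anything. The sketch below simulates only the lookup order (upper layer first, then lower layers) using plain directories; it is not how the kernel implements overlayfs and it ignores whiteouts and directory merging. The `overlay_lookup` helper is a hypothetical name.

```bash
#!/bin/bash
# Sketch: simulate overlay lookup order with plain directories (no mount, no root).

# Resolve a relative path the way a merged view would: the first layer listed
# (the upper layer) wins, then each lower layer in order.
overlay_lookup() {   # usage: overlay_lookup <relpath> <upper> <lower>...
  local rel=$1 layer
  shift
  for layer in "$@"; do
    if [ -e "$layer/$rel" ]; then
      echo "$layer/$rel"
      return 0
    fi
  done
  return 1
}

# Demo: one shared read-only layer, two containers with private upper dirs.
demo=$(mktemp -d)
mkdir -p "$demo/lower/etc" "$demo/upper-a/etc" "$demo/upper-b/etc"
echo "from image" > "$demo/lower/etc/motd"
echo "changed in A" > "$demo/upper-a/etc/motd"   # container A modified the file

cat "$(overlay_lookup etc/motd "$demo/upper-a" "$demo/lower")"   # prints: changed in A
cat "$(overlay_lookup etc/motd "$demo/upper-b" "$demo/lower")"   # prints: from image
```

Container B never copied anything: its upper directory is empty, so every read falls through to the shared lower layer.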

Volumes: Named volumes are directories under /var/lib/docker/volumes/<volume-name>/_data. They are mounted into the container at the specified path. Volumes bypass overlay2 entirely — reads and writes go directly to the host filesystem. This is why databases should use volumes: no copy-up overhead, no overlay2 performance penalty, and data survives container deletion.

Bind mounts: Bind mounts map a specific host directory into the container. They also bypass overlay2. Bind mounts are ideal for development (live code reload) but risky in production (the container can modify host files).

tmpfs mounts: tmpfs mounts store data in memory only. They never touch the disk. Useful for sensitive data (secrets, session tokens) that should not persist.

Storage driver alternatives: overlay2 is the default on all modern Linux distributions. Other drivers include fuse-overlayfs (rootless containers), devicemapper (legacy, deprecated), btrfs (Btrfs filesystem), and zfs (ZFS filesystem). overlay2 is recommended for all use cases unless you have a specific reason to use another driver.

io/thecodeforge/storage_architecture.shBASH

#!/bin/bash
# Inspect the complete Docker storage architecture

# ── 1. Check the storage driver ──────────────────────────────────────────────
docker info --format '{{.Driver}}'
# overlay2 (default on modern Linux)

# ── 2. Inspect the overlay2 directory structure ──────────────────────────────
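sudo ls /var/lib/docker/overlay2/ | head -10
# Each directory is a layer (image or container writable layer)

# Each layer directory contains (using the first layer as an example; "l" holds symlinks):
LAYER=$(sudo ls /var/lib/docker/overlay2/ | grep -v '^l$' | head -1)
sudo ls "/var/lib/docker/overlay2/$LAYER/"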
# diff/   — the actual filesystem content (only files that changed)
# link    — short name for the layer (used for path length limits)
# lower   — references to parent layers
# merged/ — the combined view (only for container layers)
# work/   — overlay2 internal working directory

# ── 3. Compare container vs image layers ─────────────────────────────────────
# Image layers are read-only and shared:
IMAGE_LAYERS=$(docker inspect alpine:3.19 --format '{{.RootFS.Layers}}')
echo "Image has $(echo $IMAGE_LAYERS | tr ' ' '\n' | wc -l) layers"

# Container adds one writable layer:
docker create --name storage-test alpine:3.19 sleep 3600
CONTAINER_UPPER=$(docker inspect storage-test --format '{{.GraphDriver.Data.UpperDir}}')
echo "Container writable layer: $CONTAINER_UPPER"

# ── 4. Demonstrate layer sharing between containers ──────────────────────────
# Create two containers from the same image
docker create --name storage-a alpine:3.19 sleep 3600
docker create --name storage-b alpine:3.19 sleep 3600

# Compare their lower layers (should be identical):
LOWER_A=$(docker inspect storage-a --format '{{.GraphDriver.Data.LowerDir}}')
LOWER_B=$(docker inspect storage-b --format '{{.GraphDriver.Data.LowerDir}}')
echo "Container A lower: $LOWER_A"
echo "Container B lower: $LOWER_B"
# Same layers — shared, not duplicated

# Compare their upper layers (should be different):
UPPER_A=$(docker inspect storage-a --format '{{.GraphDriver.Data.UpperDir}}')
UPPER_B=$(docker inspect storage-b --format '{{.GraphDriver.Data.UpperDir}}')
echo "Container A upper: $UPPER_A"
echo "Container B upper: $UPPER_B"
# Different directories — each container has its own writable layer

# ── 5. Inspect volumes ───────────────────────────────────────────────────────
docker volume create demo-volume

# Volume location on host:
docker volume inspect demo-volume --format '{{.Mountpoint}}'
# /var/lib/docker/volumes/demo-volume/_data

# Volumes bypass overlay2 — direct host filesystem access:
docker run --rm -v demo-volume:/data alpine:3.19 sh -c 'echo hello > /data/test'
sudo cat /var/lib/docker/volumes/demo-volume/_data/test
# hello — directly accessible on the host

# ── 6. Compare performance: overlay2 vs volume vs bind mount ─────────────────
# Each timing includes container startup and varies by host; compare the three
# relative to each other rather than as absolute numbers.
# overlay2 write (container writable layer):
time docker run --rm alpine:3.19 sh -c 'dd if=/dev/zero of=/tmp/test bs=1M count=100'
# ~0.3s

# Volume write:
time docker run --rm -v demo-volume:/data alpine:3.19 sh -c 'dd if=/dev/zero of=/data/test bs=1M count=100'
# ~0.2s (slightly faster: no overlay2 overhead)

# Bind mount write (creates a 100MB ./test file on the host, removed right after):
time docker run --rm -v "$(pwd)":/data alpine:3.19 sh -c 'dd if=/dev/zero of=/data/test bs=1M count=100'
rm -f ./test
# ~0.2s (direct host filesystem)

# ── Cleanup ──────────────────────────────────────────────────────────────────
docker rm -f storage-test storage-a storage-b
docker volume rm demo-volume

Output

# Storage driver:
overlay2

# Image layers:
Image has 1 layers

# Layer sharing:
Container A lower: /var/lib/docker/overlay2/abc123/layers
Container B lower: /var/lib/docker/overlay2/abc123/layers
# Same layers: shared
Container A upper: /var/lib/docker/overlay2/def456/diff
Container B upper: /var/lib/docker/overlay2/ghi789/diff
# Different writable layers

# Volume:
/var/lib/docker/volumes/demo-volume/_data

# Performance:
overlay2: 0.31s
volume: 0.22s
bind: 0.21s

Storage Architecture as a Building

overlay2 has a copy-up penalty: modifying a file from a lower layer requires copying it to the upper layer first.
For multi-GB database files, copy-up causes seconds of latency on first write.
Volumes bypass overlay2 entirely — reads and writes go directly to the host filesystem.
Volumes survive container deletion. The overlay2 writable layer is deleted when the container is removed.

Production Insight

The layer sharing mechanism means that running 10 containers from the same image uses only one copy of the read-only layers plus 10 small writable layers. This sharing is a large part of why Docker achieves 10-50x better density than VMs: shared layers are not duplicated. Monitor /var/lib/docker/overlay2/ disk usage so that accumulated writable layers and unused image layers do not fill the disk.

Key Takeaway

overlay2 stacks read-only image layers with a writable container layer. Multiple containers share read-only layers — this is the key to Docker's density advantage. Volumes bypass overlay2 and go directly to the host filesystem. Databases should always use volumes to avoid the copy-up overhead and ensure data persistence.

Storage Strategy by Use Case

If: Stateless application (API, web server)
→ Use: Default overlay2 writable layer. No volumes needed.

If: Database or persistent data
→ Use: Named volume. Bypasses overlay2. Survives container deletion.

If: Development with live code reload
→ Use: Bind mount (-v ./src:/app/src). Direct host access for fast iteration.

If: Sensitive data (secrets, tokens)
→ Use: tmpfs mount (--tmpfs /secrets:size=1m). In-memory only, never on disk.

Network Architecture: Bridge, veth, iptables, and DNS

Docker networking is built on Linux networking primitives — virtual bridges, veth pairs, iptables rules, and an embedded DNS server. Understanding these primitives explains why containers can communicate, why ports are published, and why the default bridge network lacks DNS.

The Docker bridge (docker0): When Docker is installed, it creates a Linux bridge called docker0 on the host. This bridge acts as a virtual switch. Each container connects to this bridge via a veth pair.

veth pairs: A veth (virtual Ethernet) pair is a pair of connected network interfaces — packets sent to one end appear on the other. Docker creates a veth pair for each container: one end (eth0) is inside the container's network namespace, the other end (vethXXXX) is attached to the docker0 bridge. This is how containers communicate with each other and the outside world.
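The host ends of the veth pairs can be enumerated straight from sysfs, since the kernel exposes a bridge's ports under /sys/class/net/<bridge>/brif. A minimal sketch follows; `bridge_members` is a hypothetical helper name, and the `BRIDGE` variable is an assumption added so the bridge can be overridden.

```bash
#!/bin/bash
# Sketch: list the interfaces attached to a Linux bridge using sysfs.

bridge_members() {   # usage: bridge_members <bridge-name>
  local brif="/sys/class/net/$1/brif"
  if [ ! -d "$brif" ]; then
    echo "no bridge named '$1' on this host" >&2
    return 1
  fi
  ls "$brif"
}

# On a Docker host this prints one vethXXXX name per running container
# attached to the default bridge.
bridge_members "${BRIDGE:-docker0}" || true
```

For a user-defined network, pass the bridge interface that Docker created for it (named br- followed by a network ID prefix) instead of docker0.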

iptables rules: Docker adds iptables rules for:
- Port publishing (-p): DNAT rules forward traffic from the host port to the container's IP and port.
- Inter-container communication: the default bridge allows all containers to communicate. The daemon option --icc=false blocks this on the default bridge; on a user-defined bridge the equivalent is the com.docker.network.bridge.enable_icc=false network option.
- Outbound NAT: MASQUERADE rules allow containers to reach the internet via the host's network interface.
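The outbound NAT rule boils down to a subnet test: masquerade when the source is inside the bridge subnet and the destination is not. The sketch below reproduces that test in pure Bash arithmetic. It is an illustration of the matching logic only, not Docker's or netfilter's code, and the helper names are hypothetical. 203.0.113.10 is a documentation-range address standing in for an internet host.

```bash
#!/bin/bash
# Sketch: the subnet test behind "source in 172.17.0.0/16, destination not".

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed when <ip> falls inside <network>/<prefix>.
ip_in_cidr() {   # usage: ip_in_cidr <ip> <network>/<prefix>
  local ip net prefix mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# A packet from a container to an outside address needs masquerading:
if ip_in_cidr 172.17.0.2 172.17.0.0/16 && ! ip_in_cidr 203.0.113.10 172.17.0.0/16; then
  echo "masquerade"
fi
```

Traffic between two containers on the same bridge fails the second condition, so it is switched by the bridge without address translation.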

DNS resolution: The default bridge network has no DNS resolution — containers can only reach each other by IP. User-defined bridge networks have an embedded DNS server (127.0.0.11) that resolves container names to IP addresses. This is why docker-compose.yml services can reference each other by service name.

Network drivers: Docker supports multiple network drivers:
- bridge: default for single-host setups. Creates a virtual bridge.
- host: container shares the host's network stack. No isolation, best performance.
- none: no network. Completely air-gapped.
- overlay: VXLAN tunnel for multi-host communication (Docker Swarm).
- macvlan: assigns a MAC address to the container, making it appear as a physical device on the network.

io/thecodeforge/network_architecture.shBASH

#!/bin/bash
# Inspect the complete Docker network architecture

# ── 1. Check the Docker bridge ───────────────────────────────────────────────
ip addr show docker0
# docker0: <BROADCAST,MULTICAST,UP> mtu 1500
#     inet 172.17.0.1/16
# The bridge has the gateway IP for the container subnet

# ── 2. Create a container and inspect its veth pair ──────────────────────────
docker run -d --name net-demo -p 8080:3000 alpine:3.19 sleep 3600

# Get the container's host PID:
CONTAINER_PID=$(docker inspect --format '{{.State.Pid}}' net-demo)

# Inside the container, see eth0 (one end of the veth pair):
docker exec net-demo ip addr show eth0
# eth0@if7: <BROADCAST,MULTICAST,UP> mtu 1500
#     inet 172.17.0.2/16

# On the host, see the other end of the veth pair:
ip link show | grep veth
# vethXXXX@if4: <BROADCAST,MULTICAST,UP> ... master docker0
# The host end is attached to the docker0 bridge

# ── 3. Inspect iptables rules ────────────────────────────────────────────────

# NAT rules for port publishing:
sudo iptables -t nat -L DOCKER -n -v
# DNAT  tcp  --  anywhere  anywhere  tcp dpt:8080 to:172.17.0.2:3000

# Forward rules:
sudo iptables -L DOCKER -n -v
# ACCEPT  tcp  --  anywhere  172.17.0.2  tcp dpt:3000

# MASQUERADE for outbound traffic:
sudo iptables -t nat -L POSTROUTING -n -v | grep 172.17
# MASQUERADE  all  --  172.17.0.0/16  !172.17.0.0/16

# ── 4. Check DNS resolution (default vs user-defined network) ────────────────

# Default bridge — no DNS:
docker exec net-demo cat /etc/resolv.conf
# nameserver 8.8.8.8 (host's DNS, not container-specific)
docker exec net-demo nslookup other-container
# Fails — no embedded DNS on default bridge

# User-defined network — embedded DNS:
docker network create app-net
docker run -d --name api --network app-net alpine:3.19 sleep 3600
docker run -d --name db --network app-net alpine:3.19 sleep 3600

docker exec api cat /etc/resolv.conf
# nameserver 127.0.0.11 (embedded DNS server)
docker exec api nslookup db
# Name: db  Address: 172.18.0.3

# ── 5. Inspect network configuration ─────────────────────────────────────────
docker network inspect bridge --format '{{json .IPAM.Config}}' | python3 -m json.tool
# [{"Subnet": "172.17.0.0/16", "Gateway": "172.17.0.1"}]

docker network inspect app-net --format '{{json .IPAM.Config}}' | python3 -m json.tool
# [{"Subnet": "172.18.0.0/16", "Gateway": "172.18.0.1"}]

# ── 6. Trace network traffic ─────────────────────────────────────────────────
# Capture packets on the docker0 bridge:
sudo tcpdump -i docker0 -n -c 10
# Shows ARP requests, TCP SYN packets between containers

# Capture packets inside a container:
sudo nsenter --net --target $CONTAINER_PID tcpdump -i eth0 -n -c 10

# ── Cleanup ──────────────────────────────────────────────────────────────────
docker rm -f net-demo api db
docker network rm app-net

Output

# Docker bridge:
docker0: inet 172.17.0.1/16

# Container eth0:
eth0@if7: inet 172.17.0.2/16

# veth pair on host:
vethXXXX@if4: master docker0

# iptables NAT:
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:3000

# Default bridge DNS:
nameserver 8.8.8.8

# User-defined network DNS:
nameserver 127.0.0.11
Name: db Address: 172.18.0.3

Networking as a Building's Phone System

The default bridge is a legacy design from before Docker had user-defined networks.
Docker chose not to add DNS to the default bridge to avoid breaking backward compatibility.
User-defined networks were introduced later with DNS as a built-in feature.
The default bridge is effectively deprecated for production use — always create a user-defined network.

Production Insight

The iptables rules for port publishing are managed by dockerd. If iptables is flushed (iptables -F) while containers are running, port forwarding breaks. If dockerd restarts, it recreates the rules. But if a third-party firewall tool (ufw, firewalld) modifies iptables, Docker's rules may be overwritten. The fix: configure firewalld to not manage the docker0 bridge, or use Docker's --iptables=false flag and manage rules manually.

Key Takeaway

Docker networking uses a Linux bridge (docker0), veth pairs, and iptables rules. The default bridge has no DNS — always use user-defined networks. iptables rules for port publishing are managed by dockerd — third-party firewalls can interfere. The embedded DNS server (127.0.0.11) resolves container names on user-defined networks.

The Core Architectural Model — Why Client-Server Matters in Production

Docker uses a client-server architecture. The Docker client talks to the Docker Daemon, which builds, runs, and manages containers. They communicate through a REST API via UNIX sockets or a network interface. This is the fundamental model underlying everything else.

You don't install a monolithic "Docker." You install a client and a daemon. The daemon runs as root. The client runs as you. Same host, different security contexts.

Your docker run command is a REST call. Nothing more. If the daemon stops, its containers stop with it unless live-restore is enabled in daemon.json. By default there is no graceful degradation.

Remote daemon support exists but introduces latency. A request to a daemon on another continent can add around 300ms of round-trip time per command. Your CI pipeline hates this.

Most production setups pin the daemon to local sockets. UNIX sockets are faster than TCP and expose no network attack surface, although anyone who can write to the socket effectively has root on the host. Your security team appreciates this.

When you debug "Docker not responding," check the socket, not the client. The client is almost never the problem.
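Because the CLI is only a REST client, the same request can be issued by hand, which is a quick way to tell a socket problem from a client problem. The sketch below assumes curl is installed; `docker_api_get` is a hypothetical helper name and `DOCKER_SOCK` is an assumption added so the socket path can be overridden. /version and /_ping are Engine API endpoints.

```bash
#!/bin/bash
# Sketch: call the Docker Engine API directly, as the CLI does under the hood.

DOCKER_SOCK=${DOCKER_SOCK:-/var/run/docker.sock}

# GET an API path over the UNIX socket; fail early if the socket is missing.
docker_api_get() {   # usage: docker_api_get <path>, e.g. /version or /_ping
  if [ ! -S "$DOCKER_SOCK" ]; then
    echo "socket not found: $DOCKER_SOCK (is dockerd running?)" >&2
    return 1
  fi
  curl --silent --show-error --unix-socket "$DOCKER_SOCK" "http://localhost$1"
}

# Roughly what `docker version` asks the daemon for:
docker_api_get /version || echo "daemon unreachable, check the socket first"
```

If this call hangs or fails while the socket file exists, the problem is in the daemon, not in the docker binary.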

DaemonConfigCheck.ymlYAML

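# io.thecodeforge - devops tutorial

# Check which socket the running daemon is listening on
sudo ss -xl | grep docker.sock

# Validate the config file without starting a second daemon
sudo dockerd --validate --config-file /etc/docker/daemon.json

# Typical daemon.json for production
# (if systemd already starts dockerd with -H fd://, omit "hosts" or the daemon fails to start)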
{
  "hosts": ["unix:///var/run/docker.sock"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "storage-driver": "overlay2"
}

# Verify the socket exists
ls -la /var/run/docker.sock
# Output:
# srw-rw---- 1 root docker 0 Jan 15 14:23 /var/run/docker.sock

Output

srw-rw---- 1 root docker 0 Jan 15 14:23 /var/run/docker.sock

Production Trap:

Never expose the Docker socket over TCP without TLS. That's how you get cryptojackers. Use UNIX sockets or authenticated TLS endpoints only.

Key Takeaway

The Docker client is a CLI wrapper for REST calls. The daemon does all the work. Always secure the socket.

Images, Containers, Networks, Volumes — They're All Objects with Metadata

Docker tracks everything as objects on the daemon host, with their metadata persisted under /var/lib/docker. Images, containers, networks, volumes, secrets, configs. Every object has an ID, metadata, and a lifecycle.

Images are read-only templates. Think of them as frozen filesystem snapshots with metadata about ports, environment variables, and entrypoints. They're stored in layers. Each layer is a diff. Pulling an image means fetching layers. You can inspect an image's layers with docker history.

Containers are running instances of images. A container is a process with namespaces and cgroups applied, plus a writable layer on top of the image layers. When you stop a container, the writable layer persists unless you use --rm.

Networks are virtual Layer 2 segments. Bridge networks connect containers on the same host. Overlay networks span hosts in Swarm mode. Each network object has an IPAM config, subnet, and gateway.

Volumes are persistent data stores managed by Docker. They exist outside the container's union filesystem. Bind mounts are not volumes — they're host directory references. Don't confuse them.

Secrets and configs are objects available to Swarm services. Secrets are encrypted at rest; configs work the same way but are not. Both are mounted as files inside containers. Never store secrets in environment variables.

ObjectInspection.ymlYAML

# io.thecodeforge - devops tutorial

# Inspect an image's layers
$ docker history nginx:alpine
IMAGE          CREATED       CREATED BY                                      SIZE
605c77e624dd   2 weeks ago   /bin/sh -c #(nop)  CMD ["nginx" "-g" "daemon…   0B
cfbafb0ab33a   2 weeks ago   /bin/sh -c #(nop)  STOPSIGNAL SIGQUIT              0B
...

# Show all objects on the system
$ docker system df
TYPE            TOTAL     ACTIVE    SIZE        RECLAIMABLE
Images          12        4         2.345GB     1.2GB (51%)
Containers      8         3         1.2GB       800MB (67%)
Local Volumes   5         2         500MB       300MB (60%)
Build Cache     23        0         0B          0B

Output

TYPE            TOTAL     ACTIVE    SIZE        RECLAIMABLE
Images          12        4         2.345GB     1.2GB (51%)
Containers      8         3         1.2GB       800MB (67%)
Local Volumes   5         2         500MB       300MB (60%)
Build Cache     23        0         0B          0B

Senior Shortcut:

Run docker system df before any cleanup. It shows exact reclaimable space. Prune images with docker image prune -a only after confirming your cache isn't useful.

Key Takeaway

Everything in Docker is an object with metadata. Learn to inspect objects with docker inspect and docker system df. That's your debugging foundation.

● Production incident · Post-mortem · Severity: high

Docker Daemon Crashes Under Load — All Container Operations Fail for 20 Minutes

Symptom

CI builds started hanging at the 'docker build' step. The build command did not return an error — it just hung indefinitely. docker ps from the host also hung. systemctl status docker showed the daemon as 'active (running)' but docker info returned 'Cannot connect to the Docker daemon'. The daemon process (dockerd) was consuming 100% of one CPU core and 12GB of RAM (normally 200MB).

Assumption

The team assumed a network issue — perhaps the Docker registry was unreachable and the pull was hanging. They checked network connectivity to Docker Hub — it was fine. They assumed a disk space issue — df -h showed 40% disk usage, well within limits. They assumed a corrupted image cache — they tried docker system prune but the command also hung.

Root cause

The CI pipeline was running 50 concurrent docker build operations, each sending a 500MB build context to the daemon via the Unix socket. The daemon serialized all API requests through a single goroutine pool. With 50 concurrent builds, each requiring image layer extraction, filesystem operations, and metadata updates, the daemon's internal queue grew unbounded. The daemon's memory usage grew from 200MB to 12GB as it buffered build contexts and layer data. The Go runtime's garbage collector could not keep up, and the daemon became CPU-bound on GC cycles. The root cause was the daemon's single-process architecture — all operations (build, run, network, volume) share the same process and resource pool.

Fix

1. Limited concurrent builds to 10 per host using a semaphore in the CI pipeline.
2. Moved image builds to dedicated build servers separate from container runtime hosts.
3. Enabled BuildKit (DOCKER_BUILDKIT=1), which parallelizes build steps and reduces daemon load.
4. Added daemon resource monitoring: alert when dockerd RSS exceeds 2GB.
5. Configured the daemon with max-concurrent-downloads and max-concurrent-uploads to limit registry operations.
6. Considered migrating to containerd directly (bypassing dockerd) for high-throughput CI environments.
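The first fix, a cap on concurrent builds, can be sketched with xargs -P, which acts as a simple counting semaphore. This is one possible form under stated assumptions, not the pipeline's actual implementation: the `build_all` helper, the `BUILD_CMD` and `MAX_PARALLEL` variables, and the services/ directory names are all hypothetical.

```bash
#!/bin/bash
# Sketch: cap concurrent builds with xargs -P (one simple form of a semaphore).

MAX_PARALLEL=${MAX_PARALLEL:-10}       # the per-host limit chosen in the fix
BUILD_CMD=${BUILD_CMD:-docker build}   # overridable so the helper can be dry-run

# Read build-context directories from stdin, one per line, and run at most
# MAX_PARALLEL builds at a time. $BUILD_CMD is unquoted on purpose so that
# "docker build" splits into a command plus its argument.
build_all() {
  xargs -P "$MAX_PARALLEL" -n 1 $BUILD_CMD
}

# Dry run: substitute echo for the real build command (output order may vary).
printf '%s\n' services/api services/web services/worker | BUILD_CMD=echo build_all
```

With 50 queued contexts and MAX_PARALLEL=10, the daemon never sees more than 10 build requests in flight; the remaining contexts wait inside xargs rather than inside dockerd's memory.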

Key lesson

The Docker daemon is a single process that handles all operations — builds, runs, networking, volumes. Under high concurrency, it becomes a bottleneck.
Limit concurrent docker build operations per host. 50 concurrent builds can exhaust the daemon's memory and CPU.
Use BuildKit (DOCKER_BUILDKIT=1) for builds — it parallelizes steps and reduces daemon load compared to the legacy builder.
Separate build hosts from runtime hosts. Build operations are more resource-intensive than container lifecycle operations.
Monitor dockerd resource usage (CPU, memory, open file descriptors). A daemon consuming >2GB RSS is a sign of overload.

Production debug guide

From daemon crashes to slow builds: systematic debugging paths through the component stack.

Symptom · 01

Docker daemon is unresponsive — all commands hang.

→

Fix

Check daemon process status and resource usage. Run ps aux | grep dockerd to find the PID. Check memory: cat /proc/<pid>/status | grep VmRSS. Check open file descriptors: ls /proc/<pid>/fd | wc -l. Check daemon logs: journalctl -u docker --since '10 minutes ago'. If the daemon is OOM-killed, check dmesg | grep -i oom. Restart: systemctl restart docker.
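The VmRSS check above can be turned into a small threshold test suitable for a cron job or monitoring hook. The sketch assumes a Linux host with /proc; `rss_kb` and `check_rss` are hypothetical helper names, and the 2GB limit is the alert threshold from the incident.

```bash
#!/bin/bash
# Sketch: warn when a process's resident memory exceeds a threshold (Linux only).

# Print VmRSS in kB for a PID, read from /proc/<pid>/status.
rss_kb() {   # usage: rss_kb <pid>
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Succeed when RSS is within the limit; warn and return 1 when it is exceeded;
# return 2 when the value cannot be read.
check_rss() {   # usage: check_rss <pid> <limit-kB>
  local rss
  rss=$(rss_kb "$1" 2>/dev/null) || return 2
  [ -n "$rss" ] || return 2
  if [ "$rss" -gt "$2" ]; then
    echo "WARNING: pid $1 RSS ${rss} kB exceeds ${2} kB" >&2
    return 1
  fi
}

# 2 GB threshold expressed in kB; only runs the check if dockerd is present.
LIMIT_KB=$((2 * 1024 * 1024))
DOCKERD_PID=$(pgrep -x dockerd | head -1)
if [ -n "$DOCKERD_PID" ]; then
  check_rss "$DOCKERD_PID" "$LIMIT_KB" && echo "dockerd RSS within limit"
fi
```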

Symptom · 02

docker build is slow — hangs at 'Sending build context'.

→

Fix

Check build context size: du -sh . in the build directory. Check if .dockerignore exists and excludes large directories (node_modules, .git). Check if BuildKit is enabled: echo $DOCKER_BUILDKIT. Enable it: DOCKER_BUILDKIT=1 docker build . Check daemon logs for layer extraction errors: journalctl -u docker | grep -i 'error\|failed'.
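A rough upper bound on what the CLI will send can be measured by tarring the directory yourself. The sketch below ignores .dockerignore entirely, so it reports the worst case; `context_bytes` and `check_context` are hypothetical helper names and the 50MB default limit is an arbitrary choice for illustration.

```bash
#!/bin/bash
# Sketch: estimate how many bytes the CLI would send as build context.

# Size of the tar stream for a directory, ignoring .dockerignore entirely.
context_bytes() {   # usage: context_bytes <dir>
  tar -cf - -C "$1" . | wc -c
}

# Warn when the context exceeds a threshold (default 50 MB, an arbitrary choice).
check_context() {   # usage: check_context <dir> [limit-bytes]
  local limit=${2:-$((50 * 1024 * 1024))} size
  size=$(context_bytes "$1")
  if [ "$size" -gt "$limit" ]; then
    echo "context is $size bytes; add large paths to .dockerignore" >&2
    return 1
  fi
  echo "context is $size bytes"
}

# Demo on a throwaway directory containing a single 4 KB file.
demo=$(mktemp -d)
head -c 4096 /dev/zero > "$demo/app.bin"
check_context "$demo"
```

Running `check_context .` in a project root before and after adding a .dockerignore entry for a large directory shows how much that directory contributes, provided you also exclude it from the measurement by hand.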

Symptom · 03

docker pull is slow or times out.

→

Fix

Check network connectivity to the registry: curl -v https://registry-1.docker.io/v2/. Check if pull-rate limits are hit: check RateLimit-Remaining header. Check daemon concurrent download limit: docker info | grep 'Max Concurrent Downloads'. Increase if needed in /etc/docker/daemon.json. Check if a proxy or mirror is configured.

Symptom · 04

Container starts but immediately exits with code 0.

→

Fix

Check if the entrypoint/command is correct: docker inspect <image> --format '{{.Config.Cmd}} {{.Config.Entrypoint}}'. Check if the process completes immediately (e.g., echo instead of a long-running server). Check container logs: docker logs <container>. If using a shell-form CMD, /bin/sh runs as PID 1 and the application runs as its child, so signals such as SIGTERM may not reach it.

Symptom · 05

Docker daemon disk usage is growing unboundedly.

→

Fix

Check disk usage: docker system df. Check detailed breakdown: docker system df -v. Identify unused images: docker images --filter dangling=true. Check for orphaned volumes: docker volume ls --filter dangling=true. Prune: docker system prune -a --volumes (WARNING: removes all unused images, containers, networks, and volumes).

Symptom · 06

Containerd is crashing or not responding.

→

Fix

Check containerd status: systemctl status containerd. Check containerd logs: journalctl -u containerd --since '10 minutes ago'. Check if the containerd socket exists: ls -la /run/containerd/containerd.sock. If containerd crashes, dockerd cannot create or manage containers. Restart: systemctl restart containerd. If it keeps crashing, check for corrupted snapshots: ls /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/.

★ Docker Architecture Triage Cheat Sheet

First-response commands when the daemon, builds, pulls, or container creation fail.

Docker daemon is unresponsive or crashed.

Immediate action

Check daemon process and resource usage.

Commands

systemctl status docker && ps aux | grep dockerd

journalctl -u docker --since '10 minutes ago' --no-pager | tail -30

Fix now

If daemon is running but unresponsive, restart: systemctl restart docker. If OOM-killed, increase memory limits or reduce concurrent operations.

Docker Component Responsibilities

Component	Role	Failure Impact	Runs As
Docker CLI	Sends API requests to the daemon	CLI commands fail — containers unaffected	User process
dockerd (daemon)	Manages API, images, networks, volumes	All CLI operations fail; running containers survive only when live-restore is enabled	Root process
containerd	Manages container lifecycle, image pulling	New containers cannot be created — existing ones keep running	Root process
runc	Creates a single container from OCI spec	The specific container creation fails — others unaffected	Short-lived (exits after creation)
containerd-shim	Monitors container process, captures output	Container loses stdout/stderr capture — process still runs	Per-container process
pause (Kubernetes pods only, not plain Docker)	Holds a pod's shared namespaces open	Pod namespaces are destroyed and its containers must be recreated	Per-pod process

⚙ Quick Reference

8 commands from this guide

File	Command / Code	Purpose
io/thecodeforge/architecture_flow.sh	curl --unix-socket /var/run/docker.sock http://localhost/version | python3 -m js...	Component Stack
io/thecodeforge/build_flow.sh	tar -cf - . | wc -c	Image Build Flow
io/thecodeforge/registry_flow.sh	docker save io.thecodeforge/api:1.0 -o /tmp/api-image.tar	Image Distribution
io/thecodeforge/container_creation_flow.sh	docker create --name flow-demo \	Container Creation Flow
io/thecodeforge/storage_architecture.sh	docker info --format '{{.Driver}}'	Storage Architecture
io/thecodeforge/network_architecture.sh	ip addr show docker0	Network Architecture
DaemonConfigCheck.yml	sudo dockerd --config-file /etc/docker/daemon.json	The Core Architectural Model
ObjectInspection.yml	$ docker history nginx:alpine	Images, Containers, Networks, Volumes

Key takeaways

Docker is a stack

CLI -> dockerd -> containerd -> runc -> kernel. Each component has a specific role. The OCI spec standardizes the interface between containerd and runc.

Image build flow

CLI sends context -> daemon parses Dockerfile -> cache lookup per instruction -> execute miss -> commit layer -> tag image. .dockerignore is mandatory.

Container creation flow

CLI -> dockerd API -> containerd -> runc -> kernel syscalls (clone, pivot_root, execve). Every container is a real Linux process on the host.

overlay2 stacks read-only image layers with a writable container layer. Multiple containers share read-only layers: this is the key to Docker's density advantage.

Docker networking uses a Linux bridge, veth pairs, and iptables. The default bridge has no DNS: always use user-defined networks.

The daemon is a single point of failure. Limit concurrent operations, use BuildKit, and monitor daemon resource usage in production.

Common mistakes to avoid

7 patterns

Not understanding that the daemon is a single point of failure

Symptom

all Docker operations hang when the daemon is overloaded

Fix

limit concurrent builds, use BuildKit, separate build hosts from runtime hosts, monitor daemon resource usage.

Sending a 500MB build context without .dockerignore

Symptom

docker build hangs at 'Sending build context' for minutes

Fix

create .dockerignore with node_modules/, .git/, *.log, coverage/. This alone can reduce build time from 5 minutes to 10 seconds.
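
To get a feel for how much these patterns save, the sketch below builds a throwaway project in a temp directory and compares the tarred size with and without the excluded paths. The file names and sizes are made up for the demo. Note the approximation: tar's `--exclude` matching is not the same as Docker's .dockerignore matching, so treat the numbers as an estimate rather than what the CLI would send.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical demo project in a temp directory (not from the article).
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/src" "$demo_dir/node_modules/dep" "$demo_dir/coverage"
printf 'console.log("hi")\n' > "$demo_dir/src/index.js"
head -c 2097152 /dev/zero > "$demo_dir/node_modules/dep/blob.bin"  # 2 MiB stand-in for dependencies
head -c 1048576 /dev/zero > "$demo_dir/coverage/report.bin"        # 1 MiB stand-in for reports
printf 'debug\n' > "$demo_dir/app.log"

# The ignore patterns named in the fix above.
cat > "$demo_dir/.dockerignore" <<'EOF'
node_modules/
.git/
*.log
coverage/
EOF

# Rough size of the context with nothing excluded.
full_bytes="$(tar -C "$demo_dir" -cf - . | wc -c | tr -d ' ')"

# Approximate size with the same paths excluded (tar patterns, not .dockerignore semantics).
slim_bytes="$(tar -C "$demo_dir" --exclude=node_modules --exclude=.git \
  --exclude='*.log' --exclude=coverage -cf - . | wc -c | tr -d ' ')"

echo "without excludes: $full_bytes bytes"
echo "with excludes:    $slim_bytes bytes"

rm -rf "$demo_dir"
```

Running the first `tar ... | wc -c` line in a real project root, before and after writing .dockerignore-style excludes, gives a quick estimate of the context size.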

Using the default bridge network and expecting DNS resolution

Symptom

containers cannot reach each other by hostname.

Fix

create a user-defined bridge network. The embedded DNS server only works on user-defined networks.
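
A minimal sketch of the fix, assuming the hypothetical names `demo-net` and `web`. The `run` helper and `DRY_RUN` switch are illustration only: by default the script prints each command instead of executing it, and `DRY_RUN=0` runs them against a real daemon.

```shell
#!/usr/bin/env bash
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

dns_demo() {
  # demo-net and web are hypothetical names.
  run docker network create demo-net
  run docker run -d --name web --network demo-net nginx:alpine
  # A second container on the same network resolves "web" by name.
  run docker run --rm --network demo-net nginx:alpine ping -c 1 web
  # Clean up.
  run docker rm -f web
  run docker network rm demo-net
}

dns_demo
```

Dropping `--network demo-net` from both `docker run` commands puts the containers on the default bridge, where the same ping by name is expected to fail.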

Writing database data to the overlay2 filesystem

Symptom

slow writes due to copy-up, data loss on container removal.

Fix

use named volumes for databases. Volumes bypass overlay2 and survive container deletion.
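
As a sketch of what that looks like in practice, the commands below attach a named volume to a PostgreSQL container. The volume name `pgdata`, the container name `db`, the placeholder password, and the choice of `postgres:16` (whose data directory is /var/lib/postgresql/data) are assumptions for illustration. The script prints the commands by default; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

volume_demo() {
  # pgdata and db are hypothetical names; change-me is a placeholder password.
  run docker volume create pgdata
  run docker run -d --name db -e POSTGRES_PASSWORD=change-me \
    -v pgdata:/var/lib/postgresql/data postgres:16
  # Removing the container discards its writable layer but not the volume.
  run docker rm -f db
  run docker volume inspect pgdata
}

volume_demo
```

Starting a new container with the same `-v pgdata:...` mount would pick up the existing data files.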

Not setting resource limits on containers

Symptom

one container consumes all host RAM, OOM-killing unrelated containers.

Fix

set --cpus and --memory on every production container. Monitor with docker stats.
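
For illustration, the flags could be applied like this. The limit values (1.5 CPUs, 512 MiB) and the container name `api` are arbitrary assumptions, not recommendations, and the `run` helper prints commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

limits_demo() {
  # Cap the container at 1.5 CPUs and 512 MiB of RAM (example values).
  run docker run -d --name api --cpus 1.5 --memory 512m io.thecodeforge/api:1.0
  # One-shot snapshot of usage against those limits.
  run docker stats --no-stream api
}

limits_demo
```

With a memory limit in place, a runaway container is killed inside its own cgroup instead of pushing the whole host into memory pressure.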

Flushing iptables while Docker is running

Symptom

container port forwarding breaks, containers become unreachable from the host.

Fix

restart dockerd to recreate iptables rules. Configure firewalld to not manage the docker0 bridge.
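
A small check can tell whether Docker's NAT chain is still present before deciding to restart. This is a sketch under assumptions: the helper name `has_docker_chain` and the `CHECK_HOST` switch are hypothetical, the host is assumed to run Docker under systemd, and the host-touching part is skipped unless explicitly enabled because it needs root.

```shell
#!/usr/bin/env bash
set -eu

# Reads `iptables -S` style output on stdin; succeeds if a DOCKER chain is declared.
has_docker_chain() {
  grep -q -- '^-N DOCKER$'
}

# Only touch the host when explicitly asked (needs root and a systemd-managed Docker).
if [ "${CHECK_HOST:-0}" = "1" ]; then
  if sudo iptables -t nat -S | has_docker_chain; then
    echo "DOCKER nat chain present"
  else
    echo "DOCKER nat chain missing; restarting dockerd to recreate its rules"
    sudo systemctl restart docker
  fi
fi
```

Run it as `CHECK_HOST=1 ./check.sh` (the script name is a placeholder) after any firewall change that might have flushed the tables.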

Not authenticating CI runners for Docker Hub pulls

Symptom

CI builds fail with 'toomanyrequests' after unauthenticated pulls exhaust Docker Hub's per-IP rate limit (100 pulls per 6 hours).

Fix

run docker login on all CI agents. Consider a pull-through cache registry.
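
A non-interactive login step for a CI agent might look like the sketch below. The secret names `DOCKERHUB_USER` and `DOCKERHUB_TOKEN` and the fallback user `ci-bot` are hypothetical; the script prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

ci_login() {
  # DOCKERHUB_USER and DOCKERHUB_TOKEN are hypothetical CI secret names.
  local user="${DOCKERHUB_USER:-ci-bot}"
  # --password-stdin keeps the token out of the process list and shell history.
  printf '%s' "${DOCKERHUB_TOKEN:-}" | run docker login -u "$user" --password-stdin
  # Later pulls on this agent are authenticated.
  run docker pull nginx:alpine
}

ci_login
```

For the pull-through cache option, the daemon's `registry-mirrors` setting in /etc/docker/daemon.json can point agents at an internal mirror; the mirror URL depends on your environment.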

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Walk me through the complete flow from 'docker build' to a cached image ...

Q02 · SENIOR

Explain the Docker component stack: CLI, daemon, containerd, runc. What ...

Q03 · SENIOR

How does the OCI spec enable runtime replaceability? Why can you swap ru...

Q04 · SENIOR

Trace the container creation flow from 'docker run' to a running process...

Q05 · SENIOR

How does Docker networking work at the Linux level? Explain veth pairs, ...

Q06 · SENIOR

Why should databases use named volumes instead of the overlay2 filesyste...

Q07 · SENIOR

Your CI pipeline runs 50 concurrent docker build operations and the daem...

Q01 of 07 · SENIOR

Walk me through the complete flow from 'docker build' to a cached image on disk. What happens at each step, and how does layer caching work?

Last updated: July 04, 2026
