Docker Daemon Bottleneck — 50 Concurrent Builds Crash
50 concurrent docker builds consumed 12GB RAM and froze the daemon for 20 minutes.
- docker build: CLI sends context to dockerd -> dockerd executes Dockerfile instructions -> each instruction creates a cached layer -> layers stored under /var/lib/docker/overlay2/
- docker push: dockerd uploads layers to a registry -> registry stores layers by digest -> tags point to manifests
- docker run: CLI sends API request to dockerd -> dockerd delegates to containerd -> containerd invokes runc -> runc configures namespaces/cgroups/filesystem -> exec starts the application
- Docker CLI: HTTP client that talks to the daemon via Unix socket
- dockerd (daemon): manages images, networks, volumes, and the REST API
- containerd: manages container lifecycle, image pulling, and snapshot management
- runc: creates containers by calling kernel syscalls (clone, pivot_root, exec)
Most Docker documentation treats the architecture as a black box — run a command, get a container. This abstraction breaks down when containers fail to start, images pull slowly, or the daemon crashes under load. Understanding the component stack and the data flow between components is essential for production debugging.
Docker is not one program. It is a chain of specialized components: the CLI sends API requests to the daemon, the daemon delegates to containerd, containerd invokes runc, and runc configures the Linux kernel to create an isolated process. Each handoff is a potential failure point. The OCI (Open Container Initiative) spec standardizes the interface between containerd and runc, enabling runtime replaceability.
This article traces the complete end-to-end flow: what happens when you run docker build, how images are stored and distributed, what happens when you run docker run, how networking and storage are wired, and where each component lives on the filesystem. Every section includes production failure scenarios and debugging commands.
Component Stack: CLI, Daemon, containerd, runc, and the OCI Spec
Docker is a chain of four components, plus the OCI specification that defines the interface between the last two. Each has a specific responsibility. Understanding this chain is the foundation for debugging any Docker issue.
Docker CLI (docker): A Go binary that sends HTTP requests to the Docker daemon via a Unix socket (/var/run/docker.sock) or TCP. The CLI does not create containers, build images, or manage networks — it is a thin client. You can replace it with curl: curl --unix-socket /var/run/docker.sock http://localhost/containers/json.
Docker daemon (dockerd): A long-running Go process that manages the Docker API, image storage, network configuration, volume management, and build orchestration. The daemon listens on the Unix socket and processes all API requests. It delegates container lifecycle operations to containerd. The daemon runs as root and has full access to the host.
containerd: A container runtime daemon that manages the complete container lifecycle — pulling images, managing snapshots, creating containers, and handling execution. containerd was originally part of Docker but was extracted as a CNCF project in 2017. It is now used independently by Docker, Kubernetes (via CRI), AWS ECS, GKE, and other platforms. containerd invokes runc to actually create containers.
runc: A lightweight CLI tool that creates a single container from an OCI runtime specification (config.json). runc calls clone() to create a new process with namespaces, configures cgroups, mounts the filesystem via overlay2, drops privileges, and exec's the application process. runc exits after creating the container — it does not manage the lifecycle.
OCI spec: The Open Container Initiative defines two standards: the image spec (how images are packaged as layers + manifest) and the runtime spec (how containers are created as config.json). This standardization enables runtime replaceability — you can swap runc for crun, kata-runtime, or runsc without changing Docker or containerd.
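To make the runtime spec less abstract, the sketch below builds a small subset of a config.json as a Python dict. It is illustrative only: the field names follow the OCI runtime spec, but the helper name `minimal_runtime_spec`, the default `rootfs` path, and the chosen values are assumptions, and a spec generated by containerd contains many more fields (mounts, capabilities, cgroup paths).

```python
import json


def minimal_runtime_spec(args, rootfs="rootfs", memory_limit_bytes=None):
    """Build a small, illustrative subset of an OCI runtime spec (config.json)."""
    spec = {
        "ociVersion": "1.0.2",
        "process": {
            "args": list(args),          # the command runc will execve()
            "cwd": "/",
            "env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
            "user": {"uid": 0, "gid": 0},
        },
        # Directory that becomes the container's root filesystem.
        "root": {"path": rootfs, "readonly": False},
        "linux": {
            # One entry per namespace the runtime should create.
            "namespaces": [
                {"type": "pid"},
                {"type": "network"},
                {"type": "mount"},
                {"type": "uts"},
                {"type": "ipc"},
            ],
        },
    }
    if memory_limit_bytes is not None:
        # Resource limits are applied by the runtime through cgroups.
        spec["linux"]["resources"] = {"memory": {"limit": memory_limit_bytes}}
    return spec


if __name__ == "__main__":
    spec = minimal_runtime_spec(["/bin/sh"], memory_limit_bytes=256 * 1024 * 1024)
    print(json.dumps(spec, indent=2))
```

Because any OCI-compliant runtime reads the same document, swapping runc for another runtime does not change how this file is produced.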
The handoff chain: docker run -> dockerd (API) -> containerd (lifecycle) -> runc (creation) -> kernel (namespaces, cgroups, overlay2). Each handoff is a potential failure point. If dockerd crashes, all operations fail. If containerd crashes, new containers cannot be created but existing ones keep running. If runc fails, the specific container creation fails but the stack above is unaffected.
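The claim that the CLI is only a thin HTTP client can be checked without the CLI at all. The following is a minimal sketch, assuming the default socket path /var/run/docker.sock and a user allowed to read it; `UnixHTTPConnection` and `docker_get` are hypothetical helper names, built only on the Python standard library.

```python
import http.client
import json
import os
import socket

DOCKER_SOCKET = "/var/run/docker.sock"  # default path; may differ on your host


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_get(path, socket_path=DOCKER_SOCKET):
    """Send GET <path> to the daemon and return (status, decoded JSON body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()


if __name__ == "__main__":
    if not os.path.exists(DOCKER_SOCKET):
        print("no Docker socket at", DOCKER_SOCKET)
    else:
        try:
            status, containers = docker_get("/containers/json")
            print(status, [c.get("Names") for c in containers])
        except OSError as exc:
            print("could not talk to the daemon:", exc)
```

If this script hangs or fails while the socket file exists, the problem is in dockerd rather than in the CLI, which narrows the search to the second link of the chain.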
Image Build Flow: From Dockerfile to Cached Layers
When you run docker build, a precise sequence of operations transforms a Dockerfile into a cached, layered image. Understanding this flow explains why builds are slow, why layers are cached, and why image size matters.
Step 1: Send build context. The CLI tars the build context (the directory passed as the PATH argument to docker build, usually . for the current directory) and sends it to the daemon via the Unix socket. The -f flag selects the Dockerfile, not the context. This is the 'Sending build context to Docker daemon' message. The .dockerignore file filters out excluded files before sending. Without .dockerignore, the entire directory (including .git and node_modules) is sent.
Step 2: Parse the Dockerfile. The daemon parses the Dockerfile and executes each instruction sequentially. Each instruction is evaluated against the layer cache.
Step 3: Cache lookup. For each instruction, the daemon checks whether a cached layer exists with the same instruction text and the same parent layer. For COPY and ADD, the checksum of the copied files is also part of the comparison. On a cache hit, the layer is reused (no execution). On a cache miss, the instruction is executed and a new layer is created. The cache is sequential: a miss invalidates all subsequent layers.
Step 4: Execute the instruction. For RUN, the daemon creates a temporary container from the previous layer, executes the command, and captures the filesystem diff as a new layer. For COPY/ADD, the daemon copies files from the build context into a new layer. For ENV/EXPOSE/LABEL, the daemon creates a metadata-only layer (no filesystem change).
Step 5: Commit the layer. The filesystem diff is committed as a new layer under /var/lib/docker/overlay2/. Each layer is a directory containing only the files that changed from the previous layer. The layer is identified by a SHA256 digest.
Step 6: Tag the image. After all instructions are executed, the final layer is tagged with the image name and tag (e.g., my-app:1.0.0). The tag points to a manifest — a JSON file that lists all layers in order.
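The sequential nature of the cache in Step 3 can be modeled in a few lines. This is a simplified sketch, not the builder's real algorithm: it keys each layer on its parent's ID plus the instruction text, and it ignores the file checksums that real builders also compare for COPY and ADD. The names `layer_id` and `build` are hypothetical.

```python
import hashlib


def layer_id(parent_id, instruction):
    """Derive a layer ID from the parent layer ID and the instruction text."""
    return hashlib.sha256(f"{parent_id}\n{instruction}".encode("utf-8")).hexdigest()


def build(instructions, cache):
    """Simulate a build; return [(instruction, cache_hit), ...] and update cache."""
    parent = ""
    results = []
    for instruction in instructions:
        current = layer_id(parent, instruction)
        hit = current in cache
        cache.add(current)  # a miss "executes" the step and stores the new layer
        results.append((instruction, hit))
        parent = current    # the next step is keyed on this layer
    return results


if __name__ == "__main__":
    cache = set()
    dockerfile = ["FROM node:20-alpine", "COPY package.json .", "RUN npm ci", "COPY . ."]
    print([hit for _, hit in build(dockerfile, cache)])  # cold cache: all misses
    print([hit for _, hit in build(dockerfile, cache)])  # unchanged: all hits

    # Changing the second instruction changes its ID, and therefore the
    # parent of every later step, so everything after it misses as well.
    changed = list(dockerfile)
    changed[1] = "COPY package.json package-lock.json ."
    print([hit for _, hit in build(changed, cache)])
```

The third build shows why instructions that change often (such as copying application code) belong as late in the Dockerfile as possible.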
BuildKit vs legacy builder: The legacy builder executes instructions sequentially. BuildKit (DOCKER_BUILDKIT=1) builds a dependency graph and executes independent instructions in parallel. BuildKit also supports --mount=type=secret for build-time secrets without baking them into layers. BuildKit is the default builder in Docker Desktop and in Docker Engine 23.0 and later, and is recommended for all builds.
Image Distribution: Registry, Manifest, and Layer Deduplication
Once an image is built, it needs to be distributed to other machines — CI servers, staging environments, production clusters. This is the registry's job.
Image format: An image is not a single file. It is a collection of:
- Layers: compressed tar archives, each identified by a SHA256 digest
- Manifest: a JSON file that lists the layers in order and points to the image config
- Image config: a JSON file that defines the runtime configuration (env vars, entrypoint, exposed ports, user)
The registry protocol: Docker registries implement the OCI Distribution Spec, an HTTP API for pushing and pulling images. A push proceeds as follows:
1. For each layer and the image config, the client asks the registry whether it already has a blob with that digest
2. The client uploads only the missing blobs
3. The client uploads the manifest last, once every blob it references is present
4. The registry stores blobs by digest and links them to the manifest and its tag
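A push can be sketched against an in-memory stand-in for a registry. In this sketch the client checks each blob by digest, uploads the missing ones, and sends the manifest last. `InMemoryRegistry` and `push` are hypothetical names and no HTTP is involved; only the digest format and the manifest media types are taken from the OCI specs.

```python
import hashlib


def digest(data):
    """Content address of a blob, in the form used by registries."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class InMemoryRegistry:
    """Toy registry: blobs stored by digest, manifests stored by reference."""

    def __init__(self):
        self.blobs = {}      # digest -> bytes
        self.manifests = {}  # "name:tag" -> manifest dict

    def has_blob(self, blob_digest):
        return blob_digest in self.blobs

    def put_blob(self, data):
        self.blobs[digest(data)] = data

    def put_manifest(self, reference, manifest):
        # A manifest is only accepted when every blob it references exists.
        wanted = [manifest["config"]["digest"]] + [l["digest"] for l in manifest["layers"]]
        missing = [d for d in wanted if d not in self.blobs]
        if missing:
            raise ValueError(f"manifest references unknown blobs: {missing}")
        self.manifests[reference] = manifest


def push(registry, reference, layers, config):
    """Upload missing blobs, then the manifest; return digests actually uploaded."""
    uploaded = []
    for blob in [config, *layers]:
        if not registry.has_blob(digest(blob)):
            registry.put_blob(blob)
            uploaded.append(digest(blob))
    manifest = {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": {
            "mediaType": "application/vnd.oci.image.config.v1+json",
            "digest": digest(config),
            "size": len(config),
        },
        "layers": [
            {
                "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
                "digest": digest(layer),
                "size": len(layer),
            }
            for layer in layers
        ],
    }
    registry.put_manifest(reference, manifest)
    return uploaded


if __name__ == "__main__":
    registry = InMemoryRegistry()
    base = b"pretend this is the base image layer"
    print(len(push(registry, "my-app:1.0.0", [base, b"code v1"], b'{"v": 1}')))
    print(len(push(registry, "my-app:1.0.1", [base, b"code v2"], b'{"v": 2}')))
```

The second push uploads fewer blobs than the first because the shared base layer is already stored under its digest.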
Layer deduplication: This is the key efficiency mechanism. If two images share the same base layer (e.g., both use node:20-alpine), the layer is stored once on the registry and once on the local machine. When you pull a second image that shares layers with an existing image, only the unique layers are downloaded. This is why pulling a new version of your app is fast — only the top layers (containing your code) change.
Docker Hub pull-rate limits: Docker Hub limits pulls: 100 per 6 hours for anonymous users, 200 for authenticated free users. The anonymous limit is counted per IP address, so a NAT gateway makes multiple machines appear as one client; authenticated pulls are counted per account. For CI/CD pipelines behind a shared IP, the anonymous limit is hit quickly. The fix: authenticate with docker login, use a pull-through cache, or mirror images to a private registry.
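A quick calculation shows how fast a shared IP exhausts the anonymous allowance. The sketch below uses the 100-pulls-per-6-hours figure quoted above; the runner counts and job rates are made-up inputs, and it assumes every image pull counts once against the limit.

```python
def pulls_in_window(runners, jobs_per_runner_per_hour, pulls_per_job, window_hours=6):
    """Total pulls that one shared IP generates within the rate-limit window."""
    return runners * jobs_per_runner_per_hour * pulls_per_job * window_hours


def minutes_until_limit(runners, jobs_per_runner_per_hour, pulls_per_job, limit=100):
    """Minutes of steady CI activity before the limit is reached (None if never)."""
    pulls_per_hour = runners * jobs_per_runner_per_hour * pulls_per_job
    if pulls_per_hour == 0:
        return None
    return 60 * limit / pulls_per_hour


if __name__ == "__main__":
    # Hypothetical fleet: 5 runners behind one NAT gateway,
    # 4 jobs per runner per hour, 3 image pulls per job.
    print(pulls_in_window(5, 4, 3))      # 360 pulls in 6 hours
    print(minutes_until_limit(5, 4, 3))  # 100.0 minutes until the limit
```

With these inputs the fleet would need more than three times the anonymous allowance, which is why a pull-through cache or a private mirror is usually the durable fix.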
Content trust (DCT): Docker Content Trust uses digital signatures to verify image integrity. When DOCKER_CONTENT_TRUST=1, Docker only pulls signed images. This prevents supply chain attacks where a malicious image is pushed with the same tag as a legitimate image.
Container Creation Flow: From Image to Running Process
When you run docker run, a precise sequence of operations creates an isolated process from an image. This is the most critical flow to understand for production debugging.
Step 1: API request. The CLI sends a POST /containers/create request to dockerd. The request includes the image name, command, environment variables, port mappings, volume mounts, and resource limits.
Step 2: Image resolution. dockerd checks if the image exists locally. If not, it pulls the image from the registry. The image's layers are unpacked into /var/lib/docker/overlay2/.
Step 3: Create container metadata. dockerd creates a container configuration (container JSON) that includes the merged overlay2 directory, network settings, volume mounts, and resource limits. This metadata is stored in /var/lib/docker/containers/<container-id>/.
Step 4: Delegate to containerd. dockerd sends a gRPC request to containerd to create the container. containerd generates the OCI runtime spec (config.json) — a JSON file that defines namespaces, cgroups, mounts, and the process to execute.
Step 5: Invoke runc. containerd invokes runc create with the OCI spec. runc reads config.json and executes kernel syscalls:
- clone(CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC): creates a new process with namespaces
- mount(): mounts /proc, /sys, /dev inside the container
- pivot_root(): changes the root to the overlay2 merge directory
- setuid()/setgid(): drops privileges (if non-root)
- execve(): starts the application process
Step 6: Network setup. dockerd (via libnetwork) creates a veth pair — one end in the container's network namespace, one end on the Docker bridge (docker0). The container gets an IP address from the bridge's subnet. iptables rules are added for port publishing (-p) and inter-container communication.
Step 7: Monitor the process. containerd-shim monitors the container process, captures stdout/stderr, and handles signals. In a Kubernetes pod, a separate pause process holds the shared namespaces open; a plain Docker container has no pause process, and its namespaces live as long as its main process. When the application process exits, containerd-shim reports the exit code to containerd, which reports to dockerd.
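The namespaces created in Step 5 are visible from the host under /proc/<pid>/ns, where each entry is a symlink such as pid:[4026531836]. The sketch below compares two processes and reports which namespaces differ. It assumes a Linux host; the helper names are hypothetical, and reading another process's entries usually requires root.

```python
import os
import re

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target like 'net:[4026531840]' into ('net', 4026531840)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self"):
    """Map each entry in /proc/<pid>/ns to its namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    if not os.path.isdir(ns_dir):
        return result  # not Linux, or the process does not exist
    for name in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # unreadable entry (for example, insufficient privileges)
        result[name] = inode
    return result


def isolated_namespaces(host, container):
    """Names of namespaces whose inode differs between the two processes."""
    return sorted(k for k in container if k in host and host[k] != container[k])


if __name__ == "__main__":
    print(namespaces_of("self"))
```

To compare a real container with the host, obtain its PID with docker inspect --format '{{.State.Pid}}' <container> and pass it to `namespaces_of`; a namespace that appears in neither list of differences is shared with the host.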
Storage Architecture: overlay2, Volumes, and the Filesystem Stack
Docker's storage architecture has three layers: the image layers (read-only, cached), the container layer (writable, per-container), and volumes (persistent, managed separately). Understanding this stack explains why containers start fast, why data disappears, and why database performance differs between containers and bare metal.
overlay2 driver: The default storage driver. It stacks directories (layers) and presents a merged view. Each image layer is a directory under /var/lib/docker/overlay2/. The container's writable layer is a separate directory. The merged view is what the container sees as its root filesystem.
Layer sharing: Multiple containers from the same image share the same read-only layers. Each container has its own writable layer. This is why starting a second container from the same image is nearly instant — no data is copied, only a new writable directory is created.
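Layer sharing and the writable layer can be modeled with plain dictionaries. This is a toy model under simplifying assumptions: files are dictionary entries, a write replaces the whole file (standing in for overlay2's copy-up of the full file into the writable layer), and a deletion is recorded as a whiteout marker. `OverlayView` is a hypothetical name, not a Docker API.

```python
WHITEOUT = object()  # marker meaning "this path was deleted in this layer"


class OverlayView:
    """Merged view of shared read-only layers plus one private writable layer."""

    def __init__(self, lower_layers):
        self.lower = lower_layers  # list of dicts, bottom-most first; never mutated
        self.upper = {}            # per-container writable layer

    def read(self, path):
        # Search the writable layer first, then image layers from top to bottom.
        for layer in [self.upper, *reversed(self.lower)]:
            if path in layer:
                if layer[path] is WHITEOUT:
                    raise FileNotFoundError(path)
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # Writes never touch the image layers; they land in the upper layer.
        self.upper[path] = data

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the path is not visible
        self.upper[path] = WHITEOUT


if __name__ == "__main__":
    image = [{"/etc/os-release": "alpine"}, {"/app/server.js": "v1"}]
    a = OverlayView(image)
    b = OverlayView(image)  # second container: no image data is copied
    a.write("/app/server.js", "patched")
    print(a.read("/app/server.js"), b.read("/app/server.js"))
```

Starting container `b` allocates only an empty dictionary, which mirrors why a second container from the same image starts almost instantly.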
Volumes: Named volumes are directories under /var/lib/docker/volumes/<volume-name>/_data. They are mounted into the container at the specified path. Volumes bypass overlay2 entirely — reads and writes go directly to the host filesystem. This is why databases should use volumes: no copy-up overhead, no overlay2 performance penalty, and data survives container deletion.
Bind mounts: Bind mounts map a specific host directory into the container. They also bypass overlay2. Bind mounts are ideal for development (live code reload) but risky in production (the container can modify host files).
tmpfs mounts: tmpfs mounts store data in memory only. They never touch the disk. Useful for sensitive data (secrets, session tokens) that should not persist.
Storage driver alternatives: overlay2 is the default on all modern Linux distributions. Other drivers include fuse-overlayfs (rootless containers), devicemapper (legacy, deprecated), btrfs (Btrfs filesystem), and zfs (ZFS filesystem). overlay2 is recommended for all use cases unless you have a specific reason to use another driver.
Network Architecture: Bridge, veth, iptables, and DNS
Docker networking is built on Linux networking primitives — virtual bridges, veth pairs, iptables rules, and an embedded DNS server. Understanding these primitives explains why containers can communicate, why ports are published, and why the default bridge network lacks DNS.
The Docker bridge (docker0): When Docker is installed, it creates a Linux bridge called docker0 on the host. This bridge acts as a virtual switch. Each container connects to this bridge via a veth pair.
veth pairs: A veth (virtual Ethernet) pair is a pair of connected network interfaces — packets sent to one end appear on the other. Docker creates a veth pair for each container: one end (eth0) is inside the container's network namespace, the other end (vethXXXX) is attached to the docker0 bridge. This is how containers communicate with each other and the outside world.
iptables rules: Docker adds iptables rules for:
- Port publishing (-p): DNAT rules forward traffic from the host port to the container's IP and port
- Inter-container communication: the default bridge allows all containers to communicate. Starting the daemon with --icc=false blocks this on the default bridge; a user-defined bridge can be created with -o com.docker.network.bridge.enable_icc=false for the same effect.
- Outbound NAT: MASQUERADE rules allow containers to reach the internet via the host's network interface.
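Bridge addressing and port publishing can be pictured as two small tables. The sketch below is a model, not Docker's implementation: it assumes the commonly seen default subnet 172.17.0.0/16, hands out addresses in order, and represents a DNAT rule as a dictionary entry. `BridgeNetwork` and its methods are hypothetical names.

```python
import ipaddress


class BridgeNetwork:
    """Toy model of a Docker bridge: address allocation plus a DNAT table."""

    def __init__(self, subnet="172.17.0.0/16"):
        self.subnet = ipaddress.ip_network(subnet)
        hosts = iter(self.subnet.hosts())
        self.gateway = next(hosts)  # first usable address belongs to the bridge
        self._free = hosts          # remaining addresses go to containers
        self.containers = {}        # container name -> IP address
        self.dnat = {}              # host port -> (container IP, container port)

    def connect(self, name):
        """Attach a container and give it the next free address."""
        address = next(self._free)
        self.containers[name] = address
        return address

    def publish(self, host_port, name, container_port):
        """Model of `-p host_port:container_port`: one DNAT entry per host port."""
        if host_port in self.dnat:
            raise ValueError(f"host port {host_port} is already published")
        self.dnat[host_port] = (self.containers[name], container_port)

    def route_inbound(self, host_port):
        """Where traffic arriving on a host port is forwarded, or None."""
        return self.dnat.get(host_port)


if __name__ == "__main__":
    bridge = BridgeNetwork()
    bridge.connect("web")
    bridge.connect("db")
    bridge.publish(8080, "web", 80)
    print(bridge.gateway, bridge.containers, bridge.route_inbound(8080))
```

The `ValueError` on a duplicate host port corresponds to the familiar "port is already allocated" failure when two containers try to publish the same port.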
DNS resolution: The default bridge network has no DNS resolution — containers can only reach each other by IP. User-defined bridge networks have an embedded DNS server (127.0.0.11) that resolves container names to IP addresses. This is why docker-compose.yml services can reference each other by service name.
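One practical way to tell which kind of network a container is on is to look at its /etc/resolv.conf: on a user-defined network it points at the embedded DNS server 127.0.0.11. The following sketch parses the file's text; the function names are hypothetical and the sample contents are illustrative.

```python
EMBEDDED_DNS = "127.0.0.11"  # Docker's embedded DNS server on user-defined networks


def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf text, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_embedded_dns(resolv_conf_text):
    """True when the container resolves names through Docker's embedded DNS."""
    return EMBEDDED_DNS in nameservers(resolv_conf_text)


if __name__ == "__main__":
    user_defined = "nameserver 127.0.0.11\noptions ndots:0\n"
    default_bridge = "# copied from the host\nnameserver 8.8.8.8\n"
    print(uses_embedded_dns(user_defined), uses_embedded_dns(default_bridge))
```

Inside a running container, the same check can be applied to the output of cat /etc/resolv.conf when name resolution between containers fails.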
Network drivers: Docker supports multiple network drivers:
- bridge: default for single-host setups. Creates a virtual bridge.
- host: container shares the host's network stack. No isolation, best performance.
- none: no network interface other than loopback. The container cannot reach, or be reached from, anything outside itself.
- overlay: VXLAN tunnel for multi-host communication (Docker Swarm).
- macvlan: assigns a MAC address to the container, making it appear as a physical device on the network.
| Component | Role | Failure Impact | Runs As |
|---|---|---|---|
| Docker CLI | Sends API requests to the daemon | CLI commands fail — containers unaffected | User process |
| dockerd (daemon) | Manages API, images, networks, volumes | All CLI operations fail; running containers survive a daemon restart only if live-restore is enabled | Root process |
| containerd | Manages container lifecycle, image pulling | New containers cannot be created — existing ones keep running | Root process |
| runc | Creates a single container from OCI spec | The specific container creation fails — others unaffected | Short-lived (exits after creation) |
| containerd-shim | Monitors container process, captures output | Container loses stdout/stderr capture — process still runs | Per-container process |
| pause (Kubernetes pods only) | Holds the pod's shared namespaces open | Pod namespaces are destroyed; the pod's containers must be recreated | Per-pod process |
Key Takeaways
- Docker is a stack: CLI -> dockerd -> containerd -> runc -> kernel. Each component has a specific role. The OCI spec standardizes the interface between containerd and runc.
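- Image build flow: CLI sends context -> daemon parses Dockerfile -> cache lookup per instruction -> execute miss -> commit layer -> tag image. A .dockerignore file is not strictly required, but treat it as mandatory for any project with large directories such as .git or node_modules.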
- Container creation flow: CLI -> dockerd API -> containerd -> runc -> kernel syscalls (clone, pivot_root, execve). Every container is a real Linux process on the host.
- overlay2 stacks read-only image layers with a writable container layer. Multiple containers share read-only layers — this is the key to Docker's density advantage.
- Docker networking uses a Linux bridge, veth pairs, and iptables. The default bridge has no DNS — always use user-defined networks.
- The daemon is a single point of failure. Limit concurrent operations, use BuildKit, and monitor daemon resource usage in production.
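Limiting concurrent operations can be as simple as routing builds through a bounded worker pool. The sketch below is one possible approach, not a prescribed fix: `run_bounded` is a hypothetical helper, the default of 4 parallel builds is an arbitrary illustrative value to tune against your daemon's memory, and the `runner` parameter exists so the scheduling logic can be exercised without Docker.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_command(command):
    """Run one command and return its exit code."""
    return subprocess.run(command, check=False).returncode


def run_bounded(commands, max_parallel=4, runner=run_command):
    """Run commands with at most max_parallel in flight; exit codes in input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(runner, commands))
```

A CI wrapper might call it as run_bounded([["docker", "build", "-t", name, path] for name, path in services]), where `services` is your own list of (image name, context path) pairs; the remaining builds queue in the client instead of piling up inside the daemon.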
Interview Questions on This Topic
- Q: Walk me through the complete flow from 'docker build' to a cached image on disk. What happens at each step, and how does layer caching work?
- Q: Explain the Docker component stack: CLI, daemon, containerd, runc. What does each component do, and what happens when each one fails?
- Q: How does the OCI spec enable runtime replaceability? Why can you swap runc for gVisor or Kata without changing Docker?
- Q: Trace the container creation flow from 'docker run' to a running process. What kernel syscalls does runc make?
- Q: How does Docker networking work at the Linux level? Explain veth pairs, the docker0 bridge, and iptables rules for port publishing.
- Q: Why should databases use named volumes instead of the overlay2 filesystem? What is the copy-up problem?
- Q: Your CI pipeline runs 50 concurrent docker build operations and the daemon becomes unresponsive. What is happening and how do you fix it?
Frequently Asked Questions
What is the difference between containerd and dockerd?
dockerd (the Docker daemon) is the user-facing server that manages the Docker API, image building, networking, and volumes. containerd is the container runtime that manages the container lifecycle — pulling images, creating containers, and handling execution. dockerd delegates to containerd for container operations. containerd was extracted from Docker in 2017 and is now used independently by Kubernetes and other platforms.
What is the OCI spec and why does it matter?
The OCI (Open Container Initiative) spec defines two standards: the image spec (how images are packaged as layers + manifest) and the runtime spec (how containers are created as config.json). This standardization means any OCI-compliant runtime (runc, crun, kata-runtime, runsc) can run any OCI-compliant image. It enables runtime replaceability — you can swap runc for gVisor without changing Docker.
Can I use containerd directly without dockerd?
Yes. containerd provides its own CLI (ctr) and API (gRPC). Kubernetes uses containerd directly via the CRI plugin, bypassing dockerd entirely. You can use ctr to pull images, create containers, and manage snapshots. This reduces overhead and removes the daemon as a single point of failure.
Why is my docker build so slow at 'Sending build context'?
The build context is the directory passed as the PATH argument to docker build (usually . for the current directory); the -f flag only selects the Dockerfile. Without a .dockerignore file, the context includes node_modules (500MB+), .git history (100MB+), and other large files. The CLI tars this directory and sends it to the daemon over the Unix socket. Create a .dockerignore file to exclude unnecessary files. This alone can reduce build time from minutes to seconds.
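Before guessing at what to exclude, it helps to measure the context. The sketch below sums file sizes under a directory while honoring a simplified reading of .dockerignore. It is an approximation: it uses Python's fnmatch rather than Docker's own matching rules, skips `!` exception lines, and treats a pattern as matching a path or any of its parent directories. All function names are hypothetical.

```python
import fnmatch
import os


def load_ignore_patterns(context_dir):
    """Read .dockerignore patterns, skipping blanks, comments, and '!' exceptions."""
    path = os.path.join(context_dir, ".dockerignore")
    if not os.path.isfile(path):
        return []
    patterns = []
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if line and not line.startswith(("#", "!")):
                patterns.append(line.rstrip("/"))
    return patterns


def is_ignored(rel_path, patterns):
    """True if the path or any parent directory matches a pattern."""
    parts = rel_path.split("/")
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        if any(fnmatch.fnmatch(prefix, pattern) for pattern in patterns):
            return True
    return False


def context_size(context_dir):
    """Approximate number of bytes the CLI would send as build context."""
    patterns = load_ignore_patterns(context_dir)
    total = 0
    for root, _dirs, files in os.walk(context_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, context_dir).replace(os.sep, "/")
            if os.path.islink(full) or is_ignored(rel, patterns):
                continue
            total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    print(f"{context_size('.') / 1_000_000:.1f} MB would be sent as build context")
```

Running it before and after adding a .dockerignore gives a rough measure of how much data each build no longer has to tar and send.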