Multi-Stage Docker Builds — Secrets Survive rm
Even after 'rm', secrets persist in Docker layers—token was recovered by inspecting history.
20+ years shipping production infrastructure and CI/CD at scale. Everything here is grounded in real deployments.
- Core mechanism: Each FROM starts a new stage. Only files explicitly copied via COPY --from= end up in the final image. Everything else (compilers, build caches, source code) is discarded.
- Size impact: Typical reduction from 1.2 GB (single-stage) to 80-150 MB (multi-stage). The build tools never ship.
- Layer caching: BuildKit caches each stage independently. Changing application code does not invalidate the dependency-install stage.
- BuildKit parallelism: Stages with no dependency between them execute in parallel, cutting CI time.
- Security: Build secrets (API keys, tokens) used in early stages never appear in the final image layers.
- Biggest mistake: Forgetting that only explicitly COPYed artifacts survive. If you build a binary in stage 1 but forget to COPY it in stage 2, the final image has nothing to run.
Imagine you're baking a cake. You need mixing bowls, electric beaters, and measuring cups to make it — but when you serve the cake to guests, you don't put all that equipment on the plate with it. Multi-stage Docker builds work the same way: one stage is your messy kitchen where all the building happens, and the final stage is just the clean, finished cake. Your users get the cake. The mixing bowls stay in the kitchen — and never ship to production.
Every second your Docker image takes to pull across a network is a second your deployment is stalled. In Kubernetes environments rolling out hundreds of pods under load, or CI pipelines building dozens of images a day, bloated images are a reliability and cost problem. A Node.js app shipping with its full devDependencies, TypeScript compiler, and build toolchain alongside the production binary is a 1.2 GB image waiting to become a 3 AM outage.
Traditional single-stage Dockerfiles are all-or-nothing. You install the compiler, build the binary, copy the source — and all of it ends up baked into the final layer. Docker does not have a native concept of 'clean up after yourself' within a single build context, because every RUN instruction adds a new immutable layer. Removing files in a later layer does not reclaim space — it just hides them.
Multi-stage builds solve this by introducing multiple isolated stages inside one Dockerfile. Each FROM starts a fresh stage with its own filesystem. Only artifacts you explicitly copy forward survive into the final image. The build tools, intermediate object files, and source code are discarded with the build stage. This is the single most impactful Dockerfile optimization for production images.
Why rm in a Dockerfile Doesn't Shrink Your Image
Multi-stage Docker builds let you compile code in one stage and copy only the final artifact into a minimal runtime image. The core mechanic: multiple FROM statements in a single Dockerfile, each starting a new stage, and COPY --from=target to extract files from earlier stages. This eliminates the need to chain RUN apt-get remove or rm commands that never actually reduce image size because each RUN layer is additive — deleting a file in a later layer only marks it as hidden, not removed. The final image still contains the full weight of every intermediate layer.
A typical Java workflow: stage one uses a full JDK image (e.g., eclipse-temurin:17-jdk) to compile and package a Spring Boot fat JAR. Stage two starts from a slim JRE image (e.g., eclipse-temurin:17-jre-alpine) and copies only the JAR from stage one. The resulting image is ~180 MB instead of ~400 MB. No apt-get purge, no rm -rf /var/lib/apt/lists/* — just a clean COPY. Each stage can use different base images, different working directories, and even different OS families.
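The Java workflow above could look roughly like the sketch below. It assumes a Maven wrapper (mvnw, .mvn) is checked into the repository and that the build produces target/app.jar; both the wrapper and the JAR name are hypothetical and should be matched to your project.

```dockerfile
# syntax=docker/dockerfile:1

# Stage 1: full JDK, used only to compile and package.
FROM eclipse-temurin:17-jdk AS build
WORKDIR /app
# Manifest and wrapper first, so dependency resolution is cached
# until pom.xml changes (assumes a Maven wrapper in the repo).
COPY mvnw pom.xml ./
COPY .mvn .mvn
RUN ./mvnw -B dependency:go-offline
# Source last: editing code only invalidates the layers below this line.
COPY src src
RUN ./mvnw -B package -DskipTests

# Stage 2: slim JRE. Nothing from stage 1 is present unless copied.
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
# target/app.jar is a hypothetical artifact name; adjust to your build output.
COPY --from=build /app/target/app.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```

The JDK, the Maven cache, and the source tree stay in the build stage; the runtime image holds the JRE and one JAR.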
Use multi-stage builds whenever your build process requires tools absent from the runtime image — compilers, build tools, test runners. This is every production Java service, every microservice, every CI pipeline. The alternative is either a bloated image with build-time dependencies or a fragile script that tries to clean up after itself. Multi-stage is the only pattern that guarantees a minimal, reproducible runtime image without sacrificing build-time tooling.
How Multi-Stage Builds Work at the Layer Level
A Dockerfile with multiple FROM instructions creates multiple isolated stages. Each stage starts with a fresh filesystem initialised from its base image. Stages are identified by their index (0, 1, 2...) or by an alias assigned with AS.
The critical insight: only files you explicitly COPY --from=<stage> are transferred between stages. Everything else — compilers, build caches, intermediate object files, source code — exists only in the build stage's filesystem and is discarded when the build completes. The final image contains only the last stage's filesystem plus any files copied into it.
Docker's layer system means each RUN, COPY, and ADD instruction creates an immutable layer. In a single-stage build, RUN rm file creates a NEW layer that hides the file — the original layer with the file still exists in the image. Multi-stage builds avoid this entirely by never including the file in the final stage's layers in the first place.
- Each FROM creates a new isolated filesystem
- Data moves between stages ONLY via COPY --from
- Build stage is destroyed after build — its layers never ship
- Final image = last stage filesystem only
- Secrets in early stages cannot leak into final stage unless explicitly copied
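A small experiment makes the layer behaviour visible. The sketch below is illustrative only: the stage names (single, builder, final) and file names are made up for the demo, and it relies on the BusyBox dd and sha256sum that ship in alpine.

```dockerfile
# Single stage, for comparison: the 100 MB file stays in the image.
FROM alpine:3.19 AS single
RUN dd if=/dev/zero of=/big.bin bs=1M count=100
# This creates a new layer that hides the file; the layer above still ships.
RUN rm /big.bin

# Multi-stage: the big file exists only in the builder stage.
FROM alpine:3.19 AS builder
RUN dd if=/dev/zero of=/big.bin bs=1M count=100 \
 && sha256sum /big.bin > /big.sha256

# Only the small checksum file crosses into the final stage.
FROM alpine:3.19 AS final
COPY --from=builder /big.sha256 /big.sha256

# Compare the two results:
#   docker build --target single -t demo:single .
#   docker build --target final  -t demo:final .
#   docker images demo
```

If the mechanism described above holds, demo:single should come out roughly 100 MB larger than demo:final even though neither container shows /big.bin.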
Cause: Copying all source code (COPY . .) before installing dependencies. When any source file changes, Docker invalidates the COPY layer and every subsequent layer, including the dependency installation layer.
Effect: Every code change triggers a full npm install or go mod download, adding 30-120 seconds to build time depending on dependency count. In CI, this multiplies across dozens of daily builds.
Action: Copy dependency manifests (package.json, go.mod, pom.xml) BEFORE copying source code. Install dependencies in a layer that only invalidates when the manifest changes. This single reordering typically cuts rebuild time by 60-80%.
- FROM scratch (~0 MB) or FROM alpine (~7 MB) with ca-certificates for HTTPS.
- FROM node:20-slim (~200 MB). Alpine may fail on native modules that require glibc.
- FROM eclipse-temurin:21-jre-alpine (~100 MB). JRE only, not JDK. The compiler is not needed at runtime.
- FROM python:3.12-slim (~150 MB). Alpine's musl libc fails on some C extensions.
- FROM scratch with a statically linked binary. No shell, so a much smaller attack surface, but no debug tools either.
Stage Targeting with --target: Production vs Build vs Test
Docker allows you to stop the build at any named stage using the --target flag. This is invaluable for development workflows, CI, and debugging. Without --target, Docker builds the last stage in the Dockerfile (with BuildKit, only the stages that last stage depends on are executed). With --target, you can request only the stages you need.
- Development: Build only up to the deps or builder stage, which includes all dev dependencies and tooling. Developers get a container with live-reload tools (e.g., nodemon, reflex) without waiting for the full production image to build.
- Testing: Build a test stage that runs unit tests, linters, and security scans. CI can pull this stage, run tests, and discard it without ever building the final production image.
- Production: Build the final runtime stage (runtime or production) that contains only what ships to production.
- Use AS <name> on every FROM you may want to target.
- The last stage in the Dockerfile is the default target when --target is not given.
- Build command: docker build --target production -t myapp:latest .
- --target includes the named stage and all its upstream dependencies
- Stages not in the dependency chain are skipped completely
- Use --target in CI to build test image without waiting for production build
- Great for parallel CI: test stage and production stage can be built independently
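As a rough illustration of the points above, here is one way the stages might be laid out for a Node.js service. The stage names match the ones used in this section; the npm scripts (test, build) and the dist/index.js entry point are assumptions about a hypothetical project.

```dockerfile
# syntax=docker/dockerfile:1

# Shared base: all dependencies, including devDependencies.
FROM node:20-slim AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# Test stage: only built when targeted (or depended on).
FROM deps AS test
COPY . .
# Assumes a "test" script exists in package.json.
RUN npm test

# Build stage: compile, then drop devDependencies.
FROM deps AS build
COPY . .
# Assumes a "build" script that writes to dist/.
RUN npm run build && npm prune --omit=dev

# Production stage: the default target because it is last.
FROM node:20-slim AS production
WORKDIR /app
ENV NODE_ENV=production
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package.json ./
USER node
CMD ["node", "dist/index.js"]

# CI usage:
#   docker build --target test .
#   docker build --target production -t myapp:latest .
```

Because production copies only from build, a BuildKit build of the production target does not run the test stage; CI has to request it explicitly with --target test.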
Use docker build --target test to build only the test stage. Run tests in that image, then run docker build --target production only after the tests pass. This cuts CI pipeline time by 30-50% for typical workflows.
docker build --target <stage> gives you surgical control over which stages execute. Use it to create efficient CI pipelines that separate test from production builds, and to give developers fast feedback images. Name every stage you might want to target.
- --target development or --target dev. This stage has the full toolchain and dev dependencies.
- --target test. Builds only the dependencies and the test execution stage. Faster feedback.
- --target production (or the last stage, which is the default). Minimal, secure, no build tools.
- --target builder or the stage that fails. Quickly inspect state with docker run -it <stage-image> /bin/sh.
BuildKit, Parallel Stages, and Cache Mounts
BuildKit is Docker's modern build engine and the default builder since Docker Engine 23.0. On older versions, enable it by setting DOCKER_BUILDKIT=1 or by using docker buildx. It brings three critical capabilities to multi-stage builds:
- Parallel stage execution: Stages that do not depend on each other run concurrently. If your Dockerfile has a test stage and a build stage that both depend on the dependency-install stage, BuildKit runs test and build in parallel after dependencies are installed.
- Cache mounts: --mount=type=cache persists a directory across builds without invalidating the layer. This is transformative for package managers: mount the npm/pip/go cache directory so dependency downloads are cached across builds even when the layer would otherwise be invalidated.
- Secret mounts: --mount=type=secret provides a temporary file during RUN execution that is never stored in any layer. This is the correct way to use API keys, tokens, and credentials during builds.
- Layer cache: if inputs unchanged, skip layer entirely (0s)
- Cache mount: if layer re-runs, reuse downloaded packages (fast re-download)
- Use both together for maximum build speed
- Cache mounts require BuildKit — they do not work with the legacy builder
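A minimal sketch of a cache mount combined with manifest-first ordering, assuming an npm project with a package-lock.json. The target path /root/.npm is npm's default cache location when the build runs as root.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-slim AS deps
WORKDIR /app
# Layer cache: this layer is reused as long as the manifests are unchanged.
COPY package.json package-lock.json ./
# Cache mount: when the layer does re-run, npm reuses packages it already
# downloaded. The mounted directory is not part of any image layer.
RUN --mount=type=cache,target=/root/.npm npm ci
```

The two caches cover different cases: the layer cache skips the step entirely, and the cache mount makes the step cheaper on the occasions it has to run.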
Use --mount=type=cache for package manager caches. Combine with proper layer ordering (dependency manifest before source code). This typically cuts CI build time from 3-5 minutes to 30-60 seconds for incremental changes.
- --mount=type=secret. Available as a file during RUN, never stored in any layer.
- --mount=type=cache targeting the package manager cache (e.g., /root/.npm, /root/.cache/pip).
- COPY --from=intermediate in the final stage.
Production Patterns: Go, Node.js, and Java
Each language ecosystem has specific multi-stage build patterns that address its unique characteristics. The patterns below are battle-tested in production and handle the most common failure modes.
- Copy dependency manifest (pom.xml, go.mod, package.json) FIRST
- Install/download dependencies in a separate layer
- Copy source code AFTER dependencies are installed
- Build/compile in the last layer of the build stage
- Changing source code only invalidates the build layer, not the dependency layer
- FROM scratch (0 MB). Binary runs directly. COPY ca-certificates from alpine for HTTPS.
- FROM node:20-alpine (~120 MB). Fastest pull, smallest footprint.
- FROM node:20-slim (~200 MB). Alpine's musl libc breaks some native modules.
- FROM eclipse-temurin:21-jre-alpine (~120 MB). JRE only, not JDK.
- FROM python:3.12-alpine (~50 MB). Minimal footprint.
- FROM python:3.12-slim (~150 MB). Alpine requires recompiling C extensions.
Image Size Reduction Comparison Table
One of the strongest arguments for multi-stage builds is the dramatic reduction in image size. Below is a comparison of typical image sizes for common language stacks using single-stage vs. optimal multi-stage builds. These numbers are based on production images (source code + dependencies + runtime) and assume best practices like using slim/alpine bases and stripping debug symbols.
The pattern is consistent: build tools and compilers account for 70-90% of image bulk. Multi-stage builds eliminate them from the final image, leaving only the runtime and the compiled artifacts.
- Go and Rust can use FROM scratch – often < 20 MB
- Node.js minimum ~120 MB (node:20-alpine), but includes Node runtime
- Java JRE alpine ~120 MB, but JDK adds ~700 MB
- Python slim ~150 MB, but C extensions add 200-400 MB
Multi-Stage Build Example: Go (Minimal Pattern)
This section provides a focused, minimal multi-stage Dockerfile for a Go application. It demonstrates the canonical pattern: build stage containing the compiler and source, and a runtime stage with only the binary. The result is a 97% reduction in image size.
Key steps:
1. Use golang:alpine as the build stage. It includes the Go compiler and standard library, but is smaller than the full Debian-based images.
2. Copy go.mod/go.sum first. Layer caching ensures go mod download runs only when dependencies change.
3. Build with CGO_ENABLED=0. This produces a statically linked binary that does not depend on a system libc, allowing FROM scratch as runtime.
4. Use -ldflags='-s -w'. This strips debug symbol tables, reducing binary size by ~30%.
5. Runtime base: scratch. Zero bytes, no shell, no package manager. Maximum security and minimal size.
- CGO_ENABLED=0 produces a fully static binary
- scratch has no shell, no utilities – secure but limited
- Add ca-certificates from builder for HTTPS calls
- For debugging, use ephemeral debug containers
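Put together, the steps above might look like the following sketch. It assumes the main package lives at the repository root, that go.sum exists, and that the binary is called server; the Go version tag is also just an example.

```dockerfile
# syntax=docker/dockerfile:1

# Build stage: compiler, module cache, and source live here only.
FROM golang:1.22-alpine AS builder
# Make sure a CA bundle is present to copy later (no-op if already installed).
RUN apk add --no-cache ca-certificates
WORKDIR /src
# Manifests first so "go mod download" is cached until dependencies change.
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Static binary, debug symbol tables stripped.
RUN CGO_ENABLED=0 go build -ldflags='-s -w' -o /server .

# Runtime stage: empty base plus two files.
FROM scratch
# CA bundle for outbound HTTPS calls.
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /server /server
ENTRYPOINT ["/server"]
```

The exec-form ENTRYPOINT matters here: scratch has no shell, so the shell form would fail at container start.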
Verify the binary with file server (the output should say 'statically linked').
- FROM scratch. Smallest and most secure. Downside: no shell for debugging.
- FROM alpine:3.19 (5-7 MB) and install only what you need. Avoid Debian-based images unless necessary.
- FROM gcr.io/distroless/base. Includes glibc, tzdata, but no shell or package manager. Good balance.
Security: Secret Management and Image Scanning
Multi-stage builds are a security primitive, not just a size optimisation. The build stage isolation means secrets used during compilation never appear in the final image — IF you use the correct mechanisms. The wrong mechanism (ENV, ARG, or inline RUN) leaks secrets into layers permanently.
Three rules for secret management in Docker builds:
1. Never use ENV or ARG for secrets. They persist in image metadata.
2. Never hardcode secrets in RUN commands. They persist in layer history.
3. Always use BuildKit secret mounts. --mount=type=secret provides temporary file access without layer persistence.
Beyond secrets, the final image should be scanned for vulnerabilities. Even a minimal alpine base image may contain packages with known CVEs. Trivy, Grype, and Snyk Container can scan images in CI and block deployment if critical vulnerabilities are found.
- ARG values appear in docker history output for the layers built with them
- ENV values appear in docker inspect, are available to all subsequent layers, and are set in every container started from the image
- RUN echo secret > file stores the secret in that layer permanently
- --mount=type=secret is the mechanism built for this: the secret is available during the RUN step and is never written to a layer
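A hedged sketch of the secret-mount approach for a private npm registry. The secret id npm_token and the target path are choices made for this example; it assumes the token lives in an .npmrc file on the build host.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-slim AS deps
WORKDIR /app
COPY package.json package-lock.json ./
# The secret is mounted at /root/.npmrc for this RUN step only.
# It is not written to the layer and does not appear in docker history.
RUN --mount=type=secret,id=npm_token,target=/root/.npmrc npm ci

# Build (the secret file is read from the host, not from the build context):
#   docker build --secret id=npm_token,src=$HOME/.npmrc -t myapp .
# Verify afterwards:
#   docker history --no-trunc myapp
```

If the .npmrc sits inside the project directory, list it in .dockerignore as well; otherwise a later COPY . . would put it straight back into a layer.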
Secrets leaked this way can be read by running docker inspect or docker history on the image. In shared registries, this means every developer and every CI system with pull access can extract production secrets.
Action: Use BuildKit secret mounts exclusively. Add a scanner with secret detection, such as Trivy, to CI to detect accidentally embedded secrets. Run docker history <image> as a post-build verification step.
- Use --mount=type=secret,id=token to mount the secret as a file during RUN. Never stored in any layer.
- Pass runtime values with docker run -e VAR=value or Kubernetes env vars, not in the Dockerfile.
- Run docker history <image>, then docker inspect <image>. Search for leaked tokens. Add Trivy to CI.
Name Your Build Stages or Get Burned in Debugging
Most devs treat stage names like commit messages: useless until you need them. Don't be that person. Naming stages with descriptive aliases isn't just for clarity in the Dockerfile. It's for survival when you're chasing a 3AM build failure.
When you reference --from=build instead of --from=0, you can read your CI logs without a decoder ring. But the real win comes from docker build --target=test. That single line lets you run integration tests in a throwaway stage without rebuilding the entire pipeline. Your production image stays untouched. Your CI minutes stay human.
Here's the rule: every stage that produces something you might inspect, test, or copy from needs a name. build, test, dev-deps, production. If it only exists to install compilers, leave it unnamed — it's dead code the moment it finishes.
The senior move? Use stage names as documentation. If I see FROM golang:1.21 AS vet-lint, I know exactly what that stage does. If I see FROM node:18 AS build, I know the author has been burned by COPY --from=0.
If you use COPY --from=0 in a multi-stage build, changing the order of any stage above it will silently break your build. Always use names. Always.
Go Distroless or Go Home: Why Scratch Isn't Enough
You've shrunk your Go binary from 800MB to 12MB using multi-stage builds. Congratulations. Now what? Your scratch image has no shell, no ls, no curl. A perfectly minimal attack surface. But also a perfectly useless debugging environment when production misbehaves.
Distroless images from Google (gcr.io/distroless/base) give you the middle ground: they contain essential runtime dependencies — glibc, SSL certificates, timezone data — but no package manager, no shell, no exploit-friendly utilities. Your 12MB Go binary on scratch just became 16MB on distroless. That 4MB overhead buys you the ability to run openssl checks and properly resolve DNS in Kubernetes.
Here's the decision tree: If your app needs SSL certificates (HTTP clients, database connections, gRPC), use distroless. If your app runs as PID 1 and never talks to the outside world, scratch is fine. And if you ever need to exec into a container and troubleshoot, note that the standard distroless images have no shell either; the :debug tag variants add a BusyBox shell, which gives you just enough rope without the noose.
The real hack? Start with distroless, then strip back to scratch only when you've profiled exactly what's unused. Most teams overestimate their binary's independence and underestimate debugging time.
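For comparison with a scratch-based image, here is a sketch of the distroless variant. It assumes a Go module at the repository root and a binary named server (both hypothetical). The build stage is pinned to the Debian 12 (bookworm) flavour so that a dynamically linked binary finds a matching glibc in gcr.io/distroless/base-debian12.

```dockerfile
# syntax=docker/dockerfile:1

# Debian-based build stage; cgo stays enabled, so the binary may link glibc.
FROM golang:1.22-bookworm AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /server .

# Distroless runtime: glibc, CA certificates, tzdata; no shell, no package manager.
# The :nonroot tag runs the process as an unprivileged user.
FROM gcr.io/distroless/base-debian12:nonroot
COPY --from=build /server /server
ENTRYPOINT ["/server"]

# For a troubleshooting build, swap the tag to :debug-nonroot, which adds a
# BusyBox shell. Keep that variant out of production.
```

The same binary would not start on scratch if it ended up dynamically linked, which is one concrete reason to begin with distroless and strip back later.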
Use gcr.io/distroless/base as your base and only copy your binary. It's maintained by Google as an open-source project, and you can't beat that for free.
The Chunky Single-Stage Build
A single-stage Dockerfile installs everything (compiler, test tools, dev dependencies, and runtime) into one image. Each RUN command adds a new layer, and if you delete files in a later layer, the file data remains in the earlier layer, bloating the final image. The real problem: your production container carries a compiler and build artifacts it never executes. This violates the principle of least privilege and expands the attack surface. A chunky build also slows CI, because every layer change invalidates downstream caches. When you push a minor code fix, Docker rebuilds layers from the first change, not from the optimized base. The fix isn't more rm commands; it's separating build and runtime into distinct stages. Single-stage builds feel simple but cost you security, speed, and image hygiene.
How to Implement Multi-Stage Builds
Multi-stage builds let you use multiple FROM statements in one Dockerfile. Each FROM begins a new stage, and only files you explicitly COPY from previous stages survive to the final image. The pattern: build stage installs dependencies and compiles; runtime stage starts from a minimal base (e.g., alpine, distroless) and copies only the binary. This drops image size by 80-90% and removes tooling that attackers could exploit. You can name stages—builder, test, production—and target them with --target. Implementation requires no extra tooling; Docker's built-in parser handles it. Start by moving all build steps (go build, npm install, mvn package) into a stage named build. Then create a production stage with COPY --from=build. Validate with docker history to see only runtime layers survive.
COPY without --from=build copies from your host build context, not the build stage, so your final image will be incomplete or huge.
Secrets Leaked in Docker Image Layers Exposed API Keys to Public Registry
The team assumed that RUN rm /root/.npmrc after npm install would remove the token from the image. They did not understand that Docker layers are immutable: the rm creates a new layer that hides the file, but the original layer containing the token is still present in the image history.
The token was written to /root/.npmrc in one RUN layer, npm install ran in the next layer, and RUN rm /root/.npmrc ran in a third layer. The rm command only marked the file as deleted in the new layer. The token was still recoverable from the earlier layer by inspecting docker history or extracting layers manually. The image was pushed to a public ECR registry without secret scanning.
1. Switched to --mount=type=secret to mount the npm token as a temporary file that never persists in any layer.
2. Added DOCKER_BUILDKIT=1 to CI to enable BuildKit secret mounts.
3. Added Trivy and Hadolint to the CI pipeline: Trivy scans image layers for secrets and vulnerabilities, and Hadolint lints the Dockerfile itself.
4. Rotated the compromised npm token immediately.
5. Made the registry private and added IP-based access controls.
- Docker layers are immutable. RUN rm does not remove data; it creates a new layer that hides the file. The original data is still in the image.
- Never embed secrets in RUN commands. Use BuildKit secret mounts (--mount=type=secret) or multi-stage builds where the secret stage is discarded.
- Always run image security scanning (Trivy, Grype, Snyk Container) in CI before pushing to any registry.
- Public registries are hostile environments. Assume every layer will be inspected by an attacker.
- Verify that the COPY --from=builder path matches the actual build output location. Common mistake: building with WORKDIR /app in the build stage but copying from /build/output in the final stage. Check docker run --rm -it <build-stage-image> ls /app to see what the build stage actually produced.
- Using FROM node:20 as the runtime base pulls in the full Node.js image (~900 MB). Switch to FROM node:20-slim (~200 MB) or FROM node:20-alpine (~120 MB). Also verify that devDependencies are not installed in the final stage: use npm ci --omit=dev.
- Set DOCKER_BUILDKIT=1 or add "features": {"buildkit": true} to /etc/docker/daemon.json. Also check if layer caching is being invalidated by copying files too early in the Dockerfile.
- If a secret shows up in docker history <image> output, use RUN --mount=type=secret,id=npm_token npm install and pass the secret at build time with --secret id=npm_token,src=.npmrc. Verify with docker history that the secret does not appear.
docker history --no-trunc <image> # inspect layer sizes
docker run --rm <image> sh -c 'du -sh /*' # find large directories (quoted so the glob expands inside the container)