Intermediate 11 min · March 06, 2026

Docker Crash on Boot — API Connects Before Postgres Ready

API container exits with ECONNREFUSED on port 5432 because the Postgres service has no healthcheck.

Naren Founder & Principal Engineer

20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.

May 23, 2026
last updated
Quick Answer
  • Container: a running instance of an image — isolated filesystem, network, and process tree sharing the host kernel
  • Image: a read-only blueprint built from layers — each Dockerfile instruction creates one cached layer
  • Dockerfile: the build script that defines the image — instructions are executed top-to-bottom
  • Volume: persistent storage that survives container deletion — named volumes for production, bind mounts for development
  • Containers share the host OS kernel (no guest OS overhead)
  • VMs run a full guest OS per instance (stronger isolation, much heavier)
  • Layer caching: changing one layer invalidates all layers after it — order from least-to-most frequently changing
  • Multi-stage builds: use heavy toolchains during compilation, ship only the output to production
Definition
What Is Docker?

Docker is a containerization platform that packages an application and its dependencies into a lightweight, portable unit called a container. Unlike virtual machines, which run a full guest operating system with its own kernel on virtualized hardware, containers share the host OS kernel and run as isolated processes.

This fundamental difference means containers start in milliseconds, consume far less memory and disk space (megabytes vs gigabytes), and eliminate the overhead of hypervisor virtualization. Docker solves the classic 'it works on my machine' problem by ensuring consistent environments across development, testing, and production — a single Docker image built once runs identically on a laptop, a CI server, or a cloud VM.

Docker Compose extends this by orchestrating multi-container applications, defining services, networks, and volumes in a declarative YAML file. The 'crash on boot' scenario you'll encounter — where your API container starts before Postgres is ready — highlights a critical architectural truth: Docker Compose only guarantees container startup order via depends_on, not service readiness.

Postgres may be running but not yet accepting connections, causing your API to crash. The solution is application-level retry logic (like exponential backoff in your connection code) or a health check wrapper, not just reordering services. This pattern is non-negotiable in production.
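One way to picture the "health check wrapper" mentioned above is a small wait-for-port helper that runs before the application starts. The sketch below is a minimal illustration, not a drop-in tool: the function name `wait_for_port` and its defaults are assumptions, and an open TCP port only shows that something is listening, not that Postgres has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (for example ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A wrapper entrypoint could call `wait_for_port("postgres", 5432)` and start the API only when it returns True. Because "port open" is weaker than "ready for queries", this complements a real readiness probe such as pg_isready rather than replacing it.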

Docker's architecture runs as a client-server application: the CLI (docker) talks to a daemon (dockerd) via a REST API. The daemon manages images (read-only templates with layered filesystems), containers (runnable instances with writable layers), and registries (like Docker Hub, which hosts over 10 million public images).

Images are built from Dockerfiles using a layer-caching system: each instruction (FROM, RUN, COPY) produces a cacheable step, so rebuilds are fast when only the last few lines change. The first five commands you'll master (docker pull, run, ps, logs, and exec) cover roughly 90% of daily workflow.

When not to use Docker: for GUI-heavy desktop apps, real-time systems requiring bare-metal performance, or when the workload needs a different kernel than the host provides (e.g., Linux containers on Windows, which only run inside a Linux VM such as the WSL2 backend).

Plain-English First

Imagine you're moving house. Instead of dismantling every piece of furniture and hoping it fits in the new place, you pack everything — sofa, TV, cables, instruction manuals — into a perfectly sized shipping container. That container can be loaded onto any truck, ship, or train and delivered anywhere. Docker does exactly this for software: it bundles your app, its dependencies, its config, and its runtime into one portable 'container' that runs identically on your laptop, your colleague's machine, or a cloud server in Singapore. No more 'but it works on my machine'.

Environment drift is the root cause of most 'works on my machine' failures. A different Node version, a missing library, an environment variable pointing nowhere — these are not skill problems, they are infrastructure problems. Docker eliminates this class of issue by packaging the entire runtime environment into a portable, immutable container.

Containers are not VMs. They share the host OS kernel and use Linux namespaces and cgroups for isolation. This means containers start in milliseconds and use megabytes of memory, making microservices architectures economically viable. On the same machine that runs three VMs, you can run thirty containers.

Common misconceptions: containers are not inherently insecure (misconfiguration is the problem, not the technology), data inside containers is not persistent by default (you need volumes), and Docker Compose is not just for development (it works in production for single-host deployments).

Why Docker Compose Depends on Application-Level Retry, Not Just Container Order

Docker Compose's depends_on only controls container startup order, not service readiness. When your Java API container starts, it may attempt to connect to Postgres before Postgres is actually ready to accept connections. This is not a Docker bug — it's a fundamental design choice: Compose considers a container 'started' the moment its main process begins, not when the service inside is healthy.

In practice, this means your Spring Boot or Micronaut app will fail its initial connection pool initialization, crash, and potentially enter a restart loop. The fix isn't to add sleep commands — that's fragile and wastes seconds in CI. Instead, use healthcheck blocks in your Compose file to probe actual readiness (e.g., pg_isready for Postgres), then combine with depends_on.condition: service_healthy. Even then, your application code must implement retry logic with exponential backoff for the first connection, because health checks have a polling interval and can still race.

This pattern matters in every multi-service local dev environment and CI pipeline. Without it, you'll see intermittent 'Connection refused' errors that disappear on manual restart — wasting developer time and breaking automated tests. The rule: never trust container order alone; always pair health checks with application-level retry.
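The application-level retry described above can be sketched as a small helper that wraps the first connection attempt. This is an illustration under assumptions, not the article's implementation: `connect_with_backoff` and its parameters are hypothetical names, and it catches Python's built-in `ConnectionError`, whereas a real database driver usually raises its own exception type that you would catch instead.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little jitter spreads out retries when several replicas start together.
            sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError("unreachable")
```

With these defaults the helper waits about 0.5, 1, 2, 4, and 8 seconds between six attempts, roughly 15 seconds in total, before giving up and letting the process exit. Passing `sleep` as a parameter keeps the function easy to test without real delays.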

depends_on is not a readiness guarantee
depends_on only waits for the container to start, not for the service inside to be ready. Always add health checks and application retry logic.
Production Insight
Teams using Docker Compose for integration tests often see flaky failures where the API container crashes on boot because Postgres isn't accepting connections yet.
The exact symptom: 'org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.'
Rule of thumb: always add a healthcheck for Postgres (pg_isready -U postgres) and set depends_on condition: service_healthy, plus configure your connection pool with a 30-second initial retry timeout.
Key Takeaway
Docker Compose's depends_on only guarantees container start, not service readiness.
Always implement health checks for databases and caches in your Compose files.
Your application code must retry initial connections — health checks alone can still race.
[Figure: Docker Crash on Boot: Postgres Not Ready. Why application-level retry logic is needed in Docker Compose. Flow: docker compose up starts containers in dependency order -> the Postgres container starts but is not accepting connections yet -> the app container starts immediately after Postgres -> the app's connection to the DB fails -> the container exits and a restart loop begins -> app-level retry or a healthcheck solves it. Note: depends_on does not wait for readiness; use healthcheck + depends_on condition: service_healthy.]

Containers vs Virtual Machines: Why Docker Is a Fundamentally Different Idea

Most people learn Docker by running commands without understanding the architectural shift underneath. That's fine for getting started, but it bites you the moment something breaks.

A virtual machine (VM) runs a full guest operating system — its own kernel, drivers, system processes — on top of a hypervisor. Your app sits at the top of this tower. Booting a VM can take minutes. It consumes gigabytes of RAM even before your app starts. Scaling ten microservices with VMs means ten full operating systems running simultaneously.

Docker containers take a different path. They share the host machine's kernel directly. Each container gets its own isolated view of the filesystem (via union file system layers), its own network namespace, and its own process tree — but there's no duplicated OS. A container starts in milliseconds. It uses megabytes of overhead instead of gigabytes.

The practical implication: on the same machine where you could run three VMs, you can run thirty containers. That's not a minor efficiency gain — it's the reason microservices architectures became economically viable. When AWS charges you per second of compute, that difference compounds fast.

Containers are not inherently less secure than VMs — they're just differently isolated. A misconfigured container is dangerous, just as a misconfigured VM is. The security story depends on your configuration, not the technology itself.

Kernel sharing trade-off: Because containers share the host kernel, a kernel vulnerability (like CVE-2022-0185 or Dirty Pipe) affects all containers on that host. VMs have a separate kernel per instance, so a kernel vulnerability in one VM does not affect others. For high-security multi-tenant environments (running untrusted code), VMs provide stronger isolation. For single-tenant application workloads, container isolation is sufficient.

io/thecodeforge/container_vs_vm_demo.sh (Bash)
# Compare startup time and resource footprint — run these and watch the difference

# Pull a minimal Linux image (only ~5MB compressed)
docker pull alpine:3.19

# Start a container, run a command, and exit — time the whole thing
time docker run --rm alpine:3.19 echo "Container is alive"
# --rm tells Docker to delete the container after it exits (no cleanup needed)
# alpine:3.19 is the image — think of it as the blueprint
# 'echo ...' is the command to run inside the container

# Now check how much memory the container used at peak
# Run it in the background with resource stats
docker run -d --name resource-demo alpine:3.19 sleep 30
# -d runs in detached (background) mode
# --name gives it a human-readable name instead of a random hash

docker stats resource-demo --no-stream
# --no-stream prints one snapshot instead of a live feed
# Look at the MEM USAGE column — typically under 1MB for alpine doing nothing

# Clean up
docker stop resource-demo && docker rm resource-demo
Output
Container is alive
real 0m0.387s
user 0m0.021s
sys 0m0.018s
NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O
resource-demo 0.00% 632KiB / 15.55GiB 0.00% 796B / 0B 0B / 0B
Containers as Apartments, VMs as Houses
Prefer a VM over a container in these cases:
  • Multi-tenant environments running untrusted code — the shared kernel is a risk.
  • Workloads requiring a different kernel version than the host.
  • Compliance requirements that mandate full OS isolation.
  • For everything else — single-tenant application workloads — containers are the right choice.
Production Insight
The kernel sharing trade-off has real security implications. In 2022, the Dirty Pipe vulnerability (CVE-2022-0847) let an unprivileged process overwrite data in files it could only read. Because containers share the host kernel, every container on an affected host was exposed at once. With VMs the blast radius is smaller: each VM runs its own kernel, so exploiting a vulnerable guest does not directly compromise the host or neighboring VMs. For high-security environments, consider gVisor (a user-space kernel) or Kata Containers (lightweight VMs), which add a stronger isolation boundary while keeping startup close to container speed.
Key Takeaway
Containers share the host kernel — they start in milliseconds and use megabytes of memory. VMs run a full guest OS — stronger isolation but much heavier. For single-tenant workloads, containers are the right choice. For multi-tenant or high-security environments, consider gVisor or Kata Containers.
Container vs VM Selection
If: Single-tenant application workload (API, web server, worker)
Use: Container. Faster startup, lower cost, sufficient isolation.
If: Multi-tenant environment running untrusted code
Use: VM or gVisor/Kata Containers. Stronger isolation required.
If: Workload requires a specific kernel version
Use: VM. Containers share the host kernel.
If: CI/CD pipeline, developer environment
Use: Container. Fast spin-up, disposable, reproducible.

Containerization vs Virtualization Comparison Table

The table below summarizes the key differences between traditional virtual machines and Docker containers. These trade-offs directly impact how you architect, deploy, and secure your applications.

| Aspect | Virtual Machines | Docker Containers |
| --- | --- | --- |
| Startup time | 30 seconds – 5 minutes | Milliseconds to 2 seconds |
| Memory overhead | 512MB – 2GB per instance | 1MB – 50MB per instance |
| OS isolation | Full guest OS per VM | Shared host kernel, isolated namespaces |
| Disk footprint | 5GB – 50GB per image | 5MB – 500MB per image |
| Portability | Hypervisor-dependent (.vmdk, .vhd) | Runs on any Docker host (Linux, Mac, Windows, Cloud) |
| Security isolation | Strong (separate kernel) | Good (namespaces + cgroups, but shared kernel) |
| Best for | Full OS control, strong isolation needs | Microservices, CI/CD pipelines, developer environments |
| Scaling speed | Minutes (VM provisioning) | Seconds (container spin-up) |

When choosing between them, consider your workload's isolation requirements, startup latency tolerance, and operational overhead budget. For most web applications and APIs running on single-tenant infrastructure, containers fit many more instances on the same hardware than VMs, often by an order of magnitude; the exact gain depends on how much memory the application itself needs.
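To make the memory-overhead row of the table concrete, here is a back-of-the-envelope calculation. All figures are assumptions chosen for illustration (a 16 GiB host, an app needing 256 MiB, 1 GiB of guest OS overhead per VM, about 20 MiB of overhead per container); they sit within the ranges in the table but are not measurements.

```python
def instances_that_fit(host_mib: int, app_mib: int, overhead_mib: int) -> int:
    """How many copies of an app fit in host memory, given per-instance overhead."""
    return host_mib // (app_mib + overhead_mib)


# Assumed figures for illustration only.
HOST_MIB = 16 * 1024   # 16 GiB host
APP_MIB = 256          # memory the application itself needs

vms = instances_that_fit(HOST_MIB, APP_MIB, overhead_mib=1024)       # 1 GiB guest OS per VM
containers = instances_that_fit(HOST_MIB, APP_MIB, overhead_mib=20)  # ~20 MiB per container

print(f"VMs: {vms}, containers: {containers}, ratio: {containers / vms:.1f}x")
# prints "VMs: 12, containers: 59, ratio: 4.9x"
```

The ratio grows as the application shrinks: with a 16 MiB app and the same assumed overheads, the same host fits 15 VMs versus 455 containers. Density gains are therefore largest for small services and taper off when the app's own footprint dominates.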

Quick Decision Rule
If you can run it on the same kernel as the host, use a container. If you need a different kernel, full OS isolation, or are running untrusted code in a multi-tenant environment, use a VM or a lightweight VM alternative like Kata Containers.
Production Insight
Many production teams start with containers but eventually need a mix: containers for stateless services and VMs for legacy workloads or compliance requirements. Tools like AWS Fargate and Azure Container Instances abstract away the underlying VM management, giving you container speed with VM-level isolation for each task.
Key Takeaway
Containers are not miniature VMs — they share the host kernel. This fundamental difference makes containers faster and more efficient, but with weaker isolation. Choose based on your security and workload requirements.

Docker Architecture: From CLI to Running Container

When you type docker run, a chain of components works together to create and start a container. Understanding this flow helps you troubleshoot startup failures (like the ECONNREFUSED scenario) and optimize build performance.

  1. Docker CLI (your terminal) sends a REST API request to the Docker daemon. The CLI uses the DOCKER_HOST environment variable to know where the daemon is listening (default: unix:///var/run/docker.sock).
  2. Docker daemon (dockerd) receives the request, checks local image cache, and pulls the image if necessary. It then calls containerd via gRPC to create a container.
  3. containerd is the industry-standard container runtime (used by Docker, Kubernetes, and others). It manages the entire container lifecycle — image transfer, storage, network interfaces, and process orchestration. It calls runc to actually start the container.
  4. runc is the low-level OCI runtime that spawns the container process on the host. It uses Linux namespaces (PID, mount, network, user) and cgroups (CPU, memory, I/O) to isolate the process. Once the container is running, runc exits and the process runs directly under the host kernel.
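Step 1 of the list above, the CLI talking to the daemon over a REST API, can be reproduced without the CLI. The sketch below sends `GET /_ping` (a Docker Engine API endpoint that answers with `OK`) over the default Unix socket using only the standard library. The class and function names are illustrative assumptions, and the code assumes a Unix-like host where `socket.AF_UNIX` is available.

```python
import http.client
import socket
from typing import Optional


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 2.0):
        # "localhost" is only used for the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def ping_daemon(socket_path: str = "/var/run/docker.sock") -> Optional[str]:
    """Send GET /_ping to the daemon; return the body, or None if unreachable."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/_ping")
        return conn.getresponse().read().decode()
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    print(ping_daemon())
```

If the daemon is not running, or the current user cannot read the socket, the function returns None instead of raising. That is the same failure the CLI reports as being unable to connect to the Docker daemon.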

The diagram below visualizes this flow. The important detail: the Docker daemon is not involved in the running container's process — it only coordinates setup. This makes containers lightweight and fast.

```mermaid
graph TD
    A[User runs docker run] --> B[Docker CLI]
    B -->|"REST API on /var/run/docker.sock"| C["Docker Daemon (dockerd)"]
    C -->|gRPC| D[containerd]
    D -->|OCI runc call| E[runc]
    E -->|"clone() with namespaces & cgroups"| F[Container Process]
    F --> G[Host Kernel]
    C -->|image pull| H["Registry (Docker Hub)"]
    H --> C
```

Production Insight
In Kubernetes, you often replace dockerd with containerd directly (no Docker daemon). Understanding this decoupling helps when migrating from Docker Swarm or Docker Compose to Kubernetes — the container runtime stays the same, but the orchestration layer changes. Also, the Docker daemon is a single point of failure if you tie all container management to it; running containers directly via containerd (using crictl or ctr) avoids that bottleneck.
Key Takeaway
The Docker architecture is a layered pipeline: CLI -> daemon -> containerd -> runc. The daemon coordinates but does not run containers directly. This decoupling allows Kubernetes to use containerd independently.
Figure: Docker Architecture: CLI -> Daemon -> containerd -> runc

First 5 Docker Commands Every Developer Should Know

If you're new to Docker, these five commands will cover 90% of your daily workflow. Master them before diving into advanced topics like multi-stage builds or healthchecks.

  1. docker pull: Download an image from a registry (Docker Hub by default). Always specify a tag; never use latest in scripts.

     ```bash
     docker pull postgres:16-alpine
     ```
  2. docker run: Create and start a container from an image. Common flags: -d (detach), -p (port mapping), --name, -v (volume), -e (environment variable).

     ```bash
     docker run -d --name my-postgres -p 5432:5432 -e POSTGRES_PASSWORD=secret postgres:16-alpine
     ```
  3. docker ps: List running containers. Add -a to include stopped containers. Use --format for custom output.

     ```bash
     docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
     ```
  4. docker logs: View logs from a container. Use -f to follow (tail -f style). Combine with --tail to avoid replaying the full log history.

     ```bash
     docker logs -f my-postgres --tail 20
     ```
  5. docker exec: Run a command inside a running container. The -it flags allocate an interactive terminal (useful for debugging).

     ```bash
     docker exec -it my-postgres psql -U postgres
     ```

These commands are the foundation. Once comfortable, add docker compose, docker build, docker images, and docker system prune to your toolkit.

io/thecodeforge/first_5_commands.sh (Bash)
# 1. Pull an image
$ docker pull alpine:3.19

# 2. Run a container interactively
$ docker run -it alpine:3.19 sh
/ # echo "Hello from inside container"
Hello from inside container
/ # exit

# 3. List running containers
$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS   PORTS   NAMES

# 4. View logs of a container (replace <container-id>)
$ docker logs <container-id>

# 5. Execute a command in a running container
$ docker run -d --name test alpine:3.19 sleep 30
$ docker exec test echo "Alive!"
Alive!
$ docker stop test && docker rm test
Output
Hello from inside container
Alive!
Always Clean Up After Yourself
Containers and images can accumulate quickly. Run docker system prune -a --volumes periodically, but carefully: it removes stopped containers, unused networks, all images not used by a container, build cache, and unused volumes (on recent Docker releases, only anonymous ones). Omit --volumes if you want to keep volume data.
Production Insight
In CI/CD pipelines, use docker run --rm to auto-clean containers after they finish. This prevents disk fill-ups on build machines. Also, always pull images with a specific digest (@sha256:...) instead of tags for immutable builds.
Key Takeaway
Five commands — pull, run, ps, logs, exec — form the daily driver set. Practice them until they're muscle memory, then explore compose and volumes.
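The advice above (never rely on latest, and prefer digests for immutable builds) can be turned into a quick lint for scripts and CI configs. The classifier below is a simplified sketch: `pin_level` is a hypothetical helper, and it does not implement the full image reference grammar, only enough to tell a digest pin from a tag from a floating reference.

```python
import re

# A digest pin ends with "@sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def pin_level(image_ref: str) -> str:
    """Classify an image reference as 'digest', 'tag', or 'floating'."""
    if DIGEST_RE.search(image_ref):
        return "digest"
    # A tag is a ':' in the last path component; a ':' earlier is a registry port.
    last_component = image_ref.rsplit("/", 1)[-1]
    if ":" in last_component:
        tag = last_component.rsplit(":", 1)[1]
        return "floating" if tag == "latest" else "tag"
    return "floating"  # no tag means Docker resolves ':latest'


for ref in ["postgres", "postgres:latest", "postgres:16-alpine", "localhost:5000/app"]:
    print(f"{ref} -> {pin_level(ref)}")
```

A CI step could fail the build when any reference comes back as floating. Treating a registry port such as `localhost:5000/app` as untagged is deliberate: the colon there belongs to the host, not to a tag.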

Images, Layers and Dockerfiles: How Docker Actually Builds Your App

A Docker image is a read-only blueprint for creating containers. A container is a running instance of an image — the same relationship as a class and an object in OOP, or a recipe and a meal.

Images are built in layers. Instructions that change the filesystem (RUN, COPY, ADD) each add a new layer on top of the previous one, and Docker caches the result of every step aggressively. This is the single most important thing to understand about Dockerfile efficiency: if step 3 changes, Docker rebuilds step 3 and every step after it, while steps 1 and 2 are served from cache instantly.

This is why experienced engineers always copy dependency manifests (package.json, requirements.txt, go.mod) and install dependencies BEFORE copying application source code. Source code changes every commit; dependencies change rarely. Put the slow, stable work near the top of your Dockerfile so it stays cached.
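The sequential nature of the cache is easier to see in a toy model. The sketch below is not how Docker computes cache keys internally; it assumes a simplified scheme in which each step's key is a hash of its parent's key, the instruction text, and a fingerprint of the files the step reads. The function names and the fingerprint strings are made up for illustration.

```python
import hashlib
from typing import List, Tuple

Step = Tuple[str, str]  # (instruction text, fingerprint of the files it reads)


def cache_keys(steps: List[Step]) -> List[str]:
    """Chain each step's key to its parent's key, so a change propagates forward."""
    keys, parent = [], ""
    for instruction, inputs in steps:
        parent = hashlib.sha256(f"{parent}|{instruction}|{inputs}".encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(before: List[Step], after: List[Step]) -> List[str]:
    """Instructions in `after` whose cache key is not already known from `before`."""
    cached = set(cache_keys(before))
    return [step[0] for step, key in zip(after, cache_keys(after)) if key not in cached]


build_1 = [
    ("FROM node:20-alpine", ""),
    ("COPY package.json package-lock.json ./", "manifest-v1"),
    ("RUN npm ci --omit=dev", ""),
    ("COPY src/ ./src/", "src-v1"),
]

# Second build: only application source changed, so only the last step reruns.
build_2 = build_1[:3] + [("COPY src/ ./src/", "src-v2")]
print(rebuilt_steps(build_1, build_2))

# Third build: a dependency manifest changed, so everything after it reruns too.
build_3 = [build_1[0], (build_1[1][0], "manifest-v2")] + build_1[2:]
print(rebuilt_steps(build_1, build_3))
```

The second build reruns only the source COPY. The third reruns the manifest COPY, the install, and the source COPY, even though the source files are unchanged, because their keys depend on the changed parent.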

Multi-stage builds are the other major pattern worth knowing early. You use one image (with compilers, build tools, dev dependencies) to build your app, then copy only the compiled output into a minimal runtime image. Your final image contains zero build tooling — smaller, faster, and with a dramatically reduced attack surface.

Let's build a realistic Node.js API with both patterns applied — this is what a production-ready Dockerfile actually looks like, not the toy examples you usually see.

Layer cleanup in the same RUN: Each RUN creates a new layer. If you download a 200MB package in one RUN and delete it in the next RUN, the 200MB still exists in the first layer — layers are additive. Always chain download and cleanup in the same RUN with &&.

io/thecodeforge/Dockerfile (Dockerfile)
# ── STAGE 1: Build Stage ──────────────────────────────────────────────────────
# Both stages use the same Alpine-based Node image; this stage installs dependencies
FROM node:20-alpine AS builder
# 'AS builder' names this stage so we can reference it later
# node:20-alpine uses Alpine Linux — much smaller than node:20-bullseye

# Set the working directory inside the container
WORKDIR /app

# COPY dependency files FIRST, before application code
# Docker caches this layer. If package.json and package-lock.json haven't changed,
# npm ci won't re-run even if your source code changed. This saves minutes per build.
COPY package.json package-lock.json ./

# Install only production dependencies (saves ~200MB vs installing devDependencies)
RUN npm ci --omit=dev
# npm ci is faster and stricter than npm install — it respects package-lock.json exactly

# NOW copy application source code
# Changing any source file only invalidates from this line forward
COPY src/ ./src/

# ── STAGE 2: Production Runtime Stage ─────────────────────────────────────────
# Start fresh from a clean base image: only what is copied in below ships (note: node:20-alpine itself still includes npm)
FROM node:20-alpine AS production

# Run as a non-root user — critical for production security
# node:alpine ships with a 'node' user built in
USER node

WORKDIR /app

# Copy only what we need from the builder stage — not the entire filesystem
COPY --from=builder --chown=node:node /app/node_modules ./node_modules
COPY --from=builder --chown=node:node /app/src ./src
COPY --chown=node:node package.json ./
# --chown ensures the node user owns these files, not root

# Document which port the app listens on (informational — doesn't actually publish it)
EXPOSE 3000

# Define the command to run when a container starts from this image
# Use array form (exec form) — NOT string form — to ensure signals are handled correctly
CMD ["node", "src/server.js"]
Output
# Build the image — run from the directory containing your Dockerfile
$ docker build -t my-node-api:1.0.0 .
Sending build context to Docker daemon 48.13kB
Step 1/11 : FROM node:20-alpine AS builder
---> 3f4d90098f5b
Step 2/11 : WORKDIR /app
---> Using cache
Step 3/11 : COPY package.json package-lock.json ./
---> Using cache <- dependencies layer served from cache!
Step 4/11 : RUN npm ci --omit=dev
---> Using cache <- install step also cached — build is fast
Step 5/11 : COPY src/ ./src/
---> 8c3a1b2d4e5f <- only this layer rebuilt (source changed)
...
Successfully built a7b3c9d1e2f4
Successfully tagged my-node-api:1.0.0
# Check the final image size
$ docker image ls my-node-api
REPOSITORY TAG IMAGE ID CREATED SIZE
my-node-api 1.0.0 a7b3c9d1e2f4 12 seconds ago 142MB
# Compare: the builder stage alone would be ~380MB with all dev tooling
Layers as Transparent Slides
  • Each layer is a diff on top of the previous layer. If the base changes, the diff no longer applies.
  • Docker cannot know if a later instruction depends on the changed content in an earlier layer.
  • The cache is sequential, not selective — Docker rebuilds from the first invalidated layer onward.
  • This is why layer ordering (least-change to most-change) is the single most impactful Dockerfile optimization.
Production Insight
The cleanup-in-same-layer rule is the most common cause of bloated images. A team's image was 1.2GB because they ran apt-get install in one RUN and apt-get clean in the next. The 800MB apt cache persisted in the first layer. Fix: chain with && and clean up in the same RUN. This alone reduced their image from 1.2GB to 340MB.
Key Takeaway
Docker builds images as a stack of cached layers. Order instructions from least-to-most frequently changing. Copy dependency manifests before source code. Chain cleanup in the same RUN as the operation. This single optimization can turn 5-minute rebuilds into 10-second rebuilds.
Layer Ordering Strategy
If: Base image (FROM)
Use: First layer. Changes rarely. Cached indefinitely until the tag is updated.
If: System dependencies (apt-get install, apk add)
Use: Second layer. Changes occasionally. Chain with && and clean up in the same RUN.
If: Dependency manifests (package.json, requirements.txt)
Use: Third layer. Changes when dependencies change. Copy BEFORE source code.
If: Dependency installation (npm ci, pip install)
Use: Fourth layer. Changes when dependencies change. Cached until manifests change.
If: Source code (COPY . . or COPY src/)
Use: Last layer. Changes on every code edit. Must be the final COPY to maximize cache.

Volumes and Docker Compose: Persistence and Multi-Container Orchestration

Containers are ephemeral by design. Data written inside a container lives in its writable layer and is lost when the container is removed or recreated. That's perfect for stateless services, but databases, file uploads, and logs need to outlive any single container. Docker volumes solve this by mounting a storage location from the host (or a managed volume) into the container's filesystem.

There are three storage mechanisms: bind mounts (link a specific host directory into the container — great for local development where you want live code reloading), named volumes (Docker manages the storage location — best for databases in production), and tmpfs mounts (in-memory only — useful for sensitive data you never want written to disk).

Real applications are never a single container. You have an API, a database, a cache, maybe a background worker. Running and networking these manually with individual docker run commands is error-prone and impossible to reproduce reliably. Docker Compose lets you define your entire multi-container application in one YAML file and bring it all up with a single command.

Here's a complete, realistic Compose setup for a Node.js API backed by PostgreSQL and Redis — the stack you'll encounter in most backend roles.

The depends_on trap: depends_on without condition: service_healthy only waits for the container to START — not for the process inside to be READY. Postgres takes 5-15 seconds to initialize. Without service_healthy, your API will crash on boot trying to connect to a database that is not accepting connections yet. This is the single most common cause of flaky Docker Compose environments.

io/thecodeforge/docker-compose.yml (YAML)
# Docker Compose V2 format (no 'version' key needed with modern Docker Desktop)
services:

  # ── The API service ───────────────────────────────────────────────────────
  api:
    build:
      context: .           # Build from the Dockerfile in the current directory
      target: production   # Use the 'production' stage from our multi-stage Dockerfile
    container_name: my-api
    ports:
      - "3000:3000"        # Map host port 3000 -> container port 3000
    environment:
      # Reference values from a .env file — never hardcode secrets in Compose files
      NODE_ENV: production
      DATABASE_URL: postgresql://api_user:${DB_PASSWORD}@postgres:5432/app_db
      REDIS_URL: redis://redis:6379
      # 'postgres' and 'redis' are the service names below — Docker's internal
      # DNS resolves them automatically within the shared network
    depends_on:
      postgres:
        condition: service_healthy   # Wait until postgres passes its health check
      redis:
        condition: service_started
    restart: unless-stopped          # Restart on crash, but not if manually stopped

  # ── PostgreSQL database ───────────────────────────────────────────────────
  postgres:
    image: postgres:16-alpine        # Always pin a specific version — never use 'latest'
    container_name: my-postgres
    environment:
      POSTGRES_DB: app_db
      POSTGRES_USER: api_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}   # Pulled from .env file
    volumes:
      - postgres_data:/var/lib/postgresql/data
      # Named volume — Docker manages where this lives on the host.
      # Database files survive 'docker compose down' and container rebuilds.
      - ./db/init.sql:/docker-entrypoint-initdb.d/init.sql:ro
      # Bind mount an init script — runs once when the DB is first created.
      # :ro makes it read-only inside the container (good security habit)
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U api_user -d app_db"]
      interval: 5s     # Check every 5 seconds
      timeout: 5s      # Fail if no response in 5 seconds
      retries: 5       # Mark unhealthy after 5 consecutive failures
      start_period: 30s  # Startup grace period: failed checks in this window don't count toward retries

  # ── Redis cache ───────────────────────────────────────────────────────────
  redis:
    image: redis:7-alpine
    container_name: my-redis
    command: redis-server --appendonly yes
    # --appendonly yes enables AOF persistence — data survives Redis restarts
    volumes:
      - redis_data:/data

# Named volumes must be declared at the top level
# Docker creates and manages these — they persist across 'docker compose down'
volumes:
  postgres_data:
  redis_data:
Output
# Start everything (add -d for detached/background mode)
$ docker compose up -d
[+] Running 6/6
✔ Network my-app_default Created
✔ Volume "postgres_data" Created
✔ Volume "redis_data" Created
✔ Container my-postgres Healthy
✔ Container my-redis Started
✔ Container my-api Started
# Check all services are running
$ docker compose ps
NAME IMAGE COMMAND STATUS PORTS
my-api my-app-api "docker-entrypoint.s…" Up 12 seconds 0.0.0.0:3000->3000/tcp
my-postgres postgres:16-alpine "docker-entrypoint.s…" Up 18 seconds 5432/tcp
my-redis redis:7-alpine "docker-entrypoint.s…" Up 18 seconds 6379/tcp
# Tail logs from a specific service
$ docker compose logs -f api
my-api | Server listening on port 3000
my-api | Database connection established
my-api | Redis connection established
# Tear down (volumes are preserved by default)
$ docker compose down
# Add --volumes to also delete the named volumes (WARNING: deletes all DB data)
docker compose down vs docker compose down -v
  • docker compose down -v deletes all named volumes for the project, including databases with days of data.
  • There is no undo. Once volumes are deleted, data is gone unless backed up.
  • Developers often use it thinking it is a 'clean restart' — it is a destructive operation.
  • Always back up volumes before running down -v. Use: docker run --rm -v vol:/data -v $(pwd):/backup alpine tar czf /backup/backup.tar.gz -C /data .
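The backup command in the last bullet can be wrapped in a pair of shell functions so that the restore path is written down before it is needed. This is a minimal sketch, assuming a POSIX shell and the alpine image used in that command; the function names backup_volume and restore_volume and the archive naming are hypothetical, not part of Docker.

```shell
#!/bin/sh
# Hypothetical helpers: names and archive layout are assumptions for illustration.

# backup_volume VOLUME DEST_DIR
# Archives the contents of a named volume into DEST_DIR/VOLUME.tar.gz.
# DEST_DIR should be an absolute path so Docker treats it as a bind mount.
backup_volume() {
  volume=$1
  dest_dir=$2
  docker run --rm \
    -v "$volume":/data:ro \
    -v "$dest_dir":/backup \
    alpine tar czf "/backup/$volume.tar.gz" -C /data .
}

# restore_volume VOLUME SRC_DIR
# Unpacks SRC_DIR/VOLUME.tar.gz into the named volume (Docker creates it if missing).
restore_volume() {
  volume=$1
  src_dir=$2
  docker run --rm \
    -v "$volume":/data \
    -v "$src_dir":/backup:ro \
    alpine tar xzf "/backup/$volume.tar.gz" -C /data
}
```

A call would look like backup_volume <volume-name> "$(pwd)"; run docker volume ls first, because Compose normally prefixes the project name to the volume. For a live database, stop the container before archiving raw data files, or take a logical dump instead, since files copied mid-write may not restore cleanly.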
Production Insight
The depends_on with service_healthy pattern is not just for databases. Any service with initialization time — Redis, Elasticsearch, Kafka, message queues — needs a healthcheck and a depends_on condition. Without it, dependent services will crash on boot and enter a restart loop, delaying deployments and creating flaky CI pipelines.
Key Takeaway
Named volumes persist data across container restarts — they are the production default. depends_on with condition: service_healthy prevents race conditions between services. docker compose down preserves volumes; down -v deletes them. Always pin image versions — never use 'latest' in production.
Volume Type Selection
If: Production database or stateful service
Use: Named volume (postgres_data:/var/lib/postgresql/data). Docker manages the path. Portable.
If: Development (live code reload, config files)
Use: Bind mount (-v ./src:/app/src). Direct access to host files. Fast iteration.
If: Sensitive data that should never touch disk
Use: tmpfs mount (--tmpfs /secrets:size=1m). In-memory only. Deleted on container stop.
If: Shared config files across multiple containers
Use: Named volume with :ro (read-only) flag. Prevents accidental modification by any container.

Why Your Docker Builds Break at 2 AM: The Registry Rate-Limit Ambush

You've tuned Dockerfiles, layered images, and pinned base tags. Then your CI pipeline fails at 2 AM because Docker Hub throttled your pull. That's not a networking glitch; it's a rate limit. Anonymous pulls are counted per IP address, and authenticated free accounts get a somewhat higher quota (historically 100 and 200 pulls per six hours; Docker has revised these numbers, so check the current limits). Your build server is sharing an office IP with twenty other devs, and every anonymous pull from that IP drains the same quota. You're screwed.

Fix this before it bites. Authenticate every Docker client that pulls images, even on your local laptop. Add credsStore to ~/.docker/config.json and log in via docker login. For CI, inject a read-only personal access token (PAT) as a secret. Never rely on anonymous pulls for production builds. Pointing clients straight at registry-1.docker.io doesn't help either: that hostname is Docker Hub itself, not a mirror, so you still get throttled. Run your own pull-through cache with Nexus or Harbor. That way your CI pulls an image from Docker Hub once, caches it locally, and only goes back upstream on a cache miss.
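For the CI half of that advice, a non-interactive login might look like the sketch below. It assumes the CI system exposes the credentials as environment variables; the names DOCKERHUB_USERNAME and DOCKERHUB_TOKEN and the function name ci_docker_login are placeholders chosen for this example, not names any CI system defines.

```shell
#!/bin/sh
# ci_docker_login: non-interactive Docker Hub login for a CI job.
# DOCKERHUB_USERNAME and DOCKERHUB_TOKEN are assumed secret names.
ci_docker_login() {
  if [ -z "${DOCKERHUB_USERNAME:-}" ] || [ -z "${DOCKERHUB_TOKEN:-}" ]; then
    echo "DOCKERHUB_USERNAME and DOCKERHUB_TOKEN must be set" >&2
    return 1
  fi
  # --password-stdin keeps the token out of the process list and shell history.
  printf '%s' "$DOCKERHUB_TOKEN" |
    docker login --username "$DOCKERHUB_USERNAME" --password-stdin
}
```

Call ci_docker_login once at the start of the job, before the first docker pull or docker build. Failing fast when a secret is missing is a deliberate choice here: a silent fallback to anonymous pulls is exactly the situation that gets throttled.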

docker-compose.registry-cache.yml · YAML
# io.thecodeforge - devops tutorial

version: '3.8'
services:
  registry:
    image: registry:2
    ports:
      - "5000:5000"
    environment:
      REGISTRY_PROXY_REMOTEURL: https://registry-1.docker.io
      REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY: /data
    volumes:
      - registry_data:/data

volumes:
  registry_data:
Output
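This creates a local pull-through cache. Then point every Docker daemon at it with --registry-mirror http://localhost:5000 (use the cache host's address instead of localhost for daemons on other machines). Images already in the cache are served locally instead of being pulled from Docker Hub again.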
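The mirror can also be set persistently through the daemon configuration file rather than a command-line flag. A minimal sketch, assuming a Linux host where the daemon reads /etc/docker/daemon.json; the helper name print_mirror_config is made up for this example.

```shell
#!/bin/sh
# print_mirror_config URL
# Prints a daemon.json body that routes Docker Hub pulls through URL.
# Hypothetical helper; it only prints, so nothing on the host is changed.
print_mirror_config() {
  printf '{\n  "registry-mirrors": ["%s"]\n}\n' "$1"
}
```

Review the output of print_mirror_config http://localhost:5000, merge it into any existing daemon.json by hand (overwriting the file would drop other settings), then restart the daemon. docker info lists the active mirrors under "Registry Mirrors", which is a quick way to confirm the setting took effect.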
Production Trap:
Don't assume your cloud provider's Docker Hub mirror is free. AWS, GCP, and Azure all rate-limit their public mirrors differently. Measure your actual pull throughput before relying on them.
Key Takeaway
Always authenticate your Docker client and use a local pull-through cache for any shared build infrastructure.

Stop Using Latest: How Unpinned Tags Corrode Your Deployments

image: postgres:latest looks harmless on day one. Six weeks later, the latest tag silently points to Postgres 16 while your code expects Postgres 14. Your staging environment passes tests because it is still running the image it pulled weeks ago. Production pulls the new tag, crashes on a breaking change, and your pager goes off. That's not a CI failure; it's a trust failure.

Every tag is mutable; only a digest is not. latest, alpine, bullseye: all of them get overwritten by maintainers. Pin to a digest (sha256:abc123...). Or at minimum pin to a minor version: postgres:14.10. Never use latest in any environment you care about. Not dev, not staging, not prod. If you use Docker Compose for local dev, still pin it. The same bug applies when a co-worker pulls the compose file on a fresh machine three months later.

Automate your image update policy with Renovate or Dependabot. They open PRs for digest bumps. You review, approve, and know exactly what changed. No surprises.

docker-compose.pinned.yml · YAML
# io.thecodeforge - devops tutorial

version: '3.8'
services:
  db:
    image: postgres@sha256:f9183f69e6ca1b1b2f9d1e0faee24f466ce5e92ec1cd20e7cd6e3a0f0a0b5e7c
    # not: image: postgres:latest
    environment:
      POSTGRES_DB: payments
      POSTGRES_USER: app
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password

secrets:
  db_password:
    file: ./secrets/db_password.txt
Output
Pulling by digest ensures every deploy uses the exact same image. No drift. No mystery.
Senior Shortcut:
Use docker inspect --format '{{index .RepoDigests 0}}' <image> to get the digest of an already-pulled image. Paste that into your compose file.
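The same lookup can be scripted so that pinning a tag becomes a one-line step. This is a sketch under the assumption that the image is pullable from the current host; resolve_digest is a hypothetical helper name.

```shell
#!/bin/sh
# resolve_digest IMAGE
# Pulls IMAGE, then prints its repo@sha256:... reference for pinning.
# Hypothetical helper; returns non-zero if the pull fails.
resolve_digest() {
  image=$1
  docker pull --quiet "$image" >/dev/null || return 1
  docker inspect --format '{{index .RepoDigests 0}}' "$image"
}
```

Running resolve_digest postgres:14.10 prints a reference that can replace the tag in the image: line. Pulling first matters: inspecting a stale local copy would pin whatever happened to be cached rather than what the tag currently points to.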
Key Takeaway
Never use latest or any mutable tag in a Docker Compose file or CI pipeline. Pin to a digest or a specific minor version.

The Hidden Cost of `docker exec` in Production Debugging

You SSH into a box, run docker exec -it web bash, and start poking around. Maybe you install curl. Maybe you grep logs. Ten minutes later you forget to exit. That container is now a snowflake: modified, untracked, and unreproducible. When the container is recreated on the next deploy, those changes vanish. But if you scaled up replicas, only one has your ad-hoc tools. The next on-call engineer is confused. The deployment pipeline can't rebuild your state. You've broken reproducibility.

Production debugging with docker exec is a crutch you should break. Instead, run a sidecar container with the debugging tools you need, controlled by its own health check. Or ship all logs to a central aggregator (ELK, Loki, Datadog) before you need them. If you absolutely must inspect a running process, use docker cp to copy files out, or docker logs --tail 1000 -f to stream output. Never mutate a running production container.

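If you're tempted to install packages inside a container, that's a signal your base image is missing essential debugging tools. Add them to your Dockerfile behind a build arg that defaults to off: ARG DEBUG_TOOLS=false, followed by RUN if [ "$DEBUG_TOOLS" = "true" ]; then apt-get update && apt-get install -y netcat-openbsd strace tcpdump; fi (package names assume a Debian-based image). Rebuild with --build-arg DEBUG_TOOLS=true when you need them.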

docker-compose.debug-sidecar.yml · YAML
# io.thecodeforge - devops tutorial

version: '3.8'
services:
  api:
    image: payments-api:1.2.3
    # debug tools built-in via Dockerfile
    build:
      context: .
      args:
        DEBUG_TOOLS: "true"
  debug:
    image: alpine:3.18
    command: ["sleep", "infinity"]
    network_mode: "service:api"
    # share network namespace with api for tcpdump, netcat, etc.
    depends_on:
      - api
    profiles:
      - debug # only start with: docker compose --profile debug up
Output
Run 'docker compose --profile debug run debug sh' to get a shell sharing the same network. Debug without mutating the API container.
Production Trap:
If you use docker exec to install a tool, then commit the container — that image is now untraceable. Never commit a running container. Rebuild from source.
Key Takeaway
Don't mutate production containers with docker exec. Use sidecars, centralized logging, or build-time debug tools instead.
● Production incident · POST-MORTEM · severity: high

Production Deployment Crashes on Boot — API Connects to Database Before Postgres Is Ready

Symptom
After running docker compose up -d, the API container exited immediately with code 1. docker compose logs api showed: 'Error: connect ECONNREFUSED 172.18.0.2:5432'. The container restarted automatically (restart: unless-stopped) but crashed again with the same error. After 4-5 restarts over 60 seconds, Postgres finished initializing, and the API finally started successfully on the 6th attempt.
Assumption
The team assumed a network configuration issue — maybe the containers were on different networks. They checked docker network inspect and confirmed both containers were on the same network. They assumed a DNS resolution issue — they exec'd into the API container and ran nslookup postgres, which resolved correctly. They assumed the database was misconfigured — they exec'd into the Postgres container and ran psql, which connected successfully.
Root cause
The docker-compose.yml had depends_on: [postgres] without a condition. depends_on without condition: service_healthy only waits for the container to START (docker start returns), not for the process inside to be READY. Postgres takes 5-15 seconds to initialize after the container starts — creating the default database, running init scripts, and opening the port. The API container started immediately after the Postgres container started, before Postgres was accepting connections. The API crashed, Docker restarted it, and this repeated until Postgres was finally ready.
Fix
1. Added a healthcheck to the Postgres service using pg_isready.
2. Changed depends_on to use condition: service_healthy so the API waits until Postgres passes its health check.
3. Added start_period: 30s to the healthcheck to prevent false failures during Postgres initialization.
4. Added a retry loop in the API's startup code as a defense-in-depth measure (connect with exponential backoff for 30 seconds before giving up).
5. Documented the pattern in the team's Docker Compose style guide.
Key lesson
  • depends_on without condition: service_healthy only waits for container start — not process readiness. Always use service_healthy for databases.
  • Postgres, MySQL, Redis, and any service with initialization time needs a healthcheck. Without it, dependent services will crash on boot.
  • A restart loop (container crashing and restarting repeatedly) is a symptom of a race condition, not a network issue. Check startup ordering first.
  • Defense-in-depth: add a connection retry loop in your application code as a second layer of protection, even with healthchecks in place.
  • The start_period option on a healthcheck prevents false failures during slow startup: probe failures inside that window do not count toward retries. Set it to the expected maximum initialization time.
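The retry loop from the fix list lives in the API's own startup code, which is not shown here. As a stand-in, the same idea can be sketched as an entrypoint wrapper in shell: retry a readiness probe with exponential backoff inside a fixed time budget. The function name retry_with_backoff is an assumption of this example, and the budget counts only time spent sleeping, not time spent inside the probe itself.

```shell
#!/bin/sh
# retry_with_backoff MAX_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds. After each failure it sleeps 1s, 2s, 4s, ...
# and gives up once MAX_SECONDS of sleeping have been used.
# Hypothetical helper for illustration.
retry_with_backoff() {
  max_seconds=$1
  shift
  delay=1
  waited=0
  while ! "$@"; do
    if [ "$waited" -ge "$max_seconds" ]; then
      echo "gave up after ${waited}s: $*" >&2
      return 1
    fi
    # Never sleep past the overall budget.
    remaining=$((max_seconds - waited))
    if [ "$delay" -gt "$remaining" ]; then
      delay=$remaining
    fi
    sleep "$delay"
    waited=$((waited + delay))
    delay=$((delay * 2))
  done
}
```

An entrypoint could then run retry_with_backoff 30 pg_isready -h postgres -U api_user -d app_db before starting the server, which assumes the API image ships the Postgres client tools. This complements the healthcheck rather than replacing it: the healthcheck orders startup, while the retry covers a database that restarts later.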
Production debug guide: from startup crashes to slow builds, systematic debugging paths. 6 entries
Symptom · 01
Container exits immediately with code 1 or 137 on startup.
Fix
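Check logs: docker compose logs <service> or docker logs <container>. Exit code 1 is an application error, so check the stack trace. Exit code 137 means the process received SIGKILL (128 + 9), most often from the kernel OOM killer: confirm with docker inspect <container> --format '{{.State.OOMKilled}}' and compare memory usage against limits with docker stats. Exit code 143 is SIGTERM (128 + 15): check whether the container is being stopped by docker stop, an orchestrator, or another process.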
Symptom · 02
Container cannot connect to another container — 'ECONNREFUSED' or 'could not translate host name'.
Fix
Verify both containers are on the same network: docker network inspect <network>. Check if the target container is running: docker compose ps. Check if the target service is healthy (if it has a healthcheck). Verify DNS resolution: docker exec <container> nslookup <service-name>.
Symptom · 03
Docker build is slow — every rebuild takes 3-5 minutes.
Fix
Check layer ordering. Run docker history <image> to see which layers were rebuilt. If the dependency install layer rebuilds on every code change, the Dockerfile copies source code before dependency manifests. Fix: copy package.json/requirements.txt before COPY . . and run install in a separate layer.
Symptom · 04
Container data disappears after restart.
Fix
Check if a volume is mounted: docker inspect <container> --format '{{.Mounts}}'. If no volume is mounted, data lives in the container's writable layer and is lost on docker rm. Fix: add a named volume to docker-compose.yml and restart.
Symptom · 05
docker compose up fails with 'port is already allocated'.
Fix
Check what is using the port: docker ps --format '{{.Ports}}' or ss -tlnp | grep <port>. Either stop the conflicting container or change the host-side port mapping in docker-compose.yml.
Symptom · 06
Image is unexpectedly large — 1GB+ for a simple application.
Fix
Run docker history --no-trunc <image> to see layer sizes. Check if .dockerignore exists — without it, COPY . . includes node_modules and .git. Check if multi-stage builds are used — the final image may contain build tools. Check if package manager cache is cleaned in the same RUN layer.
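The exit codes in Symptom 01 follow the shell convention that a code above 128 means "terminated by signal (code minus 128)". A small lookup function makes that explicit; this is a sketch with a hypothetical name, explain_exit_code, and the hint text is this example's wording rather than Docker output.

```shell
#!/bin/sh
# explain_exit_code CODE
# Maps a container exit code to a first-guess cause.
# Hypothetical helper; codes above 128 are treated as 128 + signal number.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit" ;;
    137) echo "SIGKILL (128+9): OOM kill, docker kill, or stop timeout" ;;
    143) echo "SIGTERM (128+15): stopped by docker stop or an orchestrator" ;;
    *)
      if [ "$1" -gt 128 ]; then
        echo "terminated by signal $(($1 - 128))"
      else
        echo "application error (exit $1): check the logs"
      fi
      ;;
  esac
}
```

Feed it the value from docker inspect <container> --format '{{.State.ExitCode}}'. For 137 specifically, the OOMKilled field in the same State object distinguishes a memory kill from a manual docker kill.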
★ Docker Container Triage Cheat Sheet: first-response commands when containers crash, builds are slow, or services cannot communicate.
Container exits immediately on startup.
Immediate action
Check container logs and exit code.
Commands
docker compose logs <service> --tail 50
docker inspect <container> --format '{{.State.ExitCode}} {{.State.Error}}'
Fix now
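Exit code 1 = app error (check logs). Exit code 137 = SIGKILL, usually OOM (confirm with '{{.State.OOMKilled}}', then increase the memory limit). Exit code 143 = SIGTERM (check what stopped the container: docker stop, an orchestrator reacting to a failed healthcheck, or a Compose restart).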
Container A cannot reach Container B by hostname.
Immediate action
Verify network connectivity and DNS resolution.
Commands
docker network inspect <network> --format '{{range .Containers}}{{.Name}} {{end}}'
docker exec <container-a> nslookup <container-b-service-name>
Fix now
If container-b is missing from the network, add it to the same network in docker-compose.yml. If using default bridge, create a user-defined network.
Docker build is slow: every rebuild takes minutes.
Immediate action
Check which layers are being rebuilt vs cached.
Commands
docker build --progress=plain -t test . 2>&1 | grep -E 'CACHED|RUN|COPY'
docker history <image> --format '{{.Size}} {{.CreatedBy}}' | sort -hr
Fix now
If RUN npm install rebuilds on every change, move COPY package.json before COPY . . . Separate dependency installation from source code copying.
Port already allocated: container cannot start.
Immediate action
Find what is using the port.
Commands
docker ps --format '{{.Names}} {{.Ports}}' | grep <port>
ss -tlnp | grep <port>
Fix now
Stop the conflicting container or change the host-side port in docker-compose.yml (e.g., 3001:3000 instead of 3000:3000).
Container data lost after docker compose down.
Immediate action
Check if volumes were deleted or never created.
Commands
docker volume ls | grep <project>
docker compose config | grep -A2 volumes
Fix now
If docker compose down -v was run, data is gone (check backups). If no volumes defined in docker-compose.yml, add named volumes for stateful services.
Image is unexpectedly large (>500MB).
Immediate action
Inspect layer sizes and check for .dockerignore.
Commands
docker history <image> --format '{{.Size}} {{.CreatedBy}}' | sort -hr
cat .dockerignore 2>/dev/null || echo 'NO .dockerignore FILE'
Fix now
Create .dockerignore. Use multi-stage builds. Chain cleanup in same RUN: RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*.
Virtual Machines vs Docker Containers
Aspect | Virtual Machines | Docker Containers
Startup time | 30 seconds – 5 minutes | Milliseconds to 2 seconds
Memory overhead | 512MB – 2GB per instance | 1MB – 50MB per instance
OS isolation | Full guest OS per VM | Shared host kernel, isolated namespaces
Disk footprint | 5GB – 50GB per image | 5MB – 500MB per image
Portability | Hypervisor-dependent (.vmdk, .vhd) | Runs on any Docker host (Linux, Mac, Windows, Cloud)
Security isolation | Strong (separate kernel) | Good (namespaces + cgroups, but shared kernel)
Best for | Full OS control, strong isolation needs | Microservices, CI/CD pipelines, developer environments
Scaling speed | Minutes (VM provisioning) | Seconds (container spin-up)

Key takeaways

1. Containers share the host OS kernel: they're not mini VMs. This is why they start in milliseconds and use megabytes of memory, making them economically practical for microservices at scale.
2. Docker image layers are cached from top to bottom. Copy dependency manifests and run installs BEFORE copying source code, or every git commit will trigger a full package reinstall.
3. Multi-stage builds are not optional in production: they separate build-time tooling from the runtime image, cutting image sizes by 50-70% and removing attack surface from your deployed artifact.
4. Named volumes persist data across container restarts and rebuilds; depends_on with service_healthy prevents race conditions. Both are non-negotiable for any database-backed service.
5. docker compose down preserves volumes. docker compose down -v deletes them. Always back up volumes before any destructive operation.
6. Always use exec-form CMD (CMD ["node", "server.js"]): shell form silently breaks graceful shutdown in Kubernetes.

Frequently Asked Questions

01. What is Docker used for in real-world software development?
02. Is Docker the same as a virtual machine?
03. Does data inside a Docker container get deleted when the container stops?
04. What is the difference between depends_on and depends_on with condition: service_healthy?
05. How do I reduce the size of my Docker image?