Docker Crash on Boot — API Connects Before Postgres Ready
The API container exits with ECONNREFUSED on port 5432 because the Postgres service has no healthcheck.
20+ years shipping production infrastructure and CI/CD at scale. Written from production experience, not tutorials.
- Container: a running instance of an image — isolated filesystem, network, and process tree sharing the host kernel
- Image: a read-only blueprint built from layers — each Dockerfile instruction creates one cached layer
- Dockerfile: the build script that defines the image — instructions are executed top-to-bottom
- Volume: persistent storage that survives container deletion — named volumes for production, bind mounts for development
- Containers share the host OS kernel (no guest OS overhead)
- VMs run a full guest OS per instance (stronger isolation, much heavier)
- Layer caching: changing one layer invalidates all layers after it — order from least-to-most frequently changing
- Multi-stage builds: use heavy toolchains during compilation, ship only the output to production
Imagine you're moving house. Instead of dismantling every piece of furniture and hoping it fits in the new place, you pack everything — sofa, TV, cables, instruction manuals — into a perfectly sized shipping container. That container can be loaded onto any truck, ship, or train and delivered anywhere. Docker does exactly this for software: it bundles your app, its dependencies, its config, and its runtime into one portable 'container' that runs identically on your laptop, your colleague's machine, or a cloud server in Singapore. No more 'but it works on my machine'.
Environment drift is the root cause of most 'works on my machine' failures. A different Node version, a missing library, an environment variable pointing nowhere — these are not skill problems, they are infrastructure problems. Docker eliminates this class of issue by packaging the entire runtime environment into a portable, immutable container.
Containers are not VMs. They share the host OS kernel and use Linux namespaces and cgroups for isolation. This means containers start in milliseconds and use megabytes of memory, making microservices architectures economically viable. On the same machine that runs three VMs, you can run thirty containers.
Common misconceptions: containers are not inherently insecure (misconfiguration is the problem, not the technology), data inside containers is not persistent by default (you need volumes), and Docker Compose is not just for development (it works in production for single-host deployments).
Why Docker Compose Depends on Application-Level Retry, Not Just Container Order
Docker Compose's depends_on only controls container startup order, not service readiness. When your Java API container starts, it may attempt to connect to Postgres before Postgres is actually ready to accept connections. This is not a Docker bug — it's a fundamental design choice: Compose considers a container 'started' the moment its main process begins, not when the service inside is healthy.
In practice, this means your Spring Boot or Micronaut app will fail its initial connection pool initialization, crash, and potentially enter a restart loop. The fix isn't to add sleep commands — that's fragile and wastes seconds in CI. Instead, use healthcheck blocks in your Compose file to probe actual readiness (e.g., pg_isready for Postgres), then combine with depends_on.condition: service_healthy. Even then, your application code must implement retry logic with exponential backoff for the first connection, because health checks have a polling interval and can still race.
This pattern matters in every multi-service local dev environment and CI pipeline. Without it, you'll see intermittent 'Connection refused' errors that disappear on manual restart — wasting developer time and breaking automated tests. The rule: never trust container order alone; always pair health checks with application-level retry.
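The retry half of that rule can be sketched in shell, for instance in a container entrypoint. This is a minimal sketch, not the API's own connection code: `retry_with_backoff` is a hypothetical helper, and the attempt count and delays are arbitrary assumptions. A real application would normally do the equivalent inside its connection-pool setup.

```bash
#!/bin/sh
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds, doubling the
# wait between attempts (exponential backoff).
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example (assumes pg_isready is installed in the API image and the
# database service is named "db"):
#   retry_with_backoff 6 1 pg_isready -h db -p 5432
```

With a base delay of 1 second and 6 attempts, the waits are 1, 2, 4, 8, and 16 seconds, which comfortably covers a healthcheck polling interval of a few seconds.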
Containers vs Virtual Machines: Why Docker Is a Fundamentally Different Idea
Most people learn Docker by running commands without understanding the architectural shift underneath. That's fine for getting started, but it bites you the moment something breaks.
A virtual machine (VM) runs a full guest operating system — its own kernel, drivers, system processes — on top of a hypervisor. Your app sits at the top of this tower. Booting a VM can take minutes. It consumes gigabytes of RAM even before your app starts. Scaling ten microservices with VMs means ten full operating systems running simultaneously.
Docker containers take a different path. They share the host machine's kernel directly. Each container gets its own isolated view of the filesystem (via union file system layers), its own network namespace, and its own process tree — but there's no duplicated OS. A container starts in milliseconds. It uses megabytes of overhead instead of gigabytes.
The practical implication: on the same machine where you could run three VMs, you can run thirty containers. That's not a minor efficiency gain — it's the reason microservices architectures became economically viable. When AWS charges you per second of compute, that difference compounds fast.
Containers are not inherently less secure than VMs — they're just differently isolated. A misconfigured container is dangerous, just as a misconfigured VM is. The security story depends on your configuration, not the technology itself.
Kernel sharing trade-off: Because containers share the host kernel, a kernel vulnerability (like CVE-2022-0185 or Dirty Pipe) affects all containers on that host. VMs have a separate kernel per instance, so a kernel vulnerability in one VM does not affect others. For high-security multi-tenant environments (running untrusted code), VMs provide stronger isolation. For single-tenant application workloads, container isolation is sufficient.
Choose VMs over containers for:
- Multi-tenant environments running untrusted code, where the shared kernel is a risk.
- Workloads requiring a different kernel version than the host.
- Compliance requirements that mandate full OS isolation.
For everything else (single-tenant application workloads), containers are the right choice.
Containerization vs Virtualization Comparison Table
The table below summarizes the key differences between traditional virtual machines and Docker containers. These trade-offs directly impact how you architect, deploy, and secure your applications.
| Aspect | Virtual Machines | Docker Containers |
|---|---|---|
| Startup time | 30 seconds – 5 minutes | Milliseconds to 2 seconds |
| Memory overhead | 512MB – 2GB per instance | 1MB – 50MB per instance |
| OS isolation | Full guest OS per VM | Shared host kernel, isolated namespaces |
| Disk footprint | 5GB – 50GB per image | 5MB – 500MB per image |
| Portability | Hypervisor-dependent (.vmdk, .vhd) | Runs on any Docker host (Linux, Mac, Windows, Cloud) |
| Security isolation | Strong (separate kernel) | Good (namespaces + cgroups, but shared kernel) |
| Best for | Full OS control, strong isolation needs | Microservices, CI/CD pipelines, developer environments |
| Scaling speed | Minutes (VM provisioning) | Seconds (container spin-up) |
When choosing between them, consider your workload's isolation requirements, startup latency tolerance, and operational overhead budget. For most web applications and APIs running on single-tenant infrastructure, the ranges above put containers at roughly one to two orders of magnitude less memory and disk per instance than VMs.
Docker Architecture: From CLI to Running Container
When you type docker run, a chain of components works together to create and start a container. Understanding this flow helps you troubleshoot startup failures (like the ECONNREFUSED scenario) and optimize build performance.
- Docker CLI (your terminal) sends a REST API request to the Docker daemon. The CLI uses the `DOCKER_HOST` environment variable to know where the daemon is listening (default: `unix:///var/run/docker.sock`).
- Docker daemon (`dockerd`) receives the request, checks the local image cache, and pulls the image if necessary. It then calls containerd via gRPC to create a container.
- containerd is the industry-standard container runtime (used by Docker, Kubernetes, and others). It manages the entire container lifecycle: image transfer, storage, network interfaces, and process orchestration. It calls runc to actually start the container.
- runc is the low-level OCI runtime that spawns the container process on the host. It uses Linux namespaces (PID, mount, network, user) and cgroups (CPU, memory, I/O) to isolate the process. Once the container is running, runc exits and the process runs directly under the host kernel.
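To make the namespaces and cgroups from the last step concrete: the kernel exposes both under `/proc` for every process, containerized or not. The following is a small sketch that assumes a Linux host. On macOS or Windows, Docker runs containers inside a Linux VM, so it would have to be run inside a container there.

```bash
#!/bin/sh
# Linux only: show which namespaces and cgroup the current shell belongs to.
# runc does nothing magical; it places the container process in new ones.
if [ -d /proc/self/ns ]; then
  # Each entry is a symlink to a namespace ID, e.g. pid -> pid:[4026531836]
  ns_list=$(ls -l /proc/self/ns)
  echo "namespaces of this process:"
  echo "$ns_list"
  echo "cgroup membership:"
  cat /proc/self/cgroup 2>/dev/null || echo "(cgroup info unavailable)"
else
  ns_list=""
  echo "/proc/self/ns not found; this does not look like a Linux host"
fi
```

Running the same commands on the host and inside a container should show different IDs for the namespaces the container does not share with the host.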
The diagram below visualizes this flow. The important detail: the Docker daemon is not involved in the running container's process — it only coordinates setup. This makes containers lightweight and fast.
```mermaid
graph TD
    A[User runs docker run] --> B[Docker CLI]
    B -->|REST API on /var/run/docker.sock| C["Docker Daemon (dockerd)"]
    C -->|gRPC| D[containerd]
    D -->|OCI runc call| E[runc]
    E -->|"clone() with namespaces & cgroups"| F[Container Process]
    F --> G[Host Kernel]
    C -->|image pull| H["Registry (Docker Hub)"]
    H --> C
```
Kubernetes replaces dockerd with containerd directly (no Docker daemon). Understanding this decoupling helps when migrating from Docker Swarm or Docker Compose to Kubernetes: the container runtime stays the same, but the orchestration layer changes. Also, the Docker daemon is a single point of failure if you tie all container management to it; managing containers directly through containerd (using crictl or ctr) avoids that bottleneck.
First 5 Docker Commands Every Developer Should Know
If you're new to Docker, these five commands will cover 90% of your daily workflow. Master them before diving into advanced topics like multi-stage builds or healthchecks.
- docker pull: Download an image from a registry (default Docker Hub). Always specify a tag; never use `latest` in scripts.
  ```bash
  docker pull postgres:16-alpine
  ```
- docker run: Create and start a container from an image. Common flags: `-d` (detach), `-p` (port mapping), `--name`, `-v` (volume), `-e` (environment variable).
  ```bash
  docker run -d --name my-postgres -p 5432:5432 -e POSTGRES_PASSWORD=secret postgres:16-alpine
  ```
- docker ps: List running containers. Add `-a` to include stopped containers. Use `--format` for custom output.
  ```bash
  docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
  ```
- docker logs: View logs from a container. Use `-f` to follow (tail -f style). Combine with `--tail` to start from the most recent lines instead of replaying the whole log.
  ```bash
  docker logs -f --tail 20 my-postgres
  ```
- docker exec: Run a command inside a running container. The `-it` flags allocate an interactive terminal (useful for debugging).
  ```bash
  docker exec -it my-postgres psql -U postgres
  ```
These commands are the foundation. Once comfortable, add docker compose, docker build, docker images, and docker system prune to your toolkit.
Run docker system prune -a --volumes periodically, but carefully: it removes unused containers, images, networks, and volumes. For safety, omit --volumes if you want to keep cached data.
Use docker run --rm to auto-clean containers after they finish. This prevents disk fill-ups on build machines. Also, pull images by a specific digest (@sha256:...) instead of by tag for immutable builds.
Images, Layers and Dockerfiles: How Docker Actually Builds Your App
A Docker image is a read-only blueprint for creating containers. A container is a running instance of an image — the same relationship as a class and an object in OOP, or a recipe and a meal.
Images are built in layers. Every instruction in a Dockerfile creates a new layer on top of the previous one. Docker caches these layers aggressively. This is the single most important thing to understand about Dockerfile efficiency: if layer 3 changes, Docker rebuilds layer 3 and every layer after it. Layers 1 and 2 are served from cache instantly.
This is why experienced engineers always copy dependency manifests (package.json, requirements.txt, go.mod) and install dependencies BEFORE copying application source code. Source code changes every commit; dependencies change rarely. Put the slow, stable work near the top of your Dockerfile so it stays cached.
Multi-stage builds are the other major pattern worth knowing early. You use one image (with compilers, build tools, dev dependencies) to build your app, then copy only the compiled output into a minimal runtime image. Your final image contains zero build tooling — smaller, faster, and with a dramatically reduced attack surface.
Let's build a realistic Node.js API with both patterns applied — this is what a production-ready Dockerfile actually looks like, not the toy examples you usually see.
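A minimal sketch of such a Dockerfile follows, written out through a shell heredoc so it can be pasted into a terminal. The project layout is an assumption, not the author's actual project: it presumes a `package.json` whose `build` script emits `dist/`, an entry point at `dist/server.js`, and an API listening on port 3000.

```bash
#!/bin/sh
# Write an illustrative multi-stage Dockerfile into a scratch directory.
# Assumptions (hypothetical project): `npm run build` emits ./dist,
# the entry point is dist/server.js, and the API listens on port 3000.
workdir=$(mktemp -d)
cat > "$workdir/Dockerfile" <<'EOF'
# ---- Stage 1: build (full toolchain, dev dependencies) ----
FROM node:20-alpine AS build
WORKDIR /app
# Manifests first: this layer stays cached until dependencies change.
COPY package.json package-lock.json ./
RUN npm ci
# Source last: it changes on every commit.
COPY . .
RUN npm run build

# ---- Stage 2: runtime (production dependencies and build output only) ----
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package.json package-lock.json ./
RUN npm ci --omit=dev && npm cache clean --force
COPY --from=build /app/dist ./dist
# The official node image ships a non-root "node" user.
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]
EOF
echo "wrote $workdir/Dockerfile"
```

From the project root, something like `docker build -f "$workdir/Dockerfile" -t my-api:dev .` would build it, where `my-api:dev` is a placeholder tag. Editing only source files should then reuse the cached `npm ci` layer in both stages.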
Layer cleanup in the same RUN: Each RUN creates a new layer. If you download a 200MB package in one RUN and delete it in the next RUN, the 200MB still exists in the first layer — layers are additive. Always chain download and cleanup in the same RUN with &&.
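The difference is easiest to see side by side. The sketch below writes two Dockerfile fragments for comparison. The download URL and archive name are placeholders, not a real artifact, and a Debian-based image is assumed.

```bash
#!/bin/sh
# Write two Dockerfile fragments side by side for comparison.
# The URL and archive name are placeholders, not a real download.
workdir=$(mktemp -d)

cat > "$workdir/Dockerfile.bad" <<'EOF'
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates curl
# The archive is committed into this layer...
RUN curl -fsSL https://example.com/toolchain.tar.gz -o /tmp/toolchain.tar.gz
RUN tar -xzf /tmp/toolchain.tar.gz -C /opt
# ...so deleting it here only hides it; the image does not shrink.
RUN rm /tmp/toolchain.tar.gz
EOF

cat > "$workdir/Dockerfile.good" <<'EOF'
FROM debian:bookworm-slim
# One layer: download, unpack, and delete before the layer is committed.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl \
 && curl -fsSL https://example.com/toolchain.tar.gz -o /tmp/toolchain.tar.gz \
 && tar -xzf /tmp/toolchain.tar.gz -C /opt \
 && rm -rf /tmp/toolchain.tar.gz /var/lib/apt/lists/*
EOF
echo "wrote fragments to $workdir"
```

After building each variant, `docker history <image>` lists the size of every layer, which makes the leftover archive in the first version visible.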
- Each layer is a diff on top of the previous layer. If the base changes, the diff no longer applies.
- Docker cannot know if a later instruction depends on the changed content in an earlier layer.
- The cache is sequential, not selective — Docker rebuilds from the first invalidated layer onward.
- This is why layer ordering (least-change to most-change) is the single most impactful Dockerfile optimization.
Volumes and Docker Compose: Persistence and Multi-Container Orchestration
Containers are ephemeral by design. When a container is removed, any data written to its writable layer is deleted with it. That's perfect for stateless services, but databases, file uploads, and logs need to outlive the container. Docker volumes solve this by mounting a storage location from the host (or a managed volume) into the container's filesystem.
There are three storage mechanisms: bind mounts (link a specific host directory into the container — great for local development where you want live code reloading), named volumes (Docker manages the storage location — best for databases in production), and tmpfs mounts (in-memory only — useful for sensitive data you never want written to disk).
Real applications are never a single container. You have an API, a database, a cache, maybe a background worker. Running and networking these manually with individual docker run commands is error-prone and impossible to reproduce reliably. Docker Compose lets you define your entire multi-container application in one YAML file and bring it all up with a single command.
Here's a complete, realistic Compose setup for a Node.js API backed by PostgreSQL and Redis — the stack you'll encounter in most backend roles.
The depends_on trap: depends_on without condition: service_healthy only waits for the container to START — not for the process inside to be READY. Postgres takes 5-15 seconds to initialize. Without service_healthy, your API will crash on boot trying to connect to a database that is not accepting connections yet. This is the single most common cause of flaky Docker Compose environments.
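A minimal sketch of a Compose file that avoids the trap, for the Node.js, PostgreSQL, and Redis stack described above. Service names, credentials, ports, and the build context are placeholders chosen for illustration, and the healthcheck timings are assumptions to tune for your own startup times.

```bash
#!/bin/sh
# Write an illustrative compose.yaml into a scratch directory.
# Service names, credentials, and the build context are placeholders.
workdir=$(mktemp -d)
cat > "$workdir/compose.yaml" <<'EOF'
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: change-me
      POSTGRES_DB: app
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      # pg_isready exits 0 only once Postgres accepts connections.
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      timeout: 3s
      retries: 5
      start_period: 15s

  cache:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  api:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgres://app:change-me@db:5432/app
      REDIS_URL: redis://cache:6379
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy

volumes:
  pgdata:
EOF
echo "wrote $workdir/compose.yaml"
```

Copied into a project root, `docker compose config` will validate the file, and `docker compose up` will hold the `api` container back until both healthchecks pass. The named volume `pgdata` is what keeps the database files when the `db` container is replaced.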
- docker compose down -v deletes all named volumes for the project, including databases with days of data.
- There is no undo. Once volumes are deleted, data is gone unless backed up.
- Developers often run it thinking it is a 'clean restart', but it is a destructive operation.
- Always back up volumes before running down -v. Use: docker run --rm -v vol:/data -v "$(pwd)":/backup alpine tar czf /backup/backup.tar.gz -C /data .
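The backup command above can be wrapped in a pair of helpers so that a matching restore exists before anyone needs it. This is a sketch: `backup_volume` and `restore_volume` are hypothetical names, and the `DOCKER` override is an assumption added here so the commands can be previewed without running them.

```bash
#!/bin/sh
# Hypothetical helpers wrapping the tar-based volume backup.
# Set DOCKER=echo to print the command instead of running it.
DOCKER="${DOCKER:-docker}"

# backup_volume VOLUME DEST_DIR  -> writes DEST_DIR/VOLUME.tar.gz
# DEST_DIR must be an absolute path (it is used as a bind mount).
backup_volume() {
  "$DOCKER" run --rm \
    -v "$1":/data:ro \
    -v "$2":/backup \
    alpine tar czf "/backup/$1.tar.gz" -C /data .
}

# restore_volume VOLUME SRC_DIR  <- reads SRC_DIR/VOLUME.tar.gz
restore_volume() {
  "$DOCKER" run --rm \
    -v "$1":/data \
    -v "$2":/backup:ro \
    alpine tar xzf "/backup/$1.tar.gz" -C /data
}
```

A call might look like `backup_volume myproject_pgdata "$(pwd)"`, where the volume name is a placeholder; `docker volume ls` shows the real names. For a database, stop the container first so the files are not copied mid-write.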
Why Your Docker Builds Break at 2 AM: The Registry Rate-Limit Ambush
You've tuned Dockerfiles, layered images, and pinned base tags. Then your CI pipeline fails at 2 AM because Docker Hub throttled your pull. That's not a networking glitch; it's a rate limit. Docker Hub has capped anonymous pulls at 100 per six hours per IP and authenticated free accounts at 200 (the limits change, so check Docker's current documentation). Your build server is sharing an office IP with twenty other devs: 100 pulls across twenty-one clients is fewer than five each per window. You're screwed.
Fix this before it bites. Authenticate every Docker client that pulls images, even your local laptop. Add credsStore to ~/.docker/config.json and log in via docker login. For CI, inject a read-only PAT as a secret. Never rely on anonymous pulls for production builds. Note that registry-1.docker.io is Docker Hub's own registry endpoint, not an independent mirror, so pulling through it is throttled all the same. Run your own pull-through cache with Nexus or Harbor. That way your CI pulls once, caches locally, and stops hitting the public limit for images it already has.
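Pointing the Docker daemon at a pull-through cache is a one-key configuration change. The sketch below writes an example `daemon.json`. The hostname is a placeholder for your own Nexus or Harbor endpoint, and it assumes the cache is exposed as a Docker Hub mirror.

```bash
#!/bin/sh
# Write an example Docker daemon config pointing at a pull-through cache.
# https://registry-cache.internal.example is a placeholder hostname.
workdir=$(mktemp -d)
cat > "$workdir/daemon.json" <<'EOF'
{
  "registry-mirrors": ["https://registry-cache.internal.example"]
}
EOF
echo "wrote $workdir/daemon.json"
echo "on Linux the daemon reads /etc/docker/daemon.json; restart dockerd after changing it"
```

`registry-mirrors` only applies to images hosted on Docker Hub. Some caches are addressed by prefixing the image name with the cache's hostname instead, so check your cache's documentation. On the CI side, `docker login --password-stdin` reads the token from standard input, which keeps it out of the process list and shell history.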
Stop Using Latest: How Unpinned Tags Corrode Your Deployments
image: postgres:latest looks harmless on day one. Six weeks later, the latest tag silently points to Postgres 16 while your code expects Postgres 14. Your staging environment passes tests because it is still running the old cached image. Production pulls the new tag, crashes on a breaking change, and your pager goes off. That's not a CI failure; it's a trust failure.
Every tag is mutable; only a digest is immutable. latest, alpine, bullseye: all of them get overwritten by maintainers. Pin to a digest (sha256:abc123...). Or at minimum pin to a minor version: postgres:14.10. Never use latest in any environment you care about. Not dev, not staging, not prod. If you use Docker Compose for local dev, still pin it. The same bug applies when a co-worker pulls the compose file on a fresh machine three months later.
Automate your image update policy with Renovate or Dependabot. They open PRs for digest bumps. You review, approve, and know exactly what changed. No surprises.
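Until such automation is in place, a crude check in CI can catch the worst offenders. The function below is a hypothetical lint, not part of Docker, Renovate, or Dependabot. It flags `image:` lines in a Compose file that use `latest` or carry no tag at all. It is deliberately simple: quoted values and untagged images on a registry with a port (`host:5000/name`) are not detected.

```bash
#!/bin/sh
# find_unpinned_images FILE
# Hypothetical lint: prints `image:` lines that end in :latest or have
# neither a tag nor a digest. Returns 0 if at least one line was found.
find_unpinned_images() {
  grep -nE '^[[:space:]]*image:' "$1" \
    | grep -v '@sha256:' \
    | grep -E 'image:[[:space:]]*[^[:space:]:]+[[:space:]]*$|:latest[[:space:]]*$'
}

# Example CI usage (compose.yaml is an assumed file name):
#   if find_unpinned_images compose.yaml; then
#     echo "unpinned images found" >&2
#     exit 1
#   fi
```

The first pattern matches an image reference with no colon (no tag), the second matches an explicit `:latest`, and digest-pinned lines are filtered out beforehand.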
Never use latest or any mutable tag in a Docker Compose file or CI pipeline. Pin to a digest or a specific minor version.
The Hidden Cost of `docker exec` in Production Debugging
You SSH into a box, run docker exec -it web bash, and start poking around. Maybe you install curl. Maybe you grep logs. Ten minutes later you forget to exit. That container is now a snowflake — modified, untracked, and unreproducible. When you restart the service, those changes vanish. But if you scaled up replicas, only one has your ad-hoc tools. The next on-call engineer is confused. The deployment pipeline can't rebuild your state. You've broken reproducibility.
Production debugging with docker exec is a crutch you should break. Instead, run a sidecar container with the debugging tools you need, controlled by its own health check. Or ship all logs to a central aggregator (ELK, Loki, Datadog) before you need them. If you absolutely must inspect a running process, use docker cp to copy files out, or docker logs --tail 1000 -f to stream output. Never mutate a running production container.
If you're tempted to install packages inside a container, that's a signal your base image is missing essential debugging tools. Add them to your Dockerfile behind a build arg: ARG DEBUG_TOOLS=true and RUN apt-get install -y netcat strace tcpdump. Rebuild when you need them.
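One way to wire up that build arg is sketched below. It differs from the inline example in two ways, both assumptions made here: the arg defaults to `false` so normal builds stay lean, and it installs `netcat-openbsd` because plain `netcat` is only a virtual package on recent Debian releases. A Debian-based image is assumed.

```bash
#!/bin/sh
# Write an illustrative Dockerfile fragment with opt-in debug tooling.
# Assumes a Debian-based base image; package names differ elsewhere.
workdir=$(mktemp -d)
cat > "$workdir/Dockerfile.debug" <<'EOF'
FROM debian:bookworm-slim
# Off by default, so normal builds stay lean.
ARG DEBUG_TOOLS=false
RUN if [ "$DEBUG_TOOLS" = "true" ]; then \
      apt-get update \
      && apt-get install -y --no-install-recommends netcat-openbsd strace tcpdump \
      && rm -rf /var/lib/apt/lists/*; \
    fi
EOF
echo "wrote $workdir/Dockerfile.debug"
```

Building with `docker build --build-arg DEBUG_TOOLS=true ...` then produces a traceable debug variant from the same source, instead of a hand-modified container.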
If you use docker exec to install a tool and then commit the container, that image is now untraceable. Never commit a running container. Rebuild from source.
Avoid debugging production with docker exec. Use sidecars, centralized logging, or build-time debug tools instead.
Production Deployment Crashes on Boot: API Connects to Database Before Postgres Is Ready
- depends_on without condition: service_healthy only waits for container start — not process readiness. Always use service_healthy for databases.
- Postgres, MySQL, Redis, and any service with initialization time needs a healthcheck. Without it, dependent services will crash on boot.
- A restart loop (container crashing and restarting repeatedly) is a symptom of a race condition, not a network issue. Check startup ordering first.
- Defense-in-depth: add a connection retry loop in your application code as a second layer of protection, even with healthchecks in place.
- The start_period flag on healthcheck prevents false failures during slow startup. Set it to the expected maximum initialization time.
- docker compose logs <service> --tail 50
- docker inspect <container> --format '{{.State.ExitCode}} {{.State.Error}}'