Introduction to Docker: Containers, Images and Real-World Usage Explained
- Container: a running instance of an image — isolated filesystem, network, and process tree sharing the host kernel
- Image: a read-only blueprint built from layers — each Dockerfile instruction creates one cached layer
- Dockerfile: the build script that defines the image — instructions are executed top-to-bottom
- Volume: persistent storage that survives container deletion — named volumes for production, bind mounts for development
- Containers share the host OS kernel (no guest OS overhead)
- VMs run a full guest OS per instance (stronger isolation, much heavier)
- Layer caching: changing one layer invalidates all layers after it — order from least-to-most frequently changing
- Multi-stage builds: use heavy toolchains during compilation, ship only the output to production
Production Debug Guide
From startup crashes to slow builds — systematic debugging paths.

**Container exits immediately on startup.**

```shell
docker compose logs <service> --tail 50
docker inspect <container> --format '{{.State.ExitCode}} {{.State.Error}}'
```

**Container A cannot reach Container B by hostname.**

```shell
docker network inspect <network> --format '{{range .Containers}}{{.Name}} {{end}}'
docker exec <container-a> nslookup <container-b-service-name>
```

**Docker build is slow — every rebuild takes minutes.**

```shell
docker build --progress=plain -t test . 2>&1 | grep -E 'CACHED|RUN|COPY'
docker history <image> --format '{{.Size}} {{.CreatedBy}}' | sort -hr
```

**Port already allocated — container cannot start.**

```shell
docker ps --format '{{.Names}} {{.Ports}}' | grep <port>
ss -tlnp | grep <port>
```

**Container data lost after docker compose down.**

```shell
docker volume ls | grep <project>
docker compose config | grep -A2 volumes
```

**Image is unexpectedly large (>500MB).**

```shell
docker history <image> --format '{{.Size}} {{.CreatedBy}}' | sort -hr
cat .dockerignore 2>/dev/null || echo 'NO .dockerignore FILE'
```
Environment drift is the root cause of most 'works on my machine' failures. A different Node version, a missing library, an environment variable pointing nowhere — these are not skill problems, they are infrastructure problems. Docker eliminates this class of issue by packaging the entire runtime environment into a portable, immutable container.
Containers are not VMs. They share the host OS kernel and use Linux namespaces and cgroups for isolation. This means containers start in milliseconds and use megabytes of memory, making microservices architectures economically viable. On the same machine that runs three VMs, you can run thirty containers.
Common misconceptions: containers are not inherently insecure (misconfiguration is the problem, not the technology), data inside containers is not persistent by default (you need volumes), and Docker Compose is not just for development (it works in production for single-host deployments).
Containers vs Virtual Machines: Why Docker Is a Fundamentally Different Idea
Most people learn Docker by running commands without understanding the architectural shift underneath. That's fine for getting started, but it bites you the moment something breaks.
A virtual machine (VM) runs a full guest operating system — its own kernel, drivers, system processes — on top of a hypervisor. Your app sits at the top of this tower. Booting a VM can take minutes. It consumes gigabytes of RAM even before your app starts. Scaling ten microservices with VMs means ten full operating systems running simultaneously.
Docker containers take a different path. They share the host machine's kernel directly. Each container gets its own isolated view of the filesystem (via union file system layers), its own network namespace, and its own process tree — but there's no duplicated OS. A container starts in milliseconds. It uses megabytes of overhead instead of gigabytes.
The practical implication: on the same machine where you could run three VMs, you can run thirty containers. That's not a minor efficiency gain — it's the reason microservices architectures became economically viable. When AWS charges you per second of compute, that difference compounds fast.
Containers are not inherently less secure than VMs — they're just differently isolated. A misconfigured container is dangerous, just as a misconfigured VM is. The security story depends on your configuration, not the technology itself.
Kernel sharing trade-off: Because containers share the host kernel, a kernel vulnerability (like CVE-2022-0185 or Dirty Pipe) affects all containers on that host. VMs have a separate kernel per instance, so a kernel vulnerability in one VM does not affect others. For high-security multi-tenant environments (running untrusted code), VMs provide stronger isolation. For single-tenant application workloads, container isolation is sufficient.
```shell
# Compare startup time and resource footprint — run these and watch the difference

# Pull a minimal Linux image (only ~5MB compressed)
docker pull alpine:3.19

# Start a container, run a command, and exit — time the whole thing
time docker run --rm alpine:3.19 echo "Container is alive"
# --rm tells Docker to delete the container after it exits (no cleanup needed)
# alpine:3.19 is the image — think of it as the blueprint
# 'echo ...' is the command to run inside the container

# Now check how much memory the container used at peak
# Run it in the background with resource stats
docker run -d --name resource-demo alpine:3.19 sleep 30
# -d runs in detached (background) mode
# --name gives it a human-readable name instead of a random hash

docker stats resource-demo --no-stream
# --no-stream prints one snapshot instead of a live feed
# Look at the MEM USAGE column — typically under 1MB for alpine doing nothing

# Clean up
docker stop resource-demo && docker rm resource-demo
```
```
real    0m0.387s
user    0m0.021s
sys     0m0.018s

NAME            CPU %   MEM USAGE / LIMIT   MEM %   NET I/O     BLOCK I/O
resource-demo   0.00%   632KiB / 15.55GiB   0.00%   796B / 0B   0B / 0B
```
Choose a VM instead of a container when you have:

- Multi-tenant environments running untrusted code — the shared kernel is a risk.
- Workloads requiring a different kernel version than the host.
- Compliance requirements that mandate full OS isolation.

For everything else — single-tenant application workloads — containers are the right choice.
Images, Layers and Dockerfiles: How Docker Actually Builds Your App
A Docker image is a read-only blueprint for creating containers. A container is a running instance of an image — the same relationship as a class and an object in OOP, or a recipe and a meal.
Images are built in layers. Every instruction in a Dockerfile creates a new layer on top of the previous one. Docker caches these layers aggressively. This is the single most important thing to understand about Dockerfile efficiency: if layer 3 changes, Docker rebuilds from layer 3 downward. Layers 1 and 2 are served from cache instantly.
This is why experienced engineers always copy dependency manifests (package.json, requirements.txt, go.mod) and install dependencies BEFORE copying application source code. Source code changes every commit; dependencies change rarely. Put the slow, stable work near the top of your Dockerfile so it stays cached.
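To make the ordering concrete, here is a minimal sketch contrasting the two approaches (a hypothetical Node.js app; the two blocks are alternative Dockerfiles, not stages of one file):

```dockerfile
# ── Anti-pattern: source copied before dependencies ──
# Any source change invalidates the COPY layer, so npm ci re-runs on every build.
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci --omit=dev

# ── Optimized ordering: manifests first, install, then source ──
# Source changes only invalidate the final COPY layer; the install stays cached.
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY src/ ./src/
```

With the optimized ordering, a typical edit-rebuild cycle re-runs only the final COPY, which takes seconds instead of minutes.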
Multi-stage builds are the other major pattern worth knowing early. You use one image (with compilers, build tools, dev dependencies) to build your app, then copy only the compiled output into a minimal runtime image. Your final image contains zero build tooling — smaller, faster, and with a dramatically reduced attack surface.
Let's build a realistic Node.js API with both patterns applied — this is what a production-ready Dockerfile actually looks like, not the toy examples you usually see.
Layer cleanup in the same `RUN`: each `RUN` creates a new layer. If you download a 200MB package in one `RUN` and delete it in the next `RUN`, the 200MB still exists in the first layer — layers are additive. Always chain download and cleanup in the same `RUN` with `&&`.
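A sketch of the difference (the download URL is a placeholder):

```dockerfile
# ── Anti-pattern: two layers ──
# The archive lands in the first layer; deleting it in a second RUN
# only hides it, the bytes still ship with the image.
RUN wget https://example.com/tool.tar.gz && tar xzf tool.tar.gz
RUN rm tool.tar.gz

# ── Correct: download, extract, and delete in ONE RUN ──
RUN wget https://example.com/tool.tar.gz \
    && tar xzf tool.tar.gz \
    && rm tool.tar.gz

# Same idea for package manager caches on Alpine:
RUN apk add --no-cache curl
# --no-cache skips writing the apk index into the layer at all
```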
```dockerfile
# ── STAGE 1: Build Stage ──────────────────────────────────────────────────────
# Use the full Node image with build tools available
FROM node:20-alpine AS builder
# 'AS builder' names this stage so we can reference it later
# node:20-alpine uses Alpine Linux — much smaller than node:20-bullseye

# Set the working directory inside the container
WORKDIR /app

# COPY dependency files FIRST — before application code
# Docker caches this layer. If package.json hasn't changed, npm install
# won't re-run even if your source code changed. This saves minutes per build.
COPY package.json package-lock.json ./

# Install only production dependencies (saves ~200MB vs installing devDependencies)
RUN npm ci --omit=dev
# npm ci is faster and stricter than npm install — it respects package-lock.json exactly

# NOW copy application source code
# Changing any source file only invalidates from this line forward
COPY src/ ./src/

# ── STAGE 2: Production Runtime Stage ─────────────────────────────────────────
# Start fresh from a minimal image — no build tools, no npm, no package manager cruft
FROM node:20-alpine AS production

# Run as a non-root user — critical for production security
# node:alpine ships with a 'node' user built in
USER node

WORKDIR /app

# Copy only what we need from the builder stage — not the entire filesystem
COPY --from=builder --chown=node:node /app/node_modules ./node_modules
COPY --from=builder --chown=node:node /app/src ./src
COPY --chown=node:node package.json ./
# --chown ensures the node user owns these files, not root

# Document which port the app listens on (informational — doesn't actually publish it)
EXPOSE 3000

# Define the command to run when a container starts from this image
# Use array form (exec form) — NOT string form — to ensure signals are handled correctly
CMD ["node", "src/server.js"]
```
```
$ docker build -t my-node-api:1.0.0 .
Sending build context to Docker daemon  48.13kB
Step 1/11 : FROM node:20-alpine AS builder
 ---> 3f4d90098f5b
Step 2/11 : WORKDIR /app
 ---> Using cache
Step 3/11 : COPY package.json package-lock.json ./
 ---> Using cache          <- dependencies layer served from cache!
Step 4/11 : RUN npm ci --omit=dev
 ---> Using cache          <- install step also cached — build is fast
Step 5/11 : COPY src/ ./src/
 ---> 8c3a1b2d4e5f         <- only this layer rebuilt (source changed)
...
Successfully built a7b3c9d1e2f4
Successfully tagged my-node-api:1.0.0

# Check the final image size
$ docker image ls my-node-api
REPOSITORY    TAG     IMAGE ID       CREATED          SIZE
my-node-api   1.0.0   a7b3c9d1e2f4   12 seconds ago   142MB
# Compare: the builder stage alone would be ~380MB with all dev tooling
```
Why changing one layer invalidates everything after it:

- Each layer is a diff on top of the previous layer. If the base changes, the diff no longer applies.
- Docker cannot know if a later instruction depends on the changed content in an earlier layer.
- The cache is sequential, not selective — Docker rebuilds from the first invalidated layer onward.
- This is why layer ordering (least-change to most-change) is the single most impactful Dockerfile optimization.
Volumes and Docker Compose: Persistence and Multi-Container Orchestration
Containers are ephemeral by design. When a container stops, any data written inside it vanishes. That's perfect for stateless services, but databases, file uploads, and logs need to survive container restarts. Docker volumes solve this by mounting a storage location from the host (or a managed volume) into the container's filesystem.
There are three storage mechanisms: bind mounts (link a specific host directory into the container — great for local development where you want live code reloading), named volumes (Docker manages the storage location — best for databases in production), and tmpfs mounts (in-memory only — useful for sensitive data you never want written to disk).
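All three can be sketched in one Compose fragment (the service and path names are illustrative):

```yaml
services:
  app:
    image: my-app:latest
    volumes:
      # Bind mount: a specific host directory mapped in; edits appear instantly (dev)
      - ./src:/app/src
      # Named volume: Docker-managed storage that survives container rebuilds (prod data)
      - app_uploads:/app/uploads
    tmpfs:
      # tmpfs mount: RAM only; contents never touch disk
      - /app/tmp-secrets

volumes:
  app_uploads:
```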
Real applications are never a single container. You have an API, a database, a cache, maybe a background worker. Running and networking these manually with individual docker run commands is error-prone and impossible to reproduce reliably. Docker Compose lets you define your entire multi-container application in one YAML file and bring it all up with a single command.
Here's a complete, realistic Compose setup for a Node.js API backed by PostgreSQL and Redis — the stack you'll encounter in most backend roles.
The `depends_on` trap: `depends_on` without `condition: service_healthy` only waits for the container to START — not for the process inside to be READY. Postgres takes 5-15 seconds to initialize. Without `service_healthy`, your API will crash on boot trying to connect to a database that is not accepting connections yet. This is the single most common cause of flaky Docker Compose environments.
```yaml
# Docker Compose V2 format (no 'version' key needed with modern Docker Desktop)
services:
  # ── The API service ───────────────────────────────────────────────────────
  api:
    build:
      context: .            # Build from the Dockerfile in the current directory
      target: production    # Use the 'production' stage from our multi-stage Dockerfile
    container_name: my-api
    ports:
      - "3000:3000"         # Map host port 3000 -> container port 3000
    environment:
      # Reference values from a .env file — never hardcode secrets in Compose files
      NODE_ENV: production
      DATABASE_URL: postgresql://api_user:${DB_PASSWORD}@postgres:5432/app_db
      REDIS_URL: redis://redis:6379
      # 'postgres' and 'redis' are the service names below — Docker's internal
      # DNS resolves them automatically within the shared network
    depends_on:
      postgres:
        condition: service_healthy   # Wait until postgres passes its health check
      redis:
        condition: service_started
    restart: unless-stopped          # Restart on crash, but not if manually stopped

  # ── PostgreSQL database ───────────────────────────────────────────────────
  postgres:
    image: postgres:16-alpine        # Always pin a specific version — never use 'latest'
    container_name: my-postgres
    environment:
      POSTGRES_DB: app_db
      POSTGRES_USER: api_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}   # Pulled from .env file
    volumes:
      # Named volume — Docker manages where this lives on the host.
      # Database files survive 'docker compose down' and container rebuilds.
      - postgres_data:/var/lib/postgresql/data
      # Bind mount an init script — runs once when the DB is first created.
      # :ro makes it read-only inside the container (good security habit)
      - ./db/init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U api_user -d app_db"]
      interval: 5s        # Check every 5 seconds
      timeout: 5s         # Fail if no response in 5 seconds
      retries: 5          # Mark unhealthy after 5 consecutive failures
      start_period: 30s   # Grace period before health checks start

  # ── Redis cache ───────────────────────────────────────────────────────────
  redis:
    image: redis:7-alpine
    container_name: my-redis
    command: redis-server --appendonly yes
    # --appendonly yes enables AOF persistence — data survives Redis restarts
    volumes:
      - redis_data:/data

# Named volumes must be declared at the top level
# Docker creates and manages these — they persist across 'docker compose down'
volumes:
  postgres_data:
  redis_data:
```
```
$ docker compose up -d
[+] Running 6/6
 ✔ Network my-app_default   Created
 ✔ Volume "postgres_data"   Created
 ✔ Volume "redis_data"      Created
 ✔ Container my-postgres    Healthy
 ✔ Container my-redis       Started
 ✔ Container my-api         Started

# Check all services are running
$ docker compose ps
NAME          IMAGE                COMMAND                  STATUS          PORTS
my-api        my-app-api           "docker-entrypoint.s…"   Up 12 seconds   0.0.0.0:3000->3000/tcp
my-postgres   postgres:16-alpine   "docker-entrypoint.s…"   Up 18 seconds   5432/tcp
my-redis      redis:7-alpine       "docker-entrypoint.s…"   Up 18 seconds   6379/tcp

# Tail logs from a specific service
$ docker compose logs -f api
my-api  | Server listening on port 3000
my-api  | Database connection established
my-api  | Redis connection established

# Tear down (volumes are preserved by default)
$ docker compose down
# Add --volumes to also delete the named volumes (WARNING: deletes all DB data)
```
Why `docker compose down -v` is dangerous:

- It deletes all named volumes for the project — including databases with days of data.
- There is no undo. Once volumes are deleted, data is gone unless backed up.
- Developers often run it thinking it is a 'clean restart' — it is a destructive operation.
- Always back up volumes before running `down -v`. Use: `docker run --rm -v vol:/data -v $(pwd):/backup alpine tar czf /backup/backup.tar.gz -C /data .`
| Aspect | Virtual Machines | Docker Containers |
|---|---|---|
| Startup time | 30 seconds – 5 minutes | Milliseconds to 2 seconds |
| Memory overhead | 512MB – 2GB per instance | 1MB – 50MB per instance |
| OS isolation | Full guest OS per VM | Shared host kernel, isolated namespaces |
| Disk footprint | 5GB – 50GB per image | 5MB – 500MB per image |
| Portability | Hypervisor-dependent (.vmdk, .vhd) | Runs on any Docker host (Linux, Mac, Windows, Cloud) |
| Security isolation | Strong (separate kernel) | Good (namespaces + cgroups, but shared kernel) |
| Best for | Full OS control, strong isolation needs | Microservices, CI/CD pipelines, developer environments |
| Scaling speed | Minutes (VM provisioning) | Seconds (container spin-up) |
🎯 Key Takeaways
- Containers share the host OS kernel — they're not mini VMs. This is why they start in milliseconds and use megabytes of memory, making them economically practical for microservices at scale.
- Docker image layers are cached from top to bottom. Copy dependency manifests and run installs BEFORE copying source code, or every git commit will trigger a full package reinstall.
- Multi-stage builds are not optional in production — they separate build-time tooling from the runtime image, cutting image sizes by 50-70% and removing attack surface from your deployed artifact.
- Named volumes persist data across container restarts and rebuilds; depends_on with service_healthy prevents race conditions — both are non-negotiable for any database-backed service.
- docker compose down preserves volumes. docker compose down -v deletes them. Always back up volumes before any destructive operation.
- Always use exec-form CMD (`CMD ["node", "server.js"]`) — shell form silently breaks graceful shutdown in Kubernetes.
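The exec-form point is easiest to see side by side (a sketch; the entrypoint path is illustrative, and the two CMD lines are alternatives, not one Dockerfile):

```dockerfile
# Shell form: Docker wraps the command in /bin/sh -c, so the shell is PID 1.
# SIGTERM goes to the shell, which does not forward it to node; the orchestrator
# waits out its grace period and then SIGKILLs the container (hence ~30s stops).
CMD node src/server.js

# Exec form: node itself is PID 1 and receives SIGTERM directly,
# so graceful shutdown handlers actually run.
CMD ["node", "src/server.js"]
```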
Interview Questions on This Topic
- Q: What's the difference between a Docker image and a Docker container, and how does the layer caching system affect your Dockerfile design decisions?
- Q: If your API container starts before your database is ready and crashes on boot, how would you fix that in a Docker Compose file without adding a sleep command?
- Q: What's the practical difference between a bind mount and a named volume, and when would you choose one over the other in a production environment?
- Q: Explain the difference between containers and VMs at the kernel level. When would you choose one over the other for security reasons?
- Q: Your Docker image is 1.5GB for a simple Node.js API. Walk me through how you would diagnose and reduce the size.
- Q: Your container takes 30 seconds to stop in Kubernetes. What is the most likely cause and how do you fix it?
Frequently Asked Questions
What is Docker used for in real-world software development?
Docker is used to package applications and their dependencies into portable containers that run identically across development, staging, and production environments. In practice it's used for local development environments, CI/CD pipelines, microservices deployment, and running databases or third-party services locally without installing them on your machine.
Is Docker the same as a virtual machine?
No — they solve a similar problem (environment isolation) but in fundamentally different ways. A VM runs a complete guest operating system with its own kernel, which costs gigabytes of memory and minutes to start. A Docker container shares the host OS kernel and uses Linux namespaces and cgroups for isolation, starting in milliseconds and using megabytes. For most application workloads, containers are faster, cheaper, and just as reliable.
Does data inside a Docker container get deleted when the container stops?
Yes — by default, any data written inside a container's writable layer is lost when the container is removed. To persist data you need to use volumes: named volumes (Docker manages the storage location, best for databases) or bind mounts (maps a specific host directory into the container, best for local development). Neither type of volume is deleted by docker compose down unless you explicitly pass the --volumes flag.
What is the difference between depends_on and depends_on with condition: service_healthy?
depends_on without a condition only waits for the container to START (docker start returns), not for the process inside to be READY. depends_on with condition: service_healthy waits for the healthcheck to PASS — the process inside must be fully ready and accepting connections. For databases, message queues, and any service with startup time, always use service_healthy.
How do I reduce the size of my Docker image?
The three highest-impact changes are: (1) use a minimal base image like Alpine instead of full Debian — this alone drops your base from ~180MB to ~7MB; (2) use multi-stage builds so your build tools and compiler never ship to production; (3) chain RUN commands with && and clean up package manager caches in the same RUN instruction so intermediate files don't persist in a layer.
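A fourth, cheap win is a `.dockerignore` file. Here is a starting-point sketch for a Node.js project (adjust the patterns to your repository):

```
# Installed inside the image by npm ci; never copy it in
node_modules

# Git history is not needed at build time and can dwarf the app itself
.git

# Never bake secrets into image layers
.env

# Build artifacts and noise
dist
coverage
*.log
```

Without this file, `COPY . .` drags all of the above into the build context, which is a common reason builds are slow and images are bloated.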
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.