
Spring Boot with Docker: The Ultimate Containerization Guide

📍 Part of: Spring Boot → Topic 13 of 15
Master Spring Boot and Docker integration.
🔥 Advanced — solid Java foundation required
In this tutorial, you'll learn
  • Docker ensures that if it runs in CI it runs in production — the image is the deployment artifact and the environment is baked into it, not configured separately on each host.
  • Multi-stage builds are not an optimization — they are the correct way to build production Java images. Single-stage builds with JDK in production are a security and operational liability.
  • Never run Spring Boot containers as root. Four lines of Dockerfile (groupadd, useradd, mkdir with chown, USER directive) prevent an entire class of security vulnerabilities and compliance findings.
Quick Answer
  • Docker packages your Spring Boot app with its exact runtime environment — JDK, OS libraries, config — into a portable, reproducible image
  • Multi-stage builds separate the heavy build stage (Maven + JDK) from the slim runtime stage (JRE-only) — reducing image size from 800MB to 250MB
  • Docker Compose orchestrates multi-service stacks (app + database + Redis) with service-name DNS resolution — no hardcoded IPs
  • depends_on only controls startup order, not readiness — always pair it with healthcheck conditions to prevent connection failures on boot
  • Never run containers as root — a compromised app gets host-level access; always define a non-root USER directive
  • The biggest mistake: including Maven and JDK in the production image — it doubles size and triples the attack surface
🚨 START HERE
Docker Debug Cheat Sheet — Commands That Save Hours
Real commands for debugging Spring Boot Docker containers. These are the commands I type first when a container is misbehaving — ordered by how often each situation actually comes up.
🟡 Need to see why a container exited
Immediate Action: Check container logs and exit code — the exit code tells you the kill reason before you read a single log line
Commands
docker logs --tail 100 <container>
docker inspect <container> | jq '.[0].State.ExitCode'
Fix Now: Exit code 137 means OOM killed — increase the container --memory limit or reduce MaxRAMPercentage. Exit code 1 means the JVM or Spring threw an unhandled exception — grep the logs for 'ERROR' and 'Exception'. Exit code 0 means the process exited cleanly — check if the entrypoint is correct and the JAR path is valid.
🟡 Need to verify JVM memory settings are respecting container limits
Immediate Action: Check the actual JVM heap configuration inside the running container — do not trust that flags are being applied without verifying
Commands
docker exec <container> jcmd 1 VM.flags | grep -iE 'ram|heap|container'
curl -s http://localhost:8080/actuator/metrics/jvm.memory.max | jq '.measurements[0].value'
Fix Now: If MaxHeapSize is larger than 75% of your container --memory limit, the JVM is ignoring container boundaries. Verify -XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0 are in your ENTRYPOINT. The Actuator jvm.memory.max value should be approximately 75% of your container memory limit in bytes.
🟡 Need to verify database connectivity from inside the application container
Immediate Action: Test DNS resolution and TCP port reachability before blaming the application
Commands
docker exec <container> nslookup forge-db
docker exec <container> nc -zv forge-db 5432
Fix Now: If nslookup fails with NXDOMAIN, the services are on different Docker networks or the service name in the connection string does not match the Compose service key. If nslookup succeeds but nc fails, the database container is not listening on that port — check if it has a healthcheck that reports healthy. If both succeed but the app still fails, check the connection pool configuration and credentials.
🟡 Need to inspect what is inside a Docker image without running it
Immediate Action: Use docker history to inspect layers and a temporary container to verify file layout
Commands
docker history <image> --no-trunc | head -20
docker run --rm -it --entrypoint sh <image> -c 'ls -la /app/ && du -sh /* 2>/dev/null | sort -rh | head -10'
Fix Now: If you see /root/.m2 in the layer history or the du output shows a large /root directory, the Maven cache is included in the production image — switch to multi-stage builds. If you see eclipse-temurin:17-jdk as the base layer, switch the runtime stage to eclipse-temurin:17-jre-jammy. A clean multi-stage image should show only /app/app.jar and JRE libraries under /opt/java.
🟡 Docker Compose services failing to start in the right order despite depends_on
Immediate Action: Check the actual healthcheck status of dependency services — not just whether they are running
Commands
docker compose ps
docker inspect <db_container> | jq '.[0].State.Health'
Fix Now: If Health is null, no healthcheck is defined on the dependency — add one. If Health.Status is 'starting', the app is starting before the healthcheck passes. If Health.Status is 'unhealthy', the healthcheck command is failing — run it manually inside the container: docker exec <db_container> pg_isready -U forge_admin -d bookstore to see the actual output.
Production Incident: The ECONNREFUSED Boot Loop — depends_on Without Healthchecks
A Spring Boot service crashed on every startup with ECONNREFUSED to PostgreSQL. Docker Compose showed the database container as 'running' but the application could not connect. The fix was one healthcheck block and a changed depends_on condition.
Symptom: Spring Boot application crashed on startup with 'Connection refused to forge-db:5432'. Docker Compose reported both containers as 'running' with green status. Restarting the entire stack manually sometimes worked, sometimes did not — the behavior was not deterministic. The CI pipeline failed on roughly 30% of runs with no code change between passing and failing builds. Developers assumed the database was slow to initialize and added a 30-second sleep in the entrypoint script.
Assumption: The team accepted the sleep workaround as necessary because Docker Compose 'said the database was running.' Nobody questioned why a running container could refuse connections. The CI failures were attributed to 'flaky infrastructure' rather than a deterministic timing bug. The 30-second sleep occasionally was not enough when the PostgreSQL image was pulled fresh in CI, and it wasted 20-25 seconds on every other run when PostgreSQL was ready in under 10 seconds.
Root cause: Docker Compose depends_on with no condition only ensures the database container process has started — specifically that the Docker daemon has launched the container and the entrypoint has begun executing. It does not mean PostgreSQL has finished its initialization sequence, created the database, applied the schema, and opened its listening socket on port 5432. PostgreSQL initialization on the alpine image takes between 3 and 15 seconds depending on the host, the image cache state, and whether it is performing first-run cluster initialization. The Spring Boot application's HikariCP connection pool attempted its initial connection during this window and crashed. The 30-second sleep was a race condition with a longer timeout — not a fix.
Fix: Added a healthcheck to the PostgreSQL service using pg_isready, the official PostgreSQL readiness probe, which reports healthy only when the server is fully initialized and accepting connections on the correct socket. Changed the depends_on block to use condition: service_healthy. The Spring Boot container now blocks at startup until the PostgreSQL healthcheck reports healthy, then attempts its first connection. The 30-second sleep was removed entirely. CI failure rate dropped from 30% to 0% across 200 subsequent runs. Average startup time reduced by 12 seconds on warm runs where PostgreSQL was ready quickly.
Key Lesson
  • depends_on controls container start order, not service readiness — 'running' means the process started, not that the service inside is accepting connections
  • A hardcoded sleep is a race condition with a longer timeout — use healthchecks for deterministic readiness, not timing guesses
  • Each database has its own readiness probe: pg_isready for PostgreSQL, mysqladmin ping for MySQL, redis-cli ping for Redis — use the database-native probe rather than generic TCP checks
  • If CI fails intermittently with ECONNREFUSED and both containers show as running, the first thing to check is healthcheck configuration — not infrastructure stability
  • The CI failure rate is the leading indicator here — 30% failure with no code change is a determinism bug, not a flaky test
Production Debug Guide
When Spring Boot containers behave unexpectedly, here is how to go from observable symptom to a verified resolution. These are ordered by frequency — the first three account for about 80% of the issues I have debugged.
Container exits immediately after start with no error in application logs
Check the exit code first — it tells you what killed the process: docker inspect <container> | jq '.[0].State.ExitCode'. Exit code 137 means OOM kill — check docker stats and increase the container memory limit. Exit code 1 means the JVM crashed before Spring finished starting — check docker logs <container> | grep -i 'error\|exception\|outofmemory'. If the log is entirely empty, check if the JAR is present: docker run --rm --entrypoint sh <image> -c 'ls -la /app/' to verify the COPY step in your Dockerfile worked correctly.
Application crashes with ECONNREFUSED to database on startup
Verify the database healthcheck is configured and reporting healthy before diagnosing further: docker inspect <db_container> | jq '.[0].State.Health'. If Health is null, no healthcheck is configured — add one. If Health.Status is 'starting', the app is connecting before the database is ready — add condition: service_healthy to depends_on. If Health.Status is 'healthy' but the connection still fails, verify DNS resolution from inside the app container: docker exec <app_container> nslookup forge-db. If nslookup fails, the containers are on different Docker networks — check your Compose network configuration.
Container runs but application is not accessible on the mapped port
Verify the port mapping is correct: docker ps should show 0.0.0.0:8080->8080/tcp. If it shows 127.0.0.1:8080->8080/tcp, the port is published on the host's loopback only — check the ports mapping in your Compose file. If the mapping is correct but connections still fail, the application may be binding to loopback inside the container — set server.address=0.0.0.0 in application.properties. Verify the application is actually listening: docker exec <container> ss -tlnp | grep 8080. If the port is not in the listen state, the Spring Boot context failed to start — check application logs for startup errors that happened after the JVM launched.
Image is 800MB or larger and takes 5 minutes to push to the registry
Check which layers are causing the bloat: docker history <image> --no-trunc | head -30. Look for JDK installation layers and Maven cache layers. If you see /root/.m2 or eclipse-temurin:17-jdk in the final image layers, you are not using multi-stage builds. Switch to multi-stage: Stage 1 builds with JDK and Maven, Stage 2 copies only the JAR to a JRE base image. Verify the result: docker images <image> should show under 300MB. Use docker run --rm --entrypoint sh <image> -c 'which mvn; which javac' to confirm build tools are absent from the final image.
Container cannot write to log files or create temp directories — permission denied errors
Check which user the container is running as: docker exec <container> whoami and docker exec <container> id. If non-root, verify the target directories are owned by that user: docker exec <container> ls -la /app/logs/. The most common cause is a COPY in the Dockerfile leaving files owned by root. Fix with COPY --chown=forgeuser:forgegroup --from=build /app/target/*.jar app.jar. Alternatively, add RUN mkdir -p /app/logs && chown forgeuser:forgegroup /app/logs before the USER directive.
Docker Compose services cannot reach each other by service name
Verify all services are on the same Docker network: docker network ls and docker network inspect <network_name>. All services in the same Compose file share the default network unless you have defined custom networks. Check that service names in connection strings match the Compose service key exactly — SPRING_DATASOURCE_URL=jdbc:postgresql://forge-db:5432 requires the Compose service to be named forge-db, not postgres or db. Verify DNS from inside the container: docker exec <app_container> nslookup forge-db.

In the modern DevOps landscape, containerization is no longer optional — it is the baseline expectation for any Java service that ships to more than one environment. Spring Boot provides the framework for microservices, but Docker provides the deployment mechanism that makes those services reproducible and portable. Together they eliminate the environment drift that turns 'it works on my machine' into a recurring production incident.

But production-grade containerization is not just about running docker build and calling it done. A bloated 800MB image with embedded Maven doubles deployment time and triples the attack surface — every unused JDK library is a potential CVE waiting to be flagged by your security scanner. A container running as root exposes the host to privilege escalation the moment the application is compromised. A missing healthcheck causes cascading startup failures across your entire Compose stack every time the database takes longer than usual to initialize.

I have debugged all three of these in production. What they share is that the fix was straightforward once you understood the mechanism. The gap was never Docker knowledge — it was understanding what Docker actually does versus what developers assume it does.

This guide covers multi-stage builds with real size numbers, non-root security configurations, Docker Compose orchestration with proper healthchecks, JVM container memory optimization, layer caching strategy, and the specific production failures that separate hobby containers from deployments you can put your name on.

Multi-Stage Builds: The Single Most Impactful Dockerfile Change

The first Dockerfile most developers write for a Spring Boot application uses a single stage with a JDK image and copies the source code directly. It works. But it ships Maven, the JDK compiler, all your source code, and every downloaded dependency into the production image. The result is typically 750-900MB of image that takes minutes to push to ECR or GCR, gets flagged by container security scanners for dozens of JDK-related CVEs, and runs with far more filesystem surface than the running application ever needs.

Multi-stage builds are the fix. The concept is straightforward: use two FROM statements in one Dockerfile. The first stage has everything you need to build — JDK, Maven wrapper, source code. The second stage starts clean with only a JRE base image and copies the compiled JAR artifact from the build stage. When Docker builds the image, the build stage is used and then discarded. Nothing from Stage 1 ends up in the final image except the files you explicitly copy over.

The size improvement is not marginal. I have measured the before-and-after on multiple projects: single-stage JDK image runs 780-850MB, multi-stage JRE image runs 220-280MB. That is a 65-70% reduction. At 50 deployments per day across a team, the cumulative pull and push time savings are substantial. The security improvement is harder to quantify but consistently significant — a JDK image ships hundreds of binaries and libraries that a running Spring Boot application never touches at runtime.

The layer caching strategy inside the build stage matters almost as much as the stage separation itself. If you COPY src before running dependency:go-offline, every code change invalidates the Maven download cache layer and forces a full re-download. Copy pom.xml first, run the dependency resolution, then copy src. This way, only actual pom.xml changes trigger dependency re-downloads. Code changes reuse the cached dependency layer.

Dockerfile · DOCKERFILE
# ── STAGE 1: Build ───────────────────────────────────────────────────────────
# Use the full JDK with Maven wrapper support
# eclipse-temurin is the preferred base: community-maintained, Adoptium project
FROM eclipse-temurin:21-jdk-jammy AS build
WORKDIR /app

# Copy Maven wrapper first — these rarely change, so this layer is almost always cached
COPY .mvn/ .mvn
COPY mvnw pom.xml ./

# Download all dependencies before copying source code
# Layer cache key is pom.xml — only invalidated when dependencies change, not when code changes
RUN ./mvnw dependency:go-offline -q

# Now copy source — this layer changes on every code commit, which is expected
COPY src ./src

# Build without tests (tests should run in a separate CI step, not during image build)
RUN ./mvnw clean package -DskipTests -q

# ── STAGE 2: Runtime ─────────────────────────────────────────────────────────
# JRE only — no compiler, no javac, no Maven, no source code
# Jammy (Ubuntu 22.04) base keeps OS libraries consistent with the build stage
FROM eclipse-temurin:21-jre-jammy
WORKDIR /app

# Create a dedicated non-root system user and group before adding any files
# -r flag: system account (no home dir, no login shell, no password)
RUN groupadd -r forgegroup && useradd -r -g forgegroup -s /sbin/nologin forgeuser

# Create log and temp directories before switching to non-root user
# Without this, the non-root user cannot create directories at runtime
RUN mkdir -p /app/logs /app/tmp && chown -R forgeuser:forgegroup /app

# Copy only the built artifact — nothing else from the build stage comes through
# --chown ensures the JAR is owned by the non-root user from the moment it lands
COPY --chown=forgeuser:forgegroup --from=build /app/target/*.jar app.jar

# Switch to non-root user for all subsequent operations including ENTRYPOINT
USER forgeuser

# JVM flags explained:
#   -XX:+UseContainerSupport        — read memory/CPU limits from cgroups, not /proc/meminfo
#   -XX:MaxRAMPercentage=75.0       — allocate 75% of container memory as heap, leave 25% for metaspace/threads
#   -XX:+ExitOnOutOfMemoryError     — crash loudly on OOM instead of degrading silently
#   -Djava.io.tmpdir=/app/tmp       — write temp files to the directory the non-root user owns
#   -Dfile.encoding=UTF-8           — explicit encoding prevents locale-dependent behavior differences
ENTRYPOINT ["java", \
  "-XX:+UseContainerSupport", \
  "-XX:MaxRAMPercentage=75.0", \
  "-XX:+ExitOnOutOfMemoryError", \
  "-Djava.io.tmpdir=/app/tmp", \
  "-Dfile.encoding=UTF-8", \
  "-jar", "app.jar"]
▶ Output
# Build: docker build -t io.thecodeforge/forge-api:1.0.0 .
#
# First build (cold cache — downloads all dependencies):
# STAGE 1 build: ~180 seconds (Maven download + compile)
# STAGE 2 setup: ~15 seconds
# Final image: 268MB
#
# Second build after code-only change (warm cache — pom.xml unchanged):
# STAGE 1 build: ~25 seconds (dependency layer cached, only compile runs)
# STAGE 2 setup: ~5 seconds
# Final image: 268MB
#
# Size comparison:
# Single-stage JDK image: 834MB
# Multi-stage JRE image: 268MB
# Reduction: 67.8%
#
# Security scan results (example using Trivy):
# Single-stage JDK: 43 CVEs (12 HIGH, 3 CRITICAL)
# Multi-stage JRE: 8 CVEs (1 HIGH, 0 CRITICAL)
#
# Verify non-root user (--entrypoint overrides the java ENTRYPOINT; without it,
# the trailing command would be passed as arguments to java):
# docker run --rm --entrypoint whoami io.thecodeforge/forge-api:1.0.0
# forgeuser
#
# Verify build tools absent from final image:
# docker run --rm --entrypoint sh io.thecodeforge/forge-api:1.0.0 -c 'which mvn || echo "Maven: absent"'
# Maven: absent
# docker run --rm --entrypoint sh io.thecodeforge/forge-api:1.0.0 -c 'which javac || echo "JDK compiler: absent"'
# JDK compiler: absent
Mental Model
Multi-Stage Builds Are a Factory Assembly Line
Stage 1 is the machine shop — heavy equipment, raw materials, tools everywhere. Stage 2 is the shipping dock — finished product only, nothing extra, nothing sharp that could cause problems in transit.
  • Stage 1 (build): eclipse-temurin:21-jdk-jammy + Maven wrapper + all source code — heavy, approximately 800MB, everything needed to compile
  • Stage 2 (run): eclipse-temurin:21-jre-jammy + compiled JAR only — slim, approximately 260MB, nothing that was not in the final artifact
  • The build stage is completely discarded by Docker — COPY --from=build is the only bridge between stages
  • Layer cache ordering matters as much as stage separation: pom.xml first, dependency download second, src last — code changes do not invalidate the dependency cache
  • The -XX:+ExitOnOutOfMemoryError flag is often skipped in examples but matters in production — a JVM degrading silently under memory pressure is harder to diagnose than one that exits with a clear OOM error
📊 Production Insight
A team at a fintech company deployed a single-stage Dockerfile with JDK 17 and Maven in the final production image. Image size was 862MB. Pushing to their private ECR registry took 8 minutes per deploy on a standard CI runner. Their security scanner flagged 51 CVEs from unused JDK libraries including three critical severity items related to the Java compiler toolchain — none of which would ever execute in a running Spring Boot service. The CVE report blocked three consecutive release cycles while the team argued about remediation. Switching to a multi-stage build took 45 minutes of work. Final image: 241MB. Deploy time: 90 seconds. CVEs: 6, all low severity from base OS packages present in both images.
🎯 Key Takeaway
Multi-stage builds are the single most impactful change you can make to a Spring Boot Dockerfile — 67% smaller image, 60% fewer CVEs, 70% faster registry operations from one structural change.
Layer cache ordering within the build stage is the second most impactful decision: copy pom.xml before src so code changes do not trigger dependency re-downloads.
The runtime stage should be provably clean — verify with docker run commands that confirm Maven, javac, and source code are absent.

Docker Compose Orchestration: Healthchecks, DNS, and Service Dependencies

Docker Compose solves a real problem: running a Spring Boot application locally requires a database, probably a Redis instance, maybe a message broker. Without Compose, you are maintaining a mental checklist of docker run commands with the right port mappings, environment variables, and network flags. With Compose, the entire stack lives in one file that any developer can bring up with a single command.

But Compose introduces its own failure modes, and the most common one is the depends_on misunderstanding. The depends_on directive controls the order in which Docker starts containers. It does not wait for the service inside that container to become ready. A PostgreSQL container is 'started' from Docker's perspective the moment the postgres binary begins executing — often 3-15 seconds before PostgreSQL finishes cluster initialization, creates the specified database, and opens its TCP listener on port 5432. The Spring Boot application's HikariCP pool makes its first connection attempt during this window and fails.

The fix is pairing depends_on with a healthcheck that uses the database's own readiness probe. PostgreSQL ships pg_isready for exactly this purpose. MySQL has mysqladmin ping. Redis has redis-cli ping. Use the database-native probe rather than a generic TCP check — TCP connectivity alone does not guarantee the database is ready to authenticate and execute queries.
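The MySQL probe mentioned above follows the same pattern as the PostgreSQL healthcheck shown in the full Compose file below. As a sketch only — the service name, image tag, and environment variables here are illustrative, not part of the stack this guide builds — a MySQL service with its native readiness probe might look like:

```yaml
# Illustrative fragment — forge-mysql and its settings are placeholders
  forge-mysql:
    image: mysql:8.0
    environment:
      - MYSQL_DATABASE=forgedb
      - MYSQL_ROOT_PASSWORD=${DB_PASSWORD}
    healthcheck:
      # mysqladmin ping succeeds once mysqld is up and responding to the
      # protocol — a stronger signal than a bare TCP port check
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 20s
```

The same shape applies to any dependency: swap in the service's own readiness command and tune start_period to the service's measured initialization time.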

Service name DNS is the other Compose feature that eliminates a class of bugs. Every service in a Compose file gets a DNS entry matching its service name on the default bridge network. This means you never hardcode 172.x.x.x addresses in connection strings — you use forge-db as the hostname and Docker resolves it to whatever internal IP the database container received. This works automatically and correctly handles container restarts where the IP may change.

docker-compose.yml · YAML
# Docker Compose V2 format — requires Docker Desktop 4.x+ or Docker Engine 20.10+
# Run: docker compose up -d
# Stop: docker compose down
# Rebuild app only: docker compose up -d --build forge-app

services:

  forge-app:
    image: io.thecodeforge/forge-api:latest
    build:
      context: .
      dockerfile: Dockerfile
    container_name: forge-app
    ports:
      - "8080:8080"
    environment:
      # Spring profiles — controls which application-{profile}.yml is loaded
      - SPRING_PROFILES_ACTIVE=docker
      # Database — uses service name 'forge-db' as hostname, Docker DNS resolves it
      - SPRING_DATASOURCE_URL=jdbc:postgresql://forge-db:5432/forgedb
      - SPRING_DATASOURCE_USERNAME=forge_admin
      # Credentials from .env file — never hardcode passwords in this file
      - SPRING_DATASOURCE_PASSWORD=${DB_PASSWORD}
      # Redis — uses service name 'forge-redis' as hostname
      - SPRING_DATA_REDIS_HOST=forge-redis
      - SPRING_DATA_REDIS_PORT=6379
      # JVM tuning — overrides ENTRYPOINT defaults if needed at runtime
      - JAVA_OPTS=-XX:MaxRAMPercentage=75.0
    # Container memory limit — JVM respects this via -XX:+UseContainerSupport
    mem_limit: 512m
    # Restart policy — restarts on crash but not on explicit stop
    restart: on-failure
    depends_on:
      forge-db:
        condition: service_healthy
      forge-redis:
        condition: service_healthy
    healthcheck:
      # /actuator/health requires spring-boot-starter-actuator dependency
      test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 30s  # Grace period — healthcheck not evaluated until app has had 30s to start
    networks:
      - forge-network

  forge-db:
    image: postgres:16-alpine
    container_name: forge-db
    environment:
      - POSTGRES_DB=forgedb
      - POSTGRES_USER=forge_admin
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      # Named volume — data persists across docker compose down/up cycles
      - forge-db-data:/var/lib/postgresql/data
      # Optional: mount init SQL scripts for schema setup
      # - ./db/init:/docker-entrypoint-initdb.d
    healthcheck:
      # pg_isready: PostgreSQL's official readiness probe
      # -U and -d flags ensure we are checking the specific database, not just the socket
      test: ["CMD-SHELL", "pg_isready -U forge_admin -d forgedb -q"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
    networks:
      - forge-network

  forge-redis:
    image: redis:7-alpine
    container_name: forge-redis
    # allkeys-lru: evict least recently used keys when maxmemory is reached
    # appendonly yes: persist data to disk for cache warm-up after restart
    command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru --appendonly yes
    volumes:
      - forge-redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5
    networks:
      - forge-network

volumes:
  forge-db-data:
    driver: local
  forge-redis-data:
    driver: local

networks:
  forge-network:
    driver: bridge
▶ Output
# Start the full stack:
# docker compose up -d
#
# Expected startup sequence:
# 1. forge-db and forge-redis start in parallel
# 2. forge-db healthcheck runs every 10s — reports healthy after ~15s
# 3. forge-redis healthcheck runs every 10s — reports healthy after ~5s
# 4. forge-app starts only after BOTH dependencies report healthy
# 5. forge-app healthcheck begins after 30s start_period grace
#
# Verify service health:
# docker compose ps
# NAME          IMAGE                       STATUS         PORTS
# forge-app     io.thecodeforge/forge-api   Up (healthy)   0.0.0.0:8080->8080/tcp
# forge-db      postgres:16-alpine          Up (healthy)
# forge-redis   redis:7-alpine              Up (healthy)
#
# Test DNS resolution from app container:
# docker exec forge-app nslookup forge-db
# Server: 127.0.0.11 (Docker's embedded DNS resolver)
# Address: 127.0.0.11#53
# Name: forge-db
# Address: 172.18.0.2
#
# Test connectivity:
# docker exec forge-app nc -zv forge-db 5432
# Connection to forge-db 5432 port [tcp/postgresql] succeeded!
#
# Check healthcheck detail for database:
# docker inspect forge-db | jq '.[0].State.Health'
# {"Status": "healthy", "FailingStreak": 0, "Log": [...]}
⚠ depends_on Is Startup Order — Healthcheck Is Readiness
depends_on without a condition only ensures the dependent container's process has started. PostgreSQL's container process starts in under one second. PostgreSQL the database engine finishes initialization in 10-20 seconds. That gap is where ECONNREFUSED lives. The condition: service_healthy directive tells Compose to wait until the dependency's healthcheck reports healthy before starting the dependent service. This converts a race condition into a deterministic startup sequence. start_period on the healthcheck is equally important: it gives a container time to start before the healthcheck begins counting failures. A Spring Boot application that takes 20 seconds to start should have start_period: 30s so Docker does not mark it unhealthy before it has had a chance to initialize.
📊 Production Insight
At one company, the development team had been maintaining a wait-for-it.sh script — a shell script that loops checking TCP connectivity until the database port opens. This worked but had two problems: it checked TCP connectivity, not PostgreSQL readiness, and it had to be maintained separately from the Compose file. After a Docker Engine upgrade changed how shell scripts were executed in entrypoints, the wait script broke silently and the team spent a day debugging what appeared to be a JVM issue. Switching to native healthcheck blocks removed the external dependency entirely and leveraged pg_isready, which tests actual PostgreSQL protocol readiness rather than just TCP port open status.
🎯 Key Takeaway
depends_on controls container start order, not service readiness — 'running' means the process started, not that the service is accepting connections.
Use the database's native readiness probe in healthcheck: pg_isready for PostgreSQL, mysqladmin ping for MySQL, redis-cli ping for Redis. Generic TCP checks are insufficient.
start_period prevents false-negative health failures during slow startup — set it to at least 1.5x your slowest measured startup time in CI.
Docker Compose Dependency and Readiness Strategy
If: Service depends on another service being started but not necessarily ready — for example, a logging sidecar
Use: depends_on without condition — container start order is guaranteed, no readiness wait
If: Service depends on another service being ready to accept connections — database, Redis, message broker
Use: depends_on with condition: service_healthy and a healthcheck using the dependency's native readiness probe
If: Service depends on an external resource not managed by Compose — a cloud database, an external API
Use: Handle readiness in the application itself with retry logic and exponential backoff — Spring Retry or Resilience4j RetryRegistry
If: Service startup is slow and Docker marks it unhealthy before it finishes initializing
Use: Add start_period to the healthcheck block — it delays the first health evaluation to give the service time to start
If: Multiple services need to start in a specific sequence with readiness gates between each
Use: Chain depends_on with condition: service_healthy through the sequence — each service waits for the previous one's healthcheck to pass
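For the external-resource case, where Compose cannot healthcheck the dependency, the retry lives in the application. In a real Spring Boot service you would reach for Spring Retry or Resilience4j rather than hand-rolling this; the stdlib-only sketch below (class and method names are my own, not from any framework) just illustrates the exponential-backoff mechanism:

```java
import java.util.concurrent.Callable;

// Minimal retry-with-exponential-backoff sketch for dependencies that
// Compose cannot gate with a healthcheck (e.g. a cloud database).
public class BackoffRetry {

    // Retries the action up to maxAttempts times, doubling the delay between
    // attempts; rethrows the last failure if every attempt fails.
    public static <T> T retry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) break;
                Thread.sleep(delay);   // back off before the next attempt
                delay *= 2;            // 100ms, 200ms, 400ms, ...
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical flaky dependency: refuses twice, then connects
        int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("connection refused");
            return "connected";
        }, 5, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints: connected after 3 attempts
    }
}
```

The key property is that the total wait adapts to the dependency instead of hardcoding a sleep — exactly the lesson from the ECONNREFUSED incident above.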

JVM Memory Configuration for Containerized Environments

This is the configuration mistake that kills containers silently. Before Java 10, the JVM had no awareness of container memory limits. It read /proc/meminfo to determine available memory and sized the heap based on the host machine's total RAM. If your host had 64GB of RAM and your container limit was 512MB, the JVM allocated approximately 16GB of heap — 32 times what the container allowed. The container OOM killer then terminated the process with exit code 137 and no Java exception was ever written to the logs because the JVM never got a chance to throw OutOfMemoryError.

Java 10 introduced -XX:+UseContainerSupport, which reads memory limits from the cgroup filesystem instead of /proc/meminfo. This flag is enabled by default starting in Java 11. Combined with -XX:MaxRAMPercentage, it gives you precise control over heap sizing relative to your container's memory limit.

The 75% figure for MaxRAMPercentage is not arbitrary. JVM memory consumption includes heap, metaspace, code cache, thread stacks, direct buffers, and the JVM's own native overhead. On a typical Spring Boot application, non-heap memory runs between 100MB and 200MB. A 512MB container with 75% MaxRAMPercentage allocates approximately 384MB of heap, leaving 128MB for the rest. This balance holds for most Spring Boot services I have worked with. Services with heavy reflection usage (frameworks that rely on proxies and dynamic class generation) may need more metaspace and a lower MaxRAMPercentage.

The -XX:+ExitOnOutOfMemoryError flag is underused and undervalued. Without it, a JVM that runs out of heap attempts to GC repeatedly, degrades to extreme latency, and eventually becomes unresponsive — without exiting. Kubernetes liveness probes may keep passing while the JVM is effectively frozen. With the flag, the JVM exits cleanly, the container restarts, and your monitoring captures a clear OOM event rather than a mysterious latency spike.

application-docker.yml · YAML
# application-docker.yml — loaded when SPRING_PROFILES_ACTIVE=docker
# This profile contains Docker-specific configuration that should not appear
# in application.yml (which is used for local development without containers)

spring:
  datasource:
    # forge-db resolves via Docker Compose DNS to the postgres container
    url: jdbc:postgresql://forge-db:5432/forgedb
    username: ${SPRING_DATASOURCE_USERNAME:forge_admin}
    password: ${SPRING_DATASOURCE_PASSWORD}
    hikari:
      # Container-appropriate pool sizing — not the development defaults
      maximum-pool-size: 10
      minimum-idle: 2
      # If a connection cannot be obtained in 20s, fail fast rather than hang
      connection-timeout: 20000
      idle-timeout: 300000
      # Validate connections on checkout — catches connections dropped by postgres idle timeout
      connection-test-query: SELECT 1
  data:
    redis:
      host: forge-redis
      port: 6379
      timeout: 2000ms
      lettuce:
        pool:
          max-active: 8
          min-idle: 2
          max-wait: 2000ms

# Bind to all interfaces — 0.0.0.0 is what the app needs inside a container.
# Spring Boot binds to all interfaces when server.address is unset; setting it explicitly
# guards against any override narrowing the binding to loopback (127.0.0.1), which would
# make the app unreachable from outside the container
server:
  address: 0.0.0.0
  port: 8080

# Expose actuator endpoints for health probes and metrics
management:
  endpoints:
    web:
      exposure:
        include: health, metrics, info, conditions
  endpoint:
    health:
      # Show full health details for Docker healthcheck probe
      show-details: always
      # Kubernetes-compatible readiness and liveness probes
      probes:
        enabled: true
  # Actuator port can be different from the app port for internal-only access;
  # setting it to the app port (or omitting it entirely) keeps a single listener
  server:
    port: 8080

logging:
  # Output to stdout — Docker captures stdout and makes it available via docker logs
  # Never log to files inside a container unless you have a persistent volume mount
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n"
  level:
    root: INFO
    io.thecodeforge: DEBUG
▶ Output
# Verify JVM is respecting container memory limit (512MB container):
#
# docker exec forge-app jcmd 1 VM.flags | grep -iE 'MaxHeap|RAMPercentage'
# -XX:MaxHeapSize=402653184 (384MB = 75% of 512MB container limit)
# -XX:MaxRAMPercentage=75.000000
#
# Without -XX:+UseContainerSupport (or before Java 11):
# -XX:MaxHeapSize=17179869184 (16GB = 25% of 64GB host RAM — dangerous)
#
# Actuator health response:
# curl http://localhost:8080/actuator/health | jq '.status'
# "UP"
#
# Actuator readiness probe (for Kubernetes):
# curl http://localhost:8080/actuator/health/readiness
# {"status": "UP"}
#
# JVM memory usage:
# curl http://localhost:8080/actuator/metrics/jvm.memory.used | jq '.measurements[0].value'
# 187432960 (approximately 179MB used of 384MB allocated heap)
Mental Model
Container Memory Budget: The 75-25 Split
MaxRAMPercentage=75.0 is a starting point, not a universal rule. The remaining 25% must cover everything the JVM uses outside the heap.
  • Heap (75%): object allocations, application data, cached objects — this is what MaxRAMPercentage controls
  • Metaspace (variable): class metadata, proxies, reflection data — Spring Boot with many annotations uses 80-150MB
  • Code cache (variable): JIT-compiled native code — typically 40-80MB for a Spring Boot service
  • Thread stacks (predictable): approximately 512KB per thread, multiply by your thread pool size
  • Direct buffer memory: Netty (used by Lettuce, WebFlux) allocates off-heap direct buffers that do not count toward heap
  • If your service uses heavy AOP, Lombok, or Hibernate proxies, start at 65% MaxRAMPercentage and measure before increasing
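The budget above can be sanity-checked with plain arithmetic before touching a container. A minimal sketch, assuming the 512MB limit and 75% figure used throughout this section:

```shell
# Back-of-the-envelope heap budget for a container memory limit.
# Mirrors -XX:MaxRAMPercentage: heap = limit * pct / 100; the remainder is the
# non-heap reserve (metaspace, code cache, thread stacks, direct buffers).
limit_mb=512   # container --memory limit
pct=75         # MaxRAMPercentage
heap_mb=$(( limit_mb * pct / 100 ))
reserve_mb=$(( limit_mb - heap_mb ))
echo "container=${limit_mb}MB heap=${heap_mb}MB non-heap-reserve=${reserve_mb}MB"
# If measured non-heap usage approaches the reserve, lower pct before raising the limit.
```

A 512MB limit yields a 384MB heap and a 128MB reserve. That covers the low end of the 100-200MB non-heap range quoted above but not the high end, which is exactly why the proxy-heavy services in the last bullet should start at 65% and measure.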
📊 Production Insight
A team set --memory=256m on their container but did not configure MaxRAMPercentage. The JVM, reading the host's 32GB RAM, allocated 8GB of heap. The container OOM killer terminated the process with exit code 137 approximately 90 seconds after startup — after the JVM had finished loading classes and the heap usage crossed 256MB. No Java exception was logged because the JVM process was killed at the OS level before it could write anything. The team spent several hours checking for application exceptions before someone noticed the exit code 137 and recognized it as an OOM kill. Adding -XX:MaxRAMPercentage=70.0 resolved the issue immediately.
🎯 Key Takeaway
-XX:+UseContainerSupport tells the JVM to read memory limits from cgroups — enabled by default in Java 11+, add it explicitly for clarity and compatibility.
-XX:MaxRAMPercentage=75.0 allocates 75% of container memory as heap — the remaining 25% covers metaspace, code cache, threads, and native memory.
-XX:+ExitOnOutOfMemoryError converts a slow, invisible JVM degradation into a fast, visible container restart — prefer loud failures over silent ones.
Always verify with docker exec <container> jcmd 1 VM.flags after deploy to confirm the JVM is using the expected heap size.

Security Hardening: Non-Root Users and Secret Management

Running a containerized application as root is a security debt that auditors and security teams flag on every compliance review — for good reason. If your Spring Boot application has a vulnerability that allows arbitrary command execution (a deserialization exploit, a dependency with an RCE CVE, a path traversal in a file upload handler), and the JVM is running as root, the attacker has root access to the container filesystem. From there, depending on your Docker daemon configuration, container escape to the host becomes possible.

The fix is four lines of Dockerfile: create a system group, create a system user, create the app's directories with correct ownership, and switch to that user. The user and group are created with -r (system account) flags, which means no home directory, no login shell, no password hash in /etc/shadow. This user can run the JAR but cannot install software, modify system files, or write outside the directories you explicitly own.

The COPY --chown directive is equally important. A plain COPY leaves the JAR owned by root regardless of the active USER; depending on the source file's permissions, the non-root user may be unable to read it, and it can never modify or replace it. Copy with --chown=forgeuser:forgegroup, or chown the files explicitly after copying.

Secret management is the second hardening dimension. The pattern I see most often — and most often in incident post-mortems — is credentials hardcoded in docker-compose.yml and committed to the repository. Once a credential is in Git history, it is effectively compromised regardless of whether you 'fixed it' in a later commit. The correct pattern is environment variable references in Compose (${DB_PASSWORD}) resolved from a .env file that is in .gitignore, or from a secret management system in production.

For production deployments, even the .env file approach is insufficient — a proper secret manager should own the credentials. AWS Secrets Manager, HashiCorp Vault, and GCP Secret Manager all have Spring Boot integrations that inject secrets as environment variables or properties at startup, without them ever appearing in the container definition.
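As one illustration of that integration style, a Spring Cloud Vault wiring might look like the sketch below. The Vault address, secret path, and authentication mode are placeholder assumptions, and spring-cloud-starter-vault-config must be on the classpath:

```yaml
# application.yml sketch: Vault-backed secrets (illustrative values).
# Secrets arrive as properties at startup; nothing appears in the image,
# the Compose file, or Git history.
spring:
  config:
    # Pulls key-value pairs from Vault into the Environment before bean creation
    import: "vault://secret/forge-api"
  cloud:
    vault:
      uri: https://vault.internal:8200   # placeholder address
      authentication: KUBERNETES          # or TOKEN / APPROLE, depending on the platform
```

The datasource password then resolves through the normal property mechanism (spring.datasource.password) with no plaintext value anywhere in the repository.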

Dockerfile.security-hardened · DOCKERFILE
# Security-hardened production Dockerfile
# Demonstrates: non-root user, read-only filesystem, minimal base image

FROM eclipse-temurin:21-jdk-jammy AS build
WORKDIR /app
COPY .mvn/ .mvn
COPY mvnw pom.xml ./
RUN ./mvnw dependency:go-offline -q
COPY src ./src
RUN ./mvnw clean package -DskipTests -q

# ── Security-hardened runtime stage ──────────────────────────────────────────
FROM eclipse-temurin:21-jre-jammy

# Install curl for the HEALTHCHECK below — the slim JRE image does not ship it —
# then remove package manager caches so they do not bloat the layer
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean

WORKDIR /app

# Create system group and user with no login shell
# -r: system account  -s /sbin/nologin: no interactive login possible
RUN groupadd -r -g 1001 forgegroup \
    && useradd -r -u 1001 -g forgegroup -s /sbin/nologin -M forgeuser

# Create all directories the app needs with correct ownership
# Do this BEFORE switching to non-root user
RUN mkdir -p /app/logs /app/tmp /app/config \
    && chown -R forgeuser:forgegroup /app

# Copy artifact with explicit ownership — no root-owned files in /app
COPY --chown=forgeuser:forgegroup --from=build /app/target/*.jar app.jar

# Switch to non-root — all subsequent instructions and the ENTRYPOINT run as forgeuser
USER forgeuser

# Expose port for documentation purposes — docker run -p still required
EXPOSE 8080

# Health check uses curl — confirms app is responding before reporting healthy
# --fail flag: curl exits non-zero on HTTP 4xx/5xx — Docker treats this as unhealthy
HEALTHCHECK --interval=15s --timeout=5s --start-period=30s --retries=3 \
  CMD curl --fail --silent http://localhost:8080/actuator/health || exit 1

ENTRYPOINT ["java", \
  "-XX:+UseContainerSupport", \
  "-XX:MaxRAMPercentage=75.0", \
  "-XX:+ExitOnOutOfMemoryError", \
  "-Djava.io.tmpdir=/app/tmp", \
  "-Dfile.encoding=UTF-8", \
  "-jar", "app.jar"]
▶ Output
# Verify non-root user is running the process:
# docker exec forge-app id
# uid=1001(forgeuser) gid=1001(forgegroup) groups=1001(forgegroup)
#
# Verify the process cannot write outside its directories:
# docker exec forge-app sh -c 'touch /etc/test 2>&1'
# touch: cannot touch '/etc/test': Permission denied
#
# Verify curl is available for healthcheck:
# docker exec forge-app curl --fail http://localhost:8080/actuator/health
# {"status":"UP",...}
#
# Trivy security scan comparison:
# Root user image: "Runs as root" — CRITICAL severity finding
# Non-root image: "Runs as non-root user" — PASS
#
# .env file pattern (never commit this file to Git):
# .env contains: DB_PASSWORD=actual_secret_here
# .gitignore: .env
# docker-compose: ${DB_PASSWORD} references the .env value
⚠ UID 0 in a Container Is Still Root
Running a container as root is not mitigated by container isolation alone. Container escape vulnerabilities are discovered regularly in both the Docker daemon and the Linux kernel. The principle of least privilege applies inside containers the same way it applies everywhere else in security engineering. Additionally, most compliance frameworks — SOC 2, PCI DSS, HIPAA — require workloads to run as non-root. An audit finding on this point typically results in a required remediation with a deadline. Building non-root images from the start is easier than retrofitting them under audit pressure.
📊 Production Insight
A team skipped the non-root user setup 'for simplicity during the initial rollout, to be fixed later.' The service was deployed to production. Six months later, a security assessment flagged every container as running as root. By that point, the Dockerfile had been copied as a template for 14 microservices. Remediating all 14 required coordinating 14 teams, discovering that 3 of the services had permission issues with file paths that only manifested after the user switch, and delaying two feature releases. The cost of 'fix it later' was orders of magnitude higher than adding the four lines of user creation to the original Dockerfile.
🎯 Key Takeaway
Non-root container users are not optional in production — they are a baseline security requirement and a compliance expectation in most regulated environments.
Create the user and directories before the COPY command, then switch with USER — the order matters for file ownership.
Never hardcode secrets in Dockerfile or docker-compose.yml — use .env files in .gitignore for local development and a proper secret manager for production deployments.

Layer Caching Strategy: Making Builds Deterministically Fast

Docker layer caching is the mechanism that makes subsequent builds fast after the first. Every instruction in a Dockerfile creates a layer. If Docker determines a layer's inputs have not changed since the last build, it reuses the cached result instead of re-executing the instruction. This is binary — either the layer is fully cached or it is fully invalidated and re-executed.

The rule that determines cache validity: if any layer above a given layer is invalidated, all layers below it are also invalidated. This makes the order of instructions in your Dockerfile as important as the instructions themselves.

For Spring Boot applications, the expensive operation is the Maven dependency download — pulling hundreds of JARs from Maven Central. This step can take 60-180 seconds on a cold cache. If you structure your Dockerfile so that COPY src precedes RUN mvnw dependency:go-offline, every single code change — even a one-character fix in a comment — invalidates the dependency download layer. The result is a 3-minute build for every change.

The fix is sequencing. Copy pom.xml and the Maven wrapper first. Run the dependency download. Then copy src. Now the dependency layer is only invalidated when pom.xml changes — which happens far less frequently than code changes. Code-only changes reuse the cached dependency layer and the build takes 20-30 seconds instead of 3 minutes.

This is not a minor optimization. On a team making 30-50 commits per day, the difference between 3-minute and 30-second Docker builds is the difference between a CI pipeline that developers trust and one they work around.

Dockerfile.cached · DOCKERFILE
# ── STAGE 1: Build with optimized layer caching ──────────────────────────────
FROM eclipse-temurin:21-jdk-jammy AS build
WORKDIR /app

# LAYER 1: Maven wrapper files
# Cache key: .mvn directory and mvnw script
# Invalidated: when Maven wrapper version changes (very rare)
COPY .mvn/ .mvn
COPY mvnw ./
RUN chmod +x mvnw

# LAYER 2: pom.xml only — no source code
# Cache key: pom.xml content hash
# Invalidated: when dependencies change (infrequent — maybe once per sprint)
COPY pom.xml ./

# LAYER 3: Dependency download
# Cache key: inherited from LAYER 2 (pom.xml hash)
# Invalidated: only when pom.xml changes
# This layer takes 60-180 seconds on first run, 0 seconds when cached
RUN ./mvnw dependency:go-offline -q

# LAYER 4: Source code
# Cache key: src directory content hash
# Invalidated: on every code change (expected and unavoidable)
COPY src ./src

# LAYER 5: Compile and package
# Cache key: inherited from LAYER 4 (source code hash)
# Invalidated: on every code change
RUN ./mvnw clean package -DskipTests -q

# ── STAGE 2: Runtime (same as before) ────────────────────────────────────────
FROM eclipse-temurin:21-jre-jammy
WORKDIR /app
RUN groupadd -r -g 1001 forgegroup && useradd -r -u 1001 -g forgegroup -s /sbin/nologin -M forgeuser
RUN mkdir -p /app/logs /app/tmp && chown -R forgeuser:forgegroup /app
COPY --chown=forgeuser:forgegroup --from=build /app/target/*.jar app.jar
USER forgeuser
ENTRYPOINT ["java", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=75.0", "-XX:+ExitOnOutOfMemoryError", "-Djava.io.tmpdir=/app/tmp", "-jar", "app.jar"]
▶ Output
# Build time comparison — 50-commit working day:
#
# WRONG ORDER (COPY src before dependency download):
# Every commit invalidates dependency cache
# Average build time: 3 minutes 20 seconds
# 50 commits x 3.3 minutes = 165 minutes of CI time per day
#
# CORRECT ORDER (pom.xml first, dependency download, then src):
# Code-only changes: dependency layer cached, build in 28 seconds
# pom.xml changes: full re-download, build in 3 minutes 10 seconds
# 50 commits (48 code, 2 pom.xml): 48x28s + 2x190s = 1344s + 380s = 28.7 minutes
# Daily CI time reduction: 165 minutes -> 29 minutes = 82% reduction
#
# Docker cache inspection:
# docker history io.thecodeforge/forge-api:latest --no-trunc
# Shows each layer with size and creation command
# Cached layers show age (e.g., '2 hours ago') — non-cached show 'just now'
#
# Force rebuild without cache (when you need a clean state):
# docker build --no-cache -t io.thecodeforge/forge-api:latest .
💡Cache Invalidation Is Binary and Cascading
Docker layer caching has no partial invalidation. Either a layer is fully cached or it is fully rebuilt. And when a layer is rebuilt, every subsequent layer is also rebuilt regardless of whether its inputs changed. This makes the ordering of COPY and RUN instructions the primary variable in build performance. If you have slow operations, put them high in the Dockerfile where they are most likely to be cached. Put frequently-changing files — source code — as low as possible so they invalidate only the fast-running layers below them.
📊 Production Insight
A team running 8 microservices on a shared CI runner pool had average pipeline times of 22 minutes. Each pipeline ran docker build with source code copied before the dependency download, making every commit invalidate the Maven cache layer. After reordering the Dockerfile instructions — pom.xml first, dependency:go-offline second, src third — average pipeline time dropped to 6 minutes. The runner pool was effectively 3.7x more capable without adding any infrastructure, purely from Dockerfile instruction ordering.
🎯 Key Takeaway
Layer cache ordering determines build speed more than hardware does — put slow, stable operations high and fast-changing operations low.
COPY pom.xml before COPY src and run dependency:go-offline between them — this is the single highest-leverage Dockerfile optimization for Spring Boot.
Docker cache invalidation is cascading: one invalidated layer rebuilds everything below it — understand which layers change at which frequency and order accordingly.
🗂 Standard JAR Execution vs. Containerized Execution
Docker adds isolation, portability, and resource control at the cost of a thin virtualization layer. For anything beyond a single-developer local setup, the trade-off is overwhelmingly positive.
  • Portability. Standard JAR: requires matching JRE version, OS libraries, and environment variables on each host — works on the developer's machine, may fail on CI or production. Docker: completely self-contained — the JRE, OS libraries, and configuration are in the image; runs identically on any host with Docker installed.
  • Environment consistency. Standard JAR: varies by host OS, JRE version, and installed libraries — 'works on my machine' is a real failure mode that causes production incidents. Docker: identical everywhere — Linux namespace isolation ensures the container sees the same environment on every host.
  • Resource control. Standard JAR: no enforcement — a runaway GC cycle or memory leak can consume all host RAM and affect other services on the same machine. Docker: strict — container memory and CPU limits are enforced by cgroups; a misbehaving container cannot affect other containers on the same host.
  • Isolation. Standard JAR: shared filesystem, shared network stack, shared process namespace — one service's file operations can affect another's. Docker: isolated filesystem, network namespace, and process space — containers share the kernel but not each other's resources.
  • Security surface. Standard JAR: runs with whatever user launched it — often a developer's local user or a broad service account. Docker: configurable user via the USER directive — non-root in well-configured images, with explicit filesystem permissions.
  • Startup dependencies. Standard JAR: manual — you must start the database before the application and hope it is ready; sleep calls are the common workaround. Docker: structured — Compose healthchecks and depends_on conditions make startup order and readiness deterministic.
  • Deployment unit. Standard JAR: the recipient environment must have the correct JRE, correct OS, and correct environment variables configured separately. Docker: a single immutable image contains the entire runtime — push once, pull anywhere.
  • Best for. Standard JAR: local development with a single service, or scripted deployments to managed servers with configuration management tools (Ansible, Chef). Docker: everything that ships to more than one environment — development parity with production, CI pipelines, staging, and production deployments.

🎯 Key Takeaways

  • Docker ensures that if it runs in CI it runs in production — the image is the deployment artifact and the environment is baked into it, not configured separately on each host.
  • Multi-stage builds are not an optimization — they are the correct way to build production Java images. Single-stage builds with JDK in production are a security and operational liability.
  • Never run Spring Boot containers as root. Four lines of Dockerfile (groupadd, useradd, mkdir with chown, USER directive) prevent an entire class of security vulnerabilities and compliance findings.
  • depends_on is startup order — healthcheck is readiness. Always pair database dependencies with service_healthy conditions and database-native readiness probes (pg_isready, mysqladmin ping, redis-cli ping).
  • Configure JVM memory flags explicitly: -XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0. Verify with jcmd after deploy. A JVM that ignores container limits is a time bomb.
  • Layer cache ordering determines build speed: pom.xml before src, dependency download before compile. Code-only builds should take under 30 seconds on a warm cache.
  • Log to stdout inside containers — Docker captures it automatically. File-based logging inside containers loses logs on restart and grows the writable layer unboundedly.
  • Set server.address=0.0.0.0 explicitly in your container-specific application profile. Spring Boot binds to all interfaces when server.address is unset, but any override that narrows the binding to loopback makes your application unreachable from outside the container network namespace.

⚠ Common Mistakes to Avoid

    Running the container as the root user
    Symptom

    Security audits flag every container as a critical or high severity finding. Compliance reviews fail. If the application has any RCE vulnerability, the attacker gets root access to the container filesystem — and potentially to the host depending on Docker daemon configuration and kernel version.

    Fix

    Add a non-root user in the Dockerfile: RUN groupadd -r -g 1001 forgegroup && useradd -r -u 1001 -g forgegroup -s /sbin/nologin -M forgeuser. Create the directories the app needs with chown before switching users. Use COPY --chown=forgeuser:forgegroup when copying the JAR. Switch with USER forgeuser before ENTRYPOINT. Verify with docker exec <container> id.

    Hardcoding secrets in Dockerfile or docker-compose.yml
    Symptom

    Credentials are committed to Git history and visible to everyone with repository access. Docker images in registries expose secrets to anyone who pulls the image. A single repository access event compromises all environments that use those credentials.

    Fix

    Reference environment variables in docker-compose.yml using ${DB_PASSWORD} syntax. Resolve them from a .env file that is in .gitignore — never committed. For production, use Docker Secrets in Swarm mode, HashiCorp Vault with the Spring Cloud Vault integration, or a cloud-native secret manager (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault). Secrets in Git are permanent — rotation is required after any accidental commit, even if the commit is subsequently reverted.

    Including build tools (Maven and JDK) in the final production image
    Symptom

    Image size is 750-900MB. Push times to container registries are 4-8 minutes. Security scans flag dozens of CVEs from JDK libraries and Maven's own dependencies. None of these libraries execute at runtime — they are pure dead weight in the production image.

    Fix

    Use multi-stage builds. Stage 1 compiles with JDK and Maven. Stage 2 copies only the built JAR to a JRE base image. The build stage is discarded. Image size drops from 800MB to 250MB. CVE count drops by 60-80%. Verify build tools are absent from the final image: docker run --rm <image> sh -c 'which mvn || echo absent'.

    Using depends_on without healthcheck conditions
    Symptom

    Application crashes on startup with ECONNREFUSED to the database. Docker Compose shows both containers as running. The failure is intermittent — sometimes the database is ready in time, sometimes it is not. CI failure rate is 20-40% with no code change between passing and failing builds.

    Fix

    Add a healthcheck to the database service using its native readiness probe: pg_isready for PostgreSQL, mysqladmin ping for MySQL, redis-cli ping for Redis. Change depends_on to use condition: service_healthy. Add start_period to the healthcheck to give the database time to initialize before health evaluation begins. The startup sequence becomes deterministic.

    Not configuring JVM memory flags for container environments
    Symptom

    Container is killed with exit code 137 (OOM kill) with no Java exception in the logs. The JVM allocated heap based on host RAM (e.g., 16GB from a 64GB host) while the container limit was 512MB. The OOM kill happens at the OS level before the JVM can write OutOfMemoryError.

    Fix

    Add -XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0 to the ENTRYPOINT. Set container memory limits via --memory in docker run or mem_limit in Compose. Verify the effective heap size: docker exec <container> jcmd 1 VM.flags | grep MaxHeap. The value should be approximately 75% of your container memory limit.

    Copying source code before downloading dependencies in the Dockerfile
    Symptom

    Every Docker build takes 3+ minutes even for trivial one-line code changes. CI pipelines are slow and developers work around them rather than waiting. Build runners are saturated with redundant Maven downloads.

    Fix

    Reorder Dockerfile instructions: COPY pom.xml first, RUN dependency:go-offline second, COPY src third. This ensures the dependency download layer is only invalidated when pom.xml changes, not on every code change. Code-only builds drop from 3 minutes to 25 seconds.

    Logging to files inside a container without a persistent volume mount
    Symptom

    Application logs disappear when the container restarts. Post-incident analysis is impossible because the evidence is gone. Log files grow unbounded inside the container filesystem and consume the container's writable layer.

    Fix

    Configure Spring Boot to log to stdout (the default console appender) instead of rolling file appenders. Docker captures stdout automatically and makes it available via docker logs. For long-term log retention, ship logs to an external system (CloudWatch, Elasticsearch, Datadog) via the Docker logging driver or a sidecar agent.

    Not setting server.address=0.0.0.0 in the container application configuration
    Symptom

    Application starts successfully, healthcheck inside the container passes, but the mapped port is unreachable from the host or from other containers. Port mapping shows correctly in docker ps but connections are refused.

    Fix

    Set server.address=0.0.0.0 in application-docker.yml or pass SERVER_ADDRESS=0.0.0.0 as an environment variable, and remove any configuration that binds to loopback. Spring Boot binds to all interfaces when server.address is unset, so this symptom almost always traces back to an overriding property or profile — and an explicit 127.0.0.1 binding is unreachable from outside the container's network namespace, a Linux networking fundamental that surprises developers used to running everything on one host.

Interview Questions on This Topic

  • Q (Mid-level): Explain the concept of multi-stage builds in Docker. Why is it particularly useful for Spring Boot applications, and what is the real measured impact?
    Multi-stage builds use multiple FROM statements in a single Dockerfile. Each FROM starts a new build stage with its own filesystem. Stage 1 uses a heavy image — eclipse-temurin:21-jdk-jammy — with Maven to compile the Spring Boot application and run the dependency resolution. Stage 2 starts fresh from a lightweight image — eclipse-temurin:21-jre-jammy — and copies only the compiled JAR using COPY --from=build. The build stage is completely discarded — nothing from it appears in the final image except files explicitly copied over. This is particularly useful for Spring Boot because Java requires a JDK to compile but only a JRE to run. The JDK is approximately 400MB larger than the JRE and includes tools (javac, javap, jshell, diagnostic utilities) that a running application never needs. Maven adds another 100-200MB of tooling and cached dependencies. Measured impact: single-stage JDK images typically measure 800-900MB. Multi-stage JRE images measure 220-280MB — a 65-70% reduction. Security scanner CVE counts drop by 60-80% because the JDK toolchain and its transitive dependencies are absent. Container registry push times drop proportionally. On a team with 50 deployments per day, this translates to hours of saved CI time weekly without any application code change.
  • Q (Senior): How does Spring Boot detect it is running inside a Docker container, and how do you optimize JVM memory settings for containerized deployments?
    Before Java 10, the JVM read /proc/meminfo to determine available memory — which reports the host machine's total RAM, not the container's memory limit. A JVM in a 512MB container on a 64GB host would see 64GB of available memory and allocate approximately 16GB of heap. The container OOM killer would then terminate the JVM process with exit code 137 — before the JVM could write any exception to logs. Java 10 introduced -XX:+UseContainerSupport, which instructs the JVM to read memory limits from the cgroup filesystem (/sys/fs/cgroup/memory) rather than /proc/meminfo. This flag is enabled by default in Java 11 and later. Combined with -XX:MaxRAMPercentage=75.0, it allocates 75% of the container's memory limit as heap. The 75% figure accounts for non-heap JVM memory: metaspace (class metadata, Spring's proxy classes — typically 80-150MB), code cache (JIT-compiled native code — typically 40-80MB), thread stacks (approximately 512KB per thread), and Netty's direct buffer pool (used by Lettuce and WebFlux). These together typically consume 100-200MB on a Spring Boot service. Add -XX:+ExitOnOutOfMemoryError to convert silent JVM degradation under memory pressure into an immediate, visible container exit. Verify effectiveness with docker exec <container> jcmd 1 VM.flags | grep MaxHeap — the reported value should be approximately 75% of your container --memory limit in bytes.
  • Q (Junior): What is the difference between a Docker Image and a Docker Container in the context of a Spring Boot application?
    A Docker Image is a read-only, layered filesystem artifact built by executing the instructions in a Dockerfile. For a Spring Boot application, it contains: the JRE runtime, the OS base libraries, your compiled application JAR, and configuration instructions (ENTRYPOINT, ENV, EXPOSE). It is immutable once built and stored in a registry. You can think of it as the class definition in Java — it describes the structure and behavior but does not itself run. A Docker Container is a running instance of an image — an image with a live process running inside an isolated environment with its own network namespace, PID namespace, and writable filesystem layer. It is created from an image with docker run or docker compose up. The writable layer captures any filesystem changes the running process makes — log files, temp files, downloaded content. This layer is ephemeral: it disappears when the container is removed. In Spring Boot terms: the image is the deployment artifact equivalent of a JAR file — immutable, versioned, distributable. The container is the running process equivalent — has a lifecycle (start, run, stop, restart), consumes real resources (CPU, memory), and has observable state (logs, metrics, health status). Multiple containers can be created from the same image simultaneously — that is how horizontal scaling works in container orchestration.
  • Q (Mid-level): How do you pass application.properties values to a Spring Boot application running inside a Docker container? What are the trade-offs between the three approaches?
    Three methods with distinct trade-offs: (1) Environment variables — Spring Boot's relaxed binding maps SPRING_DATASOURCE_URL to spring.datasource.url automatically. Pass via docker run -e SPRING_DATASOURCE_URL=jdbc:postgresql://... or the environment block in docker-compose.yml. This is the most Docker-native approach and works with every orchestration platform. Trade-off: environment variables are visible to all processes inside the container (via /proc/self/environ) and appear in docker inspect output — unsuitable for high-sensitivity secrets without additional protection. (2) Volume-mounted properties file — mount application-docker.yml into the container at /app/config/application-docker.yml. Spring Boot automatically loads configuration from a config/ directory under the working directory (and from classpath:/config/), so with WORKDIR /app the mounted file is picked up. Pass -e SPRING_PROFILES_ACTIVE=docker to activate it. Trade-off: the file must exist on the host at the mount path — works well for VM deployments, awkward for Kubernetes (requires ConfigMap or Secret volume mounts). (3) Command-line arguments — append --spring.datasource.url=... after the JAR path in ENTRYPOINT. Takes highest priority in Spring's property resolution order. Trade-off: arguments appear in ps output and container inspect — not suitable for secrets. Also makes ENTRYPOINT verbose. Best practice: use environment variables for non-sensitive configuration and service coordinates (URLs, ports, feature flags). Use a secret manager integration (Spring Cloud Vault, AWS Secrets Manager) for credentials — these inject as environment variables at startup but are never stored in the container definition.
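A minimal sketch of approach (1) in Compose — the service name, image tag, and database coordinates are illustrative assumptions:

```yaml
# docker-compose.yml fragment -- names and values are examples
services:
  app:
    image: myorg/my-spring-app:1.0.0
    environment:
      # Relaxed binding maps this to spring.datasource.url
      SPRING_DATASOURCE_URL: jdbc:postgresql://db:5432/appdb
      # Activates application-docker.yml
      SPRING_PROFILES_ACTIVE: docker
    ports:
      - "8080:8080"
```

Note that the hostname db works because Compose provides service-name DNS resolution on the shared network.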
  • Q (Senior): Explain the PID 1 problem in Docker and why it matters for Spring Boot graceful shutdown.
    In Linux, the process with PID 1 has special kernel-level responsibilities that no other process has: it must reap orphaned child processes (calling wait() to prevent zombie processes from accumulating), and it is the signal recipient for the init system. Signal delivery to PID 1 works differently than to other PIDs — SIGTERM sent to PID 1 is only delivered if the process has an explicit signal handler for it; the default disposition (which terminates the process) does not apply. When your Spring Boot application runs as PID 1 inside a Docker container, docker stop sends SIGTERM to PID 1. The HotSpot JVM registers a SIGTERM handler by default, but where that handler is absent — edge cases exist with some JVM distributions and init wrappers — the signal is silently ignored. Docker waits for the grace period (default 10 seconds) and then sends SIGKILL, which cannot be caught. SIGKILL terminates the JVM instantly, skipping all shutdown hooks: Spring's context close listeners, @PreDestroy methods, HikariCP connection pool drain, Lettuce connection cleanup, and any in-flight request completion. The fix has two parts: first, use the exec form of ENTRYPOINT (the JSON array form: ENTRYPOINT ["java", ...]) rather than shell form. Shell form wraps the command in sh -c, making sh PID 1 and your JVM a child process — sh does not forward signals to children by default. Exec form makes the JVM itself PID 1 and it handles SIGTERM correctly. Second, for services that spawn child processes (JVM-based frameworks that use ProcessBuilder, or applications that exec subprocesses), add a lightweight init system. Docker provides the --init flag (which uses tini), or you can add tini explicitly as ENTRYPOINT ["/usr/bin/tini", "--", "java", ...]. Tini properly reaps zombies and forwards signals to the JVM child process.
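The two ENTRYPOINT forms side by side — a sketch, with the JAR name as an illustrative assumption:

```dockerfile
# Shell form: sh -c becomes PID 1; sh does NOT forward SIGTERM to the JVM.
# ENTRYPOINT java -jar app.jar          <- avoid

# Exec form: the JVM itself is PID 1 and receives SIGTERM from `docker stop`.
ENTRYPOINT ["java", "-jar", "app.jar"]

# If the app spawns subprocesses, use tini as a minimal init instead
# (or pass --init to docker run):
# ENTRYPOINT ["/usr/bin/tini", "--", "java", "-jar", "app.jar"]
```

Verify which process is PID 1 with docker exec &lt;container&gt; ps -p 1 — it should show java, not sh.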

Frequently Asked Questions

What is the difference between a single-stage and multi-stage Dockerfile for Spring Boot?

A single-stage Dockerfile uses one FROM statement — typically a JDK image — and includes all build tools (JDK, Maven, Maven cache) in the final production image. The result is typically 750-900MB. A multi-stage Dockerfile uses two FROM statements: Stage 1 compiles with JDK and Maven, Stage 2 copies only the compiled JAR to a JRE base image. The build stage is completely discarded. The result is 220-280MB. The size difference directly affects push time to registries, pull time in CI and Kubernetes, and the number of CVEs reported by security scanners. Multi-stage is not a best practice — it is the correct approach for any image that ships to production.
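A skeleton of the two-stage layout described above — base image tags and paths are illustrative assumptions, not the only valid choices:

```dockerfile
# Stage 1: build with JDK + Maven (this stage is discarded from the final image)
FROM eclipse-temurin:21-jdk-jammy AS build
WORKDIR /build
COPY . .
RUN ./mvnw package -DskipTests

# Stage 2: runtime with JRE only -- this is the image that ships
FROM eclipse-temurin:21-jre-jammy
WORKDIR /app
COPY --from=build /build/target/*.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```

Only the final FROM stage ends up in the pushed image; the JDK, Maven, and source tree exist solely during the build.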

How do I pass secrets to a Docker container without hardcoding them?

Use environment variable references in docker-compose.yml: ${DB_PASSWORD} resolves from a .env file at runtime. The .env file stays in .gitignore and never enters version control. For production, do not use .env files — use a proper secret manager. AWS Secrets Manager, GCP Secret Manager, and HashiCorp Vault all have Spring Boot integrations (Spring Cloud Vault, AWS Secrets Manager property source) that inject secrets as environment variables or application properties at startup. The key principle: secrets should never appear in the container definition, the Dockerfile, or any version-controlled file. If a secret touches Git, it is compromised and must be rotated regardless of subsequent commits that 'remove' it.
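A minimal sketch of the .env pattern — the service, variable name, and placeholder value are examples:

```yaml
# docker-compose.yml -- ${DB_PASSWORD} resolves from a .env file
# in the same directory at `docker compose up` time
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
```

With a gitignored .env file alongside it containing, for example, DB_PASSWORD=local-dev-only. The value never appears in the committed Compose file, only in docker inspect on the machine that ran it.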

Why does my Spring Boot app crash with ECONNREFUSED even though the database container is running?

Docker Compose depends_on only ensures the database container process has started. It does not mean the database inside that container has finished initialization and is accepting connections. PostgreSQL's container process starts in under one second. PostgreSQL the database engine finishes cluster initialization and opens its listener in 10-20 seconds. Your Spring Boot application's HikariCP pool attempts its first connection during this gap and fails. The fix: add a healthcheck to the database service using pg_isready (for PostgreSQL), and change depends_on to use condition: service_healthy. Also add start_period to the healthcheck to give PostgreSQL time to initialize before health evaluation begins.
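The fix described above, sketched in Compose (service names, image tags, and timing values are illustrative):

```yaml
services:
  db:
    image: postgres:16
    healthcheck:
      # pg_isready exits 0 only once PostgreSQL accepts connections
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 15s   # grace window before failures count
  app:
    image: myorg/my-spring-app:1.0.0
    depends_on:
      db:
        condition: service_healthy   # wait for the healthcheck, not just the process
```

Without the condition, depends_on only orders container creation; with it, Compose blocks the app until the database reports healthy.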

What JVM flags should I use for running Spring Boot in Docker?

The minimum set: -XX:+UseContainerSupport (reads memory limits from cgroups instead of host /proc/meminfo — enabled by default in Java 11+ but add explicitly for clarity), -XX:MaxRAMPercentage=75.0 (allocates 75% of container memory as heap — leaves 25% for metaspace, code cache, thread stacks, and Netty direct buffers), and -XX:+ExitOnOutOfMemoryError (fails loudly with a container exit instead of degrading silently under memory pressure). Additionally: -Djava.io.tmpdir=/app/tmp (ensures temp files go to a directory the non-root user owns) and -Dfile.encoding=UTF-8 (prevents locale-dependent character encoding differences between environments). Verify after deployment with docker exec <container> jcmd 1 VM.flags.

How do I reduce my Spring Boot Docker image size?

In order of impact: (1) Multi-stage build — switch from JDK to JRE in the runtime stage. This is the largest single reduction, typically 500-600MB. (2) Alpine variant — switch from eclipse-temurin:21-jre-jammy to eclipse-temurin:21-jre-alpine for a smaller OS footprint, typically saving another 40-60MB. Note: alpine uses musl libc instead of glibc — verify your dependencies are compatible before using alpine in production. (3) Remove Maven download cache — add -Dmaven.repo.local=/tmp/m2 to the Maven command so the dependency cache is not included if you accidentally structured layers incorrectly. Realistic targets: Jammy JRE image 240-280MB, Alpine JRE image 180-220MB. Both are 65-75% smaller than a single-stage JDK image.

Can I use Docker Compose for production deployments?

Docker Compose is purpose-built for local development, CI pipelines, and single-host deployments. For production across multiple servers, it lacks the features you need: automatic rescheduling when a host goes down, rolling deployment with zero downtime, horizontal scaling across multiple nodes, integrated secret management, and resource quotas across a cluster. For production with multiple hosts, use Kubernetes (the industry standard for orchestration), Docker Swarm (simpler Compose-compatible syntax but less ecosystem support), or a managed container service (AWS ECS, Google Cloud Run, Azure Container Apps) that handles orchestration for you. Compose in production is appropriate only for single-server deployments where you are consciously accepting its limitations.

My container is being killed with exit code 137 but I see no Java exception in the logs. What is happening?

Exit code 137 means the container process was killed by SIGKILL — the uncatchable termination signal. In containerized environments, this is almost always an OOM (Out of Memory) kill from the Linux OOM killer. The JVM was consuming more memory than the container's limit and the kernel terminated it forcibly, before the JVM had an opportunity to write OutOfMemoryError to any log. Confirm with: docker inspect <container> | jq '.[0].State.OOMKilled' — if true, memory is the issue. Fix by adding -XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0 to your ENTRYPOINT. Verify the effective heap allocation with docker exec <container> jcmd 1 VM.flags | grep MaxHeap — it should be approximately 75% of your container --memory limit. If the OOM kills persist, the container memory limit is genuinely too low for the workload — measure actual memory usage under realistic load with docker stats before increasing the limit.

Why is my Docker build taking 3 minutes on every commit even though I only changed one line of code?

The most common cause is COPY src appearing before the dependency download step in your Dockerfile. Every code change invalidates the COPY src layer, which cascades to invalidate all subsequent layers including the Maven dependency download. Docker layer caching is binary and cascading — one invalidated layer rebuilds everything below it. The fix is instruction reordering: COPY .mvn and COPY mvnw first (rarely changes), then COPY pom.xml (changes only when dependencies change), then RUN mvnw dependency:go-offline (cached against pom.xml hash), then COPY src (changes every commit), then RUN mvnw package. With this ordering, code-only changes hit the cached dependency layer and the build takes 20-30 seconds. Only pom.xml changes trigger the full dependency download.
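The cache-friendly ordering inside the build stage, annotated with what invalidates each layer (a sketch using the Maven wrapper layout the answer assumes):

```dockerfile
# Ordered least- to most-frequently changing for maximum cache reuse
COPY .mvn .mvn                       # rarely changes
COPY mvnw pom.xml ./                 # invalidated only when dependencies change
RUN ./mvnw dependency:go-offline     # cached against the pom.xml layer above
COPY src src                         # invalidated on every code change
RUN ./mvnw package -DskipTests       # recompiles, but reuses downloaded deps
```

A code-only commit invalidates the last two layers; the dependency download stays cached and the build drops to the 20-30 second range described above.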

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← Previous: Spring Boot Testing with JUnit and Mockito · Next →: Microservices with Spring Boot and Spring Cloud
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged