Spring Boot with Docker: The Ultimate Containerization Guide
- Docker ensures that if it runs in CI it runs in production — the image is the deployment artifact and the environment is baked into it, not configured separately on each host.
- Multi-stage builds are not an optimization — they are the correct way to build production Java images. Single-stage builds with JDK in production are a security and operational liability.
- Never run Spring Boot containers as root. Four lines of Dockerfile (groupadd, useradd, mkdir with chown, USER directive) prevent an entire class of security vulnerabilities and compliance findings.
- Docker packages your Spring Boot app with its exact runtime environment — JDK, OS libraries, config — into a portable, reproducible image
- Multi-stage builds separate the heavy build stage (Maven + JDK) from the slim runtime stage (JRE-only) — reducing image size from 800MB to 250MB
- Docker Compose orchestrates multi-service stacks (app + database + Redis) with service-name DNS resolution — no hardcoded IPs
- depends_on only controls startup order, not readiness — always pair it with healthcheck conditions to prevent connection failures on boot
- Never run containers as root — a compromised app gets host-level access; always define a non-root USER directive
- The biggest mistake: including Maven and JDK in the production image — it doubles size and triples the attack surface
Production Debug Guide

When Spring Boot containers behave unexpectedly, here is how to go from observable symptom to a verified resolution. These are ordered by frequency — the first three account for about 80% of the issues I have debugged.

Need to see why a container exited:

```shell
docker logs --tail 100 <container>
docker inspect <container> | jq '.[0].State.ExitCode'
```

Need to verify JVM memory settings are respecting container limits:

```shell
docker exec <container> jcmd 1 VM.flags | grep -iE 'ram|heap|container'
curl -s http://localhost:8080/actuator/metrics/jvm.memory.max | jq '.measurements[0].value'
```

Need to verify database connectivity from inside the application container:

```shell
docker exec <container> nslookup forge-db
docker exec <container> nc -zv forge-db 5432
```

Need to inspect what is inside a Docker image without running it:

```shell
docker history <image> --no-trunc | head -20
docker run --rm -it --entrypoint sh <image> -c 'ls -la /app/ && du -sh /* 2>/dev/null | sort -rh | head -10'
```

Docker Compose services failing to start in the right order despite depends_on:

```shell
docker compose ps
docker inspect <db_container> | jq '.[0].State.Health'
```
In the modern DevOps landscape, containerization is no longer optional — it is the baseline expectation for any Java service that ships to more than one environment. Spring Boot provides the framework for microservices, but Docker provides the deployment mechanism that makes those services reproducible and portable. Together they eliminate the environment drift that turns 'it works on my machine' into a recurring production incident.
But production-grade containerization is not just about running docker build and calling it done. A bloated 800MB image with embedded Maven doubles deployment time and triples the attack surface — every unused JDK library is a potential CVE waiting to be flagged by your security scanner. A container running as root exposes the host to privilege escalation the moment the application is compromised. A missing healthcheck causes cascading startup failures across your entire Compose stack every time the database takes longer than usual to initialize.
I have debugged all three of these in production. What they share is that the fix was straightforward once you understood the mechanism. The gap was never Docker knowledge — it was understanding what Docker actually does versus what developers assume it does.
This guide covers multi-stage builds with real size numbers, non-root security configurations, Docker Compose orchestration with proper healthchecks, JVM container memory optimization, layer caching strategy, and the specific production failures that separate hobby containers from deployments you can put your name on.
Multi-Stage Builds: The Single Most Impactful Dockerfile Change
The first Dockerfile most developers write for a Spring Boot application uses a single stage with a JDK image and copies the source code directly. It works. But it ships Maven, the JDK compiler, all your source code, and every downloaded dependency into the production image. The result is typically 750-900MB of image that takes minutes to push to ECR or GCR, gets flagged by container security scanners for dozens of JDK-related CVEs, and runs with far more filesystem surface than the running application ever needs.
Multi-stage builds are the fix. The concept is straightforward: use two FROM statements in one Dockerfile. The first stage has everything you need to build — JDK, Maven wrapper, source code. The second stage starts clean with only a JRE base image and copies the compiled JAR artifact from the build stage. When Docker builds the image, the build stage is used and then discarded. Nothing from Stage 1 ends up in the final image except the files you explicitly copy over.
The size improvement is not marginal. I have measured the before-and-after on multiple projects: single-stage JDK image runs 780-850MB, multi-stage JRE image runs 220-280MB. That is a 65-70% reduction. At 50 deployments per day across a team, the cumulative pull and push time savings are substantial. The security improvement is harder to quantify but consistently significant — a JDK image ships hundreds of binaries and libraries that a running Spring Boot application never touches at runtime.
The layer caching strategy inside the build stage matters almost as much as the stage separation itself. If you COPY src before running dependency:go-offline, every code change invalidates the Maven download cache layer and forces a full re-download. Copy pom.xml first, run the dependency resolution, then copy src. This way, only actual pom.xml changes trigger dependency re-downloads. Code changes reuse the cached dependency layer.
```dockerfile
# ── STAGE 1: Build ───────────────────────────────────────────────────────────
# Use the full JDK with Maven wrapper support
# eclipse-temurin is the preferred base: community-maintained, Adoptium project
FROM eclipse-temurin:21-jdk-jammy AS build
WORKDIR /app

# Copy Maven wrapper first — these rarely change, so this layer is almost always cached
COPY .mvn/ .mvn
COPY mvnw pom.xml ./

# Download all dependencies before copying source code
# Layer cache key is pom.xml — only invalidated when dependencies change, not when code changes
RUN ./mvnw dependency:go-offline -q

# Now copy source — this layer changes on every code commit, which is expected
COPY src ./src

# Build without tests (tests should run in a separate CI step, not during image build)
RUN ./mvnw clean package -DskipTests -q

# ── STAGE 2: Runtime ─────────────────────────────────────────────────────────
# JRE only — no compiler, no javac, no Maven, no source code
# Jammy (Ubuntu 22.04) base keeps OS libraries consistent with the build stage
FROM eclipse-temurin:21-jre-jammy
WORKDIR /app

# Create a dedicated non-root system user and group before adding any files
# -r flag: system account (no home dir, no login shell, no password)
RUN groupadd -r forgegroup && useradd -r -g forgegroup -s /sbin/nologin forgeuser

# Create log and temp directories before switching to non-root user
# Without this, the non-root user cannot create directories at runtime
RUN mkdir -p /app/logs /app/tmp && chown -R forgeuser:forgegroup /app

# Copy only the built artifact — nothing else from the build stage comes through
# --chown ensures the JAR is owned by the non-root user from the moment it lands
COPY --chown=forgeuser:forgegroup --from=build /app/target/*.jar app.jar

# Switch to non-root user for all subsequent operations including ENTRYPOINT
USER forgeuser

# JVM flags explained:
# -XX:+UseContainerSupport — read memory/CPU limits from cgroups, not /proc/meminfo
# -XX:MaxRAMPercentage=75.0 — allocate 75% of container memory as heap, leave 25% for metaspace/threads
# -XX:+ExitOnOutOfMemoryError — crash loudly on OOM instead of degrading silently
# -Djava.io.tmpdir=/app/tmp — write temp files to the directory the non-root user owns
# -Dfile.encoding=UTF-8 — explicit encoding prevents locale-dependent behavior differences
ENTRYPOINT ["java", \
    "-XX:+UseContainerSupport", \
    "-XX:MaxRAMPercentage=75.0", \
    "-XX:+ExitOnOutOfMemoryError", \
    "-Djava.io.tmpdir=/app/tmp", \
    "-Dfile.encoding=UTF-8", \
    "-jar", "app.jar"]
```
```text
# First build (cold cache — downloads all dependencies):
#   STAGE 1 build: ~180 seconds (Maven download + compile)
#   STAGE 2 setup: ~15 seconds
#   Final image: 268MB
#
# Second build after code-only change (warm cache — pom.xml unchanged):
#   STAGE 1 build: ~25 seconds (dependency layer cached, only compile runs)
#   STAGE 2 setup: ~5 seconds
#   Final image: 268MB
#
# Size comparison:
#   Single-stage JDK image: 834MB
#   Multi-stage JRE image:  268MB
#   Reduction: 67.9%
#
# Security scan results (example using Trivy):
#   Single-stage JDK: 43 CVEs (12 HIGH, 3 CRITICAL)
#   Multi-stage JRE:   8 CVEs (1 HIGH, 0 CRITICAL)
#
# Verify non-root user:
#   docker run --rm io.thecodeforge/forge-api:1.0.0 whoami
#   forgeuser
#
# Verify build tools absent from final image:
#   docker run --rm io.thecodeforge/forge-api:1.0.0 sh -c 'which mvn || echo "Maven: absent"'
#   Maven: absent
#   docker run --rm io.thecodeforge/forge-api:1.0.0 sh -c 'which javac || echo "JDK compiler: absent"'
#   JDK compiler: absent
```
- Stage 1 (build): eclipse-temurin:21-jdk-jammy + Maven wrapper + all source code — heavy, approximately 800MB, everything needed to compile
- Stage 2 (run): eclipse-temurin:21-jre-jammy + compiled JAR only — slim, approximately 260MB, nothing that was not in the final artifact
- The build stage is completely discarded by Docker — COPY --from=build is the only bridge between stages
- Layer cache ordering matters as much as stage separation: pom.xml first, dependency download second, src last — code changes do not invalidate the dependency cache
- The -XX:+ExitOnOutOfMemoryError flag is often skipped in examples but matters in production — a JVM degrading silently under memory pressure is harder to diagnose than one that exits with a clear OOM error
Docker Compose Orchestration: Healthchecks, DNS, and Service Dependencies
Docker Compose solves a real problem: running a Spring Boot application locally requires a database, probably a Redis instance, maybe a message broker. Without Compose, you are maintaining a mental checklist of docker run commands with the right port mappings, environment variables, and network flags. With Compose, the entire stack lives in one file that any developer can bring up with a single command.
But Compose introduces its own failure modes, and the most common one is the depends_on misunderstanding. The depends_on directive controls the order in which Docker starts containers. It does not wait for the service inside that container to become ready. A PostgreSQL container is 'started' from Docker's perspective the moment the postgres binary begins executing — which is 10-15 seconds before PostgreSQL finishes cluster initialization, creates the specified database, and opens its TCP listener on port 5432. The Spring Boot application's HikariCP pool makes its first connection attempt during this window and fails.
The fix is pairing depends_on with a healthcheck that uses the database's own readiness probe. PostgreSQL ships pg_isready for exactly this purpose. MySQL has mysqladmin ping. Redis has redis-cli ping. Use the database-native probe rather than a generic TCP check — TCP connectivity alone does not guarantee the database is ready to authenticate and execute queries.
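The two forms differ by only a few Compose lines. A minimal sketch of the contrast — service names and credentials here are illustrative, not part of the stack described in this section:

```yaml
services:
  app:
    image: example/app:latest
    # Plain `depends_on: [db]` would start app as soon as the db *container*
    # starts — often 10-15s before PostgreSQL accepts connections.
    # The long form below waits for the db healthcheck to pass:
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16-alpine
    environment:
      - POSTGRES_PASSWORD=example
    healthcheck:
      # Database-native readiness probe, not a bare TCP check
      test: ["CMD-SHELL", "pg_isready -U postgres -q"]
      interval: 10s
      timeout: 5s
      retries: 5
```

The `condition: service_healthy` key is what turns startup ordering into readiness gating; without a healthcheck on `db`, the condition can never be satisfied.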
Service name DNS is the other Compose feature that eliminates a class of bugs. Every service in a Compose file gets a DNS entry matching its service name on the default bridge network. This means you never hardcode 172.x.x.x addresses in connection strings — you use forge-db as the hostname and Docker resolves it to whatever internal IP the database container received. This works automatically and correctly handles container restarts where the IP may change.
```yaml
# Docker Compose V2 format — requires Docker Desktop 4.x+ or Docker Engine 20.10+
# Run:              docker compose up -d
# Stop:             docker compose down
# Rebuild app only: docker compose up -d --build forge-app
services:
  forge-app:
    image: io.thecodeforge/forge-api:latest
    build:
      context: .
      dockerfile: Dockerfile
    container_name: forge-app
    ports:
      - "8080:8080"
    environment:
      # Spring profiles — controls which application-{profile}.yml is loaded
      - SPRING_PROFILES_ACTIVE=docker
      # Database — uses service name 'forge-db' as hostname, Docker DNS resolves it
      - SPRING_DATASOURCE_URL=jdbc:postgresql://forge-db:5432/forgedb
      - SPRING_DATASOURCE_USERNAME=forge_admin
      # Credentials from .env file — never hardcode passwords in this file
      - SPRING_DATASOURCE_PASSWORD=${DB_PASSWORD}
      # Redis — uses service name 'forge-redis' as hostname
      - SPRING_DATA_REDIS_HOST=forge-redis
      - SPRING_DATA_REDIS_PORT=6379
      # JVM tuning — overrides ENTRYPOINT defaults if needed at runtime
      - JAVA_OPTS=-XX:MaxRAMPercentage=75.0
    # Container memory limit — JVM respects this via -XX:+UseContainerSupport
    mem_limit: 512m
    # Restart policy — restarts on crash but not on explicit stop
    restart: on-failure
    depends_on:
      forge-db:
        condition: service_healthy
      forge-redis:
        condition: service_healthy
    healthcheck:
      # /actuator/health requires spring-boot-starter-actuator dependency
      test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
      interval: 15s
      timeout: 5s
      retries: 5
      # Grace period — healthcheck not evaluated until app has had 30s to start
      start_period: 30s
    networks:
      - forge-network

  forge-db:
    image: postgres:16-alpine
    container_name: forge-db
    environment:
      - POSTGRES_DB=forgedb
      - POSTGRES_USER=forge_admin
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      # Named volume — data persists across docker compose down/up cycles
      - forge-db-data:/var/lib/postgresql/data
      # Optional: mount init SQL scripts for schema setup
      # - ./db/init:/docker-entrypoint-initdb.d
    healthcheck:
      # pg_isready: PostgreSQL's official readiness probe
      # -U and -d flags ensure we are checking the specific database, not just the socket
      test: ["CMD-SHELL", "pg_isready -U forge_admin -d forgedb -q"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
    networks:
      - forge-network

  forge-redis:
    image: redis:7-alpine
    container_name: forge-redis
    # allkeys-lru: evict least recently used keys when maxmemory is reached
    # appendonly yes: persist data to disk for cache warm-up after restart
    command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru --appendonly yes
    volumes:
      - forge-redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5
    networks:
      - forge-network

volumes:
  forge-db-data:
    driver: local
  forge-redis-data:
    driver: local

networks:
  forge-network:
    driver: bridge
```
```text
# docker compose up -d
#
# Expected startup sequence:
#   1. forge-db and forge-redis start in parallel
#   2. forge-db healthcheck runs every 10s — reports healthy after ~15s
#   3. forge-redis healthcheck runs every 10s — reports healthy after ~5s
#   4. forge-app starts only after BOTH dependencies report healthy
#   5. forge-app healthcheck begins after 30s start_period grace
#
# Verify service health:
#   docker compose ps
#   NAME          IMAGE                      STATUS        PORTS
#   forge-app     io.thecodeforge/forge-api  Up (healthy)  0.0.0.0:8080->8080/tcp
#   forge-db      postgres:16-alpine         Up (healthy)
#   forge-redis   redis:7-alpine             Up (healthy)
#
# Test DNS resolution from app container:
#   docker exec forge-app nslookup forge-db
#   Server:  127.0.0.11 (Docker's embedded DNS resolver)
#   Address: 127.0.0.11#53
#   Name:    forge-db
#   Address: 172.18.0.2
#
# Test connectivity:
#   docker exec forge-app nc -zv forge-db 5432
#   Connection to forge-db 5432 port [tcp/postgresql] succeeded!
#
# Check healthcheck detail for database:
#   docker inspect forge-db | jq '.[0].State.Health'
#   {"Status": "healthy", "FailingStreak": 0, "Log": [...]}
```
JVM Memory Configuration for Containerized Environments
This is the configuration mistake that kills containers silently. Before Java 10, the JVM had no awareness of container memory limits. It read /proc/meminfo to determine available memory and sized the heap based on the host machine's total RAM. If your host had 64GB of RAM and your container limit was 512MB, the JVM allocated approximately 16GB of heap — 32 times what the container allowed. The container OOM killer then terminated the process with exit code 137 and no Java exception was ever written to the logs because the JVM never got a chance to throw OutOfMemoryError.
Java 10 introduced -XX:+UseContainerSupport, which reads memory limits from the cgroup filesystem instead of /proc/meminfo. This flag is enabled by default starting in Java 11. Combined with -XX:MaxRAMPercentage, it gives you precise control over heap sizing relative to your container's memory limit.
The 75% figure for MaxRAMPercentage is not arbitrary. JVM memory consumption includes heap, metaspace, code cache, thread stacks, direct buffers, and the JVM's own native overhead. On a typical Spring Boot application, non-heap memory runs between 100MB and 200MB. A 512MB container with 75% MaxRAMPercentage allocates approximately 384MB of heap, leaving 128MB for the rest. This balance holds for most Spring Boot services I have worked with. Services with heavy reflection usage (frameworks that rely on proxies and dynamic class generation) may need more metaspace and a lower MaxRAMPercentage.
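The arithmetic is easy to sanity-check. A quick shell sketch for the 512MB example above (the JVM's real calculation also subtracts small internal reserves, so treat these as approximations):

```shell
# Heap vs. non-heap headroom at different MaxRAMPercentage values
limit_mb=512
for pct in 50 65 75; do
  heap_mb=$(( limit_mb * pct / 100 ))
  echo "MaxRAMPercentage=${pct}: heap ~${heap_mb}MB, non-heap headroom ~$(( limit_mb - heap_mb ))MB"
done
```

At 75% this yields roughly 384MB of heap and 128MB of headroom — the split the paragraph above describes.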
The -XX:+ExitOnOutOfMemoryError flag is underused and undervalued. Without it, a JVM that runs out of heap attempts to GC repeatedly, degrades to extreme latency, and eventually becomes unresponsive — without exiting. Kubernetes readiness probes may keep marking it as healthy while the JVM is effectively frozen. With the flag, the JVM exits cleanly, the container restarts, and your monitoring captures a clear OOM event rather than a mysterious latency spike.
```yaml
# application-docker.yml — loaded when SPRING_PROFILES_ACTIVE=docker
# This profile contains Docker-specific configuration that should not appear
# in application.yml (which is used for local development without containers)
spring:
  datasource:
    # forge-db resolves via Docker Compose DNS to the postgres container
    url: jdbc:postgresql://forge-db:5432/forgedb
    username: ${SPRING_DATASOURCE_USERNAME:forge_admin}
    password: ${SPRING_DATASOURCE_PASSWORD}
    hikari:
      # Container-appropriate pool sizing — not the development defaults
      maximum-pool-size: 10
      minimum-idle: 2
      # If a connection cannot be obtained in 20s, fail fast rather than hang
      connection-timeout: 20000
      idle-timeout: 300000
      # Validate connections on checkout — catches connections dropped by postgres idle timeout
      connection-test-query: SELECT 1
  data:
    redis:
      host: forge-redis
      port: 6379
      timeout: 2000ms
      lettuce:
        pool:
          max-active: 8
          min-idle: 2
          max-wait: 2000ms

# Bind to all interfaces — 0.0.0.0 keeps the app reachable through Docker's
# port mapping; a loopback (127.0.0.1) binding inherited from another profile
# would make the app unreachable from outside the container
server:
  address: 0.0.0.0
  port: 8080

# Expose actuator endpoints for health probes and metrics
management:
  endpoints:
    web:
      exposure:
        include: health, metrics, info, conditions
  endpoint:
    health:
      # Show full health details for Docker healthcheck probe
      show-details: always
      # Kubernetes-compatible readiness and liveness probes
      probes:
        enabled: true
  # Actuator port can be different from the app port for internal-only access
  # Omit this to use the same port as the application
  server:
    port: 8080

logging:
  # Output to stdout — Docker captures stdout and makes it available via docker logs
  # Never log to files inside a container unless you have a persistent volume mount
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n"
  level:
    root: INFO
    io.thecodeforge: DEBUG
```
```text
# docker exec forge-app jcmd 1 VM.flags | grep -iE 'MaxHeap|RAMPercentage'
#   -XX:MaxHeapSize=402653184 (384MB = 75% of 512MB container limit)
#   -XX:MaxRAMPercentage=75.000000
#
# Without -XX:+UseContainerSupport (or before Java 11):
#   -XX:MaxHeapSize=16777216000 (~16GB = 25% of 64GB host RAM — dangerous)
#
# Actuator health response:
#   curl http://localhost:8080/actuator/health | jq '.status'
#   "UP"
#
# Actuator readiness probe (for Kubernetes):
#   curl http://localhost:8080/actuator/health/readiness
#   {"status": "UP"}
#
# JVM memory usage:
#   curl http://localhost:8080/actuator/metrics/jvm.memory.used | jq '.measurements[0].value'
#   187432960 (approximately 179MB used of 384MB allocated heap)
```
- Heap (75%): object allocations, application data, cached objects — this is what MaxRAMPercentage controls
- Metaspace (variable): class metadata, proxies, reflection data — Spring Boot with many annotations uses 80-150MB
- Code cache (variable): JIT-compiled native code — typically 40-80MB for a Spring Boot service
- Thread stacks (predictable): approximately 512KB per thread, multiply by your thread pool size
- Direct buffer memory: Netty (used by Lettuce, WebFlux) allocates off-heap direct buffers that do not count toward heap
- If your service uses heavy AOP, Lombok, or Hibernate proxies, start at 65% MaxRAMPercentage and measure before increasing
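These components can be summed into a quick budget check before settling on a percentage. A shell sketch with illustrative figures for a heavyweight 512MB service — every number below is an assumption for demonstration, not a measurement:

```shell
# Does the 25% headroom left by MaxRAMPercentage=75.0 cover estimated non-heap usage?
limit_mb=512
headroom_mb=$(( limit_mb * 25 / 100 ))          # 128MB left over at 75% heap
threads=80; metaspace_mb=120; codecache_mb=50   # illustrative assumptions
stacks_mb=$(( threads * 512 / 1024 ))           # ~512KB stack per thread -> 40MB
need_mb=$(( metaspace_mb + codecache_mb + stacks_mb ))
echo "estimated non-heap: ${need_mb}MB vs ${headroom_mb}MB headroom"
if [ "$need_mb" -gt "$headroom_mb" ]; then
  echo "lower MaxRAMPercentage (e.g. 65.0) or raise the container limit"
fi
```

With these particular assumptions the estimate exceeds the headroom, which is exactly the situation where the last bullet's advice — start at 65% and measure — applies.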
Security Hardening: Non-Root Users and Secret Management
Running a containerized application as root is a security debt that auditors and security teams flag on every compliance review — for good reason. If your Spring Boot application has a vulnerability that allows arbitrary command execution (a deserialization exploit, a dependency with an RCE CVE, a path traversal in a file upload handler), and the JVM is running as root, the attacker has root access to the container filesystem. From there, depending on your Docker daemon configuration, container escape to the host becomes possible.
The fix is a few lines of Dockerfile: create a system group, create a system user, give that user ownership of the app directories, and switch to it with the USER directive. The user and group are created with -r (system account) flags, which means no home directory, no login shell, no password hash in /etc/shadow. This user can run the JAR but cannot install software, modify system files, or write outside the directories you explicitly own.
The COPY --chown directive is equally important. If you create the user after copying the JAR, the JAR is owned by root and the non-root user cannot read it. Copy with --chown=forgeuser:forgegroup, or create the user first and then copy.
Secret management is the second hardening dimension. The pattern I see most often — and most often in incident post-mortems — is credentials hardcoded in docker-compose.yml and committed to the repository. Once a credential is in Git history, it is effectively compromised regardless of whether you 'fixed it' in a later commit. The correct pattern is environment variable references in Compose (${DB_PASSWORD}) resolved from a .env file that is in .gitignore, or from a secret management system in production.
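The mechanics of the .env pattern are worth seeing end to end. A minimal sketch that mimics what docker compose does when it resolves ${DB_PASSWORD} — the value and filenames here are placeholders, and the real .env must be listed in .gitignore:

```shell
# Create a throwaway .env next to the (hypothetical) docker-compose.yml
cat > .env <<'EOF'
DB_PASSWORD=placeholder-not-a-real-secret
EOF

# docker compose reads .env from the project directory automatically;
# this mimics that resolution step in plain shell
set -a            # export every variable sourced below
. ./.env
set +a

echo "compose would substitute: \${DB_PASSWORD} -> ${DB_PASSWORD}"
rm -f .env        # clean up the demo file
```

The substitution happens on the host at compose time, so the secret reaches the container only as an environment variable and never appears in the committed YAML.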
For production deployments, a .env file on the host is still insufficient — a real secret manager should be involved. AWS Secrets Manager, HashiCorp Vault, and GCP Secret Manager all have Spring Boot integrations that inject secrets as environment variables or properties at startup, without them ever appearing in the container definition.
```dockerfile
# Security-hardened production Dockerfile
# Demonstrates: non-root user, minimal runtime image, container-level healthcheck
FROM eclipse-temurin:21-jdk-jammy AS build
WORKDIR /app
COPY .mvn/ .mvn
COPY mvnw pom.xml ./
RUN ./mvnw dependency:go-offline -q
COPY src ./src
RUN ./mvnw clean package -DskipTests -q

# ── Security-hardened runtime stage ──────────────────────────────────────────
FROM eclipse-temurin:21-jre-jammy

# Install curl for the healthcheck, then remove package manager caches
# so the layer ships no apt metadata
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean

WORKDIR /app

# Create system group and user with no login shell
# -r: system account    -s /sbin/nologin: no interactive login possible
RUN groupadd -r -g 1001 forgegroup \
    && useradd -r -u 1001 -g forgegroup -s /sbin/nologin -M forgeuser

# Create all directories the app needs with correct ownership
# Do this BEFORE switching to non-root user
RUN mkdir -p /app/logs /app/tmp /app/config \
    && chown -R forgeuser:forgegroup /app

# Copy artifact with explicit ownership — no root-owned files in /app
COPY --chown=forgeuser:forgegroup --from=build /app/target/*.jar app.jar

# Switch to non-root — all subsequent instructions and the ENTRYPOINT run as forgeuser
USER forgeuser

# Expose port for documentation purposes — docker run -p still required
EXPOSE 8080

# Health check uses curl — confirms app is responding before reporting healthy
# --fail flag: curl exits non-zero on HTTP 4xx/5xx — Docker treats this as unhealthy
HEALTHCHECK --interval=15s --timeout=5s --start-period=30s --retries=3 \
    CMD curl --fail --silent http://localhost:8080/actuator/health || exit 1

ENTRYPOINT ["java", \
    "-XX:+UseContainerSupport", \
    "-XX:MaxRAMPercentage=75.0", \
    "-XX:+ExitOnOutOfMemoryError", \
    "-Djava.io.tmpdir=/app/tmp", \
    "-Dfile.encoding=UTF-8", \
    "-jar", "app.jar"]
```
```text
# docker exec forge-app id
#   uid=1001(forgeuser) gid=1001(forgegroup) groups=1001(forgegroup)
#
# Verify the process cannot write outside its directories:
#   docker exec forge-app sh -c 'touch /etc/test 2>&1'
#   touch: cannot touch '/etc/test': Permission denied
#
# Verify curl is available for the healthcheck:
#   docker exec forge-app curl --fail http://localhost:8080/actuator/health
#   {"status":"UP",...}
#
# Trivy security scan comparison:
#   Root user image:  "Runs as root" — CRITICAL severity finding
#   Non-root image:   "Runs as non-root user" — PASS
#
# .env file pattern (never commit this file to Git):
#   .env contains:   DB_PASSWORD=actual_secret_here
#   .gitignore:      .env
#   docker-compose:  ${DB_PASSWORD} references the .env value
```
Layer Caching Strategy: Making Builds Deterministically Fast
Docker layer caching is the mechanism that makes subsequent builds fast after the first. Every instruction in a Dockerfile creates a layer. If Docker determines a layer's inputs have not changed since the last build, it reuses the cached result instead of re-executing the instruction. This is binary — either the layer is fully cached or it is fully invalidated and re-executed.
The rule that determines cache validity: if any layer above a given layer is invalidated, all layers below it are also invalidated. This makes the order of instructions in your Dockerfile as important as the instructions themselves.
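The chaining rule can be modeled in a few lines of shell. This is a toy sketch, not Docker's real implementation — Docker hashes the instruction text and, for COPY, the file contents — but the principle is the same: each layer's cache key folds in the previous layer's key:

```shell
# Toy model: each layer's key = checksum(previous key + instruction)
key=""
for step in "COPY pom.xml" "RUN ./mvnw dependency:go-offline" "COPY src" "RUN ./mvnw package"; do
  key=$(printf '%s|%s' "$key" "$step" | cksum | cut -d' ' -f1)
  echo "layer '${step}' -> cache key ${key}"
done
# Change any early instruction (or the files it copies) and every key after
# it changes too — which is why the position of COPY src matters so much
```

Because keys chain, invalidating layer N forces layers N+1 onward to rebuild even if their own instructions are untouched.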
For Spring Boot applications, the expensive operation is the Maven dependency download — pulling hundreds of JARs from Maven Central. This step can take 60-180 seconds on a cold cache. If you structure your Dockerfile so that COPY src precedes RUN mvnw dependency:go-offline, every single code change — even a one-character fix in a comment — invalidates the dependency download layer. The result is a 3-minute build for every change.
The fix is sequencing. Copy pom.xml and the Maven wrapper first. Run the dependency download. Then copy src. Now the dependency layer is only invalidated when pom.xml changes — which happens far less frequently than code changes. Code-only changes reuse the cached dependency layer and the build takes 20-30 seconds instead of 3 minutes.
This is not a minor optimization. On a team making 30-50 commits per day, the difference between 3-minute and 30-second Docker builds is the difference between a CI pipeline that developers trust and one they work around.
```dockerfile
# ── STAGE 1: Build with optimized layer caching ──────────────────────────────
FROM eclipse-temurin:21-jdk-jammy AS build
WORKDIR /app

# LAYER 1: Maven wrapper files
# Cache key: .mvn directory and mvnw script
# Invalidated: when Maven wrapper version changes (very rare)
COPY .mvn/ .mvn
COPY mvnw ./
RUN chmod +x mvnw

# LAYER 2: pom.xml only — no source code
# Cache key: pom.xml content hash
# Invalidated: when dependencies change (infrequent — maybe once per sprint)
COPY pom.xml ./

# LAYER 3: Dependency download
# Cache key: inherited from LAYER 2 (pom.xml hash)
# Invalidated: only when pom.xml changes
# This layer takes 60-180 seconds on first run, 0 seconds when cached
RUN ./mvnw dependency:go-offline -q

# LAYER 4: Source code
# Cache key: src directory content hash
# Invalidated: on every code change (expected and unavoidable)
COPY src ./src

# LAYER 5: Compile and package
# Cache key: inherited from LAYER 4 (source code hash)
# Invalidated: on every code change
RUN ./mvnw clean package -DskipTests -q

# ── STAGE 2: Runtime (same as before) ────────────────────────────────────────
FROM eclipse-temurin:21-jre-jammy
WORKDIR /app
RUN groupadd -r -g 1001 forgegroup && useradd -r -u 1001 -g forgegroup -s /sbin/nologin -M forgeuser
RUN mkdir -p /app/logs /app/tmp && chown -R forgeuser:forgegroup /app
COPY --chown=forgeuser:forgegroup --from=build /app/target/*.jar app.jar
USER forgeuser
ENTRYPOINT ["java", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=75.0", "-XX:+ExitOnOutOfMemoryError", "-Djava.io.tmpdir=/app/tmp", "-jar", "app.jar"]
```
```text
# WRONG ORDER (COPY src before dependency download):
#   Every commit invalidates the dependency cache
#   Average build time: 3 minutes 20 seconds
#   50 commits x 3.3 minutes = 165 minutes of CI time per day
#
# CORRECT ORDER (pom.xml first, dependency download, then src):
#   Code-only changes: dependency layer cached, build in 28 seconds
#   pom.xml changes:   full re-download, build in 3 minutes 10 seconds
#   50 commits (48 code, 2 pom.xml): 48x28s + 2x190s = 1344s + 380s = 28.7 minutes
#   Daily CI time reduction: 165 minutes -> 29 minutes = 82% reduction
#
# Docker cache inspection:
#   docker history io.thecodeforge/forge-api:latest --no-trunc
#   Shows each layer with size and creation command
#   Cached layers show age (e.g., '2 hours ago') — non-cached show 'just now'
#
# Force rebuild without cache (when you need a clean state):
#   docker build --no-cache -t io.thecodeforge/forge-api:latest .
```
| Aspect | Standard JAR Execution | Containerized Execution (Docker) |
|---|---|---|
| Portability | Requires matching JRE version, OS libraries, and environment variables on each host — works on developer's machine, may fail on CI or production | Completely self-contained — the JRE, OS libraries, and configuration are in the image. Runs identically on any host with Docker installed |
| Environment Consistency | Varies by host OS, JRE version, and installed libraries — 'works on my machine' is a real failure mode that causes production incidents | Identical everywhere — Linux namespace isolation ensures the container sees the same environment on every host |
| Resource Control | No enforcement — a runaway GC cycle or memory leak can consume all host RAM and affect other services on the same machine | Strict — container memory and CPU limits enforced by cgroups. A misbehaving container cannot affect other containers on the same host |
| Isolation | Shared filesystem, shared network stack, shared process namespace — one service's file operations can affect another's | Isolated filesystem, isolated network namespace, isolated process space — containers share the kernel but not each other's resources |
| Security Surface | Runs with whatever user launched the JAR — often a developer's local user or a broad service account | Configurable user via USER directive — non-root by default in well-configured images, with explicit filesystem permissions |
| Startup Dependencies | Manual — you must start the database before the application and hope it is ready. Sleep calls are the common workaround. | Structured — Docker Compose healthchecks and depends_on conditions make startup order and readiness deterministic |
| Deployment Unit | JAR file — recipient environment must have the correct JRE, correct OS, and correct environment variables configured separately | Docker image — a single immutable artifact contains the entire runtime. Push once, pull anywhere |
| Best For | Local development with a single service, scripted deployments to managed servers with configuration management tools (Ansible, Chef) | Everything that ships to more than one environment — development parity with production, CI pipelines, staging, and production deployments |
🎯 Key Takeaways
- Docker ensures that if it runs in CI it runs in production — the image is the deployment artifact and the environment is baked into it, not configured separately on each host.
- Multi-stage builds are not an optimization — they are the correct way to build production Java images. Single-stage builds with JDK in production are a security and operational liability.
- Never run Spring Boot containers as root. Four lines of Dockerfile (groupadd, useradd, mkdir with chown, USER directive) prevent an entire class of security vulnerabilities and compliance findings.
- depends_on is startup order — healthcheck is readiness. Always pair database dependencies with service_healthy conditions and database-native readiness probes (pg_isready, mysqladmin ping, redis-cli ping).
- Configure JVM memory flags explicitly: -XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0. Verify with jcmd after deploy. A JVM that ignores container limits is a time bomb.
- Layer cache ordering determines build speed: pom.xml before src, dependency download before compile. Code-only builds should take under 30 seconds on a warm cache.
- Log to stdout inside containers — Docker captures it automatically. File-based logging inside containers loses logs on restart and grows the writable layer unboundedly.
- Set server.address=0.0.0.0 in your container-specific application profile. The default loopback binding makes your application unreachable from outside the container network namespace.
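The server.address takeaway above can be sketched as a container-specific profile; the profile name `docker` and port are illustrative assumptions:

```properties
# application-docker.properties — hypothetical profile activated with
# SPRING_PROFILES_ACTIVE=docker inside the container.
# Bind to all interfaces so Docker's port mapping can reach the app;
# the loopback default (127.0.0.1) is invisible outside the container.
server.address=0.0.0.0
server.port=8080
```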
Interview Questions on This Topic
- Q: Explain the concept of multi-stage builds in Docker. Why is it particularly useful for Spring Boot applications and what is the real measured impact? (Mid-level)
- Q: How does Spring Boot detect it is running inside a Docker container, and how do you optimize JVM memory settings for containerized deployments? (Senior)
- Q: What is the difference between a Docker Image and a Docker Container in the context of a Spring Boot application? (Junior)
- Q: How do you pass application.properties values to a Spring Boot application running inside a Docker container? What are the trade-offs between the three approaches? (Mid-level)
- Q: Explain the PID 1 problem in Docker and why it matters for Spring Boot graceful shutdown. (Senior)
Frequently Asked Questions
What is the difference between a single-stage and multi-stage Dockerfile for Spring Boot?
A single-stage Dockerfile uses one FROM statement — typically a JDK image — and includes all build tools (JDK, Maven, Maven cache) in the final production image. The result is typically 750-900MB. A multi-stage Dockerfile uses two FROM statements: Stage 1 compiles with JDK and Maven, Stage 2 copies only the compiled JAR to a JRE base image. The build stage is completely discarded. The result is 220-280MB. The size difference directly affects push time to registries, pull time in CI and Kubernetes, and the number of CVEs reported by security scanners. Multi-stage is not a best practice — it is the correct approach for any image that ships to production.
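The two-FROM layout described above looks roughly like this. A minimal sketch: the image tags, the Maven wrapper, and the `/build/target/*.jar` path are assumptions about a standard Maven project:

```dockerfile
# Stage 1: build with the full JDK and the Maven wrapper (discarded after the build)
FROM eclipse-temurin:21-jdk-jammy AS build
WORKDIR /build
COPY .mvn .mvn
COPY mvnw pom.xml ./
RUN ./mvnw dependency:go-offline -q
COPY src src
RUN ./mvnw package -DskipTests -q

# Stage 2: slim JRE runtime — only the compiled JAR survives
FROM eclipse-temurin:21-jre-jammy
WORKDIR /app
COPY --from=build /build/target/*.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```

Nothing from Stage 1 — JDK, Maven, source code, dependency cache — reaches the final image; only the `COPY --from=build` line crosses the stage boundary.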
How do I pass secrets to a Docker container without hardcoding them?
Use environment variable references in docker-compose.yml: ${DB_PASSWORD} resolves from a .env file at runtime. The .env file stays in .gitignore and never enters version control. For production, do not use .env files — use a proper secret manager. AWS Secrets Manager, GCP Secret Manager, and HashiCorp Vault all have Spring Boot integrations (Spring Cloud Vault, AWS Secrets Manager property source) that inject secrets as environment variables or application properties at startup. The key principle: secrets should never appear in the container definition, the Dockerfile, or any version-controlled file. If a secret touches Git, it is compromised and must be rotated regardless of subsequent commits that 'remove' it.
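The `${DB_PASSWORD}` pattern above can be sketched like this; service names and variables are illustrative, not a prescribed layout:

```yaml
# docker-compose.yml — ${DB_PASSWORD} is resolved from a .env file
# sitting next to this file (and listed in .gitignore)
services:
  forge-db:
    image: postgres:16
    environment:
      POSTGRES_DB: forge
      POSTGRES_PASSWORD: ${DB_PASSWORD}   # never hardcoded here
```

The companion `.env` file contains a single line such as `DB_PASSWORD=change-me` and never enters version control.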
Why does my Spring Boot app crash with ECONNREFUSED even though the database container is running?
Docker Compose depends_on only ensures the database container process has started. It does not mean the database inside that container has finished initialization and is accepting connections. PostgreSQL's container process starts in under one second. PostgreSQL the database engine finishes cluster initialization and opens its listener in 10-20 seconds. Your Spring Boot application's HikariCP pool attempts its first connection during this gap and fails. The fix: add a healthcheck to the database service using pg_isready (for PostgreSQL), and change depends_on to use condition: service_healthy. Also add start_period to the healthcheck to give PostgreSQL time to initialize before health evaluation begins.
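A minimal sketch of the fix described above; service names, credentials, and timing values are assumptions to tune for your stack:

```yaml
services:
  forge-db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d forge"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 15s        # grace period while PostgreSQL initializes

  forge-api:
    depends_on:
      forge-db:
        condition: service_healthy   # wait for readiness, not just process start
```

With this in place, Compose holds back `forge-api` until `pg_isready` succeeds, so HikariCP's first connection attempt lands on a listening database.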
What JVM flags should I use for running Spring Boot in Docker?
The minimum set: -XX:+UseContainerSupport (reads memory limits from cgroups instead of host /proc/meminfo — enabled by default in Java 11+ but add explicitly for clarity), -XX:MaxRAMPercentage=75.0 (allocates 75% of container memory as heap — leaves 25% for metaspace, code cache, thread stacks, and Netty direct buffers), and -XX:+ExitOnOutOfMemoryError (fails loudly with a container exit instead of degrading silently under memory pressure). Additionally: -Djava.io.tmpdir=/app/tmp (ensures temp files go to a directory the non-root user owns) and -Dfile.encoding=UTF-8 (prevents locale-dependent character encoding differences between environments). Verify after deployment with docker exec <container> jcmd 1 VM.flags.
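Wired into a Dockerfile, the flag set above looks roughly like this — a sketch assuming `/app/app.jar` and a pre-created `/app/tmp` owned by the runtime user:

```dockerfile
# Exec-form ENTRYPOINT with explicit container-aware JVM flags;
# backslash continuations are joined before the JSON array is parsed
ENTRYPOINT ["java", \
  "-XX:+UseContainerSupport", \
  "-XX:MaxRAMPercentage=75.0", \
  "-XX:+ExitOnOutOfMemoryError", \
  "-Djava.io.tmpdir=/app/tmp", \
  "-Dfile.encoding=UTF-8", \
  "-jar", "app.jar"]
```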
How do I reduce my Spring Boot Docker image size?
In order of impact: (1) Multi-stage build — switch from JDK to JRE in the runtime stage. This is the largest single reduction, typically 500-600MB. (2) Alpine variant — switch from eclipse-temurin:21-jre-jammy to eclipse-temurin:21-jre-alpine for a smaller OS footprint, typically saving another 40-60MB. Note: alpine uses musl libc instead of glibc — verify your dependencies are compatible before using alpine in production. (3) Remove Maven download cache — add -Dmaven.repo.local=/tmp/m2 to the Maven command so the dependency cache is not included if you accidentally structured layers incorrectly. Realistic targets: Jammy JRE image 240-280MB, Alpine JRE image 180-220MB. Both are 65-75% smaller than a single-stage JDK image.
Can I use Docker Compose for production deployments?
Docker Compose is purpose-built for local development, CI pipelines, and single-host deployments. For production across multiple servers, it lacks the features you need: automatic rescheduling when a host goes down, rolling deployment with zero downtime, horizontal scaling across multiple nodes, integrated secret management, and resource quotas across a cluster. For production with multiple hosts, use Kubernetes (the industry standard for orchestration), Docker Swarm (simpler Compose-compatible syntax but less ecosystem support), or a managed container service (AWS ECS, Google Cloud Run, Azure Container Apps) that handles orchestration for you. Compose in production is appropriate only for single-server deployments where you are consciously accepting its limitations.
My container is being killed with exit code 137 but I see no Java exception in the logs. What is happening?
Exit code 137 means the container process was killed by SIGKILL — the uncatchable termination signal. In containerized environments, this is almost always an OOM (Out of Memory) kill from the Linux OOM killer. The JVM was consuming more memory than the container's limit and the kernel terminated it forcibly, before the JVM had an opportunity to write OutOfMemoryError to any log. Confirm with: docker inspect <container> | jq '.[0].State.OOMKilled' — if true, memory is the issue. Fix by adding -XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0 to your ENTRYPOINT. Verify the effective heap allocation with docker exec <container> jcmd 1 VM.flags | grep MaxHeap — it should be approximately 75% of your container --memory limit. If the OOM kills persist, the container memory limit is genuinely too low for the workload — measure actual memory usage under realistic load with docker stats before increasing the limit.
Why is my Docker build taking 3 minutes on every commit even though I only changed one line of code?
The most common cause is COPY src appearing before the dependency download step in your Dockerfile. Every code change invalidates the COPY src layer, which cascades to invalidate all subsequent layers including the Maven dependency download. Docker layer caching is binary and cascading — one invalidated layer rebuilds everything below it. The fix is instruction reordering: COPY .mvn and COPY mvnw first (rarely changes), then COPY pom.xml (changes only when dependencies change), then RUN mvnw dependency:go-offline (cached against pom.xml hash), then COPY src (changes every commit), then RUN mvnw package. With this ordering, code-only changes hit the cached dependency layer and the build takes 20-30 seconds. Only pom.xml changes trigger the full dependency download.
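The reordering described above, shown as a build-stage fragment with each layer's invalidation trigger noted; the Maven-wrapper layout is an assumption:

```dockerfile
# Build-stage fragment ordered for cache hits, most stable layers first
COPY .mvn .mvn                      # rarely changes
COPY mvnw pom.xml ./                # changes only when dependencies change
RUN ./mvnw dependency:go-offline -q # cached until pom.xml changes
COPY src src                        # changes on every commit
RUN ./mvnw package -DskipTests -q   # only this re-runs for code-only changes
```

If `COPY src src` came before the `dependency:go-offline` line, every commit would invalidate the dependency layer and trigger the full download.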
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.