Senior 8 min · March 09, 2026

Spring Boot Docker — depends_on Without Healthchecks

A 30% CI failure rate traced to ECONNREFUSED errors from depends_on without healthchecks.

Naren · Founder
Quick Answer
  • Docker packages your Spring Boot app with its exact runtime environment — JDK, OS libraries, config — into a portable, reproducible image
  • Multi-stage builds separate the heavy build stage (Maven + JDK) from the slim runtime stage (JRE-only) — reducing image size from 800MB to 250MB
  • Docker Compose orchestrates multi-service stacks (app + database + Redis) with service-name DNS resolution — no hardcoded IPs
  • depends_on only controls startup order, not readiness — always pair it with healthcheck conditions to prevent connection failures on boot
  • Never run containers as root — a compromised app gets host-level access; always define a non-root USER directive
  • The biggest mistake: including Maven and JDK in the production image. It roughly triples the image size (about 800MB versus 250MB) and ships build tooling the running app never uses
Plain-English First

Think of Spring Boot with Docker as the difference between shipping a cake recipe and shipping the entire bakery setup. If you hand someone just the JAR file, they need the right JRE version installed, the right OS libraries present, and the right environment variables configured. Get any of those wrong and the application refuses to start. With Docker, you pack the cake, the specific oven model, the oven settings, and the chef's exact instructions into one standardized shipping container. That container lands identically on a developer's laptop running macOS and on a production server running Amazon Linux 2023 — same JRE, same OS libraries, same behavior.

The 'it works on my machine' problem is not a myth. I have watched two-hour incident bridges caused by a JRE minor version difference between development and production. Docker makes that class of problem structurally impossible.

In the modern DevOps landscape, containerization is no longer optional — it is the baseline expectation for any Java service that ships to more than one environment. Spring Boot provides the framework for microservices, but Docker provides the deployment mechanism that makes those services reproducible and portable. Together they eliminate the environment drift that turns 'it works on my machine' into a recurring production incident.

But production-grade containerization is not just about running docker build and calling it done. A bloated 800MB image with embedded Maven doubles deployment time and triples the attack surface — every unused JDK library is a potential CVE waiting to be flagged by your security scanner. A container running as root exposes the host to privilege escalation the moment the application is compromised. A missing healthcheck causes cascading startup failures across your entire Compose stack every time the database takes longer than usual to initialize.

I have debugged all three of these in production. What they share is that the fix was straightforward once you understood the mechanism. The gap was never Docker knowledge — it was understanding what Docker actually does versus what developers assume it does.

This guide covers multi-stage builds with real size numbers, non-root security configurations, Docker Compose orchestration with proper healthchecks, JVM container memory optimization, layer caching strategy, and the specific production failures that separate hobby containers from deployments you can put your name on.

Multi-Stage Builds: The Single Most Impactful Dockerfile Change

The first Dockerfile most developers write for a Spring Boot application uses a single stage with a JDK image and copies the source code directly. It works. But it ships Maven, the JDK compiler, all your source code, and every downloaded dependency into the production image. The result is typically 750-900MB of image that takes minutes to push to ECR or GCR, gets flagged by container security scanners for dozens of JDK-related CVEs, and runs with far more filesystem surface than the running application ever needs.

Multi-stage builds are the fix. The concept is straightforward: use two FROM statements in one Dockerfile. The first stage has everything you need to build — JDK, Maven wrapper, source code. The second stage starts clean with only a JRE base image and copies the compiled JAR artifact from the build stage. When Docker builds the image, the build stage is used and then discarded. Nothing from Stage 1 ends up in the final image except the files you explicitly copy over.

The size improvement is not marginal. I have measured the before-and-after on multiple projects: single-stage JDK image runs 780-850MB, multi-stage JRE image runs 220-280MB. That is a 65-70% reduction. At 50 deployments per day across a team, the cumulative pull and push time savings are substantial. The security improvement is harder to quantify but consistently significant — a JDK image ships hundreds of binaries and libraries that a running Spring Boot application never touches at runtime.

The layer caching strategy inside the build stage matters almost as much as the stage separation itself. If you COPY src before running dependency:go-offline, every code change invalidates the Maven download cache layer and forces a full re-download. Copy pom.xml first, run the dependency resolution, then copy src. This way, only actual pom.xml changes trigger dependency re-downloads. Code changes reuse the cached dependency layer.

Dockerfile
# ── STAGE 1: Build ───────────────────────────────────────────────────────────
# Use the full JDK with Maven wrapper support
# eclipse-temurin is the preferred base: community-maintained, Adoptium project
FROM eclipse-temurin:21-jdk-jammy AS build
WORKDIR /app

# Copy Maven wrapper first — these rarely change, so this layer is almost always cached
COPY .mvn/ .mvn
COPY mvnw pom.xml ./

# Download all dependencies before copying source code
# Layer cache key is pom.xml — only invalidated when dependencies change, not when code changes
RUN ./mvnw dependency:go-offline -q

# Now copy source — this layer changes on every code commit, which is expected
COPY src ./src

# Build without tests (tests should run in a separate CI step, not during image build)
RUN ./mvnw clean package -DskipTests -q

# ── STAGE 2: Runtime ─────────────────────────────────────────────────────────
# JRE only — no compiler, no javac, no Maven, no source code
# Jammy (Ubuntu 22.04) base keeps OS libraries consistent with the build stage
FROM eclipse-temurin:21-jre-jammy
WORKDIR /app

# Create a dedicated non-root system user and group before adding any files
# -r: system account (UID from the system range, no home directory created); -s /sbin/nologin: no interactive login
RUN groupadd -r forgegroup && useradd -r -g forgegroup -s /sbin/nologin forgeuser

# Create log and temp directories before switching to non-root user
# Without this, the non-root user cannot create directories at runtime
RUN mkdir -p /app/logs /app/tmp && chown -R forgeuser:forgegroup /app

# Copy only the built artifact — nothing else from the build stage comes through
# --chown ensures the JAR is owned by the non-root user from the moment it lands
COPY --chown=forgeuser:forgegroup --from=build /app/target/*.jar app.jar

# Switch to non-root user for all subsequent operations including ENTRYPOINT
USER forgeuser

# JVM flags explained:
#   -XX:+UseContainerSupport        — read memory/CPU limits from cgroups, not /proc/meminfo
#   -XX:MaxRAMPercentage=75.0       — allocate 75% of container memory as heap, leave 25% for metaspace/threads
#   -XX:+ExitOnOutOfMemoryError     — crash loudly on OOM instead of degrading silently
#   -Djava.io.tmpdir=/app/tmp       — write temp files to the directory the non-root user owns
#   -Dfile.encoding=UTF-8           — explicit encoding prevents locale-dependent behavior differences
ENTRYPOINT ["java", \
  "-XX:+UseContainerSupport", \
  "-XX:MaxRAMPercentage=75.0", \
  "-XX:+ExitOnOutOfMemoryError", \
  "-Djava.io.tmpdir=/app/tmp", \
  "-Dfile.encoding=UTF-8", \
  "-jar", "app.jar"]
Output
# Build: docker build -t io.thecodeforge/forge-api:1.0.0 .
#
# First build (cold cache — downloads all dependencies):
# STAGE 1 build: ~180 seconds (Maven download + compile)
# STAGE 2 setup: ~15 seconds
# Final image: 268MB
#
# Second build after code-only change (warm cache — pom.xml unchanged):
# STAGE 1 build: ~25 seconds (dependency layer cached, only compile runs)
# STAGE 2 setup: ~5 seconds
# Final image: 268MB
#
# Size comparison:
# Single-stage JDK image: 834MB
# Multi-stage JRE image: 268MB
# Reduction: 67.9%
#
# Security scan results (example using Trivy):
# Single-stage JDK: 43 CVEs (12 HIGH, 3 CRITICAL)
# Multi-stage JRE: 8 CVEs (1 HIGH, 0 CRITICAL)
#
# Verify non-root user (override the java ENTRYPOINT, otherwise the command is passed to the app as arguments):
# docker run --rm --entrypoint whoami io.thecodeforge/forge-api:1.0.0
# forgeuser
#
# Verify build tools absent from final image:
# docker run --rm --entrypoint sh io.thecodeforge/forge-api:1.0.0 -c 'which mvn || echo "Maven: absent"'
# Maven: absent
# docker run --rm --entrypoint sh io.thecodeforge/forge-api:1.0.0 -c 'which javac || echo "JDK compiler: absent"'
# JDK compiler: absent
Multi-Stage Builds Are a Factory Assembly Line
  • Stage 1 (build): eclipse-temurin:21-jdk-jammy + Maven wrapper + all source code — heavy, approximately 800MB, everything needed to compile
  • Stage 2 (run): eclipse-temurin:21-jre-jammy + compiled JAR only. Slim, approximately 260MB, nothing beyond the final artifact
  • The build stage is completely discarded by Docker — COPY --from=build is the only bridge between stages
  • Layer cache ordering matters as much as stage separation: pom.xml first, dependency download second, src last — code changes do not invalidate the dependency cache
  • The -XX:+ExitOnOutOfMemoryError flag is often skipped in examples but matters in production — a JVM degrading silently under memory pressure is harder to diagnose than one that exits with a clear OOM error
Production Insight
A team at a fintech company deployed a single-stage Dockerfile with JDK 17 and Maven in the final production image. Image size was 862MB. Pushing to their private ECR registry took 8 minutes per deploy on a standard CI runner. Their security scanner flagged 51 CVEs from unused JDK libraries including three critical severity items related to the Java compiler toolchain — none of which would ever execute in a running Spring Boot service. The CVE report blocked three consecutive release cycles while the team argued about remediation. Switching to a multi-stage build took 45 minutes of work. Final image: 241MB. Deploy time: 90 seconds. CVEs: 6, all low severity from base OS packages present in both images.
Key Takeaway
Multi-stage builds are the single most impactful change you can make to a Spring Boot Dockerfile: in the measurements above, one structural change gave a 68% smaller image (834MB to 268MB) and 81% fewer CVEs (43 to 8), with correspondingly faster registry pushes and pulls.
Layer cache ordering within the build stage is the second most impactful decision: copy pom.xml before src so code changes do not trigger dependency re-downloads.
The runtime stage should be provably clean — verify with docker run commands that confirm Maven, javac, and source code are absent.

Docker Compose Orchestration: Healthchecks, DNS, and Service Dependencies

Docker Compose solves a real problem: running a Spring Boot application locally requires a database, probably a Redis instance, maybe a message broker. Without Compose, you are maintaining a mental checklist of docker run commands with the right port mappings, environment variables, and network flags. With Compose, the entire stack lives in one file that any developer can bring up with a single command.

But Compose introduces its own failure modes, and the most common one is the depends_on misunderstanding. The depends_on directive controls the order in which Docker starts containers. It does not wait for the service inside that container to become ready. A PostgreSQL container is 'started' from Docker's perspective the moment the postgres binary begins executing — which is 10-15 seconds before PostgreSQL finishes cluster initialization, creates the specified database, and opens its TCP listener on port 5432. The Spring Boot application's HikariCP pool makes its first connection attempt during this window and fails.

The fix is pairing depends_on with a healthcheck that uses the database's own readiness probe. PostgreSQL ships pg_isready for exactly this purpose. MySQL has mysqladmin ping. Redis has redis-cli ping. Use the database-native probe rather than a generic TCP check — TCP connectivity alone does not guarantee the database is ready to authenticate and execute queries.

Service name DNS is the other Compose feature that eliminates a class of bugs. Every service in a Compose file gets a DNS entry matching its service name on the project's network (a user-defined bridge network; Docker's legacy default bridge does not provide name resolution). This means you never hardcode 172.x.x.x addresses in connection strings. You use forge-db as the hostname and Docker resolves it to whatever internal IP the database container received. This works automatically and correctly handles container restarts where the IP may change.

docker-compose.yml
# Docker Compose V2 format — requires Docker Desktop 4.x+ or Docker Engine 20.10+
# Run: docker compose up -d
# Stop: docker compose down
# Rebuild app only: docker compose up -d --build forge-app

services:

  forge-app:
    image: io.thecodeforge/forge-api:latest
    build:
      context: .
      dockerfile: Dockerfile
    container_name: forge-app
    ports:
      - "8080:8080"
    environment:
      # Spring profiles — controls which application-{profile}.yml is loaded
      - SPRING_PROFILES_ACTIVE=docker
      # Database — uses service name 'forge-db' as hostname, Docker DNS resolves it
      - SPRING_DATASOURCE_URL=jdbc:postgresql://forge-db:5432/forgedb
      - SPRING_DATASOURCE_USERNAME=forge_admin
      # Credentials from .env file — never hardcode passwords in this file
      - SPRING_DATASOURCE_PASSWORD=${DB_PASSWORD}
      # Redis — uses service name 'forge-redis' as hostname
      - SPRING_DATA_REDIS_HOST=forge-redis
      - SPRING_DATA_REDIS_PORT=6379
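      # JVM tuning: the JVM reads JAVA_TOOL_OPTIONS itself; the exec-form ENTRYPOINT never expands JAVA_OPTS
      # Flags set here add to the ENTRYPOINT flags; a flag repeated in the ENTRYPOINT still wins
      - JAVA_TOOL_OPTIONS=-XX:MaxRAMPercentage=75.0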
    # Container memory limit — JVM respects this via -XX:+UseContainerSupport
    mem_limit: 512m
    # Restart policy — restarts on crash but not on explicit stop
    restart: on-failure
    depends_on:
      forge-db:
        condition: service_healthy
      forge-redis:
        condition: service_healthy
    healthcheck:
      # /actuator/health requires spring-boot-starter-actuator dependency
      test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 30s  # Grace period: probes still run, but failures do not count toward retries for the first 30s
    networks:
      - forge-network

  forge-db:
    image: postgres:16-alpine
    container_name: forge-db
    environment:
      - POSTGRES_DB=forgedb
      - POSTGRES_USER=forge_admin
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      # Named volume — data persists across docker compose down/up cycles
      - forge-db-data:/var/lib/postgresql/data
      # Optional: mount init SQL scripts for schema setup
      # - ./db/init:/docker-entrypoint-initdb.d
    healthcheck:
      # pg_isready: PostgreSQL's official readiness probe
      # -U and -d avoid spurious connection-error entries in the postgres log; pg_isready only reports whether the server accepts connections
      test: ["CMD-SHELL", "pg_isready -U forge_admin -d forgedb -q"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
    networks:
      - forge-network

  forge-redis:
    image: redis:7-alpine
    container_name: forge-redis
    # allkeys-lru: evict least recently used keys when maxmemory is reached
    # appendonly yes: persist data to disk for cache warm-up after restart
    command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru --appendonly yes
    volumes:
      - forge-redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5
    networks:
      - forge-network

volumes:
  forge-db-data:
    driver: local
  forge-redis-data:
    driver: local

networks:
  forge-network:
    driver: bridge
Output
# Start the full stack:
# docker compose up -d
#
# Expected startup sequence:
# 1. forge-db and forge-redis start in parallel
# 2. forge-db healthcheck runs every 10s — reports healthy after ~15s
# 3. forge-redis healthcheck runs every 10s — reports healthy after ~5s
# 4. forge-app starts only after BOTH dependencies report healthy
# 5. forge-app healthcheck failures start counting after the 30s start_period grace
#
# Verify service health:
# docker compose ps
# NAME          IMAGE                        STATUS          PORTS
# forge-app     io.thecodeforge/forge-api    Up (healthy)    0.0.0.0:8080->8080/tcp
# forge-db      postgres:16-alpine           Up (healthy)
# forge-redis   redis:7-alpine               Up (healthy)
#
# Test DNS resolution from app container (requires nslookup in the image; slim JRE bases usually omit it):
# docker exec forge-app nslookup forge-db
# Server: 127.0.0.11 (Docker's embedded DNS resolver)
# Address: 127.0.0.11#53
# Name: forge-db
# Address: 172.18.0.2
#
# Test connectivity (requires nc in the image):
# docker exec forge-app nc -zv forge-db 5432
# Connection to forge-db 5432 port [tcp/postgresql] succeeded!
#
# Check healthcheck detail for database:
# docker inspect forge-db | jq '.[0].State.Health'
# {"Status": "healthy", "FailingStreak": 0, "Log": [...]}
depends_on Is Startup Order — Healthcheck Is Readiness
depends_on without a condition only ensures the dependent container's process has started. PostgreSQL's container process starts in under one second. PostgreSQL the database engine finishes initialization in 10-20 seconds. That gap is where ECONNREFUSED lives. The condition: service_healthy directive tells Compose to wait until the dependency's healthcheck reports healthy before starting the dependent service. This converts a race condition into a deterministic startup sequence. start_period on the healthcheck is equally important: it gives a container time to start before the healthcheck begins counting failures. A Spring Boot application that takes 20 seconds to start should have start_period: 30s so Docker does not mark it unhealthy before it has had a chance to initialize.
Production Insight
At one company, the development team had been maintaining a wait-for-it.sh script — a shell script that loops checking TCP connectivity until the database port opens. This worked but had two problems: it checked TCP connectivity, not PostgreSQL readiness, and it had to be maintained separately from the Compose file. After a Docker Engine upgrade changed how shell scripts were executed in entrypoints, the wait script broke silently and the team spent a day debugging what appeared to be a JVM issue. Switching to native healthcheck blocks removed the external dependency entirely and leveraged pg_isready, which tests actual PostgreSQL protocol readiness rather than just TCP port open status.
Key Takeaway
depends_on controls container start order, not service readiness — 'running' means the process started, not that the service is accepting connections.
Use the database's native readiness probe in healthcheck: pg_isready for PostgreSQL, mysqladmin ping for MySQL, redis-cli ping for Redis. Generic TCP checks are insufficient.
start_period prevents false-negative health failures during slow startup — set it to at least 1.5x your slowest measured startup time in CI.
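The timing rules above can be turned into quick arithmetic. The sketch below is illustrative only: HealthcheckTiming and its methods are hypothetical helpers, not part of Docker or Spring, and the time-to-unhealthy figure is an approximation that ignores how long each probe takes to run.

```java
import java.time.Duration;

/** Back-of-the-envelope timing for a Docker healthcheck (hypothetical helper, not a Docker API). */
final class HealthcheckTiming {

    /** start_period suggestion: 1.5x the slowest startup observed, rounded up to whole seconds. */
    static Duration suggestedStartPeriod(Duration slowestStartup) {
        long millis = slowestStartup.toMillis() * 3 / 2;
        long seconds = (millis + 999) / 1000; // round up to the next whole second
        return Duration.ofSeconds(seconds);
    }

    /**
     * Approximate time before a container that never becomes healthy is marked unhealthy:
     * failures are not counted during start_period, then `retries` consecutive failures are needed.
     * Assumption: probe execution time is ignored.
     */
    static Duration approxTimeToUnhealthy(Duration startPeriod, Duration interval, int retries) {
        return startPeriod.plus(interval.multipliedBy(retries));
    }

    public static void main(String[] args) {
        // A 20s startup, as in the example above, suggests start_period: 30s.
        System.out.println(suggestedStartPeriod(Duration.ofSeconds(20)));      // PT30S
        // forge-app healthcheck values: start_period 30s, interval 15s, retries 5.
        System.out.println(approxTimeToUnhealthy(
                Duration.ofSeconds(30), Duration.ofSeconds(15), 5));            // PT1M45S
    }
}
```

With the forge-app settings, a service that never comes up is reported unhealthy after roughly 105 seconds, which is worth knowing when sizing CI timeouts.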
Docker Compose Dependency and Readiness Strategy
If: a service depends on another service being started but not necessarily ready (for example, a logging sidecar)
Use: depends_on without a condition. Container start order is guaranteed, with no readiness wait.
If: a service depends on another service being ready to accept connections (database, Redis, message broker)
Use: depends_on with condition: service_healthy and a healthcheck using the dependency's native readiness probe.
If: a service depends on an external resource not managed by Compose (a cloud database, an external API)
Use: readiness handling in the application itself, with retry logic and exponential backoff (Spring Retry or Resilience4j RetryRegistry).
If: service startup is slow and Docker marks it unhealthy before it finishes initializing
Use: start_period in the healthcheck block. Probe failures during this window do not count toward retries, which gives the service time to start.
If: multiple services need to start in a specific sequence with readiness gates between each
Use: depends_on with condition: service_healthy chained through the sequence. Each service waits for the previous one's healthcheck to pass.
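For the external-resource case, where Compose cannot gate startup, the retry has to live in the application. The following is a minimal plain-Java sketch of retry with exponential backoff. BackoffRetry is a hypothetical class written for illustration; it stands in for what Spring Retry or Resilience4j provide and does not reproduce either library's API.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Minimal retry-with-exponential-backoff helper (hypothetical; stands in for a retry library). */
final class BackoffRetry {

    /** Delay before the given retry (1-based): initial * 2^(retry-1), capped at max. */
    static Duration delayFor(int retry, Duration initial, Duration max) {
        long factor = 1L << Math.min(retry - 1, 20); // cap the shift to avoid overflow
        Duration delay = initial.multipliedBy(factor);
        return delay.compareTo(max) > 0 ? max : delay;
    }

    /** Runs the action up to maxAttempts times, sleeping with exponential backoff between failures. */
    static <T> T call(Callable<T> action, int maxAttempts, Duration initial, Duration max)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayFor(attempt, initial, max).toMillis());
                }
            }
        }
        throw last; // every attempt failed: surface the final error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated dependency that refuses the first two connection attempts.
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new java.net.ConnectException("Connection refused");
            }
            return "connected";
        }, 5, Duration.ofMillis(10), Duration.ofMillis(100));
        System.out.println(result + " after " + calls[0] + " attempts"); // connected after 3 attempts
    }
}
```

Capping the delay matters in practice: without a maximum, a long outage pushes the wait between attempts into minutes, and the service recovers far later than the dependency does.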

JVM Memory Configuration for Containerized Environments

This is the configuration mistake that kills containers silently. Before Java 10 (and the Java 8u191 backport), the JVM had no awareness of container memory limits. It read /proc/meminfo to determine available memory and sized the heap based on the host machine's total RAM. If your host had 64GB of RAM and your container limit was 512MB, the JVM defaulted to a maximum heap of approximately 16GB (one quarter of physical memory), 32 times what the container allowed. Once the process grew past the limit, the kernel OOM killer terminated it with exit code 137, and no Java exception was ever written to the logs because the JVM never got a chance to throw OutOfMemoryError.

Java 10 introduced -XX:+UseContainerSupport, which reads memory limits from the cgroup filesystem instead of /proc/meminfo. The flag is enabled by default from Java 10 onward and was backported to Java 8u191, so on any current JDK it is already active. Combined with -XX:MaxRAMPercentage, it gives you precise control over heap sizing relative to your container's memory limit.

The 75% figure for MaxRAMPercentage is not arbitrary. JVM memory consumption includes heap, metaspace, code cache, thread stacks, direct buffers, and the JVM's own native overhead. On a typical Spring Boot application, non-heap memory runs between 100MB and 200MB. A 512MB container with 75% MaxRAMPercentage allocates approximately 384MB of heap, leaving 128MB for the rest. This balance holds for most Spring Boot services I have worked with. Services with heavy reflection usage (frameworks that rely on proxies and dynamic class generation) may need more metaspace and a lower MaxRAMPercentage.
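The arithmetic behind that split is easy to check for any limit and percentage. This is a rough sketch: HeapBudget is a hypothetical helper that mirrors the percentage calculation only, and the real JVM may round the heap size to its own alignment.

```java
/** Heap budget arithmetic for a container memory limit (hypothetical helper; mirrors -XX:MaxRAMPercentage). */
final class HeapBudget {

    static final long MIB = 1024L * 1024L;

    /** Maximum heap derived from the container limit at the given percentage. */
    static long maxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * maxRamPercentage / 100.0);
    }

    /** What remains for metaspace, code cache, thread stacks, direct buffers, and native overhead. */
    static long nonHeapHeadroomBytes(long containerLimitBytes, double maxRamPercentage) {
        return containerLimitBytes - maxHeapBytes(containerLimitBytes, maxRamPercentage);
    }

    public static void main(String[] args) {
        long limit = 512 * MIB;
        // Compare the default 75% split with a more conservative 65% for proxy-heavy services.
        for (double pct : new double[] {75.0, 65.0}) {
            System.out.printf("%.0f%% of 512 MiB -> heap %d MiB, non-heap headroom %d MiB%n",
                    pct, maxHeapBytes(limit, pct) / MIB, nonHeapHeadroomBytes(limit, pct) / MIB);
        }
    }
}
```

At 75% of 512MB the non-heap headroom is 128MB, which sits at the low end of the 100-200MB non-heap range quoted above. That is why services with heavy proxy generation are better started at a lower percentage and measured.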

The -XX:+ExitOnOutOfMemoryError flag is underused and undervalued. Without it, a JVM that runs out of heap attempts to GC repeatedly, degrades to extreme latency, and eventually becomes unresponsive — without exiting. Kubernetes readiness probes may keep marking it as healthy while the JVM is effectively frozen. With the flag, the JVM exits cleanly, the container restarts, and your monitoring captures a clear OOM event rather than a mysterious latency spike.

application-docker.yml
# application-docker.yml — loaded when SPRING_PROFILES_ACTIVE=docker
# This profile contains Docker-specific configuration that should not appear
# in application.yml (which is used for local development without containers)

spring:
  datasource:
    # forge-db resolves via Docker Compose DNS to the postgres container
    url: jdbc:postgresql://forge-db:5432/forgedb
    username: ${SPRING_DATASOURCE_USERNAME:forge_admin}
    password: ${SPRING_DATASOURCE_PASSWORD}
    hikari:
      # Container-appropriate pool sizing — not the development defaults
      maximum-pool-size: 10
      minimum-idle: 2
      # If a connection cannot be obtained in 20s, fail fast rather than hang
      connection-timeout: 20000
      idle-timeout: 300000
      # Validate connections on checkout — catches connections dropped by postgres idle timeout
      connection-test-query: SELECT 1
  data:
    redis:
      host: forge-redis
      port: 6379
      timeout: 2000ms
      lettuce:
        pool:
          max-active: 8
          min-idle: 2
          max-wait: 2000ms

# Bind to all interfaces (0.0.0.0) so the published port is reachable from outside the container
# Spring Boot already binds to all interfaces when server.address is unset; setting it explicitly guards against an inherited 127.0.0.1
server:
  address: 0.0.0.0
  port: 8080

# Expose actuator endpoints for health probes and metrics
management:
  endpoints:
    web:
      exposure:
        include: health, metrics, info, conditions
  endpoint:
    health:
      # Show full health details for Docker healthcheck probe
      show-details: always
      # Kubernetes-compatible readiness and liveness probes
      probes:
        enabled: true
  # Actuator port can be different from app port for internal-only access
  # Omit this to use the same port as the application
  server:
    port: 8080

logging:
  # Output to stdout — Docker captures stdout and makes it available via docker logs
  # Never log to files inside a container unless you have a persistent volume mount
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n"
  level:
    root: INFO
    io.thecodeforge: DEBUG
Output
# Verify JVM is respecting container memory limit (512MB container):
#
# docker exec forge-app jcmd 1 VM.flags | grep -iE 'MaxHeap|RAMPercentage'
# -XX:MaxHeapSize=402653184 (384MB = 75% of 512MB container limit)
# -XX:MaxRAMPercentage=75.000000
#
# On a JVM without container support (before Java 10 / 8u191), or with -XX:-UseContainerSupport:
# -XX:MaxHeapSize=17179869184 (16GB = 25% of 64GB host RAM, dangerous)
#
# Actuator health response:
# curl http://localhost:8080/actuator/health | jq '.status'
# "UP"
#
# Actuator readiness probe (for Kubernetes):
# curl http://localhost:8080/actuator/health/readiness
# {"status": "UP"}
#
# JVM heap usage (the area:heap tag restricts the metric to heap pools):
# curl 'http://localhost:8080/actuator/metrics/jvm.memory.used?tag=area:heap' | jq '.measurements[0].value'
# 187432960 (approximately 179MB used of the 384MB maximum heap)
Container Memory Budget: The 75-25 Split
  • Heap (75%): object allocations, application data, cached objects — this is what MaxRAMPercentage controls
  • Metaspace (variable): class metadata, proxies, reflection data — Spring Boot with many annotations uses 80-150MB
  • Code cache (variable): JIT-compiled native code — typically 40-80MB for a Spring Boot service
  • Thread stacks (predictable): up to 1MB reserved per thread by default on 64-bit Linux (tunable with -Xss), multiplied by your thread count
  • Direct buffer memory: Netty (used by Lettuce, WebFlux) allocates off-heap direct buffers that do not count toward heap
  • If your service uses heavy AOP, CGLIB, or Hibernate proxies, start at 65% MaxRAMPercentage and measure before increasing
Production Insight
A team set --memory=256m on their container but did not configure MaxRAMPercentage. The JVM, reading the host's 32GB RAM, allocated 8GB of heap. The container OOM killer terminated the process with exit code 137 approximately 90 seconds after startup — after the JVM had finished loading classes and the heap usage crossed 256MB. No Java exception was logged because the JVM process was killed at the OS level before it could write anything. The team spent several hours checking for application exceptions before someone noticed the exit code 137 and recognized it as an OOM kill. Adding -XX:MaxRAMPercentage=70.0 resolved the issue immediately.
Key Takeaway
-XX:+UseContainerSupport tells the JVM to read memory limits from cgroups. It is enabled by default in Java 10+ (and 8u191+); adding it explicitly documents the intent.
-XX:MaxRAMPercentage=75.0 caps the heap at 75% of container memory, leaving the remaining 25% for metaspace, code cache, threads, and native memory.
-XX:+ExitOnOutOfMemoryError converts a slow, invisible JVM degradation into a fast, visible container restart — prefer loud failures over silent ones.
Always verify with docker exec <container> jcmd 1 VM.flags after deploy to confirm the JVM is using the expected heap size.

Security Hardening: Non-Root Users and Secret Management

Running a containerized application as root is a security debt that auditors and security teams flag on every compliance review, and for good reason. If your Spring Boot application has a vulnerability that allows arbitrary command execution (a deserialization exploit, a dependency with an RCE CVE, a path traversal in a file upload handler), and the JVM is running as root, the attacker has root access to the container filesystem. From there, depending on your Docker daemon configuration, container escape to the host becomes possible.

The fix is three lines of Dockerfile: create a system group, create a system user, switch to that user. The -r flag creates system accounts (no home directory is created), and -s /sbin/nologin removes the interactive login shell. This user can run the JAR but cannot install software, modify system files, or write outside the directories it explicitly owns.

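The COPY --chown directive is equally important. A plain COPY leaves the JAR owned by root. The file is usually still world-readable, so the app may start, but root-owned files and directories under /app are a common source of permission errors once the process runs as a non-root user. Create the user first, then copy with --chown=forgeuser:forgegroup so the named user already exists when the ownership is applied.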

Secret management is the second hardening dimension. The pattern I see most often — and most often in incident post-mortems — is credentials hardcoded in docker-compose.yml and committed to the repository. Once a credential is in Git history, it is effectively compromised regardless of whether you 'fixed it' in a later commit. The correct pattern is environment variable references in Compose (${DB_PASSWORD}) resolved from a .env file that is in .gitignore, or from a secret management system in production.

For production deployments, a .env file alone is still insufficient, because the secret sits in plain text on the host. Use a secret manager instead: AWS Secrets Manager, HashiCorp Vault, and GCP Secret Manager all have Spring Boot integrations that inject secrets as environment variables or properties at startup without them ever appearing in the container definition.
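Whichever mechanism injects the secret, the application can check at startup that it actually arrived. Compose substitutes an empty string when a referenced variable such as DB_PASSWORD is unset, so a blank value deserves the same treatment as a missing one. The sketch below is an assumption-laden illustration: RequiredSecrets is a hypothetical helper, not a Spring Boot feature, and the variable names are taken from the Compose file in this guide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Startup check that required secrets were injected as environment variables (hypothetical helper). */
final class RequiredSecrets {

    /** Returns the names that are missing or blank in the given environment. */
    static List<String> missing(Map<String, String> env, String... names) {
        List<String> absent = new ArrayList<>();
        for (String name : names) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Sample environment where the password was substituted as an empty string.
        Map<String, String> sampleEnv = Map.of(
                "SPRING_DATASOURCE_USERNAME", "forge_admin",
                "SPRING_DATASOURCE_PASSWORD", "");
        // Report names only; never print secret values.
        System.out.println(missing(sampleEnv,
                "SPRING_DATASOURCE_USERNAME", "SPRING_DATASOURCE_PASSWORD"));
        // prints [SPRING_DATASOURCE_PASSWORD]
    }
}
```

In a real service the same check would take System.getenv() and abort startup when the returned list is non-empty, which turns a confusing authentication failure into an explicit configuration error.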

Dockerfile.security-hardened
# Security-hardened production Dockerfile
# Demonstrates: non-root user with fixed UID/GID, explicit file ownership, built-in HEALTHCHECK

FROM eclipse-temurin:21-jdk-jammy AS build
WORKDIR /app
COPY .mvn/ .mvn
COPY mvnw pom.xml ./
RUN ./mvnw dependency:go-offline -q
COPY src ./src
RUN ./mvnw clean package -DskipTests -q

# ── Security-hardened runtime stage ──────────────────────────────────────────
FROM eclipse-temurin:21-jre-jammy

# Install curl for the HEALTHCHECK below (a no-op if the base image already ships it)
# Clean package manager caches in the same layer so they never reach the image
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean

WORKDIR /app

# Create system group and user with no login shell
# -r: system account  -s /sbin/nologin: no interactive login possible
RUN groupadd -r -g 1001 forgegroup \
    && useradd -r -u 1001 -g forgegroup -s /sbin/nologin -M forgeuser

# Create all directories the app needs with correct ownership
# Do this BEFORE switching to non-root user
RUN mkdir -p /app/logs /app/tmp /app/config \
    && chown -R forgeuser:forgegroup /app

# Copy artifact with explicit ownership — no root-owned files in /app
COPY --chown=forgeuser:forgegroup --from=build /app/target/*.jar app.jar

# Switch to non-root — all subsequent instructions and the ENTRYPOINT run as forgeuser
USER forgeuser

# Expose port for documentation purposes — docker run -p still required
EXPOSE 8080

# Health check uses curl — confirms app is responding before reporting healthy
# --fail flag: curl exits non-zero on HTTP 4xx/5xx — Docker treats this as unhealthy
HEALTHCHECK --interval=15s --timeout=5s --start-period=30s --retries=3 \
  CMD curl --fail --silent http://localhost:8080/actuator/health || exit 1

ENTRYPOINT ["java", \
  "-XX:+UseContainerSupport", \
  "-XX:MaxRAMPercentage=75.0", \
  "-XX:+ExitOnOutOfMemoryError", \
  "-Djava.io.tmpdir=/app/tmp", \
  "-Dfile.encoding=UTF-8", \
  "-jar", "app.jar"]
Output
# Verify non-root user is running the process:
# docker exec forge-app id
# uid=1001(forgeuser) gid=1001(forgegroup) groups=1001(forgegroup)
#
# Verify the process cannot write outside its directories:
# docker exec forge-app sh -c 'touch /etc/test 2>&1'
# touch: cannot touch '/etc/test': Permission denied
#
# Verify curl is available for healthcheck:
# docker exec forge-app curl --fail http://localhost:8080/actuator/health
# {"status":"UP",...}
#
# Dockerfile scan (for example, trivy config .):
# Without a USER directive: the image is flagged for running as root
# With USER forgeuser: the root-user check passes
#
# .env file pattern (never commit this file to Git):
# .env contains: DB_PASSWORD=actual_secret_here
# .gitignore: .env
# docker-compose: ${DB_PASSWORD} references the .env value
UID 0 in a Container Is Still Root
Running a container as root is not mitigated by container isolation alone. Container escape vulnerabilities are discovered regularly in both the Docker daemon and the Linux kernel. The principle of least privilege applies inside containers the same way it applies everywhere else in security engineering. Additionally, audits against common compliance frameworks (SOC 2, PCI DSS, HIPAA) routinely flag workloads that run as root. An audit finding on this point typically results in a required remediation with a deadline. Building non-root images from the start is easier than retrofitting them under audit pressure.
Production Insight
A team skipped the non-root user setup 'for simplicity during the initial rollout, to be fixed later.' The service was deployed to production. Six months later, a security assessment flagged every container as running as root. By that point, the Dockerfile had been copied as a template for 14 microservices. Remediating all 14 required coordinating 14 teams, discovering that 3 of the services had permission issues with file paths that only manifested after the user switch, and delaying two feature releases. The cost of 'fix it later' was orders of magnitude higher than adding the four lines of user creation to the original Dockerfile.
Key Takeaway
Non-root container users are not optional in production — they are a baseline security requirement and a compliance expectation in most regulated environments.
Create the user and directories before the COPY command, then switch with USER — the order matters for file ownership.
Never hardcode secrets in Dockerfile or docker-compose.yml — use .env files in .gitignore for local development and a proper secret manager for production deployments.

Layer Caching Strategy: Making Builds Deterministically Fast

Docker layer caching is what makes every build after the first one fast. Each instruction in a Dockerfile is a cache step, and RUN, COPY, and ADD each produce a filesystem layer. If Docker determines that a step's inputs have not changed since the last build, it reuses the cached result instead of re-executing the instruction. This is binary: a layer is either reused in full or rebuilt in full.

The rule that determines cache validity: once a layer is invalidated, every layer after it in the Dockerfile is rebuilt as well, whether or not its own inputs changed. This makes the order of instructions in your Dockerfile as important as the instructions themselves.

For Spring Boot applications, the expensive operation is the Maven dependency download, which pulls hundreds of JARs from Maven Central. This step can take 60-180 seconds on a cold cache. If you structure your Dockerfile so that COPY src precedes RUN ./mvnw dependency:go-offline, every single code change, even a one-character fix in a comment, invalidates the dependency download layer. The result is a 3-minute build for every change.

The fix is sequencing. Copy pom.xml and the Maven wrapper first. Run the dependency download. Then copy src. Now the dependency layer is only invalidated when pom.xml changes — which happens far less frequently than code changes. Code-only changes reuse the cached dependency layer and the build takes 20-30 seconds instead of 3 minutes.

This is not a minor optimization. On a team making 30-50 commits per day, the difference between 3-minute and 30-second Docker builds is the difference between a CI pipeline that developers trust and one they work around.
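
The cascading rule is easier to see in a toy model. The sketch below is not Docker; it is a small shell simulation in which each layer is written as "instruction|input" and a changed input invalidates that layer plus everything after it. The function names rebuilt_layers, wrong_order, and correct_order are hypothetical and exist only for this illustration.

```shell
# Toy model of Docker's cascading cache invalidation (not Docker itself).
# Each argument after the first is "instruction|input it copies" ("-" for none).
rebuilt_layers() (
  changed=$1
  shift
  invalidated=0
  for entry in "$@"; do
    instruction=${entry%%|*}
    input=${entry##*|}
    # A changed input invalidates this layer and every layer after it.
    if [ "$input" = "$changed" ]; then
      invalidated=1
    fi
    if [ "$invalidated" -eq 1 ]; then
      echo "REBUILD $instruction"
    else
      echo "CACHED  $instruction"
    fi
  done
)

# Source copied before the dependency download.
wrong_order() {
  rebuilt_layers "$1" \
    "COPY src ./src|src" \
    "COPY pom.xml ./|pom.xml" \
    "RUN ./mvnw dependency:go-offline|-" \
    "RUN ./mvnw clean package|-"
}

# pom.xml first, dependency download second, source third.
correct_order() {
  rebuilt_layers "$1" \
    "COPY pom.xml ./|pom.xml" \
    "RUN ./mvnw dependency:go-offline|-" \
    "COPY src ./src|src" \
    "RUN ./mvnw clean package|-"
}

echo "Wrong order, src changed:"
wrong_order src
echo "Correct order, src changed:"
correct_order src
```

With the wrong order a source change rebuilds all four steps, including the dependency download. With the correct order the first two steps stay cached and only the source copy and the package step run again.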

Dockerfile.cached · DOCKERFILE
# ── STAGE 1: Build with optimized layer caching ──────────────────────────────
FROM eclipse-temurin:21-jdk-jammy AS build
WORKDIR /app

# LAYER 1: Maven wrapper files
# Cache key: .mvn directory and mvnw script
# Invalidated: when Maven wrapper version changes (very rare)
COPY .mvn/ .mvn
COPY mvnw ./
RUN chmod +x mvnw

# LAYER 2: pom.xml only — no source code
# Cache key: pom.xml content hash
# Invalidated: when dependencies change (infrequent — maybe once per sprint)
COPY pom.xml ./

# LAYER 3: Dependency download
# Cache key: inherited from LAYER 2 (pom.xml hash)
# Invalidated: only when pom.xml changes
# This layer takes 60-180 seconds on first run, 0 seconds when cached
RUN ./mvnw dependency:go-offline -q

# LAYER 4: Source code
# Cache key: src directory content hash
# Invalidated: on every code change (expected and unavoidable)
COPY src ./src

# LAYER 5: Compile and package
# Cache key: inherited from LAYER 4 (source code hash)
# Invalidated: on every code change
RUN ./mvnw clean package -DskipTests -q

# ── STAGE 2: Runtime (same as before) ────────────────────────────────────────
FROM eclipse-temurin:21-jre-jammy
WORKDIR /app
RUN groupadd -r -g 1001 forgegroup && useradd -r -u 1001 -g forgegroup -s /sbin/nologin -M forgeuser
RUN mkdir -p /app/logs /app/tmp && chown -R forgeuser:forgegroup /app
COPY --chown=forgeuser:forgegroup --from=build /app/target/*.jar app.jar
USER forgeuser
ENTRYPOINT ["java", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=75.0", "-XX:+ExitOnOutOfMemoryError", "-Djava.io.tmpdir=/app/tmp", "-jar", "app.jar"]
Output
# Build time comparison — 50-commit working day:
#
# WRONG ORDER (COPY src before dependency download):
# Every commit invalidates dependency cache
# Average build time: 3 minutes 20 seconds
# 50 commits x 3.3 minutes = 165 minutes of CI time per day
#
# CORRECT ORDER (pom.xml first, dependency download, then src):
# Code-only changes: dependency layer cached, build in 28 seconds
# pom.xml changes: full re-download, build in 3 minutes 10 seconds
# 50 commits (48 code, 2 pom.xml): 48x28s + 2x190s = 1344s + 380s = 28.7 minutes
# Daily CI time reduction: 165 minutes -> 29 minutes = 82% reduction
#
# Docker cache inspection:
# docker history io.thecodeforge/forge-api:latest --no-trunc
# Shows each layer with size and creation command
# Reused layers keep their original CREATED age (e.g., '2 hours ago'); rebuilt layers show a recent timestamp
#
# Force rebuild without cache (when you need a clean state):
# docker build --no-cache -t io.thecodeforge/forge-api:latest .
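The daily totals in the comparison above can be reproduced with plain shell arithmetic. This is a minimal sketch using the per-build times quoted in the comparison (200 s for an uncached build, 28 s for a cached one, 190 s for a pom.xml change); the function name daily_build_seconds is hypothetical.

```shell
# Daily CI build time from the figures quoted above (all values in seconds).
daily_build_seconds() (
  code_commits=$1
  code_build=$2
  pom_commits=$3
  pom_build=$4
  echo $(( code_commits * code_build + pom_commits * pom_build ))
)

wrong=$(daily_build_seconds 50 200 0 0)    # every commit pays 3 min 20 s
right=$(daily_build_seconds 48 28 2 190)   # 48 cached builds, 2 full re-downloads

echo "wrong order:   ${wrong}s"
echo "correct order: ${right}s"
echo "reduction:     $(( (wrong - right) * 100 / wrong ))%"
```

The correct order comes to 1724 seconds against 10000 seconds, which is the 82% reduction stated above. Substituting your own commit counts and build times gives a quick estimate before you reorder a real Dockerfile.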
Cache Invalidation Is Binary and Cascading
Docker layer caching has no partial invalidation. Either a layer is fully cached or it is fully rebuilt. And when a layer is rebuilt, every subsequent layer is also rebuilt regardless of whether its inputs changed. This makes the ordering of COPY and RUN instructions the primary variable in build performance. If you have slow operations, put them high in the Dockerfile where they are most likely to be cached. Put frequently-changing files — source code — as low as possible so they invalidate only the fast-running layers below them.
Production Insight
A team running 8 microservices on a shared CI runner pool had average pipeline times of 22 minutes. Each pipeline ran docker build with source code copied before the dependency download, making every commit invalidate the Maven cache layer. After reordering the Dockerfile instructions — pom.xml first, dependency:go-offline second, src third — average pipeline time dropped to 6 minutes. The runner pool was effectively 3.7x more capable without adding any infrastructure, purely from Dockerfile instruction ordering.
Key Takeaway
Layer cache ordering determines build speed more than hardware does: put slow, stable operations high and fast, frequently changing operations low.
COPY pom.xml before COPY src and run dependency:go-offline between them — this is the single highest-leverage Dockerfile optimization for Spring Boot.
Docker cache invalidation is cascading: one invalidated layer rebuilds everything below it — understand which layers change at which frequency and order accordingly.
● Production incident · POST-MORTEM · severity: high

The ECONNREFUSED Boot Loop — depends_on Without Healthchecks

Symptom
Spring Boot application crashed on startup with 'Connection refused to forge-db:5432'. Docker Compose reported both containers as 'running' with green status. Restarting the entire stack manually sometimes worked, sometimes did not — the behavior was not deterministic. The CI pipeline failed on roughly 30% of runs with no code change between passing and failing builds. Developers assumed the database was slow to initialize and added a 30-second sleep in the entrypoint script.
Assumption
The team accepted the sleep workaround as necessary because Docker Compose 'said the database was running.' Nobody questioned why a running container could refuse connections. The CI failures were attributed to 'flaky infrastructure' rather than a deterministic timing bug. The 30-second sleep occasionally was not enough when the PostgreSQL image was pulled fresh in CI, and it wasted 20-25 seconds on every other run when PostgreSQL was ready in under 10 seconds.
Root cause
Docker Compose depends_on with no condition only ensures the database container process has started — specifically that the Docker daemon has launched the container and the entrypoint has begun executing. It does not mean PostgreSQL has finished its initialization sequence, created the database, applied the schema, and opened its listening socket on port 5432. PostgreSQL initialization on the alpine image takes between 3 and 15 seconds depending on the host, the image cache state, and whether it is performing first-run cluster initialization. The Spring Boot application's HikariCP connection pool attempted its initial connection during this window and crashed. The 30-second sleep was a race condition with a longer timeout — not a fix.
Fix
Added a healthcheck to the PostgreSQL service using pg_isready, which is the official PostgreSQL readiness probe that reports healthy only when the server is fully initialized and accepting connections on the correct socket. Changed the depends_on block to use condition: service_healthy. The Spring Boot container now blocks at startup until the PostgreSQL healthcheck reports healthy, then attempts its first connection. The 30-second sleep was removed entirely. CI failure rate dropped from 30% to 0% across 200 subsequent runs. Average startup time reduced by 12 seconds on warm runs where PostgreSQL was ready quickly.
Key lesson
  • depends_on controls container start order, not service readiness — 'running' means the process started, not that the service inside is accepting connections
  • A hardcoded sleep is a race condition with a longer timeout — use healthchecks for deterministic readiness, not timing guesses
  • Each database has its own readiness probe: pg_isready for PostgreSQL, mysqladmin ping for MySQL, redis-cli ping for Redis — use the database-native probe rather than generic TCP checks
  • If CI fails intermittently with ECONNREFUSED and both containers show as running, the first thing to check is healthcheck configuration — not infrastructure stability
  • The CI failure rate is the leading indicator here — 30% failure with no code change is a determinism bug, not a flaky test
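The fix described in this incident can be sketched as a Compose file. The script below only writes the file, so it runs without Docker. The service names, database user, and database name come from this article; the postgres:16-alpine tag, the probe timings, the datasource variables, and the output file name are assumptions chosen for illustration, not the team's actual configuration.

```shell
# Write a Compose file sketch that gates the app on database readiness.
# Assumptions: postgres:16-alpine tag, probe timings, and the output file name.
out="docker-compose.healthcheck.yml"

cat > "$out" <<'EOF'
services:
  forge-db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: forge_admin
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: bookstore
    healthcheck:
      # pg_isready exits 0 only when the server accepts connections
      test: ["CMD-SHELL", "pg_isready -U forge_admin -d bookstore"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 10s

  forge-app:
    image: io.thecodeforge/forge-api:latest
    environment:
      SPRING_DATASOURCE_URL: jdbc:postgresql://forge-db:5432/bookstore
      SPRING_DATASOURCE_USERNAME: forge_admin
      SPRING_DATASOURCE_PASSWORD: ${DB_PASSWORD}
    depends_on:
      forge-db:
        condition: service_healthy   # wait for the healthcheck, not just container start
EOF

echo "wrote $out"
# With Docker available: docker compose -f "$out" up -d
```

The quoted heredoc delimiter keeps ${DB_PASSWORD} literal in the file, so Compose resolves it from the environment or a .env file at startup. No sleep is needed anywhere: Compose holds forge-app back until the forge-db healthcheck reports healthy.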
Production debug guide
When Spring Boot containers behave unexpectedly, here is how to go from observable symptom to a verified resolution. These are ordered by frequency; the first three account for about 80% of the issues I have debugged.
6 entries
Symptom · 01
Container exits immediately after start with no error in application logs
Fix
Check the exit code first, because it tells you what killed the process: docker inspect <container> | jq '.[0].State.ExitCode'. Exit code 137 means OOM kill: check docker stats and increase the container memory limit. Exit code 1 means the JVM crashed before Spring finished starting: check docker logs <container> 2>&1 | grep -i 'error\|exception\|outofmemory' (the 2>&1 matters because docker logs replays the container's stderr on stderr). If the log is empty entirely, check if the JAR is present: docker run --rm --entrypoint sh <image> -c 'ls -la /app/' to verify the COPY step in your Dockerfile worked correctly.
Symptom · 02
Application crashes with ECONNREFUSED to database on startup
Fix
Verify the database healthcheck is configured and reporting healthy before diagnosing further: docker inspect <db_container> | jq '.[0].State.Health'. If Health is null, no healthcheck is configured — add one. If Health.Status is 'starting', the app is connecting before the database is ready — add condition: service_healthy to depends_on. If Health.Status is 'healthy' but connection still fails, verify DNS resolution from inside the app container: docker exec <app_container> nslookup forge-db. If nslookup fails, the containers are on different Docker networks — check your Compose network configuration.
Symptom · 03
Container runs but application is not accessible on the mapped port
Fix
Verify the port mapping first: docker ps should show 0.0.0.0:8080->8080/tcp. If it shows 127.0.0.1:8080->8080/tcp, the port is published only on the host's loopback interface, so it is reachable from the host itself but not from other machines; publish it as 8080:8080 if external access is intended. Then verify the application is actually listening inside the container: docker exec <container> ss -tlnp | grep 8080. If the listener is bound to 127.0.0.1 instead of 0.0.0.0 or *, set server.address=0.0.0.0 in application.properties. If the port is not in the listen state at all, the Spring Boot context failed to start; check application logs for startup errors that happened after the JVM launched.
Symptom · 04
Image is 800MB or larger and takes 5 minutes to push to the registry
Fix
Check which layers are causing the bloat: docker history <image> --no-trunc | head -30. Look for JDK installation layers and Maven cache layers. If you see /root/.m2 or eclipse-temurin:21-jdk in the final image layers, you are not using multi-stage builds. Switch to multi-stage: Stage 1 builds with JDK and Maven, Stage 2 copies only the JAR to a JRE base image. Verify the result: docker images <image> should show under 300MB. Because the image's ENTRYPOINT is java, override it to confirm build tools are absent from the final image: docker run --rm --entrypoint sh <image> -c 'which mvn; which javac'.
Symptom · 05
Container cannot write to log files or create temp directories — permission denied errors
Fix
Check which user the container is running as: docker exec <container> whoami and docker exec <container> id. If non-root, verify the target directories are owned by that user: docker exec <container> ls -la /app/logs/. The most common cause is COPY in the Dockerfile copying files owned by root. Fix with COPY --chown=forgeuser:forgegroup --from=build /app/target/*.jar app.jar. Alternatively, add RUN mkdir -p /app/logs && chown forgeuser:forgegroup /app/logs before the USER directive.
Symptom · 06
Docker Compose services cannot reach each other by service name
Fix
Verify all services are on the same Docker network: docker network ls and docker network inspect <network_name>. All services in the same Compose file are on the same default network unless you have defined custom networks. Check if service names in connection strings match the Compose service key exactly — SPRING_DATASOURCE_URL=jdbc:postgresql://forge-db:5432 requires the Compose service to be named forge-db, not postgres or db. Verify DNS from inside the container: docker exec <app_container> nslookup forge-db.
★ Docker Debug Cheat Sheet: Commands That Save Hours
Real commands for debugging Spring Boot Docker containers. These are the commands I type first when a container is misbehaving, ordered by how often each situation actually comes up.
Need to see why a container exited
Immediate action
Check container logs and exit code — exit code tells you the kill reason before you read a single log line
Commands
docker logs --tail 100 <container>
docker inspect <container> | jq '.[0].State.ExitCode'
Fix now
Exit code 137 means OOM killed — increase the container --memory limit or reduce MaxRAMPercentage. Exit code 1 means the JVM or Spring threw an unhandled exception — grep the logs for 'ERROR' and 'Exception'. Exit code 0 means the process exited cleanly — check if the entrypoint is correct and the JAR path is valid.
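The exit-code triage above can be captured in a small helper. This is a sketch, not a Docker feature: explain_exit_code is a hypothetical function name, and the 143 (SIGTERM) branch is an addition beyond the codes listed here, based on the general convention that codes above 128 mean 128 plus a signal number.

```shell
# Map a container exit code to its most likely cause.
# Codes above 128 follow the convention 128 + signal number.
explain_exit_code() (
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "clean exit: check the entrypoint and JAR path"
  elif [ "$code" -eq 137 ]; then
    echo "SIGKILL (128+9): likely OOM kill"
  elif [ "$code" -eq 143 ]; then
    echo "SIGTERM (128+15): stopped on request"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $(( code - 128 ))"
  else
    echo "application error: read the logs"
  fi
)

# Usage against a real container (requires Docker, so left commented out):
# explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' <container>)"
explain_exit_code 137
```

Exit code 137 only says the process received SIGKILL. To confirm the kernel OOM killer was responsible, check the State.OOMKilled field in the docker inspect output as well.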
Need to verify JVM memory settings are respecting container limits
Immediate action
Check actual JVM heap configuration inside the running container — do not trust that flags are being applied without verifying
Commands
docker exec <container> jcmd 1 VM.flags | grep -iE 'ram|heap|container'
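curl -s 'http://localhost:8080/actuator/metrics/jvm.memory.max?tag=area:heap' | jq '.measurements[0].value'
Fix now
If MaxHeapSize is larger than 75% of your container --memory limit, the JVM is ignoring container boundaries. Verify -XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0 are in your ENTRYPOINT. The Actuator jvm.memory.max value for the heap area should be approximately 75% of your container memory limit in bytes. jcmd ships with the JDK, so if it is missing from a JRE-based image, docker exec <container> java -XX:MaxRAMPercentage=75.0 -XX:+PrintFlagsFinal -version | grep MaxHeapSize shows the heap a JVM with the same flag gets under the same container limit.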
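To know what number to expect before comparing, the target heap size can be computed from the container limit. This is a minimal sketch; expected_max_heap_bytes is a hypothetical helper, and it assumes the limit is given in MiB (which is how Docker reads a value such as --memory=512m) and the percentage is a whole number.

```shell
# Expected JVM max heap for a container memory limit and MaxRAMPercentage.
# limit_mib: container limit in MiB (docker --memory=512m means 512 MiB)
# percent:   integer MaxRAMPercentage (75 for -XX:MaxRAMPercentage=75.0)
expected_max_heap_bytes() (
  limit_mib=$1
  percent=$2
  echo $(( limit_mib * 1024 * 1024 * percent / 100 ))
)

expected_max_heap_bytes 512 75    # prints 402653184 (384 MiB)
expected_max_heap_bytes 1024 75   # prints 805306368 (768 MiB)
```

Treat the result as a target rather than an exact match. The JVM may align the heap size to its internal region size, so a reported value within a few megabytes of this figure is consistent with the flags being applied.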
Need to verify database connectivity from inside the application container
Immediate action
Test DNS resolution and TCP port reachability before blaming the application
Commands
docker exec <container> nslookup forge-db
docker exec <container> nc -zv forge-db 5432
Fix now
If nslookup fails with NXDOMAIN, the services are on different Docker networks or the service name in the connection string does not match the Compose service key. If nslookup succeeds but nc fails, the database container is not listening on that port — check if it has a healthcheck that reports healthy. If both succeed but the app still fails, check the connection pool configuration and credentials.
Need to inspect what is inside a Docker image without running it
Immediate action
Use docker history to inspect layers and a temporary container to verify file layout
Commands
docker history <image> --no-trunc | head -20
docker run --rm -it --entrypoint sh <image> -c 'ls -la /app/ && du -sh /* 2>/dev/null | sort -rh | head -10'
Fix now
If you see /root/.m2 in the layer history or the du output shows a large /root directory, Maven cache is included in the production image; switch to multi-stage builds. If you see eclipse-temurin:21-jdk as the base layer, switch the runtime stage to eclipse-temurin:21-jre-jammy. A clean multi-stage image should show only /app/app.jar and JRE libraries under /opt/java.
Docker Compose services failing to start in the right order despite depends_on
Immediate action
Check the actual healthcheck status of dependency services — not just whether they are running
Commands
docker compose ps
docker inspect <db_container> | jq '.[0].State.Health'
Fix now
If Health is null, no healthcheck is defined on the dependency — add one. If Health.Status is 'starting', the app is starting before the healthcheck passes. If Health.Status is 'unhealthy', the healthcheck command is failing — run it manually inside the container: docker exec <db_container> pg_isready -U forge_admin -d bookstore to see the actual output.
Standard JAR Execution vs. Containerized Execution
| Aspect | Standard JAR Execution | Containerized Execution (Docker) |
| --- | --- | --- |
| Portability | Requires matching JRE version, OS libraries, and environment variables on each host; works on developer's machine, may fail on CI or production | Completely self-contained: the JRE, OS libraries, and configuration are in the image. Runs identically on any host with Docker installed |
| Environment Consistency | Varies by host OS, JRE version, and installed libraries; 'works on my machine' is a real failure mode that causes production incidents | Identical everywhere: Linux namespace isolation ensures the container sees the same environment on every host |
| Resource Control | No enforcement: a runaway GC cycle or memory leak can consume all host RAM and affect other services on the same machine | Strict: container memory and CPU limits enforced by cgroups. A misbehaving container cannot affect other containers on the same host |
| Isolation | Shared filesystem, shared network stack, shared process namespace; one service's file operations can affect another's | Isolated filesystem, isolated network namespace, isolated process space; containers share the kernel but not each other's resources |
| Security Surface | Runs with whatever user launched the JAR, often a developer's local user or a broad service account | Configurable user via USER directive; non-root by default in well-configured images, with explicit filesystem permissions |
| Startup Dependencies | Manual: you must start the database before the application and hope it is ready. Sleep calls are the common workaround. | Structured: Docker Compose healthchecks and depends_on conditions make startup order and readiness deterministic |
| Deployment Unit | JAR file: recipient environment must have the correct JRE, correct OS, and correct environment variables configured separately | Docker image: a single immutable artifact contains the entire runtime. Push once, pull anywhere |
| Best For | Local development with a single service, scripted deployments to managed servers with configuration management tools (Ansible, Chef) | Everything that ships to more than one environment: development parity with production, CI pipelines, staging, and production deployments |

Key takeaways

1. Docker ensures that if it runs in CI it runs in production: the image is the deployment artifact and the environment is baked into it, not configured separately on each host.
2. Multi-stage builds are not an optimization; they are the correct way to build production Java images. Single-stage builds with JDK in production are a security and operational liability.
3. Never run Spring Boot containers as root. Four lines of Dockerfile (groupadd, useradd, mkdir with chown, USER directive) prevent an entire class of security vulnerabilities and compliance findings.
4. depends_on is startup order; healthcheck is readiness. Always pair database dependencies with service_healthy conditions and database-native readiness probes (pg_isready, mysqladmin ping, redis-cli ping).
5. Configure JVM memory flags explicitly: -XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0. Verify with jcmd after deploy. A JVM that ignores container limits is a time bomb.
6. Layer cache ordering determines build speed: pom.xml before src, dependency download before compile. Code-only builds should take under 30 seconds on a warm cache.
7. Log to stdout inside containers: Docker captures it automatically. File-based logging inside containers loses logs on restart and grows the writable layer unboundedly.
8. Make sure server.address resolves to 0.0.0.0 in your container-specific application profile. Spring Boot listens on all interfaces when the property is unset, but a profile that pins it to localhost makes the application unreachable from outside the container network namespace.

Common mistakes to avoid

8 patterns
×

Running the container as the root user

Symptom
Security audits flag every container as a critical or high severity finding. Compliance reviews fail. If the application has any RCE vulnerability, the attacker gets root access to the container filesystem — and potentially to the host depending on Docker daemon configuration and kernel version.
Fix
Add a non-root user in the Dockerfile: RUN groupadd -r -g 1001 forgegroup && useradd -r -u 1001 -g forgegroup -s /sbin/nologin -M forgeuser. Create the directories the app needs with chown before switching users. Use COPY --chown=forgeuser:forgegroup when copying the JAR. Switch with USER forgeuser before ENTRYPOINT. Verify with docker exec <container> id.
×

Hardcoding secrets in Dockerfile or docker-compose.yml

Symptom
Credentials are committed to Git history and visible to everyone with repository access. Docker images in registries expose secrets to anyone who pulls the image. A single repository access event compromises all environments that use those credentials.
Fix
Reference environment variables in docker-compose.yml using ${DB_PASSWORD} syntax. Resolve them from a .env file that is in .gitignore — never committed. For production, use Docker Secrets in Swarm mode, HashiCorp Vault with the Spring Cloud Vault integration, or a cloud-native secret manager (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault). Secrets in Git are permanent — rotation is required after any accidental commit, even if the commit is subsequently reverted.
×

Including build tools (Maven and JDK) in the final production image

Symptom
Image size is 750-900MB. Push times to container registries are 4-8 minutes. Security scans flag dozens of CVEs from JDK libraries and Maven's own dependencies. None of these libraries execute at runtime — they are pure dead weight in the production image.
Fix
Use multi-stage builds. Stage 1 compiles with JDK and Maven. Stage 2 copies only the built JAR to a JRE base image. The build stage is discarded. Image size drops from 800MB to 250MB. CVE count drops by 60-80%. Verify build tools are absent from the final image, overriding the java ENTRYPOINT so the shell actually runs: docker run --rm --entrypoint sh <image> -c 'which mvn || echo absent'.
×

Using depends_on without healthcheck conditions

Symptom
Application crashes on startup with ECONNREFUSED to the database. Docker Compose shows both containers as running. The failure is intermittent — sometimes the database is ready in time, sometimes it is not. CI failure rate is 20-40% with no code change between passing and failing builds.
Fix
Add a healthcheck to the database service using its native readiness probe: pg_isready for PostgreSQL, mysqladmin ping for MySQL, redis-cli ping for Redis. Change depends_on to use condition: service_healthy. Add start_period to the healthcheck to give the database time to initialize before health evaluation begins. The startup sequence becomes deterministic.
×

Not configuring JVM memory flags for container environments

Symptom
Container is killed with exit code 137 (OOM kill) with no Java exception in the logs. The JVM allocated heap based on host RAM (e.g., 16GB from a 64GB host) while the container limit was 512MB. The OOM kill happens at the OS level before the JVM can write OutOfMemoryError.
Fix
Add -XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0 to the ENTRYPOINT. Set container memory limits via --memory in docker run or mem_limit in Compose. Verify the effective heap size: docker exec <container> jcmd 1 VM.flags | grep MaxHeap. The value should be approximately 75% of your container memory limit.
×

Copying source code before downloading dependencies in the Dockerfile

Symptom
Every Docker build takes 3+ minutes even for trivial one-line code changes. CI pipelines are slow and developers work around them rather than waiting. Build runners are saturated with redundant Maven downloads.
Fix
Reorder Dockerfile instructions: COPY pom.xml first, RUN dependency:go-offline second, COPY src third. This ensures the dependency download layer is only invalidated when pom.xml changes, not on every code change. Code-only builds drop from 3 minutes to 25 seconds.
×

Logging to files inside a container without a persistent volume mount

Symptom
Application logs disappear when the container restarts. Post-incident analysis is impossible because the evidence is gone. Log files grow unbounded inside the container filesystem and consume the container's writable layer.
Fix
Configure Spring Boot to log to stdout (the default console appender) instead of rolling file appenders. Docker captures stdout automatically and makes it available via docker logs. For long-term log retention, ship logs to an external system (CloudWatch, Elasticsearch, Datadog) via the Docker logging driver or a sidecar agent.
×

Leaving server.address bound to loopback in the container application configuration

Symptom
Application starts successfully, healthcheck inside the container passes, but the mapped port is unreachable from the host or from other containers. Port mapping shows correctly in docker ps but connections are refused.
Fix
Set server.address=0.0.0.0 in application-docker.yml or pass SERVER_ADDRESS=0.0.0.0 as an environment variable. Spring Boot listens on all interfaces when server.address is unset, so this failure usually means a shared configuration pins it to localhost or 127.0.0.1. A listener bound to loopback is unreachable from outside the container's network namespace, even when the port is published.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
Explain the concept of multi-stage builds in Docker. Why is it particula...
Q02 · SENIOR
How does Spring Boot detect it is running inside a Docker container, and...
Q03 · JUNIOR
What is the difference between a Docker Image and a Docker Container in ...
Q04 · SENIOR
How do you pass application.properties values to a Spring Boot applicati...
Q05 · SENIOR
Explain the PID 1 problem in Docker and why it matters for Spring Boot g...
Q01 of 05 · SENIOR

Explain the concept of multi-stage builds in Docker. Why is it particularly useful for Spring Boot applications and what is the real measured impact?

ANSWER
Multi-stage builds use multiple FROM statements in a single Dockerfile. Each FROM starts a new build stage with its own filesystem. Stage 1 uses a heavy image (eclipse-temurin:21-jdk-jammy) with Maven to compile the Spring Boot application and run the dependency resolution. Stage 2 starts fresh from a lightweight image (eclipse-temurin:21-jre-jammy) and copies only the compiled JAR using COPY --from=build. The build stage is completely discarded: nothing from it appears in the final image except files explicitly copied over.

This is particularly useful for Spring Boot because Java requires a JDK to compile but only a JRE to run. The JDK is approximately 400MB larger than the JRE and includes tools (javac, javap, jshell, diagnostic utilities) that a running application never needs. Maven adds another 100-200MB of tooling and cached dependencies.

Measured impact: single-stage JDK images typically measure 800-900MB. Multi-stage JRE images measure 220-280MB, a 65-70% reduction. Security scanner CVE counts drop by 60-80% because the JDK toolchain and its transitive dependencies are absent. Container registry push times drop proportionally. On a team with 50 deployments per day, this translates to hours of saved CI time weekly without any application code change.
FAQ · 8 QUESTIONS

Frequently Asked Questions

01. What is the difference between a single-stage and multi-stage Dockerfile for Spring Boot?
02. How do I pass secrets to a Docker container without hardcoding them?
03. Why does my Spring Boot app crash with ECONNREFUSED even though the database container is running?
04. What JVM flags should I use for running Spring Boot in Docker?
05. How do I reduce my Spring Boot Docker image size?
06. Can I use Docker Compose for production deployments?
07. My container is being killed with exit code 137 but I see no Java exception in the logs. What is happening?
08. Why is my Docker build taking 3 minutes on every commit even though I only changed one line of code?