
JVM Memory Model Deep Dive: Heap, Stack, GC and Thread Visibility

JVM Memory Model explained in depth — heap regions (Eden, Survivor, Old Gen), stack frames, metaspace, garbage collection tuning, happens-before, volatile/synchronized semantics, and real production gotchas every Java dev must know.
🔥 Advanced — solid Java foundation required
In this tutorial, you'll learn
  • 1 — Heap as a conveyor belt: Fast turnover matters more than size. Reduce allocation rate, not heap size. 90-98% of objects die young.
  • 2 — Old Gen as a warehouse: Expensive to clean. Prevent premature promotion by sizing Survivor spaces correctly. Monitor with -XX:+PrintTenuringDistribution.
  • 3 — GC selection as tradeoffs: Lower latency = more frequent GC = higher CPU. Higher throughput = less frequent GC = longer pauses. When NOT to use G1: ultra-low latency (use ZGC), high-throughput batch (use Parallel GC), heaps < 2 GB (use Serial GC).
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • Heap = shared memory for all objects (GC-managed)
  • Stack = per-thread memory (local variables, freed on return)
  • Metaspace = class metadata (outside heap, control via -XX:MaxMetaspaceSize)
  • GC pauses = stop-the-world events (G1 default, ZGC for <1ms latency)
  • Happens-before = thread visibility guarantee (use volatile or synchronized)
Production Incident: The 4 GB Container That Kept Dying
Payment service crashing 3–4× per day due to JVM memory misconfiguration.
Symptom: Payment service crashing 3–4× per day with OOMKilled in Kubernetes. No heap dump. No Java exception. Just a dead pod.
Assumption: Memory leak in application code.
Root cause: -Xmx4g inside a 4 GB container left zero headroom. The JVM needed ~490 MB on top of the heap for metaspace (~50 MB), thread stacks (200 × 1 MB), JIT code cache (~240 MB), and GC overhead. Total process memory exceeded the 4 GB container limit, so the Linux OOM killer fired before the JVM could throw.
Fix: Set -Xmx3g (75% of the container limit). Enable -XX:NativeMemoryTracking=summary for ongoing visibility.
Key Lesson
Heap ≠ total memory. Non-heap consumes 20–30% of your container budget. Always leave headroom.
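The 75% rule from the fix can be expressed as a tiny sketch (hypothetical ContainerBudget helper, not from the incident's codebase — the 25% headroom is the rule of thumb, not an exact science):

```java
// Hypothetical helper: derive a safe -Xmx from a container memory limit.
// Rule of thumb from the incident above: heap ≈ 75% of the limit, leaving
// ~25% headroom for metaspace, thread stacks, JIT code cache, and GC overhead.
public class ContainerBudget {

    static long recommendedHeapMb(long containerLimitMb) {
        return containerLimitMb * 3 / 4;   // 75% of the container limit
    }

    public static void main(String[] args) {
        long limitMb = 4096;               // 4 GB container
        System.out.printf("Container limit:  %d MB%n", limitMb);
        System.out.printf("Recommended -Xmx: %dm%n", recommendedHeapMb(limitMb));
        System.out.printf("Headroom left:    %d MB%n", limitMb - recommendedHeapMb(limitMb));
    }
}
```

For the 4 GB container above this yields -Xmx3072m, matching the -Xmx3g fix.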
Production Debug Guide — Symptom → Action (use when production is on fire)
  • OutOfMemoryError: Java heap space → take a heap dump, analyze with Eclipse MAT
  • Latency spikes (100 ms – 2 s) → enable GC logging, check pause times
  • Container OOMKilled (no Java exception) → check non-heap memory, set -Xmx to 75% of the limit
  • Inconsistent values between threads → add volatile or synchronized, test on ARM
  • StackOverflowError → increase -Xss or convert recursion to iteration
  • High CPU but low throughput → profile the allocation rate, reduce object creation

Every Java performance crisis, every mysterious latency spike in production at 3 AM, and every subtle data-race bug ultimately traces back to the same root cause: the developer didn't have a clear mental model of how the JVM manages memory. This is not an academic concern — OutOfMemoryErrors, thread-visibility bugs, and stop-the-world GC pauses are day-one realities on any high-traffic service. Yet most Java developers can describe the syntax of a HashMap far better than they can explain why two threads can see different values for the same variable without any apparent concurrency bug.

The JVM Memory Model (JMM) solves two distinct but interrelated problems. First, it defines the physical layout of memory — where objects live, how long they live, and how the garbage collector reclaims them. Second, it defines the visibility and ordering guarantees between threads — the rules that determine whether a write made by Thread A is actually observable by Thread B. Mixing up these two concerns is the source of enormous confusion. The JMM specification (JSR-133, baked into the Java Language Specification since Java 5) is one of the most carefully engineered pieces of the Java platform, and understanding it separates senior engineers from the rest.

I've debugged JVM memory issues across payment processing systems handling 50,000 TPS, recommendation engines running 60 GB heaps, and microservices dying silently from metaspace exhaustion after hot-deploy cycles. The patterns are always the same: developers who understand the memory layout fix problems in minutes; developers who don't spend days chasing phantom bugs.

By the end of this article you'll be able to walk through a running JVM and name exactly what lives where and why. You'll understand the happens-before relationship well enough to reason about data races without guessing. You'll know how to tune GC regions for low-latency workloads, avoid the common memory-layout mistakes that cause silent correctness bugs, and answer the JMM interview questions that trip up even experienced engineers.

> ⚠️ Terminology note: This guide covers two distinct concepts that share confusingly similar names. JVM Memory (heap, stack, metaspace, GC) is the runtime memory structure — where objects live and how they're reclaimed. Java Memory Model (JMM) (happens-before, volatile, synchronized) is the thread visibility specification — the rules that determine when one thread's writes are observable by another. Both are covered here because they're deeply interrelated in production debugging.

What is the JVM Memory Model?

The JVM Memory Model defines two things that engineers constantly conflate:

  1. The memory layout — how the JVM divides process memory into regions (heap, stack, metaspace, etc.), what lives in each region, and when memory is reclaimed.
  2. The visibility model — the happens-before rules that determine when a write by one thread is guaranteed to be visible to another thread. This is what volatile, synchronized, java.util.concurrent, and final fields are built on.

Every OutOfMemoryError you've ever seen is a failure of the first part. Every 'works on my machine but not in production' concurrency bug is a failure of the second part. They're different problems requiring different tools, and confusing them is the single most common mistake I see in JMM discussions.

The JVM spec divides runtime memory into five areas: heap, stack (per-thread), program counter register (per-thread), native method stack (per-thread), and metaspace (class metadata, since Java 8). The heap is shared across all threads. The stack, PC register, and native method stack are per-thread — no synchronization needed. Metaspace is shared but rarely mutated after class loading.
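The shared-heap vs per-thread-stack split can be made concrete with a minimal sketch (hypothetical StackVsHeap class, assuming nothing beyond the standard library):

```java
// Sketch: what lives on the stack vs the heap in a single method call.
public class StackVsHeap {

    public static void main(String[] args) {
        int count = 42;                  // primitive local — lives in main's stack frame
        String name = "TheCodeForge";    // reference on the stack; the String object is on the heap
        int[] data = new int[1_000];     // reference on the stack; the array itself on the heap
        int doubled = helper(count);     // a new stack frame is pushed for helper, popped on return
        System.out.println(name.length() + data.length + doubled);
    }

    static int helper(int x) {           // parameter x is a copy in helper's own frame
        int result = x * 2;              // frame-local; freed automatically when helper returns
        return result;
    }
}
```

Only the heap objects here (the String, the array) are ever visible to other threads; every local and parameter vanishes with its frame.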

⚠ When NOT to rely on this section's concepts: If you're debugging a production incident right now, skip to the Production Debug Guide in the introduction. This section builds foundation — the guide solves problems.
MemoryLayoutDemo.java · JAVA
// io.thecodeforge.jvm.memory.MemoryLayoutDemo
// Demonstrates the five JVM memory areas and what lives where.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

public class MemoryLayoutDemo {

    private String applicationName = "TheCodeForge";
    private static final int MAX_CONNECTIONS = 1024;

    public static void main(String[] args) {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();

        System.out.println("=== JVM MEMORY LAYOUT ===");
        System.out.println();

        MemoryUsage heap = memoryBean.getHeapMemoryUsage();
        System.out.println("HEAP (shared across all threads):");
        System.out.printf("  Init:   %,d bytes (%.1f MB)%n", heap.getInit(), heap.getInit() / 1048576.0);
        System.out.printf("  Used:   %,d bytes (%.1f MB)%n", heap.getUsed(), heap.getUsed() / 1048576.0);
        System.out.printf("  Committed: %,d bytes (%.1f MB)%n", heap.getCommitted(), heap.getCommitted() / 1048576.0);
        System.out.printf("  Max:    %s%n", heap.getMax() == -1 ? "unlimited" : String.format("%,d bytes (%.1f MB)", heap.getMax(), heap.getMax() / 1048576.0));
        System.out.println("  Contains: all objects, arrays, string pool contents");
        System.out.println();

        MemoryUsage nonHeap = memoryBean.getNonHeapMemoryUsage();
        System.out.println("NON-HEAP (includes Metaspace):");
        System.out.printf("  Init:   %,d bytes (%.1f MB)%n", nonHeap.getInit(), nonHeap.getInit() / 1048576.0);
        System.out.printf("  Used:   %,d bytes (%.1f MB)%n", nonHeap.getUsed(), nonHeap.getUsed() / 1048576.0);
        System.out.printf("  Max:    %s%n", nonHeap.getMax() == -1 ? "unlimited" : String.format("%,d bytes (%.1f MB)", nonHeap.getMax(), nonHeap.getMax() / 1048576.0));
        System.out.println("  Contains: class metadata, method bytecode, JIT code cache");
        System.out.println();

        System.out.println("MEMORY POOLS (heap regions + non-heap regions):");
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            System.out.printf("  %-30s  used: %6.1f MB  max: %s  type: %s%n",
                pool.getName(),
                usage.getUsed() / 1048576.0,
                usage.getMax() == -1 ? "unlimited" : String.format("%.1f MB", usage.getMax() / 1048576.0),
                pool.getType());
        }
        System.out.println();

        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        System.out.println("STACK (per-thread — each thread has its own):");
        System.out.printf("  Active threads: %d%n", threadBean.getThreadCount());
        System.out.printf("  Current thread stack: %s%n", Thread.currentThread().getName());
        System.out.println("  Contains: local variables, method parameters, return addresses");
        System.out.println("  Each stack frame = one method call on the call stack");
        System.out.println();

        System.out.println("PROGRAM COUNTER REGISTER (per-thread):");
        System.out.println("  Points to the next JVM instruction to execute");
        System.out.println("  For native methods: undefined (native code manages its own PC)");
        System.out.println();

        System.out.println("=== SUMMARY ===");
        System.out.println("  HEAP:       shared, objects/arrays, garbage collected");
        System.out.println("  STACK:      per-thread, local variables, auto-managed");
        System.out.println("  METASPACE:  shared, class metadata, grows until MaxMetaspaceSize");
        System.out.println("  PC REG:     per-thread, current instruction pointer");
        System.out.println("  NATIVE:     per-thread, for JNI/native method calls");
    }
}
Mental Model
JVM as an operating system
The JVM is an OS running inside your OS — with its own memory regions.
  • Heap = the JVM's RAM — all objects live here, shared across threads
  • Stack = per-thread workspace — each thread has its own, no sharing needed
  • Metaspace = blueprint storage — class definitions, loaded once at startup
📊 Production Insight
In a high-traffic payment service I ran, we hit metaspace exhaustion after 200 hot deploys in one day. The root cause was a library that created new classloaders on every request. We fixed it by pinning the classloader and setting -XX:MaxMetaspaceSize=512m. Lesson: class metadata is not free — monitor it aggressively in long-running containers.
🎯 Key Takeaway
Memory layout = where things live
Visibility = when threads see each other's writes
→ Don't mix them up
Quick Memory Area Decision Guide
  • You see OutOfMemoryError: Java heap space → focus on the heap (objects), take a heap dump
  • You see OOMKilled with no Java exception → focus on non-heap (metaspace, direct buffers, thread stacks), check native memory
  • Inconsistent values across threads → focus on happens-before, add volatile or synchronized
JVM Memory Architecture (Java 21–25)
Shared across all threads:
  • Heap (GC-managed, -Xms / -Xmx) — objects and arrays. Young Generation (minor GC): Eden for new objects, Survivor S0/S1 (from/to). Old Generation (Tenured): long-lived objects, promoted after ~15 GC cycles. Collected by G1 or ZGC (generational in Java 21+).
  • Metaspace (native memory) — class metadata, bytecode, constant pool; capped by -XX:MaxMetaspaceSize.
  • Code Cache (JIT only) — JIT-compiled native code; capped by -XX:ReservedCodeCacheSize.
Per-thread, not shared:
  • JVM Stack — stack frames: local variables, operand stack, return address. Virtual Threads (Java 21–25) use carrier threads and are much lighter than platform threads.
  • PC Register — points at the current bytecode instruction.
  • Native Stack — JNI / native calls.

Heap Memory — Young Generation, Old Generation, and How Objects Age

The heap is where all Java objects live. It's shared across all threads, and it's where garbage collection operates.

📊 Heap Flow:

    ┌──────────────┐      ┌──────────────┐      ┌──────────────┐
    │     EDEN     │ ──→  │   SURVIVOR   │ ──→  │   OLD GEN    │
    │ (new objects)│      │(aged objects)│      │ (long-lived) │
    └──────────────┘      └──────────────┘      └──────────────┘
           ↓                     ↓                     ↓
        Minor GC              Minor GC              Full GC
         (fast)               (copying)              (slow)

Young Generation (New Space): where new objects are allocated.
  • Eden: all new objects start here. When Eden fills up, a minor GC runs.
  • Survivor Space 0 (S0) and Survivor Space 1 (S1): two equal-sized spaces. Objects that survive minor GCs get copied between them, aging each time.
  • Promotion: when an object's age exceeds the tenuring threshold (default: 15), it moves to the Old Generation.

Old Generation (Tenured Space): Long-lived objects. When Old Gen fills up, a major GC (or full GC) runs — expensive, often stop-the-world.

The generational hypothesis: 90-98% of objects die young. Minor GCs are fast (1-10ms). Full GCs are slow (100ms to seconds).

⚠ When NOT to tune generational heap sizes
  • Ultra-low latency systems (<1ms pauses): G1's generational model still causes stop-the-world. Use ZGC instead (-XX:+UseZGC).
  • Heaps > 64 GB: G1's region management overhead grows. Consider ZGC or Shenandoah.
  • Short-lived batch jobs: GC tuning won't help if the JVM exits in seconds. Focus on allocation rate.
HeapStructureDemo.java · JAVA
// io.thecodeforge.jvm.memory.HeapStructureDemo
// Shows how objects move through heap generations.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapStructureDemo {

    static class OrderEvent {
        private final String orderId;
        private final String customerId;
        private final double amount;
        private final long timestamp;
        private final byte[] payload;

        OrderEvent(String orderId, String customerId, double amount) {
            this.orderId = orderId;
            this.customerId = customerId;
            this.amount = amount;
            this.timestamp = System.currentTimeMillis();
            this.payload = new byte[256];
        }
    }

    public static void main(String[] args) {
        System.out.println("=== HEAP GENERATION TRACKING ===");
        System.out.println();

        printMemoryPools("BEFORE allocation");

        System.out.println("\nPhase 1: Allocating 100,000 short-lived objects...");
        for (int batch = 0; batch < 10; batch++) {
            List<OrderEvent> shortLived = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                shortLived.add(new OrderEvent(
                    "ORD-" + batch + "-" + i,
                    "CUST-" + (i % 1000),
                    Math.random() * 500
                ));
            }
        }
        printMemoryPools("AFTER short-lived allocation");

        System.out.println("\nRequesting GC...");
        System.gc();
        printMemoryPools("AFTER GC — Eden should be nearly empty");

        System.out.println("\nPhase 2: Allocating 50,000 long-lived objects...");
        List<OrderEvent> longLived = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            longLived.add(new OrderEvent(
                "LONG-" + i,
                "CUST-PERM-" + (i % 100),
                Math.random() * 1000
            ));
        }
        printMemoryPools("AFTER long-lived allocation");

        System.out.println("\nPhase 3: Multiple GCs to promote survivors to Old Gen...");
        for (int i = 0; i < 5; i++) {
            System.gc();
            System.out.println("  GC cycle " + (i + 1) + " complete");
        }
        printMemoryPools("AFTER promotion cycles — Old Gen should have grown");
    }

    static void printMemoryPools(String label) {
        System.out.println("\n  " + label + ":");
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == java.lang.management.MemoryType.HEAP) {
                MemoryUsage usage = pool.getUsage();
                System.out.printf("    %-30s  used: %6.1f MB  committed: %8.1f MB%n",
                    pool.getName(),
                    usage.getUsed() / 1048576.0,
                    usage.getCommitted() / 1048576.0);
            }
        }
    }
}
Mental Model
Heap as a conveyor belt with a warehouse
Short-lived objects are cheap. Long-lived ones are expensive.
  • New objects land on the belt (Eden) — most die here instantly
  • Survivors move to a holding area (Survivor spaces), aging each pass
  • Long-lived objects graduate to the warehouse (Old Gen)
  • Cleaning the belt = fast (minor GC). Cleaning the warehouse = slow (full GC)
📊 Production Insight
Survivor spaces too small → premature promotion → Old Gen fills → full GC spike
Survivor spaces too large → wasted heap → lower allocation efficiency
→ Monitor with -XX:+PrintTenuringDistribution, target 70-80% survival rate
🎯 Key Takeaway
Short-lived objects = cheap (die in Eden, minor GC)
Long-lived objects = expensive (Old Gen, full GC)
→ Reduce allocation rate, not heap size
Heap Tuning Decision Tree
  • High allocation rate + many short-lived objects → increase the young generation (e.g. -XX:NewRatio=2, which makes the young gen one third of the heap)
  • Frequent full GCs with low Old Gen usage → increase Survivor size or -XX:MaxTenuringThreshold
  • Ultra-low latency required → switch to ZGC and stop tuning the generational heap

Garbage Collection — How the JVM Reclaims Memory

The garbage collector automatically reclaims memory occupied by objects that are no longer reachable from any GC root (local variables, static fields, active threads, JNI references).

GC Root types: Local variables, static fields, active threads, JNI references, monitors.
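Reachability from a GC root can be observed with a WeakReference, which does not keep its referent alive. A sketch (hypothetical ReachabilityDemo — note System.gc() is only a hint, so the final line's result is likely but not guaranteed):

```java
import java.lang.ref.WeakReference;

// An object is eligible for collection only when no chain of strong references
// from a GC root (here: the local variable `strong`) reaches it.
public class ReachabilityDemo {

    public static void main(String[] args) {
        byte[] strong = new byte[1024 * 1024];           // strongly reachable via a local variable
        WeakReference<byte[]> weak = new WeakReference<>(strong);

        System.out.println("Allocated " + strong.length / 1024 + " KB, strongly reachable");
        System.out.println("weak.get() != null: " + (weak.get() != null));

        strong = null;                                    // drop the only strong reference
        System.gc();                                      // a hint, not a command
        System.out.println("After clearing + GC hint, weak.get() == null: " + (weak.get() == null));
    }
}
```

The same reachability rule is why an accidentally retained reference (a static cache, a listener list) keeps whole object graphs alive: as long as one GC root path exists, nothing on that path is reclaimed.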

Major GC algorithms:

G1 (Garbage First) — Default since Java 9. Divides the heap into equal-sized regions (1–32 MB, set via -XX:G1HeapRegionSize). Collects the regions with the most garbage first. Pause-time target: -XX:MaxGCPauseMillis (default 200 ms). Best for: heaps of 4–64 GB with moderate latency requirements.

ZGC — Ultra-low latency. Sub-millisecond pauses regardless of heap size (tested up to 16 TB). Uses colored pointers and load barriers. Production-ready since Java 15, generational since Java 21. Best for: heaps > 16 GB with sub-millisecond latency requirements.

Parallel GC — Throughput-optimized. Multiple GC threads, stop-the-world pauses. Maximizes total application throughput at the cost of pause length. Best for: batch jobs, ETL, analytics.

⚠ When NOT to use G1
  • Ultra-low latency systems (<1ms pauses): G1 still has stop-the-world phases. Use ZGC.
  • High-throughput batch processing: G1's concurrent overhead reduces throughput. Use Parallel GC.
  • Heaps < 2 GB: G1's region management overhead isn't worth it. Use Serial GC (-XX:+UseSerialGC).

Related: Java Memory Leaks and Prevention — fix container OOMKills and set correct memory limits

GCTuningDemo.java · JAVA
// io.thecodeforge.jvm.memory.GCTuningDemo
// Demonstrates GC behavior and collector selection.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GCTuningDemo {

    public static void main(String[] args) {
        System.out.println("=== GC INFORMATION ===");
        System.out.println();

        System.out.println("Active Garbage Collectors:");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("  Name: %-30s  Collections: %d  Time: %d ms%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println();

        System.out.println("=== GC SELECTION GUIDE ===");
        System.out.println();
        System.out.println("┌─────────────────────────────────────────────────────────────────┐");
        System.out.println("│  USE G1 IF:                    │  USE ZGC IF:                 │");
        System.out.println("├────────────────────────────────┼──────────────────────────────┤");
        System.out.println("│  • Heap 4-64 GB                │  • Heap > 16 GB              │");
        System.out.println("│  • Moderate latency (50-200ms) │  • Sub-millisecond pauses    │");
        System.out.println("│  • Default, no tuning needed   │  • Real-time / trading systems│");
        System.out.println("├────────────────────────────────┼──────────────────────────────┤");
        System.out.println("│  USE PARALLEL GC IF:           │  AVOID G1 IF:                │");
        System.out.println("│  • Batch jobs / ETL            │  • Ultra-low latency (<1ms)  │");
        System.out.println("│  • Max throughput needed       │  • Heap < 2 GB               │");
        System.out.println("│  • GC pauses don't matter      │  • High-throughput batch     │");
        System.out.println("└────────────────────────────────┴──────────────────────────────┘");
        System.out.println();

        System.out.println("=== RECOMMENDED GC FLAGS ===");
        System.out.println();
        System.out.println("Low-latency (web services):");
        System.out.println("  -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:G1HeapRegionSize=4m");
        System.out.println();
        System.out.println("Ultra-low latency (<1ms):");
        System.out.println("  -XX:+UseZGC -XX:+ZGenerational");
        System.out.println();
        System.out.println("Throughput (batch):");
        System.out.println("  -XX:+UseParallelGC -XX:ParallelGCThreads=<cores>");
        System.out.println();
        System.out.println("GC logging (ALWAYS enable):");
        System.out.println("  -Xlog:gc*:file=gc.log:time,uptime,level,tags:filecount=5,filesize=50m");

        System.out.println("\n=== ALLOCATION PRESSURE TEST ===");
        long gcCountBefore = getTotalGCCount();
        long gcTimeBefore = getTotalGCTime();

        List<byte[]> pressure = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            pressure.add(new byte[1024 * 1024]);
            if ((i + 1) % 50 == 0) {
                long gcCountNow = getTotalGCCount();
                long gcTimeNow = getTotalGCTime();
                System.out.printf("  Allocated %d MB — GC count: %d (+%d), GC time: %d ms (+%d ms)%n",
                    i + 1, gcCountNow, gcCountNow - gcCountBefore,
                    gcTimeNow, gcTimeNow - gcTimeBefore);
            }
        }

        pressure.clear();
        System.gc();

        System.out.printf("\n  Total GC events: %d%n", getTotalGCCount() - gcCountBefore);
        System.out.printf("  Total GC time: %d ms%n", getTotalGCTime() - gcTimeBefore);
    }

    static long getTotalGCCount() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
            .mapToLong(GarbageCollectorMXBean::getCollectionCount).sum();
    }

    static long getTotalGCTime() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
            .mapToLong(GarbageCollectorMXBean::getCollectionTime).sum();
    }
}
Mental Model
GC as warehouse cleaning strategies
GC algorithm = cleaning strategy. Choose by how much downtime you can accept.
  • G1 — cleans the messiest aisles first (Garbage First). Best for 4–64 GB heaps
  • ZGC — hires a night crew that cleans while you work. Sub-ms pauses, any size heap
  • Parallel GC — brings the whole team in. Max throughput, stop-the-world pauses
📊 Production Insight
G1 default pause target (200ms) is often too relaxed for APIs
→ Default is not production-ready
→ Set 50ms for web services, 20ms for real-time, switch to ZGC below 1ms
→ Always validate with GC logs before tuning flags blind
🎯 Key Takeaway
Lower latency = more frequent GC = higher CPU cost
Higher throughput = fewer GCs = longer pauses
→ Pick the trade-off your SLA demands, not the 'best' algorithm
GC Selection Strategy
  • Heap < 4 GB, latency not critical → G1 (default)
  • Heap 4–64 GB, moderate latency → G1 with -XX:MaxGCPauseMillis=50
  • Heap > 16 GB, sub-millisecond latency needed → ZGC (-XX:+UseZGC -XX:+ZGenerational)
  • Batch job, max throughput → Parallel GC (-XX:+UseParallelGC)
GC Selection Guide (Production)
  • G1 (Garbage First) — the default, balanced. Pause target 50–200 ms; heap sweet spot 4–64 GB; good for web services and APIs. The most common choice.
  • ZGC (Java 21+) — ultra-low latency. Sub-millisecond pauses even on 16 TB heaps; generational in Java 21–25; good for trading and real-time systems.
  • Parallel GC — max throughput. Pauses from 100 ms to seconds; any heap size (best < 32 GB); good for batch jobs, ETL, analytics.
Quick decision rule:
  • Web / API service → G1 with -XX:MaxGCPauseMillis=50
  • Need < 1 ms pauses → ZGC (Java 21+)
  • Batch / max throughput → Parallel GC

Happens-Before — Thread Visibility and the Rules That Prevent Data Races

This is the second half of the JMM — and the half that causes the most subtle bugs. The memory layout (heap, stack, GC) determines where objects live. The happens-before rules determine when one thread's writes are visible to another thread.

The core problem: Modern CPUs have multiple cores, each with its own L1/L2 cache. Without synchronization, there is NO guarantee that Thread B sees Thread A's write.

The JMM solution — happens-before: A partial ordering of operations. If A happens-before B, then A's writes are visible to B.

Key rules:
  1. Program order: within one thread, every action happens-before later actions in that thread.
  2. Monitor lock: an unlock happens-before every subsequent lock of the same monitor (synchronized).
  3. Volatile variable: a write to a volatile happens-before every subsequent read of that volatile.
  4. Thread start: Thread.start() happens-before every action in the started thread.
  5. Thread join: every action in a thread happens-before Thread.join() on that thread returns.
  6. Transitivity: if A happens-before B and B happens-before C, then A happens-before C.

⚠ When NOT to rely on volatile
  • Compound operations (count++, x = y): Volatile only provides visibility, not atomicity. Use AtomicInteger or synchronized.
  • Multiple variables needing consistent state: Volatile on one variable doesn't create happens-before for others. Use synchronized or Lock.
  • When you need mutual exclusion: Volatile doesn't block threads. Use synchronized or ReentrantLock.
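The compound-operation pitfall has a standard fix: replace the read-modify-write with an atomic operation. A minimal sketch (hypothetical AtomicCounterDemo) that makes the lost-update counter deterministic:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Fix for the volatileCounter++ race: AtomicInteger.incrementAndGet() performs
// the read-modify-write as a single atomic operation, so no updates are lost.
public class AtomicCounterDemo {

    // Runs `threads` threads, each doing `perThread` atomic increments.
    static int run(int threads, int perThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet();   // atomic: no lost updates
                }
            });
        }
        for (Thread t : workers) t.start();
        for (Thread t : workers) t.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("counter = " + run(10, 10_000));
    }
}
```

Unlike a plain volatile int, this always yields exactly threads × perThread, because the CAS-based increment provides atomicity on top of visibility.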

⚠️ x86 Hides Concurrency Bugs — ARM Exposes Them: x86 has strong memory ordering (TSO). Many data races 'work' on x86 but crash on ARM (Graviton, Apple Silicon). If you deploy to ARM, test there. Always establish happens-before edges — never rely on architecture-specific behavior.

Related: Multithreading in Java — concurrent collections and thread-safe patterns

HappensBeforeDemo.java · JAVA
// io.thecodeforge.jvm.memory.HappensBeforeDemo
// Demonstrates visibility, volatile, and data races.

public class HappensBeforeDemo {

    private static boolean running = true;
    private static volatile boolean volatileRunning = true;
    private static volatile int volatileCounter = 0;

    public static void main(String[] args) throws InterruptedException {
        System.out.println("=== HAPPENS-BEFORE DEMONSTRATION ===");
        System.out.println();

        System.out.println("--- Demo 1: Non-volatile flag (NO happens-before) ---");
        Thread worker = new Thread(() -> {
            int iterations = 0;
            while (running) {
                iterations++;
            }
            System.out.println("  Worker exited after " + iterations + " iterations");
        });
        worker.setDaemon(true);   // daemon: the JVM can still exit if the race keeps the loop spinning forever
        worker.start();
        Thread.sleep(100);
        running = false;
        System.out.println("  Main set running=false. Worker MAY never see it.");
        worker.join(1000);
        if (worker.isAlive()) {
            System.out.println("  ❌ Worker still running — data race! (x86 may hide this)");
            // interrupt() can't break a pure spin loop; the daemon flag is what lets the JVM exit
        }
        System.out.println();

        System.out.println("--- Demo 2: Volatile flag (happens-before guaranteed) ---");
        Thread worker2 = new Thread(() -> {
            int iterations = 0;
            while (volatileRunning) {
                iterations++;
            }
            System.out.println("  Worker exited after " + iterations + " iterations");
        });
        worker2.start();
        Thread.sleep(100);
        volatileRunning = false;
        worker2.join();
        System.out.println("  ✅ Worker exited — happens-before guaranteed");
        System.out.println();

        System.out.println("--- Demo 3: Volatile does NOT provide atomicity ---");
        Thread[] incrementers = new Thread[10];
        for (int i = 0; i < 10; i++) {
            incrementers[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++) {
                    volatileCounter++;
                }
            });
        }
        for (Thread t : incrementers) t.start();
        for (Thread t : incrementers) t.join();
        System.out.printf("  10 threads * 10,000 increments = 100,000 expected%n");
        System.out.printf("  volatileCounter = %d (likely less — lost updates!)%n", volatileCounter);
        System.out.println("  Fix: Use AtomicInteger or synchronized");
        System.out.println();

        System.out.println("--- Demo 4: Double-checked locking (requires volatile) ---");
        System.out.println("  Before Java 5, double-checked locking was BROKEN.");
        System.out.println("  Java 5+ requires volatile for correctness:");
        System.out.println("    private volatile static MyClass instance;");
        System.out.println("    if (instance == null) {");
        System.out.println("        synchronized (MyClass.class) {");
        System.out.println("            if (instance == null) {");
        System.out.println("                instance = new MyClass();");
        System.out.println("            }");
        System.out.println("        }");
        System.out.println("    }");
    }
}
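Demo 4 above prints the double-checked locking pattern as text; written out as an actual class (hypothetical ConfigHolder, a sketch of the standard pattern), it looks like this:

```java
// Double-checked locking: volatile is required so that the write to `instance`
// happens-before any read that observes the non-null value. Without volatile,
// another thread could see a partially constructed object.
public class ConfigHolder {

    private static volatile ConfigHolder instance;   // volatile is the crucial part

    private final String name;

    private ConfigHolder() {
        this.name = "default";
    }

    public static ConfigHolder getInstance() {
        ConfigHolder local = instance;               // one volatile read on the fast path
        if (local == null) {
            synchronized (ConfigHolder.class) {
                local = instance;                    // re-check under the lock
                if (local == null) {
                    instance = local = new ConfigHolder();
                }
            }
        }
        return local;
    }

    public String name() { return name; }
}
```

The local variable avoids a second volatile read on the common path; in most code, though, a simple static holder class or an enum singleton is easier to get right.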
Mental Model
Happens-before as a contract between threads
Without happens-before, threads live in parallel universes — writes don't cross over.
  • volatile = bulletin board: writes posted for all threads to see (visibility only)
  • synchronized = meeting room: one thread inside at a time (visibility + atomicity)
  • Thread.start() / join() = handshake: guarantees ordering across thread boundaries
📊 Production Insight
In one recommendation engine we ran on ARM Graviton instances, a data-race bug that never showed on x86 suddenly caused incorrect recommendations under load. The fix was adding volatile to a shared config flag. Lesson: never assume x86 memory ordering — always establish happens-before.
🎯 Key Takeaway
volatile = visibility only (no atomicity)
synchronized = visibility + atomicity + mutual exclusion
→ volatile is not a drop-in replacement for synchronized
Visibility Decision Tree
  • If: Single variable, only visibility needed → Use volatile
  • If: Multiple variables or compound action → Use synchronized or Lock
  • If: High contention + complex logic → Use java.util.concurrent primitives (Atomic*, ConcurrentHashMap)
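A sketch of the middle branch of the decision tree: updating two related fields is a compound action, so volatile on each field would not keep them consistent, while one lock does (the MinMaxTracker class is illustrative):

```java
// Illustrative sketch: a tracker that must update two fields together.
// Marking min and max volatile would give visibility but not atomicity;
// a reader could see a new min paired with a stale max. synchronized
// gives both, so the pair is always consistent.
public class MinMaxTracker {
    private int min = Integer.MAX_VALUE;
    private int max = Integer.MIN_VALUE;

    // Compound action over two variables -> synchronized, not volatile.
    public synchronized void record(int value) {
        if (value < min) min = value;
        if (value > max) max = value;
    }

    public synchronized int[] snapshot() {
        return new int[] { min, max }; // always a consistent pair
    }

    public static void main(String[] args) {
        MinMaxTracker tracker = new MinMaxTracker();
        tracker.record(5);
        tracker.record(-3);
        tracker.record(10);
        int[] s = tracker.snapshot();
        System.out.println("min = " + s[0] + ", max = " + s[1]); // min = -3, max = 10
    }
}
```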
Platform Threads vs Virtual Threads (Java 21–25)
Traditional
Platform Threads
One OS thread per Java thread
• Fixed stack (usually 1 MB)
• Expensive to create & switch
• Limited by OS thread limit
• High memory overhead
Heavy • Blocking
Modern
Virtual Threads
Lightweight • JVM-managed
• Stack is heap-backed & dynamic
• Extremely cheap to create
• 100k+ concurrent tasks possible
• Carrier threads do the real work
Light • Blocking is cheap (carrier thread is released)
Key Memory Difference:
Platform threads consume ~1 MB stack each → limited concurrency.
Virtual threads use almost no stack memory (heap-backed) → massive concurrency on a handful of carrier threads.
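A minimal sketch of that difference (requires Java 21+): 10,000 virtual threads that all block at once, which would need roughly 10 GB of fixed stack memory as platform threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Requires Java 21+. Launches 10,000 virtual threads that each block
// briefly; their stacks are heap-backed and grow on demand, so this is
// cheap. Blocking in sleep() releases the carrier thread to run others.
public class VirtualThreadsDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        Thread[] threads = new Thread[10_000];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(10); // blocking a virtual thread is cheap
                } catch (InterruptedException ignored) { }
                done.incrementAndGet();
            });
        }
        for (Thread t : threads) t.join();
        System.out.println("completed = " + done.get()); // 10000
    }
}
```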
Figure: Platform Threads vs Virtual Threads — Memory & Concurrency Comparison (Java 21–25)

Common Production Mistakes and Debugging Patterns

These are the mistakes I've seen in production systems and the debugging patterns that caught them. Every one of these has caused a real incident.


ProductionMistakesDemo.java · JAVA
// io.thecodeforge.jvm.memory.ProductionMistakesDemo
// Demonstrates common production memory mistakes and fixes.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ProductionMistakesDemo {

    private static final ThreadLocal<List<byte[]>> leakedThreadLocal = new ThreadLocal<>();

    public static void main(String[] args) throws Exception {
        System.out.println("=== PRODUCTION JVM MEMORY MISTAKES ===");
        System.out.println();

        System.out.println("--- Mistake 1: -Xmx too large for container ---");
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
        System.out.printf("  Heap: %.1f MB, Non-heap: %.1f MB, Thread stacks: ~10 MB%n",
            heap.getCommitted() / 1048576.0, nonHeap.getUsed() / 1048576.0);
        System.out.println("  🔧 Fix: -Xmx = 75-80% of container limit");
        System.out.println();

        System.out.println("--- Mistake 2: Off-heap (direct buffer) memory ---");
        List<ByteBuffer> directBuffers = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            directBuffers.add(ByteBuffer.allocateDirect(1024 * 1024));
        }
        System.out.printf("  Allocated 100 MB direct buffers — heap unchanged at %.1f MB%n",
            bean.getHeapMemoryUsage().getUsed() / 1048576.0);
        System.out.println("  🔧 Monitor: -XX:NativeMemoryTracking=summary");
        System.out.println("  🔧 Inspect: jcmd <pid> VM.native_memory summary");
        System.out.println();

        System.out.println("--- Mistake 3: ThreadLocal leak ---");
        Thread[] pool = new Thread[3];
        for (int i = 0; i < 3; i++) {
            pool[i] = new Thread(() -> {
                // No remove() in a finally block — in a real thread pool the
                // reused thread would keep this value reachable indefinitely
                leakedThreadLocal.set(new ArrayList<>(List.of(new byte[1024 * 1024])));
            });
            pool[i].start();
            pool[i].join();
        }
        System.out.println("  In a real pool, 3 reused threads would pin 1 MB each → 3 MB leaked");
        System.out.println("  🔧 Fix: try { tl.set(value); } finally { tl.remove(); }");
        System.out.println();

        System.out.println("--- Mistake 4: String.intern() on user input ---");
        System.out.println("  ❌ String userId = request.getParameter(\"id\").intern();");
        System.out.println("  🔧 Fix: Use equals(), never intern user input");
        System.out.println();

        System.out.println("=== PRODUCTION FLAGS (COPY-PASTE READY) ===");
        System.out.println();
        System.out.println("-Xms2g -Xmx2g");
        System.out.println("-XX:MaxMetaspaceSize=512m");
        System.out.println("-XX:ReservedCodeCacheSize=256m");
        System.out.println("-XX:+UseG1GC -XX:MaxGCPauseMillis=50");
        System.out.println("-XX:+DisableExplicitGC");
        System.out.println("-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/jvm/");
        System.out.println("-Xlog:gc*:file=gc.log:time,uptime,level,tags:filecount=5,filesize=50m");
        System.out.println("-XX:NativeMemoryTracking=summary");
        System.out.println("-XX:StartFlightRecording=disk=true,maxsize=250m");
    }
}
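The ThreadLocal fix from Mistake 3, sketched as a reusable pattern for pooled threads (the ThreadLocalHygiene class and handleRequest method are illustrative):

```java
// Illustrative fix for Mistake 3: always remove() in finally so a
// recycled pool thread cannot carry the value into the next request.
public class ThreadLocalHygiene {
    private static final ThreadLocal<byte[]> requestBuffer = new ThreadLocal<>();

    static int handleRequest(int size) {
        requestBuffer.set(new byte[size]);
        try {
            // ... work that uses the per-thread buffer ...
            return requestBuffer.get().length;
        } finally {
            requestBuffer.remove(); // entry cleared even if the work throws
        }
    }

    public static void main(String[] args) {
        System.out.println("handled bytes = " + handleRequest(1024)); // 1024
        System.out.println("after remove: " + requestBuffer.get());  // null
    }
}
```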
Mental Model
Memory problems follow patterns
JVM memory failures repeat. Learn the pattern, skip the debugging marathon.
  • OOM: Java heap space → take a heap dump, analyze with Eclipse MAT
  • Latency spikes → enable GC logging, check pause times
  • OOMKilled (no Java exception) → non-heap memory; set -Xmx to 75% of limit
  • Inconsistent values between threads → missing happens-before; add volatile or lock
🎯 Key Takeaway
OOM → heap dump | Latency spikes → GC logs
OOMKilled → native memory tracking | Stale reads → happens-before
→ Learn the pattern once, skip the debugging marathon every time
🗂 JVM Memory Cheat Sheet
Use this as a quick reference when debugging memory or tuning JVM flags
Memory Area          | Shared/Per-Thread | Contents                         | Management                | Key Flags
Heap                 | Shared            | All objects, arrays, string pool | Garbage Collector         | -Xms, -Xmx
Eden (Young Gen)     | Shared            | Newly allocated objects          | Minor GC                  | -XX:NewRatio, -XX:SurvivorRatio
Survivor (Young Gen) | Shared            | Objects surviving minor GC       | Minor GC (copying)        | -XX:SurvivorRatio, -XX:MaxTenuringThreshold
Old Gen (Tenured)    | Shared            | Long-lived objects               | Major/Full GC             | -XX:NewRatio
Metaspace            | Shared            | Class metadata, bytecode         | ClassLoader GC            | -XX:MaxMetaspaceSize
Stack                | Per-thread        | Local variables, frames          | Automatic (pop on return) | -Xss
PC Register          | Per-thread        | Current instruction pointer      | JVM internal              | N/A
Code Cache           | Shared            | JIT-compiled code                | Code cache flushing       | -XX:ReservedCodeCacheSize
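The rows above can be inspected at runtime via MemoryPoolMXBean. Exact pool names depend on the active collector (e.g. "G1 Eden Space" under G1), so treat the output as indicative:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Lists the live memory pools behind the cheat-sheet rows. Pool names
// are JVM/GC-specific; "Metaspace" and the code heaps show up as
// non-heap pools, Eden/Survivor/Old Gen as heap pools.
public class MemoryPoolsDemo {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-30s %-18s used = %,d KB%n",
                pool.getName(),
                pool.getType(),                      // Heap memory / Non-heap memory
                pool.getUsage().getUsed() / 1024);
        }
    }
}
```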

🎯 Key Takeaways

  • 1 — Heap as a conveyor belt: Fast turnover matters more than size. Reduce allocation rate, not heap size. 90-98% of objects die young.
  • 2 — Old Gen as a warehouse: Expensive to clean. Prevent premature promotion by sizing Survivor spaces correctly. Monitor with -XX:+PrintTenuringDistribution.
  • 3 — GC selection as tradeoffs: Lower latency = more frequent GC = higher CPU. Higher throughput = less frequent GC = longer pauses. When NOT to use G1: ultra-low latency (use ZGC), high-throughput batch (use Parallel GC), heaps < 2 GB (use Serial GC).
  • 4 — Happens-before as a contract: Without it, threads live in parallel universes. volatile = visibility only. synchronized = visibility + atomicity + mutual exclusion. When NOT to use volatile: compound operations, multiple variables, mutual exclusion needed.
  • 5 — Metaspace as a safety valve: Always set -XX:MaxMetaspaceSize. Native memory OOM kills the process with no heap dump — silent and deadly.
  • 6 — Stack is cheap, but not free: 500 threads × 1 MB = 500 MB before heap allocation. Virtual threads (Java 21+) fix this for I/O-bound workloads.
  • 7 — Memory problems follow patterns: Learn the pattern → recognize symptom → apply fix. Quick reference: OOM → heap dump, latency spikes → GC logs, OOMKilled → native memory tracking, inconsistent values → happens-before.
  • 🚨 Production Incident Recap: The 4GB container with -Xmx4g kept getting OOMKilled. Root cause: non-heap memory (metaspace, thread stacks, code cache, direct buffers) pushed total process memory over the limit. Fix: -Xmx3g + NativeMemoryTracking. Lesson: Heap ≠ total memory. Always leave 20-25% headroom.

⚠ Common Mistakes to Avoid

    -Xmx too large for containers — In Kubernetes, -Xmx4g in a 4 GB container = OOMKilled. The JVM needs extra memory for metaspace, thread stacks, code cache, direct buffers, GC overhead. Fix: Set -Xmx to 75-80% of container limit. 4 GB container → -Xmx3g.

    String.intern() on user input — Creates permanent string pool entries that never get GC'd. 10 million unique URLs = 10 million permanent strings. Fix: Never intern user input. Use equals() for comparison.

    ThreadLocal leaks in thread pools — ThreadLocal values persist across request cycles in recycled threads. In servlet containers, this can leak entire webapp classloaders. Fix: Always use try { tl.set(value); } finally { tl.remove(); }

    Off-heap memory not tracked by -Xmx — ByteBuffer.allocateDirect(), MappedByteBuffer, Netty direct buffers use native memory. A service with -Xmx2g can use 4 GB total. Fix: Monitor with -XX:NativeMemoryTracking=summary. Use jcmd <pid> VM.native_memory.

    System.gc() triggering full GCs — Some libraries call System.gc(), causing stop-the-world pauses. Fix: Run with -XX:+DisableExplicitGC or -XX:+ExplicitGCInvokesConcurrent.

📚 Related Next Steps

  • JVM in Containers — Fix OOMKilled pods and set correct -Xmx for containers
  • Memory Leak Detection — Take and analyse heap dumps step by step

Interview Questions on This Topic

  • QExplain the difference between heap and stack memory. What lives in each?
  • QWalk me through what happens when I write new Object() — where does it go, and how does it move through GC generations?
  • QWhat is the purpose of Survivor spaces? What happens if they're too small? Too large?
  • QCompare G1, ZGC, and Parallel GC. When would you choose each? When would you NOT choose G1?
  • QWhat is the happens-before relationship? List 4 rules that create happens-before edges.
  • QWhy is double-checked locking broken without volatile? How does volatile fix it?
  • QYou see 'OOMKilled' in Kubernetes but no heap dump. What happened and how do you debug?
  • QWhat is a ClassLoader leak? How does it cause Metaspace exhaustion? How do you detect it?
  • QWhat is the difference between volatile and synchronized? When can't you use volatile?
  • QWhy does x86 hide concurrency bugs that ARM exposes? How do you protect against this?
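For the double-checked-locking question, an alternative worth knowing: the initialization-on-demand holder idiom gets safe lazy initialization from class loading itself, with no volatile or lock (the LazyConfig class name is illustrative):

```java
// Initialization-on-demand holder idiom. The JLS guarantees class
// initialization is thread-safe and runs at most once, so Holder's
// static initializer safely publishes INSTANCE to every thread —
// no volatile, no synchronized, no double-checked locking.
public class LazyConfig {
    private LazyConfig() { }

    private static class Holder {
        // Runs lazily, on the first call to getInstance()
        static final LazyConfig INSTANCE = new LazyConfig();
    }

    public static LazyConfig getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // true
    }
}
```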

Frequently Asked Questions

How do I choose the right garbage collector?

Quick version: Web services with 4-64 GB heap → G1. Ultra-low latency (<1ms pauses) → ZGC. Batch jobs where throughput matters more than pause time → Parallel GC. Heaps < 2 GB → Serial GC. When NOT to use G1: ultra-low latency systems (use ZGC), high-throughput batch (use Parallel GC), heaps < 2 GB (use Serial GC). The wrong GC choice can make latency 10x worse.

What's the single most common cause of full GCs?

Premature promotion — objects moving to Old Gen before they die. Caused by Survivor spaces too small or tenuring threshold too low. Fix: decrease -XX:SurvivorRatio (a lower ratio means larger Survivor spaces) and raise -XX:MaxTenuringThreshold. Monitor with -XX:+PrintTenuringDistribution.

How do I detect a memory leak without a heap dump?

Monitor with jstat -gcutil <pid> 1s. If Old Gen grows monotonically without decreasing after GC, you have a leak. If Metaspace grows continuously, you have a ClassLoader leak. If the process's RSS grows but heap is stable, you have a native memory leak (direct buffers, JNI).

What's the difference between minor GC, major GC, and full GC?

Minor GC = Young Gen only (Eden + Survivor). Fast (1-10ms). Major GC = Old Gen only (rare in G1). Full GC = entire heap + metaspace. Slow (100ms to seconds). Your goal: eliminate full GCs entirely. If you see them in logs, something is wrong.

Can I rely on x86's strong memory model?

No. Never. Code that works on x86 may fail on ARM (AWS Graviton, Apple Silicon, Android). The JMM guarantees are the minimum you can rely on across all platforms. If you don't establish a happens-before edge, you have a bug — even if it doesn't crash on your laptop.

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← Previous: Garbage Collection in Java | Next: JVM Memory Issues in Production: Debugging Guide (OOM, GC, Leaks) →
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged