Advanced 13 min · March 05, 2026

JVM Memory Model — OOMKilled by Non-Heap Overhead

Setting -Xmx4g in a 4 GB container leaves zero headroom; roughly 490 MB of non-heap overhead then triggers an OOMKill.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Heap: Shared memory for all objects, managed by the garbage collector. Divided into young generation (eden + survivors) and old generation.
  • Stack: Per-thread memory holding local variables and method frames. Freed automatically on method return — not GC-managed.
  • Metaspace: Stores class metadata outside the heap. Unbounded by default — always set -XX:MaxMetaspaceSize in production to prevent runaway growth.
  • GC pauses: Stop-the-world events during which all application threads halt. G1 is the default collector (pauses typically 50–200 ms; the default target is 200 ms). Use ZGC for sub-10 ms pause requirements.
  • Happens-before: The JMM guarantee that memory writes in one thread are visible to another. Established by volatile, synchronized, and Lock — without it, changes may never be seen.
Plain-English First

Imagine your Java program is a busy restaurant kitchen. The heap is the giant walk-in fridge where all the ingredients (objects) are stored — anyone on the team can grab from it. Each chef (thread) has their own small personal workbench (stack) for chopping and prep — nobody else touches it. The maitre d' (garbage collector) periodically walks the fridge and tosses anything nobody is using anymore. The JVM Memory Model is simply the blueprint that describes exactly how that kitchen is laid out, who can access what, and the rules for keeping orders from getting mixed up.

Many Java performance crises, mysterious 3 AM production failures, and subtle data-race bugs trace back to the same root cause: the developer didn't have a clear mental model of how the JVM manages memory. It's not an academic concern. OutOfMemoryErrors, thread-visibility bugs, and stop-the-world GC pauses are day-one realities on any high-traffic service. Yet most Java developers can describe the API of a HashMap far better than they can explain why two threads can see different values for the same variable even when the code looks correct.

The term "JVM Memory Model" covers two distinct but interrelated problems. First, the physical layout of memory: where objects live, how long they live, and how the garbage collector reclaims them. Second, the visibility and ordering guarantees between threads: the rules that determine whether a write made by Thread A is actually observable by Thread B. Strictly speaking, only the second is the Java Memory Model (JMM). Mixing up these two concerns is the source of enormous confusion. The JMM specification (JSR-133, part of the Java Language Specification since Java 5) is one of the most carefully engineered pieces of the Java platform, and understanding it separates senior engineers from the rest.

I've debugged JVM memory issues across payment processing systems handling 50,000 TPS, recommendation engines running 60 GB heaps, and microservices dying silently from metaspace exhaustion after hot-deploy cycles. The patterns are always the same: developers who understand the memory layout fix problems in minutes; developers who don't spend days chasing phantom bugs.

By the end of this article you'll be able to walk through a running JVM and name exactly what lives where and why. You'll understand the happens-before relationship well enough to reason about data races without guessing. You'll know how to tune GC regions for low-latency workloads, avoid the common memory-layout mistakes that cause silent correctness bugs, and answer the JMM interview questions that trip up even experienced engineers.

> ⚠️ Terminology note: This guide covers two distinct concepts that share confusingly similar names. JVM Memory (heap, stack, metaspace, GC) is the runtime memory structure — where objects live and how they're reclaimed. Java Memory Model (JMM) (happens-before, volatile, synchronized) is the thread visibility specification — the rules that determine when one thread's writes are observable by another. Both are covered here because they're deeply interrelated in production debugging.

What Is the JVM Memory Model?

The JVM Memory Model defines two things that engineers constantly conflate:

  1. The memory layout — how the JVM divides process memory into regions (heap, stack, metaspace, etc.), what lives in each region, and when memory is reclaimed.
  2. The visibility model — the happens-before rules that determine when a write by one thread is guaranteed to be visible to another thread. This is what volatile, synchronized, java.util.concurrent, and final fields are built on.
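The following minimal sketch makes the visibility model concrete. Class and method names are illustrative, and Thread.onSpinWait() assumes Java 9+. The volatile write to ready happens-before the reader's volatile read that sees true, so the reader is guaranteed to see payload = 42. If ready were a plain field, the JMM would give no such guarantee, and the spin loop might never observe the update.

```java
// Sketch: safe publication through a volatile flag (illustrative names).
final class VolatilePublication {
    private int payload;              // plain field, published via 'ready'
    private volatile boolean ready;   // volatile field: its write/read pair creates happens-before

    void publish(int value) {
        payload = value;   // 1. ordinary write
        ready = true;      // 2. volatile write: earlier writes become visible to readers of 'ready'
    }

    int awaitPayload() {
        while (!ready) {           // volatile read, re-checked on every iteration
            Thread.onSpinWait();   // hint that this is a busy-wait loop (Java 9+)
        }
        return payload;            // guaranteed to see the value written before ready = true
    }

    static int runOnce() {
        VolatilePublication box = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();
        box.publish(42);
        try {
            reader.join();         // join() also makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("Reader saw: " + runOnce());
    }
}
```

Running main prints Reader saw: 42.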

Every OutOfMemoryError you've ever seen is a failure of the first part. Every 'works on my machine but not in production' concurrency bug is a failure of the second part. They're different problems requiring different tools, and confusing them is the single most common mistake I see in JMM discussions.

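The JVM specification defines six run-time data areas. The pc register, JVM stacks, and native method stacks exist once per thread. The heap, the method area, and the run-time constant pool are shared. HotSpot implements the method area and its constant pools in Metaspace (since Java 8), so this guide works with five practical areas: heap, stack, PC register, native method stack, and metaspace. The heap is shared across all threads. The stack, PC register, and native method stack are private to each thread, so they need no synchronization. Metaspace is shared but rarely mutated after class loading.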
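Here is a small sketch of the shared versus per-thread split. The names are illustrative, and the record requires Java 16+. Each worker's local counter lives in its own stack frame, so no thread can disturb another's. The AtomicInteger, by contrast, is one heap object that every thread updates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: per-thread locals vs one shared heap object (illustrative names).
final class StackVsHeapSharing {

    record Result(int[] perThreadLocals, int sharedTotal) {}

    static Result run(int threads, int incrementsPerThread) {
        AtomicInteger shared = new AtomicInteger();   // one object on the shared heap
        int[] locals = new int[threads];              // heap array used only to report results
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                int local = 0;                        // lives in this thread's own frame
                for (int i = 0; i < incrementsPerThread; i++) {
                    local++;                          // invisible to every other thread
                    shared.incrementAndGet();         // atomic update seen by all threads
                }
                locals[id] = local;
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();                        // join() makes each worker's writes visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new Result(locals, shared.get());
    }

    public static void main(String[] args) {
        Result r = run(4, 1_000);
        System.out.println("Per-thread locals: " + Arrays.toString(r.perThreadLocals()));
        System.out.println("Shared counter:    " + r.sharedTotal());
    }
}
```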

⚠ When NOT to rely on this section's concepts: If you're debugging a production incident right now, skip to the Quick Memory Area Decision Guide below. This section builds the foundation; the guide solves problems.
MemoryLayoutDemo.java
```java
// io.thecodeforge.jvm.memory.MemoryLayoutDemo
// Prints what the running JVM reports for each memory area, via java.lang.management.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

public class MemoryLayoutDemo {

    // Illustrative fields (not read below): each MemoryLayoutDemo object stores its
    // instance fields on the heap; the class's field and method definitions are
    // class metadata kept in Metaspace.
    private String applicationName = "TheCodeForge";
    private static final int MAX_CONNECTIONS = 1024;

    private static final double MB = 1048576.0;

    // MemoryUsage.getMax() returns -1 when no limit is defined.
    private static String describeMax(long max) {
        return max == -1 ? "undefined (-1)" : String.format("%,d bytes (%.1f MB)", max, max / MB);
    }

    public static void main(String[] args) {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();

        System.out.println("=== JVM MEMORY LAYOUT ===");
        System.out.println();

        MemoryUsage heap = memoryBean.getHeapMemoryUsage();
        System.out.println("HEAP (shared across all threads):");
        System.out.printf("  Init:      %,d bytes (%.1f MB)%n", heap.getInit(), heap.getInit() / MB);
        System.out.printf("  Used:      %,d bytes (%.1f MB)%n", heap.getUsed(), heap.getUsed() / MB);
        System.out.printf("  Committed: %,d bytes (%.1f MB)%n", heap.getCommitted(), heap.getCommitted() / MB);
        System.out.printf("  Max:       %s%n", describeMax(heap.getMax()));
        System.out.println("  Contains: all objects, arrays, string pool contents");
        System.out.println();

        MemoryUsage nonHeap = memoryBean.getNonHeapMemoryUsage();
        System.out.println("NON-HEAP (Metaspace, compressed class space, code cache):");
        System.out.printf("  Init:      %,d bytes (%.1f MB)%n", nonHeap.getInit(), nonHeap.getInit() / MB);
        System.out.printf("  Used:      %,d bytes (%.1f MB)%n", nonHeap.getUsed(), nonHeap.getUsed() / MB);
        System.out.printf("  Max:       %s%n", describeMax(nonHeap.getMax()));
        System.out.println("  Contains: class metadata, method bytecode, JIT code cache");
        System.out.println();

        System.out.println("MEMORY POOLS (heap regions + non-heap regions):");
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            System.out.printf("  %-30s  used: %6.1f MB  max: %s  type: %s%n",
                pool.getName(),
                usage.getUsed() / MB,
                usage.getMax() == -1 ? "undefined" : String.format("%.1f MB", usage.getMax() / MB),
                pool.getType());
        }
        System.out.println();

        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        System.out.println("STACK (per-thread, each thread has its own):");
        System.out.printf("  Live threads:   %d%n", threadBean.getThreadCount());
        System.out.printf("  Current thread: %s%n", Thread.currentThread().getName());
        System.out.println("  Contains: local variables, method parameters, return addresses");
        System.out.println("  Each stack frame = one method call on the call stack");
        System.out.println();

        System.out.println("PROGRAM COUNTER REGISTER (per-thread):");
        System.out.println("  Holds the address of the JVM instruction currently executing");
        System.out.println("  For native methods: undefined (native code manages its own PC)");
        System.out.println();

        System.out.println("=== SUMMARY ===");
        System.out.println("  HEAP:       shared, objects/arrays, garbage collected");
        System.out.println("  STACK:      per-thread, local variables, auto-managed");
        System.out.println("  METASPACE:  shared, class metadata, grows until MaxMetaspaceSize");
        System.out.println("  PC REG:     per-thread, current instruction pointer");
        System.out.println("  NATIVE:     per-thread, for JNI/native method calls");
    }
}
```
JVM as an operating system
  • Heap = the JVM's RAM — all objects live here, shared across threads
  • Stack = per-thread workspace — each thread has its own, no sharing needed
  • Metaspace = blueprint storage: class definitions, loaded when each class is first used
Production Insight
In a high-traffic payment service I ran, we hit metaspace exhaustion after 200 hot deploys in one day. The root cause was a library that created new classloaders on every request. We fixed it by pinning the classloader and setting -XX:MaxMetaspaceSize=512m. Lesson: class metadata is not free — monitor it aggressively in long-running containers.
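One way to monitor Metaspace from inside the application is sketched below with the standard java.lang.management API. The pool name "Metaspace" is what HotSpot reports; other JVMs may differ. The 256 MB alert level is a hypothetical value to adjust for your service.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

// Sketch: read Metaspace usage and arm a usage threshold (HotSpot pool name; alert level is hypothetical).
final class MetaspaceMonitor {

    static final long ALERT_BYTES = 256L * 1024 * 1024;   // hypothetical alert level: 256 MB

    static Optional<MemoryPoolMXBean> metaspacePool() {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(pool -> pool.getName().equals("Metaspace"))
                .findFirst();
    }

    public static void main(String[] args) {
        MemoryPoolMXBean pool = metaspacePool()
                .orElseThrow(() -> new IllegalStateException("no 'Metaspace' pool on this JVM"));
        MemoryUsage usage = pool.getUsage();
        long max = usage.getMax();                          // -1 when MaxMetaspaceSize is not set
        System.out.printf("Metaspace used: %.1f MB, max: %s%n",
                usage.getUsed() / 1048576.0,
                max == -1 ? "unbounded" : String.format("%.1f MB", max / 1048576.0));

        if (pool.isUsageThresholdSupported()) {
            // setUsageThreshold rejects values above a defined max, so clamp to it.
            long threshold = max == -1 ? ALERT_BYTES : Math.min(ALERT_BYTES, max);
            pool.setUsageThreshold(threshold);
            System.out.println("Above alert level? " + pool.isUsageThresholdExceeded());
        }
    }
}
```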
Key Takeaway
Memory layout = where things live
Visibility = when threads see each other's writes
→ Don't mix them up
Quick Memory Area Decision Guide
  • If you see OutOfMemoryError: Java heap space → focus on the heap (objects); take a heap dump.
  • If you see OOMKilled with no Java exception → focus on non-heap memory (metaspace, direct buffers, thread stacks); check native memory.
  • If threads see inconsistent values → focus on happens-before; add volatile or synchronized.
[Figure: JVM Memory Architecture (Java 21–25). Shared by all threads: Heap (GC-managed; Young Generation with Eden and Survivor S0/S1, Old Generation; G1 or ZGC) and Metaspace (native memory; -XX:MaxMetaspaceSize). Per-thread: JVM Stack (virtual threads run on carrier threads), PC Register, Native Stack. JIT: Code Cache (-XX:ReservedCodeCacheSize).]

PC Register and Native Method Stack: The Overlooked Per‑Thread Memory Regions

While heap and stack get all the attention, two smaller per-thread regions play a critical role in execution: the Program Counter (PC) register and the Native Method Stack.

Program Counter (PC) Register
  • Each thread has its own PC register. While the thread executes a Java method, it holds the address of the JVM instruction currently being executed (effectively its offset in the method's bytecode).
  • For native methods (methods marked native, implemented in C/C++), the PC value is undefined; the native code manages its own program counter.
  • The PC register is tiny and never causes memory errors directly. It does help when reading crash logs: an hs_err_pid*.log file reports the faulting instruction as pc=0x....

Native Method Stack
  • Also per-thread, the native method stack supports calls to native methods via the Java Native Interface (JNI).
  • Native frames are ordinary C/C++ call frames (return addresses, saved registers, C local variables), not JVM frames with an operand stack.
  • The JVM specification lets implementations size it separately, but HotSpot runs Java and native frames on the same OS thread stack. For JVM-created threads its size therefore follows -Xss (typically 1 MB on 64-bit platforms).
  • If native code recurses deeply or allocates large local arrays, it can overflow that stack. Native code doesn't get a clean StackOverflowError. The JVM typically crashes instead, and the error report can be confusing.

Why These Regions Matter in Production
  • Crash logs (hs_err_pid*.log) show the crashing thread's PC as pc=0x..., which pinpoints where native or JIT-compiled code failed. Ordinary jstack thread dumps show method frames and line numbers instead, which is usually enough to find where a thread is stuck (infinite loop, blocking I/O).
  • Native method stack exhaustion is rare but can happen with JNI-intensive libraries. Symptoms: the process freezes or crashes with no heap dump. Diagnose with the hs_err crash log, -Xcheck:jni, and Native Memory Tracking (-XX:NativeMemoryTracking=summary) to see how much thread stacks reserve.
  • Virtual threads (Java 21+) run on their carrier thread's OS stack while mounted and keep their own Java frames, which are copied to the heap when they unmount. A native frame on the stack pins the virtual thread to its carrier, a subtle detail that matters when debugging virtual thread pinning.

Debugging with PC Register
In an hs_err crash log, look for pc=0x... in the signal line (for example, SIGSEGV ... at pc=0x...). To map addresses in JIT-compiled code to assembly, run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (this needs the hsdis disassembler plugin). For most developers the stack frame listing is more useful than the raw PC value, but the PC is essential for JVM developers and profiler tooling.
Production Insight
I once debugged a JNI crash in a video encoding library where recursive C calls grew the native stack beyond the thread's stack size. Instead of throwing a clean Java exception, the JVM crashed with SIGSEGV. We fixed it by reducing recursion depth in the native code and adding a guard. Lesson: when a crash happens inside a native method, suspect the native method stack. The JVM won't tell you it's out of space.
Key Takeaway
PC Register and Native Method Stack are per‑thread, small, and rarely configurable — but understanding them helps debug JNI crashes and interpret thread dumps.

JVM Memory Regions: Visual Overview

The JVM divides its process memory into five primary regions, each with a distinct role. The list below groups them into shared regions (heap, metaspace) and per-thread regions (stack, PC register, native method stack).

Heap (Shared)
  • All objects, arrays, and the string pool live here.
  • The garbage collector reclaims unreachable objects.
  • Tuned via -Xms, -Xmx, and GC algorithm flags.

Metaspace (Shared, since Java 8)
  • Holds class metadata (bytecode, method tables, field layouts).
  • Unbounded by default: always set -XX:MaxMetaspaceSize to prevent runaway growth from classloader leaks.
  • Replaced PermGen; uses native memory, not the heap.

Java Stack (Per-Thread)
  • Stores method call frames (local variables, operand stack, return address).
  • Size controlled by -Xss (default ~1 MB on most platforms).
  • StackOverflowError occurs when stack depth exceeds the limit (usually a recursion bug).

PC Register (Per-Thread)
  • Holds the address of the JVM instruction currently being executed.
  • For native methods, the value is undefined.
  • Tiny memory footprint; never a source of OOM.

Native Method Stack (Per-Thread)
  • Supports JNI calls; each native method call pushes a native frame.
  • In HotSpot it shares the thread's OS stack, so its size follows -Xss; there is no separate flag.
  • Exhaustion usually crashes the JVM (often with SIGSEGV) instead of raising a Java exception.

This five-region layout is the foundation for all JVM memory management. Every production memory issue maps to one or more of these regions: high heap usage → GC tuning, metaspace growth → classloader leak, stack overflow → recursion, native crash → JNI issue.

Visualizing memory regions in production
Use jcmd <pid> VM.native_memory summary to see a breakdown of all JVM memory regions. Enable with -XX:NativeMemoryTracking=summary. This is the closest you can get to a live diagram of your JVM's memory layout.
Production Insight
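When setting container memory limits, remember that all of these regions together must fit inside the container; the heap is only one slice. Consider a 4 GB container with -Xmx3584m (3.5 GB; -Xmx does not accept fractional values such as 3.5g) and 200 threads. It leaves only ~512 MB for everything else: up to 200 MB of thread stacks at the default 1 MB -Xss, plus metaspace, code cache, and other native allocations. That can be tight with a large codebase or heavy JNI usage.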
Key Takeaway
The JVM divides memory into five regions: two shared (heap, metaspace) and three per‑thread (stack, PC, native stack) — map every production issue to the right region.
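The container arithmetic above can be sketched as a worst-case budget. ContainerBudget is a hypothetical helper. The Metaspace and code cache figures in main are example values; 240 MB is HotSpot's usual ReservedCodeCacheSize default with tiered compilation. Thread stacks and the code cache are reserved up front but committed lazily, so real usage is usually lower than this worst case.

```java
// Sketch: worst-case memory budget for a JVM in a container (hypothetical helper; sizes in MB).
final class ContainerBudget {

    /** Memory left once the max heap and all thread stacks are reserved. */
    static long headroomAfterHeapAndStacksMb(long containerMb, long maxHeapMb, int threads, long stackMb) {
        long stacks = threads * stackMb;        // -Xss x thread count (reserved; committed lazily)
        return containerMb - maxHeapMb - stacks;
    }

    public static void main(String[] args) {
        long container = 4096;                  // 4 GB container limit
        long heap = 3584;                       // -Xmx3584m (3.5 GB)
        long metaspace = 256;                   // example -XX:MaxMetaspaceSize=256m
        long codeCache = 240;                   // typical ReservedCodeCacheSize default with tiered JIT

        long afterStacks = headroomAfterHeapAndStacksMb(container, heap, 200, 1);
        System.out.println("Left after heap:                  " + (container - heap) + " MB");
        System.out.println("Left after heap + 200 stacks:     " + afterStacks + " MB");
        System.out.println("Left for GC data, direct buffers: " + (afterStacks - metaspace - codeCache) + " MB");
    }
}
```

With these numbers the budget goes negative. After the heap, 512 MB remains, and after the stacks, 312 MB. Once Metaspace and the code cache are counted, the result is -184 MB, exactly the kind of overcommit that ends in an OOMKill.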

Stack vs Heap: Side‑by‑Side Comparison

The stack and heap are the two most important memory regions developers interact with daily. Below is a direct comparison of their key characteristics:

| Property | Stack | Heap |
|---|---|---|
| Access speed | Very fast (direct memory access, no GC) | Slower (allocation + GC overhead) |
| Thread safety | Naturally thread-safe (per-thread) | Not thread-safe (shared; needs synchronization) |
| Size | Small, fixed per thread (default 1 MB) | Large, configurable (GBs) |
| Overflow error | StackOverflowError (deep recursion) | OutOfMemoryError: Java heap space |
| Stores | Local variables, method parameters, return addresses | Objects, arrays, string pool |
| Visibility | Only the owning thread can access | All threads can access (via references) |
| Lifetime | Until the method returns (freed automatically) | Until unreachable (garbage collected) |
| Memory management | Automatic on method exit (frame popped) | Garbage collector (mark-sweep or copying) |

Key Takeaways
  • Stack: fast, small, private. It holds primitives and object references.
  • Heap: slower, large, shared. It holds objects that outlive the method or are accessed by multiple threads.
  • Common misconception: a large array or collection held in a local variable does not enlarge the stack frame. The object lives on the heap, and only its reference sits in the frame. In a deep recursive method, it's the number of frames, not the size of the objects they reference, that causes StackOverflowError.

When to Worry About Stack Size
  • Recursive algorithms (DFS, tree traversal): increase -Xss or convert to iteration.
  • Deep call chains in enterprise frameworks (e.g., Spring AOP, many filters).
  • Virtual threads (Java 21+): they don't reserve a fixed OS stack per thread, but while mounted they run on a carrier thread's stack, and deep recursion in a virtual thread can still throw StackOverflowError.
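The sketch below converts recursion to iteration on a deliberately degenerate tree: a chain one million nodes deep. All names are illustrative. The recursive version needs one stack frame per level. The iterative version keeps pending nodes in a heap-allocated ArrayDeque, so its depth is limited only by heap size.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: recursive vs iterative traversal of a very deep tree (illustrative names).
final class DeepTraversal {

    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    /** Builds a "tree" that is really a chain of the given depth (worst case for recursion). */
    static Node chain(int depth) {
        Node head = null;
        for (int i = 0; i < depth; i++) {
            head = new Node(head, null);
        }
        return head;
    }

    /** Recursive count: one stack frame per level, so very deep chains overflow. */
    static int countRecursive(Node node) {
        if (node == null) return 0;
        return 1 + countRecursive(node.left) + countRecursive(node.right);
    }

    /** Iterative count: pending nodes live in a heap-allocated Deque, not on the call stack. */
    static int countIterative(Node root) {
        Deque<Node> pending = new ArrayDeque<>();
        if (root != null) pending.push(root);
        int count = 0;
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            count++;
            if (node.left != null) pending.push(node.left);
            if (node.right != null) pending.push(node.right);
        }
        return count;
    }

    public static void main(String[] args) {
        Node deep = chain(1_000_000);
        System.out.println("Iterative count: " + countIterative(deep));
        try {
            System.out.println("Recursive count: " + countRecursive(deep));
        } catch (StackOverflowError e) {
            System.out.println("Recursive count: StackOverflowError");
        }
    }
}
```

With the default 1 MB stack, the recursive call will very likely overflow, while the iterative count completes.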

StackVsHeapDemo.java
```java
// io.thecodeforge.jvm.memory.StackVsHeapDemo
// Demonstrates stack overflow vs heap OOM.
// Run with a small heap so the OOM test finishes quickly, e.g.: java -Xmx64m StackVsHeapDemo

import java.util.ArrayList;
import java.util.List;

public class StackVsHeapDemo {

    private static int recursionDepth = 0;

    // Stack overflow: recursive call without a base case; every call pushes a new frame.
    public static void recursiveMethod() {
        recursionDepth++;
        recursiveMethod();
    }

    // Heap OOM: keep allocating reachable 1 MB arrays until the heap is exhausted.
    public static void allocateUntilOOM() {
        List<byte[]> list = new ArrayList<>();
        try {
            while (true) {
                list.add(new byte[1024 * 1024]); // 1 MB each, kept reachable via the list
            }
        } catch (OutOfMemoryError e) {
            int allocatedMb = list.size();
            list.clear(); // drop the references first so printing below has heap to work with
            System.out.println("Heap OOM after allocating " + allocatedMb + " MB");
        }
    }

    public static void main(String[] args) {
        System.out.println("=== STACK VS HEAP DEMONSTRATION ===\n");

        // Part 1: Stack overflow
        System.out.println("--- Stack overflow test ---");
        try {
            recursiveMethod();
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError after " + recursionDepth + " recursive calls");
        }
        System.out.println();

        // Part 2: Heap OOM
        System.out.println("--- Heap OOM test ---");
        allocateUntilOOM();

        System.out.println("\n=== CONCLUSION ===");
        System.out.println("Stack: small, per-thread, fast, auto-managed.");
        System.out.println("Heap: large, shared, slower, GC-managed.");
        System.out.println("Use stack for short-lived locals; heap for objects that live beyond method scope.");
    }
}
```
Stack is a scratchpad, heap is a warehouse
  • Stack: locals, method params, return addresses — gone when method returns
  • Heap: objects, arrays — live until GC decides they're unreachable
  • Both: an object reference lives on the stack; the object itself lives on the heap
Production Insight
In a trading system with deep AOP call chains (security, logging, transaction, caching), we hit StackOverflowError under high load because each filter added 5–10 frames. The default stack was 1 MB, and simply doubling -Xss would have reserved another 500 MB across 500 threads. We fixed it by reducing the number of AOP advisors and refactoring recursion out of the hot path. Lesson: stack overflow is not always a recursion bug; sometimes it's framework overhead.
Key Takeaway
Stack is fast but small (per‑thread); Heap is slower but large (shared). Choose storage based on object lifecycle and size.

Heap Memory — Young Generation, Old Generation, and How Objects Age

The heap is where all Java objects live. It's shared across all threads, and it's where garbage collection operates.

📊 Heap flow: Eden (new objects) → Survivor (aged objects) → Old Gen (long-lived). A fast minor GC cleans Eden, minor GCs also copy Survivor objects, and a slow full GC cleans Old Gen.

Young Generation (New Space): where new objects are allocated.
  • Eden: all new objects start here. When Eden fills up, a minor GC runs.
  • Survivor Space 0 (S0) and Survivor Space 1 (S1): two equal-sized spaces. Objects that survive a minor GC are copied between them, aging by one each time.
  • Promotion: when an object's age reaches the tenuring threshold (at most -XX:MaxTenuringThreshold, default 15), it moves to the Old Generation. Objects can also be promoted early if the Survivor spaces overflow.
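Here is a toy model of the aging rule in plain Java; it is not HotSpot's collector. Each minor GC discards unreachable young objects, ages the survivors by one, and promotes an object once it has survived TENURING_THRESHOLD collections. The real JVM also adapts the threshold at runtime and promotes early when the Survivor spaces overflow, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of age-based promotion (not HotSpot's implementation).
final class TenuringToyModel {

    static final int TENURING_THRESHOLD = 15;   // same value as the default MaxTenuringThreshold

    static final class Obj {
        int age;                                // minor GCs survived so far
        boolean reachable = true;
    }

    final List<Obj> young = new ArrayList<>();  // Eden + Survivor spaces, lumped together
    final List<Obj> old = new ArrayList<>();    // Old Generation

    /** One minor GC: reclaim unreachable young objects, age survivors, promote old-enough ones. */
    void minorGc() {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!o.reachable) continue;         // garbage: never copied, memory reclaimed
            o.age++;                            // copied to a Survivor space, one GC older
            if (o.age >= TENURING_THRESHOLD) {
                old.add(o);                     // promoted to Old Gen
            } else {
                survivors.add(o);
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    /** Allocates one long-lived and one short-lived object, then runs the given number of minor GCs. */
    static TenuringToyModel scenario(int minorGcs) {
        TenuringToyModel heap = new TenuringToyModel();
        Obj cache = new Obj();                  // stays reachable (long-lived)
        Obj request = new Obj();                // becomes garbage right away (short-lived)
        heap.young.add(cache);
        heap.young.add(request);
        request.reachable = false;
        for (int i = 0; i < minorGcs; i++) {
            heap.minorGc();
        }
        return heap;
    }

    public static void main(String[] args) {
        for (int gcs : new int[] {1, 5, TENURING_THRESHOLD}) {
            TenuringToyModel heap = scenario(gcs);
            System.out.printf("after %2d minor GCs: young=%d old=%d%n", gcs, heap.young.size(), heap.old.size());
        }
    }
}
```

After TENURING_THRESHOLD minor GCs, the long-lived object sits in Old Gen and the short-lived one has been reclaimed.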

Old Generation (Tenured Space): Long-lived objects. When Old Gen fills up, a major GC (or full GC) runs — expensive, often stop-the-world.

The generational hypothesis: most objects die young (figures of 90–98% are commonly cited). Minor GCs are typically fast (1–10 ms); full GCs are slow (100 ms to seconds).

⚠ When NOT to tune generational heap sizes
  • Ultra-low latency systems (<1ms pauses): G1's generational model still causes stop-the-world. Use ZGC instead (-XX:+UseZGC).
  • Heaps > 64 GB: G1's region management overhead grows. Consider ZGC or Shenandoah.
  • Short-lived batch jobs: GC tuning won't help if the JVM exits in seconds. Focus on allocation rate.
HeapStructureDemo.java
```java
// io.thecodeforge.jvm.memory.HeapStructureDemo
// Shows how heap pool usage changes as short- and long-lived objects are allocated.
// Note: System.gc() is only a request (ignored with -XX:+DisableExplicitGC). With G1 it
// typically triggers a full collection that compacts live objects into Old Gen regions,
// so Phase 3 shows promotion coarsely rather than one minor GC at a time.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapStructureDemo {

    static class OrderEvent {
        private final String orderId;
        private final String customerId;
        private final double amount;
        private final long timestamp;
        private final byte[] payload;

        OrderEvent(String orderId, String customerId, double amount) {
            this.orderId = orderId;
            this.customerId = customerId;
            this.amount = amount;
            this.timestamp = System.currentTimeMillis();
            this.payload = new byte[256]; // extra bytes so each event has a noticeable heap footprint
        }
    }

    public static void main(String[] args) {
        System.out.println("=== HEAP GENERATION TRACKING ===");
        System.out.println();

        printMemoryPools("BEFORE allocation");

        System.out.println("\nPhase 1: Allocating 100,000 short-lived objects...");
        for (int batch = 0; batch < 10; batch++) {
            List<OrderEvent> shortLived = new ArrayList<>(); // unreachable after each iteration
            for (int i = 0; i < 10_000; i++) {
                shortLived.add(new OrderEvent(
                    "ORD-" + batch + "-" + i,
                    "CUST-" + (i % 1000),
                    Math.random() * 500
                ));
            }
        }
        printMemoryPools("AFTER short-lived allocation");

        System.out.println("\nRequesting GC...");
        System.gc();
        printMemoryPools("AFTER GC: Eden should be nearly empty");

        System.out.println("\nPhase 2: Allocating 50,000 long-lived objects...");
        List<OrderEvent> longLived = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            longLived.add(new OrderEvent(
                "LONG-" + i,
                "CUST-PERM-" + (i % 100),
                Math.random() * 1000
            ));
        }
        printMemoryPools("AFTER long-lived allocation");

        System.out.println("\nPhase 3: Multiple GCs to promote survivors to Old Gen...");
        for (int i = 0; i < 5; i++) {
            System.gc();
            System.out.println("  GC cycle " + (i + 1) + " complete");
        }
        printMemoryPools("AFTER promotion cycles: Old Gen should have grown");

        // Touch the list after the GCs so it is clearly still reachable during Phase 3.
        System.out.println("\nLong-lived objects still referenced: " + longLived.size());
    }

    // Prints used/committed sizes for every heap pool (Eden, Survivor, Old Gen names vary by collector).
    static void printMemoryPools(String label) {
        System.out.println("\n  " + label + ":");
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage usage = pool.getUsage();
                System.out.printf("    %-30s  used: %6.1f MB  committed: %8.1f MB%n",
                    pool.getName(),
                    usage.getUsed() / 1048576.0,
                    usage.getCommitted() / 1048576.0);
            }
        }
    }
}
```
Heap as a conveyor belt with a warehouse
  • New objects land on the belt (Eden) — most die here instantly
  • Survivors move to a holding area (Survivor spaces), aging each pass
  • Long-lived objects graduate to the warehouse (Old Gen)
  • Cleaning the belt = fast (minor GC). Cleaning the warehouse = slow (full GC)
Production Insight
Survivor spaces too small → premature promotion → Old Gen fills → full GC spike
Survivor spaces too large → wasted heap → lower allocation efficiency
→ Monitor with -Xlog:gc+age=trace (JDK 9+; -XX:+PrintTenuringDistribution on JDK 8), target 70-80% survival rate
Key Takeaway
Short-lived objects = cheap (die in Eden, minor GC)
Long-lived objects = expensive (Old Gen, full GC)
→ Reduce allocation rate, not heap size
Heap Tuning Decision Tree
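  • If high allocation rate + many short-lived objects → enlarge the young generation (for example with -Xmn, or -XX:NewRatio=1 to make young equal to old; the default NewRatio=2 already gives young one third of the heap). With G1, fixing the young size this way overrides its pause-time goal.
  • If frequent full GCs with low Old Gen usage → increase Survivor size or MaxTenuringThreshold.
  • If ultra-low latency is required → switch to ZGC and stop tuning the generational heap.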

Garbage Collection — How the JVM Reclaims Memory

The garbage collector automatically reclaims memory occupied by objects that are no longer reachable from any GC root.

GC root types: local variables on thread stacks, static fields, active threads, JNI references, and monitors (objects held for synchronization).
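Reachability is a graph problem. This toy mark phase uses illustrative object names and is not a JVM API. It starts from the objects referenced directly by GC roots and follows references; anything it never reaches is garbage. That includes the orphanA/orphanB cycle, which shows why reference cycles alone don't keep objects alive under a tracing collector.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy mark phase: objects are strings, references are a map (illustrative, not a JVM API).
final class ReachabilityToy {

    /** Marks every object reachable from the root set by following references breadth-first. */
    static Set<String> mark(Set<String> rootReferents, Map<String, List<String>> references) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(rootReferents);
        while (!work.isEmpty()) {
            String obj = work.poll();
            if (!marked.add(obj)) continue;           // already marked (handles cycles)
            work.addAll(references.getOrDefault(obj, List.of()));
        }
        return marked;
    }

    /** Example heap: two objects held by roots, plus an unreferenced cycle. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("cacheMap", List.of("config"));      // cacheMap is held by a static field
        refs.put("order", List.of("customer"));       // order is held by a local variable
        refs.put("orphanA", List.of("orphanB"));      // a cycle that no root points to...
        refs.put("orphanB", List.of("orphanA"));      // ...is still garbage
        return refs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = exampleGraph();
        Set<String> live = mark(Set.of("cacheMap", "order"), refs);
        Set<String> garbage = new TreeSet<>(refs.keySet());
        garbage.removeAll(live);
        System.out.println("Live:    " + new TreeSet<>(live));
        System.out.println("Garbage: " + garbage);
    }
}
```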

Major GC algorithms:

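G1 (Garbage First): Default since Java 9. Divides the heap into equal-sized regions (typically 1–32 MB, a power of two chosen from the heap size; override with -XX:G1HeapRegionSize). Collects the regions with the most garbage first. Target pause time: -XX:MaxGCPauseMillis (default 200 ms). Best for: heaps of 4–64 GB with moderate latency requirements.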
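G1 picks its region size from the heap size. The sketch below approximates that ergonomic choice. It assumes the documented target of roughly 2048 regions, with sizes rounded to a power of two between 1 MB and 32 MB. HotSpot's exact calculation may differ between versions, and -XX:G1HeapRegionSize overrides it.

```java
// Approximation of G1's ergonomic region sizing (assumption: ~2048 regions,
// power-of-two size clamped to 1-32 MB). Not HotSpot's actual code.
final class G1RegionSizeEstimate {

    static final long MB = 1024L * 1024;

    static long regionSizeBytes(long heapBytes) {
        long target = heapBytes / 2048;                        // aim for about 2048 regions
        long size = Long.highestOneBit(Math.max(target, 1));   // round down to a power of two
        return Math.min(Math.max(size, MB), 32 * MB);          // clamp to [1 MB, 32 MB]
    }

    public static void main(String[] args) {
        for (long heapGb : new long[] {1, 4, 16, 64, 128}) {
            long heapBytes = heapGb * 1024 * MB;
            long region = regionSizeBytes(heapBytes);
            System.out.printf("%4d GB heap -> %2d MB regions (%d regions)%n",
                    heapGb, region / MB, heapBytes / region);
        }
    }
}
```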

ZGC: Ultra-low latency. Sub-millisecond pauses regardless of heap size (supports heaps up to 16 TB). Uses colored pointers and load barriers. Production-ready since Java 15, generational since Java 21 (the default mode from Java 23). Best for: heaps > 16 GB, sub-ms latency requirements.

Parallel GC: Throughput-optimized. Uses multiple GC threads with stop-the-world pauses, maximizing application time relative to GC time. Best for: batch jobs, ETL, analytics.

⚠ When NOT to use G1
  • Ultra-low latency systems (<1ms pauses): G1 still has stop-the-world phases. Use ZGC.
  • High-throughput batch processing: G1's concurrent overhead reduces throughput. Use Parallel GC.
  • Heaps < 2 GB: G1's region management overhead isn't worth it. Use Serial GC (-XX:+UseSerialGC).

GCTuningDemo.java
```java
// io.thecodeforge.jvm.memory.GCTuningDemo
// Lists the active collectors and prints a GC selection guide.
// Try java -XX:+UseZGC GCTuningDemo or java -XX:+UseParallelGC GCTuningDemo to compare collector names.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GCTuningDemo {

    private static final int LEFT = 32;   // inner width of the left column
    private static final int RIGHT = 30;  // inner width of the right column

    // Prints one table row, padding each cell to its column width.
    private static void row(String left, String right) {
        System.out.printf("│%-" + LEFT + "s│%-" + RIGHT + "s│%n", left, right);
    }

    // Prints a border line such as ┌───┬───┐ using the given corner/junction characters.
    private static void border(char start, char middle, char end) {
        System.out.println(start + "─".repeat(LEFT) + middle + "─".repeat(RIGHT) + end);
    }

    public static void main(String[] args) {
        System.out.println("=== GC INFORMATION ===");
        System.out.println();

        System.out.println("Active Garbage Collectors:");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("  Name: %-30s  Collections: %d  Time: %d ms%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println();

        System.out.println("=== GC SELECTION GUIDE ===");
        System.out.println();
        border('┌', '┬', '┐');
        row("  USE G1 IF:", "  USE ZGC IF:");
        border('├', '┼', '┤');
        row("  • Heap 4-64 GB", "  • Heap > 16 GB");
        row("  • Moderate latency (50-200ms)", "  • Sub-millisecond pauses");
        row("  • Default, no tuning needed", "  • Real-time / trading");
        border('├', '┼', '┤');
        row("  USE PARALLEL GC IF:", "  AVOID G1 IF:");
        row("  • Batch jobs / ETL", "  • Ultra-low latency (<1ms)");
        row("  • Max throughput needed", "  • Heap < 2 GB");
        row("  • GC pauses don't matter", "  • High-throughput batch");
        border('└', '┴', '┘');
    }
}
```
GC as warehouse cleaning strategies
  • G1 — cleans the messiest aisles first (Garbage First). Best for 4–64 GB heaps
  • ZGC — hires a night crew that cleans while you work. Sub-ms pauses, any size heap
  • Parallel GC — brings the whole team in. Max throughput, stop-the-world pauses
Production Insight
G1 default pause target (200ms) is often too relaxed for APIs
→ Default is not production-ready
→ Set 50ms for web services, 20ms for real-time, switch to ZGC below 1ms
→ Always validate with GC logs before tuning flags blind
Key Takeaway
Lower latency = more frequent GC = higher CPU cost
Higher throughput = fewer GCs = longer pauses
→ Pick the trade-off your SLA demands, not the 'best' algorithm
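GC Selection Strategy
  • If heap < 4 GB and latency is not critical → use G1 (the default)
  • If heap is 4–64 GB with moderate latency needs → use G1 with -XX:MaxGCPauseMillis=50
  • If heap > 16 GB and sub-millisecond latency is needed → use ZGC (-XX:+UseZGC; add -XX:+ZGenerational on Java 21–22, where generational mode is opt-in; it is the default from Java 23)
  • If it is a batch job that needs maximum throughput → use Parallel GC (-XX:+UseParallelGC)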
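The selection table can also be read as a small decision function. The GcAdvisor class below is a hypothetical sketch, not a JDK API: it checks batch workloads first, then the sub-millisecond requirement (choosing ZGC whatever the heap size, an assumption of this sketch), then heap size. -XX:+ZGenerational is added only on Java 21 and 22, where generational ZGC is opt-in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: encodes the GC selection table as code. Not a JDK API.
public final class GcAdvisor {

    static List<String> recommend(double heapGb, boolean batchJob,
                                  boolean needSubMillisecondPauses, int javaFeatureVersion) {
        List<String> flags = new ArrayList<>();
        if (batchJob) {
            // Throughput matters more than pause length.
            flags.add("-XX:+UseParallelGC");
        } else if (needSubMillisecondPauses) {
            flags.add("-XX:+UseZGC");
            if (javaFeatureVersion == 21 || javaFeatureVersion == 22) {
                flags.add("-XX:+ZGenerational"); // opt-in on 21-22, default from 23
            }
        } else {
            // G1 is the default on server-class machines; stating it explicitly avoids
            // small containers (e.g. a single CPU) silently getting Serial GC.
            flags.add("-XX:+UseG1GC");
            if (heapGb >= 4) {
                flags.add("-XX:MaxGCPauseMillis=50"); // a goal, not a guarantee
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        int version = Runtime.version().feature();
        System.out.println("8 GB web API:    " + recommend(8, false, false, version));
        System.out.println("32 GB trading:   " + recommend(32, false, true, version));
        System.out.println("16 GB batch ETL: " + recommend(16, true, false, version));
    }
}
```

Putting the batch check first reflects the table's intent: a batch job with a huge heap still wants Parallel GC, not ZGC.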
[Figure: GC Selection Guide: G1 vs ZGC vs Parallel GC (Production 2026). G1 (Garbage First): balanced default, 50–200 ms pauses, 4–64 GB heaps. ZGC (Java 21+): sub-millisecond pauses, even on 16 TB heaps. Parallel GC: maximum throughput, pauses from 100 ms to seconds, for batch, ETL, and analytics.]

JVM Flags Reference: Setting Heap, Stack, Metaspace, and Code Cache

Configuration flags are the first line of defense against memory-related production incidents. Below is a reference table of the five essential JVM memory flags, with their purpose, typical values, and critical notes.

Flag | What it sets | Typical value | Notes
--- | --- | --- | ---
-Xms | Initial heap size | -Xms2g | The JVM commits this much heap at startup. Set it equal to -Xmx to avoid resizing overhead.
-Xmx | Maximum heap size | -Xmx2g (75% of container limit) | Never use 100% of container memory; leave headroom for non-heap memory.
-Xss | Platform thread stack size | -Xss1m (default) | Common mistake: 1000 threads × 1 MB = up to 1 GB of stack space. Virtual-thread stacks live on the heap and grow on demand, so lowering -Xss does not shrink them (it only caps their maximum depth).
-XX:MaxMetaspaceSize | Maximum metaspace size | -XX:MaxMetaspaceSize=512m | Always set this. Unbounded metaspace can silently consume all native memory.
-XX:ReservedCodeCacheSize | Maximum JIT code cache size | -XX:ReservedCodeCacheSize=256m | The code cache fills up with a large codebase or many JIT compilations. Once it is full, performance drops because the JIT can no longer compile new hot code.

Interaction Between Flags
  • -Xms and -Xmx control only the heap. Non-heap regions are additive.
  • Metaspace (-XX:MaxMetaspaceSize) is separate from the heap, so an application can run out of native memory even when its heap is 50% free.
  • The code cache (-XX:ReservedCodeCacheSize) is also native memory and shares the non-heap budget with metaspace.
  • Thread stacks (-Xss) multiply by thread count: 500 threads × 1 MB = 500 MB of native memory.

Container + JVM Memory Budget Calculation
Total process memory ≈ Heap + Metaspace + (Threads × StackSize) + CodeCache + DirectBuffers + GC overhead

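Example for a 4 GB container with 200 threads, default 1 MB stacks, 512 MB metaspace, and 256 MB code cache:
  • Heap: 3 GB (75%)
  • Stacks: 200 MB
  • Metaspace: 512 MB
  • Code cache: 256 MB
  • GC overhead: ~10% of heap = 300 MB
  • Total: ~4.3 GB (before direct buffers) → OOM risk
Solution: reduce -Xmx to 2.5 GB, which brings the total to about 3.7 GB. Lowering the stack size to 512 KB saves only 100 MB (total ≈ 4.2 GB), so it is not enough on its own. This is why 75% is a starting point, not a guarantee.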
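The budget formula is plain arithmetic, so it can be checked before a deployment. The hypothetical MemoryBudget class below reproduces the 4 GB example. It treats every thread stack as fully committed (a worst case) and models GC overhead as 10% of the heap; both are assumptions taken from the example, not measured values.

```java
import java.util.Locale;

// Hypothetical helper (not a JDK API) for the budget formula above.
// Sizes are in MB except stackKb; GC overhead is a fraction of the heap.
public final class MemoryBudget {

    static double estimateTotalMb(double heapMb, int threads, double stackKb,
                                  double metaspaceMb, double codeCacheMb,
                                  double directBuffersMb, double gcOverheadFraction) {
        double stacksMb = threads * stackKb / 1024.0; // worst case: every stack fully committed
        double gcMb = heapMb * gcOverheadFraction;    // native GC bookkeeping, roughly
        return heapMb + metaspaceMb + stacksMb + codeCacheMb + directBuffersMb + gcMb;
    }

    static boolean fits(double totalMb, double containerMb) {
        return totalMb <= containerMb;
    }

    public static void main(String[] args) {
        double containerMb = 4 * 1024;        // 4 GB container limit
        double[] heapsMb = {3 * 1024, 2560};  // -Xmx3g vs -Xmx2560m
        for (double heapMb : heapsMb) {
            double total = estimateTotalMb(heapMb, 200, 1024, 512, 256, 0, 0.10);
            System.out.printf(Locale.ROOT, "heap=%.0f MB -> total=%.1f MB, fits in 4 GB: %b%n",
                    heapMb, total, fits(total, containerMb));
        }
    }
}
```

With these inputs it reports 4347.2 MB (does not fit) for -Xmx3g and 3784.0 MB (fits) for -Xmx2560m, matching the example's conclusion.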

Always set MaxMetaspaceSize in production
Without -XX:MaxMetaspaceSize, the JVM lets metaspace grow until it consumes all available native memory. This is especially dangerous in containers, because the Linux OOM killer terminates the process without a heap dump. Always set a limit based on your application's class metadata footprint (typically 128–512 MB); with a limit in place, runaway growth surfaces as java.lang.OutOfMemoryError: Metaspace, which the JVM can report and diagnose.
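To see which non-heap regions a running JVM actually has, and whether metaspace is capped, the memory-pool MXBeans can be queried. A minimal sketch (the class name is hypothetical): on HotSpot the non-heap pools typically include Metaspace, Compressed Class Space, and one or more code cache pools, and a reported maximum of -1 means no limit is defined, which is what the Metaspace pool usually shows when -XX:MaxMetaspaceSize is not set.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Sketch: describe every non-heap memory pool with its used and maximum size.
// Pool names are JVM-specific; a max of -1 means "no limit defined".
public final class NonHeapPools {

    private static final long MB = 1024 * 1024;

    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue; // skip heap pools (eden, survivor, old gen)
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            String max = usage.getMax() < 0 ? "unbounded (-1)" : (usage.getMax() / MB) + " MB";
            lines.add(String.format("%-36s used=%d MB max=%s",
                    pool.getName(), usage.getUsed() / MB, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Running it once plainly and once with -XX:MaxMetaspaceSize=256m is a quick way to confirm the limit is in effect.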
Production Insight
In a microservice with 200 threads and a large Spring Boot codebase, the default stack size (1 MB) was consuming 200 MB per pod. Reducing it to 512 KB saved 100 MB and prevented OOMKilled pods during traffic spikes. Combined with setting -XX:MaxMetaspaceSize=256m and -XX:ReservedCodeCacheSize=128m, we reduced native memory overhead by 40%.
Key Takeaway
Set -Xmx to 75% of container limit; always set -XX:MaxMetaspaceSize; -Xss default 1 MB may be too high for large thread pools.
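Lowering -Xss changes the default for every platform thread in the JVM. If only one large pool needs smaller stacks, Java 21's Thread.Builder can request a size per thread. This is a sketch: the class name, the 512 KB value, and the pool size are illustrative, and the JVM treats the requested stack size as a hint that a platform may round or ignore.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

// Sketch (Java 21+): a fixed pool whose platform threads request 512 KB stacks,
// instead of lowering -Xss for every thread in the JVM.
public final class SmallStackPool {

    static final long STACK_BYTES = 512 * 1024; // illustrative value

    static ThreadFactory smallStackFactory(String prefix) {
        return Thread.ofPlatform()
                .name(prefix, 0)          // worker-0, worker-1, ...
                .stackSize(STACK_BYTES)   // a hint; the JVM may round or ignore it
                .factory();
    }

    static int runDemo() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4, smallStackFactory("worker-"));
        try {
            Future<Integer> answer = pool.submit(() -> 6 * 7);
            return answer.get();
        } finally {
            pool.shutdown(); // let the non-daemon workers exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("result = " + runDemo());
    }
}
```

The trade-off: a per-pool setting keeps deep-recursion code on other threads safe from StackOverflowError, while a global -Xss512k is simpler but applies everywhere.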

Happens-Before — Thread Visibility and the Rules That Prevent Data Races

This is the second half of the JMM — and the half that causes the most subtle bugs. The memory layout (heap, stack, GC) determines where objects live. The happens-before rules determine when one thread's writes are visible to another thread.

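The core problem: modern CPUs have multiple cores, each with its own caches and store buffers, and both the CPU and the JIT compiler may reorder memory operations or keep a value in a register. Without synchronization, there is NO guarantee that Thread B ever sees Thread A's write.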

The JMM solution — happens-before: A partial ordering of operations. If A happens-before B, then A's writes are visible to B.

Key rules:
  1. Program order: within one thread, every action happens-before every later action in that thread.
  2. Monitor lock: an unlock happens-before every subsequent lock on the same monitor (synchronized).
  3. Volatile variable: a write to a volatile field happens-before every subsequent read of that field.
  4. Thread start: a call to Thread.start() happens-before every action in the started thread.
  5. Thread join: every action in a thread happens-before another thread returns from join() on it.
  6. Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
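Rules 1, 3, and 6 combine into the standard safe-publication pattern: a plain field written before a volatile write is visible to any thread that later reads that volatile and sees the new value. A minimal sketch (class and method names are illustrative); rules 4 and 5 cover the start() and join() calls.

```java
// Sketch: a volatile flag publishes a plain field via program order,
// the volatile rule, and transitivity; start()/join() add the outer edges.
public final class SafePublication {

    private int payload;            // plain, non-volatile field
    private volatile boolean ready; // the happens-before "bridge"

    void publish(int value) {
        payload = value;            // (1) program order: happens-before (2)
        ready = true;               // (2) volatile write
    }

    int awaitPayload() {
        while (!ready) {            // (3) volatile read that eventually sees (2)
            Thread.onSpinWait();
        }
        return payload;             // (4) sees (1) by transitivity
    }

    static int demo() throws InterruptedException {
        SafePublication box = new SafePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();             // start rule
        box.publish(42);
        reader.join();              // join rule: reader's write to seen[0] is visible here
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader saw " + demo()); // always 42
    }
}
```

If ready were not volatile, none of these edges would exist: the reader could spin forever or return a stale payload.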

⚠ When NOT to rely on volatile
  • Compound operations (count++, check-then-act): volatile only provides visibility, not atomicity. Use AtomicInteger or synchronized.
  • Multiple variables needing consistent state: volatile makes each write visible, but it cannot update two variables as one atomic step, so a reader may see one new value and one stale value. Use synchronized or Lock.
  • When you need mutual exclusion: Volatile doesn't block threads. Use synchronized or ReentrantLock.
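The first two fixes can be sketched side by side. AtomicInteger.incrementAndGet() performs count++ as one atomic operation, and a synchronized method keeps a two-field invariant consistent. The Range class is a hypothetical example of such an invariant.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: AtomicInteger replaces volatile count++, and synchronized guards
// an invariant that spans two fields (lower <= upper).
public final class VolatileAlternatives {

    static final AtomicInteger counter = new AtomicInteger();

    // Hypothetical range whose invariant spans two fields.
    static final class Range {
        private int lower = 0, upper = 10;

        synchronized void set(int newLower, int newUpper) {
            if (newLower > newUpper) throw new IllegalArgumentException("lower > upper");
            lower = newLower; // both writes happen under one lock,
            upper = newUpper; // so readers never see a half-updated range
        }

        synchronized int[] get() {
            return new int[] {lower, upper};
        }
    }

    static int countWithAtomic(int threads, int perThread) throws InterruptedException {
        counter.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) counter.incrementAndGet();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("atomic count = " + countWithAtomic(10, 10_000)); // 100000
        Range r = new Range();
        r.set(5, 20);
        int[] snapshot = r.get();
        System.out.println("range = [" + snapshot[0] + ", " + snapshot[1] + "]"); // [5, 20]
    }
}
```

Unlike a volatile counter, the atomic version never loses updates; and because get() takes the same lock as set(), a reader always sees both fields from the same update.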

⚠️ x86 Hides Concurrency Bugs, ARM Exposes Them: x86 has strong memory ordering (TSO). Many data races 'work' on x86 but fail on ARM (Graviton, Apple Silicon), although the JIT can still reorder code on any CPU. If you deploy to ARM, test there. Always establish happens-before edges; never rely on architecture-specific behavior.

HappensBeforeDemo.java
// io.thecodeforge.jvm.memory.HappensBeforeDemo
// Demonstrates visibility, volatile, and data races.

public class HappensBeforeDemo {

    private static boolean running = true;                  // plain field: no happens-before edge
    private static volatile boolean volatileRunning = true; // volatile write -> read edge
    private static volatile int volatileCounter = 0;        // visible, but ++ is still not atomic

    public static void main(String[] args) throws InterruptedException {
        System.out.println("=== HAPPENS-BEFORE DEMONSTRATION ===");
        System.out.println();

        System.out.println("--- Demo 1: Non-volatile flag (NO happens-before) ---");
        Thread worker = new Thread(() -> {
            int iterations = 0;
            while (running) { // the JIT may hoist this read out of the loop
                iterations++;
            }
            System.out.println("  Worker exited after " + iterations + " iterations");
        });
        // A busy loop never checks its interrupt flag, so interrupt() cannot stop it.
        // Marking it daemon lets the JVM exit even if the worker never sees the write.
        worker.setDaemon(true);
        worker.start();
        Thread.sleep(100);
        running = false;
        System.out.println("  Main set running=false. Worker MAY never see it.");
        worker.join(1000);
        if (worker.isAlive()) {
            System.out.println("  ❌ Worker still running: data race! (x86 may hide this)");
        }
        System.out.println();

        System.out.println("--- Demo 2: Volatile flag (happens-before guaranteed) ---");
        Thread worker2 = new Thread(() -> {
            int iterations = 0;
            while (volatileRunning) {
                iterations++;
            }
            System.out.println("  Worker exited after " + iterations + " iterations");
        });
        worker2.start();
        Thread.sleep(100);
        volatileRunning = false; // volatile write: happens-before the worker's next read
        worker2.join();
        System.out.println("  ✅ Worker exited: happens-before guaranteed");
        System.out.println();

        System.out.println("--- Demo 3: Volatile does NOT provide atomicity ---");
        Thread[] incrementers = new Thread[10];
        for (int i = 0; i < 10; i++) {
            incrementers[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++) {
                    volatileCounter++; // read-modify-write: concurrent updates can be lost
                }
            });
        }
        for (Thread t : incrementers) t.start();
        for (Thread t : incrementers) t.join();
        System.out.printf("  10 threads * 10,000 increments = 100,000 expected%n");
        System.out.printf("  volatileCounter = %d (likely less: lost updates!)%n", volatileCounter);
        System.out.println("  Fix: Use AtomicInteger or synchronized");
        System.out.println();

        System.out.println("--- Demo 4: Double-checked locking (requires volatile) ---");
        System.out.println("  Before Java 5, double-checked locking was BROKEN.");
        System.out.println("  Java 5+ requires volatile for correctness:");
        System.out.println("    private volatile static MyClass instance;");
        System.out.println("    if (instance == null) {");
        System.out.println("        synchronized (MyClass.class) {");
        System.out.println("            if (instance == null) {");
        System.out.println("                instance = new MyClass();");
        System.out.println("            }");
        System.out.println("        }");
        System.out.println("    }");
    }
}
[Figure: Platform Threads vs Virtual Threads: Memory & Concurrency Comparison (Java 21–25). Platform threads: one OS thread per Java thread, each with a fixed stack (usually 1 MB). Virtual threads: JVM-managed, with heap-backed stacks that grow on demand, run on a small pool of carrier threads.]

Common Production Mistakes and Debugging Patterns

These are the mistakes I've seen in production systems and the debugging patterns that caught them. Every one of these has caused a real incident.

ProductionMistakesDemo.java
// io.thecodeforge.jvm.memory.ProductionMistakesDemo
// Demonstrates common production memory mistakes and fixes.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ProductionMistakesDemo {

    private static final ThreadLocal<List<byte[
● Production incident · POST-MORTEM · severity: high

The 4GB Container That Kept Dying

Symptom
Payment service crashing 3–4× per day with OOMKilled in Kubernetes. No heap dump. No Java exception. Just a dead pod.
Assumption
Memory leak in application code.
Root cause
-Xmx4g inside a 4 GB container left zero headroom. On top of the heap, the JVM needed ~490 MB for metaspace (~50 MB), thread stacks (200 × 1 MB), and the JIT code cache (~240 MB), plus GC overhead. Total process memory exceeded 5 GB. The Linux OOM killer fired before the JVM could throw an OutOfMemoryError.
Fix
Set -Xmx3g (75% of container limit). Enable -XX:NativeMemoryTracking=summary for ongoing visibility.
Key lesson
  • Heap ≠ total memory
  • Non-heap consumes 20–30% of your container budget
  • Always leave headroom
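The lesson can be turned into a startup guard. A minimal sketch, assuming a HotSpot-based JDK 14 or later, where com.sun.management.OperatingSystemMXBean.getTotalMemorySize() reports the container memory limit when the JVM detects one; the class name and the 0.75 threshold are illustrative. Setting -XX:MaxRAMPercentage=75.0 instead of a fixed -Xmx expresses the same 75% rule relative to the detected limit.

```java
import java.lang.management.ManagementFactory;

// Sketch of a startup guard: warn when -Xmx leaves too little room for non-heap memory.
// Relies on com.sun.management.OperatingSystemMXBean (HotSpot/OpenJDK, JDK 14+).
public final class HeadroomCheck {

    // True when the max heap is larger than maxFraction of the memory limit.
    static boolean exceedsHeadroom(long maxHeapBytes, long limitBytes, double maxFraction) {
        return maxHeapBytes > limitBytes * maxFraction;
    }

    public static void main(String[] args) {
        var os = (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        long limit = os.getTotalMemorySize();            // container limit if detected, else host RAM
        long maxHeap = Runtime.getRuntime().maxMemory(); // effective -Xmx
        System.out.printf("max heap = %d MB, memory limit = %d MB%n", maxHeap >> 20, limit >> 20);
        if (exceedsHeadroom(maxHeap, limit, 0.75)) {
            System.out.println("WARNING: heap above 75% of the limit; non-heap memory may trigger OOMKilled");
        }
    }
}
```

Under the incident's original settings (-Xmx4g in a 4 GB container) this guard would have printed the warning at startup.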
Production debug guide: Symptom → Action (use when production is on fire)
  • OutOfMemoryError: Java heap space → Take a heap dump and analyze it with Eclipse MAT
  • Latency spikes (100 ms to 2 s) → Enable GC logging and check pause times
  • Container OOMKilled (no Java exception) → Check non-heap memory; set -Xmx to 75% of the limit
  • Inconsistent values between threads → Add volatile or synchronized; test on ARM
  • StackOverflowError → Increase -Xss or convert recursion to iteration
  • High CPU but low throughput → Profile the allocation rate and reduce object creation
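For the first symptom, a heap dump can also be taken from inside the process, for example behind an admin endpoint. A minimal sketch assuming a HotSpot-based JVM, using com.sun.management.HotSpotDiagnosticMXBean; the class name and file naming are illustrative. To capture a dump automatically at the moment of failure, -XX:+HeapDumpOnOutOfMemoryError with -XX:HeapDumpPath does the same job. Neither helps with a container OOMKilled, because the kernel ends the process before the JVM can react.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: write an .hprof heap dump of the current JVM, ready to open in Eclipse MAT.
public final class HeapDumper {

    static Path dump(Path dir, boolean liveObjectsOnly) throws IOException {
        // dumpHeap requires a ".hprof" suffix and fails if the file already exists.
        Path file = dir.resolve("heap-" + ProcessHandle.current().pid() + ".hprof");
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(file.toString(), liveObjectsOnly); // true = reachable objects only
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dump(dir, true);
        System.out.println("wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

Dumping only live objects keeps the file smaller and closer to what MAT's leak suspects report needs; pass false to keep unreachable garbage as well.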