
Java Garbage Collection Internals: GC Algorithms, Tuning & Production Gotchas

Java Garbage Collection explained deeply — GC algorithms, generational heap internals, G1 vs ZGC, tuning flags, and real production pitfalls every senior dev must know.
🔥 Advanced — solid Java foundation required
In this tutorial, you'll learn
  • What Garbage Collection in Java is and why it exists
  • How the generational heap and the major collectors (G1, ZGC, Shenandoah) work internally
  • Tuning flags and production pitfalls, illustrated with runnable examples
Quick Answer
The JVM heap is split into generations:
  • Young generation (Eden + Survivor spaces) — where new objects are allocated and most die
  • Old generation — where long-lived objects are promoted
The GC reclaims memory from any object no longer reachable from a GC root.
🚨 START HERE
GC Triage Cheat Sheet — First 60 Seconds
Fast diagnostic commands when GC is suspected. Run these before diving into GC logs.
🟡 Application unresponsive, suspected full GC
Immediate Action: Check if JVM is in a GC stop-the-world pause
Commands
jcmd <pid> GC.heap_info
jstat -gcutil <pid> 1000 10
Fix Now: If Full GC count is incrementing, check for unbounded caches and heap fragmentation immediately. Restart with -Xlog:gc+humongous=debug
🟠 High CPU with low application throughput
Immediate Action: Check if GC threads are consuming CPU
Commands
top -H -p <pid> | grep -E 'VM Thread|GC Thread'
jcmd <pid> VM.flags | grep -i conc
Fix Now: Reduce -XX:ConcGCThreads or -XX:ParallelGCThreads if GC CPU > 20%. Consider if allocation rate can be reduced at application level.
🟠 Latency spikes at regular intervals
Immediate Action: Correlate spike timing with GC cycle phases
Commands
jstat -gcutil <pid> 500 20
grep 'Pause' gc.log | tail -20
Fix Now: If spikes align with 'mixed' or 'remark' phases, tune -XX:G1MixedGCCountTarget or -XX:MaxGCPauseMillis.
🔴 OOM kill by container orchestrator (k8s)
Immediate Action: Compare container memory limit with JVM heap + native overhead
Commands
kubectl describe pod <pod> | grep -A5 'OOMKilled'
jcmd <pid> VM.native_memory summary
Fix Now: Set -XX:MaxRAMPercentage to 75% max (not 90%). Account for ~20% native overhead. Add container memory limit = heap * 1.3 for ZGC.
🟡 Allocation failure in logs, to-space exhausted
Immediate Action: G1 cannot evacuate objects — critical failure
Commands
grep 'to-space exhausted' gc.log | wc -l
grep 'humongous' gc.log | tail -20
Fix Now: Increase -XX:G1ReservePercent to 15. Increase region size. Reduce allocation rate. This triggers full GC — treat as P1.
Production Incident: Full GC Spiral Crashes Order Processing Service During Flash Sale
An order processing service running G1 GC with default settings experienced repeated full GC pauses exceeding 30 seconds during a flash sale, causing container health checks to fail and pods to restart in a cascading loop.
Symptom: Order API p99 latency spiked from 80ms to 30+ seconds. Kubernetes liveness probes failed, triggering pod restarts. After restart, the pattern repeated within 10 minutes. GC logs showed 'Pause Full (Allocation Failure)' with increasing frequency.
Assumption: Team assumed the heap was too small and doubled -Xmx from 4GB to 8GB. The problem persisted — full GC pauses were longer because the live data set was larger.
Root cause: The service cached order objects in a ConcurrentHashMap with no eviction policy. Under flash sale traffic, the cache grew unbounded until old generation was 98% full. G1 could not reclaim enough space during mixed GCs because most old regions contained live cached data. Concurrent marking kept running but found almost nothing collectible. Eventually, young generation allocation failed and G1 fell back to a full GC stop-the-world pause. Doubling the heap only delayed the inevitable — the cache still grew unbounded.
Fix: Three-part fix: (1) Added size-bounded eviction to the order cache using Caffeine with maximumSize(50000) and expireAfterWrite(Duration.ofMinutes(30)). (2) Enabled GC logging with -Xlog:gc*,gc+humongous=debug:file=/var/log/gc.log to monitor heap pressure proactively. (3) Set -XX:InitiatingHeapOccupancyPercent=35 to trigger concurrent marking earlier, giving mixed GCs more cycles to reclaim space before allocation pressure hit.
Key Lessons
  • Unbounded caches are the #1 cause of GC-related production incidents in Java services
  • Full GC 'Allocation Failure' means the collector cannot free enough space — it is not a tuning problem, it is an application memory management problem
  • Doubling heap without fixing the allocation pattern just delays the same failure with a longer full GC pause
  • Every production service must have a bounded eviction strategy for any in-memory data structure
  • Monitor old generation utilization sustained above 85% as a leading indicator of full GC risk
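The bounded-eviction lesson can be sketched with nothing but the JDK. This is a minimal size-bounded LRU cache built on LinkedHashMap's access-order mode — the class name BoundedOrderCache is illustrative, and in production a purpose-built cache like Caffeine (which adds time-based expiry and better concurrency) is the better choice:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal size-bounded LRU cache sketch using only the JDK.
 * Guarantees the cache can never grow the old generation unboundedly.
 * Not thread-safe on its own — wrap with Collections.synchronizedMap
 * or use a dedicated caching library for concurrent workloads.
 */
public class BoundedOrderCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    public BoundedOrderCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true → LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put: evict the least-recently-used entry
        // once the bound is exceeded.
        return size() > maxEntries;
    }
}
```

With `new BoundedOrderCache<String, Order>(50_000)`, the 50,001st insertion evicts the least-recently-used order instead of accumulating forever.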
Production Debug Guide
Follow this path when GC is suspected as the root cause of latency or availability issues.
  • Latency spikes correlate with GC pauses in application logs → Enable GC logging with -Xlog:gc*,gc+phases=debug:file=gc.log:time,uptime,level,tags and correlate pause timestamps with latency metrics. Check if pauses are young GC, mixed GC, or full GC.
  • Full GC appearing frequently in steady-state traffic → Full GC signals the collector cannot keep up. Check for unbounded caches, humongous allocation rate, heap fragmentation, or metaspace exhaustion. Use jmap -histo to identify which object types dominate the heap.
  • Throughput drops but pause times are acceptable → Collector is consuming too much CPU. Check concurrent GC thread count (-XX:ConcGCThreads). Reduce if GC CPU usage exceeds 15-20% of total. Profile allocation rate — if > 2GB/sec, reduce allocation pressure at the application level.
  • OOM kill with no heap exhaustion visible in metrics → Check native memory: metaspace, thread stacks, direct byte buffers, mmap regions. Use -XX:NativeMemoryTracking=detail and jcmd <pid> VM.native_memory summary.
  • GC pause time increases linearly with heap size → G1 pauses scale with live data set, not heap size. If pauses scale with heap, evaluate switching to ZGC or Shenandoah where pauses are independent of heap size.

Every Java application runs a second program inside the JVM — the Garbage Collector. It decides when memory gets freed, how long your threads pause, and whether your latency SLAs hold up under load. Most developers treat it like a black box and then wonder why their microservice spikes to 500ms every few seconds in production.

Before automatic memory management, C and C++ developers had to manually allocate and free every byte. Java solved this with a managed heap and a runtime that tracks object reachability — if nothing in your program can reach an object, its memory can be reclaimed. That single idea eliminated an entire class of bugs but introduced a new challenge: the collector itself consumes CPU and introduces pauses.

The core misconception: GC pauses are inevitable and unfixable. They are not. Modern collectors offer pause-time guarantees independent of heap size — but only if you understand the trade-offs and tune correctly for your workload.

What is Garbage Collection in Java?

Garbage Collection in Java is the JVM's automatic memory management mechanism. The GC periodically identifies objects that are no longer reachable from any GC root (thread stacks, static fields, JNI references) and reclaims their heap memory. This eliminates manual memory management but introduces pauses and CPU overhead that must be managed in production.

The JVM determines object reachability through a tracing analysis that starts from GC roots. An object is considered dead — and eligible for collection — when no chain of references from any root can reach it. This is fundamentally different from reference counting (used in early Python/PHP) which cannot handle cyclic references. Java's tracing GC handles cycles naturally because it only cares about reachability, not reference count.

The key production insight: GC does not run when memory is low. GC runs when allocation pressure triggers it. This means a service with a large heap but low allocation rate may run GC infrequently, while a service with a small heap and high allocation rate may run GC constantly. Allocation rate, not heap size, is the primary driver of GC frequency.
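Allocation rate can be measured in-process. This sketch uses the HotSpot-specific com.sun.management.ThreadMXBean (not the standard java.lang.management interface — it may be absent on non-HotSpot JVMs), which reports bytes allocated by a thread:

```java
import java.lang.management.ManagementFactory;

/**
 * Measures bytes allocated by the current thread — the number that
 * actually drives GC frequency, not heap size.
 *
 * NOTE: the cast to com.sun.management.ThreadMXBean is HotSpot-specific.
 */
public class AllocationRateProbe {

    public static void main(String[] args) {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();

        long before = threads.getThreadAllocatedBytes(tid);

        // Allocate ~100MB of short-lived garbage. Writing into each array
        // keeps the JIT from treating the allocation as fully dead.
        for (int i = 0; i < 100_000; i++) {
            byte[] b = new byte[1024];
            b[0] = (byte) i;
        }

        long allocated = threads.getThreadAllocatedBytes(tid) - before;
        System.out.println("Allocated roughly " + (allocated >> 20) + " MB in this burst");
    }
}
```

Sampling this counter once per second across threads gives an application-level allocation-rate metric you can alert on, independently of GC logs.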

io/thecodeforge/gc/ReachabilityDemo.java · JAVA
package io.thecodeforge.gc;

/**
 * Demonstrates how Java GC determines object reachability.
 *
 * Key concept: An object is reachable if any GC root can access it
 * through a chain of references. When the chain breaks, the object
 * becomes eligible for collection.
 */
public class ReachabilityDemo {

    public static void main(String[] args) {
        // Object created on the heap — referenced by local variable 'order'
        // 'order' is a GC root (stack reference)
        Order order = new Order("ORD-001", 149.99);
        System.out.println("Order created: " + order.getId());

        // After this reassignment, the original Order object has no
        // reachable references. It becomes eligible for GC.
        order = new Order("ORD-002", 299.99);
        // The first Order("ORD-001") is now unreachable — GC will reclaim it

        // Demonstrating cyclic references — GC handles this correctly
        OrderNode nodeA = new OrderNode("A");
        OrderNode nodeB = new OrderNode("B");
        nodeA.next = nodeB;
        nodeB.next = nodeA; // cycle: A -> B -> A

        // Even though A and B reference each other, if we null out
        // our stack references, both become unreachable and are collected
        nodeA = null;
        nodeB = null;
        // The cycle A -> B -> A is still intact in memory, but no GC root
        // can reach either node. Both are eligible for collection.
    }

    static class Order {
        private final String id;
        private final double amount;

        Order(String id, double amount) {
            this.id = id;
            this.amount = amount;
        }

        String getId() { return id; }
    }

    static class OrderNode {
        final String name;
        OrderNode next;

        OrderNode(String name) {
            this.name = name;
        }
    }
}
▶ Output
Order created: ORD-001
Mental Model
GC Roots — What Counts as a Root
The four categories of GC roots
  • Local variables on thread stacks — every active method frame holds references to objects it is using
  • Static fields of loaded classes — ClassLoader roots keep static objects alive for the lifetime of the class
  • JNI references — native code can hold references that the JVM must respect
  • Active monitors — objects currently locked by a thread are temporarily rooted during GC
📊 Production Insight
The most common cause of memory leaks in production Java services is unintentional GC root retention. A static Map that accumulates entries, a ThreadLocal that is never cleaned, or a listener that is never deregistered creates a chain of references from a root that the GC cannot break. Use heap dumps (jmap -dump:live,format=b,file=heap.hprof) and analyze with Eclipse MAT to find dominator trees — the objects keeping the most memory alive through root chains.
🎯 Key Takeaway
GC reclaims memory from objects that no GC root can reach. Cyclic references are handled correctly by tracing GC. The #1 production memory leak pattern is objects retained through static fields, ThreadLocals, or unremoved listeners — not missing free() calls.
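The root-retention leak pattern described above can be shown in a few lines. Everything in this sketch (class and field names are illustrative) is still reachable from a static field, so the GC is forbidden from collecting it — that is the whole leak:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * The #1 production leak pattern: objects retained through a GC root chain.
 * Nothing here is "leaked" in the C sense — every object below is still
 * reachable via class -> static field -> collection -> entry.
 */
public class RootRetentionLeak {

    // Entries added here stay live for the lifetime of the class.
    private static final Map<String, byte[]> SESSION_DATA = new HashMap<>();

    // Listeners that are registered but never deregistered — each one
    // (and everything it captures) stays reachable forever.
    private static final List<Runnable> LISTENERS = new ArrayList<>();

    public static void handleRequest(String sessionId) {
        // Looks request-scoped, but the static map retains ~64KB per call:
        SESSION_DATA.put(sessionId, new byte[64 * 1024]);

        // Registration without a matching removal on session end = leak.
        // The lambda also captures sessionId, keeping it reachable too.
        LISTENERS.add(() -> System.out.println("session " + sessionId));
    }

    static int retainedEntries() {
        return SESSION_DATA.size();
    }
}
```

In a heap dump, Eclipse MAT's dominator tree would show SESSION_DATA and LISTENERS at the top — the fix is explicit removal on session end or a bounded, evicting structure.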

The Generational Heap — Why Most Objects Die Young

The JVM heap is divided into generations based on the weak generational hypothesis: most objects die young, and objects that survive one collection are likely to survive many more. This observation drives the generational heap design that every modern JVM collector uses.

The young generation consists of eden space (where new objects are allocated) and two survivor spaces (S0 and S1). New objects are allocated in eden. When eden fills up, a minor GC (young collection) runs: live objects in eden are copied to one survivor space, and live objects in the other survivor space are also copied and aged. Objects that survive enough young collections (controlled by -XX:MaxTenuringThreshold) are promoted to the old generation.

The old generation holds long-lived objects. When the old generation fills up or a collection threshold is reached, a major GC runs. In G1, this is a mixed GC that collects both young and old regions. In extreme cases, a full GC (stop-the-world compaction of the entire heap) is triggered — this is the catastrophic failure mode you must avoid.

The critical production insight: the tenuring threshold determines how quickly objects move to old generation. Too low, and short-lived objects pollute old generation, increasing old gen GC frequency. Too high, and survivor spaces overflow, forcing premature promotion. Both paths degrade performance.

io/thecodeforge/gc/GenerationalBehaviorDemo.java · JAVA
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;

/**
 * Demonstrates how allocation patterns interact with the generational heap.
 *
 * Objects that survive young collections are promoted to old generation.
 * Understanding this promotion mechanism is critical for tuning.
 */
public class GenerationalBehaviorDemo {

    /**
     * Pattern 1: Short-lived objects — ideal for generational GC.
     * These objects die in eden and never reach old generation.
     * GC can reclaim them with a fast young collection.
     */
    public void processRequest() {
        // These objects are created, used, and become unreachable
        // within a single method call. They die in eden.
        String requestId = java.util.UUID.randomUUID().toString();
        byte[] payload = new byte[4096];
        List<String> validationErrors = new ArrayList<>();

        // After this method returns, all three objects become unreachable
        // because they are only referenced by local variables (stack roots).
    }

    /**
     * Pattern 2: Long-lived cached objects — promoted to old gen.
     * These objects survive young collections and get promoted.
     * They occupy old generation permanently (or until eviction).
     *
     * Production risk: If this cache grows unbounded, old generation
     * fills up and triggers full GC or OOM.
     */
    private final List<byte[]> longLivedCache = new ArrayList<>();

    public void cacheData(byte[] data) {
        // This reference keeps the byte array alive indefinitely.
        // After surviving MaxTenuringThreshold young collections,
        // it is promoted to old generation.
        longLivedCache.add(data);
    }

    /**
     * Pattern 3: Premature promotion — objects that should die young
     * but get promoted because survivor space is full.
     *
     * If allocation rate exceeds survivor space capacity, objects
     * are promoted directly to old generation even if they are short-lived.
     * This is called premature promotion and it pollutes old generation.
     *
     * Fix: Increase survivor space ratio (-XX:SurvivorRatio)
     *       or reduce allocation rate.
     */
    public void burstAllocation() {
        // If this loop runs fast enough to fill eden AND overflow
        // survivor space, these temporary objects get promoted to
        // old generation even though they die after each iteration.
        for (int i = 0; i < 100_000; i++) {
            byte[] temp = new byte[256];
            // temp is short-lived, but under pressure it may be
            // prematurely promoted to old generation
        }
    }

    /**
     * Production tuning flags for generational behavior:
     *
     * -XX:NewRatio=2              // old:young = 2:1 (default for most collectors)
     * -XX:SurvivorRatio=8         // eden:survivor = 8:1 (default)
     * -XX:MaxTenuringThreshold=15 // objects survive 15 young GCs before promotion
     * -XX:+AlwaysTenure            // promote immediately (dangerous — avoid)
     * -XX:+NeverTenure             // never promote (survivor overflow → old gen)
     *
     * Monitor promotion rate with:
     *   jstat -gcutil <pid> 1000
     *   Watch 'O' column (old gen utilization) for steady growth.
     *   Steady growth with low live data = premature promotion.
     */
}
Mental Model
The Weak Generational Hypothesis — The Foundation of All Modern GC
Why this hypothesis drives GC design
  • If 90% of objects die in eden, collecting eden reclaims 90% of garbage with minimal work
  • Young collection only scans eden + survivor spaces — not the entire heap. This is fast.
  • Old generation collection is expensive because it must handle long-lived object graphs
  • The hypothesis fails for workloads with uniform object lifetimes — batch processing, data pipelines
  • When the hypothesis fails, you see high promotion rates and frequent old gen collections
📊 Production Insight
Monitor promotion rate as a leading indicator of GC health. Use jstat -gcutil and watch the bytes promoted from young to old generation per GC cycle. A healthy service promotes < 5% of young gen per cycle. If promotion rate exceeds 20%, your objects are living too long in young gen — either increase -XX:MaxTenuringThreshold, increase survivor space (-XX:SurvivorRatio=6), or investigate why short-lived objects are escaping young gen (common cause: objects stored in thread-local caches or request-scoped maps that persist across requests).
🎯 Key Takeaway
The generational heap exploits the statistical fact that most objects die young. Young collection is fast because it only scans eden + survivor. Old generation is expensive to collect. Premature promotion — short-lived objects reaching old gen — is a silent performance killer. Monitor promotion rate with jstat -gcutil.
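The old-gen utilization signal called out above can also be read in-process via the standard MemoryPoolMXBean API. A sketch, assuming a generational collector — pool names are collector-specific ("G1 Old Gen", "PS Old Gen", "Tenured Gen", ...), so it matches loosely:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/**
 * In-process check of old generation utilization — a leading indicator
 * of promotion problems and full-GC risk.
 */
public class OldGenMonitor {

    /** Returns old gen utilization in [0, 1], or -1 if no old-gen pool exists. */
    public static double oldGenUtilization() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName().toLowerCase();
            if (name.contains("old") || name.contains("tenured")) {
                MemoryUsage u = pool.getUsage();
                // getMax() may be -1 (undefined); fall back to committed size.
                long max = u.getMax() > 0 ? u.getMax() : u.getCommitted();
                return max > 0 ? (double) u.getUsed() / max : -1;
            }
        }
        return -1; // e.g. non-generational ZGC exposes a single "ZHeap" pool
    }

    public static void main(String[] args) {
        double util = oldGenUtilization();
        if (util >= 0.85) {
            System.out.println("WARNING: old gen at " + Math.round(util * 100)
                    + "% — sustained >85% is a full-GC leading indicator");
        } else {
            System.out.println("Old gen utilization: " + util);
        }
    }
}
```

Exporting this gauge to your metrics system and alerting on sustained readings above 0.85 catches the unbounded-cache failure mode days before the full GC spiral.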

GC Algorithms — Mark-Sweep, Copying, and Compaction

All GC algorithms are built on three fundamental operations: marking (identifying live objects), sweeping (reclaiming dead objects' memory), and compacting (defragmenting live objects to create contiguous free space). Different collectors combine these operations differently to optimize for pause time, throughput, or memory efficiency.

Mark-and-sweep identifies live objects (mark phase) then reclaims unmarked memory (sweep phase). The problem: it creates fragmentation. After many allocation-deallocation cycles, free memory is scattered in small chunks. Large object allocations may fail even when total free memory is sufficient — this is external fragmentation.

Copying collectors solve fragmentation by copying live objects to a fresh region and discarding the old region entirely. This is inherently compacting — live objects end up contiguous. The cost: copying live objects takes time proportional to the live data set, and you need double the memory (from-space and to-space). The generational heap reduces this cost by only copying in young generation.

Mark-and-compact identifies live objects then slides them to one end of the heap, creating one contiguous free region. This avoids the double-memory cost of copying but requires updating every reference to moved objects — a potentially expensive operation that must be done during a stop-the-world pause or with complex concurrent mechanisms.

io/thecodeforge/gc/GCAlgorithmDemo.java · JAVA
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;

/**
 * Demonstrates how different GC algorithm characteristics
 * affect production behavior.
 *
 * This is not a GC implementation — it illustrates the concepts
 * that drive real collector design decisions.
 */
public class GCAlgorithmDemo {

    /**
     * MARK-AND-SWEEP characteristic:
     * - Fast reclaim but creates fragmentation
     * - External fragmentation: total free > requested, but not contiguous
     *
     * Production impact: After hours of operation, allocation of large
     * objects fails even though 40% of heap is free — it's fragmented.
     * This triggers unnecessary full GC or OOM.
     */
    public void demonstrateFragmentation() {
        // Imagine this array is the heap, each index is a memory block
        // true = occupied, false = free
        boolean[] heap = new boolean[100];

        // Simulate allocation pattern: allocate and free alternating blocks
        for (int i = 0; i < 100; i++) {
            heap[i] = true; // allocate
        }
        for (int i = 0; i < 100; i += 2) {
            heap[i] = false; // free every other block
        }
        // Result: 50% free, but no contiguous block of size > 1
        // A request for 3 contiguous blocks fails despite 50 free blocks
        // This is external fragmentation — the problem compaction solves
    }

    /**
     * COPYING COLLECTOR characteristic:
     * - Copies live objects to to-space, discards from-space
     * - Inherently compacting — no fragmentation
     * - Cost: proportional to live data, not dead data
     * - Requires double the memory (from + to spaces)
     *
     * Production insight: Copying cost is why large live data sets
     * cause longer young collection pauses. If your service has
     * 2GB of live objects in young gen, copying takes measurable time.
     */
    public void demonstrateCopyingCost() {
        // Simulating live data that must be copied during young GC
        List<byte[]> liveObjects = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            liveObjects.add(new byte[1024]); // 1KB each = ~10MB live data
        }

        // During young GC, all 10MB must be copied to survivor space.
        // If only 1MB were live, the cost would be 10x lower.
        // This is why reducing live data in young gen reduces pause time.
        //
        // Real production fix: avoid holding references to temporary
        // objects across request boundaries. Let them die in eden.
    }
}
Mental Model
The Three Fundamental GC Operations
How real collectors use these operations
  • Serial GC: mark-sweep-compact, all stop-the-world. Simple but pauses grow with heap.
  • Parallel GC: same algorithm as Serial but uses multiple threads. Faster but same pause characteristics.
  • G1: concurrent marking + stop-the-world copying evacuation of selected regions. Compaction happens per-region, not whole-heap.
  • ZGC: concurrent mark + concurrent compact via colored pointers. Only brief stop-the-world pauses at the start and end of marking and at relocation start.
  • Shenandoah: concurrent mark + concurrent compact via Brooks pointers. Similar to ZGC with different implementation.
📊 Production Insight
Fragmentation is the silent killer of long-running services. After days of operation, a heap with 40% free memory may fail to allocate a 10MB object because no contiguous 10MB block exists. This triggers a full GC to compact the heap. Monitor fragmentation with jcmd <pid> GC.heap_info and look at free region distribution. G1 handles fragmentation well through region-based evacuation. If you see increasing full GC frequency over time without increasing live data, fragmentation is the cause.
🎯 Key Takeaway
All GC algorithms are built on mark, sweep, and compact. Fragmentation is the primary failure mode of mark-and-sweep. Copying collectors solve fragmentation but cost proportional to live data. Modern collectors (G1, ZGC, Shenandoah) do as much work concurrently as possible to minimize stop-the-world pauses.

G1 GC — The Default Workhorse

G1 (Garbage-First) has been the default JVM collector since Java 9. It divides the heap into equal-sized regions (1MB to 32MB) and prioritizes collecting regions with the most garbage — hence 'garbage-first'. G1 maintains a remembered set per region tracking incoming references, enabling independent region collection without scanning the entire heap.

G1 operates in young-only and mixed collection cycles. Young GC collects survivor and eden regions. When the heap occupancy exceeds the Initiating Heap Occupancy Percent (IHOP), G1 triggers a concurrent marking cycle. After marking completes, subsequent mixed GCs collect both young and old regions identified as mostly garbage.

The critical production insight: G1's pause time is primarily driven by the number of regions it must collect in a single pause, not heap size. A 64GB heap with aggressive evacuation can pause longer than a 4GB heap with conservative settings. This is the opposite of what most engineers assume.

io/thecodeforge/gc/G1TuningExample.java · JAVA
package io.thecodeforge.gc;

import java.util.concurrent.ConcurrentHashMap;
import java.util.Map;

/**
 * Demonstrates allocation patterns that stress G1 differently.
 *
 * Key insight: G1 humongous objects (>50% region size) bypass normal
 * allocation and can trigger to-space exhausted failures.
 */
public class G1TuningExample {

    // Cache with large value objects — common source of humongous allocations
    private final Map<String, byte[]> payloadCache = new ConcurrentHashMap<>();

    /**
     * BAD: Allocates objects that may exceed humongous threshold.
     * An object is humongous when it is >= 50% of the region size:
     * with 1MB regions, anything over 512KB qualifies.
     * With 32MB regions, threshold is 16MB — much safer for large payloads.
     *
     * Tuning: -XX:G1HeapRegionSize=32M
     *         -XX:G1ReservePercent=15
     *         -XX:InitiatingHeapOccupancyPercent=35
     */
    public void cacheLargePayload(String key, int sizeBytes) {
        byte[] payload = new byte[sizeBytes];
        for (int i = 0; i < Math.min(sizeBytes, 1024); i++) {
            payload[i] = (byte) (i & 0xFF);
        }
        payloadCache.put(key, payload);
    }

    /**
     * BETTER: Chunk large payloads to stay below humongous threshold.
     * Each chunk is independently collectible as a regular object.
     */
    public void cacheChunkedPayload(String key, byte[] fullPayload) {
        int chunkSize = 256 * 1024; // 256KB chunks
        int numChunks = (fullPayload.length + chunkSize - 1) / chunkSize;

        for (int i = 0; i < numChunks; i++) {
            int offset = i * chunkSize;
            int length = Math.min(chunkSize, fullPayload.length - offset);
            byte[] chunk = new byte[length];
            System.arraycopy(fullPayload, offset, chunk, 0, length);
            payloadCache.put(key + ":chunk:" + i, chunk);
        }
    }

    /**
     * Production G1 flags for a 16GB heap with mixed allocation profile:
     *
     * -XX:+UseG1GC
     * -Xms16g -Xmx16g
     * -XX:G1HeapRegionSize=16m
     * -XX:MaxGCPauseMillis=200
     * -XX:G1ReservePercent=15
     * -XX:InitiatingHeapOccupancyPercent=35
     * -XX:G1MixedGCCountTarget=8
     * -XX:G1MixedGCLiveThresholdPercent=85
     * -Xlog:gc*,gc+humongous=debug:file=/var/log/gc.log:time,uptime,level,tags
     */
}
Mental Model
G1's Core Mental Model: Region-Based Evacuation
Why this matters for production
  • Pause time scales with live data in collected regions, not total heap size
  • Humongous objects break this model — they span multiple regions and cannot be partially evacuated
  • Remembered sets consume 5-10% of heap as off-heap overhead — budget for this when setting -Xmx
  • To-space exhausted means G1 literally ran out of regions to evacuate into — this is a full GC fallback
📊 Production Insight
G1's -XX:MaxGCPauseMillis is a soft target, not a hard guarantee. G1 will attempt to meet this by adjusting how many regions to collect per cycle, but allocation rate spikes can violate it. If you need hard latency guarantees, G1 is the wrong collector. Monitor actual pause times against your SLA — if G1 violates MaxGCPauseMillis more than 5% of the time, the workload demands ZGC or Shenandoah.
🎯 Key Takeaway
G1 is the right default for most workloads, but it has a hard ceiling on pause-time predictability. Once your latency budget drops below ~100ms p99, evaluate ZGC or Shenandoah. Never tune G1 without GC logs enabled — the default logging is insufficient for production diagnosis.
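A coarse in-process check of whether pauses are drifting can be done with the standard GarbageCollectorMXBean API. This is a sketch: getCollectionTime() is cumulative wall time, so dividing by collection count gives only the *average* pause between samples — a first-pass health signal, not the p99 that SLA verification (and the GC logs) require:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Prints average pause per collection for each registered collector
 * (e.g. "G1 Young Generation" / "G1 Old Generation" under G1).
 */
public class GcPauseSampler {

    public static void printAveragePauses() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // cumulative collections
            long timeMs = gc.getCollectionTime();   // cumulative time, ms
            double avg = count > 0 ? (double) timeMs / count : 0.0;
            System.out.printf("%s: %d collections, avg %.1f ms%n",
                    gc.getName(), count, avg);
        }
    }

    public static void main(String[] args) {
        // Generate garbage so at least one young GC is likely to have run.
        for (int i = 0; i < 200_000; i++) {
            byte[] b = new byte[1024];
            b[0] = 1;
        }
        printAveragePauses();
    }
}
```

Sampling the deltas of count and time every few seconds turns this into a per-interval average-pause metric you can chart next to application latency.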
G1 Tuning Decision Tree
If: Humongous allocations appearing in GC logs
Use: Increase -XX:G1HeapRegionSize to raise the humongous threshold so fewer objects qualify as humongous. Max region size is 32MB. Chunk large objects at the application level if possible.
If: Mixed GCs are too frequent, causing throughput loss
Use: Increase -XX:G1MixedGCCountTarget (default 8) to spread collection over more cycles. Adjust -XX:G1MixedGCLiveThresholdPercent to collect only regions with more garbage.
If: Full GC appearing despite adequate heap
Use: IHOP is miscalibrated. Set -XX:InitiatingHeapOccupancyPercent lower (try 35) or enable -XX:+G1UseAdaptiveIHOP (Java 10+) to let G1 self-tune.
If: Pause times exceed MaxGCPauseMillis consistently
Use: Live data set is too large for G1's evacuation budget. Either reduce live data (caching strategy) or migrate to ZGC/Shenandoah where pause times are independent of live data size.

ZGC — Sub-Millisecond Pause Collector

ZGC (Z Garbage Collector) was introduced as experimental in JDK 11 and became production-ready in JDK 15. Its defining characteristic: pause times stay below one millisecond regardless of heap size (the original target was 10ms; JDK 16's concurrent thread-stack processing brought pauses sub-millisecond) — tested up to 16TB heaps. ZGC achieves this through concurrent everything: marking, relocation, and reference processing all happen while application threads run.

ZGC uses load barriers with colored pointers. Every object reference carries metadata bits (marked0, marked1, remap, finalize) embedded in the pointer itself. The load barrier intercepts every object access to check if the reference needs remapping. This is the fundamental trade-off: ZGC replaces long GC pauses with per-access overhead on every object load.

As of JDK 21, ZGC supports generational mode (-XX:+ZGenerational) which dramatically improves throughput by focusing collection on young objects. Non-generational ZGC collects the entire heap every cycle, which limits throughput on allocation-heavy workloads.

io/thecodeforge/gc/ZGCTuningExample.java · JAVA
package io.thecodeforge.gc;

import java.util.concurrent.atomic.AtomicLong;

/**
 * ZGC-specific considerations for production workloads.
 *
 * ZGC trades per-access overhead for near-zero pause times.
 * The load barrier adds ~4-8% overhead on pointer-heavy workloads.
 */
public class ZGCTuningExample {

    private final AtomicLong allocationCounter = new AtomicLong(0);

    /**
     * Production ZGC flags for a 32GB heap, latency-sensitive service:
     *
     * -XX:+UseZGC
     * -XX:+ZGenerational              // JDK 21+ — critical for throughput
     * -Xms32g -Xmx32g                // Always set Xms=Xmx for ZGC
     * -XX:SoftMaxHeapSize=28g         // ZGC-specific: target heap occupancy
     * -XX:ZCollectionInterval=5       // Suggest GC cycle every 5 seconds
     * -XX:ConcGCThreads=4             // Concurrent GC threads
     * -Xlog:gc*:file=/var/log/zgc.log:time,uptime,level,tags
     *
     * CRITICAL: ZGC uses ~20% native memory overhead beyond -Xmx.
     * Container memory limit must be heap * 1.25 minimum.
     */

    /**
     * ZGC SoftMaxHeapSize is unique — it tells ZGC to try to stay below
     * this threshold but can exceed it under allocation pressure.
     *
     * Use case: Set heap to 32GB, SoftMaxHeapSize to 28GB.
     * ZGC will trigger cycles aggressively to stay under 28GB.
     * Only allocates into the remaining 4GB under extreme pressure.
     */
    public void demonstrateSoftMaxHeapConcept() {
        // With SoftMaxHeapSize=28g and Xmx=32g:
        // - ZGC targets 28GB occupancy
        // - If allocation pressure pushes past 28GB, ZGC cycles more aggressively
        // - If it hits 32GB, allocation stalls (not OOM, but backpressure)
    }
}
Mental Model
ZGC's Core Mental Model: Colored Pointers + Load Barriers
Why this changes everything about GC trade-offs
  • Pause times are truly independent of heap size and live data size — tested to 16TB
  • The trade-off is per-access CPU overhead, not pause time — you pay on every object load
  • ZGC cannot use compressed object pointers (UseCompressedOops) — increases memory usage by ~15% on heaps < 32GB
  • Generational ZGC (JDK 21+) reduces overhead dramatically by focusing on young generation
📊 Production Insight
ZGC's biggest production risk is native memory consumption. ZGC multi-maps the heap across multiple virtual address spaces for colored pointer management, and this multi-mapping eats into the process's virtual address space. Budget container memory as heap × 1.25 for ZGC versus heap × 1.15 for G1. Also, ZGC requires a 64-bit system — it does not run on 32-bit.
🎯 Key Takeaway
ZGC is the correct choice when p99 latency must be below 10ms and you can afford 10-15% throughput overhead. Enable generational mode on JDK 21+. Budget 25% extra native memory beyond heap size. ZGC's SoftMaxHeapSize is the most underrated production feature for containerized deployments.

Shenandoah — Red Hat's Low-Pause Contender

Shenandoah is Red Hat's concurrent compacting collector, introduced as an experimental feature in JDK 12 (JEP 189) and production-ready since JDK 15 (JEP 379). It achieves low pause times through concurrent evacuation — moving live objects while application threads run — using Brooks pointers (an indirection layer on every object).

Shenandoah differs from ZGC in a critical way: it uses Brooks pointers (every object has a forwarding pointer field) instead of colored pointers. This means Shenandoah does not require specific pointer bit layouts and works with compressed oops, reducing memory overhead compared to ZGC on heaps under 32GB. (Note: since JDK 13, mainline Shenandoah folds the forwarding pointer into the object's mark word rather than keeping a dedicated extra word, so the per-object overhead described in this section applies to the classic Brooks-pointer design.)

Shenandoah operates in three concurrent phases: concurrent mark, concurrent evacuate, and concurrent update-refs. The initial mark and final mark phases are short stop-the-world pauses, typically under 10ms. Shenandoah's pacing mechanism backpressures allocation threads proportionally when the collector falls behind, creating smoother degradation than ZGC's hard allocation stalls.

io/thecodeforge/gc/ShenandoahTuningExample.java · JAVA
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;

/**
 * Shenandoah-specific production considerations.
 *
 * Shenandoah uses Brooks pointers — every object has an extra forwarding
 * pointer field. This adds 8 bytes per object on 64-bit systems.
 */
public class ShenandoahTuningExample {

    /**
     * Brooks pointer overhead calculation:
     *
     * Object with 2 fields (16 bytes header + 16 bytes data = 32 bytes)
     * + 8 bytes Brooks pointer = 40 bytes per object
     * Overhead: 25% increase per object
     *
     * For 10 million small objects: ~80MB additional memory
     * For 100 million small objects: ~800MB additional memory
     */
    public long estimateBrooksOverhead(int objectCount) {
        return (long) objectCount * 8;
    }

    /**
     * Production Shenandoah flags for a 16GB heap:
     *
     * -XX:+UseShenandoahGC
     * -Xms16g -Xmx16g
     * -XX:ShenandoahGCHeuristics=adaptive
     * -XX:ShenandoahAllocationThreshold=10
     * -XX:+UseCompressedOops               // works with Shenandoah (unlike ZGC)
     * -Xlog:gc*:file=/var/log/shenandoah.log:time,uptime,level,tags
     */

    /**
     * Shenandoah pacing is a unique feature that backpressures allocation
     * threads when the collector falls behind.
     *
     * Unlike ZGC which stalls allocation entirely, Shenandoah slows down
     * allocating threads proportionally. This creates smoother latency
     * degradation under load rather than sharp spikes.
     */
    public void demonstratePacingBehavior() {
        List<byte[]> allocations = new ArrayList<>();

        // Under heavy allocation, Shenandoah will pace this loop
        // by adding small delays to each allocation.
        // The delay is proportional to how far behind the collector is.
        for (int i = 0; i < 100_000; i++) {
            allocations.add(new byte[1024]);
        }
    }
}
Mental Model
Shenandoah's Core Mental Model: Brooks Pointers + Concurrent Evacuation
Why Brooks pointers create different trade-offs than ZGC
  • No load barrier overhead — Shenandoah uses store barriers instead, which fire less frequently
  • Works with compressed oops — saves ~15% memory compared to ZGC on heaps under 32GB
  • Per-object overhead of 8 bytes — significant for workloads with many small objects
  • Pacing mechanism creates graceful degradation instead of hard allocation stalls
📊 Production Insight
Shenandoah's biggest production risk is the Brooks pointer overhead on small-object-heavy workloads. If your service has 100M+ objects under 64 bytes, the 8-byte Brooks pointer per object adds ~800MB of overhead. Profile with compressed oops disabled to see true memory consumption. Additionally, Shenandoah's pacing can create subtle latency degradation that is hard to distinguish from application-level slowness — always correlate pacing delays with latency metrics.
🎯 Key Takeaway
Shenandoah is the right choice when you need low-pause GC on moderate heaps (< 32GB) and want compressed oops support. Its pacing mechanism creates smoother degradation than ZGC's allocation stalls. The Brooks pointer overhead is the hidden cost — budget 8 bytes per object.

JVM Flags That Actually Matter

Most JVM GC flags have sensible defaults. A small subset moves the needle in production. Understanding which flags to adjust — and when — prevents the common anti-pattern of blindly copying flags from blog posts without understanding their impact on your specific workload.

Flags fall into three categories: heap sizing, collector behavior, and logging. Heap sizing flags (-Xms, -Xmx, -XX:NewRatio) control memory layout. Collector behavior flags (-XX:MaxGCPauseMillis, -XX:InitiatingHeapOccupancyPercent) control collection strategy. Logging flags (-Xlog:gc) enable observability. The third category is the most important — you cannot tune what you cannot measure.

JVM Memory Issues in Production: Debugging Guide (OOM, GC, Leaks) — When flags alone are not enough and you need live incident triage

io/thecodeforge/gc/ProductionJVMFlags.java · JAVA
package io.thecodeforge.gc;

/**
 * Production JVM flag configurations organized by collector.
 * These are starting points — tune based on measured workload characteristics.
 */
public class ProductionJVMFlags {

    /**
     * UNIVERSAL FLAGS (apply to all collectors):
     *
     * -Xms<size> -Xmx<size>         // Set min=max to avoid resize overhead
     * -XX:+AlwaysPreTouch             // Pre-zero heap pages at startup
     * -XX:+DisableExplicitGC          // Ignore System.gc() calls
     * -XX:+HeapDumpOnOutOfMemoryError // Auto heap dump on OOM
     * -XX:HeapDumpPath=/var/log/      // Where to write heap dumps
     * -XX:+UseContainerSupport        // Respect cgroup limits (default JDK 10+)
     * -XX:MaxRAMPercentage=75.0       // Set heap as % of container memory
     * -XX:NativeMemoryTracking=detail // Track off-heap memory usage
     *
     * LOGGING FLAGS (always enable in production):
     * -Xlog:gc*:file=/var/log/gc.log:time,uptime,level,tags:filecount=5,filesize=50m
     *
     * CRITICAL: Never set -Xmx equal to container memory limit.
     * JVM needs native memory for threads, metaspace, GC structures.
     * Rule: container_limit = Xmx * 1.15 (G1) or Xmx * 1.25 (ZGC)
     */

    /**
     * G1-SPECIFIC FLAGS:
     *
     * -XX:+UseG1GC
     * -XX:G1HeapRegionSize=<1-32m>    // Must be power of 2, 1MB-32MB
     * -XX:MaxGCPauseMillis=200        // Soft target (default 200ms)
     * -XX:InitiatingHeapOccupancyPercent=45  // When to start concurrent mark
     * -XX:G1ReservePercent=10         // Reserve buffer for evacuation
     * -XX:G1MixedGCCountTarget=8      // Spread mixed GC over N cycles
     * -XX:G1MixedGCLiveThresholdPercent=85  // Skip regions with >85% live
     *
     * When to adjust:
     * - Humongous objects → increase G1HeapRegionSize
     * - Frequent full GC → decrease InitiatingHeapOccupancyPercent to 35
     * - Long mixed GC pauses → increase G1MixedGCCountTarget
     */

    /**
     * ZGC-SPECIFIC FLAGS:
     *
     * -XX:+UseZGC
     * -XX:+ZGenerational              // JDK 21+ — ALWAYS enable
     * -XX:SoftMaxHeapSize=<size>      // Target occupancy (ZGC-specific)
     * -XX:ConcGCThreads=<count>       // Concurrent threads (default: auto)
     * -XX:+ZUncommit                  // Return unused memory to OS
     * -XX:ZUncommitDelay=300          // Seconds before uncommitting
     *
     * CRITICAL: ZGC does not support compressed oops.
     * On heaps < 32GB, this means ~15% more memory usage than G1.
     */

    /**
     * SHENANDOAH-SPECIFIC FLAGS:
     *
     * -XX:+UseShenandoahGC
     * -XX:ShenandoahGCHeuristics=adaptive  // adaptive, compact, or static
     * -XX:ShenandoahAllocationThreshold=10 // cycle after N% allocation
     * -XX:+UseCompressedOops              // Supported (unlike ZGC)
     *
     * Heuristic modes:
     * - adaptive: adjusts cycle frequency based on allocation rate (default)
     * - compact: more aggressive collection, lower heap usage
     * - static: fixed cycle interval, good for benchmarking
     */
}
Mental Model
The Flag Hierarchy — What to Tune First
The tuning priority order
  • First: Set -Xms = -Xmx to prevent resize overhead. Size heap based on container limits, not guesswork.
  • Second: Enable GC logging. You cannot tune what you cannot measure. This alone solves 50% of debugging issues.
  • Third: Adjust collector-specific flags only after measuring with logging enabled.
  • Never: Copy flags from blog posts without understanding your workload's allocation profile.
📊 Production Insight
The most impactful single flag change is enabling GC logging. Most production services run with default or minimal GC logging, making post-incident diagnosis impossible. A single line -Xlog:gc*:file=/var/log/gc.log:time,uptime,level,tags:filecount=5,filesize=50m provides pause time breakdowns, heap occupancy trends, and humongous allocation detection. Enable it before you need it — GC logs are retroactive only if they were already enabled.
🎯 Key Takeaway
Most JVM GC flags have sensible defaults. The three flags that matter most: (1) -Xms=-Xmx to prevent resize, (2) GC logging flags for observability, (3) collector-specific flags only after measuring. Never copy-paste JVM flags from the internet without profiling your own workload.
🗂 GC Collector Comparison — Production Reality
Real-world characteristics, not benchmark lab results
Characteristic | G1GC | ZGC | Shenandoah
JDK availability | JDK 7+ (default JDK 9+) | JDK 11+ (prod JDK 15+) | JDK 8+ (backports), prod JDK 15+
Typical pause time | 50-200ms (tunable to ~50ms) | < 10ms (independent of heap) | < 10ms (independent of heap)
Throughput overhead | Baseline (lowest) | 10-15% vs G1 | 5-10% vs G1
Native memory overhead | ~10-15% of heap | ~20-25% of heap | ~10-15% of heap + 8 bytes/object
Compressed oops | Supported | Not supported | Supported
Generational collection | Yes (built-in) | Yes (JDK 21+ with -XX:+ZGenerational) | No (full-heap concurrent)
Max tested heap | Terabytes | 16TB | Terabytes
Humongous objects | Problematic — requires tuning | No concept — handles large objects well | No concept — handles large objects well
Container friendliness | Good — predictable overhead | Poor — high native memory | Good — supports compressed oops
Allocation stall behavior | Full GC fallback (catastrophic) | Hard stall (backpressure) | Soft pacing (gradual degradation)
Tuning complexity | Moderate — many flags | Low — fewer flags, self-tuning | Low-moderate — heuristic modes
Best use case | General purpose, cost-sensitive | Ultra-low latency, large heaps | Low latency, memory-efficient, moderate heaps

🎯 Key Takeaways

  • GC reclaims memory from unreachable objects via tracing from GC roots — not reference counting. Cyclic references are handled correctly.
  • The generational heap exploits the weak generational hypothesis: most objects die young. Young collection is fast; old generation collection is expensive.
  • G1 is the right default for most workloads. Move to ZGC only when p99 latency must be below 10ms. Move to Shenandoah for moderate heaps needing low pauses with memory efficiency.
  • The most effective GC tuning is reducing allocation rate at the application level. No collector flag compensates for excessive allocation.
  • GC observability is non-negotiable. Enable logging with -Xlog:gc* before you need it — GC logs are retroactive only if they were already enabled.

⚠ Common Mistakes to Avoid

    Setting -Xmx without accounting for native memory overhead — JVM heap is not total JVM memory. GC internal structures, thread stacks, metaspace, and direct byte buffers all consume off-heap memory. Setting container memory limit equal to -Xmx guarantees OOM kills. — Fix: Use container_limit = Xmx * 1.15 (G1/Shenandoah) or Xmx * 1.25 (ZGC). Monitor with -XX:NativeMemoryTracking=detail.

    Choosing ZGC because 'lower pauses are always better' — ZGC's 10-15% throughput overhead and 25% native memory overhead are real costs. If your latency SLA is 200ms, G1 meets that comfortably. The throughput and memory savings with G1 translate to fewer pods and lower infrastructure cost. — Fix: Only adopt ZGC when your measured p99 latency with tuned G1 exceeds your SLA.

    Tuning GC flags without enabling detailed GC logging — Default GC logging is insufficient for production tuning. Without -Xlog:gc*,gc+phases=debug, you cannot see pause time breakdowns, humongous allocation rates, or evacuation failures. — Fix: Always enable: -Xlog:gc*,gc+phases=debug:file=gc.log:time,uptime,level,tags.

    Ignoring humongous allocations in G1 — Humongous objects (>50% of G1 region size) bypass normal region allocation and can trigger to-space exhausted failures. This is the #1 cause of unexpected full GC in G1-tuned services. — Fix: Monitor with -Xlog:gc+humongous=debug. Increase -XX:G1HeapRegionSize. Chunk large objects at the application level.

    Not setting Xms equal to Xmx for ZGC and Shenandoah — ZGC and Shenandoah perform best with a fixed heap size. Dynamic heap resizing adds unnecessary complexity. — Fix: Always set -Xms equal to -Xmx for production workloads with ZGC and Shenandoah.

    Measuring GC health by pause time alone — A collector with 5ms pauses that runs 1000 times per minute spends more time in GC than one with 50ms pauses that runs 10 times per minute. GC overhead = pause_time * frequency. — Fix: Track GC overhead percentage: total GC time / total elapsed time. Alert if > 5% for latency-sensitive services.
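The GC overhead percentage (total GC time / total elapsed time) can be computed from inside the JVM with the standard management API, without parsing logs. A minimal sketch — the class and method names here are illustrative, not part of any library:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverheadMonitor {

    // Percentage of elapsed time this JVM has spent in GC since startup.
    static double gcOverheadPercent(long uptimeMillis) {
        long totalGcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // accumulated millis; -1 if unsupported
            if (t > 0) totalGcMillis += t;
        }
        return uptimeMillis > 0 ? 100.0 * totalGcMillis / uptimeMillis : 0.0;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double overhead = gcOverheadPercent(uptime);
        System.out.printf("GC overhead: %.2f%%%n", overhead);
        if (overhead > 5.0) { // the alert threshold suggested above
            System.out.println("WARN: GC overhead above 5% of elapsed time");
        }
    }
}
```

Note that getCollectionTime() is cumulative since JVM start; for a rolling alert, sample it periodically and diff consecutive readings over the sample interval.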

Interview Questions on This Topic

  • Q: Explain how the JVM determines if an object is eligible for garbage collection. What are GC roots?
    The JVM uses reachability analysis starting from GC roots. GC roots include local variables on thread stacks, static fields of loaded classes, JNI references, and active monitors. An object is eligible for collection when no chain of references from any root can reach it. This tracing approach handles cyclic references correctly — unlike reference counting, which cannot collect cycles. In production, the most common memory leak pattern is objects retained through unintentional root chains: static Maps, ThreadLocals, or unremoved listeners.
  • Q: What is the generational hypothesis and why does every modern JVM collector use it?
    The weak generational hypothesis states that most objects die young, and objects that survive one collection are likely to survive many more. This is so statistically reliable that every modern collector exploits it by dividing the heap into young and old generations. Young collection only scans eden + survivor spaces, which is fast because most garbage is there. Old generation collection is expensive because it must handle long-lived object graphs. When this hypothesis fails — uniform object lifetimes, batch processing — you see high promotion rates and frequent old gen collections.
  • Q: Your service is running G1 GC. GC logs show 'to-space exhausted'. What does this mean and how do you fix it?
    To-space exhausted means G1 cannot find free regions to evacuate live objects into. This is a critical failure that triggers a full GC stop-the-world pause. Common causes: (1) Humongous objects consuming free regions faster than concurrent marking can reclaim them. (2) IHOP is miscalibrated — concurrent marking starts too late. (3) Allocation rate exceeds reclamation capacity. Fix: increase -XX:G1ReservePercent to 15, increase -XX:G1HeapRegionSize to reduce humongous threshold, lower -XX:InitiatingHeapOccupancyPercent to 30-35, and investigate allocation patterns at the application level.
  • Q: What is the fundamental trade-off between G1, ZGC, and Shenandoah?
    Each collector optimizes two of three axes: pause time, throughput, and memory efficiency. G1 maximizes throughput and memory efficiency, sacrificing pause-time predictability below ~50ms. ZGC maximizes pause-time guarantee and compaction, sacrificing throughput (10-15%) and memory (no compressed oops). Shenandoah maximizes pause-time guarantee and throughput balance, sacrificing per-object memory (8-byte Brooks pointers). No tuning can break this triangle — you choose which axis to sacrifice.
  • Q: How do you calculate the right container memory limit for a JVM running ZGC with a 32GB heap?
    ZGC's native memory overhead comes from multi-mapping (~15-20% of heap), no compressed oops (+15% heap usage for <32GB heaps), and GC internal structures. Formula: container_limit = Xmx × 1.25 = 32GB × 1.25 = 40GB. Additionally account for thread stacks (500 threads × 1MB = 500MB), metaspace (~200MB), and direct byte buffers. Total recommended: 42-44GB container limit.
  • Q: Why does setting -XX:MaxGCPauseMillis=200 not guarantee 200ms maximum pause with G1?
    MaxGCPauseMillis is a soft target that G1 uses to calibrate its region collection budget. G1 cannot enforce it when: (1) live data in a single region is large, (2) humongous objects bypass normal region collection, (3) to-space exhausted triggers a full GC, or (4) remark phase duration depends on reference processing workload. The flag influences G1's adaptive sizing decisions but does not impose a hard ceiling on any individual pause.

Frequently Asked Questions

What is Garbage Collection in Java in simple terms?

Garbage Collection is the JVM's automatic memory management: it traces which objects are still reachable from your running code (starting from GC roots) and reclaims the memory of everything else, so you never free memory manually. The cost is that the JVM must spend CPU time, and sometimes pause your application, to do this work; that cost is what GC algorithms and tuning are about.

Should I use G1, ZGC, or Shenandoah for my microservice?

Start with G1. If your p99 latency with tuned G1 exceeds your SLA, evaluate ZGC (for large heaps or ultra-low latency) or Shenandoah (for moderate heaps with memory constraints). Profile your actual workload — do not choose based on benchmarks.

How much heap should I allocate in a Kubernetes container?

Set -Xmx to container_memory_limit / 1.15 for G1 or Shenandoah, or container_memory_limit / 1.25 for ZGC. Always set -Xms equal to -Xmx. The remaining memory covers thread stacks, metaspace, GC native structures, and direct byte buffers.
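The arithmetic above can be sketched as a small helper. This is a minimal illustration using the ratios from this article (1.15 for G1/Shenandoah, 1.25 for ZGC); the class and method names are hypothetical:

```java
public class HeapSizer {

    /**
     * Suggested -Xmx in MB given a container memory limit, using the
     * headroom ratios recommended in this article (assumptions:
     * 1.15 for G1/Shenandoah, 1.25 for ZGC).
     */
    static long suggestedXmxMb(long containerLimitMb, boolean zgc) {
        double divisor = zgc ? 1.25 : 1.15;
        return (long) (containerLimitMb / divisor); // truncate, stay conservative
    }

    public static void main(String[] args) {
        // For a 4GB container:
        System.out.println(suggestedXmxMb(4096, false)); // G1/Shenandoah: 3561 MB
        System.out.println(suggestedXmxMb(4096, true));  // ZGC: 3276 MB
    }
}
```

In Kubernetes manifests this is typically expressed the other way around via -XX:MaxRAMPercentage (e.g. 87 for G1, 80 for ZGC), letting the JVM derive the heap from the cgroup limit.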

What is the difference between a young GC and a mixed GC in G1?

Young GC collects only eden and survivor regions. Mixed GC collects both young regions and old regions identified as mostly garbage during the preceding concurrent marking cycle. Mixed GCs are how G1 reclaims old generation space without a full GC.

Can I switch collectors without restarting the JVM?

No. The garbage collector is selected at JVM startup and cannot be changed at runtime. This is a fundamental JVM design constraint.

How do I know if my allocation rate is too high?

Calculate allocation rate from jstat output: (bytes allocated between samples) / time interval. If allocation rate consistently exceeds 2GB/sec and you are seeing GC pressure, the rate is too high. Profile with async-profiler or JFR to identify allocation hotspots.
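Besides jstat, the allocation rate can be approximated in-process on HotSpot JVMs via the com.sun.management extension of ThreadMXBean. A hedged sketch (class name is illustrative; threads that have already exited are not counted, so this undercounts slightly):

```java
import java.lang.management.ManagementFactory;

public class AllocationRateProbe {

    // Sums allocated bytes across all live threads (HotSpot-specific API).
    static long totalAllocatedBytes(com.sun.management.ThreadMXBean tmx) {
        long sum = 0;
        for (long id : tmx.getAllThreadIds()) {
            long bytes = tmx.getThreadAllocatedBytes(id); // -1 if unavailable
            if (bytes > 0) sum += bytes;
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        com.sun.management.ThreadMXBean tmx =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long before = totalAllocatedBytes(tmx);
        Thread.sleep(1_000); // sample interval, mirroring 'jstat ... 1000'
        long after = totalAllocatedBytes(tmx);
        double mbPerSec = (after - before) / (1024.0 * 1024.0);
        System.out.printf("Approx allocation rate: %.1f MB/s%n", mbPerSec);
    }
}
```

For production measurement, prefer JFR's jdk.ObjectAllocationSample events or async-profiler's alloc mode, which also attribute allocations to call sites.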

🔥 Naren, Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged