Junior 4 min · April 04, 2026

JVM GC Tuning — G1 Humongous Allocation Storm

Payment p99 latency spiked from 45ms to 12s, caused by G1 humongous allocations (3-5MB protobuf messages).

Naren · Founder
Quick Answer
  • G1GC: Default JVM collector (Java 9+). Balances throughput and latency using region-based heap partitioning and concurrent marking. Right choice for most workloads.
  • ZGC: Targets sub-10ms pause times via fully concurrent marking, relocation, and reference processing. Costs 10–15% throughput and ~25% extra native memory overhead.
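  • Shenandoah: Similar low-pause goals to ZGC, built on concurrent evacuation with a per-object forwarding pointer (a separate Brooks pointer word before JDK 13, the object header's mark word since). Works with compressed oops, so it is often more memory-efficient than ZGC on heaps under 32GB, but it has narrower platform support.
  • Key trade-off: Each collector balances throughput, pause time, and memory footprint differently. No collector maximises all three axes simultaneously.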
  • Biggest mistake: Choosing a collector without matching it to your workload profile — leads to unnecessary latency or wasted CPU.
  • Important: GC tuning is not portable. Flags and behaviour differ significantly across G1, ZGC, and Shenandoah — never copy-paste tuning flags between collectors.
Plain-English First

Think of GC as a janitor cleaning a warehouse while workers keep stocking shelves. G1 is a methodical janitor who cleans room-by-room with short pauses. ZGC is a ghost janitor who cleans almost invisibly but needs more helpers. Shenandoah is another ghost janitor with a different cleaning strategy. Each janitor trades off how much disruption they cause against how many workers (CPU) they need to do the job.

Garbage collection tuning is the single most impactful JVM performance lever after algorithm design. Yet most teams default to G1 without understanding whether their workload demands ZGC or Shenandoah. The wrong collector choice manifests as either unexplained latency spikes or wasted CPU capacity — both invisible until they compound under production load.

This guide covers G1, ZGC, and Shenandoah from a production operator's perspective. Each section includes failure scenarios, tuning knobs, and trade-off analysis grounded in real production incidents. No toy examples — every configuration reflects what actually breaks in the field.

The core misconception: GC tuning is about eliminating pauses. It is not. GC tuning is about aligning pause behavior with your application's latency budget and throughput requirements. A 200ms pause is catastrophic for a trading engine and irrelevant for a batch ETL job. Context determines correctness.

What G1 Humongous Allocation Storm Actually Means

A humongous allocation storm occurs when an application allocates objects larger than half a G1 region (512 KB with 1 MB regions, up to 16 MB with 32 MB regions) faster than G1 can reclaim them. G1 treats these oversized objects as 'humongous' and allocates each one directly in a contiguous run of regions, bypassing the normal young generation path. Because free regions may be scattered rather than contiguous, such an allocation can fail even when total free space looks adequate. When many of these allocations happen in rapid succession, G1 starts concurrent marking cycles more often to reclaim space, which degrades throughput and can end in a full stop-the-world (STW) collection.

G1 sizes its regions ergonomically: it aims for roughly 2048 regions and rounds the region size to a power of two between 1 MB and 32 MB, so a small heap gets 1 MB regions unless -XX:G1HeapRegionSize says otherwise. A humongous object occupies whole regions, and the unused tail of its last region is wasted. Those regions are normally freed at the end of a concurrent mark cycle or during a full GC (recent JDKs can also eagerly reclaim some dead humongous arrays during young collections). During a storm, the heap fragments rapidly because G1 does not move humongous objects during normal evacuation. The result: evacuation failures (to-space exhausted), increased concurrent cycle frequency, and eventually a full GC that pauses all application threads for seconds.

You encounter this in systems with large caches, byte buffers, or serialized payloads that exceed the region size. The fix is not just increasing heap size — it's controlling the size and lifetime of large allocations. Use -XX:G1HeapRegionSize to tune region size, or pool and reuse large objects to avoid repeated humongous allocations.
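To make the threshold arithmetic concrete, here is a minimal sketch. The class `HumongousThreshold` and its methods are hypothetical names introduced for illustration, and `estimateRegionSize` only approximates G1's ergonomic default (heap size divided by 2048, rounded down to a power of two, clamped to 1-32 MB); the exact ergonomics vary between JDK versions, so treat the result as an estimate and confirm it against your GC log.

```java
public final class HumongousThreshold {

    private static final long MB = 1024L * 1024L;
    private static final long MIN_REGION = 1 * MB;
    private static final long MAX_REGION = 32 * MB;
    private static final long TARGET_REGION_COUNT = 2048;

    private HumongousThreshold() {}

    /** Approximates the ergonomic region size (assumption: heap / 2048, power of two, 1-32 MB). */
    public static long estimateRegionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGION_COUNT, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(raw); // round down to a power of two
        return Math.min(powerOfTwo, MAX_REGION);
    }

    /** An object is treated as humongous when it is larger than half a region. */
    public static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of whole regions a humongous object of this size would occupy. */
    public static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long payload = 3 * MB; // e.g. a 3 MB protobuf buffer
        for (long heapGb : new long[] {4, 16}) {
            long region = estimateRegionSize(heapGb * 1024 * MB);
            System.out.printf("heap=%dGB region=%dMB humongous=%b regions=%d%n",
                    heapGb, region / MB, isHumongous(payload, region),
                    regionsNeeded(payload, region));
        }
    }
}
```

Under these assumptions a 3 MB payload is humongous on a 4 GB heap (2 MB regions) but not on a 16 GB heap (8 MB regions), which is why the same code can behave differently across environments.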

Humongous ≠ Large Object in Old Gen
Humongous objects are not promoted through young gen — they are allocated directly in the old gen, so they never benefit from minor GC evacuation.
Production Insight
A team saw 5-second full GC pauses every 10 minutes after deploying a new gRPC service that allocated 2 MB response buffers per request.
Symptom: G1 concurrent mark cycles ran back-to-back, and the 'Humongous Allocation' count in GC logs exceeded 10,000 per minute.
Rule: If your application allocates objects larger than 512 KB at a rate above 100/sec, either increase G1HeapRegionSize to 4 MB or switch to direct ByteBuffer pooling.
Key Takeaway
Humongous allocations bypass young GC entirely and fragment the heap.
A storm is diagnosed by frequent concurrent marks and rising 'Humongous Allocation' counts in GC logs.
Control object size below half a region, or pool large objects — never rely on G1 to compact them.
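The pooling advice can be sketched as follows. `LargeBufferPool` is a hypothetical class, not part of any library; it pre-allocates a fixed number of large buffers once and hands them out for reuse. The pooled buffers may still be humongous, but they are allocated at startup and stay alive, so the steady-state request path stops creating new humongous objects.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public final class LargeBufferPool {

    private final BlockingQueue<byte[]> free;
    private final int bufferSize;

    public LargeBufferPool(int bufferCount, int bufferSize) {
        this.bufferSize = bufferSize;
        this.free = new ArrayBlockingQueue<>(bufferCount);
        for (int i = 0; i < bufferCount; i++) {
            free.add(new byte[bufferSize]); // one-time allocation at startup
        }
    }

    /** Returns a pooled buffer, or a freshly allocated one if the pool is empty. */
    public byte[] acquire() {
        byte[] buffer = free.poll();
        return buffer != null ? buffer : new byte[bufferSize];
    }

    /** Returns a buffer to the pool; wrong-sized or surplus buffers are dropped for GC. */
    public void release(byte[] buffer) {
        if (buffer != null && buffer.length == bufferSize) {
            free.offer(buffer); // offer() silently rejects when the pool is full
        }
    }

    /** Number of buffers currently available for reuse. */
    public int available() {
        return free.size();
    }
}
```

Callers should release in a `finally` block and must not keep using a buffer after releasing it. Falling back to a fresh allocation when the pool is empty keeps the service running under bursts, at the cost of occasional humongous allocations; blocking instead would trade that for backpressure.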

G1GC — The Workhorse Collector

G1 (Garbage-First) has been the default JVM collector since Java 9. It divides the heap into equal-sized regions (1MB to 32MB) and prioritizes collecting regions with the most garbage — hence 'garbage-first'. G1 maintains a remembered set per region tracking incoming references, enabling independent region collection without scanning the entire heap.

G1 operates in young-only and mixed collection cycles. Young GC collects survivor and eden regions. When the heap occupancy exceeds the Initiating Heap Occupancy Percent (IHOP), G1 triggers a concurrent marking cycle. After marking completes, subsequent mixed GCs collect both young and old regions identified as mostly garbage.

The critical production insight: G1's pause time is primarily driven by the number of regions it must collect in a single pause, not heap size. A 64GB heap with aggressive evacuation can pause longer than a 4GB heap with conservative settings. This is the opposite of what most engineers assume.

io/thecodeforge/gc/G1TuningExample.java (Java)
package io.thecodeforge.gc;

import java.util.concurrent.ConcurrentHashMap;
import java.util.Map;

/**
 * Demonstrates allocation patterns that stress G1 differently.
 *
 * Key insight: G1 humongous objects (>50% region size) bypass normal
 * allocation and can trigger to-space exhausted failures.
 */
public class G1TuningExample {

    // Cache with large value objects — common source of humongous allocations
    private final Map<String, byte[]> payloadCache = new ConcurrentHashMap<>();

    /**
     * BAD: Allocates objects that may exceed humongous threshold.
     * With default 1MB region size, objects > 512KB are humongous.
     * With 32MB regions, threshold is 16MB — much safer for large payloads.
     *
     * Tuning: -XX:G1HeapRegionSize=32M
     *         -XX:G1ReservePercent=15
     *         -XX:InitiatingHeapOccupancyPercent=35
     */
    public void cacheLargePayload(String key, int sizeBytes) {
        byte[] payload = new byte[sizeBytes];
        // Simulate deserialization fill
        for (int i = 0; i < Math.min(sizeBytes, 1024); i++) {
            payload[i] = (byte) (i & 0xFF);
        }
        payloadCache.put(key, payload);
    }

    /**
     * BETTER: Chunk large payloads to stay below humongous threshold.
     * Each chunk is independently collectible as a regular object.
     */
    public void cacheChunkedPayload(String key, byte[] fullPayload) {
        int chunkSize = 256 * 1024; // 256KB chunks — well below humongous threshold
        int numChunks = (fullPayload.length + chunkSize - 1) / chunkSize;

        for (int i = 0; i < numChunks; i++) {
            int offset = i * chunkSize;
            int length = Math.min(chunkSize, fullPayload.length - offset);
            byte[] chunk = new byte[length];
            System.arraycopy(fullPayload, offset, chunk, 0, length);
            payloadCache.put(key + ":chunk:" + i, chunk);
        }
    }

    /**
     * Production G1 flags for a 16GB heap with mixed allocation profile:
     *
     * -XX:+UseG1GC
     * -Xms16g -Xmx16g
     * -XX:G1HeapRegionSize=16m
     * -XX:MaxGCPauseMillis=200
     * -XX:G1ReservePercent=15
     * -XX:InitiatingHeapOccupancyPercent=35
     * -XX:G1MixedGCCountTarget=8
     * -XX:G1MixedGCLiveThresholdPercent=85
     * -Xlog:gc*,gc+humongous=debug:file=/var/log/gc.log:time,uptime,level,tags
     */
}
G1's Core Mental Model: Region-Based Evacuation
  • Pause time scales with live data in collected regions, not total heap size
  • Humongous objects break this model — they span multiple regions and cannot be partially evacuated
  • Remembered sets consume 5-10% of heap as off-heap overhead — budget for this when setting -Xmx
  • To-space exhausted means G1 literally ran out of regions to evacuate into — this is a full GC fallback
Production Insight
G1's -XX:MaxGCPauseMillis is a soft target, not a hard guarantee. G1 will attempt to meet this by adjusting how many regions to collect per cycle, but allocation rate spikes can violate it. If you need hard latency guarantees, G1 is the wrong collector. Monitor actual pause times against your SLA — if G1 violates MaxGCPauseMillis more than 5% of the time, the workload demands ZGC or Shenandoah.
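One way to apply the 5% rule is to compute the violation rate from pause durations you have already extracted from GC logs. The sketch below assumes the pauses are available as a list of milliseconds; `PauseBudgetCheck` and its method names are hypothetical, and log parsing is deliberately left out.

```java
import java.util.List;

public final class PauseBudgetCheck {

    private PauseBudgetCheck() {}

    /** Fraction of pauses (0.0 to 1.0) that exceeded the target pause time. */
    public static double violationRate(List<Double> pausesMs, double targetMs) {
        if (pausesMs.isEmpty()) {
            return 0.0;
        }
        long over = pausesMs.stream().filter(p -> p > targetMs).count();
        return (double) over / pausesMs.size();
    }

    /** True when the violation rate is above the tolerated rate, e.g. 0.05 for 5%. */
    public static boolean exceedsBudget(List<Double> pausesMs, double targetMs, double maxRate) {
        return violationRate(pausesMs, targetMs) > maxRate;
    }

    public static void main(String[] args) {
        List<Double> pauses = List.of(50.0, 120.0, 250.0, 180.0);
        System.out.println("violation rate: " + violationRate(pauses, 200.0));
        System.out.println("over budget: " + exceedsBudget(pauses, 200.0, 0.05));
    }
}
```

Evaluate this over a window that includes peak traffic; a rate computed from quiet hours will hide the allocation spikes that cause the violations.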
Key Takeaway
G1 is the right default for most workloads, but it has a hard ceiling on pause-time predictability. Once your latency budget drops below ~100ms p99, evaluate ZGC or Shenandoah. Never tune G1 without GC logs enabled — the default logging is insufficient for production diagnosis.
G1 Tuning Decision Tree
If: Humongous allocations appear in GC logs
Use: Increase -XX:G1HeapRegionSize to raise the humongous threshold (half a region), so fewer objects qualify as humongous. Max region size is 32MB. Chunk large objects at the application level if possible.
If: Mixed GCs are too frequent, causing throughput loss
Use: Increase -XX:G1MixedGCCountTarget (default 8) to spread old-region collection over more, shorter mixed GCs. Lower -XX:G1MixedGCLiveThresholdPercent (default 85) so that only regions with more garbage are collected.
If: Full GC appears despite adequate heap
Use: IHOP is miscalibrated. Set -XX:InitiatingHeapOccupancyPercent lower (try 35), and confirm that adaptive IHOP (-XX:+G1UseAdaptiveIHOP, on by default since JDK 9) has not been disabled, so G1 can self-tune.
If: Pause times exceed MaxGCPauseMillis consistently
Use: Live data set is too large for G1's evacuation budget. Either reduce live data (caching strategy) or migrate to ZGC/Shenandoah where pause times are independent of live data size.
If: High remembered set overhead consuming native memory
Use: Check -XX:G1RSetUpdatingPauseTimePercent (default 10). If RSet maintenance is expensive, reduce cross-region references by improving object locality at the application level.

ZGC — Sub-Millisecond Pause Collector

ZGC (Z Garbage Collector) was introduced as an experimental feature in JDK 11 and became production-ready in JDK 15. Its defining characteristic: pause times stay below 10ms regardless of heap size — tested up to 16TB heaps. ZGC achieves this through concurrent everything: marking, relocation, and reference processing all happen while application threads run.

ZGC uses load barriers with colored pointers (the original, non-generational design has no write barriers). Every object reference in ZGC carries metadata bits (marked0, marked1, remapped, finalizable) embedded in the pointer itself. The load barrier runs whenever the application loads an object reference from the heap and checks whether that reference needs remapping. This is the fundamental trade-off: ZGC replaces long GC pauses with a small overhead on every reference load.
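The fast-path/slow-path idea behind a load barrier can be illustrated with a toy model. This is not how HotSpot implements ZGC: the class `ColoredPointerToy`, the 44-bit address width, and the bit positions are assumptions chosen for readability, and the slow path here only recolors the reference, whereas a real collector would also mark or relocate the object.

```java
public final class ColoredPointerToy {

    // Illustrative layout only: low 44 bits hold the address, the next 4 bits hold the color.
    private static final int ADDRESS_BITS = 44;
    private static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    public static final long MARKED0 = 1L << ADDRESS_BITS;
    public static final long MARKED1 = 1L << (ADDRESS_BITS + 1);
    public static final long REMAPPED = 1L << (ADDRESS_BITS + 2);
    public static final long FINALIZABLE = 1L << (ADDRESS_BITS + 3);

    private long goodColor = REMAPPED; // the color the collector currently considers valid
    private int slowPathHits = 0;

    /** Builds a colored reference from a raw address and exactly one color bit. */
    public static long color(long address, long colorBit) {
        return (address & ADDRESS_MASK) | colorBit;
    }

    /** Strips the color bits and returns the raw address. */
    public static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** The collector changes the good color when it starts a new phase. */
    public void flipGoodColor(long newGoodColor) {
        goodColor = newGoodColor;
    }

    /** Models a load barrier: return at once if the color is good, else heal the reference. */
    public long load(long pointer) {
        if ((pointer & goodColor) != 0) {
            return pointer; // fast path: a single bit test
        }
        slowPathHits++;
        return color(address(pointer), goodColor); // slow path: recolor ("heal") the reference
    }

    public int slowPathHits() {
        return slowPathHits;
    }
}
```

The point of the model is the cost profile: every load pays for one bit test, and only references with a stale color pay for the slow path, once, because the healed reference passes the test on the next load.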

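As of JDK 21, ZGC supports generational mode (-XX:+ZGenerational), which dramatically improves throughput by collecting young objects more frequently than old ones. Non-generational ZGC collects the entire heap every cycle, which limits throughput on allocation-heavy workloads. Generational mode became the default in JDK 23, and JDK 24 removed the non-generational mode, so the flag is only needed on JDK 21 and 22.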

io/thecodeforge/gc/ZGCTuningExample.java (Java)
package io.thecodeforge.gc;

import java.util.concurrent.atomic.AtomicLong;

/**
 * ZGC-specific considerations for production workloads.
 *
 * ZGC trades per-access overhead for near-zero pause times.
 * The load barrier adds ~4-8% overhead on pointer-heavy workloads.
 */
public class ZGCTuningExample {

    private final AtomicLong allocationCounter = new AtomicLong(0);

    /**
     * ZGC is sensitive to allocation rate, not allocation size.
     * A high allocation rate stresses ZGC because the concurrent collector
     * must reclaim memory at least as fast as the application allocates it;
     * if it falls behind, allocating threads stall.
     *
     * Monitor: jstat -gcutil <pid> 1000
     * Watch: ZGC cycle count and allocation rate.
     */
    public void highFrequencyAllocation() {
        // 1000 short-lived allocations per call. Load barriers run only when
        // references are read back from the heap, not at allocation time.
        // Generational ZGC reclaims garbage like this cheaply.
        for (int i = 0; i < 1000; i++) {
            Object temp = new Object();
            allocationCounter.incrementAndGet();
            // temp is immediately eligible for collection
        }
    }

    /**
     * Production ZGC flags for a 32GB heap, latency-sensitive service:
     *
     * -XX:+UseZGC
     * -XX:+ZGenerational              // JDK 21+ — critical for throughput
     * -Xms32g -Xmx32g                // Always set Xms=Xmx for ZGC
     * -XX:SoftMaxHeapSize=28g         // ZGC-specific: target heap occupancy
     * -XX:ZCollectionInterval=5       // Suggest GC cycle every 5 seconds
     * -XX:ConcGCThreads=4             // Concurrent GC threads (default: auto)
     * -XX:ParallelGCThreads=8         // Parallel GC threads
     * -Xlog:gc*:file=/var/log/zgc.log:time,uptime,level,tags
     *
     * CRITICAL: ZGC uses ~20% native memory overhead beyond -Xmx.
     * Container memory limit must be heap * 1.25 minimum.
     */

    /**
     * ZGC SoftMaxHeapSize is unique — it tells ZGC to try to stay below
     * this threshold but can exceed it under allocation pressure.
     *
     * Use case: Set heap to 32GB, SoftMaxHeapSize to 28GB.
     * ZGC will trigger cycles aggressively to stay under 28GB.
     * Only allocates into the remaining 4GB under extreme pressure.
     * This prevents container OOM kills while keeping a safety margin.
     */
    public void demonstrateSoftMaxHeapConcept() {
        // With SoftMaxHeapSize=28g and Xmx=32g:
        // - ZGC targets 28GB occupancy
        // - If allocation pressure pushes past 28GB, ZGC cycles more aggressively
        // - Only if allocation rate exceeds reclamation rate does it use 28-32GB
        // - If it hits 32GB, allocation stalls (not OOM, but backpressure)
        //
        // This is fundamentally different from G1's IHOP which just triggers
        // a marking cycle. SoftMaxHeapSize is a continuous pressure signal.
    }
}
ZGC's Core Mental Model: Colored Pointers + Load Barriers
  • Pause times are truly independent of heap size and live data size — tested to 16TB
  • The trade-off is per-access CPU overhead, not pause time — you pay on every object load, not during GC
  • Colored pointers keep their metadata in spare bits of 64-bit references, which is why ZGC requires a 64-bit platform
  • Generational ZGC (JDK 21+) recovers much of the throughput cost by collecting young objects more often than old ones
  • ZGC cannot use compressed object pointers (UseCompressedOops), which increases memory usage by ~15% on heaps < 32GB
Production Insight
ZGC's biggest production risk is memory accounting. Non-generational ZGC maps the heap at several virtual addresses (multi-mapping, used for colored pointer management), which consumes virtual address space and can make some tools over-report resident memory. Add ZGC's genuine off-heap structures and a tightly limited container can be OOM-killed even when heap usage is well below -Xmx. Budget container memory as heap × 1.25 for ZGC versus heap × 1.15 for G1. Also, ZGC requires a 64-bit system with specific OS support: it does not run on 32-bit platforms or certain older Linux kernels.
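The budgeting rule above is simple arithmetic, sketched here in both directions. `ContainerHeapBudget` is a hypothetical helper, and the multipliers 1.15 and 1.25 are this article's rules of thumb rather than values defined by the JVM; measure native memory on your own workload before relying on them.

```java
public final class ContainerHeapBudget {

    /** Rule-of-thumb multipliers from this article, not JVM-defined constants. */
    public enum Collector {
        G1(1.15),
        ZGC(1.25);

        final double multiplier;

        Collector(double multiplier) {
            this.multiplier = multiplier;
        }
    }

    private ContainerHeapBudget() {}

    /** Container memory limit needed for a given -Xmx, rounded up to a whole byte. */
    public static long containerLimitFor(long xmxBytes, Collector collector) {
        return (long) Math.ceil(xmxBytes * collector.multiplier);
    }

    /** Largest -Xmx that fits a fixed container limit, rounded down to a whole byte. */
    public static long maxHeapFor(long containerLimitBytes, Collector collector) {
        return (long) Math.floor(containerLimitBytes / collector.multiplier);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        System.out.println("ZGC, Xmx 32 GiB needs "
                + containerLimitFor(32 * gib, Collector.ZGC) / gib + " GiB");
        System.out.println("G1 in a 40 GiB container allows Xmx of about "
                + maxHeapFor(40 * gib, Collector.G1) / gib + " GiB");
    }
}
```

Thread stacks, metaspace, and direct buffers come on top of the collector's own overhead, so a real budget usually needs more headroom than these multipliers alone suggest.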
Key Takeaway
ZGC is the correct choice when p99 latency must be below 10ms and you can afford 10-15% throughput overhead. Enable generational mode on JDK 21+ — non-generational ZGC is a throughput disaster on allocation-heavy workloads. Budget 25% extra native memory beyond heap size. ZGC's SoftMaxHeapSize is the most underrated production feature for containerized deployments.
ZGC Tuning Decision Tree
If: Pause times are still above 10ms with ZGC
Use: Check if you are on JDK < 15 (experimental mode has higher pauses). Verify -XX:+ZGenerational is enabled on JDK 21+. Check for allocation stalls in GC logs: these are not pauses but backpressure events.
If: Throughput is 10-15% lower than G1 on same workload
Use: This is expected without generational mode. Enable -XX:+ZGenerational. If already enabled, profile allocation rate. ZGC's load barrier overhead scales with pointer-heavy object graphs. Consider object layout optimization.
If: Container OOM kills despite heap usage below Xmx
Use: Native memory overhead. Start the JVM with -XX:NativeMemoryTracking=summary and run jcmd <pid> VM.native_memory summary. ZGC's off-heap GC structures can be significant. Increase container limit or reduce SoftMaxHeapSize.
If: Allocation stalls appear in GC logs
Use: ZGC cannot keep up with allocation rate. Increase -XX:ConcGCThreads. Reduce allocation rate at application level. Set SoftMaxHeapSize lower to trigger cycles earlier.
If: Running on JDK 11-14
Use: ZGC is experimental and lacks generational support. Pause times may exceed targets. Upgrade to JDK 21+ or fall back to G1 with aggressive tuning.

Shenandoah — Red Hat's Low-Pause Contender

Shenandoah is Red Hat's concurrent compacting collector. It arrived in JDK 12 as an experimental feature, became a production feature in JDK 15, and has been backported to JDK 8 and 11 via the Shenandoah project. It achieves low pause times through concurrent evacuation: live objects are moved while application threads run, and a forwarding pointer per object tells the barriers where the current copy lives.

Shenandoah's architecture differs from ZGC in a critical way: it tracks moved objects with a forwarding pointer in the object rather than with colored pointers. The original design kept this in a dedicated extra word, the Brooks pointer; since JDK 13 it is stored in the object header's mark word, so the extra word applies only to older builds. Either way, Shenandoah does not require specific pointer bit layouts and works with compressed oops, reducing memory overhead compared to ZGC on heaps under 32GB.

Shenandoah operates in three concurrent phases: concurrent mark (identify live objects), concurrent evacuate (move live objects out of garbage-heavy regions), and concurrent update-refs (fix pointers to moved objects). The initial mark and final mark phases are short stop-the-world pauses, typically under 10ms.

io/thecodeforge/gc/ShenandoahTuningExample.java (Java)
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;

/**
 * Shenandoah-specific production considerations.
 *
 * The original Shenandoah design uses Brooks pointers: every object has an
 * extra forwarding pointer field, adding 8 bytes per object on 64-bit systems.
 * Since JDK 13 the forwarding pointer lives in the mark word instead, so the
 * estimate below applies only to older builds.
 * For applications with millions of small objects, this overhead is measurable.
 */
public class ShenandoahTuningExample {

    /**
     * Brooks pointer overhead calculation (builds before JDK 13):
     *
     * Object with 2 fields (16 bytes header + 16 bytes data = 32 bytes)
     * + 8 bytes Brooks pointer = 40 bytes per object
     * Overhead: 25% increase per object
     *
     * For 10 million small objects: ~80MB additional memory
     * For 100 million small objects: ~800MB additional memory
     *
     * Compare to ZGC: no per-object overhead, but ~15% heap overhead
     * from multi-mapping and compressed oops unavailability.
     */
    public long estimateBrooksOverhead(int objectCount) {
        return (long) objectCount * 8; // 8 bytes per Brooks pointer
    }

    /**
     * Production Shenandoah flags for a 16GB heap:
     *
     * -XX:+UseShenandoahGC
     * -Xms16g -Xmx16g
     * -XX:ShenandoahGCHeuristics=adaptive  // or 'compact', 'static'
     * -XX:ShenandoahAllocationThreshold=10  // trigger cycle after 10% allocation
     * -XX:+UseCompressedOops               // works with Shenandoah (unlike ZGC)
     * -XX:+UseCompressedClassPointers
     * -Xlog:gc*:file=/var/log/shenandoah.log:time,uptime,level,tags
     *
     * Heuristic modes:
     * - adaptive: (default) adjusts cycle frequency based on allocation rate
     * - compact: more aggressive collection, lower heap usage, slightly higher CPU
     * - static: starts cycles at fixed heap-occupancy thresholds, predictable behavior for benchmarking
     */

    /**
     * Shenandoah pacing is a unique feature that backpressures allocation
     * threads when the collector falls behind.
     *
     * Unlike ZGC which stalls allocation entirely, Shenandoah slows down
     * allocating threads proportionally. This creates smoother latency
     * degradation under load rather than sharp spikes.
     *
     * Monitor: -Xlog:gc+stats to see pacing delays.
     * If pacing delays exceed 1ms consistently, allocation rate is too high
     * for the current heap size and ConcGCThreads setting.
     */
    public void demonstratePacingBehavior() {
        List<byte[]> allocations = new ArrayList<>();

        // Under heavy allocation, Shenandoah will pace this loop
        // by adding small delays to each allocation.
        // The delay is proportional to how far behind the collector is.
        //
        // This is different from ZGC's allocation stall which is a hard stop.
        // Shenandoah's pacing creates gradual degradation.
        for (int i = 0; i < 100_000; i++) {
            allocations.add(new byte[1024]); // 1KB each
        }
    }
}
Shenandoah's Core Mental Model: Brooks Pointers + Concurrent Evacuation
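  • Barriers resolve the forwarding pointer so application threads always see the current copy of an object; since JDK 13 this is a load-reference barrier, similar in spirit to ZGC's load barrier
  • Works with compressed oops, which saves ~15% memory compared to ZGC on heaps under 32GB
  • Per-object overhead of 8 bytes applies only to builds with a separate Brooks pointer word (before JDK 13); it is significant there for workloads with many small objects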
  • Concurrent evacuation means compaction happens while application runs — less fragmentation than G1
  • Pacing mechanism creates graceful degradation instead of hard allocation stalls
Production Insight
On builds that still use a separate Brooks pointer word (before JDK 13), Shenandoah's biggest production risk is the per-object overhead on small-object-heavy workloads. If your service has 100M+ objects under 64 bytes, an 8-byte Brooks pointer per object adds ~800MB that is easy to miss when sizing the heap. Count objects with a class histogram (jcmd <pid> GC.class_histogram) before estimating. Additionally, Shenandoah's pacing mechanism can create subtle latency degradation that is hard to distinguish from application-level slowness, so always correlate pacing delays with latency metrics.
Key Takeaway
Shenandoah is the right choice when you need low-pause GC on moderate heaps (< 32GB) and want compressed oops support. Its pacing mechanism creates smoother latency degradation than ZGC's allocation stalls. On builds before JDK 13 the Brooks pointer is the hidden cost: budget 8 bytes per object. Shenandoah is less battle-tested at extreme scale than ZGC but offers better memory efficiency on medium heaps.
Shenandoah Tuning Decision Tree
If: Pacing delays visible in GC logs, application feels slow
Use: Allocation rate exceeds collector capacity. Increase -XX:ConcGCThreads. Reduce allocation rate. Consider increasing heap size, because Shenandoah pacing is proportional to how close you are to heap exhaustion.
If: Higher memory usage than expected with same heap settings
Use: On builds before JDK 13, suspect Brooks pointer overhead. Profile object count. If > 50M objects, the 8-byte-per-object overhead is significant. Consider ZGC if object count is high and you can afford compressed oops being disabled. On newer builds there is no extra word per object, so check native memory tracking instead.
If: Pause times higher than expected (>10ms)
Use: Check which heuristic is in use. The 'compact' heuristic can cause longer pauses during aggressive compaction. Switch to 'adaptive'. Also check -XX:ParallelGCThreads: too few threads lengthen the stop-the-world mark phases.
If: Running on JDK 8 or 11
Use: Shenandoah is available via backports but may lack optimizations from newer JDK versions. Verify the specific backport version. JDK 17+ Shenandoah is significantly more mature.
If: Need to choose between Shenandoah and ZGC
Use: Shenandoah wins on heaps < 32GB where compressed oops matter (saves 15% memory). ZGC wins on larger heaps and when generational mode is needed. Shenandoah's pacing is gentler than ZGC's allocation stalls.

Comparing the Three Collectors — Real Trade-offs

The collector choice is not about which is 'best'. It is about matching the collector's trade-off profile to your workload's requirements. Every collector sacrifices something: G1 sacrifices pause-time predictability for throughput. ZGC sacrifices throughput and memory for near-zero pauses. Shenandoah sacrifices some throughput to barrier and pacing overhead (plus per-object memory on builds before JDK 13) for balanced pause-throughput behavior.

The following comparison reflects production reality, not benchmark lab conditions. Real workloads have allocation spikes, mixed object lifetimes, and container constraints that change the calculus entirely.

io/thecodeforge/gc/CollectorSelectionGuide.java (Java)
package io.thecodeforge.gc;

/**
 * Decision framework for collector selection based on production constraints.
 *
 * No collector is universally superior. This guide maps workload
 * characteristics to the appropriate collector.
 */
public class CollectorSelectionGuide {

    /**
     * SELECTION MATRIX:
     *
     * Workload Profile          | Recommended Collector | Reason
     * --------------------------|-----------------------|------------------
     * General web service       | G1                    | Good balance, mature ecosystem
     * Sub-10ms latency SLA      | ZGC (generational)    | Pauses independent of heap size
     * Sub-10ms + <32GB heap     | Shenandoah            | Compressed oops, pacing
     * Batch processing          | G1 or Parallel        | Throughput over latency
     * Large heap (>64GB)        | ZGC                   | Pause times do not scale with heap
     * Small heap (<4GB)         | G1                    | Overhead of ZGC/Shenandoah unjustified
     * Container-constrained     | G1 or Shenandoah      | Lower native memory overhead
     * High allocation rate      | ZGC (generational)    | Generational mode handles young gen
     * Mixed object lifetimes    | G1                    | Region-based collection handles this well
     * Many small objects        | ZGC                   | No per-object overhead
     * Many large objects        | G1 (large regions)    | Humongous object handling
     */

    /**
     * NATIVE MEMORY OVERHEAD COMPARISON (approximate, production values):
     *
     * G1:
     *   - Remembered sets: 5-10% of heap
     *   - Card table: ~0.2% of heap
     *   - Total native overhead: ~10-15% of heap
     *
     * ZGC:
     *   - Multi-mapping: ~15-20% of heap (virtual address space)
     *   - Page table overhead: variable
     *   - No compressed oops: +15% heap usage for <32GB heaps
     *   - Total native overhead: ~20-25% of heap
     *
     * Shenandoah:
     *   - Brooks pointers: 8 bytes per object (builds before JDK 13 only)
     *   - Remembered sets: ~5% of heap
     *   - Compressed oops: supported (saves ~15% vs ZGC)
     *   - Total native overhead: ~10-15% of heap + per-object cost where it applies
     */

    /**
     * CONTAINER MEMORY FORMULA:
     *
     * G1:        container_limit = Xmx * 1.15
     * ZGC:       container_limit = Xmx * 1.25
     * Shenandoah: container_limit = Xmx * 1.15 + (object_count * 8)
     *
     * If container limit is fixed, work backwards:
     * G1:        Xmx = container_limit / 1.15
     * ZGC:       Xmx = container_limit / 1.25
     * Shenandoah: Xmx = (container_limit - object_count * 8) / 1.15
     */
}
The Fundamental Trade-off Triangle
  • G1: Maximizes throughput and memory efficiency. Sacrifices pause-time predictability below ~50ms.
  • ZGC: Maximizes pause-time guarantee and memory compaction. Sacrifices throughput (10-15%) and memory (compressed oops unavailable).
  • Shenandoah: Maximizes pause-time guarantee and throughput balance. Sacrifices some throughput to barriers and pacing, plus per-object memory (8-byte Brooks pointer) on builds before JDK 13.
  • No tuning can break this triangle — you are choosing which axis to sacrifice, not eliminating trade-offs.
Production Insight
The most common production mistake is choosing ZGC for the wrong reason. Teams choose ZGC because 'lower pauses are always better' without accounting for the 10-15% throughput loss and 25% native memory overhead. If your SLA is 200ms p99, G1 meets that comfortably. The throughput and memory you save with G1 translates directly to infrastructure cost savings. Only move to ZGC when your latency budget actually demands it.
Key Takeaway
Start with G1 unless your latency SLA explicitly demands sub-50ms pauses. Move to Shenandoah for moderate heaps needing low pauses with memory efficiency. Move to ZGC for large heaps or ultra-low latency requirements. Never choose a collector based on benchmarks alone — profile your actual workload's allocation pattern, object lifetime distribution, and container constraints.
Collector Selection Decision Tree
If: p99 latency SLA > 100ms
Use: G1. It meets this target with proper tuning. Save the throughput and memory overhead of ZGC/Shenandoah for infrastructure cost reduction.
If: p99 latency SLA 50-100ms
Use: G1 with aggressive tuning (-XX:MaxGCPauseMillis=50). If G1 cannot meet this, evaluate Shenandoah for its smoother pacing behavior.
If: p99 latency SLA < 50ms, heap < 32GB
Use: Shenandoah. Compressed oops support saves memory. Pacing mechanism provides graceful degradation.
If: p99 latency SLA < 50ms, heap > 32GB
Use: ZGC with generational mode. Pause times are truly independent of heap size. Budget extra native memory.
If: p99 latency SLA < 10ms (ultra-low latency)
Use: ZGC (generational). This is ZGC's designed use case. Accept throughput and memory trade-offs. Consider off-heap allocation for hot paths.
If: Container memory is tightly constrained
Use: G1 or Shenandoah. ZGC's 25% native overhead makes it expensive in memory-constrained containers. Shenandoah wins if you also need low pauses.
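The decision tree above can be encoded as a small function, which makes the precedence between rules explicit. This is a sketch under stated assumptions: `CollectorChooser` is a hypothetical class, the sub-10ms rule is checked before the container rule, and a heap of exactly 32GB (which the tree leaves unspecified) is routed to ZGC.

```java
public final class CollectorChooser {

    public enum Choice { G1, G1_AGGRESSIVE, SHENANDOAH, ZGC_GENERATIONAL }

    private CollectorChooser() {}

    /**
     * @param p99SlaMs       latency budget at p99, in milliseconds
     * @param heapGb         planned heap size, in gigabytes
     * @param containerTight true when container memory leaves little native headroom
     */
    public static Choice choose(double p99SlaMs, double heapGb, boolean containerTight) {
        if (p99SlaMs > 100) {
            return Choice.G1;               // default tuning is enough
        }
        if (p99SlaMs >= 50) {
            return Choice.G1_AGGRESSIVE;    // e.g. -XX:MaxGCPauseMillis=50
        }
        if (p99SlaMs < 10) {
            return Choice.ZGC_GENERATIONAL; // assumption: ultra-low latency outranks memory cost
        }
        if (containerTight || heapGb < 32) {
            return Choice.SHENANDOAH;       // low pauses with compressed oops
        }
        return Choice.ZGC_GENERATIONAL;     // large heap, pauses independent of heap size
    }

    public static void main(String[] args) {
        System.out.println(choose(200, 8, false));
        System.out.println(choose(30, 16, false));
        System.out.println(choose(30, 64, false));
    }
}
```

Treat the result as a starting hypothesis to validate with GC logs under production-like load, not as a substitute for profiling.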

Production Tuning Patterns That Actually Work

Most GC tuning guides present flags in isolation. Production tuning requires understanding how flags interact and which signals indicate which adjustments. These patterns are derived from incidents across payment processing, real-time bidding, and high-frequency trading systems.
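Before adjusting any flag it helps to capture a baseline of what the collector is doing. The sketch below reads the standard `GarbageCollectorMXBean` counters from `java.lang.management`; the wrapper class `GcSnapshot` and its `Sample` type are hypothetical names. Note that what each bean reports differs per collector (concurrent collectors may expose separate beans for cycles and pauses), so compare samples only within the same collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public final class GcSnapshot {

    /** One collector's cumulative counters at the time of capture. */
    public static final class Sample {
        public final String collector;
        public final long count;       // -1 if the JVM does not report it
        public final long totalMillis; // -1 if the JVM does not report it

        public Sample(String collector, long count, long totalMillis) {
            this.collector = collector;
            this.count = count;
            this.totalMillis = totalMillis;
        }

        /** Average reported time per collection, or 0.0 when nothing has been recorded. */
        public double avgMillis() {
            return count > 0 ? (double) totalMillis / count : 0.0;
        }
    }

    private GcSnapshot() {}

    /** Reads the cumulative counters of every collector bean in this JVM. */
    public static List<Sample> capture() {
        List<Sample> samples = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            samples.add(new Sample(gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return samples;
    }

    public static void main(String[] args) {
        for (Sample s : capture()) {
            System.out.printf("%s: count=%d total=%dms avg=%.2fms%n",
                    s.collector, s.count, s.totalMillis, s.avgMillis());
        }
    }
}
```

Taking two snapshots a fixed interval apart and subtracting them gives a collection rate, which is often a more useful signal than the cumulative totals.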

io/thecodeforge/gc/ProductionTuningPatterns.java (Java)
package io.thecodeforge.gc;

/**
 * Production tuning patterns organized by problem type.
 * Each pattern addresses a specific failure mode.
 */
public class ProductionTuningPatterns {

    /**
     * PATTERN 1: Allocation Rate Spike Handler
     *
     * Problem: Bursts of allocation cause GC to fall behind.
     * Symptom: Increasing pause times during traffic spikes.
     *
     * G1 Fix:
     *   -XX:InitiatingHeapOccupancyPercent=30  // start marking earlier
     *   -XX:G1ReservePercent=15                // more evacuation buffer
     *   -XX:G1RSetUpdatingPauseTimePercent=5   // less RSet work in pause
     *
     * ZGC Fix:
     *   -XX:SoftMaxHeapSize=<70% of Xmx>       // trigger cycles earlier
     *   -XX:ConcGCThreads=<cores/4>            // more concurrent threads
     *   -XX:+ZGenerational                     // focus on young objects
     *
     * Shenandoah Fix:
     *   -XX:ShenandoahAllocationThreshold=5    // cycle after 5% allocation
     *   -XX:ConcGCThreads=<cores/4>            // more concurrent threads
     *   -XX:ShenandoahGCHeuristics=compact     // aggressive reclamation
     */

    /**
     * PATTERN 2: Long-Lived Cache Optimization
     *
     * Problem: Large caches create a big live data set that GC must scan
     * but never reclaim. This wastes GC cycles and increases pause times.
     *
     * Solution: Use off-heap caching (for example Chronicle Map) to move
     * cached data outside GC's jurisdiction. Note that Caffeine, including
     * its weakValues mode, keeps entries on the heap.
     *
     * If on-heap caching is required:
     * G1:   -XX:G1MixedGCLiveThresholdPercent=90  // skip regions with >90% live
     * ZGC:  Already handles this well with concurrent marking
     * Shen: -XX:ShenandoahGCHeuristics=adaptive   // skip mostly-live regions
     */

    /**
     * PATTERN 3: Container-Aware Sizing
     *
     * Problem: JVM heap + native memory exceeds container limit.
     * Symptom: OOM kill with no heap exhaustion in metrics.
     *
     * Rule of thumb for container memory limits:
     * - Set Xmx = container_limit * 0.80 for G1
     * - Set Xmx = container_limit * 0.70 for ZGC
     * - Set Xmx = container_limit * 0.80 for Shenandoah
     *
     * Remaining memory covers:
     * - Thread stacks (1MB per thread, ~500 threads = 500MB)
     * - Metaspace (class metadata, usually 100-300MB)
     * - Direct byte buffers (monitor with MBean)
     * - GC internal structures (remembered sets, card tables)
     * - JNI native memory
     */

    /**
     * PATTERN 4: Warm-Up Tuning for Low-Latency Services
     *
     * Problem: First requests after deployment have high latency due to
     * JIT compilation, class loading, and initial GC cycles.
     *
     * Solution:
     * 1. Use -XX:+AlwaysPreTouch to pre-zero heap pages at startup
     * 2. Implement warm-up traffic routing (load balancer weight ramp)
     * 3. Run synthetic allocation load for 60s before accepting traffic
     * 4. For ZGC: first 2-3 cycles are slower as JIT optimizes load barriers
     *
     * io.thecodeforge.gc.WarmUpManager can handle synthetic warm-up.
     */
}
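The Pattern 3 rule of thumb is simple arithmetic, and it can help to see it spelled out. The sketch below is only an illustration: `ContainerHeapSizing`, `heapPercent`, and `suggestedXmxMb` are hypothetical names, and the 80/70/80 percentages are the ones quoted in the pattern, not values measured for any particular service.

```java
/** Sketch of the Pattern 3 rule of thumb. Class and method names are hypothetical. */
final class ContainerHeapSizing {

    enum Collector { G1, ZGC, SHENANDOAH }

    /** Share of the container limit given to the heap (assumed rule-of-thumb values). */
    static int heapPercent(Collector collector) {
        // ZGC gets a smaller share because of its higher native memory overhead.
        return collector == Collector.ZGC ? 70 : 80;
    }

    /** Suggested -Xmx in MB for a container memory limit in MB (integer math, rounds down). */
    static long suggestedXmxMb(long containerLimitMb, Collector collector) {
        if (containerLimitMb <= 0) {
            throw new IllegalArgumentException("containerLimitMb must be positive");
        }
        return containerLimitMb * heapPercent(collector) / 100;
    }

    public static void main(String[] args) {
        long limitMb = 8192; // example: an 8 GiB container limit
        for (Collector c : Collector.values()) {
            long xmx = suggestedXmxMb(limitMb, c);
            System.out.printf("%-10s -Xmx%dm, leaving %d MB for native memory%n",
                    c, xmx, limitMb - xmx);
        }
    }
}
```

Treat the result as a starting point and confirm the remaining headroom against Native Memory Tracking output for the real service.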
The Allocation Rate Rule
  • < 500 MB/sec: Any collector handles this comfortably with default settings
  • 500 MB/sec - 2 GB/sec: G1 works with tuning. ZGC generational mode handles well.
  • 2-5 GB/sec: Requires aggressive tuning or allocation reduction. ZGC generational is best.
  • > 5 GB/sec: Consider object pooling, arena allocation, or off-heap strategies. GC alone cannot keep up.
  • Measure with: jstat -gc <pid> 1000 — calculate (bytes allocated between samples) / interval
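The measurement in the last bullet comes down to a delta divided by an interval. The following is a minimal sketch of that calculation and of the tiers listed above. `AllocationRate`, `mbPerSecond`, and `classify` are hypothetical names; the inputs are assumed to be cumulative allocated-byte counts that the caller sampled, and 1 GB is taken as 1024 MB.

```java
/** Sketch: turn two allocation samples into MB/sec and map it to the tiers above. */
final class AllocationRate {

    enum Tier { COMFORTABLE, NEEDS_TUNING, AGGRESSIVE_TUNING, REDUCE_ALLOCATION }

    private static final double BYTES_PER_MB = 1024.0 * 1024.0;

    /** Allocation rate in MB/sec between two cumulative byte counts. */
    static double mbPerSecond(long bytesAtStart, long bytesAtEnd, long intervalMillis) {
        if (intervalMillis <= 0 || bytesAtEnd < bytesAtStart) {
            throw new IllegalArgumentException("need a positive interval and non-decreasing counts");
        }
        double allocatedMb = (bytesAtEnd - bytesAtStart) / BYTES_PER_MB;
        return allocatedMb / (intervalMillis / 1000.0);
    }

    /** Tier boundaries follow the rule of thumb: 500 MB/sec, 2 GB/sec, 5 GB/sec. */
    static Tier classify(double mbPerSec) {
        if (mbPerSec < 500) return Tier.COMFORTABLE;
        if (mbPerSec < 2 * 1024) return Tier.NEEDS_TUNING;
        if (mbPerSec <= 5 * 1024) return Tier.AGGRESSIVE_TUNING;
        return Tier.REDUCE_ALLOCATION;
    }

    public static void main(String[] args) {
        // Example: 1 GiB allocated during a 1000 ms sampling window.
        double rate = mbPerSecond(0L, 1L << 30, 1000);
        System.out.printf("%.1f MB/sec -> %s%n", rate, classify(rate));
    }
}
```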
Production Insight
The most effective GC tuning is reducing allocation rate at the application level. No collector flag compensates for a service that allocates 5GB/sec of short-lived objects. Common allocation reduction strategies: object pooling for hot-path allocations, StringBuilder reuse in logging frameworks, arena allocation for request-scoped data, and avoiding autoboxing in tight loops. A 50% allocation rate reduction has more impact than any GC flag change.
Key Takeaway
GC tuning is 20% flag adjustment and 80% allocation pattern optimization. The best production engineers profile allocation rate first, optimize object lifetimes second, and adjust GC flags last. If you are tuning GC flags without measuring allocation rate, you are guessing.
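As one illustration of the allocation reduction strategies mentioned above, here is a minimal object pool for fixed-size byte buffers. `BufferPool` is a hypothetical class written for this sketch, not a library API. It assumes callers overwrite a buffer before reading it, because released buffers are not cleared.

```java
import java.util.ArrayDeque;

/** Sketch of a bounded pool that reuses fixed-size buffers instead of reallocating them. */
final class BufferPool {

    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxPooled;

    BufferPool(int bufferSize, int maxPooled) {
        if (bufferSize <= 0 || maxPooled < 0) {
            throw new IllegalArgumentException("bufferSize must be positive, maxPooled non-negative");
        }
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    /** Returns a pooled buffer when one is available, otherwise allocates a new one. */
    synchronized byte[] acquire() {
        byte[] buffer = free.pollFirst();
        return buffer != null ? buffer : new byte[bufferSize];
    }

    /** Returns a buffer to the pool; wrong-sized or surplus buffers are left to the GC. */
    synchronized void release(byte[] buffer) {
        if (buffer == null || buffer.length != bufferSize) {
            return;
        }
        if (free.size() < maxPooled) {
            free.addFirst(buffer); // contents are NOT cleared
        }
    }

    synchronized int pooledCount() {
        return free.size();
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(64 * 1024, 8);
        byte[] first = pool.acquire();
        pool.release(first);
        byte[] second = pool.acquire();
        System.out.println("buffer reused: " + (first == second));
    }
}
```

Pooling trades allocation pressure for longer-lived objects, so it tends to pay off only on measured hot paths.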

Monitoring and Observability for GC Health

GC tuning without observability is blind optimization. Every production JVM must emit GC metrics that allow correlation with application latency and throughput. The minimum viable GC observability setup includes pause time histograms, allocation rate tracking, and GC cycle phase breakdowns.

io/thecodeforge/gc/GCMetricsExporter.java (Java)
package io.thecodeforge.gc;

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;

/**
 * Production GC metrics exporter for Prometheus/Micrometer integration.
 *
 * These metrics enable correlation of GC behavior with application
 * latency and throughput in your observability stack.
 */
public class GCMetricsExporter {

    /**
     * Essential GC metrics every production service must emit:
     *
     * 1. jvm_gc_pause_seconds{collector, action}
     *    - Histogram of GC pause durations
     *    - Alert on p99 > SLA threshold
     *
     * 2. jvm_gc_allocation_rate_mbps
     *    - Calculated from heap usage delta between GC cycles
     *    - Leading indicator of GC pressure
     *
     * 3. jvm_gc_live_data_size_bytes
     *    - Size of live objects after major collection
     *    - Growing trend = memory leak
     *
     * 4. jvm_gc_memory_promoted_bytes_total
     *    - Bytes promoted from young to old generation
     *    - High rate = short-lived objects escaping young gen
     *
     * 5. jvm_memory_used_bytes{area, pool}
     *    - Per-memory-pool usage
     *    - Alert on old gen > 80% sustained
     */

    public void exportGCMetrics() {
        List<GarbageCollectorMXBean> gcBeans = ManagementFactory.getGarbageCollectorMXBeans();

        for (GarbageCollectorMXBean gcBean : gcBeans) {
            String collectorName = gcBean.getName();
            long collectionCount = gcBean.getCollectionCount();
            long collectionTimeMs = gcBean.getCollectionTime();

            // Export as:
            // jvm_gc_collection_count_total{collector="<name>"} <count>
            // jvm_gc_collection_time_seconds_total{collector="<name>"} <time_sec>

            System.out.printf("Collector: %s, Count: %d, Time: %dms%n",
                    collectorName, collectionCount, collectionTimeMs);
        }

        // Memory pool monitoring for heap pressure detection
        List<MemoryPoolMXBean> memoryBeans = ManagementFactory.getMemoryPoolMXBeans();
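        for (MemoryPoolMXBean pool : memoryBeans) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long usedMB = usage.getUsed() / (1024 * 1024);
            long max = usage.getMax(); // -1 when the pool has no defined maximum

            if (max <= 0) {
                // Utilization cannot be computed without a defined maximum
                System.out.printf("Pool: %s, Used: %dMB, Max: undefined%n", pool.getName(), usedMB);
                continue;
            }
            double utilization = (double) usage.getUsed() / max * 100;

            // Alert if old gen utilization > 80% for extended period
            System.out.printf("Pool: %s, Used: %dMB, Max: %dMB, Util: %.1f%%%n",
                    pool.getName(), usedMB, max / (1024 * 1024), utilization);
        }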
    }

    /**
     * GC log analysis commands for production triage:
     *
     * 1. Pause time distribution:
     *    grep 'Pause' gc.log | awk '{print $NF}' | sort -n | awk '
     *      {a[NR]=$1} END {print "p50:",a[int(NR*0.5)],"p99:",a[int(NR*0.99)],"max:",a[NR]}'
     *
     * 2. GC frequency over time:
     *    grep '\[gc.*\]' gc.log | awk '{print $1}' | cut -d'T' -f1 | uniq -c
     *
     * 3. Humongous allocation rate (G1 specific):
     *    grep 'humongous' gc.log | wc -l
     *    grep 'humongous' gc.log | awk '{sum+=$NF} END {print sum/NR, "bytes avg"}'
     *
     * 4. ZGC cycle time distribution:
     *    grep 'Garbage Collection.*GC\(' gc.log | grep -oP '\d+\.\d+ms' | sort -n
     *
     * 5. Shenandoah pacing delays:
     *    grep 'Pacing' gc.log | awk '{print $NF}' | sort -n | tail -20
     */
}
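The first log-analysis command above computes pause percentiles with awk. If the pause durations are already inside the JVM, for example collected by a metrics library, the same summary can be sketched in Java. `PausePercentiles` is a hypothetical helper that uses the nearest-rank definition of a percentile; the sample values are invented for illustration.

```java
import java.util.Arrays;

/** Sketch: nearest-rank percentiles over a set of GC pause durations in milliseconds. */
final class PausePercentiles {

    /** Smallest sample such that at least p percent of all samples are at or below it. */
    static double percentile(double[] pausesMs, double p) {
        if (pausesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = pausesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up pause samples; a single outlier dominates the tail.
        double[] pauses = {5, 7, 9, 12, 15, 20, 35, 50, 80, 400};
        System.out.printf("p50=%.0fms p99=%.0fms max=%.0fms%n",
                percentile(pauses, 50), percentile(pauses, 99), percentile(pauses, 100));
    }
}
```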
The Three Metrics That Matter Most
  • Allocation rate trend: A steadily increasing allocation rate (week over week) means you will hit GC capacity limits. Fix before it becomes an incident.
  • Live data size trend: A growing live data set after full GC means a memory leak. GC cannot reclaim it — the application is retaining references.
  • Pause time p99 trend: If p99 pause time is growing over days, the heap is fragmenting or the live data set is growing. Investigate before it violates SLA.
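A trend is easier to alert on once it is reduced to a single number. One possible way to do that, shown here as a sketch rather than a prescribed method, is a least-squares slope over periodic samples such as live data size after each major collection. `LiveDataTrend` and `slope` are hypothetical names and the sample values are invented.

```java
/** Sketch: least-squares slope of a metric over time, e.g. live data size in MB per hour. */
final class LiveDataTrend {

    /** Slope of y over x; a clearly positive slope on post-GC live data suggests a leak. */
    static double slope(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two or more paired samples");
        }
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0;
        double sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        return sxy / sxx;
    }

    public static void main(String[] args) {
        double[] hours = {0, 1, 2, 3};
        double[] liveMb = {100, 110, 120, 130}; // invented samples
        System.out.printf("live data grows by %.1f MB/hour%n", slope(hours, liveMb));
    }
}
```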
Production Insight
Set up GC alerting on three signals: (1) p99 pause time exceeding 80% of your SLA budget, (2) allocation rate exceeding 70% of your collector's sustainable rate, and (3) old gen utilization sustained above 85%. These three signals catch 90% of GC-related production incidents before they impact users. Do not alert on full GC count alone — a single full GC during startup is normal. Alert on full GC during steady-state traffic.
Key Takeaway
GC observability is not optional. If you cannot answer 'what is the current allocation rate?' and 'what is the p99 pause time?' in under 30 seconds, your monitoring is insufficient. Export GC metrics to your existing observability stack — do not rely on manual GC log analysis for production triage.
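The three alert signals described above can be expressed as plain threshold checks. This sketch assumes the caller already has the current p99 pause, allocation rate, and old gen utilization, and that "sustained" is handled upstream, for example by the alerting system's evaluation window. `GcAlertRules` and the alert names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: evaluate the three GC alert signals against their thresholds. */
final class GcAlertRules {

    static List<String> evaluate(double p99PauseMs, double pauseSlaBudgetMs,
                                 double allocationMbps, double sustainableMbps,
                                 double oldGenUtilPercent) {
        List<String> alerts = new ArrayList<>();
        if (p99PauseMs > 0.80 * pauseSlaBudgetMs) {
            alerts.add("PAUSE_P99");           // above 80% of the SLA pause budget
        }
        if (allocationMbps > 0.70 * sustainableMbps) {
            alerts.add("ALLOCATION_RATE");     // above 70% of the sustainable rate
        }
        if (oldGenUtilPercent > 85.0) {
            alerts.add("OLD_GEN_UTILIZATION"); // above 85% old gen utilization
        }
        return alerts;
    }

    public static void main(String[] args) {
        // Example inputs: 45ms p99 against a 50ms budget, everything else healthy.
        System.out.println(evaluate(45, 50, 100, 1000, 50));
    }
}
```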
● Production incident · POST-MORTEM · severity: high

G1 Humongous Allocation Storm Crashes Payment Service Under Black Friday Load

Symptom
Payment API p99 latency spiked from 45ms to 12s within 30 minutes of Black Friday traffic ramp. Pod restarts every 8-12 minutes. GC logs show repeated 'to-space exhausted' and 'concurrent cycle interrupted' messages.
Assumption
Team assumed the heap was undersized and doubled -Xmx from 8GB to 16GB. Problem worsened — longer GC cycles, same pattern.
Root cause
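Bulk payment batch payloads (serialized protobuf messages averaging 3-5MB each) were classified as humongous objects by G1 (anything larger than 50% of the region size). With the original 8GB heap, G1's default sizing (heap / 2048 regions) gives 4MB regions and a 2MB humongous threshold, so every payload was humongous. Doubling the heap to 16GB raised the region size to 8MB and the threshold to 4MB, which still left the larger payloads humongous. G1 allocates humongous objects in contiguous free regions. Under burst traffic, humongous allocation consumed free regions faster than concurrent marking could reclaim them, triggering 'to-space exhausted', an evacuation failure that falls back to a stop-the-world full GC and pauses every application thread.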
Fix
Three-part fix: (1) Increased region size to 32MB via -XX:G1HeapRegionSize=32M, converting the 3-5MB objects from humongous to regular allocations. (2) Implemented payload chunking in io.thecodeforge.payment.serialization.BatchSerializer to cap individual allocations at 512KB. (3) Added -XX:G1ReservePercent=15 to enlarge the reserve of free regions that guards against to-space exhaustion.
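The chunking idea in part (2) can be sketched as follows. This is not the actual BatchSerializer code, which the post-mortem does not show. `PayloadChunker` and `readChunks` are hypothetical names, and the sketch assumes the payload arrives as a stream and that Java 11+ is available for `InputStream.readNBytes(int)`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch: read a payload in bounded chunks so no single array exceeds the cap. */
final class PayloadChunker {

    static final int MAX_CHUNK_BYTES = 512 * 1024; // the 512KB cap from the fix

    static List<byte[]> readChunks(InputStream in, int maxChunkBytes) {
        if (maxChunkBytes <= 0) {
            throw new IllegalArgumentException("maxChunkBytes must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        try {
            while (true) {
                // readNBytes returns up to maxChunkBytes, or an empty array at end of stream.
                byte[] chunk = in.readNBytes(maxChunkBytes);
                if (chunk.length == 0) {
                    break;
                }
                chunks.add(chunk);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Demo only: a real caller would stream from the network instead of
        // materializing the full payload first.
        byte[] payload = new byte[3 * 1024 * 1024];
        List<byte[]> chunks = readChunks(new ByteArrayInputStream(payload), MAX_CHUNK_BYTES);
        System.out.println("chunks: " + chunks.size());
    }
}
```

Each chunk stays far below the humongous threshold of any G1 region size, which is what keeps these allocations on the normal path.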
Key lesson
  • G1 humongous objects bypass normal region allocation and can starve the collector
  • Region size is the single most important G1 tuning parameter for workloads with large transient objects
  • Doubling heap without fixing the allocation pattern just delays the same failure with a longer full GC pause
  • Monitor humongous allocation rate with -Xlog:gc+humongous=debug before incidents occur
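The region-size arithmetic behind this incident is worth checking by hand. The sketch below approximates G1's default sizing as heap / 2048, rounded down to a power of two and clamped to 1MB-32MB. That is a simplification: the real ergonomics also take the initial heap size into account, so treat the numbers as estimates and confirm the actual region size in the GC log. `G1RegionMath` is a hypothetical helper.

```java
/** Sketch: estimate G1 region size and check whether an object would be humongous. */
final class G1RegionMath {

    static final long MB = 1024L * 1024;

    /** Simplified default: heap / 2048, rounded down to a power of two, within 1MB..32MB. */
    static long defaultRegionSizeBytes(long heapBytes) {
        long target = Math.max(heapBytes / 2048, MB);
        long powerOfTwo = Long.highestOneBit(target);
        return Math.min(powerOfTwo, 32 * MB);
    }

    /** An object larger than half a region is treated as humongous. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long[] heapsGb = {8, 16};
        long[] payloadsMb = {3, 5};
        for (long heapGb : heapsGb) {
            long region = defaultRegionSizeBytes(heapGb * 1024 * MB);
            for (long payloadMb : payloadsMb) {
                System.out.printf("heap=%dGB region=%dMB payload=%dMB humongous=%b%n",
                        heapGb, region / MB, payloadMb, isHumongous(payloadMb * MB, region));
            }
        }
    }
}
```

With an explicit 32MB region size the threshold rises to 16MB, so both payload sizes fall back to regular allocation.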
Production debug guide
Follow this path when GC is suspected as the root cause of latency or availability issues.
Symptom · 01
Latency spikes correlate with GC pauses in application logs
Fix
Enable GC logging with -Xlog:gc*,gc+phases=debug:file=gc.log:time,uptime,level,tags and correlate pause timestamps with latency metrics. Check if pauses are young GC, mixed GC, or full GC.
Symptom · 02
Full GC appearing frequently in logs
Fix
Full GC in G1 signals a critical failure mode — the collector cannot keep up. Check for humongous allocation rate, heap fragmentation, or metaspace exhaustion. In ZGC/Shenandoah, full GC is exceptionally rare and indicates a serious configuration problem.
Symptom · 03
Throughput drops but pause times are acceptable
Fix
Collector is consuming too much CPU. Check concurrent GC thread count (-XX:ConcGCThreads). Reduce if GC CPU usage exceeds 15-20% of total. Profile allocation rate — if > 2GB/sec, consider reducing allocation pressure at the application level.
Symptom · 04
OOM kill with no heap exhaustion visible in metrics
Fix
Check native memory: metaspace, thread stacks, direct byte buffers, mmap regions. Use -XX:NativeMemoryTracking=detail and jcmd <pid> VM.native_memory summary. G1's remembered sets and ZGC's multi-mapping both consume significant off-heap memory.
Symptom · 05
GC pause time increases linearly with heap size
Fix
You are likely hitting a GC algorithm limitation. G1 pauses scale with live data set, not heap size. ZGC pauses are independent of heap size. If pauses scale with heap, evaluate switching collectors or reducing live data through object pooling or cache eviction.
★ GC Triage Cheat Sheet: First 60 Seconds
Fast diagnostic commands when GC is suspected. Run these before diving into GC logs.
Application unresponsive, suspected full GC
Immediate action
Check if JVM is in a GC stop-the-world pause
Commands
jcmd <pid> GC.heap_info
jstat -gcutil <pid> 1000 10
Fix now
If Full GC count is incrementing, check humongous allocations and heap fragmentation immediately. Restart with -Xlog:gc+humongous=debug
High CPU with low application throughput
Immediate action
Check if GC threads are consuming CPU
Commands
top -H -b -n 1 -p <pid> | grep -E 'VM Thread|GC Thread'
jcmd <pid> VM.flags | grep -i conc
Fix now
Reduce -XX:ConcGCThreads or -XX:ParallelGCThreads if GC CPU > 20%. Consider if allocation rate can be reduced at application level.
Latency spikes at regular intervals
Immediate action
Correlate spike timing with GC cycle phases
Commands
jstat -gcutil <pid> 500 20
grep 'Pause' gc.log | tail -20
Fix now
If spikes align with 'mixed' or 'remark' phases, tune -XX:G1MixedGCCountTarget or -XX:MaxGCPauseMillis. For ZGC, spikes during 'Relocate' phase suggest allocation rate exceeds reclamation speed.
OOM kill by container orchestrator (k8s)
Immediate action
Compare container memory limit with JVM heap + native overhead
Commands
kubectl describe pod <pod> | grep -A5 'OOMKilled'
jcmd <pid> VM.native_memory summary
Fix now
Set -XX:MaxRAMPercentage to at most 75 (not 90). Account for ~20% native overhead with ZGC and ~15% with G1. For ZGC, size the container memory limit at roughly heap * 1.3.
Allocation failure in logs, to-space exhausted
Immediate action
G1 cannot evacuate objects — critical failure
Commands
grep 'to-space exhausted' gc.log | wc -l
grep 'humongous' gc.log | tail -20
Fix now
Increase -XX:G1ReservePercent to 15. Increase region size. Reduce allocation rate. This triggers full GC — treat as P1.
G1 vs ZGC vs Shenandoah — Production Comparison
| Characteristic | G1GC | ZGC | Shenandoah |
|---|---|---|---|
| JDK availability | JDK 7+ (default JDK 9+) | JDK 11+ (prod JDK 15+) | JDK 12+ (prod JDK 15+), backports to JDK 8 and 11 |
| Typical pause time | 50-200ms (tunable to ~50ms) | < 10ms (independent of heap) | < 10ms (independent of heap) |
| Throughput overhead | Baseline (lowest) | 10-15% vs G1 | 5-10% vs G1 |
| Native memory overhead | ~10-15% of heap | ~20-25% of heap | ~10-15% of heap + 8 bytes/object |
| Compressed oops | Supported | Not supported | Supported |
| Generational collection | Yes (built-in) | Yes (JDK 21+ with -XX:+ZGenerational) | No (full-heap concurrent) |
| Max tested heap | Terabytes | 16TB | Terabytes |
| Humongous objects | Problematic; requires tuning | No concept; handles large objects well | No concept; handles large objects well |
| Container friendliness | Good; predictable overhead | Poor; high native memory | Good; supports compressed oops |
| Allocation stall behavior | Full GC fallback (catastrophic) | Hard stall (backpressure) | Soft pacing (gradual degradation) |
| Tuning complexity | Moderate; many flags | Low; fewer flags, self-tuning | Low-moderate; heuristic modes |
| Community/ecosystem maturity | Very mature; default collector | Mature; growing adoption | Moderate; Red Hat backed |
| Best use case | General purpose, cost-sensitive | Ultra-low latency, large heaps | Low latency, memory-efficient, moderate heaps |

Key takeaways

1. G1 is the right default for most workloads: do not adopt ZGC or Shenandoah unless your latency SLA explicitly demands sub-50ms pauses.
2. The most effective GC tuning is reducing allocation rate at the application level. No collector flag compensates for excessive allocation.
3. ZGC's generational mode (JDK 21+) is transformative: always enable it. Non-generational ZGC is a throughput disaster on allocation-heavy workloads.
4. Container memory must account for native overhead: 15% for G1/Shenandoah, 25% for ZGC. OOM kills from native memory exhaustion are the #1 containerized JVM incident.
5. GC observability is non-negotiable. Track allocation rate, pause time p99, and live data size trend. These three metrics predict 90% of GC incidents.
6. Humongous objects are G1's hidden failure mode. Monitor them proactively. Increase region size or chunk objects at the application level.
7. Shenandoah's pacing creates smoother degradation than ZGC's allocation stalls. Choose Shenandoah for workloads where gradual degradation is preferred over hard backpressure.
8. Never tune GC flags without detailed GC logging enabled. Default logging is insufficient for production diagnosis.

Common mistakes to avoid

Setting -Xmx without accounting for native memory overhead

Symptom
JVM heap is not total JVM memory. GC internal structures (remembered sets, card tables, ZGC multi-mapping), thread stacks, metaspace, and direct byte buffers all consume off-heap memory. Setting container memory limit equal to -Xmx guarantees OOM kills. Budget 15-25% extra depending on collector.
Fix
Use container_limit = Xmx * 1.15 (G1/Shenandoah) or Xmx * 1.25 (ZGC). Monitor with -XX:NativeMemoryTracking=detail.

Choosing ZGC because 'lower pauses are always better'

Symptom
ZGC's 10-15% throughput overhead and 25% native memory overhead are real costs. If your latency SLA is 200ms, G1 meets that comfortably. The throughput and memory savings with G1 translate to fewer pods and lower infrastructure cost.
Fix
Only adopt ZGC when your measured p99 latency with tuned G1 exceeds your SLA. Profile first, then decide.

Tuning GC flags without enabling detailed GC logging

Symptom
Default GC logging is insufficient for production tuning. Without -Xlog:gc*,gc+phases=debug, you cannot see pause time breakdowns, humongous allocation rates, or evacuation failures. You are flying blind.
Fix
Always enable: -Xlog:gc*,gc+phases=debug:file=gc.log:time,uptime,level,tags. Rotate logs. Ship to observability platform.

Using the same GC flags across all services

Symptom
Each service has a different allocation profile, heap size, and latency requirement. Flags tuned for a low-allocation REST API will fail catastrophically on a high-throughput stream processor. Tune per-service based on actual workload characteristics.
Fix
Profile each service independently. Start with defaults. Adjust based on measured allocation rate, pause times, and memory utilization.

Ignoring humongous allocations in G1

Symptom
Humongous objects (>50% of G1 region size) bypass normal region allocation and can trigger to-space exhausted failures. This is the #1 cause of unexpected full GC in G1-tuned services.
Fix
Monitor with -Xlog:gc+humongous=debug. Increase -XX:G1HeapRegionSize to raise the humongous threshold. Chunk large objects at the application level.

Not setting Xms equal to Xmx for ZGC and Shenandoah

Symptom
ZGC and Shenandoah perform best with a fixed heap size. Dynamic heap resizing adds unnecessary complexity and can cause unpredictable behavior during resize events. G1 tolerates Xms != Xmx better, but fixed sizing is still recommended.
Fix
Always set -Xms equal to -Xmx for production workloads with ZGC and Shenandoah.

Measuring GC health by pause time alone

Symptom
A collector with 5ms pauses that runs 1000 times per minute spends more time in GC than one with 50ms pauses that runs 10 times per minute. GC overhead = pause_time * frequency. Always measure both.
Fix
Track GC overhead percentage: total GC time / total elapsed time. Alert if > 5% for latency-sensitive services, > 10% for throughput-oriented services.

Running non-generational ZGC in production on JDK 21+

Symptom
Non-generational ZGC collects the entire heap every cycle. This is a throughput disaster on allocation-heavy workloads. Generational ZGC (JDK 21+) focuses on young objects and is dramatically more efficient.
Fix
Always enable -XX:+ZGenerational on JDK 21+ production deployments. There is almost no reason to use non-generational ZGC on JDK 21+.
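The overhead formula from the "pause time alone" mistake above (total GC time / total elapsed time) can be sketched with the standard management beans. `GcOverhead` is a hypothetical helper. One caveat to keep in mind: with concurrent collectors some beans may report concurrent cycle time and not only pause time, so the live figure can overstate stop-the-world overhead.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: GC overhead as a percentage of elapsed time. */
final class GcOverhead {

    static double overheadPercent(long gcMillis, long elapsedMillis) {
        if (gcMillis < 0 || elapsedMillis <= 0) {
            throw new IllegalArgumentException("need non-negative GC time and positive elapsed time");
        }
        return 100.0 * gcMillis / elapsedMillis;
    }

    /** Overhead of the running JVM since startup, summed over all collector beans. */
    static double currentJvmOverheadPercent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = bean.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0) {
                gcMillis += time;
            }
        }
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        return overheadPercent(gcMillis, uptime);
    }

    public static void main(String[] args) {
        // The two cases from the text, over one minute: 1000 x 5ms versus 10 x 50ms.
        System.out.printf("frequent short pauses: %.2f%%%n", overheadPercent(1000 * 5, 60_000));
        System.out.printf("rare long pauses:      %.2f%%%n", overheadPercent(10 * 50, 60_000));
        System.out.printf("this JVM so far:       %.2f%%%n", currentJvmOverheadPercent());
    }
}
```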
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR: Explain the fundamental trade-off between G1, ZGC, and Shenandoah. Why c...
Q02 · JUNIOR: Your payment service is running G1 with 16GB heap. During peak traffic, ...
Q03 · JUNIOR: You are migrating a service from G1 to ZGC. After migration, p99 latency...
Q04 · JUNIOR: What is the difference between ZGC's allocation stall and Shenandoah's p...
Q05 · JUNIOR: A service has a large on-heap cache holding 10GB of data with a 24-hour ...
Q06 · JUNIOR: How do you calculate the right container memory limit for a JVM running ...
Q07 · JUNIOR: Explain why setting -XX:MaxGCPauseMillis=200 does not guarantee 200ms ma...
Q08 · JUNIOR: You need to support both a latency-sensitive API (p99 < 20ms) and a batc...

Q01 of 08 · JUNIOR

Explain the fundamental trade-off between G1, ZGC, and Shenandoah. Why can't one collector optimize all three axes (pause time, throughput, memory efficiency)?

ANSWER
Each collector makes a different bet on which two axes to optimize. G1 optimizes throughput and memory efficiency by accepting longer pauses (region-based evacuation with remembered sets). ZGC optimizes pause time and compaction by accepting throughput overhead (load barriers on every object access) and memory overhead (no compressed oops, multi-mapping). Shenandoah optimizes pause time and throughput balance by accepting per-object memory overhead (8-byte Brooks pointers). The fundamental constraint is Amdahl's Law applied to concurrent collection — doing more work concurrently requires more coordination overhead, which either costs CPU (throughput) or memory (metadata structures).
FAQ · 8 QUESTIONS

Frequently Asked Questions

01. Should I use G1, ZGC, or Shenandoah for my microservice?
02. How much heap should I allocate in a Kubernetes container?
03. What is the difference between a young GC and a mixed GC in G1?
04. Can I switch collectors without restarting the JVM?
05. How do I know if my allocation rate is too high?
06. Does ZGC work on ARM processors?
07. What causes 'allocation stall' in ZGC logs?
08. Is Shenandoah production-ready?