JVM GC Tuning — G1 Humongous Allocation Storm
Payment p99 latency spiked from 45ms to 12s, caused by G1 humongous objects (3-5MB protobuf payloads).
20+ years shipping production Java in banking & fintech. Written from production experience, not tutorials.
- ✓Solid grasp of fundamentals
- ✓Comfortable reading code examples
- ✓Basic production concepts
- G1GC: Default JVM collector (Java 9+). Balances throughput and latency using region-based heap partitioning and concurrent marking. Right choice for most workloads.
- ZGC: Targets sub-10ms pause times via fully concurrent marking, relocation, and reference processing. Costs 10–15% throughput and ~25% extra native memory overhead.
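- Shenandoah: Similar low-pause goals as ZGC, achieved through concurrent evacuation (originally via Brooks pointers; since JDK 13 the forwarding pointer lives in the object header). It supports compressed oops, so it is often more memory-efficient than ZGC on heaps under 32GB. Vendor support is narrower: Oracle JDK builds do not include it.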
- Key trade-off: All three collectors trade throughput for latency guarantees in different ways. No collector maximises throughput, latency, and memory footprint simultaneously.
- Biggest mistake: Choosing a collector without matching it to your workload profile — leads to unnecessary latency or wasted CPU.
- Important: GC tuning is not portable. Flags and behaviour differ significantly across G1, ZGC, and Shenandoah — never copy-paste tuning flags between collectors.
G1 Humongous Allocation Storm is a pathological GC behavior caused by objects larger than half a G1 region. That threshold is 512KB with 1MB regions, 1MB with 2MB regions, and so on, depending on -XX:G1HeapRegionSize. Unlike normal G1 allocations, humongous objects bypass the young generation entirely and land directly in the old generation as contiguous runs of regions. Each humongous allocation can start a concurrent marking cycle. When no contiguous run of free regions is available, G1 falls back to a stop-the-world full GC.
When a storm hits (say, a burst of 10MB byte arrays during a batch job), G1 must find a run of contiguous free regions for every allocation. When it cannot, it escalates to a full GC. That full GC has been multi-threaded since JDK 10 (JEP 307), but it still freezes production JVMs for seconds or even minutes on large heaps. This isn't a memory leak. It's a design limitation of G1's region-based compaction, which does not move humongous objects during young or mixed collections.
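The contiguity problem is easy to underestimate, so here is a toy model of it. This is not HotSpot code. The heap is modeled as a row of regions, and a humongous object needs a run of adjacent free ones. The class and method names are hypothetical. The model shows a heap with half its regions free that still cannot place a 3-region object.

```java
import java.util.BitSet;

/**
 * Toy model of G1's humongous placement problem (not HotSpot code):
 * a humongous object needs a run of contiguous free regions, so a heap can
 * have plenty of free regions in total and still fail the allocation.
 */
public class FragmentedHeapModel {
    private final BitSet used;      // bit i set = region i is occupied
    private final int regionCount;

    public FragmentedHeapModel(int regionCount) {
        this.regionCount = regionCount;
        this.used = new BitSet(regionCount);
    }

    public void occupy(int region) {
        used.set(region);
    }

    public int freeRegions() {
        return regionCount - used.cardinality();
    }

    /** Returns the first index of `needed` contiguous free regions, or -1 if no such run exists. */
    public int findContiguousFree(int needed) {
        int start = used.nextClearBit(0);
        while (start < regionCount) {
            int end = used.nextSetBit(start);   // first occupied region after start
            if (end < 0) {
                end = regionCount;              // the free run extends to the end of the heap
            }
            if (end - start >= needed) {
                return start;
            }
            start = used.nextClearBit(end);
        }
        return -1;
    }

    public static void main(String[] args) {
        FragmentedHeapModel heap = new FragmentedHeapModel(16);
        // Every other region holds a long-lived object: 8 of 16 regions stay free
        for (int r = 0; r < 16; r += 2) {
            heap.occupy(r);
        }
        System.out.println("Free regions: " + heap.freeRegions());                 // 8
        // A humongous object spanning 3 regions still has nowhere to go
        System.out.println("Start of 3-region run: " + heap.findContiguousFree(3)); // -1
    }
}
```

In the real collector, that `-1` is the point where G1 must either reclaim space or escalate toward a full GC.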
G1GC (Garbage-First) is the default collector since JDK 9, designed for heaps from 4GB to 64GB with target pause times under 200ms. It divides the heap into 1MB–32MB regions and prioritizes collecting regions with the most garbage first. G1 works well for steady-state web servers but struggles with large allocations, high allocation rates, or heaps above 64GB where its remembered sets bloat.
ZGC (JDK 11+) uses colored pointers and load barriers to keep pauses largely independent of heap size (sub-millisecond since JDK 16). It handles terabyte heaps without the humongous storm problem, at the cost of 15-20% CPU overhead and a higher memory footprint. Shenandoah (experimental in JDK 12, production since JDK 15, backported to JDK 8u and 11u) compacts concurrently, which avoids most of G1's fragmentation problems. It still places objects larger than a region in dedicated humongous regions, and it trades throughput for pause consistency.
In production, you don't fight humongous storms with GC tuning alone; you redesign the allocation pattern. Common patterns that work:
- Pool reusable buffers via ThreadLocal<byte[]> or Netty's PooledByteBufAllocator.
- Split large objects into chunks that stay below half the region size (e.g., 512KB segments with 2MB or larger regions).
- Switch to ZGC or Shenandoah when object sizes regularly exceed 1MB.
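The ThreadLocal<byte[]> pattern fits in a few lines. This is a minimal sketch, assuming each thread handles one request at a time and never hands its buffer to another thread. The class and method names are hypothetical, not from a library. Each thread pays for its large (possibly humongous) buffer once, and steady-state requests stop producing new humongous arrays.

```java
/**
 * Minimal sketch of per-thread buffer reuse with ThreadLocal<byte[]>.
 * Assumes a buffer is never shared across threads; names are illustrative.
 */
public final class ThreadLocalBufferPool {
    private static final ThreadLocal<byte[]> BUFFER = ThreadLocal.withInitial(() -> new byte[0]);

    private ThreadLocalBufferPool() {}

    /** Returns this thread's buffer, replacing it only when a request needs more space. */
    public static byte[] acquire(int minSize) {
        byte[] buf = BUFFER.get();
        if (buf.length < minSize) {
            buf = new byte[minSize]; // the only allocation; later requests of this size or smaller reuse it
            BUFFER.set(buf);
        }
        return buf; // may hold stale bytes from earlier use; callers track their own length
    }

    /** Frees this thread's buffer, e.g. when the thread will sit idle for a long time. */
    public static void release() {
        BUFFER.remove();
    }

    public static void main(String[] args) {
        byte[] first = acquire(4 * 1024 * 1024);  // 4MB: humongous with regions of 8MB or smaller
        byte[] second = acquire(3 * 1024 * 1024); // smaller request: same array, no new allocation
        System.out.println("Reused: " + (first == second)); // true
    }
}
```

The trade-off is retained memory: every pool thread keeps its largest buffer alive. Netty's pooled allocator exists largely to manage that trade-off with arenas instead of one buffer per thread.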
Real-world data: a 64GB heap running G1 with 2MB regions (set explicitly; ergonomics would pick 32MB at that heap size) will STW for 3-5 seconds on a 200MB humongous allocation. ZGC handles the same allocation in <1ms. The trade-off is CPU: ZGC adds ~15% more cycles than G1 at steady state. For latency-sensitive services (sub-10ms p99), ZGC or Shenandoah are usually required. For batch processing with predictable object sizes, G1 with -XX:G1HeapRegionSize=4M often suffices. Adding -XX:G1MixedGCLiveThresholdPercent=85 changes nothing: 85 is already its default, and the flag is experimental (setting it needs -XX:+UnlockExperimentalVMOptions).
Think of GC as a janitor cleaning a warehouse while workers keep stocking shelves. G1 is a methodical janitor who cleans room-by-room with short pauses. ZGC is a ghost janitor who cleans almost invisibly but needs more helpers. Shenandoah is another ghost janitor with a different cleaning strategy. Each janitor trades off how much disruption they cause against how many workers (CPU) they need to do the job.
Garbage collection tuning is the single most impactful JVM performance lever after algorithm design. Yet most teams default to G1 without understanding whether their workload demands ZGC or Shenandoah. The wrong collector choice manifests as either unexplained latency spikes or wasted CPU capacity — both invisible until they compound under production load.
This guide covers G1, ZGC, and Shenandoah from a production operator's perspective. Each section includes failure scenarios, tuning knobs, and trade-off analysis grounded in real production incidents. No toy examples — every configuration reflects what actually breaks in the field.
The core misconception: GC tuning is about eliminating pauses. It is not. GC tuning is about aligning pause behavior with your application's latency budget and throughput requirements. A 200ms pause is catastrophic for a trading engine and irrelevant for a batch ETL job. Context determines correctness.
What G1 Humongous Allocation Storm Actually Means
A humongous allocation storm occurs when the G1 garbage collector repeatedly fails to allocate objects larger than half a G1 region (typically 512 KB to 1 MB) because the free regions are scattered, not contiguous. G1 treats these oversized objects as 'humongous' and allocates them directly in a contiguous block of regions, bypassing the normal young generation path. When many such allocations happen in rapid succession, G1 must perform frequent concurrent marking cycles to reclaim space, which degrades throughput and can trigger full STW (stop-the-world) collections.
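When -XX:G1HeapRegionSize is not set, G1 picks the region size ergonomically. It takes roughly the heap size divided by 2048, rounds it to a power of two, and clamps it to 1MB-32MB, so a 2GB heap gets 1MB regions and a 16GB heap gets 8MB regions. A humongous object occupies one or more whole regions, and the unused tail of its last region is wasted.

Those regions are reclaimed during the cleanup of a concurrent mark cycle or during a full GC. Since JDK 8u60, young collections can also eagerly reclaim dead humongous objects (historically only primitive arrays such as byte[]). During a storm, the heap fragments rapidly because G1 does not move humongous objects during young or mixed collections. The result: evacuation failures (to-space exhausted), more frequent concurrent cycles, and eventually a full GC that pauses all application threads for seconds.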
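The region-size ergonomics are easier to reason about as code. This sketch approximates the rule, assuming -Xms equals -Xmx: heap / 2048, with a 1MB floor, rounded down to a power of two, capped at 32MB. JDK 8 and 11 round down. Newer releases may round differently, so for heap sizes that are not powers of two, confirm the real value with `java -Xmx<size> -XX:+UseG1GC -XX:+PrintFlagsFinal -version | grep G1HeapRegionSize`. The class name is hypothetical.

```java
/**
 * Approximation of G1's default region-size ergonomics (JDK 8/11 style rounding),
 * assuming -Xms == -Xmx. Not HotSpot code; verify with -XX:+PrintFlagsFinal.
 */
public final class G1RegionSizeEstimate {
    private static final long MB = 1024L * 1024L;
    private static final long MIN_REGION = 1 * MB;
    private static final long MAX_REGION = 32 * MB;
    private static final long TARGET_REGIONS = 2048;

    private G1RegionSizeEstimate() {}

    public static long regionSizeBytes(long maxHeapBytes) {
        long size = Math.max(maxHeapBytes / TARGET_REGIONS, MIN_REGION);
        size = Long.highestOneBit(size);   // round down to a power of two
        return Math.min(size, MAX_REGION);
    }

    /** Half a region: objects strictly larger than this are humongous. */
    public static long humongousThresholdBytes(long maxHeapBytes) {
        return regionSizeBytes(maxHeapBytes) / 2;
    }

    public static void main(String[] args) {
        long[] heapsGb = {2, 4, 16, 64};
        for (long gb : heapsGb) {
            long heap = gb * 1024 * MB;
            System.out.printf("%3d GB heap -> %2d MB regions, humongous above %d KB%n",
                    gb, regionSizeBytes(heap) / MB, humongousThresholdBytes(heap) / 1024);
        }
    }
}
```

For the power-of-two heaps in `main`, this prints 1MB, 2MB, 8MB, and 32MB regions, matching the examples in the text.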
You encounter this in systems with large caches, byte buffers, or serialized payloads that exceed the region size. The fix is not just increasing heap size — it's controlling the size and lifetime of large allocations. Use -XX:G1HeapRegionSize to tune region size, or pool and reuse large objects to avoid repeated humongous allocations.
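Before picking a region size, it helps to check which payload sizes actually cross the threshold. This sketch assumes a 16-byte array header (typical for 64-bit HotSpot with compressed class pointers) and 8-byte object alignment. It applies G1's rule that an object is humongous when it is strictly larger than half a region. It also shows the wasted tail of the last region. The class name is hypothetical.

```java
/**
 * Sketch: is a byte[] humongous for a given region size, how many regions
 * does it pin, and how much of the last region is wasted? Assumes a 16-byte
 * array header and 8-byte alignment; both vary with JVM flags.
 */
public final class HumongousCheck {
    public static final long ARRAY_HEADER_BYTES = 16; // assumption

    private HumongousCheck() {}

    public static long objectBytes(long arrayLength) {
        long raw = ARRAY_HEADER_BYTES + arrayLength;
        return (raw + 7) & ~7L; // round up to 8-byte alignment
    }

    public static boolean isHumongous(long arrayLength, long regionBytes) {
        return objectBytes(arrayLength) > regionBytes / 2;
    }

    /** Number of whole regions a humongous array occupies. */
    public static long regionsSpanned(long arrayLength, long regionBytes) {
        return (objectBytes(arrayLength) + regionBytes - 1) / regionBytes;
    }

    /** Bytes left unusable in the last region of a humongous array. */
    public static long wastedTailBytes(long arrayLength, long regionBytes) {
        return regionsSpanned(arrayLength, regionBytes) * regionBytes - objectBytes(arrayLength);
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        // Exactly half a region of data is still humongous because of the header
        System.out.println(isHumongous(512 * 1024, 1 * mb));        // true
        // A 3MB protobuf payload with 2MB regions pins two regions...
        System.out.println(regionsSpanned(3 * mb, 2 * mb));          // 2
        // ...and wastes almost a full megabyte of the second one (in KB)
        System.out.println(wastedTailBytes(3 * mb, 2 * mb) / 1024);  // 1023
    }
}
```

The header detail matters in practice: a `new byte[512 * 1024]` buffer that looks like "exactly half" is humongous with 1MB regions.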
G1GC — The Workhorse Collector
G1 (Garbage-First) has been the default JVM collector since Java 9. It divides the heap into equal-sized regions (1MB to 32MB) and prioritizes collecting regions with the most garbage — hence 'garbage-first'. G1 maintains a remembered set per region tracking incoming references, enabling independent region collection without scanning the entire heap.
G1 operates in young-only and mixed collection cycles. Young GC collects survivor and eden regions. When the heap occupancy exceeds the Initiating Heap Occupancy Percent (IHOP), G1 triggers a concurrent marking cycle. After marking completes, subsequent mixed GCs collect both young and old regions identified as mostly garbage.
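The critical production insight: G1's pause time is driven by how much work each pause does, not by total heap size. That work is chiefly the live data G1 must copy out of the regions in the collection set. A 64GB heap whose collected regions are mostly garbage can pause for less time than a 4GB heap that evacuates large volumes of live data. The same 64GB heap with aggressive evacuation settings can pause longer than either. Most engineers assume heap size alone sets pause length.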
```java
package io.thecodeforge.gc;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Demonstrates allocation patterns that stress G1 differently.
 *
 * Key insight: G1 humongous objects (larger than half a region) bypass normal
 * young-generation allocation and can contribute to to-space exhausted failures.
 */
public class G1TuningExample {

    // Cache with large value objects: a common source of humongous allocations
    private final Map<String, byte[]> payloadCache = new ConcurrentHashMap<>();

    /**
     * BAD: Allocates objects that may exceed the humongous threshold.
     * With 1MB regions, arrays larger than 512KB (header included) are humongous.
     * With 32MB regions, the threshold is 16MB: much safer for large payloads.
     *
     * Tuning: -XX:G1HeapRegionSize=32M
     *         -XX:G1ReservePercent=15
     *         -XX:InitiatingHeapOccupancyPercent=35
     */
    public void cacheLargePayload(String key, int sizeBytes) {
        byte[] payload = new byte[sizeBytes];
        // Simulate deserialization by filling the first 1KB
        for (int i = 0; i < Math.min(sizeBytes, 1024); i++) {
            payload[i] = (byte) (i & 0xFF);
        }
        payloadCache.put(key, payload);
    }

    /**
     * BETTER: Chunk large payloads to stay below the humongous threshold.
     * Each chunk is an ordinary object that G1 can evacuate and collect normally.
     * Callers must track the chunk count to reassemble the payload.
     */
    public void cacheChunkedPayload(String key, byte[] fullPayload) {
        int chunkSize = 256 * 1024; // 256KB: below the threshold even with 1MB regions
        int numChunks = (fullPayload.length + chunkSize - 1) / chunkSize;
        for (int i = 0; i < numChunks; i++) {
            int offset = i * chunkSize;
            int length = Math.min(chunkSize, fullPayload.length - offset);
            byte[] chunk = new byte[length];
            System.arraycopy(fullPayload, offset, chunk, 0, length);
            payloadCache.put(key + ":chunk:" + i, chunk);
        }
    }

    /*
     * Production G1 flags for a 16GB heap with a mixed allocation profile:
     *
     * -XX:+UseG1GC
     * -Xms16g -Xmx16g
     * -XX:G1HeapRegionSize=16m
     * -XX:MaxGCPauseMillis=200
     * -XX:G1ReservePercent=15
     * -XX:InitiatingHeapOccupancyPercent=35
     * -XX:G1MixedGCCountTarget=8             (8 is already the default)
     * -XX:+UnlockExperimentalVMOptions
     * -XX:G1MixedGCLiveThresholdPercent=85   (experimental; 85 is already the default)
     * -Xlog:gc*,gc+humongous=debug:file=/var/log/gc.log:time,uptime,level,tags
     */
}
```
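cacheChunkedPayload stores the chunks but leaves reassembly to the caller. This companion sketch splits a payload and reads it back as a single InputStream, so a streaming consumer (for example, a parser that reads from an InputStream) never rebuilds the humongous array. The class and method names are hypothetical. The readAllBytes() call in main exists only to verify the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch: split payloads below the humongous threshold and stream them back. */
public final class PayloadChunker {
    private PayloadChunker() {}

    public static List<byte[]> split(byte[] payload, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < payload.length; offset += chunkSize) {
            int length = Math.min(chunkSize, payload.length - offset);
            byte[] chunk = new byte[length];
            System.arraycopy(payload, offset, chunk, 0, length);
            chunks.add(chunk);
        }
        return chunks;
    }

    /** Concatenates the chunks as one stream without allocating a single large array. */
    public static InputStream asStream(List<byte[]> chunks) {
        List<InputStream> streams = new ArrayList<>();
        for (byte[] chunk : chunks) {
            streams.add(new ByteArrayInputStream(chunk));
        }
        return new SequenceInputStream(Collections.enumeration(streams));
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[3 * 1024 * 1024 + 123]; // ~3MB, like the protobuf case
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        List<byte[]> chunks = split(payload, 256 * 1024);
        System.out.println("Chunks: " + chunks.size()); // 13
        // Verification only: readAllBytes() (Java 9+) rebuilds the large array on purpose
        byte[] roundTrip = asStream(chunks).readAllBytes();
        System.out.println("Intact: " + Arrays.equals(payload, roundTrip)); // true
    }
}
```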
- Pause time scales with live data in collected regions, not total heap size
- Humongous objects break this model — they span multiple regions and cannot be partially evacuated
- Remembered sets consume 5-10% of heap as off-heap overhead — budget for this when setting -Xmx
- To-space exhausted means G1 ran out of free regions to evacuate into during a pause. That pause becomes much longer while G1 handles the failure in place, and repeated failures push G1 into a full GC.
ZGC — Sub-Millisecond Pause Collector
ZGC (Z Garbage Collector) was introduced as an experimental feature in JDK 11 and became production-ready in JDK 15. Its defining characteristic: pause times stay below 10ms regardless of heap size — tested up to 16TB heaps. ZGC achieves this through concurrent everything: marking, relocation, and reference processing all happen while application threads run.
ZGC uses load barriers with colored pointers. Generational ZGC adds store barriers as well. In non-generational ZGC, every object reference carries metadata bits (Marked0, Marked1, Remapped, Finalizable) embedded in the pointer itself. The load barrier runs whenever a reference is loaded from the heap and checks whether that reference needs marking or remapping. This is the fundamental trade-off: ZGC replaces long GC pauses with a small overhead on every reference load.
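A toy model makes the load-barrier idea concrete. It is illustrative only: the bit positions, the single "good color", and the healing logic are simplified assumptions, not ZGC's real pointer layout. A reference carries color bits above its address. The fast path is a single mask-and-compare. On a mismatch, the slow path fixes the reference and writes it back ("self-healing"), so the next load of that field takes the fast path.

```java
/**
 * Toy model of a colored-pointer load barrier. Bit positions and color
 * bookkeeping are simplified assumptions, not ZGC's actual layout.
 */
public final class ColoredPointerModel {
    public static final int COLOR_SHIFT = 42;                    // assumption: address in low 42 bits
    public static final long ADDRESS_MASK = (1L << COLOR_SHIFT) - 1;
    public static final long MARKED0 = 1L << COLOR_SHIFT;
    public static final long MARKED1 = 1L << (COLOR_SHIFT + 1);
    public static final long REMAPPED = 1L << (COLOR_SHIFT + 2);
    public static final long COLOR_MASK = MARKED0 | MARKED1 | REMAPPED;

    public static long goodColor = REMAPPED; // a real collector flips this between phases
    public static long slowPathCount = 0;

    private ColoredPointerModel() {}

    /** Simulated heap field holding a colored reference. */
    public static final class Field {
        public long ref;
        public Field(long ref) { this.ref = ref; }
    }

    public static long address(long ref) {
        return ref & ADDRESS_MASK;
    }

    public static long loadBarrier(Field field) {
        long ref = field.ref;
        if ((ref & COLOR_MASK) == goodColor) {
            return ref;                              // fast path
        }
        slowPathCount++;
        long healed = address(ref) | goodColor;      // a real GC would also look up a forwarding address
        field.ref = healed;                          // self-heal: later loads hit the fast path
        return healed;
    }

    public static void main(String[] args) {
        Field f = new Field(0x1000L | MARKED0);      // stale color from an earlier phase
        loadBarrier(f);
        loadBarrier(f);
        System.out.println("Slow paths taken: " + slowPathCount);                 // 1
        System.out.println("Address unchanged: " + (address(f.ref) == 0x1000L));  // true
    }
}
```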
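As of JDK 21, ZGC supports generational mode (-XX:+UseZGC -XX:+ZGenerational), which dramatically improves throughput by focusing collection on young objects. Generational mode became the default in JDK 23, and JDK 24 removed non-generational ZGC, so the ZGenerational flag only matters on JDK 21 and 22. Non-generational ZGC collects the entire heap every cycle, which limits throughput on allocation-heavy workloads.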
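Because the right ZGC flags depend on the JDK release, a launcher script or deployment template can derive them from the feature version. This is a minimal sketch following ZGC's version history: experimental in JDK 11-14 (so it needs -XX:+UnlockExperimentalVMOptions), production from JDK 15, generational opt-in on JDK 21-22, default from 23. The helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: ZGC launch flags by JDK feature release. Helper name is hypothetical. */
public final class ZgcFlags {
    private ZgcFlags() {}

    public static List<String> forJdk(int feature) {
        if (feature < 11) {
            throw new IllegalArgumentException("ZGC requires JDK 11+");
        }
        List<String> flags = new ArrayList<>();
        if (feature <= 14) {
            flags.add("-XX:+UnlockExperimentalVMOptions"); // ZGC was experimental before JDK 15
        }
        flags.add("-XX:+UseZGC");
        if (feature == 21 || feature == 22) {
            flags.add("-XX:+ZGenerational");               // opt-in generational mode
        }
        // JDK 23: generational is the default; JDK 24+: it is the only mode
        return flags;
    }

    public static void main(String[] args) {
        int running = Runtime.version().feature();         // Runtime.Version.feature(): JDK 10+
        System.out.println("JDK " + running + ": " + String.join(" ", forJdk(Math.max(running, 11))));
    }
}
```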
```java
package io.thecodeforge.gc;

import java.util.concurrent.atomic.AtomicLong;

/**
 * ZGC-specific considerations for production workloads.
 *
 * ZGC trades per-access overhead for near-zero pause times.
 * The load barrier adds ~4-8% overhead on pointer-heavy workloads.
 */
public class ZGCTuningExample {

    private final AtomicLong allocationCounter = new AtomicLong(0);

    /**
     * ZGC is driven by allocation rate (bytes per second): the faster the heap
     * fills, the more often cycles must run and the higher the risk of
     * allocation stalls. Load barriers are a separate cost: they fire when
     * references are loaded from the heap, not when objects are allocated.
     *
     * Monitor: jstat -gcutil <pid> 1000
     * Watch:   ZGC cycle count and allocation rate.
     */
    public void highFrequencyAllocation() {
        // 1000 short-lived allocations per call. Generational ZGC collects
        // objects like these cheaply in young collections.
        for (int i = 0; i < 1000; i++) {
            Object temp = new Object(); // immediately eligible for collection
            allocationCounter.incrementAndGet();
        }
    }

    /*
     * Production ZGC flags for a 32GB heap, latency-sensitive service:
     *
     * -XX:+UseZGC
     * -XX:+ZGenerational          // JDK 21-22 only; default from JDK 23, removed in JDK 24
     * -Xms32g -Xmx32g             // Always set Xms=Xmx for ZGC
     * -XX:SoftMaxHeapSize=28g     // ZGC-specific: target heap occupancy
     * -XX:ZCollectionInterval=5   // Force a GC cycle at least every 5 seconds
     * -XX:ConcGCThreads=4         // Concurrent GC threads (default: auto)
     * -XX:ParallelGCThreads=8     // Threads used during pauses
     * -Xlog:gc*:file=/var/log/zgc.log:time,uptime,level,tags
     *
     * CRITICAL: ZGC uses ~20% native memory overhead beyond -Xmx.
     * The container memory limit should be at least heap * 1.25.
     */

    /**
     * ZGC SoftMaxHeapSize is unique: it tells ZGC to try to stay below this
     * threshold, but the heap can exceed it under allocation pressure.
     *
     * Use case: Set heap to 32GB, SoftMaxHeapSize to 28GB.
     * ZGC triggers cycles aggressively to stay under 28GB and only grows
     * into the remaining 4GB under extreme pressure. This keeps a safety
     * margin that helps prevent container OOM kills.
     */
    public void demonstrateSoftMaxHeapConcept() {
        // With SoftMaxHeapSize=28g and Xmx=32g:
        // - ZGC targets 28GB occupancy
        // - If allocation pressure pushes past 28GB, ZGC cycles more aggressively
        // - Only if allocation rate exceeds reclamation rate does it use 28-32GB
        // - If it hits 32GB, threads stall on allocation (backpressure) until a
        //   cycle frees memory; OutOfMemoryError follows only if no cycle can
        //
        // This is fundamentally different from G1's IHOP, which just triggers
        // a marking cycle. SoftMaxHeapSize is a continuous pressure signal.
    }
}
```
- Pause times are largely independent of heap size and live data size, and ZGC supports heaps up to 16TB
- The trade-off is per-access CPU overhead, not pause time — you pay on every object load, not during GC
- Colored pointers need spare bits in a full 64-bit reference, which is why ZGC cannot use compressed object pointers (UseCompressedOops). This increases memory usage by ~15% on heaps < 32GB.
- Generational ZGC (JDK 21+) reduces overall GC CPU cost and allocation-stall risk by focusing on the young generation
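You can confirm at runtime whether a JVM is paying the compressed-oops cost. This sketch reads the flag through HotSpotDiagnosticMXBean, which is HotSpot-specific, so it assumes an OpenJDK/HotSpot-based JDK. Run the same binary under G1 and under ZGC to see the difference. The class name is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: report whether this JVM uses compressed oops (HotSpot-based JDKs only). */
public final class CompressedOopsCheck {
    private CompressedOopsCheck() {}

    public static String rawFlagValue() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hotspot.getVMOption("UseCompressedOops").getValue(); // "true" or "false"
    }

    public static boolean usingCompressedOops() {
        return Boolean.parseBoolean(rawFlagValue());
    }

    public static void main(String[] args) {
        // Typically true under G1 or Shenandoah with heaps below ~32GB, false under ZGC
        System.out.println("UseCompressedOops = " + usingCompressedOops());
    }
}
```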
Shenandoah — Red Hat's Low-Pause Contender
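Shenandoah is Red Hat's concurrent compacting collector. It was integrated as an experimental feature in JDK 12, has been production-ready since JDK 15, and was backported to JDK 8u and 11u builds. It achieves low pause times through concurrent evacuation, moving live objects while application threads run. The original design (JDK 12 and earlier) used Brooks pointers, an extra forwarding-pointer word on every object. Since JDK 13, Shenandoah keeps the forwarding pointer in the object header instead.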
Shenandoah's architecture differs from ZGC in a critical way: it records forwarding information per object instead of coloring pointers. Before JDK 13 that information lived in a Brooks pointer field; since JDK 13 it lives in the mark word. Because it needs no spare pointer bits, Shenandoah works with compressed oops, which reduces memory overhead compared to ZGC on heaps under 32GB.
Shenandoah operates in three concurrent phases: concurrent mark (identify live objects), concurrent evacuate (move live objects out of garbage-heavy regions), and concurrent update-refs (fix pointers to moved objects). The initial mark and final mark phases are short stop-the-world pauses, typically under 10ms.
```java
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;

/**
 * Shenandoah-specific production considerations.
 *
 * Shenandoah 1.0 (JDK 12 and earlier) used Brooks pointers: every object had an
 * extra forwarding-pointer word, adding 8 bytes per object on 64-bit systems.
 * Since JDK 13 the forwarding pointer is stored in the object header, so this
 * per-object cost only applies to older Shenandoah builds.
 */
public class ShenandoahTuningExample {

    /**
     * Brooks pointer overhead calculation (Shenandoah 1.0 only):
     *
     * Object with 2 fields (16 bytes header + 16 bytes data = 32 bytes)
     * + 8 bytes Brooks pointer = 40 bytes per object
     * Overhead: 25% increase per object
     *
     * For 10 million small objects: ~80MB additional memory
     * For 100 million small objects: ~800MB additional memory
     *
     * Compare to ZGC: no per-object overhead, but ~15% heap overhead
     * from multi-mapping and compressed oops unavailability.
     */
    public long estimateBrooksOverhead(int objectCount) {
        return (long) objectCount * 8; // 8 bytes per Brooks pointer
    }

    /*
     * Production Shenandoah flags for a 16GB heap:
     *
     * -XX:+UseShenandoahGC
     * -Xms16g -Xmx16g
     * -XX:ShenandoahGCHeuristics=adaptive   // or 'compact', 'static'
     * -XX:ShenandoahAllocationThreshold=10  // trigger after allocating 10% of heap since last cycle
     *                                       // (may require -XX:+UnlockExperimentalVMOptions)
     * -XX:+UseCompressedOops                // works with Shenandoah (unlike ZGC)
     * -XX:+UseCompressedClassPointers
     * -Xlog:gc*:file=/var/log/shenandoah.log:time,uptime,level,tags
     *
     * Heuristic modes:
     * - adaptive: (default) starts cycles based on observed allocation rate and cycle times
     * - compact:  more aggressive collection, lower heap usage, higher CPU
     * - static:   triggers on fixed free-space and allocation thresholds, ignoring
     *             history; predictable behavior for benchmarking
     */

    /**
     * Shenandoah pacing is a unique feature that backpressures allocating
     * threads when the collector falls behind.
     *
     * Unlike ZGC, which stalls allocation entirely, Shenandoah slows down
     * allocating threads proportionally. This creates smoother latency
     * degradation under load rather than sharp spikes.
     *
     * Monitor: -Xlog:gc+stats to see pacing delays.
     * If pacing delays exceed 1ms consistently, the allocation rate is too high
     * for the current heap size and ConcGCThreads setting.
     */
    public void demonstratePacingBehavior() {
        List<byte[]> allocations = new ArrayList<>();
        // Under heavy allocation, Shenandoah paces this loop by adding small
        // delays to allocations, proportional to how far behind the collector is.
        //
        // This differs from ZGC's allocation stall, which is a hard stop.
        // Shenandoah's pacing creates gradual degradation.
        for (int i = 0; i < 100_000; i++) {
            allocations.add(new byte[1024]); // 1KB each, ~100MB retained in total
        }
    }
}
```
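- Barrier overhead is comparable to ZGC's: Shenandoah puts a barrier on reference loads (load-reference barriers since JDK 13, Brooks read barriers before that) plus SATB barriers on stores during marking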
- Works with compressed oops — saves ~15% memory compared to ZGC on heaps under 32GB
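- Per-object overhead of 8 bytes on Shenandoah 1.0 (JDK 12 and earlier), significant for workloads with many small objects; JDK 13+ has no extra per-object word
- Concurrent evacuation means compaction happens while the application runs, instead of in G1's stop-the-world evacuation pauses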
- Pacing mechanism creates graceful degradation instead of hard allocation stalls
Comparing the Three Collectors — Real Trade-offs
The collector choice is not about which is 'best' — it is about matching the collector's trade-off profile to your workload's requirements. Every collector sacrifices something: G1 sacrifices pause-time predictability for throughput. ZGC sacrifices throughput and memory for near-zero pauses. Shenandoah sacrifices per-object memory for balanced pause-throughput behavior.
The following comparison reflects production reality, not benchmark lab conditions. Real workloads have allocation spikes, mixed object lifetimes, and container constraints that change the calculus entirely.
```java
package io.thecodeforge.gc;

/**
 * Decision framework for collector selection based on production constraints.
 *
 * No collector is universally superior. This guide maps workload
 * characteristics to the appropriate collector.
 */
public class CollectorSelectionGuide {

    /*
     * SELECTION MATRIX:
     *
     * Workload Profile          | Recommended Collector | Reason
     * --------------------------|-----------------------|------------------------------------------
     * General web service       | G1                    | Good balance, mature ecosystem
     * Sub-10ms latency SLA      | ZGC (generational)    | Hard pause guarantee
     * Sub-10ms + <32GB heap     | Shenandoah            | Compressed oops, pacing
     * Batch processing          | G1 or Parallel        | Throughput over latency
     * Large heap (>64GB)        | ZGC                   | G1 pause times grow with heap size
     * Small heap (<4GB)         | G1                    | Overhead of ZGC/Shenandoah unjustified
     * Container-constrained     | G1 or Shenandoah      | Lower native memory overhead
     * High allocation rate      | ZGC (generational)    | Generational mode handles young gen
     * Mixed object lifetimes    | G1                    | Region-based collection handles this well
     * Many small objects        | ZGC                   | No per-object overhead (vs Shenandoah <= JDK 12)
     * Many large objects        | G1 (large regions)    | Larger regions raise the humongous threshold
     */

    /*
     * NATIVE MEMORY OVERHEAD COMPARISON (approximate, production values):
     *
     * G1:
     *   - Remembered sets: 5-10% of heap
     *   - Card table: ~0.2% of heap
     *   - Total native overhead: ~10-15% of heap
     *
     * ZGC:
     *   - Multi-mapping: ~15-20% of heap (virtual address space)
     *   - Page table overhead: variable
     *   - No compressed oops: +15% heap usage for <32GB heaps
     *   - Total native overhead: ~20-25% of heap
     *
     * Shenandoah:
     *   - Brooks pointers: 8 bytes per object (JDK 12 and earlier only)
     *   - Marking bitmaps and region metadata (no remembered sets in non-generational mode)
     *   - Compressed oops: supported (saves ~15% vs ZGC)
     *   - Total native overhead: ~10-15% of heap, plus per-object cost on older builds
     */

    /*
     * CONTAINER MEMORY FORMULA:
     *
     * G1:         container_limit = Xmx * 1.15
     * ZGC:        container_limit = Xmx * 1.25
     * Shenandoah: container_limit = Xmx * 1.15 + (object_count * 8)   // object term: JDK 12 and earlier
     *
     * If the container limit is fixed, work backwards:
     * G1:         Xmx = container_limit / 1.15
     * ZGC:        Xmx = container_limit / 1.25
     * Shenandoah: Xmx = (container_limit - object_count * 8) / 1.15
     */
}
```
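The "work backwards" formulas above translate directly into a sizing helper for deployment templates. This is a sketch under stated assumptions. The factors (1.15 for G1 and Shenandoah, 1.25 for ZGC) are this article's rules of thumb, not JVM constants. The per-object term applies only to Shenandoah builds that still use Brooks pointers (JDK 12 and earlier). Integer math keeps the results exact. The class name is hypothetical.

```java
/** Sketch: max heap that fits a fixed container limit, using the article's factors. */
public final class ContainerSizing {
    public enum Collector { G1, ZGC, SHENANDOAH }

    private ContainerSizing() {}

    public static long maxHeapBytes(Collector gc, long containerLimitBytes, long brooksObjectCount) {
        switch (gc) {
            case G1:
                return containerLimitBytes * 100 / 115;
            case ZGC:
                return containerLimitBytes * 100 / 125;
            case SHENANDOAH:
                // brooksObjectCount should be 0 on JDK 13+ (no forwarding-pointer word)
                return (containerLimitBytes - brooksObjectCount * 8) * 100 / 115;
            default:
                throw new IllegalArgumentException(String.valueOf(gc));
        }
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long limit = 11_500 * mb; // an 11.5GB container limit
        for (Collector gc : Collector.values()) {
            System.out.println(gc + ": -Xmx" + maxHeapBytes(gc, limit, 0) / mb + "m");
        }
        // G1: -Xmx10000m, ZGC: -Xmx9200m, SHENANDOAH: -Xmx10000m
    }
}
```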
- G1: Maximizes throughput and memory efficiency. Sacrifices pause-time predictability below ~50ms.
- ZGC: Maximizes pause-time guarantee and memory compaction. Sacrifices throughput (10-15%) and memory (compressed oops unavailable).
- Shenandoah: Maximizes pause-time guarantee and throughput balance. Sacrifices some throughput to barriers and, on JDK 12 and earlier, 8 bytes per object for the Brooks pointer.
- No tuning can break this triangle — you are choosing which axis to sacrifice, not eliminating trade-offs.
Production Tuning Patterns That Actually Work
Most GC tuning guides present flags in isolation. Production tuning requires understanding how flags interact and which signals indicate which adjustments. These patterns are derived from incidents across payment processing, real-time bidding, and high-frequency trading systems.
```java
package io.thecodeforge.gc;

/**
 * Production tuning patterns organized by problem type.
 * Each pattern addresses a specific failure mode.
 */
public class ProductionTuningPatterns {

    /*
     * PATTERN 1: Allocation Rate Spike Handler
     *
     * Problem: Bursts of allocation cause GC to fall behind.
     * Symptom: Increasing pause times during traffic spikes.
     *
     * G1 Fix:
     *   -XX:InitiatingHeapOccupancyPercent=30   // start marking earlier
     *   -XX:G1ReservePercent=15                 // more evacuation buffer
     *   -XX:G1RSetUpdatingPauseTimePercent=5    // less RSet work in pause
     *
     * ZGC Fix:
     *   -XX:SoftMaxHeapSize=<70% of Xmx>        // trigger cycles earlier
     *   -XX:ConcGCThreads=<cores/4>             // more concurrent threads
     *   -XX:+ZGenerational                      // focus on young objects (JDK 21-22; default later)
     *
     * Shenandoah Fix:
     *   -XX:ShenandoahAllocationThreshold=5     // cycle after 5% allocation (may need UnlockExperimentalVMOptions)
     *   -XX:ConcGCThreads=<cores/4>             // more concurrent threads
     *   -XX:ShenandoahGCHeuristics=compact      // aggressive reclamation
     */

    /*
     * PATTERN 2: Long-Lived Cache Optimization
     *
     * Problem: Large caches create a big live data set that GC must scan
     * but never reclaim. This wastes GC cycles and increases pause times.
     *
     * Solution: Use off-heap caching (Chronicle Map, or Ehcache's off-heap tier)
     * to move cached data outside GC's jurisdiction. Caffeine with weakValues
     * stays on-heap; it only lets GC reclaim values nothing else references.
     *
     * If on-heap caching is required:
     *   G1:   -XX:+UnlockExperimentalVMOptions
     *         -XX:G1MixedGCLiveThresholdPercent=90  // regions with >90% live data are skipped
     *                                               // (the default, 85, already skips more of them)
     *   ZGC:  Already handles this well with concurrent marking
     *   Shen: -XX:ShenandoahGCHeuristics=adaptive    // the default; evacuates only regions with enough garbage
     */

    /*
     * PATTERN 3: Container-Aware Sizing
     *
     * Problem: JVM heap + native memory exceeds container limit.
     * Symptom: OOM kill with no heap exhaustion in metrics.
     *
     * Rule of thumb for container memory limits (more conservative than the
     * CollectorSelectionGuide formulas, because it also covers the items below):
     *   - Set Xmx = container_limit * 0.80 for G1
     *   - Set Xmx = container_limit * 0.70 for ZGC
     *   - Set Xmx = container_limit * 0.80 for Shenandoah
     *
     * Remaining memory covers:
     *   - Thread stacks (up to 1MB per thread by default, ~500 threads = 500MB)
     *   - Metaspace (class metadata, usually 100-300MB)
     *   - Direct byte buffers (monitor with BufferPoolMXBean)
     *   - GC internal structures (remembered sets, card tables)
     *   - JNI native memory
     */

    /*
     * PATTERN 4: Warm-Up Tuning for Low-Latency Services
     *
     * Problem: First requests after deployment have high latency due to
     * JIT compilation, class loading, and initial GC cycles.
     *
     * Solution:
     *   1. Use -XX:+AlwaysPreTouch to touch every heap page at startup, so the OS
     *      commits memory before traffic arrives instead of page-faulting under load
     *   2. Implement warm-up traffic routing (load balancer weight ramp)
     *   3. Run synthetic allocation load for 60s before accepting traffic
     *   4. For ZGC: first 2-3 cycles are slower as JIT optimizes load barriers
     *
     * io.thecodeforge.gc.WarmUpManager can handle synthetic warm-up.
     */
}
```
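Pattern 3's headroom budget can be checked against a running service with standard platform MXBeans. This sketch estimates thread stacks as threadCount * 1MB, assuming the usual 64-bit Linux -Xss default. Stacks are reserved up front but committed lazily, so treat that figure as an upper bound. It adds used Metaspace (HotSpot's pool name) and the "direct" buffer pool. GC structures and JNI allocations are not visible here; Native Memory Tracking (`-XX:NativeMemoryTracking=summary` with `jcmd <pid> VM.native_memory summary`) covers those. The class name is hypothetical.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

/** Sketch: rough non-heap footprint for container headroom planning. */
public final class NativeHeadroom {
    private NativeHeadroom() {}

    public static long estimateBytes(int threads, long stackBytes, long metaspaceBytes, long directBytes) {
        return threads * stackBytes + metaspaceBytes + directBytes;
    }

    public static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().equals("Metaspace")) { // HotSpot pool name
                return pool.getUsage().getUsed();
            }
        }
        return 0;
    }

    public static long directBufferBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals("direct")) {
                return Math.max(0, pool.getMemoryUsed()); // -1 means no estimate available
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        long total = estimateBytes(threads, 1024L * 1024L, metaspaceUsedBytes(), directBufferBytes());
        System.out.printf("threads=%d, estimated non-heap=%d MB%n", threads, total / (1024 * 1024));
    }
}
```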
- < 500 MB/sec: Any collector handles this comfortably with default settings
- 500 MB/sec - 2 GB/sec: G1 works with tuning. ZGC generational mode handles well.
- 2-5 GB/sec: Requires aggressive tuning or allocation reduction. ZGC generational is best.
- > 5 GB/sec: Consider object pooling, arena allocation, or off-heap strategies. GC alone cannot keep up.
- Measure with: jstat -gc <pid> 1000 — calculate (bytes allocated between samples) / interval
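An in-process alternative to jstat is to sum per-thread allocation counters and map the rate onto the tiers above. This sketch relies on the HotSpot-specific com.sun.management.ThreadMXBean. Allocations by threads that exit during the window are missed, so treat the result as an estimate. The tier boundaries (500 MB/s, 2GB/s, 5GB/s) mirror the list; the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;

/** Sketch: estimate allocation rate from per-thread counters and classify it. */
public final class AllocationRateMeter {
    private AllocationRateMeter() {}

    public static long totalAllocatedBytes() {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long sum = 0;
        for (long bytes : threads.getThreadAllocatedBytes(threads.getAllThreadIds())) {
            if (bytes > 0) {
                sum += bytes; // -1 means unsupported or the thread has exited
            }
        }
        return sum;
    }

    public static double mbPerSecond(long bytes, long intervalMillis) {
        return (bytes / (1024.0 * 1024.0)) / (intervalMillis / 1000.0);
    }

    public static String tier(double mbPerSec) {
        if (mbPerSec < 500) return "any collector, default settings";
        if (mbPerSec <= 2048) return "G1 with tuning, or generational ZGC";
        if (mbPerSec <= 5120) return "aggressive tuning or allocation reduction; generational ZGC";
        return "pooling, arenas, or off-heap; GC alone cannot keep up";
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalAllocatedBytes();
        long start = System.nanoTime();
        Thread.sleep(1000); // sampling window; in a service this runs alongside real traffic
        long elapsedMs = Math.max(1, (System.nanoTime() - start) / 1_000_000);
        long allocated = Math.max(0, totalAllocatedBytes() - before); // exited threads can shrink the sum
        double rate = mbPerSecond(allocated, elapsedMs);
        System.out.printf("%.1f MB/s -> %s%n", rate, tier(rate));
    }
}
```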
Monitoring and Observability for GC Health
GC tuning without observability is blind optimization. Every production JVM must emit GC metrics that allow correlation with application latency and throughput. The minimum viable GC observability setup includes pause time histograms, allocation rate tracking, and GC cycle phase breakdowns.
```java
package io.thecodeforge.gc;

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;

/**
 * Production GC metrics exporter for Prometheus/Micrometer integration.
 *
 * These metrics enable correlation of GC behavior with application
 * latency and throughput in your observability stack.
 */
public class GCMetricsExporter {

    /**
     * Essential GC metrics every production service must emit:
     *
     * 1. jvm_gc_pause_seconds{collector, action}
     *    - Histogram of GC pause durations
     *    - Alert on p99 > SLA threshold
     *
     * 2. jvm_gc_allocation_rate_mbps
     *    - Calculated from heap usage delta between GC cycles
     *    - Leading indicator of GC pressure
     *
     * 3. jvm_gc_live_data_size_bytes
     *    - Size of live objects after major collection
     *    - Growing trend = memory leak
     *
     * 4. jvm_gc_memory_promoted_bytes_total
     *    - Bytes promoted from young to old generation
     *    - High rate = short-lived objects escaping young gen
     *
     * 5. jvm_memory_used_bytes{area, pool}
     *    - Per-memory-pool usage
     *    - Alert on old gen > 80% sustained
     */
    public void exportGCMetrics() {
        List<GarbageCollectorMXBean> gcBeans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gcBean : gcBeans) {
            String collectorName = gcBean.getName();
            long collectionCount = gcBean.getCollectionCount();
            long collectionTimeMs = gcBean.getCollectionTime();
            // Export as:
            // jvm_gc_collection_count_total{collector="<name>"} <count>
            // jvm_gc_collection_time_seconds_total{collector="<name>"} <time_sec>
            System.out.printf("Collector: %s, Count: %d, Time: %dms%n",
                    collectorName, collectionCount, collectionTimeMs);
        }

        // Memory pool monitoring for heap pressure detection
        List<MemoryPoolMXBean> memoryBeans = ManagementFactory.getMemoryPoolMXBeans();
        for (MemoryPoolMXBean pool : memoryBeans) {
            MemoryUsage usage = pool.getUsage();
            long usedMB = usage.getUsed() / (1024 * 1024);
            long max = usage.getMax(); // -1 when the pool has no defined maximum
            if (max <= 0) {
                System.out.printf("Pool: %s, Used: %dMB, Max: undefined%n", pool.getName(), usedMB);
                continue;
            }
            long maxMB = max / (1024 * 1024);
            double utilization = (double) usage.getUsed() / max * 100;
            // Alert if old gen utilization > 80% for an extended period
            System.out.printf("Pool: %s, Used: %dMB, Max: %dMB, Util: %.1f%%%n",
                    pool.getName(), usedMB, maxMB, utilization);
        }
    }

    /*
     * GC log analysis commands for production triage:
     *
     * 1. Pause time distribution:
     *    grep 'Pause' gc.log | awk '{print $NF}' | sort -n | awk '
     *      {a[NR]=$1} END {print "p50:",a[int(NR*0.5)],"p99:",a[int(NR*0.99)],"max:",a[NR]}'
     *
     * 2. GC frequency over time:
     *    grep '\[gc.*\]' gc.log | awk '{print $1}' | cut -d'T' -f1 | uniq -c
     *
     * 3. Humongous allocation rate (G1 specific):
     *    grep -i 'humongous' gc.log | wc -l
     *    grep -i 'humongous' gc.log | awk '{sum+=$NF} END {print sum/NR, "bytes avg"}'
     *
     * 4. ZGC cycle time distribution:
     *    grep 'GC(.*Garbage Collection' gc.log | grep -oP '\d+\.\d+ms' | sort -n
     *
     * 5. Shenandoah pacing delays:
     *    grep 'Pacing' gc.log | awk '{print $NF}' | sort -n | tail -20
     */
}
```
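When the shell pipeline is awkward (for example, on Windows hosts or inside a log-processing job), the "pause time distribution" one-liner above can be written in Java. This sketch assumes unified-logging lines in which a pause line contains "Pause" and ends with its duration in milliseconds. It uses nearest-rank percentiles. The sample lines in main are made up in that format, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: extract pause durations from unified GC log lines and compute percentiles. */
public final class PauseLogPercentiles {
    private static final Pattern PAUSE_MS = Pattern.compile("Pause.*\\s(\\d+(?:\\.\\d+)?)ms\\s*$");

    private PauseLogPercentiles() {}

    public static List<Double> pauseMillis(List<String> lines) {
        List<Double> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = PAUSE_MS.matcher(line);
            if (m.find()) {
                out.add(Double.parseDouble(m.group(1)));
            }
        }
        Collections.sort(out);
        return out;
    }

    /** Nearest-rank percentile (p in (0, 100]) of an ascending list. */
    public static double percentile(List<Double> sorted, double p) {
        if (sorted.isEmpty()) {
            throw new IllegalArgumentException("no pause lines");
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "[2024-01-01T10:00:00.000+0000][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 900M->300M(2048M) 12.500ms",
            "[2024-01-01T10:00:05.000+0000][info][gc] GC(2) Pause Young (Concurrent Start) (G1 Humongous Allocation) 1200M->700M(2048M) 48.000ms",
            "[2024-01-01T10:00:09.000+0000][info][gc] GC(3) Pause Full (G1 Compaction Pause) 2040M->1500M(2048M) 3120.000ms",
            "[2024-01-01T10:00:09.500+0000][info][gc] GC(4) Concurrent Mark Cycle 250.000ms");
        List<Double> pauses = pauseMillis(sample);
        // Prints: pauses=3 p50=48.0 max=3120.0 (the concurrent line is not a pause)
        System.out.println("pauses=" + pauses.size()
                + " p50=" + percentile(pauses, 50) + " max=" + percentile(pauses, 100));
    }
}
```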
- Allocation rate trend: A steadily increasing allocation rate (week over week) means you will hit GC capacity limits. Fix before it becomes an incident.
- Live data size trend: A growing live data set after full GC means a memory leak. GC cannot reclaim it — the application is retaining references.
- Pause time p99 trend: If p99 pause time is growing over days, the heap is fragmenting or the live data set is growing. Investigate before it violates SLA.
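For the live-data trend, the JDK already records heap usage as of the end of the most recent collection for each pool. This sketch reads it through MemoryPoolMXBean.getCollectionUsage(), which returns null for pools that do not support it. Sample it periodically (for example, once a minute) and store the values. A rising floor after old-generation or mixed collections is the leak signal. Pool names differ by collector, and the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Sketch: heap usage as of the end of the most recent GC, summed over heap pools. */
public final class LiveDataProbe {
    private LiveDataProbe() {}

    public static long heapUsedAfterLastGcBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.gc(); // demo only: make sure at least one collection has happened
        System.out.println("Heap used after last GC: "
                + heapUsedAfterLastGcBytes() / (1024 * 1024) + " MB");
    }
}
```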
STW vs Concurrent — Why the Collector Choice Is a Betrayal Contract
Every GC toggle is a promise. G1GC promises throughput. ZGC promises latency. Shenandoah promises predictable pauses. Each of them delivers on one axis and betrays you on another.
The real cost isn't the pause duration. It's the frequency multiplied by the workload sensitivity. A 2ms STW event kills a real-time trading feed. A 10ms incremental pause is invisible to a batch processor. Stop thinking in millisecond numbers. Start thinking in business time.
Concurrent collectors give you near-zero-pause GC cycles, but they steal CPU cycles from your application threads. G1GC marks concurrently but does its copying inside stop-the-world pauses. ZGC and Shenandoah also relocate objects concurrently, alongside your application threads. You're trading CPU and heap headroom for shorter pauses. If your heap exceeds 64GB and you're running G1GC, you're paying interest on a loan you didn't take out.
The betrayal is subtle: concurrent collectors hide the cost in the CPU graph, not the GC log. Watch your user time vs system time. A ZGC runtime that spikes your CPU to 95% while keeping 200µs pauses means you've got a collector that works but a system that fails under load. Test under peak production load, not synthetic benchmarks.
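One way to see that hidden cost from inside the JVM is to compare process CPU time with wall-clock time over a window. Concurrent GC threads count toward process CPU time, so a busy-cores figure that climbs under load while pause times stay flat is the cost described above. This sketch uses the HotSpot/OpenJDK-specific com.sun.management.OperatingSystemMXBean. It reports user and system time combined; splitting them needs OS tools such as pidstat. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;

/** Sketch: average busy cores for this process over a sampling window. */
public final class CpuVsWall {
    private CpuVsWall() {}

    /** CPU nanoseconds divided by wall nanoseconds: 2.0 means two cores busy on average. */
    public static double busyCores(long cpuNanos, long wallNanos) {
        return wallNanos <= 0 ? 0.0 : (double) cpuNanos / wallNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        long cpu0 = os.getProcessCpuTime(); // nanoseconds, or -1 if unsupported
        long wall0 = System.nanoTime();
        Thread.sleep(1000);                 // in practice, sample across real peak load
        double cores = busyCores(os.getProcessCpuTime() - cpu0, System.nanoTime() - wall0);
        System.out.printf("Average busy cores: %.2f of %d%n", cores, os.getAvailableProcessors());
    }
}
```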
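```java
// io.thecodeforge - java tutorial
// Simulates STW vs concurrent cost measurement.
// The 10% concurrent overhead below is an assumed figure for illustration, not a measurement.
public class StopwatchTest {

    // Production incident: 8ms GC pause killed .5M trades
    // Returns wall-clock time of the work in ms, including any GC pauses that hit it.
    public static long measurePauseImpact(Runnable businessLogic) {
        long start = System.nanoTime();
        businessLogic.run(); // simulates application work
        long end = System.nanoTime();
        return (end - start) / 1_000_000; // ms
    }

    private static double sink; // keeps the JIT from eliminating the pretend work

    public static void main(String[] args) {
        // G1GC scenario: most GC cost shows up as pauses in the GC log
        long appTime = measurePauseImpact(() -> {
            double acc = 0;
            for (int i = 0; i < 100_000; i++) {
                acc += Math.sqrt(i); // pretend work
            }
            sink = acc;
        });
        // ZGC scenario: an assumed 10% CPU overhead that never appears as a pause
        long withConcurrentOverhead = appTime + (appTime * 10 / 100); // +10%
        System.out.println("Application time: " + appTime + "ms");
        System.out.println("With ZGC overhead: " + withConcurrentOverhead + "ms");
        // The difference is the hidden cost: visible in CPU graphs, not in GC pause logs
    }
}
```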
The C4 Collector: Continuous Compaction and the Real Cost of Object Promotion
You've read about G1GC's region-based collection. You've heard about Shenandoah's evacuation threads. But few people talk about C4, the Continuously Concurrent Compacting Collector from Azul Systems. It has been running in production for over a decade, and Azul describes it as pauseless: its collection work is designed never to require a global stop-the-world pause.
C4 is designed not to stop the world for marking, compaction, or relocation. It uses a load barrier (Azul's Loaded Value Barrier) and self-healing: when an object has moved, the barrier fixes the stale reference on access, so the thread reading the object doesn't know it moved. There's no global STW safety net. It's a concurrent collector that took the training wheels off.
The price? C4 requires Azul's own JVM (Azul Platform Prime, formerly Zing); it is not part of OpenJDK. It earns its keep on heaps over 100GB, and the overhead is a 5-15% CPU tax for its pause guarantees. If you have a 128GB heap with a 50ms response SLA, C4 is one of the few collectors built for exactly that case.
Most teams don't need C4. But understanding it reveals a truth: with G1 and Parallel, pauses grow with heap and live-data size. C4, like ZGC and Shenandoah, is designed to keep pauses flat no matter how big the heap gets. If your G1 tuning is failing above 100GB, stop tuning and switch to the right tool.
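```java
// io.thecodeforge - java tutorial
// Contrasts short-lived allocations with retained ones.
// Note: this times allocation only. Promotion (copying survivors into the old
// generation) happens inside GC pauses, so observe it in the GC log (for example
// -Xlog:gc+age=trace for the tenuring distribution), not with System.nanoTime().
public class PromotionCost {
    static final int OBJECT_SIZE = 1024; // 1KB objects

    public static void main(String[] args) {
        // Short-lived: allocated in Eden and unreachable right after this step
        long edenStart = System.nanoTime();
        byte[][] young = new byte[100_000][OBJECT_SIZE];
        long edenEnd = System.nanoTime();
        young = null; // now garbage; a young GC reclaims it without copying

        // Long-lived: retained, so each young GC copies them through the survivor
        // spaces until they reach the tenuring threshold and are promoted
        long survivorStart = System.nanoTime();
        byte[][] survivor = new byte[50_000][];
        for (int i = 0; i < survivor.length; i++) {
            survivor[i] = new byte[OBJECT_SIZE];
        }
        long survivorEnd = System.nanoTime();

        System.out.println("Short-lived allocation: " + (edenEnd - edenStart) / 1_000_000 + "ms");
        System.out.println("Retained allocation: " + (survivorEnd - survivorStart) / 1_000_000 + "ms");
        System.out.println("Retained arrays: " + survivor.length);
        // The extra cost of retained objects is copying work inside GC pauses,
        // which these allocation timings do not capture
    }
}
```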
Overview
GC tuning in Java often starts with the wrong question: "Which collector is fastest?" The right question is: "What latency and throughput constraints does my application have?" Garbage collection is a trade-off engine, not a speed dial. Every collection has a cost: stop-the-world pauses stall application threads outright, and concurrent phases compete with your business logic for CPU. The Java Virtual Machine offers multiple collectors because no single strategy fits all workloads, and understanding the WHY behind each implementation prevents cargo-cult tuning.

Modern GCs like G1, ZGC, and Shenandoah share a goal: minimize pause impact. But they differ in how they manage memory regions, compaction, and concurrency. Your application's object allocation rate, heap size, and tolerable pause budget dictate which collector fits. Ignoring these fundamentals leads to humongous allocation storms or silent latency spikes. This section gives you the mental model to evaluate GC choices before touching any JVM flags.
```java
// io.thecodeforge — java tutorial
// Overview: GC tuning starts with constraints, not flags.
public class GCOverview {
    public static void main(String[] args) {
        // The key question: latency budget or throughput?
        String constraint = "pause < 10ms";
        String collector = constraint.startsWith("pause")
                ? "ZGC or Shenandoah"
                : "G1GC or Parallel";
        System.out.println("Choose: " + collector);
    }
}
```
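Before evaluating any flags, it helps to confirm which collector a running JVM actually uses. A minimal sketch, assuming HotSpot's usual `GarbageCollectorMXBean` names ("G1 Young Generation", "ZGC Pauses", "Shenandoah Cycles", "PS Scavenge", "Copy", and so on); other JVMs may report different names, and `WhichCollector` and `family` are illustrative names only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class WhichCollector {
    // Maps a HotSpot GarbageCollectorMXBean name to a collector family.
    static String family(String beanName) {
        if (beanName.startsWith("G1")) return "G1";
        if (beanName.startsWith("ZGC")) return "ZGC";
        if (beanName.startsWith("Shenandoah")) return "Shenandoah";
        if (beanName.startsWith("PS ")) return "Parallel";
        if (beanName.equals("Copy") || beanName.equals("MarkSweepCompact")) return "Serial";
        return "Unknown";
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-24s -> %s (collections=%d, time=%dms)%n",
                    gc.getName(), family(gc.getName()),
                    gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Run it once with `-XX:+UseG1GC` and once with `-XX:+UseZGC` to see the bean names change.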
GC Implementations — Modern Approach Using Cleaner API (Without finalize())
Java's finalize() method was deprecated in Java 9 and deprecated for removal in Java 18 (JEP 421). It introduces unpredictable GC delays, resurrection bugs, and no guarantee that it ever runs. The modern replacement is the Cleaner API (java.lang.ref.Cleaner), introduced in Java 9, which ties cleanup to object reachability instead of to finalization cycles. The WHY: finalizable objects need special GC handling. The collector must discover them, queue them for the finalizer thread, and keep them alive until finalize() has run, so their memory is reclaimed at least one GC cycle later than it otherwise would be. Cleaner uses PhantomReference under the hood, serviced by a dedicated cleanup thread. This decouples object reclamation from cleanup logic: when an object becomes phantom-reachable, the Cleaner thread runs the registered action without stalling application threads, and because a phantom reference never hands the object back, resurrection is impossible. GC-driven cleanup is still not deterministic, so close resources explicitly (for example, try-with-resources calling Cleanable.clean()) and treat the Cleaner as a safety net. This model suits low-pause collectors like ZGC and Shenandoah. Use Cleaner instead of finalize() for native resources such as off-heap memory and file descriptors. It's safer, cheaper for the GC, and future-proof.
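```java
// io.thecodeforge — java tutorial
// Modern cleanup: Cleaner replaces finalize().
import java.lang.ref.Cleaner;

public class CleanerExample implements AutoCloseable {
    static final Cleaner CLEANER = Cleaner.create();
    private final Resource resource;
    private final Cleaner.Cleanable cleanable;

    public CleanerExample() {
        resource = new Resource();
        // The action must not capture `this`, or the object never becomes
        // phantom-reachable; Resource is a static nested class for that reason.
        cleanable = CLEANER.register(this, resource::close);
    }

    @Override
    public void close() {
        cleanable.clean(); // deterministic path: runs the action at most once
    }

    static class Resource implements AutoCloseable {
        @Override
        public void close() {
            System.out.println("Native cleanup executed");
        }
    }

    public static void main(String[] args) {
        try (CleanerExample example = new CleanerExample()) {
            // use the resource; close() triggers cleanup on exit
        }
    }
}
```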
G1 Humongous Allocation Storm Crashes Payment Service Under Black Friday Load
- G1 humongous objects skip young-generation allocation and land directly in contiguous old-generation regions, so a burst of them can fragment the heap and starve the collector
- Region size is the single most important G1 tuning parameter for workloads with large transient objects
- Doubling heap without fixing the allocation pattern just delays the same failure with a longer full GC pause
- Monitor humongous allocation rate with -Xlog:gc+humongous=debug before incidents occur
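The humongous threshold is simple arithmetic, which makes it easy to check payload sizes against a region size before an incident. A sketch under stated assumptions: it treats anything larger than half a region as humongous, treats each 3-5MB protobuf message as one contiguous serialized `byte[]`, and assumes 8MB regions as a starting point (confirm your actual region size with `jcmd <pid> GC.heap_info`). The 16MB alternative corresponds to `-XX:G1HeapRegionSize=16m`; `HumongousCheck` is an illustrative name.

```java
public class HumongousCheck {
    static final long MB = 1024L * 1024L;

    // G1 treats an object larger than half a region as humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // A humongous object occupies this many contiguous regions (rounded up).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long[] payloads = {3 * MB, 5 * MB};     // the 3-5MB protobuf messages
        long[] regionSizes = {8 * MB, 16 * MB}; // assumed current vs. candidate region size
        for (long region : regionSizes) {
            for (long payload : payloads) {
                boolean humongous = isHumongous(payload, region);
                System.out.printf("region=%dMB payload=%dMB humongous=%b regions=%d%n",
                        region / MB, payload / MB, humongous,
                        humongous ? regionsNeeded(payload, region) : 0);
            }
        }
    }
}
```

With 8MB regions the 5MB messages cross the 4MB threshold and each consumes a whole region; with 16MB regions neither size does. Larger regions mean fewer regions overall, so measure before and after the change.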
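```shell
jcmd <pid> GC.heap_info
jstat -gcutil <pid> 1000 10

# -b -n 1: batch mode, one iteration, so top's output can be piped
top -b -n 1 -H -p <pid> | grep -E 'VM Thread|GC Thread'
jcmd <pid> VM.flags | grep -i conc

jstat -gcutil <pid> 500 20
grep 'Pause' gc.log | tail -20

kubectl describe pod <pod> | grep -A5 'OOMKilled'
# requires -XX:NativeMemoryTracking=summary at JVM startup
jcmd <pid> VM.native_memory summary

# -i: capitalization differs between JDK log formats (e.g. "G1 Humongous Allocation")
grep -i 'to-space exhausted' gc.log | wc -l
grep -i 'humongous' gc.log | tail -20
```

| Characteristic | G1GC | ZGC | Shenandoah |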
|---|---|---|---|
| JDK availability | JDK 7+ (default JDK 9+) | JDK 11+ (prod JDK 15+) | JDK 12+ (prod JDK 15+); backports to 8u/11u in some builds |
| Typical pause time | 50-200ms (tunable to ~50ms) | < 10ms (independent of heap) | < 10ms (independent of heap) |
| Throughput overhead | Baseline (lowest) | 10-15% vs G1 | 5-10% vs G1 |
| Native memory overhead | ~10-15% of heap | ~20-25% of heap | ~10-15% of heap (older Brooks-pointer builds, before JDK 13, add 8 bytes/object) |
| Compressed oops | Supported | Not supported | Supported |
| Generational collection | Yes (built-in) | Yes (JDK 21-22 with -XX:+ZGenerational; default from JDK 23) | No (full-heap concurrent) |
| Max tested heap | Terabytes | 16TB | Terabytes |
| Humongous objects | Problematic; requires tuning | No concept; handles large objects well | Has humongous regions, but only for objects larger than a full region (G1's threshold is half a region) |
| Container friendliness | Good — predictable overhead | Poor — high native memory | Good — supports compressed oops |
| Allocation stall behavior | Full GC fallback (catastrophic) | Hard stall (backpressure) | Soft pacing (gradual degradation) |
| Tuning complexity | Moderate — many flags | Low — fewer flags, self-tuning | Low-moderate — heuristic modes |
| Community/ecosystem maturity | Very mature — default collector | Mature — growing adoption | Moderate — Red Hat backed |
| Best use case | General purpose, cost-sensitive | Ultra-low latency, large heaps | Low latency, memory-efficient, moderate heaps |
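The table's pause and memory rows can be collapsed into a first-pass rule of thumb. This is a rough, hypothetical encoding (`CollectorChoice` and `suggest` are illustrative names, not a JVM API): the 50ms cut-off comes from the "Typical pause time" row (G1 tunable to roughly 50ms), and the Shenandoah/ZGC split follows the native-memory and container rows. It is a starting point for profiling, not a replacement for it.

```java
public class CollectorChoice {
    enum Collector { G1, ZGC, SHENANDOAH }

    // Hypothetical helper: first-pass suggestion derived from the comparison table.
    static Collector suggest(double gcPauseBudgetMs, boolean memoryConstrained) {
        // G1 pauses are typically 50-200ms and tunable to roughly 50ms: start there.
        if (gcPauseBudgetMs >= 50) {
            return Collector.G1;
        }
        // Tighter budgets need a concurrent collector. Shenandoah keeps compressed
        // oops and a smaller native footprint; ZGC costs ~20-25% of heap natively.
        return memoryConstrained ? Collector.SHENANDOAH : Collector.ZGC;
    }

    public static void main(String[] args) {
        System.out.println(suggest(200, false)); // G1
        System.out.println(suggest(20, true));   // SHENANDOAH
        System.out.println(suggest(5, false));   // ZGC
    }
}
```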
Common mistakes to avoid
- Setting -Xmx without accounting for native memory overhead
- Choosing ZGC because 'lower pauses are always better'
- Tuning GC flags without enabling detailed GC logging
- Using the same GC flags across all services
- Ignoring humongous allocations in G1
- Not setting -Xms equal to -Xmx for ZGC and Shenandoah
- Measuring GC health by pause time alone
- Running non-generational ZGC in production on JDK 21+
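Pause time alone hides how much of the machine the collector is consuming. A minimal sketch of a second signal, the share of wall-clock time the `GarbageCollectorMXBean`s report as collection time. Caveat: bean semantics differ per collector (for ZGC, for example, the "Cycles" beans report concurrent cycle time while the "Pauses" beans report pause time), so a plain sum mixes the two and is only a rough signal. `GcOverhead` and the churn loop are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    // Percentage of an interval that the GC beans report as collection time.
    static double overheadPercent(long gcTimeDeltaMs, long wallDeltaMs) {
        if (wallDeltaMs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return 100.0 * gcTimeDeltaMs / wallDeltaMs;
    }

    // Sums accumulated collection time across all GC beans (-1 means "not available").
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcTimeMs();
        long start = System.nanoTime();
        long sink = 0;
        // Hypothetical workload: churn short-lived 64KB arrays for about one second.
        while (System.nanoTime() - start < 1_000_000_000L) {
            byte[] chunk = new byte[64 * 1024];
            sink += chunk.length;
        }
        long wallMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcTimeMs() - gcBefore;
        System.out.printf("GC time: %dms of %dms (%.2f%%), sink=%d%n",
                gcMs, wallMs, overheadPercent(gcMs, wallMs), sink);
    }
}
```

In a service, sample `totalGcTimeMs()` on a timer and export the delta alongside pause percentiles.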
Interview Questions on This Topic
Explain the fundamental trade-off between G1, ZGC, and Shenandoah. Why can't one collector optimize all three axes (pause time, throughput, memory efficiency)?
Your payment service is running G1 with a 16GB heap. During peak traffic, you see 'to-space exhausted' in GC logs followed by a 12-second full GC. What is happening and how do you fix it?
You are migrating a service from G1 to ZGC. After migration, p99 latency improved from 120ms to 8ms, but throughput dropped 15% and the service needs 25% more memory in Kubernetes. The team wants to revert. How do you evaluate this decision?
What is the difference between ZGC's allocation stall and Shenandoah's pacing mechanism? Which creates a better user experience under load?
A service has a large on-heap cache holding 10GB of data with a 24-hour TTL. How does this affect each collector and what would you recommend?
How do you calculate the right container memory limit for a JVM running ZGC with a 32GB heap?
Explain why setting -XX:MaxGCPauseMillis=200 does not guarantee 200ms maximum pause with G1.
You need to support both a latency-sensitive API (p99 < 20ms) and a batch processing job in the same JVM. Which collector do you choose and why?
Frequently Asked Questions
Start with G1. If your p99 latency with tuned G1 exceeds your SLA, evaluate ZGC (for large heaps or ultra-low latency) or Shenandoah (for moderate heaps with memory constraints). Profile your actual workload — do not choose based on benchmarks or blog posts.
Set -Xmx to container_memory_limit / 1.15 for G1 or Shenandoah, or container_memory_limit / 1.25 for ZGC. Always set -Xms equal to -Xmx. The remaining memory covers thread stacks, metaspace, GC native structures, and direct byte buffers.
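The sizing rule above is easy to encode. A sketch, assuming the article's heuristic divisors (1.15 for G1 and Shenandoah, 1.25 for ZGC); these are rules of thumb, not values the JVM computes, so verify the real native footprint with `jcmd <pid> VM.native_memory summary`. `HeapSizing` and its methods are illustrative names.

```java
public class HeapSizing {
    // Rule-of-thumb divisors from the text: heap = container limit / divisor.
    static double divisorFor(String collector) {
        return switch (collector) {
            case "G1", "Shenandoah" -> 1.15;
            case "ZGC" -> 1.25;
            default -> throw new IllegalArgumentException("unknown collector: " + collector);
        };
    }

    // Largest -Xmx (in MB) the rule allows for a given container limit.
    static long recommendedXmxMb(long containerLimitMb, String collector) {
        return (long) Math.floor(containerLimitMb / divisorFor(collector));
    }

    // Inverse: container limit (in MB) needed to host a given heap.
    static long containerLimitMb(long heapMb, String collector) {
        return (long) Math.ceil(heapMb * divisorFor(collector));
    }

    public static void main(String[] args) {
        long limitMb = 8192; // hypothetical 8GiB container limit
        for (String c : new String[] {"G1", "Shenandoah", "ZGC"}) {
            long xmx = recommendedXmxMb(limitMb, c);
            System.out.printf("%-10s -> -Xms%dm -Xmx%dm%n", c, xmx, xmx);
        }
        // The 32GB ZGC heap from the interview questions:
        System.out.println("ZGC 32GB heap needs ~" + containerLimitMb(32_768, "ZGC") + "MB");
    }
}
```

For an 8192MB limit this yields 7123MB for G1 and Shenandoah and 6553MB for ZGC; a 32GB (32768MB) ZGC heap needs a limit of about 40960MB (40GB).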
Young GC collects only eden and survivor regions (survivor regions hold objects that have survived one or more young collections). Mixed GC collects young regions plus the old regions that the preceding concurrent marking cycle identified as mostly garbage. Mixed GCs are how G1 reclaims old-generation space without a full GC.
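A quick way to see whether mixed GCs are doing their job is to tally pause types in the GC log: after each marking cycle you should see mixed pauses, and ideally no full pauses at all. A minimal sketch, assuming G1 unified logging (`-Xlog:gc`) in which young pauses contain "Pause Young", mixed ones additionally contain "(Mixed)", and full collections contain "Pause Full"; check these substrings against your own gc.log, since formats vary across JDK versions. `G1PauseMix` is an illustrative name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class G1PauseMix {
    enum Kind { YOUNG, MIXED, FULL, OTHER }

    // Classifies one line of G1 unified GC logging by pause type.
    static Kind classify(String line) {
        if (line.contains("Pause Full")) return Kind.FULL;
        // "(Prepare Mixed)" is still a young-only pause, so require "(Mixed)" with its parenthesis.
        if (line.contains("Pause Young") && line.contains("(Mixed)")) return Kind.MIXED;
        if (line.contains("Pause Young")) return Kind.YOUNG;
        return Kind.OTHER;
    }

    // Counts lines per pause type; every Kind is present in the result.
    static Map<Kind, Integer> tally(List<String> lines) {
        Map<Kind, Integer> counts = new EnumMap<>(Kind.class);
        for (Kind k : Kind.values()) counts.put(k, 0);
        for (String line : lines) counts.merge(classify(line), 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java G1PauseMix <gc.log>");
            return;
        }
        System.out.println(tally(Files.readAllLines(Path.of(args[0]))));
    }
}
```

A rising FULL count with few MIXED pauses suggests mixed collections are not reclaiming old space fast enough.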
No. The garbage collector is selected at JVM startup and cannot be changed at runtime. This is a fundamental JVM design constraint. If you need to test a different collector, deploy a separate instance with the new collector flags.
Estimate allocation rate from jstat -gc samples: the eden space consumed between samples (including eden that was filled and emptied by any intervening young GCs) divided by the sampling interval. If the allocation rate consistently exceeds 2GB/sec and you are seeing GC pressure (frequent cycles, growing pause times), the rate is too high for comfortable GC operation. Profile with async-profiler or JFR to identify allocation hotspots.
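As an alternative to deriving the rate from jstat samples, HotSpot exposes per-thread allocation counters through `com.sun.management.ThreadMXBean.getThreadAllocatedBytes(long)`. A sketch assuming a HotSpot-based JDK (the cast fails on JVMs without that extension); it measures only the current thread, and the loop is a hypothetical workload. The 2GB/sec threshold is the one from the paragraph above, taken as decimal gigabytes.

```java
import java.lang.management.ManagementFactory;

public class AllocationRate {
    static final double THRESHOLD_BYTES_PER_SEC = 2e9; // ~2GB/sec from the text

    static double bytesPerSecond(long bytesDelta, long nanosDelta) {
        return bytesDelta * 1e9 / nanosDelta;
    }

    public static void main(String[] args) {
        // HotSpot-specific extension of the standard ThreadMXBean.
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();

        long bytesBefore = threads.getThreadAllocatedBytes(tid);
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < 200_000; i++) { // hypothetical workload
            byte[] buffer = new byte[4096];
            sink += buffer.length;
        }
        long bytesAfter = threads.getThreadAllocatedBytes(tid);
        long elapsed = System.nanoTime() - start;

        double rate = bytesPerSecond(bytesAfter - bytesBefore, elapsed);
        System.out.printf("Allocation rate: %.2f GB/s (sink=%d)%n", rate / 1e9, sink);
        if (rate > THRESHOLD_BYTES_PER_SEC) {
            System.out.println("Above ~2GB/s: profile allocation hotspots with JFR or async-profiler");
        }
    }
}
```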
Yes. ZGC supports x86_64, AArch64 (ARM 64-bit), and other 64-bit architectures as of JDK 17+. Earlier JDK versions had limited ARM support. Verify your specific JDK version's platform support matrix.
Allocation stall means ZGC cannot keep up with the allocation rate: the heap fills before the current cycle has freed enough memory, so the JVM temporarily blocks allocating threads while the collector catches up. This is ZGC's backpressure mechanism. Fix by increasing -XX:ConcGCThreads, reducing the allocation rate at the application level, or giving the collector more headroom: raise -Xmx, or set -XX:SoftMaxHeapSize below -Xmx so cycles start earlier.
Yes. Shenandoah has been production-ready since JDK 15 (it first shipped as experimental in JDK 12) and is actively maintained by Red Hat. It is used in production at scale by Red Hat's own infrastructure and by customers running OpenJDK. It is less widely adopted than G1 or ZGC but is a mature, reliable collector.