Imagine you're at a big party and everyone keeps leaving empty cups on tables. You hired a cleaner (the Garbage Collector) whose only job is to walk around, spot cups nobody is holding anymore, and throw them away so there's room for fresh drinks. The cleaner doesn't interrupt the party every second — they work in bursts, and sometimes they have to pause everything to do a deep clean. That pause is what Java developers are always trying to shrink. Java's GC is exactly that cleaner: it automatically finds objects your program no longer references and reclaims their memory so you never have to call free() yourself.
Every Java application runs a second program inside the JVM — the Garbage Collector. It decides when memory gets freed, how long your threads pause, and whether your latency SLAs hold up under load. Most developers treat it like a black box and then wonder why their microservice spikes to 500ms every few seconds in production.
Before automatic memory management, C and C++ developers had to manually allocate and free every byte. Java solved this with a managed heap and a runtime that tracks object reachability — if nothing in your program can reach an object, its memory can be reclaimed. That single idea eliminated an entire class of bugs but introduced a new challenge: the collector itself consumes CPU and introduces pauses.
The core misconception: GC pauses are inevitable and unfixable. They are not. Modern low-pause collectors such as ZGC and Shenandoah keep pause times largely independent of heap size, but only if you understand the trade-offs and tune correctly for your workload.
What is Garbage Collection in Java?
Garbage Collection in Java is the JVM's automatic memory management mechanism. The GC periodically identifies objects that are no longer reachable from any GC root (thread stacks, static fields, JNI references) and reclaims their heap memory. This eliminates manual memory management but introduces pauses and CPU overhead that must be managed in production.
The JVM determines object reachability by tracing from GC roots. An object is considered dead, and eligible for collection, when no chain of references from any root reaches it. This is fundamentally different from pure reference counting, which frees an object when its count of incoming references drops to zero and therefore cannot reclaim a cycle of objects that only reference each other. (CPython and PHP use reference counting as their primary mechanism and add a separate cycle detector for exactly this reason.) Java's tracing GC handles cycles naturally because it only cares about reachability, not reference counts.
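To make the contrast concrete, here is a minimal sketch (a toy model, not JVM code) that represents the heap as a map from object names to the names they reference. mark performs the mark phase of a tracing collector from a set of roots. refCounts computes the incoming-reference counts that a pure reference-counting scheme would track. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model: the heap maps each object name to the names it references.
final class TracingVsRefCount {

    /** Mark phase of a tracing GC: everything reachable from the roots. */
    static Set<String> mark(Map<String, List<String>> heap, Set<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (marked.add(obj)) { // first visit: trace its outgoing references
                for (String ref : heap.getOrDefault(obj, List.of())) {
                    work.push(ref);
                }
            }
        }
        return marked; // anything not in this set is garbage, cycles included
    }

    /** Incoming-reference counts, the only thing a pure reference counter tracks. */
    static Map<String, Integer> refCounts(Map<String, List<String>> heap, Set<String> roots) {
        Map<String, Integer> counts = new HashMap<>();
        for (String obj : heap.keySet()) {
            counts.put(obj, 0);
        }
        for (String root : roots) {
            counts.merge(root, 1, Integer::sum);
        }
        for (List<String> refs : heap.values()) {
            for (String ref : refs) {
                counts.merge(ref, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A and B reference each other; only C is reachable from a root.
        Map<String, List<String>> heap = Map.of(
                "A", List.of("B"),
                "B", List.of("A"),
                "C", List.of());
        Set<String> roots = Set.of("C");
        System.out.println("marked: " + mark(heap, roots));                 // marked: [C]
        System.out.println("A's count: " + refCounts(heap, roots).get("A")); // A's count: 1
    }
}
```

The tracing pass reports only C as live, so A and B are garbage. A and B still each have a count of 1, so a pure reference counter would keep the cycle alive forever.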
The key production insight: GC is not triggered by the heap as a whole running low. It is triggered by allocation. A young collection runs when an allocation cannot be satisfied from eden, so the interval between collections is roughly eden size divided by allocation rate. A service with a large heap but a low allocation rate may run GC infrequently, while a service with a small heap and a high allocation rate may run GC constantly. For a given heap, allocation rate is the primary driver of GC frequency.
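Here is a back-of-the-envelope model of that relationship. It assumes one young GC fires each time eden fills and ignores adaptive eden sizing, which collectors such as G1 perform. The class name is hypothetical.

```java
// Back-of-the-envelope model: one young GC each time eden fills.
final class GcFrequencyEstimate {

    static double secondsBetweenYoungGcs(long edenBytes, long allocationBytesPerSecond) {
        return (double) edenBytes / allocationBytesPerSecond;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // Large eden, low allocation rate: a young GC roughly every 80 seconds
        System.out.println(secondsBetweenYoungGcs(8192 * mb, 100 * mb)); // 81.92
        // Small eden, high allocation rate: four young GCs per second
        System.out.println(secondsBetweenYoungGcs(256 * mb, 1024 * mb)); // 0.25
    }
}
```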
io/thecodeforge/gc/ReachabilityDemo.java
package io.thecodeforge.gc;

/**
 * Demonstrates how the Java GC determines object reachability.
 *
 * Key concept: an object is reachable if any GC root can access it
 * through a chain of references. When the chain breaks, the object
 * becomes eligible for collection.
 */
public class ReachabilityDemo {

    public static void main(String[] args) {
        // Object created on the heap, referenced by the local variable 'order'.
        // A live local variable on a thread stack acts as a GC root.
        Order order = new Order("ORD-001", 149.99);
        System.out.println("Order created: " + order.getId());

        // After this reassignment, nothing references the first Order, so
        // Order("ORD-001") becomes eligible for GC. The collector reclaims it
        // at some later collection, not immediately.
        order = new Order("ORD-002", 299.99);

        // Cyclic references: the tracing GC handles these correctly
        OrderNode nodeA = new OrderNode("A");
        OrderNode nodeB = new OrderNode("B");
        nodeA.next = nodeB;
        nodeB.next = nodeA; // cycle: A -> B -> A

        // Clearing the stack references leaves the cycle intact in memory,
        // but no GC root can reach either node, so both are eligible.
        nodeA = null;
        nodeB = null;
    }

    static class Order {
        private final String id;
        private final double amount;

        Order(String id, double amount) {
            this.id = id;
            this.amount = amount;
        }

        String getId() { return id; }
    }

    static class OrderNode {
        final String name;
        OrderNode next;

        OrderNode(String name) {
            this.name = name;
        }
    }
}

Output:
Order created: ORD-001
The Generational Heap — Why Most Objects Die Young
The JVM heap is divided into generations based on the weak generational hypothesis: most objects die young, and objects that survive one collection are likely to survive many more. This observation drives the generational heap design used by most HotSpot collectors (Serial, Parallel, G1, and generational ZGC).
The young generation consists of eden space, where new objects are allocated, and two survivor spaces (S0 and S1). When eden fills up, a minor GC (young collection) runs. Live objects in eden and in the currently occupied survivor space are copied into the empty survivor space, and each copied object's age is incremented. The two survivor spaces then swap roles. Objects that survive enough young collections are promoted to the old generation. -XX:MaxTenuringThreshold sets the upper bound, and the JVM may adapt the actual threshold downward.
The old generation holds long-lived objects. When it fills up or crosses an occupancy threshold, the collector must reclaim old-generation space. In Serial and Parallel, old-generation collection is a stop-the-world full GC. In G1, it is a concurrent marking cycle followed by mixed GCs that collect both young and old regions. A full GC (stop-the-world compaction of the entire heap) happens only when that process cannot keep up, and it is the catastrophic failure mode you must avoid.
The critical production insight: the tenuring threshold determines how quickly objects move to old generation. Too low, and short-lived objects pollute old generation, increasing old gen GC frequency. Too high, and survivor spaces overflow, forcing premature promotion. Both paths degrade performance.
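The interaction between the tenuring threshold and survivor capacity can be explored with a toy simulation. This is a deliberately simplified model, not how HotSpot implements aging: it counts objects rather than bytes, uses a single to-space, and has no adaptive threshold. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of survivor aging and promotion (simplified; not HotSpot's implementation).
final class TenuringModel {

    /** A simulated object: its age in survived GCs and how many more GCs it will be live at. */
    static final class Obj {
        int age = 0;
        int gcsToLive;

        Obj(int gcsToLive) {
            this.gcsToLive = gcsToLive;
        }
    }

    private final int survivorCapacity;  // max objects the to-space survivor can hold
    private final int tenuringThreshold; // age at which objects are promoted
    private List<Obj> survivor = new ArrayList<>();
    private int promoted = 0;

    TenuringModel(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    /** One young GC: copy live objects from the survivor space and eden into an empty to-space. */
    void youngGc(List<Obj> eden) {
        List<Obj> toSpace = new ArrayList<>();
        List<Obj> candidates = new ArrayList<>(survivor); // older objects are copied first
        candidates.addAll(eden);
        for (Obj o : candidates) {
            if (o.gcsToLive <= 0) {
                continue; // dead at this GC: never copied, costs nothing
            }
            o.gcsToLive--;
            o.age++;
            if (o.age >= tenuringThreshold || toSpace.size() >= survivorCapacity) {
                promoted++; // tenured by age, or premature promotion on overflow
            } else {
                toSpace.add(o);
            }
        }
        survivor = toSpace; // the survivor spaces swap roles
    }

    /** Runs `cycles` young GCs; each cycle allocates `perCycle` objects live for `lifetime` GCs. */
    static int simulate(int survivorCapacity, int threshold, int perCycle, int lifetime, int cycles) {
        TenuringModel model = new TenuringModel(survivorCapacity, threshold);
        for (int c = 0; c < cycles; c++) {
            List<Obj> eden = new ArrayList<>();
            for (int i = 0; i < perCycle; i++) {
                eden.add(new Obj(lifetime));
            }
            model.youngGc(eden);
        }
        return model.promoted;
    }
}
```

All three runs below use 100 objects per cycle, each live for two young GCs:

- simulate(1000, 15, 100, 2, 10) promotes nothing.
- Shrinking survivor capacity to 150 promotes 250 objects over 10 cycles purely through overflow. This is premature promotion.
- A threshold of 1 promotes all 1,000 objects, and they die in the old generation shortly afterward.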
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/**
 * Demonstrates how allocation patterns interact with the generational heap.
 *
 * Objects that survive young collections are promoted to the old generation.
 * Understanding this promotion mechanism is critical for tuning.
 */
public class GenerationalBehaviorDemo {

    /**
     * Pattern 1: Short-lived objects, the ideal case for generational GC.
     * These objects die in eden and never reach the old generation,
     * so a fast young collection reclaims them.
     */
    public void processRequest() {
        // These objects are created, used, and become unreachable
        // within a single method call. They die in eden.
        // (The JIT's escape analysis may even eliminate some of them.)
        String requestId = UUID.randomUUID().toString();
        byte[] payload = new byte[4096];
        List<String> validationErrors = new ArrayList<>();
        // After this method returns, all three objects are unreachable
        // because only local variables (stack roots) referenced them.
    }

    /**
     * Pattern 2: Long-lived cached objects, promoted to the old generation.
     * These objects survive young collections and get promoted.
     * They stay in the old generation until they are evicted.
     *
     * Production risk: if this cache grows unbounded, the old generation
     * fills up and triggers a full GC or an OutOfMemoryError.
     */
    private final List<byte[]> longLivedCache = new ArrayList<>();

    public void cacheData(byte[] data) {
        // This reference keeps the byte array alive indefinitely.
        // Once it reaches the tenuring threshold (at most
        // MaxTenuringThreshold young collections), it is promoted.
        longLivedCache.add(data);
    }

    /**
     * Pattern 3: Premature promotion. Objects that should die young
     * are promoted because the survivor space is full.
     *
     * Only objects that are still live when a young GC runs get copied.
     * If those live objects do not fit in the survivor space, the overflow
     * is promoted directly to the old generation even if it dies moments
     * later. This is premature promotion, and it pollutes the old generation.
     *
     * Fix: enlarge the survivor spaces (lower -XX:SurvivorRatio),
     * or reduce how much data is live at any one time.
     */
    public void burstAllocation() {
        // The whole batch stays reachable until processing finishes. If a
        // young GC fires mid-batch and the live batch (over 25MB of payload
        // here) exceeds survivor capacity, the overflow is promoted to the
        // old generation even though it becomes garbage right afterward.
        List<byte[]> inFlightBatch = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            inFlightBatch.add(new byte[256]);
        }
        inFlightBatch.clear(); // batch done: all of it is garbage now
    }

    /**
     * Production tuning flags for generational behavior:
     *
     * -XX:NewRatio=2               // old:young = 2:1 (default; G1 sizes young gen adaptively)
     * -XX:SurvivorRatio=8          // eden : one survivor space = 8:1 (default)
     * -XX:MaxTenuringThreshold=15  // upper bound on young GCs survived before promotion
     * -XX:+AlwaysTenure            // promote at the first young GC (dangerous, avoid)
     * -XX:+NeverTenure             // never promote by age; survivor overflow still spills to old gen
     *
     * Monitor promotion with:
     *   jstat -gcutil <pid> 1000
     * Watch the 'O' column (old gen utilization) for steady growth.
     * Steady growth with low live data = premature promotion.
     */
}
The Weak Generational Hypothesis — The Foundation of All Modern GC
If 90% of objects die in eden, collecting eden reclaims 90% of garbage with minimal work
Young collection traces only from roots and old-to-young references (tracked by card tables or remembered sets) into eden and survivor spaces, not the entire heap. This is fast.
Old generation collection is expensive because it must handle long-lived object graphs
The hypothesis fails for workloads with uniform object lifetimes — batch processing, data pipelines
When the hypothesis fails, you see high promotion rates and frequent old gen collections
Production Insight
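Monitor promotion rate as a leading indicator of GC health. Use jstat -gc (its OU column shows old-gen usage in KB) or GC logs, and track how much old-gen usage grows per young collection. A healthy service promotes < 5% of young gen per cycle. If promotion exceeds 20%, objects are surviving long enough to be promoted before they die. In that case, take three steps. First, make sure -XX:MaxTenuringThreshold has not been lowered from its default of 15, and check whether the effective tenuring threshold is collapsing (-Xlog:gc+age=trace prints the age distribution). Second, enlarge the survivor spaces (for example -XX:SurvivorRatio=6). Third, investigate why short-lived objects are escaping young gen. A common cause is objects stored in thread-local caches or request-scoped maps that persist across requests.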
Key Takeaway
The generational heap exploits the statistical fact that most objects die young. Young collection is fast because it only scans eden + survivor. Old generation is expensive to collect. Premature promotion — short-lived objects reaching old gen — is a silent performance killer. Monitor promotion rate with jstat -gcutil.
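GC logs report young-generation and whole-heap occupancy before and after each young collection, and that is enough to infer promotion per cycle. The young generation shrinks by freed plus promoted bytes, while the whole heap shrinks only by freed bytes. Below is a minimal sketch of that arithmetic. It assumes nothing was allocated directly into the old generation (for example, humongous objects) around the pause. The class name is hypothetical.

```java
// Infers promotion from before/after occupancy of one young GC.
final class PromotionRate {

    /** Young gen drops by (freed + promoted); the heap drops by freed only. */
    static long promotedBytes(long youngBefore, long youngAfter, long heapBefore, long heapAfter) {
        long youngDecrease = youngBefore - youngAfter;
        long heapDecrease = heapBefore - heapAfter;
        return youngDecrease - heapDecrease;
    }

    /** Promoted bytes as a fraction of young-gen occupancy before the GC. */
    static double promotedFraction(long youngBefore, long youngAfter, long heapBefore, long heapAfter) {
        return (double) promotedBytes(youngBefore, youngAfter, heapBefore, heapAfter) / youngBefore;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // Young: 1000MB -> 50MB, heap: 3000MB -> 2100MB
        System.out.println(promotedBytes(1000 * mb, 50 * mb, 3000 * mb, 2100 * mb) / mb); // 50
        System.out.println(promotedFraction(1000 * mb, 50 * mb, 3000 * mb, 2100 * mb));   // 0.05
    }
}
```

In this example 50MB, or 5% of young gen, was promoted. That sits right at the boundary the Production Insight above treats as healthy.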
GC Algorithms — Mark-Sweep, Copying, and Compaction
All GC algorithms are built on three fundamental operations: marking (identifying live objects), sweeping (reclaiming dead objects' memory), and compacting (defragmenting live objects to create contiguous free space). Different collectors combine these operations differently to optimize for pause time, throughput, or memory efficiency.
Mark-and-sweep identifies live objects (mark phase) then reclaims unmarked memory (sweep phase). The problem: it creates fragmentation. After many allocation-deallocation cycles, free memory is scattered in small chunks. Large object allocations may fail even when total free memory is sufficient — this is external fragmentation.
Copying collectors solve fragmentation by copying live objects to a fresh region and discarding the old region entirely. This is inherently compacting — live objects end up contiguous. The cost: copying live objects takes time proportional to the live data set, and you need double the memory (from-space and to-space). The generational heap reduces this cost by only copying in young generation.
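Here is a minimal sketch of a Cheney-style copying collector over a toy object graph. The classes are hypothetical, not JVM internals. The to-space list doubles as the breadth-first work queue: a scan index walks it while evacuate appends new copies. Each from-space object records its copy in a forwarding field, so shared and cyclic references are copied only once. The loops only ever touch live objects, which is why copying cost tracks live data.

```java
import java.util.ArrayList;
import java.util.List;

// Toy Cheney-style copying collector. Objects are plain Java objects here;
// a real collector copies raw memory and bumps an allocation pointer.
final class CheneyCopyDemo {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj forwardee; // set once this from-space object has been copied

        Obj(String name) {
            this.name = name;
        }
    }

    /** Copies everything reachable from roots into a fresh to-space; rewrites roots in place. */
    static List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, evacuate(roots.get(i), toSpace));
        }
        // The scan index chases the end of to-space: objects between them are
        // copied, but their references still point into from-space.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj copy = toSpace.get(scan);
            for (int i = 0; i < copy.refs.size(); i++) {
                copy.refs.set(i, evacuate(copy.refs.get(i), toSpace));
            }
        }
        return toSpace; // unreachable objects were never touched
    }

    private static Obj evacuate(Obj from, List<Obj> toSpace) {
        if (from.forwardee == null) {
            Obj copy = new Obj(from.name);
            copy.refs.addAll(from.refs); // fixed up later when the scan reaches it
            from.forwardee = copy;
            toSpace.add(copy);
        }
        return from.forwardee;
    }
}
```

With a cycle a ↔ b reachable from a root, plus an unreachable c pointing at a, collect copies only a and b. The cycle is preserved in to-space, and c is simply left behind with from-space.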
Mark-and-compact identifies live objects then slides them to one end of the heap, creating one contiguous free region. This avoids the double-memory cost of copying but requires updating every reference to moved objects — a potentially expensive operation that must be done during a stop-the-world pause or with complex concurrent mechanisms.
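Here is a sketch of sliding (Lisp-2 style) compaction over a toy heap array. It assumes a prior mark phase has already set each object's live flag and that references are heap indices. It makes the three passes the paragraph implies: compute each live object's destination, update every reference (roots included), then slide objects down. Names are hypothetical.

```java
import java.util.Arrays;

// Toy sliding compaction (Lisp-2 style) over an array of heap slots.
final class SlidingCompactDemo {

    static final class Obj {
        final String name;
        final boolean live; // result of an earlier mark phase
        int ref;            // heap index this object references, or -1

        Obj(String name, boolean live, int ref) {
            this.name = name;
            this.live = live;
            this.ref = ref;
        }
    }

    /** Compacts live objects toward index 0; returns the number of live objects. */
    static int compact(Obj[] heap, int[] roots) {
        int[] forward = new int[heap.length];
        Arrays.fill(forward, -1);

        // Pass 1: compute each live object's destination (its forwarding address)
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && heap[i].live) {
                forward[i] = next++;
            }
        }
        // Pass 2: update every reference, in roots and in live objects
        for (int r = 0; r < roots.length; r++) {
            roots[r] = forward[roots[r]];
        }
        for (Obj o : heap) {
            if (o != null && o.live && o.ref >= 0) {
                o.ref = forward[o.ref];
            }
        }
        // Pass 3: slide objects down; destinations never pass their sources
        for (int i = 0; i < heap.length; i++) {
            if (forward[i] >= 0) {
                heap[forward[i]] = heap[i];
            }
        }
        Arrays.fill(heap, next, heap.length, null); // one contiguous free region
        return next;
    }
}
```

Pass 2 is the expensive part the paragraph warns about. Every reference to a moved object must be rewritten before the heap is consistent again.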
io/thecodeforge/gc/GCAlgorithmDemo.java
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;

/**
 * Demonstrates how different GC algorithm characteristics
 * affect production behavior.
 *
 * This is not a GC implementation. It illustrates the concepts
 * that drive real collector design decisions.
 */
public class GCAlgorithmDemo {

    /**
     * MARK-AND-SWEEP characteristic:
     * - Fast reclaim, but creates fragmentation
     * - External fragmentation: total free > requested, but not contiguous
     *
     * Production impact: after hours of operation, allocating a large
     * object can fail even though 40% of the heap is free, because the
     * free space is fragmented. This triggers an unnecessary full GC or OOM.
     */
    public void demonstrateFragmentation() {
        // Imagine this array is the heap; each index is a memory block.
        // true = occupied, false = free
        boolean[] heap = new boolean[100];

        // Simulate an allocation pattern: allocate everything...
        for (int i = 0; i < 100; i++) {
            heap[i] = true; // allocate
        }
        // ...then free every other block
        for (int i = 0; i < 100; i += 2) {
            heap[i] = false; // free
        }

        // Measure total free blocks and the longest contiguous free run
        int freeBlocks = 0;
        int currentRun = 0;
        int longestRun = 0;
        for (boolean occupied : heap) {
            currentRun = occupied ? 0 : currentRun + 1;
            if (!occupied) {
                freeBlocks++;
            }
            longestRun = Math.max(longestRun, currentRun);
        }

        // Result: 50 blocks free, but the longest contiguous run is 1.
        // A request for 3 contiguous blocks fails despite 50 free blocks.
        // This is external fragmentation, the problem compaction solves.
        System.out.println("free=" + freeBlocks + ", longest free run=" + longestRun);
    }

    /**
     * COPYING COLLECTOR characteristic:
     * - Copies live objects to to-space, discards from-space
     * - Inherently compacting: no fragmentation
     * - Cost: proportional to live data, not dead data
     * - Requires a reserved to-space (classically, double the memory)
     *
     * Production insight: copying cost is why large live data sets
     * cause longer young collection pauses. If your service has
     * 2GB of live objects in young gen, copying takes measurable time.
     */
    public void demonstrateCopyingCost() {
        // Simulating live data that must be copied during a young GC
        List<byte[]> liveObjects = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            liveObjects.add(new byte[1024]); // 1KB each = ~10MB live data
        }
        // If a young GC runs while this list is reachable, all ~10MB must be
        // copied to a survivor space (or promoted). If only 1MB were live,
        // the copy cost would be roughly 10x lower. This is why reducing
        // live data in young gen reduces pause time.
        //
        // Real production fix: avoid holding references to temporary
        // objects across request boundaries. Let them die in eden.
    }
}
The Three Fundamental GC Operations
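Serial GC: copying young collections plus mark-sweep-compact old collections, all stop-the-world on one thread. Simple, but pauses grow with heap size.
Parallel GC: same algorithm as Serial but uses multiple threads. Faster but same pause characteristics.
G1: concurrent marking plus stop-the-world evacuation (copying) of selected regions. Compaction happens incrementally, region by region, not across the whole heap.
ZGC: concurrent mark + concurrent compaction (relocation) via colored pointers and load barriers. Only brief stop-the-world pauses remain (mark start, mark end, relocate start), and they do not scale with heap size.
Shenandoah: concurrent mark + concurrent compaction via forwarding pointers (originally Brooks pointers) and barriers. Similar goals to ZGC with a different implementation.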
Production Insight
Fragmentation is the silent killer of long-running services. After days of operation, a heap with 40% free memory may fail to allocate a 10MB object because no contiguous 10MB block exists. This triggers a full GC to compact the heap. Monitor fragmentation with jcmd <pid> GC.heap_info and look at free region distribution. G1 handles fragmentation well through region-based evacuation. If you see increasing full GC frequency over time without increasing live data, fragmentation is the cause.
Key Takeaway
All GC algorithms are built on mark, sweep, and compact. Fragmentation is the primary failure mode of mark-and-sweep. Copying collectors solve fragmentation but cost proportional to live data. Modern collectors (G1, ZGC, Shenandoah) do as much work concurrently as possible to minimize stop-the-world pauses.
G1 GC — The Default Workhorse
G1 (Garbage-First) has been the default JVM collector since Java 9. It divides the heap into equal-sized regions (1MB to 32MB) and prioritizes collecting regions with the most garbage — hence 'garbage-first'. G1 maintains a remembered set per region tracking incoming references, enabling independent region collection without scanning the entire heap.
G1 operates in young-only and mixed collection cycles. Young GC collects survivor and eden regions. When the heap occupancy exceeds the Initiating Heap Occupancy Percent (IHOP), G1 triggers a concurrent marking cycle. After marking completes, subsequent mixed GCs collect both young and old regions identified as mostly garbage.
The critical production insight: G1's pause time is primarily driven by the number of regions it must collect in a single pause, not heap size. A 64GB heap with aggressive evacuation can pause longer than a 4GB heap with conservative settings. This is the opposite of what most engineers assume.
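G1's "garbage-first" choice can be sketched as a greedy selection. Consider old regions in descending order of reclaimable garbage, and stop once the predicted cost of copying their live bytes would exceed the pause budget. This is a simplified model that assumes a fixed, hypothetical copy rate. Real G1 predicts costs from measured history and also accounts for remembered-set scanning. The liveThresholdPercent filter mirrors the role of -XX:G1MixedGCLiveThresholdPercent. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy "garbage-first" collection-set selection with a fixed, assumed copy rate.
final class GarbageFirstSelection {

    record Region(int id, long liveBytes, long regionBytes) {
        long garbageBytes() {
            return regionBytes - liveBytes;
        }
    }

    static List<Region> chooseCollectionSet(List<Region> candidates, int liveThresholdPercent,
                                            double pauseBudgetMs, double copyBytesPerMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingLong(Region::garbageBytes).reversed()); // most garbage first
        List<Region> chosen = new ArrayList<>();
        double predictedMs = 0;
        for (Region r : sorted) {
            if (r.liveBytes() * 100 > (long) liveThresholdPercent * r.regionBytes()) {
                continue; // mostly live: copying it would cost a lot and reclaim little
            }
            double copyMs = r.liveBytes() / copyBytesPerMs; // cost scales with LIVE bytes
            if (predictedMs + copyMs > pauseBudgetMs) {
                break; // the next region would blow the pause budget
            }
            predictedMs += copyMs;
            chosen.add(r);
        }
        return chosen;
    }
}
```

The predicted pause depends only on the live bytes in the chosen regions. Heap size never enters the calculation.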
io/thecodeforge/gc/G1TuningExample.java
package io.thecodeforge.gc;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Demonstrates allocation patterns that stress G1 differently.
 *
 * Key insight: G1 humongous objects (larger than half a region) bypass
 * normal allocation and can trigger to-space exhausted failures.
 */
public class G1TuningExample {

    // Cache with large value objects: a common source of humongous allocations
    private final Map<String, byte[]> payloadCache = new ConcurrentHashMap<>();

    /**
     * BAD: allocates objects that may exceed the humongous threshold.
     * With 1MB regions (G1's ergonomic choice for small heaps),
     * objects over 512KB are humongous.
     * With 32MB regions, the threshold is 16MB: much safer for large payloads.
     *
     * Tuning: -XX:G1HeapRegionSize=32M
     *         -XX:G1ReservePercent=15
     *         -XX:InitiatingHeapOccupancyPercent=35
     */
    public void cacheLargePayload(String key, int sizeBytes) {
        byte[] payload = new byte[sizeBytes];
        for (int i = 0; i < Math.min(sizeBytes, 1024); i++) {
            payload[i] = (byte) (i & 0xFF);
        }
        payloadCache.put(key, payload);
    }

    /**
     * BETTER: chunk large payloads to stay below the humongous threshold.
     * Each chunk is independently collectible as a regular object.
     */
    public void cacheChunkedPayload(String key, byte[] fullPayload) {
        int chunkSize = 256 * 1024; // 256KB chunks
        int numChunks = (fullPayload.length + chunkSize - 1) / chunkSize;
        for (int i = 0; i < numChunks; i++) {
            int offset = i * chunkSize;
            int length = Math.min(chunkSize, fullPayload.length - offset);
            byte[] chunk = new byte[length];
            System.arraycopy(fullPayload, offset, chunk, 0, length);
            payloadCache.put(key + ":chunk:" + i, chunk);
        }
    }

    /**
     * Production G1 flags for a 16GB heap with a mixed allocation profile:
     *
     * -XX:+UseG1GC
     * -Xms16g -Xmx16g
     * -XX:G1HeapRegionSize=16m
     * -XX:MaxGCPauseMillis=200
     * -XX:G1ReservePercent=15
     * -XX:InitiatingHeapOccupancyPercent=35
     * -XX:G1MixedGCCountTarget=8
     * -XX:+UnlockExperimentalVMOptions     // required: the next flag is experimental
     * -XX:G1MixedGCLiveThresholdPercent=85
     * -Xlog:gc*,gc+humongous=debug:file=/var/log/gc.log:time,uptime,level,tags
     */
}
G1's Core Mental Model: Region-Based Evacuation
Pause time scales with live data in collected regions, not total heap size
Humongous objects break this model — they span multiple regions and cannot be partially evacuated
Remembered sets consume 5-10% of heap as off-heap overhead; budget for this in container memory on top of -Xmx
To-space exhausted means G1 ran out of free regions to evacuate into; it is an evacuation failure that often escalates to a full GC
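Here is a small helper for checking whether a byte array will be humongous. It assumes a 16-byte array header, typical for 64-bit HotSpot with compressed class pointers, and ignores alignment padding. G1 treats an object as humongous when it is larger than half a region and gives it whole contiguous regions. Names are hypothetical.

```java
// Humongous check for byte arrays in G1 (simplified; see assumptions in comments).
final class HumongousCheck {

    // Assumption: 16-byte array header (64-bit HotSpot, compressed class pointers).
    static final long ARRAY_HEADER_BYTES = 16;

    /** G1 allocates an object as humongous when it is larger than half a region. */
    static boolean isHumongousByteArray(long length, long regionBytes) {
        long objectBytes = ARRAY_HEADER_BYTES + length; // ignores 8-byte alignment padding
        return objectBytes > regionBytes / 2;
    }

    /** A humongous object occupies whole contiguous regions; the tail of the last one is wasted. */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        System.out.println(isHumongousByteArray(512 * 1024, mb)); // true: header tips it over
        System.out.println(isHumongousByteArray(256 * 1024, mb)); // false
        System.out.println(regionsNeeded(3 * mb / 2, mb));        // 2
    }
}
```

Because of the array header, a byte[512 * 1024] already counts as humongous with 1MB regions. The 256KB chunks used by cacheChunkedPayload stay safely below the threshold.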
Production Insight
G1's -XX:MaxGCPauseMillis is a soft target, not a hard guarantee. G1 will attempt to meet this by adjusting how many regions to collect per cycle, but allocation rate spikes can violate it. If you need hard latency guarantees, G1 is the wrong collector. Monitor actual pause times against your SLA — if G1 violates MaxGCPauseMillis more than 5% of the time, the workload demands ZGC or Shenandoah.
Key Takeaway
G1 is the right default for most workloads, but it has a hard ceiling on pause-time predictability. Once your latency budget drops below ~100ms p99, evaluate ZGC or Shenandoah. Never tune G1 without GC logs enabled — the default logging is insufficient for production diagnosis.
G1 Tuning Decision Tree
If humongous allocations appear in GC logs → Increase -XX:G1HeapRegionSize to raise the humongous threshold (half a region), so fewer objects qualify. The maximum region size is 32MB (JDK 18 raised the limit to 512MB). Chunk large objects at the application level if possible.
If mixed GCs are too frequent and cause throughput loss → Raise -XX:G1HeapWastePercent so G1 stops mixed collections sooner, or lower the experimental -XX:G1MixedGCLiveThresholdPercent so only regions with more garbage are collected. Increasing -XX:G1MixedGCCountTarget (default 8) spreads old-region work over more, shorter mixed pauses rather than fewer.
If full GCs appear despite an adequate heap → IHOP is miscalibrated. Adaptive IHOP (-XX:+G1UseAdaptiveIHOP) has been on by default since JDK 9 and treats -XX:InitiatingHeapOccupancyPercent as a starting value. If marking still starts too late, lower that value (try 35) or increase -XX:G1ReservePercent.
If pause times consistently exceed MaxGCPauseMillis → The live data set is too large for G1's evacuation budget. Either reduce live data (caching strategy) or migrate to ZGC/Shenandoah, where pause times are largely independent of live data size.
ZGC — Sub-Millisecond Pause Collector
ZGC (Z Garbage Collector) was introduced as experimental in JDK 11 and became production-ready in JDK 15. Its defining characteristic: pause times that do not grow with heap size, on heaps up to 16TB. ZGC originally targeted pauses below 10ms. Since JDK 16 moved thread-stack scanning into a concurrent phase, pauses are typically well under a millisecond. ZGC achieves this by doing almost everything concurrently: marking, relocation, and reference processing all happen while application threads run.
ZGC uses load barriers with colored pointers. In non-generational ZGC, every heap reference carries metadata bits (marked0, marked1, remapped, finalizable) embedded in the pointer itself. A load barrier runs at every load of a reference from the heap. It checks those bits, and if the reference is stale, it takes a slow path that marks or remaps the object and heals the pointer. This is the fundamental trade-off: ZGC replaces long GC pauses with overhead on every reference load.
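Here is a toy model of a colored pointer and its load barrier, using plain long values. The bit positions are hypothetical: real ZGC derives its layout from the maximum heap size, and generational ZGC uses a different scheme. A map stands in for ZGC's forwarding tables. The point is the shape of the barrier: a single mask test on the fast path, and a slow path that relocates and heals the pointer so the next load is fast.

```java
import java.util.Map;

// Toy colored pointers: low bits hold the address, high bits hold GC metadata.
final class ColoredPointerDemo {

    static final int ADDRESS_BITS = 42; // hypothetical layout
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long ALL_COLORS = MARKED0 | MARKED1 | REMAPPED;

    // Assume the relocation phase, where only REMAPPED pointers are "good".
    static final long GOOD_COLOR = REMAPPED;
    static final long BAD_MASK = ALL_COLORS & ~GOOD_COLOR;

    static long withColor(long address, long color) {
        return (address & ADDRESS_MASK) | color;
    }

    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** Fast path: one mask test. Slow path: find the new address and return a healed pointer. */
    static long loadBarrier(long pointer, Map<Long, Long> forwardingTable) {
        if ((pointer & BAD_MASK) == 0) {
            return pointer; // color is already good
        }
        long oldAddress = address(pointer);
        long newAddress = forwardingTable.getOrDefault(oldAddress, oldAddress);
        return withColor(newAddress, GOOD_COLOR); // caller writes this back into the field
    }

    public static void main(String[] args) {
        long stale = withColor(0x1000L, MARKED0);
        long healed = loadBarrier(stale, Map.of(0x1000L, 0x2000L));
        System.out.println(Long.toHexString(address(healed))); // 2000
    }
}
```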
As of JDK 21, ZGC supports a generational mode (-XX:+ZGenerational) that dramatically improves throughput by focusing collection on young objects. Generational mode became the default in JDK 23. Non-generational ZGC was removed in JDK 24, where the flag is obsolete. Non-generational ZGC collects the entire heap every cycle, which limits throughput on allocation-heavy workloads.
io/thecodeforge/gc/ZGCTuningExample.java
package io.thecodeforge.gc;

/**
 * ZGC-specific considerations for production workloads.
 *
 * ZGC trades per-access overhead for near-zero pause times.
 * The load barrier adds roughly 4-8% overhead on pointer-heavy workloads.
 */
public class ZGCTuningExample {

    /**
     * Production ZGC flags for a 32GB heap, latency-sensitive service:
     *
     * -XX:+UseZGC
     * -XX:+ZGenerational            // JDK 21-22 opt-in; default in JDK 23, obsolete in JDK 24+
     * -Xms32g -Xmx32g               // Set Xms=Xmx to avoid heap resizing
     * -XX:SoftMaxHeapSize=28g       // Soft target for heap occupancy
     * -XX:ZCollectionInterval=5     // Force a GC cycle at least every 5 seconds
     * -XX:ConcGCThreads=4           // Concurrent GC threads
     * -Xlog:gc*:file=/var/log/zgc.log:time,uptime,level,tags
     *
     * CRITICAL: ZGC uses ~20% native memory overhead beyond -Xmx.
     * Budget the container memory limit at heap * 1.25 minimum.
     */

    /**
     * ZGC's SoftMaxHeapSize tells ZGC to try to stay below this threshold,
     * but it can exceed it under allocation pressure.
     *
     * Use case: set the heap to 32GB and SoftMaxHeapSize to 28GB.
     * ZGC triggers cycles aggressively to stay under 28GB and only
     * allocates into the remaining 4GB under extreme pressure.
     */
    public void demonstrateSoftMaxHeapConcept() {
        // With SoftMaxHeapSize=28g and Xmx=32g:
        // - ZGC targets 28GB occupancy
        // - If allocation pressure pushes past 28GB, ZGC cycles more aggressively
        // - If it hits 32GB, allocating threads stall until a GC cycle frees
        //   memory (backpressure); OOM occurs only if the cycle cannot
        //   reclaim enough
    }
}
Pause times are truly independent of heap size and live data size — tested to 16TB
The trade-off is per-access CPU overhead, not pause time — you pay on every object load
ZGC cannot use compressed object pointers (UseCompressedOops) — increases memory usage by ~15% on heaps < 32GB
Generational ZGC (JDK 21+) reduces overhead dramatically by focusing on young generation
Production Insight
ZGC's biggest production risk is native memory consumption. Non-generational ZGC implements colored pointers by mapping the same physical heap memory at several virtual addresses (multi-mapping). This consumes virtual address space and can make RSS-based tools overreport usage; generational ZGC dropped multi-mapping. Budget container memory as heap × 1.25 for ZGC versus heap × 1.15 for G1. Also, ZGC requires a 64-bit system; it does not run on 32-bit.
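The rule-of-thumb budgets in this section translate directly into container limits. The helper below uses the 1.25 (ZGC) and 1.15 (G1) factors; these are the article's guidance, not JVM-defined constants. The class name is hypothetical.

```java
// Container memory budget from a heap size and a rule-of-thumb overhead factor.
final class ContainerBudget {

    static long containerLimitMiB(long heapMiB, double overheadFactor) {
        return (long) Math.ceil(heapMiB * overheadFactor); // round up to whole MiB
    }

    public static void main(String[] args) {
        long heapMiB = 32 * 1024;
        System.out.println("ZGC: " + containerLimitMiB(heapMiB, 1.25) + " MiB"); // ZGC: 40960 MiB
        System.out.println("G1:  " + containerLimitMiB(heapMiB, 1.15) + " MiB");
    }
}
```

For a 32GB heap, that works out to a 40GB container limit under ZGC versus roughly 36.8GB under G1.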
Key Takeaway
ZGC is the correct choice when p99 latency must be below 10ms and you can afford 10-15% throughput overhead. Enable generational mode on JDK 21+. Budget 25% extra native memory beyond heap size. ZGC's SoftMaxHeapSize is the most underrated production feature for containerized deployments.
Shenandoah — Red Hat's Low-Pause Contender
Shenandoah is Red Hat's concurrent compacting collector. It was integrated into OpenJDK as an experimental collector in JDK 12 and became production-ready in JDK 15. It achieves low pause times through concurrent evacuation, moving live objects while application threads run. It locates moved objects through a forwarding pointer per object, originally a separate Brooks pointer word.
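Shenandoah differs from ZGC in a critical way: it locates moved objects through forwarding pointers rather than colored pointers. In the original design (JDK 12), every object carried an extra Brooks pointer field. Since JDK 13, the forwarding pointer is stored in the from-space object's header (mark word) once the object has been copied, so there is no extra per-object word. Because Shenandoah does not need metadata bits inside references, it works with compressed oops, reducing memory overhead compared to ZGC on heaps under 32GB.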
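Here is a toy model of the original Brooks-pointer scheme, using hypothetical, single-threaded classes. Every object carries a forwarding field that points to itself until the object is evacuated, and every read or write goes through that field. A real collector installs the forwarding pointer with an atomic compare-and-swap so that racing threads agree on a single copy; this sketch omits that.

```java
// Toy model of Brooks forwarding pointers (single-threaded; no CAS).
final class BrooksPointerDemo {

    static final class Obj {
        Obj forwardee = this; // the Brooks pointer: an extra field in every object
        int value;

        Obj(int value) {
            this.value = value;
        }
    }

    /** Every access resolves the forwarding pointer first. */
    static int read(Obj o) {
        return o.forwardee.value;
    }

    static void write(Obj o, int v) {
        o.forwardee.value = v;
    }

    /** Evacuation: copy the object, then redirect the old copy's forwarding pointer. */
    static Obj evacuate(Obj from) {
        if (from.forwardee != from) {
            return from.forwardee; // already evacuated
        }
        Obj copy = new Obj(from.value);
        from.forwardee = copy; // a real collector does this with a CAS
        return copy;
    }

    public static void main(String[] args) {
        Obj original = new Obj(1);
        Obj stale = original;            // a reference the application still holds
        Obj copy = evacuate(original);
        write(stale, 5);                 // goes through the forwarding pointer
        System.out.println(copy.value);  // 5
        System.out.println(read(stale)); // 5
    }
}
```

A stale reference held by application code keeps working after evacuation, because reads and writes through it are redirected to the copy.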
Shenandoah operates in three concurrent phases: concurrent mark, concurrent evacuate, and concurrent update-refs. The initial mark and final mark phases are short stop-the-world pauses, typically under 10ms. Shenandoah's pacing mechanism backpressures allocation threads proportionally when the collector falls behind, creating smoother degradation than ZGC's hard allocation stalls.
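The difference between pacing and a hard stall can be illustrated with a hypothetical proportional pacer. An allocating thread waits longer the further it runs past its allocation budget for the current cycle, up to a cap. This is an illustrative model only; Shenandoah's actual pacer accounts budgets differently, and the names here are invented.

```java
// Hypothetical proportional pacer, for illustration only.
final class PacerModel {

    /**
     * Delay for an allocating thread: zero within budget, then growing with
     * the overshoot, capped at maxDelayMillis.
     */
    static long delayMillis(long allocatedThisCycle, long allocationBudget, long maxDelayMillis) {
        if (allocatedThisCycle <= allocationBudget) {
            return 0;
        }
        double overshoot = (double) (allocatedThisCycle - allocationBudget) / allocationBudget;
        return Math.min(maxDelayMillis, (long) Math.ceil(overshoot * maxDelayMillis));
    }

    public static void main(String[] args) {
        System.out.println(delayMillis(50, 100, 10));   // 0: within budget
        System.out.println(delayMillis(150, 100, 10));  // 5: 50% over budget
        System.out.println(delayMillis(1000, 100, 10)); // 10: capped
    }
}
```

A hard-stall model would instead return zero until the heap is exhausted and then block until the cycle finishes. That cliff is what shows up as sharp latency spikes.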
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;

/**
 * Shenandoah-specific production considerations.
 *
 * Shenandoah 1.0 (JDK 12 and early backports) used Brooks pointers: every
 * object carried an extra forwarding pointer field, adding 8 bytes per
 * object on 64-bit systems. Since JDK 13, Shenandoah keeps the forwarding
 * pointer in the object header instead, so this overhead no longer applies.
 */
public class ShenandoahTuningExample {

    /**
     * Brooks pointer overhead calculation (Shenandoah 1.0 only):
     *
     * Object with 2 fields (16 bytes header + 16 bytes data = 32 bytes)
     * + 8 bytes Brooks pointer = 40 bytes per object
     * Overhead: 25% increase per object
     *
     * For 10 million small objects: ~80MB additional memory
     * For 100 million small objects: ~800MB additional memory
     */
    public long estimateBrooksOverhead(int objectCount) {
        return (long) objectCount * 8;
    }

    /**
     * Production Shenandoah flags for a 16GB heap:
     *
     * -XX:+UseShenandoahGC          // JDK 12-14 also need -XX:+UnlockExperimentalVMOptions
     * -Xms16g -Xmx16g
     * -XX:ShenandoahGCHeuristics=adaptive
     * -XX:ShenandoahAllocationThreshold=10
     * -XX:+UseCompressedOops        // works with Shenandoah (unlike ZGC)
     * -Xlog:gc*:file=/var/log/shenandoah.log:time,uptime,level,tags
     */

    /**
     * Shenandoah pacing backpressures allocating threads when the
     * collector falls behind.
     *
     * Unlike ZGC, which stalls allocation entirely, Shenandoah slows down
     * allocating threads proportionally. This creates smoother latency
     * degradation under load rather than sharp spikes.
     */
    public void demonstratePacingBehavior() {
        List<byte[]> allocations = new ArrayList<>();
        // Under heavy allocation during a GC cycle, Shenandoah paces this
        // loop by briefly delaying the thread when it requests new allocation
        // space. The delay grows with how far behind the collector is.
        for (int i = 0; i < 100_000; i++) {
            allocations.add(new byte[1024]);
        }
    }
}
Barrier overhead still applies: Shenandoah uses load-reference barriers (JDK 13+; read barriers before that) plus SATB barriers on reference stores during marking
Works with compressed oops — saves ~15% memory compared to ZGC on heaps under 32GB
Per-object overhead of 8 bytes in Shenandoah 1.0 (JDK 12 and early backports), significant for workloads with many small objects; JDK 13+ removed the extra forwarding word
Pacing mechanism creates graceful degradation instead of hard allocation stalls
Production Insight
On Shenandoah builds that predate the JDK 13 redesign, the biggest production risk is the Brooks pointer overhead on small-object-heavy workloads. If your service has 100M+ objects under 64 bytes, the 8-byte Brooks pointer per object adds ~800MB. Check which Shenandoah version your JDK ships, and compare heap occupancy against G1 under the same load. Additionally, Shenandoah's pacing can create subtle latency degradation that is hard to distinguish from application-level slowness, so always correlate pacing delays with latency metrics.
Key Takeaway
Shenandoah is the right choice when you need low-pause GC on moderate heaps (< 32GB) and want compressed oops support. Its pacing mechanism creates smoother degradation than ZGC's allocation stalls. On pre-JDK 13 Shenandoah, the Brooks pointer is the hidden cost (budget 8 bytes per object); current releases store the forwarding pointer in the object header and avoid it.
JVM Flags That Actually Matter
Most JVM GC flags have sensible defaults. A small subset moves the needle in production. Understanding which flags to adjust — and when — prevents the common anti-pattern of blindly copying flags from blog posts without understanding their impact on your specific workload.
Flags fall into three categories: heap sizing, collector behavior, and logging. Heap sizing flags (-Xms, -Xmx, -XX:NewRatio) control memory layout. Collector behavior flags (-XX:MaxGCPauseMillis, -XX:InitiatingHeapOccupancyPercent) control collection strategy. Logging flags (-Xlog:gc) enable observability. The third category is the most important — you cannot tune what you cannot measure.
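Besides GC logs, a service can export basic GC counters itself through the standard java.lang.management API. The minimal sketch below snapshots each collector's cumulative collection count and accumulated collection time, as reported by GarbageCollectorMXBean, and prints the deltas over one second. Bean names and what "collection time" covers vary by collector, so treat this as a coarse health signal rather than a substitute for -Xlog:gc. The class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Coarse in-process GC telemetry via the standard management API.
final class GcStatsSnapshot {

    /** Per collector: {cumulative collection count, cumulative collection time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the collector does not report it.
            stats.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, long[]> before = snapshot();
        Thread.sleep(1_000); // sampling interval
        Map<String, long[]> after = snapshot();
        after.forEach((name, now) -> {
            long[] then = before.getOrDefault(name, new long[] {0, 0});
            System.out.printf("%s: +%d collections, +%d ms%n", name, now[0] - then[0], now[1] - then[1]);
        });
    }
}
```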
io/thecodeforge/gc/ProductionJVMFlags.java
package io.thecodeforge.gc;

/**
 * Production JVM flag configurations organized by collector.
 * These are starting points: tune based on measured workload characteristics.
 */
public class ProductionJVMFlags {

    /**
     * UNIVERSAL FLAGS (apply to all collectors):
     *
     * -Xms<size> -Xmx<size>            // Set min=max to avoid resize overhead
     * -XX:+AlwaysPreTouch              // Touch (commit) every heap page at startup
     * -XX:+DisableExplicitGC           // Ignore System.gc() calls
     * -XX:+HeapDumpOnOutOfMemoryError  // Auto heap dump on OOM
     * -XX:HeapDumpPath=/var/log/       // Where to write heap dumps
     * -XX:+UseContainerSupport         // Respect cgroup limits (default JDK 10+)
     * -XX:MaxRAMPercentage=75.0        // Heap as % of container memory (ignored if -Xmx is set)
     * -XX:NativeMemoryTracking=detail  // Track off-heap memory (adds overhead; 'summary' is lighter)
     *
     * LOGGING FLAGS (always enable in production):
     * -Xlog:gc*:file=/var/log/gc.log:time,uptime,level
     */
}
The Flag Hierarchy — What to Tune First
First: Set -Xms = -Xmx to prevent resize overhead. Size heap based on container limits, not guesswork.
Second: Enable GC logging. You cannot tune what you cannot measure, and logs alone resolve a large share of GC investigations.
Third: Adjust collector-specific flags only after measuring with logging enabled.
Never: Copy flags from blog posts without understanding your workload's allocation profile.
Production Insight
The most impactful single flag change is enabling GC logging. Most production services run with default or minimal GC logging, which makes post-incident diagnosis impossible. A single line, -Xlog:gc*:file=/var/log/gc.log:time,uptime,level,tags:filecount=5,filesize=50m, provides pause time breakdowns, heap occupancy trends, and (with G1) humongous allocation detection. Enable it before you need it: GC logs only cover the period in which logging was already on, so they cannot be recovered after an incident.
Key Takeaway
Most JVM GC flags have sensible defaults. The three flags that matter most: (1) -Xms set equal to -Xmx to prevent resizing, (2) GC logging flags for observability, and (3) collector-specific flags, applied only after measuring. Never copy-paste JVM flags from the internet without profiling your own workload.
Choosing the right garbage collector depends on your workload's pause-time sensitivity, heap size, and throughput requirements. The table below summarizes the key characteristics of each major collector available in the JVM.
| Collector | Pause Model | Heap Size | Primary Use Case | Java Version |
|---|---|---|---|---|
| Serial | Stop-the-world (STW), single-threaded | < 1GB | Small applications, client-side, embedded | Since JDK 1.2 |
| Parallel | STW, multi-threaded | 1-8GB | Throughput-oriented batch jobs, analytics | Since JDK 1.4.1 (default JDK 5-8) |
| G1 | Region-based STW + concurrent marking | 1GB-64GB+ | General-purpose server applications | Since JDK 7 (default JDK 9+) |
| ZGC | Concurrent (STW < 10ms) | 4GB-16TB | Ultra-low latency, large heaps | Experimental JDK 11, production JDK 15+ |
| Shenandoah | Concurrent (STW < 10ms) | 1GB-64GB | Low latency with memory efficiency | Experimental JDK 12, production JDK 15+ (backported to 8, 11) |
Key takeaway: For most web services, start with G1. Only move to ZGC or Shenandoah when your measured p99 latency exceeds 100ms after tuning G1. Serial and Parallel are legacy choices for resource-constrained or batch workloads.
Production Insight
The comparison above is based on default configurations. Actual production behavior depends on allocation rate, live data size, and object distribution. Always profile with your workload before making a collector switch. The most common mistake is switching to ZGC for a 2GB heap service — the native memory overhead and lack of compressed oops can increase memory consumption by 30%, leading to OOM kills.
Key Takeaway
G1 is the default for a reason — it balances throughput and pause time for most workloads. ZGC and Shenandoah offer sub-10ms pauses but cost throughput and memory. Serial and Parallel are specialized tools for batch processing or tiny heaps.
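Before switching collectors, it helps to confirm which one a running JVM actually selected. The sketch below infers it from the names HotSpot gives its GarbageCollectorMXBeans (for example "G1 Young Generation" or "PS Scavenge"). These names are an implementation detail rather than a specified API and have changed across releases (ZGC's bean names did), so the hypothetical identify() helper matches on prefixes and falls back to "Unknown".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Heuristic: infer the active collector from HotSpot's GC MXBean names.
final class ActiveCollector {

    static String identify(List<String> beanNames) {
        for (String name : beanNames) {
            if (name.startsWith("G1")) return "G1";                  // "G1 Young Generation", ...
            if (name.startsWith("ZGC")) return "ZGC";                // "ZGC Cycles", "ZGC Pauses", ...
            if (name.startsWith("Shenandoah")) return "Shenandoah";  // "Shenandoah Pauses", ...
            if (name.startsWith("PS ")) return "Parallel";           // "PS Scavenge", "PS MarkSweep"
            if (name.equals("Copy") || name.equals("MarkSweepCompact")) return "Serial";
        }
        return "Unknown";
    }

    public static void main(String[] args) {
        List<String> names = ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
        System.out.println(names + " -> " + identify(names));
    }
}
```

Running it with and without -XX:+UseZGC (or -XX:+UseParallelGC) is a quick way to confirm that a flag change actually took effect.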
System.gc() and finalize() — Patterns to Avoid
Two legacy Java mechanisms should be avoided in production: System.gc() and finalize(). Both degrade GC performance and make its behavior less predictable.
System.gc(): An explicit request to run the garbage collector. It is a hint, not a command, but unless -XX:+DisableExplicitGC is set, HotSpot honors it, and with Serial, Parallel, or G1 (by default) that means a full stop-the-world GC. Calling it frequently causes unnecessary full GC pauses that wreck latency. Some frameworks and JDK subsystems, such as RMI, NIO, and JNDI, also call it internally. Always set -XX:+DisableExplicitGC in production to mitigate accidental calls.
finalize(): The finalize() method, defined in Object, is called by the JVM before an object's memory is reclaimed. It is unpredictable: the JVM may never call it before exit, and there is no guaranteed order in which finalizers run. finalize() can also resurrect an object by storing this in a reachable reference. Finalizable objects take longer to reclaim, because the JVM must run their finalizers before a later GC cycle can free them. finalize() has been deprecated since Java 9. Use Cleaner (JDK 9+), PhantomReference with a cleanup thread, or AutoCloseable / try-with-resources instead.
package io.thecodeforge.gc;

import java.lang.ref.Cleaner;

/**
 * Demonstrates how to avoid System.gc() and finalize().
 *
 * BAD PRACTICES:
 * 1. Calling System.gc() - triggers an unnecessary full GC
 * 2. Overriding finalize() - unpredictable, deprecated since JDK 9
 *
 * GOOD: Use Cleaner (JDK 9+) or PhantomReference with a reference queue.
 */
public class AvoidSystemGCAndFinalize implements AutoCloseable {

    // BAD - Avoid this
    @Override
    @Deprecated(since = "9")
    protected void finalize() throws Throwable {
        try {
            // Cleanup logic here - but this may never run!
            legacyCleanup();
        } finally {
            super.finalize();
        }
    }

    private void legacyCleanup() {
        System.out.println("Cleanup (if finalize runs)");
    }

    // GOOD - Use Cleaner (JDK 9+)
    private static final Cleaner CLEANER = Cleaner.create();

    // Handle to the registered cleaning action
    private final Cleaner.Cleanable cleanable;

    public AvoidSystemGCAndFinalize() {
        // Register a cleaning action. The lambda must not capture `this`,
        // or the object stays reachable and is never cleaned.
        this.cleanable = CLEANER.register(this, () -> {
            // Runs on close() or when the object becomes phantom-reachable
            System.out.println("Cleanup via Cleaner");
        });
    }

    // Deterministic cleanup for try-with-resources; the action runs at most once
    @Override
    public void close() {
        cleanable.clean();
    }

    public static void main(String[] args) {
        // NEVER do this:
        // System.gc(); // requests a GC, typically a full one - pauses, unpredictable
        // Better: let the GC decide.
        // Disable explicit calls with -XX:+DisableExplicitGC.
        // Use Cleaner or try-with-resources for cleanup:
        try (AvoidSystemGCAndFinalize resource = new AvoidSystemGCAndFinalize()) {
            System.out.println("Using resource");
        } // close() runs here and prints "Cleanup via Cleaner"
    }
}
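The PhantomReference alternative mentioned above can be sketched in a few dozen lines. The hypothetical PhantomCleanupRegistry below pairs each tracked object with a cleanup action, keeps the reference objects themselves strongly reachable (a PhantomReference that becomes unreachable may be collected before it is ever enqueued), and processes the ReferenceQueue either with a non-blocking drain() or from a daemon cleanup thread. It is similar in spirit to what java.lang.ref.Cleaner already does for you, so prefer Cleaner unless you need this level of control.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Minimal PhantomReference-based cleanup registry (similar in spirit to java.lang.ref.Cleaner).
final class PhantomCleanupRegistry {

    // Phantom reference carrying its cleanup action. The action must not reference the
    // tracked object, or that object can never become phantom-reachable.
    static final class CleanupRef extends PhantomReference<Object> {
        final Runnable action;

        CleanupRef(Object referent, ReferenceQueue<Object> queue, Runnable action) {
            super(referent, queue);
            this.action = action;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the reference objects reachable until their action has run.
    private final Set<CleanupRef> pending = ConcurrentHashMap.newKeySet();

    CleanupRef register(Object obj, Runnable action) {
        CleanupRef ref = new CleanupRef(obj, queue, action);
        pending.add(ref);
        return ref;
    }

    // Non-blocking: runs the action for every reference enqueued so far and returns how many ran.
    int drain() {
        int cleaned = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            CleanupRef cleanup = (CleanupRef) ref;
            pending.remove(cleanup);
            cleanup.action.run();
            cleaned++;
        }
        return cleaned;
    }

    // The "cleanup thread" pattern: a daemon thread blocks on the queue and runs actions.
    Thread startCleanupThread() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    CleanupRef cleanup = (CleanupRef) queue.remove();
                    pending.remove(cleanup);
                    cleanup.action.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly when interrupted
            }
        }, "phantom-cleanup");
        t.setDaemon(true);
        t.start();
        return t;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In a test you can call enqueue() on the returned reference to simulate the GC; in production the GC enqueues it after the tracked object becomes phantom-reachable. Use either drain() or the cleanup thread, not both on the same registry.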
Production Risk: System.gc() in Libraries
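Some libraries and JDK subsystems (RMI, JNDI, direct buffer management) call System.gc() internally. Without -XX:+DisableExplicitGC, these calls trigger full GCs in your application, causing latency spikes. Always disable explicit GC in production, but test thoroughly: some code relies on it for cleanup. For example, NIO calls System.gc() when it runs short of direct buffer memory, so with explicit GC disabled, direct-buffer-heavy applications can hit OutOfMemoryError instead of reclaiming unreferenced buffers.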
Production Insight
With -XX:+DisableExplicitGC set, System.gc() calls are silently ignored. Best practice: always set this flag in production. For resource cleanup (file handles, sockets), use try-with-resources or Cleaner. Never rely on finalize(): it has been deprecated since JDK 9 and deprecated for removal since JDK 18 (JEP 421).
Key Takeaway
Avoid System.gc() and finalize() at all costs in production code. Use -XX:+DisableExplicitGC to ignore explicit GC calls. Prefer try-with-resources for deterministic cleanup, and Cleaner for native resource cleanup.
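Because the flag is easy to lose between environments, a service can verify it at startup. This sketch, which assumes a HotSpot-based JDK, reads the live value through com.sun.management.HotSpotDiagnosticMXBean.getVMOption(); the ExplicitGcCheck class and flagValue() helper are hypothetical names for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Reads the live value of a HotSpot flag via the HotSpot-specific diagnostic MXBean.
final class ExplicitGcCheck {

    static String flagValue(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = hotspot.getVMOption(name); // throws IllegalArgumentException for unknown flags
        return option.getValue();
    }

    public static void main(String[] args) {
        boolean disabled = Boolean.parseBoolean(flagValue("DisableExplicitGC"));
        System.out.println("DisableExplicitGC = " + disabled
                + (disabled ? " (System.gc() is ignored)" : " (System.gc() may trigger a full GC)"));
    }
}
```

Logging this value at startup, or failing a readiness check when it is false, catches the case where a deployment silently drops the flag.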
Advantages and Disadvantages of Garbage Collection
Garbage Collection is a mixed blessing. It eliminates manual memory management bugs but introduces new operational challenges. The table below summarizes the trade-offs.
| Advantages | Disadvantages |
|---|---|
| Eliminates memory leaks caused by forgotten free() calls | Introduces stop-the-world pauses that affect latency |
| Prevents dangling-pointer bugs: memory is reclaimed only after an object becomes unreachable | |
| Reduces developer cognitive load: no manual memory management | Performance unpredictability: pauses vary with allocation pattern |
| Enables memory-safe concurrent programming with bounded overhead | Full GC occasionally compacts the entire heap, causing multi-second pauses |
| Provides tools for analysis (heap dumps, GC logs) to diagnose issues | Tuning requires deep understanding of collector algorithms and application behavior |
| Observable at runtime: GC logs give insight into object lifetimes | Cannot control exactly when memory is reclaimed: objects may linger in the old generation |
Key takeaway: The disadvantages can be mitigated with proper collector selection and tuning. For most production services, the benefits far outweigh the costs, but ignore the downsides at your peril.
Production Insight
The biggest hidden disadvantage is the 'death by a thousand cuts' effect: a service with 50ms young GC pauses every second spends 5% of its time in GC. Combined with mixed GCs, remark pauses, and occasional full GCs, the total GC overhead can exceed 10% without any single pause being catastrophic. Track total GC time as a percentage of wall-clock time using GC logs – alert if it exceeds 5% for latency-sensitive services.
Key Takeaway
GC removes an entire class of programming errors but introduces pause and CPU overhead. Modern collectors minimize pauses but cannot eliminate them entirely. The key is to choose the right collector for your latency and throughput budget.
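The 5% budget from the insight above can be computed in-process as well as from logs. As a sketch, assuming the standard GarbageCollectorMXBean API: getCollectionTime() returns each collector's accumulated collection time in milliseconds, so sampling it twice and dividing by the elapsed wall-clock time gives the overhead percentage. For concurrent collectors some beans include concurrent phases, so this approximates, rather than equals, the pause totals in the GC logs. The class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Computes GC time as a percentage of wall-clock time.
final class GcOverhead {

    static double overheadPercent(long gcMillis, long wallMillis) {
        if (wallMillis <= 0) throw new IllegalArgumentException("wallMillis must be positive");
        return 100.0 * gcMillis / wallMillis;
    }

    // Sum of accumulated collection time across all collectors; -1 means "undefined" and is skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        Thread.sleep(5_000); // sample window; a real service would sample on a timer
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        double pct = overheadPercent(totalGcMillis() - gcBefore, wallMillis);
        System.out.printf("GC overhead over last %d ms: %.2f%%%n", wallMillis, pct);
        if (pct > 5.0) System.out.println("ALERT: GC overhead above 5% budget");
    }
}
```

overheadPercent(50, 1000) reproduces the example above: 50 ms of GC per 1,000 ms of wall-clock time is 5%.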
GC Tuning Flags Reference Table
This table lists the most important GC tuning flags along with their purpose and typical values. Use it as a quick reference when configuring JVM options for production.
| Flag | Affects | Purpose | Typical Value / Range |
|---|---|---|---|
| -Xms, -Xmx | Heap size | Set initial and maximum heap | Equal values, e.g., -Xms4g -Xmx4g |
| -XX:MaxGCPauseMillis | G1 | Soft target for maximum pause time | 50–200ms (default 200) |
| -XX:G1HeapRegionSize | G1 | Size of each region (humongous threshold = 50% of region) | 1–32MB, power of 2 |
| -XX:InitiatingHeapOccupancyPercent | G1 | Heap occupancy % to trigger concurrent marking | 30–45 (default 45) |
| -XX:G1ReservePercent | G1 | Heap % held in reserve to reduce the risk of evacuation failure | 10–20 (default 10) |
| -XX:ConcGCThreads | All concurrent | Number of threads for concurrent GC work | Auto-detected; for G1, roughly a quarter of ParallelGCThreads |
| -XX:+DisableExplicitGC | All | Ignore System.gc() calls | Always enable in production |
| -XX:+UseContainerSupport | All | Respect container memory limits | Enabled by default JDK 10+ |
| -XX:MaxRAMPercentage | All | Set max heap as % of container memory | 75–85 (default 25 if not set!) |
| -XX:+AlwaysPreTouch | All | Commit heap pages at startup to reduce runtime latency | Enable for large heaps |
| -XX:NativeMemoryTracking | All | Track off-heap memory usage | summary or detail |
| -XX:+HeapDumpOnOutOfMemoryError | All | Generate heap dump on OOM | Enable for diagnosis |
| -XX:+ZGenerational | ZGC | Enable generational mode (JDK 21+) | Enable on JDK 21–22; default from JDK 23, obsolete from JDK 24 |
| -XX:SoftMaxHeapSize | ZGC | Target heap occupancy for ZGC (hints GC to cycle earlier) | 75–90% of Xmx |
| -XX:ShenandoahGCHeuristics | Shenandoah | Collection policy: adaptive, compact, or static | adaptive (default) |
| -Xlog:gc* | Logging | Enable GC logging with details | -Xlog:gc*,gc+phases=debug:file=gc.log:time,uptime |
Key takeaway: The most impactful flags are GC logging (for observability) and heap sizing. Tuning collector-specific flags without enabling logs is like fixing a car engine blindfolded – possible but wasteful.
Production Insight
A common mistake is leaving -XX:MaxRAMPercentage at its default of 25%, which caps the heap at a quarter of container memory; many container images never override it. Explicitly set a value such as -XX:MaxRAMPercentage=75.0 to use the available memory. Avoid the older, deprecated -XX:MaxRAMFraction: MaxRAMFraction=1 would hand the entire container limit to the heap. Likewise, never set -Xmx equal to container memory: you need room for native overhead.
Key Takeaway
Use this reference table when configuring JVM flags for a new service. Start with logging and heap sizing, then add collector-specific flags based on observed behavior. Test flag changes in staging before applying to production.
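The arithmetic behind the MaxRAMPercentage advice is simple but worth making explicit. This sketch approximates how the JVM derives its maximum heap from the container limit (the real ergonomics also round to heap alignment) and shows how much memory remains for metaspace, thread stacks, direct buffers, and GC structures. The class name and the 4 GiB limit are hypothetical examples.

```java
// Back-of-the-envelope sizing: how MaxRAMPercentage maps a container limit to a max heap.
final class ContainerHeapSizing {

    static final long MIB = 1024L * 1024;

    // Approximates the max heap the JVM picks: container limit * MaxRAMPercentage / 100.
    static long maxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    // Memory left outside the heap for native allocations.
    static long nativeHeadroomBytes(long containerLimitBytes, double maxRamPercentage) {
        return containerLimitBytes - maxHeapBytes(containerLimitBytes, maxRamPercentage);
    }

    public static void main(String[] args) {
        long limit = 4096 * MIB; // hypothetical 4 GiB container limit
        for (double pct : new double[] {25.0, 75.0}) {
            System.out.printf("MaxRAMPercentage=%.0f -> heap %d MiB, headroom %d MiB%n",
                    pct, maxHeapBytes(limit, pct) / MIB, nativeHeadroomBytes(limit, pct) / MIB);
        }
    }
}
```

For a 4 GiB container, the default 25% yields a 1024 MiB heap and leaves 3072 MiB outside the heap, while 75% yields a 3072 MiB heap with 1024 MiB of headroom.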
Production incident post-mortem (severity: high)
Full GC Spiral Crashes Order Processing Service During Flash Sale
Symptom
Order API p99 latency spiked from 80ms to 30+ seconds. Kubernetes liveness probes failed, triggering pod restarts. After restart, the pattern repeated within 10 minutes. GC logs showed 'Pause Full (Allocation Failure)' with increasing frequency.
Assumption
Team assumed the heap was too small and doubled -Xmx from 4GB to 8GB. The problem persisted — full GC pauses were longer because the live data set was larger.
Root cause
The service cached order objects in a ConcurrentHashMap with no eviction policy. Under flash sale traffic, the cache grew unbounded until old generation was 98% full. G1 could not reclaim enough space during mixed GCs because most old regions contained live cached data. Concurrent marking kept running but found almost nothing collectible. Eventually, young generation allocation failed and G1 fell back to a full GC stop-the-world pause. Doubling the heap only delayed the inevitable — the cache still grew unbounded.
Fix
Three-part fix: (1) Added size-bounded eviction to the order cache using Caffeine with maximumSize(50000) and expireAfterWrite(Duration.ofMinutes(30)). (2) Enabled GC logging with -Xlog:gc*,gc+humongous=debug:file=/var/log/gc.log to monitor heap pressure proactively. (3) Set -XX:InitiatingHeapOccupancyPercent=35 to trigger concurrent marking earlier, giving mixed GCs more cycles to reclaim space before allocation pressure hit.
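The team's fix used Caffeine (Caffeine.newBuilder().maximumSize(50_000).expireAfterWrite(Duration.ofMinutes(30)).build()). As a dependency-free illustration of the same bounding idea only, the hypothetical BoundedLruCache below uses LinkedHashMap in access order and evicts the least recently used entry once a size limit is exceeded. Unlike Caffeine it is not thread-safe and has no time-based expiry, so it is not a drop-in replacement for a concurrent service cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU map: once maxEntries is exceeded, the least recently accessed entry is evicted.
// Not thread-safe and no time-based expiry; Caffeine covers both in production.
final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration order runs from least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // checked after each insertion; evicts the LRU entry when over the bound
    }
}
```

Whatever the implementation, the property that matters for GC is the same: the cache's live data has a ceiling, so old-generation occupancy cannot grow without limit under traffic spikes.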
Key lesson
Unbounded caches are the #1 cause of GC-related production incidents in Java services
Full GC 'Allocation Failure' means the collector cannot free enough space — it is not a tuning problem, it is an application memory management problem
Doubling heap without fixing the allocation pattern just delays the same failure with a longer full GC pause
Every production service must have a bounded eviction strategy for any in-memory data structure
Monitor old generation utilization sustained above 85% as a leading indicator of full GC risk
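The 85% old-generation indicator can be sampled in-process. This sketch assumes HotSpot's memory pool naming ("G1 Old Gen", "PS Old Gen", "Tenured Gen"; ZGC and Shenandoah expose differently named pools that it will not match) and computes occupancy from MemoryPoolMXBean.getUsage(). A single reading taken just before a collection can be high, so alert only when the value stays above the threshold across several samples. OldGenMonitor is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Leading-indicator check: old-generation occupancy as a percentage of its maximum.
final class OldGenMonitor {

    static final double ALERT_THRESHOLD_PERCENT = 85.0;

    static double utilizationPercent(long used, long max) {
        if (max <= 0) return Double.NaN; // max is -1 when undefined
        return 100.0 * used / max;
    }

    // Matches HotSpot's old-gen pool names for G1, Parallel and Serial.
    static boolean isOldGenPool(String name) {
        return name.endsWith("Old Gen") || name.equals("Tenured Gen");
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !isOldGenPool(pool.getName())) continue;
            MemoryUsage usage = pool.getUsage();
            double pct = utilizationPercent(usage.getUsed(), usage.getMax());
            System.out.printf("%s: %.1f%% used%s%n", pool.getName(), pct,
                    pct > ALERT_THRESHOLD_PERCENT ? " (above threshold: full GC risk)" : "");
        }
    }
}
```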
Production debug guide: follow this path when GC is suspected as the root cause of latency or availability issues.
Symptom · 01
Latency spikes correlate with GC pauses in application logs
Fix
Enable GC logging with -Xlog:gc*,gc+phases=debug:file=gc.log:time,uptime,level,tags and correlate pause timestamps with latency metrics. Check if pauses are young GC, mixed GC, or full GC.
Symptom · 02
Full GC appearing frequently in steady-state traffic
Fix
Full GC signals the collector cannot keep up. Check for unbounded caches, humongous allocation rate, heap fragmentation, or metaspace exhaustion. Use jmap -histo to identify which object types dominate the heap.
Symptom · 03
Throughput drops but pause times are acceptable
Fix
Collector is consuming too much CPU. Check concurrent GC thread count (-XX:ConcGCThreads). Reduce if GC CPU usage exceeds 15-20% of total. Profile allocation rate — if > 2GB/sec, reduce allocation pressure at the application level.
Symptom · 04
OOM kill with no heap exhaustion visible in metrics
Fix
Check native memory: metaspace, thread stacks, direct byte buffers, mmap regions. Start the JVM with -XX:NativeMemoryTracking=detail (NMT must be enabled at startup), then run jcmd <pid> VM.native_memory summary.
Symptom · 05
GC pause time increases linearly with heap size
Fix
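G1 pause times are driven by the amount of live data it must copy or mark, not by heap size as such, so pauses that grow with heap size usually mean the live data set is growing with it. If pauses keep scaling with heap size after tuning, evaluate switching to ZGC or Shenandoah, whose pauses are largely independent of heap size.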
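To check the 2 GB/s allocation-rate figure from inside the JVM, HotSpot's com.sun.management.ThreadMXBean extension can report bytes allocated per thread. The hypothetical AllocationProbe below measures allocation on the current thread around a piece of work. A service-wide rate would need to sum over all threads (or come from GC logs or a profiler), so read it as a micro-benchmark aid, not a production metric.

```java
import java.lang.management.ManagementFactory;

// Measures how many bytes the current thread allocates while running a piece of work.
// Uses the HotSpot-specific com.sun.management.ThreadMXBean extension.
final class AllocationProbe {

    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // Sink field so demo allocations escape and cannot be optimized away by the JIT.
    static volatile Object sink;

    // Returns bytes allocated on the current thread while `work` runs, or -1 if unsupported.
    static long allocatedBytes(Runnable work) {
        if (!THREADS.isThreadAllocatedMemorySupported() || !THREADS.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long threadId = Thread.currentThread().getId(); // deprecated since JDK 19, still works
        long before = THREADS.getThreadAllocatedBytes(threadId);
        work.run();
        return THREADS.getThreadAllocatedBytes(threadId) - before;
    }

    // Bytes per nanosecond equals gigabytes (10^9 bytes) per second.
    static double gigabytesPerSecond(long bytes, long nanos) {
        return (double) bytes / nanos;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long bytes = allocatedBytes(() -> {
            for (int i = 0; i < 1_000; i++) {
                sink = new byte[64 * 1024]; // 64 KiB of short-lived garbage per iteration
            }
        });
        long elapsed = System.nanoTime() - start;
        if (bytes < 0) {
            System.out.println("Per-thread allocation accounting is not available on this JVM");
            return;
        }
        System.out.printf("Allocated %d MB at about %.2f GB/s%n",
                bytes / 1_000_000, gigabytesPerSecond(bytes, elapsed));
    }
}
```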
GC Triage Cheat Sheet: First 60 Seconds
Fast diagnostic commands when GC is suspected. Run these before diving into GC logs.
Application unresponsive, suspected full GC
Immediate action
Check if JVM is in a GC stop-the-world pause
Commands
jcmd <pid> GC.heap_info
jstat -gcutil <pid> 1000 10
Fix now
If Full GC count is incrementing, check for unbounded caches and heap fragmentation immediately. Restart with -Xlog:gc+humongous=debug
High CPU with low application throughput
Immediate action
Check if GC threads are consuming CPU
Commands
top -H -b -n 1 -p <pid> | grep -E 'VM Thread|GC Thread'
jcmd <pid> VM.flags | grep -i conc
Fix now
Reduce -XX:ConcGCThreads or -XX:ParallelGCThreads if GC CPU > 20%. Consider if allocation rate can be reduced at application level.
Latency spikes at regular intervals
Immediate action
Correlate spike timing with GC cycle phases
Commands
jstat -gcutil <pid> 500 20
grep 'Pause' gc.log | tail -20
Fix now
If spikes align with 'mixed' or 'remark' phases, tune -XX:G1MixedGCCountTarget or -XX:MaxGCPauseMillis.
OOM kill by container orchestrator (k8s)
Immediate action
Compare container memory limit with JVM heap + native overhead
Commands
kubectl describe pod <pod> | grep -A5 'OOMKilled'
jcmd <pid> VM.native_memory summary
Fix now
Set -XX:MaxRAMPercentage to 75% at most (not 90%), leaving roughly 20-25% of container memory for native overhead. For ZGC, set the container memory limit to about heap * 1.3.
Allocation failure in logs, to-space exhausted
Immediate action
G1 cannot evacuate objects — critical failure
Commands
grep -i 'to-space exhausted' gc.log | wc -l
grep -i 'humongous' gc.log | tail -20
Fix now
Increase -XX:G1ReservePercent to 15. Increase the region size (-XX:G1HeapRegionSize) so fewer objects are humongous. Reduce the allocation rate. Evacuation failure can escalate to a full GC, so treat it as P1.
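The grep checks in this cheat sheet can also be scripted, for example as part of an alert. This sketch scans unified GC log lines for pause durations, full GCs, and evacuation failures. The regular expression is an assumption based on the usual shape of JDK 9+ G1 log lines at -Xlog:gc info level; decorators and wording differ between JDK versions and collectors, so check it against your own logs before relying on it. GcLogScan is a hypothetical name, and the record requires JDK 16+.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Programmatic version of the grep checks for unified GC logs (JDK 9+ -Xlog:gc format).
final class GcLogScan {

    // Assumed line shape: "... GC(12) Pause Young (Normal) (G1 Evacuation Pause) 100M->20M(256M) 5.123ms"
    private static final Pattern PAUSE = Pattern.compile("Pause (\\w+).*?(\\d+\\.\\d+)ms\\s*$");

    record Summary(int pauses, int fullGcs, int evacuationFailures, double maxPauseMs) {}

    static Summary summarize(List<String> lines) {
        int pauses = 0;
        int fullGcs = 0;
        int evacuationFailures = 0;
        double maxPauseMs = 0.0;
        for (String line : lines) {
            String lower = line.toLowerCase(Locale.ROOT);
            // Depending on the JDK version, G1 reports "To-space exhausted" or "Evacuation Failure".
            if (lower.contains("to-space exhausted") || lower.contains("evacuation failure")) {
                evacuationFailures++;
            }
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                pauses++;
                if (m.group(1).equals("Full")) {
                    fullGcs++;
                }
                maxPauseMs = Math.max(maxPauseMs, Double.parseDouble(m.group(2)));
            }
        }
        return new Summary(pauses, fullGcs, evacuationFailures, maxPauseMs);
    }

    public static void main(String[] args) throws IOException {
        Summary s = summarize(Files.readAllLines(Path.of(args.length > 0 ? args[0] : "gc.log")));
        System.out.println(s);
        if (s.fullGcs() > 0 || s.evacuationFailures() > 0) {
            System.out.println("Full GC or evacuation failure found: treat as P1");
        }
    }
}
```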