Java Memory Leaks Explained: Causes, Detection and Prevention
- A Java memory leak is always a reachability problem, not a GC failure — the GC cannot collect an object that any live reference chain touches, even if your code will never use that object again. The GC is working correctly. Your reference is the problem.
- ThreadLocal in a thread pool is the most dangerous leak pattern in enterprise Java — always call ThreadLocal.remove() in a finally block, or use a framework-level TaskDecorator that does it for you. In JDK 25+, ScopedValue (finalized by JEP 506) provides a safer alternative for per-request context.
- WeakHashMap silently fails to evict entries when String literals are used as keys because the JVM string pool holds a permanent strong reference to every interned string — use new String(key) or a domain object as the key, or replace WeakHashMap with Caffeine for anything in production.
- A Java memory leak is an object reachable from a GC root but logically dead — GC cannot read your intent, only your references
- GC uses reachability analysis, not reference counting — any object on a live reference chain stays in memory forever regardless of whether your code will ever touch it again
- The six classic patterns: unbounded static collections, un-deregistered listeners, non-static inner classes, ThreadLocal in pools, mutated HashMap keys, classloader leaks
- Leaks in old generation are silent — they survive minor GCs and grow slowly until OOM, often hours or days after the leak began
- The biggest production trap: ThreadLocal.set() without .remove() in thread pools — the value stays pinned to the thread for its entire lifetime
- Biggest mistake: assuming GC prevents leaks. GC prevents unreachable objects from staying. A leaked object is, by definition, reachable.
Production Debug Guide: symptom to first commands.

- OutOfMemoryError in logs, service restarted:
  ls -lh /var/log/myapp/heapdumps/*.hprof
  jcmd <pid> GC.class_histogram | head -20
- Heap usage trending upward over hours or days:
  jcmd <pid> GC.class_histogram > /tmp/histo1.txt
  sleep 600 && jcmd <pid> GC.class_histogram > /tmp/histo2.txt
- High CPU with frequent full GC pauses:
  jstat -gcutil <pid> 1000 10
  jcmd <pid> VM.flags | grep -i 'heapdump\|gc'
- Suspect ThreadLocal leak (pool threads with large retained heap):
  jcmd <pid> Thread.print > /tmp/threaddump.txt
  jcmd <pid> GC.class_histogram | grep -i 'threadlocal\|ThreadLocalMap'

Production Incident
A payment-processing service's worker tasks nulled out the fields of their per-request PaymentContext but never called ThreadLocal.remove(). The ThreadLocal entry itself — and the empty-but-still-allocated PaymentContext object — remained pinned to the thread's ThreadLocalMap. Over 18 hours of processing approximately 2,000 tasks per hour, the 4 pool threads accumulated stale contexts that the GC could not touch because the threads themselves were GC roots. The fix was a single missing line.

Remediation:

1. Added REQUEST_CONTEXT.remove() in the finally block of every Runnable submitted to the executor — the immediate one-line fix.
2. Registered a TaskDecorator on the ThreadPoolTaskExecutor to enforce cleanup at the framework level, so individual task authors cannot forget it.
3. Added a custom Micrometer gauge monitoring the ThreadLocal map size per thread, exposing the metric to the production dashboard so future accumulation is visible before it becomes an incident.
4. Enabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/myapp/heapdumps/ for all production JVMs — if it happens again, evidence is captured automatically.
5. Added a code review checklist item and a custom ArchUnit rule: 'Every ThreadLocal.set() must have a corresponding remove() in a finally block.'

Lessons learned:

- Call remove() on the ThreadLocal object, not just null out fields on the value.
- Framework-level enforcement via TaskDecorator is more reliable than per-developer discipline — if cleanup can be forgotten, it will eventually be forgotten.
- A leak that took 18 hours to manifest will take 18 hours to reproduce without a heap dump — capture evidence before taking any other action.
- Rolling back code changes is useless if the leak is in a long-lived component like a thread pool that persists across deployments — identify the mechanism, not just the deployment.
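Spring's TaskDecorator is framework-specific, but the same framework-level enforcement can be sketched with plain JDK classes. This is a minimal, hypothetical wrapper (the names REQUEST_CONTEXT, withCleanup, and CleanupExecutorDemo are illustrative, not from any framework): every submitted task is decorated so the ThreadLocal is cleared in a finally block, and task authors cannot forget it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Plain-JDK sketch of the TaskDecorator idea: ThreadLocal cleanup is
 * enforced centrally for every task, not left to individual authors.
 */
public class CleanupExecutorDemo {

    static final ThreadLocal<String> REQUEST_CONTEXT = new ThreadLocal<>();

    /** Decorates a Runnable so remove() always runs, even on exceptions. */
    static Runnable withCleanup(Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                REQUEST_CONTEXT.remove(); // enforced: cannot be forgotten
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Single thread, so the second task is guaranteed to reuse the
        // first task's thread: exactly the situation that leaks.
        ExecutorService pool = Executors.newSingleThreadExecutor();

        pool.submit(withCleanup(() -> REQUEST_CONTEXT.set("payment-ctx-42")));

        // Without the decorator, the stale value from the first task would
        // still be visible here on the reused thread. With it, the slate
        // is clean before the next task runs.
        pool.submit(withCleanup(() ->
            System.out.println("stale context visible: "
                + (REQUEST_CONTEXT.get() != null))));

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

In Spring, the equivalent is a TaskDecorator registered once on the ThreadPoolTaskExecutor; the principle is identical: wrap the task, clean up in finally.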
Production Debug Guide: a systematic debugging path for production OOM incidents. Confirm the leak in GC logs, capture a heap dump, identify the suspect class, then search the codebase for ThreadLocal.set() calls and verify every one has a matching remove() in a finally block.

Memory leaks in Java are reference management failures, not garbage collector failures. The GC works on reachability — if any live reference chain touches an object, it stays in memory regardless of whether your code will ever use it again. The GC is faithfully doing what the specification says. The problem is yours.
In production, leaks manifest as a rising heap baseline after each GC cycle. The old generation creeps upward after every collection. The floor rises. This pattern often goes unnoticed in staging because test load profiles rarely exercise the accumulation over hours or days that production traffic does — a leak that takes 18 hours to OOM a production service may never surface in a 5-minute load test.
The core challenge is that the JVM cannot distinguish intentional caching from unintentional retention. Only disciplined lifecycle management, defensive coding patterns, and the right monitoring setup can prevent leaks from reaching production — and when they do, the right tooling makes the difference between a two-hour diagnosis and a two-day investigation.
In 2026, with JDK 21+ and virtual threads mainstream in enterprise codebases, the ThreadLocal leak pattern has become even more consequential. Virtual threads interact with ThreadLocal in ways that can amplify existing leaks, and the Scoped Values API (previewed as JEP 446 in JDK 21, finalized as JEP 506 in JDK 25) provides a safer alternative for per-request context propagation. Understanding the foundational leak mechanisms is the prerequisite for understanding why those newer APIs exist.
How the JVM Garbage Collector Actually Decides What to Free
Before you can understand why leaks happen, you need a clear picture of how the GC decides what to keep. The JVM uses reachability analysis, not reference counting. CPython, by contrast, uses reference counting as its primary strategy — every object tracks how many references point to it, and when that count drops to zero, the object is freed (a supplementary cycle detector handles the leftovers). Java takes a different approach because pure reference counting cannot handle circular references: if object A references object B and object B references object A, both counts are non-zero even if nothing else in the program uses either object. They would leak forever.
Reachability analysis solves this. The GC starts from a fixed set of root references — local variables on thread stacks, static fields, JNI references, and a few others — and walks the entire object graph from those roots. Any object reachable by following references from a root is considered live and is kept. Everything unreachable — including entire cycles of objects that reference only each other — is eligible for collection.
This is why a memory leak in Java is always a reference problem, not a GC problem. If you have a static List that accumulates objects, every object in that list is reachable from a GC root (the static field), so nothing gets collected — ever. The GC is behaving correctly. The leak is your reference.
Modern JVMs split the heap into regions and collect high-churn areas more aggressively. G1 uses a mix of young-generation regions collected frequently with short pauses and old-generation regions collected infrequently with longer pauses. ZGC and Shenandoah perform concurrent marking and compaction with sub-millisecond pause goals. But no collector can save you from long-lived references. An object that survives enough minor GCs gets promoted to old generation, and a leak in old generation grows silently — the old-gen baseline rises after every collection cycle until you hit OutOfMemoryError: Java heap space, often hours or days after the first leaked object was created.
The performance impact compounds: each full GC must mark and scan the entire live set in old generation. As the leaked set grows to millions of objects, full GC pause time grows proportionally. With G1, you will see increasing allocation failure GCs and eventually full GCs that pause the application for seconds. The leak does not just waste memory — it degrades GC performance, which degrades application latency, which is often the first observable symptom before memory exhaustion.
```java
package io.thecodeforge.memory;

import java.util.ArrayList;
import java.util.List;

/**
 * Demonstrates the difference between an object being logically 'done'
 * and being GC-eligible. The GC cannot read intent — only references.
 *
 * Run with: java -Xmx64m io.thecodeforge.memory.ReachabilityDemo
 * You will see OOM in roughly 60 iterations with a 64MB heap.
 */
public class ReachabilityDemo {

    /**
     * Static field == GC root.
     * Anything added here is permanently reachable and NEVER collected.
     * The GC sees a live reference chain from the class → CACHE → each byte[].
     * It has no way to know that our business logic is done with those arrays.
     */
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Simulating a static-field memory leak...");
        System.out.println("Watch the heap baseline rise — it never drops.");
        System.out.println();

        for (int iteration = 0; iteration < 1000; iteration++) {
            // Each byte array is 1 MB. Business logic is 'done' with it
            // after this method call, but CACHE still holds a reference.
            byte[] oneMegabyte = new byte[1024 * 1024];
            oneMegabyte[0] = (byte) iteration; // use it so the compiler keeps it

            CACHE.add(oneMegabyte); // <-- this is the leak

            // The GC runs frequently but the old-gen baseline never drops.
            // Each iteration pushes the floor higher by 1MB.
            System.out.printf(
                "Iteration %3d | CACHE entries: %3d | Heap used: %3d MB%n",
                iteration,
                CACHE.size(),
                (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory())
                    / (1024 * 1024)
            );
            Thread.sleep(50);
        }
    }
}
```
```text
Simulating a static-field memory leak...
Watch the heap baseline rise — it never drops.

Iteration   0 | CACHE entries:   1 | Heap used:   3 MB
Iteration   1 | CACHE entries:   2 | Heap used:   4 MB
Iteration  10 | CACHE entries:  11 | Heap used:  14 MB
Iteration  50 | CACHE entries:  51 | Heap used:  54 MB
Iteration  60 | CACHE entries:  61 | Heap used:  64 MB
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
```
- Reference counting cannot handle circular references — A references B, B references A, both counts are 1, both leak forever
- Reachability analysis collects entire dead object cycles in one pass by starting from roots, not from objects
- The GC roots are: thread stacks (local variables), static fields, JNI references, and a few JVM internals
- Any object not reachable from a root is dead to the GC — whether or not it is logically dead to your code is irrelevant to the GC
The Six Classic Java Memory Leak Patterns (With Real Code)
Every Java memory leak in production falls into one of six categories. Knowing them by name means you can spot them in code review in seconds and ask the right questions in a heap dump in minutes.
Pattern 1 — Unbounded Static Collections: A static field grows without any removal or eviction strategy. Because static fields are GC roots, every object in the collection is permanently reachable. This is the simplest leak to understand and one of the easiest to introduce — any utility cache implemented as a static HashMap without size bounds qualifies.
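The fix for Pattern 1 is an eviction policy. Caffeine is the production answer, but even the JDK alone can bound a cache. This is a minimal sketch using LinkedHashMap's removeEldestEntry hook (the class name BoundedCache and the size limit are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A size-bounded LRU cache: the eldest entry is evicted on overflow. */
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true gives true LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict instead of growing forever
    }

    public static void main(String[] args) {
        BoundedCache<Integer, String> cache = new BoundedCache<>(3);
        for (int i = 0; i < 10; i++) {
            cache.put(i, "value-" + i);
        }
        // Only the 3 most recently used entries survive: no unbounded growth,
        // even though the cache is referenced from a long-lived field.
        System.out.println("entries after 10 puts: " + cache.size());
        System.out.println(cache.keySet());
    }
}
```

A static field holding a BoundedCache is still a GC root, but the reachable set is capped, which is the whole point.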
Pattern 2 — Listener or Observer Not Deregistered: You add an event listener to a button, a JMX MBeanServer, an application event bus, or any other publisher. When the subscriber is logically done, nobody calls removeListener. The publisher's internal list holds a reference to the subscriber, keeping the entire object graph rooted at that subscriber alive indefinitely.
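Pattern 2 in miniature (the EventBus and Listener names here are hypothetical stand-ins for any publisher API): the publisher's internal list is the hidden strong reference, and removeListener is the only way to break it.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerLeakDemo {

    interface Listener { void onEvent(String event); }

    /** Long-lived publisher, e.g. an application-scoped event bus. */
    static class EventBus {
        private final List<Listener> listeners = new ArrayList<>();
        void addListener(Listener l)    { listeners.add(l); }
        void removeListener(Listener l) { listeners.remove(l); }
        int listenerCount()             { return listeners.size(); }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus(); // lives for the whole application

        // A short-lived component registers itself...
        Listener screen = event -> System.out.println("got: " + event);
        bus.addListener(screen);

        // ...and is now logically "done". But the bus still references it,
        // so the listener (and everything it references) stays reachable.
        System.out.println("listeners while leaked: " + bus.listenerCount());

        // THE FIX: deregister when the subscriber's lifecycle ends.
        bus.removeListener(screen);
        System.out.println("listeners after removeListener: " + bus.listenerCount());
    }
}
```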
Pattern 3 — Non-Static Inner Classes and Anonymous Classes: Every non-static inner class in Java holds an implicit reference to its enclosing outer instance. If you hand that inner class to a long-lived component — a thread pool, a static cache, an executor service — the outer instance is pinned in memory for as long as that component lives. This is invisible at the call site and very common with anonymous Runnable and Callable implementations.
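You can make the implicit outer reference of Pattern 3 visible with reflection. The synthetic field name this$0 is a javac convention rather than a language guarantee, so treat this sketch as illustrative; the anonymous class deliberately touches outer state so the compiler must capture the enclosing instance.

```java
import java.lang.reflect.Field;

public class InnerClassCaptureDemo {

    private final byte[] largeState = new byte[8 * 1024 * 1024]; // 8 MB

    /** Anonymous Runnable: a non-static inner class under the hood. */
    Runnable makeTask() {
        return new Runnable() {
            @Override public void run() {
                // Touching outer state forces the implicit outer reference.
                System.out.println("outer state size: " + largeState.length);
            }
        };
    }

    public static void main(String[] args) {
        Runnable task = new InnerClassCaptureDemo().makeTask();
        task.run();

        // The compiler-generated synthetic field is the hidden strong
        // reference that pins the 8 MB outer instance for as long as
        // the task itself is retained (by a pool, a cache, an executor).
        for (Field f : task.getClass().getDeclaredFields()) {
            if (f.isSynthetic()) {
                System.out.println("hidden field: " + f.getName()
                        + " of type " + f.getType().getSimpleName());
            }
        }
    }
}
```

Making the class a static nested class (or a lambda that does not use outer state) removes the capture entirely.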
Pattern 4 — ThreadLocal Variables in Thread Pools: The most dangerous pattern in enterprise code. ThreadLocal values live in a ThreadLocalMap on the Thread object itself. In a thread pool, threads are reused and never die. If you call ThreadLocal.set() and never call ThreadLocal.remove(), that value — and the full object graph it references — lives as long as the thread does, which in a pool is the lifetime of the application.
Pattern 5 — Mutable Objects as HashMap Keys: Objects used as HashMap keys that are mutated after insertion can become orphaned in the map. The hashCode() changes, the object is in the wrong bucket, and get() returns null even though the entry exists. It is consuming memory but unreachable via normal Map operations. Over time, orphaned entries fill the map.
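Pattern 5 is easy to reproduce in a few lines. This sketch (SessionKey is a hypothetical domain key) shows the orphaning: after mutation, a fresh probe with the original value hashes to the right bucket but fails equals, so get() returns null while the entry keeps consuming memory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {

    /** A key whose hashCode depends on mutable state: the root cause. */
    static class SessionKey {
        String id;
        SessionKey(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof SessionKey && Objects.equals(id, ((SessionKey) o).id);
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        Map<SessionKey, String> sessions = new HashMap<>();
        SessionKey key = new SessionKey("user-42");
        sessions.put(key, "session data");

        // Mutating the key after insertion changes its hashCode.
        // The stored entry now sits in a bucket that no longer matches
        // its key's hash.
        key.id = "user-99";

        // A probe equal to the ORIGINAL key hashes to the original bucket,
        // but the stored key no longer equals it: lookup fails.
        System.out.println("get with equal probe key: "
                + sessions.get(new SessionKey("user-42")));
        System.out.println("entry still consuming memory: " + sessions.size());
    }
}
```

The prevention is structural: make key classes immutable (final fields, no setters), or use a record, which cannot be mutated after construction.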
Pattern 6 — Classloader Leaks in Application Servers: Redeploying a web application creates a new classloader. If any JVM-wide component — a JDBC driver, a logging framework, a static thread — holds a reference to a class from the old classloader, the entire old classloader and every class it loaded stays in metaspace. Each redeploy leaks one classloader. After enough redeploys, metaspace exhaustion causes OOM: Metaspace.
For Pattern 6 specifically in 2026 Kubernetes environments: if your JDBC driver registers a static singleton with DriverManager and your application is deployed as a WAR to a shared Tomcat, each undeploy and redeploy leaks the old classloader (~50 to 200MB per cycle). After 10 redeploys, the container is out of metaspace. This is part of why Spring Boot's embedded server model exists: tying the classloader lifecycle to the application process avoids this class of leak entirely.
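The standard WAR-deployment mitigation is to deregister your own drivers on shutdown, typically from a ServletContextListener's contextDestroyed. A sketch of the core logic using only java.sql (the class and method names here are illustrative; in this standalone demo no drivers are registered, so the loop is a no-op):

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Enumeration;

/**
 * Classloader-leak mitigation sketch: on application shutdown, deregister
 * every JDBC driver this webapp's classloader registered, so the JVM-global
 * DriverManager stops pinning the old classloader after a redeploy.
 */
public class DriverCleanup {

    public static void deregisterOwnDrivers() {
        ClassLoader webappLoader = Thread.currentThread().getContextClassLoader();
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            // Only touch drivers loaded by THIS application's classloader:
            // deregistering a container-level driver would break other apps.
            if (driver.getClass().getClassLoader() == webappLoader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    System.out.println("Deregistered: " + driver.getClass().getName());
                } catch (SQLException e) {
                    System.err.println("Failed to deregister: " + driver);
                }
            }
        }
    }

    public static void main(String[] args) {
        deregisterOwnDrivers();
        System.out.println("driver cleanup complete");
    }
}
```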
```java
package io.thecodeforge.memory;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Demonstrates a ThreadLocal memory leak inside a fixed thread pool.
 *
 * The pool reuses threads. ThreadLocal values set in one task remain
 * in the thread's ThreadLocalMap when the next task starts on the same thread.
 *
 * Run with: java -Xmx128m io.thecodeforge.memory.ThreadLocalLeakDemo
 * Watch heap climb with the fix commented out.
 * Uncomment REQUEST_CONTEXT.remove() to see heap stay flat.
 */
public class ThreadLocalLeakDemo {

    /**
     * ThreadLocal is NOT a variable. It is a key into a hidden
     * Map<ThreadLocal, Object> that lives on each Thread object.
     * When the Thread dies, the map is collected.
     * In a thread pool, threads never die — so neither do forgotten entries.
     */
    private static final ThreadLocal<byte[]> REQUEST_CONTEXT = new ThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        // 4 threads — they live forever, accumulating stale ThreadLocal values
        ExecutorService threadPool = Executors.newFixedThreadPool(4);

        for (int taskNumber = 0; taskNumber < 500; taskNumber++) {
            final int currentTask = taskNumber;
            threadPool.submit(() -> {
                try {
                    // Simulating per-request data: user session, trace context,
                    // transaction payload — anything you would not want pinned
                    // to the thread after the request is done.
                    byte[] requestPayload = new byte[500 * 1024]; // 500 KB
                    requestPayload[0] = (byte) currentTask;
                    REQUEST_CONTEXT.set(requestPayload);

                    processRequest(currentTask);

                    // Without remove(), requestPayload stays in this thread's
                    // ThreadLocalMap after the task completes. The thread goes
                    // back to the pool. The reference is never cleared.
                    // 4 threads × accumulating entries = unbounded growth.
                } finally {
                    // THE FIX: uncomment this line to prevent the leak.
                    // This is the single most important line in thread pool code
                    // that uses ThreadLocal.
                    // REQUEST_CONTEXT.remove();
                }
            });

            if (currentTask % 50 == 0) {
                long usedHeapMB = (Runtime.getRuntime().totalMemory()
                        - Runtime.getRuntime().freeMemory()) / (1024 * 1024);
                System.out.printf(
                    "Tasks submitted: %3d | Heap used: ~%d MB%n",
                    currentTask, usedHeapMB
                );
            }
        }

        threadPool.shutdown();
        threadPool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("Done. Check heap trend above.");
    }

    private static void processRequest(int taskNumber) {
        byte[] context = REQUEST_CONTEXT.get();
        // In production: this might be a UserSession, PaymentContext,
        // MDC trace ID, or any per-request state.
        System.out.printf("Task %3d on thread: %s | context bytes: %d%n",
            taskNumber,
            Thread.currentThread().getName(),
            context != null ? context.length : 0
        );
    }
}
```
```text
Task   0 on thread: pool-1-thread-1 | context bytes: 512000
Task   1 on thread: pool-1-thread-2 | context bytes: 512000
...
Tasks submitted:  50 | Heap used: ~31 MB
Tasks submitted: 100 | Heap used: ~55 MB
Tasks submitted: 150 | Heap used: ~79 MB
Tasks submitted: 200 | Heap used: ~104 MB

--- With REQUEST_CONTEXT.remove() uncommented ---
Tasks submitted:  50 | Heap used: ~9 MB
Tasks submitted: 100 | Heap used: ~9 MB
Tasks submitted: 200 | Heap used: ~9 MB   <-- flat. No leak.
```
- The leak is per-thread — a 4-thread pool means 4 independent accumulation points, not 1
- The ThreadLocalMap hangs off the Thread object itself, and live threads are GC roots — the leak is invisible from the object graph perspective without thread-specific heap analysis
- Clearing fields inside the ThreadLocal value does NOT remove the ThreadLocal entry — the entry stays in the ThreadLocalMap even with all fields set to null
- In JDK 21 with virtual threads, ThreadLocal semantics are preserved but ScopedValue (JEP 446) provides a safer alternative for per-request context that is automatically cleaned up
The patterns to hunt for in code review: ThreadLocal.set() without a matching remove(), non-static anonymous inner classes submitted to an executor, HashMap keys with mutable state used in hashCode, and JDBC driver registration without a deregistration lifecycle. Prevention is orders of magnitude cheaper than diagnosis — a single missing remove() can cost hours of debugging and thousands in incident response. Always call ThreadLocal.remove() in a finally block, and add a TaskDecorator for framework-level enforcement. The telltale symptom of the mutable-key pattern: get() returns null for keys you know were inserted.

WeakReference, SoftReference and the Right Way to Build a Cache
Java provides four reference strengths, and choosing the right one is how you build caches that release memory correctly under pressure instead of growing without bounds.
A Strong reference is your normal Object obj = new Object(). The GC will never collect the referent while any strong reference to it exists. This is what every variable assignment creates by default.
A SoftReference tells the GC: keep this if you have the memory, but clear it before throwing OutOfMemoryError. The JVM guarantees that all soft references are cleared before an OOM is thrown. This makes SoftReference suitable for memory-sensitive caches where the cached value is expensive to recompute and you want to keep it as long as possible.
A WeakReference tells the GC: collect this whenever you want — I don't need it to survive a GC cycle. WeakHashMap uses this internally: if the key has no strong references outside the map, the entry is automatically removed. This is ideal for metadata caches where the cache entry's lifecycle should be bound to the key object's lifecycle.
A PhantomReference is for post-mortem cleanup. You get a notification via a ReferenceQueue after the object is enqueued for collection. Used for cleaning up native resources (off-heap memory, file handles) as a safer, more predictable alternative to finalize().
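The phantom mechanics in a short sketch: get() always returns null by specification, so the only way to learn the referent is gone is the ReferenceQueue. Enqueueing depends on the GC actually running, so the notification step is polled with a timeout rather than assumed (System.gc() is only a hint).

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class PhantomReferenceDemo {

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<byte[]> queue = new ReferenceQueue<>();
        byte[] nativeHandle = new byte[1024]; // stand-in for an off-heap resource
        PhantomReference<byte[]> phantom = new PhantomReference<>(nativeHandle, queue);

        // Unlike soft/weak refs, get() ALWAYS returns null by specification:
        // a phantom reference can never resurrect its referent.
        System.out.println("phantom.get() before GC: " + phantom.get());

        nativeHandle = null; // drop the last strong reference
        System.gc();         // a hint; enqueueing is not guaranteed immediately

        // Wait (bounded) for the post-mortem notification.
        Reference<? extends byte[]> enqueued = queue.remove(2000);
        System.out.println("cleanup notification received: " + (enqueued != null));
        // In real code, this is where you would free the native resource.
    }
}
```

Since JDK 9, java.lang.ref.Cleaner wraps exactly this queue-and-thread machinery and is the recommended way to use phantom reachability in practice.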
The production reality is that SoftReference-based caches have non-deterministic eviction timing that can cause thundering herd cache misses under sudden memory pressure. WeakHashMap has subtle failure modes with interned String keys and is not thread-safe. For any production cache, use Caffeine — it implements Window TinyLFU eviction, is fully thread-safe, provides statistics, integrates with Spring Boot's caching abstraction, and outperforms hand-rolled reference queues in every benchmark that matters. WeakHashMap and SoftReference are important to understand because they are the foundation, but Caffeine is what you deploy.
```java
package io.thecodeforge.memory;

import java.lang.ref.SoftReference;
import java.util.WeakHashMap;

/**
 * Side-by-side comparison of Strong, Soft, and Weak references.
 * Shows WeakHashMap auto-eviction and the String literal gotcha.
 *
 * Run with: java -Xmx32m io.thecodeforge.memory.ReferenceTypesDemo
 */
public class ReferenceTypesDemo {

    public static void main(String[] args) throws InterruptedException {
        demonstrateSoftReference();
        demonstrateWeakHashMap();
        demonstrateStringLiteralGotcha();
    }

    private static void demonstrateSoftReference() {
        System.out.println("=== SoftReference: cleared only under memory pressure ===");
        byte[] expensiveData = new byte[10 * 1024 * 1024]; // 10 MB
        SoftReference<byte[]> softCache = new SoftReference<>(expensiveData);

        // Drop the strong reference — only the soft reference holds the data now
        expensiveData = null;
        System.out.println("Before pressure — data available: "
                + (softCache.get() != null)); // true

        // Simulate memory pressure — forces GC to consider clearing soft refs
        try {
            byte[] pressureBlock = new byte[25 * 1024 * 1024];
            System.out.println("Allocated pressure block: "
                    + pressureBlock.length + " bytes");
        } catch (OutOfMemoryError oom) {
            System.out.println("OOM — JVM cleared soft refs before throwing");
        }

        System.out.println("After pressure — data available: "
                + (softCache.get() != null)); // likely false
        System.out.println();
    }

    private static void demonstrateWeakHashMap() throws InterruptedException {
        System.out.println("=== WeakHashMap: entry lives only as long as the key ===");
        WeakHashMap<Object, String> metadataCache = new WeakHashMap<>();

        // Use Object as key — a domain object that has no other references.
        // This simulates caching metadata about a parsed AST node.
        Object sessionKey = new Object();
        metadataCache.put(sessionKey, "{ role: admin, locale: en-US }");
        System.out.println("Cache size before GC: " + metadataCache.size()); // 1

        // Drop the only strong reference to the key
        sessionKey = null;
        System.gc();
        Thread.sleep(200); // give GC time to run

        // WeakHashMap expunges stale entries on the next map operation
        System.out.println("Cache size after GC: " + metadataCache.size()); // 0
        System.out.println("Entry auto-evicted — no manual removal needed!");
        System.out.println();
    }

    private static void demonstrateStringLiteralGotcha() throws InterruptedException {
        System.out.println("=== String literal key: the WeakHashMap gotcha ===");
        WeakHashMap<String, String> brokenCache = new WeakHashMap<>();

        // String literals are interned — the JVM string pool holds a permanent
        // strong reference to them. This key will NEVER be collected.
        // The WeakHashMap entry stays forever.
        String literalKey = "my-session-key"; // interned — permanent strong ref
        brokenCache.put(literalKey, "some cached value");

        // Setting literalKey = null does NOT clear the intern pool reference.
        // The string pool retains the strong reference.
        literalKey = null;
        System.gc();
        Thread.sleep(200);

        System.out.println("Cache size after GC with String literal key: "
                + brokenCache.size()); // still 1 — entry was NOT evicted
        System.out.println("This WeakHashMap will grow forever with String literal keys.");
        System.out.println("Fix: use 'new String(key)' or a domain object as the key.");
        System.out.println("Better fix: use Caffeine with expireAfterWrite and maximumSize.");
    }
}
```
```text
=== SoftReference: cleared only under memory pressure ===
Before pressure — data available: true
Allocated pressure block: 26214400 bytes
After pressure — data available: false

=== WeakHashMap: entry lives only as long as the key ===
Cache size before GC: 1
Cache size after GC: 0
Entry auto-evicted — no manual removal needed!

=== String literal key: the WeakHashMap gotcha ===
Cache size after GC with String literal key: 1
This WeakHashMap will grow forever with String literal keys.
Fix: use 'new String(key)' or a domain object as the key.
Better fix: use Caffeine with expireAfterWrite and maximumSize.
```
- Use WeakReference when the cached value is only useful while the key is alive — metadata for a parsed AST node, classloader-scoped data, or canonicalisation maps
- Use SoftReference when the cached value is expensive to recompute and you want to keep it as long as possible without risking OOM — image thumbnails, compiled templates
- WeakReferences are collected eagerly at the next GC cycle; SoftReferences are cleared only under genuine memory pressure before OOM
- For production caches with predictable behaviour, use Caffeine with explicit maximumSize and expireAfterWrite — eviction is gradual, measurable, and does not cause thundering herd cache misses
WeakHashMap is also not thread-safe — wrap it with Collections.synchronizedMap() or replace it with ConcurrentHashMap. The combination of non-thread-safety and surprising String interning behaviour makes WeakHashMap a footgun in most production scenarios. If you genuinely need weak-key semantics, use non-interned keys (a domain object or new String()), or Caffeine with weakKeys() for thread safety. For everything else, Caffeine.newBuilder().maximumSize(10_000).expireAfterWrite(Duration.ofMinutes(10)).build() — this is the right answer for almost every production cache.

Finding Leaks in Production: VisualVM, JVM Flags and Eclipse MAT
Knowing the patterns is half the battle. The other half is diagnosing a leak you did not write — in a service you have never seen before, under traffic you cannot fully reproduce. Here is the systematic approach that works in real incidents.
Step 1: Confirm the leak with GC logs. Enable GC logging on every production JVM: -Xlog:gc*:file=gc.log:time,uptime:filecount=5,filesize=20m (JDK 9+ unified logging syntax). A healthy heap shows a sawtooth pattern — usage climbs, GC runs, usage drops back to a consistent baseline. A leaking heap shows that baseline creeping upward after every GC cycle. That rising floor is your smoking gun before you touch any other tool.
Step 2: Get a heap dump. Trigger one without restarting: jcmd <pid> GC.heap_dump /tmp/heapdump.hprof. This is preferred over jmap -dump in production because it uses a safer code path in JDK 9+. For automated capture, add -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp to your JVM flags permanently — this is non-negotiable for production services.
Step 3: Analyse with Eclipse MAT. Open the .hprof file and immediately run Leak Suspects Report. MAT identifies the largest retained heaps and the reference chains keeping them alive without you needing to know where to look. Then examine the Dominator Tree — it shows retained heap (the total memory that would be freed if this object were collected, including its entire object graph), not just shallow heap (the object's own bytes). Follow the dominator chain until you reach the GC root.
Step 4: VisualVM for live profiling. Connect via JMX, open the Sampler tab, and use Memory sampling to see which classes have the most live instances and total retained bytes. The key metric is monotonic growth — a class whose instance count keeps rising across samples is leaking.
Step 5: Java Flight Recorder for continuous low-overhead production monitoring. The jdk.OldObjectSample event captures objects that have survived multiple GC cycles — exactly the objects you care about — with near-zero overhead (under 2% CPU). Run jcmd <pid> JFR.start duration=300s filename=recording.jfr settings=profile and open the result in JDK Mission Control. This is the preferred approach for production systems where you need ongoing visibility without the stop-the-world cost of heap dumps.
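jcmd is the usual entry point, but JFR can also be driven from inside the application via the jdk.jfr API. A minimal sketch (the class name, temp-file prefix, and workload are illustrative) that records a short window including the jdk.OldObjectSample event and dumps it to a file:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Recording;

public class JfrInProcessDemo {

    public static void main(String[] args) throws Exception {
        try (Recording recording = new Recording()) {
            // The event this section cares about: objects that survived
            // multiple GC cycles — prime leak suspects.
            recording.enable("jdk.OldObjectSample");
            recording.start();

            // Generate a little allocation churn so there is something to record.
            byte[][] survivors = new byte[100][];
            for (int i = 0; i < 100; i++) {
                survivors[i] = new byte[64 * 1024];
            }
            System.out.println("allocated blocks: " + survivors.length);

            recording.stop();
            Path out = Files.createTempFile("leak-hunt", ".jfr");
            recording.dump(out); // writes a .jfr file readable by Mission Control
            System.out.println("jfr dump ok: " + (Files.size(out) > 0));
        }
    }
}
```

The same Recording object can also be configured from a settings file (e.g. the bundled profile.jfc) via Configuration.getConfiguration("profile"), which matches what settings=profile does on the jcmd command line.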
MAT OQL for power users: SELECT * FROM java.util.HashMap WHERE size > 10000 finds large maps; SELECT * FROM java.lang.Thread WHERE name LIKE 'pool*' finds pool threads and their retained heap. The Compare Snapshots feature is essential — take two dumps 15 minutes apart and MAT shows exactly what grew between them, confirming the leak is active and identifying the growing class.
```java
package io.thecodeforge.memory;

/**
 * Production leak detection configuration reference.
 *
 * Add JVM flags to your startup configuration:
 * - Docker: ENV JAVA_OPTS in Dockerfile or docker-compose
 * - Kubernetes: spec.containers[].env in deployment manifest
 * - systemd: Environment= in unit file
 *
 * =====================================================================
 * MANDATORY JVM FLAGS FOR PRODUCTION LEAK DETECTION
 * =====================================================================
 *
 * # Automatic heap dump on OutOfMemoryError — non-negotiable
 * -XX:+HeapDumpOnOutOfMemoryError
 * -XX:HeapDumpPath=/var/log/myapp/heapdumps/
 *
 * # GC logging — sawtooth pattern confirms health, rising baseline confirms leak
 * -Xlog:gc*:file=/var/log/myapp/gc.log:time,uptime:filecount=5,filesize=20m
 *
 * # Native memory tracking — for off-heap and metaspace leaks
 * -XX:NativeMemoryTracking=summary
 *
 * # G1 is default in JDK 9+ and suitable for most workloads
 * -XX:+UseG1GC
 * -XX:MaxGCPauseMillis=200
 *
 * =====================================================================
 * LIVE DIAGNOSTIC COMMANDS (no restart required)
 * =====================================================================
 *
 * # Find the JVM process ID
 * $ jps -l
 * 18423 io.thecodeforge.service.PaymentService
 *
 * # Trigger a heap dump without stopping the process
 * # Preferred over jmap in production — safer code path in JDK 9+
 * $ jcmd 18423 GC.heap_dump /tmp/heapdump-$(date +%Y%m%d-%H%M%S).hprof
 *
 * # Class histogram — top memory consumers by class type
 * # Take two snapshots 10 minutes apart and diff them
 * $ jcmd 18423 GC.class_histogram | head -30
 *
 * # GC occupancy — watch Old/O for rising baseline across collections
 * # O = old gen occupancy, S0/S1 = survivor spaces, E = eden
 * $ jstat -gcutil 18423 1000 30
 *
 * # Native memory summary — catches metaspace and direct buffer leaks
 * $ jcmd 18423 VM.native_memory summary
 *
 * # Thread dump — needed for ThreadLocal leak analysis
 * $ jcmd 18423 Thread.print > /tmp/threaddump.txt
 *
 * # Start JFR recording — near-zero overhead, runs in production
 * # Look for jdk.OldObjectSample events for long-lived object tracking
 * $ jcmd 18423 JFR.start duration=300s filename=/tmp/recording.jfr settings=profile
 *
 * =====================================================================
 * INTERPRETING A CLASS HISTOGRAM
 * =====================================================================
 *
 *  num     #instances    #bytes   class name
 *  ---     ----------    ------   ----------
 *    1:       950,234    22.8MB   [B (byte arrays)
 *    2:       420,000    13.4MB   io.thecodeforge.model.UserSession
 *    3:       420,000     6.7MB   java.util.HashMap$Node
 *
 * If UserSession count grows monotonically across two snapshots,
 * and HashMap$Node count matches it (UserSession has a HashMap field),
 * you almost certainly have a session or cache without eviction.
 *
 * The 1:1 ratio between UserSession and HashMap$Node is a strong signal
 * that the same objects are being accumulated — not coincidentally similar counts.
 */
public class LeakDetectionSetup {

    public static void main(String[] args) {
        System.out.println("See Javadoc above for production JVM configuration.");
        System.out.println();

        // Runtime heap stats — useful for a /health/memory endpoint
        Runtime jvmRuntime = Runtime.getRuntime();
        long maxHeapMB = jvmRuntime.maxMemory() / (1024 * 1024);
        long totalHeapMB = jvmRuntime.totalMemory() / (1024 * 1024);
        long freeHeapMB = jvmRuntime.freeMemory() / (1024 * 1024);
        long usedHeapMB = totalHeapMB - freeHeapMB;

        System.out.printf("Max heap:  %6d MB%n", maxHeapMB);
        System.out.printf("Used heap: %6d MB%n", usedHeapMB);
        System.out.printf("Free heap: %6d MB%n", freeHeapMB);
        System.out.printf("Usage:     %6.1f%%%n", (double) usedHeapMB / maxHeapMB * 100);
    }
}
```
```text
Max heap:     256 MB
Used heap:      8 MB
Free heap:    248 MB
Usage:        3.1%
```
- MAT's Leak Suspects Report automates the initial hunt — it identifies the largest retained heaps and the reference chains keeping them alive without you knowing where to start
- MAT's Dominator Tree shows retained heap (total memory freed if this object is collected), not just shallow heap (the object's own bytes) — the distinction is everything for identifying the real culprit
- MAT's Compare Snapshots feature shows what grew between two dumps — essential for confirming an active leak and identifying the growing class before an OOM occurs
- MAT's OQL lets you query the heap like a database — find all HashMaps over 10K entries, all ThreadLocalMaps, all instances of a specific domain class
- VisualVM is better for live interactive sampling and CPU profiling. MAT is the right tool for post-mortem heap analysis.
| Reference Type | Collected When? | Ideal Use Case | get() After GC |
|---|---|---|---|
| Strong Reference (normal variable assignment) | Never while the reference exists — the GC will not touch it | All regular objects — this is the default for every variable and field in Java | Not applicable — the referent is always live while the reference exists |
| SoftReference | Only under genuine memory pressure, guaranteed before OOM is thrown | Memory-sensitive caches where the cached value is expensive to recompute (compiled templates, image thumbnails) | Returns null after the GC clears it — always check for null before dereferencing |
| WeakReference | Next GC cycle — no guarantee on timing, but collected eagerly regardless of memory pressure | WeakHashMap metadata caches where entry lifetime should match key lifetime, canonicalisation maps, listener registries | Returns null after collection — always check for null, and do not use String literals as keys in WeakHashMap |
| PhantomReference | After the object is finalised and enqueued, before memory is actually reclaimed | Native resource cleanup (off-heap memory, file handles) as a safer and more predictable alternative to finalize() | Always returns null by specification — use a ReferenceQueue to receive notification of collection |
| WeakHashMap entry | Automatically when the key object has no strong references outside the map — expunged on the next map operation after GC | Cache where entry lifetime should match key lifetime — effective only with non-interned, non-primitive keys | Entry removed automatically — but only on next map operation (get, put, size), not proactively after GC |
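The WeakHashMap row hides a trap worth seeing in running code. The sketch below (class and key names are illustrative) puts one interned-literal key and one `new String(...)` key into the same map, drops the strong reference to the latter, and requests a GC. Note that `System.gc()` is only a hint, so the eviction side is not guaranteed on every run — but the literal entry's survival is:

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakHashMapPitfall {
    public static void main(String[] args) throws InterruptedException {
        Map<String, String> cache = new WeakHashMap<>();

        // A literal is interned: the string pool keeps a permanent strong
        // reference to it, so this entry can never be evicted.
        cache.put("interned-key", "value");

        // new String(...) creates a distinct, non-interned object; once our
        // local reference is dropped, nothing strong keeps the key alive.
        String collectable = new String("heap-key");
        cache.put(collectable, "value");
        collectable = null;

        System.gc();       // a hint only — weak refs are typically cleared on the next cycle
        Thread.sleep(100); // give the reference processing a moment

        System.out.println("literal entry survives: " + cache.containsKey("interned-key"));
        System.out.println("entries remaining:      " + cache.size()); // usually 1 after GC
    }
}
```

The first line always prints `true` — which is precisely the silent failure mode: the "cache" entry keyed by a literal is immortal.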
🎯 Key Takeaways
- A Java memory leak is always a reachability problem, not a GC failure — the GC cannot collect an object that any live reference chain touches, even if your code will never use that object again. The GC is working correctly. Your reference is the problem.
- ThreadLocal in a thread pool is the most dangerous leak pattern in enterprise Java — always call ThreadLocal.remove() in a finally block, or use a framework-level TaskDecorator that does it for you. In JDK 21+ (preview; finalized in JDK 25), ScopedValue provides a safer alternative for per-request context.
- WeakHashMap silently fails to evict entries when String literals are used as keys, because the JVM string pool holds a permanent strong reference to every interned string — use new String(key) or a domain object as the key, or replace WeakHashMap with Caffeine for anything in production.
- Enable -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath on every production JVM from day one — without a heap dump captured at crash time, diagnosing an OOM is guesswork that typically takes days instead of hours.
- The rising old-generation baseline after full GC is the definitive leak signature and your only early warning before OOM. A healthy sawtooth returns to the same floor after each collection. A leaking sawtooth shows a floor that rises monotonically — monitor this metric and alert at 70% old-gen occupancy.
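The ThreadLocal takeaway above reduces to one pattern. A minimal sketch (class and field names are illustrative, with a 2MB payload standing in for per-request context):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadLocalCleanup {
    // Hypothetical per-request context carried across layers of a request handler
    private static final ThreadLocal<byte[]> REQUEST_CONTEXT = new ThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                REQUEST_CONTEXT.set(new byte[2 * 1024 * 1024]);
                try {
                    // ... handle the request using REQUEST_CONTEXT.get() ...
                } finally {
                    // Without this, the 2MB array stays pinned to the pool thread
                    // for the thread's entire lifetime — the classic pool leak.
                    REQUEST_CONTEXT.remove();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("All tasks cleaned up their ThreadLocal state.");
    }
}
```

Because pool threads never die, `set()` without `remove()` keeps one payload alive per thread indefinitely; the finally block guarantees cleanup even when the task throws.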
⚠ Common Mistakes to Avoid
Interview Questions on This Topic
- Q (Mid-level): The GC is supposed to handle memory management in Java — so how can a memory leak even occur? Walk me through the exact mechanism that keeps an object alive despite it being logically unused.
- Q (Mid-level): You get paged at 2 AM: production service restarted with OutOfMemoryError. You have a heap dump. Walk me through exactly what you do next — tools, commands, what you are looking at, and how you pinpoint the root cause.
- Q (Junior): What is the difference between a SoftReference and a WeakReference, and when would you choose one over the other? What happens if you use a String literal as a key in a WeakHashMap, and why does the entry not get evicted?
- Q (Mid-level): Your service uses a thread pool with 8 threads. Each task sets a ThreadLocal with a 2MB payload. After running 10,000 tasks, heap usage is higher than expected. Why exactly, and what is the precise fix?
- Q (Senior): You deploy a Spring Boot WAR to Tomcat. After 20 redeploys without restarting the container, it crashes with OutOfMemoryError: Metaspace. What is happening mechanically, and how do you fix it?
Frequently Asked Questions
How do I find a memory leak in a Java application without restarting it?
Use jcmd &lt;pid&gt; GC.class_histogram to print a live class instance count — take two snapshots 10 to 15 minutes apart and diff them to find classes whose instance count is growing monotonically. For a full heap analysis without restarting, trigger a dump with jcmd &lt;pid&gt; GC.heap_dump /tmp/dump.hprof and open it in Eclipse MAT. Note that heap dumps cause a full Stop-The-World pause proportional to heap size, so use them sparingly in production. For continuous monitoring without STW cost, enable Java Flight Recorder with jcmd &lt;pid&gt; JFR.start duration=120s filename=recording.jfr and examine the jdk.OldObjectSample events in JDK Mission Control — this runs with under 2% overhead and captures long-lived objects across GC cycles.
Does setting an object to null in Java immediately free its memory?
No. Setting a reference to null removes that specific reference from the reachability graph, but the object's memory is only reclaimed once all references to it are gone and the GC has run. You have no control over when the GC runs or reclaims specific objects. In most code you do not need to null out references explicitly — letting local variables go out of scope naturally removes their references. Explicitly nulling long-lived collection entries or static fields is sometimes necessary to aid the GC, but setting a local variable to null at end-of-method has no practical effect.
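The distinction between a pointless local null-out and a necessary collection clear can be seen in a short sketch (class name and sizes are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class NullingReferences {
    // GC root: objects added here stay reachable until explicitly removed
    private static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest() {
        byte[] buffer = new byte[1024];
        // ... use buffer ...
        buffer = null; // no practical effect: the local would go out of scope anyway
    }

    public static void main(String[] args) {
        handleRequest();
        CACHE.add(new byte[1024]); // reachable via the static field regardless of any locals
        CACHE.clear();             // clearing the long-lived collection is what actually helps the GC
        System.out.println("cache size after clear: " + CACHE.size());
    }
}
```

This prints `cache size after clear: 0` — removing the entry from the GC-rooted list is the operation that makes the array eligible for collection, not any assignment to a local variable.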
What is the difference between a memory leak and an OutOfMemoryError? Are they the same thing?
A memory leak is a cause; OutOfMemoryError is one possible symptom. A memory leak means your application holds references to objects it will never use again, preventing the GC from reclaiming them. An OOM is thrown when the JVM cannot allocate memory for a new object after exhausting the available heap and running a full GC. You can get an OOM without any leak — processing a genuinely enormous dataset or allocating an oversized buffer both cause OOM without retention problems. You can have a slow leak that runs for days or weeks before causing an OOM. Always look for the rising heap baseline after full GC cycles — that pattern distinguishes a leak from simply needing more heap.
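The rising after-GC baseline can be sampled in-process via the standard management API. A hedged sketch (class name is illustrative; old-generation pool names vary by collector, so the regex is an assumption that covers the common ones):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class OldGenBaseline {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Pool names vary by collector: "G1 Old Gen", "PS Old Gen", "Tenured Gen", ...
            if (pool.getType() == MemoryType.HEAP && pool.getName().matches(".*(Old|Tenured).*")) {
                // getCollectionUsage() reports occupancy measured *after* the last GC —
                // exactly the baseline that must not rise between samples. It is null
                // if the pool does not support it or no collection has happened yet.
                MemoryUsage afterGc = pool.getCollectionUsage();
                if (afterGc != null) {
                    System.out.printf("%s after last GC: %d MB used%n",
                            pool.getName(), afterGc.getUsed() / (1024 * 1024));
                }
            }
        }
    }
}
```

Run this periodically (or export it to your metrics system): a healthy service shows a flat after-GC number, while a leaking one shows it climbing monotonically between samples.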
Can a Java memory leak occur in metaspace instead of the heap?
Yes. Metaspace (which replaced PermGen in JDK 8) stores class metadata — the structural information about every class the JVM has loaded. Classloader leaks — where an old classloader is not garbage collected because something outside it references a class it loaded — cause metaspace to grow unboundedly. Each WAR redeploy in a shared container creates a new classloader. If the old one is not collected, its classes remain in metaspace permanently. Monitor metaspace with jcmd &lt;pid&gt; VM.native_memory summary and watch the Class space. Enable -XX:NativeMemoryTracking=summary for detailed native memory accounting. Heap analysis tools like MAT will not show metaspace leaks because metaspace is not part of the Java heap.
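The retention mechanism behind a classloader leak can be sketched in a few lines. This toy example (class name is illustrative) does not load any classes or exhaust metaspace — it only shows the reference pattern: a GC-rooted collection standing in for whatever container-level structure still references the old webapp loader after undeploy:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ClassloaderRetention {
    // Simulates "something outside the webapp references the old classloader":
    // as long as this GC-rooted list holds the loader, every class it defined
    // stays in metaspace — even after the application is "undeployed".
    private static final List<ClassLoader> RETAINED = new ArrayList<>();

    public static void main(String[] args) {
        for (int redeploy = 0; redeploy < 3; redeploy++) {
            ClassLoader webappLoader = new URLClassLoader(new URL[0],
                    ClassloaderRetention.class.getClassLoader());
            RETAINED.add(webappLoader); // the leak: old loaders are never released
        }
        System.out.println("retained classloaders: " + RETAINED.size());
    }
}
```

In a real container the offenders are typically JDBC drivers registered with DriverManager, threads started by the webapp, or ThreadLocals on pool threads — each holds a reference into the old classloader's world and keeps all of its classes alive.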
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.