
Java Memory Leaks Explained: Causes, Detection and Prevention

📍 Part of: Advanced Java → Topic 20 of 28
Java memory leaks explained deeply — static references, listeners, ThreadLocal misuse, heap profiling with VisualVM and MAT.
🔥 Advanced — solid Java foundation required
In this tutorial, you'll learn
  • A Java memory leak is always a reachability problem, not a GC failure — the GC cannot collect an object that any live reference chain touches, even if your code will never use that object again. The GC is working correctly. Your reference is the problem.
  • ThreadLocal in a thread pool is the most dangerous leak pattern in enterprise Java — always call ThreadLocal.remove() in a finally block, or use a framework-level TaskDecorator that does it for you. In newer JDKs, the ScopedValue API (previewed from JDK 21, finalized in JDK 25) provides a safer alternative for per-request context.
  • WeakHashMap silently fails to evict entries when String literals are used as keys because the JVM string pool holds a permanent strong reference to every interned string — use new String(key) or a domain object as the key, or replace WeakHashMap with Caffeine for anything in production.
Quick Answer
  • A Java memory leak is an object reachable from a GC root but logically dead — GC cannot read your intent, only your references
  • GC uses reachability analysis, not reference counting — any object on a live reference chain stays in memory forever regardless of whether your code will ever touch it again
  • The six classic patterns: unbounded static collections, un-deregistered listeners, non-static inner classes, ThreadLocal in pools, mutated HashMap keys, classloader leaks
  • Leaks in old generation are silent — they survive minor GCs and grow slowly until OOM, often hours or days after the leak began
  • The biggest production trap: ThreadLocal.set() without .remove() in thread pools — the value stays pinned to the thread for its entire lifetime
  • Biggest mistake: assuming GC prevents leaks. GC prevents unreachable objects from staying. A leaked object is, by definition, reachable.
🚨 START HERE
Java Memory Leak Triage Cheat Sheet
First-response commands when an OOM alert fires. No theory — just actions.
🔴OutOfMemoryError in logs, service restarted.
Immediate Action: Check for a heap dump file before doing anything else — evidence disappears on restart.
Commands
ls -lh /var/log/myapp/heapdumps/*.hprof
jcmd <pid> GC.class_histogram | head -20
Fix Now: If no dump exists, restart with: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/myapp/heapdumps/ — make this permanent in your JVM startup flags, not a one-time addition.
🟡Heap usage trending upward over hours or days.
Immediate Action: Take two class histogram snapshots and diff them to find the growing class.
Commands
jcmd <pid> GC.class_histogram > /tmp/histo1.txt
sleep 600 && jcmd <pid> GC.class_histogram > /tmp/histo2.txt
Fix Now: diff /tmp/histo1.txt /tmp/histo2.txt | grep '>' | head -10 — classes whose instance count is growing are your leak suspects. Sort by instance count delta.
🟠High CPU with frequent full GC pauses.
Immediate Action: Check GC overhead and old generation occupancy to confirm the leak pattern.
Commands
jstat -gcutil <pid> 1000 10
jcmd <pid> VM.flags | grep -i 'heapdump\|gc'
Fix Now: If Old/O stays above 90% constantly after full GC, the leak is confirmed. Trigger a heap dump immediately: jcmd <pid> GC.heap_dump /tmp/dump-$(date +%Y%m%d-%H%M%S).hprof
🟡Suspect ThreadLocal leak — pool threads with large retained heap.
Immediate Action: Get a thread dump and inspect ThreadLocal maps before the service is restarted.
Commands
jcmd <pid> Thread.print > /tmp/threaddump.txt
jcmd <pid> GC.class_histogram | grep -i 'threadlocal\|ThreadLocalMap'
Fix Now: Search the codebase for ThreadLocal.set() without a matching .remove() in a finally block — grep -rn 'ThreadLocal.*set' src/ and verify every result has a corresponding remove() in a finally block.
Production Incident: Payment Service OOM During Black Friday Peak
A payment processing service using ThreadLocal for per-request transaction context leaked 3.2GB over 18 hours, causing cascading failures across dependent services during peak traffic.
Symptom: Service restarted with java.lang.OutOfMemoryError: Java heap space at 2:47 AM. Monitoring showed heap growing approximately 180MB per hour even after restart — so the restart was no fix; the leak mechanism was still in the deployed build. GC logs showed the old generation baseline rising from 1.1GB to 2.8GB between full GC cycles. A thread dump revealed 4 pool threads each holding approximately 800MB in ThreadLocal maps.
Assumption: Team initially blamed the new payment gateway integration deployed 3 days prior. They rolled back the deployment, but the leak continued unchanged. Second assumption was a regression in the G1 collector — the team upgraded the JDK patch version. The leak persisted at the same rate. Both assumptions cost four hours of incident time.
Root cause: A custom ThreadPoolTaskExecutor with 4 core threads and 16 maximum handled payment processing. Each task set a ThreadLocal containing a PaymentContext object approximately 500KB in size — holding the full transaction payload, fraud check results, and audit trail. The finally block called a cleanup method that cleared the context fields but never called ThreadLocal.remove(). The ThreadLocal entry itself — and the empty-but-still-allocated PaymentContext object — remained pinned to the thread's ThreadLocalMap. Over 18 hours of processing approximately 2,000 tasks per hour, the 4 pool threads accumulated stale contexts that the GC could not touch because the threads themselves were GC roots. The fix was a single missing line.
Fix:
1. Added REQUEST_CONTEXT.remove() in the finally block of every Runnable submitted to the executor — the immediate one-line fix.
2. Registered a TaskDecorator on the ThreadPoolTaskExecutor to enforce cleanup at the framework level, so individual task authors cannot forget it.
3. Added a custom Micrometer gauge monitoring the ThreadLocal map size per thread, exposing the metric on the production dashboard so future accumulation is visible before it becomes an incident.
4. Enabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/myapp/heapdumps/ for all production JVMs — if it happens again, evidence is captured automatically.
5. Added a code review checklist item and a custom ArchUnit rule: 'Every ThreadLocal.set() must have a corresponding remove() in a finally block.'
Key Lesson
  • ThreadLocal in a thread pool is the single most dangerous leak pattern in enterprise Java — threads are GC roots, and their ThreadLocalMaps are unreachable from outside without heap analysis tools
  • Clearing fields inside a ThreadLocal value does not remove the ThreadLocal entry itself — you must call remove() on the ThreadLocal object, not just null out fields on the value
  • Framework-level enforcement via TaskDecorator is more reliable than per-developer discipline — if cleanup can be forgotten, it will eventually be forgotten
  • A leak that took 18 hours to manifest will take 18 hours to reproduce without a heap dump — capture evidence before taking any other action
  • Rolling back code changes is useless if the leak lives in a long-lived component like a thread pool that persists across deployments — identify the mechanism, not just the deployment
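The framework-level enforcement behind fix #2 can be sketched in plain Java, without Spring. The class and method names below are illustrative, not from the incident's codebase; the point is that a single decorator guarantees remove() runs after every task, even one that throws:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: mirrors what a Spring TaskDecorator would do,
// but as a plain wrapper you can apply to any executor.
public class ThreadLocalCleanupDecorator {

    static final ThreadLocal<String> REQUEST_CONTEXT = new ThreadLocal<>();

    /** Wraps a task so REQUEST_CONTEXT.remove() runs even if the task throws. */
    static Runnable withCleanup(Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                REQUEST_CONTEXT.remove(); // enforced once, here, for every task
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Single-thread pool: both tasks reuse the same pooled thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(withCleanup(() -> REQUEST_CONTEXT.set("txn-42")));
        // Next task on the same thread: the context must already be gone.
        pool.submit(withCleanup(() ->
            System.out.println("leftover context: " + REQUEST_CONTEXT.get()))); // prints: leftover context: null
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Because the decorator owns the finally block, individual task authors cannot forget the cleanup — the failure mode moves from "every task" to "one wrapper".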
Production Debug Guide
Systematic debugging path for production OOM incidents.
OutOfMemoryError in logs, service restarted automatically.
→ Check if -XX:+HeapDumpOnOutOfMemoryError was enabled before doing anything else. If a .hprof file exists at the configured path, download it immediately before the disk is cleared. If no dump exists, you need to reproduce the leak under load and capture evidence on the next occurrence — do not restart repeatedly without enabling the flag first.
Heap usage trending upward over hours but no OOM yet.
→ Take two class histograms 10 to 15 minutes apart using jcmd GC.class_histogram. Compare instance counts between the two snapshots. Look for classes whose instance count grows monotonically — that is your leak class. The difference between snapshots is more informative than either snapshot alone.
You have a heap dump (.hprof file).
→ Open it in Eclipse MAT. Run the Leak Suspects Report immediately — it automates the initial hunt and identifies the largest retained heaps with the reference chains keeping them alive. Then inspect the Dominator Tree to find which single object retains the most memory. Follow the GC root path from that object to identify what is keeping the entire chain alive: static field, ThreadLocal, listener, or something else.
MAT shows a large retained heap under a static field or collection.
→ Trace the reference chain from the GC root. Identify which specific static field holds the reference. Check whether the collection has an eviction policy — any Map over 10,000 entries without a TTL or size limit is a strong leak candidate. In MAT, use OQL: SELECT * FROM java.util.HashMap h WHERE h.size > 10000 to find all large maps in one query.
Old generation occupancy stays above 80% after full GC.
→ This confirms a leak in long-lived objects. Use jstat -gcutil <pid> 1000 30 to monitor old generation occupancy over 30 seconds. If it never drops below a rising baseline after full GC, objects are being promoted to old gen and never collected. A heap dump is now required — the histogram shows what is there, but not why it is being held.
Class histogram shows many instances of domain objects but no obvious collection holding them.
→ Check for ThreadLocal leaks. Use jcmd <pid> Thread.print to get a thread dump and look for long-lived pool threads. In MAT, search for java.lang.ThreadLocal$ThreadLocalMap entries — each thread has one, and their size is directly visible. If thread-local maps are large relative to the number of pool threads, you have a ThreadLocal leak. Grep the codebase for ThreadLocal.set() calls and verify every one has a matching remove() in a finally block.

Memory leaks in Java are reference management failures, not garbage collector failures. The GC works on reachability — if any live reference chain touches an object, it stays in memory regardless of whether your code will ever use it again. The GC is faithfully doing what the specification says. The problem is yours.

In production, leaks manifest as a rising heap baseline after each GC cycle. The old generation creeps upward after every collection. The floor rises. This pattern often goes unnoticed in staging because test load profiles rarely exercise the accumulation over hours or days that production traffic does — a leak that takes 18 hours to OOM a production service may never surface in a 5-minute load test.

The core challenge is that the JVM cannot distinguish intentional caching from unintentional retention. Only disciplined lifecycle management, defensive coding patterns, and the right monitoring setup can prevent leaks from reaching production — and when they do, the right tooling makes the difference between a two-hour diagnosis and a two-day investigation.

In 2026, with JDK 21 and virtual threads mainstream in enterprise codebases, the ThreadLocal leak pattern has become even more consequential. Virtual threads interact with ThreadLocal in ways that can amplify existing leaks, and the scoped values API (JEP 446, previewed from JDK 21 and finalized as JEP 506 in JDK 25) provides a safer alternative for per-request context propagation. Understanding the foundational leak mechanisms is the prerequisite for understanding why those newer APIs exist.

How the JVM Garbage Collector Actually Decides What to Free

Before you can understand why leaks happen, you need a clear picture of how the GC decides what to keep. The JVM uses reachability analysis, not reference counting. CPython, for contrast, relies primarily on reference counting — every object tracks how many references point to it, and when that count drops to zero the object is freed (a separate cycle detector handles what counting misses). Java avoids reference counting entirely because, on its own, it cannot handle circular references: if object A references object B and object B references object A, both counts stay non-zero even if nothing else in the program uses either object. Under pure reference counting they would leak forever.

Reachability analysis solves this. The GC starts from a fixed set of root references — local variables on thread stacks, static fields, JNI references, and a few others — and walks the entire object graph from those roots. Any object reachable by following references from a root is considered live and is kept. Everything unreachable — including entire cycles of objects that reference only each other — is eligible for collection.
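A minimal sketch makes the cycle case concrete. Two objects that reference only each other are still collected once no GC root reaches them; because System.gc() is only a hint, the demo polls under allocation pressure rather than asserting a single collection (class names are illustrative):

```java
import java.lang.ref.WeakReference;

// Sketch: reachability analysis collects a reference cycle that pure
// reference counting could never free.
public class CycleCollectionDemo {

    static class Node {
        Node partner;                           // forms the cycle
        byte[] payload = new byte[1024 * 1024]; // makes the cycle visible on heap
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        a.partner = b;
        b.partner = a;          // a <-> b: both "ref counts" would stay non-zero

        WeakReference<Node> probe = new WeakReference<>(a);
        a = null;
        b = null;               // no root reaches the cycle any more

        // Poll: System.gc() is a hint, so retry with some allocation pressure.
        for (int i = 0; i < 50 && probe.get() != null; i++) {
            System.gc();
            byte[] pressure = new byte[1024 * 1024];
            pressure[0] = 1;
        }
        System.out.println("cycle collected: " + (probe.get() == null));
    }
}
```

On a typical HotSpot run this prints `cycle collected: true` within the first few polls — the whole cycle dies in one pass because the GC starts from roots, not from objects.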

This is why a memory leak in Java is always a reference problem, not a GC problem. If you have a static List that accumulates objects, every object in that list is reachable from a GC root (the static field), so nothing gets collected — ever. The GC is behaving correctly. The leak is your reference.

Modern JVMs split the heap into regions and collect high-churn areas more aggressively. G1 uses a mix of young-generation regions collected frequently with short pauses and old-generation regions collected infrequently with longer pauses. ZGC and Shenandoah perform concurrent marking and compaction with sub-millisecond pause goals. But no collector can save you from long-lived references. An object that survives enough minor GCs gets promoted to old generation, and a leak in old generation grows silently — the old-gen baseline rises after every collection cycle until you hit OutOfMemoryError: Java heap space, often hours or days after the first leaked object was created.

The performance impact compounds: each full GC must mark and scan the entire live set in old generation. As the leaked set grows to millions of objects, full GC pause time grows proportionally. With G1, you will see increasing allocation failure GCs and eventually full GCs that pause the application for seconds. The leak does not just waste memory — it degrades GC performance, which degrades application latency, which is often the first observable symptom before memory exhaustion.

io/thecodeforge/memory/ReachabilityDemo.java · JAVA
package io.thecodeforge.memory;

import java.util.ArrayList;
import java.util.List;

/**
 * Demonstrates the difference between an object being logically 'done'
 * and being GC-eligible. The GC cannot read intent — only references.
 *
 * Run with: java -Xmx64m io.thecodeforge.memory.ReachabilityDemo
 * You will see OOM in roughly 60 iterations with a 64MB heap.
 */
public class ReachabilityDemo {

    /**
     * Static field == GC root.
     * Anything added here is permanently reachable and NEVER collected.
     * The GC sees a live reference chain: ReachabilityDemo class → CACHE → each byte[].
     * It has no way to know that our business logic is done with those arrays.
     */
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Simulating a static-field memory leak...");
        System.out.println("Watch the heap baseline rise — it never drops.");
        System.out.println();

        for (int iteration = 0; iteration < 1000; iteration++) {
            // Each byte array is 1 MB. Business logic is 'done' with it
            // after this method call, but CACHE still holds a reference.
            byte[] oneMegabyte = new byte[1024 * 1024];
            oneMegabyte[0] = (byte) iteration; // use it so the compiler keeps it

            CACHE.add(oneMegabyte); // <-- this is the leak

            // The GC runs frequently but the old-gen baseline never drops.
            // Each iteration pushes the floor higher by 1MB.

            System.out.printf(
                "Iteration %3d | CACHE entries: %3d | Heap used: %3d MB%n",
                iteration,
                CACHE.size(),
                (Runtime.getRuntime().totalMemory()
                    - Runtime.getRuntime().freeMemory()) / (1024 * 1024)
            );

            Thread.sleep(50);
        }
    }
}
▶ Output
Simulating a static-field memory leak...
Watch the heap baseline rise — it never drops.

Iteration 0 | CACHE entries: 1 | Heap used: 3 MB
Iteration 1 | CACHE entries: 2 | Heap used: 4 MB
Iteration 10 | CACHE entries: 11 | Heap used: 14 MB
Iteration 50 | CACHE entries: 51 | Heap used: 54 MB
Iteration 60 | CACHE entries: 61 | Heap used: 64 MB
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
Mental Model
GC Roots as Anchors
Why does the JVM use reachability analysis instead of reference counting?
  • Reference counting cannot handle circular references — A references B, B references A, both counts are 1, both leak forever
  • Reachability analysis collects entire dead object cycles in one pass by starting from roots, not from objects
  • The GC roots are: thread stacks (local variables), static fields, JNI references, and a few JVM internals
  • Any object not reachable from a root is dead to the GC — whether or not it is logically dead to your code is irrelevant to the GC
📊 Production Insight
Monitor jstat -gcutil pid 1000 for Old/O (old generation occupancy). A healthy application shows a sawtooth: usage climbs, GC runs, occupancy drops back to roughly the same floor. A leaking application shows that floor rising after every cycle. That rising floor is your definitive early warning — a small, rising baseline now will become an OOM in hours or days.
With G1 on JDK 17+, enable -Xlog:gc*:file=gc.log:time,uptime and inspect the after-full-GC heap size from the logs. If it grows monotonically across collections, you have a leak in old generation.
Rule: set up old-gen occupancy alerting at 70% and 85%. 70% is the investigation threshold. 85% is the escalation threshold. Waiting for OOM means waiting for a 2 AM page.
🎯 Key Takeaway
GC is a reachability engine, not a leak prevention system. It faithfully preserves all reachable objects and collects nothing it can reach. If your code creates a permanent reference chain from a GC root to an object, the GC is powerless. Monitor old generation baseline across full GC cycles — a rising floor is the definitive leak signature and your only early warning before OOM.
Is This a Leak or Just High Memory Usage?
If: Heap baseline drops back to roughly the same floor after each full GC
Then: Not a leak — you may need a larger heap, or your workload has genuine high memory requirements. Profile allocation patterns before increasing heap.
If: Heap baseline rises after each full GC cycle — the floor keeps moving up
Then: Leak confirmed. Get a heap dump and analyze it with MAT. This rising-floor pattern is the definitive signature.
If: Old generation occupancy stays above 90% even immediately after full GC
Then: Severe leak — OOM is likely within hours. Trigger a heap dump now: jcmd <pid> GC.heap_dump /tmp/dump.hprof. Do not wait for the automatic dump.
If: Young generation fills rapidly but old generation is stable at a consistent level
Then: Not a leak — short-lived objects are being allocated faster than minor GC can collect them. Tune young generation size or review allocation hotspots with JFR.
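The rising-floor check can also be done in-process. The sketch below (class name illustrative) reads old-generation occupancy through the standard MemoryPoolMXBean API — the same signal jstat -gcutil reports in its O column. Pool names vary by collector ("G1 Old Gen", "PS Old Gen", "Tenured Gen"), so the match is deliberately loose:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Sketch: in-process old-gen occupancy, suitable for a custom health
// check or a Micrometer gauge feeding the 70%/85% alerts.
public class OldGenMonitor {

    /** Old-gen used as a percentage of max, or -1 if no old-gen pool is exposed. */
    static double oldGenOccupancyPercent() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Old Gen") || name.contains("Tenured")) {
                MemoryUsage usage = pool.getUsage();
                long max = usage.getMax();
                if (max <= 0) max = usage.getCommitted(); // max can be undefined (-1)
                if (max <= 0) return -1;
                return 100.0 * usage.getUsed() / max;
            }
        }
        return -1; // e.g. ZGC exposes a single unified heap pool instead
    }

    public static void main(String[] args) {
        System.out.printf("old-gen occupancy: %.1f%%%n", oldGenOccupancyPercent());
    }
}
```

Sampled once per minute, this value gives you the rising-floor trend without shelling out to jstat — alert when the post-GC reading keeps climbing.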

The Six Classic Java Memory Leak Patterns (With Real Code)

Every Java memory leak in production falls into one of six categories. Knowing them by name means you can spot them in code review in seconds and ask the right questions in a heap dump in minutes.

Pattern 1 — Unbounded Static Collections: A static field grows without any removal or eviction strategy. Because static fields are GC roots, every object in the collection is permanently reachable. This is the simplest leak to understand and one of the easiest to introduce — any utility cache implemented as a static HashMap without size bounds qualifies.

Pattern 2 — Listener or Observer Not Deregistered: You add an event listener to a button, a JMX MBeanServer, an application event bus, or any other publisher. When the subscriber is logically done, nobody calls removeListener. The publisher's internal list holds a reference to the subscriber, keeping the entire object graph rooted at that subscriber alive indefinitely.
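A minimal sketch of Pattern 2, with hypothetical EventBus and PriceListener names: the publisher's internal list is the only thing keeping the subscriber alive, and removeListener is the half that gets forgotten.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of Pattern 2: the publisher's list pins subscribers until
// someone remembers to deregister them.
public class ListenerLeakDemo {

    interface PriceListener { void onPrice(double price); }

    static class EventBus {
        private final List<PriceListener> listeners = new CopyOnWriteArrayList<>();
        void addListener(PriceListener l)    { listeners.add(l); }
        void removeListener(PriceListener l) { listeners.remove(l); } // the often-missing half
        int listenerCount()                  { return listeners.size(); }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();       // long-lived, effectively rooted
        PriceListener screen = price -> { }; // logically short-lived subscriber
        bus.addListener(screen);

        // Without this line, `screen` — and everything it captures — stays
        // reachable via bus.listeners forever. Pattern 2 in one line:
        bus.removeListener(screen);

        System.out.println("listeners still registered: " + bus.listenerCount()); // prints 0
    }
}
```

In Spring code, the removeListener call belongs in a @PreDestroy or DisposableBean method so the subscriber's lifecycle, not developer memory, drives the cleanup.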

Pattern 3 — Non-Static Inner Classes and Anonymous Classes: Every non-static inner class in Java holds an implicit reference to its enclosing outer instance. If you hand that inner class to a long-lived component — a thread pool, a static cache, an executor service — the outer instance is pinned in memory for as long as that component lives. This is invisible at the call site and very common with anonymous Runnable and Callable implementations.
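The implicit reference is real and observable: javac emits a synthetic this$0 field on the inner class pointing at the enclosing instance. The sketch below (illustrative names) surfaces that field via reflection — note that recent javac versions may omit the capture when the anonymous class never touches outer state, so the demo touches it deliberately:

```java
import java.lang.reflect.Field;

// Sketch of Pattern 3: an anonymous Runnable that uses outer state carries a
// hidden this$0 reference to the enclosing instance; a lambda that captures
// nothing does not.
public class InnerClassCaptureDemo {

    private final byte[] heavyState = new byte[1024 * 1024]; // what gets pinned

    Runnable anonymous() {
        return new Runnable() {                  // anonymous inner class
            @Override public void run() {
                // touching outer state forces capture of the enclosing instance
                if (heavyState.length == 0) throw new IllegalStateException();
            }
        };
    }

    Runnable lambda() {
        return () -> { };                        // captures nothing — no outer reference
    }

    /** Checks for the compiler-generated this$0 field on the runnable's class. */
    static boolean capturesOuter(Runnable r) {
        for (Field f : r.getClass().getDeclaredFields()) {
            if (f.getName().startsWith("this$")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        InnerClassCaptureDemo outer = new InnerClassCaptureDemo();
        System.out.println("anonymous captures outer: " + capturesOuter(outer.anonymous()));
        System.out.println("lambda captures outer:    " + capturesOuter(outer.lambda()));
    }
}
```

Hand the anonymous Runnable to a long-lived executor and `outer` — heavyState and all — is pinned for the thread's lifetime; the lambda (or a static nested class) is not.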

Pattern 4 — ThreadLocal Variables in Thread Pools: The most dangerous pattern in enterprise code. ThreadLocal values live in a ThreadLocalMap on the Thread object itself. In a thread pool, threads are reused and never die. If you call ThreadLocal.set() and never call ThreadLocal.remove(), that value — and the full object graph it references — lives as long as the thread does, which in a pool is the lifetime of the application.

Pattern 5 — Mutable Objects as HashMap Keys: Objects used as HashMap keys that are mutated after insertion can become orphaned in the map. The hashCode() changes, the object is in the wrong bucket, and get() returns null even though the entry exists. It is consuming memory but unreachable via normal Map operations. Over time, orphaned entries fill the map.
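A minimal sketch of Pattern 5 (CacheKey is an illustrative name): after the key mutates, neither the mutated key nor an equal copy of the original can find the entry, yet the map still holds it.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of Pattern 5: mutating a key after insertion orphans the entry —
// it consumes memory but is unreachable via normal Map operations.
public class MutatedKeyDemo {

    static class CacheKey {
        int id;                               // mutable — the bug
        CacheKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof CacheKey k && k.id == id;
        }
        @Override public int hashCode() { return id; }
    }

    public static void main(String[] args) {
        Map<CacheKey, String> cache = new HashMap<>();
        CacheKey key = new CacheKey(1);
        cache.put(key, "session-data");

        key.id = 2; // hashCode changes: the entry now sits in the wrong bucket

        System.out.println("get(mutated key):     " + cache.get(key));             // null
        System.out.println("get(new CacheKey(1)): " + cache.get(new CacheKey(1))); // null
        System.out.println("entries still held:   " + cache.size());              // 1
    }
}
```

The fix is to make key fields final, or to base equals/hashCode only on fields that never change after construction — a Java record is the idiomatic way to get both for free.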

Pattern 6 — Classloader Leaks in Application Servers: Redeploying a web application creates a new classloader. If any JVM-wide component — a JDBC driver, a logging framework, a static thread — holds a reference to a class from the old classloader, the entire old classloader and every class it loaded stays in metaspace. Each redeploy leaks one classloader. After enough redeploys, metaspace exhaustion causes OOM: Metaspace.

For Pattern 6 specifically in 2026 Kubernetes environments: if your JDBC driver registers a static singleton with DriverManager and your application is deployed as a WAR to a shared Tomcat, each undeploy and redeploy leaks the old classloader (~50 to 200MB per cycle). After 10 redeploys, the container is out of metaspace. This is why Spring Boot's embedded server model was partially motivated — isolating the classloader lifecycle with the application process avoids this class of leak entirely.
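For WAR deployments that cannot move to an embedded server, the standard mitigation is to deregister drivers on undeploy. A hedged sketch — in a real webapp you would call this from ServletContextListener.contextDestroyed(), and the class name is illustrative:

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Enumeration;

// Sketch of the Pattern 6 mitigation: deregister the JDBC drivers this
// webapp's classloader loaded, so the classloader can be collected on undeploy.
public class JdbcDriverCleanup {

    static int deregisterDriversFrom(ClassLoader webappLoader) {
        int deregistered = 0;
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            // Only touch drivers our own classloader loaded — drivers registered
            // by the container or a parent loader must be left alone.
            if (driver.getClass().getClassLoader() == webappLoader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    deregistered++;
                } catch (SQLException e) {
                    System.err.println("failed to deregister " + driver + ": " + e);
                }
            }
        }
        return deregistered;
    }

    public static void main(String[] args) {
        int n = deregisterDriversFrom(JdbcDriverCleanup.class.getClassLoader());
        System.out.println("drivers deregistered: " + n);
    }
}
```

Driver deregistration alone is not sufficient — application-owned thread pools and static caches must also be shut down and cleared on undeploy — but the DriverManager reference is the most common single pin.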

io/thecodeforge/memory/ThreadLocalLeakDemo.java · JAVA
package io.thecodeforge.memory;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Demonstrates a ThreadLocal memory leak inside a fixed thread pool.
 *
 * The pool reuses threads. ThreadLocal values set in one task remain
 * in the thread's ThreadLocalMap when the next task starts on the same thread.
 *
 * Run with: java -Xmx128m io.thecodeforge.memory.ThreadLocalLeakDemo
 * Watch heap climb with the fix commented out.
 * Uncomment REQUEST_CONTEXT.remove() to see heap stay flat.
 */
public class ThreadLocalLeakDemo {

    /**
     * ThreadLocal is NOT a variable. It is a key into a hidden
     * Map<ThreadLocal, Object> that lives on each Thread object.
     * When the Thread dies, the map is collected.
     * In a thread pool, threads never die — so neither do forgotten entries.
     */
    private static final ThreadLocal<byte[]> REQUEST_CONTEXT = new ThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        // 4 threads — they live forever, accumulating stale ThreadLocal values
        ExecutorService threadPool = Executors.newFixedThreadPool(4);

        for (int taskNumber = 0; taskNumber < 500; taskNumber++) {
            final int currentTask = taskNumber;

            threadPool.submit(() -> {
                try {
                    // Simulating per-request data: user session, trace context,
                    // transaction payload — anything you would not want pinned
                    // to the thread after the request is done.
                    byte[] requestPayload = new byte[500 * 1024]; // 500 KB
                    requestPayload[0] = (byte) currentTask;
                    REQUEST_CONTEXT.set(requestPayload);

                    processRequest(currentTask);

                    // Without remove(), requestPayload stays in this thread's
                    // ThreadLocalMap after the task completes. The thread goes
                    // back to the pool. The reference is never cleared.
                    // 4 threads × accumulating entries = unbounded growth.

                } finally {
                    // THE FIX: uncomment this line to prevent the leak.
                    // This is the single most important line in thread pool code
                    // that uses ThreadLocal.
                    // REQUEST_CONTEXT.remove();
                }
            });

            if (currentTask % 50 == 0) {
                long usedHeapMB = (Runtime.getRuntime().totalMemory()
                    - Runtime.getRuntime().freeMemory()) / (1024 * 1024);
                System.out.printf(
                    "Tasks submitted: %3d | Heap used: ~%d MB%n",
                    currentTask, usedHeapMB
                );
            }
        }

        threadPool.shutdown();
        threadPool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("Done. Check heap trend above.");
    }

    private static void processRequest(int taskNumber) {
        byte[] context = REQUEST_CONTEXT.get();
        // In production: this might be a UserSession, PaymentContext,
        // MDC trace ID, or any per-request state.
        System.out.printf("Task %3d on thread: %s | context bytes: %d%n",
            taskNumber,
            Thread.currentThread().getName(),
            context != null ? context.length : 0
        );
    }
}
▶ Output
Tasks submitted: 0 | Heap used: ~8 MB
Task 0 on thread: pool-1-thread-1 | context bytes: 512000
Task 1 on thread: pool-1-thread-2 | context bytes: 512000
...
Tasks submitted: 50 | Heap used: ~31 MB
Tasks submitted: 100 | Heap used: ~55 MB
Tasks submitted: 150 | Heap used: ~79 MB
Tasks submitted: 200 | Heap used: ~104 MB

--- With REQUEST_CONTEXT.remove() uncommented ---
Tasks submitted: 50 | Heap used: ~9 MB
Tasks submitted: 100 | Heap used: ~9 MB
Tasks submitted: 200 | Heap used: ~9 MB <-- flat. No leak.
Mental Model
ThreadLocal as a Hidden Map Inside Each Thread
Why is a ThreadLocal leak more dangerous than a static collection leak?
  • The leak is per-thread — a 4-thread pool means 4 independent accumulation points, not 1
  • The ThreadLocalMap is a field on each Thread object, and live threads are GC roots — the leak is invisible from the object graph perspective without thread-specific heap analysis
  • Clearing fields inside the ThreadLocal value does NOT remove the ThreadLocal entry — the entry stays in the ThreadLocalMap even with all fields set to null
  • In JDK 21 with virtual threads, ThreadLocal semantics are preserved, but ScopedValue (JEP 446, finalized in JDK 25) provides a safer alternative for per-request context that is automatically cleaned up when the scope exits
📊 Production Insight
Pattern 3 — non-static inner class as Runnable — is the most invisible because the implicit outer reference is nowhere in the source code. An anonymous Runnable written inside a request handler implicitly captures the handler instance. If that Runnable is submitted to a cached thread pool, the handler and everything it references (DB connection wrappers, session objects, large payloads) is pinned in memory for the lifetime of the thread. The fix is one word: make the inner class static, or extract it to a top-level class. But you have to know to look for it — the heap dump shows it, but the code looks completely innocent.
In JDK 21 to 23 codebases using virtual threads: blocking inside a synchronized block pins the virtual thread to its carrier for the duration of the lock hold (JEP 491 removed this limitation in JDK 24), and pinning while holding a large ThreadLocal amplifies the leak surface under contention. The ScopedValue API avoids both ThreadLocal retention and the pinning-prone pattern in new code.
🎯 Key Takeaway
The six patterns are your code review checklist. Grep for: static Map or List without eviction, addListener without removeListener, new ThreadLocal without remove(), non-static anonymous inner class submitted to an executor, HashMap keys with mutable state used in hashCode, and JDBC driver registration without deregistration lifecycle. Prevention is orders of magnitude cheaper than diagnosis — a single missing remove() can cost hours of debugging and thousands in incident response.
Which Leak Pattern Am I Dealing With?
If: Class histogram shows a Map or List growing without bounds — size increases with every snapshot
Then: Pattern 1: Unbounded Static Collection. Add an eviction strategy or replace with Caffeine with maximumSize and expireAfterWrite.
If: Heap dump shows a publisher (EventBus, MBeanServer, UI component) holding references to subscriber objects that should have been cleaned up
Then: Pattern 2: Listener Not Deregistered. Implement a cleanup lifecycle method, DisposableBean, or @PreDestroy that calls removeListener.
If: Heap dump shows a request handler or controller instance retained by an anonymous Runnable or Callable in a thread pool work queue
Then: Pattern 3: Non-Static Inner Class. Make the inner class static or extract it to a top-level class — the implicit outer reference must be eliminated.
If: Thread dump or heap dump shows pool threads with large ThreadLocalMap entries — each thread holds significant retained heap
Then: Pattern 4: ThreadLocal in Thread Pool. Add ThreadLocal.remove() in a finally block, and add a TaskDecorator for framework-level enforcement.
If: HashMap size grows monotonically but get() returns null for keys you know were inserted
Then: Pattern 5: Mutated HashMap Key. Make keys immutable or override equals/hashCode to depend only on immutable fields.
If: Metaspace grows with each application redeploy and old classloaders appear in heap dumps
Then: Pattern 6: Classloader Leak. Deregister JDBC drivers in contextDestroyed(), shut down application-owned thread pools, and clear static references on undeploy.

WeakReference, SoftReference and the Right Way to Build a Cache

Java provides four reference strengths, and choosing the right one is how you build caches that release memory correctly under pressure instead of growing without bounds.

A Strong reference is your normal Object obj = new Object(). The GC will never collect the referent while any strong reference to it exists. This is what every variable assignment creates by default.

A SoftReference tells the GC: keep this if you have the memory, but clear it before throwing OutOfMemoryError. The JVM guarantees that all soft references are cleared before an OOM is thrown. This makes SoftReference suitable for memory-sensitive caches where the cached value is expensive to recompute and you want to keep it as long as possible.

A WeakReference tells the GC: collect this whenever you want — I don't need it to survive a GC cycle. WeakHashMap uses this internally: if the key has no strong references outside the map, the entry is automatically removed. This is ideal for metadata caches where the cache entry's lifecycle should be bound to the key object's lifecycle.

A PhantomReference is for post-mortem cleanup. You get a notification via a ReferenceQueue after the object is enqueued for collection. Used for cleaning up native resources (off-heap memory, file handles) as a safer, more predictable alternative to finalize().
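Rather than wiring PhantomReference and ReferenceQueue by hand, modern code uses java.lang.ref.Cleaner (JDK 9+), which wraps that machinery. A sketch with illustrative names — the crucial rule is that the cleanup action must not reference the resource object itself, or it would keep the object reachable forever:

```java
import java.lang.ref.Cleaner;

// Sketch: Cleaner is the supported PhantomReference-based mechanism for
// post-mortem cleanup of native resources, replacing finalize().
public class CleanerDemo {

    private static final Cleaner CLEANER = Cleaner.create();

    static class NativeBuffer implements AutoCloseable {
        // The action holds only the native handle — never the NativeBuffer,
        // which would pin the object and defeat the whole mechanism.
        private record ReleaseAction(long handle) implements Runnable {
            @Override public void run() {
                System.out.println("released native handle " + handle);
            }
        }

        private final Cleaner.Cleanable cleanable;

        NativeBuffer(long handle) {
            this.cleanable = CLEANER.register(this, new ReleaseAction(handle));
        }

        @Override public void close() {
            cleanable.clean(); // deterministic path; runs the action at most once
        }
    }

    public static void main(String[] args) {
        try (NativeBuffer buf = new NativeBuffer(42)) {
            // use the buffer...
        } // close() releases now; the Cleaner is only the safety net for leaks
    }
}
```

Treat the Cleaner as a backstop, not the primary release path — try-with-resources gives deterministic cleanup, and the Cleaner catches the cases where close() was forgotten.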

The production reality is that SoftReference-based caches have non-deterministic eviction timing that can cause thundering herd cache misses under sudden memory pressure. WeakHashMap has subtle failure modes with interned String keys and is not thread-safe. For any production cache, use Caffeine — it implements Window TinyLFU eviction, is fully thread-safe, provides statistics, integrates with Spring Boot's caching abstraction, and outperforms hand-rolled reference queues in every benchmark that matters. WeakHashMap and SoftReference are important to understand because they are the foundation, but Caffeine is what you deploy.

io/thecodeforge/memory/ReferenceTypesDemo.java · JAVA
package io.thecodeforge.memory;

import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;

/**
 * Side-by-side comparison of Strong, Soft, and Weak references.
 * Shows WeakHashMap auto-eviction and the String literal gotcha.
 *
 * Run with: java -Xmx32m io.thecodeforge.memory.ReferenceTypesDemo
 */
public class ReferenceTypesDemo {

    public static void main(String[] args) throws InterruptedException {
        demonstrateSoftReference();
        demonstrateWeakHashMap();
        demonstrateStringLiteralGotcha();
    }

    private static void demonstrateSoftReference() {
        System.out.println("=== SoftReference: cleared only under memory pressure ===");

        byte[] expensiveData = new byte[10 * 1024 * 1024]; // 10 MB
        SoftReference<byte[]> softCache = new SoftReference<>(expensiveData);

        // Drop the strong reference — only the soft reference holds the data now
        expensiveData = null;

        System.out.println("Before pressure — data available: "
            + (softCache.get() != null)); // true

        // Simulate memory pressure — forces GC to consider clearing soft refs
        try {
            byte[] pressureBlock = new byte[25 * 1024 * 1024];
            System.out.println("Allocated pressure block: " + pressureBlock.length + " bytes");
        } catch (OutOfMemoryError oom) {
            System.out.println("OOM — JVM cleared soft refs before throwing");
        }

        System.out.println("After pressure  — data available: "
            + (softCache.get() != null)); // likely false
        System.out.println();
    }

    private static void demonstrateWeakHashMap() throws InterruptedException {
        System.out.println("=== WeakHashMap: entry lives only as long as the key ===");

        WeakHashMap<Object, String> metadataCache = new WeakHashMap<>();

        // Use Object as key — a domain object that has no other references.
        // This simulates caching metadata about a parsed AST node.
        Object sessionKey = new Object();
        metadataCache.put(sessionKey, "{ role: admin, locale: en-US }");

        System.out.println("Cache size before GC: " + metadataCache.size()); // 1

        // Drop the only strong reference to the key
        sessionKey = null;

        System.gc();
        Thread.sleep(200); // give GC time to run

        // WeakHashMap expunges stale entries on the next map operation
        System.out.println("Cache size after GC:   " + metadataCache.size()); // 0
        System.out.println("Entry auto-evicted — no manual removal needed!");
        System.out.println();
    }

    private static void demonstrateStringLiteralGotcha() throws InterruptedException {
        System.out.println("=== String literal key: the WeakHashMap gotcha ===");

        WeakHashMap<String, String> brokenCache = new WeakHashMap<>();

        // String literals are interned — the JVM string pool holds a permanent
        // strong reference to them. This key will NEVER be collected.
        // The WeakHashMap entry stays forever.
        String literalKey = "my-session-key"; // interned — permanent strong ref
        brokenCache.put(literalKey, "some cached value");

        // Setting literalKey = null does NOT clear the intern pool reference.
        // The string pool retains the strong reference.
        literalKey = null;

        System.gc();
        Thread.sleep(200);

        System.out.println("Cache size after GC with String literal key: "
            + brokenCache.size()); // still 1 — entry was NOT evicted
        System.out.println("This WeakHashMap will grow forever with String literal keys.");
        System.out.println("Fix: use 'new String(key)' or a domain object as the key.");
        System.out.println("Better fix: use Caffeine with expireAfterWrite and maximumSize.");
    }
}
▶ Output
=== SoftReference: cleared only under memory pressure ===
Before pressure — data available: true
Allocated pressure block: 26214400 bytes
After pressure — data available: false

=== WeakHashMap: entry lives only as long as the key ===
Cache size before GC: 1
Cache size after GC: 0
Entry auto-evicted — no manual removal needed!

=== String literal key: the WeakHashMap gotcha ===
Cache size after GC with String literal key: 1
This WeakHashMap will grow forever with String literal keys.
Fix: use 'new String(key)' or a domain object as the key.
Better fix: use Caffeine with expireAfterWrite and maximumSize.
Mental Model
Reference Strength as a Spectrum of Permanence
When would you choose WeakReference over SoftReference for a cache?
  • Use WeakReference when the cached value is only useful while the key is alive — metadata for a parsed AST node, classloader-scoped data, or canonicalisation maps
  • Use SoftReference when the cached value is expensive to recompute and you want to keep it as long as possible without risking OOM — image thumbnails, compiled templates
  • WeakReferences are collected eagerly at the next GC cycle; SoftReferences are cleared only under genuine memory pressure before OOM
  • For production caches with predictable behaviour, use Caffeine with explicit maximumSize and expireAfterWrite — eviction is gradual, measurable, and does not cause thundering herd cache misses
📊 Production Insight
The String literal WeakHashMap failure is the most common WeakHashMap mistake and it is completely silent. Developers write WeakHashMap<String, V> expecting automatic cleanup, use string literal keys because the keys come from configuration, observe that the map never shrinks, and spend hours debugging before discovering the intern pool. The fix is simple: use a domain object as the key, or use new String(key) to create a non-interned copy. The better fix is to stop using WeakHashMap for production caches and use Caffeine — it handles eviction, TTL, concurrency, statistics, and monitoring in a single well-tested library.
WeakHashMap is also not thread-safe. In any concurrent context, wrap it with Collections.synchronizedMap() or replace it with ConcurrentHashMap. The combination of non-thread-safety and surprising String interning behaviour makes WeakHashMap a footgun in most production scenarios.
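A quick sketch of that wrapping, with illustrative names rather than anything from the article's codebase:

```java
package io.thecodeforge.memory;

import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/**
 * Sketch: making WeakHashMap safe for concurrent access by wrapping it.
 * Note: iteration still requires manual synchronization on the wrapper.
 */
public class SynchronizedWeakCache {

    private final Map<Object, String> cache =
        Collections.synchronizedMap(new WeakHashMap<>());

    public void put(Object key, String value) { cache.put(key, value); }
    public String get(Object key)             { return cache.get(key); }

    public static void main(String[] args) {
        SynchronizedWeakCache sessions = new SynchronizedWeakCache();
        Object key = new Object();  // non-interned key, collectable
        sessions.put(key, "{ role: admin }");
        System.out.println("cached: " + sessions.get(key));
    }
}
```

This keeps the weak-key eviction semantics while serialising access; Caffeine with weakKeys() gives the same lifecycle behaviour with lock-free reads.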
🎯 Key Takeaway
WeakReference for lifecycle-bound metadata. SoftReference for memory-sensitive, recomputable caches. Caffeine for everything in production — it handles eviction, TTL, thread safety, and observability better than any hand-rolled reference queue. Never use a plain HashMap as a cache without an eviction strategy. Never use a String literal as a WeakHashMap key — the string pool holds a permanent strong reference and the entry will never be evicted.
Choosing the Right Cache Implementation
If: Cache entries should live only as long as the key object itself — key death means entry death
Use: WeakHashMap with non-interned, non-primitive keys (domain objects, new String()), or Caffeine with weakKeys() for thread safety
If: Cache should keep data as long as possible but release under genuine memory pressure
Use: A SoftReference-based cache or Caffeine with maximumWeight — but accept that eviction timing is non-deterministic
If: Cache needs TTL, size limits, and predictable gradual eviction
Use: Caffeine.newBuilder().maximumSize(10_000).expireAfterWrite(Duration.ofMinutes(10)).build() — this is the right answer for almost every production cache
If: Cache is accessed by many concurrent threads
Use: Caffeine — lock-free reads, striped writes, and built-in concurrency. Never use plain WeakHashMap in concurrent code.

Finding Leaks in Production: VisualVM, JVM Flags and Eclipse MAT

Knowing the patterns is half the battle. The other half is diagnosing a leak you did not write — in a service you have never seen before, under traffic you cannot fully reproduce. Here is the systematic approach that works in real incidents.

Step 1: Confirm the leak with GC logs. Enable GC logging on every production JVM: -Xlog:gc*:file=gc.log:time,uptime:filecount=5,filesize=20m (JDK 9+ unified logging syntax). A healthy heap shows a sawtooth pattern — usage climbs, GC runs, usage drops back to a consistent baseline. A leaking heap shows that baseline creeping upward after every GC cycle. That rising floor is your smoking gun before you touch any other tool.

Step 2: Get a heap dump. Trigger one without restarting: jcmd pid GC.heap_dump /tmp/heapdump.hprof. This is preferred over jmap -dump in production because it uses a safer code path in JDK 9+. For automated capture, add -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp to your JVM flags permanently — this is non-negotiable for production services.

Step 3: Analyse with Eclipse MAT. Open the .hprof file and immediately run Leak Suspects Report. MAT identifies the largest retained heaps and the reference chains keeping them alive without you needing to know where to look. Then examine the Dominator Tree — it shows retained heap (the total memory that would be freed if this object were collected, including its entire object graph), not just shallow heap (the object's own bytes). Follow the dominator chain until you reach the GC root.

Step 4: VisualVM for live profiling. Connect via JMX, open the Sampler tab, and use Memory sampling to see which classes have the most live instances and total retained bytes. The key metric is monotonic growth — a class whose instance count keeps rising across samples is leaking.

Step 5: Java Flight Recorder for continuous low-overhead production monitoring. The jdk.OldObjectSample event captures objects that have survived multiple GC cycles — exactly the objects you care about — with near-zero overhead (under 2% CPU). Run jcmd pid JFR.start duration=300s filename=recording.jfr settings=profile and open in JDK Mission Control. This is the preferred approach for production systems where you need ongoing visibility without the stop-the-world cost of heap dumps.

MAT OQL for power users: SELECT * FROM java.util.HashMap h WHERE h.size > 10000 finds large maps. SELECT * FROM java.lang.Thread t WHERE toString(t.name) LIKE "pool.*" finds pool threads and their retained heap (MAT's LIKE takes a Java regex). The Compare Snapshots feature is essential — take two dumps 15 minutes apart and MAT shows exactly what grew between them, confirming the leak is active and identifying the growing class.

io/thecodeforge/memory/LeakDetectionSetup.java · JAVA
package io.thecodeforge.memory;

/**
 * Production leak detection configuration reference.
 *
 * Add JVM flags to your startup configuration:
 * - Docker: ENV JAVA_OPTS in Dockerfile or docker-compose
 * - Kubernetes: spec.containers[].env in deployment manifest
 * - systemd: Environment= in unit file
 *
 * =====================================================================
 * MANDATORY JVM FLAGS FOR PRODUCTION LEAK DETECTION
 * =====================================================================
 *
 * # Automatic heap dump on OutOfMemoryError — non-negotiable
 * -XX:+HeapDumpOnOutOfMemoryError
 * -XX:HeapDumpPath=/var/log/myapp/heapdumps/
 *
 * # GC logging — sawtooth pattern confirms health, rising baseline confirms leak
 * -Xlog:gc*:file=/var/log/myapp/gc.log:time,uptime:filecount=5,filesize=20m
 *
 * # Native memory tracking — for off-heap and metaspace leaks
 * -XX:NativeMemoryTracking=summary
 *
 * # G1 is default in JDK 9+ and suitable for most workloads
 * -XX:+UseG1GC
 * -XX:MaxGCPauseMillis=200
 *
 * =====================================================================
 * LIVE DIAGNOSTIC COMMANDS (no restart required)
 * =====================================================================
 *
 * # Find the JVM process ID
 * $ jps -l
 * 18423 io.thecodeforge.service.PaymentService
 *
 * # Trigger a heap dump without stopping the process
 * # Preferred over jmap in production — safer code path in JDK 9+
 * $ jcmd 18423 GC.heap_dump /tmp/heapdump-$(date +%Y%m%d-%H%M%S).hprof
 *
 * # Class histogram — top memory consumers by class type
 * # Take two snapshots 10 minutes apart and diff them
 * $ jcmd 18423 GC.class_histogram | head -30
 *
 * # GC occupancy — watch Old/O for rising baseline across collections
 * # O = old gen occupancy, S0/S1 = survivor spaces, E = eden
 * $ jstat -gcutil 18423 1000 30
 *
 * # Native memory summary — catches metaspace and direct buffer leaks
 * $ jcmd 18423 VM.native_memory summary
 *
 * # Thread dump — needed for ThreadLocal leak analysis
 * $ jcmd 18423 Thread.print > /tmp/threaddump.txt
 *
 * # Start JFR recording — near-zero overhead, runs in production
 * # Look for jdk.OldObjectSample events for long-lived object tracking
 * $ jcmd 18423 JFR.start duration=300s filename=/tmp/recording.jfr settings=profile
 *
 * =====================================================================
 * INTERPRETING A CLASS HISTOGRAM
 * =====================================================================
 *
 * num   #instances   #bytes   class name
 * ---   ----------   ------   ----------
 *   1:    950,234    22.8MB   [B (byte arrays)
 *   2:    420,000    13.4MB   io.thecodeforge.model.UserSession
 *   3:    420,000     6.7MB   java.util.HashMap$Node
 *
 * If UserSession count grows monotonically across two snapshots,
 * and HashMap$Node count matches it (UserSession has a HashMap field),
 * you almost certainly have a session or cache without eviction.
 *
 * The 1:1 ratio between UserSession and HashMap$Node is a strong signal
 * that the same objects are being accumulated — not coincidentally similar counts.
 */
public class LeakDetectionSetup {

    public static void main(String[] args) {
        System.out.println("See Javadoc above for production JVM configuration.");
        System.out.println();

        // Runtime heap stats — useful for a /health/memory endpoint
        Runtime jvmRuntime = Runtime.getRuntime();
        long maxHeapMB   = jvmRuntime.maxMemory()   / (1024 * 1024);
        long totalHeapMB = jvmRuntime.totalMemory()  / (1024 * 1024);
        long freeHeapMB  = jvmRuntime.freeMemory()   / (1024 * 1024);
        long usedHeapMB  = totalHeapMB - freeHeapMB;

        System.out.printf("Max heap:   %6d MB%n", maxHeapMB);
        System.out.printf("Used heap:  %6d MB%n", usedHeapMB);
        System.out.printf("Free heap:  %6d MB%n", freeHeapMB);
        System.out.printf("Usage:      %6.1f%%%n",
            (double) usedHeapMB / maxHeapMB * 100);
    }
}
▶ Output
See Javadoc above for production JVM configuration.

Max heap: 256 MB
Used heap: 8 MB
Free heap: 248 MB
Usage: 3.1%
Mental Model
Heap Dump as a Crime Scene Photo
Why use MAT over VisualVM for heap dump analysis?
  • MAT's Leak Suspects Report automates the initial hunt — it identifies the largest retained heaps and the reference chains keeping them alive without you knowing where to start
  • MAT's Dominator Tree shows retained heap (total memory freed if this object is collected), not just shallow heap (the object's own bytes) — the distinction is everything for identifying the real culprit
  • MAT's Compare Snapshots feature shows what grew between two dumps — essential for confirming an active leak and identifying the growing class before an OOM occurs
  • MAT's OQL lets you query the heap like a database — find all HashMaps over 10K entries, all ThreadLocalMaps, all instances of a specific domain class
  • VisualVM is better for live interactive sampling and CPU profiling. MAT is the right tool for post-mortem heap analysis.
📊 Production Insight
Taking a heap dump from a multi-gigabyte heap causes a full Stop-The-World pause for the entire dump duration — on a 4GB heap this can be 30 to 60 seconds of complete application unavailability. jmap -dump is the worst offender and should never be used in production. jcmd GC.heap_dump is safer but still triggers STW. Best practice: rely entirely on -XX:+HeapDumpOnOutOfMemoryError for crash captures, and use Java Flight Recorder with jdk.OldObjectSample for continuous production monitoring. JFR runs with under 2% overhead and captures exactly the information you need — objects that have survived multiple GC cycles — without touching the Stop-The-World path.
🎯 Key Takeaway
Diagnosis requires evidence captured at the right moment. Enable -XX:+HeapDumpOnOutOfMemoryError from day one on every production JVM — without a heap dump from the crash, diagnosing an OOM is guesswork. Use jcmd GC.class_histogram for quick live checks without STW impact. Use Eclipse MAT for deep forensic analysis of heap dumps. Use JFR for continuous low-overhead production monitoring. The rising old-generation baseline in GC logs is your first confirmation of a leak — everything else is post-mortem investigation.
Which Tool to Use for Leak Detection?
If: Service crashed with OOM and you have a .hprof file
Use: Eclipse MAT — open the dump, run Leak Suspects Report, inspect the Dominator Tree, follow the GC root path
If: Heap is rising but no crash yet — want to identify the growing class
Use: Take two jcmd GC.class_histogram snapshots 10 to 15 minutes apart and diff them — growing classes are the leak candidates
If: Need continuous monitoring with minimal overhead in production
Use: JFR with the jdk.OldObjectSample event — under 2% CPU overhead, captures long-lived objects, opens in JDK Mission Control
If: Need to inspect a running JVM interactively without taking a full heap dump
Use: Connect VisualVM via JMX and use the Memory Sampler tab to watch instance counts in real time
If: Suspect a metaspace or off-heap (direct buffer) leak rather than heap
Use: jcmd VM.native_memory summary with -XX:NativeMemoryTracking=summary enabled — heap analysis tools will not show you what is not on the heap
🗂 Reference Types Compared
When each type is collected, ideal use cases, and behaviour after collection.
Reference Type | Collected When? | Ideal Use Case | get() After GC
Strong Reference (normal variable assignment) | Never while the reference exists — the GC will not touch it | All regular objects — this is the default for every variable and field in Java | Not applicable — the referent is always live while the reference exists
SoftReference | Only under genuine memory pressure, guaranteed before OOM is thrown | Memory-sensitive caches where the cached value is expensive to recompute (compiled templates, image thumbnails) | Returns null after the GC clears it — always check for null before dereferencing
WeakReference | Next GC cycle — no guarantee on timing, but collected eagerly regardless of memory pressure | WeakHashMap metadata caches where entry lifetime should match key lifetime, canonicalisation maps, listener registries | Returns null after collection — always check for null, and do not use String literals as keys in WeakHashMap
PhantomReference | After the object is finalised and enqueued, before memory is actually reclaimed | Native resource cleanup (off-heap memory, file handles) as a safer and more predictable alternative to finalize() | Always returns null by specification — use a ReferenceQueue to receive notification of collection
WeakHashMap entry | Automatically when the key object has no strong references outside the map — expunged on the next map operation after GC | Cache where entry lifetime should match key lifetime — effective only with non-interned, non-primitive keys | Entry removed automatically — but only on next map operation (get, put, size), not proactively after GC

🎯 Key Takeaways

  • A Java memory leak is always a reachability problem, not a GC failure — the GC cannot collect an object that any live reference chain touches, even if your code will never use that object again. The GC is working correctly. Your reference is the problem.
  • ThreadLocal in a thread pool is the most dangerous leak pattern in enterprise Java — always call ThreadLocal.remove() in a finally block, or use a framework-level TaskDecorator that does it for you. In JDK 23+, ScopedValue provides a safer alternative for per-request context.
  • WeakHashMap silently fails to evict entries when String literals are used as keys because the JVM string pool holds a permanent strong reference to every interned string — use new String(key) or a domain object as the key, or replace WeakHashMap with Caffeine for anything in production.
  • Enable -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath on every production JVM from day one — without a heap dump captured at crash time, diagnosing an OOM is guesswork that typically takes days instead of hours.
  • The rising old-generation baseline after full GC is the definitive leak signature and your only early warning before OOM. A healthy sawtooth returns to the same floor after each collection. A leaking sawtooth shows a floor that rises monotonically — monitor this metric and alert at 70% old-gen occupancy.
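The 70% old-gen alert can also be implemented in-process with the standard management API, for example behind a health endpoint. A hedged sketch: pool names vary by collector ("G1 Old Gen", "PS Old Gen", "Tenured Gen"), so the name match below is heuristic, and the class name and threshold are illustrative.

```java
package io.thecodeforge.memory;

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

/**
 * Sketch of an in-process old-generation occupancy check.
 * The 70% threshold mirrors the article's suggested alert level.
 */
public class OldGenMonitor {

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            boolean oldGen = pool.getType() == MemoryType.HEAP
                && (name.contains("Old") || name.contains("Tenured"));
            if (!oldGen) continue;

            long used = pool.getUsage().getUsed();
            long max  = pool.getUsage().getMax();  // -1 when undefined
            if (max <= 0) continue;

            double pct = (double) used / max * 100;
            System.out.printf("%s occupancy: %.1f%%%n", name, pct);
            if (pct > 70.0) {
                System.out.println("ALERT: old-gen above 70%, investigate for a leak");
            }
        }
    }
}
```

A single high reading is not a leak; the signal is this percentage staying elevated immediately after full GC cycles.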

⚠ Common Mistakes to Avoid

    Forgetting ThreadLocal.remove() in thread pool tasks
    Symptom

    Heap climbs steadily under load even with stable active user count. Class histogram shows context or session objects multiplying. Old generation fills with objects tied to request processing that should have been transient. Thread dump or MAT inspection shows pool threads each holding hundreds of megabytes in ThreadLocalMap.

    Fix

    Always wrap ThreadLocal.set() in a try block with REQUEST_CONTEXT.remove() in the finally clause — no exceptions, even if you are certain the task will succeed. In Spring, implement TaskDecorator and register it on ThreadPoolTaskExecutor to enforce cleanup at the framework level so individual task authors cannot introduce the leak. Consider ScopedValue (JDK 23+) for new code requiring per-request context propagation — it is automatically cleaned up and does not require explicit remove().
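In plain JDK terms, the same enforcement idea as a Spring TaskDecorator can be sketched as a wrapper applied to every task before submission. REQUEST_CONTEXT and the task bodies here are illustrative, not from the article's codebase:

```java
package io.thecodeforge.memory;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of the try/finally discipline for ThreadLocal in a pool,
 * plus a plain-JDK equivalent of a framework TaskDecorator: every
 * submitted task is wrapped so remove() cannot be forgotten.
 */
public class ThreadLocalCleanupDemo {

    // Hypothetical per-request context
    private static final ThreadLocal<String> REQUEST_CONTEXT = new ThreadLocal<>();

    /** Wrap a task so the ThreadLocal is always cleared, even on failure. */
    static Runnable withCleanup(Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                REQUEST_CONTEXT.remove();  // severs the ThreadLocalMap entry
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 4; i++) {
            final int requestId = i;
            pool.execute(withCleanup(() -> {
                REQUEST_CONTEXT.set("request-" + requestId);
                System.out.println(Thread.currentThread().getName()
                    + " processed " + REQUEST_CONTEXT.get());
            }));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

A Spring TaskDecorator does exactly what withCleanup() does here, registered once on the ThreadPoolTaskExecutor so individual task authors never see the wrapper.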

    Using a String literal as a WeakHashMap key
    Symptom

    WeakHashMap never shrinks despite null-ing out the key reference. Entries accumulate indefinitely. Memory grows without bound in what was intended to be a self-evicting cache.

    Fix

    String literals are interned by the JVM — the string pool holds a permanent strong reference that the weak key mechanism cannot overcome. Use new String(key) to create a heap-allocated, non-interned key that can be collected, or use a domain object as the key. The production-grade fix is to replace WeakHashMap with Caffeine using explicit size bounds and TTL — predictable eviction, thread safety, and statistics included.

    Registering listeners on a long-lived publisher and never deregistering
    Symptom

    Heap dump shows a publisher (EventBus, JMX MBeanServer, Spring ApplicationEventPublisher) holding thousands of stale subscriber instances. Subscriber objects whose owning components were closed, navigated away from, or garbage-collected hours ago are still reachable through the publisher's internal listener list.

    Fix

    Implement a cleanup lifecycle method that calls publisher.removeListener(this), eventBus.unregister(this), or equivalent. In Spring, use @EventListener on a managed bean (Spring handles deregistration on context close) or implement DisposableBean to deregister in destroy(). For UI components, use weak listener patterns where the framework supports them.

    Using a non-static inner class or anonymous class as a Runnable in a thread pool
    Symptom

    Heap dump shows request handler or controller instances retained by Runnable objects sitting in the thread pool's work queue or already executing — long after the request that created them completed. The handler's entire object graph (session data, large payloads, DB connection wrappers) is pinned in memory.

    Fix

    Make the inner class static — this removes the implicit outer reference entirely. Or extract it to a top-level class. If the Runnable genuinely needs data from the outer class, pass it explicitly as a constructor parameter to the static inner class. This makes the dependency visible and controllable rather than implicit and permanent.
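A minimal sketch of that refactor, with illustrative names (RequestHandler and its session state are assumptions, not the article's codebase):

```java
package io.thecodeforge.memory;

/**
 * Sketch: replacing a capturing inner-class Runnable with a static
 * nested class that receives only the data it needs.
 */
public class StaticTaskDemo {

    static class RequestHandler {
        // A capturing (non-static) inner class would pin all of this
        private final byte[] largeSessionState = new byte[1024 * 1024];

        Runnable buildTask() {
            // Pass only the needed value: no implicit RequestHandler.this reference
            return new LogTask("order-42");
        }
    }

    /** Static nested class: no hidden reference to the enclosing instance. */
    static class LogTask implements Runnable {
        private final String orderId;

        LogTask(String orderId) { this.orderId = orderId; }

        @Override
        public void run() {
            System.out.println("Processing " + orderId);
        }
    }

    public static void main(String[] args) {
        Runnable task = new RequestHandler().buildTask();
        task.run();  // the RequestHandler and its 1MB state are now unreachable
    }
}
```

The compiler proves the fix: a static nested class has no synthetic this$0 field, so the outer instance cannot be retained through the queued task.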

    Not deregistering JDBC drivers and thread pools on application undeploy
    Symptom

    Metaspace grows with each redeploy. Old classloaders appear in heap dumps from previous deployments. After 10 to 20 redeploys, the container crashes with OutOfMemoryError: Metaspace. Tomcat logs warnings about memory leaks on undeploy.

    Fix

    Implement ServletContextListener.contextDestroyed() to call DriverManager.deregisterDriver() for each registered driver, shut down all thread pools owned by the application, and clear any static references that point to application-classloader-loaded classes. In Spring Boot with embedded servers, this is largely handled automatically — but it remains critical for traditional WAR deployments to shared Tomcat or JBoss containers.
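The driver-deregistration half of that cleanup can be sketched in JDK-only code; the servlet listener plumbing is omitted here, and the classloader check (deregister only drivers your webapp loaded) reflects common practice in shared containers:

```java
package io.thecodeforge.memory;

import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Enumeration;

/**
 * Sketch of the driver-deregistration step of contextDestroyed().
 * In a real WAR this runs inside ServletContextListener.contextDestroyed().
 */
public class DriverCleanup {

    /** Deregister only drivers loaded by this application's classloader. */
    public static void deregisterOwnDrivers() {
        ClassLoader appLoader = DriverCleanup.class.getClassLoader();
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            // Leave container-level drivers (shared classloader) registered
            if (driver.getClass().getClassLoader() == appLoader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    System.out.println("Deregistered " + driver.getClass().getName());
                } catch (SQLException e) {
                    System.err.println("Failed to deregister: " + e.getMessage());
                }
            }
        }
    }

    public static void main(String[] args) {
        // No-op unless a JDBC driver is actually on the classpath
        deregisterOwnDrivers();
    }
}
```

The classloader comparison matters: deregistering a driver owned by the container's shared classloader would break sibling applications.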

Interview Questions on This Topic

  • Q (Mid-level): The GC is supposed to handle memory management in Java — so how can a memory leak even occur? Walk me through the exact mechanism that keeps an object alive despite it being logically unused.
    A Java memory leak is a reachability problem, not a GC failure. The GC uses reachability analysis starting from GC roots — thread stacks, static fields, JNI references. It walks the entire object graph and retains everything reachable from a root, regardless of whether your business logic will ever use that object again. The GC cannot read intent; it can only follow references. A leak occurs when your code creates a reference chain from a GC root to a logically dead object and never severs it. Common mechanisms: a static Map that grows without eviction (static field is a GC root), a ThreadLocal set in a pooled thread without calling remove() (the Thread is a GC root, its ThreadLocalMap is reachable), or a listener registered on a long-lived publisher that is never deregistered (the publisher holds a reference to the subscriber). In all cases, the GC is working correctly — the reference chain is live, and the objects on it are kept. The problem is the reference, not the collector.
  • Q (Mid-level): You get paged at 2 AM: production service restarted with OutOfMemoryError. You have a heap dump. Walk me through exactly what you do next — tools, commands, what you are looking at, and how you pinpoint the root cause.
    Step 1: Open the .hprof file in Eclipse MAT and run Leak Suspects Report immediately — it automates the initial identification of the largest retained heaps and the reference chains holding them. Step 2: Examine the Dominator Tree sorted by retained heap descending — find which single object retains the most memory including its entire object graph. Step 3: Click through to the GC root path of the top dominator — this shows the chain from the GC root (static field, thread, JNI) to the leaked object. That chain is your root cause. Step 4: Cross-reference with jstat -gcutil output from before the crash — confirm old generation was rising across full GC cycles, which confirms the leak was long-lived. Step 5: Use MAT OQL for targeted queries: SELECT * FROM java.util.HashMap WHERE size > 10000 to find large uncapped caches. Step 6: If the dominator is a thread, inspect its ThreadLocalMap size — if each pool thread holds hundreds of megabytes, you have a ThreadLocal leak and need to grep the codebase for ThreadLocal.set() without matching remove() in finally blocks.
  • Q (Junior): What is the difference between a SoftReference and a WeakReference, and when would you choose one over the other? What happens if you use a String literal as a key in a WeakHashMap, and why does the entry not get evicted?
    A SoftReference is cleared by the GC only when the JVM is about to throw OutOfMemoryError — it keeps the referent alive as long as memory is available. A WeakReference is cleared at the next GC cycle regardless of memory pressure. Use SoftReference for memory-sensitive caches where recomputation is expensive and you want to keep the value as long as possible. Use WeakReference for metadata caches where the cache entry should be automatically evicted when the key object is no longer referenced anywhere else. If you use a String literal as a WeakHashMap key, the entry is never evicted because Java interns String literals — the JVM's string pool holds a permanent strong reference to every interned string. Setting your key variable to null does not remove the intern pool reference. The weak key is never collected, and the WeakHashMap entry persists forever, growing the map without bound. The fix is to use new String(key) for a non-interned heap-allocated key, or to use a domain object as the key — or better, to replace WeakHashMap with Caffeine.
  • Q (Mid-level): Your service uses a thread pool with 8 threads. Each task sets a ThreadLocal with a 2MB payload. After running 10,000 tasks, heap usage is higher than expected. Why exactly, and what is the precise fix?
    ThreadLocal values are stored in a ThreadLocalMap that lives on the Thread object itself. In a fixed thread pool, threads are reused and never garbage collected — they run for the application's lifetime. Each task calls ThreadLocal.set() with a 2MB payload but never calls ThreadLocal.remove(). When the task completes and the thread returns to the pool, the ThreadLocal entry remains in the thread's ThreadLocalMap. The thread goes on to run the next task, but the old 2MB payload is still there. With 8 threads each accumulating stale payloads, the leak grows continuously. The critical nuance: clearing fields inside the ThreadLocal value does not remove the ThreadLocal entry from the map — you must call ThreadLocal.remove() on the ThreadLocal object itself. The fix: add REQUEST_CONTEXT.remove() in a finally block wrapping the task body. For framework enforcement so individual task authors cannot forget this, register a TaskDecorator on the ThreadPoolTaskExecutor that wraps every submitted Runnable in try/finally with remove() in the finally clause.
  • Q (Senior): You deploy a Spring Boot WAR to Tomcat. After 20 redeploys without restarting the container, it crashes with OutOfMemoryError: Metaspace. What is happening mechanically, and how do you fix it?
    This is a classloader leak. Each WAR deployment creates a new web application classloader for the deployed application. When the WAR is undeployed, for the old classloader to be garbage collected, every reference from outside the old classloader to any class it loaded must be severed. If any JVM-wide component — the JDBC DriverManager, a logging framework's static registry, a background thread started by the application — holds a reference to a class loaded by the old classloader, the old classloader cannot be collected. Metaspace stores class metadata per classloader. The old classloader and all the classes it loaded remain in metaspace. Each redeploy leaks one classloader with its full class set — typically 50 to 200MB. After 20 redeploys with no container restart, the container runs out of metaspace. The fix: implement ServletContextListener.contextDestroyed() to call DriverManager.deregisterDriver() for each registered JDBC driver, shut down all thread pools started by the application, cancel any background schedulers, and clear any static caches that hold references to application-classloader-loaded types. In Spring Boot with the embedded server model, this class of leak is much less common because the entire JVM restarts with the application — but it remains critical for traditional WAR deployments to shared containers.

Frequently Asked Questions

How do I find a memory leak in a Java application without restarting it?

Use jcmd pid GC.class_histogram to print a live class instance count — take two snapshots 10 to 15 minutes apart and diff them to find classes whose instance count is growing monotonically. For a full heap analysis without restarting, trigger a dump with jcmd pid GC.heap_dump /tmp/dump.hprof and open it in Eclipse MAT. Note that heap dumps cause a full Stop-The-World pause proportional to heap size, so use them sparingly in production. For continuous monitoring without STW cost, enable Java Flight Recorder with jcmd pid JFR.start duration=120s filename=recording.jfr and examine the jdk.OldObjectSample events in JDK Mission Control — this runs with under 2% overhead and captures long-lived objects across GC cycles.

Does setting an object to null in Java immediately free its memory?

No. Setting a reference to null removes that specific reference from the reachability graph, but the object's memory is only reclaimed once all references to it are gone and the GC has run. You have no control over when the GC runs or reclaims specific objects. In most code you do not need to null out references explicitly — letting local variables go out of scope naturally removes their references. Explicitly nulling long-lived collection entries or static fields is sometimes necessary to aid the GC, but setting a local variable to null at end-of-method has no practical effect.
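A short sketch of the distinction, using a hypothetical long-lived static cache (the field name is illustrative): nulling the local variable does nothing because the cache still reaches the object, while clearing the long-lived reference is what actually makes it collectible.

```java
import java.util.ArrayList;
import java.util.List;

public class NullingDemo {
    // Hypothetical long-lived reference, e.g. a static cache
    static final List<byte[]> CACHE = new ArrayList<>();

    public static void main(String[] args) {
        byte[] payload = new byte[1024];
        CACHE.add(payload);

        payload = null; // removes only the local reference; the object is
                        // still strongly reachable through CACHE, so the GC
                        // cannot reclaim it
        System.out.println("cache entries: " + CACHE.size());

        CACHE.clear(); // removing the long-lived reference is what matters;
                       // the byte[] is now unreachable and eligible for GC
        System.out.println("cache entries: " + CACHE.size());
    }
}
```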

What is the difference between a memory leak and an OutOfMemoryError? Are they the same thing?

A memory leak is a cause; OutOfMemoryError is one possible symptom. A memory leak means your application holds references to objects it will never use again, preventing the GC from reclaiming them. An OOM is thrown when the JVM cannot allocate memory for a new object after exhausting the available heap and running a full GC. You can get an OOM without any leak — processing a genuinely enormous dataset or allocating an oversized buffer both cause OOM without retention problems. You can have a slow leak that runs for days or weeks before causing an OOM. Always look for the rising heap baseline after full GC cycles — that pattern distinguishes a leak from simply needing more heap.

Can a Java memory leak occur in metaspace instead of the heap?

Yes. Metaspace (which replaced PermGen in JDK 8 and lives in native memory rather than the heap) stores class metadata — the structural information about every class the JVM has loaded. Classloader leaks — where an old classloader is not garbage collected because something outside it references a class it loaded — cause metaspace to grow unboundedly. Each WAR redeploy in a shared container creates a new classloader. If the old one is not collected, its classes remain in metaspace permanently. Start the JVM with -XX:NativeMemoryTracking=summary to enable native memory accounting, then monitor metaspace with jcmd pid VM.native_memory summary and watch the Class space. Heap analysis tools like MAT will not show metaspace leaks because metaspace is not part of the Java heap.

Naren, Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
