Senior 4 min · March 06, 2026

Sparse Arrays in Java — Preventing 8 TB OOM in Grids

A 500k×500k double[][] takes 1.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • A sparse array stores only non-default elements, reducing memory when most slots are empty.
  • HashMap-based: O(1) average access, but no ordering and high overhead per entry.
  • TreeMap-based: O(log n) access, maintains sorted keys, useful for range queries.
  • CSR format: Uses parallel arrays for indices and values — best for static, dense-in-sparse matrices.
  • Performance insight: For 10M slots with 0.1% occupancy, sparse array uses ~16 MB vs 80 MB for dense.
  • Production insight: HashMap sparse array causes OOM if load factor and resize aren't tuned for sparse writes.
Plain-English First

Imagine a massive stadium with 100,000 seats, but only 200 people show up. You wouldn't hand out a wristband to every single empty seat just to track who's there — you'd keep a list of the 200 occupied seats and their row numbers. That's exactly what a sparse array does: instead of allocating memory for every possible slot (most of which are empty), it only remembers the slots that actually have data. It's the difference between renting a 100-floor skyscraper when you only use two offices versus just renting those two offices directly.

Most Java developers reach for a plain array or ArrayList without a second thought. That works beautifully when your data is dense. But what happens when you're modeling a 1,000,000 × 1,000,000 grid where fewer than 0.001% of cells have values? At 8 bytes per double, a fully allocated 2D array needs 8 petabytes of RAM. Your JVM heap isn't that generous.

This isn't a theoretical problem. It shows up in graph adjacency matrices, scientific computing, game world maps, recommendation engines, and financial risk grids every single day. The fix? Stop allocating slots that'll never hold a value. Use a sparse array instead. But the devil's in the details — pick the wrong implementation and you'll trade one memory problem for another.

What Is a Sparse Array?

A sparse array is a data structure that stores only the non-default (non-zero, non-null) elements of a large-indexed collection. Instead of allocating a contiguous block of memory for every possible index, it uses a mapping from index to value — only for indices that hold a value different from the default.

Think of it as a compressed view of an array. The index space can be huge (up to 2^31-1 for int indices), but the actual number of populated entries is small. The key insight: memory is proportional to the number of stored entries, not the theoretical maximum index.

In Java, you can implement sparse arrays using standard library classes like HashMap, TreeMap, or custom structures like Compressed Sparse Row (CSR).

io/thecodeforge/sparse/SparseArrayDemo.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
package io.thecodeforge.sparse;

import java.util.HashMap;
import java.util.Map;

public class SparseArrayDemo {
    public static void main(String[] args) {
        // Index space: 0 to 1,000,000
        // But only 3 non-zero entries
        Map<Integer, Double> sparse = new HashMap<>();
        sparse.put(100_000, 42.0);
        sparse.put(500_000, 3.14);
        sparse.put(999_999, 7.0);

        System.out.println("Memory: " + (sparse.size() * 32L) + " bytes (approx)");
        // Dense array would be 1,000,001 * 8 = 8 MB
    }
}
Output
Memory: 96 bytes (approx)
Production Insight
HashMap sparse arrays are the go-to for dynamic data, but each entry carries ~32 bytes overhead on a typical JVM.
Pre-sizing the map avoids expensive resize storms — always compute expected non-zero count first.
For fixed-size index spaces, a simple array with a sentinel default value can be faster than any map.
Key Takeaway
Sparse arrays trade index-range memory for per-entry overhead.
Always benchmark your sparsity ratio: below 1% occupancy, sparse wins.
Above 20%, a dense array often wins.

HashMap-Based Sparse Array Implementation

The simplest approach is to store only non‑default entries in a HashMap. The key encodes the index (or (row, col) for 2D), and the value holds the actual data. Default values (0, null, etc.) are never stored. This gives O(1) amortised access and is perfect when the index set changes dynamically.

But the memory overhead per entry is significant — each Map.Entry carries key, value, hash, and pointer overhead (typically ~32–40 bytes on a 64‑bit JVM with compressed OOPs). For a 1D sparse array of length N with M non‑zero entries, memory usage is roughly M * (overhead + size-of-key + size-of-value). If you need ordered iteration (e.g., range queries), HashMap is the wrong choice.

Production gotcha: resizing. When the load factor is exceeded, HashMap doubles its capacity and rehashes all entries. For a map with 10 million entries, that's a ~100ms pause — and it happens up to 20 times if you don't pre-size.

io/thecodeforge/sparse/HashMapSparseArray.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
package io.thecodeforge.sparse;

import java.util.HashMap;
import java.util.Map;

public class HashMapSparseArray {
    private final Map<Integer, Double> map;
    private final double defaultValue;

    public HashMapSparseArray(double defaultValue, int expectedNonZero) {
        // Pre-size to avoid resizing: capacity = expectedSize / loadFactor + 1
        int capacity = (int) (expectedNonZero / 0.75f) + 1;
        this.map = new HashMap<>(capacity);
        this.defaultValue = defaultValue;
    }

    public void set(int index, double value) {
        if (value == defaultValue) {
            map.remove(index);
        } else {
            map.put(index, value);
        }
    }

    public double get(int index) {
        return map.getOrDefault(index, defaultValue);
    }

    public int nonZeroCount() {
        return map.size();
    }
}
Production Insight
HashMap resize doubles capacity when load factor is exceeded — if you add many entries rapidly, you may see a latency spike as the table rebuilds.
Always pre-size the HashMap with expected non‑zero count to avoid repeated resizing.
Using a primitive map library (e.g., Trove) cuts overhead by storing int→double in a single array instead of Entry objects.
Key Takeaway
HashMap sparse array = O(1) access, high per-entry memory.
Best for dynamic, unordered access with moderate occupancy (< 1%).
Worst for ordered iteration or range queries.

TreeMap-Based Sparse Array Implementation

When you need the array indices to be sorted (e.g., to support range scans or order‑dependent algorithms), TreeMap provides a natural ordering. It implements NavigableMap, giving you subMap(), headMap(), tailMap(), lowerKey(), and higherKey() operations — all in O(log n) time.

Trade‑off: each operation is O(log n) instead of O(1). For very large sparse arrays where range queries are common, this is often the right choice. But the memory overhead per entry is higher than HashMap because each node stores left/right/parent pointers in addition to the key and value.

TreeMap is also heavier on GC: each insert allocates a new node. Bulk inserts in sorted order cause tree rebalancing (red‑black tree rotations) which adds CPU cost. In production, if you're doing concurrent writes, TreeMap is not thread-safe — use ConcurrentSkipListMap instead.

io/thecodeforge/sparse/TreeMapSparseArray.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
package io.thecodeforge.sparse;

import java.util.NavigableMap;
import java.util.TreeMap;

public class TreeMapSparseArray {
    private final NavigableMap<Integer, Double> map;
    private final double defaultValue;

    public TreeMapSparseArray(double defaultValue) {
        this.map = new TreeMap<>();
        this.defaultValue = defaultValue;
    }

    public void set(int index, double value) {
        if (value == defaultValue) {
            map.remove(index);
        } else {
            map.put(index, value);
        }
    }

    public double get(int index) {
        return map.getOrDefault(index, defaultValue);
    }

    public NavigableMap<Integer, Double> range(int fromIndex, int toIndex) {
        return map.subMap(fromIndex, true, toIndex, true);
    }
}
Production Insight
TreeMap's subMap() returns a view of the original map — creating many views in a tight loop leads to excessive object allocation.
If you need to iterate over a range repeatedly, copy the entries into an array once.
Also, TreeMap is not thread-safe by default — use Collections.synchronizedSortedMap() or ConcurrentSkipListMap for multi‑threaded access.
Key Takeaway
TreeMap sparse array = O(log n) access, sorted keys, overhead ~50% more than HashMap.
Best for sorted iteration and range queries.
Avoid for high‑throughput point writes.

Compressed Sparse Row (CSR) Format for Fixed Index Sets

For static sparse matrices (e.g., a graph adjacency matrix that is built once and queried many times), the CSR format is the most memory‑efficient representation. It uses three parallel arrays:

  • values[]: stores all non‑zero values in row‑major order.
  • columnIndices[]: stores the column index for each value.
  • rowPointers[]: stores the starting index in values/columnIndices for each row, plus an extra sentinel at the end.

Memory usage is roughly (2 nonZero + rows + 1) (size of value type + size of index). No per‑entry object overhead. Accessing a single element requires binary search within the row's column indices — O(log nnz_per_row) average, but row iteration is trivial.

CSR is built once and queried many times. It's the de‑facto standard for scientific computing (used in Eigen, SuiteSparse, and Apache Commons Math). But don't use it if you need to add new entries later — you'll have to rebuild the entire structure.

io/thecodeforge/sparse/CSRSparseMatrix.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
package io.thecodeforge.sparse;

import java.util.Arrays;

public class CSRSparseMatrix {
    private final double[] values;
    private final int[] columnIndices;
    private final int[] rowPointers;
    private final int rows;
    private final int cols;

    public CSRSparseMatrix(double[] values, int[] columnIndices, int[] rowPointers, int rows, int cols) {
        this.values = values;
        this.columnIndices = columnIndices;
        this.rowPointers = rowPointers;
        this.rows = rows;
        this.cols = cols;
    }

    public double get(int row, int col) {
        int start = rowPointers[row];
        int end = rowPointers[row + 1];
        int pos = Arrays.binarySearch(columnIndices, start, end, col);
        return pos >= 0 ? values[pos] : 0.0;
    }

    public int nonZeroCount() {
        return values.length;
    }
}
Production Insight
CSR is static by nature — adding new non‑zero entries later requires rebuilding the entire structure.
If you need mutability, use a HashMap or TreeMap for construction, then convert to CSR for query‑heavy workloads.
Also, binary search on column indices assumes they are sorted per row — violation produces silent wrong results.
Use an assertion at construction time to validate sort order.
Key Takeaway
CSR = lowest memory overhead, static structure, fast row iteration.
Best for sparse matrices built once and queried many times.
Not for dynamic updates — use HashMap/TreeMap for writes, then convert.

Performance Comparison and When to Use Each

Choosing the right sparse array implementation depends on your access patterns, sparsity, and mutability requirements.

FactorHashMapTreeMapCSR
Access (read)O(1) avgO(log n)O(log nnz_per_row)
InsertO(1) avgO(log n)Not supported (rebuild)
UpdateO(1)O(log n)Not supported
Memory per entry~40–48 bytes~56–72 bytes~8–16 bytes (value+index)
Sorted iterationNo (unless external sort)Yes (natural order)Yes (row‑major implicit)
Range queryO(n) scanO(log n + k)O(nnz in rows)
Thread‑safe optionConcurrentHashMapConcurrentSkipListMapRead‑only (immutable)
General rules
  • If occupancy < 1% and access is point‑wise, use HashMap.
  • If you need ordered iteration or range queries, use TreeMap.
  • If the structure is static and memory is critical, use CSR.
  • If you need concurrent writes, prefer ConcurrentHashMap or ConcurrentSkipListMap.
io/thecodeforge/sparse/PerformanceTest.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
package io.thecodeforge.sparse;

import java.util.concurrent.ThreadLocalRandom;

public class PerformanceTest {
    public static void main(String[] args) {
        int size = 10_000_000;
        int nonZero = 10_000;

        HashMapSparseArray hashMapArray = new HashMapSparseArray(0.0, nonZero);
        long start = System.nanoTime();
        for (int i = 0; i < nonZero; i++) {
            int idx = ThreadLocalRandom.current().nextInt(size);
            hashMapArray.set(idx, 1.0);
        }
        long hashMapTime = System.nanoTime() - start;
        System.out.println("HashMap insert time: " + hashMapTime / 1e6 + " ms");

        // Similar for TreeMapSparseArray and CSR (build from sorted data)
    }
}
Production Insight
Always benchmark with your actual data pattern. Theoretical O(1) can be beaten by O(log n) if the constant factors differ.
For example, HashMap's hashing overhead can dominate when the key range is small (e.g., indices 0–10k) — a simple array with sentinel may be faster.
Also, GC pressure from HashMap entry objects can cause pauses — CSR or primitive maps (Trove, HPPC) avoid that entirely.
Key Takeaway
There's no one‑size‑fits‑all sparse array.
Match the structure to your access pattern and sparsity.
When in doubt, start with HashMap, profile, then optimise.

Real-World Use Cases and Trade-offs

  • Graph adjacency matrices: Social networks with millions of nodes but few connections per node (e.g., Twitter). CSR is ideal for matrix-vector multiplications in PageRank.
  • Recommendation engines: User-item interaction matrices where most entries are unrated. HashMap-based for incremental updates, CSR for batch computations.
  • Scientific computing: Finite element simulations use CSR for stiffness matrices.
  • Game development: World map with sparse object placements — TreeMap for spatial queries.
  • Financial risk: Covariance matrices in portfolio optimization — often 99.9% sparse.

Each domain imposes different constraints. A social graph that's updated in real-time (new follows) needs a mutable structure. A static scientific matrix that's multiplied millions of times needs CSR.

io/thecodeforge/sparse/RecommendationMatrix.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
package io.thecodeforge.sparse;

import java.util.HashMap;
import java.util.Map;

public class RecommendationMatrix {
    private final Map<Long, Map<Long, Double>> userItemRatings;

    public RecommendationMatrix(int expectedUsers, int expectedItemsPerUser) {
        this.userItemRatings = new HashMap<>(expectedUsers);
    }

    public void setRating(long userId, long itemId, double rating) {
        userItemRatings
            .computeIfAbsent(userId, k -> new HashMap<>(expectedItemsPerUser))
            .put(itemId, rating);
    }

    public double getRating(long userId, long itemId) {
        Map<Long, Double> userRatings = userItemRatings.get(userId);
        if (userRatings == null) return 0.0;
        return userRatings.getOrDefault(itemId, 0.0);
    }
}
Production Insight
Nested sparse maps (HashMap of HashMaps) can cause memory explosion — each inner map has its own overhead.
Consider using a single map with a composite key (Long user + Long item) or a primitive pair library.
For batch computation, convert to CSR once and reuse.
Key Takeaway
Real-world sparse arrays often involve multiple levels of sparsity.
Profile both memory and access patterns before selecting implementation.
Periodically compact or convert structures to match the workload phase.
● Production incidentPOST-MORTEMseverity: high

The 8 TB OOM That Took Down a Financial Grid

Symptom
The service threw java.lang.OutOfMemoryError: Java heap space after 45 seconds of startup. GC logs showed repeated Full GC attempts failing to reclaim any space.
Assumption
The team assumed the matrix was dense because the raw CSV file contained 2.5 million entries. They didn't realise that 2.5M out of 250 billion possible cells is 0.001% occupancy.
Root cause
They allocated a double[][] of size 500,000 × 500,000. Each double is 8 bytes, so the array alone requires 500,000 × 500,000 × 8 = 2 × 10^12 bytes ≈ 1.8 TB. The JVM heap was set to 32 GB. The allocation triggered OOM almost immediately.
Fix
Switched to a sparse matrix representation using a HashMap<Pair<Integer,Integer>, Double> where the Pair key encodes the (row, col) coordinate. The new structure stored only the 2.5 million non-zero entries, reducing memory to ~80 MB with HashMap overhead.
Key lesson
  • Never assume occupancy from entry count — compute sparsity ratio first.
  • Always estimate memory requirements before allocating large arrays.
  • Sparse representations are not free: the HashMap overhead per entry (~32 bytes in JDK 17) must be accounted for in the memory budget.
Production debug guideCommon symptoms and immediate actions when sparse arrays behave unexpectedly.4 entries
Symptom · 01
Memory spikes when many entries are written to the sparse array
Fix
Check load factor and initial capacity of the backing map. Use a larger initial capacity to avoid resizing: new HashMap<>(expectedSize, 1.0f) or use Guava's Maps.newHashMapWithExpectedSize().
Symptom · 02
Queries on the sparse array are slower than expected, even though occupancy is low
Fix
Instrument with timing. If using TreeMap, range queries are O(log n + k). For point queries, HashMap may outperform. Consider a flat array with a sentinel default value if the index range is small enough.
Symptom · 03
CSR sparse matrix operations produce incorrect results
Fix
Verify that the column indices array and values array are parallel and sorted within each row. Use assertions to check that row pointers are monotonically non-decreasing.
Symptom · 04
ConcurrentModificationException during iteration over TreeMap sparse array
Fix
Switch to ConcurrentSkipListMap for thread-safe access, or synchronize the TreeMap using Collections.synchronizedSortedMap().
★ Quick Debug Cheat Sheet for Sparse ArraysQuick reference for diagnosing issues with HashMap‑, TreeMap‑, and CSR‑based sparse structures.
OutOfMemoryError with HashMap sparse array
Immediate action
Reduce load factor or pre-size map to expected entries.
Commands
-XX:+PrintGCDetails -XX:+PrintHeapAtGC
jmap -histo:live <pid> | head -20
Fix now
Switch to a custom int→double map (e.g., Trove, HPPC, or use java.util.HashMap<Integer,Double> with initial capacity = (int) (expectedEntries / 0.75f)).
Range queries on TreeMap sparse array are slow (minutes)+
Immediate action
Check if subMap() is being used correctly — it returns a view, but repeated calls may cause O(n log n).
Commands
JMC or async-profiler to find hot methods
Check for accidental re‑sorting: entrySet().iterator() is safe.
Fix now
If range queries are common, use a flat array with a sorted list of indices and binary search (custom sparse array).
CSR matrix multiplication gives NaN values+
Immediate action
Print the first few rows' indices and verify no gaps or duplicates.
Commands
System.out.println(Arrays.toString(rowPtr));
System.out.println(Arrays.toString(colInd));
Fix now
Ensure colInd arrays per row are sorted and rowPtr[i+1] - rowPtr[i] matches the number of non‑zeros in row i.
HashMap resize latency spikes during burst writes+
Immediate action
Check heap usage and pre-size map to avoid resize events.
Commands
jstat -gcutil <pid> 1s
jcmd <pid> GC.heap_info
Fix now
Set initial capacity using (int) (expectedEntries / 0.75f) + 1. For extremely large maps, use a load factor of 1.0f and initial capacity equal to expected entries.
Sparse Array Implementation Comparison
ImplementationAccess TimeMemory Per EntryMutabilityOrdered IterationBest For
HashMapO(1) avg~40–48 bytesSupports inserts/updatesNoDynamic, point queries
TreeMapO(log n)~56–72 bytesSupports inserts/updatesYes (sorted)Range queries, ordered iteration
CSRO(log nnz/row)~8–16 bytesRead‑only or rebuildImplicit rowsStatic matrices, memory critical

Key takeaways

1
Sparse arrays save memory by storing only non‑default elements; the data structure choice depends on access patterns and mutability.
2
HashMap gives O(1) average access but high per‑entry overhead; pre‑size it to avoid resizing.
3
TreeMap provides sorted iteration and range queries at O(log n); use ConcurrentSkipListMap for concurrency.
4
CSR is the most memory‑efficient static format, ideal for built‑once‑query‑many workloads.
5
Benchmark with your real data
theoretical complexity is only half the story; GC pressure matters.
6
Never use HashMap for very small arrays; dense array may be simpler and faster.

Common mistakes to avoid

5 patterns
×

Assuming sparse array saves memory for small arrays

Symptom
You use a HashMap-backed sparse array for an array of size 100 with 50 non‑zero entries. The memory usage is higher than a plain double[100] because each entry has overhead.
Fix
Compute the break‑even point: if occupancy > ~20–30%, a dense array is often more memory‑efficient. For small arrays, always prefer dense unless occupancy is extremely low.
×

Using HashMap without pre‑sizing, causing repeated resizing

Symptom
Adding 10 million entries to a HashMap with default capacity (16) causes ~24 rehashes, each copying all entries and causing latency spikes. GC pressure increases due to discarded old arrays.
Fix
Use new HashMap<>(expectedSize / 0.75f + 1) or Guava's Maps.newHashMapWithExpectedSize(expectedSize).
×

Forgetting that TreeMap entry iteration is not thread‑safe

Symptom
ConcurrentModificationException during iteration in a multi‑threaded context, or inconsistent state if modifying while iterating.
Fix
Use ConcurrentSkipListMap for concurrent access, or synchronize the TreeMap and its iterators explicitly.
×

Using CSR format for dynamic data sets

Symptom
Frequent rebuilds of CSR triple arrays consume CPU and cause stop‑the‑world pauses during reallocation.
Fix
Collect writes in a temporary HashMap or TreeMap, then convert to CSR once for the query phase. Or use a mutable sparse matrix format like the COO (Coordinate) list and convert later.
×

Not considering GC pressure from HashMap entry objects

Symptom
Your application runs fine initially but starts having frequent GC pauses after adding millions of entries to a HashMap sparse array. The old generation fills with Entry objects and their keys/values.
Fix
Switch to a primitive map library (Trove, HPPC, or use Int2DoubleOpenHashMap from fastutil) that stores values in flat arrays and avoids object overhead per entry. This can reduce GC pressure by 10x.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How would you implement a sparse array in Java for a use case where indi...
Q02SENIOR
What is the memory overhead of a single entry in a HashMap
Q03SENIOR
Explain how Compressed Sparse Row (CSR) format works. When would you cho...
Q04SENIOR
How would you make a sparse array thread-safe in a concurrent read-heavy...
Q05SENIOR
You have a 2D sparse matrix of size 10^6 x 10^6 with about 1 million non...
Q01 of 05SENIOR

How would you implement a sparse array in Java for a use case where indices range up to 2^31 but only 10,000 are populated? Discuss trade-offs between HashMap, TreeMap, and CSR.

ANSWER
HashMap gives O(1) average access and supports dynamic updates, but each entry has overhead (~40 bytes). TreeMap maintains sorted order for range queries at O(log n) cost. CSR is most memory‑efficient (~8 bytes per entry) but static. I'd start with HashMap, pre-sized to 10,000 / 0.75 ~ 13,334. If the business later requires range scans, I'd migrate to TreeMap or add an index. CSR is overkill unless memory is extremely constrained and the structure is built once.
FAQ · 6 QUESTIONS

Frequently Asked Questions

01
When should I use a sparse array instead of a regular array in Java?
02
Can I use a simple array of objects with null for empty slots as a sparse array?
03
How do I handle a sparse array in multi‑threaded environments?
04
What is the fastest sparse array implementation for matrix multiplication?
05
How do I estimate the memory savings of a sparse array before implementing?
06
Is there a built-in sparse array in Java standard library?
🔥

That's Arrays. Mark it forged?

4 min read · try the examples if you haven't

Previous
Copying Arrays in Java
8 / 8 · Arrays
Next
Classes and Objects in Java