Senior 5 min · March 05, 2026

Java HashSet — Mutable hashCode Causes 30% Cache Loss

A cache hit rate dropped from 85% to 55% because mutable 'lastAccessed' orphaned HashSet entries.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • A HashSet stores unique elements using a hash table with O(1) average-time add, remove, and contains
  • Backed by a HashMap — elements become keys, a dummy object is the value
  • Relies on hashCode() to pick a bucket and equals() to confirm uniqueness
  • Add() returns boolean: true if element was new, false if duplicate
  • Resizing at 75% load factor triggers O(n) rehashing — the only expensive operation
  • Biggest mistake: mutating hashCode-related fields after insertion leaves elements orphaned in wrong buckets
Plain-English First

Imagine you're organising a guest list for a party. You don't care what order people arrive — you just need to know instantly whether someone is already on the list, and you never want the same name written down twice. A HashSet is exactly that guest list. It's a bag that holds things in no particular order, refuses duplicates without complaining, and can tell you 'yes they're in' or 'no they're not' almost instantly — no matter how many thousands of names are on the list.

Every non-trivial Java application deals with collections of data. But most developers reach for ArrayList out of habit, even when uniqueness or fast lookup is the actual requirement. That habit has real costs — O(n) contains() checks, duplicates sneaking into your data, and business logic cluttered with manual deduplication. HashSet exists to make those problems disappear at the data-structure level.

The problem HashSet solves is simple but important: how do you store a group of items where order doesn't matter, duplicates are impossible, and checking membership needs to be near-instant? ArrayList fails on all three counts at scale. HashSet uses a hash table under the hood — each element gets a numeric hash code, which maps it to a bucket in an array. Lookup becomes a matter of jumping to the right bucket, not scanning the whole collection. That's how it achieves O(1) average-time contains(), add(), and remove().

By the end of this article you'll understand not just how to use HashSet, but why it's built the way it is, what happens internally when you call add(), how it plays with equals() and hashCode(), where it beats and loses to its siblings TreeSet and LinkedHashSet, and the exact mistakes that cause silent bugs in production code. You'll also walk away with answers to the three HashSet questions interviewers love to ask.

How HashSet Actually Stores Elements — The Bucket Mechanism

A HashSet is backed by a HashMap. When you call add(element), Java calls element.hashCode(), takes that integer, does some internal bit-mixing, and uses the result to pick a bucket — essentially a slot in an internal array. If that bucket is empty, the element goes in. If the bucket already has an occupant, Java calls equals() to check whether it's the same object. If equals() returns true, the element is rejected silently — no exception, no error, it just doesn't get added. If equals() returns false (a hash collision), both elements coexist in the same bucket as a linked list.

This two-step process — hashCode() to find the bucket, equals() to confirm identity — is the most important thing to understand about HashSet. It also reveals a critical contract: if two objects are equal according to equals(), they MUST return the same hashCode(). Break that contract and your HashSet will silently store duplicates as if they're different objects.

The default initial capacity is 16 buckets, with a load factor of 0.75. When 75% of the buckets are occupied, the internal array doubles in size and all elements are rehashed. This is called resizing, and it's the one expensive operation in HashSet's life.

HashSetInternals.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
import java.util.HashSet;
import java.util.Set;

public class HashSetInternals {

    public static void main(String[] args) {

        // --- Basic uniqueness guarantee ---
        Set<String> visitedUrls = new HashSet<>();

        visitedUrls.add("https://thecodeforge.io/java");
        visitedUrls.add("https://thecodeforge.io/python");
        visitedUrls.add("https://thecodeforge.io/java"); // duplicate — silently ignored

        // Size is 2, not 3. The duplicate was rejected without any error.
        System.out.println("Unique URLs tracked: " + visitedUrls.size()); // 2

        // --- O(1) membership check ---
        String urlToCheck = "https://thecodeforge.io/java";
        if (visitedUrls.contains(urlToCheck)) {
            // This lookup is near-instant whether the set has 3 or 3 million elements
            System.out.println("Already crawled: " + urlToCheck);
        }

        // --- add() returns a boolean — use it! ---
        boolean wasNew = visitedUrls.add("https://thecodeforge.io/java"); // already exists
        System.out.println("URL was newly added: " + wasNew); // false — tells you it was a duplicate

        boolean isNew = visitedUrls.add("https://thecodeforge.io/go"); // brand new
        System.out.println("URL was newly added: " + isNew); // true

        // --- Order is NOT guaranteed ---
        // Iterating a HashSet gives you elements in an unpredictable order
        System.out.println("\nAll tracked URLs (order not guaranteed):");
        for (String url : visitedUrls) {
            System.out.println("  " + url);
        }
    }
}
Output
Unique URLs tracked: 2
Already crawled: https://thecodeforge.io/java
URL was newly added: false
URL was newly added: true
All tracked URLs (order not guaranteed):
https://thecodeforge.io/go
https://thecodeforge.io/java
https://thecodeforge.io/python
Pro Tip: add() returns a boolean — don't ignore it
Most developers write 'set.add(item)' and throw away the return value. That boolean tells you whether the element was actually new. In deduplication pipelines, counting those 'false' returns tells you exactly how many duplicates you caught — that's business-critical telemetry.
Production Insight
Resizing at 75% load factor is the only expensive O(n) operation.
If you know the final size in advance, create HashSet with initial capacity: new HashSet<>(expectedSize / 0.75f + 1).
Rule: pre-size to avoid repeated resizing under high insert throughput.
Key Takeaway
HashSet = HashMap under the hood.
Two-step add: hashCode() finds bucket, equals() confirms identity.
Break the contract and you get silent duplicates.

The equals() and hashCode() Contract — Where Most Bugs Are Born

When you store Strings or Integer wrappers in a HashSet, everything works perfectly because those classes implement hashCode() and equals() correctly. The danger arrives when you create your own class and put instances of it into a HashSet.

By default, every Java object inherits hashCode() from Object, which returns a value based on the object's memory address. Two distinct objects with identical field values will produce different hash codes. To HashSet, they look like completely different elements — so it happily stores both, silently creating duplicates.

The fix is always the same: override both equals() AND hashCode() together. Never one without the other. Modern IDEs and libraries like Lombok can generate them, but you need to understand why they're needed before you trust the automation.

The contract is: if a.equals(b) is true, then a.hashCode() must equal b.hashCode(). The reverse doesn't need to hold — two objects can share a hash code (collision) but still be not-equal. Keep that asymmetry in mind.

ProductDeduplication.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class ProductDeduplication {

    // A product SKU from an e-commerce catalogue
    static class Product {
        private final String sku;
        private final String name;

        Product(String sku, String name) {
            this.sku = sku;
            this.name = name;
        }

        // Without overriding these, two Product objects with the same SKU
        // would be treated as completely different by HashSet.
        @Override
        public boolean equals(Object other) {
            if (this == other) return true; // same reference — definitely equal
            if (!(other instanceof Product)) return false; // wrong type — definitely not equal
            Product otherProduct = (Product) other;
            // Business rule: two Products are the same if their SKUs match
            return Objects.equals(this.sku, otherProduct.sku);
        }

        @Override
        public int hashCode() {
            // Must use the same field(s) as equals() — SKU drives both
            return Objects.hash(sku);
        }

        @Override
        public String toString() {
            return "Product[sku=" + sku + ", name=" + name + "]";
        }
    }

    public static void main(String[] args) {

        Set<Product> catalogue = new HashSet<>();

        // Simulating products loaded from two different data sources
        Product fromDatabaseA = new Product("BOOT-42", "Trail Boot Size 42");
        Product fromDatabaseB = new Product("BOOT-42", "Trail Boot Size 42"); // same SKU, different object
        Product differentProduct = new Product("HAT-01", "Hiking Hat");

        catalogue.add(fromDatabaseA);
        catalogue.add(fromDatabaseB); // rejected — equals() + hashCode() say it's the same product
        catalogue.add(differentProduct);

        // If we had NOT overridden equals/hashCode, size would be 3 (a silent bug!)
        System.out.println("Unique products in catalogue: " + catalogue.size()); // 2

        // contains() also works correctly now
        Product lookupProduct = new Product("HAT-01", "Hiking Hat");
        System.out.println("Catalogue contains hat: " + catalogue.contains(lookupProduct)); // true

        System.out.println("\nCatalogue contents:");
        catalogue.forEach(System.out::println);
    }
}
Output
Unique products in catalogue: 2
Catalogue contains hat: true
Catalogue contents:
Product[sku=HAT-01, name=Hiking Hat]
Product[sku=BOOT-42, name=Trail Boot Size 42]
Watch Out: Mutating a field used in hashCode() after insertion
If you add an object to a HashSet and then change a field that's part of its hashCode() calculation, the object moves to a new bucket — but HashSet doesn't know. You'll never find it again with contains(), and you'll never be able to remove() it. Treat objects in a HashSet as effectively immutable, or use only fields that never change as the basis for equals/hashCode.
Production Insight
Silent duplicates from missing hashCode/equals are hard to spot because the set just grows.
First symptom is often a memory warning or unexpected size in metrics.
Rule: always write a unit test that adds two distinct instances with equal fields and asserts size is 1.
Key Takeaway
Override equals AND hashCode together, using the same fields.
a.equals(b) -> a.hashCode() == b.hashCode().
Mutate hash fields after insertion and you lose the element forever.

HashSet vs LinkedHashSet vs TreeSet — Choosing the Right Tool

All three implement the Set interface and guarantee uniqueness, but they make different trade-offs around ordering and performance. Choosing the wrong one is a common source of subtle bugs — especially when code that 'works' in testing breaks in production because iteration order changed.

HashSet is the default choice. No ordering, pure speed. Use it whenever you don't care about the sequence of elements.

LinkedHashSet maintains insertion order — elements come out in the order you put them in. It does this by linking each bucket entry to the previous and next one with a doubly-linked list. You pay a small memory overhead but gain predictable iteration. Use it when order matters for display or reproducibility.

TreeSet maintains sorted order — elements are always iterated in their natural order (or a custom Comparator you provide). It uses a Red-Black tree internally, so all core operations are O(log n) rather than O(1). Use it when you need a sorted unique collection or range queries like headSet() and tailSet().

The rule of thumb: start with HashSet. Upgrade to LinkedHashSet when reproducibility matters. Upgrade to TreeSet when sorted order matters.

SetVariantsComparison.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetVariantsComparison {

    public static void main(String[] args) {

        // Imagine these are user-selected tags on a blog post, added in this order
        String[] selectedTags = {"java", "backend", "spring", "java", "performance", "backend"};

        // --- HashSet: unique, but order is unpredictable ---
        Set<String> hashTags = new HashSet<>();
        for (String tag : selectedTags) hashTags.add(tag);
        System.out.println("HashSet (no order guarantee): " + hashTags);
        // Output order will vary between JVM runs

        // --- LinkedHashSet: unique AND preserves insertion order ---
        Set<String> linkedTags = new LinkedHashSet<>();
        for (String tag : selectedTags) linkedTags.add(tag);
        System.out.println("LinkedHashSet (insertion order): " + linkedTags);
        // Always: [java, backend, spring, performance] — first occurrence wins

        // --- TreeSet: unique AND always alphabetically sorted ---
        Set<String> sortedTags = new TreeSet<>();
        for (String tag : selectedTags) sortedTags.add(tag);
        System.out.println("TreeSet (sorted order):        " + sortedTags);
        // Always: [backend, java, performance, spring]

        // --- Range query — a TreeSet superpower HashSet can't do ---
        TreeSet<Integer> pageNumbers = new TreeSet<>();
        for (int page = 1; page <= 20; page++) pageNumbers.add(page);

        // Give me all pages from 5 up to (but not including) 10
        System.out.println("\nPages 5 to 9: " + pageNumbers.subSet(5, 10));

        // Give me all pages from 15 onwards
        System.out.println("Pages 15+: " + pageNumbers.tailSet(15));
    }
}
Output
HashSet (no order guarantee): [java, spring, performance, backend]
LinkedHashSet (insertion order): [java, backend, spring, performance]
TreeSet (sorted order): [backend, java, performance, spring]
Pages 5 to 9: [5, 6, 7, 8, 9]
Pages 15+: [15, 16, 17, 18, 19, 20]
Interview Gold: Why can't you use subSet() on a HashSet?
subSet(), headSet() and tailSet() are defined on the NavigableSet interface, which TreeSet implements but HashSet does not. HashSet has no notion of order, so range queries are meaningless to it. If an interviewer asks about this, the answer connects interface design to internal data structure — that's the depth they're looking for.
Production Insight
Order-dependent code that works during testing can break in production if you assume iteration order of HashSet.
A staging test with low data volume may show consistent order that changes under load after resizing.
Rule: if you rely on iteration order, use LinkedHashSet or TreeSet explicitly — don't rely on HashSet's accidental stability.
Key Takeaway
HashSet: fastest, no order guaranteed.
LinkedHashSet: insertion order preserved, small memory premium.
TreeSet: sorted order, O(log n) operations, supports range queries.

Real-World Patterns — Where HashSet Actually Earns Its Keep

Theory is great, but HashSet shines brightest in specific real-world patterns. Understanding these patterns means you'll reach for the right tool instinctively.

The most common pattern is the visited/seen set — a fast-rejection gate. You're processing a stream of events, URLs, user IDs, or transactions and you want to skip anything you've already handled. A HashSet lets you check-and-mark in one add() call and the O(1) lookup means it stays fast even as the seen set grows to millions of entries.

The second pattern is set operations — intersection, union, and difference. The Collections API makes these one-liners with retainAll(), addAll(), and removeAll(). These are exactly how you'd compute shared followers between two users, unique items across two orders, or features present in one plan but not another.

A third pattern is deduplication before storage. When your application receives data from multiple sources that might overlap, running everything through a HashSet before persisting it is far cheaper than writing duplicates to a database and cleaning them up later.

SetOperationsDemo.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SetOperationsDemo {

    public static void main(String[] args) {

        // --- Pattern 1: Seen-set for deduplication in a stream ---
        System.out.println("=== Pattern 1: Deduplication Gate ===");
        Set<String> processedOrderIds = new HashSet<>();
        String[] incomingOrders = {"ORD-001", "ORD-002", "ORD-001", "ORD-003", "ORD-002"};

        for (String orderId : incomingOrders) {
            // add() returns false if already present — use that as a skip signal
            if (!processedOrderIds.add(orderId)) {
                System.out.println("Skipping duplicate order: " + orderId);
            } else {
                System.out.println("Processing new order: " + orderId);
            }
        }

        // --- Pattern 2: Set intersection — find common elements ---
        System.out.println("\n=== Pattern 2: Set Operations ===");

        // Permissions assigned to two different user roles
        Set<String> editorPermissions = new HashSet<>(
            Arrays.asList("read", "write", "comment", "upload")
        );
        Set<String> viewerPermissions = new HashSet<>(
            Arrays.asList("read", "comment")
        );

        // INTERSECTION — permissions that both roles share
        Set<String> sharedPermissions = new HashSet<>(editorPermissions);
        sharedPermissions.retainAll(viewerPermissions); // modifies in place — keep only what's in both
        System.out.println("Shared permissions: " + sharedPermissions); // [read, comment]

        // DIFFERENCE — permissions only editors have (not viewers)
        Set<String> editorOnlyPermissions = new HashSet<>(editorPermissions);
        editorOnlyPermissions.removeAll(viewerPermissions); // remove anything viewers also have
        System.out.println("Editor-only permissions: " + editorOnlyPermissions); // [write, upload]

        // UNION — all permissions across both roles combined (no duplicates)
        Set<String> allPermissions = new HashSet<>(editorPermissions);
        allPermissions.addAll(viewerPermissions); // adds only what's not already present
        System.out.println("All permissions (union): " + allPermissions); // [read, write, comment, upload]
    }
}
Output
=== Pattern 1: Deduplication Gate ===
Processing new order: ORD-001
Processing new order: ORD-002
Skipping duplicate order: ORD-001
Processing new order: ORD-003
Skipping duplicate order: ORD-002
=== Pattern 2: Set Operations ===
Shared permissions: [read, comment]
Editor-only permissions: [write, upload]
All permissions (union): [read, write, comment, upload]
Watch Out: retainAll() and removeAll() mutate the original set
Both retainAll() and removeAll() modify the Set you call them on. Always work on a defensive copy — 'new HashSet<>(originalSet)' — if you need to preserve the original. This is one of the most common sources of accidental data loss when doing set operations in production code.
Production Insight
Using HashSet as a dedup gate before DB insert is far cheaper than DB unique constraint violations.
A million adds in a HashSet takes ~200ms while a million DB inserts with duplicate checks can take minutes.
Rule: pre-size the HashSet to expected cardinality to avoid resizing overhead.
Key Takeaway
HashSet excels for dedup gates, set arithmetic, and pre-storage dedup.
Use add() return value as a skip signal.
Always copy before retainAll/removeAll to avoid mutating originals.

Performance & Resizing: When HashSet Gets Expensive

HashSet's O(1) operations are average-case. Under the hood, the array of buckets starts at size 16 with a load factor of 0.75. When the number of elements exceeds capacity * load factor, the array doubles and all elements are rehashed — that means every element's hashCode() is called again, and each lands in a new bucket. This is an O(n) operation.

If you insert a million elements into a default HashSet, it will resize about 17 times (16 -> 32 -> 64 -> ... -> 1,048,576). Each resize triggers a full rehash. The total cost of resizing can dominate your insertion time if you don't pre-size.

Pre-sizing: if you know you'll store N unique items, create the HashSet with initial capacity = (int)(N / 0.75) + 1. This avoids all resizes. The +1 accounts for rounding errors from integer division.

Another performance trap: poor hashCode() implementations that produce many collisions. When many elements land in the same bucket, the linked list grows and operations degrade to O(m) where m is the bucket length. In extreme cases, a badly distributed hashCode() can turn HashSet into a slow ArrayList. Java 8 improved this by converting long linked lists to trees when a bucket holds more than 8 elements, but that's a fallback — not an excuse for a bad hash function.

HashSetPreSizing.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
import java.util.HashSet;
import java.util.Set;

public class HashSetPreSizing {

    public static void main(String[] args) {
        int expectedSize = 1_000_000;
        // Calculate optimal initial capacity to avoid resizing
        int initialCapacity = (int)(expectedSize / 0.75f) + 1;
        
        long start = System.nanoTime();
        Set<Integer> sizedSet = new HashSet<>(initialCapacity);
        for (int i = 0; i < expectedSize; i++) {
            sizedSet.add(i);
        }
        long durationSized = System.nanoTime() - start;
        
        start = System.nanoTime();
        Set<Integer> defaultSet = new HashSet<>();
        for (int i = 0; i < expectedSize; i++) {
            defaultSet.add(i);
        }
        long durationDefault = System.nanoTime() - start;
        
        System.out.println("Pre-sized HashSet: " + durationSized / 1_000_000 + " ms");
        System.out.println("Default HashSet: " + durationDefault / 1_000_000 + " ms");
        // Pre-sizing typically cuts insertion time by 30-50% for large sets
    }
}
Output
Pre-sized HashSet: 480 ms
Default HashSet: 760 ms
Memory vs Speed Trade-off
Pre-sizing allocates a larger internal array upfront, which uses more memory but avoids the CPU cost of repeated rehashing. For most applications, the memory overhead is trivial compared to the speed gain. Use pre-sizing when you know the final size within 20%.
Production Insight
In high-throughput services, unplanned resizing can cause latency spikes as the GC kicks in to clean old arrays.
If your HashSet holds millions of entries and you see periodic GC pauses, check resizing costs.
Rule: always pre-size when the approximate data volume is known at design time.
Key Takeaway
Default capacity=16, load factor=0.75.
Resizing doubles array and rehashes all elements — O(n) cost.
Pre-size using expectedSize / 0.75 + 1 to avoid resizing entirely.
● Production incidentPOST-MORTEMseverity: high

The Cache That Lost 30% of Its Entries Overnight

Symptom
The cache hit rate dropped from 85% to 55% within hours of deployment. contains() returned false for objects that had been added minutes earlier, causing expensive recomputation.
Assumption
The team assumed that once an object was added to the HashSet, it would remain discoverable forever. They had overridden equals() and hashCode() based on a mutable 'lastAccessed' field to speed up eviction logic.
Root cause
When 'lastAccessed' changed, the object's hashCode changed. The bucket location changed, but the set still held the old bucket. contains() looked in the new bucket, didn't find it, and returned false. The object was effectively orphaned but still consuming memory — a slow memory leak.
Fix
Changed the hashCode() to use only immutable fields (ID + creation timestamp). The 'lastAccessed' field was moved out of hashCode/equals. Added a unit test that mutates the object and verifies the set still contains it after mutation.
Key lesson
  • Never include mutable fields in hashCode() or equals() for objects stored in a HashSet
  • Treat objects in a HashSet as immutable — or design them so that identity fields never change
  • Write tests that explicitly mutate objects after insertion and verify set membership
Production debug guideCommon symptoms and the exact actions to diagnose and fix them4 entries
Symptom · 01
contains() returns false for an element you know was added
Fix
Check if the element's hashCode() changed after insertion. Print the hash code at add time and at check time. Look for mutable fields in hashCode().
Symptom · 02
Set size is larger than expected, duplicates present
Fix
Verify that equals() and hashCode() are both overridden consistently. Two objects with identical field values must return same hashCode(). Print hash codes for suspected duplicates.
Symptom · 03
Iterating the set shows elements in different order across runs
Fix
That's expected for HashSet — the order is not guaranteed. If reproducible order is needed, switch to LinkedHashSet.
Symptom · 04
NullPointerException when adding null?
Fix
HashSet allows one null. NPE occurs only if you call hashCode() on null. But HashSet handles null internally. If you see NPE, check if you're using TreeSet (throws NPE on null).
★ Quick Hash Set Debug CommandsCommands and code snippets to quickly diagnose common HashSet problems in production
Element not found after mutation
Immediate action
Print hash codes before and after mutation to confirm change
Commands
System.out.println(element.hashCode()); // before mutation
System.out.println(element.hashCode()); // after mutation
Fix now
Redesign hashCode() to use only immutable fields. Remove mutable fields from equals() as well.
Set contains duplicates of custom objects+
Immediate action
Print field-by-field comparison for suspected duplicates
Commands
System.out.println(obj1.equals(obj2) + " " + (obj1.hashCode() == obj2.hashCode()));
// Also check if both overrides exist: run 'javap -p' on the class
Fix now
Override both equals() and hashCode() using the same fields. Use Objects.hash() for safety.
Set size grows too large — possible memory leak+
Immediate action
Dump the set contents and check for orphaned objects
Commands
System.out.println("Set size: " + set.size());
set.forEach(e -> System.out.println(e + " hash=" + e.hashCode()));
Fix now
If orphaned objects found, they were mutated after insertion. The only fix is to avoid mutation of hash/equals fields.
Feature / AspectHashSetLinkedHashSetTreeSet
Internal structureHashMap (array of buckets)LinkedHashMap (buckets + linked list)Red-Black Tree
Allows duplicatesNoNoNo
Allows nullYes (one null)Yes (one null)No — throws NullPointerException
Iteration orderUnpredictableInsertion orderSorted (natural or Comparator)
add() / remove() / contains()O(1) averageO(1) averageO(log n)
Memory overheadLowMedium (extra linked list)High (tree node objects)
Use when you needFast uniqueness, no order careUnique + reproducible orderUnique + sorted + range queries
Implements NavigableSetNoNoYes
Thread-safeNoNoNo

Key takeaways

1
HashSet gives you O(1) average-time add(), remove(), and contains() by using a hash table
that's why it beats ArrayList for membership checks at scale, where ArrayList's contains() is O(n).
2
HashSet is backed by a HashMap
your elements are stored as keys with a dummy constant as the value. Everything you know about how HashMap hashes keys applies directly to HashSet elements.
3
If you store custom objects in a HashSet, you must override both equals() AND hashCode() using the same fields
skip either one and you get silent duplicate storage, which is one of the hardest bugs to diagnose.
4
The three Set implementations form a clear hierarchy of trade-offs
HashSet for raw speed with no ordering, LinkedHashSet for insertion-order preservation, and TreeSet for sorted order and range queries — pick based on what your code actually needs, not habit.
5
Pre-size your HashSet when you know the approximate number of elements to avoid expensive resize/rehash operations. Formula
initialCapacity = (int)(expectedSize / 0.75f) + 1.

Common mistakes to avoid

3 patterns
×

Forgetting to override hashCode() when overriding equals()

Symptom
Two distinct objects with identical field values are both added to the HashSet, silently creating duplicates. The set grows bigger than expected. contains() returns false for fields you know are present because they end up in different buckets.
Fix
Always override both equals() and hashCode() together. Use the same fields in both methods. Use Objects.hash(field1, field2) for a safe, null-tolerant hashCode() implementation.
×

Mutating an object's hashCode-contributing field after adding it to a HashSet

Symptom
contains() returns false for an element you're certain is in the set. remove() fails silently. The object is stranded in the wrong bucket forever, causing a slow memory leak as the orphaned element never gets cleaned up.
Fix
Design objects stored in a HashSet to be effectively immutable. Ensure fields used in hashCode() are declared final and set only in the constructor. If mutation is unavoidable, remove the object from the set before mutation and re-add it afterwards.
×

Using HashSet when thread-safe set is needed in concurrent environment

Symptom
Intermittent data corruption, elements mysteriously disappearing, or infinite loops during concurrent iteration. The internal arrays can become inconsistent when multiple threads add/remove simultaneously.
Fix
Use Collections.synchronizedSet(new HashSet<>()) for coarse-grained locking, or ConcurrentHashMap.newKeySet() for better concurrent throughput (backed by ConcurrentHashMap, avoids full lock on every operation).
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR
HashSet is backed by a HashMap internally — can you explain what that me...
Q02SENIOR
If two objects have the same hashCode() but equals() returns false, what...
Q03SENIOR
You add an object to a HashSet, then modify a field that's included in i...
Q01 of 03JUNIOR

HashSet is backed by a HashMap internally — can you explain what that means in practice, and what gets stored as the key and value in that underlying map?

ANSWER
When you add an element to a HashSet, the element becomes the key in the internal HashMap. The value is a shared dummy Object constant (PRESENT). All keys in a HashMap must be unique, so the set enforces uniqueness automatically. HashMap operations like put() are reused for add(), containsKey() for contains(), etc. The hash table structure with buckets and collision handling is identical. This also means that the requirements for a good key in HashMap (immutable, proper equals/hashCode) apply exactly to elements stored in HashSet.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
Can a Java HashSet contain null values?
02
Does HashSet maintain insertion order in Java?
03
What is the difference between HashSet and HashMap in Java?
04
How does HashSet handle hash collisions?
05
Is HashSet thread-safe?
🔥

That's Collections. Mark it forged?

5 min read · try the examples if you haven't

Previous
HashMap in Java
5 / 21 · Collections
Next
TreeMap and TreeSet in Java