Java TreeSet Comparator — How It Silently Drops Duplicates
TreeSet comparator defined only on score caused 950 leaderboard entries to be silently dropped.
- TreeMap/TreeSet maintain automatic sorted order via Red-Black Tree
- Keys/elements must implement Comparable or receive a Comparator
- Range queries (subMap, headSet) are O(log n) - the killer feature
- Two elements are equal if compare() returns 0, not if equals() says so
- Null keys throw NullPointerException; null values are allowed
- Mutation of key fields after insertion silently corrupts the tree
Imagine a library where books are always kept in alphabetical order on the shelves — no matter when a new book arrives, the librarian slots it into exactly the right spot. That's a TreeSet or TreeMap. Unlike a pile of books on the floor (a HashMap), every insertion maintains perfect order automatically. You pay a small price in speed, but you gain the superpower of always knowing what's first, last, or nearest to any value.
Every non-trivial Java application eventually needs sorted data — a leaderboard ranked by score, a task scheduler ordered by deadline, a word-frequency map listed alphabetically. You could dump everything into an ArrayList and call Collections.sort() every time, but that's like re-alphabetising your entire library every time a new book arrives. Java's TreeMap and TreeSet solve this permanently by keeping data sorted at all times, automatically, on every insert and delete.
The problem with HashMap and HashSet is that they trade order for raw speed. That's a great deal when you only need to answer 'does this key exist?' or 'give me this value'. But the moment a user asks 'show me the top 10 scores' or 'find everything between Tuesday and Friday', a HashMap forces you to scan the entire structure. A TreeMap answers those range and boundary questions in O(log n) time because it's backed by a Red-Black Tree — a self-balancing binary search tree that keeps everything sorted without you lifting a finger.
By the end of this article you'll know exactly when to reach for a TreeMap versus a HashMap, how to exploit the NavigableMap and NavigableSet APIs to slice your data in powerful ways, and which subtle mistakes cause ClassCastExceptions and incorrect ordering in production. You'll also have concrete, real-world patterns you can copy directly into your own codebase.
TreeSet — When Order Is Non-Negotiable
A TreeSet is a Set — so it never stores duplicates — but unlike HashSet it keeps every element in its natural sorted order at all times. 'Natural order' means whatever Comparable defines for that type: alphabetical for Strings, ascending for numbers, chronological for dates.
The real payoff isn't just iteration order. It's the NavigableSet API: methods like first(), last(), floor(), ceiling(), headSet(), and tailSet() let you slice and query your sorted data without loops. Want every username that comes after 'mike' alphabetically? One line. Want the five closest scores below a threshold? One line.
Internally, TreeSet wraps a TreeMap and stores your elements as keys. Every add(), remove(), and contains() runs in O(log n) — slower than HashSet's O(1) average, but a fair trade when you genuinely need sorted access. Don't use TreeSet by default; use it when the ordering itself is part of the feature you're building.
first(), last(), or range methods.TreeMap — Sorted Keys With All the Power of a Map
TreeMap is to HashMap what TreeSet is to HashSet: it keeps all its keys in sorted order at all times, and it exposes a rich NavigableMap API for range queries, boundary lookups, and sliced views.
The classic use case is any kind of timeline, schedule, or rank-ordered dataset. Imagine you're building a task scheduler. Each task has a deadline (the key) and a description (the value). A TreeMap lets you instantly find the next task due, all tasks due before a certain date, or every task in a given window — without scanning the entire map.
TreeMap also implements SortedMap and NavigableMap, which means you get firstKey(), lastKey(), floorKey(), ceilingKey(), lowerKey(), higherKey(), headMap(), tailMap(), and subMap() — all backed by the same O(log n) Red-Black Tree. You can even get a descending view of the entire map with descendingMap(). These aren't convenience extras; they're the whole reason to choose TreeMap over HashMap.
NavigableMap and NavigableSet — The Real Reason to Choose Tree
The true power of TreeMap and TreeSet lies in their Navigable interfaces — not just sorted iteration, but the ability to slice, dice, and query your data in logarithmic time. These methods don't copy data; they return live views that reflect ongoing mutations.
- floorKey(k) / ceilingKey(k): Find the nearest key less than/equal or greater than/equal to a target.
- lowerKey(k) / higherKey(k): Strict versions of above (exclusive).
- headMap(k, inclusive): All keys less than (or ≤) k.
- tailMap(k, inclusive): All keys greater than (or ≥) k.
- subMap(from, fromInclusive, to, toInclusive): A contiguous slice.
- descendingMap() / descendingSet(): Reverse-order views without sorting copy.
- pollFirstEntry() / pollLastEntry(): Retrieve and remove boundary entries.
Use these when you need to answer 'what's the next available slot?', 'find all orders within a date range', or 'get the top 5 scores' without ever calling sort().
- floor/lower find the page before or at the topic
- headMap/tailMap give you the entire chapter before or after
- subMap is the exact page range you're looking for
- descendingMap flips the book upside down without reprinting it
Custom Ordering With Comparator — When Natural Order Isn't Enough
Natural order works great for Strings and numbers, but what about your own objects? And what if you want reverse order, or a case-insensitive sort? That's where a Comparator comes in.
Both TreeMap and TreeSet accept a Comparator in their constructor. Once you supply one, it completely overrides natural order — even if your objects implement Comparable. This is powerful, but there's a critical rule: the Comparator you use for ordering becomes the equality definition for the collection. Two elements are considered duplicates if comparator.compare(a, b) == 0, not if a.equals(b). This surprises many developers.
A practical pattern: pass Comparator.reverseOrder() to get a descending TreeMap, which is perfect for leaderboards where you want the highest value first. Or pass String.CASE_INSENSITIVE_ORDER to a TreeSet of strings so 'Apple' and 'apple' are treated as the same entry. The Comparator is your formatting rule for the entire collection.
equals() considers different, the TreeSet or TreeMap will treat them as the same element and silently discard the second one. Always make your Comparator consistent with equals() — or at least be deliberate when it isn't (like the case-insensitive example above, where the behaviour is intentional).equals() is a common source of data loss.equals() causes silent data loss.TreeMap vs HashMap vs TreeSet vs HashSet — Choosing the Right Tool
The most common mistake is defaulting to HashMap or HashSet out of habit and then discovering you need sorted access later — at which point you're either converting structures or adding sort calls everywhere. Choosing right from the start costs nothing.
The decision is straightforward: if you need to ask range questions ('everything between X and Y'), boundary questions ('what's the closest element to Z?'), or if iteration order must be predictable and sorted, use the Tree variant. If you only ever ask point questions ('does this key exist?', 'give me this value') and don't care about order, stick with the Hash variant for its faster O(1) average operations.
LinkedHashMap and LinkedHashSet are the middle ground — they maintain insertion order but don't sort. Use them when you want predictable iteration without the overhead of sorting. The table below captures the key trade-offs so you can settle this question in seconds during a code review.
TreeMap vs HashMap — Detailed Comparison
While both implement Map, TreeMap and HashMap serve fundamentally different purposes. The table below highlights the key differences you need to consider when choosing.
| Feature | TreeMap | HashMap |
|---|---|---|
| Internal structure | Red-Black Tree | Hash Table (array of buckets) |
| Iteration order | Always sorted (natural or Comparator) | Unpredictable (effectively random) |
| get / contains | O(log n) worst case | O(1) average, O(n) worst case |
| put / add | O(log n) worst case | O(1) average, O(n) worst case |
| Range queries | Built-in O(log n) | Not supported |
| Null keys | Not allowed (NPE) | One null key allowed |
| Null values | Allowed | Allowed |
| Thread safety | Not thread-safe | Not thread-safe |
| Memory overhead | Higher (tree node pointers) | Lower (flat array) |
| Best used when | Sorted iteration or range queries needed | Maximum speed, order doesn't matter |
| Use case example | Task scheduler, leaderboard, price range queries | Caching, ID lookups, configuration maps |
This table is a quick reference for code reviews and design decisions. When in doubt, ask: "Do I need to answer 'what's between X and Y'?" If yes, TreeMap is the answer.
TreeSet vs HashSet — Detailed Comparison
The same trade-offs apply to the Set variants. Here's a direct comparison.
| Feature | TreeSet | HashSet |
|---|---|---|
| Internal structure | Red-Black Tree (backed by TreeMap) | Hash Table |
| Iteration order | Always sorted | Unpredictable |
| add / contains / remove | O(log n) | O(1) average, O(n) worst |
| Range queries | Built-in (subSet, headSet) O(log n) | Not supported |
| Null elements | Not allowed (NPE) | One null allowed |
| Thread safety | Not thread-safe | Not thread-safe |
| Memory overhead | Higher (tree node pointers) | Lower |
| Best used when | Need sorted elements or range searches | Maximum performance, order irrelevant |
| Use case example | Unique sorted scores, auto-complete lists | Deduplication, presence checks |
Like the Map counterparts, the decision hinges on whether you ever need ordered iteration or range queries. If you only need uniqueness and fast membership tests, HashSet is the better choice.
Advantages and Disadvantages of TreeMap and TreeSet
Both TreeMap and TreeSet share the same underlying data structure (Red-Black Tree), so their strengths and weaknesses are nearly identical.
Advantages
| Advantage | Explanation |
|---|---|
| Automatic sorting | Elements are always in order without manual sorting. |
| Rich Navigable API | floor, ceiling, headSet, subSet, descendingMap — all O(log n). |
| Live views | Range views reflect mutations to the backing collection instantly. |
| Guaranteed log(n) performance | All operations are O(log n) worst-case, no performance cliffs. |
| Comparator injection | Custom ordering without modifying the element class. |
| Natural ordering fallback | Works with any Comparable type out of the box. |
Disadvantages
| Disadvantage | Explanation |
|---|---|
| Slower than hash-based counterparts | O(log n) vs O(1) for get/put/add. 40% more overhead for single-key ops. |
| No null keys/elements | NullPointerException on null keys/elements. |
| Higher memory consumption | Tree nodes store pointers to left, right, parent, and color (Red/Black). |
| Comparator inconsistency risk | Silent data loss if compare() returns 0 for non-identical objects. |
| Not thread-safe | Requires external synchronization or ConcurrentSkipListMap for concurrent access. |
| Key mutation danger | Mutating key fields after insertion breaks the tree without warning. |
These trade-offs are fundamental to the data structure. Use the advantages when you need sorted ordering; avoid the disadvantages by carefully designing keys and comparators.
Advanced NavigableMap Operations: descendingMap, pollFirstEntry, pollLastEntry
Beyond the basic floor/ceiling and sub-range views, NavigableMap offers three powerful methods for boundary manipulation: descendingMap(), pollFirstEntry(), and pollLastEntry(). These are especially useful for leaderboards, job queues, and any scenario where you need to process items in order while mutating the collection.
- descendingMap(): Returns a reverse-order view of the map. No new data is created; iteration over the descending map is just as fast and memory-cheap as iterating forward. Combined with
subMap(), you can easily get slices in descending order. - pollFirstEntry(): Retrieves and removes the entry with the smallest key. This is a single O(log n) operation that both reads and deletes the first entry. Equivalent to
firstEntry()+remove(firstKey())but atomic and more efficient. - pollLastEntry(): Retrieves and removes the entry with the largest key. Same benefits as pollFirstEntry for the other end.
These methods are commonly used in priority queues where you want to serve items in order while draining the structure. They are also great for implementing round-robin or timeout processing without additional locking or iteration.
Practice Problems: Interval Scheduling, Range Queries, Sliding Window
The best way to internalize the power of TreeMap and TreeSet is to solve problems that naturally require sorted order and range queries. Below are five practice problems you can implement to sharpen your skills.
1. Interval Scheduling with TreeSet Given a list of intervals (start, end), find the maximum number of non-overlapping intervals you can schedule. Use a TreeSet to store intervals sorted by end time. Iterate through sorted intervals, and when you find one that doesn't overlap with the last scheduled, add it. Demonstrates or floor() for boundary checks.last()
2. Range Sum Queries on a Dynamic Array Maintain a TreeMap of cumulative sums or directly a TreeMap<Integer, Integer> where keys are indices. Support operations: add value at index, query sum between indices. Use subMap() to get entries in the range and compute the sum. Simulates a Fenwick tree without the complexity.
3. Sliding Window Median Given a stream of integers and a window size k, find the median of each window. Use two TreeSets (or one TreeSet with order statistics) to maintain left and right halves. Use and last() to get the max of left and min of right. Demonstrates maintaining balance and using boundary methods.first()
4. Nearest Neighbor in 1D Given a set of points on a line (integers), support insert(x) and query(x) that returns the closest point to x. Use TreeSet and its and floor() methods. Compare distances to find nearest. Classic example of why NavigableSet exists.ceiling()
5. Task Scheduler with Deadline Constraints Given tasks with deadlines and durations, find if all tasks can be scheduled. Use a TreeMap to store tasks sorted by deadline. Iterate from earliest deadline, subtract duration from current time, and if you ever go negative, impossible. Uses firstKey() and pollFirstEntry() efficiently.
Each problem can be solved in < 50 lines of Java using TreeMap/TreeSet. Implement them to gain confidence with the APIs discussed in this article.
The Case of the Missing Leaderboard Entries
compare() return 0 only for truly identical players.- Comparator IS the equality definition in TreeSets and TreeMaps
- Always add a tiebreaker unless you intentionally want duplicates suppressed
- Test with deliberately equal sort keys to verify behavior
Contains() or get() returns unexpected results after mutationKey takeaways
equals() says. Always add a tiebreaker or you'll silently lose data.Common mistakes to avoid
3 patternsStoring objects in TreeSet without Comparable or a Comparator
Employee to a TreeSet. On the second insertion you get a runtime ClassCastException: 'Employee cannot be cast to Comparable'. The first insert works because the tree is empty and no comparison is needed.add() because that's when TreeSet first needs to compare two elements.Comparator.compare(a,b) == 0 causes silent data loss
compare() returns 0 only when the objects are genuinely the same.Mutating a key's fields after inserting it into a TreeMap or TreeSet
Interview Questions on This Topic
How does TreeMap maintain its balance, and what is the time complexity of a re-balancing operation?
Frequently Asked Questions
That's Collections. Mark it forged?
10 min read · try the examples if you haven't