Longest Consecutive Sequence — O(n²) Bug in Login Streak
A missing guard caused 50M inner steps for a 10k streak — the O(n²) bug that almost killed login streaks.
20+ years shipping performance-critical code where algorithms decide the bill. Lessons pulled from things that broke in production.
- The algorithm uses a HashSet to find the longest run of consecutive numbers in O(n) time.
- Only start counting from sequence starts — numbers without a left neighbour in the set.
- Each number is visited at most twice: once as a start candidate, once during a forward walk.
- Total work stays O(n) because the inner while loop never revisits elements already counted.
- In production, forgetting the start guard turns O(n) into O(n²) for large inputs.
Imagine you have a shoebox full of random numbered tickets — 4, 2, 100, 3, 1, 101. You want to find the longest unbroken run of numbers, like 1-2-3-4. Instead of sorting all the tickets first, you tip them into a lookup board (a hash set) so you can instantly check 'is this number here?' Then for each ticket, you only start counting a run if it's the very first in a chain — meaning the number one below it is NOT on the board. That's the entire trick: the hash set turns 'does this number exist?' from a slow search into an instant yes-or-no, letting you count every chain in one pass.
Consecutive sequence problems appear everywhere in the real world — think about detecting unbroken login streaks in a fitness app, finding contiguous ID ranges in a database audit, or identifying runs of sensor readings in IoT pipelines. Whenever you need to find structure inside a pile of unsorted numbers, this problem and its pattern keep showing up. It's also one of the most frequently asked medium-level questions at FAANG-tier interviews because it forces you to think about data access patterns, not just loops.
The naive approach — sort the array and scan for breaks — costs O(n log n) and tempts almost everyone at first glance. But sorting is expensive, and it destroys the original order. The real insight is that you don't need sorted order at all. You just need to answer one question in O(1): 'Does the number X exist in the input?' A hash set is purpose-built for exactly that. With it, you can scan the array once and trace every consecutive chain from its true starting point, bringing the total cost down to O(n).
By the end of this article you'll understand exactly why the hash set approach works and not just mechanically how to code it, you'll be able to trace through any example by hand, spot the two most common implementation bugs before they bite you, and confidently explain your reasoning in an interview when the follow-up questions start flying.
Why Consecutive Sequence Detection Fails in Streak Systems
A longest consecutive sequence is a contiguous run of integers where each element is exactly one greater than the previous, found in an unsorted array. The core mechanic: you must identify all numbers that form uninterrupted chains — not just sorted order, but adjacency by value. For example, [100, 4, 200, 1, 3, 2] yields sequence [1,2,3,4] of length 4.
The key property: only the start of a sequence matters. If a number has no left neighbor (n-1 absent), it's a potential start. From there, you walk forward counting consecutive increments. This avoids O(n log n) sorting and achieves O(n) time with a hash set. The trick is that each element is visited at most twice — once to check for start, once to extend — giving linear total work.
Use this when you need to detect user login streaks, session continuity, or any time-series where gaps break the chain. In production, streak logic often fails because teams sort first (O(n log n)) or use nested loops that degenerate to O(n²) under real data patterns. The O(n) hash-set approach is the only safe choice for large, unsorted datasets.
Why Sorting Is the Wrong First Instinct
When most developers see 'find the longest run of consecutive numbers', their brain jumps straight to sorting. Sort the array, walk through it, count runs, track the max. Clean, simple, done. And it works — but it's costing you more than you need to pay.
Sorting is O(n log n). For an array of 10 million elements, that's a meaningful difference from O(n). More importantly, the sorting instinct reveals a deeper misunderstanding: you're paying to impose order on the entire dataset when you only care about local neighbourhood relationships — 'is num+1 in the array?' and 'is num-1 in the array?'.
This is the mental shift that separates intermediate from advanced thinking. Instead of restructuring the data, change how you query it. A HashSet gives you O(1) membership checks. Once every number is in the set, you can ask 'does 5 exist?' or 'does 6 exist?' for free, without touching any sorted structure.
The sorted approach also doesn't handle duplicates gracefully without extra logic — you have to skip them manually. The hash set absorbs duplicates automatically because a set stores only unique values. Two problems solved with one data structure choice.
The Core Insight — Only Count From Sequence Starts
Here's the question that unlocks everything: what would happen if you started counting a chain from every single number in the set? You'd recount the same sequence multiple times. Starting from 1 you'd count 1→2→3→4 (length 4). Starting from 2 you'd count 2→3→4 (length 3). Starting from 3, length 2. You'd do redundant work proportional to the square of the sequence length in the worst case.
The genius of the algorithm is the guard: only begin a chain-count when the current number has no left-neighbour in the set. If num - 1 is absent, then num must be the true beginning of a consecutive run. You can safely count forward from there, knowing you'll never double-count that sequence.
This single condition — !numberSet.contains(num - 1) — is what keeps the whole algorithm at O(n). Every number is visited at most twice: once when you check it as a potential sequence start, and once when it's stepped over during a forward count from an earlier start. Those two passes across n elements together are still O(n).
Think of it like painting. You only dip your brush at the leftmost edge of each white stripe. You never start mid-stripe. The result: every stripe gets painted exactly once.
numberSet, not the original numbers array. This means duplicates are never visited a second time during the outer loop, tightening the constant factor of your O(n) runtime. It's a small detail that signals precision in interviews.!numberSet.contains(num - 1) is the entire reason for O(n).Proving O(n) — Walking Through the Work Count
When interviewers push back with 'but isn't there a nested while loop inside your for loop — doesn't that make it O(n²)?', this is the moment that separates candidates who understand their own code from those who memorised it.
The answer lies in amortised analysis. Ask: how many total times can any single number be visited by the while loop? Exactly once. The while loop only runs when we're at a confirmed sequence start. Every number that gets consumed by a while-loop forward-walk is never itself a sequence start (because its predecessor exists in the set), so it will hit the continue in the outer for-loop and never trigger its own while-walk.
Total outer loop iterations: at most n (one per unique number). Total inner while-loop steps across ALL iterations: also at most n, because each of the n numbers is stepped over by a while loop at most once. That's 2n operations total — O(n).
This is the same amortised reasoning used to prove that a dynamic array's push operation is O(1) amortised. The expensive step is rare and its cost is spread evenly across all operations. Knowing this explanation cold will visibly impress an interviewer.
Real-World Application — Detecting Login Streak Gaps in a User System
Here's a scenario you'll actually encounter. Your app tracks which calendar days (as day-numbers since epoch) a user logged in. You want to find the user's longest unbroken login streak. The login days come from a database query — unsorted, possibly containing duplicates if the schema logs multiple events per day.
This is the Longest Consecutive Sequence problem in production clothing. The 'array of integers' is your list of day numbers. The 'consecutive sequence' is unbroken daily logins. The hash set approach handles duplicates (multiple logins on one day) without special casing, works in O(n) time even for users with years of history, and doesn't require you to sort a potentially large dataset.
The same pattern applies to: finding contiguous free blocks in a memory allocator, detecting unbroken sensor uptime windows in an IoT dashboard, finding runs of consecutive student attendance days, and identifying sequential order IDs to spot gaps in an audit trail.
Whenever your domain has 'find the longest run of something that can be modelled as consecutive integers', this algorithm is your friend.
bestStreakStart alongside longestStreak — it's a one-line addition and immediately makes the result actionable in a UI (e.g., 'Your best streak ran from Jan 5 to Jan 9').Edge Cases, Variants, and Follow-Up Questions
The basic algorithm is elegant, but interviewers love to push on the edges. Let's cover the most common follow-ups.
Integer underflow at MIN_VALUE: num - 1 when num = Integer.MIN_VALUE wraps to Integer.MAX_VALUE. The guard will think MAX_VALUE exists and skip a legitimate start. Fix: cast to long before subtraction.
Returning the actual sequence: Track bestStart and bestLength. After the loop, reconstruct: IntStream.range(bestStart, bestStart + bestLength).toArray(). That's a common extension.
Distributed variant: What if the data is sharded across thousands of servers? You can't load everything into a single hash set. The pattern shifts to MapReduce: map each number to its bucket (num / segmentSize), find sequences per bucket, then merge across bucket boundaries. This appears in real-time streaming systems.
Memory-constrained environment: If you can't afford O(n) memory, you're back to sorting (O(1) extra space) or use a Bloom filter as a probabilistic membership test if exactness isn't required. That's a rare but advanced trade-off.
num - 1 overflows to Integer.MAX_VALUE. Always cast to long when doing arithmetic on array values in this algorithm.The Hash Set Trap — Why O(n) Isn't Really O(n)
Most devs stop at "use a hash set for O(1) lookups." That's naive. In production, the hash function itself is the hidden multiplier. Java's HashSet uses hashCode() — for integers, that's the value itself. Simple, right? But the bucket array resizes. Rehashing costs O(n) per resize. If your input array has 10 million elements, you're not getting O(n). You're getting multiple O(n) passes as the table grows.
Senior engineers pre-size the HashSet to avoid resize storms. If you know the array length, use new HashSet<>(initialCapacity, loadFactor). The default load factor is 0.75. For 100k elements, set initial capacity to 133k. That's one allocation, zero resizes. The code below shows the difference between naive and pre-sized sets. The output proves it: pre-sizing cuts rehash count to zero.
This isn't academic. A streak detection system polling 50k users every minute will crater under resize pauses. Pre-size or pay the latency tax.
Memory Is the Real Bottleneck — Compact the Ints
You're processing 10 million ints. Each autoboxed Integer costs 16 bytes (object header + int + padding). That's 160 MB just for the data. Plus the hash table's internal arrays. You'll hit GC before you hit O(n).
Real solutions: Use a primitive int set. Libraries like Koloboke or HPPC provide IntHashSet — stores ints directly, 4 bytes each. Or use BitSet if your sequence values are bounded and non-negative. BitSet maps values to bits. 10 million ints in a bounded range = 1.25 MB. The tradeoff is that negative numbers or huge gaps blow up memory. But for streak IDs or user IDs that are sequential, BitSet is absurdly efficient.
The code below compares memory for 100k ints. Object overhead killed the HashSet. The output shows heap usage difference.
Production lesson: O(n) in time is meaningless if memory pressure kills throughput. Profile your data distribution before choosing the data structure.
The O(n²) That Almost Killed a Login Streak Feature
if (numberSet.contains(num - 1)) continue;, the inner while loop runs for every number, not just sequence starts. For a sequence of length L, it's executed L times from each element, turning the algorithm into O(n²). Here, a single streak of 10,000 consecutive logins caused 10,000 * 10,000 / 2 ≈ 50 million inner steps.- Always validate O(n) claims with data from both worst-case (one long chain) and worst-case-looking (many short chains) inputs.
- Amortised analysis isn't just theory — it's the difference between a 200ms response and a failed deployment.
- When performance degrades unexpectedly in prod, suspect hidden quadratic behaviour from missing loop guards.
for (int num : numberSet).!numberSet.contains((long) num - 1). Java's int underflows MIN_VALUE - 1 to MAX_VALUE, breaking the guard.Copy input into a HashSet and log its size: Set<Integer> set = new HashSet<>(Arrays.asList(input)); System.out.println(set.size());Assert that set size matches unique count: assert set.size() == expectedUniqueCount;for (int num : numbers) with for (int num : numberSet).Key takeaways
!numberSet.contains(num - 1)) is the entire reason the nested loop doesn't blow up to O(n²)Common mistakes to avoid
4 patternsMissing the sequence-start guard
if (numberSet.contains(num - 1)) continue; before the inner while loop. This single line keeps the algorithm linear.Iterating the original array instead of the set in the outer loop
for (int num : numbers) with for (int num : numberSet). Iterating the set guarantees each unique value is processed exactly once.Forgetting null or empty input handling
if (numbers == null || numbers.length == 0) return 0;.Not accounting for integer overflow with MIN_VALUE
!numberSet.contains(num - 1) incorrectly checks for Integer.MAX_VALUE (because of wraparound), potentially skipping a legitimate sequence start. The result is a missed streak.contains() calls. Or use a Set<Long>.Practice These on LeetCode
Interview Questions on This Topic
Can you walk me through why your solution is O(n) even though there's a while loop nested inside a for loop?
Frequently Asked Questions
20+ years shipping performance-critical code where algorithms decide the bill. Lessons pulled from things that broke in production.
That's Hashing. Mark it forged?
8 min read · try the examples if you haven't