Counting Inversions — 64-bit Overflow at 50K Items
32-bit inversion counts overflow at ~46K items.
- Inversion count measures how unsorted an array is: integer between 0 and n(n-1)/2
- Merge sort counts inversions for free during the merge step, adding O(1) per comparison
- Performance: O(n log n) time, O(n) auxiliary space (for merge)
- Production insight: integer overflow when n > 10^4 in 32-bit systems — use 64-bit
- Biggest mistake: using naive O(n²) n-squared algorithm for large datasets
- Real use: Kendall tau distance for ranking similarity, minimum adjacent swaps to sort
An inversion is a pair of positions (i, j) where i < j but arr[i] > arr[j] — an out-of-order pair. A sorted array has 0 inversions. A reverse-sorted array has n(n-1)/2 inversions. Counting inversions measures how "unsorted" an array is. Merge sort counts them for free during the merge step — when you take an element from the right half before the left half, every remaining element in the left half forms an inversion with it.
Counting inversions is a classic application of extracting extra information during a divide-and-conquer combine step at no asymptotic cost. The merge step of merge sort compares elements — when a right element is taken before a left element, it has to jump over every remaining left element. Adding those numbers gives you the total inversion count.
This pattern — piggybacking computation on an existing algorithm — appears across algorithm design. Inversions for collaborative filtering, closest pair during a geometric divide, maximum subarray in a recurrence. The lesson: if you already have the structure, ask what else you can compute for free.
In production, this matters. A naive O(n²) solution on an array of 100,000 items takes 5 billion comparisons. Merge sort does fewer than 2 million. That's the difference between a sub-second response and a dropped request.
What is an Inversion?
A pair (i, j) is an inversion if i < j but arr[i] > arr[j].
Example: arr = [2, 4, 1, 3, 5] Inversions: (2,1), (4,1), (4,3) → count = 3
- Measuring how similar two preference rankings are
- Minimum swaps to sort an array = number of inversions
- Kendall tau distance between two permutations
Modified Merge Sort
During the merge step of merge sort, when we pick an element from the right subarray before exhausting the left subarray, every remaining element in the left subarray forms an inversion with it.
Why This Works
When merging two sorted halves, any element right[j] that is taken before left[i] forms inversions with all of left[i..end]. This is because: 1. left is sorted → all remaining elements left[i..] are ≥ left[i] > right[j] 2. All these elements come before right[j] in the original array (they're in the left half)
So inversions += len(left) - i counts them all in O(1).
Application — Minimum Swaps to Sort
The minimum number of adjacent swaps needed to sort an array equals the number of inversions. Each adjacent swap reduces the inversion count by exactly 1.
Handling Large Data: Overflow and Recursion Depth
For arrays larger than 100,000 elements, two issues arise: integer overflow and recursion depth. Python's integers are arbitrary precision so overflow isn't an issue, but recursion depth may become a problem if the recursion limit is too low. Java's int overflows at n > 10^4; use long. Java recursion limit is typically ~10000, which is safe for arrays up to ~2^10000 but stack memory may be exhausted earlier. The iterative merge sort avoids both issues.
Real-World Application: Kendall Tau Distance
Kendall tau distance measures the similarity between two rankings. It is the number of discordant pairs divided by total pairs. The number of discordant pairs is exactly the inversion count when one ranking is reordered to match the order of the other. This metric is used in collaborative filtering, ensemble methods, and statistical analysis.
Inversion Count Overflow in Ranking Similarity Service
- Always use 64-bit integers for inversion counts when n > 10^4.
- Validate assumptions about data size in production — edge cases always exist.
- Add overflow detection to critical calculations: a negative count is a dead giveaway.
- Use arbitrary-precision integers (Python, BigInteger) when possible.
Key takeaways
Common mistakes to avoid
4 patternsUsing 32-bit integer for inversion count
Counting inversions incorrectly for equal elements
Using naive O(n²) algorithm for large arrays
Assuming recursion depth is always safe
Interview Questions on This Topic
How does the merge step naturally count inversions?
Frequently Asked Questions
That's Recursion. Mark it forged?
3 min read · try the examples if you haven't