Counting Inversions — 32-bit Overflow Beyond 65K Items
- Inversion count measures how unsorted an array is: integer between 0 and n(n-1)/2
- Merge sort counts inversions for free during the merge step, adding O(1) per comparison
- Performance: O(n log n) time, O(n) auxiliary space (for merge)
- Production insight: a 32-bit counter overflows once n ≥ 65,537 (the count can reach n(n-1)/2 > 2^31 − 1) — use 64-bit or arbitrary-precision integers
- Biggest mistake: using the naive O(n²) double-loop algorithm on large datasets
- Real use: Kendall tau distance for ranking similarity, minimum adjacent swaps to sort
Quick Debug Cheat Sheet for Inversion Counting
Symptom → Action guide for common problems:
- Overflow suspected: print the worst-case count with python -c 'n=100000; print(n*(n-1)//2)'. Check the type: in Python, type(count) is int (arbitrary precision, cannot overflow); in Java, check that count is a long.
- Result too slow: profile with python -m cProfile count_inversions.py, or measure a single run with timeit.timeit('count_inversions(...)', globals=globals(), number=1).
- Recursion depth error: raise the limit with import sys; sys.setrecursionlimit(100000), or use the iterative merge sort implementation (see code below).
Counting inversions is a classic application of extracting extra information during a divide-and-conquer combine step at no asymptotic cost. The merge step of merge sort compares elements — when a right element is taken before a left element, it has to jump over every remaining left element. Adding those numbers gives you the total inversion count.
This pattern — piggybacking computation on an existing algorithm — appears across algorithm design. Inversions for collaborative filtering, closest pair during a geometric divide, maximum subarray in a recurrence. The lesson: if you already have the structure, ask what else you can compute for free.
In production, this matters. A naive O(n²) solution on an array of 100,000 items takes 5 billion comparisons. Merge sort does fewer than 2 million. That's the difference between a sub-second response and a dropped request.
What is an Inversion?
A pair (i, j) is an inversion if i < j but arr[i] > arr[j].
Example: arr = [2, 4, 1, 3, 5]. Inversions (shown as value pairs): (2,1), (4,1), (4,3) → count = 3
- Measuring how similar two preference rankings are
- Minimum swaps to sort an array = number of inversions
- Kendall tau distance between two permutations
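Under this definition, a direct brute-force count is a two-line check — a minimal sketch, useful as a reference implementation for testing faster versions (the helper name is illustrative):

```python
def count_inversions_naive(arr):
    """O(n^2) reference: count pairs (i, j) with i < j and arr[i] > arr[j]."""
    n = len(arr)
    return sum(1 for i in range(n) for j in range(i + 1, n) if arr[i] > arr[j])

print(count_inversions_naive([2, 4, 1, 3, 5]))  # 3: (2,1), (4,1), (4,3)
print(count_inversions_naive([1, 2, 3, 4, 5]))  # 0: already sorted
```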
Modified Merge Sort
During the merge step of merge sort, when we pick an element from the right subarray before exhausting the left subarray, every remaining element in the left subarray forms an inversion with it.
```python
def count_inversions(arr: list) -> tuple[list, int]:
    """Returns (sorted_array, inversion_count)."""
    if len(arr) <= 1:
        return arr, 0
    mid = len(arr) // 2
    left, left_inv = count_inversions(arr[:mid])
    right, right_inv = count_inversions(arr[mid:])
    merged, split_inv = merge_count(left, right)
    return merged, left_inv + right_inv + split_inv

def merge_count(left: list, right: list) -> tuple[list, int]:
    merged, inversions = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # left[i..] all form inversions with right[j]
            inversions += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions

_, count = count_inversions([2, 4, 1, 3, 5])
print(f'Inversions: {count}')  # 3
_, count = count_inversions([5, 4, 3, 2, 1])
print(f'Inversions in reverse: {count}')  # 10
```
Output:
Inversions: 3
Inversions in reverse: 10
Why This Works
When merging two sorted halves, any element right[j] that is taken before left[i] forms inversions with all of left[i..end]. This is because:
1. left is sorted → all remaining elements left[i..] are ≥ left[i] > right[j]
2. All these elements come before right[j] in the original array (they're in the left half)
So inversions += len(left) - i counts them all in O(1).
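To make the argument concrete, here is a standalone sketch of the merge step applied to the two halves of the running example, left = [2, 4] and right = [1, 3, 5] (a reimplementation of merge_count for illustration):

```python
def merge_count(left, right):
    """Merge two sorted lists, counting split inversions along the way."""
    merged, inversions = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            inversions += len(left) - i  # right[j] jumps over all of left[i:]
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged, inversions

# 1 jumps over [2, 4] (2 inversions), then 3 jumps over [4] (1 more):
print(merge_count([2, 4], [1, 3, 5]))  # ([1, 2, 3, 4, 5], 3)
```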
Application — Minimum Swaps to Sort
The minimum number of adjacent swaps needed to sort an array equals the number of inversions. Each adjacent swap reduces the inversion count by exactly 1.
```python
def min_swaps_to_sort(arr):
    _, count = count_inversions(arr)
    return count

print(min_swaps_to_sort([3, 1, 2]))  # 2 swaps: [3,1,2] → [1,3,2] → [1,2,3]
```
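One way to sanity-check this claim is to actually perform adjacent swaps with bubble sort and count them — each swap removes exactly one inversion. A minimal sketch (the function name is illustrative):

```python
def adjacent_swaps_bubble(arr):
    """Sort a copy with adjacent swaps (bubble sort) and count the swaps."""
    a = list(arr)
    swaps = 0
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]  # one swap removes one inversion
                swaps += 1
    return swaps

print(adjacent_swaps_bubble([3, 1, 2]))        # 2, matching the inversion count
print(adjacent_swaps_bubble([5, 4, 3, 2, 1]))  # 10 = 5*4/2
```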
Handling Large Data: Overflow and Recursion Depth
For large arrays, two issues deserve attention: integer overflow and recursion depth. Python's integers are arbitrary precision, so the count itself cannot overflow; in Java, a 32-bit int overflows once the count exceeds 2^31 − 1 ≈ 2.1 × 10^9, which a reverse-sorted array reaches at n ≥ 65,537 (n(n-1)/2 pairs) — use long. Recursion depth is a smaller risk than it looks: merge sort recurses only O(log n) deep (about 17 levels for n = 100,000), well within Python's default limit of 1000 and typical JVM stack sizes. Still, an iterative bottom-up version avoids recursion entirely and sidesteps both concerns.
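The overflow boundary is easy to locate numerically — the worst case (a reverse-sorted input) produces n(n-1)/2 inversions, so a quick sketch shows where a signed 32-bit counter breaks:

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647

for n in (10_000, 50_000, 65_536, 65_537, 100_000):
    worst = n * (n - 1) // 2  # inversions in a reverse-sorted array of n items
    verdict = "overflows int32" if worst > INT32_MAX else "fits in int32"
    print(f"n={n:>7,}: worst case {worst:>13,} -> {verdict}")
```

The first problematic size is n = 65,537, but defaulting to a 64-bit accumulator costs nothing and removes the risk entirely.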
```python
def iterative_count_inversions(arr: list) -> int:
    """Iterative bottom-up merge sort counting inversions."""
    n = len(arr)
    temp = [0] * n
    inversions = 0
    size = 1
    while size < n:
        for left_start in range(0, n, 2 * size):
            mid = min(left_start + size, n)
            right_end = min(left_start + 2 * size, n)
            i = left_start
            j = mid
            k = left_start
            while i < mid and j < right_end:
                if arr[i] <= arr[j]:
                    temp[k] = arr[i]
                    i += 1
                else:
                    inversions += mid - i
                    temp[k] = arr[j]
                    j += 1
                k += 1
            while i < mid:
                temp[k] = arr[i]
                i += 1
                k += 1
            while j < right_end:
                temp[k] = arr[j]
                j += 1
                k += 1
        arr[:] = temp
        size *= 2
    return inversions

data = list(range(100000, 0, -1))
inv = iterative_count_inversions(data)
print(f'Inversions in 100k reverse-sorted: {inv}')  # n*(n-1)//2 = 4,999,950,000
```
Real-World Application: Kendall Tau Distance
Kendall tau distance measures the dissimilarity between two rankings: it is the number of discordant pairs, often normalized by the total number of pairs. The number of discordant pairs is exactly the inversion count when one ranking is rewritten in terms of the other's positions. The closely related Kendall tau correlation coefficient is (concordant − discordant) / total pairs, which is what the code below computes. This metric is used in collaborative filtering, ensemble methods, and statistical analysis.
```python
def kendall_tau(ranking_a: list, ranking_b: list) -> float:
    """Compute Kendall tau correlation between two rankings."""
    n = len(ranking_a)
    # Map each item in ranking_a to its position
    pos = {item: idx for idx, item in enumerate(ranking_a)}
    # Transform ranking_b into an array of positions relative to a
    b_positions = [pos[item] for item in ranking_b]
    # Number of discordant pairs = inversion count of b_positions
    _, inversions = count_inversions(b_positions)
    # Total pairs
    total_pairs = n * (n - 1) // 2
    # Kendall tau = (concordant - discordant) / total = 1 - 2*(inversions / total)
    tau = 1 - (2 * inversions) / total_pairs
    return tau

# Example: same ranking
print(kendall_tau([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
# Reverse ranking
print(kendall_tau([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
# Slightly different
print(kendall_tau([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]))  # 0.8
```
Output:
1.0
-1.0
0.8
| Approach | Time Complexity | Space Complexity | Stability | Ease of Implementation |
|---|---|---|---|---|
| Naive (double loop) | O(n²) | O(1) | N/A | Simple — two nested loops, but too slow for n > 10^4. |
| Merge Sort | O(n log n) | O(n) | Stable (preserves input order of equal elements) | Moderate — requires correct merge logic with inversion counting. |
| Fenwick Tree (BIT) | O(n log n) | O(n) | Depends on iteration order | More complex to understand but often faster in practice. |
🎯 Key Takeaways
- When right[j] is taken before all of left[i..end] during merge, it forms inversions with every remaining left element. inversions += len(left) - i is the O(1) count.
- Same O(n log n) as merge sort — inversion counting adds O(1) work per comparison, no asymptotic cost.
- Inversion count = minimum number of adjacent swaps to sort the array. Every adjacent swap reduces the count by exactly 1.
- Applications: Kendall tau distance between two rankings (collaborative filtering similarity), measuring how 'sorted' external data is before choosing a sort algorithm.
- O(n log n) is optimal in the comparison model — lower bound by reduction from sorting. The merge sort approach achieves this bound.
- Integer overflow is a real production risk once n reaches 65,537 (the count can reach n(n-1)/2 > 2^31 − 1) — always use 64-bit or arbitrary-precision integers.
⚠ Common Mistakes to Avoid
- Using the naive O(n²) double loop on large inputs — for n = 100,000 that is roughly 5 billion comparisons.
- Accumulating the count in a 32-bit integer: the count can reach n(n-1)/2, which overflows int once n ≥ 65,537.
- Using a strict comparison (left[i] < right[j]) in the merge: equal elements then get counted as inversions, inflating the result.
Interview Questions on This Topic
- Q: How does the merge step naturally count inversions? (Mid-level)
- Q: What is the inversion count of a reverse-sorted array of n elements? (Junior)
- Q: Why is O(n log n) achievable — what's the lower bound for counting inversions? (Senior)
- Q: How would you use inversion count to measure similarity between two rankings? (Senior)
- Q: Explain how a Fenwick tree can count inversions and compare it to merge sort. (Senior)
Frequently Asked Questions
Can we count inversions in O(n)?
No — in the comparison model, counting inversions has an Ω(n log n) lower bound, by reduction from sorting. The merge sort approach is therefore optimal among comparison-based algorithms.
What is the inversion count of an already sorted array?
Zero. No pair (i,j) with i < j has arr[i] > arr[j].
Does counting inversions require stable sorting?
No, but the merge step must preserve the original order of equal elements to avoid counting false inversions. The standard merge sort is stable; the condition left[i] <= right[j] ensures equal elements from left are placed before those from right, preventing incorrect counts.
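A small sketch makes the failure mode visible — comparing a correct <= merge against a buggy strict < on arrays of equal elements (standalone helper, parameterized by the comparison purely for illustration):

```python
def split_inversions(left, right, take_left):
    """Count split inversions while merging, using take_left as the comparison."""
    merged, inversions = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if take_left(left[i], right[j]):
            merged.append(left[i]); i += 1
        else:
            inversions += len(left) - i
            merged.append(right[j]); j += 1
    return inversions

print(split_inversions([1, 1], [1, 1], lambda a, b: a <= b))  # 0: equal pairs are not inversions
print(split_inversions([1, 1], [1, 1], lambda a, b: a < b))   # 4: strict < miscounts them
```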
Can we use a Fenwick tree to count inversions?
Yes. Iterate from right to left, query the number of smaller elements seen so far, add to count, then update the tree. This is O(n log n) with O(n) space, often faster than merge sort in practice due to lower constant factors.
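A minimal sketch of that approach, with rank compression so arbitrary values fit the tree (function names are illustrative):

```python
def count_inversions_bit(arr):
    """Count inversions with a Fenwick (binary indexed) tree: scan right to
    left, counting previously seen elements strictly smaller than the current one."""
    ranks = {v: r for r, v in enumerate(sorted(set(arr)), start=1)}
    n = len(ranks)
    tree = [0] * (n + 1)

    def update(i):
        while i <= n:
            tree[i] += 1
            i += i & -i

    def query(i):  # total count stored at ranks 1..i
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    inversions = 0
    for v in reversed(arr):
        inversions += query(ranks[v] - 1)  # strictly smaller, so equals don't count
        update(ranks[v])
    return inversions

print(count_inversions_bit([2, 4, 1, 3, 5]))  # 3
print(count_inversions_bit([5, 4, 3, 2, 1]))  # 10
```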
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.