Senior 3 min · March 24, 2026

Counting Inversions — 64-bit Overflow at 50K Items

32-bit inversion counts overflow at ~46K items.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • Inversion count measures how unsorted an array is: integer between 0 and n(n-1)/2
  • Merge sort counts inversions for free during the merge step, adding O(1) per comparison
  • Performance: O(n log n) time, O(n) auxiliary space (for merge)
  • Production insight: integer overflow when n > 10^4 in 32-bit systems — use 64-bit
  • Biggest mistake: using naive O(n²) n-squared algorithm for large datasets
  • Real use: Kendall tau distance for ranking similarity, minimum adjacent swaps to sort
Plain-English First

An inversion is a pair of positions (i, j) where i < j but arr[i] > arr[j] — an out-of-order pair. A sorted array has 0 inversions. A reverse-sorted array has n(n-1)/2 inversions. Counting inversions measures how "unsorted" an array is. Merge sort counts them for free during the merge step — when you take an element from the right half before the left half, every remaining element in the left half forms an inversion with it.

Counting inversions is a classic application of extracting extra information during a divide-and-conquer combine step at no asymptotic cost. The merge step of merge sort compares elements — when a right element is taken before a left element, it has to jump over every remaining left element. Adding those numbers gives you the total inversion count.

This pattern — piggybacking computation on an existing algorithm — appears across algorithm design. Inversions for collaborative filtering, closest pair during a geometric divide, maximum subarray in a recurrence. The lesson: if you already have the structure, ask what else you can compute for free.

In production, this matters. A naive O(n²) solution on an array of 100,000 items takes 5 billion comparisons. Merge sort does fewer than 2 million. That's the difference between a sub-second response and a dropped request.

What is an Inversion?

A pair (i, j) is an inversion if i < j but arr[i] > arr[j].

Example: arr = [2, 4, 1, 3, 5] Inversions: (2,1), (4,1), (4,3) → count = 3

Applications
  • Measuring how similar two preference rankings are
  • Minimum swaps to sort an array = number of inversions
  • Kendall tau distance between two permutations
Production Insight
In massive datasets, inversion count can overflow 32-bit integers. Use long (64-bit) when n > 10^4.
Python's arbitrary-precision ints handle this automatically; Java and C++ must be explicit.
Rule: if n * (n-1) > 2^31 start, use 64-bit.
Key Takeaway
Inversion count = number of out-of-order pairs.
Sorted array = 0, reverse-sorted = n(n-1)/2.
Use long for n > 10^4 to avoid overflow.

Modified Merge Sort

During the merge step of merge sort, when we pick an element from the right subarray before exhausting the left subarray, every remaining element in the left subarray forms an inversion with it.

count_inversions.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
def count_inversions(arr: list) -> tuple[list, int]:
    """Returns (sorted_array, inversion_count)"""
    if len(arr) <= 1:
        return arr, 0
    mid = len(arr) // 2
    left,  left_inv  = count_inversions(arr[:mid])
    right, right_inv = count_inversions(arr[mid:])
    merged, split_inv = merge_count(left, right)
    return merged, left_inv + right_inv + split_inv

def merge_count(left: list, right: list) -> tuple[list, int]:
    merged, inversions = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # left[i..] all form inversions with right[j]
            inversions += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions

_, count = count_inversions([2, 4, 1, 3, 5])
print(f'Inversions: {count}')  # 3

_, count = count_inversions([5, 4, 3, 2, 1])
print(f'Inversions in reverse: {count}')  # 10
Output
Inversions: 3
Inversions in reverse: 10
Production Insight
Merge sort recursion depth is O(log n), safe for n up to about 10^7 in CPython (default recursion limit 1000 gives max depth ~2^1000, but stack memory is limiting factor). For n > 10^7, use iterative bottom-up merge to avoid RecursionError.
In Java, stack depth is also limited; prefer iterative or increase stack size with -Xss.
Key Takeaway
Counting inversions during merge adds O(1) per comparison.
Total O(n log n) time, O(n) space.
Switch to iterative merge sort when array exceeds 10 million elements.

Why This Works

When merging two sorted halves, any element right[j] that is taken before left[i] forms inversions with all of left[i..end]. This is because: 1. left is sorted → all remaining elements left[i..] are ≥ left[i] > right[j] 2. All these elements come before right[j] in the original array (they're in the left half)

So inversions += len(left) - i counts them all in O(1).

Complexity
O(n log n) — same as merge sort. The inversion counting adds only O(1) work per comparison in the merge step.
Production Insight
Edge case: equal elements. Inversions count only when left[i] > right[j] (strict). The condition left[i] <= right[j] means no inversion. This matches the definition where equal elements are not out-of-order.
If your application requires counting inversions as pairs where arr[i] >= arr[j], adjust the comparison to > instead of >=.
Key Takeaway
Inversion count formula: when left[i] > right[j], inversions += len(left) - i.
Equal elements are not inversions.
Condition matters: use > for strict inversions.

Application — Minimum Swaps to Sort

The minimum number of adjacent swaps needed to sort an array equals the number of inversions. Each adjacent swap reduces the inversion count by exactly 1.

min_swaps.pyPYTHON
1
2
3
4
5
def min_swaps_to_sort(arr):
    _, count = count_inversions(arr)
    return count

print(min_swaps_to_sort([3, 1, 2]))  # 2 swaps: [3,1,2]→[1,3,2]→[1,2,3]
Output
2
Production Insight
In production sorting, knowing minimum swaps helps estimate distance from sorted state. Used in approximate sorting algorithms and pre-sorting checks. For non-adjacent swaps, inversion count is not directly the answer — that problem requires greedy selection or cycle decomposition.
For adjacent swaps only, inversion count is exact.
Key Takeaway
Every adjacent swap reduces inversion count by exactly 1.
Minimum adjacent swaps to sort = inversion count.
For non-adjacent swaps, use a different approach.

Handling Large Data: Overflow and Recursion Depth

For arrays larger than 100,000 elements, two issues arise: integer overflow and recursion depth. Python's integers are arbitrary precision so overflow isn't an issue, but recursion depth may become a problem if the recursion limit is too low. Java's int overflows at n > 10^4; use long. Java recursion limit is typically ~10000, which is safe for arrays up to ~2^10000 but stack memory may be exhausted earlier. The iterative merge sort avoids both issues.

io/thecodeforge/inversions/iterative.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
def iterative_count_inversions(arr: list) -> int:
    """Iterative bottom-up merge sort counting inversions."""
    n = len(arr)
    temp = [0] * n
    inversions = 0
    size = 1
    while size < n:
        for left_start in range(0, n, 2 * size):
            mid = min(left_start + size, n)
            right_end = min(left_start + 2 * size, n)
            i = left_start
            j = mid
            k = left_start
            while i < mid and j < right_end:
                if arr[i] <= arr[j]:
                    temp[k] = arr[i]
                    i += 1
                else:
                    inversions += mid - i
                    temp[k] = arr[j]
                    j += 1
                k += 1
            while i < mid:
                temp[k] = arr[i]
                i += 1
                k += 1
            while j < right_end:
                temp[k] = arr[j]
                j += 1
                k += 1
        arr[:] = temp
        size *= 2
    return inversions

data = list(range(100000, 0, -1))
inv = iterative_count_inversions(data)
print(f'Inversions in 100k reverse-sorted: {inv}')
Output
Inversions in 100k reverse-sorted: 4999950000
Production Insight
Python's recursion limit is typically 1000, which accommodates arrays up to ~2^1000 size — impractical. However, stack memory grows with recursive calls. For 10 million elements, log2(1e7) ≈ 24, which is safe. For 1 billion elements, log2(1e9) ≈ 30 — still safe. The real limit is the available C stack. In Java, default stack size may handle only about 10,000 calls, which limits n to 2^10000, but heap memory is the bottleneck first.
Use iterative merge for extremely large arrays to avoid any stack concerns.
Key Takeaway
Recursion depth O(log n) is safe for n up to ~10^9.
Python recursion limit is not a practical problem for any realistic array.
For safety in any language, use iterative merge sort.

Real-World Application: Kendall Tau Distance

Kendall tau distance measures the similarity between two rankings. It is the number of discordant pairs divided by total pairs. The number of discordant pairs is exactly the inversion count when one ranking is reordered to match the order of the other. This metric is used in collaborative filtering, ensemble methods, and statistical analysis.

kendall_tau.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
def kendall_tau(ranking_a: list, ranking_b: list) -> float:
    """Compute Kendall tau correlation between two rankings."""
    n = len(ranking_a)
    # Map each item in ranking_a to its position
    pos = {item: idx for idx, item in enumerate(ranking_a)}
    # Transform ranking_b into an array of positions relative to a
    b_positions = [pos[item] for item in ranking_b]
    # Number of discordant pairs = inversion count of b_positions
    _, inversions = count_inversions(b_positions)
    # Total pairs
    total_pairs = n * (n - 1) // 2
    # Kendall tau = (concordant - discordant) / total = 1 - 2*(inversions / total)
    tau = 1 - (2 * inversions) / total_pairs
    return tau

# Example: same ranking
print(kendall_tau([1,2,3,4,5], [1,2,3,4,5]))  # 1.0
# Reverse ranking
print(kendall_tau([1,2,3,4,5], [5,4,3,2,1]))  # -1.0
# Slightly different
print(kendall_tau([1,2,3,4,5], [2,1,3,4,5]))  # 0.8
Output
1.0
-1.0
0.8
Production Insight
In production recommendation systems, computing Kendall tau between user ranking histories can find similar users. With O(n log n) inversion count, you can compare thousands of users against each other efficiently. For item catalogs of millions, bucket by common items and compute only on overlapping sets.
Integer overflow: ranking length up to 10^5 gives ~5e9 pairs, which fits in Python's int but would overflow Java's int. Use long.
Key Takeaway
Kendall tau = 1 - 2 * inversion_count / nC2.
Inversion count of reordered ranking gives discordant pairs.
Lower tau = more similar rankings.
● Production incidentPOST-MORTEMseverity: high

Inversion Count Overflow in Ranking Similarity Service

Symptom
Ranking similarity returned negative values for large catalogs (>50,000 items). Kendall tau would occasionally be outside [-1,1].
Assumption
Inversion count fits in a 32-bit integer. The team assumed max users with 10,000 items would produce at most 50 million inversions, which is below 2^31.
Root cause
Some users had catalogs of 80,000 items, giving max inversions ~3.2 billion, exceeding 2^31-1 (2.147 billion). The inversion count wrapped around to negative values, causing Kendall tau formula to produce nonsensical results.
Fix
Changed all inversion count variables from int to long (64-bit) in Java and Python code. Added automated overflow detection: assert count >= 0 always. Added monitoring to alert if count > 1 billion.
Key lesson
  • Always use 64-bit integers for inversion counts when n > 10^4.
  • Validate assumptions about data size in production — edge cases always exist.
  • Add overflow detection to critical calculations: a negative count is a dead giveaway.
  • Use arbitrary-precision integers (Python, BigInteger) when possible.
Production debug guideSymptom → Action guide for common problems4 entries
Symptom · 01
Negative inversion count
Fix
Check integer overflow: use 64-bit types (long, Python int). Verify max inversions for array length = n*(n-1)/2.
Symptom · 02
Incorrect count for reverse-sorted array
Fix
Verify merge logic: inversions += len(left)-i when left[i] > right[j]. Ensure no off-by-one errors.
Symptom · 03
Stack overflow on large array
Fix
Switch to iterative merge sort or increase recursion limit (sys.setrecursionlimit) in Python, or increase stack size with -Xss in JVM.
Symptom · 04
O(n²) runtime for naive algorithm
Fix
Profile and replace with O(n log n) merge sort or Fenwick tree/BIT approach.
★ Quick Debug Cheat Sheet for Inversion CountingFast commands and fixes for common inversion counting problems in production Python code.
Overflow suspected
Immediate action
Print maximum inversion count for array length: n*(n-1)//2
Commands
python -c 'n=100000; print(n*(n-1)//2)'
Check type: type(count) should be int (Python has arbitrary precision). In Java, check if count is long.
Fix now
Use Python's arbitrary-precision ints; in Java use long; in databases use BIGINT.
Result too slow+
Immediate action
Check if using O(n²) algorithm
Commands
python -m cProfile count_inversions.py
Measure: timeit.timeit('count_inversions(...)', globals=globals(), number=1)
Fix now
Replace naive double loop with merge sort inversion counting (O(n log n)).
Recursion depth error+
Immediate action
Check input size
Commands
import sys; sys.setrecursionlimit(100000)
Use iterative merge sort implementation (see code above).
Fix now
Increase recursion limit or switch to iterative merge sort.
Comparison of Inversion Counting Approaches
ApproachTime ComplexitySpace ComplexityStabilityEase of Implementation
Naive (double loop)O(n²)O(1)N/ASimply two nested loops — but too slow for n > 10^4.
Merge SortO(n log n)O(n)Stable (preserves input order of equal elements)Moderate — requires correct merge logic with inversion counting.
Fenwick Tree (BIT)O(n log n)O(n)Depends on iteration orderMore complex to understand but often faster in practice.

Key takeaways

1
When right[j] is taken before all of left[i..end] during merge, it forms inversions with every remaining left element. inversions += len(left) - i is the O(1) count.
2
Same O(n log n) as merge sort
inversion counting adds O(1) work per comparison, no asymptotic cost.
3
Inversion count = minimum number of adjacent swaps to sort the array. Every adjacent swap reduces the count by exactly 1.
4
Applications
Kendall tau distance between two rankings (collaborative filtering similarity), measuring how 'sorted' external data is before choosing a sort algorithm.
5
O(n log n) is optimal
lower bound by reduction from sorting. The merge sort approach achieves this bound.
6
Integer overflow is a real production risk for n > 10^4
always use 64-bit or arbitrary-precision integers.

Common mistakes to avoid

4 patterns
×

Using 32-bit integer for inversion count

Symptom
Inversion count becomes negative or wraps around for arrays larger than ~10,000 elements. In Java, overflow silently produces negative or incorrect values.
Fix
Use Python int (unbounded) or Java long (64-bit). In SQL, use BIGINT. For extremely large arrays, consider BigInteger.
×

Counting inversions incorrectly for equal elements

Symptom
Count off by the number of equal pairs that are not actually inversions. For arrays with many duplicates, error is significant.
Fix
Use strict inequality (arr[i] > arr[j]) to count inversions. Use >= only if your definition includes equal elements as inversions.
×

Using naive O(n²) algorithm for large arrays

Symptom
Runtime explodes for arrays > 10,000 elements. An array of 100,000 takes ~10 billion comparisons (unacceptable).
Fix
Adopt O(n log n) merge sort or Fenwick tree/BIT approach. Profile with cProfile to confirm.
×

Assuming recursion depth is always safe

Symptom
RecursionError in Python or StackOverflowError in Java for very large arrays (n > 10^7) or deeply nested recursion from degenerate input.
Fix
Use iterative merge sort (bottom-up). Increase recursion limit only as a temporary workaround.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How does the merge step naturally count inversions?
Q02JUNIOR
What is the inversion count of a reverse-sorted array of n elements?
Q03SENIOR
Why is O(n log n) achievable — what's the lower bound for counting inver...
Q04SENIOR
How would you use inversion count to measure similarity between two rank...
Q05SENIOR
Explain how Fenwick tree can count inversions and compare it to merge so...
Q01 of 05SENIOR

How does the merge step naturally count inversions?

ANSWER
During merge, when we pick an element from the right subarray before the left subarray is exhausted, that right element is smaller than all remaining left elements. Since they appear after it in the original array, each remaining left element forms an inversion with this right element. So we add len(left) - i to the inversion count for each such right element. This counts all inversions without any extra overhead.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
Can we count inversions in O(n)?
02
What is the inversion count of an already sorted array?
03
Does counting inversions require stable sorting?
04
Can we use a Fenwick tree to count inversions?
🔥

That's Recursion. Mark it forged?

3 min read · try the examples if you haven't

Previous
Divide and Conquer — Strategy and Patterns
6 / 9 · Recursion
Next
Closest Pair of Points — Divide and Conquer