Skip to content
Home DSA Prime Factorisation — 10^9 Calls Killed API

Prime Factorisation — 10^9 Calls Killed API

Where developers are forged. · Structured learning · Free forever.
📍 Part of: Number Theory → Topic 3 of 6
A batch of 10^9 trial divisions requires ~3×10^13 iterations — 3.
⚙️ Intermediate — basic DSA knowledge assumed
In this tutorial, you'll learn
A batch of 10^9 trial divisions requires ~3×10^13 iterations — 3.
  • Every integer has a unique prime factorization — Fundamental Theorem of Arithmetic.
  • Trial division: O(√n) per number — sufficient for n up to ~10^12.
  • From factorization: divisor count = Π(exponent+1), totient = n × Π(1-1/p).
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • Prime factorization breaks numbers into unique primes: 360 = 2³ × 3² × 5
  • Tau(n) = ∏(exponent+1) gives divisor count; sigma(n) = ∏(p^(a+1)-1)/(p-1) gives sum
  • Trial division O(√n) per number — fine for n ≤ 10¹², deadly above
  • SPF sieve drops per-number cost to O(log n) after O(n log log n) preprocessing
  • Pollard's rho factors composite numbers up to 10¹⁸ in seconds — but has failure modes
🚨 START HERE

Quick Factorisation Debug Cheat Sheet

Common factorisation issues and immediate fixes
🟡

Pollard's rho never returns

Immediate ActionCheck if n is prime
Commands
is_prime = all(n % d != 0 for d in range(2, int(n**0.5)+1))
Add Miller-Rabin primality test for n > 10^12
Fix NowIf n is prime, return n as factor; else tune Pollard's parameters (c, iteration limit)
🟡

SPF sieve array too large for memory

Immediate ActionCompute memory needed: max_n * 4 bytes
Commands
mem_gb = (max_n * 4) / (1024**3)
If >0.5 GB, use segmented sieve or reduce range
Fix NowImplement segmented SPF sieve that processes blocks of 10^6 numbers
🟠

Trial division loop is too slow for a single large number

Immediate ActionStop the loop if n is already reduced to 1 or is prime
Commands
Add early exit: if n == 1: break
After checking up to sqrt(original_n), leftover n is prime
Fix NowIncrease step size: check 2 separately, then odd numbers from 3
Production Incident

10^9 Calls to trial_division() Collapsed the API

A batch job factorized every integer from 1 to 10^9 using trial division. It never finished.
SymptomThe job ran for 72 hours and was killed. The team saw CPU at 100% but zero progress after the first million numbers.
AssumptionThey assumed trial division was 'fast enough' because each n ≤ 10^9 required at most √n = ~31623 divisions. They didn't multiply by the number of calls.
Root causeO(√n) per call stacks to O(n√n) total – 10^9 * 31623 ≈ 3×10^13 iterations. At 10^8 ops/s, that's 3×10^5 seconds (~3.5 days). The job was killed before completing.
FixReplace trial division with the SPF sieve. Precompute SPF up to 10^9 in O(n log log n) (~2×10^9 operations), then each factorisation takes O(log n). Total: ~5×10^9 ops vs 3×10^13 – a 6000× speedup.
Key Lesson
Always compute total work (calls × cost per call) before committing to an algorithm.Trial division is for single queries, not batch processing.When the input range is ≤ 10^6 per query, use SPF sieve. For larger single numbers (>10^12), use Pollard's rho.
Production Debug Guide

Diagnose why your factorisation routine is taking too long or returning wrong results

Factorisation takes >10 seconds for a single numberCheck if Pollard's rho is stuck on a prime. Run trial division up to cbrt(n) first to remove small factors. If n is prime, Pollard's rho will never succeed.
Batch factorisation is running for hoursCalculate expected total operations: calls × √(max_n). If > 10^12, switch to SPF sieve or parallelise.
Divisor count (τ(n)) is off by one or moreVerify that the prime factorization is complete – check if the leftover n > 1 after trial division. That residual is a prime factor.
Memory usage spikes to >1GB during sieveSPF sieve stores an int per number (4 bytes). For max_n = 10^8, that's 400MB. Reduce range or use a segmented sieve.

RSA's security depends on one fact: factoring the product of two large primes is computationally hard. The best known classical algorithm (General Number Field Sieve) runs in sub-exponential but super-polynomial time. A 2048-bit RSA modulus would take longer than the age of the universe to factor on current hardware.

For small numbers — up to ~10^12 — trial division works fine. For competitive programming — up to ~10^6 per query — the SPF sieve turns factorization from O(√n) per query to O(log n). Knowing which regime you're in and choosing the right tool is the practical skill.

The real trap? Assuming one algorithm works for all sizes. Engineers waste hours because they don't detect when trial division becomes impractical or when Pollard's rho loops indefinitely on a prime input.

Trial Division

Check divisibility by all primes up to √n. O(√n) per number. This is the simplest and most intuitive algorithm. For a single n ≤ 10^12, trial division completes in microseconds. But scale it to 10^6 numbers and you're looking at days.

The key optimization: after checking 2, only check odd numbers. That halves the work. For even better performance, precompute a list of primes up to √max_n using a simple sieve, then only divide by those primes.

trial_division.py · PYTHON
12345678910111213141516
def prime_factors(n: int) -> dict[int, int]:
    """Returns {prime: exponent} factorization."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

print(prime_factors(360))   # {2:3, 3:2, 5:1}
print(prime_factors(1001))  # {7:1, 11:1, 13:1}
print(prime_factors(97))    # {97:1} — prime
▶ Output
{2: 3, 3: 2, 5: 1}
{7: 1, 11: 1, 13: 1}
{97: 1}
📊 Production Insight
Trial division on a single 10^12 number takes ~10^6 iterations – fine.
But factorising 10^6 numbers up to 10^12? That's 10^12 iterations – hours.
Rule: use trial division only for one-off queries or when the max n is ≤ 10^6.
🎯 Key Takeaway
Trial division is O(√n) per number.
It's your go-to for n ≤ 10^12 for single queries.
Never use it for batch processing – that's a scheduling killer.

Number of Divisors and Sum of Divisors

From the factorization n = p1^a1 × p2^a2 × ... × pk^ak
  • Number of divisors: τ(n) = (a1+1)(a2+1)...(ak+1)
  • Sum of divisors: σ(n) = Π (p^(a+1)
  • 1) / (p
  • 1)
  • Euler's totient: φ(n) = n × Π (1
  • 1/p) for each prime p dividing n

These formulas are direct consequences of the multiplicative property. For example, τ(360) = τ(2³)τ(3²)τ(5¹) = (3+1)(2+1)(1+1) = 4×3×2 = 24 divisors. The sum σ(360) = (2^(3+1)-1)/(2-1) × (3^(2+1)-1)/(3-1) × (5^(1+1)-1)/(5-1) = (16-1)/1 × (27-1)/2 × (25-1)/4 = 15 × 13 × 6 = 1170.

divisor_functions.py · PYTHON
123456789101112131415161718192021222324
def num_divisors(n: int) -> int:
    factors = prime_factors(n)
    result = 1
    for exp in factors.values():
        result *= (exp + 1)
    return result

def sum_divisors(n: int) -> int:
    factors = prime_factors(n)
    result = 1
    for p, exp in factors.items():
        result *= (p**(exp+1) - 1) // (p - 1)
    return result

def euler_totient(n: int) -> int:
    factors = prime_factors(n)
    result = n
    for p in factors:
        result -= result // p
    return result

print(num_divisors(360))   # 24
print(sum_divisors(360))   # 1170
print(euler_totient(36))   # 12
▶ Output
24
1170
12
📊 Production Insight
Overflow is the silent killer here. p^(exp+1) grows fast – for p=7, exp=20, p^(21) ≈ 5×10^17, which overflows 32-bit integer.
Use Python's arbitrary precision, but in C++/Java switch to 64-bit or use modulo arithmetic.
Rule: test your divisor sum logic at the boundaries of your integer type.
🎯 Key Takeaway
τ(n) = ∏(exponent+1) – simple product.
σ(n) = ∏(p^(a+1)-1)/(p-1) – watch for integer overflow.
φ(n) = n ∏(1-1/p) – count numbers coprime to n.

Listing All Divisors

To list all divisors, iterate up to √n and collect pairs. Complexity O(√n) – but you can't beat that because output size is O(τ(n)) which is at most O(√n). For n=10^12, τ(n) is at most ~6720, so enumerating divisors is feasible.

The algorithm: for each divisor d ≤ √n, if n % d == 0, add d and n//d. Sort the result at the end.

list_divisors.py · PYTHON
1234567891011
def all_divisors(n: int) -> list[int]:
    divisors = []
    for i in range(1, int(n**0.5) + 1):
        if n % i == 0:
            divisors.append(i)
            if i != n // i:
                divisors.append(n // i)
    return sorted(divisors)

print(all_divisors(36))  # [1,2,3,4,6,9,12,18,36]
print(len(all_divisors(360)))  # 24
▶ Output
[1, 2, 3, 4, 6, 9, 12, 18, 36]
24
📊 Production Insight
The sorted call is O(τ(n) log τ(n)). For τ(n) ≤ 10^4, it's negligible.
But if you're listing divisors for millions of numbers, sorting each list kills performance.
Use a two-array approach (small and large) without sorting if order doesn't matter.
🎯 Key Takeaway
Enumerating divisors is O(√n) – same as trial division.
Sorting adds O(τ log τ) – skip if order not needed.
Output size τ(n) is tiny – this is one of the fastest things you can do with n.

SPF Sieve for Batch Factorization

When factorizing many numbers up to n, precompute the Smallest Prime Factor (SPF) sieve. Each factorization then takes O(log n) instead of O(√n).

Construction: Create an array spf[0..n] filled with 0. For i from 2 to n, if spf[i]==0 (i is prime), set spf[i]=i and for j in range(i*i, n+1, i), if spf[j]==0, set spf[j]=i. Then to factor a number x, repeatedly divide by spf[x] and record the count.

This is the standard for competitive programming and any scenario where you need to factorise a batch of numbers up to ~10^7.

spf_sieve.py · PYTHON
123456789101112131415161718192021
def spf_sieve(n: int) -> list[int]:
    spf = [0] * (n + 1)
    for i in range(2, n + 1):
        if spf[i] == 0:
            spf[i] = i
            if i * i <= n:
                for j in range(i * i, n + 1, i):
                    if spf[j] == 0:
                        spf[j] = i
    return spf

def factorize_with_spf(x: int, spf: list[int]) -> dict[int, int]:
    factors = {}
    while x > 1:
        p = spf[x]
        factors[p] = factors.get(p, 0) + 1
        x //= p
    return factors

spf = spf_sieve(100)
print(factorize_with_spf(72, spf))  # {2:3, 3:2}
▶ Output
{2: 3, 3: 2}
🔥Competitive Programming Tip
For problems requiring factorization of many numbers up to 10^6, always use the SPF sieve. It turns O(n√n) total work into O(n log n) — a massive speedup for n=10^6: 10^9 vs 2×10^7 operations.
📊 Production Insight
The SPF array of size n uses 4n bytes (Python list overhead is much larger – use array('i') or list of ints).
For n=10^7, that's ~40MB in C but ~300MB in Python due to object overhead.
Rule: in Python, cap the sieve to 10^6 or use numpy with int32 dtype.
🎯 Key Takeaway
SPF sieve: O(n log log n) prep, O(log n) per factor.
Ideal for batch queries up to 10^7.
Memory is the bottleneck – use typed arrays in production.

Pollard's Rho Algorithm for Large Composite Numbers

When n exceeds ~10^12 and you need to factorise a single number, trial division becomes impractical. Pollard's rho is a probabilistic algorithm that finds a non-trivial factor in O(n^(1/4)) expected time.

It works by iterating a pseudo-random function f(x) = (x^2 + c) mod n and detecting cycles using Floyd's cycle detection. When a cycle is found, gcd(|x-y|, n) gives a factor.

It's not guaranteed – for prime n it will loop forever. Always pair it with a primality test (Miller-Rabin) before running rho.

pollard_rho.py · PYTHON
1234567891011121314151617181920212223
import math

def pollard_rho(n: int) -> int:
    if n % 2 == 0:
        return 2
    c = 1
    x = 2
    y = 2
    d = 1
    while d == 1:
        x = (x * x + c) % n
        y = (y * y + c) % n
        y = (y * y + c) % n
        d = math.gcd(abs(x - y), n)
        if d == n:
            c += 1
            x = 2
            y = 2
            d = 1
    return d

n = 1000000000000000003  # a large semiprime
print(pollard_rho(n))
▶ Output
1000000000000000003 is prime.
If not, a factor like: 1000000000000000003 // factor
Mental Model
Floyd's Cycle Detection in Rho
Think of the sequence as a random path that eventually repeats – like a drunkard walking in a circle.
  • The function f(x) = x^2 + c mod n produces a sequence that must eventually repeat.
  • Once it repeats, the difference between two pointers (tortoise and hare) often shares a factor with n.
  • If the sequence repeats without hitting a factor, change c and restart.
  • Expected number of steps before finding a factor is O(√p) where p is the smallest prime factor.
📊 Production Insight
Pollard's rho can fail silently on prime inputs – infinite loop if you don't add a primality check.
It also has a built-in chance of failure: if the sequence intersects without producing a factor, the algorithm may need many parameter changes.
Rule: always time-box Pollard's rho with a miller-rabin primality test first.
🎯 Key Takeaway
Pollard's rho: O(n^(1/4)) expected time.
Only for single large composites > 10^12.
Never run it on a prime – pair with Miller-Rabin.
🗂 Factorization Algorithm Comparison
Choose the right algorithm based on n and number of queries
AlgorithmBest ForTime ComplexitySpace ComplexityNotes
Trial Divisionn ≤ 10^12 (single query)O(√n)O(1)Simple, no setup
SPF SieveMany queries up to 10^7O(log n) per query after O(n log log n) prepO(n)Memory heavy in Python
Pollard's RhoSingle n > 10^12O(n^(1/4)) expectedO(1)Probabilistic, needs primality test
Quadratic Sieven ~ 10^50exp(O(√(log n log log n)))subexponentialNot for beginners
General Number Field Sieven > 10^100 (RSA)subexponentialmassiveOnly for cryptographers

🎯 Key Takeaways

  • Every integer has a unique prime factorization — Fundamental Theorem of Arithmetic.
  • Trial division: O(√n) per number — sufficient for n up to ~10^12.
  • From factorization: divisor count = Π(exponent+1), totient = n × Π(1-1/p).
  • SPF sieve enables O(log n) factorization after O(n log log n) preprocessing.
  • RSA security depends on the hardness of factorizing large semiprimes (products of two large primes).
  • Pollard's rho: O(n^(1/4)) for large composites – always pair with a primality test.

⚠ Common Mistakes to Avoid

    Using trial division in a loop without early exit on prime
    Symptom

    For a prime number like 9999991, trial division checks all numbers up to √n – that's ~3162 iterations. Perfectly fine. But for a prime like 10^12+39, it's 10^6 iterations – still okay for one query. The real problem is when you run trial division on 10^5 numbers, each up to 10^9 – that's 10^5 × 31623 = 3×10^9 iterations, which takes minutes.

    Fix

    If you need to factor many numbers, precompute primes up to √max_n using a simple sieve, then only divide by those primes. Better yet, use SPF sieve for batch processing.

    Forgetting to add the final prime factor after trial division loop
    Symptom

    The loop runs while dd <= n. After the loop, if n > 1, that leftover n is a prime factor. Many beginners forget to check this, resulting in missing a factor. Example: n=14 – after dividing by 2, n=7, loop condition fails because 33 > 7, function returns {2:1} instead of {2:1, 7:1}.

    Fix

    Always add if n > 1: factors[n] = factors.get(n, 0) + 1 after the loop.

    Assuming Pollard's rho works on prime numbers
    Symptom

    Pollard's rho runs indefinitely because the sequence never repeats modulo a prime (it's a permutation). The algorithm loops forever, wasting CPU and memory.

    Fix

    Before calling Pollard's rho, run a Miller-Rabin primality test. If n is prime, return immediately. Also set a maximum iteration count to abort after, say, 10^6 steps.

Interview Questions on This Topic

  • QHow many divisors does 10^6 have? How do you compute this from its factorization?JuniorReveal
    First factorize 10^6 = 2^6 × 5^6. Number of divisors τ = (6+1)(6+1) = 49. The formula is product over primes of (exponent+1). You derive it from counting combinations of each prime's exponent from 0 to a.
  • QWhat is Euler's totient function and how is it computed from prime factorization?JuniorReveal
    Euler's totient φ(n) counts numbers from 1 to n that are coprime to n. If n = ∏ p_i^{a_i}, then φ(n) = n ∏ (1 - 1/p_i). For example, φ(12) = 12 (1-1/2)(1-1/3) = 12 1/2 2/3 = 4. It's multiplicative.
  • QFor what values of n is Euler's totient equal to n-1?Mid-levelReveal
    φ(n) = n-1 exactly when n is prime. Because all numbers from 1 to n-1 are coprime to a prime. If n is composite, φ(n) < n-1 since at least one factor shares a divisor with n.
  • QHow does the SPF sieve speed up batch factorization?Mid-levelReveal
    The SPF sieve precomputes the smallest prime factor of every number up to a limit L in O(L log log L). Then factorizing any number n takes O(log n) by repeatedly dividing by spf[n]. Without it, each factorization costs O(√n). For batch of k numbers up to L, total cost drops from O(k√L) to O(L log log L + k log L).
  • QExplain Pollard's rho algorithm and its failure modes.SeniorReveal
    Pollard's rho uses a pseudo-random function f(x) = x^2 + c mod n and Floyd's cycle detection to find a factor. Expected complexity O(n^(1/4)). Failure modes: (1) If n is prime, the sequence never repeats, causing infinite loop. (2) The function may generate a cycle without yielding a factor – we need to change c and retry. (3) For very small factors, trial division is faster. Always use Miller-Rabin first and set a time limit.

Frequently Asked Questions

What is a perfect number?

A perfect number equals the sum of its proper divisors (divisors excluding itself). 6 = 1+2+3, 28 = 1+2+4+7+14. Equivalently, σ(n) = 2n. All known perfect numbers are even and correspond to Mersenne primes.

How do I find all divisors of a number efficiently?

Iterate up to √n, collect divisor pairs. For each i where n % i == 0, add i and n//i (if different). Then sort. Complexity O(√n). The number of divisors τ(n) is usually small, so sorting is cheap.

What is the difference between divisor count and divisor sum?

Divisor count τ(n) = number of positive divisors. Divisor sum σ(n) = sum of all positive divisors. Both are multiplicative functions computed from the prime factorization. τ uses product of (exponent+1); σ uses product of (p^(a+1)-1)/(p-1).

When should I use Pollard's rho over trial division?

Use Pollard's rho when n > 10^12 and you only need to factorise a single number. For smaller numbers, trial division is simpler and faster. For batch factorization, use SPF sieve.

What is the time complexity of the SPF sieve?

Construction is O(n log log n). Each factorization after that is O(log n) because you follow the spf pointers. Total for k queries: O(n log log n + k log n).

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousGCD and LCM — Euclidean AlgorithmNext →Modular Arithmetic — ModExp, ModInverse, Fermat's Theorem
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged