Sieve of Eratosthenes — OOM Crash from 28GB Python List
- Sieve of Eratosthenes marks multiples of each prime starting from p²
- Works in O(n log log n) time, essentially linear for n ≤ 10⁹
- Space is O(n) with a boolean array; bit-packing reduces memory 8x
- Segmented sieve extends it to arbitrary intervals [L,R] without O(R) memory
- Production failure: using a Python list of bools for n=10⁷ consumes 280MB; a bytearray uses 10MB
- Biggest mistake: starting inner loop at 2p instead of p² wastes half the work
- Real-world use: precomputing primes for RSA key generation, cryptographic protocols, and competitive programming
Sieve Quick Debug

Memory too high: measure with sys.getsizeof before guessing, then replace the list of bools with a bytearray or bitarray, and consider a segmented sieve.

```shell
python -c "is_prime=[True]*10**8; import sys; print(sys.getsizeof(is_prime)/(1024**3), 'GB')"
python -c "from array import array; a=array('b', [1])*10**8; import sys; print(sys.getsizeof(a)/(1024**3), 'GB')"
```

Incorrect prime count: cross-check your sieve against a known reference.

```shell
python -c "def sieve(n): ...; print(len(sieve(100)))"
python -c "import sympy; print(sympy.primepi(100))"
```

Production incident: the fix was switching to a bytearray and a segmented sieve with a segment size of 10^6. Memory dropped to 2MB and the sieve completed in under 10 seconds.

The Sieve of Eratosthenes is 2300 years old and still the fastest way to generate all primes up to n in most practical scenarios. That longevity is not nostalgia — it is a testament to how completely it solves the problem. In competitive programming, you hit this algorithm in the first 10 minutes of almost any number theory problem. In production, you hit it when building RSA key generation utilities, cryptographic prime tables, or anything involving primality at scale.
What makes it elegant is not the algorithm itself — it's obvious once explained — but the complexity. O(n log log n) time is almost linear. For n=10^7, a correctly implemented sieve runs in under 100ms in Python and under 5ms in C. Compare that to trial division on each number: O(n√n) = O(10^10.5) for n=10^7 — three orders of magnitude slower. The choice is not subtle.
The Basic Sieve
The implementation is four lines of meaningful code. The subtlety is in where the inner loop starts.
Starting the inner loop at p² instead of 2p is the critical optimisation. Every composite number smaller than p² has a prime factor smaller than p and has therefore already been marked. When we reach prime p=5, the multiples 10, 15, and 20 were already marked by 2 and 3. The first unmarked multiple of 5 is 5×5=25. This roughly halves the total work.
The second optimisation is memory layout. A Python list of booleans uses 28 bytes per element (Python object overhead). Use a bytearray or numpy bool array instead — 28x less memory and dramatically better cache performance. For n=10^7, a Python list uses 268MB; a bytearray uses 10MB. Cache misses dominate the runtime at large n, so this is not premature optimisation.
```python
def sieve(n: int) -> list[int]:
    """Return all primes <= n using Sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Mark multiples of p starting from p*p
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, prime in enumerate(is_prime) if prime]

print(sieve(50))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(f'Primes up to 100: {len(sieve(100))}')  # Primes up to 100: 25
```
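The memory advice above can be sketched with a bytearray-backed variant (sieve_bytearray is an illustrative name, not code from the article):

```python
def sieve_bytearray(n: int) -> list[int]:
    """Same algorithm, but flags live in a compact bytearray (1 byte per flag)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Zero out p*p, p*p+p, ... in one slice assignment
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]

print(len(sieve_bytearray(100)))  # 25
```

Besides the memory saving, the slice assignment zeroes a whole stride in C rather than one Python bytecode per element.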
Step-by-Step Visual Walkthrough
Understanding the order in which numbers are crossed out clarifies why the algorithm runs in almost-linear time. Below is a pass-by-pass walkthrough of the sieve for n = 20.
Initial grid: All numbers 2 through 20 are unmarked.
Pass 1 (p=2): Circle 2 (prime). Then mark every multiple of 2 starting from p² = 4: 4, 6, 8, 10, 12, 14, 16, 18, 20 are crossed out.
Pass 2 (p=3): Circle 3 (prime). Starting from 3² = 9, mark 9, 12, 15, 18.
Pass 3 (p=4): Skip 4 (already crossed).
Pass 4 (p=5): Circle 5 (prime). Starting from 5² = 25 (beyond 20), nothing to mark.
Remaining uncrossed numbers: 2, 3, 5, 7, 11, 13, 17, 19 — all primes.
Notice that multiples like 6 and 10 were marked during the first pass and are not processed again.
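The passes above can be reproduced in a few lines (sieve_walkthrough is an illustrative helper that logs each pass, not code from the article):

```python
def sieve_walkthrough(n: int) -> list[int]:
    """Run the sieve, printing exactly which multiples each prime marks."""
    crossed = set()
    p = 2
    while p * p <= n:
        if p not in crossed:  # p is prime
            multiples = list(range(p * p, n + 1, p))
            crossed.update(multiples)
            print(f"p={p}: mark {multiples}")
        p += 1
    return [i for i in range(2, n + 1) if i not in crossed]

print(sieve_walkthrough(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```

For n = 20 it prints exactly the two passes described above: p=2 marks 4 through 20, p=3 marks 9, 12, 15, 18, and p=4 is skipped.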
Why Is the Runtime O(n log log n)?
The harmonic series of primes 1/2 + 1/3 + 1/5 + 1/7 + ... grows as log(log(n)) — this is Mertens' theorem from 1874. Each prime p contributes n/p marking operations. Summing over all primes up to √n gives the total work: n × Σ(1/p) ≈ n × log(log(n)).
The practical implication: for n=10^9, log(log(n)) ≈ 3.04. The sieve does roughly 3× the work of a simple O(n) pass. This is why the sieve feels "almost free" in practice — the constant is tiny.
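This is easy to check empirically. The sketch below counts every marking step the sieve performs (marking_operations is an illustrative helper, not from the article) and compares the per-element work with log log n:

```python
import math

def marking_operations(n: int) -> int:
    """Count every marking step the sieve performs (re-marks included)."""
    is_prime = bytearray([1]) * (n + 1)
    ops = 0
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = 0
                ops += 1
        p += 1
    return ops

for n in (10**4, 10**5, 10**6):
    print(n, round(marking_operations(n) / n, 2), round(math.log(math.log(n)), 2))
```

The ratio creeps up only as log log n, staying in the low single digits even at n = 10^6.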
Memory limits in production: At n=10^9, a bit-packed sieve uses 125MB; a bytearray uses 1GB. Bit packing (using Python's bitarray library, or numpy bool arrays packed with np.packbits) is the practical choice above n=10^8. The segmented sieve reduces this further — work on √n-sized chunks, keeping only the current segment in cache at any time.
Advantages and Disadvantages vs Trial Division
When deciding between the Sieve of Eratosthenes and trial division for generating primes, the choice depends on the problem scale.
Advantages of Sieve of Eratosthenes:
- Generates all primes up to n in O(n log log n) time — dramatically faster than trial division's O(n√n).
- Uses O(n) memory, which for n ≤ 10^8 is acceptable on modern machines.
- Easily optimised with wheel factorisation and bit-packing.
- Foundation for advanced sieves (SPF, linear, segmented).

Disadvantages:
- Requires O(n) memory; impractical for n > 10^9 without segmentation.
- Overkill if you only need primality of a single number or a small set.
- Not suitable for very large numbers (n > 10^12) without a custom implementation.

Advantages of Trial Division:
- O(1) memory.
- Simple to implement, no precomputation.
- Ideal for primality testing of individual numbers up to 10^12.

Disadvantages:
- O(√n) per number; generating many primes becomes infeasible.
- No shared work across multiple queries.
| Aspect | Sieve of Eratosthenes | Trial Division |
|---|---|---|
| Generate all primes up to n | O(n log log n) | O(n√n) |
| Space | O(n) | O(1) |
| Primality test for one number (n up to 10^12) | Overkill, but works | O(√n) — adequate |
| Multiple queries | Excellent (precomputation amortised) | Poor (each query is independent) |
| Memory tuning | Bit-packing, segmentation | Not needed |
In production systems, the sieve is the go-to when you need a prime table or many factorisations. Trial division is relegated to quick checks or when memory is extremely constrained.
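For contrast, the trial-division side of the table is only a few lines (is_prime_trial is an illustrative name, not from the article):

```python
import math

def is_prime_trial(n: int) -> bool:
    """O(sqrt(n)) primality check in O(1) memory — fine for one-off queries."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2  # among even numbers, only 2 is prime
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

print(is_prime_trial(999983))  # True (largest prime below 10**6)
```

One call is cheap; the sieve wins only once you need many answers from the same range.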
Segmented Sieve — Large Ranges
The segmented sieve is what you reach for when n doesn't fit in RAM or when you need primes in a specific range [L, R] where R is large but R-L is manageable.
Real scenario: Finding all primes in [10^12, 10^12 + 10^6]. You obviously cannot sieve up to 10^12. But √(10^12) = 10^6 — you can sieve all primes up to 10^6 (trivial: ~78,498 primes), then use them to sieve the segment. The segment has only 10^6 numbers — fits easily in L1/L2 cache.
This is the pattern used in distributed prime-finding projects (like primegrid.com) and in production RSA implementations that need to test primality of large numbers using a quick sieve pre-filter before Miller-Rabin.
```python
import math

def segmented_sieve(n: int) -> list[int]:
    """Memory-efficient sieve for large n using O(sqrt(n)) space."""
    limit = int(math.sqrt(n)) + 1
    base_primes = sieve(limit)  # reuses sieve() from the basic implementation above
    primes = list(base_primes)
    low, high = limit + 1, min(2 * limit, n)
    while low <= n:
        segment = [True] * (high - low + 1)
        for p in base_primes:
            start = max(p * p, ((low + p - 1) // p) * p)
            for j in range(start, high + 1, p):
                segment[j - low] = False
        for i, is_p in enumerate(segment):
            if is_p:
                primes.append(low + i)
        low = high + 1
        high = min(high + limit, n)
    return primes

print(len(segmented_sieve(1_000_000)))  # 78498
```
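The [L, R] scenario described above can be sketched as a self-contained range sieve (primes_in_range is an illustrative name; it derives its own base primes rather than reusing sieve()):

```python
import math

def primes_in_range(lo: int, hi: int) -> list[int]:
    """All primes in [lo, hi] using O(sqrt(hi)) memory for base primes plus one segment."""
    lo = max(lo, 2)
    if hi < lo:
        return []
    # Base primes up to sqrt(hi) via a small standard sieve.
    limit = math.isqrt(hi)
    base = bytearray([1]) * (limit + 1)
    base[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if base[p]:
            for m in range(p * p, limit + 1, p):
                base[m] = 0
    base_primes = [p for p in range(2, limit + 1) if base[p]]
    # Sieve the segment [lo, hi] against the base primes.
    seg = bytearray([1]) * (hi - lo + 1)
    for p in base_primes:
        # First multiple of p inside the range, but never below p*p.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi + 1, p):
            seg[m - lo] = 0
    return [lo + i for i, alive in enumerate(seg) if alive]

print(primes_in_range(100, 200))
```

For the [10^12, 10^12 + 10^6] scenario, the base sieve only goes to 10^6, so memory stays tiny; the pure-Python inner loops are the slow part, but the structure is what matters.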
Smallest Prime Factor Sieve
A variant that stores the smallest prime factor (SPF) for each number instead of just prime/composite. Enables O(log n) factorisation of any number after O(n log log n) preprocessing.
This is the workhorse for competitive programming problems that require prime factorisation of many numbers. Instead of factorising each number individually (which would be O(√n) per query), you run the SPF sieve once and then factor each number by repeatedly dividing by its SPF — O(log n) per query.
```python
def spf_sieve(n: int) -> list[int]:
    """Smallest prime factor sieve."""
    spf = list(range(n + 1))  # spf[i] = i initially
    p = 2
    while p * p <= n:
        if spf[p] == p:  # p is prime
            for multiple in range(p * p, n + 1, p):
                if spf[multiple] == multiple:
                    spf[multiple] = p
        p += 1
    return spf

def factorise(n: int, spf: list[int]) -> list[int]:
    """O(log n) factorisation: peel off the smallest prime factor repeatedly."""
    factors = []
    while n > 1:
        factors.append(spf[n])
        n //= spf[n]
    return factors

spf = spf_sieve(1000)
print(factorise(360, spf))  # [2, 2, 2, 3, 3, 5]
```
Linear Sieve (Euler's Sieve)
The classic Sieve of Eratosthenes marks each composite multiple times — for example, 12 is marked by both 2 and 3. The linear sieve (also called Euler's sieve) ensures every composite is crossed out exactly once, achieving true O(n) time complexity.
How it works: Instead of marking multiples of every prime, the linear sieve iterates over all numbers i from 2 to n. When i is prime, it is added to a list. For each prime p ≤ smallest-prime-factor(i) (stored as SPF), the product i × p is marked composite. This condition guarantees each composite is generated only by its smallest prime factor.
Performance: For n=10^7, the linear sieve runs in about 0.2 seconds in C++, while the standard sieve is about 0.15 seconds — both are extremely fast, but the linear sieve eliminates redundant operations. For n=10^9, the difference is around 10-15% in favour of the linear sieve, making it the preferred choice in high-performance libraries.
Drawback: The linear sieve requires O(n) memory for the SPF array (4-8 bytes per entry), so it uses more memory than a bit-packed standard sieve. However, the memory overhead is acceptable for most applications up to 10^8.
```python
def linear_sieve(n: int) -> tuple[list[int], list[int]]:
    """Return (primes list, smallest prime factor array)."""
    spf = [0] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if spf[i] == 0:
            spf[i] = i
            primes.append(i)
        for p in primes:
            if p > spf[i] or i * p > n:
                break
            spf[i * p] = p
    return primes, spf

primes, spf = linear_sieve(100)
print(primes[:20])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
```
Performance Tuning for Production
When you push the sieve beyond toy sizes, three things matter: memory locality, loop optimisation, and cache misses. Here's what senior engineers actually do.
Wheel factorisation: Skip multiples of 2, 3, 5 when iterating candidates. This eliminates roughly 73% of the numbers you would otherwise touch. For n=10^9, this drops runtime by about 30%.
Bit-packing: Use Python's bitarray or C++ std::bitset to store 8 primes per byte instead of 1. This reduces memory bandwidth and improves cache hit rates.
C++ with compiler optimisations: A C++ sieve with -O3 -march=native can process n=10^9 in under 2 seconds. Python's overhead (object model, bytecode) adds a factor of 10-20x.
Parallel segmented sieve: Distribute segments across threads. Each segment is independent once base primes are known. With OpenMP or Python multiprocessing, you can achieve near-linear speedup on multi-core systems.
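The parallel pattern can be sketched as follows (count_primes_in_segment and parallel_prime_count are illustrative names, not from the article): workers sieve independent segments against shared base primes.

```python
import math
from multiprocessing import Pool

def count_primes_in_segment(args: tuple[int, int, list[int]]) -> int:
    """Sieve one segment [lo, hi] against precomputed base primes; return the count."""
    lo, hi, base_primes = args
    lo = max(lo, 2)
    if hi < lo:
        return 0
    seg = bytearray([1]) * (hi - lo + 1)
    for p in base_primes:
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi + 1, p):
            seg[m - lo] = 0
    return sum(seg)

def parallel_prime_count(n: int, workers: int = 4, seg_size: int = 100_000) -> int:
    """Split [2, n] into independent segments and count primes across a process pool."""
    limit = math.isqrt(n)
    flags = bytearray([1]) * (limit + 1)
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = 0
    base = [p for p in range(2, limit + 1) if flags[p]]
    tasks = [(lo, min(lo + seg_size - 1, n), base) for lo in range(2, n + 1, seg_size)]
    with Pool(workers) as pool:  # segments are embarrassingly parallel
        return sum(pool.map(count_primes_in_segment, tasks))

# Example (keep pool usage under a __main__ guard on spawn-based platforms):
# if __name__ == "__main__":
#     print(parallel_prime_count(10**7))
```

Only the base-prime computation is serial; every segment after that is independent, which is why the speedup is close to linear in the worker count.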
```python
from bitarray import bitarray

def sieve_bitarray(n: int) -> list[int]:
    """Memory-efficient sieve using bitarray (1 bit per flag)."""
    is_prime = bitarray(n + 1)
    is_prime.setall(True)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            is_prime[p*p:n+1:p] = False  # slice assignment, fast
        p += 1
    return [i for i in range(n + 1) if is_prime[i]]

print(len(sieve_bitarray(10**7)))  # 664579
```
C++ Implementation
For maximum performance, C++ is the language of choice. A well-written C++ sieve leverages template metaprogramming, compiler optimisations, and direct memory control.
Key points:
- Use std::vector&lt;bool&gt; or std::bitset&lt;N&gt; for space-efficient storage; std::vector&lt;bool&gt; is specialised to store bits.
- For dynamic sizes, std::unique_ptr&lt;bool[]&gt; with manual bit-packing can be faster but more complex.
- The inner loop should be a simple for with pointer arithmetic to maximise CPU instruction-level parallelism.
- Compile with -O3 -march=native -mtune=native for best performance.
The code below implements the standard sieve up to n and returns a vector of primes. It runs in under 2 seconds for n=10^9 on a modern CPU.
```cpp
#include <vector>
#include <cmath>
#include <algorithm>

std::vector<int> sieve(int n) {
    if (n < 2) return {};
    std::vector<bool> is_prime(n + 1, true);
    is_prime[0] = is_prime[1] = false;
    for (int p = 2; p * p <= n; ++p) {
        if (is_prime[p]) {
            for (int multiple = p * p; multiple <= n; multiple += p) {
                is_prime[multiple] = false;
            }
        }
    }
    std::vector<int> primes;
    primes.reserve(std::max(1, static_cast<int>(n / std::log(n))));
    for (int i = 2; i <= n; ++i) {
        if (is_prime[i]) primes.push_back(i);
    }
    return primes;
}

// Usage:
// auto primes = sieve(1000000);
// std::cout << primes.size();  // 78498
```
Practice Problems
Apply the sieve techniques to these classic problems. Each requires a different variation: prime counting, prime gaps, primorial, or segmented sieving.
| Problem | Description | Technique |
|---|---|---|
| [SPOJ - PRIME1](https://www.spoj.com/problems/PRIME1/) | Generate all primes in a given range [m, n] with n up to 10^9. | Segmented sieve |
| [Project Euler 7](https://projecteuler.net/problem=7) | Find the 10,001st prime. | Basic sieve or SPF |
| [Project Euler 46](https://projecteuler.net/problem=46) | Goldbach's other conjecture: find the smallest odd composite that cannot be written as the sum of a prime and twice a square. | Sieve + set |
| [SPOJ - DIVSUM](https://www.spoj.com/problems/DIVSUM/) | Sum of divisors; precompute smallest prime factor for fast factorisation. | SPF sieve |
| [Codeforces - Prime Gap](https://codeforces.com/problemset/problem/1225/D) | Count numbers with exactly k prime factors. | Sieve for prime counting |
| [Primorial](https://www.hackerrank.com/challenges/primorial/problem) | Compute primorial (product of first k primes) modulo MOD. | Basic sieve or linear sieve |
| [UVA - 10140: Prime Distance](https://onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=24&page=show_problem&problem=1081) | Find the smallest and largest gap between consecutive primes in a large interval. | Segmented sieve, optimised marking |
| [SPOJ - NDIV](https://www.spoj.com/problems/NDIV/) | Count numbers in [a,b] with exactly n divisors. | SPF sieve for divisor function |
These problems range from easy (basic sieve) to advanced (segmented sieve with large bounds). Attempt them in order to build proficiency.
🎯 Key Takeaways
- Start inner loop at p*p, not 2p — numbers below p² are already marked. This is not a micro-optimisation; it's the difference between correct complexity analysis and a slower algorithm.
- Use bytearray or numpy bool arrays, not Python list of booleans — 28x less memory, dramatically better cache performance above n=10^5.
- O(n log log n) is effectively linear — for n=10^7 the constant is ~3.2. A correct sieve is fast enough that you should never be using trial division to generate all primes below a bound.
- SPF (Smallest Prime Factor) sieve is the production version for competitive programming — O(log n) factorisation of any number after O(n log log n) preprocessing.
- Segmented sieve for large ranges: sieve primes to √n, then sieve each segment. Handles ranges [L, R] where R is up to 10^12 or larger.
- Memory, not CPU, is the bottleneck for large n. Always estimate memory before writing a sieve.
Interview Questions on This Topic
- (Junior) Why does the outer loop only need to go up to √n?
- (Mid-level) What is the time complexity of the Sieve of Eratosthenes and why?
- (Senior) How would you find all primes in the range [L, R] where R can be up to 10^12?
- (Senior) How can you optimise the sieve for multi-core systems?
- (Mid-level) What is the smallest prime factor sieve and why is it useful?
Frequently Asked Questions
What is the prime counting function π(n)?
π(n) counts the number of primes ≤ n. π(100)=25, π(10^6)=78498, π(10^9)≈50.8 million. The prime number theorem says π(n) ≈ n/ln(n).
Can the sieve be parallelised?
Yes — the segmented sieve naturally parallelises since each segment is independent once base primes are computed. Used in distributed prime-finding projects.
What is wheel factorisation and does it improve the sieve?
Wheel factorisation skips multiples of small primes (e.g., 2,3,5) in the outer loop. This reduces the number of prime candidates by about 75% and can speed up the basic sieve by 30-40% for large n.
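The simplest wheel is the 2-wheel: store flags only for odd numbers, halving both memory and work. A minimal sketch (sieve_odd_only is an illustrative name, not from the article):

```python
def sieve_odd_only(n: int) -> list[int]:
    """2-wheel sieve: flags cover only odd numbers 3, 5, 7, ... (index i -> 2*i + 3)."""
    if n < 2:
        return []
    size = (n - 1) // 2  # number of odd candidates in [3, n]
    flags = bytearray([1]) * size
    i = 0
    while (2 * i + 3) ** 2 <= n:
        if flags[i]:
            p = 2 * i + 3
            start = (p * p - 3) // 2  # index of p*p
            # stride p in index space = stride 2*p in value space (odd multiples only)
            flags[start::p] = bytearray(len(flags[start::p]))
        i += 1
    return [2] + [2 * i + 3 for i in range(size) if flags[i]]

print(sieve_odd_only(50))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```

Extending the wheel to 2·3·5 = 30 keeps only the 8 residues coprime to 30 per block of 30, which is where the roughly 75% reduction in candidates comes from.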
Why does the SPF sieve use O(n log log n) time but O(n) extra memory?
The SPF sieve stores an integer for each number (4-8 bytes), so memory is larger than a boolean sieve. However, it enables O(log n) factorisation per query, which can save time when many factorisations are needed.