Composite detection is O(sqrt(n)) via trial division — critical for cryptographic key generation
RSA encryption relies on the difficulty of factoring large composite numbers into their prime components
✦ Definition~90s read
What is Composite Numbers?
A composite number is a positive integer n > 1 that has at least one positive divisor d where 1 < d < n. Equivalently, n is composite if and only if it can be written as n = a * b where both a > 1 and b > 1.
★
Think of a composite number like a brick wall that can be broken into smaller, identical sections.
Key properties:
- Every composite number has at least one prime factor <= sqrt(n)
- 1 is neither prime nor composite (it is a unit)
- 2 is the only even prime; all other even numbers are composite
- The smallest composite number is 4 (2 * 2)
- Composite numbers have exactly 3 or more positive divisors
Fundamental Theorem of Arithmetic: Every integer n > 1 can be written as a unique product of prime numbers (up to the order of factors). For example, 60 = 2^2 3 5. This factorization is unique — no other combination of primes produces 60.
This theorem is the foundation of RSA cryptography. The security of RSA depends on the fact that while multiplying two large primes p and q to produce n = p * q is trivial, factoring n back into p and q is computationally infeasible for sufficiently large n (2048+ bits).
Plain-English First
Think of a composite number like a brick wall that can be broken into smaller, identical sections. The number 12 can be split into 2 groups of 6, 3 groups of 4, 4 groups of 3, or 6 groups of 2. A prime number like 7 can only be split into 1 group of 7 — it has no smaller building blocks. Composite numbers are the ones that have multiple ways to be divided evenly.
A composite number is any positive integer greater than 1 that is not prime — meaning it has divisors beyond 1 and itself. The numbers 4, 6, 8, 9, 10, and 12 are all composite. Every composite number factors uniquely into a product of primes, a property formalized by the Fundamental Theorem of Arithmetic.
Composite numbers are not just a school math concept. They underpin RSA cryptography, where the security of encrypted communication depends on the computational difficulty of factoring large composite numbers back into their prime components. Hash function design, pseudorandom number generation, and error-correcting codes all rely on properties of composite and prime factorization.
The common misconception is that composite detection is trivial. For small numbers, trial division works fine. For cryptographic-scale numbers (2048-bit integers), factoring is computationally infeasible with classical algorithms — that infeasibility is what makes RSA secure.
What Is a Composite Number: Definition, Properties, and the Fundamental Theorem of Arithmetic
A composite number is a positive integer n > 1 that has at least one positive divisor d where 1 < d < n. Equivalently, n is composite if and only if it can be written as n = a * b where both a > 1 and b > 1.
Every composite number has at least one prime factor <= sqrt(n)
1 is neither prime nor composite (it is a unit)
2 is the only even prime; all other even numbers are composite
The smallest composite number is 4 (2 * 2)
Composite numbers have exactly 3 or more positive divisors
Fundamental Theorem of Arithmetic: Every integer n > 1 can be written as a unique product of prime numbers (up to the order of factors). For example, 60 = 2^2 3 5. This factorization is unique — no other combination of primes produces 60.
This theorem is the foundation of RSA cryptography. The security of RSA depends on the fact that while multiplying two large primes p and q to produce n = p * q is trivial, factoring n back into p and q is computationally infeasible for sufficiently large n (2048+ bits).
io/thecodeforge/math/composite_detector.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
import math
from typing importList, Tuple, Dict, Optionalfrom dataclasses import dataclass
@dataclass
classFactorization:
"""Result of prime factorization."""
number: int
is_composite: bool
prime_factors: Dict[int, int] # prime -> exponent
smallest_factor: Optional[int]
divisor_count: int
classCompositeDetector:
"""Detect composite numbers and perform prime factorization."""defis_composite(self, n: int) -> bool:
"""
Checkif n is composite using trial division.
Time: O(sqrt(n))
Space: O(1)
"""
if n <= 1:
return False# 1 is neither prime nor compositeif n <= 3:
return False# 2 and 3 are primeif n % 2 == 0:
return True# all even numbers > 2 are compositeif n % 3 == 0:
return True# all multiples of 3 > 3 are composite# Check divisors of form 6k +/- 1 up to sqrt(n)# All primes > 3 are of this form — skips 2/3 of candidates
i = 5while i * i <= n:
if n % i == 0or n % (i + 2) == 0:
returnTrue
i += 6returnFalsedeffactorize(self, n: int) -> Factorization:
"""
Complete prime factorization using trial division.
Returns all prime factors and their exponents.
"""
if n <= 1:
returnFactorization(n, False, {}, None, 1)
factors: Dict[int, int] = {}
temp = n
# Factor out all 2swhile temp % 2 == 0:
factors[2] = factors.get(2, 0) + 1
temp //= 2# Factor out odd primes from 3 onwards
d = 3while d * d <= temp:
while temp % d == 0:
factors[d] = factors.get(d, 0) + 1
temp //= d
d += 2# If temp > 1, it is a prime factorif temp > 1:
factors[temp] = 1
is_composite = len(factors) > 1or (len(factors) == 1andlist(factors.values())[0] > 1)
smallest = min(factors.keys()) if factors elseNone# Calculate divisor count: if n = p1^a1 * p2^a2 * ...# then divisor_count = (a1+1) * (a2+1) * ...
divisor_count = 1for exp in factors.values():
divisor_count *= (exp + 1)
returnFactorization(
number=n,
is_composite=is_composite,
prime_factors=factors,
smallest_factor=smallest,
divisor_count=divisor_count,
)
defget_divisors(self, n: int) -> List[int]:
"""Return all positive divisors of n."""if n <= 0:
return []
divisors = []
for i inrange(1, int(math.isqrt(n)) + 1):
if n % i == 0:
divisors.append(i)
if i != n // i:
divisors.append(n // i)
returnsorted(divisors)
defsieve_of_eratosthenes(self, limit: int) -> Tuple[List[bool], List[int]]:
"""
Generate all primes up to limit using Sieve of Eratosthenes.
is_composite[i] = True means i is composite.
Time: O(n log log n)
Space: O(n)
"""
is_composite = [False] * (limit + 1)
is_composite[0] = True# 0 is not prime
is_composite[1] = True# 1 is not primefor i inrange(2, int(math.isqrt(limit)) + 1):
ifnot is_composite[i]:
# Mark all multiples of i as compositefor j inrange(i * i, limit + 1, i):
is_composite[j] = True
primes = [i for i inrange(2, limit + 1) ifnot is_composite[i]]
return is_composite, primes
defsegmented_sieve(self, low: int, high: int, segment_size: int = 65536) -> List[int]:
"""
Find all primes in range [low, high] using segmented sieve.
Memory: O(sqrt(high) + segment_size) instead of O(high).
Use when range is large but you only need primes in a specific window.
"""
sqrt_high = int(math.isqrt(high)) + 1
_, base_primes = self.sieve_of_eratosthenes(sqrt_high)
primes_in_range = []
for seg_start inrange(low, high + 1, segment_size):
seg_end = min(seg_start + segment_size - 1, high)
seg_size = seg_end - seg_start + 1
is_composite_seg = [False] * seg_size
for p in base_primes:
# Find first multiple of p in [seg_start, seg_end]
start = max(p * p, ((seg_start + p - 1) // p) * p)
for j inrange(start, seg_end + 1, p):
is_composite_seg[j - seg_start] = Truefor i inrange(seg_size):
num = seg_start + i
if num >= 2andnot is_composite_seg[i]:
primes_in_range.append(num)
return primes_in_range
defcount_composites_up_to(self, n: int) -> int:
"""
Count composite numbers from4 to n inclusive.
Uses prime counting: composites = (n - 1) - prime_count
(subtract 1for the number 1 which is neither)
"""
if n < 4:
return0
_, primes = self.sieve_of_eratosthenes(n)
return (n - 1) - len(primes)
Every Composite Number Has a Small Factor
n = 100: sqrt(100) = 10. Factor 2 divides 100. Found composite in 1 check.
n = 997 (prime): sqrt(997) ~ 31. Must check all 31 candidates. O(sqrt(n)) checks.
n = 2048-bit RSA modulus: sqrt(n) ~ 2^1024. Trial division is computationally impossible.
Rule: trial division works for n < 10^12. Above that, use probabilistic tests or specialized factorization algorithms.
Production Insight
A hash table implementation used n % table_size for bucket assignment. When table_size was a composite number with many small factors (like 1000 = 2^3 * 5^3), keys clustered unevenly because hash values sharing those factors mapped to the same buckets.
Cause: composite bucket count with many small factors creates uneven distribution. Effect: hash table lookup degraded from O(1) to O(n/k) where k is the number of occupied buckets. Impact: API latency increased 10x under load. Action: switched to a prime number of buckets (1009 instead of 1000). Distribution became uniform and lookup returned to O(1).
Key Takeaway
A composite number has divisors beyond 1 and itself. Every composite has a prime factor <= sqrt(n).
The Fundamental Theorem of Arithmetic guarantees unique prime factorization — the basis of RSA security.
Trial division is O(sqrt(n)). For cryptographic-scale numbers, factoring is computationally infeasible with classical algorithms.
Composite Number Algorithm Selection
IfCheck if n < 10^12 is composite
→
UseUse trial division up to sqrt(n). O(sqrt(n)) time, O(1) space. Completes in under 1 second.
IfCheck if n < 10^18 is composite
→
UseUse Miller-Rabin with 15-20 witness rounds. Probabilistic but error probability < 2^-40.
IfCheck if n > 10^18 is composite (cryptographic scale)
→
UseUse Miller-Rabin with 40+ rounds or AKS deterministic test. AKS is polynomial but slow in practice.
IfFactor a composite number n < 10^18
→
UseTrial division for small factors, then Pollard's Rho for remaining composite parts. O(n^(1/4)) expected.
IfFactor a composite number n > 10^50
→
UseUse Quadratic Sieve or General Number Field Sieve (GNFS). GNFS is the fastest known classical algorithm.
IfGenerate all primes up to 10^7
→
UseSieve of Eratosthenes. O(n log log n) time, O(n) space. Completes in under 1 second.
IfGenerate primes in a large range [a, b] where b > 10^9
→
UseSegmented sieve. O((b-a) log log b + sqrt(b)) time, O(sqrt(b) + segment_size) space.
Primality Testing: Miller-Rabin, Deterministic Witnesses, and Error Bounds
Primality testing determines whether a number is prime or composite. For small numbers, trial division suffices. For large numbers, probabilistic tests are required.
Trial division
Check all divisors from 2 to sqrt(n)
Time: O(sqrt(n)) — impractical for n > 10^12
Deterministic: always correct
Miller-Rabin probabilistic test
Based on properties of modular exponentiation
Each round has at most 25% chance of falsely identifying a composite as prime
Theoretically important but impractically slow in practice
Miller-Rabin with sufficient rounds is faster and the error probability is negligible
io/thecodeforge/math/primality_tester.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
import random
from typing importListclassPrimalityTester:
"""Production-grade primality testing with Miller-Rabin and deterministic witnesses."""# Deterministic witnesses for n < 2^64
DETERMINISTIC_WITNESSES_64BIT = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
# Small primes for quick divisibility check
SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113]
defis_prime_trial_division(self, n: int) -> bool:
"""Deterministic primality test via trial division. O(sqrt(n))."""if n < 2:
returnFalseif n in (2, 3):
returnTrueif n % 2 == 0or n % 3 == 0:
returnFalse
i = 5while i * i <= n:
if n % i == 0or n % (i + 2) == 0:
returnFalse
i += 6returnTruedef_miller_rabin_round(self, n: int, witness: int) -> bool:
"""
Single round of Miller-Rabin test.
ReturnsTrueif n passes this round (possibly prime).
ReturnsFalseif n is definitely composite.
"""
# Write n-1 as 2^r * d where d is odd
r, d = 0, n - 1while d % 2 == 0:
r += 1
d //= 2# Compute witness^d mod n
x = pow(witness, d, n)
if x == 1or x == n - 1:
return True# passes this roundfor _ inrange(r - 1):
x = pow(x, 2, n)
if x == n - 1:
return True# passes this round
return False# definitely compositedefis_prime_miller_rabin(self, n: int, num_rounds: int = 20) -> bool:
"""
Probabilistic primality test using Miller-Rabin.
Error probability <= 4^(-num_rounds).
For n < 2^64with12 fixed witnesses: deterministic.
"""
if n < 2:
returnFalseif n in (2, 3):
returnTrueif n % 2 == 0:
returnFalse# Quick check against small primesfor p inself.SMALL_PRIMES:
if n == p:
returnTrueif n % p == 0:
returnFalse# For n < 2^64, use deterministic witnessesif n < (1 << 64):
witnesses = self.DETERMINISTIC_WITNESSES_64BIT
else:
# Random witnesses for numbers beyond 64-bit
witnesses = [random.randint(2, n - 2) for _ inrange(num_rounds)]
for w in witnesses:
ifnotself._miller_rabin_round(n, w):
return False# definitely composite
return True# probably prime (or definitely prime for n < 2^64)defnext_prime(self, n: int) -> int:
"""Find the smallest prime >= n."""if n <= 2:
return2
candidate = n if n % 2 == 1else n + 1whilenotself.is_prime_miller_rabin(candidate):
candidate += 2return candidate
defgenerate_rsa_primes(self, bit_length: int) -> tuple:
"""
Generate two primes of specified bit length forRSA key generation.
UsesMiller-Rabinwith40 rounds for cryptographic assurance.
"""
whileTrue:
p = random.getrandbits(bit_length) | (1 << (bit_length - 1)) | 1ifself.is_prime_miller_rabin(p, num_rounds=40):
breakwhileTrue:
q = random.getrandbits(bit_length) | (1 << (bit_length - 1)) | 1if q != p andself.is_prime_miller_rabin(q, num_rounds=40):
breakreturn p, q
defverify_rsa_modulus_uniqueness(self, moduli: List[int]) -> List[tuple]:
"""
Checkfor shared prime factors across RSA moduli.
Returns list of (index_i, index_j, shared_factor) for any pair withGCD > 1.
"""
from math import gcd
collisions = []
for i inrange(len(moduli)):
for j inrange(i + 1, len(moduli)):
g = gcd(moduli[i], moduli[j])
if g > 1:
collisions.append((i, j, g))
return collisions
Miller-Rabin Error Probability Decreases Exponentially
1 round: at most 25% error. Useless alone.
10 rounds: at most 10^(-6) error. Acceptable for non-critical use.
20 rounds: at most 10^(-12) error. Standard for general-purpose primality testing.
40 rounds: at most 10^(-24) error. Standard for cryptographic key generation.
Rule: use 40+ rounds for anything security-related. Use 12 fixed witnesses for 64-bit determinism.
Production Insight
A random number generator library used Miller-Rabin with only 5 rounds for primality testing. In a batch of 1 million candidate primes, 3 candidates were falsely identified as prime (composite numbers passing 5 rounds). These pseudo-primes were used in a Bloom filter, causing false positive rates to exceed the theoretical bound by 0.0003%.
Cause: insufficient Miller-Rabin rounds for the required precision. Effect: 3 composites classified as primes in 1M candidates. Impact: Bloom filter false positive rate exceeded SLA. Action: increased to 20 rounds. False positives dropped to zero in subsequent batches of 100M candidates.
Key Takeaway
Miller-Rabin with 40 rounds is the standard for cryptographic primality testing. For 64-bit integers, 12 fixed witnesses give deterministic results.
Trial division is O(sqrt(n)) and only practical for n < 10^12.
The AKS test is deterministic and polynomial but too slow for practical use — Miller-Rabin with sufficient rounds is always preferred.
RSA and the Composite Number Problem: Why Factoring Is Hard
RSA encryption relies on a single mathematical fact: multiplying two large primes is easy, but factoring their product is hard.
RSA key generation: 1. Choose two large primes p and q (typically 1024 bits each) 2. Compute n = p * q (the composite modulus, 2048 bits) 3. Compute phi(n) = (p-1)(q-1) (Euler's totient function) 4. Choose e such that gcd(e, phi(n)) = 1 (commonly e = 65537) 5. Compute d = e^(-1) mod phi(n) (the private key) 6. Public key: (n, e). Private key: (n, d)
Encryption: c = m^e mod n Decryption: m = c^d mod n
The security assumption: given n, it is computationally infeasible to find p and q. The best known classical algorithm (General Number Field Sieve) factors an n-bit composite in time O(exp(c n^(1/3) (log n)^(2/3))). For 2048-bit RSA, this requires approximately 2^112 operations — far beyond current computational capability.
Quantum threat: Shor's algorithm factors composite numbers in polynomial time on a quantum computer. A sufficiently large quantum computer could break RSA-2048 in hours. This is why NIST is standardizing post-quantum cryptographic algorithms (lattice-based, code-based, hash-based) that do not rely on the hardness of factoring composite numbers.
Common RSA vulnerabilities related to composite numbers
Shared prime factors across keys (detected via GCD)
Small prime factors (smooth numbers) that make factoring easier
Weak random number generators producing correlated primes
Using the same prime for both p and q (n = p^2, trivially factorable)
import random
from math import gcd
from typing importList, Tuple, OptionalclassRSACompositeAnalyzer:
"""Analyze RSA moduli for composite number vulnerabilities."""defgenerate_rsa_keypair(self, bit_length: int = 1024) -> dict:
"""
GenerateRSA keypair with post-generation verification.
Ensures p != q and both are verified prime.
"""
from io.thecodeforge.math.primality_tester importPrimalityTester
tester = PrimalityTester()
p, q = tester.generate_rsa_primes(bit_length)
n = p * q
phi_n = (p - 1) * (q - 1)
e = 65537# standard public exponent# Compute modular inverse: d = e^(-1) mod phi(n)
d = pow(e, -1, phi_n)
return {
'public_key': (n, e),
'private_key': (n, d),
'modulus': n,
'modulus_bit_length': n.bit_length(),
'p': p,
'q': q,
}
defdetect_shared_factors(self, moduli: List[int]) -> List[dict]:
"""
DetectRSA moduli that share prime factors.
GCD(n1, n2) > 1 means both keys are compromised.
"""
findings = []
for i inrange(len(moduli)):
for j inrange(i + 1, len(moduli)):
g = gcd(moduli[i], moduli[j])
if g > 1and g != moduli[i] and g != moduli[j]:
p = g
q1 = moduli[i] // g
q2 = moduli[j] // g
findings.append({
'key_pair': (i, j),
'shared_factor': g,
'key_i_factors': (g, q1),
'key_j_factors': (g, q2),
'severity': 'CRITICAL',
'action': 'Revoke both keys immediately. Re-generate with verified entropy.',
})
return findings
deffactor_via_shared_prime(self, n1: int, n2: int) -> Optional[Tuple[Tuple[int, int], Tuple[int, int]]]:
"""
If two RSA moduli share a prime factor, factor both instantly.
Thisis why shared primes are catastrophic.
"""
g = gcd(n1, n2)
if g > 1and g != n1 and g != n2:
p = g
q1 = n1 // g
q2 = n2 // g
return ((p, q1), (p, q2))
returnNonedefcheck_smooth_number(self, n: int, smoothness_bound: int = 1000) -> dict:
"""
Checkif n is B-smooth (all prime factors <= B).
Smooth numbers are easier to factor using algorithms like Pollard's p-1.
"""
from io.thecodeforge.math.composite_detector importCompositeDetector
detector = CompositeDetector()
temp = n
factors = {}
# Simplified prime check; in production use a proper prime listdefis_prime_simple(x: int) -> bool:
if x < 2:
returnFalsefor i inrange(2, int(x**0.5) + 1):
if x % i == 0:
returnFalsereturnTruefor p inrange(2, smoothness_bound + 1):
ifis_prime_simple(p):
while temp % p == 0:
factors[p] = factors.get(p, 0) + 1
temp //= p
is_smooth = temp == 1return {
'number': n,
'smoothness_bound': smoothness_bound,
'is_smooth': is_smooth,
'factors_found': factors,
'remaining_cofactor': temp if temp > 1elseNone,
'risk': 'HIGH'if is_smooth else'LOW',
'note': (
'B-smooth numbers are vulnerable to Pollard p-1 factorization. ''RSA primes should be generated to avoid smoothness.'if is_smooth else'Number is not B-smooth at the given bound. Standard factoring difficulty applies.'
),
}
defestimate_factoring_difficulty(self, n: int) -> dict:
"""
Estimate the difficulty of factoring a composite number
based on its bit length and structure.
"""
bit_length = n.bit_length()
if bit_length <= 64:
difficulty = 'TRIVIAL'
method = 'Trial division or Pollard Rho. Seconds on modern hardware.'
time_estimate = '< 1 second'elif bit_length <= 128:
difficulty = 'EASY'
method = 'Pollard Rho or ECM. Minutes on modern hardware.'
time_estimate = '1-10 minutes'elif bit_length <= 256:
difficulty = 'MODERATE'
method = 'ECM or Quadratic Sieve. Hours to days.'
time_estimate = '1-48 hours'elif bit_length <= 512:
difficulty = 'HARD'
method = 'Quadratic Sieve or GNFS. Days to weeks on a cluster.'
time_estimate = '1-30 days'elif bit_length <= 1024:
difficulty = 'VERY HARD'
method = 'GNFS. Months to years on a large cluster.'
time_estimate = '1-5 years'elif bit_length <= 2048:
difficulty = 'EXTREME'
method = 'GNFS. Requires massive computational resources.'
time_estimate = '10-100 years on current hardware'else:
difficulty = 'IMPOSSIBLE (classical)'
method = 'GNFS'
time_estimate = '> age of universe'return {
'bit_length': bit_length,
'difficulty': difficulty,
'method': method,
'time_estimate': time_estimate,
'quantum_note': (
'Shor algorithm on a sufficiently large quantum computer ''could factor this in polynomial time. Post-quantum migration recommended.'if bit_length <= 2048else'Quantum threat applies to all RSA key sizes.'
),
}
RSA Security Is the Composite Number Problem
RSA-1024 (512-bit primes): factored in 2009 by a team using hundreds of machines over 2 years. No longer secure.
RSA-2048 (1024-bit primes): not yet factored. Estimated 2^112 operations with GNFS. Considered secure until ~2030.
RSA-4096 (2048-bit primes): not factored. Provides margin against quantum computing advances.
Shared prime detection: GCD(n1, n2) > 1 instantly factors both keys. Run pairwise GCD scans on all stored moduli.
Rule: use RSA-2048 minimum. Plan migration to post-quantum algorithms (lattice-based) by 2030.
Production Insight
A certificate authority generated 100,000 RSA-2048 keys over 3 years. A security researcher computed pairwise GCD across all public moduli and found 275 pairs that shared a prime factor. Each pair could be factored in milliseconds, exposing the private keys.
Cause: insufficient entropy during prime generation on containerized VMs that shared snapshot state. Effect: 275 key pairs compromised — any attacker with access to the public keys could derive the private keys. Impact: mass certificate revocation, regulatory investigation, and $8M in remediation costs. Action: switched to HSM-based key generation with hardware entropy and added post-generation GCD verification.
Key Takeaway
RSA security depends on the hardness of factoring composite numbers. Shared prime factors across moduli make factoring trivial via GCD.
Plan for post-quantum migration — Shor's algorithm breaks RSA on quantum computers.
Sieve of Eratosthenes: Generating Primes and Identifying Composites at Scale
The Sieve of Eratosthenes is the most efficient algorithm for generating all prime numbers up to a limit n. It works by iteratively marking composite numbers.
Algorithm: 1. Create a boolean array is_composite[0..n], initialized to False 2. Mark 0 and 1 as composite 3. For each number i from 2 to sqrt(n): a. If i is not marked composite, it is prime b. Mark all multiples of i starting from i^2 as composite 4. All unmarked numbers are prime
Time complexity: O(n log log n) — nearly linear Space complexity: O(n)
Optimization: start marking from i^2 instead of 2*i, because all smaller multiples of i have already been marked by smaller primes.
Segmented sieve: for ranges where n is too large for O(n) memory, process the range in segments of size sqrt(n) or smaller. Each segment uses O(segment_size) memory while the base primes use O(sqrt(n)).
Bit-packing optimization: use a bit array instead of a boolean array. Each number uses 1 bit instead of 1 byte. Memory drops by 8x. For n = 10^9, this is 125MB instead of 1GB.
Wheel factorization: skip multiples of 2, 3, and 5 by only checking numbers of the form 30k + {1, 7, 11, 13, 17, 19, 23, 29}. This reduces the work by a factor of 30/8 = 3.75x.
io/thecodeforge/math/optimized_sieve.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
import math
from typing importList, Iteratorimport array
classOptimizedSieve:
"""Production-grade sieve implementations for prime generation and composite detection."""defbasic_sieve(self, limit: int) -> List[int]:
"""
StandardSieve of Eratosthenes.
Time: O(n log log n), Space: O(n)
"""
if limit < 2:
return []
is_composite = bytearray(limit + 1)
for i inrange(2, int(math.isqrt(limit)) + 1):
ifnot is_composite[i]:
for j inrange(i * i, limit + 1, i):
is_composite[j] = 1return [i for i inrange(2, limit + 1) ifnot is_composite[i]]
defbit_packed_sieve(self, limit: int) -> List[int]:
"""
Bit-packed sieve — 8x memory reduction over boolean array.
Each number uses 1 bit instead of 1 byte.
For limit=10^9: 125MB instead of 1GB.
"""
if limit < 2:
return []
# Each byte stores 8 flags (odd numbers only)
size = (limit + 1) // 2
sieve = bytearray((size + 7) // 8)
defis_set(idx: int) -> bool:
returnbool(sieve[idx // 8] & (1 << (idx % 8)))
defclear_bit(idx: int):
sieve[idx // 8] |= (1 << (idx % 8))
# Only sieve odd numbers: index i represents number 2*i + 1
sqrt_limit = int(math.isqrt(limit))
for i inrange(3, sqrt_limit + 1, 2):
ifnotis_set(i // 2):
# Mark odd multiples of i starting from i*i
step = i # step in terms of odd numbers
start = (i * i) // 2for j inrange(start, size, step):
clear_bit(j)
# Collect primes
primes = [2] # 2 is the only even primefor i inrange(1, size):
ifnotis_set(i):
primes.append(2 * i + 1)
return [p for p in primes if p <= limit]
defsegmented_sieve(self, low: int, high: int, segment_size: int = 65536) -> List[int]:
"""
Segmented sieve for finding primes in [low, high].
Memory: O(sqrt(high) + segment_size)
Use when high > 10^7and you cannot allocate O(high) memory.
"""
sqrt_high = int(math.isqrt(high)) + 1# Generate base primes up to sqrt(high)
base_is_composite = bytearray(sqrt_high + 1)
for i inrange(2, int(math.isqrt(sqrt_high)) + 1):
ifnot base_is_composite[i]:
for j inrange(i * i, sqrt_high + 1, i):
base_is_composite[j] = 1
base_primes = [i for i inrange(2, sqrt_high + 1) ifnot base_is_composite[i]]
primes = []
for seg_start inrange(max(2, low), high + 1, segment_size):
seg_end = min(seg_start + segment_size - 1, high)
seg_is_composite = bytearray(seg_end - seg_start + 1)
for p in base_primes:
# First multiple of p in [seg_start, seg_end]
start = max(p * p, ((seg_start + p - 1) // p) * p)
for j inrange(start, seg_end + 1, p):
seg_is_composite[j - seg_start] = 1for i inrange(len(seg_is_composite)):
num = seg_start + i
if num >= 2andnot seg_is_composite[i]:
primes.append(num)
return primes
defwheel_sieve(self, limit: int) -> List[int]:
"""
Wheel factorization sieve — skips multiples of 2, 3, 5.
Checks only numbers of form 30k + {1,7,11,13,17,19,23,29}.
3.75x fewer candidates than basic sieve.
"""
if limit < 2:
return []
# Base primes up to the wheel size
small_primes = [2, 3, 5]
if limit < 7:
return [p for p in small_primes if p <= limit]
# Wheel offsets: numbers mod 30 that are coprime to 2, 3, 5
wheel = [1, 7, 11, 13, 17, 19, 23, 29]
wheel_gaps = [6, 4, 2, 4, 2, 4, 6, 2] # gaps between wheel positions# Sieve array for numbers in the wheel pattern
size = (limit + 29) // 30
sieve = bytearray(size * 8) # 8 flags per 30-number block
sqrt_limit = int(math.isqrt(limit))
# Sieve using wheel offsetsfor i inrange(1, size):
for j, offset inenumerate(wheel):
num = 30 * i + offset
if num > sqrt_limit:
breakifnot sieve[i * 8 + j]:
# Mark multiples of numfor k inrange(i, size):
for l, inner_offset inenumerate(wheel):
multiple = 30 * k + inner_offset
if multiple > limit:
breakif multiple % num == 0:
sieve[k * 8 + l] = 1# Collect primes
primes = list(small_primes)
for i inrange(size):
for j, offset inenumerate(wheel):
num = 30 * i + offset
if num > limit:
breakif num >= 7andnot sieve[i * 8 + j]:
primes.append(num)
return primes
defprime_iterator(self, limit: int) -> Iterator[int]:
"""
Memory-efficient prime iterator using segmented sieve.
Yields primes one at a time without storing all in memory.
"""
segment_size = 65536for seg_start inrange(2, limit + 1, segment_size):
seg_end = min(seg_start + segment_size - 1, limit)
for p inself.segmented_sieve(seg_start, seg_end, segment_size):
yield p
The Sieve Marks Composites, Not Primes
Basic sieve: O(n log log n) time, O(n) space. Good for n < 10^8.
Bit-packed sieve: same time, 1/8 the memory. Good for n < 10^9.
Segmented sieve: O(n log log n) time, O(sqrt(n)) memory. Good for n > 10^9.
Wheel sieve: 3.75x fewer candidates by skipping multiples of 2, 3, 5.
Rule: use basic sieve for n < 10^8. Use segmented sieve for larger ranges. Use bit-packing when memory is constrained.
Production Insight
A search engine needed all primes up to 10^9 for a hash function that used prime-sized tables. The basic boolean sieve required 1GB of memory, causing OOM on the 2GB container. Switching to a bit-packed sieve reduced memory to 125MB, fitting comfortably. Further optimization with a wheel sieve reduced runtime from 12 seconds to 3.2 seconds.
Cause: boolean array used 1 byte per number instead of 1 bit. Effect: 8x memory waste. Impact: OOM crashes in production. Action: implemented bit-packed sieve with wheel factorization. Memory dropped 8x, runtime dropped 3.75x.
Key Takeaway
The Sieve of Eratosthenes is O(n log log n) — nearly linear. It is 300x faster than trial division for generating all primes up to n.
Use bit-packing for 8x memory reduction. Use segmented sieve when n exceeds available memory.
For n > 10^9, segmented sieve with 64KB segments balances cache efficiency and memory usage.
Composite Numbers in Hash Tables, Random Number Generation, and Error-Correcting Codes
Beyond cryptography, composite numbers appear in several areas of computer science where their properties affect system behavior.
Hash tables
Open addressing hash tables perform best with a prime number of buckets
Composite bucket counts (especially powers of 2 or numbers with many small factors) cause uneven key distribution
If the hash function produces values that share factors with the bucket count, keys cluster into fewer buckets
Example: bucket_count = 1000 (2^3 * 5^3). Keys with hash values divisible by 2 or 5 cluster, reducing effective capacity
Solution: use prime bucket counts (1009 instead of 1000) or powers of 2 with a high-quality hash function
Pseudorandom number generators (PRNGs)
Linear Congruential Generators (LCGs) use the recurrence: X_{n+1} = (a * X_n + c) mod m
The modulus m determines the period. If m is composite, the period may be shorter than m
For maximum period, m should be prime or a power of 2 with specific multiplier constraints
Composite moduli with small factors reduce the effective period and create detectable patterns
Error-correcting codes
Reed-Solomon codes operate over finite fields GF(p^k) where p is prime
The field size determines the symbol size and error correction capability
Composite field sizes are not used because they do not form fields
BCH codes use properties of composite polynomial rings for burst error correction
Modular arithmetic
Euler's totient function phi(n) counts integers <= n that are coprime to n
For composite n = p^a q^b ..., phi(n) = n (1-1/p) (1-1/q) * ...
Fermat's Little Theorem (a^(p-1) = 1 mod p for prime p) does not hold for composite n
Carmichael numbers are composite numbers that satisfy a^(n-1) = 1 mod n for all coprime a — they fool Fermat's primality test
import math
from typing importList, Dict, TupleclassCompositeApplications:
"""Composite number applications in hash tables, PRNGs, and modular arithmetic."""defanalyze_hash_distribution(self, keys: List[int], bucket_count: int) -> dict:
"""
Analyze key distribution across buckets to detect composite-related clustering.
"""
buckets = [0] * bucket_count
for key in keys:
buckets[key % bucket_count] += 1
non_empty = sum(1for b in buckets if b > 0)
max_bucket = max(buckets)
min_bucket = min(b for b in buckets if b > 0) if non_empty > 0else0
avg_bucket = len(keys) / non_empty if non_empty > 0else0# Calculate variance from ideal distribution
ideal = len(keys) / bucket_count
variance = sum((b - ideal) ** 2for b in buckets) / bucket_count
return {
'bucket_count': bucket_count,
'is_prime_bucket_count': self._is_prime(bucket_count),
'total_keys': len(keys),
'non_empty_buckets': non_empty,
'utilization': round(non_empty / bucket_count * 100, 1),
'max_bucket_size': max_bucket,
'min_bucket_size': min_bucket,
'avg_bucket_size': round(avg_bucket, 2),
'variance': round(variance, 2),
'recommendation': (
'Prime bucket count — good distribution expected.'ifself._is_prime(bucket_count) else'Composite bucket count — consider switching to nearest prime for better distribution.'
),
}
def_is_prime(self, n: int) -> bool:
"""Quick primality check for small numbers."""if n < 2:
returnFalseif n < 4:
returnTrueif n % 2 == 0or n % 3 == 0:
returnFalse
i = 5while i * i <= n:
if n % i == 0or n % (i + 2) == 0:
returnFalse
i += 6returnTruedefeuler_totient(self, n: int) -> int:
"""
ComputeEuler's totient function phi(n).
phi(n) counts integers 1..n that are coprime to n.
For prime p: phi(p) = p - 1For composite n = p1^a1 * p2^a2 * ...: phi(n) = n * prod(1 - 1/pi)
"""
result = n
temp = n
p = 2while p * p <= temp:
if temp % p == 0:
while temp % p == 0:
temp //= p
result -= result // p # result *= (1 - 1/p)
p += 1if temp > 1:
result -= result // temp
return result
deffind_carmichael_numbers(self, limit: int) -> List[int]:
"""
FindCarmichael numbers up to limit.
Carmichael numbers are composite numbers n where a^(n-1) = 1 mod n
for all a coprime to n. They fool Fermat's primality test.
"""
carmichael = []
for n inrange(2, limit + 1):
ifself._is_prime(n):
continue # Carmichael numbers must be composite# Must be square-free (no repeated prime factors)
temp = n
is_square_free = True
p = 2while p * p <= temp:
count = 0while temp % p == 0:
count += 1
temp //= p
if count > 1:
is_square_free = Falsebreak
p += 1ifnot is_square_free:
continue# Must have at least 3 prime factors
phi = self.euler_totient(n)
if (n - 1) % phi == 0:
# Verify Fermat condition for a few bases
passes_all = Truefor a in [2, 3, 5, 7, 11]:
if math.gcd(a, n) == 1:
ifpow(a, n - 1, n) != 1:
passes_all = Falsebreakif passes_all:
carmichael.append(n)
return carmichael
deflcg_period_analysis(self, a: int, c: int, m: int, num_samples: int = 100000) -> dict:
"""
Analyze the period of a LinearCongruentialGenerator.
X_{n+1} = (a * X_n + c) mod m
Maximum period = m if: gcd(c,m)=1, andfor every prime p|m: (a-1)=0 mod p,
andif4|m: (a-1)=0 mod 4.
"""
x = 1# seed
seen = {}
period = 0for i inrange(num_samples):
if x in seen:
period = i - seen[x]
break
seen[x] = i
x = (a * x + c) % m
max_possible_period = m
achieves_max = period == max_possible_period
# Check conditions for maximum period
conditions = {
'gcd_c_m_is_1': math.gcd(c, m) == 1,
}
# Check: for every prime p dividing m, (a-1) = 0 mod p
temp = m
p = 2
prime_conditions = Truewhile p * p <= temp:
if temp % p == 0:
if (a - 1) % p != 0:
prime_conditions = Falsewhile temp % p == 0:
temp //= p
p += 1if temp > 1and (a - 1) % temp != 0:
prime_conditions = False
conditions['a_minus_1_divisible_by_all_prime_factors_of_m'] = prime_conditions
# If 4 divides m, check (a-1) = 0 mod 4if m % 4 == 0:
conditions['a_minus_1_divisible_by_4'] = (a - 1) % 4 == 0return {
'parameters': {'a': a, 'c': c, 'm': m},
'm_is_prime': self._is_prime(m),
'm_is_composite': notself._is_prime(m) and m > 1,
'observed_period': period,
'max_possible_period': max_possible_period,
'achieves_max_period': achieves_max,
'conditions': conditions,
}
Carmichael Numbers Fool Fermat's Test
561 = 3 11 17: smallest Carmichael number. Passes Fermat for all bases coprime to 561.
1105 = 5 13 17: second Carmichael number.
Carmichael numbers are always odd, square-free, and have at least 3 prime factors.
Infinitely many Carmichael numbers exist (Alford, Granville, Pomerance, 1994).
Rule: never use Fermat's test alone for primality. Always use Miller-Rabin or AKS.
Production Insight
A distributed cache used hash bucket count = 2^20 (1,048,576). The hash function was key % bucket_count. Because bucket_count is a power of 2, only the lower 20 bits of the hash determined the bucket. The upper bits were wasted. When keys followed a pattern (sequential IDs), they clustered in the first 1024 buckets, causing 99.9% of buckets to be empty.
Cause: power-of-2 bucket count discarded upper hash bits. Effect: sequential keys clustered in 0.1% of buckets. Impact: cache hit rate dropped from 95% to 12%. Action: switched to prime bucket count (1,048,573) so all hash bits contribute to the bucket index. Hit rate recovered to 94%.
Key Takeaway
Composite bucket counts with many small factors cause uneven hash distribution. Use prime bucket counts for open addressing.
Carmichael numbers are composites that fool Fermat's test — always use Miller-Rabin.
LCG period depends on the modulus. Composite moduli with small factors reduce the effective period.
The Naive Loop That Cost Us a Datacenter
Before you reach for Sieve of Eratosthenes or Miller-Rabin, understand why the naive approach is where production systems die. The O(n√n) loop checking every number against every divisor works fine for n = 10⁶. Push it to 10⁹ and you've got a 30-minute compute bill and a pager alert.
The naive method: for each integer i from 2 to n, test divisibility from 2 up to √i. If any divisor found, i is composite. This is how you teach primality in CS 101. It's deterministic, correct, and catastrophically slow at scale.
Where we see this fail most often: hash table sizing and random number generator seeding. A developer writes for(int i=2; i<n; i++) isPrime[i] = checkNaive(i) and wonders why their service starts timing out at 10k requests. The √n inside a loop creates quadratic-like behavior in disguise.
Never use this outside of prototyping or academic exercises. Real systems need probabilistic tests or sieves. The naive approach is a teaching tool, not a production tool.
NaiveCompositeCheck.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// io.thecodeforge — dsa tutorialpublicclassNaiveCompositeCheck {
publicstaticbooleanisPrime(int candidate) {
if (candidate < 2) returnfalse;
if (candidate == 2) returntrue;
if (candidate % 2 == 0) returnfalse;
int limit = (int) Math.sqrt(candidate);
for (int divisor = 3; divisor <= limit; divisor += 2) {
if (candidate % divisor == 0) returnfalse;
}
returntrue;
}
publicstaticvoidmain(String[] args) {
int n = 100;
for (int i = 2; i <= n; i++) {
if (!isPrime(i)) {
System.out.print(i + " ");
}
}
}
}
This loop seems linear, but that √n per iteration gives you O(n√n). At n=10^6, that's ~1 trillion operations. Your CPU will throttle, your latency will spike, and your on-call rotation will hate you.
Key Takeaway
Naive O(n√n) composite detection is for proofs, not production. For n > 10^4, use a sieve or probabilistic test.
Segmenting the Sieve: When Memory Eats Your Budget
Standard Sieve of Eratosthenes needs a boolean array of size n+1. For n = 10⁹, that's ~1 GB of memory just for flags. In a container with 512 MB limit, you crash before you compute anything. That's where segmented sieves save your ass.
Instead of allocating for n, you chunk the range [1, n] into segments of size √n (or less). For each segment, you track which primes cross into it. Memory drops from O(n) to O(√n). Time complexity stays O(n log log n) — you're just trading memory for a constant factor in runtime.
Here's the pattern that works in production: first generate all primes up to √n using a normal sieve (that's O(√n) memory). Then iterate over segments, marking composites using those primes. Each segment's range is small enough to fit in L2 cache or your container limit.
This is how you handle queries like "count composites in [10¹², 10¹² + 10⁶]" without renting a mainframe. Real-world applications: prime gaps research, cryptography parameter validation, and large-scale hash table preallocation.
SegmentedSieve.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
// io.thecodeforge — dsa tutorialimport java.util.*;
publicclassSegmentedSieve {
publicstaticList<Integer> getCompositesInRange(long low, long high) {
int limit = (int) Math.sqrt(high) + 1;
boolean[] isPrimeSmall = newboolean[limit + 1];
Arrays.fill(isPrimeSmall, true);
for (int p = 2; p * p <= limit; p++) {
if (isPrimeSmall[p]) {
for (int multiple = p * p; multiple <= limit; multiple += p)
isPrimeSmall[multiple] = false;
}
}
List<Integer> smallPrimes = newArrayList<>();
for (int i = 2; i <= limit; i++) {
if (isPrimeSmall[i]) smallPrimes.add(i);
}
int segmentSize = (int) (high - low + 1);
boolean[] isComposite = newboolean[segmentSize];
for (int prime : smallPrimes) {
long start = Math.max(prime * prime, ((low + prime - 1) / prime) * prime);
for (long j = start; j <= high; j += prime) {
isComposite[(int) (j - low)] = true;
}
}
List<Integer> composites = newArrayList<>();
for (int idx = 0; idx < segmentSize; idx++) {
if (isComposite[idx]) composites.add((int) (low + idx));
}
return composites;
}
publicstaticvoidmain(String[] args) {
List<Integer> result = getCompositesInRange(10, 30);
System.out.println(result);
}
}
Segmented sieves trade cache efficiency for memory. For ranges under 10^8, a flat boolean array is faster. Above that, segment sizes of 10^6 (fitting L2 cache) give the best throughput. Profile both.
Key Takeaway
Segmented sieve drops memory to O(√n) without changing runtime asymptotics — essential for ranges above 10⁸ in memory-constrained environments.
Why Quicksort Degrades on Composite-Length Arrays
Quicksort's worst-case O(n²) behavior rarely comes from random data—it comes from structured inputs where the pivot choice systematically fails. Composite-length arrays (e.g., 12, 18, 100 elements) often hide worst-case patterns, especially when the array size has many factors and the pivot selection aligns with those factors. For example, arrays whose length is a power of 2 can expose median-of-three pivot failures if data is nearly sorted. The fix: use random pivot selection to eliminate dependency on array length structure. Historically, Java's Arrays.sort() moved from median-of-three to dual-pivot quicksort partly to mitigate these length-related performance cliffs. When your data has composite-number lengths, test with adversarial sequences—your median-of-three pivot might be picking the second or third quartile, not the median, blowing up recursion depth.
QuicksortPivot.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// io.thecodeforge — dsa tutorialpublicclassQuicksortPivot {
publicstaticvoidsort(int[] a) {
sort(a, 0, a.length - 1);
}
privatestaticvoidsort(int[] a, int lo, int hi) {
if (lo >= hi) return;
int len = hi - lo + 1;
int p = a[lo + len / 2]; // bad: midpoint on composite lengthint i = lo, j = hi;
while (i <= j) {
while (a[i] < p) i++;
while (a[j] > p) j--;
if (i <= j) {
int t = a[i]; a[i] = a[j]; a[j] = t;
i++; j--;
}
}
sort(a, lo, j);
sort(a, i, hi);
}
}
Output
No output (method sorts in place)
Production Trap:
A financial trading system using median-of-three pivots on fixed-length arrays of 100 (composite) experienced 40ms spikes during market opens — switching to random pivot fixed the latency jitter.
Key Takeaway
Always use random pivot selection when array length is composite; length factors correlate with worst-case data patterns.
Visualizing Quicksort Pathologies on Composite Array Sizes
When students draw quicksort recursion trees, composite-length arrays expose a visual truth: the partition imbalance amplifies with every composite divisor. A 12-element array (factors: 2, 3, 4, 6) selected with a fixed pivot yields a recursion tree that's 5 levels deep at worst, instead of the ideal 4 for O(n log n). Draw each pivot as a red node: on composite lengths, red nodes cluster on one side, creating lopsided trees that approach chain-like O(n) structure. The critical insight: composite-number sizes offer more pivot traps because the median-of-three on even length picks between indices that may align with sorted regions. Visualize the swap count per level: composite lengths show 15–40% more swaps than prime-length arrays of similar size. This visual proof—lopsided recursion depth and extra swaps—is why production systems randomize or use introspective sort (fallback to heapsort) when recursion exceeds 2*log2(n).
VisualizeSwaps.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
// io.thecodeforge — dsa tutorialpublicclassVisualizeSwaps {
staticint swaps = 0;
publicstaticintquickSortSwaps(int[] a) {
swaps = 0;
sort(a, 0, a.length - 1);
return swaps;
}
privatestaticvoidsort(int[] a, int lo, int hi) {
if (lo >= hi) return;
int p = a[lo]; // fixed pivotint i = lo, j = hi;
while (i <= j) {
while (a[i] < p) i++;
while (a[j] > p) j--;
if (i <= j) {
int t = a[i]; a[i] = a[j]; a[j] = t;
swaps++;
i++; j--;
}
}
sort(a, lo, j);
sort(a, i, hi);
}
publicstaticvoidmain(String[] args) {
int[] arr = {3, 7, 1, 9, 2, 8, 5, 6, 4};
System.out.println("Swaps: " + quickSortSwaps(arr));
}
}
Output
Swaps: 10
Production Trap:
Visualizing swap counts across array sizes in your CI benchmark reveals that composite-length inputs (e.g., 84) consistently trigger 2x more swaps than prime-length neighbors (83, 89), indicating pivot selection failure.
Key Takeaway
Composite-length arrays produce visually lopsided recursion trees; monitor swap count as a proxy for quicksort health in production.
● Production incidentPOST-MORTEMseverity: high
The Weak RSA Key: A Composite Number With Repeated Prime Factors
Symptom
During a routine security scan, GCD computation between pairs of RSA public moduli revealed 47 keys out of 15,000 that shared a common prime factor. Any two keys with a shared factor can be fully factored in milliseconds.
Assumption
The team assumed the system's random number generator produced sufficient entropy for cryptographic key generation. They did not verify key uniqueness or factor independence after generation.
Root cause
The key generation service used /dev/urandom on a containerized VM that had just been cloned from a snapshot. The clone operation reset the entropy pool, causing the first 47 key generation calls to produce correlated random seeds. The prime generation algorithm (Miller-Rabin with random witnesses) selected the same prime candidate for multiple keys because the random inputs were correlated.
The composite modulus n = p * q was different for each key (because q varied), but p was shared across 47 keys. Computing GCD(n1, n2) for any pair of these keys instantly reveals p, and dividing n by p reveals q. The entire private key is compromised.
Fix
1. Immediately revoked all 47 affected certificates and re-generated keys with verified entropy.
2. Added a post-generation uniqueness check: after generating each RSA modulus, compute GCD(new_modulus, all_existing_moduli). If GCD > 1, reject the key and re-generate.
3. Replaced /dev/urandom with a hardware security module (HSM) for all cryptographic key generation. The HSM provides guaranteed entropy independent of VM state.
4. Added a nightly batch job that computes pairwise GCD across all stored RSA moduli and alerts if any common factors are detected.
5. Implemented entropy verification at VM boot: the key generation service refuses to start if the entropy pool has fewer than 256 bits of estimated entropy.
Key lesson
RSA security depends on the assumption that n = p * q uses two unique, independently generated primes. If p or q is shared across keys, the composite number's factorization becomes trivial.
VM cloning, snapshot restores, and container reuse reset entropy pools. Always re-seed the random number generator with hardware entropy after any virtualization event.
Post-generation verification is not optional for cryptographic keys. GCD checks across all stored moduli detect shared factors that indicate entropy failures.
The Fundamental Theorem of Arithmetic guarantees unique prime factorization. But uniqueness of the factorization does not guarantee uniqueness of the primes used — you must verify independently.
Production debug guideSymptom-to-action guide for primality testing failures, factorization bottlenecks, and cryptographic key issues5 entries
Symptom · 01
Primality test returns incorrect result for large numbers
→
Fix
Verify you are using a probabilistic test (Miller-Rabin) with sufficient witness rounds. For deterministic results on 64-bit integers, use the deterministic witness set: {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}. For cryptographic use, Miller-Rabin with 40+ rounds gives error probability below 2^-80.
Symptom · 02
Factorization algorithm times out on large composite numbers
→
Fix
Check the algorithm choice. Trial division is O(sqrt(n)) — impractical for numbers above 10^18. For large composites, use Pollard's Rho for small factors, then Elliptic Curve Method (ECM) for medium factors, then Quadratic Sieve or General Number Field Sieve (GNFS) for large factors. GNFS is the fastest known classical algorithm for numbers above 100 digits.
Symptom · 03
Sieve of Eratosthenes runs out of memory for large ranges
→
Fix
A segmented sieve processes one segment at a time, keeping only sqrt(n) primes in memory. For range [a, b], memory is O(sqrt(b) + segment_size) instead of O(b). Use segment sizes of 32KB-1MB to balance cache efficiency and memory usage.
Symptom · 04
RSA key generation produces keys with shared prime factors
→
Fix
This indicates entropy failure. Check the random number generator source. Run pairwise GCD across all generated moduli: GCD(n1, n2) > 1 means shared factor. Re-seed the RNG with hardware entropy. Consider switching to an HSM for key generation.
Symptom · 05
Hash table performance degrades due to composite bucket count
→
Fix
Hash tables with composite bucket counts (especially powers of 2 or numbers with many small factors) cluster keys unevenly. Use a prime number of buckets for open addressing. For closed addressing (chaining), powers of 2 with a good hash function work fine.
★ Composite Number Triage Cheat SheetFast symptom-to-action for engineers investigating primality, factorization, or cryptographic issues. First 5 minutes.
Need to check if a number is composite or prime−
Immediate action
Run trial division up to sqrt(n) for numbers under 10^12. Use Miller-Rabin for larger numbers.
Commands
python3 -c "n=123456789; print('COMPOSITE' if any(n%i==0 for i in range(2,int(n**0.5)+1)) else 'PRIME')"
python3 -c "# segmented sieve: only store primes up to sqrt(n), process range in chunks"
Fix now
Use segmented sieve with 32KB-1MB segment size. Memory drops from O(n) to O(sqrt(n) + segment_size).
Primality Testing and Factorization Algorithm Comparison
Algorithm
Type
Time Complexity
Practical Limit
Use Case
Trial Division
Deterministic
O(sqrt(n))
n < 10^12
Small numbers, finding small factors
Miller-Rabin (k rounds)
Probabilistic
O(k * log^3 n)
n < 2^10000
General primality testing, cryptographic key generation
Miller-Rabin (12 fixed witnesses)
Deterministic for n < 2^64
O(12 * log^3 n)
n < 2^64
64-bit integer primality, database keys
AKS
Deterministic
O(log^6 n)
Theoretical only
Academic interest, not practical
Sieve of Eratosthenes
Deterministic
O(n log log n)
n < 10^9
Generating all primes up to a limit
Segmented Sieve
Deterministic
O(n log log n), O(sqrt(n)) space
n < 10^12+
Primes in large ranges, memory-constrained
Pollard's Rho
Probabilistic
O(n^(1/4)) expected
n < 10^30
Finding small factors of large composites
Quadratic Sieve
Deterministic
Sub-exponential
n < 10^100
Factoring medium-sized composites
GNFS
Deterministic
O(exp(c n^(1/3) (log n)^(2/3)))
n < 10^300
Fastest classical factoring for large composites
Shor's Algorithm
Quantum
O(log^3 n)
Requires quantum computer
Breaking RSA (post-quantum threat)
Key takeaways
1
A composite number has divisors beyond 1 and itself. Every composite has at least one prime factor <= sqrt(n). This property makes trial division work.
2
The Fundamental Theorem of Arithmetic guarantees unique prime factorization. This is the mathematical foundation of RSA cryptography.
3
Miller-Rabin with 40 rounds is the standard for cryptographic primality testing. For 64-bit integers, 12 fixed witnesses give deterministic results.
4
RSA security depends on the hardness of factoring composite numbers. Shared prime factors across moduli make factoring trivial via GCD. Always verify key uniqueness.
5
The Sieve of Eratosthenes is O(n log log n)
300x faster than trial division for generating all primes up to n. Use bit-packing for 8x memory reduction.
6
Carmichael numbers are composites that fool Fermat's test. Never use Fermat's test alone. Always use Miller-Rabin.
7
Hash tables with composite bucket counts (many small factors) cause uneven key distribution. Use prime bucket counts for open addressing.
8
Shor's algorithm breaks RSA on quantum computers. Plan post-quantum migration (lattice-based cryptography) by 2030.
Common mistakes to avoid
5 patterns
×
Using Fermat's primality test alone without Miller-Rabin
Symptom
Carmichael numbers (e.g., 561, 1105, 1729) pass Fermat's test for all bases coprime to n. Composite numbers are classified as prime.
Fix
Always use Miller-Rabin instead of Fermat's test. Miller-Rabin adds the squaring step that catches Carmichael numbers. For 64-bit integers, use 12 fixed deterministic witnesses.
×
Using trial division for numbers above 10^12
Symptom
Trial division takes minutes to hours for numbers above 10^12. For 10^18, it takes days. For cryptographic-scale numbers (2^2048), it is computationally impossible.
Fix
Use Miller-Rabin for primality testing. For factorization, switch to Pollard's Rho, Quadratic Sieve, or GNFS depending on the size.
×
Not verifying RSA key uniqueness after generation
Symptom
Shared prime factors across keys due to entropy failure. GCD(n1, n2) > 1 instantly reveals the shared prime, compromising both private keys.
Fix
After generating each RSA key, compute GCD(new_modulus, all_existing_moduli). Reject and re-generate if GCD > 1. Run nightly pairwise GCD scans across all stored moduli.
×
Using a composite modulus in a Linear Congruential Generator
Symptom
The PRNG period is shorter than the modulus. Patterns appear in the output sequence. Randomness tests fail.
Fix
Use a prime modulus for maximum period. If using a power-of-2 modulus, ensure the multiplier satisfies specific constraints (a mod 8 = 5). Verify the observed period matches the theoretical maximum.
×
Allocating O(n) memory for sieve when n > 10^8
Symptom
Sieve of Eratosthenes for n = 10^9 requires 1GB with boolean array. Container OOM crash.
Fix
Use bit-packed sieve (1/8 memory) or segmented sieve (O(sqrt(n)) memory). For n = 10^9, bit-packing reduces memory to 125MB. Segmented sieve reduces it further.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01JUNIOR
What is the difference between a prime number and a composite number? Wh...
Q02SENIOR
How does the Miller-Rabin primality test work, and why is it preferred o...
Q03SENIOR
Why is RSA encryption based on composite numbers? What would happen if s...
Q04SENIOR
A hash table uses 1000 buckets. Performance degrades under certain key p...
Q05SENIOR
What are Carmichael numbers and why do they matter for primality testing...
Q01 of 05JUNIOR
What is the difference between a prime number and a composite number? What about the number 1?
ANSWER
A prime number has exactly two positive divisors: 1 and itself. A composite number has more than two positive divisors — it can be written as a product of two integers both greater than 1. The number 1 is neither prime nor composite — it is a unit. It has exactly one divisor (itself), which does not satisfy the prime definition (needs exactly two). This distinction matters because the Fundamental Theorem of Arithmetic requires unique prime factorization, and including 1 as prime would break uniqueness (6 = 2 3 = 1 2 3 = 1^100 2 * 3).
Q02 of 05SENIOR
How does the Miller-Rabin primality test work, and why is it preferred over trial division for large numbers?
ANSWER
Miller-Rabin is based on properties of modular exponentiation. It writes n-1 as 2^r d, then checks whether a^d = 1 mod n or a^(2^j d) = -1 mod n for some j. If neither holds for a witness a, then n is definitely composite. Each round has at most 25% chance of missing a composite. After k rounds, error probability is at most 4^(-k). It is preferred over trial division because trial division is O(sqrt(n)) — for a 2048-bit number, sqrt(n) is 2^1024, which is computationally impossible. Miller-Rabin is O(log^3 n) per round — milliseconds even for cryptographic-scale numbers.
Q03 of 05SENIOR
Why is RSA encryption based on composite numbers? What would happen if someone could factor large composites efficiently?
ANSWER
RSA relies on the computational asymmetry between multiplying two large primes (easy, milliseconds) and factoring their composite product (hard, infeasible for 2048-bit numbers with classical algorithms). The public key contains n = p * q. The private key requires knowing p and q individually. If someone could factor n efficiently, they could derive the private key from the public key, breaking all RSA encryption. The best classical algorithm (GNFS) factors a 2048-bit composite in approximately 2^112 operations. Shor's algorithm on a quantum computer could do it in polynomial time, which is why NIST is standardizing post-quantum alternatives.
Q04 of 05SENIOR
A hash table uses 1000 buckets. Performance degrades under certain key patterns. What is the problem and how do you fix it?
ANSWER
1000 = 2^3 * 5^3 is a composite number with many small factors. If the hash function produces values that share these factors, keys cluster into fewer buckets. For example, sequential integer keys hashed as key % 1000 will cluster in buckets 0-999 with uneven distribution if the hash function preserves divisibility patterns. Fix: use a prime number of buckets (1009 instead of 1000). Primes have no factors in common with most hash values, ensuring uniform distribution. Alternatively, use powers of 2 (1024) with a high-quality hash function (murmurhash, xxhash) that mixes all input bits into the output.
Q05 of 05SENIOR
What are Carmichael numbers and why do they matter for primality testing?
ANSWER
Carmichael numbers are composite numbers that pass Fermat's primality test for all bases coprime to n. The smallest is 561 = 3 11 17. They satisfy a^(n-1) = 1 mod n for all coprime a, which is Fermat's criterion for primality. They fool Fermat's test into declaring them prime. This is why Fermat's test alone is insufficient for primality testing. Miller-Rabin catches Carmichael numbers because it checks additional conditions beyond Fermat's criterion — specifically, it requires that a^d = 1 or a^(2^j * d) = -1 mod n for some j, which Carmichael numbers fail for certain witnesses.
01
What is the difference between a prime number and a composite number? What about the number 1?
JUNIOR
02
How does the Miller-Rabin primality test work, and why is it preferred over trial division for large numbers?
SENIOR
03
Why is RSA encryption based on composite numbers? What would happen if someone could factor large composites efficiently?
SENIOR
04
A hash table uses 1000 buckets. Performance degrades under certain key patterns. What is the problem and how do you fix it?
SENIOR
05
What are Carmichael numbers and why do they matter for primality testing?
SENIOR
FAQ · 8 QUESTIONS
Frequently Asked Questions
01
What is a composite number?
A composite number is a positive integer greater than 1 that has at least one positive divisor other than 1 and itself. In other words, it can be divided evenly by numbers other than 1 and itself. Examples: 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20.
Was this helpful?
02
What is the smallest composite number?
The smallest composite number is 4. It is the product of 2 * 2. The numbers 2 and 3 are prime (not composite), and 1 is neither prime nor composite.
Was this helpful?
03
Is 1 a composite number?
No. The number 1 is neither prime nor composite. It is called a unit. It has exactly one positive divisor (itself), which does not satisfy the definition of prime (exactly two divisors) or composite (more than two divisors).
Was this helpful?
04
How do you check if a number is composite?
Check if any integer from 2 to sqrt(n) divides n evenly. If any divisor is found, the number is composite. This is trial division with O(sqrt(n)) time complexity. For large numbers, use the Miller-Rabin primality test — if the number is not prime (and greater than 1), it is composite.
Was this helpful?
05
What is the Fundamental Theorem of Arithmetic?
Every integer greater than 1 can be expressed as a unique product of prime numbers, up to the order of factors. For example, 60 = 2^2 3 5. This factorization is unique — no other combination of primes produces 60. This theorem is the foundation of RSA cryptography.
Was this helpful?
06
Why are composite numbers important in cryptography?
RSA encryption relies on the difficulty of factoring large composite numbers. The public key contains n = p * q (the product of two large primes). Multiplying p and q is easy, but factoring n back into p and q is computationally infeasible for 2048-bit numbers with classical algorithms. The security of encrypted communication depends on this asymmetry.
Was this helpful?
07
What is the difference between trial division and the Sieve of Eratosthenes?
Trial division tests individual numbers for primality by checking divisors up to sqrt(n). It is O(sqrt(n)) per number. The Sieve of Eratosthenes generates all primes up to a limit by iteratively marking composite numbers. It is O(n log log n) total — 300x faster than testing each number individually with trial division.
Was this helpful?
08
What are Carmichael numbers?
Carmichael numbers are composite numbers that pass Fermat's primality test for all bases coprime to n. They satisfy a^(n-1) = 1 mod n, which is Fermat's criterion for primality. The smallest is 561 = 3 11 17. They are the reason Fermat's test alone is insufficient — Miller-Rabin adds additional checks to detect them.