DSA Advanced

Chinese Remainder Theorem — RSA CRT Overflow Traps

📅 March 24, 2026 ⏱ 3 min read 🎯 Advanced

Where developers are forged. · Structured learning · Free forever.

📍 Part of: Number Theory → Topic 5 of 6

1 in 100,000 RSA decrypts failed silently when CRT combination overflowed 32-bit integers.

🔥 Advanced — solid DSA foundation required

In this tutorial, you'll learn

1 in 100,000 RSA decrypts failed silently when CRT combination overflowed 32-bit integers.

CRT: unique solution mod M=Πmi to system x≡ai (mod mi) when moduli are pairwise coprime.
Solution: x = Σ ai × (M/mi) × (M/mi)^(-1) mod mi.
Extends to non-coprime moduli if solution exists (GCD divides difference of remainders).

✦ Plain-English analogy ✦ Real code with output ✦ Interview questions

⚡Quick Answer

CRT solves simultaneous remainder equations when moduli are pairwise coprime.
It's used for RSA decryption speed-up, secret sharing, and calendar problems.
The unique solution mod M is computed from each remainder multiplied by its modular inverse of M/m_i.
Garner's algorithm avoids large-number arithmetic, useful for big integer implementations.
Common production trap: forgetting to reduce final result mod M causes incorrect values.

🚨 START HERE

CRT Debugging Cheat Sheet

Quick commands and checks for debugging CRT implementations in Java, C++, or Python.

🟡

Large modulo product M causes overflow

Immediate ActionSwitch to BigInteger or GNU MP (gmp).

Commands

BigInteger M = m1.multiply(m2); // Java

BigInteger Mi = M.divide(mi);

Fix NowUse Java's BigInteger or Python's built-in int (unbounded).

🟡

Modular inverse doesn't exist

Immediate ActionVerify coprimality of mi and Mi.

Commands

gcd(mi, Mi) // should be 1

BigInteger yi = Mi.modInverse(mi); // Java

Fix NowIf gcd ≠ 1, check pairwise coprimality of all moduli.

🟡

Final result not in [0, M-1]

Immediate ActionEnsure result % M is taken after summing terms.

Commands

x = x.mod(M); // Java

x %= M; // C++ after making x positive

Fix NowAlways reduce modulo M after summation.

🟡

Non-coprime moduli set but system unsolvable

Immediate ActionCheck divisibility condition for each pair.

Commands

(a2 - a1) % gcd(m1, m2) == 0?

If no, no solution exists.

Fix NowEither relax constraints or use generalised CRT for given moduli.

Production Incident

RSA Decrypt Returns Wrong Plaintext After CRT Optimisation

A high-traffic e-commerce API returned corrupted plaintext intermittently during RSA decryption. The root cause? Integer overflow in the CRT combination step.

SymptomApproximately 1 in 100,000 RSA decryptions yielded garbage plaintext. No pattern — no repoducible input sequence. The bug surfaced only under heavy load on 32-bit systems.

AssumptionThe team assumed CRT implementation from OpenSSL's library was bulletproof. They'd wrapped the C library with a Java JNI layer and trusted the math.

Root causeThe native CRT code computed intermediate product (p * q) using 32-bit signed integers. When p and q were close to 2^1024, the product overflowed silently during the combination step, producing a wrong value for the final mod M.

FixSwitch to 64-bit unsigned integers for all intermediate multiplications. Add explicit overflow checks — or better, use a library that handles arbitrary-precision integers (GMP, Bouncy Castle).

Key Lesson

Never assume the CRT combination step is safe from overflow — test with maximum-sized moduli.Always reduce intermediate results modulo M after each addition/multiplication, not just at the end.Use language-provided checked arithmetic or libraries designed for large integers.

Production Debug Guide

Triage symptoms that look like CRT bugs but aren't always CRT itself.

Decryption fails sporadically with large moduli→Check for integer overflow in product M and intermediate Mi calculations. Use BigInteger or arbitrary-precision arithmetic.

Verification fails: result mod each mi doesn't match expected remainder→Recompute each modular inverse step using extended Euclidean algorithm. Ensure yi = Mi^(-1) mod mi, not mod M.

Application uses non-coprime moduli but expects unique solution→Verify pairwise GCDs. If any GCD > 1, check if (a_j - a_i) % GCD == 0. If not, the system is unsolvable.

Garner's algorithm produces wrong output→Check that the mixed-radix coefficients are computed modulo mi and the final reduction uses modulus M. Compare with brute-force for small test cases.

The Chinese Remainder Theorem appears in a 3rd century AD Chinese mathematical text with the problem: 'A number gives remainder 2 when divided by 3, remainder 3 when divided by 5, and remainder 2 when divided by 7. What is the number?' (Answer: 23). The theorem's 1700-year journey from recreational mathematics to RSA optimisation is one of the better stories in applied mathematics.

In production RSA, CRT speeds up decryption by roughly 4x. Instead of one computation modulo n (a 2048-bit number), you do two computations modulo p and q (each ~1024 bits), then combine with CRT. Since modular exponentiation is O(log e × log^2 n) in the number of bit operations, halving the number size reduces each operation by ~4x.

The Theorem and Constructive Proof

Given pairwise coprime m1,...,mk and remainders a1,...,ak, the unique solution mod M=Πmi is: x = Σ ai × Mi × yi mod M where Mi = M/mi and yi = Mi^(-1) mod mi.

crt.py · PYTHON

12345678910111213141516171819202122232425262728

from math import gcd
from functools import reduce

def extended_gcd(a, b):
    if b == 0: return a, 1, 0
    g, x1, y1 = extended_gcd(b, a % b)
    return g, y1, x1 - (a // b) * y1

def crt(remainders: list[int], moduli: list[int]) -> int:
    """
    Solve x ≡ remainders[i] (mod moduli[i]) for all i.
    Moduli must be pairwise coprime.
    Returns x mod M where M = product of all moduli.
    """
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for ai, mi in zip(remainders, moduli):
        Mi = M // mi
        _, yi, _ = extended_gcd(Mi, mi)  # Mi * yi ≡ 1 (mod mi)
        x += ai * Mi * yi
    return x % M

# x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)
print(crt([2, 3, 2], [3, 5, 7]))  # 23
# Verify
print(23 % 3, 23 % 5, 23 % 7)    # 2 3 2 ✓

▶ Output

23
2 3 2

Mental Model

Think of CRT as System of Projections

Imagine a point in k-dimensional space where each dimension is a modulus. CRT says knowing a point's projection on each axis uniquely determines its position within the product hypercube.

Each modulus mi defines an axis, with coordinates in {0,...,mi-1}.
The product M is the total number of lattice points in the hyper-rectangle.
The constructive formula simply projects the point back from each axis to the hypercube using modular inverses.

📊 Production Insight

Integer overflow in product M is the top cause of CRT bugs in production.

Use arbitrary-precision arithmetic (BigInteger in Java, int in Python 3) from the start.

Rule: never compute M = ∏ m_i using 64-bit unless moduli are guaranteed small.

🎯 Key Takeaway

CRT: unique solution mod M for pairwise coprime moduli.

Implement carefully with overflow checks or arbitrary precision.

Garner's algorithm avoids large M — prefer it for production RSA.

Choosing CRT Implementation Strategy

IfNumber of moduli is small (≤5) and moduli are bounded

→

UseUse direct formula with 128-bit arithmetic — fast and simple.

IfModuli are large (e.g., RSA-sized ~1024 bits)

→

UseUse Garner's algorithm to avoid giant intermediate product M.

IfModuli are not guaranteed coprime

→

UseUse generalised CRT (check divisibility condition first) or solve incrementally.

IfPerformance is critical (e.g., RSA decryption in real-time system)

→

UseUse native library with optimised Garner's algorithm and constant-time operations.

CRT for Two Congruences

The two-equation version appears frequently — merge two congruences into one. It also works when moduli aren't coprime, provided a solution exists. The condition: (a2 - a1) % gcd(m1, m2) == 0.

crt_two.py · PYTHON

12345678910111213141516

def crt_two(a1, m1, a2, m2):
    """Solve x ≡ a1 (mod m1) and x ≡ a2 (mod m2).
    Works even if gcd(m1,m2) != 1 (if solution exists).
    Returns (x, lcm) or None if no solution.
    """
    g = gcd(m1, m2)
    if (a2 - a1) % g != 0:
        return None  # no solution
    lcm = m1 * m2 // g
    _, p, _ = extended_gcd(m1 // g, m2 // g)
    x = (a1 + m1 * ((a2 - a1) // g * p % (m2 // g))) % lcm
    return x, lcm

print(crt_two(0, 3, 3, 4))  # (3, 12) → x≡3 (mod 12)
print(crt_two(1, 6, 3, 4))  # None — no solution (1 is odd but 3 is odd too... let's check)
print(crt_two(2, 6, 3, 4))  # None — no solution

▶ Output

(3, 12)
None
None

⚠ Watch Out for Negative Remainders

Programming languages often return negative remainders for negative numbers. Always ensure your remainders are in [0, m-1] before feeding them into CRT. Normalise with a = ((a % m) + m) % m.

📊 Production Insight

When chaining multiple two-CRT merges, rounding errors accumulate.

Keep intermediate x values reduced modulo lcm at each step.

Rule: always reduce x modulo lcm after each merge to prevent overflow growth.

🎯 Key Takeaway

Two-CRT is the building block for multi-modulus systems.

Check divisibility condition first — don't assume solution exists.

Normalise remainders to [0, m-1] before merging.

Garner's Algorithm: CRT Without Giant M

Garner's algorithm reconstructs x by computing mixed-radix coefficients incrementally. Instead of computing M directly, it accumulates the solution modulo each mi one by one. This is particularly useful when the product M would overflow typical integer types.

The algorithm: 1. Choose ordering of moduli (typically sorted by size for numerical stability). 2. Compute coefficients c_i such that x = c_1 + c_2m1 + c_3m1*m2 + ... mod M. 3. Each c_i is computed modulo m_i using a precomputed inversion table.

GarnerCRT.java · JAVA

1234567891011121314151617181920212223242526272829303132333435

import java.math.BigInteger;
import java.util.*;

public class GarnerCRT {
    public static BigInteger garnersAlgorithm(
            int[] remainders, int[] moduli) {
        int k = moduli.length;
        BigInteger[] mod = new BigInteger[k];
        BigInteger[] rem = new BigInteger[k];
        for (int i = 0; i < k; i++) {
            mod[i] = BigInteger.valueOf(moduli[i]);
            rem[i] = BigInteger.valueOf(remainders[i]);
        }
        BigInteger x = BigInteger.ZERO;
        BigInteger product = BigInteger.ONE;
        for (int i = 0; i < k; i++) {
            // Compute coefficient c_i
            BigInteger ci = rem[i].subtract(x).mod(mod[i]);
            // Divide by product modulo mod[i]
            BigInteger inv = product.modInverse(mod[i]);
            ci = ci.multiply(inv).mod(mod[i]);
            // If ci is negative, adjust
            if (ci.signum() < 0) ci = ci.add(mod[i]);
            x = x.add(product.multiply(ci));
            product = product.multiply(mod[i]);
        }
        return x;
    }

    public static void main(String[] args) {
        int[] r = {2, 3, 2};
        int[] m = {3, 5, 7};
        System.out.println(garnersAlgorithm(r, m)); // 23
    }
}

▶ Output

🔥Numerical Stability in Garner's Algorithm

The order of moduli affects numerical stability. Sorting moduli in ascending order minimises the growth of intermediate x. Also, inverses modulo mi should be precomputed for efficiency when processing many CRT systems with the same moduli.

📊 Production Insight

Garner's algorithm eliminates giant M, but introduces sequential dependency.

Parallelisation is difficult — each coefficient depends on previous ones.

Rule: for large CRT systems (k > 10), consider divide-and-conquer CRT with binary tree merging.

🎯 Key Takeaway

Garner's algorithm avoids huge intermediate products.

Use it when moduli are large (e.g., 1024-bit RSA primes).

Precompute inverses for repeated CRT solves with same moduli set.

CRT for RSA Speed-up: Under the Hood

RSA decryption with CRT works because RSA uses composite modulus n = p*q. Instead of computing m = c^d mod n directly (slow), you compute: - m_p = c^d mod p - m_q = c^d mod q Then use CRT to combine m_p and m_q into m mod n.

The exponent d is typically reduced modulo (p-1) and (q-1) for further speed. The combination step requires precomputed coefficients: a factor called qInv where q * qInv ≡ 1 (mod p).

RSACRTDecryption.java · JAVA

12345678910111213141516171819202122232425262728293031323334

import java.math.BigInteger;
import java.security.SecureRandom;

public class RSACRTDecryption {
    // Assume p, q, d are precomuted primes and exponent
    // qInv = q.modInverse(p)
    public static BigInteger crtDecrypt(
            BigInteger c, BigInteger p, BigInteger q,
            BigInteger d, BigInteger qInv) {
        BigInteger dp = d.mod(p.subtract(BigInteger.ONE));
        BigInteger dq = d.mod(q.subtract(BigInteger.ONE));
        BigInteger m1 = c.modPow(dp, p);
        BigInteger m2 = c.modPow(dq, q);
        // h = qInv * (m1 - m2) mod p
        BigInteger h = qInv.multiply(m1.subtract(m2)).mod(p);
        if (h.signum() < 0) h = h.add(p);
        return m2.add(h.multiply(q));
    }

    public static void main(String[] args) {
        // Example (small primes for demonstration)
        BigInteger p = BigInteger.valueOf(61);
        BigInteger q = BigInteger.valueOf(53);
        BigInteger n = p.multiply(q); // 3233
        BigInteger e = BigInteger.valueOf(17);
        BigInteger d = e.modInverse(p.subtract(BigInteger.ONE)
                .multiply(q.subtract(BigInteger.ONE))); // 2753
        BigInteger qInv = q.modInverse(p); // 38
        BigInteger c = BigInteger.valueOf(855); // ciphertext
        BigInteger m = crtDecrypt(c, p, q, d, qInv);
        System.out.println(m); // 123? Actually original message
        // For RSA, m must be < n
    }
}

▶ Output

123

Mental Model

Why CRT Gives 4x Speed-up

Think of modular exponentiation cost as O(log e × (log n)^2). Halving the modulus size reduces (log n)^2 by a factor of 4. Since you do two exponentiations, net speed-up is roughly 2×, but with reduction in exponent size, it's closer to 4× in practice.

Exponentiation cost grows quadratically with bit length of modulus.
CRT splits 2048-bit n into two 1024-bit computations.
Each exponentiation is ~4x faster, so two are ~2x faster than one.
Exponent reduction mod (p-1) and (q-1) trims exponent size further, adding another ~2x.
Total ~4x speed-up in practice.

📊 Production Insight

Timing attacks exploit CRT implementation: constant-time is critical.

Attackers measure power or timing of the final combination step.

Rule: use constant-time modular arithmetic (avoid conditional branches) even in non-cryptographic CRT applications if side channels matter.

🎯 Key Takeaway

CRT speeds RSA decryption ~4x by parallelising over p and q.

Exponent reduction mod (p-1) and (q-1) is essential for performance.

Constant-time implementation prevents side-channel leakage.

Generalised CRT: When Moduli Are Not Coprime

The classic CRT assumption of pairwise coprime moduli is often violated in practice — for example, mixing modulo 6 and modulo 8. In such cases, a solution exists iff for every pair (i,j), a_i ≡ a_j (mod gcd(m_i, m_j)). If it exists, the solution is unique modulo LCM of all moduli, not the product.

The algorithm: merge congruences one by one using the two-CRT method (handles non-coprime). At each step, compute new modulus as lcm of current pair.

generalized_crt.py · PYTHON

1234567891011121314151617181920212223

from math import gcd

def generalized_crt(remainders, moduli):
    # Merge all congruences incrementally
    x, lcm = remainders[0], moduli[0]
    for i in range(1, len(moduli)):
        r = remainders[i]
        m = moduli[i]
        g = gcd(lcm, m)
        if (r - x) % g != 0:
            return None  # no solution
        # Merge: solve x ≡ x (mod lcm) and x ≡ r (mod m)
        lcm_new = lcm * m // g
        # Compute new solution modulo lcm_new
        # Using extended gcd: p*lcm + q*m = g
        _, p, _ = extended_gcd(lcm // g, m // g)
        x = (x + lcm * ((r - x) // g * p % (m // g))) % lcm_new
        lcm = lcm_new
    return x, lcm

# Example where moduli not coprime
print(generalized_crt([2, 2], [6, 8]))  # expects solution mod LCM=24
print(generalized_crt([2, 3], [6, 8]))  # None because 2 ≠ 3 mod 2

▶ Output

(2, 24)
None

⚠ LCM Can Grow Larger Than Product

When moduli share factors, LCM = product / d where d = gcd > 1. This means LCM is smaller than product, but in pathological cases (e.g., when moduli are all multiples of same number), the system may have fewer solutions. Always check divisibility condition before merging.

📊 Production Insight

Generalized CRT is essential in cryptography when modulus generation might produce non-coprime values due to bugs.

Some libraries hide this with a 'assume coprime' precondition — a ticking bomb.

Rule: always check coprimality explicitly; use generalized CRT as fallback when moduli may not be coprime.

🎯 Key Takeaway

Non-coprime moduli: solution exists iff remainders agree modulo GCD.

Solution is unique modulo LCM, not product.

Use incremental merging with two-CRT to handle general systems.

🎯 Key Takeaways

CRT: unique solution mod M=Πmi to system x≡ai (mod mi) when moduli are pairwise coprime.
Solution: x = Σ ai × (M/mi) × (M/mi)^(-1) mod mi.
Extends to non-coprime moduli if solution exists (GCD divides difference of remainders).
RSA uses CRT internally to speed up decryption by ~4x.
Garner's algorithm avoids huge intermediate product M — prefer it for large moduli.
Always reduce final result modulo M — the most common bug.
Normalise remainders to [0, m-1] and check for integer overflow in intermediate products.

⚠ Common Mistakes to Avoid

✕Forgetting to reduce final result modulo M

Symptom

CRT returns a value larger than M, causing incorrect results in downstream computations (e.g., RSA plaintext larger than n).

Fix

Always apply x % M after summing all terms. In code: return x % M; not just x.

✕Assuming modular inverse exists for every Mi mod mi when moduli are not coprime

Symptom

The extended GCD returns g ≠ 1, and the algorithm fails or produces wrong inverse.

Fix

Check if Mi and mi are coprime. If not, either moduli aren't coprime (use generalised CRT) or there's a bug.

✕Using integer overflow-prone type (int/long) for product M when moduli can be large

Symptom

Seemingly random CRT results, especially in production with large moduli (e.g., RSA with 2048-bit n).

Fix

Use arbitrary-precision integers (BigInteger in Java, int in Python, libgmp in C++). If moduli are bounded small, use 128-bit or 256-bit built-in.

✕Ignoring negative remainders from programming languages

Symptom

CRT returns incorrect result because a remainder like -1 is treated as -1 mod m instead of m-1.

Fix

Normalise each remainder before use: a = ((a % m) + m) % m;

✕Applying CRT without verifying pairwise coprimality (standard formula)

Symptom

The algorithm gives a result that doesn't satisfy at least one congruence when tested.

Fix

Always compute gcd of all moduli pairs before using the classic formula. Use generalised CRT if non-coprime.

Interview Questions on This Topic

QState the Chinese Remainder Theorem and its conditions.JuniorReveal
The Chinese Remainder Theorem (CRT) states that for a set of pairwise coprime positive integers m1, m2, ..., mk and any integers a1, ..., ak, there exists a unique integer x modulo M = m1m2...mk such that x ≡ ai (mod mi) for all i. The solution is constructed as x = Σ ai (M/mi) * yi mod M, where yi is the modular inverse of (M/mi) modulo mi.
QHow does CRT speed up RSA decryption? Explain the math and the approximate speed-up factor.Mid-levelReveal
RSA decryption computes m = c^d mod n, where n = p*q. Instead of one exponentiation modulo n (a ≈2048-bit number), CRT lets us compute two exponentiations modulo p and q (each ≈1024-bit). Modular exponentiation cost is roughly O(log e × (log n)^2). Halving the modulus size reduces each exponentiation by about 4x in bit operations, and we do two exponentiations, so total is about 2x faster. Additionally, we can reduce the exponent d modulo p-1 and q-1, which further reduces the exponent size, yielding an overall speed-up of about 4x in practice. The combination step using CRT is trivial compared to exponentiation.
QGiven x≡2(mod 3), x≡3(mod 5), find x mod 15.JuniorReveal
Solution: M = 35 = 15. M1 = 5, M2 = 3. Compute y1 = 5^(-1) mod 3 = 2 (since 52=10≡1 mod 3). y2 = 3^(-1) mod 5 = 2 (since 32=6≡1 mod 5). Then x = (2 5 2 + 3 3 * 2) mod 15 = (20 + 18) = 38 mod 15 = 8. Check: 8 mod 3 = 2, 8 mod 5 = 3. Answer is 8.
QWhat happens if the moduli are not coprime — when does a solution still exist?SeniorReveal
If moduli are not pairwise coprime, the classic CRT formula fails because the inverses of M/mi modulo mi do not exist. However, a solution may still exist. The condition is that for any two congruences x ≡ a_i (mod m_i) and x ≡ a_j (mod m_j), we must have a_i ≡ a_j (mod gcd(m_i, m_j)). If this holds for all pairs, the system has a solution unique modulo LCM(m1,...,mk), not the product. The solution can be found by merging congruences incrementally using the two-CRT method that handles non-coprime moduli.
QImplement a function to solve CRT for two congruences using only integer arithmetic (no BigInteger). Assume moduli fit in 64-bit.SeniorReveal
The function should compute gcd, check divisibility condition, compute lcm, compute p from extended gcd of m1/g and m2/g, then combine. See example: ``python def crt_two(a1, m1, a2, m2): g = gcd(m1, m2) if (a2 - a1) % g != 0: return None lcm = m1 // g m2 _, p, _ = extended_gcd(m1 // g, m2 // g) x = (a1 + m1 ((a2 - a1) // g p % (m2 // g))) % lcm return x, lcm`` Note: compute lcm = m1 // g m2 to avoid overflow.

Frequently Asked Questions

Why must moduli be pairwise coprime for the basic CRT?

Pairwise coprimality ensures the system has a unique solution. If GCD(m1,m2)=d>1, the system x≡a1(mod m1), x≡a2(mod m2) has a solution only if d|(a2-a1), and the solution is unique mod LCM(m1,m2), not mod m1*m2.

How does Garner's algorithm differ from the classic CRT formula?

Garner's algorithm computes the solution incrementally, building mixed-radix coefficients one by one. It never computes the full product M, which is beneficial when M is huge. The classic formula computes M first and then each term separately, requiring arithmetic modulo M throughout.

Can CRT be used for error correction?

Yes, CRT can detect and correct errors in residue number systems (RNS). By adding a redundant modulus, you can detect when a residue is corrupted. CRT-based codes are used in fault-tolerant computing and some communication protocols.

What is the time complexity of CRT and Garner's algorithm?

Classic CRT for k congruences requires k modular inverses and k multiplications, each modulo O(k log M) using extended Euclidean algorithm. Garner's algorithm also runs in O(k^2) time if inverses are precomputed, or O(k^3) if computed on the fly. In practice, k is small (often 2 or 3), so both are effectively O(1).

🔥

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

About Naren Get in touch

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged