Strassen's Algorithm — Catastrophic Cancellation in Seismic
Strassen's 18 additions per block cause 5× more FP ops and up to 1e-3 relative error.
- Strassen's algorithm multiplies n×n matrices in O(n^2.807) time, beating the naive O(n³).
- Core trick: compute 7 products instead of 8 for 2×2 blocks, then combine with 18 additions.
- Recursively split into n/2×n/2 blocks — recurrence T(n) = 7T(n/2) + O(n²).
- Practical crossover: only faster than BLAS for n > 1024 in double precision; naive wins below that.
- Biggest mistake: ignoring catastrophic cancellation — Strassen's additions amplify floating-point errors.
Multiplying two n×n matrices naively takes n³ multiplications. Strassen's 1969 insight: for 2×2 matrices, you can get away with 7 multiplications instead of 8, using some clever additions. Recursively applying this to n/2×n/2 blocks gives O(n^2.807) — beating the O(n³) barrier. Each multiplication saved at the 2×2 level propagates to asymptotic improvement at scale.
For decades, everyone assumed matrix multiplication required exactly n^3 multiplications for n×n matrices. In 1969, Volker Strassen proved them wrong by showing you only need 7 recursive multiplications instead of 8 for the 2×2 case. Applied recursively, this changes the exponent from log_2(8)=3 to log_2(7)≈2.807.
Strassen's algorithm matters less for its practical use (numerical stability issues and large constants make it impractical below n≈512) than for what it proved: the O(n^3) barrier is not a fundamental law. It opened the door to algorithms approaching O(n^2) — Coppersmith-Winograd (1987) achieved O(n^2.376), and subsequent work has continued pushing toward O(n^2). As of 2026, the theoretical lower bound is unknown. The entire sub-cubic matrix multiplication literature traces back to Strassen's 1969 insight.
Standard Matrix Multiplication
For C = A × B where A, B are n×n matrices: C[i][j] = Σ A[i][k] × B[k][j] for k in 0..n-1
This requires n³ multiplications and n²(n-1) additions → O(n³).
Strassen's 7-Multiplication Formula
For 2×2 matrices A and B split into quadrants: A = [[a,b],[c,d]], B = [[e,f],[g,h]]
- M1 = (a+d)(e+h)
- M2 = (c+d)e
- M3 = a(f-h)
- M4 = d(g-e)
- M5 = (a+b)h
- M6 = (c-a)(e+f)
- M7 = (b-d)(g+h)
Result: C = [[M1+M4-M5+M7, M3+M5],[M2+M4, M1-M2+M3+M6]]
Master Theorem Analysis
T(n) = 7T(n/2) + O(n²)
a=7, b=2, f(n)=n² log_b(a) = log₂(7) ≈ 2.807
Since f(n) = O(n^(2.807-ε)), we are in Master Theorem Case 1: T(n) = Θ(n^log₂7) ≈ Θ(n^2.807)
Saving just one multiplication per level of recursion changed the exponent from 3 to 2.807.
Crossover Point
Strassen is only faster than naive multiplication for large n (typically n > 128-512 depending on implementation). The large number of additions and the overhead of recursion make it slower for small matrices. Practical implementations switch to naive multiplication below a threshold.
Numerical Stability and When to Avoid Strassen
Strassen's recursion amplifies floating-point errors. Each of the 7 products involves addition of potentially large intermediate values, leading to catastrophic cancellation. For example, M6 = (c-a)(e+f): if c and a are close, the subtraction loses significant digits. At deeper recursion, these errors compound. In double precision, errors can reach 1e-8 relative for n=1000, while naive stays near machine epsilon.
- You can tolerate errors > 1e-10 relative.
- Matrices are large (n > 1024) and your BLAS is already saturated.
- You've verified against naive for a random subset.
- You need exact results (symbolic or integer matrices) — use exact algorithms.
- Matrices are small (<512).
- Your matrix conditioning is poor — Strassen worsens error.
Catastrophic Cancellation in a Seismic Simulation
- Strassen is algebraically exact but numerically unstable in floating-point. Production code should benchmark against BLAS before adopting.
- Always verify Strassen output against naive multiplication for small random tests before deploying.
- If you must use Strassen in production, switch to naive below a threshold n<1024 and consider using compensated summation or double-double arithmetic.
Key takeaways
numpy.dot() in production.Common mistakes to avoid
3 patternsUsing Strassen without a threshold switch
Assuming Strassen is numerically safe in double precision
Implementing Strassen from scratch without cache optimisation
Interview Questions on This Topic
How many multiplications does Strassen's algorithm use for 2×2 matrices compared to the naive method?
Frequently Asked Questions
That's Recursion. Mark it forged?
3 min read · try the examples if you haven't