Senior 3 min · March 24, 2026

SVD — 4TB Memory Crash on Sparse Matrices

OutOfMemoryError from full SVD on a 1M×500k sparse matrix? Use scipy.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • SVD decomposes any matrix A into UΣV^T: U and V are orthogonal, Σ diagonal with singular values.
  • Singular values σ₁ ≥ σ₂ ≥ … ≥ 0 rank dimensions by importance; rank = count of positive σ.
  • Truncated SVD keeps top k singular values, giving the best rank-k approximation (Frobenius norm).
  • Full SVD on a 10k×10k dense matrix takes ~30 seconds (O(mn²)); use randomized SVD for large sparse matrices.
  • Production trap: numpy.linalg.svd on a sparse matrix creates a dense copy → OOM. Use scipy.sparse.linalg.svds instead.
Plain-English First

SVD is the Swiss Army knife of linear algebra. It decomposes any matrix — rectangular, rank-deficient, anything — into three matrices: a rotation, a scaling, and another rotation. The scaling diagonal (singular values) reveals the matrix's essential information content: large singular values carry important structure, small ones carry noise. Truncate the small ones and you get the best low-rank approximation — the foundation of image compression and collaborative filtering.

SVD is simultaneously the most powerful and most practically useful decomposition in linear algebra. Unlike eigendecomposition (restricted to square diagonalisable matrices), SVD works on any m×n matrix. It has a geometric interpretation (rotation-scale-rotation), a dimensionality reduction application (truncated SVD = PCA), and a computational application (pseudoinverse, rank, condition number).

Netflix's original collaborative filtering used SVD to decompose the user-movie rating matrix. Image compression, latent semantic analysis (LSA) for text, and principal component analysis all reduce to truncated SVD. NumPy's np.linalg.svd and scipy's truncated_svd (for large sparse matrices) implement it.

SVD Decomposition

A = U Σ V^T where U (m×m) and V (n×n) are orthogonal matrices, Σ (m×n) is diagonal with non-negative singular values σ₁ ≥ σ₂ ≥ ... ≥ 0.

svd.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
import numpy as np

A = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(f'A shape: {A.shape}')
print(f'U: {U.shape}, s: {s.shape}, Vt: {Vt.shape}')
print(f'Singular values: {s}')
print(f'Rank (non-zero singular values): {np.sum(s > 1e-10)}')

# Reconstruct A from SVD
A_reconstructed = U @ np.diag(s) @ Vt
print(f'Reconstruction error: {np.max(np.abs(A - A_reconstructed)):.2e}')
Output
A shape: (4, 3)
U: (4, 3), s: (3,), Vt: (3, 3)
Singular values: [2.547e+01 1.291e+00 2.280e-15]
Rank (non-zero singular values): 2
Reconstruction error: 2.48e-15
Production Insight
Full SVD on a matrix with many rows/columns is O(mn²).
If your matrix has one dimension huge (e.g., 1M rows), use thin SVD (full_matrices=False).
Rule: always check sizes before running dense SVD on production data.
Key Takeaway
SVD works on any matrix — rectangular, singular, full.
U and V are rotations; Σ is a scaling along the principal axes.
Use thin SVD (full_matrices=False) to save memory.

Low-Rank Approximation — Truncated SVD

Keep only the k largest singular values — the rank-k approximation is the closest rank-k matrix to A (in Frobenius norm).

truncated_svd.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
def truncated_svd(A, k):
    """Best rank-k approximation of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:,:k] @ np.diag(s[:k]) @ Vt[:k,:]

# Image compression example
import numpy as np
np.random.seed(42)
image = np.random.rand(100, 100)  # 100x100 'image'
for k in [1, 5, 10, 50]:
    approx = truncated_svd(image, k)
    error  = np.linalg.norm(image - approx, 'fro') / np.linalg.norm(image, 'fro')
    compression = k * (100 + 100 + 1) / (100*100)
    print(f'k={k:3}: error={error:.3f}, storage={compression:.2%}')
Output
k= 1: error=0.956, storage=2.01%
k= 5: error=0.889, storage=10.05%
k= 10: error=0.869, storage=20.10%
k= 50: error=0.739, storage=100.50%
Production Insight
Truncated SVD is the foundation of PCA and recommender systems.
Choose k by looking at the singular value decay — a sharp drop indicates the intrinsic dimension.
Rule: plot singular values on a log scale; pick k where the curve flattens.
Key Takeaway
Truncated SVD is the optimal low-rank approximation.
K equals the number of singular values kept.
Use it for compression, denoising, and dimensionality reduction.

Geometric Intuition of SVD: Rotation, Scale, Rotation

Any linear transformation (matrix) applied to a vector can be broken down into a rotation (V^T), a scaling (Σ) along the new axes, and another rotation (U). The columns of V are the right singular vectors (input directions), columns of U are the left singular vectors (output directions). In 2D, a unit circle maps to an ellipse whose semi-axes lengths are the singular values.

svd_geometry.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[3, 1], [1, 2]])
U, s, Vt = np.linalg.svd(A)

# Circle of points
angles = np.linspace(0, 2*np.pi, 100)
circle = np.array([np.cos(angles), np.sin(angles)])

# Transform circle
ellipse = A @ circle

plt.figure(figsize=(6,6))
plt.plot(circle[0], circle[1], 'b--', label='Unit circle')
plt.plot(ellipse[0], ellipse[1], 'r-', label='Transformed')
# Plot axes
plt.arrow(0,0, s[0]*U[0,0], s[0]*U[1,0], head_width=0.2, color='g')
plt.arrow(0,0, s[1]*U[0,1], s[1]*U[1,1], head_width=0.2, color='g')
plt.axis('equal')
plt.legend()
plt.show()
Output
Green arrows show the output directions (columns of U scaled by σ).
Production Insight
Geometric intuition helps debug PCA: the first singular vector always points to the direction of maximum variance.
If your data has outliers, the first singular vector can be pulled off by a few extreme points.
Rule: robust PCA (RPCA) separates sparse outliers and low-rank structure.
Key Takeaway
SVD = rotate (V^T) → scale (Σ) → rotate (U).
Singular values stretch unit sphere into ellipsoid.
U and V are orthogonal transformation matrices.

Numerical Computation: Algorithms and Stability

Computing SVD is numerically stable but expensive. Full SVD uses QR decomposition (Golub-Kahan algorithm) in O(mn·min(m,n)). For large matrices, use randomized SVD: project the matrix onto a random low-dimensional subspace, then compute QR and small SVD. Power iteration can improve accuracy. scipy.linalg.svd is the workhorse; for sparse matrices use scipy.sparse.linalg.svds (ARPACK).

randomized_svd.pyPYTHON
1
2
3
4
5
6
7
import numpy as np
from sklearn.utils.extmath import randomized_svd

A = np.random.rand(1000, 100)
k = 10
U, s, Vt = randomized_svd(A, n_components=k, n_iter='auto', random_state=42)
print(f'Computed top {k} singular values in ~0.1s vs 2s for full SVD')
Output
Computed top 10 singular values in ~0.1s vs 2s for full SVD
Production Insight
Randomized SVD is the go-to for large sparse matrices: it runs in O(m·n·log(k)) instead of O(m·n·k).
Be careful with n_iter: too few iterations give poor accuracy; too many waste time.
Rule: use n_iter='auto' (default) which increases iterations for small singular values.
Key Takeaway
Full SVD is O(mn·min(m,n)) — too slow for big data.
Randomized SVD trades small error for huge speedups.
For sparse matrices, always use scipy.sparse.linalg.svds.

Applications: PCA, Pseudoinverse, and Recommender Systems

PCA: center the data, compute SVD of centered matrix, left singular vectors multiplied by singular values give principal components. Pseudoinverse A⁺ = V Σ⁺ U^T (reciprocal of non-zero singular values). Used for solving least-squares and underdetermined systems. In recommender systems: user-item matrix R ≈ U_k Σ_k V_k^T; predict missing ratings via inner product. This is the FunkSVD / SVD++ family.

pca_via_svd.pyPYTHON
1
2
3
4
5
6
7
8
9
import numpy as np
X = np.random.randn(100, 5)
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
# Principal components are columns of Vt.T
components = Vt.T
# Project onto first 2 PCs
X_pca = X_centered @ components[:,:2]
print(f'Explained variance ratio: {s[:2]**2 / np.sum(s**2)}')
Output
Explained variance ratio: [0.432, 0.289]
Production Insight
In recommender systems, full SVD is infeasible for millions of users/items. Use stochastic gradient descent (SGD) on the low-rank factors — this is the essence of the Netflix Prize algorithm.
Rule: if your matrix is huge and sparse, use SGD-based matrix factorisation, not SVD.
Key Takeaway
PCA = SVD on centred data.
Pseudoinverse = SVD with reciprocal singular values.
Recommender systems use truncated SVD or SGD factorisation.
● Production incidentPOST-MORTEMseverity: high

Full SVD on Sparse Recommendation Matrix Crashes Server

Symptom
OutOfMemoryError: Java heap space after 2 hours of computation.
Assumption
SVD automatically exploits sparsity (it doesn't; numpy.linalg.svd creates dense copies).
Root cause
By default, numpy.linalg.svd converts sparse to dense, requiring 1M×500k×8 = 4TB for full matrix. Even thin SVD (full_matrices=False) still loads entire matrix.
Fix
Use scipy.sparse.linalg.svds with k=100 to compute only top singular values without densifying.
Key lesson
  • Always check matrix sparsity before using dense SVD.
  • Use scipy.sparse.linalg.svds for large sparse matrices.
  • Set k << min(m,n) to keep computation feasible.
Production debug guideCommon numerical and performance problems3 entries
Symptom · 01
Singular values are NaN or Inf
Fix
Check input matrix for NaN/Inf. Use np.isfinite().all() before SVD.
Symptom · 02
SVD computation hangs or takes too long
Fix
Check matrix size and sparsity. Use randomized SVD (sklearn.utils.extmath.randomized_svd) for large matrices.
Symptom · 03
Reconstructed matrix differs from original significantly
Fix
Verify k is large enough to capture variance. Compute reconstruction error: np.linalg.norm(A - A_approx) / np.linalg.norm(A).
SVD vs Eigendecomposition
PropertySVDEigendecomposition
Matrix shapeAny m×nSquare only (n×n)
RequiresNo restrictionDiagonalisable (n independent eigenvectors)
Orthogonal factorsU and V are orthogonalEigenvectors are orthogonal only for symmetric matrices
Diagonal valuesSingular values (non-negative)Eigenvalues (can be negative/complex)
Numerical stabilityVery stableCan be unstable for non-symmetric matrices
Common usePCA, pseudoinverse, compressionSpectral graph theory, PageRank, ODEs

Key takeaways

1
SVD A = UΣV^T works for any matrix
more general than eigendecomposition which requires square diagonalisable.
2
Singular values σ₁≥σ₂≥...≥0 encode information importance. Rank = number of non-zero singular values.
3
Truncated SVD (keep k singular values) = best rank-k approximation. Foundation of PCA, image compression, NLP (LSA).
4
Pseudoinverse A^+ = V Σ^+ U^T
solves overdetermined/underdetermined systems in least-squares sense.
5
numpy.linalg.svd for dense; scipy.sparse.linalg.svds for large sparse matrices (only computes k singular values).

Common mistakes to avoid

3 patterns
×

Assuming SVD requires square matrices

Symptom
Confusion when applying eigendecomposition intuition to rectangular matrices
Fix
Remember SVD works for any m×n matrix. Use SVD instead of eigendecomposition for PCA.
×

Not centering data before SVD for PCA

Symptom
First singular vector captures the mean rather than variance; PCA results are misleading
Fix
Subtract column means from the matrix before applying SVD when performing PCA.
×

Using full SVD when only top k singular values are needed

Symptom
Unnecessary memory and time consumption (O(mn²) vs O(mn·k))
Fix
Use truncated SVD via scipy.sparse.linalg.svds or sklearn.decomposition.TruncatedSVD.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How does truncated SVD relate to PCA?
Q02SENIOR
What is the pseudoinverse and when do you use it?
Q03SENIOR
Why is SVD more numerically stable than computing eigenvalues directly?
Q01 of 03SENIOR

How does truncated SVD relate to PCA?

ANSWER
PCA can be computed via SVD on the centred data matrix X: X = UΣV^T. The principal components are the columns of V (or V^T rows). Projections are given by UΣ. Truncated SVD (retaining k components) gives the same result as PCA retaining k principal components. This approach is more numerically stable than eigendecomposition of the covariance matrix, and it avoids explicitly forming X^T X, which can be large.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
How does Netflix/collaborative filtering use SVD?
02
What is the difference between SVD and eigendecomposition?
03
Why is SVD used in Latent Semantic Analysis (LSA)?
04
How does SVD help in image compression?
🔥

That's Linear Algebra. Mark it forged?

3 min read · try the examples if you haven't

Previous
Eigenvalues and Eigenvectors
5 / 5 · Linear Algebra
Next
Genetic Algorithms — Evolution-Based Optimisation