SVD — 4TB Memory Crash on Sparse Matrices
OutOfMemoryError from full SVD on a 1M×500k sparse matrix? Use scipy.
- SVD decomposes any matrix A into UΣV^T: U and V are orthogonal, Σ diagonal with singular values.
- Singular values σ₁ ≥ σ₂ ≥ … ≥ 0 rank dimensions by importance; rank = count of positive σ.
- Truncated SVD keeps top k singular values, giving the best rank-k approximation (Frobenius norm).
- Full SVD on a 10k×10k dense matrix takes ~30 seconds (O(mn²)); use randomized SVD for large sparse matrices.
- Production trap: numpy.linalg.svd on a sparse matrix creates a dense copy → OOM. Use scipy.sparse.linalg.svds instead.
SVD is the Swiss Army knife of linear algebra. It decomposes any matrix — rectangular, rank-deficient, anything — into three matrices: a rotation, a scaling, and another rotation. The scaling diagonal (singular values) reveals the matrix's essential information content: large singular values carry important structure, small ones carry noise. Truncate the small ones and you get the best low-rank approximation — the foundation of image compression and collaborative filtering.
SVD is simultaneously the most powerful and most practically useful decomposition in linear algebra. Unlike eigendecomposition (restricted to square diagonalisable matrices), SVD works on any m×n matrix. It has a geometric interpretation (rotation-scale-rotation), a dimensionality reduction application (truncated SVD = PCA), and a computational application (pseudoinverse, rank, condition number).
Netflix's original collaborative filtering used SVD to decompose the user-movie rating matrix. Image compression, latent semantic analysis (LSA) for text, and principal component analysis all reduce to truncated SVD. NumPy's np.linalg.svd and scipy's truncated_svd (for large sparse matrices) implement it.
SVD Decomposition
A = U Σ V^T where U (m×m) and V (n×n) are orthogonal matrices, Σ (m×n) is diagonal with non-negative singular values σ₁ ≥ σ₂ ≥ ... ≥ 0.
Low-Rank Approximation — Truncated SVD
Keep only the k largest singular values — the rank-k approximation is the closest rank-k matrix to A (in Frobenius norm).
Geometric Intuition of SVD: Rotation, Scale, Rotation
Any linear transformation (matrix) applied to a vector can be broken down into a rotation (V^T), a scaling (Σ) along the new axes, and another rotation (U). The columns of V are the right singular vectors (input directions), columns of U are the left singular vectors (output directions). In 2D, a unit circle maps to an ellipse whose semi-axes lengths are the singular values.
Numerical Computation: Algorithms and Stability
Computing SVD is numerically stable but expensive. Full SVD uses QR decomposition (Golub-Kahan algorithm) in O(mn·min(m,n)). For large matrices, use randomized SVD: project the matrix onto a random low-dimensional subspace, then compute QR and small SVD. Power iteration can improve accuracy. scipy.linalg.svd is the workhorse; for sparse matrices use scipy.sparse.linalg.svds (ARPACK).
Applications: PCA, Pseudoinverse, and Recommender Systems
PCA: center the data, compute SVD of centered matrix, left singular vectors multiplied by singular values give principal components. Pseudoinverse A⁺ = V Σ⁺ U^T (reciprocal of non-zero singular values). Used for solving least-squares and underdetermined systems. In recommender systems: user-item matrix R ≈ U_k Σ_k V_k^T; predict missing ratings via inner product. This is the FunkSVD / SVD++ family.
Full SVD on Sparse Recommendation Matrix Crashes Server
- Always check matrix sparsity before using dense SVD.
- Use scipy.sparse.linalg.svds for large sparse matrices.
- Set k << min(m,n) to keep computation feasible.
np.isfinite().all() before SVD.Key takeaways
Common mistakes to avoid
3 patternsAssuming SVD requires square matrices
Not centering data before SVD for PCA
Using full SVD when only top k singular values are needed
Interview Questions on This Topic
How does truncated SVD relate to PCA?
Frequently Asked Questions
That's Linear Algebra. Mark it forged?
3 min read · try the examples if you haven't