SVD — 4TB Memory Crash on Sparse Matrices
- SVD decomposes any matrix A into UΣV^T: U and V are orthogonal, Σ diagonal with singular values.
- Singular values σ₁ ≥ σ₂ ≥ … ≥ 0 rank dimensions by importance; rank = count of positive σ.
- Truncated SVD keeps top k singular values, giving the best rank-k approximation (Frobenius norm).
- Full SVD on a 10k×10k dense matrix takes ~30 seconds (O(mn²)); use randomized SVD for large sparse matrices.
- Production trap: numpy.linalg.svd on a sparse matrix creates a dense copy → OOM. Use scipy.sparse.linalg.svds instead.
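A minimal sketch of the sparse-safe path described in the last bullet; the matrix size and density here are hypothetical, chosen so a dense copy would be catastrophically large:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# Hypothetical 100k x 100k sparse matrix (~10^5 non-zeros, a few MB in CSR).
# A dense float64 copy would need 100_000**2 * 8 bytes = 80 GB.
n = 100_000
A = sp.random(n, n, density=1e-5, format='csr', random_state=42)

# svds works directly on the sparse matrix and computes only the top k
# singular triplets -- no dense copy is ever made.
u, s, vt = svds(A, k=10)
print(np.sort(s)[::-1])  # svds returns singular values in ascending order
```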
Production debug note — common numerical and performance problems: validate inputs with np.isfinite(A).all() before calling SVD, since NaN or Inf entries make the decomposition fail to converge.

SVD is simultaneously the most powerful and the most practically useful decomposition in linear algebra. Unlike eigendecomposition (restricted to square diagonalisable matrices), SVD works on any m×n matrix. It has a geometric interpretation (rotation-scale-rotation), a dimensionality-reduction application (truncated SVD = PCA), and a computational application (pseudoinverse, rank, condition number).
Netflix's original collaborative filtering used SVD-style factorisation to decompose the user-movie rating matrix. Image compression, latent semantic analysis (LSA) for text, and principal component analysis all reduce to truncated SVD. NumPy's np.linalg.svd (dense) and SciPy's scipy.sparse.linalg.svds (large sparse matrices) implement it.
SVD Decomposition
A = U Σ V^T where U (m×m) and V (n×n) are orthogonal matrices, Σ (m×n) is diagonal with non-negative singular values σ₁ ≥ σ₂ ≥ ... ≥ 0.
```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(f'A shape: {A.shape}')
print(f'U: {U.shape}, s: {s.shape}, Vt: {Vt.shape}')
print(f'Singular values: {s}')
print(f'Rank (non-zero singular values): {np.sum(s > 1e-10)}')

# Reconstruct A from its factors
A_reconstructed = U @ np.diag(s) @ Vt
print(f'Reconstruction error: {np.max(np.abs(A - A_reconstructed)):.2e}')
```
```
A shape: (4, 3)
U: (4, 3), s: (3,), Vt: (3, 3)
Singular values: [2.547e+01 1.291e+00 2.280e-15]
Rank (non-zero singular values): 2
Reconstruction error: 2.48e-15
```
Low-Rank Approximation — Truncated SVD
Keep only the k largest singular values — by the Eckart–Young theorem, the result is the closest rank-k matrix to A in the Frobenius norm.
```python
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation of A (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Image compression example on a random 100x100 'image'
np.random.seed(42)
image = np.random.rand(100, 100)
for k in [1, 5, 10, 50]:
    approx = truncated_svd(image, k)
    error = np.linalg.norm(image - approx, 'fro') / np.linalg.norm(image, 'fro')
    compression = k * (100 + 100 + 1) / (100 * 100)
    print(f'k={k:3}: error={error:.3f}, storage={compression:.2%}')
```
```
k=  5: error=0.889, storage=10.05%
k= 10: error=0.869, storage=20.10%
k= 50: error=0.739, storage=100.50%
```
Geometric Intuition of SVD: Rotation, Scale, Rotation
Any linear transformation (matrix) applied to a vector can be broken down into a rotation (V^T), a scaling (Σ) along the new axes, and another rotation (U). The columns of V are the right singular vectors (input directions), columns of U are the left singular vectors (output directions). In 2D, a unit circle maps to an ellipse whose semi-axes lengths are the singular values.
```python
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[3, 1], [1, 2]])
U, s, Vt = np.linalg.svd(A)

# Sample the unit circle
angles = np.linspace(0, 2 * np.pi, 100)
circle = np.array([np.cos(angles), np.sin(angles)])

# Map the circle through A: the image is an ellipse
ellipse = A @ circle

plt.figure(figsize=(6, 6))
plt.plot(circle[0], circle[1], 'b--', label='Unit circle')
plt.plot(ellipse[0], ellipse[1], 'r-', label='Transformed')
# Semi-axes of the ellipse: sigma_i * u_i (left singular vectors scaled by singular values)
plt.arrow(0, 0, s[0] * U[0, 0], s[0] * U[1, 0], head_width=0.2, color='g')
plt.arrow(0, 0, s[1] * U[0, 1], s[1] * U[1, 1], head_width=0.2, color='g')
plt.axis('equal')
plt.legend()
plt.show()
```
Numerical Computation: Algorithms and Stability
Computing the SVD is numerically stable but expensive. Full SVD uses Golub–Kahan bidiagonalisation followed by iterative diagonalisation, costing O(mn·min(m,n)). For large matrices, use randomized SVD: project the matrix onto a random low-dimensional subspace, orthonormalise it (QR), then compute the SVD of the small projected matrix. Power iterations improve accuracy. scipy.linalg.svd is the dense workhorse; for sparse matrices use scipy.sparse.linalg.svds (ARPACK).
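A from-scratch sketch of the recipe just described (random projection, power iterations, QR, small SVD); the function name and oversampling default are illustrative, not a library API:

```python
import numpy as np

def randomized_svd_sketch(A, k, n_oversamples=10, n_iter=2, seed=0):
    """Approximate top-k SVD via random projection + QR + small SVD."""
    rng = np.random.default_rng(seed)
    # 1. Project A onto a random (k + oversampling)-dimensional subspace.
    Omega = rng.standard_normal((A.shape[1], k + n_oversamples))
    Y = A @ Omega
    # 2. Power iterations sharpen the separation between singular values.
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)
    # 3. Orthonormalise the sample, then SVD the small projected matrix.
    Q, _ = np.linalg.qr(Y)
    B = Q.T @ A                      # small (k + p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

A = np.random.rand(1000, 100)
_, s_approx, _ = randomized_svd_sketch(A, k=10)
s_exact = np.linalg.svd(A, compute_uv=False)[:10]
print(f'approx: {s_approx[:3]}')
print(f'exact:  {s_exact[:3]}')
```

scikit-learn's built-in implementation, timed against full SVD: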
```python
import time
import numpy as np
from sklearn.utils.extmath import randomized_svd

A = np.random.rand(1000, 100)
k = 10

t0 = time.perf_counter()
U, s, Vt = randomized_svd(A, n_components=k, n_iter='auto', random_state=42)
t_rand = time.perf_counter() - t0

t0 = time.perf_counter()
np.linalg.svd(A, full_matrices=False)
t_full = time.perf_counter() - t0

print(f'randomized (top {k}): {t_rand:.3f}s, full SVD: {t_full:.3f}s')
```
Applications: PCA, Pseudoinverse, and Recommender Systems
PCA: centre the data and compute the SVD of the centred matrix — the columns of V are the principal axes, and UΣ gives the coordinates of the samples along them (the principal component scores). Pseudoinverse: A⁺ = V Σ⁺ U^T, where Σ⁺ takes the reciprocal of each non-zero singular value; it solves overdetermined and underdetermined systems in the least-squares sense. In recommender systems: user-item matrix R ≈ U_k Σ_k V_k^T; predict missing ratings via the inner product of user and item latent vectors. This is the FunkSVD / SVD++ family.
```python
import numpy as np

X = np.random.randn(100, 5)
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Principal axes (directions) are the columns of V = Vt.T
components = Vt.T
# Scores: project the centred data onto the first 2 axes
X_pca = X_centered @ components[:, :2]
print(f'Explained variance ratio: {s[:2]**2 / np.sum(s**2)}')
```
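The pseudoinverse half of the paragraph above, as a minimal sketch; the 1e-10 cutoff for "non-zero" singular values is an illustrative tolerance:

```python
import numpy as np

A = np.random.randn(50, 3)           # overdetermined: 50 equations, 3 unknowns
b = np.random.randn(50)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Sigma^+ : reciprocal of non-zero singular values, zero elsewhere
s_inv = np.where(s > 1e-10 * s[0], 1.0 / s, 0.0)
A_pinv = Vt.T @ np.diag(s_inv) @ U.T

x = A_pinv @ b                       # least-squares solution
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
print(np.allclose(A_pinv, np.linalg.pinv(A)))                # True
```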
| Property | SVD | Eigendecomposition |
|---|---|---|
| Matrix shape | Any m×n | Square only (n×n) |
| Requires | No restriction | Diagonalisable (n independent eigenvectors) |
| Orthogonal factors | U and V are orthogonal | Eigenvectors are orthogonal only for symmetric matrices |
| Diagonal values | Singular values (non-negative) | Eigenvalues (can be negative/complex) |
| Numerical stability | Very stable | Can be unstable for non-symmetric matrices |
| Common use | PCA, pseudoinverse, compression | Spectral graph theory, PageRank, ODEs |
🎯 Key Takeaways
- SVD A = UΣV^T works for any matrix — more general than eigendecomposition, which requires a square diagonalisable matrix.
- Singular values σ₁≥σ₂≥...≥0 encode information importance. Rank = number of non-zero singular values.
- Truncated SVD (keep k singular values) = best rank-k approximation. Foundation of PCA, image compression, NLP (LSA).
- Pseudoinverse A^+ = V Σ^+ U^T — solves overdetermined/underdetermined systems in least-squares sense.
- numpy.linalg.svd for dense; scipy.sparse.linalg.svds for large sparse matrices (only computes k singular values).
⚠ Common Mistakes to Avoid
- Calling numpy.linalg.svd on a sparse matrix — it densifies the input and can exhaust memory; use scipy.sparse.linalg.svds.
- Omitting full_matrices=False on tall matrices, which allocates a huge m×m U you don't need.
- Skipping input validation: check np.isfinite(A).all() first, since non-finite entries make SVD fail to converge.
- Confusing V with V^T: np.linalg.svd returns Vt, so the right singular vectors are its rows, not its columns.
Interview Questions on This Topic
- Q: How does truncated SVD relate to PCA? (Senior)
- Q: What is the pseudoinverse and when do you use it? (Mid-level)
- Q: Why is SVD more numerically stable than computing eigenvalues directly? (Senior)
Frequently Asked Questions
How does Netflix/collaborative filtering use SVD?
User-movie rating matrix R ≈ UΣV^T with k latent factors. U represents user preferences in the latent space, V represents movie characteristics. Missing ratings are predicted as (UΣV^T)[user][movie]. Simon Funk's 2006 matrix-factorisation approach (an SVD-inspired variant trained by gradient descent) became famous during the Netflix Prize competition. Modern systems use SGD-based factorisation rather than full SVD for scalability.
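A toy illustration of the prediction step; the ratings and the zeros-for-missing treatment are hypothetical (FunkSVD proper optimises only over observed entries):

```python
import numpy as np

# 4 users x 5 movies; 0 marks a missing rating (naive imputation for the demo)
R = np.array([[5, 4, 0, 1, 0],
              [4, 0, 0, 1, 1],
              [1, 1, 0, 5, 4],
              [0, 1, 5, 4, 0]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                # two latent factors
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(f'Predicted rating for user 0, movie 2: {R_hat[0, 2]:.2f}')
```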
What is the difference between SVD and eigendecomposition?
SVD decomposes any m×n matrix into three factors (U,S,V^T). Eigendecomposition works only on square matrices and requires diagonalisability (n independent eigenvectors). For symmetric positive-definite matrices, SVD and eigendecomposition give the same results (up to signs). SVD is more numerically stable and used more broadly in practice.
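A quick numerical check of the symmetric positive-definite case:

```python
import numpy as np

B = np.random.randn(4, 4)
S = B @ B.T + 4 * np.eye(4)                 # symmetric positive-definite
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]
singvals = np.linalg.svd(S, compute_uv=False)
print(np.allclose(eigvals, singvals))       # True: they coincide for SPD matrices
```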
Why is SVD used in Latent Semantic Analysis (LSA)?
LSA applies truncated SVD to the term-document matrix. It identifies latent topics by grouping terms that co-occur in documents. Low-rank approximation removes noise and synonymy/polysemy effects. The result is a semantic space where documents can be compared by cosine similarity.
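A toy sketch with a hypothetical 5-term × 4-document count matrix — two "pet" documents and two "finance" documents:

```python
import numpy as np

# Rows: terms ['cat', 'dog', 'pet', 'stock', 'market']; columns: documents
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 0, 1],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T    # documents in 2-D latent space

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(f'doc0 vs doc1 (both pets):     {cosine(doc_vecs[0], doc_vecs[1]):.2f}')
print(f'doc0 vs doc2 (pets/finance):  {cosine(doc_vecs[0], doc_vecs[2]):.2f}')
```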
How does SVD help in image compression?
An image is a matrix of pixel values (or three matrices for RGB). SVD decomposes it; truncating to k singular values reduces storage from m×n to k(m+n+1). For typical images, k=20 gives 90% compression with acceptable quality. Higher k preserves more detail.