Singular Value Decomposition — SVD Explained
- SVD A = UΣV^T works for any matrix — more general than eigendecomposition, which requires a square, diagonalisable matrix.
- Singular values σ₁≥σ₂≥...≥0 encode information importance. Rank = number of non-zero singular values.
- Truncated SVD (keep k singular values) = best rank-k approximation. Foundation of PCA, image compression, NLP (LSA).
SVD is simultaneously the most powerful and most practically useful decomposition in linear algebra. Unlike eigendecomposition (restricted to square, diagonalisable matrices), SVD works on any m×n matrix. It has a geometric interpretation (rotation-scale-rotation), a dimensionality reduction application (truncated SVD on centred data = PCA), and computational applications (pseudoinverse, rank, condition number).
Netflix's original collaborative filtering used SVD to decompose the user-movie rating matrix. Image compression, latent semantic analysis (LSA) for text, and principal component analysis all reduce to truncated SVD. NumPy's np.linalg.svd implements it for dense matrices; SciPy's scipy.sparse.linalg.svds handles large sparse ones.
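As a quick sketch of the computational uses mentioned above, both the numerical rank and the condition number fall straight out of the singular values (the matrix here is an arbitrary example, not from the article):

```python
import numpy as np

# Arbitrary example matrix for illustration.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
s = np.linalg.svd(A, compute_uv=False)  # singular values only

cond = s[0] / s[-1]            # ratio of largest to smallest singular value
rank = int(np.sum(s > 1e-10))  # numerical rank: count of non-negligible σ

print(f'condition number: {cond:.3f}')  # matches np.linalg.cond(A)
print(f'rank: {rank}')
```

For a 2-norm condition number, `np.linalg.cond(A)` computes exactly this ratio internally.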
SVD Decomposition
A = U Σ V^T where U (m×m) and V (n×n) are orthogonal matrices, Σ (m×n) is diagonal with non-negative singular values σ₁ ≥ σ₂ ≥ ... ≥ 0.
```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(f'A shape: {A.shape}')
print(f'U: {U.shape}, s: {s.shape}, Vt: {Vt.shape}')
print(f'Singular values: {s}')
print(f'Rank (non-zero singular values): {np.sum(s > 1e-10)}')

# Reconstruct A from SVD
A_reconstructed = U @ np.diag(s) @ Vt
print(f'Reconstruction error: {np.max(np.abs(A - A_reconstructed)):.2e}')
```
```
A shape: (4, 3)
U: (4, 3), s: (3,), Vt: (3, 3)
Singular values: [2.547e+01 1.291e+00 2.280e-15]
Rank (non-zero singular values): 2
Reconstruction error: 2.48e-15
```
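A useful sanity check (not in the original example): with `full_matrices=False`, U and V have orthonormal columns, so U^T U = I and V V^T = I.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Orthonormal columns: U^T U = I_3; Vt is a 3x3 orthogonal matrix.
print(np.allclose(U.T @ U, np.eye(3)))   # True
print(np.allclose(Vt @ Vt.T, np.eye(3))) # True
```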
Low-Rank Approximation — Truncated SVD
Keep only the k largest singular values — the rank-k approximation is the closest rank-k matrix to A (in Frobenius norm).
```python
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Image compression example
np.random.seed(42)
image = np.random.rand(100, 100)  # 100x100 'image'
for k in [1, 5, 10, 50]:
    approx = truncated_svd(image, k)
    error = np.linalg.norm(image - approx, 'fro') / np.linalg.norm(image, 'fro')
    compression = k * (100 + 100 + 1) / (100 * 100)
    print(f'k={k:3}: error={error:.3f}, storage={compression:.2%}')
```
```
k=  5: error=0.889, storage=10.05%
k= 10: error=0.869, storage=20.10%
k= 50: error=0.739, storage=100.50%
```
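The optimality claim above is the Eckart–Young theorem, and it can be checked numerically: the Frobenius error of the rank-k truncation equals exactly the norm of the discarded singular values (random matrix used purely for illustration).

```python
import numpy as np

# Eckart-Young check: ||A - A_k||_F = sqrt(σ_{k+1}² + σ_{k+2}² + ...)
np.random.seed(0)  # arbitrary seed for this sketch
A = np.random.rand(20, 15)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
err = np.linalg.norm(A - A_k, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))  # True
```

This identity is also why the compression-error curve above flattens: once the remaining singular values are small, increasing k buys little.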
🎯 Key Takeaways
- SVD A = UΣV^T works for any matrix — more general than eigendecomposition, which requires a square, diagonalisable matrix.
- Singular values σ₁≥σ₂≥...≥0 encode information importance. Rank = number of non-zero singular values.
- Truncated SVD (keep k singular values) = best rank-k approximation. Foundation of PCA, image compression, NLP (LSA).
- Pseudoinverse A^+ = V Σ^+ U^T — solves overdetermined/underdetermined systems in least-squares sense.
- numpy.linalg.svd for dense; scipy.sparse.linalg.svds for large sparse matrices (only computes k singular values).
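The pseudoinverse formula in the takeaways can be sketched directly from the SVD factors; for a tall (overdetermined) system it yields the least-squares solution. The small system below is a made-up example:

```python
import numpy as np

# A^+ = V Σ^+ U^T, where Σ^+ inverts the non-zero singular values.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])       # 3 equations, 2 unknowns (full column rank)
b = np.array([1.0, 2.0, 2.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T  # safe here: all σ are non-zero
x = A_pinv @ b                           # least-squares solution

print(np.allclose(A_pinv, np.linalg.pinv(A)))                    # True
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))      # True
```

In practice you would call `np.linalg.pinv` or `np.linalg.lstsq` directly; a production pseudoinverse also thresholds tiny singular values instead of inverting them.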
Interview Questions on This Topic
- Q: How does truncated SVD relate to PCA?
- Q: What is the pseudoinverse and when do you use it?
- Q: Why is SVD more numerically stable than computing eigenvalues directly?
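The first question above has a compact answer in code: PCA is the SVD of the centred data matrix, with explained variances σ_i²/(n−1). A sketch on toy random data (the dataset and seed are arbitrary):

```python
import numpy as np

np.random.seed(1)
X = np.random.rand(50, 3)        # toy dataset: 50 samples, 3 features
Xc = X - X.mean(axis=0)          # PCA requires centred data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
# Rows of Vt are the principal axes; variances come from the singular values.
explained_variance = s ** 2 / (len(X) - 1)

# Cross-check against eigendecomposition of the covariance matrix.
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
print(np.allclose(explained_variance, eigvals))  # True
```

This is also the stability argument: the SVD route never forms X^T X, which would square the condition number.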
Frequently Asked Questions
How does Netflix/collaborative filtering use SVD?
User-movie rating matrix R ≈ UΣV^T with k latent factors. U represents user preferences in the latent space, V represents movie characteristics. Missing ratings are predicted as (UΣV^T)[user][movie]. Simon Funk's 2006 matrix-factorisation variant rose to prominence during the Netflix Prize and influenced the winning ensembles. Modern systems use SGD-based factorisation rather than full SVD for scalability.
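A toy sketch of the idea, under a strong simplifying assumption: real recommenders fit only the observed entries (e.g. via SGD), whereas here missing ratings are naively filled with the column mean before factorising.

```python
import numpy as np

# Tiny made-up rating matrix; 0 marks a missing rating.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
mask = R > 0

# Naive imputation for this illustration: fill gaps with each movie's mean.
filled = np.where(mask, R, R.sum(axis=0) / mask.sum(axis=0))

U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2  # two latent factors ("taste clusters")
pred = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # predicted ratings, incl. gaps
print(np.round(pred, 1))
```

The rank-2 reconstruction fills every cell, including the originally missing ones, which is exactly the prediction step described above.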