
Singular Value Decomposition — SVD Explained

📍 Part of: Linear Algebra → Topic 5 of 5
Learn SVD — how it decomposes any matrix into U, Σ, V^T, why it is more general than eigendecomposition, and how it powers PCA, image compression, and recommender systems.
🔥 Advanced — solid DSA foundation required
In this tutorial, you'll learn:
  • SVD A = UΣV^T works for any matrix — more general than eigendecomposition, which requires a square, diagonalisable matrix.
  • Singular values σ₁ ≥ σ₂ ≥ ... ≥ 0 encode information importance; the rank equals the number of non-zero singular values.
  • Truncated SVD (keep the k largest singular values) gives the best rank-k approximation — the foundation of PCA, image compression, and NLP (LSA).
⚡ Quick Answer
SVD is the Swiss Army knife of linear algebra. It decomposes any matrix — rectangular, rank-deficient, anything — into three matrices: a rotation, a scaling, and another rotation. The scaling diagonal (singular values) reveals the matrix's essential information content: large singular values carry important structure, small ones carry noise. Truncate the small ones and you get the best low-rank approximation — the foundation of image compression and collaborative filtering.

SVD is simultaneously the most powerful and most practically useful decomposition in linear algebra. Unlike eigendecomposition (restricted to square, diagonalisable matrices), SVD works on any m×n matrix. It has a geometric interpretation (rotation-scale-rotation), a dimensionality-reduction application (truncated SVD underlies PCA), and computational applications (pseudoinverse, rank, condition number).
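To make the computational side concrete, here is a minimal sketch (the 3×2 matrix is an arbitrary example chosen for this illustration) showing how rank and condition number fall straight out of the singular values:

rank_cond.py · PYTHON
import numpy as np

A = np.array([[3., 1.], [1., 3.], [1., 1.]])  # arbitrary 3x2 example
s = np.linalg.svd(A, compute_uv=False)        # singular values only, descending

print(f'singular values:  {s}')
print(f'rank:             {np.sum(s > 1e-10)}')  # count of non-zero sigmas
print(f'condition number: {s[0] / s[-1]:.3f}')   # same value np.linalg.cond(A) gives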

Netflix's original collaborative filtering used SVD-style factorisation to decompose the user-movie rating matrix. Image compression, latent semantic analysis (LSA) for text, and principal component analysis all reduce to truncated SVD. NumPy's np.linalg.svd handles the dense case; SciPy's scipy.sparse.linalg.svds handles large sparse matrices.

SVD Decomposition

A = U Σ V^T, where U (m×m) and V (n×n) are orthogonal matrices and Σ (m×n) is diagonal with non-negative singular values σ₁ ≥ σ₂ ≥ ... ≥ 0. The code below uses full_matrices=False, which returns the thin SVD: U is m×min(m,n) and Σ is min(m,n)×min(m,n), which is why the shapes in the output differ from the full decomposition.

svd.py · PYTHON
import numpy as np

A = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(f'A shape: {A.shape}')
print(f'U: {U.shape}, s: {s.shape}, Vt: {Vt.shape}')
print(f'Singular values: {s}')
print(f'Rank (non-zero singular values): {np.sum(s > 1e-10)}')

# Reconstruct A from SVD
A_reconstructed = U @ np.diag(s) @ Vt
print(f'Reconstruction error: {np.max(np.abs(A - A_reconstructed)):.2e}')
▶ Output
A shape: (4, 3)
U: (4, 3), s: (3,), Vt: (3, 3)
Singular values: [2.547e+01 1.291e+00 2.280e-15]
Rank (non-zero singular values): 2
Reconstruction error: 2.48e-15
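The claim that U and V are orthogonal (the two "rotations" in the rotation-scale-rotation picture) is easy to check numerically. A minimal sketch using the same matrix as above, also contrasting the thin SVD with the full one:

orthogonality_check.py · PYTHON
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)

# Thin SVD: U is 4x3, so only its columns are orthonormal (U^T U = I)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U.T @ U, np.eye(3)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True: V is 3x3 orthogonal here

# Full SVD: U becomes a square 4x4 orthogonal matrix
U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=True)
print(U_full.shape, Vt_full.shape)        # (4, 4) (3, 3)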

Low-Rank Approximation — Truncated SVD

Keep only the k largest singular values — the resulting rank-k matrix is the closest rank-k matrix to A in Frobenius norm, a result known as the Eckart-Young theorem.

truncated_svd.py · PYTHON
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation of A (Eckart-Young theorem)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Image compression example
np.random.seed(42)
image = np.random.rand(100, 100)  # 100x100 'image' of random pixels
for k in [1, 5, 10, 50]:
    approx = truncated_svd(image, k)
    error  = np.linalg.norm(image - approx, 'fro') / np.linalg.norm(image, 'fro')
    # Storage cost: k columns of U, k rows of Vt, k singular values
    compression = k * (100 + 100 + 1) / (100 * 100)
    print(f'k={k:3}: error={error:.3f}, storage={compression:.2%}')
▶ Output
k= 1: error=0.956, storage=2.01%
k= 5: error=0.889, storage=10.05%
k= 10: error=0.869, storage=20.10%
k= 50: error=0.739, storage=100.50%
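A random matrix is close to the worst case for truncated SVD: its singular values decay slowly, so the relative error above stays high even at k=50. Real images concentrate their energy in a few singular values. A sketch with a synthetic low-rank "image" (an assumption for illustration: exactly rank 3 plus small noise) shows the error collapsing once k reaches the underlying rank:

structured_image.py · PYTHON
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

np.random.seed(0)
# Synthetic structured 'image': rank 3 plus small noise
image = np.random.rand(100, 3) @ np.random.rand(3, 100) \
        + 0.01 * np.random.rand(100, 100)

for k in [1, 3, 10]:
    approx = truncated_svd(image, k)
    error = np.linalg.norm(image - approx, 'fro') / np.linalg.norm(image, 'fro')
    print(f'k={k:2}: relative error = {error:.4f}')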

🎯 Key Takeaways

  • SVD A = UΣV^T works for any matrix — more general than eigendecomposition, which requires a square, diagonalisable matrix.
  • Singular values σ₁ ≥ σ₂ ≥ ... ≥ 0 encode information importance; the rank equals the number of non-zero singular values.
  • Truncated SVD (keep the k largest singular values) gives the best rank-k approximation — the foundation of PCA, image compression, and NLP (LSA).
  • Pseudoinverse A^+ = V Σ^+ U^T — solves overdetermined/underdetermined systems in the least-squares sense (see the sketch after this list).
  • numpy.linalg.svd for dense; scipy.sparse.linalg.svds for large sparse matrices (only computes k singular values).
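Below is a minimal sketch of the pseudoinverse point: build A^+ from the SVD and use it to solve an overdetermined system. The 4×2 data is made up for illustration, and the answer is cross-checked against np.linalg.lstsq.

pseudoinverse.py · PYTHON
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns, no exact solution
A = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.]])
b = np.array([2.1, 3.9, 6.2, 7.8])

# Pseudoinverse from the SVD: A^+ = V diag(1/sigma_i) U^T
# (safe here because A has full column rank; zero sigmas would need skipping)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

x = A_pinv @ b  # least-squares solution
print(x)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True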

Interview Questions on This Topic

  • Q: How does truncated SVD relate to PCA? (see the sketch below)
  • Q: What is the pseudoinverse and when do you use it?
  • Q: Why is SVD more numerically stable than computing eigenvalues directly?
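For the first question, a minimal sketch of the SVD/PCA connection (the random data is purely illustrative): the right singular vectors of the centred data matrix are the principal axes, and σᵢ²/(n−1) are the explained variances, cross-checked here against the covariance matrix's eigenvalues.

pca_via_svd.py · PYTHON
import numpy as np

np.random.seed(1)
X = np.random.rand(200, 5)   # 200 samples, 5 features
Xc = X - X.mean(axis=0)      # PCA requires centred data

# Rows of Vt are the principal axes; Xc @ Vt.T projects onto them
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = s**2 / (len(X) - 1)

# Cross-check against the covariance matrix's eigenvalues
cov = Xc.T @ Xc / (len(X) - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh is ascending; reverse it
print(np.allclose(explained_variance, eigvals))  # True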

Frequently Asked Questions

How does Netflix/collaborative filtering use SVD?

User-movie rating matrix R ≈ UΣV^T with k dimensions (latent factors). U represents user preferences in the latent space, V represents movie characteristics. Missing ratings are predicted as (UΣV^T)[user][movie]. Simon Funk's 2006 matrix-factorisation approach (an SVD-inspired variant) became famous during the Netflix Prize competition; modern systems use SGD-based factorisation rather than a full SVD for scalability.
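A toy sketch of the idea (the ratings are hypothetical, and treating missing entries as zero is a gross simplification that real systems avoid):

toy_recommender.py · PYTHON
import numpy as np

# Tiny user x movie rating matrix; 0 marks a missing rating
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# Keep k=2 latent factors
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Every cell, including the former zeros, now holds a predicted score
print(np.round(R_hat, 2))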
