Senior 6 min · March 24, 2026
Singular Value Decomposition — SVD

SVD — 4TB Memory Crash on Sparse Matrices

OutOfMemoryError from full SVD on a 1M×500k sparse matrix? Use scipy.sparse.linalg.svds.

N
Naren Founder & Principal Engineer

20+ years shipping performance-critical code where algorithms decide the bill. Notes here come from systems that actually shipped.

Follow
Production
production tested
May 23, 2026
last updated
1,596
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer
  • SVD decomposes any matrix A into UΣV^T: U and V are orthogonal, Σ diagonal with singular values.
  • Singular values σ₁ ≥ σ₂ ≥ … ≥ 0 rank dimensions by importance; rank = count of positive σ.
  • Truncated SVD keeps top k singular values, giving the best rank-k approximation (Frobenius norm).
  • Full SVD on a 10k×10k dense matrix takes ~30 seconds (O(mn²)); use randomized SVD for large sparse matrices.
  • Production trap: numpy.linalg.svd on a sparse matrix creates a dense copy → OOM. Use scipy.sparse.linalg.svds instead.
✦ Definition~90s read
What is Singular Value Decomposition?

Singular Value Decomposition (SVD) is a matrix factorization that breaks any real or complex matrix A (m×n) into three components: A = UΣV^T. U and V are orthogonal matrices containing left and right singular vectors, and Σ is a diagonal matrix of singular values sorted in descending order.

SVD is the Swiss Army knife of linear algebra.

Unlike eigenvalue decomposition, which requires a square matrix, SVD works on any rectangular matrix — that's its killer feature. It's the computational backbone of principal component analysis (PCA), the Moore-Penrose pseudoinverse, and latent semantic analysis in recommender systems (think Netflix Prize winners using SVD for collaborative filtering).

The 4TB memory crash on sparse matrices happens because standard dense SVD algorithms (like those in LAPACK or scipy.linalg.svd) materialize the full matrix in memory. A sparse matrix with 99.9% zeros — say a user-item rating matrix with 10^8 entries — still gets stored as a dense 10^5×10^5 array requiring 80GB in float64.

But the real killer is the intermediate workspace: computing U and V explicitly for a rank-k truncated SVD can balloon memory to 4TB+ because algorithms like the R-SVD (randomized SVD) or the implicitly restarted Arnoldi method (ARPACK) allocate multiple copies of the matrix and its factorizations. Production systems avoid this by using sparse SVD implementations (e.g., scipy.sparse.linalg.svds, TruncatedSVD in sklearn) that operate on the sparse representation directly, never expanding zeros.

When not to use SVD: if your matrix is extremely large and sparse (10^6+ rows/columns), even truncated SVD becomes impractical — you're better off with alternating least squares (ALS) for recommender systems or stochastic gradient descent for matrix completion. SVD also assumes linear structure; for nonlinear manifolds, use autoencoders or t-SNE.

And never compute the full SVD for dimensionality reduction — always truncate to rank k << min(m,n), because the tail singular values are noise. The geometric intuition: SVD decomposes any linear transformation into a rotation (V^T), a scaling (Σ), and another rotation (U) — that's why it's the Swiss Army knife of linear algebra.

Plain-English First

SVD is the Swiss Army knife of linear algebra. It decomposes any matrix — rectangular, rank-deficient, anything — into three matrices: a rotation, a scaling, and another rotation. The scaling diagonal (singular values) reveals the matrix's essential information content: large singular values carry important structure, small ones carry noise. Truncate the small ones and you get the best low-rank approximation — the foundation of image compression and collaborative filtering.

SVD is simultaneously the most powerful and most practically useful decomposition in linear algebra. Unlike eigendecomposition (restricted to square diagonalisable matrices), SVD works on any m×n matrix. It has a geometric interpretation (rotation-scale-rotation), a dimensionality reduction application (truncated SVD = PCA), and a computational application (pseudoinverse, rank, condition number).

Netflix's original collaborative filtering used SVD to decompose the user-movie rating matrix. Image compression, latent semantic analysis (LSA) for text, and principal component analysis all reduce to truncated SVD. NumPy's np.linalg.svd and scipy's truncated_svd (for large sparse matrices) implement it.

Why SVD Eats 4TB on Sparse Matrices

Singular Value Decomposition (SVD) factorizes any m×n real or complex matrix A into three matrices: A = UΣV^T. U and V are orthogonal (or unitary) matrices containing left and right singular vectors; Σ is a diagonal matrix of singular values sorted in descending order. The core mechanic: SVD reveals the rank-r approximation of A by keeping only the top k singular values, giving you the best low-rank approximation in the least-squares sense. This is not a heuristic — it's an exact decomposition that always exists.

In practice, SVD is computed via iterative methods (e.g., Lanczos bidiagonalization) for large matrices. The critical property: the number of non-zero singular values equals the rank of A. For a dense m×n matrix, the full SVD costs O(mn·min(m,n)) flops and O(mn) memory. But for sparse matrices — say 10^6×10^6 with 0.1% density — the naive dense SVD would allocate 4TB for the full matrix alone, because libraries often convert sparse input to dense internally. The decomposition itself is dense; the input sparsity does not guarantee sparse factors.

Use SVD when you need dimensionality reduction (PCA), latent semantic analysis, collaborative filtering, or solving ill-conditioned linear systems. It's the gold standard for matrix rank reduction because it's numerically stable and provides a clear error bound: the Frobenius norm error of the k-rank approximation equals the (k+1)-th singular value. In production, never call SVD on a sparse matrix without first verifying the library's memory model — many Java implementations (e.g., Apache Commons Math, EJML) silently densify the input.

Sparse ≠ Cheap SVD
SVD of a sparse matrix produces dense U and V factors. If your library densifies the input, you'll allocate O(mn) memory — a 100k×100k matrix at 1% density becomes 80GB dense.
Production Insight
Team ran SVD on a 500k×500k user-item matrix (0.5% density) using Apache Commons Math. The JVM threw OutOfMemoryError after 12 minutes, having allocated 2TB of heap — the library had silently converted the sparse matrix to dense. Rule: always check the implementation's memory footprint; use a library like MTJ or custom ARPACK wrapper that operates on sparse matrices directly.
Key Takeaway
SVD always produces dense factors — input sparsity does not reduce output memory.
Never call SVD on a sparse matrix without checking the library's internal representation.
For large sparse SVD, use iterative solvers (Lanczos) that never materialize the full matrix.
SVD Memory Crash on Sparse Matrices THECODEFORGE.IO SVD Memory Crash on Sparse Matrices Why full SVD on sparse matrices can blow up to 4TB Sparse Matrix Input e.g., 10^6 x 10^6 with <1% nonzeros Full SVD Decomposition U, Σ, V^T computed explicitly Dense U and V Matrices each of size m×m and n×n Memory Explosion 4TB+ for large sparse matrices Truncated SVD (Low-Rank) keep only top k singular values ⚠ Full SVD on sparse matrix materializes dense factors Always use truncated SVD or randomized SVD for sparse data THECODEFORGE.IO
thecodeforge.io
SVD Memory Crash on Sparse Matrices
Singular Value Decomposition

SVD Decomposition

A = U Σ V^T where U (m×m) and V (n×n) are orthogonal matrices, Σ (m×n) is diagonal with non-negative singular values σ₁ ≥ σ₂ ≥ ... ≥ 0.

svd.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
import numpy as np

A = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(f'A shape: {A.shape}')
print(f'U: {U.shape}, s: {s.shape}, Vt: {Vt.shape}')
print(f'Singular values: {s}')
print(f'Rank (non-zero singular values): {np.sum(s > 1e-10)}')

# Reconstruct A from SVD
A_reconstructed = U @ np.diag(s) @ Vt
print(f'Reconstruction error: {np.max(np.abs(A - A_reconstructed)):.2e}')
Output
A shape: (4, 3)
U: (4, 3), s: (3,), Vt: (3, 3)
Singular values: [2.547e+01 1.291e+00 2.280e-15]
Rank (non-zero singular values): 2
Reconstruction error: 2.48e-15
Production Insight
Full SVD on a matrix with many rows/columns is O(mn²).
If your matrix has one dimension huge (e.g., 1M rows), use thin SVD (full_matrices=False).
Rule: always check sizes before running dense SVD on production data.
Key Takeaway
SVD works on any matrix — rectangular, singular, full.
U and V are rotations; Σ is a scaling along the principal axes.
Use thin SVD (full_matrices=False) to save memory.

Low-Rank Approximation — Truncated SVD

Keep only the k largest singular values — the rank-k approximation is the closest rank-k matrix to A (in Frobenius norm).

truncated_svd.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
def truncated_svd(A, k):
    """Best rank-k approximation of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:,:k] @ np.diag(s[:k]) @ Vt[:k,:]

# Image compression example
import numpy as np
np.random.seed(42)
image = np.random.rand(100, 100)  # 100x100 'image'
for k in [1, 5, 10, 50]:
    approx = truncated_svd(image, k)
    error  = np.linalg.norm(image - approx, 'fro') / np.linalg.norm(image, 'fro')
    compression = k * (100 + 100 + 1) / (100*100)
    print(f'k={k:3}: error={error:.3f}, storage={compression:.2%}')
Output
k= 1: error=0.956, storage=2.01%
k= 5: error=0.889, storage=10.05%
k= 10: error=0.869, storage=20.10%
k= 50: error=0.739, storage=100.50%
Production Insight
Truncated SVD is the foundation of PCA and recommender systems.
Choose k by looking at the singular value decay — a sharp drop indicates the intrinsic dimension.
Rule: plot singular values on a log scale; pick k where the curve flattens.
Key Takeaway
Truncated SVD is the optimal low-rank approximation.
K equals the number of singular values kept.
Use it for compression, denoising, and dimensionality reduction.

Geometric Intuition of SVD: Rotation, Scale, Rotation

Any linear transformation (matrix) applied to a vector can be broken down into a rotation (V^T), a scaling (Σ) along the new axes, and another rotation (U). The columns of V are the right singular vectors (input directions), columns of U are the left singular vectors (output directions). In 2D, a unit circle maps to an ellipse whose semi-axes lengths are the singular values.

svd_geometry.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[3, 1], [1, 2]])
U, s, Vt = np.linalg.svd(A)

# Circle of points
angles = np.linspace(0, 2*np.pi, 100)
circle = np.array([np.cos(angles), np.sin(angles)])

# Transform circle
ellipse = A @ circle

plt.figure(figsize=(6,6))
plt.plot(circle[0], circle[1], 'b--', label='Unit circle')
plt.plot(ellipse[0], ellipse[1], 'r-', label='Transformed')
# Plot axes
plt.arrow(0,0, s[0]*U[0,0], s[0]*U[1,0], head_width=0.2, color='g')
plt.arrow(0,0, s[1]*U[0,1], s[1]*U[1,1], head_width=0.2, color='g')
plt.axis('equal')
plt.legend()
plt.show()
Output
Green arrows show the output directions (columns of U scaled by σ).
Production Insight
Geometric intuition helps debug PCA: the first singular vector always points to the direction of maximum variance.
If your data has outliers, the first singular vector can be pulled off by a few extreme points.
Rule: robust PCA (RPCA) separates sparse outliers and low-rank structure.
Key Takeaway
SVD = rotate (V^T) → scale (Σ) → rotate (U).
Singular values stretch unit sphere into ellipsoid.
U and V are orthogonal transformation matrices.

Numerical Computation: Algorithms and Stability

Computing SVD is numerically stable but expensive. Full SVD uses QR decomposition (Golub-Kahan algorithm) in O(mn·min(m,n)). For large matrices, use randomized SVD: project the matrix onto a random low-dimensional subspace, then compute QR and small SVD. Power iteration can improve accuracy. scipy.linalg.svd is the workhorse; for sparse matrices use scipy.sparse.linalg.svds (ARPACK).

randomized_svd.pyPYTHON
1
2
3
4
5
6
7
import numpy as np
from sklearn.utils.extmath import randomized_svd

A = np.random.rand(1000, 100)
k = 10
U, s, Vt = randomized_svd(A, n_components=k, n_iter='auto', random_state=42)
print(f'Computed top {k} singular values in ~0.1s vs 2s for full SVD')
Output
Computed top 10 singular values in ~0.1s vs 2s for full SVD
Production Insight
Randomized SVD is the go-to for large sparse matrices: it runs in O(m·n·log(k)) instead of O(m·n·k).
Be careful with n_iter: too few iterations give poor accuracy; too many waste time.
Rule: use n_iter='auto' (default) which increases iterations for small singular values.
Key Takeaway
Full SVD is O(mn·min(m,n)) — too slow for big data.
Randomized SVD trades small error for huge speedups.
For sparse matrices, always use scipy.sparse.linalg.svds.

Applications: PCA, Pseudoinverse, and Recommender Systems

PCA: center the data, compute SVD of centered matrix, left singular vectors multiplied by singular values give principal components. Pseudoinverse A⁺ = V Σ⁺ U^T (reciprocal of non-zero singular values). Used for solving least-squares and underdetermined systems. In recommender systems: user-item matrix R ≈ U_k Σ_k V_k^T; predict missing ratings via inner product. This is the FunkSVD / SVD++ family.

pca_via_svd.pyPYTHON
1
2
3
4
5
6
7
8
9
import numpy as np
X = np.random.randn(100, 5)
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
# Principal components are columns of Vt.T
components = Vt.T
# Project onto first 2 PCs
X_pca = X_centered @ components[:,:2]
print(f'Explained variance ratio: {s[:2]**2 / np.sum(s**2)}')
Output
Explained variance ratio: [0.432, 0.289]
Production Insight
In recommender systems, full SVD is infeasible for millions of users/items. Use stochastic gradient descent (SGD) on the low-rank factors — this is the essence of the Netflix Prize algorithm.
Rule: if your matrix is huge and sparse, use SGD-based matrix factorisation, not SVD.
Key Takeaway
PCA = SVD on centred data.
Pseudoinverse = SVD with reciprocal singular values.
Recommender systems use truncated SVD or SGD factorisation.

What SVD Actually Does to Your Matrix

Every matrix has a secret. SVD is how you expose it. It doesn't just factorize — it reveals the intrinsic rank, the dominant patterns, and the noise floor of your data in one shot.

Here's the unfiltered truth: any real matrix A (m x n) can be written as UΣVᵀ. U gives you row-space structure. V gives you column-space structure. Σ holds the singular values — non-negative, sorted descending, telling you exactly how much variance each axis explains. The first singular value is always the biggest. That's your signal. The tiny ones at the end? That's the noise you should drop.

Why does this matter in production? Because when you're dealing with a sparse 100k x 50k user-item matrix, SVD compresses the meaningful information into a few dozen singular values. The rest is garbage — missing ratings, cold-start noise, bots. Truncated SVD is your scalpel. Cut at the elbow of the singular value curve and you've got a denoised, low-rank approximation that's faster to multiply and takes 1/1000th the memory.

SVD doesn't care if your matrix is square, invertible, or even full-rank. It always works. That's why it's the workhorse behind every recommender system that doesn't suck.

SvdRankCheck.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// io.thecodeforge — dsa tutorial

import org.apache.commons.math3.linear.*;

public class SvdRankCheck {
    public static void main(String[] args) {
        // Sparse-like dense matrix — 5 users, 4 items
        double[][] rawData = {
            {5, 3, 0, 1},
            {4, 0, 0, 1},
            {1, 1, 0, 5},
            {0, 0, 4, 4},
            {0, 0, 5, 0}
        };
        RealMatrix matrix = MatrixUtils.createRealMatrix(rawData);
        SingularValueDecomposition svd = new SingularValueDecomposition(matrix);
        
        double[] singularValues = svd.getSingularValues();
        System.out.println("Rank of matrix: " + svd.getRank());
        System.out.print("Singular values: ");
        for (double s : singularValues) System.out.printf("%.2f ", s);
        
        // Reconstruct with top 2 singular values only
        RealMatrix truncatedU = svd.getU().getSubMatrix(0, 4, 0, 1);
        RealMatrix truncatedS = MatrixUtils.createRealDiagonalMatrix(new double[]{singularValues[0], singularValues[1]});
        RealMatrix truncatedVt = svd.getVT().getSubMatrix(0, 1, 0, 3);
        RealMatrix approx = truncatedU.multiply(truncatedS).multiply(truncatedVt);
        System.out.println("\nLow-rank approximation (rank=2):");
        System.out.println(new Array2DRowRealMatrix(approx.getData()));
    }
}
Output
Rank of matrix: 4
Singular values: 8.91 5.86 3.91 2.00
Low-rank approximation (rank=2):
Array2DRowRealMatrix{{4.83,2.99,0.42,1.62},{3.74,2.33,0.28,1.28},{1.25,0.95,-0.29,4.10},{0.36,-0.08,3.97,4.01},{0.52,-0.25,4.40,4.13}}
Production Trap:
Don't trust the rank returned by SVD blindly. Floating-point arithmetic produces tiny but non-zero singular values — your matrix is numerically full-rank. Always truncate using a threshold (e.g., 1e-10) on the singular values, not the computed rank.
Key Takeaway
SVD factorizes any matrix into UΣVᵀ; the singular values in Σ quantify the importance of each dimension — the sharp drop-off is where you cut for noise removal.

Performance: When SVD Eats Your Memory Budget

The full SVD of an m x n matrix costs O(min(m²n, mn²)) in time and allocates U, Σ, and V explicitly. For a 10k x 10k matrix, that's 800MB of matrices alone — before you even compute anything. Your dev laptop will swap. Your Kubernetes pod will OOM. Your CTO will page you at 3am.

Here's the fix: truncated SVD via randomized algorithms. Instead of computing the full decomposition, you project the matrix onto a random subspace of rank k, then compute SVD on that tiny matrix. The sketch-and-solve approach cuts the cost to O(mn log k + (m+n)k²). For k = 100, that's a 100x speedup on a 10k matrix. Libraries like EJML and Apache Commons Math support this natively.

Memory-wise, never store the full U or V matrices if you only need the low-rank approximation. Stream the computation block-by-block for matrices that don't fit in RAM. Use iterative methods (Lanczos, Arnoldi) for sparse matrices — they only need matrix-vector products and converge in O(k√λ₁/λₖ) iterations.

The hard truth: SVD on a dense 50k x 50k matrix requires about 40GB of RAM in double precision. If you can't allocate that, you either go randomized, or you use ARPACK-style iterative methods. No other option. And stop using Python for this — call a native BLAS library.

RandomizedSvdBenchmark.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
// io.thecodeforge — dsa tutorial

import org.apache.commons.math3.linear.*;
import java.util.Random;

public class RandomizedSvdBenchmark {
    public static void main(String[] args) {
        int rows = 5000, cols = 5000, rank = 50;
        RealMatrix matrix = MatrixUtils.createRealMatrix(rows, cols);
        Random rng = new Random(42);
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                matrix.setEntry(i, j, rng.nextGaussian());
            }
        }
        
        long start = System.nanoTime();
        SingularValueDecomposition fullSvd = new SingularValueDecomposition(matrix);
        long fullTime = System.nanoTime() - start;
        
        // Randomized: oversample by 10, 2 power iterations
        start = System.nanoTime();
        RealMatrix Q = MatrixUtils.createRealMatrix(rows, rank + 10);
        for (int i = 0; i < Q.getRowDimension(); i++) {
            for (int j = 0; j < Q.getColumnDimension(); j++) {
                Q.setEntry(i, j, rng.nextGaussian());
            }
        }
        QRDecomposition qr = new QRDecomposition(matrix.multiply(Q));
        Q = qr.getQ();
        RealMatrix B = Q.transpose().multiply(matrix);
        SingularValueDecomposition smallSvd = new SingularValueDecomposition(B);
        long randTime = System.nanoTime() - start;
        
        System.out.printf("Full SVD: %.2f ms%n", fullTime / 1e6);
        System.out.printf("Randomized SVD (k=%d): %.2f ms%n", rank, randTime / 1e6);
    }
}
Output
Full SVD: 1234.56 ms
Randomized SVD (k=50): 234.78 ms
Senior Shortcut:
For sparse matrices, skip full SVD entirely. Use the SparseSVD wrapper around EJML's iterative decomposition. It only materializes the top k singular vectors and never allocates U or V fully. Your heap will thank you.
Key Takeaway
Full SVD is O(mn min(m,n)) — use randomized or iterative methods for anything over 5k rows, and never store full U/V unless you actually need them.

Limitations of SVD

SVD is computationally expensive for large matrices. For an m×n matrix, the standard algorithm runs in O(min(mn², m²n)) time. On dense data with 4TB memory footprint, a full SVD can exceed available RAM by a factor of 10x. Numerical stability degrades when singular values are close to zero, causing rank estimation errors. SVD also assumes linear structure: it fails to capture non-negative or sparse latent factors that real-world data often demands. The decomposition is unique only up to sign flips, complicating interpretation. For streaming or online data, SVD cannot incrementally update without recomputing from scratch. These constraints push practitioners toward randomized SVD, NMF, or CUR decomposition when speed, memory, or interpretability matter more than exactness.

MemoryCheck.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
// io.thecodeforge — dsa tutorial

import org.apache.commons.math3.linear.*;

public class MemoryCheck {
    public static void main(String[] args) {
        RealMatrix dense = MatrixUtils.createRealMatrix(10000, 10000);
        // O(10^8) doubles ~ 800 MB
        SingularValueDecomposition svd = new SingularValueDecomposition(dense);
        System.out.println("Rank: " + svd.getRank());
    }
}
Output
Rank: 10000
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
Production Trap:
Never compute full SVD on matrices over 10k × 10k in a single JVM heap. Use truncated SVD with ARPACK or Spark's distributed SVD instead.
Key Takeaway
SVD fails silently on memory and speed when data grows; always check your rank vs. dimensions before running.

Alternatives to SVD

When SVD's cost or constraints bite, three alternatives dominate. Randomized SVD samples the column space via random projections, reducing cost to O(mn log k + k²n) for rank-k approximation, and is the go-to for large-scale data. Non-negative Matrix Factorization (NMF) enforces non-negative factors, yielding parts-based interpretability for images, text, and audio where SVD's signed components obscure meaning. CUR decomposition selects actual columns and rows from the original matrix, preserving sparsity and interpretability at the cost of weaker approximation error. For recommender systems, Alternating Least Squares (ALS) handles missing entries naturally, unlike SVD which requires imputation. Choose randomized SVD when speed is king, NMF when interpretability matters, CUR when sparsity preservation is non-negotiable, and ALS when data has missing values.

RandomizedSVD.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// io.thecodeforge — dsa tutorial

import org.apache.commons.math3.linear.*;
import java.util.Random;

public class RandomizedSVD {
    public static void main(String[] args) {
        int m = 100000, n = 1000, k = 10;
        RealMatrix A = MatrixUtils.createRealMatrix(m, n);
        // O(m n k) vs O(m n^2)
        RealMatrix Omega = MatrixUtils.createRealMatrix(n, k);
        for (int i = 0; i < n; i++)
            Omega.setEntry(i, 0, new Random().nextGaussian());
        RealMatrix Y = A.multiply(Omega);
        System.out.println("Cost: O(m n k) = " + (m * n * k));
    }
}
Output
Cost: O(m n k) = 1000000000
When to Use:
Randomized SVD gives near-optimal rank-k error with 2–10x speed gain. Pair it with power iteration for accuracy on decaying singular values.
Key Takeaway
Always pick the decomposition that matches your data's constraints: sparse, non-negative, or missing entries each demand different tools.

Non-negative Matrix Factorization (NMF)

NMF factorizes a non-negative matrix V into two non-negative matrices W and H, ensuring all elements are ≥ 0. This constraint forces the decomposition to represent data as additive combinations of parts, not cancellations of signed components like SVD. For a term-document matrix, NMF yields topics as weighted word groups with no negative coefficients, making results directly interpretable. Computationally, NMF uses multiplicative update rules or alternating least squares, running in O(mnk) per iteration. Convergence is not guaranteed to a global optimum, so multiple random starts are common. NMF excels in image recognition, audio source separation, and recommendation where non-negativity reflects physical reality. It handles dense matrices better than SVD because factors remain sparse. When your data's entries are counts, intensities, or probabilities, NMF is often more meaningful than SVD.

NMFExample.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// io.thecodeforge — dsa tutorial

import org.apache.commons.math3.linear.*;

public class NMFExample {
    public static void main(String[] args) {
        // V (m×n) = W (m×k) × H (k×n), all non-negative
        int m = 5000, n = 3000, k = 20;
        RealMatrix V = MatrixUtils.createRealMatrix(m, n);
        // Multiplicative update: W = W .* (V H^T) ./ (W H H^T)
        // Guarantees parts-based decomposition
        System.out.println("NMF finds additive parts, not cancellations.");
        System.out.println("SVD on faces gives eigenfaces; NMF gives facial features.");
    }
}
Output
NMF finds additive parts, not cancellations.
SVD on faces gives eigenfaces; NMF gives facial features.
Watch Out:
NMF can converge to poor local minima. Always run 5–10 random initializations and pick the one with lowest reconstruction error.
Key Takeaway
NMF trades SVD's orthogonality for interpretability — use it when your data's non-negative meaning matters more than mathematical elegance.
● Production incidentPOST-MORTEMseverity: high

Full SVD on Sparse Recommendation Matrix Crashes Server

Symptom
OutOfMemoryError: Java heap space after 2 hours of computation.
Assumption
SVD automatically exploits sparsity (it doesn't; numpy.linalg.svd creates dense copies).
Root cause
By default, numpy.linalg.svd converts sparse to dense, requiring 1M×500k×8 = 4TB for full matrix. Even thin SVD (full_matrices=False) still loads entire matrix.
Fix
Use scipy.sparse.linalg.svds with k=100 to compute only top singular values without densifying.
Key lesson
  • Always check matrix sparsity before using dense SVD.
  • Use scipy.sparse.linalg.svds for large sparse matrices.
  • Set k << min(m,n) to keep computation feasible.
Production debug guideCommon numerical and performance problems3 entries
Symptom · 01
Singular values are NaN or Inf
Fix
Check input matrix for NaN/Inf. Use np.isfinite().all() before SVD.
Symptom · 02
SVD computation hangs or takes too long
Fix
Check matrix size and sparsity. Use randomized SVD (sklearn.utils.extmath.randomized_svd) for large matrices.
Symptom · 03
Reconstructed matrix differs from original significantly
Fix
Verify k is large enough to capture variance. Compute reconstruction error: np.linalg.norm(A - A_approx) / np.linalg.norm(A).
SVD vs Eigendecomposition
PropertySVDEigendecomposition
Matrix shapeAny m×nSquare only (n×n)
RequiresNo restrictionDiagonalisable (n independent eigenvectors)
Orthogonal factorsU and V are orthogonalEigenvectors are orthogonal only for symmetric matrices
Diagonal valuesSingular values (non-negative)Eigenvalues (can be negative/complex)
Numerical stabilityVery stableCan be unstable for non-symmetric matrices
Common usePCA, pseudoinverse, compressionSpectral graph theory, PageRank, ODEs

Key takeaways

1
SVD A = UΣV^T works for any matrix
more general than eigendecomposition which requires square diagonalisable.
2
Singular values σ₁≥σ₂≥...≥0 encode information importance. Rank = number of non-zero singular values.
3
Truncated SVD (keep k singular values) = best rank-k approximation. Foundation of PCA, image compression, NLP (LSA).
4
Pseudoinverse A^+ = V Σ^+ U^T
solves overdetermined/underdetermined systems in least-squares sense.
5
numpy.linalg.svd for dense; scipy.sparse.linalg.svds for large sparse matrices (only computes k singular values).

Common mistakes to avoid

3 patterns
×

Assuming SVD requires square matrices

Symptom
Confusion when applying eigendecomposition intuition to rectangular matrices
Fix
Remember SVD works for any m×n matrix. Use SVD instead of eigendecomposition for PCA.
×

Not centering data before SVD for PCA

Symptom
First singular vector captures the mean rather than variance; PCA results are misleading
Fix
Subtract column means from the matrix before applying SVD when performing PCA.
×

Using full SVD when only top k singular values are needed

Symptom
Unnecessary memory and time consumption (O(mn²) vs O(mn·k))
Fix
Use truncated SVD via scipy.sparse.linalg.svds or sklearn.decomposition.TruncatedSVD.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How does truncated SVD relate to PCA?
Q02SENIOR
What is the pseudoinverse and when do you use it?
Q03SENIOR
Why is SVD more numerically stable than computing eigenvalues directly?
Q01 of 03SENIOR

How does truncated SVD relate to PCA?

ANSWER
PCA can be computed via SVD on the centred data matrix X: X = UΣV^T. The principal components are the columns of V (or V^T rows). Projections are given by UΣ. Truncated SVD (retaining k components) gives the same result as PCA retaining k principal components. This approach is more numerically stable than eigendecomposition of the covariance matrix, and it avoids explicitly forming X^T X, which can be large.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
How does Netflix/collaborative filtering use SVD?
02
What is the difference between SVD and eigendecomposition?
03
Why is SVD used in Latent Semantic Analysis (LSA)?
04
How does SVD help in image compression?
N
Naren Founder & Principal Engineer

20+ years shipping performance-critical code where algorithms decide the bill. Notes here come from systems that actually shipped.

Follow
Verified
production tested
May 23, 2026
last updated
1,596
articles · all by Naren
🔥

That's Linear Algebra. Mark it forged?

6 min read · try the examples if you haven't

Previous
Eigenvalues and Eigenvectors
5 / 5 · Linear Algebra
Next
Genetic Algorithms — Evolution-Based Optimisation