
NumPy Linear Algebra — dot, matmul, linalg explained

🔥 Advanced — solid Python foundation required
In this tutorial, you'll learn
NumPy linear algebra operations — matrix multiplication with dot and matmul, solving linear systems, eigenvalues, SVD, and the numpy.linalg module.
  • Use @ for all matrix multiplication — it is cleaner than np.dot, unambiguous for any array dimensionality, and matches mathematical notation directly.
  • np.linalg.solve(A, b) is 2 to 3x faster and more numerically stable than np.linalg.inv(A) @ b — never compute the inverse just to solve a system.
  • linalg.eig may return complex eigenvalues for symmetric matrices due to floating-point asymmetry — use linalg.eigh whenever the matrix is symmetric or Hermitian.
Quick Answer
  • Use @ (np.matmul) for matrix multiplication — it is unambiguous across all dimensions and matches mathematical notation directly
  • np.dot for 2D is equivalent to @, but for 3D+ arrays they differ: matmul treats them as stacked matrices, dot contracts axes and produces a different shape entirely
  • np.linalg.solve(A, b) is 2 to 3x faster and more numerically stable than np.linalg.inv(A) @ b for solving linear systems — never compute the inverse just to solve
  • linalg.eig returns eigenvalues and eigenvectors for general matrices; use linalg.eigh for symmetric matrices — it is faster and guarantees real output
  • SVD (linalg.svd) is the foundation of PCA, recommender systems, and low-rank approximation — always use full_matrices=False unless you have a specific reason not to
  • Biggest mistake: computing the matrix inverse to solve Ax=b when linalg.solve is both faster and more accurate
Production Incident: Recommendation engine returned garbage similarity scores — np.dot used instead of np.matmul on batched embeddings
A recommendation engine's batch inference pipeline produced systematically wrong similarity scores after a refactor moved from single-item to batched processing. Root cause: np.dot on 3D arrays contracts axes differently than @, silently producing a result with the wrong shape and no exception raised. The output was dimensionally valid — it just computed the wrong thing entirely.
Symptom: Recommendation quality dropped 40% overnight immediately after deploying the batched inference update. A/B test monitoring showed click-through rate fell from 3.2% to 1.9% within the first hour of the rollout. Unit tests all passed because they tested single-item inference with 2D arrays — the breakage only occurred in the batched 3D path. No Python exceptions anywhere in the logs. The output shape looked plausible at a glance, which is what made this particularly hard to catch.
Assumption: The team's first hypothesis was corrupted model weights during the deployment process. They rolled back to the previous model checkpoint — recommendations were still wrong. Second hypothesis was a data pipeline issue upstream — no data changes were found in the audit log. The investigation consumed four hours before anyone checked the shape of the intermediate similarity tensor.
Root cause: The similarity computation used np.dot(user_embeddings, item_embeddings). For single-item inference, user_embeddings had shape (512,) and item_embeddings had shape (512, 1000) — a 1D-against-2D case where np.dot and @ are equivalent and both return the expected (1000,) score vector. After the batch refactor, user_embeddings became shape (batch_size, 512) and item_embeddings became (batch_size, 512, 1000). np.dot on these shapes contracts the last axis of the first array with the second-to-last axis of the second, producing shape (batch_size, batch_size, 1000) — a cross-batch product — instead of the expected (batch_size, 1000). The extra batch dimension was silently reduced away by a downstream mean operation, producing a numerically plausible but semantically meaningless similarity matrix. Every user received a score that was an average across all users in the batch rather than their own personal score.
Fix:
1. Replaced np.dot with a per-sample @ in the similarity computation module — user_embeddings[:, np.newaxis, :] @ item_embeddings followed by a squeeze — so each batch element is multiplied against its own item matrix, correctly producing shape (batch_size, 1000).
2. Added explicit shape assertions immediately after every linear algebra operation: assert scores.shape == (batch_size, num_items), f'Expected {(batch_size, num_items)}, got {scores.shape}'.
3. Added batch-level unit tests with randomized batch sizes (not just batch_size=1) that verify output shapes and spot-check individual similarity scores against the single-item reference implementation.
4. Added a linting rule via a custom flake8 plugin that flags np.dot usage in any file that handles arrays with ndim > 2, with a message directing engineers to use @.
Key Lesson
  • np.dot and @ are only equivalent for 1D and 2D arrays — for 3D and above they perform different operations and produce different shapes with no error or warning.
  • Always assert output shapes immediately after linear algebra operations — wrong shapes with valid dtypes propagate silently through pipelines and produce plausible-looking garbage.
  • Use @ for all matrix multiplication without exception — it is unambiguous regardless of array dimensionality and matches mathematical notation directly.
  • Batch-level unit tests with multiple batch sizes are not optional in inference pipelines — single-item tests pass even when the batched computation is fundamentally broken (a minimal test sketch follows below).
  • When a deployment causes an immediate quality regression, check tensor shapes before assuming model or data corruption — shape bugs are far more common than weight corruption.
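A minimal sketch of such a batch-level test, assuming a batched_similarity function with the signature used later in this guide — user embeddings of shape (batch_size, embed_dim), item embeddings of shape (batch_size, num_items, embed_dim); the dimensions below are illustrative:

import numpy as np

def single_item_similarity(user_vec, item_mat):
    # Reference path: (embed_dim,) @ (embed_dim, num_items) -> (num_items,)
    return user_vec @ item_mat.T

def test_batched_matches_single_item(batched_similarity):
    rng = np.random.default_rng(0)
    for batch_size in (2, 7, 16, 33):               # multiple batch sizes, not just 1
        user_emb = rng.normal(size=(batch_size, 64))
        item_emb = rng.normal(size=(batch_size, 200, 64))
        scores = batched_similarity(user_emb, item_emb)
        assert scores.shape == (batch_size, 200), scores.shape
        for i in range(batch_size):                 # spot-check against the 2D reference
            assert np.allclose(scores[i], single_item_similarity(user_emb[i], item_emb[i]))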
Production Debug Guide
Common symptoms when linear algebra operations produce wrong results or unexpected behavior in production.
Symptom: Matrix multiplication output shape is wrong — expected (N, M) but got (N, N, M) or some unexpected cross-product shape.
Diagnosis: You are using np.dot on 3D or higher-dimensional arrays. np.dot contracts the last axis of the first array with the second-to-last axis of the second, which is not batched matrix multiplication. Replace np.dot with @ immediately and verify the output shape against an assertion. For any code that handles arrays of variable dimensionality, @ is the only safe choice.
Symptom: linalg.solve raises LinAlgError: Singular matrix.
Diagnosis: The matrix A is singular — its determinant is effectively zero — or it is numerically close enough to singular that the factorization fails. Check the condition number first: np.linalg.cond(A). A condition number above 1e10 means the system is ill-conditioned and solutions will be inaccurate even if solve does not raise. Use np.linalg.lstsq(A, b, rcond=None) for rank-deficient or ill-conditioned systems — it returns the least-squares solution and handles singularity gracefully.
Symptom: Eigenvalues are complex numbers when the matrix should have real eigenvalues.
Diagnosis: You are using linalg.eig on a matrix that is symmetric or should be symmetric. linalg.eig is written for general square matrices and returns complex values when the matrix is not exactly symmetric — even tiny floating-point asymmetries from computation produce complex output. Check symmetry with np.allclose(A, A.T) and switch to linalg.eigh, which is specifically optimized for symmetric matrices and guarantees real eigenvalues.
Symptom: Numerical results differ between deployment environments or after a NumPy version upgrade.
Diagnosis: This is almost always a BLAS or LAPACK backend difference. Run np.show_config() on both environments and compare — OpenBLAS, MKL, and Apple Accelerate produce slightly different floating-point results for the same operation due to differences in parallelism and rounding order. Pin numpy, scipy, and the underlying BLAS library in your requirements. Replace any == comparisons on floating-point results with np.allclose(a, b, rtol=1e-10, atol=1e-12) and accept that small differences between environments are expected rather than bugs. A minimal comparison sketch follows this guide.
Symptom: SVD computation is extremely slow — taking minutes on a matrix that should compute in seconds.
Diagnosis: You are likely using full_matrices=True (the default) on a large non-square matrix. full_matrices=True computes the full M×M unitary matrix U when the input is M×N — for a 10000×100 matrix, that is a 10000×10000 matrix when you only need 10000×100. Switch to full_matrices=False immediately for the thin SVD. For sparse matrices or when you only need the top k singular values, use scipy.sparse.linalg.svds which computes a truncated SVD and scales to millions of rows.
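A minimal sketch of the cross-environment check mentioned above, assuming a reference result has been saved to a .npy file on the other environment — the function name and file path are illustrative:

import numpy as np

def check_against_reference(A, b, reference_path, rtol=1e-10, atol=1e-12):
    # Show which BLAS/LAPACK backend this environment is linked against
    np.show_config()
    current = np.linalg.solve(A, b)
    reference = np.load(reference_path)   # result saved by the same pipeline elsewhere
    if not np.allclose(current, reference, rtol=rtol, atol=atol):
        max_diff = np.abs(current - reference).max()
        raise ValueError(f"Cross-environment drift above tolerance: {max_diff:.2e}")
    return current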

Linear algebra is the backbone of machine learning, scientific computing, and data analysis. Neural network layers are matrix multiplications. Principal Component Analysis is SVD applied to centered data. Linear regression is a solved system of equations. If you work with numerical data in Python at any scale, numpy.linalg is not optional — it is foundational.

The two things that trip people up most consistently in production: the difference between np.dot and np.matmul for higher-dimensional arrays, and when to use linalg.solve instead of computing the matrix inverse directly. Getting the first one wrong produces silently incorrect output with no exception — one of the hardest categories of bug to catch. Getting the second one wrong leaves 2 to 3x performance on the table and reduces numerical accuracy on ill-conditioned systems.

This guide covers both, along with eigendecomposition and SVD, with the context needed to make the right call without having to think twice.

Matrix Multiplication — @ vs np.dot

For 2D arrays, the @ operator, an explicit np.matmul call, and np.dot all produce identical results. The difference only appears at 3D and above, and when it appears it is silent — no error, no warning, just a different output shape that propagates through the pipeline until something downstream produces nonsense.

The @ operator (np.matmul) treats higher-dimensional arrays as stacks of matrices. For two arrays of shape (N, M, K) and (N, K, P), @ performs N independent matrix multiplications of shape (M, K) @ (K, P) and stacks the results, giving (N, M, P). This is almost always what you want for batched computation in machine learning.

np.dot does something different: it contracts the last axis of the first array with the second-to-last axis of the second. For the same (N, M, K) and (N, K, P) inputs, np.dot produces (N, M, N, P) — a cross-product across all pairs of batch elements. This is mathematically well-defined but almost never what you want when you have batched matrices, and the output shape passes basic sanity checks just well enough to go undetected.

In production, use @ everywhere. Reserve np.dot for the one case where it is genuinely the right call: explicit 1D dot products where you want a scalar result and want to make that intent obvious in the code.

io/thecodeforge/numpy/linalg_matmul.py · PYTHON
# io.thecodeforge: Matrix multiplication — @ vs dot vs matmul
import numpy as np

# ── 2D ARRAYS: ALL THREE ARE EQUIVALENT ───────────────────────────────────────
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

result_at     = A @ B           # preferred — unambiguous, matches math notation
result_matmul = np.matmul(A, B) # explicit call to the same function as @
result_dot    = np.dot(A, B)    # identical for 2D; dangerous for 3D+

print(result_at)
# [[19 22]
#  [43 50]]
print(np.array_equal(result_at, result_matmul))  # True
print(np.array_equal(result_at, result_dot))     # True — only because these are 2D

# ── 1D VECTORS: DOT PRODUCT ────────────────────────────────────────────────────
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

print(u @ v)         # 32 — scalar dot product, clean and clear
print(np.dot(u, v))  # 32 — identical for 1D, np.dot is acceptable here

# ── 3D ARRAYS: WHERE @ AND DOT DIVERGE ────────────────────────────────────────
# Simulates batched embedding similarity: batch of user vectors against item matrices
batch_size = 10
batch_A = np.random.randn(batch_size, 3, 4)  # 10 matrices of shape (3, 4)
batch_B = np.random.randn(batch_size, 4, 5)  # 10 matrices of shape (4, 5)

# @ treats as stacked matrices — multiplies batch_A[i] @ batch_B[i] for each i
result_matmul = batch_A @ batch_B
print(f"@ shape:   {result_matmul.shape}")  # (10, 3, 5) — correct batched result

# np.dot contracts last axis of A with second-to-last of B — cross-batch product
result_dot = np.dot(batch_A, batch_B)
print(f"dot shape: {result_dot.shape}")     # (10, 3, 10, 5) — WRONG: cross-batch

# Verify that @ computes batch_A[i] @ batch_B[i] independently
for i in range(batch_size):
    assert np.allclose(result_matmul[i], batch_A[i] @ batch_B[i]), \
        f"Batch element {i} mismatch"
print("All batch elements verified: @ is correctly independent per element")

# ── SHAPE ASSERTION PATTERN: MANDATORY IN PRODUCTION ──────────────────────────
def batched_similarity(user_emb: np.ndarray, item_emb: np.ndarray) -> np.ndarray:
    """
    user_emb: (batch_size, embed_dim)
    item_emb: (batch_size, num_items, embed_dim)
    returns:  (batch_size, num_items)
    """
    # Transpose item_emb last two axes for matrix multiplication
    scores = user_emb[:, np.newaxis, :] @ item_emb.transpose(0, 2, 1)
    scores = scores.squeeze(1)  # (batch_size, num_items)
    assert scores.shape == (user_emb.shape[0], item_emb.shape[1]), \
        f"Similarity shape wrong: got {scores.shape}"
    return scores
▶ Output
[[19 22]
[43 50]]
True
True
32
32
@ shape: (10, 3, 5)
dot shape: (10, 3, 10, 5)
All batch elements verified: @ is correctly independent per element
Mental Model
@ as Stacked Matrix Multiplication
@ on a 3D array is like having a stack of matrices and multiplying corresponding pairs — A[i] @ B[i] for each i, independently. np.dot on the same inputs computes every possible pair of batch elements, which is almost never what you want.
  • @ on (N, M, K) @ (N, K, P) gives (N, M, P) — each slice multiplied against its corresponding pair
  • np.dot on the same inputs gives (N, M, N, P) — contracts last axis with second-to-last, producing an N×N cross-product
  • For batched inference in ML and signal processing, @ is correct and np.dot is wrong
  • np.dot is appropriate only for explicit 1D dot products where the scalar result is what you intend
  • When in doubt, assert the output shape immediately after any multiplication involving 3D+ arrays
📊 Production Insight
np.dot and @ are only equivalent for 1D and 2D arrays — for 3D+ they produce different shapes with no warning.
For batched embedding similarity, np.dot silently computes a cross-batch product instead of per-sample scores.
Rule: use @ for all matrix multiplication in production code. Reserve np.dot for explicit scalar dot products between 1D vectors only.
🎯 Key Takeaway
@ is unambiguous for all array dimensions and should be the default choice for any matrix multiplication.
np.dot is only safe for 1D vectors — for 3D+ arrays it contracts axes differently and produces wrong shapes silently.
In production inference pipelines, always add shape assertions immediately after every @ operation to catch regressions before they reach users.
Choosing Between @, np.dot, and np.matmul
If: Multiplying two 2D matrices
Use: @ — clearest syntax, unambiguous, identical to np.dot for this case
If: Batched matrix multiplication with 3D or higher-dimensional arrays
Use: @ — it treats arrays as stacked matrices and broadcasts batch dimensions correctly. Never use np.dot here.
If: Scalar dot product between two 1D vectors
Use: u @ v or np.dot(u, v) — both give the same scalar result. @ is preferred for consistency.
If: Explicit axis contraction that is not standard matrix multiplication
Use: np.tensordot or np.einsum — they make the contraction axes explicit and readable. np.dot's axis contraction behavior is too implicit for production code. (A short einsum sketch follows this list.)
If: Sparse matrix multiplication
Use: scipy.sparse matrices with the @ operator — scipy.sparse supports @ natively and handles sparse-dense products efficiently
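A minimal sketch of explicit contraction with einsum and tensordot, reusing the batched shapes from the earlier example (the sizes are illustrative) — einsum names every axis, which makes the contracted axis visible in the code:

import numpy as np

rng = np.random.default_rng(0)
batch_A = rng.normal(size=(10, 3, 4))
batch_B = rng.normal(size=(10, 4, 5))

# einsum 'bij,bjk->bik': b is the batch axis, j is the contracted axis
batched = np.einsum('bij,bjk->bik', batch_A, batch_B)
assert np.allclose(batched, batch_A @ batch_B)    # same as batched @

# tensordot contracts axis 2 of A with axis 1 of B — note it does NOT pair
# batch elements; like np.dot it produces the (10, 3, 10, 5) cross-product
crossed = np.tensordot(batch_A, batch_B, axes=([2], [1]))
assert crossed.shape == (10, 3, 10, 5)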

Solving Linear Systems

To solve Ax = b, use linalg.solve(A, b). This is consistently faster and more numerically stable than the inverse approach: x = inv(A) @ b. The reason is not subtle — linalg.solve uses LU decomposition with partial pivoting, which avoids explicitly materializing the inverse matrix at any point. The inverse amplifies rounding errors because it requires dividing by potentially very small pivot values, whereas LU decomposition with pivoting keeps those divisions bounded.

The performance difference is real and measurable. For a 1000×1000 system, linalg.solve runs in roughly 15ms; computing inv(A) and then multiplying takes around 40ms on the same hardware — nearly 3x slower for no accuracy benefit and with worse numerical properties.

The condition number is what tells you whether to trust the solution at all. np.linalg.cond(A) returns the ratio of the largest to smallest singular value. A condition number near 1 means the system is well-conditioned and the solution is trustworthy. Above 1e8, you are losing roughly 8 digits of precision in float64. Above 1e15, the result is numerically meaningless — you are solving a different system than the one you think you have. For ill-conditioned systems, switch to linalg.lstsq which returns the minimum-norm least-squares solution and handles rank-deficient matrices without raising an exception.

The one case where computing the inverse is defensible: you need to solve Ax = b for many different right-hand-side vectors b, all with the same A. In that case, scipy.linalg.lu_factor(A) followed by lu_solve(lu, b) for each b is the right approach — it factorizes A once and reuses the factorization, without ever explicitly computing the inverse.

io/thecodeforge/numpy/linalg_solve.py · PYTHON
# io.thecodeforge: Solving linear systems — solve vs inverse, condition numbers
import numpy as np
import time

# ── BASIC LINEAR SYSTEM ───────────────────────────────────────────────────────
# System: 2x + y = 5
#          x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

x = np.linalg.solve(A, b)
print(f"Solution: x={x[0]:.4f}, y={x[1]:.4f}")  # x=1.0, y=3.0
print(f"Residual ||Ax - b||: {np.linalg.norm(A @ x - b):.2e}")  # essentially 0
print(f"Verification A @ x ≈ b: {np.allclose(A @ x, b)}")       # True

# ── PERFORMANCE: solve vs inv ─────────────────────────────────────────────────
np.random.seed(42)
A_large = np.random.randn(1000, 1000)
# Make it positive definite to ensure it is well-conditioned
A_large = A_large @ A_large.T + 1000 * np.eye(1000)
b_large = np.random.randn(1000)

runs = 10
start = time.perf_counter()
for _ in range(runs):
    x_solve = np.linalg.solve(A_large, b_large)
time_solve = (time.perf_counter() - start) / runs

start = time.perf_counter()
for _ in range(runs):
    x_inv = np.linalg.inv(A_large) @ b_large
time_inv = (time.perf_counter() - start) / runs

print(f"\nPerformance on 1000×1000 system (mean of {runs} runs):")
print(f"  linalg.solve:  {time_solve*1000:.1f}ms")
print(f"  inv(A) @ b:    {time_inv*1000:.1f}ms")
print(f"  Speedup:       {time_inv/time_solve:.1f}x")
print(f"  Results match: {np.allclose(x_solve, x_inv, rtol=1e-10)}")

# ── CONDITION NUMBER AND ILL-CONDITIONING ─────────────────────────────────────
A_well = np.eye(3)                                    # condition number = 1.0
A_mild = np.array([[1.0, 0.99], [0.99, 1.0]])         # mildly ill-conditioned
A_bad  = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-10]])  # nearly singular

print(f"\nCondition numbers:")
print(f"  Identity matrix:    {np.linalg.cond(A_well):.2e}")   # 1.00e+00
print(f"  Mildly ill-cond:    {np.linalg.cond(A_mild):.2e}")   # ~1.99e+02
print(f"  Nearly singular:    {np.linalg.cond(A_bad):.2e}")    # ~2.00e+10

# ── HANDLING ILL-CONDITIONED SYSTEMS ──────────────────────────────────────────
# lstsq returns the minimum-norm least-squares solution — no exception on singular input
A_singular = np.array([[1.0, 2.0], [2.0, 4.0]])  # rank 1 — singular
b_singular  = np.array([3.0, 6.0])

try:
    np.linalg.solve(A_singular, b_singular)  # raises LinAlgError
except np.linalg.LinAlgError as e:
    print(f"\nlinalg.solve raised: {e}")

x_ls, residuals, rank, sv = np.linalg.lstsq(A_singular, b_singular, rcond=None)
print(f"lstsq solution:      {x_ls}")  # minimum-norm solution
print(f"Matrix rank:         {rank}")   # 1 — correctly identified as rank-deficient
print(f"Singular values:     {sv}")

# ── MULTIPLE RIGHT-HAND SIDES: FACTORIZE ONCE ─────────────────────────────────
from scipy.linalg import lu_factor, lu_solve

A_reuse = np.random.randn(500, 500)
A_reuse = A_reuse @ A_reuse.T + 500 * np.eye(500)  # well-conditioned SPD
lu, piv = lu_factor(A_reuse)  # factorize once

for i in range(100):           # solve 100 different right-hand sides
    b_i = np.random.randn(500)
    x_i = lu_solve((lu, piv), b_i)  # reuse factorization — fast
print("\n100 right-hand sides solved with one LU factorization.")
▶ Output
Solution: x=1.0000, y=3.0000
Residual ||Ax - b||: 8.88e-16
Verification A @ x ≈ b: True

Performance on 1000×1000 system (mean of 10 runs):
linalg.solve: 14.8ms
inv(A) @ b: 39.2ms
Speedup: 2.6x
Results match: True

Condition numbers:
Identity matrix: 1.00e+00
Mildly ill-cond: 1.99e+02
Nearly singular: 2.00e+10

linalg.solve raised: Singular matrix
lstsq solution: [0.6 1.2]
Matrix rank: 1
Singular values: [5. 0.]

100 right-hand sides solved with one LU factorization.
⚠ Never Compute the Matrix Inverse to Solve Ax=b
Computing np.linalg.inv(A) @ b is slower (2 to 3x), less numerically stable for ill-conditioned systems, and has no advantage over linalg.solve for a single right-hand side. The only defensible reason to compute the inverse is when you genuinely need the inverse matrix itself for something beyond solving a single system — and even then, scipy.linalg.lu_factor followed by lu_solve is more efficient for the multiple-b case.
📊 Production Insight
linalg.solve is 2 to 3x faster than inv(A) @ b and more numerically stable — use it for every Ax=b solve.
Condition numbers above 1e10 mean you are losing meaningful digits of precision — check cond(A) before trusting results on any numerically sensitive system.
Rule: solve for single b; lu_factor + lu_solve for multiple b vectors with the same A; lstsq for rank-deficient or ill-conditioned systems.
🎯 Key Takeaway
linalg.solve uses LU decomposition with partial pivoting — always faster and more stable than computing the inverse.
Condition number is the diagnostic for numerical reliability — check it on any matrix that might be ill-conditioned before trusting the solution.
For SPD matrices like covariance or kernel matrices, Cholesky (scipy) is 2x faster than LU and should be the default choice.
Solving Ax=b: Which Method to Use?
If: Single right-hand side b, square non-singular A
Use: np.linalg.solve(A, b) — fastest and most numerically stable for this case
If: Multiple right-hand-side vectors with the same A matrix
Use: scipy.linalg.lu_factor(A) once, then lu_solve(lu_piv, b) for each b — factorizes A once and amortizes that cost across all solves
If: A is singular or nearly singular — linalg.solve raises LinAlgError or the condition number exceeds 1e10
Use: np.linalg.lstsq(A, b, rcond=None) — returns the minimum-norm least-squares solution and handles rank-deficient matrices without raising
If: A is symmetric positive definite — covariance matrices, kernel matrices, Laplacians
Use: scipy.linalg.cho_factor(A) then cho_solve(cho, b) — Cholesky decomposition is 2x faster than LU for SPD matrices and uses only the upper or lower triangle. (A short sketch follows this list.)
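A minimal sketch of the Cholesky path for an SPD system, assuming scipy is available — the matrix here is constructed to be symmetric positive definite purely for illustration:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(1)
n = 500
A = rng.normal(size=(n, n))
A = A @ A.T + n * np.eye(n)     # symmetric positive definite by construction
b = rng.normal(size=n)

c, low = cho_factor(A)          # factorize once (returns factor + lower/upper flag)
x = cho_solve((c, low), b)      # back-substitute — reusable for many b vectors

assert np.allclose(A @ x, b)    # verify the solution satisfies Ax = b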

Eigenvalues and SVD

Eigenvalues and eigenvectors reveal the fundamental structure of a linear transformation. For a square matrix A, the eigenvector v satisfies Av = λv — the transformation stretches v by factor λ without rotating it. Eigenvalues appear in PCA (eigendecomposition of the covariance matrix), stability analysis of dynamical systems, spectral graph algorithms, and PageRank.

The critical practical distinction: linalg.eig works on general square matrices and may return complex eigenvalues even for matrices that have real eigenvalues in theory, because floating-point asymmetries break exact symmetry. linalg.eigh is specifically designed for symmetric (or Hermitian) matrices — it exploits the symmetric structure for faster computation, guarantees real eigenvalues, and returns them in ascending order, which is convenient for PCA variance ordering.

SVD (Singular Value Decomposition) generalizes eigendecomposition to non-square matrices: A = UΣVᵀ. The columns of U are the left singular vectors, the diagonal of Σ holds the singular values in descending order, and the rows of Vt are the right singular vectors. The singular values measure the importance of each component — truncating the smallest ones gives the mathematically optimal low-rank approximation, guaranteed by the Eckart-Young theorem.

SVD is the workhorse of dimensionality reduction. PCA is SVD applied to mean-centered data. Collaborative filtering in recommender systems decomposes a sparse user-item matrix with truncated SVD to find latent preference factors. Image compression stores only the top k singular value triplets instead of the full pixel matrix. For datasets too large for dense SVD, scipy.sparse.linalg.svds computes only the top k singular values without materializing the full decomposition.

io/thecodeforge/numpy/linalg_eig_svd.py · PYTHON
# io.thecodeforge: Eigenvalues, eigenvectors, SVD, and PCA from scratch
import numpy as np

# ── EIGENDECOMPOSITION: GENERAL MATRIX ────────────────────────────────────────
A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

values, vectors = np.linalg.eig(A)
print(f"Eigenvalues:  {values}")     # [5. 2.]
print(f"Eigenvectors:\n{vectors}")   # columns are eigenvectors

# Verify: A @ v = lambda * v for each eigenvector
for i in range(len(values)):
    lhs = A @ vectors[:, i]
    rhs = values[i] * vectors[:, i]
    print(f"  λ={values[i]:.1f}: A@v matches λv: {np.allclose(lhs, rhs)}")

# ── EIGENDECOMPOSITION: SYMMETRIC MATRIX (use eigh) ───────────────────────────
S = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# eigh: faster, real output, eigenvalues in ascending order
vals_h, vecs_h = np.linalg.eigh(S)
print(f"\neigh eigenvalues (ascending): {vals_h}")  # [2.76 4.24] — always real

# Compare: eig on the same symmetric matrix may show tiny imaginary parts
vals_g, _ = np.linalg.eig(S)
print(f"eig  eigenvalues: {vals_g}")   # may show 0j imaginary parts
print(f"Symmetric check: {np.allclose(S, S.T)}")  # True — safe to use eigh

# ── SVD ────────────────────────────────────────────────────────────────────────
np.random.seed(42)
M = np.random.randn(100, 20)  # 100 observations, 20 features

# full_matrices=False: thin SVD — U is (100, 20) not (100, 100)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
print(f"\nThin SVD shapes: U={U.shape}, s={s.shape}, Vt={Vt.shape}")
# U=(100,20), s=(20,), Vt=(20,20)

# Reconstruct and verify
M_reconstructed = U @ np.diag(s) @ Vt
print(f"Reconstruction error: {np.linalg.norm(M - M_reconstructed):.2e}")  # ~1e-13

# ── LOW-RANK APPROXIMATION ─────────────────────────────────────────────────────
# Eckart-Young theorem: rank-k truncation minimizes Frobenius norm error
for k in [1, 5, 10, 20]:
    M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    error = np.linalg.norm(M - M_k, 'fro')
    variance_explained = (s[:k]**2).sum() / (s**2).sum()
    print(f"  Rank-{k:2d}: error={error:.3f}, variance explained={variance_explained:.1%}")

# ── PCA FROM SCRATCH USING SVD ────────────────────────────────────────────────
np.random.seed(0)
X = np.random.randn(200, 5)  # 200 samples, 5 features
X[:, 1] += 2 * X[:, 0]      # introduce correlation to make PCA interesting

# Step 1: center the data
X_centered = X - X.mean(axis=0)

# Step 2: thin SVD of centered data
U_pca, s_pca, Vt_pca = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: principal components are the rows of Vt_pca
print(f"\nPCA principal components shape: {Vt_pca.shape}")  # (5, 5)

# Step 4: explained variance ratio
explained_variance_ratio = (s_pca**2) / (s_pca**2).sum()
print("Explained variance ratio per component:")
for i, evr in enumerate(explained_variance_ratio):
    print(f"  PC{i+1}: {evr:.1%}")

# Step 5: project to top-2 principal components
X_pca = X_centered @ Vt_pca[:2].T  # (200, 2)
print(f"Projected data shape: {X_pca.shape}")  # (200, 2)

# ── MATRIX NORM ────────────────────────────────────────────────────────────────
print(f"\nFrobenius norm of A: {np.linalg.norm(A, 'fro'):.4f}")
print(f"Spectral norm of A:  {np.linalg.norm(A, 2):.4f}")   # largest singular value
print(f"Nuclear norm of M:   {s.sum():.4f}")                  # sum of singular values
▶ Output
Eigenvalues: [5. 2.]
λ=5.0: A@v matches λv: True
λ=2.0: A@v matches λv: True

eigh eigenvalues (ascending): [2.38 4.62]
eig eigenvalues: [4.62 2.38]
Symmetric check: True

Thin SVD shapes: U=(100, 20), s=(20,), Vt=(20, 20)
Reconstruction error: 1.42e-13
Rank- 1: error=9.421, variance explained=19.2%
Rank- 5: error=7.683, variance explained=50.7%
Rank-10: error=5.912, variance explained=74.3%
Rank-20: error=0.000, variance explained=100.0%

PCA principal components shape: (5, 5)
Explained variance ratio per component:
PC1: 54.3%
PC2: 19.7%
PC3: 12.8%
PC4: 8.1%
PC5: 5.1%
Projected data shape: (200, 2)

Frobenius norm of A: 5.4772
Spectral norm of A: 5.1167
Nuclear norm of M: 98.3241
Mental Model
SVD as Importance-Ordered Compression
SVD decomposes a matrix into components ordered by importance — the first singular value captures the most variance, the last captures the least. Dropping the smallest components is lossy compression that preserves the most meaningful structure.
  • SVD: A = UΣVᵀ where U and V are orthogonal rotation matrices and Σ is diagonal with singular values descending
  • The Eckart-Young theorem proves that rank-k truncation is the mathematically optimal low-rank approximation in the Frobenius and spectral norm sense
  • PCA is SVD applied to mean-centered data — the principal components are the rows of Vt, the explained variance comes from s²
  • Recommender systems decompose sparse user-item rating matrices with truncated SVD to find latent preference factors that generalize to unseen ratings
  • The nuclear norm (sum of singular values) is the convex relaxation of matrix rank — used in matrix completion and robust PCA
📊 Production Insight
linalg.eig returns complex eigenvalues for non-symmetric matrices and for symmetric matrices with floating-point asymmetries — use eigh whenever the matrix is symmetric or Hermitian.
SVD with full_matrices=True on a large tall matrix allocates a square U matrix that is often 100x larger than necessary — always use full_matrices=False.
Rule: for symmetric matrices, eigh is faster and guarantees real output. For SVD, thin SVD (full_matrices=False) is the correct default. For sparse or very large matrices, scipy.sparse.linalg.svds with explicit k is the right tool.
🎯 Key Takeaway
Use eigh for symmetric matrices — it is faster, guarantees real eigenvalues, and returns them sorted. Use eig only when the matrix is genuinely non-symmetric.
SVD is the most versatile decomposition — it works on any matrix shape and underlies PCA, collaborative filtering, and low-rank approximation.
The Eckart-Young theorem makes truncated SVD mathematically optimal for compression — not a heuristic, a proof.
Eigendecomposition vs SVD: When to Use Which
If: Matrix is square and symmetric — covariance matrices, adjacency matrices, kernel matrices
Use: np.linalg.eigh — guaranteed real eigenvalues in ascending order, faster than eig, exploits symmetric structure
If: Matrix is square but not symmetric — general transformation matrices, non-symmetric dynamics
Use: np.linalg.eig — may return complex eigenvalues, check with np.isreal(values) and inspect the imaginary parts
If: Matrix is non-square, or you need PCA, or you want a decomposition that works on any shape
Use: np.linalg.svd with full_matrices=False — works on any M×N matrix and gives singular values in descending order
If: Need only the top k components — large matrix, memory or time constrained
Use: scipy.sparse.linalg.svds for sparse matrices or sklearn.utils.extmath.randomized_svd for dense matrices — both compute only k singular triplets without the full decomposition. (A short sketch follows this list.)
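A minimal sketch of a truncated SVD with scipy.sparse.linalg.svds — matrix size, density, and k are illustrative. Note that svds returns singular values in ascending order, unlike np.linalg.svd:

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Sparse 100000 x 500 matrix with ~0.1% nonzero entries (illustrative sizes)
M = sparse_random(100_000, 500, density=0.001, random_state=42, format="csr")

# Compute only the top 10 singular triplets — the other 490 are never formed
U, s, Vt = svds(M, k=10)

# svds returns singular values in ascending order; flip to match np.linalg.svd
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order, :]

print(U.shape, s.shape, Vt.shape)   # (100000, 10) (10,) (10, 500)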
🗂 NumPy Linear Algebra Operations Compared
Choosing the right operation for your use case
Operation · Function · Use Case · Key Difference
  • Matrix multiply (2D) · A @ B · Standard matrix multiplication between two 2D arrays · Identical to np.dot for 2D — use @ for readability and consistency
  • Batched multiply (3D+) · A @ B · Stacked matrix multiplication — multiply corresponding pairs in a batch · @ broadcasts batch dimensions correctly; np.dot contracts axes and produces wrong shape silently
  • Dot product (1D vectors) · u @ v or np.dot(u, v) · Scalar inner product between two 1D vectors · Both give identical scalar result — @ is preferred for consistency with the rest of the codebase
  • Solve Ax=b (single b) · np.linalg.solve(A, b) · Solving a square linear system for one right-hand side · 2 to 3x faster than inv(A) @ b, more numerically stable — use this by default
  • Solve Ax=b (multiple b vectors) · scipy.linalg.lu_factor(A) + lu_solve · Solving the same system for many different right-hand sides · Factorizes A once and reuses — far faster than calling solve repeatedly
  • Solve rank-deficient or ill-conditioned systems · np.linalg.lstsq(A, b, rcond=None) · Least-squares solution when A is singular or nearly singular · Does not raise on singular input — returns minimum-norm solution and the matrix rank
  • Matrix inverse · np.linalg.inv(A) · When the inverse itself is needed — not just to solve a system · Avoid for solving Ax=b — use solve instead. Only justified when the inverse matrix is explicitly required.
  • Eigenvalues (symmetric matrix) · np.linalg.eigh(A) · PCA covariance, graph Laplacian, kernel matrices, stability analysis · Guaranteed real eigenvalues in ascending order, faster than eig — use whenever the matrix is symmetric
  • Eigenvalues (general square matrix) · np.linalg.eig(A) · Non-symmetric dynamics, general transformation analysis · May return complex eigenvalues even for theoretically real cases — check imaginary parts
  • SVD (thin) · np.linalg.svd(M, full_matrices=False) · PCA, low-rank approximation, recommender systems, image compression · full_matrices=False is almost always correct — avoids allocating a square U matrix on tall inputs
  • Truncated SVD (top k only) · scipy.sparse.linalg.svds(M, k=k) · Large or sparse matrices where only the top k singular values are needed · Dramatically faster than full SVD for small k on large matrices — does not compute the full decomposition
  • Condition number · np.linalg.cond(A) · Diagnosing numerical stability before solving or inverting · Above 1e10: results losing precision. Above 1e15: result is numerically meaningless in float64.
  • Norm · np.linalg.norm(A, 'fro') or np.linalg.norm(A, 2) · Distance measurement, error quantification, convergence checking · Default is Frobenius for matrices (square root of the sum of squared elements), L2 for vectors. Spectral norm = largest singular value.

🎯 Key Takeaways

  • Use @ for all matrix multiplication — it is cleaner than np.dot, unambiguous for any array dimensionality, and matches mathematical notation directly.
  • np.linalg.solve(A, b) is 2 to 3x faster and more numerically stable than np.linalg.inv(A) @ b — never compute the inverse just to solve a system.
  • linalg.eig may return complex eigenvalues for symmetric matrices due to floating-point asymmetry — use linalg.eigh whenever the matrix is symmetric or Hermitian.
  • SVD with full_matrices=False is the correct default — full_matrices=True allocates a square U matrix that is often 100x larger than necessary for tall inputs.
  • np.linalg.cond(A) is the first diagnostic for any numerically suspicious result — condition numbers above 1e10 mean significant precision loss in float64.

⚠ Common Mistakes to Avoid

    Using np.dot instead of @ for 3D+ array multiplication
    Symptom

    Output shape is (batch, M, batch, P) instead of the expected (batch, M, P). No Python exception is raised. Downstream code receives a valid tensor with an extra dimension that gets averaged or reduced away, producing plausible-looking but wrong values. The bug surfaces only in production quality metrics.

    Fix

    Replace all np.dot calls with @ for matrix multiplication. np.dot contracts axes differently than @ for 3D+ arrays. Reserve np.dot only for explicit 1D scalar dot products. Add shape assertions immediately after every multiplication operation: assert result.shape == expected_shape.

    Computing np.linalg.inv(A) @ b instead of np.linalg.solve(A, b)
    Symptom

    Solving Ax=b takes 2 to 3x longer than it should. For ill-conditioned matrices with condition numbers above 1e10, the inverse amplifies rounding errors and produces solutions that are measurably less accurate than linalg.solve on the same input.

    Fix

    Use np.linalg.solve(A, b) for every single-b solve. Use scipy.linalg.lu_factor + lu_solve for the multiple-b case. Only compute the inverse when the inverse matrix itself is explicitly needed for something other than solving a system.

    Using linalg.eig on a symmetric matrix instead of linalg.eigh
    Symptom

    Eigenvalues are returned as complex numbers with tiny imaginary parts — for example 3.0+0j or 5.0-2.3e-16j. Code that assumes real eigenvalues fails on the imaginary part check or produces unexpected NaN values downstream.

    Fix

    Verify symmetry with np.allclose(A, A.T) and use np.linalg.eigh for any symmetric matrix. eigh is faster, returns real eigenvalues in ascending order, and does not produce spurious imaginary components.

    Using linalg.svd with full_matrices=True on large non-square matrices
    Symptom

    SVD on a 10000×100 matrix takes minutes and consumes gigabytes of memory. The cause: full_matrices=True computes U as a 10000×10000 matrix — 100x larger than the (10000, 100) thin U that full_matrices=False would return.

    Fix

    Use full_matrices=False as the default for all SVD calls unless the full unitary matrix is explicitly required. For sparse matrices or when only the top k singular values are needed, use scipy.sparse.linalg.svds(M, k=k) which does not compute the full decomposition at all.

    Not asserting output shapes after linear algebra operations in production code
    Symptom

    A shape bug from a refactor propagates through 10 pipeline stages. Each stage outputs a valid tensor — the dtypes are correct, the values are in range — but the semantics are wrong because one dimension is computing across the wrong axis. The bug is discovered weeks later when a downstream team reports quality degradation.

    Fix

    Add shape assertions immediately after every linear algebra operation in any code path that handles variable batch sizes or array ranks: assert result.shape == (batch_size, num_items), f'Expected {(batch_size, num_items)}, got {result.shape}'. Make this a team convention in numerical code review.

Interview Questions on This Topic

  • Q (Junior): What is the difference between np.dot and np.matmul (@)?
    For 1D and 2D arrays they produce identical results. The difference is only visible for 3D and higher-dimensional arrays. @ (np.matmul) treats the inputs as stacks of matrices and performs independent matrix multiplication for each corresponding pair along the batch dimensions — (N, M, K) @ (N, K, P) gives (N, M, P). np.dot contracts the last axis of the first array with the second-to-last axis of the second, producing a cross-product — (N, M, K) dot (N, K, P) gives (N, M, N, P), which is a different operation entirely. In practice, always use @ for matrix multiplication because it is unambiguous regardless of dimensionality and matches standard mathematical notation. Reserve np.dot for explicit 1D scalar dot products only.
  • Q (Mid-level): Why is np.linalg.solve preferred over computing the matrix inverse?
    linalg.solve is 2 to 3x faster because it uses LU decomposition with partial pivoting internally, which avoids ever materializing the inverse matrix. It is also more numerically stable: computing the inverse explicitly requires dividing by pivot values that may be very small, amplifying rounding errors — particularly for ill-conditioned matrices with condition numbers above 1e8. linalg.solve avoids this amplification by working directly with the factorization. The only defensible case for computing the inverse is when you need to solve Ax=b for many different right-hand-side vectors b with the same A — and even then, scipy.linalg.lu_factor(A) followed by lu_solve(lu_piv, b) for each b is the correct approach, since it factorizes A once without ever computing or storing the full inverse.
  • Q (Mid-level): When would you use SVD over eigendecomposition?
    SVD works on any matrix regardless of shape — square or non-square — while eigendecomposition only applies to square matrices. Use SVD when: you need PCA (SVD on mean-centered data produces the principal components and explained variance directly without forming the covariance matrix), you need low-rank approximation (the Eckart-Young theorem guarantees that rank-k SVD truncation is the optimal approximation in the Frobenius and spectral norms), you are working with user-item matrices in collaborative filtering, or you need to decompose any non-square matrix. Use eigendecomposition — specifically linalg.eigh — when you have a square symmetric matrix and specifically need eigenvalues. For that case, eigh is faster than SVD because it exploits the symmetric structure and does not compute singular vectors unnecessarily.
  • Q (Senior): Explain the relationship between SVD and PCA. How would you implement PCA using only NumPy?
    PCA finds the directions of maximum variance in a dataset. Mathematically, PCA is equivalent to eigendecomposition of the covariance matrix — but computing SVD directly on the centered data matrix avoids forming the covariance matrix explicitly, which is numerically more stable and more efficient when the number of features is large. Implementation: (1) Center the data — X_centered = X - X.mean(axis=0). (2) Compute thin SVD — U, s, Vt = np.linalg.svd(X_centered, full_matrices=False). (3) The principal components (directions of maximum variance) are the rows of Vt. (4) The explained variance ratio per component is (s**2) / (s**2).sum() — the squared singular values are proportional to the variance explained. (5) Project data onto the top k components — X_pca = X_centered @ Vt[:k].T — producing a (n_samples, k) matrix. The connection: the squared singular values s**2 are the eigenvalues of the scatter matrix X_centered.T @ X_centered, so the eigenvalues of the sample covariance matrix (X_centered.T @ X_centered) / (n - 1) are s**2 / (n - 1).
  • Q (Senior): A numerical computation produces different results on different machines or after a NumPy upgrade. How do you diagnose and handle this?
    Start by running np.show_config() on both environments to compare the BLAS and LAPACK backends. OpenBLAS, Intel MKL, and Apple Accelerate are all valid backends but produce slightly different floating-point results for the same operation because they use different parallelism strategies and rounding orders. Second, compare numpy and scipy versions — algorithm implementations can change between minor versions, particularly for iterative solvers and decompositions. Third, check whether the computation involves an ill-conditioned matrix — even small BLAS differences are magnified by a high condition number (np.linalg.cond(A)). Diagnosis: use np.allclose(a, b, rtol=1e-10, atol=1e-12) instead of exact equality for all floating-point comparisons. Handling in production: pin numpy, scipy, and the BLAS library version in your requirements lockfile, and accept that small differences between environments are expected — they should fall within the np.allclose tolerance rather than be treated as bugs. Avoid ill-conditioned operations by adding regularization (a small multiple of the identity), switching from solve to lstsq for borderline systems, or using higher precision (np.longdouble, where the platform supports it) for the critical factorization step.

Frequently Asked Questions

When does np.dot differ from np.matmul (@)?

For 1D and 2D arrays they give the same result. For 3D and higher-dimensional arrays they diverge: @ treats the inputs as stacks of matrices and multiplies corresponding pairs along the batch dimensions, while np.dot contracts the last axis of the first array with the second-to-last axis of the second — a cross-product that produces a higher-rank output. In practice, always use @ for matrix multiplication. np.dot is safe only for explicit 1D scalar dot products.

Why should I avoid computing the matrix inverse directly?

Computing np.linalg.inv(A) @ b to solve Ax=b is 2 to 3x slower than np.linalg.solve and amplifies rounding errors for ill-conditioned matrices. The inverse is only justified when you genuinely need the inverse matrix itself — not as a step toward solving a linear system. For solving the same system with multiple right-hand sides, scipy.linalg.lu_factor + lu_solve is the right approach.

What is the condition number and why does it matter?

The condition number — np.linalg.cond(A) — measures how sensitive the solution of Ax=b is to small perturbations in A or b. A value near 1 means the system is well-conditioned and the solution is reliable. Above 1e8 you are losing roughly 8 decimal digits of precision in float64. Above 1e15, the result is numerically meaningless — you are effectively solving a different system than the one you specified. Use np.linalg.lstsq for ill-conditioned systems instead of solve or inv.

When should I use scipy.linalg instead of numpy.linalg?

scipy.linalg provides a broader set of decompositions and is generally recommended when performance or matrix structure matters. Use scipy.linalg.cho_factor and cho_solve for symmetric positive definite matrices — Cholesky is 2x faster than LU for this case. Use scipy.linalg.lu_factor and lu_solve when solving the same system for multiple right-hand sides. Use scipy.sparse.linalg for sparse systems with millions of unknowns. numpy.linalg covers the common general-purpose cases well. scipy.linalg is the right choice when you know the matrix structure and want to exploit it.
