Matrix Operations — Addition, Multiplication, Transpose
Matrices are the fundamental data structure of numerical computing. Every machine learning model is a stack of matrix multiplications. Every 3D graphics transformation is a matrix product. Every linear system of equations is a matrix equation. NumPy's core operation (np.dot, @) and every GPU-accelerated deep learning framework (PyTorch, TensorFlow) are built around efficient matrix multiplication.
Understanding matrix operations from first principles — not just API calls — means understanding why matrix multiplication is O(n³), why it is not commutative (AB ≠ BA in general), and why the memory layout matters as much as the algorithm for performance.
Core Operations
```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Addition: element-wise, O(nm)
print('A + B =\n', A + B)

# Multiplication: (m×n) @ (n×p) = (m×p), O(mnp)
# AB[i][j] = dot product of row i of A and col j of B
print('A @ B =\n', A @ B)

# Non-commutativity: AB ≠ BA in general
print('BA =\n', B @ A)
print('AB == BA:', np.allclose(A @ B, B @ A))  # False

# Transpose: rows become columns
print('A^T =\n', A.T)

# Inverse: A @ A^-1 = I (exists only when det(A) ≠ 0)
if np.linalg.det(A) != 0:
    print('A^-1 =\n', np.linalg.inv(A))
```
```
A + B =
 [[ 6  8]
 [10 12]]
A @ B =
 [[19 22]
 [43 50]]
BA =
 [[23 34]
 [31 46]]
AB == BA: False
A^T =
 [[1 3]
 [2 4]]
A^-1 =
 [[-2.   1. ]
 [ 1.5 -0.5]]
```
Naive Matrix Multiplication Implementation
```python
def matmul(A, B):
    """O(n^3) naive matrix multiplication."""
    rows_A, cols_A = len(A), len(A[0])
    rows_B, cols_B = len(B), len(B[0])
    assert cols_A == rows_B, 'Incompatible dimensions'
    C = [[0] * cols_B for _ in range(rows_A)]
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```
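Strassen's algorithm, mentioned in the takeaways, beats the naive cubic bound by trading 8 recursive block multiplications for 7, giving O(n^log2 7) ≈ O(n^2.807). Here is a minimal sketch for square matrices whose size is a power of two; the leaf size of 64 is an arbitrary cutoff for illustration, not a tuned choice:

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Strassen multiplication for n x n matrices, n a power of two.

    Uses 7 recursive block products instead of 8; falls back to the
    standard product below `leaf`, where recursion overhead dominates.
    """
    n = A.shape[0]
    if n <= leaf:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(128, 128)
B = np.random.rand(128, 128)
print(np.allclose(strassen(A, B), A @ B))  # True
```

In practice the crossover where Strassen wins is large (the ~512 threshold in the takeaways), and its extra additions cost numerical accuracy, which is one reason BLAS libraries mostly stick with the blocked cubic algorithm.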
Why Cache Layout Dominates Performance
A pure-Python triple loop for n=500 takes on the order of 10 seconds; NumPy does the same multiplication in roughly a millisecond. That ~10,000x gap is not the algorithm — it is memory access patterns and hardware optimisation.
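You can reproduce the gap on your own machine with a sketch like this. Absolute numbers are hardware-dependent, and n = 200 is chosen just to keep the pure-Python run short:

```python
import time
import numpy as np

def naive_matmul(A, B):
    """Pure-Python triple loop: O(n^3) with a huge constant factor."""
    n, m, p = len(A), len(A[0]), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = A[i][k]          # hoist the invariant lookup
            Bk, Ci = B[k], C[i]
            for j in range(p):
                Ci[j] += aik * Bk[j]
    return C

n = 200
A = np.random.rand(n, n)
B = np.random.rand(n, n)

t0 = time.perf_counter()
C = naive_matmul(A.tolist(), B.tolist())
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
D = A @ B
t_numpy = time.perf_counter() - t0

print(f'pure Python: {t_naive:.2f}s  NumPy: {t_numpy*1e3:.2f}ms  '
      f'ratio ~{t_naive / t_numpy:.0f}x')
```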
NumPy uses BLAS (Basic Linear Algebra Subprograms) with: block decomposition for cache efficiency (fitting submatrices in L1/L2 cache), SIMD vectorisation (processing 8 floats per instruction with AVX2), and multi-threading. The algorithm is still O(n³) — but the constant factor is ~1000x smaller than naive Python loops.
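The block-decomposition idea can be sketched in a few lines of NumPy. This is illustrative only — the block size bs=64 is an arbitrary choice here, and real BLAS kernels implement the blocks in hand-tuned, vectorised code:

```python
import numpy as np

def blocked_matmul(A, B, bs=64):
    """Cache-blocked matrix multiply: identical O(n^3) arithmetic, but
    each bs x bs sub-problem is small enough to stay resident in cache
    while it is reused."""
    n, m = A.shape
    m2, p = B.shape
    assert m == m2, 'Incompatible dimensions'
    C = np.zeros((n, p))
    for i in range(0, n, bs):
        for k in range(0, m, bs):
            for j in range(0, p, bs):
                # one pair of cache-sized blocks at a time;
                # slicing handles ragged edges automatically
                C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]
    return C

A = np.random.rand(256, 300)
B = np.random.rand(300, 128)
print(np.allclose(blocked_matmul(A, B), A @ B))  # True
```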
Rule: never implement matrix multiplication yourself. Use NumPy (@) or torch.matmul for GPU. The gap between a correct naive implementation and a production one is orders of magnitude.
🎯 Key Takeaways
- Matrix multiplication: C[i][j] = Σ A[i][k] × B[k][j]. Requires cols(A) = rows(B). Result is rows(A) × cols(B).
- Not commutative (AB ≠ BA in general), but associative: (AB)C = A(BC).
- O(n³) naive — Strassen O(n^2.807) — practical threshold ~512. Always use NumPy/BLAS in production.
- Inverse exists only for square matrices with non-zero determinant. Use np.linalg.solve (not inv) for linear systems.
- Memory layout (C-contiguous vs Fortran-order) can affect matrix multiplication performance by 2-5x.
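The solve-vs-inv takeaway in code: np.linalg.solve factorises A and back-substitutes rather than forming the inverse explicitly, which is both faster and more numerically stable:

```python
import numpy as np

# Solve A x = b, i.e. 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Preferred: factorise and back-substitute, no explicit inverse
x_solve = np.linalg.solve(A, b)

# Works, but slower and less stable: build A^-1 first
x_inv = np.linalg.inv(A) @ b

print(x_solve)                       # [2. 3.]
print(np.allclose(x_solve, x_inv))   # True
print(np.allclose(A @ x_solve, b))   # True
```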
Interview Questions on This Topic
- Q: Why is matrix multiplication O(n³), and what is the best known algorithm?
- Q: When does AB = BA?
- Q: Why should you use np.linalg.solve instead of np.linalg.inv for solving linear systems?
Frequently Asked Questions
What is the difference between element-wise multiplication and matrix multiplication?
Element-wise (Hadamard product): C[i][j] = A[i][j] × B[i][j]. Requires identical dimensions. In NumPy: A * B. Matrix multiplication (dot product): C[i][j] = Σ A[i][k] × B[k][j]. Requires cols(A) = rows(B). In NumPy: A @ B. Neural networks use both: fully connected layers use @, while attention masks use *.
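Side by side, using the same 2×2 matrices as above:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Hadamard product: multiply matching entries, shape must be identical
print(A * B)   # [[ 5 12]
               #  [21 32]]

# Matrix product: row-by-column dot products
print(A @ B)   # [[19 22]
               #  [43 50]]
```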
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.