Matrix Operations — Addition, Multiplication, Transpose
Matrices are the fundamental data structure of numerical computing. Every machine learning model is a stack of matrix multiplications. Every 3D graphics transformation is a matrix product. Every linear system of equations is a matrix equation. NumPy's core operation (np.dot, @) and every GPU-accelerated deep learning framework (PyTorch, TensorFlow) are built around efficient matrix multiplication.
Understanding matrix operations from first principles — not just API calls — means understanding why matrix multiplication is O(n³), why it is not commutative (AB ≠ BA in general), and why the memory layout matters as much as the algorithm for performance.
Core Operations
```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Addition: element-wise, O(nm)
print('A + B =\n', A + B)

# Multiplication: (m×n) @ (n×p) = (m×p), O(mnp)
# AB[i][j] = dot product of row i of A and col j of B
print('A @ B =\n', A @ B)

# Non-commutativity: AB ≠ BA in general
print('BA =\n', B @ A)
print('AB == BA:', np.allclose(A @ B, B @ A))  # False

# Transpose: rows become columns
print('A^T =\n', A.T)

# Inverse: A @ A^-1 = I (exists only when det(A) ≠ 0)
if np.linalg.det(A) != 0:
    print('A^-1 =\n', np.linalg.inv(A))
```
```
A + B =
 [[ 6  8]
 [10 12]]
A @ B =
 [[19 22]
 [43 50]]
BA =
 [[23 34]
 [31 46]]
AB == BA: False
A^T =
 [[1 3]
 [2 4]]
A^-1 =
 [[-2.   1. ]
 [ 1.5 -0.5]]
```
Naive Matrix Multiplication Implementation
```python
def matmul(A, B):
    """O(n^3) naive matrix multiplication."""
    rows_A, cols_A = len(A), len(A[0])
    rows_B, cols_B = len(B), len(B[0])
    assert cols_A == rows_B, 'Incompatible dimensions'
    C = [[0] * cols_B for _ in range(rows_A)]
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```
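Strassen's algorithm, mentioned in the takeaways, beats the naive cubic bound by trading 8 recursive block multiplications for 7, giving O(n^log2 7) ≈ O(n^2.807). Here is a minimal sketch for square matrices whose size is a power of two; the leaf size of 64 is an arbitrary cutoff for illustration, not a tuned choice:

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Strassen multiplication for n x n matrices, n a power of two.

    Uses 7 recursive block products instead of 8; falls back to the
    standard product below `leaf`, where recursion overhead dominates.
    """
    n = A.shape[0]
    if n <= leaf:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(128, 128)
B = np.random.rand(128, 128)
print(np.allclose(strassen(A, B), A @ B))  # True
```

In practice the crossover where Strassen wins is large (the ~512 threshold in the takeaways), and its extra additions cost numerical accuracy, which is one reason BLAS libraries mostly stick with the blocked cubic algorithm.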
Why Cache Layout Dominates Performance
A pure-Python triple loop for n=500 takes on the order of 10 seconds; NumPy does the same multiplication in roughly a millisecond. That ~10,000x gap is not the algorithm — it is memory access patterns and hardware optimisation.
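You can reproduce the gap on your own machine with a sketch like this. Absolute numbers are hardware-dependent, and n = 200 is chosen just to keep the pure-Python run short:

```python
import time
import numpy as np

def naive_matmul(A, B):
    """Pure-Python triple loop: O(n^3) with a huge constant factor."""
    n, m, p = len(A), len(A[0]), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = A[i][k]          # hoist the invariant lookup
            Bk, Ci = B[k], C[i]
            for j in range(p):
                Ci[j] += aik * Bk[j]
    return C

n = 200
A = np.random.rand(n, n)
B = np.random.rand(n, n)

t0 = time.perf_counter()
C = naive_matmul(A.tolist(), B.tolist())
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
D = A @ B
t_numpy = time.perf_counter() - t0

print(f'pure Python: {t_naive:.2f}s  NumPy: {t_numpy*1e3:.2f}ms  '
      f'ratio ~{t_naive / t_numpy:.0f}x')
```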
NumPy uses BLAS (Basic Linear Algebra Subprograms) with: block decomposition for cache efficiency (fitting submatrices in L1/L2 cache), SIMD vectorisation (processing 8 floats per instruction with AVX2), and multi-threading. The algorithm is still O(n³) — but the constant factor is ~1000x smaller than naive Python loops.
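The block-decomposition idea can be sketched in a few lines of NumPy. This is illustrative only — the block size bs=64 is an arbitrary choice here, and real BLAS kernels implement the blocks in hand-tuned, vectorised code:

```python
import numpy as np

def blocked_matmul(A, B, bs=64):
    """Cache-blocked matrix multiply: identical O(n^3) arithmetic, but
    each bs x bs sub-problem is small enough to stay resident in cache
    while it is reused."""
    n, m = A.shape
    m2, p = B.shape
    assert m == m2, 'Incompatible dimensions'
    C = np.zeros((n, p))
    for i in range(0, n, bs):
        for k in range(0, m, bs):
            for j in range(0, p, bs):
                # one pair of cache-sized blocks at a time;
                # slicing handles ragged edges automatically
                C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]
    return C

A = np.random.rand(256, 300)
B = np.random.rand(300, 128)
print(np.allclose(blocked_matmul(A, B), A @ B))  # True
```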
Rule: never implement matrix multiplication yourself. Use NumPy (@) or torch.matmul for GPU. The gap between a correct naive implementation and a production one is orders of magnitude.
🎯 Key Takeaways
- Matrix multiplication: C[i][j] = Σ A[i][k] × B[k][j]. Requires cols(A) = rows(B). Result is rows(A) × cols(B).
- Not commutative (AB ≠ BA in general), but associative: (AB)C = A(BC).
- O(n³) naive — Strassen O(n^2.807) — practical threshold ~512. Always use NumPy/BLAS in production.
- Inverse exists only for square matrices with non-zero determinant. Use np.linalg.solve (not inv) for linear systems.
- Memory layout (C-contiguous vs Fortran-order) can affect matrix multiplication performance by 2-5x.
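The solve-vs-inv takeaway in code: np.linalg.solve factorises A and back-substitutes rather than forming the inverse explicitly, which is both faster and more numerically stable:

```python
import numpy as np

# Solve A x = b, i.e. 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Preferred: factorise and back-substitute, no explicit inverse
x_solve = np.linalg.solve(A, b)

# Works, but slower and less stable: build A^-1 first
x_inv = np.linalg.inv(A) @ b

print(x_solve)                       # [2. 3.]
print(np.allclose(x_solve, x_inv))   # True
print(np.allclose(A @ x_solve, b))   # True
```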
Interview Questions on This Topic
- Q: Why is matrix multiplication O(n³), and what is the best known algorithm?
- Q: When does AB = BA?
- Q: Why should you use np.linalg.solve instead of np.linalg.inv for solving linear systems?
Frequently Asked Questions
What is the difference between element-wise multiplication and matrix multiplication?
Element-wise (Hadamard product): C[i][j] = A[i][j] × B[i][j]. Requires identical dimensions. In NumPy: A * B. Matrix multiplication (dot product): C[i][j] = Σ A[i][k] × B[k][j]. Requires cols(A) = rows(B). In NumPy: A @ B. Neural networks use both: fully connected layers use @, while attention masks use *.
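Side by side, using the same 2×2 matrices as above:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Hadamard product: multiply matching entries, shape must be identical
print(A * B)   # [[ 5 12]
               #  [21 32]]

# Matrix product: row-by-column dot products
print(A @ B)   # [[19 22]
               #  [43 50]]
```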
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.