Senior 3 min · March 16, 2026

NumPy Mathematical Functions — The NaN Trap in np.mean

np.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • NumPy mathematical functions are vectorised: they run in compiled C, not Python loops.
  • Ufuncs (universal functions) operate element-wise and support broadcasting.
  • Aggregation functions reduce arrays: sum, mean, min, max with axis control.
  • Axis=0 collapses rows (down), axis=1 collapses columns (across).
  • Performance: vectorised operations are 10-100x faster than equivalent Python loops.
  • Production trap: forgetting nan-safe aggregations (np.nanmean) silently propagates NaNs.

Universal Functions (ufuncs)

Ufuncs apply a function to every element of an array. They support broadcasting, type casting, and output array specification.

ExamplePYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
import numpy as np

a = np.array([1.0, 4.0, 9.0, 16.0])

print(np.sqrt(a))          # [1. 2. 3. 4.]
print(np.log(a))           # [0.    1.386 2.197 2.773]
print(np.exp([0, 1, 2]))   # [1.    2.718 7.389]
print(np.abs([-3, -1, 2])) # [3 1 2]

# Two-array ufuncs
b = np.array([2.0, 3.0, 4.0, 5.0])
print(np.add(a, b))       # same as a + b
print(np.maximum(a, b))   # element-wise max
print(np.power(b, 2))     # [4. 9. 16. 25.]
Output
[1. 2. 3. 4.]
[3 1 2]
Production Insight
Ufuncs automatically upcast dtypes (float32 + float64 = float64) which can silently double memory.
When combined with broadcasting, the intermediate result can blow up before the operation completes.
Rule: always check the dtype of the output via .dtype before processing high-volume data.
Key Takeaway
Ufuncs are fast because they're written in C.
Broadcasting + dtype promotion can cause unexpected memory spikes.
If memory is tight, specify output dtype explicitly with the dtype parameter.

Aggregation Functions and the axis Parameter

axis=0 operates down rows (along columns). axis=1 operates across columns (along rows). axis=None reduces everything to a scalar.

ExamplePYTHON
1
2
3
4
5
6
7
8
9
10
11
12
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum())           # 21 — all elements
print(m.sum(axis=0))     # [5 7 9] — column sums
print(m.sum(axis=1))     # [6 15] — row sums

print(m.mean(axis=0))    # [2.5 3.5 4.5]
print(m.max(axis=1))     # [3 6]
print(m.argmax(axis=1))  # [2 2] — index of max in each row
Output
21
[5 7 9]
[6 15]
Production Insight
The most common production bug is passing axis=0 when you meant axis=1 (or vice versa).
The result shape changes silently — a 2D array becomes 1D, which can break downstream code expecting a different shape.
Use keepdims=True to preserve dimensionality and make the intent explicit.
Key Takeaway
axis=0 collapses rows (down), axis=1 collapses columns (across).
keepdims=True keeps the reduced dimension as a size-1 axis.
Always test the output shape before passing to functions that expect a specific number of dimensions.

Statistical Functions

NumPy covers standard statistics. For variance and standard deviation, note the ddof parameter — ddof=0 is population (default), ddof=1 is sample.

ExamplePYTHON
1
2
3
4
5
6
7
8
9
10
11
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print(np.mean(data))          # 5.0
print(np.median(data))        # 4.5
print(np.std(data))           # 2.0  (population)
print(np.std(data, ddof=1))   # 2.138 (sample)
print(np.var(data))           # 4.0
print(np.percentile(data, 75)) # 5.75
print(np.corrcoef([1,2,3], [1,2,3]))  # correlation matrix
Output
5.0
4.5
2.0
Production Insight
Default ddof=0 gives population statistics, but most real-world uses require sample statistics (ddof=1).
Using the wrong ddof can shift ML model thresholds by a non-trivial amount.
Also, np.std is not robust to outliers — consider using np.median and IQR for production anomaly detection.
Key Takeaway
np.std() defaults to population std (ddof=0).
For sample data, always pass ddof=1.
For robust statistics in production, use median and IQR instead of mean and std.

Broadcasting: How ufuncs Handle Different Shapes

Broadcasting allows ufuncs to operate on arrays of different shapes by automatically expanding dimensions. The arrays must be compatible: they align from the rightmost dimension, and each dimension must be equal or one of them must be 1.

ExamplePYTHON
1
2
3
4
5
6
7
8
9
10
11
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2,3)
b = np.array([10, 20, 30]) # shape (3,) -> broadcasts to (1,3) then (2,3)
print(a + b)  # element-wise addition
# [[11 22 33]
#  [14 25 36]]

# Scalar broadcasting works too
print(a * 2)  # multiplies every element by 2
Output
[[11 22 33]
[14 25 36]]
[[2 4 6]
[8 10 12]]
Production Insight
Broadcasting is memory-efficient because it creates virtual views — no data is copied.
But if you accidentally create a very large broadcast (e.g., shape (1, 1_000_000) + (1_000_000, 1) -> (1_000_000, 1_000_000)), you can burn gigabytes of RAM instantly.
The broadcast error message 'operands could not be broadcast together with shapes' is a frequent debugging stop.
Key Takeaway
Broadcasting aligns dimensions from the right.
It's fast and memory-efficient for small expansions.
Watch out for accidental large broadcasts — they can OOM your process in seconds.

Performance: Vectorisation vs Python Loops

Vectorised operations in NumPy run in compiled C and are orders of magnitude faster than Python loops. The performance gap grows with array size — for a 10-million-element array, vectorised sum takes ~5ms, while a Python loop takes over a minute.

ExamplePYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
import numpy as np
import time

arr = np.random.randn(10_000_000)

# Vectorised
start = time.time()
result = np.sum(arr)
print(f"Vectorised: {time.time() - start:.5f}s")

# Python loop
start = time.time()
total = 0.0
for x in arr:
    total += x
print(f"Loop: {time.time() - start:.5f}s")
Output
Vectorised: 0.005s
Loop: 62.3s
Production Insight
Writing Python loops over NumPy arrays is often a code smell in code reviews.
The performance penalty becomes critical in data pipelines processing millions of rows per second.
Always use np.vectorize or np.frompyfunc only as a last resort — they are still slower than native ufuncs, but better than loops.
Key Takeaway
Vectorised operations are 10-100x faster than Python loops.
If you see a for loop over a NumPy array in production code, it's a red flag.
For custom element-wise logic, use np.vectorize only when you cannot rewrite the function as a ufunc.

Advanced ufunc Methods: reduce, accumulate, outer

Ufuncs provide methods beyond direct element-wise application. reduce applies the operation cumulatively along an axis, accumulate returns all intermediate results, and outer computes the outer product.

ExamplePYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
import numpy as np

a = np.array([1, 2, 3, 4])

# reduce: cumulative sum produces single value
print(np.add.reduce(a))  # 10 = 1+2+3+4

# accumulate: all intermediate sums
print(np.add.accumulate(a))  # [1 3 6 10]

# outer: outer product
print(np.multiply.outer([1,2,3], [10,20,30]))
# [[10 20 30]
#  [20 40 60]
#  [30 60 90]]
Output
10
[1 3 6 10]
[[10 20 30]
[20 40 60]
[30 60 90]]
Production Insight
np.add.reduce is identical to np.sum in behaviour but slightly faster for some dtypes.
accumulate is useful for running totals in time-series pipelines but can overflow with int32 arrays.
outer operations can produce huge outputs (O(n^2) memory) — always check dimensions before using them in production.
Key Takeaway
Ufunc methods (reduce, accumulate, outer) are efficient tools for specialised operations.
Outer product memory scales quadratically — dangerous for large arrays.
Use accumulate for running totals instead of manual loops.

Handling NaN in Statistical Aggregations

By default, NumPy aggregation functions (mean, sum, std) return NaN if any element is NaN. Use nan-aware versions: np.nansum, np.nanmean, np.nanstd. They ignore NaN values and compute on the remaining elements.

ExamplePYTHON
1
2
3
4
5
6
7
8
9
10
11
12
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0])

print(np.mean(data))   # nan
print(np.nanmean(data)) # 2.333...

print(np.std(data))     # nan
print(np.nanstd(data))  # ~1.247

# For percentage of missing values
print(np.isnan(data).sum())  # 1
Output
nan
2.333333333333333
nan
1.247219128924647
1
Production Insight
Silent NaN propagation is one of the most insidious bugs in data pipelines.
A single missing value can turn an entire column of statistics to NaN, which often goes unnoticed for days.
Always use np.isnan().any() in assertions before feeding aggregated results to downstream systems.
Key Takeaway
Default aggregations propagate NaN silently.
Always use nan-aware functions (np.nanmean, np.nansum, np.nanstd) when data may contain missing values.
Assert no NaN after aggregation: assert not np.any(np.isnan(result))
● Production incidentPOST-MORTEMseverity: high

The Silent NaN: How Aggregation Broke Our ML Pipeline

Symptom
Model produced NaN predictions after a data refresh. Input data had occasional missing sensor readings, but np.mean returned NaN for the entire column, which propagated through the feature pipeline.
Assumption
The team assumed that np.mean would simply skip missing values (like Excel's AVERAGE does). They also assumed that a 'few missing values' wouldn't affect the final model.
Root cause
np.mean returns NaN if any of the input elements is NaN. The team used np.mean instead of np.nanmean, and the downstream scaler and model had no NaN handling — so NaN propagated to predictions.
Fix
Replaced np.mean with np.nanmean in the feature processing code. Added a pre-aggregation check: if np.any(np.isnan(data)), log a warning and drop the row or impute. Also added a post-aggregation assertion: assert not np.isnan(result).any().
Key lesson
  • Never assume standard aggregation functions handle missing data gracefully.
  • Always use nan-aware functions when your data pipeline may contain missing values.
  • Add defensive checks after aggregation to catch NaN propagation early.
  • Review the function's documentation — np.mean vs np.nanmean, np.std vs np.nanstd.
Production debug guideCommon failure modes when using ufuncs and aggregations5 entries
Symptom · 01
Aggregation returns NaN or Inf unexpectedly
Fix
Check for NaN values in input: np.isnan(data).any(). If true, use nan-aware functions (np.nanmean, np.nansum, etc.) or clean the data. Also check for Inf with np.isinf().
Symptom · 02
BroadcastingError: operands could not be broadcast together with shapes (a,b) (c,d)
Fix
Print both array shapes: print(a.shape, b.shape). Ensure dimensions are compatible: they must be equal or one must be 1, aligning from the right. Reshape using .reshape() or np.newaxis.
Symptom · 03
MemoryError when using np.power or np.outer on large arrays
Fix
Check intermediate array sizes. For power, use out parameter to write to pre-allocated array. For outer, consider chunking or use np.multiply.outer with careful memory monitoring.
Symptom · 04
Unexpected result: aggregation returns a scalar instead of an array
Fix
Check if you forgot the axis parameter. By default, axis=None reduces all elements. Specify axis=0 or axis=1 for column/row operations. Use keepdims=True to preserve dimensions.
Symptom · 05
np.std result differs from expected sample standard deviation
Fix
np.std defaults to population std (ddof=0). For sample std, pass ddof=1. Compare with np.std(data, ddof=1) and confirm using a small test set.
★ Quick Debug Cheatsheet: NumPy Math FunctionsImmediate commands to diagnose and fix common issues in production
NaN in aggregation result
Immediate action
Run np.isnan(data).any() to detect NaNs in input
Commands
np.isnan(data).sum()
np.nanmean(data) if using mean
Fix now
Replace np.mean with np.nanmean or clean data with np.nan_to_num(data)
Broadcasting error+
Immediate action
Print shapes with print(a.shape, b.shape)
Commands
a.shape, b.shape
np.broadcast_shapes(a.shape, b.shape) (NumPy >= 1.20)
Fix now
Reshape one array: b = b.reshape(-1, 1) or use np.newaxis
Memory blow-up (OOM)+
Immediate action
Stop the process and estimate array sizes
Commands
sys.getsizeof(a) / (1024**3) # GB
a.nbytes / (1024**3) # more accurate
Fix now
Use out= parameter to write into pre-allocated array, or use np.linalg.norm instead of outer for distance matrices
Wrong reduction dimension+
Immediate action
Check axis value and output shape
Commands
result.shape
np.sum(a, axis=0, keepdims=True).shape
Fix now
Add keepdims=True and verify shape before downstream logic

Key takeaways

1
Ufuncs operate element-wise and support broadcasting
always prefer them over Python loops.
2
Aggregation functions accept an axis parameter
axis=0 collapses rows, axis=1 collapses columns.
3
np.std() uses ddof=0 (population) by default. Use ddof=1 for sample standard deviation.
4
np.argmax() and np.argmin() return indices, not values.
5
np.cumsum() and np.cumprod() give running totals without reducing the array size.
6
Broadcasting avoids memory copies but can silently explode when shapes are non-optimal.
7
Always use nan-aware functions (np.nanmean, np.nansum) when data may have missing values.
8
Ufunc methods (reduce, accumulate, outer) are efficient for specialised operations but watch for overflow and memory.

Common mistakes to avoid

4 patterns
×

Using Python loops instead of ufuncs

Symptom
Code runs extremely slowly (minutes vs milliseconds) on large arrays. CPU usage is low because Python overhead dominates.
Fix
Replace the loop with a vectorised operation (e.g., np.sum, np.sqrt, or custom ufunc using np.vectorize as last resort).
×

Forgetting the axis parameter in aggregations

Symptom
You expected an array of per-column sums but got a single scalar. Downstream code expecting a 1D array breaks with shape mismatch.
Fix
Always specify axis=0 for column-wise, axis=1 for row-wise. Use keepdims=True to retain dimensions.
×

Using np.std without ddof=1 for sample data

Symptom
Standard deviation is slightly smaller than expected (population vs sample). In hypothesis testing or anomaly detection, thresholds are off.
Fix
Use np.std(data, ddof=1) for sample standard deviation. For variance, np.var(data, ddof=1).
×

Assuming aggregations ignore NaN values

Symptom
Result is NaN when input contains any NaN. Downstream processes fail or produce NaN predictions.
Fix
Use nan-aware functions: np.nanmean, np.nansum, np.nanstd. Or clean input with np.isnan check.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR
What does the axis parameter mean in NumPy aggregation functions?
Q02SENIOR
What is the difference between np.std() with ddof=0 and ddof=1?
Q03SENIOR
How does broadcasting work in NumPy?
Q04SENIOR
Why are vectorised operations faster than Python loops in NumPy?
Q01 of 04JUNIOR

What does the axis parameter mean in NumPy aggregation functions?

ANSWER
The axis parameter specifies the dimension along which to perform the aggregation. For a 2D array: axis=0 operates along the rows (collapsing rows, giving per-column results), axis=1 operates along the columns (collapsing columns, giving per-row results). axis=None (default) flattens the array and aggregates everything to a scalar. The shape of the result depends on which axes are removed. Using keepdims=True preserves the reduced axes as size-1 dimensions, which helps maintain shape compatibility.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
What is the difference between np.sum(a) and a.sum()?
02
How do I compute a weighted average in NumPy?
03
How do I compute the running sum or cumulative product?
04
What is the difference between np.add.reduce and np.sum?
🔥

That's Python Libraries. Mark it forged?

3 min read · try the examples if you haven't

Previous
NumPy Shape Manipulation — reshape, flatten, ravel, transpose
28 / 51 · Python Libraries
Next
NumPy Random Module — Generating and Controlling Random Data