Universal Functions (ufuncs)
Ufuncs apply a function to every element of an array. They support broadcasting, type casting, and output array specification.
import numpy as np
a = np.array([1.0, 4.0, 9.0, 16.0])
print(np.sqrt(a)) # [1. 2. 3. 4.]
print(np.log(a)) # [0. 1.386 2.197 2.773]
print(np.exp([0, 1, 2])) # [1. 2.718 7.389]
print(np.abs([-3, -1, 2])) # [3 1 2]
# Two-array ufuncs
b = np.array([2.0, 3.0, 4.0, 5.0])
print(np.add(a, b)) # same as a + b
print(np.maximum(a, b)) # element-wise max
print(np.power(b, 2)) # [4. 9. 16. 25.]
Output (abridged)
[1. 2. 3. 4.]
[3 1 2]
Production Insight
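Ufuncs promote mixed dtypes automatically (float32 + float64 yields float64), which can silently double the memory of a result you expected to stay float32.
Combined with broadcasting, the result is allocated at the full broadcast shape and the promoted dtype, and chained expressions create a temporary of that size at every step.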
Rule: always check the dtype of the output via .dtype before processing high-volume data.
Key Takeaway
Ufuncs are fast because they're written in C.
Broadcasting + dtype promotion can cause unexpected memory spikes.
If memory is tight, specify output dtype explicitly with the dtype parameter.
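The promotion behaviour described above can be checked directly. The sketch below is illustrative only: the array names are made up, and it assumes NumPy's default same_kind casting rule, which permits narrowing float64 to float32 when dtype or out is given.

```python
import numpy as np

x32 = np.ones(1_000, dtype=np.float32)
x64 = np.ones(1_000, dtype=np.float64)

# Mixed float32/float64 input is promoted to float64.
promoted = np.add(x32, x64)

# dtype= forces the calculation and the result to float32.
kept = np.add(x32, x64, dtype=np.float32)

# out= writes into a preallocated float32 buffer instead of allocating a new array.
buffer = np.empty_like(x32)
np.add(x32, x64, out=buffer)

print(promoted.dtype, promoted.nbytes)  # float64 8000
print(kept.dtype, kept.nbytes)          # float32 4000
```

Forcing a narrower dtype trades precision for memory, so it is worth confirming that float32 accuracy is acceptable before applying it across a pipeline.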
Aggregation Functions and the axis Parameter
axis=0 operates down rows (along columns). axis=1 operates across columns (along rows). axis=None reduces everything to a scalar.
import numpy as np
m = np.array([[1, 2, 3],
[4, 5, 6]])
print(m.sum()) # 21 — all elements
print(m.sum(axis=0)) # [5 7 9] — column sums
print(m.sum(axis=1)) # [6 15] — row sums
print(m.mean(axis=0)) # [2.5 3.5 4.5]
print(m.max(axis=1)) # [3 6]
print(m.argmax(axis=1)) # [2 2] (index of max in each row)
Production Insight
The most common production bug is passing axis=0 when you meant axis=1 (or vice versa).
Both calls succeed silently and both reduce a 2D array to 1D, only with different lengths (shape (3,) versus (2,) for the 2x3 array above), which can break downstream code expecting a particular shape.
Use keepdims=True to preserve dimensionality and make the intent explicit.
Key Takeaway
axis=0 collapses rows (down), axis=1 collapses columns (across).
keepdims=True keeps the reduced dimension as a size-1 axis.
Always test the output shape before passing to functions that expect a specific number of dimensions.
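A short sketch of how keepdims changes the result shape, using row normalisation as an assumed use case (the variable names are illustrative):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=float)

row_sums = m.sum(axis=1)                      # shape (2,)
row_sums_kept = m.sum(axis=1, keepdims=True)  # shape (2, 1)

# (2, 3) / (2, 1) broadcasts row by row.
# (2, 3) / (2,) would raise a broadcasting ValueError instead.
row_fractions = m / row_sums_kept

print(row_sums.shape, row_sums_kept.shape)  # (2,) (2, 1)
print(row_fractions.sum(axis=1))            # each row now sums to 1 (up to rounding)
```

Because the reduced axis survives as size 1, the result broadcasts back against the original array without any manual reshape.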
Statistical Functions
NumPy covers standard statistics. For variance and standard deviation, note the ddof parameter — ddof=0 is population (default), ddof=1 is sample.
import numpy as np
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print(np.mean(data)) # 5.0
print(np.median(data)) # 4.5
print(np.std(data)) # 2.0 (population)
print(np.std(data, ddof=1)) # 2.138 (sample)
print(np.var(data)) # 4.0
print(np.percentile(data, 75)) # 5.5
print(np.corrcoef([1,2,3], [1,2,3])) # correlation matrix
Production Insight
Default ddof=0 gives population statistics, but most real-world uses require sample statistics (ddof=1).
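Using the wrong ddof scales the standard deviation by a factor of sqrt(n / (n - 1)): about 7% for the 8-point example above (2.0 vs 2.138), enough to move any ML threshold defined in multiples of std.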
Also, np.std is not robust to outliers — consider using np.median and IQR for production anomaly detection.
Key Takeaway
np.std() defaults to population std (ddof=0).
For sample data, always pass ddof=1.
For robust statistics in production, use median and IQR instead of mean and std.
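One way the median/IQR suggestion might look in practice. This is a sketch, not a prescribed method: the 1.5 x IQR fence is a common convention assumed here, and the sample data (the earlier array plus one injected outlier) is made up for illustration.

```python
import numpy as np

# The earlier sample with one extreme value appended.
data = np.array([2, 4, 4, 4, 5, 5, 7, 9, 100])

# Mean and std are dragged upward by the single outlier.
print(np.mean(data), np.std(data, ddof=1))

# Median and IQR barely move.
median = np.median(data)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Assumed convention: flag points more than 1.5 * IQR outside the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print(median, iqr)  # 5.0 3.0
print(outliers)     # [100]
```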
Broadcasting: How ufuncs Handle Different Shapes
Broadcasting allows ufuncs to operate on arrays of different shapes by automatically expanding dimensions. The arrays must be compatible: they align from the rightmost dimension, and each dimension must be equal or one of them must be 1.
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6]]) # shape (2,3)
b = np.array([10, 20, 30]) # shape (3,) -> broadcasts to (1,3) then (2,3)
print(a + b) # element-wise addition
# [[11 22 33]
# [14 25 36]]
# Scalar broadcasting works too
print(a * 2) # multiplies every element by 2
Output
[[11 22 33]
 [14 25 36]]
[[ 2  4  6]
 [ 8 10 12]]
Production Insight
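Broadcasting is memory-efficient on the input side: the smaller operand is stretched virtually, without copying its data. The result, however, is a real array at the full broadcast shape.
An accidental large broadcast (e.g., shape (1, 1_000_000) + (1_000_000, 1) -> (1_000_000, 1_000_000)) asks for 10^12 elements, roughly 8 TB at float64, which is far more than a typical machine's RAM.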
The broadcast error message 'operands could not be broadcast together with shapes' is a frequent debugging stop.
Key Takeaway
Broadcasting aligns dimensions from the right.
It's fast and memory-efficient for small expansions.
Watch out for accidental large broadcasts — they can OOM your process in seconds.
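A possible guard against accidental large broadcasts is to compute the result shape before doing the arithmetic. The helper name broadcast_nbytes below is hypothetical; np.broadcast_shapes is available in NumPy 1.20 and later.

```python
import math
import numpy as np

def broadcast_nbytes(*shapes, dtype=np.float64):
    """Return (result_shape, bytes needed) without allocating anything.

    Hypothetical helper for illustration. np.broadcast_shapes raises
    ValueError for incompatible shapes.
    """
    out_shape = np.broadcast_shapes(*shapes)
    n_elements = math.prod(out_shape)  # Python ints, so no overflow
    return out_shape, n_elements * np.dtype(dtype).itemsize

shape, nbytes = broadcast_nbytes((1, 1_000_000), (1_000_000, 1))
print(shape, f"{nbytes / 1e12:.0f} TB")  # (1000000, 1000000) 8 TB

# Incompatible shapes fail fast, with the usual broadcasting error.
try:
    broadcast_nbytes((2, 3), (2,))
except ValueError as exc:
    print("incompatible:", exc)
```

Comparing nbytes against a memory budget before running the operation turns a potential out-of-memory crash into an ordinary, catchable check.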
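Vectorisation vs Python Loops
Vectorised operations in NumPy run in compiled C and are typically orders of magnitude faster than Python loops, because they avoid per-element interpreter overhead. The gap grows with array size: in the sample run below on a 10-million-element array, the vectorised sum took about 5 ms while the Python loop took over a minute. Exact timings depend on hardware and NumPy version.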
import numpy as np
import time
arr = np.random.randn(10_000_000)
# Vectorised: a single call into compiled code
start = time.perf_counter()
result = np.sum(arr)
print(f"Vectorised: {time.perf_counter() - start:.5f}s")
# Python loop: one interpreter iteration per element
start = time.perf_counter()
total = 0.0
for x in arr:
    total += x
print(f"Loop: {time.perf_counter() - start:.5f}s")
Output
Vectorised: 0.005s
Loop: 62.3s
Production Insight
Writing Python loops over NumPy arrays is often a code smell in code reviews.
The performance penalty becomes critical in data pipelines processing millions of rows per second.
Treat np.vectorize and np.frompyfunc as a last resort: both still call the Python function once per element, so they offer loop-free syntax rather than native ufunc speed.
Key Takeaway
Vectorised operations are typically orders of magnitude faster than Python loops.
If you see a for loop over a NumPy array in production code, it's a red flag.
For custom element-wise logic, use np.vectorize only when you cannot express the function with existing ufuncs; it is a convenience wrapper, not a performance tool.
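Many element-wise rules with a branch can be rewritten with existing ufuncs and np.where. The sketch below uses a made-up rule (relu_squared is a hypothetical function) to show that the two approaches agree, while only the second stays in compiled code.

```python
import numpy as np

def relu_squared(x):
    """Hypothetical scalar rule: square positive values, zero out the rest."""
    return x * x if x > 0 else 0.0

arr = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# np.vectorize calls relu_squared once per element (a Python-level loop).
slow = np.vectorize(relu_squared)(arr)

# np.where evaluates the condition and both branches as whole-array operations.
fast = np.where(arr > 0, arr * arr, 0.0)

print(fast)  # [0.   0.   0.   2.25 9.  ]
```

Note that np.where computes both branches for every element, so this rewrite suits cheap expressions; a branch that is expensive or invalid for some inputs may need masking instead.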
Advanced ufunc Methods: reduce, accumulate, outer
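Ufuncs provide methods beyond direct element-wise application. reduce collapses an axis to a single value by applying the operation repeatedly, accumulate does the same but returns every intermediate result, and outer applies the operation to every pair of elements from two arrays (for np.multiply, the outer product).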
import numpy as np
a = np.array([1, 2, 3, 4])
# reduce: cumulative sum produces single value
print(np.add.reduce(a)) # 10 = 1+2+3+4
# accumulate: all intermediate sums
print(np.add.accumulate(a)) # [1 3 6 10]
# outer: outer product
print(np.multiply.outer([1,2,3], [10,20,30]))
# [[10 20 30]
# [20 40 60]
# [30 60 90]]
Output
10
[ 1  3  6 10]
[[10 20 30]
 [20 40 60]
 [30 60 90]]
Production Insight
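np.sum is implemented on top of np.add.reduce, but the defaults differ: np.add.reduce reduces along axis=0, while np.sum reduces over all axes (axis=None). Calling np.add.reduce directly only saves a small wrapper overhead, noticeable mainly on tiny arrays.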
accumulate is useful for running totals in time-series pipelines but can overflow with int32 arrays.
outer operations can produce huge outputs (O(n^2) memory) — always check dimensions before using them in production.
Key Takeaway
Ufunc methods (reduce, accumulate, outer) are efficient tools for specialised operations.
Outer product memory scales quadratically — dangerous for large arrays.
Use accumulate for running totals instead of manual loops.
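The same three methods work on any binary ufunc, not only add and multiply. The sketch below, with illustrative data, shows the default axis of reduce on a 2D array, a running maximum via accumulate, and pairwise differences via outer.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# reduce works along axis=0 unless told otherwise.
col_totals = np.add.reduce(m)              # [5 7 9]
grand_total = np.add.reduce(m, axis=None)  # 21, same as np.sum(m)

# accumulate with maximum gives a running maximum.
running_max = np.maximum.accumulate(np.array([3, 1, 4, 1, 5]))  # [3 3 4 4 5]

# outer with subtract gives every pairwise difference; result shape is (3, 2).
pairwise_diff = np.subtract.outer(np.array([1, 2, 3]), np.array([10, 20]))

print(col_totals, grand_total)
print(running_max)
print(pairwise_diff.shape)  # (3, 2)
```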
Handling NaN in Statistical Aggregations
By default, NumPy aggregation functions (mean, sum, std) return NaN if any element is NaN. Use nan-aware versions: np.nansum, np.nanmean, np.nanstd. They ignore NaN values and compute on the remaining elements.
import numpy as np
data = np.array([1.0, 2.0, np.nan, 4.0])
print(np.mean(data)) # nan
print(np.nanmean(data)) # 2.333...
print(np.std(data)) # nan
print(np.nanstd(data)) # ~1.247
# Count of missing values (use .mean() instead of .sum() for the fraction)
print(np.isnan(data).sum()) # 1
Output
nan
2.3333333333333335
nan
1.247219128924647
1
Production Insight
Silent NaN propagation is one of the most insidious bugs in data pipelines.
A single missing value can turn an entire column of statistics to NaN, which often goes unnoticed for days.
Always check np.isnan(result).any() in an assertion before feeding aggregated results to downstream systems.
Key Takeaway
Default aggregations propagate NaN silently.
Always use nan-aware functions (np.nanmean, np.nansum, np.nanstd) when data may contain missing values.
Assert no NaN after aggregation: assert not np.any(np.isnan(result))
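A sketch of how the checks above might be combined into one guard. The function name guarded_mean and the 50% missing-data threshold are assumptions made for illustration, not NumPy conventions.

```python
import numpy as np

def guarded_mean(values, max_missing=0.5):
    """Return (nan-aware mean, fraction missing) or raise if too much is missing.

    Hypothetical helper; max_missing is an arbitrary example threshold.
    """
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values).mean()  # fraction of NaN entries
    if missing > max_missing:
        raise ValueError(f"{missing:.0%} of values are NaN")
    result = np.nanmean(values)
    # Final safety net before handing the number downstream.
    assert not np.isnan(result)
    return result, missing

mean, missing = guarded_mean([1.0, 2.0, np.nan, 4.0])
print(mean, missing)
```

Reporting the missing fraction alongside the statistic makes it visible when a "clean" mean was in fact computed from a small surviving subset of the data.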