NumPy where, select and piecewise — Conditional Array Operations
- np.where(condition, x, y) returns x where condition is True, y elsewhere; vectorised ternary
- np.select([cond1, cond2], [val1, val2], default) maps multiple conditions to values; the first True condition wins
- np.piecewise(x, [cond1, cond2], [func1, func2]) applies different functions per interval
- All three operate element-wise; np.where and np.select return the broadcast shape of their inputs, np.piecewise the shape of x
- Performance: on 10M elements, np.where (~50 ms) is roughly 50× faster than a list comprehension (~2.5 s)
- Gotcha: np.where with a single argument returns a tuple of index arrays, not a mask
Array operations often need conditional logic—clip outliers, assign grades, replace missing values. Most tutorials stop after showing np.where with a single condition. But production code frequently has multiple conditions, overlapping ranges, or per-interval functions. That's where np.select and np.piecewise earn their place. This article covers all three, the failure modes each solves, and the one rule that prevents most debugging pain: match the function to the shape of your decision logic.
np.where — Single Condition, Two Outcomes
np.where(condition, x, y) is the vectorised ternary operator for arrays. It evaluates condition element-wise, returns x[i] where condition[i] is True, y[i] otherwise. The single-argument form np.where(condition) returns a tuple of index arrays where condition is True, equivalent to np.nonzero(condition).
Common use cases: clipping values, replacing NaNs, assigning binary labels. The output dtype is inferred from x and y—if one is integer and the other float, the result is float.
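One subtlety: scalar x and y are broadcast to the condition's shape. Array arguments must be broadcastable with the condition and with each other. Incompatible shapes raise a ValueError, while shapes that do broadcast can silently produce a bigger result than you intended (a (3, 1) column against a (3,) condition gives a (3, 3) array).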
```python
import numpy as np

# Binary classification based on a threshold
scores = np.array([55, 72, 88, 45, 91, 60])
grade = np.where(scores >= 70, 'pass', 'fail')
print(grade)  # ['fail' 'pass' 'pass' 'fail' 'pass' 'fail']

# Clip negative values to 0.0
data = np.array([-2.0, 3.0, -1.0, 5.0])
positive_only = np.where(data > 0, data, 0.0)
print(positive_only)  # [0. 3. 0. 5.]

# Single-argument form: find indices where the condition is True
indices = np.where(scores < 60)
print(indices)  # (array([0, 3]),)

# Use the indices to modify the original array in place
scores[indices] = 0
print(scores)  # [ 0 72 88  0 91 60]
```
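The dtype and broadcasting rules for np.where can be checked directly on a tiny array. This sketch uses an arbitrary three-element condition to show int/float promotion, scalar broadcasting, an unintended (3, 3) result from a column vector, and the ValueError for shapes that cannot broadcast at all.

```python
import numpy as np

cond = np.array([True, False, True])

# Python int and float outcomes are promoted to a float result
mixed = np.where(cond, 1, 0.5)
print(mixed.dtype)  # float64

# Scalar outcomes broadcast to the condition's shape
signs = np.where(cond, 10, -10)
print(signs)  # [ 10 -10  10]

# A (3, 1) column broadcasts against the (3,) condition: the result is (3, 3)
x_col = np.array([[1], [2], [3]])
surprise = np.where(cond, x_col, 0)
print(surprise.shape)  # (3, 3)

# Shapes that cannot broadcast raise instead of returning anything
broadcast_failed = False
try:
    np.where(cond, np.arange(2), 0)
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```

The (3, 3) case is the dangerous one: no error is raised, so checking result.shape after an np.where with array arguments is a cheap safeguard.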
np.select — Multiple Exclusive Conditions
np.select evaluates a list of conditions in order and returns the corresponding choice for the first True condition encountered per element. If no condition is True, the default value is returned.
- Conditions are checked in order and the first True wins, like an if-elif chain
- condlist and choicelist must have the same length; every condition must be a boolean array, and conditions, choices and default must broadcast to a common shape (scalars are fine)
- default can be any scalar or array, subject to the same broadcasting rules
- Every condition array is computed over the whole input before np.select runs; the first-match logic is then applied per element
Real-world uses: categorising continuous values (temperature → description), mapping error codes to severity levels, applying business rules to transaction amounts.
```python
import numpy as np

# Categorise temperature into four ranges
temp = np.array([-5.0, 8.0, 18.0, 26.0, 35.0])
conditions = [
    temp < 0,
    (temp >= 0) & (temp < 15),
    (temp >= 15) & (temp < 28),
    temp >= 28,
]
choices = ['freezing', 'cold', 'comfortable', 'hot']
result = np.select(conditions, choices, default='unknown')
print(result)  # ['freezing' 'cold' 'comfortable' 'comfortable' 'hot']

# With overlapping conditions, the first True wins
overlap_conditions = [temp < 10, temp < 20]  # the second condition is broader but comes later
overlap_choices = ['low', 'medium']
result2 = np.select(overlap_conditions, overlap_choices, default='high')
print(result2)  # ['low' 'low' 'medium' 'high' 'high']
```
- Order matters: put the most specific condition first
- default is the else clause
- All conditions are evaluated in full (vectorised), but only the first True per element is used
- Cost grows with the number of conditions, since each one is computed over the whole array and then applied to the output
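As a sketch of the if-elif analogy, the following maps a few HTTP-style status codes to severity labels and checks the result against a plain Python if-elif function. The codes and labels are illustrative assumptions, not a standard mapping; the last lines show that a non-boolean condition array is rejected.

```python
import numpy as np

# Illustrative status codes and severity labels (hypothetical mapping)
codes = np.array([200, 404, 500, 302, 503, 418])

# Ordered like an if-elif chain: the most specific (highest threshold) test comes first
conditions = [codes >= 500, codes >= 400, codes >= 300]
choices = ["critical", "warning", "info"]
severity = np.select(conditions, choices, default="ok")
print(severity)  # ['ok' 'warning' 'critical' 'info' 'critical' 'warning']


def severity_loop(code):
    """Scalar if-elif reference implementation with the same ordering."""
    if code >= 500:
        return "critical"
    elif code >= 400:
        return "warning"
    elif code >= 300:
        return "info"
    return "ok"


print(severity.tolist() == [severity_loop(c) for c in codes])  # True

# Conditions must be boolean arrays: an integer array is rejected
try:
    np.select([codes % 2], ["odd"], default="even")
except TypeError as exc:
    print("TypeError:", exc)
```

Writing the thresholds as bare lower bounds only works because of the ordering; reversing the list would label every code of 300 or above as "info".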
np.piecewise — Function per Interval
np.piecewise applies different functions to different regions of an array. Unlike np.select, which returns values directly, piecewise calls a function on just the elements that fall into each interval. This is useful when the outcome depends on a mathematical transformation specific to each range.
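Signature: np.piecewise(x, condlist, funclist, *args, **kw)
- condlist: list of boolean arrays or boolean scalars, one per condition
- funclist: list of callables or values. Each callable receives the 1-D array of elements where its condition is True and should return an array of that length or a scalar. A non-callable value is treated as a constant function returning that value.
- If funclist has one more entry than condlist, the extra entry is the default, applied wherever no condition is True. Without it, unmatched elements keep the output's initial value of zero in x's dtype (0 for numeric, False for bool).
- Any extra *args and **kw are passed on to every callable.
- The output has the same shape and dtype as x. Entries are applied in order, so where conditions overlap, later ones overwrite earlier ones.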
The function is applied only to the subset of elements where the condition is True—this can reduce unnecessary computation.
Common use: piecewise linear transformations, clamping functions, adaptive masking.
```python
import numpy as np

# Hard clamp: -1 below -1, identity between -1 and 1, 1 above 1
# (the same result as np.clip(x, -1, 1))
x = np.linspace(-3, 3, 7)
result = np.piecewise(
    x,
    [x < -1, (x >= -1) & (x <= 1), x > 1],
    [lambda v: -1, lambda v: v, lambda v: 1],
)
print(x)       # [-3. -2. -1.  0.  1.  2.  3.]
print(result)  # [-1. -1. -1.  0.  1.  1.  1.]

# Using constant values (non-callable) in funclist:
# 0 for negative values, the original value otherwise
result2 = np.piecewise(x, [x < 0, x >= 0], [0, lambda v: v])
print(result2)  # [0. 0. 0. 0. 1. 2. 3.]
```
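The funclist rules are easy to check on a small array. This sketch (input values chosen arbitrarily) shows the extra default entry in funclist, later conditions overwriting earlier ones, a keyword argument being forwarded to the callables, and integer input truncating fractional results because the output takes x's dtype.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.5, 2.0])

# funclist has one more entry than condlist: the extra entry is the default,
# applied wherever no condition is True (here -1 <= x <= 1)
res = np.piecewise(x, [x < -1, x > 1], [-1.0, 1.0, lambda v: v * 10])
print(res)  # [-1. -5.  5.  1.]

# Overlapping conditions: entries are applied in order, so later ones overwrite
over = np.piecewise(x, [x > 0, x > 1], [1.0, 2.0])
print(over)  # [0. 0. 1. 2.]

# Extra keyword arguments are forwarded to every callable
scaled = np.piecewise(x, [x < 0], [lambda v, k: v / k, lambda v, k: v * k], k=2.0)

# The output takes x's dtype, so integer input truncates fractional results
xi = np.array([1, 2, 3])
truncated = np.piecewise(xi, [xi > 1], [lambda v: v / 2])
print(truncated)  # [0 1 1]
```

If fractional results matter, convert the input first, for example with xi.astype(float).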
Performance Comparison: Vectorised vs Loop
The primary value of conditional array functions is that they are vectorised—they operate on the entire array at once using compiled C code. A Python loop over elements with if-else runs at Python speed, often 10–100× slower.
But not all vectorised functions are equal. np.where creates intermediate boolean arrays. np.select evaluates all conditions. np.piecewise calls Python callables per condition group, which adds overhead. Rough timings for 10M float64 elements (they vary with hardware and NumPy version):
- np.where: ~50 ms
- np.select (5 conditions): ~120 ms
- np.piecewise (3 intervals): ~200 ms
- List comprehension with if-elif-else: ~2.5 s
The gap widens with more conditions: np.select adds ~20 ms per condition; nested np.where adds ~40 ms per nesting level due to repeated allocations.
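To see that nested np.where and np.select express the same rules, this sketch applies the clamp-style logic from the benchmark below to a small random array (seed and size are arbitrary) in both forms and compares the results. It checks equivalence only, not speed.

```python
import numpy as np

rng = np.random.default_rng(42)
arr = rng.uniform(-10, 10, 1_000)

# Nested np.where: each level builds another full-size intermediate array
nested = np.where(arr < -5, -5.0,
                  np.where(arr < 0, 0.0,
                           np.where(arr < 5, arr, 5.0)))

# np.select: the same rules as one flat, ordered list; first True wins,
# so each condition only needs its upper bound
flat = np.select([arr < -5, arr < 0, arr < 5], [-5.0, 0.0, arr], default=5.0)

print(np.array_equal(nested, flat))  # True
```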
Memory-wise, np.select needs one boolean array per condition plus the output array. For 10M elements, each boolean array takes 10 MB (1 byte per element), so 5 conditions hold about 50 MB, plus 80 MB for a float64 output. Compound conditions such as (a >= 0) & (a < 5) also create short-lived intermediate masks. np.where(arr > 0, arr, 0.0) allocates only the boolean mask and the output array.
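The memory arithmetic can be confirmed with .nbytes. This sketch assumes the same 10M-element float64 array and four range conditions as the benchmark; note that it really allocates roughly 130 MB while it runs.

```python
import numpy as np

n = 10_000_000
arr = np.random.default_rng(0).uniform(-10, 10, n)  # float64 by default

# One boolean mask: 1 byte per element
mask = arr > 0
print(f"boolean mask: {mask.nbytes / 1e6:.0f} MB")  # boolean mask: 10 MB

# Four range conditions; each & also creates two short-lived temporaries
conditions = [arr < -5, (arr >= -5) & (arr < 0), (arr >= 0) & (arr < 5), arr >= 5]
total = sum(c.nbytes for c in conditions)
print(f"4 conditions: {total / 1e6:.0f} MB")  # 4 conditions: 40 MB

# The float64 output array dominates
print(f"float64 output: {arr.nbytes / 1e6:.0f} MB")  # float64 output: 80 MB
```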
```python
import time

import numpy as np

n = 10_000_000
arr = np.random.default_rng(0).uniform(-10, 10, n)

# np.where (single condition, two outcomes); the mask is built inside the timed region
start = time.perf_counter()
result = np.where(arr > 0, arr, 0.0)
print(f"np.where: {time.perf_counter() - start:.3f}s")

# np.select: build the conditions inside the timed region too, for a fair comparison
start = time.perf_counter()
conditions = [arr < -5, (arr >= -5) & (arr < 0), (arr >= 0) & (arr < 5), arr >= 5]
choices = [-5, 0, arr, 5]
result = np.select(conditions, choices, default=0.0)
print(f"np.select: {time.perf_counter() - start:.3f}s")

# List comprehension with the same if-elif logic
start = time.perf_counter()
result = [
    -5 if v < -5 else (0 if v < 0 else (v if v < 5 else 5))
    for v in arr
]
print(f"Loop: {time.perf_counter() - start:.3f}s")
```
Common Pitfalls and How to Avoid Them
Even experienced NumPy users trip on these:
- Single-argument form: calling np.where(cond) when you intended np.where(cond, x, y). The single-argument form returns a tuple of index arrays, not an array of values. Use it only when you explicitly need indices.
- Dtype mismatches: np.where and np.select infer the output dtype from x and y, or from choices and default. Mixing strings and numbers does not give a clean result: NumPy either converts the number to a string (0 becomes '0') or, depending on the version, refuses the promotion with an error. Keep types consistent.
- Overlapping conditions in np.select: the first True wins. If two conditions overlap unintentionally, you'll get unexpected results. Check that conditions are mutually exclusive if that's the intent.
- np.piecewise function signature: each callable receives the 1-D array of elements where its condition is True (a copy, not a slice of the original), so it must take that array as its first argument. Write lambda x: x + 1, not lambda: x + 1.
- Broadcasting errors: condition, x and y in np.where must broadcast to a common shape. Scalars are always fine; arrays with incompatible shapes raise a ValueError.
- Default handling in np.select: If default is not provided, it defaults to 0, which may not be meaningful. Always specify an explicit default.
```python
import numpy as np

# Pitfall 1: single-argument form instead of three-argument form
arr = np.array([1, -2, 3])
# Wrong: indices = np.where(arr > 0)  -> (array([0, 2]),), indices rather than values
# Correct:
positives = np.where(arr > 0, arr, 0)
print(positives)  # [1 0 3]

# Pitfall 2: mixing a string and a number
scores = np.array([55, 72])
# Wrong: np.where(scores > 60, 'pass', 0)  -> 0 becomes the string '0', or an error on some versions
# Correct: keep both outcomes the same type
result = np.where(scores > 60, 'pass', 'fail')
print(result)  # ['fail' 'pass']

# Pitfall 3: overlapping conditions in np.select
temp = np.array([20])
# temp >= 10 and temp >= 18 are both True for 20; the first one wins
conds = [temp >= 10, temp >= 18]
choices = ['mild', 'warm']
print(np.select(conds, choices, default='cool'))  # ['mild'], never reaches 'warm'

# Fix: put the narrower condition first, and reorder the choices to match
conds_fixed = [temp >= 18, temp >= 10]
choices_fixed = ['warm', 'mild']
print(np.select(conds_fixed, choices_fixed, default='cool'))  # ['warm']
```
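On a 2-D array the single-argument form returns one index array per axis, exactly what np.nonzero returns. This sketch uses an arbitrary 2×2 grid and assumes you want either the per-axis arrays (to index the original) or (row, col) pairs, which np.argwhere provides.

```python
import numpy as np

grid = np.array([[3, -1],
                 [-2, 5]])

# One index array per axis, identical to np.nonzero
rows, cols = np.where(grid < 0)
print(rows, cols)  # [0 1] [1 0]
same = all(np.array_equal(a, b) for a, b in zip(np.where(grid < 0), np.nonzero(grid < 0)))
print(same)  # True

# The tuple indexes the original array directly
print(grid[rows, cols])  # [-1 -2]

# np.argwhere stacks the same indices as (row, col) pairs
pairs = np.argwhere(grid < 0)
print(pairs.tolist())  # [[0, 1], [1, 0]]
```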
| Feature | np.where | np.select | np.piecewise |
|---|---|---|---|
| Number of outcomes | 2 | Unlimited | Unlimited |
| Outcome type | Value or array | Value or array | Function (callable) or value |
| Conditions evaluated | Single | All (first True wins) | All (later conditions overwrite earlier ones) |
| Default fallback | Implicit (y) | Explicit default param (0 if omitted) | Extra trailing funclist entry, else 0 |
| Memory usage | Low (mask + output) | High (1 boolean array per condition) | Moderate (subset copy per condition) |
| Speed (10M elements) | ~50 ms | ~120 ms (5 conds) | ~200 ms (3 intervals) |
| Readability as conditions grow | Degrades (nested) | Good (list forms) | Good (list forms) |
🎯 Key Takeaways
- np.where(condition, x, y) is a vectorised ternary operator — no loop needed.
- np.select evaluates conditions in order — the first True condition wins.
- np.piecewise is useful when different mathematical functions apply to different intervals.
- np.where with a single argument returns a tuple of index arrays — equivalent to np.nonzero.
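- All three functions operate element-wise: np.where and np.select return arrays with the broadcast shape of their inputs, and np.piecewise returns an array with the same shape and dtype as x.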
- For more than two outcomes, prefer np.select over nested np.where for both performance and readability.
- np.piecewise is 10–50× faster than a Python loop but 2–4× slower than np.clip for simple operations.
Interview Questions on This Topic
- Q: How would you replace all negative values in a NumPy array with zero without a loop? (Junior)
- Q: When would you use np.select instead of nested np.where calls? (Mid-level)
- Q: Explain the difference between np.where and np.piecewise when both can handle multiple conditions. (Senior)
Frequently Asked Questions
What is the difference between np.where and np.select?
np.where handles a single condition with two outcomes (x if True, y if False). np.select handles multiple conditions, checked in order with the first True winning, each with a corresponding value, plus a default for when none match. For complex logic, np.select is cleaner than nesting multiple np.where calls.
Can np.where return strings?
Yes. The output dtype is inferred from x and y. If both are strings, the result is a string array. np.where(arr > 0, 'positive', 'non-positive') works as expected.
Does np.piecewise evaluate all functions for all elements?
No. Each function in funclist is called only with the array elements that satisfy the corresponding condition. This means expensive functions are only applied where needed. However, the conditions themselves are evaluated for all elements.
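One practical consequence: a function that is undefined for some inputs never sees them. In this sketch the callable checked_sqrt is a hypothetical helper that records how many elements it receives. np.piecewise passes it only the non-negative values, while the np.where version computes np.sqrt on every element first and emits a RuntimeWarning for the negative ones.

```python
import warnings

import numpy as np

calls = []


def checked_sqrt(v):
    # Record how many elements this call receives, then take the square root
    calls.append(v.size)
    return np.sqrt(v)


x = np.array([-4.0, 9.0, -1.0, 16.0])

# Any warning would be raised as an error inside this block
with warnings.catch_warnings():
    warnings.simplefilter("error")
    out = np.piecewise(x, [x >= 0], [checked_sqrt, 0.0])

print(out)    # [0. 3. 0. 4.]
print(calls)  # [2]: only the two non-negative elements were passed in

# np.where evaluates np.sqrt(x) on every element before choosing
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    np.where(x >= 0, np.sqrt(x), 0.0)
where_warned = any(issubclass(w.category, RuntimeWarning) for w in caught)
print(where_warned)  # True
```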
What happens if no condition matches in np.select?
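The default value is returned for that element. If no default is provided, it is 0. The default also takes part in deciding the output dtype, so a numeric 0 alongside string choices can give a surprising result or, depending on the NumPy version, a TypeError. Always specify an explicit default to avoid silent bugs.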
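A small sketch of the silent-zero problem, using arbitrary numeric thresholds: without a default the unmatched element becomes 0, which looks like a legitimate value, while a NaN default makes it stand out. Choosing NaN as the sentinel is an assumption that only works for float output.

```python
import numpy as np

x = np.array([1, 5, 12])

# No default: the unmatched element (12) silently becomes 0
implicit = np.select([x < 3, x < 10], [10, 20])
print(implicit)  # [10 20  0]

# Explicit default: a NaN sentinel makes unmatched elements obvious
explicit = np.select([x < 3, x < 10], [10.0, 20.0], default=np.nan)
print(explicit)  # [10. 20. nan]
print(np.isnan(explicit).sum())  # 1
```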
Can I use np.where to modify the original array in-place?
Not directly with the three-argument form—it returns a new array. For in-place modification, use boolean indexing: arr[condition] = new_value. This avoids allocating a new array.
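A minimal sketch of both in-place options, with arbitrary values. Boolean-mask assignment handles scalar replacements; np.copyto with where= copies array-valued replacements into an existing array. The buffer-address check simply confirms that no new array was created.

```python
import numpy as np

arr = np.array([-3.0, 1.5, -0.5, 4.0])
buffer_before = arr.ctypes.data  # address of the underlying data

# Boolean-mask assignment writes into the existing array
arr[arr < 0] = 0.0
print(arr.ctypes.data == buffer_before)  # True

# np.copyto with where= copies array-valued replacements in place
target = np.zeros(4)
vals = np.array([-1.0, 2.0, -3.0, 4.0])
np.copyto(target, vals, where=vals > 0)
print(target)  # [0. 2. 0. 4.]
```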
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.