NumPy Boolean Indexing and Fancy Indexing

⚡ Quick Answer
Boolean indexing filters an array using a True/False mask: arr[arr > 0]. Fancy indexing selects elements using an array of indices: arr[[0, 2, 4]]. Both return copies when used to read data. np.where(condition, x, y) picks values element-wise based on a condition; np.where(condition) on its own returns the indices where the condition is True.

Boolean Indexing — Filtering by Condition

Boolean indexing is the primary way to filter your data and overwrite selected values. Applying a comparison operator to an array produces a boolean 'mask' with the same shape as the array. When this mask is used as an index, NumPy retrieves only the elements whose mask value is True.

io/thecodeforge/numpy/boolean_filtering.py · PYTHON
import numpy as np

# Production log latency samples in ms
latencies = np.array([72, 85, 91, 60, 78, 95, 55, 88])

# Filter: High-latency samples (≥70ms)
critical_events = latencies[latencies >= 70]
print(f"Critical Latencies: {critical_events}")

# In-place modification: Cap outliers at 90ms
# Note: This modifies the original 'latencies' array
latencies[latencies > 90] = 90
print(f"Capped Latencies: {latencies}")

# Compound conditions: Latency between 70 and 85
# Logic: Bitwise & is required for element-wise comparison
mid_range = latencies[(latencies >= 70) & (latencies < 85)]
print(f"Mid-range events: {mid_range}")
▶ Output
Critical Latencies: [72 85 91 78 95 88]
Capped Latencies: [72 85 90 60 78 90 55 88]
Mid-range events: [72 78]
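
It can help to look at the mask as an object in its own right before using it as an index. The sketch below reuses the latency samples from the example above; the variable names mask, fast, and extremes are illustrative only.

```python
import numpy as np

latencies = np.array([72, 85, 91, 60, 78, 95, 55, 88])

# The comparison itself produces a boolean array of the same shape
mask = latencies >= 70
print(mask.dtype, mask.shape)   # bool (8,)

# True counts as 1, so sum() gives the number of matches
print(mask.sum())               # 6

# ~ inverts a mask; | combines two masks with element-wise OR
fast = latencies[~mask]
extremes = latencies[(latencies < 60) | (latencies > 90)]
print(fast)                     # [60 55]
print(extremes)                 # [91 95 55]
```

Storing the mask in a variable is a reasonable choice when the same condition is needed more than once, since the comparison is then evaluated only one time.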

np.where — Conditional Selection and Replacement

Think of np.where as a vectorized if-else statement. It allows you to transform an entire dataset based on a predicate without the performance penalty of a list comprehension.

io/thecodeforge/numpy/conditional_ops.py · PYTHON
import numpy as np

# Environmental sensor data
temps = np.array([18.5, 22.1, 35.4, 8.2, 30.0])

# Functional Syntax: np.where(condition, if_true, if_false)
# If temp > 30, normalize to 30.0, else keep original
normalized_temps = np.where(temps > 30, 30.0, temps)
print(f"Normalized: {normalized_temps}")

# Search Syntax: a single argument returns a tuple of index arrays, one per dimension
hot_indices = np.where(temps > 25)
print(f"Indices of hot zones: {hot_indices}")
print(f"Actual values: {temps[hot_indices]}")
▶ Output
Normalized: [18.5 22.1 30.   8.2 30. ]
Indices of hot zones: (array([2, 4]),)
Actual values: [35.4 30. ]
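
The three-argument form also broadcasts its arguments, which makes it a plausible fit for filling in missing values. The following is a minimal sketch, assuming a hypothetical 2D array named readings (rows are samples, columns are sensors) where NaN marks a missing entry.

```python
import numpy as np

# Hypothetical readings: rows are samples, columns are sensors
readings = np.array([[1.0, np.nan],
                     [3.0, 4.0],
                     [np.nan, 8.0]])

# Column means that skip NaN entries: [2.0, 6.0]
col_means = np.nanmean(readings, axis=0)

# col_means has shape (2,) and broadcasts across the rows of readings (shape (3, 2))
filled = np.where(np.isnan(readings), col_means, readings)
print(filled)
```

Each NaN is replaced by the mean of its own column, and every other value passes through unchanged.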

Fancy Indexing and Dimensional Meshing

Fancy indexing uses integer arrays to pull non-contiguous data. When working with 2D matrices, np.ix_ allows you to construct a 'mesh' to extract rectangular sub-sections with surgical precision.

io/thecodeforge/numpy/fancy_indexing.py · PYTHON
import numpy as np

# 4x4 matrix holding the values 0 through 15
matrix = np.arange(16).reshape(4, 4)

# Use np.ix_ to grab a cross-section sub-matrix
# Extracting rows 0 and 2, and columns 1 and 3
sub_matrix = matrix[np.ix_([0, 2], [1, 3])]
print(f"Sub-matrix extracted:\n{sub_matrix}")

# Advanced Pattern: Sorting a matrix by a specific feature (Column 0)
# argsort() is a staple for custom data ordering
raw_data = np.array([[3, 100], [1, 500], [2, 200]])
sort_indices = raw_data[:, 0].argsort()
sorted_data = raw_data[sort_indices]

print(f"Sorted by primary key:\n{sorted_data}")
▶ Output
Sub-matrix extracted:
[[ 1  3]
 [ 9 11]]
Sorted by primary key:
[[  1 500]
 [  2 200]
 [  3 100]]
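
To see why np.ix_ is needed for a rectangular block, compare it with passing the two index lists directly. This is a small sketch on the same 4x4 matrix; the names paired, row_idx, col_idx, and block are illustrative.

```python
import numpy as np

matrix = np.arange(16).reshape(4, 4)
rows, cols = [0, 2], [1, 3]

# Paired index lists select individual elements: (0, 1) and (2, 3)
paired = matrix[rows, cols]
print(paired)                         # [ 1 11]

# np.ix_ reshapes the lists to (2, 1) and (1, 2) so they broadcast to a 2x2 block
row_idx, col_idx = np.ix_(rows, cols)
print(row_idx.shape, col_idx.shape)   # (2, 1) (1, 2)

block = matrix[row_idx, col_idx]
print(block)
```

The paired form returns two elements, while the np.ix_ form returns every combination of the chosen rows and columns.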

🎯 Key Takeaways

  • Reading with a boolean mask always returns a copy. Assigning through a mask (arr[mask] = value) still modifies the original, because NumPy uses the mask to locate the positions to write to.
  • Use & and | for compound conditions, not 'and'/'or'. Wrap each condition in parentheses.
  • np.where(condition) returns indices; np.where(condition, x, y) returns values.
  • np.ix_ builds an open mesh for submatrix selection using fancy indexing.
  • argsort() returns the indices that would sort the array — useful for sorting by a column.

Interview Questions on This Topic

  • Q: Explain the difference between a 'View' and a 'Copy' in NumPy. Which one does Boolean Indexing produce?
  • Q: How does the bitwise '&' operator differ from the 'and' keyword when applied to two NumPy boolean masks?
  • Q: How would you use `np.where` to replace all NaN values in a dataset with the column's mean?
  • Q: Given a 2D array, how do you extract the diagonal elements using Fancy Indexing without using `np.diag`?
  • Q: Describe the performance trade-offs of using a boolean mask vs. a list of integer indices for filtering large arrays.
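
For the diagonal question in the list above, one possible approach is to pair each row index with the same column index. This is a sketch rather than a model answer, and it assumes a square array (here a hypothetical 4x4 array named grid).

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

# Pair each row index i with the same column index i
idx = np.arange(grid.shape[0])
diagonal = grid[idx, idx]
print(diagonal)          # [ 0  5 10 15]

# Reversing the column indices walks the anti-diagonal instead
anti_diagonal = grid[idx, idx[::-1]]
print(anti_diagonal)     # [ 3  6  9 12]
```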

Frequently Asked Questions

Why do I get 'ValueError: The truth value of an array is ambiguous'?

This usually happens when you use Python's built-in 'and' or 'or' keywords on NumPy arrays. Python tries to reduce each array to a single True or False, which is undefined for an array with more than one element. Use the bitwise operators & (AND) and | (OR) to force element-wise comparison. Pro tip: Always wrap conditions in parentheses like (arr > 0) & (arr < 5), because & binds more tightly than the comparison operators.
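
A short sketch that reproduces the error and then shows the element-wise form; the array values and variable names are made up for illustration.

```python
import numpy as np

arr = np.array([-2, 1, 3, 7])

# 'and' asks each operand for a single truth value, which NumPy refuses to guess
error_message = None
try:
    arr[(arr > 0) and (arr < 5)]
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# & compares the two masks element by element
in_range = arr[(arr > 0) & (arr < 5)]
print(in_range)          # [1 3]

# np.logical_and is an equivalent function form
same = arr[np.logical_and(arr > 0, arr < 5)]
print(same)              # [1 3]
```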

Can I use a boolean mask to set values in the original array?

Yes. arr[arr < 0] = 0 sets all negative values to zero in-place. NumPy uses the mask to locate the positions, then writes to those positions in the original array.
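
The difference between reading through a mask and assigning through one can be sketched as follows; the sample values and the names negatives and after_copy_edit are illustrative.

```python
import numpy as np

arr = np.array([4, -1, 7, -3])
mask = arr < 0

# Reading through a mask gives a copy, so edits to it never reach arr
negatives = arr[mask]
negatives[:] = 0
after_copy_edit = arr.copy()
print(after_copy_edit)   # [ 4 -1  7 -3]

# Assigning through the mask writes into arr itself
arr[mask] = 0
print(arr)               # [4 0 7 0]
```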

Does Fancy Indexing return a view or a copy?

Unlike basic slicing (e.g., arr[0:5]), which returns a view, Fancy Indexing (using lists of indices) always returns a copy of the data. This means modifying the results of a fancy index will not reflect back in the original array unless you perform an assignment back to it.
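
One way to check this yourself is np.shares_memory, which reports whether two arrays overlap in memory. A minimal sketch with made-up values:

```python
import numpy as np

arr = np.arange(6)

sliced = arr[0:3]          # basic slice: a view
picked = arr[[0, 1, 2]]    # fancy index: a copy

print(np.shares_memory(arr, sliced))   # True
print(np.shares_memory(arr, picked))   # False

sliced[0] = 100            # visible in arr
picked[1] = 200            # not visible in arr
print(arr)                 # [100   1   2   3   4   5]

# Fancy indexing on the left-hand side of an assignment still writes to arr
arr[[4, 5]] = -1
print(arr)                 # [100   1   2   3  -1  -1]
```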

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
