NumPy Random Module — Generating and Controlling Random Data

⚡ Quick Answer
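Since NumPy 1.17, the recommended way to generate random numbers is the Generator API: rng = np.random.default_rng(seed=42). It is backed by the PCG64 bit generator, which is faster and has better statistical properties than the Mersenne Twister behind legacy functions such as np.random.rand(). Use rng.random(), rng.integers(), rng.normal(), and rng.choice() for most tasks.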

The Modern Generator API

Create a generator with np.random.default_rng(). Pass a seed for reproducibility.

Example · PYTHON
import numpy as np

# Reproducible — same seed gives same numbers every run
rng = np.random.default_rng(seed=42)

print(rng.random(5))          # 5 floats in [0, 1)
print(rng.integers(0, 10, 5)) # 5 ints in [0, 10)
print(rng.normal(0, 1, 5))    # 5 standard normal samples
print(rng.uniform(2.0, 5.0, 3)) # 3 floats in [2, 5)
▶ Output (rounded, first three prints only; exact values may differ across NumPy versions)
[0.773 0.438 0.858 0.697 0.094]
[0 9 5 0 2]
[-0.234 1.573 -0.462 0.241 -1.913]
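
The half-open ranges noted in the comments above can be checked directly. This is a minimal sketch, assuming NumPy 1.17 or later; the tuple-valued size and the endpoint=True argument are additions that the original example does not use.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

floats = rng.random((2, 3))                           # size may be a tuple: 2x3 array in [0, 1)
ints = rng.integers(0, 10, size=1000)                 # upper bound 10 is excluded by default
dice = rng.integers(1, 6, size=1000, endpoint=True)   # endpoint=True makes 6 a possible value
unif = rng.uniform(2.0, 5.0, size=1000)               # floats between 2 and 5

print(floats.shape)                                   # (2, 3)
print(ints.min() >= 0 and ints.max() <= 9)            # True
print(dice.min() >= 1 and dice.max() <= 6)            # True
print(unif.min() >= 2.0 and unif.max() <= 5.0)        # True
```

Checking bounds on a large sample is a quick sanity test whenever you are unsure whether an upper limit is inclusive.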

Common Distributions

Example · PYTHON
import numpy as np
rng = np.random.default_rng(0)

# Normal (Gaussian)
print(rng.normal(loc=170, scale=10, size=5))  # heights in cm

# Binomial — n trials, p probability
print(rng.binomial(n=10, p=0.5, size=5))  # heads in 10 fair coin flips, repeated 5 times

# Poisson — events per interval
print(rng.poisson(lam=3, size=5))

# Exponential — time between events
print(rng.exponential(scale=1.0, size=5))
▶ Output (illustrative and rounded; normal and binomial prints only)
[175.4 164.8 171.2 167.9 182.3]
[4 5 7 5 3]
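
One way to build intuition for these parameters is to draw a large sample and compare its average with the theoretical mean of each distribution. The sketch below assumes the same four distributions as above; the sample size of 100,000 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000  # large enough for the averages to settle near theory

heights = rng.normal(loc=170, scale=10, size=n_samples)   # theoretical mean: loc
heads = rng.binomial(n=10, p=0.5, size=n_samples)         # theoretical mean: n * p
events = rng.poisson(lam=3, size=n_samples)               # theoretical mean: lam
waits = rng.exponential(scale=1.0, size=n_samples)        # theoretical mean: scale

print(f"normal      mean {heights.mean():.2f} (theory 170)")
print(f"binomial    mean {heads.mean():.2f} (theory 5)")
print(f"poisson     mean {events.mean():.2f} (theory 3)")
print(f"exponential mean {waits.mean():.2f} (theory 1)")
```

Each printed average should land close to the theoretical value, with small deviations that shrink as n_samples grows.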

Shuffling and Sampling

Example · PYTHON
import numpy as np
rng = np.random.default_rng(42)

arr = np.arange(10)

# Shuffle in place
rng.shuffle(arr)
print(arr)  # e.g. [0 3 7 2 5 1 9 4 6 8]; the seed fixes the order, so it repeats on every run

# Sample without replacement
print(rng.choice(arr, size=3, replace=False))

# Sample with replacement (bootstrap)
print(rng.choice(arr, size=5, replace=True))

# Permutation — returns a copy, does not modify original
orig = np.arange(5)
shuffled = rng.permutation(orig)
print(orig)     # [0 1 2 3 4] — unchanged
print(shuffled) # shuffled copy
▶ Output (illustrative, abbreviated to the shuffled array, the 3-element sample, and the unchanged original)
[0 3 7 2 5 1 9 4 6 8]
[3 7 2]
[0 1 2 3 4]
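
Sampling with replacement is the core of the bootstrap mentioned in the comment above. The following is a rough sketch of a percentile bootstrap interval for a mean; the data array is hypothetical, and the 2,000 resamples and 95% level are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 200 measurements generated only for this illustration
data = rng.normal(loc=50, scale=5, size=200)

n_boot = 2000
# Each row is one bootstrap resample: same length as data, drawn with replacement
resamples = rng.choice(data, size=(n_boot, data.size), replace=True)
boot_means = resamples.mean(axis=1)

# Percentile interval: the middle 95% of the resampled means
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean: {data.mean():.2f}")
print(f"95% bootstrap interval: [{low:.2f}, {high:.2f}]")
```

Passing a tuple as size draws all resamples in one call, which is usually faster than looping over rng.choice() in Python.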

🎯 Key Takeaways

  • Use np.random.default_rng(seed) for new code: its default PCG64 bit generator is faster and has better statistical properties than the Mersenne Twister behind the legacy API.
  • Seeding makes random numbers reproducible — essential for ML experiments.
  • rng.shuffle() modifies in place; rng.permutation() returns a copy.
  • rng.choice() with replace=False samples without replacement: no element is picked twice, so size cannot exceed the number of available elements.
  • Each call to a Generator method advances the internal state — the same rng object produces different numbers on consecutive calls.
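
The last takeaway, that every call advances the internal state, can be seen by saving and restoring that state. This sketch relies on the bit_generator.state property of the Generator; rewinding a stream this way is shown for illustration and is not something the article itself uses.

```python
import numpy as np

rng = np.random.default_rng(42)

first = rng.random(3)
second = rng.random(3)            # the state has advanced, so this differs from `first`

saved = rng.bit_generator.state   # snapshot of the current state (a dict)
draw_a = rng.random(3)
rng.bit_generator.state = saved   # rewind to the snapshot
draw_b = rng.random(3)            # replays the same three numbers

print(np.array_equal(first, second))    # False
print(np.array_equal(draw_a, draw_b))   # True
```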

Interview Questions on This Topic

  • Q: Why is np.random.default_rng() preferred over np.random.seed() in modern NumPy?
  • Q: How do you generate reproducible random numbers in NumPy?

Frequently Asked Questions

What is the difference between np.random.seed() and np.random.default_rng()?

np.random.seed() sets a global seed that affects all legacy numpy.random functions. np.random.default_rng() creates an independent Generator object. The Generator approach is better because it avoids shared global state — multiple generators with different seeds can run independently in the same process.
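
The difference between shared global state and independent generators can be sketched as follows. The legacy_helper function is hypothetical and stands in for any library or module code that draws from the legacy global stream without your knowledge.

```python
import numpy as np

def legacy_helper():
    # Hypothetical helper: silently draws one number from the global legacy state
    return np.random.rand()

# Legacy API: a hidden draw elsewhere shifts every later result
np.random.seed(0)
plain = np.random.rand(3)

np.random.seed(0)
legacy_helper()                    # consumes one value from the shared stream
shifted = np.random.rand(3)
print(np.array_equal(plain, shifted))      # False

# Generator API: each object carries its own state
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(1)
rng_b.random(100)                  # heavy use of rng_b ...
from_a = rng_a.random(3)           # ... leaves rng_a untouched
reference = np.random.default_rng(0).random(3)
print(np.array_equal(from_a, reference))   # True
```

Passing a Generator object explicitly into the functions that need randomness keeps each component's stream isolated from the others.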

Why does my random data change every time I run my script?

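You are not seeding the generator. Without a seed, np.random.default_rng() pulls fresh entropy from the operating system on every run. Pass seed=42 (or any other integer) to np.random.default_rng(). The exact number does not matter; what matters is that you use the same number on every run.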
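
A small sketch of the difference, using a hypothetical simulate() function whose optional seed argument is passed straight to np.random.default_rng():

```python
import numpy as np

def simulate(seed=None):
    """Hypothetical experiment that returns 5 random floats."""
    rng = np.random.default_rng(seed)  # seed=None means fresh OS entropy on each call
    return rng.random(5)

print(np.array_equal(simulate(), simulate()))      # False: unseeded calls differ
print(np.array_equal(simulate(42), simulate(42)))  # True: same seed, same data
```

Exposing the seed as a parameter lets you keep runs random by default and fix them only when you need to reproduce a result.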

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
