NumPy Random Module — Generating and Controlling Random Data
- NumPy 1.17+ recommends the Generator API: rng = np.random.default_rng(seed=42)
- The Generator API is faster and statistically stronger than legacy functions such as np.random.rand(); its default BitGenerator is PCG64 rather than the legacy Mersenne Twister
- Use rng.random(), rng.integers(), rng.normal(), rng.choice() for most tasks
- rng.shuffle() modifies in place; rng.permutation() returns a copy
- The legacy API uses a global state; the Generator API creates independent random number generators
- Seeding ensures reproducibility: same seed → same sequence every run (on the same NumPy version), which is essential for ML experiments
- Performance: the Generator API is roughly 30% faster for single-threaded random draws
I can't reproduce a random number sequence
- print(rng.bit_generator.state['state']['state'])  # low-level state check
- np.random.get_state()  # only for legacy API; check if modified elsewhere
Legacy np.random.seed() seems to have no effect
- grep -rE '(np|numpy)\.random\.seed' . --include='*.py'
- Runtime hooks such as sys.addaudithook or sys.setprofile are a poor fit for logging seed changes; it is better to replace all legacy calls with a Generator.
Production Incident
… np.random.seed() independently: the data loader used np.random.seed(int(time.time())) to shuffle, overwriting the global seed. Additionally, the augmentation library used the legacy np.random.rand(), which reads the global state. The seed was not passed explicitly.
… rng.normal() etc. The training script now saves the full generator state (rng.bit_generator.state) for exact restarts.
Production Debug Guide
Symptom → Action guide for common NumPy random issues
… np.random.rand() without a seed or uses a different Generator, the sequence diverges. Add a logging statement that prints rng.bit_generator.state['state']['state'] after setup.
The Modern Generator API
Create a generator with np.random.default_rng(). Pass a seed for reproducibility. The generator object is independent — you can have multiple generators with different seeds without interference. Use it for all subsequent random operations.
```python
import numpy as np

# Reproducible: the same seed gives the same numbers every run
rng = np.random.default_rng(seed=42)

print(rng.random(5))             # 5 floats in [0, 1)
print(rng.integers(0, 10, 5))    # 5 ints in [0, 10)
print(rng.normal(0, 1, 5))       # 5 standard normal samples
print(rng.uniform(2.0, 5.0, 3))  # 3 floats in [2, 5)
```
Example output (excerpt):
[0 9 5 0 2]
[-0.234 1.573 -0.462 0.241 -1.913]
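The independence of generator objects can be checked directly. The sketch below is a minimal illustration, not part of the original article; the variable names are hypothetical. It builds two generators from the same seed, advances only one of them, and confirms that the other still starts from the beginning of the same stream.

```python
import numpy as np

# Two generators built from the same seed start in the same state.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Advance rng_a only; rng_b is untouched because no state is shared.
first_a = rng_a.random(3)
_ = rng_a.random(1000)

first_b = rng_b.random(3)
same_start = np.array_equal(first_a, first_b)
print(same_start)  # True

# A generator with a different seed produces a different stream.
rng_c = np.random.default_rng(7)
differs = not np.array_equal(first_a, rng_c.random(3))
print(differs)
```

Because each generator carries its own state, passing an rng object into functions explicitly keeps their randomness isolated from the rest of the program.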
With the legacy API, a call to np.random.seed() in one module affects random operations in unrelated modules. This breaks reproducibility when refactoring code.
Common Distributions
The Generator API supports the standard distributions: normal, binomial, poisson, exponential, uniform, and more. Each method takes the distribution's own parameters plus a size argument (an int or a shape tuple) that sets the shape of the returned array.
```python
import numpy as np

rng = np.random.default_rng(0)

# Normal (Gaussian)
print(rng.normal(loc=170, scale=10, size=5))  # heights in cm

# Binomial: n trials, p probability of success
print(rng.binomial(n=10, p=0.5, size=5))      # heads in 10 coin flips

# Poisson: events per interval
print(rng.poisson(lam=3, size=5))

# Exponential: time between events
print(rng.exponential(scale=1.0, size=5))
```
Example output (excerpt):
[4 5 7 5 3]
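The size argument also accepts a tuple, which is a quick way to sanity-check a distribution's parameters on a larger sample. The following is a rough sketch under that assumption; the sample sizes and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# size may be a tuple: 1000 rows of 3 samples each.
heights = rng.normal(loc=170, scale=10, size=(1000, 3))
print(heights.shape)  # (1000, 3)

# With many samples the empirical mean and spread sit near loc and scale.
sample_mean = heights.mean()
sample_std = heights.std()
print(round(sample_mean, 1), round(sample_std, 1))

# Discrete distributions return integer arrays of the requested shape.
flips = rng.binomial(n=10, p=0.5, size=(2, 4))
print(flips.shape, flips.dtype.kind)  # (2, 4) i
```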
Shuffling and Sampling
Shuffle arrays in place with rng.shuffle() or get a permuted copy with rng.permutation(). For random sampling from an array without replacement, use rng.choice(arr, size=k, replace=False). For bootstrap sampling, set replace=True.
```python
import numpy as np

rng = np.random.default_rng(42)
arr = np.arange(10)

# Shuffle in place
rng.shuffle(arr)
print(arr)  # some permutation of 0..9; fixed for a given seed and NumPy version

# Sample without replacement
print(rng.choice(arr, size=3, replace=False))

# Sample with replacement (bootstrap)
print(rng.choice(arr, size=5, replace=True))

# Permutation: returns a copy, does not modify the original
orig = np.arange(5)
shuffled = rng.permutation(orig)
print(orig)      # [0 1 2 3 4] (unchanged)
print(shuffled)  # shuffled copy
```
Example output (excerpt):
[3 7 2]
[0 1 2 3 4]
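Two common uses of these calls are a bootstrap estimate and a train/test split. The sketch below is one possible way to write them, assuming a toy dataset of ten values; the names n_boot, boot_means, train and test are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10, dtype=float)

# Bootstrap: draw all resamples in one call, one resample per row.
n_boot = 1000
resamples = rng.choice(data, size=(n_boot, data.size), replace=True)
boot_means = resamples.mean(axis=1)

# Percentile interval for the mean of the data.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(low, high)

# Train/test split: permute the indices once, then slice.
perm = rng.permutation(data.size)
train, test = data[perm[:8]], data[perm[8:]]
print(train.size, test.size)  # 8 2
```

Permuting indices rather than the data itself makes it easy to apply the same split to several aligned arrays, such as features and labels.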
Seeding Strategies and Reproducibility
Seeding controls the initial state of the generator. For reproducibility, use a fixed integer seed. For distributed systems, ensure each process gets a unique but reproducible seed (e.g., based on process rank). For testing, consider using a seed derived from the test name to isolate test randomness.
```python
import numpy as np
from hashlib import sha256

# Good: fixed seed
rng = np.random.default_rng(seed=42)

# Better: unique seed per process (e.g., MPI rank)
process_id = 0  # placeholder; in practice the rank reported by MPI
seed = int(sha256(b"my_experiment").hexdigest(), 16) + process_id
rng_rank = np.random.default_rng(seed)

# Testing: seed from test name
def my_test_function():
    test_seed = int(sha256(b"my_test_function").hexdigest(), 16) % 2**32
    rng_local = np.random.default_rng(test_seed)
    # ... use rng_local
```
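An alternative to adding the rank to a base seed is to let NumPy derive the per-worker seeds. The sketch below uses np.random.SeedSequence.spawn() for that; it is an illustration of the idea rather than the article's own method, and base_seed and n_workers are hypothetical values.

```python
import numpy as np

base_seed = 42  # hypothetical experiment-wide seed
n_workers = 4   # hypothetical number of workers

# spawn() derives one child seed sequence per worker from the base seed.
children = np.random.SeedSequence(base_seed).spawn(n_workers)
worker_rngs = [np.random.default_rng(child) for child in children]

draws = [rng.integers(0, 1_000_000, size=5) for rng in worker_rngs]
for rank, d in enumerate(draws):
    print(rank, d)

# Re-deriving from the same base seed reproduces every worker's stream.
again = [
    np.random.default_rng(child).integers(0, 1_000_000, size=5)
    for child in np.random.SeedSequence(base_seed).spawn(n_workers)
]
reproducible = all(np.array_equal(a, b) for a, b in zip(draws, again))
print(reproducible)  # True
```

Only the base seed and the worker count need to be logged to rebuild every stream later.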
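In multiprocessing code, workers that inherit a generator created before the fork all produce the same sequence (a common bug with fork), while calling default_rng() without an explicit seed inside workers makes runs impossible to repeat. Give each worker its own explicitly seeded generator.
Performance Considerations and Vectorisation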
The Generator API is vectorised: generate whole arrays of samples in one call rather than looping. For large sizes the speed-up is often 10-100x, because the per-call Python overhead is paid once instead of once per sample. Methods such as rng.random() and rng.integers() also accept a dtype argument, which lets you trade precision for memory and speed.
```python
import time

import numpy as np

rng = np.random.default_rng(0)

# Slow: one Python-level call per sample
start = time.perf_counter()
for _ in range(100000):
    rng.random()
print("Loop time:", time.perf_counter() - start)

# Fast: one vectorised call for all samples
start = time.perf_counter()
rng.random(100000)
print("Vectorised time:", time.perf_counter() - start)
```
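The dtype argument mentioned above can be illustrated with a short memory comparison. This is a minimal sketch; the array size and the pixels name are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

f64 = rng.random(n)                    # default dtype is float64
f32 = rng.random(n, dtype=np.float32)  # half the memory per sample
print(f64.nbytes, f32.nbytes)          # 8000000 4000000

# integers() also accepts dtype; uint8 is enough for values in [0, 256).
pixels = rng.integers(0, 256, size=n, dtype=np.uint8)
print(pixels.nbytes)                   # 1000000
```

Whether the smaller dtype is also faster depends on the method and the hardware, so it is worth timing on the actual workload.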
| Feature | Legacy API (np.random.seed) | Generator API (default_rng) |
|---|---|---|
| Recommended for new code | No | Yes |
| Global state | Single global RandomState | Independent per generator |
| Thread safety | One shared global state that all threads contend for | No shared global state; create one generator per thread |
| Speed (single thread) | Baseline | ~30% faster |
| Seed multiple streams | Needs separate RandomState instances | Create multiple generators |
| New distributions | Frozen; no new features | New methods and improved sampling algorithms |
| Reproducibility across versions | Stream frozen for backward compatibility | Stream may change between NumPy releases; pin the version |
🎯 Key Takeaways
- Use np.random.default_rng(seed) for new code — it is faster and better than the legacy API.
- Seeding makes random numbers reproducible — essential for ML experiments.
- rng.shuffle() modifies in place; rng.permutation() returns a copy.
- rng.choice() with replace=False is sampling without replacement.
- Each call to a Generator method advances the internal state — the same rng object produces different numbers on consecutive calls.
- In distributed systems, give each worker a unique seed derived from a base seed to avoid identical sequences.
- Log the generator state (seed or BitGenerator state) with every experiment for full reproducibility.
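Saving and restoring the generator state, as recommended in the last takeaway, can look like the sketch below. It assumes the default PCG64 BitGenerator and keeps the checkpoint in memory; writing it to disk is left out, and the variable names are hypothetical.

```python
import copy

import numpy as np

rng = np.random.default_rng(42)
_ = rng.random(100)  # simulate some work before the checkpoint

# Snapshot the BitGenerator state, which is exposed as a plain dict.
checkpoint = copy.deepcopy(rng.bit_generator.state)
expected = rng.normal(size=5)

# Restore the snapshot into a fresh generator and continue from there.
restored = np.random.default_rng()
restored.bit_generator.state = checkpoint
resumed = restored.normal(size=5)

print(np.array_equal(expected, resumed))  # True
```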
Interview Questions on This Topic
- Q: Why is np.random.default_rng() preferred over np.random.seed() in modern NumPy? (Mid-level)
- Q: How do you generate reproducible random numbers in NumPy? (Senior)
Frequently Asked Questions
What is the difference between np.random.seed() and np.random.default_rng()?
np.random.seed() sets a global seed that affects all legacy numpy.random functions. np.random.default_rng() creates an independent Generator object. The Generator approach is better because it avoids shared global state — multiple generators with different seeds can run independently in the same process.
Why does my random data change every time I run my script?
You are not seeding the generator. Add seed=42 (or any integer) to np.random.default_rng(). The exact number does not matter — what matters is that you use the same number consistently.
Can I mix legacy and Generator API in the same script?
Yes, but avoid it. The legacy global RandomState and Generator objects are separate streams: seeding one does not seed the other, so mixed code has two states to control before it is reproducible. Migrate all code to Generator for consistency.
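The separation between the two APIs can be demonstrated in a few lines. This is a small sketch with hypothetical variable names: it shows that reseeding the legacy global state leaves a Generator untouched, and that Generator activity leaves the legacy stream untouched.

```python
import numpy as np

# Reference draw from a seeded Generator.
a = np.random.default_rng(42).random(3)

# Reseed and consume the legacy global state, then repeat the Generator draw.
np.random.seed(999)
_ = np.random.rand(10)
b = np.random.default_rng(42).random(3)
generator_unaffected = np.array_equal(a, b)
print(generator_unaffected)  # True

# The reverse also holds: Generator draws do not disturb the legacy stream.
np.random.seed(0)
x = np.random.rand(3)
_ = np.random.default_rng(1).random(100)
np.random.seed(0)
y = np.random.rand(3)
legacy_unaffected = np.array_equal(x, y)
print(legacy_unaffected)  # True
```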
How do I generate the same random numbers in Python 2 and Python 3?
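The Generator API is not available on Python 2, because NumPy 1.17, the release that introduced it, dropped Python 2 support. The only option that runs on both is the legacy np.random.RandomState(seed), for example np.random.RandomState(seed).randn(), whose stream is frozen for backward compatibility. Python 2 is unsupported, so the better fix is to run everything on Python 3 with the same NumPy version and a fixed seed.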
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.