NumPy loadtxt and savetxt — Reading and Writing Array Data
loadtxt and savetxt — Text Files
import numpy as np

# Save to text
data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
np.savetxt('data.csv', data, delimiter=',', header='a,b,c', fmt='%.2f')

# Load from text
loaded = np.loadtxt('data.csv', delimiter=',', skiprows=1)
print(loaded)
# [[1. 2. 3.]
#  [4. 5. 6.]]

# Load specific columns
col_a = np.loadtxt('data.csv', delimiter=',', skiprows=1, usecols=0)
print(col_a)  # [1. 4.]
save and load — Fast Binary Format
import numpy as np

arr = np.random.randn(1000, 100)

# Save as binary .npy — much faster than CSV for large arrays
np.save('array.npy', arr)
loaded = np.load('array.npy')
print(loaded.shape)              # (1000, 100)
print(np.allclose(arr, loaded))  # True

# Multiple arrays in one file
np.savez('bundle.npz', features=arr, labels=np.zeros(1000))
bundle = np.load('bundle.npz')
print(list(bundle.keys()))       # ['features', 'labels']
print(bundle['features'].shape)  # (1000, 100)
Java Integration: Consuming Binary Payloads
At TheCodeForge, our Spring Boot microservices often ingest high-dimensional features generated by NumPy. Using a standard naming convention and binary format allows us to map bytes directly into Java's off-heap memory for rapid processing.
package io.thecodeforge.io;

import org.springframework.stereotype.Component;

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

@Component
public class NpyIngestionHandler {

    /**
     * Processes a .npy file generated by NumPy.
     * In production, this would use a library like JNumPy or ND4J
     * to handle the header parsing and data mapping.
     */
    public void handleUpload(Path path) throws Exception {
        try (InputStream is = Files.newInputStream(path)) {
            // The .npy header is variable-length (padded to a multiple of
            // 64 bytes); 128 bytes covers typical headers.
            byte[] header = new byte[128];
            int bytesRead = is.read(header);
            System.out.println("Ingested binary tensor of size: "
                    + Files.size(path) + " bytes");
        }
    }
}
Production Storage Strategy
When dealing with large .npz files in a containerized environment, ensure your Docker volumes are optimized for sequential I/O to maximize NumPy's loading performance.
# Optimized Docker configuration for Data Science I/O
FROM python:3.11-slim
LABEL maintainer="engineering@thecodeforge.io"

WORKDIR /data_storage

# NumPy itself adds minimal overhead; keep /data_storage on a fast volume
VOLUME /data_storage

COPY requirements.txt .
RUN pip install --no-cache-dir numpy
COPY process_io.py .

CMD ["python", "process_io.py"]
🎯 Key Takeaways
- np.loadtxt() and np.savetxt() work with text files (CSV/TSV). Use skiprows and delimiter to skip headers and handle custom formatting.
- np.save() and np.load() work with binary .npy files — faster and smaller than CSV because they store the raw memory buffer.
- np.savez() bundles multiple arrays into one .npz archive — access data by name similar to a Python dictionary.
- np.savez_compressed() adds zlib compression — useful for cold storage archiving, though slightly slower to load than uncompressed .npz.
- For mixed-type tabular data with complex headers or missing values, Pandas read_csv remains the industry standard.
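The compressed variant mentioned above can be sketched like this (filenames are illustrative); on highly compressible data such as zeros, the size difference is dramatic:

```python
import os
import numpy as np

arr = np.zeros((1000, 100))  # highly compressible data

np.savez('plain.npz', data=arr)
np.savez_compressed('small.npz', data=arr)

print(os.path.getsize('plain.npz') > os.path.getsize('small.npz'))  # True

# Loading works exactly like a regular .npz
loaded = np.load('small.npz')['data']
print(np.array_equal(loaded, arr))  # True
```

Compression buys disk space at the cost of CPU time on both save and load, which is why it suits cold storage better than hot paths.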
Interview Questions on This Topic
- Explain why np.save is significantly faster than np.savetxt for an array with 10 million floats.
- How does NumPy's np.save() format prevent the loss of precision compared to standard CSV exports?
- Describe a situation where you would use np.savez_compressed over np.savez. What is the trade-off in terms of CPU cycles?
- In a microservices architecture, how would you handle the data type mismatch when sending a NumPy float32 binary to a Java system expecting a Double?
- What is np.memmap, and how does it allow you to work with files larger than your physical RAM?
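One way to approach the float32/Double mismatch question is to widen the dtype on the Python side before export; a minimal sketch with illustrative filenames:

```python
import numpy as np

arr32 = np.random.randn(1000).astype(np.float32)

# Native precision: 4 bytes per element
np.save('weights32.npy', arr32)

# Widen to float64 before export if the consumer (e.g. a Java service
# reading IEEE-754 doubles) expects 8-byte values
np.save('weights64.npy', arr32.astype(np.float64))

print(np.load('weights32.npy').dtype)  # float32
print(np.load('weights64.npy').dtype)  # float64
```

Doubling the payload size is the cost; the alternative is parsing the dtype field from the .npy header on the Java side and converting there.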
Frequently Asked Questions
What is the difference between .npy and .npz files?
A .npy file stores a single array including its shape and dtype information. A .npz file is actually a zip archive containing multiple .npy files indexed by key names. This makes .npz the preferred choice for bundling datasets (e.g., training features and labels) into a single distribution file.
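Because a .npz is literally a zip archive, the standard library can confirm this directly; a quick sketch with illustrative filenames:

```python
import zipfile
import numpy as np

np.savez('bundle.npz', features=np.ones((10, 3)), labels=np.zeros(10))

# Each keyword argument becomes a .npy member inside the zip
with zipfile.ZipFile('bundle.npz') as zf:
    print(sorted(zf.namelist()))  # ['features.npy', 'labels.npy']
```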
How do I load a CSV file that has a header row?
Use skiprows=1 in np.loadtxt() to manually skip the first line. If your file has more complex structure (like missing values or non-numeric headers), use np.genfromtxt() with names=True to parse headers into a structured array.
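A small sketch of the genfromtxt approach, using an in-memory file to stand in for a real CSV:

```python
import numpy as np
from io import StringIO

csv = StringIO("a,b,c\n1.0,2.0,3.0\n4.0,,6.0")

# names=True reads the header row into field names;
# the missing value in column b becomes nan
data = np.genfromtxt(csv, delimiter=',', names=True)
print(data['a'])  # [1. 4.]
print(np.isnan(data['b'][1]))  # True
```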
Which NumPy I/O method is best for Large Language Model (LLM) weights?
For pure weights, .npy or .npz is efficient, but for production-scale models engineers often use safetensors. For local experimentation and checkpointing, np.savez() is the simplest native way to dump a dict of named weight arrays to disk.
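For example, a hypothetical state dict of named weight arrays (the key names here are made up) can be dumped and reloaded with np.savez:

```python
import numpy as np

# Hypothetical "state dict" of named weight arrays
state = {
    'embed.weight': np.random.randn(100, 16).astype(np.float32),
    'head.bias': np.zeros(10, dtype=np.float32),
}

# ** unpacks the dict so each array is stored under its key
np.savez('checkpoint.npz', **state)

ckpt = np.load('checkpoint.npz')
print(sorted(ckpt.files))  # ['embed.weight', 'head.bias']
print(ckpt['head.bias'].shape)  # (10,)
```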
Is it possible to append data to an existing .npy file?
No. The .npy format has a fixed header at the start that defines the array shape. To append data, you must either read the whole array, append in memory, and re-save, or use memory-mapping (np.memmap) for extremely large files that don't fit in RAM.
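A quick memory-mapping sketch using np.load's mmap_mode (filename is illustrative; a tiny array stands in for one that wouldn't fit in RAM):

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)
np.save('big.npy', arr)

# mmap_mode='r' maps the file instead of reading it into RAM;
# slices are paged in lazily as you touch them
mm = np.load('big.npy', mmap_mode='r')
print(type(mm))  # <class 'numpy.memmap'>
print(mm[2, 3])  # 11.0
```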
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.