
NumPy loadtxt and savetxt — Reading and Writing Array Data

⚡ Quick Answer
Use np.loadtxt() for CSV/TSV text files, np.save() and np.load() for fast binary .npy files. For multiple arrays, use np.savez() to bundle them into a single .npz archive. The binary formats are much faster for large datasets because they skip the overhead of string parsing.

loadtxt and savetxt — Text Files

Example · PYTHON
import numpy as np

# Save to text
data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
np.savetxt('data.csv', data, delimiter=',', header='a,b,c', fmt='%.2f')

# Load from text
loaded = np.loadtxt('data.csv', delimiter=',', skiprows=1)
print(loaded)
# [[1. 2. 3.]
#  [4. 5. 6.]]

# Load specific columns
col_a = np.loadtxt('data.csv', delimiter=',', skiprows=1, usecols=0)
print(col_a)  # [1. 4.]
▶ Output
[[1. 2. 3.]
 [4. 5. 6.]]

save and load — Fast Binary Format

Example · PYTHON
import numpy as np

arr = np.random.randn(1000, 100)

# Save as binary .npy — much faster than CSV for large arrays
np.save('array.npy', arr)
loaded = np.load('array.npy')
print(loaded.shape)   # (1000, 100)
print(np.allclose(arr, loaded))  # True

# Multiple arrays in one file
np.savez('bundle.npz', features=arr, labels=np.zeros(1000))
bundle = np.load('bundle.npz')
print(list(bundle.keys()))  # ['features', 'labels']
print(bundle['features'].shape)  # (1000, 100)
▶ Output
(1000, 100)
True
['features', 'labels']
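To make the speed gap from the Quick Answer concrete, here is a rough timing sketch of a text round-trip versus a binary round-trip for the same array. Absolute numbers will vary by machine; the file names and temporary directory are arbitrary.

```python
import os
import tempfile
import time

import numpy as np

arr = np.random.randn(1000, 100)

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "arr.csv")
    npy_path = os.path.join(tmp, "arr.npy")

    # Text round-trip: every float is formatted to a string, then re-parsed
    t0 = time.perf_counter()
    np.savetxt(txt_path, arr, delimiter=",")
    loaded_txt = np.loadtxt(txt_path, delimiter=",")
    text_s = time.perf_counter() - t0

    # Binary round-trip: the raw memory buffer is written and read back directly
    t0 = time.perf_counter()
    np.save(npy_path, arr)
    loaded_bin = np.load(npy_path)
    bin_s = time.perf_counter() - t0

print(f"text: {text_s:.4f}s  binary: {bin_s:.4f}s")
```

On a typical laptop the binary round-trip is one to two orders of magnitude faster, because no string formatting or parsing happens at all.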

Java Integration: Consuming Binary Payloads

At TheCodeForge, our Spring Boot microservices often ingest high-dimensional features generated by NumPy. Using a standard naming convention and binary format allows us to map bytes directly into Java's off-heap memory for rapid processing.

Example · JAVA
package io.thecodeforge.io;

import org.springframework.stereotype.Component;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

@Component
public class NpyIngestionHandler {

    /**
     * Processes a .npy file generated by NumPy.
     * In production, this would use a library like JNumPy or ND4J
     * to handle the header parsing and data mapping.
     */
    public void handleUpload(Path path) throws Exception {
        try (InputStream is = Files.newInputStream(path)) {
            // The .npy preamble: the magic string "\x93NUMPY", two version bytes,
            // then a little-endian uint16 header length. The header itself is
            // variable-length, padded so the data starts on a 64-byte boundary.
            byte[] preamble = new byte[10];
            int bytesRead = is.read(preamble);
            if (bytesRead < preamble.length) {
                throw new IllegalArgumentException("Truncated .npy file: " + path);
            }
            System.out.println("Ingested binary tensor of size: " + Files.size(path) + " bytes");
        }
    }
}
▶ Output
Ingested binary tensor of size: 800128 bytes

Production Storage Strategy

When dealing with large .npz files in a containerized environment, ensure your Docker volumes are optimized for sequential I/O to maximize NumPy's loading performance.

Example · DOCKERFILE
# Optimized Docker configuration for Data Science I/O
FROM python:3.11-slim

LABEL maintainer="engineering@thecodeforge.io"

WORKDIR /data_storage

# Declare a dedicated volume for array data; mount it on fast storage for sequential I/O
VOLUME /data_storage

# requirements.txt pins numpy (and any other deps) for reproducible builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY process_io.py .

CMD ["python", "process_io.py"]

🎯 Key Takeaways

  • np.loadtxt() and np.savetxt() work with text files (CSV/TSV). Use skiprows and delimiter to skip headers and handle custom formatting.
  • np.save() and np.load() work with binary .npy files — faster and smaller than CSV because they store the raw memory buffer.
  • np.savez() bundles multiple arrays into one .npz archive — access arrays by name, like a Python dictionary.
  • np.savez_compressed() adds zlib compression — useful for cold storage archiving, though slightly slower to load than uncompressed .npz.
  • For mixed-type tabular data with complex headers or missing values, Pandas read_csv remains the industry standard.
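The compressed variant mentioned in the takeaways has the same interface as np.savez. A quick sketch showing the size difference (the all-zeros data here is deliberately chosen to compress extremely well; real data will shrink less):

```python
import os
import tempfile

import numpy as np

features = np.zeros((1000, 100))  # highly compressible payload

with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "raw.npz")
    packed = os.path.join(tmp, "packed.npz")

    np.savez(raw, features=features)             # stored uncompressed
    np.savez_compressed(packed, features=features)  # zlib-compressed members

    raw_size = os.path.getsize(raw)
    packed_size = os.path.getsize(packed)
    print(raw_size, packed_size)

    # Loading is identical for both variants
    with np.load(packed) as bundle:
        restored = bundle["features"]
```

The trade-off: compression costs CPU on save and load, which is why uncompressed .npz is preferred for hot paths and the compressed form for cold storage.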

Interview Questions on This Topic

  • Q: Explain why np.save is significantly faster than np.savetxt for an array with 10 million floats.
  • Q: How does NumPy's np.save() format prevent the loss of precision compared to standard CSV exports?
  • Q: Describe a situation where you would use np.savez_compressed over np.savez. What is the trade-off in terms of CPU cycles?
  • Q: In a microservices architecture, how would you handle the data type mismatch when sending a NumPy float32 binary to a Java system expecting a Double?
  • Q: What is np.memmap, and how does it allow you to work with files larger than your physical RAM?

Frequently Asked Questions

What is the difference between .npy and .npz files?

A .npy file stores a single array including its shape and dtype information. A .npz file is actually a zip archive containing multiple .npy files indexed by key names. This makes .npz the preferred choice for bundling datasets (e.g., training features and labels) into a single distribution file.
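Because a .npz really is a zip archive, the standard-library zipfile module can list its members directly, with one .npy entry per keyword argument. A quick sketch:

```python
import os
import tempfile
import zipfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bundle.npz")
    np.savez(path, features=np.zeros((2, 3)), labels=np.ones(2))

    # Inspect the archive without NumPy: each array is stored as <key>.npy
    with zipfile.ZipFile(path) as zf:
        members = sorted(zf.namelist())

print(members)
```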

How do I load a CSV file that has a header row?

Use skiprows=1 in np.loadtxt() to manually skip the first line. If your file has more complex structure (like missing values or non-numeric headers), use np.genfromtxt() with names=True to parse headers into a structured array.
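A sketch of the genfromtxt approach; the a,b,c header matches the earlier CSV example, and a StringIO stands in for the file on disk:

```python
import io

import numpy as np

# genfromtxt accepts any file-like object, so StringIO works like a real CSV file
csv_text = io.StringIO("a,b,c\n1.0,2.0,3.0\n4.0,5.0,6.0\n")

# names=True consumes the first row as field names of a structured array
data = np.genfromtxt(csv_text, delimiter=",", names=True)
print(data.dtype.names)  # ('a', 'b', 'c')
print(data["a"])         # [1. 4.]
```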

Which NumPy I/O method is best for Large Language Model (LLM) weights?

For pure weights, .npy or .npz is efficient, but for production-scale models, engineers often use safetensors. For local experimentation and checkpointing, np.savez() is the fastest native way to dump a dictionary of weight arrays to disk.

Is it possible to append data to an existing .npy file?

No. The .npy format has a fixed header at the start that defines the array shape. To append data, you must either read the whole array, append in memory, and re-save, or use memory-mapping (np.memmap) for extremely large files that don't fit in RAM.
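A minimal memmap sketch. Note that a raw memmap file stores no metadata, so the shape and dtype must be supplied again when re-opening (the file name here is arbitrary):

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.dat")

    # Create a disk-backed array; only the pages you touch are loaded into RAM
    mm = np.memmap(path, dtype="float64", mode="w+", shape=(1000, 100))
    mm[0] = 1.0   # broadcast: sets the entire first row
    mm.flush()    # push dirty pages to disk

    # Re-open read-only; shape and dtype must match what was written
    ro = np.memmap(path, dtype="float64", mode="r", shape=(1000, 100))
    first_row_sum = float(ro[0].sum())
    print(first_row_sum)  # 100.0

    del mm, ro  # release the mappings before the temp directory is removed
```

For .npy files specifically, np.load(path, mmap_mode="r") gives the same lazy access while still reading shape and dtype from the header.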

Naren · Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
