Advanced 12 min · March 05, 2026

Node.js Streams and Buffers

Node.js Streams — Missing drain Handler Causes OOM

Q: What is the default highWaterMark for a Node.js stream?

For binary streams, the default highWaterMark is 16,384 bytes (16 KB) in Node.js releases before v22 and 65,536 bytes (64 KB) from v22 onward. For objectMode streams, it is 16 objects. You can change it with the highWaterMark option when creating the stream.
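
One way to confirm the thresholds on your own runtime is to read them back from freshly constructed streams. The sketch below uses only the built-in stream module; the helper name reportHighWaterMarks is made up for illustration.

```javascript
const { Readable, Writable } = require('node:stream');

// Hypothetical helper: builds throwaway streams and reads back their thresholds.
function reportHighWaterMarks() {
  const byteReadable = new Readable({ read() {} });
  const objectReadable = new Readable({ objectMode: true, read() {} });
  const tunedWritable = new Writable({
    highWaterMark: 256 * 1024, // explicit override, in bytes
    write(chunk, encoding, callback) { callback(); }
  });
  return {
    byteDefault: byteReadable.readableHighWaterMark,     // depends on the Node.js version
    objectDefault: objectReadable.readableHighWaterMark, // counted in objects, not bytes
    tuned: tunedWritable.writableHighWaterMark
  };
}

console.log(reportHighWaterMarks());
```

Logging these once at service start-up is a cheap way to notice when a runtime upgrade changes the default under you.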

Q: Can a process run out of memory because of a stream?

Yes. If the backpressure protocol is broken (e.g., the producer ignores write() returning false), the internal buffer of the writable stream grows without bound until the OOM killer terminates the process. This is a common cause of OOM in stream-based Node.js services.

Q: How do I check if a stream is in flowing mode?

Check the readableFlowing property: null means paused and no consumer; true means flowing; false means paused after being flowing (e.g., after pause()).
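
The three states can be observed directly. This is a minimal sketch with a do-nothing source; flowingStates is a hypothetical helper name.

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper: records readableFlowing after each mode change.
function flowingStates() {
  const source = new Readable({ read() {} }); // never produces data
  const states = [source.readableFlowing];    // no consumer attached yet

  source.on('data', () => {});                // attaching a 'data' listener starts flowing mode
  states.push(source.readableFlowing);

  source.pause();                             // explicit pause
  states.push(source.readableFlowing);

  source.resume();                            // back to flowing mode
  states.push(source.readableFlowing);

  source.destroy();
  return states;
}

console.log(flowingStates()); // [ null, true, false, true ]
```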

Q: What does ERR_STREAM_PREMATURE_CLOSE mean?

It means a stream in a pipeline was destroyed before all data was flushed. This often happens when a client disconnects mid-upload/download. In production, handle it gracefully by catching it and cleaning up partial state.

Q: Why does my custom Transform lose the last chunk of data?

You probably forgot to implement _flush(). If your Transform buffers data across chunks (e.g., building lines), _flush() is called when the writable side ends and is your only opportunity to emit the final partial chunk.
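
A line splitter is the classic case. The sketch below is illustrative only: LineSplitter and splitLines are hypothetical names, and it assumes input chunks never split a multi-byte UTF-8 character (a real implementation would use string_decoder for that).

```javascript
const { Transform, Readable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

class LineSplitter extends Transform {
  constructor() {
    super({ readableObjectMode: true }); // bytes in, one string per line out
    this.remainder = '';
  }

  _transform(chunk, encoding, callback) {
    const parts = (this.remainder + chunk.toString('utf8')).split('\n');
    this.remainder = parts.pop(); // the last piece may be an incomplete line
    for (const line of parts) this.push(line);
    callback();
  }

  _flush(callback) {
    // Without this, a final line that lacks a trailing newline is silently dropped.
    if (this.remainder.length > 0) this.push(this.remainder);
    callback();
  }
}

// Hypothetical helper: runs string chunks through the splitter and collects the lines.
async function splitLines(chunks) {
  const lines = [];
  await pipeline(
    Readable.from(chunks.map((text) => Buffer.from(text))),
    new LineSplitter(),
    async function (source) {
      for await (const line of source) lines.push(line);
    }
  );
  return lines;
}

splitLines(['a\nb', 'b\nc']).then((lines) => console.log(lines)); // [ 'a', 'bb', 'c' ]
```

Removing _flush() from this class makes the final 'c' disappear, which is exactly the symptom described above.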

Q: How do I handle Range requests for video streaming in Node.js?

Parse the `Range` header, extract the start and end bytes, create a read stream with `fs.createReadStream` using `start` and `end` options, and respond with a 206 status and `Content-Range` header. Always validate the range to avoid errors. For production, add support for `If-Range` and ETags.
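
The parsing and validation step can be sketched as a pure function. parseRange is a hypothetical helper, not a Node.js API, and this version handles a single range only.

```javascript
// Hypothetical helper: returns { start, end } (both inclusive) for a single-range
// header, or null when the header is malformed or cannot be satisfied.
function parseRange(header, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header || '');
  if (!match || size <= 0) return null;

  const [, rawStart, rawEnd] = match;
  if (rawStart === '' && rawEnd === '') return null;

  let start;
  let end;
  if (rawStart === '') {
    // Suffix form "bytes=-500": the last 500 bytes of the file.
    const suffixLength = Number(rawEnd);
    if (suffixLength === 0) return null;
    start = Math.max(size - suffixLength, 0);
    end = size - 1;
  } else {
    start = Number(rawStart);
    // Open-ended form "bytes=900-", or an end past EOF, clamps to the last byte.
    end = rawEnd === '' ? size - 1 : Math.min(Number(rawEnd), size - 1);
  }

  if (start >= size || start > end) return null;
  return { start, end };
}

console.log(parseRange('bytes=0-99', 1000)); // { start: 0, end: 99 }
console.log(parseRange('bytes=-100', 1000)); // { start: 900, end: 999 }
console.log(parseRange('bytes=1000-', 1000)); // null
```

A non-null result can be passed straight to fs.createReadStream(path, { start, end }), since both options are inclusive. For a null result, the usual response is status 416 with a `Content-Range: bytes */<size>` header.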

Q: Can I use Web Streams API in Node.js for file I/O?

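Yes, Node.js 18+ supports Web Streams. Convert a Node.js Readable to a Web ReadableStream with `Readable.toWeb()` and back with `Readable.fromWeb()`; `Writable.toWeb()` and `Writable.fromWeb()` do the same for the writable side. Web Streams implement backpressure too (through queuing strategies and `desiredSize`), but the API differs from Node's and most of the Node.js ecosystem still expects Node.js streams. Prefer Node.js streams for file I/O and use Web Streams for HTTP or browser-compatible APIs.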
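
A round trip through both conversions looks roughly like this. The helper name roundTripThroughWebStream is invented for the example, and older Node.js releases may print an experimental-feature warning for these calls.

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper: Node.js Readable -> Web ReadableStream -> Node.js Readable.
async function roundTripThroughWebStream(text) {
  const nodeSource = new Readable({
    read() {
      this.push(Buffer.from(text));
      this.push(null); // single chunk, then end
    }
  });

  const webStream = Readable.toWeb(nodeSource);  // Web ReadableStream
  const nodeAgain = Readable.fromWeb(webStream); // back to a Node.js Readable

  const chunks = [];
  for await (const chunk of nodeAgain) chunks.push(Buffer.from(chunk));
  return Buffer.concat(chunks).toString('utf8');
}

roundTripThroughWebStream('hello web streams').then((text) => console.log(text));
```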

Q: What is the optimal highWaterMark for a file read stream?

The default for `fs.createReadStream` is 64 KB, which is a good starting point. For high-latency storage such as network file systems, try increasing it to 256 KB to reduce the number of read calls. For memory-constrained environments, decrease it to 16 KB. Benchmark with your actual workload to find the sweet spot, and monitor memory usage with `process.memoryUsage()`.

Q: How do I handle range requests for video streaming in Node.js?

Parse the `Range` header, extract start and end bytes, then use `fs.createReadStream` with `start` and `end` options (both inclusive). Respond with status 206 and a `Content-Range` header. For a multi-range request, stream each part inside a `multipart/byteranges` response. Always validate ranges, and answer unsatisfiable ones with status 416.

Q: What's the difference between `pipe()` and `pipeline()` for encryption?

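`pipeline()` automatically destroys every stream in the chain on failure and propagates errors, while `pipe()` does neither. For encryption, use `pipeline()` so the cipher stream is either ended (and therefore finalized) or destroyed together with the rest of the chain. `pipeline()` does not fetch the auth tag for you: with an authenticated mode such as AES-GCM, call `cipher.getAuthTag()` after `pipeline()` resolves. A bare `pipe()` chain can leave streams hanging or errors unhandled.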
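
As a sketch of that ordering with AES-256-GCM: the tag is read only after pipeline() has resolved, and the decrypting side sets it before any data flows. encryptBuffer, decryptBuffer, and createCollector are hypothetical helpers that work on in-memory Buffers to keep the example short; real code would stream to and from files or sockets.

```javascript
const { pipeline } = require('node:stream/promises');
const { Readable, Writable } = require('node:stream');
const crypto = require('node:crypto');

// Hypothetical in-memory sink so the example needs no files.
function createCollector() {
  const chunks = [];
  const sink = new Writable({
    write(chunk, encoding, callback) { chunks.push(chunk); callback(); }
  });
  sink.result = () => Buffer.concat(chunks);
  return sink;
}

async function encryptBuffer(plaintext, key) {
  const iv = crypto.randomBytes(12); // 96-bit nonce, must be unique per key
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const sink = createCollector();
  await pipeline(Readable.from([plaintext]), cipher, sink);
  // The cipher has been finalized by now, so the tag is available.
  return { iv, tag: cipher.getAuthTag(), ciphertext: sink.result() };
}

async function decryptBuffer({ iv, tag, ciphertext }, key) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag); // must be set before the stream is finalized
  const sink = createCollector();
  // Rejects if the tag does not match; pipeline() destroys every stage in that case.
  await pipeline(Readable.from([ciphertext]), decipher, sink);
  return sink.result();
}
```

Treat any output written before decryptBuffer() resolves as untrusted, because GCM verifies the tag only when the stream is finalized.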

Missing drain handler in Transform caused RSS climb from 120 MB to 3.8 GB in 90s, killing upload service.

Naren Founder & Principal Engineer

20+ years shipping production JavaScript and front-end systems at scale. Drawn from code that ran under real load.

Last updated: July 19, 2026

Before you start (⏱ 30 min)

✓ Deep production experience
✓ Understanding of internals and trade-offs
✓ Experience debugging complex systems

✦ Definition (~90s read)

What is Node.js Streams and Buffers?

Node.js streams are the backbone of efficient I/O in Node.js, designed to process data piece-by-piece rather than loading entire payloads into memory. They solve the fundamental problem of handling large or continuous data sources — like file reads, HTTP requests, or database cursors — without exhausting the V8 heap.

Internally, streams use a buffer mechanism that can reside either in V8 heap memory (for small objects) or as external memory (for large Buffers), and the highWaterMark option controls the buffer size threshold. When a stream's internal buffer exceeds this limit, it signals the writer to pause via backpressure — but if you ignore the 'drain' event, the buffer grows unbounded, leading to an out-of-memory (OOM) crash.

This is especially common in production systems processing high-throughput logs, video transcoding, or real-time data pipelines where developers assume pipe() handles everything automatically.

Streams operate as state machines with distinct modes: flowing (data is read and emitted immediately) and paused (data is buffered until explicitly read). The backpressure protocol is the contract between a writable stream's internal buffer and the source — when write() returns false, you must wait for 'drain' before writing more.

Failing to do so is the #1 cause of OOM in Node.js stream applications. While pipe() simplifies this by managing backpressure automatically, it has a critical flaw: it doesn't destroy the source stream on error, leaving open file handles or dangling HTTP connections.

The pipeline() API (added in Node.js 10) fixes this by destroying every stream in the chain on failure and reporting the error through a single callback or Promise. For production systems handling gigabytes of data, prefer pipeline() over pipe(), and wherever your own code calls write() directly, including inside custom Writable or Transform streams, check the return value and wait for 'drain' before writing more.

When building custom transform streams for production, you must implement both _transform() and _flush() methods, respecting the push() and callback() contract to maintain backpressure. The internal buffer is managed by the stream's highWaterMark (default 16KB for objectMode: false, 16 objects for objectMode: true), but external memory for Buffer objects is tracked separately by V8.

Tools like clinic.js or Node's --trace-gc flag can reveal memory pressure from unbounded buffers. Alternatives to raw streams include pump (deprecated in favor of pipeline()), through2 (for older codebases), and RxJS observables for reactive patterns. For raw performance and control, native streams with proper drain handling remain the standard choice in Node.js production environments.

Plain-English First

Imagine you're filling a bathtub from a fire hose. If you just blast the water all at once, it floods the bathroom. Streams are like turning that fire hose into a gentle tap — water flows in at a rate the tub can handle. A Buffer is the plug in the drain: it holds a fixed chunk of water (raw bytes) temporarily so you can inspect or move it before letting more in. Together, they let Node.js handle huge amounts of data without drowning in memory. The moment you understand that analogy at a mechanical level — not just as a metaphor — is the moment streams stop being confusing.

Every Node.js server you have ever run has been silently relying on Streams and Buffers — whether you knew it or not. When you serve a 4 GB video file, parse an incoming multipart upload, or pipe data from a database cursor to an HTTP response, you are in stream territory. Get it wrong and your server leaks memory, stalls under load, or corrupts binary data in ways that are genuinely nightmarish to debug at 2 AM. Get it right and you can process files larger than your available RAM with a flat, predictable memory footprint that holds steady under load.

The core problem Streams solve is the mismatch between producer speed and consumer speed. A database might emit rows faster than your HTTP client can receive them. A file system read might outpace a gzip compressor. Without a flow-control mechanism, the fast side buffers everything into memory until something crashes. Node.js Streams solve this with backpressure: a built-in signalling protocol between producers and consumers that says 'slow down' or 'keep going'. pipe() and pipeline() run that protocol for you; any code that calls write() directly has to honor it explicitly.

I have personally traced three separate production OOM incidents back to misunderstood stream backpressure — in two cases, engineers had used pipe() and assumed it handled everything safely, not realizing that pipe() does not propagate errors and does not protect against a broken backpressure chain inside a custom Transform.

By the end of this article you will understand how the Buffer class maps onto V8 memory outside the garbage collector, what highWaterMark actually controls (it is not a hard limit, and most engineers get this wrong), how to build a production-grade Transform stream, why pipeline() is almost always safer than pipe(), and the specific failure modes that only show up under load — never in your development environment.

Why Node.js Streams Without drain Handling Cause OOM

Node.js streams are an abstraction for handling data piece by piece rather than loading entire payloads into memory. The core mechanic is backpressure: when a writable stream's internal buffer reaches highWaterMark (16 KB by default before Node.js 22, 64 KB since), stream.write() returns false, signaling the producer to pause. If the producer ignores this signal and keeps writing, data accumulates in memory until the process runs out of memory. The drain event fires once the buffered data has been flushed, allowing writes to resume. This push-pull contract is what prevents unbounded memory growth.

In practice, highWaterMark is not a hard limit; it is a threshold that triggers the backpressure signal. The buffer can grow beyond it if writes continue, and the stream never rejects a write for being over the threshold: it keeps accepting data until the process itself is killed. The drain event is the only reliable way to know when it is safe to write again. Use streams whenever you process data that could exceed available memory: file transfers, HTTP request bodies, database result sets. Without explicit drain handling, any high-throughput pipeline risks silent OOM crashes in production, especially under load spikes where the consumer (for example a slow network socket or disk) cannot keep up with the producer.
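
The write/drain contract fits in a few lines. This is a sketch rather than a drop-in utility: writeAll and createSlowSink are hypothetical names, and the slow sink exists only to make the backpressure visible.

```javascript
const { Writable } = require('node:stream');
const { once } = require('node:events');
const { finished } = require('node:stream/promises');

// Writes every chunk, pausing whenever the writable asks for it.
async function writeAll(writable, chunks) {
  for (const chunk of chunks) {
    const canContinue = writable.write(chunk);
    if (!canContinue) {
      // Buffer is at or above highWaterMark: stop producing until it has been flushed.
      await once(writable, 'drain');
    }
  }
  writable.end();
  await finished(writable);
}

// Hypothetical slow consumer that records how much data was ever queued.
function createSlowSink(stats) {
  return new Writable({
    highWaterMark: 8, // tiny threshold so backpressure kicks in immediately
    write(chunk, encoding, callback) {
      stats.peakBuffered = Math.max(stats.peakBuffered, this.writableLength);
      stats.bytes += chunk.length;
      setImmediate(callback); // simulate a slow disk or socket
    }
  });
}
```

With the `await once(writable, 'drain')` line removed, the same loop queues every chunk at once and the peak buffered size grows with the input instead of staying near highWaterMark.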

⚠ Backpressure Is Not Automatic

write() returning false is a request, not a lock: the producer must stop writing and wait for 'drain' before resuming. Ignoring it is the #1 cause of stream-related OOM.

📊 Production Insight

A Node.js service piping a large CSV from S3 to a slow HTTP response without drain handling caused 2GB heap usage and crashed under 50 concurrent requests.

Symptom: heap grows linearly with request concurrency, GC thrashing, then process killed by OOM killer.

Rule: always check stream.write() return value and wait for 'drain' before writing more — never assume the consumer can keep up.

🎯 Key Takeaway

Backpressure is a cooperative protocol, not automatic — you must check write() return and listen for drain.

The default highWaterMark (16KB) is tiny; for high-throughput pipelines, tune it or risk micro-pausing.

Without drain handling, any stream pipeline is a ticking OOM bomb under load — test with slow consumers.

thecodeforge.io

Nodejs Streams Buffers

Buffer Internals — V8 Heap vs External Memory

Buffer is one of the most misunderstood classes in Node.js, and the misunderstanding usually surfaces in production as an OOM kill that heap profiling tools cannot explain. Engineers look at the heap snapshot, see 50 MB, and cannot reconcile it with the 2 GB RSS climbing in their dashboards. The reason is where Buffer memory actually lives.

Buffer is a subclass of Uint8Array, and the bytes it holds live in an ArrayBuffer backing store that is allocated outside V8's managed JavaScript heap. Buffer.alloc() and Buffer.allocUnsafe() both allocate this external memory through Node's C++ layer; small allocUnsafe() requests are sliced from a shared, pre-allocated pool. The memory is not reference-counted: it is released only after the garbage collector reclaims the JavaScript object that owns it. The GC knows a Buffer exists as a small JavaScript object, but the actual bytes that Buffer points to are in external memory that the GC cannot compact or move.

This has a concrete production implication: your heap snapshot will show a clean 50 MB heap while your process RSS sits at 2 GB because of accumulated Buffer allocations. Every heap profiling tool, every Chrome DevTools memory snapshot, every heapUsed metric will look completely healthy. You must monitor process.memoryUsage().external and process.memoryUsage().rss to detect Buffer-related memory growth.
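
The split is easy to observe. The sketch below allocates one large Buffer and compares the counters before and after; measureBufferAllocation is a hypothetical helper, and the exact numbers vary by platform and Node.js version.

```javascript
function toMiB(bytes) {
  return Math.round(bytes / 1024 / 1024);
}

// Hypothetical helper: reports how one allocation moves each memory counter.
function measureBufferAllocation(sizeBytes) {
  const before = process.memoryUsage();
  const held = Buffer.alloc(sizeBytes); // returned so the caller keeps it alive
  const after = process.memoryUsage();
  return {
    held,
    heapUsedDeltaMiB: toMiB(after.heapUsed - before.heapUsed),
    arrayBuffersDeltaMiB: toMiB(after.arrayBuffers - before.arrayBuffers)
  };
}

const report = measureBufferAllocation(64 * 1024 * 1024);
console.log({
  heapUsedDeltaMiB: report.heapUsedDeltaMiB,         // stays close to zero
  arrayBuffersDeltaMiB: report.arrayBuffersDeltaMiB  // grows by roughly the allocation size
});
```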

Buffer.alloc(size) zero-fills the allocated memory before returning it, which is safe but slower. Buffer.allocUnsafe(size) skips zero-filling, which makes it noticeably faster for large allocations (benchmark on your own hardware for the exact factor). The 'unsafe' name is precise: if you read from the returned memory before writing to it, it may contain bytes from previous allocations, such as fragments of other users' data, previous tokens, or partial file contents. For security-sensitive paths, always use Buffer.alloc(). For internal processing where you guarantee write-before-read, Buffer.allocUnsafe() is the correct production choice and the performance difference is real at high allocation rates.

io/thecodeforge/buffers/buffer-allocation.js (JavaScript)

const { Buffer } = require('buffer');

// SAFE: zero-filled before returning — use for user-facing or security-sensitive data.
// The zero-fill is not optional safety theater — it prevents information disclosure.
const safeBuf = Buffer.alloc(1024);
console.log('alloc [0]:', safeBuf[0]); // 0 — guaranteed, always

// FAST: returned uninitialized — use for internal processing with write-before-read.
// The 'unsafe' refers to information disclosure risk, not memory safety.
const fastBuf = Buffer.allocUnsafe(1024);
console.log('allocUnsafe [0]:', fastBuf[0]); // unpredictable — could be any byte

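// Small allocations (under half of Buffer.poolSize, so < 4 KB by default) are sliced
// from a shared, pre-allocated 8 KB pool.
// This means two small allocUnsafe calls may share an underlying ArrayBuffer.
// Each Buffer is bounds-checked to its own slice, but code that reaches into .buffer
// directly (for example new Uint8Array(buf.buffer)) sees, and can overwrite, its neighbours.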
const smallA = Buffer.allocUnsafe(100);
const smallB = Buffer.allocUnsafe(200);
console.log('Same backing store:', smallA.buffer === smallB.buffer); // true for small allocs

// The key metric to watch in production — NOT heapUsed.
// Buffer memory shows up in external and rss, not heapUsed.
const mem = process.memoryUsage();
console.log({
  rss:          `${(mem.rss          / 1024 / 1024).toFixed(1)} MB`, // total process memory
  heapUsed:     `${(mem.heapUsed     / 1024 / 1024).toFixed(1)} MB`, // V8 JS objects
  external:     `${(mem.external     / 1024 / 1024).toFixed(1)} MB`, // Buffer bytes
  arrayBuffers: `${(mem.arrayBuffers / 1024 / 1024).toFixed(1)} MB`  // ArrayBuffer bytes
});

// If external climbs while heapUsed stays flat — Buffer leak.

Output

alloc [0]: 0

allocUnsafe [0]: 47

Same backing store: true

{ rss: '28.4 MB', heapUsed: '5.8 MB', external: '0.3 MB', arrayBuffers: '0.3 MB' }

⚠ Buffer memory is invisible to heap profilers

process.memoryUsage().external is your only reliable indicator of Buffer memory growth. Heap snapshots will not show it. If your RSS climbs while heapUsed is stable, you have a Buffer leak — almost always from a broken backpressure chain.

📊 Production Insight

process.memoryUsage().external tracks Buffer memory that lives outside V8's heap.

Heap snapshots and V8 profilers will not reveal Buffer leaks — RSS climbing while heap stays flat is the only reliable telltale.

Rule: instrument process.memoryUsage() in your production health endpoint and alert on RSS growth rate, not just heapUsed.

🎯 Key Takeaway

Buffer memory lives outside V8's garbage-collected heap — this is intentional, not an oversight.

Heap profilers will not reveal Buffer leaks. Monitor process.memoryUsage().external and .rss.

If RSS climbs while heap stays flat, your Buffers are accumulating — find the backpressure break.

Buffer Allocation Decision

If: Buffer contains user input, authentication tokens, or data sent to clients
→ Use: Buffer.alloc(), which is zero-filled and prevents information disclosure

If: Internal processing with guaranteed write-before-read (file reads, binary protocol encoding)
→ Use: Buffer.allocUnsafe(), which is faster and safe when you control the write cycle

If: Allocation size < 4 KB and using allocUnsafe
→ Use: Buffer.alloc() or a pool you manage yourself, because small allocUnsafe() buffers share one backing ArrayBuffer

Stream Types and Their Internal State Machines

Node.js provides five stream types, each with a distinct role in a data pipeline. Understanding their internal state machines is not academic — it is what lets you diagnose production issues where streams silently stop flowing, emit data after destruction, or hold memory that the GC cannot reclaim.

Every Readable stream has two operating modes: paused and flowing. In paused mode — the default — data is buffered internally and you must explicitly call read() to pull chunks out. In flowing mode, data is pushed to you automatically via data events as fast as the underlying source can produce it. Calling .resume(), piping to a Writable, or attaching a data listener switches to flowing mode. The most common cause of 'stream hang' bugs I have debugged is a Readable created and then left in paused mode with no consumer attached — data accumulates in the internal buffer, the highWaterMark is crossed, and the underlying source pauses, and nothing ever flows. The process looks healthy. No error is emitted. Everything is just silently stuck.

Writable streams have a simpler state machine driven by the callback in _write(). When _write() invokes its callback, the stream is ready to process the next chunk. When the internal buffer reaches highWaterMark, write() returns false: this is the backpressure signal. The drain event fires once the buffered chunks have been flushed and it is safe to write again.

Duplex streams like TCP sockets combine both — independent Readable and Writable sides with independent state machines sharing one underlying resource. Transform streams like zlib.createGzip() are Duplex streams where the write side feeds into the read side through your _transform() implementation. PassThrough streams are identity Transforms useful for injecting inspection points into a pipeline without modifying data.

io/thecodeforge/streams/stream-state-inspection.js (JavaScript)

const { Readable, Writable, Transform, PassThrough } = require('stream');

// --- Readable state inspection ---
let remaining = 3; // emit three chunks, then end the stream
const readable = new Readable({
  highWaterMark: 16 * 1024, // 16 KB internal buffer
  read(size) {
    // Called whenever the consumer wants more data.
    if (remaining-- > 0) {
      // push() returns false once the internal buffer reaches highWaterMark.
      // A well-behaved source stops pushing then and waits for the next read() call.
      this.push(Buffer.from('data chunk'));
    } else {
      this.push(null); // null signals end of stream
    }
  }
});

console.log('Initial state:', {
  readableFlowing: readable.readableFlowing, // null = paused, no listeners
  readableLength:  readable.readableLength,   // 0 = nothing buffered yet
  readableEnded:   readable.readableEnded     // false = not done
});

readable.resume(); // switch to flowing mode — data events start firing

console.log('After resume:', {
  readableFlowing: readable.readableFlowing, // true
});

// --- Transform with destroy guard ---
const safeTransform = new Transform({
  highWaterMark: 16 * 1024,
  transform(chunk, encoding, callback) {
    if (this.destroyed) return callback();
    const processed = chunk.toString().toUpperCase();
    callback(null, Buffer.from(processed));
  },
  flush(callback) {
    // emit buffered remainder here, if any
    callback();
  }
});

// --- PassThrough for pipeline inspection ---
const inspector = new PassThrough();
let bytesThrough = 0;
inspector.on('data', chunk => {
  bytesThrough += chunk.length;
});
// Insert inspector between any two pipeline stages without affecting data flow.

Output

Initial state: { readableFlowing: null, readableLength: 0, readableEnded: false }

After resume: { readableFlowing: true }

Mental Model

Streams as a Factory Assembly Line

Think of a stream pipeline as a factory assembly line — each station processes one item at a time and passes it to the next at the pace the next station can handle. The conveyor belt does not speed up just because the first station is fast.

Readable = raw material supplier — produces data chunks on demand
Transform = processing station — modifies chunks and pushes downstream at downstream pace
Writable = packaging station — consumes final product and writes it
Backpressure = conveyor belt speed controller — pauses upstream when downstream is slow
highWaterMark = buffer shelf at each station — triggers pause signal when full

📊 Production Insight

A Readable with readableFlowing === null has no consumer and silently accumulates data in its internal buffer.

This is the most common cause of the 'stream hang' bug class in production.

Rule: check readableFlowing first when debugging a stream that appears to have stopped — if null, the stream is paused and waiting for a consumer that never showed up.

🎯 Key Takeaway

Readable has two modes (paused and flowing) — null readableFlowing is the silent hang state with no consumer.

Writable backpressure is driven by write() returning false and the drain event — ignore these and you get an OOM kill.

Always guard _transform() with a destroyed check to prevent post-destroy data emission into an already-closed stream.

Backpressure — The Flow Control Protocol

Backpressure is the single most important concept in Node.js Streams, and the one most frequently misunderstood in practice. It is not a rate limiter, not a throttle, and not a buffer size configuration. It is a cooperative protocol between a Readable and a Writable where the Writable signals 'I need you to slow down' and the Readable obliges — if the producer is paying attention.

The mechanism works through the return value of write(). When you call writable.write(chunk), the method returns true if the internal buffer is below highWaterMark and false if it is at or above it. When write() returns false, the protocol says the producer should stop writing and wait for the drain event before sending more data. If the producer ignores this signal and keeps calling write(), the data is still buffered — but in memory, without any bound, until the process runs out of memory and the OOM killer fires. There is no automatic enforcement. Backpressure is cooperative, not mandatory.

The highWaterMark is not a hard limit. This is the specific detail that most engineers who have read about streams still get wrong in interviews. It is a threshold (for binary streams 16,384 bytes by default before Node.js 22 and 65,536 bytes since; 16 objects for objectMode streams) at which write() starts returning false to signal the producer to pause. But the stream will still accept data beyond this point. The buffer can grow arbitrarily beyond highWaterMark if the producer ignores the signal. Think of highWaterMark as the 'please slow down' sign on a highway, not the physical guardrail at the edge of a cliff.
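
A stalled consumer makes this visible. In the sketch below, the writable never completes a write, yet every later write() call is still accepted; overfillStalledWritable is a hypothetical helper used only for the demonstration.

```javascript
const { Writable } = require('node:stream');

// Hypothetical helper: keeps writing into a consumer that has stopped responding.
function overfillStalledWritable() {
  const stalled = new Writable({
    highWaterMark: 16,
    write(chunk, encoding, callback) {
      // Never calls callback(): simulates a consumer that has stalled.
    }
  });

  const results = [];
  for (let i = 0; i < 10; i++) {
    results.push(stalled.write(Buffer.alloc(8))); // 8 bytes per write
  }

  return {
    results,                          // true once, then false for every later write
    buffered: stalled.writableLength, // 80 bytes queued against a 16-byte threshold
    needDrain: stalled.writableNeedDrain
  };
}

console.log(overfillStalledWritable());
```

Nothing throws and nothing is dropped: the only feedback the producer ever gets is the false return value.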

pipe() handles backpressure automatically between its direct neighbours: when the destination's write() returns false, pipe() calls readable.pause(). When drain fires, pipe() calls readable.resume(). This is why pipe() seems to work in simple cases. The failure mode, which only appears in production under sustained load, is a custom Transform that breaks the chain: for example, one that forwards data to another stream with write() and calls its callback immediately, regardless of whether that destination has drained.

io/thecodeforge/streams/backpressure-manual.js (JavaScript)

const fs = require('fs');

// Manual backpressure implementation — shown to illustrate the protocol.
// In production, use pipeline() which implements this correctly for you.
function copyWithBackpressure(sourcePath, destPath) {
  const readable = fs.createReadStream(sourcePath);
  const writable = fs.createWriteStream(destPath);

  readable.on('data', (chunk) => {
    const canContinue = writable.write(chunk);
    if (!canContinue) {
      // Backpressure engaged — pause the producer.
      readable.pause();
      // Resume only when writable has drained.
      writable.once('drain', () => readable.resume());
    }
  });

  readable.on('end', () => writable.end());

  // Error handling on both sides — without this, errors crash the process.
  readable.on('error', (err) => {
    console.error('Read error:', err.message);
    writable.destroy(err); // destroy the other stream too
  });
  writable.on('error', (err) => {
    console.error('Write error:', err.message);
    readable.destroy(err);
  });

  return new Promise((resolve, reject) => {
    writable.on('finish', resolve);
    writable.on('error', reject);
  });
}

Output

// No visible output — data flows from source to destination with flat memory.

// Use this pattern to understand the protocol; use pipeline() in production.

Mental Model

highWaterMark Is the Polite Request, Not the Physical Guardrail

highWaterMark is the point where the stream politely asks the producer to slow down — but it will still accept more data if the producer ignores the request. The guardrail at the cliff edge is your process's available RAM.

Default highWaterMark: 16 KB for binary streams before Node.js 22 (64 KB since), 16 objects for objectMode
write() returns false when buffered data >= highWaterMark — this is the backpressure signal
The stream still accepts data after write() returns false — it keeps buffering in memory without bound
Only pausing the producer (or using pipe/pipeline) actually stops the data flow
Tuning highWaterMark lower = more frequent pauses but lower peak memory; higher = smoother throughput but larger memory spikes

📊 Production Insight

Ignoring write() returning false is the single most common cause of OOM kills in stream-based Node.js services.

The buffer grows unbounded — there is no automatic circuit breaker unless you use pipe() or pipeline().

Rule: if you call write() manually in any loop or data handler, always check the return value and implement the pause/drain cycle. Or use pipeline() and let it do this correctly.

🎯 Key Takeaway

Backpressure is cooperative, not automatic — the producer must respect write() returning false or risk an OOM kill.

highWaterMark is advisory, not a hard limit — the stream keeps buffering if the signal is ignored.

The only way to actually stop data flow is to pause the readable or use pipe()/pipeline().

Backpressure Handling Strategy

If: You are manually piping data between two streams
→ Use: pipeline() from stream/promises, which handles backpressure, error propagation, and resource cleanup automatically

If: You are inside a custom Transform
→ Use: push() or callback(null, data) for output, which the built-in Transform throttles. If _transform() also writes to another stream, do not call callback() until that write() returns true or the destination emits 'drain'

If: You have long-lived pipelines with asymmetric speeds (fast producer, slow consumer)
→ Use: highWaterMark tuning, and monitor process.memoryUsage().external to detect backpressure breaks early

pipe() vs pipeline() — Error Propagation and Resource Cleanup

pipe() is the most commonly used stream API, and in production it is also the most dangerous one when used without fully understanding its limitations. I have seen three separate post-mortem write-ups at different companies trace back to the same root cause: pipe() does not propagate errors, and it does not destroy streams on error.

Here is the concrete failure mode. If you have readable.pipe(transform).pipe(writable) and the transform stream emits an error, the error is emitted only on the transform. The readable stream is not destroyed, so its file descriptor or socket stays open even though nothing will consume its data again. The writable stream is not notified and not destroyed either; it keeps waiting for data that will never arrive. Under sustained error conditions, such as a flaky upstream service that errors on 5% of requests, these leaked descriptors accumulate until the process hits the OS limit and new opens fail with EMFILE.

pipeline() from stream/promises solves both problems with one API call. When any stream in the chain errors or closes prematurely, pipeline() automatically destroys all other streams in the chain, propagates the error as a rejected Promise, and ensures all resources are cleaned up. pipeline() also accepts async generator functions as stages (in all currently supported Node.js releases), which lets you inject stateful processing logic, such as computing a hash or accumulating metrics, inline without writing a full Transform class.
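
The difference in cleanup can be checked with two throwaway streams. Both helpers below are hypothetical and exist only to compare source.destroyed after the destination fails.

```javascript
const { Readable, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');
const { once } = require('node:events');

function createPair() {
  const source = new Readable({ read() {} }); // idle source, stands in for a file or socket
  const dest = new Writable({ write(chunk, encoding, callback) { callback(); } });
  return { source, dest };
}

// pipe(): the destination fails, the source is left open.
async function pipeLeavesSourceOpen() {
  const { source, dest } = createPair();
  source.pipe(dest);
  const failure = once(dest, 'error'); // listen first so the error is not thrown
  dest.destroy(new Error('disk full'));
  await failure;
  const stillOpen = !source.destroyed;
  source.destroy(); // manual cleanup that pipe() did not do
  return stillOpen;
}

// pipeline(): the same failure destroys the source as well.
async function pipelineDestroysSource() {
  const { source, dest } = createPair();
  const done = pipeline(source, dest).catch((err) => err);
  dest.destroy(new Error('disk full'));
  await done;
  return source.destroyed;
}

pipeLeavesSourceOpen().then((open) => console.log('pipe: source still open?', open));
pipelineDestroysSource().then((gone) => console.log('pipeline: source destroyed?', gone));
```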

stream.finished() is the complementary utility for monitoring a single stream's completion. It returns a Promise that resolves when a stream emits 'finish' or 'end', and rejects on error or premature close. Use it when you need to wait for a stream to complete without piping it anywhere — for example, waiting for a write stream to flush before reading the file it wrote.

io/thecodeforge/streams/pipeline-production.js (JavaScript)

const { pipeline, finished } = require('stream/promises');
const fs     = require('fs');
const zlib   = require('zlib');
const crypto = require('crypto');

// Production pattern: hash the original upload bytes, then compress and write them.
// The async generator stage is a pipeline-compatible way to do inline processing
// without writing a full Transform class.
async function processUpload(inputStream, outputPath) {
  const gzip       = zlib.createGzip({ level: 6 });
  const fileStream = fs.createWriteStream(outputPath);
  const hasher     = crypto.createHash('sha256');

  await pipeline(
    inputStream,
    // Async generator as a pipeline stage. It sits before gzip so the digest
    // covers the uploaded bytes, not the compressed output.
    async function* (source) {
      for await (const chunk of source) {
        hasher.update(chunk);
        yield chunk;
      }
    },
    gzip,
    fileStream
  );

  return hasher.digest('hex');
}

// Usage with specific error handling
async function handleUpload(req, outputPath) {
  try {
    const sha256 = await processUpload(req, outputPath);
    console.log('Upload complete. SHA256:', sha256);
    return { success: true, hash: sha256 };
  } catch (err) {
    if (err.code === 'ERR_STREAM_PREMATURE_CLOSE') {
      console.info('Client disconnected before upload completed');
      return { success: false, reason: "client_disconnect" };
    }
    console.error('Upload failed:', err.message);
    return { success: false, reason: "processing_error", error: err.message };
  }
}

// stream.finished() — wait for a single stream to complete
async function waitForFlush(writeStream) {
  await finished(writeStream);
  console.log('Write stream fully flushed — safe to read the file now');
}

Output

Upload complete. SHA256: <64-character hex digest>

⚠ pipe() Leaves Streams Open on Error

If any stream in a pipe chain errors, the others are not destroyed. This can exhaust file descriptors under sustained errors. pipeline() auto-destroys all streams on any error.

📊 Production Insight

pipe() leaves all other streams open on error — under sustained failure conditions, this exhausts file descriptor limits with EMFILE errors.

pipeline() auto-destroys all streams in the chain on any error and surfaces the error as a rejected Promise.

Rule: use pipeline() for any multi-stream operation. The only defensible exception is a prototype or script where you have explicitly added error listeners and destroy() calls to every stream in the chain — and even then, pipeline() is shorter.

🎯 Key Takeaway

pipe() is convenient but does not forward errors between streams and leaves all other streams open when any one fails.

pipeline() is the production-safe default — it propagates errors and auto-destroys all streams in the chain.

Use stream.finished() for single-stream completion tracking and async generators for inline stateful processing.

Choosing pipe() vs pipeline()

If: Production code with multi-stream operations and error handling requirements → Use: pipeline() from stream/promises (auto-destroys streams, propagates errors, returns a Promise)

If: Need inline stateful processing without a full Transform class → Use: an async generator function as a pipeline() stage

If: Need to wait for a single stream to complete without piping → Use: stream.finished() from stream/promises

Building a Custom Transform Stream for Production

Custom Transform streams are where most backpressure bugs are born. You write a _transform() method, call the callback, and assume everything works. Then under load, memory grows, or data gets corrupted, or the stream hangs. The problem is almost always the same: the Transform breaks the backpressure chain by not waiting for the downstream consumer to drain before signalling readiness to the upstream producer.

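The contract of _transform() is simple but unforgiving: you receive a chunk, you process it, and you call the callback exactly once, optionally passing an error or the transformed data (callback(null, data) is shorthand for this.push(data) followed by callback()). The stream uses the timing of that callback to decide when to hand you the next chunk and, indirectly, when to ask upstream for more. If you call callback() before the work for that chunk has really finished, for example while an asynchronous write to a slow destination is still in flight, the upstream never pauses and your Transform becomes an unbounded buffer.

A production-grade Transform must respect the backpressure signal from its own readable side. When this.push() returns false, the readable buffer has reached its highWaterMark. For data you push inside _transform(), the built-in Transform class handles this for you: once the readable buffer is full, it holds back the next _transform() call until the consumer reads again. Note that 'drain' is a Writable event, so waiting for it on the Transform itself inside _transform() can deadlock, because the writable side cannot drain while the current callback is outstanding. If you are using a custom push mechanism or writing to additional destinations from inside the Transform, you must implement the wait yourself: when destination.write() returns false, wait for that destination's 'drain' event before calling the callback.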

Another critical pattern: always guard _transform() with a this.destroyed check. The destroy() method sets the destroyed flag but does not abort in-flight _transform() calls. Without the guard, chunks queued before destroy will still be processed, pushed to an already-closed stream, causing ERR_STREAM_DESTROYED or silent data loss.

And don't forget _flush(). It's called when the writable side ends. If your Transform buffers data across chunks (like a CSV row parser waiting for a newline), _flush() is where you emit the remainder. Forgetting _flush() means data loss at the end of every stream.

io/thecodeforge/streams/custom-transform-production.jsJAVASCRIPT

const { Transform } = require('stream');

class LineParser extends Transform {
  constructor(options = {}) {
    // objectMode: emit full lines as strings.
    // Copy the options instead of mutating the caller's object.
    super({ ...options, objectMode: true });
    this._buffer = '';
  }

  _transform(chunk, encoding, callback) {
    // 1. Guard against post-destroy processing
    if (this.destroyed) {
      return callback();
    }

    // Assumes chunks arrive as strings or end on character boundaries
    // (the usage below sets encoding: 'utf8' on the source).
    this._buffer += chunk.toString();
    const lines = this._buffer.split('\n');
    // Keep the last (potentially incomplete) piece in buffer
    this._buffer = lines.pop();

    // 2. Push every complete line from this chunk. The amount pushed per call
    //    is bounded by the size of one input chunk, and no line is dropped.
    for (const line of lines) {
      this.push(this._processLine(line));
    }

    // 3. Signal completion exactly once. If the readable buffer is now at or
    //    above its highWaterMark, Transform delays the next _transform() call
    //    until the consumer reads again, which is what pauses the upstream.
    //    Waiting for 'drain' on `this` here would hang: the writable side
    //    cannot drain while this callback is still outstanding.
    callback();
  }

  _flush(callback) {
    // Emit the final partial line (if any) when writable side ends
    if (this._buffer.length > 0) {
      this.push(this._processLine(this._buffer));
      this._buffer = '';
    }
    callback();
  }

  _processLine(line) {
    // Example processing: trim and uppercase
    return line.trim().toUpperCase();
  }
}

// Usage
const { pipeline } = require('stream/promises');
const fs = require('fs');

async function processFile(inputPath, outputPath) {
  const readable = fs.createReadStream(inputPath, { encoding: 'utf8' });
  const writable = fs.createWriteStream(outputPath);
  const parser = new LineParser();

  await pipeline(readable, parser, writable);
  console.log('File processed');
}

Output

File processed


🔥Always Implement _flush()

If your Transform buffers data across chunks, _flush() is the only place to emit the final partial piece. Forgetting it causes silent data loss at the end of every stream.

📊 Production Insight

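Calling the callback in _transform() before a chunk's work has actually finished (for example, before a write to a slow secondary destination completes) breaks backpressure: upstream never pauses and memory grows unbounded.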

Missing _flush() loses final data — a bug that only appears on the last chunk of every stream.

Rule: always guard with this.destroyed, respect push() return value, and implement _flush() for any buffering Transform.

🎯 Key Takeaway

Custom Transforms break backpressure when they signal completion early or ignore the return values of write() and push() outside the built-in push/callback path.

Always guard with this.destroyed and implement _flush() for buffering Transforms.

Production Transforms must be tested under asymmetric I/O speeds — not just unit tests.

Transform Implementation Checklist

If: Does your Transform buffer data across chunks? → Use: Implement _flush() to emit the final partial chunk

If: Are you calling callback() while a write to another destination is still pending or has returned false? → Use: Implement the pause/drain cycle: save state, wait for that destination's drain event, then call callback

If: Could _transform() run after the stream is destroyed? → Use: Add if (this.destroyed) return callback() at the top of _transform()
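
The advice to test Transforms under asymmetric I/O speeds can be turned into a small harness. The sketch below is one possible approach, not a standard tool: `measurePeakBuffered` is a hypothetical helper name, the 1 KB chunk size and 1 ms sink delay are arbitrary assumptions, and the measurement is only sampled each time the slow sink receives a chunk.

```javascript
const { Readable, Transform, Writable } = require('stream');
const { pipeline } = require('stream/promises');

// Pushes `chunkCount` 1 KB chunks through `transform` into a deliberately slow
// sink and reports the largest number of bytes the transform held at once.
async function measurePeakBuffered(transform, chunkCount = 200) {
  function* chunks() {
    for (let i = 0; i < chunkCount; i++) yield Buffer.alloc(1024, 0x61);
  }
  // objectMode: false makes the source a byte stream instead of an object stream
  const source = Readable.from(chunks(), { objectMode: false });

  let peak = 0;
  let received = 0;
  const slowSink = new Writable({
    highWaterMark: 1024,
    write(chunk, encoding, callback) {
      received += chunk.length;
      // Bytes waiting on both sides of the transform at this moment
      peak = Math.max(peak, transform.readableLength + transform.writableLength);
      setTimeout(callback, 1); // simulate a slow destination
    }
  });

  await pipeline(source, transform, slowSink);
  return { peak, received };
}

// Usage (hypothetical pass-through transform with a 4 KB highWaterMark):
// const t = new Transform({
//   highWaterMark: 4 * 1024,
//   transform(chunk, encoding, callback) { callback(null, chunk); }
// });
// measurePeakBuffered(t).then(console.log);
```

If the reported peak stays near the transform's highWaterMark while the total transferred is far larger, backpressure is reaching the source. A peak that grows with the input size points at a stage that signals completion too early.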

Why Streams Matter — The 10GB File Problem

Most developers learn streams when their production service OOMs on a file upload. The math is brutal. A 100MB file loaded into memory with readFileSync consumes 100MB of RAM. A 10GB file consumes 10GB. Your server has 4GB. Game over.

Streams sidestep this entirely. Instead of swallowing the whole file, they process data in chunks (64KB by default for fs.createReadStream). That means a 10GB file needs only a few chunks' worth of memory at any moment. Not a typo. The memory footprint stays roughly the same whether the file is 1MB or 100GB.

This isn't just about files. HTTP requests, database cursors, compression pipelines — any data source that produces bytes over time benefits from streaming. The alternative is buffering everything into a single blob, which scales linearly with data size. Streams scale to infinity, bounded only by disk I/O and network throughput.

The catch: streams demand a different mental model. You're not writing linear code anymore. You're orchestrating a pipeline where data flows asynchronously through stages. Get it wrong, and you'll face backpressure deadlocks, memory leaks, or silent data loss. But master it, and you can process datasets that would crash any naive implementation.

MemoryComparison.jsJAVASCRIPT

// io.thecodeforge — javascript tutorial

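const fs = require('fs');
const { pipeline } = require('stream/promises');

// ❌ Blows up on 10GB files
function naiveCopy(source, dest) {
  const data = fs.readFileSync(source);  // whole file in RAM (Node rejects files above its read size limit)
  fs.writeFileSync(dest, data);          // written from that same Buffer
  // Peak memory: at least the full file size
}

// ✅ Graceful at any size
async function streamCopy(source, dest) {
  const read = fs.createReadStream(source, { highWaterMark: 65536 }); // 64KB chunks
  const write = fs.createWriteStream(dest);
  await pipeline(read, write); // resolves only after the destination has finished
  // Peak memory: a few buffered chunks, independent of file size
}

// Usage
const FILE = './giant-log-2025-01-28.bin';
try {
  naiveCopy(FILE, '/dev/null');
} catch (err) {
  console.error('naiveCopy failed:', err.code || err.message);
}

streamCopy(FILE, '/dev/null')
  .then(() => console.log('Streamed successfully'))
  .catch((err) => console.error('Stream copy failed:', err.message));

Output

naiveCopy failed: ERR_FS_FILE_TOO_LARGE (typical for a 10GB file; the exact error depends on the Node.js version and available memory)

Streamed successfully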


⚠ Production Trap:

If your server runs out of memory and you see fs.readFileSync anywhere near large files, that's the culprit. Swap to streams before tuning the garbage collector.

🎯 Key Takeaway

Streams keep memory roughly constant (a few chunks of ~64KB) regardless of total data size. Always stream in production when the input size is unbounded or could approach available RAM.

Error Handling in Streams — The Silent Failure Trap

You piped a stream and nothing came out. No error. No output. Just a black hole. This is the classic stream surprise. When something goes wrong inside a pipe chain, the failure does not travel along the chain: the stream that failed emits 'error', and every other stream simply stops receiving data.

pipe() is the culprit. It propagates data and backpressure, but it does not propagate errors. If your read stream emits an 'error' event after piping, the write stream is never ended or destroyed. With an error listener on the source, you get a zombie pipeline with dangling file handles; without one, the unhandled 'error' event is thrown and takes the whole process down.

The fix has two layers. First, always attach error listeners to every stream in the pipe chain. Second, stop using pipe() and switch to pipeline() from the 'stream/promises' module. pipeline() rejects its Promise with the first error and destroys every stream in the chain. It's the production-grade replacement.

A destination that is never ended or destroyed keeps its file descriptor open. Repeat that on every failed request and you accumulate open descriptors until the OS refuses to open more (EMFILE). Seen it happen in a log aggregator that lost error events. 4000 file handles later, the kernel said no.

ErrorHandling.jsJAVASCRIPT

// io.thecodeforge — javascript tutorial

const { pipeline } = require('stream/promises');
const fs = require('fs');

// ❌ Fragile: pipe() does not forward errors
function fragilePipeline() {
  const read = fs.createReadStream('missing-file.log');
  const write = fs.createWriteStream('output.log');
  // If read fails, write is never closed; with no 'error' listener on read, the process crashes
  read.pipe(write);
}

// ✅ Production-ready: pipeline() propagates errors
async function robustPipeline() {
  try {
    const read = fs.createReadStream('missing-file.log');
    const write = fs.createWriteStream('output.log');
    await pipeline(read, write);
  } catch (err) {
    console.error('Pipeline failed:', err.code, err.message);
    // Cleanup is automatic — close events fire
  }
}

// Test
robustPipeline();
// Output: Pipeline failed: ENOENT ENOENT: no such file or directory, open 'missing-file.log'

Output

Pipeline failed: ENOENT ENOENT: no such file or directory, open 'missing-file.log'


💡Critical:

Every stream in a pipe() chain needs its own error listener. Without one, a failed read stream throws an unhandled 'error' event; with a listener only on the source, the destination is left open and leaks its file descriptor. pipeline() handles this. pipe() does not.

🎯 Key Takeaway

Never use bare pipe() in production code. Always use pipeline() to propagate errors and prevent resource leakage.
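
The difference in cleanup behaviour can be observed directly with in-memory streams. The following is a minimal sketch, assuming a hypothetical source that fails on its first read; `failingSource`, `discardSink`, `viaPipe`, and `viaPipeline` are illustrative names, not library APIs.

```javascript
const { Readable, Writable, pipeline } = require('stream');

// Hypothetical source that fails as soon as it is asked for data.
function failingSource() {
  return new Readable({
    read() {
      this.destroy(new Error('boom'));
    }
  });
}

// Sink that accepts and discards every chunk.
function discardSink() {
  return new Writable({
    write(chunk, encoding, callback) {
      callback();
    }
  });
}

// pipe(): the source error is reported, but the destination is left untouched.
function viaPipe() {
  return new Promise((resolve) => {
    const source = failingSource();
    const dest = discardSink();
    // Without this listener the 'error' event would be thrown as an uncaught exception.
    source.on('error', (err) => {
      setImmediate(() => resolve({ error: err.message, destDestroyed: dest.destroyed }));
    });
    source.pipe(dest);
  });
}

// pipeline(): the same failure destroys every stream in the chain.
function viaPipeline() {
  return new Promise((resolve) => {
    const source = failingSource();
    const dest = discardSink();
    pipeline(source, dest, (err) => {
      setImmediate(() => resolve({ error: err.message, destDestroyed: dest.destroyed }));
    });
  });
}
```

With pipe(), `destDestroyed` stays false after the failure, which is how file descriptors leak; with pipeline(), it is true by the time the callback runs.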

Scenario A: Massive Database Exports (MongoDB / PostgreSQL)

When exporting millions of rows from MongoDB or PostgreSQL, collecting the full result set into an array can cause OOM crashes. Streams solve this by processing rows in flight. The critical pattern is piping a database cursor through a Transform stream that formats rows (CSV/JSON) and writes to a file or HTTP response. Without backpressure handling, the database cursor outruns the file system, causing memory buildup. The fix: expose the cursor as a Readable stream so that rows are fetched only when the downstream has room, and let pipeline() connect the stages. For PostgreSQL, use pg-query-stream; for MongoDB, use the driver's cursor.stream() or a Mongoose query cursor, which is already a Readable. Never load the entire dataset into an array. Always connect the stages with pipeline() for cleanup on errors.

MongoExport.jsJAVASCRIPT

// io.thecodeforge — javascript tutorial

import { pipeline } from 'stream/promises';
import { Transform } from 'stream';
// Assumes `Model` is a Mongoose model imported from your own code, with the connection already open.

const cursor = Model.find().batchSize(1000).cursor();
const transform = new Transform({
  objectMode: true,
  transform(row, enc, cb) {
    this.push(JSON.stringify(row) + '\n');
    cb();
  }
});

await pipeline(cursor, transform, process.stdout);
console.log('Done — no OOM');

Output

Done — no OOM


⚠ Production Trap:

pg-query-stream silently buffers rows if backpressure is ignored. Always pair with pipeline(), never raw .pipe(), to abort on error.

🎯 Key Takeaway

Backpressure from Transform stream prevents database cursor from flooding memory.
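
The same export shape can be tried without a database. In this sketch, `rows` is any iterable or async iterable of plain objects, a hypothetical stand-in for a real cursor, and `exportNdjson` is an illustrative helper name. Because the formatting stage is an async generator inside pipeline(), the next row is only pulled when the destination has room.

```javascript
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Streams `rows` to `destination` as newline-delimited JSON and returns the row count.
// `rows` stands in for a database cursor; `destination` is any Writable.
async function exportNdjson(rows, destination) {
  let count = 0;
  await pipeline(
    Readable.from(rows),
    async function* (source) {
      for await (const row of source) {
        count++;
        yield JSON.stringify(row) + '\n';
      }
    },
    destination
  );
  return count;
}

// Usage (hypothetical):
// const fs = require('fs');
// await exportNdjson(someCursor, fs.createWriteStream('export.ndjson'));
```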

Scenario B: AI Text Generation (The Web Streams API)

Modern AI APIs like OpenAI and Anthropic can return chat completions as streams. Using the Web Streams API directly in Node.js lets you process tokens as they arrive, which is critical for real-time UI updates. The pattern: fetch the endpoint, take response.body (a ReadableStream), pipe it through a TextDecoderStream to get strings, then process each chunk. Key nuance: an AbortSignal on the fetch cancels the request and the stream together, so no further chunks are delivered after cancellation. Do NOT accumulate the whole response in a buffer before forwarding it; instead, push each chunk to a Transform stream for formatting or directly to an HTTP response. This keeps time-to-first-token low and lets you cancel mid-request without waste.

AIStream.jsJAVASCRIPT

// io.thecodeforge — javascript tutorial

const response = await fetch(url, {
  headers: { 'Authorization': `Bearer ${apiKey}` },
  signal: AbortSignal.timeout(30000)
});

const reader = response.body
  .pipeThrough(new TextDecoderStream())
  .getReader();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(value); // stream tokens immediately
}
console.log('\nStream complete');

Output

Stream complete


⚠ Production Trap:

Forgetting AbortSignal causes hanging connections on client disconnect. Always pass a signal from request lifecycle.

🎯 Key Takeaway

Web Streams API + AbortSignal gives you real-time AI output with cancellation safety.
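
The read loop can be factored into a reusable function that also honours a caller-supplied signal. This is a sketch under assumptions: `streamText` and `onToken` are hypothetical names, and `body` can be any web ReadableStream of UTF-8 bytes, such as a fetch response body. TextDecoderStream keeps incomplete multi-byte characters until the next chunk arrives, so tokens are never split mid-character.

```javascript
// Reads a web ReadableStream of UTF-8 bytes, calling onToken for each decoded piece.
// Resolves with the full text received before the stream ended or was cancelled.
async function streamText(body, onToken, signal) {
  const reader = body.pipeThrough(new TextDecoderStream()).getReader();
  // Cancelling the reader makes the pending read() resolve with done: true.
  const onAbort = () => reader.cancel(signal.reason).catch(() => {});
  if (signal) {
    if (signal.aborted) onAbort();
    else signal.addEventListener('abort', onAbort, { once: true });
  }

  let text = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      text += value;
      onToken(value);
    }
  } finally {
    if (signal) signal.removeEventListener('abort', onAbort);
    reader.releaseLock();
  }
  return text;
}

// Usage (hypothetical): tie cancellation to the incoming HTTP request
// const controller = new AbortController();
// req.on('close', () => controller.abort());
// await streamText(response.body, (t) => res.write(t), controller.signal);
```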

Video Streaming with Range Requests

Streaming video over HTTP requires support for Range requests to enable seeking and efficient bandwidth usage. Without it, a player cannot jump to a position without downloading everything before it, and a server that answers by reading the whole file into memory risks OOM. Node.js streams can be combined with HTTP Range headers to serve partial content. The key is to parse the Range header, create a read stream with start and end options, and respond with a 206 Partial Content status. This approach keeps memory usage proportional to the chunk size, not the file size. For production, use fs.createReadStream with start and end parameters, and set the Content-Range header. Always validate the range, answer unsatisfiable ranges with 416, and handle edge cases like If-Range and multiple ranges (though a single range is most common). This pattern is essential for media servers and large file downloads.

video-stream.jsJAVASCRIPT

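const http = require('http');
const fs = require('fs');
const path = require('path');
const { pipeline } = require('stream');

const server = http.createServer((req, res) => {
  const filePath = path.join(__dirname, 'video.mp4');
  const stat = fs.statSync(filePath); // acceptable in a demo; prefer fs.promises.stat under load
  const fileSize = stat.size;
  const range = req.headers.range;

  // pipeline() destroys the file stream if the client disconnects mid-response
  const send = (stream) => pipeline(stream, res, (err) => {
    if (err && err.code !== 'ERR_STREAM_PREMATURE_CLOSE') {
      console.error('Stream error:', err.message);
    }
  });

  if (!range) {
    res.writeHead(200, {
      'Content-Type': 'video/mp4',
      'Content-Length': fileSize,
      'Accept-Ranges': 'bytes'
    });
    send(fs.createReadStream(filePath));
    return;
  }

  // Accept only a single "bytes=start-end" or "bytes=start-" range
  const match = /^bytes=(\d+)-(\d*)$/.exec(range);
  const start = match ? parseInt(match[1], 10) : NaN;
  const end = match && match[2]
    ? Math.min(parseInt(match[2], 10), fileSize - 1)
    : fileSize - 1;

  if (!match || start >= fileSize || start > end) {
    // Malformed or unsatisfiable range
    res.writeHead(416, { 'Content-Range': `bytes */${fileSize}` });
    res.end();
    return;
  }

  const chunkSize = end - start + 1;

  res.writeHead(206, {
    'Content-Range': `bytes ${start}-${end}/${fileSize}`,
    'Accept-Ranges': 'bytes',
    'Content-Length': chunkSize,
    'Content-Type': 'video/mp4'
  });

  send(fs.createReadStream(filePath, { start, end }));
});

server.listen(3000);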

Output

Server listens on port 3000. Clients can seek and stream video with low memory usage.


⚠ Range Validation

Always validate the range to prevent out-of-bounds errors. A malformed range can crash the server or expose sensitive data.

📊 Production Insight

In production, add support for conditional requests (If-Range, ETag) and consider using a CDN for edge caching.

🎯 Key Takeaway

Use Range requests with fs.createReadStream to stream video without loading the entire file into memory.
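
Range validation is easiest to get right as a small pure function that can be unit tested on its own. The sketch below is one way to do it; `parseRange` is a hypothetical helper, it handles only a single range, and it returns null for anything the server should answer with 416.

```javascript
// Parses a single-range "Range" header against a known file size.
// Returns { start, end } (inclusive byte offsets) or null if the header is
// malformed or cannot be satisfied.
function parseRange(header, fileSize) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header || '');
  if (!match || fileSize <= 0) return null;

  const [, rawStart, rawEnd] = match;
  if (rawStart === '' && rawEnd === '') return null;

  let start;
  let end;
  if (rawStart === '') {
    // Suffix form "bytes=-N": the last N bytes of the file
    const suffixLength = parseInt(rawEnd, 10);
    if (suffixLength === 0) return null;
    start = Math.max(fileSize - suffixLength, 0);
    end = fileSize - 1;
  } else {
    start = parseInt(rawStart, 10);
    // Open-ended "bytes=N-" runs to the end; oversized ends are clamped
    end = rawEnd === '' ? fileSize - 1 : Math.min(parseInt(rawEnd, 10), fileSize - 1);
  }

  if (start > end || start >= fileSize) return null;
  return { start, end };
}

// Usage:
// parseRange('bytes=0-99', 1000);  // { start: 0, end: 99 }
// parseRange('bytes=-200', 1000);  // { start: 800, end: 999 }
// parseRange('bytes=1000-', 1000); // null (respond with 416)
```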

AES-256 Encryption Pipeline

Encrypting large files on the fly requires streaming to avoid OOM. Node.js provides the crypto module with createCipheriv and createDecipheriv, whose return values work as transform streams. You can pipe a readable stream through a cipher and then to a writable stream. For AES-256-GCM, you need a 32-byte key, an initialization vector (IV) that is unique for every encryption, and the authentication tag that the cipher produces once encryption finishes. The pipeline must handle errors and close streams properly. Use pipeline from the stream module to ensure cleanup. This pattern is common for secure file uploads, backup encryption, and real-time data protection. Remember to store the IV and auth tag alongside the ciphertext for decryption.

encrypt-pipeline.jsJAVASCRIPT

const { createReadStream, createWriteStream } = require('fs');
const { createCipheriv, randomBytes } = require('crypto');
const { pipeline } = require('stream');

const algorithm = 'aes-256-gcm';
const key = randomBytes(32);
const iv = randomBytes(12); // 96-bit IV, the recommended size for GCM; never reuse one with the same key

const input = createReadStream('input.txt');
const output = createWriteStream('output.enc');
const cipher = createCipheriv(algorithm, key, iv);

pipeline(input, cipher, output, (err) => {
  if (err) {
    console.error('Encryption failed:', err);
  } else {
    console.log('Encryption complete. Key:', key.toString('hex'));
    console.log('IV:', iv.toString('hex'));
    console.log('Auth tag:', cipher.getAuthTag().toString('hex'));
  }
});

Output

Encrypts input.txt to output.enc using AES-256-GCM. Key, IV, and auth tag are printed for decryption.


💡Key Management

Never hardcode keys. Use environment variables or a secrets manager. Rotate keys regularly.

📊 Production Insight

For GCM mode, always store the auth tag (16 bytes) with the ciphertext. For password-based encryption, derive the key with a key derivation function (e.g., PBKDF2 or scrypt) and a random salt.

🎯 Key Takeaway

Use crypto.createCipheriv as a transform stream in a pipeline to encrypt large files without buffering.
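
Decryption mirrors the encryption pipeline, with one extra step: the auth tag must be supplied before the decipher finishes. The round trip below is a sketch using in-memory streams so it can run without files; `collector`, `encrypt`, and `decrypt` are hypothetical helper names, and a real service would stream to disk or the network instead of collecting into a Buffer.

```javascript
const { createCipheriv, createDecipheriv, randomBytes } = require('crypto');
const { Readable, Writable } = require('stream');
const { pipeline } = require('stream/promises');

// In-memory sink for the demo; replace with a file or network stream in practice.
function collector() {
  const chunks = [];
  const sink = new Writable({
    write(chunk, encoding, callback) {
      chunks.push(chunk);
      callback();
    }
  });
  sink.result = () => Buffer.concat(chunks);
  return sink;
}

async function encrypt(plainSource, key) {
  const iv = randomBytes(12); // fresh IV for every encryption
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const out = collector();
  await pipeline(plainSource, cipher, out);
  // The tag is only available after the cipher has been finalized.
  return { iv, authTag: cipher.getAuthTag(), ciphertext: out.result() };
}

async function decrypt({ iv, authTag, ciphertext }, key) {
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(authTag); // must be set before the stream ends
  const out = collector();
  // Rejects if the ciphertext or tag was tampered with.
  await pipeline(Readable.from([ciphertext]), decipher, out);
  return out.result();
}

// Usage:
// const key = randomBytes(32);
// const sealed = await encrypt(Readable.from(['hello ', 'world']), key);
// const plain = await decrypt(sealed, key); // Buffer containing "hello world"
```

Note that GCM verifies the tag only at the end of the stream, so any plaintext written before the pipeline resolves should be treated as unauthenticated and discarded if the pipeline rejects.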

Web Streams API Bridging

Node.js 18+ supports the Web Streams API, which is compatible with browser APIs. This allows you to use ReadableStream, WritableStream, and TransformStream in Node.js. Bridging is done with Readable.toWeb() and Writable.toWeb() (Node.js stream to Web Stream) and with Readable.fromWeb() and Writable.fromWeb() (Web Stream to Node.js stream). This is useful for isomorphic code or when using libraries that expect Web Streams (e.g., fetch response body). Web Streams do implement backpressure: pipeTo() and pipeThrough() honor it automatically, but if you drive a reader or writer by hand, you must respect it yourself (for a writer, await writer.ready before each write). For production, prefer Node.js streams for file I/O and Web Streams for HTTP responses or browser-compatible APIs.

web-streams-bridge.jsJAVASCRIPT

const { Readable, Writable } = require('stream');
const { pipeline } = require('stream/promises');
const fs = require('fs');

// Convert Node.js stream to Web Stream
const webStream = Readable.toWeb(fs.createReadStream('large-file.txt'));

// Use with fetch API or other Web Stream consumers
const reader = webStream.getReader();
reader.read().then(({ done, value }) => {
  if (!done) {
    console.log('Chunk:', value);
  }
  // Cancel so the underlying file stream is released
  return reader.cancel();
});

// Wrap a Web WritableStream so Node.js streams can write to it
const webWritable = new WritableStream({
  write(chunk) {
    console.log('Received chunk of size', chunk.length);
  }
});
const nodeWritable = Writable.fromWeb(webWritable);

// Use a separate source here: a stream should have only one consumer
pipeline(fs.createReadStream('large-file.txt'), nodeWritable)
  .catch((err) => console.error('Bridge failed:', err.message));

Output

Converts a Node.js read stream to a Web ReadableStream and back, demonstrating interoperability.


🔥Backpressure in Web Streams

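pipeTo() and pipeThrough() apply backpressure automatically; there is no option to switch it on. The pipeTo() options (preventClose, preventAbort, preventCancel, signal) only control how closing and errors propagate. When writing manually, await writer.ready before each writer.write().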

📊 Production Insight

Avoid mixing stream types in the same pipeline due to subtle differences in error handling. Stick to one type per pipeline.

🎯 Key Takeaway

Use Readable.toWeb and Writable.fromWeb to bridge Node.js and Web Streams for isomorphic code.
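
The opposite direction, consuming a Web ReadableStream with Node.js tooling, goes through Readable.fromWeb(). A minimal sketch follows; `collectFromWeb` is a hypothetical helper name, and collecting into a single Buffer is only sensible for small payloads.

```javascript
const { Readable } = require('stream');

// Wraps a Web ReadableStream of bytes as a Node.js Readable and collects it.
// Async iteration pulls one chunk at a time, so the web stream is only read
// as fast as this loop consumes it.
async function collectFromWeb(webReadable) {
  const nodeReadable = Readable.fromWeb(webReadable);
  const chunks = [];
  for await (const chunk of nodeReadable) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Usage (hypothetical): a fetch response body is a Web ReadableStream
// const res = await fetch(url);
// const body = await collectFromWeb(res.body);
```

For large payloads, pass the result of Readable.fromWeb() to pipeline() instead of collecting it.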

S3 Multipart Upload with @aws-sdk/lib-storage

Uploading large files to S3 calls for multipart upload so the whole file never has to sit in memory. The @aws-sdk/lib-storage package provides an Upload class that handles multipart uploads with streams. It splits the stream into parts and uploads them in parallel. You can configure partSize and queueSize to control memory usage. upload.done() returns a Promise that resolves when the upload completes. This is far more efficient than reading the entire file into memory. For production, start from a partSize of 5MB (the S3 minimum) and a queueSize of 4, then tune from there. Upload consumes the stream passed as Body itself, so handle failures by catching the rejection from done().

s3-multipart-upload.jsJAVASCRIPT

const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');
const { createReadStream } = require('fs');

const s3Client = new S3Client({ region: 'us-east-1' });
const fileStream = createReadStream('large-file.bin');

const upload = new Upload({
  client: s3Client,
  params: {
    Bucket: 'my-bucket',
    Key: 'large-file.bin',
    Body: fileStream
  },
  partSize: 5 * 1024 * 1024, // 5MB
  queueSize: 4
});

upload.on('httpUploadProgress', (progress) => {
  console.log('Progress:', progress.loaded, '/', progress.total);
});

upload.done().then(() => {
  console.log('Upload complete');
}).catch((err) => {
  console.error('Upload failed:', err);
});

Output

Uploads large-file.bin to S3 using multipart upload with progress tracking.


⚠ Memory with queueSize

Each in-flight part is buffered in memory, so peak buffer usage is roughly partSize × queueSize. queueSize controls concurrency. If memory is tight, reduce queueSize (or partSize) to avoid OOM.

📊 Production Insight

Monitor progress events for user feedback. Set partSize based on network conditions; larger parts reduce API calls but increase memory per part.

🎯 Key Takeaway

Use @aws-sdk/lib-storage's Upload class with a stream to perform memory-efficient multipart uploads to S3.
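
The partSize and queueSize trade-off is simple arithmetic that is worth checking before an upload starts. The sketch below is a back-of-the-envelope calculator, not part of the AWS SDK: `planMultipart` is a hypothetical helper, the peak-buffer figure is an estimate based on one buffered part per concurrent upload, and the limits encoded are S3's 5 MiB minimum part size and 10,000-part maximum.

```javascript
const MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum part size (the last part may be smaller)
const MAX_PARTS = 10000;               // S3 maximum number of parts per upload

// Estimates part count and buffered memory for a multipart upload.
function planMultipart(fileSize, partSize = MIN_PART_SIZE, queueSize = 4) {
  if (partSize < MIN_PART_SIZE) {
    throw new RangeError('partSize must be at least 5 MiB');
  }
  const parts = Math.ceil(fileSize / partSize);
  return {
    parts,
    fitsPartLimit: parts <= MAX_PARTS,
    // Rough estimate: one part buffered per concurrent upload slot
    approxPeakBufferBytes: partSize * queueSize
  };
}

// Usage:
// planMultipart(1024 ** 3);
// // { parts: 205, fitsPartLimit: true, approxPeakBufferBytes: 20971520 }
```

A 100 GiB file at 5 MiB per part would need 20,480 parts, which exceeds the limit, so very large files need a larger partSize and therefore a lower queueSize to keep the same memory budget.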

highWaterMark Tuning Guide

The highWaterMark option controls the internal buffer threshold of a stream. For readable streams, it's the number of bytes (or objects in object mode) the stream buffers before it stops reading from the underlying source. For writable streams, it's the point at which write() starts returning false. It is a threshold, not a hard cap: a producer that ignores the signal can still buffer without limit. Tuning this value can prevent OOM and improve throughput. A larger highWaterMark reduces the frequency of read() calls but increases memory usage. A smaller value reduces memory but may cause more I/O operations. fs.createReadStream defaults to 64KB. For network streams, consider 16KB to 32KB. For high-latency connections, increase it to 256KB. Set highWaterMark deliberately on both the readable and writable sides of a pipeline. Monitor memory with process.memoryUsage() and adjust accordingly. There's no one-size-fits-all; benchmark with realistic data.

highWaterMark-tuning.jsJAVASCRIPT

const fs = require('fs');
const { Transform } = require('stream');

// Create a readable stream with custom highWaterMark
const readable = fs.createReadStream('large-file.txt', {
  highWaterMark: 16 * 1024 // 16KB
});

// Create a transform stream with highWaterMark
const transform = new Transform({
  highWaterMark: 32 * 1024, // 32KB
  transform(chunk, encoding, callback) {
    this.push(chunk.toString().toUpperCase());
    callback();
  }
});

// Create a writable stream with highWaterMark
const writable = fs.createWriteStream('output.txt', {
  highWaterMark: 64 * 1024 // 64KB
});

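const { pipeline } = require('stream');

// pipeline() instead of pipe(): errors surface here and all three streams are destroyed
pipeline(readable, transform, writable, (err) => {
  if (err) {
    console.error('Pipeline failed:', err.message);
  } else {
    console.log('Done');
  }
});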

Output

Processes a file with custom buffer sizes. Adjust based on memory and throughput needs.


💡Benchmark First

Always benchmark with your actual workload. A high highWaterMark can cause OOM if the stream is paused or slow.

📊 Production Insight

Use highWaterMark on both ends of a pipeline. For object mode, set it to a reasonable count (e.g., 16) to avoid memory bloat.

🎯 Key Takeaway

Tune highWaterMark to balance memory usage and throughput. Defaults are safe but not optimal for all scenarios.
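
What highWaterMark actually does on the writable side can be seen in a few lines. The sketch below uses a deliberately tiny 10-byte threshold and a sink that holds its callbacks, both assumptions chosen to make the behaviour visible; `demoHighWaterMark` is a hypothetical name.

```javascript
const { Writable } = require('stream');

// Shows when write() starts returning false and when 'drain' fires.
function demoHighWaterMark() {
  const pending = [];
  const slow = new Writable({
    highWaterMark: 10, // bytes
    write(chunk, encoding, callback) {
      pending.push(callback); // hold the callback to simulate a stalled destination
    }
  });

  // 4 + 4 = 8 bytes is below the threshold; the third write reaches 12 bytes.
  const results = [
    slow.write(Buffer.alloc(4)),
    slow.write(Buffer.alloc(4)),
    slow.write(Buffer.alloc(4))
  ];
  const snapshot = {
    results,
    buffered: slow.writableLength,
    needDrain: slow.writableNeedDrain
  };

  return new Promise((resolve) => {
    slow.once('drain', () => {
      resolve({ ...snapshot, bufferedAfterDrain: slow.writableLength });
    });
    // Complete the held writes one at a time; 'drain' fires when the buffer empties.
    const release = () => {
      const callback = pending.shift();
      if (callback) {
        callback();
        setImmediate(release);
      }
    };
    setImmediate(release);
  });
}

// Usage:
// demoHighWaterMark().then(console.log);
// // { results: [ true, true, false ], buffered: 12, needDrain: true, bufferedAfterDrain: 0 }
```

The third write is still accepted and buffered even though it returned false, which is why the threshold only protects memory when the producer stops and waits for 'drain'.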


Performance Benchmarks

Benchmarking stream performance is critical to avoid OOM and optimize throughput. Use benchmark.js for in-process measurements or autocannon for HTTP streams. Measure memory usage with process.memoryUsage() and throughput in MB/s. Compare different highWaterMark values, stream types (e.g., Transform vs PassThrough), and pipeline configurations. For file I/O, fs.createReadStream with a highWaterMark of 64KB is a good baseline. For network streams, test with realistic latency. Always run benchmarks on production-like hardware. Key metrics: peak memory, average throughput, and latency. Document results to guide future tuning. Example: streaming a 1GB file through a transform stream with a highWaterMark of 16KB used 50MB memory and achieved 200MB/s throughput, while a highWaterMark of 256KB used 200MB memory and achieved 400MB/s. Treat such figures as illustrative and measure on your own hardware. Choose based on your memory budget.

benchmark-streams.jsJAVASCRIPT

const { performance } = require('perf_hooks');
const fs = require('fs');
const { Transform, pipeline } = require('stream');

const fileSize = 1024 * 1024 * 1024; // 1GB
const highWaterMarks = [16 * 1024, 64 * 1024, 256 * 1024];

function benchmark(hwm) {
  const start = performance.now();
  // /dev/zero never ends on its own, so stop after exactly fileSize bytes
  const readable = fs.createReadStream('/dev/zero', { highWaterMark: hwm, end: fileSize - 1 });
  const transform = new Transform({
    highWaterMark: hwm,
    transform(chunk, encoding, callback) {
      callback(null, chunk);
    }
  });
  const writable = fs.createWriteStream('/dev/null', { highWaterMark: hwm });

  return new Promise((resolve, reject) => {
    // pipeline() reports errors from any stage, not only from the writable
    pipeline(readable, transform, writable, (err) => {
      if (err) return reject(err);
      const duration = (performance.now() - start) / 1000;
      const throughput = fileSize / duration / 1024 / 1024;
      resolve({ hwm, duration, throughput });
    });
  });
}

(async () => {
  for (const hwm of highWaterMarks) {
    const result = await benchmark(hwm);
    console.log(`highWaterMark: ${hwm/1024}KB, Duration: ${result.duration.toFixed(2)}s, Throughput: ${result.throughput.toFixed(2)} MB/s`);
  }
})();

Output

Benchmarks throughput for different highWaterMark values. Example output: highWaterMark: 64KB, Duration: 2.50s, Throughput: 400.00 MB/s


🔥Realistic Data

Benchmark with data that mimics your production workload. /dev/zero is not realistic; use actual file content.

📊 Production Insight

Automate benchmarks in CI to catch regressions. Monitor memory in production with tools like clinic.js.

🎯 Key Takeaway

Benchmark stream configurations to find the optimal balance between memory and throughput for your use case.

● Production incident · POST-MORTEM · severity: high

The 2 AM OOM Kill: How a Missing drain Handler Crashed Our Upload Service

Symptom

Upload service container killed by the OOM killer (exit code 137) under moderate load. RSS climbed linearly from 120 MB to 3.8 GB in under 90 seconds. No error logs appeared — just a silent SIGKILL. The first signal in the dashboards was the container restart counter incrementing, not any application-level error. By the time the on-call engineer connected, three nodes had already cycled.

Assumption

The team assumed pipe() handles all flow control automatically and that Node.js Streams are always memory-safe by default. This is a reasonable assumption if you have only read the documentation without implementing streams under asymmetric I/O conditions. The code had been in production for months handling normal upload volumes without incident — which made the assumption feel validated.

Root cause

The writable destination, an S3 multipart upload stream from the AWS SDK, could only accept data at roughly 5 MB/s due to network throughput constraints. The readable source, a fast local NVMe SSD, could produce data at roughly 500 MB/s. The pipe() call internally paused the readable when write() returned false, which was correct. The problem was in the custom Transform stream sitting between them. The Transform's _transform() method called its callback immediately without waiting for the underlying S3 stream to drain. This broke the backpressure chain at exactly the wrong point: the Transform kept accepting chunks from the readable and calling its callback, which told the readable to continue, so the readable was never signalled to pause. The S3 stream was a 100:1 speed mismatch away, and the Transform was buffering everything in between with no bound. Total accumulation rate: approximately 40 MB/s of Buffer memory that was still referenced and therefore could not be collected.

Fix

Replaced the manual Transform wrapper with pipeline() from stream/promises, which enforces backpressure across the entire chain including async generators. Ensured the Transform's _transform() method waited for the underlying S3 stream's drain event before calling its callback. Added a highWaterMark of 16 KB on the Transform to limit in-flight chunks. Added RSS and external memory monitoring to the health endpoint, with a 500 MB RSS threshold that returns HTTP 503, giving the load balancer visibility into memory pressure before the OOM killer acts.

Key lesson

pipe() only propagates backpressure to direct neighbours — wrapping a stream in a Transform breaks the chain unless the Transform correctly propagates write() return values all the way through
Always test upload paths under asymmetric speed conditions — a fast local SSD and a slow S3 stream is not an edge case, it is the production reality for any service that accepts uploads
Monitor RSS, not just heap — Buffer memory lives outside V8's garbage collector and will not show up in heap snapshots or heapUsed metrics
Add memory-based health check thresholds that return 503 before the OOM killer fires — the load balancer cannot shed load if the process gives no signal that it is in trouble
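
The memory-based health check from the last lesson can be sketched as follows. The 500 MB threshold comes from the incident above; the `/health` path, the helper names `healthStatus` and `createHealthServer`, and the response shape are assumptions for illustration.

```javascript
const http = require('http');

const RSS_LIMIT_BYTES = 500 * 1024 * 1024; // threshold from the incident; tune per container

// Pure decision function so the threshold logic can be unit tested.
function healthStatus(usage = process.memoryUsage(), limit = RSS_LIMIT_BYTES) {
  const healthy = usage.rss < limit;
  return {
    statusCode: healthy ? 200 : 503,
    body: {
      healthy,
      rss: usage.rss,
      external: usage.external, // Buffer memory shows up here, not in heapUsed
      heapUsed: usage.heapUsed
    }
  };
}

function createHealthServer() {
  return http.createServer((req, res) => {
    if (req.url !== '/health') {
      res.writeHead(404);
      res.end();
      return;
    }
    const { statusCode, body } = healthStatus();
    res.writeHead(statusCode, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}

// Usage (hypothetical port):
// createHealthServer().listen(8080);
```

Returning 503 lets the load balancer stop routing new uploads to the instance while in-flight streams finish and release their Buffers.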

Production debug guide: diagnose stream and buffer issues in production Node.js services

Symptom · 01

RSS grows linearly but heap stays flat

→

Fix

Check for Buffer accumulation: start the service with --expose-gc and log memory from inside the running process, for example with setInterval(() => { global.gc(); console.log(process.memoryUsage()); }, 5000), then watch the external and rss values over time. If external climbs after each GC cycle, Buffers are being allocated and not released. Cross-reference with stream activity: if RSS growth correlates with upload or download volume, the backpressure chain is broken somewhere in the pipeline.

Symptom · 02

Writable stream never fires the finish event

→

Fix

Verify the final callback in _write() is being called on every code path — add a temporary console.log immediately before each callback() invocation in your _write() and _transform() methods. The most common cause is a code path that returns early without calling the callback, which hangs the stream indefinitely. Also check whether the stream's end() method was called on the writable side — if the readable ended but end() was never called, finish will not fire.

Symptom · 03

pipe() stops transferring data mid-stream with no error

→

Fix

Check if write() returned false and the readable was paused but never resumed. Listen for the drain event on the writable to confirm backpressure engaged: writable.on('drain', () => console.log('drained')). If drain never fires, the writable may be stalled waiting for an underlying resource — a network socket, a slow disk, or a rate-limited API. Check process.memoryUsage().external to confirm whether data is accumulating in memory rather than flowing.

Symptom · 04

Transform stream emits data after destroy() was called

→

Fix

Guard _transform() with a destroyed check at the top: if (this.destroyed) return callback(). The destroy() method sets the destroyed flag but does not immediately abort in-flight _transform() calls that are already executing. Without this guard, chunks queued before destruction will still be processed and pushed, which can cause downstream errors or unexpected behavior on already-closed streams.

Symptom · 05

ERR_STREAM_PREMATURE_CLOSE in production logs

→

Fix

A stream in a pipeline was destroyed before all data was flushed — most commonly because a client disconnected mid-upload or mid-download. Use pipeline() with async/await and catch the specific error code to handle this gracefully rather than letting it propagate as an unhandled rejection. For upload services, distinguish ERR_STREAM_PREMATURE_CLOSE from other errors so you can clean up partial S3 multipart uploads without logging a false alarm.
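The handling described above might look like the following sketch. The transfer function and its cleanup parameter are hypothetical; cleanup stands in for whatever partial state needs removing, such as aborting a multipart upload.

```javascript
const { pipeline } = require('node:stream/promises');

// Hypothetical wrapper: runs a pipeline and separates client disconnects from real failures.
async function transfer(source, destination, cleanup) {
  try {
    await pipeline(source, destination);
    return 'completed';
  } catch (err) {
    await cleanup(); // remove partial state in every failure case
    if (err.code === 'ERR_STREAM_PREMATURE_CLOSE') {
      // Expected when a client goes away mid-transfer: no alert needed.
      return 'client-disconnected';
    }
    throw err; // anything else is a genuine failure and should propagate
  }
}
```

In an HTTP upload handler this could be called as transfer(req, uploadStream, abortUpload), where uploadStream and abortUpload are placeholders for the application's own objects. Log a 'client-disconnected' result at a lower level than a thrown error.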

★ Streams and Buffers: Quick Debug Reference

Rapid diagnostics for stream and memory issues in production Node.js applications

Memory leak suspected from Buffers

Immediate action

Capture heap snapshot and check Buffer count — but remember external memory will not appear in the snapshot

Commands

node --inspect app.js

node -e "const v8=require('v8');v8.writeHeapSnapshot()"

Fix now

Replace Buffer.allocUnsafe with Buffer.alloc in user-facing paths. Ensure Buffers are dereferenced after use. Instrument process.memoryUsage().external in your health endpoint and watch for growth that correlates with stream activity.

Stream hangs: no data flowing

High CPU usage from stream processing

Backpressure not working: consumer overwhelmed

Streams vs Buffers vs Pipe vs Pipeline

Feature	Stream	Buffer	pipe()	pipeline()
What it is	Async iterator over data chunks	Fixed-size binary container	Method to connect streams	Production-safe stream chaining
Memory location	Internal queue of chunks (Buffer data is external memory)	External memory (outside the V8 heap)	N/A	N/A
Backpressure	Built-in via highWaterMark	N/A (fixed size)	Automatic between direct neighbours	Automatic across entire chain
Error propagation	Emits 'error' event	N/A	Does not propagate errors	Propagates all errors; destroys all streams
Resource cleanup on error	Depends on consumer	N/A	Does not clean up	Auto-destroys all streams
Async generator support	N/A	N/A	No	Yes (Node 18+)
Production recommendation	Use `pipeline()` instead of manual pipe	Use Buffer.alloc for security, allocUnsafe for performance	Avoid in production	Always use for multi-stream operations

⚙ Quick Reference

15 commands from this guide

File	Command / Code	Purpose
iothecodeforgebuffersbuffer-allocation.js	const { Buffer } = require('buffer');	Buffer Internals
iothecodeforgestreamsstream-state-inspection.js	const { Readable, Writable, Transform, PassThrough } = require('stream');	Stream Types and Their Internal State Machines
iothecodeforgestreamsbackpressure-manual.js	const fs = require('fs');	Backpressure
iothecodeforgestreamspipeline-production.js	const { pipeline } = require('stream/promises');	pipe() vs pipeline()
iothecodeforgestreamscustom-transform-production.js	const { Transform } = require('stream');	Building a Custom Transform Stream for Production
MemoryComparison.js	const fs = require('fs');	Why Streams Matter
ErrorHandling.js	const { pipeline } = require('stream/promises');	Error Handling in Streams
MongoExport.js	const cursor = Model.find().batchSize(1000).cursor();	Scenario A
AIStream.js	const response = await fetch(url, {	Scenario B
video-stream.js	const http = require('http');	Video Streaming with Range Requests
encrypt-pipeline.js	const { createReadStream, createWriteStream } = require('fs');	AES-256 Encryption Pipeline
web-streams-bridge.js	const { Readable } = require('stream');	Web Streams API Bridging
s3-multipart-upload.js	const { S3Client } = require('@aws-sdk/client-s3');	S3 Multipart Upload with @aws-sdk/lib-storage
highWaterMark-tuning.js	const fs = require('fs');	highWaterMark Tuning Guide
benchmark-streams.js	const { performance } = require('perf_hooks');	Performance Benchmarks

Key takeaways

Streams process data in chunks without loading everything into memory, enabling handling of files larger than RAM.

Buffers live outside V8's heap: heap profilers won't reveal their memory usage; monitor external and RSS instead.

Backpressure is cooperative, not automatic: ignoring write() returning false leads to unbounded memory growth.

highWaterMark is advisory: the stream will still accept data beyond it if the producer ignores the signal.

pipe() does not propagate errors: always use pipeline() from stream/promises in production.

Custom Transform streams break backpressure if _transform() calls callback() synchronously without checking push() return value.

Always guard _transform() with a destroyed check and implement _flush() for buffering Transforms.

Video Streaming with Range Requests

Serve partial content with HTTP Range headers and fs.createReadStream using the start and end options. This streams video without buffering the entire file, prevents OOM, and enables seeking.

AES-256 Encryption Pipeline

Encrypt large files on the fly by passing a readable stream through crypto.createCipheriv inside pipeline(), never pipe() alone. Prepend the IV and append the auth tag so the file can be decrypted later.

S3 Multipart Upload

Use @aws-sdk/lib-storage's Upload class with a stream to upload large files to S3 without loading them into memory.

highWaterMark Tuning

Adjust highWaterMark to balance memory and throughput. Defaults are safe; profile to find optimal values for your workload.
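The Range-request takeaway above depends on turning the header into validated byte offsets. A minimal sketch follows, assuming single-range requests only; parseRange is a hypothetical helper, and a real handler would also distinguish a missing header (reply 200 with the full file) from an unsatisfiable one (reply 416).

```javascript
// Hypothetical helper: parse a single-range "bytes=start-end" header against a known file size.
// Returns { start, end } (both inclusive) or null when the header is absent,
// malformed, or cannot be satisfied for this size.
function parseRange(header, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header ?? '');
  if (!match || (match[1] === '' && match[2] === '')) return null;

  let start;
  let end;
  if (match[1] === '') {
    // Suffix form "bytes=-500": the last 500 bytes of the file.
    const suffix = Number(match[2]);
    if (suffix === 0) return null;
    start = Math.max(size - suffix, 0);
    end = size - 1;
  } else {
    start = Number(match[1]);
    // Open-ended form "bytes=500-" runs to the end; oversized ends are clamped.
    end = match[2] === '' ? size - 1 : Math.min(Number(match[2]), size - 1);
  }
  if (start > end || start >= size) return null;
  return { start, end };
}

console.log(parseRange('bytes=0-99', 1000));
console.log(parseRange('bytes=-200', 1000));
```

The returned object can be passed straight to fs.createReadStream(path, { start, end }), since both options are inclusive. The 206 response then carries Content-Range: bytes start-end/size and a Content-Length of end - start + 1.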

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Explain the concept of backpressure in Node.js Streams. How does highWat...

Q02 · SENIOR

Why might a heap snapshot show only 50 MB while RSS is 2 GB? How would y...

Q03 · SENIOR

What is the difference between pipe() and pipeline() in Node.js? When wo...

Q04 · SENIOR

Describe the internal state machine of a Readable stream. What does read...

Q05 · SENIOR

How does Buffer.allocUnsafe differ from Buffer.alloc, and what are the i...

Q01 of 05 · SENIOR

Explain the concept of backpressure in Node.js Streams. How does highWaterMark influence it?

ANSWER

Backpressure is a cooperative flow-control mechanism between a Readable and a Writable stream. When the Writable's internal buffer reaches highWaterMark (16 KB by default for binary streams, raised to 64 KB in Node.js 22), write() returns false, signalling the producer to pause. The correct producer behaviour is to stop writing and wait for the drain event before resuming. highWaterMark is not a hard limit: if the producer ignores the false return, the buffer keeps growing in memory until the process runs out of memory. pipe() and pipeline() automate this protocol, but custom Transforms can break the chain by calling the _transform() callback immediately without waiting for downstream drain.
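The protocol in this answer can be shown in a few lines. This is a sketch rather than a production writer: writeAll is a hypothetical helper, and the tiny highWaterMark and artificial delay in the demo sink exist only to make backpressure engage quickly.

```javascript
const { once } = require('node:events');
const { Writable } = require('node:stream');

// Writes every chunk in order, pausing whenever write() reports a full buffer.
// Resolves with the number of times the producer had to wait for 'drain'.
async function writeAll(writable, chunks) {
  let pauses = 0;
  for (const chunk of chunks) {
    if (!writable.write(chunk)) {
      pauses++;
      await once(writable, 'drain'); // resume only after the buffer has emptied
    }
  }
  writable.end();
  await once(writable, 'finish');
  return pauses;
}

// Demo sink: 4-byte highWaterMark and a slow _write so the buffer fills at once.
const slowSink = new Writable({
  highWaterMark: 4,
  write(chunk, encoding, callback) {
    setTimeout(callback, 10);
  },
});

writeAll(slowSink, ['aaaa', 'bbbb', 'cc'])
  .then((pauses) => console.log('producer paused', pauses, 'times'));
```

Removing the await on 'drain' still delivers every chunk, which is why the bug is easy to miss: the data simply accumulates in the writable's buffer instead of slowing the producer down.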

FAQ · 10 QUESTIONS

Frequently Asked Questions

What is the default highWaterMark for a Node.js stream?

Can a process run out of memory because of a stream?

How do I check if a stream is in flowing mode?

What does ERR_STREAM_PREMATURE_CLOSE mean?

Why does my custom Transform lose the last chunk of data?

How do I handle Range requests for video streaming in Node.js?

Can I use Web Streams API in Node.js for file I/O?

What is the optimal highWaterMark for a file read stream?

What's the difference between `pipe()` and `pipeline()` for encryption?

How do I convert a Node.js stream to a Web Stream for use with fetch?

Naren, Founder & Principal Engineer

20+ years shipping production JavaScript and front-end systems at scale. Drawn from code that ran under real load.

✓ Verified · production tested

Last updated: July 19, 2026

2,466 articles · all by Naren
