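Node.js runs JavaScript on a single-threaded event loop. libuv handles network I/O through OS readiness APIs (epoll, kqueue, IOCP) and runs file system, DNS lookup, and some crypto and zlib work on a thread pool (default 4 threads, tunable via UV_THREADPOOL_SIZE)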
The event loop has 6 distinct phases (timers, pending, idle/prepare, poll, check, close) — blocking any phase freezes the entire process for every connected client
cluster module forks one process per CPU core, sharing no memory but enabling true parallel request handling across cores
Worker threads share memory via SharedArrayBuffer and handle CPU-bound work without blocking the event loop — use a fixed pool, never spawn per request
Memory leaks typically come from unbounded caches, forgotten timers, or closure-retained references — detect with --inspect and heap snapshot comparison
Production profiling (flamegraphs, clinic.js) reveals bottlenecks that synthetic benchmarks never expose
Node.js 22 LTS (supported until April 2027) ships with a stable native test runner, a built-in WebSocket client, and improved single-executable application support; none of these changes affect the event loop model described here
Plain-English First
Imagine a single brilliant chef running your entire restaurant kitchen alone. They're fast, genuinely fast, but if one order needs twenty minutes of hands-on work (like whisking meringue by hand), every other customer waits. The chef can't serve table two while their hands are busy with table one's order. Putting a cake in the oven and serving other tables while it bakes is fine; that is what non-blocking I/O looks like. Node.js is that chef: one thread, blazing fast for quick tasks, but one slow synchronous operation can freeze everything else behind it. Optimising Node.js means teaching that chef to delegate long hands-on jobs to kitchen assistants, batch similar tasks efficiently, and never stand still doing nothing when there are orders waiting. Clustering is like opening multiple identical kitchens. Worker threads are the kitchen assistants for heavy prep work.
Every Node.js app starts fast. Then reality hits — traffic spikes, database queries pile up, memory climbs steadily overnight, and that 'non-blocking' promise starts feeling like a lie. The truth is Node.js is genuinely efficient by design, but that efficiency has a very specific shape. Violate that shape and you'll hit performance walls that no amount of horizontal scaling will fully fix.
Node.js is single-threaded, but not single-concurrent. The event loop, backed by OS-level readiness APIs (epoll, kqueue, IOCP) for network I/O and by libuv's thread pool for file system and other blocking work, handles thousands of concurrent I/O operations without spawning a thread per operation. The problem starts when you confuse 'non-blocking I/O' with 'can do anything without consequence'. A JSON.parse on a 50MB payload, a synchronous crypto operation, or a tight computation loop will block the event loop, and every connected client pays the price simultaneously.
In 2026, Node.js 22 is a supported long-term support line (it moved to maintenance LTS in October 2025, when Node.js 24 became the active LTS line), and Node.js 20 reaches end-of-life in April 2026. Node.js 22 ships V8 12.4, enables the built-in WebSocket client by default, and includes the stable built-in test runner; on Linux, libuv can also use io_uring for some file system operations where the kernel supports it. None of these changes alter the event loop model or the clustering architecture described in this guide: the fundamentals remain exactly what they were, and the mistakes remain exactly as costly.
This guide covers the internals that matter in production: event loop phases and timing, clustering for multi-core utilisation, worker threads for CPU-bound work, memory leak patterns and detection, and the profiling tools that reveal what synthetic benchmarks consistently miss. Every section is grounded in production behaviour, not toy examples.
Event Loop Internals: Phases, Timing, and What Blocks Everything
The Node.js event loop is not a simple FIFO queue. It runs in six distinct phases, each with its own callback queue, executed in a fixed order every tick. Understanding this order is the difference between writing code that performs predictably in production and code that surprises you at 3 AM on-call.
The phases in execution order: timers (setTimeout/setInterval callbacks whose delay has expired), pending callbacks (I/O callbacks deferred from the previous iteration, such as some TCP error notifications), idle/prepare (internal libuv and Node.js housekeeping; you don't interact with this), poll (new I/O events; this is where most work happens, and where the loop may block waiting for events when nothing else is queued), check (setImmediate callbacks), and close callbacks (socket.on('close', ...) handlers).
The microtask queues are drained after every individual callback, and therefore before every phase transition (the per-callback behaviour for timers and setImmediate dates from Node.js 11). The process.nextTick queue drains first, then resolved Promise.then handlers. This ordering matters more than most engineers realise: process.nextTick fires before any Promise.then, and both fire before the loop moves on to its next callback or phase.
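A small experiment makes the per-callback draining visible. In this sketch (observeMicrotaskDraining is a name chosen for illustration), three setImmediate callbacks are queued for the same check phase; the nextTick and Promise jobs scheduled by the first one run before the second one starts, which is the behaviour of Node.js 11 and later.

```javascript
// Records the order in which callbacks run during one check phase and
// resolves with that order once the third setImmediate callback runs.
function observeMicrotaskDraining() {
  return new Promise((resolve) => {
    const order = [];

    setImmediate(() => {
      order.push('immediate 1');
      // Both are queued while immediate 1 is running. They drain as soon as
      // this callback returns: the nextTick queue first, then Promise jobs.
      Promise.resolve().then(() => order.push('promise from immediate 1'));
      process.nextTick(() => order.push('nextTick from immediate 1'));
    });

    setImmediate(() => order.push('immediate 2'));

    // Runs last in the same check phase and hands back the recorded order.
    setImmediate(() => resolve(order));
  });
}

observeMicrotaskDraining().then((order) => console.log(order.join(' -> ')));
// immediate 1 -> nextTick from immediate 1 -> promise from immediate 1 -> immediate 2
```

Note that the nextTick callback was queued after the Promise job yet still ran first, because Node.js empties the nextTick queue before running Promise microtasks.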
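The critical practical insight: setImmediate callbacks run in the check phase, immediately after the poll phase of the current iteration, while setTimeout(fn, 0) callbacks wait for a timers phase. Inside an I/O callback, that means setImmediate always fires first, because the loop reaches check before it wraps around to timers. Outside an I/O callback (for example, at the top level of the main module), the order of the two is non-deterministic, because it depends on whether the 1ms timer has already expired when the loop starts. This isn't academic trivia. It determines the ordering of operations in streaming pipelines, connection teardown sequences, and graceful shutdown logic, where getting the order wrong causes data loss or unclosed handles.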
On Linux, libuv (the I/O layer beneath Node.js 22) can use io_uring for some file system operations where the kernel supports it, which reduces syscall overhead for those operations. The event loop phase model is unchanged; what changes is how cheaply those file operations complete before their callbacks run in the poll phase.
event-loop-phases.js
const fs = require('fs');

// Demonstrates the phase ordering. Run this and study the output.
fs.readFile(__filename, () => {
  // We are now inside an I/O callback (poll phase).

  // Timer phase of the NEXT iteration
  setTimeout(() => console.log('3: setTimeout'), 0);

  // Check phase of THIS iteration, so always before the setTimeout above
  setImmediate(() => console.log('2: setImmediate'));

  // Microtask: runs as soon as this callback returns, before the check phase
  Promise.resolve().then(() => console.log('1: Promise.then'));

  // Microtask: the nextTick queue drains before Promise callbacks
  process.nextTick(() => console.log('0: nextTick'));
});

// Output order:
// 0: nextTick      (nextTick queue, highest priority)
// 1: Promise.then  (Promise microtasks, after the nextTick queue is empty)
// 2: setImmediate  (check phase, this iteration)
// 3: setTimeout    (timer phase, NEXT iteration)

// ── Production pattern: yield the event loop during batch processing ───────
// Without yielding, processing 100,000 items blocks the loop for the full duration.
// With setImmediate between batches, I/O and other requests can interleave.
function processBatchWithYield(items, batchSize, processFn, onComplete) {
  let index = 0;

  function processNextBatch() {
    const batchEnd = Math.min(index + batchSize, items.length);

    // Process one batch synchronously
    for (; index < batchEnd; index++) {
      processFn(items[index]);
    }

    if (index < items.length) {
      // Yield to the event loop so pending I/O and HTTP requests can run.
      // setImmediate beats setTimeout(fn, 0) here: no 1ms timer floor,
      // and it runs in the check phase right after poll.
      setImmediate(processNextBatch);
    } else {
      onComplete();
    }
  }

  processNextBatch();
}

// Usage. Placeholder data and transform so the snippet runs; substitute your own.
const largeArray = Array.from({ length: 100_000 }, (_, i) => i);
const transform = (n) => n * 2;

processBatchWithYield(
  largeArray,
  500, // Process 500 items per batch before yielding
  (item) => transform(item),
  () => console.log('All items processed')
);

// ── Measuring actual event loop lag in production ──────────────────────────
// Schedule a task, then measure how late it actually runs versus expected.
// The delta is your event loop lag.
function measureEventLoopLag() {
  const INTERVAL_MS = 1000;
  let lastCheck = Date.now();

  const timer = setInterval(() => {
    const now = Date.now();
    const lag = now - lastCheck - INTERVAL_MS;
    lastCheck = now;
    if (lag > 50) {
      console.warn(`Event loop lag: ${lag}ms, investigate blocking operations`);
    }
    // In production: expose this as a Prometheus gauge
  }, INTERVAL_MS);

  // unref() so the monitor alone never keeps the process alive
  timer.unref();
}

measureEventLoopLag();
Output
// Output from the phase ordering demo:
0: nextTick
1: Promise.then
2: setImmediate
3: setTimeout
// Output from measureEventLoopLag() under normal load:
// (no output — lag under 50ms)
// Output from measureEventLoopLag() when a synchronous operation blocks the loop:
process.nextTick() cuts the queue entirely: it runs before the event loop moves on to any other callback or phase, making it useful for deferring work within the current operation but dangerous if overused (recursive calls can starve I/O)
Promise.then() callbacks also run as microtasks, after nextTick is exhausted but before any phase transition
The poll phase is where the loop spends most of its time — it processes I/O callbacks and may block waiting for new I/O events when the queue is empty and no timers are pending
setImmediate was designed specifically for yielding inside I/O callbacks — it fires before the next timer phase without the 1ms minimum delay that setTimeout carries
Blocking any phase for more than 10-20ms degrades latency for every connected client simultaneously — there is no isolation between requests in a single Node.js process
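Node.js also ships a built-in way to measure this: perf_hooks.monitorEventLoopDelay() samples loop delay into a histogram (values in nanoseconds), which yields percentiles rather than the single reading a setInterval check produces. The sketch below deliberately blocks the loop for about 200ms with a busy-wait so the spike shows up; blockFor, startLoopDelayMonitor, and measureBlockingDemo are hypothetical helpers, and the 20ms resolution is an arbitrary choice.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Busy-waits for roughly `ms` milliseconds: a stand-in for any synchronous hot spot.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else on this thread can run meanwhile
  }
}

// Starts a histogram-backed monitor. The histogram records nanoseconds.
function startLoopDelayMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return {
    // Returns p50/p99/max in milliseconds and resets for the next window.
    report() {
      const snapshot = {
        p50: histogram.percentile(50) / 1e6,
        p99: histogram.percentile(99) / 1e6,
        max: histogram.max / 1e6,
      };
      histogram.reset();
      return snapshot;
    },
    stop() {
      histogram.disable();
    },
  };
}

// Blocks the loop once, then reports what the monitor saw.
function measureBlockingDemo() {
  const monitor = startLoopDelayMonitor();
  return new Promise((resolve) => {
    setTimeout(() => blockFor(200), 50);
    setTimeout(() => {
      const snapshot = monitor.report();
      monitor.stop();
      resolve(snapshot);
    }, 400);
  });
}

measureBlockingDemo().then(({ p50, p99, max }) => {
  console.log(`p50=${p50.toFixed(1)}ms p99=${p99.toFixed(1)}ms max=${max.toFixed(1)}ms`);
});
```

In a long-running service, calling report() on an interval and exporting p99 and max as gauges gives a per-window view of how often the loop stalls, not just whether it stalled once.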
Production Insight
In production, the single most common event loop blocker is JSON.parse on large request bodies.
Parsing a 10MB JSON payload takes 50-200ms depending on structure complexity and CPU generation — during that entire window, zero other requests are processed.
On Node.js 22 with V8 12.x, JSON.parse performance has improved, but the blocking nature has not changed — it is still synchronous.
Rule: stream-parse large bodies with a streaming JSON parser, or enforce a strict body size limit at the reverse proxy (nginx, AWS ALB) before the payload reaches Node.js.
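A proxy limit can also be backed up inside the service. The sketch below uses only Node.js streams (readBodyWithLimit is a hypothetical helper, not a library API): it buffers a request body but stops reading as soon as the byte count crosses a limit, so an oversized payload never reaches JSON.parse. It assumes the stream yields Buffers, which is what an http.IncomingMessage does unless setEncoding() has been called.

```javascript
const { Readable } = require('stream');

// Buffers a body from a readable stream, rejecting with statusCode 413 as soon
// as more than limitBytes have arrived. On overflow it pauses the stream rather
// than destroying it, so an HTTP caller can still send a 413 response.
function readBodyWithLimit(stream, limitBytes) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    let received = 0;

    function onData(chunk) {
      received += chunk.length;
      if (received > limitBytes) {
        // Removing the listener alone would not pause a flowing stream.
        stream.removeListener('data', onData);
        stream.pause();
        const err = new Error(`Body exceeds ${limitBytes} bytes`);
        err.statusCode = 413; // Payload Too Large
        reject(err);
        return;
      }
      chunks.push(chunk);
    }

    stream.on('data', onData);
    stream.once('end', () => resolve(Buffer.concat(chunks)));
    // Kept attached for the stream's lifetime; extra reject calls are no-ops.
    stream.on('error', reject);
  });
}

// Example: an in-memory stream standing in for an HTTP request body.
readBodyWithLimit(Readable.from([Buffer.alloc(600), Buffer.alloc(600)]), 1024)
  .catch((err) => console.log(err.statusCode, err.message)); // 413 Body exceeds 1024 bytes

// Inside an http.createServer handler, the same helper would be used as:
//   const body = await readBodyWithLimit(req, 1_000_000);
//   const payload = JSON.parse(body.toString('utf8'));
// and a 413 error answered with res.writeHead(413, { Connection: 'close' }).end().
```

Pausing instead of destroying is a deliberate choice: destroying the request would also tear down the socket, leaving no way to tell the client why the request failed.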
Key Takeaway
The event loop has six phases with fixed execution order — microtasks (nextTick, Promise.then) drain between every phase transition.
Blocking any phase blocks all concurrent connections for the full duration — there is no request isolation in a single process.
Yield long synchronous work with setImmediate between batches, or offload to worker threads for work that cannot be chunked.
On Linux, libuv can use io_uring for some async file operations; the phase model is unchanged, and only the cost of those file operations in the poll phase improves.
Choosing Between setTimeout, setImmediate, and nextTick
If you need to defer work until after I/O callbacks complete in the current event loop iteration → use setImmediate. It runs in the check phase immediately after poll, with no artificial timer delay.
If you need work to run before any I/O, timer, or check callbacks (highest-priority deferral) → use process.nextTick. It runs as a microtask before the loop moves on, but use it sparingly: called recursively, it can starve I/O.
If you need a minimum wall-clock delay before execution → use setTimeout. Node.js treats any delay below 1ms as 1ms, and the delay is a lower bound rather than a guarantee; Node.js 22 does not remove this floor.
If you are processing a large array or dataset without blocking the event loop → batch with setImmediate between chunks, yielding after every N items so pending I/O and incoming requests can interleave.
If you have CPU-bound work that will take more than 10ms → offload it to a worker thread via piscina. Don't try to chunk it with setImmediate if the work cannot be easily divided.
Clustering: Scaling Across CPU Cores Without Shared State
Node.js runs on a single thread. A single process on a 32-core machine uses roughly 3% of available CPU — the other 97% sits idle while your users wait. The cluster module solves this by forking the main process into N worker processes, each with its own event loop, its own V8 heap, and its own memory space. The primary process accepts incoming connections and distributes them to workers via IPC.
Clustering is not the same as load balancing across machines. Cluster workers share the same physical host: same CPU, same RAM, same network interface, and the same OS-level limits such as the system-wide file descriptor ceiling. The scaling ceiling is not just CPU. As you add workers, shared resources become the actual bottleneck. Each worker process needs its own database connection pool, so a database that allows 20 connections, split across 16 workers, leaves each worker about 1.25 connections on average, which largely serialises concurrent database work inside each worker.
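Connection distribution is controlled by the cluster scheduling policy. On Linux and macOS the default is round-robin (cluster.SCHED_RR): the primary accepts each connection and hands it to a worker over IPC, at the cost of a small per-connection overhead. On Windows the default is cluster.SCHED_NONE, which leaves distribution to the operating system: workers accept connections directly. That sounds neutral but can produce significantly uneven worker load in practice, with some workers handling 3x more requests than others depending on connection arrival timing. Setting NODE_CLUSTER_SCHED_POLICY to rr or none (or assigning cluster.schedulingPolicy before forking) overrides the platform default.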
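In code, the policy is a property of the cluster module. This minimal sketch forces round-robin regardless of platform; the assignment has to happen before the first cluster.fork(), because Node.js freezes the policy once a worker is spawned or cluster.setupPrimary() is called. Running with NODE_CLUSTER_SCHED_POLICY=rr in the environment has the same effect.

```javascript
const cluster = require('cluster');

// SCHED_RR: the primary accepts connections and deals them out round-robin.
// SCHED_NONE: workers accept directly and the OS decides who gets each one.
// Must run before the first cluster.fork(); the policy is frozen after that.
cluster.schedulingPolicy = cluster.SCHED_RR;

if (cluster.isPrimary) {
  const policy = cluster.schedulingPolicy === cluster.SCHED_RR ? 'round-robin' : 'OS-scheduled';
  console.log(`Cluster scheduling policy: ${policy}`);
}
```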
In Kubernetes in 2026, the question of whether to cluster at all has a nuanced answer. If your pod is allocated 1 CPU, clustering gives you nothing — you have one core. If your pod is allocated 4 CPUs, clustering with 3-4 workers utilises those cores. The K8s-preferred pattern is one process per pod (no clustering) with the orchestrator managing pod count for horizontal scaling — this gives you cleaner isolation, simpler crash recovery, and better resource accounting. But if your workload has high per-request CPU variance (some requests are light, some are heavy), clustering within a multi-CPU pod still wins on tail latency.
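Inside a container, os.cpus().length reports the CPUs the host exposes, not the pod's CPU quota, which is why the cluster example below reads WEB_CONCURRENCY first. When that variable isn't injected, one option is to read the quota directly. This sketch assumes cgroup v2, where /sys/fs/cgroup/cpu.max holds "<quota> <period>" (or "max <period>" when unlimited), and falls back to the host count elsewhere; parseCpuMax and workerCount are hypothetical helper names.

```javascript
const fs = require('fs');
const os = require('os');

// Parses cgroup v2 cpu.max contents: "<quota> <period>" or "max <period>".
// Returns the CPU limit as a (possibly fractional) number, or null if unlimited.
function parseCpuMax(contents) {
  const [quota, period] = contents.trim().split(/\s+/);
  if (quota === 'max') return null;
  const cpus = Number(quota) / Number(period);
  return Number.isFinite(cpus) && cpus > 0 ? cpus : null;
}

// Worker count: WEB_CONCURRENCY if set, else the cgroup CPU limit (rounded
// down, minimum 1), else the number of CPUs this process may run on.
function workerCount() {
  const fromEnv = parseInt(process.env.WEB_CONCURRENCY, 10);
  if (fromEnv > 0) return fromEnv;

  const hostCpus = typeof os.availableParallelism === 'function'
    ? os.availableParallelism() // Node.js 18.14+
    : os.cpus().length;

  try {
    const limit = parseCpuMax(fs.readFileSync('/sys/fs/cgroup/cpu.max', 'utf8'));
    if (limit !== null) return Math.max(1, Math.min(hostCpus, Math.floor(limit)));
  } catch {
    // No cgroup v2 cpu.max here (macOS, Windows, cgroup v1): use the host count.
  }
  return hostCpus;
}

console.log(`Workers to fork: ${workerCount()}`);
```

Rounding a fractional quota down (1.5 CPUs becomes 1 worker) is a conservative choice; with a single worker there is no point clustering, which matches the one-process-per-pod advice above.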
cluster.production.js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// In K8s: WEB_CONCURRENCY can be injected as an env var matching allocated CPUs.
// In VMs/bare metal: default to os.cpus().length.
// Never hard-code a number: it breaks when the container size changes.
const WORKER_COUNT = parseInt(process.env.WEB_CONCURRENCY, 10) || os.cpus().length;
const PORT = parseInt(process.env.PORT, 10) || 3000;

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} — forking ${WORKER_COUNT} workers`);

  // Crash-loop protection. A respawned worker gets a NEW worker.id, so a
  // per-id crash counter never gets past 1. Count crashes across the whole
  // cluster inside a sliding time window instead.
  const CRASH_WINDOW_MS = 60_000;
  const MAX_CRASHES_IN_WINDOW = 5;
  let recentCrashes = [];

  // Exits we cause on purpose must not trigger the respawn logic.
  const plannedExits = new Set();
  let shuttingDown = false;

  function spawnWorker() {
    const worker = cluster.fork();
    console.log(`Worker ${worker.id} (pid ${worker.process.pid}) started`);
    return worker;
  }

  for (let i = 0; i < WORKER_COUNT; i++) {
    spawnWorker();
  }

  cluster.on('exit', (worker, code, signal) => {
    // Rolling restarts and full shutdowns are not crashes.
    if (shuttingDown || plannedExits.delete(worker.id)) return;

    const now = Date.now();
    recentCrashes = recentCrashes.filter((t) => now - t < CRASH_WINDOW_MS);
    recentCrashes.push(now);

    const reason = signal || `exit code ${code}`;
    console.error(`Worker ${worker.id} exited (${reason}). Crash count: ${recentCrashes.length}`);

    // Something is fundamentally wrong: stop restarting.
    // In K8s, the pod will be replaced. In a VM, alert and investigate.
    if (recentCrashes.length > MAX_CRASHES_IN_WINDOW) {
      console.error(`${recentCrashes.length} crashes in ${CRASH_WINDOW_MS / 1000}s, stopping restarts`);
      return;
    }

    // Brief delay before respawning to avoid crash storms on startup failures
    setTimeout(spawnWorker, 500);
  });

  // Rolling restart: stop one worker at a time, wait for it to drain and exit,
  // start its replacement, then move on. Zero-downtime restarts without PM2.
  async function rollingRestart() {
    for (const id of Object.keys(cluster.workers)) {
      const worker = cluster.workers[id];
      if (!worker) continue;
      await new Promise((resolve) => {
        plannedExits.add(worker.id);
        worker.once('exit', () => {
          spawnWorker();
          // Give the new worker 2 seconds to initialise before draining the next one
          setTimeout(resolve, 2000);
        });
        worker.send('graceful-shutdown');
      });
    }
    console.log('Rolling restart complete');
  }

  // SIGUSR1 is reserved by Node.js for activating the inspector, so use SIGUSR2.
  process.on('SIGUSR2', rollingRestart);

  process.on('SIGTERM', () => {
    console.log('Primary received SIGTERM — initiating graceful shutdown');
    // Stops the 'exit' handler from respawning workers during shutdown
    shuttingDown = true;
    for (const id in cluster.workers) {
      cluster.workers[id].send('graceful-shutdown');
    }
  });
} else {
  // Worker process: this is where your actual application runs
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(`Worker ${cluster.worker.id} (pid ${process.pid}) handled this request`);
  });

  server.listen(PORT, () => {
    console.log(`Worker ${cluster.worker.id} listening on port ${PORT}`);
  });

  // Graceful shutdown: stop accepting new connections,
  // finish in-flight requests, then exit cleanly.
  process.on('message', (msg) => {
    if (msg !== 'graceful-shutdown') return;
    console.log(`Worker ${cluster.worker.id} shutting down gracefully`);
    server.close(() => {
      console.log(`Worker ${cluster.worker.id} shutdown complete`);
      process.exit(0);
    });
    // Force exit if graceful shutdown takes too long (e.g., a stuck WebSocket).
    // unref() so this timer never keeps the worker alive on its own.
    setTimeout(() => {
      console.error(`Worker ${cluster.worker.id} forced exit after timeout`);
      process.exit(1);
    }, 30_000).unref();
  });

  // Registering this handler replaces Node's default behaviour (crashing on an
  // unhandled rejection). Log the full reason, then exit explicitly so the
  // primary's crash tracking sees the failure and respawns the worker.
  process.on('unhandledRejection', (reason) => {
    console.error('Unhandled rejection in worker', cluster.worker.id, reason);
    process.exit(1);
  });
}
Output
Primary 1234 — forking 4 workers
Worker 1 (pid 1235) listening on port 3000
Worker 2 (pid 1236) listening on port 3000
Worker 3 (pid 1237) listening on port 3000
Worker 4 (pid 1238) listening on port 3000
# After sending SIGTERM to the primary:
Primary received SIGTERM — initiating graceful shutdown
Worker 1 shutting down gracefully
Worker 2 shutting down gracefully
Worker 3 shutting down gracefully
Worker 4 shutting down gracefully
Worker 1 shutdown complete
Worker 2 shutdown complete
Worker 3 shutdown complete
Worker 4 shutdown complete
# If a worker crashes:
Worker 2 exited (exit code 1). Crash count: 1
Worker 5 (pid 1291) listening on port 3000
Cluster Workers Share Absolutely Nothing in Memory
Each cluster worker is a completely separate V8 isolate with its own heap. In-memory caches, session stores, rate limiters, and feature flag states are NOT shared across workers — they exist independently in each worker's heap. A user whose request hits worker 1 has zero visibility into what worker 2 has cached. In practice this means: any state that must be consistent across requests must live in an external store (Redis for sessions and caches, a database for rate limiting counters). In-memory state is only reliable in single-process deployments, which is fine for local development and occasionally fine for small internal tools, but never appropriate for multi-worker production services.
Production Insight
Cluster workers die silently in production more often than engineers expect.
Unhandled promise rejections (which become fatal in Node.js 15+), native addon segfaults, and OOM kills all crash workers cleanly without obvious log output beyond the exit event.
Node.js 22 still treats unhandled promise rejections as fatal by default. If your workers are dying and you're not sure why, add a process.on('unhandledRejection') handler that logs the full rejection reason. Registering that handler replaces the default crash, so call process.exit(1) from it if you want workers to keep failing fast.
Rule: implement worker death tracking with crash count limits. A crash-looping worker that restarts every 500ms consumes resources and delays investigation. Cap restarts at 5, alert, and let the orchestrator handle replacement.
Key Takeaway
Cluster forks one process per CPU core — each gets its own V8 heap, its own event loop, and its own memory space.
Workers share nothing in memory — sessions, caches, and rate limiters must live in Redis or an equivalent external store.
Shared resource contention (database connection pool size, file descriptor limits) is the actual scaling ceiling, not CPU count.
In Kubernetes, prefer one process per pod plus K8s horizontal pod autoscaling over intra-pod clustering for cleaner isolation and resource accounting.
Worker Threads for CPU-Intensive Work — And When Not to Use Them
Cluster processes are heavyweight. Each fork creates a full OS process with its own V8 instance, its own heap, its own garbage collector, and its own event loop. Worker threads also get their own V8 isolate, heap, and event loop, but they are lighter in a specific way: they run within the same process, can share memory via SharedArrayBuffer, and have a startup cost of 10-50ms versus 100-300ms for a full process fork.
Worker threads are not a general-purpose concurrency model. They exist for one job: CPU-bound parallelism within a single request or operation lifecycle. They excel at image processing, cryptographic key derivation, data transformation, report generation, ML model inference, and any algorithm that is genuinely computation-heavy. For I/O-bound work — database queries, HTTP calls, file reads — the event loop already handles concurrency efficiently. Adding threads for I/O adds complexity, coordination overhead, and thread pool management with zero performance benefit.
The architectural difference that matters most: worker threads can share memory via SharedArrayBuffer with Atomics for synchronisation. This enables zero-copy data sharing between threads — you pass a reference to a shared buffer, not a serialised copy of the data. For high-throughput data processing where the payload size is large, the difference between zero-copy and serialise-deserialise can dominate latency entirely.
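As a sketch of what Atomics buys you, the example below (countWithAtomics is a hypothetical name) lets several threads increment one counter that lives in a SharedArrayBuffer. Atomics.add performs each read-modify-write as one indivisible step, so no increment is lost; a plain counter[0]++ on shared memory gives no such guarantee. The workers are inline eval: true sources only to keep the sketch in one file, and spawning them per call is acceptable for a demo, not for a request path.

```javascript
const { Worker } = require('worker_threads');

// Each thread increments slot 0 of the shared counter `increments` times.
const WORKER_SOURCE = `
  const { workerData } = require('worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // indivisible read-modify-write on shared memory
  }
`;

function countWithAtomics(threads, increments) {
  // One 32-bit slot, visible to every thread that receives the buffer.
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);

  const runs = Array.from({ length: threads }, () => new Promise((resolve, reject) => {
    // A SharedArrayBuffer inside workerData is shared, not copied.
    const worker = new Worker(WORKER_SOURCE, { eval: true, workerData: { shared, increments } });
    worker.on('error', reject);
    worker.on('exit', (code) => (code === 0 ? resolve() : reject(new Error(`worker exited with code ${code}`))));
  }));

  // Once every thread has exited, read the final value atomically.
  return Promise.all(runs).then(() => Atomics.load(counter, 0));
}

countWithAtomics(4, 100_000).then((total) => console.log(`total = ${total}`)); // total = 400000
```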
The rule about thread pools is non-negotiable in production: never spawn a worker thread per incoming request. Thread creation has real overhead — 10-50ms startup, ~5-10MB memory per thread including stack and V8 overhead. Under concurrent load, naive spawn-per-request causes thread storms. 200 simultaneous requests spawn 200 threads, the process hits OS thread limits, thread creation starts failing, and the whole thing collapses. The correct pattern is a fixed-size thread pool — piscina is the most production-hardened library for this in the Node.js ecosystem as of 2026.
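To make the fixed-pool idea concrete, here is a stripped-down sketch of what a library like piscina does for you: a set number of threads created up front, and a FIFO queue that holds excess tasks until a thread is free. FixedThreadPool and its sum-of-squares task are hypothetical, and the sketch deliberately omits what makes piscina production-grade (replacing crashed threads, timeouts, cancellation, queue limits).

```javascript
const { Worker } = require('worker_threads');

// Worker body (eval: true keeps the sketch in one file).
// Hypothetical CPU-bound task: sum of i * i for i in [0, n).
const WORKER_SOURCE = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', ({ n }) => {
    let total = 0;
    for (let i = 0; i < n; i++) total += i * i;
    parentPort.postMessage({ total });
  });
`;

class FixedThreadPool {
  constructor(size) {
    this.queue = [];          // tasks waiting for a free thread (FIFO)
    this.idle = [];           // threads with no task assigned
    this.running = new Map(); // worker -> task it is currently executing
    this.workers = [];

    // All threads are created once, up front; run() never creates more.
    for (let i = 0; i < size; i++) {
      const worker = new Worker(WORKER_SOURCE, { eval: true });
      worker.on('message', ({ total }) => {
        this.running.get(worker).resolve(total);
        this.running.delete(worker);
        this.idle.push(worker);
        this.drain();
      });
      worker.on('error', (err) => {
        // The thread is gone: fail its task. (piscina would also replace the thread.)
        const task = this.running.get(worker);
        this.running.delete(worker);
        if (task) task.reject(err);
      });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queues a task and returns a promise for its result.
  run(n) {
    return new Promise((resolve, reject) => {
      this.queue.push({ n, resolve, reject });
      this.drain();
    });
  }

  // Hands queued tasks to idle threads, one task per thread at a time.
  drain() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const task = this.queue.shift();
      this.running.set(worker, task);
      worker.postMessage({ n: task.n });
    }
  }

  // The number to watch and shed load on, like piscina's pool.queueSize.
  get queueSize() {
    return this.queue.length;
  }

  destroy() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

// Usage: four tasks on two threads; two wait in the queue until a thread frees up.
const pool = new FixedThreadPool(2);
Promise.all([1e5, 2e5, 3e5, 4e5].map((n) => pool.run(n)))
  .then((totals) => console.log(totals))
  .finally(() => pool.destroy());
```

Under a burst of 200 requests this pool still holds exactly two threads; the other 198 tasks wait in the queue, which is why queue depth, not thread count, is the metric to alert on.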
workers.cpu-task.js
// ── cpu-worker.js ─────────────────────────────────────────────────────────
// This file runs inside each pool thread. Piscina calls the exported
// function once per task, passing the object given to pool.run().
const { threadId } = require('worker_threads');
const crypto = require('crypto');

function computeHashes(count) {
  // Genuinely CPU-bound: no I/O, pure computation.
  // Running this on the main thread would block the event loop for the full duration.
  const results = new Array(count);
  for (let i = 0; i < count; i++) {
    results[i] = crypto
      .createHash('sha256')
      .update(`payload-${i}-${Date.now()}`)
      .digest('hex');
  }
  return results;
}

module.exports = ({ iterations }) => {
  const hashes = computeHashes(iterations);
  // The return value resolves the promise returned by pool.run() on the main thread.
  return { threadId, count: iterations, sample: hashes[0] };
};

// ── thread-pool.js ────────────────────────────────────────────────────────
// Production pattern: fixed-size thread pool using piscina.
// Never spawn a thread per request. Queue excess work.
const Piscina = require('piscina');
const path = require('path');
const os = require('os');

// Create the pool once at application startup, not per request.
// maxThreads should roughly match CPU core count (and never drop below 1).
// idleTimeout: threads that have been idle for 30s are terminated to free memory.
const pool = new Piscina({
  filename: path.resolve(__dirname, 'cpu-worker.js'),
  maxThreads: parseInt(process.env.WORKER_THREADS, 10) || Math.max(1, os.cpus().length - 1),
  idleTimeout: 30_000,
});

// In your Express/Fastify route handler:
async function handleHashRequest(req, res) {
  // Clamp iterations to [1, 10_000] to prevent DoS via this endpoint;
  // missing or non-numeric input falls back to 1000.
  const requested = parseInt(req.query.iterations, 10);
  const safeIterations = Number.isFinite(requested)
    ? Math.min(Math.max(requested, 1), 10_000)
    : 1000;

  try {
    // Runs in a worker thread: the event loop stays free to handle
    // other requests while this computation runs.
    const result = await pool.run({ iterations: safeIterations });
    res.json({ threadId: result.threadId, count: result.count, sample: result.sample });
  } catch (err) {
    // piscina surfaces worker errors as rejected promises
    console.error('Worker thread error:', err);
    res.status(500).json({ error: 'Computation failed' });
  }
}

// Monitor pool queue depth: if it grows without bound, you need more threads
// or you need to reject requests earlier with a 503.
setInterval(() => {
  if (pool.queueSize > 50) {
    console.warn(`Thread pool queue depth: ${pool.queueSize} — consider increasing maxThreads or load shedding`);
  }
}, 5_000).unref();

// ── SharedArrayBuffer pattern for zero-copy data sharing ──────────────────
// When passing large datasets to worker threads, avoid serialisation overhead
// by sharing the underlying memory directly.
// This one-off Worker is for illustration; in a request path, route the work
// through the pool instead of spawning a thread per call.
const { Worker } = require('worker_threads');

function processLargeDataset(data) {
  // Allocate shared memory and view it as 32-bit integers
  const sharedBuffer = new SharedArrayBuffer(data.length * Int32Array.BYTES_PER_ELEMENT);
  const sharedView = new Int32Array(sharedBuffer);

  // Copy data into the shared buffer. This is the only copy.
  sharedView.set(data);

  return new Promise((resolve, reject) => {
    const worker = new Worker(`
      const { workerData, parentPort } = require('worker_threads');
      const view = new Int32Array(workerData.buffer);
      // Read in place: this thread makes no copy of the data
      let sum = 0;
      for (let i = 0; i < view.length; i++) sum += view[i];
      parentPort.postMessage({ sum });
    `, {
      eval: true,
      workerData: { buffer: sharedBuffer }, // Shared, not copied: both threads see the same memory
    });
    worker.on('message', resolve);
    worker.on('error', reject);
  });
}
Output
// (threadId 1 was reused — threads stay alive and handle multiple tasks)
// Pool queue warning under sustained load:
// Thread pool queue depth: 67 — consider increasing maxThreads or load shedding
// Comparison: blocking event loop vs worker thread
// Without worker thread:
// Request latency during 10,000 hash computation: ~380ms blocked
// Other requests during that time: 0 served
//
// With worker thread pool:
// Request latency during 10,000 hash computation: ~380ms (in thread)
// Other requests during that time: served normally, <5ms latency
Cluster vs Worker Threads — Two Different Problems
Cluster workers: each gets its own event loop and V8 heap — best when you need to handle more concurrent I/O-bound requests than one event loop can manage
Worker threads: each has its own JavaScript heap but can share memory with the main thread through SharedArrayBuffer, and starts faster than a process; best when a single request triggers a CPU-bound computation that would block the event loop
If your bottleneck is database query throughput or concurrent HTTP connections, clustering wins — more event loops means more concurrent I/O operations
If your bottleneck is computation within a request (hashing, image resizing, PDF generation, ML inference), worker threads win — CPU parallelism without full process overhead
You can combine both: a clustered app where each worker has a small thread pool for the occasional CPU-bound request — this is the right architecture for mixed I/O and CPU workloads
Never spawn a thread per request — use piscina with a fixed maxThreads sized to CPU count minus 1 (leave one core for the event loop)
Production Insight
Worker thread startup costs 10-50ms and 5-10MB memory per thread on Node.js 22.
Spawning threads per request under load creates thread storms — 500 concurrent requests attempt to spawn 500 threads, hit OS limits, and the process crashes or thrashes.
Piscina's queue mechanism is your safety valve: excess tasks queue rather than spawning new threads. Monitor pool.queueSize as a metric — if it grows consistently, you need more threads or earlier load shedding.
Rule: create the pool once at startup, size maxThreads to CPU count, cap queueSize with a circuit breaker that returns 503 when the queue exceeds your SLA threshold.
Key Takeaway
Worker threads handle CPU-bound work without blocking the event loop — they are not for I/O.
Use piscina for thread pool management — never spawn per request, never more threads than CPU cores.
SharedArrayBuffer enables zero-copy data sharing between threads — critical for high-throughput data processing.
You can combine clustering and worker threads: clustered workers each with a small thread pool is the right architecture for mixed I/O and CPU workloads.
Memory Management, GC Pauses, and Leak Detection That Actually Works
Node.js uses V8's generational garbage collector. Objects start in the young generation (also called the nursery or new space), which is collected via a fast scavenge algorithm that runs frequently. Objects that survive two scavenge cycles are promoted to the old generation, which is collected via a mark-sweep-compact algorithm: slower, less frequent, and, critically, able to pause the event loop. V8 performs much of the marking incrementally and on background threads, but parts of every major collection still run on the main thread, and scavenges pause it too, only more briefly.
The size of those GC pauses scales with old generation heap utilisation. A heap at 30% utilisation with 512MB old space might produce 5-15ms GC pauses. The same heap at 90% utilisation triggers increasingly frequent major GC cycles producing 50-200ms pauses. Those pauses look exactly like event loop blocking in your latency metrics — P99 spikes with no corresponding CPU spike.
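One way to confirm that a P99 spike is a GC pause rather than your own code is to record GC events directly. This sketch uses perf_hooks' PerformanceObserver with the 'gc' entry type, which reports each collection's duration in milliseconds; observeGcPauses and GC_KIND_NAMES are hypothetical names, and the allocation loop exists only to provoke some young-generation collections.

```javascript
const { PerformanceObserver, constants } = require('perf_hooks');

// Readable names for the GC kinds Node.js reports.
const GC_KIND_NAMES = {
  [constants.NODE_PERFORMANCE_GC_MINOR]: 'minor (scavenge)',
  [constants.NODE_PERFORMANCE_GC_MAJOR]: 'major (mark-sweep-compact)',
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: 'incremental marking',
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: 'weak callbacks',
};

// Collects GC durations (ms) grouped by kind until stop() is called.
function observeGcPauses() {
  const pauses = {};
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      // Node.js 16+ reports the kind in entry.detail; older versions used entry.kind.
      const kind = entry.detail ? entry.detail.kind : entry.kind;
      const name = GC_KIND_NAMES[kind] || `kind ${kind}`;
      if (!pauses[name]) pauses[name] = [];
      pauses[name].push(entry.duration);
    }
  });
  observer.observe({ entryTypes: ['gc'] });
  return { pauses, stop: () => observer.disconnect() };
}

const gcWatch = observeGcPauses();

// Churn through short-lived objects so the young generation fills up.
let scratch = [];
for (let i = 0; i < 1_000_000; i++) {
  scratch.push({ i, data: new Array(8).fill(i) });
  if (scratch.length > 10_000) scratch = [];
}

// GC entries are delivered asynchronously, so report on a later turn of the loop.
setTimeout(() => {
  for (const [kind, durations] of Object.entries(gcWatch.pauses)) {
    const longest = Math.max(...durations).toFixed(2);
    console.log(`${kind}: ${durations.length} collections, longest ${longest}ms`);
  }
  gcWatch.stop();
}, 100);
```

Lining up the timestamps of long major-GC entries against latency spikes is usually enough to tell GC pressure apart from a blocking code path.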
The default heap limit is not a fixed number. Since Node.js 12, V8 sizes it from available system memory, so the ceiling varies by machine (the often-quoted ~1.5GB figure on 64-bit systems dates from older releases); v8.getHeapStatistics().heap_size_limit reports the value your process actually has. Whatever it is, it is a ceiling, not a target. A heap running close to it is not healthy; it means V8 is under severe GC pressure. Use --max-old-space-size to set an appropriate limit based on your container allocation, then use clustering to run multiple smaller heaps rather than one large one.
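To see the limit a process is actually running with, and how close it is, v8.getHeapStatistics() is enough. This sketch reports usage against heap_size_limit, which changes when you pass --max-old-space-size; heapReport and the 0.7 warning ratio are hypothetical choices, not V8 recommendations.

```javascript
const v8 = require('v8');

// Summarises heap usage against V8's configured ceiling.
function heapReport() {
  const { used_heap_size, total_heap_size, heap_size_limit } = v8.getHeapStatistics();
  const mb = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    usedMb: mb(used_heap_size),
    totalMb: mb(total_heap_size),
    limitMb: mb(heap_size_limit),
    utilisation: used_heap_size / heap_size_limit,
  };
}

const report = heapReport();
console.log(
  `heap: ${report.usedMb}MB used, ${report.totalMb}MB allocated, ` +
  `${report.limitMb}MB limit (${(report.utilisation * 100).toFixed(1)}% of limit)`
);

// Hypothetical alert threshold; tune it against your own GC-pause measurements.
const HEAP_WARN_RATIO = 0.7;
if (report.utilisation > HEAP_WARN_RATIO) {
  console.warn('Heap utilisation high: expect longer and more frequent major GCs');
}
```

Running it once with plain node and once with node --max-old-space-size=512 shows the limit moving with the flag.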
The most dangerous memory leaks in production are the gradual ones. A slow-growing Map, an event emitter accumulating listeners, a closure in a middleware capturing a request object — none of these cause immediate failures. They grow over hours or days, GC pauses increase gradually, latency percentiles drift upward, and the eventual OOM crash looks like a random event rather than the conclusion of a long-running leak. Heap snapshot comparison is the only reliable way to find them.
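For the comparison itself, v8.writeHeapSnapshot() writes a .heapsnapshot file that Chrome DevTools can load; the Memory tab's Comparison view then shows which object types grew between two snapshots. takeHeapSnapshot is a hypothetical wrapper that writes to the OS temp directory. As an alternative, starting Node.js with --heapsnapshot-signal=SIGUSR2 (pick a signal your app doesn't already handle) lets you trigger a snapshot from outside with kill.

```javascript
const v8 = require('v8');
const os = require('os');
const path = require('path');

// Writes a heap snapshot to the temp directory and returns the file path.
// Caution: this is synchronous. It blocks the event loop for a time that grows
// with heap size and needs substantial extra memory while it runs, so avoid
// it on a process already close to its heap limit.
function takeHeapSnapshot(label) {
  const file = path.join(os.tmpdir(), `${label}-${process.pid}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file);
}

// Example: a baseline now, a second snapshot after ten minutes of traffic.
// Load both into DevTools and compare; object counts that only ever grow are leak suspects.
// const baseline = takeHeapSnapshot('baseline');
// setTimeout(() => takeHeapSnapshot('after-10min'), 10 * 60 * 1000);
```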
memory.leak-patterns.js
```javascript
// ══════════════════════════════════════════════════════════════════
// COMMON PRODUCTION LEAK PATTERNS: what they look like and how to fix them
// ══════════════════════════════════════════════════════════════════
// fetchFromDb, processChunk and pollRemoteService stand in for your own code.

// ── PATTERN 1: Unbounded in-memory cache ──────────────────────────
// The most common leak in production Node.js services.
// Works fine in staging (low traffic, frequent restarts).
// Reaches OOM in production after 12-48 hours of sustained traffic.
const BAD_cache = {};

function getUserBad(id) {
  if (!BAD_cache[id]) {
    BAD_cache[id] = fetchFromDb(id); // Added forever, never evicted
  }
  return BAD_cache[id];
}
// After 100,000 unique users: BAD_cache has 100,000 entries eating memory

// FIX: LRU cache with both max size and TTL
const { LRUCache } = require('lru-cache');

const userCache = new LRUCache({
  max: 10_000,        // Evict least-recently-used entries beyond this count
  ttl: 1000 * 60 * 5, // Each entry expires after 5 minutes regardless of access
  allowStale: false,  // Never return an entry once it has expired
});

function getUser(id) {
  const cached = userCache.get(id);
  if (cached !== undefined) return cached;
  const user = fetchFromDb(id);
  userCache.set(id, user);
  return user;
}

// ── PATTERN 2: Event listener accumulation ────────────────────────
// Each reconnection adds a new listener, and removeListener is never called.
// Node.js emits a MaxListenersExceededWarning (visible via process.on('warning'))
// once an 11th listener is added for the same event, but by then you might
// already have hundreds.
function handleConnectionBad(socket) {
  const onData = (chunk) => processChunk(socket, chunk);
  socket.on('data', onData);
  // BUG: the socket's 'close' event never removes the 'data' listener.
  // If this socket is reused, or the same emitter receives multiple calls,
  // listeners pile up indefinitely.
}

// FIX: always pair addListener with removeListener
function handleConnection(socket) {
  // For high-connection-volume services, lower the warning threshold so
  // accumulation is reported early. The limit applies per event name,
  // not to the total across all events.
  socket.setMaxListeners(3);

  const onData = (chunk) => processChunk(socket, chunk);
  socket.on('data', onData);
  socket.once('close', () => {
    // 'once' ensures this cleanup handler itself doesn't accumulate
    socket.removeListener('data', onData);
  });
}

// ── PATTERN 3: Closure retaining large objects ────────────────────
// A closure keeps every variable it references alive for as long as the
// closure itself is reachable (in V8, variables captured by sibling closures
// in the same scope are kept too). A long-lived middleware function therefore
// keeps its captured state in the old generation forever.
function BAD_createMiddleware() {
  const requestLog = []; // Grows with every request, never cleared
  return function middleware(req, res, next) {
    requestLog.push(req); // Holds every req object ever received
    next();
  };
}

// FIX: only retain what you actually need
function createMiddleware() {
  const requestCount = { value: 0 }; // Tiny counter, not the whole req object
  return function middleware(req, res, next) {
    requestCount.value++;
    next();
  };
}

// ── PATTERN 4: setInterval without clearInterval ──────────────────
// Common in module-level setup code: the interval callback fires forever
// and holds references to everything in its closure scope.
let pollingInterval;

function startPolling(config) {
  // GOOD: store the reference so we can clear it
  pollingInterval = setInterval(async () => {
    try {
      await pollRemoteService(config);
    } catch (err) {
      // An unhandled rejection here would terminate the process on Node.js 15+
      console.error('Polling failed:', err);
    }
  }, 5000);
}

function stopPolling() {
  if (pollingInterval) {
    clearInterval(pollingInterval);
    pollingInterval = null;
  }
}

// Always hook into process shutdown to clean up timers:
process.on('SIGTERM', () => {
  stopPolling();
  // then close server, drain DB pool, etc.
});

// ── Heap snapshot helper for production debugging ─────────────────
// Add this to a debug-only route or trigger it with a signal.
// Never leave heap snapshot generation on the hot path: writeHeapSnapshot
// is synchronous and pauses the event loop while it runs.
// (SIGUSR1 is reserved by Node.js for the inspector, so use SIGUSR2.)
const v8 = require('v8');

process.on('SIGUSR2', () => {
  const filename = v8.writeHeapSnapshot(`heap-${Date.now()}.heapsnapshot`);
  console.log(`Heap snapshot written to ${filename}`);
  // Copy the file off the instance and open it in the Chrome DevTools Memory tab
});
```
Output
// MaxListenersExceededWarning from Node.js when listener pattern is wrong:
(node:1234) MaxListenersExceededWarning: Possible EventEmitter memory leak detected.
11 data listeners added to [Socket]. Use emitter.setMaxListeners() to increase limit
// Heap snapshot written on SIGUSR2:
Heap snapshot written to heap-1741132800000.heapsnapshot
// lru-cache eviction working correctly under load:
// After 100,000 unique user requests, userCache.size === 10,000
// (not 100,000 — LRU eviction kept it bounded)
// Memory stays stable instead of growing linearly with unique users
GC Pauses Are the Silent Latency Killer
A 1.4GB heap at 90% utilisation triggers aggressive mark-sweep-compact cycles that pause the event loop for 50-200ms each. These pauses do not appear in your application logs. They do not increment your error counters. They show up only as latency spikes at the P95 and P99 percentiles — and they look identical to a slow database query or a blocked event loop. If your latency percentiles drift upward over hours with no corresponding application error, check heap utilisation and GC frequency before debugging anything else. The metric you want: v8.getHeapStatistics().used_heap_size / v8.getHeapStatistics().heap_size_limit — if this ratio exceeds 0.85, you are in GC pressure territory.
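The utilisation ratio described above can be sampled with the built-in v8 module. This is a minimal sketch that assumes a 10-second sampling interval and console output as stand-ins for your real metrics pipeline; heapUtilisation is a hypothetical helper name.

```javascript
const v8 = require('node:v8');

// Fraction of the heap ceiling currently in use (0..1).
function heapUtilisation() {
  const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
  return used_heap_size / heap_size_limit;
}

const GC_PRESSURE_THRESHOLD = 0.85; // threshold suggested above

const heapCheck = setInterval(() => {
  const ratio = heapUtilisation();
  if (ratio > GC_PRESSURE_THRESHOLD) {
    console.warn(`Heap at ${(ratio * 100).toFixed(1)}% of its limit: expect GC pressure`);
  }
}, 10_000);
heapCheck.unref(); // this check alone should not keep the process alive
```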
Production Insight
The most common production memory leak in Node.js is an unbounded in-memory cache with no size limit and no TTL.
It works perfectly in staging: low traffic, frequent redeploys, never grows large enough to matter.
In production: 48 hours of real traffic later, the cache has 500,000 entries, GC is running continuously, event loop lag is 300ms, and on-call gets paged for an 'unexplained latency increase'.
Rule: every in-memory cache, every Map used as a cache, every object accumulating state must have an explicit maximum size and a TTL. No exceptions. lru-cache with both max and ttl options is one line of configuration.
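To see what max plus TTL actually does, or when a dependency is not an option, the same policy fits in a small class. This is a sketch of the idea, not lru-cache's implementation: BoundedTtlCache is a hypothetical name, it relies on Map preserving insertion order to track recency, and expired entries are removed lazily on read or pushed out by the size limit.

```javascript
// Dependency-free sketch of a cache bounded by BOTH entry count and TTL.
// Map iterates in insertion order, so re-inserting a key on every hit keeps
// the least recently used key first, where eviction can find it cheaply.
class BoundedTtlCache {
  constructor({ max, ttlMs, now = Date.now }) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful for testing expiry
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it, never serve it
      return undefined;
    }
    // Move to the most-recently-used end; the TTL is NOT extended on read
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key); // re-setting a key also refreshes its recency
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.max) {
      const lruKey = this.entries.keys().next().value;
      this.entries.delete(lruKey);
    }
  }

  get size() {
    return this.entries.size;
  }
}

// Same limits as the lru-cache configuration in Pattern 1
const userCache = new BoundedTtlCache({ max: 10_000, ttlMs: 5 * 60 * 1000 });
for (let id = 0; id < 100_000; id++) userCache.set(id, { id });
console.log(userCache.size); // 10000
```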
Key Takeaway
V8 GC pauses scale with old generation heap utilisation — a heap at 90% capacity causes 50-200ms event loop stops that look like application bugs.
The three most common production leak sources: unbounded caches without TTL/size limits, event listeners never removed on connection close, and closures in long-lived objects retaining large references.
Heap snapshot comparison in Chrome DevTools is the only reliable way to find leaks — guessing wastes hours. Take a baseline, wait under load, take a second snapshot, compare retained size growth by object type.
Keep heaps small: set --max-old-space-size to 70-75% of container memory, and run multiple smaller processes via clustering rather than one large heap.
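GC pauses can be observed directly instead of being inferred from latency graphs: Node.js reports each collection as a 'gc' performance entry. This is a minimal sketch using the built-in perf_hooks module; observeGcPauses and its 50ms default warning threshold are assumptions chosen to match the pause range above.

```javascript
const { PerformanceObserver, constants } = require('node:perf_hooks');

const GC_KINDS = {
  [constants.NODE_PERFORMANCE_GC_MINOR]: 'minor (scavenge)',
  [constants.NODE_PERFORMANCE_GC_MAJOR]: 'major (mark-sweep-compact)',
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: 'incremental marking',
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: 'weak callback processing',
};

// Hypothetical helper: observe every GC pause, pass it to onPause (for example
// a metrics histogram) and warn when a single pause exceeds warnAboveMs.
function observeGcPauses({ warnAboveMs = 50, onPause = () => {} } = {}) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      const kind = GC_KINDS[entry.detail?.kind] ?? 'unknown';
      onPause({ kind, durationMs: entry.duration });
      if (entry.duration > warnAboveMs) {
        console.warn(`GC pause ${entry.duration.toFixed(1)}ms (${kind})`);
      }
    }
  });
  observer.observe({ entryTypes: ['gc'] });
  return observer; // call observer.disconnect() to stop observing
}
```

Feeding durationMs into the same dashboard as event loop lag makes it easy to tell whether a latency spike lines up with a major collection.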
Production Profiling: Finding Real Bottlenecks Under Real Load
Synthetic benchmarks lie in very specific ways. A service that handles 50,000 requests per second in autocannon or wrk testing may collapse at 5,000 requests per second in production because synthetic benchmarks cannot replicate: real-world payload size variance, database query latency distribution under concurrent connection pressure, connection pool contention between concurrent requests, GC pressure from actual memory usage patterns, and the interaction between all of these simultaneously. Production profiling reveals what benchmarks never expose.
The three essential tools in 2026: clinic.js for automated high-level diagnosis across event loop, CPU, and memory simultaneously; --inspect with Chrome DevTools for interactive CPU flame graphs and heap timeline recording; and perf or DTrace for low-level, kernel-level analysis when the issue is in native code or the V8 runtime itself. Each operates at a different depth.
Start with clinic doctor every time: it gives the fastest cross-dimensional view of what is wrong, putting event loop delay, CPU usage, and memory trend into a single report that takes about two minutes to generate. When doctor identifies an area of concern, drill into it: clinic flame for CPU hotspot identification, clinic heapprofiler for allocation timeline analysis. Only drop to perf/DTrace when the problem is not visible at the JavaScript level, such as native addon performance, V8 JIT behaviour, or system-call overhead.
The profiling overhead question matters in 2026 because teams are increasingly reluctant to run profiling tools in production after incidents caused by profiler overhead. Clinic.js adds roughly 5-15% overhead to throughput. Chrome DevTools CPU profiling adds 20-40% overhead and should only run on canary instances. The right workflow: run clinic against a canary pod or a staging environment under production-representative load, not against your primary fleet.
profiling.commands.sh (Bash)
```bash
# ── Install clinic globally (requires Node.js 18+, works on 22 LTS) ────────
npm install -g clinic
# Verify installation:
clinic --version

# ═════════════════════════════════════════════════════════════════════
# STEP 1: Full diagnostic overview (start here, always)
# ═════════════════════════════════════════════════════════════════════
clinic doctor -- node server.js
# While this runs, apply load in a separate terminal:
npx autocannon -c 100 -d 30 http://localhost:3000
# When the load test finishes, stop the server with Ctrl+C.
# clinic doctor then generates an HTML report showing:
# - Event loop delay over time (spikes = blocking operations)
# - CPU usage across the profile period
# - Memory growth trend
# - Recommendations based on detected patterns

# ═════════════════════════════════════════════════════════════════════
# STEP 2: CPU hotspot identification (when doctor shows high CPU)
# ═════════════════════════════════════════════════════════════════════
clinic flame -- node server.js
# Apply load while running.
# The flamegraph shows call stacks with width proportional to CPU time.
# Wide boxes at the top of the stack are your hotspots.
# Look for synchronous JavaScript in the middle of the stack: anything
# that should be async but isn't shows up as a wide synchronous column.

# ═════════════════════════════════════════════════════════════════════
# STEP 3: Memory allocation profiling (when memory grows unexpectedly)
# ═════════════════════════════════════════════════════════════════════
clinic heapprofiler -- node server.js
# Shows an allocation timeline: not just what IS in the heap,
# but what is being actively allocated over time.
# Useful distinction: retained memory that keeps growing is a leak.
# A high allocation rate that GC keeps up with is a GC pressure issue, not a leak.

# ═════════════════════════════════════════════════════════════════════
# STEP 4: Async operation flow analysis (when timing looks wrong)
# ═════════════════════════════════════════════════════════════════════
clinic bubbleprof -- node server.js
# Visualises async operation chains and shows where time is spent waiting
# between async operations. Useful for finding:
# - Database query chains that could be parallelised with Promise.all()
# - Missing connection pool capacity (lots of time waiting for a connection)
# - Unnecessary sequential async operations

# ═════════════════════════════════════════════════════════════════════
# STEP 5: Manual event loop lag measurement (production-safe)
# ═════════════════════════════════════════════════════════════════════
# Option A: drift-based check. It measures the event loop of the process it
# runs in, so add the same logic to your application; run standalone as
# below, it only demonstrates the technique.
node -e "
const INTERVAL = 1000;
let last = Date.now();
setInterval(() => {
  const now = Date.now();
  const lag = now - last - INTERVAL;
  last = now;
  console.log('Event loop lag:', lag.toFixed(0), 'ms');
  if (lag > 100) console.warn('WARNING: event loop saturation detected');
}, INTERVAL);
"
# Option B: expose as a Prometheus metric (recommended for production)
# In your application startup:
# const promClient = require('prom-client');
# promClient.collectDefaultMetrics(); // includes nodejs_eventloop_lag_seconds

# ═════════════════════════════════════════════════════════════════════
# STEP 6: V8 CPU profile via --inspect (for interactive investigation)
# ═════════════════════════════════════════════════════════════════════
# Start the process with the inspector enabled. Only bind to 0.0.0.0 on an
# isolated machine: the inspector allows arbitrary code execution, so it
# must never be reachable from an untrusted network.
node --inspect=0.0.0.0:9229 server.js
# Then:
# 1. Open Chrome and navigate to chrome://inspect
# 2. Click 'inspect' on your Node.js target
# 3. Go to the Profiler tab
# 4. Click 'Start', apply load, then click 'Stop'
# 5. Analyse the CPU profile flamegraph
# For a production canary: bind the inspector to localhost only
node --inspect=127.0.0.1:9229 server.js
# Tunnel via SSH to access it from your laptop:
# ssh -L 9229:localhost:9229 user@canary-host

# ═════════════════════════════════════════════════════════════════════
# STEP 7: Heap snapshot on demand (production debugging)
# ═════════════════════════════════════════════════════════════════════
# If you added the process.on('SIGUSR2') heap snapshot handler:
kill -USR2 <node-pid>
# The file appears in the working directory as heap-<timestamp>.heapsnapshot
# Load it in Chrome DevTools > Memory tab > Load profile
```
Output
# clinic doctor output (terminal summary before HTML report opens):
✔ Analysed data
✔ Generated HTML report
open file:///.clinic/12345.clinic-doctor-html
Key findings:
⚠ Event loop delay detected — max 3847ms at 14:23:07
⚠ Synchronous operations detected in the event loop
✓ Memory: stable growth — no apparent leak
✓ CPU: consistent with expected load
Recommendation: Use clinic flame to identify the synchronous hotspot
# Event loop lag measurement output:
Event loop lag: 2 ms
Event loop lag: 3 ms
Event loop lag: 1 ms
WARNING: event loop saturation detected
Event loop lag: 387 ms ← This is when the blocking operation ran
Req/Sec: 8,432 (average) — note the P99 spike despite high average throughput
# The average looks fine. The P99 is catastrophic. This is why you watch percentiles.
Always Profile Under Realistic Concurrent Load
An idle Node.js server shows zero event loop lag, zero CPU hotspots, and flat memory — a completely clean profile that tells you nothing. The blocking operations, the GC pressure patterns, and the async chain bottlenecks only appear under concurrent load because they require multiple requests competing for the same event loop. Before you profile, generate realistic traffic with autocannon, k6, or artillery. Use realistic payload sizes and request patterns from your production access logs — not uniform requests to a single lightweight endpoint. A profiling session on an idle server is the most expensive way to learn nothing.
Production Insight
Clinic.js adds 5-15% overhead to request throughput and should never run permanently in production.
The right approach: run clinic against a canary pod or a dedicated staging instance that receives a copy of production traffic via traffic mirroring.
On Node.js 22 you can also use the built-in --cpu-prof flag for V8 CPU profiling without any external tooling: node --cpu-prof server.js writes a .cpuprofile file to the working directory when the process exits, and that file opens in Chrome DevTools.
Rule: profile in production-equivalent environments under production-equivalent traffic patterns. Analyse the report. Remove the profiler. Make one change. Measure. Repeat.
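Besides the --cpu-prof flag, the built-in node:inspector module can profile an already running process for a fixed window, which suits a debug-only route or a signal handler like the heap snapshot helper earlier. This is a sketch under those assumptions; captureCpuProfile is a hypothetical helper, profiler overhead applies for as long as the window is open, and the file it writes is a .cpuprofile that Chrome DevTools can load.

```javascript
const inspector = require('node:inspector');
const fs = require('node:fs');

// Hypothetical helper: profile this process for durationMs, then write a
// .cpuprofile file and resolve with the profile object.
function captureCpuProfile(durationMs, filename) {
  const session = new inspector.Session();
  session.connect();

  // Promise wrapper around the callback form of session.post()
  const post = (method, params = {}) =>
    new Promise((resolve, reject) => {
      session.post(method, params, (err, result) => (err ? reject(err) : resolve(result)));
    });

  return post('Profiler.enable')
    .then(() => post('Profiler.start'))
    .then(() => new Promise((resolve) => setTimeout(resolve, durationMs)))
    .then(() => post('Profiler.stop'))
    .then(({ profile }) => {
      fs.writeFileSync(filename, JSON.stringify(profile));
      return profile;
    })
    .finally(() => session.disconnect());
}

// Example trigger for a canary instance (debug-only):
// captureCpuProfile(30_000, `cpu-${Date.now()}.cpuprofile`).then(() => console.log('profile written'));
```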
Key Takeaway
Start with clinic doctor — it gives the fastest cross-dimensional view of event loop, CPU, and memory behaviour simultaneously.
Move from overview (doctor) to detail (flame for CPU, heapprofiler for allocations, bubbleprof for async chains) as patterns emerge.
Node.js 22 includes --cpu-prof as a built-in V8 profiler — useful when clinic.js is not available or when you need zero external dependencies.
Production incident · Post-mortem · Severity: high
The 4-Second Event Loop Block That Took Down Production
Symptom
All API endpoints — not just the reporting endpoint — started returning 504 Gateway Timeout errors within minutes of the deployment. P99 latency spiked from 45ms to over 4,000ms. Kubernetes pods showed healthy status (the health check was a simple synchronous string response that returned in under 1ms) but load balancer metrics showed 100% connection exhaustion across all pods in the cluster.
Assumption
The team assumed the database was the bottleneck because latency had spiked once before due to a slow query. They scaled read replicas, increased connection pool sizes from 10 to 50, and waited. Latency continued climbing. Nobody looked at the newly deployed reporting endpoint because it was described in the PR as a 'minor template change'.
Root cause
The reporting endpoint called handlebars.compile() inline on every request against a 2MB template file with deeply nested loops. Each compile-and-render cycle blocked the event loop for approximately 3.8 seconds. Under concurrent load the delay compounded: every new request queued behind the blocked loop, which meant even a 5ms health check request had to wait for the 3.8-second render to complete. The Kubernetes liveness probe had a 5-second timeout, so probes barely passed, just long enough to keep the pods alive and accepting traffic while all actual application traffic timed out. The incident ran for 22 minutes before the new deployment was identified as the cause.
Fix
1. Immediately reverted the deployment — restored P99 to 45ms within 90 seconds of rollback
2. Moved template compilation to application startup: compile once at boot, store the compiled template function, call it on each request — compile time is now paid once, not per request
3. Switched to streaming template rendering for the large report output using a streaming-compatible templating approach
4. Added event loop lag monitoring via the event-loop-lag npm package, exposed as a Prometheus gauge
5. Implemented a circuit breaker middleware that returns HTTP 503 with a Retry-After header when event loop lag exceeds 500ms, preventing further request queuing behind a saturated loop
6. Updated the Kubernetes liveness probe to measure event loop responsiveness (a dedicated /health/live endpoint that uses setImmediate to verify the event loop is actually scheduling work) rather than just TCP connectivity
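The load-shedding circuit breaker from fix 5 can be sketched without dependencies. This assumes an Express-style (req, res, next) middleware signature and uses a drift-based lag sampler similar to Option A in the profiling commands; startLagSampler and createLoadShedder are hypothetical names, and the 500ms threshold and 5-second Retry-After value are illustrative.

```javascript
const { performance } = require('node:perf_hooks');

// Drift-based lag sampler: a timer scheduled every intervalMs should fire on
// time; any extra delay is time the event loop could not run callbacks.
function startLagSampler(intervalMs = 500) {
  let lagMs = 0;
  let last = performance.now();
  const timer = setInterval(() => {
    const now = performance.now();
    lagMs = Math.max(0, now - last - intervalMs);
    last = now;
  }, intervalMs);
  timer.unref(); // sampling alone should not keep the process alive
  return { currentLagMs: () => lagMs, stop: () => clearInterval(timer) };
}

// Hypothetical Express-style middleware: while lag exceeds thresholdMs,
// answer 503 immediately instead of queueing more work behind the loop.
function createLoadShedder({ thresholdMs = 500, retryAfterSec = 5, getLagMs }) {
  return function loadShedder(req, res, next) {
    if (getLagMs() > thresholdMs) {
      res.statusCode = 503;
      res.setHeader('Retry-After', String(retryAfterSec));
      res.end('Service temporarily overloaded');
      return;
    }
    next();
  };
}

// Usage with Express (app is assumed to exist in your code):
// const sampler = startLagSampler();
// app.use(createLoadShedder({ getLagMs: sampler.currentLagMs }));
```

The reading reflects the most recently completed sampling interval, so the breaker reacts just after a stall rather than during it; during the stall itself no request can be handled anyway.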
Key lesson
Never compile templates, parse large payloads, or run regex on untrusted input synchronously inside the request path — these are event loop blockers regardless of how small they look in code review
Health checks must measure event loop responsiveness, not just process existence — a process can be alive and completely unable to serve requests
A single synchronous operation does not just slow one endpoint — it blocks every concurrent request in the entire process for its full duration
Monitor event loop lag as a first-class production metric from day one — latency percentiles alone will not tell you why things are slow
Code review descriptions like 'minor template change' deserve scrutiny when the change touches the hot request path — the word 'minor' has no meaning when the event loop is involved
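A liveness endpoint in the spirit of fix 6 can time how long a setImmediate callback takes to run and fail when the loop is slow to turn over. This is a sketch assuming a plain node:http server, a /health/live path, and a 200ms budget, all illustrative. Note its limit: a loop that is blocked outright cannot accept the probe request at all, so the probe's own timeout must also be shorter than the longest stall you are willing to tolerate.

```javascript
const http = require('node:http');
const { performance } = require('node:perf_hooks');

// Resolves with the milliseconds it took for a setImmediate callback to run.
// On a healthy loop this is typically tiny; long callback queues or GC
// pressure push it up.
function measureLoopTurnMs() {
  const start = performance.now();
  return new Promise((resolve) => {
    setImmediate(() => resolve(performance.now() - start));
  });
}

// Hypothetical liveness handler: 200 only if the loop turns over within budget.
async function livenessHandler(req, res, maxTurnMs = 200) {
  const turnMs = await measureLoopTurnMs();
  const healthy = turnMs <= maxTurnMs;
  res.writeHead(healthy ? 200 : 503, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ status: healthy ? 'ok' : 'event-loop-saturated', turnMs }));
}

// Wiring it into a plain http server on the /health/live path:
function createServer() {
  return http.createServer((req, res) => {
    if (req.url === '/health/live') return livenessHandler(req, res);
    res.writeHead(404).end();
  });
}
```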
Production debug guide · Common production symptoms and their immediate debugging actions
Symptom · 01
P99 latency spikes while CPU usage remains low across all cores
Fix
This is almost always event loop blocking or GC pressure — not a resource exhaustion problem. Measure event loop lag directly with the event-loop-lag package or use clinic doctor to visualise event loop blocking over time. Low CPU with high latency means work is queued behind something synchronous, not that the system is idle.
Symptom · 02
Memory usage grows linearly over hours until OOM crash (exit code 137 in containers)
Fix
Take heap snapshots at intervals via --inspect and Chrome DevTools Memory tab. Take snapshot A, wait 30 minutes under load, take snapshot B, then use the Comparison view to identify object types with growing retained size. Exit code 137 is the kernel OOM killer — the container hit its memory limit. Check not just V8 heap but total RSS including Buffer allocations.
Symptom · 03
CPU pinned at 100% on a single core despite clustering being configured
Fix
Verify cluster workers are actually forked and running: ps aux | grep node. Check whether the primary (master) process is somehow handling requests instead of delegating to workers. Also check the scheduling policy: round-robin (cluster.SCHED_RR) is the default on every platform except Windows, but setting NODE_CLUSTER_SCHED_POLICY=none or cluster.schedulingPolicy = cluster.SCHED_NONE hands connection distribution to the OS, which can produce uneven load under certain connection patterns.
Symptom · 04
Throughput plateaus after adding more cluster workers — no improvement beyond N workers
Fix
Check shared resource contention first. Database connection pool size is the most common ceiling — if 8 workers share a pool of 10 connections, adding workers 9 and 10 gains nothing. Also check file descriptor limits (ulimit -n), port range exhaustion for outbound connections (ss -s), and whether a downstream service is the actual bottleneck.
Symptom · 05
Gradual latency degradation over days with no single identifiable event
Fix
This pattern usually indicates a slow-growing event loop block — a cache whose lookup time grows as it fills, a regex applied to progressively longer strings, or GC pauses increasing as heap utilisation climbs. Correlate event loop lag metrics over time with heap size metrics. If lag tracks heap growth, you have a memory leak causing GC pressure.
Node.js Performance Quick Debug Cheat Sheet · Immediate actions for common Node.js performance issues in production, ordered by what to check first
Event loop appears blocked: all requests timing out simultaneously
Immediate action
Identify the blocking operation before changing anything else — you need to know what's blocking before you can fix it
Commands
npx clinic doctor -- node app.js
node --inspect=9229 app.js
Fix now
Move the identified synchronous work off the request path entirely — either pre-compute at startup, cache the result, or offload to a worker thread using piscina
Heap memory growing unbounded: RSS increasing steadily over hours
Immediate action
Force a GC cycle and take a heap snapshot to establish a baseline before making any changes
Commands
node --inspect --expose-gc app.js
kill -USR2 <pid>
Fix now
Add max-size and TTL limits to every in-memory cache using lru-cache. Check for event emitters with growing listener counts (process.on('warning') will report MaxListenersExceeded). Look for Map or Set objects in closure scope that are never cleared.
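Single worker consuming disproportionate CPU in cluster mode
Immediate action
Verify that round-robin scheduling is active: it is Node's default on Linux, but NODE_CLUSTER_SCHED_POLICY=none or cluster.schedulingPolicy = cluster.SCHED_NONE leaves distribution to the OS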
Commands
NODE_CLUSTER_SCHED_POLICY=rr node app.js
htop -p "$(pgrep -d, -f 'node app.js')"
Fix now
Set NODE_CLUSTER_SCHED_POLICY=rr environment variable and restart. If distribution remains uneven, check whether long-lived connections (WebSockets, SSE) are pinning clients to specific workers.
High latency under load with low CPU: requests queuing without obvious reason
Immediate action
Measure event loop lag quantiles immediately — this distinguishes GC pressure from synchronous blocking
If lag tracks with request rate, look for synchronous work in the hot path — JSON.parse, crypto, regex. If lag is high even at low request rates, suspect GC pauses — check heap utilisation percentage.
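Node.js ships an event loop delay histogram in perf_hooks, which gives the quantiles needed here without extra packages. This is a minimal sketch; the 10ms resolution and 10-second reporting window are assumptions. Readings can include some baseline from the sampling mechanism itself, so compare against the values recorded while the process is idle.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Built-in histogram of event loop delay, sampled every 10ms.
// Values are recorded in nanoseconds.
const histogram = monitorEventLoopDelay({ resolution: 10 });
histogram.enable();

function lagQuantilesMs() {
  const toMs = (ns) => ns / 1e6;
  return {
    p50: toMs(histogram.percentile(50)),
    p99: toMs(histogram.percentile(99)),
    max: toMs(histogram.max),
  };
}

// Report, then reset, so each line covers a fresh 10-second window.
const reporter = setInterval(() => {
  const { p50, p99, max } = lagQuantilesMs();
  console.log(`event loop delay p50=${p50.toFixed(1)}ms p99=${p99.toFixed(1)}ms max=${max.toFixed(1)}ms`);
  histogram.reset();
}, 10_000);
reporter.unref(); // reporting alone should not keep the process alive
```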
Node.js Scaling Strategies Compared

| Strategy | Best For | Memory Model | Overhead | Primary Limitation |
|---|---|---|---|---|
| Cluster Module | I/O-bound HTTP services on multi-core VMs or multi-CPU pods | Fully isolated heaps per worker; no sharing | Low (~10-30MB per worker for runtime overhead) | No shared in-memory state; shared resource contention (DB pools, FDs) is the real ceiling |
| Worker Threads (piscina) | CPU-bound computation within a request (hashing, image processing, report generation) | Shared memory possible via SharedArrayBuffer | Medium (10-50ms startup, ~5-10MB per thread) | Not for I/O; pool sizing critical; thread errors surface as promise rejections |
| PM2 Cluster Mode | Long-running services needing zero-downtime reload and log management without K8s | Isolated heaps per worker (wraps cluster module) | Low (thin wrapper around cluster module) | Less operational control than raw cluster API; adds dependency; redundant in K8s |
| Kubernetes Horizontal Pod Autoscaling | Stateless services with variable or unpredictable traffic patterns | Fully isolated pods: separate processes, separate nodes | Higher (orchestration, scheduling, network overhead between pods) | Network latency between services; slower scale-up than in-process solutions |
| Event Loop Optimisation alone | Services already at low concurrency where single-core throughput is the bottleneck | Single process, single heap | None (pure code improvement) | Single-core CPU ceiling; cannot utilise additional cores on the host |
| Cluster + Worker Threads (combined) | Mixed workloads: high concurrent I/O with occasional CPU-bound operations per request | Isolated heaps per cluster worker; optional shared memory within each worker's thread pool | Medium (cluster overhead + thread pool overhead) | Complexity: two concurrency models to reason about, debug, and tune simultaneously |
Key takeaways
1. The event loop has six phases with a fixed execution order, and the microtask queues (nextTick first, then Promise.then) drain after every callback, so always before the next phase begins. Blocking any phase blocks every concurrent connection for the full duration.
2. Cluster forks one process per CPU core, each with its own V8 heap and event loop. Workers share nothing in memory: sessions, caches, and rate limiters must live in Redis or an equivalent external store.
3. Worker threads handle CPU-bound computation within a request without blocking the event loop. Use a fixed-size pool via piscina sized to CPU count; never spawn a thread per request. Cluster and worker threads solve different problems and can be combined.
4. Most production memory leaks come from three sources: unbounded in-memory caches without size limits or TTLs, event listeners accumulated without corresponding removeListener calls, and closures in long-lived objects retaining large references. Every cache needs both max and ttl.
5. V8 GC pauses scale with old generation heap utilisation. A heap at 90% capacity produces 50-200ms event loop stops that manifest as latency spikes, not errors. Set --max-old-space-size to 70-75% of container memory allocation, not 100%.
6. Profile under realistic concurrent load with clinic.js: start with clinic doctor for the cross-dimensional overview, then drill into flame or heapprofiler as patterns emerge. Idle servers hide every problem.
7. Monitor event loop lag as a first-class production metric using prom-client's collectDefaultMetrics(). It is the earliest signal of event loop health degradation: earlier than error rates, earlier than latency percentiles.
8. Node.js 22 is a supported LTS release line in 2026. It ships V8 12.x with improved JSON.parse performance and a stable built-in test runner, and its libuv can use io_uring on Linux for some file I/O (whether that is enabled by default depends on the release). The event loop model and all optimisation principles in this guide remain unchanged.
Common mistakes to avoid
Using JSON.parse on large request bodies without streaming or size limits
Symptom
Event loop blocks for 50-500ms per large request, causing all concurrent connections to experience the same latency spike simultaneously. Appears as P99 spikes correlated with specific request types, not with overall load.
Fix
Enforce a strict body size limit at the reverse proxy (nginx, AWS ALB, Cloudflare) before the payload reaches Node.js. For payloads that legitimately need to be large, use a streaming JSON parser (JSONStream or the WHATWG Streams API with a streaming JSON decoder). Set body-parser's limit option as a last line of defence. Never buffer a multi-megabyte payload and then synchronously parse it in the request handler.
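The size-limit rule can also be enforced inside Node.js before JSON.parse ever runs. This is a sketch assuming the body arrives as a readable stream of Buffers (as an http.IncomingMessage does); readJsonBody and the 100KB default are hypothetical choices, and the thrown error carries a 413 status for your error handler to use.

```javascript
// Reads a stream of Buffers (such as an http.IncomingMessage) and parses it as
// JSON, giving up as soon as the byte count passes limitBytes. Because the
// body is capped, the synchronous JSON.parse at the end is capped too.
async function readJsonBody(stream, limitBytes = 100 * 1024) {
  const chunks = [];
  let received = 0;
  for await (const chunk of stream) {
    received += chunk.length;
    if (received > limitBytes) {
      // Throwing out of for await destroys the stream, so the rest of the
      // payload is never buffered.
      const err = new Error(`Request body exceeds ${limitBytes} bytes`);
      err.statusCode = 413; // Payload Too Large
      throw err;
    }
    chunks.push(chunk);
  }
  return JSON.parse(Buffer.concat(chunks).toString('utf8'));
}
```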
Using cluster without an external session or cache store
Symptom
Users experience random logouts, inconsistent feature flag states, or stale data because their requests hit different workers with completely separate in-memory stores. The issue appears intermittently and is difficult to reproduce locally because local development typically runs a single process.
Fix
Move sessions to Redis (connect-redis with express-session), move caches to Redis with appropriate TTLs, and move rate limiting state to Redis (rate-limit-redis). In-memory stores are only reliable in single-process deployments. As of 2026, Valkey (the Redis fork maintained by the Linux Foundation) is a production-ready alternative if Redis licensing is a concern.
Setting --max-old-space-size to the full container memory allocation
Symptom
Container gets OOM-killed by the kernel (exit code 137) even though the V8 heap appears to be within the limit. The kill happens during peak GC activity when V8 briefly holds both the current heap and the compacted copy in memory simultaneously.
Fix
Set --max-old-space-size to 70-75% of your container memory allocation. The remaining 25-30% covers non-heap allocations: Node.js Buffer pool (off-heap by design), native module memory (zlib, crypto), libuv thread pool stacks, worker thread stacks, and the OS page cache. A 512MB container should have --max-old-space-size=384 at most.
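The sizing rule is simple arithmetic. This is a minimal sketch; maxOldSpaceSizeMb is a hypothetical helper, and 0.75 is the upper end of the 70-75% range above.

```javascript
// Hypothetical helper: derive --max-old-space-size (in MB) from the container
// memory limit, leaving the rest for off-heap memory such as Buffers, native
// modules and thread stacks.
function maxOldSpaceSizeMb(containerMemoryMb, heapFraction = 0.75) {
  return Math.floor(containerMemoryMb * heapFraction);
}

console.log(`--max-old-space-size=${maxOldSpaceSizeMb(512)}`); // --max-old-space-size=384
console.log(`--max-old-space-size=${maxOldSpaceSizeMb(2048, 0.7)}`); // --max-old-space-size=1433
```

Passing the result through NODE_OPTIONS, for example NODE_OPTIONS=--max-old-space-size=384 node server.js, keeps the container command itself unchanged.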
Using setTimeout(fn, 0) to yield the event loop in batch processing
Symptom
Batch processing of large arrays takes significantly longer than expected, sometimes 10-100x longer than necessary. Each setTimeout(fn, 0) call waits at least 1ms, because Node.js clamps any delay below 1ms to 1ms, and that adds up to seconds across thousands of batches.
Fix
Use setImmediate instead: its callback runs in the check phase, right after the poll phase has processed pending I/O, with no timer delay. For a batch of 100,000 items processed 500 at a time, setImmediate yields 200 times with no added wait, whereas setTimeout adds at least 200ms of pure delay. At scale, that is the difference between 'fast enough' and 'too slow for production SLAs'.
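The setImmediate approach can be written as an async loop with the promise form from node:timers/promises. This is a sketch; processInBatches is a hypothetical helper, and the right batch size is whatever keeps each synchronous slice comfortably short for your workload.

```javascript
const { setImmediate: yieldToEventLoop } = require('node:timers/promises');

// Process items in fixed-size synchronous slices, yielding between slices so
// pending I/O callbacks and other requests get a turn. setImmediate adds no
// timer delay, unlike setTimeout(fn, 0), which waits at least 1ms per yield.
async function processInBatches(items, batchSize, processItem) {
  for (let start = 0; start < items.length; start += batchSize) {
    const end = Math.min(start + batchSize, items.length);
    for (let i = start; i < end; i++) processItem(items[i]);
    if (end < items.length) await yieldToEventLoop();
  }
}

// 100,000 items in batches of 500, as in the example above
const items = Array.from({ length: 100_000 }, (_, i) => i);
let sum = 0;
processInBatches(items, 500, (n) => { sum += n; }).then(() => console.log(sum)); // 4999950000
```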
Not monitoring event loop lag as a production metric
Symptom
Latency degrades gradually over days — a slow-growing cache, a regex applied to progressively longer strings, or GC pressure building as a memory leak matures. No single request fails. No error rate increases. Latency percentiles drift upward by 10ms per day for two weeks before someone notices.
Fix
Expose event loop lag as a Prometheus metric using prom-client's collectDefaultMetrics() — which emits nodejs_eventloop_lag_seconds, nodejs_eventloop_lag_p50_seconds, and nodejs_eventloop_lag_p99_seconds. Set alerts: warn at P95 > 50ms, page at P99 > 200ms. This metric is the earliest signal of event loop health degradation — earlier than latency percentiles, earlier than error rates.
Spawning a worker thread per incoming request
Symptom
Under sustained load, the process spawns hundreds of threads simultaneously. Memory usage spikes 2-3x normal. CPU spends more time on thread management than actual work. OS thread limits (typically 4,096-32,768 depending on configuration) cause thread creation to start failing with EAGAIN errors, which surface as unhandled exceptions.
Fix
Use piscina with a fixed maxThreads sized to the CPU core count (or CPU count minus 1 to leave a core for the event loop). Queue excess work rather than spawning more threads. Monitor pool.queueSize as a metric and implement load shedding (return 503) when the queue exceeds a threshold that your SLA cannot tolerate.
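piscina is the recommended route. To show what it handles for you, here is a dependency-free sketch of the same pattern: a fixed number of threads, a bounded queue, and rejection (load shedding) once the queue is full. FixedWorkerPool, WORKER_SOURCE and the inline square-root task are hypothetical; queueSize only mirrors the name of the metric mentioned above, and this is not piscina's API.

```javascript
const { Worker } = require('node:worker_threads');
const os = require('node:os');

// Inline worker body so the sketch is self-contained. It receives a number
// and does deliberately CPU-heavy work with it; replace with your real task.
const WORKER_SOURCE = `
  const { parentPort } = require('node:worker_threads');
  parentPort.on('message', (n) => {
    let acc = 0;
    for (let i = 0; i < n; i++) acc += Math.sqrt(i);
    parentPort.postMessage(acc);
  });
`;

class FixedWorkerPool {
  constructor({ size = Math.max(1, os.availableParallelism() - 1), maxQueue = 100 } = {}) {
    this.maxQueue = maxQueue;
    this.closed = false;
    this.workers = new Set();
    this.idle = [];
    this.queue = []; // pending { payload, resolve, reject }
    this.active = new Map(); // worker -> task it is running
    for (let i = 0; i < size; i++) this.#spawn();
  }

  #spawn() {
    const worker = new Worker(WORKER_SOURCE, { eval: true });
    worker.on('message', (result) => {
      this.active.get(worker).resolve(result);
      this.active.delete(worker);
      this.#assignNext(worker);
    });
    worker.on('error', (err) => {
      // An uncaught error kills the thread: fail its task and replace it
      this.active.get(worker)?.reject(err);
      this.active.delete(worker);
      this.workers.delete(worker);
      this.idle = this.idle.filter((w) => w !== worker);
      if (!this.closed) this.#spawn();
    });
    this.workers.add(worker);
    this.#assignNext(worker);
  }

  #assignNext(worker) {
    const task = this.queue.shift();
    if (task) this.#dispatch(worker, task);
    else this.idle.push(worker);
  }

  #dispatch(worker, task) {
    this.active.set(worker, task);
    worker.postMessage(task.payload);
  }

  run(payload) {
    return new Promise((resolve, reject) => {
      if (this.closed) return reject(new Error('Pool is closed'));
      const task = { payload, resolve, reject };
      const worker = this.idle.pop();
      if (worker) this.#dispatch(worker, task);
      else if (this.queue.length < this.maxQueue) this.queue.push(task);
      else reject(new Error('Worker pool queue is full')); // shed load, e.g. reply 503
    });
  }

  get queueSize() {
    return this.queue.length;
  }

  async close() {
    this.closed = true;
    for (const task of this.queue.splice(0)) task.reject(new Error('Pool is closed'));
    for (const task of this.active.values()) task.reject(new Error('Pool is closed'));
    this.active.clear();
    await Promise.all([...this.workers].map((w) => w.terminate()));
  }
}
```

In a request handler, a rejected run() is the signal to reply 503, and exporting queueSize as a gauge gives the queue metric described above.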
Ignoring unhandledRejection events in cluster workers
Symptom
On Node.js 15 and later (including Node.js 22), unhandled promise rejections terminate the process. Cluster workers die silently — no stack trace in the application logs, just a 'Worker N exited with code 1' message. The root cause is invisible.
Fix
Add a process.on('unhandledRejection') handler in every worker that logs the full rejection reason and stack trace before the process exits. Consider whether to exit immediately (safest — prevents undefined state) or attempt graceful shutdown first. Never swallow the rejection without logging — a silent crash is the hardest class of production bug to diagnose.
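A minimal handler along those lines, assuming console output reaches your log pipeline and that exiting immediately is the chosen policy. Registering any unhandledRejection listener replaces Node's default crash behaviour, so the handler has to exit explicitly.

```javascript
const cluster = require('node:cluster');
const { inspect } = require('node:util');

// Log everything about the rejection before the worker exits, so the crash
// is diagnosable instead of a bare "Worker N exited with code 1".
process.on('unhandledRejection', (reason) => {
  const who = cluster.isWorker ? `worker ${cluster.worker.id}` : 'primary';
  // Errors carry a stack; anything else (strings, plain objects) gets inspected
  const detail = reason instanceof Error ? reason.stack : inspect(reason, { depth: 5 });
  console.error(`[${who} pid=${process.pid}] Unhandled promise rejection: ${detail}`);
  // A registered listener replaces Node's default crash, so exit explicitly
  // rather than continue in an undefined state. Swap this for a graceful
  // shutdown routine if your service has one.
  process.exit(1);
});
```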
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01 · SENIOR
Explain the phases of the Node.js event loop and what happens in each ph...
Q02 · SENIOR
What is the difference between cluster.fork() and worker_threads, and wh...
Q03 · SENIOR
How would you detect and fix a memory leak in a production Node.js servi...
Q04 · SENIOR
Why might a Node.js service have low CPU usage but high latency?
Q05 · JUNIOR
What is event loop lag and how do you monitor it in production?
Q01 of 05 · SENIOR
Explain the phases of the Node.js event loop and what happens in each phase.
ANSWER
The event loop runs six phases in a fixed order every iteration. Timers: executes setTimeout and setInterval callbacks whose minimum delay has elapsed. Pending callbacks: executes I/O callbacks deferred from the previous iteration, such as certain TCP error notifications (for example ECONNREFUSED). Idle/prepare: used internally by libuv and Node.js; application code does not interact with this phase. Poll: retrieves new I/O events and executes their callbacks. This is where the majority of application work happens, and where the loop may block waiting for new I/O if there are no pending timers and no setImmediate callbacks. Check: executes setImmediate callbacks, which are designed to run after the poll phase and before the next timers phase. Close callbacks: executes handlers for abrupt closes, like socket.on('close').
After every individual callback (since Node.js 11), and therefore also before the loop moves on to its next phase, Node.js drains the microtask queues completely and in priority order: all process.nextTick callbacks first, then resolved Promise.then handlers, repeating until both queues are empty. This means a process.nextTick callback fires before any Promise.then handler queued at the same point, and both run before the loop continues. The practical implication: if you call process.nextTick recursively, you can starve I/O indefinitely, because the loop never moves on while the nextTick queue keeps refilling.
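The ordering rules above can be observed directly. This sketch uses only built-in modules (observeOrder is a hypothetical helper name) and starts inside an I/O callback, i.e. in the poll phase. From there the check phase always comes before the next timers phase; scheduled from the main module instead, the relative order of setTimeout(0) and setImmediate is not guaranteed.

```javascript
const fs = require('node:fs');

// Records which queue runs when, starting from an fs callback (poll phase).
function observeOrder(done) {
  const order = [];
  fs.stat(process.cwd(), () => {
    setTimeout(() => {
      order.push('timeout');                              // timers phase, next iteration
      done(order);
    }, 0);
    setImmediate(() => order.push('immediate'));          // check phase, this iteration
    Promise.resolve().then(() => order.push('promise'));  // promise microtask
    process.nextTick(() => order.push('nextTick'));       // nextTick queue, drained first
    order.push('sync');                                   // the callback itself
  });
}

observeOrder((order) => console.log(order.join(' -> ')));
// sync -> nextTick -> promise -> immediate -> timeout
```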
Q02 of 05 · SENIOR
What is the difference between cluster.fork() and worker_threads, and when would you choose each?
ANSWER
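cluster.fork() creates a separate OS process with a fully isolated V8 instance, its own heap, its own garbage collector, and its own event loop. Worker threads run inside the same process, but each worker gets its own V8 isolate, JavaScript heap, and event loop, so ordinary objects are not shared: they are copied (structured clone) when sent with postMessage. Memory is shared only explicitly, via SharedArrayBuffer, or handed over by transferring an ArrayBuffer.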
Choose clustering when your bottleneck is I/O throughput — serving more concurrent HTTP connections, more concurrent database queries, more concurrent outbound API calls. Each worker gets its own event loop, so N workers means N concurrent event loops processing I/O simultaneously.
Choose worker threads when your bottleneck is CPU computation within a single request or operation — image resizing, PDF generation, cryptographic key derivation, ML inference, large data transformation. The computation runs in parallel without blocking the main event loop.
In practice, most production services benefit from both: a cluster of workers (one per CPU core) for I/O throughput, with each worker maintaining a small thread pool via piscina for the occasional CPU-bound operation within a request. This handles the mixed workload that most real services have.
The critical operational difference: cluster workers cannot share in-memory state at all. Worker threads can share memory via SharedArrayBuffer, which matters for high-throughput data processing where the cost of serialising data between a worker and the main thread would otherwise dominate.
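A minimal sketch of that explicit sharing, assuming only the built-in node:worker_threads module: several workers increment one counter in a SharedArrayBuffer with Atomics.add, and the main thread reads the total after they exit. The worker body is passed as a string with eval: true purely to keep the example in one file, runSharedCounter is a hypothetical name, and it spawns workers per call only for demonstration; in a service you would keep a fixed pool (for example via piscina). With cluster workers the same counter would need IPC messages or an external store.

```javascript
const { Worker } = require('node:worker_threads');

// Worker body as a string (eval: true) so the sketch fits in one file.
// workerData.shared is the SharedArrayBuffer the main thread created:
// it is shared with the worker, not copied.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.iterations; i++) {
    Atomics.add(counter, 0, 1); // atomic increment, safe across threads
  }
`;

function runSharedCounter(workerCount, iterations) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);
  const finished = Array.from({ length: workerCount }, () => new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { shared, iterations } });
    worker.once('error', reject);
    worker.once('exit', (code) => (code === 0 ? resolve() : reject(new Error(`worker exited with code ${code}`))));
  }));
  // Once every worker has exited, all increments are visible to this thread.
  return Promise.all(finished).then(() => Atomics.load(counter, 0));
}

runSharedCounter(4, 10_000).then((total) => console.log(total)); // 40000
```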
Q03 of 05 · SENIOR
How would you detect and fix a memory leak in a production Node.js service?
ANSWER
Step 1 — Confirm the leak with metrics: look for RSS (Resident Set Size) and heap used values that grow steadily over hours without stabilising after GC cycles. A healthy service's memory graph should have a sawtooth pattern (allocate, GC, down) that stays bounded. A linear upward trend without GC recovery is a leak.
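Step 2 — Isolate the generation: check whether the old generation is growing using v8.getHeapSpaceStatistics(), which reports usage per heap space (old_space, new_space, and so on); v8.getHeapStatistics() only gives heap-wide totals. Young generation churn is normal. Old generation growth that doesn't reverse is a leak: objects survive long enough to be promoted from the young to the old generation and are then retained permanently.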
Step 3 — Heap snapshot comparison: start the process with --inspect, trigger a heap snapshot via Chrome DevTools Memory tab (or via kill -USR2 if you've implemented the SIGUSR2 handler), wait 30-60 minutes under production-representative load, take a second snapshot. Use the Comparison view to show objects allocated between the two snapshots, sorted by retained size. The object types at the top are your suspects.
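A sketch of the SIGUSR2 handler mentioned in Step 3, assuming nothing beyond node:v8, node:os and node:path: v8.writeHeapSnapshot() writes a .heapsnapshot file that the DevTools Memory tab can load for the Comparison view. SIGUSR1 is reserved by Node.js for activating the inspector, which is why SIGUSR2 is the usual choice. Node.js also has a built-in --heapsnapshot-signal=SIGUSR2 flag that does the same without code. The takeHeapSnapshot name and the tmpdir default are assumptions.

```javascript
const v8 = require('node:v8');
const os = require('node:os');
const path = require('node:path');

// Writes a snapshot and returns its path. writeHeapSnapshot is synchronous:
// the event loop is blocked while it runs, and V8 needs extra memory to
// build the snapshot, so point `dir` at a volume with room to spare.
function takeHeapSnapshot(dir = os.tmpdir()) {
  const file = path.join(dir, `heap-${process.pid}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file);
}

// `kill -USR2 <pid>` then produces a snapshot without restarting the process.
process.on('SIGUSR2', () => {
  const file = takeHeapSnapshot();
  console.error(`heap snapshot written to ${file}`);
});
```

Take one snapshot, let production-representative load run, take a second, and compare the two files in DevTools.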
Step 4 — Identify the retention path: select a retained object and look at the retainer tree — it shows which object is holding a reference to it, and which object holds that, all the way to the GC root. This chain tells you exactly where the reference is being held and what code path created it.
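Step 5 — Fix and verify: common fixes are adding max-size and TTL to caches (lru-cache), pairing addListener with removeListener on connection close, clearing intervals in shutdown handlers, and removing references that long-lived objects (module-level Maps, singletons) hold to request-scoped data. Circular references on their own are not a leak, because V8's tracing GC collects unreachable cycles. Verify the fix by repeating the snapshot comparison in staging under load: memory growth should stop.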
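The lru-cache package is the practical choice for the first fix. To show what its max and ttl options protect against, here is a dependency-free sketch; BoundedCache is hypothetical and is not lru-cache's API. It uses a Map's insertion order for least-recently-used eviction and drops entries once their TTL has passed.

```javascript
// Size- and TTL-bounded cache. Illustrative only; prefer lru-cache in production.
class BoundedCache {
  constructor({ max, ttlMs, now = Date.now }) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.now = now;        // injectable clock, handy for tests
    this.map = new Map();  // key -> { value, expiresAt }, oldest first
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.map.delete(key); // expired: drop the reference so GC can reclaim it
      return undefined;
    }
    this.map.delete(key);   // re-insert to mark as most recently used
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
    while (this.map.size > this.max) {
      this.map.delete(this.map.keys().next().value); // evict least recently used
    }
  }

  get size() {
    return this.map.size;
  }
}
```

However many distinct keys arrive, the cache never holds more than max entries, which is exactly the bound an unbounded module-level Map lacks.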
Step 6 — Prevent recurrence: add heap utilisation as a monitored metric (used_heap_size / heap_size_limit). Alert when utilisation exceeds 0.75 — this gives you time to investigate before the OOM kill.
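Steps 2 and 6 can be sketched with the built-in node:v8 module. The per-generation figure comes from v8.getHeapSpaceStatistics(), where the old generation is the space named old_space; the utilisation ratio uses v8.getHeapStatistics(). The function names, the 'alert'/'critical' labels and the report callback are assumptions; 0.75 is the alert threshold above, and 0.85 is the GC-pressure threshold this guide uses when diagnosing latency.

```javascript
const v8 = require('node:v8');

// Bytes currently used by the old generation (Step 2).
function oldSpaceUsedBytes() {
  const old = v8.getHeapSpaceStatistics().find((s) => s.space_name === 'old_space');
  return old ? old.space_used_size : 0;
}

// used_heap_size / heap_size_limit (Step 6).
function heapUtilisation(stats = v8.getHeapStatistics()) {
  return stats.used_heap_size / stats.heap_size_limit;
}

function heapAlertLevel(utilisation) {
  if (utilisation > 0.85) return 'critical'; // expect GC pauses to hurt latency
  if (utilisation > 0.75) return 'alert';    // time to investigate before an OOM
  return 'ok';
}

// Samples once a minute; `report` is a hypothetical sink (a Prometheus gauge,
// a log line, ...). unref() keeps the timer from holding the process open.
function startHeapMonitor(report, intervalMs = 60_000) {
  const timer = setInterval(() => {
    const utilisation = heapUtilisation();
    report({ utilisation, oldSpaceUsedBytes: oldSpaceUsedBytes(), level: heapAlertLevel(utilisation) });
  }, intervalMs);
  timer.unref();
  return timer;
}
```

Plotting oldSpaceUsedBytes over hours shows the Step 1 sawtooth versus a steady climb; the level field is what an alert rule would key on.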
Q04 of 05 · SENIOR
Why might a Node.js service have low CPU usage but high latency?
ANSWER
Low CPU with high latency in Node.js points to one of three root causes, each with a different diagnostic path.
First cause: event loop blocking by synchronous operations. JSON.parse on large payloads, synchronous file I/O (fs.readFileSync), regex on long strings, synchronous crypto, or any tight computation loop will block the event loop. CPU appears low because the work finishes quickly in absolute terms, but during those milliseconds, zero other requests are served. Diagnosis: measure event loop lag with the event-loop-lag package or prom-client's nodejs_eventloop_lag metrics. If lag spikes correlate with specific request types, that request type is blocking the loop.
Second cause: GC pressure from a heap near capacity. V8's mark-sweep-compact GC pauses the event loop for 50-200ms when the old generation is near its size limit. These pauses appear as latency spikes without an obvious CPU spike: the collector is doing real work, but pauses this brief barely register on CPU graphs averaged over seconds or minutes, and no application code runs while they last. Diagnosis: check heap utilisation percentage. If v8.getHeapStatistics().used_heap_size / heap_size_limit exceeds 0.85, GC pressure is the likely cause.
Third cause: downstream dependency latency — database queries, external API calls, or cache operations taking longer than normal. The event loop is healthy, but requests are queuing behind slow I/O. Diagnosis: clinic bubbleprof visualises async operation chains and shows where time is spent waiting. Database slow query logs and APM traces confirm this.
In production on Node.js 22, always check the Prometheus nodejs_eventloop_lag_p99_seconds metric first — it immediately distinguishes event loop blocking from downstream latency issues.
Q05 of 05 · JUNIOR
What is event loop lag and how do you monitor it in production?
ANSWER
Event loop lag is the delta between when a callback was scheduled to execute and when it actually executes. In a healthy Node.js service under moderate load, lag should be under 10ms. When it exceeds 50ms, the event loop is under stress. When it exceeds 200ms, clients are experiencing meaningful latency degradation.
Lag increases for three reasons: synchronous operations blocking a phase, GC pauses (which are also synchronous from the event loop's perspective), or a backlog of queued callbacks that takes time to drain.
To measure it: the simplest approach is scheduling a timer with a known interval and measuring how late it actually fires. The difference is the lag. The event-loop-lag npm package does exactly this. In production, prom-client's collectDefaultMetrics() emits nodejs_eventloop_lag_seconds plus percentile gauges including nodejs_eventloop_lag_p50_seconds, nodejs_eventloop_lag_p90_seconds, and nodejs_eventloop_lag_p99_seconds automatically; these are the metrics to alert on.
Alerts that work in practice: warn when P90 lag (or P95, if your tooling exposes it) exceeds 50ms, page when P99 lag exceeds 200ms. Correlate lag spikes with deployment timestamps, traffic spikes, and heap utilisation to identify root causes systematically rather than hunting blindly.
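A sketch of the timer-drift technique and the alert rule together, assuming only node:perf_hooks. startLagSampler, percentile and lagAlertLevel are hypothetical names; percentiles use the nearest-rank method, and the warning percentile defaults to 90 to match prom-client's gauges. Node.js also ships perf_hooks.monitorEventLoopDelay(), a built-in histogram of the same signal; this version only illustrates the principle.

```javascript
const { performance } = require('node:perf_hooks');

// Schedule a timer for `intervalMs`, record how much later than that it
// actually fired (lag in ms), and repeat.
function startLagSampler({ intervalMs = 100, maxSamples = 1000 } = {}) {
  const samples = [];
  let timer;
  const schedule = () => {
    const scheduledAt = performance.now();
    timer = setTimeout(() => {
      const lag = performance.now() - scheduledAt - intervalMs;
      samples.push(Math.max(0, lag));
      if (samples.length > maxSamples) samples.shift(); // keep the buffer bounded
      schedule();
    }, intervalMs);
    timer.unref(); // don't keep the process alive just to measure it
  };
  schedule();
  return { samples, stop: () => clearTimeout(timer) };
}

// Nearest-rank percentile over a sorted copy of the samples.
function percentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Warn above 50ms at P90, page above 200ms at P99.
function lagAlertLevel(samples, { warnPercentile = 90, warnMs = 50, pageMs = 200 } = {}) {
  if (percentile(samples, 99) > pageMs) return 'page';
  if (percentile(samples, warnPercentile) > warnMs) return 'warn';
  return 'ok';
}
```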
Clinic doctor also visualises event loop delay as a time-series chart during a profiling session, making it straightforward to correlate lag spikes with specific traffic patterns or deployments.
Frequently Asked Questions
01
How many cluster workers should I run in production?
Start with one worker per CPU core, but subtract one core for the primary process and OS overhead. On a 4-core container: run 3 workers. On an 8-core VM: run 7 workers. This is a starting point, not a hard rule.
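The starting-point arithmetic, plus a bare primary that forks that many workers, as a sketch on the built-in node:cluster and node:os modules. defaultWorkerCount, startCluster and the runServer callback are hypothetical names. os.availableParallelism() needs Node.js 18.14 or later (os.cpus().length is the older equivalent), and in a container it may report the host's cores rather than your CPU limit, so passing the core count explicitly is safer. The primary here only logs worker exits; restart policy is left out.

```javascript
const cluster = require('node:cluster');
const os = require('node:os');

// One worker per core, minus one for the primary and OS overhead, never below one.
function defaultWorkerCount(cores = os.availableParallelism()) {
  return Math.max(1, cores - 1);
}

// `runServer` is a hypothetical function containing the worker's app code.
function startCluster(runServer, count = defaultWorkerCount()) {
  if (cluster.isPrimary) {
    for (let i = 0; i < count; i++) cluster.fork();
    cluster.on('exit', (worker, code, signal) => {
      console.error(`Worker ${worker.process.pid} exited (code ${code}, signal ${signal})`);
    });
  } else {
    runServer();
  }
}
```

For the examples above, defaultWorkerCount(4) returns 3 and defaultWorkerCount(8) returns 7.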
Monitor per-worker CPU utilisation after deploying. If workers consistently sit below 50% CPU utilisation under peak load, you have more workers than the workload needs and you are wasting memory (each worker consumes 30-100MB of baseline RAM). If workers consistently exceed 80% CPU utilisation, add workers or scale the container.
In Kubernetes with autoscaling, the K8s-preferred pattern is one process per pod (no intra-pod clustering) with the HPA managing pod count. This gives cleaner isolation, simpler crash recovery, more accurate resource accounting, and allows K8s to schedule pods across nodes. Intra-pod clustering makes most sense when a single pod is allocated 4+ CPUs and you want to utilise all of them without the orchestration overhead of additional pods.
02
Why does my Node.js process use more memory than --max-old-space-size allows?
The V8 heap is only one component of total Node.js process memory. Total RSS (Resident Set Size) includes all of: the V8 old generation heap (controlled by --max-old-space-size), the V8 new generation (young space, ~32MB by default), off-heap Buffer allocations (Buffer.alloc and Buffer.from use memory outside the V8 heap by design — this is intentional for performance), native module memory (crypto, zlib, bcrypt, and other C++ addons allocate from the OS directly), the libuv thread pool stacks (default 4 threads), worker thread stacks (if you're using worker_threads), and V8 code space and map space for compiled JavaScript.
A process with --max-old-space-size=512 routinely uses 700-900MB of total RSS under moderate load. Set the flag to 70-75% of your container memory allocation, not 90% or 100%. The remaining headroom is needed for non-heap allocations and to prevent the OOM killer from intervening during peak GC activity when V8 briefly holds both the current and compacted heap simultaneously.
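A sketch that puts numbers on this, using only process.memoryUsage() and node:v8. suggestedMaxOldSpaceMb (a hypothetical helper) applies the 70-75% rule, and memoryBreakdownMb shows where RSS goes. Per the Node.js docs, arrayBuffers counts Buffer contents and is also included in external, which is why a large Buffer raises those figures but not heapUsed.

```javascript
const v8 = require('node:v8');

// --max-old-space-size (MB) for a container memory limit, 75% by default.
function suggestedMaxOldSpaceMb(containerMb, ratio = 0.75) {
  return Math.floor(containerMb * ratio);
}

// Where RSS goes: heapUsed/heapTotal are V8's JavaScript heap; external and
// arrayBuffers are memory such as Buffer contents held outside it.
function memoryBreakdownMb() {
  const toMb = (bytes) => Math.round(bytes / 1024 / 1024);
  const m = process.memoryUsage();
  return {
    rss: toMb(m.rss),
    heapTotal: toMb(m.heapTotal),
    heapUsed: toMb(m.heapUsed),
    external: toMb(m.external),
    arrayBuffers: toMb(m.arrayBuffers),
    heapLimit: toMb(v8.getHeapStatistics().heap_size_limit),
  };
}

console.log(suggestedMaxOldSpaceMb(1024)); // 768
console.log(memoryBreakdownMb());
```

For a 1GB container that suggests NODE_OPTIONS='--max-old-space-size=768', leaving the rest for the non-heap categories listed above.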
03
Should I use PM2 or the built-in cluster module?
It depends on where you're deploying. On a VM or bare-metal server where you are responsible for process lifecycle management: PM2 is the right choice. It wraps the cluster module and adds operational necessities — zero-downtime reload (pm2 reload), structured log aggregation, a monitoring dashboard (pm2 monit), and automatic restart policies. Managing process lifecycle manually with the raw cluster module on a VM is reinventing what PM2 already does reliably.
In Kubernetes: PM2 adds very little value. K8s handles restarts (liveness probes), scaling (HPA), rolling deployments (RollingUpdate strategy), and log collection (stdout to a sidecar or log aggregator). The overhead of PM2 inside a container that K8s is already managing creates a confusing double layer of process supervision. In K8s, prefer either one process per pod (simplest) or the raw cluster module if you need multi-core utilisation within a single pod.
04
Can I use async/await for everything and never worry about blocking the event loop?
No — and this misconception is responsible for a significant number of production event loop blocking incidents. async/await is syntax that makes asynchronous code readable. It does not make synchronous operations asynchronous.
Every one of the following is synchronous and blocks the event loop regardless of whether it appears inside an async function: JSON.parse(), JSON.stringify() on large objects, regex matching on long strings (especially with backtracking-prone patterns), Array.sort() on large arrays, crypto.pbkdf2Sync() (note the Sync suffix — that is the warning), any tight computation loop, and any native addon that does not use libuv's async APIs.
await only helps when you're awaiting something that is already asynchronous — a network request, a database query, a file read via fs.promises. If the underlying operation is synchronous, wrapping it in async/await changes nothing about its blocking behaviour. The rule: use async/await for I/O concurrency. Use worker threads for CPU-bound work that takes more than 10ms. There is no shortcut between these two.
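A small experiment makes the point concrete. busyAsync and measureTimerDelay are hypothetical helpers built only on node:perf_hooks: a 0 ms timer is scheduled, then an async function spins the CPU. Because the async keyword does not move work off the main thread, the timer cannot fire until the spin has finished.

```javascript
const { performance } = require('node:perf_hooks');

// CPU-bound work inside an async function still runs synchronously to completion.
async function busyAsync(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) { /* spin */ }
  return 'done';
}

function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = performance.now();
    // On an idle loop this 0 ms timer would fire almost immediately...
    setTimeout(() => resolve(performance.now() - scheduled), 0);
    // ...but this "async" call blocks the thread before the loop can reach the timers phase.
    busyAsync(blockMs);
  });
}

measureTimerDelay(100).then((delay) => console.log(`timer fired after ~${Math.round(delay)} ms`));
```

Replacing the spin with an awaited fs.promises read or database query would let the timer fire on time, because that work happens outside the JavaScript thread.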
05
What causes the 'JavaScript heap out of memory' error and how do I fix it immediately?
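This error fires when V8's old generation heap reaches its configured maximum. The default limit depends on the Node.js version and the available system memory, so check v8.getHeapStatistics().heap_size_limit for the effective value rather than relying on a fixed number. V8 cannot collect enough garbage to make room for new allocations and gives up.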
Immediate fix: increase --max-old-space-size to buy time. Add NODE_OPTIONS='--max-old-space-size=2048' before diagnosing the root cause. This is not a permanent solution — it delays the crash but does not fix the leak.
Proper fix: take a heap snapshot before and after 30 minutes of load, compare them in Chrome DevTools Memory tab's Comparison view, identify the object types with the largest retained size growth. Then fix the actual leak: the most common causes are unbounded in-memory caches (fix with lru-cache max and ttl options), event listeners that accumulate across reconnections (fix by pairing addListener with removeListener on close), and closures in middleware or request handlers retaining req or res objects beyond the request lifecycle.
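A sketch of the listener-pairing fix using node:events. The shared bus and the connection objects (an EventEmitter with a send method) are hypothetical stand-ins for an application-wide emitter and a per-client socket wrapper. Without the removeListener call, every reconnection would leave one more closure, and whatever it captures, reachable from bus.

```javascript
const { EventEmitter } = require('node:events');

// Long-lived emitter that outlives individual connections.
const bus = new EventEmitter();

function attachConnection(connection) {
  const onUpdate = (msg) => connection.send(msg);
  bus.on('update', onUpdate);
  // Pair the add with a remove when this connection goes away.
  connection.once('close', () => bus.removeListener('update', onUpdate));
}
```

bus.listenerCount('update') should track the number of open connections; if it only ever climbs, listeners are leaking.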
Long-term fix: add heap utilisation monitoring (used_heap_size / heap_size_limit) as a Prometheus metric and alert when it exceeds 0.75. This gives you hours of warning before the OOM crash, time to take a heap snapshot and diagnose while the service is still running.
06
Is Node.js 22 LTS significantly different from Node.js 20 LTS for the topics covered in this guide?
For the event loop model, clustering architecture, and worker thread API — no, they are functionally identical. The fundamental architecture has not changed.
What is different in Node.js 22 that might affect production performance: V8 12.x with improved JSON.parse performance (still synchronous, still blocks the event loop, but faster than 20 LTS for the same payload size); a built-in WebSocket client via the WHATWG WebSocket API; and improved single-executable application support. Two features often credited to 22 arrived during the Node.js 20 line: io_uring-based async file I/O on recent Linux kernels came with the libuv upgrade in Node.js 20.3 (whether it is enabled by default has changed between releases, so verify it for your exact version), and the built-in test runner (no more mocha/jest dependency for basic testing) was already marked stable in Node.js 20.0.
Node.js 20 LTS has been in maintenance mode since October 2024 and reaches end-of-life in April 2026. If you are running 20 in production today, the migration to 22 LTS is low-risk for most applications: the breaking changes between 20 and 22 are minimal and well-documented in the Node.js changelog.