Junior 8 min · March 06, 2026

Event Loop Block: 4-Second Template Compile in Node.js

An inline handlebars.

Naren · Founder
Plain-English first. Then code. Then the interview question.
 ● Production Incident 🔎 Debug Guide
Quick Answer
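  • Node.js runs on a single-threaded event loop; libuv handles async I/O using OS-level polling for sockets and a thread pool for file system operations, DNS lookups, and some crypto and zlib work (default 4 threads, tunable via UV_THREADPOOL_SIZE)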
  • The event loop has 6 distinct phases (timers, pending, idle/prepare, poll, check, close) — blocking any phase freezes the entire process for every connected client
  • cluster module forks one process per CPU core, sharing no memory but enabling true parallel request handling across cores
  • Worker threads share memory via SharedArrayBuffer and handle CPU-bound work without blocking the event loop — use a fixed pool, never spawn per request
  • Memory leaks typically come from unbounded caches, forgotten timers, or closure-retained references — detect with --inspect and heap snapshot comparison
  • Production profiling (flamegraphs, clinic.js) reveals bottlenecks that synthetic benchmarks never expose
  • Node.js 22 LTS (still a supported LTS line throughout 2026) ships with a native test runner, a built-in WebSocket client, and improved single-executable application support; none of these changes affect the event loop model described here
Plain-English First

Imagine a single brilliant chef running your entire restaurant kitchen alone. They're fast — genuinely fast — but if one order takes 20 minutes to prepare (like baking a cake from scratch), every other customer waits. The chef doesn't serve table two while table one's cake is in the oven. Node.js is that chef: one thread, blazing fast for quick tasks, but one slow synchronous operation can freeze everything else behind it. Optimising Node.js means teaching that chef to delegate long jobs to kitchen assistants, batch similar tasks efficiently, and never stand still doing nothing when there are orders waiting. Clustering is like opening multiple identical kitchens. Worker threads are the kitchen assistants for heavy prep work.

Every Node.js app starts fast. Then reality hits — traffic spikes, database queries pile up, memory climbs steadily overnight, and that 'non-blocking' promise starts feeling like a lie. The truth is Node.js is genuinely efficient by design, but that efficiency has a very specific shape. Violate that shape and you'll hit performance walls that no amount of horizontal scaling will fully fix.

Node.js is single-threaded, but not single-concurrent. The event loop, backed by epoll/kqueue/IOCP at the OS level (plus io_uring for some file operations on Linux) and libuv's small thread pool, handles thousands of concurrent I/O operations without dedicating a thread to each one. The problem starts when you confuse 'non-blocking I/O' with 'can do anything without consequence'. A JSON.parse on a 50MB payload, a synchronous crypto operation, or a tight computation loop will block the event loop, and every connected client pays the price simultaneously.
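A minimal sketch using only the built-in timers shows the cost. A timer asked to fire after 10ms cannot run until a synchronous busy loop (a stand-in for a big JSON.parse or a tight computation) gives the thread back. busyWait and demoBlockedTimer are illustrative names, not Node.js APIs.

```javascript
// Shows a synchronous loop delaying a timer that was due long before the loop ended.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Spin: nothing else on this thread can run until this loop exits
  }
}

function demoBlockedTimer(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    // Ask for a callback in 10ms...
    setTimeout(() => resolve(Date.now() - scheduledAt), 10);
    // ...then hog the thread; the timer phase cannot run until busyWait returns
    busyWait(blockMs);
  });
}

demoBlockedTimer(200).then((actualMs) => {
  console.log(`Timer requested 10ms, fired after ${actualMs}ms`);
});
```

The reported delay is at least the blocking time, however short the requested timeout was. In a server, every pending request callback waits the same way.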

In early 2026, Node.js 24 is the Active LTS release, Node.js 22 is in Maintenance LTS (supported until April 2027), and Node.js 20 reaches end-of-life at the end of April 2026; this guide uses Node.js 22 as its reference point. Node.js 22 shipped a stable built-in test runner, a browser-compatible WebSocket client enabled by default, and V8 12.4 with its performance improvements. On Linux, libuv can also use io_uring for some file operations, although whether that is enabled by default has changed between releases. None of these changes alter the event loop model or the clustering architecture described in this guide: the fundamentals remain exactly what they were, and the mistakes remain exactly as costly.

This guide covers the internals that matter in production: event loop phases and timing, clustering for multi-core utilisation, worker threads for CPU-bound work, memory leak patterns and detection, and the profiling tools that reveal what synthetic benchmarks consistently miss. Every section is grounded in production behaviour, not toy examples.

Event Loop Internals: Phases, Timing, and What Blocks Everything

The Node.js event loop is not a simple FIFO queue. It runs in six distinct phases, each with its own callback queue, visited in a fixed order on every loop iteration. Understanding this order is the difference between writing code that performs predictably in production and code that surprises you at 3 AM on-call.

The phases in execution order: timers (setTimeout/setInterval callbacks whose delay has expired), pending callbacks (I/O callbacks deferred to the next loop iteration, such as certain TCP error notifications), idle/prepare (used internally by libuv and Node.js; you don't interact with this), poll (new I/O events; this is where most work happens, and where the loop may block waiting for events when there's nothing queued), check (setImmediate callbacks), and close callbacks (socket.on('close', ...) handlers).

On Node.js 11 and later, the microtask queues are drained after every callback the loop runs, which also means they are always empty before the loop moves to the next phase. The process.nextTick queue is drained first, then resolved Promise.then handlers. This ordering matters more than most engineers realise: a process.nextTick callback fires before any Promise.then queued in the same callback, and both fire before the event loop moves on.
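A small sketch makes those drain points visible. Three setImmediate callbacks are queued for the same check phase, and the first two schedule microtasks. On Node.js 11 and later, Node drains the nextTick and promise queues between the immediates, not only once the whole check phase has finished. microtaskOrderDemo is an illustrative name.

```javascript
// Records the order in which callbacks and microtasks run during one check phase.
function microtaskOrderDemo() {
  const order = [];
  return new Promise((resolve) => {
    setImmediate(() => {
      order.push('immediate 1');
      Promise.resolve().then(() => order.push('promise 1'));
      process.nextTick(() => order.push('nextTick 1'));
    });
    setImmediate(() => {
      order.push('immediate 2');
      Promise.resolve().then(() => order.push('promise 2'));
    });
    // Queued alongside the first two, so it runs in the same check phase, after them
    setImmediate(() => resolve(order));
  });
}

microtaskOrderDemo().then((order) => console.log(order.join(' -> ')));
// immediate 1 -> nextTick 1 -> promise 1 -> immediate 2 -> promise 2
```

The nextTick callback runs before the promise callback, even though it was scheduled second. Both run before the second setImmediate callback.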

The critical practical insight: inside an I/O callback, setImmediate always fires before setTimeout(fn, 0). The check phase comes right after poll in the current iteration, while the timer can only run in the next iteration's timer phase. From the main module, outside any I/O callback, the relative order of the two is not deterministic, because it depends on how quickly the process reaches its first loop iteration. This isn't academic trivia. It determines the ordering of operations in streaming pipelines, connection teardown sequences, and graceful shutdown logic, where getting the order wrong causes data loss or unclosed handles.

On Linux, libuv can use io_uring for some file system operations instead of its thread pool, which reduces syscall and thread hand-off overhead. Whether this is enabled by default has changed between Node.js and libuv releases. The event loop phase model is unchanged either way: completed file operations are still delivered as callbacks in the poll phase.

event-loop-phases.js
const fs = require('fs');

// Demonstrates the phase ordering — run this and study the output
fs.readFile(__filename, () => {
  // We are now inside an I/O callback (poll phase)

  setTimeout(() => console.log('3: setTimeout'), 0);
  // Will run in the NEXT iteration's timer phase

  setImmediate(() => console.log('2: setImmediate'));
  // Will run in THIS iteration's check phase — before setTimeout

  Promise.resolve().then(() => console.log('1: Promise.then'));
  // Microtask — runs before check phase

  process.nextTick(() => console.log('0: nextTick'));
  // Microtask — runs before Promise.then, before any phase
});

// Output order:
// 0: nextTick       (microtask, highest priority — before any phase transition)
// 1: Promise.then   (microtask, runs after nextTick queue is empty)
// 2: setImmediate   (check phase — this iteration)
// 3: setTimeout     (timer phase — NEXT iteration)


// ── Production pattern: yield the event loop during batch processing ───────
// Without yielding, processing 100,000 items blocks the loop for the full duration.
// With setImmediate between batches, I/O and other requests can interleave.
function processBatchWithYield(items, batchSize, processFn, onComplete) {
  let index = 0;

  function processNextBatch() {
    const batchEnd = Math.min(index + batchSize, items.length);

    // Process one batch synchronously
    for (; index < batchEnd; index++) {
      processFn(items[index]);
    }

    if (index < items.length) {
      // Yield to the event loop — allows pending I/O and HTTP requests to run
      // setImmediate is better than setTimeout(fn, 0) here:
      // no 1ms timer floor, and it runs in the next check phase,
      // after the loop has polled for pending I/O
      setImmediate(processNextBatch);
    } else {
      onComplete();
    }
  }

  processNextBatch();
}

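// Usage (illustrative data; substitute your own array and per-item function):
const largeArray = Array.from({ length: 100_000 }, (_, i) => i);
const transform = (n) => n * 2;

processBatchWithYield(
  largeArray,
  500,           // Process 500 items per batch before yielding
  (item) => transform(item),
  () => console.log('All items processed')
);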


// ── Measuring actual event loop lag in production ──────────────────────────
// Schedule a task, measure how long it actually takes to execute vs expected.
// The delta is your event loop lag.
function measureEventLoopLag() {
  const INTERVAL_MS = 1000;
  let lastCheck = Date.now();

  setInterval(() => {
    const now = Date.now();
    const lag = now - lastCheck - INTERVAL_MS;
    lastCheck = now;

    if (lag > 50) {
      console.warn(`Event loop lag: ${lag}ms — investigate blocking operations`);
    }
    // In production: expose this as a Prometheus gauge
  }, INTERVAL_MS).unref(); // don't keep the process alive just for monitoring
}

measureEventLoopLag();
Output
// Output from the phase ordering demo:
0: nextTick
1: Promise.then
2: setImmediate
3: setTimeout
// Output from measureEventLoopLag() under normal load:
// (no output — lag under 50ms)
// Output from measureEventLoopLag() when a synchronous operation blocks the loop:
Event loop lag: 382ms — investigate blocking operations
Event loop lag: 3847ms — investigate blocking operations
The Event Loop as a Six-Station Assembly Line
  • process.nextTick() cuts the queue entirely — it runs before the next station even starts, making it useful for deferring work within the current operation but dangerous if overused (it can starve I/O)
  • Promise.then() callbacks also run as microtasks, after nextTick is exhausted but before any phase transition
  • The poll phase is where the loop spends most of its time: it processes I/O callbacks and, when nothing else is ready, waits for new I/O events, but only until the nearest timer is due, and not at all if setImmediate callbacks are queued
  • setImmediate was designed specifically for yielding inside I/O callbacks — it fires before the next timer phase without the 1ms minimum delay that setTimeout carries
  • Blocking any phase for more than 10-20ms degrades latency for every connected client simultaneously — there is no isolation between requests in a single Node.js process
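Node.js also has a built-in alternative to a hand-rolled setInterval lag check. perf_hooks.monitorEventLoopDelay() samples loop delay on a libuv timer and records the results in a histogram, with values in nanoseconds. The following is a minimal sketch: startLoopDelayMonitor, the 20ms resolution, the 50ms threshold and the 10-second window are all illustrative choices.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Samples event loop delay every 20ms into a histogram (values in nanoseconds)
// and logs the p99 and max for each reporting window when p99 is too high.
function startLoopDelayMonitor({ reportEveryMs = 10_000, warnAboveMs = 50 } = {}) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();

  const timer = setInterval(() => {
    const p99Ms = histogram.percentile(99) / 1e6;
    const maxMs = histogram.max / 1e6;
    if (p99Ms > warnAboveMs) {
      console.warn(`Event loop delay p99=${p99Ms.toFixed(1)}ms max=${maxMs.toFixed(1)}ms`);
    }
    // In production: export p99Ms/maxMs as metrics instead of (or as well as) logging
    histogram.reset(); // start a fresh window for the next report
  }, reportEveryMs);
  timer.unref(); // monitoring alone should not keep the process alive

  const stop = () => {
    clearInterval(timer);
    histogram.disable();
  };
  return { histogram, stop };
}
```

Percentiles are more useful than a single lag sample, because one slow tick in a thousand barely moves the average but shows up clearly in p99 and max.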
Production Insight
In production, the single most common event loop blocker is JSON.parse on large request bodies.
Parsing a 10MB JSON payload takes 50-200ms depending on structure complexity and CPU generation — during that entire window, zero other requests are processed.
On Node.js 22 with V8 12.x, JSON.parse performance has improved, but the blocking nature has not changed — it is still synchronous.
Rule: stream-parse large bodies with a streaming JSON parser, or enforce a strict body size limit at the reverse proxy (nginx, AWS ALB) before the payload reaches Node.js.
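If a limit at the proxy is not an option, Node.js can enforce one itself before JSON.parse ever runs. This is a minimal sketch using only the http module. It still buffers and parses synchronously, so it bounds the blocking time rather than eliminating it, and it is not a streaming parser. readJsonBody, httpError and the 1 MiB limit are hypothetical.

```javascript
const http = require('node:http');

// Hypothetical limit: bodies larger than 1 MiB are rejected before JSON.parse runs.
const MAX_BODY_BYTES = 1024 * 1024;

function httpError(statusCode, message) {
  return Object.assign(new Error(message), { statusCode });
}

function readJsonBody(req, maxBytes = MAX_BODY_BYTES) {
  return new Promise((resolve, reject) => {
    // Cheap early exit when the client declares an oversized body up front
    const declared = Number(req.headers['content-length']);
    if (Number.isFinite(declared) && declared > maxBytes) {
      req.resume(); // discard the body instead of buffering it
      reject(httpError(413, 'Payload too large'));
      return;
    }

    const chunks = [];
    let received = 0;
    let tooLarge = false;

    req.on('data', (chunk) => {
      if (tooLarge) return; // keep draining, but stop storing
      received += chunk.length;
      if (received > maxBytes) {
        tooLarge = true;
        chunks.length = 0; // release what was buffered so far
        reject(httpError(413, 'Payload too large'));
        return;
      }
      chunks.push(chunk);
    });

    req.on('end', () => {
      if (tooLarge) return;
      try {
        resolve(JSON.parse(Buffer.concat(chunks).toString('utf8')));
      } catch {
        reject(httpError(400, 'Invalid JSON'));
      }
    });

    req.on('error', reject);
  });
}

// Example wiring (call server.listen(port) yourself):
const server = http.createServer(async (req, res) => {
  try {
    const body = await readJsonBody(req);
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ receivedKeys: Object.keys(body) }));
  } catch (err) {
    res.writeHead(err.statusCode || 500, { 'Content-Type': 'text/plain', Connection: 'close' });
    res.end(err.message);
  }
});
```

The size is checked as bytes arrive, because a client can omit Content-Length or lie about it.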
Key Takeaway
The event loop has six phases with a fixed execution order; microtasks (nextTick, Promise.then) drain after every callback, and therefore before every phase transition.
Blocking any phase blocks all concurrent connections for the full duration — there is no request isolation in a single process.
Yield long synchronous work with setImmediate between batches, or offload to worker threads for work that cannot be chunked.
On Linux, libuv can use io_uring for some async file operations; the phase model is unchanged, but file I/O throughput can improve.
Choosing Between setTimeout, setImmediate, and nextTick
If: You need to defer work until after I/O callbacks complete in the current event loop iteration
Use: setImmediate. It runs in the check phase immediately after poll, with no artificial timer delay.
If: You need work to run before any I/O, timer, or check callbacks (highest-priority deferral)
Use: process.nextTick. It runs before the event loop continues, but use it sparingly, because calling it recursively starves I/O.
If: You need a minimum wall-clock delay before execution
Use: setTimeout. The minimum delay is 1ms (Node.js treats 0 as 1), and the callback can fire later than requested whenever the loop is busy.
If: You are processing a large array or dataset without blocking the event loop
Use: Batch with setImmediate between chunks, yielding after every N items so pending I/O and incoming requests can interleave.
If: CPU-bound work will take more than 10ms
Use: Offload it to a worker thread via piscina. Don't try to chunk it with setImmediate if the work cannot be easily divided.

Clustering: Scaling Across CPU Cores Without Shared State

Node.js runs on a single thread. A single process on a 32-core machine uses roughly 3% of available CPU — the other 97% sits idle while your users wait. The cluster module solves this by forking the main process into N worker processes, each with its own event loop, its own V8 heap, and its own memory space. The primary process accepts incoming connections and distributes them to workers via IPC.

Clustering is not the same as load balancing across machines. Cluster workers share the same physical host: the same CPUs, the same RAM, the same network interface, and the same system-wide limits such as open file descriptors. The scaling ceiling is not just CPU. As you add workers, shared resources become the actual bottleneck. Every worker opens its own database pool, so if the database accepts 20 connections in total and you run 16 workers, each worker averages 1.25 connections. That effectively serialises concurrent database work within each worker.
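It is worth doing this arithmetic before picking a worker count. The following is a small sketch. perWorkerPoolSize and reservedForAdmin are hypothetical names, and your pool library will have its own max-connections setting to feed the result into.

```javascript
// Splits a database-wide connection budget evenly across cluster workers.
// reservedForAdmin leaves headroom for migrations, consoles and monitoring.
function perWorkerPoolSize({ dbMaxConnections, workers, reservedForAdmin = 0 }) {
  const usable = dbMaxConnections - reservedForAdmin;
  const perWorker = Math.floor(usable / workers);
  if (perWorker < 1) {
    throw new RangeError(
      `${workers} workers cannot share ${usable} connections; reduce workers or raise the DB limit`
    );
  }
  return perWorker;
}

console.log(perWorkerPoolSize({ dbMaxConnections: 20, workers: 16 })); // 1
console.log(perWorkerPoolSize({ dbMaxConnections: 100, workers: 4, reservedForAdmin: 4 })); // 24
```

Rounding down is deliberate. If you round up, the workers can collectively open more connections than the database allows, and the extra connections fail at peak load rather than at startup.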

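The cluster module supports two scheduling policies. The default on every platform except Windows is round-robin (cluster.SCHED_RR). Under round-robin, the primary accepts each connection and hands it to the next worker over IPC, which spreads load evenly at the cost of a small IPC overhead per connection. The alternative is cluster.SCHED_NONE, which is the default on Windows. It lets workers accept connections directly from a shared listening socket and leaves distribution to the OS. The Node.js documentation notes that this tends to be very unbalanced in practice, with most connections landing on a few workers. You can choose the policy with cluster.schedulingPolicy or with the NODE_CLUSTER_SCHED_POLICY environment variable ('rr' or 'none').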
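The following is a minimal sketch that pins the policy in code rather than relying on the platform default. Running NODE_CLUSTER_SCHED_POLICY=rr node server.js from the shell has the same effect.

```javascript
const cluster = require('node:cluster');

// cluster.schedulingPolicy is global and effectively frozen once the first
// worker is forked or cluster.setupPrimary() is called, so set it first.
if (cluster.isPrimary) {
  // SCHED_RR: the primary hands out connections in turn.
  // SCHED_NONE would leave distribution to the operating system instead.
  cluster.schedulingPolicy = cluster.SCHED_RR;
}

// ...then call cluster.fork() as usual
```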

In Kubernetes in 2026, the question of whether to cluster at all has a nuanced answer. If your pod is allocated 1 CPU, clustering gives you nothing — you have one core. If your pod is allocated 4 CPUs, clustering with 3-4 workers utilises those cores. The K8s-preferred pattern is one process per pod (no clustering) with the orchestrator managing pod count for horizontal scaling — this gives you cleaner isolation, simpler crash recovery, and better resource accounting. But if your workload has high per-request CPU variance (some requests are light, some are heavy), clustering within a multi-CPU pod still wins on tail latency.
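Here is one way to encode that decision at startup. chooseWorkerCount is a hypothetical helper. WEB_CONCURRENCY is a common deployment convention, not a Node.js built-in, that lets the pod spec state how many CPUs the container was actually given.

```javascript
const os = require('node:os');

// Decides how many cluster workers to run; a result of 1 means "don't cluster".
function chooseWorkerCount(env = process.env) {
  const fromEnv = Number.parseInt(env.WEB_CONCURRENCY, 10);
  if (Number.isInteger(fromEnv) && fromEnv > 0) return fromEnv;
  // os.availableParallelism() (Node.js 18.14+) respects CPU affinity, but it may
  // not reflect a container's CPU quota, so prefer the explicit variable in K8s.
  return typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;
}
```

In a 1-CPU pod with WEB_CONCURRENCY=1, the entry point can skip cluster.fork() entirely and start the HTTP server directly. This keeps the one-process-per-pod pattern without a separate code path for local development.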

cluster.production.js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// In K8s: WEB_CONCURRENCY can be injected as an env var matching allocated CPUs.
// In VMs/bare metal: default to os.cpus().length.
// Never hard-code a number — it breaks when the container size changes.
const WORKER_COUNT = parseInt(process.env.WEB_CONCURRENCY, 10) || os.cpus().length;
const PORT = parseInt(process.env.PORT, 10) || 3000;

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} — forking ${WORKER_COUNT} workers`);

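  // Track each live worker's metadata, keyed by cluster worker id
  const workerMeta = new Map();
  // Workers stopped on purpose (rolling restart): their exits are not crashes
  const intentionalExits = new Set();
  let shuttingDown = false;

  for (let i = 0; i < WORKER_COUNT; i++) {
    spawnWorker();
  }

  // Every fork gets a fresh worker.id, so the crash count is passed to the
  // replacement explicitly; a counter keyed only by id would never exceed 1.
  function spawnWorker(crashCount = 0) {
    const worker = cluster.fork();
    workerMeta.set(worker.id, {
      pid: worker.process.pid,
      startedAt: Date.now(),
      crashCount,
    });
    console.log(`Worker ${worker.id} (pid ${worker.process.pid}) started`);
    return worker;
  }

  cluster.on('exit', (worker, code, signal) => {
    const meta = workerMeta.get(worker.id);
    // Drop the dead worker's entry so the map never accumulates stale entries
    workerMeta.delete(worker.id);

    if (shuttingDown) {
      // Exit the primary once the last worker has gone
      if (workerMeta.size === 0) process.exit(0);
      return;
    }

    // Planned exits (rolling restart, worker.disconnect()/kill()) are not crashes
    if (intentionalExits.delete(worker.id) || worker.exitedAfterDisconnect) return;

    // A worker that ran for a while before dying starts a fresh count, so rare
    // crashes spread over weeks don't eventually trip the limit
    // (60s is an arbitrary example threshold).
    const ranLongEnough = meta && Date.now() - meta.startedAt > 60_000;
    const crashCount = ranLongEnough ? 1 : (meta ? meta.crashCount : 0) + 1;

    const reason = signal || `exit code ${code}`;
    console.error(`Worker ${worker.id} exited (${reason}). Crash count: ${crashCount}`);

    // Crash loop protection: if a worker slot crashes more than 5 times in a row,
    // something is fundamentally wrong. Don't keep restarting it.
    // In K8s, the pod will be replaced. In a VM, alert and investigate.
    if (crashCount > 5) {
      console.error(`Worker slot has crashed ${crashCount} times, stopping restarts`);
      return;
    }

    // Brief delay before respawning to avoid crash storms on startup failures
    setTimeout(() => {
      if (!shuttingDown) spawnWorker(crashCount);
    }, 500);
  });

  // Rolling restart: ask each worker in turn to shut down, wait for it to
  // drain and exit, then start its replacement.
  // This achieves zero-downtime restarts without PM2.
  async function rollingRestart() {
    const workerIds = Object.keys(cluster.workers);
    for (const id of workerIds) {
      await new Promise((resolve) => {
        const worker = cluster.workers[id];
        if (!worker) { resolve(); return; }

        // Mark the exit as planned so the 'exit' handler doesn't respawn it as well
        intentionalExits.add(worker.id);
        worker.once('exit', () => {
          spawnWorker();
          // Give the new worker 2 seconds to initialise before draining the next one
          setTimeout(resolve, 2000);
        });
        worker.send('graceful-shutdown');
      });
    }
    console.log('Rolling restart complete');
  }

  // SIGUSR1 is reserved by Node.js for starting the inspector, so use SIGUSR2
  process.on('SIGUSR2', rollingRestart);

  process.on('SIGTERM', () => {
    console.log('Primary received SIGTERM — initiating graceful shutdown');
    shuttingDown = true;
    for (const id in cluster.workers) {
      // Each worker closes its server, finishes in-flight requests, then exits
      cluster.workers[id].send('graceful-shutdown');
    }
  });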

} else {
  // Worker process — this is where your actual application runs
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(`Worker ${cluster.worker.id} (pid ${process.pid}) handled this request`);
  });

  server.listen(PORT, () => {
    console.log(`Worker ${cluster.worker.id} listening on port ${PORT}`);
  });

  // Handle graceful shutdown: stop accepting new connections,
  // finish in-flight requests, then exit cleanly.
  process.on('message', (msg) => {
    if (msg === 'graceful-shutdown') {
      console.log(`Worker ${cluster.worker.id} shutting down gracefully`);
      server.close(() => {
        console.log(`Worker ${cluster.worker.id} shutdown complete`);
        process.exit(0);
      });

      // Force exit if graceful shutdown takes too long (e.g., a stuck WebSocket)
      setTimeout(() => {
        console.error(`Worker ${cluster.worker.id} forced exit after timeout`);
        process.exit(1);
      }, 30_000);
    }
  });

  // Catch unhandled rejections at the worker level and log the full reason.
  // Installing this listener replaces Node's default behaviour (crashing on an
  // unhandled rejection, since Node.js 15), so exit explicitly after logging
  // and let the primary respawn the worker.
  process.on('unhandledRejection', (reason) => {
    console.error('Unhandled rejection in worker', cluster.worker.id, reason);
    process.exit(1);
  });
}
Output
Primary 1234 — forking 4 workers
Worker 1 (pid 1235) listening on port 3000
Worker 2 (pid 1236) listening on port 3000
Worker 3 (pid 1237) listening on port 3000
Worker 4 (pid 1238) listening on port 3000
# After sending SIGTERM to the primary:
Primary received SIGTERM — initiating graceful shutdown
Worker 1 shutting down gracefully
Worker 2 shutting down gracefully
Worker 3 shutting down gracefully
Worker 4 shutting down gracefully
Worker 1 shutdown complete
Worker 2 shutdown complete
Worker 3 shutdown complete
Worker 4 shutdown complete
# If a worker crashes:
Worker 2 exited (exit code 1). Crash count: 1
Worker 5 (pid 1291) listening on port 3000
Cluster Workers Share Absolutely Nothing in Memory
Each cluster worker is a completely separate process with its own V8 isolate and heap. In-memory caches, session stores, rate limiters, and feature flag states are NOT shared across workers; they exist independently in each worker's heap. A user whose request hits worker 1 has zero visibility into what worker 2 has cached. In practice, this means any state that must be consistent across requests must live in an external store (Redis for sessions and caches, a database for rate limiting counters). In-memory state is only reliable in single-process deployments. That is fine for local development and occasionally fine for small internal tools, but never appropriate for multi-worker production services.
Production Insight
Cluster workers die silently in production more often than engineers expect.
Unhandled promise rejections (fatal by default since Node.js 15), native addon segfaults, and OOM kills can all take workers down with little log output beyond the exit event, especially when worker stderr is not being collected.
Node.js 22 still treats unhandled promise rejections as fatal by default. If your workers are dying and you're not sure why, add a process.on('unhandledRejection') handler that logs the full rejection reason. Installing that handler replaces the default crash behaviour, so call process.exit(1) after logging if you still want the worker to be replaced.
Rule: implement worker death tracking with crash count limits. A crash-looping worker that restarts every 500ms consumes resources and delays investigation. Cap restarts at 5, alert, and let the orchestrator handle replacement.
Key Takeaway
Cluster forks one process per CPU core — each gets its own V8 heap, its own event loop, and its own memory space.
Workers share nothing in memory — sessions, caches, and rate limiters must live in Redis or an equivalent external store.
Shared resource contention (database connection pool size, file descriptor limits) is the actual scaling ceiling, not CPU count.
In Kubernetes, prefer one process per pod plus K8s horizontal pod autoscaling over intra-pod clustering for cleaner isolation and resource accounting.

Worker Threads for CPU-Intensive Work — And When Not to Use Them

Cluster processes are heavyweight. Each fork creates a full OS process with its own V8 instance, heap, garbage collector, and event loop. Worker threads also get their own V8 isolate, heap, and event loop, but they are lighter in a specific way: they run within the same process, can share memory via SharedArrayBuffer, and have a startup cost of 10-50ms versus the 100-300ms for a full process fork.

Worker threads are not a general-purpose concurrency model. They exist for one job: CPU-bound parallelism within a single request or operation lifecycle. They excel at image processing, cryptographic key derivation, data transformation, report generation, ML model inference, and any algorithm that is genuinely computation-heavy. For I/O-bound work — database queries, HTTP calls, file reads — the event loop already handles concurrency efficiently. Adding threads for I/O adds complexity, coordination overhead, and thread pool management with zero performance benefit.

The architectural difference that matters most: worker threads can share memory via SharedArrayBuffer with Atomics for synchronisation. This enables zero-copy data sharing between threads — you pass a reference to a shared buffer, not a serialised copy of the data. For high-throughput data processing where the payload size is large, the difference between zero-copy and serialise-deserialise can dominate latency entirely.
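The following minimal sketch makes the synchronisation part concrete: several threads increment one counter that lives in shared memory. The threads are spawned directly for brevity, which is fine for a demo but not something to do per request. parallelCount is an illustrative name.

```javascript
const { Worker } = require('node:worker_threads');

// Several threads increment one shared Int32Array slot. Atomics.add makes each
// increment an indivisible read-modify-write, so no updates are lost; a plain
// `counter[0]++` here could drop increments when threads race.
function parallelCount(threads, incrementsPerThread) {
  const counter = new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));
  const source = `
    const { workerData } = require('node:worker_threads');
    const counter = new Int32Array(workerData.buffer);
    for (let i = 0; i < workerData.increments; i++) Atomics.add(counter, 0, 1);
  `;

  const runs = Array.from({ length: threads }, () => new Promise((resolve, reject) => {
    const worker = new Worker(source, {
      eval: true,
      // The SharedArrayBuffer is shared with the worker, not copied
      workerData: { buffer: counter.buffer, increments: incrementsPerThread },
    });
    worker.on('error', reject);
    worker.on('exit', (code) => (code === 0 ? resolve() : reject(new Error(`exit code ${code}`))));
  }));

  return Promise.all(runs).then(() => Atomics.load(counter, 0));
}

parallelCount(4, 100_000).then((total) => console.log(total)); // 400000
```

Only the small workerData object is structured-cloned. The counter's memory is never serialised, which is the property that makes SharedArrayBuffer attractive for large payloads.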

The rule about thread pools is non-negotiable in production: never spawn a worker thread per incoming request. Thread creation has real overhead — 10-50ms startup, ~5-10MB memory per thread including stack and V8 overhead. Under concurrent load, naive spawn-per-request causes thread storms. 200 simultaneous requests spawn 200 threads, the process hits OS thread limits, thread creation starts failing, and the whole thing collapses. The correct pattern is a fixed-size thread pool — piscina is the most production-hardened library for this in the Node.js ecosystem as of 2026.

workers.cpu-task.js
// ── cpu-worker.js ─────────────────────────────────────────────────────────
// This file runs inside each piscina worker thread.
// piscina calls the exported function with the value passed to pool.run()
// and resolves pool.run()'s promise with whatever the function returns.

const { threadId } = require('worker_threads');
const crypto = require('crypto');

function computeHashes(count) {
  // This is genuinely CPU-bound — no I/O, pure computation.
  // Running this on the main thread would block the event loop for the full duration.
  const results = new Array(count);
  for (let i = 0; i < count; i++) {
    results[i] = crypto
      .createHash('sha256')
      .update(`payload-${i}-${Date.now()}`)
      .digest('hex');
  }
  return results;
}

module.exports = ({ iterations }) => {
  const hashes = computeHashes(iterations);
  return { threadId, count: iterations, sample: hashes[0] };
};


// ── thread-pool.js ────────────────────────────────────────────────────────
// Production pattern: fixed-size thread pool using piscina.
// Never spawn a thread per request. Queue excess work.

const Piscina = require('piscina');
const path = require('path');

// Create the pool once at application startup, not per request.
// maxThreads: roughly CPU core count minus one, leaving a core for the main
// event loop (and never less than 1 on single-core containers).
// idleTimeout: threads that have been idle for 30s are terminated to free memory.
const pool = new Piscina({
  filename: path.resolve(__dirname, 'cpu-worker.js'),
  maxThreads: parseInt(process.env.WORKER_THREADS, 10) || Math.max(1, require('os').cpus().length - 1),
  idleTimeout: 30_000,
});

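// In an Express route handler (req.query, res.json and res.status are Express APIs):
async function handleHashRequest(req, res) {
  // Cap iterations to prevent intentional DoS via this endpoint, and fall back to
  // the default when the value is missing or not a positive integer (parseInt can
  // return NaN, and new Array(NaN) would throw inside the worker)
  const parsed = Number.parseInt(req.query.iterations ?? '1000', 10);
  const safeIterations = Number.isInteger(parsed) && parsed > 0 ? Math.min(parsed, 10_000) : 1000;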

  try {
    // This runs in a worker thread — the event loop is completely free
    // to handle other requests while this computation runs.
    const result = await pool.run({ iterations: safeIterations });
    res.json({ threadId: result.threadId, count: result.count, sample: result.sample });
  } catch (err) {
    // piscina surfaces worker errors as rejected promises
    console.error('Worker thread error:', err);
    res.status(500).json({ error: 'Computation failed' });
  }
}

// Monitor pool queue depth — if it grows unboundedly, you need more threads
// or you need to reject requests earlier with a 503
setInterval(() => {
  if (pool.queueSize > 50) {
    console.warn(`Thread pool queue depth: ${pool.queueSize} — consider increasing maxThreads or load shedding`);
  }
}, 5_000);


// ── SharedArrayBuffer pattern for zero-copy data sharing ──────────────────
// When passing large datasets to worker threads, avoid serialisation overhead
// by sharing the underlying memory directly.
// (This spawns a worker per call to stay short; in a server, route the work
// through the fixed-size pool above instead.)

const { Worker } = require('worker_threads');

function processLargeDataset(data) {
  // Allocate shared memory sized for 32-bit integers (4 bytes each)
  const sharedBuffer = new SharedArrayBuffer(data.length * 4);
  const sharedView = new Int32Array(sharedBuffer);

  // Copy data into the shared buffer: this is the only copy
  data.forEach((val, i) => { sharedView[i] = val; });

  return new Promise((resolve, reject) => {
    const worker = new Worker(`
      const { workerData, parentPort } = require('worker_threads');
      const view = new Int32Array(workerData.buffer);
      // Read in place: the worker sees the same memory, no extra copy of the data
      let sum = 0;
      for (let i = 0; i < view.length; i++) sum += view[i];
      parentPort.postMessage({ sum });
    `, {
      eval: true,
      workerData: { buffer: sharedBuffer } // Shared, neither copied nor transferred
    });

    worker.on('message', resolve);
    worker.on('error', reject);
    // Settle the promise even if the worker exits without posting a result
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}
Output
// piscina pool handling concurrent requests:
// Request 1 → threadId: 1, count: 1000, sample: 'a3f8c2...'
// Request 2 → threadId: 2, count: 1000, sample: 'b7d4e1...'
// Request 3 → threadId: 1, count: 1000, sample: 'c9a1f3...'
// (threadId 1 was reused — threads stay alive and handle multiple tasks)
// Pool queue warning under sustained load:
// Thread pool queue depth: 67 — consider increasing maxThreads or load shedding
// Comparison: blocking event loop vs worker thread
// Without worker thread:
// Request latency during 10,000 hash computation: ~380ms blocked
// Other requests during that time: 0 served
//
// With worker thread pool:
// Request latency during 10,000 hash computation: ~380ms (in thread)
// Other requests during that time: served normally, <5ms latency
Cluster vs Worker Threads — Two Different Problems
  • Cluster workers: each gets its own event loop and V8 heap — best when you need to handle more concurrent I/O-bound requests than one event loop can manage
  • Worker threads: each has its own V8 isolate and heap but can share memory explicitly via SharedArrayBuffer, and starts faster than a process; best when a single request triggers a CPU-bound computation that would block the event loop
  • If your bottleneck is database query throughput or concurrent HTTP connections, clustering wins — more event loops means more concurrent I/O operations
  • If your bottleneck is computation within a request (hashing, image resizing, PDF generation, ML inference), worker threads win — CPU parallelism without full process overhead
  • You can combine both: a clustered app where each worker has a small thread pool for the occasional CPU-bound request — this is the right architecture for mixed I/O and CPU workloads
  • Never spawn a thread per request — use piscina with a fixed maxThreads sized to CPU count minus 1 (leave one core for the event loop)
Production Insight
Worker thread startup costs 10-50ms and 5-10MB memory per thread on Node.js 22.
Spawning threads per request under load creates thread storms — 500 concurrent requests attempt to spawn 500 threads, hit OS limits, and the process crashes or thrashes.
Piscina's queue mechanism is your safety valve: excess tasks queue rather than spawning new threads. Monitor pool.queueSize as a metric — if it grows consistently, you need more threads or earlier load shedding.
Rule: create the pool once at startup, size maxThreads to CPU count minus one, and put a circuit breaker in front of it that returns 503 when queueSize exceeds your SLA threshold.
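The following is a minimal sketch of that circuit breaker, using only the queueSize property and run() method that the piscina pool above already relies on. OverloadedError, runWithLoadShedding and the threshold of 50 are hypothetical. piscina also offers a maxQueue constructor option that enforces a similar cap inside the pool, so check the documentation for the version you run.

```javascript
// Sheds load before the queue grows: if the pool already has `maxQueued` tasks
// waiting, fail fast with a 503-style error instead of queueing one more.
// `pool` is anything with a numeric `queueSize` and a promise-returning `run(task)`.
class OverloadedError extends Error {
  constructor(queueSize) {
    super(`Worker pool overloaded (queue depth ${queueSize})`);
    this.name = 'OverloadedError';
    this.statusCode = 503;
  }
}

function runWithLoadShedding(pool, task, maxQueued = 50) {
  if (pool.queueSize >= maxQueued) {
    return Promise.reject(new OverloadedError(pool.queueSize));
  }
  return pool.run(task);
}

// Hypothetical Express usage:
// try { res.json(await runWithLoadShedding(pool, { iterations })); }
// catch (err) { res.status(err.statusCode || 500).json({ error: err.message }); }
```

Rejecting at the door keeps the latency of accepted requests bounded. The alternative is to let every request wait behind an ever-growing queue until clients time out anyway.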
Key Takeaway
Worker threads handle CPU-bound work without blocking the event loop — they are not for I/O.
Use piscina for thread pool management — never spawn per request, never more threads than CPU cores.
SharedArrayBuffer enables zero-copy data sharing between threads — critical for high-throughput data processing.
You can combine clustering and worker threads: clustered workers each with a small thread pool is the right architecture for mixed I/O and CPU workloads.

Memory Management, GC Pauses, and Leak Detection That Actually Works

Node.js uses V8's generational garbage collector. Objects start in the young generation (also called the nursery or new space), which is collected via a fast scavenge algorithm that runs frequently. Objects that survive two scavenge cycles are promoted to the old generation, which is collected via a mark-sweep-compact algorithm. That collection is slower and less frequent. Modern V8 performs much of the marking concurrently on background threads, but the main thread still pauses for parts of every major collection, and those pauses stall the event loop.
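Node's perf_hooks exposes each collection as a 'gc' performance entry, which makes it possible to check whether latency spikes line up with major collections. The following is a minimal sketch. watchGcPauses is an illustrative helper, and entry.detail requires Node.js 16 or later.

```javascript
const { PerformanceObserver, constants } = require('node:perf_hooks');

const GC_KIND_NAMES = {
  [constants.NODE_PERFORMANCE_GC_MINOR]: 'minor (scavenge)',
  [constants.NODE_PERFORMANCE_GC_MAJOR]: 'major (mark-sweep-compact)',
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: 'incremental marking',
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: 'weak callbacks',
};

// Calls onPause({ kind, durationMs }) for every GC that V8 reports.
// Returns a function that stops observing.
function watchGcPauses(onPause) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      onPause({
        kind: GC_KIND_NAMES[entry.detail.kind] ?? 'other',
        durationMs: entry.duration, // milliseconds
      });
    }
  });
  observer.observe({ entryTypes: ['gc'] });
  return () => observer.disconnect();
}

// Example: log only the collections long enough to hurt request latency
const stopWatching = watchGcPauses(({ kind, durationMs }) => {
  if (durationMs > 20) console.warn(`GC ${kind} took ${durationMs.toFixed(1)}ms`);
});
```

Export the counts and durations per kind as metrics. A rising share of major collections is an early sign that the old generation is filling up.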

The size of those GC pauses scales with old generation heap utilisation. A heap at 30% utilisation with 512MB old space might produce 5-15ms GC pauses. The same heap at 90% utilisation triggers increasingly frequent major GC cycles producing 50-200ms pauses. Those pauses look exactly like event loop blocking in your latency metrics: P99 spikes with no corresponding increase in traffic and no slow downstream dependency to blame.

The default heap limit is not a fixed number. V8 derives it from the machine's available memory, and the result differs between Node.js versions, so check the effective value with v8.getHeapStatistics().heap_size_limit instead of assuming one. Either way, it is a ceiling, not a target. A heap that sits close to its limit is not healthy; it means V8 is under severe GC pressure. Use --max-old-space-size to set an appropriate limit based on your container allocation, then use clustering to run multiple smaller heaps rather than one large one.
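v8.getHeapStatistics() reports the effective heap limit and how close the process currently is to it. This is a minimal sketch; heapPressure and the 0.8 warning ratio are illustrative choices.

```javascript
const v8 = require('node:v8');

// Reports how close the heap is to V8's configured limit (the limit reflects
// --max-old-space-size when that flag is set).
function heapPressure(warnRatio = 0.8) {
  const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
  const ratio = used_heap_size / heap_size_limit;
  return {
    usedMb: Math.round(used_heap_size / 1024 / 1024),
    limitMb: Math.round(heap_size_limit / 1024 / 1024),
    ratio,
    underPressure: ratio >= warnRatio,
  };
}

console.log(heapPressure());
```

Run it once at startup to confirm which limit is actually in effect, for example after changing --max-old-space-size. Running it periodically shows the utilisation trend that GC pause length follows.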

The most dangerous memory leaks in production are the gradual ones. A slow-growing Map, an event emitter accumulating listeners, a closure in a middleware capturing a request object — none of these cause immediate failures. They grow over hours or days, GC pauses increase gradually, latency percentiles drift upward, and the eventual OOM crash looks like a random event rather than the conclusion of a long-running leak. Heap snapshot comparison is the only reliable way to find them.
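v8.writeHeapSnapshot() can capture the snapshots for that comparison from inside the process. Starting Node with --heapsnapshot-signal=SIGUSR2 is a built-in alternative that writes a snapshot whenever the process receives that signal. In this sketch, takeHeapSnapshot and the file-naming scheme are illustrative.

```javascript
const v8 = require('node:v8');
const path = require('node:path');

// Writes a .heapsnapshot file that Chrome DevTools (Memory tab) can load.
// Take one after warm-up and another after sustained traffic, then use the
// Comparison view to see which object types grew between them.
// Caveats: writing a snapshot is synchronous (it blocks the event loop), and
// it can need memory about twice the size of the heap at that moment.
function takeHeapSnapshot(label, dir = process.cwd()) {
  const file = path.join(dir, `${label}-${process.pid}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file); // returns the file name it wrote
}
```

Because of the blocking and memory caveats, take snapshots on an instance that has been drained from the load balancer, or in staging under replayed traffic, rather than on a busy production worker.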

memory.leak-patterns.js (JavaScript)
// ══════════════════════════════════════════════════════════════════
// COMMON PRODUCTION LEAK PATTERNS: what they look like and how to fix them
// ══════════════════════════════════════════════════════════════════
// fetchFromDb, processChunk and pollRemoteService stand in for your own
// data-access and processing functions.

// ── PATTERN 1: Unbounded in-memory cache ──────────────────────────
// The most common leak in production Node.js services.
// Works fine in staging (low traffic, frequent restarts).
// Reaches OOM in production after 12-48 hours of sustained traffic.

const BAD_cache = {};
function getUserBad(id) {
  if (!BAD_cache[id]) {
    BAD_cache[id] = fetchFromDb(id); // Added forever, never evicted
  }
  return BAD_cache[id];
}
// After 100,000 unique users: BAD_cache has 100,000 entries eating memory

// FIX: LRU cache with both max size and TTL
// (lru-cache v10 and later export LRUCache as a named export)
const { LRUCache } = require('lru-cache');
const userCache = new LRUCache({
  max: 10_000,          // Evict least-recently-used entries beyond this count
  ttl: 1000 * 60 * 5,   // Each entry expires 5 minutes after it is set, regardless of access
  allowStale: false,    // get() never returns an expired entry
});

function getUser(id) {
  const cached = userCache.get(id);
  if (cached !== undefined) return cached;
  const user = fetchFromDb(id);
  userCache.set(id, user);
  return user;
}


// ── PATTERN 2: Event listener accumulation ────────────────────────
// Each reconnection adds a new listener to a long-lived emitter; nothing removes it.
// Node.js emits a MaxListenersExceededWarning (observable via process.on('warning'))
// once an 11th listener is added for one event name, but by then you might
// already have hundreds.

function handleConnectionBad(socket) {
  const onData = (chunk) => processChunk(socket, chunk);
  socket.on('data', onData);
  // BUG: nothing removes the 'data' listener when the socket closes.
  // If this socket is reused or the same emitter receives multiple calls,
  // listeners pile up indefinitely.
}

// FIX: always pair every on() with a matching removeListener()
function handleConnection(socket) {
  // Optional: lower this socket's leak-warning threshold. The limit applies
  // per event name, not to the total across events, and setting it too low
  // produces false warnings if other code (e.g. stream piping) adds listeners.
  socket.setMaxListeners(3);

  const onData = (chunk) => processChunk(socket, chunk);
  socket.on('data', onData);

  socket.once('close', () => {
    socket.removeListener('data', onData);
    // 'once' ensures this cleanup handler itself doesn't accumulate
  });
}


// ── PATTERN 3: Closure retaining large objects ────────────────────
// A long-lived closure keeps alive everything it references. In V8, closures
// created in the same scope share one context object, so a large value
// referenced by any of them can be retained for as long as any of them lives.

function BAD_createMiddleware() {
  const requestLog = []; // Grows with every request and is never cleared

  return function middleware(req, res, next) {
    requestLog.push(req); // Holds every req object ever received
    next();
  };
}

// FIX: only retain what you actually need
function createMiddleware() {
  const requestCount = { value: 0 }; // Tiny counter, not the whole req object

  return function middleware(req, res, next) {
    requestCount.value++;
    next();
  };
}


// ── PATTERN 4: setInterval without clearInterval ──────────────────
// Common in module-level setup code: the interval callback fires forever
// and holds references to everything in its closure scope.

let pollingInterval = null;
function startPolling(config) {
  if (pollingInterval) return; // A second call would orphan the first timer
  // GOOD: store the reference so we can clear it
  pollingInterval = setInterval(async () => {
    try {
      await pollRemoteService(config);
    } catch (err) {
      // Without this catch, a rejection here is unhandled and, on Node.js 15+,
      // terminates the process.
      console.error('Polling failed:', err);
    }
  }, 5000);
}

function stopPolling() {
  if (pollingInterval) {
    clearInterval(pollingInterval);
    pollingInterval = null;
  }
}

// Always hook into process shutdown to clean up timers.
// A SIGTERM listener replaces the default "exit immediately" behaviour, so this
// handler owns the shutdown: once timers, servers and pools are closed, the
// process exits on its own because nothing keeps the event loop alive.
process.on('SIGTERM', () => {
  stopPolling();
  // then close server, drain DB pool, etc.
});


// ── Heap snapshot helper for production debugging ─────────────────
// Trigger it from a debug-only route or a signal (SIGUSR1 is reserved for the
// inspector, so use SIGUSR2). Never put snapshot generation on the hot path:
// v8.writeHeapSnapshot() is synchronous, blocks the event loop for the whole
// write, and needs roughly twice the current heap size in memory while it runs.
const v8 = require('v8');

process.on('SIGUSR2', () => {
  const filename = v8.writeHeapSnapshot(`heap-${Date.now()}.heapsnapshot`);
  console.log(`Heap snapshot written to ${filename}`);
  // Copy the file off the instance and open it in the Chrome DevTools Memory tab
});
Output
// MaxListenersExceededWarning from Node.js when listener pattern is wrong:
(node:1234) MaxListenersExceededWarning: Possible EventEmitter memory leak detected.
11 data listeners added to [Socket]. Use emitter.setMaxListeners() to increase limit
// Heap snapshot written on SIGUSR2:
Heap snapshot written to heap-1741132800000.heapsnapshot
// lru-cache eviction working correctly under load:
// After 100,000 unique user requests, userCache.size === 10,000
// (not 100,000 — LRU eviction kept it bounded)
// Memory stays stable instead of growing linearly with unique users
GC Pauses Are the Silent Latency Killer
A 1.4GB heap at 90% utilisation triggers aggressive mark-sweep-compact cycles that pause the event loop for 50-200ms each. These pauses do not appear in your application logs. They do not increment your error counters. They show up only as latency spikes at the P95 and P99 percentiles — and they look identical to a slow database query or a blocked event loop. If your latency percentiles drift upward over hours with no corresponding application error, check heap utilisation and GC frequency before debugging anything else. The metric you want: v8.getHeapStatistics().used_heap_size / v8.getHeapStatistics().heap_size_limit — if this ratio exceeds 0.85, you are in GC pressure territory.
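The ratio from the callout above is cheap to check in-process. A minimal sketch using `v8.getHeapStatistics()`; the 30-second interval, the `console.warn` default and the helper names are assumptions for illustration.

```javascript
// Sketch: periodically check how close the heap is to V8's limit.
const v8 = require('node:v8');

const PRESSURE_THRESHOLD = 0.85; // threshold suggested in the callout above

// used_heap_size / heap_size_limit, as a fraction between 0 and 1.
function heapPressure() {
  const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
  return used_heap_size / heap_size_limit;
}

function startHeapPressureCheck(intervalMs = 30_000, warn = console.warn) {
  const timer = setInterval(() => {
    const ratio = heapPressure();
    if (ratio > PRESSURE_THRESHOLD) {
      warn(`Heap at ${(ratio * 100).toFixed(1)}% of heap_size_limit: expect frequent, long GC pauses`);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return () => clearInterval(timer); // call to stop checking
}
```

In practice you would export the ratio as a gauge rather than only logging it, so the upward drift is visible long before it crosses the threshold.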
Production Insight
The most common production memory leak in Node.js is an unbounded in-memory cache with no size limit and no TTL.
It works perfectly in staging: low traffic, frequent redeploys, never grows large enough to matter.
In production: 48 hours of real traffic later, the cache has 500,000 entries, GC is running continuously, event loop lag is 300ms, and on-call gets paged for an 'unexplained latency increase'.
Rule: every in-memory cache, every Map used as a cache, every object accumulating state must have an explicit maximum size and a TTL. No exceptions. lru-cache with both max and ttl options is one line of configuration.
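lru-cache is the right tool in production, but the two guarantees it gives you are easy to see in a dependency-free sketch: a `Map` kept in recency order enforces the size cap, and a per-entry expiry timestamp enforces the TTL. The class name and the injectable clock are illustrative; this sketch leaves out most of what lru-cache handles (size-based limits, stale-while-revalidate, dispose hooks, among other things).

```javascript
// Sketch: a size cap plus a TTL on top of Map (not a replacement for lru-cache).
class BoundedTtlCache {
  constructor({ max, ttlMs, now = Date.now }) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, convenient for tests
    this.map = new Map(); // key -> { value, expiresAt }; iteration order = recency order
  }

  get(key) {
    const entry = this.map.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.map.delete(key); // expired: drop it rather than serve stale data
      return undefined;
    }
    // Re-insert to mark as most recently used; the expiry time is unchanged.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key); // so an update also moves the key to the newest end
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // The first key in iteration order is the least recently used one.
    while (this.map.size > this.max) {
      this.map.delete(this.map.keys().next().value);
    }
  }

  get size() {
    return this.map.size;
  }
}
```

Note that expired entries are only removed when they are read or pushed out by the size cap. That is why a TTL alone does not bound memory: you need `max` as well.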
Key Takeaway
V8 GC pauses scale with old generation heap utilisation — a heap at 90% capacity causes 50-200ms event loop stops that look like application bugs.
The three most common production leak sources: unbounded caches without TTL/size limits, event listeners never removed on connection close, and closures in long-lived objects retaining large references.
Heap snapshot comparison in Chrome DevTools is the most reliable way to find leaks; guessing wastes hours. Take a baseline, wait under load, take a second snapshot, and compare retained size growth by object type.
Keep heaps small: --max-old-space-size at 70-75% of container memory, and multiple smaller processes via clustering rather than one large heap.

Production Profiling: Finding Real Bottlenecks Under Real Load

Synthetic benchmarks lie in very specific ways. A service that handles 50,000 requests per second in autocannon or wrk testing may collapse at 5,000 requests per second in production because synthetic benchmarks cannot replicate: real-world payload size variance, database query latency distribution under concurrent connection pressure, connection pool contention between concurrent requests, GC pressure from actual memory usage patterns, and the interaction between all of these simultaneously. Production profiling reveals what benchmarks never expose.

The three essential tools in 2026: clinic.js for automated high-level diagnosis across event loop, CPU, and memory simultaneously; --inspect with Chrome DevTools for interactive CPU flame graphs and heap timeline recording; and perf/DTrace for kernel-level analysis when the issue is in native code or the V8 runtime itself. Each operates at a different depth.

Start with clinic doctor every time — it gives the fastest cross-dimensional view of what is wrong. Event loop delay, CPU profile, and memory trend in a single dashboard that takes two minutes to generate. When doctor identifies an area of concern, drill into it: clinic flame for CPU hotspot identification, clinic heapprofiler for allocation timeline analysis. Only drop to perf/DTrace when the problem is not visible at the JavaScript level — native addon performance, V8 JIT behaviour, or system-call overhead.

Profiling overhead matters: teams are increasingly reluctant to run profiling tools in production after incidents caused by profiler overhead. As rough figures, Clinic.js adds about 5-15% throughput overhead and Chrome DevTools CPU profiling about 20-40%, though the real cost depends on the workload, so DevTools profiling should only run on canary instances. The right workflow: run clinic against a canary pod or a staging environment under production-representative load, not against your primary fleet.
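Profiling a canary does not always require restarting it with extra flags: the built-in `node:inspector` module can record a short V8 CPU profile from inside the running process. A sketch, assuming you trigger it from a debug-only admin route; the helper names and output file name are hypothetical. Keep the window short, since the profiler adds overhead while it runs.

```javascript
// Sketch: record a short CPU profile from inside the running process.
const inspector = require('node:inspector');
const fs = require('node:fs');

// Promise wrapper around the callback-style Session#post.
function post(session, method, params = {}) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

async function captureCpuProfile(durationMs, outFile) {
  const session = new inspector.Session();
  session.connect();
  try {
    await post(session, 'Profiler.enable');
    await post(session, 'Profiler.start');
    // The profiler samples whatever the process does during this window.
    await new Promise((resolve) => setTimeout(resolve, durationMs));
    const { profile } = await post(session, 'Profiler.stop');
    if (outFile) fs.writeFileSync(outFile, JSON.stringify(profile)); // .cpuprofile opens in DevTools
    return profile;
  } finally {
    session.disconnect();
  }
}

// Hypothetical usage from a debug-only admin route:
// await captureCpuProfile(10_000, `cpu-${Date.now()}.cpuprofile`);
```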

profiling.commands.sh (Bash)
# ── Install clinic globally (check the Clinic.js README for supported Node.js versions) ──
npm install -g clinic

# Verify installation:
clinic --version


# ═════════════════════════════════════════════════════════════════════
# STEP 1: Full diagnostic overview (start here, always)
# ═════════════════════════════════════════════════════════════════════

clinic doctor -- node server.js
# While this runs, apply load in a separate terminal:
npx autocannon -c 100 -d 30 http://localhost:3000
# When the load test finishes, stop the server with Ctrl+C: clinic analyses
# the collected data and writes the HTML report once the process exits.
# The report shows:
#   - Event loop delay over time (spikes = blocking operations)
#   - CPU usage across the profile period
#   - Memory growth trend
#   - Recommendations based on detected patterns


# ═════════════════════════════════════════════════════════════════════
# STEP 2: CPU hotspot identification (when doctor shows high CPU)
# ═════════════════════════════════════════════════════════════════════

clinic flame -- node server.js
# Apply load while running, then Ctrl+C to generate the flamegraph.
# The flamegraph shows call stacks with width proportional to CPU time.
# Wide boxes at the top of the stack are your hotspots.
# Look for synchronous JavaScript in the middle of the stack: anything
# that should be async but isn't shows up as a wide synchronous column.


# ═════════════════════════════════════════════════════════════════════
# STEP 3: Memory allocation profiling (when memory grows unexpectedly)
# ═════════════════════════════════════════════════════════════════════

clinic heapprofiler -- node server.js
# Shows an allocation timeline: not just what IS in the heap,
# but what is being actively allocated over time.
# Useful distinction: retained memory that keeps growing is a leak.
# A high allocation rate that GC keeps up with is a GC pressure issue, not a leak.


# ═════════════════════════════════════════════════════════════════════
# STEP 4: Async operation flow analysis (when timing looks wrong)
# ═════════════════════════════════════════════════════════════════════

clinic bubbleprof -- node server.js
# Visualises async operation chains and shows where time is spent waiting
# between async operations. Useful for finding:
#   - Database query chains that could be parallelised with Promise.all()
#   - Missing connection pool capacity (lots of time waiting for a connection)
#   - Unnecessary sequential async operations


# ═════════════════════════════════════════════════════════════════════
# STEP 5: Manual event loop lag measurement (production-safe)
# ═════════════════════════════════════════════════════════════════════

# Option A: timer-drift check. A probe only measures the event loop of the
# process it runs in, so in practice this code belongs in your server's
# startup. Run standalone via node -e (as here), it only demonstrates the
# technique on an otherwise idle loop.
node -e "
  const INTERVAL = 1000;
  let last = Date.now();
  setInterval(() => {
    const now = Date.now();
    const lag = now - last - INTERVAL;
    last = now;
    console.log('Event loop lag:', lag.toFixed(0), 'ms');
    if (lag > 100) console.warn('WARNING: event loop saturation detected');
  }, INTERVAL);
"

# Option B: expose as a Prometheus metric (recommended for production)
# In your application startup:
# const promClient = require('prom-client');
# promClient.collectDefaultMetrics(); // includes nodejs_eventloop_lag_seconds


# ═════════════════════════════════════════════════════════════════════
# STEP 6: V8 CPU profile via --inspect (for interactive investigation)
# ═════════════════════════════════════════════════════════════════════

# Start the process with the inspector enabled and keep it bound to localhost:
# anyone who can reach the inspector port can run arbitrary code in the
# process, so never bind it to 0.0.0.0 on a network you don't fully control.
node --inspect=127.0.0.1:9229 server.js

# Then:
# 1. Open Chrome and navigate to chrome://inspect
# 2. Click 'inspect' on your Node.js target
# 3. Go to the Profiler tab
# 4. Click 'Start', apply load, then click 'Stop'
# 5. Analyse the CPU profile flamegraph

# For a production canary, keep the localhost binding and
# tunnel via SSH to access it from your laptop:
# ssh -L 9229:localhost:9229 user@canary-host


# ═════════════════════════════════════════════════════════════════════
# STEP 7: Heap snapshot on demand (production debugging)
# ═════════════════════════════════════════════════════════════════════

# Requires a SIGUSR2 handler such as the one in the leak-patterns example
# (or starting node with --heapsnapshot-signal=SIGUSR2):
kill -USR2 <node-pid>
# With that handler, the file appears in the working directory as heap-<timestamp>.heapsnapshot
# Load it in Chrome DevTools > Memory tab > Load profile
Output
# clinic doctor output (illustrative; the findings listed here are the kind the HTML report highlights):
✔ Analysed data
✔ Generated HTML report
open file:///.clinic/12345.clinic-doctor-html
Key findings:
⚠ Event loop delay detected — max 3847ms at 14:23:07
⚠ Synchronous operations detected in the event loop
✓ Memory: stable growth — no apparent leak
✓ CPU: consistent with expected load
Recommendation: Use clinic flame to identify the synchronous hotspot
# Event loop lag measurement output:
Event loop lag: 2 ms
Event loop lag: 3 ms
Event loop lag: 1 ms
Event loop lag: 387 ms ← This is when the blocking operation ran
WARNING: event loop saturation detected
Event loop lag: 4 ms
Event loop lag: 2 ms
# autocannon load test output:
Running 30s test @ http://localhost:3000
100 connections
┌─────────┬────────┬─────────┬─────────┬────────────┬──────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │
├─────────┼────────┼─────────┼─────────┼────────────┼──────────┤
│ Latency │ 8 ms │ 12 ms │ 3847 ms │ 4203 ms │ 47 ms │
└─────────┴────────┴─────────┴─────────┴────────────┴──────────┘
Req/Sec: 8,432 (average); note the tail latency spike despite high average throughput
# The median looks fine. The 97.5th and 99th percentiles are catastrophic. This is why you watch percentiles.
Always Profile Under Realistic Concurrent Load
An idle Node.js server shows zero event loop lag, zero CPU hotspots, and flat memory — a completely clean profile that tells you nothing. The blocking operations, the GC pressure patterns, and the async chain bottlenecks only appear under concurrent load because they require multiple requests competing for the same event loop. Before you profile, generate realistic traffic with autocannon, k6, or artillery. Use realistic payload sizes and request patterns from your production access logs — not uniform requests to a single lightweight endpoint. A profiling session on an idle server is the most expensive way to learn nothing.
Production Insight
Clinic.js adds 5-15% overhead to request throughput and should never run permanently in production.
The right approach: run clinic against a canary pod or a dedicated staging instance that receives a copy of production traffic via traffic mirroring.
Node.js has shipped the --cpu-prof flag since v12, so it is available on Node.js 22 as well: node --cpu-prof server.js writes a V8 CPU profile to disk when the process exits, with no external tooling, and the .cpuprofile file opens in Chrome DevTools.
Rule: profile in production-equivalent environments under production-equivalent traffic patterns. Analyse the report. Remove the profiler. Make one change. Measure. Repeat.
Key Takeaway
Start with clinic doctor — it gives the fastest cross-dimensional view of event loop, CPU, and memory behaviour simultaneously.
Always apply realistic concurrent load while profiling — idle servers reveal nothing.
Move from overview (doctor) to detail (flame for CPU, heapprofiler for allocations, bubbleprof for async chains) as patterns emerge.
Node.js (v12 and later, including 22) includes --cpu-prof as a built-in V8 profiler: useful when clinic.js is not available or when you need zero external dependencies.
Production incident · Post-mortem · Severity: high

The 4-Second Event Loop Block That Took Down Production

Symptom
All API endpoints — not just the reporting endpoint — started returning 504 Gateway Timeout errors within minutes of the deployment. P99 latency spiked from 45ms to over 4,000ms. Kubernetes pods showed healthy status (the health check was a simple synchronous string response that returned in under 1ms) but load balancer metrics showed 100% connection exhaustion across all pods in the cluster.
Assumption
The team assumed the database was the bottleneck because latency had spiked once before due to a slow query. They scaled read replicas, increased connection pool sizes from 10 to 50, and waited. Latency continued climbing. Nobody looked at the newly deployed reporting endpoint because it was described in the PR as a 'minor template change'.
Root cause
The reporting endpoint called handlebars.compile() inline on every request against a 2MB template file with deeply nested loops. Each compile-and-render cycle blocked the event loop for approximately 3.8 seconds. Under concurrent load the lag compounded: every queued report request added roughly another 3.8 seconds of blocking, and every other request, even a 5ms health check, had to wait for all of the renders queued ahead of it. The Kubernetes liveness probe had a 5-second timeout, so probes barely passed: just long enough to keep the pods alive and accepting traffic, while all actual application traffic timed out. The incident ran for 22 minutes before the new deployment was identified as the cause.
Fix
1. Immediately reverted the deployment, restoring P99 to 45ms within 90 seconds of rollback.
2. Moved template compilation to application startup: compile once at boot, store the compiled template function, and call it on each request, so compile time is paid once rather than per request.
3. Switched to streaming template rendering for the large report output, using a streaming-compatible templating approach.
4. Added event loop lag monitoring via the event-loop-lag npm package, exposed as a Prometheus gauge.
5. Implemented a circuit breaker middleware that returns HTTP 503 with a Retry-After header when event loop lag exceeds 500ms, preventing further request queuing behind a saturated loop.
6. Updated the Kubernetes liveness probe to measure event loop responsiveness (a dedicated /health/live endpoint that uses setImmediate to verify the event loop is actually scheduling work) rather than returning a static string.
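Fix step 5, the lag-based circuit breaker, can be sketched with built-ins only: `perf_hooks.monitorEventLoopDelay` supplies the lag signal and the middleware uses the plain Node.js response API, which Express responses also expose. The 500ms threshold comes from the write-up; the function names, the 1-second window and the Retry-After value are assumptions.

```javascript
// Sketch: shed load with 503 + Retry-After while event loop lag is high.
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Tracks the worst event loop delay seen in the current and previous window.
function createLagSampler(windowMs = 1000) {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  let previousWindowMaxMs = 0;
  const timer = setInterval(() => {
    previousWindowMaxMs = histogram.max / 1e6; // histogram values are nanoseconds
    histogram.reset(); // start a fresh window so old spikes age out
  }, windowMs);
  timer.unref(); // the sampler should never keep the process alive
  return {
    lagMs: () => Math.max(previousWindowMaxMs, histogram.max / 1e6),
    stop: () => {
      clearInterval(timer);
      histogram.disable();
    },
  };
}

// Works in a plain http handler (call it first) and as Express middleware.
function lagShedder({ getLagMs, thresholdMs = 500, retryAfterSeconds = 2 }) {
  return function shedWhenSaturated(req, res, next) {
    if (getLagMs() > thresholdMs) {
      res.statusCode = 503;
      res.setHeader('Retry-After', String(retryAfterSeconds));
      res.end('Service temporarily overloaded');
      return;
    }
    next();
  };
}

// Usage with Express (assumed):
// const sampler = createLagSampler();
// app.use(lagShedder({ getLagMs: sampler.lagMs }));
```

The shedder cannot do anything while the loop is actually blocked, because nothing runs. Its value is right after a block: requests that queued up during it are answered quickly with 503 instead of piling more work onto a saturated loop.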
Key lesson
  • Never compile templates, parse large payloads, or run regex on untrusted input synchronously inside the request path — these are event loop blockers regardless of how small they look in code review
  • Health checks must measure event loop responsiveness, not just process existence — a process can be alive and completely unable to serve requests
  • A single synchronous operation does not just slow one endpoint — it blocks every concurrent request in the entire process for its full duration
  • Monitor event loop lag as a first-class production metric from day one — latency percentiles alone will not tell you why things are slow
  • Code review descriptions like 'minor template change' deserve scrutiny when the change touches the hot request path — the word 'minor' has no meaning when the event loop is involved
Production debug guide: common production symptoms and their immediate debugging actions
Symptom · 01
P99 latency spikes while CPU usage remains low across all cores
Fix
This is almost always event loop blocking or GC pressure — not a resource exhaustion problem. Measure event loop lag directly with the event-loop-lag package or use clinic doctor to visualise event loop blocking over time. Low CPU with high latency means work is queued behind something synchronous, not that the system is idle.
Symptom · 02
Memory usage grows linearly over hours until OOM crash (exit code 137 in containers)
Fix
Take heap snapshots at intervals via --inspect and Chrome DevTools Memory tab. Take snapshot A, wait 30 minutes under load, take snapshot B, then use the Comparison view to identify object types with growing retained size. Exit code 137 is the kernel OOM killer — the container hit its memory limit. Check not just V8 heap but total RSS including Buffer allocations.
Symptom · 03
CPU pinned at 100% on a single core despite clustering being configured
Fix
Verify cluster workers are actually forked and running: ps aux | grep node. Check whether the primary process is handling requests itself instead of delegating to workers. Also check the scheduling policy: Node.js defaults to round-robin (SCHED_RR) on every platform except Windows, but NODE_CLUSTER_SCHED_POLICY=none or cluster.schedulingPolicy = cluster.SCHED_NONE hands connection distribution to the OS, which can be uneven under certain connection patterns.
Symptom · 04
Throughput plateaus after adding more cluster workers — no improvement beyond N workers
Fix
Check shared resource contention first. Database connection limits are the most common ceiling: every worker process opens its own pool, so 8 workers with a pool of 10 already hold 80 connections, and if the database (or a pooler in front of it) caps out there, workers 9 and 10 gain nothing. Also check file descriptor limits (ulimit -n), port range exhaustion for outbound connections (ss -s), and whether a downstream service is the actual bottleneck.
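The connection arithmetic is worth making explicit, because each cluster worker multiplies its pool size. A sketch that divides a database's connection limit across workers; the function name and the reserve of 5 connections for admin sessions and migrations are assumptions.

```javascript
// Sketch: split a database connection budget across cluster workers.
function poolSizePerWorker({ dbMaxConnections, workers, reserved = 5 }) {
  // Every worker process opens its own pool, so the total in use is workers * poolSize.
  const available = dbMaxConnections - reserved; // keep some for admin sessions/migrations
  if (available < workers) {
    throw new Error(`${available} connections cannot serve ${workers} workers; add a pooler or run fewer workers`);
  }
  return Math.floor(available / workers);
}

console.log(poolSizePerWorker({ dbMaxConnections: 100, workers: 8 })); // 11 (8 * 11 = 88 connections)
```

If the per-worker pool this produces is too small for your request concurrency, adding workers will not help; a connection pooler or a larger database limit will.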
Symptom · 05
Gradual latency degradation over days with no single identifiable event
Fix
This pattern usually indicates a slow-growing event loop block — a cache whose lookup time grows as it fills, a regex applied to progressively longer strings, or GC pauses increasing as heap utilisation climbs. Correlate event loop lag metrics over time with heap size metrics. If lag tracks heap growth, you have a memory leak causing GC pressure.
Node.js Performance Quick Debug Cheat Sheet: immediate actions for common Node.js performance issues in production, ordered by what to check first
Event loop appears blocked — all requests timing out simultaneously
Immediate action
Identify the blocking operation before changing anything else — you need to know what's blocking before you can fix it
Commands
npx clinic doctor -- node app.js
node --inspect=9229 app.js
Fix now
Move the identified synchronous work off the request path entirely — either pre-compute at startup, cache the result, or offload to a worker thread using piscina
Heap memory growing unbounded: RSS increasing steadily over hours
Immediate action
Force a GC cycle and take a heap snapshot to establish a baseline before making any changes
Commands
node --inspect --expose-gc app.js
kill -USR2 <pid>   (requires a SIGUSR2 heap snapshot handler, or node started with --heapsnapshot-signal=SIGUSR2)
Fix now
Add max-size and TTL limits to every in-memory cache using lru-cache. Check for event emitters with growing listener counts (process.on('warning') will report MaxListenersExceeded). Look for Map or Set objects in closure scope that are never cleared.
Single worker consuming disproportionate CPU in cluster mode
Immediate action
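Verify that round-robin scheduling is active: it is the default on every platform except Windows, but NODE_CLUSTER_SCHED_POLICY=none or cluster.schedulingPolicy = cluster.SCHED_NONE turns it off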
Commands
NODE_CLUSTER_SCHED_POLICY=rr node app.js
htop -p $(pgrep -d, -f 'node app.js')
Fix now
Set NODE_CLUSTER_SCHED_POLICY=rr explicitly (or set cluster.schedulingPolicy = cluster.SCHED_RR before the first fork) and restart. If distribution remains uneven, check whether long-lived connections (WebSockets, SSE) are pinning clients to specific workers.
High latency under load with low CPU: requests queuing without obvious reason
Immediate action
Measure event loop lag quantiles immediately — this distinguishes GC pressure from synchronous blocking
Commands
npx clinic bubbleprof -- node app.js
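node -r ./lag-probe.js app.js   (lag-probe.js: a hypothetical preload script that logs event loop lag from inside the app process, e.g. via perf_hooks.monitorEventLoopDelay; a separate node -e process would only measure its own idle loop)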
Fix now
If lag tracks with request rate, look for synchronous work in the hot path — JSON.parse, crypto, regex. If lag is high even at low request rates, suspect GC pauses — check heap utilisation percentage.
Node.js Scaling Strategies Compared
| Strategy | Best for | Memory model | Overhead | Primary limitation |
|---|---|---|---|---|
| Cluster Module | I/O-bound HTTP services on multi-core VMs or multi-CPU pods | Fully isolated heaps per worker; no sharing | Low (~10-30MB per worker for runtime overhead) | No shared in-memory state; shared resource contention (DB pools, FDs) is the real ceiling |
| Worker Threads (piscina) | CPU-bound computation within a request (hashing, image processing, report generation) | Shared memory possible via SharedArrayBuffer | Medium (10-50ms startup, ~5-10MB per thread) | Not for I/O; pool sizing critical; thread errors surface as promise rejections |
| PM2 Cluster Mode | Long-running services needing zero-downtime reload and log management without K8s | Isolated heaps per worker (wraps cluster module) | Low (thin wrapper around cluster module) | Less operational control than raw cluster API; adds a dependency; redundant in K8s |
| Kubernetes Horizontal Pod Autoscaling | Stateless services with variable or unpredictable traffic patterns | Fully isolated pods: separate processes, separate nodes | Higher (orchestration, scheduling, network overhead between pods) | Network latency between services; slower scale-up than in-process solutions |
| Event Loop Optimisation alone | Services already at low concurrency where single-core throughput is the bottleneck | Single process, single heap | None (pure code improvement) | Single-core CPU ceiling; cannot use additional cores on the host |
| Cluster + Worker Threads (combined) | Mixed workloads: high concurrent I/O with occasional CPU-bound operations per request | Isolated heaps per cluster worker; optional shared memory within each worker's thread pool | Medium (cluster overhead + thread pool overhead) | Complexity: two concurrency models to reason about, debug, and tune simultaneously |

Key takeaways

1. The event loop has six phases with a fixed execution order. Microtasks (process.nextTick first, then Promise.then) drain after every callback (since Node.js 11), so they always run before the loop moves on to the next phase. Blocking any phase blocks every concurrent connection for the full duration.
2. Cluster forks one process per CPU core, each with its own V8 heap and event loop. Workers share nothing in memory: sessions, caches, and rate limiters must live in Redis or an equivalent external store.
3. Worker threads handle CPU-bound computation within a request without blocking the event loop. Use a fixed-size pool via piscina sized to the CPU count; never spawn a thread per request. Cluster and worker threads solve different problems and can be combined.
4. Most production memory leaks come from three sources: unbounded in-memory caches without size limits or TTLs, event listeners accumulated without corresponding removeListener calls, and closures in long-lived objects retaining large references. Every cache needs both max and ttl.
5. V8 GC pauses scale with old generation heap utilisation. A heap at 90% capacity produces 50-200ms event loop stops that manifest as latency spikes, not errors. Set --max-old-space-size to 70-75% of the container memory allocation, not 100%.
6. Profile under realistic concurrent load with clinic.js: start with clinic doctor for the cross-dimensional overview, then drill into flame or heapprofiler as patterns emerge. Idle servers hide every problem.
7. Monitor event loop lag as a first-class production metric using prom-client's collectDefaultMetrics(). It is the earliest signal of event loop health degradation: earlier than error rates, earlier than latency percentiles.
8. Node.js 22 remains a supported LTS line in 2026, with Node.js 24 as the newer LTS. Node.js 22 ships V8 12.4 and a stable built-in test runner. The event loop model and all optimisation principles in this guide remain unchanged.

Common mistakes to avoid


Using JSON.parse on large request bodies without streaming or size limits

Symptom
Event loop blocks for 50-500ms per large request, causing all concurrent connections to experience the same latency spike simultaneously. Appears as P99 spikes correlated with specific request types, not with overall load.
Fix
Enforce a strict body size limit at the reverse proxy (nginx, AWS ALB, Cloudflare) before the payload reaches Node.js. For payloads that legitimately need to be large, use a streaming JSON parser (JSONStream or the WHATWG Streams API with a streaming JSON decoder). Set body-parser's limit option as a last line of defence. Never buffer a multi-megabyte payload and then synchronously parse it in the request handler.
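As a last line of defence inside Node.js, a body reader can enforce the cap while bytes arrive, so an oversized payload is never buffered in full or handed to JSON.parse. A sketch that works with any readable stream (including `http.IncomingMessage`, whose chunks are Buffers unless an encoding is set); the function name, the 1MB default and the `statusCode` convention on errors are assumptions. In Express, body-parser's `limit` option plays the same role.

```javascript
// Sketch: buffer a request body under a hard byte cap before calling JSON.parse.
function readJsonBody(req, maxBytes = 1024 * 1024) {
  return new Promise((resolve, reject) => {
    // Reject early when the client declares an oversized body.
    const declared = Number(req.headers && req.headers['content-length']);
    if (declared > maxBytes) {
      const err = new Error(`Declared body of ${declared} bytes exceeds ${maxBytes}`);
      err.statusCode = 413; // Payload Too Large
      reject(err);
      return;
    }
    const chunks = [];
    let received = 0;
    let tooLarge = false;
    req.on('data', (chunk) => {
      if (tooLarge) return; // keep draining so a 413 can still be sent, but stop buffering
      received += chunk.length;
      if (received > maxBytes) {
        tooLarge = true;
        chunks.length = 0; // release what was buffered so far
        const err = new Error(`Body exceeds ${maxBytes} bytes`);
        err.statusCode = 413;
        reject(err);
        return;
      }
      chunks.push(chunk);
    });
    req.on('end', () => {
      if (tooLarge) return;
      try {
        resolve(JSON.parse(Buffer.concat(chunks).toString('utf8')));
      } catch (err) {
        err.statusCode = 400; // malformed JSON
        reject(err);
      }
    });
    req.on('error', reject);
  });
}
```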

Using cluster without an external session or cache store

Symptom
Users experience random logouts, inconsistent feature flag states, or stale data because their requests hit different workers with completely separate in-memory stores. The issue appears intermittently and is difficult to reproduce locally because local development typically runs a single process.
Fix
Move sessions to Redis (connect-redis with express-session), move caches to Redis with appropriate TTLs, and move rate limiting state to Redis (rate-limit-redis). In-memory stores are only reliable in single-process deployments. As of 2026, Valkey (the Redis fork maintained by the Linux Foundation) is a production-ready alternative if Redis licensing is a concern.

Setting --max-old-space-size to the full container memory allocation

Symptom
Container gets OOM-killed by the kernel (exit code 137) even though the V8 heap appears to be within the limit. --max-old-space-size caps only V8's old generation; the process also needs memory for the young generation, compiled code, Buffers and other off-heap allocations, so total RSS can cross the container limit while the old space is still under its cap.
Fix
Set --max-old-space-size to 70-75% of your container memory allocation. The remaining 25-30% covers non-heap allocations: Node.js Buffer pool (off-heap by design), native module memory (zlib, crypto), libuv thread pool stacks, worker thread stacks, and the OS page cache. A 512MB container should have --max-old-space-size=384 at most.
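The 75% rule can be computed from the container limit rather than hard-coded. A sketch for cgroup v2 hosts, reading `/sys/fs/cgroup/memory.max` (cgroup v1 uses a different file, which this sketch does not handle); the helper names are hypothetical.

```javascript
// Sketch: derive a --max-old-space-size value (MB) from the container memory limit.
const fs = require('node:fs');

const BYTES_PER_MB = 1024 * 1024;

// 0.75 follows the 70-75% guidance above.
function recommendedOldSpaceMb(containerBytes, ratio = 0.75) {
  return Math.floor((containerBytes / BYTES_PER_MB) * ratio);
}

// cgroup v2 exposes the limit in memory.max; the literal string "max" means unlimited.
function readCgroupV2LimitBytes(path = '/sys/fs/cgroup/memory.max') {
  try {
    const raw = fs.readFileSync(path, 'utf8').trim();
    if (raw === 'max') return null;
    const bytes = Number(raw);
    return Number.isFinite(bytes) && bytes > 0 ? bytes : null;
  } catch {
    return null; // not running under cgroup v2, or the file is not readable
  }
}

// Example: a 512MB container
console.log(recommendedOldSpaceMb(512 * BYTES_PER_MB)); // 384
```

A wrapper script could pass the result through NODE_OPTIONS (for example `--max-old-space-size=384`) when a limit is found, and fall back to an explicit value when `readCgroupV2LimitBytes()` returns null.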

Using setTimeout(fn, 0) to yield the event loop in batch processing

Symptom
Batch processing of large arrays takes significantly longer than expected, sometimes 10-100x longer than necessary. Each setTimeout(fn, 0) call waits at least 1ms (Node.js coerces delays below 1ms to 1ms), which adds up to seconds across thousands of batches.
Fix
Use setImmediate instead: its callback runs in the next check phase, right after the poll phase, with no artificial delay. For a batch of 100,000 items processed 500 at a time, that is 200 yields; with setTimeout, those 200 yields add at least 200ms of pure waiting. At scale, that is the difference between 'fast enough' and 'too slow for production SLAs'.
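A sketch of the batching pattern with setImmediate; the function name, the 500-item default and the promise-based interface are illustrative choices.

```javascript
// Sketch: process a large array in fixed-size batches, yielding with setImmediate.
function processInBatches(items, handleItem, batchSize = 500) {
  return new Promise((resolve, reject) => {
    let index = 0;
    function runBatch() {
      try {
        const end = Math.min(index + batchSize, items.length);
        for (; index < end; index++) handleItem(items[index], index);
      } catch (err) {
        reject(err);
        return;
      }
      if (index < items.length) {
        // Yield: timers and I/O callbacks get a turn before the next batch runs.
        setImmediate(runBatch);
      } else {
        resolve(index); // number of items processed
      }
    }
    runBatch();
  });
}
```

With 100,000 items and the default batch size this yields 200 times, without the 1ms floor that setTimeout(fn, 0) would add to each yield. Smaller batches give I/O more frequent turns at the cost of more scheduling overhead.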

Not monitoring event loop lag as a production metric

Symptom
Latency degrades gradually over days — a slow-growing cache, a regex applied to progressively longer strings, or GC pressure building as a memory leak matures. No single request fails. No error rate increases. Latency percentiles drift upward by 10ms per day for two weeks before someone notices.
Fix
Expose event loop lag as a Prometheus metric using prom-client's collectDefaultMetrics(), which emits nodejs_eventloop_lag_seconds along with percentile gauges such as nodejs_eventloop_lag_p50_seconds, nodejs_eventloop_lag_p90_seconds and nodejs_eventloop_lag_p99_seconds. Set alerts: warn at p90 > 50ms, page at p99 > 200ms. This metric is the earliest signal of event loop health degradation: earlier than latency percentiles, earlier than error rates.
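Where prom-client is not in use, the same percentiles are available from Node.js's built-in `perf_hooks.monitorEventLoopDelay`. A sketch that reports p50/p90/p99 per interval and maps them to the alert levels above; the function names, the 10-second interval and the 10ms sampling resolution are assumptions.

```javascript
// Sketch: report event loop delay percentiles and map them to alert levels.
const { monitorEventLoopDelay } = require('node:perf_hooks');

const NS_PER_MS = 1e6; // monitorEventLoopDelay reports nanoseconds

// Thresholds from the alerting guidance above: warn at p90 > 50ms, page at p99 > 200ms.
function classifyLag({ p90Ms, p99Ms }) {
  if (p99Ms > 200) return 'page';
  if (p90Ms > 50) return 'warn';
  return 'ok';
}

function startLagReporter({ intervalMs = 10_000, report = console.log } = {}) {
  // Samples may include the sampling resolution itself, so even an idle loop
  // can read close to 10ms here.
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  const timer = setInterval(() => {
    const p50Ms = histogram.percentile(50) / NS_PER_MS;
    const p90Ms = histogram.percentile(90) / NS_PER_MS;
    const p99Ms = histogram.percentile(99) / NS_PER_MS;
    report({ p50Ms, p90Ms, p99Ms, level: classifyLag({ p90Ms, p99Ms }) });
    histogram.reset(); // each report covers only its own interval
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```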

Spawning a worker thread per incoming request

Symptom
Under sustained load, the process spawns hundreds of threads simultaneously. Memory usage spikes 2-3x normal. CPU spends more time on thread management than actual work. OS thread limits (typically 4,096-32,768 depending on configuration) cause thread creation to start failing with EAGAIN errors, which surface as unhandled exceptions.
Fix
Use piscina with a fixed maxThreads sized to the CPU core count (or CPU count minus 1 to leave a core for the event loop). Queue excess work rather than spawning more threads. Monitor pool.queueSize as a metric and implement load shedding (return 503) when the queue exceeds a threshold that your SLA cannot tolerate.
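piscina gives you this out of the box. To show the mechanics the fix relies on (threads created once, a bounded queue, and rejection as the load-shedding signal), here is a minimal sketch using only `node:worker_threads`. `FixedWorkerPool`, its `run()` method and the `'pool queue full'` error are hypothetical. The naive Fibonacci stands in for real CPU-bound work, and crash recovery, which piscina does handle, is left out.

```javascript
const { Worker } = require('node:worker_threads');
const os = require('node:os');

// Worker body, loaded with eval: true so the sketch fits in one file.
// Naive Fibonacci stands in for real CPU-bound work.
const WORKER_SOURCE = `
const { parentPort } = require('node:worker_threads');
const fib = (k) => (k < 2 ? k : fib(k - 1) + fib(k - 2));
parentPort.on('message', ({ id, n }) => parentPort.postMessage({ id, result: fib(n) }));
`;

const cpuCount = os.availableParallelism?.() ?? os.cpus().length;

class FixedWorkerPool {
  #idle = [];            // workers with no task in flight
  #queue = [];           // tasks waiting for a free worker
  #inFlight = new Map(); // task id -> task
  #workers = [];
  #nextId = 0;

  // Threads are created once, up front, never one per request.
  constructor(size = Math.max(1, cpuCount - 1), maxQueue = 1000) {
    this.maxQueue = maxQueue;
    for (let i = 0; i < size; i++) {
      const worker = new Worker(WORKER_SOURCE, { eval: true });
      worker.on('message', ({ id, result }) => {
        const task = this.#inFlight.get(id);
        this.#inFlight.delete(id);
        task.resolve(result);
        this.#nextTask(worker);
      });
      this.#workers.push(worker);
      this.#idle.push(worker);
    }
  }

  // Queue depth, suitable for export as a metric.
  get queueSize() {
    return this.#queue.length;
  }

  run(n) {
    return new Promise((resolve, reject) => {
      const task = { id: this.#nextId++, n, resolve };
      const worker = this.#idle.pop();
      if (worker) {
        this.#send(worker, task);
      } else if (this.#queue.length >= this.maxQueue) {
        // Load shedding: an HTTP handler would map this to a 503.
        reject(new Error('pool queue full'));
      } else {
        this.#queue.push(task);
      }
    });
  }

  #send(worker, task) {
    this.#inFlight.set(task.id, task);
    worker.postMessage({ id: task.id, n: task.n });
  }

  // A worker just finished: hand it the next queued task or mark it idle.
  #nextTask(worker) {
    const next = this.#queue.shift();
    if (next) this.#send(worker, next);
    else this.#idle.push(worker);
  }

  close() {
    return Promise.all(this.#workers.map((w) => w.terminate()));
  }
}
```

The key design choice is that excess work waits in `#queue` instead of causing new threads to be created, and once the queue itself is full the pool refuses work outright, so thread count and memory stay flat under any load.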

Ignoring unhandledRejection events in cluster workers

Symptom
On Node.js 15 and later (including Node.js 22), an unhandled promise rejection terminates the process by default. Cluster workers appear to die silently: Node prints the rejection to stderr, but there is no stack trace in the application logs, only the primary's 'Worker N exited with code 1' message. The root cause is invisible.
Fix
Add a process.on('unhandledRejection') handler in every worker that logs the full rejection reason and stack trace. Registering this handler replaces Node's default crash, so the handler must decide what happens next: exit immediately with process.exit(1) (safest, since it prevents running in an undefined state) or attempt a graceful shutdown first. Never swallow the rejection without logging; a silent crash is the hardest class of production bug to diagnose.
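A minimal sketch of that wiring, assuming a plain `node:cluster` setup rather than PM2. `createRejectionHandler` and `startCluster` are hypothetical names, and `log`/`exit` are injectable only so the handler can be exercised without killing the process. A production primary would usually also rate-limit restarts to avoid a crash loop.

```javascript
const cluster = require('node:cluster');
const os = require('node:os');
const { inspect } = require('node:util');

// Build an 'unhandledRejection' handler that logs the full reason, then exits.
// Registering any such handler disables Node's default crash, so exiting is
// this handler's job.
function createRejectionHandler({ log = console.error, exit = (code) => process.exit(code) } = {}) {
  return (reason) => {
    // Errors carry a stack; anything else (strings, plain objects) is inspected.
    const detail = reason instanceof Error ? reason.stack : inspect(reason);
    log(`[pid ${process.pid}] unhandledRejection: ${detail}`);
    exit(1); // safest default: do not keep running in an unknown state
  };
}

// Hypothetical entry point: startWorker boots your server inside each worker.
function startCluster(startWorker, workerCount = os.availableParallelism?.() ?? os.cpus().length) {
  if (cluster.isPrimary) {
    for (let i = 0; i < workerCount; i++) cluster.fork();
    cluster.on('exit', (worker, code, signal) => {
      // The primary only sees the exit code; the worker's own log line has the cause.
      console.error(`worker ${worker.process.pid} exited (code=${code}, signal=${signal}), forking a replacement`);
      cluster.fork();
    });
  } else {
    process.on('unhandledRejection', createRejectionHandler());
    startWorker();
  }
}
```

Usage would look like `startCluster(() => server.listen(3000))`, where `server` is your own HTTP server instance. The worker now writes the full stack through your logger before exiting, so the primary's "exited with code 1" line is no longer the only evidence.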
Interview Questions on This Topic

Q01 · SENIOR
Explain the phases of the Node.js event loop and what happens in each ph...
Q02 · SENIOR
What is the difference between cluster.fork() and worker_threads, and wh...
Q03 · SENIOR
How would you detect and fix a memory leak in a production Node.js servi...
Q04 · SENIOR
Why might a Node.js service have low CPU usage but high latency?
Q05 · JUNIOR
What is event loop lag and how do you monitor it in production?
Q01 of 05 · SENIOR

Explain the phases of the Node.js event loop and what happens in each phase.

ANSWER
The event loop runs six phases in a fixed order on every iteration. Timers: executes setTimeout and setInterval callbacks whose minimum delay has elapsed. Pending callbacks: executes I/O callbacks deferred to the next iteration, such as TCP error notifications (for example ECONNREFUSED, which some *nix systems report late). Idle/prepare: used internally by libuv and Node.js; application code does not interact with this phase. Poll: retrieves new I/O events and executes their callbacks. This is where the majority of application work happens, and where the loop may block waiting for new I/O: it waits no longer than the nearest due timer, and not at all if setImmediate callbacks are queued. Check: executes setImmediate callbacks, which are designed to run right after the poll phase and before the next timers phase. Close callbacks: executes handlers for abrupt closes, such as socket.on('close').

Microtasks are not a phase. Since Node.js 11, the microtask queues are fully drained after every individual callback, not only between phases, in priority order: all process.nextTick callbacks first, then resolved Promise.then handlers. This means process.nextTick fires before any Promise.then, and both fire before the loop moves on to the next callback or phase. The practical implication: if you call process.nextTick recursively, you can starve I/O indefinitely, because the loop cannot proceed until the nextTick queue is empty.
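The ordering rules are easiest to observe from inside an I/O callback, where Node guarantees that setImmediate runs before a zero-delay setTimeout. A small sketch: `demoOrdering` is a hypothetical name, and `fs.stat` on the working directory is used only to obtain a poll-phase callback.

```javascript
const fs = require('node:fs');

// Schedule one of each callback type from inside an I/O callback (poll phase)
// and record the order in which they run.
function demoOrdering(done) {
  const order = [];
  fs.stat(process.cwd(), () => {
    setTimeout(() => {
      order.push('timeout'); // timers phase of the NEXT iteration
      done(order);
    }, 0);
    setImmediate(() => order.push('immediate'));         // check phase, this iteration
    Promise.resolve().then(() => order.push('promise')); // promise microtask
    process.nextTick(() => order.push('nextTick'));      // nextTick queue, drained first
    order.push('sync');
  });
}

// prints: sync -> nextTick -> promise -> immediate -> timeout
demoOrdering((order) => console.log(order.join(' -> ')));
```

Outside an I/O callback (for example at the top level of the main module), the relative order of setTimeout(fn, 0) and setImmediate is not guaranteed, which is why the demo starts from a poll-phase callback.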
Frequently Asked Questions

01
How many cluster workers should I run in production?
02
Why does my Node.js process use more memory than --max-old-space-size allows?
03
Should I use PM2 or the built-in cluster module?
04
Can I use async/await for everything and never worry about blocking the event loop?
05
What causes the 'JavaScript heap out of memory' error and how do I fix it immediately?
06
Is Node.js 22 LTS significantly different from Node.js 20 LTS for the topics covered in this guide?