Advanced 7 min · March 06, 2026

Virtual Memory and Paging — The Hidden 10ms Disk I/O Trap

Q: What is virtual memory in simple terms?

Virtual memory is an OS feature that makes each process think it has its own private memory space, even though physical memory is shared and limited. The OS maps virtual addresses to physical RAM and disk pages transparently.

Q: What is a page fault and why is it expensive?

A page fault occurs when a program accesses a virtual page that is not currently mapped to RAM. The OS must load it from disk (or create a zero-filled page). A major fault (one that reads from disk) costs about 10 ms on seek-bound storage, which is millions of CPU cycles.

Q: How can I reduce page faults in my application?

Use mlock() to lock pages in RAM, allocate memory with MAP_POPULATE to pre‑fault, use huge pages to reduce TLB pressure, and structure data access to be sequential or with good locality.

Q: What is the difference between a minor and major page fault?

A minor page fault occurs when the page is in memory but not yet mapped in the process's page table (e.g., for copy‑on‑write). It takes microseconds. A major fault requires disk I/O and takes milliseconds.

Q: What does 'thrashing' mean?

Thrashing happens when the system spends more time swapping pages in and out than executing user code. It occurs when the total working set of all processes exceeds physical RAM. The system effectively freezes.

Virtual Memory and Paging: a 4KB page fault triggers 10ms disk I/O, a hidden production killer.

Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Notes here come from systems that actually shipped.

✓ Production

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

Before you start⏱ 30 min

✓Deep production experience
✓Understanding of internals and trade-offs
✓Experience debugging complex systems


⚡Quick Answer

Virtual memory gives every process its own address space, mapped to physical RAM and disk by the OS
Pages are fixed-size blocks, typically 4 KB; a major page fault triggers disk I/O costing ~10 ms
TLB caches address translations; a TLB miss adds ~10–100 cycles
Working set must fit in RAM to avoid thrashing — LRU fails for large scans
Production trap: random access to large memory-mapped files causes unpredictable page faults
Rule: use mlock() for latency-critical regions, or prefault pages sequentially

✦ Definition · ~90s read

What is Virtual Memory and Paging?

Virtual memory gives each process a private address space that the OS maps onto physical RAM and disk in fixed-size pages.

This abstraction lets processes use more memory than is physically available, but a major fault is costly: 10 ms is 10 million CPU cycles at 1 GHz. The key insight: the working set (the pages actively accessed) must fit in RAM. If it doesn't, the system thrashes: it continuously swaps pages in and out, the CPU stalls, and throughput collapses.

Plain-English First

Imagine a huge library with millions of books, but your desk only fits 10 at a time. A librarian keeps the books you're actively reading on your desk and stores the rest in a back room. When you need a book that's in storage, she fetches it and swaps out one you haven't touched in a while. Virtual memory is exactly that librarian — your program thinks it has access to a massive, private desk (address space), but the OS is quietly shuffling real memory (RAM) in and out of storage (disk) behind the scenes.

Every process on your machine behaves as if it owns the entire address space — gigabytes of pristine, contiguous memory all to itself. That illusion is one of the most consequential engineering decisions in operating system history. Without it, every program would need to know exactly where other programs live in RAM, a coordination nightmare that would make modern multitasking impossible. Chrome, your game engine, and your SSH daemon can all believe they start at address 0x0000000000400000 simultaneously, and none of them are lying — they're just working with different maps to the same physical territory.

The problem virtual memory solves is threefold: isolation (one process can't stomp on another's memory), overcommitment (you can allocate more memory than physically exists, betting that not all of it will be needed at once), and flexibility (the OS can place physical pages anywhere in RAM regardless of where the process thinks they are). Before virtual memory, if a program needed 100 MB contiguous in RAM and you only had 80 MB free, you were stuck. With paging, the OS can stitch together 25,600 scattered 4 KB pages and the program never knows the difference.

By the end of this article you'll understand exactly how a virtual address becomes a physical one, what happens cycle-by-cycle during a TLB miss and a page fault, how the page replacement algorithms work and where they fail, and — critically — how to write code that doesn't accidentally destroy your own performance by fighting the paging system. We'll dig into the kernel data structures, write instrumented C code to observe paging in action, and cover the production gotchas that have burned engineers at scale.

What is Virtual Memory and Paging?

Virtual memory is a hardware-software illusion that gives each process its own private address space. The OS divides each address space into fixed-size pages (typically 4 KB) and backs them with physical RAM frames or disk. When a program accesses a virtual address, the MMU first looks for the translation in the TLB. On a TLB miss, the hardware walks the page tables in RAM (~10–100 cycles) to find the page table entry (PTE). If the page is not in RAM at all, a page fault traps to the kernel, which reads the page from disk (up to 10 ms).
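To make those three cost tiers concrete, here is a small back-of-the-envelope model. It is only a sketch: the function name, the simple additive cost model, and the default figures (1 GHz clock, 1-cycle hit, 100-cycle walk, 10 ms fault) are assumptions taken from the rough numbers in this article, not measurements.

```python
def expected_cycles_per_access(tlb_miss_rate, major_fault_rate,
                               clock_hz=1e9, hit_cycles=1,
                               walk_cycles=100, fault_seconds=0.010):
    """Average cycles per memory access under a simple additive cost model."""
    fault_cycles = fault_seconds * clock_hz          # 10 ms at 1 GHz = 1e7 cycles
    return (hit_cycles
            + tlb_miss_rate * walk_cycles            # extra cost of page walks
            + major_fault_rate * fault_cycles)       # extra cost of disk faults


if __name__ == "__main__":
    for faults_per_million in (0, 1, 100):
        cost = expected_cycles_per_access(0.01, faults_per_million / 1e6)
        print(f"{faults_per_million:>3} major faults per million accesses: "
              f"~{cost:.0f} cycles per access")
```

Under these assumptions, a single major fault per million accesses costs more on average than a 1% TLB miss rate does, which is why tail latency is so sensitive to faults.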

io/thecodeforge/PagingObservations.javaJAVA

package io.thecodeforge;

import java.nio.ByteBuffer;

public class PagingObservations {
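    // Allocate a 1 GB direct buffer and touch every 4 KB page
    public static void main(String[] args) {
        int size = 1 << 30; // 1 GB
        // allocateDirect zero-fills the buffer, so first-touch page faults
        // happen during allocation; start the timer before it.
        long start = System.nanoTime();
        ByteBuffer buf = ByteBuffer.allocateDirect(size);
        long sum = 0; // consume the reads so the JIT cannot drop the loop
        // Touch each 4 KB page by reading its first byte
        for (int i = 0; i < size; i += 4096) {
            sum += buf.get(i);
        }
        long end = System.nanoTime();
        if (sum != 0) throw new AssertionError("direct buffers start zero-filled");
        // Page-fault counts are visible via /usr/bin/time -v
        System.out.println("Touched " + (size / 4096) + " pages in " + (end - start) / 1e6 + " ms");
    }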
}

Output

Touched 262144 pages in 320 ms (cold) / 2 ms (warm)

Mental Model

Mental Model: The Librarian on a Budget

Think of the OS as a busy librarian with a tiny desk (RAM) and a huge storeroom (disk).

Each process has its own infinite shelf (virtual address space).
The librarian keeps the books you're actively reading on the desk.
When you ask for a book from storage, she swaps it with one you haven't touched (page replacement).
If you ask for books faster than she can swap, you wait — that's thrashing.

📊 Production Insight

Cold page faults for a 1 GB buffer take ~300ms; warm accesses are sub‑ms.

If your app goes idle and gets swapped out, the next access sees cold faults.

Rule: for latency‑critical paths, pre‑touch and lock pages.

🎯 Key Takeaway

Virtual memory is an abstraction, not a speed guarantee.

Working set > RAM means accesses start hitting ~10 ms major faults.

Lock or prefault pages for predictable latency.

thecodeforge.io

Virtual Memory Paging

How Address Translation Works

When a program accesses memory at virtual address 0x7f123456, the MMU splits it into one index per page-table level plus a page offset. On x86-64 with 4-level page tables and 4 KB pages, that is a PML4 index (bits 39–47), a PDPT index (bits 30–38), a page directory index (bits 21–29), a page table index (bits 12–20), and a page offset (bits 0–11). The CPU walks these levels to find the physical page; each entry stores the physical base address of the next-level table.
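The split can be sketched in a few lines. This assumes the standard x86-64 layout for 4 KB pages under four-level paging (four 9-bit indices plus a 12-bit offset); the function name and the dictionary keys are illustrative only.

```python
def split_virtual_address(vaddr):
    """Split a 48-bit x86-64 virtual address into page-table indices (4 KB pages)."""
    return {
        "pml4":   (vaddr >> 39) & 0x1FF,  # bits 39-47
        "pdpt":   (vaddr >> 30) & 0x1FF,  # bits 30-38
        "pd":     (vaddr >> 21) & 0x1FF,  # bits 21-29
        "pt":     (vaddr >> 12) & 0x1FF,  # bits 12-20
        "offset": vaddr & 0xFFF,          # bits 0-11
    }


if __name__ == "__main__":
    for name, value in split_virtual_address(0x7F123456).items():
        print(f"{name:>6}: {value} (0x{value:x})")
```

Each 9-bit index selects one of 512 entries in a 4 KB table, and the 12-bit offset selects a byte within the 4 KB page.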

This walk is expensive: up to 4 memory accesses for the table entries plus the final data access. That's why the TLB (Translation Lookaside Buffer) exists. The TLB caches recent translations. A hit costs 1 cycle; a miss costs 10–100 cycles for the hardware walk. Modern CPUs have L1 and L2 TLBs, separate for instructions and data. Huge pages (2 MB or 1 GB) reduce the number of entries needed, improving TLB coverage.

io/thecodeforge/translation.cC

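#include <stdio.h>
#include <stdlib.h>

// Demonstrates page table walk overhead: one read per 4 KB page
int main(void) {
    size_t count = (size_t)1 << 24;               // 16M ints = 64 MB
    volatile int *array = calloc(count, sizeof(int));
    if (!array) {
        perror("calloc");
        return 1;
    }
    size_t stride = 4096 / sizeof(int);           // one int per 4 KB page
    int sum = 0;
    for (size_t i = 0; i < count; i += stride) {  // every read lands on a new page
        sum += array[i];
    }
    printf("Sum: %d\n", sum);
    free((void *)array);
    return 0;
}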

Output

Run with: perf stat -e dTLB-load-misses ./translation

📊 Production Insight

TLB misses hurt random access patterns.

For large data sets, use huge pages (2 MB) to reduce TLB pressure.

On Linux, enable transparent huge pages or manually allocate with mmap + MAP_HUGETLB.

🎯 Key Takeaway

A TLB miss can cost up to four extra memory accesses for the page walk.

Huge pages improve TLB coverage and reduce miss rate.

Measure TLB misses with perf to know your true access cost.

Page Replacement Algorithms

When RAM is full and a new page is needed, the OS must evict one. The classic algorithm is LRU (Least Recently Used), but real implementations approximate it because true LRU requires tracking every access. Linux uses a variant: active and inactive lists with a second-chance (clock-style) policy. Pages are initially placed on the inactive list; when they are referenced again, they are promoted to the active list. The memory manager periodically demotes pages from the tail of the active list to the inactive list, and page reclaim evicts from the tail of the inactive list.

Plain LRU fails for large sequential scans: a scan touches many pages exactly once, yet each one counts as most recently used and pushes out truly hot pages. This is the scanning problem. The two-list design is Linux's main defense: a page touched only once stays on the inactive list and is reclaimed first, so a single pass cannot displace the active list.
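The scanning problem is easy to reproduce in a toy simulation. The sketch below compares a plain LRU cache with a simplified two-list policy that promotes a page only on its second touch. Both classes, their names, and the 50% active share are illustrative assumptions; this is a model of the idea, not the kernel's implementation.

```python
from collections import OrderedDict


class PlainLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()            # oldest entry first

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)      # mark as most recently used
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)    # evict least recently used
        self.pages[page] = True
        return False


class TwoListLRU:
    """New pages start inactive; a second touch promotes them to active."""

    def __init__(self, capacity, active_share=0.5):
        self.capacity = capacity
        self.active_max = int(capacity * active_share)
        self.active = OrderedDict()
        self.inactive = OrderedDict()

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)
            return True
        if page in self.inactive:
            del self.inactive[page]
            self.active[page] = True          # promote on second reference
            if len(self.active) > self.active_max:
                demoted, _ = self.active.popitem(last=False)
                self.inactive[demoted] = True
            return True
        if len(self.active) + len(self.inactive) >= self.capacity:
            victims = self.inactive if self.inactive else self.active
            victims.popitem(last=False)       # reclaim from the inactive list first
        self.inactive[page] = True
        return False


def hot_hits_after_scan(cache, hot=40, scan=1000):
    hot_pages = [("hot", i) for i in range(hot)]
    for _ in range(2):                        # touch the hot set twice
        for page in hot_pages:
            cache.access(page)
    for i in range(scan):                     # one-pass scan of cold pages
        cache.access(("scan", i))
    return sum(cache.access(page) for page in hot_pages)


if __name__ == "__main__":
    print("plain LRU hot hits:", hot_hits_after_scan(PlainLRU(100)))
    print("two-list hot hits: ", hot_hits_after_scan(TwoListLRU(100)))
```

In this model the scan flushes the whole hot set out of the plain LRU cache, while the two-list cache keeps it, because scanned pages never earn a promotion.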

io/thecodeforge/scan.cC

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

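// Heavy sequential touch of anonymous memory to create memory pressure
int main(void) {
    size_t size = 1UL << 30; // 1 GB
    char *buf = malloc(size);
    if (!buf) {
        perror("malloc");
        return 1;
    }
    memset(buf, 'a', size);  // touch every page
    printf("Touched all pages once. Check Active(anon)/Inactive(anon) in /proc/meminfo, then press Enter\n");
    getchar();               // keep the pages resident while you inspect
    free(buf);
    return 0;
}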

Output

Observe with: grep -E '^(Active|Inactive)' /proc/meminfo

📊 Production Insight

Sequential scans poison LRU by pushing out hot pages.

Use posix_fadvise(POSIX_FADV_SEQUENTIAL) or madvise(MADV_SEQUENTIAL) to declare a streaming access pattern, so the kernel can read ahead aggressively and, with madvise, free scanned pages soon after use.

For databases, use direct I/O to bypass the page cache entirely.
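As a rough illustration of the hinting approach, the sketch below reads a file once and tells the kernel about it through `os.posix_fadvise`. The helper name and chunk size are made up for this example, and the `POSIX_FADV_DONTNEED` call at the end is an extra step assumed here for data that will not be reread. The hints are advisory, so the kernel is free to ignore them.

```python
import os
import tempfile


def scan_file_sequentially(path, chunk_size=1 << 20):
    """Read a file once, front to back, hinting the access pattern to the kernel."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        can_advise = hasattr(os, "posix_fadvise")   # not available on every platform
        if can_advise:
            # offset=0, length=0 means "the whole file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        if can_advise:
            # This data will not be reread; let the kernel drop its cached pages.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (4 << 20))                 # 4 MB of throwaway data
    try:
        print(scan_file_sequentially(tmp.name), "bytes scanned")
    finally:
        os.unlink(tmp.name)
```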

🎯 Key Takeaway

LRU eviction fails under scans — active list is polluted.

Use madvise to hint access patterns.

Direct I/O avoids cache pollution for streaming workloads.


Performance Considerations and Production Pitfalls

The most common paging performance trap is assuming that memory allocated is memory instantly accessible. With demand paging, mmap or malloc only set up virtual mappings; the pages are allocated and populated only on first access (or not at all if overcommitted). This means a seemingly harmless access to a memory‑mapped file or a newly allocated buffer can cost 10ms.
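Demand paging can be observed from user space by counting minor faults around an allocation and around the first touch. This is a minimal sketch assuming a Unix-like system where `resource.getrusage` reports `ru_minflt`; the helper names are illustrative, and the exact counts vary with kernel settings such as transparent huge pages.

```python
import mmap
import resource


def minor_faults():
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_pages(size_bytes):
    """Return (faults while mapping, faults while touching every page)."""
    before_map = minor_faults()
    buf = mmap.mmap(-1, size_bytes)        # anonymous mapping; nothing populated yet
    after_map = minor_faults()
    for offset in range(0, size_bytes, mmap.PAGESIZE):
        buf[offset] = 1                    # first write to a page triggers a fault
    after_touch = minor_faults()
    buf.close()
    return after_map - before_map, after_touch - after_map


if __name__ == "__main__":
    size = 16 * 1024 * 1024
    map_faults, touch_faults = touch_pages(size)
    print(f"pages in mapping:      {size // mmap.PAGESIZE}")
    print(f"faults during mmap:    {map_faults}")
    print(f"faults during touches: {touch_faults}")
```

On a typical Linux box the mapping call itself faults almost nothing, and the touch loop accounts for roughly one fault per page.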

Memory pressure leads to swapping: the OS writes pages to disk and reads them back when needed. This is catastrophic for latency. Use vmstat to watch si (swap in) and so (swap out). If they stay non-zero, the system is actively swapping; if they stay high, it is thrashing.
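The same swap signal can be read programmatically from the cumulative `pswpin` and `pswpout` counters in /proc/vmstat (counted in pages). The parsing helpers below are a sketch with made-up names; the one-second sampling interval is an arbitrary choice.

```python
import os
import time


def parse_swap_counters(vmstat_text):
    """Extract cumulative swap-in/swap-out page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two samples."""
    return {key: after.get(key, 0) - before.get(key, 0)
            for key in ("pswpin", "pswpout")}


if __name__ == "__main__":
    path = "/proc/vmstat"
    if os.path.exists(path):
        with open(path) as f:
            first = parse_swap_counters(f.read())
        time.sleep(1)
        with open(path) as f:
            second = parse_swap_counters(f.read())
        print("pages swapped in the last second:", swap_delta(first, second))
    else:
        print("/proc/vmstat not available on this system")
```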

Production tools: perf for TLB misses and page faults; numastat for NUMA local/remote hits; trace-cmd for page fault traces. The golden rule: measure, don't guess.

monitor.shBASH

#!/bin/bash
# Monitor paging activity with perf and vmstat
# Usage: ./monitor.sh [PID]   (defaults to PID 1)
PID="${1:-1}"
while true; do
    clear
    echo "=== Page Faults (PID $PID) ==="
    # Keep only the counter lines; perf prints its report on stderr
    perf stat -e page-faults,minor-faults,major-faults -p "$PID" sleep 1 2>&1 | grep -E 'faults'
    echo "=== Swap Activity ==="
    vmstat 1 2 | tail -1 | awk '{print "si:" $7 " so:" $8}'
    sleep 2
done

Output

Shows real‑time paging stats

📊 Production Insight

Always profile before optimising paging.

If page faults are low but latency is high, check TLB misses.

Swap is the final warning — at that point, performance is already degraded.

🎯 Key Takeaway

Memory allocation ≠ memory presence.

Tools: perf for faults, vmstat for swap, /proc/meminfo for active pages.

Fix: mlock, huge pages, madvise, or bypass page cache.

The Hidden Performance Cost of TLB Misses in Production

Every virtual memory access hits the Translation Lookaside Buffer (TLB) first. That's a tiny hardware cache for page table entries. A TLB hit costs ~1 CPU cycle. A miss? That triggers a multi-step page walk through memory, costing 10-100 cycles. In latency-sensitive systems -- think high-frequency trading or real-time video processing -- TLB misses are silent killers.

The problem isn't just speed. Modern CPUs use multi-level TLBs (L1, L2). When a process jumps between many virtual pages without spatial locality, it thrashes those caches. I've seen Node.js servers degrade 40% under load because of scattered memory access patterns.

You can check your TLB miss rate with perf stat -e dTLB-loads,dTLB-load-misses,iTLB-load-misses. If misses exceed 1% of total accesses, you're leaving performance on the table. Use larger pages (huge pages help) or restructure data for sequential access. Your CPU's TLB is small -- treat it like L1 cache.

check_tlb.shBASH

#!/bin/bash
# TheCodeForge: Measure TLB miss rates on Linux
perf stat -e dTLB-load-misses,dTLB-loads -p $(pgrep -n my_app) -- sleep 5

Output

Performance counter stats for process 12345:

42,731,492 dTLB-loads

412,893 dTLB-load-misses # 0.97% miss rate
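The miss rate in that output is just misses divided by loads. A tiny sketch of the arithmetic follows, using the sample counts above; the helper names are illustrative, and the 1% threshold is the rule of thumb from this article.

```python
def parse_count(perf_line):
    """Parse the leading counter from a perf stat line like '42,731,492 dTLB-loads'."""
    return int(perf_line.split()[0].replace(",", ""))


def tlb_miss_rate(loads, misses):
    if loads <= 0:
        raise ValueError("loads must be positive")
    return misses / loads


def under_tlb_pressure(loads, misses, threshold=0.01):
    return tlb_miss_rate(loads, misses) > threshold


if __name__ == "__main__":
    loads = parse_count("42,731,492 dTLB-loads")
    misses = parse_count("412,893 dTLB-load-misses")
    print(f"miss rate: {tlb_miss_rate(loads, misses):.2%}")
    print("TLB pressure:", under_tlb_pressure(loads, misses))
```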

⚠ Production Trap:

Don't assume 'bigger pages' always win. Huge pages (2MB) reduce TLB pressure but need contiguous physical memory and can inflate memory usage. In Kubernetes pods with memory limits, huge pages can cause allocation failures. Test with and without transparent huge pages (THP) before deploying.

🎯 Key Takeaway

Your TLB is the fastest path in the memory system. Miss it, and you pay with CPU cycles.

Why Demand Paging Killed the 'Load Everything' Mentality

Old-school memory managers loaded entire programs into RAM before execution. That's wasteful. Demand paging loads only the pages a process actually touches -- on first access. The mechanism is elegant: when the CPU issues a virtual address for an unmapped page, the MMU fires a page fault. The OS traps it, reads the page from disk, updates the page table, and retries the instruction.

This is why a Python script importing 50 modules doesn't need 2GB of RAM. Each import triggers fault-driven loads. The real insight? Most code lives on disk. Only hot paths hit RAM. In containerized microservices, this means your 500MB Docker image might only need 50MB of RSS at steady state.

Watch out for "thrashing" though. If the working set exceeds physical RAM, the system spends most of its time swapping. The fix: pin critical pages with mlock() (for low-latency paths), and use mincore() to check which pages of a mapping are actually resident.
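Python has no built-in wrapper for mlock(), so the sketch below calls the libc function through ctypes. Treat it as an illustration under assumptions: it targets Linux-like systems, the helper name is made up, and the call can legitimately fail when RLIMIT_MEMLOCK is low or the process lacks CAP_IPC_LOCK, which is why it returns a boolean instead of raising.

```python
import ctypes
import ctypes.util
import mmap


def lock_in_ram(buf):
    """Try to mlock() a writable mmap; return True on success, False otherwise."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    # from_buffer exposes the mapping's address without copying it.
    view = (ctypes.c_char * len(buf)).from_buffer(buf)
    try:
        rc = libc.mlock(ctypes.c_void_p(ctypes.addressof(view)),
                        ctypes.c_size_t(len(buf)))
        err = ctypes.get_errno()
    finally:
        del view   # release the buffer export so buf.close() works later
    if rc != 0:
        print(f"mlock failed (errno {err}); check RLIMIT_MEMLOCK / CAP_IPC_LOCK")
        return False
    return True


if __name__ == "__main__":
    region = mmap.mmap(-1, 16 * mmap.PAGESIZE)   # anonymous, zero-filled mapping
    locked = lock_in_ram(region)
    region[0] = 1                                # no fault here if the lock succeeded
    print("locked" if locked else "not locked")
    region.close()                               # unmapping also drops the lock
```

Locking faults the pages in immediately, so a successful call doubles as prefaulting for that region.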

page_fault_monitor.pyPYTHON

#!/usr/bin/env python3
# TheCodeForge: Monitor page faults per process
import os, time

pid = os.getpid()
print(f"Watching PID {pid} for 10 seconds...")

for _ in range(5):
    with open(f'/proc/{pid}/stat') as f:
        # comm (field 2) may contain spaces, so split after its closing ')'
        fields = f.read().rsplit(')', 1)[1].split()
    min_flt = int(fields[7])   # minflt is field 10 of /proc/PID/stat
    maj_flt = int(fields[9])   # majflt is field 12 of /proc/PID/stat
    print(f"Major faults: {maj_flt}, Minor faults: {min_flt}")
    time.sleep(2)

Output

Watching PID 12345 for 10 seconds...

Major faults: 0, Minor faults: 342

Major faults: 0, Minor faults: 356

Major faults: 1, Minor faults: 412

Major faults: 1, Minor faults: 415

Major faults: 1, Minor faults: 420

🔥Nerd Note:

A 'major' fault means disk I/O happened -- that's a 10ms+ stall. 'Minor' faults are cheap (just page table update). If major faults spike, your working set doesn't fit in RAM. Add memory or reduce concurrency.

🎯 Key Takeaway

Demand paging saves memory at the cost of latency. Profile your faults or prepay with prefaulting.

Five-Level Page Tables in Modern x86-64

Modern x86-64 processors have evolved from four-level page tables to five-level page tables to support larger virtual address spaces. The traditional four-level table (PML4, PDPT, PD, PT) supports 48-bit virtual addresses, giving 256 TiB of addressable space. With the growth of large-memory workloads (e.g., in-memory databases, big data analytics), 48 bits became a constraint. Five-level page tables extend the virtual address space to 57 bits, supporting up to 128 PiB of virtual memory.

This is achieved by adding a new level called PML5 (Page Map Level 5) at the top of the hierarchy. Each table is a 4KB page containing 512 entries, so each level consumes 9 address bits, and the total is 9×5 + 12 (offset) = 57 bits. For example, on an Intel Ice Lake or later server CPU, the kernel can use five-level paging to map huge memory regions.

However, this comes at a cost: each TLB miss now requires up to 5 memory accesses (one per level) instead of 4, increasing the latency of page table walks. In production, this can degrade performance for workloads with poor TLB locality. To mitigate, modern CPUs use larger TLBs and support huge pages (2MB and 1GB), which end the walk one or two levels early. When enabling five-level paging, ensure your hypervisor and OS support it (e.g., Linux 4.14+ with CONFIG_X86_5LEVEL=y, on a CPU that reports the la57 flag).
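The address-width arithmetic is worth checking by hand. The helper below is a small sketch (its name is illustrative) that applies 9 bits per level plus a 12-bit offset and converts the result to binary units.

```python
TIB = 1 << 40
PIB = 1 << 50


def virtual_address_bits(levels, index_bits=9, offset_bits=12):
    """Each 4 KB table holds 512 entries, so every level adds 9 address bits."""
    return levels * index_bits + offset_bits


if __name__ == "__main__":
    four = virtual_address_bits(4)
    five = virtual_address_bits(5)
    print(f"4 levels: {four} bits = {(1 << four) // TIB} TiB")
    print(f"5 levels: {five} bits = {(1 << five) // PIB} PiB")
```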

check_page_levels.cC

#include <stdio.h>
#include <string.h>

int main(void) {
    // The 'la57' flag in /proc/cpuinfo means 57-bit linear addresses:
    // the CPU supports five-level paging and the kernel has not disabled it.
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f) {
        perror("fopen /proc/cpuinfo");
        return 1;
    }
    char line[8192];
    int la57 = 0;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "flags", 5) == 0) {   // first CPU's flag list is enough
            la57 = strstr(line, " la57") != NULL;
            break;
        }
    }
    fclose(f);
    if (la57)
        printf("la57 present: five-level paging available\n");
    else
        printf("la57 absent: four-level page tables\n");
    return 0;
}


📊 Production Insight

In production, use perf stat -e dTLB-load-misses (plus any CPU-specific page-walk events listed by perf list) to detect excessive page table walks. Consider using 1GB huge pages for large-memory workloads to reduce TLB pressure.

🎯 Key Takeaway

Five-level page tables extend virtual address space to 57 bits but increase TLB miss cost; use huge pages to mitigate.

MMU and TLB: How Virtual Address Translation Actually Works

The Memory Management Unit (MMU) is a hardware component that translates virtual addresses to physical addresses. When a CPU core issues a memory access, the MMU first checks the Translation Lookaside Buffer (TLB), a small, fast cache of recent translations. If the translation is in the TLB (TLB hit), the physical address is obtained immediately. On a TLB miss, the MMU performs a page table walk: it traverses the multi-level page tables in memory to find the corresponding page table entry (PTE). Each level requires a memory read (4-5 accesses in total), so a miss can take tens of cycles.

Modern CPUs have separate TLBs for instructions (ITLB) and data (DTLB), and often multi-level TLBs (L1, L2) to improve hit rates. For example, an Intel Skylake core has a 64-entry L1 DTLB and a 1536-entry L2 STLB. The page table walk is handled by a hardware state machine called the page walker. The OS manages the page tables; on a context switch, the TLB is either flushed or its entries are tagged with address space identifiers (ASIDs, called PCIDs on x86) so a flush can be avoided.

In production, TLB misses are a hidden cost: they add latency to every memory access that misses the TLB. Tools like perf can measure TLB miss rates (e.g., perf stat -e dTLB-load-misses). High TLB miss rates can be mitigated by using huge pages (2MB or 1GB), which reduce the number of page table entries and improve TLB coverage. For example, a 2MB huge page covers the same range as 512 4KB pages with one TLB entry.
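TLB coverage ("reach") is simply entries times page size. The sketch below works it out for the Skylake figures quoted above, under the simplifying assumption that every entry can hold a page of the given size; the function name is illustrative.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30


def tlb_reach(entries, page_size):
    """Bytes of address space that a fully populated TLB can translate."""
    return entries * page_size


if __name__ == "__main__":
    print("L1 DTLB, 4 KB pages:", tlb_reach(64, 4 * KIB) // KIB, "KiB")
    print("L2 STLB, 4 KB pages:", tlb_reach(1536, 4 * KIB) // MIB, "MiB")
    print("L2 STLB, 2 MB pages:", tlb_reach(1536, 2 * MIB) // GIB, "GiB")
```

With 4 KB pages, a working set of more than a few megabytes already exceeds the second-level TLB, which is why huge pages help large heaps so much.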

tlb_miss_measure.shBASH

#!/bin/bash
# Measure TLB misses for a process using perf
# Usage: ./tlb_miss_measure.sh <command>
perf stat -e dTLB-load-misses,dTLB-loads,iTLB-load-misses,iTLB-loads \
    -e page-faults,minor-faults,major-faults \
    -- "$@"   # quoted so arguments containing spaces survive
# Example output:
# 10,000 dTLB-load-misses  # 0.5% of all dTLB loads
# 2,000,000 dTLB-loads
# High miss rate (>1%) indicates TLB pressure.


📊 Production Insight

Monitor TLB miss rates with perf. If >1%, enable transparent hugepages (THP) or manually configure huge pages for critical applications.

🎯 Key Takeaway

MMU translates virtual to physical addresses; TLB caches translations. TLB misses trigger page walks, adding latency. Use huge pages to reduce misses.

Swap and zRAM: Modern Swap Techniques

Swap has traditionally been used to extend physical memory by paging out infrequently used pages to disk. However, disk I/O is slow (milliseconds), making swap a performance killer. Modern techniques like zRAM (compressed RAM swap) and zswap (compressed cache) mitigate this.

zRAM creates a compressed block device in RAM, acting as a swap device. Pages are compressed (typically using LZ4 or ZSTD) and stored in memory, avoiding disk I/O. Compression ratios of 2:1 to 3:1 are common, effectively increasing memory capacity. For example, on a system with 8GB RAM, a 4GB zRAM device at 2:1 holds 4GB of swapped pages in about 2GB of RAM, so the system can handle roughly a 10GB workload with much lower latency than disk swap.

zswap is similar but acts as a write-back cache in front of a disk swap device: compressed pages are stored in a memory pool, and only evicted to disk when the pool is full. Both are configured via kernel parameters and sysfs. In production, zRAM is popular on embedded systems and laptops (e.g., Chrome OS, Android) where disk I/O is costly. On servers, zswap can cut swap I/O substantially when pages compress well. However, compression uses CPU cycles; monitor with vmstat or /proc/swaps.

For example, to enable zRAM: modprobe zram; echo 4G > /sys/block/zram0/disksize; mkswap /dev/zram0; swapon /dev/zram0. Use lz4 for speed or zstd for better compression. In cloud environments, consider using zswap with a small disk swap as a safety net.
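How much extra capacity zRAM buys depends on the compression ratio, because the compressed pages still occupy RAM. The helper below is a rough model with an illustrative name; it assumes the zRAM device is completely full and that every page compresses at the same ratio, which real workloads will not do.

```python
def effective_capacity_gb(ram_gb, zram_disksize_gb, compression_ratio):
    """Approximate workload size that fits once the zRAM device is full."""
    ram_used_by_zram = zram_disksize_gb / compression_ratio   # compressed footprint
    ram_left_uncompressed = ram_gb - ram_used_by_zram
    return ram_left_uncompressed + zram_disksize_gb


if __name__ == "__main__":
    for ratio in (2.0, 3.0):
        total = effective_capacity_gb(8, 4, ratio)
        print(f"8 GB RAM + 4 GB zRAM at {ratio:.0f}:1 -> ~{total:.1f} GB")
```

The gain is the device size minus its compressed footprint, so an 8 GB machine with a 4 GB device lands between 10 and 11 GB at typical ratios, not at 12 GB.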

setup_zram.shBASH

#!/bin/bash
# Setup zRAM swap with 4GB compressed RAM
modprobe zram
# Set compression algorithm (lz4 or zstd)
echo lz4 > /sys/block/zram0/comp_algorithm
# Set size (4GB)
echo 4G > /sys/block/zram0/disksize
# Create swap filesystem
mkswap /dev/zram0
# Enable swap
swapon -p 100 /dev/zram0  # priority 100
# Verify
swapon --show
# To disable: swapoff /dev/zram0; modprobe -r zram


📊 Production Insight

Monitor swap activity with vmstat 1. If si/so (swap in/out) is high, enable zRAM or zswap. For databases, avoid swap entirely; use memory limits instead.

🎯 Key Takeaway

zRAM and zswap use compression to reduce swap I/O, trading CPU for memory efficiency. They are essential for systems with limited RAM or slow disks.

● Production incidentPOST-MORTEMseverity: high

The 10ms Paging Ambush

Symptom

Spikes of 10–15ms latency every 3–5 seconds under load, no CPU or IO uptick. Only visible in tail latency (p99.9).

Assumption

Team assumed all memory accesses were equally fast, and that 32 GB of RAM was enough for a 20 GB dataset.

Root cause

Random access pattern to a memory-mapped file forced the OS to evict hot pages and fetch cold ones from disk. Each fault took about 10 ms to service (one 4 KB page read back from the SSD while under load).

Fix

1) mlock() the working set. 2) Prefault pages via sequential read at startup. 3) Fall back to custom‑managed buffer pool for random workloads.

Key lesson

Virtual memory is not real memory: a major page fault can cost 10 ms.
The working set must fit in RAM under all access patterns, not just total allocated size.
Random access to memory‑mapped files is a paging anti‑pattern.

Production debug guide: diagnose page faults and thrashing in production (4 entries)

Symptom · 01: Latency spikes without CPU or IO
Fix: Run perf stat -e page-faults,dTLB-loads,dTLB-load-misses -p PID -- sleep 10 to isolate paging costs.

Symptom · 02: High si/so in vmstat (swap activity)
Fix: Check /proc/meminfo for SwapCached and Dirty; increase RAM or reduce working set.

Symptom · 03: Random access to large mmap file
Fix: Use mlockall() or mmap with MAP_POPULATE to prefault pages; profile with ftrace.

Symptom · 04: Unexpected out-of-memory (OOM) kill
Fix: Check dmesg for OOM killer messages and adjust vm.overcommit_memory and vm.overcommit_ratio.

★ Paging Quick Debug: commands to catch and fix page faults fast

Latency spike

Immediate action

Run `perf stat -e page-faults -p PID -- sleep 5`

Commands

`perf stat -e page-faults,dTLB-load-misses -p PID -- sleep 10`

`grep -E 'VmRSS|VmSwap' /proc/PID/status`

Fix now

Call mlockall(MCL_CURRENT | MCL_FUTURE) on startup to lock pages.

Virtual Memory vs Physical Memory

Aspect	Virtual Memory	Physical Memory
Size	Up to address space width (e.g., 48 bits)	Limited by RAM
Access speed	1 cycle if TLB hit; 10–100 cycles if miss; 10ms if page fault	~10ns for cache hit, ~100ns for DRAM
Persistence	Backed by disk, lost on power off	Volatile, lost on power off
Allocation	Instant (lazy), pages populated on demand	Instant but limited
Isolation	Fully isolated per process	Shared across processes via kernel

⚙ Quick Reference

9 commands from this guide

File	Command / Code	Purpose
io/thecodeforge/PagingObservations.java	public class PagingObservations {	What is Virtual Memory and Paging?
io/thecodeforge/translation.c	int main() {	How Address Translation Works
io/thecodeforge/scan.c	int main() {	Page Replacement Algorithms
monitor.sh	while true; do	Performance Considerations and Production Pitfalls
check_tlb.sh	perf stat -e dTLB-load-misses,dTLB-loads -p $(pgrep -n my_app) -- sleep 5	The Hidden Performance Cost of TLB Misses in Production
page_fault_monitor.py	pid = os.getpid()	Why Demand Paging Killed the 'Load Everything' Mentality
check_page_levels.c	int main() {	Five-Level Page Tables in Modern x86-64
tlb_miss_measure.sh	perf stat -e dTLB-load-misses,dTLB-loads,iTLB-load-misses,iTLB-loads \	MMU and TLB
setup_zram.sh	modprobe zram	Swap and zRAM

Key takeaways

Pages are not free: each major fault costs ~10ms.

Working set must fit in RAM; measure it rather than assuming (resident set size and major-fault counts are a starting point).

Use mlock(), huge pages, and madvise() for predictable performance.

Profile first: perf for TLB and faults, vmstat for swap.

Random access to mmap files is a paging trap.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

Explain the cost of a page fault in terms of CPU cycles. Why does it mat...

Q02SENIOR

What is the 'scanning problem' in page replacement and how does Linux mi...

Q03SENIOR

How would you diagnose and fix a production service that experiences 10m...

Q01 of 03SENIOR

Explain the cost of a page fault in terms of CPU cycles. Why does it matter for real‑time systems?

ANSWER

A major page fault that reads from disk costs roughly 10 ms on seek-bound storage, which is 10 million CPU cycles at 1 GHz. For a 3 GHz CPU, that's 30 million cycles wasted. Even on a fast SSD, where a read is closer to 100 µs, a fault still burns hundreds of thousands of cycles. In real-time systems, deterministic latency requires bounded response times; a page fault breaks those bounds. Solutions: lock memory with mlockall(), use huge pages to reduce page count, or allocate and prefault all required memory at startup.
