Senior 4 min · March 06, 2026

Memory Management Thrashing — 120GB Working Set on 64GB

A Spark job's 120GB working set on a 64GB server caused thrashing: 85% of CPU time went to page-fault handling, and 200ms queries took 4+ minutes.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Memory management is the OS subsystem that allocates, tracks, and reclaims memory for running processes
  • Physical memory is shared among processes; each process gets a private virtual address space
  • Paging maps virtual pages to physical frames using page tables, enabling isolation and sparse use
  • Virtual memory extends RAM to disk via swapping, with page replacement algorithms deciding what to evict
  • Performance trap: TLB misses on context switch hurt throughput more than page faults in many workloads
  • Biggest mistake: assuming virtual addresses match physical addresses; modern OSes give no such guarantee
Plain-English First

Imagine your desk is the computer's RAM — it's the space where you actually do your work. Your OS is the office manager who decides which papers (programs) get desk space, where they go, and who gets kicked off when the desk is full. When the desk overflows, the manager quietly moves older papers to a filing cabinet (your hard drive) and brings them back when needed — you barely notice. That swap between desk and cabinet is exactly what virtual memory does.

Every program you run — a browser, a game, a database — needs memory to breathe. Without a fair, structured way to hand out that memory, one misbehaving app could read your bank app's data, a crashed process could corrupt the entire system, and you'd never be able to run more than one program at a time. Memory management is the silent contract that makes modern computing safe and multi-tasking possible.

The problem it solves is deceptively deep. Physical RAM is finite and shared. Process A shouldn't be able to peek into Process B's address space. The OS needs to allocate memory fast, reclaim it when a process exits, and give each program the illusion that it owns all the memory in the world — even when RAM is nearly full. Without a memory manager, none of that is possible.

By the end of this article you'll understand exactly how the OS partitions memory, why paging replaced older schemes, how virtual memory lets your laptop run 40 browser tabs on 8 GB of RAM, and what questions about memory management reveal in a system-design or OS interview. Let's dig in.

What is Memory Management in OS?

Memory management is the OS subsystem responsible for allocating and deallocating memory to processes, tracking which parts of memory are free or in use, and providing isolation so that one process cannot access another's data. At its core, it solves three problems: sharing of finite physical RAM, protection between processes, and translation from virtual to physical addresses. Every modern OS—Linux, Windows, macOS—implements memory management as part of the kernel, using hardware support from the CPU's MMU (Memory Management Unit). Without it, a bug in a browser could corrupt a password manager in memory, and multiprogramming would be impossible.

io/thecodeforge/memory/addr_space.c
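#include <stdio.h>
#include <unistd.h>

int main(void) {
    int var = 42;  // a local variable on this process's stack
    // %p prints the VIRTUAL address; the physical frame behind it is hidden by the MMU
    printf("Virtual address of var: %p\n", (void*)&var);
    printf("Process PID: %d\n", (int)getpid());
    printf("Run: cat /proc/%d/maps to see the address space, then press Enter\n", (int)getpid());
    getchar();  // keep the process alive so /proc/<pid>/maps still exists
    return 0;
}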
Mental Model: The Apartment Building
  • The building has limited rooms (physical frames).
  • Each tenant has their own numbering system (virtual addresses).
  • The landlord (MMU) translates the tenant's room number (virtual address) into a real room (physical address).
  • Many tenants can share a common area (shared memory).
  • If too many tenants arrive, the landlord moves some stuff to a storage locker (swap).
Production Insight
A single process with a memory leak can exhaust all physical RAM and trigger the OOM killer. Always set per-process memory limits in production (for example ulimit -v, which caps virtual address space).
Monitoring RSS vs VSZ is critical: RSS is the RAM actually in use; VSZ also counts virtual allocations that may never be backed by physical pages.
Rule: Set vm.overcommit_memory=2 on database servers to prevent the OS from promising more memory than exists.
Key Takeaway
Memory management creates the illusion of private, infinite memory for each process.
That illusion is built on page tables, TLB caching, and the MMU.
Rule: The OS lies about memory; always monitor RSS, not VSZ.
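A quick way to see the RSS/VSZ gap for yourself is the Python sketch below. It assumes Linux, where /proc/self/status reports VmSize (VSZ) and VmRSS (RSS) in kB; the helper `read_status_kb` is our own name, not a library function. It maps 64 MiB of anonymous memory, which should raise VSZ right away, then writes one byte per page, which is what actually makes RSS grow.

```python
import mmap

def read_status_kb(field):
    """Return a field such as 'VmSize' or 'VmRSS' from /proc/self/status, in kB (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

SIZE = 64 * 1024 * 1024  # 64 MiB

vsz0, rss0 = read_status_kb("VmSize"), read_status_kb("VmRSS")
# Anonymous private mapping: reserves virtual address space, no physical frames yet
region = mmap.mmap(-1, SIZE, flags=mmap.MAP_PRIVATE)
vsz1, rss1 = read_status_kb("VmSize"), read_status_kb("VmRSS")

for offset in range(0, SIZE, mmap.PAGESIZE):
    region[offset] = 1  # first write to each page makes the kernel back it with a frame
vsz2, rss2 = read_status_kb("VmSize"), read_status_kb("VmRSS")

print(f"after mmap:  VSZ +{vsz1 - vsz0} kB, RSS +{rss1 - rss0} kB")
print(f"after touch: VSZ +{vsz2 - vsz0} kB, RSS +{rss2 - rss0} kB")
```

You should see VSZ jump by roughly 64 MiB at the mmap call while RSS barely moves, and RSS catch up only after the touch loop.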
When to Worry About Memory Management Issues
  • If a process crashes randomly with 'out of memory': check dmesg for the OOM killer; enable memory cgroup limits.
  • If the system is slow with high load but low CPU usage: check vmstat for swapping; reduce the working set or add RAM.
  • If the application works fine but uses more RSS than expected: profile with perf or valgrind to find memory leaks.
  • If memory fragmentation prevents a large allocation: use huge pages (HugeTLB) or defragment via compaction.

Memory Allocation Strategies: Contiguous vs Non-Contiguous

Early operating systems used contiguous allocation: each process got a single block of physical memory. That led to fragmentation and made it hard to fit processes into memory as they came and went. Modern OSes use non-contiguous allocation via paging. Within a process, allocation requests (malloc) are served by a user-space heap allocator, which carves them out of larger regions obtained from the kernel with brk/sbrk or mmap. The two main strategies are:

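  1. Contiguous (Fixed Partitioning): Each process gets a fixed-size block. Simple, but space left over inside a partition is wasted (internal fragmentation), and variable-size partitions suffer from external fragmentation instead. Not used in general-purpose OSes.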
  2. Non-Contiguous (Paging): Physical memory is split into fixed-size frames; processes get virtual pages that can map to any frame. This eliminates external fragmentation and enables virtual memory.

Additionally, segmentation divides memory into variable-sized logical chunks (code, data, stack) but suffers from external fragmentation unless combined with paging. 32-bit x86 supported segmentation layered on top of paging, but on x86_64 Linux and Windows use a flat segment model, so in practice memory is managed by paging alone.

io/thecodeforge/memory/page_table_sim.c
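#include <stdio.h>

// Simulate a single-level page table for a process with 4 virtual pages
#define NUM_PAGES 4
#define PAGE_SIZE 4096  // 4 KB pages

struct page_table_entry {
    int valid;      // 1 if page is in physical memory
    int frame_num;  // physical frame number (if valid)
    int dirty;      // modified flag: page must be written back before eviction
    int referenced; // set on access; used by LRU approximations such as Clock
};

// Translate a virtual address to a physical address; returns -1 on a fault
// (out-of-range pages are also treated as faults here, for simplicity)
static long translate(struct page_table_entry *pt, unsigned long vaddr) {
    unsigned long page = vaddr / PAGE_SIZE;    // virtual page number
    unsigned long offset = vaddr % PAGE_SIZE;  // offset within the page is unchanged
    if (page >= NUM_PAGES || !pt[page].valid)
        return -1;
    pt[page].referenced = 1;
    return (long)pt[page].frame_num * PAGE_SIZE + (long)offset;
}

int main(void) {
    struct page_table_entry pt[NUM_PAGES] = {
        {1, 10, 0, 0},
        {1, 5, 1, 0},
        {0, 0, 0, 0},  // not resident: accessing this page causes a page fault
        {1, 8, 0, 0},
    };
    unsigned long vaddrs[] = {0x0123, 0x2010};  // one address in page 0, one in page 2

    for (int i = 0; i < 2; i++) {
        long paddr = translate(pt, vaddrs[i]);
        if (paddr >= 0)
            printf("vaddr 0x%lx -> paddr 0x%lx\n", vaddrs[i], (unsigned long)paddr);
        else
            printf("vaddr 0x%lx -> page fault: must load from swap\n", vaddrs[i]);
    }
    return 0;
}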
Fragmentation Trap
Contiguous allocation was abandoned because of external fragmentation: even if total free memory is sufficient, no single contiguous block may be large enough. Paging solves this but introduces internal fragmentation (last page partially unused).
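The failure mode is easy to reproduce with a toy first-fit allocator. In this Python sketch the memory units are abstract and the helpers `first_fit` and `pages_needed` are hypothetical: 40 units are free, split across two 20-unit holes, yet a 35-unit contiguous request fails, while paging the same request into 10-unit frames succeeds at the cost of 5 units of internal fragmentation.

```python
def first_fit(free_holes, request):
    """Contiguous first-fit allocation over a list of (start, size) holes.

    Returns the start of the allocation, or None if no single hole is large
    enough. Mutates free_holes in place on success.
    """
    for i, (start, size) in enumerate(free_holes):
        if size >= request:
            if size == request:
                free_holes.pop(i)
            else:
                free_holes[i] = (start + request, size - request)
            return start
    return None


def pages_needed(request, page_size):
    """With paging, any free frames will do: only the frame count matters."""
    return -(-request // page_size)  # ceiling division


# 100 units of memory; processes occupy [0, 30) and [50, 80), leaving two 20-unit holes
holes = [(30, 20), (80, 20)]
print(sum(size for _, size in holes))  # 40 units free in total
print(first_fit(holes, 35))            # None: no single hole holds 35 units

# The same 40 free units as four 10-unit frames: 35 units need 4 pages, which fits,
# but the last page wastes 5 units (internal fragmentation).
n = pages_needed(35, 10)
print(n, n * 10 - 35)                  # 4 5
```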
Production Insight
Heap allocators (glibc's ptmalloc) use a mix of mmap and sbrk. Heavy use of small allocations can fragment the heap; jemalloc or tcmalloc can reduce fragmentation.
In production, monitor /proc/buddyinfo for physical fragmentation: if free memory sits mostly in low-order blocks with few high-order blocks left, compaction can reduce latency spikes.
Rule: Always benchmark with realistic allocation patterns; malloc implementations differ dramatically.
Key Takeaway
Non-contiguous paging eliminated external fragmentation but added page table overhead.
The golden rule of memory allocation: pay for what you use, but be ready for page faults.
Rule: Prefer mature allocators (jemalloc) for high-throughput servers.

Paging: The Mechanism Behind Virtual Memory

Paging divides virtual memory into fixed-size blocks called pages (typically 4KB on x86_64) and physical memory into frames of the same size. Each process has a page table that maps virtual page numbers to physical frame numbers. The MMU uses this page table to translate every memory access from virtual to physical address. When a process accesses a page that is not currently mapped to a frame, a page fault occurs. The OS then loads the page from disk (swap) or from the file system, or, on the first touch of fresh anonymous memory, maps a zero-filled frame. Loading pages only when they are first accessed is called demand paging. Faults that require disk I/O are major faults; faults resolved without I/O are minor faults.

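Page tables themselves are hierarchical (4-level page tables on x86_64) to avoid needing a flat table with billions of entries. The Translation Lookaside Buffer (TLB) caches recent translations to speed up address translation. When the CPU switches to a process with a different address space, the cached translations no longer apply; unless TLB entries are tagged with an address-space identifier (PCID on x86_64, ASID on ARM), the TLB must be flushed, which is one reason frequent context switching hurts performance.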
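To see how the hierarchy is indexed, here is a Python sketch that splits an x86_64 virtual address into its four table indices and page offset. It assumes 4-level paging with 4 KB pages (newer CPUs can also use 5-level paging, which adds one more level). Each table has 512 entries, so each level consumes 9 bits, and 4 × 9 + 12 = 48 bits; `split_x86_64_vaddr` is a hypothetical helper name.

```python
def split_x86_64_vaddr(vaddr):
    """Split a 48-bit x86_64 virtual address (4 KB pages, 4-level paging) into
    the index used at each page-table level plus the byte offset in the page."""
    offset = vaddr & 0xFFF          # bits 0-11: byte within the 4 KB page
    pt     = (vaddr >> 12) & 0x1FF  # bits 12-20: page table index
    pd     = (vaddr >> 21) & 0x1FF  # bits 21-29: page directory index
    pdpt   = (vaddr >> 30) & 0x1FF  # bits 30-38: page directory pointer table index
    pml4   = (vaddr >> 39) & 0x1FF  # bits 39-47: top-level (PML4) index
    return pml4, pdpt, pd, pt, offset


# Build an address from known indices, then split it back apart
addr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split_x86_64_vaddr(addr))  # (1, 2, 3, 4, 5)
```

A TLB hit skips all four of these table lookups, which is why TLB misses are expensive even when no page fault occurs.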

io/thecodeforge/memory/page_fault_measure.c
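#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/resource.h>

int main(void) {
    struct rusage before, after;
    const size_t size = 100UL * 1024 * 1024;           // 100 MB
    const size_t page = (size_t)sysconf(_SC_PAGESIZE); // usually 4096

    getrusage(RUSAGE_SELF, &before);

    // malloc only reserves virtual memory; a frame is assigned when a page is first touched
    char *big = malloc(size);
    if (big == NULL) {
        perror("malloc");
        return 1;
    }
    for (size_t i = 0; i < size; i += page)
        big[i] = 1;  // first write to each page triggers a page fault

    getrusage(RUSAGE_SELF, &after);
    // Fresh anonymous pages are zero-filled in RAM, so these are MINOR faults;
    // major faults (disk reads) appear only when pages come back from swap or a file.
    printf("Minor page faults: %ld\n", after.ru_minflt - before.ru_minflt);
    printf("Major page faults: %ld\n", after.ru_majflt - before.ru_majflt);
    free(big);
    return 0;
}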
TLB Performance Note
Modern CPUs have multiple TLB levels (L1, L2). A TLB miss costs roughly 10-100 cycles; a major page fault that must wait for disk I/O costs millions of cycles. 2MB huge pages reduce TLB misses by covering more memory per entry, at the cost of more internal fragmentation.
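The huge-page benefit comes down to TLB reach, the amount of memory the TLB can map at once: entries × page size. A Python sketch of the arithmetic, assuming a hypothetical 64-entry TLB (real TLB sizes vary by CPU and TLB level); the helper names are our own.

```python
def tlb_reach(entries, page_size):
    """Bytes of memory a TLB can map without a miss: entries x page size."""
    return entries * page_size

def entries_to_cover(working_set, page_size):
    """Translations needed to map a whole working set (ceiling division)."""
    return -(-working_set // page_size)

KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3
ENTRIES = 64  # hypothetical TLB size; real sizes differ by CPU and TLB level

print(tlb_reach(ENTRIES, 4 * KB) // KB, "KB reach with 4 KB pages")  # 256
print(tlb_reach(ENTRIES, 2 * MB) // MB, "MB reach with 2 MB pages")  # 128
print(entries_to_cover(1 * GB, 4 * KB), entries_to_cover(1 * GB, 2 * MB))  # 262144 512
```

A 1 GB working set needs 262,144 translations with 4 KB pages but only 512 with 2 MB pages, so far more of it stays within TLB reach.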
Production Insight
Page faults are a major cause of latency spikes in memory-bound services. Use perf stat -e page-faults to count them.
Huge pages (HugeTLB or transparent huge pages) can reduce TLB misses by 10-30% for large working sets.
Rule: For databases and VMs, explicit huge pages (HugeTLB) are more predictable than transparent huge pages (THP).
Key Takeaway
Paging turns the problem of fitting many processes into limited RAM into a mapping problem.
The TLB makes paging fast; context switches make it slow again.
Rule: Reduce context switches (e.g., by pinning processes to dedicated CPUs) to reduce TLB flushes.

Virtual Memory: The Illusion of Infinite Memory

Virtual memory extends the concept of paging: each process gets a full virtual address space (48-bit addresses, or 256 TiB, on x86_64 with 4-level paging), but only the parts it actually touches are backed by physical memory. Evicted pages sit on disk (swap) or can be re-read from their backing file. When a process accesses a virtual address that is not in RAM, the OS loads the corresponding page into a free frame; this is demand paging. If no free frames exist, the OS evicts a page to disk using a replacement algorithm (LRU, Clock, etc.). Virtual memory enables:
  • Running programs larger than physical RAM.
  • Sharing libraries and memory-mapped files.
  • Copy-on-write (COW) forking.

The key components: page table, swap space (disk area), page replacement algorithm, and the page fault handler.
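The four components fit together in the page-fault path. Here is a toy Python model of it; the class `DemandPager` and its fields are hypothetical, chosen for illustration. A dict of resident pages plays the page table, LRU picks victims, and evicting a dirty page counts as a write to swap. A real kernel also deals with the TLB, file-backed pages, and locking, none of which is modeled.

```python
from collections import OrderedDict

class DemandPager:
    """Toy demand pager: LRU replacement, dirty victims written back to 'swap'."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()  # page -> dirty flag, kept in LRU order
        self.page_faults = 0
        self.writebacks = 0            # dirty victims that had to be written to swap

    def access(self, page, write=False):
        if page in self.resident:
            self.resident.move_to_end(page)  # hit: no fault, just mark recently used
        else:
            self.page_faults += 1            # the page fault handler runs
            if len(self.resident) >= self.num_frames:
                victim, dirty = self.resident.popitem(last=False)  # evict LRU page
                if dirty:
                    self.writebacks += 1     # victim must be written to swap first
            self.resident[page] = False      # page loaded clean into the freed frame
        if write:
            self.resident[page] = True       # a write marks the page dirty


pager = DemandPager(num_frames=2)
for page, write in [(1, True), (2, False), (3, False), (2, False), (1, False)]:
    pager.access(page, write)
print(pager.page_faults, pager.writebacks)  # 4 1
```

Only page 1 was ever written, so only its eviction costs a swap write; clean pages can simply be dropped and re-read later.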

io/thecodeforge/memory/lru_sim.py
from collections import OrderedDict

def simulate_lru(pages, num_frames):
    frames = OrderedDict()
    page_faults = 0
    for page in pages:
        if page in frames:
            frames.move_to_end(page)  # mark as recently used
        else:
            page_faults += 1
            if len(frames) >= num_frames:
                frames.popitem(last=False)  # remove LRU
            frames[page] = True
    return page_faults

# Classic reference string (the same one used to demonstrate Belady's anomaly)
accesses = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(simulate_lru(accesses, 3))  # Output: 10
print(simulate_lru(accesses, 4))  # Output: 8 (more frames, fewer faults)
Mental Model: The Library with Limited Shelves
  • The library catalog (page table) tells where each book is.
  • A book on the shelf = page in RAM; in the basement = on swap.
  • Frequent trips to the basement (thrashing) mean patrons are referencing books that keep getting removed.
  • To avoid thrashing, ensure the total working set of active patrons fits on the shelves.
Production Insight
Virtual memory breaks down when the combined working set of all active processes exceeds physical RAM: that's thrashing (once RAM plus swap is exhausted, the OOM killer steps in instead). Use sar -B to monitor pgpgin/s and pgpgout/s.
Copy-on-write (COW) after fork can double memory usage if the child modifies many pages. When the child only needs to exec another program, use posix_spawn (or vfork) to avoid copying page tables and taking COW faults.
Rule: Set vm.max_map_count appropriately; the default (65530) is too low for processes that create many memory mappings, such as Elasticsearch.
Key Takeaway
Virtual memory trades speed for capacity: paging from disk is orders of magnitude slower than RAM access (roughly 100,000x for a spinning disk).
Thrashing is the worst-case scenario: the OS is busy swapping but no process makes progress.
Rule: Monitor the page fault rate, not just free RAM. As a rule of thumb, sustained major faults above 10/second signal a problem.
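To track that rate without extra tools, sample the cumulative counters in /proc/vmstat twice and divide by the interval. A Python sketch, assuming Linux: pgmajfault, pswpin and pswpout are standard /proc/vmstat fields, while `read_vmstat` and `rates` are hypothetical helper names. The 10/second threshold is this article's rule of thumb, not a kernel constant.

```python
import time

def read_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip().isdigit():
            counters[name] = int(value)
    return counters

def rates(before, after, seconds, fields=("pgmajfault", "pswpin", "pswpout")):
    """Per-second rates of cumulative counters between two samples."""
    return {f: (after.get(f, 0) - before.get(f, 0)) / seconds for f in fields}

if __name__ == "__main__":  # live sampling works on Linux only
    INTERVAL = 2  # seconds between samples
    with open("/proc/vmstat") as f:
        first = read_vmstat(f.read())
    time.sleep(INTERVAL)
    with open("/proc/vmstat") as f:
        second = read_vmstat(f.read())
    r = rates(first, second, INTERVAL)
    print(r)
    if r["pgmajfault"] > 10:
        print("warning: sustained major faults, check for swapping or thrashing")
```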

Page Replacement Algorithms: Which Pages Get Evicted?

When physical memory is full and a page fault occurs, the OS must evict a victim page to disk. The replacement algorithm determines which page to remove. The goal is to minimize future page faults by evicting pages unlikely to be used soon.

LRU (Least Recently Used): Evict the page not accessed for the longest time. Exact LRU would need bookkeeping on every memory access, so kernels approximate it using the hardware-set reference (accessed) bit, for example with the Clock algorithm.

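Clock (Second Chance): Keep frames in a circular list, each with a reference bit. The clock hand sweeps forward: if a page's bit is set, clear it and move on (its second chance); if the bit is already clear, evict that page. This is an efficient approximation of LRU.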

Working Set Model: Estimate the set of pages a process is actively using; only keep that set in RAM. Prevents thrashing by adjusting degree of multiprogramming.

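Other algorithms: FIFO (simple but suffers from Belady's anomaly) and Optimal (evict the page whose next use is furthest in the future; it requires knowing future accesses, so it is unimplementable and serves only as a benchmark).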
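The working set model can be sketched in a few lines of Python. This follows Denning's definition: W(t, Δ) is the set of distinct pages referenced in the last Δ references. `working_set` and `thrashing_risk` are hypothetical helper names, and comparing the sum of working sets to the frame count is the simplified admission check the model implies, not any particular kernel's policy.

```python
def working_set(refs, t, delta):
    """W(t, delta): distinct pages referenced in the last `delta` references,
    up to and including position t (0-based) of the reference string."""
    return set(refs[max(0, t - delta + 1): t + 1])

def thrashing_risk(working_set_sizes, num_frames):
    """True if the combined working sets cannot all be resident at once."""
    return sum(working_set_sizes) > num_frames

refs = [1, 2, 1, 3, 4, 4, 4, 5, 1, 2]
print(sorted(working_set(refs, 6, 4)))    # refs[3..6] = 3, 4, 4, 4 -> [3, 4]
print(thrashing_risk([40, 30, 50], 100))  # True: 120 pages wanted, only 100 frames
```

When the check returns True, the model says to suspend a process (lower the degree of multiprogramming) rather than let every process fault continuously.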

io/thecodeforge/memory/clock_alg.c
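#include <stdio.h>

#define NUM_FRAMES 4

struct frame {
    int page;
    int referenced; // reference bit: set by hardware on access (simulated here)
};

// Clock (second-chance) replacement: returns the frame that received new_page
static int clock_replace(struct frame *frames, int *hand, int new_page) {
    while (1) {
        struct frame *f = &frames[*hand];
        if (f->referenced == 0) {
            int victim = *hand;
            printf("Evicting page %d from frame %d\n", f->page, victim);
            f->page = new_page;
            f->referenced = 1;                 // the new page was just accessed
            *hand = (*hand + 1) % NUM_FRAMES;  // advance past the new page
            return victim;
        }
        printf("Page %d gets a second chance\n", f->page);
        f->referenced = 0;                     // clear the bit and move on
        *hand = (*hand + 1) % NUM_FRAMES;
    }
}

int main(void) {
    // Pages 1, 2 and 4 were accessed recently; page 3 was not
    struct frame frames[NUM_FRAMES] = {{1, 1}, {2, 1}, {3, 0}, {4, 1}};
    int hand = 0; // clock hand
    clock_replace(frames, &hand, 5);  // pages 1 and 2 get second chances; page 3 is evicted
    return 0;
}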
Belady's Anomaly with FIFO
FIFO replacement can cause more page faults with more frames: adding physical memory may increase the fault rate. This anomaly cannot occur with LRU, Optimal, or any other stack algorithm. Avoid FIFO in performance-critical OS code.
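Belady's anomaly is easy to reproduce. A minimal Python FIFO simulator (`fifo_faults` is our own helper) on the classic reference string gives 9 faults with 3 frames but 10 with 4:

```python
from collections import deque

def fifo_faults(refs, num_frames):
    """Count page faults under FIFO replacement."""
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.remove(order.popleft())  # evict the oldest-loaded page
            frames.add(page)
            order.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]  # classic Belady reference string
print(fifo_faults(refs, 3), fifo_faults(refs, 4))  # 9 10: more frames, more faults
```

FIFO ignores how recently a page was used, so the extra frame changes which pages get evicted in a way that happens to hurt this sequence; LRU on the same string drops from 10 faults to 8.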
Production Insight
Linux approximates LRU with active and inactive lists, in the spirit of the Clock algorithm. The /proc/meminfo fields 'Active' and 'Inactive' reflect this.
vm.swappiness controls how aggressively the kernel swaps out anonymous memory versus reclaiming page cache, and vm.vfs_cache_pressure controls how aggressively it reclaims dentry and inode caches. For databases, low swappiness keeps the kernel reclaiming page cache before it swaps out the database's own memory.
Rule: In cloud VMs, avoid relying on swap; if you need it, put it on low-latency local instance storage (SSD) rather than network-attached disks.
Key Takeaway
Replacement algorithms are trade-offs between accuracy and overhead.
Optimal is the theoretical best but unimplementable; LRU approximates it, and Clock approximates LRU cheaply enough for kernels.
Rule: If you design a custom cache (e.g., Redis LRU), measure hit ratio under your access pattern; standard LRU may not suit all workloads.
● Production incident · POST-MORTEM · severity: high

The Thrashed Production Server — When the OS Spends More Time Swapping Than Working

Symptom
Database queries that normally run in 200ms suddenly take 4+ minutes. CPU is near 100% and system load is through the roof. Disk I/O saturates with swap activity. dmesg shows 'out of memory' messages even though 20 GB of RAM is reported free.
Assumption
The team assumed leftover memory meant the system wasn't memory-constrained. They thought the issue was a slow query and started indexing, not realizing the real bottleneck was page thrashing from a process that kept requesting more memory than physical RAM.
Root cause
One analytic process (a misconfigured Spark job) requested a 120 GB working set on a 64 GB machine. The OS's virtual memory manager started swapping out other processes' pages to make room. Every access to those pages caused a page fault, reading them back from disk. With enough concurrent pressure, the system spent 85% of its time on page fault handling and only 15% on actual work — classic thrashing.
Fix
1) Kill the runaway Spark job (immediate recovery).
2) Set ulimit -v and memory cgroups to cap per-process memory.
3) Set vm.swappiness to 10 on database servers so the kernel is far less willing to swap out application pages.
4) Add memory.soft_limit_in_bytes in the cgroup to deprioritize batch jobs before they cause thrashing.
Key lesson
  • Thrashing happens when the total working set exceeds physical RAM, not when RAM is 'full'; always monitor page faults (ps -eo pid,min_flt,maj_flt shows cumulative counts, so sample it over time to get a rate)
  • Use cgroups and ulimits to prevent one rogue process from starving the system
  • Set vm.swappiness low (1-10) on latency-sensitive servers so the OS avoids swapping out application pages
Production debug guide: symptom-based steps to diagnose OS-level memory issues
Symptom · 01
Process crashes with OOM (Out-of-Memory) killer
Fix
Check dmesg | grep -i 'killed process'. Identify the process and its memory usage before death. Increase memory limit or fix leak.
Symptom · 02
High system load but low CPU usage
Fix
Run vmstat 1 and look at 'si' (swap in) and 'so' (swap out). High values indicate swapping/thrashing. Check /proc/meminfo for Committed_AS vs CommitLimit.
Symptom · 03
Process feels slow intermittently
Fix
Check /proc/<pid>/status for VmPeak, VmRSS, VmSwap. Run perf stat -e minor-faults,major-faults -p <pid> to count page faults by type. High major faults (majflt) mean disk reads.
Symptom · 04
Memory fragmentation — cannot allocate large contiguous blocks
Fix
Check /proc/buddyinfo for fragmentation. On 64-bit systems this is less common for user space. To trigger kernel memory compaction (as root): echo 1 > /proc/sys/vm/compact_memory.
★ Quick Memory Debug Cheat Sheet: commands to diagnose OS memory issues fast; every one works on Linux 5.x+ and most cloud kernels.
App killed by OOM killer
Immediate action
Check dmesg for 'Out of memory' and 'Killed process'
Commands
dmesg | grep -i oom | tail -5
cat /proc/meminfo | grep -E '^MemTotal|^MemFree|^Cached'
Fix now
Set vm.overcommit_memory=2 (strict accounting, no overcommit) and set memory cgroup limits
Slow app, high swap usage
Immediate action
Check vmstat for si/so columns
Commands
vmstat 1 5
cat /proc/meminfo | grep -E '^SwapTotal|^SwapFree|^Committed_AS'
Fix now
Set vm.swappiness=10, kill memory-hungry process, add more RAM or reduce working set
Process won't allocate memory even when free RAM exists
Immediate action
Check overcommit policy and memory fragmentation
Commands
cat /proc/sys/vm/overcommit_memory
cat /proc/buddyinfo | head -5
Fix now
If overcommit=2, increase vm.overcommit_ratio. If fragmented, echo 1 > /proc/sys/vm/compact_memory
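Why raising vm.overcommit_ratio helps: in mode 2 the kernel refuses allocations once Committed_AS would exceed CommitLimit. A Python sketch of that limit, assuming the formula from the kernel's overcommit-accounting documentation, CommitLimit = (RAM - HugeTLB pages) × overcommit_ratio / 100 + swap. `parse_meminfo` and `commit_limit_kb` are hypothetical helpers; on a live system /proc/meminfo already reports CommitLimit directly.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines ('Name:   123 kB') into a dict of numbers."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info

def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """CommitLimit under vm.overcommit_memory=2, in kB (default ratio is 50)."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb

sample = "MemTotal:       67108864 kB\nSwapTotal:       8388608 kB\n"  # 64 GiB RAM, 8 GiB swap
m = parse_meminfo(sample)
print(commit_limit_kb(m["MemTotal"], m["SwapTotal"]))  # 41943040 kB, i.e. 40 GiB
```

With the default ratio of 50, a 64 GiB machine can only commit 40 GiB, which is why strict mode often needs a higher ratio.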
Memory Management Concepts Comparison
| Concept | Level | Granularity | Fragmentation Type | Performance Implication |
|---|---|---|---|---|
| Fixed Partitions | OS | Process-sized | Internal | Simple but wastes memory |
| Paging | OS/HW | Page (4KB) | Internal | TLB misses on access |
| Segmentation | OS/HW | Variable segments | External | Context switch saves segment registers |
| Heap Allocation (mmap) | User space | Multiple of page | Internal, external | Syscall overhead on mmap |
| Buddy Allocator | Kernel | Power-of-2 blocks | Internal | Fast, but merges are O(log n) |
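The buddy allocator row is easier to picture with numbers. In this Python sketch (hypothetical helpers, assuming a 4 KB minimum block), a request is rounded up to a power-of-two block of some order, and a block's buddy is found by flipping a single address bit, which is what makes splitting and merging cheap. The per-order columns of /proc/buddyinfo count free blocks of exactly these orders.

```python
def buddy_order(request, min_block=4096):
    """Smallest order k such that min_block * 2**k >= request."""
    order, block = 0, min_block
    while block < request:
        block *= 2
        order += 1
    return order

def buddy_of(addr, order, min_block=4096):
    """Address of a block's buddy: flip the bit equal to the block size.
    Assumes addr is aligned to min_block * 2**order, as buddy allocators guarantee."""
    return addr ^ (min_block << order)

order = buddy_order(5000)                 # needs an 8 KB block -> order 1
print(order, (4096 << order) - 5000)      # 1 3192: bytes lost to internal fragmentation
print(buddy_of(8192, 1), buddy_of(0, 1))  # 0 8192: the two halves of one 16 KB block
```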

Key takeaways

1. Memory management creates a virtual address space per process, mapped to physical frames via page tables.
2. Paging eliminates external fragmentation but introduces TLB pressure and page fault latency.
3. Virtual memory allows overcommitment; the OOM killer is the last line of defense, so always set cgroup limits.
4. Thrashing happens when total working set exceeds RAM; monitor page fault rates, not free memory.
5. Choose replacement algorithms based on workload: LRU approximates optimal, and Clock is practical for kernels.

Common mistakes to avoid


Confusing virtual address space with physical memory usage

Symptom
Monitoring tools show 100 GB virtual memory per process, but physical RAM is only 64 GB. No visible swap usage. Panic over 'memory leak'.
Fix
VSZ is the virtual address space allocated (much of it may be reserved but never backed by physical pages). RSS is actual RAM. Use ps -eo pid,rss,vsz and focus on RSS. Virtual memory is cheap; physical memory is expensive.

Relying on swap to save you from running out of memory

Symptom
Production server starts swapping heavily and latency spikes from milliseconds to seconds. Once swap fills up, the OOM killer fires anyway.
Fix
Set vm.swappiness=1 for database servers. Use swap as emergency reserve only. Monitor swap usage with sar -S and alert if swap usage > 10% of RAM.

Forgetting to set per-process memory limits

Symptom
One process with a memory leak consumes all RAM, then OOM killer kills an innocent process (often an important service).
Fix
Use systemd service settings MemoryMax= and MemoryHigh= (plus TasksMax= to cap the number of tasks), or ulimit -v <kbytes> for shell-started processes. In Kubernetes, always set memory requests and limits on containers.

Assuming huge pages are always faster

Symptom
Enabling transparent huge pages (THP) leads to memory allocation stalls because the kernel must find contiguous 2MB blocks, causing compaction overhead.
Fix
For latency-sensitive apps (databases, Java GC), disable THP: echo never > /sys/kernel/mm/transparent_hugepage/enabled. Use explicit HugeTLB pages for large heaps.
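Before changing THP, check which mode is active. The sysfs file lists every choice and brackets the active one; this small Python helper (`active_thp_mode`, our own name) extracts it, assuming the usual `always [madvise] never` format.

```python
def active_thp_mode(text):
    """Return the bracketed (active) choice from a sysfs file such as
    /sys/kernel/mm/transparent_hugepage/enabled, e.g. 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

print(active_thp_mode("always [madvise] never"))  # madvise

try:  # live check on Linux
    with open("/sys/kernel/mm/transparent_hugepage/enabled") as f:
        print("THP mode:", active_thp_mode(f.read()))
except OSError:
    print("THP sysfs file not available on this system")
```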
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR · Explain how virtual memory works. What happens when a process accesses a...
Q02 · JUNIOR · What is the difference between a page fault and a segmentation fault?
Q03 · SENIOR · How does the operating system allocate memory to a process? Describe the...
Q04 · SENIOR · What is the Copy-on-Write mechanism? How does it affect memory usage aft...
Q05 · SENIOR · Name and explain the three page replacement algorithms. Which one does L...
Q01 of 05 · SENIOR

Explain how virtual memory works. What happens when a process accesses a page not in RAM?

ANSWER
Virtual memory maps virtual addresses to physical addresses via the page table, and the Translation Lookaside Buffer (TLB) caches recent translations. When the page table entry is not present, the OS takes control: 1. The hardware raises a page fault exception. 2. The OS checks whether the virtual address is valid (part of the process address space); if it is not, the OS delivers SIGSEGV, a segmentation fault. 3. If valid, the OS loads the page from disk (swap or file) into a free frame, or maps a zero-filled frame on first touch of anonymous memory. 4. It updates the page table with the new entry. 5. The instruction that caused the fault is retried. If no free frame exists, the OS must first evict a page (using LRU/Clock). The process is blocked during any disk I/O. If the sum of working sets exceeds physical RAM, thrashing occurs.
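The decision the handler makes in steps 2 and 3 can be sketched as a toy Python classifier. Everything here is hypothetical and simplified: `vmas` stands in for the kernel's list of valid regions, `swapped_out` for pages currently on swap, and any valid page not on swap is treated as a first touch. It also covers Q02: a segmentation fault is what the OS delivers when a fault hits an address outside every valid region (or violates its permissions).

```python
PAGE_SIZE = 4096

def handle_fault(vaddr, vmas, swapped_out):
    """Toy classification of a page fault.

    vmas: list of (start, end) valid address ranges, end exclusive.
    swapped_out: set of virtual page numbers whose contents are on swap.
    """
    if not any(start <= vaddr < end for start, end in vmas):
        return "SIGSEGV"  # not part of the address space: segmentation fault
    page = vaddr // PAGE_SIZE
    if page in swapped_out:
        return "major"    # read the page back from disk; the process blocks on I/O
    return "minor"        # first touch: map a zero-filled frame, no disk I/O

vmas = [(0x1000, 0x5000)]  # one valid region covering pages 1-4
swapped_out = {2}
for a in (0x0, 0x2010, 0x3000):
    print(hex(a), handle_fault(a, vmas, swapped_out))
# 0x0 SIGSEGV
# 0x2010 major
# 0x3000 minor
```

Real kernels also count a fault as minor when a file-backed page is already in the page cache; this sketch leaves that case out.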
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What is the difference between physical and virtual memory?
02
What is a page fault and why does it matter?
03
How do I check memory usage on Linux?
04
What is swap and should I use it in production?
05
What is the OOM killer and how does it decide which process to kill?