Intermediate 6 min · March 06, 2026

Memory Management in OS

Memory Management Thrashing — 120GB Working Set on 64GB

Q: What is the difference between physical and virtual memory?

**Physical memory** is the actual RAM hardware modules (chips) installed in a computer. **Virtual memory** is an abstraction provided by the OS that maps a process's address space to physical memory, potentially backed by disk swap. Every program sees its own contiguous virtual address space, which may be larger than physical RAM. The OS and MMU handle the translation on the fly.

Q: What is a page fault and why does it matter?

A **page fault** occurs when a program accesses a virtual page that is not currently in physical RAM. The OS must load the page from disk (major fault, slow) or from the file page cache (minor fault, fast). Frequent major page faults (page fault storm) cause thrashing and severely degrade performance. In production, keep major faults below 1/second per core.

Q: How do I check memory usage on Linux?

Use `free -h` for overall RAM and swap, `cat /proc/meminfo` for detailed counters (Active, Inactive, SwapTotal, SwapFree, Committed_AS), `vmstat 1` for swap-in/swap-out rates (si/so), and `top -o %MEM` to see per-process RSS. For deeper analysis, `sar -B` gives paging and page-fault statistics over time.

Q: What is swap and should I use it in production?

Swap is disk space used as an extension of physical RAM. It allows the OS to page out less-used memory. For latency-sensitive services, disable swap entirely or set vm.swappiness to 1 to avoid unnecessary swap activity. Swap can help a machine survive memory spikes, but at a huge latency cost (disk I/O). In cloud VMs, avoid swap; rely on sufficient RAM and vertical scaling.

Q: What is the OOM killer and how does it decide which process to kill?

When the kernel runs out of memory and cannot reclaim more, it invokes the **Out-Of-Memory Killer**, which selects a process to kill based on an oom_score (badness heuristic). The score is driven mainly by memory usage; older kernels also factored in process lifespan and privileges. You can adjust a process's score via /proc/<pid>/oom_score_adj (the older oom_adj file is deprecated). The OOM killer is a last resort; cgroups and limits should prevent it.

A Spark job's 120GB working set on a 64GB server caused thrashing: 85% of CPU time went to page-fault handling, and 200ms queries took 4+ minutes.

Naren, Founder & Principal Engineer

20+ years shipping production systems from the metal up. Drawn from code that ran under real load.

Production tested · Last updated July 19, 2026 · 2,466 articles, all by Naren

Before you start⏱ 25 min

✓Solid grasp of fundamentals
✓Comfortable reading code examples
✓Basic production concepts

⚡Quick Answer

Memory management is the OS subsystem that allocates, tracks, and reclaims memory for running processes
Physical memory is shared among processes; each process gets a private virtual address space
Paging maps virtual pages to physical frames using page tables, enabling isolation and sparse use
Virtual memory extends RAM to disk via swapping, with page replacement algorithms deciding what to evict
Performance trap: TLB misses on context switch hurt throughput more than page faults in many workloads
Biggest mistake: assuming virtual addresses match physical addresses — they never do in modern OSes

Plain-English First

Imagine your desk is the computer's RAM — it's the space where you actually do your work. Your OS is the office manager who decides which papers (programs) get desk space, where they go, and who gets kicked off when the desk is full. When the desk overflows, the manager quietly moves older papers to a filing cabinet (your hard drive) and brings them back when needed — you barely notice. That swap between desk and cabinet is exactly what virtual memory does.

Every program you run — a browser, a game, a database — needs memory to breathe. Without a fair, structured way to hand out that memory, one misbehaving app could read your bank app's data, a crashed process could corrupt the entire system, and you'd never be able to run more than one program at a time. Memory management is the silent contract that makes modern computing safe and multi-tasking possible.

The problem it solves is deceptively deep. Physical RAM is finite and shared. Process A shouldn't be able to peek into Process B's address space. The OS needs to allocate memory fast, reclaim it when a process exits, and give each program the illusion that it owns all the memory in the world — even when RAM is nearly full. Without a memory manager, none of that is possible.

By the end of this article you'll understand exactly how the OS partitions memory, why paging replaced older schemes, how virtual memory lets your laptop run 40 browser tabs on 8 GB of RAM, and what questions about memory management reveal in a system-design or OS interview. Let's dig in.

What is Memory Management in OS?

Memory management is the OS subsystem responsible for allocating and deallocating memory to processes, tracking which parts of memory are free or in use, and providing isolation so that one process cannot access another's data. At its core, it solves three problems: sharing of finite physical RAM, protection between processes, and translation from virtual to physical addresses. Every modern OS—Linux, Windows, macOS—implements memory management as part of the kernel, using hardware support from the CPU's MMU (Memory Management Unit). Without it, a bug in a browser could corrupt a password manager in memory, and multiprogramming would be impossible.

io/thecodeforge/memory/addr_space.c (C)

#include <stdio.h>
#include <unistd.h>

int main() {
    int var = 42;
    printf("Virtual address of var: %p\n", (void*)&var);
    printf("Process PID: %d\n", getpid());
    printf("Check /proc/%d/maps for virtual memory layout\n", getpid());
    // Run: cat /proc/PID/maps to see the address space
    return 0;
}

Mental Model

Mental Model: The Apartment Building

Think of physical RAM as an apartment building with numbered rooms. Each process (tenant) gets a set of keys (page table) that maps its own room numbers to actual rooms.

The building has limited rooms (physical frames).
Each tenant has their own numbering system (virtual addresses).
The landlord (MMU) translates the tenant's room number (virtual address) into a real room (physical address).
Many tenants can share a common area (shared memory).
If too many tenants arrive, the landlord moves some stuff to a storage locker (swap).
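The landlord's lookup can be sketched in a few lines. This is a minimal single-level illustration, assuming 4KB pages and a hypothetical page table held in a Python dict (the `translate` name and the frame numbers are made up for the example); real MMUs walk multi-level tables in hardware.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


def translate(virtual_addr, page_table):
    """Translate a virtual address using a {virtual page: physical frame} map."""
    page_number, offset = divmod(virtual_addr, PAGE_SIZE)
    if page_number not in page_table:
        # No mapping for this page: the OS would have to handle a page fault.
        raise LookupError(f"page fault: virtual page {page_number} is not mapped")
    return page_table[page_number] * PAGE_SIZE + offset


# Hypothetical tenant: virtual pages 0, 1 and 3 live in frames 10, 5 and 8.
page_table = {0: 10, 1: 5, 3: 8}

# 0x1234 is page 1, offset 0x234; page 1 maps to frame 5.
print(hex(translate(0x1234, page_table)))  # 0x5234
```

The offset passes through untouched; only the page number is swapped for a frame number. That is why two processes can use the same virtual address and still land in different physical rooms.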

📊 Production Insight

A single process with a memory leak can exhaust all physical RAM, triggering the OOM killer. Always set per-process memory limits in production (ulimit -v caps virtual address space; memory cgroups cap actual usage).
Monitoring RSS vs VSZ is critical: RSS is the RAM actually resident; VSZ includes virtual allocations that may never be backed by physical memory.
Rule: Set vm.overcommit_memory=2 on database servers to prevent the OS from promising more memory than exists.

🎯 Key Takeaway

Memory management creates the illusion of private, infinite memory for each process.
That illusion is built on page tables, TLB caching, and the MMU.
Rule: The OS lies about memory, so always monitor RSS, not VSZ.

When to Worry About Memory Management Issues

If: Process crashes randomly with 'out of memory' → Use: Check dmesg for OOM killer; enable memory cgroup limits

If: System is slow, high load, but low CPU → Use: Check vmstat for swapping; reduce working set or add RAM

If: Application works fine but uses more RSS than expected → Use: Profile with perf or valgrind to find memory leaks

If: Memory fragmentation prevents large allocation → Use: Use huge pages (HugeTLB) or defragment via compaction

Memory Allocation Strategies: Contiguous vs Non-Contiguous

Early operating systems used contiguous allocation: a process gets a single block of physical memory. That led to fragmentation and wasted RAM. Modern OSes use non-contiguous allocation via paging. Within a process, allocation requests (malloc) are served by a user-space heap manager, which carves up regions it obtains from the kernel with brk and mmap. The two main strategies are:

Contiguous (Partitioning): Each process gets one contiguous block. Fixed-size partitions are simple but waste the unused tail of each partition (internal fragmentation); variable-size partitions leave scattered holes between processes (external fragmentation). Not used in general-purpose OSes.
Non-Contiguous (Paging): Physical memory is split into fixed-size frames; processes get virtual pages that can map to any frame. This eliminates external fragmentation and enables virtual memory.

Additionally, Segmentation allows variable-sized logical chunks (code, data, stack) but suffers from external fragmentation unless combined with paging. Classic 32-bit x86 could combine the two (paged segmentation), but on x86_64 segmentation is effectively flat, so Linux and Windows rely on paging alone.

io/thecodeforge/memory/page_table_sim.c (C)

#include <stdio.h>
#include <stdlib.h>

// Simulate a page table for a process with 4 virtual pages
#define NUM_PAGES 4

struct page_table_entry {
    int valid;      // 1 if page is in physical memory
    int frame_num;  // physical frame number (if valid)
    int dirty;      // modified flag for swapping
    int referenced; // for LRU approximation
};

int main() {
    struct page_table_entry pt[NUM_PAGES];
    pt[0] = (struct page_table_entry){1, 10, 0, 0};
    pt[1] = (struct page_table_entry){1, 5, 1, 0};
    pt[2] = (struct page_table_entry){0, 0, 0, 0}; // page fault will occur
    pt[3] = (struct page_table_entry){1, 8, 0, 0};

    int virtual_page = 2;
    if (pt[virtual_page].valid) {
        int physical_addr = pt[virtual_page].frame_num * 4096; // 4KB pages
        printf("Physical address for vaddr %d: 0x%x\n", virtual_page, physical_addr);
    } else {
        printf("Page fault: must load from swap\n");
    }
    return 0;
}

⚠ Fragmentation Trap

Contiguous allocation was abandoned because of external fragmentation: even if total free memory is sufficient, no single contiguous block may be large enough. Paging solves this but introduces internal fragmentation (last page partially unused).
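The internal fragmentation that paging introduces is easy to put numbers on. A minimal sketch, assuming 4KB pages; the helper names are illustrative, not part of any OS API.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


def pages_needed(size_bytes):
    """Number of whole pages required to hold size_bytes (ceiling division)."""
    return -(-size_bytes // PAGE_SIZE)


def internal_fragmentation(size_bytes):
    """Bytes wasted in the last, partially used page."""
    return pages_needed(size_bytes) * PAGE_SIZE - size_bytes


# A 10,000-byte region needs 3 pages (12,288 bytes) and wastes 2,288 bytes.
print(pages_needed(10_000), internal_fragmentation(10_000))
```

The waste is bounded by one page per mapped region, which is why it matters far less than external fragmentation did under contiguous allocation. With 2MB huge pages the same bound grows to almost 2MB per region.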

📊 Production Insight

Heap allocators (glibc's ptmalloc) obtain memory with a mix of brk/sbrk and mmap. Heavy use of small allocations can fragment the heap. Use jemalloc or tcmalloc to reduce fragmentation.
In production, monitor /proc/buddyinfo for physical fragmentation; if free memory sits almost entirely in low-order blocks (order 0 and 1) with few high-order blocks left, large contiguous allocations will stall until compaction runs.
Rule: Always benchmark with realistic allocation patterns, because malloc implementations differ dramatically.

🎯 Key Takeaway

Non-contiguous paging eliminated external fragmentation but added page table overhead.
The golden rule of memory allocation: pay for what you use, but be ready for page faults.
Rule: Prefer mature allocators (jemalloc) for high-throughput servers.

Paging: The Mechanism Behind Virtual Memory

Paging divides virtual memory into fixed-size blocks called pages (typically 4KB on x86_64) and physical memory into frames of the same size. Each process has a page table that maps virtual page numbers to physical frame numbers. The MMU uses this page table to translate every memory access from virtual to physical address. When a process accesses a page not currently in physical memory, a page fault occurs, and the OS loads the page from disk (swap) or from the file system (demand paging).

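Page tables themselves are hierarchical (e.g., 4-level page tables on x86_64) to avoid needing a flat table with billions of entries. The Translation Lookaside Buffer (TLB) caches recent translations to speed up address translation. On a context switch to a different address space, cached translations become stale and must be flushed or, on CPUs with tagged TLBs (PCID on x86_64, ASID on ARM), kept apart by an address-space tag. Either way the incoming process starts with few useful entries, which is why repeated context switching hurts performance.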
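To make the hierarchy concrete, here is a small sketch of how a 48-bit virtual address is carved into table indices. It assumes the standard x86_64 4-level layout (four 9-bit indices plus a 12-bit page offset); the function name is hypothetical.

```python
def split_virtual_address(vaddr):
    """Split a 48-bit virtual address into four 9-bit table indices and a 12-bit offset."""
    offset = vaddr & 0xFFF  # low 12 bits: byte within the 4 KB page
    # Shifts select the index for each level, from the top-level table down.
    indices = [(vaddr >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return indices, offset


# Build an address from known parts, then take it apart again.
vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split_virtual_address(vaddr))  # ([1, 2, 3, 4], 5)

# A flat table would need one entry per 4 KB page of a 2**48-byte space.
print(2**48 // 2**12)  # 68719476736 entries
```

Each level only needs tables for the regions a process actually uses, so a small process touches a handful of 512-entry tables instead of a multi-billion-entry array.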

io/thecodeforge/memory/page_fault_measure.c (C)

#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

#define ALLOC_BYTES (100UL * 1024 * 1024) /* 100 MB */
#define PAGE_BYTES  4096UL                /* assumes 4KB pages */

int main(void) {
    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);

    // malloc only reserves virtual address space; frames are assigned on first touch
    int *big = malloc(ALLOC_BYTES);
    if (big == NULL) {
        perror("malloc");
        return 1;
    }
    for (size_t i = 0; i < ALLOC_BYTES / sizeof(int); i += PAGE_BYTES / sizeof(int)) {
        big[i] = (int)i; // touch one int per page
    }

    getrusage(RUSAGE_SELF, &after);
    // First touch of anonymous memory needs no disk I/O, so these are minor faults;
    // major faults appear only when a page must be read back from swap or a file.
    printf("Minor page faults: %ld\n", after.ru_minflt - before.ru_minflt);
    printf("Major page faults: %ld\n", after.ru_majflt - before.ru_majflt);
    free(big);
    return 0;
}

🔥TLB Performance Note

Modern CPUs have multiple TLB levels (L1, L2). A TLB miss costs 10-100 cycles; a major page fault costs millions of cycles (disk I/O). 2MB huge pages reduce TLB misses by covering more memory per entry, at the cost of more internal fragmentation.
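"Covering more memory per entry" can be quantified as TLB reach: entries multiplied by page size. The sketch below assumes a hypothetical 1,536-entry second-level TLB; actual entry counts vary by CPU, so treat the figure as a placeholder.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(entries, page_size):
    """Memory addressable without a TLB miss: one page per TLB entry."""
    return entries * page_size


ENTRIES = 1536  # assumed entry count, not taken from any specific CPU

print(tlb_reach_bytes(ENTRIES, 4 * KIB) // MIB)  # 6 (MiB with 4 KB pages)
print(tlb_reach_bytes(ENTRIES, 2 * MIB) // MIB)  # 3072 (MiB with 2 MB pages)
```

Under these assumptions a working set larger than a few megabytes already overflows the TLB with 4KB pages, while the same number of 2MB entries covers about 3GB.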

📊 Production Insight

Page faults are a major cause of latency spikes in memory-bound services. Use perf stat -e page-faults to count them.
Huge pages (HugeTLB or transparent huge pages) can reduce TLB misses by 10-30% for large working sets.
Rule: For databases and VMs, explicit huge pages (HugeTLB) are more predictable than transparent huge pages (THP).

🎯 Key Takeaway

Paging converts the problem of fitting many processes into limited RAM into a mapping problem.
The TLB makes paging fast; context switches make it slow again.
Rule: Reduce context switches (e.g., by pinning processes to CPUs) to reduce TLB flushes.

Virtual Memory: The Illusion of Infinite Memory

Virtual memory extends the concept of paging: each process gets a full virtual address space (e.g., 2^48 on x86_64), but only the parts actively needed are backed by physical memory. The rest sits on disk (swap). When a process accesses a virtual address that is not in RAM, the OS loads the corresponding page from disk into a freed frame. This is demand paging. If no free frames exist, the OS evicts a page to disk using a replacement algorithm (LRU, Clock, etc.). Virtual memory enables:
- Running programs larger than physical RAM.
- Sharing libraries and memory-mapped files.
- Copy-on-write (COW) forking.

The key components: page table, swap space (disk area), page replacement algorithm, and the page fault handler.

io/thecodeforge/memory/lru_sim.py (Python)

from collections import OrderedDict

def simulate_lru(pages, num_frames):
    frames = OrderedDict()
    page_faults = 0
    for page in pages:
        if page in frames:
            frames.move_to_end(page)  # mark as recently used
        else:
            page_faults += 1
            if len(frames) >= num_frames:
                frames.popitem(last=False)  # remove LRU
            frames[page] = True
    return page_faults

# Classic reference string used to demonstrate Belady's anomaly
accesses = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(simulate_lru(accesses, 3))  # Output: 10 (with 4 frames: 8)

Mental Model

Mental Model: The Library with Limited Shelves

The library (physical RAM) has limited shelf space. Patrons (processes) request books (pages). The librarian (OS) checks the shelves first; if the book isn't there, she retrieves it from the basement (disk). To make room, she puts back the least recently used book.

The library catalog (page table) tells where each book is.
A book on the shelf = page in RAM; in the basement = on swap.
Frequent trips to the basement (thrashing) mean patrons are referencing books that keep getting removed.
To avoid thrashing, ensure the total working set of active patrons fits on the shelves.

📊 Production Insight

Virtual memory breaks down when the total working set of all processes exceeds physical RAM: pages are evicted and faulted back in continuously. That's thrashing. (Exceed RAM + swap and the OOM killer steps in instead.) Use sar -B to monitor pgpgin/s, pgpgout/s and majflt/s.
Copy-on-write (COW) after fork can double memory use if the child or parent modifies many pages. If the child only execs a new program, use vfork or posix_spawn to avoid the COW setup.
Rule: Set vm.max_map_count appropriately; default (65530) is too low for large-memory processes like Elasticsearch.

🎯 Key Takeaway

Virtual memory trades speed for capacity: disk paging is 100,000x slower than RAM access.
Thrashing is the worst-case scenario: the OS is busy swapping but no process makes progress.
Rule: Monitor page fault rate, not just free RAM. If major faults > 10/second, you have a problem.
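The cost of that trade can be estimated with the textbook effective-access-time formula, EAT = (1 - p) * t_mem + p * t_fault, where p is the fraction of accesses that cause a major fault. The timings below are round illustrative assumptions (100 ns for RAM, 10 ms for a disk-backed fault), chosen to match the 100,000x ratio.

```python
RAM_NS = 100             # assumed RAM access time
FAULT_NS = 10_000_000    # assumed major-fault service time (10 ms)


def effective_access_ns(fault_probability, ram_ns=RAM_NS, fault_ns=FAULT_NS):
    """Average memory access time when a fraction of accesses page-fault."""
    return (1 - fault_probability) * ram_ns + fault_probability * fault_ns


for p in (0.0, 0.00001, 0.001):
    eat = effective_access_ns(p)
    print(f"p={p}: {eat:.1f} ns, {eat / RAM_NS:.1f}x slower than RAM")
```

Under these assumptions, one faulting access in a thousand makes memory look about a hundred times slower. That is why even a small major-fault rate is worth alerting on.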

Page Replacement Algorithms: Which Pages Get Evicted?

When physical memory is full and a page fault occurs, the OS must evict a victim page to disk. The replacement algorithm determines which page to remove. The goal is to minimize future page faults by evicting pages unlikely to be used soon.

LRU (Least Recently Used): Evict the page not accessed for the longest time. Requires hardware support (reference bits) or software approximation (e.g., Clock algorithm).

Clock (Second Chance): Use a circular list with a reference bit. Sweep through, clearing bits; if bit is already clear, evict. Efficient approximation of LRU.

Working Set Model: Estimate the set of pages a process is actively using; only keep that set in RAM. Prevents thrashing by adjusting degree of multiprogramming.

Other algorithms: FIFO (simple but suffers from Belady's anomaly), Optimal (unimplementable, used as comparison).
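The working set idea can be illustrated directly: the working set at time t is the set of distinct pages touched in the last Δ references. A minimal sketch follows; the window size and frame counts are arbitrary assumptions, and both function names are hypothetical.

```python
def working_set(references, t, window):
    """Distinct pages referenced in the `window` accesses ending at index t."""
    start = max(0, t - window + 1)
    return set(references[start:t + 1])


def would_thrash(working_set_sizes, total_frames):
    """True when the combined working sets no longer fit in physical frames."""
    return sum(working_set_sizes) > total_frames


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(working_set(refs, t=8, window=4))  # {1, 2, 5}

# Three processes needing 3, 4 and 6 frames against 10 available frames.
print(would_thrash([3, 4, 6], total_frames=10))  # True
```

When the check fails, the working set model suggests suspending a whole process rather than letting every process run short of frames.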

io/thecodeforge/memory/clock_alg.c (C)

#include <stdio.h>
#include <stdlib.h>

#define NUM_FRAMES 4

struct frame {
    int page;
    int referenced; // 0 or 1
};

int main(void) {
    // Frames 0, 1 and 3 were referenced recently; frame 2 was not
    struct frame frames[NUM_FRAMES] = {{1,1},{2,1},{3,0},{4,1}};
    int hand = 0; // clock hand
    int new_page = 5;

    // Clock algorithm: sweep until a frame with a clear reference bit is found
    while (1) {
        if (frames[hand].referenced == 0) {
            printf("Evicting page %d from frame %d\n", frames[hand].page, hand);
            frames[hand].page = new_page;
            frames[hand].referenced = 1;    // the new page was just accessed
            hand = (hand + 1) % NUM_FRAMES; // next sweep starts after the victim
            break;
        }
        frames[hand].referenced = 0; // give second chance
        hand = (hand + 1) % NUM_FRAMES;
    }
    return 0;
}

⚠ Belady's Anomaly with FIFO

FIFO replacement can produce more page faults with more frames: adding physical memory may make paging worse. This anomaly does not occur with LRU, Optimal, or other stack algorithms. Avoid FIFO in performance-critical OS code.
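The anomaly is easy to reproduce. This sketch runs a plain FIFO policy over the same reference string used in the LRU simulation earlier; `simulate_fifo` is an illustrative helper, not kernel code.

```python
from collections import deque


def simulate_fifo(pages, num_frames):
    """Count page faults under first-in, first-out replacement."""
    queue = deque()   # arrival order of resident pages
    resident = set()  # fast membership test
    faults = 0
    for page in pages:
        if page in resident:
            continue  # hit: FIFO does not reorder on access
        faults += 1
        if len(queue) >= num_frames:
            resident.discard(queue.popleft())  # evict the oldest arrival
        queue.append(page)
        resident.add(page)
    return faults


accesses = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(simulate_fifo(accesses, 3))  # 9
print(simulate_fifo(accesses, 4))  # 10
```

Going from 3 to 4 frames raises the fault count from 9 to 10. Stack algorithms such as LRU cannot behave this way, because the pages resident with n frames are always a subset of those resident with n + 1.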

📊 Production Insight

Linux uses a variant of the Clock algorithm (with active/inactive lists). The /proc/meminfo fields 'Active' and 'Inactive' reflect this.
Tuning vm.swappiness shifts reclaim between swapping out anonymous pages and dropping page cache; vm.vfs_cache_pressure controls how aggressively dentry and inode caches are reclaimed. For databases, a low swappiness keeps application memory resident.
Rule: In cloud VMs, avoid swap entirely if possible; if you need it, put it on low-latency local SSD (instance storage).

🎯 Key Takeaway

Replacement algorithms are trade-offs between accuracy and overhead.
LRU approximates the optimal policy in theory; Clock is practical in kernels.
Rule: If you design a custom cache (e.g., Redis LRU), measure hit ratio under your access pattern, because standard LRU may not suit all workloads.

Fragmentation: The Silent Performance Killer

Fragmentation is what happens when the OS's memory allocation decisions come back to bite you. Two flavors ruin your day. Internal fragmentation wastes memory inside an allocated block. Give a 10KB process a fixed 16KB partition? You just blew 6KB on nothing. External fragmentation is worse: free memory exists, but it's scattered into tiny, unusable holes. After enough allocate-free cycles, you get a Swiss cheese heap. No single free block can satisfy a large request. Your system starts thrashing, swapping pages like a gambler chasing losses. The fix? Non-contiguous schemes like paging. By breaking memory into fixed-size frames and processes into pages, the OS sidesteps external fragmentation entirely. Internal fragmentation becomes minimal — at most one partial page per process. If you're still seeing allocation failures on a box with free memory, check your fragmentation. It's the first thing I grep for in a production outage.

fragmentation_check.c (C)

// io.thecodeforge
#include <stdio.h>
#include <stdlib.h>

void check_contiguous_fragmentation(size_t *free_blocks, int count, size_t needed) {
    size_t largest = 0;
    for (int i = 0; i < count; i++) {
        if (free_blocks[i] >= needed) {
            printf("Found viable block: %zu bytes\n", free_blocks[i]);
            return;
        }
        if (free_blocks[i] > largest) largest = free_blocks[i];
    }
    printf("EXTERNAL FRAGMENTATION: Largest free block = %zu, Requested = %zu\n", largest, needed);
}

int main() {
    size_t blocks[] = {10, 4, 20, 6};  // total free = 40, but the largest single block is 20
    check_contiguous_fragmentation(blocks, 4, 32);  // 32 <= 40 total, yet no block fits
    return 0;
}

Output

EXTERNAL FRAGMENTATION: Largest free block = 20, Requested = 32

⚠ Production Trap:

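Free memory can be misleading. free() hands a block back to the allocator, not necessarily to the OS, so the heap can hold plenty of free bytes and still be unable to satisfy a large request without growing. Compare the allocator's own statistics (glibc: malloc_stats() or mallinfo2()) with the process RSS, and check /proc/buddyinfo for physical fragmentation.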

🎯 Key Takeaway

External fragmentation is why paging exists. Total free memory means nothing if no single block is big enough.

Modern Memory Allocation: jemalloc, tcmalloc, mimalloc

Traditional malloc implementations like glibc's ptmalloc2 can suffer from fragmentation and scalability issues under multi-threaded workloads. Modern allocators address these problems with per-thread caches, lock-free data structures, and optimized memory layouts.

jemalloc (used by Facebook, Redis, Firefox) reduces fragmentation by spreading threads across multiple arenas and rounding requests into a fixed set of size classes. It excels in multi-threaded environments by minimizing contention.

tcmalloc (Google's allocator) uses thread-local caches and a central heap. It batches small allocations into pages and uses a page-level free list. This reduces lock contention and improves cache locality.

mimalloc (Microsoft) focuses on free list sharding and eager page purging. It uses a compact metadata structure and a novel 'free list of free lists' approach to reduce fragmentation.

Practical Example: Consider a web server handling 1000 concurrent requests, each allocating and freeing small objects. With ptmalloc, threads that share an arena contend on its lock, and throughput drops as concurrency rises. Switching to jemalloc reduces that contention, improving throughput by roughly 30% in this scenario; measure it on your own workload.

Production Insight: When migrating to a modern allocator, benchmark with realistic workloads. For example, Redis uses jemalloc by default; switching to mimalloc improved latency by 5% in some tests. Always test with your specific allocation patterns.

allocator_benchmark.c (C)

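// Build: gcc -O2 -pthread allocator_benchmark.c
// Compare allocators by running with LD_PRELOAD set to the allocator's shared library
#include <stdlib.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NUM_THREADS 100
#define ALLOCS_PER_THREAD 100000

void* worker(void* arg) {
    (void)arg;
    for (int i = 0; i < ALLOCS_PER_THREAD; i++) {
        volatile char* p = malloc(64);
        if (p) p[0] = (char)i; // touch the block so the pair is not optimized away
        free((void*)p);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double secs = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%d threads x %d malloc/free pairs: %.3f s\n", NUM_THREADS, ALLOCS_PER_THREAD, secs);
    return 0;
}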

💡Choosing an Allocator

📊 Production Insight

In production, monitor allocation patterns with tools like perf or heaptrack. A 10% reduction in allocation latency can significantly impact tail latency in high-throughput services.

🎯 Key Takeaway

Modern allocators reduce fragmentation and contention, improving performance in multi-threaded applications.

Huge Pages and Transparent Huge Pages in Linux

Huge pages reduce TLB misses by mapping large contiguous memory regions with fewer page table entries. Standard 4KB pages can cause TLB thrashing for workloads with large working sets (e.g., databases, VMs). Linux supports 2MB and 1GB huge pages.

Explicit Huge Pages: Reserved at boot or via /proc/sys/vm/nr_hugepages. Applications use mmap with MAP_HUGETLB or hugetlbfs. This guarantees huge pages but requires manual management.

Transparent Huge Pages (THP): Automatically promotes eligible 4KB pages to 2MB huge pages. Enabled by default on many distributions. However, THP can cause latency spikes due to compaction (memory defragmentation) and increased memory usage.

Practical Example: A Redis instance with 50GB dataset on a 64GB machine. Without huge pages, TLB misses cause 5% CPU overhead. Enabling THP reduces TLB misses by 80%, but compaction pauses increase latency by 2ms. Using explicit huge pages avoids compaction but requires memory reservation.

Production Insight: For latency-sensitive applications, disable THP (echo never > /sys/kernel/mm/transparent_hugepage/enabled) and use explicit huge pages. For throughput-oriented workloads, THP can be beneficial. Monitor /proc/meminfo for HugePages_Total and HugePages_Free.
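Checking those counters by hand gets tedious, so here is a small parsing sketch. It works on text in the /proc/meminfo format; the sample values are invented for illustration, and `parse_meminfo` is a hypothetical helper rather than a library function.

```python
def parse_meminfo(text):
    """Parse '/proc/meminfo'-style text into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])  # kB where a unit is shown
    return fields


# Invented sample; on a Linux host use open("/proc/meminfo").read() instead.
SAMPLE = """\
HugePages_Total:     512
HugePages_Free:      128
Hugepagesize:       2048 kB
"""

info = parse_meminfo(SAMPLE)
in_use = info["HugePages_Total"] - info["HugePages_Free"]
print(in_use, "huge pages in use,", in_use * info["Hugepagesize"] // 1024, "MiB")
```

If HugePages_Free stays equal to HugePages_Total while the application is running, the reservation is not being used and is simply taking RAM away from everything else.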

hugepage_setup.sh (Bash)

# Reserve 512 2MB huge pages (1GB total); requires root
echo 512 > /proc/sys/vm/nr_hugepages

# Check reservation
grep HugePages /proc/meminfo

# Mount hugetlbfs
mkdir -p /mnt/huge
mount -t hugetlbfs hugetlbfs /mnt/huge

# Use in a C program (not shell): mmap with the MAP_HUGETLB flag
#   void* ptr = mmap(NULL, 2*1024*1024, PROT_READ|PROT_WRITE,
#                    MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0);

⚠ THP Compaction Overhead

📊 Production Insight

In production, benchmark with and without THP. For databases like PostgreSQL, explicit huge pages often yield more predictable performance.

🎯 Key Takeaway

Huge pages reduce TLB misses and improve performance for large-memory workloads, but THP can introduce latency variability.

Memory Overcommit and OOM Killer

Linux overcommits memory by default: it allows processes to allocate more virtual memory than physical RAM + swap. This relies on the fact that applications rarely use all allocated memory. However, when actual memory usage exceeds capacity, the Out-Of-Memory (OOM) Killer terminates processes to free memory.

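Overcommit Modes (vm.overcommit_memory):
- 0 (heuristic overcommit): The kernel refuses only obviously excessive allocations. This is the default.
- 1 (always overcommit): Never refuse malloc.
- 2 (no overcommit): Fail if the allocation would push committed memory past the commit limit (swap + overcommit_ratio% of RAM).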

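OOM Killer Selection: Based on oom_score (badness), which on current kernels is driven mainly by the process's memory footprint (resident memory, swap, and page tables), shifted by its oom_score_adj; older kernels also weighed runtime and root privileges. The process with the highest score is killed.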

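Practical Example: A Java application with -Xmx80G on a 64GB machine. With overcommit enabled, the JVM starts successfully. When the application actually uses 70GB and swap is full, the OOM Killer steps in and kills the process with the highest oom_score; that is usually the JVM itself, but it can be another large process you care about. To avoid this, set vm.overcommit_memory=2 and vm.overcommit_ratio=50 (commit limit = swap + 50% of RAM), so oversized allocations fail up front instead.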

Production Insight: For critical services, disable overcommit (vm.overcommit_memory=2) and set vm.overcommit_ratio appropriately. Monitor /proc/meminfo for Committed_AS (total committed memory). Use cgroups to limit memory per process and avoid OOM killing unrelated services.
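Before switching to strict mode it helps to work out what the commit limit will be. The sketch below follows the formula in the kernel's overcommit documentation: swap plus overcommit_ratio percent of the RAM not reserved for huge pages. The machine sizes are illustrative and the function names are hypothetical.

```python
def commit_limit_gb(ram_gb, swap_gb, overcommit_ratio, hugetlb_gb=0):
    """CommitLimit under vm.overcommit_memory=2, in GB."""
    return (ram_gb - hugetlb_gb) * overcommit_ratio / 100 + swap_gb


def allocation_refused(committed_gb, request_gb, limit_gb):
    """True if a new request would push Committed_AS past the limit."""
    return committed_gb + request_gb > limit_gb


# Assumed machine: 64 GB RAM, 8 GB swap.
print(commit_limit_gb(64, 8, 50))   # 40.0
print(commit_limit_gb(64, 8, 100))  # 72.0

# With 30 GB already committed, a 12 GB request fails at ratio 50.
print(allocation_refused(30, 12, commit_limit_gb(64, 8, 50)))  # True
```

With the default ratio of 50, strict mode on this machine would cap commitments well below installed RAM, so the ratio usually needs raising on hosts with little or no swap.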

oom_config.sh (Bash)

# Disable overcommit (strict mode); requires root
echo 2 > /proc/sys/vm/overcommit_memory

# Commit limit becomes swap + 50% of RAM
echo 50 > /proc/sys/vm/overcommit_ratio

# Check committed memory against the limit
grep -E 'CommitLimit|Committed_AS' /proc/meminfo

# Adjust OOM score for a process (lower = less likely to be killed; -1000 exempts it)
# 1234 is a placeholder PID
echo -1000 > /proc/1234/oom_score_adj

🔥OOM Killer Tuning

📊 Production Insight

In production, use cgroups memory limits and swap accounting to prevent OOM killer from affecting unrelated processes. Always test under peak load.

🎯 Key Takeaway

Memory overcommit allows efficient memory utilization but risks OOM killing; strict mode prevents overcommit at the cost of rejecting large allocations.

● Production incidentPOST-MORTEMseverity: high

The Thrashed Production Server — When the OS Spends More Time Swapping Than Working

Symptom

Database queries that normally run in 200ms suddenly take 4+ minutes. CPU is near 100% and system load is through the roof. Disk I/O saturates with swap activity. dmesg shows 'out of memory' messages even though 20 GB RAM is free.

Assumption

The team assumed leftover memory meant the system wasn't memory-constrained. They thought the issue was a slow query and started indexing, not realizing the real bottleneck was page thrashing from a process that kept requesting more memory than physical RAM.

Root cause

One analytic process (a misconfigured Spark job) requested a 120 GB working set on a 64 GB machine. The OS's virtual memory manager started swapping out other processes' pages to make room. Every access to those pages caused a page fault, reading them back from disk. With enough concurrent pressure, the system spent 85% of its time on page fault handling and only 15% on actual work — classic thrashing.

Fix

1) Kill the runaway Spark job (immediate recovery).
2) Set ulimit -v and memory cgroups to cap per-process memory.
3) Configure vm.swappiness to 10 on database servers so the kernel prefers reclaiming page cache over swapping out application pages.
4) Add memory.soft_limit_in_bytes (cgroup v1) to the batch jobs' cgroup to deprioritize them before they cause thrashing.

Key lesson

Thrashing happens when total working set exceeds physical RAM, not when RAM is 'full' — always monitor page fault rates (ps -eo min_flt,maj_flt)
Use cgroups and ulimits to prevent one rogue process from starving the system
Set vm.swappiness low (1-10) on latency-sensitive servers; never let the OS swap application pages

Production debug guide: symptom-based steps to diagnose OS-level memory issues

Symptom · 01

Process crashes with OOM (Out-of-Memory) killer

→

Fix

Check dmesg | grep -i 'killed process'. Identify the process and its memory usage before death. Increase memory limit or fix leak.

Symptom · 02

High system load but low CPU usage

→

Fix

Run vmstat 1 and look at 'si' (swap in) and 'so' (swap out). High values indicate swapping/thrashing. Check /proc/meminfo for Committed_AS vs CommitLimit.

Symptom · 03

Process feels slow intermittently

→

Fix

Check /proc/<pid>/status for VmPeak, VmRSS, VmSwap. Run perf stat -e page-faults to count major+minor page faults. High major faults (majflt) mean disk reads.

Symptom · 04

Memory fragmentation — cannot allocate large contiguous blocks

→

Fix

Check /proc/buddyinfo for fragmentation. On 64-bit systems this is less common for user space. For kernel: echo 1 > /proc/sys/vm/compact_memory to trigger compaction.
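For the major-fault checks in this guide, a rate is more useful than a cumulative count. The sketch below derives one from two snapshots of /proc/vmstat-style text using the pgmajfault counter. The snapshot values are invented, and both helpers are hypothetical names.

```python
def read_counter(vmstat_text, name):
    """Return the integer value of one 'name value' line from /proc/vmstat text."""
    for line in vmstat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == name:
            return int(value)
    raise KeyError(name)


def major_fault_rate(before, after, interval_s):
    """Major page faults per second between two snapshots."""
    delta = read_counter(after, "pgmajfault") - read_counter(before, "pgmajfault")
    return delta / interval_s


# Invented snapshots taken 5 seconds apart; on Linux read /proc/vmstat twice.
BEFORE = "pgfault 1000\npgmajfault 40\n"
AFTER = "pgfault 9000\npgmajfault 190\n"

print(major_fault_rate(BEFORE, AFTER, interval_s=5))  # 30.0
```

A sustained rate in the tens per second, as in this made-up sample, would point at swap or file reads on the hot path rather than at CPU-bound code.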

★ Quick Memory Debug Cheat Sheet: commands to diagnose OS memory issues fast. Every one works on Linux 5.x+ and most cloud kernels.

App killed by OOM killer

Immediate action

Check dmesg for 'Out of memory' and 'Killed process'

Commands

dmesg | grep -i oom | tail -5

cat /proc/meminfo | grep -E '^MemTotal|^MemFree|^Cached'

Fix now

Set vm.overcommit_memory=2 (never overcommit) and put the service under a memory cgroup limit

Memory Management Concepts Comparison

Concept	Level	Granularity	Fragmentation Type	Performance Implication
Fixed Partitions	OS	Process-sized	Internal	Simple but wastes memory
Paging	OS/HW	Page (4KB)	Internal	TLB misses on access
Segmentation	OS/HW	Variable segments	External	Context switch saves segment registers
Heap Allocation (mmap)	User space	Multiple of page	Internal, external	Syscall overhead on mmap
Buddy Allocator	Kernel	Power-of-2 blocks	Internal	Fast, but merges are O(log n)

⚙ Quick Reference

9 commands from this guide

File	Command / Code	Purpose
io/thecodeforge/memory/addr_space.c	int main() {	What is Memory Management in OS?
io/thecodeforge/memory/page_table_sim.c	struct page_table_entry {	Memory Allocation Strategies
io/thecodeforge/memory/page_fault_measure.c	int main() {	Paging
io/thecodeforge/memory/lru_sim.py	from collections import OrderedDict	Virtual Memory
io/thecodeforge/memory/clock_alg.c	struct frame {	Page Replacement Algorithms
fragmentation_check.c	void check_contiguous_fragmentation(size_t *free_blocks, int count, size_t neede...	Fragmentation
allocator_benchmark.c	void* worker(void* arg) {	Modern Memory Allocation
hugepage_setup.sh	echo 512 > /proc/sys/vm/nr_hugepages	Huge Pages and Transparent Huge Pages in Linux
oom_config.sh	echo 2 > /proc/sys/vm/overcommit_memory	Memory Overcommit and OOM Killer

Key takeaways

Memory management creates a virtual address space per process, mapped to physical frames via page tables.

Paging eliminates external fragmentation but introduces TLB pressure and page fault latency.

Virtual memory allows overcommitment; the OOM killer is the last line of defense: always set cgroup limits.

Thrashing happens when total working set exceeds RAM; monitor page fault rates, not free memory.

Choose replacement algorithms based on workload: LRU approximates optimal, Clock is practical for kernels.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Explain how virtual memory works. What happens when a process accesses a...

Q02 · JUNIOR

What is the difference between a page fault and a segmentation fault?

Q03 · SENIOR

How does the operating system allocate memory to a process? Describe the...

Q04 · SENIOR

What is the Copy-on-Write mechanism? How does it affect memory usage aft...

Q05 · SENIOR

Name and explain the three page replacement algorithms. Which one does L...

Q01 of 05 · SENIOR

Explain how virtual memory works. What happens when a process accesses a page not in RAM?

ANSWER

Virtual memory maps virtual addresses to physical addresses via the page table. The Translation Lookaside Buffer (TLB) caches recent translations. On a page fault (invalid page table entry), the OS takes control:
1. Hardware raises a page fault exception.
2. OS checks if the virtual address is valid (e.g., part of the process address space).
3. If valid, the OS loads the page from disk (swap or file) into a free frame.
4. It updates the page table with the new entry.
5. The instruction that caused the fault is retried.
If no free frame exists, the OS must evict a page (using LRU/Clock). The process is blocked during I/O. If the sum of working sets exceeds physical RAM, thrashing occurs.
