Mid-level 6 min · March 06, 2026

Linux Disk Management — The /dev/sdX Reboot Trap

A database crashed post-reboot because /dev/sdb pointed to an empty backup.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Linux storage layers: raw disk → partition → filesystem → mount point
  • LVM adds a flexible abstraction layer between disks and filesystems
  • Always use UUIDs in /etc/fstab — device names change between boots
  • ext4 is the safe default; xfs excels with large files (single file up to 8 EiB)
  • df -h shows block usage; df -i reveals silent inode exhaustion
  • Inode exhaustion: you'll get 'No space left' despite free space on disk
Plain-English First

Think of your physical hard drive like a giant empty warehouse. Before you can store anything useful, you need to divide it into rooms (partitions), decide what kind of shelving system each room uses (filesystem), and then hang a sign on the door so people can find it (mounting). LVM is like hiring a warehouse manager who can knock down walls and resize rooms on the fly without moving all your boxes. Linux disk management is just you being that warehouse architect.

Every production outage I've ever seen that started with 'disk' in the alert was caused by someone who treated storage as an afterthought. A full root partition kills web servers, a misconfigured filesystem destroys databases, and a missing mount point in /etc/fstab means your server reboots into chaos at 3 AM. Storage management isn't glamorous, but it is the difference between a system that hums along and one that pages you on a Friday night.

The problem is that most tutorials show you the commands and stop there. They'll tell you to run mkfs.ext4 without explaining that formatting is irreversible and takes seconds. They'll show you mount without mentioning it evaporates on reboot unless you wire it into /etc/fstab. The gap between 'ran the command in a tutorial' and 'confidently managing storage on a live server' is exactly where people get hurt.

By the end of this article you'll know how to inspect a disk from scratch, partition it intentionally, format it with the right filesystem for your workload, mount it persistently, and use LVM to manage storage dynamically when your needs change. These are the skills you actually need on the job — not just for passing an exam.

Inspecting What You Have — Reading the Disk Landscape Before Touching Anything

The first rule of storage management is: never run a destructive command on a disk you haven't fully inspected. This sounds obvious, but under pressure people confuse /dev/sda with /dev/sdb and wipe the wrong drive. It happens more than anyone admits.

lsblk is your safest starting point. It reads block device info from sysfs without touching the disk itself — no risk, no side effects. It shows you the full device tree: physical drives, their partitions, and any logical volumes sitting on top. fdisk -l goes deeper, showing partition types, sizes, and sector alignment, but it requires root.

df -h tells you about mounted filesystems — what's actually in use right now. Note the difference: lsblk shows you everything attached to the system, df -h shows only what's mounted and accessible. A disk can exist on lsblk and be completely invisible to df -h if nobody's mounted it yet. Understanding this distinction stops a whole class of 'where did my disk go?' confusion.
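
The gap between "attached" and "mounted" can be checked mechanically. Here is a minimal sketch that filters `lsblk -P` output (key="value" pairs) down to devices that carry a filesystem but have no mount point. The helper name `unmounted_filesystems` is made up for illustration, and the sample input is hand-written rather than captured from a real host.

```bash
# List devices that carry a filesystem signature but are not mounted anywhere.
# Reads lines in `lsblk -P -o NAME,FSTYPE,MOUNTPOINT` format on stdin.
# unmounted_filesystems is an illustrative helper name, not a standard tool.
unmounted_filesystems() {
    # 1) keep devices with a non-empty FSTYPE
    # 2) drop LVM physical volumes and swap, which are never mounted directly
    # 3) keep entries with an empty MOUNTPOINT
    # 4) print only the device name
    grep -E 'FSTYPE="[^"]+"' |
        grep -Ev 'FSTYPE="(LVM2_member|swap)"' |
        grep 'MOUNTPOINT=""' |
        sed -E 's/^NAME="([^"]*)".*/\1/'
}

# Demo on hand-written sample input.
unmounted_filesystems <<'EOF'
NAME="sda" FSTYPE="" MOUNTPOINT=""
NAME="sda2" FSTYPE="ext4" MOUNTPOINT="/boot"
NAME="sda3" FSTYPE="LVM2_member" MOUNTPOINT=""
NAME="sdc1" FSTYPE="xfs" MOUNTPOINT=""
EOF
```

The demo prints `sdc1`. On a real host you would feed it live data with `lsblk -P -o NAME,FSTYPE,MOUNTPOINT | unmounted_filesystems`.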

The UUID shown in blkid is critical: always use UUIDs in /etc/fstab, not device names like /dev/sdb1. Device names are assigned at boot time and can change if you add or remove hardware. UUIDs are stored in the filesystem's own metadata, so they stay the same until you reformat, no matter which device name the kernel hands out.
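
To see the UUID-to-device mapping for the current boot, you can follow the symlinks udev keeps under /dev/disk/by-uuid. A small sketch, assuming GNU `readlink -f`; the helper name `device_for_uuid` and its optional second argument (a directory override, there only so the lookup can be exercised without real disks) are hypothetical.

```bash
# Resolve a filesystem UUID to whatever kernel name the device has on this boot.
# device_for_uuid is an illustrative helper; the optional second argument only
# exists so the lookup directory can be swapped out for testing.
device_for_uuid() {
    uuid=$1
    dir=${2:-/dev/disk/by-uuid}
    # udev maintains one symlink per filesystem UUID in this directory
    [ -e "$dir/$uuid" ] || { echo "no device with UUID $uuid" >&2; return 1; }
    readlink -f "$dir/$uuid"
}
```

Called with the UUID of a backup filesystem, this might print /dev/sdb1 today and /dev/sdc1 after a hardware change. The UUID is the part that does not move.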

inspect_disk_landscape.sh (Bash)
#!/bin/bash
# inspect_disk_landscape.sh
# Safe read-only commands to fully understand your storage before making any changes.
# Run as root (or with sudo) for full output.

echo "=== BLOCK DEVICE TREE (lsblk) ==="
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT,UUID

echo ""
echo "=== PARTITION DETAILS (fdisk) ==="
sudo fdisk -l /dev/sda

echo ""
echo "=== MOUNTED FILESYSTEM USAGE (df) ==="
df -h --output=source,size,used,avail,pcent,target

echo ""
echo "=== FILESYSTEM UUIDs (blkid) ==="
sudo blkid
Output
=== BLOCK DEVICE TREE (lsblk) ===
NAME SIZE TYPE FSTYPE MOUNTPOINT UUID
sda 100G disk
├─sda1 512M part vfat /boot/efi A1B2-C3D4
├─sda2 1G part ext4 /boot a1b2c3d4-1111-2222-3333-aabbccddeeff
└─sda3 98.5G part LVM2_member b2c3d4e5-2222-3333-4444-bbccddeeff00
  ├─vg0-root 20G lvm ext4 / c3d4e5f6-3333-4444-5555-ccddeeff0011
  ├─vg0-home 40G lvm ext4 /home d4e5f6a7-4444-5555-6666-ddeeff001122
  └─vg0-data 38G lvm xfs /data e5f6a7b8-5555-6666-7777-eeff00112233
sdb 500G disk
└─sdb1 500G part xfs /mnt/backups f6a7b8c9-6666-7777-8888-ff0011223344
=== PARTITION DETAILS (fdisk) ===
Disk /dev/sda: 100 GiB, 107374182400 bytes, 209715200 sectors
Disk model: Virtual Disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Device Start End Sectors Size Type
/dev/sda1 2048 1050623 1048576 512M EFI System
/dev/sda2 1050624 3147775 2097152 1G Linux filesystem
/dev/sda3 3147776 209715166 206567391 98.5G Linux LVM
=== MOUNTED FILESYSTEM USAGE (df) ===
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg0-root 20G 8.1G 10.6G 44% /
/dev/sda2 976M 201M 708M 23% /boot
/dev/sda1 511M 5.2M 506M 2% /boot/efi
/dev/mapper/vg0-home 40G 15G 23G 39% /home
/dev/mapper/vg0-data 38G 22G 14G 59% /data
/dev/sdb1 500G 87G 413G 18% /mnt/backups
=== FILESYSTEM UUIDs (blkid) ===
/dev/sda1: UUID="A1B2-C3D4" TYPE="vfat" PARTUUID="..."
/dev/sda2: UUID="a1b2c3d4-1111-2222-3333-aabbccddeeff" TYPE="ext4"
/dev/sda3: UUID="b2c3d4e5-2222-3333-4444-bbccddeeff00" TYPE="LVM2_member"
/dev/mapper/vg0-root: UUID="c3d4e5f6-3333-4444-5555-ccddeeff0011" TYPE="ext4"
/dev/mapper/vg0-data: UUID="e5f6a7b8-5555-6666-7777-eeff00112233" TYPE="xfs"
Watch Out: /dev/sdX Names Are Not Stable
The kernel assigns /dev/sda, /dev/sdb etc. based on the order it discovers drives at boot. Add a new disk, change a SATA port, or move to a different hypervisor and /dev/sdb can become /dev/sdc overnight. Always reference disks by UUID in fstab and scripts. Use sudo blkid | grep UUID to grab the stable identifier before you write anything to fstab.
Production Insight
Device name instability is a classic cause of storage-related boot failures.
A single line using /dev/sdb1 in fstab can cause your production database to mount a backup volume after a hardware change.
Rule: never put /dev/sdX in fstab. Ever.
Use UUID or filesystem label instead.
Key Takeaway
Inspect with lsblk and blkid before any write.
Never use /dev/sdX in fstab — always use UUID.
The difference between lsblk and df -h is often the source of 'missing disk' confusion.
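
The "never /dev/sdX in fstab" rule is easy to check automatically. Here is a minimal awk-based sketch; the helper name `unstable_fstab_entries` is hypothetical, and the device-name pattern (sd, hd, vd, xvd, nvme) is an assumption about which kernel names you are likely to meet, not a complete list.

```bash
# Print fstab entries whose first field is a kernel-assigned device name.
# unstable_fstab_entries is an illustrative helper name.
# Defaults to /etc/fstab; pass another path to check a copy.
unstable_fstab_entries() {
    awk '
        $1 ~ /^#/ || NF == 0 { next }   # skip comments and blank lines
        $1 ~ /^\/dev\/((sd|hd|vd|xvd)[a-z]+[0-9]*|nvme[0-9]+n[0-9]+(p[0-9]+)?)$/ {
            print NR ": " $0            # line number plus the offending entry
        }
    ' "${1:-/etc/fstab}"
}
```

Entries that start with UUID=, LABEL= or /dev/mapper/ pass silently. Anything printed is worth converting before the next reboot.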
When to Use Which Inspection Tool
If: Need to see all block devices (including unmounted)?
Use: lsblk
If: Need partition table details (sectors, type)?
Use: fdisk -l
If: Need to see only mounted filesystems and usage?
Use: df -h
If: Need stable filesystem identifiers for fstab?
Use: blkid

Partitioning, Formatting and Mounting — Preparing a New Disk From Scratch

When a fresh disk arrives — whether it's a new SSD in a bare-metal server or a new EBS volume attached to an EC2 instance — it's a blank slate. No partition table, no filesystem, no mount point. Before any application can write data to it, you need to walk through three distinct steps: partition, format, mount.

Partitioning defines the logical boundaries on the disk. gdisk writes GPT tables; fdisk is the traditional MBR tool (current util-linux versions of fdisk handle GPT as well). For any disk over 2 TiB or any UEFI system, use GPT, because MBR cannot address more than 2 TiB with 512-byte sectors. For older systems or VMs where you know it's MBR, fdisk is fine. The partition table is just metadata that tells the OS where one region ends and another begins.
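
The 2 TiB limit comes straight from arithmetic: MBR stores sector counts in 32 bits, so with 512-byte sectors it can address 2^32 × 512 = 2,199,023,255,552 bytes. The sketch below turns that into a size check. The helper name `partition_table_for` is hypothetical, and it deliberately ignores the UEFI half of the rule.

```bash
# Suggest a partition table type from the disk size in bytes.
# MBR stores sector counts in 32 bits, so with 512-byte sectors it tops out at
# 2^32 * 512 = 2199023255552 bytes (2 TiB). Anything larger needs GPT.
# partition_table_for is an illustrative helper name.
partition_table_for() {
    size_bytes=$1
    mbr_limit=$((4294967296 * 512))
    if [ "$size_bytes" -gt "$mbr_limit" ]; then
        echo "gpt"
    else
        echo "gpt-or-mbr"
    fi
}

partition_table_for 107374182400     # a 100 GiB disk
partition_table_for 4398046511104    # a 4 TiB disk
```

The two demo calls print `gpt-or-mbr` and `gpt`. On a real host the size can come from `sudo blockdev --getsize64 /dev/sdb`.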

Formatting writes a filesystem into that partition. ext4 is the safe, well-understood default for general-purpose workloads — it has journaling, solid fsck tooling, and decades of battle testing. xfs is better for large files and high-throughput workloads (think log aggregation, big data). Don't overthink it for most use cases: ext4 unless you have a specific reason.

Mounting connects the formatted partition to a directory in the filesystem tree. The mount command does it immediately, but it vanishes on reboot. The /etc/fstab file makes it permanent. Every mounted filesystem you care about needs an entry there.

partition_format_mount.sh (Bash)
#!/bin/bash
# partition_format_mount.sh
# Full walkthrough: take a raw disk (/dev/sdb) and make it usable.
# WARNING: This DESTROYS all data on /dev/sdb. Verify the device name first.
# Prerequisites: run as root. Confirm target disk with: lsblk | grep sdb

set -euo pipefail                # Abort on the first failed step so a bad partition never reaches mkfs or fstab

TARGET_DISK="/dev/sdb"           # The raw disk we're preparing
PARTITION="/dev/sdb1"            # The partition we'll create
MOUNT_DIR="/mnt/appdata"         # Where we'll attach it in the filesystem tree
FS_LABEL="appdata-vol"           # Human-readable label (helpful in logs and blkid)

# --- STEP 1: PARTITION THE DISK ---
echo "Creating GPT partition table on ${TARGET_DISK}..."
sudo gdisk ${TARGET_DISK} <<EOF
n
1


8300
w
yes
EOF

sudo partprobe ${TARGET_DISK}
sleep 2
echo "Partition layout after gdisk:"
lsblk ${TARGET_DISK}

# --- STEP 2: FORMAT THE PARTITION ---
echo "Formatting ${PARTITION} as ext4..."
sudo mkfs.ext4 -L ${FS_LABEL} ${PARTITION}

# --- STEP 3: CREATE MOUNT POINT ---
sudo mkdir -p ${MOUNT_DIR}

# --- STEP 4: MOUNT TEMPORARILY (to verify it works) ---
sudo mount ${PARTITION} ${MOUNT_DIR}
echo "Temporary mount successful. Testing write access..."
echo "storage_test" | sudo tee ${MOUNT_DIR}/write_test.txt > /dev/null

# --- STEP 5: GET UUID FOR FSTAB ---
DISK_UUID=$(sudo blkid -s UUID -o value ${PARTITION})
echo "UUID for ${PARTITION}: ${DISK_UUID}"

# --- STEP 6: ADD TO /etc/fstab FOR PERSISTENT MOUNTING ---
sudo cp /etc/fstab /etc/fstab.backup.$(date +%Y%m%d_%H%M%S)
echo "UUID=${DISK_UUID}  ${MOUNT_DIR}  ext4  defaults,nofail  0  2" | sudo tee -a /etc/fstab

sudo mount -a && echo "fstab validation passed — all entries mounted successfully."
df -h ${MOUNT_DIR}
Output
Creating GPT partition table on /dev/sdb...
Partition layout after gdisk:
NAME SIZE TYPE FSTYPE MOUNTPOINT
sdb 200G disk
└─sdb1 200G part
Formatting /dev/sdb1 as ext4...
mke2fs 1.46.5 (30-Dec-2021)
Creating filesystem with 52428800 4k blocks and 13107200 inodes
Filesystem UUID: f7a8b9c0-7777-8888-9999-001122334455
Superblock backups stored on blocks: 32768, 98304, 163840 ...
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
Temporary mount successful. Testing write access...
UUID for /dev/sdb1: f7a8b9c0-7777-8888-9999-001122334455
fstab validation passed — all entries mounted successfully.
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 197G 28K 187G 1% /mnt/appdata
Pro Tip: Always Use 'nofail' in fstab for Non-Root Disks
Without 'nofail', if a secondary disk fails to appear at boot (detached EBS volume, failed SAN mount, pulled SATA cable), the entire system drops into emergency mode and requires console access to fix. Add 'nofail' to every non-root fstab entry and your system keeps booting even when storage is misbehaving. On cloud instances, this is non-negotiable.
Production Insight
A missing 'nofail' on a cloud instance's ephemeral disk can cause a full boot failure.
I've seen production web servers stay down for hours because a transient EBS attachment issue pulled the whole server into emergency mode.
Rule: all non-root fstab entries get 'nofail'. Every time.
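
That rule can be audited with a few lines of awk. This is a sketch under the article's own rule (every non-root entry gets nofail), with swap skipped because it is not a mount point; the helper name `missing_nofail_entries` is hypothetical.

```bash
# Print the mount points of non-root fstab entries that lack the nofail option.
# missing_nofail_entries is an illustrative helper name.
# Defaults to /etc/fstab; pass another path to check a copy.
missing_nofail_entries() {
    awk '
        $1 ~ /^#/ || NF < 4 { next }          # skip comments, blanks, malformed lines
        $2 == "/" || $3 == "swap" { next }    # root must mount; swap is not a mount point
        ("," $4 ",") !~ /,nofail,/ { print $2 }
    ' "${1:-/etc/fstab}"
}
```

Wrapping the options field in commas before matching means an option such as `x-nofailover` would not be mistaken for `nofail`.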
Key Takeaway
Partition, format, mount.
Use GPT for modern systems, ext4 for general use.
Always add fstab entry with UUID and nofail.
Never trust /dev/sdX names for persistence.
Partition Table Choice
If: Disk size > 2TB or UEFI boot?
Use: GPT (gdisk)
If: Legacy BIOS, disk < 2TB, no UEFI?
Use: MBR (fdisk)
If: Need to reuse space from an existing partition?
Use: LVM (avoid repartitioning)

LVM — Dynamic Storage That Grows With Your Application

Here's the problem with raw partitions: they're static. You create a 50GB partition for your database, the database grows to 48GB, and now you're racing against time. Your only options are to resize the partition (risky, requires unmounting on most filesystems) or provision a new disk and move data. Neither is fun at 2 AM.

LVM — Logical Volume Manager — solves this by adding an abstraction layer between physical disks and the filesystems sitting on them. Instead of your filesystem sitting directly on /dev/sdb1, it sits on a logical volume that can be expanded by simply adding more physical storage to the underlying pool, called a Volume Group.

The mental model has three layers. Physical Volumes (PVs) are the raw disks or partitions you hand to LVM. A Volume Group (VG) is the pool — LVM combines all your PVs into one big storage bucket. Logical Volumes (LVs) are carved out of that pool and behave like normal partitions from the filesystem's perspective. The magic is that you can extend an LV while it's live and mounted, without unmounting or stopping the application.
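
Under the hood, a Volume Group hands out space in fixed-size physical extents (4 MiB by default), and every LV is a whole number of them. A quick arithmetic sketch; the helper name `extents_for` is hypothetical.

```bash
# Number of physical extents an LV of a given size occupies.
# LVM rounds LV sizes up to a whole number of extents.
# extents_for is an illustrative helper name.
extents_for() {
    size_mib=$1
    pe_mib=${2:-4}    # 4 MiB is the LVM default extent size
    echo $(( (size_mib + pe_mib - 1) / pe_mib ))
}

extents_for $((30 * 1024))    # a 30 GiB LV
extents_for 10 4              # a 10 MiB request rounds up to whole extents
```

The demo calls print `7680` and `3`. This is why tools like vgdisplay report free space both in extents and in bytes.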

This is why so many production Linux servers use LVM for everything except /boot. It's not complexity for its own sake; it's the ability to respond to storage demands without downtime.

lvm_setup_and_extend.sh (Bash)
#!/bin/bash
# lvm_setup_and_extend.sh
# Demonstrates: creating an LVM stack from scratch AND extending a full volume live.
# Scenario: web app data volume (/dev/sdc) is full. We add a new disk (/dev/sdd)
# and expand the logical volume online — zero downtime.

# ============================================================
# PART 1: BUILD AN LVM STACK ON A FRESH DISK
# ============================================================

NEW_DISK="/dev/sdc"
VOLUME_GROUP="webdata_vg"
LOGICAL_VOLUME="webapp_lv"
LV_SIZE="30G"
MOUNT_POINT="/var/www/appdata"

echo "=== STEP 1: Create Physical Volume ==="
sudo pvcreate ${NEW_DISK}
sudo pvdisplay ${NEW_DISK}

echo "=== STEP 2: Create Volume Group ==="
sudo vgcreate ${VOLUME_GROUP} ${NEW_DISK}
sudo vgdisplay ${VOLUME_GROUP}

echo "=== STEP 3: Create Logical Volume ==="
sudo lvcreate -L ${LV_SIZE} -n ${LOGICAL_VOLUME} ${VOLUME_GROUP}
echo "LV device path: /dev/mapper/${VOLUME_GROUP}-${LOGICAL_VOLUME}"

echo "=== STEP 4: Format and Mount the Logical Volume ==="
sudo mkfs.ext4 -L webapp-data /dev/mapper/${VOLUME_GROUP}-${LOGICAL_VOLUME}
sudo mkdir -p ${MOUNT_POINT}
sudo mount /dev/mapper/${VOLUME_GROUP}-${LOGICAL_VOLUME} ${MOUNT_POINT}

DEVICE_PATH="/dev/mapper/${VOLUME_GROUP}-${LOGICAL_VOLUME}"
echo "${DEVICE_PATH}  ${MOUNT_POINT}  ext4  defaults,nofail  0  2" | sudo tee -a /etc/fstab
df -h ${MOUNT_POINT}

# ============================================================
# PART 2: EXTENDING THE VOLUME ONLINE (ZERO DOWNTIME)
# ============================================================

EXTRA_DISK="/dev/sdd"
EXTEND_BY="+50G"

echo "=== EXTEND: Add new disk to the VG pool ==="
sudo pvcreate ${EXTRA_DISK}
sudo vgextend ${VOLUME_GROUP} ${EXTRA_DISK}
sudo vgs ${VOLUME_GROUP}

echo "=== EXTEND: Grow the logical volume ==="
sudo lvextend -L ${EXTEND_BY} -r /dev/mapper/${VOLUME_GROUP}-${LOGICAL_VOLUME}
df -h ${MOUNT_POINT}
Output
=== STEP 1: Create Physical Volume ===
Physical volume "/dev/sdc" successfully created.
"/dev/sdc" is a new physical volume of "100.00 GiB"
--- NEW Physical volume ---
PV Name /dev/sdc
VG Name
PV Size 100.00 GiB
Allocatable NO
PE Size 0
Total PE 0
Free PE 0
=== STEP 2: Create Volume Group ===
Volume group "webdata_vg" successfully created
--- Volume group ---
VG Name webdata_vg
VG Size <100.00 GiB
PE Size 4.00 MiB
Total PE 25599
Free PE / Size 25599 / <100.00 GiB
=== STEP 3: Create Logical Volume ===
Logical volume "webapp_lv" created.
LV device path: /dev/mapper/webdata_vg-webapp_lv
=== STEP 4: Format and Mount the Logical Volume ===
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/webdata_vg-webapp_lv 30G 24K 28G 1% /var/www/appdata
=== EXTEND: Add new disk to the VG pool ===
Physical volume "/dev/sdd" successfully created.
Volume group "webdata_vg" successfully extended
VG #PV #LV #SN Attr VSize VFree
webdata_vg 2 1 0 wz--n- 199.99g 169.99g
=== EXTEND: Grow the logical volume ===
Size of logical volume webdata_vg/webapp_lv changed from 30.00 GiB to 80.00 GiB.
Logical volume webdata_vg/webapp_lv successfully resized.
resize2fs 1.46.5
Resizing the filesystem on /dev/mapper/webdata_vg-webapp_lv to 20971520 (4k) blocks.
The filesystem on /dev/mapper/webdata_vg-webapp_lv is now 20971520 (4k) blocks long.
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/webdata_vg-webapp_lv 79G 24K 75G 1% /var/www/appdata
Interview Gold: LVM Extend vs Resize
lvextend grows the logical volume block device. Without the -r flag, the filesystem inside the LV doesn't know it has more space — df -h will still show the old size. The -r flag runs resize2fs (for ext4) or xfs_growfs (for xfs) automatically. If you forget -r, run 'sudo resize2fs /dev/mapper/vg-lv' manually afterward. Interviewers love asking why df shows the old size after lvextend.
Production Insight
Forgetting -r on lvextend is one of the most common LVM mistakes in production.
The LV grows, the filesystem does not. The application still sees the old capacity.
Rule: always use lvextend -L +<size> -r. Or remember to run resize2fs/xfs_growfs immediately after.
This is why monitoring df -h after an extension is critical — not just lvs.
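
One way to catch a forgotten -r is to compare the LV size with the filesystem size and flag a large gap. A minimal sketch: the helper name `fs_lags_lv` is hypothetical, and the 10% tolerance is an assumption meant to absorb normal filesystem metadata overhead.

```bash
# Report whether a filesystem is noticeably smaller than the LV it lives on.
# Both arguments are sizes in bytes. The 10% tolerance is an assumption to
# absorb normal filesystem metadata overhead; tune it for your filesystems.
# fs_lags_lv is an illustrative helper name.
fs_lags_lv() {
    lv_bytes=$1
    fs_bytes=$2
    threshold=$(( lv_bytes / 10 * 9 ))
    if [ "$fs_bytes" -lt "$threshold" ]; then
        echo "filesystem not resized"
    else
        echo "ok"
    fi
}

# On a real host the inputs could come from:
#   lv=$(sudo lvs --noheadings --units b --nosuffix -o lv_size /dev/webdata_vg/webapp_lv)
#   fs=$(df -B1 --output=size /var/www/appdata | tail -n 1)
#   fs_lags_lv $lv $fs
fs_lags_lv 85899345920 31138512896    # 80 GiB LV, filesystem still about 29 GiB
```

The demo call prints `filesystem not resized`, which is exactly the state you are in after lvextend without -r.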
Key Takeaway
LVM = PV + VG + LV.
Extend live with lvextend -r.
Without -r, you extend the LV but not the filesystem.
xfs cannot be shrunk — plan accordingly.
LVM Growth Decision
If: Volume group has free space (check vgs)?
Use: lvextend -L +<size> -r /dev/vg/lv
If: Volume group is full, new disk available?
Use: pvcreate + vgextend + lvextend -r
If: Need to shrink an ext4 volume?
Use: Unmount, e2fsck -f, resize2fs to the new size, lvreduce to match, remount (or let lvreduce -r handle the filesystem step). xfs cannot shrink.

Monitoring, Troubleshooting and the /etc/fstab Deep Dive

Understanding how to provision storage is half the job. The other half is knowing when something's going wrong before it takes down your application, and being able to diagnose it fast.

The biggest production risk is a full disk — but the sneaky version is inodes running out before disk space does. Every file on an ext4 filesystem consumes one inode. A directory full of millions of tiny temp files (log shards, session files, cache chunks) can exhaust inodes while df -h shows 40% free space. The symptom is 'No space left on device' errors even though the disk looks fine. df -i reveals the truth.
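
When df -i points at a filesystem, the next question is which directory holds all those files. Here is a small sketch that counts regular files per directory, assuming GNU find (for -printf) and directory names without spaces; the helper name `files_per_dir` is hypothetical.

```bash
# Count regular files per directory under a root, busiest directory first.
# Each file costs one inode regardless of its size.
# Assumes GNU find (-printf) and directory names without whitespace.
# files_per_dir is an illustrative helper name.
files_per_dir() {
    find "$1" -xdev -type f -printf '%h\n' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

Running `files_per_dir /var/www/appdata | head -n 5` would list the five directories holding the most files, one "count path" pair per line.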

For performance visibility, iostat from the sysstat package shows read/write throughput, latency (await) and utilization (%util) per device. High await or %util on one device points to a storage-bound workload rather than a CPU-bound one. iotop shows which processes are doing the most I/O right now, which is invaluable for finding a runaway process.

For /etc/fstab specifically: the six fields (device, mount point, filesystem type, options, dump, pass) matter. The 'dump' field (5th, almost always 0) controls backup utilities. The 'pass' field (6th) controls fsck order: root should be 1, other local filesystems 2, and 0 skips the check. Network filesystems such as NFS should always be 0, since there is no local block device to check and a nonzero value can stall boot while the share is still unreachable.
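
The pass-field rule for network filesystems can be checked the same way as any other fstab rule. A minimal sketch; the helper name `network_fsck_entries` is hypothetical, and the list of filesystem types (nfs, nfs4, cifs) is an assumption you may need to extend.

```bash
# Flag network filesystems whose fsck pass field (6th) is not 0.
# network_fsck_entries is an illustrative helper name.
# Defaults to /etc/fstab; pass another path to check a copy.
network_fsck_entries() {
    awk '
        $1 ~ /^#/ || NF < 6 { next }          # skip comments and lines without a pass field
        $3 ~ /^(nfs|nfs4|cifs)$/ && $6 != 0 {
            print $2 " (" $3 ") has pass=" $6
        }
    ' "${1:-/etc/fstab}"
}
```

Lines with fewer than six fields are skipped because a missing pass field already defaults to 0.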

storage_monitoring_and_diagnostics.sh (Bash)
#!/bin/bash
# storage_monitoring_and_diagnostics.sh
# Production-grade monitoring and diagnostics for Linux storage.
# Covers: disk usage, inode exhaustion, I/O performance, fstab validation.

echo "============================================"
echo " STORAGE HEALTH DASHBOARD"
echo "============================================"

echo ""
echo "[1] DISK SPACE USAGE (human-readable)"
df -hT --exclude-type=tmpfs --exclude-type=devtmpfs

echo ""
echo "[2] INODE USAGE — check this if you see 'No space left on device' with free space"
df -i --exclude-type=tmpfs --exclude-type=devtmpfs

echo ""
echo "[3] TOP 10 LARGEST DIRECTORIES under /var (common culprit for space issues)"
sudo du -sh /var/*/  2>/dev/null | sort -rh | head -10

echo ""
echo "[4] DISK I/O STATISTICS (3-second sample)"
sudo iostat -xd 1 3 2>/dev/null || echo "Install sysstat: sudo apt install sysstat"

echo ""
echo "[5] TOP I/O PROCESSES (requires iotop)"
sudo iotop -b -n 1 -o 2>/dev/null || echo "Install iotop: sudo apt install iotop"

echo ""
echo "[6] FSTAB VALIDATION"
grep -v '^#' /etc/fstab | grep -v '^$' | column -t
echo "Testing fstab by running: mount -a"
sudo mount -a 2>&1 && echo "✓ fstab OK" || echo "✗ fstab ERROR — fix before rebooting!"

echo ""
echo "[7] DISK HEALTH CHECK (smartctl)"
sudo smartctl -H /dev/sda 2>/dev/null || echo "Install smartmontools"

echo ""
echo "[8] CAPACITY ALERTS (>80% used)"
df --output=source,pcent,ipcent,target --exclude-type=tmpfs --exclude-type=devtmpfs | awk '
  NR > 1 && $2 + 0 > 80 { print "ALERT: " $1 " at " $2 " block capacity (mount: " $4 ")"; hit = 1 }
  NR > 1 && $3 + 0 > 80 { print "ALERT: " $1 " at " $3 " inode capacity (mount: " $4 ")"; hit = 1 }
  END { if (!hit) print "All filesystems under 80%" }'
Output
============================================
STORAGE HEALTH DASHBOARD
============================================
[1] DISK SPACE USAGE (human-readable)
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/vg0-root ext4 20G 8.1G 10G 44% /
/dev/sda2 ext4 976M 201M 708M 23% /boot
/dev/mapper/vg0-home ext4 40G 15G 23G 39% /home
/dev/mapper/webdata_vg-webapp_lv ext4 79G 24G 51G 31% /var/www/appdata
/dev/sdb1 xfs 500G 87G 413G 18% /mnt/backups
[2] INODE USAGE
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/vg0-root 1310720 204831 1105889 16% /
/dev/mapper/webdata_vg-webapp_lv 5242880 4900012 342868 94% /var/www/appdata
[3] TOP 10 LARGEST DIRECTORIES under /var
4.2G /var/log/
1.8G /var/cache/
890M /var/lib/
210M /var/www/
[4] DISK I/O STATISTICS
Device r/s w/s rkB/s wkB/s await %util
sda 2.14 18.72 87.4 742.3 3.21 12.4
sdb 0.08 45.61 3.2 1821.4 1.84 67.2
[5] TOP I/O PROCESSES
Total DISK READ: 87.4 KiB/s | Total DISK WRITE: 2.5 MiB/s
PID USER DISK READ DISK WRITE COMMAND
14821 mysql 0.00 B/s 1.9 MiB/s mysqld
9034 www 0.00 B/s 612.0 KiB/s php-fpm
[6] FSTAB VALIDATION
UUID=c3d4e5f6... / ext4 defaults 0 1
UUID=a1b2c3d4... /boot ext4 defaults 0 2
UUID=A1B2-C3D4 /boot/efi vfat umask=0077 0 2
/dev/mapper/... /var/www/app ext4 defaults,nofail 0 2
Testing fstab by running: mount -a
✓ fstab OK
[8] CAPACITY ALERTS (>80% used)
ALERT: /dev/mapper/webdata_vg-webapp_lv at 94% inode capacity (mount: /var/www/appdata)
Watch Out: 94% Inode Usage is a Ticking Clock
The output above shows inode usage at 94% on /var/www/appdata even though block space is only 31% used. This means millions of tiny files are accumulating — probably PHP session files or application cache. Find them with: find /var/www/appdata -xdev -type f | cut -d/ -f1-5 | sort | uniq -c | sort -rn | head -20. Then clean the offending directory and consider adding a cron job to prune session files. You can't increase inodes without reformatting ext4, so catch this early.
Production Insight
Inode exhaustion is silent until the application fails with 'No space left on device'.
df -h looks fine, but df -i shows 100%.
Rule: monitor both space and inodes. Set alerts at 80% for both.
Common culprits: session files, log shards, mail spools, temporary cache directories.
Key Takeaway
df -h for space, df -i for inodes.
Inode exhaustion looks like a full disk but isn't.
Monitor both. Never skip inode checks in production.
Diagnosing 'No Space Left'
If: df -h shows 100% space used?
Use: Delete old files or extend volume.
If: df -h shows free space but df -i shows 100%?
Use: Inode exhaustion. Find and delete tiny files.
If: Both show free space but application can't write?
Use: Check that the filesystem is not mounted read-only (findmnt -no OPTIONS /mount/point). Check permissions. Check dmesg for hardware read errors.

LVM Snapshots: Consistent Backups Without Downtime

LVM snapshots let you take a point-in-time copy of a logical volume without unmounting it. They're not backups themselves — they're a consistent image you can then back up. Snapshots use copy-on-write: the original volume continues to be used normally, and the snapshot only stores the original data as it changes. This means snapshots are space-efficient initially, but they grow as writes occur.

The classic use case is database backup. You take a snapshot of the LV containing your MySQL data directory, mount the snapshot somewhere else, and copy the data files from it. (mysqldump is a different tool: it talks to a running server, not to files on a mounted snapshot.) The production database keeps running with minimal impact.

Critical: snapshots consume space in the same Volume Group. Every block the original volume overwrites is first copied into the snapshot, so a busy origin fills it quickly. You must allocate enough snapshot size or keep the snapshot duration short. A snapshot that fills up is invalidated and has to be removed, so extend it with lvextend before it reaches 100%.

Another pattern: create a snapshot before a risky operation (e.g., a package upgrade or a schema migration). If something goes wrong, you can roll the origin back with lvconvert --merge, or copy individual files back from the mounted snapshot.

lvm_snapshot_and_restore.sh (Bash)
#!/bin/bash
# lvm_snapshot_and_restore.sh
# Demonstrates: creating an LVM snapshot, mounting it for backup, and removing it.
# To restore after an accidental deletion, copy files back from the mounted snapshot before Step 3.

TARGET_LV="/dev/webdata_vg/webapp_lv"   # The LV we want to snapshot
SNAPSHOT_NAME="webapp_lv_snap"           # Name for the snapshot
SNAPSHOT_SIZE="5G"                       # Allocate enough to cover expected writes during snapshot lifetime
MOUNT_DIR_BACKUP="/mnt/snapshot_backup"  # Temporary mount for snapshot

echo "=== STEP 1: Create a read-write snapshot of ${TARGET_LV} ==="
sudo lvcreate -s -L ${SNAPSHOT_SIZE} -n ${SNAPSHOT_NAME} ${TARGET_LV}

# Verify the snapshot device exists
sudo lvs | grep ${SNAPSHOT_NAME}

echo "=== STEP 2: Mount the snapshot for backup ==="
sudo mkdir -p ${MOUNT_DIR_BACKUP}
sudo mount /dev/webdata_vg/${SNAPSHOT_NAME} ${MOUNT_DIR_BACKUP}

# Now you can tar or rsync the files from the snapshot mount point.
# The production volume is untouched.
echo "Backing up data from snapshot..."
sudo tar -czf /backups/webapp_$(date +%Y%m%d_%H%M%S).tar.gz -C ${MOUNT_DIR_BACKUP} .

echo "=== STEP 3: Unmount and remove the snapshot ==="
sudo umount ${MOUNT_DIR_BACKUP}
sudo lvremove -f /dev/webdata_vg/${SNAPSHOT_NAME}

echo "Snapshot removed. Backup complete."
Output
=== STEP 1: Create a read-write snapshot of /dev/webdata_vg/webapp_lv ===
Logical volume "webapp_lv_snap" created.
LV VG Attr LSize Pool Origin Data%
webapp_lv_snap webdata_vg swi-aos--- 5.00g webapp_lv 0.02
=== STEP 2: Mount the snapshot for backup ===
Backing up data from snapshot...
=== STEP 3: Unmount and remove the snapshot ===
Logical volume "webapp_lv_snap" successfully removed.
Snapshot removed. Backup complete.
Snapshot Space Warning: Don't Let It Fill Up
A snapshot that runs out of space is invalidated: LVM marks it inactive and it is no longer usable. The original volume keeps working, but the point-in-time image is gone, and the kernel logs the invalidation in dmesg. Monitor snapshot usage with 'lvs -o+snap_percent' and extend the snapshot with lvextend before it reaches 100%; once it has overflowed, remove and recreate it. Plan size: estimate write rate to the original volume × snapshot duration.
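
The sizing rule above is plain multiplication, so it is easy to script. A rough sketch: the helper name `snapshot_size_mib` and the default 50% headroom are assumptions, not LVM parameters, and the result is an upper-bound style estimate because repeated writes to the same block only consume snapshot space once.

```bash
# Estimate snapshot size in MiB from write rate and snapshot lifetime.
# headroom_pct is an assumed safety margin, not an LVM setting.
# snapshot_size_mib is an illustrative helper name.
snapshot_size_mib() {
    write_mib_per_s=$1
    lifetime_s=$2
    headroom_pct=${3:-50}
    echo $(( write_mib_per_s * lifetime_s * (100 + headroom_pct) / 100 ))
}

snapshot_size_mib 2 1800    # 2 MiB/s of writes during a 30-minute backup
```

The demo call prints `5400`, meaning a snapshot of roughly 5.3 GiB for that workload. The write rate itself can be read from the wkB/s column of iostat.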
Production Insight
Snapshots are not backups — they are backup enablers.
A full snapshot is invalidated and useless, but the production volume is unaffected.
Rule: always monitor snapshot usage with 'lvs -o+snap_percent'.
And don't rely on a snapshot more than a few hours old as a recovery source: the longer it lives, the more space it consumes and the more likely it is to overflow.
Key Takeaway
LVM snapshots provide consistent point-in-time images.
They use copy-on-write and need free space in the VG.
Monitor snap_percent — a full snapshot is a dead snapshot.
Remove snapshots promptly after use.
● Production incident · Post-mortem · Severity: high

The Device Name Shuffle: How /dev/sdb1 Became the Wrong Disk After Reboot

Symptom
After a scheduled reboot, the database came up without its data: depending on the service, the error was 'data directory not found' or 'unexpected filesystem format'. The mount point had actually been populated from a backup volume that happened to be empty.
Assumption
The engineer assumed /dev/sdb would always refer to the same disk because it was the last one added. They added the entry to fstab using the device name.
Root cause
Linux assigns device names in the order they are discovered at boot. Adding a new disk or changing the boot order can reassign /dev/sdX names. The UUID written in the filesystem superblock stays with the filesystem regardless of discovery order.
Fix
Changed fstab to use UUID=... instead of /dev/sdb1. Ran 'sudo mount -a' to verify. Restored the database from the most recent backup (some data loss was inevitable). Set up monitoring to email on fsck errors at boot.
Key lesson
  • Never, ever use /dev/sdX names in fstab.
  • Always use 'sudo blkid' to get the UUID before writing fstab entries.
  • Add 'nofail' option to non-root entries so a missing disk doesn't halt the boot.
  • Consider using filesystem labels as a secondary stable reference.
Production debug guide: quick symptom-to-action mapping for common Linux storage failures
Symptom · 01
Application reports 'No space left on device' but df -h shows free space.
Fix
Check inode usage with 'df -i'. If IUse% is 100%, you've exhausted inodes. Find the directory with millions of tiny files using: find / -xdev -type f | cut -d/ -f1-5 | sort | uniq -c | sort -rn | head -10
Symptom · 02
Server doesn't boot or drops into emergency mode after adding a new disk.
Fix
Boot into single-user mode or use a live ISO. Check /etc/fstab for any invalid UUIDs or device names. Comment out the suspicious line, then run 'mount -a' to test. Restore the fstab backup if available.
Symptom · 03
df -h shows the same size after lvextend.
Fix
lvextend grew the logical volume but not the filesystem. Run 'sudo resize2fs /dev/mapper/vg-lv' for ext4 or 'sudo xfs_growfs /mount/point' for xfs. Add -r flag to future lvextend commands.
Symptom · 04
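LVM snapshot is full or nearly full, and writes to the origin volume have slowed down.
Fix
While a snapshot exists, the first write to each origin block copies the old data into the snapshot, so heavy writes both slow the origin and fill the snapshot. Check usage with 'lvs -o+snap_percent'. Extend a snapshot that is still valid with lvextend; remove one that has overflowed with 'lvremove'. Plan snapshot size based on expected write rate × duration.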
★ Storage Emergency Cheat Sheet: commands to run when storage is misbehaving. No theory, just action.
Disk full but can't find files
Immediate action
Check inodes
Commands
df -i
find / -xdev -type f | cut -d/ -f1-5 | sort | uniq -c | sort -rn | head -20
Fix now
Delete old log/session files: find /var/www -name '*.log' -mtime +7 -delete
Fstab broken, server in emergency mode
Immediate action
Remount root writable
Commands
mount -o remount,rw /
grep -v '^#' /etc/fstab | grep -v '^$'
Fix now
Comment out the bad line with sed (UUID=bad stands for the failing entry): sed -i 's/^UUID=bad/#&/' /etc/fstab; mount -a
Lvextend didn't increase df size
Immediate action
Check filesystem type
Commands
df -T /mount/point
lvs -o lv_name,lv_size /dev/vg/lv
Fix now
resize2fs /dev/vg/lv (ext4) or xfs_growfs /mount/point (xfs)
New disk not visible after attach
Immediate action
Rescan SCSI bus
Commands
lsblk
for h in /sys/class/scsi_host/host*/scan; do echo '- - -' > "$h"; done
Fix now
partprobe if partition table changed; reboot if still missing
Filesystem and Storage Comparison
| Aspect | ext4 | xfs | Raw Partition (no LVM) |
| --- | --- | --- | --- |
| Best Use Case | General purpose, boot volumes, home dirs | Large files, high-throughput, databases, log aggregation | Simple, single-purpose disks where overhead isn't wanted |
| Max File Size | 16 TiB | 8 EiB | Depends on filesystem on top |
| Max Volume Size | 1 EiB | 8 EiB | Partition table limit (2 TiB for MBR, 9.4 ZB for GPT) |
| Shrink | Offline only (unmount, then resize2fs) | Not supported: xfs volumes cannot shrink | Not applicable |
| Online Grow | Yes with resize2fs | Yes with xfs_growfs | Requires partition resize (risky, usually needs unmount) |
| Journaling | Yes (protects metadata on crash) | Yes (metadata-only by default) | N/A (filesystem-level feature) |
| Inode Flexibility | Fixed at format time | Dynamic inode allocation (exhaustion is rare) | N/A |
| Recovery Tooling | e2fsck: mature, well-documented | xfs_repair: powerful but less forgiving | N/A |
| LVM Compatible | Yes, recommended pairing | Yes, recommended for large data volumes | No LVM layer, static allocation only |
| Cloud Usage (AWS/GCP) | Common for root volumes | Common for data volumes, EBS optimized workloads | Rarely used directly in cloud environments |

Key takeaways

1
Always inspect with lsblk and blkid before modifying any disk.
2
Use UUIDs (not /dev/sdX) in /etc/fstab for persistent mounts.
3
Add nofail to non-root fstab entries to avoid boot failures.
4
LVM provides live volume resizing; use lvextend -r to auto-resize the filesystem.
5
Monitor both block space and inode usage. Inode exhaustion is silent.
6
LVM snapshots are copy-on-write; allocate enough space and monitor usage.
7
For production databases, use LVM snapshots for consistent backups without downtime.
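Takeaway 5 is easy to automate. The sketch below assumes a hypothetical helper named inode_alerts that reads 'df -iP' style output on stdin and prints mount points at or above a threshold. Reading stdin keeps it testable with canned text. It also assumes mount points contain no spaces.

```shell
# Flag filesystems whose inode usage is at or above a threshold (percent).
# Expects the column layout of 'df -iP': the fifth field is IUse%, the sixth is the mount point.
inode_alerts() {
    local threshold=$1
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        # Skip filesystems that report "-" instead of a percentage.
        if (pct ~ /^[0-9]+$/ && pct + 0 >= t) print $6, pct "%"
    }'
}

# Usage: df -iP | inode_alerts 90
```

Running it from cron or a monitoring agent next to the usual block-usage check covers both failure modes.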

Common mistakes to avoid

4 patterns
×

Using /dev/sdX names in /etc/fstab instead of UUIDs

Symptom
Server boots fine in the VM but after a hardware migration or adding a disk, the wrong filesystem mounts in the wrong place (or nothing mounts at all, dropping into emergency mode).
Fix
Always use UUID from 'sudo blkid -s UUID -o value /dev/sdX1' in fstab. UUIDs are written into the filesystem metadata and follow the disk wherever it goes.
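As a rough illustration of that fix, the following sketch builds an fstab line from a UUID. The helper name fstab_line and the option set 'defaults,nofail 0 2' are assumptions suited to a non-root data volume; adjust the options and the fsck pass number for your filesystem.

```shell
# Print an fstab entry that identifies the filesystem by UUID, not by device name.
fstab_line() {
    local uuid=$1 mountpoint=$2 fstype=$3
    printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$uuid" "$mountpoint" "$fstype"
}

# Usage (the device name is a placeholder):
#   uuid=$(sudo blkid -s UUID -o value /dev/sdX1)
#   fstab_line "$uuid" /data ext4 | sudo tee -a /etc/fstab
#   sudo mount -a
```

Running 'mount -a' right after the edit surfaces a bad entry while you still have a working shell, not at the next reboot.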
×

Running lvextend without the -r flag

Symptom
lvextend succeeds and 'sudo lvs' shows the new size, but 'df -h' still shows the old size and the application can't use the extra space.
Fix
The LV has already grown, so only the filesystem needs to catch up: run 'sudo resize2fs /dev/mapper/vg-lv' for ext4 or 'sudo xfs_growfs /mount/point' for xfs. For future extensions, use 'sudo lvextend -L +50G -r /dev/mapper/vg-lv'; the -r flag resizes the filesystem in the same step. Remember: the LV is the container; the filesystem inside it is a separate thing that also needs resizing.
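Which grow command applies depends on the filesystem type, so a small dry-run helper can remove the guesswork. This is a sketch under assumptions: grow_fs_cmd is a hypothetical name, and it only prints the command so a human can review it before running anything.

```shell
# Print the filesystem grow command for a given type; nothing is executed.
grow_fs_cmd() {
    local fstype=$1 device=$2 mountpoint=$3
    case "$fstype" in
        ext2|ext3|ext4) echo "resize2fs $device" ;;       # ext* grows by device
        xfs)            echo "xfs_growfs $mountpoint" ;;  # xfs grows by mount point
        *)              echo "unsupported filesystem: $fstype" >&2; return 1 ;;
    esac
}

# Usage (mount point is a placeholder):
#   grow_fs_cmd "$(findmnt -no FSTYPE /data)" "$(findmnt -no SOURCE /data)" /data
```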
×

Formatting a partition that still has data on it

Symptom
'mkfs.ext4 /dev/sdb1' completes in seconds and overwrites the existing filesystem; whatever was on that partition is effectively gone.
Fix
Before any mkfs command, always run 'sudo mount | grep sdb1' to check if it's mounted, 'sudo lsblk /dev/sdb' to confirm what's on it, and 'sudo blkid /dev/sdb1' to see if it already has a filesystem. If you're on a server with multiple attached volumes, triple-check lsblk output before destructive operations. In a script, add a confirmation prompt or use a variable with a clearly wrong default (e.g., TARGET_DISK="SETME") so the script explodes safely if someone forgets to set it.
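The "clearly wrong default" idea can be sketched as a guard function. The name check_target and the exact set of checks are assumptions for illustration. The blkid check is only reliable when the script runs as root, because blkid needs read access to the device.

```shell
# Deliberately invalid default: the script refuses to run until someone sets it.
TARGET_DISK="${TARGET_DISK:-SETME}"

# Return non-zero unless the target looks safe to format.
check_target() {
    local dev=$1
    if [ -z "$dev" ] || [ "$dev" = "SETME" ]; then
        echo "TARGET_DISK is not set; refusing to continue" >&2
        return 1
    fi
    if [ ! -b "$dev" ]; then
        echo "$dev is not a block device" >&2
        return 1
    fi
    if grep -q "^$dev " /proc/mounts; then
        echo "$dev is currently mounted" >&2
        return 1
    fi
    if blkid "$dev" >/dev/null 2>&1; then
        echo "$dev already carries a filesystem signature" >&2
        return 1
    fi
    return 0
}

# Usage: check_target "$TARGET_DISK" && mkfs.ext4 "$TARGET_DISK"
```

Because mkfs only runs after every check passes, forgetting to set TARGET_DISK fails loudly and touches nothing.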
×

Creating an LVM snapshot without monitoring its space usage

Symptom
The snapshot becomes full and inactive, rendering the backup useless. The volume group may also run out of free space, affecting other LVs.
Fix
Monitor snapshot usage with 'lvs -o+snap_percent'. Allocate snapshot size based on expected write rate × snapshot lifetime. Remove snapshots promptly after backup. Consider using thin provisioning snapshots for larger workloads.
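A hedged sketch of that monitoring step: the hypothetical snapshot_alerts function reads 'lvs --noheadings -o lv_name,snap_percent' style output on stdin and prints snapshots at or above a threshold. LVs that are not snapshots have an empty snap_percent column, so they show up as one-field lines and are skipped.

```shell
# Print snapshots whose fill level is at or above the threshold (percent).
# Input lines look like "  data_snap 85.20"; plain LVs have no second field.
snapshot_alerts() {
    local threshold=$1
    awk -v t="$threshold" 'NF == 2 && $2 + 0 >= t { print $1, $2 "%" }'
}

# Usage: sudo lvs --noheadings -o lv_name,snap_percent | snapshot_alerts 80
```

Alerting at 80% or so leaves time to finish the backup or remove the snapshot before it fills.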
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
Explain the three layers of LVM and how you would extend a logical volum...
Q02 · SENIOR
What's the difference between ext4 and xfs? When would you choose one ov...
Q03 · JUNIOR
A new developer adds a disk and mounts it using the device name /dev/sdc...
Q04 · SENIOR
What is inode exhaustion and how do you detect and resolve it?
Q01 of 04 · SENIOR

Explain the three layers of LVM and how you would extend a logical volume that is running out of space on a production server without downtime.

ANSWER
LVM has three layers: Physical Volumes (PVs), the raw disks or partitions; Volume Groups (VGs), the pool of storage; and Logical Volumes (LVs), the carved-out block devices that filesystems live on. To extend an LV without downtime, first check whether the VG has free space with 'vgs'. If it does, run 'lvextend -L +<size> -r /dev/vg/lv', which grows the LV and then the filesystem in one step; -r calls resize2fs for ext4 and xfs_growfs for xfs. If the VG has no free space, attach a new disk, run 'pvcreate' on it, then 'vgextend VG /dev/newdisk', then 'lvextend -r'. The application never stops. If you extended without -r, grow the filesystem manually with 'resize2fs' (ext4) or 'xfs_growfs' (xfs).
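The sequence in that answer can be written down as a dry-run plan. This is a sketch only: extend_plan is a hypothetical helper, the VG, LV, and disk names are placeholders, and it prints commands without executing them so the plan can be reviewed first.

```shell
# Print the LVM commands needed to grow an LV; nothing is executed.
# The fourth argument (a new disk) is optional: pass it only when the VG is out of free space.
extend_plan() {
    local vg=$1 lv=$2 size=$3 new_pv=${4:-}
    if [ -n "$new_pv" ]; then
        echo "pvcreate $new_pv"
        echo "vgextend $vg $new_pv"
    fi
    echo "lvextend -r -L +$size /dev/$vg/$lv"
}

# Usage:
#   extend_plan vg0 data 10G            # VG already has free space
#   extend_plan vg0 data 10G /dev/sdc   # VG is full, add a disk first
```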
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
Why can't I use /dev/sdb1 in fstab?
02
Can I shrink an xfs filesystem?
03
What does the 'nofail' option in fstab do?
04
How do I check if a disk is healthy?
05
What happens if an LVM snapshot runs out of space?