Inode exhaustion: you'll get 'No space left' despite free space on disk
Plain-English First
Think of your physical hard drive like a giant empty warehouse. Before you can store anything useful, you need to divide it into rooms (partitions), decide what kind of shelving system each room uses (filesystem), and then hang a sign on the door so people can find it (mounting). LVM is like hiring a warehouse manager who can knock down walls and resize rooms on the fly without moving all your boxes. Linux disk management is just you being that warehouse architect.
Every production outage I've ever seen that started with 'disk' in the alert was caused by someone who treated storage as an afterthought. A full root partition kills web servers, a misconfigured filesystem destroys databases, and a missing mount point in /etc/fstab means your server reboots into chaos at 3 AM. Storage management isn't glamorous, but it is the difference between a system that hums along and one that pages you on a Friday night.
The problem is that most tutorials show you the commands and stop there. They'll tell you to run mkfs.ext4 without explaining that formatting is irreversible and takes seconds. They'll show you mount without mentioning it evaporates on reboot unless you wire it into /etc/fstab. The gap between 'ran the command in a tutorial' and 'confidently managing storage on a live server' is exactly where people get hurt.
By the end of this article you'll know how to inspect a disk from scratch, partition it intentionally, format it with the right filesystem for your workload, mount it persistently, and use LVM to manage storage dynamically when your needs change. These are the skills you actually need on the job — not just for passing an exam.
Inspecting What You Have — Reading the Disk Landscape Before Touching Anything
The first rule of storage management is: never run a destructive command on a disk you haven't fully inspected. This sounds obvious, but under pressure people confuse /dev/sda with /dev/sdb and wipe the wrong drive. It happens more than anyone admits.
lsblk is your safest starting point. It reads block device info from sysfs without touching the disk itself — no risk, no side effects. It shows you the full device tree: physical drives, their partitions, and any logical volumes sitting on top. fdisk -l goes deeper, showing partition types, sizes, and sector alignment, but it requires root.
df -h tells you about mounted filesystems — what's actually in use right now. Note the difference: lsblk shows you everything attached to the system, df -h shows only what's mounted and accessible. A disk can exist on lsblk and be completely invisible to df -h if nobody's mounted it yet. Understanding this distinction stops a whole class of 'where did my disk go?' confusion.
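A quick way to see that gap in practice: the sketch below lists devices that lsblk reports as carrying a filesystem but that have no mount point, so df -h won't show them. It's a hypothetical helper (the name find_unmounted_filesystems is ours). It assumes util-linux lsblk's raw mode (-r), where columns are separated by single spaces and an empty column stays empty. It reads lsblk output on stdin, so you can try it against canned text. LVM, LUKS and RAID member partitions are skipped because they're containers, not mountable filesystems.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list block devices that carry a filesystem but have
# no mount point, i.e. devices lsblk shows and df -h does not.
# Input (stdin): output of  lsblk -rno NAME,FSTYPE,MOUNTPOINT
find_unmounted_filesystems() {
  # -F'[ ]' splits on every single space, so an empty column stays empty
  # instead of being merged away as it would be with awk's default splitting.
  awk -F'[ ]' '
    $2 != "" && $3 == "" &&
    $2 !~ /^(LVM2_member|crypto_LUKS|linux_raid_member)$/ { print $1 }
  '
}
```

On a live system, pipe lsblk straight into it: `lsblk -rno NAME,FSTYPE,MOUNTPOINT | find_unmounted_filesystems`. Raw mode hex-escapes spaces inside mount points (\x20), so a path containing spaces still counts as one column.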
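The UUID shown by blkid is critical: always use UUIDs in /etc/fstab, not device names like /dev/sdb1. Device names are assigned at boot time and can change if you add or remove hardware. A UUID is written into the filesystem's superblock when the filesystem is created, so it follows the filesystem no matter which device name the kernel hands out. It only changes if you reformat (or deliberately reassign it).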
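To catch device names that have already crept into fstab, a small lint pass is enough. This is a sketch under our own assumptions: the function name is hypothetical, and it treats /dev/sd*, /dev/vd*, /dev/xvd*, /dev/hd* and /dev/nvme*n* paths as unstable. UUID=, LABEL= and LVM paths such as /dev/mapper/vg-lv are left alone, since those come from identifiers or names you control, not from discovery order.

```shell
#!/usr/bin/env bash
# Hypothetical fstab lint: report entries that mount by kernel device name
# (/dev/sdX, /dev/vdX, /dev/xvdX, /dev/hdX, /dev/nvmeXnY[pZ]) instead of a
# stable reference such as UUID= or LABEL=.
# Usage: find_device_name_entries [fstab-file]   (default: /etc/fstab)
find_device_name_entries() {
  awk '
    /^[ \t]*#/ || NF == 0 { next }   # skip comments and blank lines
    $1 ~ /^\/dev\/(sd|vd|xvd|hd)[a-z]+[0-9]*$/ ||
    $1 ~ /^\/dev\/nvme[0-9]+n[0-9]+(p[0-9]+)?$/ {
      printf "line %d: %s -> %s (use UUID= instead)\n", NR, $1, $2
    }
  ' "${1:-/etc/fstab}"
}
```

Run `find_device_name_entries` with no argument to check /etc/fstab; every line it prints should be rewritten using the UUID from blkid.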
inspect_disk_landscape.sh (BASH)
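#!/bin/bash
# inspect_disk_landscape.sh
# Safe read-only commands to fully understand your storage before making any changes.
# Nothing here writes to a disk. Run as root (or with sudo) for full output.
echo "=== BLOCK DEVICE TREE (lsblk) ==="
# Every block device: disks, partitions, LVM volumes, mounted or not
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT,UUID
echo ""
echo "=== PARTITION DETAILS (fdisk) ==="
# Partition table type, sector size and partition boundaries.
# Drop the device argument to list every disk instead of just /dev/sda.
sudo fdisk -l /dev/sda
echo ""
echo "=== MOUNTED FILESYSTEM USAGE (df) ==="
# Only mounted filesystems show up here
df -h --output=source,size,used,avail,pcent,target
echo ""
echo "=== FILESYSTEM UUIDs (blkid) ==="
# Stable identifiers to use in /etc/fstab
sudo blkid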
Output
=== BLOCK DEVICE TREE (lsblk) ===
NAME SIZE TYPE FSTYPE MOUNTPOINT UUID
sda 100G disk
├─sda1 512M part vfat /boot/efi A1B2-C3D4
├─sda2 1G part ext4 /boot a1b2c3d4-1111-2222-3333-aabbccddeeff
└─sda3 98.5G part LVM2_member b2c3d4e5-2222-3333-4444-bbccddeeff00
The kernel assigns /dev/sda, /dev/sdb etc. based on the order it discovers drives at boot. Add a new disk, change a SATA port, or move to a different hypervisor and /dev/sdb can become /dev/sdc overnight. Always reference disks by UUID in fstab and scripts. Use sudo blkid | grep UUID to grab the stable identifier before you write anything to fstab.
Production Insight
Device name instability is one of the most common causes of storage-related boot failures.
A single line using /dev/sdb1 in fstab can cause your production database to mount a backup volume after a hardware change.
Rule: never put /dev/sdX in fstab. Ever.
Use UUID or filesystem label instead.
Key Takeaway
Inspect with lsblk and blkid before any write.
Never use /dev/sdX in fstab — always use UUID.
The difference between lsblk and df -h is often the source of 'missing disk' confusion.
When to Use Which Inspection Tool
If you need to see all block devices (including unmounted) → use lsblk
If you need partition table details (sectors, type) → use fdisk -l
If you need to see only mounted filesystems and their usage → use df -h
If you need stable filesystem identifiers for fstab → use blkid
Partitioning, Formatting and Mounting — Preparing a New Disk From Scratch
When a fresh disk arrives — whether it's a new SSD in a bare-metal server or a new EBS volume attached to an EC2 instance — it's a blank slate. No partition table, no filesystem, no mount point. Before any application can write data to it, you need to walk through three distinct steps: partition, format, mount.
Partitioning with gdisk (for GPT) or fdisk (for MBR) defines the logical boundaries on the disk. For any disk over 2 TiB or any UEFI system, use GPT. For older systems or VMs where you know it's MBR, fdisk is fine (current util-linux versions of fdisk can also create GPT tables). The partition table is just metadata that tells the OS where one region ends and another begins.
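The GPT-versus-MBR rule of thumb fits in a few lines. This sketch is hypothetical: it takes the disk size in bytes plus a yes/no UEFI flag and prints the label name parted expects ("gpt", or "msdos", which is parted's name for MBR). The 2 TiB cut-off assumes 512-byte logical sectors, because MBR stores sector numbers in 32 bits.

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding the rule of thumb: GPT for UEFI systems or
# disks bigger than MBR can address, MBR otherwise. Prints parted's label
# name: "gpt" or "msdos" (parted's name for an MBR table).
# MBR stores sector numbers in 32 bits: 2^32 sectors x 512 bytes = 2 TiB.
MBR_LIMIT_BYTES=$(( 4294967296 * 512 ))   # 2199023255552 bytes

choose_partition_table() {
  local size_bytes="$1"   # disk size in bytes, e.g. from: lsblk -bdno SIZE /dev/sdb
  local uefi="$2"         # "yes" if the system boots via UEFI
  if [ "$uefi" = "yes" ] || [ $(( size_bytes > MBR_LIMIT_BYTES )) -eq 1 ]; then
    echo gpt
  else
    echo msdos
  fi
}
```

On a real host, `lsblk -bdno SIZE /dev/sdb` gives the size in bytes and `[ -d /sys/firmware/efi ]` tells you whether the running system booted via UEFI. Feed the result to `sudo parted -s /dev/sdb mklabel <label>` only after double-checking the device name, because mklabel replaces any existing partition table.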
Formatting writes a filesystem into that partition. ext4 is the safe, well-understood default for general-purpose workloads — it has journaling, solid fsck tooling, and decades of battle testing. xfs is better for large files and high-throughput workloads (think log aggregation, big data). Don't overthink it for most use cases: ext4 unless you have a specific reason.
Mounting connects the formatted partition to a directory in the filesystem tree. The mount command does it immediately, but it vanishes on reboot. The /etc/fstab file makes it permanent. Every mounted filesystem you care about needs an entry there.
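The fstab rules this article keeps coming back to (UUID instead of a device name, nofail on anything that isn't root, fsck pass 1 for / and 2 elsewhere) can live in one helper so a script never hand-types the line. This is a hypothetical sketch; it always uses "defaults" as the base mount options and 0 for the dump field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build one /etc/fstab line using this article's rules:
#   - reference the filesystem by UUID, never /dev/sdX
#   - nofail on every mount except / so a missing disk can't block boot
#   - fsck pass 1 for /, 2 for everything else; dump field always 0
make_fstab_entry() {
  local uuid="$1" mount_dir="$2" fstype="$3"
  local options="defaults,nofail" pass=2
  if [ -z "$uuid" ]; then
    echo "make_fstab_entry: empty UUID (did blkid find the filesystem?)" >&2
    return 1
  fi
  if [ "$mount_dir" = "/" ]; then
    options="defaults"   # the root filesystem must never be optional
    pass=1
  fi
  printf 'UUID=%s %s %s %s 0 %s\n' "$uuid" "$mount_dir" "$fstype" "$options" "$pass"
}
```

Typical use, after backing up /etc/fstab: `make_fstab_entry "$(sudo blkid -s UUID -o value /dev/sdb1)" /mnt/appdata ext4 | sudo tee -a /etc/fstab`. The empty-UUID check matters because blkid prints nothing for a partition that hasn't been formatted yet.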
partition_format_mount.sh (BASH)
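#!/bin/bash
# partition_format_mount.sh
# Full walkthrough: take a raw disk (/dev/sdb) and make it usable.
# WARNING: This DESTROYS all data on /dev/sdb. Verify the device name first.
# Prerequisites: run as root. Confirm the target disk with: lsblk /dev/sdb
set -euo pipefail
TARGET_DISK="/dev/sdb"     # The raw disk we're preparing
PARTITION="/dev/sdb1"      # The partition we'll create
MOUNT_DIR="/mnt/appdata"   # Where we'll attach it in the filesystem tree
FS_LABEL="appdata-vol"     # Human-readable label (helpful in logs and blkid)
[ -b "${TARGET_DISK}" ] || { echo "${TARGET_DISK} is not a block device" >&2; exit 1; }
# --- STEP 1: PARTITION THE DISK ---
echo "Creating GPT partition table on ${TARGET_DISK}..."
# sgdisk is the scriptable sibling of gdisk (same gptfdisk package). On a blank
# disk it creates a GPT; --new=1:0:0 makes partition 1 span all usable space
# (0 = default start/end), --typecode=1:8300 marks it "Linux filesystem".
sudo sgdisk --new=1:0:0 --typecode=1:8300 "${TARGET_DISK}"
sudo partprobe "${TARGET_DISK}"   # make the kernel re-read the partition table
sudo udevadm settle               # wait for udev to create the partition device node
echo "Partition layout after gdisk:"
lsblk "${TARGET_DISK}"
# --- STEP 2: FORMAT THE PARTITION ---
echo "Formatting ${PARTITION} as ext4..."
sudo mkfs.ext4 -L "${FS_LABEL}" "${PARTITION}"
# --- STEP 3: CREATE MOUNT POINT ---
sudo mkdir -p "${MOUNT_DIR}"
# --- STEP 4: MOUNT TEMPORARILY (to verify it works) ---
sudo mount "${PARTITION}" "${MOUNT_DIR}"
echo "Temporary mount successful. Testing write access..."
echo "storage_test" | sudo tee "${MOUNT_DIR}/write_test.txt" > /dev/null
# --- STEP 5: GET UUID FOR FSTAB ---
DISK_UUID=$(sudo blkid -s UUID -o value "${PARTITION}")
echo "UUID for ${PARTITION}: ${DISK_UUID}"
# --- STEP 6: ADD TO /etc/fstab FOR PERSISTENT MOUNTING ---
sudo cp /etc/fstab "/etc/fstab.backup.$(date +%Y%m%d_%H%M%S)"   # back up before editing
echo "UUID=${DISK_UUID} ${MOUNT_DIR} ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab > /dev/null
# mount -a skips filesystems that are already mounted, so unmount first;
# otherwise the new fstab line is never actually exercised.
sudo umount "${MOUNT_DIR}"
sudo systemctl daemon-reload 2>/dev/null || true   # systemd hosts: regenerate mount units from fstab
sudo mount -a && echo "fstab validation passed — all entries mounted successfully."
df -h "${MOUNT_DIR}"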
Output
Creating GPT partition table on /dev/sdb...
Partition layout after gdisk:
NAME SIZE TYPE FSTYPE MOUNTPOINT
sdb 200G disk
└─sdb1 200G part
Formatting /dev/sdb1 as ext4...
mke2fs 1.46.5 (30-Dec-2021)
Creating filesystem with 52428800 4k blocks and 13107200 inodes
Superblock backups stored on blocks: 32768, 98304, 163840 ...
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
Temporary mount successful. Testing write access...
UUID for /dev/sdb1: f7a8b9c0-7777-8888-9999-001122334455
fstab validation passed — all entries mounted successfully.
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 197G 28K 187G 1% /mnt/appdata
Pro Tip: Always Use 'nofail' in fstab for Non-Root Disks
Without 'nofail', if a secondary disk fails to appear at boot (detached EBS volume, failed SAN mount, pulled SATA cable), the entire system drops into emergency mode and requires console access to fix. Add 'nofail' to every non-root fstab entry and your system keeps booting even when storage is misbehaving. On cloud instances, this is non-negotiable.
Production Insight
A missing 'nofail' on a cloud instance's ephemeral disk can cause a full boot failure.
I've seen production web servers stay down for hours because a transient EBS attachment issue pulled the whole server into emergency mode.
Rule: all non-root fstab entries get 'nofail'. Every time.
Key Takeaway
Partition, format, mount.
Use GPT for modern systems, ext4 for general use.
Always add fstab entry with UUID and nofail.
Never trust /dev/sdX names for persistence.
Partition Table Choice
If the disk is larger than 2TB or the system uses UEFI boot → use GPT (gdisk)
If legacy BIOS, disk smaller than 2TB, no UEFI → use MBR (fdisk)
If you need to reuse space from an existing partition → use LVM and avoid repartitioning
LVM — Dynamic Storage That Grows With Your Application
Here's the problem with raw partitions: they're static. You create a 50GB partition for your database, the database grows to 48GB, and now you're racing against time. Your only options are to resize the partition (risky, requires unmounting on most filesystems) or provision a new disk and move data. Neither is fun at 2 AM.
LVM — Logical Volume Manager — solves this by adding an abstraction layer between physical disks and the filesystems sitting on them. Instead of your filesystem sitting directly on /dev/sdb1, it sits on a logical volume that can be expanded by simply adding more physical storage to the underlying pool, called a Volume Group.
The mental model has three layers. Physical Volumes (PVs) are the raw disks or partitions you hand to LVM. A Volume Group (VG) is the pool — LVM combines all your PVs into one big storage bucket. Logical Volumes (LVs) are carved out of that pool and behave like normal partitions from the filesystem's perspective. The magic is that you can extend an LV while it's live and mounted, without unmounting or stopping the application.
This is why many production Linux installs (the default RHEL installer layout, for example) put everything except /boot on LVM. It's not complexity for its own sake: it's the ability to respond to storage demands without downtime.
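Once an LV exists, the rest of the system sees it as an ordinary block device, reachable as /dev/VG/LV and /dev/mapper/VG-LV. The mapper name has one quirk worth knowing before you build paths by string concatenation: device-mapper joins the VG and LV names with a single hyphen, so any hyphen inside either name is doubled. A minimal sketch (lv_mapper_path is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the /dev/mapper path LVM creates for an LV.
# device-mapper joins VG and LV with a single "-", so every "-" inside
# either name is doubled to keep the join unambiguous.
lv_mapper_path() {
  local vg lv
  vg=$(printf '%s' "$1" | sed 's/-/--/g')
  lv=$(printf '%s' "$2" | sed 's/-/--/g')
  printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}
```

The /dev/VG/LV form avoids the escaping entirely and points at the same device, which often makes it the simpler choice in scripts.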
lvm_setup_and_extend.sh (BASH)
#!/bin/bash
# lvm_setup_and_extend.sh
# Demonstrates: creating an LVM stack from scratch AND extending a full volume live.
# Scenario: the web app data volume (on /dev/sdc) is full. We add a new disk (/dev/sdd)
# and expand the logical volume online, with zero downtime.
# WARNING: pvcreate claims the whole disk. Verify both device names with lsblk first.
set -euo pipefail
# ============================================================
# PART 1: BUILD AN LVM STACK ON A FRESH DISK
# ============================================================
NEW_DISK="/dev/sdc"
VOLUME_GROUP="webdata_vg"
LOGICAL_VOLUME="webapp_lv"
LV_SIZE="30G"
MOUNT_POINT="/var/www/appdata"
DEVICE_PATH="/dev/mapper/${VOLUME_GROUP}-${LOGICAL_VOLUME}"   # device-mapper name of the LV
echo "=== STEP 1: Create Physical Volume ==="
sudo pvcreate "${NEW_DISK}"                       # label the disk for LVM use
sudo pvdisplay "${NEW_DISK}"
echo "=== STEP 2: Create Volume Group ==="
sudo vgcreate "${VOLUME_GROUP}" "${NEW_DISK}"     # pool the PV into a VG
sudo vgdisplay "${VOLUME_GROUP}"
echo "=== STEP 3: Create Logical Volume ==="
sudo lvcreate -L "${LV_SIZE}" -n "${LOGICAL_VOLUME}" "${VOLUME_GROUP}"   # carve 30G out of the pool
echo "LV device path: ${DEVICE_PATH}"
echo "=== STEP 4: Format and Mount the Logical Volume ==="
sudo mkfs.ext4 -L webapp-data "${DEVICE_PATH}"
sudo mkdir -p "${MOUNT_POINT}"
sudo mount "${DEVICE_PATH}" "${MOUNT_POINT}"
# LVM paths come from the VG/LV names you chose, not from discovery order,
# so they are stable in fstab (UUID= works too).
echo "${DEVICE_PATH} ${MOUNT_POINT} ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
df -h "${MOUNT_POINT}"
# ============================================================
# PART 2: EXTENDING THE VOLUME ONLINE (ZERO DOWNTIME)
# ============================================================
EXTRA_DISK="/dev/sdd"
EXTEND_BY="+50G"
echo "=== EXTEND: Add new disk to the VG pool ==="
sudo pvcreate "${EXTRA_DISK}"
sudo vgextend "${VOLUME_GROUP}" "${EXTRA_DISK}"   # the VG now spans both disks
sudo vgs "${VOLUME_GROUP}"                        # confirm VFree grew
echo "=== EXTEND: Grow the logical volume ==="
# -r (--resizefs) grows the filesystem in the same step; without it df keeps the old size
sudo lvextend -L "${EXTEND_BY}" -r "${DEVICE_PATH}"
df -h "${MOUNT_POINT}"
Output
=== STEP 1: Create Physical Volume ===
Physical volume "/dev/sdc" successfully created.
--- Physical volume ---
PV Name /dev/sdc
VG Name
PV Size 100.00 GiB / not usable 4.00 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 25599
Free PE 25599
=== STEP 2: Create Volume Group ===
Volume group "webdata_vg" successfully created
--- Volume group ---
VG Name webdata_vg
VG Size <100.00 GiB
PE Size 4.00 MiB
Total PE 25599
Free PE / Size 25599 / <100.00 GiB
=== STEP 3: Create Logical Volume ===
Logical volume "webapp_lv" created.
LV device path: /dev/mapper/webdata_vg-webapp_lv
=== STEP 4: Format and Mount the Logical Volume ===
lvextend grows the logical volume block device. Without the -r flag, the filesystem inside the LV doesn't know it has more space, and df -h will still show the old size. The -r flag runs resize2fs (for ext4) or xfs_growfs (for xfs) automatically. If you forget -r, run 'sudo resize2fs /dev/mapper/vg-lv' (ext4) or 'sudo xfs_growfs /mount/point' (xfs) manually afterward. Interviewers love asking why df shows the old size after lvextend.
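If the LV already grew without -r, the follow-up command depends on the filesystem: resize2fs takes the block device, while xfs_growfs takes the mount point of a mounted filesystem. A hypothetical helper that prints the right command for review (it deliberately doesn't run anything):

```shell
#!/usr/bin/env bash
# Hypothetical helper for the "forgot -r" case: print the command that grows
# a filesystem to fill an already-extended LV. ext2/3/4 are resized through
# the block device; xfs is grown through its (mounted) mount point.
grow_fs_cmd() {
  local fstype="$1" device="$2" mount_dir="$3"
  case "$fstype" in
    ext2|ext3|ext4) printf 'resize2fs %s\n' "$device" ;;
    xfs)            printf 'xfs_growfs %s\n' "$mount_dir" ;;
    *) printf 'grow_fs_cmd: unsupported filesystem: %s\n' "$fstype" >&2; return 1 ;;
  esac
}
```

Look up the type with `findmnt -no FSTYPE /var/www/appdata`, check the printed command, then run it with sudo.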
Production Insight
Forgetting -r on lvextend is one of the most common LVM mistakes in production.
The LV grows, the filesystem does not. The application still sees the old capacity.
Rule: always use lvextend -L +<size> -r. Or remember to run resize2fs/xfs_growfs immediately after.
This is why monitoring df -h after an extension is critical — not just lvs.
Key Takeaway
LVM = PV + VG + LV.
Extend live with lvextend -r.
Without -r, you extend the LV but not the filesystem.
Monitoring, Troubleshooting and the /etc/fstab Deep Dive
Understanding how to provision storage is half the job. The other half is knowing when something's going wrong before it takes down your application, and being able to diagnose it fast.
The biggest production risk is a full disk — but the sneaky version is inodes running out before disk space does. Every file on an ext4 filesystem consumes one inode. A directory full of millions of tiny temp files (log shards, session files, cache chunks) can exhaust inodes while df -h shows 40% free space. The symptom is 'No space left on device' errors even though the disk looks fine. df -i reveals the truth.
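A bit of arithmetic shows why inodes can run out first. mkfs.ext4 creates one inode per fixed number of bytes (the -i bytes-per-inode option). 16384 is a common default in mke2fs.conf, and it matches the 52428800-block, 13107200-inode format output earlier in this article. Every file needs an inode and occupies at least one 4 KiB block. The worked example below (a sketch with a hypothetical helper name, ignoring the space taken by inode tables and the journal) computes how full the disk is at the moment a flood of one-block files exhausts the inodes.

```shell
#!/usr/bin/env bash
# Worked example: how full is an ext4 disk when its inodes run out?
# ext4_inode_count is a hypothetical helper: inodes = size / bytes-per-inode.
ext4_inode_count() {
  echo $(( $1 / $2 ))   # $1 = filesystem size in bytes, $2 = bytes per inode
}

size=$(( 200 * 1024 * 1024 * 1024 ))       # a 200 GiB filesystem
inodes=$(ext4_inode_count "$size" 16384)   # formatted at 16384 bytes/inode
min_bytes=$(( inodes * 4096 ))              # each file uses at least one 4 KiB block
echo "inodes available: $inodes"                                       # -> 13107200
echo "space used when inodes run out: $(( min_bytes * 100 / size ))%"  # -> 25%
```

So a directory of small files can make writes fail while df -h still reports about three quarters of the disk free. If you know a volume will hold millions of small files, a denser ratio at format time (for example `mkfs.ext4 -i 4096`) gives four times as many inodes, at the cost of more space spent on inode tables.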
For performance visibility, iostat (from the sysstat package) shows per-device read/write throughput, request latency (await) and utilization (%util), alongside the CPU's overall %iowait. High await and %util on one device while the CPU sits in iowait tells you the application is storage-bound rather than CPU-bound. iotop shows which processes are doing the most I/O right now, which is invaluable for finding a runaway process.
For /etc/fstab specifically, all six fields matter. The 'dump' field (5th, almost always 0) is only read by the legacy dump(8) backup utility. The 'pass' field (6th) controls fsck order at boot: root should be 1, everything else 2, or 0 to skip the check. Network filesystems should use 0: a non-zero pass on an NFS entry asks fsck to check a share that may not even be reachable yet, which can stall the boot.
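Those field rules are easy to check mechanically. The sketch below is a hypothetical awk-based lint and deliberately stricter than mount(8), which tolerates missing dump/pass fields and treats them as 0. It reports lines without exactly six fields, pass values other than 0, 1 or 2, pass 1 on anything but /, and NFS or CIFS entries with a non-zero pass. It exits non-zero when it finds a problem, so it can gate a script.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the six /etc/fstab fields:
#   <device> <mount point> <type> <options> <dump> <pass>
# Usage: check_fstab [fstab-file]   (default: /etc/fstab); exit 1 on problems.
check_fstab() {
  awk '
    /^[ \t]*#/ || NF == 0 { next }   # skip comments and blank lines
    NF != 6 { printf "line %d: expected 6 fields, found %d\n", NR, NF; bad = 1; next }
    $6 !~ /^[012]$/ { printf "line %d: pass must be 0, 1 or 2 (got %s)\n", NR, $6; bad = 1 }
    $6 == 1 && $2 != "/" { printf "line %d: pass 1 should be used only for /\n", NR; bad = 1 }
    $3 ~ /^(nfs|nfs4|cifs)$/ && $6 != 0 {
      printf "line %d: %s on %s should use pass 0\n", NR, $3, $2; bad = 1
    }
    END { exit bad }
  ' "${1:-/etc/fstab}"
}
```

Run it after every fstab edit and before any reboot; on systemd hosts, `findmnt --verify` performs a broader built-in check as well.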
storage_monitoring_and_diagnostics.sh (BASH)
#!/bin/bash
# storage_monitoring_and_diagnostics.sh
# Production-grade monitoring and diagnostics for Linux storage.
# Covers: disk usage, inode exhaustion, I/O performance, fstab validation, disk health.
echo "============================================"
echo " STORAGE HEALTH DASHBOARD"
echo "============================================"
echo ""
echo "[1] DISK SPACE USAGE (human-readable)"
df -hT --exclude-type=tmpfs --exclude-type=devtmpfs
echo ""
echo "[2] INODE USAGE (check this if you see 'No space left on device' with free space)"
df -i --exclude-type=tmpfs --exclude-type=devtmpfs
echo ""
echo "[3] TOP 10 LARGEST DIRECTORIES under /var (common culprit for space issues)"
sudo du -sh /var/*/ 2>/dev/null | sort -rh | head -10
echo ""
echo "[4] DISK I/O STATISTICS (three 1-second reports)"
# -x extended stats (await, %util), -d devices only; interval 1s, 3 reports.
# The first report covers the time since boot, the later ones are live samples.
sudo iostat -xd 1 3 2>/dev/null || echo "Install sysstat: sudo apt install sysstat"
echo ""
echo "[5] TOP I/O PROCESSES (requires iotop)"
# -b batch mode, -n 1 a single iteration, -o only processes actually doing I/O
sudo iotop -b -n 1 -o 2>/dev/null || echo "Install iotop: sudo apt install iotop"
echo ""
echo "[6] FSTAB VALIDATION"
grep -v '^#' /etc/fstab | grep -v '^$' | column -t
echo "Testing fstab by running: mount -a"
sudo mount -a 2>&1 && echo "✓ fstab OK" || echo "✗ fstab ERROR: fix before rebooting!"
echo ""
echo "[7] DISK HEALTH CHECK (smartctl)"
# Checks /dev/sda only; repeat for each physical disk (SMART doesn't apply to LVs)
sudo smartctl -H /dev/sda 2>/dev/null || echo "smartctl failed or flagged a problem: install smartmontools or read the output above"
echo ""
echo "[8] CAPACITY ALERTS (>80% of space or inodes used)"
# pcent = block usage, ipcent = inode usage; filesystems without inodes (e.g. vfat) show "-"
df --output=source,pcent,ipcent,target -x tmpfs -x devtmpfs | awk -v limit=80 '
  NR > 1 {
    if ($2 + 0 > limit) { print "ALERT: " $1 " at " $2 " space capacity, mount: " $4; hit = 1 }
    if ($3 + 0 > limit) { print "ALERT: " $1 " at " $3 " inode capacity, mount: " $4; hit = 1 }
  }
  END { if (!hit) print "All filesystems under " limit "%" }'
Output
ALERT: /dev/mapper/webdata_vg-webapp_lv at 94% inode capacity, mount: /var/www/appdata
Watch Out: 94% Inode Usage is a Ticking Clock
The output above shows inode usage at 94% on /var/www/appdata even though block space is only 31% used. This means millions of tiny files are accumulating, probably PHP session files or application cache. Find them with: find /var/www/appdata -xdev -type f | cut -d/ -f1-5 | sort | uniq -c | sort -rn | head -20. Then clean the offending directory and consider adding a cron job to prune session files. ext4 fixes its bytes-per-inode ratio at format time: growing the filesystem adds inodes only in proportion to the added space, and a denser ratio means reformatting. Catch this early.
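The cut -d/ -f1-5 in that command only works because /var/www/appdata sits exactly three levels deep; for another directory you'd have to recount the fields. A hypothetical variant groups files by the first directory below any starting point, so the depth doesn't matter. It keeps -xdev so it stays on the one filesystem whose inodes are running out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count regular files under each immediate
# subdirectory of a starting directory, busiest first.
# Usage: files_per_subdir <dir> [limit]   ("." counts files directly in <dir>)
files_per_subdir() {
  local root="$1" limit="${2:-20}" prefix
  case "$root" in
    /)  prefix=/ ;;
    */) root="${root%/}"; prefix="$root/" ;;
    *)  prefix="$root/" ;;
  esac
  # -xdev: don't cross into other mounted filesystems
  find "$root" -xdev -type f | awk -v prefix="$prefix" '
    index($0, prefix) == 1 {
      rest = substr($0, length(prefix) + 1)
      slash = index(rest, "/")
      top = slash ? substr(rest, 1, slash - 1) : "."
      count[top]++
    }
    END { for (d in count) printf "%d %s\n", count[d], d }
  ' | sort -rn | head -n "$limit"
}
```

`files_per_subdir /var/www/appdata` prints one "count directory" line per subdirectory; run it again on the top entry to drill down.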
Production Insight
Inode exhaustion is silent until the application fails with 'No space left on device'.
df -h looks fine, but df -i shows 100%.
Rule: monitor both space and inodes. Set alerts at 80% for both.
Common culprits: session files, log shards, mail spools, temporary cache directories.
Key Takeaway
df -h for space, df -i for inodes.
Inode exhaustion looks like a full disk but isn't.
Monitor both. Never skip inode checks in production.
Diagnosing 'No Space Left'
If df -h shows 100% space used → delete old files or extend the volume.
If df -h shows free space but df -i shows 100% → inode exhaustion: find and delete the tiny files.
If both show free space but the application can't write → check whether the filesystem has been remounted read-only (findmnt -no OPTIONS /mount/point starts with 'ro'), check permissions, and look for hardware I/O errors in dmesg.
LVM Snapshots: Consistent Backups Without Downtime
LVM snapshots let you take a point-in-time copy of a logical volume without unmounting it. They're not backups themselves — they're a consistent image you can then back up. Snapshots use copy-on-write: the original volume continues to be used normally, and the snapshot only stores the original data as it changes. This means snapshots are space-efficient initially, but they grow as writes occur.
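The classic use case is database backup. You take a snapshot of the LV containing your MySQL data directory, mount the snapshot somewhere else, and copy or archive the data files from it (mysqldump can't read the snapshot directly, since it talks to a running server). The production database keeps running with minimal impact. A snapshot of a live database is crash-consistent, like pulling the power cord: restoring it relies on the database's crash recovery unless you briefly quiesce writes (for MySQL, FLUSH TABLES WITH READ LOCK) while the snapshot is created.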
Critical: snapshots consume space in the same Volume Group. The first time a block in the original volume changes, its old contents are copied into the snapshot, so a busy origin fills the snapshot quickly. Allocate enough snapshot size or keep the snapshot's lifetime short. You can grow a snapshot with lvextend while it is still below 100%; once it fills completely, LVM invalidates it and the only option is to remove it.
Another pattern: create a snapshot before a risky operation (e.g., filesystem resize, partition table change). If something goes wrong, you can copy data back from the snapshot, or roll the whole origin volume back to it with lvconvert --merge.
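The sizing rule (write rate × snapshot lifetime) turns into a quick calculator. This is a hypothetical helper working in MiB with integer math. It assumes every write lands on a block that hasn't been copied yet, which is the worst case (rewriting an already-copied block costs nothing extra), then adds headroom, 50% by default.

```shell
#!/usr/bin/env bash
# Hypothetical snapshot sizing helper (integer MiB):
#   size = write rate x snapshot lifetime, plus headroom.
# Worst case: assumes every write hits a block not yet copied to the snapshot.
# Usage: snapshot_size_mib <MiB written per minute> <minutes> [headroom %]
snapshot_size_mib() {
  local rate_mib_per_min="$1" minutes="$2" headroom_pct="${3:-50}"
  local raw=$(( rate_mib_per_min * minutes ))
  # integer ceiling of raw * (100 + headroom) / 100
  echo $(( (raw * (100 + headroom_pct) + 99) / 100 ))
}
```

For a backup that takes 90 minutes on a volume writing about 20 MiB per minute, `snapshot_size_mib 20 90` suggests 2700 MiB, i.e. `sudo lvcreate -s -L 2700M -n webapp_lv_snap /dev/webdata_vg/webapp_lv`. Measure the real write rate (iostat's write throughput for the device is a reasonable source) rather than guessing it.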
lvm_snapshot_and_restore.sh (BASH)
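#!/bin/bash
# lvm_snapshot_and_restore.sh
# Demonstrates: creating an LVM snapshot, mounting it for backup, and removing it.
# To restore after an accidental deletion, copy the files back from the same
# snapshot mount instead of removing the snapshot in STEP 3.
set -euo pipefail
TARGET_LV="/dev/webdata_vg/webapp_lv"     # The LV we want to snapshot
SNAPSHOT_NAME="webapp_lv_snap"            # Name for the snapshot
SNAPSHOT_SIZE="5G"                        # Must cover all writes to the origin while the snapshot exists
MOUNT_DIR_BACKUP="/mnt/snapshot_backup"   # Temporary mount for the snapshot
BACKUP_DIR="/backups"                     # Where the archive is written
cleanup() {
  # Runs if the script exits early, so a forgotten snapshot can't fill up later
  sudo umount "${MOUNT_DIR_BACKUP}" 2>/dev/null || true
  sudo lvremove -f "/dev/webdata_vg/${SNAPSHOT_NAME}" 2>/dev/null || true
}
trap cleanup EXIT
echo "=== STEP 1: Create a read-write snapshot of ${TARGET_LV} ==="
sudo lvcreate -s -L "${SNAPSHOT_SIZE}" -n "${SNAPSHOT_NAME}" "${TARGET_LV}"
# Verify the snapshot exists; its Data% column is the usage to watch
sudo lvs | grep "${SNAPSHOT_NAME}"
echo "=== STEP 2: Mount the snapshot for backup ==="
sudo mkdir -p "${MOUNT_DIR_BACKUP}" "${BACKUP_DIR}"
# Read-only is enough for a backup. For xfs, use -o ro,nouuid: the snapshot
# carries the same filesystem UUID as the still-mounted origin.
sudo mount -o ro "/dev/webdata_vg/${SNAPSHOT_NAME}" "${MOUNT_DIR_BACKUP}"
# Now you can tar or rsync from the snapshot mount point.
# The production volume stays mounted and in use the whole time.
echo "Backing up data from snapshot..."
sudo tar -czf "${BACKUP_DIR}/webapp_$(date +%Y%m%d_%H%M%S).tar.gz" -C "${MOUNT_DIR_BACKUP}" .
echo "=== STEP 3: Unmount and remove the snapshot ==="
sudo umount "${MOUNT_DIR_BACKUP}"
sudo lvremove -f "/dev/webdata_vg/${SNAPSHOT_NAME}"
trap - EXIT   # cleanup already done
echo "Snapshot removed. Backup complete."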
Output
=== STEP 1: Create a read-write snapshot of /dev/webdata_vg/webapp_lv ===
A snapshot that runs out of space is invalidated: it's no longer usable. Data in the original volume continues to change, but the snapshot can no longer preserve the old blocks. device-mapper logs the invalidation in dmesg, and lvs marks the snapshot as invalid. Monitor snapshot usage with 'lvs -o+snap_percent'. If it reaches 100%, remove and recreate the snapshot. Plan size: estimate write rate to the original volume × snapshot duration.
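To be warned before a snapshot hits 100%, a cron job can parse lvs. A sketch with a hypothetical function name: it expects the output of `lvs --noheadings -o lv_name,snap_percent` on stdin and skips LVs whose percentage column is empty (non-snapshots). It exits 1 when anything is at or above the threshold, so cron or a monitoring wrapper can react.

```shell
#!/usr/bin/env bash
# Hypothetical monitor for classic LVM snapshots.
# Input (stdin): output of  lvs --noheadings -o lv_name,snap_percent
# Usage: snapshot_usage_alerts [threshold %]   (default 80); exit 1 on alert.
snapshot_usage_alerts() {
  awk -v limit="${1:-80}" '
    NF == 2 && $2 + 0 >= limit {
      printf "ALERT: snapshot %s at %s%% of its COW space\n", $1, $2
      hit = 1
    }
    END { exit hit }
  '
}
```

Usage: `sudo lvs --noheadings -o lv_name,snap_percent | snapshot_usage_alerts 80`. Alerting at 80% leaves time to lvextend the snapshot or finish the backup before LVM invalidates it.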
Production Insight
Snapshots are not backups — they are backup enablers.
A full snapshot is invalidated and useless, but the production volume is unaffected.
Rule: always monitor snapshot usage with 'lvs -o+snap_percent'.
And don't keep a snapshot around for more than a few hours: the longer it lives, the more space it consumes and the more likely it is to fill up before you use it.
Key Takeaway
LVM snapshots provide consistent point-in-time images.
They use copy-on-write and need free space in the VG.
Monitor snap_percent — a full snapshot is a dead snapshot.
Remove snapshots promptly after use.
● Production incident · POST-MORTEM · severity: high
The Device Name Shuffle: How /dev/sdb1 Became the Wrong Disk After Reboot
Symptom
After a scheduled reboot, the database reported errors like 'data directory not found' and 'unexpected filesystem format'. It had actually started, but it was reading from an empty backup volume that was now mounted where its data disk used to be.
Assumption
The engineer assumed /dev/sdb would always refer to the same disk because it was the last one added. They added the entry to fstab using the device name.
Root cause
Linux assigns device names in the order they are discovered at boot. Adding a new disk or changing the boot order can reassign /dev/sdX names. The UUID written in the filesystem superblock is the only stable identifier.
Fix
Changed fstab to use UUID=... instead of /dev/sdb1. Ran 'sudo mount -a' to verify. Restored the database from the most recent backup (some data loss was inevitable). Set up monitoring to email on fsck errors at boot.
Key lesson
Never, ever use /dev/sdX names in fstab.
Always use 'sudo blkid' to get the UUID before writing fstab entries.
Add 'nofail' option to non-root entries so a missing disk doesn't halt the boot.
Consider using filesystem labels as a secondary stable reference.
Production debug guide: quick symptom-to-action mapping for common Linux storage failures
Symptom · 01
Application reports 'No space left on device' but df -h shows free space.
→
Fix
Check inode usage with 'df -i'. If IUse% is 100%, you've exhausted inodes. Find the directory with millions of tiny files using: find / -xdev -type f | cut -d/ -f1-5 | sort | uniq -c | sort -rn | head -10
Symptom · 02
Server doesn't boot or drops into emergency mode after adding a new disk.
→
Fix
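Boot into single-user/emergency mode or use a live ISO. In emergency mode the root filesystem may be read-only, so run 'mount -o remount,rw /' first. Check /etc/fstab for any invalid UUIDs or device names. Comment out the suspicious line, then run 'mount -a' to test. Restore the fstab backup if available.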
Symptom · 03
df -h shows the same size after lvextend.
→
Fix
lvextend grew the logical volume but not the filesystem. Run 'sudo resize2fs /dev/mapper/vg-lv' for ext4 or 'sudo xfs_growfs /mount/point' for xfs. Add -r flag to future lvextend commands.
Symptom · 04
LVM snapshot volume is full and applications are frozen.
→
Fix
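The first write to each block of the origin copies the old block into the snapshot's copy-on-write space, so a busy origin fills the snapshot quickly, and the extra copying slows writes to the origin. Remove the snapshot with 'lvremove' as soon as the backup is complete. Plan snapshot size based on expected write rate × duration.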
★ Storage Emergency Cheat Sheet: commands to run when storage is misbehaving. No theory, just action.
Disk full but can't find files
Immediate action
Check inodes
Commands
df -i
find / -xdev -type f | cut -d/ -f1-5 | sort | uniq -c | sort -rn | head -20
Comment out a bad fstab line with sed (replace 'bad' with the failing UUID): sed -i 's/^UUID=bad/#UUID=bad/' /etc/fstab; mount -a
lvextend didn't increase df size
Immediate action
Check filesystem type
Commands
df -T /mount/point
lvextend -L +10G -r /dev/vg/lv
Fix now
resize2fs /dev/vg/lv (ext4) or xfs_growfs /mount/point (xfs)
New disk not visible after attach
Immediate action
Rescan SCSI bus
Commands
lsblk
for h in /sys/class/scsi_host/host*/scan; do echo '- - -' > "$h"; done
Fix now
partprobe if partition table changed; reboot if still missing
Filesystem and Storage Comparison
| Aspect | ext4 | xfs | Raw Partition (no LVM) |
|---|---|---|---|
| Best Use Case | General purpose, boot volumes, home dirs | Large files, high-throughput, databases, log aggregation | Simple, single-purpose disks where overhead isn't wanted |
| Max File Size | 16 TiB | 8 EiB | Depends on filesystem on top |
| Max Volume Size | 1 EiB | 8 EiB | Partition table limit (2 TiB for MBR, ~9.4 ZB for GPT) |
| Shrink | Offline only (unmount, then resize2fs) | Not supported (xfs cannot be shrunk) | Not applicable |
| Online Grow | Yes, with resize2fs | Yes, with xfs_growfs | Requires partition resize (risky, usually needs unmount) |
| Journaling | Yes (protects metadata on crash) | Yes (metadata only) | N/A (filesystem-level feature) |
| Inode Flexibility | Fixed at format time | Allocated dynamically (exhaustion is rare; capped by the imaxpct setting) | N/A |
| Recovery Tooling | e2fsck (mature, well-documented) | xfs_repair (powerful but less forgiving) | N/A |
| LVM Compatible | Yes (recommended pairing) | Yes (recommended for large data volumes) | No LVM layer, static allocation only |
| Cloud Usage (AWS/GCP) | Common for root volumes | Common for data volumes, EBS-optimized workloads | Rarely used directly in cloud environments |
Key takeaways
1. Always inspect with lsblk and blkid before modifying any disk.
2. Use UUIDs (not /dev/sdX) in /etc/fstab for persistent mounts.
3. Add nofail to non-root fstab entries to avoid boot failures.
4. LVM provides live volume resizing; use lvextend -r to resize the filesystem in the same step.
5. Monitor both block space and inode usage. Inode exhaustion is silent.
6. LVM snapshots are copy-on-write; allocate enough space and monitor usage.
7. For production databases, LVM snapshots give you a point-in-time image to back up without downtime; quiesce writes briefly while creating the snapshot if you need more than crash consistency.
Common mistakes to avoid
×
Using /dev/sdX names in /etc/fstab instead of UUIDs
Symptom
Server boots fine in the VM but after a hardware migration or adding a disk, the wrong filesystem mounts in the wrong place (or nothing mounts at all, dropping into emergency mode).
Fix
Always use UUID from 'sudo blkid -s UUID -o value /dev/sdX1' in fstab. UUIDs are written into the filesystem metadata and follow the disk wherever it goes.
×
Running lvextend without the -r flag
Symptom
lvextend succeeds and 'sudo lvs' shows the new size, but 'df -h' still shows the old size and the application can't use the extra space.
Fix
Don't simply rerun the same 'lvextend -L +50G' command: that would grow the LV by another 50G. Instead, resize the filesystem to fill the space that's already there: 'sudo resize2fs /dev/mapper/vg-lv' for ext4 or 'sudo xfs_growfs /mount/point' for xfs. Next time, pass -r ('sudo lvextend -L +50G -r /dev/mapper/vg-lv') so both happen in one step. Remember: the LV is the container; the filesystem inside it is a separate thing that also needs resizing.
×
Formatting a partition that still has data on it
Symptom
'mkfs.ext4 /dev/sdb1' finishes in seconds. mkfs may warn that the partition already contains a filesystem, but answer 'y' (or run it with -F in a script) and everything that was on the partition is gone.
Fix
Before any mkfs command, always run 'sudo mount | grep sdb1' to check if it's mounted, 'sudo lsblk /dev/sdb' to confirm what's on it, and 'sudo blkid /dev/sdb1' to see if it already has a filesystem. If you're on a server with multiple attached volumes, triple-check lsblk output before destructive operations. In a script, add a confirmation prompt or use a variable with a clearly wrong default (e.g., TARGET_DISK="SETME") so the script explodes safely if someone forgets to set it.
×
Creating an LVM snapshot without monitoring its space usage
Symptom
The snapshot becomes full and inactive, rendering the backup useless. The volume group may also run out of free space, affecting other LVs.
Fix
Monitor snapshot usage with 'lvs -o+snap_percent'. Allocate snapshot size based on expected write rate × snapshot lifetime. Remove snapshots promptly after backup. Consider using thin provisioning snapshots for larger workloads.
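The "clearly wrong default" idea from the mkfs mistake above pairs well with a pre-flight check before mkfs, sgdisk or pvcreate. This hypothetical guard refuses to continue unless the target is set, is a block device, and lsblk reports no filesystem, partition table (PTTYPE) or mount point on it or any of its partitions. Treat it as a safety net rather than proof: lsblk usually reads FSTYPE and PTTYPE from udev's probe results, which can lag behind very recent changes.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight guard for destructive scripts (mkfs, sgdisk, pvcreate).
# Returns 1 unless the target is set, is a block device, and carries no
# filesystem, partition table or mount point (on itself or its partitions).
safe_target_check() {
  local dev="$1"
  if [ -z "$dev" ] || [ "$dev" = "SETME" ]; then
    echo "refusing: target disk not set" >&2
    return 1
  fi
  if [ ! -b "$dev" ]; then
    echo "refusing: $dev is not a block device" >&2
    return 1
  fi
  # Any non-blank FSTYPE, PTTYPE or MOUNTPOINT value means the disk is in use.
  if lsblk -no FSTYPE,PTTYPE,MOUNTPOINT "$dev" | grep -q '[^[:space:]]'; then
    echo "refusing: $dev already has a filesystem, partition table or mount" >&2
    lsblk -o NAME,SIZE,FSTYPE,PTTYPE,MOUNTPOINT "$dev" >&2
    return 1
  fi
  return 0
}
```

At the top of a destructive script: `TARGET_DISK="${TARGET_DISK:-SETME}"` followed by `safe_target_check "$TARGET_DISK" || exit 1`.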
Interview Questions on This Topic
Q01 of 04 · SENIOR
Explain the three layers of LVM and how you would extend a logical volume that is running out of space on a production server without downtime.
ANSWER
LVM has Physical Volumes (PVs), the raw disks or partitions; Volume Groups (VGs), the pool of storage; and Logical Volumes (LVs), the carved-out block devices that filesystems live on. To extend an LV without downtime: first check whether the VG has free space with 'vgs'. If it does, run 'lvextend -L +<size> -r /dev/vg/lv', which grows the LV and then the filesystem in one step. If the VG doesn't have free space, attach a new disk, run 'pvcreate' on it, then 'vgextend VG /dev/newdisk', then 'lvextend -r'. The application never stops. -r works for both ext4 and xfs (it calls resize2fs or xfs_growfs through fsadm); if you forgot it, run resize2fs (ext4) or xfs_growfs (xfs) yourself.
Q02 of 04 · SENIOR
What's the difference between ext4 and xfs? When would you choose one over the other?
ANSWER
ext4 is the safe default: it's been around forever, has excellent recovery tools (e2fsck), can be shrunk offline (unmount first), and its inode count is fixed at format time. xfs excels with large files (single file up to 8 EiB vs ext4's 16 TiB), allocates inodes dynamically (so inode exhaustion is rare), and performs better under high concurrency with large files. Choose ext4 for root filesystems, small databases, and general-purpose workloads. Choose xfs for big data, log aggregation, video storage, and high-throughput databases like MongoDB. You cannot shrink an xfs filesystem, so plan accordingly.
Q03 of 04 · JUNIOR
A new developer adds a disk and mounts it using the device name /dev/sdc1 in /etc/fstab. The server reboots and now mounts the wrong volume. What happened and how do you fix it permanently?
ANSWER
The disk order changed at boot. Device names like /dev/sdc are assigned dynamically based on kernel discovery order. Adding or removing hardware can reorder them. Fix: boot into emergency mode or single-user, check 'blkid' to find the correct UUID for the intended filesystem, edit /etc/fstab to use UUID=... instead of /dev/sdc1, run 'mount -a' to test, then reboot. Preventative: always use UUIDs in fstab and scripts. Also consider using filesystem labels if you need a more human-friendly stable reference.
Q04 of 04 · SENIOR
What is inode exhaustion and how do you detect and resolve it?
ANSWER
Inode exhaustion happens when a filesystem runs out of inodes (the data structures that store file metadata). It causes 'No space left on device' errors even though df -h shows free block space. Detect it with 'df -i'. If IUse% is 100%, that's the cause. Find the directory with millions of tiny files using 'find /mountpoint -xdev -type f | cut -d/ -f1-5 | sort | uniq -c | sort -rn | head -20'. Clean up files (log rotation, session cleanup, etc.). On ext4 the bytes-per-inode ratio is fixed at format time: growing the filesystem adds inodes only in proportion to the new space, and a denser ratio requires reformatting (e.g. with mkfs.ext4 -i). xfs allocates inodes dynamically, so it rarely hits this. Prevention: set up monitoring on inode usage.
Frequently Asked Questions
01
Why can't I use /dev/sdb1 in fstab?
Device names like /dev/sdb1 are assigned dynamically at boot based on kernel discovery order. Adding, removing, or reordering drives can change the mapping. UUIDs are stored in the filesystem metadata and stay with the disk regardless of which device name it gets. Always use blkid to get the UUID and reference that in fstab.
02
Can I shrink an xfs filesystem?
No. xfs does not support online or offline shrinking. To reduce an xfs volume, you must back up data, destroy the LV, create a smaller one, and restore. Plan your xfs volumes carefully — it's grow-only. In contrast, ext4 can be shrunk (unmounted, resize2fs, then lvreduce).
03
What does the 'nofail' option in fstab do?
Without nofail, if a filesystem fails to mount at boot (e.g., detachable EBS volume, missing SAN LUN), the system drops into emergency mode and requires manual intervention. With nofail, the mount attempt is non-critical — the boot continues even if the mount fails. Use nofail on all non-root filesystems, especially in cloud environments where volumes can be detached.
04
How do I check if a disk is healthy?
Use sudo smartctl -H /dev/sda for a quick health status (PASSED/FAILED). For the full attribute table and self-test log, use sudo smartctl -a /dev/sda. Also check dmesg | grep -i error for kernel-level storage errors. For SSDs, look at wear level indicators. For RAID controllers, use megacli or storcli depending on the card.
05
What happens if an LVM snapshot runs out of space?
The snapshot becomes invalid: it can no longer preserve the original blocks as the origin changes, so its contents can't be trusted. The original volume itself is unaffected. device-mapper logs the invalidation in dmesg. The snapshot must be removed and recreated with more space. Monitor with lvs -o+snap_percent.
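Following up on the disk-health question: if you want that quick check in a script rather than read by eye, the verdict line of smartctl -H can be reduced to one word. A sketch under stated assumptions: ATA and NVMe drives print "SMART overall-health self-assessment test result: PASSED" (or FAILED), SCSI/SAS drives print "SMART Health Status: OK", and anything else, including drives without SMART support, is reported as UNKNOWN. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: reduce "smartctl -H" output (stdin) to
# HEALTHY, FAILING or UNKNOWN.
smart_health_summary() {
  awk '
    /self-assessment test result: PASSED/ || /SMART Health Status: OK/ { status = "HEALTHY" }
    /self-assessment test result: FAILED/ { status = "FAILING" }
    END { print (status ? status : "UNKNOWN") }
  '
}
```

Usage: `sudo smartctl -H /dev/sda | smart_health_summary`. smartctl's exit status is also a bitmask of detected problems (documented in its man page), which is the sturdier signal if you script checks across many disks.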