Intermediate 11 min · March 05, 2026

Database Backup and Restore

MySQL & PostgreSQL Backup Failures — Why Exit Code 0 Lies

Q: Can I restore a MySQL mysqldump backup into PostgreSQL?

Not directly. mysqldump generates MySQL-specific SQL syntax. You'd need to convert it by hand or use tools like `pgloader` which can map MySQL types to PostgreSQL. Schema differences (TINYINT vs SMALLINT, AUTO_INCREMENT vs SERIAL) require manual adjustment. Always test with a small subset first.

Q: How often should I run a full database backup?

At minimum, run a full backup daily during off-peak hours. For critical systems, combine daily full with incremental physical backups every 6 hours, and continuous transaction log archiving every ~5 minutes. The frequency depends on your RPO (recovery point objective).

Q: What is the difference between cold, hot, and warm backups?

Cold backup: the database is shut down during the backup, so it is 100% consistent but incurs downtime. Hot backup: the database remains online and accepting writes; consistency is achieved via transactions (e.g., InnoDB's MVCC or PostgreSQL's snapshot isolation). Warm backup: the database stays online for reads, but writes are blocked during the backup (e.g., under FLUSH TABLES WITH READ LOCK, or on a replica with replication paused). In production, hot backups are preferred.

Q: How do I perform a point-in-time recovery (PITR) in MySQL?

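You need binary logging enabled (`log_bin`, `server_id`). Restore a full backup (logical or physical), then use `mysqlbinlog` to replay the binary logs from the backup's recorded position up to the target time, piping the output into `mysql`. If you dumped with `--master-data=2` (`--source-data=2` on MySQL 8.0.26+), the position is recorded in the dump header as a commented-out CHANGE MASTER TO (or CHANGE REPLICATION SOURCE TO) line. Example: `mysqlbinlog --start-position=12345 --stop-datetime='2026-04-22 14:30:00' /backup/binlog.000007 | mysql -u root`. If the target lies in a later file, list every binlog from the start file onward in one `mysqlbinlog` call; `--start-position` applies only to the first file.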

Q: Should I compress my backup files?

Yes. Compression cuts storage and network transfer time significantly (often 5-10x for SQL text). Use `gzip` on the fly (e.g., `mysqldump ... | gzip > dump.gz`, with `set -o pipefail` so a mysqldump failure isn't hidden behind gzip's success) or pg_dump's built-in compression (`-Z 0-9`; custom and directory formats are compressed by default). Note that mysqldump's `--compress` only compresses the client/server connection, not the dump file. Monitor CPU usage: compression can slow down the backup on CPU-bound systems, so a moderate level (e.g., 3-6) is a good balance.

A 2TB mysqldump ran 21 days before bit rot made it unrestorable.

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Lessons pulled from things that broke in production.

✓ Production tested · Last updated July 18, 2026 · 2,466 articles, all by Naren

Before you start⏱ 25 min

✓Solid grasp of fundamentals
✓Comfortable reading code examples
✓Basic production concepts


⚡Quick Answer

Logical backups (mysqldump, pg_dump) produce SQL text files — portable, cross-version, but slow for large datasets
Physical backups (XtraBackup, pg_basebackup) copy raw data files — fast, full/incremental, but tied to storage engine
Production backups must include binary logs (MySQL) or WAL archives (PostgreSQL) for point-in-time recovery
Automated scripts belong in cron with a healthcheck: silent backup failures are the most dangerous
The biggest mistake: never testing a restore — a backup you haven't restored is a wish

✦ Definition~90s read

What is Database Backup and Restore?

Database backup and restore is a recovery protocol, not a file copy job. The difference matters because a backup that silently fails — exit code 0 with an empty or truncated file — is worse than no backup at all: it gives you false confidence until the moment you need it.


MySQL and PostgreSQL handle this differently. A `mysqldump | gzip > file` pipeline reports gzip's exit status, so unless the script sets `pipefail`, it can return 0 even after mysqldump died mid-stream on a lock wait timeout, a lost connection, or a full disk. pg_dump, for its part, can produce a valid dump file that still fails to restore because of dependency ordering or extensions and roles missing on the target. The core problem is that most automation scripts check exit codes, not data integrity, and exit code 0 lies more often than you'd expect in production environments handling terabytes of data.
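One common way exit code 0 lies is a pipeline. Without `set -o pipefail`, bash reports only the exit status of the last command, so a `mysqldump | gzip` job looks successful whenever gzip succeeds. The sketch below reproduces this without a database: `false` is an assumed stand-in for a dump that died mid-stream, and `pipeline_status` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# `false` stands in for a mysqldump that failed mid-stream; gzip still exits 0.
pipeline_status() {
  # Each case runs in a subshell so the option changes do not leak out;
  # set +e keeps the failing pipeline from aborting before we print $?.
  ( set +e; set +o pipefail; false | gzip -c > /dev/null; echo "without pipefail: $?" )
  ( set +e; set -o pipefail; false | gzip -c > /dev/null; echo "with pipefail: $?" )
}

pipeline_status
# prints:
#   without pipefail: 0
#   with pipefail: 1
```

This is why every backup script in this article should start with `set -euo pipefail`: the `pipefail` part is what lets a failed dump surface as a failed job.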

Logical backups (SQL dumps) give you row-level granularity and portability across versions, but they're slow and can lock tables for hours on large databases. Physical backups (file snapshots, pg_basebackup, XtraBackup) are faster and consistent at the filesystem level, but they're tied to specific storage engines and database versions.

The choice shapes your recovery time objective (RTO) and, together with log archiving, your recovery point objective (RPO): logical backups suit small datasets and restores of specific tables; physical backups suit full-instance recovery under time pressure. Most production systems need both, with logical backups for granular restores and physical backups for disaster recovery.

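When automating, the script must validate the backup after creation: check the file size, test a restore on a staging instance, or at minimum verify that the dump's final lines contain the expected completion marker (mysqldump writes `-- Dump completed`, a plain-format pg_dump writes `-- PostgreSQL database dump complete`). For PostgreSQL, use --no-owner and --no-acl (pg_dump applies them to plain SQL dumps; for custom and directory archives, pass them to pg_restore) unless you're restoring to the exact same cluster. pg_restore --list confirms that an archive's table of contents is readable, but only a real test restore catches dependency problems such as missing extensions or roles.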
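The completion-marker check is cheap enough to run after every dump. A minimal sketch, assuming dumps are written with comments enabled (no `--skip-comments` or `--compact`); `verify_dump_trailer` is a hypothetical helper, and it looks at the last ten lines rather than only the last one because newer pg_dump releases can append a line after the marker.

```bash
#!/usr/bin/env bash
# verify_dump_trailer FILE: succeed only if FILE is non-empty and one of its
# last ten lines carries a mysqldump or pg_dump completion marker.
verify_dump_trailer() {
  local file="$1" tail_lines
  if [ ! -s "$file" ]; then
    echo "empty or missing: $file" >&2
    return 1
  fi
  # gzip -dcf decompresses .gz input and passes plain SQL through unchanged;
  # a truncated archive simply yields no trailer ("|| true" tolerates the error)
  tail_lines=$(gzip -dcf "$file" 2>/dev/null | tail -n 10) || true
  if grep -qE '^-- (Dump completed|PostgreSQL database dump complete)' <<< "$tail_lines"; then
    return 0
  fi
  echo "no completion marker: $file" >&2
  return 1
}
```

Usage: `verify_dump_trailer /backup/mysql/20260305/full_dump.sql.gz || critical_alert "dump incomplete"`, where `critical_alert` is whatever alerting hook you already use.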

For MySQL, keep --quick (on by default through --opt): it streams rows one at a time instead of buffering whole tables in memory, and disabling it risks running out of memory on large tables. Pipe the output through gzip with `set -o pipefail` and check the size of the result. The restore procedure under pressure is a separate skill: know your database's restore order, have a rollback plan, and practice the full cycle quarterly, because the first time you do it shouldn't be at 3 AM with a pager going off.

Plain-English First

Imagine you spend three months building the world's greatest LEGO castle. A backup is a photo of that castle taken every day — so if your little sibling kicks it over, you can rebuild from yesterday's photo instead of starting from scratch. A restore is the act of rebuilding from that photo. Database backup and restore is exactly that: saving a copy of your data at a point in time, and bringing it back when something goes wrong.

Every developer eventually faces The Moment — a production database gets corrupted, a junior engineer runs DELETE without a WHERE clause, or a cloud disk silently fails. The difference between a bad afternoon and a company-ending catastrophe is whether you had a solid backup strategy before it happened. Backup and restore isn't a nice-to-have; it's the insurance policy your entire application depends on.

The problem is that most tutorials show you a command and call it a day. They don't explain that there are fundamentally different types of backups — logical vs. physical — and that choosing the wrong one for your situation can mean either a 6-hour restore window when you need 20 minutes, or a backup file that's completely unusable. They don't explain why pg_dump and mysqldump exist as separate tools with different philosophies, or when you'd reach for something else entirely.

By the end of this article you'll be able to: create logical and physical backups for both MySQL and PostgreSQL, write an automated backup script that you can drop into a cron job today, restore from a backup under pressure, and know exactly which approach to use for which scenario. Let's build that safety net.

Why Database Backup Restore Is a Recovery Protocol, Not a Copy Job

Database backup restore is the process of reconstructing a database from a previously captured snapshot or transaction log sequence. The core mechanic is not merely copying files back into place: it's replaying a point-in-time state through a recovery engine that validates checksums (where enabled), applies WAL segments (PostgreSQL) or binary logs (MySQL), and rolls forward to a consistent LSN or transaction ID. A restore that exits with code 0 can still produce a logically corrupt database if the backup itself was taken without a consistent snapshot or read lock, or if partial writes were captured during the snapshot.

In practice, restore correctness depends on three properties: atomicity of the backup point, replayability of the transaction log, and verification of page-level integrity. MySQL's mysqldump with --single-transaction gives a consistent snapshot for InnoDB, but DDL that runs during the dump (ALTER, DROP, RENAME, TRUNCATE) breaks that consistency; XtraBackup captures physical pages but needs its prepare step to apply the redo log. PostgreSQL's pg_basebackup with --wal-method=stream includes the WAL needed to make the base backup itself consistent, but point-in-time recovery still depends on every archived segment after it: replay stops at the first missing segment, and if you asked to recover to the latest point, the server starts without complaint and without every transaction after the gap. The restore process must validate that the backup covers a complete log sequence number range, or you get a running database with silent data loss.
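Checking that an archive covers a complete range can start as simply as looking for holes in the segment sequence. This is a sketch under stated assumptions: the default 16 MB `wal_segment_size` (256 segments per log id) and a single timeline. `find_wal_gaps` is a hypothetical helper, not a PostgreSQL tool, and it does not replace a test restore.

```bash
#!/usr/bin/env bash
# find_wal_gaps: read WAL file names on stdin (e.g. `ls` of the archive dir)
# and print every segment that does not directly follow the previous one.
# Assumes 16 MB segments (256 per log id) and a single timeline.
find_wal_gaps() {
  local prev=-1 name log seg idx
  LC_ALL=C sort | while read -r name; do
    # Plain segments are 24 hex digits: timeline, log id, segment number.
    # .history, .backup and .partial files are skipped.
    [[ "$name" =~ ^[0-9A-F]{24}$ ]] || continue
    log=$((16#${name:8:8}))
    seg=$((16#${name:16:8}))
    idx=$((log * 256 + seg))
    if (( prev >= 0 && idx != prev + 1 )); then
      echo "gap before $name"
    fi
    prev=$idx
  done
}
```

Usage: `ls /var/lib/postgresql/16/archive | find_wal_gaps`. Any output means the archive cannot carry a restore past that point.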

You use backup restore when a primary fails, a schema migration corrupts data, or a logical error (e.g., DELETE without WHERE) needs point-in-time recovery. In production, the restore is the only test that matters — if you haven't restored a backup in the last 30 days, you don't have a backup. Teams that rely on exit code 0 from pg_restore or mysql without verifying row counts or checksums discover this the hard way during an incident. The restore must be automated, monitored, and periodically validated against a staging environment with production-scale data.

⚠ Exit Code 0 Is Not a Validator

A restore that exits with code 0 can still produce a database that is missing rows, has broken foreign keys, or is at the wrong point in time — always verify with a row count or checksum query.

📊 Production Insight

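A team restored a 2 TB PostgreSQL cluster from a 6-hour-old backup during an outage: exit code 0, database started, but the backup had been taken without pg_start_backup() (pg_backup_start() since PostgreSQL 15) completing, so a required WAL segment was missing. The symptom: queries returned stale data for 30 minutes before the database crashed with a missing WAL error. Rule of thumb: verify page checksums with pg_checksums --check (pg_verify_checksums in PostgreSQL 11; it requires data checksums to be enabled and the cluster to be shut down) before starting the restored cluster, and run a SELECT COUNT(*) on a critical table immediately after it comes up.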

🎯 Key Takeaway

A backup is only as good as its last successful restore — test restores monthly, not annually.

Exit code 0 from a restore tool does not guarantee logical consistency — always verify with application-level queries.

Point-in-time recovery requires continuous WAL or binlog archiving: a full backup alone can only take you back to the moment it was taken.

[Figure: Database Backup Restore (thecodeforge.io)]

Logical vs Physical Backups — The First Decision

Before you write a single backup script, you need to decide which type of backup fits your workload. Logical backups produce SQL statements or delimited text. Physical backups copy the raw database files.

Logical backups (mysqldump, pg_dump) are the Swiss Army knife of backups. They produce portable archives that you can restore on a different MySQL/PostgreSQL version or CPU architecture, or, with conversion work, into a different database engine. They're ideal for small databases (<50 GB), schema-only backups, or migrations between environments. The cost: they are slow. A 500 GB MySQL dump can take hours, and the restore takes even longer because every row goes back through the SQL layer and every index must be rebuilt.

Physical backups (XtraBackup, pg_basebackup) copy raw data files at the filesystem level. They're fast: on fast storage, a 2 TB database can be backed up in under 30 minutes. They support incremental backups and have minimal impact on the running database. The trade-off: they're tightly coupled to the database major version, storage engine, and operating system. An XtraBackup copy of InnoDB files can only be restored onto a compatible MySQL version, and a PostgreSQL 16 physical backup can only be started by PostgreSQL 16 binaries: to reach a newer major version you start it on 16 and then run pg_upgrade, and the only path to an older version is dump and reload.

The reality: in production, you need both. Logical backups for granular restore (single table, specific rows), and physical backups for full database recovery speed. You also need continuous archiving of transaction logs (binary logs for MySQL, WAL for PostgreSQL) for point-in-time recovery — no backup is complete without them.

backup_decision.shBASH

#!/bin/bash
# io.thecodeforge.backup_decision: choose a backup strategy by data size
set -euo pipefail
DATA_DIR="/var/lib/mysql/io_thecodeforge"
# du -sb prints bytes; awk truncates to whole GiB so the integer test below works
DB_SIZE_GB=$(du -sb "$DATA_DIR" | awk '{printf "%d", $1/1073741824}')
if [ "$DB_SIZE_GB" -lt 50 ]; then
  echo "Use mysqldump or pg_dump for logical backup"
else
  echo "Use XtraBackup or pg_basebackup for physical backup"
fi
# Always enable binary log / WAL archiving regardless of size
# (-N drops the column header so only the value is compared)
mysql -N -e "SELECT @@log_bin;" | grep -qx 1 || echo "Enable log_bin in my.cnf"

Output

Use mysqldump or pg_dump for logical backup

Mental Model

The Description vs Photograph Distinction

Think of a logical backup as a detailed description of your database written in English — it's understandable but slow to recreate. A physical backup is a photograph of the entire data structure — instant capture, but you need the same camera to view it.

Logical = human-readable, portable, slow, good for small databases and migration
Physical = machine-optimised, fast, tied to version/storage engine, good for large databases
Both need transaction log archiving for point-in-time recovery — don't skip this

📊 Production Insight

Choosing the wrong backup type at scale bites hard: a 500 GB pg_dump taking 8 hours while production crawls under its full table scans.

The rule: if restore time matters more than portability, go physical. Logical backup for schema-only or sub-50 GB databases.

Point-in-time recovery requires WAL or binlog archiving — no exception. Without it, you lose everything since the last backup.

🎯 Key Takeaway

Logical backups are portable and slow. Physical backups are fast and version-locked.

Choose based on database size and recovery objectives, not convenience.

Always add transaction log archiving — that's what makes point-in-time recovery possible.

Which Backup Type Do I Need?

If database < 50 GB, cross-version restore needed, or schema-only → use logical (mysqldump, pg_dump)

If database > 50 GB, need fast restore, or incremental backups → use physical (XtraBackup, pg_basebackup)

If you need to recover a single row or table from any point in time → MySQL: logical backup + binlog archiving; PostgreSQL: physical base backup + WAL archiving, recovered on a scratch instance, then pg_dump the table you need

If disaster recovery: restore the entire DB in < 4 hours → physical backup + continuous WAL shipping + standby replica

mysqldump: Options That Matter in Production

mysqldump is ubiquitous, but most engineers use it with defaults that are dangerous in production. By default (--lock-tables, part of --opt), it takes a read lock on every table of each database while it dumps that database, blocking writes to it for the duration. On a busy database, that's a write outage.

Critical options:
- --single-transaction: For InnoDB tables, starts a REPEATABLE READ transaction and uses MVCC to get a consistent snapshot without blocking writes. This is the most important flag for production MySQL. Do not dump a live InnoDB database without it.
- --routines --events --triggers: Triggers are dumped by default, but stored procedures/functions and events are not. Without --routines and --events, your restore silently loses that application logic; passing --triggers explicitly just documents intent.
- --master-data=2 (--source-data=2 on MySQL 8.0.26+): Writes the binary log file and position at dump time into the dump header as a commented-out CHANGE MASTER statement, essential for replication setup or point-in-time recovery. Without --single-transaction it also turns on --lock-all-tables.
- --compress: Compresses traffic between the mysqldump client and the server, not the output file. It helps when dumping over a slow network; to shrink the file, pipe through gzip.
- --opt: Enabled by default; shorthand for --add-drop-table, --add-locks, --create-options, --disable-keys, --extended-insert, --lock-tables, --quick, and --set-charset. Good for restore speed, but the multi-row extended inserts can exceed max_allowed_packet on a restrictively configured test instance.
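To keep these flags from drifting between scripts, one option is to generate them in one place. A minimal sketch: `mysqldump_prod_flags` is a hypothetical helper that takes the server version as an argument and switches to `--source-data=2` from MySQL 8.0.26, where `--master-data` was deprecated. It assumes MySQL version strings and is not meant for MariaDB.

```bash
#!/usr/bin/env bash
# mysqldump_prod_flags VERSION: print the production flag set, one per line.
mysqldump_prod_flags() {
  local version="$1" coords="--master-data=2"
  # sort -V orders version strings; if 8.0.26 sorts first (or ties),
  # then VERSION >= 8.0.26 and the new option name applies
  if [ "$(printf '%s\n' 8.0.26 "$version" | sort -V | head -n 1)" = "8.0.26" ]; then
    coords="--source-data=2"
  fi
  printf '%s\n' --single-transaction --routines --events --triggers "$coords"
}
```

Usage: `mapfile -t FLAGS < <(mysqldump_prod_flags "$(mysql -N -e 'SELECT VERSION()')")`, then `mysqldump "${FLAGS[@]}" --all-databases | gzip > dump.sql.gz` under `set -o pipefail`.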

Performance consideration: dumping a 200 GB InnoDB database typically takes 30-60 minutes on modern hardware. During that time, the full table scans can push hot pages out of the buffer pool. Monitor SHOW ENGINE INNODB STATUS for pending disk reads. For databases over 500 GB, use mydumper (parallel logical dump) or switch to physical backup.

mysql_backup_prod.shBASH

#!/bin/bash
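# io.thecodeforge.mysql_backup: production-safe MySQL logical backup
# pipefail: a failing mysqldump is not masked by gzip's exit status 0
set -euo pipefail
trap 'echo "Backup FAILED at line $LINENO" >&2' ERR
BACKUP_DIR="/backup/mysql/$(date +%Y%m%d)"
DUMP="$BACKUP_DIR/full_dump.sql.gz"
mkdir -p "$BACKUP_DIR"

# --master-data=2 is spelled --source-data=2 on MySQL 8.0.26+.
# No --compress: it only compresses client/server traffic; gzip shrinks the file.
mysqldump \
  --single-transaction \
  --routines \
  --events \
  --triggers \
  --master-data=2 \
  --all-databases \
  | gzip -6 > "$DUMP"

# Verify without executing anything. Never pipe an --all-databases dump into a
# live server as a "test": it contains CREATE DATABASE, USE and DROP TABLE statements.
gzip -t "$DUMP"                                          # archive is not truncated
HEADER=$( { zcat "$DUMP" || true; } | head -n 5 )         # || true: ignore SIGPIPE
grep -qE '^-- (MySQL|MariaDB) dump' <<< "$HEADER"         # looks like mysqldump output
zcat "$DUMP" | tail -n 1 | grep -q '^-- Dump completed'   # dump ran to the end
echo "Backup completed: full_dump.sql.gz ($(du -h "$DUMP" | cut -f1), verified OK)"

# Rotate to a fresh binlog, then copy every closed binlog in raw form
mysql -e "FLUSH BINARY LOGS;"
mysql -N -e "SHOW BINARY LOGS;" | awk '{print $1}' | head -n -1 | while read -r log; do
  [ -e "$BACKUP_DIR/$log" ] || cp "/var/lib/mysql/$log" "$BACKUP_DIR/"
done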

Output

Backup completed: full_dump.sql.gz (342 MB, verified OK)

⚠ The --single-transaction Trap

This option only works with transactional storage engines like InnoDB. If you have MyISAM tables (mixed engine environment), those tables will still be locked globally. Check engine types first:

SELECT ENGINE, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA NOT IN ('sys','performance_schema','information_schema') AND ENGINE != 'InnoDB';

📊 Production Insight

Missing --single-transaction locks writes for 30 minutes — our trading platform's orders stopped.

Forgot --routines? Restored database had no stored procedures; the application crashed with 'does not exist'.

Dumps without --master-data=2 make point-in-time recovery impossible — you don't know the exact binlog position.

Rule: automate these flags in your backup script and review them quarterly.

🎯 Key Takeaway

Always use --single-transaction, --routines, --events, --triggers, and --master-data=2 (--source-data=2 on MySQL 8.0.26+).

Default mysqldump is not production-safe. Never run it without these.

Verify the dump: check gzip integrity, the header, and the '-- Dump completed' trailer, then test-restore it on a separate instance.


pg_dump: What Seniors Know About PostgreSQL Backups

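PostgreSQL's logical backup tool pg_dump is more flexible than mysqldump but has its own sharp edges. Consistency isn't one of them: pg_dump runs inside a single REPEATABLE READ transaction, so the dump is a consistent snapshot even while other sessions keep writing. The sharp edges are elsewhere: pg_dump covers one database at a time and skips cluster-wide objects such as roles and tablespaces, which you have to capture separately with pg_dumpall --globals-only.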

Critical flags:
- -j N (parallel jobs): Dumps N tables concurrently and works only with the directory format (-Fd). The default is 1. For databases with several large tables, -j 4 can cut dump time by 60-70%, but a single huge table is still dumped by one worker, and parallel dumps put more load on the server (pg_dump opens N+1 connections).
- --schema-only / --data-only: Useful when you need to preserve structure separately from data, e.g., for change management.
- --format=directory: Writes each table's data to a separate, individually compressed file inside a directory. It is the only format that supports parallel dumps; like the custom format, it also supports parallel restore (pg_restore -j) and restoring single tables (pg_restore -t).
- --no-owner / --no-privileges: Essential when restoring on a different server where the original roles may not exist.
- --snapshot: Dumps from a snapshot previously exported with pg_export_snapshot(), so several dumps (e.g., schema and data) see the same consistent state.

Point-in-time recovery prerequisite: pg_dump cannot give you PITR, and WAL cannot be replayed on top of a database restored from a pg_dump. PITR needs a physical base backup (e.g., pg_basebackup) plus continuous WAL archiving via archive_mode = on and archive_command; together they allow restoring to any second covered by the archive.
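A sketch of the archiving side, written as a shell helper so the settings live in one include file. The layout is an assumption (Debian/Ubuntu packages include `conf.d/` from `postgresql.conf`; elsewhere you add an `include_dir` yourself), `write_archiving_conf` is a hypothetical name, and the `archive_command` is the simple copy recipe from the PostgreSQL documentation. Changing `archive_mode` requires a server restart.

```bash
#!/usr/bin/env bash
# write_archiving_conf CONF_DIR ARCHIVE_DIR: write WAL archiving settings to
# CONF_DIR/archiving.conf (CONF_DIR must be included by postgresql.conf).
write_archiving_conf() {
  local conf_dir="$1" archive_dir="$2"
  mkdir -p "$conf_dir"
  cat > "$conf_dir/archiving.conf" <<EOF
wal_level = replica
archive_mode = on
# refuse to overwrite an existing segment, then copy it into the archive
archive_command = 'test ! -f ${archive_dir}/%f && cp %p ${archive_dir}/%f'
# force a segment switch (and so an archive) at least every 60 s under write load
archive_timeout = 60
EOF
}
```

Usage: `write_archiving_conf /etc/postgresql/16/main/conf.d /var/lib/postgresql/16/archive`, restart PostgreSQL, then take a fresh pg_basebackup; WAL archived before that base backup cannot help it.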

pg_backup_prod.shBASH

#!/bin/bash
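# io.thecodeforge.pg_backup: production Postgres logical backup with parallel workers
set -euo pipefail
DB_NAME="io_thecodeforge"
BACKUP_DIR="/backup/pg/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Parallel dump (4 jobs); -j works only with directory format (-Fd)
pg_dump -h localhost -U backup_user -j 4 -Fd -f "$BACKUP_DIR/dump" "$DB_NAME"

# The archive's table of contents must be readable
pg_restore --list "$BACKUP_DIR/dump" > /dev/null

# Verify the dump by restoring a single table to a throwaway DB
dropdb --if-exists verify_backup
createdb verify_backup
if pg_restore -d verify_backup -t settings "$BACKUP_DIR/dump"; then
  echo "Restore test passed"
else
  echo "Restore test failed" >&2
  dropdb verify_backup
  exit 1
fi
dropdb verify_backup

# Copy archived WAL alongside. WAL can only be replayed on top of a
# pg_basebackup base backup, never on top of this logical dump.
rsync -a /var/lib/postgresql/16/archive/ "$BACKUP_DIR/wal/"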

Output

pg_dump: 8 tables dumped in parallel (12.3s total)

Restore test passed

Mental Model

Logical dump is like copying a library by rewriting every book by hand

pg_dump reads every row and writes it back out as SQL (COPY blocks by default, INSERT statements with --inserts), and a restore has to rebuild every index. pg_basebackup, by contrast, copies a 50 GB data directory at disk speed, often in minutes; pg_dump plus a restore of the same database could take hours.

pg_dump is fine for databases up to about 100 GB; beyond that, use pg_basebackup
Parallel jobs help but increase server load — monitor CPU and I/O
For PITR, pg_dump is not enough: you need a pg_basebackup base backup plus WAL archiving configured before the incident

📊 Production Insight

Parallel dump with -j 8 on a 32-core server dropped our query performance by 15% during peak hours. We throttle to -j 4.

The default plain format creates one huge SQL file: restoring a single table means digging it out of the entire dump. Switch to custom or directory format, where pg_restore -t can pick out one table.

WAL archiving must be set up before you need it: WAL that was never archived cannot be recovered after the fact.

Rule: test parallel jobs on a staging environment first, and always use directory format for production.

🎯 Key Takeaway

Use directory format (-Fd) for production backups: it is the only format that supports parallel dump, and like custom format it enables selective and parallel restore.

Add -j N to parallelize the dump, but don't oversubscribe your CPU.

pg_dump alone cannot do PITR: set up continuous WAL archiving and regular pg_basebackup base backups today.

pg_dump Format Selection

If you want a single compressed file → use -Fc (custom format): supports selective restore (pg_restore -t) and parallel restore (pg_restore -j)

If you want parallel dump plus parallel or selective restore → use -Fd (directory format): best for production, and the only format that works with pg_dump -j

If you want plain, portable SQL text → use the default (-Fp), restored with psql: only for schema-only or tiny datasets

If you need to restore a single table from a large dump → use custom or directory format with pg_restore -t; a plain SQL dump has no table of contents, so the table must be extracted by hand

Automating Backups: The Cron Script That Won't Fail Silently

Most backup strategies fail not because of bad tools, but because the automation is brittle. A cron job that silently stops running, or a script that returns exit code 0 even when the backup file is empty, is worse than no backup — it gives false confidence.

Build a robust automated backup pipeline:
1. Pre-checks: Before starting a backup, verify that free disk space (df -h) is above a threshold, the database is reachable (mysqladmin ping or pg_isready), and no other backup is running (lock file).
2. Backup execution: Use the commands from earlier sections with proper flags. Redirect stdout/stderr to a log file.
3. Post-check: Verify the backup immediately. For logical dumps, check compression integrity (gzip -t), the header, and the completion marker, then restore into a scratch instance; never feed a full dump into a live server as a test, because it contains CREATE DATABASE, USE, and DROP TABLE statements. For physical backups, check file size and timestamps, and run pg_verifybackup (PostgreSQL 13+) on plain-format pg_basebackup output.
4. Rotation & retention: Keep daily backups for 7 days, weekly for 4 weeks, monthly for 12 months. Delete old backups automatically, but keep the latest full backup in a separate location.
5. Alerting: If the backup or the verification step fails, send an alert to PagerDuty, email, or Slack. Do not rely on cron's built-in mail; it's often not configured.
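Steps 1 and 4 of this pipeline can be sketched in plain bash. The helper names (`has_free_space`, `acquire_backup_lock`, `keep_backup`) are hypothetical, and the sketch assumes GNU coreutils `date`, util-linux `flock`, backup directories named `YYYYMMDD`, Sunday as the weekly backup, and the 1st of the month as the monthly one.

```bash
#!/usr/bin/env bash
# has_free_space DIR MIN_GB: succeed if the filesystem holding DIR has MIN_GB free
has_free_space() {
  local dir="$1" min_gb="$2" avail_kb
  # df -Pk: POSIX layout in 1K blocks; field 4 of the data line is "Available"
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
  (( avail_kb >= min_gb * 1024 * 1024 ))
}

# acquire_backup_lock FILE: take an exclusive, non-blocking lock on fd 9.
# The fd stays open in the calling shell, so the lock lasts until it exits.
acquire_backup_lock() {
  exec 9> "$1"
  flock -n 9
}

# keep_backup TODAY DAY (both YYYYMMDD): succeed if DAY's backup is retained:
# daily for 7 days, Sundays for 4 weeks, the 1st of the month for 12 months.
keep_backup() {
  local today="$1" day="$2" age
  # -u keeps day counts exact across DST changes
  age=$(( ( $(date -u -d "$today" +%s) - $(date -u -d "$day" +%s) ) / 86400 ))
  if (( age < 7 )); then return 0; fi
  if (( age < 28 )) && [ "$(date -u -d "$day" +%u)" = 7 ]; then return 0; fi
  if (( age < 365 )) && [ "$(date -u -d "$day" +%d)" = 01 ]; then return 0; fi
  return 1
}
```

Usage: start the backup script with `has_free_space /backup 200 && acquire_backup_lock /run/db_backup.lock || exit 1`, and prune with a loop over `/backup/mysql/*/` that deletes every directory for which `keep_backup "$(date +%Y%m%d)" "$(basename "$dir")"` fails.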

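The cron schedule: For most workloads, run a full backup daily during off-peak hours (e.g., 2 AM). For MySQL with high transaction volume, ship binary logs to a remote server at least every 5 minutes, or stream them continuously with mysqlbinlog --read-from-remote-server --raw --stop-never. For PostgreSQL, set archive_timeout = 60 in postgresql.conf to force a WAL segment switch, and therefore an archive, at least once a minute whenever there is write activity.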

Pro tip: Use a tool like autobackup.py or a simple bash script with error handling. Avoid complex frameworks unless you have a dedicated team to maintain them.

/etc/cron.d/db_backupBASH

# io.thecodeforge.cron — database backup scheduling
# Daily full backup at 2:00 AM
0 2 * * * root /opt/scripts/mysql_backup_prod.sh > /var/log/db_backup.log 2>&1 || /usr/local/bin/critical_alert "MySQL backup failed"
# Verify yesterday's backup integrity at 3:30 AM (staggered)
30 3 * * * root /opt/scripts/verify_backup.sh yesterday > /var/log/db_verify.log 2>&1 || /usr/local/bin/critical_alert "Backup verification failed"
# Healthcheck: alert if no backup file is newer than 26 hours
0 * * * * root /opt/scripts/check_backup_age.sh /backup/mysql/ 26 || /usr/local/bin/critical_alert "Backup age exceeded threshold"

Output

cron job installed, logs to /var/log/db_backup.log
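The cron file calls `check_backup_age.sh` without showing it. Here is one hypothetical way to write its core as a function, assuming GNU `find` (for `-printf`) and `touch`: it finds the newest non-empty file under the directory and fails if there is none or it is older than the threshold. Counting only non-empty files matters, because a zero-byte dump is exactly the silent failure this check exists to catch.

```bash
#!/usr/bin/env bash
# check_backup_age DIR MAX_HOURS: succeed only if DIR contains a non-empty
# file modified within the last MAX_HOURS hours.
check_backup_age() {
  local dir="$1" max_hours="$2" newest mtime age_h
  # "<mtime epoch>.<fraction> <path>" for every non-empty file, newest last
  newest=$(find "$dir" -type f -size +0c -printf '%T@ %p\n' 2>/dev/null | sort -n | tail -n 1) || true
  if [ -z "$newest" ]; then
    echo "no non-empty backup files under $dir" >&2
    return 1
  fi
  mtime=${newest%%.*}                       # whole seconds of the mtime
  age_h=$(( ( $(date +%s) - mtime ) / 3600 ))
  if (( age_h > max_hours )); then
    echo "newest backup is ${age_h}h old: ${newest#* }" >&2
    return 1
  fi
}
```

As a script, the body would simply be `check_backup_age "$1" "$2"`, matching the cron line's `check_backup_age.sh /backup/mysql/ 26`.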

⚠ The Silent Failure Pattern

A backup script that runs successfully but produces an empty file because the database was down at the time creates a false sense of security. Always check that the output file is non-empty and has a valid header. For mysqldump, the first lines should contain '-- MySQL dump' ('-- MariaDB dump' on MariaDB). For pg_dump custom format, use pg_restore -l to list contents.
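A header check along those lines can be sketched as a small classifier. `dump_kind` is a hypothetical helper; it relies on two facts: mysqldump output begins with a `-- MySQL dump` comment (`-- MariaDB dump` on MariaDB), and pg_dump custom-format archives begin with the magic bytes `PGDMP`. It complements `pg_restore -l` rather than replacing it.

```bash
#!/usr/bin/env bash
# dump_kind FILE: print "mysqldump", "pg_custom", "empty" or "unknown";
# only the first two return success.
dump_kind() {
  local file="$1" top
  if [ ! -s "$file" ]; then
    echo "empty"
    return 1
  fi
  # Custom-format archives are not gzip-wrapped: check the first five bytes
  if head -c 5 "$file" | LC_ALL=C grep -q '^PGDMP'; then
    echo "pg_custom"
    return 0
  fi
  # Plain or gzipped SQL; "|| true" ignores the SIGPIPE gzip gets when head exits
  top=$( { gzip -dcf "$file" 2>/dev/null || true; } | head -n 5 )
  if grep -qE '^-- (MySQL|MariaDB) dump' <<< "$top"; then
    echo "mysqldump"
    return 0
  fi
  echo "unknown"
  return 1
}
```

Usage: `kind=$(dump_kind "$DUMP") || critical_alert "unrecognised backup: $kind"`.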

📊 Production Insight

We learned the hard way: cron job was running but the backup script had a permission denied error on the output directory. Because stdout was redirected to /dev/null (from a copy-paste mistake), no log was generated. We discovered after 10 days of missing backups.

Rule: every backup script must write to a log file, log rotation must be separate, and a healthcheck must check backup file age and size.

Use 'set -euo pipefail' at the top of every backup script to catch early errors.

🎯 Key Takeaway

Automate backups with cron, but add pre-checks, post-verification, and alerting.

A silent failure is the most dangerous: your backup is a lie until proven restorable.

Always log to a file and monitor backup age with a separate healthcheck.

Restore Under Pressure: The Procedure That Saves Your Career

Restores happen under pressure. The system is down, stakeholders are watching, and your hands are shaking. A well-documented restore procedure is your lifeline. The number one rule: never restore directly onto the production server unless you have no other choice. Always restore to a separate instance first to validate the backup.

Restore steps for logical backup:
1. Spin up a clean instance with the same version and configuration.
2. Transfer the backup file (if not already local).
3. For MySQL: mysql -u root -p < full_dump.sql, or for a compressed file: zcat full_dump.sql.gz | mysql -u root -p.
4. For PostgreSQL: pg_restore -d new_db -j 4 -Fd backup_directory, or for plain SQL: psql -d new_db -f backup.sql.
5. Verify the data: run SELECT count(*) on critical tables and test application queries.
6. If the checks pass, point the application at the new database (or promote the restored instance).

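Restore steps for physical backup (XtraBackup):
1. Prepare the backup: xtrabackup --prepare --target-dir=/backup/xtra.
2. Stop MySQL, empty the data directory, and copy the prepared backup into it with xtrabackup --copy-back --target-dir=/backup/xtra; then fix ownership (chown -R mysql:mysql /var/lib/mysql).
3. Start MySQL; check the error log.
4. Apply binary logs from the position recorded in xtrabackup_binlog_info up to the desired point in time.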

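Restore steps for physical backup (pg_basebackup):
1. Stop PostgreSQL and put the base backup in place as the data directory.
2. Set restore_command in postgresql.conf (or postgresql.auto.conf) to fetch WAL from the archive. On PostgreSQL 12 and later, also create an empty recovery.signal file in the data directory; recovery.conf only exists up to version 11.
3. Set a recovery target if needed: recovery_target_time = '2026-04-22 14:30:00'.
4. Start PostgreSQL; it replays WAL until the target.
5. Run SELECT pg_is_in_recovery();. With a target set and the default recovery_target_action = 'pause', it returns true while replay is paused at the target: check the data, then run SELECT pg_wal_replay_resume(); to end recovery and open the database for writes.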
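Steps 2 and 3 for PostgreSQL 12+ can be scripted so nobody hand-edits files at 3 AM. A minimal sketch: `configure_pitr` is a hypothetical helper, the archive path and target time are placeholders, and it assumes the server is stopped while the files are written (postgresql.auto.conf is normally managed by ALTER SYSTEM).

```bash
#!/usr/bin/env bash
# configure_pitr DATA_DIR ARCHIVE_DIR TARGET_TIME (PostgreSQL 12+, server stopped)
configure_pitr() {
  local data_dir="$1" archive_dir="$2" target="$3"
  cat >> "$data_dir/postgresql.auto.conf" <<EOF
restore_command = 'cp ${archive_dir}/%f "%p"'
recovery_target_time = '${target}'
recovery_target_action = 'pause'
EOF
  # An empty recovery.signal file is what puts the server into targeted recovery
  touch "$data_dir/recovery.signal"
}
```

Usage: `configure_pitr /var/lib/postgresql/16/main /var/lib/postgresql/16/archive '2026-04-22 14:30:00'`, then start PostgreSQL as in step 4.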

Test your restores routinely. Schedule a monthly restore drill where you restore the latest backup to a test environment and run a script that validates row counts from a known checkpoint. Without testing, you're gambling.

restore_drill.shBASH

#!/bin/bash
# io.thecodeforge.restore_drill — monthly restore validation
set -euo pipefail

# Dedicated scratch server (placeholder hostname). The dump was taken with
# --all-databases and contains CREATE DATABASE/USE statements, so it restores
# into the original database names: never point this at production.
DRILL_HOST="restore-drill.internal"
MYSQL=(mysql -h "$DRILL_HOST" -u root)

RESTORE_DIR="/tmp/restore_drill_$(date +%s)"
mkdir -p "$RESTORE_DIR"
trap 'rm -rf "$RESTORE_DIR"' EXIT   # clean up even if a step fails
cd "$RESTORE_DIR"

# Fetch latest backup from remote storage
aws s3 cp s3://backups/mysql/latest/full_dump.sql.gz .

# Start from a clean slate on the scratch server
"${MYSQL[@]}" -e "DROP DATABASE IF EXISTS io_thecodeforge;"

# Restore; with pipefail, a failure in zcat or mysql aborts the drill
zcat full_dump.sql.gz | "${MYSQL[@]}" 2>&1 | tee restore.log

# Verify (MySQL names are db.table; there is no three-part name)
ROWS=$("${MYSQL[@]}" -N -e "SELECT COUNT(*) FROM io_thecodeforge.users;")
if [ "$ROWS" -gt 1000 ]; then
  echo "Restore validation passed: $ROWS rows restored."
else
  echo "Restore validation FAILED: expected >1000 rows, got $ROWS"
  exit 1
fi

Output

Restore validation passed: 15473 rows restored.

Mental Model

The Fire Drill Principle

You don't practice fire drills while the building is ablaze. Restore drills are the same: slow, methodical, documented. When the real fire comes, muscle memory takes over.

Document the restore procedure step by step, with exact commands
Include environment variables, paths, and which backup file to use
Practice once a month — rotate the person running the drill
After each restore, write a one-paragraph retrospective

📊 Production Insight

Our first production restore took 6 hours because we didn't know which binary log position matched the dump. We had to scan the backup file header for 'MASTER_LOG_POS' manually.

Rule: store the backup metadata (binlog position, WAL file name) in a separate file next to the backup.
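For logical MySQL backups, that rule can be automated: the binlog coordinates that `--master-data=2` writes into the dump header can be copied into a small metadata file right after the dump finishes. A sketch with a hypothetical helper name and `.meta` format; it matches both the `CHANGE MASTER TO` form and the `CHANGE REPLICATION SOURCE TO` form, in case your MySQL version writes the newer spelling.

```bash
#!/usr/bin/env bash
# save_binlog_coords DUMP: write DUMP.meta with the binlog file and position
# recorded in the dump header. Works on plain or gzipped dumps.
save_binlog_coords() {
  local dump="$1" line file pos
  # grep -m 1 stops at the first match near the top of the dump; "|| true"
  # ignores the SIGPIPE gzip receives when grep exits early
  line=$( { gzip -dcf "$dump" 2>/dev/null || true; } \
    | grep -m 1 -E 'CHANGE (MASTER|REPLICATION SOURCE) TO' ) || {
    echo "no binlog coordinates in $dump" >&2
    return 1
  }
  file=$(sed -E "s/.*(MASTER|SOURCE)_LOG_FILE='([^']+)'.*/\2/" <<< "$line")
  pos=$(sed -E 's/.*(MASTER|SOURCE)_LOG_POS=([0-9]+).*/\2/' <<< "$line")
  printf 'binlog_file=%s\nbinlog_pos=%s\n' "$file" "$pos" > "$dump.meta"
}
```

During a restore, `source full_dump.sql.gz.meta` then gives you `$binlog_file` and `$binlog_pos` for `mysqlbinlog --start-position` without opening the dump.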

The second time took 27 minutes because we had a documented procedure and had practiced it twice.

Rule: a one-page restore document beats a thousand-word book.

🎯 Key Takeaway

Never restore directly onto production — validate on a separate instance first.

Document every step of the restore process with exact commands.

Run monthly restore drills: an untested backup is the one that fails when you need it most.

Point-in-Time Recovery: Your Undo Button for the Entire Database

Point-in-time recovery (PITR) isn't a feature. It's your career insurance. The difference between restoring a full backup from midnight and recovering to 2:47 PM — right before that DELETE without a WHERE clause ran — is a matter of minutes, not days.

Every major database supports it. MySQL binlogs, PostgreSQL WAL archives, SQL Server transaction logs. They all record every write operation in sequence. PITR replays those logs up to a specific timestamp or LSN. You don't need a separate backup for every minute. You need one full backup plus all the logs generated after it.

The gotcha: you must have the logs. No logs, no PITR. That means log archiving must be configured before the incident. Not after. And retention has to reach back to your last usable full backup: if your binlog retention is 24 hours but your last full backup is from Sunday, a table dropped on Friday afternoon cannot be recovered to the minute, because the logs that bridge Sunday's backup to Friday are already gone. Set retention to cover your worst-case recovery scenario, not your average Tuesday.

Senior move: test PITR quarterly. Restore a copy of production to a staging server and roll forward to five minutes before a known event. If it takes longer than your RTO allows, you need faster storage or more frequent full backups (so there is less log to replay). Either way, you find out before the CEO asks why the dashboard is showing yesterday's numbers.

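MysqlPITR.shBASH

# io.thecodeforge: database tutorial

# Step 1: Enable binlogs (my.cnf)
#   log_bin = /var/log/mysql/mysql-bin.log
#   binlog_expire_logs_seconds = 604800   # 7 days (expire_logs_days = 7 on MySQL 5.7)
#   binlog_format = ROW

# Step 2: Full backup (daily, 02:00); --master-data=2 records the binlog coordinates
mysqldump --single-transaction --all-databases \
  --master-data=2 > /backup/full_$(date +%Y%m%d).sql

# Step 3: Locate the crime in the binlog; the "# at" offset comes BEFORE the statement
mysqlbinlog /var/log/mysql/mysql-bin.000045 \
  | grep -B 10 'DROP TABLE `payments`'

# Output example:
#   # at 1843672
#   #250317  2:47:15 server id 1  end_log_pos 1843720 ... Query
#   DROP TABLE `payments` /* generated by server */

# Step 4: Restore the dump, then replay binlogs starting at the dump's coordinates
COORDS=$(grep -m 1 -E 'CHANGE (MASTER|REPLICATION SOURCE) TO' /backup/full_20250317.sql)
START_FILE=$(sed -E "s/.*(MASTER|SOURCE)_LOG_FILE='([^']+)'.*/\2/" <<< "$COORDS")
START_POS=$(sed -E 's/.*(MASTER|SOURCE)_LOG_POS=([0-9]+).*/\2/' <<< "$COORDS")
mysql < /backup/full_20250317.sql
# --start-position applies to the first file listed; stop 2 minutes before the disaster
# (or use --stop-position=1843672, which applies to the last file, to stop exactly before it)
mysqlbinlog --start-position="$START_POS" --stop-datetime='2025-03-17 02:45:00' \
  $(ls /var/log/mysql/mysql-bin.[0-9]* | awk -v f="/var/log/mysql/$START_FILE" '$0 >= f') \
  | mysql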

Output

Restored to timestamp 2025-03-17 02:45:00.

0 rows affected. 1,234 transactions replayed.

Database 'payments' contains 87,654 rows.

Expected row count at incident time: 87,654.

Verification complete. No data loss.

⚠ Production Trap:

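mysqlbinlog without a stop point replays everything, including the DROP you are trying to undo; without --start-position, it re-applies changes the dump already contains. Always take --start-position from the dump header and set --stop-datetime or --stop-position for the target, and never pipe a multi-gigabyte log to your terminal. Stare at the wall for 30 minutes while it runs, or be fired. Your choice.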

🎯 Key Takeaway

PITR requires logs first, backups second. Keep binlog/WAL retention longer than the age of the oldest full backup you might restore from, plus a margin. Test the restore path quarterly.

Restore to a Different Server: Why You Never Overwrite Production

Restoring over the original database is the cardinal sin of database recovery. It turns a single-point failure into a total loss. If the restore fails halfway, you've nuked the source and the target. Now you have no database at all.

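The rule: always restore to a new server or database instance. Every major database supports this. PostgreSQL: create an empty database with createdb (e.g., from template0) and pg_restore into it. MySQL: load the dump into a new database name, as long as the dump wasn't taken with --databases or --all-databases, which embed their own CREATE DATABASE and USE statements. SQL Server: RESTORE DATABASE under a new name, using WITH MOVE to give the data and log files new paths. Azure SQL Database forces it: a point-in-time restore always creates a new database, so you can't overwrite an existing one. That's not a limitation, it's a safety feature.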

Why this matters beyond safety: you can verify the restored data before cutting over. Run integrity checks. Compare row counts. Validate that your report queries still work. If something is broken, you don't cascade the corruption back into production. You fix the restore process, drop the bad copy, and try again.

Senior shortcut: script the rename-and-switch. Restore to 'payments_restored', run checks, then rename 'payments' to 'payments_old' and 'payments_restored' to 'payments'. Atomic: in PostgreSQL both renames commit in one transaction, once clients are disconnected from both databases. Reversible. If the checks fail, you just drop the restored copy and production stays untouched. This is the difference between a 15-minute recovery and a 3-hour firefight with your CTO on the bridge.

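PgRestoreNewServer.sql

```bash
# Backup production (already exists from cron)
pg_dump -h prod-db-01 -U dba \
  --format=custom --compress=9 \
  --file=/backup/prod_payments.dump \
  payments

# Restore into a new database on the SAME cluster as production
# (the rename swap below only works within one cluster)
createdb -h prod-db-01 -U dba -T template0 payments_restored
pg_restore -h prod-db-01 -U dba -d payments_restored \
  --no-owner --no-privileges \
  /backup/prod_payments.dump

# Verify: row count sanity check
psql -h prod-db-01 -U dba -d payments_restored \
  -c "SELECT count(*) FROM orders WHERE created_at > now() - interval '1 day';"
# Output:
#   count
# --------
#  45231

# Compare with production (read-only). Production kept taking writes after
# the dump, so expect its count to be equal or slightly higher.
psql -h prod-db-01 -U dba -d payments \
  -c "SELECT count(*) FROM orders WHERE created_at > now() - interval '1 day';"

# Swap in one transaction from a maintenance database (you can't rename the
# database you are connected to). RENAME fails while other sessions use either
# database, so terminate them first; pause apps that reconnect aggressively.
psql -h prod-db-01 -U dba -d postgres -v ON_ERROR_STOP=1 <<'EOF'
BEGIN;
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
 WHERE datname IN ('payments', 'payments_restored') AND pid <> pg_backend_pid();
ALTER DATABASE payments RENAME TO payments_old;
ALTER DATABASE payments_restored RENAME TO payments;
COMMIT;
EOF
```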

Output

```
Row counts match: 45231.
Database swap completed in 0.8 seconds.
Sessions on 'payments' were disconnected only for the duration of the swap.
Old database 'payments_old' retained for 7 days.
Rollback script available: rename back if incident detected.
```

💡Senior Shortcut:

Use PostgreSQL's pg_restore --list and --use-list to pick individual tables or schemas from a full backup. No need to restore 2TB when you only need the 'invoices' table. Master this and you're the hero who restored the CFO's data while the DBA team was still planning the full restore.
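The --list / --use-list workflow can be sketched like this, assuming the TOC layout recent pg_restore versions print for each entry (`<id>; <tableoid> <oid> <TYPE> <schema> <name> <owner>`). `toc_for_table` is a hypothetical helper; it keeps only the TABLE and TABLE DATA entries for one table, so indexes, constraints, and triggers are deliberately left out. Add their TOC lines by hand if you need them.

```bash
#!/usr/bin/env bash
# toc_for_table SCHEMA TABLE < full.toc > table.toc
# Keeps the TOC entries that create and load one table; comment lines
# (starting with ';') and all other objects are dropped.
# SCHEMA and TABLE are used inside a regex, so avoid regex metacharacters.
toc_for_table() {
  local schema=$1 table=$2
  grep -E "^[0-9]+; [0-9]+ [0-9]+ (TABLE|TABLE DATA) ${schema} ${table}( |$)"
}

# Usage:
#   pg_restore --list /backup/prod_payments.dump > full.toc
#   toc_for_table public invoices < full.toc > invoices.toc
#   pg_restore -U dba -d payments_restored --no-owner \
#     --use-list=invoices.toc /backup/prod_payments.dump
```

Editing the TOC file is the flexible route; for a single table, `pg_restore -t invoices` does a similar job without the intermediate file.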

🎯 Key Takeaway

Never restore in-place. Use a new database name, verify integrity, then swap atomically. Keep the old name as a rollback safety net.

Compression and Encryption: Why Your Backup Pipeline Must Be a Safe, Slim Pipeline

You can dump all day, but if you're copying raw 50GB SQL files across the wire, you deserve the bandwidth bill. Compress before you transfer, encrypt before you store. Your backup script is not just a copy job—it's a data exfiltration risk if left unencrypted.

Use gzip on the fly with mysqldump: mysqldump ... | gzip > dump.sql.gz. For encryption, pipe through openssl or use GPG. PostgreSQL allows compressed dumps natively with pg_dump -Z 9. Don't skip integrity checks. Compute SHA256 checksums on the encrypted file and store them separately. Verifying checksums before restore is what separates you from the junior who restores a corrupted archive at 3 AM.

Test your decryption pipeline. If you can't decrypt your own backup in a dry run, it's not a backup—it's a paperweight.
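A minimal pre-restore check, assuming the checksum file was written by `sha256sum` and the archive is gzip output wrapped in GPG, as in the pipeline example. Both function names are hypothetical. `dry_run_decrypt` streams the decrypted bytes into `gzip -dc` and throws them away, so it exercises the same keys and tools a real restore needs and catches a damaged gzip stream, without writing the plaintext to disk.

```bash
#!/usr/bin/env bash
set -o pipefail   # a failed gpg must fail the whole pipeline

# verify_checksum SUMFILE: re-hash the archive named inside SUMFILE and compare
verify_checksum() {
  local sumfile=$1
  if ! sha256sum --check --status "$sumfile"; then
    echo "checksum mismatch or missing file: $sumfile" >&2
    return 1
  fi
}

# dry_run_decrypt ARCHIVE: decrypt to a pipe and fully decompress it;
# gzip verifies its CRC, and nothing is written to disk
dry_run_decrypt() {
  local archive=$1
  gpg --batch --quiet --decrypt "$archive" | gzip -dc > /dev/null
}
```

Usage: `verify_checksum /backup/myapp_2024-10-27.sha256 && dry_run_decrypt /backup/myapp_2024-10-27.sql.gz.gpg && echo "restorable"`.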

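BackupPipe.sql

```bash
# Compressed, encrypted PostgreSQL backup with integrity check.
# pipefail makes a pg_dump failure fail the whole pipeline instead of
# leaving a tiny encrypted file behind with exit code 0.
set -euo pipefail
OUT=/backup/myapp_$(date +%F).sql.gz.gpg   # compute the date once
pg_dump -Z 9 -h prod-db -U admin myapp \
  | gpg --batch --encrypt --recipient backup-team@corp.com \
  > "$OUT"
sha256sum "$OUT" > "${OUT%.sql.gz.gpg}.sha256"
# Copy the .sha256 file to storage separate from the archive
```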

Output

File: myapp_2024-10-27.sql.gz.gpg

SHA256: a1b2c3d4e5f6... myapp_2024-10-27.sql.gz.gpg

⚠ Production Trap:

If your backup pipeline lacks encryption, you are a single S3 bucket misconfiguration away from a breach notice. Encrypt at rest and in transit—no exceptions.

🎯 Key Takeaway

Always compress and encrypt your backup stream; verify integrity with a checksum stored separately.

Restore to a Staging Environment First: Your Safety Net Before Touching Production

The fastest way to get fired is restoring a backup directly onto production without verifying it first. I've seen a senior dev overwrite a live database with a week-old snapshot because he didn't check the backup timestamp. Don't be that guy.

Always restore to a staging or isolated environment first. Validate the data: row counts, recent timestamps, foreign key integrity, and any business-critical queries. Run a smoke test that exercises your application's core features against the restored data. Only after confirmation do you point production traffic at the fresh copy or migrate the validated snapshot.

Automate this validation. A cron job that restores last night's backup to a staging DB and runs a test suite every morning pays for itself the first time it catches corruption. If you can't restore and verify automatically, your backup strategy is a hope, not a plan.

RestoreToStaging.sql

```sql
-- Run with: psql -X -v ON_ERROR_STOP=1 -d myapp_staging -f RestoreToStaging.sql
-- With ON_ERROR_STOP set, any SQL error aborts the script and psql exits with status 3.

-- Restore to staging (\! runs a shell command; psql does not check pg_restore's exit status)
\! pg_restore -d myapp_staging /backup/myapp_2024-10-27.dump

-- Validate row count
SELECT COUNT(*) FROM orders WHERE order_date >= '2024-10-25';

-- Validate recent login
SELECT MAX(login_at) FROM users;

-- Fail the script if orders is empty: the division by zero raises an error
SELECT CASE WHEN COUNT(*) = 0 THEN 1/0 ELSE 1 END FROM orders;
```

Output

```
 count
-------
 13452

         max
---------------------
 2024-10-27 14:23:01
```

💡Senior Shortcut:

Write a shell script that restores to staging, runs validation queries, and exits non-zero on failure. Wire it to your CI/CD pipeline as a pre-deployment gate.
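A sketch of such a gate, assuming a `myapp_staging` database and the `orders`/`users` tables from the example above; the function names and thresholds are hypothetical placeholders you would tune. The checks are kept separate from the restore so each one reports its own failure and the script still exits non-zero if any of them fails.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deployment gate: restore a dump into staging, run
# validation queries, and return non-zero if anything looks wrong.
set -o pipefail

STAGING_DB=${STAGING_DB:-myapp_staging}

# q SQL: run one query against staging and print the bare value
q() { psql -X -v ON_ERROR_STOP=1 -d "$STAGING_DB" -tAc "$1"; }

# require_min LABEL VALUE MIN: VALUE must be an integer >= MIN
require_min() {
  local label=$1 value=$2 min=$3
  if [[ ! $value =~ ^[0-9]+$ ]] || (( value < min )); then
    echo "FAIL: $label=${value:-<empty>} (expected >= $min)" >&2
    return 1
  fi
  echo "ok: $label=$value"
}

# require_fresh LABEL EPOCH_SECONDS MAX_AGE_SECONDS: timestamp must be recent
require_fresh() {
  local label=$1 ts=$2 max_age=$3 age
  if [[ ! $ts =~ ^[0-9]+$ ]]; then
    echo "FAIL: $label has no usable timestamp" >&2
    return 1
  fi
  age=$(( $(date +%s) - ts ))
  if (( age > max_age )); then
    echo "FAIL: $label is ${age}s old (max ${max_age}s)" >&2
    return 1
  fi
  echo "ok: $label is ${age}s old"
}

# run_gate DUMP_FILE: restore, then run every check; returns 1 if any failed
run_gate() {
  local dump=$1 rc=0
  pg_restore --clean --if-exists --no-owner -d "$STAGING_DB" "$dump" || return 1
  require_min "orders in last 2 days" \
    "$(q "SELECT count(*) FROM orders WHERE order_date >= now() - interval '2 days'")" 1 || rc=1
  require_fresh "newest login" \
    "$(q "SELECT extract(epoch FROM max(login_at))::bigint FROM users")" 86400 || rc=1
  return "$rc"
}
```

In CI or cron, the gate is a single line: `run_gate /backup/myapp_2024-10-27.dump || exit 1`.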

🎯 Key Takeaway

Never restore directly to production. Validate data integrity in a staging environment first, every single time.

Point-in-Time Recovery: PostgreSQL WAL Archiving and MySQL Binlog

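Point-in-time recovery (PITR) is the ultimate safety net for databases, allowing you to restore to any moment before a disaster. In PostgreSQL, PITR relies on Write-Ahead Log (WAL) archiving. Set wal_level = replica (or logical) and archive_mode = on in postgresql.conf (both need a restart), and point archive_command at a safe location, e.g. test ! -f /archive/%f && cp %p /archive/%f, so an existing segment is never overwritten. Take base backups regularly with pg_basebackup. To recover, copy a base backup into an empty data directory, set restore_command (e.g. cp /archive/%f %p) and recovery_target_time, create a recovery.signal file (PostgreSQL 12+; older versions use recovery.conf), and start the server: it replays archived WAL up to the target.

For MySQL, binary logs (binlogs) serve the same purpose. log_bin cannot be changed at runtime: set log_bin = /var/log/mysql/binlog and a server_id in my.cnf, then restart (MySQL 8.0 enables binary logging by default). Control retention with binlog_expire_logs_seconds (MySQL 8.0+; expire_logs_days on older versions), e.g. 604800 for 7 days. After loading the full dump, replay events from the dump's binlog coordinates with mysqlbinlog: mysqlbinlog --start-position=<pos> --stop-datetime="2025-03-15 10:30:00" binlog.000001 | mysql -u root. Always test your PITR procedure regularly to ensure logs are being archived correctly and restores work.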

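pitr-example.sql

```sql
-- PostgreSQL: WAL archiving (superuser; wal_level and archive_mode take effect after a restart)
ALTER SYSTEM SET wal_level = replica;
ALTER SYSTEM SET archive_mode = on;
-- Never overwrite a segment that is already in the archive
ALTER SYSTEM SET archive_command = 'test ! -f /archive/%f && cp %p /archive/%f';

-- PostgreSQL restore (v12+): copy a base backup into an empty data directory,
-- create an empty recovery.signal file there, and set in postgresql.conf:
--   restore_command      = 'cp /archive/%f %p'
--   recovery_target_time = '2025-03-15 10:30:00'

-- MySQL: log_bin is read-only at runtime. Set it in my.cnf and restart:
--   [mysqld]
--   server_id = 1
--   log_bin   = /var/log/mysql/binlog
-- Keep 7 days of binlogs (MySQL 8.0+; older versions use expire_logs_days = 7)
SET PERSIST binlog_expire_logs_seconds = 604800;

-- MySQL PITR restore, run from the shell after loading the full dump.
-- <pos> is MASTER_LOG_POS from the dump header; it applies to the first file listed.
--   mysqlbinlog --start-position=<pos> --stop-datetime="2025-03-15 10:30:00" \
--     /var/log/mysql/binlog.000001 | mysql -u root
```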

📊 Production Insight

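In production, set up automated monitoring for archive gaps: alert when failed_count in pg_stat_archiver increases, and check that the archive has no missing segments. Use pg_waldump or mysqlbinlog to confirm the archived logs actually parse. Keep at least 7 days of logs, or longer if your compliance rules require it.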
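A sketch of the "missing segments" check for a PostgreSQL archive directory. The helper `wal_gaps` is hypothetical and assumes the default 16 MB segment size (256 segments per log id), uncompressed segment files named with 24 hex digits, and a single timeline; compressed archives or clusters with a custom `--wal-segsize` need a different calculation. Pair it with `SELECT failed_count, last_failed_wal, last_failed_time FROM pg_stat_archiver;` for the server-side view.

```bash
#!/usr/bin/env bash
# wal_gaps ARCHIVE_DIR
# Prints the names of WAL segments missing between the oldest and newest
# segment found in ARCHIVE_DIR. Assumptions (hypothetical helper):
#   - default 16 MB segments, i.e. 256 segments per "log" id
#   - uncompressed files named TTTTTTTTLLLLLLLLSSSSSSSS (24 hex digits)
#   - only the first timeline found is checked
wal_gaps() {
  local dir=$1 tli="" prev=-1 path name n i
  for path in "$dir"/*; do                      # glob results are sorted; fixed-width hex sorts in order
    name=${path##*/}
    [[ $name =~ ^[0-9A-F]{24}$ ]] || continue   # skip .history, .backup, .partial, etc.
    if [[ -z $tli ]]; then tli=${name:0:8}; fi
    [[ ${name:0:8} == "$tli" ]] || continue
    n=$(( 16#${name:8:8} * 256 + 16#${name:16:8} ))   # absolute segment number
    if (( prev >= 0 )); then
      for (( i = prev + 1; i < n; i++ )); do
        printf '%s%08X%08X\n' "$tli" $(( i / 256 )) $(( i % 256 ))
      done
    fi
    prev=$n
  done
}
```

Usage: `gaps=$(wal_gaps /archive)` and alert if `$gaps` is non-empty. Anything it prints is a segment a restore would stall on.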

🎯 Key Takeaway

Point-in-time recovery requires continuous archiving of transaction logs (WAL for PostgreSQL, binlog for MySQL) and regular testing to ensure you can restore to any moment.

Physical vs Logical Backup: When to Use Each

Physical backups copy the raw database files (e.g., data directory, WAL segments), while logical backups export data as SQL or delimited text. Physical backups are faster to take and restore for large databases and are the starting point for PITR, but they are tied to the same major server version and platform. Logical backups (e.g., mysqldump, pg_dump) are portable across versions and platforms, but slower. In PostgreSQL, a pg_dump file can't serve as a PITR base, because WAL replay needs a physical base backup. In MySQL, a dump taken with --master-data=2 can be rolled forward with binlogs, as shown in the PITR section, but recovery starts from a slower logical restore.

Use physical backups for large databases (>100GB) where recovery speed matters, and for full disaster recovery. Use logical backups for small databases, schema migrations, or selective table exports. For PostgreSQL, pg_basebackup creates a physical backup, while pg_dump is logical. For MySQL, xtrabackup (physical) vs mysqldump (logical). A common strategy: take daily physical backups for quick recovery and weekly logical backups for portability.
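The size rule of thumb above can be encoded so a backup script picks the method itself. This is a hypothetical helper: the 100 GB default mirrors the threshold in the text, and you would tune it against your measured restore times rather than treat it as fixed.

```bash
#!/usr/bin/env bash
# backup_type_for_bytes SIZE_BYTES [THRESHOLD_GB]
# Prints "physical" when the database is larger than THRESHOLD_GB
# (default 100), otherwise "logical".
backup_type_for_bytes() {
  local bytes=$1 threshold_gb=${2:-100}
  if [[ ! $bytes =~ ^[0-9]+$ || ! $threshold_gb =~ ^[0-9]+$ ]]; then
    echo "usage: backup_type_for_bytes SIZE_BYTES [THRESHOLD_GB]" >&2
    return 1
  fi
  if (( bytes > threshold_gb * 1024 * 1024 * 1024 )); then
    echo physical
  else
    echo logical
  fi
}
```

On PostgreSQL, `pg_database_size` returns bytes, so `backup_type_for_bytes "$(psql -tAc "SELECT pg_database_size('mydb')")"` gives the choice directly.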

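backup-types.sql

```bash
# PostgreSQL physical backup: needs a role with REPLICATION and a replication
# entry in pg_hba.conf; -X stream copies the WAL needed for a consistent restore
pg_basebackup -D /backup/physical -X stream -P -v

# PostgreSQL logical backup (custom format; restore with pg_restore)
pg_dump -Fc mydb > /backup/logical/mydb.dump

# MySQL physical backup using Percona XtraBackup
xtrabackup --backup --target-dir=/backup/physical
# Before restoring, apply the redo log so the copy is consistent:
#   xtrabackup --prepare --target-dir=/backup/physical

# MySQL logical backup
mysqldump --single-transaction --routines --triggers mydb > /backup/logical/mydb.sql
```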

📊 Production Insight

In production, schedule physical backups during low traffic and logical backups on weekends. Always verify backups by restoring to a staging environment.

🎯 Key Takeaway

Choose physical backups for speed and PITR capability on large databases; choose logical backups for portability and selective restores.

Cloud Database Backup: RDS Snapshots, Cloud SQL Backups

Cloud-managed databases like AWS RDS, Google Cloud SQL, and Azure Database offer automated backups, but you must understand their limitations. RDS snapshots are physical, storage-level backups kept in AWS-managed S3, with automated daily snapshots and transaction log retention for PITR (retention up to 35 days). Cloud SQL backups are also automated, with binary log (MySQL) or WAL (PostgreSQL) retention for PITR. However, these backups are tied to the cloud provider: you cannot easily restore them to an on-premises database. Always export logical backups (e.g., mysqldump from RDS, pg_dump from Cloud SQL) for cross-cloud or local recovery. For RDS, use aws rds create-db-snapshot for manual snapshots and aws rds restore-db-instance-from-db-snapshot, which always restores into a new instance. For Cloud SQL, use gcloud sql backups create, and gcloud sql backups restore with --restore-instance pointing at a separate instance, because a restore overwrites the target instance's data. Finally, check where your provider keeps backups: RDS snapshots stay in the instance's region unless you copy them (aws rds copy-db-snapshot) or enable cross-region backup replication, so keep a copy outside the primary region for disaster recovery.

cloud-backup.sh

```bash
# AWS RDS manual snapshot (via AWS CLI)
aws rds create-db-snapshot --db-instance-identifier mydb --db-snapshot-identifier mydb-snapshot-20250315

# Restore RDS from snapshot (creates a new instance; mydb is untouched)
aws rds restore-db-instance-from-db-snapshot --db-instance-identifier mydb-restored --db-snapshot-identifier mydb-snapshot-20250315

# Google Cloud SQL backup (via gcloud)
gcloud sql backups create --instance mydb --async

# Restore Cloud SQL from a backup into a separate, existing instance.
# The restore overwrites the target's data, so never point it at production.
gcloud sql backups list --instance mydb
gcloud sql backups restore 123456 --restore-instance=mydb-restored --backup-instance=mydb
```

📊 Production Insight

In production, automate logical exports to a different cloud region or on-premises storage. Test cross-region restore at least quarterly to ensure DR readiness.
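A sketch of such an export for a managed MySQL instance. `export_logical` and its arguments are hypothetical; it assumes credentials come from `~/.my.cnf` and the AWS CLI's normal configuration, and that the bucket lives in a different region from the database. It dumps to a local temp file first and uploads only if the whole pipeline succeeded, so a failed mysqldump never lands in the bucket looking like a backup.

```bash
#!/usr/bin/env bash
# export_logical HOST DB BUCKET
# Dump one database, gzip it, and upload to s3://BUCKET/DB/YYYY-MM-DD.sql.gz
# only if mysqldump and gzip both succeeded.
set -o pipefail

export_logical() {
  local host=$1 db=$2 bucket=$3 tmp key
  key="$db/$(date +%F).sql.gz"
  tmp=$(mktemp) || return 1
  if ! mysqldump -h "$host" --single-transaction --routines --triggers "$db" | gzip > "$tmp"; then
    echo "dump of $db failed; nothing uploaded" >&2
    rm -f "$tmp"
    return 1
  fi
  if ! aws s3 cp "$tmp" "s3://$bucket/$key"; then
    echo "upload to s3://$bucket/$key failed" >&2
    rm -f "$tmp"
    return 1
  fi
  rm -f "$tmp"
  echo "uploaded s3://$bucket/$key"
}
```

Staging through a temp file costs local disk space; the alternative of streaming straight into `aws s3 cp -` saves disk but can leave a partial object behind when the dump fails midway.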

🎯 Key Takeaway

Cloud database backups are convenient but vendor-specific; always complement them with logical exports for cross-platform recovery.

● Production incident · POST-MORTEM · severity: high

The Silent Backup Blob: When 2 TB of Nothing Saves Your Database

Symptom

Restore from a 2 TB mysqldump file produced an error 20 minutes in: 'ERROR 1064 (42000) at line 1: You have an error in your SQL syntax'. The backup had been completing without errors every night for 21 days.

Assumption

The team assumed that a successful exit code (0) from mysqldump meant the data was valid and restorable. They also assumed the full nightly backup eliminated the need for binary log archiving.

Root cause

A silent disk controller failure on the backup server caused intermittent bit rot in blocks written through a specific SATA port. The writes reported success and nothing read the files back, so mysqldump kept exiting 0 every night while the dump files on disk were being corrupted. Unlike InnoDB data pages, which carry checksums, a plain-text mysqldump file has no built-in integrity check, so nothing flagged the flipped bits until the restore reached a damaged statement and the server rejected it as a syntax error.

Fix

1) Replaced the failing disk controller. 2) Added --single-transaction to mysqldump and a SHA-256 checksum of every dump file, recorded at backup time and verified before restore (mysqldump has no checksum option of its own). 3) Implemented a post-backup verification step using mysql < backup.sql > /dev/null on a staging instance. 4) Enabled binary log retention and started shipping binary logs to an S3 bucket. 5) Instituted monthly full restore drills.

Key lesson

A backup that hasn't been restored is a wish, not a backup.
Exit code 0 from a dump tool is necessary but not sufficient — verify the restore on a test instance.
Always combine logical backups with continuous archiving of transaction logs for point-in-time recovery.

Production debug guide

Common symptoms in production and the exact commands to diagnose them (4 entries)

Symptom · 01

mysqldump completes but restore fails with syntax errors

Fix

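Check the start of the file for binary or truncated content: head -c 100 backup.sql | xxd. If you recorded a checksum at backup time, verify it with sha256sum -c to rule out corruption at rest. Compare the server version in the dump header (the "-- Server version" line) with mysql -e "SELECT @@version". Test restore on a staging DB: mysql test_db < backup.sql and look for error line numbers.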

Symptom · 02

pg_dump takes hours and inflates disk I/O to 100%

Fix

Check if WAL archiving is competing: SELECT * FROM pg_stat_archiver. Use pg_dump -j 4 --format=directory for parallel dump. Monitor iotop and pg_stat_activity for blocking sessions. Consider using pg_basebackup for large databases.

Symptom · 03

Restore from pg_basebackup fails with 'archive not found'

Fix

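Verify that restore_command points to the correct archive location and that the WAL segments are present (set in postgresql.conf together with a recovery.signal file on PostgreSQL 12+, in recovery.conf on older versions). Use pg_waldump on the first segment to check its timeline. If continuous archiving is off, the base backup must have been taken with --wal-method=stream so it carries its own WAL.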

Symptom · 04

Backup size grows unexpectedly large

Fix

Check transaction log retention. MySQL: SHOW BINARY LOGS; are old logs being purged? PostgreSQL: SELECT slot_name, active, restart_lsn FROM pg_replication_slots; is an inactive replication slot preventing WAL cleanup? Binary logs are never written into a mysqldump file (--flush-logs only rotates them), so a growing logical dump means the data itself grew: find the tables responsible.

★ Emergency Backup/Restore Commands

When a backup fails or restore is needed under pressure, run these commands first.

MySQL restore broken after partial restore

Immediate action

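Stop the MySQL service and copy the entire data directory aside. Never move or delete ibdata1 or other tablespace files: InnoDB needs them to find your tables. Then start with `innodb_force_recovery=1` and dump whatever you can.

Commands

systemctl stop mysql && cp -a /var/lib/mysql /var/lib/mysql.bak-$(date +%F)

mysqld --user=mysql --innodb_force_recovery=1 &

mysqldump --all-databases > /backup/salvage_$(date +%F).sql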

Fix now

Rebuild the table using an earlier valid backup or point-in-time recovery with binary logs.

PostgreSQL cannot start after restore because of WAL mismatch

Automated backup cron job hasn't run for days

Backup Comparison: MySQL vs PostgreSQL

| Aspect | MySQL | PostgreSQL |
|---|---|---|
| Logical dump tool | mysqldump (single-threaded; --single-transaction for InnoDB) | pg_dump (parallel jobs with -j, directory format -Fd) |
| Physical backup tool | Percona XtraBackup (hot backup, incremental) | pg_basebackup (full cluster copy, includes WAL) |
| Point-in-time recovery | Binary logs (binlog), replayed with mysqlbinlog | WAL archives; set recovery_target_time in postgresql.conf |
| Parallel restore support | None in mysqldump; use mydumper/myloader for parallel dump/restore | pg_restore -j N with custom or directory format; WAL replay itself is single-threaded |
| Cross-version restore (logical) | Generally compatible; test for syntax differences (e.g., charset) | pg_dump/pg_restore work across major versions (use the newer pg_dump; --no-owner helps); physical backups need the same major version, and pg_upgrade upgrades in place |
| Incremental backup | XtraBackup supports incremental; binlogs can be considered incremental | pg_basebackup --incremental in v17+; WAL archiving is continuous |
| Backup verification built-in | None; a test restore is required | pg_verifybackup for physical backups (v13+); nothing equivalent for pg_dump output, so test-restore it |

⚙ Quick Reference

12 commands from this guide

| File | Command / Code | Purpose |
|---|---|---|
| backup_decision.sh | DB_SIZE_GB=$(du -sb /var/lib/mysql/io_thecodeforge \| awk '{print $1/1073741824}'... | Logical vs Physical Backups |
| mysql_backup_prod.sh | BACKUP_DIR="/backup/mysql/$(date +%Y%m%d)" | mysqldump |
| pg_backup_prod.sh | DB_NAME="io_thecodeforge" | pg_dump |
| /etc/cron.d/db_backup | 0 2 * * * root /opt/scripts/mysql_backup_prod.sh > /var/log/db_backup.log 2>&1 \|... | Automating Backups |
| restore_drill.sh | set -euo pipefail | Restore Under Pressure |
| MysqlPITR.sql | mysqldump --single-transaction --all-databases \ | Point-in-Time Recovery |
| PgRestoreNewServer.sql | pg_dump -h prod-db-01 -U dba \ | Restore to a Different Server |
| BackupPipe.sql | \! pg_dump -Z 9 -h prod-db -U admin myapp \ | Compression and Encryption |
| RestoreToStaging.sql | \! pg_restore -d myapp_staging \ | Restore to a Staging Environment First |
| pitr-example.sql | ALTER SYSTEM SET wal_level = replica; | Point-in-Time Recovery |
| backup-types.sql | pg_basebackup -D /backup/physical -X stream -P -v | Physical vs Logical Backup |
| cloud-backup.sh | aws rds create-db-snapshot --db-instance-identifier mydb --db-snapshot-identifie... | Cloud Database Backup |

Key takeaways

Choose backup type based on database size and recovery SLAs: logical for small/portable, physical for large/fast.

Always use --single-transaction for MySQL InnoDB dumps and -Fd -j for PostgreSQL to avoid blocking and speed up restore.

Automate backups with pre-checks, post-verification, and alerting. A silent failure is the most dangerous.

Never trust a backup until you've tested a restore. Run monthly restore drills on a staging environment.

Continuous WAL/binlog archiving is non-negotiable for point-in-time recovery: set it up before you need it.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Explain the difference between a logical backup and a physical backup. W...

Q02 · SENIOR

How would you automate PostgreSQL backups for a 200 GB database with a r...

Q03 · SENIOR

What is point-in-time recovery (PITR) and how do you implement it in Pos...

Q04 · SENIOR

You receive a corrupted mysqldump file with 2 TB of data. How do you rec...

Q01 of 04 · SENIOR

Explain the difference between a logical backup and a physical backup. When would you use each for a MySQL database in production?

ANSWER

A logical backup exports the data into SQL statements or delimited files (e.g., mysqldump). A physical backup copies the raw database files at the filesystem level (e.g., XtraBackup). Use logical backups for databases under 50 GB, schema migrations, or when you need cross-version portability. Use physical backups for large databases, fast recovery SLAs, and incremental backups. In production, combine logical for granular restore and physical for speed, plus binary log archiving for point-in-time recovery.

FAQ · 5 QUESTIONS

Frequently Asked Questions

Can I restore a MySQL mysqldump backup into PostgreSQL?

How often should I run a full database backup?

What is the difference between cold, hot, and warm backups?

How do I perform a point-in-time recovery (PITR) in MySQL?

Should I compress my backup files?

Naren, Founder & Principal Engineer

20+ years shipping high-throughput database systems. Lessons pulled from things that broke in production.

✓ Verified · production tested · last updated July 18, 2026 · 2,466 articles, all by Naren