MySQL & PostgreSQL Backup Failures — Why Exit Code 0 Lies
A 2TB mysqldump ran 21 days before bit rot made it unrestorable.
- Logical backups (mysqldump, pg_dump) produce SQL text files — portable, cross-version, but slow for large datasets
- Physical backups (XtraBackup, pg_basebackup) copy raw data files — fast, full/incremental, but tied to storage engine
- Production backups must include binary logs (MySQL) or WAL archives (PostgreSQL) for point-in-time recovery
- Automated scripts belong in cron with healthcheck — silent backup failures are the most dangerous
- The biggest mistake: never testing a restore — a backup you haven't restored is a wish
Imagine you spend three months building the world's greatest LEGO castle. A backup is a photo of that castle taken every day — so if your little sibling kicks it over, you can rebuild from yesterday's photo instead of starting from scratch. A restore is the act of rebuilding from that photo. Database backup and restore is exactly that: saving a copy of your data at a point in time, and bringing it back when something goes wrong.
Every developer eventually faces The Moment — a production database gets corrupted, a junior engineer runs DELETE without a WHERE clause, or a cloud disk silently fails. The difference between a bad afternoon and a company-ending catastrophe is whether you had a solid backup strategy before it happened. Backup and restore isn't a nice-to-have; it's the insurance policy your entire application depends on.
The problem is that most tutorials show you a command and call it a day. They don't explain that there are fundamentally different types of backups — logical vs. physical — and that choosing the wrong one for your situation can mean either a 6-hour restore window when you need 20 minutes, or a backup file that's completely unusable. They don't explain why pg_dump and mysqldump exist as separate tools with different philosophies, or when you'd reach for something else entirely.
By the end of this article you'll be able to: create logical and physical backups for both MySQL and PostgreSQL, write an automated backup script that you can drop into a cron job today, restore from a backup under pressure, and know exactly which approach to use for which scenario. Let's build that safety net.
Logical vs Physical Backups — The First Decision
Before you write a single backup script, you need to decide which type of backup fits your workload. Logical backups produce SQL statements or delimited text. Physical backups copy the raw database files.
Logical backups (mysqldump, pg_dump) are the Swiss Army knife of backups. They produce portable archives that you can restore on a different MySQL/PostgreSQL version, different architecture, or even different database engine with minor tweaks. They're ideal for small databases (<50 GB), schema-only backups, or when you need to migrate between environments. The cost: they are slow. A 500 GB MySQL dump can take hours, and restore takes even longer because every row must go through the SQL parser and re-index.
Physical backups (XtraBackup, pg_basebackup) copy raw data files at the filesystem level. They're fast — a 2 TB database can be backed up in under 30 minutes. They support incremental backups and have minimal database impact during operation. The trade-off: they're tightly coupled to the database version, storage engine, and operating system. You cannot restore an InnoDB physical backup into MyISAM, or a PostgreSQL 16 physical backup into version 15 without pg_upgrade.
The reality: in production, you need both. Logical backups for granular restore (single table, specific rows), and physical backups for full database recovery speed. You also need continuous archiving of transaction logs (binary logs for MySQL, WAL for PostgreSQL) for point-in-time recovery — no backup is complete without them.
- Logical = human-readable, portable, slow, good for small databases and migration
- Physical = machine-optimised, fast, tied to version/storage engine, good for large databases
- Both need transaction log archiving for point-in-time recovery — don't skip this
mysqldump: Options That Matter in Production
mysqldump is ubiquitous, but most engineers use it with defaults that are dangerous in production. The default mode locks tables with a global read lock, blocking writes for the entire dump duration. On a busy database, that's a full write outage.
Critical options: - --single-transaction: For InnoDB tables, starts a transaction using MVCC to get a consistent snapshot without locking writes. This is the most important flag for production MySQL. Do not dump a live InnoDB database without it. - --routines --events --triggers: By default, mysqldump does NOT include stored procedures, events, or triggers unless you explicitly add these flags. Missing them means your restore will lose application logic. - --master-data=2: Records the binary log file and position at dump time, essential for replication setup or point-in-time recovery. Also enables --flush-logs implicitly. - --compress: Compresses the output on the fly, saving disk space and network bandwidth without needing a separate gzip step. - --opt: A shortcut for several optimizations (disable foreign key checks, extended inserts) — but it's enabled by default. Good for restore speed, but the extended inserts can cause memory issues on restricted test instances.
Performance consideration: A dump of a 200 GB InnoDB database will take about 30–60 minutes on modern hardware. During that time, the database buffer pool might be flushed by the full table scan. Monitor SHOW ENGINE INNODB STATUS for pending disk reads. For databases over 500 GB, use mydumper (parallel logical dump) or switch to physical backup.
SELECT ENGINE, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA NOT IN ('sys','performance_schema','information_schema') AND ENGINE != 'InnoDB';pg_dump: What Seniors Know About PostgreSQL Backups
PostgreSQL's logical backup tool pg_dump is more flexible than mysqldump but has its own sharp edges. The key difference: pg_dump runs in a READ COMMITTED isolation level by default, not REPEATABLE READ. That means if another transaction modifies data during the dump, you may get an inconsistent snapshot.
Critical flags: - -j N (parallel jobs): Speeds up dump by using N concurrent workers, each dumping a separate table. The default is 1. For databases with large tables, use -j 4 to cut dump time by 60–70%. But beware: parallel dumps put more load on the server. - --schema-only / --data-only: Useful when you need to preserve structure separately from data, e.g., for change management. - --format=directory: Outputs each table as a separate file inside a directory. This enables parallel restore with pg_restore -j. It also compresses each file individually, so you can restore single tables. - --no-owner / --no-privileges: Essential when restoring on a different server where the PostgreSQL user may not exist. - --snapshot: Use a previously exported snapshot ID for consistent backup across multiple dumps (e.g., schema + data).
Point-in-time recovery prerequisite: pg_dump alone cannot give you PITR. You must combine it with continuous WAL archiving via archive_mode = on and archive_command. The base backup (e.g., pg_basebackup) plus WAL segments allows restoring to any second.
- pg_dump is fine for databases up to 100 GB, beyond that use pg_basebackup
- Parallel jobs help but increase server load — monitor CPU and I/O
- For PITR, pg_dump is not enough; you need WAL archiving configured before the backup
-Fc (custom format) — good for small databases, single-file restore-Fd (directory format) — best for production, enables pg_restore -jAutomating Backups: The Cron Script That Won't Fail Silently
Most backup strategies fail not because of bad tools, but because the automation is brittle. A cron job that silently stops running, or a script that returns exit code 0 even when the backup file is empty, is worse than no backup — it gives false confidence.
Build a robust automated backup pipeline: 1. Pre-checks: Before starting a backup, verify disk space (df -h) is above threshold, database is reachable (mysqladmin ping or pg_isready), and no other backup is running (lock file). 2. Backup execution: Use the commands from earlier sections with proper flags. Redirect stdout/stderr to a log file. 3. Post-check: Immediately verify the backup by attempting a partial restore on a temporary database. For logical dumps, run the first 1000 lines through the parser. For physical backups, check the last modified timestamp and file size. 4. Rotation & retention: Keep daily backups for 7 days, weekly for 4 weeks, monthly for 12 months. Delete old backups automatically, but keep the latest full backup in a separate location. 5. Alerting: If the backup fails or the verification step fails, send an alert to PagerDuty, email, or Slack. Do not rely on cron's built-in mail — it's often not configured.
The cron schedule: For most workloads, run a full backup daily during off-peak hours (e.g., 2 AM). For MySQL with high transaction volume, stream binary logs every 5 minutes to a remote server. For PostgreSQL, configure archive_timeout = 60 in postgresql.conf to archive WAL segments at least every minute.
Pro tip: Use a tool like autobackup.py or a simple bash script with error handling. Avoid complex frameworks unless you have a dedicated team to maintain them.
pg_restore -l to list contents.Restore Under Pressure: The Procedure That Saves Your Career
Restores happen under pressure. The system is down, stakeholders are watching, and your hands are shaking. A well-documented restore procedure is your lifeline. The number one rule: never restore directly onto the production server unless you have no other choice. Always restore to a separate instance first to validate the backup.
Restore steps for logical backup: 1. Spin up a clean instance with the same version and configuration. 2. Transfer the backup file (if not already local). 3. For MySQL: mysql -u root -p < full_dump.sql or using a compressed file: zcat full_dump.sql.gz | mysql -u root -p. 4. For PostgreSQL: pg_restore -d new_db -j 4 -Fd backup_directory or for plain SQL: psql -d new_db -f backup.sql. 5. Verify the data: run select count(*) on critical tables, test application queries. 6. If success, point application to new database (or promote restored instance).
Restore steps for physical backup (XtraBackup): 1. Prepare the backup: xtrabackup --prepare --target-dir=/backup/xtra. 2. Copy the prepared backup to the MySQL data directory. 3. Start MySQL; check error log. 4. Apply binary logs from the dump position to the desired point in time.
Restore steps for physical backup (pg_basebackup): 1. Ensure the base backup directory is in place. 2. Edit postgresql.auto.conf or recovery.conf to set restore_command to fetch WAL from archive. 3. Create a recovery target if needed: recovery_target_time '2026-04-22 14:30:00'. 4. Start PostgreSQL; it will automatically replay WAL until the target. 5. Run SELECT to confirm it's up.pg_is_in_recovery();
Test your restores routinely. Schedule a monthly restore drill where you restore the latest backup to a test environment and run a script that validates row counts from a known checkpoint. Without testing, you're gambling.
- Document the restore procedure step by step, with exact commands
- Include environment variables, paths, and which backup file to use
- Practice once a month — rotate the person running the drill
- After each restore, write a one-paragraph retrospective
The Silent Backup Blob: When 2 TB of Nothing Saves Your Database
--checksum and --single-transaction to mysqldump. 3) Implemented a post-backup verification step using mysql < backup.sql > /dev/null on a staging instance. 4) Enabled binary log retention and started shipping WAL-equivalent logs to an S3 bucket. 5) Instituted monthly full restore drills.- A backup that hasn't been restored is a wish, not a backup.
- Exit code 0 from a dump tool is necessary but not sufficient — verify the restore on a test instance.
- Always combine logical backups with continuous archiving of transaction logs for point-in-time recovery.
head -c 100 backup.sql | xxd. Use mysql -e "SELECT @@version" to verify server version compatibility. Test restore on a staging DB: mysql test_db < backup.sql and look for error line numbers.SELECT * FROM pg_stat_archiver. Use pg_dump -j 4 --format=directory for parallel dump. Monitor iotop and pg_stat_activity for blocking sessions. Consider using pg_basebackup for large databases.restore_command in recovery.conf points to the correct location and the WAL segments are present. Use pg_waldump on the first segment to check timeline. Ensure --wal-method=stream was used during base backup if continuous archiving is off.SHOW BINARY LOGS; — are old logs being purged? PostgreSQL: SELECT * FROM pg_stat_replication — is a stale replication slot preventing WAL cleanup? For logical dumps, check if binary logs are included (MySQL --flush-logs).Key takeaways
Common mistakes to avoid
5 patternsUsing mysqldump without --single-transaction on a live InnoDB database
--single-transaction to the mysqldump command to use a transaction snapshot instead of locking tables.Not testing backups by performing a restore
Assuming pg_dump includes WAL information for point-in-time recovery
archive_mode = on and archive_command BEFORE the backup.Backing up only the database, ignoring configuration and scripts
my.cnf / postgresql.conf, schema-only dumps, and application-side database scripts (migrations) in your backup rotation.Relying on a single backup destination (same disk, same datacenter)
Interview Questions on This Topic
Explain the difference between a logical backup and a physical backup. When would you use each for a MySQL database in production?
Frequently Asked Questions
That's MySQL & PostgreSQL. Mark it forged?
7 min read · try the examples if you haven't