Senior 4 min · March 06, 2026

Checkpoint in DBMS — Why 60s Timeout Killed Production I/O

A 60-second checkpoint_timeout spiked disk to 100% and latency from 5ms to 5s.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • Checkpoint writes dirty pages from buffer pool to disk and updates WAL with a redo start point
  • Reduces crash recovery time from hours to seconds by avoiding full WAL replay
  • Sharp checkpoints write all dirty pages synchronously (slow but clean); fuzzy checkpoints spread writes over time (faster but complex)
  • In PostgreSQL, checkpoint_timeout and checkpoint_completion_target control frequency and I/O impact
  • Production gotcha: too frequent checkpoints cause I/O storms; too infrequent cause long recovery
Plain-English First

Imagine you're doing a 500-piece jigsaw puzzle and you stop every hour to take a photo of your progress. If the dog jumps on the table and scatters everything, you restart from the photo — not from the empty table. A database checkpoint is exactly that photo: a confirmed 'safe point' on disk so that after a crash, the database only has to redo work done after the last photo, not replay every single move since it was first installed.

Every production database — Postgres, MySQL InnoDB, Oracle, SQL Server — will crash at some point. Power cuts happen. Kernel panics happen. Someone trips over the wrong cable. The question isn't if your database will die mid-transaction; it's how fast it can get back on its feet with zero data loss. That answer lives almost entirely in one mechanism: the checkpoint.

Without checkpoints, crash recovery means replaying every single log record ever written — potentially years of transactions — before the database can accept a single query. That would make restarts take hours or days. Checkpoints solve this by periodically writing all dirty pages from memory to disk and recording a 'you can start recovery from here' marker in the write-ahead log. That single act turns a potential multi-hour recovery into seconds.

By the end of this article you'll understand exactly what happens inside a checkpoint cycle, the difference between sharp and fuzzy checkpoints and why fuzzy ones exist, how the WAL and buffer pool interact during checkpointing, what the performance trade-offs look like in PostgreSQL's real configuration knobs, and the production gotchas that bite engineers who treat checkpoints as a background detail.

What Is a Checkpoint and Why Does It Exist?

A checkpoint in a DBMS is a synchronization point where the buffer pool's dirty pages (modified but not yet on disk) are flushed to the data files, and the WAL is updated to mark that all changes up to that point are safely stored. This drastically shortens crash recovery: instead of replaying the entire WAL from the beginning, the database can start from the most recent checkpoint. The checkpoint record in the WAL contains the LSN (Log Sequence Number) that acts as the new recovery start point.

Checkpoints are not optional – every ACID-compliant database implements them. Without checkpoints, recovery time would grow linearly with the age of the database. A ten-year-old database would need to replay ten years of transactions after each restart. That simply doesn't work at scale.

checkpoint_basics.sqlPOSTGRESQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
-- PostgreSQL: Checkpoint in action
-- This triggers a manual checkpoint
CHECKPOINT;

-- Check checkpoint-related statistics
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend,
       checkpoint_write_time, checkpoint_sync_time
FROM pg_stat_bgwriter;

-- Show last checkpoint location
SELECT pg_current_wal_lsn(), 
       pg_last_checkpoint_location() AS last_checkpoint_lsn,
       pg_last_checkpoint_timestamp();
Checkpoint as Snapshot
  • Dirty pages = unsaved progress; checkpoint = save to disk.
  • WAL = a journal of every action; checkpoint = a bookmark in the journal saying 'everything before this is safe'.
  • Recovery scans the WAL from the checkpoint forward, not from the beginning — that's the speed gain.
  • Without checkpoints, recovery time equals database age. With them, it's bounded by the checkpoint interval.
Production Insight
If checkpoint_timeout is too long, recovery takes longer — but too short causes I/O thrashing.
Set checkpoint_completion_target between 0.7 and 0.9 to spread writes evenly.
Monitor buffers_checkpoint vs buffers_backend — backend writes signal checkpoint pressure.
Key Takeaway
Checkpoints are the reason databases recover in seconds, not days.
Tuning checkpoint parameters balances I/O load vs recovery speed.
Always measure your current checkpoint behavior before changing config.

WAL and Checkpoint Interaction: The Recovery Handshake

The Write-Ahead Log (WAL) records every change before it's applied to data files. A checkpoint ensures that all WAL records up to a certain LSN have been fully applied to the data files on disk. After checkpoint, the WAL can be truncated up to that LSN. During recovery, the database reads the last checkpoint location from the WAL and replays all subsequent records. This is called the redo phase. Some databases also have an undo phase to roll back incomplete transactions.

The key insight: the checkpoint record itself is written after all dirty pages are safely on disk. If the checkpoint record is missing, recovery falls back to the previous checkpoint. That's why checkpoint writes are synchronous and include a WAL flush. The checkpoint writes use direct I/O to ensure persistence.

wal_interaction.sqlPOSTGRESQL
1
2
3
4
5
6
7
8
9
10
11
-- Show where the last checkpoint is in the WAL
SELECT pg_last_checkpoint_location() AS checkpoint_lsn;

-- Show the LSN of a recent transaction
SELECT txid_current(), pg_current_wal_lsn();

-- After a checkpoint, old WAL segments are reusable
SELECT pg_switch_wal(); -- Force WAL segment switch
CHECKPOINT;
-- Now inspect WAL directory
SELECT * FROM pg_ls_waldir() ORDER BY name;
WAL Archiving and Checkpoints
In production, WAL is archived for Point-in-Time Recovery (PITR). Checkpoints coordinate with archivers to ensure archived WAL is contiguous. If archive_command is slow, checkpoints may stall, causing transaction lag.
Production Insight
A checkpoint can stall if WAL archiving is falling behind.
Monitor pg_stat_archiver for failed archiving jobs.
The rule: checkpoint cannot complete until all preceding WAL is archived if archive_mode = on.
Key Takeaway
Checkpoint writes a 'safe up to here' marker in WAL.
Recovery replays WAL from that marker.
If WAL archiving fails, checkpoints block — preventing new WAL segment reuse.

Sharp vs Fuzzy Checkpoints: The Performance Trade-off

Sharp checkpoints flush all dirty pages at one moment. This is simple but causes a massive I/O spike. Fuzzy checkpoints (used by PostgreSQL, MySQL InnoDB) spread the flushing over a configurable window. They start writing dirty pages early in the interval and try to complete before the next checkpoint. The WAL checkpoint record marks the boundary, but data writes are asynchronous.

In PostgreSQL, checkpoint_completion_target controls how much of the interval is used for flushing. A value of 0.9 means 90% of checkpoint_timeout is spent writing dirty pages, with the final 10% reserved for the sync. This avoids a sudden burst but requires a steady I/O capacity.

Sharp checkpoints still exist in some systems (e.g., SQL Server simple recovery model), but modern systems prefer fuzzy for stability.

checkpoint_configuration.sqlPOSTGRESQL
1
2
3
4
5
6
7
8
9
10
11
12
13
-- View current checkpoint parameters
SHOW checkpoint_timeout;
SHOW checkpoint_completion_target;
SHOW max_wal_size;
SHOW min_wal_size;

-- Example: configure for a heavy-write workload
ALTER SYSTEM SET checkpoint_timeout = '10min';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET max_wal_size = '4GB';
ALTER SYSTEM SET min_wal_size = '1GB';
-- Reload configuration
SELECT pg_reload_conf();
Fuzzy Checkpoint Analogy
  • Sharp = sprint: all writes at once, I/O spike, fast but painful.
  • Fuzzy = steady jog: writes spread over minutes, smooth I/O, easier on disk.
  • Completion target = your pace plan: how much of the interval you use.
  • Database replay doesn't care about fuzziness — it only looks at the LSN boundary.
Production Insight
Fuzzy checkpoints are essential in high-write environments.
If checkpoint_completion_target is too high (>0.95), the checkpoint may not finish before the next one starts.
If too low (<0.5), I/O spikes occur because flushing is compressed into a short time.
Key Takeaway
Fuzzy checkpoints smooth I/O but need careful completion target tuning.
Sharp checkpoints are simple but not production-friendly in write-heavy systems.
Goal: finish checkpoint just before the next timeout — not too early, not too late.

Checkpoint Tuning in PostgreSQL: Real Knobs and Constraints

PostgreSQL provides several knobs to control checkpoint behaviour. The most important are:

  • checkpoint_timeout: Maximum time between automatic checkpoints (default 5 minutes).
  • max_wal_size: Target for total WAL size. When exceeded, a checkpoint is forced.
  • min_wal_size: Minimum WAL size to keep for recycling.
  • checkpoint_completion_target: Fraction of checkpoint_timeout spent flushing dirty pages.
  • checkpoint_flush_after: Number of pages after which to flush writes to OS.

Understanding the interaction: checkpoints can be triggered by time (timed checkpoints) or by WAL size (requested checkpoints). The pg_stat_bgwriter view shows how many of each occurred. A high ratio of requested to timed means the WAL size is often hitting max_wal_size — that can signal write bursts or insufficient max_wal_size.

checkpoint_analyze.sqlPOSTGRESQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Analyze checkpoint effectiveness
SELECT 
  (checkpoints_timed + checkpoints_req) AS total_checkpoints,
  checkpoints_timed,
  checkpoints_req,
  buffers_checkpoint,
  buffers_clean,
  buffers_backend,
  (buffers_backend * 100.0 / NULLIF(buffers_checkpoint, 0)) AS backend_write_ratio
FROM pg_stat_bgwriter;

-- Check if checkpoints are being forced by WAL size
SELECT * FROM pg_stat_bgwriter 
WHERE checkpoints_req > checkpoints_timed;

-- View current WAL size
SELECT pg_size_pretty(SUM(size)) FROM pg_ls_waldir();
Production Insight
PostgreSQL 9.6+ introduced checkpoint_flush_after to reduce dirty page accumulation.
If you see long checkpoint_sync_time, consider using a faster SSD or tuning checkpoint_completion_target.
Never set max_wal_size too small — it leads to constantly forced checkpoints and WAL recycling overhead.
Key Takeaway
Monitor pg_stat_bgwriter to detect checkpoint pressure.
Timed checkpoints are cheaper than requested ones.
I/O capability determines your checkpoint interval — not the other way around.
Choosing Checkpoint Settings
IfHigh write workload, I/O capacity is limited
UseIncrease checkpoint_timeout to 10-15 min and set checkpoint_completion_target to 0.9.
IfLow write workload, need fast recovery
UseKeep checkpoint_timeout low (2-5 min) to minimize recovery time.
IfCheckpoints are requested more often than timed
UseIncrease max_wal_size or reduce write rate. Check for long-running transactions blocking WAL recycling.
IfBackend writes (buffers_backend) > buffers_checkpoint
UseCheckpoint interval is too long — increase checkpoint frequency or improve checkpointer's I/O speed.

Common Checkpoint Mistakes and How to Fix Them

Even experienced DBAs misconfigure checkpoints. The most common issues:

  1. Using default settings in production: Default checkpoint_timeout=5min may be fine for dev but causes I/O spikes in heavy-load production. Always baseline I/O and adjust.
  2. Ignoring full_page_writes: PostgreSQL writes full page images to WAL during first modification after a checkpoint. Disabling this saves space but risks page corruption on crash. Only disable on replicas with data checksums.
  3. Setting checkpoint_completion_target >0.95: Leaves no buffer for OS I/O delays. A checkpoint that doesn't complete before the next timeout triggers a forced sync, causing a sharp I/O spike.
  4. Not monitoring buffers_backend: If backend processes are writing dirty pages themselves, the checkpointer is overwhelmed. Increase checkpoint_timeout or add I/O capacity.
Production Insight
Checkpoint-related I/O storms are the #1 cause of 'unexplained' performance degradation in PostgreSQL.
Always add checkpointer dedicated cores if workload is write-heavy.
Use pg_test_timing to measure checkpoint timing overhead.
Key Takeaway
Default checkpoint settings are for development — not for production.
Watch buffers_backend in pg_stat_bgwriter — that's where hidden pressure lives.
Full page writes are not optional on primary; disable only on replicas with checksums.
● Production incidentPOST-MORTEMseverity: high

PostgreSQL Checkpoint Storm Wiped Out Production DB Performance

Symptom
Every hour, disk utilization spiked to 100%, query latency jumped from 5ms to 5s, and the database was unresponsive for several minutes.
Assumption
The team thought the disk was failing. They replaced SSDs twice before tracing it back to checkpoints.
Root cause
checkpoint_timeout was set to 60 seconds, forcing a full checkpoint every minute. Each checkpoint flushed hundreds of dirty buffers at once, saturating the I/O subsystem.
Fix
Increased checkpoint_timeout to 300 seconds and set checkpoint_completion_target to 0.9 to spread writes evenly over the interval. Also enabled full_page_writes only on replicas to reduce write amplification.
Key lesson
  • Monitor checkpoint-related metrics in pg_stat_bgwriter (buffers_checkpoint, checkpoint_sync_time).
  • Never use default checkpoint_timeout in production without baseline I/O capacity testing.
  • Fuzzy checkpoints are a lifesaver – they smooth out I/O load but require careful tuning of checkpoint_completion_target.
Production debug guideSymptom → Action mapping for checkpoint problems5 entries
Symptom · 01
Crash recovery takes >10 minutes
Fix
Check pg_stat_bgwriter: if checkpoints_timed > checkpoints_req, recovery will be long. Increase checkpoint_segments or checkpoint_timeout.
Symptom · 02
High I/O during checkpoint bursts
Fix
Query pg_stat_bgwriter: compare buffers_backend vs buffers_checkpoint. If buffers_backend is high, checkpoint interval is too long causing backend writes. Reduce checkpoint_timeout.
Symptom · 03
WAL segment accumulation (pg_xlog filling disk)
Fix
Check if archive_command is stalled or slow. If checkpoints are too infrequent, WAL keeps accumulating. Use pg_current_wal_lsn to track lag.
Symptom · 04
Checkpoint not completing before next one starts
Fix
Inspect checkpoint_write_time and checkpoint_sync_time in pg_stat_bgwriter. If total > checkpoint_timeout * checkpoint_completion_target, increase checkpoint_timeout or improve disk throughput.
Symptom · 05
Backend writes (buffers_backend) spike during checkpoint
Fix
Increase checkpoint_completion_target (e.g., 0.9) to spread writes. Also raise wal_buffers to reduce contention.
★ Checkpoint Quick Debug Cheat SheetImmediate actions and commands to diagnose checkpoint issues in PostgreSQL
Database recovery is slow after crash
Immediate action
Check last checkpoint location and identify how much WAL needs to be replayed.
Commands
SELECT pg_current_wal_lsn(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();
SELECT * FROM pg_stat_bgwriter;
Fix now
Force a checkpoint manually with CHECKPOINT; and monitor recovery progress.
Checkpoint causes I/O spikes every few minutes+
Immediate action
Query pg_stat_bgwriter to see checkpoints_timed vs checkpoints_req.
Commands
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint, buffers_backend FROM pg_stat_bgwriter;
SHOW checkpoint_timeout; SHOW checkpoint_completion_target;
Fix now
Set checkpoint_timeout to at least 5 minutes and checkpoint_completion_target to 0.9.
WAL directory grows unbounded+
Immediate action
Check if WAL segments are being archived and if checkpoints are occurring.
Commands
SELECT * FROM pg_stat_archiver;
SELECT COUNT(*) FROM pg_ls_waldir();
Fix now
Enable archive_mode and ensure archive_command succeeds. Force a checkpoint to trigger archiving of old segments.
Checkpoint Types Comparison
AspectSharp CheckpointFuzzy Checkpoint
Flush patternAll dirty pages at onceSpread over interval
I/O impactSpike (can saturate disk)Smooth (steady load)
Recovery startImmediately after checkpointSame — only LSN matters
ImplementationSQL Server simple recovery, manualPostgreSQL, InnoDB automatic
Suitable forLow-write or nightly batch windowsHigh-write OLTP systems
Configuration knobFrequency only (e.g., recovery interval)checkpoint_timeout + checkpoint_completion_target

Key takeaways

1
Checkpoints cut crash recovery time
without them, recovery scales with database age; with them, it's bounded by the checkpoint interval.
2
Fuzzy checkpoints spread writes and prevent I/O storms—tune checkpoint_completion_target between 0.7 and 0.9.
3
Monitor pg_stat_bgwriter regularly
buffers_backend > buffers_checkpoint is a red flag for checkpoint pressure.
4
Never use default checkpoint settings in production without baseline knowledge of your I/O capacity.
5
Full page writes are not optional on primary servers—disabling them invites silent data corruption.

Common mistakes to avoid

5 patterns
×

Using default checkpoint_timeout in high-write production

Symptom
Frequent I/O spikes, checkpoint stalling, backend writes increase, query latency varying wildly.
Fix
Base checkpoint_timeout on I/O benchmarks. Start with 10 minutes for heavy write workloads. Monitor buffers_backend ratio.
×

Setting checkpoint_completion_target too high (>0.95)

Symptom
Checkpoint times out, causing forced sync of all remaining pages — I/O storm exactly when you least want it.
Fix
Keep checkpoint_completion_target between 0.7 and 0.9. Leave headroom for OS write-back delays.
×

Disabling full_page_writes to save WAL space

Symptom
After a crash, pages are partially written and cannot be recovered — data corruption.
Fix
Keep full_page_writes on primary. On replicas, enable data checksums and test before disabling.
×

Forgetting to monitor pg_stat_bgwriter

Symptom
Unexplained performance degradation over time; checkpoint warnings in logs are ignored.
Fix
Set up monitoring for buffers_checkpoint, buffers_backend, checkpoint_write_time. Alert when backend_write_ratio > 20%.
×

Checking only total time, not the distribution

Symptom
Checkpoint appears fast, but recovery time grows because WAL accumulates.
Fix
Monitor max_wal_size usage and checkpoints_req vs checkpoints_timed ratio. Adjust accordingly.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR
What is the purpose of a checkpoint in a DBMS?
Q02SENIOR
Explain the difference between sharp and fuzzy checkpoints. Which one do...
Q03SENIOR
How does checkpoint interact with WAL? What happens to WAL after a check...
Q04SENIOR
What configuration parameters would you tune in PostgreSQL to reduce I/O...
Q05SENIOR
What is full_page_writes and why is it important for checkpoint?
Q01 of 05JUNIOR

What is the purpose of a checkpoint in a DBMS?

ANSWER
A checkpoint synchronizes the database's in-memory dirty pages with the data files on disk. It records a special marker in the write-ahead log (WAL) indicating that all changes up to that point are safely stored. During crash recovery, the database only needs to replay WAL entries after the last checkpoint, drastically reducing recovery time. Without checkpoints, recovery would require replaying the entire transaction history from the beginning of the database.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What is a checkpoint in DBMS in simple terms?
02
Why do checkpoints cause I/O spikes?
03
How can I see when my last checkpoint occurred?
04
What happens if a checkpoint fails?
05
Do NoSQL databases use checkpoints?
🔥

That's DBMS. Mark it forged?

4 min read · try the examples if you haven't

Previous
DBMS Interview Questions
11 / 11 · DBMS
Next
Introduction to Compiler Design