EXPLAIN shows the query plan the database will use; EXPLAIN ANALYZE actually runs the query and shows real timings
Seq Scan = full table scan — every row read — usually the problem node to fix
Index Scan = B-tree traversal + heap fetch per row; Index Only Scan = no heap fetch (fastest)
Cost is in arbitrary units: (startup_cost..total_cost rows=estimated width=bytes)
Nested Loop = one table drives, the other is looked up per row — good for small outer tables; Hash Join = build hash table from smaller table, probe with larger — good for large tables
Loops=N on an inner node means that node ran N times — the O(n²) signal for correlated subqueries
✦ Definition~90s read
What is SQL EXPLAIN and Execution Plans?
SQL EXPLAIN is a database diagnostic tool that reveals the execution plan a query optimizer generates to retrieve your data. Instead of guessing why a query is slow, EXPLAIN shows you the actual steps—scan methods (Seq Scan, Index Scan, Index Only Scan), join algorithms (Nested Loop, Hash Join, Merge Join), and row estimates at each node.
★
Every slow query has a story, and EXPLAIN is how you read it.
It exists because SQL is declarative: you say what you want, not how to get it. The optimizer chooses the how, and when it guesses wrong—like picking a Nested Loop join over 5 million rows instead of a Hash Join—you get a 12-minute query. EXPLAIN is your only window into that decision, letting you pinpoint where the plan breaks down and fix it with indexes, statistics, or query rewrites.
In the PostgreSQL ecosystem, EXPLAIN is the standard tool for query tuning, but it has a critical limitation: it shows the optimizer's estimated costs, not actual execution time. That’s where EXPLAIN ANALYZE comes in—it runs the query and reports real timings and row counts, exposing when estimates are off by orders of magnitude due to stale statistics or correlated columns.
Without ANALYZE, you’re debugging blind. Use EXPLAIN when you need to understand the plan structure; use EXPLAIN ANALYZE when you need the truth. Alternatives like pg_stat_statements or auto_explain provide historical or automated plan logging, but for ad-hoc debugging, nothing beats EXPLAIN ANALYZE.
When not to use EXPLAIN? Don’t rely on it for production queries with side effects (INSERT/UPDATE/DELETE) unless wrapped in a transaction and rolled back—EXPLAIN ANALYZE actually executes the query. Also, EXPLAIN alone can mislead you if statistics are outdated; always run ANALYZE on the relevant tables first.
For complex queries, focus on the highest-cost node first—that’s where your 12 minutes are hiding. Real-world example: a Nested Loop join on two tables with 5M and 100K rows, no index on the join column, and the optimizer underestimating the inner table’s row count—EXPLAIN ANALYZE shows 5M index scans instead of the estimated 100, and you know exactly where to add an index or force a Hash Join.
Plain-English First
Every slow query has a story, and EXPLAIN is how you read it. A query can be written in dozens of equivalent ways, and the database makes a decision about the most efficient path to execute it. EXPLAIN shows you that decision — like the GPS navigation plan before you drive. EXPLAIN ANALYZE is like driving the route and recording how long each segment actually took.
Every slow query has a story, and EXPLAIN is how you read it. In production, a query that ran fine on 10,000 rows can grind a system to a halt at 10 million. The difference isn't always the query itself — it's the execution plan the query optimiser chose. Developers who can read execution plans diagnose slow queries in minutes instead of hours of guesswork.
This guide teaches you to read execution plans the same way a senior DBA does: starting from the innermost node and reading the cost signals, not from the top down. By the end, you'll know the difference between every major node type, what the cost numbers actually mean, and how to use the plan to drive targeted fixes.
Why Your Query Is Slow: The Execution Plan Tells the Real Story
An SQL EXPLAIN execution plan is the database's step-by-step recipe for retrieving your requested data. It shows how the query optimizer chose to access tables, join them, and filter rows — including the algorithms used (e.g., nested loop, hash join, sequential scan) and their estimated costs. The plan is the only reliable way to understand why a query runs for minutes instead of milliseconds.
Execution plans are tree structures: each node represents an operation (scan, join, sort) with its own cost, row estimate, and actual row count when run with ANALYZE. The critical property is that the optimizer's estimates can be wildly wrong due to outdated statistics or complex predicates. A nested loop join with 5 million rows on the outer side and no index on the inner side will produce O(n*m) complexity — that's 25 trillion comparisons, explaining a 12-minute runtime.
Use EXPLAIN (ANALYZE, BUFFERS, TIMING) on any query that takes longer than 100ms in production. Without it, you're guessing. The plan reveals exactly where time is spent: sequential scans on large tables, missing indexes, or bad join order. In real systems, a single missing index can turn a 50ms query into a 5-minute disaster — the execution plan shows you the smoking gun.
Estimates ≠ Reality
EXPLAIN without ANALYZE shows optimizer guesses — they can be off by 1000x. Always use EXPLAIN (ANALYZE, BUFFERS) to see actual row counts and I/O.
Production Insight
A team ran a nightly report joining 5M orders to 2M customers with no index on customer_id — the nested loop produced 10 trillion comparisons, killing the database CPU for 12 minutes.
Symptom: 100% CPU on one core, query stuck in 'executing' state, other queries timing out due to resource contention.
Rule: If EXPLAIN shows a nested loop with estimated rows > 10,000 on the outer side, you must have an index on the inner join column — or force a hash join.
Key Takeaway
EXPLAIN (ANALYZE, BUFFERS) is the only way to diagnose query performance — never guess.
A nested loop join on large tables without an index is the #1 cause of multi-minute queries.
Always check actual vs. estimated row counts — a 100x mismatch means stale statistics or a bad query structure.
thecodeforge.io
SQL EXPLAIN: Nested Loop on 5M Rows
Sql Explain Execution Plans
How to Read an Execution Plan — The Fundamentals
An execution plan is a tree. The query planner breaks your SQL into operations and arranges them from innermost (raw data retrieval) to outermost (final result). You read a plan from the inside out — the innermost nodes execute first.
Every node in the plan shows three things: the operation type (Seq Scan, Index Scan, Hash Join), the cost estimate (startup_cost..total_cost), and the row and width estimates. When you run EXPLAIN ANALYZE, you also get the actual time and actual rows — the ground truth.
The most important comparison: estimated rows vs actual rows. When these diverge by an order of magnitude, the planner is working from stale statistics and the plan choice may be wrong. A planner that thinks a table has 1,000 rows when it has 1,000,000 will choose algorithms designed for small data.
explain_analyze_reading.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Basic EXPLAIN: shows estimated plan without executingEXPLAINSELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'GROUPBY c.customer_id, c.name;
-- EXPLAIN ANALYZE: executes and shows actual timingsEXPLAINANALYZESELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'GROUPBY c.customer_id, c.name;
-- EXPLAIN ANALYZE BUFFERS: also shows cache hit ratioEXPLAIN (ANALYZE, BUFFERS, FORMATTEXT)
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'GROUPBY c.customer_id, c.name;
-> Seq Scan on orders (cost=0.00..6200.00 rows=12000 width=16)
(actual time=0.05..98.3 rows=11842 loops=1)
Filter: ((status)::text = 'completed')
Rows Removed by Filter: 38158
-> Hash (cost=750.00..750.00 rows=50000 width=24)
-> Seq Scan on customers (cost=0..750 rows=50000 width=24)
Always Use EXPLAIN ANALYZE on the Actual Slow Query
EXPLAIN without ANALYZE shows estimated plans based on statistics — it does not run the query. The estimates can be wrong. EXPLAIN ANALYZE runs the query and shows real timing. For read queries on production, this is safe. For destructive queries (DELETE, UPDATE), wrap in a transaction and roll back: BEGIN; EXPLAIN ANALYZE DELETE FROM ...; ROLLBACK;
Production Insight
Read execution plans from the innermost indented node outward — the deepest node executes first.
The most useful signal: actual rows vs estimated rows. A 10x or greater divergence means statistics are stale — run ANALYZE.
For production query analysis, use EXPLAIN (ANALYZE, BUFFERS) — the BUFFERS output shows how many data pages came from cache vs disk, which is often the real performance signal.
Key Takeaway
Execution plans are trees — read from innermost node (deepest indentation) outward.
Actual rows vs estimated rows is the most important signal — large divergence means stale statistics.
EXPLAIN ANALYZE runs the query; EXPLAIN alone uses estimates. Always ANALYZE with care on production destructive queries.
Seq Scan, Index Scan, and Index Only Scan — The Core Node Types
Three scan types appear in most queries. Understanding what each does determines what fix to apply.
Seq Scan (Sequential Scan) reads every page of the table in order. This is not always wrong — on small tables or when the query returns >15-20% of all rows, a sequential scan is cheaper than following millions of index pointers. But Seq Scan on a large table with a selective WHERE clause is the most common performance problem in SQL.
Index Scan uses a B-tree index to find matching row locations, then fetches each matching row from the heap (the main table storage). The index narrows the search; the heap provides the full row data. This is the standard fix for a Seq Scan — add an index on the WHERE column.
Index Only Scan is the fastest path — all data the query needs is in the index itself (a covering index), so the heap is never accessed. Zero heap fetches. Achievable by adding INCLUDE columns to the index definition.
scan_types_diagnosis.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Seq Scan: no index, or planner chose not to use oneEXPLAINANALYZESELECT * FROM orders WHERE status = 'pending';
-- Seq Scan on orders (cost=0.00..8500.00 rows=1200 width=80)-- Actual rows: 1200, loops=1 -- if this is >15% of the table, seq scan is correct-- Add index -> Index ScanCREATEINDEX idx_orders_status ONorders(status) WHERE status = 'pending';
-- partial index: only indexes the rows that match -- much smaller, more selectiveEXPLAINANALYZESELECT * FROM orders WHERE status = 'pending';
-- Index Scan using idx_orders_status on orders-- (cost=0.42..180.00 rows=1200 width=80) -- cost dropped from 8500 to 180-- Covering index -> Index Only Scan (no heap access)CREATEINDEX idx_orders_customer_status
ONorders(customer_id, status)
INCLUDE (order_id, total, created_at); -- add SELECT columnsEXPLAINANALYZESELECT order_id, total, created_at
FROM orders
WHERE customer_id = 42AND status = 'completed';
-- Index Only Scan using idx_orders_customer_status-- Heap Fetches: 0 <- fastest possible path
Output
-- Before index: Seq Scan cost=8500
-- After index: Index Scan cost=180
-- With covering index: Index Only Scan, Heap Fetches: 0
-- Cost reduced from 8500 to ~40
Seq Scan Is Not Always Wrong
The planner chooses Seq Scan when it estimates it is cheaper than following index pointers — typically when >15-20% of rows would be returned. On a table of 1,000 rows, Seq Scan is almost always chosen regardless of indexes — and correctly so. Only investigate Seq Scan when it appears on a large table with a highly selective WHERE clause.
Production Insight
Seq Scan on a large table + low estimated rows = missing index or function wrapping the column.
Seq Scan on a large table + high estimated rows = low selectivity — the planner is correct, the query needs redesigning.
Index Only Scan is the goal for your top read-heavy queries — identify them with pg_stat_statements and cover them.
Key Takeaway
Seq Scan = full table read — investigate when on large tables with selective WHERE.
Index Scan = B-tree lookup + heap fetch per row — standard fix for Seq Scan on selective queries.
Index Only Scan = zero heap access — achievable with INCLUDE columns, the highest-performance read path.
Three join algorithms appear in execution plans. The planner chooses based on estimated table sizes, available indexes, and whether the data is sorted.
Nested Loop Join drives with one table (the outer loop) and probes the inner table once per outer row. Fast when the outer table is small and the inner table has an index on the join column. Catastrophic when the outer table is large — the inner probe repeats N times where N is the outer row count. The loops=N in the inner node of EXPLAIN ANALYZE reveals this.
Hash Join builds a hash table from the smaller table, then probes it for each row of the larger table. Two passes over the data, but O(n+m) rather than O(n×m). The standard choice for large table joins. Requires memory — if the hash table spills to disk (HashBatches > 1 in EXPLAIN output), performance degrades significantly.
Merge Join requires both tables to be sorted on the join column. Extremely fast when both sides are already sorted or can use an index scan in sorted order. Rarely chosen by the planner unless sort order is already available.
join_algorithms_explain.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- Force specific join type for comparison (PostgreSQL)-- Normally leave this to the planner -- these are for diagnosis onlySET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAINANALYZESELECT * FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
-- Will use Nested Loop -- shows cost with forced algorithmRESET enable_hashjoin;
RESET enable_mergejoin;
EXPLAINANALYZESELECT * FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
-- Planner chooses optimal -- typically Hash Join for large tables-- Diagnose: is Hash Join spilling to disk?EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table a JOIN another_large_table b ON a.id = b.a_id;
-- Look for: Batches: 4 (instead of 1)-- Batches > 1 means the hash table spilled to disk -- increase work_mem-- SET work_mem = '256MB'; -- session-level increase for expensive joins
-- -> Hash -> Seq Scan on customers (loops=1) <- hash built once
-- Massive improvement: 18420ms -> 380ms
Hash Join Spilling to Disk
If EXPLAIN ANALYZE shows Batches: 4 (or any number > 1) on a Hash Join, the hash table exceeded work_mem and spilled to disk. This dramatically slows the join. Increase work_mem for the session: SET work_mem = '256MB'. This is a session-level setting — increase it for specific expensive queries, not globally, to avoid OOM on concurrent connections.
Production Insight
Nested Loop + large loops= number is the most common join performance problem — it means the planner underestimated the outer table's size (stale statistics).
Hash Join Batches > 1 means memory pressure — increase work_mem for the session, not globally.
Planner forcing (SET enable_hashjoin = off) is a diagnostic tool only — never leave it disabled in production.
Key Takeaway
Nested Loop = good for small outer tables + indexed inner; catastrophic for large outer tables.
Hash Join = large table joins; Batches > 1 means hash spilled to disk — increase work_mem.
Merge Join = when both sides are already sorted — rare but extremely fast when applicable.
Statistics, ANALYZE, and Why Estimates Go Wrong
The query planner does not read your data to make decisions — it reads statistics. PostgreSQL maintains per-column statistics: the number of distinct values, the most common values and their frequencies, and a histogram of the value distribution. These statistics live in pg_statistic and are updated by VACUUM ANALYZE.
When statistics are stale — after a bulk insert, a data migration, or a period of high insert/delete volume — the planner's row estimates can be orders of magnitude off. A planner that thinks a table has 100,000 rows when it has 5,000,000 will choose Nested Loop joins instead of Hash Joins, index scans instead of seq scans on highly selective queries, and estimate sort operations as cheap when they are expensive.
The practical workflow: whenever a query plan looks wrong (estimated rows diverges greatly from actual rows in EXPLAIN ANALYZE), run ANALYZE on the affected table, then re-run EXPLAIN ANALYZE. In most cases the plan improves immediately.
statistics_and_analyze.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Check how fresh the statistics are for a tableSELECT
relname AS table_name,
n_live_tup AS estimated_live_rows,
n_dead_tup AS dead_rows,
last_analyze AS last_manual_analyze,
last_autoanalyze AS last_auto_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- Force statistics update after bulk import or migrationANALYZE orders;
-- Or with verbosity:ANALYZEVERBOSE orders;
-- Check column-level statistics (what the planner sees)SELECT
attname AS column,
n_distinct, -- negative = fraction of total rows; -0.5 means 50% distinct
correlation -- 1.0 = perfectly sorted, 0 = random -- affects index vs seq scan choiceFROM pg_stats
WHERE tablename = 'orders'AND attname IN ('customer_id', 'status', 'created_at');
-- Identify tables with stale or missing statisticsSELECT relname, n_live_tup, last_analyze
FROM pg_stat_user_tables
WHERE last_analyze < NOW() - INTERVAL'7 days'OR last_analyze ISNULLORDERBY n_live_tup DESC;
ANALYZE must be run manually after bulk inserts — autovacuum may not keep up with sudden large row count changes. A table that grows from 100K to 5M rows in a single migration will have statistics based on the old row count until ANALYZE runs. Add ANALYZE to every migration script that inserts or deletes large numbers of rows.
Production Insight
The single most impactful EXPLAIN diagnostic: compare estimated rows vs actual rows in EXPLAIN ANALYZE output — any 10x divergence means stale statistics and a likely suboptimal plan.
For tables with high insert/delete rates, check last_autoanalyze frequency in pg_stat_user_tables — if autovacuum is not running frequently enough, adjust autovacuum_analyze_scale_factor.
Add ANALYZE tablename to every migration script that affects large row counts.
Key Takeaway
The planner uses statistics, not data — stale statistics produce wrong plans.
EXPLAIN ANALYZE shows estimated vs actual rows — large divergence = run ANALYZE.
Add ANALYZE to migration scripts; monitor last_autoanalyze in pg_stat_user_tables for high-volume tables.
EXPLAIN ANALYZE — The Only Honest Estimator
The plain EXPLAIN lies to you. It shows the planner's guess — what it thinks will happen based on table statistics that might be stale, skewed, or just wrong. You see an Index Scan with cost 12.42 and assume your query is fine. Then it runs for three seconds in production.
EXPLAIN ANALYZE actually executes the query. It shows real timings, real row counts, and the actual number of loops for each node. The gap between "estimated rows" and "actual rows" is where performance goes to die. When you see estimated=1 vs actual=1,000,000, you know the planner chose a terrible join order or scan method based on bad stats.
Run EXPLAIN ANALYZE on a representative dataset — not a local dev table with three rows. Capture the output before a migration or schema change. Compare the before and after. That's how you catch regressions before they hit users.
The time overhead is real, especially on writes (INSERT/UPDATE/DELETE). Don't run ANALYZE on production OLTP during peak hours. Use a read replica or a staging environment with production-scale data.
ExplainAnalyzeCost.sqlSQL
1
2
3
4
5
6
7
8
9
// io.thecodeforge — database tutorial
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2024-01-01'ORDERBY o.total DESCLIMIT100;
-> Seq Scan on customers c (cost=0.00..183.25 rows=10025 width=36) (actual time=0.008..1.023 rows=10000 loops=1)
Planning Time: 0.245 ms
Execution Time: 15.589 ms
Production Trap:
EXPLAIN ANALYZE on a DELETE or UPDATE will actually modify data. Always wrap in a transaction and ROLLBACK if you're just diagnosing. Better yet, use EXPLAIN (ANALYZE, FORMAT JSON) which doesn't mutate state.
Key Takeaway
Always compare estimated rows to actual rows. A 10x or higher mismatch means your planner is flying blind — fix it with VACUUM ANALYZE or better statistics targets.
Why the Planner Chooses Wrong — and How to Fix It
You've seen it: a query that was fast yesterday is slow today. No code change. No schema change. The execution plan flipped from an Index Scan to a Sequential Scan. Your first instinct is wrong — don't add an index hint or force a plan. Fix the root cause.
The planner's job is to minimize total cost. It relies on table statistics: row counts, null fractions, average column width, distribution histograms. When those stats are stale — after massive inserts, deletes, or bulk updates — the planner guesses wrong. A table that had 1,000 rows an hour ago now has 10,000,000. The planner still thinks a Seq Scan is cheap because it expects 1,000 rows. Oops.
Run ANALYZE after bulk operations. Not just once — set autovacuum thresholds that match your write volume. For tables with heavy churn (event logs, audit trails), consider increasing default_statistics_target to 1000 or higher for columns used in WHERE and JOIN predicates. That captures histograms with 1000 buckets instead of the default 100. More buckets = better cardinality estimates for skewed data.
Still wrong? Use pg_stats to inspect column frequency arrays and bucket boundaries. If the planner doesn't see a value's histogram entry, it assumes uniform distribution. Real data is never uniform.
FixBadPlan.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// io.thecodeforge — database tutorial
-- Check current statistics target (default 100)SHOW default_statistics_target;
-- Increase per-column for the join columnALTERTABLE orders ALTERCOLUMN customer_id SETSTATISTICS1000;
-- Collect fresh statsANALYZE orders;
-- Verify the histogram now has enough bucketsSELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders'AND attname = 'customer_id';
Set default_statistics_target to 500 for databases with skewed or large tables. The planning time increase is negligible — the execution plan reliability gain is massive.
Key Takeaway
Stale statistics are the #1 cause of sudden query regressions. Automate ANALYZE after every bulk write, or your planner will make increasingly terrible choices.
Parallel Plans: Why Your 64-Core Machine Idles While You Wait
Most devs assume throwing hardware at a slow query makes it faster. The planner disagrees. It decides whether a parallel plan is worth the overhead — and it's often wrong.
Parallelism kicks in when the planner estimates a query will process enough rows to justify splitting work across workers. It starts with a Gather or Gather Merge node, then fans out to partial scans or joins. The problem? If your table is small or your WHERE clause is selective, the coordinator spends more time orchestrating than the workers spend executing. You get slower results on a screaming-fast machine.
Check parallel_workers and parallel_tuple_cost in the plan. A Seq Scan on a 10M-row table should go parallel. An Index Scan returning 100 rows should not. Force parallelism with a hint only after confirming the planner's row estimate is correct — otherwise you're just burning CPU cycles.
ParallelCheck.sqlSQL
1
2
3
4
5
6
7
8
9
10
// io.thecodeforge — database tutorial
-- Force a parallel plan to see what the planner avoidsSET max_parallel_workers_per_gather = 4;
EXPLAIN (ANALYZE, TIMINGOFF)
SELECTcount(*) FROM orders WHERE order_date >= '2024-01-01';
-- Output shows Gather node with workers-- If you see 'Workers Launched: 0', the planner thinks it's not worth it
Parallel workers consume connections. If you set max_parallel_workers_per_gather too high, you'll starve other queries and hit connection pool limits. Never set it above half your max_connections.
Key Takeaway
Parallelism has a cost. If your plan shows Workers Launched: 0, trust the planner — or fix your estimates first.
Subquery vs. CTE: The Planner's Dirty Secret About Materialization
Every dev loves a CTE for readability. The planner loves it for a different reason: it materializes the CTE once and reuses it. That sounds great until the CTE returns 10M rows and the outer query only needs 10.
Subqueries get flattened into the main plan. The planner can push predicates down, use indexes, and join in smarter order. CTEs act as optimization fences. Once materialized, the planner treats the result as a static table — no pushdown, no index use. You're stuck with a full scan on a temp dataset.
Check the plan for 'CTE Scan' or 'Materialize' nodes. If you see CTE Scan on my_cte followed by a sequential scan, you're paying the materialization tax. Rewrite as a subquery or a LATERAL join. Exceptions: recursive CTEs and CTEs referenced more than once. Those genuinely benefit from materialization.
CTEvsSubquery.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
// io.thecodeforge — database tutorial
-- Slow CTE: plan shows CTE Scan + Seq Scan on large datasetEXPLAINANALYZEWITH recent_orders AS (
SELECT * FROM orders WHERE order_date > '2024-06-01'
)
SELECT * FROM recent_orders WHERE customer_id = 42;
-- Fast subquery: plan shows Index Scan with predicate pushdownEXPLAINANALYZESELECT * FROM orders
WHERE order_date > '2024-06-01'AND customer_id = 42;
Index Scan using idx_orders_customer_date on orders
Index Cond: (customer_id = 42)
Filter: (order_date > '2024-06-01'::date)
Senior Shortcut:
If you must use a CTE and it's referenced only once, slap a NOT MATERIALIZED hint after the AS keyword. PostgreSQL 12+ respects it. Oracle and SQL Server ignore it — know your engine.
Key Takeaway
CTEs materialize by default. If your plan shows CTE Scan, consider a subquery — or add NOT MATERIALIZED to force inlining.
● Production incidentPOST-MORTEMseverity: high
A Nested Loop Join on a 5-Million-Row Table Caused a 12-Minute Query
Symptom
A weekly revenue report that generated in 30 seconds started taking 12 minutes after a data migration increased the orders table from 100,000 rows to 5 million. The query had not changed.
Assumption
The query had been running fine for a year. No one checked whether the execution plan had changed after the migration.
Root cause
The query planner's statistics were stale — it estimated 100 rows for the orders table (based on pre-migration statistics) and chose a Nested Loop join as appropriate for small tables. At 5 million rows, a Nested Loop repeats the inner table probe 5 million times. The correct plan for 5 million rows was a Hash Join, which the planner would have chosen with accurate statistics.
Fix
ANALYZE orders; forced the statistics to update. The planner immediately chose a Hash Join on the next run. Query time dropped from 12 minutes to 18 seconds. Added ANALYZE to the post-migration checklist.
Key lesson
Run ANALYZE on large tables after bulk inserts or migrations — stale statistics cause wrong plan choices
Always run EXPLAIN ANALYZE after any significant data volume change to verify the plan has not regressed
The planner estimates rows based on statistics — if the estimate is wildly off from actual rows, statistics are stale
Production debug guideRead the plan systematically — innermost node first4 entries
Symptom · 01
Seq Scan on a large table in the plan
→
Fix
Check if the WHERE column has an index. If yes, check if the query uses a function on the column (WHERE LOWER(email) kills the index). Check selectivity — if >20% of rows match, the planner may prefer Seq Scan even with an index. Run ANALYZE to refresh statistics.
Symptom · 02
Actual rows is many times higher than estimated rows
→
Fix
Stale statistics. Run ANALYZE table_name. For tables with frequent inserts/deletes, check autovacuum settings — autovacuum may not be keeping statistics current. The planner chooses plans based on estimates — bad estimates produce bad plans.
Symptom · 03
Nested Loop with loops=N where N is very large
→
Fix
The outer table is large and the inner probe is repeating N times. Add an index on the inner table's JOIN column. If N is in the millions, the planner may have stale statistics causing it to underestimate the outer table's row count — run ANALYZE.
Symptom · 04
Sort node with high actual time
→
Fix
The query cannot use an index for ORDER BY — it is performing a filesort. Add an index that matches the ORDER BY column order. For window functions, add a composite index on (PARTITION_BY_col, ORDER_BY_col).
★ EXPLAIN Reading Cheat SheetKey nodes, what they mean, and what to do
Seq Scan on large table−
Immediate action
Check for missing index or function wrapping the column
Commands
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
SELECT indexname FROM pg_indexes WHERE tablename = 'orders';
Fix now
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Stale estimates — actual rows far exceeds estimated rows+
Immediate action
Refresh table statistics
Commands
ANALYZE orders;
EXPLAIN ANALYZE SELECT ... ; -- rerun the slow query
Fix now
If autovacuum is not keeping up, check: SELECT relname, n_live_tup, n_dead_tup, last_analyze FROM pg_stat_user_tables WHERE relname = 'orders';
Node Type
What It Does
When It Appears
What To Do
Seq Scan
Reads every table page sequentially
No index, low selectivity, or small table
Add index if WHERE is selective; check if >15% of rows returned
Index Scan
B-tree lookup + heap fetch per row
Selective WHERE with index
Usually optimal — consider INCLUDE for Index Only Scan
Index Only Scan
B-tree lookup only, no heap access
Covering index on all needed columns
Ideal — add INCLUDE columns to achieve this
Nested Loop
Outer table drives, inner probed per row
Small outer table or index on inner JOIN col
Catastrophic on large outer — add index to inner or check statistics
Hash Join
Build hash table, probe with other table
Large table joins without sort order
Standard for large joins — check Batches > 1 (spill to disk)
Sort
Sort rows for ORDER BY or Merge Join
ORDER BY without matching index
Add index matching ORDER BY order
HashAggregate
Group rows for GROUP BY
GROUP BY or DISTINCT
Usually fine — check for large estimated distinct count
Key takeaways
1
Read execution plans from the innermost (deepest indented) node outward
innermost nodes execute first.
2
Estimated rows vs actual rows is the most important signal
10x divergence means stale statistics, run ANALYZE.
3
Seq Scan is only a problem on large tables with selective WHERE clauses
on small tables or low-selectivity queries, it is often correct.
4
Always run ANALYZE after bulk inserts and migrations
add it to every migration script that changes large row counts.
Common mistakes to avoid
3 patterns
×
Running EXPLAIN without ANALYZE and treating the estimated plan as ground truth
Symptom
The estimated plan looks fine but the query is slow in practice — actual rows is orders of magnitude higher than estimated rows
Fix
Always use EXPLAIN ANALYZE for diagnosis. For destructive queries, wrap in a transaction: BEGIN; EXPLAIN ANALYZE UPDATE ...; ROLLBACK. The estimated plan from plain EXPLAIN is based on statistics that may be stale.
×
Immediately adding an index after seeing a Seq Scan without checking selectivity
Symptom
The new index is never used by the planner — it still chooses Seq Scan; the index adds write overhead for no read benefit
Fix
Check what percentage of rows the WHERE clause returns. If >15-20% of rows match, Seq Scan is correct and an index will not help. Only add an index when the WHERE clause is highly selective (returns a small fraction of total rows).
×
Not running ANALYZE after bulk inserts or data migrations
Symptom
Queries that were fast before a migration become slow afterward — the plan has regressed to a worse algorithm despite no query changes
Fix
Add ANALYZE table_name to every migration script after bulk inserts or deletes. Verify with EXPLAIN ANALYZE that estimated rows match actual rows after the migration.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01SENIOR
How do you read a PostgreSQL execution plan — where do you start?
Q02SENIOR
A Hash Join in your execution plan shows Batches: 8. What does this mean...
Q03SENIOR
The execution plan estimated 100 rows but the query returned 5 million. ...
Q01 of 03SENIOR
How do you read a PostgreSQL execution plan — where do you start?
ANSWER
Start from the innermost node (the most deeply indented line) and read outward. The innermost nodes are the raw data retrieval operations — Seq Scan or Index Scan — and they execute first. Each outer node receives the result of its inner nodes. For each node, I look at three things: the node type (what operation), the cost estimate (startup..total), and most critically, the estimated rows vs actual rows when using EXPLAIN ANALYZE. A large divergence between estimated and actual rows means stale statistics and a potentially suboptimal plan. I then look for the most expensive nodes by actual time and investigate those first — Seq Scans on large tables, Nested Loops with high loops values, Sort nodes without index backing.
Q02 of 03SENIOR
A Hash Join in your execution plan shows Batches: 8. What does this mean and how do you fix it?
ANSWER
Batches > 1 means the hash table built for the Hash Join exceeded available work_mem and spilled to disk. PostgreSQL split the join into 8 batches, reading the data 8 times instead of once. This dramatically slows the join — disk I/O is orders of magnitude slower than memory. The immediate fix: increase work_mem for the session before running the query: SET work_mem = '512MB'. This is a session-level setting — do not increase it globally, as each concurrent connection would allocate up to that amount and you risk OOM. The permanent fix: either increase work_mem globally if the server has sufficient RAM and query concurrency is limited, or restructure the query to join on a smaller subset of rows (filter earlier with CTEs or subqueries to reduce the hash table size).
Q03 of 03SENIOR
The execution plan estimated 100 rows but the query returned 5 million. What do you do?
ANSWER
The statistics are stale. The planner chose a plan optimised for 100 rows (likely a Nested Loop join) that is catastrophic for 5 million rows. Immediate action: run ANALYZE on the affected table(s) to refresh statistics. Then run EXPLAIN ANALYZE again — the planner should now choose a plan appropriate for 5 million rows (typically a Hash Join). Verify that estimated rows now matches actual rows. If autovacuum should have handled this, check pg_stat_user_tables: SELECT relname, n_live_tup, last_autoanalyze FROM pg_stat_user_tables WHERE relname = 'orders'. If last_autoanalyze is stale or NULL, investigate autovacuum settings — autovacuum_analyze_scale_factor may be set too high for tables with large sudden row count changes.
01
How do you read a PostgreSQL execution plan — where do you start?
SENIOR
02
A Hash Join in your execution plan shows Batches: 8. What does this mean and how do you fix it?
SENIOR
03
The execution plan estimated 100 rows but the query returned 5 million. What do you do?
SENIOR
FAQ · 3 QUESTIONS
Frequently Asked Questions
01
What is the difference between EXPLAIN and EXPLAIN ANALYZE?
EXPLAIN shows the query plan based on statistics without executing the query. EXPLAIN ANALYZE actually runs the query and shows both the estimated plan and the real execution timings and row counts. For diagnosis, always use EXPLAIN ANALYZE — the estimated plan from plain EXPLAIN can be misleading when statistics are stale.
Was this helpful?
02
Does EXPLAIN ANALYZE work in MySQL?
MySQL supports EXPLAIN and, from MySQL 8.0.18 onward, EXPLAIN ANALYZE. MySQL EXPLAIN uses different node terminology than PostgreSQL: 'ALL' is equivalent to Seq Scan, 'ref' and 'eq_ref' are equivalent to Index Scan variants. The interpretation principles are the same — look for 'ALL' on large tables as the signal for a missing or unused index.
Was this helpful?
03
Why does my query use an index in EXPLAIN but a Seq Scan in production?
The planner can make different decisions based on the actual parameter values at execution time. A query WHERE customer_id = 42 with a parameter that returns 100 rows may use an index scan. The same query WHERE customer_id = 1 returning 500,000 rows may use a seq scan — correctly, because an index scan over 500,000 rows with heap fetches is slower than a sequential scan. This is parameter-dependent plan selection, related to parameter sniffing in stored procedures.