Senior 8 min · March 05, 2026

SQL EXPLAIN — Nested Loop on 5M Rows Caused 12-Min Query

Stale statistics on a 5M-row table: planner estimated 100 rows, chose Nested Loop, caused 12-min query.

N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

Follow
Production
production tested
May 23, 2026
last updated
1,554
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer
  • EXPLAIN shows the query plan the database will use; EXPLAIN ANALYZE actually runs the query and shows real timings
  • Seq Scan = full table scan — every row read — usually the problem node to fix
  • Index Scan = B-tree traversal + heap fetch per row; Index Only Scan = no heap fetch (fastest)
  • Cost is in arbitrary units: (startup_cost..total_cost rows=estimated width=bytes)
  • Nested Loop = one table drives, the other is looked up per row — good for small outer tables; Hash Join = build hash table from smaller table, probe with larger — good for large tables
  • Loops=N on an inner node means that node ran N times — the O(n²) signal for correlated subqueries
✦ Definition~90s read
What is SQL EXPLAIN and Execution Plans?

SQL EXPLAIN is a database diagnostic tool that reveals the execution plan a query optimizer generates to retrieve your data. Instead of guessing why a query is slow, EXPLAIN shows you the actual steps—scan methods (Seq Scan, Index Scan, Index Only Scan), join algorithms (Nested Loop, Hash Join, Merge Join), and row estimates at each node.

Every slow query has a story, and EXPLAIN is how you read it.

It exists because SQL is declarative: you say what you want, not how to get it. The optimizer chooses the how, and when it guesses wrong—like picking a Nested Loop join over 5 million rows instead of a Hash Join—you get a 12-minute query. EXPLAIN is your only window into that decision, letting you pinpoint where the plan breaks down and fix it with indexes, statistics, or query rewrites.

In the PostgreSQL ecosystem, EXPLAIN is the standard tool for query tuning, but it has a critical limitation: it shows the optimizer's estimated costs, not actual execution time. That’s where EXPLAIN ANALYZE comes in—it runs the query and reports real timings and row counts, exposing when estimates are off by orders of magnitude due to stale statistics or correlated columns.

Without ANALYZE, you’re debugging blind. Use EXPLAIN when you need to understand the plan structure; use EXPLAIN ANALYZE when you need the truth. Alternatives like pg_stat_statements or auto_explain provide historical or automated plan logging, but for ad-hoc debugging, nothing beats EXPLAIN ANALYZE.

When not to use EXPLAIN? Don’t rely on it for production queries with side effects (INSERT/UPDATE/DELETE) unless wrapped in a transaction and rolled back—EXPLAIN ANALYZE actually executes the query. Also, EXPLAIN alone can mislead you if statistics are outdated; always run ANALYZE on the relevant tables first.

For complex queries, focus on the highest-cost node first—that’s where your 12 minutes are hiding. Real-world example: a Nested Loop join on two tables with 5M and 100K rows, no index on the join column, and the optimizer underestimating the inner table’s row count—EXPLAIN ANALYZE shows 5M index scans instead of the estimated 100, and you know exactly where to add an index or force a Hash Join.

Plain-English First

Every slow query has a story, and EXPLAIN is how you read it. A query can be written in dozens of equivalent ways, and the database makes a decision about the most efficient path to execute it. EXPLAIN shows you that decision — like the GPS navigation plan before you drive. EXPLAIN ANALYZE is like driving the route and recording how long each segment actually took.

Every slow query has a story, and EXPLAIN is how you read it. In production, a query that ran fine on 10,000 rows can grind a system to a halt at 10 million. The difference isn't always the query itself — it's the execution plan the query optimiser chose. Developers who can read execution plans diagnose slow queries in minutes instead of hours of guesswork.

This guide teaches you to read execution plans the same way a senior DBA does: starting from the innermost node and reading the cost signals, not from the top down. By the end, you'll know the difference between every major node type, what the cost numbers actually mean, and how to use the plan to drive targeted fixes.

Why Your Query Is Slow: The Execution Plan Tells the Real Story

An SQL EXPLAIN execution plan is the database's step-by-step recipe for retrieving your requested data. It shows how the query optimizer chose to access tables, join them, and filter rows — including the algorithms used (e.g., nested loop, hash join, sequential scan) and their estimated costs. The plan is the only reliable way to understand why a query runs for minutes instead of milliseconds.

Execution plans are tree structures: each node represents an operation (scan, join, sort) with its own cost, row estimate, and actual row count when run with ANALYZE. The critical property is that the optimizer's estimates can be wildly wrong due to outdated statistics or complex predicates. A nested loop join with 5 million rows on the outer side and no index on the inner side will produce O(n*m) complexity — that's 25 trillion comparisons, explaining a 12-minute runtime.

Use EXPLAIN (ANALYZE, BUFFERS, TIMING) on any query that takes longer than 100ms in production. Without it, you're guessing. The plan reveals exactly where time is spent: sequential scans on large tables, missing indexes, or bad join order. In real systems, a single missing index can turn a 50ms query into a 5-minute disaster — the execution plan shows you the smoking gun.

Estimates ≠ Reality
EXPLAIN without ANALYZE shows optimizer guesses — they can be off by 1000x. Always use EXPLAIN (ANALYZE, BUFFERS) to see actual row counts and I/O.
Production Insight
A team ran a nightly report joining 5M orders to 2M customers with no index on customer_id — the nested loop produced 10 trillion comparisons, killing the database CPU for 12 minutes.
Symptom: 100% CPU on one core, query stuck in 'executing' state, other queries timing out due to resource contention.
Rule: If EXPLAIN shows a nested loop with estimated rows > 10,000 on the outer side, you must have an index on the inner join column — or force a hash join.
Key Takeaway
EXPLAIN (ANALYZE, BUFFERS) is the only way to diagnose query performance — never guess.
A nested loop join on large tables without an index is the #1 cause of multi-minute queries.
Always check actual vs. estimated row counts — a 100x mismatch means stale statistics or a bad query structure.
SQL EXPLAIN: Nested Loop on 5M Rows THECODEFORGE.IO SQL EXPLAIN: Nested Loop on 5M Rows How to read execution plans and fix slow queries Execution Plan Basics Seq Scan, Index Scan, Index Only Scan Join Algorithms Nested Loop, Hash Join, Merge Join Statistics & ANALYZE Why estimates go wrong EXPLAIN ANALYZE The only honest estimator Planner Mistakes Why wrong plan chosen and fix Parallel Plans Why 64-core machine idles ⚠ Relying only on EXPLAIN without ANALYZE Always use EXPLAIN ANALYZE to see actual vs estimated rows THECODEFORGE.IO
thecodeforge.io
SQL EXPLAIN: Nested Loop on 5M Rows
Sql Explain Execution Plans

How to Read an Execution Plan — The Fundamentals

An execution plan is a tree. The query planner breaks your SQL into operations and arranges them from innermost (raw data retrieval) to outermost (final result). You read a plan from the inside out — the innermost nodes execute first.

Every node in the plan shows three things: the operation type (Seq Scan, Index Scan, Hash Join), the cost estimate (startup_cost..total_cost), and the row and width estimates. When you run EXPLAIN ANALYZE, you also get the actual time and actual rows — the ground truth.

The most important comparison: estimated rows vs actual rows. When these diverge by an order of magnitude, the planner is working from stale statistics and the plan choice may be wrong. A planner that thinks a table has 1,000 rows when it has 1,000,000 will choose algorithms designed for small data.

explain_analyze_reading.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Basic EXPLAIN: shows estimated plan without executing
EXPLAIN
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.name;

-- EXPLAIN ANALYZE: executes and shows actual timings
EXPLAIN ANALYZE
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.name;

-- EXPLAIN ANALYZE BUFFERS: also shows cache hit ratio
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.name;
Output
HashAggregate (cost=8420.00..8425.00 rows=500 width=40)
(actual time=124.3..125.1 rows=487 loops=1)
-> Hash Join (cost=1250.00..8100.00 rows=12000 width=32)
(actual time=18.2..119.4 rows=11842 loops=1)
Hash Cond: (o.customer_id = c.customer_id)
-> Seq Scan on orders (cost=0.00..6200.00 rows=12000 width=16)
(actual time=0.05..98.3 rows=11842 loops=1)
Filter: ((status)::text = 'completed')
Rows Removed by Filter: 38158
-> Hash (cost=750.00..750.00 rows=50000 width=24)
-> Seq Scan on customers (cost=0..750 rows=50000 width=24)
Always Use EXPLAIN ANALYZE on the Actual Slow Query
EXPLAIN without ANALYZE shows estimated plans based on statistics — it does not run the query. The estimates can be wrong. EXPLAIN ANALYZE runs the query and shows real timing. For read queries on production, this is safe. For destructive queries (DELETE, UPDATE), wrap in a transaction and roll back: BEGIN; EXPLAIN ANALYZE DELETE FROM ...; ROLLBACK;
Production Insight
Read execution plans from the innermost indented node outward — the deepest node executes first.
The most useful signal: actual rows vs estimated rows. A 10x or greater divergence means statistics are stale — run ANALYZE.
For production query analysis, use EXPLAIN (ANALYZE, BUFFERS) — the BUFFERS output shows how many data pages came from cache vs disk, which is often the real performance signal.
Key Takeaway
Execution plans are trees — read from innermost node (deepest indentation) outward.
Actual rows vs estimated rows is the most important signal — large divergence means stale statistics.
EXPLAIN ANALYZE runs the query; EXPLAIN alone uses estimates. Always ANALYZE with care on production destructive queries.

Seq Scan, Index Scan, and Index Only Scan — The Core Node Types

Three scan types appear in most queries. Understanding what each does determines what fix to apply.

Seq Scan (Sequential Scan) reads every page of the table in order. This is not always wrong — on small tables or when the query returns >15-20% of all rows, a sequential scan is cheaper than following millions of index pointers. But Seq Scan on a large table with a selective WHERE clause is the most common performance problem in SQL.

Index Scan uses a B-tree index to find matching row locations, then fetches each matching row from the heap (the main table storage). The index narrows the search; the heap provides the full row data. This is the standard fix for a Seq Scan — add an index on the WHERE column.

Index Only Scan is the fastest path — all data the query needs is in the index itself (a covering index), so the heap is never accessed. Zero heap fetches. Achievable by adding INCLUDE columns to the index definition.

scan_types_diagnosis.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Seq Scan: no index, or planner chose not to use one
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
-- Seq Scan on orders (cost=0.00..8500.00 rows=1200 width=80)
-- Actual rows: 1200, loops=1  -- if this is >15% of the table, seq scan is correct

-- Add index -> Index Scan
CREATE INDEX idx_orders_status ON orders(status) WHERE status = 'pending';
-- partial index: only indexes the rows that match -- much smaller, more selective

EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
-- Index Scan using idx_orders_status on orders
-- (cost=0.42..180.00 rows=1200 width=80) -- cost dropped from 8500 to 180

-- Covering index -> Index Only Scan (no heap access)
CREATE INDEX idx_orders_customer_status
    ON orders(customer_id, status)
    INCLUDE (order_id, total, created_at);  -- add SELECT columns

EXPLAIN ANALYZE
SELECT order_id, total, created_at
FROM orders
WHERE customer_id = 42 AND status = 'completed';
-- Index Only Scan using idx_orders_customer_status
-- Heap Fetches: 0  <- fastest possible path
Output
-- Before index: Seq Scan cost=8500
-- After index: Index Scan cost=180
-- With covering index: Index Only Scan, Heap Fetches: 0
-- Cost reduced from 8500 to ~40
Seq Scan Is Not Always Wrong
The planner chooses Seq Scan when it estimates it is cheaper than following index pointers — typically when >15-20% of rows would be returned. On a table of 1,000 rows, Seq Scan is almost always chosen regardless of indexes — and correctly so. Only investigate Seq Scan when it appears on a large table with a highly selective WHERE clause.
Production Insight
Seq Scan on a large table + low estimated rows = missing index or function wrapping the column.
Seq Scan on a large table + high estimated rows = low selectivity — the planner is correct, the query needs redesigning.
Index Only Scan is the goal for your top read-heavy queries — identify them with pg_stat_statements and cover them.
Key Takeaway
Seq Scan = full table read — investigate when on large tables with selective WHERE.
Index Scan = B-tree lookup + heap fetch per row — standard fix for Seq Scan on selective queries.
Index Only Scan = zero heap access — achievable with INCLUDE columns, the highest-performance read path.

Join Algorithms — Nested Loop, Hash Join, Merge Join

Three join algorithms appear in execution plans. The planner chooses based on estimated table sizes, available indexes, and whether the data is sorted.

Nested Loop Join drives with one table (the outer loop) and probes the inner table once per outer row. Fast when the outer table is small and the inner table has an index on the join column. Catastrophic when the outer table is large — the inner probe repeats N times where N is the outer row count. The loops=N in the inner node of EXPLAIN ANALYZE reveals this.

Hash Join builds a hash table from the smaller table, then probes it for each row of the larger table. Two passes over the data, but O(n+m) rather than O(n×m). The standard choice for large table joins. Requires memory — if the hash table spills to disk (HashBatches > 1 in EXPLAIN output), performance degrades significantly.

Merge Join requires both tables to be sorted on the join column. Extremely fast when both sides are already sorted or can use an index scan in sorted order. Rarely chosen by the planner unless sort order is already available.

join_algorithms_explain.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- Force specific join type for comparison (PostgreSQL)
-- Normally leave this to the planner -- these are for diagnosis only

SET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAIN ANALYZE SELECT * FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
-- Will use Nested Loop -- shows cost with forced algorithm

RESET enable_hashjoin;
RESET enable_mergejoin;
EXPLAIN ANALYZE SELECT * FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
-- Planner chooses optimal -- typically Hash Join for large tables

-- Diagnose: is Hash Join spilling to disk?
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table a JOIN another_large_table b ON a.id = b.a_id;
-- Look for: Batches: 4 (instead of 1)
-- Batches > 1 means the hash table spilled to disk -- increase work_mem
-- SET work_mem = '256MB'; -- session-level increase for expensive joins
Output
-- Nested Loop (forced, 100K rows outer):
-- Nested Loop (actual time=0.04..18420.3 rows=100000 loops=1)
-- -> Seq Scan on orders (loops=1)
-- -> Index Scan on customers (loops=100000) <- inner runs 100K times
-- Hash Join (planner chose, same query):
-- Hash Join (actual time=42.1..380.2 rows=100000 loops=1)
-- -> Seq Scan on orders (loops=1)
-- -> Hash -> Seq Scan on customers (loops=1) <- hash built once
-- Massive improvement: 18420ms -> 380ms
Hash Join Spilling to Disk
If EXPLAIN ANALYZE shows Batches: 4 (or any number > 1) on a Hash Join, the hash table exceeded work_mem and spilled to disk. This dramatically slows the join. Increase work_mem for the session: SET work_mem = '256MB'. This is a session-level setting — increase it for specific expensive queries, not globally, to avoid OOM on concurrent connections.
Production Insight
Nested Loop + large loops= number is the most common join performance problem — it means the planner underestimated the outer table's size (stale statistics).
Hash Join Batches > 1 means memory pressure — increase work_mem for the session, not globally.
Planner forcing (SET enable_hashjoin = off) is a diagnostic tool only — never leave it disabled in production.
Key Takeaway
Nested Loop = good for small outer tables + indexed inner; catastrophic for large outer tables.
Hash Join = large table joins; Batches > 1 means hash spilled to disk — increase work_mem.
Merge Join = when both sides are already sorted — rare but extremely fast when applicable.

Statistics, ANALYZE, and Why Estimates Go Wrong

The query planner does not read your data to make decisions — it reads statistics. PostgreSQL maintains per-column statistics: the number of distinct values, the most common values and their frequencies, and a histogram of the value distribution. These statistics live in pg_statistic and are updated by VACUUM ANALYZE.

When statistics are stale — after a bulk insert, a data migration, or a period of high insert/delete volume — the planner's row estimates can be orders of magnitude off. A planner that thinks a table has 100,000 rows when it has 5,000,000 will choose Nested Loop joins instead of Hash Joins, index scans instead of seq scans on highly selective queries, and estimate sort operations as cheap when they are expensive.

The practical workflow: whenever a query plan looks wrong (estimated rows diverges greatly from actual rows in EXPLAIN ANALYZE), run ANALYZE on the affected table, then re-run EXPLAIN ANALYZE. In most cases the plan improves immediately.

statistics_and_analyze.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Check how fresh the statistics are for a table
SELECT
    relname          AS table_name,
    n_live_tup       AS estimated_live_rows,
    n_dead_tup       AS dead_rows,
    last_analyze     AS last_manual_analyze,
    last_autoanalyze AS last_auto_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Force statistics update after bulk import or migration
ANALYZE orders;
-- Or with verbosity:
ANALYZE VERBOSE orders;

-- Check column-level statistics (what the planner sees)
SELECT
    attname          AS column,
    n_distinct,      -- negative = fraction of total rows; -0.5 means 50% distinct
    correlation      -- 1.0 = perfectly sorted, 0 = random -- affects index vs seq scan choice
FROM pg_stats
WHERE tablename = 'orders'
  AND attname IN ('customer_id', 'status', 'created_at');

-- Identify tables with stale or missing statistics
SELECT relname, n_live_tup, last_analyze
FROM pg_stat_user_tables
WHERE last_analyze < NOW() - INTERVAL '7 days'
   OR last_analyze IS NULL
ORDER BY n_live_tup DESC;
Output
-- pg_stat_user_tables:
table_name | estimated_live_rows | last_analyze
orders | 5000000 | 2024-03-14 02:30:00 <- analyzed recently
customer_stats | 85000 | NULL <- NEVER analyzed! Statistics are default
-- pg_stats for orders:
column | n_distinct | correlation
customer_id| -0.8 | 0.12 <- 80% distinct, random distribution
status | 4 | 0.45 <- only 4 distinct values (low cardinality)
created_at | -1.0 | 0.98 <- highly distinct, mostly sorted
Always ANALYZE After Bulk Inserts and Migrations
ANALYZE must be run manually after bulk inserts — autovacuum may not keep up with sudden large row count changes. A table that grows from 100K to 5M rows in a single migration will have statistics based on the old row count until ANALYZE runs. Add ANALYZE to every migration script that inserts or deletes large numbers of rows.
Production Insight
The single most impactful EXPLAIN diagnostic: compare estimated rows vs actual rows in EXPLAIN ANALYZE output — any 10x divergence means stale statistics and a likely suboptimal plan.
For tables with high insert/delete rates, check last_autoanalyze frequency in pg_stat_user_tables — if autovacuum is not running frequently enough, adjust autovacuum_analyze_scale_factor.
Add ANALYZE tablename to every migration script that affects large row counts.
Key Takeaway
The planner uses statistics, not data — stale statistics produce wrong plans.
EXPLAIN ANALYZE shows estimated vs actual rows — large divergence = run ANALYZE.
Add ANALYZE to migration scripts; monitor last_autoanalyze in pg_stat_user_tables for high-volume tables.

EXPLAIN ANALYZE — The Only Honest Estimator

The plain EXPLAIN lies to you. It shows the planner's guess — what it thinks will happen based on table statistics that might be stale, skewed, or just wrong. You see an Index Scan with cost 12.42 and assume your query is fine. Then it runs for three seconds in production.

EXPLAIN ANALYZE actually executes the query. It shows real timings, real row counts, and the actual number of loops for each node. The gap between "estimated rows" and "actual rows" is where performance goes to die. When you see estimated=1 vs actual=1,000,000, you know the planner chose a terrible join order or scan method based on bad stats.

Run EXPLAIN ANALYZE on a representative dataset — not a local dev table with three rows. Capture the output before a migration or schema change. Compare the before and after. That's how you catch regressions before they hit users.

The time overhead is real, especially on writes (INSERT/UPDATE/DELETE). Don't run ANALYZE on production OLTP during peak hours. Use a read replica or a staging environment with production-scale data.

ExplainAnalyzeCost.sqlSQL
1
2
3
4
5
6
7
8
9
// io.thecodeforge — database tutorial

EXPLAIN (ANALYZE, BUFFERS, TIMING) 
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2024-01-01'
ORDER BY o.total DESC
LIMIT 100;
Output
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1583.45..1583.70 rows=100 width=64) (actual time=15.432..15.450 rows=100 loops=1)
-> Sort (cost=1583.45..1633.45 rows=20000 width=64) (actual time=15.431..15.442 rows=100 loops=1)
Sort Key: o.total DESC
Sort Method: top-N heapsort Memory: 37kB
-> Hash Join (cost=308.25..1083.25 rows=20000 width=64) (actual time=3.112..12.001 rows=20000 loops=1)
Hash Cond: (o.customer_id = c.id)
-> Seq Scan on orders o (cost=0.00..625.00 rows=20000 width=36) (actual time=0.012..4.543 rows=20000 loops=1)
Filter: (created_at > '2024-01-01')
Rows Removed by Filter: 50000
-> Hash (cost=183.25..183.25 rows=10000 width=36) (actual time=2.100..2.100 rows=10000 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 561kB
-> Seq Scan on customers c (cost=0.00..183.25 rows=10025 width=36) (actual time=0.008..1.023 rows=10000 loops=1)
Planning Time: 0.245 ms
Execution Time: 15.589 ms
Production Trap:
EXPLAIN ANALYZE on a DELETE or UPDATE will actually modify data. Always wrap in a transaction and ROLLBACK if you're just diagnosing. Better yet, use EXPLAIN (ANALYZE, FORMAT JSON) which doesn't mutate state.
Key Takeaway
Always compare estimated rows to actual rows. A 10x or higher mismatch means your planner is flying blind — fix it with VACUUM ANALYZE or better statistics targets.

Why the Planner Chooses Wrong — and How to Fix It

You've seen it: a query that was fast yesterday is slow today. No code change. No schema change. The execution plan flipped from an Index Scan to a Sequential Scan. Your first instinct is wrong — don't add an index hint or force a plan. Fix the root cause.

The planner's job is to minimize total cost. It relies on table statistics: row counts, null fractions, average column width, distribution histograms. When those stats are stale — after massive inserts, deletes, or bulk updates — the planner guesses wrong. A table that had 1,000 rows an hour ago now has 10,000,000. The planner still thinks a Seq Scan is cheap because it expects 1,000 rows. Oops.

Run ANALYZE after bulk operations. Not just once — set autovacuum thresholds that match your write volume. For tables with heavy churn (event logs, audit trails), consider increasing default_statistics_target to 1000 or higher for columns used in WHERE and JOIN predicates. That captures histograms with 1000 buckets instead of the default 100. More buckets = better cardinality estimates for skewed data.

Still wrong? Use pg_stats to inspect column frequency arrays and bucket boundaries. If the planner doesn't see a value's histogram entry, it assumes uniform distribution. Real data is never uniform.

FixBadPlan.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// io.thecodeforge — database tutorial

-- Check current statistics target (default 100)
SHOW default_statistics_target;

-- Increase per-column for the join column
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;

-- Collect fresh stats
ANALYZE orders;

-- Verify the histogram now has enough buckets
SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';
Output
default_statistics_target
---------------------------
100
(1 row)
-- After ANALYZE:
attname | n_distinct | most_common_vals | histogram_bounds
-------------+------------+------------------+------------------
customer_id | -0.85 | {142, 583, 902} | {1, 12, 27, 45, ...}
(1 row)
Senior Shortcut:
Set default_statistics_target to 500 for databases with skewed or large tables. The planning time increase is negligible — the execution plan reliability gain is massive.
Key Takeaway
Stale statistics are the #1 cause of sudden query regressions. Automate ANALYZE after every bulk write, or your planner will make increasingly terrible choices.

Parallel Plans: Why Your 64-Core Machine Idles While You Wait

Most devs assume throwing hardware at a slow query makes it faster. The planner disagrees. It decides whether a parallel plan is worth the overhead — and it's often wrong.

Parallelism kicks in when the planner estimates a query will process enough rows to justify splitting work across workers. It starts with a Gather or Gather Merge node, then fans out to partial scans or joins. The problem? If your table is small or your WHERE clause is selective, the coordinator spends more time orchestrating than the workers spend executing. You get slower results on a screaming-fast machine.

Check parallel_workers and parallel_tuple_cost in the plan. A Seq Scan on a 10M-row table should go parallel. An Index Scan returning 100 rows should not. Force parallelism with a hint only after confirming the planner's row estimate is correct — otherwise you're just burning CPU cycles.

ParallelCheck.sqlSQL
1
2
3
4
5
6
7
8
9
10
// io.thecodeforge — database tutorial

-- Force a parallel plan to see what the planner avoids
SET max_parallel_workers_per_gather = 4;

EXPLAIN (ANALYZE, TIMING OFF)
SELECT count(*) FROM orders WHERE order_date >= '2024-01-01';

-- Output shows Gather node with workers
-- If you see 'Workers Launched: 0', the planner thinks it's not worth it
Output
QUERY PLAN
---------------------------------------------------------------------
Finalize Aggregate (cost=12421.30..12421.31 rows=1 width=8)
-> Gather (cost=12421.08..12421.29 rows=2 width=8)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=11421.08..11421.09 rows=1 width=8)
-> Parallel Seq Scan on orders
Filter: (order_date >= '2024-01-01'::date)
Production Trap:
Parallel workers consume connections. If you set max_parallel_workers_per_gather too high, you'll starve other queries and hit connection pool limits. Never set it above half your max_connections.
Key Takeaway
Parallelism has a cost. If your plan shows Workers Launched: 0, trust the planner — or fix your estimates first.

Subquery vs. CTE: The Planner's Dirty Secret About Materialization

Every dev loves a CTE for readability. The planner loves it for a different reason: it materializes the CTE once and reuses it. That sounds great until the CTE returns 10M rows and the outer query only needs 10.

Subqueries get flattened into the main plan. The planner can push predicates down, use indexes, and join in smarter order. CTEs act as optimization fences. Once materialized, the planner treats the result as a static table — no pushdown, no index use. You're stuck with a full scan on a temp dataset.

Check the plan for 'CTE Scan' or 'Materialize' nodes. If you see CTE Scan on my_cte followed by a sequential scan, you're paying the materialization tax. Rewrite as a subquery or a LATERAL join. Exceptions: recursive CTEs and CTEs referenced more than once. Those genuinely benefit from materialization.

CTEvsSubquery.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
// io.thecodeforge — database tutorial

-- Slow CTE: plan shows CTE Scan + Seq Scan on large dataset
EXPLAIN ANALYZE
WITH recent_orders AS (
    SELECT * FROM orders WHERE order_date > '2024-06-01'
)
SELECT * FROM recent_orders WHERE customer_id = 42;

-- Fast subquery: plan shows Index Scan with predicate pushdown
EXPLAIN ANALYZE
SELECT * FROM orders 
WHERE order_date > '2024-06-01' 
  AND customer_id = 42;
Output
CTE Plan:
---------------------------------------------------------------------
CTE Scan on recent_orders (cost=1234.56..2234.67 rows=50000 width=100)
Filter: (customer_id = 42)
Subquery Plan:
---------------------------------------------------------------------
Index Scan using idx_orders_customer_date on orders
Index Cond: (customer_id = 42)
Filter: (order_date > '2024-06-01'::date)
Senior Shortcut:
If you must use a CTE and it's referenced only once, slap a NOT MATERIALIZED hint after the AS keyword. PostgreSQL 12+ respects it. Oracle and SQL Server ignore it — know your engine.
Key Takeaway
CTEs materialize by default. If your plan shows CTE Scan, consider a subquery — or add NOT MATERIALIZED to force inlining.
● Production incidentPOST-MORTEMseverity: high

A Nested Loop Join on a 5-Million-Row Table Caused a 12-Minute Query

Symptom
A weekly revenue report that generated in 30 seconds started taking 12 minutes after a data migration increased the orders table from 100,000 rows to 5 million. The query had not changed.
Assumption
The query had been running fine for a year. No one checked whether the execution plan had changed after the migration.
Root cause
The query planner's statistics were stale — it estimated 100 rows for the orders table (based on pre-migration statistics) and chose a Nested Loop join as appropriate for small tables. At 5 million rows, a Nested Loop repeats the inner table probe 5 million times. The correct plan for 5 million rows was a Hash Join, which the planner would have chosen with accurate statistics.
Fix
ANALYZE orders; forced the statistics to update. The planner immediately chose a Hash Join on the next run. Query time dropped from 12 minutes to 18 seconds. Added ANALYZE to the post-migration checklist.
Key lesson
  • Run ANALYZE on large tables after bulk inserts or migrations — stale statistics cause wrong plan choices
  • Always run EXPLAIN ANALYZE after any significant data volume change to verify the plan has not regressed
  • The planner estimates rows based on statistics — if the estimate is wildly off from actual rows, statistics are stale
Production debug guideRead the plan systematically — innermost node first4 entries
Symptom · 01
Seq Scan on a large table in the plan
Fix
Check if the WHERE column has an index. If yes, check if the query uses a function on the column (WHERE LOWER(email) kills the index). Check selectivity — if >20% of rows match, the planner may prefer Seq Scan even with an index. Run ANALYZE to refresh statistics.
Symptom · 02
Actual rows is many times higher than estimated rows
Fix
Stale statistics. Run ANALYZE table_name. For tables with frequent inserts/deletes, check autovacuum settings — autovacuum may not be keeping statistics current. The planner chooses plans based on estimates — bad estimates produce bad plans.
Symptom · 03
Nested Loop with loops=N where N is very large
Fix
The outer table is large and the inner probe is repeating N times. Add an index on the inner table's JOIN column. If N is in the millions, the planner may have stale statistics causing it to underestimate the outer table's row count — run ANALYZE.
Symptom · 04
Sort node with high actual time
Fix
The query cannot use an index for ORDER BY — it is performing a filesort. Add an index that matches the ORDER BY column order. For window functions, add a composite index on (PARTITION_BY_col, ORDER_BY_col).
★ EXPLAIN Reading Cheat SheetKey nodes, what they mean, and what to do
Seq Scan on large table
Immediate action
Check for missing index or function wrapping the column
Commands
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
SELECT indexname FROM pg_indexes WHERE tablename = 'orders';
Fix now
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Stale estimates — actual rows far exceeds estimated rows+
Immediate action
Refresh table statistics
Commands
ANALYZE orders;
EXPLAIN ANALYZE SELECT ... ; -- rerun the slow query
Fix now
If autovacuum is not keeping up, check: SELECT relname, n_live_tup, n_dead_tup, last_analyze FROM pg_stat_user_tables WHERE relname = 'orders';
Node TypeWhat It DoesWhen It AppearsWhat To Do
Seq ScanReads every table page sequentiallyNo index, low selectivity, or small tableAdd index if WHERE is selective; check if >15% of rows returned
Index ScanB-tree lookup + heap fetch per rowSelective WHERE with indexUsually optimal — consider INCLUDE for Index Only Scan
Index Only ScanB-tree lookup only, no heap accessCovering index on all needed columnsIdeal — add INCLUDE columns to achieve this
Nested LoopOuter table drives, inner probed per rowSmall outer table or index on inner JOIN colCatastrophic on large outer — add index to inner or check statistics
Hash JoinBuild hash table, probe with other tableLarge table joins without sort orderStandard for large joins — check Batches > 1 (spill to disk)
SortSort rows for ORDER BY or Merge JoinORDER BY without matching indexAdd index matching ORDER BY order
HashAggregateGroup rows for GROUP BYGROUP BY or DISTINCTUsually fine — check for large estimated distinct count

Key takeaways

1
Read execution plans from the innermost (deepest indented) node outward
innermost nodes execute first.
2
Estimated rows vs actual rows is the most important signal
10x divergence means stale statistics, run ANALYZE.
3
Seq Scan is only a problem on large tables with selective WHERE clauses
on small tables or low-selectivity queries, it is often correct.
4
Always run ANALYZE after bulk inserts and migrations
add it to every migration script that changes large row counts.

Common mistakes to avoid

3 patterns
×

Running EXPLAIN without ANALYZE and treating the estimated plan as ground truth

Symptom
The estimated plan looks fine but the query is slow in practice — actual rows is orders of magnitude higher than estimated rows
Fix
Always use EXPLAIN ANALYZE for diagnosis. For destructive queries, wrap in a transaction: BEGIN; EXPLAIN ANALYZE UPDATE ...; ROLLBACK. The estimated plan from plain EXPLAIN is based on statistics that may be stale.
×

Immediately adding an index after seeing a Seq Scan without checking selectivity

Symptom
The new index is never used by the planner — it still chooses Seq Scan; the index adds write overhead for no read benefit
Fix
Check what percentage of rows the WHERE clause returns. If >15-20% of rows match, Seq Scan is correct and an index will not help. Only add an index when the WHERE clause is highly selective (returns a small fraction of total rows).
×

Not running ANALYZE after bulk inserts or data migrations

Symptom
Queries that were fast before a migration become slow afterward — the plan has regressed to a worse algorithm despite no query changes
Fix
Add ANALYZE table_name to every migration script after bulk inserts or deletes. Verify with EXPLAIN ANALYZE that estimated rows match actual rows after the migration.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How do you read a PostgreSQL execution plan — where do you start?
Q02SENIOR
A Hash Join in your execution plan shows Batches: 8. What does this mean...
Q03SENIOR
The execution plan estimated 100 rows but the query returned 5 million. ...
Q01 of 03SENIOR

How do you read a PostgreSQL execution plan — where do you start?

ANSWER
Start from the innermost node (the most deeply indented line) and read outward. The innermost nodes are the raw data retrieval operations — Seq Scan or Index Scan — and they execute first. Each outer node receives the result of its inner nodes. For each node, I look at three things: the node type (what operation), the cost estimate (startup..total), and most critically, the estimated rows vs actual rows when using EXPLAIN ANALYZE. A large divergence between estimated and actual rows means stale statistics and a potentially suboptimal plan. I then look for the most expensive nodes by actual time and investigate those first — Seq Scans on large tables, Nested Loops with high loops values, Sort nodes without index backing.
FAQ · 3 QUESTIONS

Frequently Asked Questions

01
What is the difference between EXPLAIN and EXPLAIN ANALYZE?
02
Does EXPLAIN ANALYZE work in MySQL?
03
Why does my query use an index in EXPLAIN but a Seq Scan in production?
N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

Follow
Verified
production tested
May 23, 2026
last updated
1,554
articles · all by Naren
🔥

That's SQL Advanced. Mark it forged?

8 min read · try the examples if you haven't

Previous
SQL Query Optimisation
9 / 16 · SQL Advanced
Next
Full-Text Search in SQL