Advanced 8 min · March 05, 2026

SQL EXPLAIN and Execution Plans

SQL EXPLAIN — Nested Loop on 5M Rows Caused 12-Min Query

Q: What is the difference between EXPLAIN and EXPLAIN ANALYZE?

EXPLAIN shows the query plan based on statistics without executing the query. EXPLAIN ANALYZE actually runs the query and shows both the estimated plan and the real execution timings and row counts. For diagnosis, always use EXPLAIN ANALYZE — the estimated plan from plain EXPLAIN can be misleading when statistics are stale.

Q: Does EXPLAIN ANALYZE work in MySQL?

MySQL supports EXPLAIN and, from MySQL 8.0.18 onward, EXPLAIN ANALYZE. MySQL EXPLAIN uses different node terminology than PostgreSQL: 'ALL' is equivalent to Seq Scan, 'ref' and 'eq_ref' are equivalent to Index Scan variants. The interpretation principles are the same — look for 'ALL' on large tables as the signal for a missing or unused index.

Q: Why does my query use an index in EXPLAIN but a Seq Scan in production?

The planner can make different decisions based on the actual parameter values at execution time. A query WHERE customer_id = 42 with a parameter that returns 100 rows may use an index scan. The same query WHERE customer_id = 1 returning 500,000 rows may use a seq scan — correctly, because an index scan over 500,000 rows with heap fetches is slower than a sequential scan. This is parameter-dependent plan selection, related to parameter sniffing in stored procedures.

Stale statistics on a 5M-row table: planner estimated 100 rows, chose Nested Loop, caused 12-min query.

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

✓ Production

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

Before you start⏱ 30 min

✓Deep production experience
✓Understanding of internals and trade-offs
✓Experience debugging complex systems

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

EXPLAIN shows the query plan the database will use; EXPLAIN ANALYZE actually runs the query and shows real timings
Seq Scan = full table scan — every row read — usually the problem node to fix
Index Scan = B-tree traversal + heap fetch per row; Index Only Scan = no heap fetch (fastest)
Cost is in arbitrary units: (startup_cost..total_cost rows=estimated width=bytes)
Nested Loop = one table drives, the other is looked up per row — good for small outer tables; Hash Join = build hash table from smaller table, probe with larger — good for large tables
Loops=N on an inner node means that node ran N times — the O(n²) signal for correlated subqueries

✦ Definition~90s read

What is SQL EXPLAIN and Execution Plans?

SQL EXPLAIN is a database diagnostic tool that reveals the execution plan a query optimizer generates to retrieve your data. Instead of guessing why a query is slow, EXPLAIN shows you the actual steps—scan methods (Seq Scan, Index Scan, Index Only Scan), join algorithms (Nested Loop, Hash Join, Merge Join), and row estimates at each node.

★

Every slow query has a story, and EXPLAIN is how you read it.

It exists because SQL is declarative: you say what you want, not how to get it. The optimizer chooses the how, and when it guesses wrong—like picking a Nested Loop join over 5 million rows instead of a Hash Join—you get a 12-minute query. EXPLAIN is your only window into that decision, letting you pinpoint where the plan breaks down and fix it with indexes, statistics, or query rewrites.

In the PostgreSQL ecosystem, EXPLAIN is the standard tool for query tuning, but it has a critical limitation: it shows the optimizer's estimated costs, not actual execution time. That’s where EXPLAIN ANALYZE comes in—it runs the query and reports real timings and row counts, exposing when estimates are off by orders of magnitude due to stale statistics or correlated columns.

Without ANALYZE, you’re debugging blind. Use EXPLAIN when you need to understand the plan structure; use EXPLAIN ANALYZE when you need the truth. Alternatives like pg_stat_statements or auto_explain provide historical or automated plan logging, but for ad-hoc debugging, nothing beats EXPLAIN ANALYZE.

When not to use EXPLAIN? Don’t rely on it for production queries with side effects (INSERT/UPDATE/DELETE) unless wrapped in a transaction and rolled back—EXPLAIN ANALYZE actually executes the query. Also, EXPLAIN alone can mislead you if statistics are outdated; always run ANALYZE on the relevant tables first.

For complex queries, focus on the highest-cost node first—that’s where your 12 minutes are hiding. Real-world example: a Nested Loop join on two tables with 5M and 100K rows, no index on the join column, and the optimizer underestimating the inner table’s row count—EXPLAIN ANALYZE shows 5M index scans instead of the estimated 100, and you know exactly where to add an index or force a Hash Join.

Plain-English First

Every slow query has a story, and EXPLAIN is how you read it. A query can be written in dozens of equivalent ways, and the database makes a decision about the most efficient path to execute it. EXPLAIN shows you that decision — like the GPS navigation plan before you drive. EXPLAIN ANALYZE is like driving the route and recording how long each segment actually took.

Every slow query has a story, and EXPLAIN is how you read it. In production, a query that ran fine on 10,000 rows can grind a system to a halt at 10 million. The difference isn't always the query itself — it's the execution plan the query optimiser chose. Developers who can read execution plans diagnose slow queries in minutes instead of hours of guesswork.

This guide teaches you to read execution plans the same way a senior DBA does: starting from the innermost node and reading the cost signals, not from the top down. By the end, you'll know the difference between every major node type, what the cost numbers actually mean, and how to use the plan to drive targeted fixes.

Why Your Query Is Slow: The Execution Plan Tells the Real Story

An SQL EXPLAIN execution plan is the database's step-by-step recipe for retrieving your requested data. It shows how the query optimizer chose to access tables, join them, and filter rows — including the algorithms used (e.g., nested loop, hash join, sequential scan) and their estimated costs. The plan is the only reliable way to understand why a query runs for minutes instead of milliseconds.

Execution plans are tree structures: each node represents an operation (scan, join, sort) with its own cost, row estimate, and actual row count when run with ANALYZE. The critical property is that the optimizer's estimates can be wildly wrong due to outdated statistics or complex predicates. A nested loop join with 5 million rows on the outer side and no index on the inner side will produce O(n*m) complexity — that's 25 trillion comparisons, explaining a 12-minute runtime.

Use EXPLAIN (ANALYZE, BUFFERS, TIMING) on any query that takes longer than 100ms in production. Without it, you're guessing. The plan reveals exactly where time is spent: sequential scans on large tables, missing indexes, or bad join order. In real systems, a single missing index can turn a 50ms query into a 5-minute disaster — the execution plan shows you the smoking gun.

⚠ Estimates ≠ Reality

EXPLAIN without ANALYZE shows optimizer guesses — they can be off by 1000x. Always use EXPLAIN (ANALYZE, BUFFERS) to see actual row counts and I/O.

📊 Production Insight

A team ran a nightly report joining 5M orders to 2M customers with no index on customer_id — the nested loop produced 10 trillion comparisons, killing the database CPU for 12 minutes.

Symptom: 100% CPU on one core, query stuck in 'executing' state, other queries timing out due to resource contention.

Rule: If EXPLAIN shows a nested loop with estimated rows > 10,000 on the outer side, you must have an index on the inner join column — or force a hash join.

🎯 Key Takeaway

EXPLAIN (ANALYZE, BUFFERS) is the only way to diagnose query performance — never guess.

A nested loop join on large tables without an index is the #1 cause of multi-minute queries.

Always check actual vs. estimated row counts — a 100x mismatch means stale statistics or a bad query structure.

thecodeforge.io

Sql Explain Execution Plans

How to Read an Execution Plan — The Fundamentals

An execution plan is a tree. The query planner breaks your SQL into operations and arranges them from innermost (raw data retrieval) to outermost (final result). You read a plan from the inside out — the innermost nodes execute first.

Every node in the plan shows three things: the operation type (Seq Scan, Index Scan, Hash Join), the cost estimate (startup_cost..total_cost), and the row and width estimates. When you run EXPLAIN ANALYZE, you also get the actual time and actual rows — the ground truth.

The most important comparison: estimated rows vs actual rows. When these diverge by an order of magnitude, the planner is working from stale statistics and the plan choice may be wrong. A planner that thinks a table has 1,000 rows when it has 1,000,000 will choose algorithms designed for small data.

explain_analyze_reading.sqlSQL

-- Basic EXPLAIN: shows estimated plan without executing
EXPLAIN
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.name;

-- EXPLAIN ANALYZE: executes and shows actual timings
EXPLAIN ANALYZE
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.name;

-- EXPLAIN ANALYZE BUFFERS: also shows cache hit ratio
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.name;

Output

HashAggregate (cost=8420.00..8425.00 rows=500 width=40)

(actual time=124.3..125.1 rows=487 loops=1)

-> Hash Join (cost=1250.00..8100.00 rows=12000 width=32)

(actual time=18.2..119.4 rows=11842 loops=1)

Hash Cond: (o.customer_id = c.customer_id)

-> Seq Scan on orders (cost=0.00..6200.00 rows=12000 width=16)

(actual time=0.05..98.3 rows=11842 loops=1)

Filter: ((status)::text = 'completed')

Rows Removed by Filter: 38158

-> Hash (cost=750.00..750.00 rows=50000 width=24)

-> Seq Scan on customers (cost=0..750 rows=50000 width=24)

💡Always Use EXPLAIN ANALYZE on the Actual Slow Query

EXPLAIN without ANALYZE shows estimated plans based on statistics — it does not run the query. The estimates can be wrong. EXPLAIN ANALYZE runs the query and shows real timing. For read queries on production, this is safe. For destructive queries (DELETE, UPDATE), wrap in a transaction and roll back: BEGIN; EXPLAIN ANALYZE DELETE FROM ...; ROLLBACK;

📊 Production Insight

Read execution plans from the innermost indented node outward — the deepest node executes first.

The most useful signal: actual rows vs estimated rows. A 10x or greater divergence means statistics are stale — run ANALYZE.

For production query analysis, use EXPLAIN (ANALYZE, BUFFERS) — the BUFFERS output shows how many data pages came from cache vs disk, which is often the real performance signal.

🎯 Key Takeaway

Execution plans are trees — read from innermost node (deepest indentation) outward.

Actual rows vs estimated rows is the most important signal — large divergence means stale statistics.

EXPLAIN ANALYZE runs the query; EXPLAIN alone uses estimates. Always ANALYZE with care on production destructive queries.

Seq Scan, Index Scan, and Index Only Scan — The Core Node Types

Three scan types appear in most queries. Understanding what each does determines what fix to apply.

Seq Scan (Sequential Scan) reads every page of the table in order. This is not always wrong — on small tables or when the query returns >15-20% of all rows, a sequential scan is cheaper than following millions of index pointers. But Seq Scan on a large table with a selective WHERE clause is the most common performance problem in SQL.

Index Scan uses a B-tree index to find matching row locations, then fetches each matching row from the heap (the main table storage). The index narrows the search; the heap provides the full row data. This is the standard fix for a Seq Scan — add an index on the WHERE column.

Index Only Scan is the fastest path — all data the query needs is in the index itself (a covering index), so the heap is never accessed. Zero heap fetches. Achievable by adding INCLUDE columns to the index definition.

scan_types_diagnosis.sqlSQL

-- Seq Scan: no index, or planner chose not to use one
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
-- Seq Scan on orders (cost=0.00..8500.00 rows=1200 width=80)
-- Actual rows: 1200, loops=1  -- if this is >15% of the table, seq scan is correct

-- Add index -> Index Scan
CREATE INDEX idx_orders_status ON orders(status) WHERE status = 'pending';
-- partial index: only indexes the rows that match -- much smaller, more selective

EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
-- Index Scan using idx_orders_status on orders
-- (cost=0.42..180.00 rows=1200 width=80) -- cost dropped from 8500 to 180

-- Covering index -> Index Only Scan (no heap access)
CREATE INDEX idx_orders_customer_status
    ON orders(customer_id, status)
    INCLUDE (order_id, total, created_at);  -- add SELECT columns

EXPLAIN ANALYZE
SELECT order_id, total, created_at
FROM orders
WHERE customer_id = 42 AND status = 'completed';
-- Index Only Scan using idx_orders_customer_status
-- Heap Fetches: 0  <- fastest possible path

Output

-- Before index: Seq Scan cost=8500

-- After index: Index Scan cost=180

-- With covering index: Index Only Scan, Heap Fetches: 0

-- Cost reduced from 8500 to ~40

⚠ Seq Scan Is Not Always Wrong

The planner chooses Seq Scan when it estimates it is cheaper than following index pointers — typically when >15-20% of rows would be returned. On a table of 1,000 rows, Seq Scan is almost always chosen regardless of indexes — and correctly so. Only investigate Seq Scan when it appears on a large table with a highly selective WHERE clause.

📊 Production Insight

Seq Scan on a large table + low estimated rows = missing index or function wrapping the column.

Seq Scan on a large table + high estimated rows = low selectivity — the planner is correct, the query needs redesigning.

Index Only Scan is the goal for your top read-heavy queries — identify them with pg_stat_statements and cover them.

🎯 Key Takeaway

Seq Scan = full table read — investigate when on large tables with selective WHERE.

Index Scan = B-tree lookup + heap fetch per row — standard fix for Seq Scan on selective queries.

Index Only Scan = zero heap access — achievable with INCLUDE columns, the highest-performance read path.

thecodeforge.io

Sql Explain Execution Plans

Join Algorithms — Nested Loop, Hash Join, Merge Join

Three join algorithms appear in execution plans. The planner chooses based on estimated table sizes, available indexes, and whether the data is sorted.

Nested Loop Join drives with one table (the outer loop) and probes the inner table once per outer row. Fast when the outer table is small and the inner table has an index on the join column. Catastrophic when the outer table is large — the inner probe repeats N times where N is the outer row count. The loops=N in the inner node of EXPLAIN ANALYZE reveals this.

Hash Join builds a hash table from the smaller table, then probes it for each row of the larger table. Two passes over the data, but O(n+m) rather than O(n×m). The standard choice for large table joins. Requires memory — if the hash table spills to disk (HashBatches > 1 in EXPLAIN output), performance degrades significantly.

Merge Join requires both tables to be sorted on the join column. Extremely fast when both sides are already sorted or can use an index scan in sorted order. Rarely chosen by the planner unless sort order is already available.

join_algorithms_explain.sqlSQL

-- Force specific join type for comparison (PostgreSQL)
-- Normally leave this to the planner -- these are for diagnosis only

SET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAIN ANALYZE SELECT * FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
-- Will use Nested Loop -- shows cost with forced algorithm

RESET enable_hashjoin;
RESET enable_mergejoin;
EXPLAIN ANALYZE SELECT * FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
-- Planner chooses optimal -- typically Hash Join for large tables

-- Diagnose: is Hash Join spilling to disk?
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table a JOIN another_large_table b ON a.id = b.a_id;
-- Look for: Batches: 4 (instead of 1)
-- Batches > 1 means the hash table spilled to disk -- increase work_mem
-- SET work_mem = '256MB'; -- session-level increase for expensive joins

Output

-- Nested Loop (forced, 100K rows outer):

-- Nested Loop (actual time=0.04..18420.3 rows=100000 loops=1)

-- -> Seq Scan on orders (loops=1)

-- -> Index Scan on customers (loops=100000) <- inner runs 100K times

-- Hash Join (planner chose, same query):

-- Hash Join (actual time=42.1..380.2 rows=100000 loops=1)

-- -> Seq Scan on orders (loops=1)

-- -> Hash -> Seq Scan on customers (loops=1) <- hash built once

-- Massive improvement: 18420ms -> 380ms

🔥Hash Join Spilling to Disk

If EXPLAIN ANALYZE shows Batches: 4 (or any number > 1) on a Hash Join, the hash table exceeded work_mem and spilled to disk. This dramatically slows the join. Increase work_mem for the session: SET work_mem = '256MB'. This is a session-level setting — increase it for specific expensive queries, not globally, to avoid OOM on concurrent connections.

📊 Production Insight

Nested Loop + large loops= number is the most common join performance problem — it means the planner underestimated the outer table's size (stale statistics).

Hash Join Batches > 1 means memory pressure — increase work_mem for the session, not globally.

Planner forcing (SET enable_hashjoin = off) is a diagnostic tool only — never leave it disabled in production.

🎯 Key Takeaway

Nested Loop = good for small outer tables + indexed inner; catastrophic for large outer tables.

Hash Join = large table joins; Batches > 1 means hash spilled to disk — increase work_mem.

Merge Join = when both sides are already sorted — rare but extremely fast when applicable.

Statistics, ANALYZE, and Why Estimates Go Wrong

The query planner does not read your data to make decisions — it reads statistics. PostgreSQL maintains per-column statistics: the number of distinct values, the most common values and their frequencies, and a histogram of the value distribution. These statistics live in pg_statistic and are updated by VACUUM ANALYZE.

When statistics are stale — after a bulk insert, a data migration, or a period of high insert/delete volume — the planner's row estimates can be orders of magnitude off. A planner that thinks a table has 100,000 rows when it has 5,000,000 will choose Nested Loop joins instead of Hash Joins, index scans instead of seq scans on highly selective queries, and estimate sort operations as cheap when they are expensive.

The practical workflow: whenever a query plan looks wrong (estimated rows diverges greatly from actual rows in EXPLAIN ANALYZE), run ANALYZE on the affected table, then re-run EXPLAIN ANALYZE. In most cases the plan improves immediately.

statistics_and_analyze.sqlSQL

-- Check how fresh the statistics are for a table
SELECT
    relname          AS table_name,
    n_live_tup       AS estimated_live_rows,
    n_dead_tup       AS dead_rows,
    last_analyze     AS last_manual_analyze,
    last_autoanalyze AS last_auto_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Force statistics update after bulk import or migration
ANALYZE orders;
-- Or with verbosity:
ANALYZE VERBOSE orders;

-- Check column-level statistics (what the planner sees)
SELECT
    attname          AS column,
    n_distinct,      -- negative = fraction of total rows; -0.5 means 50% distinct
    correlation      -- 1.0 = perfectly sorted, 0 = random -- affects index vs seq scan choice
FROM pg_stats
WHERE tablename = 'orders'
  AND attname IN ('customer_id', 'status', 'created_at');

-- Identify tables with stale or missing statistics
SELECT relname, n_live_tup, last_analyze
FROM pg_stat_user_tables
WHERE last_analyze < NOW() - INTERVAL '7 days'
   OR last_analyze IS NULL
ORDER BY n_live_tup DESC;

Output

-- pg_stat_user_tables:

table_name | estimated_live_rows | last_analyze

orders | 5000000 | 2024-03-14 02:30:00 <- analyzed recently

customer_stats | 85000 | NULL <- NEVER analyzed! Statistics are default

-- pg_stats for orders:

column | n_distinct | correlation

customer_id| -0.8 | 0.12 <- 80% distinct, random distribution

status | 4 | 0.45 <- only 4 distinct values (low cardinality)

created_at | -1.0 | 0.98 <- highly distinct, mostly sorted

⚠ Always ANALYZE After Bulk Inserts and Migrations

ANALYZE must be run manually after bulk inserts — autovacuum may not keep up with sudden large row count changes. A table that grows from 100K to 5M rows in a single migration will have statistics based on the old row count until ANALYZE runs. Add ANALYZE to every migration script that inserts or deletes large numbers of rows.

📊 Production Insight

The single most impactful EXPLAIN diagnostic: compare estimated rows vs actual rows in EXPLAIN ANALYZE output — any 10x divergence means stale statistics and a likely suboptimal plan.

For tables with high insert/delete rates, check last_autoanalyze frequency in pg_stat_user_tables — if autovacuum is not running frequently enough, adjust autovacuum_analyze_scale_factor.

Add ANALYZE tablename to every migration script that affects large row counts.

🎯 Key Takeaway

The planner uses statistics, not data — stale statistics produce wrong plans.

EXPLAIN ANALYZE shows estimated vs actual rows — large divergence = run ANALYZE.

Add ANALYZE to migration scripts; monitor last_autoanalyze in pg_stat_user_tables for high-volume tables.

EXPLAIN ANALYZE — The Only Honest Estimator

The plain EXPLAIN lies to you. It shows the planner's guess — what it thinks will happen based on table statistics that might be stale, skewed, or just wrong. You see an Index Scan with cost 12.42 and assume your query is fine. Then it runs for three seconds in production.

EXPLAIN ANALYZE actually executes the query. It shows real timings, real row counts, and the actual number of loops for each node. The gap between "estimated rows" and "actual rows" is where performance goes to die. When you see estimated=1 vs actual=1,000,000, you know the planner chose a terrible join order or scan method based on bad stats.

Run EXPLAIN ANALYZE on a representative dataset — not a local dev table with three rows. Capture the output before a migration or schema change. Compare the before and after. That's how you catch regressions before they hit users.

The time overhead is real, especially on writes (INSERT/UPDATE/DELETE). Don't run ANALYZE on production OLTP during peak hours. Use a read replica or a staging environment with production-scale data.

ExplainAnalyzeCost.sqlSQL

// io.thecodeforge — database tutorial

EXPLAIN (ANALYZE, BUFFERS, TIMING) 
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2024-01-01'
ORDER BY o.total DESC
LIMIT 100;

Output

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------

Limit (cost=1583.45..1583.70 rows=100 width=64) (actual time=15.432..15.450 rows=100 loops=1)

-> Sort (cost=1583.45..1633.45 rows=20000 width=64) (actual time=15.431..15.442 rows=100 loops=1)

Sort Key: o.total DESC

Sort Method: top-N heapsort Memory: 37kB

-> Hash Join (cost=308.25..1083.25 rows=20000 width=64) (actual time=3.112..12.001 rows=20000 loops=1)

Hash Cond: (o.customer_id = c.id)

-> Seq Scan on orders o (cost=0.00..625.00 rows=20000 width=36) (actual time=0.012..4.543 rows=20000 loops=1)

Filter: (created_at > '2024-01-01')

Rows Removed by Filter: 50000

-> Hash (cost=183.25..183.25 rows=10000 width=36) (actual time=2.100..2.100 rows=10000 loops=1)

Buckets: 4096 Batches: 1 Memory Usage: 561kB

-> Seq Scan on customers c (cost=0.00..183.25 rows=10025 width=36) (actual time=0.008..1.023 rows=10000 loops=1)

Planning Time: 0.245 ms

Execution Time: 15.589 ms

⚠ Production Trap:

EXPLAIN ANALYZE on a DELETE or UPDATE will actually modify data. Always wrap in a transaction and ROLLBACK if you're just diagnosing. Better yet, use EXPLAIN (ANALYZE, FORMAT JSON) which doesn't mutate state.

🎯 Key Takeaway

Always compare estimated rows to actual rows. A 10x or higher mismatch means your planner is flying blind — fix it with VACUUM ANALYZE or better statistics targets.

Why the Planner Chooses Wrong — and How to Fix It

You've seen it: a query that was fast yesterday is slow today. No code change. No schema change. The execution plan flipped from an Index Scan to a Sequential Scan. Your first instinct is wrong — don't add an index hint or force a plan. Fix the root cause.

The planner's job is to minimize total cost. It relies on table statistics: row counts, null fractions, average column width, distribution histograms. When those stats are stale — after massive inserts, deletes, or bulk updates — the planner guesses wrong. A table that had 1,000 rows an hour ago now has 10,000,000. The planner still thinks a Seq Scan is cheap because it expects 1,000 rows. Oops.

Run ANALYZE after bulk operations. Not just once — set autovacuum thresholds that match your write volume. For tables with heavy churn (event logs, audit trails), consider increasing default_statistics_target to 1000 or higher for columns used in WHERE and JOIN predicates. That captures histograms with 1000 buckets instead of the default 100. More buckets = better cardinality estimates for skewed data.

Still wrong? Use pg_stats to inspect column frequency arrays and bucket boundaries. If the planner doesn't see a value's histogram entry, it assumes uniform distribution. Real data is never uniform.

FixBadPlan.sqlSQL

// io.thecodeforge — database tutorial

-- Check current statistics target (default 100)
SHOW default_statistics_target;

-- Increase per-column for the join column
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;

-- Collect fresh stats
ANALYZE orders;

-- Verify the histogram now has enough buckets
SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';

Output

default_statistics_target

---------------------------

100

(1 row)

-- After ANALYZE:

attname | n_distinct | most_common_vals | histogram_bounds

-------------+------------+------------------+------------------

customer_id | -0.85 | {142, 583, 902} | {1, 12, 27, 45, ...}

(1 row)

💡Senior Shortcut:

Set default_statistics_target to 500 for databases with skewed or large tables. The planning time increase is negligible — the execution plan reliability gain is massive.

🎯 Key Takeaway

Stale statistics are the #1 cause of sudden query regressions. Automate ANALYZE after every bulk write, or your planner will make increasingly terrible choices.

Parallel Plans: Why Your 64-Core Machine Idles While You Wait

Most devs assume throwing hardware at a slow query makes it faster. The planner disagrees. It decides whether a parallel plan is worth the overhead — and it's often wrong.

Parallelism kicks in when the planner estimates a query will process enough rows to justify splitting work across workers. It starts with a Gather or Gather Merge node, then fans out to partial scans or joins. The problem? If your table is small or your WHERE clause is selective, the coordinator spends more time orchestrating than the workers spend executing. You get slower results on a screaming-fast machine.

Check parallel_workers and parallel_tuple_cost in the plan. A Seq Scan on a 10M-row table should go parallel. An Index Scan returning 100 rows should not. Force parallelism with a hint only after confirming the planner's row estimate is correct — otherwise you're just burning CPU cycles.

ParallelCheck.sqlSQL

// io.thecodeforge — database tutorial

-- Force a parallel plan to see what the planner avoids
SET max_parallel_workers_per_gather = 4;

EXPLAIN (ANALYZE, TIMING OFF)
SELECT count(*) FROM orders WHERE order_date >= '2024-01-01';

-- Output shows Gather node with workers
-- If you see 'Workers Launched: 0', the planner thinks it's not worth it

Output

QUERY PLAN

---------------------------------------------------------------------

Finalize Aggregate (cost=12421.30..12421.31 rows=1 width=8)

-> Gather (cost=12421.08..12421.29 rows=2 width=8)

Workers Planned: 2

Workers Launched: 2

-> Partial Aggregate (cost=11421.08..11421.09 rows=1 width=8)

-> Parallel Seq Scan on orders

Filter: (order_date >= '2024-01-01'::date)

⚠ Production Trap:

Parallel workers consume connections. If you set max_parallel_workers_per_gather too high, you'll starve other queries and hit connection pool limits. Never set it above half your max_connections.

🎯 Key Takeaway

Parallelism has a cost. If your plan shows Workers Launched: 0, trust the planner — or fix your estimates first.

Subquery vs. CTE: The Planner's Dirty Secret About Materialization

Every dev loves a CTE for readability. The planner loves it for a different reason: it materializes the CTE once and reuses it. That sounds great until the CTE returns 10M rows and the outer query only needs 10.

Subqueries get flattened into the main plan. The planner can push predicates down, use indexes, and join in smarter order. CTEs act as optimization fences. Once materialized, the planner treats the result as a static table — no pushdown, no index use. You're stuck with a full scan on a temp dataset.

Check the plan for 'CTE Scan' or 'Materialize' nodes. If you see CTE Scan on my_cte followed by a sequential scan, you're paying the materialization tax. Rewrite as a subquery or a LATERAL join. Exceptions: recursive CTEs and CTEs referenced more than once. Those genuinely benefit from materialization.

CTEvsSubquery.sqlSQL

// io.thecodeforge — database tutorial

-- Slow CTE: plan shows CTE Scan + Seq Scan on large dataset
EXPLAIN ANALYZE
WITH recent_orders AS (
    SELECT * FROM orders WHERE order_date > '2024-06-01'
)
SELECT * FROM recent_orders WHERE customer_id = 42;

-- Fast subquery: plan shows Index Scan with predicate pushdown
EXPLAIN ANALYZE
SELECT * FROM orders 
WHERE order_date > '2024-06-01' 
  AND customer_id = 42;

Output

CTE Plan:

---------------------------------------------------------------------

CTE Scan on recent_orders (cost=1234.56..2234.67 rows=50000 width=100)

Filter: (customer_id = 42)

Subquery Plan:

---------------------------------------------------------------------

Index Scan using idx_orders_customer_date on orders

Index Cond: (customer_id = 42)

Filter: (order_date > '2024-06-01'::date)

🔥Senior Shortcut:

If you must use a CTE and it's referenced only once, slap a NOT MATERIALIZED hint after the AS keyword. PostgreSQL 12+ respects it. Oracle and SQL Server ignore it — know your engine.

🎯 Key Takeaway

CTEs materialize by default. If your plan shows CTE Scan, consider a subquery — or add NOT MATERIALIZED to force inlining.

Reading EXPLAIN Plans: Seq Scan, Index Scan, Bitmap Scan, Hash Join

Understanding the different node types in an EXPLAIN plan is crucial for diagnosing query performance. A Seq Scan reads the entire table row by row, which is efficient for small tables but disastrous for large ones. An Index Scan uses an index to locate rows, but may still access the heap for non-indexed columns. A Bitmap Scan combines multiple indexes or handles large result sets by creating a bitmap of matching pages, then fetching rows in physical order. A Hash Join builds a hash table on one input and probes it with the other, ideal for equi-joins on unsorted data. For example, consider a query joining two tables:

``sql EXPLAIN SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE c.country = 'US'; ``

A typical plan might show a Hash Join: the planner scans customers (filtering by country), builds a hash table, then scans orders and probes the hash. If no index exists on o.customer_id, you might see a Seq Scan on orders, which could be slow. Adding an index on orders.customer_id can change the plan to an Index Scan or even a Nested Loop. Recognizing these node types helps you identify where the planner is spending time and whether an index or query rewrite is needed.

explain_scan_types.sqlSQL

-- Example: Seq Scan on orders without index
EXPLAIN SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE c.country = 'US';

-- After adding index on orders.customer_id
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
EXPLAIN SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE c.country = 'US';

🔥Bitmap Scan vs Index Scan

📊 Production Insight

In production, monitor for unexpected Seq Scans on large tables. Use pg_stat_user_tables to see if sequential scans are happening frequently, and consider adding indexes or tuning work_mem to encourage Bitmap or Hash joins.

🎯 Key Takeaway

Recognizing Seq Scan, Index Scan, Bitmap Scan, and Hash Join in EXPLAIN plans helps you pinpoint performance bottlenecks and decide where to add indexes or rewrite queries.

EXPLAIN ANALYZE: Actual vs Estimated Rows and Timing

EXPLAIN ANALYZE executes the query and provides actual row counts and timing, revealing discrepancies between the planner's estimates and reality. Large mismatches indicate stale statistics or poor cardinality estimates. For example:

``sql EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending'; ``

The output might show rows=10000 (estimated) vs rows=500000 (actual). This can lead to a poor join order or wrong join algorithm. The actual time includes startup and total time per node, helping you identify the most expensive step. If a Hash Join estimates 1000 rows but actual is 1 million, the hash table may spill to disk, causing high I/O. To fix, run ANALYZE to refresh statistics, or increase default_statistics_target for better histogram detail. Also, EXPLAIN (ANALYZE, BUFFERS) adds buffer hit/miss info, showing whether the query is I/O bound. For instance, shared hit=1000 means data was in cache, while shared read=500 indicates disk reads. This guides tuning of shared_buffers or query patterns.

explain_analyze_example.sqlSQL

-- Run EXPLAIN ANALYZE to see actual vs estimated
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE status = 'pending';

-- Check statistics freshness
SELECT tablename, schemaname, last_analyze FROM pg_stat_user_tables WHERE tablename = 'orders';

⚠ Don't Run EXPLAIN ANALYZE on Production Unnecessarily

📊 Production Insight

Set up regular ANALYZE schedules (e.g., via autovacuum) and monitor pg_stat_user_tables for last_analyze. If estimates are consistently off, increase default_statistics_target for key columns.

🎯 Key Takeaway

EXPLAIN ANALYZE reveals the true cost of a query by showing actual rows and timing, helping you detect misestimates and tune statistics or query structure.

Visual Explain Tools: pgAdmin, MSSQL Execution Plan, MySQL Workbench

Visual explain tools provide a graphical representation of execution plans, making it easier to spot expensive operations. pgAdmin offers a 'Explain' button that shows a tree with node costs, row estimates, and actuals if ANALYZE is used. You can hover over nodes for details. MSSQL Management Studio displays the execution plan as a flowchart, with thicker lines indicating higher cost. It also shows missing index suggestions. MySQL Workbench has an 'Explain' tab that visualizes the plan, though it's less detailed than pgAdmin. For example, in pgAdmin, after running EXPLAIN, click 'Explain' -> 'Analyze' to get a visual tree. The width of arrows indicates relative cost. You can also use third-party tools like Depesz (online) or PEV (PostgreSQL Explain Visualizer) for deeper analysis. These tools highlight sequential scans, nested loops, and other expensive nodes. They also show the percentage of total time spent in each node. For MSSQL, the actual execution plan (Ctrl+M) includes per-operator row counts and CPU time. MySQL Workbench's visual explain shows table access methods and join types. Using these tools, you can quickly identify that a Nested Loop join on 5M rows is the culprit, as in the article's example.

visual_explain_usage.sqlSQL

-- In pgAdmin, run EXPLAIN then click 'Explain' -> 'Analyze'
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) SELECT * FROM orders WHERE status = 'pending';

-- For MySQL Workbench, use:
EXPLAIN SELECT * FROM orders WHERE status = 'pending';

💡Use Visual Explain for Complex Queries

📊 Production Insight

Integrate visual explain into your code review process. For slow queries, generate a visual plan and share it with the team to discuss optimization strategies.

🎯 Key Takeaway

Visual explain tools like pgAdmin, MSSQL Execution Plan, and MySQL Workbench make it easier to identify expensive operations and understand plan structure at a glance.

● Production incidentPOST-MORTEMseverity: high

A Nested Loop Join on a 5-Million-Row Table Caused a 12-Minute Query

Symptom

A weekly revenue report that generated in 30 seconds started taking 12 minutes after a data migration increased the orders table from 100,000 rows to 5 million. The query had not changed.

Assumption

The query had been running fine for a year. No one checked whether the execution plan had changed after the migration.

Root cause

The query planner's statistics were stale — it estimated 100 rows for the orders table (based on pre-migration statistics) and chose a Nested Loop join as appropriate for small tables. At 5 million rows, a Nested Loop repeats the inner table probe 5 million times. The correct plan for 5 million rows was a Hash Join, which the planner would have chosen with accurate statistics.

Fix

ANALYZE orders; forced the statistics to update. The planner immediately chose a Hash Join on the next run. Query time dropped from 12 minutes to 18 seconds. Added ANALYZE to the post-migration checklist.

Key lesson

Run ANALYZE on large tables after bulk inserts or migrations — stale statistics cause wrong plan choices
Always run EXPLAIN ANALYZE after any significant data volume change to verify the plan has not regressed
The planner estimates rows based on statistics — if the estimate is wildly off from actual rows, statistics are stale

Production debug guideRead the plan systematically — innermost node first4 entries

Symptom · 01

Seq Scan on a large table in the plan

→

Fix

Check if the WHERE column has an index. If yes, check if the query uses a function on the column (WHERE LOWER(email) kills the index). Check selectivity — if >20% of rows match, the planner may prefer Seq Scan even with an index. Run ANALYZE to refresh statistics.

Symptom · 02

Actual rows is many times higher than estimated rows

→

Fix

Stale statistics. Run ANALYZE table_name. For tables with frequent inserts/deletes, check autovacuum settings — autovacuum may not be keeping statistics current. The planner chooses plans based on estimates — bad estimates produce bad plans.

Symptom · 03

Nested Loop with loops=N where N is very large

→

Fix

The outer table is large and the inner probe is repeating N times. Add an index on the inner table's JOIN column. If N is in the millions, the planner may have stale statistics causing it to underestimate the outer table's row count — run ANALYZE.

Symptom · 04

Sort node with high actual time

→

Fix

The query cannot use an index for ORDER BY — it is performing a filesort. Add an index that matches the ORDER BY column order. For window functions, add a composite index on (PARTITION_BY_col, ORDER_BY_col).

★ EXPLAIN Reading Cheat SheetKey nodes, what they mean, and what to do

Seq Scan on large table−

Immediate action

Check for missing index or function wrapping the column

Commands

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

SELECT indexname FROM pg_indexes WHERE tablename = 'orders';

Fix now

CREATE INDEX idx_orders_customer_id ON orders(customer_id);

Stale estimates — actual rows far exceeds estimated rows+

Node Type	What It Does	When It Appears	What To Do
Seq Scan	Reads every table page sequentially	No index, low selectivity, or small table	Add index if WHERE is selective; check if >15% of rows returned
Index Scan	B-tree lookup + heap fetch per row	Selective WHERE with index	Usually optimal — consider INCLUDE for Index Only Scan
Index Only Scan	B-tree lookup only, no heap access	Covering index on all needed columns	Ideal — add INCLUDE columns to achieve this
Nested Loop	Outer table drives, inner probed per row	Small outer table or index on inner JOIN col	Catastrophic on large outer — add index to inner or check statistics
Hash Join	Build hash table, probe with other table	Large table joins without sort order	Standard for large joins — check Batches > 1 (spill to disk)
Sort	Sort rows for ORDER BY or Merge Join	ORDER BY without matching index	Add index matching ORDER BY order
HashAggregate	Group rows for GROUP BY	GROUP BY or DISTINCT	Usually fine — check for large estimated distinct count

⚙ Quick Reference

11 commands from this guide

File	Command / Code	Purpose
explain_analyze_reading.sql	EXPLAIN	How to Read an Execution Plan
scan_types_diagnosis.sql	EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';	Seq Scan, Index Scan, and Index Only Scan
join_algorithms_explain.sql	SET enable_hashjoin = off;	Join Algorithms
statistics_and_analyze.sql	SELECT	Statistics, ANALYZE, and Why Estimates Go Wrong
ExplainAnalyzeCost.sql	EXPLAIN (ANALYZE, BUFFERS, TIMING)	EXPLAIN ANALYZE
FixBadPlan.sql	SHOW default_statistics_target;	Why the Planner Chooses Wrong
ParallelCheck.sql	SET max_parallel_workers_per_gather = 4;	Parallel Plans
CTEvsSubquery.sql	EXPLAIN ANALYZE	Subquery vs. CTE
explain_scan_types.sql	EXPLAIN SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE c....	Reading EXPLAIN Plans
explain_analyze_example.sql	EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE status = 'pending';	EXPLAIN ANALYZE
visual_explain_usage.sql	EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) SELECT * FROM orders WHE...	Visual Explain Tools

Key takeaways

Read execution plans from the innermost (deepest indented) node outward

innermost nodes execute first.

Estimated rows vs actual rows is the most important signal

10x divergence means stale statistics, run ANALYZE.

Seq Scan is only a problem on large tables with selective WHERE clauses

on small tables or low-selectivity queries, it is often correct.

Always run ANALYZE after bulk inserts and migrations

add it to every migration script that changes large row counts.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

How do you read a PostgreSQL execution plan — where do you start?

Q02SENIOR

A Hash Join in your execution plan shows Batches: 8. What does this mean...

Q03SENIOR

The execution plan estimated 100 rows but the query returned 5 million. ...

Q01 of 03SENIOR

How do you read a PostgreSQL execution plan — where do you start?

ANSWER

Start from the innermost node (the most deeply indented line) and read outward. The innermost nodes are the raw data retrieval operations — Seq Scan or Index Scan — and they execute first. Each outer node receives the result of its inner nodes. For each node, I look at three things: the node type (what operation), the cost estimate (startup..total), and most critically, the estimated rows vs actual rows when using EXPLAIN ANALYZE. A large divergence between estimated and actual rows means stale statistics and a potentially suboptimal plan. I then look for the most expensive nodes by actual time and investigate those first — Seq Scans on large tables, Nested Loops with high loops values, Sort nodes without index backing.

FAQ · 3 QUESTIONS

Frequently Asked Questions

What is the difference between EXPLAIN and EXPLAIN ANALYZE?

Does EXPLAIN ANALYZE work in MySQL?

Why does my query use an index in EXPLAIN but a Seq Scan in production?

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

✓ Verified

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

🔥

That's SQL Advanced. Mark it forged?

8 min read · try the examples if you haven't