EXPLAIN shows the query plan the database will use; EXPLAIN ANALYZE actually runs the query and shows real timings
Seq Scan = full table scan — every row read — usually the problem node to fix
Index Scan = B-tree traversal + heap fetch per row; Index Only Scan = no heap fetch (fastest)
Cost is in arbitrary units: (startup_cost..total_cost rows=estimated width=bytes)
Nested Loop = one table drives, the other is looked up per row — good for small outer tables; Hash Join = build hash table from smaller table, probe with larger — good for large tables
Loops=N on an inner node means that node ran N times — the O(n²) signal for correlated subqueries
Plain-English First
Every slow query has a story, and EXPLAIN is how you read it. A query can be written in dozens of equivalent ways, and the database makes a decision about the most efficient path to execute it. EXPLAIN shows you that decision — like the GPS navigation plan before you drive. EXPLAIN ANALYZE is like driving the route and recording how long each segment actually took.
Every slow query has a story, and EXPLAIN is how you read it. In production, a query that ran fine on 10,000 rows can grind a system to a halt at 10 million. The difference isn't always the query itself — it's the execution plan the query optimiser chose. Developers who can read execution plans diagnose slow queries in minutes instead of hours of guesswork.
This guide teaches you to read execution plans the same way a senior DBA does: starting from the innermost node and reading the cost signals, not from the top down. By the end, you'll know the difference between every major node type, what the cost numbers actually mean, and how to use the plan to drive targeted fixes.
How to Read an Execution Plan — The Fundamentals
An execution plan is a tree. The query planner breaks your SQL into operations and arranges them from innermost (raw data retrieval) to outermost (final result). You read a plan from the inside out — the innermost nodes execute first.
Every node in the plan shows three things: the operation type (Seq Scan, Index Scan, Hash Join), the cost estimate (startup_cost..total_cost), and the row and width estimates. When you run EXPLAIN ANALYZE, you also get the actual time and actual rows — the ground truth.
The most important comparison: estimated rows vs actual rows. When these diverge by an order of magnitude, the planner is working from stale statistics and the plan choice may be wrong. A planner that thinks a table has 1,000 rows when it has 1,000,000 will choose algorithms designed for small data.
explain_analyze_reading.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Basic EXPLAIN: shows estimated plan without executingEXPLAINSELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'GROUPBY c.customer_id, c.name;
-- EXPLAIN ANALYZE: executes and shows actual timingsEXPLAINANALYZESELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'GROUPBY c.customer_id, c.name;
-- EXPLAIN ANALYZE BUFFERS: also shows cache hit ratioEXPLAIN (ANALYZE, BUFFERS, FORMATTEXT)
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed'GROUPBY c.customer_id, c.name;
-> Seq Scan on orders (cost=0.00..6200.00 rows=12000 width=16)
(actual time=0.05..98.3 rows=11842 loops=1)
Filter: ((status)::text = 'completed')
Rows Removed by Filter: 38158
-> Hash (cost=750.00..750.00 rows=50000 width=24)
-> Seq Scan on customers (cost=0..750 rows=50000 width=24)
Always Use EXPLAIN ANALYZE on the Actual Slow Query
EXPLAIN without ANALYZE shows estimated plans based on statistics — it does not run the query. The estimates can be wrong. EXPLAIN ANALYZE runs the query and shows real timing. For read queries on production, this is safe. For destructive queries (DELETE, UPDATE), wrap in a transaction and roll back: BEGIN; EXPLAIN ANALYZE DELETE FROM ...; ROLLBACK;
Production Insight
Read execution plans from the innermost indented node outward — the deepest node executes first.
The most useful signal: actual rows vs estimated rows. A 10x or greater divergence means statistics are stale — run ANALYZE.
For production query analysis, use EXPLAIN (ANALYZE, BUFFERS) — the BUFFERS output shows how many data pages came from cache vs disk, which is often the real performance signal.
Key Takeaway
Execution plans are trees — read from innermost node (deepest indentation) outward.
Actual rows vs estimated rows is the most important signal — large divergence means stale statistics.
EXPLAIN ANALYZE runs the query; EXPLAIN alone uses estimates. Always ANALYZE with care on production destructive queries.
Seq Scan, Index Scan, and Index Only Scan — The Core Node Types
Three scan types appear in most queries. Understanding what each does determines what fix to apply.
Seq Scan (Sequential Scan) reads every page of the table in order. This is not always wrong — on small tables or when the query returns >15-20% of all rows, a sequential scan is cheaper than following millions of index pointers. But Seq Scan on a large table with a selective WHERE clause is the most common performance problem in SQL.
Index Scan uses a B-tree index to find matching row locations, then fetches each matching row from the heap (the main table storage). The index narrows the search; the heap provides the full row data. This is the standard fix for a Seq Scan — add an index on the WHERE column.
Index Only Scan is the fastest path — all data the query needs is in the index itself (a covering index), so the heap is never accessed. Zero heap fetches. Achievable by adding INCLUDE columns to the index definition.
scan_types_diagnosis.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Seq Scan: no index, or planner chose not to use oneEXPLAINANALYZESELECT * FROM orders WHERE status = 'pending';
-- Seq Scan on orders (cost=0.00..8500.00 rows=1200 width=80)-- Actual rows: 1200, loops=1 -- if this is >15% of the table, seq scan is correct-- Add index -> Index ScanCREATEINDEX idx_orders_status ONorders(status) WHERE status = 'pending';
-- partial index: only indexes the rows that match -- much smaller, more selectiveEXPLAINANALYZESELECT * FROM orders WHERE status = 'pending';
-- Index Scan using idx_orders_status on orders-- (cost=0.42..180.00 rows=1200 width=80) -- cost dropped from 8500 to 180-- Covering index -> Index Only Scan (no heap access)CREATEINDEX idx_orders_customer_status
ONorders(customer_id, status)
INCLUDE (order_id, total, created_at); -- add SELECT columnsEXPLAINANALYZESELECT order_id, total, created_at
FROM orders
WHERE customer_id = 42AND status = 'completed';
-- Index Only Scan using idx_orders_customer_status-- Heap Fetches: 0 <- fastest possible path
Output
-- Before index: Seq Scan cost=8500
-- After index: Index Scan cost=180
-- With covering index: Index Only Scan, Heap Fetches: 0
-- Cost reduced from 8500 to ~40
Seq Scan Is Not Always Wrong
The planner chooses Seq Scan when it estimates it is cheaper than following index pointers — typically when >15-20% of rows would be returned. On a table of 1,000 rows, Seq Scan is almost always chosen regardless of indexes — and correctly so. Only investigate Seq Scan when it appears on a large table with a highly selective WHERE clause.
Production Insight
Seq Scan on a large table + low estimated rows = missing index or function wrapping the column.
Seq Scan on a large table + high estimated rows = low selectivity — the planner is correct, the query needs redesigning.
Index Only Scan is the goal for your top read-heavy queries — identify them with pg_stat_statements and cover them.
Key Takeaway
Seq Scan = full table read — investigate when on large tables with selective WHERE.
Index Scan = B-tree lookup + heap fetch per row — standard fix for Seq Scan on selective queries.
Index Only Scan = zero heap access — achievable with INCLUDE columns, the highest-performance read path.
Three join algorithms appear in execution plans. The planner chooses based on estimated table sizes, available indexes, and whether the data is sorted.
Nested Loop Join drives with one table (the outer loop) and probes the inner table once per outer row. Fast when the outer table is small and the inner table has an index on the join column. Catastrophic when the outer table is large — the inner probe repeats N times where N is the outer row count. The loops=N in the inner node of EXPLAIN ANALYZE reveals this.
Hash Join builds a hash table from the smaller table, then probes it for each row of the larger table. Two passes over the data, but O(n+m) rather than O(n×m). The standard choice for large table joins. Requires memory — if the hash table spills to disk (HashBatches > 1 in EXPLAIN output), performance degrades significantly.
Merge Join requires both tables to be sorted on the join column. Extremely fast when both sides are already sorted or can use an index scan in sorted order. Rarely chosen by the planner unless sort order is already available.
join_algorithms_explain.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- Force specific join type for comparison (PostgreSQL)-- Normally leave this to the planner -- these are for diagnosis onlySET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAINANALYZESELECT * FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
-- Will use Nested Loop -- shows cost with forced algorithmRESET enable_hashjoin;
RESET enable_mergejoin;
EXPLAINANALYZESELECT * FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
-- Planner chooses optimal -- typically Hash Join for large tables-- Diagnose: is Hash Join spilling to disk?EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table a JOIN another_large_table b ON a.id = b.a_id;
-- Look for: Batches: 4 (instead of 1)-- Batches > 1 means the hash table spilled to disk -- increase work_mem-- SET work_mem = '256MB'; -- session-level increase for expensive joins
-- -> Hash -> Seq Scan on customers (loops=1) <- hash built once
-- Massive improvement: 18420ms -> 380ms
Hash Join Spilling to Disk
If EXPLAIN ANALYZE shows Batches: 4 (or any number > 1) on a Hash Join, the hash table exceeded work_mem and spilled to disk. This dramatically slows the join. Increase work_mem for the session: SET work_mem = '256MB'. This is a session-level setting — increase it for specific expensive queries, not globally, to avoid OOM on concurrent connections.
Production Insight
Nested Loop + large loops= number is the most common join performance problem — it means the planner underestimated the outer table's size (stale statistics).
Hash Join Batches > 1 means memory pressure — increase work_mem for the session, not globally.
Planner forcing (SET enable_hashjoin = off) is a diagnostic tool only — never leave it disabled in production.
Key Takeaway
Nested Loop = good for small outer tables + indexed inner; catastrophic for large outer tables.
Hash Join = large table joins; Batches > 1 means hash spilled to disk — increase work_mem.
Merge Join = when both sides are already sorted — rare but extremely fast when applicable.
Statistics, ANALYZE, and Why Estimates Go Wrong
The query planner does not read your data to make decisions — it reads statistics. PostgreSQL maintains per-column statistics: the number of distinct values, the most common values and their frequencies, and a histogram of the value distribution. These statistics live in pg_statistic and are updated by VACUUM ANALYZE.
When statistics are stale — after a bulk insert, a data migration, or a period of high insert/delete volume — the planner's row estimates can be orders of magnitude off. A planner that thinks a table has 100,000 rows when it has 5,000,000 will choose Nested Loop joins instead of Hash Joins, index scans instead of seq scans on highly selective queries, and estimate sort operations as cheap when they are expensive.
The practical workflow: whenever a query plan looks wrong (estimated rows diverges greatly from actual rows in EXPLAIN ANALYZE), run ANALYZE on the affected table, then re-run EXPLAIN ANALYZE. In most cases the plan improves immediately.
statistics_and_analyze.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Check how fresh the statistics are for a tableSELECT
relname AS table_name,
n_live_tup AS estimated_live_rows,
n_dead_tup AS dead_rows,
last_analyze AS last_manual_analyze,
last_autoanalyze AS last_auto_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- Force statistics update after bulk import or migrationANALYZE orders;
-- Or with verbosity:ANALYZEVERBOSE orders;
-- Check column-level statistics (what the planner sees)SELECT
attname AS column,
n_distinct, -- negative = fraction of total rows; -0.5 means 50% distinct
correlation -- 1.0 = perfectly sorted, 0 = random -- affects index vs seq scan choiceFROM pg_stats
WHERE tablename = 'orders'AND attname IN ('customer_id', 'status', 'created_at');
-- Identify tables with stale or missing statisticsSELECT relname, n_live_tup, last_analyze
FROM pg_stat_user_tables
WHERE last_analyze < NOW() - INTERVAL'7 days'OR last_analyze ISNULLORDERBY n_live_tup DESC;
ANALYZE must be run manually after bulk inserts — autovacuum may not keep up with sudden large row count changes. A table that grows from 100K to 5M rows in a single migration will have statistics based on the old row count until ANALYZE runs. Add ANALYZE to every migration script that inserts or deletes large numbers of rows.
Production Insight
The single most impactful EXPLAIN diagnostic: compare estimated rows vs actual rows in EXPLAIN ANALYZE output — any 10x divergence means stale statistics and a likely suboptimal plan.
For tables with high insert/delete rates, check last_autoanalyze frequency in pg_stat_user_tables — if autovacuum is not running frequently enough, adjust autovacuum_analyze_scale_factor.
Add ANALYZE tablename to every migration script that affects large row counts.
Key Takeaway
The planner uses statistics, not data — stale statistics produce wrong plans.
EXPLAIN ANALYZE shows estimated vs actual rows — large divergence = run ANALYZE.
Add ANALYZE to migration scripts; monitor last_autoanalyze in pg_stat_user_tables for high-volume tables.
● Production incidentPOST-MORTEMseverity: high
A Nested Loop Join on a 5-Million-Row Table Caused a 12-Minute Query
Symptom
A weekly revenue report that generated in 30 seconds started taking 12 minutes after a data migration increased the orders table from 100,000 rows to 5 million. The query had not changed.
Assumption
The query had been running fine for a year. No one checked whether the execution plan had changed after the migration.
Root cause
The query planner's statistics were stale — it estimated 100 rows for the orders table (based on pre-migration statistics) and chose a Nested Loop join as appropriate for small tables. At 5 million rows, a Nested Loop repeats the inner table probe 5 million times. The correct plan for 5 million rows was a Hash Join, which the planner would have chosen with accurate statistics.
Fix
ANALYZE orders; forced the statistics to update. The planner immediately chose a Hash Join on the next run. Query time dropped from 12 minutes to 18 seconds. Added ANALYZE to the post-migration checklist.
Key lesson
Run ANALYZE on large tables after bulk inserts or migrations — stale statistics cause wrong plan choices
Always run EXPLAIN ANALYZE after any significant data volume change to verify the plan has not regressed
The planner estimates rows based on statistics — if the estimate is wildly off from actual rows, statistics are stale
Production debug guideRead the plan systematically — innermost node first4 entries
Symptom · 01
Seq Scan on a large table in the plan
→
Fix
Check if the WHERE column has an index. If yes, check if the query uses a function on the column (WHERE LOWER(email) kills the index). Check selectivity — if >20% of rows match, the planner may prefer Seq Scan even with an index. Run ANALYZE to refresh statistics.
Symptom · 02
Actual rows is many times higher than estimated rows
→
Fix
Stale statistics. Run ANALYZE table_name. For tables with frequent inserts/deletes, check autovacuum settings — autovacuum may not be keeping statistics current. The planner chooses plans based on estimates — bad estimates produce bad plans.
Symptom · 03
Nested Loop with loops=N where N is very large
→
Fix
The outer table is large and the inner probe is repeating N times. Add an index on the inner table's JOIN column. If N is in the millions, the planner may have stale statistics causing it to underestimate the outer table's row count — run ANALYZE.
Symptom · 04
Sort node with high actual time
→
Fix
The query cannot use an index for ORDER BY — it is performing a filesort. Add an index that matches the ORDER BY column order. For window functions, add a composite index on (PARTITION_BY_col, ORDER_BY_col).
★ EXPLAIN Reading Cheat SheetKey nodes, what they mean, and what to do
Seq Scan on large table−
Immediate action
Check for missing index or function wrapping the column
Commands
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
SELECT indexname FROM pg_indexes WHERE tablename = 'orders';
Fix now
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Stale estimates — actual rows far exceeds estimated rows+
Immediate action
Refresh table statistics
Commands
ANALYZE orders;
EXPLAIN ANALYZE SELECT ... ; -- rerun the slow query
Fix now
If autovacuum is not keeping up, check: SELECT relname, n_live_tup, n_dead_tup, last_analyze FROM pg_stat_user_tables WHERE relname = 'orders';
Node Type
What It Does
When It Appears
What To Do
Seq Scan
Reads every table page sequentially
No index, low selectivity, or small table
Add index if WHERE is selective; check if >15% of rows returned
Index Scan
B-tree lookup + heap fetch per row
Selective WHERE with index
Usually optimal — consider INCLUDE for Index Only Scan
Index Only Scan
B-tree lookup only, no heap access
Covering index on all needed columns
Ideal — add INCLUDE columns to achieve this
Nested Loop
Outer table drives, inner probed per row
Small outer table or index on inner JOIN col
Catastrophic on large outer — add index to inner or check statistics
Hash Join
Build hash table, probe with other table
Large table joins without sort order
Standard for large joins — check Batches > 1 (spill to disk)
Sort
Sort rows for ORDER BY or Merge Join
ORDER BY without matching index
Add index matching ORDER BY order
HashAggregate
Group rows for GROUP BY
GROUP BY or DISTINCT
Usually fine — check for large estimated distinct count
Key takeaways
1
Read execution plans from the innermost (deepest indented) node outward
innermost nodes execute first.
2
Estimated rows vs actual rows is the most important signal
10x divergence means stale statistics, run ANALYZE.
3
Seq Scan is only a problem on large tables with selective WHERE clauses
on small tables or low-selectivity queries, it is often correct.
4
Always run ANALYZE after bulk inserts and migrations
add it to every migration script that changes large row counts.
Common mistakes to avoid
3 patterns
×
Running EXPLAIN without ANALYZE and treating the estimated plan as ground truth
Symptom
The estimated plan looks fine but the query is slow in practice — actual rows is orders of magnitude higher than estimated rows
Fix
Always use EXPLAIN ANALYZE for diagnosis. For destructive queries, wrap in a transaction: BEGIN; EXPLAIN ANALYZE UPDATE ...; ROLLBACK. The estimated plan from plain EXPLAIN is based on statistics that may be stale.
×
Immediately adding an index after seeing a Seq Scan without checking selectivity
Symptom
The new index is never used by the planner — it still chooses Seq Scan; the index adds write overhead for no read benefit
Fix
Check what percentage of rows the WHERE clause returns. If >15-20% of rows match, Seq Scan is correct and an index will not help. Only add an index when the WHERE clause is highly selective (returns a small fraction of total rows).
×
Not running ANALYZE after bulk inserts or data migrations
Symptom
Queries that were fast before a migration become slow afterward — the plan has regressed to a worse algorithm despite no query changes
Fix
Add ANALYZE table_name to every migration script after bulk inserts or deletes. Verify with EXPLAIN ANALYZE that estimated rows match actual rows after the migration.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01SENIOR
How do you read a PostgreSQL execution plan — where do you start?
Q02SENIOR
A Hash Join in your execution plan shows Batches: 8. What does this mean...
Q03SENIOR
The execution plan estimated 100 rows but the query returned 5 million. ...
Q01 of 03SENIOR
How do you read a PostgreSQL execution plan — where do you start?
ANSWER
Start from the innermost node (the most deeply indented line) and read outward. The innermost nodes are the raw data retrieval operations — Seq Scan or Index Scan — and they execute first. Each outer node receives the result of its inner nodes. For each node, I look at three things: the node type (what operation), the cost estimate (startup..total), and most critically, the estimated rows vs actual rows when using EXPLAIN ANALYZE. A large divergence between estimated and actual rows means stale statistics and a potentially suboptimal plan. I then look for the most expensive nodes by actual time and investigate those first — Seq Scans on large tables, Nested Loops with high loops values, Sort nodes without index backing.
Q02 of 03SENIOR
A Hash Join in your execution plan shows Batches: 8. What does this mean and how do you fix it?
ANSWER
Batches > 1 means the hash table built for the Hash Join exceeded available work_mem and spilled to disk. PostgreSQL split the join into 8 batches, reading the data 8 times instead of once. This dramatically slows the join — disk I/O is orders of magnitude slower than memory. The immediate fix: increase work_mem for the session before running the query: SET work_mem = '512MB'. This is a session-level setting — do not increase it globally, as each concurrent connection would allocate up to that amount and you risk OOM. The permanent fix: either increase work_mem globally if the server has sufficient RAM and query concurrency is limited, or restructure the query to join on a smaller subset of rows (filter earlier with CTEs or subqueries to reduce the hash table size).
Q03 of 03SENIOR
The execution plan estimated 100 rows but the query returned 5 million. What do you do?
ANSWER
The statistics are stale. The planner chose a plan optimised for 100 rows (likely a Nested Loop join) that is catastrophic for 5 million rows. Immediate action: run ANALYZE on the affected table(s) to refresh statistics. Then run EXPLAIN ANALYZE again — the planner should now choose a plan appropriate for 5 million rows (typically a Hash Join). Verify that estimated rows now matches actual rows. If autovacuum should have handled this, check pg_stat_user_tables: SELECT relname, n_live_tup, last_autoanalyze FROM pg_stat_user_tables WHERE relname = 'orders'. If last_autoanalyze is stale or NULL, investigate autovacuum settings — autovacuum_analyze_scale_factor may be set too high for tables with large sudden row count changes.
01
How do you read a PostgreSQL execution plan — where do you start?
SENIOR
02
A Hash Join in your execution plan shows Batches: 8. What does this mean and how do you fix it?
SENIOR
03
The execution plan estimated 100 rows but the query returned 5 million. What do you do?
SENIOR
FAQ · 3 QUESTIONS
Frequently Asked Questions
01
What is the difference between EXPLAIN and EXPLAIN ANALYZE?
EXPLAIN shows the query plan based on statistics without executing the query. EXPLAIN ANALYZE actually runs the query and shows both the estimated plan and the real execution timings and row counts. For diagnosis, always use EXPLAIN ANALYZE — the estimated plan from plain EXPLAIN can be misleading when statistics are stale.
Was this helpful?
02
Does EXPLAIN ANALYZE work in MySQL?
MySQL supports EXPLAIN and, from MySQL 8.0.18 onward, EXPLAIN ANALYZE. MySQL EXPLAIN uses different node terminology than PostgreSQL: 'ALL' is equivalent to Seq Scan, 'ref' and 'eq_ref' are equivalent to Index Scan variants. The interpretation principles are the same — look for 'ALL' on large tables as the signal for a missing or unused index.
Was this helpful?
03
Why does my query use an index in EXPLAIN but a Seq Scan in production?
The planner can make different decisions based on the actual parameter values at execution time. A query WHERE customer_id = 42 with a parameter that returns 100 rows may use an index scan. The same query WHERE customer_id = 1 returning 500,000 rows may use a seq scan — correctly, because an index scan over 500,000 rows with heap fetches is slower than a sequential scan. This is parameter-dependent plan selection, related to parameter sniffing in stored procedures.