Senior 3 min · March 06, 2026

SQL Indexes — Avoid 90-Minute Foreign Key Lock

An unindexed foreign key on an 8M-row child table caused a 90-minute hold during a 50,000-row deletion.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • An index is a sorted B-tree mapping column values to row locations — turns O(n) full table scans into O(log n) lookups
  • Clustered index = data physically stored in B-tree leaf nodes (one per table, usually the PK)
  • Non-clustered index = separate structure with pointers to heap rows (multiple per table)
  • Leftmost prefix rule: index on (A, B) helps queries on A or (A,B) — never on B alone
  • Covering index = INCLUDE clause adds columns to leaf nodes — zero heap access, fastest possible read
  • Biggest trap: function calls on indexed columns (WHERE YEAR(created_at) = 2024) silently disable the index
Plain-English First

Think of a SQL database like a massive 1,000-page textbook. If you want to find the chapter on 'Photosynthesis,' you don't start at page one and read every sentence until you find it (that's a Full Table Scan). Instead, you flip to the Index at the back, find the word 'Photosynthesis,' see it's on page 412, and jump straight there. An SQL index is that exact 'map'—it's a smaller, sorted list that tells the database exactly where the data lives so it doesn't have to search the whole 'book.'

Indexes are the single most impactful optimization available in relational databases. A query that takes 30 seconds on a table with 10 million rows can return in 2 milliseconds after adding the right index. The difference is the gap between scanning every row on disk and jumping directly to the matching ones via a balanced tree structure.

However, indexes aren't free. The cost is write overhead and disk space. Every index you add slows down INSERT, UPDATE, and DELETE operations because the database must maintain the sorted order of the index in real-time. The 'Forge' philosophy is about precision: adding the exact indexes your critical queries need without bloating the table with redundant metadata.

How a B-Tree Index Works

Most relational databases (PostgreSQL, MySQL, SQL Server) use a B-tree (Balanced Tree) structure for indexing. A B-tree keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.

When you query a column, the database starts at the root node and follows pointers down to the leaf nodes. Each leaf node contains the indexed value plus a 'Record Identifier' (RID) or a Primary Key value that points to the actual row data in the 'Heap' or Clustered Index.

btree_index_demo.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
-- io.thecodeforge.optimization
-- Scenario: Searching for a customer without an index
SELECT * FROM io_thecodeforge.orders WHERE customer_id = 1234;
-- Result: The database performs a 'Sequential Scan' (reads 100% of the table).

-- Optimization: Create a standard B-tree index
CREATE INDEX idx_orders_customer 
ON io_thecodeforge.orders(customer_id);

-- Now the query engine performs an 'Index Scan'
EXPLAIN ANALYZE 
SELECT * FROM io_thecodeforge.orders WHERE customer_id = 1234;
Output
Index Scan using idx_orders_customer_id on orders
(cost=0.42..8.44 rows=1 width=120)
(actual time=0.082..0.085 rows=1 loops=1)
Why Low-Cardinality Columns Don't Benefit
An index on a boolean column (is_active with values true/false) is often useless. If 80% of rows have is_active = true, the planner calculates it is cheaper to read the whole table than to follow 8 million individual heap pointers. The query planner uses statistics — run ANALYZE after bulk inserts to keep them fresh.
Production Insight
A B-tree with 10M rows has depth ~23 — at most 23 page reads to find any row vs potentially millions without an index.
The planner ignores indexes when estimated selectivity is too low (>15-20% of rows matching).
Function calls on indexed columns (LOWER(email), YEAR(created_at)) bypass the index entirely — normalise data at write time instead.
Key Takeaway
B-tree depth scales as log(n) — 10M rows = ~23 hops vs millions of page reads for a sequential scan.
The planner decides whether to use your index — selectivity, statistics freshness, and function wrapping all affect that decision.
Always validate with EXPLAIN ANALYZE after creating an index — never assume it is being used.

Composite Indexes and the Leftmost Prefix Rule

A composite index (multi-column index) is sorted primarily by the first column, then the second, and so on. This leads to the Leftmost Prefix Rule: an index on $(A, B)$ can be used for queries filtering by $(A)$ or $(A, B)$, but it is completely useless for a query filtering only by $(B)$.

Choosing the right column order is vital. Generally, you put the column with the highest cardinality (most unique values) first, or the column most frequently used in WHERE clauses.

composite_and_covering_indexes.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- io.thecodeforge.optimization
-- Composite index on (last_name, first_name)
CREATE INDEX idx_user_full_name 
ON io_thecodeforge.users(last_name, first_name);

-- VALID: Uses index (leftmost prefix)
SELECT * FROM io_thecodeforge.users WHERE last_name = 'Jordan';

-- VALID: Uses index (exact match)
SELECT * FROM io_thecodeforge.users WHERE last_name = 'Jordan' AND first_name = 'Michael';

-- INVALID: Full Table Scan (skips the first column of the index)
SELECT * FROM io_thecodeforge.users WHERE first_name = 'Michael';

-- PRO TIP: Covering Index
-- This index contains ALL data needed, avoiding a 'Table Lookup' entirely
CREATE INDEX idx_covering_search 
ON io_thecodeforge.users(email) INCLUDE (user_id, status);
Output
-- Covering index result:
-- Index Only Scan using idx_covering_user_lookup
-- Heap Fetches: 0 <- zero heap access
The Phone Book Mental Model
  • First column = primary sort key — queries must include it to use the composite index
  • Second column = tie-breaker — only useful after the first column filters
  • Equality conditions first, range conditions last — maximises usability
  • INCLUDE columns add data to leaves without changing sort order — for covering indexes only
Production Insight
The most common composite index mistake: putting a range column first (created_at, status) — only the range portion is usable, the status column is ignored by the planner.
Covering indexes eliminate heap fetches entirely — for high-frequency reads on large tables this is often the single biggest performance win available.
For ORDER BY optimisation, index column order must match the sort order exactly — otherwise a filesort step appears in EXPLAIN.
Key Takeaway
Leftmost prefix is non-negotiable — index on (A, B) cannot help a WHERE B = ? query.
Equality columns first, range columns last in composite index definitions.
Covering indexes = zero heap access — identify your top read-heavy queries and cover them.

When Indexes Hurt: Write Amplification and Bloat

Every time you INSERT a row into a table with 5 indexes, the database must perform 6 writes (1 to the table, 5 to the indexes). This is known as Write Amplification. Furthermore, indexes can become 'fragmented' over time as rows are deleted and updated, leading to bloated files and degraded performance.

index_maintenance_and_audit.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
-- io.thecodeforge.maintenance
-- 1. Identifying unused indexes (PostgreSQL example)
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND idx_read = 0;
-- Action: Drop these to reclaim write performance.

-- 2. Avoiding 'Functional' index traps
-- This ignores a standard index on 'created_at'
SELECT * FROM io_thecodeforge.orders WHERE DATE(created_at) = '2024-01-01';

-- Correct approach: Use a range or a functional index
CREATE INDEX idx_functional_date ON io_thecodeforge.orders ((DATE(created_at)));
Output
-- Unused indexes found:
table_name | index_name | times_used | index_size
orders | idx_orders_legacy_status | 0 | 156 MB
users | idx_users_created_month | 0 | 43 MB
-- 199 MB of indexes adding write overhead and never being used
Function Calls Kill Index Usage
WHERE DATE(created_at) = '2024-01-01' forces a full table scan even with an index on created_at — the function transforms the value before comparison, making the index unusable. Rewrite as a range: WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02'. This is the single most common indexing mistake in production systems.
Production Insight
A high-write table (logs, events) with 8+ indexes will show write latency degradation that looks like a capacity problem but is actually an over-indexing problem.
Run the unused index audit query monthly — pg_stat_user_indexes resets on restart, give it at least 2 weeks of representative traffic before trusting idx_scan counts.
Foreign key columns are the most commonly un-indexed columns in production databases — every parent DELETE implicitly queries the child table on that column.
Key Takeaway
Every index = one extra write per INSERT/UPDATE/DELETE — 10 indexes = 10x write amplification.
Audit unused indexes monthly — dead indexes are pure write tax with no read benefit.
Function calls on indexed columns silently disable the index — use range queries instead.
● Production incidentPOST-MORTEMseverity: high

An Unindexed Foreign Key Caused a 90-Minute Table Lock During a Nightly Deletion Job

Symptom
A nightly cleanup job deleting 50,000 expired records from an accounts table blocked all application writes for 90 minutes. No deadlock was detected — the lock was simply held for a very long time.
Assumption
The team indexed all columns used in WHERE clauses. Foreign keys were treated as constraints only, not as query columns that also need indexing.
Root cause
When PostgreSQL deletes a row from a parent table, it checks all child tables for referential integrity — running SELECT FROM child WHERE fk_column = $1 for every deleted row. With no index on the FK column, each check was a full sequential scan of the 8-million-row child table. 50,000 deletions × full scan = held locks while the engine worked through each one.
Fix
CREATE INDEX CONCURRENTLY idx_child_account_id ON child_table(account_id). The CONCURRENTLY keyword builds the index without a table lock. The next deletion job ran in 4 seconds.
Key lesson
  • Foreign key columns always need an index — they are implicitly queried on every parent-table DELETE and UPDATE
  • Use CREATE INDEX CONCURRENTLY in production to avoid blocking reads and writes during index creation
  • Run SELECT indexrelname, idx_scan FROM pg_stat_user_indexes weekly — idx_scan = 0 after 30 days means the index is adding write overhead for zero benefit
Production debug guideDiagnosing index failures and performance regressions4 entries
Symptom · 01
Query is slow despite an index existing on the WHERE column
Fix
Run EXPLAIN ANALYZE. If you see Seq Scan, the planner chose not to use the index. Check for function calls on the column (WHERE LOWER(email) = '...' disables the index on email). Check selectivity — if >20% of rows match, the planner may prefer seq scan. Run ANALYZE to refresh table statistics.
Symptom · 02
Composite index exists but queries on the second column still full-scan
Fix
Leftmost prefix violation. Index on (status, created_at) cannot be used for WHERE created_at alone. Add a separate single-column index on created_at, or restructure with created_at first if that filter is more common.
Symptom · 03
INSERT/UPDATE/DELETE latency has degraded gradually over months
Fix
Count indexes on the table: SELECT COUNT(*) FROM pg_indexes WHERE tablename = 'your_table'. Audit unused: SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0. Drop unused indexes with DROP INDEX CONCURRENTLY.
Symptom · 04
Need to add an index to a live production table without downtime
Fix
Use CREATE INDEX CONCURRENTLY idx_name ON table(col). Monitor progress: SELECT phase, blocks_done, blocks_total FROM pg_stat_progress_create_index. Takes longer than standard CREATE INDEX but never blocks reads or writes.
★ Index Quick Debug CommandsFast commands for diagnosing index problems in PostgreSQL
Slow query — need to verify index is being used
Immediate action
Run EXPLAIN ANALYZE on the exact query
Commands
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 1234;
SELECT indexrelname, idx_scan, idx_tup_read FROM pg_stat_user_indexes WHERE relname = 'orders';
Fix now
If idx_scan is 0 after weeks of traffic, the index is unused. Investigate why or DROP INDEX CONCURRENTLY.
Table has too many indexes — write latency degrading+
Immediate action
Audit indexes by usage frequency
Commands
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size FROM pg_stat_user_indexes WHERE relname = 'your_table' ORDER BY idx_scan ASC;
DROP INDEX CONCURRENTLY idx_unused_index_name;
Fix now
Drop indexes with idx_scan = 0 that have been monitored for at least 2 weeks of representative traffic.
FeatureClustered IndexNon-Clustered Index
Data StorageStores actual row data in the leaf nodesStores pointers (RID/PK) to the data
LimitOnly 1 per table (the physical order)Multiple per table (logical order)
Best ForRange searches and Primary Key lookupsFrequent filtering on non-PK columns
MaintenanceHigh overhead when updating the PKLower overhead but requires a 'lookup' step

Key takeaways

1
B-tree indexes transform $O(n)$ full table scans into $O(\log n)$ logarithmic lookups.
2
The Leftmost Prefix rule is non-negotiable for composite indexes—order your columns by search frequency.
3
Covering indexes are a 'cheat code'—they allow the database to answer queries without ever touching the main table file.
4
Always check your execution plans (EXPLAIN) to ensure your indexes are actually being utilized by the optimizer.
5
Balance is key
index for your slowest, most frequent queries, but keep an eye on write latency.

Common mistakes to avoid

5 patterns
×

Indexing low-cardinality columns (status, gender, is_active)

Symptom
The query planner ignores the index and performs a full table scan — EXPLAIN shows Seq Scan even though the index exists
Fix
Check selectivity: if >15-20% of rows match the typical WHERE value, the planner will prefer a seq scan. Use partial indexes instead: CREATE INDEX idx_pending_orders ON orders(created_at) WHERE status = 'pending' — index only the relevant subset.
×

Applying functions to indexed columns in WHERE clauses

Symptom
WHERE YEAR(created_at) = 2024 or WHERE LOWER(email) = 'alice@...' runs as a full table scan despite indexes on those columns
Fix
Rewrite as range queries: WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'. For case-insensitive searches, store data lowercase at write time or create a functional index: CREATE INDEX ON users(LOWER(email)).
×

Getting column order wrong in composite indexes — range column first

Symptom
EXPLAIN shows the composite index not being used for queries that filter on the second column — or the planner only uses the first column portion
Fix
Put the most-filtered equality column first. Verify with EXPLAIN that all critical query patterns use the index. Add separate single-column indexes for queries that only filter on non-first columns.
×

Not indexing foreign key columns

Symptom
DELETE operations on parent tables are extremely slow or lock for minutes — JOIN queries on related tables do full scans on the child
Fix
Always CREATE INDEX on every foreign key column. PostgreSQL does NOT auto-create indexes for foreign keys — unlike primary keys. Use CREATE INDEX CONCURRENTLY to avoid write locks on live tables.
×

Accumulating unused indexes on high-write tables

Symptom
INSERT and UPDATE latency gradually increases over months on a table with many write operations — no single change caused it
Fix
Run the pg_stat_user_indexes audit. Drop indexes with idx_scan = 0 after a full week of representative traffic. Use DROP INDEX CONCURRENTLY to avoid write locks.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
Explain the Leftmost Prefix rule. Can an index on (A, B, C) be used for ...
Q02SENIOR
What is a Covering Index and why is it faster than a standard index scan...
Q03SENIOR
A query on a table with 100M rows with an index on created_at is still s...
Q04SENIOR
What are the disadvantages of too many indexes on a single table?
Q05SENIOR
Compare B-tree and Hash indexes. When would you choose a Hash index?
Q01 of 05SENIOR

Explain the Leftmost Prefix rule. Can an index on (A, B, C) be used for a query on (A, C)?

ANSWER
The leftmost prefix rule means a composite index can only serve queries that include the leftmost columns. An index on (A, B, C) can serve queries on A, (A, B), or (A, B, C). For a query filtering on (A, C) that skips B, the index can be used for the A portion only — the C column filter cannot leverage the index because B is the intermediate sort key. The engine uses the index to find A matches, then filters C in memory. To fully support (A, C) queries, add a separate index on (A, C).
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
What is the difference between a clustered and non-clustered index?
02
Does a Primary Key automatically create an index?
03
Why would the database ignore my index and do a full scan?
04
What is Index Fragmentation?
🔥

That's SQL Advanced. Mark it forged?

3 min read · try the examples if you haven't

Previous
SQL UPDATE Statement: Syntax, Examples and Best Practices
1 / 16 · SQL Advanced
Next
SQL Views