Senior 3 min · June 25, 2026

Database Indexing: The Only Guide That Won't Make Your Queries Slower

Database indexing explained with real production war stories.

N
Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.

Follow
Production
production tested
June 25, 2026
last updated
1,663
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer

Indexes make SELECT queries faster by reducing disk I/O. But every index slows down INSERT, UPDATE, and DELETE because the index must be updated too. The trick is to index only the columns used in WHERE, JOIN, and ORDER BY clauses — and never index blindly.

✦ Definition~90s read
What is Database Indexing?

A database index is a data structure (usually a B-tree or hash table) that speeds up data retrieval at the cost of slower writes and extra storage. It's a sorted copy of a subset of columns, enabling the database to find rows without scanning the entire table.

Think of a database index like the index at the back of a textbook.
Plain-English First

Think of a database index like the index at the back of a textbook. Without it, you'd have to flip through every page to find a topic (full table scan). With it, you jump straight to the right page. But every time you add a new page to the book, you also have to update the index — that's the write overhead.

I've seen a single missing index turn a 50ms API call into a 30-second timeout that took down a checkout service at 3am. The fix was one line of SQL. The cost of not having it was $40k in lost revenue. Database indexing is the single highest-leverage performance optimization you can make — and the easiest to screw up. Most developers either over-index (slowing writes to a crawl) or under-index (wondering why their queries are slow). After this guide, you'll be able to look at any slow query, identify the missing index, and know exactly when adding one will hurt more than help.

How B-Tree Indexes Actually Work (And Why They Fail)

The default index type in every major database is the B-tree. It's a balanced tree where each node holds a range of sorted keys, and leaf nodes point to the actual rows (or row IDs). The tree depth grows logarithmically with data size — a 1-billion-row table needs only about 4-5 levels. That's why a lookup is O(log n). But here's where it breaks: if your query doesn't match the leftmost prefix of the index, the B-tree is useless. A composite index on (a, b, c) can't speed up WHERE b = 5 — it'll fall back to a full scan. I've seen this bring down a reporting dashboard because the WHERE clause filtered on the second column. The fix: either reorder the columns or create a separate index on b.

CompositeIndexMismatch.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- io.thecodeforge — System Design tutorial

-- Table: orders (50M rows)
-- Index: (user_id, created_at)

-- This query uses the index efficiently:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 12345 AND created_at > '2024-01-01';
-- Output: Index Scan using idx_orders_user_created (cost=0.56..8.58 rows=100 width=120)

-- This query does NOT use the index:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE created_at > '2024-01-01';
-- Output: Seq Scan on orders (cost=0.00..123456.78 rows=5000000 width=120)
-- The index on (user_id, created_at) can't be used because user_id is not in the WHERE clause.

-- Fix: create a separate index on created_at
CREATE INDEX idx_orders_created_at ON orders(created_at);
Output
First query: Index Scan, fast. Second query: Seq Scan, slow.
Production Trap: Leftmost Prefix Rule
A composite index on (a, b, c) only helps queries that filter on a, or a AND b, or a AND b AND c. Filtering on b alone or c alone will ignore the index. Always put the most selective column first.
Database Index Types and Maintenance Guide THECODEFORGE.IO Database Index Types and Maintenance Guide From B-Tree to GIN: choosing and caring for indexes B-Tree Index Balanced tree for range & equality queries Partial Index Index on subset of rows (WHERE clause) Covering Index Includes all columns needed by query Hash Index Fast equality lookups only GiST / GIN Index Full-text search & complex data types Index Maintenance Rebuild/reindex to avoid bloat ⚠ Over-indexing slows writes and bloats storage Only index columns used in WHERE, JOIN, or ORDER BY THECODEFORGE.IO
thecodeforge.io
Database Index Types and Maintenance Guide
Database Indexing
B-Tree Index Lookup FlowTHECODEFORGE.IOB-Tree Index Lookup FlowLogarithmic depth enables fast searches even on billions of rowsRoot NodeHolds range of sorted keysInternal NodesBranch to child rangesLeaf NodesPoint to actual row IDsHeap FetchRetrieve full row data⚠ Depth grows log(n) — 1B rows need only ~4-5 levelsTHECODEFORGE.IO
thecodeforge.io
B-Tree Index Lookup Flow
Database Indexing

When to Use Partial Indexes (And When Not To)

A partial index indexes only a subset of rows — those matching a WHERE condition. This shrinks the index size and reduces write overhead. I use them constantly for soft-delete patterns: index only WHERE deleted_at IS NULL. The index is 10x smaller, writes are faster, and queries for active records are still instant. But partial indexes have a gotcha: the query planner only uses them if your query's WHERE clause is a superset of the index's condition. If you index WHERE status = 'active' and query WHERE status = 'active' AND created_at > '2024-01-01', it works. But if you query WHERE status IN ('active', 'pending'), it won't use the partial index — it'll fall back to a full scan. Always match the condition exactly.

PartialIndexPattern.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- io.thecodeforge — System Design tutorial

-- Table: users (10M rows, 20% deleted)
-- Without partial index:
CREATE INDEX idx_users_active ON users(active) WHERE active = true;
-- This index is 1/5 the size of a full index on active.

-- Query that uses the partial index:
EXPLAIN ANALYZE
SELECT * FROM users WHERE active = true AND last_login > '2024-06-01';
-- Output: Index Scan using idx_users_active (cost=0.29..8.30 rows=100 width=200)

-- Query that does NOT use the partial index:
EXPLAIN ANALYZE
SELECT * FROM users WHERE active = true OR last_login > '2024-06-01';
-- Output: Seq Scan (cost=0.00..250000.00 rows=5000000 width=200)
-- The OR condition breaks the partial index match.
Output
First query uses partial index. Second query does full scan.
Senior Shortcut: Partial Index for Soft Deletes
Index WHERE deleted_at IS NULL. Your active-record queries will be fast, and the index stays small because deleted rows are excluded. Just remember to include the same condition in your queries.
Partial vs Full IndexesTHECODEFORGE.IOPartial vs Full IndexesPartial indexes shrink size and speed writes for filtered queriesFull IndexIndexes every row in tableLarger size, more write overheadSuitable for unfiltered queriesPartial IndexIndexes only matching rows10x smaller, faster writesGreat for soft-delete patternsUse partial indexes when queries always filter by a constant conditionTHECODEFORGE.IO
thecodeforge.io
Partial vs Full Indexes
Database Indexing

Covering Indexes: The Secret to Blazing Fast Reads

A covering index includes all columns needed by a query, so the database never has to touch the table (heap). This is called an index-only scan. In PostgreSQL, you use the INCLUDE clause to add extra columns without expanding the index tree. In MySQL, you just add them to the index (but that increases tree size). I once optimized a reporting query from 12 seconds to 3ms by adding two columns to an existing index with INCLUDE. The index size grew by 30%, but the query became instant. The trade-off: covering indexes are great for read-heavy workloads but brutal on writes. Use them sparingly — only for the hottest queries.

CoveringIndexExample.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- io.thecodeforge — System Design tutorial

-- Table: orders (50M rows)
-- Frequent query: get order total and status by user_id

-- Before: index on user_id only, query needs heap fetch for total and status
EXPLAIN ANALYZE
SELECT total, status FROM orders WHERE user_id = 12345;
-- Output: Index Scan using idx_orders_user_id (cost=0.56..8.58 rows=100 width=16) + 100 heap fetches

-- After: covering index with INCLUDE (PostgreSQL 11+)
CREATE INDEX idx_orders_user_id_covering ON orders(user_id) INCLUDE (total, status);

EXPLAIN ANALYZE
SELECT total, status FROM orders WHERE user_id = 12345;
-- Output: Index Only Scan using idx_orders_user_id_covering (cost=0.56..4.58 rows=100 width=16)
-- No heap fetches!
Output
Before: Index Scan + heap fetches. After: Index Only Scan, no heap fetches.
Interview Gold: Index-Only Scan vs. Index Scan
An index-only scan reads only the index. An index scan reads the index then fetches rows from the heap. The difference is disk I/O. Covering indexes eliminate heap fetches. Interviewers love this distinction.

Hash Indexes: Fast Equality Lookups, Nothing Else

Hash indexes map keys to a fixed-size hash bucket. They're O(1) for equality lookups — faster than B-tree's O(log n). But they can't do range queries, sorting, or partial matches. PostgreSQL supports hash indexes natively; MySQL uses them only in MEMORY tables. I use hash indexes for lookup tables — like a country code to timezone mapping. But never use them on columns with many duplicates (low cardinality) because hash collisions degrade performance. And never use them for columns you'll sort or filter with >, <, or LIKE. The query planner will ignore the index and do a full scan.

HashIndexExample.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- io.thecodeforge — System Design tutorial

-- Table: country_codes (250 rows)
-- Frequent query: get timezone by country code

CREATE INDEX idx_country_code_hash ON country_codes USING hash(code);

-- Fast equality lookup:
EXPLAIN ANALYZE
SELECT timezone FROM country_codes WHERE code = 'US';
-- Output: Index Scan using idx_country_code_hash (cost=0.00..8.02 rows=1 width=16)

-- Range query (ignores hash index):
EXPLAIN ANALYZE
SELECT timezone FROM country_codes WHERE code > 'US';
-- Output: Seq Scan on country_codes (cost=0.00..4.50 rows=125 width=16)
Output
Equality lookup uses hash index. Range query does full scan.
Never Do This: Hash Index on High-Cardinality Column
If your column has millions of unique values, a hash index's bucket size grows, and collisions increase. B-tree is almost always better for high-cardinality columns. Hash indexes shine on small, static lookup tables.

GiST and GIN Indexes: Full-Text Search and Beyond

GiST (Generalized Search Tree) and GIN (Generalized Inverted Index) are for complex data types: arrays, JSONB, full-text search, geometric data. GIN is faster for lookups but slower to build; GiST is faster to build but slower for lookups. I use GIN for JSONB queries and full-text search. The classic mistake: indexing a JSONB column with a GIN index on the whole column, then querying with @> (contains) — that works. But if you query with -> (access by key), the index is useless. You need a path-specific index. I've seen this cause a 10-second query on a 1M-row table. The fix: create an index on the specific path, e.g., CREATE INDEX idx_data_email ON users USING gin ((data -> 'email'));

GINIndexJSONB.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- io.thecodeforge — System Design tutorial

-- Table: users with JSONB column 'data'
-- Frequent query: find users by email inside JSONB

-- Wrong: index on whole column
CREATE INDEX idx_users_data ON users USING gin(data);
-- This index helps @> queries but not -> queries.

-- Query that uses the index:
EXPLAIN ANALYZE
SELECT * FROM users WHERE data @> '{"email": "test@example.com"}';
-- Output: Bitmap Index Scan on idx_users_data

-- Query that does NOT use the index:
EXPLAIN ANALYZE
SELECT * FROM users WHERE data->>'email' = 'test@example.com';
-- Output: Seq Scan (slow!)

-- Fix: path-specific index
CREATE INDEX idx_users_data_email ON users USING gin ((data->'email'));
-- Now the -> query uses the index.
Output
First query uses index. Second query does full scan until path-specific index is added.
Senior Shortcut: Path-Specific GIN Indexes
For JSONB columns, always create indexes on the specific paths you query, not the whole column. The whole-column index is only useful for containment (@>) queries, which are rare in production.

Index Maintenance: The Silent Performance Killer

Indexes bloat over time. In PostgreSQL, updated rows leave dead index entries that VACUUM cleans up. If autovacuum can't keep up, the index grows and queries slow down. I've seen a 10GB index bloat to 40GB because autovacuum was disabled on a busy table. The symptom: queries that used to take 1ms suddenly take 100ms. The fix: monitor index size over time, and run REINDEX periodically. In MySQL, the equivalent is OPTIMIZE TABLE. But be careful — REINDEX locks the table for writes. Use CONCURRENTLY in PostgreSQL to avoid downtime: REINDEX INDEX CONCURRENTLY idx_name;

IndexMaintenance.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- io.thecodeforge — System Design tutorial

-- Check index bloat in PostgreSQL:
SELECT
    indexrelid::regclass AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    pg_size_pretty(pg_relation_size(indrelid)) AS table_size,
    round(100 * pg_relation_size(indexrelid) / nullif(pg_relation_size(indrelid), 0), 2) AS bloat_pct
FROM pg_stat_all_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild index without locking writes:
REINDEX INDEX CONCURRENTLY idx_orders_user_id;
-- Note: CONCURRENTLY requires extra disk space and takes longer, but avoids downtime.
Output
Shows index size, table size, and bloat percentage.
Production Trap: REINDEX Without CONCURRENTLY
Running REINDEX (without CONCURRENTLY) on a busy table will lock it for writes. Your app will time out. Always use REINDEX INDEX CONCURRENTLY in production. It takes longer but doesn't block writes.
● Production incidentPOST-MORTEMseverity: high

The Index That Killed Our Writes

Symptom
A payments service started timing out on INSERT operations. The database CPU was pegged at 100%, and the connection pool was exhausted within 2 minutes.
Assumption
We assumed a sudden traffic spike — the team started scaling up instances.
Root cause
A junior dev had added a composite index on (user_id, created_at) on a 50-million-row table. The index was 12GB and every INSERT required updating 3 levels of the B-tree. The write throughput dropped from 10k ops/s to 200 ops/s.
Fix
Dropped the composite index. Replaced with a single-column index on user_id only. Created a separate partial index on created_at WHERE status = 'pending' for the reporting query that needed it.
Key lesson
  • Every index is a tax on writes.
  • Before adding one, ask: is the query that needs it worth the write penalty? If yes, make the index as narrow as possible.
Production debug guideSystematic recovery paths for the failure modes engineers actually hit.3 entries
Symptom · 01
Slow SELECT queries on a large table (millions of rows)
Fix
1. Run EXPLAIN ANALYZE on the slow query. 2. Look for 'Seq Scan' or 'Table Scan'. 3. Identify the columns in WHERE/JOIN/ORDER BY. 4. Create an index on those columns (composite if multiple). 5. Re-run EXPLAIN ANALYZE to confirm 'Index Scan'.
Symptom · 02
Slow INSERT/UPDATE after adding an index
Fix
1. Check if the indexed column changes frequently (e.g., last_updated). 2. Check index size with pg_size_pretty(pg_relation_size('index_name')). 3. If index is large and write-heavy, consider dropping it or making it a partial index. 4. If the index is needed, batch writes to reduce overhead.
Symptom · 03
Index scan still slow (many heap fetches)
Fix
1. Check if the query can be satisfied by index-only scan. 2. Add covering columns with INCLUDE (PostgreSQL) or add all needed columns to the index (MySQL). 3. Re-run EXPLAIN ANALYZE and look for 'Index Only Scan'.
★ Database Indexing Triage Cheat SheetFirst-response commands for when things go wrong — copy-paste ready.
`Seq Scan` in EXPLAIN ANALYZE output
Immediate action
Check if an index exists on the filtered columns.
Commands
EXPLAIN ANALYZE SELECT ... WHERE ...;
\d+ table_name (PostgreSQL) or SHOW INDEX FROM table_name (MySQL)
Fix now
CREATE INDEX idx_table_column ON table_name(column);
High disk I/O on write-heavy table+
Immediate action
Check number of indexes on the table.
Commands
SELECT count(*) FROM pg_indexes WHERE tablename = 'table_name';
\di+ (PostgreSQL) to see index sizes
Fix now
DROP INDEX unused_index; or convert to partial index.
Index size growing rapidly+
Immediate action
Check bloat percentage.
Commands
SELECT pg_size_pretty(pg_relation_size('index_name'));
SELECT * FROM pg_stat_all_indexes WHERE indexrelid = 'index_name'::regclass;
Fix now
REINDEX INDEX CONCURRENTLY index_name;
Query uses index but still slow (many rows returned)+
Immediate action
Check if index is selective enough.
Commands
SELECT count(DISTINCT column) FROM table_name;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
Fix now
Consider a composite index with more selective columns first, or add a WHERE clause to reduce rows.
Feature / AspectB-Tree IndexHash IndexGIN Index
Lookup typeEquality, range, prefixEquality onlyArray/JSONB containment, full-text
Ordering supportYes (ORDER BY)NoNo
Write overheadModerateLowHigh (slow to build)
Use caseGeneral purposeSmall lookup tablesComplex data types
Storage sizeModerateSmallLarge

Key takeaways

1
Every index is a tax on writes. Before adding one, ask
is the query that needs it worth the write penalty?
2
Composite indexes follow the leftmost prefix rule
put the most selective column first.
3
Partial indexes are your best friend for soft-delete patterns
index only WHERE deleted_at IS NULL.
4
Covering indexes (with INCLUDE) eliminate heap fetches and make read queries blazing fast
but only use them for the hottest queries.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How does a B-tree index handle concurrent inserts and what happens to re...
Q02SENIOR
When would you choose a GiST index over a B-tree for a geometric column?
Q03SENIOR
What happens when you create an index on a column with many NULLs, and h...
Q04JUNIOR
What is the difference between a clustered and non-clustered index?
Q05SENIOR
You see a query that does an index scan but still fetches 10,000 rows fr...
Q06SENIOR
Design an indexing strategy for a social media feed that shows the lates...
Q01 of 06SENIOR

How does a B-tree index handle concurrent inserts and what happens to read performance during a page split?

ANSWER
During a page split, the B-tree locks the page being split and its parent. Concurrent reads on that page may wait briefly. In PostgreSQL, the split is done under a write lock, but readers using MVCC see the old page until the transaction commits. The split itself is fast (microseconds), but on a hot spot (e.g., incrementing primary key), frequent splits can cause contention. Mitigation: use hash partitioning or a UUID that distributes writes.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
How many indexes should I have on a table?
02
What's the difference between a clustered and non-clustered index?
03
How do I find missing indexes in PostgreSQL?
04
Can an index make a query slower?
N
Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.

Follow
Verified
production tested
June 25, 2026
last updated
1,663
articles · all by Naren
🔥

That's Database Internals. Mark it forged?

3 min read · try the examples if you haven't

Previous
Database Sharding
3 / 9 · Database Internals
Next
Normalization vs Denormalization