Senior 3 min · June 25, 2026

Database Indexing: The Only Guide That Won't Make Your Queries Slower

Q: How many indexes should I have on a table?

As few as possible while covering your slow queries. A good rule of thumb: no more than 5-10 indexes on a write-heavy table. Each index adds overhead to every INSERT, UPDATE, and DELETE. Start with indexes on primary keys and foreign keys, then add indexes for specific slow queries one at a time.

Q: What's the difference between a clustered and non-clustered index?

A clustered index determines the physical order of rows on disk. A table can have only one clustered index (e.g., InnoDB's primary key). A non-clustered index is a separate structure with pointers to the rows. Clustered indexes are faster for range queries because rows are stored sequentially, but slower for inserts that cause page splits. Non-clustered indexes are smaller and can be many per table.

Q: How do I find missing indexes in PostgreSQL?

Run the pg_stat_user_tables and pg_stat_user_indexes views to see which tables have the most sequential scans. Then use the pg_stat_statements extension to find slow queries. For automatic suggestions, use the hypopg extension to test hypothetical indexes, or the pg_qualstats extension to track filter columns.

Q: Can an index make a query slower?

Yes. If the query planner chooses an index that requires many heap fetches (e.g., a non-selective index returning 50% of rows), it can be slower than a full table scan. Also, indexes slow down writes. In rare cases, a poorly chosen index can cause the planner to make bad decisions due to outdated statistics. Always test with realistic data.

Database indexing explained with real production war stories.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

Indexes make SELECT queries faster by reducing disk I/O. But every index slows down INSERT, UPDATE, and DELETE because the index must be updated too. The trick is to index only the columns used in WHERE, JOIN, and ORDER BY clauses — and never index blindly.

✦ Definition~90s read

What is Database Indexing?

A database index is a data structure (usually a B-tree or hash table) that speeds up data retrieval at the cost of slower writes and extra storage. It's a sorted copy of a subset of columns, enabling the database to find rows without scanning the entire table.

★

Think of a database index like the index at the back of a textbook.

Plain-English First

Think of a database index like the index at the back of a textbook. Without it, you'd have to flip through every page to find a topic (full table scan). With it, you jump straight to the right page. But every time you add a new page to the book, you also have to update the index — that's the write overhead.

I've seen a single missing index turn a 50ms API call into a 30-second timeout that took down a checkout service at 3am. The fix was one line of SQL. The cost of not having it was $40k in lost revenue. Database indexing is the single highest-leverage performance optimization you can make — and the easiest to screw up. Most developers either over-index (slowing writes to a crawl) or under-index (wondering why their queries are slow). After this guide, you'll be able to look at any slow query, identify the missing index, and know exactly when adding one will hurt more than help.

How B-Tree Indexes Actually Work (And Why They Fail)

The default index type in every major database is the B-tree. It's a balanced tree where each node holds a range of sorted keys, and leaf nodes point to the actual rows (or row IDs). The tree depth grows logarithmically with data size — a 1-billion-row table needs only about 4-5 levels. That's why a lookup is O(log n). But here's where it breaks: if your query doesn't match the leftmost prefix of the index, the B-tree is useless. A composite index on (a, b, c) can't speed up WHERE b = 5 — it'll fall back to a full scan. I've seen this bring down a reporting dashboard because the WHERE clause filtered on the second column. The fix: either reorder the columns or create a separate index on b.

CompositeIndexMismatch.sqlSQL

-- io.thecodeforge — System Design tutorial

-- Table: orders (50M rows)
-- Index: (user_id, created_at)

-- This query uses the index efficiently:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 12345 AND created_at > '2024-01-01';
-- Output: Index Scan using idx_orders_user_created (cost=0.56..8.58 rows=100 width=120)

-- This query does NOT use the index:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE created_at > '2024-01-01';
-- Output: Seq Scan on orders (cost=0.00..123456.78 rows=5000000 width=120)
-- The index on (user_id, created_at) can't be used because user_id is not in the WHERE clause.

-- Fix: create a separate index on created_at
CREATE INDEX idx_orders_created_at ON orders(created_at);

Output

First query: Index Scan, fast. Second query: Seq Scan, slow.

Production Trap: Leftmost Prefix Rule

A composite index on (a, b, c) only helps queries that filter on a, or a AND b, or a AND b AND c. Filtering on b alone or c alone will ignore the index. Always put the most selective column first.

thecodeforge.io

Database Index Types and Maintenance Guide

Database Indexing

thecodeforge.io

B-Tree Index Lookup Flow

Database Indexing

When to Use Partial Indexes (And When Not To)

A partial index indexes only a subset of rows — those matching a WHERE condition. This shrinks the index size and reduces write overhead. I use them constantly for soft-delete patterns: index only WHERE deleted_at IS NULL. The index is 10x smaller, writes are faster, and queries for active records are still instant. But partial indexes have a gotcha: the query planner only uses them if your query's WHERE clause is a superset of the index's condition. If you index WHERE status = 'active' and query WHERE status = 'active' AND created_at > '2024-01-01', it works. But if you query WHERE status IN ('active', 'pending'), it won't use the partial index — it'll fall back to a full scan. Always match the condition exactly.

PartialIndexPattern.sqlSQL

-- io.thecodeforge — System Design tutorial

-- Table: users (10M rows, 20% deleted)
-- Without partial index:
CREATE INDEX idx_users_active ON users(active) WHERE active = true;
-- This index is 1/5 the size of a full index on active.

-- Query that uses the partial index:
EXPLAIN ANALYZE
SELECT * FROM users WHERE active = true AND last_login > '2024-06-01';
-- Output: Index Scan using idx_users_active (cost=0.29..8.30 rows=100 width=200)

-- Query that does NOT use the partial index:
EXPLAIN ANALYZE
SELECT * FROM users WHERE active = true OR last_login > '2024-06-01';
-- Output: Seq Scan (cost=0.00..250000.00 rows=5000000 width=200)
-- The OR condition breaks the partial index match.

Output

First query uses partial index. Second query does full scan.

Senior Shortcut: Partial Index for Soft Deletes

Index WHERE deleted_at IS NULL. Your active-record queries will be fast, and the index stays small because deleted rows are excluded. Just remember to include the same condition in your queries.

thecodeforge.io

Partial vs Full Indexes

Database Indexing

Covering Indexes: The Secret to Blazing Fast Reads

A covering index includes all columns needed by a query, so the database never has to touch the table (heap). This is called an index-only scan. In PostgreSQL, you use the INCLUDE clause to add extra columns without expanding the index tree. In MySQL, you just add them to the index (but that increases tree size). I once optimized a reporting query from 12 seconds to 3ms by adding two columns to an existing index with INCLUDE. The index size grew by 30%, but the query became instant. The trade-off: covering indexes are great for read-heavy workloads but brutal on writes. Use them sparingly — only for the hottest queries.

CoveringIndexExample.sqlSQL

-- io.thecodeforge — System Design tutorial

-- Table: orders (50M rows)
-- Frequent query: get order total and status by user_id

-- Before: index on user_id only, query needs heap fetch for total and status
EXPLAIN ANALYZE
SELECT total, status FROM orders WHERE user_id = 12345;
-- Output: Index Scan using idx_orders_user_id (cost=0.56..8.58 rows=100 width=16) + 100 heap fetches

-- After: covering index with INCLUDE (PostgreSQL 11+)
CREATE INDEX idx_orders_user_id_covering ON orders(user_id) INCLUDE (total, status);

EXPLAIN ANALYZE
SELECT total, status FROM orders WHERE user_id = 12345;
-- Output: Index Only Scan using idx_orders_user_id_covering (cost=0.56..4.58 rows=100 width=16)
-- No heap fetches!

Output

Before: Index Scan + heap fetches. After: Index Only Scan, no heap fetches.

Interview Gold: Index-Only Scan vs. Index Scan

An index-only scan reads only the index. An index scan reads the index then fetches rows from the heap. The difference is disk I/O. Covering indexes eliminate heap fetches. Interviewers love this distinction.

Hash Indexes: Fast Equality Lookups, Nothing Else

Hash indexes map keys to a fixed-size hash bucket. They're O(1) for equality lookups — faster than B-tree's O(log n). But they can't do range queries, sorting, or partial matches. PostgreSQL supports hash indexes natively; MySQL uses them only in MEMORY tables. I use hash indexes for lookup tables — like a country code to timezone mapping. But never use them on columns with many duplicates (low cardinality) because hash collisions degrade performance. And never use them for columns you'll sort or filter with >, <, or LIKE. The query planner will ignore the index and do a full scan.

HashIndexExample.sqlSQL

-- io.thecodeforge — System Design tutorial

-- Table: country_codes (250 rows)
-- Frequent query: get timezone by country code

CREATE INDEX idx_country_code_hash ON country_codes USING hash(code);

-- Fast equality lookup:
EXPLAIN ANALYZE
SELECT timezone FROM country_codes WHERE code = 'US';
-- Output: Index Scan using idx_country_code_hash (cost=0.00..8.02 rows=1 width=16)

-- Range query (ignores hash index):
EXPLAIN ANALYZE
SELECT timezone FROM country_codes WHERE code > 'US';
-- Output: Seq Scan on country_codes (cost=0.00..4.50 rows=125 width=16)

Output

Equality lookup uses hash index. Range query does full scan.

Never Do This: Hash Index on High-Cardinality Column

If your column has millions of unique values, a hash index's bucket size grows, and collisions increase. B-tree is almost always better for high-cardinality columns. Hash indexes shine on small, static lookup tables.

GiST and GIN Indexes: Full-Text Search and Beyond

GiST (Generalized Search Tree) and GIN (Generalized Inverted Index) are for complex data types: arrays, JSONB, full-text search, geometric data. GIN is faster for lookups but slower to build; GiST is faster to build but slower for lookups. I use GIN for JSONB queries and full-text search. The classic mistake: indexing a JSONB column with a GIN index on the whole column, then querying with @> (contains) — that works. But if you query with -> (access by key), the index is useless. You need a path-specific index. I've seen this cause a 10-second query on a 1M-row table. The fix: create an index on the specific path, e.g., CREATE INDEX idx_data_email ON users USING gin ((data -> 'email'));

GINIndexJSONB.sqlSQL

-- io.thecodeforge — System Design tutorial

-- Table: users with JSONB column 'data'
-- Frequent query: find users by email inside JSONB

-- Wrong: index on whole column
CREATE INDEX idx_users_data ON users USING gin(data);
-- This index helps @> queries but not -> queries.

-- Query that uses the index:
EXPLAIN ANALYZE
SELECT * FROM users WHERE data @> '{"email": "test@example.com"}';
-- Output: Bitmap Index Scan on idx_users_data

-- Query that does NOT use the index:
EXPLAIN ANALYZE
SELECT * FROM users WHERE data->>'email' = 'test@example.com';
-- Output: Seq Scan (slow!)

-- Fix: path-specific index
CREATE INDEX idx_users_data_email ON users USING gin ((data->'email'));
-- Now the -> query uses the index.

Output

First query uses index. Second query does full scan until path-specific index is added.

Senior Shortcut: Path-Specific GIN Indexes

For JSONB columns, always create indexes on the specific paths you query, not the whole column. The whole-column index is only useful for containment (@>) queries, which are rare in production.

Index Maintenance: The Silent Performance Killer

Indexes bloat over time. In PostgreSQL, updated rows leave dead index entries that VACUUM cleans up. If autovacuum can't keep up, the index grows and queries slow down. I've seen a 10GB index bloat to 40GB because autovacuum was disabled on a busy table. The symptom: queries that used to take 1ms suddenly take 100ms. The fix: monitor index size over time, and run REINDEX periodically. In MySQL, the equivalent is OPTIMIZE TABLE. But be careful — REINDEX locks the table for writes. Use CONCURRENTLY in PostgreSQL to avoid downtime: REINDEX INDEX CONCURRENTLY idx_name;

IndexMaintenance.sqlSQL

-- io.thecodeforge — System Design tutorial

-- Check index bloat in PostgreSQL:
SELECT
    indexrelid::regclass AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    pg_size_pretty(pg_relation_size(indrelid)) AS table_size,
    round(100 * pg_relation_size(indexrelid) / nullif(pg_relation_size(indrelid), 0), 2) AS bloat_pct
FROM pg_stat_all_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild index without locking writes:
REINDEX INDEX CONCURRENTLY idx_orders_user_id;
-- Note: CONCURRENTLY requires extra disk space and takes longer, but avoids downtime.

Output

Shows index size, table size, and bloat percentage.

Production Trap: REINDEX Without CONCURRENTLY

Running REINDEX (without CONCURRENTLY) on a busy table will lock it for writes. Your app will time out. Always use REINDEX INDEX CONCURRENTLY in production. It takes longer but doesn't block writes.

● Production incidentPOST-MORTEMseverity: high

The Index That Killed Our Writes

Symptom

A payments service started timing out on INSERT operations. The database CPU was pegged at 100%, and the connection pool was exhausted within 2 minutes.

Assumption

We assumed a sudden traffic spike — the team started scaling up instances.

Root cause

A junior dev had added a composite index on (user_id, created_at) on a 50-million-row table. The index was 12GB and every INSERT required updating 3 levels of the B-tree. The write throughput dropped from 10k ops/s to 200 ops/s.

Fix

Dropped the composite index. Replaced with a single-column index on user_id only. Created a separate partial index on created_at WHERE status = 'pending' for the reporting query that needed it.

Key lesson

Every index is a tax on writes.
Before adding one, ask: is the query that needs it worth the write penalty? If yes, make the index as narrow as possible.

Production debug guideSystematic recovery paths for the failure modes engineers actually hit.3 entries

Symptom · 01

Slow SELECT queries on a large table (millions of rows)

→

Fix

1. Run EXPLAIN ANALYZE on the slow query. 2. Look for 'Seq Scan' or 'Table Scan'. 3. Identify the columns in WHERE/JOIN/ORDER BY. 4. Create an index on those columns (composite if multiple). 5. Re-run EXPLAIN ANALYZE to confirm 'Index Scan'.

Symptom · 02

Slow INSERT/UPDATE after adding an index

→

Fix

1. Check if the indexed column changes frequently (e.g., last_updated). 2. Check index size with pg_size_pretty(pg_relation_size('index_name')). 3. If index is large and write-heavy, consider dropping it or making it a partial index. 4. If the index is needed, batch writes to reduce overhead.

Symptom · 03

Index scan still slow (many heap fetches)

→

Fix

1. Check if the query can be satisfied by index-only scan. 2. Add covering columns with INCLUDE (PostgreSQL) or add all needed columns to the index (MySQL). 3. Re-run EXPLAIN ANALYZE and look for 'Index Only Scan'.

★ Database Indexing Triage Cheat SheetFirst-response commands for when things go wrong — copy-paste ready.

`Seq Scan` in EXPLAIN ANALYZE output−

Immediate action

Check if an index exists on the filtered columns.

Commands

EXPLAIN ANALYZE SELECT ... WHERE ...;

\d+ table_name (PostgreSQL) or SHOW INDEX FROM table_name (MySQL)

Fix now

CREATE INDEX idx_table_column ON table_name(column);

High disk I/O on write-heavy table+

Index size growing rapidly+

Query uses index but still slow (many rows returned)+

Feature / Aspect	B-Tree Index	Hash Index	GIN Index
Lookup type	Equality, range, prefix	Equality only	Array/JSONB containment, full-text
Ordering support	Yes (ORDER BY)	No	No
Write overhead	Moderate	Low	High (slow to build)
Use case	General purpose	Small lookup tables	Complex data types
Storage size	Moderate	Small	Large

Key takeaways

Every index is a tax on writes. Before adding one, ask

is the query that needs it worth the write penalty?

Composite indexes follow the leftmost prefix rule

put the most selective column first.

Partial indexes are your best friend for soft-delete patterns

index only WHERE deleted_at IS NULL.

Covering indexes (with INCLUDE) eliminate heap fetches and make read queries blazing fast

but only use them for the hottest queries.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

How does a B-tree index handle concurrent inserts and what happens to re...

Q02SENIOR

When would you choose a GiST index over a B-tree for a geometric column?

Q03SENIOR

What happens when you create an index on a column with many NULLs, and h...

Q04JUNIOR

What is the difference between a clustered and non-clustered index?

Q05SENIOR

You see a query that does an index scan but still fetches 10,000 rows fr...

Q06SENIOR

Design an indexing strategy for a social media feed that shows the lates...

Q01 of 06SENIOR

How does a B-tree index handle concurrent inserts and what happens to read performance during a page split?

ANSWER

During a page split, the B-tree locks the page being split and its parent. Concurrent reads on that page may wait briefly. In PostgreSQL, the split is done under a write lock, but readers using MVCC see the old page until the transaction commits. The split itself is fast (microseconds), but on a hot spot (e.g., incrementing primary key), frequent splits can cause contention. Mitigation: use hash partitioning or a UUID that distributes writes.

FAQ · 4 QUESTIONS

Frequently Asked Questions

How many indexes should I have on a table?

What's the difference between a clustered and non-clustered index?

How do I find missing indexes in PostgreSQL?

Can an index make a query slower?

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.

✓ Verified

production tested

June 25, 2026

last updated

1,663

articles · all by Naren

🔥

That's Database Internals. Mark it forged?

3 min read · try the examples if you haven't