Database Indexing: The Only Guide That Won't Make Your Queries Slower
Database indexing explained with real production war stories.
20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.
Indexes make SELECT queries faster by reducing disk I/O. But every index slows down INSERT, UPDATE, and DELETE because the index must be updated too. The trick is to index only the columns used in WHERE, JOIN, and ORDER BY clauses — and never index blindly.
Think of a database index like the index at the back of a textbook. Without it, you'd have to flip through every page to find a topic (full table scan). With it, you jump straight to the right page. But every time you add a new page to the book, you also have to update the index — that's the write overhead.
I've seen a single missing index turn a 50ms API call into a 30-second timeout that took down a checkout service at 3am. The fix was one line of SQL. The cost of not having it was $40k in lost revenue. Database indexing is the single highest-leverage performance optimization you can make — and the easiest to screw up. Most developers either over-index (slowing writes to a crawl) or under-index (wondering why their queries are slow). After this guide, you'll be able to look at any slow query, identify the missing index, and know exactly when adding one will hurt more than help.
How B-Tree Indexes Actually Work (And Why They Fail)
The default index type in every major database is the B-tree. It's a balanced tree where each node holds a range of sorted keys, and leaf nodes point to the actual rows (or row IDs). The tree depth grows logarithmically with data size — a 1-billion-row table needs only about 4-5 levels. That's why a lookup is O(log n). But here's where it breaks: if your query doesn't match the leftmost prefix of the index, the B-tree is useless. A composite index on (a, b, c) can't speed up WHERE b = 5 — it'll fall back to a full scan. I've seen this bring down a reporting dashboard because the WHERE clause filtered on the second column. The fix: either reorder the columns or create a separate index on b.
When to Use Partial Indexes (And When Not To)
A partial index indexes only a subset of rows — those matching a WHERE condition. This shrinks the index size and reduces write overhead. I use them constantly for soft-delete patterns: index only WHERE deleted_at IS NULL. The index is 10x smaller, writes are faster, and queries for active records are still instant. But partial indexes have a gotcha: the query planner only uses them if your query's WHERE clause is a superset of the index's condition. If you index WHERE status = 'active' and query WHERE status = 'active' AND created_at > '2024-01-01', it works. But if you query WHERE status IN ('active', 'pending'), it won't use the partial index — it'll fall back to a full scan. Always match the condition exactly.
Covering Indexes: The Secret to Blazing Fast Reads
A covering index includes all columns needed by a query, so the database never has to touch the table (heap). This is called an index-only scan. In PostgreSQL, you use the INCLUDE clause to add extra columns without expanding the index tree. In MySQL, you just add them to the index (but that increases tree size). I once optimized a reporting query from 12 seconds to 3ms by adding two columns to an existing index with INCLUDE. The index size grew by 30%, but the query became instant. The trade-off: covering indexes are great for read-heavy workloads but brutal on writes. Use them sparingly — only for the hottest queries.
Hash Indexes: Fast Equality Lookups, Nothing Else
Hash indexes map keys to a fixed-size hash bucket. They're O(1) for equality lookups — faster than B-tree's O(log n). But they can't do range queries, sorting, or partial matches. PostgreSQL supports hash indexes natively; MySQL uses them only in MEMORY tables. I use hash indexes for lookup tables — like a country code to timezone mapping. But never use them on columns with many duplicates (low cardinality) because hash collisions degrade performance. And never use them for columns you'll sort or filter with >, <, or LIKE. The query planner will ignore the index and do a full scan.
GiST and GIN Indexes: Full-Text Search and Beyond
GiST (Generalized Search Tree) and GIN (Generalized Inverted Index) are for complex data types: arrays, JSONB, full-text search, geometric data. GIN is faster for lookups but slower to build; GiST is faster to build but slower for lookups. I use GIN for JSONB queries and full-text search. The classic mistake: indexing a JSONB column with a GIN index on the whole column, then querying with @> (contains) — that works. But if you query with -> (access by key), the index is useless. You need a path-specific index. I've seen this cause a 10-second query on a 1M-row table. The fix: create an index on the specific path, e.g., CREATE INDEX idx_data_email ON users USING gin ((data -> 'email'));
Index Maintenance: The Silent Performance Killer
Indexes bloat over time. In PostgreSQL, updated rows leave dead index entries that VACUUM cleans up. If autovacuum can't keep up, the index grows and queries slow down. I've seen a 10GB index bloat to 40GB because autovacuum was disabled on a busy table. The symptom: queries that used to take 1ms suddenly take 100ms. The fix: monitor index size over time, and run REINDEX periodically. In MySQL, the equivalent is OPTIMIZE TABLE. But be careful — REINDEX locks the table for writes. Use CONCURRENTLY in PostgreSQL to avoid downtime: REINDEX INDEX CONCURRENTLY idx_name;
The Index That Killed Our Writes
- Every index is a tax on writes.
- Before adding one, ask: is the query that needs it worth the write penalty? If yes, make the index as narrow as possible.
EXPLAIN ANALYZE SELECT ... WHERE ...;\d+ table_name (PostgreSQL) or SHOW INDEX FROM table_name (MySQL)Key takeaways
Interview Questions on This Topic
How does a B-tree index handle concurrent inserts and what happens to read performance during a page split?
Frequently Asked Questions
20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.
That's Database Internals. Mark it forged?
3 min read · try the examples if you haven't