Mid-level 6 min · March 06, 2026

Indexing in DBMS — Missing Foreign Key Broke Checkout

A missing foreign key index on a 2-million-row table caused 12-second scans and 100% CPU, taking down checkout —avoid this with proper indexing.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • An index is a sorted lookup structure that maps column values to disk locations, avoiding full table scans
  • B-Tree indexes are default and support equality and range queries; Hash indexes are O(1) for equality only
  • Clustered indexes reorder rows on disk (one per table); non-clustered indexes are separate structures requiring key lookups
  • Performance: Index seek on 1M rows is ~0.1ms vs full scan ~2s — a 20,000x difference
  • Production trap: Every index adds write overhead — 8 indexes on a 10K writes/sec table means 80K extra I/O ops per second
  • Biggest mistake: Indexing low-cardinality columns alone — the planner will ignore the index at >20% row selectivity
Plain-English First

Imagine a 1,000-page cookbook with no table of contents. To find the recipe for 'chocolate cake' you'd flip every single page. An index at the back of the book says 'chocolate cake → page 847' and you jump straight there. A database index works exactly the same way — it's a separate, pre-sorted lookup structure that tells the database engine exactly where your data lives on disk, so it never has to flip through every page.

Every application you've ever used — Instagram, Spotify, your bank — runs queries against tables that hold millions or even billions of rows. Without indexing, every single query would force the database to read every row from disk, one by one, until it found what it needed. That process is called a full table scan, and at scale it turns a 2ms query into a 30-second disaster that crashes your app under load.

Indexing solves the needle-in-a-haystack problem. Instead of storing data in one giant unsorted heap, the database builds a compact, sorted auxiliary structure (the index) that maps column values directly to the physical location of the matching rows. When a query arrives, the engine consults the index first, finds the exact disk address, and fetches only what it needs. The difference is the same as searching a phone book alphabetically versus reading every name from start to finish.

By the end of this article you'll understand not just what an index is, but why different index types exist, when to create them, when NOT to, and the subtle mistakes — like indexing the wrong column or over-indexing a write-heavy table — that engineers make all the time in production. You'll walk away ready to reason about index choices during system design interviews and during your next database performance investigation.

How a Database Actually Finds Data Without an Index

Before we talk about indexes, we need to understand what they replace. When you run a query like SELECT * FROM orders WHERE customer_id = 4821, the database has two options: scan every row in the table checking each customer_id one by one, or consult an index to jump directly to the answer.

The first option is a Full Table Scan. It sounds terrible, but for a table with 500 rows it's actually fine — the overhead of an index lookup can even be slower for tiny datasets. The problem explodes when your orders table has 50 million rows and you're running that query 10,000 times per second.

Disk I/O is the bottleneck. Data lives in fixed-size blocks (usually 8KB or 16KB) on disk. A full scan means reading every block. An index is designed so that finding a specific value requires reading only a handful of blocks — typically O(log n) reads for a B-Tree index — regardless of whether the table has 1,000 rows or 1 billion.

This is the core contract: indexes trade extra disk space and slower writes for dramatically faster reads. That trade-off is the single most important idea in this entire article.

full_scan_vs_index.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-- ============================================================
-- Demonstrate the cost difference between a full scan
-- and an index seek on a large orders table.
-- Run EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) to see the plan.
-- ============================================================

-- Step 1: Create a realistic orders table
CREATE TABLE orders (
    order_id      SERIAL PRIMARY KEY,
    customer_id   INT            NOT NULL,
    product_name  VARCHAR(200)   NOT NULL,
    order_total   DECIMAL(10,2)  NOT NULL,
    created_at    TIMESTAMP      NOT NULL DEFAULT NOW()
);

-- Step 2: Seed 1 million rows (simulate a production-sized table)
INSERT INTO orders (customer_id, product_name, order_total, created_at)
SELECT
    (RANDOM() * 100000)::INT,              -- 100k unique customers
    'Product_' || (RANDOM() * 500)::INT,   -- 500 unique products
    (RANDOM() * 500 + 5)::DECIMAL(10,2),   -- order value $5 - $505
    NOW() - (RANDOM() * INTERVAL '2 years') -- orders from last 2 years
FROM generate_series(1, 1000000);

-- Step 3: Query WITHOUT an index — watch the execution plan
EXPLAIN ANALYZE
SELECT order_id, product_name, order_total
FROM   orders
WHERE  customer_id = 4821;
-- Expected plan: "Seq Scan on orders" — reads ALL 1M rows
-- Typical cost on spinning disk: 800ms - 3000ms

-- Step 4: Create a B-Tree index on customer_id
CREATE INDEX idx_orders_customer_id
    ON orders (customer_id);
-- This builds the index structure — one-time cost, usually seconds

-- Step 5: Run the SAME query — the planner now uses the index
EXPLAIN ANALYZE
SELECT order_id, product_name, order_total
FROM   orders
WHERE  customer_id = 4821;
-- Expected plan: "Index Scan using idx_orders_customer_id"
-- Typical cost: 0.5ms - 3ms — a 1000x improvement
Output
-- BEFORE index (Seq Scan):
Seq Scan on orders (cost=0.00..28374.00 rows=10 width=54)
(actual time=12.431..1847.223 rows=9 loops=1)
Planning Time: 0.112 ms
Execution Time: 1847.441 ms
-- AFTER index (Index Scan):
Index Scan using idx_orders_customer_id on orders
(cost=0.42..16.85 rows=10 width=54)
(actual time=0.041..0.063 rows=9 loops=1)
Planning Time: 0.218 ms
Execution Time: 0.089 ms
Why the Planner Sometimes Ignores Your Index
If your query matches a very large percentage of rows (say, 30% of the table), the database query planner will deliberately choose a full table scan over your index. Fetching millions of scattered disk pages via an index is actually slower than reading the table sequentially. This is expected, intelligent behaviour — not a bug.
Production Insight
Full table scans are the silent killer of database performance in production.
They aren't bad for small tables — but on a 10M row table, even a single scan can consume seconds of I/O.
Rule: always run EXPLAIN on any query you add to a hot path.
Key Takeaway
An index trades write cost for read speed — but the planner is smart.
If your query returns >20% of rows, it will skip the index and scan.
Know your data distribution before declaring an index useless.

B-Tree vs Hash Indexes — Picking the Right Tool for the Job

There isn't just one type of index. The right choice depends entirely on the kind of queries you run.

B-Tree (Balanced Tree) is the default in every major RDBMS — PostgreSQL, MySQL, Oracle, SQL Server. It stores index entries in a sorted tree structure. The sorted order means it handles equality lookups (=), range queries (BETWEEN, <, >), and ORDER BY efficiently. 90% of the time, B-Tree is what you want.

Hash Index stores a hash of the column value mapped to a row pointer. Hash lookups are O(1) — theoretically faster than B-Tree's O(log n) for equality checks. The catch: hash indexes are useless for range queries. WHERE price > 100 on a hash index results in a full scan because hashed values have no sorted relationship to each other.

Bitmap Index (Oracle, PostgreSQL partial support) is ideal for low-cardinality columns — columns with very few distinct values like status (active/inactive) or country_code. It stores a bit array per distinct value and uses bitwise AND/OR to combine multiple conditions. Never use bitmap indexes on high-cardinality columns like email — the storage overhead becomes catastrophic.

Choosing wrong here is a real interview red flag.

index_types_comparison.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-- ============================================================
-- Compare B-Tree vs Hash index behaviour on a products table.
-- PostgreSQL syntax — MySQL uses ENGINE=MEMORY for Hash indexes.
-- ============================================================

CREATE TABLE products (
    product_id    SERIAL PRIMARY KEY,
    sku           VARCHAR(50)    NOT NULL,   -- unique product code
    category      VARCHAR(100)   NOT NULL,   -- e.g. 'electronics'
    price         DECIMAL(10,2)  NOT NULL,
    in_stock      BOOLEAN        NOT NULL DEFAULT TRUE
);

-- ── B-TREE INDEX ─────────────────────────────────────────────
-- Best for: equality, range queries, ORDER BY, LIKE 'prefix%'
CREATE INDEX idx_products_price_btree
    ON products USING BTREE (price);

-- This query USES the B-Tree index efficiently (range query)
EXPLAIN SELECT product_id, sku
FROM   products
WHERE  price BETWEEN 50.00 AND 150.00;  -- range scan ✓

-- ── HASH INDEX ───────────────────────────────────────────────
-- Best for: exact equality lookups ONLY — useless for ranges
CREATE INDEX idx_products_sku_hash
    ON products USING HASH (sku);

-- This query USES the Hash index efficiently (exact match)
EXPLAIN SELECT product_id, price
FROM   products
WHERE  sku = 'ELEC-4K-TV-001';          -- equality lookup ✓

-- This query CANNOT use the Hash index — falls back to seq scan
EXPLAIN SELECT product_id, price
FROM   products
WHERE  sku LIKE 'ELEC%';               -- range/pattern — hash fails ✗

-- ── PARTIAL INDEX (a B-Tree power move) ─────────────────────
-- Index ONLY the rows you actually query — smaller, faster index
CREATE INDEX idx_products_instock_price
    ON products (price)
    WHERE in_stock = TRUE;              -- only index in-stock items
-- Result: index is a fraction of the size but covers the hot query path
Output
-- B-Tree range query plan:
Bitmap Heap Scan on products
Recheck Cond: ((price >= 50.00) AND (price <= 150.00))
-> Bitmap Index Scan on idx_products_price_btree
Index Cond: ((price >= 50.00) AND (price <= 150.00))
-- Hash equality query plan:
Index Scan using idx_products_sku_hash on products
Index Cond: ((sku)::text = 'ELEC-4K-TV-001'::text)
-- Hash + LIKE (falls back to seq scan — hash can't help here):
Seq Scan on products
Filter: ((sku)::text ~~ 'ELEC%'::text)
Partial Indexes Are Criminally Underused
A partial index on WHERE status = 'pending' in an orders table can be 50x smaller than a full index on status, because 98% of orders are 'completed' and never queried that way. Smaller index = fits in memory = faster. If your queries consistently filter on a specific subset of rows, a partial index is almost always the right call.
Production Insight
Using a Hash index on a column that needs range queries silently falls back to full scans.
No error, no warning — just slower queries.
Rule: stick to B-Tree for anything that isn't a simple equality lookup in a high-throughput path.
Key Takeaway
B-Tree is the safe default for 90% of use cases.
Hash indexes are only faster for exact equality — never use them for ranges.
Partial indexes are a free performance win for filtered queries.

Clustered vs Non-Clustered Indexes — The One That Actually Moves Your Data

This distinction trips people up in interviews constantly, so let's nail it.

A Clustered Index physically reorders the rows on disk to match the index order. Because the data itself IS the index, there can only ever be one clustered index per table. In MySQL InnoDB, the PRIMARY KEY is always the clustered index. SQL Server lets you choose which column to cluster on. When you read data via the clustered index, you get the actual row immediately — no second lookup needed.

A Non-Clustered Index is a completely separate structure that stores the indexed column values and a pointer back to the actual row (called a Row Identifier or RID in SQL Server, or the primary key in InnoDB). When the query engine uses a non-clustered index, it first finds the matching pointer in the index, then does a second lookup in the main table to fetch the remaining columns. This second lookup is called a Key Lookup or Bookmark Lookup and shows up in execution plans.

The practical implication: if a non-clustered index query needs many columns beyond what's in the index, those Key Lookups add up. The fix is a Covering Index — an index that includes all the columns a query needs, so the engine never has to touch the main table at all.

Covering indexes are one of the highest-impact performance optimisations you can make with zero application code changes.

clustered_vs_nonclustered.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
-- ============================================================
-- Illustrate clustered index layout vs non-clustered,
-- and the power of a COVERING index to eliminate key lookups.
-- SQL Server syntax — concepts identical in MySQL/PostgreSQL.
-- ============================================================

CREATE TABLE employees (
    employee_id   INT           NOT NULL IDENTITY(1,1),
    department_id INT           NOT NULL,
    last_name     VARCHAR(100)  NOT NULL,
    first_name    VARCHAR(100)  NOT NULL,
    salary        DECIMAL(10,2) NOT NULL,
    hire_date     DATE          NOT NULL,

    -- PRIMARY KEY creates the CLUSTERED index by default in SQL Server
    -- Rows are physically sorted on disk by employee_id
    CONSTRAINT PK_employees PRIMARY KEY CLUSTERED (employee_id)
);

-- Non-clustered index on department_id
-- Stores: department_id → employee_id (pointer back to clustered index)
CREATE NONCLUSTERED INDEX idx_employees_department
    ON employees (department_id);

-- ── SCENARIO 1: Non-clustered causes a Key Lookup ────────────
-- The index has department_id but NOT salary or last_name.
-- Engine finds the pointer via the index, then looks up the main
-- table again for each matching row — expensive at scale.
SELECT employee_id, last_name, salary
FROM   employees
WHERE  department_id = 7;
-- Execution plan shows: Key Lookup (clustered) — one extra disk read per row

-- ── SCENARIO 2: Covering index eliminates the Key Lookup ─────
-- Include all columns the query needs directly in the index.
-- The INCLUDE clause adds columns to the leaf level of the B-Tree
-- without making them part of the sort key.
CREATE NONCLUSTERED INDEX idx_employees_dept_covering
    ON employees (department_id)          -- the WHERE / JOIN column
    INCLUDE (last_name, salary);           -- columns needed in SELECT

-- Now the SAME query never touches the main table at all
SELECT employee_id, last_name, salary
FROM   employees
WHERE  department_id = 7;
-- Execution plan shows: Index Only Scan — all data in the index ✓

-- ── PRO MOVE: Composite index column ORDER matters ────────────
-- For a query filtering on department_id AND sorting by hire_date:
CREATE NONCLUSTERED INDEX idx_employees_dept_hire
    ON employees (department_id, hire_date);  -- filter col first, sort col second
-- Putting hire_date first would make the department filter less selective
Output
-- SCENARIO 1 Execution Plan (with Key Lookup):
|--Nested Loops (Inner Join)
|--Index Seek(idx_employees_department, department_id=7)
|--Key Lookup(Clustered)(PK_employees) ← extra table hit per row
-- SCENARIO 2 Execution Plan (Covering Index — no Key Lookup):
|--Index Seek(idx_employees_dept_covering, department_id=7)
[All needed columns found in index — main table never touched]
Watch Out: Column Order in Composite Indexes Is Not Arbitrary
In a composite index on (department_id, hire_date), the index can support queries filtering on department_id alone, or on both columns. It CANNOT efficiently support a query filtering only on hire_date — that's a full index scan. The rule: put the most selective, most frequently filtered column first. A query that skips the leading column loses most of the index benefit.
Production Insight
Key lookups are insidious — they add one random read per row, turning a 3ms query into 300ms when 100 rows are returned.
Under load, these random reads cause disk queue contention.
Rule: cover your most frequent queries with INCLUDE columns to eliminate lookups entirely.
Key Takeaway
Clustered index = data is the index (one per table).
Non-clustered = separate structure + optional key lookup.
Covering indexes eliminate the lookup — always build them for hot queries.

When NOT to Add an Index — The Write Performance Tax

Here's what nobody tells beginners: indexes are never free. Every index you add is a liability on write-heavy tables, and getting this wrong in production is painful.

Every time you INSERT, UPDATE, or DELETE a row, the database must update every index on that table to keep it consistent. A table with 8 indexes doesn't just write once — it performs up to 9 writes per operation. On a table receiving 10,000 inserts per second, that's 80,000 extra I/O operations per second before you've even handled reads.

The situation gets worse with UPDATE. If you update an indexed column, the database must delete the old index entry and insert a new one — two write operations per index, per updated row.

The right mental model is to think of each index as a maintenance contract. The database pays that contract on every write, forever. You only justify that contract when read performance gains outweigh the write penalty.

Rules of thumb from the trenches: - Tables with > 80% writes and < 20% reads: index sparingly, only the absolute hottest read paths. - Tables with > 80% reads: index aggressively, your queries will thank you. - Event/audit log tables: almost never index beyond the primary key — they're append-only and rarely read column-by-column. - Foreign key columns: almost always index these — they're hit on every JOIN.

index_write_overhead_demo.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
-- ============================================================
-- Demonstrate the measurable write cost of indexes on a
-- high-throughput event_log table (append-only pattern).
-- ============================================================

-- Version A: Over-indexed event log (common beginner mistake)
CREATE TABLE event_log_over_indexed (
    event_id      BIGSERIAL PRIMARY KEY,
    user_id       BIGINT        NOT NULL,
    session_id    UUID          NOT NULL,
    event_type    VARCHAR(100)  NOT NULL,
    page_url      TEXT          NOT NULL,
    metadata      JSONB,
    created_at    TIMESTAMP     NOT NULL DEFAULT NOW()
);
-- Beginner adds indexes on every column "just in case"
CREATE INDEX idx_event_user    ON event_log_over_indexed (user_id);
CREATE INDEX idx_event_session ON event_log_over_indexed (session_id);
CREATE INDEX idx_event_type    ON event_log_over_indexed (event_type);
CREATE INDEX idx_event_url     ON event_log_over_indexed (page_url);
CREATE INDEX idx_event_time    ON event_log_over_indexed (created_at);
-- 6 total structures to update on every INSERT — silent write killer

-- Version B: Lean, thoughtful indexing
CREATE TABLE event_log_lean (
    event_id      BIGSERIAL PRIMARY KEY,
    user_id       BIGINT        NOT NULL,
    session_id    UUID          NOT NULL,
    event_type    VARCHAR(100)  NOT NULL,
    page_url      TEXT          NOT NULL,
    metadata      JSONB,
    created_at    TIMESTAMP     NOT NULL DEFAULT NOW()
);
-- Only index what your actual queries need:
-- Hot query 1: "Show me all events for user X in the last 7 days"
CREATE INDEX idx_lean_user_time
    ON event_log_lean (user_id, created_at DESC);
-- Hot query 2: "Count events by type in the last hour"
CREATE INDEX idx_lean_type_time
    ON event_log_lean (event_type, created_at DESC);
-- Result: 3 structures updated per INSERT vs 6 — 50% write reduction

-- ── Measure index bloat on existing tables (PostgreSQL) ──────
SELECT
    schemaname,
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan   AS times_used,    -- indexes with 0 here are deadweight
    idx_tup_read,
    idx_tup_fetch
FROM   pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Any index with idx_scan = 0 is costing you writes and giving nothing back
Output
-- pg_stat_user_indexes output (example):
schemaname | tablename | indexname | index_size | times_used
------------+------------------------+------------------------+------------+-----------
public | event_log_over_indexed | idx_event_url | 284 MB | 0
public | event_log_over_indexed | idx_event_session | 201 MB | 3
public | event_log_over_indexed | idx_event_type | 18 MB | 12041
public | event_log_lean | idx_lean_user_time | 94 MB | 89234
public | event_log_lean | idx_lean_type_time | 22 MB | 41109
-- idx_event_url: 284 MB of disk space, 0 uses — pure overhead. DROP IT.
Dead Indexes Are a Hidden Tax You're Paying Right Now
Query pg_stat_user_indexes in PostgreSQL (or sys.dm_db_index_usage_stats in SQL Server) to find indexes with zero or near-zero usage. These are burning disk space, slowing every write, and contributing to table bloat. Dropping an unused index on a write-heavy table can give you an immediate, measurable throughput boost with zero risk to read performance.
Production Insight
Over-indexing a high-write table can cause replication lag — the replica falls behind because it must replay all index writes.
A team once added 5 indexes to a logging table and saw replication delay grow to 45 minutes.
Rule: treat each index as a contract — if it doesn't serve a real query, drop it.
Key Takeaway
Indexes are paid on every write — never add one without a read justification.
Use index usage stats to find and drop dead indexes.
Foreign keys always need indexes; audit logs rarely do.

Composite Indexes — Column Order Makes or Breaks Performance

A composite index spans multiple columns. The order of those columns determines which queries it can serve efficiently. This is one of the most misunderstood concepts in indexing.

The Leading Column Rule: The index can be used for queries that filter on the leading column alone, or on the leading column plus any subsequent columns. It cannot efficiently support a query that skips the leading column — that forces a full index scan or full table scan.

For example, an index on (department_id, hire_date)
  • WHERE department_id = 7 → uses index (good)
  • WHERE department_id = 7 AND hire_date > '2024-01-01' → uses index very efficiently
  • WHERE hire_date > '2024-01-01' → cannot use leading column; likely full scan

Selector First Rule: Put the column with the highest selectivity (most unique values) first. If department_id has 100 distinct values and hire_date has 10,000, then hire_date is more selective. But if queries always filter on department_id first, keep it leading.

The real-world strategy: List all your hot queries, group them by filter columns, then design composite indexes that cover multiple related queries by varying the column order. One index cannot serve all; you often need two or three well-placed indexes rather than fifteen scattered ones.

Covering indexes (using INCLUDE) can add extra columns without affecting the sort order, preserving the selectivity of the leading columns.

composite_index_order.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
-- ============================================================
-- Show how column order in a composite index affects query plans.
-- Example: an orders table with department and order_date.
-- ============================================================

CREATE TABLE orders100k (
    order_id       SERIAL PRIMARY KEY,
    department_id  INT NOT NULL,
    order_date     DATE NOT NULL,
    amount         DECIMAL(10,2),
    status         VARCHAR(20)
);

-- Insert sample data (100k rows, 10 departments, 1000 dates)
INSERT INTO orders100k (department_id, order_date, amount, status)
SELECT
    (RANDOM() * 9 + 1)::INT,
    '2025-01-01'::DATE + (RANDOM() * 365)::INT,
    (RANDOM() * 1000)::DECIMAL(10,2),
    CASE WHEN RANDOM() < 0.5 THEN 'completed' ELSE 'pending' END
FROM generate_series(1, 100000);

-- Index A: department_id first, order_date second
CREATE INDEX idx_dept_date ON orders100k (department_id, order_date);

-- Query 1: filter on leading column only → efficient
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM   orders100k
WHERE  department_id = 3;
-- Plan: Index Only Scan on idx_dept_date — excellent

-- Query 2: filter on both → still efficient
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM   orders100k
WHERE  department_id = 3
  AND  order_date BETWEEN '2025-06-01' AND '2025-06-30';
-- Plan: Index Only Scan — even better because range is narrowed

-- Query 3: filter on second column only → NOT efficient
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM   orders100k
WHERE  order_date BETWEEN '2025-06-01' AND '2025-06-30';
-- Plan: Seq Scan (reads whole table) — index cannot help because leading column is missing

-- To support Query 3, you'd need an index on (order_date) alone
CREATE INDEX idx_date ON orders100k (order_date);
Output
-- Query 1 (department_id only):
Index Only Scan using idx_dept_date on orders100k
Index Cond: (department_id = 3)
Planning Time: 0.123 ms
Execution Time: 0.456 ms
-- Query 2 (both):
Index Only Scan using idx_dept_date on orders100k
Index Cond: ((department_id = 3) AND (order_date >= '2025-06-01') AND (order_date <= '2025-06-30'))
Execution Time: 0.712 ms
-- Query 3 (order_date only):
Seq Scan on orders100k (cost=0.00..1234.56 rows=3000 width=4)
Filter: ((order_date >= '2025-06-01') AND (order_date <= '2025-06-30'))
Execution Time: 45.234 ms ← 100x slower
Think of a Composite Index Like a Phone Book
  • Leading column determines the primary sort order — queries must include it to get the index benefit.
  • Second column is sorted within each leading value — useful for refining after the first filter.
  • Skipping the leading column forces a full index scan (still reads all leaf pages) or a full table scan.
  • Design indexes around your actual query patterns, not theoretical coverage — look at slow query logs.
Production Insight
A poorly ordered composite index silently wastes disk space and write overhead.
Your monitoring might show 'index used' but the scan depth is still high.
Rule: validate composite index efficiency by checking the number of index pages read (Buffers in PostgreSQL EXPLAIN).
Key Takeaway
Column order in composite indexes determines which queries benefit.
Leading column must match the first filter — skipping it kills performance.
Profile actual query patterns before designing composite indexes.
● Production incidentPOST-MORTEMseverity: high

The Missing Foreign Key Index That Took Down Checkout at Midnight

Symptom
Checkout API started timing out after 30 seconds. Database CPU pinned at 100%. Queries for SELECT * FROM order_items WHERE order_id = ? took 12 seconds each, even though the table had only 2 million rows.
Assumption
The team assumed the primary key index on order_id would cover all queries on the orders table. They didn't realize that a query filtering on customer_id — a foreign key column with no index — would force a full table scan regardless of the primary key.
Root cause
The orders table had 2 million rows. The query SELECT * FROM orders WHERE customer_id = 4821 had no index on customer_id, so the database performed a sequential scan. Under normal load (5 queries/sec) this was tolerable at ~800ms. During Black Friday, the query ran 200 times per second, saturating disk I/O and causing all other queries to queue.
Fix
Added a B-Tree index on customer_id: CREATE INDEX idx_orders_customer_id ON orders (customer_id);. Query time dropped from 800ms to 2ms. CPU dropped from 100% to 15%.
Key lesson
  • Every foreign key column needs an index — queries that JOIN or filter on FK columns will otherwise scan the whole table.
  • Query performance is not linear — a single missing index under load causes system-wide degradation, not just one slow query.
  • Monitor index usage with pg_stat_user_indexes or sys.dm_db_index_usage_stats; unused indexes are a write tax, missing indexes are a read disaster.
Production debug guideFollow these symptom→action pairs to identify and fix index-related performance issues in production.4 entries
Symptom · 01
Query takes seconds on a table with millions of rows, and the execution plan shows 'Seq Scan' or 'Table Scan'.
Fix
Check the WHERE clause columns — if they are not indexed, create a B-Tree index. For multi-column filters, create a composite index with the most selective column first.
Symptom · 02
Query uses an index but still has 'Key Lookup' (clustered) for every row, adding extra page reads.
Fix
Convert the index to a covering index by including all SELECT columns using the INCLUDE clause (SQL Server) or adding them to the index (PostgreSQL).
Symptom · 03
A query with a range condition (BETWEEN, <, >) is slow even though the column is indexed.
Fix
Verify the index type. If it's a Hash index, it cannot support range scans. Drop it and create a B-Tree index.
Symptom · 04
INSERT/UPDATE performance is degrading over time on a write-heavy table with many indexes.
Fix
Query index usage stats and drop any indexes with zero or near-zero scan counts. Also consider index maintenance (rebuild/reorganize) to reduce fragmentation.
★ Index Performance Quick ReferenceUse these commands and checks when you suspect index issues in PostgreSQL or SQL Server.
Query seems slow but you don't know if it's using an index
Immediate action
Run EXPLAIN ANALYZE on the query to see the execution plan.
Commands
EXPLAIN ANALYZE SELECT ... WHERE ...;
Look for 'Index Scan', 'Index Seek' (good) vs 'Seq Scan' (bad).
Fix now
If Seq Scan appears and the table is large, add an index on the filter column(s).
High write latency on a table with many indexes+
Immediate action
Check which indexes are actually used by the application.
Commands
SELECT * FROM pg_stat_user_indexes WHERE relname = 'your_table';
Look for idx_scan = 0 — those indexes are never used.
Fix now
DROP any index with idx_scan = 0 (after confirming no periodic use).
Index size is huge but the table is moderate+
Immediate action
Check for redundant or overlapping indexes.
Commands
SELECT pg_size_pretty(pg_relation_size(indexrelid)) FROM pg_stat_user_indexes WHERE relname = 'your_table';
Compare with table size via pg_total_relation_size.
Fix now
Remove duplicate indexes and consider partial indexes to shrink size.
Feature / AspectClustered IndexNon-Clustered Index
Physical row orderingYes — data rows sorted by index key on diskNo — separate structure, data stays in heap
Count per tableExactly 1 (the table IS the index)Up to 999 (SQL Server) / effectively unlimited
Lookup cost (full row)O(log n) — row returned directlyO(log n) for index + O(1) key lookup for row
Range query speedExcellent — sequential disk readsGood, but key lookups add overhead per row
INSERT / UPDATE costHigher — rows may need physical reorderingLower per index, but adds up with many indexes
Best forPrimary key, most-queried sort columnForeign keys, filter columns, covering queries
Covering index supportN/A — always has full rowYes — INCLUDE clause eliminates key lookups
Default in MySQL InnoDBAlways the PRIMARY KEYAny additional CREATE INDEX statement

Key takeaways

1
An index trades disk space and write speed for faster reads
every index is a maintenance contract the database pays on every INSERT, UPDATE and DELETE, forever.
2
B-Tree indexes support equality AND range queries and are the safe default; Hash indexes are faster for pure equality lookups but completely blind to ranges
choosing wrong leaves queries doing full scans silently.
3
A clustered index physically reorders table rows so there can only be one per table; a non-clustered index is a separate structure that points back to the row, requiring a key lookup unless you make it a covering index with INCLUDE.
4
Composite index column order determines which queries benefit
the leading column must appear in the filter; skipping it forces a full scan.
5
Use pg_stat_user_indexes (PostgreSQL) or sys.dm_db_index_usage_stats (SQL Server) to find indexes with zero usage and drop them
they're costing you write performance and giving nothing back.

Common mistakes to avoid

4 patterns
×

Indexing a low-cardinality column alone (e.g., `CREATE INDEX ON orders (status)` where status has only 3 values)

Symptom
The query planner ignores the index because fetching 33% of the table via scattered index lookups is slower than a sequential scan.
Fix
Use a composite index with a high-cardinality leading column (customer_id, status), or use a partial index (WHERE status = 'pending') to target only the rare, frequently-queried value.
×

Wrapping an indexed column in a function inside a WHERE clause

Symptom
A query like WHERE YEAR(created_at) = 2024 or WHERE LOWER(email) = 'user@example.com' cannot use a standard B-Tree index because the function transforms the value before comparison, destroying the sorted order the index depends on.
Fix
Rewrite the predicate to leave the column bare (WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01') or create a function-based index on the expression (CREATE INDEX ON users (LOWER(email))).
×

Never running ANALYZE or UPDATE STATISTICS after bulk data loads

Symptom
Index performance depends on the query planner having accurate statistics about data distribution. After loading millions of rows, stale statistics can cause the planner to choose a full table scan even when a perfect index exists, or vice versa.
Fix
Always run ANALYZE table_name in PostgreSQL or UPDATE STATISTICS table_name in SQL Server after any significant bulk load or data deletion, and ensure your auto-analyze/auto-stats settings are tuned for the table's change rate.
×

Creating an index on a UUID column without considering writes

Symptom
Random UUIDs cause index fragmentation and page splits on B-Tree indexes because new inserts go to random pages instead of sequentially. The index becomes bloated and write performance degrades over time.
Fix
Use sequential UUIDs (UUID v7) or a separate sequential surrogate key as the indexed column. If random UUIDs are unavoidable, schedule periodic index rebuilds during low-traffic windows.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
What is the difference between a clustered and non-clustered index, and ...
Q02SENIOR
You have a query that filters on `department_id` and `hire_date`. How do...
Q03SENIOR
A colleague says 'indexes make queries faster, so let's index every colu...
Q04SENIOR
Explain how a covering index eliminates key lookups and why that matters...
Q01 of 04SENIOR

What is the difference between a clustered and non-clustered index, and why can a table have only one clustered index?

ANSWER
A clustered index physically reorders the data rows on disk to match the index order. Because the data itself is the index's leaf level, there can only be one such ordering per table. Non-clustered indexes are separate structures that store copies of the indexed columns plus a pointer to the actual row (either the clustered index key or a row identifier). They can support multiple lookups from different column combinations, but each lookup may require a key lookup step.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
Does adding an index always make queries faster?
02
What is a covering index and why does it matter?
03
How do I find out which indexes my database is actually using?
04
What is the difference between a full index scan and an index seek?
🔥

That's DBMS. Mark it forged?

6 min read · try the examples if you haven't

Previous
Concurrency Control in DBMS
6 / 11 · DBMS
Next
Transactions in DBMS