Skip to content
Home Database SQL Indexes: The Ultimate Guide to Query Optimization

SQL Indexes: The Ultimate Guide to Query Optimization

Where developers are forged. · Structured learning · Free forever.
📍 Part of: SQL Advanced → Topic 1 of 16
Master SQL indexes: B-tree vs hash internals, clustered vs non-clustered differences, composite index rules, and how to avoid the performance traps of over-indexing.
⚙️ Intermediate — basic Database knowledge assumed
In this tutorial, you'll learn
Master SQL indexes: B-tree vs hash internals, clustered vs non-clustered differences, composite index rules, and how to avoid the performance traps of over-indexing.
  • B-tree indexes transform $O(n)$ full table scans into $O(\log n)$ logarithmic lookups.
  • The Leftmost Prefix rule is non-negotiable for composite indexes—order your columns by search frequency.
  • Covering indexes are a 'cheat code'—they allow the database to answer queries without ever touching the main table file.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer

An index is a separate data structure (usually a B-tree) that stores a sorted copy of one or more columns with pointers to the actual rows. This lets the database find rows matching a WHERE clause in $O(\log n)$ instead of scanning every row $O(n)$. Indexes speed up reads but slow down writes — every INSERT, UPDATE, and DELETE must update the index too.

How a B-Tree Index Works

Most relational databases (PostgreSQL, MySQL, SQL Server) use a B-tree (Balanced Tree) structure for indexing. A B-tree keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.

When you query a column, the database starts at the root node and follows pointers down to the leaf nodes. Each leaf node contains the indexed value plus a 'Record Identifier' (RID) or a Primary Key value that points to the actual row data in the 'Heap' or Clustered Index.

Example · SQL
123456789101112
-- io.thecodeforge.optimization
-- Scenario: Searching for a customer without an index
SELECT * FROM io_thecodeforge.orders WHERE customer_id = 1234;
-- Result: The database performs a 'Sequential Scan' (reads 100% of the table).

-- Optimization: Create a standard B-tree index
CREATE INDEX idx_orders_customer 
ON io_thecodeforge.orders(customer_id);

-- Now the query engine performs an 'Index Scan'
EXPLAIN ANALYZE 
SELECT * FROM io_thecodeforge.orders WHERE customer_id = 1234;
▶ Output
Index Scan using idx_orders_customer on orders (cost=0.42..8.44 rows=1 width=120)

Composite Indexes and the Leftmost Prefix Rule

A composite index (multi-column index) is sorted primarily by the first column, then the second, and so on. This leads to the Leftmost Prefix Rule: an index on $(A, B)$ can be used for queries filtering by $(A)$ or $(A, B)$, but it is completely useless for a query filtering only by $(B)$.

Choosing the right column order is vital. Generally, you put the column with the highest cardinality (most unique values) first, or the column most frequently used in WHERE clauses.

Example · SQL
123456789101112131415161718
-- io.thecodeforge.optimization
-- Composite index on (last_name, first_name)
CREATE INDEX idx_user_full_name 
ON io_thecodeforge.users(last_name, first_name);

-- VALID: Uses index (leftmost prefix)
SELECT * FROM io_thecodeforge.users WHERE last_name = 'Jordan';

-- VALID: Uses index (exact match)
SELECT * FROM io_thecodeforge.users WHERE last_name = 'Jordan' AND first_name = 'Michael';

-- INVALID: Full Table Scan (skips the first column of the index)
SELECT * FROM io_thecodeforge.users WHERE first_name = 'Michael';

-- PRO TIP: Covering Index
-- This index contains ALL data needed, avoiding a 'Table Lookup' entirely
CREATE INDEX idx_covering_search 
ON io_thecodeforge.users(email) INCLUDE (user_id, status);
▶ Output
-- Index-only scan performed; heap access avoided.

When Indexes Hurt: Write Amplification

Every time you INSERT a row into a table with 5 indexes, the database must perform 6 writes (1 to the table, 5 to the indexes). This is known as Write Amplification. Furthermore, indexes can become 'fragmented' over time as rows are deleted and updated, leading to bloated files and degraded performance.

Example · SQL
12345678910111213
-- io.thecodeforge.maintenance
-- 1. Identifying unused indexes (PostgreSQL example)
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND idx_read = 0;
-- Action: Drop these to reclaim write performance.

-- 2. Avoiding 'Functional' index traps
-- This ignores a standard index on 'created_at'
SELECT * FROM io_thecodeforge.orders WHERE DATE(created_at) = '2024-01-01';

-- Correct approach: Use a range or a functional index
CREATE INDEX idx_functional_date ON io_thecodeforge.orders ((DATE(created_at)));
▶ Output
-- Query planner now recognizes the functional transformation.
FeatureClustered IndexNon-Clustered Index
Data StorageStores actual row data in the leaf nodesStores pointers (RID/PK) to the data
LimitOnly 1 per table (the physical order)Multiple per table (logical order)
Best ForRange searches and Primary Key lookupsFrequent filtering on non-PK columns
MaintenanceHigh overhead when updating the PKLower overhead but requires a 'lookup' step

🎯 Key Takeaways

  • B-tree indexes transform $O(n)$ full table scans into $O(\log n)$ logarithmic lookups.
  • The Leftmost Prefix rule is non-negotiable for composite indexes—order your columns by search frequency.
  • Covering indexes are a 'cheat code'—they allow the database to answer queries without ever touching the main table file.
  • Always check your execution plans (EXPLAIN) to ensure your indexes are actually being utilized by the optimizer.
  • Balance is key: index for your slowest, most frequent queries, but keep an eye on write latency.

⚠ Common Mistakes to Avoid

    Indexing columns with low cardinality (like 'Gender' or 'Is_Active')—the optimizer will usually ignore these and perform a full scan anyway.
    Applying functions to indexed columns in the WHERE clause (e.g., WHERE YEAR(date_col) = 2023), which prevents the index from being used.
    Ignoring the column order in composite indexes, leading to 'phantom' indexes that the database never uses.
    Over-indexing tables that have high write volume, causing severe latency in transaction processing (OLTP).
    Forgetting to index Foreign Keys, which leads to slow JOIN operations and table locks.

Interview Questions on This Topic

  • QExplain the 'Leftmost Prefix' rule. Can an index on (A, B, C) be used for a query on (A, C)?
  • QWhat is a 'Covering Index' and why is it faster than a standard index scan?
  • QLeetCode Context: You have a table with 100 million logs. A query for 'logs from the last 24 hours' is slow despite an index. What could be the cause? (Hint: Selectivity/I/O cost).
  • QWhat are the disadvantages of having too many indexes on a single table?
  • QCompare B-tree and Hash indexes. In what specific scenario would a Hash index be superior?

Frequently Asked Questions

What is the difference between a clustered and non-clustered index?

A clustered index defines the physical order in which data is stored on the disk. There can only be one clustered index per table (usually the Primary Key). A non-clustered index is a separate structure that contains a sorted list of values and pointers back to the actual data. Think of the clustered index as the actual dictionary entries (alphabetical) and the non-clustered index as the 'Index' section at the back.

Does a Primary Key automatically create an index?

Yes. In almost every RDBMS, defining a Primary Key automatically creates a unique clustered index (or a unique B-tree index) on that column to ensure uniqueness and fast lookups.

Why would the database ignore my index and do a full scan?

The Query Optimizer is cost-based. If the table is very small, or if the column has 'low selectivity' (e.g., searching for 'Male' in a table where 50% of the rows match), the optimizer decides it is faster to just read the whole table into memory than to jump back and forth between the index and the data.

What is Index Fragmentation?

As you delete or update data, the logical order in the B-tree may no longer match the physical order on disk, leaving gaps ('internal fragmentation'). This makes index scans slower. Regular maintenance like REINDEX or ALTER INDEX ... REBUILD fixes this.

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

Next →SQL Views
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged