Senior 11 min · March 05, 2026

MongoDB $in Query COLLSCAN — Non-Selective Index Gotcha

A $in query on status caused 30-second response times due to COLLSCAN—single-field index was non-selective.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • MongoDB indexes are B-tree data structures that map indexed field values to document locations
  • A single-field index sorts values and enables fast lookup, range queries, and sort operations
  • Compound indexes support queries on multiple fields; the order of fields must follow the ESR rule (Equality, Sort, Range)
  • explain() reveals whether a query uses an index (IXSCAN) or scans the entire collection (COLLSCAN)
  • Performance insight: index lookup is O(log n) vs COLLSCAN's O(n) — on 10M documents, that's ~25 reads vs 10M reads
  • Production insight: wrong index order can make queries slower than no index because MongoDB may still read many documents to filter unindexed fields
  • Covered queries skip document fetch — fast reads, great for high-frequency paths
Plain-English First

Imagine a 1,000-page cookbook with no table of contents. Every time you want a pasta recipe, you flip every single page until you find one. An index is that table of contents — it tells MongoDB exactly which page (document) to turn to without reading every other page first. The difference between a query taking 50ms and 50 seconds is almost always whether the right index exists.

MongoDB can hold hundreds of millions of documents and still answer a query in single-digit milliseconds — but only if you've built the right indexes. Without them, every query triggers a COLLSCAN (collection scan), meaning MongoDB reads every document in the collection sequentially. At scale, that's the difference between a snappy user experience and a timed-out API call that wakes your on-call engineer at 3 AM.

The problem indexes solve is deceptively simple: avoid reading data you don't need. But the implementation details are where engineers get into trouble. The wrong index order in a compound index, a missing index on a sort field, or a wildcard index used carelessly can make performance worse than no index at all — because now MongoDB has to maintain extra data structures on every write.

By the end of this article you'll understand exactly how MongoDB's B-tree indexes are structured on disk, how to construct compound indexes using the ESR rule, how to read an explain() plan like a senior engineer, and the production gotchas that trip up teams who learned indexing from a five-minute tutorial. The silent killer: a missing index on a sort field can turn a 10ms query into a 10-second blocking sort. Don't let that be you.

What is MongoDB Indexing?

MongoDB Indexing is a core concept in Database. Rather than starting with a dry definition, let's see it in action and understand why it exists.

Indexes in MongoDB are special data structures that store a small subset of the collection's data in an easy-to-traverse form. They point to the location of documents so the query engine can find data without scanning every document. Think of them as a book's index — you don't read every page to find a topic, you flip to the index and then jump directly to the relevant pages.

In production, forgetting an index on a frequently queried field is the #1 cause of performance degradation. The worst part? It's silent until your collection hits a few million documents and your API latency spikes from 50ms to 30s. Indexing isn't optional — it's the difference between your database feeling like a spreadsheet and feeling like a proper DBMS.

Here's the thing: a single COLLSCAN on a 10M document collection can take 10 seconds even on fast hardware. That same query with an index runs in under 10ms. But indexes cost you on writes — each insert or update must update every index. Over-indexing is a real problem, especially in high-throughput systems. We'll cover the trade-offs later.

ForgeExample.javaJAVA
1
2
3
4
5
6
7
8
9
10
package io.thecodeforge;

// TheCodeForge — MongoDB Indexing example
// Always use meaningful names, not x or n
public class ForgeExample {
    public static void main(String[] args) {
        String topic = "MongoDB Indexing";
        System.out.println("Learning: " + topic + " 🔥");
    }
}
Output
Learning: MongoDB Indexing 🔥
Forge Tip:
Type this code yourself rather than copy-pasting. The muscle memory of writing it will help it stick.
Production Insight
COLLSCAN on large collections is the top cause of unpredictable query latency.
Indexes add write overhead: each index adds O(log n) per write.
Rule: always profile queries before deciding whether to index.
On SSDs, a COLLSCAN of 50M documents averages ~8 seconds; with an index, the same query takes ~5ms.
But beware: over-indexing a collection with 100k writes/sec can degrade write throughput by 30% or more.
Key Takeaway
Indexes trade write speed for read speed.
Know your workload ratio before deciding how many indexes to create.
Always measure before indexing.
COLLSCAN = guaranteed performance problem at scale.

How B-Tree Indexes Work Internally

MongoDB indexes use a B-tree data structure — the same family as MySQL and PostgreSQL indexes. The B-tree organises keys in sorted order and uses internal nodes to speed up search, insert, and delete. A typical index entry stores the indexed field value(s) and a pointer to the document's location (BSON object ID or disk location).

The branching factor (number of children per node) is typically around 1000 for modern MongoDB versions. This keeps tree depth low: a 10-million document index is only 3-4 levels deep. Searching a B-tree is O(log n) — for 10M documents, that's about 25 node accesses compared to a full collection scan of 10M documents.

MongoDB stores indexes in wiredTiger's B-tree engine by default. Each index is a separate B-tree that must be maintained on every write. That's why over-indexing hurts write performance.

Here's the thing many engineers miss: the B-tree is also used for sorting. If your sort field matches the index order, MongoDB can traverse the index leaf nodes in order and avoid an in-memory sort. That's where the ESR rule becomes critical.

Additionally, B-tree indexes are stored as multikey structures when indexing arrays. Each array element becomes a separate index entry, which can cause index explosion. If you have an array with 1000 elements, each document will create 1000 index entries for that one index.

Another subtlety: B-tree page splits happen when a node overflows. On high-write systems with monotonic keys, constant right-side splits can cause fragmentation and wasted space. That's one reason hashed indexes help on shard keys.

createIndex.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
// Create a single-field B-tree index
// The index will sort by 'status' ascending
db.orders.createIndex(
  { status: 1 },
  { name: "idx_status_asc" }
);

// MongoDB automatically builds the B-tree
// You can verify index creation with:
// db.orders.getIndexes()
B-tree Mental Model
  • The root node is the directory: it tells you which drawer (child node) to open.
  • Each internal node narrows the search range by comparing the query value with the keys stored in the node.
  • Leaf nodes contain the actual index entries (key + doc pointer) and are linked together for efficient range scans.
  • The tree is self-balancing — every leaf is at the same depth, ensuring consistent performance.
Production Insight
B-tree height grows logarithmically — adding 10x more documents adds only one more level.
However, index size consumes RAM — if your working set doesn't fit in memory, MongoDB will page and performance drops.
Rule: ensure the hot subset of your index fits in the WiredTiger cache (default 50% of RAM - 1GB).
Also, multikey indexes on arrays can blow up index size — monitor indexSize regularly.
The branching factor of ~1000 means a 1B document index fits in 3 levels – all B-tree operations are essentially constant time at scale.
Page splits due to monotonic inserts can increase index size by up to 20% over time.
Key Takeaway
B-tree indexes give O(log n) read performance.
They add O(log n) write overhead per index per write.
Always measure: is the read gain worth the write cost?
B-tree depth stays nearly constant as data grows – that's the magic.

Compound Indexes and the ESR Rule

A compound index is an index on multiple fields. The order of fields in a compound index is critical because MongoDB can only use the index to support queries that filter prefixes of the indexed fields. For example, an index on {a:1, b:1} can support queries on {a} and {a,b} but not on {b} alone.

The ESR rule — Equality, Sort, Range — is the mental model senior engineers use to design compound indexes: - Fields that are compared by equality (=) must come first. - Next comes the field used for sorting (sort). - Last come fields used for range queries ($gt, $lt, $in, etc.).

ESR ensures that MongoDB can use the index to satisfy both the filter and the sort in one pass, avoiding an in-memory sort. Violating ESR usually forces a blocking sort stage, which can consume huge amounts of memory and slow queries.

In practice, I've seen production outages caused by a compound index with the sort field at the beginning. The query would use the index for the equality match, but then MongoDB had to sort 100k documents in memory, hitting the 32MB limit and throwing an error. The fix? Move the range field last and add the sort field as the second key.

But ESR isn't always perfect. When the sort field differs from the range field, you have a dilemma: you can satisfy either the sort with an index scan or the range with an index scan, but not both in one pass. MongoDB will choose an index based on selectivity. In such cases, consider two indexes: one for the filter and one for the sort, or use a covered query if possible.

A common mistake: the sort direction must match the index direction. If the index is {createdAt: 1} but the query sorts by {createdAt: -1}, MongoDB can still use the index but will traverse it backwards. That's fine for single-field indexes, but for compound indexes the direction of each key matters. MongoDB can reverse the entire index order, but not individual fields. So {a: 1, b: -1} can support sort {a: -1, b: 1} but not {a: 1, b: 1}.

compoundIndexExample.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// Query: find active orders created after Jan 1, sorted by creation date
// Expected indexes candidates:
// 1. { status: 1, createdAt: 1 } — E(field) then R/S
// 2. { createdAt: 1, status: 1 } — would require a sort in memory

db.orders.createIndex(
  { status: 1, createdAt: 1 },
  { name: "idx_status_createdAt" }
);

// Test the query
db.orders.find(
  { status: "active", createdAt: { $gte: ISODate("2026-01-01") } }
).sort({ createdAt: 1 }).explain("executionStats");
// Expected: IXSCAN on compound index, no SORT stage
Common ESR Mistake
Putting the range field before the sort field forces an in-memory sort. Even if the index is used, MongoDB may have to sort the results in memory, which can be slower than a COLLSCAN for large result sets. Also, sort direction must align with the index direction.
Production Insight
A compound index with wrong field order can still be used but may trigger an in-memory sort.
On large datasets, an in-memory sort can exceed the 32MB memory limit and throw an error.
Rule: always check explain for a 'SORT' stage after a 'FETCH' stage.
When sort and range fields differ, MongoDB may need to do a blocking sort — consider an index that covers the sort alone if range is very selective.
A common production failure: index on {status, createdAt} but query sorts by createdAt DESC – index direction must match sort direction.
Also, using $in with many values can cause the index to be used but still scan many entries – treat $in as a range, put it last.
Key Takeaway
ESR rule: Equality first, then Sort, then Range.
Wrong order = in-memory sort + potential OOM.
Check explain for SORT stage — it's your early warning.
Direction matters: {createdAt:1} vs {createdAt:-1} are different indexes.
Deciding Compound Index Field Order
IfQuery uses equality on fields A and B, range on C, sort on C (same as range)
UseIndex on {A:1, B:1, C:1} — equality fields first, then the range/sort field.
IfQuery uses equality on A, sort on B, no range
UseIndex on {A:1, B:1} — equality then sort. If the sort is descending, index on {A:1, B:-1}.
IfQuery uses equality on A, range on B, sort on C (different from range)
UseIndex on {A:1, C:1, B:1} — equality, then sort, then range. The range field last, even though it's not sorted.

Reading Explain Plans Like a Senior Engineer

When a query is slow, the explain() command is your first diagnostic tool. Run db.collection.find(...).explain('executionStats'). The output has three key sections: - queryPlanner: the plan MongoDB chose, including the winning index. - executionStats: actual execution metrics — nReturned, totalDocsExamined, totalKeysExamined, executionTimeMillis. - rejectedPlans: why other indexes were not chosen.

The key ratio is totalKeysExamined / nReturned. If it's close to 1, the index is highly selective. Ratios above 10 indicate the query is scanning many index entries for each returned document — consider adding more fields to the index.

Also check for stage values: IXSCAN (index scan), COLLSCAN (collection scan), SORT (in-memory sort), FETCH (retrieving documents after index lookup). A SORT stage combined with IXSCAN means the index is not covering the sort.

Senior engineers don't just look at executionTimeMillis. They look at the ratio of keys examined to documents returned, and they look at the number of documents fetched. If totalDocsExamined equals nReturned but totalKeysExamined is 10x higher, you have a selectivity problem, not a missing index.

Another hidden metric: totalDocsExamined. If that number is high, it means MongoDB is fetching many documents from disk even after the index. That happens when the index doesn't cover the projected fields. Consider a covered query.

And remember: the rejectedPlans array is gold. It shows why the optimizer rejected other index candidates. If you see a plan that looks better but wasn't chosen, you may need to hint or rewrite the query.

explainAnalysis.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// Run explain on a slow query
db.orders.find(
  { status: "active", total: { $gt: 100 } }
).sort({ createdAt: -1 }).explain("executionStats");

// Example output snippet (in real response, check each field)
// "stage": "IXSCAN",
// "nReturned": 5000,
// "totalKeysExamined": 15000,
// "totalDocsExamined": 5000,
// "executionTimeMillis": 120,
// "rejectedPlans": []

// If totalKeysExamined is much higher than nReturned, the index is not selective enough.
// If there is a "SORT" stage, the index doesn't cover the sort order.
Production Tip
Use db.collection.explain('executionStats').find(...) to get execution stats without running the query twice. But be careful — in production, running executionStats on a heavy query may consume resources. For read-heavy workloads, consider using db.collection.explain('queryPlanner') first.
Production Insight
A winning plan from queryPlanner does not guarantee good performance — always check executionStats.
The number of documents examined vs returned is the single most important metric.
If totalDocsExamined >> nReturned, your query is fetching documents that don't match the filter — add index fields to cover the filter.
Also, watch for 'FETCH' stage without 'IXSCAN' — that's a COLLSCAN with a misleading plan.
The 'rejectedPlans' array tells you why other indexes weren't chosen – use it to learn what MongoDB considers optimal.
In sharded clusters, explain shows per-shard stats — look for skewed execution times that indicate bad shard keys.
Key Takeaway
Explain is your scalpel — don't guess, measure.
Focus on totalKeysExamined / nReturned ratio.
A SORT stage means your index is not covering the sort.
Explain('allPlansExecution') shows detailed plan cost.

Common Index Types and When to Use Them

MongoDB offers several index types beyond the basic single-field and compound indexes: - Multikey Index: automatically created when indexing an array field. Each array element gets an index entry. Useful for querying tags, categories. - Text Index: for full-text search on string fields. Supports stemming and stop words. Only one text index per collection. - Hashed Index: uses a hash of the field value to provide uniform distribution. Good for equality queries only, not range. - Wildcard Index: indexes all fields in a document (or a subset) without specifying field names. Useful for schemaless data but large and write-heavy. - 2dsphere Index: for geospatial queries on GeoJSON objects. Supports $near, $geoWithin, $geoIntersects.

Each type has trade-offs. For example, wildcard indexes consume significant memory and CPU on writes. Multikey indexes can cause index explosion if array elements are many.

I've seen teams create a wildcard index on a collection they used for logging — and then wonder why write throughput dropped by 80%. Wildcard indexes are not a shortcut; they're a last resort when query patterns are truly unpredictable. For most production workloads, explicit indexes on known query fields are always better.

Text indexes are particularly tricky: they support stemming and stop words in many languages, but they're not as fast as Elasticsearch. If you need full-text search at scale, consider using a dedicated search engine instead.

Another subtlety: TTL indexes automatically expire documents after a specified time. They're built on a single date field and run a background thread every 60 seconds. Great for session data, logs, or temporary records. But don't rely on them for precise expiry — the deletion can be delayed up to 60 seconds.

indexTypes.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// Multikey index on tags array
db.articles.createIndex({ tags: 1 });

// Text index for full-text search
db.articles.createIndex(
  { title: "text", body: "text" },
  { weights: { title: 10, body: 5 } }
);

// Hashed index for shard key
db.users.createIndex({ email: "hashed" });

// Wildcard index on all fields
db.analytics.createIndex({ "$**": 1 });

// 2dsphere index on location field
db.places.createIndex({ location: "2dsphere" });

// TTL index on session expiry
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 });
Wildcard Index Pitfall
Creating a wildcard index on a collection with many fields and frequent writes can cause significant write slowdown and memory pressure. Prefer explicit indexes on known query fields instead. Also, wildcard indexes cannot be used for sort or range queries effectively.
Production Insight
Multikey indexes cannot be used as the leading field of a compound index if the array is on the first field.
Text indexes are not as fast as database-specific search engines (e.g., Elasticsearch) for heavy text workloads.
Hashed indexes break range queries — they only support equality.
Wildcard indexes grow with every new field in your documents — watch index size.
Text indexes have a word limit of ~32,000 tokens per document for indexing – beyond that, tokens are ignored silently.
TTL indexes run on a background thread; if the primary is under heavy write load, expiry may lag.
Key Takeaway
Choose index type based on query pattern, not convenience.
Wildcard indexes are a tax on write performance.
Know the limitations before you commit.
Text indexes are slow for real-time search – use Elasticsearch for high-volume text queries.

Production Gotchas: Over-Indexing, Covered Queries, and Sparse Indexes

Even well-designed indexes can cause problems at scale. Three common gotchas:

Over-indexing: More indexes means slower writes because each index B-tree must be updated. MongoDB uses an internal write lock per collection, but index maintenance adds CPU and memory pressure. A typical rule of thumb: no more than 5 indexes per collection for high-write workloads.

Covered queries: A query is 'covered' when all the fields it needs are in the index and MongoDB doesn't need to fetch documents from the collection. Covered queries are blazing fast because they only touch the index B-tree. To achieve coverage, include projected fields in the index (but be aware of the 1024-byte limit on index keys — MongoDB indexes can only store values up to 1024 bytes; larger values are skipped).

Sparse indexes: A sparse index only includes documents where the indexed field exists. This can save space but can cause queries to miss documents that don't have the field. For example, if you have a sparse index on email, a query for {email: null} will return no results.

I once debugged a production issue where a sparse index on an optional field caused all queries filtering on that field to miss documents that simply didn't have the field. The team assumed the index covered everything, but it only covered documents with the field set. The fix was to remove the sparse option and let the index handle nulls explicitly.

Another hidden gotcha: indexes on large fields (>1024 bytes) are silently ignored — MongoDB stores only the first 1024 bytes, making them useless for equality matching on long strings or binary data.

Also, collation can break index usage. If your index is created with a specific collation (e.g., case-insensitive), but the query doesn't specify the same collation, MongoDB may not use the index. Always match collation settings.

coveredQuery.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
// Create an index covering the filter and projection
db.orders.createIndex(
  { status: 1, createdAt: -1, total: 1 }
);

// Query that only needs fields in the index
// This is a covered query — no FETCH stage
db.orders.find(
  { status: "active" },
  { status: 1, createdAt: 1, total: 1, _id: 0 }
).sort({ createdAt: -1 }).explain();
// Expected: stage: "PROJECTION_COVERED" or no FETCH stage
Covered Query Mental Model
  • If all the fields your query needs are in the index, MongoDB never touches the collection.
  • Check the explain output for a 'FETCH' stage. If it's missing, the query is covered.
  • Covered queries are the fastest possible read path in MongoDB.
  • Trade-off: adding more fields to an index makes it larger and slower to insert.
Production Insight
In production, covered queries can be 10-100x faster than non-covered queries.
However, over-covering (adding every field to the index) leads to index bloat and write slowdown.
Rule of thumb: cover only the most frequent query patterns, not all of them.
Also, remember the 1024-byte key limit — large strings may not be indexable fully.
A covered query with an index of 3 fields can be 50x faster than a non-covered query with the same index. Profile before optimizing.
Sparse indexes: if you need to filter on existence of a field, use a partial index with { field: { $exists: true } } instead.
Key Takeaway
Covered queries skip document fetch — they're the fastest reads.
But don't over-cover: every extra field in the index slows writes.
Sparse indexes are dangerous: they hide missing data.
Covered queries are the fastest reads but cost write performance – target only hot queries.

Index Maintenance and Monitoring in Production

Indexes aren't a 'set and forget' concern. Over time, query patterns change, data grows, and old indexes become dead weight. MongoDB provides $indexStats to see index usage: how many times each index was used, and how long it took to build.

Every quarter, I run db.collection.aggregate([{$indexStats:{}}]) and look for indexes with zero access in the last 30 days. Those are candidates for removal. Dropping unused indexes is the easiest performance win — it reduces write overhead and frees up RAM.

Another common maintenance task is rebuilding indexes after major data changes. In MongoDB, you can rebuild an index with db.collection.reIndex() but be aware: this blocks all reads and writes on the collection. For production, use createIndex() with a background build option (though in recent versions this is non-blocking by default for new builds).

Also monitor index size vs available RAM. If the hot index keys don't fit in memory, MongoDB will page from disk, and your query latency will spike. Use db.collection.stats() and compare indexSizes with the WiredTiger cache size.

For large indexes, consider using createIndexes() with multiple indexes in one command to reduce round trips. Also, index builds can be resource-intensive — schedule them during low-traffic windows.

Additionally, use the MongoDB profiler to capture slow queries. db.setProfilingLevel(1, { slowms: 100 }) logs all queries over 100ms to the system.profile collection. That's your early warning system for missing or misused indexes.

indexMaintenance.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// Check index usage stats
db.orders.aggregate([{$indexStats:{}}]);

// Example output:
// { "name": "idx_status", "accesses": 54321, "since": ISODate(...) }
// { "name": "idx_createdAt", "accesses": 0, "since": ISODate(...) }

// Drop unused index
db.orders.dropIndex("idx_createdAt");

// Check index sizes
db.orders.stats().indexSizes;
// Compare with serverStatus().wiredTiger.cache

// Enable profiler for slow queries
db.setProfilingLevel(1, { slowms: 100 });
Index Build Blocking
Building indexes in production can block reads and writes. Use createIndex() with {background: true} (default in 4.2+ for builds but still has some impact). For large collections, consider rolling index builds with replica sets. Also, index builds can cause replication lag, so monitor secondary replicas.
Production Insight
Unused indexes are silent write-performance killers.
Monitor access counts monthly, drop any index with zero usage for 30+ days.
Index size should fit in WiredTiger cache — otherwise paging kills performance.
Also, index builds on large collections can cause replication lag — monitor secondary replicas.
Index builds on large collections can cause replication lag that trips replica set elections – schedule during low traffic.
The profiler is your friend: use it to catch queries that miss indexes before they cause incidents.
Key Takeaway
Indexes need garbage collection too.
$indexStats tells you which indexes are dead weight.
Drop them — your write path will thank you.
Regular index audits prevent silent performance degradation.

Indexing Strategies for High-Throughput Systems

In high-throughput environments (e.g., real-time analytics, event logging, IoT), every millisecond counts. Here are proven strategies:

Partial Indexes: Use createIndex({ status: 1 }, { partialFilterExpression: { active: true } }) to index only documents matching a condition. This reduces index size and write overhead when only a subset of documents are queried.

TTL Indexes: For time-series data, use a TTL index on a timestamp field to automatically expire old documents. This keeps the index lean and avoids manual cleanup.

Rarely Used Indexes on Write-Heavy Collections: If you have a collection that receives 100k writes/second but only occasional reads, consider using an index only for the read-heavy secondary replica, not on the primary.

Sorted vs Unsorted Index Maintenance: If your insert patterns create keys in monotonic order (e.g., timestamps), the B-tree can become right-heavy, causing page splits and fragmentation. Use {key: 1} on a hashed field or shard key to distribute writes evenly.

I once worked on a system where a monotonically increasing _id field caused index fragmentation on a high-write collection. The fix was to use a hashed shard key and rebuild the index periodically.

Another tactic: compound indexes with a hashed prefix can spread writes across the B-tree while still supporting equality queries on the hash. For example, { hashedUserId: 1, timestamp: 1 }. The hash ensures each user's writes go to different parts of the tree.

partialIndex.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// Partial index on active orders only
db.orders.createIndex(
  { status: 1, createdAt: 1 },
  { 
    partialFilterExpression: { active: true },
    name: "idx_active_orders"
  }
);

// TTL index to expire documents after 7 days
db.events.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 604800 }
);

// Index with hashed prefix for write distribution
db.logs.createIndex(
  { hashedSource: "hashed", timestamp: 1 }
);
High-Throughput Tip
For write-heavy collections, consider using a hashed index on the leading field of your compound index to distribute writes across the B-tree. This reduces page splits and fragmentation.
Production Insight
Partial indexes can dramatically reduce write overhead on skewed data.
TTL indexes are passive — they run every 60 seconds by default, so don't rely on them for precise expiry.
Monotonically increasing keys cause B-tree page splits — use hashed indexes or sharding.
Partial indexes reduce index size by up to 90% on skewed data – e.g., indexing only active records in a users table.
But beware: partial indexes can't be used for queries that don't include the partial filter expression – test with explain.
Key Takeaway
High throughput demands lean indexes.
Partial and TTL indexes reduce maintenance overhead.
Monotonically increasing keys cause fragmentation — plan for it.
Write-heavy? Use partial indexes to reduce maintenance overhead.
Choose Index Maintenance Strategy
IfWrites are uniform, no hot spots
UseStandard compound index with ESR. Monitor index fragmentation quarterly.
IfWrites are monotonically increasing (e.g., timestamps)
UseUse a hashed index for the leading field to spread writes. Or consider time-based sharding.
IfOnly a small subset of documents are queried frequently
UseUse partial indexes to index only the active subset. Reduces size and write cost.
IfHigh write volume with occasional read requirements
UseConsider using indexes only on secondary replicas via { readPreference: 'secondary' }
● Production incidentPOST-MORTEMseverity: high

The Missing Index That Killed Search Latency

Symptom
A REST endpoint for searching orders by status took over 30 seconds to respond, causing API gateway timeouts and client-side errors.
Assumption
The team assumed that because an index existed on the 'status' field, all queries filtering by 'status' would use it.
Root cause
The query used db.orders.find({ status: { $in: ['pending', 'shipped', 'cancelled'] } }). While a single-field index on 'status' existed, the query still scanned millions of documents because the index only stored the field value — filtering by multiple values required scanning a large portion of the index tree. MongoDB decided to COLLSCAN instead because the index was not selective enough.
Fix
Created a compound index on (status, createdAt) to support both the filter and a subsequent sort. This reduced the index scan to only the matching documents and leveraged the sort order to stop early.
Key lesson
  • Always analyze query patterns before creating indexes — single-field indexes may not be sufficient for multi-value queries.
  • Use explain() with executionStats to see 'totalDocsExamined' vs 'nReturned'. If they differ significantly, the index is not covering the query well.
  • Monitor slow queries with the profiler; don't assume indexes will automatically be used.
  • Always test with production-like data volume before release.
Production debug guideSymptom-to-action guide for common MongoDB indexing problems in production6 entries
Symptom · 01
Query shows IXSCAN but still reads millions of documents
Fix
Check 'totalKeysExamined' vs 'nReturned' in explain output. If totalKeysExamined >> nReturned, the index is not selective. Consider a compound index that covers the filter fields more precisely.
Symptom · 02
Explain shows COLLSCAN even though an index exists on the queried field
Fix
Verify the index exists with db.collection.getIndexes(). Check if the query uses operators like $regex or $where that prevent index usage. Also check if the query includes a sort field that is not part of the index.
Symptom · 03
Write operations are slow on a collection with many indexes
Fix
Each index adds write overhead. Use db.collection.aggregate([ { $indexStats: {} } ]) to see index usage. Remove unused indexes. Consider whether some indexes can be merged into compound indexes.
Symptom · 04
Compound index exists but query still performs a sort in memory
Fix
The sort field must be the last field in the compound index after equality and range fields (ESR rule). Also ensure the sort direction matches the index direction. Check 'sort' stage in explain.
Symptom · 05
Index used but query returns wrong results due to sparse index
Fix
Run db.collection.find({field: {$exists: false}}).count() to find documents missing the field. Sparse indexes exclude missing fields — design accordingly.
Symptom · 06
Covered query expected but FETCH stage present
Fix
Check if all projected fields are in the index. Include them in the index key. Also ensure _id is excluded if not needed.
★ Quick MongoDB Index Debugging Cheat SheetCommands and checks for diagnosing index-related performance issues in production
Query is slow — find out if index is used
Immediate action
Run explain with executionStats
Commands
db.collection.find({status: 'active'}).explain('executionStats')
Check 'stage' field: 'IXSCAN' means index used, 'COLLSCAN' means full collection scan
Fix now
If COLLSCAN, create an index on the filter field(s) using db.collection.createIndex({status: 1})
Index looks right but query still scans many documents+
Immediate action
Check total keys examined vs documents returned
Commands
Look at 'totalKeysExamined' and 'nReturned' in explain output
If totalKeysExamined is much larger, the index is not selective enough
Fix now
Add additional fields to the compound index to narrow the scan — e.g., {status: 1, createdAt: 1}
Writes are slow, suspect too many indexes+
Immediate action
List all indexes and their usage stats
Commands
db.collection.getIndexes()
db.collection.aggregate([{$indexStats:{}}])
Fix now
Drop any indexes with zero or very low use using db.collection.dropIndex('indexName')
Index size grows unexpectedly, suspect fragmentation+
Immediate action
Check index size vs collection size
Commands
db.collection.stats().indexSizes
Compare with total data size. If index is larger than data, fragmentation may be high.
Fix now
Rebuild index during maintenance window: db.collection.reIndex() but blocks all operations.
Covered query expected but FETCH stage present+
Immediate action
Check projection includes all required fields
Commands
db.collection.find(...).projection(...).explain()
Look for stage 'FETCH' — if present, add missing fields to the index
Fix now
Update the compound index to include all projected fields
Sort stage present despite compound index+
Immediate action
Verify ESR rule order
Commands
db.collection.find(query).sort({field: direction}).explain()
Check if sort field matches the last key in the compound index
Fix now
Recreate the index with sort field after equality fields: {eqField: 1, sortField: 1}
MongoDB Index Types Comparison
Index TypeUse CaseLimitations
Single-fieldSimple equality or range queries on one fieldDoes not support multi-field queries efficiently
CompoundQueries with multiple filter fields and/or sortOrder of fields matters — wrong order can cause in-memory sort
MultikeyQuerying array fields (e.g., tags)Cannot be leading field of compound index if array is first; index explosion with large arrays
TextFull-text search on string fieldsOnly one text index per collection; not as fast as dedicated search engines
HashedShard key for data distributionOnly supports equality queries, no range or sort
WildcardSchemaless collections with unpredictable query patternsLarge index size, high write overhead, cannot support sort or range effectively
TTLAuto-expire documents based on date fieldNot precise — deletes run every 60 seconds. Can't be used with compound indexes

Key takeaways

1
You now understand what MongoDB Indexing is and why it exists
2
You've seen it working in a real runnable example
3
Practice daily
the forge only works when it's hot 🔥
4
B-tree indexes give O(log n) reads and O(log n) per-index write overhead
5
Use ESR rule for compound index field order
Equality → Sort → Range
6
Explain('executionStats') reveals selectivity via totalKeysExamined / nReturned ratio
7
Covered queries are the fastest read path
they skip document retrieval entirely
8
Monitor index usage with $indexStats and drop unused indexes to save write performance
9
Sparse indexes can silently exclude missing fields
use with caution
10
Over-indexing is the #1 cause of write performance degradation in MongoDB
11
Partial indexes reduce overhead on skewed datasets
12
Always profile queries before deciding index strategy
13
Covered queries are the fastest reads
make them your default for high-frequency paths.
14
Regular index audits prevent silent performance degradation.
15
Collation mismatches are a subtle but common cause of index not used.
16
Hashed indexes distribute writes evenly and help avoid fragmentation.

Common mistakes to avoid

10 patterns
×

Memorising syntax before understanding the concept

Symptom
You can write createIndex() but can't explain why the index improves performance or why a compound index fails.
Fix
Practice with explain() on real queries. Understand the B-tree structure and impact on execution.
×

Skipping practice and only reading theory

Symptom
You know the ESR rule but still create indexes that cause in-memory sorts.
Fix
Set up a local MongoDB instance with 1M records, create different indexes, and check explain() results.
×

Creating indexes on every field

Symptom
Write throughput drops significantly; inserts and updates slow down.
Fix
Only index fields used in queries. Monitor index usage with $indexStats and drop unused indexes.
×

Using wildcard indexes as a default

Symptom
Large index size, high memory usage, and slow writes.
Fix
Use wildcard indexes only for schemaless collections where query patterns are unpredictable. Otherwise, create specific indexes.
×

Not monitoring index usage

Symptom
Unused indexes consume RAM and slow writes without any benefit.
Fix
Run db.collection.aggregate([{ $indexStats: {} }]) periodically and drop indexes with zero usage for 30+ days.
×

Assuming a compound index covers all queries regardless of field order

Symptom
A compound index exists but the query still does a COLLSCAN or in-memory sort because the query doesn't use a prefix of the index.
Fix
Always check if the query filter starts with the first field of the index. Reorder index fields to match the most common query pattern.
×

Ignoring the 1024-byte key limit

Symptom
Equality queries on long strings or binary data return incorrect results.
Fix
If your indexed field can exceed 1024 bytes, consider a hashed index or store a shorter derivative (e.g., a hash) in a separate field.
×

Overusing sparse indexes without understanding null semantics

Symptom
Queries filtering on the indexed field with $exists or null return unexpected results.
Fix
Avoid sparse indexes unless you fully understand the query patterns. Use a partial index with an explicit filter instead.
×

Not covering sort direction in compound index

Symptom
Query sorts in descending order but index is ascending – MongoDB still uses index but needs in-memory sort for large result sets.
Fix
Create index with matching sort direction: {field1: 1, field2: -1}.
×

Using $in with index but still seeing high keysExamined

Symptom
Index used but query still slow because $in scans many index entries.
Fix
Treat $in as a range query. Put it last in the compound index after equality and sort fields.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How does a B-tree index work in MongoDB? Explain with respect to read an...
Q02SENIOR
Explain the ESR rule for compound indexes. Give an example where violati...
Q03SENIOR
How do you debug a slow query that uses an index but still scans many do...
Q04SENIOR
What is a covered query and how do you achieve one in MongoDB?
Q05SENIOR
You have a compound index on {status: 1, createdAt: -1}. Queries with fi...
Q06SENIOR
How do you handle index fragmentation in MongoDB?
Q07SENIOR
When would you choose a partial index over a regular compound index?
Q08SENIOR
How do you handle index fragmentation in a high-write environment?
Q09SENIOR
What is the difference between a covered query and a non-covered query i...
Q10SENIOR
How does collation affect index usage in MongoDB?
Q01 of 10SENIOR

How does a B-tree index work in MongoDB? Explain with respect to read and write performance.

ANSWER
A B-tree index is a self-balancing tree where each node can have multiple children. In MongoDB, the index is stored separately from the collection. For reads, the index guides the query to the leaf nodes containing the key and document pointer, enabling O(log n) search. For writes, each insert or update must update all relevant B-trees (one per index), which adds O(log n) cost per index. This is why over-indexing hurts write performance. The WiredTiger storage engine uses a B-tree variant for indexes.
FAQ · 12 QUESTIONS

Frequently Asked Questions

01
What is MongoDB Indexing in simple terms?
02
Why does the order of fields in a compound index matter?
03
How many indexes should I have on a collection?
04
What is the difference between IXSCAN and COLLSCAN in explain?
05
Can I use a wildcard index with MongoDB?
06
What is a covered query and how do I know if my query is covered?
07
How do I find unused indexes in my MongoDB collection?
08
What is a partial filter expression in MongoDB indexes?
09
How do I handle the 1024-byte key limit in MongoDB indexes?
10
What happens if an index key exceeds 1024 bytes?
11
Why does MongoDB choose a COLLSCAN even when an index exists?
12
How does collation affect index usage?
🔥

That's NoSQL. Mark it forged?

11 min read · try the examples if you haven't

Previous
MongoDB Aggregation Pipeline
5 / 15 · NoSQL
Next
Redis Basics