Senior 14 min · March 05, 2026

MongoDB $in Query COLLSCAN — Non-Selective Index Gotcha

A $in query on status caused 30-second response times due to COLLSCAN—single-field index was non-selective.

N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

Follow
Production
production tested
May 23, 2026
last updated
1,554
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer
  • MongoDB indexes are B-tree data structures that map indexed field values to document locations
  • A single-field index sorts values and enables fast lookup, range queries, and sort operations
  • Compound indexes support queries on multiple fields; the order of fields must follow the ESR rule (Equality, Sort, Range)
  • explain() reveals whether a query uses an index (IXSCAN) or scans the entire collection (COLLSCAN)
  • Performance insight: index lookup is O(log n) vs COLLSCAN's O(n) — on 10M documents, that's ~25 reads vs 10M reads
  • Production insight: wrong index order can make queries slower than no index because MongoDB may still read many documents to filter unindexed fields
  • Covered queries skip document fetch — fast reads, great for high-frequency paths
✦ Definition~90s read
What is MongoDB Indexing?

MongoDB indexing is the mechanism that allows the database to locate documents without scanning every document in a collection — a full collection scan (COLLSCAN). Internally, MongoDB uses B-tree indexes, which store sorted references to documents, enabling logarithmic-time lookups.

Imagine a 1,000-page cookbook with no table of contents.

The critical gotcha with $in queries is that even with an index, if the set of values in the $in array is not selective enough (i.e., matches a large percentage of the collection), the query optimizer may decide a COLLSCAN is cheaper than bouncing through the index for each value. This happens because $in effectively becomes multiple index lookups, and if the combined result set exceeds ~10-20% of the collection, the cost of random I/O from index traversal outweighs a sequential scan.

The ESR (Equality, Sort, Range) rule governs compound index design: fields used in equality filters must come first, then sort fields, then range filters like $in. Violating this order forces the index to be non-selective for the range portion, often triggering a COLLSCAN.

Production monitoring with explain() output — specifically looking at stage: COLLSCAN and totalDocsExamined vs nReturned — is the only way to catch this. Alternatives include using $or with separate indexes, restructuring queries to use $eq with multiple queries, or leveraging covered queries where the index contains all needed fields, eliminating document fetches entirely.

Over-indexing is a common pitfall: each index adds write overhead and memory pressure, so you must balance query patterns against index maintenance costs.

Plain-English First

Imagine a 1,000-page cookbook with no table of contents. Every time you want a pasta recipe, you flip every single page until you find one. An index is that table of contents — it tells MongoDB exactly which page (document) to turn to without reading every other page first. The difference between a query taking 50ms and 50 seconds is almost always whether the right index exists.

MongoDB can hold hundreds of millions of documents and still answer a query in single-digit milliseconds — but only if you've built the right indexes. Without them, every query triggers a COLLSCAN (collection scan), meaning MongoDB reads every document in the collection sequentially. At scale, that's the difference between a snappy user experience and a timed-out API call that wakes your on-call engineer at 3 AM.

The problem indexes solve is deceptively simple: avoid reading data you don't need. But the implementation details are where engineers get into trouble. The wrong index order in a compound index, a missing index on a sort field, or a wildcard index used carelessly can make performance worse than no index at all — because now MongoDB has to maintain extra data structures on every write.

By the end of this article you'll understand exactly how MongoDB's B-tree indexes are structured on disk, how to construct compound indexes using the ESR rule, how to read an explain() plan like a senior engineer, and the production gotchas that trip up teams who learned indexing from a five-minute tutorial. The silent killer: a missing index on a sort field can turn a 10ms query into a 10-second blocking sort. Don't let that be you.

How MongoDB Indexing Actually Works — And Why It Fails

MongoDB indexing is a B-tree data structure that maps indexed field values to disk locations, enabling the query engine to locate documents without scanning every document in the collection. Without an index, every query triggers a collection scan (COLLSCAN) — an O(n) operation that reads every document. With a proper index, lookups become O(log n) via tree traversal.

An index stores a sorted reference of field values plus a pointer to the document. When a query uses the indexed field, the query planner selects the most efficient index based on cardinality, selectivity, and sort order. A compound index can cover multiple fields, but only the leftmost prefix is used for equality matches. Range queries, sorting, and $in clauses each interact with the index differently — $in, for example, can degrade to multiple index scans if the list is large or the index is non-selective.

Use indexes when a field appears frequently in query filters, sort operations, or join conditions. In production, a single missing index can cause latency spikes from 2ms to 2s as collections grow beyond memory. Indexes also consume RAM and slow writes, so you must balance read performance against write overhead. The rule: index for your most critical read paths, not every field.

Index Selectivity Matters More Than Existence
An index on a low-cardinality field (e.g., boolean 'isActive') may still cause a COLLSCAN if the query planner decides scanning is cheaper than fetching 50% of documents via the index.
Production Insight
A payment service used an index on 'status' (values: pending, completed, failed) for a $in query filtering 3 statuses. As the collection grew to 50M docs, the index became non-selective — the query planner chose COLLSCAN over index scan, causing 12-second timeouts during peak hours.
Symptom: db.currentOp() showed COLLSCAN on a query that should have used the index, with 'planSummary' field reading 'COLLSCAN' instead of 'IXSCAN'.
Rule: For $in queries, ensure the indexed field has high cardinality (>1000 distinct values) or use a compound index that includes a highly selective field.
Key Takeaway
An index only helps if it is selective enough — low-cardinality fields often cause the query planner to skip the index.
$in queries can trigger multiple index scans or a full COLLSCAN if the list size exceeds 100-200 elements or the index is non-selective.
Always verify index usage with .explain('executionStats') — never assume an index is being used just because it exists.
MongoDB $in Query COLLSCAN — Non-Selective Index Gotcha THECODEFORGE.IO MongoDB $in Query COLLSCAN — Non-Selective Index Gotcha How non-selective indexes cause full collection scans in MongoDB Query with $in on Non-Selective Field e.g., status: {$in: ['active', 'inactive']} Index Lookup Returns Many Documents Index selectivity too low; large result set Query Planner Chooses COLLSCAN Index scan cost exceeds collection scan threshold Full Collection Scan Executed Scans all documents, high latency Performance Degradation High CPU, slow response times Fix: Use Selective Index or Restructure Query Add compound index or filter on high-cardinality field ⚠ Non-selective $in queries often trigger COLLSCAN Ensure index selectivity > 90% or use covered queries THECODEFORGE.IO
thecodeforge.io
MongoDB $in Query COLLSCAN — Non-Selective Index Gotcha
Mongodb Indexing

How B-Tree Indexes Work Internally

MongoDB indexes use a B-tree data structure — the same family as MySQL and PostgreSQL indexes. The B-tree organises keys in sorted order and uses internal nodes to speed up search, insert, and delete. A typical index entry stores the indexed field value(s) and a pointer to the document's location (BSON object ID or disk location).

The branching factor (number of children per node) is typically around 1000 for modern MongoDB versions. This keeps tree depth low: a 10-million document index is only 3-4 levels deep. Searching a B-tree is O(log n) — for 10M documents, that's about 25 node accesses compared to a full collection scan of 10M documents.

MongoDB stores indexes in wiredTiger's B-tree engine by default. Each index is a separate B-tree that must be maintained on every write. That's why over-indexing hurts write performance.

Here's the thing many engineers miss: the B-tree is also used for sorting. If your sort field matches the index order, MongoDB can traverse the index leaf nodes in order and avoid an in-memory sort. That's where the ESR rule becomes critical.

Additionally, B-tree indexes are stored as multikey structures when indexing arrays. Each array element becomes a separate index entry, which can cause index explosion. If you have an array with 1000 elements, each document will create 1000 index entries for that one index.

Another subtlety: B-tree page splits happen when a node overflows. On high-write systems with monotonic keys, constant right-side splits can cause fragmentation and wasted space. That's one reason hashed indexes help on shard keys.

createIndex.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
// Create a single-field B-tree index
// The index will sort by 'status' ascending
db.orders.createIndex(
  { status: 1 },
  { name: "idx_status_asc" }
);

// MongoDB automatically builds the B-tree
// You can verify index creation with:
// db.orders.getIndexes()
B-tree Mental Model
  • The root node is the directory: it tells you which drawer (child node) to open.
  • Each internal node narrows the search range by comparing the query value with the keys stored in the node.
  • Leaf nodes contain the actual index entries (key + doc pointer) and are linked together for efficient range scans.
  • The tree is self-balancing — every leaf is at the same depth, ensuring consistent performance.
Production Insight
B-tree height grows logarithmically — adding 10x more documents adds only one more level.
However, index size consumes RAM — if your working set doesn't fit in memory, MongoDB will page and performance drops.
Rule: ensure the hot subset of your index fits in the WiredTiger cache (default 50% of RAM - 1GB).
Also, multikey indexes on arrays can blow up index size — monitor indexSize regularly.
The branching factor of ~1000 means a 1B document index fits in 3 levels – all B-tree operations are essentially constant time at scale.
Page splits due to monotonic inserts can increase index size by up to 20% over time.
Key Takeaway
B-tree indexes give O(log n) read performance.
They add O(log n) write overhead per index per write.
Always measure: is the read gain worth the write cost?
B-tree depth stays nearly constant as data grows – that's the magic.

Compound Indexes and the ESR Rule

A compound index is an index on multiple fields. The order of fields in a compound index is critical because MongoDB can only use the index to support queries that filter prefixes of the indexed fields. For example, an index on {a:1, b:1} can support queries on {a} and {a,b} but not on {b} alone.

The ESR rule — Equality, Sort, Range — is the mental model senior engineers use to design compound indexes: - Fields that are compared by equality (=) must come first. - Next comes the field used for sorting (sort). - Last come fields used for range queries ($gt, $lt, $in, etc.).

ESR ensures that MongoDB can use the index to satisfy both the filter and the sort in one pass, avoiding an in-memory sort. Violating ESR usually forces a blocking sort stage, which can consume huge amounts of memory and slow queries.

In practice, I've seen production outages caused by a compound index with the sort field at the beginning. The query would use the index for the equality match, but then MongoDB had to sort 100k documents in memory, hitting the 32MB limit and throwing an error. The fix? Move the range field last and add the sort field as the second key.

But ESR isn't always perfect. When the sort field differs from the range field, you have a dilemma: you can satisfy either the sort with an index scan or the range with an index scan, but not both in one pass. MongoDB will choose an index based on selectivity. In such cases, consider two indexes: one for the filter and one for the sort, or use a covered query if possible.

A common mistake: the sort direction must match the index direction. If the index is {createdAt: 1} but the query sorts by {createdAt: -1}, MongoDB can still use the index but will traverse it backwards. That's fine for single-field indexes, but for compound indexes the direction of each key matters. MongoDB can reverse the entire index order, but not individual fields. So {a: 1, b: -1} can support sort {a: -1, b: 1} but not {a: 1, b: 1}.

compoundIndexExample.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// Query: find active orders created after Jan 1, sorted by creation date
// Expected indexes candidates:
// 1. { status: 1, createdAt: 1 } — E(field) then R/S
// 2. { createdAt: 1, status: 1 } — would require a sort in memory

db.orders.createIndex(
  { status: 1, createdAt: 1 },
  { name: "idx_status_createdAt" }
);

// Test the query
db.orders.find(
  { status: "active", createdAt: { $gte: ISODate("2026-01-01") } }
).sort({ createdAt: 1 }).explain("executionStats");
// Expected: IXSCAN on compound index, no SORT stage
Common ESR Mistake
Putting the range field before the sort field forces an in-memory sort. Even if the index is used, MongoDB may have to sort the results in memory, which can be slower than a COLLSCAN for large result sets. Also, sort direction must align with the index direction.
Production Insight
A compound index with wrong field order can still be used but may trigger an in-memory sort.
On large datasets, an in-memory sort can exceed the 32MB memory limit and throw an error.
Rule: always check explain for a 'SORT' stage after a 'FETCH' stage.
When sort and range fields differ, MongoDB may need to do a blocking sort — consider an index that covers the sort alone if range is very selective.
A common production failure: index on {status, createdAt} but query sorts by createdAt DESC – index direction must match sort direction.
Also, using $in with many values can cause the index to be used but still scan many entries – treat $in as a range, put it last.
Key Takeaway
ESR rule: Equality first, then Sort, then Range.
Wrong order = in-memory sort + potential OOM.
Check explain for SORT stage — it's your early warning.
Direction matters: {createdAt:1} vs {createdAt:-1} are different indexes.
Deciding Compound Index Field Order
IfQuery uses equality on fields A and B, range on C, sort on C (same as range)
UseIndex on {A:1, B:1, C:1} — equality fields first, then the range/sort field.
IfQuery uses equality on A, sort on B, no range
UseIndex on {A:1, B:1} — equality then sort. If the sort is descending, index on {A:1, B:-1}.
IfQuery uses equality on A, range on B, sort on C (different from range)
UseIndex on {A:1, C:1, B:1} — equality, then sort, then range. The range field last, even though it's not sorted.

Reading Explain Plans Like a Senior Engineer

When a query is slow, the explain() command is your first diagnostic tool. Run db.collection.find(...).explain('executionStats'). The output has three key sections: - queryPlanner: the plan MongoDB chose, including the winning index. - executionStats: actual execution metrics — nReturned, totalDocsExamined, totalKeysExamined, executionTimeMillis. - rejectedPlans: why other indexes were not chosen.

The key ratio is totalKeysExamined / nReturned. If it's close to 1, the index is highly selective. Ratios above 10 indicate the query is scanning many index entries for each returned document — consider adding more fields to the index.

Also check for stage values: IXSCAN (index scan), COLLSCAN (collection scan), SORT (in-memory sort), FETCH (retrieving documents after index lookup). A SORT stage combined with IXSCAN means the index is not covering the sort.

Senior engineers don't just look at executionTimeMillis. They look at the ratio of keys examined to documents returned, and they look at the number of documents fetched. If totalDocsExamined equals nReturned but totalKeysExamined is 10x higher, you have a selectivity problem, not a missing index.

Another hidden metric: totalDocsExamined. If that number is high, it means MongoDB is fetching many documents from disk even after the index. That happens when the index doesn't cover the projected fields. Consider a covered query.

And remember: the rejectedPlans array is gold. It shows why the optimizer rejected other index candidates. If you see a plan that looks better but wasn't chosen, you may need to hint or rewrite the query.

explainAnalysis.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// Run explain on a slow query
db.orders.find(
  { status: "active", total: { $gt: 100 } }
).sort({ createdAt: -1 }).explain("executionStats");

// Example output snippet (in real response, check each field)
// "stage": "IXSCAN",
// "nReturned": 5000,
// "totalKeysExamined": 15000,
// "totalDocsExamined": 5000,
// "executionTimeMillis": 120,
// "rejectedPlans": []

// If totalKeysExamined is much higher than nReturned, the index is not selective enough.
// If there is a "SORT" stage, the index doesn't cover the sort order.
Production Tip
Use db.collection.explain('executionStats').find(...) to get execution stats without running the query twice. But be careful — in production, running executionStats on a heavy query may consume resources. For read-heavy workloads, consider using db.collection.explain('queryPlanner') first.
Production Insight
A winning plan from queryPlanner does not guarantee good performance — always check executionStats.
The number of documents examined vs returned is the single most important metric.
If totalDocsExamined >> nReturned, your query is fetching documents that don't match the filter — add index fields to cover the filter.
Also, watch for 'FETCH' stage without 'IXSCAN' — that's a COLLSCAN with a misleading plan.
The 'rejectedPlans' array tells you why other indexes weren't chosen – use it to learn what MongoDB considers optimal.
In sharded clusters, explain shows per-shard stats — look for skewed execution times that indicate bad shard keys.
Key Takeaway
Explain is your scalpel — don't guess, measure.
Focus on totalKeysExamined / nReturned ratio.
A SORT stage means your index is not covering the sort.
Explain('allPlansExecution') shows detailed plan cost.

Common Index Types and When to Use Them

MongoDB offers several index types beyond the basic single-field and compound indexes: - Multikey Index: automatically created when indexing an array field. Each array element gets an index entry. Useful for querying tags, categories. - Text Index: for full-text search on string fields. Supports stemming and stop words. Only one text index per collection. - Hashed Index: uses a hash of the field value to provide uniform distribution. Good for equality queries only, not range. - Wildcard Index: indexes all fields in a document (or a subset) without specifying field names. Useful for schemaless data but large and write-heavy. - 2dsphere Index: for geospatial queries on GeoJSON objects. Supports $near, $geoWithin, $geoIntersects.

Each type has trade-offs. For example, wildcard indexes consume significant memory and CPU on writes. Multikey indexes can cause index explosion if array elements are many.

I've seen teams create a wildcard index on a collection they used for logging — and then wonder why write throughput dropped by 80%. Wildcard indexes are not a shortcut; they're a last resort when query patterns are truly unpredictable. For most production workloads, explicit indexes on known query fields are always better.

Text indexes are particularly tricky: they support stemming and stop words in many languages, but they're not as fast as Elasticsearch. If you need full-text search at scale, consider using a dedicated search engine instead.

Another subtlety: TTL indexes automatically expire documents after a specified time. They're built on a single date field and run a background thread every 60 seconds. Great for session data, logs, or temporary records. But don't rely on them for precise expiry — the deletion can be delayed up to 60 seconds.

indexTypes.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// Multikey index on tags array
db.articles.createIndex({ tags: 1 });

// Text index for full-text search
db.articles.createIndex(
  { title: "text", body: "text" },
  { weights: { title: 10, body: 5 } }
);

// Hashed index for shard key
db.users.createIndex({ email: "hashed" });

// Wildcard index on all fields
db.analytics.createIndex({ "$**": 1 });

// 2dsphere index on location field
db.places.createIndex({ location: "2dsphere" });

// TTL index on session expiry
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 });
Wildcard Index Pitfall
Creating a wildcard index on a collection with many fields and frequent writes can cause significant write slowdown and memory pressure. Prefer explicit indexes on known query fields instead. Also, wildcard indexes cannot be used for sort or range queries effectively.
Production Insight
Multikey indexes cannot be used as the leading field of a compound index if the array is on the first field.
Text indexes are not as fast as database-specific search engines (e.g., Elasticsearch) for heavy text workloads.
Hashed indexes break range queries — they only support equality.
Wildcard indexes grow with every new field in your documents — watch index size.
Text indexes have a word limit of ~32,000 tokens per document for indexing – beyond that, tokens are ignored silently.
TTL indexes run on a background thread; if the primary is under heavy write load, expiry may lag.
Key Takeaway
Choose index type based on query pattern, not convenience.
Wildcard indexes are a tax on write performance.
Know the limitations before you commit.
Text indexes are slow for real-time search – use Elasticsearch for high-volume text queries.

Production Gotchas: Over-Indexing, Covered Queries, and Sparse Indexes

Even well-designed indexes can cause problems at scale. Three common gotchas:

Over-indexing: More indexes means slower writes because each index B-tree must be updated. MongoDB uses an internal write lock per collection, but index maintenance adds CPU and memory pressure. A typical rule of thumb: no more than 5 indexes per collection for high-write workloads.

Covered queries: A query is 'covered' when all the fields it needs are in the index and MongoDB doesn't need to fetch documents from the collection. Covered queries are blazing fast because they only touch the index B-tree. To achieve coverage, include projected fields in the index (but be aware of the 1024-byte limit on index keys — MongoDB indexes can only store values up to 1024 bytes; larger values are skipped).

Sparse indexes: A sparse index only includes documents where the indexed field exists. This can save space but can cause queries to miss documents that don't have the field. For example, if you have a sparse index on email, a query for {email: null} will return no results.

I once debugged a production issue where a sparse index on an optional field caused all queries filtering on that field to miss documents that simply didn't have the field. The team assumed the index covered everything, but it only covered documents with the field set. The fix was to remove the sparse option and let the index handle nulls explicitly.

Another hidden gotcha: indexes on large fields (>1024 bytes) are silently ignored — MongoDB stores only the first 1024 bytes, making them useless for equality matching on long strings or binary data.

Also, collation can break index usage. If your index is created with a specific collation (e.g., case-insensitive), but the query doesn't specify the same collation, MongoDB may not use the index. Always match collation settings.

coveredQuery.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
// Create an index covering the filter and projection
db.orders.createIndex(
  { status: 1, createdAt: -1, total: 1 }
);

// Query that only needs fields in the index
// This is a covered query — no FETCH stage
db.orders.find(
  { status: "active" },
  { status: 1, createdAt: 1, total: 1, _id: 0 }
).sort({ createdAt: -1 }).explain();
// Expected: stage: "PROJECTION_COVERED" or no FETCH stage
Covered Query Mental Model
  • If all the fields your query needs are in the index, MongoDB never touches the collection.
  • Check the explain output for a 'FETCH' stage. If it's missing, the query is covered.
  • Covered queries are the fastest possible read path in MongoDB.
  • Trade-off: adding more fields to an index makes it larger and slower to insert.
Production Insight
In production, covered queries can be 10-100x faster than non-covered queries.
However, over-covering (adding every field to the index) leads to index bloat and write slowdown.
Rule of thumb: cover only the most frequent query patterns, not all of them.
Also, remember the 1024-byte key limit — large strings may not be indexable fully.
A covered query with an index of 3 fields can be 50x faster than a non-covered query with the same index. Profile before optimizing.
Sparse indexes: if you need to filter on existence of a field, use a partial index with { field: { $exists: true } } instead.
Key Takeaway
Covered queries skip document fetch — they're the fastest reads.
But don't over-cover: every extra field in the index slows writes.
Sparse indexes are dangerous: they hide missing data.
Covered queries are the fastest reads but cost write performance – target only hot queries.

Index Maintenance and Monitoring in Production

Indexes aren't a 'set and forget' concern. Over time, query patterns change, data grows, and old indexes become dead weight. MongoDB provides $indexStats to see index usage: how many times each index was used, and how long it took to build.

Every quarter, I run db.collection.aggregate([{$indexStats:{}}]) and look for indexes with zero access in the last 30 days. Those are candidates for removal. Dropping unused indexes is the easiest performance win — it reduces write overhead and frees up RAM.

Another common maintenance task is rebuilding indexes after major data changes. In MongoDB, you can rebuild an index with db.collection.reIndex() but be aware: this blocks all reads and writes on the collection. For production, use createIndex() with a background build option (though in recent versions this is non-blocking by default for new builds).

Also monitor index size vs available RAM. If the hot index keys don't fit in memory, MongoDB will page from disk, and your query latency will spike. Use db.collection.stats() and compare indexSizes with the WiredTiger cache size.

For large indexes, consider using createIndexes() with multiple indexes in one command to reduce round trips. Also, index builds can be resource-intensive — schedule them during low-traffic windows.

Additionally, use the MongoDB profiler to capture slow queries. db.setProfilingLevel(1, { slowms: 100 }) logs all queries over 100ms to the system.profile collection. That's your early warning system for missing or misused indexes.

indexMaintenance.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// Check index usage stats
db.orders.aggregate([{$indexStats:{}}]);

// Example output:
// { "name": "idx_status", "accesses": 54321, "since": ISODate(...) }
// { "name": "idx_createdAt", "accesses": 0, "since": ISODate(...) }

// Drop unused index
db.orders.dropIndex("idx_createdAt");

// Check index sizes
db.orders.stats().indexSizes;
// Compare with serverStatus().wiredTiger.cache

// Enable profiler for slow queries
db.setProfilingLevel(1, { slowms: 100 });
Index Build Blocking
Building indexes in production can block reads and writes. Use createIndex() with {background: true} (default in 4.2+ for builds but still has some impact). For large collections, consider rolling index builds with replica sets. Also, index builds can cause replication lag, so monitor secondary replicas.
Production Insight
Unused indexes are silent write-performance killers.
Monitor access counts monthly, drop any index with zero usage for 30+ days.
Index size should fit in WiredTiger cache — otherwise paging kills performance.
Also, index builds on large collections can cause replication lag — monitor secondary replicas.
Index builds on large collections can cause replication lag that trips replica set elections – schedule during low traffic.
The profiler is your friend: use it to catch queries that miss indexes before they cause incidents.
Key Takeaway
Indexes need garbage collection too.
$indexStats tells you which indexes are dead weight.
Drop them — your write path will thank you.
Regular index audits prevent silent performance degradation.

Indexing Strategies for High-Throughput Systems

In high-throughput environments (e.g., real-time analytics, event logging, IoT), every millisecond counts. Here are proven strategies:

Partial Indexes: Use createIndex({ status: 1 }, { partialFilterExpression: { active: true } }) to index only documents matching a condition. This reduces index size and write overhead when only a subset of documents are queried.

TTL Indexes: For time-series data, use a TTL index on a timestamp field to automatically expire old documents. This keeps the index lean and avoids manual cleanup.

Rarely Used Indexes on Write-Heavy Collections: If you have a collection that receives 100k writes/second but only occasional reads, consider using an index only for the read-heavy secondary replica, not on the primary.

Sorted vs Unsorted Index Maintenance: If your insert patterns create keys in monotonic order (e.g., timestamps), the B-tree can become right-heavy, causing page splits and fragmentation. Use {key: 1} on a hashed field or shard key to distribute writes evenly.

I once worked on a system where a monotonically increasing _id field caused index fragmentation on a high-write collection. The fix was to use a hashed shard key and rebuild the index periodically.

Another tactic: compound indexes with a hashed prefix can spread writes across the B-tree while still supporting equality queries on the hash. For example, { hashedUserId: 1, timestamp: 1 }. The hash ensures each user's writes go to different parts of the tree.

partialIndex.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// Partial index on active orders only
db.orders.createIndex(
  { status: 1, createdAt: 1 },
  { 
    partialFilterExpression: { active: true },
    name: "idx_active_orders"
  }
);

// TTL index to expire documents after 7 days
db.events.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 604800 }
);

// Index with hashed prefix for write distribution
db.logs.createIndex(
  { hashedSource: "hashed", timestamp: 1 }
);
High-Throughput Tip
For write-heavy collections, consider using a hashed index on the leading field of your compound index to distribute writes across the B-tree. This reduces page splits and fragmentation.
Production Insight
Partial indexes can dramatically reduce write overhead on skewed data.
TTL indexes are passive — they run every 60 seconds by default, so don't rely on them for precise expiry.
Monotonically increasing keys cause B-tree page splits — use hashed indexes or sharding.
Partial indexes reduce index size by up to 90% on skewed data – e.g., indexing only active records in a users table.
But beware: partial indexes can't be used for queries that don't include the partial filter expression – test with explain.
Key Takeaway
High throughput demands lean indexes.
Partial and TTL indexes reduce maintenance overhead.
Monotonically increasing keys cause fragmentation — plan for it.
Write-heavy? Use partial indexes to reduce maintenance overhead.
Choose Index Maintenance Strategy
IfWrites are uniform, no hot spots
UseStandard compound index with ESR. Monitor index fragmentation quarterly.
IfWrites are monotonically increasing (e.g., timestamps)
UseUse a hashed index for the leading field to spread writes. Or consider time-based sharding.
IfOnly a small subset of documents are queried frequently
UseUse partial indexes to index only the active subset. Reduces size and write cost.
IfHigh write volume with occasional read requirements
UseConsider using indexes only on secondary replicas via { readPreference: 'secondary' }

Dropping Indexes Without Causing a Meltdown

Deleting an index sounds trivial. In production, it’s a live grenade. If that index was supporting a critical query path, your database instantly falls back to a full collection scan. Latency spikes. Connections queue. Your pager goes off.

Why drop an index at all? Bloat from unused indexes wastes RAM and slows writes. You should audit every index quarterly. A query that hasn’t run in six months doesn’t need its crutch.

Use db.collection.dropIndex("index_name") — but always drop in a maintenance window. First confirm no query touches it by checking the output of $indexStats. Then drop, wait thirty seconds, and verify explain("executionStats") on the target queries still shows IXSCAN. If you see COLLSCAN for even one query, recreate the index and rethink your strategy.

Never drop a compound index under load. The index rebuild blocks writes on that collection. That’s how you turn a five-second query into a five-minute outage.

DropIndexAudit.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// io.thecodeforge — database tutorial

// Find unused indexes — they are costing you writes and RAM
db.orders.aggregate([
  { $indexStats: {} },
  { $match: { accesses: { ops: { $lt: 10 }, since: { $gte: ISODate("2025-01-01") } } } }
]).pretty()

// Drop the bloat — confirm the index name first
// db.orders.getIndexes() lists all index keys and names
db.orders.dropIndex("order_date_1_status_1")

// Verify production queries still use an index
db.orders.find({ status: "shipped", order_date: { $gte: ISODate("2025-03-01") } })
  .explain("executionStats")
Output
// db.orders.getIndexes()
[
{ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_" },
{ "v" : 2, "key" : { "order_date" : 1, "status" : 1 }, "name" : "order_date_1_status_1" }
]
// explain output — should show IXSCAN, not COLLSCAN
{
"queryPlanner" : { "winningPlan" : { "stage" : "IXSCAN" } },
"executionStats" : { "nReturned" : 24500, "totalDocsExamined" : 24500 }
}
Production Trap:
If you drop an index on a sharded collection, the drop propagates asynchronously. It may still participate in queries for seconds or minutes after the command returns. Always wait for the balancer to catch up.
Key Takeaway
Drop indexes only after verifying they are unused and never during peak traffic. Use $indexStats and explain() to confirm safety.

View All Indexes Without Guessing

You inherited a Mongo cluster from someone who left no documentation. Or you’ve been adding indexes ad hoc for six months. Either way, you need a single source of truth. getIndexes() is that source.

Run db.collection.getIndexes() and it spits back every index on that collection — name, key pattern, and any options like sparse, unique, or TTL. The output is a JSON array. Pipe it to pretty() before you try to read it raw.

But a collection-by-collection approach is slow in a schema with dozens of collections. Use a short JavaScript loop in the shell to dump all indexes across the database at once. Store the output somewhere — a file, a wiki, a Slack thread. That snapshot is your baseline.

The real insight comes from cross-referencing this list with your application’s query patterns. An index on user_email_1 that is unique? That’s a constraint, not a performance tool. A compound index on status_1_created_at_-1? That’s your report query. If you see a single-field index that is never used in a compound index, you may have a fragmentation problem — drop and consolidate.

ListAllIndexes.jsSQL
1
2
3
4
5
6
7
8
9
10
11
12
// io.thecodeforge — database tutorial

// List indexes for one collection
db.orders.getIndexes().forEach(idx => printjson(idx))

// Dump all indexes in the entire database — for that inventory file
db.getCollectionNames().forEach(function(coll) {
  var idxs = db[coll].getIndexes();
  idxs.forEach(function(idx) {
    print(coll + ": " + JSON.stringify(idx));
  });
});
Output
// Output for db.orders.getIndexes()
{
"v" : 2,
"key" : { "_id" : 1 },
"name" : "_id_"
}
{
"v" : 2,
"key" : { "order_date" : 1, "status" : 1 },
"name" : "order_date_1_status_1"
}
// Script output for the whole database
orders: {"v":2,"key":{"_id":1},"name":"_id_"}
orders: {"v":2,"key":{"order_date":1,"status":1},"name":"order_date_1_status_1"}
users: {"v":2,"key":{"_id":1},"name":"_id_"}
users: {"v":2,"key":{"email":1},"name":"email_1","unique":true}
Senior Shortcut:
Store a snapshot of all indexes in a version-controlled file. When deploying application changes, diff the new index set against the baseline to avoid surprises.
Key Takeaway
Run getIndexes() on every collection to audit your index inventory. Cross-reference with query patterns to identify dead weight or missing compound indexes.

Introduction

Indexes are the single most impactful performance lever in MongoDB, yet they are frequently misunderstood or misapplied. This guide cuts through the noise to give you a senior engineer's understanding of index internals, practical selection strategies, and production pitfalls. We start with why indexes exist: without them, MongoDB performs a collection scan on every query, reading every document to find matches. That works for tiny datasets but collapses under real-world loads. Indexes trade write overhead for read speed—a tradeoff you must evaluate for every collection. You will learn how B-Tree structures enable logarithmic lookups, how compound indexes obey the ESR rule (Equality, Sort, Range), and how to read explain plans to diagnose performance. We also cover over-indexing, covered queries, maintenance, and high-throughput strategies. By the end, you will not only know which index to create but also when not to create one, avoiding the silent production meltdowns that plague teams who treat indexing as a checklist item instead of a design decision.

intro_index_create.sqlSQL
1
2
3
4
5
6
// io.thecodeforge — database tutorial
// Minimal index creation example
use myapp;
db.users.createIndex({ email: 1 }, { unique: true });
db.users.createIndex({ lastLogin: -1 }, { background: true });
print("Indexes created: email (unique), lastLogin (descending)");
Output
Indexes created: email (unique), lastLogin (descending)
Production Trap:
Creating a unique index on a large collection will block writes until the build completes. Always use { background: true } or build during maintenance windows to avoid downtime.
Key Takeaway
Never add indexes blindly—understand the read/write ratio of your workload first.

Step 1 — Preparing the Sample Database

Before you can test index performance, you need a realistic dataset. We'll create a database called shop with a collection orders that contains at least 100,000 documents. This volume ensures B-Tree depth becomes visible in explain plans. Start by switching to the shop database: use shop;. Insert a batch of documents with fields: orderId, customerId, total, status, and createdAt. Use a script or the insertMany command with a loop. For example, generate 100k documents where orderId increments, customerId cycles through 1000 distinct values, total is random between 10.00 and 1000.00, status is one of 'pending', 'shipped', 'delivered', and createdAt is a random date in the last year. This distribution mirrors a real e-commerce workload. After insertion, confirm the count with db.orders.countDocuments(). Then create a simple index on status to observe the difference between a collection scan and an index scan in the next steps. Ensure your MongoDB instance is running locally or on a test server—never experiment on production data.

setup_sample_data.jsSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// io.thecodeforge — database tutorial
use shop;
const bulk = db.orders.initializeUnorderedBulkOp();
for (let i = 0; i < 100000; i++) {
  bulk.insert({
    orderId: i,
    customerId: Math.floor(Math.random() * 1000),
    total: (Math.random() * 990 + 10).toFixed(2),
    status: ['pending','shipped','delivered'][Math.floor(Math.random()*3)],
    createdAt: new Date(Date.now() - Math.random() * 365 * 86400000)
  });
}
bulk.execute();
print("Inserted: " + db.orders.countDocuments());
db.orders.createIndex({ status: 1 });
Output
Inserted: 100000
Pro Tip:
Use initializeUnorderedBulkOp() for large inserts—it’s faster and handles partial failures gracefully. Always validate the document count after insertion.
Key Takeaway
A realistic dataset with high cardinality fields is essential for meaningful index experiments.
● Production incidentPOST-MORTEMseverity: high

The Missing Index That Killed Search Latency

Symptom
A REST endpoint for searching orders by status took over 30 seconds to respond, causing API gateway timeouts and client-side errors.
Assumption
The team assumed that because an index existed on the 'status' field, all queries filtering by 'status' would use it.
Root cause
The query used db.orders.find({ status: { $in: ['pending', 'shipped', 'cancelled'] } }). While a single-field index on 'status' existed, the query still scanned millions of documents because the index only stored the field value — filtering by multiple values required scanning a large portion of the index tree. MongoDB decided to COLLSCAN instead because the index was not selective enough.
Fix
Created a compound index on (status, createdAt) to support both the filter and a subsequent sort. This reduced the index scan to only the matching documents and leveraged the sort order to stop early.
Key lesson
  • Always analyze query patterns before creating indexes — single-field indexes may not be sufficient for multi-value queries.
  • Use explain() with executionStats to see 'totalDocsExamined' vs 'nReturned'. If they differ significantly, the index is not covering the query well.
  • Monitor slow queries with the profiler; don't assume indexes will automatically be used.
  • Always test with production-like data volume before release.
Production debug guideSymptom-to-action guide for common MongoDB indexing problems in production6 entries
Symptom · 01
Query shows IXSCAN but still reads millions of documents
Fix
Check 'totalKeysExamined' vs 'nReturned' in explain output. If totalKeysExamined >> nReturned, the index is not selective. Consider a compound index that covers the filter fields more precisely.
Symptom · 02
Explain shows COLLSCAN even though an index exists on the queried field
Fix
Verify the index exists with db.collection.getIndexes(). Check if the query uses operators like $regex or $where that prevent index usage. Also check if the query includes a sort field that is not part of the index.
Symptom · 03
Write operations are slow on a collection with many indexes
Fix
Each index adds write overhead. Use db.collection.aggregate([ { $indexStats: {} } ]) to see index usage. Remove unused indexes. Consider whether some indexes can be merged into compound indexes.
Symptom · 04
Compound index exists but query still performs a sort in memory
Fix
The sort field must be the last field in the compound index after equality and range fields (ESR rule). Also ensure the sort direction matches the index direction. Check 'sort' stage in explain.
Symptom · 05
Index used but query returns wrong results due to sparse index
Fix
Run db.collection.find({field: {$exists: false}}).count() to find documents missing the field. Sparse indexes exclude missing fields — design accordingly.
Symptom · 06
Covered query expected but FETCH stage present
Fix
Check if all projected fields are in the index. Include them in the index key. Also ensure _id is excluded if not needed.
★ Quick MongoDB Index Debugging Cheat SheetCommands and checks for diagnosing index-related performance issues in production
Query is slow — find out if index is used
Immediate action
Run explain with executionStats
Commands
db.collection.find({status: 'active'}).explain('executionStats')
Check 'stage' field: 'IXSCAN' means index used, 'COLLSCAN' means full collection scan
Fix now
If COLLSCAN, create an index on the filter field(s) using db.collection.createIndex({status: 1})
Index looks right but query still scans many documents+
Immediate action
Check total keys examined vs documents returned
Commands
Look at 'totalKeysExamined' and 'nReturned' in explain output
If totalKeysExamined is much larger, the index is not selective enough
Fix now
Add additional fields to the compound index to narrow the scan — e.g., {status: 1, createdAt: 1}
Writes are slow, suspect too many indexes+
Immediate action
List all indexes and their usage stats
Commands
db.collection.getIndexes()
db.collection.aggregate([{$indexStats:{}}])
Fix now
Drop any indexes with zero or very low use using db.collection.dropIndex('indexName')
Index size grows unexpectedly, suspect fragmentation+
Immediate action
Check index size vs collection size
Commands
db.collection.stats().indexSizes
Compare with total data size. If index is larger than data, fragmentation may be high.
Fix now
Rebuild index during maintenance window: db.collection.reIndex() but blocks all operations.
Covered query expected but FETCH stage present+
Immediate action
Check projection includes all required fields
Commands
db.collection.find(...).projection(...).explain()
Look for stage 'FETCH' — if present, add missing fields to the index
Fix now
Update the compound index to include all projected fields
Sort stage present despite compound index+
Immediate action
Verify ESR rule order
Commands
db.collection.find(query).sort({field: direction}).explain()
Check if sort field matches the last key in the compound index
Fix now
Recreate the index with sort field after equality fields: {eqField: 1, sortField: 1}
MongoDB Index Types Comparison
Index TypeUse CaseLimitations
Single-fieldSimple equality or range queries on one fieldDoes not support multi-field queries efficiently
CompoundQueries with multiple filter fields and/or sortOrder of fields matters — wrong order can cause in-memory sort
MultikeyQuerying array fields (e.g., tags)Cannot be leading field of compound index if array is first; index explosion with large arrays
TextFull-text search on string fieldsOnly one text index per collection; not as fast as dedicated search engines
HashedShard key for data distributionOnly supports equality queries, no range or sort
WildcardSchemaless collections with unpredictable query patternsLarge index size, high write overhead, cannot support sort or range effectively
TTLAuto-expire documents based on date fieldNot precise — deletes run every 60 seconds. Can't be used with compound indexes

Key takeaways

1
You now understand what MongoDB Indexing is and why it exists
2
You've seen it working in a real runnable example
3
Practice daily
the forge only works when it's hot 🔥
4
B-tree indexes give O(log n) reads and O(log n) per-index write overhead
5
Use ESR rule for compound index field order
Equality → Sort → Range
6
Explain('executionStats') reveals selectivity via totalKeysExamined / nReturned ratio
7
Covered queries are the fastest read path
they skip document retrieval entirely
8
Monitor index usage with $indexStats and drop unused indexes to save write performance
9
Sparse indexes can silently exclude missing fields
use with caution
10
Over-indexing is the #1 cause of write performance degradation in MongoDB
11
Partial indexes reduce overhead on skewed datasets
12
Always profile queries before deciding index strategy
13
Covered queries are the fastest reads
make them your default for high-frequency paths.
14
Regular index audits prevent silent performance degradation.
15
Collation mismatches are a subtle but common cause of index not used.
16
Hashed indexes distribute writes evenly and help avoid fragmentation.

Common mistakes to avoid

10 patterns
×

Memorising syntax before understanding the concept

Symptom
You can write createIndex() but can't explain why the index improves performance or why a compound index fails.
Fix
Practice with explain() on real queries. Understand the B-tree structure and impact on execution.
×

Skipping practice and only reading theory

Symptom
You know the ESR rule but still create indexes that cause in-memory sorts.
Fix
Set up a local MongoDB instance with 1M records, create different indexes, and check explain() results.
×

Creating indexes on every field

Symptom
Write throughput drops significantly; inserts and updates slow down.
Fix
Only index fields used in queries. Monitor index usage with $indexStats and drop unused indexes.
×

Using wildcard indexes as a default

Symptom
Large index size, high memory usage, and slow writes.
Fix
Use wildcard indexes only for schemaless collections where query patterns are unpredictable. Otherwise, create specific indexes.
×

Not monitoring index usage

Symptom
Unused indexes consume RAM and slow writes without any benefit.
Fix
Run db.collection.aggregate([{ $indexStats: {} }]) periodically and drop indexes with zero usage for 30+ days.
×

Assuming a compound index covers all queries regardless of field order

Symptom
A compound index exists but the query still does a COLLSCAN or in-memory sort because the query doesn't use a prefix of the index.
Fix
Always check if the query filter starts with the first field of the index. Reorder index fields to match the most common query pattern.
×

Ignoring the 1024-byte key limit

Symptom
Equality queries on long strings or binary data return incorrect results.
Fix
If your indexed field can exceed 1024 bytes, consider a hashed index or store a shorter derivative (e.g., a hash) in a separate field.
×

Overusing sparse indexes without understanding null semantics

Symptom
Queries filtering on the indexed field with $exists or null return unexpected results.
Fix
Avoid sparse indexes unless you fully understand the query patterns. Use a partial index with an explicit filter instead.
×

Not covering sort direction in compound index

Symptom
Query sorts in descending order but index is ascending – MongoDB still uses index but needs in-memory sort for large result sets.
Fix
Create index with matching sort direction: {field1: 1, field2: -1}.
×

Using $in with index but still seeing high keysExamined

Symptom
Index used but query still slow because $in scans many index entries.
Fix
Treat $in as a range query. Put it last in the compound index after equality and sort fields.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How does a B-tree index work in MongoDB? Explain with respect to read an...
Q02SENIOR
Explain the ESR rule for compound indexes. Give an example where violati...
Q03SENIOR
How do you debug a slow query that uses an index but still scans many do...
Q04SENIOR
What is a covered query and how do you achieve one in MongoDB?
Q05SENIOR
You have a compound index on {status: 1, createdAt: -1}. Queries with fi...
Q06SENIOR
How do you handle index fragmentation in MongoDB?
Q07SENIOR
When would you choose a partial index over a regular compound index?
Q08SENIOR
How do you handle index fragmentation in a high-write environment?
Q09SENIOR
What is the difference between a covered query and a non-covered query i...
Q10SENIOR
How does collation affect index usage in MongoDB?
Q01 of 10SENIOR

How does a B-tree index work in MongoDB? Explain with respect to read and write performance.

ANSWER
A B-tree index is a self-balancing tree where each node can have multiple children. In MongoDB, the index is stored separately from the collection. For reads, the index guides the query to the leaf nodes containing the key and document pointer, enabling O(log n) search. For writes, each insert or update must update all relevant B-trees (one per index), which adds O(log n) cost per index. This is why over-indexing hurts write performance. The WiredTiger storage engine uses a B-tree variant for indexes.
FAQ · 12 QUESTIONS

Frequently Asked Questions

01
What is MongoDB Indexing in simple terms?
02
Why does the order of fields in a compound index matter?
03
How many indexes should I have on a collection?
04
What is the difference between IXSCAN and COLLSCAN in explain?
05
Can I use a wildcard index with MongoDB?
06
What is a covered query and how do I know if my query is covered?
07
How do I find unused indexes in my MongoDB collection?
08
What is a partial filter expression in MongoDB indexes?
09
How do I handle the 1024-byte key limit in MongoDB indexes?
10
What happens if an index key exceeds 1024 bytes?
11
Why does MongoDB choose a COLLSCAN even when an index exists?
12
How does collation affect index usage?
N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

Follow
Verified
production tested
May 23, 2026
last updated
1,554
articles · all by Naren
🔥

That's NoSQL. Mark it forged?

14 min read · try the examples if you haven't

Previous
MongoDB Aggregation Pipeline
5 / 15 · NoSQL
Next
Redis Basics