Senior 24 min · March 05, 2026
MongoDB Aggregation Pipeline

MongoDB Agg: $lookup Without Index - 25B Comparisons

An unindexed $lookup caused 25 billion comparisons and a 45-minute timeout in production.

N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

Follow
Production
production tested
May 24, 2026
last updated
1,554
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer
  • MongoDB Aggregation Pipeline processes documents through a series of stages, each transforming the data stream.
  • Core stages: $match (filter), $group (aggregate), $sort (order), $project (reshape), $lookup (join), $unwind (flatten arrays).
  • Stage order matters: early $match reduces document count before expensive operations.
  • Performance: indexes used automatically for $match and $sort early in pipeline; late-stage operations may require allowDiskUse() to exceed 100MB memory limit.
  • Production insight: $lookup without an index on the foreign field triggers a full collection scan per input document — can kill query performance.
  • Biggest mistake: assuming the pipeline executes in the order you type it — MongoDB optimises, but $sort before $match blocks index usage.
✦ Definition~90s read
What is MongoDB Aggregation Pipeline?

MongoDB Aggregation Pipeline is a core concept in Database. Instead of a dry definition, think of it as a data processing assembly line. Each stage receives a stream of documents, transforms them, and passes the result to the next stage. It's declarative — you say what you want, not how to do it.

Imagine you work in a huge warehouse full of cardboard boxes.

That's the key insight: you describe the transformation steps, and MongoDB's query planner figures out the execution order, index usage, and resource allocation.

The pipeline was introduced in MongoDB 2.2 to replace MapReduce, which ran JavaScript in a single thread and couldn't leverage indexes. The pipeline runs in C++ (native speed), can use indexes for $match and $sort, and streams results through stages to keep memory low.

But that streaming nature also creates pitfalls: blocking stages ($group, $sort) must buffer everything before output, which is where the 100MB memory limit bites you.

Here's a concrete example: you want the top 10 customers by total order value this year. A find() can't do that — you need to filter, join, group, sort, and limit. The pipeline does all that in one go.

Before the pipeline existed, teams often relied on application-side aggregation: pull all orders into memory, filter, group, sort. That worked for small datasets but broke at production scale. The pipeline moves the computation to the database, saving network bandwidth and leveraging indexes. That's the difference between a 2-second dashboard and a 45-minute timeout.

One thing most engineers miss: the pipeline optimizer can reorder stages, but only within strict rules. For example, $match can be pushed before $project, but not through $lookup. This means your mental model of stage order must match the optimizer's limitations — or you'll write a pipeline that looks optimised but forces full collection scans.

Here's how you can verify this yourself: run explain() on a pipeline with $match after $lookup. You'll see the $match stays put — it won't push down.

Another nuance: the optimizer can also merge consecutive $match stages and push $project filters into earlier stages. But it cannot reorder stages across a $lookup boundary because it doesn't know the foreign collection's data distribution. That's why you need to manually push filters into $lookup's sub-pipeline.

Plain-English First

Imagine you work in a huge warehouse full of cardboard boxes. You need to find all boxes heavier than 10kg, group them by color, and count how many of each color you have. Instead of opening every box yourself, you set up an assembly line: the first worker filters heavy boxes, the second sorts by color, and the third counts. That assembly line is MongoDB's Aggregation Pipeline — each stage hands its output to the next, transforming your raw data into the exact answer you need.

Most applications don't just store data — they ask questions of it. 'Which users spent the most last month?', 'How many orders shipped per region per day?', 'What's the 90th-percentile response time across our API endpoints?' These are not simple find() queries. They require filtering, reshaping, grouping, and computing across potentially millions of documents. That's the job MongoDB's Aggregation Pipeline was built for, and in high-traffic production systems it's the difference between a dashboard that loads in 40ms and one that times out.

Before the Aggregation Pipeline existed (pre-MongoDB 2.2), developers were stuck with MapReduce — a JavaScript-evaluated, single-threaded monster that was slow, opaque, and painful to debug. The Pipeline replaced it with a declarative, composable model: you describe what you want, stage by stage, and MongoDB's query planner figures out how to execute it efficiently. That mental shift — from imperative to declarative — is what makes the Pipeline feel elegant once it clicks.

By the end of this article you'll be able to write multi-stage pipelines that actually use indexes, avoid the dreaded 100MB in-memory sort limit, spot the three most common production mistakes before they bite you in a code review, and explain to an interviewer how the execution engine decides whether to push a $match before a $lookup.

What is MongoDB Aggregation Pipeline?

MongoDB Aggregation Pipeline is a core concept in Database. Instead of a dry definition, think of it as a data processing assembly line. Each stage receives a stream of documents, transforms them, and passes the result to the next stage. It's declarative — you say what you want, not how to do it. That's the key insight: you describe the transformation steps, and MongoDB's query planner figures out the execution order, index usage, and resource allocation.

The pipeline was introduced in MongoDB 2.2 to replace MapReduce, which ran JavaScript in a single thread and couldn't leverage indexes. The pipeline runs in C++ (native speed), can use indexes for $match and $sort, and streams results through stages to keep memory low. But that streaming nature also creates pitfalls: blocking stages ($group, $sort) must buffer everything before output, which is where the 100MB memory limit bites you.

Here's a concrete example: you want the top 10 customers by total order value this year. A find() can't do that — you need to filter, join, group, sort, and limit. The pipeline does all that in one go.

Before the pipeline existed, teams often relied on application-side aggregation: pull all orders into memory, filter, group, sort. That worked for small datasets but broke at production scale. The pipeline moves the computation to the database, saving network bandwidth and leveraging indexes. That's the difference between a 2-second dashboard and a 45-minute timeout.

One thing most engineers miss: the pipeline optimizer can reorder stages, but only within strict rules. For example, $match can be pushed before $project, but not through $lookup. This means your mental model of stage order must match the optimizer's limitations — or you'll write a pipeline that looks optimised but forces full collection scans.

Here's how you can verify this yourself: run explain() on a pipeline with $match after $lookup. You'll see the $match stays put — it won't push down.

Another nuance: the optimizer can also merge consecutive $match stages and push $project filters into earlier stages. But it cannot reorder stages across a $lookup boundary because it doesn't know the foreign collection's data distribution. That's why you need to manually push filters into $lookup's sub-pipeline.

pipeline-intro.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// Top 10 customers by total order value in 2026
db.orders.aggregate([
  { $match: { orderDate: { $gte: ISODate('2026-01-01') } } },
  { $lookup: {
      from: 'customers',
      localField: 'customerId',
      foreignField: '_id',
      as: 'customer'
  }},
  { $unwind: '$customer' },
  { $group: { _id: '$customer.email', totalSpent: { $sum: '$total' } } },
  { $sort: { totalSpent: -1 } },
  { $limit: 10 }
]);

// Check optimization — see if $match stays first
db.orders.explain('queryPlanner').aggregate([...]);
Forge Tip:
Type this code yourself rather than copy-pasting. The muscle memory of writing it will help it stick.
Production Insight
A pipeline that works on 10k documents may fail on 10M.
Always test at scale — use explain() to see stage boundaries.
The streaming model breaks when a blocking stage holds all documents.
Real fix: push $match early and use allowDiskUse only as a last resort.
Key Takeaway
Pipeline is declarative — describe what, not how.
Blocking stages ($group, $sort) consume memory — plan for it.
Streaming doesn't mean no memory limit; it means per-stage.
When to use aggregation vs find()
IfNeed to transform data (group, sort, project) across documents
UseUse aggregation pipeline — find() can't reshape or group.
IfSimple filter with field selection, no joins
UseUse find() — it's faster and uses indexes more flexibly.
IfNeed to join multiple collections
UseUse aggregation with $lookup — but ensure foreign index exists.
MongoDB Aggregation Pipeline: $lookup Without Index THECODEFORGE.IO MongoDB Aggregation Pipeline: $lookup Without Index 25B comparisons: performance impact of missing index in $lookup Aggregation Pipeline Stages: $match, $group, $sort, $project, $lookup Blocking vs Streaming Stages Blocking: $sort, $group; Streaming: $match, $project Index Usage & 100MB Limit Index speeds up $lookup; 100MB memory limit per stage $lookup Without Index 25B comparisons: full collection scan per document Pipeline Optimization Execution engine: reorder stages, use indexes Debugging & Profiling Use explain() and profiler to identify bottlenecks ⚠ Missing index on foreign field in $lookup causes 25B comparisons Always create index on foreign field to avoid full collection scan THECODEFORGE.IO
thecodeforge.io
MongoDB Aggregation Pipeline: $lookup Without Index
Mongodb Aggregation Pipeline

Stage Execution Order and Document Flow

Understanding the physical execution order of an aggregation pipeline is crucial for debugging performance. The logical order you write (e.g., $match then $group) may be reordered by the optimizer, but the actual engine follows a strict streaming model: each stage processes documents one at a time and passes them to the next. However, stages that require all input before producing output (blocking stages) break the stream. The diagram below illustrates a typical pipeline flow showing which stages stream and which block.

The key insight: $match and $project are streaming — they process one doc at a time. $group, $sort, and $bucket are blocking — they accumulate all input before emitting any output. $lookup is hybrid: it streams the local documents but performs indexed lookups per document (if indexed) or full scans (if not). $unwind is streaming but can increase document count dramatically.

When you see a slow pipeline, trace the flow: identify the first blocking stage. That's where the memory and time bottleneck lives. Always put a $match before a blocking stage to reduce its input size.

Production Insight
In one production incident, a pipeline had $lookup before $group — but $group was blocking 2 million docs. Adding a $match before $lookup cut input by 70%, dropping memory usage below 100MB. The fix was as simple as adding a $match on a date range.
Key Takeaway
Blocking stages are the performance chokepoints. Place streaming stages before them to minimize input size.
Aggregation Pipeline Stage Execution Flow
$match: streaming$project: streaming$lookup: hybrid$unwind: streaming$group: blocking$sort: blocking$limit: streaming

Core Stages: $match, $group, $sort, $project, $lookup, $unwind

The pipeline is only as good as its stages. Here's the real behaviour of the most used ones:

  • $match: Filters documents. Always put it first — it reduces the document count before any expensive operation. Importantly, $match can use indexes if it appears at the start, but once you've had a $project or $unwind, the engine can't use indexes for later $match stages.
  • $group: Groups documents by a key and applies accumulator expressions (sum, avg, etc.). This stage buffers all documents in memory, so if your group key is high-cardinality, you may hit the 100MB limit.
  • $sort: Sorts documents. If used early, it can use an index. If used after $group or $project, it's a blocking operation — it must see all input before emitting output.
  • $project: Reshapes documents. Use it to exclude fields you don't need downstream. This reduces memory pressure.
  • $lookup: Performs a left outer join with another collection. Slowest stage by far — always ensure the foreign field is indexed. You can also pass a pipeline to $lookup to filter before joining.
  • $unwind: Deconstructs an array field. Each array element becomes a separate document. This can explode cardinality — one document with a 1000-element array becomes 1000 documents. Use with caution.

Order matters more than you think. MongoDB's optimizer can reorder some stages, but not all. A $match after $lookup won't push down into the foreign collection. Put $match before $lookup if possible.

Let's look at a real-world example: a pipeline that totals sales per region, but we also want to include region details from a lookup.

One pattern senior engineers use: pass a pipeline to $lookup with $match to filter joined documents before the join completes. This reduces the number of documents returned and avoids unnecessary $unwind multiplication. For example, if you only need active customers, add a $match in the $lookup pipeline.

Another common trap: using $addFields to compute temporary fields that are later used in $group. Every $addFields re-materialises the document, costing CPU. Instead, compute derived fields inside $group using $sum with expressions or use $project to trim fields before $group.

I once saw a team add a $addFields to compute a discount after a $lookup — it doubled the pipeline execution time. Moving the computation into $group fixed it.

There's also a subtlety with $project: removing fields early reduces memory in later $group and $sort stages. Use $project to drop all fields except the group key and accumulator. You can even use $project to rename fields to avoid $addFields entirely.

pipeline-example.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
// Example: Get total sales per customer in 2026
db.orders.aggregate([
  { $match: { orderDate: { $gte: ISODate('2026-01-01'), $lt: ISODate('2027-01-01') } } },
  { $lookup: {
      from: 'customers',
      localField: 'customerId',
      foreignField: '_id',
      as: 'customer'
  }},
  { $unwind: '$customer' },  // one doc per order+customer
  { $group: {
      _id: '$customer.email',
      totalSpent: { $sum: '$total' }
  }},
  { $sort: { totalSpent: -1 } },
  { $limit: 10 }
]);

// Optimised $lookup with pipeline
db.orders.aggregate([
  { $match: { orderDate: { $gte: ISODate('2026-01-01') } } },
  { $lookup: {
      from: 'customers',
      let: { custId: '$customerId' },
      pipeline: [
        { $match: { $expr: { $eq: ['$_id', '$$custId'] } } },
        { $match: { active: true } } // only active customers
      ],
      as: 'customer'
  }},
  { $unwind: { path: '$customer', preserveNullAndEmptyArrays: true } },
  { $group: { _id: '$customer.email', totalSpent: { $sum: '$total' } } },
  { $sort: { totalSpent: -1 } }
]);
Common trap
Placing $match after $unwind does NOT filter the original array elements — it filters the exploded documents. To filter array elements before exploding, use $filter inside $project or $addFields.
Production Insight
A $lookup without an index on the foreign field becomes a nested loop join — O(n*m). For 10k orders and 100k customers, that's 1 billion comparisons.
Always index foreign fields before deploying a $lookup.
Even with an index, $lookup adds network round-trips; reduce input with early $match.
Debug command: db.orders.explain('executionStats').aggregate([...]) — check stage for 'COLLSCAN'.
Key Takeaway
Stage order determines index usage and memory consumption.
Early $match is the single most impactful performance lever.
Use explain() to verify stage execution — assumptions are expensive.
Real world: an index on foreign field cut a 45-minute job to 3 seconds.
Stage order decision
IfNeed to reduce data before any join
UsePlace $match as stage 1, then $lookup. Reduces foreign key lookups.
IfNeed to sort on a field after transformation
UseConsider using $sort before $project if possible to use index, otherwise expect in-memory sort.
IfNeed to filter after $lookup
UseMove filter into $lookup's pipeline parameter — engine won't push it down automatically.

Blocking vs Streaming Stage Performance Table

One of the most important distinctions in pipeline performance is whether a stage is streaming or blocking. A streaming stage processes documents one at a time and never materialises the entire set in memory. A blocking stage must accumulate all input documents before it can emit any output — which consumes memory and adds latency.

StageTypeMemory UsageNotes
$matchStreamingLow (filters one doc at a time)Can use indexes if at pipeline start
$projectStreamingLow (reshapes one doc)Use to trim fields before blocking stages
$addFieldsStreamingLowStill re-materialises each doc; prefer inline in $group
$lookupHybrid (varies)Moderate-highStreaming on local docs; blocking on foreign collection if unindexed
$unwindStreamingLow (but multiplies docs)Each input doc can produce many output docs
$groupBlockingHigh (can exceed 100MB)Buffers all documents with the same group key
$sortBlockingHigh (full in-memory sort)Exception: $sort + $limit uses top-k (memory ~k elements)
$bucketBlockingHighDivides into buckets; memory grows with number of buckets
$facetBlockingMultiple sub-pipelinesEach sub-pipeline can use up to 100MB independently
$limitStreamingLowCaps number of documents
$skipStreamingLowSkips first N docs
$sampleStreaming / BlockingVariesIf size < 5% of collection, uses random cursor (streaming); otherwise sorts on _id (blocking)

Production tip: Always place streaming stages before blocking stages. The early $match and $project reduce the avalanche of documents hitting a $group or $sort. If you must use a blocking stage, remember that $sort + $limit is an exception: it keeps only the top K elements in memory, making it effectively streaming for the sorted window.

Memory Budget in Practice
Each blocking stage gets its own 100MB memory allowance. If you have a $group then a $sort, that's 200MB potential usage before allowDiskUse kicks in. Use explain('executionStats') to see if any stage reported 'usedDisk: true'.
Production Insight
When I migrated a pipeline from a $group + $sort to $sort + $limit + $group (with a different aggregation), we reduced memory usage from 300MB to 40MB. The $sort + $limit used top-k and never materialised the full dataset.
Key Takeaway
Streaming stages are your friends — keep them early. Blocking stages should be as late as possible, with inputs trimmed by earlier streaming stages.

Index Usage and the 100MB Memory Limit

MongoDB's aggregation engine can leverage indexes for $match, $sort, and $geoNear — but only if those stages appear early in the pipeline, before any stage that transforms the document shape.

The 100MB memory limit applies to blocking stages: $group, $sort, $bucket, $facet, $lookup (in-memory side). When these stages exceed 100MB, the engine throws an error unless you pass { allowDiskUse: true }. Disk spill is slower but lets you process larger-than-memory datasets.

Here's the trick: the memory limit is per-stage, not per-pipeline. A $group that processes 90MB and a $sort that processes 80MB will both fit, but a $group processing 120MB will fail. Use explain to see memory consumption.

What about indexes in $lookup? The foreign collection's index is used when you specify an exact field match. If you pass a pipeline, the engine tries to use indexes for $match within that sub-pipeline.

Also, $match within $lookup's pipeline does NOT use indexes on the local collection — only the foreign collection's indexes matter there.

Let's see how to check memory usage and enable disk use with code.

Note that $bucket and $facet can also hit the memory limit. Each sub-pipeline in $facet gets its own memory allowance, but if one sub-pipeline spills to disk, it delays the whole $facet.

One edge case: $sort after $limit. If you sort then limit, MongoDB uses a top-k sort that keeps only k elements in memory. This is a huge win — but it only works if $sort is before $limit with no intervening $group. If you need to sort after grouping, you're back to full in-memory sort.

I've seen a pipeline fail on a 200MB $group with a perfectly healthy $sort. The fix wasn't allowDiskUse — it was adding a $match to cut input by 60%.

Another important detail: the 100MB limit is configurable with the internalQueryMaxBlockingSortMemoryUsageBytes parameter (though not recommended to change in production). But the real lever is $project — reducing each document's size before $group means more documents fit in 100MB. A 20% reduction in document size can be the difference between spilling and not spilling.

explain-memory.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// Check execution stats and memory usage
db.orders.aggregate([
  { $match: { status: 'completed' } },
  { $group: { _id: '$region', total: { $sum: '$amount' } } },
  { $sort: { total: -1 } }
]).explain('executionStats');

// Enable disk use for large sorts
db.orders.aggregate(
  [ ... ],
  { allowDiskUse: true }
);

// Check if disk was used in executionStats: look for 'usedDisk' field
// On MongoDB 4.4+, also run db.collection.aggregate([...], { allowDiskUse: true }).explain('executionStats') to see spill.
The Pipeline as a Waterfall
  • Early $match reduces the volume entering the entire pipeline.
  • Indexes only work before the first document-transforming stage ($project, $unwind, $addFields).
  • Memory limit is per-pool — one large $group can blow it even if other stages are small.
  • allowDiskUse lets the overflow spill into a reservoir (disk), but slows the flow.
  • explain() shows you the water level at each stage — the stage with the highest 'docs examined' is your bottleneck.
Production Insight
We once saw a $sort before a $match blow up the memory limit despite a small result set. The $sort processed the entire collection because the index could not be used after an earlier $addFields.
Fixed by moving $match to position 1 and $sort to position 2.
Rely on explain(), not intuition.
Add monitoring: if you see 'usedDisk:true' in explain, your query needs optimisation.
Key Takeaway
Put $match and $sort before any stage that changes document shape.
Memory limit is per blocking stage — check explain for 'usedDisk'.
allowDiskUse is a band-aid; proper stage order is the cure.
When to use allowDiskUse?
IfSingle $group/sort exceeding 100MB but pipeline is otherwise efficient
UseUse allowDiskUse: true. Monitor disk I/O — it's slower but works.
IfMultiple stages each near limit, or overall pipeline too slow
UseRestructure: push $match and $project earlier, use indexes, break pipeline into separate aggregations.
If$bucket or $facet hits memory limit
UseSet allowDiskUse: true, but also consider reducing the number of buckets or sub-pipelines.

Pipeline Optimization and Execution Engine Internals

MongoDB's query planner does more than you think. When it sees a pipeline, it applies a set of transformations:

  • $match coalescence: Multiple $match stages are merged into one. This matters because it reduces overhead.
  • $project removal: If a $project only removes fields and is followed by another $project, the engine may combine them.
  • $sort + $limit: When you have a $sort followed by $limit, the engine uses a top-k sort algorithm — it only keeps the k best elements in memory. This drastically reduces memory for large sorts.
  • $lookup optimization: For simple equality lookups, the engine uses a hash join internally. For more complex pipelines, it falls back to nested loop.

But not all optimizations happen automatically. For example, $match after $lookup does not push into the $lookup pipeline. You must manually move that filter inside the $lookup's pipeline parameter.

The execution model is document-stream based. Each stage outputs documents one by one to the next stage. Blocking stages (like $group) must buffer all documents before emitting. This is why a $group after a $project that reduces document size is faster than a $group on original docs.

Here's how to check if optimization occurred.

Another optimization: when using $lookup with a pipeline, the engine can sometimes use indexes on the foreign collection for $match stages inside that pipeline. But it won't reorder stages inside the sub-pipeline. You can force a plan using $hint if the planner picks a suboptimal one.

A production trick: if you see multiple $match stages in your explain output and they haven't been coalesced, check if there's an intervening $addFields or $project that blocks coalescence. Sometimes adding a $project can actually prevent optimization.

I've debugged a pipeline where a $project between two $match stages prevented coalescence. Removing the first $project and using $unset to drop fields solved it.

Additionally, the optimizer can also remove $project stages that do nothing — for example, if you $project a field that's already the only field in the document. But it won't remove a $project that adds computed fields. That's another reason to avoid unnecessary $addFields.

optimization-check.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
// Check if optimization occurred: look for 'optimizedPipeline'
db.collection.aggregate([...]).explain('queryPlanner');

// If you see 'stages' with different order, MongoDB optimized it.
// Example: two $match stages become one.

// Push $match into $lookup manually:
db.orders.aggregate([
  { $lookup: {
      from: 'products',
      let: { prodId: '$productId' },
      pipeline: [
        { $match: { $expr: { $eq: ['$_id', '$$prodId'] } } },
        { $match: { active: true } }  // pushed into lookup
      ],
      as: 'product'
  }}
]);

// Force a query plan with $hint on foreign collection (rare)
db.products.createIndex({ active: 1 });
db.orders.aggregate([
  { $lookup: {
      from: 'products',
      let: { prodId: '$productId' },
      pipeline: [
        { $match: { $expr: { $eq: ['$_id', '$$prodId'] }, active: true } }
      ],
      as: 'product'
  }}
]);

// To see if $sort+$limit used top-k sort, check explain for 'optimizedPipeline' or 'sort' stage with 'limit'.
Optimizer limitations
The optimizer cannot push $match through $lookup because it doesn't know the foreign collection's data distribution. You have to do it manually.
Production Insight
The top-k sort optimization saved us 6GB of RAM on a pipeline that sorted all orders then took top 100. Without the optimization, the $sort would have failed memory limit.
Always check explain output to see if optimization kicked in.
If you don't see 'optimizedPipeline', the engine didn't change anything.
Pro tip: use $hint to force a plan when needed — but test against production data.
Key Takeaway
MongoDB optimizes $match merging and $sort+$limit automatically.
$match after $lookup is NOT optimized — you have to push it manually.
Always verify with explain('queryPlanner').
The optimizer won't save you from bad stage ordering.
When Does Optimization Help?
IfMultiple $match stages in sequence
UseEngine merges them into one — no performance gain needed.
IfOne $project removes fields, another adds fields
UseEngine may combine — check explain for 'optimizedPipeline'.
If$sort followed by $limit
UseEngine uses top-k sort — huge memory savings.
If$match appears after $lookup
UseEngine does NOT push it inside — must manually move into lookup pipeline.

Aggregation Optimization Ruleset Checklist

Here is a concise checklist of optimization rules that you can apply when writing or reviewing an aggregation pipeline. These rules are derived from production experience and the MongoDB documentation. Apply them in order:

  1. Rule 1 — Push $match as early as possible. Ideally the first stage. This reduces the number of documents flowing into the pipeline. Use explain to confirm that the $match uses an index.
  2. Rule 2 — Precede blocking stages with $match and $project. Before any $group, $sort, $bucket, or $facet, add a $project to remove unnecessary fields, reducing document size and memory pressure.
  3. Rule 3 — Always index foreign fields used in $lookup. Without an index, $lookup does a full collection scan per local document. Create a supporting index on the foreign collection's join field.
  4. Rule 4 — Use $lookup with a sub‑pipeline instead of $unwind on the full result. Filters inside the sub-pipeline reduce the number of joined documents before $unwind multiplies them.
  5. Rule 5 — Replace $addFields with inline calculations in $group accumulators. Instead of computing a field with $addFields then using it in $sum, compute the expression directly inside the accumulator.
  6. Rule 6 — Exploit the $sort + $limit top-k optimization. When you need only the top N sorted results, always place $limit immediately after $sort. This keeps only N elements in memory.
  7. Rule 7 — Avoid $facet if one sub-pipeline is significantly slower than others. $facet blocks until the slowest sub-pipeline finishes. Use separate aggregations and merge in application code.
  8. Rule 8 — Test with production-scale data. A pipeline that works on 1M documents may fail on 100M. Use explain('executionStats') and check 'docsExamined' vs 'nReturned'.
  9. Rule 9 — Set maxTimeMS on every aggregation in production. This prevents a runaway pipeline from hanging the application. Use a timeout that the use case can tolerate.
  10. Rule 10 — Monitor $lookup performance via $currentOp and profiler. Look for operations with high 'docsExamined' relative to 'nReturned'.
Quick Review Template
Before deploying a new aggregation pipeline, run through this checklist. If you can't justify one of the rules being skipped, revisit the pipeline design.
Production Insight
We embedded this checklist in our code review process. In the first three months, it caught 12 pipelines that would have failed in production — all due to missing $match or unindexed $lookup. The cost of a review is 10 minutes; the cost of a production incident is hours.
Key Takeaway
Apply these 10 rules as a mental checklist when writing any aggregation pipeline. They convert implicit knowledge into explicit guardrails.

Debugging and Profiling Aggregations in Production

You can't fix what you can't see. MongoDB gives you four tools to inspect aggregation performance:

  1. explain(): Use with 'executionStats' to see the number of documents examined, returned, and stage-level timings. This is your single most important diagnostic.
  2. Database Profiler: Set profiling level to 2 for slow operations. Look for aggregations with high 'docsExamined' vs 'nReturned' ratio.
  3. $planCacheStats: If a query shape has multiple plans, the cache holds the winning plan. You can clear it to force re-optimization.
  4. setProfilingLevel and system.profile: Log all operations taking longer than a threshold. Analyze the profile collection for pattern detection.

Real trick: When debugging, start with a pipeline that has only the first stage, add stages one by one, running explain at each step. This isolates which stage is the culprit. Never debug a 10-stage pipeline as a whole.

Here's a practical example of enabling profiling and querying slow aggregations.

You can also use $indexStats on the foreign collection to verify indexes are being used by your $lookup.

An additional pattern: if you're seeing intermittent slow aggregations, check if the plan cache is holding a stale plan after data growth. Clear it with getPlanCache().clear() and force re-optimization.

I had a situation where a $lookup was fast for weeks, then suddenly took 10x longer. The index was there, but the plan had changed. Clearing the cache restored performance.

Profiling can also reveal aggregations that return far more documents than expected — often a sign of an accidental $unwind multiplication. Check the nReturned vs totalDocsExamined ratio in explain. If nReturned is orders of magnitude larger than the input count, you've got an explosion.

profiling-commands.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
// Enable profiling for slow operations (>100ms)
db.setProfilingLevel(1, { slowms: 100 });

// View recent slow aggregation queries
db.system.profile.find({
  op: 'command',
  'command.aggregate': { $exists: true },
  millis: { $gt: 500 }
}).sort({ ts: -1 }).limit(10).pretty();

// Clear plan cache for a collection
db.collection.getPlanCache().clear();

// Use plan cache stats to see cached plans
db.collection.getPlanCache().listQueryShapes();

// Check index usage for $lookup foreign collection (in mongosh)
db.customers.aggregate([ { $indexStats: {} } ]);

// Test isolation: incrementally add stages to pinpoint bottleneck
db.orders.aggregate(pipeline.slice(0,1)).explain('executionStats');
db.orders.aggregate(pipeline.slice(0,2)).explain('executionStats');
Profiling overhead in production
Setting profiling level to 2 logs every operation — can cause significant performance degradation. Use level 1 with appropriate slowms threshold. Or better, turn on profiling only on a secondary replica if you have one.
Production Insight
In one incident, a $lookup performed well in QA but terribly in prod.
QA had 1M docs, prod had 50M — different query plans.
Always test with production-scale data volumes.
Use plan cache clearing as a first step when performance degrades unexpectedly.
Key Takeaway
explain('executionStats') is your first debugging step.
Add stages incrementally to find the bottleneck.
Profile in production with level 1 and a sensible slowms.
When to use each profiling tool
IfSingle aggregation is slow, want to see stage details
UseUse explain('executionStats').
IfNeed to find slow aggregations across the system
UseEnable profiler at level 1 with slowms threshold and query system.profile.
IfQuery plan changed after index creation
UseClear plan cache with getPlanCache().clear() to force re-optimization.

Explain Plan Walkthrough for Aggregation Queries

The explain() method provides three verbosity modes: queryPlanner, executionStats, and allPlansExecution. For aggregation debugging, executionStats is the most useful. It shows the winning plan's performance details: how many documents were examined at each stage, how long each stage took, and whether indexes were used.

Let's walk through a real explain output for a simple pipeline.

First, run the pipeline with explain('executionStats'). The output contains a stages array. Each element represents a stage in the execution tree. Look for: - stage: the stage name (e.g., 'COLLSCAN', 'IXSCAN', 'GROUP', 'SORT') - nReturned: number of documents output by that stage - totalDocsExamined: number of documents the stage read from disk or index - executionTimeMillisEstimate: time spent in that stage - inputStage: nested details for dependent stages

Key metrics to inspect:

  • Ratio totalDocsExamined / nReturned: A high ratio indicates a stage is scanning many documents to produce few results. For a $match, this means no index was used. For a $lookup, it means the foreign field is not indexed.
  • usedDisk: If true, the stage spilled to disk. This is a red flag — restructure the pipeline to avoid it.
  • IXSCAN vs COLLSCAN: IXSCAN means an index was used; COLLSCAN means a full collection scan.

Example walkthrough:

Consider the pipeline: [{$match: {status: 'active'}}, {$group: {_id: '$category', count: {$sum:1}}}]

The explain output might show: `` stages: [ { stage: 'GROUP', nReturned: 50, totalDocsExamined: 20000, ... }, { stage: 'COLLSCAN', nReturned: 20000, totalDocsExamined: 20000, filter: {status: 'active'} } ] ` Here we see GROUP examined 20,000 documents (all input) and returned only 50 groups. But the collection scan (COLLSCAN) examined all documents in the collection, not just those with status: 'active'. That's because the filter was applied inside the scan, but an index on status would have reduced totalDocsExamined to only matching documents. With an index, the scan becomes IXSCAN on status, and totalDocsExamined` would be much lower.

Production debugging sequence: 1. Run explain('executionStats') on the full pipeline. 2. Identify the stage with the highest totalDocsExamined. 3. If it's a COLLSCAN, create an index for the filter used in that stage. 4. If it's a $group or $sort and usedDisk: true, add a $match before it to reduce input. 5. Re-run explain and verify improvements.

Here's a concrete example of reading explain output for a $lookup pipeline.

explain-walkthrough.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
// Pipeline to debug
db.orders.aggregate([
  { $match: { status: 'completed' } },
  { $lookup: {
      from: 'customers',
      localField: 'customerId',
      foreignField: '_id',
      as: 'customer'
  }}
]).explain('executionStats');

// Example output interpretation:
// "stages" : [
//   {
//     "$lookup" : {
//       "stage" : "$lookup",
//       "nReturned" : 10000,
//       "executionTimeMillisEstimate" : 4500,
//       "inputStage" : {
//         "stage" : "$match",
//         "nReturned" : 10000,
//         "totalDocsExamined" : 10000,
//         "inputStage" : {
//           "stage" : "IXSCAN",
//           "indexName" : "status_1",
//           "keyPattern" : { "status" : 1 },
//           "totalDocsExamined" : 10000
//         }
//       },
//       "foreignCollection" : {
//         "stage" : "COLLSCAN",
//         "totalDocsExamined" : 500000  // BAD: no index on _id
//       }
//     }
//   }
// ]

// Here the $lookup scanned the foreign collection 500k times per local document.
// Creating an index on customers._id (the _id field already has unique index, but if foreignField is different, create index)
// will change COLLSCAN to IXSCAN.

db.customers.createIndex({ _id: 1 }); // _id is already indexed; this example assumes foreignField is something else
// For a real fix: ensure foreignField has an index.

// After indexing, re-run explain to see IXSCAN in the foreign collection section.
Explain reading shortcut
In large output, search for 'COLLSCAN' — each one represents a full table scan. Your goal is to eliminate all COLLSCANs from the plan, replacing them with 'IXSCAN' by adding appropriate indexes.
Production Insight
During a post‑mortem, we found that a pipeline had two COLLSCANs: one on the local collection (because $match was after $lookup) and one on the foreign collection (no index). Adding an index on the foreign field and moving $match to position 1 eliminated both scans. Runtime dropped from 8 minutes to 12 seconds.
Key Takeaway
Regularly run explain('executionStats') on every aggregation you deploy. Look for COLLSCAN and high totalDocsExamined. Fix those before they become incidents.

Common Pitfalls and How to Avoid Them

Even senior engineers make these mistakes. Here are the three that bite hardest:

1. Assuming $lookup is cheap. It's not. Even with indexed foreign fields, $lookup adds a round-trip to the foreign collection per input document (paged, but still expensive). Minimize input to $lookup with early $match.

2. Forgetting that $unwind multiplies documents. A single document with an array of 10k elements becomes 10k documents downstream. Later $group or $sort will process 10kx more. Use $unwind with careful filtering or consider $lookup pipeline to pre-filter.

3. Not handling null or missing foreign keys in $lookup. If the foreign field is missing, $lookup returns an empty array. $unwind then removes that document entirely by default (preserveNullAndEmptyArrays: false). This silently drops documents with no match. Always check your join semantics.

4. Using $facet for simple parallel streams. $facet runs multiple sub-pipelines on the same input. If one sub-pipeline is expensive, it blocks the others. Often two separate aggregations are faster than one $facet.

5. Overusing $addFields to add computed fields early. Every $addFields triggers a document re-materialization, which can hurt performance. Use $project to only keep needed fields.

Here's an example of the silent document loss and how to fix it.

Another pitfall: using $group on high-cardinality fields without considering memory. If you group by a field with millions of unique values, $group will create millions of buckets in memory, easily hitting the 100MB limit. In such cases, consider using $bucket to group into ranges.

In one code review, I found a pipeline that did $group on user IDs for an event table with 10 million users. The fix: $bucket by user ID ranges, then aggregate.

6. Relying on allowDiskUse as a crutch. Spilling to disk is 10-100x slower than in-memory processing. A pipeline that relies on allowDiskUse needs restructuring first.

7. Ignoring the $lookup pipeline's sub-pipeline optimizations. The sub-pipeline inside $lookup can be a full aggregation pipeline — you can $sort, $limit, and $match inside it to control the joined data volume.

pitfalls-examples.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
// Pitfall 3: Silent document loss with $lookup + $unwind
db.orders.aggregate([
  { $lookup: {
      from: 'customers',
      localField: 'customerId',
      foreignField: '_id',
      as: 'customer'
  }},
  // If customerId doesn't exist in customers, customer is []
  // $unwind without preserveNullAndEmptyArrays removes the order!
  { $unwind: '$customer' }  // orders without customers vanish
]);

// Fix: use preserveNullAndEmptyArrays: true or handle missing
db.orders.aggregate([
  { $lookup: { ... } },
  { $unwind: { path: '$customer', preserveNullAndEmptyArrays: true } }
]);

// Pitfall: $group on high-cardinality field causing memory issues
db.events.aggregate([
  { $group: { _id: '$userId', count: { $sum: 1 } } }  // if 10M users, 10M buckets
]); // likely to hit memory limit
// Fix: use $bucket to group into ranges
db.events.aggregate([
  { $bucket: { groupBy: '$userId', boundaries: [0, 1000, 10000, 100000], default: 'Other', output: { count: { $sum: 1 } } } }
]);

// Pitfall 5: overusing $addFields
db.orders.aggregate([
  { $addFields: { totalWithDiscount: { $multiply: ['$total', 0.9] } } },
  { $group: { _id: null, total: { $sum: '$totalWithDiscount' } } }
]);
// Better: inline computation in $group { $sum: { $multiply: ['$total', 0.9] } }
Memory for $facet
Each sub-pipeline in $facet can use up to 100MB separately, but the overall pipeline may still hit limits. Use explain to track memory per sub-pipeline.
Production Insight
A $facet with three sub-pipelines: total sales (8s), others (100ms).
All waited for the slowest — splitting didn't help wall-clock but improved perceived speed.
Moral: $facet hides parallelism — return partial results separately.
Real fix: run two aggregations in parallel and merge results in application code.
Key Takeaway
$lookup is expensive — reduce input before it.
$unwind can silently drop docs — check preserveNullAndEmptyArrays.
$facet blocks until all sub-pipelines finish — split for partial results.
High-cardinality $group needs $bucket or allowDiskUse.
When to Use $facet vs. Separate Aggregations
IfSub-pipelines share the same initial $match and are fast (<10ms each)
UseUse $facet — saves one full collection scan.
IfOne sub-pipeline is slow (sort/group on large data)
UseSplit into two separate aggregations — $facet will block all until the slowest completes.
IfSub-pipelines require different initial filters
UseAlways split — $facet can only have one initial input.

Aggregation in Sharded Clusters

When your collection is sharded, aggregation pipelines run across multiple shards. The mongos router merges results from each shard. This adds complexity and performance considerations that are easy to overlook.

First, $lookup across shards: if the local collection is sharded but the foreign collection is not, the foreign collection is scanned on every shard. If both are sharded, MongoDB can optimise by targeting only shards that contain matching documents, but only if the local field used in $lookup is the shard key. If not, the $lookup broadcasts to all shards.

Second, $merge and $out are your friends for large pipelines. They write the results to a collection, avoiding the memory limit and allowing incremental processing. Use $merge to periodically persist intermediate results in multi-step ETL workflows.

Third, the order of stages crossing shard boundaries matters. $match and $project are pushed down to each shard (if they don't depend on merged data). $group and $sort are split into two phases: one per shard, then a merge on mongos. This can lead to duplicate scanning if not careful.

A common production mistake: running a large $group across shards without pre-filtering with $match. Each shard processes its entire data set, then mongos merges and re-groups. You can often push a condition down by adding a $match stage that uses the shard key.

Let's see an example that uses $merge to store intermediate results and avoid memory pressure.

Another critical point: the merge stage ($merge) runs on the primary shard of the output collection. If the output collection is also sharded, the merge stage's performance depends on that shard's capacity. For very large outputs, consider using $out with a sharded output collection strategy.

Also, $sort after $group in a sharded pipeline: each shard sorts its own partial results, then mongos does a merge sort. For top-k queries, you can use $sort + $limit on each shard to reduce data sent to mongos. If you omit $limit, all partial results are sent — potentially huge network payloads.

sharded-aggregation.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// Aggregation over sharded orders collection with $merge
db.orders.aggregate([
  { $match: { shardKey: { $gte: 1000, $lt: 2000 } } },  // pushed to shard
  { $group: { _id: '$region', total: { $sum: '$amount' } } },
  { $sort: { total: -1 } },
  { $merge: { into: 'regional_totals', on: '_id', whenMatched: 'replace', whenNotMatched: 'insert' } }
]);

// After merge, you can query the output collection
// db.regional_totals.find().sort({total:-1});

// Alternative with $out (replaces collection)
db.orders.aggregate([
  { $match: { year: 2026 } },
  { $group: { _id: '$product', totalSales: { $sum: '$amount' } } },
  { $out: 'product_totals_2026' }
]);

// Avoid $lookup across shards if possible. Use denormalization or separate pipelines.
Sharded $lookup broadcast
If the local field in $lookup is not the shard key, the pipeline broadcasts to every shard. For high-volume systems, this can overwhelm the network. Prefer to denormalize or perform the lookup after merging on mongos.
Production Insight
A $group across shards without early $match caused a 30-minute aggregation.
Each shard scanned its entire partition, then mongos merged—duplicating work.
Added $match on shard key to limit input per shard; runtime dropped to 4 minutes.
Real fix: always push $match with shard key before $group in sharded environments.
Key Takeaway
$lookup across shards is expensive — avoid unless necessary.
$merge writes results to collection, reduces memory pressure.
Test aggregation performance with realistic shard distribution.
Push $match with shard key early to limit per-shard scanning.
Sharded aggregation strategy
IfLocal collection is sharded; need $lookup
UseEnsure foreign field matches shard key or expect broadcast. Consider denormalization.
IfLarge dataset with multiple $group stages
UseUse $merge to store intermediate results, reducing memory pressure and allowing incremental processing.
IfNeed to sort aggregated results across shards
UseUse $sort after $group; mongos will merge sorted results. Add $limit before $sort if possible.

Expressions and Operators: The Engine That Makes Pipelines Powerful

Most devs treat stages like $group and $project as black boxes. That's fine until you need to transform data mid-stream. Expressions and operators are what let you compute, reshape, and filter inside any stage. You don't call a function—you pass an expression object that the pipeline evaluates per document.

Operators fall into three buckets: arithmetic ($add, $multiply), date ($year, $dateToString), and conditional ($cond, $ifNull). The bottleneck is always understanding that expressions are lazy—they resolve when the stage executes, not when you type them. That matters when you're chaining $lookup results across collections.

Production reality: your most expensive pipeline stage might not be $lookup. It's a $addFields that recalculates a date field on 10 million documents because someone used $toDate on a string instead of storing native dates. Operators don't optimize themselves. If you can push date arithmetic into a $match stage with an index, do it. Every expression costs CPU cycles. Profile first, optimize second.

ExpressionBasedRevenueCalculation.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// io.thecodeforge — database tutorial

// Real scenario: compute daily revenue from order items with conditional discount
var pipeline = [
  { $match: { orderDate: { $gte: ISODate("2025-01-01") } } }, // index on orderDate
  { $unwind: "$items" },
  {
    $addFields: {
      // Use $cond to apply discount only if promoCode exists
      effectiveUnitPrice: {
        $cond: {
          if: { $ne: ["$items.promoCode", null] },
          then: { $multiply: ["$items.unitPrice", 0.9] },  // 10% off
          else: "$items.unitPrice"
        }
      },
      // Convert ISO date to YYYY-MM-DD string for grouping
      dayStr: { $dateToString: { format: "%Y-%m-%d", date: "$orderDate" } }
    }
  },
  {
    $group: {
      _id: "$dayStr",
      totalRevenue: { $sum: { $multiply: ["$effectiveUnitPrice", "$items.quantity"] } },
      orderCount: { $sum: 1 }
    }
  },
  { $sort: { _id: 1 } },
  { $project: { _id: 0, date: "$_id", totalRevenue: 1, orderCount: 1 } }
];

db.orders.aggregate(pipeline);
Output
{ "date": "2025-01-01", "totalRevenue": 45920.50, "orderCount": 1234 }
{ "date": "2025-01-02", "totalRevenue": 38215.30, "orderCount": 1098 }
Production Trap:
Don't use $addFields to transform fields you could filter out earlier with $match. Operators run on every surviving document—reduce the set first.
Key Takeaway
Expressions are per-document functions. Use them to compute once, not to filter after.

Update Documents Using an Aggregation Pipeline — No, Really, It Works

You already know db.collection.find() and db.collection.update() are separate. But since MongoDB 4.2, you can run an aggregation pipeline inside an update operation. This lets you transform documents based on their own data without pulling them into your app and writing back. Why do this? Atomicity. One command, one lock on the document, zero race conditions.

The catch: you're limited to stages that don't break document structure. No $unwind, no $group. You need $set, $unset, $addFields, $project (to exclude fields), and conditionals. Think $lookup? Nope—same restriction. This is for in-document transformations, not cross-collection magic.

Production use cases: adding computed fields for caching (avoid expensive reads later), normalizing nested arrays, or coalescing null fields to defaults. You also sidestep the dreaded findAndModify round-trip for bulk updates. I've seen teams cut an hour-long batch job to 90 seconds by switching to this pattern. Just remember—validate on a backup first. You can't undo a bad aggregation update across 500K docs.

AggregationBasedDocumentUpdate.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// io.thecodeforge — database tutorial

// Add a 'shippingCost' field based on order total, atomic update
var updatePipeline = [
  {
    $set: {
      // $switch evaluates first match; no need for nested $cond
      shippingTier: {
        $switch: {
          branches: [
            { case: { $lte: ["$totalAmount", 25] }, then: "standard" },
            { case: { $lte: ["$totalAmount", 100] }, then: "expedited" }
          ],
          default: "priority"
        }
      },
      // Flat shipping cost based on tier, computed here not in application code
      shippingCost: {
        $switch: {
          branches: [
            { case: { $eq: ["$shippingTier", "standard"] }, then: 4.99 },
            { case: { $eq: ["$shippingTier", "expedited"] }, then: 9.99 }
          ],
          default: 19.99
        }
      },
      // Unset the raw address fields now that we have structured data
      tempAddress: "$$REMOVE"
    }
  }
];

db.orders.updateMany(
  { status: "pending", fulfillmentWarehouse: { $exists: false } },
  updatePipeline
);
Output
{ "acknowledged": true, "matchedCount": 15000, "modifiedCount": 14987 }
Senior Shortcut:
Use $unset inside $set with $$REMOVE to delete fields in the same pass. Two birds, one aggregation stage, zero additional I/O.
Key Takeaway
Aggregation pipelines in update commands give you atomic, server-side transformation. No round trips, no app logic, no race conditions.

Stage Typology and Functional Categories

Every pipeline stage falls into one of three functional categories that dictate how it processes documents and interacts with the engine. Source stages, like $match and $sort, reduce or reorder the document set early, often leveraging indexes to skip full collection scans. Transformation stages—$project, $addFields, $unwind—reshape each document without altering cardinality. Aggregation stages, led by $group, $bucket, and $facet, collapse multiple documents into computed results, typically requiring all data in memory. Distinguishing these categories matters because source stages should run as early as possible to limit document flow, transformation stages are memory-cheap but can explode cardinality (think $unwind on arrays), and aggregation stages trigger the 100MB memory limit. Prioritize source stages over aggregation stages for performance. Split one large aggregation stage into multiple fine-grained stages to improve index usage and reduce spillover to disk.

StageCategories.sqlSQL
1
2
3
4
5
6
7
8
9
10
// io.thecodeforge — database tutorial

// Source stage: $match uses index, reduces docs early
db.orders.aggregate([
  { $match: { status: "shipped" } },
  // Transformation stage: reshape without adding docs
  { $addFields: { total: { $multiply: ["$price", "$qty"] } } },
  // Aggregation stage: collapses into computed results
  { $group: { _id: null, totalRevenue: { $sum: "$total" } } }
])
Output
{ "_id": null, "totalRevenue": 152340.50 }
Production Trap:
Running $unwind on a large array before $match multiplies documents first, blowing up memory. Always filter source before you unwind.
Key Takeaway
Run source stages first, then transformations, then aggregations—order directly impacts performance.

Multidimensional Analytics with $facet

When you need multiple independent aggregations on the same input document set—like sales totals per region, per product category, and per time bucket—a single $facet stage replaces several separate queries. $facet accepts a document of named sub-pipelines. Each sub-pipeline operates on the same input stream, but they run in parallel within the same stage. This eliminates redundant collection scans and reduces network round trips. Yet $facet forces every sub-pipeline to share the memory limit (default 100MB). If one sub-pipeline is a $sort + $group behemoth, it can starve the others and spill to disk. Use $facet when sub-pipelines are lightweight and balanced. For heavy aggregations, prefer separate pipeline executions with indexed $match filters. Never nest $facet inside $facet—MongoDB forbids it and throws a parse error. The output is a single document with keys matching your sub-pipeline names, each containing an array of results.

FacetAnalytics.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// io.thecodeforge — database tutorial

db.sales.aggregate([
  { $match: { date: { $gte: ISODate("2024-01-01") } } },
  {
    $facet: {
      byRegion: [
        { $group: { _id: "$region", revenue: { $sum: "$amount" } } },
        { $sort: { revenue: -1 } }
      ],
      byCategory: [
        { $group: { _id: "$category", count: { $sum: 1 } } }
      ],
      timeline: [
        { $group: {
          _id: { $dateToString: { format: "%Y-%m", date: "$date" } },
          total: { $sum: "$amount" }
        }},
        { $sort: { _id: 1 } }
      ]
    }
  }
])
Output
{
"byRegion": [ { "_id": "EMEA", "revenue": 55000 } ],
"byCategory": [ { "_id": "Electronics", "count": 320 } ],
"timeline": [ { "_id": "2024-01", "total": 12000 } ]
}
Production Trap:
One heavy sub-pipeline inside $facet (like $group with $sort) can exceed the 100MB memory limit, causing all sub-pipelines to fail or spill to disk.
Key Takeaway
Use $facet for parallel, lightweight aggregations; separate heavy ones into distinct pipelines to avoid memory starvation.

What Is the MongoDB Aggregation Pipeline?

The MongoDB Aggregation Pipeline is a framework for processing data through a sequence of stages, each transforming documents as they pass through. Think of it as an assembly line: raw documents enter at the first stage, get filtered, grouped, sorted, reshaped, or computed, and the final stage outputs the result. Each stage receives input from the previous stage, applies an operation (like $match, $group, or $project), and passes the transformed documents forward. This makes the pipeline radically different from simple find queries — it enables multi-step data processing, joins across collections via $lookup, computed fields, window functions, and entire ETL workflows inside the database. Introduced in MongoDB 2.2 and heavily optimized since, the pipeline runs inside the mongod process, leveraging indexes and parallelism when possible. It is the primary tool for analytics, reporting, data cleaning, and complex transformations, replacing map-reduce in most modern MongoDB deployments. Understanding the pipeline's sequential nature is crucial: stage order directly impacts performance, memory usage, and correctness. A misordered pipeline can explode in memory or return wrong counts, making mastery of its mechanics essential for production-grade systems.

Basics.sqlSQL
1
2
3
4
5
6
7
// io.thecodeforge — database tutorial
// Minimal pipeline example: filter, then group by status
use("sales");
db.orders.aggregate([
  { $match: { date: { $gte: ISODate("2024-01-01") } } },
  { $group: { _id: "$status", totalAmount: { $sum: "$amount" } } }
]);
Output
{ _id: 'shipped', totalAmount: 54200 }
{ _id: 'pending', totalAmount: 12350 }
Production Trap:
Placing $match late in the pipeline forces every stage to process all documents, exploding memory and CPU. Always filter early.
Key Takeaway
Move $match and $limit to the earliest possible stage to reduce document throughput and avoid memory blowouts.

Why Not Just Use find() or mapReduce?

While find() handles simple filtering and projection, it cannot perform multi-stage transformations, groupings, or joins without application-side logic. mapReduce, though powerful, suffers from single-threaded JavaScript execution, poor index utilization, and no streaming output — it materializes intermediate collections, crippling real-time performance. The aggregation pipeline attacks these limitations head-on. It runs natively in C++, uses indexes for $match and $sort, streams documents between stages without writing to disk (unless spill-to-disk is triggered), and supports compound operations like $lookup for foreign joins, $bucket for histogram creation, and $facet for parallel sub-pipelines. Execution is declarative: you describe what you want, and the optimizer reorders stages, chooses index-accelerated plans, and parallelizes across shards. For example, a $group stage can compute sums, averages, and counts in a single pass, whereas equivalent find() logic would require multiple round trips or mapReduce overhead. When you need to transform data beyond column selection — like pivoting, joining, or time-series windowing — the aggregation pipeline is the only native path that scales without external processing. It is also the foundation for MongoDB's change streams and Atlas Search, making it the unifying data processing primitive.

Comparison.sqlSQL
1
2
3
4
5
6
7
// io.thecodeforge — database tutorial
// Inefficient: multiple queries vs. one pipeline
db.sales.find({ region: "EU" });
db.sales.aggregate([
  { $match: { region: "EU" } },
  { $group: { _id: "$product" } }
]);
Output
Pipeline: 1 round trip, index used, result: 5 groups
Why It Matters:
mapReduce is deprecated from MongoDB 5.0. The aggregation pipeline is the only supported path for complex transformations going forward.
Key Takeaway
Replace mapReduce with aggregation pipeline for all new development; it's faster, indexed, and the only supported transformation tool in modern MongoDB.
● Production incidentPOST-MORTEMseverity: high

The Unbounded $lookup That Brought Down Reporting

Symptom
Aggregation pipeline taking >45 minutes, eventually hitting 60-minute timeout. Dashboard showing stale data.
Assumption
The foreign collection was small enough that an index wasn't necessary.
Root cause
$lookup without index on the foreign field caused a full collection scan for every input document. With 50k input docs and 500k foreign docs, that's 25 billion comparisons.
Fix
Created an index on the foreign key field used in $lookup. Also added $limit at pipeline start to cap processing to last 30 days of data.
Key lesson
  • Always index foreign key fields used in $lookup.
  • Use explain() to verify index usage before deploying.
  • Set realistic timeouts with maxTimeMS() to fail fast.
  • Profile long-running aggregations in staging with production-like data volume.
Production debug guideSymptom → Action grid for common aggregation performance issues5 entries
Symptom · 01
Aggregation takes seconds or minutes to complete
Fix
Run explain('executionStats') to identify stage with most docs processed. Look for COLLSCAN in inputStage.
Symptom · 02
Aggregation fails with 'Exceeded memory limit' error
Fix
Add { allowDiskUse: true } to enable disk-based sorting and grouping. Consider adding $match early to reduce data volume.
Symptom · 03
$lookup is slow
Fix
Check if foreign field has an index. Verify $lookup pipeline uses indexed fields. Use indexed local and foreign fields.
Symptom · 04
Aggregation returns wrong results or duplicates
Fix
Test each stage in isolation by removing subsequent stages. Check for $unwind causing document multiplication.
Symptom · 05
$facet sub-pipeline blocks others
Fix
Run explain() on each sub-pipeline independently. If one sub-pipeline is much slower, split into separate aggregations.
★ Aggregation Pipeline Quick Debug CheatsheetFive common aggregation failures and the exact commands to diagnose them.
Pipeline times out or hangs
Immediate action
Terminate with db.killOp(opid). Get opid from currentOp().
Commands
db.currentOp({ 'command.aggregate': 'collectionName' })
db.collection.aggregate(pipeline).maxTimeMS(10000).explain('executionStats')
Fix now
Add maxTimeMS() to pipeline. Add index on foreign key for $lookup. Reduce input with earlier $match.
Memory limit exceeded error+
Immediate action
Retry with allowDiskUse: true but monitor disk I/O.
Commands
db.collection.aggregate(pipeline, { allowDiskUse: true })
Check pipeline for $sort and $group stages consuming memory. Use $match before $group.
Fix now
Restructure pipeline: $match early, use indexes, consider $bucket or $facet to reduce data.
Aggregation returns multiple identical documents+
Immediate action
Examine $lookup and $unwind for cardinality explosion.
Commands
Comment out $unwind and $lookup stages one by one to isolate.
Use $lookup's pipeline stage to filter before $unwind.
Fix now
Use $unwind with preserveNullAndEmptyArrays: false if duplicates come from nulls. For many matches, consider $lookup with aggregation pipeline to limit returned docs.
$lookup returns unexpected documents+
Immediate action
Examine pipeline in $lookup's pipeline parameter.
Commands
db.collection.aggregate([...]).explain('executionStats')
Test $lookup in isolation with sample data.
Fix now
Add $match inside $lookup pipeline to filter before join. Ensure index on foreign field.
$unwind causes document loss with null arrays+
Immediate action
Check $unwind options.
Commands
db.collection.aggregate(pipeline).itcount() vs with preserveNullAndEmptyArrays: true
Use $unwind with preserveNullAndEmptyArrays: true to see if documents appear.
Fix now
Set preserveNullAndEmptyArrays: true to keep documents with null or empty arrays.
Aggregation Pipeline vs. find() vs. MapReduce
FeatureAggregation Pipelinefind()MapReduce (deprecated)
Data transformationRich stages for reshape, group, joinOnly projection and simple field selectionJavaScript code, full flexibility
Index usageOnly at start of pipelineFull index supportRequires manual index management
PerformanceOptimized C++ engine, streamingMost efficient for simple queriesSingle-threaded JavaScript, slow
Memory limit100MB per stage (configurable with allowDiskUse)No effective limit (cursor based)As per JavaScript heap
Use caseComplex analytics, joins, aggregationsSimple CRUD, filtering with logicLegacy systems only

Key takeaways

1
You now understand what MongoDB Aggregation Pipeline is and why it exists.
2
You've seen it working in a real runnable example.
3
Practice daily
the forge only works when it's hot 🔥
4
Stage order is decisive
early $match before any transformation.
5
Always index foreign fields in $lookup
or pay O(n*m).
6
Use explain('executionStats') on every pipeline before deploying.
7
100MB memory limit is per blocking stage, not per pipeline.

Common mistakes to avoid

7 patterns
×

Memorising syntax before understanding the concept

Symptom
You can recall syntax but cannot explain why the pipeline works or debug failures in production.
Fix
Focus on understanding the execution model and data flow. Write pipelines from scratch to internalise concepts.
×

Skipping practice and only reading theory

Symptom
During interviews or debugging, you cannot construct correct aggregation stages because you've never typed them out.
Fix
Set up a local MongoDB instance or use MongoDB Atlas free tier. Build pipelines with real data (e.g., sample_mflix dataset).
×

Putting $sort before $match

Symptom
Aggregation much slower than expected, hits memory limit or times out.
Fix
Swap order: $match first to reduce documents, then $sort. Check explain to verify index usage.
×

Forgetting to index foreign fields used in $lookup

Symptom
$lookup stage consumes massive time, full collection scans.
Fix
Create an index on the foreignField in the target collection. Use explain to confirm IXSCAN.
×

Using allowDiskUse without understanding trade-offs

Symptom
Aggregation runs but is unexpectedly slow, disk I/O spikes.
Fix
Restructure pipeline to avoid disk spill: add early $match, reduce document size with $project, use indexes. Use allowDiskUse only as temporary fix.
×

Ignoring $facet sub-pipeline blocking

Symptom
$facet returns all results only after the slowest sub-pipeline completes, causing perceived latency.
Fix
If one sub-pipeline is significantly slower, split into separate aggregations. Or use $facet only when sub-pipelines are balanced and fast.
×

Using $addFields unnecessarily before $group

Symptom
Pipeline runs slower than expected, high CPU usage.
Fix
Inline computations inside $group accumulators where possible. Use $project to trim fields before $group.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How does MongoDB optimize an aggregation pipeline? Give an example of an...
Q02SENIOR
Explain the 100MB memory limit in aggregation. How can you work around i...
Q03SENIOR
What happens when you $unwind an array with null? How do you prevent los...
Q04SENIOR
How does $lookup perform when the joined field is not unique? What are t...
Q05SENIOR
What is the difference between $lookup with a pipeline and a simple loca...
Q06SENIOR
How does the query planner handle $facet? What are the memory implicatio...
Q01 of 06SENIOR

How does MongoDB optimize an aggregation pipeline? Give an example of an automatic optimization and one that must be done manually.

ANSWER
Automatic: $match coalescence — multiple $match stages are merged into one, reducing stage overhead. Also $sort + $limit triggers a top-k sort algorithm. Manual: $match after $lookup is not pushed inside the lookup pipeline; you must move that filter into the $lookup's pipeline parameter explicitly.
FAQ · 6 QUESTIONS

Frequently Asked Questions

01
What is MongoDB Aggregation Pipeline in simple terms?
02
How do I avoid the 100MB memory limit in aggregation?
03
Why is my $lookup so slow and how do I fix it?
04
Why does my aggregation return no documents after $lookup and $unwind?
05
Can I use $facet to run aggregations in parallel?
06
How do I check if my aggregation used an index?
N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

Follow
Verified
production tested
May 24, 2026
last updated
1,554
articles · all by Naren
🔥

That's NoSQL. Mark it forged?

24 min read · try the examples if you haven't

Previous
MongoDB CRUD Operations
4 / 15 · NoSQL
Next
MongoDB Indexing