MongoDB Basics Explained — Documents, Collections and Queries That Actually Make Sense
Every app you use daily — from your food delivery tracker to your social media feed — stores data somewhere. Relational databases like PostgreSQL are brilliant when your data is predictable and heavily interconnected. But the moment your data gets irregular, deeply nested, or needs to scale horizontally across dozens of servers, SQL starts fighting you. That's the real world MongoDB was built for.
MongoDB solves a specific, painful problem: storing data that doesn't fit neatly into rows and columns. A product in an e-commerce store might have two attributes or twenty. A user profile on one platform needs a bio field; on another it needs a portfolio array. Forcing that variety into a rigid table schema means either wasting columns, creating awkward join tables, or writing painful migration scripts every time requirements change. MongoDB lets the data own its shape.
By the end of this article you'll understand not just how to run MongoDB CRUD commands, but WHY the document model exists, WHEN to choose it over SQL, how to design collections that won't haunt you later, and the query patterns that show up in production systems every day. You'll also walk away knowing exactly what to say when an interviewer asks you to compare MongoDB to a relational database.
The Document Model — Why JSON-Like Storage Changes Everything
In a relational database, a 'user' lives across multiple tables: basic info in users, addresses in user_addresses, preferences in user_settings. To reconstruct one complete user, you JOIN three tables. That JOIN is fast when your dataset fits on one server. Spread the data across ten servers, though, and each JOIN turns into network round-trips, which are slow.
MongoDB stores that entire user as a single document. One read, no joins. The document is stored in BSON (Binary JSON) format internally, which means it supports richer types than plain JSON — things like native Date objects and 64-bit integers without string conversion hacks.
Every document lives inside a collection. A collection is roughly equivalent to a SQL table, but it enforces no schema by default. Two documents in the same collection can have completely different fields. This isn't chaos — it's intentional flexibility. You're trading schema enforcement at the database level for schema ownership at the application level.
This matters because in fast-moving products, your schema changes weekly. With MongoDB, you add a new field to new documents without touching old ones, and your app just handles the absence gracefully. No ALTER TABLE. No downtime. No migration script that runs for three hours on a 50-million-row table.
```javascript
// Run this in mongosh. (With the Node.js driver you'd use client.db('ecommerce_store')
// instead of use(); the insert and query syntax below is otherwise identical.)

// --- Step 1: Switch to (or create) our working database ---
use('ecommerce_store');

// --- Step 2: Insert two product documents with intentionally DIFFERENT shapes ---
// Notice: the mug has no variants, but the t-shirt does.
// In SQL you'd need a separate 'variants' table. Here it's just an array inside the doc.
db.products.insertMany([
  {
    // A simple product — flat structure, no variants needed
    name: 'Ceramic Coffee Mug',
    sku: 'MUG-001',
    price_usd: 12.99,
    stock_count: 150,
    category: 'kitchenware',
    tags: ['ceramic', 'handmade', 'dishwasher-safe'],
    created_at: new Date('2024-01-15')
  },
  {
    // A complex product — has nested variants (size + colour combos)
    // This would require 3 tables in SQL. Here it's one document.
    name: 'Custom Logo T-Shirt',
    sku: 'TSH-042',
    base_price_usd: 24.99,
    category: 'apparel',
    tags: ['cotton', 'customizable', 'unisex'],
    variants: [
      { size: 'S', color: 'black', additional_stock: 80 },
      { size: 'M', color: 'black', additional_stock: 120 },
      { size: 'L', color: 'white', additional_stock: 60 }
    ],
    customization_options: {
      max_logo_size_cm: 10,
      allowed_positions: ['chest', 'back', 'sleeve']
    },
    created_at: new Date('2024-03-22')
  }
]);

// --- Step 3: Query all products in the 'apparel' category ---
const apparelProducts = db.products.find(
  { category: 'apparel' },               // filter: only apparel
  { name: 1, base_price_usd: 1, _id: 0 } // projection: only return name and price
).toArray();

console.log('Apparel products found:', JSON.stringify(apparelProducts, null, 2));
```
```
Apparel products found: [
  {
    "name": "Custom Logo T-Shirt",
    "base_price_usd": 24.99
  }
]
```
CRUD in the Real World — Beyond the Basic Insert and Find
Most tutorials show you insertOne, findOne, updateOne and deleteOne in isolation. That's fine for syntax, but it hides the decisions you'll actually make in production. Let's walk through a realistic user-account lifecycle — creating a user, enriching their profile over time, querying by nested fields, and cleaning up — because that mirrors what your application code will actually do.
The key update operator to understand deeply is $set. It's not a full document replacement; it surgically modifies only the fields you name. Compare that to passing a bare object to replaceOne, which overwrites the entire document. Using the wrong one is a classic production bug that silently deletes data.
For queries, the filter object mirrors the document shape. Want to query a nested field? Use dot-notation: 'address.city': 'Austin'. Want to check if an array contains a value? Just pass the value directly — MongoDB checks for membership automatically. Want all users who joined in the last 30 days? Use $gte on the date field. These patterns are used in virtually every MongoDB-backed app.
```javascript
use('saas_platform');

// ─────────────────────────────────────────
// CREATE — Register a new user
// ─────────────────────────────────────────
const insertResult = db.users.insertOne({
  email: 'priya.sharma@example.com',
  display_name: 'Priya Sharma',
  hashed_password: '$2b$12$exampleHashedPasswordHere',
  plan: 'free',
  address: { city: 'Mumbai', country: 'IN' },
  permissions: ['read', 'comment'],
  joined_at: new Date(),
  last_login: null // null is valid — she hasn't logged in yet
});
console.log('Inserted ID:', insertResult.insertedId);
// Inserted ID: ObjectId('664a1f...')

// ─────────────────────────────────────────
// READ — Find users in Mumbai on the free plan
// Dot-notation queries nested fields cleanly
// ─────────────────────────────────────────
const mumbaiFreeUsers = db.users.find(
  {
    'address.city': 'Mumbai', // dot-notation for nested field
    plan: 'free'
  },
  { email: 1, display_name: 1, _id: 0 } // only return these fields
).toArray();
console.log('Mumbai free-plan users:', mumbaiFreeUsers);

// ─────────────────────────────────────────
// UPDATE — Priya upgrades to 'pro' and gets new permissions
// $set only touches the named fields — everything else is UNTOUCHED
// $push appends to the permissions array without overwriting it
// ─────────────────────────────────────────
const updateResult = db.users.updateOne(
  { email: 'priya.sharma@example.com' }, // filter
  {
    $set: { plan: 'pro', last_login: new Date() },
    $push: { permissions: 'write' } // appends 'write' to her permissions array
  }
);
console.log('Documents modified:', updateResult.modifiedCount);

// ─────────────────────────────────────────
// VERIFY — Confirm her updated document
// ─────────────────────────────────────────
const updatedUser = db.users.findOne(
  { email: 'priya.sharma@example.com' },
  { email: 1, plan: 1, permissions: 1, _id: 0 }
);
console.log('Updated user:', JSON.stringify(updatedUser, null, 2));

// ─────────────────────────────────────────
// DELETE — Remove a test/spam account
// deleteOne removes the FIRST match only
// ─────────────────────────────────────────
const deleteResult = db.users.deleteOne({ email: 'spam-bot@junk.io' });
console.log('Deleted count:', deleteResult.deletedCount);
// Deleted count: 1 (or 0 if the email didn't exist — it won't throw an error)
```
```
Mumbai free-plan users: [ { email: 'priya.sharma@example.com', display_name: 'Priya Sharma' } ]
Documents modified: 1
Updated user: {
  "email": "priya.sharma@example.com",
  "plan": "pro",
  "permissions": ["read", "comment", "write"]
}
```
Indexes and Schema Design — The Two Decisions That Make or Break Performance
A MongoDB collection with no indexes is a filing cabinet where every search requires opening every folder. That's fine at 100 documents. It's catastrophic at 10 million. An index is a sorted shortcut: MongoDB builds and maintains a separate data structure that maps field values to document locations, so it can jump straight to what you need.
The golden rule: create an index on every field you filter or sort by in production queries. MongoDB's explain('executionStats') method is your best friend here — it tells you whether a query used an index (IXSCAN) or did a full collection scan (COLLSCAN). Never ship a feature without running explain on its query.
Schema design in MongoDB comes down to one core question: do you embed or reference? Embed related data (like an order's line items) inside the parent document when you always read them together and the nested data belongs to one parent only. Use references (storing an ObjectId that points to another collection) when the related data is shared, huge, or needs to be queried independently. The wrong choice at design time causes either massive documents that time out, or a barrage of extra round-trips that kill performance.
```javascript
use('saas_platform');

// ─────────────────────────────────────────
// INDEXES — Create targeted indexes for known query patterns
// ─────────────────────────────────────────

// Single-field index: we frequently filter users by email (login lookups)
db.users.createIndex(
  { email: 1 },    // 1 = ascending sort order
  { unique: true } // enforces no two users share an email at DB level
);

// Compound index: we filter by plan AND sort by joined_at on the admin dashboard
// The ORDER of fields in a compound index matters — put the equality filter first
db.users.createIndex({ plan: 1, joined_at: -1 }); // -1 = descending (newest first)

// Text index: enables full-text search on product names and descriptions
db.products.createIndex({ name: 'text', description: 'text' });

// ─────────────────────────────────────────
// EXPLAIN — Verify a query USES an index, not a COLLSCAN
// Run this BEFORE shipping a new query to production
// ─────────────────────────────────────────
const queryPlan = db.users.find({ plan: 'pro' })
  .sort({ joined_at: -1 })
  .explain('executionStats');

// What to look for in the output:
console.log('Stage used:', queryPlan.queryPlanner.winningPlan.inputStage.stage);
// Good output: 'IXSCAN' (index scan — fast)
// Bad output:  'COLLSCAN' (collection scan — reads every document)
console.log('Docs examined:', queryPlan.executionStats.totalDocsExamined);
console.log('Docs returned:', queryPlan.executionStats.nReturned);
// If totalDocsExamined >> nReturned, your index is wrong or missing

// ─────────────────────────────────────────
// SCHEMA DESIGN: Embed vs Reference example
// ─────────────────────────────────────────

// EMBED approach — Order stores its own line items
// Use when: you always load the order WITH its items, items belong to one order only
db.orders.insertOne({
  order_number: 'ORD-20240312-001',
  customer_id: ObjectId('664a1f3b2c1d4e5f6a7b8c9d'), // reference to users collection
  status: 'shipped',
  placed_at: new Date('2024-03-12'),
  line_items: [
    // Embedded — no separate 'order_items' collection needed
    { sku: 'MUG-001', product_name: 'Ceramic Coffee Mug', qty: 2, unit_price_usd: 12.99 },
    { sku: 'TSH-042', product_name: 'Custom Logo T-Shirt', qty: 1, unit_price_usd: 24.99 }
  ],
  total_usd: 50.97
});

// REFERENCE approach — Blog post stores author as an ObjectId, not embedded name
// Use when: the author exists independently and may write many posts
db.blog_posts.insertOne({
  title: 'Getting Started with MongoDB Indexes',
  slug: 'mongodb-indexes-guide',
  author_id: ObjectId('664a1f3b2c1d4e5f6a7b8c9d'), // reference — not embedded
  // If we embedded author details, updating the author name would require
  // updating EVERY post they wrote. With a reference, update once in 'users'.
  body: 'Indexes are the single biggest performance lever in MongoDB...',
  published_at: new Date('2024-04-01'),
  tags: ['mongodb', 'performance', 'indexing']
});

console.log('Schema examples inserted successfully.');
```
```
Stage used: IXSCAN
Docs examined: 43
Docs returned: 43
Schema examples inserted successfully.
```
Aggregation Pipelines — MongoDB's Answer to SQL GROUP BY and JOINs
The find() method takes you far, but the moment you need to summarize, group, reshape or join data across collections, you need the Aggregation Pipeline. Think of it as an assembly line: each stage receives documents from the previous stage, does one job, and passes the result forward.
The most-used stages are $match (filter, like WHERE), $group (aggregate, like GROUP BY), $sort, $project (reshape fields, like SELECT), and $lookup (join another collection). Always put $match first — it reduces the document set early so later stages process less data. Putting a $group before a $match is a performance mistake that processes the entire collection unnecessarily.
Aggregation pipelines are where MongoDB goes from 'handy' to 'genuinely powerful'. Real-world uses include daily revenue reports, cohort analysis by signup date, tag popularity rankings, and inventory summaries — all without pulling raw data into application memory to process it yourself.
```javascript
use('ecommerce_store');

// ─────────────────────────────────────────
// EXAMPLE 1: Top products by revenue for March 2024
// This replaces what would be a multi-JOIN GROUP BY query in SQL
// ─────────────────────────────────────────
const topProductsByRevenue = db.orders.aggregate([
  // Stage 1 — $match: only look at March 2024 orders (filters EARLY for performance)
  {
    $match: {
      placed_at: { $gte: new Date('2024-03-01'), $lt: new Date('2024-04-01') },
      status: { $in: ['shipped', 'delivered'] } // exclude cancelled orders
    }
  },

  // Stage 2 — $unwind: flatten the line_items array so each item becomes its own doc
  // Before unwind: 1 order doc with 3 line items
  // After unwind:  3 docs, one per line item, each carrying the parent order fields
  { $unwind: '$line_items' },

  // Stage 3 — $group: sum revenue and units sold per SKU
  {
    $group: {
      _id: '$line_items.sku', // group by SKU
      total_revenue: {
        $sum: {
          $multiply: ['$line_items.qty', '$line_items.unit_price_usd'] // qty × price
        }
      },
      total_units_sold: { $sum: '$line_items.qty' }
    }
  },

  // Stage 4 — $sort: highest revenue first
  { $sort: { total_revenue: -1 } },

  // Stage 5 — $limit: only show top 5 products
  { $limit: 5 },

  // Stage 6 — $project: rename _id to sku, round revenue to 2 decimal places
  {
    $project: {
      _id: 0,
      sku: '$_id',
      total_revenue: { $round: ['$total_revenue', 2] },
      total_units_sold: 1
    }
  }
]).toArray();

console.log('Top 5 products by March revenue:');
console.log(JSON.stringify(topProductsByRevenue, null, 2));

// ─────────────────────────────────────────
// EXAMPLE 2: $lookup — Join blog posts with their author's display name
// This is MongoDB's equivalent of a LEFT JOIN
// ─────────────────────────────────────────
const postsWithAuthors = db.blog_posts.aggregate([
  { $match: { published_at: { $gte: new Date('2024-01-01') } } },
  {
    $lookup: {
      from: 'users',           // the collection to join
      localField: 'author_id', // field in blog_posts
      foreignField: '_id',     // field in users
      as: 'author_details'     // result goes into this new array field
    }
  },
  // $lookup always returns an array — unwrap it since each post has one author
  { $unwind: '$author_details' },
  {
    $project: {
      _id: 0,
      title: 1,
      author_name: '$author_details.display_name', // pull up the nested name
      published_at: 1
    }
  }
]).toArray();

console.log('Posts with authors:', JSON.stringify(postsWithAuthors, null, 2));
```
```
Top 5 products by March revenue:
[
  { "sku": "TSH-042", "total_revenue": 749.70, "total_units_sold": 30 },
  { "sku": "MUG-001", "total_revenue": 519.60, "total_units_sold": 40 },
  { "sku": "HAT-007", "total_revenue": 389.25, "total_units_sold": 25 },
  { "sku": "BAG-019", "total_revenue": 299.00, "total_units_sold": 10 },
  { "sku": "PIN-003", "total_revenue": 89.55, "total_units_sold": 15 }
]
Posts with authors: [
  {
    "title": "Getting Started with MongoDB Indexes",
    "author_name": "Priya Sharma",
    "published_at": "2024-04-01T00:00:00.000Z"
  }
]
```
| Feature / Aspect | MongoDB (Document DB) | PostgreSQL (Relational DB) |
|---|---|---|
| Data shape | Flexible — each document can differ | Fixed — all rows must match table schema |
| Schema changes | Add fields to new docs with no migration | Requires ALTER TABLE — can lock the table |
| Joins | $lookup in aggregation pipeline (less natural) | Native JOIN — optimized and first-class |
| Horizontal scaling | Built-in sharding across multiple servers | Requires extensions like Citus or manual sharding |
| Transactions | Multi-document ACID transactions (v4.0+) | Full ACID transactions since day one |
| Query language | JSON-based filter objects + aggregation pipeline | Declarative SQL — widely known and portable |
| Best for | Variable-structure data, catalogs, content, IoT | Financial records, reporting, relational data |
| Nested data | First-class — embed arrays and objects naturally | Awkward — needs JSON columns or separate tables |
🎯 Key Takeaways
- MongoDB stores data as BSON documents inside collections — no rows, no fixed columns. Two documents in the same collection can have completely different fields, which is a feature, not a bug.
- Always use `$set` in `updateOne` calls unless you intend a full document replacement. A replacement-style write (`replaceOne`, or the legacy `update()` with a bare object) doesn't merge — it replaces, silently deleting every field you didn't include.
- Every field you filter or sort by in production needs an index. Use `explain('executionStats')` to confirm you're getting an IXSCAN before shipping any query. A missing index is invisible in development and catastrophic in production.
- The aggregation pipeline is MongoDB's SQL GROUP BY and JOIN equivalent — use `$match` as early as possible to reduce the working set, then `$group`, `$sort`, `$lookup` and `$project` to shape the final result.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Replacing a document when you meant to update it — reaching for `replaceOne` (or the legacy `update()` with a bare object) instead of `$set` — Symptom: the entire document is silently replaced, losing every field you didn't mention. No error is thrown. Fix: always structure your update as `{ $set: { fieldToChange: newValue } }` and reserve `replaceOne` for the rare case where you explicitly want a full document replacement.
- ✕ Mistake 2: Not creating indexes on filter fields before going to production — Symptom: queries work fine in development with 500 test documents, then crawl at 10+ seconds in production with 5M documents because every query does a COLLSCAN. Fix: run `db.collection.find(yourFilter).explain('executionStats')` and confirm the winning plan shows `IXSCAN`. Create compound indexes that match your most common query patterns before load testing.
- ✕ Mistake 3: Embedding unbounded arrays inside a document — Symptom: a 'post comments' array or 'chat messages' array grows until the 16MB document limit is hit, causing insert failures with `BSONObjectTooLarge` errors. Fix: if an array can grow without a fixed upper bound (say, beyond 100-200 items), reference it instead — store comments as separate documents in a `comments` collection with a `post_id` field, then query with `db.comments.find({ post_id: postId })`.
Interview Questions on This Topic
- Q: What's the difference between embedding and referencing in MongoDB schema design, and how do you decide which to use for a given relationship?
- Q: MongoDB is described as 'schema-less' — but experienced engineers say that's misleading. What do they mean, and how do you manage schema consistency in a real application?
- Q: If a MongoDB aggregation pipeline is running slowly on a large collection, what are the first three things you'd check or change to improve its performance?
Frequently Asked Questions
What is the difference between MongoDB and a SQL database?
MongoDB stores data as flexible JSON-like documents inside collections, while SQL databases store data in rigid tables with fixed columns and rows. MongoDB handles variable-structure data naturally and scales horizontally out of the box, but SQL databases offer more mature JOIN support and have been the gold standard for relational, transactional data for decades. The right choice depends on your data's shape and access patterns, not hype.
Does MongoDB support transactions like SQL databases do?
Yes, since version 4.0, MongoDB supports multi-document ACID transactions — meaning you can update multiple documents across multiple collections atomically, with full rollback on failure. The syntax uses startSession() and session.withTransaction(). That said, if your app requires frequent multi-document transactions, it's worth asking whether a relational database is a better fit, since MongoDB's transaction overhead is higher than single-document operations.
When should I embed data vs reference it with an ObjectId in MongoDB?
Embed when the nested data belongs exclusively to one parent document, you always load the parent and child together, and the array has a fixed upper bound in size (e.g., a product's 3-5 shipping options). Reference when the data is shared across many documents (like an author writing many posts), the sub-data needs to be queried independently, or the array could grow without limit (like post comments). Getting this decision right at design time saves enormous pain later.