
MongoDB Basics Explained — Documents, Collections and Queries That Actually Make Sense

In Plain English 🔥
Imagine your school keeps student records not in a giant shared spreadsheet (where every row must have the same columns), but in a filing cabinet full of individual folders. Each folder can hold whatever papers that student needs — some folders have report cards, others have medical notes, some have both. MongoDB is that filing cabinet. Each folder is a 'document', and the cabinet itself is a 'collection'. No two folders have to look the same, and you can find any folder instantly by its label.

Every app you use daily — from your food delivery tracker to your social media feed — stores data somewhere. Relational databases like PostgreSQL are brilliant when your data is predictable and heavily interconnected. But the moment your data gets irregular, deeply nested, or needs to scale horizontally across dozens of servers, SQL starts fighting you. That's the real world MongoDB was built for.

MongoDB solves a specific, painful problem: storing data that doesn't fit neatly into rows and columns. A product in an e-commerce store might have two attributes or twenty. A user profile on one platform needs a bio field; on another it needs a portfolio array. Forcing that variety into a rigid table schema means either wasting columns, creating awkward join tables, or writing painful migration scripts every time requirements change. MongoDB lets the data own its shape.

By the end of this article you'll understand not just how to run MongoDB CRUD commands, but WHY the document model exists, WHEN to choose it over SQL, how to design collections that won't haunt you later, and the query patterns that show up in production systems every day. You'll also walk away knowing exactly what to say when an interviewer asks you to compare MongoDB to a relational database.

The Document Model — Why JSON-Like Storage Changes Everything

In a relational database, a 'user' lives across multiple tables. Basic info in users, their addresses in user_addresses, their preferences in user_settings. To reconstruct one complete user, you JOIN three tables. That JOIN is fast when your dataset fits on one server. When it's spread across ten servers — suddenly it's a network call, and network calls are slow.

MongoDB stores that entire user as a single document. One read, no joins. The document is stored in BSON (Binary JSON) format internally, which means it supports richer types than plain JSON — things like native Date objects and 64-bit integers without string conversion hacks.

Every document lives inside a collection. A collection is roughly equivalent to a SQL table, but it enforces no schema by default. Two documents in the same collection can have completely different fields. This isn't chaos — it's intentional flexibility. You're trading schema enforcement at the database level for schema ownership at the application level.

This matters because in fast-moving products, your schema changes weekly. With MongoDB, you add a new field to new documents without touching old ones, and your app just handles the absence gracefully. No ALTER TABLE. No downtime. No migration script that runs for three hours on a 50-million-row table.
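Since the database no longer enforces shape, your application must own it. A minimal sketch of app-level defaulting on read, assuming a hypothetical user document (the field names here are illustrative, not from a real schema):

```javascript
// Documents written before a field existed simply lack it.
// The app normalizes on read instead of migrating old documents.
function normalizeUser(doc) {
  return {
    display_name: doc.display_name ?? 'Anonymous',
    // Field added in a later release: absent on older documents
    notification_prefs: doc.notification_prefs ?? { email: true, sms: false },
    tags: Array.isArray(doc.tags) ? doc.tags : []
  };
}

const oldDoc = { display_name: 'Priya' }; // written before the new fields existed
const newDoc = { display_name: 'Ken', tags: ['beta'] };

console.log(normalizeUser(oldDoc).notification_prefs.email); // true (default applied)
console.log(normalizeUser(newDoc).tags);                     // [ 'beta' ]
```

In practice, teams often centralize this in a data-access layer or enforce it with optional schema validation (MongoDB's JSON Schema validator) once the shape stabilizes.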

document_model_intro.js · JAVASCRIPT
// Run this in mongosh or as a MongoDB Playground script.
// Note: the `use()` and `db` globals are shell helpers; the Node.js driver
// uses client.db('...') and collection objects instead.

// --- Step 1: Switch to (or create) our working database ---
use('ecommerce_store');

// --- Step 2: Insert two product documents with intentionally DIFFERENT shapes ---
// Notice: 'simple_mug' has no variants, but 'custom_tshirt' does.
// In SQL you'd need a separate 'variants' table. Here it's just an array inside the doc.

db.products.insertMany([
  {
    // A simple product — flat structure, no variants needed
    name: 'Ceramic Coffee Mug',
    sku: 'MUG-001',
    price_usd: 12.99,
    stock_count: 150,
    category: 'kitchenware',
    tags: ['ceramic', 'handmade', 'dishwasher-safe'],
    created_at: new Date('2024-01-15')
  },
  {
    // A complex product — has nested variants (size + colour combos)
    // This would require 3 tables in SQL. Here it's one document.
    name: 'Custom Logo T-Shirt',
    sku: 'TSH-042',
    base_price_usd: 24.99,
    category: 'apparel',
    tags: ['cotton', 'customizable', 'unisex'],
    variants: [
      { size: 'S',  color: 'black', additional_stock: 80 },
      { size: 'M',  color: 'black', additional_stock: 120 },
      { size: 'L',  color: 'white', additional_stock: 60 }
    ],
    customization_options: {
      max_logo_size_cm: 10,
      allowed_positions: ['chest', 'back', 'sleeve']
    },
    created_at: new Date('2024-03-22')
  }
]);

// --- Step 3: Query all products in the 'apparel' category ---
const apparelProducts = db.products.find(
  { category: 'apparel' },   // filter: only apparel
  { name: 1, base_price_usd: 1, _id: 0 }  // projection: only return name and price
).toArray();

console.log('Apparel products found:', JSON.stringify(apparelProducts, null, 2));
▶ Output
Apparel products found: [
  {
    "name": "Custom Logo T-Shirt",
    "base_price_usd": 24.99
  }
]
🔥
Why _id Exists: MongoDB auto-generates an `_id` field (an ObjectId) for every document if you don't supply one. An ObjectId packs a 4-byte creation timestamp, a 5-byte per-process random value and a 3-byte incrementing counter, so it's globally unique without a central sequence generator. You can recover the creation time from any ObjectId with `doc._id.getTimestamp()`.
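The timestamp trick works because the first 4 bytes of every ObjectId are the creation time as big-endian seconds since the Unix epoch. A dependency-free sketch of the decoding, using the illustrative hex ID from this article (not real data):

```javascript
// First 8 hex chars of an ObjectId = 4-byte big-endian Unix timestamp (seconds)
function objectIdToDate(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate('664a1f3b2c1d4e5f6a7b8c9d');
console.log(created.toISOString()); // 2024-05-19T15:48:11.000Z
```

This is exactly what the official `getTimestamp()` helper does under the hood; the remaining 16 hex characters carry the random value and counter.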

CRUD in the Real World — Beyond the Basic Insert and Find

Most tutorials show you insertOne, findOne, updateOne and deleteOne in isolation. That's fine for syntax, but it hides the decisions you'll actually make in production. Let's walk through a realistic user-account lifecycle — creating a user, enriching their profile over time, querying by nested fields, and cleaning up — because that mirrors what your application code will actually do.

The key update operator to understand deeply is $set. It's not a full document replacement; it surgically modifies only the fields you name. Compare that to passing a bare object to replaceOne, which overwrites the entire document. Using the wrong one is a classic production bug that silently deletes data.

For queries, the filter object mirrors the document shape. Want to query a nested field? Use dot-notation: 'address.city': 'Austin'. Want to check if an array contains a value? Just pass the value directly — MongoDB checks for membership automatically. Want all users who joined in the last 30 days? Use $gte on the date field. These patterns are used in virtually every MongoDB-backed app.
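These filter semantics are easy to internalize by re-implementing a toy version in plain JavaScript. The matcher below is a teaching sketch, not the server's algorithm, and it handles only the three patterns just described (dot-notation paths, implicit array membership, and `$gte`):

```javascript
// Toy filter matcher mirroring MongoDB's query semantics
function matches(doc, filter) {
  return Object.entries(filter).every(([path, expected]) => {
    // Resolve 'address.city'-style dot-notation paths
    const value = path.split('.').reduce((obj, key) => obj?.[key], doc);
    if (expected !== null && typeof expected === 'object' && '$gte' in expected) {
      return value >= expected.$gte;   // range operator
    }
    if (Array.isArray(value)) {
      return value.includes(expected); // implicit array membership
    }
    return value === expected;         // plain equality
  });
}

const user = {
  address: { city: 'Austin' },
  permissions: ['read', 'comment'],
  joined_at: new Date('2024-03-01')
};

console.log(matches(user, { 'address.city': 'Austin' }));                    // true
console.log(matches(user, { permissions: 'comment' }));                      // true
console.log(matches(user, { joined_at: { $gte: new Date('2024-02-01') } })); // true
```

The real query engine supports dozens more operators (`$in`, `$elemMatch`, `$regex`, ...), but the shape of the filter object maps onto documents in exactly this way.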

user_account_crud.js · JAVASCRIPT
use('saas_platform');

// ─────────────────────────────────────────
// CREATE — Register a new user
// ─────────────────────────────────────────
const insertResult = db.users.insertOne({
  email: 'priya.sharma@example.com',
  display_name: 'Priya Sharma',
  hashed_password: '$2b$12$exampleHashedPasswordHere',
  plan: 'free',
  address: {
    city: 'Mumbai',
    country: 'IN'
  },
  permissions: ['read', 'comment'],
  joined_at: new Date(),
  last_login: null   // null is valid — she hasn't logged in yet
});

console.log('Inserted ID:', insertResult.insertedId);
// Inserted ID: ObjectId('664a1f...')

// ─────────────────────────────────────────
// READ — Find users in Mumbai on the free plan
// Dot-notation queries nested fields cleanly
// ─────────────────────────────────────────
const mumbaiFreeUsers = db.users.find(
  {
    'address.city': 'Mumbai',  // dot-notation for nested field
    plan: 'free'
  },
  { email: 1, display_name: 1, _id: 0 }  // only return these fields
).toArray();

console.log('Mumbai free-plan users:', mumbaiFreeUsers);
// Mumbai free-plan users: [ { email: 'priya.sharma@example.com', display_name: 'Priya Sharma' } ]

// ─────────────────────────────────────────
// UPDATE — Priya upgrades to 'pro' and gets new permissions
// $set only touches the named fields — everything else is UNTOUCHED
// $push appends to the permissions array without overwriting it
// ─────────────────────────────────────────
const updateResult = db.users.updateOne(
  { email: 'priya.sharma@example.com' },  // filter
  {
    $set:  { plan: 'pro', last_login: new Date() },
    $push: { permissions: 'write' }  // appends 'write' to her permissions array
  }
);

console.log('Documents modified:', updateResult.modifiedCount);
// Documents modified: 1

// ─────────────────────────────────────────
// VERIFY — Confirm her updated document
// ─────────────────────────────────────────
const updatedUser = db.users.findOne(
  { email: 'priya.sharma@example.com' },
  { email: 1, plan: 1, permissions: 1, _id: 0 }
);

console.log('Updated user:', JSON.stringify(updatedUser, null, 2));
/*
Updated user: {
  "email": "priya.sharma@example.com",
  "plan": "pro",
  "permissions": ["read", "comment", "write"]
}
*/

// ─────────────────────────────────────────
// DELETE — Remove a test/spam account
// deleteOne removes the FIRST match only
// ─────────────────────────────────────────
const deleteResult = db.users.deleteOne({ email: 'spam-bot@junk.io' });
console.log('Deleted count:', deleteResult.deletedCount);
// Deleted count: 1 (or 0 if the email didn't exist — it won't throw an error)
▶ Output
Inserted ID: ObjectId('664a1f3b2c1d4e5f6a7b8c9d')
Mumbai free-plan users: [ { email: 'priya.sharma@example.com', display_name: 'Priya Sharma' } ]
Documents modified: 1
Updated user: {
  "email": "priya.sharma@example.com",
  "plan": "pro",
  "permissions": ["read", "comment", "write"]
}
Deleted count: 0
⚠️
Watch Out: replaceOne vs updateOne. If you call `db.users.replaceOne({ email: '...' }, { plan: 'pro' })`, you don't update the plan — you REPLACE the entire document with `{ plan: 'pro' }`. Priya's email, name, permissions — all gone. (Passing a bare object to `updateOne`, by contrast, is rejected with an error because its update document must use operators like `$set`.) Always use `updateOne` with `$set` unless you truly intend a full replacement.
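The difference is easy to see by simulating the two write modes on a plain object. This is a toy model of the semantics, not driver code, and it ignores dot-notation paths that the real `$set` also supports:

```javascript
// Replace semantics: the new document IS the stored document (only _id survives)
function replaceDoc(stored, replacement) {
  return { _id: stored._id, ...replacement };
}

// $set semantics: merge only the named fields, leave everything else untouched
function setFields(stored, fields) {
  return { ...stored, ...fields };
}

const priya = {
  _id: 1,
  email: 'priya.sharma@example.com',
  plan: 'free',
  permissions: ['read']
};

console.log(setFields(priya, { plan: 'pro' }));
// { _id: 1, email: 'priya.sharma@example.com', plan: 'pro', permissions: [ 'read' ] }

console.log(replaceDoc(priya, { plan: 'pro' }));
// { _id: 1, plan: 'pro' }   <-- email and permissions are gone
```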

Indexes and Schema Design — The Two Decisions That Make or Break Performance

A MongoDB collection with no indexes is a filing cabinet where every search requires opening every folder. That's fine at 100 documents. It's catastrophic at 10 million. An index is a sorted shortcut: MongoDB builds and maintains a separate data structure that maps field values to document locations, so it can jump straight to what you need.

The golden rule: create an index on every field you filter or sort by in production queries. MongoDB's explain('executionStats') method is your best friend here — it tells you whether a query used an index (IXSCAN) or did a full collection scan (COLLSCAN). Never ship a feature without running explain on its query.

Schema design in MongoDB comes down to one core question: do you embed or reference? Embed related data (like an order's line items) inside the parent document when you always read them together and the nested data belongs to one parent only. Use references (storing an ObjectId that points to another collection) when the related data is shared, huge, or needs to be queried independently. The wrong choice at design time causes either massive documents that time out, or a barrage of extra round-trips that kill performance.

indexes_and_schema_design.js · JAVASCRIPT
use('saas_platform');

// ─────────────────────────────────────────
// INDEXES — Create targeted indexes for known query patterns
// ─────────────────────────────────────────

// Single-field index: we frequently filter users by email (login lookups)
db.users.createIndex(
  { email: 1 },         // 1 = ascending sort order
  { unique: true }      // enforces no two users share an email at DB level
);

// Compound index: we filter by plan AND sort by joined_at on the admin dashboard
// The ORDER of fields in a compound index matters — put the equality filter first
db.users.createIndex({ plan: 1, joined_at: -1 });  // -1 = descending (newest first)

// Text index: enables full-text search on product names and descriptions
db.products.createIndex({ name: 'text', description: 'text' });

// ─────────────────────────────────────────
// EXPLAIN — Verify a query USES an index, not a COLLSCAN
// Run this BEFORE shipping a new query to production
// ─────────────────────────────────────────
const queryPlan = db.users.find({ plan: 'pro' }).sort({ joined_at: -1 }).explain('executionStats');

// What to look for in the output:
console.log('Stage used:', queryPlan.queryPlanner.winningPlan.inputStage?.stage ?? queryPlan.queryPlanner.winningPlan.stage);
// Good output: 'IXSCAN'  (index scan — fast)
// Bad output:  'COLLSCAN' (collection scan — reads every document; a COLLSCAN
//               plan has no inputStage, hence the fallback above)

console.log('Docs examined:', queryPlan.executionStats.totalDocsExamined);
console.log('Docs returned:', queryPlan.executionStats.nReturned);
// If totalDocsExamined >> nReturned, your index is wrong or missing

// ─────────────────────────────────────────
// SCHEMA DESIGN: Embed vs Reference example
// ─────────────────────────────────────────

// EMBED approach — Order stores its own line items
// Use when: you always load the order WITH its items, items belong to one order only
db.orders.insertOne({
  order_number: 'ORD-20240312-001',
  customer_id: ObjectId('664a1f3b2c1d4e5f6a7b8c9d'),  // reference to users collection
  status: 'shipped',
  placed_at: new Date('2024-03-12'),
  line_items: [
    // Embedded — no separate 'order_items' collection needed
    { sku: 'MUG-001', product_name: 'Ceramic Coffee Mug', qty: 2, unit_price_usd: 12.99 },
    { sku: 'TSH-042', product_name: 'Custom Logo T-Shirt', qty: 1, unit_price_usd: 24.99 }
  ],
  total_usd: 50.97
});

// REFERENCE approach — Blog post stores author as an ObjectId, not embedded name
// Use when: the author exists independently and may write many posts
db.blog_posts.insertOne({
  title: 'Getting Started with MongoDB Indexes',
  slug: 'mongodb-indexes-guide',
  author_id: ObjectId('664a1f3b2c1d4e5f6a7b8c9d'),  // reference — not embedded
  // If we embedded author details, updating the author name would require
  // updating EVERY post they wrote. With a reference, update once in 'users'.
  body: 'Indexes are the single biggest performance lever in MongoDB...',
  published_at: new Date('2024-04-01'),
  tags: ['mongodb', 'performance', 'indexing']
});

console.log('Schema examples inserted successfully.');
▶ Output
Stage used: IXSCAN
Docs examined: 43
Docs returned: 43
Schema examples inserted successfully.
🔥
Pro Tip: The 16MB Document Limit. MongoDB caps each document at 16MB. If you're embedding an array that grows unboundedly — like comments on a viral post — you'll eventually hit this wall. The Bucket Pattern (grouping comments into separate documents of ~100 each) is MongoDB's recommended solution for high-growth arrays. Design for growth, not just for today.
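The Bucket Pattern is usually driven by a single upsert: filter on the parent ID plus `count: { $lt: N }`, then `$push` the item and `$inc` the count, letting `upsert: true` start a fresh bucket whenever every existing one is full. Here is an in-memory sketch of that rollover logic with illustrative names and a deliberately tiny bucket size — no real driver calls:

```javascript
const BUCKET_LIMIT = 3; // tiny so the rollover is visible (production: ~100)

// Mimics: db.comment_buckets.updateOne(
//   { post_id: postId, count: { $lt: BUCKET_LIMIT } },
//   { $push: { comments: comment }, $inc: { count: 1 } },
//   { upsert: true })
function addComment(buckets, postId, comment) {
  let bucket = buckets.find(b => b.post_id === postId && b.count < BUCKET_LIMIT);
  if (!bucket) {
    // Upsert branch: no bucket exists yet, or all buckets are full
    bucket = { post_id: postId, count: 0, comments: [] };
    buckets.push(bucket);
  }
  bucket.comments.push(comment);
  bucket.count += 1;
  return buckets;
}

const buckets = [];
for (let i = 1; i <= 7; i++) addComment(buckets, 'post-42', `comment #${i}`);

console.log(buckets.length);            // 3
console.log(buckets.map(b => b.count)); // [ 3, 3, 1 ]
```

Each bucket stays far below 16MB, and reading a page of comments means fetching one small document instead of one giant one.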

Aggregation Pipelines — MongoDB's Answer to SQL GROUP BY and JOINs

The find() method takes you far, but the moment you need to summarize, group, reshape or join data across collections, you need the Aggregation Pipeline. Think of it as an assembly line: each stage receives documents from the previous stage, does one job, and passes the result forward.

The most-used stages are $match (filter, like WHERE), $group (aggregate, like GROUP BY), $sort, $project (reshape fields, like SELECT), and $lookup (join another collection). Always put $match first — it reduces the document set early so later stages process less data. Putting a $group before a $match is a performance mistake that processes the entire collection unnecessarily.
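The assembly-line mental model maps directly onto ordinary array transforms: each stage is a function from documents to documents, composed left to right. A toy version of `$match` followed by `$group`, just to show the data flow (not the server implementation):

```javascript
// Each stage: documents in, documents out — the pipeline mental model
const $match = pred => docs => docs.filter(pred);
const $group = (keyFn, sumFn) => docs => {
  const groups = new Map();
  for (const d of docs) {
    const k = keyFn(d);
    groups.set(k, (groups.get(k) ?? 0) + sumFn(d));
  }
  return [...groups].map(([_id, total]) => ({ _id, total }));
};

const orders = [
  { status: 'shipped',   sku: 'MUG-001', amount: 25.98 },
  { status: 'cancelled', sku: 'MUG-001', amount: 12.99 },
  { status: 'shipped',   sku: 'TSH-042', amount: 24.99 }
];

// $match first, so $group only sees the documents that survive the filter
const pipeline = [
  $match(o => o.status === 'shipped'),
  $group(o => o.sku, o => o.amount)
];
const result = pipeline.reduce((docs, stage) => stage(docs), orders);

console.log(result);
// [ { _id: 'MUG-001', total: 25.98 }, { _id: 'TSH-042', total: 24.99 } ]
```

Swapping the two stages would still give the wrong-order version a chance to group the cancelled order first and filter later, which is exactly the performance (and correctness) trap the real pipeline shares.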

Aggregation pipelines are where MongoDB goes from 'handy' to 'genuinely powerful'. Real-world uses include daily revenue reports, cohort analysis by signup date, tag popularity rankings, and inventory summaries — all without pulling raw data into application memory to process it yourself.

aggregation_pipeline_examples.js · JAVASCRIPT
use('ecommerce_store');

// ─────────────────────────────────────────
// EXAMPLE 1: Top products by revenue for March 2024
// This replaces what would be a multi-JOIN GROUP BY query in SQL
// ─────────────────────────────────────────
const revenueByCategory = db.orders.aggregate([

  // Stage 1 — $match: only look at March 2024 orders (filters EARLY for performance)
  {
    $match: {
      placed_at: {
        $gte: new Date('2024-03-01'),
        $lt:  new Date('2024-04-01')
      },
      status: { $in: ['shipped', 'delivered'] }  // exclude cancelled orders
    }
  },

  // Stage 2 — $unwind: flatten the line_items array so each item becomes its own doc
  // Before unwind: 1 order doc with 3 line items
  // After unwind:  3 docs, one per line item, each carrying the parent order fields
  { $unwind: '$line_items' },

  // Stage 3 — $group: sum revenue and units sold per SKU
  {
    $group: {
      _id: '$line_items.sku',   // group by SKU
      total_revenue: {
        $sum: {
          $multiply: ['$line_items.qty', '$line_items.unit_price_usd']  // qty × price
        }
      },
      total_units_sold: { $sum: '$line_items.qty' }
    }
  },

  // Stage 4 — $sort: highest revenue first
  { $sort: { total_revenue: -1 } },

  // Stage 5 — $limit: only show top 5 products
  { $limit: 5 },

  // Stage 6 — $project: rename _id to sku, round revenue to 2 decimal places
  {
    $project: {
      _id: 0,
      sku: '$_id',
      total_revenue: { $round: ['$total_revenue', 2] },
      total_units_sold: 1
    }
  }

]).toArray();

console.log('Top 5 products by March revenue:');
console.log(JSON.stringify(revenueByCategory, null, 2));

// ─────────────────────────────────────────
// EXAMPLE 2: $lookup — Join blog posts with their author's display name
// This is MongoDB's equivalent of a LEFT JOIN
// ─────────────────────────────────────────
const postsWithAuthors = db.blog_posts.aggregate([
  { $match: { published_at: { $gte: new Date('2024-01-01') } } },
  {
    $lookup: {
      from: 'users',                  // the collection to join
      localField: 'author_id',        // field in blog_posts
      foreignField: '_id',            // field in users
      as: 'author_details'            // result goes into this new array field
    }
  },
  // $lookup always returns an array — unwrap it since each post has one author
  { $unwind: '$author_details' },
  {
    $project: {
      _id: 0,
      title: 1,
      author_name: '$author_details.display_name',  // pull up the nested name
      published_at: 1
    }
  }
]).toArray();

console.log('Posts with authors:', JSON.stringify(postsWithAuthors, null, 2));
▶ Output
Top 5 products by March revenue:
[
  { "sku": "TSH-042", "total_revenue": 749.7, "total_units_sold": 30 },
  { "sku": "MUG-001", "total_revenue": 519.6, "total_units_sold": 40 },
  { "sku": "HAT-007", "total_revenue": 389.25, "total_units_sold": 25 },
  { "sku": "BAG-019", "total_revenue": 299, "total_units_sold": 10 },
  { "sku": "PIN-003", "total_revenue": 89.55, "total_units_sold": 15 }
]
Posts with authors: [
  {
    "title": "Getting Started with MongoDB Indexes",
    "author_name": "Priya Sharma",
    "published_at": "2024-04-01T00:00:00.000Z"
  }
]
🔥
Interview Gold: Pipeline Order Matters. Interviewers love asking why aggregation pipelines are slow. The #1 answer: `$match` and `$project` stages placed too late. Always filter and slim down documents as early as possible. A `$match` at stage 1 that cuts your working set from 2M to 50K documents makes every subsequent stage 40x faster.
| Feature / Aspect | MongoDB (Document DB) | PostgreSQL (Relational DB) |
| --- | --- | --- |
| Data shape | Flexible — each document can differ | Fixed — all rows must match table schema |
| Schema changes | Add fields to new docs with no migration | Requires ALTER TABLE — can lock the table |
| Joins | $lookup in aggregation pipeline (less natural) | Native JOIN — optimized and first-class |
| Horizontal scaling | Built-in sharding across multiple servers | Requires extensions like Citus or manual sharding |
| Transactions | Multi-document ACID transactions (v4.0+) | Full ACID transactions since day one |
| Query language | JSON-based filter objects + aggregation pipeline | Declarative SQL — widely known and portable |
| Best for | Variable-structure data, catalogs, content, IoT | Financial records, reporting, relational data |
| Nested data | First-class — embed arrays and objects naturally | Awkward — needs JSON columns or separate tables |

🎯 Key Takeaways

  • MongoDB stores data as BSON documents inside collections — no rows, no fixed columns. Two documents in the same collection can have completely different fields, which is a feature, not a bug.
  • Always use $set in updateOne calls unless you intend a full document replacement. A bare object passed to replaceOne doesn't merge — it replaces the whole document, silently deleting every field you didn't include. (updateOne itself rejects a bare object with an error, which is the safer failure mode.)
  • Every field you filter or sort by in production needs an index. Use explain('executionStats') to confirm you're getting an IXSCAN before shipping any query. A missing index is invisible in development and catastrophic in production.
  • The aggregation pipeline is MongoDB's SQL GROUP BY and JOIN equivalent — use $match as early as possible to reduce the working set, then $group, $sort, $lookup and $project to shape the final result.

⚠ Common Mistakes to Avoid

  • Mistake 1: Replacing a document when you meant to update it, by calling replaceOne (or the legacy update()) with a bare object instead of updateOne with $set — Symptom: the entire document is silently replaced, losing all unmentioned fields. No error is thrown. Fix: always structure your update as { $set: { fieldToChange: newValue } }, and reserve replaceOne for the rare case where you explicitly want a full document replacement.
  • Mistake 2: Not creating indexes on filter fields before going to production — Symptom: queries work fine in development with 500 test documents, then crawl at 10+ seconds in production with 5M documents because every query does a COLLSCAN. Fix: run db.collection.find(yourFilter).explain('executionStats') and confirm the winning plan shows IXSCAN. Create compound indexes that match your most common query patterns before load testing.
  • Mistake 3: Embedding unbounded arrays inside a document — Symptom: a 'post comments' array or 'chat messages' array grows until the 16MB document limit is hit, causing insert failures with BSONObjectTooLarge errors. Fix: if an array can grow without a fixed upper bound (say, beyond 100-200 items), reference it instead — store comments as separate documents in a comments collection with a post_id field, then query with db.comments.find({ post_id: postId }).

Interview Questions on This Topic

  • Q: What's the difference between embedding and referencing in MongoDB schema design, and how do you decide which to use for a given relationship?
  • Q: MongoDB is described as 'schema-less' — but experienced engineers say that's misleading. What do they mean, and how do you manage schema consistency in a real application?
  • Q: If a MongoDB aggregation pipeline is running slowly on a large collection, what are the first three things you'd check or change to improve its performance?

Frequently Asked Questions

What is the difference between MongoDB and a SQL database?

MongoDB stores data as flexible JSON-like documents inside collections, while SQL databases store data in rigid tables with fixed columns and rows. MongoDB handles variable-structure data naturally and scales horizontally out of the box, but SQL databases offer more mature JOIN support and have been the gold standard for relational, transactional data for decades. The right choice depends on your data's shape and access patterns, not hype.

Does MongoDB support transactions like SQL databases do?

Yes, since version 4.0, MongoDB supports multi-document ACID transactions — meaning you can update multiple documents across multiple collections atomically, with full rollback on failure. The syntax uses startSession() and session.withTransaction(). That said, if your app requires frequent multi-document transactions, it's worth asking whether a relational database is a better fit, since MongoDB's transaction overhead is higher than single-document operations.

When should I embed data vs reference it with an ObjectId in MongoDB?

Embed when the nested data belongs exclusively to one parent document, you always load the parent and child together, and the array has a fixed upper bound in size (e.g., a product's 3-5 shipping options). Reference when the data is shared across many documents (like an author writing many posts), the sub-data needs to be queried independently, or the array could grow without limit (like post comments). Getting this decision right at design time saves enormous pain later.

TheCodeForge Editorial Team (Verified Author)

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.
