
MongoDB Replication Lag — 3% Reconciliation Failure

Aggregated reports showed 3% lower totals due to MongoDB replication lag under write-heavy load.
⚙️ Intermediate — basic Database knowledge assumed
In this tutorial, you'll learn
  • NoSQL is a family of purpose-built databases — pick the one that matches your data access pattern
  • Document stores: flexible schemas, embedded data, best for catalogs and profiles
  • Key-value stores: fastest reads by primary key, memory-bound, ideal for caching and counters
Quick Answer
  • NoSQL databases trade ACID for scalability, flexibility, and speed
  • Four main types: document, key-value, column-family, graph — each solves a different problem
  • CAP theorem governs the consistency/availability trade-off every NoSQL system faces
  • Schema-on-read allows storing polymorphic data without migrations
  • Production pitfall: choosing NoSQL for relational data leads to painful query workarounds
  • Performance insight: key-value stores can do sub-millisecond reads; column stores excel at range scans over wide columns
🚨 START HERE

Quick Debug Cheat Sheet: NoSQL Performance & Failures

Top three symptoms every on-call engineer faces with NoSQL and how to fix them fast.
🟡

Inconsistent reads across replicas in MongoDB

Immediate Action: Switch read preference to primary and set read concern to majority
Commands
db.collection.find({}).readPref('primary').readConcern('majority')
rs.printSecondaryReplicationInfo() to check replication lag (rs.printSlaveReplicationInfo() on shells older than MongoDB 4.4)
Fix Now: Temporarily pin reads to primary until lag is resolved. Permanent fix: write with write concern 'majority' and implement retry logic with at-least-once semantics, which requires idempotent writes.
🔴

Redis memory exhaustion – OOM kills

Immediate Action: Run redis-cli INFO memory and compare used_memory and used_memory_peak against maxmemory
Commands
redis-cli MEMORY STATS
redis-cli --bigkeys to find large keys
Fix Now: Set maxmemory-policy allkeys-lru. CONFIG SET applies it at runtime without a restart; persist it in redis.conf as well. Add an alert on evicted_keys. Set a TTL on every key that is meant to expire.
🟠

Cassandra high read latency – 99th percentile >500ms

Immediate Action: Check nodetool tpstats for pending or dropped read and read-repair messages, and check whether the node is hitting disk I/O limits
Commands
nodetool tpstats
nodetool tablestats keyspace.table | grep 'Read'
Fix Now: Increase read_request_timeout_in_ms in cassandra.yaml temporarily. Permanent: add more replicas or tune speculative retry.
Production Incident

The MongoDB Migration That Silently Lost Writes

A team migrated from Postgres to MongoDB for 'flexibility', then discovered their financial reports were incorrect by millions.
Symptom: Aggregated reports showed lower totals than expected. Daily reconciliation failed by up to 3%.
Assumption: MongoDB's eventual consistency is 'good enough' for financial data. The application used read preference 'secondaryPreferred', assuming the secondary always has the latest data.
Root cause: Under write-heavy load, MongoDB's replication lag caused secondary reads to miss recently committed writes. The app also lacked write concern 'majority', so some writes were acknowledged before reaching a majority of nodes, and a subsequent primary election rolled them back.
Fix: Changed read preference to 'primary' for all financial queries. Set write concern to 'majority' with journaling (j: true). Added a retry layer for 'WriteConcernError'. For analytical queries, switched to a dedicated secondary with read concern 'majority', because read concern 'linearizable' is only available on the primary.
Key Lesson
  • Eventual consistency is dangerous for any system that aggregates monetary values
  • Always test replication lag under peak load, not just on idle clusters
  • Understand your database's consistency guarantees before you sacrifice ACID
  • Document your read/write concern settings in runbooks, not just config files
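
The root cause above can be reproduced in miniature without a database. The following is a minimal sketch, assuming a hypothetical in-memory `ToyReplicaSet` class (it is not MongoDB's API) in which the secondary is always a fixed number of writes behind the primary. The lag of 30 writes and the payment amounts are made-up values chosen to land on a 3% shortfall.

```python
class ToyReplicaSet:
    """Hypothetical in-memory model: one primary, one lagging secondary."""

    def __init__(self, lag_entries):
        self.oplog = []                 # every write acknowledged by the primary
        self.lag_entries = lag_entries  # how many writes the secondary is behind

    def write(self, amount_cents):
        self.oplog.append(amount_cents)

    def total(self, read_preference):
        if read_preference == "primary":
            visible = self.oplog
        else:
            # The secondary has not applied the newest `lag_entries` writes yet.
            visible = self.oplog[: max(0, len(self.oplog) - self.lag_entries)]
        return sum(visible)


rs = ToyReplicaSet(lag_entries=30)
for _ in range(1000):
    rs.write(1_000)  # 1000 payments of 10.00 each, stored as integer cents

primary_total = rs.total("primary")
secondary_total = rs.total("secondary")
shortfall = (primary_total - secondary_total) / primary_total
print(primary_total, secondary_total, f"{shortfall:.1%}")  # 1000000 970000 3.0%
```

The report is not wrong because data was corrupted; it is wrong because the aggregation ran against a node that had not caught up yet, which is why the shortfall only appears under write-heavy load.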
Production Debug Guide

Symptom → Action guide for the four NoSQL families

Document store queries are slow despite indexes → Check index usage with explain(). Look for collection scans (COLLSCAN) vs index scans (IXSCAN). If the index is not used, verify that the query shape matches the index. Unanchored regex matches and negation operators such as $ne and $nin cannot use an index efficiently.
Redis latency spikes under load → Run SLOWLOG GET 100 to see the 100 most recent slow commands (the threshold is slowlog-log-slower-than, 10,000µs by default). Common culprits: KEYS (blocking), large value retrievals, a high client count, or O(N) commands on large collections. Also check for fork() latency during BGSAVE.
Cassandra read repair / compaction causing high CPU → Examine nodetool compactionstats and tpstats. If compaction backlog >1GB, tune compaction throughput and enable incremental repairs. Check for tombstone-heavy reads, which kill read performance.
Neo4j queries run slow even with indexes → Use PROFILE to see if the query does a node-by-node traversal instead of leveraging indexes. Avoid unbounded path lengths; use pattern matching guards. Check for disconnected nodes causing full scans.

Every app you use daily (Instagram's feed, Netflix's recommendations, Uber's driver locations) stores data differently from the neat rows-and-columns world of SQL. These systems handle millions of writes per second, store wildly different shapes of data, and must stay online across data centers on different continents. Traditional relational databases are incredible tools, but they were designed in an era when a server rack cost more than a house and the web didn't exist yet. The world changed; the data layer had to change with it.

The core problem SQL solves — enforcing a rigid schema and guaranteeing ACID transactions — is exactly what becomes a bottleneck at web scale or when your data shape is unpredictable. When every user profile has a different set of preferences, when a social graph has billions of edges, or when you need to read a user's session in under a millisecond from any region on earth, forcing data into tables with foreign keys and JOINs creates real pain: slow migrations, expensive hardware scaling, and query planners that simply give up.

By the end of this article you'll understand the four main NoSQL families and what problem each one was built to solve, how CAP theorem governs the trade-offs every NoSQL system makes, and how to look at a real-world requirement and choose the right database — or know when to stick with Postgres.

What Is NoSQL — and Why Did It Emerge?

NoSQL stands for 'Not Only SQL'. It's a category of database systems designed to handle data that doesn't fit neatly into fixed tables. The need emerged in the mid-2000s when internet giants like Google, Amazon, and Facebook hit walls with traditional relational databases. Their workloads demanded horizontal scaling across thousands of servers, flexible schemas for rapidly changing product features, and sub-millisecond access times for billions of users. NoSQL systems sacrificed strict ACID guarantees in exchange for these properties.

Think of NoSQL as a set of purpose-built tools rather than a single approach. Each type — document, key-value, column-family, graph — optimises for a different data access pattern. The common thread: all of them avoid the rigid table-join-index model of SQL. They're not better or worse; they're built for different jobs.

Production reality: most organisations end up running multiple NoSQL databases alongside a relational system. A typical architecture uses Postgres for core business transactions, Redis for caching and session storage, MongoDB for product catalogues, and Elasticsearch for search. Understanding the trade-offs helps you pick the right tool without overcomplicating your infrastructure.

docker-compose.yml · YAML
version: '3.8'
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: orders
      POSTGRES_PASSWORD: example  # placeholder; the image will not initialise without a password
    # ...
  mongodb:
    image: mongo:7
    # stores product catalog - flexible schema
  redis:
    image: redis:7-alpine
    # session cache - sub-millisecond reads
▶ Output
Running `docker compose up` launches a polyglot persistence stack
Mental Model
NoSQL vs SQL: A Trade-off, Not a War
Think of SQL as a filing cabinet and NoSQL as a backpack.
  • SQL: rigid drawers, you must know all fields upfront, but finding exactly what you need is fast and guaranteed consistent
  • NoSQL: throw things in, no upfront planning, but you might have to rummage through the whole bag to find something, and occasionally you'll get stale contents
  • You wouldn't carry a filing cabinet on a hike; you wouldn't store legal records in a backpack. Pick the storage that matches the job.
📊 Production Insight
The biggest NoSQL mistake is assuming it's a drop-in SQL replacement.
Your ORM might let you persist objects without migrations, but queries that used JOINs now require application-level joins or denormalised data.
Rule: before adopting NoSQL, map your access patterns — if most queries involve relationships across entities, stick with SQL.
🎯 Key Takeaway
NoSQL emerged to solve scale and flexibility problems SQL couldn't.
Each NoSQL family optimises for a specific access pattern.
Don't replace SQL — complement it.
The hardest part is admitting you still need a relational database.
When to Start Considering NoSQL
If: Data shape changes frequently (new fields every sprint)
Use: Consider document stores (MongoDB, Couchbase); schema-on-read handles this natively
If: Need sub-millisecond reads on simple key lookups
Use: Consider key-value stores (Redis, DynamoDB); a hash lookup on the key gives O(1) average access
If: Write throughput exceeds 100k ops/s per node
Use: Consider column-family stores (Cassandra, Scylla); they are designed for horizontal write scaling
If: Data is a graph (users, friends, recommendations)
Use: Consider graph databases (Neo4j, Amazon Neptune); join tables become traversal patterns
If: Need ACID transactions across multiple entities
Use: Stick with SQL; most NoSQL systems sacrifice multi-record atomicity

Document Stores — MongoDB, Couchbase

Document stores save data as self-contained JSON/BSON documents. Each document can have its own structure — one user may have 3 fields, another 20. This is ideal for product catalogues, user profiles, and content management systems where the schema evolves rapidly.

MongoDB is the most popular example. It stores documents in collections (similar to tables) but doesn't enforce a schema. Queries use a rich JSON-based query language with indexes, aggregations, and geospatial support. Document stores support secondary indexes, but joins are expensive — you typically denormalise related data into a single document.

Performance: reads are fast because a single document contains all the data needed for a page. Writes can be a bottleneck if you update large documents frequently, because the entire document is rewritten. Atomic operations on single documents are supported. Multi-document transactions (on replica sets since MongoDB 4.0, on sharded clusters since 4.2) provide snapshot isolation but add performance overhead.

mongodb_example.js · JAVASCRIPT
// Connect to MongoDB and query product catalog
const { MongoClient } = require('mongodb');
const uri = "mongodb://localhost:27017";
const client = new MongoClient(uri);

async function run() {
  try {
    await client.connect();
    const db = client.db('shop');
    const products = db.collection('products');

    // Insert a document — note different fields per category
    await products.insertOne({
      name: 'Wireless Mouse',
      price: 29.99,
      category: 'electronics',
      specs: { connectivity: 'Bluetooth', buttons: 6 }
      // another product might have 'size' and 'color' instead of 'specs'
    });

    // Range query on price; an index on { price: 1 } avoids a collection scan
    const cursor = products.find({ price: { $gte: 20 } });
    const results = await cursor.toArray();
    console.log(results.length); // 1 when the collection started empty
  } finally {
    await client.close(); // release the connection so the process can exit
  }
}

run().catch(console.error);
▶ Output
1
⚠ Pitfall: Over-Normalisation in Document Stores
Developers coming from SQL often create separate collections for related data (e.g., users and orders) and then try to join at the application level. This kills performance. In document stores, you should embed related data that is read together. For example, embed order items inside the order document. Only reference (store IDs) when the related data is large and rarely accessed.
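
The cost difference between referencing and embedding can be sketched with plain dictionaries standing in for collections. This is a minimal illustration, not a MongoDB client: the collection names, IDs, and the `load_referenced` / `load_embedded` helpers are all hypothetical, and each dictionary access stands in for one database round trip.

```python
# Hypothetical in-memory "collections": plain dicts keyed by _id.
orders_referenced = {
    "o1": {"_id": "o1", "user_id": "u1", "item_ids": ["i1", "i2", "i3"]},
}
order_items = {
    "i1": {"_id": "i1", "sku": "mouse", "qty": 1},
    "i2": {"_id": "i2", "sku": "keyboard", "qty": 1},
    "i3": {"_id": "i3", "sku": "cable", "qty": 2},
}
orders_embedded = {
    "o1": {
        "_id": "o1",
        "user_id": "u1",
        "items": [
            {"sku": "mouse", "qty": 1},
            {"sku": "keyboard", "qty": 1},
            {"sku": "cable", "qty": 2},
        ],
    },
}


def load_referenced(order_id):
    """One lookup for the order plus one per item: the N+1 pattern."""
    lookups = 1
    order = orders_referenced[order_id]
    items = []
    for item_id in order["item_ids"]:
        items.append(order_items[item_id])
        lookups += 1
    return items, lookups


def load_embedded(order_id):
    """The items travel with the order, so a single lookup is enough."""
    return orders_embedded[order_id]["items"], 1


ref_items, ref_lookups = load_referenced("o1")
emb_items, emb_lookups = load_embedded("o1")
print(ref_lookups, emb_lookups)  # 4 1
```

The gap grows with the number of items per order, which is why data that is always read together is usually a candidate for embedding.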
📊 Production Insight
Document stores hide a dangerous truth: no built-in referential integrity.
If you store a user ID in an order document and later delete the user, the order still references a ghost.
Rule: implement soft deletes or application-level cascade logic — don't rely on the database to enforce consistency.
🎯 Key Takeaway
Document stores excel when each page maps to one document.
Embedding removes the need for JOINs, but watch out for document size limits (16MB in MongoDB).
The schema-less nature is a double-edged sword: enforce structure at the application layer or with the database's schema validation.
Rule: if you need atomic multi-document updates, SQL is still your friend.

Key-Value Stores — Redis, DynamoDB, Riak

Key-value stores are the simplest NoSQL family — a map from a unique key to a blob of data (string, JSON, binary). They're built for lightning-fast lookups by primary key. Redis, DynamoDB, and Memcached are the heavy hitters.

Redis is an in-memory data structure server, not just a cache. It supports strings, hashes, lists, sets, sorted sets, and streams. Each command, including multi-key commands such as MSET, executes atomically because Redis processes commands on a single thread. Persistence is optional. Production use cases: session stores, rate limiter counters, leaderboards, real-time messaging via Pub/Sub.

DynamoDB is a fully managed key-value and document store by AWS. It scales horizontally by hash-partitioning items on their partition key and splitting partitions automatically. It offers single-digit millisecond latency at any scale. But query flexibility is limited: you must model access patterns upfront (primary key, sort key, secondary indexes). The pricing model (read/write capacity units) can be surprising.

redis_session.py · PYTHON
import json
from datetime import datetime, timezone

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Store user session (auto-expire after 3600s)
user_session = {'user_id': 42, 'role': 'admin', 'iat': 1712345678}
r.setex('session:abc123', 3600, json.dumps(user_session))

# Retrieve – O(1)
session = json.loads(r.get('session:abc123'))
print(session['role'])  # 'admin'

# Atomic counter for rate limiting, one key per user per UTC hour
key = f'ratelimit:user:42:{datetime.now(timezone.utc):%Y%m%d%H}'
current = r.incr(key)
r.expire(key, 3600)  # auto-clean after an hour
if current > 1000:  # only true after more than 1000 calls in the same hour
    print('Rate limit exceeded')
▶ Output
admin
🔥Forge Tip: Redis Pipeline for Batch Operations
When you need to get/set many keys, use pipelines instead of individual commands. Pipelines batch commands into one round trip, reducing network overhead by 10-100x. But remember: pipelines are not transactions — they just batch, not atomise. For atomic multi-key ops, use Redis transactions (MULTI/EXEC) or Lua scripting.
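
The round-trip saving behind that tip can be sketched without a Redis server. This is a toy model under stated assumptions: `CountingConnection` is a hypothetical stand-in that only counts how many times a batch of commands crosses the network, and the 0.5 ms round-trip time is an illustrative figure, not a measurement.

```python
class CountingConnection:
    """Hypothetical stand-in for a network connection; it only counts round trips."""

    def __init__(self):
        self.store = {}
        self.round_trips = 0

    def send(self, commands):
        # One call to send() models one network round trip, however many
        # commands are packed into it.
        self.round_trips += 1
        replies = []
        for op, key, *args in commands:
            if op == "SET":
                self.store[key] = args[0]
                replies.append("OK")
            elif op == "GET":
                replies.append(self.store.get(key))
        return replies


keys = [f"user:{i}" for i in range(500)]

one_by_one = CountingConnection()
for k in keys:
    one_by_one.send([("SET", k, "1")])

pipelined = CountingConnection()
pipelined.send([("SET", k, "1") for k in keys])

rtt_ms = 0.5  # assumed round-trip time, purely illustrative
print(one_by_one.round_trips * rtt_ms, pipelined.round_trips * rtt_ms)  # 250.0 0.5
```

Both connections end up with the same 500 keys; only the number of round trips differs. With the redis-py client the corresponding calls are `r.pipeline()` followed by `execute()`.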
📊 Production Insight
Memory is expensive. Key-value stores feel fast because they keep everything in RAM.
A Redis cluster with 100GB memory costs thousands per month. And if your hot dataset exceeds memory, Redis starts evicting keys (bad) or OOMs (worse).
Rule: profile your working set — if it doesn't fit in memory, consider a disk-backed key-value store like DynamoDB or RocksDB.
🎯 Key Takeaway
Key-value stores are the fastest way to read/write by primary key.
They don't do complex queries — you get what you put in.
Redis is for speed; DynamoDB is for scale.
Memory sizing is a budget conversation, not a technical one.
Choosing Between Redis and DynamoDB
If: Need complex data structures (lists, sets, sorted sets)
Use: Redis. These are first-class types with server-side commands; DynamoDB offers list, map, and set attributes but no sorted sets
If: Data set exceeds available memory and needs auto-scaling
Use: DynamoDB. It scales horizontally to petabytes and offers on-demand (pay-per-request) pricing
If: Need persistence with controlled durability
Use: Redis with AOF and appendfsync everysec, which balances performance and durability
If: Multi-region active-active replication
Use: DynamoDB Global Tables, where concurrent writes are reconciled last-writer-wins; open-source Redis replication is primary-replica only

Column-Family Stores — Cassandra, HBase, Scylla

Column-family stores (often called wide-column stores) store data in rows but allow each row to have different columns. The key idea: data is indexed by row key and sorted by column key within each row. This makes them excellent for time-series data, IoT streaming, and analytics workloads that scan large ranges of a known row.

Apache Cassandra is the standard-bearer. It offers tunable consistency — choose how many replicas must respond before the read/write is considered successful. Its architecture is masterless: every node can accept reads/writes. Data is partitioned via consistent hashing and replicated across nodes. Writes are designed to be blazing fast (append-only commit log + memtable + periodic SSTable flush).
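
The partitioning step can be sketched as a small hash ring. This is a simplified illustration, not Cassandra's implementation: it uses MD5 from the standard library as a stand-in hash, and the `Ring` class, node names, and `vnodes` count are assumptions made for the example.

```python
import bisect
import hashlib


def token(value):
    """Map a string onto the ring as a 64-bit integer (MD5 is only used to spread keys)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each node owns several points ("virtual nodes") to even out the load.
        self.points = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.tokens = [t for t, _ in self.points]

    def replicas(self, partition_key, replication_factor=3):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        start = bisect.bisect_right(self.tokens, token(partition_key))
        owners = []
        for offset in range(len(self.points)):
            node = self.points[(start + offset) % len(self.points)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replication_factor:
                break
        return owners


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("sensor-42:2026-05-01"))  # three distinct node names
```

The useful property is what happens when a node joins: a key's first owner either stays the same or becomes the new node, so only a fraction of the data has to move.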

HBase (on top of HDFS) offers strong consistency but at the cost of write throughput. ScyllaDB is a C++ rewrite of Cassandra claiming 10x better performance on the same hardware.

cassandra_schema.cql · CQL
-- Keyspace with network topology replication
CREATE KEYSPACE iot_data WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': '3'
};

CREATE TABLE iot_data.sensor_readings (
  sensor_id uuid,
  day text,          -- partition key: YYYY-MM-DD
  ts timestamp,      -- clustering column for sorting
  temperature float,
  humidity float,
  PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- Efficient: fetch latest 100 readings for a sensor on a given day
SELECT * FROM iot_data.sensor_readings
WHERE sensor_id = ? AND day = '2026-05-01'
ORDER BY ts DESC LIMIT 100;
▶ Output
Query returns up to 100 rows in <10ms
⚠ The Partition Key Trap
A common Cassandra antipattern is using a high-cardinality partition key like a single timestamp column. If each insert creates a new partition, every read ends up hitting many partitions — killing performance. The partition key should distribute data across nodes but also allow efficient scans. For time-series data, bucket by day or hour: 'sensor_id + day' gives one partition per sensor per day — good balance.
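
The bucketing rule can be checked with a few lines of Python. This is a sketch with made-up readings; `partition_key` is a hypothetical helper that mirrors the `(sensor_id, day)` partition key of the table above, and the second dictionary models the antipattern of partitioning by raw timestamp.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(sensor_id, ts):
    """Bucket by UTC day so one sensor-day maps to exactly one partition."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))


readings = [
    ("sensor-1", datetime(2026, 5, 1, 0, 0, 5, tzinfo=timezone.utc), 21.5),
    ("sensor-1", datetime(2026, 5, 1, 13, 30, 0, tzinfo=timezone.utc), 22.1),
    ("sensor-1", datetime(2026, 5, 2, 8, 15, 0, tzinfo=timezone.utc), 20.9),
    ("sensor-2", datetime(2026, 5, 1, 9, 0, 0, tzinfo=timezone.utc), 19.4),
]

by_bucket = defaultdict(list)     # bucketed: (sensor_id, day)
by_timestamp = defaultdict(list)  # antipattern: raw timestamp as partition key
for sensor_id, ts, temperature in readings:
    by_bucket[partition_key(sensor_id, ts)].append((ts, temperature))
    by_timestamp[ts].append((sensor_id, temperature))

print(len(by_bucket), len(by_timestamp))  # 3 4
```

With day buckets, a query for one sensor on one day reads a single partition. With raw timestamps, the number of partitions grows with every insert.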
📊 Production Insight
Tombstones are Cassandra's ghost records. When you delete a row, it's not removed immediately: a tombstone marker is written, and it is only purged by compaction after gc_grace_seconds has passed. If you have many deletes or expiring TTLs (e.g., time-series data), tombstones accumulate, and every read over that range has to scan and discard them, which slows queries and can make them time out.
Rule: monitor tombstones per read with nodetool tablestats, which reports average and maximum tombstones per slice. If you see reads scanning thousands of tombstones per query, change the delete pattern or use the TWCS compaction strategy for TTL-based time-series data.
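
Why tombstones hurt reads can be shown with a toy partition. This is a simplified model, not Cassandra's storage engine: the partition is a plain newest-first list, `TOMBSTONE` is a marker object invented for the example, and `read_latest` is a hypothetical helper.

```python
TOMBSTONE = object()  # marker standing in for a deleted cell


def read_latest(partition, limit):
    """Return up to `limit` live values, newest first, plus how many
    tombstones had to be stepped over to find them."""
    live, tombstones_scanned = [], 0
    for value in partition:
        if value is TOMBSTONE:
            tombstones_scanned += 1
            continue
        live.append(value)
        if len(live) == limit:
            break
    return live, tombstones_scanned


# Newest-first partition: 5,000 deleted readings sit in front of 10 live ones.
partition = [TOMBSTONE] * 5000 + list(range(10))
rows, scanned = read_latest(partition, limit=10)
print(len(rows), scanned)  # 10 5000
```

The query asked for 10 rows but had to walk past 5,000 markers first, so the cost of the read is set by what was deleted, not by what is returned.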
🎯 Key Takeaway
Column-family stores excel at write-heavy, wide-row workloads.
Cassandra is masterless and linearly scalable — ideal for multi-datacenter deployments.
But query flexibility is sacrificed: you must model query patterns at schema design time.
Rule: if you need ad-hoc SQL-style queries, Cassandra will frustrate you.

Graph Databases — Neo4j, Amazon Neptune

Graph databases model data as nodes (entities) and edges (relationships). This makes them the natural choice for social networks, recommendation engines, fraud detection, and any domain where the connections between data points are as important as the data itself.

Neo4j is the most mature graph database. It uses the property graph model: nodes and edges can have key-value properties. Queries are expressed in Cypher, a declarative language that looks like ASCII art. Relationships are first-class citizens — they always have a direction and a type. This avoids the costly join tables and recursive queries needed in SQL to traverse relationships.

Performance: traversing relationships is O(1) per hop because edges are stored as pointers. For graph queries like 'find friends of friends of friends who like this movie', graph databases are orders of magnitude faster than SQL joins across multiple tables.
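
A friends-of-friends traversal over an adjacency structure can be sketched as follows. This is a toy in-memory graph, not Neo4j; the user names and the `friends_of_friends` helper are made up for illustration. Each hop follows the edges stored with the current node, so the work depends on how many edges are followed rather than on the total size of the graph.

```python
# Hypothetical toy graph: each user maps directly to the set of friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, user):
    """Two hops out from `user`, excluding the user and direct friends."""
    direct = graph[user]
    result = set()
    for friend in direct:                # hop 1: follow the user's own edges
        for candidate in graph[friend]:  # hop 2: follow each friend's edges
            if candidate != user and candidate not in direct:
                result.add(candidate)
    return result


print(sorted(friends_of_friends(friends, "alice")))  # ['dave', 'erin']
```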

neo4j_social.cql · CYPHER
// Find movie recommendations for a user based on friends' ratings
// (assumes the rating is stored as a property on the RATED relationship)
MATCH (u:User {id: 'alice'})-[:FRIEND]->(f:User)
MATCH (f)-[r:RATED]->(m:Movie)
WHERE NOT EXISTS { (u)-[:RATED]->(m) }
RETURN m.title, avg(r.rating) AS avg_rating
ORDER BY avg_rating DESC
LIMIT 10
▶ Output
10 movie titles with average friend ratings
🔥Graphs vs SQL: The Social Network Benchmark
As a rough illustration: on a graph of 1 million users, each with 50 friends on average, finding a friend-of-a-friend relationship takes ~2ms in Neo4j, because the traversal only touches about 50 × 50 = 2,500 relationships. In a relational database, the same query requires a self-join on a friendships table with 50 million rows, which can take seconds and gets worse with every extra hop. This isn't a SQL failure; it's a paradigm mismatch.
📊 Production Insight
Graph databases tempt developers to model everything as a graph. Don't.
Storing large unstructured documents (e.g., JSON blobs) in nodes is wasteful — you lose the flexibility of document stores.
Mix strategies: use Neo4j for relationship-heavy queries, but store the underlying data objects in a document store and reference them by node ID.
Also, watch out for Cartesian products in Cypher queries — they burn CPU fast.
🎯 Key Takeaway
Graph databases are unbeatable when relationships are the primary query.
Cypher is designed to express path queries naturally.
But they're terrible for bulk aggregations or range scans. Use the right tool.
Rule: if your data looks like a web of connections, go graph. If it's a table, stay SQL.

CAP Theorem and Trade-offs in NoSQL

The CAP theorem states a distributed data store can provide at most two of three guarantees: Consistency (every read sees the latest write), Availability (every request receives a non-error response), and Partition Tolerance (system continues despite network splits). In practice, partitions are inevitable in any distributed system, so you must choose between CP (Consistency + Partition Tolerance) and AP (Availability + Partition Tolerance).

NoSQL databases make explicit trade-offs: MongoDB is CP by default (primary reads), but can be configured for eventual consistency (AP). Cassandra is AP by default: it prefers availability over consistency. Redis Cluster does not guarantee strong consistency: replication is asynchronous, so an acknowledged write can be lost during failover, and multi-key operations work only when all keys hash to the same slot.

This is not a theoretical exercise. In production, the CAP choice determines how your system behaves during a network partition. If a node is isolated but available, it may accept writes that conflict with writes accepted by the rest of the cluster. When the partition heals, you need conflict resolution (last-write-wins, CRDTs, or manual reconciliation). Many teams discover CAP the hard way — when their 'eventually consistent' system fails to converge for hours.
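
Two of the conflict-resolution strategies named above can be compared in a few lines. This is a minimal sketch: `lww_merge` and `GCounter` are illustrative names, the replica IDs and timestamps are made up, and a grow-only counter is just one of the simplest CRDTs.

```python
def lww_merge(a, b):
    """Last-write-wins: keep the (timestamp, value) pair with the newer timestamp."""
    return a if a[0] >= b[0] else b


class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged with element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self):
        return sum(self.counts.values())


# Two sides of a partition each accept "likes" independently.
left, right = GCounter("dc-east"), GCounter("dc-west")
left.increment(3)
right.increment(2)

# LWW on a plain integer keeps one side's count and drops the other's.
lww_result = lww_merge((100, 3), (101, 2))

# The CRDT merge keeps both sides' increments, in either merge order.
left.merge(right)
right.merge(left)
print(lww_result[1], left.value(), right.value())  # 2 5 5
```

Last-write-wins converges but silently discards three increments here. The counter converges to 5 on both sides because its merge is commutative, associative, and idempotent.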

io/thecodeforge/nosql/CapExample.java · JAVA
package io.thecodeforge.nosql;

// Illustrating CAP trade-off in code — not real API
public class CapExample {
    enum CapChoice { CP, AP, CA_UNREALISTIC }

    static class Config {
        CapChoice choice;
        String readConcern;   // e.g., "local", "majority"
        String writeConcern;  // e.g., "1" (ack from one node), "majority"

        static Config forMongoDb(String consistencyLevel) {
            Config c = new Config();
            if ("strong".equals(consistencyLevel)) {
                c.choice = CapChoice.CP;
                c.readConcern = "majority";
                c.writeConcern = "majority";
            } else {
                c.choice = CapChoice.AP;
                c.readConcern = "local";
                c.writeConcern = "1";
            }
            return c;
        }
    }

    public static void main(String[] args) {
        Config prod = Config.forMongoDb("strong");
        System.out.println("Production config: " + prod.choice);
        // Under partition: strong consistency means some writes may fail
    }
}
▶ Output
Production config: CP
Mental Model
CAP in Real Life: Your Bank vs Social Feed
Your bank account must be consistent above all — you never want to see a $0 balance when you have $500. Your social media feed is okay being stale by a few seconds — consistency can be sacrificed for availability.
  • CP systems (e.g., HBase, MongoDB with majority concern): will reject writes during a partition to maintain consistency
  • AP systems (e.g., Cassandra, DynamoDB): will accept writes during a partition, but you may read stale or conflicting data after healing
  • The choice determines your on-call experience: CP systems cause write failures; AP systems cause data reconciliation nightmares
  • There is no 'right' answer; it depends on your business: can you tolerate rejected writes, or stale and conflicting reads?
📊 Production Insight
CAP is often misunderstood as a static choice. In practice, you can tune per operation.
MongoDB allows per-query read concern and write concern. Cassandra has per-query consistency level.
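But don't mix levels carelessly: with a replication factor of 3, a write at QUORUM (2 replicas) followed by a read at ONE can hit the one replica that has not received the write yet and return stale data. Reads are guaranteed to see the latest write only when read replicas + write replicas > replication factor.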
Rule: document your consistency model for every data operation. When the on-call phone rings at 3 AM, you need to know exactly what trade-off you chose.
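
The overlap rule for tunable consistency (read replicas + write replicas > replication factor) is easy to check in code. This is a small sketch covering only three consistency level names; the helper functions are hypothetical and not part of any driver.

```python
def quorum(replication_factor):
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Translate a consistency level name into a replica count."""
    return {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }[level]


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


rf = 3
for write_level, read_level in [("QUORUM", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, reads_see_latest_write(write_level, read_level, rf))
# QUORUM ONE False
# QUORUM QUORUM True
# ALL ONE True
```

A table like this in the runbook, one row per critical read/write pair, makes the chosen trade-off explicit before an incident rather than during one.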
🎯 Key Takeaway
CAP is not a suggestion; it is a proven constraint on distributed systems.
You choose between CP and AP; there is no 'CA' in a real network.
Your consistency level directly impacts availability during failures.
Rule: write down your CAP trade-off for every critical data path. Test it with a network partition experiment.
🗂 NoSQL Database Family Comparison
Key characteristics, strengths, and weaknesses at a glance
Feature | Document (MongoDB) | Key-Value (Redis) | Column-Family (Cassandra) | Graph (Neo4j)
Data Model | JSON/BSON documents | Key → value blob | Rows with flexible columns | Nodes & relationships
Query Language | MQL (MongoDB Query Language) | Commands (SET, GET, etc.) | CQL (Cassandra Query Language) | Cypher
Best Use Case | Product catalogs, user profiles | Session cache, counters, leaderboards | Time series, IoT, event logging | Social graphs, recommendations, fraud
Scalability Model | Replica sets + sharding | Redis Cluster (hash slots) | Masterless ring (consistent hashing) | Read replicas, causal clustering
Consistency (default) | CP (primary reads, majority writes) | Consistent on a single node; Redis Cluster replicates asynchronously, so acknowledged writes can be lost on failover | AP (tunable per query) | CP (single-instance writes) / AP (cluster reads)
Latency (p50 read) | 1-5ms (indexed, local) | <1ms (in-memory) | 2-10ms (tunable consistency) | 5-20ms (indexed traversal)
Primary Limitation | Multi-document transactions slow | Memory-bound, no complex queries | Ad-hoc queries hard; tombstone overhead | Bad for bulk aggregations; write performance

🎯 Key Takeaways

  • NoSQL is a family of purpose-built databases — pick the one that matches your data access pattern
  • Document stores: flexible schemas, embedded data, best for catalogs and profiles
  • Key-value stores: fastest reads by primary key, memory-bound, ideal for caching and counters
  • Column-family stores: write-optimized, masterless, great for time-series and high-throughput writes
  • Graph databases: relationship-first queries, natural for social and recommendation systems
  • CAP theorem forces a choice between consistency and availability during partitions — know which one your business needs
  • Most production systems use polyglot persistence: SQL for transactions, NoSQL for specific workloads

⚠ Common Mistakes to Avoid

    Choosing NoSQL 'because it’s cool' without understanding the access patterns
    Symptom

    Your app spends most of its time doing application-level joins across collections, resulting in N+1 queries and poor latency. The 'flexibility' of NoSQL is wasted because your data actually has strong relationships.

    Fix

    Start with a relational database. Only move to NoSQL when you can articulate exactly which SQL limitation (scale, schema flexibility, latency) is blocking you, and confirm the NoSQL family addresses it.

    Not modelling your data for the NoSQL query patterns
    Symptom

    In Cassandra, you create tables mimicking your relational model and then find you can't query efficiently by anything other than the partition key. In MongoDB, you normalise users and orders into separate collections and hit performance problems because $lookup joins are far slower than reading one embedded document.

    Fix

    For Cassandra, design your tables around the queries you will run — duplicate data across multiple tables if needed. For MongoDB, embed related data that is read together into a single document. Accept denormalisation.

    Assuming NoSQL is cheaper than SQL
    Symptom

    You spin up a large Redis cluster because it's 'fast', but the memory cost dwarfs what an optimised Postgres query would cost. Or you choose DynamoDB for a small workload and get surprised by the per-request cost.

    Fix

    Calculate total cost of ownership including infrastructure, operations, and developer time. Often, a well-tuned Postgres with connection pooling and proper indexing is more cost-efficient for most workloads. Reserve NoSQL for where it provides specific value.

    Ignoring eventual consistency during application design
    Symptom

    Your e-commerce site allows double-booking of hotel rooms because two concurrent requests read from a stale replica, both think the room is free, and both proceed to book. Afterwards, you have 2 bookings for 1 room.

    Fix

    Use strong consistency or optimistic locking for critical resources (inventory, balance). Design idempotent operations. Test your application under network partitions — tools like Jepsen or Chaos Monkey can reveal consistency bugs.
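
    The optimistic-locking fix can be sketched with a version number that every write must match. This is an in-memory illustration only: `RoomStore`, `VersionConflict`, and the room ID are hypothetical names. In a real database the version check and the update must be a single conditional write, otherwise the race simply moves into the gap between them.

```python
class VersionConflict(Exception):
    pass


class RoomStore:
    """Hypothetical in-memory store with a compare-and-set style update."""

    def __init__(self):
        self.rooms = {"room-101": {"booked_by": None, "version": 1}}

    def read(self, room_id):
        return dict(self.rooms[room_id])  # copy, like a snapshot sent to a client

    def book(self, room_id, guest, expected_version):
        room = self.rooms[room_id]
        # The write only succeeds if nobody changed the record since it was read.
        if room["version"] != expected_version or room["booked_by"] is not None:
            raise VersionConflict(f"{room_id} changed since version {expected_version}")
        room["booked_by"] = guest
        room["version"] += 1


store = RoomStore()
# Both requests read the same snapshot and both see the room as free.
seen_by_ann = store.read("room-101")
seen_by_ben = store.read("room-101")

store.book("room-101", "ann", seen_by_ann["version"])
try:
    store.book("room-101", "ben", seen_by_ben["version"])
    outcome = "double booked"
except VersionConflict:
    outcome = "second booking rejected"
print(store.read("room-101")["booked_by"], outcome)  # ann second booking rejected
```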

Interview Questions on This Topic

  • Q: Explain the CAP theorem and how it affects NoSQL database choices. Give a real-world example of a CP vs AP choice. (Senior)
    CAP theorem states a distributed data store can only provide two of three guarantees: Consistency (every read returns the most recent write), Availability (every request receives a response), and Partition Tolerance (system works despite network splits). Since partitions are inevitable in distributed systems, you must choose between CP and AP. Real-world example: A banking system needs strong consistency, so CP is the right choice. If a partition occurs, the system may become unavailable for a few seconds rather than risk showing a stale balance. In contrast, a social media feed can tolerate stale data for a few seconds, so AP is chosen to ensure high availability even during partitions.
  • Q: When would you choose MongoDB over PostgreSQL? Give a concrete scenario. (Mid-level)
    Choose MongoDB when your data has a highly varied or frequently changing schema, and when reads typically involve fetching a single document that contains all the data needed for a page. For example, a product catalogue where each product category has different attributes (electronics have specs, clothing has size/color, food has nutritional info). In MongoDB, each product is a document with its own structure — no migrations needed. Query performance is good because you can find the product by ID and get everything in one read. But if your data has strong relationships (orders, line items, shipments) and you need ACID transactions across multiple entities, PostgreSQL is almost always better — even with MongoDB 4.0+ multi-document transactions, the performance penalty is significant.
  • Q: Describe how Cassandra handles writes and how it achieves high write throughput. (Senior)
    Cassandra's write path is append-only. When a write arrives, it's appended to a commit log for durability and then applied to a memtable (in-memory sorted structure). Once the memtable is full, it's flushed to disk as an SSTable (sorted immutable file). This design avoids random I/O on the write path: writes are sequential. Since nodes are masterless (ring topology), any node can accept a write, which eliminates the single-master bottleneck. The coordinator node determines the replicas via consistent hashing (partition key) and writes to all replicas in parallel. Consistency is tunable; for example, consistency level QUORUM means the write must be acknowledged by a majority of replicas before it returns. Because no node is a bottleneck, write throughput scales roughly linearly as nodes are added.
  • Q: What are the trade-offs of using an in-memory key-value store like Redis vs a disk-backed one like DynamoDB? (Mid-level)
    Redis stores all data in RAM, providing sub-millisecond latency but limiting dataset size to available memory (and memory is expensive). Persistence is optional and can affect performance during snapshotting. DynamoDB stores data on SSD-backed storage, so individual reads are slower than RAM (typically single-digit milliseconds), but it scales to petabytes, and costs are based on read/write capacity units. DynamoDB is better for large datasets that don't fit in memory or when you need automatic scaling. Redis excels at real-time operations like rate limiting, session stores, and leaderboards, where low latency and rich data structures are critical. The trade-off is cost vs scale vs speed.
  • Q: Explain the concept of a tombstone in Cassandra. Why is it a performance problem and how do you mitigate it? (Senior)
    In Cassandra, deleting a row does not remove it immediately; a tombstone marker is written instead. The tombstone and the data it shadows are purged during compaction (when SSTables are merged), and only after the table's gc_grace_seconds window (10 days by default) has elapsed. Until then, reads must scan past tombstones. If a query touches a large number of them, it can time out, especially in large partitions. Mitigation: prefer TTL (Time-To-Live) expiry over explicit deletes where possible, and pair it with 'TimeWindowCompactionStrategy' (TWCS) for time-series data so that whole expired SSTables can be dropped at once. Avoid creating many tiny partitions with short TTLs; group data into time buckets that are large enough to be efficient but still bounded in size. Monitor tombstones per read with nodetool tablestats and watch for tombstone warnings in the Cassandra logs.
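The cost of tombstones can be sketched with a toy table in which deletes write markers and compaction purges them only after a grace period. The `TOMBSTONE` sentinel and the `delete`, `read_all`, and `compact` helpers are hypothetical; the only value taken from Cassandra is the default `gc_grace_seconds` of 864000 (10 days).

```python
GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)
TOMBSTONE = object()        # sentinel standing in for a delete marker


def delete(table: dict, key: str, now: int) -> None:
    """A delete writes a timestamped marker instead of removing the entry."""
    table[key] = (TOMBSTONE, now)


def read_all(table: dict) -> tuple:
    """Return live values plus how many tombstones the read had to skip."""
    live, skipped = [], 0
    for key in sorted(table):
        value, _ts = table[key]
        if value is TOMBSTONE:
            skipped += 1
        else:
            live.append(value)
    return live, skipped


def compact(table: dict, now: int, gc_grace: int = GC_GRACE_SECONDS) -> dict:
    """Drop tombstones older than gc_grace; keep live data and younger tombstones."""
    return {
        k: (v, ts) for k, (v, ts) in table.items()
        if v is not TOMBSTONE or now - ts < gc_grace
    }


table = {f"row{i}": (f"v{i}", 0) for i in range(5)}
for i in range(4):
    delete(table, f"row{i}", now=100)

live_before, skipped_before = read_all(table)       # 1 live value, 4 tombstones scanned
early = compact(table, now=200)                     # too soon: tombstones survive
late = compact(table, now=100 + GC_GRACE_SECONDS)   # grace period over: purged
live_after, skipped_after = read_all(late)
```

Before the grace period ends, a read returning one live row still has to step over four markers; that ratio is what makes tombstone-heavy partitions slow.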
  • Q: When would you use a graph database over a document store? (Senior)
    Use a graph database when the relationships between entities are as important as the entities themselves, and queries primarily involve traversing those relationships. Examples: social network friend-of-friend recommendations, fraud detection (finding suspicious transaction chains), supply chain impact analysis, and metadata management. In a document store, these traversals require multiple queries (the N+1 pattern) or application-level joins, both of which are slow at scale. Graph databases (e.g., Neo4j) store relationships as first-class objects, so following one hop costs roughly constant time regardless of the total size of the graph. The trade-off: graph databases are poor at bulk aggregations and range scans, and their write performance is lower than that of column-family stores.
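The friend-of-friend traversal mentioned above can be sketched with an adjacency list, where each node points directly at its neighbours. This is a plain-Python illustration with made-up names, not Neo4j code; the `friends` data and the `friends_of_friends` function are hypothetical.

```python
# Adjacency list: each person points directly at their friends (made-up data).
friends = {
    "ana": {"ben", "cho"},
    "ben": {"ana", "dev"},
    "cho": {"ana", "dev", "eli"},
    "dev": {"ben", "cho"},
    "eli": {"cho"},
}


def friends_of_friends(graph: dict, person: str) -> set:
    """People exactly two hops away who are not already direct friends."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        # Each hop is a direct lookup of the neighbour set, not a scan of all people.
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


suggestions = friends_of_friends(friends, "ana")
```

The work done depends on how many neighbours are visited, not on how many people exist in total, which is the property that graph databases build their storage around.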

Frequently Asked Questions

What is NoSQL in simple terms?

NoSQL stands for 'Not Only SQL'. It refers to databases that don't use the traditional table-based relational model. Instead, they store data as documents, key-value pairs, wide columns, or graphs. They're built to handle large scale, flexible schemas, and high availability — often sacrificing some ACID guarantees in exchange.

Which NoSQL database should I learn first?

Start with MongoDB (document store) — it's the most popular, has a rich query language, and the concepts transfer to other NoSQL systems. Then learn Redis (key-value) for caching and real-time workloads. For interview preparation, also understand Cassandra (column-family) and Neo4j (graph) at a high level. The key is understanding the trade-offs, not just syntax.

Can I use NoSQL for a banking application?

Not without careful design. Banking requires ACID transactions across multiple entities (accounts, ledgers, transactions). Most NoSQL systems sacrifice strong consistency or multi-record atomicity. While some NoSQL databases (e.g., MongoDB with majority write concern on a replica set) can be configured for strong consistency, the operational complexity and the potential for data loss in edge cases make relational databases a safer choice for core financial systems.

What is the difference between MongoDB and Cassandra?

MongoDB is a document store — data is stored as JSON-like documents with flexible schemas. It supports secondary indexes, rich queries, and aggregation pipelines. Cassandra is a column-family store — data is stored in rows with flexible columns, but it's designed for high write throughput and horizontal scaling across many nodes. Cassandra has a more limited query language (CQL) — you must model your data around the queries you'll run. MongoDB is easier to start with; Cassandra is better for time-series and write-heavy workloads.

Is NoSQL faster than SQL?

It depends on the query pattern. For simple key lookups, key-value stores can be much faster than SQL (10-100x is often cited, though the real figure varies with workload and hardware) because they skip query parsing, planning, and joins. But for complex queries involving multiple conditions, aggregations, and relationships, a well-indexed SQL database often outperforms NoSQL, which would need application-level logic to do the same work. The speed difference is a function of access pattern, not the storage engine itself.

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
