Senior 3 min · June 25, 2026

NoSQL Store Types Compared: Picking the Right Hammer Before Production Bites You

Q: What are the four types of NoSQL databases?

The four types are key-value stores (Redis, DynamoDB), document stores (MongoDB, Firestore), column-family stores (Cassandra, HBase), and graph stores (Neo4j, Amazon Neptune). Each is optimized for different data models and access patterns.

Q: What's the difference between a document store and a column-family store?

Document stores store self-contained JSON documents and allow querying on any field. Column-family stores store rows with many columns grouped into families, optimized for wide tables and range scans. Use document for semi-structured data with varying attributes; use column-family for time-series or event logs.

Q: How do I choose between MongoDB and Cassandra for a new project?

Choose MongoDB if you need flexible queries, secondary indexes, and a schema that evolves. Choose Cassandra if you need extreme write throughput, multi-datacenter replication, and can design your schema around known query patterns (no joins, only limited aggregation).

Q: Can I use a graph database for simple key-value lookups?

Technically yes, but it's overkill. Graph databases are optimized for traversing relationships, not for point lookups. A key-value store will be faster and simpler for that use case.

Compare key-value, document, column-family, and graph stores by internals, trade-offs, and production patterns.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Drawn from code that ran under real load.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren


⚡Quick Answer

The four main NoSQL store types are key-value (Redis, DynamoDB), document (MongoDB, Firestore), column-family (Cassandra, HBase), and graph (Neo4j, Amazon Neptune). Choose based on your access pattern: key-value for simple lookups, document for semi-structured data, column-family for write-heavy wide rows and range scans, graph for connected data.

✦ Definition · ~90s read

What Are NoSQL Store Types?

NoSQL stores are non-relational databases designed for specific data models and access patterns. Many relax ACID guarantees or limit cross-record transactions in exchange for horizontal scalability, flexible schemas, and high throughput under specific workloads.


Plain-English First

Think of NoSQL stores like different toolboxes. Key-value is a post-it note: you write a label and a value, and you can only find it by that label. Document is a filing cabinet with folders: each folder has a label and a bunch of papers inside, and you can search by any paper's content. Column-family is a spreadsheet that grows sideways: you add columns on the fly, and you can read all cells in a column quickly. Graph is a mind map: you store nodes (people, places) and edges (relationships), and you can traverse connections efficiently.

If you've ever watched a PostgreSQL instance buckle under 50k writes per second while your colleague's DynamoDB table laughs at 500k, you know the difference between picking the right tool and the wrong one. NoSQL isn't a single thing—it's four fundamentally different data models, each with its own failure modes and sweet spots. I've seen teams burn months migrating from MongoDB to Cassandra because they didn't understand that document stores and column-family stores solve completely different problems. This article gives you the decision framework I wish I'd had: the internals, the trade-offs, and the production patterns that separate a smooth scaling story from a 3am incident. By the end, you'll be able to look at any access pattern and instantly know which NoSQL store type to reach for—and more importantly, which one to avoid.

Key-Value Stores: The Simplest Hammer, But Not for Every Nail

Key-value stores are the most basic NoSQL type. You have a key, you have a value. That's it. The value is opaque: the database doesn't care about its structure. This simplicity gives you insane performance: Redis can do 100k+ ops/sec on a single node. But it also means you can't query by anything other than the key. Use cases: caching, session stores, rate limiters, distributed locks. The trade-off: you must know the key to get the value. In the pure model there are no range queries and no secondary indexes; individual products add them as extensions (Redis sorted sets, DynamoDB sort keys and secondary indexes). If your access pattern is 'give me the user profile for user_id 42', key-value is perfect. If you need 'give me all users who signed up last week', you're building that index yourself.
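To make "building that index yourself" concrete, here is a minimal sketch of a hand-maintained secondary index on top of a plain key-value map. `TinyKV`, its method names, and the `signup` field are hypothetical and exist only for this illustration; a real store would also need the index update to be atomic with the write.

```python
from collections import defaultdict
from datetime import date


class TinyKV:
    """In-memory stand-in for a key-value store (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}
        # Hand-maintained secondary index: signup date -> set of user keys
        self._by_signup = defaultdict(set)

    def put_user(self, user_id, profile):
        key = f"user:{user_id}"
        old = self._data.get(key)
        if old is not None:
            # Keep the index consistent when a profile is overwritten
            self._by_signup[old["signup"]].discard(key)
        self._data[key] = profile
        self._by_signup[profile["signup"]].add(key)

    def get_user(self, user_id):
        # The only lookup the store itself gives you: by key
        return self._data.get(f"user:{user_id}")

    def signed_up_between(self, start, end):
        # Answered from the index, never by scanning the values
        keys = set()
        for day, members in self._by_signup.items():
            if start <= day <= end:
                keys |= members
        return sorted(keys)


kv = TinyKV()
kv.put_user(42, {"name": "Ada", "signup": date(2026, 6, 20)})
kv.put_user(43, {"name": "Lin", "signup": date(2026, 5, 1)})
print(kv.signed_up_between(date(2026, 6, 18), date(2026, 6, 25)))  # ['user:42']
```

Every write now touches two structures, which is the real cost of this pattern: the application owns index consistency, not the database.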

RateLimiterWithRedis.systemdesign

# io.thecodeforge - System Design tutorial

# Rate limiter using Redis sorted sets (key-value with ordered values)
# Scenario: API rate limiter per user, sliding window of 1 minute, max 100 requests

import time
import uuid

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def allow_request(user_id: str) -> bool:
    key = f"ratelimit:{user_id}"
    now = time.time()
    window_start = now - 60  # 1 minute sliding window

    # Remove entries outside the window
    r.zremrangebyscore(key, 0, window_start)

    # Count current requests in window
    request_count = r.zcard(key)

    if request_count >= 100:
        return False

    # Add current request with timestamp as score; the random suffix keeps two
    # requests with the same timestamp from collapsing into one member
    r.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    # Expire idle keys so inactive users don't hold memory
    r.expire(key, 60)
    # Note: the count and the add are separate round trips, so concurrent
    # callers can briefly exceed the limit; a Lua script closes that gap
    return True

# Usage
print(allow_request("user_abc"))  # True or False

Output

True (if under the limit) or False (if over the limit)

Production Trap: Redis Memory Eviction

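With the default noeviction policy, Redis does not evict anything: once it reaches maxmemory it rejects writes with an OOM error. If maxmemory is not set at all, Redis keeps growing until the OS kills it. For cache workloads, set maxmemory and an eviction policy such as allkeys-lru in production.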
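The difference between the two policies can be sketched in a few lines. This is a conceptual model only, assuming a limit counted in items rather than bytes and exact LRU ordering (Redis approximates LRU by sampling); `BoundedCache` and its parameters are hypothetical names.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy cache with a hard item limit and a selectable policy (illustrative)."""

    def __init__(self, max_items, policy="noeviction"):
        self.max_items = max_items
        self.policy = policy
        self._items = OrderedDict()  # oldest access first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key not in self._items and len(self._items) >= self.max_items:
            if self.policy == "noeviction":
                # Mirrors the behaviour described above: the write is rejected
                raise MemoryError("OOM: write rejected, cache is full")
            self._items.popitem(last=False)  # drop the least recently used key
        self._items[key] = value
        self._items.move_to_end(key)


lru = BoundedCache(2, policy="allkeys-lru")
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")     # "a" is now the most recently used key
lru.set("c", 3)  # evicts "b"
print(lru.get("b"), lru.get("a"), lru.get("c"))  # None 1 3
```

With `policy="noeviction"` the third `set` raises instead of evicting, which is what a write path sees in production when no eviction policy is configured.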

[Figure: NoSQL Store Types Compared]

[Figure: Key-Value Store Operation Flow]

Document Stores: Schemas Are Optional, But Discipline Is Not

Document stores like MongoDB store JSON-like documents. Unlike key-value, the database understands the document structure: you can query on any field, create secondary indexes, and run aggregations. This flexibility is a double-edged sword. Without an enforced schema, data drifts: one document has email as a string, another has it as an object. The sweet spot: catalogs, content management, event sourcing, and any workload where the schema evolves rapidly. The gotcha: document stores are a poor fit for join-heavy workloads. MongoDB's $lookup works, but it gets expensive quickly as collections grow. If your data is highly relational, you're better off with a graph store or plain PostgreSQL with JSONB.
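One way to keep that discipline is to check document shape in the application before writing. The sketch below is a minimal, assumption-laden example: `USER_SCHEMA` and `validate` are hypothetical names, and the check covers only required fields and top-level types. MongoDB can also enforce rules server-side with a `$jsonSchema` collection validator.

```python
# Required fields and their expected types; extra fields are allowed,
# which preserves the flexibility document stores are chosen for.
USER_SCHEMA = {"user_id": int, "email": str}


def validate(doc, schema):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, expected in schema.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            actual = type(doc[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


good = {"user_id": 1, "email": "a@example.com", "nickname": "ada"}
bad = {"user_id": 2, "email": {"address": "b@example.com"}}

print(validate(good, USER_SCHEMA))  # []
print(validate(bad, USER_SCHEMA))   # ['email: expected str, got dict']
```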

ProductCatalogMongoDB.systemdesign

// io.thecodeforge — System Design tutorial

// Product catalog with dynamic attributes using MongoDB
// Scenario: E-commerce catalog where products have varying attributes

const { MongoClient } = require('mongodb');

async function main() {
    const client = new MongoClient('mongodb://localhost:27017');
    await client.connect();
    const db = client.db('catalog');
    const products = db.collection('products');

    // Insert a product with dynamic attributes
    await products.insertOne({
        sku: 'LAPTOP-001',
        name: 'UltraBook Pro',
        price: 1499.99,
        attributes: {
            cpu: 'Intel i7',
            ram: '16GB',
            storage: '512GB SSD'
        },
        tags: ['electronics', 'laptop']
    });

    // Query: find all laptops under $1500 with 16GB RAM
    const cursor = products.find({
        price: { $lt: 1500 },
        'attributes.ram': '16GB',
        tags: 'laptop'
    });

    for await (const doc of cursor) {
        console.log(doc.name);
    }

    await client.close();
}

main().catch(console.error);

Output

UltraBook Pro

Senior Shortcut: Index Your Query Patterns

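Create compound indexes that match your exact query filters. MongoDB typically uses a single index per query, so one compound index covering the filter fields usually beats several single-field indexes. Order the fields by the ESR guideline: equality fields first, then sort fields, then range fields. Use explain() to verify index usage.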

Column-Family Stores: When Your Data Has a Million Columns

Column-family stores like Cassandra and HBase store data in rows but group columns into families. They're optimized for write-heavy workloads and wide-column schemas where you often read a subset of columns. The key insight: data is partitioned by partition key and sorted within a partition by clustering columns. This makes range scans within a partition fast. Use cases: time-series data (IoT sensor readings), event logging, recommendation engines, and any workload with high write throughput and predictable read patterns. The trade-off: you must design your schema around your queries. Cassandra's query language (CQL) does not support joins and offers only limited aggregation, so you model a denormalized table for each access pattern.

TimeSeriesCassandra.systemdesign

// io.thecodeforge — System Design tutorial

// Time-series data model for IoT sensor readings using Cassandra
// Scenario: Store temperature readings from thousands of sensors

CREATE KEYSPACE IF NOT EXISTS iot
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

USE iot;

CREATE TABLE sensor_readings (
    sensor_id UUID,
    timestamp timestamp,
    temperature double,
    humidity double,
    PRIMARY KEY (sensor_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

// Insert a reading
INSERT INTO sensor_readings (sensor_id, timestamp, temperature, humidity)
VALUES (uuid(), toTimestamp(now()), 23.5, 60.2);

// Query: get the last 10 readings for a sensor
SELECT temperature, humidity FROM sensor_readings
WHERE sensor_id = ?
ORDER BY timestamp DESC
LIMIT 10;

Output

 temperature | humidity
-------------+----------
        23.5 |     60.2
         ... |      ...

Never Do This: Unbounded Partition Growth

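A high-cardinality partition key such as a per-sensor UUID spreads load across the cluster, but it does not bound partition size on its own: with PRIMARY KEY (sensor_id, timestamp), every reading a sensor ever sends lands in the same partition, which keeps growing for as long as the sensor reports. A low-cardinality key like 'sensor_type' is worse still: one partition can reach gigabytes quickly, causing read timeouts and compaction storms. Always bound partitions, for example by adding a time bucket (day or month) to the partition key.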
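Time bucketing is mostly a question of how the application derives the partition key. A minimal sketch, assuming one-day buckets in UTC and a composite key of (sensor_id, day); the function names are hypothetical, and the right bucket size depends on each sensor's write rate.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(ts):
    """Day bucket in UTC, e.g. '2026-06-25'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_key(sensor_id, ts):
    # Composite partition key: one partition per sensor per day
    return (sensor_id, day_bucket(ts))


def buckets_between(start, end):
    """All day buckets a reader must query to cover [start, end]."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    buckets = []
    while day <= last:
        buckets.append(day.isoformat())
        day += timedelta(days=1)
    return buckets


ts = datetime(2026, 6, 25, 23, 30, tzinfo=timezone.utc)
print(partition_key("sensor-7", ts))                 # ('sensor-7', '2026-06-25')
print(buckets_between(ts, ts + timedelta(hours=1)))  # ['2026-06-25', '2026-06-26']
```

The cost shows up on the read side: a range that crosses midnight now needs one query per bucket, which is why the bucket size should match the most common read window.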

Graph Stores: When Relationships Are the Data

Graph stores like Neo4j store nodes and edges. They excel at traversing relationships: finding friends of friends, shortest paths, or influence patterns. The data model is natural for social networks, recommendation engines, fraud detection, and knowledge graphs. The performance advantage comes from index-free adjacency: each node stores pointers to its neighbors, so traversing a path doesn't require global index lookups. The trade-off: graph stores are a poor fit for global aggregate scans (e.g., averaging a property across every user) and overkill for simple key-value lookups. They also have a steeper learning curve with query languages like Cypher or Gremlin.

SocialGraphNeo4j.systemdesign

// io.thecodeforge — System Design tutorial

// Social network graph using Neo4j Cypher
// Scenario: Find friends of friends for a user

CREATE (alice:User {name: 'Alice'})
CREATE (bob:User {name: 'Bob'})
CREATE (charlie:User {name: 'Charlie'})
CREATE (diana:User {name: 'Diana'})
CREATE (alice)-[:FRIENDS]->(bob)
CREATE (bob)-[:FRIENDS]->(charlie)
CREATE (charlie)-[:FRIENDS]->(diana);

// Query: friends of friends of Alice (excluding Alice herself and direct friends)
MATCH (alice:User {name: 'Alice'})-[:FRIENDS]->(friend)-[:FRIENDS]->(fof)
WHERE NOT (alice)-[:FRIENDS]->(fof) AND alice <> fof
RETURN fof.name AS friend_of_friend;

Output

friend_of_friend
-----------------
Charlie

Interview Gold: Index-Free Adjacency

Graph stores achieve O(1) traversal per edge because each node stores direct references to its neighbors. This is why graph DBs can traverse millions of relationships per second, while relational DBs would need expensive JOINs.
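The idea behind index-free adjacency can be sketched with plain object references. This is an in-memory illustration, not how any particular graph database lays out its storage; the `Node` class and `friends_of_friends` function are hypothetical, and the data reuses the four users from the Cypher example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []  # direct references to neighbour nodes, no index involved

    def befriend(self, other):
        self.friends.append(other)  # directed edge, like (self)-[:FRIENDS]->(other)


def friends_of_friends(start):
    """Two-hop traversal that skips the start node and its direct friends."""
    direct = set(start.friends)
    seen = set()
    result = []
    for friend in start.friends:   # hop 1: follow stored references
        for fof in friend.friends: # hop 2: follow stored references again
            if fof is start or fof in direct or fof in seen:
                continue
            seen.add(fof)
            result.append(fof.name)
    return result


alice, bob, charlie, diana = (Node(n) for n in ["Alice", "Bob", "Charlie", "Diana"])
alice.befriend(bob)
bob.befriend(charlie)
charlie.befriend(diana)

print(friends_of_friends(alice))  # ['Charlie']
```

Each hop costs work proportional to the number of edges on the node being visited, regardless of how many nodes exist in total. A relational join for the same question has to consult an index or scan a table at every hop.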

When Not to Use NoSQL: The Relational Renaissance

NoSQL isn't always the answer. If your data is highly relational with complex joins, referential integrity, and ACID transactions, a relational database is still the right choice. PostgreSQL with JSONB can handle many semi-structured workloads without the operational complexity of a separate NoSQL store. I've seen teams adopt MongoDB for a simple blog and then struggle with reporting queries that would be trivial in SQL. The rule of thumb: if multi-object transactions are central to your workload, default to a relational DB. If you need flexible schemas and horizontal scaling, consider NoSQL. But don't cargo-cult: evaluate your actual access patterns first.

Senior Shortcut: Polyglot Persistence

Use the right tool for each job. A typical microservices architecture might use PostgreSQL for orders, Redis for caching, Cassandra for event logs, and Elasticsearch for full-text search. Don't force one database to do everything.

[Figure: NoSQL vs Relational: When to Choose]

● Production incident · Post-mortem · severity: high

The 4GB Container That Kept Dying

Symptom

A microservice using MongoDB crashed every 6 hours with OOMKilled. The container had 4GB RAM. The dataset was 2GB.

Assumption

The team assumed a memory leak in the application code or driver.

Root cause

MongoDB's WiredTiger storage engine sizes its internal cache at 50% of (RAM - 1GB) by default, with a 256MB floor, and needs further memory on top of that for connections, in-flight operations, and journaling. With 4GB RAM, the cache was ~1.5GB. But the working set was 2GB, so the cache churned constantly, evicting pages and re-reading them from disk. The OOM killer fired when the OS page cache plus MongoDB's cache and overhead exceeded the container's RAM.

Fix

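Set wiredTigerCacheSizeGB to 1 (25% of RAM) to leave room for the OS cache and other processes. Also switched the collection block compressor from the default snappy to zlib with --wiredTigerCollectionBlockCompressor zlib, trading some CPU for a smaller on-disk footprint.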

Key lesson

MongoDB's memory usage is not just your data—it's cache, journal, and connections.
Always reserve 25-30% of RAM for the OS and other processes.
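The default cache size in the root cause is easy to check by hand. A small sketch of the documented default, the larger of 50% of (RAM - 1GB) and 256MB; the function name is hypothetical and the calculation ignores everything MongoDB allocates outside the cache.

```python
def default_wiredtiger_cache_gb(ram_gb):
    """Default WiredTiger cache: max(50% of (RAM - 1 GB), 256 MB)."""
    return max(0.5 * (ram_gb - 1.0), 0.25)


for ram_gb in (1, 2, 4, 16):
    cache_gb = default_wiredtiger_cache_gb(ram_gb)
    print(f"{ram_gb:>2} GB RAM -> {cache_gb} GB cache")
```

For the 4GB container in this incident the result is 1.5GB, below the 2GB working set, which matches the constant eviction the team observed.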

Production debug guide

Systematic recovery paths for the failure modes engineers actually hit. 3 entries.

Symptom · 01

MongoDB query slow despite index — executionStats shows 'COLLSCAN'

→

Fix

1. Run db.collection.getIndexes() to list indexes. 2. Create compound index matching query filter. 3. Use hint() to force index. 4. Check index size vs RAM.

Symptom · 02

Cassandra read timeout — ReadTimeoutException in logs

→

Fix

1. Check partition size with nodetool tablehistograms (cfhistograms on older versions). 2. If partition > 100MB, redesign schema with time-bucketing. 3. Increase read_request_timeout_in_ms temporarily. 4. Add more nodes to spread load.

Symptom · 03

Redis OOM — OOM command not allowed when used memory > 'maxmemory'

→

Fix

1. Check INFO memory for used_memory. 2. Set maxmemory-policy allkeys-lru. 3. Reduce TTLs or increase maxmemory. 4. Monitor evictions with INFO stats.

★ NoSQL Store Types Compared: Triage Cheat Sheet

First-response commands for when things go wrong, copy-paste ready.

MongoDB high disk I/O: `iowait` high on server

Immediate action

Check if WiredTiger cache is too small causing constant page faults.

Commands

db.serverStatus().wiredTiger.cache

db.adminCommand({setParameter: 1, wiredTigerEngineRuntimeConfig: "cache_size=2G"})

Fix now

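On a dedicated host, increase wiredTigerCacheSizeGB toward 50-60% of RAM, but leave room for the OS. In a memory-capped container, size it against the container limit rather than the host's RAM.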

Cassandra high compaction load: `CompactionExecutor` threads at 100%

Redis latency spikes: `latency` command shows >100ms

Neo4j query slow: `PROFILE` shows high db hits

Feature / Aspect	Key-Value	Document	Column-Family	Graph
Data Model	Key-value pairs	JSON-like documents	Rows with column families	Nodes and edges
Query Pattern	Get by key	Query on any field	Range scans within partition	Traverse relationships
Scalability	Horizontal (sharding)	Horizontal (sharding)	Horizontal (partitioning)	Vertical (scale-up) or sharding
Consistency	Eventual or strong (configurable)	Configurable (MongoDB: primary reads)	Eventual (Cassandra) or strong (HBase)	ACID (Neo4j) or eventual
Best For	Caching, sessions, rate limiting	Catalogs, CMS, event sourcing	Time-series, logging, IoT	Social networks, fraud detection, recommendations
Worst For	Complex queries, relationships	Joins, highly relational data	Ad-hoc queries, aggregations	Simple key lookups, aggregate counts

Key takeaways

Key-value stores are for simple lookups by key; anything else requires you to build the index yourself.

Document stores give you query flexibility but fail at joins; model your data to match access patterns, not relational normalization.

Column-family stores excel at write-heavy, wide-column workloads but require careful partition key design to avoid hot spots.

Graph stores are unbeatable for relationship traversal but terrible for aggregate queries; use them only when relationships are the primary access pattern.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR: How does Cassandra handle a node failure during a write? What consistenc...

Q02 · SENIOR: When would you choose MongoDB over PostgreSQL with JSONB?

Q03 · SENIOR: What happens when a Redis key expires while a client is reading it?

Q04 · JUNIOR: What is a document store?

Q05 · SENIOR: Your team migrated from MongoDB to Cassandra to handle write throughput,...

Q06 · SENIOR: How would you design a global-scale user session store with 99.99% avail...

Q01 of 06 · SENIOR

How does Cassandra handle a node failure during a write? What consistency level ensures no data loss?

ANSWER

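Cassandra uses hinted handoff: when a replica is down, the coordinator stores a hint for the missed write and replays it once the node recovers. The write still succeeds as long as enough live replicas acknowledge it to satisfy the requested consistency level. CL=ALL requires every replica to acknowledge, which maximizes durability but fails writes as soon as one replica is down. In practice, CL=QUORUM with replication factor 3 needs 2 of 3 acknowledgements and so tolerates one node failure.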
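The quorum arithmetic behind that answer can be worked through directly. A short sketch with hypothetical helper names: quorum is floor(RF / 2) + 1, and a read is guaranteed to overlap the latest acknowledged write when read replicas plus write replicas exceed the replication factor.

```python
def quorum(replication_factor):
    """Replicas that must respond at CL=QUORUM."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor, required_acks):
    """Replicas that can be down while the operation still succeeds."""
    return replication_factor - required_acks


def read_overlaps_write(replication_factor, write_acks, read_acks):
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_acks + write_acks > replication_factor


rf = 3
q = quorum(rf)
print(q)                               # 2
print(tolerated_failures(rf, q))       # 1 (CL=QUORUM survives one node down)
print(tolerated_failures(rf, rf))      # 0 (CL=ALL survives none)
print(read_overlaps_write(rf, q, q))   # True  (QUORUM writes + QUORUM reads)
print(read_overlaps_write(rf, 1, 1))   # False (ONE + ONE can read stale data)
```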

