Senior 3 min · June 25, 2026

NoSQL Store Types Compared: Picking the Right Hammer Before Production Bites You

Compare key-value, document, column-family, and graph stores by internals, trade-offs, and production patterns.

Naren, Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Drawn from code that ran under real load.

June 25, 2026
last updated
Quick Answer

The four main NoSQL store types are key-value (Redis, DynamoDB), document (MongoDB, Firestore), column-family (Cassandra, HBase), and graph (Neo4j, Amazon Neptune). Choose based on your access pattern: key-value for simple lookups, document for semi-structured data, column-family for wide-column aggregates, graph for connected data.

Definition
What Are NoSQL Store Types?

NoSQL stores are non-relational databases, each designed around a specific data model and access pattern. Most relax some relational guarantees (joins, multi-row ACID transactions, fixed schemas) in exchange for horizontal scalability, flexible schemas, and high throughput on the workloads they target.

Plain-English First

Think of NoSQL stores like different toolboxes. Key-value is a post-it note: you write a label and a value, and you can only find it by that label. Document is a filing cabinet with folders: each folder has a label and a bunch of papers inside, and you can search by any paper's content. Column-family is a spreadsheet that grows sideways: you add columns on the fly, and you can read all cells in a column quickly. Graph is a mind map: you store nodes (people, places) and edges (relationships), and you can traverse connections efficiently.

If you've ever watched a PostgreSQL instance buckle under 50k writes per second while your colleague's DynamoDB table laughs at 500k, you know the difference between picking the right tool and the wrong one. NoSQL isn't a single thing—it's four fundamentally different data models, each with its own failure modes and sweet spots. I've seen teams burn months migrating from MongoDB to Cassandra because they didn't understand that document stores and column-family stores solve completely different problems. This article gives you the decision framework I wish I'd had: the internals, the trade-offs, and the production patterns that separate a smooth scaling story from a 3am incident. By the end, you'll be able to look at any access pattern and instantly know which NoSQL store type to reach for—and more importantly, which one to avoid.

Key-Value Stores: The Simplest Hammer, But Not for Every Nail

Key-value stores are the most basic NoSQL type: you have a key and you have a value. The value is opaque, so the database doesn't care about its structure. That simplicity buys raw speed: Redis can do 100k+ ops/sec on a single node. It also means that, in the pure model, you can't query by anything other than the key. Use cases: caching, session stores, rate limiters, distributed locks. The trade-off: you must know the key to get the value. There are no range queries across keys and no secondary indexes (richer types such as Redis sorted sets add ordering inside a single key, but they don't change this). If your access pattern is 'give me the user profile for user_id 42', key-value is perfect. If you need 'give me all users who signed up last week', you're building that index yourself.
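The "building that index yourself" point can be made concrete with a minimal sketch. It assumes a plain in-memory dict standing in for the key-value store; the key layout (`user:<id>`, `signup_index:<date>`) and both helper functions are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta
from typing import Any

# A plain dict stands in for the key-value store (assumption: get/set by key only).
store: dict[str, Any] = {}

def put_user(user_id: str, name: str, signup: date) -> None:
    """Write the primary record and maintain a hand-rolled secondary index."""
    store[f"user:{user_id}"] = {"name": name, "signup": signup.isoformat()}
    # Secondary index: one key per signup day, holding the set of user ids.
    index_key = f"signup_index:{signup.isoformat()}"
    store.setdefault(index_key, set()).add(user_id)

def users_signed_up_between(start: date, end: date) -> list[str]:
    """Emulate a range query by probing one index key per day in the range."""
    found: list[str] = []
    day = start
    while day <= end:
        found.extend(sorted(store.get(f"signup_index:{day.isoformat()}", set())))
        day += timedelta(days=1)
    return found

put_user("42", "Ada", date(2026, 6, 20))
put_user("43", "Lin", date(2026, 6, 22))
put_user("44", "Sam", date(2026, 5, 1))
print(users_signed_up_between(date(2026, 6, 19), date(2026, 6, 25)))  # ['42', '43']
```

Every user write now touches two keys, so the index can drift from the primary record if the second write fails. In a real store that usually means a transaction, a script, or a periodic repair job.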

RateLimiterWithRedis.systemdesign
# io.thecodeforge - System Design tutorial

# Rate limiter using a Redis sorted set (one key per user, timestamps as scores)
# Scenario: API rate limiter per user, sliding window of 1 minute, max 100 requests

import time
import uuid

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

WINDOW_SECONDS = 60
MAX_REQUESTS = 100

def allow_request(user_id: str) -> bool:
    key = f"ratelimit:{user_id}"
    now = time.time()
    window_start = now - WINDOW_SECONDS

    # Trim entries outside the window and count the rest in one round trip
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, window_start)
    pipe.zcard(key)
    _, request_count = pipe.execute()

    if request_count >= MAX_REQUESTS:
        return False

    # Unique member, so two requests with the same timestamp both count
    pipe = r.pipeline()
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    # Refresh the TTL so keys of idle users are cleaned up
    pipe.expire(key, WINDOW_SECONDS)
    pipe.execute()
    return True

# Note: the count and the add are separate steps, so concurrent requests can
# briefly exceed the limit. Use a Lua script if you need a strict cap.

# Usage
print(allow_request("user_abc"))  # True or False
Output
True (if under the limit) or False (if over the limit)
Production Trap: Redis Memory Eviction
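With the default noeviction policy, Redis rejects writes with an OOM error once it reaches maxmemory; if maxmemory is unset, memory grows until the OS kills the process. For cache workloads, set maxmemory and an eviction policy such as allkeys-lru. Choose both deliberately in production.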
Figure: NoSQL Store Types Compared (key-value, document, column-family, and graph stores, plus when not to use NoSQL). Match the store type to query patterns, not hype.
Figure: Key-Value Store Operation Flow (client request, hash lookup of the key to a partition, opaque value storage, response). There is no querying inside values; you fetch the entire blob.

Document Stores: Schemas Are Optional, But Discipline Is Not

Document stores like MongoDB store JSON-like documents. Unlike a key-value store, the database understands the document structure: you can query on any field, create secondary indexes, and run aggregations. This flexibility is a double-edged sword. Without an enforced schema, data drifts: one document has email as a string, another has it as an object. The sweet spot: catalogs, content management, event sourcing, and any workload where the schema evolves rapidly. The gotcha: document stores are a poor fit for join-heavy workloads. MongoDB's $lookup performs a left outer join inside the aggregation pipeline, but it gets expensive on large or sharded collections. If your data is highly relational, you're better off with a graph store or PostgreSQL with JSONB.
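One way to apply that discipline is to validate documents in the application before they reach the database. The sketch below is a minimal illustration under stated assumptions: `PRODUCT_SCHEMA` and `validate_product` are hypothetical names, and the field list simply mirrors the product catalog used in this section.

```python
from typing import Any

# Hypothetical application-level schema: field name -> required Python type.
PRODUCT_SCHEMA: dict[str, type] = {
    "sku": str,
    "name": str,
    "price": float,
    "attributes": dict,
    "tags": list,
}

def validate_product(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the document is acceptable."""
    problems: list[str] = []
    for field, expected in PRODUCT_SCHEMA.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            actual = type(doc[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems

good = {"sku": "LAPTOP-001", "name": "UltraBook Pro", "price": 1499.99,
        "attributes": {"ram": "16GB"}, "tags": ["laptop"]}
bad = {"sku": "LAPTOP-002", "name": "Budget Book", "price": "999",
       "attributes": {"ram": "8GB"}}

print(validate_product(good))  # []
print(validate_product(bad))   # ['price: expected float, got str', 'missing field: tags']
```

MongoDB can also enforce rules on the server side with a `$jsonSchema` collection validator, which catches writers that bypass the application check.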

ProductCatalogMongoDB.systemdesign
// io.thecodeforge — System Design tutorial

// Product catalog with dynamic attributes using MongoDB
// Scenario: E-commerce catalog where products have varying attributes

const { MongoClient } = require('mongodb');

async function main() {
    const client = new MongoClient('mongodb://localhost:27017');
    await client.connect();
    const db = client.db('catalog');
    const products = db.collection('products');

    // Insert a product with dynamic attributes
    await products.insertOne({
        sku: 'LAPTOP-001',
        name: 'UltraBook Pro',
        price: 1499.99,
        attributes: {
            cpu: 'Intel i7',
            ram: '16GB',
            storage: '512GB SSD'
        },
        tags: ['electronics', 'laptop']
    });

    // Query: find all laptops under $1500 with 16GB RAM
    const cursor = products.find({
        price: { $lt: 1500 },
        'attributes.ram': '16GB',
        tags: 'laptop'
    });

    for await (const doc of cursor) {
        console.log(doc.name);
    }

    await client.close();
}

main().catch(console.error);
Output
UltraBook Pro
Senior Shortcut: Index Your Query Patterns
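Create compound indexes that match your exact query filters. MongoDB typically uses a single index per query, so one compound index covering the filter fields (equality fields first, then sort fields, then range fields) serves a query better than several single-field indexes. Use explain('executionStats') to verify index usage.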

Column-Family Stores: When Your Data Has a Million Columns

Column-family stores like Cassandra and HBase store data in rows but group columns into families. They're optimized for write-heavy workloads and wide, sparse rows where you often read a subset of columns. The key insight: data is partitioned by row key (the partition key in Cassandra) and sorted within a partition by clustering columns. This makes range scans within a partition fast. Use cases: time-series data (IoT sensor readings), event logging, recommendation engines, and any workload with high write throughput and predictable read patterns. The trade-off: you must design your schema around your queries. Cassandra's query language (CQL) does not support joins, and its built-in aggregates (COUNT, SUM, and so on) are only practical within a single partition, so you model a denormalized table for each access pattern.

TimeSeriesCassandra.systemdesign
// io.thecodeforge — System Design tutorial

// Time-series data model for IoT sensor readings using Cassandra
// Scenario: Store temperature readings from thousands of sensors

// SimpleStrategy suits a single-datacenter demo; prefer NetworkTopologyStrategy in production
CREATE KEYSPACE IF NOT EXISTS iot
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

USE iot;

CREATE TABLE sensor_readings (
    sensor_id UUID,
    timestamp timestamp,
    temperature double,
    humidity double,
    PRIMARY KEY (sensor_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

// Insert a reading
INSERT INTO sensor_readings (sensor_id, timestamp, temperature, humidity)
VALUES (uuid(), toTimestamp(now()), 23.5, 60.2);

// Query: get the last 10 readings for a sensor
SELECT temperature, humidity FROM sensor_readings
WHERE sensor_id = ?
ORDER BY timestamp DESC
LIMIT 10;
Output
temperature | humidity
-------------+----------
23.5 | 60.2
... | ...
Never Do This: Unbounded Partition Growth
A high-cardinality partition key like a UUID per sensor spreads load evenly across nodes, but each sensor's partition still grows forever as readings accumulate. A low-cardinality key like 'sensor_type' is worse: one partition can reach gigabytes quickly, causing read timeouts and compaction storms. Always ensure partitions are bounded, for example by adding a time bucket (day or month) to the partition key.
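Time bucketing can be sketched on the client side as follows. This is an illustration only: it assumes a table whose partition key is the pair (sensor_id, day), for example `PRIMARY KEY ((sensor_id, day), timestamp)`, and the helper names `partition_key` and `partitions_for_range` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def partition_key(sensor_id: str, ts: datetime) -> tuple[str, str]:
    """Composite partition key: (sensor_id, UTC day bucket)."""
    return (sensor_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))

def partitions_for_range(sensor_id: str, start: datetime,
                         end: datetime) -> list[tuple[str, str]]:
    """List every partition a time-range read must touch, oldest first."""
    keys: list[tuple[str, str]] = []
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    while day <= last:
        keys.append((sensor_id, day.isoformat()))
        day += timedelta(days=1)
    return keys

start = datetime(2026, 6, 24, 22, 0, tzinfo=timezone.utc)
end = datetime(2026, 6, 26, 3, 0, tzinfo=timezone.utc)
print(partition_key("sensor-7", start))
# ('sensor-7', '2026-06-24')
print(partitions_for_range("sensor-7", start, end))
# [('sensor-7', '2026-06-24'), ('sensor-7', '2026-06-25'), ('sensor-7', '2026-06-26')]
```

The cost of bounded partitions is that a read spanning several days becomes one query per bucket, so pick a bucket size that keeps both partition size and the number of buckets per typical query small.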

Graph Stores: When Relationships Are the Data

Graph stores like Neo4j store nodes and edges. They excel at traversing relationships: finding friends of friends, shortest paths, or influence patterns. The data model is natural for social networks, recommendation engines, fraud detection, and knowledge graphs. The performance advantage comes from index-free adjacency: each node stores pointers to its neighbors, so traversing a path doesn't require global index lookups. The trade-off: graph stores are a poor fit for bulk aggregate queries that scan most of the graph (e.g., totals across all users) or for simple key-value lookups. They also have a steeper learning curve with query languages like Cypher or Gremlin.

SocialGraphNeo4j.systemdesign
// io.thecodeforge — System Design tutorial

// Social network graph using Neo4j Cypher
// Scenario: Find friends of friends for a user

CREATE (alice:User {name: 'Alice'})
CREATE (bob:User {name: 'Bob'})
CREATE (charlie:User {name: 'Charlie'})
CREATE (diana:User {name: 'Diana'})
CREATE (alice)-[:FRIENDS]->(bob)
CREATE (bob)-[:FRIENDS]->(charlie)
CREATE (charlie)-[:FRIENDS]->(diana);

// Query: friends of friends of Alice (excluding Alice herself and direct friends)
MATCH (alice:User {name: 'Alice'})-[:FRIENDS]->(friend)-[:FRIENDS]->(fof)
WHERE NOT (alice)-[:FRIENDS]->(fof) AND alice <> fof
RETURN fof.name AS friend_of_friend;
Output
friend_of_friend
-----------------
Charlie
Interview Gold: Index-Free Adjacency
Graph stores achieve O(1) traversal per edge because each node stores direct references to its neighbors. This is why graph DBs can traverse millions of relationships per second, while relational DBs would need expensive JOINs.
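The idea behind index-free adjacency can be illustrated with a small in-memory sketch. It is not how Neo4j is implemented; it only assumes that each node holds direct references to its neighbours (a dict of sets here) and reuses the Alice, Bob, Charlie, Diana data from the Cypher example. The function name is hypothetical.

```python
# Each node holds direct references to its neighbours (a dict of sets here).
friends: dict[str, set[str]] = {
    "Alice": {"Bob"},
    "Bob": {"Charlie"},
    "Charlie": {"Diana"},
    "Diana": set(),
}

def friends_of_friends(graph: dict[str, set[str]], person: str) -> set[str]:
    """Two-hop traversal: cost depends on the edges visited, not on graph size."""
    direct = graph.get(person, set())
    result: set[str] = set()
    for friend in direct:
        for fof in graph.get(friend, set()):
            # Exclude the person themselves and anyone who is already a direct friend.
            if fof != person and fof not in direct:
                result.add(fof)
    return result

print(friends_of_friends(friends, "Alice"))  # {'Charlie'}
```

Adding a million unrelated users to `friends` would not change the work done for Alice, which is the property a relational join over a friendships table has to approximate with index lookups.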

When Not to Use NoSQL: The Relational Renaissance

NoSQL isn't always the answer. If your data is highly relational and depends on complex joins, referential integrity, and ACID transactions, a relational database is still the right choice. PostgreSQL with JSONB can handle many semi-structured workloads without the operational complexity of a separate NoSQL store. I've seen teams adopt MongoDB for a simple blog and then struggle with reporting queries that would be trivial in SQL. The rule of thumb: if multi-object transactions are central to your workload, default to a relational DB (some NoSQL stores, MongoDB included, now offer multi-document transactions, but that is not what they are optimized for). If you need flexible schemas and horizontal scaling, consider NoSQL. But don't cargo-cult: evaluate your actual access patterns first.

Senior Shortcut: Polyglot Persistence
Use the right tool for each job. A typical microservices architecture might use PostgreSQL for orders, Redis for caching, Cassandra for event logs, and Elasticsearch for full-text search. Don't force one database to do everything.
Figure: NoSQL vs Relational: When to Choose. NoSQL fits semi-structured or flexible schemas, high write throughput, simple key-based lookups, and horizontal scaling. Relational fits complex joins and relationships, ACID transactions, strong referential integrity, and mature tooling and reporting. PostgreSQL with JSONB bridges both worlds.
Production incident · Post-mortem · Severity: high

The 4GB Container That Kept Dying

Symptom
A microservice using MongoDB crashed every 6 hours with OOMKilled. The container had 4GB RAM. The dataset was 2GB.
Assumption
The team assumed a memory leak in the application code or driver.
Root cause
MongoDB's WiredTiger storage engine uses an internal cache (by default the larger of 50% of (RAM minus 1GB) or 256MB) plus memory for journaling and snapshotting. With 4GB RAM, the cache was ~1.5GB. But the working set was 2GB, causing constant page faults and cache eviction. The OOM killer fired when the OS page cache plus MongoDB cache exceeded RAM.
Fix
Set wiredTigerCacheSizeGB to 1 (25% of RAM) to leave room for OS cache and other processes. Also enabled compression with --wiredTigerCollectionBlockCompressor zlib.
Key lesson
  • MongoDB's memory usage is not just your data—it's cache, journal, and connections.
  • Always reserve 25-30% of RAM for the OS and other processes.
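The cache arithmetic from this incident can be checked with a few lines. This sketch assumes the documented WiredTiger default, the larger of 50% of (RAM minus 1 GB) or 256 MB; the function names are hypothetical.

```python
def default_wiredtiger_cache_gb(ram_gb: float) -> float:
    """Default internal cache: the larger of 50% of (RAM - 1 GB) or 256 MB."""
    return max(0.5 * (ram_gb - 1.0), 0.25)

def headroom_gb(ram_gb: float, cache_gb: float) -> float:
    """Memory left for the OS page cache, connections, and other processes."""
    return ram_gb - cache_gb

for ram in (1, 4, 16):
    cache = default_wiredtiger_cache_gb(ram)
    print(f"{ram} GB RAM -> {cache:.2f} GB cache, {headroom_gb(ram, cache):.2f} GB headroom")
# 1 GB RAM -> 0.25 GB cache, 0.75 GB headroom
# 4 GB RAM -> 1.50 GB cache, 2.50 GB headroom
# 16 GB RAM -> 7.50 GB cache, 8.50 GB headroom
```

For the 4GB container above, the default 1.5GB cache is smaller than the 2GB working set, which is why the fix had to address both the cache size and the memory left for everything else.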
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit. 3 entries.
Symptom · 01
MongoDB query slow despite index — executionStats shows 'COLLSCAN'
Fix
1. Run db.collection.getIndexes() to list indexes. 2. Create compound index matching query filter. 3. Use hint() to force index. 4. Check index size vs RAM.
Symptom · 02
Cassandra read timeout — ReadTimeoutException in logs
Fix
1. Check partition size with nodetool tablehistograms (cfhistograms on Cassandra versions before 3.0). 2. If partition > 100MB, redesign schema with time-bucketing. 3. Increase read_request_timeout_in_ms temporarily. 4. Add more nodes to spread load.
Symptom · 03
Redis OOM — OOM command not allowed when used memory > 'maxmemory'
Fix
1. Check INFO memory for used_memory. 2. Set maxmemory-policy allkeys-lru. 3. Reduce TTLs or increase maxmemory. 4. Monitor evictions with INFO stats.
NoSQL Store Types Compared: Triage Cheat Sheet
First-response commands for when things go wrong, copy-paste ready.
MongoDB high disk I/O — `iowait` high on server
Immediate action
Check if WiredTiger cache is too small causing constant page faults.
Commands
db.serverStatus().wiredTiger.cache
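db.adminCommand({setParameter: 1, wiredTigerEngineRuntimeConfig: "cache_size=2G"})
Fix now
On a dedicated database host, raise the WiredTiger cache toward 50-60% of RAM, but leave room for the OS filesystem cache and connections. Persist the change with --wiredTigerCacheSizeGB or the equivalent config file setting.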
Cassandra high compaction load: `CompactionExecutor` threads at 100%
Immediate action
Check if compaction strategy is inappropriate for workload.
Commands
nodetool compactionstats
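ALTER TABLE ... WITH compaction = {'class': 'TimeWindowCompactionStrategy'}
Fix now
Match the strategy to the workload: SizeTieredCompactionStrategy (the default) for write-heavy tables, TimeWindowCompactionStrategy for time-series data with TTLs, and LeveledCompactionStrategy for read-heavy tables with frequent updates. Leveled compaction increases write amplification, so it tends to make a write-heavy compaction backlog worse.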
Redis latency spikes: `latency` command shows >100ms
Immediate action
Check for slow commands or fork-induced latency from RDB save.
Commands
redis-cli --latency
redis-cli SLOWLOG GET 10
Fix now
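Disable RDB saves if you don't need persistence, and disable transparent huge pages to reduce fork latency. Setting repl-diskless-sync yes avoids disk I/O during replica sync, but Redis still forks.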
Neo4j query slow: `PROFILE` shows high db hits
Immediate action
Check if index is missing on frequently filtered properties.
Commands
SHOW INDEXES
CREATE INDEX FOR (u:User) ON (u.name)
Fix now
Create indexes on properties used in WHERE clauses.
| Feature / Aspect | Key-Value | Document | Column-Family | Graph |
|---|---|---|---|---|
| Data Model | Key-value pairs | JSON-like documents | Rows with column families | Nodes and edges |
| Query Pattern | Get by key | Query on any field | Range scans within partition | Traverse relationships |
| Scalability | Horizontal (sharding) | Horizontal (sharding) | Horizontal (partitioning) | Vertical (scale-up) or sharding |
| Consistency | Eventual or strong (configurable) | Configurable (MongoDB: primary reads) | Eventual (Cassandra) or strong (HBase) | ACID (Neo4j) or eventual |
| Best For | Caching, sessions, rate limiting | Catalogs, CMS, event sourcing | Time-series, logging, IoT | Social networks, fraud detection, recommendations |
| Worst For | Complex queries, relationships | Joins, highly relational data | Ad-hoc queries, aggregations | Simple key lookups, aggregate counts |

Key takeaways

1. Key-value stores are for simple lookups by key; anything else requires you to build the index yourself.
2. Document stores give you query flexibility but fail at joins; model your data to match access patterns, not relational normalization.
3. Column-family stores excel at write-heavy, wide-column workloads but require careful partition key design to avoid hot spots.
4. Graph stores are unbeatable for relationship traversal but terrible for aggregate queries; use them only when relationships are the primary access pattern.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 (Senior): How does Cassandra handle a node failure during a write? What consistenc...
Q02 (Senior): When would you choose MongoDB over PostgreSQL with JSONB?
Q03 (Senior): What happens when a Redis key expires while a client is reading it?
Q04 (Junior): What is a document store?
Q05 (Senior): Your team migrated from MongoDB to Cassandra to handle write throughput,...
Q06 (Senior): How would you design a global-scale user session store with 99.99% avail...
Q01 of 06 · Senior

How does Cassandra handle a node failure during a write? What consistency level ensures no data loss?

ANSWER
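Cassandra uses hinted handoff: if a replica is down, the coordinator stores a hint locally and replays the write when the node recovers (hints are only kept for a limited window, 3 hours by default). The write still succeeds as long as enough replicas acknowledge it to satisfy the consistency level. CL=ALL requires every replica to acknowledge, but that hurts availability because a single down node fails the write. In practice, CL=QUORUM with replication factor 3 tolerates one node failure, and pairing QUORUM writes with QUORUM reads ensures a read always overlaps the latest acknowledged write.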
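The quorum arithmetic behind that answer is short enough to write down. This is a minimal sketch with hypothetical function names; it encodes only the standard rules that QUORUM is floor(RF / 2) + 1 and that reads see the latest write when the read and write replica counts sum to more than RF.

```python
def quorum(replication_factor: int) -> int:
    """QUORUM = floor(RF / 2) + 1 replicas."""
    return replication_factor // 2 + 1

def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM operations still succeed."""
    return replication_factor - quorum(replication_factor)

def read_sees_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """Read and write replica sets must overlap: W + R > RF."""
    return write_replicas + read_replicas > rf

for rf in (3, 5):
    q = quorum(rf)
    print(rf, q, tolerated_failures(rf), read_sees_latest_write(rf, q, q))
# 3 2 1 True
# 5 3 2 True
print(read_sees_latest_write(3, 1, 1))  # False
```

The last line shows why CL=ONE for both reads and writes is only eventually consistent at RF=3: the single replica read may not be the one that took the write.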
FAQ · 4 QUESTIONS

Frequently Asked Questions

01. What are the four types of NoSQL databases?
02. What's the difference between a document store and a column-family store?
03. How do I choose between MongoDB and Cassandra for a new project?
04. Can I use a graph database for simple key-value lookups?