SQL = ACID guarantees, relational model, vertical scaling, strong consistency — use for financial transactions, user data, anything requiring joins
NoSQL = BASE model, flexible schema, horizontal scaling, eventual consistency — use for high-velocity writes, flexible document structures, graph traversal
Decision drivers: data shape (relational vs hierarchical/variable) and access pattern (complex joins vs key-based lookups)
Horizontal scaling: NoSQL distributes by partition key — Cassandra, DynamoDB. SQL shards with more complexity
Polyglot persistence: most production systems above a certain scale use both — PostgreSQL for transactions, Redis for cache/geo, Cassandra for event streams
Biggest mistake: choosing NoSQL 'because it scales' without defining the access pattern first
Plain-English First
Imagine you're organizing a school library. SQL is like a perfectly labeled filing cabinet — every book has a fixed slot, a category, an author field, and a due-date column. It's rigid, but finding anything is lightning fast and nothing goes missing. NoSQL is like a giant box of labeled folders — you can throw in a book, a DVD, a hand-drawn map, or a sticky note without needing to redesign the whole cabinet first. Neither is better. It depends on whether your library stores textbooks or surprises.
Every system design interview, every startup tech decision, and every database migration horror story eventually comes back to one question: SQL or NoSQL? This isn't an academic debate. Instagram started on PostgreSQL and still uses it for core data. Netflix uses Cassandra to serve 200+ million users. Airbnb uses both. The wrong choice early in a project can cost months of painful migrations, and the right choice can make your architecture elegantly simple for years.
The problem is that most explanations frame this as a competition — 'SQL is old, NoSQL is modern' or 'NoSQL scales, SQL doesn't.' Both of those statements are wrong, and believing them leads to genuinely bad system designs. The real question isn't which is better — it's which fits the shape of your data, your access patterns, and your consistency requirements right now.
By the end of this article you'll be able to look at a system's requirements and make a confident, reasoned database choice. You'll understand ACID vs BASE, when eventual consistency is actually fine, how to spot when a relational model fights your data shape, and exactly what trade-offs you're accepting with each decision. This is the mental model senior engineers use in real design discussions.
ACID vs BASE: The Real Difference Hiding Under the Buzzwords
When people say 'SQL is reliable and NoSQL is fast,' what they actually mean is ACID vs BASE — and this distinction is the most important thing to understand before picking a database.
ACID (Atomicity, Consistency, Isolation, Durability) is the promise SQL databases make. It means a bank transfer either fully succeeds or fully rolls back — you'll never have money leave one account without landing in another. Every transaction is isolated from others running simultaneously. Once committed, data survives crashes. This guarantee costs something: the database must do a lot of coordination work, which is harder to distribute across many machines.
BASE (Basically Available, Soft state, Eventually consistent) is the trade-off NoSQL systems often accept in exchange for horizontal scalability. 'Eventually consistent' sounds scary, but think about your Twitter feed. If someone in Tokyo posts a tweet and you see it 200 milliseconds later than someone in Seoul — does that matter? Not at all. But if your bank balance is 'eventually consistent,' that's a disaster.
The question to ask yourself isn't 'which is faster.' It's: 'Does my use case require strong consistency, or can it tolerate a brief window where different nodes show slightly different data?' Answer that honestly and half the decision is already made.
acid_transaction_example.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- This example shows WHY atomicity matters in a financial system.-- Without ACID, a crash between these two statements would lose money.BEGINTRANSACTION;
-- Step 1: Deduct $500 from Alice's accountUPDATE bank_accounts
SET balance = balance - 500WHERE account_holder = 'Alice'AND balance >= 500; -- Guard: never allow negative balance-- Step 2: Credit $500 to Bob's accountUPDATE bank_accounts
SET balance = balance + 500WHERE account_holder = 'Bob';
-- Only reaches here if BOTH updates succeeded-- If the server crashes between step 1 and step 2,-- the entire transaction is rolled back automatically.COMMIT;
-- What you'd see if Alice has insufficient funds:-- The guard condition (balance >= 500) causes 0 rows to be updated.-- COMMIT still runs, but since no rows changed, the state is safe.-- In production you'd check ROW_COUNT() and ROLLBACK explicitly.-- Contrast: in a NoSQL system without transactions (e.g., early MongoDB),-- you'd have to implement this logic in application code — which is-- error-prone and not atomic by default.
Output
-- Transaction either fully commits or fully rolls back
-- No partial state possible -- bank_accounts always consistent
Watch Out: 'NoSQL doesn't support transactions' is outdated
MongoDB has supported multi-document ACID transactions since version 4.0 (2018). DynamoDB has transactions too. The difference is that NoSQL systems don't enforce ACID by default — you opt in. With SQL, you get ACID whether you need it or not. Saying 'use NoSQL because it's faster' without accounting for this will lose you points in any serious system design interview.
Production Insight
ACID transactions have a coordination cost — that cost is what makes SQL slower at extreme write throughput, not a flaw.
BASE (Basically Available, Soft state, Eventually consistent) is not a safety compromise — it's a deliberate trade-off to enable horizontal partitioning.
Rule: if your domain has money moving between accounts, inventory reservations, or legal compliance requirements — ACID is non-negotiable.
Key Takeaway
ACID = strong guarantees, coordination overhead — right for financial, user, and audit data.
BASE = availability and partition tolerance, eventual consistency — right for event streams and high-velocity writes.
Modern NoSQL offers transactions — but at a cost. If you need them frequently, evaluate SQL first.
Data Shape vs Access Pattern: The Two Questions That Decide Everything
Here's a mental framework that makes database selection much less mysterious: before picking a database, answer exactly two questions.
Question 1 — What shape is your data? Relational data has clear entities with well-defined relationships between them (Users have Orders, Orders have Products). Hierarchical or polymorphic data varies per record — an e-commerce product might have 3 attributes or 300, and forcing it into fixed columns wastes space and creates endless ALTER TABLE headaches.
Question 2 — How do you access your data? If you query by many different columns with complex joins — 'give me all orders placed by VIP customers in Q3 for products in category X' — SQL's query optimizer is your best friend. If you always access data by a single known key — 'give me all activity for user_id 9821' — a document store or key-value database retrieves it in one lookup without joining anything.
The danger zone is when engineers pick a database based on hype instead of these two questions. A team building a social graph stuffs everything into PostgreSQL and ends up writing recursive CTEs just to find friends-of-friends. Another team shoves financial ledger data into MongoDB because 'NoSQL scales' and spends six months building transaction logic in application code that a relational database gives you for free.
Match the tool to the shape and the access pattern. Everything else follows.
product_catalog_document_model.jsJAVASCRIPT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
// WHY a document model wins here:// Each product category has COMPLETELY different attributes.// A TV has resolution, refresh rate, panel type.// A T-shirt has size, color, material, fit.// Forcing these into a single SQL table would require 50+ nullable columns// or a complex EAV (Entity-Attribute-Value) schema that's painful to query.// MongoDB document for an electronics productconst televisionDocument = {
_id: 'prod_tv_samsung_qn90b',
category: 'electronics',
name: 'Samsung QN90B 65-inch QLED',
price_usd: 1299.99,
stock_count: 47,
// These attributes are SPECIFIC to televisions — no other category has them
attributes: {
screen_size_inches: 65,
resolution: '4K',
panel_type: 'QLED',
refresh_rate_hz: 120,
smart_tv: true,
hdmi_ports: 4
},
tags: ['bestseller', 'holiday-sale'],
created_at: newDate('2024-01-15')
};
// MongoDB document for an apparel product// Notice: completely different attributes — no wasted null columnsconst tshirtDocument = {
_id: 'prod_shirt_nike_dri_fit_lg',
category: 'apparel',
name: 'Nike Dri-FIT Training T-Shirt',
price_usd: 34.99,
stock_count: 312,
// These attributes only make sense for clothing
attributes: {
size: 'Large',
color: 'Navy Blue',
material: '100% Polyester',
fit: 'Standard',
gender: 'Men'
},
tags: ['sports', 'summer'],
created_at: newDate('2024-02-03')
};
// Access pattern: always fetch by product ID — single document lookup.// db.products.findOne({ _id: 'prod_tv_samsung_qn90b' })// Returns the FULL product in one round trip. No joins needed.// If you needed complex cross-category analytics:// 'Top 10 best-selling products across all categories this month'// — NOW a SQL warehouse (like Redshift or BigQuery) makes more sense.// This is why large systems use BOTH databases for different jobs.
Output
// Document model: each product type has exactly the attributes it needs
// No nullable columns, no JOIN overhead, schema reflects reality
Pro Tip: Draw your access patterns before writing any schema
Before choosing a database, write down your top 5 most frequent queries as plain English. 'Get a user's profile by ID' → key-value or document store wins. 'Find all transactions over $1000 by date range grouped by merchant category' → SQL wins. If your top queries look like the first example but you still reach for SQL out of habit, you're adding unnecessary join overhead at scale.
Production Insight
The single most common database selection mistake: choosing the technology before defining the access pattern.
Write out your top 5 queries before touching any infrastructure decision — the query pattern reveals the optimal data model.
Flexible schema in NoSQL is only an advantage if your data genuinely varies per document. If it converges on a stable shape, SQL's schema enforcement prevents the drift tax.
Key Takeaway
Ask two questions before choosing: what shape is my data, and how do I access it?
Variable-attribute data (product catalogs, user-generated content) → document store.
Predictable relational data with complex queries → SQL — do not let hype override the access pattern.
Scaling Strategies: Why NoSQL Scales Horizontally and What That Actually Means
You've heard 'NoSQL scales better' a hundred times. Let's make that concrete instead of leaving it as a slogan.
SQL databases scale vertically — you buy a bigger server with more RAM, faster disks, more CPU. This works well up to a point. PostgreSQL can handle tens of thousands of transactions per second on powerful hardware. But vertical scaling has a ceiling, and expensive hardware is exactly that: expensive.
Horizontal scaling means adding more machines rather than upgrading one machine. This is where traditional SQL databases struggle, because maintaining ACID guarantees across multiple nodes requires coordination — distributed locks, two-phase commits, consensus protocols. It's not impossible (CockroachDB and Google Spanner do it), but it's complex and expensive.
NoSQL databases are architected for horizontal scaling from the start. Cassandra uses consistent hashing to distribute data across a cluster with no single master node. You add a node and the cluster automatically rebalances. DynamoDB abstracts this entirely — AWS handles sharding, replication, and failover; you just provide a partition key.
The practical implication: if you're building a system that needs to handle 10x traffic growth by next year, and your data doesn't require complex relational integrity, a well-partitioned NoSQL database will scale more cheaply and simply than vertical SQL upgrades. But if you need complex reporting queries across that data, you'll still want a SQL layer downstream.
cassandra_partition_schema.yamlYAML
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
# WHYCassandra wins for a user activity feed:
# Access pattern: 'Give me the last 50 events for user_id X'
# This is ALWAYS a lookup by user_id. Never a cross-user query.
# Cassandra's partition key ensures all of a user's events
# live on the SAME node — one network hop, sub-millisecond reads.
# Cassandra table definition (CQL — CassandraQueryLanguage)
create_table_statement: |
CREATETABLEuser_activity_feed (
user_id UUID, -- Partition key: all rows for one user on one node
event_time TIMESTAMP, -- Clustering key: rows sorted by time automatically
event_type TEXT, -- 'video_watched', 'post_liked', 'comment_added'
content_id UUID, -- The piece of content this event relates to
metadata MAP<TEXT,TEXT>, -- Flexible key-value bag — no schema migration needed
PRIMARYKEY (user_id, event_time)
) WITHCLUSTERINGORDERBY (event_time DESC); -- Latest events first
# Reading the last 50 events for a user:
read_query: |
SELECT event_type, event_time, content_id, metadata
FROM user_activity_feed
WHERE user_id = 9a8b7c6d-...
LIMIT50;
-- This hits ONE partition on ONE node.
-- At1 million users, this query is EQUALLY fast as at 1000 users.
-- That's horizontal scaling working as intended.
# What you CAN'T do efficiently in Cassandra:
bad_query_example: |
-- This would require scanning ALL partitions on ALL nodes:
SELECT * FROM user_activity_feed
WHERE event_type = 'video_watched'AND event_time > '2024-01-01';
-- NEVER design a Cassandra query like this.
-- If you need this query, use a SQL analytical database instead.
# Horizontal scaling event:
scaling_scenario:
current_nodes: 6
peak_write_throughput: '120,000 writes/second'
action: 'Add 2 more nodes to the cluster'
result: 'Cassandra automatically rebalances token ranges.
New throughput capacity: ~160,000 writes/second.
Zero downtime. No application code changes.'
equivalent_sql_action: 'Upgrade to a larger RDSinstance (vertical),
causes maintenance window, costs 3x more per month.'
Output
# All events for a given user_id land on the same Cassandra node
# One network hop, sub-millisecond read regardless of total dataset size
Interview Gold: SQL can scale horizontally too — with caveats
PostgreSQL read replicas handle horizontal read scaling effectively. Tools like Citus turn PostgreSQL into a distributed database. PlanetScale shards MySQL automatically. The real distinction is that horizontal write scaling is hard for SQL and native for most NoSQL systems. When an interviewer asks 'why NoSQL for high write throughput,' the correct answer explains the partition key model and avoids needing distributed locking — not just 'because it scales.'
Production Insight
Horizontal scaling in NoSQL databases requires choosing the right partition key — a bad partition key creates hot partitions that negate the distribution benefit.
PostgreSQL with read replicas handles the majority of read-heavy workloads without moving to NoSQL.
For write-heavy workloads beyond ~10,000 writes/second, evaluate Cassandra or DynamoDB — but define the partition key and access pattern first.
Key Takeaway
NoSQL horizontal scaling works by partitioning data by a key — all of a user's data on one node, one network hop.
The partition key design is the most important architecture decision in any NoSQL deployment.
PostgreSQL read replicas and Citus cover most horizontal scaling needs before requiring NoSQL.
Real-World Architecture: When to Use Both — Polyglot Persistence
The mature answer to SQL vs NoSQL isn't a binary choice — it's a polyglot persistence strategy. Most production systems above a certain scale use multiple databases, each doing what it does best. This isn't over-engineering; it's matching tools to jobs.
Consider a ride-sharing app like Uber. The user's account, payment methods, and trip history are relational — strong consistency matters, and the data has clear foreign key relationships. PostgreSQL handles this. Driver location updates, however, come in thousands of times per second from GPS pings. A Redis geospatial index stores real-time positions (write-heavy, millisecond reads, ephemeral data). Completed trip analytics flow into a data warehouse like BigQuery for SQL-based reporting.
Netflix uses Cassandra for its viewing history service — hundreds of millions of users, each needing their own list fetched instantly. The consistency requirement is low (seeing yesterday's watched episode appear 100ms late is fine). But Netflix also uses MySQL for its billing and account management data where strict consistency is non-negotiable.
The pattern is consistent across companies: operational writes that are partitionable by a user/entity ID → NoSQL. Transactional data requiring integrity across entities → SQL. Time-series or event data with enormous volume → NoSQL (Cassandra, InfluxDB). Complex analytical queries → SQL data warehouse. This is polyglot persistence, and recognizing this pattern in a system design interview is what separates good answers from great ones.
polyglot_persistence_architecture.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
# This pseudocode shows HOW a ride-sharing backend routes# different data types to the right database.# This is the polyglot persistence pattern in action.
import psycopg2 # PostgreSQL client
import redis # Redis client
from cassandra.cluster import Cluster# Cassandra client# ─────────────────────────────────────────────# DATABASE CONNECTIONS# ─────────────────────────────────────────────# PostgreSQL: user accounts, trips, payments (ACID required)
postgres_conn = psycopg2.connect(
host='postgres-primary.internal',
database='rideapp_core',
user='app_service'
)
# Redis: real-time driver GPS positions (speed required, ephemeral data)
redis_client = redis.Redis(
host='redis-cluster.internal',
decode_responses=True
)
# Cassandra: completed trip event log (massive write volume, lookup by rider)
cassandra_session = Cluster(['cassandra-node-1', 'cassandra-node-2']).connect('rideapp')
# ─────────────────────────────────────────────# OPERATION 1: Update driver GPS position# Happens every 3 seconds per driver. At 50,000 active drivers = 16,000 writes/sec.# Redis GEOADD handles this trivially. PostgreSQL would buckle under this write rate.# ─────────────────────────────────────────────defupdate_driver_location(driver_id: str, latitude: float, longitude: float):
# Store as a geospatial point. TTL of 30 seconds — if driver goes offline,# their position automatically expires. No manual cleanup needed.
redis_client.geoadd('active_drivers', [longitude, latitude, driver_id])
redis_client.expire(f'driver:{driver_id}:location', 30) # Auto-expire stale positions# OPERATION 2: Find nearby drivers for a rider# Redis GEORADIUS searches within 5km in under 1ms regardless of driver count.deffind_drivers_near_rider(rider_latitude: float, rider_longitude: float) -> list:
nearby_drivers = redis_client.georadius(
'active_drivers',
rider_longitude,
rider_latitude,
5, # 5 km radius
unit='km',
withcoord=True,
count=10, # Top 10 closest drivers
sort='ASC' # Nearest first
)
return nearby_drivers
# ─────────────────────────────────────────────# OPERATION 3: Create a trip record (needs ACID)# Rider payment authorization + trip creation must be atomic.# If payment fails, trip must NOT be created. This is classic SQL territory.# ─────────────────────────────────────────────defcreate_trip_with_payment(rider_id: int, driver_id: int, fare_usd: float) -> int:
cursor = postgres_conn.cursor()
try:
postgres_conn.autocommit = False# Begin transaction# Authorize payment (deduct from rider's payment method)
cursor.execute(
"UPDATE rider_payment_methods SET pending_charge = pending_charge + %s ""WHERE rider_id = %s AND is_default = TRUE",
(fare_usd, rider_id)
)
# Create the trip record — only happens if payment auth succeeded
cursor.execute(
"INSERT INTO trips (rider_id, driver_id, fare_usd, status, started_at) ""VALUES (%s, %s, %s, 'active', NOW()) RETURNING trip_id",
(rider_id, driver_id, fare_usd)
)
trip_id = cursor.fetchone()[0]
postgres_conn.commit() # Both operations succeed or neither doesreturn trip_id
exceptExceptionas e:
postgres_conn.rollback() # Payment failed — undo EVERYTHINGraiseRuntimeError(f'Trip creation failed, payment rolled back: {e}')
# ─────────────────────────────────────────────# OPERATION 4: Log completed trip to event store# Billions of trip events over time. Queried only by rider_id.# Cassandra's partition-per-rider model is perfect here.# ─────────────────────────────────────────────deflog_completed_trip_event(rider_id: str, trip_id: str, duration_minutes: int, fare_usd: float):
cassandra_session.execute(
"""
INSERTINTO rider_trip_history
(rider_id, trip_completed_at, trip_id, duration_minutes, fare_usd)
VALUES (%s, toTimestamp(now()), %s, %s, %s)
""",
(rider_id, trip_id, duration_minutes, fare_usd)
)
# This write goes to Cassandra's partition for this rider_id.# Scales to billions of events — query is always 'get trips for rider X'.
Output
# PostgreSQL: user accounts, payments, trip history — ACID required
Pro Tip: Naming your databases by their job clarifies your design
In system design diagrams, label each database by its role — 'Transactional DB (PostgreSQL)', 'Cache + Geo Index (Redis)', 'Event Store (Cassandra)' — not just by technology name. This instantly communicates your reasoning to an interviewer and shows you chose each tool intentionally, not randomly.
Production Insight
Polyglot persistence adds operational complexity — every additional database is another thing to monitor, back up, and maintain.
Start with PostgreSQL for everything. Add a second database only when you have a concrete performance or schema problem that PostgreSQL cannot solve.
The operational cost of two databases is real — ensure the performance gain justifies it before committing.
Key Takeaway
Most production systems use multiple databases — PostgreSQL for transactions, Redis for cache/geo, Cassandra for high-velocity event streams.
Start with SQL. Add NoSQL only when you have a proven problem SQL cannot solve cost-effectively.
Every additional database is operational complexity — quantify the benefit before adding it.
● Production incidentPOST-MORTEMseverity: high
Multi-Document MongoDB Transactions Wiped Out the Performance Advantage
Symptom
Payment processing latency was 3x higher than projected. The engineering team was spending 40% of their time writing denormalization workarounds and migration scripts for schema drift across 12 different document shapes in the same collection.
Assumption
The team chose MongoDB because 'relational databases don't scale' and 'we'll have flexible schema requirements.' They didn't define their access patterns before choosing the database.
Root cause
The payment domain had strong relational integrity requirements: accounts, transactions, ledger entries, and audit logs all needed to be consistent atomically. MongoDB's multi-document transactions work but carry 2-3x overhead vs single-document operations. The flexible schema became a liability — different parts of the codebase wrote subtly different document shapes, breaking deserialization in subtle ways.
Fix
Migrated the payments core to PostgreSQL. Used MongoDB only for the customer-facing product catalog where schema genuinely varies per product type. The hybrid architecture reduced payment latency by 65% and eliminated schema drift incidents.
Key lesson
Define your top 5 most frequent queries before choosing a database — the access pattern decides everything
Flexible schema is a liability in domains with strong integrity requirements — schema drift is a maintenance tax
MongoDB ACID transactions exist but are expensive — if you need them frequently, you probably want SQL
Production debug guideDiagnosing when you have the wrong database for the job4 entries
Symptom · 01
NoSQL application requires frequent multi-document transactions or complex joins
→
Fix
You likely chose NoSQL for a relational domain. Audit your top 10 most frequent query patterns. If more than 30% require multi-document consistency or cross-collection joins, consider migrating the core to PostgreSQL and keeping NoSQL only for the flexible-schema parts.
Symptom · 02
SQL database struggling with high-velocity write throughput (>10,000 writes/second)
→
Fix
Evaluate write patterns: are these time-series, event logs, or user activity feeds with a clear partition key? If yes, these are strong candidates for Cassandra or DynamoDB. Add a write buffer (Kafka/Redis) in front of the SQL database as an interim measure.
Symptom · 03
NoSQL schema drift — different documents in the same collection have diverged over time
→
Fix
Add schema validation at the application layer (use Mongoose schemas, Pydantic models, or the database's built-in JSON schema validation). Establish a migration strategy — treat NoSQL schema changes with the same rigor as SQL ALTER TABLE migrations.
Symptom · 04
Real-time geospatial queries timing out on large tables
→
Fix
Move real-time location data to Redis with GEOADD/GEORADIUS commands. Redis geospatial operations on millions of points return in sub-millisecond. Store persistent trip/location history in PostgreSQL — use Redis as the live query layer only.
Factor
SQL (PostgreSQL, MySQL)
Document (MongoDB)
Wide-Column (Cassandra)
Key-Value (Redis)
Consistency
Strong (ACID)
Tunable (ACID available)
Eventual (tunable)
Strong for single ops
Schema
Fixed, enforced
Flexible per document
Fixed column families
No schema
Query model
Rich joins, aggregates
Document + limited joins
Partition key + clustering
Key lookup only
Horizontal scale
Complex (Citus, sharding)
Built-in sharding
Native linear scale
Redis Cluster
Write throughput
10K–100K/s
50K–200K/s
100K–1M+/s
1M+/s
Best for
Transactions, complex queries
Variable-schema content
Time-series, event streams
Cache, sessions, real-time geo
Key takeaways
1
Define your top 5 queries before choosing a database
the access pattern is the only thing that matters.
2
ACID vs BASE is not about reliability vs speed
it is about consistency guarantees and the coordination cost that comes with them.
3
Polyglot persistence is the mature answer
SQL for transactions, Redis for cache and real-time, Cassandra for high-velocity time-series.
4
Start with PostgreSQL. Add a second database only when you have a concrete, measured problem that SQL cannot solve cost-effectively.
Common mistakes to avoid
3 patterns
×
Choosing NoSQL because it scales without defining the access pattern first
Symptom
You end up needing multi-partition queries, complex joins, or strong consistency — all things NoSQL makes expensive or impossible
Fix
Write out your top 5 most frequent queries before choosing a database. If they require joins, complex aggregates, or multi-row atomic updates, SQL is almost always the right choice.
×
Using a document database for strongly relational data with integrity requirements
Symptom
You embed related data in documents, then spend most of your time writing denormalization workarounds and handling schema drift as documents evolve inconsistently
Fix
If your domain has foreign key relationships, financial transactions, or data that must be consistent across multiple entities atomically, use SQL. Document databases are appropriate for genuinely variable-schema data.
×
Using SQL for everything when high-velocity event streams or real-time geo queries appear
Symptom
PostgreSQL INSERT latency degrades under high write volume; geospatial queries on real-time location data time out
Fix
Route high-velocity time-series writes to Cassandra or DynamoDB. Move real-time geospatial queries to Redis GEORADIUS. Keep PostgreSQL for the transactional core — let each database do what it is optimised for.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01SENIOR
Your team is building a social media platform for 50 million users. The ...
Q02SENIOR
What does eventual consistency mean, and give a concrete example where i...
Q03SENIOR
A candidate says 'I'd use MongoDB because it scales better than PostgreS...
Q01 of 03SENIOR
Your team is building a social media platform for 50 million users. The feed service needs to return a user's last 100 events in under 5ms. What database would you choose and why?
ANSWER
This is a high-velocity write, time-series access pattern with a clear partition key (user_id). The access pattern is always 'get last N events for user X' — never cross-user aggregation. Cassandra is the optimal choice: partition the event table on user_id so all events for a user land on the same node, with a clustering column on event_timestamp DESC for efficient range reads. This gives sub-millisecond single-partition reads at any scale. PostgreSQL would struggle at this write volume and query rate. I would keep PostgreSQL for the user account, relationship graph, and anything requiring ACID guarantees.
Q02 of 03SENIOR
What does eventual consistency mean, and give a concrete example where it is acceptable and one where it is not?
ANSWER
Eventual consistency means that after a write, all replicas will eventually reflect the new value — but there is a window (milliseconds to seconds) where different nodes may return different results. Acceptable: a social media like count. If a post shows 1,247 likes instead of 1,248 for 200ms after a new like, no harm occurs. Not acceptable: a bank balance or inventory count. If a product shows 1 unit in stock on two servers simultaneously and both allow a purchase, you oversell. The business domain determines acceptability — financial, medical, and compliance data almost always require strong consistency.
Q03 of 03SENIOR
A candidate says 'I'd use MongoDB because it scales better than PostgreSQL.' What follow-up questions would you ask?
ANSWER
I would ask: What is your write throughput requirement — requests per second? What are your top 5 most frequent queries? Do any of them require joins or multi-document atomic operations? Does your data shape genuinely vary per document or is it relatively consistent? What is your team's operational experience with MongoDB versus PostgreSQL? The goal is to determine whether the access pattern actually requires NoSQL or whether PostgreSQL with proper indexing would handle the load. PostgreSQL handles tens of thousands of transactions per second and scales reads horizontally with replicas. 'It scales' without a defined problem statement is not a sufficient reason to take on the operational complexity of MongoDB.
01
Your team is building a social media platform for 50 million users. The feed service needs to return a user's last 100 events in under 5ms. What database would you choose and why?
SENIOR
02
What does eventual consistency mean, and give a concrete example where it is acceptable and one where it is not?
SENIOR
03
A candidate says 'I'd use MongoDB because it scales better than PostgreSQL.' What follow-up questions would you ask?
SENIOR
FAQ · 3 QUESTIONS
Frequently Asked Questions
01
Is NoSQL always faster than SQL?
No — and this is one of the most dangerous myths in system design. NoSQL is faster for specific access patterns: single-partition key lookups at massive scale. For complex queries involving multiple entities, aggregations, or cross-filtering, a well-indexed SQL database is often significantly faster than a NoSQL equivalent. Benchmark your actual access patterns, not the technology in the abstract.
Was this helpful?
02
Can a NoSQL database replace SQL entirely for a large application?
Technically possible, but almost never advisable. NoSQL databases push relational integrity, transaction logic, and join logic into your application code — which is harder to maintain, easier to get wrong, and more expensive to test than letting a SQL database enforce those constraints for free. Most large-scale applications use both: NoSQL for high-volume, single-entity access patterns and SQL for transactional and reporting needs.
Was this helpful?
03
What does 'schema-less' actually mean in MongoDB, and is it always an advantage?
Schema-less means MongoDB won't reject a document because it has extra or missing fields. It's an advantage during early development when your data model is evolving quickly. It becomes a liability in production when different documents in the same collection have inconsistent shapes and your application code has to handle every variation. Most mature MongoDB deployments use schema validation (MongoDB's $jsonSchema validator) to enforce structure — which means 'schema-less' really means 'schema is your responsibility, not the database's.'