CAP Theorem — Why R=3, W=3, N=3 Broke Banking Reads
Strict quorum config (R=3, W=3, N=3) blocked all reads when one node partitioned.
- CAP Theorem says a distributed system can guarantee at most two of Consistency, Availability, and Partition Tolerance.
- Network partitions are inevitable — you must choose between CP (consistent, but may reject requests) or AP (available, but may return stale data).
- Databases like Cassandra (AP) and Zookeeper (CP) make opposite trade-offs intentionally.
- The biggest mistake: treating partition tolerance as optional — it's not; the choice is CP or AP during a partition.
- Performance insight: CP systems add ~2-5x latency during partitions due to quorum waits; AP systems handle them with no extra latency but risk stale reads.
Imagine you and your twin run identical noticeboards at opposite ends of a school. If someone pins a new message on yours, it takes a minute to walk to your twin and update theirs. During that minute, a student asking your twin sees old info — you can't have both boards perfectly in sync AND instantly available AND survive the case where the hallway between you is blocked. CAP Theorem says the same thing about databases spread across servers: when the network between them breaks (and it will), you must choose whether to give users a possibly-stale answer right now or make them wait until you're sure the answer is correct. You can never fully dodge that choice.
Every time Netflix keeps streaming while a data center goes dark, or your bank refuses to show your balance during a network blip, a silent architectural decision is playing out — one that was formally described by Eric Brewer in 2000 and proved by Gilbert and Lynch in 2002. CAP Theorem isn't an abstract academic curiosity; it's the load-bearing wall of every distributed system you'll ever build or operate. Ignore it and you'll ship a system that silently returns wrong data, drops writes, or locks up entirely under exactly the conditions you need it most.
The theorem states that a distributed data store can guarantee at most two of three properties simultaneously: Consistency (every read returns the most recent write or an error), Availability (every request receives a non-error response, though it may be stale), and Partition Tolerance (the system continues operating even when network messages between nodes are lost or delayed). The critical insight most engineers miss is that network partitions are not optional — they happen on every cloud platform, in every data center, on every network switch that has ever existed. This means the real choice is always CA-during-partition, which translates to: do you go CP or AP when the network fails?
By the end of this article you'll be able to explain exactly which guarantees Cassandra, MongoDB, Zookeeper, etcd, CockroachDB and DynamoDB sacrifice and why, design a partition-handling strategy for a real production system, understand PACELC (CAP's more honest successor), spot the five most common architectural mistakes engineers make when reasoning about CAP, and walk into any senior engineering interview with precise, battle-tested answers. Let's get into it.
What is CAP Theorem and Databases?
CAP Theorem defines the fundamental trade-offs in distributed databases. It states that a distributed data store cannot simultaneously provide Consistency, Availability, and Partition Tolerance. Understanding CAP is essential for choosing the right database for your workload and for designing systems that survive network failures without data corruption.
Let's see a concrete example. Suppose you're building a distributed key-value store with three replicas. You want every read to return the latest write (Consistency). You also want every request to get a response (Availability). And you want the system to work even if the network between replicas goes down (Partition Tolerance). The theorem says you can't have all three. The key insight: network partitions are inevitable in any real deployment — cloud providers have documented failures, switches fail, cables get cut. So you must choose: do you sacrifice Consistency (become AP) or Availability (become CP) during a partition?
The Three Properties: Consistency, Availability, Partition Tolerance
Let's define each property precisely, because the textbook definitions hide production nuances.
Consistency — Every read receives the most recent write or an error. In a distributed system, this means all nodes see the same data at the same time. Linearizability is the strictest form. But there are weaker consistency models like eventual consistency that CAP does not forbid — CAP's Consistency is specifically linearizability (or atomic consistency). If your system only guarantees eventual consistency, it's not providing CAP-Consistency; it's effectively AP.
Availability — Every request receives a (non-error) response, without the guarantee that it contains the most recent write. The key: the response must not be a timeout or 500. Some systems interpret availability as "the system is up," but CAP defines it precisely: every request gets a response, even if it's stale. That's why AP systems like DynamoDB always succeed reads — they may return old data.
Partition Tolerance — The system continues to operate despite arbitrary message loss or delay between nodes. This is the one property you cannot sacrifice in a distributed system. You could theoretically build a system that is CA as long as no partition ever occurs, but in practice partitions happen. So the choice is CP or AP.
- C + A: CA system that is not partition tolerant — unrealistic at scale.
- C + P: CP system — consistent and partition tolerant, but may reject requests.
- A + P: AP system — always returns a response, but may return stale data.
- In production, you always end up at one of the bottom corners: CP or AP.
- The top corner (CA) is a theoretical ideal that only exists in single-node databases.
Why Partition Tolerance Is Not Optional
Every distributed system will experience network partitions. Cloud providers have regional outages. Network switches fail. even a brief blip in a packet can cause a split-brain scenario. If you design a system that is CA (no partition tolerance), you are implicitly saying: "I will shut down the entire system if a partition occurs." That is often unacceptable for production systems that require high availability.
Consider the scenario: two data centers connected by a WAN link. The link goes down. If your database is CA, both data centers must stop serving to maintain consistency. But if you're running an e-commerce platform, you'd rather serve stale product inventory than show an error page. That's the AP choice.
Therefore, the real decision is not "if" you handle partitions, but "how" you handle them. Do you want your system to maintain consistency at the cost of rejecting some requests (CP)? Or do you want to serve every request even if some nodes have stale data (AP)? This choice must be made consciously and per operation, not as an afterthought.
CP vs AP: The Real Trade-off
The classic CAP choice is between CP (Consistent but possibly unavailable during partition) and AP (Available but possibly inconsistent during partition). Let's see how real databases choose.
- Zookeeper: Uses a leader-based quorum (Zab protocol). If the leader loses connection to a majority of followers, it steps down and no writes are possible until a new leader is elected. Reads are also served only by the leader, so during a partition, the minority partition is completely unavailable for reads and writes. This is strict CP.
- etcd (used by Kubernetes): Raft consensus. Requires a majority of nodes to make progress. During a partition, the minority partition becomes unavailable. etcd is CP by design — consistent data is more important than availability.
- HBase: Uses HDFS and ZooKeeper. It's CP — during a region server failure, the region becomes unavailable until recovery completes.
- Cassandra: Uses Dynamo-style replication with tunable consistency. By default, reads and writes use QUORUM, but you can lower consistency to ONE. During a partition, each partition can continue to serve reads and writes. After partition heals, read repair and hinted handoff resolve conflicts. Cassandra is AP by default.
- DynamoDB: AP by default (eventually consistent reads). You can request strongly consistent reads, but they are limited to one partition (the leader) and may fail during partition. DynamoDB is designed for high availability.
- Couchbase: AP with multi-master replication. During partition, each node accepts writes; conflict resolution uses last-write-wins (timestamp).
The middle ground: Many databases allow per-operation trade-off. For example, Cassandra's consistency level can be set per query: ALL (CP) vs ONE (AP). This lets you mix depending on the criticality of the data.
PACELC: CAP's Honest Extension
Even when the network is healthy (no partition), there's still a trade-off between Consistency and Latency. PACELC (pronounced 'pace-elk') extends CAP: If Partition (P) then trade-off between C and A; Else (E) trade-off between C and L (Latency).
This is the practical reality: your database is never free from trade-offs. Even in normal operation, you must decide whether to pay the latency cost of strong consistency or accept lower latency with eventual consistency.
- Cassandra: In normal operation (no partition), if you choose consistency level ALL, you pay high latency because all replicas must acknowledge. This is the 'Else' branch of PACELC: you choose C over L.
- DynamoDB: Default eventually consistent reads give lower latency. If you request strongly consistent reads, latency increases — again C over L.
- Spanner (Google's globally distributed DB): Uses TrueTime to provide external consistency with high latency. In normal operation, Spanner chooses C over L because it must wait for TrueTime.
The key insight: you cannot design a system that is both low-latency and strongly consistent under normal conditions. The trade-off persists.
- Partition? → choose between C and A (CP or AP).
- No partition? → choose between C and L (strong consistency or low latency).
- For critical writes: use strong consistency even if latency increases.
- For reads of volatile data (like stocks): use strong consistency.
- For reads of static data (like user profiles): use eventual consistency for speed.
How Real Databases Choose: Cassandra, MongoDB, Zookeeper, etcd, CockroachDB, DynamoDB
Let's examine how popular databases handle the CAP trade-off in practice.
Cassandra: AP by design. Uses a Dynamo-style replication model with multi-master writes. Writes are always available (unless the entire cluster goes down). During partition, each side continues to accept writes. After partition heals, read repair and hinted handoff propagate updates. You can increase consistency level to ALL for strong consistency, but then you lose partition tolerance (because if any node is down, the write fails). Cassandra's default consistency is QUORUM (a mix of C and A).
MongoDB: CP with replica sets. Has a single primary that accepts writes; secondaries replicate asynchronously. If the primary goes down or is isolated, an election is held to pick a new primary; the old primary (if isolated) will step down when it reconnects. Reads from secondaries may be stale. MongoDB's default read preference is primary (strong consistency). You can allow reads from secondaries for better availability but at the cost of potential staleness. So MongoDB is CP during partition (minority partition cannot write).
Zookeeper: Strict CP. Uses Zab atomic broadcast. Requires a majority (quorum) to commit writes. Any node not in the majority becomes read-only (and eventually shuts down). Zookeeper is never AP — it prefers to be unavailable rather than inconsistent.
etcd: Also CP using Raft. Similar to Zookeeper — a majority must be present. If you lose majority, etcd stops serving. This is deliberate for Kubernetes etcd — losing consistency would be catastrophic.
CockroachDB: CP (strong consistency) with some AP properties. Uses Raft for replication but can survive some partitions as long as a majority of replicas for each range are available. CockroachDB offers serializable isolation. It uses clock synchronization (hybrid logical clocks) to provide linearizability without TrueTime. If a range's majority is lost, that range is unavailable — so it's CP within each range.
DynamoDB: AP. DynamoDB's always-on availability is its primary feature. It provides eventually consistent reads by default. Strongly consistent reads are available but limited to one partition and may fail during partition. DynamoDB is the canonical AP system.
Common Mistakes Engineers Make With CAP
- Assuming all databases in the same category behave identically. Redis is CP in cluster mode (with optional replicas), but a single Redis instance is CA (no partition tolerance). Many engineers think Redis is AP.
- Believing 'eventual consistency' always means AP. Cassandra with consistency level ALL is actually CP — if you configure it that way, you lose partition tolerance.
- Treating CAP as a permanent choice. You can design a system that switches between CP and AP based on the type of operation (e.g., use strongly consistent reads for payment, eventually consistent for product images).
- Ignoring the cost of consistency in normal operation (PACELC). Using strong consistency for every read will increase database load and latency significantly.
- Not testing partition recovery. The most common production issue is not the partition itself but the behavior after it heals — e.g., read repair storms, data conflicts, or slow convergence.
The Banking Blackout: A CP System Refuses Reads During a Network Blip
- Always test your quorum configuration under a simulated network partition.
- CP doesn't have to mean total unavailability — you can tune quorum sizes to survive minority failures while preserving consistency.
- Business impact of strict CP must be agreed with product owners before going to production.
Key takeaways
Common mistakes to avoid
4 patternsConfusing CAP Consistency with ACID Consistency
Using strong consistency for all reads in an AP database
Assuming partition tolerance means 'I don't have to worry about partitions'
Not considering the PACELC trade-off in normal operation
Interview Questions on This Topic
Explain CAP Theorem and give an example of a CP system vs an AP system. Which would you choose for a banking ledger?
Frequently Asked Questions
That's Database Design. Mark it forged?
8 min read · try the examples if you haven't