ACID vs BASE: The Production Trade-Offs That Kill Systems at 3 AM
ACID vs BASE explained with real production failures.
20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.
Use ACID when you need strong consistency and can't tolerate stale reads (e.g., financial transactions). Use BASE when you need high availability and partition tolerance, and can tolerate temporary inconsistency (e.g., social media feeds, caching layers).
Think of ACID like a bank vault: every deposit and withdrawal is recorded instantly and exactly once. If the power goes out mid-transaction, the vault locks and nothing is lost. BASE is like a whiteboard in a busy office: someone writes a number, someone else erases it, and eventually everyone agrees on the final count. It's fast and works even if people are in different rooms, but you might see the wrong number for a moment.
Everyone says ACID is for banks and BASE is for everything else. That's a lie that'll cost you a pager alert at 3 AM. I've seen a social media feed built on Cassandra lose a user's post for 47 seconds because the read repair didn't trigger in time. I've also seen a payment service on PostgreSQL deadlock because the isolation level was too strict for the concurrency pattern. The real question isn't which one is better—it's which one hurts less when it fails. By the end of this, you'll know exactly which consistency model your system needs, how to configure it without cargo-culting, and the exact failure modes that'll wake you up.
Why ACID Exists: The Problem of Partial Writes
Before ACID, databases could leave your data in a half-baked state. Imagine transferring $100 from account A to B: if the power fails after debiting A but before crediting B, the money vanishes. ACID's Atomicity ensures the entire operation succeeds or fails as one unit. Consistency guarantees that any transaction brings the database from one valid state to another—no orphaned rows, no violated constraints. Isolation prevents concurrent transactions from seeing each other's partial work. Durability means once a transaction commits, it survives a crash. Without these, you can't build reliable financial systems, inventory management, or any application where data integrity is non-negotiable.
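To make the atomicity point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the transfer helper, and the CHECK constraint are assumptions made for illustration, not something from a real system. The credit is applied first on purpose, so a failed debit has something to roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back along with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, "A", "B", 30)        # fits within A's balance
rejected = transfer(conn, "A", "B", 500)  # would overdraw A
```

After the rejected transfer, B does not keep the 500 it was briefly credited: the whole unit is undone, which is the guarantee the paragraph above describes.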
BASE: Trading Consistency for Availability
BASE emerged from the CAP theorem: when a network partition occurs, you must choose between consistency and availability. ACID systems choose consistency—they'll refuse to serve reads if they can't guarantee freshness. BASE systems choose availability—they'll serve whatever data they have, even if it's stale. This is why DynamoDB, Cassandra, and Couchbase are popular for high-traffic web apps. The trade-off is eventual consistency: after a write, reads may return old data for a window of time. For a social media 'like' counter, that's fine. For a medical records system, it's a lawsuit waiting to happen.
When ACID Breaks: The Isolation Level Trap
ACID's Isolation comes in levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE. Each level prevents more anomalies than the one before it, and pays for that with more locking or more aborted transactions. The classic rookie mistake is using SERIALIZABLE everywhere 'for safety'. I've seen this bring down a payments service at 3 AM: every transaction waited for locks held by other transactions until the thread pool was exhausted. The fix? Use READ COMMITTED for most operations, and only escalate to SERIALIZABLE for operations that truly need it, like checking account balances before a withdrawal. Even then, consider optimistic locking with version numbers instead.
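Optimistic locking with version numbers can be sketched roughly as follows. This assumes a hypothetical accounts table with a version column and uses sqlite3 only so the example is self-contained; the same UPDATE ... WHERE version = ? pattern applies to other relational databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: every row carries a version counter.
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL, "
    "version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO accounts VALUES ('A', 100, 1)")
conn.commit()


def read_account(conn, account_id):
    """Return (balance, version) as seen right now."""
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def withdraw(conn, account_id, amount, seen_balance, seen_version):
    """Apply the withdrawal only if the row is unchanged since it was read."""
    if seen_balance < amount:
        return False
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (seen_balance - amount, account_id, seen_version),
        )
    # rowcount is 0 when another writer bumped the version first.
    return cur.rowcount == 1


# Two workers read the same snapshot, then both try to withdraw 80.
b1, v1 = read_account(conn, "A")
b2, v2 = read_account(conn, "A")
first = withdraw(conn, "A", 80, b1, v1)   # version matches, applied
second = withdraw(conn, "A", 80, b2, v2)  # stale version, rejected
```

The losing worker holds no lock while it waits; it just gets a False back and has to re-read and decide again. That is the trade: no blocking, but the caller owns the retry.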
BASE Failure Modes: Stale Reads and Tombstones
BASE systems introduce two painful failure modes. First, stale reads: a client writes data, then immediately reads from a different replica that hasn't received the update yet. This breaks features like 'my profile shows my new name'. Mitigation: get read-after-write consistency by reading from the same replica that handled the write, or use quorum reads and writes so the read set always overlaps the write set. Second, tombstones: in Cassandra (and other Dynamo-style stores), a delete is just a marker written like any other value. If you delete a key and then immediately read from a replica that hasn't received the tombstone, you can still get the old data. This is a common source of bugs in session management. Waiting a few seconds and hoping is not a fix: gossip spreads cluster membership and node state, not your data. If you need to rely on a key being gone, issue the delete and the follow-up read at QUORUM, and run repair within gc_grace_seconds so a replica that missed the tombstone can't resurrect the row.
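Why quorum reads help comes down to set overlap: with N replicas, a write acknowledged by W of them and a read that consults R of them must share at least one replica whenever R + W > N. The brute-force check below is a toy model written for illustration (the function name is made up, and it ignores timing, hinted handoff, and repair); a delete is just another write in this model.

```python
from itertools import combinations


def stale_read_possible(n, w, r):
    """True if some set of W written replicas and some set of R read
    replicas have no replica in common, i.e. a read can miss the write."""
    replicas = range(n)
    for written in combinations(replicas, w):
        for read in combinations(replicas, r):
            if not set(written) & set(read):
                return True
    return False


# Replication factor 3, as in a typical small cluster.
one_one = stale_read_possible(3, 1, 1)        # ONE / ONE
quorum_quorum = stale_read_possible(3, 2, 2)  # QUORUM / QUORUM
all_one = stale_read_possible(3, 3, 1)        # ALL / ONE
```

In this model ONE/ONE can miss the latest write, while QUORUM/QUORUM and ALL/ONE cannot, because every read set touches at least one replica that saw the write.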
Choosing Between ACID and BASE: The Decision Framework
Here's the blunt rule: if you can't afford to lose a single write, use ACID. If you can tolerate temporary inconsistency for higher throughput and availability, use BASE. But it's not binary. Many systems use both: an ACID relational database for the core transactional data (orders, payments) and a BASE cache or search index for read-heavy workloads (product catalog, user feeds). The key is to define your consistency boundaries. For example, an e-commerce site might use PostgreSQL for the order table (ACID) and Elasticsearch for product search (BASE). If the search index is stale by 5 seconds, customers might not see the latest review, but they can still buy. That's acceptable.
When Not to Use ACID or BASE: The Overkill Zone
ACID is overkill when you're building a logging system or an append-only event stream. You don't need transactions to insert a log line—a simple BASE store with high write throughput is better. BASE is overkill when you have a single-node application with low concurrency. Using Cassandra for a blog with 100 visitors/day is absurd—SQLite with WAL mode gives you ACID with zero operational complexity. Also, don't use BASE for anything involving money, inventory, or user identity. I've seen startups try to use MongoDB (BASE-ish) for a payment ledger. The result: duplicate charges and angry customers. Use the right tool for the job.
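For the small-blog case, turning on WAL mode in SQLite is a one-line pragma. A minimal sketch, assuming a throwaway database file and a hypothetical posts table:

```python
import os
import sqlite3
import tempfile


def open_wal_db(path):
    """Open a SQLite file and switch it to write-ahead logging.

    Returns the connection and the journal mode SQLite reports back.
    """
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


# WAL needs a real file; an in-memory database stays in 'memory' mode.
db_path = os.path.join(tempfile.mkdtemp(), "blog.db")
conn, mode = open_wal_db(db_path)

conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
with conn:  # one transaction: commit on success, roll back on error
    conn.execute("INSERT INTO posts (title) VALUES (?)", ("hello",))

post_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
```

The journal mode is stored in the database file, so it only has to be set once. Checking the value the pragma returns is worth doing, since SQLite reports the mode it actually ended up in.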
The 4GB Container That Kept Dying
- SERIALIZABLE isolation is not free—it trades memory for correctness.
- Always benchmark your transaction retry memory footprint under peak concurrency.
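One way to keep the retry footprint bounded is to cap the attempts and back off between them. The sketch below is not tied to any driver: SerializationFailure is a hypothetical stand-in for whatever exception your client raises on PostgreSQL's SQLSTATE 40001 (serialization_failure), and run_with_retry is a name chosen for illustration.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver's SQLSTATE 40001 error."""


def run_with_retry(txn, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run txn(); on a serialization failure retry, at most max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up instead of retrying forever
            # Exponential backoff with jitter so retries do not stampede.
            delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
            sleep(delay)


calls = {"count": 0}


def flaky_txn():
    """Fails twice with a serialization conflict, then commits."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise SerializationFailure()
    return "committed"


# sleep is stubbed out so the demo does not actually wait.
result = run_with_retry(flaky_txn, sleep=lambda seconds: None)
```

Each retry re-runs the whole transaction body, so whatever it allocates is held again on every attempt. A hard cap on attempts is what lets you reason about the worst case under peak concurrency.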
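Blocked or deadlocked transactions (PostgreSQL):
1. Run SELECT * FROM pg_locks WHERE NOT granted; to see blocked transactions.
2. Identify the conflicting queries.
3. Ensure all transactions access tables in the same order.
4. Consider lowering isolation level to READ COMMITTED.

Stale reads (Cassandra):
1. Check the consistency level your client is using (in cqlsh, CONSISTENCY with no argument prints the current level).
2. Increase read consistency to QUORUM.
3. Verify the replication factor and run a repair: nodetool repair.

Quick commands:
SELECT * FROM pg_locks WHERE NOT granted;
SELECT pg_cancel_backend(pid) FROM pg_locks WHERE NOT granted;
Note that the second command cancels the waiting queries, not the ones holding the locks.

Key takeaways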
Interview Questions on This Topic
How does PostgreSQL handle concurrent transactions under SERIALIZABLE isolation, and what happens when a conflict is detected?
Frequently Asked Questions