ACID Properties — Batch Atomicity Without Transactions
10,000 debits committed, JVM crashed at INSERT 4,700 — 230 accounts went negative.
- ACID guarantees transactional reliability: Atomicity (all-or-nothing), Consistency (valid state transitions only), Isolation (concurrent transactions do not interfere), Durability (committed data survives crashes)
- Atomicity is implemented via Write-Ahead Log and undo segments — partial failures roll back completely without leaving orphaned state
- Isolation uses MVCC or locking to prevent dirty reads, lost updates, and phantom reads — the level you choose is a speed-versus-safety dial
- Durability flushes WAL to disk via fsync before COMMIT returns — this is the most expensive operation in any database write path
- Setting synchronous_commit=off in PostgreSQL boosts throughput 5-10x but risks losing up to 600ms of committed data on crash — never acceptable for financial data
- The biggest production mistake: assuming higher isolation levels are always safer — SERIALIZABLE can cut throughput by 40% without preventing anything that READ COMMITTED with proper application logic would not handle
Think of ACID like a bank vault. Atomicity means the vault door either opens fully or stays shut — there is no half-open state where some of the money is accessible and some is not. Consistency means only valid bills go in — counterfeit money gets rejected at the door. Isolation means two people cannot grab the same stack of cash at the same time — one waits while the other finishes. Durability means once the money is locked inside, a power outage does not erase it — the vault remembers what is in it even after the lights come back on.
ACID is the set of properties that makes a database trustworthy. Without ACID, a power outage at the wrong millisecond could leave your data in a corrupted, half-updated state — a nightmare for financial records, medical systems, or any application where partial state is not just wrong but dangerous. Imagine a bank transfer: if the system fails after debiting your account but before crediting the recipient, that money vanishes. Not temporarily missing, not in a reconciliation queue — gone. ACID prevents this.
These four properties work in tandem, implemented through sophisticated internal mechanisms like transaction logs, locking protocols, and multi-version concurrency control. As a developer, understanding ACID is not about passing interviews. It is about understanding the performance trade-offs you make every time you choose an isolation level, decide between a traditional RDBMS and a relaxed NoSQL system, or configure how aggressively your database flushes data to disk.
A common misconception is that ACID is binary — either a database has it or it does not. In reality, every property has configurable levels that trade safety for throughput. PostgreSQL lets you trade durability for speed with synchronous_commit. MySQL lets you choose between InnoDB's row-level MVCC and MyISAM's table-level locking. The properties exist on a spectrum, and production tuning means knowing which dial to turn, how far to turn it, and what breaks if you go too far.
Atomicity — The All-or-Nothing Guarantee
Atomicity ensures that a transaction is treated as a single, indivisible unit. If any part of the transaction fails — a constraint violation, a timeout, a crash, an OOM kill — the entire operation is rolled back, leaving the database in the exact state it was in before the transaction started. There is no partial commit. There is no state where half the work is done and the other half is not.
The primary implementation mechanism is the Write-Ahead Log (WAL). Before any data pages are modified on disk, the intended changes are appended sequentially to the WAL. If a crash occurs mid-transaction, the database reads the WAL during recovery and identifies any transaction that started but never committed. Those transactions are rolled back — their changes are undone as if they never happened.
In MySQL/InnoDB, atomicity uses the undo log in addition to the redo log. When a transaction modifies a row, the original value is copied to the undo log before the change is applied to the data page. If ROLLBACK is issued — or if the server crashes before COMMIT — the database replays the undo log to restore every modified row to its pre-transaction state.
The critical production insight is this: atomicity only exists within explicit transaction boundaries. If you run 10,000 SQL statements in auto-commit mode, each statement is its own transaction. A crash at statement 4,700 means 4,699 statements are permanently committed and 5,301 never happened — with no mechanism to identify or recover the boundary. The $47,000 incident described above is exactly this failure mode.
SAVEPOINT extends atomicity within a transaction. It creates a named checkpoint inside a running transaction. If a subsequent operation fails, you can ROLLBACK TO SAVEPOINT to undo only the work after the savepoint, while preserving everything before it. This is essential for long-running batches where reprocessing everything from scratch is not acceptable.
- WAL records the intent before the change is applied to data pages — crash recovery replays or undoes based on COMMIT status.
- Undo log (InnoDB) or rollback segments store the original row values — ROLLBACK restores them atomically.
- SAVEPOINT creates named checkpoints inside a transaction — partial rollback without losing all prior work.
- Without explicit BEGIN/COMMIT boundaries, auto-commit mode means each statement is its own transaction — a crash between statements leaves an unrecoverable partial state.
- OOM kills, SIGKILL, and hardware failures do not trigger application-level rollback logic. Only database-level transaction boundaries provide crash safety.
Consistency — The Valid State Machine
Consistency ensures that a transaction brings the database from one valid state to another valid state, respecting every defined rule — constraints, triggers, cascades, and domain invariants. If a transaction would violate any constraint (foreign key, unique, check, not null), the database rejects the entire transaction. The database never persists an invalid state, even if the application code attempts to create one.
Consistency is the property that ties the other three together. Atomicity ensures no partial transaction leaks invalid intermediate state. Isolation ensures concurrent transactions do not create invalid combinations of values that no single transaction would produce. Durability ensures the valid state, once committed, persists through crashes. Without any one of the other three, consistency cannot be guaranteed.
There are two kinds of consistency that matter in practice. Schema-level consistency is enforced by database constraints: foreign keys prevent orphaned references, unique constraints prevent duplicates, check constraints prevent domain violations like negative stock counts. Application-level consistency is enforced by business logic: a transfer must debit and credit the same amount, an order must have at least one line item, a subscription cancellation must trigger a prorated refund. The database cannot enforce application-level invariants automatically — those are the application developer's responsibility, backed by database constraints as a safety net.
In production, constraint violations surface as SQL errors that applications must handle gracefully. A common anti-pattern is catching constraint violations and retrying blindly without diagnosing why the violation occurred. A unique constraint violation on an idempotency key is normal and expected — the correct response is to return the existing record, not to retry with a new key. A foreign key violation is a bug — it means the application is referencing data that does not exist, and retrying will produce the same failure.
Isolation — Handling the Chaos of Concurrency
Isolation defines how and when the changes made by one transaction become visible to other concurrent transactions. Without isolation, the lost update anomaly occurs: two transactions read the same row, both compute a new value independently, and the second commit silently overwrites the first commit's changes. Neither transaction is aware that the other existed.
Databases solve concurrency problems using two fundamentally different approaches. Pessimistic locking (SELECT ... FOR UPDATE) acquires an exclusive lock on the row at read time — any other transaction that tries to read or write the same row blocks until the lock is released. This guarantees no lost updates but reduces concurrency. Optimistic concurrency (MVCC plus version columns) allows concurrent reads without blocking and detects conflicts at write time — if the row changed since you read it, your update fails and the application retries.
The SQL standard defines four isolation levels, from weakest to strongest:
READ UNCOMMITTED allows dirty reads — you can see uncommitted changes from other transactions. PostgreSQL silently promotes this to READ COMMITTED because dirty reads are almost never desirable.
READ COMMITTED is the PostgreSQL default. Each SQL statement within a transaction sees a fresh snapshot of committed data at the moment that statement begins. This prevents dirty reads but allows non-repeatable reads: if you read a row twice within the same transaction, you might get different values if another transaction committed a change between your two reads.
REPEATABLE READ takes a snapshot when the transaction starts and uses that snapshot for every statement within the transaction. This prevents both dirty reads and non-repeatable reads. In PostgreSQL, REPEATABLE READ also prevents phantom reads because MVCC snapshots are transaction-scoped. In MySQL/InnoDB, REPEATABLE READ uses gap locks to prevent phantom inserts, which makes it stricter than the SQL standard requires.
SERIALIZABLE provides full isolation — transactions execute as if they ran one at a time, sequentially. PostgreSQL implements this with Serializable Snapshot Isolation (SSI), which detects read-write dependencies and aborts transactions that would violate serializability. This is the safest level but reduces throughput by 20-40% under contention because conflicting transactions are aborted and must be retried.
- READ COMMITTED: each statement sees a fresh snapshot of committed data. Fast. Allows non-repeatable reads within a transaction.
- REPEATABLE READ: the entire transaction sees one consistent snapshot from when it started. Prevents most anomalies in PostgreSQL. ~5-10% slower under contention.
- SERIALIZABLE: transactions execute as if they ran sequentially. Safest. 20-40% slower under contention because conflicting transactions are aborted and retried.
- MVCC (PostgreSQL): readers never block writers and writers never block readers — both work on different row versions simultaneously.
- Pessimistic locking (SELECT FOR UPDATE): blocks other transactions explicitly. Use when write conflicts are frequent and the cost of retrying a failed optimistic write is higher than the cost of blocking.
Durability — Survival After the Crash
Durability guarantees that once a transaction is committed, it remains committed — even if the server loses power one millisecond later, even if the kernel panics, even if the disk controller lies about having flushed its cache. This is the property that lets you show a user 'Payment Confirmed' and know that the confirmation is permanent.
The mechanism is straightforward in principle and expensive in practice. Before the database returns COMMIT to the client, it ensures the WAL record for that transaction has been physically written to non-volatile storage — not just to the OS page cache (which is volatile RAM), but all the way down to the disk platters or flash cells. This operation is called fsync, and it is the single most expensive operation in any database write path.
Each fsync takes 1-5 milliseconds on enterprise SSDs and 5-15 milliseconds on spinning disks. With synchronous_commit=on (the PostgreSQL default), every COMMIT waits for fsync to complete before returning to the client. This limits throughput to roughly 200-1,000 commits per second per disk on SSD, because each commit must wait for the physical write to complete.
Setting synchronous_commit=off changes the bargain dramatically. The database writes the WAL record to the OS page cache and returns COMMIT immediately, without waiting for fsync. The wal_writer background process flushes the page cache to disk every wal_writer_delay milliseconds (default 200ms). This batches multiple transactions' WAL records into a single fsync, boosting throughput to 5,000-10,000 commits per second — a 5-10x improvement.
The cost of that improvement is a durability window. If the server crashes within the wal_writer_delay window after a commit, any transactions that committed during that window but were not yet flushed to disk are lost. The data was in volatile RAM (the OS page cache), and volatile RAM does not survive power loss. In practice, you can lose up to roughly 3x the wal_writer_delay value because the wal_writer may not have completed even one flush cycle before the crash.
On restart, the database replays the WAL from the last confirmed flush point. Any committed transaction whose WAL record was flushed is recovered. Any committed transaction whose WAL record was only in the page cache is gone — permanently.
Double-Debit: $47,000 Lost to Missing Atomicity in a Payment Batch
- Never assume a batch process is atomic because it runs in a single process. Atomicity is a database-level property enforced by explicit transaction boundaries, not a JVM-level property enforced by a single process.
- Use SAVEPOINT inside long-running batches to enable partial rollback and resume — reprocessing 10,000 records because record 4,701 failed is a waste of compute and customer patience.
- Monitor for debit-to-payment count mismatches in production. A 1% drift between related tables is not a minor accounting issue — it is a critical data integrity alert that should page someone immediately.
- OOM kills, SIGKILL signals, and hardware failures do not trigger transaction rollback if no transaction boundary exists. Auto-commit mode means every individual statement is its own transaction, and a failure between statements leaves the database in a state that no single rollback can fix.
pg_current_wal_lsn() and compare against the last flushed position.now() - pg_last_xact_replay_timestamp() AS replication_lag. If lag is the cause, either route critical reads to the primary or wait for the replica to catch up before serving the response. If using MVCC on the primary, check whether VACUUM is running — excessive dead tuples from un-vacuumed tables can cause reads to see outdated row versions. Run SELECT relname, n_dead_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10 to identify bloated tables.Key takeaways
Common mistakes to avoid
5 patternsSetting synchronous_commit=off globally or for tables that store financial data
Applying SERIALIZABLE isolation globally as a preventive safety measure
Relying on database constraints as the primary business validation mechanism
Implementing retry logic without idempotency keys for payment transactions
Not monitoring long-running idle in transaction sessions
now() - xact_start AS duration FROM pg_stat_activity WHERE state = 'idle in transaction'. Alert on any session exceeding 10 seconds. The most common cause is application code that opens a transaction, makes an HTTP call to an external service, and does not commit or rollback if the HTTP call times out.Interview Questions on This Topic
Explain the Lost Update anomaly. How would you solve it in a Spring Boot application using @Version (Optimistic Locking) versus SELECT FOR UPDATE (Pessimistic Locking)?
Frequently Asked Questions
That's DBMS. Mark it forged?
7 min read · try the examples if you haven't