Junior 12 min · March 06, 2026

ACID Properties — Batch Atomicity Without Transactions

Master ACID properties in DBMS: Atomicity, Consistency, Isolation, Durability explained with PostgreSQL internals, isolation levels, MVCC, WAL, ACID vs BASE, and production performance trade-offs..

N
Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Drawn from code that ran under real load.

Follow
Production
production tested
May 23, 2026
last updated
1,554
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer
  • ACID guarantees transactional reliability: Atomicity (all-or-nothing), Consistency (valid state transitions only), Isolation (concurrent transactions do not interfere), Durability (committed data survives crashes)
  • Atomicity is implemented via Write-Ahead Log and undo segments — partial failures roll back completely without leaving orphaned state
  • Isolation uses MVCC or locking to prevent dirty reads, lost updates, and phantom reads — the level you choose is a speed-versus-safety dial
  • Durability flushes WAL to disk via fsync before COMMIT returns — this is the most expensive operation in any database write path
  • Setting synchronous_commit=off in PostgreSQL boosts throughput 5-10x but risks losing up to 600ms of committed data on crash — never acceptable for financial data
  • The biggest production mistake: assuming higher isolation levels are always safer — SERIALIZABLE can cut throughput by 40% without preventing anything that READ COMMITTED with proper application logic would not handle
✦ Definition~90s read
What is ACID Properties in DBMS?

ACID (Atomicity, Consistency, Isolation, Durability) is a set of database transaction guarantees that ensure data integrity even under concurrent access, system crashes, or partial failures. These properties solve the fundamental problem of making multi-step operations behave as a single, reliable unit — without ACID, a bank transfer could deduct from one account but fail to credit another, or a concurrent read could see a half-written row.

Think of ACID like a bank vault.

ACID is the bedrock of relational databases like PostgreSQL, MySQL (InnoDB), and SQLite, but it's not free: the guarantees impose significant performance costs through locking, write-ahead logging, and coordination overhead.

Atomicity means the transaction either commits completely or aborts entirely — no partial writes survive. Consistency ensures the database transitions from one valid state to another, respecting all constraints, triggers, and cascades. Isolation protects concurrent transactions from seeing each other's intermediate states, typically implemented via locking or multi-version concurrency control (MVCC).

Durability guarantees that once a transaction commits, its effects persist even if the power dies milliseconds later — usually via a write-ahead log (WAL) flushed to disk.

ACID is not a performance optimization; it's a correctness contract. When you don't need strict guarantees — for example, in high-throughput logging, analytics pipelines, or caching layers — you'd reach for BASE (Basically Available, Soft state, Eventual consistency) systems like Cassandra or DynamoDB.

The key insight: ACID trades throughput for safety. A PostgreSQL transaction doing a simple UPDATE with default isolation (Read Committed) can be 10-100x slower than a raw file write, but it guarantees you'll never lose money or see corrupt data. Choose ACID when correctness is non-negotiable; skip it when you can tolerate stale reads or eventual consistency.

ACID — Complete Transaction Guarantee Architecture diagram: ACID — Complete Transaction Guarantee ACID — Complete Transaction Guarantee 1 Transaction BEGIN multi-step op 2 Atomicity WAL + Undo → all-or-none 3 Consistency Constraints → valid state 4 Isolation MVCC → no interference 5 Durability fsync → survives crash 6 COMMIT Permanent + Reliable THECODEFORGE.IO
Plain-English First

Think of ACID like a bank vault. Atomicity means the vault door either opens fully or stays shut — there is no half-open state where some of the money is accessible and some is not. Consistency means only valid bills go in — counterfeit money gets rejected at the door. Isolation means two people cannot grab the same stack of cash at the same time — one waits while the other finishes. Durability means once the money is locked inside, a power outage does not erase it — the vault remembers what is in it even after the lights come back on.

ACID is the set of properties that makes a database trustworthy. Without ACID, a power outage at the wrong millisecond could leave your data in a corrupted, half-updated state — a nightmare for financial records, medical systems, or any application where partial state is not just wrong but dangerous. Imagine a bank transfer: if the system fails after debiting your account but before crediting the recipient, that money vanishes. Not temporarily missing, not in a reconciliation queue — gone. ACID prevents this.

These four properties work in tandem, implemented through sophisticated internal mechanisms like transaction logs, locking protocols, and multi-version concurrency control. As a developer, understanding ACID is not about passing interviews. It is about understanding the performance trade-offs you make every time you choose an isolation level, decide between a traditional RDBMS and a relaxed NoSQL system, or configure how aggressively your database flushes data to disk.

A common misconception is that ACID is binary — either a database has it or it does not. In reality, every property has configurable levels that trade safety for throughput. PostgreSQL lets you trade durability for speed with synchronous_commit. MySQL lets you choose between InnoDB's row-level MVCC and MyISAM's table-level locking. The properties exist on a spectrum, and production tuning means knowing which dial to turn, how far to turn it, and what breaks if you go too far.

What ACID Properties Actually Guarantee (and What They Don't)

ACID stands for Atomicity, Consistency, Isolation, Durability — four properties that define how a database transaction behaves under failure and concurrency. Atomicity means the transaction is all-or-nothing: if any part fails, the entire operation is rolled back as if it never started. This is batch atomicity without requiring explicit transaction boundaries in the application code.

In practice, atomicity is enforced via write-ahead logging (WAL): changes are first recorded to a log, then applied to data pages. If the system crashes mid-transaction, the recovery process uses the log to undo partial writes. Consistency ensures that any transaction brings the database from one valid state to another, respecting all constraints, triggers, and cascades. Isolation controls how concurrent transactions see each other's intermediate states — typically via locking or multi-version concurrency control (MVCC). Durability guarantees that once a transaction commits, its changes survive even a power loss, usually by flushing the log to disk before acknowledging the commit.

Use ACID when correctness is non-negotiable: financial systems, inventory management, booking engines. The trade-off is performance — enforcing these properties adds latency and reduces throughput compared to eventually consistent systems. In production, the isolation level you choose (Read Committed vs. Serializable) directly impacts both correctness and contention. Most teams over-isolate, paying unnecessary latency, or under-isolate, corrupting data silently.

Atomicity ≠ Transactions
Atomicity guarantees all-or-nothing for a single statement or batch, but without explicit transactions, each statement is its own atomic unit — partial updates across statements are not rolled back.
Production Insight
Teams using batch INSERT ... ON CONFLICT DO UPDATE assume atomicity across the batch, but a unique constraint violation mid-batch can leave some rows updated and others skipped.
The symptom: phantom duplicate keys or missing updates in downstream reports that don't match the source data.
Rule of thumb: wrap multi-row DML in an explicit transaction if you need atomicity across the entire batch — never rely on statement-level atomicity alone.
Key Takeaway
Atomicity is per-statement by default; use explicit transactions for multi-statement atomicity.
Isolation levels are a correctness vs. throughput knob — choose the weakest level that still prevents your specific anomaly.
Durability is only as strong as your fsync configuration — a single disk write-back cache can silently violate it.
ACID Properties: Batch Atomicity Without Transactions THECODEFORGE.IO ACID Properties: Batch Atomicity Without Transactions Flow from atomicity to durability and performance trade-offs Atomicity All-or-nothing batch execution Consistency Valid state after each batch Isolation Concurrent batch interference Durability Survive crash after commit Performance Impact Logging, locking, fsync overhead ⚠ Atomicity without transactions requires manual rollback Use compensating actions or idempotent retries for partial failures THECODEFORGE.IO
thecodeforge.io
ACID Properties: Batch Atomicity Without Transactions
Acid Properties Dbms

Who Is Responsible for Each ACID Property?

Understanding which component of the DBMS enforces each ACID property is a common interview question and critical for debugging production issues.

PropertyEnforced ByMechanism
AtomicityTransaction ManagerWrite-Ahead Log (WAL) + undo segments — on abort, the TM replays undo records to reverse all changes
ConsistencyApplication Programmer + DB ConstraintsThe developer defines foreign keys, CHECK constraints, and triggers; the DB engine enforces them at transaction boundaries
IsolationConcurrency Control ManagerMVCC (snapshot isolation) or 2-Phase Locking (2PL) — prevents dirty reads, lost updates, and phantom reads
DurabilityRecovery ManagerWAL flush via fsync() before COMMIT returns — guarantees the log is on non-volatile storage before acknowledgement

Critical insight for debugging: If you see inconsistent data within a transaction, suspect Isolation (check your isolation level). If committed data disappears after a crash, suspect Durability (check fsync settings). If a constraint-violating row persists, suspect a bug in Consistency enforcement (missing constraint or deferred constraint).

Atomicity — The All-or-Nothing Guarantee

Atomicity ensures that a transaction is treated as a single, indivisible unit. If any part of the transaction fails — a constraint violation, a timeout, a crash, an OOM kill — the entire operation is rolled back, leaving the database in the exact state it was in before the transaction started. There is no partial commit. There is no state where half the work is done and the other half is not.

The primary implementation mechanism is the Write-Ahead Log (WAL). Before any data pages are modified on disk, the intended changes are appended sequentially to the WAL. If a crash occurs mid-transaction, the database reads the WAL during recovery and identifies any transaction that started but never committed. Those transactions are rolled back — their changes are undone as if they never happened.

In MySQL/InnoDB, atomicity uses the undo log in addition to the redo log. When a transaction modifies a row, the original value is copied to the undo log before the change is applied to the data page. If ROLLBACK is issued — or if the server crashes before COMMIT — the database replays the undo log to restore every modified row to its pre-transaction state.

The critical production insight is this: atomicity only exists within explicit transaction boundaries. If you run 10,000 SQL statements in auto-commit mode, each statement is its own transaction. A crash at statement 4,700 means 4,699 statements are permanently committed and 5,301 never happened — with no mechanism to identify or recover the boundary. The $47,000 incident described above is exactly this failure mode.

SAVEPOINT extends atomicity within a transaction. It creates a named checkpoint inside a running transaction. If a subsequent operation fails, you can ROLLBACK TO SAVEPOINT to undo only the work after the savepoint, while preserving everything before it. This is essential for long-running batches where reprocessing everything from scratch is not acceptable.

io/thecodeforge/db/atomic_order_flow.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
-- ============================================================
-- Atomic E-commerce Order Flow
-- All three operations succeed together or none of them persist.
-- A crash between statement 2 and statement 3 rolls back everything.
-- ============================================================
BEGIN;

-- Step 1: Create the order header
INSERT INTO io_thecodeforge.orders (customer_id, status, total_amount)
VALUES (1024, 'PENDING_PAYMENT', 599.00);

-- Step 2: Reserve inventory — the heavy, contention-prone operation
-- The WHERE clause prevents negative stock without a CHECK constraint race
UPDATE io_thecodeforge.inventory
SET stock_count = stock_count - 1
WHERE product_id = 'SKU-99'
  AND stock_count > 0;

-- Verify the UPDATE actually affected a row (stock was available)
-- If 0 rows affected, the product is out of stock — abort the transaction
-- Application code checks the affected row count here

-- Step 3: Record the payment intent
INSERT INTO io_thecodeforge.payments (order_id, provider, amount, status)
VALUES (currval('orders_order_id_seq'), 'STRIPE', 599.00, 'INITIATED');

-- If ANY constraint violation occurs (FK, CHECK, UNIQUE),
-- the entire transaction is rolled back — no orphaned order,
-- no phantom inventory decrement, no unmatched payment record.
COMMIT;

-- ============================================================
-- SAVEPOINT example for batch processing
-- Enables partial rollback without losing the entire batch
-- ============================================================
BEGIN;

-- Process first 100 records
SAVEPOINT batch_checkpoint_100;

-- ... 100 INSERT/UPDATE statements ...

-- Process records 101-200
SAVEPOINT batch_checkpoint_200;

-- ... 100 INSERT/UPDATE statements ...
-- Record 157 fails with a constraint violation

-- Roll back ONLY to the last checkpoint — records 1-100 survive
ROLLBACK TO SAVEPOINT batch_checkpoint_100;

-- Resume processing from record 101 with corrected data
-- ... retry logic here ...

COMMIT;  -- Everything that was not rolled back is now durable
Output
INSERT 0 1
UPDATE 1
INSERT 0 1
COMMIT
SAVEPOINT batch_checkpoint_100
SAVEPOINT batch_checkpoint_200
ROLLBACK TO SAVEPOINT
COMMIT
Atomicity Is a Time Machine for Your Database
  • WAL records the intent before the change is applied to data pages — crash recovery replays or undoes based on COMMIT status.
  • Undo log (InnoDB) or rollback segments store the original row values — ROLLBACK restores them atomically.
  • SAVEPOINT creates named checkpoints inside a transaction — partial rollback without losing all prior work.
  • Without explicit BEGIN/COMMIT boundaries, auto-commit mode means each statement is its own transaction — a crash between statements leaves an unrecoverable partial state.
  • OOM kills, SIGKILL, and hardware failures do not trigger application-level rollback logic. Only database-level transaction boundaries provide crash safety.
Production Insight
The most common atomicity failure in production is batch jobs running in auto-commit mode. Each SQL statement commits immediately, and a crash mid-batch leaves an unrecoverable partial state.
A secondary failure mode is application-level retries that do not check whether the original transaction committed. The timeout fires, the application retries, and both the original and the retry commit — producing a duplicate charge.
Rule: every logical unit of work must be wrapped in explicit BEGIN/COMMIT. If it takes more than one SQL statement and they must succeed or fail together, it needs a transaction. Auto-commit is for interactive psql sessions, not for production application code.
Key Takeaway
Atomicity is implemented via WAL and undo logs — the database records intent before applying changes to data pages, enabling complete rollback on failure.
SAVEPOINT creates checkpoints inside a transaction for partial rollback without discarding all prior work.
The most dangerous production pattern: batch jobs in auto-commit mode that leave an unrecoverable partial state when the process is killed mid-execution.
When to Use SAVEPOINT vs Full ROLLBACK
IfSingle operation that must fully succeed or fully fail — an order placement, a fund transfer, a permission change
UseUse plain BEGIN/COMMIT — no SAVEPOINT needed. The transaction is small and indivisible.
IfMulti-step process where individual steps can fail without invalidating prior steps — an order with optional gift wrapping
UseUse SAVEPOINT before each optional step. ROLLBACK TO SAVEPOINT on failure, then continue with the remaining steps.
IfLong-running batch job processing thousands of records — nightly ETL, bulk import, mass update
UseUse SAVEPOINT every N records (100-1000 depending on record size). Enables resume from the last checkpoint on failure without reprocessing the entire batch.
IfNested business logic where an inner operation's failure should not discard the outer operation's work
UseUse nested SAVEPOINTs — each inner scope rolls back independently while the outer transaction continues.
Atomicity — All-or-Nothing Architecture diagram: Atomicity — All-or-Nothing Atomicity — All-or-Nothing all OK any fail crash 1 BEGIN Transaction starts 2 Debit Account A Op 1: Write to WAL 3 Credit Account B Op 2: Write to WAL 4 COMMIT All ops succeed → done 5 ROLLBACK Any fail → undo all ops THECODEFORGE.IO

Consistency — The Valid State Machine

Consistency ensures that a transaction brings the database from one valid state to another valid state, respecting every defined rule — constraints, triggers, cascades, and domain invariants. If a transaction would violate any constraint (foreign key, unique, check, not null), the database rejects the entire transaction. The database never persists an invalid state, even if the application code attempts to create one.

Consistency is the property that ties the other three together. Atomicity ensures no partial transaction leaks invalid intermediate state. Isolation ensures concurrent transactions do not create invalid combinations of values that no single transaction would produce. Durability ensures the valid state, once committed, persists through crashes. Without any one of the other three, consistency cannot be guaranteed.

There are two kinds of consistency that matter in practice. Schema-level consistency is enforced by database constraints: foreign keys prevent orphaned references, unique constraints prevent duplicates, check constraints prevent domain violations like negative stock counts. Application-level consistency is enforced by business logic: a transfer must debit and credit the same amount, an order must have at least one line item, a subscription cancellation must trigger a prorated refund. The database cannot enforce application-level invariants automatically — those are the application developer's responsibility, backed by database constraints as a safety net.

In production, constraint violations surface as SQL errors that applications must handle gracefully. A common anti-pattern is catching constraint violations and retrying blindly without diagnosing why the violation occurred. A unique constraint violation on an idempotency key is normal and expected — the correct response is to return the existing record, not to retry with a new key. A foreign key violation is a bug — it means the application is referencing data that does not exist, and retrying will produce the same failure.

io/thecodeforge/db/consistency_enforcement.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
-- ============================================================
-- Consistency Enforcement — Constraints as Safety Nets
-- The database rejects any transaction that would create invalid state.
-- ============================================================

-- 1. Foreign Key: an order cannot reference a non-existent customer
--    This prevents orphaned records that break JOIN integrity.
INSERT INTO io_thecodeforge.orders (customer_id, status, total_amount)
VALUES (99999, 'PENDING', 100.00);
-- ERROR:  insert or update on table "orders" violates foreign key constraint
--         "orders_customer_id_fkey"
-- DETAIL: Key (customer_id)=(99999) is not present in table "customers".

-- 2. Check Constraint: stock cannot go negative
--    This catches the bug where two concurrent decrements both pass
--    the application-level check but the net result goes below zero.
ALTER TABLE io_thecodeforge.inventory
ADD CONSTRAINT chk_stock_non_negative
CHECK (stock_count >= 0);

UPDATE io_thecodeforge.inventory
SET stock_count = stock_count - 50
WHERE product_id = 'SKU-99' AND stock_count = 3;
-- ERROR:  new row for relation "inventory" violates check constraint
--         "chk_stock_non_negative"
-- DETAIL: Failing row contains (SKU-99, -47).

-- 3. Unique Constraint with idempotency key: prevent duplicate payments
--    This is how you make payment retries safe — the database enforces
--    that each logical payment attempt can only succeed once.
CREATE UNIQUE INDEX IF NOT EXISTS idx_payments_idempotency
  ON io_thecodeforge.payments (idempotency_key);

-- First attempt — succeeds
INSERT INTO io_thecodeforge.payments
  (idempotency_key, order_id, provider, amount, status)
VALUES ('pay_abc123', 1024, 'STRIPE', 599.00, 'COMPLETED');

-- Retry with same idempotency key — safely ignored
INSERT INTO io_thecodeforge.payments
  (idempotency_key, order_id, provider, amount, status)
VALUES ('pay_abc123', 1024, 'STRIPE', 599.00, 'COMPLETED')
ON CONFLICT (idempotency_key) DO NOTHING;
-- INSERT 0 0 — no duplicate created, no error raised

-- To return the existing record on conflict:
INSERT INTO io_thecodeforge.payments
  (idempotency_key, order_id, provider, amount, status)
VALUES ('pay_abc123', 1024, 'STRIPE', 599.00, 'COMPLETED')
ON CONFLICT (idempotency_key)
DO UPDATE SET status = io_thecodeforge.payments.status
RETURNING *;
Output
ERROR: FK violation on customer_id=99999
ERROR: CHECK violation stock_count=-47
INSERT 0 1 (first payment)
INSERT 0 0 (idempotent retry — no duplicate)
Constraints Are Safety Nets, Not Primary Validation
Never rely solely on database constraints for business rule validation. Constraints are the last line of defense — they catch bugs that slipped past application-level validation. If your production logs are full of constraint violation errors, that means invalid data is reaching the database layer, which means your application validation is broken or missing. Validate business rules in application code first. Return clear error messages to users when validation fails. Use constraints to catch the cases where application validation has a bug — because it will, eventually. Log every constraint violation in production with full request context so you can diagnose why the application validation missed it.
Production Insight
Constraint violations in production are silent throughput killers — each failed INSERT or UPDATE triggers an implicit ROLLBACK, wastes disk I/O on the WAL write, and returns an error that the application must handle.
Blind retry loops on constraint violations can cause deadlocks under high concurrency — each retry acquires locks that conflict with other retries.
Rule: validate preconditions in application code before opening the transaction. Use constraints as a safety net for bugs, not as the primary validation mechanism. Log every constraint violation with full request context.
Key Takeaway
Consistency is enforced by constraints (FK, Unique, Check, Not Null) — the database rejects any transaction that would create an invalid state.
Consistency depends on the other three ACID properties: Atomicity prevents partial state, Isolation prevents concurrent interference, Durability prevents loss of valid state.
Two kinds of consistency matter: schema-level (enforced by the database) and application-level (enforced by your code, backed by constraints as a safety net).
Consistency — Valid State Transitions Architecture diagram: Consistency — Valid State Transitions Consistency — Valid State Transitions passes passes fails fails 1 Transaction Modify data 2 FK Constraints Referential integrity 3 CHECK / UNIQUE Domain constraints 4 Valid State All rules satisfied 5 Violation Rollback + error raised THECODEFORGE.IO

Isolation — Handling the Chaos of Concurrency

Isolation defines how and when the changes made by one transaction become visible to other concurrent transactions. Without isolation, the lost update anomaly occurs: two transactions read the same row, both compute a new value independently, and the second commit silently overwrites the first commit's changes. Neither transaction is aware that the other existed.

Databases solve concurrency problems using two fundamentally different approaches. Pessimistic locking (SELECT ... FOR UPDATE) acquires an exclusive lock on the row at read time — any other transaction that tries to read or write the same row blocks until the lock is released. This guarantees no lost updates but reduces concurrency. Optimistic concurrency (MVCC plus version columns) allows concurrent reads without blocking and detects conflicts at write time — if the row changed since you read it, your update fails and the application retries.

The SQL standard defines four isolation levels, from weakest to strongest:

READ UNCOMMITTED allows dirty reads — you can see uncommitted changes from other transactions. PostgreSQL silently promotes this to READ COMMITTED because dirty reads are almost never desirable.

READ COMMITTED is the PostgreSQL default. Each SQL statement within a transaction sees a fresh snapshot of committed data at the moment that statement begins. This prevents dirty reads but allows non-repeatable reads: if you read a row twice within the same transaction, you might get different values if another transaction committed a change between your two reads.

REPEATABLE READ takes a snapshot when the transaction starts and uses that snapshot for every statement within the transaction. This prevents both dirty reads and non-repeatable reads. In PostgreSQL, REPEATABLE READ also prevents phantom reads because MVCC snapshots are transaction-scoped. In MySQL/InnoDB, REPEATABLE READ uses gap locks to prevent phantom inserts, which makes it stricter than the SQL standard requires.

SERIALIZABLE provides full isolation — transactions execute as if they ran one at a time, sequentially. PostgreSQL implements this with Serializable Snapshot Isolation (SSI), which detects read-write dependencies and aborts transactions that would violate serializability. This is the safest level but reduces throughput by 20-40% under contention because conflicting transactions are aborted and must be retried.

io/thecodeforge/db/isolation_lost_update.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
-- ============================================================
-- Preventing the Lost Update Anomaly
-- Two concurrent transactions try to update the same row.
-- Without isolation controls, the second commit silently
-- overwrites the first commit's changes.
-- ============================================================

-- Approach 1: Pessimistic Locking — SELECT FOR UPDATE
-- Acquires an exclusive row lock at read time.
-- Any other transaction that tries to read this row FOR UPDATE
-- will block until this transaction commits or rolls back.

-- Transaction A:
BEGIN;
SELECT stock_count FROM io_thecodeforge.inventory
WHERE product_id = 'SKU-99'
FOR UPDATE;  -- exclusive lock acquired on this row
-- Returns stock_count = 10

UPDATE io_thecodeforge.inventory
SET stock_count = stock_count - 1,
    updated_at = now()
WHERE product_id = 'SKU-99';
-- stock_count is now 9

COMMIT;  -- lock released, Transaction B can proceed

-- Transaction B (concurrent):
BEGIN;
SELECT stock_count FROM io_thecodeforge.inventory
WHERE product_id = 'SKU-99'
FOR UPDATE;  -- BLOCKS here until Transaction A commits
-- After A commits, returns stock_count = 9 (the updated value)

UPDATE io_thecodeforge.inventory
SET stock_count = stock_count - 1
WHERE product_id = 'SKU-99';
-- stock_count is now 8 — CORRECT

COMMIT;

-- ============================================================
-- Approach 2: Optimistic Locking — Version Column
-- No locks held during read. Conflict detected at write time.
-- If someone else changed the row, the UPDATE affects 0 rows
-- and the application retries.
-- ============================================================

-- Add a version column to the table
ALTER TABLE io_thecodeforge.inventory
ADD COLUMN IF NOT EXISTS version INTEGER DEFAULT 1;

-- Transaction A reads the row (no lock)
SELECT stock_count, version FROM io_thecodeforge.inventory
WHERE product_id = 'SKU-99';
-- Returns: stock_count=10, version=1

-- Transaction A writes with version check
UPDATE io_thecodeforge.inventory
SET stock_count = stock_count - 1,
    version = version + 1
WHERE product_id = 'SKU-99'
  AND version = 1;  -- only succeeds if no one else changed it
-- UPDATE 1 — success, version is now 2

-- Transaction B (concurrent) tries the same
UPDATE io_thecodeforge.inventory
SET stock_count = stock_count - 1,
    version = version + 1
WHERE product_id = 'SKU-99'
  AND version = 1;  -- FAILS: version is now 2, not 1
-- UPDATE 0 — zero rows affected, application detects and retries

-- ============================================================
-- Check current isolation level
-- ============================================================
SHOW transaction_isolation;
-- Default in PostgreSQL: read committed

-- Set isolation level for a specific transaction
BEGIN ISOLATION LEVEL REPEATABLE READ;
-- ... your queries here ...
COMMIT;
Output
SELECT FOR UPDATE: 1 row locked
UPDATE 1
COMMIT
-- Transaction B blocks, then:
SELECT: stock_count=9
UPDATE 1
COMMIT
-- Optimistic locking:
UPDATE 1 (Transaction A succeeds)
UPDATE 0 (Transaction B detects conflict — application retries)
transaction_isolation: read committed
Isolation Is a Speed vs Safety Dial — Not a Binary Switch
  • READ COMMITTED: each statement sees a fresh snapshot of committed data. Fast. Allows non-repeatable reads within a transaction.
  • REPEATABLE READ: the entire transaction sees one consistent snapshot from when it started. Prevents most anomalies in PostgreSQL. ~5-10% slower under contention.
  • SERIALIZABLE: transactions execute as if they ran sequentially. Safest. 20-40% slower under contention because conflicting transactions are aborted and retried.
  • MVCC (PostgreSQL): readers never block writers and writers never block readers — both work on different row versions simultaneously.
  • Pessimistic locking (SELECT FOR UPDATE): blocks other transactions explicitly. Use when write conflicts are frequent and the cost of retrying a failed optimistic write is higher than the cost of blocking.
Production Insight
Most production applications default to READ COMMITTED and never change it. This is correct for roughly 90% of workloads — higher isolation levels add latency and increase abort rates without preventing anomalies that your application logic already handles.
The remaining 10% — financial reporting queries that must see a consistent snapshot across multiple tables, inventory decrements that cannot tolerate lost updates — should use REPEATABLE READ or SELECT FOR UPDATE on a per-transaction basis.
Rule: only increase isolation when you have identified a specific anomaly that the current level does not prevent. SERIALIZABLE should be a targeted fix for a demonstrated problem, not a global default applied as a safety blanket.
Key Takeaway
Isolation is a spectrum from READ COMMITTED (fast, allows some anomalies) to SERIALIZABLE (slow, prevents all anomalies). The right choice depends on which specific anomaly you need to prevent.
MVCC in PostgreSQL is optimistic — readers never block writers. Pessimistic locking (SELECT FOR UPDATE) is explicit blocking. Choose based on conflict frequency and retry cost.
Only increase isolation when you have a specific, demonstrated anomaly to prevent. Applying SERIALIZABLE globally costs 20-40% throughput for protection you probably do not need on most transactions.
Isolation Level Selection Guide
IfRead-heavy analytics or reporting with no concurrent writes to the same rows
UseREAD COMMITTED — fastest reads, minimal locking overhead, sufficient for read-only workloads.
IfConcurrent writes to the same row — inventory decrements, counter increments, balance updates
UseREAD COMMITTED with SELECT FOR UPDATE on the specific row — pessimistic lock prevents the lost update anomaly without raising the global isolation level.
IfFinancial reporting where a query must see a consistent snapshot across multiple tables — total account balances, revenue reconciliation
UseREPEATABLE READ — the entire transaction sees one snapshot, preventing non-repeatable reads that would produce incorrect totals.
IfTransaction logic where phantom reads would cause incorrect results — checking for the existence of a record and inserting if absent
UseSERIALIZABLE — prevents all anomalies including phantoms, but expect 20-40% throughput reduction under contention. Retry aborted transactions.
IfHigh-throughput event processing where occasional conflicts are acceptable and can be retried
UseREAD COMMITTED with optimistic locking (version column). No blocking, no lock contention, application retries on conflict.
Isolation — Concurrent Transactions Architecture diagram: Isolation — Concurrent Transactions Isolation — Concurrent Transactions 1 Transaction A Reads + writes 2 MVCC Snapshot A Sees committed data only 3 Isolation Mgr READ COMMITTED / SSI 4 MVCC Snapshot B Independent view 5 Transaction B No dirty reads 6 COMMIT both No lost updates THECODEFORGE.IO

Durability — Survival After the Crash

Durability guarantees that once a transaction is committed, it remains committed — even if the server loses power one millisecond later, even if the kernel panics, even if the disk controller lies about having flushed its cache. This is the property that lets you show a user 'Payment Confirmed' and know that the confirmation is permanent.

The mechanism is straightforward in principle and expensive in practice. Before the database returns COMMIT to the client, it ensures the WAL record for that transaction has been physically written to non-volatile storage — not just to the OS page cache (which is volatile RAM), but all the way down to the disk platters or flash cells. This operation is called fsync, and it is the single most expensive operation in any database write path.

Each fsync takes 1-5 milliseconds on enterprise SSDs and 5-15 milliseconds on spinning disks. With synchronous_commit=on (the PostgreSQL default), every COMMIT waits for fsync to complete before returning to the client. This limits throughput to roughly 200-1,000 commits per second per disk on SSD, because each commit must wait for the physical write to complete.

Setting synchronous_commit=off changes the bargain dramatically. The database writes the WAL record to the OS page cache and returns COMMIT immediately, without waiting for fsync. The wal_writer background process flushes the page cache to disk every wal_writer_delay milliseconds (default 200ms). This batches multiple transactions' WAL records into a single fsync, boosting throughput to 5,000-10,000 commits per second — a 5-10x improvement.

The cost of that improvement is a durability window. If the server crashes within the wal_writer_delay window after a commit, any transactions that committed during that window but were not yet flushed to disk are lost. The data was in volatile RAM (the OS page cache), and volatile RAM does not survive power loss. In practice, you can lose up to roughly 3x the wal_writer_delay value because the wal_writer may not have completed even one flush cycle before the crash.

On restart, the database replays the WAL from the last confirmed flush point. Any committed transaction whose WAL record was flushed is recovered. Any committed transaction whose WAL record was only in the page cache is gone — permanently.

io/thecodeforge/db/durability_config.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
-- ============================================================
-- Durability Configuration — PostgreSQL
-- synchronous_commit controls whether COMMIT waits for WAL fsync
-- ============================================================

-- Check current durability setting
SHOW synchronous_commit;
-- Default: on (full durability — COMMIT waits for fsync)

-- Check current WAL position
SELECT pg_current_wal_lsn() AS current_wal_position;

-- Check WAL file currently being written
SELECT pg_walfile_name(pg_current_wal_lsn()) AS current_wal_file;

-- ============================================================
-- Performance vs Safety Trade-off
-- ============================================================

-- OPTION 1: Full durability (default) — for financial data
-- Every COMMIT waits for fsync. Throughput: ~200-1000 TPS on SSD.
SET synchronous_commit = on;
-- Use this for: payments, account balances, audit trails, medical records
-- Rule: if the user sees a success confirmation, the data must be durable.

-- OPTION 2: Relaxed durability — for analytics and session data
-- COMMIT returns before fsync. Throughput: ~5000-10000 TPS on SSD.
-- Risk: lose up to wal_writer_delay (default 200ms) of committed data on crash.
SET synchronous_commit = off;
-- Use this for: analytics events, session caches, non-critical logs
-- NEVER use for: payments, balances, anything a user has seen confirmed

-- OPTION 3: Per-transaction durability — the production sweet spot
-- Default to 'on', relax for specific non-critical writes
BEGIN;
SET LOCAL synchronous_commit = off;  -- only affects THIS transaction
INSERT INTO io_thecodeforge.analytics_events (event_type, payload)
VALUES ('page_view', '{"url": "/product/42"}');
COMMIT;  -- returns immediately, WAL flushed asynchronously

BEGIN;
-- synchronous_commit is back to 'on' for this transaction
INSERT INTO io_thecodeforge.payments (order_id, amount, status)
VALUES (1024, 599.00, 'COMPLETED');
COMMIT;  -- waits for fsync — data is durable before returning

-- ============================================================
-- Monitor WAL performance in production
-- ============================================================
SELECT
  stats_reset,
  wal_records,
  wal_bytes,
  wal_write,
  wal_sync,
  wal_write_time,
  wal_sync_time
FROM pg_stat_wal;

-- Monitor COMMIT latency (requires pg_stat_statements extension)
SELECT
  query,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_ms,
  round(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
WHERE query ILIKE '%COMMIT%'
ORDER BY total_exec_time DESC
LIMIT 5;
Output
synchronous_commit: on
current_wal_position: 0/16B3F60
current_wal_file: 000000010000000000000001
SET LOCAL synchronous_commit = off
INSERT 0 1
COMMIT (immediate — no fsync wait)
INSERT 0 1
COMMIT (waited 2.3ms for fsync)
Never Disable Durability for Financial Data
Setting synchronous_commit=off is acceptable for analytics dashboards, session caches, page view counters, and non-critical logging. It is never acceptable for payment processing, account balances, audit trails, or any data where a user has seen a success confirmation. If a customer sees 'Payment Confirmed' and the transaction is lost because the server crashed within the 200ms durability window, you have a compliance violation, a trust violation, and a customer support incident — not just a technical problem. The per-transaction approach (SET LOCAL synchronous_commit = off) is the production sweet spot: default to full durability, and relax it explicitly for specific non-critical writes with a code comment explaining why.
Production Insight
The most expensive operation in a database is fsync — and it is the price you pay for durability. Every millisecond of fsync latency directly limits your maximum commits per second.
Disabling synchronous_commit for genuinely non-critical data is a legitimate optimization that most production PostgreSQL deployments should consider. But the decision must be explicit, documented, and limited to specific transaction types — never applied as a global setting.
Rule: if the user sees a success message, the data behind that message must be durable. If it is a background analytics event that the user never sees, you can trade durability for throughput — but mark the decision explicitly in code with a comment explaining the trade-off.
Key Takeaway
Durability is implemented via WAL fsync — the database ensures the transaction log is physically on disk before returning COMMIT to the client.
synchronous_commit=off boosts throughput 5-10x by batching fsyncs, but risks losing up to ~600ms of committed data on crash.
The per-transaction approach (SET LOCAL synchronous_commit = off) is the production pattern: default to full durability, relax it explicitly for specific non-critical writes.
Durability — Survives Any Crash Architecture diagram: Durability — Survives Any Crash Durability — Survives Any Crash durability if crash restored 1 COMMIT issued Client sends commit 2 Write to WAL Append log record 3 fsync to Disk Flush before ACK 4 ACK to Client Data is permanent 5 Crash Recovery Replay WAL on restart THECODEFORGE.IO

How ACID Properties Actually Hammer Your Performance

Every ACID property comes with a tax. Atomicity forces write-ahead logs. Isolation demands locking or serialization. Durability means fsync waits. You don't get free lunch. You get correctness.

In production, that tax shows up as latency spikes and throughput ceilings. PostgreSQL's default isolation level (Read Committed) exists because Serializable would crush your TPS. MySQL's InnoDB uses row-level locking instead of table locks for the same reason — they trade strict isolation for speed.

Real systems don't blindly enable ACID. They pick battles. High-frequency trading? Durable writes before acknowledgment. Analytics dashboards? Relax isolation to Read Uncommitted so queries don't block writes. The trick is knowing which property to bend, not which to break.

When you hit a deadlock under Serializable isolation, it's not a bug. It's the database telling you your concurrency model is wrong. Listen.

AtomicityOverhead.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// io.thecodeforge — cs-fundamentals tutorial

// Simulating the cost of atomicity with a write-ahead log
import time

def write_with_wal(account_id, amount):
    start = time.perf_counter()
    # Write-ahead log entry (must be durable before data write)
    log_entry = f"PREPARE TRANSFER: {account_id} -> {amount}"
    fsync_log(log_entry)  # Forces disk flush — 5-20ms typical
    
    # Actual data write
    update_balance(account_id, amount)
    fsync_data()           # Another flush
    
    elapsed = (time.perf_counter() - start) * 1000
    print(f"Atomic operation cost: {elapsed:.1f}ms")
    return elapsed

write_with_wal("acc_401k", -5000)
Output
Atomic operation cost: 18.3ms
Production Trap:
Don't benchmark ACID on localhost SSD. Real-world latency includes network hops and shared storage. That 18ms local write becomes 200ms on a cloud EBS volume under contention.
Key Takeaway
ACID guarantees cost real wall-clock time. Measure before you architect, or your 'correct' system will be a slow system.

ACID vs BASE — When to Choose Which

ACID and BASE represent opposite ends of the consistency-availability spectrum in distributed systems.

ACID (Atomicity, Consistency, Isolation, Durability) prioritises correctness: every transaction either fully succeeds or fully rolls back, every read sees a consistent snapshot, and committed data survives crashes. This is exactly what you need for financial systems, inventory management, and healthcare records — any domain where a half-written state is worse than an error.

BASE (Basically Available, Soft-state, Eventually Consistent) accepts temporary inconsistency in exchange for higher availability and horizontal scalability. A BASE system might return stale data immediately after a write, but guarantees that all nodes will eventually converge. DynamoDB, Cassandra, and CouchDB are built on BASE principles.

The practical choice: - Use ACID when correctness is non-negotiable: banking, orders, medical records - Use BASE when availability and write throughput matter more than instant consistency: social media feeds, product catalogues, analytics, recommendation engines - Hybrid: many modern systems mix both — use PostgreSQL (ACID) for financial records, Cassandra (BASE) for event logs, Redis (tunable) for session state

The CAP theorem says distributed systems can guarantee at most two of: Consistency, Availability, Partition Tolerance. ACID databases typically choose CP (consistency + partition tolerance). BASE databases choose AP (availability + partition tolerance).

acid-vs-base.txtTEXT
1
2
3
4
5
6
7
8
9
10
Property      | ACID                          | BASE
-----------   | ----------------------------- | ---------------------------
Consistency   | Immediate, strict             | Eventually consistent
Availability  | May reject during contention  | Always responds
Scalability   | Vertical (scale-up)           | Horizontal (scale-out)
Complexity    | DB handles correctness        | App must handle conflicts
Use cases     | Finance, healthcare, orders   | Social, analytics, IoT
Examples      | PostgreSQL, MySQL, SQLite     | Cassandra, DynamoDB, Mongo*

*MongoDB added multi-doc ACID in v4.0 — but defaults to BASE semantics

Critical Use Cases for ACID in Databases

Not every data store needs ACID. Your analytics cluster reading parquet files doesn't. Your session cache in Redis doesn't. But when money moves or lives depend on data, ACID is non-negotiable.

Banking is the textbook case — transfer $500 from checking to savings. Without atomicity, a partial crash leaves $500 floating in the void. Without durability, the bank acknowledges the transfer but forgets it on power loss. That's not acceptable. Ever.

E-commerce order processing is another. When a customer clicks 'Buy Now', multiple writes happen: decrement inventory, charge card, create shipment record. If inventory decrements but payment fails, you've oversold. If payment succeeds but shipment record is lost, customer gets nothing.

Health records demand both consistency and durability. A lab result update must satisfy all constraints (valid patient ID, proper value ranges) and survive hardware failure. Partial updates could mean wrong diagnosis. The database must stay correct through power outages, disk failures, and network partitions.

For everything else, there's BASE (Basically Available, Soft state, Eventual consistency). Know which camp your data lives in.

OrderProcessing.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// io.thecodeforge — cs-fundamentals tutorial

// Simplified e-commerce transaction with rollback
import sqlite3

conn = sqlite3.connect("shop.db")
cursor = conn.cursor()

try:
    cursor.execute("BEGIN TRANSACTION")
    
    # Decrement inventory
    cursor.execute("UPDATE inventory SET stock = stock - 1 WHERE sku = 'LAPTOP_2024'")
    if cursor.rowcount == 0:
        raise Exception("Product out of stock")
    
    # Charge customer — mock payment gateway
    cursor.execute("INSERT INTO orders (customer_id, sku, status) VALUES (42, 'LAPTOP_2024', 'pending')")
    
    conn.commit()  # All changes stick, or none do
    print("Order placed successfully")

except Exception as err:
    conn.rollback()  # Inventory goes back, no partial order
    print(f"Transaction failed: {err}")
Output
Order placed successfully
Senior Shortcut:
Always use explicit BEGIN/COMMIT/ROLLBACK in application code. Database drivers sometimes auto-commit by default — that will let partial failures slip through and corrupt your data silently.
Key Takeaway
Use ACID transactions when data corruption costs more than slower performance. For everything else, consider eventual consistency.
● Production incidentPOST-MORTEMseverity: high

Double-Debit: $47,000 Lost to Missing Atomicity in a Payment Batch

Symptom
230 customer accounts showed unexplained negative balances Monday morning. The external payment processor confirmed no funds received for those accounts. Customer support was flooded with 1,200 tickets by 9 AM. The internal ledger showed debits with no matching payment records, and the discrepancy grew by the hour as more customers checked their accounts.
Assumption
The team assumed the batch job was atomic because it ran as a single Java process inside a single JVM. They treated process-level execution as equivalent to database-level atomicity. They did not wrap each payment transfer in a BEGIN/COMMIT transaction block because they believed the ORM's implicit auto-commit was sufficient.
Root cause
The batch job executed 10,000 UPDATE statements to debit customer accounts, then 10,000 INSERT statements to create payment records in the payment log table. Each SQL statement ran in auto-commit mode — every individual UPDATE and INSERT was committed immediately upon execution. When the JVM ran out of memory at statement 4,700 of the INSERT phase, all 10,000 debits had already been committed individually, but only 2,350 of the 10,000 payment records had been created. There was no transaction boundary encompassing both the debit and the payment log insert for any individual transfer. The ORM's auto-commit mode meant each statement was its own transaction — atomicity existed at the statement level but not at the business-logic level.
Fix
1. Wrapped each payment transfer in an explicit BEGIN/COMMIT block: debit the customer account, insert the payment record, and update the internal ledger — all within a single database transaction. If any step fails, the entire transfer rolls back and the customer's balance is unchanged. 2. Added SAVEPOINT every 100 records so that a failure at record 4,700 rolls back only to the SAVEPOINT at record 4,600, and the batch can resume from that checkpoint rather than reprocessing all 10,000 records from scratch. 3. Added a reconciliation query that runs hourly in production: count of debits versus count of payment records per batch. Any mismatch greater than zero triggers a critical alert immediately. 4. Added a circuit breaker that halts the batch if more than 5 consecutive individual transfers fail, preventing cascading damage. 5. Configured JVM memory monitoring with alerts at 80% heap usage, so OOM conditions are detected before they kill the process mid-batch.
Key lesson
  • Never assume a batch process is atomic because it runs in a single process. Atomicity is a database-level property enforced by explicit transaction boundaries, not a JVM-level property enforced by a single process.
  • Use SAVEPOINT inside long-running batches to enable partial rollback and resume — reprocessing 10,000 records because record 4,701 failed is a waste of compute and customer patience.
  • Monitor for debit-to-payment count mismatches in production. A 1% drift between related tables is not a minor accounting issue — it is a critical data integrity alert that should page someone immediately.
  • OOM kills, SIGKILL signals, and hardware failures do not trigger transaction rollback if no transaction boundary exists. Auto-commit mode means every individual statement is its own transaction, and a failure between statements leaves the database in a state that no single rollback can fix.
Production debug guideDiagnosing transactional failures in production PostgreSQL and MySQL systems. These symptoms appear silently and compound over time.5 entries
Symptom · 01
Duplicate charges appear after a timeout-retry cycle — customer is billed twice for the same order
Fix
Check whether the retry logic reuses the same transaction or creates a new one. The original transaction may have committed successfully but the response timed out before reaching the application. Implement idempotency keys on the payment table with a unique constraint. Verify the original transaction status before retrying: SELECT * FROM pg_stat_activity WHERE state = 'idle in transaction' to check for abandoned connections. Use INSERT ... ON CONFLICT (idempotency_key) DO NOTHING to make retries safe.
Symptom · 02
Two concurrent updates overwrite each other silently — the lost update anomaly
Fix
Check the current isolation level: SHOW transaction_isolation. If READ COMMITTED (the PostgreSQL default), concurrent transactions can both read the same row value, compute a new value independently, and the second commit silently overwrites the first. Fix with either pessimistic locking (SELECT ... FOR UPDATE to lock the row before reading) or optimistic locking (add a version column and include AND version = :expected in the UPDATE WHERE clause). Choose pessimistic for high-conflict workloads, optimistic for low-conflict.
Symptom · 03
Committed data disappears after a server crash — users report seeing a success confirmation but the data is gone
Fix
Check the synchronous_commit setting: SHOW synchronous_commit. If set to off, committed data exists only in the OS page cache until the wal_writer flushes it to disk. A crash within that window loses the data. Set synchronous_commit = on for any table that stores data a user has seen confirmed. Monitor WAL flush lag with SELECT pg_current_wal_lsn() and compare against the last flushed position.
Symptom · 04
Deadlocks occurring every few minutes under normal load — application logs show 'deadlock detected' errors
Fix
Enable deadlock logging: SET deadlock_timeout = '1s' (the default is already 1s in PostgreSQL). Query pg_stat_activity for blocked and blocking PIDs. The most common cause is inconsistent lock ordering: Transaction A locks row 1 then row 2, while Transaction B locks row 2 then row 1. Fix by establishing a canonical lock ordering in application code — always acquire locks in the same deterministic order (e.g., by primary key ascending or table name alphabetical).
Symptom · 05
Queries return stale data despite recent writes — a read immediately after a write returns the old value
Fix
Determine whether the read is hitting a replica or the primary. Check replication lag: SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag. If lag is the cause, either route critical reads to the primary or wait for the replica to catch up before serving the response. If using MVCC on the primary, check whether VACUUM is running — excessive dead tuples from un-vacuumed tables can cause reads to see outdated row versions. Run SELECT relname, n_dead_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10 to identify bloated tables.
★ ACID Debugging Cheat SheetQuick diagnostic commands for transactional issues in production PostgreSQL systems.
Transaction stuck in 'idle in transaction' state — holding locks, blocking other queries
Immediate action
Identify and terminate long-running stalled transactions that are holding locks without doing work.
Commands
SELECT pid, now() - xact_start AS duration, state, query FROM pg_stat_activity WHERE state = 'idle in transaction' ORDER BY duration DESC;
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction' AND now() - xact_start > interval '5 minutes';
Fix now
Set idle_in_transaction_session_timeout = '30s' in postgresql.conf to automatically terminate stalled transactions. Monitor this metric continuously — a single idle-in-transaction session can hold locks that block the entire application tier.
Deadlock detected in application logs — transactions are mutually blocking each other+
Immediate action
Identify which queries are involved in the deadlock cycle and what locks they hold.
Commands
SELECT blocked.pid AS blocked_pid, blocked.query AS blocked_query, blocking.pid AS blocking_pid, blocking.query AS blocking_query FROM pg_catalog.pg_locks bl JOIN pg_catalog.pg_stat_activity blocked ON blocked.pid = bl.pid JOIN pg_catalog.pg_locks bbl ON bbl.locktype = bl.locktype AND bbl.relation = bl.relation AND bbl.pid != bl.pid JOIN pg_catalog.pg_stat_activity blocking ON blocking.pid = bbl.pid WHERE NOT bl.granted;
SELECT relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables WHERE n_dead_tup > 10000 ORDER BY n_dead_tup DESC;
Fix now
Enforce consistent lock ordering in application code — always acquire locks in a deterministic order such as primary key ascending or table name alphabetical. If the deadlock involves a hot table with millions of dead tuples, run VACUUM ANALYZE on that table immediately — dead tuple bloat increases lock contention.
WAL disk usage growing uncontrollably — disk space alerts firing on the WAL partition+
Immediate action
Check whether inactive replication slots are retaining WAL that will never be consumed.
Commands
SELECT slot_name, active, pg_size_pretty( pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) ) AS retained_wal FROM pg_replication_slots ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
SELECT pg_size_pretty(sum(size)) AS total_wal_size FROM pg_ls_waldir();
Fix now
Drop any replication slot that shows active = false and is retaining gigabytes of WAL: SELECT pg_drop_replication_slot('slot_name'). Set max_slot_wal_keep_size in postgresql.conf to cap how much WAL any single slot can retain. An inactive replication slot is a silent time bomb — it prevents WAL recycling indefinitely.
Isolation Levels — Complete Comparison
Isolation LevelDirty ReadNon-Repeatable ReadPhantom ReadImplementation MechanismThroughput Impact
READ UNCOMMITTEDPossible in theory — PostgreSQL promotes to READ COMMITTEDPossiblePossibleMVCC snapshot per statement (identical to READ COMMITTED in PostgreSQL)Baseline — fastest possible
READ COMMITTED (PostgreSQL default)Prevented — each statement sees only committed dataPossible — two reads of the same row in one transaction may return different valuesPossible — new rows inserted by other transactions become visible between statementsMVCC snapshot taken at the start of each statement, not the transactionMinimal overhead — the sweet spot for 90% of workloads
REPEATABLE READPreventedPrevented — the entire transaction sees one consistent snapshotPrevented in PostgreSQL (MVCC snapshot is transaction-scoped). MySQL/InnoDB uses gap locks.MVCC snapshot taken at the start of the transaction and held for its entire duration~5-10% slower under contention due to increased serialization failures that require retry
SERIALIZABLEPreventedPreventedPreventedSerializable Snapshot Isolation (SSI) with predicate locking — detects read-write dependencies~20-40% slower under contention — conflicting transactions are aborted and must be retried by the application

Key takeaways

1
Atomicity
all-or-nothing transactions implemented via WAL and undo logs. Without explicit BEGIN/COMMIT boundaries, auto-commit mode makes each statement its own transaction and a crash between statements leaves unrecoverable partial state.
2
Consistency
the database moves from one valid state to another by enforcing all constraints. Application-level validation should catch business rule violations before they reach the database; constraints are the safety net, not the primary check.
3
Isolation
a speed-versus-safety dial from READ COMMITTED (fast, allows non-repeatable reads) to SERIALIZABLE (safe, 20-40% slower). Default to READ COMMITTED and increase isolation only for specific transactions with demonstrated anomalies.
4
Durability
committed data survives crashes because WAL is flushed to disk via fsync before COMMIT returns. synchronous_commit=off trades durability for throughput — acceptable for analytics, never for financial data.
5
Every ACID property has a configurable dial. Turning durability down (synchronous_commit=off) or isolation down (READ COMMITTED) is a legitimate production optimization
but only when the business requirement explicitly permits it and the decision is documented in code.
6
CAP theorem governs distributed systems
during a network partition, choose between consistency and availability. ACID focuses on local integrity within a single database. BASE trades consistency for availability in distributed systems. The choice depends on the business requirement, not on technology preference.
7
BASE (Basically Available, Soft-state, Eventually Consistent) is the alternative to ACID
chosen by distributed NoSQL systems (Cassandra, DynamoDB) when availability and write throughput matter more than immediate consistency.

Common mistakes to avoid

5 patterns
×

Setting synchronous_commit=off globally or for tables that store financial data

Symptom
Users see Payment Confirmed but the transaction is lost if the server crashes within the wal_writer_delay window. The customer is charged externally by the payment gateway, but the database has no record of the charge. Reconciliation fails. Refund is manual.
Fix
Never disable synchronous_commit for any table that stores financial transactions, account balances, or audit records. Use synchronous_commit=off only for analytics events, session data, or non-critical logs — and only via SET LOCAL within specific transactions, not as a global configuration. Add a code comment at every SET LOCAL synchronous_commit = off line explaining why durability is not required for that specific write.
×

Applying SERIALIZABLE isolation globally as a preventive safety measure

Symptom
Application throughput drops 30-40% under concurrent load with no corresponding improvement in data quality. Deadlocks and serialization failures increase because SERIALIZABLE uses predicate locking that blocks range scans. Application retry logic creates additional load.
Fix
Use READ COMMITTED as the global default. Apply SERIALIZABLE only to specific transactions that have demonstrated a phantom read or write skew bug in production. In Spring Boot, use @Transactional(isolation = Isolation.SERIALIZABLE) on the specific method, not on the class or application-wide configuration. Always implement retry logic for SERIALIZABLE transactions because serialization failures are expected, not exceptional.
×

Relying on database constraints as the primary business validation mechanism

Symptom
Production logs show thousands of constraint violation errors with minimal debugging context. Each violation triggers an implicit ROLLBACK, wastes WAL I/O, and returns a generic database error code that the application must map to a user-facing message.
Fix
Validate business rules in application code before opening the transaction. Return clear, specific error messages to users when validation fails. Use constraints as a safety net that catches bugs in your application validation — not as the primary validation layer. Log every constraint violation in production with full request context so you can diagnose why the application validation missed it.
×

Implementing retry logic without idempotency keys for payment transactions

Symptom
Timeout-retry cycles produce duplicate charges. The first attempt succeeded and committed, but the response timed out before reaching the application. The application retries, creates a second charge, and the customer is billed twice for the same order.
Fix
Generate a unique idempotency key per logical payment attempt — typically a UUID generated by the client before the first attempt. Store it in a column with a unique constraint. Use INSERT ... ON CONFLICT (idempotency_key) DO NOTHING to make retries safe. If the insert returns zero rows affected, query the existing record and return it to the caller. The customer is charged exactly once regardless of how many retries the application sends.
×

Not monitoring long-running idle in transaction sessions

Symptom
Lock accumulation over hours causes gradually degrading query latency. Queries that normally take 10ms start taking 30 seconds as they wait for locks held by abandoned transactions. The degradation is gradual, making it difficult to identify the root cause without monitoring.
Fix
Set idle_in_transaction_session_timeout = '30s' in postgresql.conf to automatically terminate sessions that sit in an open transaction without executing queries for more than 30 seconds. Monitor with: SELECT pid, now() - xact_start AS duration FROM pg_stat_activity WHERE state = 'idle in transaction'. Alert on any session exceeding 10 seconds. The most common cause is application code that opens a transaction, makes an HTTP call to an external service, and does not commit or rollback if the HTTP call times out.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
Explain the Lost Update anomaly. How would you solve it in a Spring Boot...
Q02SENIOR
How does the database ensure Consistency when a transaction violates a F...
Q03SENIOR
Describe a scenario where READ COMMITTED isolation would cause an applic...
Q04SENIOR
What is the performance impact of setting synchronous_commit=off in Post...
Q05SENIOR
Why do many NoSQL databases adopt BASE instead of ACID? When would you c...
Q06JUNIOR
What does ACID stand for, and why does each property matter?
Q07JUNIOR
Give a real-world example where violating Atomicity would cause serious ...
Q08JUNIOR
What is the default isolation level in PostgreSQL, and why was it chosen...
Q01 of 08SENIOR

Explain the Lost Update anomaly. How would you solve it in a Spring Boot application using @Version (Optimistic Locking) versus SELECT FOR UPDATE (Pessimistic Locking)?

ANSWER
A Lost Update occurs when two transactions read the same row, both compute a new value based on the old value, and the second commit silently overwrites the first commit's changes. Example: Admin A reads stock=10 and decrements to 9. Admin B also reads stock=10 and decrements to 9. Final value is 9 instead of 8 — one decrement is silently lost. Optimistic Locking with @Version: Add a version column to the JPA entity annotated with @Version. Hibernate automatically includes version in the WHERE clause of every UPDATE: UPDATE inventory SET stock=9, version=2 WHERE id=1 AND version=1. If another transaction modified the row first, the version no longer matches, the UPDATE affects zero rows, and Hibernate throws OptimisticLockException. The application catches the exception and retries. This approach holds no database locks during the read phase — best for low-conflict scenarios where retries are rare. Pessimistic Locking with SELECT FOR UPDATE: The first transaction acquires an exclusive row lock at read time. The second transaction blocks on its SELECT until the first commits or rolls back. In Spring Data JPA: @Lock(LockModeType.PESSIMISTIC_WRITE) on the repository method, or @Transactional with a native query using FOR UPDATE. This guarantees no lost update but reduces concurrency because blocked transactions wait. Best for high-conflict scenarios where retries would be frequent and expensive. Choose optimistic when write conflicts are rare (less than 5% of transactions). Choose pessimistic when write conflicts are frequent (inventory decrements during a flash sale).
FAQ · 11 QUESTIONS

Frequently Asked Questions

01
How does PostgreSQL implement MVCC (Multi-Version Concurrency Control)?
02
What is the difference between ACID and the CAP theorem?
03
What is a Write-Ahead Log (WAL) and why is it faster than writing directly to data files?
04
Can I use synchronous_commit=off on a read replica?
05
What is the difference between a dirty read, non-repeatable read, and phantom read?
06
What is the difference between ACID and BASE?
07
Who is responsible for enforcing each ACID property in a DBMS?
08
Does enabling ACID hurt performance, and by how much?
09
How does ACID work in a distributed database across multiple nodes?
10
What happens to ACID guarantees on a read replica?
11
Can NoSQL databases be ACID compliant?
N
Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Drawn from code that ran under real load.

Follow
Verified
production tested
May 23, 2026
last updated
1,554
articles · all by Naren
🔥

That's DBMS. Mark it forged?

12 min read · try the examples if you haven't

Previous
Introduction to DBMS
2 / 11 · DBMS
Next
Database Normalization in DBMS