SQL Transactions — Missing ROLLBACK Causes Empty Orders
12-15 empty orders daily from swallowed exceptions — fix missing ROLLBACK, debug deadlocks, pool exhaustion, table bloat from real production incidents..
20+ years shipping high-throughput database systems. Everything here is grounded in real deployments.
- A transaction wraps multiple SQL statements into an all-or-nothing unit — BEGIN → statements → COMMIT or ROLLBACK
- Atomicity: enforced by the Write-Ahead Log (WAL) — crash recovery replays or undoes incomplete transactions
- Durability: COMMIT waits for fsync to disk — disabling fsync breaks durability entirely
- Isolation levels: READ COMMITTED (PG default) → REPEATABLE READ (MySQL default) → SERIALIZABLE — lower = more concurrency, more anomaly risk
- SAVEPOINT: partial rollback within a transaction — keeps prior work alive after a recoverable error
- Biggest mistake: wrapping entire request handlers in one transaction — holds locks for hundreds of ms, exhausts connection pools under load
Imagine you're at an ATM withdrawing $100. The machine needs to do two things at once: subtract $100 from your account AND spit out the cash. If the power cuts out between those two steps, you can't end up with no cash AND no money in your account — that would be a disaster. A database transaction is the ATM's promise: either BOTH things happen, or NEITHER does. ACID is just the rulebook that makes that promise bulletproof.
A SELECT that returns the wrong customer's invoice. A funds transfer that vanishes into thin air. That's what you get without understanding ACID. SQL transactions are the engine's contract with you—atomic commits, durable logs, and isolation guarantees that prevent phantom orders and corrupted state. Ignore them, and your production database becomes a minefield of lost data, deadlocked sessions, and unkillable long transactions that bring everything to a crawl.
Why Missing ROLLBACK Creates Phantom Orders
SQL transactions group multiple operations into a single atomic unit. The core mechanic: either all operations commit permanently, or none do — the database rolls back to the prior consistent state. This is the 'A' in ACID (Atomicity). Without explicit ROLLBACK on failure, partial writes persist, corrupting data integrity.
Transactions enforce isolation (the 'I') so concurrent operations don't see intermediate states. In practice, this means two simultaneous order inserts won't interleave: one transaction's uncommitted rows are invisible to others. Durability ('D') ensures committed data survives crashes — typically via write-ahead logging. Consistency ('C') guarantees that constraints (e.g., foreign keys, unique indexes) hold before and after.
Use transactions for any multi-step write: transferring funds, placing orders, updating inventory. Real systems fail when a script inserts order headers but skips ROLLBACK on line-item failure — leaving orphaned headers that break reporting and billing. Always wrap multi-statement writes in BEGIN/COMMIT/ROLLBACK blocks, and test the failure path.
Atomicity and Durability — The Write-Ahead Log Is the Real Hero
Atomicity means all-or-nothing. Either every statement in a transaction lands in the database, or none of them do. But how does a database engine actually enforce this when the server can die at any microsecond?
The answer is the Write-Ahead Log (WAL). Before any data page is modified in memory, the engine writes an intent record to the WAL — an append-only file on disk. If the server crashes mid-transaction, the WAL lets the engine replay or undo operations on restart. This is called crash recovery, and it's what makes Atomicity a real guarantee rather than a hopeful suggestion.
Durability is the flip side: once you get a COMMIT acknowledgement, the data is safe even if the server immediately crashes. That acknowledgement is only sent after the WAL record is flushed to disk (fsync). This is why COMMIT can feel slightly slow — it's waiting on a real disk write, not just a memory operation. Disabling fsync breaks Durability entirely and is a production disaster waiting to happen.
ACID Pipeline — How a Transaction Flows Through the Engine
Understanding ACID in production means knowing the exact sequence of steps the database executes from BEGIN to COMMIT. The pipeline consists of five phases:
- Transaction Start: The database assigns a transaction ID (XID) and records the start in shared memory structures. For PostgreSQL, the snapshot data for MVCC is initialized.
- Statement Execution: Each SQL statement is parsed, planned, and executed. Data pages are read from the buffer pool (or disk if not cached). Modifications are applied to pages in memory, and WAL records are written to the WAL buffer in shared memory.
- Pre-commit: On COMMIT, the database writes a commit record to the WAL buffer. This record contains the transaction ID and a list of all data pages modified.
- WAL Flush (fsync): The WAL buffer is flushed to disk. This is the blocking step — the database waits for the OS to confirm the write reached stable storage. Only after this does the transaction become durable.
- Post-commit: The transaction's XID is marked as committed in the commit log (pg_clog or system table). Locks held by the transaction are released. The COMMIT acknowledgement is sent to the client. The actual data pages may still be in memory — they will be written to disk later by the background writer.
The pipeline is the same across PostgreSQL, MySQL InnoDB, and Oracle, with minor differences in WAL format and commit log storage.
Isolation Levels — Where ACID Gets Complicated and Performance Gets Real
Isolation is the trickiest ACID property because it's not binary — it's a spectrum. 'Full' isolation (Serializable) means transactions behave as if they ran one after another, but it's expensive. Most databases default to something weaker, and that's where bugs hide.
Read Uncommitted — a transaction can read rows another transaction modified but hasn't committed yet (dirty reads). Almost never appropriate.
Read Committed — you only see committed data, but the same SELECT inside one transaction can return different rows if another transaction commits between them (non-repeatable reads). PostgreSQL and SQL Server's default.
Repeatable Read — the same SELECT always returns the same rows within a transaction, but new rows inserted by other transactions may appear (phantom reads). MySQL InnoDB's default.
Serializable — full isolation. No dirty reads, non-repeatable reads, or phantom reads. PostgreSQL uses Serializable Snapshot Isolation (SSI); MySQL InnoDB uses gap locks.
The key insight: lower isolation = more concurrency = more anomaly risk. Choose deliberately.
Isolation Levels Comparison Matrix (ANSI Standards)
The SQL standard defines four isolation levels based on which concurrency phenomena they prevent. The matrix below maps each level to the three classic anomalies plus a fourth that some standards include:
| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Serialization Anomaly |
|---|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible | Possible |
| READ COMMITTED | Prevented | Possible | Possible | Possible |
| REPEATABLE READ | Prevented | Prevented | Possible | Possible |
| SERIALIZABLE | Prevented | Prevented | Prevented | Prevented |
Dirty Read: Reading a row that has been modified by another transaction that has not yet committed. If that transaction later rolls back, the reader has seen uncommitted (and now invalid) data.
Non-Repeatable Read: The same query returns different rows within the same transaction because another transaction committed an update or delete in between.
Phantom Read: A transaction re-executes a query with a WHERE condition and finds new rows that were inserted by another committed transaction. The set of rows changes (like a phantom appearing).
Serialization Anomaly: The result of concurrent transactions is not equivalent to any serial (one-after-another) execution. Serializable prevents all anomalies.
In practice, most databases default to READ COMMITTED (PostgreSQL, SQL Server) or REPEATABLE READ (MySQL InnoDB). Know your database's default and choose intentionally.
Concurrency Phenomena Quick-Reference Table
Beyond dirty reads, non-repeatable reads, and phantom reads, there are other phenomena that affect correctness under concurrent transactions. This table covers the full set a working developer needs to recognize:
| Phenomenon | Description | Prevented By | Common Example |
|---|---|---|---|
| Dirty Read | Read uncommitted changes from another transaction | READ COMMITTED+ | A report includes a row that later gets rolled back |
| Non-Repeatable Read | Same row read twice gives different values within one transaction | REPEATABLE READ+ | Fetching a product price, then computing discount — price changed in between |
| Phantom Read | A query returns different sets of rows in the same transaction | SERIALIZABLE, or REPEATABLE READ with gap locks | Inventory query: 'SELECT * WHERE type=book' returns 10 books, then later 11 |
| Lost Update | Two transactions read and then overwrite the same row, losing one update | Row locking (SELECT FOR UPDATE) or SERIALIZABLE | Two users decrement stock: both read stock=5, both write stock=4, losing one sale |
| Write Skew | Two transactions read overlapping data sets and make conflicting writes without directly updating the same row | SERIALIZABLE | Two doctors on-call: both read no other doctor on call, both go off call, leaving no coverage |
| Read Skew | An inconsistent snapshot where a transaction sees a mix of pre- and post-commit data from another transaction | REPEATABLE READ+ in MVCC, or snapshot isolation | Read a user's address and then their order — address changed by another transaction in between |
Understanding these phenomena is critical when debugging production issues like phantom stock counts or double-booking. Each isolation level and concurrency control mechanism targets a subset of these.
Consistency, Savepoints and Long Transaction Dangers in Production
Consistency is the ACID property that's most misunderstood. It doesn't mean 'the data is correct' in a business logic sense — that's your application's job. What it means is that every transaction takes the database from one valid state to another, where 'valid' is defined by schema constraints: foreign keys, CHECK constraints, UNIQUE indexes, and NOT NULL columns.
SAVEPOINTs extend this by letting you create partial rollback points inside a long transaction. Instead of aborting everything on a recoverable error, you can roll back to a named savepoint and retry just that sub-operation. This is invaluable for bulk import scripts and multi-step wizards.
But long-running transactions are expensive regardless of what they're doing. In PostgreSQL, an open transaction holds back autovacuum from reclaiming dead rows, causing table bloat. In MySQL InnoDB, long transactions hold undo log space. The rule: keep transactions as short as possible. Do computation before opening BEGIN, not inside it.
now() - pg_stat_activity.query_start AS duration, query, state FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC to spot long-running transactions. Any transaction open for more than a few seconds on a write-heavy table blocks autovacuum and accumulates dead tuple bloat. Set idle_in_transaction_session_timeout = '30s' in postgresql.conf as a safety net.SAVEPOINTs for Complex Business Logic — Nested Error Recovery Patterns
When business logic requires multiple steps that each have their own recovery path, SAVEPOINTs enable a nested error handling pattern that preserves the entire transaction's progress. This is especially useful in scenarios like:
- Batch processing with per-record error handling: Process all records in one transaction; if one record fails (uniqueness, FK violation), log the error and continue with the next record. Without SAVEPOINTs, the entire batch would roll back.
- Multi-step financial workflows: A payment capture, then a loyalty points update, then an inventory deduction. If the points update fails (e.g., connection timeout to a remote service), you want to roll back only the points step and retry, not the entire payment.
- Compound validation: Validate a set of business rules one by one, rolling back each invalid step without losing earlier valid work. At the end, commit only if all sub-validations passed.
A typical pattern in Java/JDBC or Python/psycopg2:
``java try { connection.setAutoCommit(false); // Step 1: create order Savepoint sp1 = connection.setSavepoint("order_created"); insertOrder(); // Step 2: add line items Savepoint sp2 = connection.setSavepoint("items_added"); insertLineItems(); // If line items fail, rollback to sp2 and log } catch (SQLException e) { connection.rollback(sp2); // rollback only items logError(e); // retry step 2 or continue } finally { connection.releaseSavepoint(sp2); } ``
Note: SAVEPOINTs have a memory cost — each savepoint retains a snapshot of the transaction state. For large batch sizes, release savepoints as soon as their work is finalized.
Atomicity Failure: The Partial Write That Burns You at 2 AM
Every dev thinks they understand atomicity until a power loss hits mid-insert and they're staring at orphaned invoice headers with no line items. Atomicity isn't a feature — it's a contract. The database promises that your transaction either fully commits or fully rolls back. No half-state. Ever. The mechanism? Write-ahead logs and undo segments. Before the engine touches a data page, it logs the 'before' image. If the transaction aborts — whether from a CHECK constraint violation or a dropped network packet — the engine replays those images to restore the original state. Here's what most junior devs miss: atomicity costs you. Every write generates log traffic. Every rollback requires reading those logs back. Watch your log buffer sizes and disk I/O latency. If you're pushing 10k row updates in a single transaction, you're betting the hardware won't blink. And in production, hardware blinks.
Durability: The Lie of fsync and What Actually Survives a Power Pull
Durability sounds simple: once COMMIT returns, data stays. That's marketing. In reality, durability depends on the write-ahead log flushing to persistent storage before the commit acknowledgment reaches your client. PostgreSQL does this by default — fsync on every commit. MySQL with InnoDB flushes the redo log. But here's the catch: hardware lies. Disk controllers cache writes in volatile RAM. RAID cards lie about flushing. Cloud SSDs throttle under load and report success before bits hit silicon. Saw a production incident where a RAID controller battery failed silently. Commits returned OK. Server crashed. Lost four minutes of transaction data. The fix? Disable disk write cache, enable write-back barriers, and for critical systems, use synchronous replication to at least two nodes. Test your durability by pulling the plug on a staging server. Read the survivor's guide: synchronous_commit = on in Postgres, innodb_flush_log_at_trx_commit = 1 in MySQL.
Silent Partial Orders — A Missing ROLLBACK in the Exception Handler
log.warn() → COMMIT executed. The COMMIT committed the order header without the items. The exception handler never called rollback().rollback() in the catch block before re-throwing. Added a post-commit assertion: if order_items count for the new order_id is zero, raise an alert. Added an integration test that injects an order_items failure and asserts the order header is also rolled back.- A caught exception does not prevent COMMIT — you must explicitly call
rollback()before re-throwing or returning - Never swallow database exceptions with only a
log.warn()inside a transaction — always either rollback or abort - Add post-commit assertions for critical multi-table writes — catch partial states that the exception handler missed
now() - query_start AS duration, state, query FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC. Any transaction open for more than a few seconds on a write-heavy table is a problem. Add idle_in_transaction_session_timeout = '30s' to postgresql.conf.now() - xact_start AS open_for FROM pg_stat_activity WHERE xact_start IS NOT NULL ORDER BY xact_start. Terminate the oldest idle-in-transaction sessions with pg_terminate_backend(pid).txid_current_if_assigned() to check.Key takeaways
Common mistakes to avoid
3 patternsWrapping entire request handlers in a single long transaction
Swallowing exceptions inside a transaction without rolling back
rollback() explicitly in the exception handler before re-throwing. Never let a caught exception allow execution to reach COMMIT. Add a post-commit assertion to catch partial states that slip through.Assuming AUTOCOMMIT=OFF when it is actually ON
Interview Questions on This Topic
Explain what happens mechanically inside PostgreSQL between COMMIT and the application receiving a success response.
Frequently Asked Questions
20+ years shipping high-throughput database systems. Everything here is grounded in real deployments.
That's SQL Advanced. Mark it forged?
9 min read · try the examples if you haven't