Transactions in DBMS: Mastering ACID, Isolation Levels, and Concurrency
- ACID is the gold standard: Atomicity, Consistency, Isolation, and Durability ensure data integrity even during hardware failure.
- Isolation levels are a trade-off: READ COMMITTED is usually the sweet spot for web apps, while REPEATABLE READ is safer for complex reporting.
- MVCC (Multi-Version Concurrency Control) allows readers to not block writers in engines like Postgres and InnoDB.
A transaction is a logical unit of database processing that must be completed in its entirety to ensure data integrity. It is governed by ACID properties: Atomicity (all-or-nothing), Consistency (state transition validity), Isolation (concurrency control), and Durability (permanent persistence). Modern engines like InnoDB (MySQL) and Postgres use Isolation Levels to balance data safety against system throughput.
ACID in Practice: Beyond the Acronym
While every textbook defines ACID, seeing it handle a failure in real-time clarifies why we pay the performance tax for a relational engine. Atomicity is the 'all-or-nothing' guarantee, but Durability is the 'promise'—ensuring that once a COMMIT is acknowledged, even a total power failure won't revert the change.
-- io.thecodeforge: Atomic Bank Transfer Example BEGIN; -- Step 1: Deduct from source UPDATE io_thecodeforge.accounts SET balance = balance - 500, updated_at = NOW() WHERE account_id = 'ACC-001' AND balance >= 500; -- Step 2: Add to destination UPDATE io_thecodeforge.accounts SET balance = balance + 500, updated_at = NOW() WHERE account_id = 'ACC-002'; -- If a network timeout or constraint violation occurs here, -- the entire block is invalidated. COMMIT; -- PRO TIP: In Spring Boot, use @Transactional(propagation = Propagation.REQUIRED) -- to wrap these calls in a single proxy-managed transaction.
Transaction committed successfully.
Isolation Levels and Read Phenomena
Isolation levels are a slider between 'Perfect Safety' and 'Maximum Speed.' As you lower the level, you introduce specific anomalies: 1. Dirty Reads: Reading data that hasn't been committed yet. 2. Non-Repeatable Reads: A row changes its value while you're still looking at it. 3. Phantom Reads: New rows appear in a range query because another transaction inserted them.
-- io.thecodeforge: Demonstrating Read Phenomena -- 1. READ COMMITTED (Default for Postgres) -- Prevents Dirty Reads but allows Non-Repeatable Reads. SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED; -- 2. REPEATABLE READ (Default for MySQL InnoDB) -- Uses MVCC (Multi-Version Concurrency Control) to ensure you see a snapshot. SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- 3. SERIALIZABLE (Highest Safety) -- Prevents Phantoms by placing range locks or using predicate locking. SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE; -- Production Pattern: Use 'SELECT ... FOR UPDATE' to skip isolation logic -- and force a row-level write lock immediately. BEGIN; SELECT balance FROM io_thecodeforge.accounts WHERE account_id = 'ACC-001' FOR UPDATE; -- This row is now locked until we COMMIT.
Deadlocks: Detection and Prevention
A deadlock occurs when Transaction A holds Lock 1 and wants Lock 2, while Transaction B holds Lock 2 and wants Lock 1. They will wait forever. Most modern RDBMS (Postgres, MySQL) run a background 'Deadlock Detector' thread that identifies these cycles and kills the 'cheapest' transaction to break the loop.
-- io.thecodeforge: Deadlock Prevention via Deterministic Ordering -- BAD: Locking in random order based on app logic -- TX1: Update A, then B -- TX2: Update B, then A <-- High Deadlock Risk! -- GOOD: Always sort IDs before locking BEGIN; -- By ordering our IDs, all transactions wait in the same 'queue' SELECT * FROM io_thecodeforge.accounts WHERE account_id IN ('ACC-001', 'ACC-002') ORDER BY account_id ASC FOR UPDATE; UPDATE io_thecodeforge.accounts SET balance = balance - 100 WHERE account_id = 'ACC-001'; UPDATE io_thecodeforge.accounts SET balance = balance + 100 WHERE account_id = 'ACC-002'; COMMIT;
🎯 Key Takeaways
- ACID is the gold standard: Atomicity, Consistency, Isolation, and Durability ensure data integrity even during hardware failure.
- Isolation levels are a trade-off: READ COMMITTED is usually the sweet spot for web apps, while REPEATABLE READ is safer for complex reporting.
- MVCC (Multi-Version Concurrency Control) allows readers to not block writers in engines like Postgres and InnoDB.
- Deadlocks are an architectural failure, not a database bug. Prevent them by implementing consistent lock ordering in your code.
- PostgreSQL is 'Pessimistic' by default with row-locks, whereas some distributed DBs use 'Optimistic' concurrency (OCC) which fails at commit time if a conflict occurred.
Interview Questions on This Topic
- QExplain the 'Phantom Read' anomaly. In which isolation level is it resolved in standard SQL, and how does MySQL's InnoDB handle it differently?
- QYou are seeing high 'Lock Wait Timeout' errors in a Spring Boot application. Walk me through your debugging process to identify the offending transaction.
- QWhat is the difference between Optimistic and Pessimistic locking? When would you use a version column (@Version in JPA) over a 'SELECT FOR UPDATE'?
- QLeetCode SQL Context: Write a query to find and resolve duplicate records within a single transaction without locking the entire table.
- QHow does MVCC (Multi-Version Concurrency Control) allow a database to provide Isolation without excessive locking?
Frequently Asked Questions
What exactly is a 'Dirty Read' and why is it dangerous?
A dirty read happens when Transaction A reads data that Transaction B has modified but hasn't committed yet. If Transaction B fails and performs a ROLLBACK, Transaction A is now working with 'ghost' data that technically never existed. This can lead to incorrect financial balances or corrupted state logic. Most production databases (Oracle, Postgres, SQL Server) prevent this by default using READ COMMITTED.
What is the difference between a non-repeatable read and a phantom read?
A non-repeatable read concerns a single row: you read a row, someone else updates/deletes it, and your second read of that same row yields different data. A phantom read concerns a range of rows: you query all 'Gold Members' and find 5; someone else inserts a new 'Gold Member'; your second query finds 6. REPEATABLE READ stops the first; SERIALIZABLE is usually required to stop the second.
How does the 'Durability' in ACID handle a literal power out?
Databases use a Write-Ahead Log (WAL) or Redo Log. Before any data is changed in the actual tables, the intention is written to a sequential log file on disk. If the power cuts, the database restarts, reads the WAL, and 'replays' any committed transactions that hadn't made it to the main data files yet.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.