REPEATABLE READ Phantom Inventory — DBMS Transactions
REPEATABLE READ does not prevent phantom reads — new inserts appear in range scans, causing $50k stock discrepancy.
20+ years shipping production systems from the metal up. Everything here is grounded in real deployments.
- ACID defines the contract: Atomicity, Consistency, Isolation, Durability — violations cause silent data corruption
- Isolation levels trade safety for throughput: READ COMMITTED is the production default for most web apps
- MVCC keeps reads from blocking writes, but phantom reads still sneak through in REPEATABLE READ
- Deadlocks are an application design failure, not a database bug — always lock resources in a consistent order
- The biggest mistake: assuming your app's isolation level matches the database default (they often don't)
Think of a transaction like a bank transfer. You want to send $100 from your account to a friend's. The database must either do both steps (deduct and add) or none. If the server dies after deducting but before adding, your money shouldn't vanish. That's ACID in action.
Why REPEATABLE READ Still Lets Phantom Inventory Slip Through
REPEATABLE READ is a transaction isolation level that guarantees that any row read during a transaction will appear unchanged on subsequent reads within the same transaction. The core mechanic is that it locks every row the transaction touches (via shared or exclusive locks) and holds those locks until commit. This prevents dirty reads and non-repeatable reads — but it does not prevent phantom reads. A phantom read occurs when a new row inserted by another transaction matches the WHERE clause of a query you already executed. Because REPEATABLE READ locks only existing rows, not the gap where a new row could appear, phantoms are possible. In practice, this means that if you run SELECT ... WHERE quantity > 0 twice in the same transaction, the second query may return rows that weren't there the first time. This is a direct consequence of the isolation level's lock granularity: row locks, not predicate locks. Most databases implement REPEATABLE READ with snapshot isolation (e.g., PostgreSQL, Oracle) or with lock-based semantics (e.g., MySQL InnoDB). Under snapshot isolation, you get a consistent view of the database as of the transaction start — which actually prevents phantoms for reads, but not for write skew or write conflicts. Under lock-based REPEATABLE READ, phantoms are real and can corrupt inventory counts, reservation systems, or any logic that assumes a stable result set. Use REPEATABLE READ when you need to read the same rows multiple times and ensure they haven't changed, but you can tolerate new rows appearing. For inventory systems where phantom inserts would break business logic (e.g., overselling stock), you must escalate to SERIALIZABLE or use explicit gap locks (e.g., SELECT ... FOR UPDATE with a range condition).
ACID in Practice: Beyond the Acronym
While every textbook defines ACID, seeing it handle a failure in real-time clarifies why we pay the performance tax for a relational engine. Atomicity is the 'all-or-nothing' guarantee, but Durability is the 'promise'—ensuring that once a COMMIT is acknowledged, even a total power failure won't revert the change.
Isolation Levels and Read Phenomena
Isolation levels are a slider between 'Perfect Safety' and 'Maximum Speed.' As you lower the level, you introduce specific anomalies: 1. Dirty Reads: Reading data that hasn't been committed yet. 2. Non-Repeatable Reads: A row changes its value while you're still looking at it. 3. Phantom Reads: New rows appear in a range query because another transaction inserted them.
Deadlocks: Detection and Prevention
A deadlock occurs when Transaction A holds Lock 1 and wants Lock 2, while Transaction B holds Lock 2 and wants Lock 1. They will wait forever. Most modern RDBMS (Postgres, MySQL) run a background 'Deadlock Detector' thread that identifies these cycles and kills the 'cheapest' transaction to break the loop.
- Each transaction (car) holds a resource (position in the roundabout).
- Each wants another resource that is held by another car.
- The database traffic cop (deadlock detector) picks one car to reverse out.
- That transaction is rolled back and must retry.
- You can avoid the gridlock by enforcing a consistent direction of travel (lock order).
MVCC: How Isolation Levels Actually Work
Multi-Version Concurrency Control (MVCC) allows each transaction to see a snapshot of the data at the start of the transaction (or at the statement level, depending on isolation). Instead of locking every row a reader might touch, the database keeps multiple versions of each row. When a transaction writes, it creates a new version; concurrent readers see the old version until the writer commits. This is why PostgreSQL can run thousands of concurrent SELECTs without blocking — readers never block writers, and writers never block readers (unless they conflict).
Transaction Management in Practice: Application-Level Patterns
Real-world apps rarely use raw SQL transactions. Frameworks like Spring Boot, Hibernate, and Entity Framework manage transaction boundaries declaratively. You need to understand: - Transaction propagation: REQUIRED, REQUIRES_NEW, NESTED — when does a new transaction start? - Isolation level override: @Transactional(isolation = Isolation.REPEATABLE_READ) can clash with database defaults. - Transactional outbox pattern for reliable messaging. - Handling rollback-only exceptions: sometimes a caught exception still marks the transaction for rollback. - Setting timeout and read-only hints to optimize connection usage.
@Transactional method and do not rethrow it, the transaction is still marked rollback-only. The commit will fail with Transaction rolled back because it has been marked as rollback-only. Always rethrow the exception or use TransactionAspectSupport.currentTransactionStatus().setRollbackOnly() deliberately.@Transactional on a private method — it's ignored because the proxy can't intercept it.@Transactional method from within the same class bypasses the proxy.@Transactional from another bean to trigger the proxy.@Transactional are syntactic sugar — they still rely on database locks and isolation levels.readOnly=true for SELECT-only operations.The Forgotten Enemy: Lost Updates and Write Skew
Atomicity, isolation, and the rest of ACID still leave a bastard child — write skew. Two transactions read overlapping data, then write conflicting results based on stale snapshots. The bank example: you check your balance ($1000), I check mine ($1000). We both transfer $200 from your account to mine. REPEATABLE READ says each sees its own snapshot. Neither sees the other's write until commit. Result: your balance goes to $800, mine goes to $1200. We just printed $200. That's a phantom surplus, and no isolation level under SI fixes it. Why this happens: MVCC snapshots freeze read sets but not write sets. Both transactions see the original $1000 balance — neither notices the other's deduction. Prevent it with SELECT FOR UPDATE or explicit predicate locks. In PostgreSQL: BEGIN; SELECT balance FROM accounts WHERE account_id = 123 FOR UPDATE; — that locks the row, forcing the second transfer to wait. Without it, you're printing money in production.
When Your Cache Lies: Transaction Boundaries for Read-Heavy Workloads
Every junior dev learns to cache. Few learn that distributed caches break transaction isolation. Picture this: your RDBMS holds account balances. The app caches 'balance = 1000' in Redis. T1 deducts $200 from the DB — no cache update. A second service reads Redis, sees $1000, and initiates another deduction. The database becomes consistent after T1 commits, but your cache serves stale data for minutes. The fix: cache-aside with a TTL no longer than your business tolerance for inconsistency. Better: use database-level caching (PgBouncer with statement caching) or implement a cache invalidation protocol that aligns with transaction commit timing. Do not cache aggregate values that change per transaction unless you are willing to accept eventual consistency. If your business logic depends on strict serializability, disable caching completely. Real production incident: a fintech startup lost $12k because a Redis cache returned a stale pre-tax balance after a bulk transaction update. The DB was consistent the whole time — but nobody asked the DB.
The Phantom Inventory: How REPEATABLE READ Caused a $50k Stock Discrepancy
SELECT ... FOR UPDATE on the product range.- REPEATABLE READ stops row updates from changing your view, but it does not stop new rows from appearing.
- For any report that aggregates across a dynamic range of rows, use SERIALIZABLE or lock the expected key range.
- Always verify which anomalies your isolation level actually prevents — don't assume it's all three.
SHOW ENGINE INNODB STATUS (MySQL) or SELECT * FROM pg_locks WHERE NOT granted (PostgreSQL) to find the blocking transaction.SELECT * FROM pg_stat_activity WHERE state = 'idle in transaction' or SHOW PROCESSLIST (MySQL). Kill idle transactions over a threshold.FOR UPDATE with condition). Verify by testing with concurrent inserts.PostgreSQL: tail -f /var/log/postgresql/postgresql-*.log | grep -i deadlockMySQL: SHOW ENGINE INNODB STATUS\GKey takeaways
Common mistakes to avoid
3 patternsUsing READ UNCOMMITTED without understanding the consequences
Not handling serialization failures when using SERIALIZABLE isolation
Assuming REPEATABLE READ prevents all read phenomena
SELECT ... WHERE ... FOR UPDATE) to the reading transaction, or switch to SERIALIZABLE for range-reporting transactions.Interview Questions on This Topic
Explain the 'Phantom Read' anomaly. In which isolation level is it resolved in standard SQL, and how does MySQL's InnoDB handle it differently?
Frequently Asked Questions
20+ years shipping production systems from the metal up. Everything here is grounded in real deployments.
That's DBMS. Mark it forged?
4 min read · try the examples if you haven't