Snapshot Isolation Write Skew — READ COMMITTED Oversells
Two transactions saw inventory as available under PostgreSQL READ COMMITTED, causing overselling.
- Concurrency control ensures serializable execution of concurrent transactions, preventing data corruption.
- Lock-based protocols (2PL) acquire locks before access; strict 2PL holds all locks until commit.
- MVCC keeps multiple row versions so readers don't block writers — at the cost of bloat and write skew.
- Performance: SERIALIZABLE can kill throughput by 50%+ under contention; READ COMMITTED is 3x faster for read-heavy workloads.
- Production failure: snapshot isolation allows write skew — two doctors can both go off-call leaving no coverage.
- Biggest mistake: setting isolation level globally without understanding per-transaction requirements or DBMS-specific behaviour.
Imagine a busy library with a single copy of a popular book. Two people want to read it at the same time — one wants to read it quietly, the other wants to scribble notes in the margins. If they both grab it simultaneously, the book ends up a mess. The library needs a checkout system: rules that say who gets the book, when, and what they're allowed to do with it. Concurrency control is exactly that checkout system — but for your database rows instead of library books.
Every production database is a warzone of simultaneous requests. At any millisecond, hundreds of transactions read, write, and modify the same rows. Without a traffic cop, money disappears from bank accounts, inventory counts don't add up, and customer orders reference products deleted microseconds ago. Concurrency control is that traffic cop.
The core problem isn't theoretical. Lost updates, dirty reads, and write skew have caused real financial losses and data corruption. The DBMS can't lock everything — that kills throughput. The challenge is balancing correctness and performance, and the answer changes with your workload.
Here's the trap most teams fall into: they assume READ COMMITTED means the same thing in PostgreSQL and MySQL. It doesn't. MVCC vs lock-based semantics differ, and those differences will burn you during a traffic spike. By the end of this article, you'll know exactly when to use SELECT FOR UPDATE, why SERIALIZABLE kills performance, and how to debug a deadlock in five minutes. You'll leave with the tools to stop these bugs cold.
What is Concurrency Control in DBMS?
Concurrency Control in DBMS is a core concept in CS Fundamentals. Rather than starting with a dry definition, let's see it in action and understand why it exists.
In production, concurrency control governs how the database serializes conflicting operations. Without it, concurrent transactions would corrupt data — lost updates, dirty reads, and phantom rows become everyday problems. The mechanism can be lock-based (pessimistic) or multi-version (optimistic). The choice determines your throughput, consistency, and operational complexity.
Here's what happens under the hood: every read and write operation on a data item goes through a concurrency control manager. That manager decides whether to grant access, block the transaction, or abort it. In a lock-based system, it uses a lock table. In MVCC, it uses version chains and transaction visibility rules. The trade-off is always between isolation and performance.
A real-world failure: In 2022, an e-commerce platform lost $2M during Black Friday because their optimistic concurrency control didn't handle the spike in update conflicts. The retry logic had exponential backoff, but the number of retries exceeded the timeout threshold, causing order failures. That's the kind of thing that makes you rethink your strategy. And trust me — you don't want to be that team on the post-mortem call.
Here's the part that catches teams off guard: the concurrency control manager itself can become a bottleneck. If you're using row-level locks and your application opens a transaction, reads 500 rows, does some app logic, then writes — you're holding those locks much longer than necessary. Always read what you need to lock right before the write, not at the start. And measure lock-hold times — anything over a few hundred milliseconds is a red flag in a high-velocity OLTP system.
In one production incident, a team found that a single row lock in a user session table was serializing all login requests. The fix: use atomic increment at the application layer (Redis) rather than a database row lock. Concurrency control is not just about the DBMS — it's about the whole system design. Think about it this way: your app's lock-hold time is a direct measure of how well you understand your data access patterns. The less time you hold a lock, the more throughput you squeeze out.
Another trap: I've seen teams set transaction isolation globally without measuring. They defaulted to SERIALIZABLE and watched throughput drop 60%. The fix was per-query isolation — READ COMMITTED for reads, SERIALIZABLE for the one critical write. Measure first, tune second. And don't forget to monitor lock-hold times in production; if you're not tracking them, you're flying blind.
To monitor lock-hold times, use pg_stat_activity to see current queries and lock waits. A lock-hold time exceeding 200ms in a high-throughput system is a clear signal to reconsider your transaction design. Set up alerts when average lock-wait time exceeds 100ms.
Here's a story from a recent post-mortem: a team used READ COMMITTED but didn't test with concurrent updates. The write skew caused double-booking in their reservation system. The fix? A single SELECT FOR UPDATE on the availability check. That's the difference between reading theory and applying it under pressure.
The Problems Concurrency Control Solves: Dirty Reads, Lost Updates, and More
Before diving into mechanisms, understand the four anomalies that concurrency control prevents. These are defined by the ANSI SQL standard but every production engineer has seen them in the wild.
Dirty Read: Transaction A reads data written by uncommitted Transaction B. If B rolls back, A worked with invalid data. Real example: a customer sees a pending charge that fails, causing a false balance.
Non-repeatable Read: Within the same transaction, a row read twice gives different values because another transaction committed an update between the reads. Common in READ COMMITTED — you re-query and the price changed.
Phantom Read: Same query twice shows different rows (new inserts or deletes). Often happens in inventory systems: between your SELECT for stock and your UPDATE, another transaction inserted a new order, and you missed it.
Lost Update: Two transactions read the same value, increment it, and write. The second overwrites the first's increment. Bank balance increments: if both read 100 and add 50, you get 150 instead of 200.
These anomalies aren't just academic. In 2022, a lost update bug in a ride-sharing system caused drivers to be paid double for overlapping trips. The root cause? A read-modify-write pattern without atomics or locks. The classic 'lost update' in a bank account scenario — it's still the #1 bug we see in code reviews. Teams assume their ORM handles it. It doesn't. Always use atomic UPDATE or explicit locking.
Beyond the ANSI anomalies, there's also write skew, which is subtler: two transactions read overlapping data, each makes a decision based on what they saw, then both write disjoint rows, leading to an invariant violation. Snapshot isolation allows write skew; only SERIALIZABLE prevents it. We'll cover that later.
Don't let the names fool you — these anomalies can combine. A lost update plus a non-repeatable read can create a cascade of wrong numbers that your finance team will catch days later. That's why you need to think about your specific query patterns, not just the isolation level name.
Here's a real-world scenario: a hospital scheduling system used REPEATABLE READ. Two nurses simultaneously checked the on-call roster, saw two people were scheduled, and each changed their own status to off-duty. Both committed — leaving zero nurses on call. That's write skew. The system had no errors. The data was just wrong. It took three hours and a patient incident to discover the anomaly. That's the kind of bug that doesn't show up in unit tests. You need integration tests under concurrent load to catch it.
A lesson from the field: one team I worked with had a similar write skew in their billing system. Two customer service reps could both issue a refund on the same transaction because each ran a check that the refund hadn't been issued yet. Both checks passed because neither saw the other's yet-uncommitted refund. The fix? Add SELECT FOR UPDATE on the refund check. Simple change, zero data loss since.
Another subtle point: shared locks can cause issues in read-heavy systems. If you use REPEATABLE READ, every read acquires a shared lock in some implementations (e.g., SQL Server), which can block writers. In MySQL InnoDB, REPEATABLE READ reads without locks (MVCC), but writers still need exclusive locks. Know your DBMS.
Lock-Based Concurrency Control: Two-Phase Locking (2PL)
The most intuitive approach: acquire locks on data items before accessing them, and release them after you're done. Two-Phase Locking ensures serializability by dividing every transaction into two phases: growing (acquire locks, no release) and shrinking (release locks, no acquire).
But there's a catch: basic 2PL can cause cascading aborts because a transaction that releases a lock early may be rolled back later, forcing dependent transactions to also roll back. Strict 2PL solves this by holding all locks until commit. That's what MySQL InnoDB uses (with minor variations).
Lock modes: Shared (S) for reads, Exclusive (X) for writes. Multiple shared locks allowed, exclusive locks block everything. The lock manager maintains a wait-for graph to detect deadlocks.
Performance trade-off: Locking reduces concurrency. The more locks you hold, the more contention. That's why most databases default to READ COMMITTED — they release read locks after each statement, increasing concurrency at the cost of non-repeatable reads.
In practice, lock contention is often the bottleneck in OLTP systems. A single hot row (like a global counter) can serialize all updates. That's why some systems partition hot rows or use atomic operations at the application level.
Deadlock example: Transaction 1 locks row A, then needs row B. Transaction 2 locks row B, then needs row A. Both wait forever. The DBMS detects the cycle and aborts one (usually the one that detected it). This is why lock ordering matters — if everyone accesses rows in the same order (e.g., alphabetically), deadlocks are impossible.
A nuance that bites teams: even if you follow lock ordering in your application code, triggers, foreign key constraints, and index maintenance can introduce implicit locks in an order you didn't plan. Always test with concurrent transactions that exercise all paths — including error paths that might cause unexpected lock releases.
Another practical trick: if you can't enforce a global lock order, use a lock timeout and a retry loop. In MySQL, set innodb_lock_wait_timeout to a few seconds, and in your application code catch the lock wait timeout error and retry the transaction from scratch. It's not as clean as consistent ordering, but it keeps the system running.
One extra trap: lock escalation. InnoDB can escalate row locks to a table lock if the number of locked rows exceeds a threshold (row lock table size). This can cause sudden deadlocks and massive contention. I've seen a batch job lock 10,000 rows, trigger lock escalation to table level, and bring down an entire service. Monitor lock escalation by checking Innodb_row_lock_current_waits and table locks in SHOW ENGINE INNODB STATUS. Also, consider setting innodb_lock_wait_timeout to a low value (like 5 seconds) so that your app doesn't hang indefinitely.
Consider this production failure: a team used SELECT FOR UPDATE on 10,000 rows in a batch job. InnoDB escalated to a table lock, blocking all other transactions for the duration. The fix: chunk the updates into smaller batches or use a different approach like optimistic locking.
One more real-world case: a social media company had a deadlock storm every hour. The root cause: two different code paths updated the same set of user stats in opposite orders. A simple code review rule — always update tables in alphabetical order — eliminated deadlocks entirely. Put that rule in your team's coding standards.
Use performance_schema in MySQL to track lock waits and identify hot rows. Run SELECT * FROM performance_schema.data_locks to see current locks and their mode.
MVCC: Multi-Version Concurrency Control
MVCC avoids locking for reads by keeping multiple versions of each row. When a transaction updates a row, it creates a new version; old transactions still see the previous version. This allows concurrent reads without blocking writes, and writes without blocking reads.
PostgreSQL and Oracle use MVCC heavily. InnoDB also uses MVCC but with a twist: it records undo information in the rollback segment to reconstruct old versions.
Snapshot isolation is the concurrency control model underlying MVCC: each transaction sees a snapshot of the database at its start time. This prevents dirty reads and non-repeatable reads. But it does not prevent all anomalies — write skew is possible even under snapshot isolation.
Write skew example: Two doctors both call a patient 'Not on call' at the same time. Each reads the roster, sees at least one other doctor is on call, updates their status. Both commit believing coverage exists. Result: no doctor is on call. Snapshot isolation allows this; SERIALIZABLE does not.
Storage overhead: Old row versions must be retained until no transaction can see them. This causes bloat in PostgreSQL (dead tuples need VACUUM) and a long history list in InnoDB. Poorly tuned purge can result in unbounded growth.
How version chains work: In PostgreSQL, each row has hidden columns xmin and xmax. xmin stores the transaction ID that created the row version. xmax stores the transaction ID that deleted/updated it (or 0 if active). When a transaction reads a row, it checks its own snapshot's transaction visibility to determine which version it should see. This allows correct isolation without any read locks.
Production trap: Long-running transactions prevent old versions from being cleaned up. A reporting query that runs for 30 minutes can cause PostgreSQL autovacuum to fall behind, leading to table bloat and degraded performance. Always keep your transactions as short as possible, and monitor dead tuple counts.
Here's something most developers don't think about: the visibility check itself has a cost. Each row read requires checking the xmin/xmax against the snapshot's list of in-progress transactions. That's a hash lookup per row. On a table with 100 million rows and a high write rate, that overhead adds up. In one case, a read replica's CPU went from 20% to 90% just from visibility checks. That's why you sometimes see CPU spikes on read replicas under heavy write load — the visibility checks are more expensive because the transaction list is long.
A particularly nasty bloat incident: a team ran a daily report that opened a Serializable transaction and took 45 minutes. Meanwhile, their main write workload inserted 2 million rows. The 45-minute transaction prevented autovacuum from cleaning any of those 2 million dead tuples. Disk usage grew by 8 GB and query time on the table went from 10ms to 4 seconds. The fix: move reports to a read replica and use REPEATABLE READ with a short timeout.
Also, note that MVCC in MySQL InnoDB is slightly different: old versions are stored in the undo log, not directly in the table space. This means bloat manifests as a growing undo tablespace instead of dead tuples in the main table. Monitor innodb_history_list_length — if it exceeds 10,000, you have a long-running transaction holding up purge. Automate this check with a cron job that pages the on-call if history list length stays above 10,000 for more than 5 minutes.
One more gotcha: in PostgreSQL, if you use connection pooling, make sure transactions are rolled back before returning the connection to the pool. A transaction left open by mistake can cause bloat across the entire database because its snapshot prevents cleanup of all rows modified after it started. We saw a case where an idle-in-transaction connection held open for 8 hours, causing the database to grow by 50GB. Set idle_in_transaction_session_timeout to 5 minutes to avoid this.
- The main branch is the committed state.
- When you start a transaction, you create a snapshot (git clone of the repo at that moment).
- You edit your clone without affecting others.
- When you commit, you merge your changes back — conflicts may arise if another transaction committed first.
- Old snapshots are kept until garbage-collected (VACUUM).
innodb_purge_threads to 4 to keep up with high write rates.Isolation Levels: READ UNCOMMITTED to SERIALIZABLE
The SQL standard defines four isolation levels. Each allows a different set of anomalies. Production databases rarely support all four exactly as described — they implement variants (e.g., PostgreSQL's REPEATABLE READ is actually snapshot isolation).
READ UNCOMMITTED (lowest): Allows dirty reads. Rarely used in practice; PostgreSQL doesn't even implement it — it treats it as READ COMMITTED.
READ COMMITTED (default in many DBMS): Prevents dirty reads. Each statement sees a fresh snapshot. Non-repeatable reads and phantoms possible. Good for reporting where consistency across statements is less important.
REPEATABLE READ: Prevents dirty and non-repeatable reads. In PostgreSQL, it implements snapshot isolation — no phantoms within same snapshot. In MySQL InnoDB, it uses gap locks to prevent phantoms for write operations.
SERIALIZABLE (highest): Guarantees serial execution equivalent. Highest overhead. In PostgreSQL, it uses Serializable Snapshot Isolation (SSI) which detects read-write conflicts and aborts one transaction. In MySQL, it's implemented with lock-based serialization.
Choose the lowest isolation that prevents your application's anomalies. If you only need consistent reads within a transaction, REPEATABLE READ is often sufficient. For inventory systems, SERIALIZABLE or explicit locking is safer.
A common mistake: using SERIALIZABLE for everything because it sounds safest. That kills throughput — serialization failures become frequent, and your retry logic better be solid. Measure the actual conflict rate first.
Performance impact: In one production benchmark, switching from SERIALIZABLE to READ COMMITTED with explicit SELECT FOR UPDATE on critical paths improved throughput by 40% while maintaining correctness. The serialization failure rate at SERIALIZABLE was 5% under peak load — too high for a healthy system.
Here's the real-world nuance: READ COMMITTED in PostgreSQL is often "good enough" even for financial logic if you use atomic operations. But watch out — if your application relies on read-only queries within a transaction to stay consistent (like generating an invoice number based on current balance), READ COMMITTED will give you different numbers between the SELECT and the UPDATE. That's a non-repeatable read in action. Use REPEATABLE READ for those cases. And if you're using an ORM that doesn't let you set isolation per transaction, consider explicit locking with SELECT FOR UPDATE — it's a surgical tool that avoids global serialization.
A specific performance data point: At Twitter scale, a switch from REPEATABLE READ to READ COMMITTED for their timeline service reduced lock contention by 70% and cut p99 latency from 150ms to 45ms. The trade-off was that some queries could see non-repeatable reads — but the business logic tolerated it. This is the decision you'll make daily as a senior engineer: which anomalies can you accept for performance?
Another nuance: some databases allow you to set isolation level per transaction, not just globally. Use that freedom. For instance, your reporting queries can use SERIALIZABLE while your main OLTP paths use READ COMMITTED. This avoids global contention.
One more thing: in PostgreSQL, if you use SERIALIZABLE, be aware that serialization failures can be frequent under high concurrency. A team I advised had a bug where their retry logic didn't reset the isolation level after a rollback, causing the retry to still be in SERIALIZABLE and fail again. Always reset isolation level when retrying a transaction.
Monitor serialization failures via pg_stat_database rollback percentage. If it consistently exceeds 0.5%, investigate conflict patterns.
Practical Guidelines for Choosing Concurrency Control Strategy
There's no one-size-fits-all. You need to match the strategy to your workload patterns. The key factors: contention level, consistency requirements, storage overhead tolerance, and operational complexity.
High contention, strict consistency (financial systems): Use pessimistic locking with short transactions. Mix REPEATABLE READ with SELECT FOR UPDATE. Monitor deadlock rates.
Low contention, read-heavy (CMS, social feeds): Use MVCC with optimistic locking (version columns). Let the database handle read concurrency. Handle serialization failures with retry.
Mixed workloads (e-commerce): Partition hot rows. Use optimistic locking for catalog reads, pessimistic for checkout writes. Consider application-level sharding to reduce lock contention.
Reporting/analytics: Use REPEATABLE READ or snapshot isolation. Avoid long transactions during peak write loads to prevent MVCC bloat.
Legacy migration: Test both lock-based and MVCC on your specific DBMS version. The same isolation level name can behave differently (MySQL vs PostgreSQL).
Monitoring: Watch for increasing lock wait times, deadlock frequency, and MVCC bloat rates. Use performance schema queries to identify hot rows. Tools like pg_stat_statements and MySQL Performance Schema help pinpoint transactions that hold locks longest.
Advanced pattern: For extremely hot rows (e.g., a global counter), consider using atomic operations at the application layer (e.g., Redis INCR) rather than database locks. Offload the hot path to minimize database contention.
Another pattern that works well: use a separate "sequence" table for counters that you update with atomic increments, and cache the current value in application memory. That way you only hit the database every N increments instead of every single time. But be careful — if your application crashes, you might lose the cached value. Acceptable for some use cases, not for financial ones.
A practical retry pattern: implement exponential backoff with jitter. In one system, the retry strategy without jitter caused all clients to retry at the same time, creating a thundering herd that amplified the original contention. Adding jitter (±25% of the delay) smoothed out the retry storm and reduced peak lock wait time by 60%. Bottom line: never implement retry without jitter. It's the difference between a controlled recovery and cascading failure.
Add a circuit breaker to your retry layer to prevent flooding the database during sustained contention. If retry rate exceeds 10%, stop retrying and alert — otherwise you're just amplifying the problem. If you're using Spring, the @Retryable annotation with ExponentialBackOff and SimpleRetryPolicy works well — but ensure the method is idempotent to avoid double deductions.
Remember: your concurrency strategy is not static. As your application scales, contention patterns change. Re-evaluate every 6 months or when you hit a new throughput milestone.
A real story: a startup was using optimistic locking for everything. As they grew, conflict rates hit 15% during flash sales. They switched to pessimistic locking for the checkout path and kept optimistic for catalog reads. Throughput improved 3x during peak. Don't be afraid to mix strategies within the same application — that's often the right answer.
The Phantom Orders: How Snapshot Isolation Masked a Data Integrity Bug
- READ COMMITTED is not a single standard — MVCC vs. lock-based implementations differ.
- Always test isolation level behavior under concurrency on your specific DBMS version.
- For financial integrity, use REPEATABLE READ or SERIALIZABLE, or apply explicit locking.
- Test all check-then-act patterns under concurrent load — they're the number one source of write-skew bugs.
That's DBMS. Mark it forged?
17 min read · try the examples if you haven't