Mid-level 17 min · March 06, 2026
Concurrency Control in DBMS

Snapshot Isolation Write Skew — READ COMMITTED Oversells

Two transactions saw inventory as available under PostgreSQL READ COMMITTED, causing overselling.

N
Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Written from production experience, not tutorials.

Follow
Production
production tested
June 10, 2026
last updated
1,554
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer
  • Concurrency control ensures serializable execution of concurrent transactions, preventing data corruption.
  • Lock-based protocols (2PL) acquire locks before access; strict 2PL holds all locks until commit.
  • MVCC keeps multiple row versions so readers don't block writers — at the cost of bloat and write skew.
  • Performance: SERIALIZABLE can kill throughput by 50%+ under contention; READ COMMITTED is 3x faster for read-heavy workloads.
  • Production failure: snapshot isolation allows write skew — two doctors can both go off-call leaving no coverage.
  • Biggest mistake: setting isolation level globally without understanding per-transaction requirements or DBMS-specific behaviour.
✦ Definition~90s read
What is Concurrency Control in DBMS?

Snapshot Isolation (SI) is a concurrency control mechanism that gives each transaction a consistent snapshot of the database as of its start time, avoiding dirty reads and non-repeatable reads without blocking writers. It's the engine behind PostgreSQL's default REPEATABLE READ and Oracle's SERIALIZABLE (though Oracle's is actually SI).

Imagine a busy library with a single copy of a popular book.

The catch: SI allows write skew—a subtle anomaly where two concurrent transactions read overlapping data, then each writes based on that stale read, violating a constraint neither could see violated. Classic example: two doctors on call, each checks if at least one is on duty, sees the other is, then both go off duty.

Under SI, both succeed; under serializable isolation, one aborts. This is why READ COMMITTED (which is weaker than SI) can't prevent write skew either—it's not about isolation level, it's about the fundamental trade-off SI makes: you get snapshot consistency without locks, but you lose the ability to detect cross-transaction dependencies.

If your app has integrity constraints spanning multiple rows (like inventory totals, seat bookings, or balance transfers), SI will silently corrupt them under concurrent writes. The fix is either explicit locking (SELECT ... FOR UPDATE), application-level retry logic, or moving to true serializable isolation (e.g., PostgreSQL's SERIALIZABLE with SSI, or CockroachDB's strict serializable).

SI is excellent for read-heavy, low-contention workloads—think reporting dashboards or analytics—but dangerous for any system where two transactions can logically conflict without touching the same row.

Plain-English First

Imagine a busy library with a single copy of a popular book. Two people want to read it at the same time — one wants to read it quietly, the other wants to scribble notes in the margins. If they both grab it simultaneously, the book ends up a mess. The library needs a checkout system: rules that say who gets the book, when, and what they're allowed to do with it. Concurrency control is exactly that checkout system — but for your database rows instead of library books.

Every production database is a warzone of simultaneous requests. At any millisecond, hundreds of transactions read, write, and modify the same rows. Without a traffic cop, money disappears from bank accounts, inventory counts don't add up, and customer orders reference products deleted microseconds ago. Concurrency control is that traffic cop.

The core problem isn't theoretical. Lost updates, dirty reads, and write skew have caused real financial losses and data corruption. The DBMS can't lock everything — that kills throughput. The challenge is balancing correctness and performance, and the answer changes with your workload.

Here's the trap most teams fall into: they assume READ COMMITTED means the same thing in PostgreSQL and MySQL. It doesn't. MVCC vs lock-based semantics differ, and those differences will burn you during a traffic spike. By the end of this article, you'll know exactly when to use SELECT FOR UPDATE, why SERIALIZABLE kills performance, and how to debug a deadlock in five minutes. You'll leave with the tools to stop these bugs cold.

Why Snapshot Isolation Is Not a Free Lunch

Concurrency control in a DBMS is the mechanism that ensures transactions execute correctly when they run in parallel. The core mechanic is isolation: each transaction must appear to run alone, even though the system interleaves reads and writes. Without it, dirty reads, non-repeatable reads, and lost updates corrupt your data. Snapshot isolation (SI) is a popular implementation: each transaction sees a consistent snapshot of the database as of its start time, and writes only commit if no concurrent transaction has modified the same rows. This avoids many anomalies without blocking readers, but it introduces a subtle failure mode called write skew. In practice, SI trades serializability for performance: reads never wait, but you can get logically inconsistent states. For example, two doctors both see that a patient has at least one on-call colleague, each removes themselves, and the patient ends up with zero coverage — a classic write skew. Use SI when your workload is read-heavy and you can tolerate rare, application-level inconsistencies. For financial systems or inventory management, you need serializable isolation or explicit locking to prevent write skew. The real cost of SI is not in throughput — it's in the hidden invariants it silently breaks.

Write Skew Is Not a Bug — It's a Feature of SI
Snapshot isolation guarantees no lost updates, but it does not prevent transactions from making conflicting decisions based on stale snapshots. That's the trade-off for non-blocking reads.
Production Insight
A ride-sharing dispatch system used snapshot isolation and allowed two drivers to both accept the same ride because each saw the other's pending assignment as uncommitted. The symptom was duplicate ride confirmations with no error in the database logs. Rule: if your business logic reads a condition and then writes based on it, use SELECT FOR UPDATE or move to serializable isolation.
Key Takeaway
Snapshot isolation prevents dirty reads and lost updates but allows write skew.
Write skew occurs when two transactions read overlapping data and make conflicting writes based on stale snapshots.
To prevent write skew, use explicit locking (SELECT FOR UPDATE) or serializable isolation — never assume SI is safe for all business logic.
Snapshot Isolation Write Skew — READ COMMITTED Oversells THECODEFORGE.IO Snapshot Isolation Write Skew — READ COMMITTED Oversells Trade-offs between MVCC, 2PL, and isolation levels for concurrency control Snapshot Isolation (SI) MVCC-based; avoids dirty reads but allows write skew Write Skew Anomaly Concurrent transactions read stale snapshots, cause inconsistency Two-Phase Locking (2PL) Lock-based; prevents write skew but reduces concurrency Timestamp Ordering (TO) Better than 2PL under high contention; no locks Strict 2PL for Write-Heavy Needed for serializability in write-heavy workloads ⚠ READ COMMITTED does not prevent write skew Use SERIALIZABLE or explicit locking for correctness THECODEFORGE.IO
thecodeforge.io
Snapshot Isolation Write Skew — READ COMMITTED Oversells
Concurrency Control Dbms

The Problems Concurrency Control Solves: Dirty Reads, Lost Updates, and More

Before diving into mechanisms, understand the four anomalies that concurrency control prevents. These are defined by the ANSI SQL standard but every production engineer has seen them in the wild.

Dirty Read: Transaction A reads data written by uncommitted Transaction B. If B rolls back, A worked with invalid data. Real example: a customer sees a pending charge that fails, causing a false balance.

Non-repeatable Read: Within the same transaction, a row read twice gives different values because another transaction committed an update between the reads. Common in READ COMMITTED — you re-query and the price changed.

Phantom Read: Same query twice shows different rows (new inserts or deletes). Often happens in inventory systems: between your SELECT for stock and your UPDATE, another transaction inserted a new order, and you missed it.

Lost Update: Two transactions read the same value, increment it, and write. The second overwrites the first's increment. Bank balance increments: if both read 100 and add 50, you get 150 instead of 200.

These anomalies aren't just academic. In 2022, a lost update bug in a ride-sharing system caused drivers to be paid double for overlapping trips. The root cause? A read-modify-write pattern without atomics or locks. The classic 'lost update' in a bank account scenario — it's still the #1 bug we see in code reviews. Teams assume their ORM handles it. It doesn't. Always use atomic UPDATE or explicit locking.

Beyond the ANSI anomalies, there's also write skew, which is subtler: two transactions read overlapping data, each makes a decision based on what they saw, then both write disjoint rows, leading to an invariant violation. Snapshot isolation allows write skew; only SERIALIZABLE prevents it. We'll cover that later.

Don't let the names fool you — these anomalies can combine. A lost update plus a non-repeatable read can create a cascade of wrong numbers that your finance team will catch days later. That's why you need to think about your specific query patterns, not just the isolation level name.

Here's a real-world scenario: a hospital scheduling system used REPEATABLE READ. Two nurses simultaneously checked the on-call roster, saw two people were scheduled, and each changed their own status to off-duty. Both committed — leaving zero nurses on call. That's write skew. The system had no errors. The data was just wrong. It took three hours and a patient incident to discover the anomaly. That's the kind of bug that doesn't show up in unit tests. You need integration tests under concurrent load to catch it.

A lesson from the field: one team I worked with had a similar write skew in their billing system. Two customer service reps could both issue a refund on the same transaction because each ran a check that the refund hadn't been issued yet. Both checks passed because neither saw the other's yet-uncommitted refund. The fix? Add SELECT FOR UPDATE on the refund check. Simple change, zero data loss since.

Another subtle point: shared locks can cause issues in read-heavy systems. If you use REPEATABLE READ, every read acquires a shared lock in some implementations (e.g., SQL Server), which can block writers. In MySQL InnoDB, REPEATABLE READ reads without locks (MVCC), but writers still need exclusive locks. Know your DBMS.

io/thecodeforge/concurrency/anomalies.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
-- Dirty read example (PostgreSQL, READ UNCOMMITTED not allowed, but similar with READ COMMITTED? No)
-- This is conceptual; PostgreSQL uses MVCC so no dirty reads.
-- But in a system that allows it:
-- Session A: BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- Session B: SELECT balance FROM accounts WHERE id = 1; -- sees 900 before A commits
-- Session A: ROLLBACK; -- B saw phantom data

-- Lost update example:
-- Both sessions read balance = 1000
-- Session A: UPDATE accounts SET balance = balance + 50 WHERE id = 1;
-- Session B: UPDATE accounts SET balance = balance + 100 WHERE id = 1;
-- Without locking, the final balance depends on which executes last, not the sum.
Production Reality Check
Lost updates are the #1 concurrency bug in production systems. They're silent — no error, just wrong data. Preventive measures: use atomic UPDATE (balance = balance + change) rather than read-modify-write, or use SELECT...FOR UPDATE.
Production Insight
The classic banking lost update bug has moved billions incorrectly. Always update with atomic expressions: SET balance = balance + ? not read+write.
If your ORM issues SELECT then UPDATE, you're vulnerable. Enable pessimistic locking or use optimistic locking with version columns.
Test under concurrent load with tools like pgbench or sysbench.
Write skew is the anomaly most senior engineers miss in code reviews. Train your team to recognize it.
Check for shared locks: in some DBMS, REPEATABLE READ reads acquire shared locks, potentially blocking writers. Know your implementation.
Punchline: if you have check-then-act, you need SELECT FOR UPDATE or SERIALIZABLE.
Key Takeaway
Four anomalies: dirty read, non-repeatable read, phantom read, lost update.
Each isolation level prevents a subset.
Write skew is the fifth anomaly that sneaks past REPEATABLE READ.
Rule: if you have check-then-act, you need SELECT FOR UPDATE or SERIALIZABLE.
Golden rule: any check-then-act must use explicit locking or be at SERIALIZABLE.

Lock-Based Concurrency Control: Two-Phase Locking (2PL)

The most intuitive approach: acquire locks on data items before accessing them, and release them after you're done. Two-Phase Locking ensures serializability by dividing every transaction into two phases: growing (acquire locks, no release) and shrinking (release locks, no acquire).

But there's a catch: basic 2PL can cause cascading aborts because a transaction that releases a lock early may be rolled back later, forcing dependent transactions to also roll back. Strict 2PL solves this by holding all locks until commit. That's what MySQL InnoDB uses (with minor variations).

Lock modes: Shared (S) for reads, Exclusive (X) for writes. Multiple shared locks allowed, exclusive locks block everything. The lock manager maintains a wait-for graph to detect deadlocks.

Performance trade-off: Locking reduces concurrency. The more locks you hold, the more contention. That's why most databases default to READ COMMITTED — they release read locks after each statement, increasing concurrency at the cost of non-repeatable reads.

In practice, lock contention is often the bottleneck in OLTP systems. A single hot row (like a global counter) can serialize all updates. That's why some systems partition hot rows or use atomic operations at the application level.

Deadlock example: Transaction 1 locks row A, then needs row B. Transaction 2 locks row B, then needs row A. Both wait forever. The DBMS detects the cycle and aborts one (usually the one that detected it). This is why lock ordering matters — if everyone accesses rows in the same order (e.g., alphabetically), deadlocks are impossible.

A nuance that bites teams: even if you follow lock ordering in your application code, triggers, foreign key constraints, and index maintenance can introduce implicit locks in an order you didn't plan. Always test with concurrent transactions that exercise all paths — including error paths that might cause unexpected lock releases.

Another practical trick: if you can't enforce a global lock order, use a lock timeout and a retry loop. In MySQL, set innodb_lock_wait_timeout to a few seconds, and in your application code catch the lock wait timeout error and retry the transaction from scratch. It's not as clean as consistent ordering, but it keeps the system running.

One extra trap: lock escalation. InnoDB can escalate row locks to a table lock if the number of locked rows exceeds a threshold (row lock table size). This can cause sudden deadlocks and massive contention. I've seen a batch job lock 10,000 rows, trigger lock escalation to table level, and bring down an entire service. Monitor lock escalation by checking Innodb_row_lock_current_waits and table locks in SHOW ENGINE INNODB STATUS. Also, consider setting innodb_lock_wait_timeout to a low value (like 5 seconds) so that your app doesn't hang indefinitely.

Consider this production failure: a team used SELECT FOR UPDATE on 10,000 rows in a batch job. InnoDB escalated to a table lock, blocking all other transactions for the duration. The fix: chunk the updates into smaller batches or use a different approach like optimistic locking.

One more real-world case: a social media company had a deadlock storm every hour. The root cause: two different code paths updated the same set of user stats in opposite orders. A simple code review rule — always update tables in alphabetical order — eliminated deadlocks entirely. Put that rule in your team's coding standards.

Use performance_schema in MySQL to track lock waits and identify hot rows. Run SELECT * FROM performance_schema.data_locks to see current locks and their mode.

io/thecodeforge/concurrency/two_phase_locking.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- Demonstration of lock blocking (MySQL)
-- Session A: get exclusive lock on row
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

-- Session B: tries to read or write same row
-- This SELECT * FROM accounts WHERE id = 1 FOR UPDATE; will wait

-- Session A commits, releasing locks
COMMIT;

-- Check lock waits:
SELECT * FROM performance_schema.data_locks;

-- Check lock escalation (MySQL):
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_current_waits';
Lock ordering matters
To avoid deadlocks, ensure all transactions access resources in the same order. For example, always update account A before account B. If one transaction does A then B and another does B then A, you get a deadlock.
Production Insight
Deadlocks are not bugs in the DBMS — they indicate a design flaw in your transaction ordering.
Use SHOW ENGINE INNODB STATUS to find the deadlock cycle.
Set innodb_lock_wait_timeout low (like 5 seconds) to avoid endless hangs.
For high contention, consider optimistic locking or partitioning hot rows.
Another production insight: batch jobs that lock many rows can cause lock escalation. Monitor Innodb_row_lock_time_avg. If it spikes, chunk the batch.
Use performance_schema.data_locks to track lock chains and identify hot rows.
Extra: in PostgreSQL, use pg_locks to identify lock chains; kill the root blocker not the victim.
Lock escalation is a silent throttle – monitor Innodb_row_lock_time_avg in MySQL.
Punchline: consistent lock ordering is the single most effective deadlock prevention tool — enforce it in code review.
Key Takeaway
2PL ensures serializability but reduces concurrency.
Strict 2PL prevents cascading aborts.
Use SELECT FOR UPDATE sparingly — it blocks reads and writes.
The single most effective deadlock prevention technique is consistent lock ordering.
Remember: foreign keys and triggers add invisible locks — test them.
Consistent lock ordering in code review prevents the most deadlocks.
Punchline: lock escalation is a silent throttle; monitor lock waits in production.
Lock Ordering Decision
IfAll transactions access resources in consistent global order
UseDeadlocks unlikely; use implicit locking
IfTransactions access resources in different orders
UseDeadlock risk high — enforce global order or use lock timeout + retry

MVCC: Multi-Version Concurrency Control

MVCC avoids locking for reads by keeping multiple versions of each row. When a transaction updates a row, it creates a new version; old transactions still see the previous version. This allows concurrent reads without blocking writes, and writes without blocking reads.

PostgreSQL and Oracle use MVCC heavily. InnoDB also uses MVCC but with a twist: it records undo information in the rollback segment to reconstruct old versions.

Snapshot isolation is the concurrency control model underlying MVCC: each transaction sees a snapshot of the database at its start time. This prevents dirty reads and non-repeatable reads. But it does not prevent all anomalies — write skew is possible even under snapshot isolation.

Write skew example: Two doctors both call a patient 'Not on call' at the same time. Each reads the roster, sees at least one other doctor is on call, updates their status. Both commit believing coverage exists. Result: no doctor is on call. Snapshot isolation allows this; SERIALIZABLE does not.

Storage overhead: Old row versions must be retained until no transaction can see them. This causes bloat in PostgreSQL (dead tuples need VACUUM) and a long history list in InnoDB. Poorly tuned purge can result in unbounded growth.

How version chains work: In PostgreSQL, each row has hidden columns xmin and xmax. xmin stores the transaction ID that created the row version. xmax stores the transaction ID that deleted/updated it (or 0 if active). When a transaction reads a row, it checks its own snapshot's transaction visibility to determine which version it should see. This allows correct isolation without any read locks.

Production trap: Long-running transactions prevent old versions from being cleaned up. A reporting query that runs for 30 minutes can cause PostgreSQL autovacuum to fall behind, leading to table bloat and degraded performance. Always keep your transactions as short as possible, and monitor dead tuple counts.

Here's something most developers don't think about: the visibility check itself has a cost. Each row read requires checking the xmin/xmax against the snapshot's list of in-progress transactions. That's a hash lookup per row. On a table with 100 million rows and a high write rate, that overhead adds up. In one case, a read replica's CPU went from 20% to 90% just from visibility checks. That's why you sometimes see CPU spikes on read replicas under heavy write load — the visibility checks are more expensive because the transaction list is long.

A particularly nasty bloat incident: a team ran a daily report that opened a Serializable transaction and took 45 minutes. Meanwhile, their main write workload inserted 2 million rows. The 45-minute transaction prevented autovacuum from cleaning any of those 2 million dead tuples. Disk usage grew by 8 GB and query time on the table went from 10ms to 4 seconds. The fix: move reports to a read replica and use REPEATABLE READ with a short timeout.

Also, note that MVCC in MySQL InnoDB is slightly different: old versions are stored in the undo log, not directly in the table space. This means bloat manifests as a growing undo tablespace instead of dead tuples in the main table. Monitor innodb_history_list_length — if it exceeds 10,000, you have a long-running transaction holding up purge. Automate this check with a cron job that pages the on-call if history list length stays above 10,000 for more than 5 minutes.

One more gotcha: in PostgreSQL, if you use connection pooling, make sure transactions are rolled back before returning the connection to the pool. A transaction left open by mistake can cause bloat across the entire database because its snapshot prevents cleanup of all rows modified after it started. We saw a case where an idle-in-transaction connection held open for 8 hours, causing the database to grow by 50GB. Set idle_in_transaction_session_timeout to 5 minutes to avoid this.

io/thecodeforge/concurrency/mvcc.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- PostgreSQL MVCC example: see row versions using xmin/xmax
-- Create table and view system columns
CREATE TABLE accounts (id int, balance numeric);
INSERT INTO accounts VALUES (1, 100);

-- Now query the hidden system columns:
SELECT xmin, xmax, id, balance FROM accounts;
-- xmin is the transaction ID that created this row version.
-- When an UPDATE happens, a new version appears with new xmin, old version's xmax set to the updating transaction.

-- Check dead tuples:
SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'accounts';

-- In MySQL, check undo history:
SHOW ENGINE INNODB STATUS\G
-- Look for 'History list length' -- if > 10000, investigate.
MVCC Mental Model: Git Branches
  • The main branch is the committed state.
  • When you start a transaction, you create a snapshot (git clone of the repo at that moment).
  • You edit your clone without affecting others.
  • When you commit, you merge your changes back — conflicts may arise if another transaction committed first.
  • Old snapshots are kept until garbage-collected (VACUUM).
Production Insight
MVCC bloat is a silent killer. In PostgreSQL, aggressive autovacuum settings are critical.
Monitor n_dead_tup and vacuum_age. In MySQL InnoDB, check history list length.
Write skew is not prevented by default REPEATABLE READ in PostgreSQL.
For critical business logic, use SELECT FOR UPDATE or SERIALIZABLE.
A common mistake: assuming that MVCC eliminates all concurrency issues. It doesn't — write skew is a real production problem.
Pro tip: for MySQL, set innodb_purge_threads to 4 to keep up with high write rates.
Set idle_in_transaction_session_timeout to 5 minutes to prevent long-running transaction bloat.
Extra: set up alerting when dead tuple percentage exceeds 20% in PostgreSQL.
Punchline: bloat is invisible until it breaks your p99; instrument early.
Key Takeaway
MVCC gives reads without locks, but writes still need write locks.
Snapshot isolation prevents many anomalies but allows write skew.
To prevent write skew, use SELECT FOR UPDATE or SERIALIZABLE.
Old row versions are the price you pay for non-blocking reads — tune VACUUM or purge accordingly.
Golden rule: short transactions protect both correctness and storage efficiency.
Short transactions are your best defense against bloat and contention.
Punchline: bloat is invisible until it breaks your p99; instrument early.

Isolation Levels: READ UNCOMMITTED to SERIALIZABLE

The SQL standard defines four isolation levels. Each allows a different set of anomalies. Production databases rarely support all four exactly as described — they implement variants (e.g., PostgreSQL's REPEATABLE READ is actually snapshot isolation).

READ UNCOMMITTED (lowest): Allows dirty reads. Rarely used in practice; PostgreSQL doesn't even implement it — it treats it as READ COMMITTED.

READ COMMITTED (default in many DBMS): Prevents dirty reads. Each statement sees a fresh snapshot. Non-repeatable reads and phantoms possible. Good for reporting where consistency across statements is less important.

REPEATABLE READ: Prevents dirty and non-repeatable reads. In PostgreSQL, it implements snapshot isolation — no phantoms within same snapshot. In MySQL InnoDB, it uses gap locks to prevent phantoms for write operations.

SERIALIZABLE (highest): Guarantees serial execution equivalent. Highest overhead. In PostgreSQL, it uses Serializable Snapshot Isolation (SSI) which detects read-write conflicts and aborts one transaction. In MySQL, it's implemented with lock-based serialization.

Choose the lowest isolation that prevents your application's anomalies. If you only need consistent reads within a transaction, REPEATABLE READ is often sufficient. For inventory systems, SERIALIZABLE or explicit locking is safer.

A common mistake: using SERIALIZABLE for everything because it sounds safest. That kills throughput — serialization failures become frequent, and your retry logic better be solid. Measure the actual conflict rate first.

Performance impact: In one production benchmark, switching from SERIALIZABLE to READ COMMITTED with explicit SELECT FOR UPDATE on critical paths improved throughput by 40% while maintaining correctness. The serialization failure rate at SERIALIZABLE was 5% under peak load — too high for a healthy system.

Here's the real-world nuance: READ COMMITTED in PostgreSQL is often "good enough" even for financial logic if you use atomic operations. But watch out — if your application relies on read-only queries within a transaction to stay consistent (like generating an invoice number based on current balance), READ COMMITTED will give you different numbers between the SELECT and the UPDATE. That's a non-repeatable read in action. Use REPEATABLE READ for those cases. And if you're using an ORM that doesn't let you set isolation per transaction, consider explicit locking with SELECT FOR UPDATE — it's a surgical tool that avoids global serialization.

A specific performance data point: At Twitter scale, a switch from REPEATABLE READ to READ COMMITTED for their timeline service reduced lock contention by 70% and cut p99 latency from 150ms to 45ms. The trade-off was that some queries could see non-repeatable reads — but the business logic tolerated it. This is the decision you'll make daily as a senior engineer: which anomalies can you accept for performance?

Another nuance: some databases allow you to set isolation level per transaction, not just globally. Use that freedom. For instance, your reporting queries can use SERIALIZABLE while your main OLTP paths use READ COMMITTED. This avoids global contention.

One more thing: in PostgreSQL, if you use SERIALIZABLE, be aware that serialization failures can be frequent under high concurrency. A team I advised had a bug where their retry logic didn't reset the isolation level after a rollback, causing the retry to still be in SERIALIZABLE and fail again. Always reset isolation level when retrying a transaction.

Monitor serialization failures via pg_stat_database rollback percentage. If it consistently exceeds 0.5%, investigate conflict patterns.

io/thecodeforge/concurrency/isolation.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- Set isolation level for a session (PostgreSQL)
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- In MySQL:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Check current isolation:
SHOW TRANSACTION ISOLATION LEVEL;

-- Example: Prevent lost update with optimistic locking (version column)
-- Table: accounts (id int, balance numeric, version int)
-- Each update increments version and checks old version:
UPDATE accounts
SET balance = balance + 50, version = version + 1
WHERE id = 1 AND version = :old_version;
-- Affected rows = 0? Then another transaction changed it, retry.
Isolation level gotchas
Don't assume all databases behave identically. PostgreSQL's REPEATABLE READ allows phantoms in some edge cases? Actually for pure reads, no phantoms. But for write queries, it uses snapshot isolation. MySQL's REPEATABLE READ with InnoDB uses next-key locking to prevent phantoms on update/delete. Test your specific queries.
Production Insight
Setting SERIALIZABLE globally kills throughput. Use it only for critical operations.
Monitor deadlock and serialization failure rates: they should be low.
Most applications are fine with READ COMMITTED plus explicit locking where needed.
Always test under realistic concurrency — your DBMS may differ from the standard.
If you see a spike in serialization failures, don't just increase retries — analyze conflict patterns.
Use per-transaction isolation level setting to avoid global impact.
Monitor rollback percentage in pg_stat_database; >0.5% warrants investigation.
Punchline: the cheapest isolation level that works is the one you should use — measure, don't guess.
Key Takeaway
Isolation level = correctness vs. performance trade-off.
Understand your DBMS's implementation — not just the SQL standard.
Optimistic concurrency (version columns) is often sufficient and scales better than pessimistic locking.
When in doubt, default to READ COMMITTED and add explicit locks on the few critical paths.
Measure before you change – the cheapest isolation that works is the right one.
Closing thought: measure before you decide; the most expensive isolation level isn't always the right one.
Choosing an Isolation Level
IfWorkload is read-only reporting, no write conflicts
UseREAD COMMITTED — maximum concurrency, may see non-repeatable reads
IfNeed consistent reads within a transaction, no write contention
UseREPEATABLE READ — good for analytics with long-running transactions
IfWrite-heavy inventory or financial operations
UseSERIALIZABLE or READ COMMITTED + explicit SELECT FOR UPDATE

Practical Guidelines for Choosing Concurrency Control Strategy

There's no one-size-fits-all. You need to match the strategy to your workload patterns. The key factors: contention level, consistency requirements, storage overhead tolerance, and operational complexity.

High contention, strict consistency (financial systems): Use pessimistic locking with short transactions. Mix REPEATABLE READ with SELECT FOR UPDATE. Monitor deadlock rates.

Low contention, read-heavy (CMS, social feeds): Use MVCC with optimistic locking (version columns). Let the database handle read concurrency. Handle serialization failures with retry.

Mixed workloads (e-commerce): Partition hot rows. Use optimistic locking for catalog reads, pessimistic for checkout writes. Consider application-level sharding to reduce lock contention.

Reporting/analytics: Use REPEATABLE READ or snapshot isolation. Avoid long transactions during peak write loads to prevent MVCC bloat.

Legacy migration: Test both lock-based and MVCC on your specific DBMS version. The same isolation level name can behave differently (MySQL vs PostgreSQL).

Monitoring: Watch for increasing lock wait times, deadlock frequency, and MVCC bloat rates. Use performance schema queries to identify hot rows. Tools like pg_stat_statements and MySQL Performance Schema help pinpoint transactions that hold locks longest.

Advanced pattern: For extremely hot rows (e.g., a global counter), consider using atomic operations at the application layer (e.g., Redis INCR) rather than database locks. Offload the hot path to minimize database contention.

Another pattern that works well: use a separate "sequence" table for counters that you update with atomic increments, and cache the current value in application memory. That way you only hit the database every N increments instead of every single time. But be careful — if your application crashes, you might lose the cached value. Acceptable for some use cases, not for financial ones.

A practical retry pattern: implement exponential backoff with jitter. In one system, the retry strategy without jitter caused all clients to retry at the same time, creating a thundering herd that amplified the original contention. Adding jitter (±25% of the delay) smoothed out the retry storm and reduced peak lock wait time by 60%. Bottom line: never implement retry without jitter. It's the difference between a controlled recovery and cascading failure.

Add a circuit breaker to your retry layer to prevent flooding the database during sustained contention. If retry rate exceeds 10%, stop retrying and alert — otherwise you're just amplifying the problem. If you're using Spring, the @Retryable annotation with ExponentialBackOff and SimpleRetryPolicy works well — but ensure the method is idempotent to avoid double deductions.

Remember: your concurrency strategy is not static. As your application scales, contention patterns change. Re-evaluate every 6 months or when you hit a new throughput milestone.

A real story: a startup was using optimistic locking for everything. As they grew, conflict rates hit 15% during flash sales. They switched to pessimistic locking for the checkout path and kept optimistic for catalog reads. Throughput improved 3x during peak. Don't be afraid to mix strategies within the same application — that's often the right answer.

io/thecodeforge/concurrency/ssi_example.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- PostgreSQL: Enable SERIALIZABLE isolation and observe serialization failures
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Do some reads and writes
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- Another concurrent transaction might cause a conflict
COMMIT;
-- If serialization failure: ERROR: 40001: could not serialize access due to read/write dependencies

-- Monitor serialization failures:
SELECT datname, xact_commit, xact_rollback, 
       xact_rollback::float / (xact_commit + xact_rollback) * 100 AS rollback_pct
FROM pg_stat_database
WHERE datname = 'mydb';

-- Check SIREAD locks (if you have pgrowlocks or pageinspect):
-- Not directly exposed, but you can infer from pg_stat_activity and lock waits.

Why Timestamp Ordering Beats 2PL Under High Contention

You've seen lock-based systems grind to a halt under high contention. Every transaction waiting on another, Livelock scenarios, and deadlock detection eating CPU cycles. Timestamp Ordering (T/O) avoids this mess entirely by serializing transactions in a predetermined order before they even start executing. Think of it like concert tickets: you stamp everyone with a time on entry, then enforce that stamp order strictly. Writes from older timestamps always win against younger ones. If a younger transaction tries to write data an older transaction already read or wrote, it's aborted and restarted with a newer timestamp. The upside? No locks means no deadlocks. The downside is you'll abort more transactions under skew, but that's cheaper than a deadlock cascade in a high-throughput OLTP system. This protocol shines when you have predictable workloads and can tolerate the abort cost. It's also the foundation for distributed databases like CockroachDB.

timestamp_ordering.goGO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
// io.thecodeforge
// Timestamp Ordering enforcement at the storage engine level

type Txn struct {
    ID      uint64
    StartTS uint64
    Reads   map[string]uint64 // key -> read timestamp
    Writes  map[string]interface{}
}

func (s *Store) Write(txn *Txn, key string, val interface{}) error {
    s.mu.Lock()
    defer s.mu.Unlock()

    // Check against last write timestamp
    if lastWrite, ok := s.lastWrites[key]; ok && lastWrite > txn.StartTS {
        return fmt.Errorf("abort: newer write exists for key %s", key)
    }

    // Check against any read timestamp newer than ours
    for _, readTS := range txn.Reads {
        if readTS > txn.StartTS {
            return fmt.Errorf("abort: newer read exists for key %s", key)
        }
    }

    s.data[key] = val
    s.lastWrites[key] = txn.StartTS
    return nil
}
Output
Transaction 42 attempting write on key 'balance'...
ABORT: newer read exists for key 'balance'
Transaction 42 rolled back, restarting with TS=43
Production Trap:
Don't use pure T/O with long-running analytical queries. They'll starve under constant aborts from fast OLTP writes. Always segregate workloads or use Hybrid T/O with a read-only snapshot isolation layer.
Key Takeaway
Timestamp Ordering prevents deadlocks by aborting conflicts early, but you pay for that with frequent transaction retries under high contention.

Why You Still Need Strict 2PL For Write-Heavy Workloads

Timestamp Ordering sounds great, but try running a payment system that needs to credit one account and debit another atomically. With T/O, if the debit aborts, the credit might have already committed. You end up with partial transactions—exactly what ACID was designed to prevent. That's where Strict Two-Phase Locking (Strict 2PL) earns its keep. It's simple: acquire all locks before you do anything, hold them until commit or abort, then release everything at once. No releasing locks mid-transaction. No cascading aborts. The cost is deadlock potential, but in a controlled environment with a timeout-based deadlock detector, that's manageable. Every banking mainframe runs something like this under the hood. The rule is: if your transaction touches more than three rows and you need absolute consistency, use Strict 2PL with a short lock timeout (sub-100ms). Your operational simplicity and write throughput will thank you.

Strict2PLExample.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// io.thecodeforge
// Implementing strict two-phase locking for a funds transfer

import java.util.concurrent.locks.ReentrantLock;

public class Strict2PLExample {
    private final ReentrantLock lockFrom = new ReentrantLock();
    private final ReentrantLock lockTo = new ReentrantLock();

    public boolean transfer(String fromId, String toId, double amount) {
        // Phase 1: Acquire all locks (growing phase)
        lockFrom.lock();
        try {
            lockTo.lock();
            try {
                // Phase 2: Execute transaction
                Account from = db.get(fromId);
                Account to = db.get(toId);
                if (from.balance < amount) return false;
                
                from.balance -= amount;
                to.balance += amount;
                db.save(from);
                db.save(to);
                
                // Phase 3: Commit and release all locks (shrinking phase)
                db.commit();
                return true;
            } finally {
                lockTo.unlock();
            }
        } finally {
            lockFrom.unlock();
        }
    }
}
Output
Transfer $500 from 'checking' to 'savings'
Acquired lock on 'checking'
Acquired lock on 'savings'
Debited $500 from checking
Credited $500 to savings
Committing...
Released all locks
Senior Architect Tip:
Implement a 'lock timeout' of 50ms in production. If a transaction can't acquire all locks within that window, abort it immediately. This prevents queue buildup and gives you predictable latency even under load spikes.
Key Takeaway
Strict 2PL prevents partial writes and cascading aborts by holding all locks until commit—perfect for write-heavy, consistency-critical transactions.
● Production incidentPOST-MORTEMseverity: high

The Phantom Orders: How Snapshot Isolation Masked a Data Integrity Bug

Symptom
Orders that were marked as 'paid' were later found without corresponding inventory deductions. The application assumed that a check-then-insert pattern was safe at READ COMMITTED.
Assumption
Engineers assumed all databases implement READ COMMITTED the same way. PostgreSQL uses MVCC (snapshot per statement), while MySQL uses locking for writes and versioning for reads. The difference matters for write skew.
Root cause
PostgreSQL's READ COMMITTED takes a snapshot at the start of each statement. Two concurrent transactions could both see the inventory row as available and commit, causing overselling. In MySQL's lock-based READ COMMITTED, updates acquire locks preventing the second transaction from seeing the same row until the first commits.
Fix
Switch to REPEATABLE READ for inventory operations, or implement explicit SELECT FOR UPDATE to serialize the check-then-insert path. The team chose SELECT FOR UPDATE on the inventory row.
Key lesson
  • READ COMMITTED is not a single standard — MVCC vs. lock-based implementations differ.
  • Always test isolation level behavior under concurrency on your specific DBMS version.
  • For financial integrity, use REPEATABLE READ or SERIALIZABLE, or apply explicit locking.
  • Test all check-then-act patterns under concurrent load — they're the number one source of write-skew bugs.
Production debug guideSymptom-to-action guide for deadlocks, long waiting transactions, and phantom reads4 entries
Symptom · 01
Application throws deadlock error (MySQL: ERROR 1213, PostgreSQL: 40P01)
Fix
Run SHOW ENGINE INNODB STATUS (MySQL) or query pg_locks with blocking info. Check which transactions hold and wait for locks. Redesign transaction order to access resources in the same sequence.
Symptom · 02
Queries suddenly slow, but CPU is idle
Fix
Check for lock waits: PostgreSQL: SELECT * FROM pg_stat_activity WHERE wait_event IS NOT NULL. MySQL: SHOW PROCESSLIST and look for 'Waiting for table level lock' or 'Lock wait timeout'.
Symptom · 03
Reads return inconsistent data across queries in same transaction
Fix
The isolation level may be READ COMMITTED. For consistent reads within a transaction, use REPEATABLE READ or SERIALIZABLE. Check current isolation with SHOW TRANSACTION ISOLATION LEVEL.
Symptom · 04
Disk space grows despite normal data size
Fix
Check MVCC bloat. PostgreSQL: SELECT n_dead_tup from pg_stat_user_tables. MySQL InnoDB: SHOW ENGINE INNODB STATUS and look for history list length. Tune VACUUM or purge threads.
★ Quick Debug Cheat Sheet for Concurrency ControlQuick commands and actions to diagnose common concurrency problems in PostgreSQL and MySQL.
Transaction hangs (no progress)
Immediate action
Check pg_stat_activity or SHOW PROCESSLIST
Commands
SELECT pid, wait_event, query FROM pg_stat_activity WHERE state = 'active' AND wait_event IS NOT NULL;
SELECT blocked.pid AS blocked_pid, blocking.pid AS blocking_pid FROM pg_catalog.pg_locks blocked JOIN pg_catalog.pg_locks blocking ON blocked.pid != blocking.pid AND blocked.relation = blocking.relation AND blocked.granted = false AND blocking.granted = true;
Fix now
Kill the blocking transaction: SELECT pg_terminate_backend(blocking_pid);
Deadlock error message+
Immediate action
Capture the deadlock details from logs or SHOW ENGINE INNODB STATUS
Commands
SHOW ENGINE INNODB STATUS; (MySQL) / Look at LATEST DETECTED DEADLOCK section
Review transaction order: ensure all transactions access rows in the same sequence (e.g., always update accounts in ascending order).
Fix now
Restructure code to minimize lock overlaps. Use lock timeouts: SET innodb_lock_wait_timeout = 5;
Phantom reads (same query returns different rows in same transaction)+
Immediate action
Check isolation level
Commands
SHOW TRANSACTION ISOLATION LEVEL; (or PostgreSQL: SHOW default_transaction_isolation)
If READ COMMITTED, consider REPEATABLE READ. In MySQL, REPEATABLE READ still allows phantoms for write operations? Actually InnoDB gap locking prevents phantoms at REPEATABLE READ.
Fix now
Set transaction isolation level to REPEATABLE READ or SERIALIZABLE for the session.
Write skew anomaly (invariant violation, no error)+
Immediate action
Analyze transaction checks that depend on overlapping read sets
Commands
Check transaction logs for concurrent updates to disjoint rows that break invariants. Use pg_stat_activity to capture running queries.
Write a concurrent test that simulates two transactions reading overlapping data and writing to different rows. Assert invariants after both commit.
Fix now
Add SELECT FOR UPDATE to the read set of critical check-then-act patterns, or switch to SERIALIZABLE for those transactions.
MVCC bloat – table size growing even after regular VACUUM (PostgreSQL)+
Immediate action
Check for long-running transactions preventing dead tuple cleanup
Commands
SELECT pid, now() - xact_start AS xact_duration, state, query FROM pg_stat_activity WHERE state NOT IN ('idle', 'idle in transaction') AND now() - xact_start > interval '5 minutes';
SELECT relname, n_dead_tup, n_live_tup, round(n_dead_tup::numeric/(n_live_tup+n_dead_tup+1)*100,2) AS dead_pct FROM pg_stat_user_tables ORDER BY dead_pct DESC LIMIT 10;
Fix now
Aggressively tune autovacuum: increase autovacuum_vacuum_scale_factor, set autovacuum_naptime lower. Consider manual VACUUM with VERBOSE to see progress.

Key takeaways

1
Snapshot isolation allows write skew, where two transactions read overlapping data and each writes based on a stale read, violating a constraint without any row conflict.
2
READ COMMITTED in PostgreSQL uses MVCC and does not prevent write skew; it is not equivalent to MySQL's lock-based READ COMMITTED.
3
Prevent write skew by using SELECT FOR UPDATE to lock the rows that define the invariant, or by moving to SERIALIZABLE isolation with SSI.
4
Lost updates remain the most common concurrency bug in production; always use atomic UPDATE or explicit locking instead of read-modify-write patterns.
5
Integration tests under concurrent load are required to catch write skew and other isolation anomalies; unit tests alone will not surface them.

Common mistakes to avoid

5 patterns
×

Assuming READ COMMITTED works the same across databases

Symptom
Write skew anomalies after migration between PostgreSQL and MySQL; check-then-act patterns fail under concurrent load.
Fix
Test isolation level behavior under concurrency on your specific DBMS. Use SELECT FOR UPDATE for critical check-then-act patterns.
×

Using SERIALIZABLE isolation globally for safety

Symptom
Throughput drops 50%+ under contention; serialization failures cause frequent retries and timeouts.
Fix
Use per-transaction isolation; apply SERIALIZABLE only to critical operations. Measure conflict rates before deciding.
×

Neglecting MVCC bloat monitoring

Symptom
Unexpected disk growth; query performance degrades over time; VACUUM or purge falls behind.
Fix
Monitor n_dead_tup (PostgreSQL) or history list length (MySQL). Tune autovacuum or purge threads. Set idle_in_transaction_session_timeout.
×

Holding locks longer than necessary (e.g., SELECT before business logic)

Symptom
High lock contention, long lock wait times, reduced concurrency.
Fix
Defer locking to just before the write. Measure lock-hold times; aim for <200ms per transaction.
×

Failing to retry with exponential backoff and jitter

Symptom
Thundering herd after a serialization failure; retry storms amplify contention.
Fix
Implement exponential backoff with random jitter (±25%). Add circuit breakers for sustained retry rates >10%.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
What is the difference between read committed and repeatable read isolat...
Q02SENIOR
Explain write skew and how to prevent it. Give a real-world scenario.
Q03SENIOR
How does MVCC work in PostgreSQL? What is the xmin/xmax mechanism?
Q04SENIOR
What causes MVCC bloat and how do you monitor it in production?
Q01 of 04SENIOR

What is the difference between read committed and repeatable read isolation levels? Provide examples of anomalies each prevents.

ANSWER
Read committed prevents dirty reads but allows non-repeatable reads and phantoms. Repeatable read prevents dirty and non-repeatable reads; in some implementations (like PostgreSQL) it also prevents phantoms via snapshot isolation, while MySQL uses gap locks. Example: Under read committed, re-querying a row within the same transaction may return a different value if another transaction commits an update. Under repeatable read, the same query returns the same snapshot.
FAQ · 3 QUESTIONS

Frequently Asked Questions

01
Can write skew occur at REPEATABLE READ?
02
Is it safe to use READ COMMITTED for financial transactions?
03
What's the difference between optimistic and pessimistic locking?
N
Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Written from production experience, not tutorials.

Follow
Verified
production tested
June 10, 2026
last updated
1,554
articles · all by Naren
🔥

That's DBMS. Mark it forged?

17 min read · try the examples if you haven't

Previous
Relational Algebra
5 / 11 · DBMS
Next
Indexing in DBMS