Junior 9 min · March 06, 2026

DBMS Deadlock — Lock Order Bugs That Freeze Payment Systems

Q: What is deadlock in DBMS in simple terms?

Deadlock in DBMS is a situation where two or more transactions are stuck waiting for each other to release locks, and none can proceed. It's like two cars at a four-way stop, each waiting for the other to go first. The database must abort one transaction to break the deadlock.

Q: Can deadlock be completely prevented?

Theoretically, yes — by breaking one of the four Coffman conditions. But most prevention methods (like predeclaration) reduce concurrency and throughput. In practice, most databases use detection + recovery, which allows deadlocks but breaks them fast.

Q: How does PostgreSQL handle deadlock recovery?

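PostgreSQL has no separate background detector. When a session has waited on a lock for longer than `deadlock_timeout` (default 1 second), that session runs the deadlock check itself: it walks the wait-for graph starting from its own lock request and, if it finds a cycle, aborts its own transaction with a 'deadlock detected' error (SQLSTATE 40P01). The PostgreSQL documentation notes that which transaction gets aborted is hard to predict and should not be relied upon. The aborted transaction is rolled back, its locks are released, and the application must retry.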

Q: What is the difference between deadlock and livelock?

In a deadlock, transactions are stuck waiting and make no progress. In a livelock, transactions are actively executing but never completing because they keep getting rolled back (e.g., due to repeated conflicts). Livelock is often caused by poor retry logic or contention on a hot resource.

Q: Does using optimistic concurrency control eliminate deadlocks?

Yes, because OCC doesn't use locks — conflicts are detected at commit time. However, OCC can cause high abort rates under contention, and retries can lead to livelock. It's a trade-off: zero deadlocks but less predictability.

Two procedures locked accounts in opposite order, spiking latency from 20ms to 60s.

Naren Founder & Principal Engineer

20+ years shipping production systems from the metal up. Notes here come from systems that actually shipped.

May 24, 2026

last updated

⚡Quick Answer

Deadlock is a permanent block when two or more transactions each hold a lock and wait for another's lock
Four Coffman conditions must hold simultaneously: mutual exclusion, hold-and-wait, no preemption, circular wait
Detection uses wait-for graphs; cycles indicate a deadlock
Prevention costs throughput — you break one Coffman condition but pay in performance
Recovery typically kills one victim transaction, rolling back its changes
Biggest mistake: assuming that start order (for example `depends_on` in an orchestration tool) prevents deadlocks. What matters is the order in which locks are acquired, not the order in which things start

✦ Definition~90s read

What is Deadlock in DBMS?

A deadlock requires exactly four conditions to hold simultaneously. That's not a simplification — it's a theorem from Coffman et al. (1971). Break any one, and the deadlock disappears. Here's each condition with a database example.

1. Mutual Exclusion: Resources (rows, tables, pages) can be held by only one transaction at a time. If you could share every resource, no problem. But writes need exclusive access.

2. Hold and Wait: A transaction holds at least one lock while waiting for another. This is the default behavior — acquire lock A, then request lock B without releasing A.

3. No Preemption: The database can't forcibly take a lock from a transaction. Only the transaction itself releases its locks (usually at commit or rollback).

4. Circular Wait: There's a cycle in the lock-wait graph. Transaction T1 waits for a lock held by T2, T2 waits for T3, ..., Tn waits for T1.

The practical insight: prevention strategies target one condition. For example, using a lock ordering protocol (e.g., always lock accounts by ascending ID) prevents circular wait without changing anything else. That's the most common production fix.
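The lock ordering idea can be tried outside a database. The following is a minimal sketch, assuming in-memory accounts where one `threading.Lock` per account id stands in for a row lock; the names `balances`, `row_locks`, `transfer`, and `run_opposite_transfers` are made up for illustration. Two workers move money in opposite directions, which is the pattern that deadlocks when each locks its own source account first.

```python
import threading

# Hypothetical in-memory accounts; one threading.Lock per id stands in for a row lock.
balances = {1: 1000, 2: 1000}
row_locks = {account_id: threading.Lock() for account_id in balances}

def transfer(from_id, to_id, amount):
    if from_id == to_id:
        return  # nothing to move, and locking the same row twice would block
    # Always lock the lower id first, whatever the direction of the transfer.
    first, second = sorted((from_id, to_id))
    with row_locks[first], row_locks[second]:
        balances[from_id] -= amount
        balances[to_id] += amount

def run_opposite_transfers(rounds=1000):
    """Run 1 -> 2 and 2 -> 1 transfers concurrently; True if both workers finish."""
    def worker(src, dst):
        for _ in range(rounds):
            transfer(src, dst, 1)

    workers = [
        threading.Thread(target=worker, args=(1, 2), daemon=True),
        threading.Thread(target=worker, args=(2, 1), daemon=True),
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join(timeout=10)
    return all(not t.is_alive() for t in workers)
```

Because both workers request lock 1 before lock 2, neither can hold one lock while waiting for the other in the opposite order, so no cycle can form.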

Plain-English First

Imagine two kids at a dinner table. Kid A grabs the ketchup and waits for Kid B to pass the salt. Kid B already grabbed the salt and is waiting for Kid A to pass the ketchup. Neither will let go. Neither can move forward. That's a deadlock — two parties permanently frozen, each holding something the other needs. In a database, transactions are the kids and locks on data rows are the condiments.

Any production database that runs concurrent write transactions will eventually hit a deadlock. An occasional deadlock is not a sign that something is broken; it is an expected outcome when multiple transactions compete for overlapping resources. What separates a system that handles deadlocks gracefully from one that freezes under load is understanding the four conditions that create them, the algorithms that detect them, and the prevention strategies that trade throughput for safety. Deadlocks at peak traffic are a familiar source of outages in payment and other high-volume systems. This isn't academic: it's survival knowledge for anyone building data-intensive systems.

The root problem is that databases need locks to guarantee isolation, one of the four ACID properties. But the moment you have locks, you have the possibility of circular waiting. Two transactions each hold a resource and each wants a resource the other holds. Without intervention, they'll wait forever. The database has to pick a strategy: never let it happen (prevention), detect it after the fact and break the cycle (detection + recovery), or give up on a lock request after a timeout and let the caller handle the error (timeouts). Each strategy has real performance costs.

By the end of this article you'll be able to draw a resource allocation graph and spot a deadlock cycle in it, explain exactly how PostgreSQL's deadlock detector works and when it fires, write transaction code that minimises deadlock probability, and answer the trick interview questions about why deadlock prevention can be worse for throughput than detection. You'll also understand the four Coffman conditions and why you only need to break one of them to eliminate deadlocks entirely.

What is Deadlock in DBMS? The Four Coffman Conditions

deadlock_conditions.sql (SQL)

-- Simulate hold-and-wait with circular wait
-- Transaction A
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- holds lock on row 1
-- now wait for row 2
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- Transaction B (run concurrently)
BEGIN;
UPDATE accounts SET balance = balance - 200 WHERE id = 2;  -- holds lock on row 2
-- now wait for row 1
UPDATE accounts SET balance = balance + 200 WHERE id = 1;
COMMIT;

-- If both run simultaneously, deadlock occurs.
-- PostgreSQL detects it after deadlock_timeout (default 1s) and aborts one transaction.

The Circular Wait Mental Model

Each transaction holds a lock (a resource), and waits for another. The graph of 'holds' and 'waits' must contain a cycle.
The cycle is the deadlock. If there's no cycle, no deadlock — even if transactions wait a long time.
Breaking any edge in the cycle (by killing a transaction or releasing a lock) resolves the deadlock.

Production Insight

In production, deadlocks are not limited to two transactions. Three or four transactions can form a longer cycle, which is much harder to spot by reading code.

Always log the wait-for graph when a deadlock is detected. That's your map to the fix.

The fix is almost always consistent lock ordering, not taking more or coarser locks.

Key Takeaway

Deadlocks need all four Coffman conditions.

Break any one — usually circular wait via lock ordering.

Deadlocks are a design problem, not a bug.

Deadlock Detection Algorithms: Wait-For Graph and Wound-Wait

Detection is the most common real-world strategy. Instead of preventing deadlocks (which costs throughput), allow them to happen, but catch them quickly and break them. The standard detection mechanism is the wait-for graph; the timestamp schemes wound-wait and wait-die are covered alongside it because they are the usual alternative.

Wait-For Graph (WFG): The database maintains a directed graph where nodes are transactions, and an edge Ti → Tj means Ti is waiting for a lock held by Tj. Periodically, or when a lock wait exceeds a threshold, the system checks for cycles. If a cycle exists, it chooses a victim transaction and aborts it (preferably the one that is cheapest to roll back).
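The cycle check on a wait-for graph can be sketched as a depth-first search. This is an illustrative sketch, not the code of any particular database engine; the function name `find_cycle` and the dictionary representation (transaction id mapped to the ids it waits on) are assumptions made for the example.

```python
def find_cycle(wait_for):
    """Return one cycle in the wait-for graph as a list of transaction ids, or None.

    wait_for maps each transaction id to the ids of the transactions it waits on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    colour = {}
    path = []

    def visit(txn):
        colour[txn] = GREY
        path.append(txn)
        for blocker in wait_for.get(txn, ()):
            state = colour.get(blocker, WHITE)
            if state == GREY:
                # The blocker is already on the current path: the path loops back to it.
                return path[path.index(blocker):]
            if state == WHITE:
                cycle = visit(blocker)
                if cycle:
                    return cycle
        path.pop()
        colour[txn] = BLACK
        return None

    for txn in list(wait_for):
        if colour.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None
```

For example, `find_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]})` reports the three-transaction cycle, while a plain waiting chain with no edge back returns `None`.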

Example: PostgreSQL does not run a background detector. A session that has waited on a lock for longer than deadlock_timeout (default 1 second) runs the check itself: it builds the wait-for graph from the lock manager's state, starting at its own request, and searches for a cycle. If it finds one, that session aborts its own transaction. The error message lists the processes and lock requests involved in the cycle.

MySQL/InnoDB uses a different approach: with innodb_deadlock_detect enabled (the default), it runs detection as soon as a lock request has to wait. It automatically chooses a transaction to roll back, preferring small transactions, where size is measured by the number of rows inserted, updated, or deleted. You can see the last deadlock with SHOW ENGINE INNODB STATUS.

Wound-Wait and Wait-Die: Strictly speaking these are prevention schemes rather than detection. They use transaction age (timestamp) so that a cycle can never form, and older transactions get priority. In Wound-Wait, if an older transaction requests a lock held by a younger one, it 'wounds' the younger (aborts it); a younger requester waits. In Wait-Die, an older requester waits, and a younger requester dies (aborts and retries). These can cause more aborts than simple WFG detection because they abort on potential conflicts, not only on actual cycles.

check_wait_for_graph.sql (SQL)

-- PostgreSQL 9.6+: list each blocked session with the sessions blocking it.
-- pg_blocking_pids() also covers row-level waits, which show up in pg_locks
-- as transactionid locks and are missed by a join on relation/objid.
SELECT
    blocked.pid    AS blocked_pid,
    blocked.query  AS blocked_query,
    blocking.pid   AS blocking_pid,
    blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
CROSS JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS b(pid)
JOIN pg_stat_activity AS blocking ON blocking.pid = b.pid;

-- This shows pairs (blocked, blocking). If the pairs form a cycle, you have a deadlock.

Detection Overhead

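Building and searching a wait-for graph costs CPU and contends for lock-manager latches. PostgreSQL runs the check only after a session has already waited for deadlock_timeout (default 1s), so short lock waits never pay for it. InnoDB checks whenever a lock request has to wait. Both are efficient enough for most workloads; on very high-concurrency MySQL systems, the documentation suggests it can be more efficient to disable innodb_deadlock_detect and rely on innodb_lock_wait_timeout instead.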

Production Insight

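Detection only helps if you know about it. Many production outages drag on because deadlock detection is disabled (possible in MySQL) or the timeout is set too high.

In PostgreSQL, deadlock_timeout is how long a session waits on a lock before it runs the deadlock check. Setting it lower reports real deadlocks sooner but spends CPU checking ordinary lock waits; the check never aborts a transaction that is merely waiting. The PostgreSQL documentation suggests a value that exceeds your typical transaction time.

Rule: start with 1 second, then tune based on your p99 transaction latency.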

Key Takeaway

Wait-for graph detection is the industry standard.

A low timeout catches deadlocks fast but spends more CPU on checks for ordinary lock waits.

Always log the full deadlock detail for post-mortem analysis.

Deadlock Prevention Strategies: Breaking the Coffman Conditions

Prevention eliminates deadlocks by design — but it costs throughput. Here are the four strategies, each targeting one Coffman condition:

1. Prevent Mutual Exclusion: That's impossible for write operations. MVCC lets readers and writers avoid blocking each other, and optimistic concurrency control defers conflict detection to commit time. But two writers still can't modify the same row at the same time: one of them has to wait or abort.

2. Prevent Hold-and-Wait: Require a transaction to acquire all locks it will ever need upfront (predeclaration). In a database, this is often done by starting the transaction and locking all required rows in a single statement (e.g., SELECT ... FOR UPDATE for all needed rows in a consistent order). With NOWAIT, the statement fails immediately if any lock is unavailable instead of waiting. This reduces deadlock risk but increases lock contention and reduces concurrency.

3. Prevent No Preemption: Letting the database forcibly abort a transaction once a deadlock is detected is detection and recovery, not prevention. True prevention here would mean taking locks away from a running transaction before any cycle forms, which is what wound-wait does by aborting the younger lock holder; mainstream relational databases don't offer it as a standard feature.

4. Prevent Circular Wait: Enforce a global ordering of resources. For example, always lock rows in ascending primary key order. If every transaction follows the same order, cycles cannot form. This is the most practical prevention technique and widely used in production.

Trade-off: Prevention via predeclaration (hold-and-wait) reduces concurrency dramatically. Prevention via lock ordering (circular wait) is cheap but requires discipline across the entire codebase. Most teams go with detection + recovery because prevention is complex to enforce.
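Breaking hold-and-wait comes down to "take every lock without waiting, or take none". The sketch below illustrates that rule with in-process `threading.Lock` objects standing in for row locks; `acquire_all_or_none` and `release_all` are hypothetical helper names, and the behaviour is only loosely analogous to locking all rows with NOWAIT.

```python
import threading

def acquire_all_or_none(locks):
    """Try to take every lock without waiting; on any failure release what was taken."""
    taken = []
    for lock in locks:
        if lock.acquire(blocking=False):
            taken.append(lock)
        else:
            # Never wait while holding something: give back the partial set and report failure.
            for held in reversed(taken):
                held.release()
            return False
    return True

def release_all(locks):
    """Release a full set previously obtained with acquire_all_or_none."""
    for lock in reversed(locks):
        lock.release()
```

A caller that gets `False` back holds nothing, so it cannot be part of a cycle; the cost is that it has to back off and try again, which is where the lost concurrency comes from.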

io/thecodeforge/db/LockOrderingExample.java (JAVA)

// TheCodeForge — Lock ordering to prevent deadlock
package io.thecodeforge.db;

import java.sql.*;

public class LockOrderingExample {
    // Always lock accounts in ascending ID order
    public static void transferFunds(Connection conn, int fromId, int toId, double amount) throws SQLException {
        int lowId = Math.min(fromId, toId);
        int highId = Math.max(fromId, toId);

        conn.setAutoCommit(false);
        try {
            // Lock low ID first, then high ID
            lockAccount(conn, lowId);
            lockAccount(conn, highId);

            // Perform transfer
            updateBalance(conn, fromId, -amount);
            updateBalance(conn, toId, amount);

            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(true);
        }
    }

    private static void lockAccount(Connection conn, int accountId) throws SQLException {
        String sql = "SELECT balance FROM accounts WHERE id = ? FOR UPDATE";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, accountId);
            ps.executeQuery();
        }
    }

    private static void updateBalance(Connection conn, int accountId, double delta) throws SQLException {
        String sql = "UPDATE accounts SET balance = balance + ? WHERE id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDouble(1, delta);
            ps.setInt(2, accountId);
            ps.executeUpdate();
        }
    }
}

Lock Ordering Is Not Foolproof

Lock ordering prevents circular wait only if every transaction follows the order. A single transaction that locks in reverse order can still cause a deadlock. Use code reviews and static analysis to enforce the rule.

Production Insight

Prevention via predeclaration often backfires. You end up holding locks longer, increasing contention and reducing throughput.

Lock ordering is low overhead but requires organisation-wide discipline.

A common mix: lock ordering for the hot paths, detection as a safety net.

Key Takeaway

Lock ordering prevents circular wait at low cost.

Predeclaration of all locks reduces concurrency too much.

Deadlock detection + recovery is often the best trade-off.

Deadlock Recovery: Choosing the Victim and Rolling Back

When a deadlock is detected, the database must break the cycle by aborting one or more transactions (the 'victim'). The victim is chosen based on a cost metric:

Transaction age: Younger transactions are cheaper to abort (less work done).
Number of locks held: Fewer locks means less rollback work.
Work done (e.g., rows modified): Rollback of a large transaction costs more I/O and CPU.
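The cost criteria above can be combined into a single ordering. The following sketch is hypothetical and does not reproduce the rule of any specific engine: the `TxnInfo` fields and the tie-breaking order (rows modified, then locks held, then youngest) are assumptions chosen to mirror the list.

```python
from dataclasses import dataclass

@dataclass
class TxnInfo:
    txn_id: str
    rows_modified: int
    locks_held: int
    start_time: float  # larger value = started later = younger

def choose_victim(cycle):
    """Pick the transaction in the cycle that looks cheapest to roll back."""
    # Fewest modified rows first, then fewest locks, then the youngest transaction.
    return min(cycle, key=lambda t: (t.rows_modified, t.locks_held, -t.start_time))
```

With this ordering a long-running batch that has modified thousands of rows survives, and a small transaction that has barely started is the one asked to retry.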

PostgreSQL's victim selection: there is no cost-based choice. The session whose deadlock check finds the cycle aborts its own transaction, so the victim depends on timing. The documentation states that which transaction is aborted is difficult to predict and should not be relied upon.

MySQL's victim selection: InnoDB tries to roll back the smallest transaction, where size is determined by the number of rows inserted, updated, or deleted.

Once a victim is chosen, the database rolls back that transaction entirely. All its locks are released, allowing the other waiting transaction(s) to proceed. The application receives an error: SQLSTATE 40P01 (deadlock_detected) in PostgreSQL, error 1213 (SQLSTATE 40001) in MySQL. The application must retry.

Retry considerations

Immediate retry often causes the same deadlock. Use exponential backoff with jitter.
Track the number of retries; after 3-5, escalate to a human or a dead letter queue.
Retry at the transaction boundary, not inside it — you want a fresh transaction to avoid leftover locks.

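io/thecodeforge/db/retry_on_deadlock.py (PYTHON)

# TheCodeForge: deadlock retry with exponential backoff
import random
import time

from sqlalchemy import create_engine, exc, text

DEADLOCK_SQLSTATE = "40P01"  # PostgreSQL deadlock_detected

def run_with_deadlock_retry(engine, work, max_retries=5):
    """Run work(conn) in a fresh transaction per attempt, retrying only on deadlock."""
    for attempt in range(max_retries):
        try:
            # engine.begin() opens a new connection and transaction for each attempt,
            # commits on success and rolls back if work() raises.
            with engine.begin() as conn:
                return work(conn)
        except exc.OperationalError as e:
            # psycopg2 exposes the SQLSTATE as pgcode; fall back to the message text.
            is_deadlock = (getattr(e.orig, "pgcode", None) == DEADLOCK_SQLSTATE
                           or "deadlock detected" in str(e))
            if not is_deadlock or attempt == max_retries - 1:
                raise
            # Exponential backoff plus jitter so the retries do not collide again
            time.sleep(0.1 * (2 ** attempt) + random.uniform(0, 0.1))

# Usage
engine = create_engine('postgresql://user:pass@localhost/db')
run_with_deadlock_retry(
    engine,
    lambda conn: conn.execute(text("UPDATE accounts SET balance = balance - 100 WHERE id = 1")),
)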

Retry with Caution

Don't retry indefinitely. Each retry consumes resources and may worsen contention. Use a maximum retry count and push failed transactions to a dead letter queue for manual inspection.

Production Insight

Database logs during a deadlock storm can be overwhelming. Use structured logging and scrape the deadlock details programmatically.

The retry loop can become a thundering herd. Use jitter and maybe a distributed lock to serialise retries.

Rule: never retry more than 5 times, and always log the deadlock details before retrying.

Key Takeaway

Victim selection is engine-specific; don't rely on which transaction gets aborted.

Retry with exponential backoff + jitter.

Monitor deadlock frequency — frequent deadlocks indicate a design flaw.

MVCC and Optimistic Concurrency Control: Avoiding Locks Altogether

Multi-Version Concurrency Control (MVCC) doesn't eliminate locks, but it changes the game. In MVCC, readers don't block writers and writers don't block readers — each transaction sees a snapshot of the database (a version). Conflicts only arise on write-write collisions. Deadlocks still happen, but they're less frequent because read operations don't acquire locks.

In PostgreSQL's MVCC (snapshot isolation), a write-transaction that updates a row locks that row. If another write-transaction tries to update the same row, it waits. If there's a cycle, deadlock occurs. So MVCC reduces the chance of deadlock by eliminating reads from the lock picture, but write-write contention remains.

Optimistic Concurrency Control (OCC): No locks at all during the transaction. All operations are applied locally to a private copy. At commit time, the database checks if a conflict occurred (e.g., via a version number or timestamp). If yes, the transaction is aborted and retried. OCC completely eliminates deadlocks because there are no locks! But the cost is high abort rate under contention.

In practice, most relational databases combine MVCC for reads with pessimistic row locks for writes. In-memory databases and some NoSQL systems use OCC. The trade-off is clear: OCC has zero deadlock risk but can waste work on aborts under heavy contention. MVCC with write locks has lower abort rates but carries a risk of deadlock.

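occ_version_check.sql (SQL)

-- OCC-style version check, written as a PL/pgSQL block (PostgreSQL)
-- Assumes table accounts has an integer column 'version'
DO $$
DECLARE
    v_bal numeric; v_ver integer; v_rows integer;
BEGIN
    SELECT balance, version INTO v_bal, v_ver FROM accounts WHERE id = 1;
    -- Perform local calculation, then write only if the version is unchanged
    UPDATE accounts SET balance = v_bal - 100, version = v_ver + 1
     WHERE id = 1 AND version = v_ver;
    GET DIAGNOSTICS v_rows = ROW_COUNT;
    IF v_rows = 0 THEN  -- conflict: abort so the caller can retry
        RAISE EXCEPTION 'version conflict on account 1' USING ERRCODE = '40001';
    END IF;
END $$;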
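The version check is only half of the pattern; the caller also needs a bounded retry loop that re-reads the row after a conflict. Here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory table, chosen only so the example runs anywhere. The function name `withdraw_optimistic` and the table layout are assumptions, and the account row is assumed to exist.

```python
import sqlite3

def withdraw_optimistic(conn, account_id, amount, max_retries=5):
    """Optimistic update: re-read and retry when the version check fails."""
    for _ in range(max_retries):
        balance, version = conn.execute(
            "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        cur = conn.execute(
            "UPDATE accounts SET balance = ?, version = ? WHERE id = ? AND version = ?",
            (balance - amount, version + 1, account_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True  # the version we read was still current
        # rowcount == 0: another writer bumped the version; loop to re-read and retry
    return False

# Demo setup: one account with balance 500 at version 0
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 500, 0)")
conn.commit()
```

A production version would add backoff with jitter between attempts, for the same reason as the deadlock retry loop: immediate retries under contention tend to collide again.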

MVCC vs OCC: A Quick Comparison

In practice, MVCC (snapshot isolation) is the standard in PostgreSQL, Oracle, MySQL with InnoDB. OCC is common in distributed databases like FoundationDB. Choose based on your contention pattern — low contention favours OCC for zero deadlock risk, high contention favours MVCC for lower abort rates.

Production Insight

Under high contention (e.g., a hot row like a counter), even MVCC can cause frequent deadlocks. Consider moving the counter to an atomic counter in Redis.

OCC with version checks can still cause livelock (retry forever). Use exponential backoff and a max retry count.

Rule: if you see deadlocks on single rows, use optimistic locking with version columns and retry — it's a pattern that works.

Key Takeaway

MVCC reduces but doesn't eliminate deadlocks.

OCC eliminates deadlocks but increases aborts.

Choose based on contention level — design for your workload.

Deadlock Avoidance vs. Prevention: Why the Banker’s Algorithm Never Got Hired

Most teams confuse prevention with avoidance. Prevention breaks Coffman conditions upfront. Avoidance is smarter: it checks whether granting a lock keeps the system in a safe state where all transactions can eventually complete. The textbook example is the Banker’s Algorithm: resources are like cash in a vault, and each transaction declares its max need upfront. The system only approves a request if it can still satisfy all remaining demands after the grant. In practice, real databases rarely implement this. Why? Because transactions don’t know their final resource requirements in advance, and the CPU cost of running the safety check on every lock request kills throughput. The simpler alternative is a timestamp scheme: wait-die or wound-wait. In wait-die, older transactions wait; younger ones die and retry. In wound-wait, older transactions force younger ones to abort. Both guarantee no deadlocks without needing a full safety check. Google Spanner, for example, uses wound-wait for its read-write transactions. SQL Server does not use either scheme: it relies on a background lock monitor that searches for cycles and picks a victim by DEADLOCK_PRIORITY and rollback cost. The lesson: perfect avoidance is academic. Timestamp-based schemes are the pragmatic middle ground.
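To make the "safe state" idea concrete, here is a small sketch of the safety check at the heart of the Banker’s Algorithm. It is the textbook procedure rather than anything a production database runs; the function name `is_safe` and the list-of-lists layout are choices made for this example.

```python
def is_safe(available, allocation, max_need):
    """Banker's safety check: True if some order lets every transaction finish.

    available:  free units per resource type, e.g. [3, 3, 2]
    allocation: units currently held, one row per transaction
    max_need:   declared maximum demand, one row per transaction
    """
    work = list(available)
    need = [[m - a for m, a in zip(mx, al)] for mx, al in zip(max_need, allocation)]
    finished = [False] * len(allocation)

    progressed = True
    while progressed:
        progressed = False
        for i, row in enumerate(need):
            if not finished[i] and all(n <= w for n, w in zip(row, work)):
                # Transaction i can run to completion and hand back what it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progressed = True
    return all(finished)
```

An avoidance scheme would run this check on the state that would result from granting a request and refuse the grant if the answer is `False`. The nested loop over every transaction on every request is the CPU cost the paragraph above refers to.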

DeadlockAvoidanceExample.sql (SQL)

-- io.thecodeforge
-- Demonstration: Wait-Die vs Wound-Wait behavior
-- Assume transaction T1 (older, TS=10) holds a lock on TableA
-- Transaction T2 (younger, TS=20) requests a conflicting lock

-- Wait-Die:
-- T2 (younger requester) dies immediately, releases any held locks, retries later
-- T1 (older) keeps its lock and continues

-- Wound-Wait:
-- T2 (younger requester) waits for T1 to finish
-- In the reverse case (T2 holds the lock, T1 requests it), T1 wounds T2:
-- T2 is aborted and T1 acquires the lock

-- Simulation of the decision only. SQL Server does not implement these schemes,
-- so the timestamps below are plain variables, not engine state.
DECLARE @requester_ts BIGINT = 20; -- T2, younger (larger timestamp)
DECLARE @holder_ts    BIGINT = 10; -- T1, older, holds the lock

SELECT
    CASE WHEN @requester_ts < @holder_ts
         THEN 'requester waits' ELSE 'requester aborts and retries' END AS wait_die,
    CASE WHEN @requester_ts < @holder_ts
         THEN 'holder is aborted (wounded)' ELSE 'requester waits' END AS wound_wait;

Output

wait_die = 'requester aborts and retries', wound_wait = 'requester waits'

Production Trap:

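Be careful with wait-die when long-running older transactions hold hot locks. Every younger transaction that requests one of those locks dies and retries, possibly many times, until the older transaction finishes. Restarted transactions must keep their original timestamp so they eventually become the oldest; otherwise they can starve. Wound-wait usually aborts less in OLTP workloads because a younger requester simply waits.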

Key Takeaway

Do not fight deadlocks with brute force. Use timestamp ordering: older transactions win, younger ones retry.

Real-World Deadlock Hunting: Trace Flags, Extended Events, and the Profiler Graph

When a deadlock hits production at 3 AM, you don’t guess. You need forensic evidence. SQL Server gives you three weapons. First, Trace Flag 1222 writes deadlock details to the SQL Server Error Log in an XML-like format. Enable it with DBCC TRACEON(1222, -1). The log shows you the victim transaction, the resources involved (like RID, KEY, or PAGE locks), and the SQL statements each process was running. Second, Extended Events capture deadlock graphs with very low overhead. Create a session on the xml_deadlock_report and database_xml_deadlock_report events. You get structured XML you can query. The built-in system_health session already records xml_deadlock_report, so recent deadlocks are often available even before you set anything up. Third, the Profiler Deadlock Graph event produces a visual graph of ovals (processes) and rectangles (resources) with arrows showing wait chains. In practice, I start with Extended Events because it doesn’t require a server restart and runs 24/7. Parse the XML to find patterns: transactions accessing the same tables in opposite order, or queries with missing indexes causing table scans. Once you see the graph, the fix is usually obvious: reorder access, add an index, or switch to read committed snapshot isolation.

DeadlockCapture.sql (SQL)

-- io.thecodeforge
-- Step 1: Enable Trace Flag 1222 for deadlock info in error log
DBCC TRACEON(1222, -1);
GO
-- Step 2: Query the error log for deadlock reports (SQL Server 2012+)
EXEC xp_readerrorlog 0, 1, N'deadlock', N'', N'2025-01-01', N'2025-12-31', 'asc';
GO

-- Step 3: Create an Extended Events session (lightweight, always-on)
CREATE EVENT SESSION [DeadlockMonitor] ON SERVER 
ADD EVENT sqlserver.xml_deadlock_report (
    ACTION (sqlserver.sql_text, sqlserver.session_id, sqlserver.client_hostname)),
ADD EVENT sqlserver.database_xml_deadlock_report (
    ACTION (sqlserver.sql_text))
ADD TARGET package0.event_file (SET filename = N'C:\DeadlockLogs\DeadlockMonitor.xel')
WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, 
       MAX_DISPATCH_LATENCY=30 SECONDS);
ALTER EVENT SESSION [DeadlockMonitor] ON SERVER STATE = START;
GO

Output

(Creates session and starts logging)

Senior Dev Tip:

Parse the deadlock XML with XQuery. Look for '<victim-list>' to see who got killed, then '<process-list>' for the victim’s last query. That’s usually the bug.
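If the deadlock XML has been exported from SQL Server, the same lookup can be done outside the database. The sketch below uses Python's standard `xml.etree.ElementTree` instead of XQuery. The `SAMPLE` document is hand-written and heavily trimmed to show only the elements mentioned above; real reports carry many more attributes and a `<resource-list>` section. `victim_queries` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample shaped like a SQL Server deadlock report (illustrative only).
SAMPLE = """
<deadlock>
  <victim-list>
    <victimProcess id="process2"/>
  </victim-list>
  <process-list>
    <process id="process1" spid="55">
      <inputbuf>UPDATE accounts SET balance = balance - 100 WHERE id = 1</inputbuf>
    </process>
    <process id="process2" spid="61">
      <inputbuf>UPDATE accounts SET balance = balance - 200 WHERE id = 2</inputbuf>
    </process>
  </process-list>
</deadlock>
"""

def victim_queries(deadlock_xml):
    """Return {victim process id: last statement text} for one deadlock report."""
    root = ET.fromstring(deadlock_xml.strip())
    victims = {v.get("id") for v in root.findall("./victim-list/victimProcess")}
    result = {}
    for proc in root.findall("./process-list/process"):
        if proc.get("id") in victims:
            result[proc.get("id")] = (proc.findtext("inputbuf") or "").strip()
    return result
```

Running this over a batch of exported reports and counting the statements that show up as victims is a quick way to see which code path keeps losing.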

Key Takeaway

Do not debug deadlocks blind. Enable Trace Flag 1222 and Extended Events before the problem happens. The graph tells you the victim and the conflict.

Handling Deadlocks in Code: TRY...CATCH and Retry Logic That Won’t Burn CPU

You cannot eliminate deadlocks entirely. You handle them. The universal pattern is to catch error 1205 (deadlock victim) in SQL Server and retry. But naive retry kills performance. Here’s the correct approach: wrap each transaction in a TRY...CATCH block. On deadlock, wait with exponential backoff plus a random component to avoid colliding in the same cycle again, then retry up to 3 times. If it fails after that, log it and escalate. The worst retry strategy is immediate retry: you just hit the same deadlock instantly. The best strategy adds jitter. In T-SQL, WAITFOR DELAY accepts a literal or a variable, not an expression, so build the delay string first: DECLARE @delay CHAR(12) = '00:00:00.' + RIGHT('000' + CAST(CAST(RAND() * 500 AS INT) AS VARCHAR(3)), 3); WAITFOR DELAY @delay;. Also, keep transactions short. Long transactions are deadlock magnets. If you are using Entity Framework, keep the command timeout low and catch SqlException with Number 1205. In Java with JDBC, catch SQLException and check getErrorCode() == 1205. And never, ever put user input prompts inside a transaction. That’s the classic production lock pile-up: the user goes to lunch while holding locks.

DeadlockRetry.cs (CSHARP)

// io.thecodeforge
using System;
using System.Data.SqlClient;
using System.Threading;

public class DeadlockHandler
{
    public static void ExecuteWithRetry(string connectionString, string commandText)
    {
        int maxRetries = 3;
        int retryCount = 0;
        int baseDelayMs = 100;
        Random rng = new Random();

        while (retryCount < maxRetries)
        {
            try
            {
                using (var conn = new SqlConnection(connectionString))
                {
                    conn.Open();
                    using (var cmd = new SqlCommand(commandText, conn))
                    {
                        cmd.CommandTimeout = 30; // seconds; 30 is the default, lower it for short OLTP statements
                        cmd.ExecuteNonQuery();
                    }
                }
                return; // success, exit loop
            }
            catch (SqlException ex) when (ex.Number == 1205) // deadlock victim
            {
                retryCount++;
                if (retryCount >= maxRetries)
                {
                    // Log and escalate
                    throw new InvalidOperationException("Deadlock persisted after retries", ex);
                }
                // Exponential backoff with random jitter
                int delayMs = (int)Math.Pow(2, retryCount) * baseDelayMs + rng.Next(0, 200);
                Thread.Sleep(delayMs);
            }
        }
    }
}

Output

(Executes query, retries on deadlock, throws after 3 failures)

Production Trap:

Do not retry without jitter. Two transactions retrying on the same second will deadlock again. Add a random delay component to break the cycle.

Key Takeaway

Catch error 1205, wait with exponential backoff plus jitter, retry max 3 times. Short transactions are your best prevention.

● Production incident · POST-MORTEM · severity: high

The Midnight Payment Jam: A Classic Deadlock Cascade

Symptom

All payment threads hung. API latency went from 20ms to 60 seconds. The database's pg_stat_activity showed dozens of sessions stuck in open transactions, waiting on Lock events.

Assumption

The team assumed it was a network issue or a slow query. They killed connections blindly, which only rolled back transactions and then replayed them, recreating the deadlock.

Root cause

Two stored procedures updated accounts in opposite lock order. Transaction A locked account 1 then waited for account 2; Transaction B locked account 2 then waited for account 1. Classic circular wait.

Fix

Changed both procedures to acquire locks in a consistent global order (ascending account ID). Also added a short lock timeout (2 seconds) so no transaction hangs forever.

Key lesson

Always acquire locks in a fixed, documented order across your entire codebase.
Add a lock timeout as a safety net — never rely solely on deadlock detection.
Retry logic on deadlock must include exponential backoff and jitter, not immediate replay.

Production debug guide: Symptom → Action for diagnosing deadlocks in production

Symptom · 01

Application hangs with high response times, no obvious CPU spike

→

Fix

Check pg_stat_activity (PostgreSQL) or SHOW PROCESSLIST (MySQL). In PostgreSQL, look for sessions with wait_event_type = 'Lock' and for 'idle in transaction' sessions that may be holding the locks they wait on.

Symptom · 02

Deadlock error message in application logs (e.g. 'deadlock detected')

→

Fix

Enable deadlock logging in the database. In PostgreSQL, set log_lock_waits = on and deadlock_timeout = 1s. Collect the deadlock details from the log.

Symptom · 03

Frequent deadlocks on the same table pair

→

Fix

Analyze the lock order in your transactions. Query pg_locks joined to pg_stat_activity (or use pg_blocking_pids) to find which sessions hold conflicting locks. Then refactor to use a consistent order.

Symptom · 04

Deadlocks spike after code deploy

→

Fix

Roll back the deploy first to restore service. Then review the new transaction code for lock order violations. Use a linter or code review checklist.

★ Deadlock Quick Cheat Sheet: Commands and immediate actions to diagnose and break deadlocks in production without restarting the database.

Application threads stuck, no progress

Immediate action

Identify blocking sessions

Commands

SELECT pid, wait_event, state, query FROM pg_stat_activity WHERE wait_event_type = 'Lock';

SELECT blocked.pid AS blocked_pid, blocking.pid AS blocking_pid FROM pg_locks blocked JOIN pg_locks blocking ON blocked.locktype = blocking.locktype AND blocked.database = blocking.database AND blocked.relation = blocking.relation AND blocked.objid = blocking.objid WHERE NOT blocked.granted AND blocking.granted;

Fix now

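Cancel the blocking session's current query first: SELECT pg_cancel_backend(<blocking_pid>); If that session is idle in transaction and still holds its locks, end it: SELECT pg_terminate_backend(<blocking_pid>);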

Deadlock detected in logs

Retry loop making deadlocks worse

Deadlock Management Strategies Comparison

Strategy	How It Works	Concurrency Impact	Deadlock Risk	Typical Use Case
Prevention (Lock Ordering)	All transactions acquire locks in a fixed order	Low overhead if done right	Zero (theoretically)	Any system with well-known access patterns
Prevention (Predeclaration)	Acquire all locks upfront	Reduces concurrency – locks held longer	Zero	Batch processing with known resource sets
Detection + Recovery	Allow deadlocks, detect via WFG, abort victim	Normal (locks held until detection timeout)	Non-zero but manageable	Most online transaction processing (OLTP)
Prevention (Wait-Die/Wound-Wait)	Use transaction age (timestamp) to decide who waits and who aborts	High abort rates in skewed age scenarios	Zero (but high aborts)	Systems with predictable transaction lifetimes
Optimistic Concurrency Control (OCC)	No locks; validate at commit	Very high concurrency if low contention	Zero (no locks)	Low contention, read-heavy workloads
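
The wait-die and wound-wait row in the table comes down to one comparison of transaction timestamps. A minimal sketch, assuming each transaction carries a start timestamp where a smaller value means an older transaction; the function names and return strings are illustrative only:

```python
def wait_die(requester_ts: int, holder_ts: int) -> str:
    """Non-preemptive: an older requester waits; a younger requester aborts itself."""
    return "wait" if requester_ts < holder_ts else "abort_requester"


def wound_wait(requester_ts: int, holder_ts: int) -> str:
    """Preemptive: an older requester aborts the holder; a younger requester waits."""
    return "abort_holder" if requester_ts < holder_ts else "wait"
```

In both schemes waiting is only ever allowed in one direction by age (older waits for younger in wait-die, younger waits for older in wound-wait), so a cycle of waiters cannot form. An aborted transaction normally restarts with its original timestamp so that it eventually becomes the oldest and gets through.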

Key takeaways

Deadlocks require all four Coffman conditions; break any one to eliminate them entirely.

Lock ordering (consistent order) is the cheapest prevention strategy for production systems.

Detection + recovery (with a short deadlock_timeout) is the standard approach in most databases.

Retry logic must use exponential backoff with jitter to avoid cascading deadlocks.

MVCC reduces but doesn't eliminate deadlocks; write-write conflicts still cause them.

Monitor deadlock frequency; frequent deadlocks are a design smell, not a normal occurrence.

Common mistakes to avoid

5 patterns

Assuming deadlocks will never happen in your DB

Symptom

No monitoring, no deadlock_timeout configuration, and no retry logic. When deadlock strikes, all transactions hang and the app becomes unresponsive.

Fix

Set a reasonable deadlock_timeout (e.g., 1 second) and implement retry logic with exponential backoff. Monitor deadlock frequency with alerts.

Using `NOWAIT` or `SKIP LOCKED` everywhere to avoid waits

Symptom

Transactions fail with 'could not obtain lock' errors even when waiting would be fine. Throughput drops because too many transactions abort and retry.

Fix

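Use NOWAIT only when you truly cannot wait (e.g., an interactive request that must answer immediately), and SKIP LOCKED only where skipping a locked row is acceptable, such as workers pulling from a job queue. For backend batch processing, let transactions wait with a reasonable timeout.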

Implementing retry without backoff or jitter

Symptom

After a deadlock, all retries happen simultaneously, causing cascading contention and timeouts. The database load spikes and may crash.

Fix

Add exponential backoff and random jitter to your retry logic. Example: sleep = min(base * 2^attempt + random(0, jitter), max_sleep).
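
The formula above can be sketched as a small retry helper. This is a minimal sketch under stated assumptions: `DeadlockError` is a hypothetical stand-in for whatever exception your driver raises for a deadlock (PostgreSQL reports SQLSTATE 40P01), and the default delays are placeholders to tune for your workload.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock error (SQLSTATE 40P01 in PostgreSQL)."""


def backoff_delay(attempt: int, base: float = 0.05, jitter: float = 0.05,
                  max_sleep: float = 2.0) -> float:
    """sleep = min(base * 2^attempt + random(0, jitter), max_sleep), in seconds."""
    return min(base * (2 ** attempt) + random.uniform(0, jitter), max_sleep)


def run_with_retry(txn: Callable[[], T], max_attempts: int = 5,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `txn`; on DeadlockError, back off and run the whole transaction again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

`txn` should open, run, and commit the full transaction each time it is called, because the database has already rolled back the aborted attempt. The random jitter is what keeps transactions that collided once from retrying at the same instant and colliding again.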

Ignoring lock ordering in stored procedures

Symptom

Deadlocks appear randomly between different code paths that update the same tables. The issue is hard to reproduce because it depends on timing.

Fix

Create a documented lock ordering convention for all tables. Use code reviews to enforce it. Add a linter rule that warns if tables are updated in a non-standard order.
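
The core check behind such a linter rule is small. A minimal sketch, assuming you can already extract the ordered list of tables a transaction writes to (that extraction is out of scope here) and that the table names below are hypothetical:

```python
from typing import List, Sequence, Tuple


def lock_order_violations(touched: Sequence[str],
                          canonical: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (locked_earlier, locked_later) pairs that go against the canonical order."""
    rank = {table: i for i, table in enumerate(canonical)}
    violations: List[Tuple[str, str]] = []
    seen = set()
    highest = None  # highest-ranked table locked so far
    for table in touched:
        if table in seen:
            continue  # already locked earlier in this transaction
        if table not in rank:
            raise ValueError(f"table {table!r} is missing from the documented order")
        seen.add(table)
        if highest is not None and rank[table] < rank[highest]:
            violations.append((highest, table))
        else:
            highest = table
    return violations
```

For example, with a documented order of `accounts`, `ledger_entries`, `audit_log`, a transaction that writes `ledger_entries` and then `accounts` is reported, while one that writes `accounts` and then `audit_log` passes.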

Believing MVCC eliminates deadlocks entirely

Symptom

Engineers write transactions without considering lock ordering because 'MVCC should handle it'. Deadlocks still happen on write-write conflicts.

Fix

Educate the team: MVCC keeps readers and writers from blocking each other, but two transactions writing the same rows still take locks and can still deadlock. Always use lock ordering for write-heavy transactions.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

Explain the four Coffman conditions necessary for deadlock. Can a deadlo...

Q02 · SENIOR

How does PostgreSQL detect deadlocks? What is the deadlock_timeout param...

Q03 · SENIOR

Why is deadlock prevention via lock ordering considered better than pred...

Q04 · SENIOR

What is the difference between wait-die and wound-wait deadlock preventi...

Q05 · JUNIOR

Your application is experiencing frequent deadlocks when transferring mo...

Q01 of 05 · SENIOR

Explain the four Coffman conditions necessary for deadlock. Can a deadlock occur if only three of them hold?

ANSWER

The four conditions are: Mutual Exclusion (resources can be held by only one process/transaction at a time), Hold and Wait (a process holds at least one resource while waiting for another), No Preemption (resources cannot be forcibly taken away), and Circular Wait (a cycle in the resource allocation graph). All four must hold simultaneously for a deadlock. If any one is missing, the system cannot deadlock. For example, if there's no circular wait, even if the other three hold, transactions may wait but eventually one will proceed because there's no cycle.
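
The circular-wait condition in this answer can be checked mechanically: model "transaction X waits for transaction Y" as an edge in a wait-for graph and look for a cycle. The sketch below is a generic depth-first search for illustration, not any particular database's detector; the transaction names are made up.

```python
from typing import Dict, List, Optional, Set


def find_cycle(wait_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of transactions, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(txn: str) -> Optional[List[str]]:
        color[txn] = GREY
        path.append(txn)
        for blocker in wait_for.get(txn, ()):
            state = color.get(blocker, WHITE)
            if state == GREY:
                # blocker is already on the current path: circular wait found
                return path[path.index(blocker):]
            if state == WHITE:
                cycle = visit(blocker)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in wait_for:
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None
```

Called with `{"T1": {"T2"}, "T2": {"T1"}}` it reports a cycle containing both transactions; with `{"T1": {"T2"}, "T2": set()}` it returns `None`, because T2 can finish and release what T1 needs. A detector would then pick one transaction on the cycle as the victim and abort it.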

