Intermediate 7 min · March 06, 2026

DBMS Interview Questions — Lost Update Race Conditions

Q: What is the difference between a primary key and a unique key in SQL?

A primary key uniquely identifies each row AND cannot contain NULL values — every table should have exactly one. A unique key also enforces uniqueness across a column, but it does allow NULL values (depending on the DBMS, either one or multiple NULLs). Use a primary key for your main identity and unique keys for natural keys like email addresses or passport numbers.

Q: How do you resolve a deadlock in a production database?

Most modern DBMS (like PostgreSQL or MySQL) automatically detect deadlocks and kill the 'cheapest' transaction to break the cycle. However, to prevent them, you must ensure that all application logic accesses tables in the exact same order. If Service A updates Table 1 then 2, but Service B updates Table 2 then 1, a deadlock is inevitable under load. Keep transactions short and avoid user interaction inside a transaction block.

Q: When should I use a stored procedure versus writing SQL in application code?

Use stored procedures for performance-critical batch processing to reduce network round-trips, or to centralize complex security/audit logic that multiple languages (e.g., a Python microservice and a Java monolith) must share. Use application-side SQL (via JDBI or Spring Data) for 90% of your CRUD logic to maintain better version control, easier unit testing, and simpler horizontal scaling of your app tier.

Two transfers read the same balance under Read Committed; second commit overwrote the first.

Naren Founder & Principal Engineer

20+ years shipping production code across the stack, with years spent interviewing engineers. Drawn from code that ran under real load.

✓ Production

production tested

July 27, 2026

last updated

1,713

articles · all by Naren

Before you start⏱ 25 min

✓Solid grasp of fundamentals
✓Comfortable reading code examples
✓Basic production concepts

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

DBMS interview questions test your grasp of how data is stored, retrieved, and protected — not just SQL syntax.
Core topics: ACID, normalization, indexing, joins, transactions, and concurrency control.
Focus on tradeoffs: normalization vs denormalization, index read vs write cost, isolation vs performance.
Production insight: Misunderstanding isolation levels causes data anomalies like dirty reads and lost updates.
Biggest mistake: Memorizing definitions without being able to apply them to real scenarios.

✦ Definition~90s read

What is DBMS Interview Questions?

A lost update is a concurrency bug where two transactions read the same data, modify it in memory, then write back — and the second write silently overwrites the first, destroying that update. This isn't a theoretical edge case; it's the most common race condition in production databases under load, causing phantom inventory counts, double-booked reservations, and corrupted financial ledgers.

★

Think of a database management system like a giant, super-organized library.

The problem exists because database transactions don't magically serialize work — they rely on isolation levels and locking mechanisms that developers must explicitly configure. Without proper handling, even ACID-compliant databases like PostgreSQL or MySQL with InnoDB will lose updates under concurrent write pressure.

This article targets the specific intersection of DBMS theory and real-world interview questions. You'll need to understand why the default READ COMMITTED isolation level in most databases does NOT prevent lost updates — only REPEATABLE READ or SERIALIZABLE does, and even then only with the right locking strategy (pessimistic locks, optimistic concurrency control, or `SELECT ...

FOR UPDATE). The article also covers how normalization vs denormalization affects write contention, why indexes amplify the lost update problem by locking more rows than necessary, and how WHERE vs HAVING` filters interact with transaction isolation.

These aren't trivia questions — they're the difference between a system that works at 10 concurrent users and one that silently corrupts data at 10,000.

Plain-English First

Think of a database management system like a giant, super-organized library. The DBMS is the librarian — it decides where every book (data) is stored, who's allowed to read it, how to find it in seconds even among millions of books, and what happens if two people try to check out the same book at the same time. When interviewers ask DBMS questions, they're really asking: do you understand how this librarian makes decisions, and why those decisions matter?

Every serious backend or full-stack role has at least a handful of DBMS questions tucked into the interview. Why? Because how data is stored, retrieved, and protected is the backbone of almost every application ever built. A poorly understood database concept in production doesn't just cause slow queries — it causes data loss, race conditions, and outages at 3am. Interviewers use DBMS questions to quickly separate engineers who 'use' databases from engineers who 'understand' them.

The problem most candidates have isn't that they don't know SQL. It's that they've memorized definitions without ever connecting them to real scenarios. They can recite 'ACID stands for Atomicity, Consistency, Isolation, Durability' but freeze when asked 'What actually happens to your data if the server crashes mid-transaction?' That gap between definition and intuition is exactly what interviewers are probing for.

By the end of this article, you'll be able to answer intermediate DBMS interview questions with confident, scenario-grounded explanations. You'll understand the WHY behind normalization, indexing, transactions, and joins — and you'll have concrete examples to pull from the moment an interviewer says 'Tell me about...'

Why Lost Updates Are the Silent Killer in Concurrent DBMS

A lost update is a race condition where two concurrent transactions read the same row, modify it locally, and write back — the second write silently overwrites the first, discarding its change. This is not a theoretical corner case; it's a daily reality in any system with concurrent writes, from inventory decrements to balance transfers. The core mechanic: read-then-write without proper isolation. In practice, lost updates occur when transactions run at READ COMMITTED isolation or lower. The database allows both transactions to see the initial value, both compute a new value based on that stale snapshot, and both commit. The second commit clobbers the first. Serializable isolation prevents this by locking or aborting one transaction, but most production systems default to READ COMMITTED for performance. The fix is either explicit locking (SELECT ... FOR UPDATE) or optimistic concurrency with version columns. You use lost update awareness everywhere: e-commerce stock, financial ledgers, booking systems. Ignoring it means corrupted data that's invisible until reconciliation fails. Senior engineers treat every read-then-write as a potential lost update until proven otherwise.

⚠ Lost Update ≠ Dirty Write

A dirty write overwrites uncommitted data; a lost update overwrites committed data that was read before another transaction committed. They are distinct — and lost updates are far more common.

📊 Production Insight

E-commerce inventory: two requests decrement stock from 5 to 4 and 5 to 3 concurrently. Both read 5, both write back — final stock is 4 or 3, never 2. Symptom: silent overselling, discovered only during nightly audit. Rule: always use SELECT ... FOR UPDATE or optimistic locking with version columns on any counter read-then-write.

🎯 Key Takeaway

A lost update is a read-then-write race, not a write-write conflict.

READ COMMITTED is the default isolation level in most databases and does not prevent lost updates.

Always use explicit locking or optimistic concurrency for any row that is read, modified, and written back.

thecodeforge.io

Dbms Interview Questions

ACID Properties — What They Actually Guarantee (and When They Fail)

ACID is the most asked-about DBMS concept in interviews, and the most superficially answered. Let's fix that.

Atomicity means a transaction is all-or-nothing. If you're transferring $500 from Account A to Account B, that's two writes. Atomicity guarantees that if the second write fails, the first is rolled back — you never end up with money disappearing into thin air.

Consistency means the database moves from one valid state to another. If your schema says an order must have a valid customer_id, a transaction that violates that constraint is rejected entirely.

Isolation is the subtle one. It controls what a transaction can see about other in-flight transactions. SQL standard defines four isolation levels — Read Uncommitted, Read Committed, Repeatable Read, and Serializable — each trading performance for safety. Most databases default to Read Committed.

Durability means once a transaction is committed, it survives crashes. The database writes to a transaction log (WAL — Write-Ahead Log) before confirming success, so even if the server dies immediately after your commit, the data is recoverable on restart.

The key interview insight: ACID is a contract, not a guarantee that nothing goes wrong. It's a guarantee about what the system will do to handle things going wrong.

acid_transaction_demo.sqlSQL

-- Scenario: Bank transfer between two accounts
-- This demonstrates Atomicity — both updates happen or neither does

-- Setup: create accounts table with a balance constraint
CREATE TABLE bank_accounts (
    account_id   INT PRIMARY KEY,
    holder_name  VARCHAR(100) NOT NULL,
    balance      DECIMAL(10, 2) NOT NULL CHECK (balance >= 0)  -- Consistency constraint
);

INSERT INTO bank_accounts VALUES (1, 'Alice Johnson', 1000.00);
INSERT INTO bank_accounts VALUES (2, 'Bob Martinez',  500.00);

-- Begin a transaction (Atomicity wrapper)
BEGIN TRANSACTION;

    -- Step 1: Deduct from Alice
    UPDATE bank_accounts
    SET    balance = balance - 500.00
    WHERE  account_id = 1;

    -- Step 2: Credit Bob
    UPDATE bank_accounts
    SET    balance = balance + 500.00
    WHERE  account_id = 2;

    -- If EITHER update fails (e.g. Alice's balance would go negative,
    -- violating CHECK constraint), the whole block rolls back.
    -- Neither account changes. That's Atomicity + Consistency working together.

COMMIT;  -- Durability: this is now written to the WAL and survives a crash

-- Verify the final state
SELECT account_id, holder_name, balance FROM bank_accounts;

-- Now demonstrate a ROLLBACK scenario
BEGIN TRANSACTION;

    UPDATE bank_accounts
    SET    balance = balance - 2000.00  -- Alice only has 500 now; CHECK will fail
    WHERE  account_id = 1;

ROLLBACK;  -- We manually roll back, or the DB engine does it on constraint violation

-- Alice's balance is unchanged — Atomicity protected us
SELECT account_id, holder_name, balance FROM bank_accounts;

Output

-- After COMMIT:

account_id | holder_name | balance

-----------+---------------+---------

1 | Alice Johnson | 500.00

2 | Bob Martinez | 1000.00

-- After ROLLBACK (failed deduction):

account_id | holder_name | balance

-----------+---------------+---------

1 | Alice Johnson | 500.00

2 | Bob Martinez | 1000.00

⚠ Watch Out: Isolation ≠ 'Transactions Don't Interfere'

Most databases default to Read Committed isolation, which prevents dirty reads but still allows non-repeatable reads. If your transaction reads a row twice and another transaction commits a change in between, you'll get different values. If your logic depends on a value staying stable across two reads in the same transaction, you need Repeatable Read or Serializable — at the cost of more locking and lower throughput.

📊 Production Insight

In production, the most common ACID failure is not a crash but a lost update under Read Committed isolation.

Your transaction reads a row, another transaction updates it, and your subsequent writes overwrite that change.

Rule: If your business logic depends on a value staying stable across reads, use SELECT ... FOR UPDATE or move to Repeatable Read.

🎯 Key Takeaway

ACID properties are guarantees about failure handling, not guarantees of correctness under concurrency.

The isolation level you choose determines what concurrency anomalies you accept.

Always match your isolation level to your business requirement — default settings are rarely right.

Normalization vs Denormalization — Choosing the Right Tradeoff

Normalization is the process of organizing your schema to eliminate redundant data and protect against update anomalies. Denormalization is the deliberate reversal of that — adding redundancy back in to speed up reads. The interview question isn't 'what is normalization?' — it's 'when do you normalize and when do you denormalize?'

Normalization follows 'Normal Forms'. First Normal Form (1NF) eliminates repeating groups — every cell holds one atomic value. Second Normal Form (2NF) removes partial dependencies — every non-key column depends on the whole primary key, not just part of it. Third Normal Form (3NF) removes transitive dependencies — non-key columns shouldn't depend on other non-key columns.

Here's the real-world intuition: if you store a customer's city and zip code in the orders table, you'll have to update both every time a customer moves — and you risk them going out of sync. 3NF says: put zip code in a customers table and reference it by customer_id. Now city and zip are always consistent.

Denormalization makes sense in read-heavy analytical systems (data warehouses, reporting dashboards) where joins across 10 normalized tables are too slow. You pre-join the data at write time so reads are instant. The tradeoff: writes become slower and you need application logic to keep redundant data in sync.

The rule of thumb: normalize by default in OLTP systems (transactions), denormalize deliberately in OLAP systems (analytics).

normalization_example.sqlSQL

-- =============================================
-- BAD: Unnormalized orders table (violates 3NF)
-- =============================================
-- Problem: customer_city depends on customer_id, NOT on order_id.
-- If a customer moves, you update 100 rows instead of 1.
CREATE TABLE orders_unnormalized (
    order_id       INT PRIMARY KEY,
    customer_id    INT,
    customer_name  VARCHAR(100),   -- redundant — should live in customers table
    customer_city  VARCHAR(100),   -- transitive dependency: city -> customer, not order
    product_name   VARCHAR(100),
    order_total    DECIMAL(10, 2)
);

-- =============================================
-- GOOD: Normalized to 3NF — two separate tables
-- =============================================
CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL,
    customer_city VARCHAR(100) NOT NULL
    -- city lives here — update once, reflects everywhere
);

CREATE TABLE orders (
    order_id      INT PRIMARY KEY,
    customer_id   INT NOT NULL REFERENCES customers(customer_id),  -- FK enforces integrity
    product_name  VARCHAR(100) NOT NULL,
    order_total   DECIMAL(10, 2) NOT NULL
);

-- Insert sample data
INSERT INTO customers VALUES (1, 'Alice Johnson', 'New York');
INSERT INTO orders   VALUES (101, 1, 'Laptop', 1299.99);
INSERT INTO orders   VALUES (102, 1, 'Mouse',    29.99);

-- Join to reconstruct the full picture — this is the cost of normalization
-- (the join), but the benefit is zero data duplication
SELECT
    o.order_id,
    c.customer_name,
    c.customer_city,
    o.product_name,
    o.order_total
FROM   orders    o
JOIN   customers c ON o.customer_id = c.customer_id;

-- If Alice moves to Boston — ONE update, zero inconsistency
UPDATE customers SET customer_city = 'Boston' WHERE customer_id = 1;

Output

order_id | customer_name | customer_city | product_name | order_total

---------+---------------+---------------+--------------+------------

101 | Alice Johnson | New York | Laptop | 1299.99

102 | Alice Johnson | New York | Mouse | 29.99

💡Interview Gold: Frame It as a Tradeoff

When an interviewer asks about normalization, don't just recite the normal forms. Say: 'I normalize transactional schemas by default to avoid update anomalies, and I selectively denormalize in read-heavy or analytical contexts where join performance becomes a bottleneck — like pre-aggregating daily sales totals into a summary table.' That answer signals you think in tradeoffs, not rules.

📊 Production Insight

Denormalizing without profiling can make a fast query slower by bloating row size and increasing I/O per scan.

Always measure the actual join cost before adding redundancy.

Rule: Profile first, denormalize second.

🎯 Key Takeaway

Normalize for write-heavy OLTP to avoid anomalies; denormalize for read-heavy OLAP after measuring bottlenecks.

The 3NF rule prevents transitive dependencies — city depends on customer, not order.

Use foreign keys to enforce referential integrity — without them, you'll get orphaned records.

thecodeforge.io

Dbms Interview Questions

Indexing Deep Dive — Why Indexes Speed Up Reads but Hurt Writes

An index is a separate data structure — usually a B-Tree — that the database builds and maintains alongside your table. It stores a sorted copy of one or more columns, with pointers back to the actual rows. Without an index, a query like WHERE email = 'alice@example.com' does a full table scan — reading every single row. With an index on email, the database does a binary search on the B-Tree and jumps directly to the matching row.

So why not index every column? Because every index you create is a structure that must be updated on every INSERT, UPDATE, and DELETE. If your orders table has 8 indexes, inserting one order requires 8 separate B-Tree updates. In a write-heavy system, that overhead adds up fast and can make inserts painfully slow.

There are two important index types you'll encounter in interviews. A clustered index physically reorders the table data to match the index order — there can only be one per table (SQL Server/MySQL InnoDB use the primary key as the clustered index by default). A non-clustered index is a separate structure with pointers to rows — you can have many, and they don't affect the storage order of the table.

Composite indexes follow the 'left-prefix rule': an index on (last_name, first_name) speeds up queries filtering on last_name alone, or both columns — but NOT on first_name alone. The database can only use the leftmost columns in the index definition.

indexing_strategy_demo.sqlSQL

-- Setup: large employees table to demonstrate index impact
CREATE TABLE employees (
    employee_id   INT PRIMARY KEY,          -- clustered index by default in most DBs
    department_id INT NOT NULL,
    last_name     VARCHAR(100) NOT NULL,
    first_name    VARCHAR(100) NOT NULL,
    email         VARCHAR(200) UNIQUE,      -- UNIQUE automatically creates an index
    hire_date     DATE NOT NULL,
    salary        DECIMAL(10, 2)
);

-- Without any extra indexes, this query does a full table scan (slow at scale)
-- EXPLAIN shows 'Seq Scan' or 'Full Table Scan' — red flag in production
EXPLAIN SELECT * FROM employees WHERE last_name = 'Johnson';

-- Create a non-clustered index on last_name
CREATE INDEX idx_employees_last_name
    ON employees (last_name);  -- B-Tree built; now searches on last_name are O(log n)

-- After the index: EXPLAIN shows 'Index Scan' — jumps directly to matching rows
EXPLAIN SELECT * FROM employees WHERE last_name = 'Johnson';

-- Composite index: useful when you frequently filter by department AND hire_date
CREATE INDEX idx_employees_dept_hire
    ON employees (department_id, hire_date);

-- This query USES the composite index (left-prefix rule satisfied)
SELECT * FROM employees
WHERE  department_id = 3
AND    hire_date > '2022-01-01';           -- Both columns in index, in order

-- This query CANNOT use the composite index efficiently
-- (skips department_id, starts at second column — violates left-prefix rule)
SELECT * FROM employees
WHERE  hire_date > '2022-01-01';           -- Falls back to full scan

-- Check index usage and fragmentation (PostgreSQL syntax)
SELECT
    indexname,
    idx_scan,       -- how many times this index was used
    idx_tup_read    -- rows read through this index
FROM   pg_stat_user_indexes
WHERE  tablename = 'employees';

Output

-- EXPLAIN before index:

Seq Scan on employees (cost=0.00..2841.00 rows=45 width=80)

Filter: ((last_name)::text = 'Johnson')

-- EXPLAIN after index:

Index Scan using idx_employees_last_name on employees (cost=0.29..8.31 rows=45 width=80)

Index Cond: ((last_name)::text = 'Johnson')

-- pg_stat_user_indexes output:

indexname | idx_scan | idx_tup_read

----------------------------+----------+-------------

idx_employees_last_name | 284 | 12780

idx_employees_dept_hire | 97 | 4365

pk_employees | 5021 | 5021

🔥Pro Tip: Use EXPLAIN Before Adding Indexes Blindly

Always run EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) on a slow query before creating an index. The query plan shows exactly which step is expensive — sometimes the bottleneck is a missing join condition or a function wrapping a column (like WHERE YEAR(hire_date) = 2023, which prevents index use entirely). Fix the query first, then index if needed.

📊 Production Insight

A missing index on a foreign key column is the #1 cause of cascading full table scans in production joins.

Always index foreign keys by default — the database won't warn you.

Rule: When adding a FK constraint, add an index on the FK column immediately.

🎯 Key Takeaway

Indexes are B-Trees that speed reads at the cost of slower writes.

The left-prefix rule governs composite indexes — always put the most selective column first.

Use EXPLAIN before creating an index — sometimes the query itself is the problem.

Joins, Keys, and the Difference Between WHERE and HAVING

Joins are the mechanism for combining rows from two or more tables based on a related column. The confusion isn't usually about syntax — it's about understanding which join to use and what happens to unmatched rows.

INNER JOIN returns only rows where the join condition is satisfied in both tables. If a customer has no orders, they don't appear in the result. LEFT JOIN returns all rows from the left table and matching rows from the right — unmatched right-side columns are NULL. This is critical for questions like 'find all customers who have never placed an order' (WHERE order_id IS NULL after a LEFT JOIN).

Cross joins and self-joins come up less often in interviews but signal depth. A self-join joins a table to itself — useful for hierarchical data like an employees table where manager_id references another employee_id in the same table.

The WHERE vs HAVING distinction is a classic interview trap. WHERE filters rows before aggregation happens. HAVING filters groups after aggregation. If you put an aggregate function like COUNT() in a WHERE clause, you'll get a syntax error — because WHERE runs before GROUP BY has created any groups to count. HAVING runs after GROUP BY, so aggregates are fair game.

Primary keys uniquely identify a row and cannot be NULL. Foreign keys enforce referential integrity — a foreign key column in one table must match an existing primary key in another table (or be NULL). Without this constraint, you can end up with orphaned records: orders pointing to customer_ids that no longer exist.

joins_and_groupby_demo.sqlSQL

-- Setup: customers and orders tables
CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),  -- foreign key
    amount      DECIMAL(10, 2) NOT NULL,
    order_year  INT NOT NULL
);

INSERT INTO customers VALUES (1, 'Alice Johnson'), (2, 'Bob Martinez'), (3, 'Carol Wu');
INSERT INTO orders VALUES
    (101, 1, 250.00, 2023),
    (102, 1, 450.00, 2023),
    (103, 2, 125.00, 2022),
    (104, 2, 800.00, 2023);
-- Carol Wu (customer_id=3) has NO orders — useful for LEFT JOIN demo

-- =============================================
-- INNER JOIN: only customers WITH orders
-- =============================================
SELECT c.customer_name, o.order_id, o.amount
FROM   customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
-- Carol Wu is NOT in this result

-- =============================================
-- LEFT JOIN: ALL customers, even those without orders
-- =============================================
SELECT c.customer_name, o.order_id, o.amount
FROM   customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
-- Carol Wu appears with NULL for order_id and amount

-- =============================================
-- Find customers who have NEVER placed an order
-- =============================================
SELECT c.customer_name
FROM   customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE  o.order_id IS NULL;  -- NULL means no matching order row was found

-- =============================================
-- WHERE vs HAVING — the classic interview trap
-- =============================================

-- WHERE filters individual rows BEFORE grouping
SELECT   customer_id, SUM(amount) AS total_spent
FROM     orders
WHERE    order_year = 2023           -- filter rows: only 2023 orders enter the group
GROUP BY customer_id;

-- HAVING filters groups AFTER aggregation
SELECT   customer_id, SUM(amount) AS total_spent
FROM     orders
GROUP BY customer_id
HAVING   SUM(amount) > 500;          -- filter groups: only big spenders

-- WRONG: This throws an error — you can't use aggregate in WHERE
-- SELECT customer_id FROM orders WHERE SUM(amount) > 500 GROUP BY customer_id;
-- ERROR: aggregate functions are not allowed in WHERE

Output

-- INNER JOIN result:

customer_name | order_id | amount

--------------+----------+--------

Alice Johnson | 101 | 250.00

Alice Johnson | 102 | 450.00

Bob Martinez | 103 | 125.00

Bob Martinez | 104 | 800.00

-- LEFT JOIN result:

customer_name | order_id | amount

--------------+----------+--------

Alice Johnson | 101 | 250.00

Alice Johnson | 102 | 450.00

Bob Martinez | 103 | 125.00

Bob Martinez | 104 | 800.00

Carol Wu | NULL | NULL

-- Customers with NO orders:

customer_name

--------------

Carol Wu

-- WHERE 2023 filter result:

customer_id | total_spent

------------+------------

1 | 700.00

2 | 800.00

-- HAVING > 500 result:

customer_id | total_spent

------------+------------

2 | 925.00

🔥Interview Gold: The LEFT JOIN + IS NULL Pattern

The pattern LEFT JOIN ... WHERE right_table.id IS NULL is a classic 'anti-join' — it finds rows in the left table that have no match in the right. Interviewers love this because it separates people who understand what a LEFT JOIN actually produces (NULLs on the right side for unmatched rows) from those who've just memorized a diagram. An alternative is NOT EXISTS, which is often more readable and equally performant on modern query optimizers.

📊 Production Insight

The LEFT JOIN anti-join pattern (WHERE right.id IS NULL) is silently slow on large tables if the right table lacks an index on the join column.

Without an index, PostgreSQL does a Nested Loop with a full scan per row.

Rule: Always index the join column in the right table for LEFT JOIN anti-joins.

🎯 Key Takeaway

WHERE filters rows before GROUP BY; HAVING filters groups after.

LEFT JOIN + IS NULL finds unmatched rows; NOT EXISTS is often equivalent.

Primary keys enforce uniqueness; foreign keys enforce referential integrity — never skip them.

Transactions, Isolation Levels, and Concurrency Control

A transaction groups multiple SQL statements into a single logical unit. When you COMMIT, all changes become permanent. When you ROLLBACK, all changes are undone. But what happens when two transactions run at the same time? That's where isolation levels and concurrency control kick in.

Most modern databases use Multi-Version Concurrency Control (MVCC). Instead of locking every row a transaction reads, MVCC keeps older versions of rows. Each transaction sees a snapshot of the data as of its start time. Readers never block writers, and writers never block readers. That's the magic. But MVCC isn't free — it increases storage and makes the transaction log grow.

The SQL standard defines four isolation levels

Read Uncommitted: No protection — can see uncommitted changes (dirty reads). Rarely used.
Read Committed: Prevents dirty reads. Default in PostgreSQL, Oracle, SQL Server. Allows non-repeatable reads and phantom reads.
Repeatable Read: Prevents non-repeatable reads. Still allows phantom reads (new rows inserted by others). Default in MySQL InnoDB.
Serializable: Highest level. Prevents all anomalies. Achieved through either locking or snapshot isolation (PostgreSQL uses Serializable Snapshot Isolation).

A deadlock occurs when two transactions each hold a lock the other needs. The database automatically kills one transaction to break the cycle. The fix is always application-side: access tables in the same order, keep transactions short, and avoid user interaction inside transactions.

isolation_and_deadlock_demo.sqlSQL

-- =============================================
-- Example: Lost Update under Read Committed
-- =============================================
CREATE TABLE account_balance (
    account_id INT PRIMARY KEY,
    balance    DECIMAL(10, 2) NOT NULL CHECK (balance >= 0)
);
INSERT INTO account_balance VALUES (1, 1000.00);

-- Transaction A
BEGIN TRANSACTION;
SELECT balance FROM account_balance WHERE account_id = 1;  -- reads 1000
-- ... some processing ...
UPDATE account_balance SET balance = balance - 200 WHERE account_id = 1;

-- Transaction B (concurrent)
BEGIN TRANSACTION;
SELECT balance FROM account_balance WHERE account_id = 1;  -- also reads 1000
UPDATE account_balance SET balance = balance - 300 WHERE account_id = 1;

-- If Transaction B commits after A, the final balance is 700 (only B's change)
-- Transaction A's deduction is lost. This is a lost update.

-- Fix: Use SELECT ... FOR UPDATE to lock the row
BEGIN TRANSACTION;
SELECT balance FROM account_balance WHERE account_id = 1 FOR UPDATE;  -- acquires row lock
-- Transaction B will block here until A commits
UPDATE account_balance SET balance = balance - 200 WHERE account_id = 1;
COMMIT;

-- =============================================
-- Deadlock scenario (avoid this)
-- =============================================
CREATE TABLE table_a (id INT PRIMARY KEY, val INT);
CREATE TABLE table_b (id INT PRIMARY KEY, val INT);
INSERT INTO table_a VALUES (1, 10);
INSERT INTO table_b VALUES (1, 20);

-- Transaction 1 (started first)
BEGIN;
UPDATE table_a SET val = val + 1 WHERE id = 1;
-- now try to update table_b
UPDATE table_b SET val = val + 1 WHERE id = 1;  -- blocked if Transaction 2 holds lock on table_b

-- Transaction 2 (concurrent)
BEGIN;
UPDATE table_b SET val = val + 1 WHERE id = 1;
-- now try to update table_a
UPDATE table_a SET val = val + 1 WHERE id = 1;  -- DEADLOCK!

-- One of them will be killed by the database; the other succeeds.
-- Prevention: always update table_a before table_b in both transactions.

Output

-- After lost update scenario (without FOR UPDATE):

account_id | balance

-----------+--------

1 | 700.00 -- lost $200

-- After using FOR UPDATE:

account_id | balance

-----------+--------

1 | 800.00 -- correct

-- Deadlock detection log (PostgreSQL):

ERROR: deadlock detected

DETAIL: Process 123 waits for ShareLock on transaction 456; blocked by process 456.

HINT: See server log for query details.

⚠ Implicit Locks Can Still Cause Problems

Even with MVCC, UPDATE and DELETE statements acquire write locks on affected rows. If two transactions update the same row concurrently, one will wait for the other. Long-running transactions holding locks for minutes cause lock escalation and can cascade into a system-wide stall. Keep transactions short and commit early.

📊 Production Insight

Default Read Committed isolation allows non-repeatable reads and phantom reads.

If your application reads a sum of account balances twice in a transaction and gets different results, you have a phantom read.

Rule: Use Serializable only when absolutely necessary — it kills throughput.

🎯 Key Takeaway

MVCC enables non-blocking reads by keeping older row versions.

Isolation levels trade safety for performance — choose the weakest level that guarantees correctness.

Deadlocks are avoided by locking tables in the same order across all transactions.

Why Views Are Leaky Abstractions That Can Wreck Query Performance

Views are sold as a clean abstraction. They aren't. A view is just a saved SELECT statement — the database re-executes it every time you query the view. No caching. No magic. When you nest views inside views, you get a query that the optimizer cannot flatten into a single execution plan. The result? A 10-row filter triggers a 10-million-row scan. I've seen devs cry over 45-second queries that should take 2ms. The fix? Materialized views if your DB supports them (PostgreSQL, Oracle, Snowflake). Or just write the damn query yourself. Views have legitimate uses (row-level security, column masking, legacy compatibility). But if you are using a view to "simplify" a join that runs every 5 seconds, you are building a performance time bomb. Always EXPLAIN your view-based queries in production before you ship.

view_performance.sqlSQL

-- A seemingly innocent view
CREATE VIEW customer_orders AS
SELECT c.id, c.name, o.total, o.order_date
FROM customers c
JOIN orders o ON c.id = o.customer_id;

-- This nested query triggers a full table scan on orders
SELECT * FROM customer_orders
WHERE order_date > '2025-01-01'
AND total > 1000;

-- EXPLAIN output shows: Seq Scan on orders (filters applied AFTER join)
-- Not pushed down to the base tables

Output

Seq Scan on orders (cost=0.00..123456.78 rows=1 width=42)

Filter: ((total > 1000) AND (order_date > '2025-01-01'::date))

⚠ Production Trap:

Your ORM may be generating nested views behind your back. Django's 'select_related' combined with a model that inherits from a view layer is a common cause. Always enable 'log_min_duration_statement' in Postgres to catch these queries before they become P0 incidents.

🎯 Key Takeaway

Views are saved queries, not saved results. If you need performance, either materialize the view or inline the query.

The Difference Between WHERE and HAVING That Junior Devs Always Get Wrong

WHERE filters rows before grouping. HAVING filters groups after aggregation. That sounds simple. But in production, I see people using HAVING for everything because they learned it in a tutorial that never showed a WHERE clause with a GROUP BY. Here is the practical impact: HAVING cannot use indexes on columns that are not in the GROUP BY or the aggregate. WHERE can use any index. So if you write 'HAVING status = 'active'' on a 100M-row table, you just killed performance because the database must scan every group after aggregating. The rule? Filter raw rows with WHERE. Filter aggregated results with HAVING. Example: find customers with more than 5 orders. WHERE filters out inactive customers before grouping. HAVING then checks the count. That cuts the aggregation workload by 80%. Stop guessing. Start using both correctly.

where_vs_having.sqlSQL

-- WRONG: Filtering row-level data with HAVING
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id
HAVING status = 'shipped' AND COUNT(*) > 5;

-- RIGHT: WHERE filters rows, HAVING filters groups
SELECT customer_id, COUNT(*) as order_count
FROM orders
WHERE status = 'shipped'
GROUP BY customer_id
HAVING COUNT(*) > 5;

-- Performance difference: 2.3s vs 0.4s on 10M order table

Output

RIGHT query: 0.4 seconds, uses index on status column

WRONG query: 2.3 seconds, no index usage, full table scan

🔥Performance Insight:

PostgreSQL's planner can sometimes optimize wrong WHERE/HAVING usage, but MySQL and SQL Server rarely do. Always assume the planner will take the literal meaning of your query. Check execution plans after refactoring from HAVING-everything to WHERE-first.

🎯 Key Takeaway

WHERE filters before grouping; HAVING filters after. Use both. Confusing them is the fastest way to burn CPU cycles.

● Production incidentPOST-MORTEMseverity: high

Lost Update Due to Read Committed Isolation

Symptom

Account balance ended up higher than expected after two simultaneous transfers; reconciliation failed.

Assumption

Atomicity would prevent partial transactions, so it was safe to run without explicit locking.

Root cause

Both transactions read the balance under Read Committed isolation before the other committed. Each calculated a new balance based on the same initial value. The second commit overwrote the first, effectively losing the first update.

Fix

Changed the transfer logic to use SELECT ... FOR UPDATE (pessimistic locking) or switched to Serializable isolation level for the transfer transaction.

Key lesson

Read Committed does not protect against lost updates when two transactions read and then write the same row.
If your business logic depends on a value staying stable between a read and a write, you must lock the row or use a higher isolation level.
Always test concurrency scenarios with high load — they often reveal hidden race conditions.

Production debug guideSystematic approach to find and fix slow queries in production.3 entries

Symptom · 01

Query that used to be fast suddenly becomes slow

→

Fix

Check if an index was dropped or if statistics are stale. Run ANALYZE and review recent DDL changes.

Symptom · 02

Deadlock error messages in application logs

→

Fix

Identify all transactions involved, extract their lock ordering, and ensure all code paths acquire locks on resources in the same global order.

Symptom · 03

Full table scan on a large table despite an existing index

→

Fix

Check if the index is on the column wrapped in a function (e.g., WHERE YEAR(date)=2023). Rewrite as range condition or create a function-based index.

★ DBMS Troubleshooting Quick ReferenceCommon production database problems and the three commands that fix them.

Slow query on a large table−

Immediate action

Run EXPLAIN ANALYZE to see the actual plan and row estimates.

Commands

EXPLAIN (ANALYZE, BUFFERS) SELECT ...

Check for missing indexes: SELECT schemaname, tablename, seq_scan FROM pg_stat_user_tables WHERE seq_scan > 1000;

Fix now

Add an index on the filtering columns, or rewrite the query to avoid full scan.

Transaction deadlock detected+

Missing rows after UPSERT+

Aspect	OLTP (Operational)	OLAP (Analytical)
Primary use case	Day-to-day transactions (insert/update/delete)	Reporting and aggregations (complex reads)
Schema style	Highly normalized (3NF) to avoid anomalies	Denormalized (star/snowflake schema) for fast reads
Query type	Many small, fast queries on few rows	Few large queries scanning millions of rows
Indexing focus	Index on FK columns and WHERE predicates	Columnar storage; bitmap indexes on low-cardinality columns
Transaction need	Strong ACID guarantees required	Often relaxed; eventual consistency acceptable
Examples	Bank transfers, e-commerce orders, user logins	Sales dashboards, trend analysis, data warehouses
Optimization target	Write throughput, low latency	Read throughput, query parallelism

⚙ Quick Reference

7 commands from this guide

File	Command / Code	Purpose
acid_transaction_demo.sql	CREATE TABLE bank_accounts (	ACID Properties
normalization_example.sql	CREATE TABLE orders_unnormalized (	Normalization vs Denormalization
indexing_strategy_demo.sql	CREATE TABLE employees (	Indexing Deep Dive
joins_and_groupby_demo.sql	CREATE TABLE customers (	Joins, Keys, and the Difference Between WHERE and HAVING
isolation_and_deadlock_demo.sql	CREATE TABLE account_balance (	Transactions, Isolation Levels, and Concurrency Control
view_performance.sql	CREATE VIEW customer_orders AS	Why Views Are Leaky Abstractions That Can Wreck Query Perfor
where_vs_having.sql	SELECT customer_id, COUNT(*) as order_count	The Difference Between WHERE and HAVING That Junior Devs Alw

Key takeaways

ACID isn't magic

it's a contract about failure behavior. Atomicity prevents partial writes, Consistency enforces schema rules, Isolation controls visibility between concurrent transactions, and Durability means committed data survives crashes via Write-Ahead Logging.

Normalize by default for OLTP (one place to update, zero inconsistency), denormalize deliberately for OLAP (pre-joined data for instant reads). Never denormalize without profiling the actual query bottleneck first.

Every index is a write-time cost for a read-time benefit. The left-prefix rule governs composite indexes

a (a, b, c) index helps queries filtering on a, a+b, or a+b+c, but not b or c alone. Always verify with EXPLAIN before creating or dropping indexes.

WHERE filters rows before aggregation; HAVING filters groups after aggregation. The LEFT JOIN + WHERE right.id IS NULL anti-join pattern is one of the most practical SQL patterns in production, and it's a near-universal interview question.

MVCC is the dominant concurrency model

it lets readers see consistent snapshots without blocking writers. Always match your isolation level to the business requirement, and maintain consistent lock ordering to prevent deadlocks.

Common mistakes to avoid

4 patterns

Wrapping indexed column in a function in WHERE clause

Symptom

Index on hire_date is completely ignored; query performs a full table scan, causing slow response times in production.

Fix

Rewrite the condition as a range: WHERE hire_date >= '2023-01-01' AND hire_date < '2024-01-01' instead of WHERE YEAR(hire_date) = 2023.

Using DELETE to clear a large table instead of TRUNCATE

Symptom

Transaction log fills up, query hangs, and the operation takes orders of magnitude longer than expected.

Fix

Use TRUNCATE TABLE to remove all rows in a minimally logged, fast operation. TRUNCATE is not fully rollbackable in all DBMS — verify before use.

Ignoring NULL behavior in aggregate functions like COUNT and AVG

Symptom

COUNT(column) returns fewer rows than COUNT(*) when the column has NULLs; AVG(column) computes average only over non-NULL rows, skewing results.

Fix

Use COUNT(*) for total rows, or COALESCE(column, 0) before aggregation to handle NULLs explicitly.

Acquiring locks in inconsistent order across transactions

Symptom

Deadlock errors appear randomly under load, causing application retries and user-facing errors.

Fix

Enforce a global lock ordering: always update tables in the same sequence (e.g., always update accounts before orders).

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

Write a query to find the top 3 salaries in each department. If two empl...

Q02SENIOR

Design a 'Like' button system for 100M users. Would you use a normalized...

Q03SENIOR

Explain the difference between B-Tree and Hash indexes. Why is a B-Tree ...

Q04SENIOR

A table has a composite index on (country_id, status, created_at). Which...

Q05JUNIOR

Explain the difference between locking and MVCC in concurrency control.

Q06SENIOR

What is a deadlock and how do you prevent it in a database application?

Q01 of 06SENIOR

Write a query to find the top 3 salaries in each department. If two employees have the same salary, they should both be included in the ranking.

ANSWER

Use DENSE_RANK() window function to assign ranks without gaps: SELECT department_id, employee_id, salary FROM ( SELECT *, DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank FROM employees ) ranked WHERE rank <= 3; DENSE_RANK ensures ties get the same rank and the next rank is not skipped. ROW_NUMBER would arbitrarily pick one of the tied rows. RANK would skip a number after a tie, potentially giving fewer than 3 results.

FAQ · 4 QUESTIONS

Frequently Asked Questions

What is the difference between a primary key and a unique key in SQL?

How do you resolve a deadlock in a production database?

When should I use a stored procedure versus writing SQL in application code?

What is the difference between a clustered and a non-clustered index?

Naren Founder & Principal Engineer

20+ years shipping production code across the stack, with years spent interviewing engineers. Drawn from code that ran under real load.

✓ Verified

production tested

July 27, 2026

last updated

1,713

articles · all by Naren

🔥

That's Database Interview. Mark it forged?

7 min read · try the examples if you haven't