DBMS interview questions test your grasp of how data is stored, retrieved, and protected — not just SQL syntax.
Core topics: ACID, normalization, indexing, joins, transactions, and concurrency control.
Focus on tradeoffs: normalization vs denormalization, index read vs write cost, isolation vs performance.
Production insight: Misunderstanding isolation levels causes data anomalies like dirty reads and lost updates.
Biggest mistake: Memorizing definitions without being able to apply them to real scenarios.
Plain-English First
Think of a database management system like a giant, super-organized library. The DBMS is the librarian — it decides where every book (data) is stored, who's allowed to read it, how to find it in seconds even among millions of books, and what happens if two people try to check out the same book at the same time. When interviewers ask DBMS questions, they're really asking: do you understand how this librarian makes decisions, and why those decisions matter?
Every serious backend or full-stack role has at least a handful of DBMS questions tucked into the interview. Why? Because how data is stored, retrieved, and protected is the backbone of almost every application ever built. A poorly understood database concept in production doesn't just cause slow queries — it causes data loss, race conditions, and outages at 3am. Interviewers use DBMS questions to quickly separate engineers who 'use' databases from engineers who 'understand' them.
The problem most candidates have isn't that they don't know SQL. It's that they've memorized definitions without ever connecting them to real scenarios. They can recite 'ACID stands for Atomicity, Consistency, Isolation, Durability' but freeze when asked 'What actually happens to your data if the server crashes mid-transaction?' That gap between definition and intuition is exactly what interviewers are probing for.
By the end of this article, you'll be able to answer intermediate DBMS interview questions with confident, scenario-grounded explanations. You'll understand the WHY behind normalization, indexing, transactions, and joins — and you'll have concrete examples to pull from the moment an interviewer says 'Tell me about...'
ACID Properties — What They Actually Guarantee (and When They Fail)
ACID is the most asked-about DBMS concept in interviews, and the most superficially answered. Let's fix that.
Atomicity means a transaction is all-or-nothing. If you're transferring $500 from Account A to Account B, that's two writes. Atomicity guarantees that if the second write fails, the first is rolled back — you never end up with money disappearing into thin air.
Consistency means the database moves from one valid state to another. If your schema says an order must have a valid customer_id, a transaction that violates that constraint is rejected entirely.
Isolation is the subtle one. It controls what a transaction can see about other in-flight transactions. SQL standard defines four isolation levels — Read Uncommitted, Read Committed, Repeatable Read, and Serializable — each trading performance for safety. Most databases default to Read Committed.
Durability means once a transaction is committed, it survives crashes. The database writes to a transaction log (WAL — Write-Ahead Log) before confirming success, so even if the server dies immediately after your commit, the data is recoverable on restart.
The key interview insight: ACID is a contract, not a guarantee that nothing goes wrong. It's a guarantee about what the system will do to handle things going wrong.
acid_transaction_demo.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
-- Scenario: Bank transfer between two accounts-- This demonstrates Atomicity — both updates happen or neither does-- Setup: create accounts table with a balance constraintCREATETABLEbank_accounts (
account_id INTPRIMARYKEY,
holder_name VARCHAR(100) NOTNULL,
balance DECIMAL(10, 2) NOTNULLCHECK (balance >= 0) -- Consistency constraint
);
INSERTINTO bank_accounts VALUES (1, 'Alice Johnson', 1000.00);
INSERTINTO bank_accounts VALUES (2, 'Bob Martinez', 500.00);
-- Begin a transaction (Atomicity wrapper)BEGINTRANSACTION;
-- Step 1: Deduct from AliceUPDATE bank_accounts
SET balance = balance - 500.00WHERE account_id = 1;
-- Step 2: Credit BobUPDATE bank_accounts
SET balance = balance + 500.00WHERE account_id = 2;
-- If EITHER update fails (e.g. Alice's balance would go negative,-- violating CHECK constraint), the whole block rolls back.-- Neither account changes. That's Atomicity + Consistency working together.COMMIT; -- Durability: this is now written to the WAL and survives a crash-- Verify the final stateSELECT account_id, holder_name, balance FROM bank_accounts;
-- Now demonstrate a ROLLBACK scenarioBEGINTRANSACTION;
UPDATE bank_accounts
SET balance = balance - 2000.00-- Alice only has 500 now; CHECK will failWHERE account_id = 1;
ROLLBACK; -- We manually roll back, or the DB engine does it on constraint violation-- Alice's balance is unchanged — Atomicity protected usSELECT account_id, holder_name, balance FROM bank_accounts;
Most databases default to Read Committed isolation, which prevents dirty reads but still allows non-repeatable reads. If your transaction reads a row twice and another transaction commits a change in between, you'll get different values. If your logic depends on a value staying stable across two reads in the same transaction, you need Repeatable Read or Serializable — at the cost of more locking and lower throughput.
Production Insight
In production, the most common ACID failure is not a crash but a lost update under Read Committed isolation.
Your transaction reads a row, another transaction updates it, and your subsequent writes overwrite that change.
Rule: If your business logic depends on a value staying stable across reads, use SELECT ... FOR UPDATE or move to Repeatable Read.
Key Takeaway
ACID properties are guarantees about failure handling, not guarantees of correctness under concurrency.
The isolation level you choose determines what concurrency anomalies you accept.
Always match your isolation level to your business requirement — default settings are rarely right.
Normalization vs Denormalization — Choosing the Right Tradeoff
Normalization is the process of organizing your schema to eliminate redundant data and protect against update anomalies. Denormalization is the deliberate reversal of that — adding redundancy back in to speed up reads. The interview question isn't 'what is normalization?' — it's 'when do you normalize and when do you denormalize?'
Normalization follows 'Normal Forms'. First Normal Form (1NF) eliminates repeating groups — every cell holds one atomic value. Second Normal Form (2NF) removes partial dependencies — every non-key column depends on the whole primary key, not just part of it. Third Normal Form (3NF) removes transitive dependencies — non-key columns shouldn't depend on other non-key columns.
Here's the real-world intuition: if you store a customer's city and zip code in the orders table, you'll have to update both every time a customer moves — and you risk them going out of sync. 3NF says: put zip code in a customers table and reference it by customer_id. Now city and zip are always consistent.
Denormalization makes sense in read-heavy analytical systems (data warehouses, reporting dashboards) where joins across 10 normalized tables are too slow. You pre-join the data at write time so reads are instant. The tradeoff: writes become slower and you need application logic to keep redundant data in sync.
The rule of thumb: normalize by default in OLTP systems (transactions), denormalize deliberately in OLAP systems (analytics).
normalization_example.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
-- =============================================-- BAD: Unnormalized orders table (violates 3NF)-- =============================================-- Problem: customer_city depends on customer_id, NOT on order_id.-- If a customer moves, you update 100 rows instead of 1.CREATETABLEorders_unnormalized (
order_id INTPRIMARYKEY,
customer_id INT,
customer_name VARCHAR(100), -- redundant — should live in customers table
customer_city VARCHAR(100), -- transitive dependency: city -> customer, not order
product_name VARCHAR(100),
order_total DECIMAL(10, 2)
);
-- =============================================-- GOOD: Normalized to 3NF — two separate tables-- =============================================CREATETABLEcustomers (
customer_id INTPRIMARYKEY,
customer_name VARCHAR(100) NOTNULL,
customer_city VARCHAR(100) NOTNULL-- city lives here — update once, reflects everywhere
);
CREATETABLEorders (
order_id INTPRIMARYKEY,
customer_id INTNOTNULLREFERENCEScustomers(customer_id), -- FK enforces integrity
product_name VARCHAR(100) NOTNULL,
order_total DECIMAL(10, 2) NOTNULL
);
-- Insert sample dataINSERTINTO customers VALUES (1, 'Alice Johnson', 'New York');
INSERTINTO orders VALUES (101, 1, 'Laptop', 1299.99);
INSERTINTO orders VALUES (102, 1, 'Mouse', 29.99);
-- Join to reconstruct the full picture — this is the cost of normalization-- (the join), but the benefit is zero data duplicationSELECT
o.order_id,
c.customer_name,
c.customer_city,
o.product_name,
o.order_total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
-- If Alice moves to Boston — ONE update, zero inconsistencyUPDATE customers SET customer_city = 'Boston'WHERE customer_id = 1;
When an interviewer asks about normalization, don't just recite the normal forms. Say: 'I normalize transactional schemas by default to avoid update anomalies, and I selectively denormalize in read-heavy or analytical contexts where join performance becomes a bottleneck — like pre-aggregating daily sales totals into a summary table.' That answer signals you think in tradeoffs, not rules.
Production Insight
Denormalizing without profiling can make a fast query slower by bloating row size and increasing I/O per scan.
Always measure the actual join cost before adding redundancy.
Rule: Profile first, denormalize second.
Key Takeaway
Normalize for write-heavy OLTP to avoid anomalies; denormalize for read-heavy OLAP after measuring bottlenecks.
The 3NF rule prevents transitive dependencies — city depends on customer, not order.
Use foreign keys to enforce referential integrity — without them, you'll get orphaned records.
Indexing Deep Dive — Why Indexes Speed Up Reads but Hurt Writes
An index is a separate data structure — usually a B-Tree — that the database builds and maintains alongside your table. It stores a sorted copy of one or more columns, with pointers back to the actual rows. Without an index, a query like WHERE email = 'alice@example.com' does a full table scan — reading every single row. With an index on email, the database does a binary search on the B-Tree and jumps directly to the matching row.
So why not index every column? Because every index you create is a structure that must be updated on every INSERT, UPDATE, and DELETE. If your orders table has 8 indexes, inserting one order requires 8 separate B-Tree updates. In a write-heavy system, that overhead adds up fast and can make inserts painfully slow.
There are two important index types you'll encounter in interviews. A clustered index physically reorders the table data to match the index order — there can only be one per table (SQL Server/MySQL InnoDB use the primary key as the clustered index by default). A non-clustered index is a separate structure with pointers to rows — you can have many, and they don't affect the storage order of the table.
Composite indexes follow the 'left-prefix rule': an index on (last_name, first_name) speeds up queries filtering on last_name alone, or both columns — but NOT on first_name alone. The database can only use the leftmost columns in the index definition.
indexing_strategy_demo.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Setup: large employees table to demonstrate index impactCREATETABLEemployees (
employee_id INTPRIMARYKEY, -- clustered index by default in most DBs
department_id INTNOTNULL,
last_name VARCHAR(100) NOTNULL,
first_name VARCHAR(100) NOTNULL,
email VARCHAR(200) UNIQUE, -- UNIQUE automatically creates an index
hire_date DATENOTNULL,
salary DECIMAL(10, 2)
);
-- Without any extra indexes, this query does a full table scan (slow at scale)-- EXPLAIN shows 'Seq Scan' or 'Full Table Scan' — red flag in productionEXPLAINSELECT * FROM employees WHERE last_name = 'Johnson';
-- Create a non-clustered index on last_nameCREATEINDEX idx_employees_last_name
ONemployees (last_name); -- B-Tree built; now searches on last_name are O(log n)-- After the index: EXPLAIN shows 'Index Scan' — jumps directly to matching rowsEXPLAINSELECT * FROM employees WHERE last_name = 'Johnson';
-- Composite index: useful when you frequently filter by department AND hire_dateCREATEINDEX idx_employees_dept_hire
ONemployees (department_id, hire_date);
-- This query USES the composite index (left-prefix rule satisfied)SELECT * FROM employees
WHERE department_id = 3AND hire_date > '2022-01-01'; -- Both columns in index, in order-- This query CANNOT use the composite index efficiently-- (skips department_id, starts at second column — violates left-prefix rule)SELECT * FROM employees
WHERE hire_date > '2022-01-01'; -- Falls back to full scan-- Check index usage and fragmentation (PostgreSQL syntax)SELECT
indexname,
idx_scan, -- how many times this index was used
idx_tup_read -- rows read through this indexFROM pg_stat_user_indexes
WHERE tablename = 'employees';
Output
-- EXPLAIN before index:
Seq Scan on employees (cost=0.00..2841.00 rows=45 width=80)
Filter: ((last_name)::text = 'Johnson')
-- EXPLAIN after index:
Index Scan using idx_employees_last_name on employees (cost=0.29..8.31 rows=45 width=80)
Pro Tip: Use EXPLAIN Before Adding Indexes Blindly
Always run EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) on a slow query before creating an index. The query plan shows exactly which step is expensive — sometimes the bottleneck is a missing join condition or a function wrapping a column (like WHERE YEAR(hire_date) = 2023, which prevents index use entirely). Fix the query first, then index if needed.
Production Insight
A missing index on a foreign key column is the #1 cause of cascading full table scans in production joins.
Always index foreign keys by default — the database won't warn you.
Rule: When adding a FK constraint, add an index on the FK column immediately.
Key Takeaway
Indexes are B-Trees that speed reads at the cost of slower writes.
The left-prefix rule governs composite indexes — always put the most selective column first.
Use EXPLAIN before creating an index — sometimes the query itself is the problem.
Joins, Keys, and the Difference Between WHERE and HAVING
Joins are the mechanism for combining rows from two or more tables based on a related column. The confusion isn't usually about syntax — it's about understanding which join to use and what happens to unmatched rows.
INNER JOIN returns only rows where the join condition is satisfied in both tables. If a customer has no orders, they don't appear in the result. LEFT JOIN returns all rows from the left table and matching rows from the right — unmatched right-side columns are NULL. This is critical for questions like 'find all customers who have never placed an order' (WHERE order_id IS NULL after a LEFT JOIN).
Cross joins and self-joins come up less often in interviews but signal depth. A self-join joins a table to itself — useful for hierarchical data like an employees table where manager_id references another employee_id in the same table.
The WHERE vs HAVING distinction is a classic interview trap. WHERE filters rows before aggregation happens. HAVING filters groups after aggregation. If you put an aggregate function like COUNT() in a WHERE clause, you'll get a syntax error — because WHERE runs before GROUP BY has created any groups to count. HAVING runs after GROUP BY, so aggregates are fair game.
Primary keys uniquely identify a row and cannot be NULL. Foreign keys enforce referential integrity — a foreign key column in one table must match an existing primary key in another table (or be NULL). Without this constraint, you can end up with orphaned records: orders pointing to customer_ids that no longer exist.
joins_and_groupby_demo.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
-- Setup: customers and orders tablesCREATETABLEcustomers (
customer_id INTPRIMARYKEY,
customer_name VARCHAR(100) NOTNULL
);
CREATETABLEorders (
order_id INTPRIMARYKEY,
customer_id INTREFERENCEScustomers(customer_id), -- foreign key
amount DECIMAL(10, 2) NOTNULL,
order_year INTNOTNULL
);
INSERTINTO customers VALUES (1, 'Alice Johnson'), (2, 'Bob Martinez'), (3, 'Carol Wu');
INSERTINTO orders VALUES
(101, 1, 250.00, 2023),
(102, 1, 450.00, 2023),
(103, 2, 125.00, 2022),
(104, 2, 800.00, 2023);
-- Carol Wu (customer_id=3) has NO orders — useful for LEFT JOIN demo-- =============================================-- INNER JOIN: only customers WITH orders-- =============================================SELECT c.customer_name, o.order_id, o.amount
FROM customers c
INNERJOIN orders o ON c.customer_id = o.customer_id;
-- Carol Wu is NOT in this result-- =============================================-- LEFT JOIN: ALL customers, even those without orders-- =============================================SELECT c.customer_name, o.order_id, o.amount
FROM customers c
LEFTJOIN orders o ON c.customer_id = o.customer_id;
-- Carol Wu appears with NULL for order_id and amount-- =============================================-- Find customers who have NEVER placed an order-- =============================================SELECT c.customer_name
FROM customers c
LEFTJOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id ISNULL; -- NULL means no matching order row was found-- =============================================-- WHERE vs HAVING — the classic interview trap-- =============================================-- WHERE filters individual rows BEFORE groupingSELECT customer_id, SUM(amount) AS total_spent
FROM orders
WHERE order_year = 2023-- filter rows: only 2023 orders enter the groupGROUPBY customer_id;
-- HAVING filters groups AFTER aggregationSELECT customer_id, SUM(amount) AS total_spent
FROM orders
GROUPBY customer_id
HAVINGSUM(amount) > 500; -- filter groups: only big spenders-- WRONG: This throws an error — you can't use aggregate in WHERE-- SELECT customer_id FROM orders WHERE SUM(amount) > 500 GROUP BY customer_id;-- ERROR: aggregate functions are not allowed in WHERE
Output
-- INNER JOIN result:
customer_name | order_id | amount
--------------+----------+--------
Alice Johnson | 101 | 250.00
Alice Johnson | 102 | 450.00
Bob Martinez | 103 | 125.00
Bob Martinez | 104 | 800.00
-- LEFT JOIN result:
customer_name | order_id | amount
--------------+----------+--------
Alice Johnson | 101 | 250.00
Alice Johnson | 102 | 450.00
Bob Martinez | 103 | 125.00
Bob Martinez | 104 | 800.00
Carol Wu | NULL | NULL
-- Customers with NO orders:
customer_name
--------------
Carol Wu
-- WHERE 2023 filter result:
customer_id | total_spent
------------+------------
1 | 700.00
2 | 800.00
-- HAVING > 500 result:
customer_id | total_spent
------------+------------
2 | 925.00
Interview Gold: The LEFT JOIN + IS NULL Pattern
The pattern LEFT JOIN ... WHERE right_table.id IS NULL is a classic 'anti-join' — it finds rows in the left table that have no match in the right. Interviewers love this because it separates people who understand what a LEFT JOIN actually produces (NULLs on the right side for unmatched rows) from those who've just memorized a diagram. An alternative is NOT EXISTS, which is often more readable and equally performant on modern query optimizers.
Production Insight
The LEFT JOIN anti-join pattern (WHERE right.id IS NULL) is silently slow on large tables if the right table lacks an index on the join column.
Without an index, PostgreSQL does a Nested Loop with a full scan per row.
Rule: Always index the join column in the right table for LEFT JOIN anti-joins.
Key Takeaway
WHERE filters rows before GROUP BY; HAVING filters groups after.
LEFT JOIN + IS NULL finds unmatched rows; NOT EXISTS is often equivalent.
Transactions, Isolation Levels, and Concurrency Control
A transaction groups multiple SQL statements into a single logical unit. When you COMMIT, all changes become permanent. When you ROLLBACK, all changes are undone. But what happens when two transactions run at the same time? That's where isolation levels and concurrency control kick in.
Most modern databases use Multi-Version Concurrency Control (MVCC). Instead of locking every row a transaction reads, MVCC keeps older versions of rows. Each transaction sees a snapshot of the data as of its start time. Readers never block writers, and writers never block readers. That's the magic. But MVCC isn't free — it increases storage and makes the transaction log grow.
The SQL standard defines four isolation levels
Read Uncommitted: No protection — can see uncommitted changes (dirty reads). Rarely used.
Read Committed: Prevents dirty reads. Default in PostgreSQL, Oracle, SQL Server. Allows non-repeatable reads and phantom reads.
Repeatable Read: Prevents non-repeatable reads. Still allows phantom reads (new rows inserted by others). Default in MySQL InnoDB.
Serializable: Highest level. Prevents all anomalies. Achieved through either locking or snapshot isolation (PostgreSQL uses Serializable Snapshot Isolation).
A deadlock occurs when two transactions each hold a lock the other needs. The database automatically kills one transaction to break the cycle. The fix is always application-side: access tables in the same order, keep transactions short, and avoid user interaction inside transactions.
isolation_and_deadlock_demo.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
-- =============================================-- Example: Lost Update under Read Committed-- =============================================CREATETABLEaccount_balance (
account_id INTPRIMARYKEY,
balance DECIMAL(10, 2) NOTNULLCHECK (balance >= 0)
);
INSERTINTO account_balance VALUES (1, 1000.00);
-- Transaction ABEGINTRANSACTION;
SELECT balance FROM account_balance WHERE account_id = 1; -- reads 1000-- ... some processing ...UPDATE account_balance SET balance = balance - 200WHERE account_id = 1;
-- Transaction B (concurrent)BEGINTRANSACTION;
SELECT balance FROM account_balance WHERE account_id = 1; -- also reads 1000UPDATE account_balance SET balance = balance - 300WHERE account_id = 1;
-- If Transaction B commits after A, the final balance is 700 (only B's change)-- Transaction A's deduction is lost. This is a lost update.-- Fix: Use SELECT ... FOR UPDATE to lock the rowBEGINTRANSACTION;
SELECT balance FROM account_balance WHERE account_id = 1FORUPDATE; -- acquires row lock-- Transaction B will block here until A commitsUPDATE account_balance SET balance = balance - 200WHERE account_id = 1;
COMMIT;
-- =============================================-- Deadlock scenario (avoid this)-- =============================================CREATETABLEtable_a (id INTPRIMARYKEY, val INT);
CREATETABLEtable_b (id INTPRIMARYKEY, val INT);
INSERTINTO table_a VALUES (1, 10);
INSERTINTO table_b VALUES (1, 20);
-- Transaction 1 (started first)BEGIN;
UPDATE table_a SET val = val + 1WHERE id = 1;
-- now try to update table_bUPDATE table_b SET val = val + 1WHERE id = 1; -- blocked if Transaction 2 holds lock on table_b-- Transaction 2 (concurrent)BEGIN;
UPDATE table_b SET val = val + 1WHERE id = 1;
-- now try to update table_aUPDATE table_a SET val = val + 1WHERE id = 1; -- DEADLOCK!-- One of them will be killed by the database; the other succeeds.-- Prevention: always update table_a before table_b in both transactions.
Output
-- After lost update scenario (without FOR UPDATE):
account_id | balance
-----------+--------
1 | 700.00 -- lost $200
-- After using FOR UPDATE:
account_id | balance
-----------+--------
1 | 800.00 -- correct
-- Deadlock detection log (PostgreSQL):
ERROR: deadlock detected
DETAIL: Process 123 waits for ShareLock on transaction 456; blocked by process 456.
HINT: See server log for query details.
Implicit Locks Can Still Cause Problems
Even with MVCC, UPDATE and DELETE statements acquire write locks on affected rows. If two transactions update the same row concurrently, one will wait for the other. Long-running transactions holding locks for minutes cause lock escalation and can cascade into a system-wide stall. Keep transactions short and commit early.
Production Insight
Default Read Committed isolation allows non-repeatable reads and phantom reads.
If your application reads a sum of account balances twice in a transaction and gets different results, you have a phantom read.
Rule: Use Serializable only when absolutely necessary — it kills throughput.
Key Takeaway
MVCC enables non-blocking reads by keeping older row versions.
Isolation levels trade safety for performance — choose the weakest level that guarantees correctness.
Deadlocks are avoided by locking tables in the same order across all transactions.
● Production incidentPOST-MORTEMseverity: high
Lost Update Due to Read Committed Isolation
Symptom
Account balance ended up higher than expected after two simultaneous transfers; reconciliation failed.
Assumption
Atomicity would prevent partial transactions, so it was safe to run without explicit locking.
Root cause
Both transactions read the balance under Read Committed isolation before the other committed. Each calculated a new balance based on the same initial value. The second commit overwrote the first, effectively losing the first update.
Fix
Changed the transfer logic to use SELECT ... FOR UPDATE (pessimistic locking) or switched to Serializable isolation level for the transfer transaction.
Key lesson
Read Committed does not protect against lost updates when two transactions read and then write the same row.
If your business logic depends on a value staying stable between a read and a write, you must lock the row or use a higher isolation level.
Always test concurrency scenarios with high load — they often reveal hidden race conditions.
Production debug guideSystematic approach to find and fix slow queries in production.3 entries
Symptom · 01
Query that used to be fast suddenly becomes slow
→
Fix
Check if an index was dropped or if statistics are stale. Run ANALYZE and review recent DDL changes.
Symptom · 02
Deadlock error messages in application logs
→
Fix
Identify all transactions involved, extract their lock ordering, and ensure all code paths acquire locks on resources in the same global order.
Symptom · 03
Full table scan on a large table despite an existing index
→
Fix
Check if the index is on the column wrapped in a function (e.g., WHERE YEAR(date)=2023). Rewrite as range condition or create a function-based index.
★ DBMS Troubleshooting Quick ReferenceCommon production database problems and the three commands that fix them.
Slow query on a large table−
Immediate action
Run EXPLAIN ANALYZE to see the actual plan and row estimates.
Commands
EXPLAIN (ANALYZE, BUFFERS) SELECT ...
Check for missing indexes: SELECT schemaname, tablename, seq_scan FROM pg_stat_user_tables WHERE seq_scan > 1000;
Fix now
Add an index on the filtering columns, or rewrite the query to avoid full scan.
Transaction deadlock detected+
Immediate action
Identify the blocking transaction using pg_locks (PostgreSQL) or SHOW PROCESSLIST (MySQL).
Commands
SELECT pid, state, query, wait_event FROM pg_stat_activity WHERE wait_event IS NOT NULL;
Kill the blocking transaction: SELECT pg_terminate_backend(pid);
Fix now
Redesign application code to acquire locks in the same order across all transactions.
Missing rows after UPSERT+
Immediate action
Check for duplicate key violations or constraint errors in logs.
Commands
SHOW server_errors; (PostgreSQL) or check error log file.
Test the UPSERT manually with a small dataset to see if ON CONFLICT is correctly specified.
Fix now
Add a UNIQUE constraint and ensure the ON CONFLICT clause targets the correct constraint.
Aspect
OLTP (Operational)
OLAP (Analytical)
Primary use case
Day-to-day transactions (insert/update/delete)
Reporting and aggregations (complex reads)
Schema style
Highly normalized (3NF) to avoid anomalies
Denormalized (star/snowflake schema) for fast reads
Query type
Many small, fast queries on few rows
Few large queries scanning millions of rows
Indexing focus
Index on FK columns and WHERE predicates
Columnar storage; bitmap indexes on low-cardinality columns
Transaction need
Strong ACID guarantees required
Often relaxed; eventual consistency acceptable
Examples
Bank transfers, e-commerce orders, user logins
Sales dashboards, trend analysis, data warehouses
Optimization target
Write throughput, low latency
Read throughput, query parallelism
Key takeaways
1
ACID isn't magic
it's a contract about failure behavior. Atomicity prevents partial writes, Consistency enforces schema rules, Isolation controls visibility between concurrent transactions, and Durability means committed data survives crashes via Write-Ahead Logging.
2
Normalize by default for OLTP (one place to update, zero inconsistency), denormalize deliberately for OLAP (pre-joined data for instant reads). Never denormalize without profiling the actual query bottleneck first.
3
Every index is a write-time cost for a read-time benefit. The left-prefix rule governs composite indexes
a (a, b, c) index helps queries filtering on a, a+b, or a+b+c, but not b or c alone. Always verify with EXPLAIN before creating or dropping indexes.
4
WHERE filters rows before aggregation; HAVING filters groups after aggregation. The LEFT JOIN + WHERE right.id IS NULL anti-join pattern is one of the most practical SQL patterns in production, and it's a near-universal interview question.
5
MVCC is the dominant concurrency model
it lets readers see consistent snapshots without blocking writers. Always match your isolation level to the business requirement, and maintain consistent lock ordering to prevent deadlocks.
Common mistakes to avoid
4 patterns
×
Wrapping indexed column in a function in WHERE clause
Symptom
Index on hire_date is completely ignored; query performs a full table scan, causing slow response times in production.
Fix
Rewrite the condition as a range: WHERE hire_date >= '2023-01-01' AND hire_date < '2024-01-01' instead of WHERE YEAR(hire_date) = 2023.
×
Using DELETE to clear a large table instead of TRUNCATE
Symptom
Transaction log fills up, query hangs, and the operation takes orders of magnitude longer than expected.
Fix
Use TRUNCATE TABLE to remove all rows in a minimally logged, fast operation. TRUNCATE is not fully rollbackable in all DBMS — verify before use.
×
Ignoring NULL behavior in aggregate functions like COUNT and AVG
Symptom
COUNT(column) returns fewer rows than COUNT(*) when the column has NULLs; AVG(column) computes average only over non-NULL rows, skewing results.
Fix
Use COUNT(*) for total rows, or COALESCE(column, 0) before aggregation to handle NULLs explicitly.
×
Acquiring locks in inconsistent order across transactions
Symptom
Deadlock errors appear randomly under load, causing application retries and user-facing errors.
Fix
Enforce a global lock ordering: always update tables in the same sequence (e.g., always update accounts before orders).
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01SENIOR
Write a query to find the top 3 salaries in each department. If two empl...
Q02SENIOR
Design a 'Like' button system for 100M users. Would you use a normalized...
Q03SENIOR
Explain the difference between B-Tree and Hash indexes. Why is a B-Tree ...
Q04SENIOR
A table has a composite index on (country_id, status, created_at). Which...
Q05JUNIOR
Explain the difference between locking and MVCC in concurrency control.
Q06SENIOR
What is a deadlock and how do you prevent it in a database application?
Q01 of 06SENIOR
Write a query to find the top 3 salaries in each department. If two employees have the same salary, they should both be included in the ranking.
ANSWER
Use DENSE_RANK() window function to assign ranks without gaps:
SELECT department_id, employee_id, salary
FROM (
SELECT *, DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees
) ranked
WHERE rank <= 3;
DENSE_RANK ensures ties get the same rank and the next rank is not skipped. ROW_NUMBER would arbitrarily pick one of the tied rows. RANK would skip a number after a tie, potentially giving fewer than 3 results.
Q02 of 06SENIOR
Design a 'Like' button system for 100M users. Would you use a normalized SQL table or an atomic counter in NoSQL? Explain the ACID vs Performance tradeoff.
ANSWER
For a like system where consistency matters (e.g., showing accurate count), a normalized SQL table with a row per like provides referential integrity and prevents duplicate likes via a unique constraint. Reads become slower (SELECT COUNT(*)) but you can cache the count in Redis with periodic sync. A pure NoSQL counter is fast for writes but risks data loss if the counter and the like record get out of sync. In practice, a hybrid approach works: store likes in a SQL table for correctness, and maintain a cached count in Redis with eventual consistency. Use the SQL table as source of truth and rebuild cache on failure.
Q03 of 06SENIOR
Explain the difference between B-Tree and Hash indexes. Why is a B-Tree generally preferred for columns with range queries?
ANSWER
A B-Tree index organizes data in a balanced tree structure, allowing efficient range scans (BETWEEN, >, <) in O(log n + k) where k is the number of matched rows. A hash index uses a hash function to map keys to bucket slots, providing O(1) lookup for exact equality. Hash indexes cannot support range queries because the hash function destroys ordering. B-Trees are also better for ORDER BY and prefix matching. Use hash indexes only when queries are pure equality lookups and you need maximum single-row speed.
Q04 of 06SENIOR
A table has a composite index on (country_id, status, created_at). Which queries will use the index? (a) WHERE status = 'active', (b) WHERE country_id = 5, (c) WHERE country_id = 5 AND created_at > '2024-01-01'?
ANSWER
(a) Will NOT use the index efficiently — it skips the leftmost column country_id, violating the left-prefix rule. The database may still use an index skip scan (in PostgreSQL) but generally it's a full scan.
(b) Uses the index — it matches the leftmost column. It will filter on country_id quickly.
(c) Uses the index — country_id matches the first column, and created_at matches the third column. However, because status (the second column) is not used, the index cannot filter on created_at directly; it will scan all rows for that country_id and then filter on created_at. Adding a separate index on (country_id, created_at) would be better if this query is common.
Answer: (b) and (c) can use the index, but (c) is less efficient. (a) cannot.
Q05 of 06JUNIOR
Explain the difference between locking and MVCC in concurrency control.
ANSWER
Locking (pessimistic concurrency) prevents conflicts by blocking other transactions from accessing locked data until the lock is released. It's simple but reduces concurrency and can cause deadlocks. MVCC (optimistic concurrency) allows multiple transactions to see and modify different versions of the same row simultaneously. Each transaction works on a snapshot of the data at its start time. Writers still acquire locks briefly on the rows they modify, but readers never block. MVCC avoids most lock contention but increases storage and can lead to write skew anomalies under Serializable isolation. Most modern databases (PostgreSQL, Oracle, MySQL InnoDB) use MVCC as their primary model.
Q06 of 06SENIOR
What is a deadlock and how do you prevent it in a database application?
ANSWER
A deadlock occurs when two or more transactions each hold locks on resources that the others need, creating a cycle of waiting. The database detects this and kills one transaction (the one with the least work done) to break the cycle. To prevent deadlocks:
1. Access tables and rows in the same order across all transactions.
2. Keep transactions short and commit quickly.
3. Use lower isolation levels when possible to reduce lock duration.
4. For hot rows, consider using application-level queueing or optimistic locking (e.g., version column).
5. If deadlocks persist, implement retry logic in the application layer.
01
Write a query to find the top 3 salaries in each department. If two employees have the same salary, they should both be included in the ranking.
SENIOR
02
Design a 'Like' button system for 100M users. Would you use a normalized SQL table or an atomic counter in NoSQL? Explain the ACID vs Performance tradeoff.
SENIOR
03
Explain the difference between B-Tree and Hash indexes. Why is a B-Tree generally preferred for columns with range queries?
SENIOR
04
A table has a composite index on (country_id, status, created_at). Which queries will use the index? (a) WHERE status = 'active', (b) WHERE country_id = 5, (c) WHERE country_id = 5 AND created_at > '2024-01-01'?
SENIOR
05
Explain the difference between locking and MVCC in concurrency control.
JUNIOR
06
What is a deadlock and how do you prevent it in a database application?
SENIOR
FAQ · 4 QUESTIONS
Frequently Asked Questions
01
What is the difference between a primary key and a unique key in SQL?
A primary key uniquely identifies each row AND cannot contain NULL values — every table should have exactly one. A unique key also enforces uniqueness across a column, but it does allow NULL values (depending on the DBMS, either one or multiple NULLs). Use a primary key for your main identity and unique keys for natural keys like email addresses or passport numbers.
Was this helpful?
02
How do you resolve a deadlock in a production database?
Most modern DBMS (like PostgreSQL or MySQL) automatically detect deadlocks and kill the 'cheapest' transaction to break the cycle. However, to prevent them, you must ensure that all application logic accesses tables in the exact same order. If Service A updates Table 1 then 2, but Service B updates Table 2 then 1, a deadlock is inevitable under load. Keep transactions short and avoid user interaction inside a transaction block.
Was this helpful?
03
When should I use a stored procedure versus writing SQL in application code?
Use stored procedures for performance-critical batch processing to reduce network round-trips, or to centralize complex security/audit logic that multiple languages (e.g., a Python microservice and a Java monolith) must share. Use application-side SQL (via JDBI or Spring Data) for 90% of your CRUD logic to maintain better version control, easier unit testing, and simpler horizontal scaling of your app tier.
Was this helpful?
04
What is the difference between a clustered and a non-clustered index?
A clustered index physically reorders the table data to match the index order — each table can have only one clustered index because the data rows can only be stored in one order. The primary key is often the clustered index by default. A non-clustered index is a separate structure that contains a copy of the indexed columns and a pointer to the actual row. You can have many non-clustered indexes per table. Clustered indexes are faster for range queries on the indexed column, but slower for writes because the data pages must be physically reordered.