DBMS Interview Questions — Deadlock Debugging & Index Traps
UPDATE without index caused 100% CPU deadlock cascade in checkout.
20+ years shipping production systems from the metal up. Lessons pulled from things that broke in production.
- ACID guarantees Atomicity, Consistency, Isolation, Durability — each solves a specific failure mode
- Normalization reduces redundancy via 1NF→3NF, but denormalization wins in OLAP
- Clustered index physically sorts rows; non-clustered is a separate pointer structure
- Isolation levels are a dial: READ COMMITTED for most apps, SERIALIZABLE for financial accuracy
- Indexes speed reads, slow writes — only index WHERE/JOIN/ORDER BY columns
- Foreign keys prevent orphan rows but can cascade into performance traps
A database management system is like a super-organized filing cabinet in a busy hospital. Every patient record, appointment, and prescription is stored in labeled folders (tables), cross-referenced by unique patient IDs (primary keys), and managed by a strict librarian (the DBMS) who makes sure two nurses never overwrite each other's notes at the same time. When an interviewer asks about DBMS, they want to know you understand not just where things are stored, but how they stay consistent, fast, and safe.
Most developers can write a SELECT statement in their sleep. But interviews don't test that — they test whether you understand what happens underneath when that query runs, why your app grinds to a halt under load, or why two users can accidentally corrupt each other's data without any error message. DBMS knowledge is the difference between a developer who writes queries and one who designs systems that survive production.
The real problem is that most learning resources dump definitions at you — ACID, normalization, indexes — without explaining why any of it was invented. Normalization wasn't invented to annoy you with extra joins; it was invented because data duplication causes silent corruption. Transactions weren't invented to slow things down; they were invented because power cuts happen mid-write. Understanding the 'why' is what separates good interview answers from great ones.
By the end of this article you'll be able to explain normalization forms with a concrete example, describe ACID properties with a real scenario an interviewer can't poke holes in, differentiate clustered vs non-clustered indexes in a single sentence, and confidently answer the tricky follow-up questions that trip most candidates up.
What DBMS Interview Questions Actually Test
DBMS interview questions assess your understanding of how databases manage concurrent access and optimize queries. The core mechanic is the interplay between isolation levels, locking strategies, and index structures. A deadlock occurs when two transactions each hold a lock the other needs, creating a cycle that the database must detect and break by aborting one transaction. Index traps arise when a query looks efficient but forces a full scan because the index doesn't match the filter or sort order.
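The "cycle that the database must detect" can be pictured as a wait-for graph: each transaction points at the transactions holding locks it needs. The sketch below is a simplified model, not any engine's actual detector. The function name find_deadlock_cycle and the dictionary format are assumptions made up for illustration.

```python
from typing import Dict, List, Optional


def find_deadlock_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of transaction ids from a wait-for graph, or None.

    waits_for maps a transaction id to the transactions whose locks it is
    waiting on. This is a depth-first search that reports the first cycle found.
    """
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {txn: UNVISITED for txn in waits_for}
    path: List[str] = []

    def visit(txn: str) -> Optional[List[str]]:
        state[txn] = ON_PATH
        path.append(txn)
        for blocker in waits_for.get(txn, []):
            blocker_state = state.get(blocker, UNVISITED)
            if blocker_state == ON_PATH:
                # blocker is already on the current path: everything from it
                # to txn forms a cycle of mutual waiting.
                return path[path.index(blocker):]
            if blocker_state == UNVISITED:
                cycle = visit(blocker)
                if cycle:
                    return cycle
        path.pop()
        state[txn] = DONE
        return None

    for txn in list(waits_for):
        if state[txn] == UNVISITED:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


# T1 waits on T2 and T2 waits on T1: a classic two-transaction deadlock.
deadlocked = find_deadlock_cycle({"T1": ["T2"], "T2": ["T1"]})
# T1 waits on T2, but T2 is not waiting on anyone, so it will finish.
healthy = find_deadlock_cycle({"T1": ["T2"], "T2": []})
```

A real engine would then pick one transaction in the cycle as the victim and roll it back. How the victim is chosen varies by database.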
In practice, databases use lock-based or multi-version concurrency control (MVCC). MVCC avoids read locks by keeping old row versions, which reduces deadlock probability but does not eliminate it, because writers still lock the rows they change. B-tree indexes provide O(log n) lookups, but composite indexes require leftmost-prefix matching: a query that filters only on the second column generally cannot seek on the index (some engines fall back to a slower skip scan or a full index scan). Understanding these mechanics lets you predict performance and concurrency behavior without guessing.
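To see the leftmost-prefix rule in action, here is a minimal sketch using SQLite through Python's sqlite3 module. The orders table, its columns, and the index name are hypothetical, and the plan wording is SQLite's; other engines phrase their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, total REAL)"
)
# Composite index: customer_id is the leading column, status the second.
conn.execute("CREATE INDEX idx_orders_customer_status ON orders (customer_id, status)")


def plan(sql: str) -> str:
    """Return SQLite's query plan as one string (the detail text is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Filtering on the leading column lets SQLite seek into the index.
leading_plan = plan("SELECT total FROM orders WHERE customer_id = 42")
# Filtering on the second column alone gives it nothing to seek on.
second_only_plan = plan("SELECT total FROM orders WHERE status = 'PAID'")

print(leading_plan)
print(second_only_plan)
```

On recent SQLite versions the first plan should report a SEARCH using idx_orders_customer_status and the second a plain SCAN of the table. The exact text differs between versions.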
You need this knowledge when designing schemas for high-throughput systems. A payment service processing 10,000 transactions per second reduces deadlock risk by having every transaction touch tables and rows in the same order, so lock waits cannot form a cycle. An analytics query that would otherwise scan millions of table rows can often be answered from a covering index alone, which avoids the lookups into the table. These concepts separate a developer who writes queries from one who designs robust, scalable data access.
ACID Properties — What They Are and Why Every Transaction Needs Them
ACID is the four-part promise a database makes to you every time you run a transaction. Think of it as a bank's promise when you transfer money: either the full transfer happens or nothing does — you'll never lose money into thin air.
Atomicity means the transaction is all-or-nothing. If you're moving £500 from Account A to Account B and the server crashes after the debit but before the credit, atomicity rolls the debit back. No money disappears.
Consistency means the database only moves from one valid state to another. If your schema says 'balance cannot be negative', a consistent database will reject any transaction that violates that — even mid-flight.
Isolation means concurrent transactions don't see each other's dirty work. If two cashiers are processing transactions simultaneously, isolation at its strictest level makes it look like they ran one after the other, not at the same time.
Durability means once the database says 'committed', the data survives a crash. It achieves this through a write-ahead log: the change is recorded in a log on durable storage before the data files themselves are modified, so the database can replay the log after a crash.
Interviewers love asking which property is hardest to implement. A strong answer is Isolation, because perfect isolation requires serialisable transactions, which costs throughput, so databases let you tune it with isolation levels.
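The money-transfer scenario can be run end to end. This is a minimal sketch using SQLite through Python's sqlite3 module; the accounts table, the transfer helper, and the amounts are assumptions for illustration. The CHECK constraint stands in for the "balance cannot be negative" rule, and the credit is applied first so there is a half-finished change for the rollback to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 300), ("B", 100)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money in one transaction; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


# Account A only holds 300, so the debit violates the CHECK constraint
# and the credit that already ran is undone with it.
overdraw_ok = transfer("A", "B", 500)
after_failed = balances()

# A transfer that respects the constraint commits both halves together.
valid_ok = transfer("A", "B", 200)
after_success = balances()
```

After the failed transfer both balances are unchanged, which is atomicity and consistency working together: the rule rejected the debit, and the rollback removed the credit.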
Normalization Explained — From Messy Spreadsheet to Clean Schema
Normalization is the process of restructuring your database tables to eliminate data redundancy and prevent update anomalies. The best way to understand why it exists is to see what happens when you skip it.
Imagine a single Orders table where you store the customer's name, address, and email alongside every order they've ever placed. If a customer moves house, you need to update their address in 50 rows. Miss one? Congratulations, you now have inconsistent data — two 'truths' for the same customer.
Normalization solves this by separating concerns into distinct tables and linking them with keys. There are several normal forms, but interviewers focus on 1NF through 3NF.
1NF (First Normal Form): Every column holds atomic (indivisible) values, and each row is unique. No comma-separated lists in a single cell.
2NF (Second Normal Form): Achieves 1NF AND every non-key column is fully dependent on the entire primary key — not just part of it. This only matters for composite primary keys.
3NF (Third Normal Form): Achieves 2NF AND no non-key column depends on another non-key column (no transitive dependencies). Classic example: storing both ZipCode and City — City depends on ZipCode, not on the primary key directly.
BCNF (Boyce-Codd Normal Form) is a stricter version of 3NF: every determinant of a non-trivial functional dependency must be a superkey. Interviewers may test whether you know it exists, even if you haven't memorised the formal definition.
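The "customer moves house" anomaly is easy to reproduce. The sketch below uses SQLite through Python's sqlite3 module; the table names, columns, and sample rows are hypothetical. It builds a flat orders table and a normalized pair of tables, then applies the same address change to each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized: the address is repeated on every order row.
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY,
        customer_email TEXT,
        customer_address TEXT,
        item TEXT
    );
    INSERT INTO orders_flat VALUES
        (1, 'alice@example.com', '1 Old Street', 'kettle'),
        (2, 'alice@example.com', '1 Old Street', 'toaster');

    -- Normalized: customer facts live in one row, orders reference it by key.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email TEXT UNIQUE,
        address TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        item TEXT
    );
    INSERT INTO customers VALUES (1, 'alice@example.com', '1 Old Street');
    INSERT INTO orders VALUES (1, 1, 'kettle'), (2, 1, 'toaster');
""")

# An update that misses one row leaves two "truths" for the same customer.
conn.execute("UPDATE orders_flat SET customer_address = '9 New Road' WHERE order_id = 1")
flat_addresses = {row[0] for row in conn.execute("SELECT customer_address FROM orders_flat")}

# The normalized schema has exactly one place to change.
conn.execute("UPDATE customers SET address = '9 New Road' WHERE customer_id = 1")
normalized_addresses = {
    row[0]
    for row in conn.execute(
        "SELECT c.address FROM orders o JOIN customers c ON c.customer_id = o.customer_id"
    )
}
```

The flat table ends up with both the old and the new address, while every order in the normalized schema resolves to the single updated address.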
Indexes — The Phone Book Trick That Makes Queries 1000x Faster
An index is a separate data structure the database maintains alongside your table to make lookups faster. Without an index, a query like WHERE email = 'alice@example.com' forces the database to scan every single row — called a full table scan. With an index on email, the database jumps directly to the matching rows like finding a name in an alphabetically sorted phone book.
The two types interviewers always ask about are clustered and non-clustered indexes.
A clustered index stores the table's rows in index order. Because the rows themselves are the index, you can only have ONE clustered index per table. In SQL Server (by default) and MySQL InnoDB, the primary key is the clustered index; PostgreSQL stores tables as unordered heaps and does not maintain a clustered index.
A non-clustered index is a separate structure that stores the indexed column values alongside pointers back to the actual rows. You can have many of these. Think of it as the index pages at the back of a textbook — the book (table) isn't reordered, but the index tells you exactly which page to turn to.
The performance trade-off is real: indexes speed up reads dramatically but slow down writes (INSERT, UPDATE, DELETE) because the database must update the index every time data changes. This is why you don't just index every column — you index the columns that appear in WHERE, JOIN, and ORDER BY clauses on your most frequent queries.
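A quick way to watch a full table scan turn into an index lookup is to ask the database for its plan before and after creating the index. This is a minimal sketch with SQLite via Python's sqlite3 module; the users table and index name are hypothetical, and SQLite's plan text differs from what PostgreSQL or MySQL would print.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def plan(sql: str) -> str:
    """Return SQLite's query plan as one string (the detail text is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


query = "SELECT name FROM users WHERE email = 'user500@example.com'"

# No index on email yet: every row has to be examined.
plan_without_index = plan(query)

# With the index, SQLite can jump straight to the matching entry.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

The write-side cost is the mirror image: from this point on, every INSERT, UPDATE of email, or DELETE on users also has to maintain idx_users_email.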
Joins, Keys & Isolation Levels — The Questions That Actually Catch People Out
Most candidates can explain INNER JOIN. Fewer can explain why a NULL in a foreign key column bypasses the FOREIGN KEY check or what happens when two transactions read the same row simultaneously. These are the questions that separate solid candidates from exceptional ones.
Transaction isolation levels control how much one transaction can see of another's uncommitted work. The four standard levels (lowest to highest isolation) are: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE.
READ UNCOMMITTED allows dirty reads: you can see another transaction's changes before they commit. This is almost never safe. READ COMMITTED (the default in PostgreSQL) means you only see committed data, but a value you read once might change if you read it again in the same transaction (non-repeatable read). REPEATABLE READ (MySQL InnoDB default) guarantees the same value every time you read within a transaction, but the SQL standard still permits phantom rows, new rows that match your WHERE clause, at this level. SERIALIZABLE is the strictest: transactions behave as if they ran one at a time.
The practical impact: most web apps run fine at READ COMMITTED. Financial systems that need a stable view across several reads (like generating an account statement mid-month) should use REPEATABLE READ or SERIALIZABLE. Higher isolation means more locking, or more aborted and retried transactions under MVCC, and therefore lower throughput.
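For keys: a PRIMARY KEY uniquely identifies a row and cannot be NULL. A FOREIGN KEY enforces referential integrity: you cannot add an order for a customer_id that doesn't exist in the customers table, and (depending on your ON DELETE rule) you can't delete a customer who still has orders. A nullable foreign key column is the exception interviewers probe: a NULL value is not checked against the parent table, so the row is accepted without a matching customer.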
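The three foreign key behaviors worth knowing for an interview (orphan insert, blocked delete, NULL slipping through) can be checked in a few lines. This sketch uses SQLite through Python's sqlite3 module; the table definitions and the rejected helper are assumptions for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, which is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id) ON DELETE RESTRICT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (100, 1)")


def rejected(sql: str) -> bool:
    """Return True if the statement is refused because it violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Customer 999 does not exist, so this order would be an orphan.
orphan_rejected = rejected("INSERT INTO orders VALUES (101, 999)")
# Alice still has order 100, and the rule is ON DELETE RESTRICT.
delete_rejected = rejected("DELETE FROM customers WHERE customer_id = 1")
# A NULL customer_id is not compared against the parent table at all.
null_rejected = rejected("INSERT INTO orders VALUES (102, NULL)")
```

If orders must always belong to a customer, declare the column NOT NULL as well. The foreign key alone does not guarantee it.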
Query Optimization & Execution Plans — Reading the Database's Mind
You can write the perfect query, but if the database optimizer decides to scan 10 million rows, your app will crawl. Understanding how to read an execution plan is what transforms a good developer into a senior one. Interviewers love asking you to optimize a slow query — they want to see your thought process, not a magic bullet.
An execution plan shows the steps the database takes to run your query: which indexes it uses, which join algorithms (hash join, nested loop, merge join), and the estimated row counts. The most common red flag is a sequential scan ('Seq Scan' in PostgreSQL) on a large table for a query that returns only a few rows. That usually means you're missing an index or your WHERE clause isn't SARGable (Search ARGument Able).
SARGability is key: wrapping an indexed column in a function, as in LOWER(email) = 'alice@example.com', makes a plain index on email unusable because the database has to evaluate the function for every row. Instead, create a functional (expression) index on LOWER(email), or store the lowercase value in its own indexed column.
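The SARGability point can be verified by comparing plans. This is a hedged sketch on SQLite (via Python's sqlite3), which supports indexes on expressions; the users table and both index names are hypothetical. PostgreSQL and MySQL offer similar features under their own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")


def plan(sql: str) -> str:
    """Return SQLite's query plan as one string (the detail text is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


query = "SELECT name FROM users WHERE lower(email) = 'alice@example.com'"

# The plain index stores email, not lower(email), so it cannot be searched.
plan_plain_index = plan(query)

# Indexing the expression itself makes the same predicate searchable.
conn.execute("CREATE INDEX idx_users_email_lower ON users (lower(email))")
plan_expression_index = plan(query)

print(plan_plain_index)
print(plan_expression_index)
```

The query text did not change between the two plans. Only the index did, which is why this fix is often safer than rewriting application queries.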
Other optimizations: avoid SELECT *, use covering indexes, and analyze statistics regularly. A query that ran fine on 1000 rows can blow up on 1 million if the optimizer chooses a different plan due to stale statistics.
When reading a plan:
- Each step is an instruction: 'scan this table', 'join these two results', 'sort by column X'.
- The cost estimate is the optimizer's predicted expense for each step in arbitrary units, not measured time.
- A large gap between actual and estimated rows suggests the statistics are stale.
- For selective queries on large tables, you want to see index scans, not sequential scans.
What Is a DBMS? — The Answer That Separates Junior From Senior
Interviewers don't ask "What is a DBMS?" to hear a dictionary definition. They're testing if you understand why your application isn't a database. A DBMS is middleware that sits between raw storage and your code. It handles concurrent reads and writes without corruption, enforces schemas so your ORM doesn't silently write garbage, and recovers your data when the power dies mid-write.
MySQL, PostgreSQL, SQLite — they're all DBMSs. But the important part isn't what they are. It's what they stop you from having to build. No shared filesystem locking. No manual crash recovery. No hand-rolled permission checks. When a junior says "a DBMS stores data," a senior says "a DBMS enforces the contract between your application and your data."
Here's the trap: if you don't see the DBMS as a separate system with its own guarantees, you'll start treating your database as a dumb file cabinet. Next thing you know, you're writing stored procedures to do business logic because the ORM is too slow. That's when everything breaks.
DBMS vs RDBMS — The Interview Question That Exposes Your SQL Lies
Every junior parrots this: "RDBMS stores data in tables with relationships." That's the textbook answer. The senior answer is: "An RDBMS is a DBMS built on the relational model, so it enforces referential integrity with foreign keys at the engine level; a non-relational DBMS generally leaves that to the application."
Here's the difference that matters: a non-relational DBMS (like MongoDB or an XML database) lets you store loosely structured records and enforces few relationships between them by default. Fast writes, but you're on the hook for consistency. An RDBMS (PostgreSQL, MySQL) enforces relationships at the engine level. That foreign key you defined? It blocks you from deleting a customer with active orders. That unique constraint? It stops two users from registering the same email.
The interviewer is really asking: "Do you understand when to use which?" If you pick an RDBMS for a document store use case, you'll drown in schema migrations. If you pick a non-relational DBMS for financial transactions without adding your own integrity checks, you risk losing money. The correct answer: RDBMS when data integrity is non-negotiable (finance, orders, user profiles). Non-relational DBMS when speed and schema flexibility beat strictness (logs, analytics, session stores).
Triggers & Transactions — When the Database Fires Back
A trigger is a stored procedure that automatically executes in response to certain events on a table, like INSERT, UPDATE, or DELETE. The classic interview question asks you to write a trigger that updates the Emp table when Dep is updated: increment salary by a fixed amount for all employees in the affected department.
Why is this dangerous in production? Because triggers are invisible side effects: they execute within the same transaction as the triggering statement. A poorly written trigger can lock rows, cascade unexpectedly, or turn a single update into a statement that stalls the whole database.
The correct approach is to use AFTER UPDATE on Dep, join on dept_id, and apply the raise using UPDATE Emp SET salary = salary + increment WHERE dept_id IN (SELECT dept_id FROM inserted). Never forget the inserted pseudo-table: in SQL Server it holds the new values after the update, while deleted holds the old ones. PostgreSQL, MySQL, and SQLite expose the same rows as NEW and OLD instead.
The key takeaway: triggers are tempting but they turn your schema into a black box. Use them only when application-level logic cannot enforce the rule.
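Here is a runnable version of the Dep/Emp trigger, sketched on SQLite through Python's sqlite3 module. SQLite has no inserted pseudo-table, so this version uses NEW.dept_id, and the flat raise of 100, the trigger name, and the sample rows are assumptions for illustration. The second half shows the "same transaction" point: rolling back the update also rolls back what the trigger did.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dep (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Emp (
        emp_id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES Dep(dept_id),
        salary INTEGER
    );
    INSERT INTO Dep VALUES (1, 'Sales'), (2, 'Ops');
    INSERT INTO Emp VALUES (10, 1, 1000), (11, 1, 1200), (12, 2, 900);

    -- Hypothetical rule: any update to a Dep row gives that department a raise of 100.
    CREATE TRIGGER trg_dep_raise AFTER UPDATE ON Dep
    FOR EACH ROW
    BEGIN
        UPDATE Emp SET salary = salary + 100 WHERE dept_id = NEW.dept_id;
    END;
""")

# Updating department 1 silently changes two Emp rows as a side effect.
conn.execute("UPDATE Dep SET name = 'Field Sales' WHERE dept_id = 1")
salaries = dict(conn.execute("SELECT emp_id, salary FROM Emp ORDER BY emp_id"))
conn.commit()

# The trigger's work belongs to the caller's transaction, so a rollback
# undoes the Dep change and the raise together.
conn.execute("UPDATE Dep SET name = 'Operations' WHERE dept_id = 2")
conn.rollback()
salary_after_rollback = conn.execute(
    "SELECT salary FROM Emp WHERE emp_id = 12"
).fetchone()[0]
```

Nothing in the UPDATE Dep statement hints that Emp will change, which is exactly the black-box problem described above.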
Above-Average Students — The Subquery That Separates Insight from Noise
Finding students whose marks exceed the class average is a deceptively simple subquery problem. The naive approach—SELECT * FROM Students WHERE Marks > AVG(Marks)—fails because aggregate functions cannot appear directly in WHERE clauses. Why? Because WHERE filters rows one at a time, while AVG() requires a full scan first. The correct solution uses a subquery: SELECT Student, Marks FROM Students WHERE Marks > (SELECT AVG(Marks) FROM Students). That inner SELECT runs once, computes the global average, and the outer query compares each row against that single value. Interviewers love this question because it tests whether you understand the order of operations in SQL: WHERE runs after FROM but before GROUP BY and aggregates. A correlated subquery would be needless here—it would recompute the average for every row, killing performance. The key insight: always ask yourself whether the subquery is static (runs once) or correlated (runs per row). For above-average students, the answer is static.
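Both the failing and the working query can be tried directly. This is a minimal sketch on SQLite via Python's sqlite3 module; the four sample students are made up, and the exact error message for the naive query differs between engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Student TEXT, Marks INTEGER)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?)",
    [("Asha", 90), ("Ben", 70), ("Chen", 50), ("Dina", 80)],
)

# The naive form is rejected: an aggregate cannot be evaluated inside WHERE.
try:
    conn.execute("SELECT * FROM Students WHERE Marks > AVG(Marks)")
    naive_failed = False
except sqlite3.Error:
    naive_failed = True

# The scalar subquery computes the average (72.5 for this data) once,
# and the outer query compares each row against that single value.
above_average = conn.execute(
    "SELECT Student, Marks FROM Students "
    "WHERE Marks > (SELECT AVG(Marks) FROM Students) "
    "ORDER BY Marks DESC"
).fetchall()
```

With this data only Asha (90) and Dina (80) clear the 72.5 average.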
SELECT AVG(Marks) FROM Students scans the entire table. For millions of rows, this is expensive. Consider caching the average if the query runs often. A window function such as AVG(Marks) OVER () is another way to write the comparison, but it still has to read every row.
Third-Highest Salary — The Nested Query That Exposes Your Rank
Finding the employee with the third-highest salary tests your ability to think in sets, not loops. The most intuitive approach uses a subquery with DISTINCT and LIMIT/OFFSET: SELECT DISTINCT salary FROM Employee ORDER BY salary DESC LIMIT 1 OFFSET 2 returns the third-highest unique salary. Then the outer query fetches the employee: SELECT * FROM Employee WHERE salary = (SELECT DISTINCT salary FROM Employee ORDER BY salary DESC LIMIT 1 OFFSET 2). If multiple employees share that salary, you get all of them, which might be correct or not depending on the interview twist. A more portable method, since LIMIT/OFFSET syntax varies between engines, uses a correlated subquery: SELECT * FROM Employee e1 WHERE 2 = (SELECT COUNT(DISTINCT e2.salary) FROM Employee e2 WHERE e2.salary > e1.salary). Why does this work? It counts how many distinct salaries are strictly greater than the current row's salary; if exactly two, you're third. The key takeaway: always clarify whether ties should be included or skipped. The industry standard for 'Nth highest' is a window function, DENSE_RANK() when ties share a rank or ROW_NUMBER() when every row gets its own position, but interviewers often want to see you build it from subqueries first.
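The effect of DISTINCT and of ties is easiest to see on data that contains duplicates. The sketch below runs the subquery approaches on SQLite via Python's sqlite3 module; the six employees and their salaries are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Ava", 900), ("Bo", 900), ("Cy", 800), ("Di", 700), ("Ed", 700), ("Flo", 600)],
)

# Distinct salaries are 900, 800, 700, 600, so the third-highest is 700.
third_salary = conn.execute(
    "SELECT DISTINCT salary FROM Employee ORDER BY salary DESC LIMIT 1 OFFSET 2"
).fetchone()[0]

# Without DISTINCT the offset counts rows, and the third row is Cy's 800.
third_row_salary = conn.execute(
    "SELECT salary FROM Employee ORDER BY salary DESC LIMIT 1 OFFSET 2"
).fetchone()[0]

# Correlated form: keep rows that have exactly two distinct salaries above them.
third_by_count = conn.execute(
    "SELECT name FROM Employee e1 "
    "WHERE 2 = (SELECT COUNT(DISTINCT e2.salary) FROM Employee e2 "
    "           WHERE e2.salary > e1.salary) "
    "ORDER BY name"
).fetchall()
```

Both Di and Ed come back from the correlated query because they tie on 700, which is the "clarify the tie rule" conversation to have with the interviewer.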
On large tables, prefer a window function (DENSE_RANK()) instead. Interviewers accept subqueries, but production code must prioritize performance, so mention DENSE_RANK() as the scalable alternative.
The Midnight Deadlock Cascade That Took Down Checkout
- Never deploy schema changes or new queries without reviewing the execution plan.
- Missing indexes on UPDATE/DELETE WHERE clauses are a common root cause of deadlocks: without an index the statement scans, and may lock, far more rows than it changes.
- Run EXPLAIN ANALYZE on every new query against production-like data volumes.
EXPLAIN ANALYZE SELECT ...;
SHOW INDEX FROM table_name;
Key takeaways
Common mistakes to avoid
- Confusing DELETE, TRUNCATE and DROP
- Thinking more indexes always means better performance
- Misdefining a transaction as 'a single SQL statement'
- Assuming FOREIGN KEYs never hurt performance
Interview Questions on This Topic
LeetCode Hard (SQL): Find the 'Top 3 Salaries' per Department using a window function. If two employees share the same salary, they should both be included in the ranking. (Testing: DENSE_RANK() vs RANK()).
Use DENSE_RANK() because it assigns the same rank to ties and does not skip numbers. RANK() would skip numbers after ties, breaking the 'top 3' requirement (e.g., if two are tied at rank 1, the next rank is 3, so the third-highest salary lands at rank 4 and is filtered out). Query:
SELECT department_id, employee_id, salary
FROM (
    SELECT *, DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
    FROM employees
) ranked
WHERE rnk <= 3;
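To check the DENSE_RANK() versus RANK() claim, the same query can be run with each function on data that has a tie at the top. This is a sketch on SQLite via Python's sqlite3 module and assumes SQLite 3.25 or newer, the first version with window functions. The five sample rows and the top_three helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 10, 900), (2, 10, 900), (3, 10, 800), (4, 10, 700), (5, 10, 600)],
)


def top_three(rank_function: str) -> list:
    """Return ids of employees ranked 3 or better within their department.

    rank_function is only ever one of two fixed literals below, so building
    the SQL with an f-string is safe in this sketch.
    """
    sql = f"""
        SELECT employee_id
        FROM (
            SELECT *, {rank_function}() OVER (
                PARTITION BY department_id ORDER BY salary DESC
            ) AS rnk
            FROM employees
        ) ranked
        WHERE rnk <= 3
        ORDER BY employee_id
    """
    return [row[0] for row in conn.execute(sql)]


with_dense_rank = top_three("DENSE_RANK")  # ranks are 1, 1, 2, 3, 4
with_rank = top_three("RANK")              # ranks are 1, 1, 3, 4, 5
```

DENSE_RANK() keeps employee 4 (salary 700, the third distinct salary), while RANK() pushes that row to rank 4 and drops it.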