DBMS Interview Questions — Core Concepts, Tricky Queries & Real Answers
A database management system is like a super-organized filing cabinet in a busy hospital. Every patient record, appointment, and prescription is stored in labeled folders (tables), cross-referenced by unique patient IDs (primary keys), and managed by a strict librarian (the DBMS) who makes sure two nurses never overwrite each other's notes at the same time. When an interviewer asks about DBMS, they want to know you understand not just where things are stored, but how they stay consistent, fast, and safe.
Most developers can write a SELECT statement in their sleep. But interviews don't test that — they test whether you understand what happens underneath when that query runs, why your app grinds to a halt under load, or why two users can accidentally corrupt each other's data without any error message. DBMS knowledge is the difference between a developer who writes queries and one who designs systems that survive production.
The real problem is that most learning resources dump definitions at you — ACID, normalization, indexes — without explaining why any of it was invented. Normalization wasn't invented to annoy you with extra joins; it was invented because data duplication causes silent corruption. Transactions weren't invented to slow things down; they were invented because power cuts happen mid-write. Understanding the 'why' is what separates good interview answers from great ones.
By the end of this article you'll be able to explain normalization forms with a concrete example, describe ACID properties with a real scenario an interviewer can't poke holes in, differentiate clustered vs non-clustered indexes in a single sentence, and confidently answer the tricky follow-up questions that trip most candidates up.
ACID Properties — What They Are and Why Every Transaction Needs Them
ACID is the four-part promise a database makes to you every time you run a transaction. Think of it as a bank's promise when you transfer money: either the full transfer happens or nothing does, so your money never vanishes into thin air.
Atomicity means the transaction is all-or-nothing. If you're moving £500 from Account A to Account B and the server crashes after the debit but before the credit, atomicity rolls the debit back. No money disappears.
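To see atomicity without a server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS (the table and account IDs are illustrative). Python's "with conn:" block commits if its body finishes and rolls back if an exception escapes, so a simulated crash between the debit and the credit leaves both balances untouched.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """Create an in-memory bank with two illustrative accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bank_accounts (account_id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO bank_accounts VALUES (?, ?)", [("ACC-001", 1000), ("ACC-002", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, amount: int, crash_midway: bool = False) -> None:
    # "with conn" commits on success and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE bank_accounts SET balance = balance - ? WHERE account_id = 'ACC-001'", (amount,))
        if crash_midway:
            raise RuntimeError("simulated crash after the debit, before the credit")
        conn.execute("UPDATE bank_accounts SET balance = balance + ? WHERE account_id = 'ACC-002'", (amount,))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT account_id, balance FROM bank_accounts ORDER BY account_id"))


conn = make_bank()
try:
    transfer(conn, 500, crash_midway=True)
except RuntimeError:
    pass
after_crash = balances(conn)  # the debit was rolled back

transfer(conn, 500)
after_success = balances(conn)

print(after_crash)    # {'ACC-001': 1000, 'ACC-002': 0}
print(after_success)  # {'ACC-001': 500, 'ACC-002': 500}
```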
Consistency means the database only moves from one valid state to another. If your schema says 'balance cannot be negative' (for example, via a CHECK constraint), any statement that would break that rule fails, so the transaction can never commit an invalid state.
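A minimal sketch of consistency enforced by the schema, again assuming SQLite via Python's sqlite3 as a stand-in (illustrative table and values): the CHECK constraint rejects an overdraft, and the balance is left exactly as it was.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The business rule lives in the schema, so every writer is bound by it.
conn.execute(
    "CREATE TABLE bank_accounts ("
    " account_id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO bank_accounts VALUES ('ACC-001', 300)")
conn.commit()

try:
    with conn:  # rolls back the transaction if the UPDATE raises
        conn.execute("UPDATE bank_accounts SET balance = balance - 500 WHERE account_id = 'ACC-001'")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("Rejected:", exc)  # exact message text varies by SQLite version

balance = conn.execute("SELECT balance FROM bank_accounts WHERE account_id = 'ACC-001'").fetchone()[0]
print(balance)  # 300
```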
Isolation means concurrent transactions don't see each other's uncommitted work. If two cashiers are processing transactions simultaneously, full isolation (the SERIALIZABLE level) makes the outcome look as if they ran one after the other, not at the same time.
Durability means once the database says 'committed', the data survives a crash. Most databases achieve this with a write-ahead log (WAL): the change is recorded in the log and flushed to disk before the commit is acknowledged, while the data pages themselves can be written later.
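A rough, single-machine illustration rather than a real power-cut test: SQLite in WAL mode stands in for a server DBMS, and closing a connection before COMMIT stands in for a crash. The file name is hypothetical. Only the committed row is visible to a fresh connection.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "durability_demo.db")  # hypothetical file name

    conn = sqlite3.connect(path)
    journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]  # enable write-ahead logging
    conn.execute("CREATE TABLE ledger (entry_id INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO ledger VALUES (1)")
    conn.commit()                                  # acknowledged: this row must survive
    conn.execute("INSERT INTO ledger VALUES (2)")  # never committed
    conn.close()                                   # connection ends before COMMIT: change discarded

    # A brand-new connection (think: the process after a restart) sees only committed work.
    conn = sqlite3.connect(path)
    surviving = [row[0] for row in conn.execute("SELECT entry_id FROM ledger ORDER BY entry_id")]
    conn.close()

print(journal_mode, surviving)  # wal [1]
```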
Interviewers love asking which property is hardest to implement. A strong answer is Isolation: perfect isolation means serializable execution, which costs a lot of throughput, so databases let you trade some of it away through isolation levels.
```sql
-- Scenario: Production-grade bank transfer logic (T-SQL / SQL Server syntax)
-- Package: io.thecodeforge.db.transactions
SET XACT_ABORT ON;  -- any runtime error (e.g. a CHECK violation) rolls back the whole transaction

BEGIN TRANSACTION;

-- Step 1: Debit the sender only if the funds are there.
-- A CHECK constraint on 'balance >= 0' is still assumed as a safety net for Consistency.
UPDATE io_thecodeforge.bank_accounts
SET    balance = balance - 500
WHERE  account_id = 'ACC-001'
  AND  balance >= 500;

-- Step 2: No row debited means insufficient funds, so undo everything (Atomicity)
IF @@ROWCOUNT = 0
BEGIN
    ROLLBACK TRANSACTION;
    PRINT 'Transfer failed: insufficient funds. No changes were saved.';
    RETURN;
END

-- Step 3: Credit the receiver's account
UPDATE io_thecodeforge.bank_accounts
SET    balance = balance + 500
WHERE  account_id = 'ACC-002';

-- A missing receiver must not silently swallow the debit
IF @@ROWCOUNT = 0
BEGIN
    ROLLBACK TRANSACTION;
    PRINT 'Transfer failed: receiver not found. No changes were saved.';
    RETURN;
END

COMMIT TRANSACTION;
-- Durability: COMMIT returns only after the log records are flushed to disk (write-ahead logging).
PRINT 'Transfer committed successfully.';
```
Output when ACC-001 has insufficient funds:
```text
Transfer failed: insufficient funds. No changes were saved.
```
Normalization Explained — From Messy Spreadsheet to Clean Schema
Normalization is the process of restructuring your database tables to eliminate data redundancy and prevent update anomalies. The best way to understand why it exists is to see what happens when you skip it.
Imagine a single Orders table where you store the customer's name, address, and email alongside every order they've ever placed. If a customer moves house, you need to update their address in 50 rows. Miss one? Congratulations, you now have inconsistent data — two 'truths' for the same customer.
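A minimal sketch of that update anomaly, assuming SQLite via Python's sqlite3 and an illustrative denormalized table: an update that misses some rows leaves the same customer with two different addresses, and no error is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized, illustrative schema: the customer's address is copied onto every order row.
conn.execute(
    "CREATE TABLE orders_flat ("
    " order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, customer_address TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, 42, "1 Old Street, Leeds"), (2, 42, "1 Old Street, Leeds"), (3, 42, "1 Old Street, Leeds")],
)

# The customer moves, but the update misses rows: it only touches the latest order.
conn.execute("UPDATE orders_flat SET customer_address = '9 New Road, Manchester' WHERE order_id = 3")

addresses = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT customer_address FROM orders_flat WHERE customer_id = 42")
)
print(addresses)  # ['1 Old Street, Leeds', '9 New Road, Manchester']: two 'truths' for one customer
```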
Normalization solves this by separating concerns into distinct tables and linking them with keys. There are several normal forms, but interviewers focus on 1NF through 3NF.
1NF (First Normal Form): Every column holds atomic (indivisible) values, and each row is unique. No comma-separated lists in a single cell.
2NF (Second Normal Form): Is in 1NF AND every non-key column is fully dependent on the entire primary key, not just part of it. This only matters for composite primary keys, e.g. (order_id, product_id), where product_name would depend on product_id alone.
3NF (Third Normal Form): Is in 2NF AND no non-key column depends on another non-key column (no transitive dependencies). Classic example: storing both ZipCode and City in a Customers table. City depends on ZipCode, not directly on the primary key.
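BCNF (Boyce-Codd Normal Form) is a stricter version of 3NF: every determinant (the left-hand side X of a non-trivial functional dependency X → Y) must be a superkey. Interviewers mostly test whether you know it exists, but that one-line definition is worth having ready.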
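All three normal forms are stated in terms of functional dependencies (X → Y: equal X values imply equal Y values). The helper below is a hypothetical sketch for checking one against sample rows, using invented data. One caveat: sample data can only refute a dependency; whether it truly holds is a rule about the domain.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, object]], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if, within these rows, equal lhs values always come with equal rhs values (lhs -> rhs)."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same determinant, different dependent values: the FD is broken
    return True


# Composite key (order_id, product_id); illustrative data.
order_items = [
    {"order_id": 1, "product_id": 7, "product_name": "Kettle", "quantity": 2},
    {"order_id": 2, "product_id": 7, "product_name": "Kettle", "quantity": 1},
    {"order_id": 2, "product_id": 9, "product_name": "Toaster", "quantity": 1},
]
# product_name depends on PART of the key (product_id alone): a 2NF violation.
partial_dependency = fd_holds(order_items, ["product_id"], ["product_name"])

customers = [
    {"customer_id": 1, "zip": "M1 1AA", "city": "Manchester"},
    {"customer_id": 2, "zip": "M1 1AA", "city": "Manchester"},
    {"customer_id": 3, "zip": "M2 3BB", "city": "Manchester"},
]
# customer_id -> zip -> city: city depends on a non-key column, a 3NF violation.
transitive_dependency = fd_holds(customers, ["zip"], ["city"])
# The reverse does not hold: one city spans several zip codes.
city_determines_zip = fd_holds(customers, ["city"], ["zip"])

print(partial_dependency, transitive_dependency, city_determines_zip)  # True True False
```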
```sql
-- ============================================================
-- AFTER NORMALIZATION: Separated into clean tables (3NF)
-- Package: io.thecodeforge.db.schema
-- ============================================================

-- Postcodes table: city depends on the postcode, so it is stored once here
-- (keeping city in customers would be the transitive dependency 3NF forbids)
CREATE TABLE io_thecodeforge.postcodes (
    postcode VARCHAR(10)  PRIMARY KEY,
    city     VARCHAR(100) NOT NULL
);

-- Customers table: 3NF compliant, every non-key column depends only on customer_id
CREATE TABLE io_thecodeforge.customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL,
    customer_zip  VARCHAR(10)  NOT NULL REFERENCES io_thecodeforge.postcodes(postcode)
);

-- Products table: Centralized product definitions
CREATE TABLE io_thecodeforge.products (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100)   NOT NULL,
    unit_price   DECIMAL(10, 2) NOT NULL
);

-- Orders table: Linked via Foreign Keys for Referential Integrity
CREATE TABLE io_thecodeforge.orders (
    order_id    INT  PRIMARY KEY,
    customer_id INT  NOT NULL REFERENCES io_thecodeforge.customers(customer_id),
    product_id  INT  NOT NULL REFERENCES io_thecodeforge.products(product_id),
    quantity    INT  NOT NULL,
    order_date  DATE NOT NULL
);

-- If a customer moves, we update ONE row, so no other row can contradict it.
-- (Assumes postcode 'M1 1AA' already exists in postcodes.)
UPDATE io_thecodeforge.customers
SET    customer_zip = 'M1 1AA'
WHERE  customer_id = 42;

SELECT 'Update complete. Data integrity preserved.' AS result;
```
```text
result
--------------------------------------------
Update complete. Data integrity preserved.
```
Indexes — The Phone Book Trick That Makes Queries 1000x Faster
An index is a separate data structure the database maintains alongside your table to make lookups faster. Without an index, a query like WHERE email = 'alice@example.com' forces the database to scan every single row — called a full table scan. With an index on email, the database jumps directly to the matching rows like finding a name in an alphabetically sorted phone book.
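You can watch the planner switch from a full scan to an index search with SQLite's EXPLAIN QUERY PLAN. This is a minimal sketch assuming Python's sqlite3 and an illustrative user_sessions table; the exact wording of the plan text varies slightly between SQLite versions.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions ("
    " session_id INTEGER PRIMARY KEY, customer_email TEXT NOT NULL, login_time TEXT NOT NULL)"
)
lookup = "SELECT * FROM user_sessions WHERE customer_email = ?"
email = ("alice@example.com",)

before = query_plan(conn, lookup, email)  # no index yet: full table scan
conn.execute("CREATE INDEX idx_user_sessions_email ON user_sessions (customer_email)")
after = query_plan(conn, lookup, email)   # B-tree index on email: direct search

print(before)  # e.g. SCAN user_sessions
print(after)   # e.g. SEARCH user_sessions USING INDEX idx_user_sessions_email (customer_email=?)
```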
The two types interviewers always ask about are clustered and non-clustered indexes.
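A clustered index determines how the table's rows themselves are stored: the leaf level of the index holds the full rows, kept in index-key order. Because the rows themselves are the index, you can only have ONE clustered index per table. In SQL Server (by default) and MySQL's InnoDB, the primary key becomes the clustered index; PostgreSQL has no maintained clustered index (its CLUSTER command reorders a table once, and later writes don't preserve that order).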
A non-clustered index is a separate structure that stores the indexed column values alongside pointers back to the actual rows. You can have many of these. Think of it as the index pages at the back of a textbook — the book (table) isn't reordered, but the index tells you exactly which page to turn to.
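SQLite makes the "pointer back to the row" visible. An ordinary SQLite table is itself a B-tree keyed by its rowid (here session_id), which plays the clustered role, while idx_user_sessions_email is a separate structure holding the email plus that rowid. In this sketch (assuming Python's sqlite3 and an illustrative table), a query that only needs indexed columns is answered from the index alone, and SELECT * must follow each pointer back into the table.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions ("
    " session_id INTEGER PRIMARY KEY, customer_email TEXT NOT NULL, login_time TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_user_sessions_email ON user_sessions (customer_email)")
email = ("alice@example.com",)

# Only the email and the rowid are requested: the index entry already contains both.
covered = query_plan(conn, "SELECT session_id, customer_email FROM user_sessions WHERE customer_email = ?", email)
# login_time is not in the index: each match needs a second lookup into the table by rowid.
needs_lookup = query_plan(conn, "SELECT * FROM user_sessions WHERE customer_email = ?", email)

print(covered)       # e.g. SEARCH user_sessions USING COVERING INDEX idx_user_sessions_email (customer_email=?)
print(needs_lookup)  # e.g. SEARCH user_sessions USING INDEX idx_user_sessions_email (customer_email=?)
```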
The performance trade-off is real: indexes speed up reads dramatically but slow down writes (INSERT, UPDATE, DELETE) because the database must update the index every time data changes. This is why you don't just index every column — you index the columns that appear in WHERE, JOIN, and ORDER BY clauses on your most frequent queries.
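Indexing a WHERE column only pays off if the query compares the bare column. The sketch below assumes Python's sqlite3 and an illustrative users table. Wrapping username in LOWER() hides it from the plain index, and an expression index on exactly LOWER(username) is one common fix. The same idea exists as expression indexes in PostgreSQL and functional key parts in MySQL 8.0.13+.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_users_username ON users (username)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("John",), ("alice",), ("Bob",)])

lookup = "SELECT user_id FROM users WHERE LOWER(username) = ?"

# The function hides the column from idx_users_username: every entry must be checked.
wrapped = query_plan(conn, lookup, ("john",))

# Fix: index the expression itself, written exactly as the query writes it.
conn.execute("CREATE INDEX idx_users_username_lower ON users (LOWER(username))")
fixed = query_plan(conn, lookup, ("john",))

match = conn.execute(lookup, ("john",)).fetchall()
print(wrapped)  # e.g. SCAN users USING COVERING INDEX idx_users_username
print(fixed)    # e.g. SEARCH users USING INDEX idx_users_username_lower (<expr>=?)
print(match)    # [(1,)]
```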
```sql
-- Scenario: High-traffic user lookup optimization
-- Package: io.thecodeforge.db.indexing
-- Dialect: MySQL 8.0.18+ (InnoDB), where the PRIMARY KEY is the clustered index
--          and EXPLAIN ANALYZE is available
CREATE TABLE io_thecodeforge.user_sessions (
    session_id     BIGINT       PRIMARY KEY,  -- Clustered index
    customer_email VARCHAR(255) NOT NULL,
    login_time     DATETIME     NOT NULL
);

-- ============================================================
-- ANALYZE: Before adding a non-clustered index
-- ============================================================
EXPLAIN ANALYZE
SELECT * FROM io_thecodeforge.user_sessions
WHERE customer_email = 'dev@thecodeforge.io';
-- Expect: Table scan (O(N))

-- ============================================================
-- OPTIMIZE: Creating a B-Tree Non-Clustered (secondary) Index
-- ============================================================
CREATE INDEX idx_user_sessions_email
    ON io_thecodeforge.user_sessions (customer_email);

-- ============================================================
-- VERIFY: After indexing
-- ============================================================
EXPLAIN ANALYZE
SELECT * FROM io_thecodeforge.user_sessions
WHERE customer_email = 'dev@thecodeforge.io';
-- Expect: Index lookup using idx_user_sessions_email (O(log N))
```
Illustrative summary, assuming roughly 1M rows and a single matching email:
```text
-- Before: Full Table Scan (Cost high, rows scanned: 1M)
-- After: Index Lookup (Cost low, rows scanned: 1)
```
Joins, Keys & Isolation Levels — The Questions That Actually Catch People Out
Most candidates can explain INNER JOIN. Fewer can explain why a NULL value slips straight past a FOREIGN KEY check, or what happens when two transactions read the same row simultaneously. These are the questions that separate solid candidates from exceptional ones.
Transaction isolation levels control how much one transaction can see of another's uncommitted work. The four standard levels (lowest to highest isolation) are: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE.
READ UNCOMMITTED allows dirty reads: you can see another transaction's changes before they commit. This is almost never safe.
READ COMMITTED (the default in PostgreSQL, Oracle and SQL Server) means you only see committed data, but a value you read once might change if you read it again in the same transaction (a non-repeatable read).
REPEATABLE READ (the MySQL InnoDB default) guarantees the same value every time you re-read a row within a transaction, but the SQL standard still allows phantom rows: new rows that match your WHERE clause can appear on a second query. In practice, PostgreSQL's REPEATABLE READ does not allow phantoms, and InnoDB avoids most of them with snapshots and next-key locks.
SERIALIZABLE is the strictest: transactions execute as if they were running one at a time.
The practical impact: most web apps run fine at READ COMMITTED. Financial systems that need a consistent view across several queries (like generating an account statement mid-month) should use REPEATABLE READ or SERIALIZABLE. Higher isolation means more locking or more transactions aborted and retried, and therefore lower throughput.
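For keys: a PRIMARY KEY uniquely identifies a row and cannot be NULL. A FOREIGN KEY enforces referential integrity: you cannot add an order for a customer_id that doesn't exist in the customers table, and (depending on your ON DELETE rule) you can't delete a customer who still has orders. The catch: a NULL in the foreign-key column is not checked at all, so an order with no customer is accepted unless that column is also declared NOT NULL.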
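A minimal sketch of that NULL loophole, assuming SQLite via Python's sqlite3 and illustrative tables. Note that SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is enabled
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id))"  # nullable on purpose
)
conn.execute("INSERT INTO customers VALUES (1)")

conn.execute("INSERT INTO orders VALUES (100, 1)")     # parent row exists: accepted
conn.execute("INSERT INTO orders VALUES (101, NULL)")  # NULL: the FK check is skipped, accepted

try:
    conn.execute("INSERT INTO orders VALUES (102, 999)")  # no customer 999
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

orders_without_customer = conn.execute("SELECT COUNT(*) FROM orders WHERE customer_id IS NULL").fetchone()[0]
print(orphan_rejected, orders_without_customer)  # True 1
```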
```sql
-- Demonstrating Dirty Read Protection (T-SQL syntax; two separate sessions, run step by step)
-- Package: io.thecodeforge.db.concurrency
-- Assumes product 99 starts with stock_count = 50.

-- Session A: Updating stock (uncommitted)
BEGIN TRANSACTION;
UPDATE io_thecodeforge.inventory SET stock_count = 0 WHERE product_id = 99;

-- Session B: Reading under READ COMMITTED
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
SELECT stock_count FROM io_thecodeforge.inventory WHERE product_id = 99;
-- Result: 50 (Session A's uncommitted change is never visible).
-- PostgreSQL, or SQL Server with READ_COMMITTED_SNAPSHOT ON, returns 50 immediately;
-- SQL Server's default locking READ COMMITTED blocks here until Session A finishes.
COMMIT;

-- Session A: fails / rolls back
ROLLBACK;

-- If Session B used READ UNCOMMITTED, it would have seen '0',
-- causing a 'Dirty Read' and potentially incorrect business logic.
```
```text
stock_count
-----------
50          -- Correct: READ COMMITTED prevents dirty reads
```
| Feature | Clustered Index | Non-Clustered Index |
|---|---|---|
| Number per table | Only 1 | Up to 999 (SQL Server) / many |
| Physical row order | Leaf level holds the rows themselves, kept in key order | Separate structure with row pointers |
| Read speed | Fastest for range scans on indexed column | Fast for exact lookups; follows a pointer back to the row unless the index covers the query |
| Write overhead | Moderate (row order must be maintained) | Higher per index (each index updated separately) |
| Storage | Little extra storage: only the upper B-tree levels, since the leaf level IS the table | Extra disk space per index |
| Default usage | Usually the PRIMARY KEY | Foreign keys, WHERE/ORDER BY columns |
| Best for | ID-based range queries: WHERE id BETWEEN 100 AND 200 | Filtering on non-PK columns: WHERE email = '...' |
🎯 Key Takeaways
- ACID isn't just a buzzword — each letter solves a specific real-world failure mode: power cuts (Durability), crashes mid-transaction (Atomicity), constraint violations (Consistency), and concurrent users corrupting shared data (Isolation).
- Normalization to 3NF is correct for OLTP systems, but intentional denormalization is often the right call for OLAP/analytics — always tie your answer to the read/write profile of the system.
- A clustered index physically reorders the table rows and you get exactly one per table; non-clustered indexes are separate pointer structures and you can have many — but every index adds write overhead.
- Isolation levels are a dial between consistency and performance: READ COMMITTED is the pragmatic default for most apps, SERIALIZABLE is safest but slowest, and READ UNCOMMITTED is almost never appropriate in production code.
Interview Questions on This Topic
- LeetCode Hard (SQL): Find the top 3 salaries per department using a window function. If two employees share the same salary, both should be included in the ranking. (Testing: DENSE_RANK() vs RANK().)
- Scenario: Your production database shows high CPU usage during peak hours, and you spot a query filtering on WHERE LOWER(username) = 'john'. Why can this query cause a full table scan even though username is indexed, and how do you fix it? (Testing: functional indexes and SARGability.)
- Trade-off analysis: If normalization is so great, why do large-scale analytics platforms like Amazon Redshift or Google BigQuery often use denormalized, wide-table schemas? Isn't that just bad database design?
- LeetCode Medium: Write a query to find all customers who never ordered, using both a LEFT JOIN and a NOT EXISTS clause. Explain which one you would prefer for a table with 100 million records.
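For the top-3-salaries question, here is a sketch of why DENSE_RANK() is the usual pick, assuming SQLite 3.25+ (for window functions) via Python's sqlite3 and invented employee data. One department is shown for brevity; PARTITION BY department makes the ranking per department. With a tie at second place, RANK() skips to 4 and drops Dee, even though 70000 is the third-highest distinct salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "IT", 90000), ("Bob", "IT", 85000), ("Cid", "IT", 85000),
     ("Dee", "IT", 70000), ("Eve", "IT", 60000)],
)

# {rank_fn} is filled only from the two fixed names below, never from user input.
TOP_3 = """
    SELECT name, salary, rnk
    FROM (
        SELECT name, salary,
               {rank_fn}() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
        FROM employees
    )
    WHERE rnk <= 3
    ORDER BY salary DESC, name
"""

dense = conn.execute(TOP_3.format(rank_fn="DENSE_RANK")).fetchall()
ranked = conn.execute(TOP_3.format(rank_fn="RANK")).fetchall()

print(dense)   # [('Ann', 90000, 1), ('Bob', 85000, 2), ('Cid', 85000, 2), ('Dee', 70000, 3)]
print(ranked)  # [('Ann', 90000, 1), ('Bob', 85000, 2), ('Cid', 85000, 2)]
```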
Frequently Asked Questions
What is the difference between a primary key and a unique key in SQL?
A primary key uniquely identifies each row AND cannot contain NULL values; every table can have only one. A unique key also enforces uniqueness but allows NULLs, and a table can have multiple unique keys. How many NULLs varies: PostgreSQL (by default), MySQL and SQLite accept multiple NULLs in a unique column because NULL is never equal to NULL, while SQL Server allows only one. Strategically, PKs are usually for surrogate IDs, while unique keys handle natural keys like 'email' or 'SSN'.
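A quick way to check the NULL behaviour yourself, as a minimal sketch assuming SQLite via Python's sqlite3 and an illustrative accounts table: two NULL emails are accepted, while a genuine duplicate is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

conn.execute("INSERT INTO accounts (email) VALUES (NULL)")
conn.execute("INSERT INTO accounts (email) VALUES (NULL)")  # second NULL accepted: NULL never equals NULL
conn.execute("INSERT INTO accounts (email) VALUES ('a@example.com')")

try:
    conn.execute("INSERT INTO accounts (email) VALUES ('a@example.com')")  # real duplicate
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_emails = conn.execute("SELECT COUNT(*) FROM accounts WHERE email IS NULL").fetchone()[0]
print(null_emails, duplicate_rejected)  # 2 True
```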
What is a deadlock in a database and how do you prevent it?
A deadlock happens when Transaction A holds a lock on Row 1 and wants Row 2, while Transaction B holds Row 2 and wants Row 1. Neither can proceed, so without intervention they would wait forever. Prevention involves acquiring locks in a consistent order across all services and keeping transactions short. As a safety net, most RDBMSs run a deadlock detector that breaks the cycle by rolling back one transaction (the 'victim', usually the cheaper one to undo), so your application should catch that error and retry.
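The consistent-ordering rule can be sketched outside the database. In this hypothetical example, Python threading locks stand in for row locks, and two threads run transfers in opposite directions, the classic deadlock recipe. Because every transfer locks the two accounts in sorted ID order, neither thread can hold the lock the other is waiting for.

```python
import threading

# Hypothetical in-memory "rows": one lock per account stands in for a row lock.
balances = {"ACC-001": 1000, "ACC-002": 1000}
row_locks = {account: threading.Lock() for account in balances}


def transfer(src: str, dst: str, amount: int) -> None:
    # Lock in one global order (sorted IDs) regardless of transfer direction,
    # so no two transfers can each hold the lock the other is waiting for.
    first, second = sorted((src, dst))
    with row_locks[first], row_locks[second]:
        balances[src] -= amount
        balances[dst] += amount


def worker(src: str, dst: str) -> None:
    for _ in range(10_000):
        transfer(src, dst, 1)


# Opposite directions: the classic recipe for A-waits-for-B, B-waits-for-A.
threads = [
    threading.Thread(target=worker, args=("ACC-001", "ACC-002"), daemon=True),
    threading.Thread(target=worker, args=("ACC-002", "ACC-001"), daemon=True),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=30)

finished = not any(t.is_alive() for t in threads)
print(finished, balances)  # True {'ACC-001': 1000, 'ACC-002': 1000}
```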
What is the difference between WHERE and HAVING in SQL?
WHERE filters rows BEFORE they are grouped or aggregated — it works on raw row data. HAVING filters groups AFTER a GROUP BY has been applied — it works on aggregated results (e.g., SUM, COUNT). You cannot use an aggregate function inside a WHERE clause; you must use HAVING for that.
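A minimal sketch of the difference, assuming SQLite via Python's sqlite3 and invented order data: WHERE drops the refunded row before grouping, HAVING then filters the per-customer totals, and an aggregate inside WHERE is rejected outright.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100, "paid"), (1, 300, "paid"), (1, 500, "refunded"), (2, 50, "paid"), (2, 60, "paid")],
)

# WHERE removes individual rows first; HAVING then filters the per-customer totals.
big_spenders = conn.execute(
    """
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'
    GROUP BY customer_id
    HAVING SUM(amount) > 200
    """
).fetchall()

# An aggregate inside WHERE is rejected before the query even runs.
try:
    conn.execute("SELECT customer_id FROM orders WHERE SUM(amount) > 200 GROUP BY customer_id")
    aggregate_in_where_ok = True
except sqlite3.OperationalError:
    aggregate_in_where_ok = False

print(big_spenders, aggregate_in_where_ok)  # [(1, 400)] False
```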
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.