DBMS Interview Questions — Core Concepts, Tricky Queries & Real Answers
Most developers can write a SELECT statement in their sleep. But interviews don't test that — they test whether you understand what happens underneath when that query runs, why your app grinds to a halt under load, or why two users can accidentally corrupt each other's data without any error message. DBMS knowledge is the difference between a developer who writes queries and one who designs systems that survive production.
The real problem is that most learning resources dump definitions at you — ACID, normalization, indexes — without explaining why any of it was invented. Normalization wasn't invented to annoy you with extra joins; it was invented because data duplication causes silent corruption. Transactions weren't invented to slow things down; they were invented because power cuts happen mid-write. Understanding the 'why' is what separates good interview answers from great ones.
By the end of this article you'll be able to explain normalization forms with a concrete example, describe ACID properties with a real scenario an interviewer can't poke holes in, differentiate clustered vs non-clustered indexes in a single sentence, and confidently answer the tricky follow-up questions that trip most candidates up.
ACID Properties — What They Are and Why Every Transaction Needs Them
ACID is the four-part promise a database makes to you every time you run a transaction. Think of it as a bank's promise when you transfer money: either the full transfer happens or nothing does — you'll never lose money into thin air.
Atomicity means the transaction is all-or-nothing. If you're moving £500 from Account A to Account B and the server crashes after the debit but before the credit, atomicity rolls the debit back. No money disappears.
Consistency means the database only moves from one valid state to another. If your schema says 'balance cannot be negative', a consistent database will reject any transaction that violates that — even mid-flight.
Isolation means concurrent transactions don't see each other's dirty work. If two cashiers are processing transactions simultaneously, isolation makes it look like they ran one after the other, not at the same time.
Durability means once the database says 'committed', the data survives a crash. It achieves this through write-ahead logs — it writes the intention to disk before actually doing it.
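The write-ahead idea is visible even in an embedded database. Here is a minimal sketch using Python's built-in sqlite3 module (file names are illustrative): in SQLite's WAL journal mode, committed changes are appended to a `-wal` file on disk before being merged into the main database file, which is exactly what lets a commit survive a crash.

```python
import os
import sqlite3
import tempfile

# A file-backed database (WAL mode is not available for :memory: databases).
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(db_path)

# Switch to write-ahead logging: commits are appended to demo.db-wal
# on disk first, which is what makes them durable across a crash.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # 'wal'

conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 500)")
conn.commit()

# The write-ahead log now exists alongside the database file.
print(os.path.exists(db_path + "-wal"))  # True
```

Server databases (PostgreSQL, MySQL, SQL Server) use the same log-first principle, just with their own log formats and flush policies.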
Interviewers love asking which property is hardest to implement. The usual answer is Isolation: perfect isolation requires serialisable execution, which destroys throughput under concurrency, so databases let you trade safety for speed with configurable isolation levels.
```sql
-- Simulates a bank transfer between two accounts.
-- Without wrapping this in a transaction, a crash between
-- the UPDATE statements would cause money to vanish.
BEGIN TRANSACTION;

-- Step 1: Debit the sender's account
UPDATE bank_accounts
SET balance = balance - 500
WHERE account_id = 'ACC-001';

-- Step 2: Check the balance didn't go negative (Consistency check)
-- If this fails, the whole transaction rolls back (Atomicity)
IF (SELECT balance FROM bank_accounts WHERE account_id = 'ACC-001') < 0
BEGIN
    ROLLBACK TRANSACTION;
    PRINT 'Transfer failed: insufficient funds. No changes were saved.';
    RETURN;
END

-- Step 3: Credit the receiver's account
UPDATE bank_accounts
SET balance = balance + 500
WHERE account_id = 'ACC-002';

COMMIT TRANSACTION;
-- Durability kicks in here: the DBMS flushes the write-ahead log
-- to disk, so this commit survives a crash.
PRINT 'Transfer committed successfully.';
```

```text
-- If ACC-001 had insufficient funds:
Transfer failed: insufficient funds. No changes were saved.
```
Normalization Explained — From Messy Spreadsheet to Clean Schema
Normalization is the process of restructuring your database tables to eliminate data redundancy and prevent update anomalies. The best way to understand why it exists is to see what happens when you skip it.
Imagine a single Orders table where you store the customer's name, address, and email alongside every order they've ever placed. If a customer moves house, you need to update their address in 50 rows. Miss one? Congratulations, you now have inconsistent data — two 'truths' for the same customer.
Normalization solves this by separating concerns into distinct tables and linking them with keys. There are several normal forms, but interviewers focus on 1NF through 3NF.
1NF (First Normal Form): Every column holds atomic (indivisible) values, and each row is unique. No comma-separated lists in a single cell.
2NF (Second Normal Form): Achieves 1NF AND every non-key column is fully dependent on the entire primary key — not just part of it. This only matters for composite primary keys.
3NF (Third Normal Form): Achieves 2NF AND no non-key column depends on another non-key column (no transitive dependencies). Classic example: storing both ZipCode and City — City depends on ZipCode, not on the primary key directly.
BCNF (Boyce-Codd Normal Form) is a stricter version of 3NF. Interviewers may test whether you know it exists, even if you don't memorise the formal definition.
```sql
-- ============================================================
-- BEFORE NORMALIZATION: One messy table (violates 2NF and 3NF)
-- ============================================================
-- Problem 1: customer_city depends on customer_zip, not order_id (3NF violation)
-- Problem 2: If the same customer places 100 orders, their details repeat 100 times
CREATE TABLE orders_unnormalized (
    order_id      INT PRIMARY KEY,
    customer_id   INT,
    customer_name VARCHAR(100),   -- Redundant: repeated per order
    customer_zip  VARCHAR(10),    -- Redundant
    customer_city VARCHAR(100),   -- Transitive dependency: depends on zip, not order_id
    product_id    INT,
    product_name  VARCHAR(100),   -- Redundant: should live in a products table
    quantity      INT,
    order_date    DATE
);

-- ============================================================
-- AFTER NORMALIZATION: Separated into three clean tables (3NF)
-- ============================================================
-- Customers table: each customer's details live in exactly ONE place
CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL,
    customer_zip  VARCHAR(10) NOT NULL,
    customer_city VARCHAR(100) NOT NULL  -- Still here but tied to customer, not order
);

-- Products table: product info is not duplicated across orders
CREATE TABLE products (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    unit_price   DECIMAL(10, 2) NOT NULL
);

-- Orders table: only stores what is genuinely unique per order
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers(customer_id),
    product_id  INT NOT NULL REFERENCES products(product_id),
    quantity    INT NOT NULL,
    order_date  DATE NOT NULL
);

-- Now if a customer moves city, you update ONE row in customers.
-- All 100 of their orders instantly reflect the change.
UPDATE customers
SET customer_city = 'Manchester', customer_zip = 'M1 1AA'
WHERE customer_id = 42;

SELECT 'One update. Zero inconsistency.' AS result;
```

```text
result
------------------------------------
One update. Zero inconsistency.
```
Indexes — The Phone Book Trick That Makes Queries 1000x Faster
An index is a separate data structure the database maintains alongside your table to make lookups faster. Without an index, a query like WHERE email = 'alice@example.com' forces the database to scan every single row — called a full table scan. With an index on email, the database jumps directly to the matching rows like finding a name in an alphabetically sorted phone book.
The two types interviewers always ask about are clustered and non-clustered indexes.
A clustered index physically orders the rows of the table to match the index order. Because the rows themselves are the index, you can only have ONE clustered index per table. In SQL Server and MySQL's InnoDB, the primary key is the clustered index by default; PostgreSQL, by contrast, stores tables as unordered heaps and does not automatically maintain a clustered index.
A non-clustered index is a separate structure that stores the indexed column values alongside pointers back to the actual rows. You can have many of these. Think of it as the index pages at the back of a textbook — the book (table) isn't reordered, but the index tells you exactly which page to turn to.
The performance trade-off is real: indexes speed up reads dramatically but slow down writes (INSERT, UPDATE, DELETE) because the database must update the index every time data changes. This is why you don't just index every column — you index the columns that appear in WHERE, JOIN, and ORDER BY clauses on your most frequent queries.
```sql
-- Scenario: an e-commerce app with millions of orders.
-- Without an index on customer_email, every login check is a full scan.

-- Create a table to demonstrate index impact
CREATE TABLE user_sessions (
    session_id     BIGINT PRIMARY KEY,  -- Auto-creates a CLUSTERED index
    customer_email VARCHAR(255) NOT NULL,
    login_time     DATETIME NOT NULL,
    ip_address     VARCHAR(45)
);

-- Simulate inserting a large volume of data (conceptual, not run here)
-- INSERT INTO user_sessions ... (1,000,000 rows)

-- ============================================================
-- BEFORE INDEX: The database reads every row to find alice's sessions
-- ============================================================
EXPLAIN SELECT * FROM user_sessions
WHERE customer_email = 'alice@example.com';
-- Output would show: type = 'ALL', rows = 1000000 (full table scan)

-- ============================================================
-- CREATE A NON-CLUSTERED INDEX on customer_email
-- This is a SEPARATE structure — the table row order doesn't change
-- ============================================================
CREATE INDEX idx_user_sessions_email
ON user_sessions (customer_email);

-- The index stores sorted email values + pointers to the actual rows.
-- Writes (INSERT/UPDATE/DELETE) are now slightly slower because
-- the index must be updated too — a deliberate trade-off.

-- ============================================================
-- AFTER INDEX: Database uses the index to jump straight to alice's rows
-- ============================================================
EXPLAIN SELECT * FROM user_sessions
WHERE customer_email = 'alice@example.com';
-- Output would show: type = 'ref', rows = 3 (index lookup — 99.9997% fewer reads)

-- TIP: Only index columns used in WHERE, JOIN ON, and ORDER BY clauses.
-- Indexing every column wastes disk space and slows writes significantly.
```

```text
-- BEFORE index (EXPLAIN output):
id | select_type | table         | type | rows    | Extra
1  | SIMPLE      | user_sessions | ALL  | 1000000 | Using where

-- AFTER index (EXPLAIN output):
id | select_type | table         | type | rows | Extra
1  | SIMPLE      | user_sessions | ref  | 3    | Using index condition
```
Joins, Keys & Isolation Levels — The Questions That Actually Catch People Out
Most candidates can explain INNER JOIN. Fewer can explain why a NULL foreign key value silently passes the referential check, or what happens when two transactions read the same row simultaneously. These are the questions that separate solid candidates from exceptional ones.
Transaction isolation levels control how much one transaction can see of another's uncommitted work. The four standard levels (lowest to highest isolation) are: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE.
READ UNCOMMITTED allows dirty reads — you can see another transaction's changes before they commit. This is almost never safe. READ COMMITTED (the default in PostgreSQL) means you only see committed data, but a value you read once might change if you read it again in the same transaction (non-repeatable read). REPEATABLE READ (MySQL InnoDB default) guarantees the same value every time you read within a transaction, but phantom rows — new rows that match your WHERE clause — can still appear. SERIALIZABLE is the strictest: transactions execute as if they're running one at a time.
The practical impact: most web apps run fine at READ COMMITTED. Financial systems that require precise calculations (like generating an account statement mid-month) should use REPEATABLE READ or SERIALIZABLE. Higher isolation = more locking = lower throughput.
For keys: a PRIMARY KEY uniquely identifies a row and cannot be NULL. A FOREIGN KEY enforces referential integrity — you cannot add an order for a customer_id that doesn't exist in the customers table, and (depending on your ON DELETE rule) you can't delete a customer who still has orders.
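Referential integrity is easy to see in a runnable form. A minimal sketch using Python's built-in sqlite3 module (table names are illustrative; note that SQLite only enforces foreign keys after an explicit `PRAGMA foreign_keys = ON`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this opt-in

conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")

# Valid: customer 1 exists, so the foreign key check passes.
conn.execute("INSERT INTO orders VALUES (100, 1)")

# Invalid: there is no customer 999, so the FOREIGN KEY rejects the row.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999)")
except sqlite3.IntegrityError as e:
    print("Rejected:", e)

# Invalid: Alice still has an order, so deleting her fails
# (the default ON DELETE behaviour blocks the delete).
try:
    conn.execute("DELETE FROM customers WHERE customer_id = 1")
except sqlite3.IntegrityError as e:
    print("Rejected:", e)
```

Both bad statements raise an IntegrityError and leave the tables untouched, which is exactly the guarantee a FOREIGN KEY exists to provide.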
```sql
-- ============================================================
-- Demonstrates the dirty read problem (READ UNCOMMITTED)
-- and how READ COMMITTED prevents it
-- ============================================================

-- Setup: a product stock table
CREATE TABLE product_inventory (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100),
    stock_count  INT NOT NULL CHECK (stock_count >= 0)
);

INSERT INTO product_inventory VALUES (101, 'Wireless Headphones', 50);

-- ============================================================
-- SCENARIO: Two concurrent transactions
-- Transaction A is reducing stock (but hasn't committed yet)
-- Transaction B is checking stock availability
-- ============================================================

-- TRANSACTION A (runs first, holds an open transaction)
BEGIN TRANSACTION;           -- Transaction A starts
UPDATE product_inventory
SET stock_count = 0          -- All 50 units 'reserved' but not confirmed
WHERE product_id = 101;
-- *** A has NOT committed yet — maybe it's still validating payment ***

-- TRANSACTION B at READ UNCOMMITTED isolation (dangerous!)
-- It can see A's uncommitted change — a dirty read
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION;
SELECT stock_count FROM product_inventory WHERE product_id = 101;
-- Returns: 0  <-- WRONG. A hasn't committed. A might roll back!
-- B incorrectly tells the customer 'out of stock'
COMMIT;

-- Now Transaction A rolls back (payment failed)
ROLLBACK;
-- stock_count is back to 50. B acted on a value that never officially existed.

-- ============================================================
-- WITH READ COMMITTED: B only sees committed data
-- ============================================================
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
SELECT stock_count FROM product_inventory WHERE product_id = 101;
-- Returns: 50 <-- CORRECT. A's uncommitted change is invisible.
COMMIT;
```

```text
-- READ UNCOMMITTED scenario:
stock_count
-----------
0    <-- Dirty read: sees uncommitted data (dangerous)

-- READ COMMITTED scenario:
stock_count
-----------
50   <-- Correct: only sees committed reality
```
| Feature | Clustered Index | Non-Clustered Index |
|---|---|---|
| Number per table | Only 1 | Up to 999 (SQL Server) / many |
| Physical row order | Rows ARE the index — physically sorted | Separate structure with row pointers |
| Read speed | Fastest for range scans on indexed column | Fast for exact lookups; requires pointer follow |
| Write overhead | Moderate (row order must be maintained) | Higher per index (each index updated separately) |
| Storage | No extra storage — index IS the table | Extra disk space per index |
| Default usage | Usually the PRIMARY KEY | Foreign keys, WHERE/ORDER BY columns |
| Best for | ID-based range queries: WHERE id BETWEEN 100 AND 200 | Filtering on non-PK columns: WHERE email = '...' |
🎯 Key Takeaways
- ACID isn't just a buzzword — each letter solves a specific real-world failure mode: power cuts (Durability), crashes mid-transaction (Atomicity), constraint violations (Consistency), and concurrent users corrupting shared data (Isolation).
- Normalization to 3NF is correct for OLTP systems, but intentional denormalization is often the right call for OLAP/analytics — always tie your answer to the read/write profile of the system.
- A clustered index physically reorders the table rows and you get exactly one per table; non-clustered indexes are separate pointer structures and you can have many — but every index adds write overhead.
- Isolation levels are a dial between consistency and performance: READ COMMITTED is the pragmatic default for most apps, SERIALIZABLE is safest but slowest, and READ UNCOMMITTED is almost never appropriate in production code.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Confusing DELETE, TRUNCATE and DROP — Candidates say 'they all remove data' without knowing the crucial differences. DELETE is DML: it removes rows one by one, fires triggers, and can be rolled back. TRUNCATE is DDL: it removes all rows instantly, doesn't fire row-level triggers, typically resets identity counters, and in some databases cannot be rolled back (MySQL auto-commits it, while SQL Server and PostgreSQL can roll it back inside a transaction). DROP removes the entire table structure and data permanently. Fix: remember the mantra — DELETE removes rows carefully, TRUNCATE nukes the data but keeps the table, DROP destroys everything.
- ✕ Mistake 2: Thinking more indexes always means better performance — Candidates index every column 'to be safe' and then wonder why their INSERT-heavy app slows to a crawl. Every index must be updated on every write. A table with 10 indexes on a write-heavy workload can be 10x slower on inserts than the same table with 1 index. Fix: only add indexes for columns that appear in your actual WHERE, JOIN ON, and ORDER BY clauses, and benchmark before and after.
- ✕ Mistake 3: Misdefining a transaction as 'a single SQL statement' — This is wrong and interviewers know it. A transaction is a logical unit of work that can contain multiple statements, all of which succeed or fail together. A single statement can be its own transaction (auto-commit), but the defining characteristic is the ACID guarantee around a group of operations, not the statement count. Fix: define a transaction by its ACID properties, not by its size.
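The DELETE-versus-DROP half of Mistake 1 is easy to demonstrate. A minimal sketch using Python's built-in sqlite3 module (SQLite has no TRUNCATE statement, so this only shows that DELETE is rollbackable DML while DROP destroys the table itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

# DELETE is DML: inside a transaction it can be rolled back.
conn.execute("BEGIN")
conn.execute("DELETE FROM t")
conn.execute("ROLLBACK")
count_after_rollback = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count_after_rollback)  # 5, the rows came back

# DROP removes the table structure itself: querying it now fails.
conn.execute("DROP TABLE t")
try:
    conn.execute("SELECT COUNT(*) FROM t")
    table_exists = True
except sqlite3.OperationalError as e:
    table_exists = False
    print(e)  # no such table: t
```

In databases that do support TRUNCATE, it would sit between the two: the table survives, but the rows are gone in one fast DDL operation.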
Interview Questions on This Topic
- Q: Explain the difference between REPEATABLE READ and SERIALIZABLE isolation levels. When would choosing REPEATABLE READ over SERIALIZABLE cause a real problem in production?
- Q: You have a query that runs in 8 seconds on a table with 10 million rows. Walk me through your exact process for diagnosing and fixing the performance issue — what tools do you use, what do you look for, and what would you try first?
- Q: If normalization is so great, why do large-scale analytics platforms like Amazon Redshift or Google BigQuery often use denormalized, wide table schemas? Isn't that just bad database design?
Frequently Asked Questions
What is the difference between a primary key and a unique key in SQL?
A primary key uniquely identifies each row AND cannot contain NULL values — every table can have only one. A unique key also enforces uniqueness but allows NULLs (SQL Server permits exactly one NULL, while MySQL and PostgreSQL allow multiple NULLs), and a table can have many unique keys. In SQL Server and MySQL's InnoDB, the primary key typically becomes the clustered index; unique keys create non-clustered indexes by default.
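A runnable illustration of the unique-key behaviour, using Python's built-in sqlite3 module (SQLite follows the PostgreSQL/MySQL convention of allowing multiple NULLs in a UNIQUE column; the table is made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,  -- the one primary key: unique, never NULL
        email   TEXT UNIQUE           -- a unique key: duplicates rejected, NULL allowed
    )
""")

conn.execute("INSERT INTO users VALUES (1, 'alice@example.com')")

# A duplicate email violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO users VALUES (2, 'alice@example.com')")
except sqlite3.IntegrityError as e:
    print("Rejected:", e)  # UNIQUE constraint failed: users.email

# NULLs pass the uniqueness check, and SQLite (like PostgreSQL and
# MySQL) even allows several NULLs in the same UNIQUE column.
conn.execute("INSERT INTO users VALUES (3, NULL)")
conn.execute("INSERT INTO users VALUES (4, NULL)")
total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(total)  # 3
```

The duplicate is rejected, but both NULL rows go in, which is exactly the asymmetry the interview answer hinges on.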
What is a deadlock in a database and how do you prevent it?
A deadlock happens when two transactions are each waiting for a lock held by the other, so neither can proceed — like two cars blocking each other at a one-lane bridge. Databases detect this and automatically kill one transaction (the 'victim') to break the cycle. You prevent deadlocks by always acquiring locks in the same order across all transactions, keeping transactions short, and using the lowest isolation level that your use case actually requires.
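The 'acquire locks in the same order' rule is not database-specific, so it can be sketched with plain threads. A minimal Python analogy (lock and function names are illustrative, standing in for row locks inside two transactions):

```python
import threading

lock_a = threading.Lock()  # stand-in for the row lock on account A
lock_b = threading.Lock()  # stand-in for the row lock on account B
results = []

# DEADLOCK-PRONE pattern (not run here): thread 1 takes A then B while
# thread 2 takes B then A. If each grabs its first lock, both wait
# forever on the second, which is exactly the circular wait a DBMS detects.

# DEADLOCK-FREE pattern: impose a global order. Every thread locks A
# before B, so a circular wait can never form.
def transfer(amount):
    with lock_a:
        with lock_b:
            results.append(amount)  # stand-in for the two row updates

threads = [threading.Thread(target=transfer, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 4, all transfers completed with no deadlock
```

In SQL the same discipline means, for example, always updating accounts in ascending account_id order inside every transaction that touches more than one row.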
What is the difference between WHERE and HAVING in SQL?
WHERE filters rows BEFORE they are grouped or aggregated — it works on raw row data. HAVING filters groups AFTER a GROUP BY has been applied — it works on aggregated results. You cannot use an aggregate function like SUM() or COUNT() inside a WHERE clause; that's what HAVING is for. A practical rule: if you're filtering on a calculated aggregate, use HAVING; for everything else, use WHERE.
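A runnable illustration of the WHERE-then-HAVING pipeline, using Python's built-in sqlite3 module (the table and values are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 40), ("alice", 70), ("bob", 10), ("bob", 20), ("carol", 90)],
)

# WHERE filters raw rows BEFORE grouping: orders under 20 are dropped first.
# HAVING filters groups AFTER aggregation: only customers whose remaining
# total exceeds 50 survive.
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 20          -- row-level filter (bob's 10 never gets counted)
    GROUP BY customer
    HAVING SUM(amount) > 50     -- group-level filter on the aggregate
    ORDER BY customer
""").fetchall()
print(rows)  # [('alice', 110), ('carol', 90)]
```

Bob is eliminated in two stages: WHERE discards his 10 before grouping, and HAVING then discards his remaining total of 20. Swapping the aggregate into the WHERE clause would be a syntax error, which is the whole point of HAVING.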
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.