Mid-level 12 min · March 05, 2026

Database Normalization — Partial Dependency Pitfalls

A single product rename corrupted 12% of order history due to partial dependency in a composite key.

N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

Follow
Production
production tested
May 23, 2026
last updated
1,554
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer
  • Normalization eliminates data anomalies by storing each fact once
  • 1NF: atomic values and a primary key — no lists or duplicate rows
  • 2NF: no partial dependency — every column depends on the whole composite key
  • 3NF: no transitive dependency — non-key columns must not depend on other non-key columns
  • Denormalize only after measuring: start normal, then trade redundancy for read speed
✦ Definition~90s read
What is Database Normalization?

Database normalization is the process of organizing relational tables to eliminate data redundancy and update anomalies. It solves the problem of inconsistent data caused by storing the same fact in multiple rows — think a customer address duplicated across a thousand orders, where changing it requires hunting down every copy.

Imagine your junk drawer at home — phone chargers, batteries, takeout menus, and a 2019 receipt all crammed together.

Normalization achieves this through a series of formal rules (normal forms) that decompose tables into smaller, focused structures linked by foreign keys. The trade-off is query complexity: normalized schemas often require JOINs to reconstruct the original data, which can impact read performance at scale.

Partial dependency is a specific violation of Second Normal Form (2NF) that occurs when a non-key column depends on only part of a composite primary key. For example, in an OrderDetails table with a composite key of (OrderID, ProductID), storing ProductName creates a partial dependency because ProductName depends solely on ProductID, not the full key.

This leads to redundancy (the same product name repeated for every order line) and update anomalies (changing a product name requires updating every row with that ProductID). Fixing it means splitting into Products and OrderDetails tables, each with a single-column primary key.

In practice, normalization is not optional for transactional systems (OLTP) where data integrity is paramount — think banking, e-commerce checkout, or CRM. Tools like PostgreSQL, MySQL, and SQL Server enforce these principles through constraints, but the schema design is on you.

The alternative is denormalization, which senior engineers apply strategically in read-heavy analytics (OLAP) or caching layers (e.g., Redis, materialized views) after proving the normalized schema is a bottleneck. Most production systems live at 3NF or BCNF; anything beyond is academic unless you're dealing with edge cases like multi-valued dependencies.

Plain-English First

Imagine your junk drawer at home — phone chargers, batteries, takeout menus, and a 2019 receipt all crammed together. Finding anything is a nightmare, and if you move house, you have to sort through the chaos twice. Normalization is the process of giving everything its own logical drawer so nothing is duplicated and everything is easy to find. In database terms, it means organizing your tables so each piece of information lives in exactly one place, and every table has one clear purpose.

Every production database that has ever turned into a maintenance nightmare shares a common origin story: it was designed in a hurry, by someone who just needed it to work today. Columns get stuffed with comma-separated values. Customer addresses get copy-pasted into three different tables. One typo in a city name means your analytics are quietly lying to you. This isn't a hypothetical — it's Tuesday at most startups. Database normalization is the discipline that prevents this slow-motion catastrophe before it starts.

The core problem normalization solves is data anomalies — three specific failure modes that emerge when your data is structured poorly. An insertion anomaly means you can't record a fact without recording an unrelated fact alongside it. A deletion anomaly means removing one piece of information accidentally destroys another. An update anomaly means changing one real-world fact requires hunting down and editing a dozen rows, and if you miss even one, your database now contains contradictions. Normalization eliminates all three by enforcing a simple rule: each fact should be stored exactly once.

By the end of this article you'll be able to look at a messy, flat table and identify exactly which normal form it violates and why. You'll know how to decompose it into clean, well-structured tables with proper foreign key relationships. You'll also understand the pragmatic cases where senior engineers deliberately denormalize — and why that decision should be intentional, not accidental.

Why Database Normalization Is Not Optional

Database normalization is the process of structuring a relational database to reduce data redundancy and eliminate update anomalies. The core mechanic is decomposing tables into smaller, related tables based on functional dependencies — ensuring each non-key attribute depends on the key, the whole key, and nothing but the key (so help you Codd).

In practice, normalization works by progressively applying normal forms: 1NF eliminates repeating groups, 2NF removes partial dependencies (where a non-key column depends on only part of a composite key), and 3NF eliminates transitive dependencies. Each step isolates data so that a single fact lives in exactly one place — a design that directly prevents insertion, update, and deletion anomalies.

Use normalization whenever you model transactional data (OLTP) — orders, users, inventory. It matters because denormalized schemas silently corrupt data over time: updating a customer's address in one row but missing the other 14 copies is a data integrity bug, not a performance trade-off. Normalization buys you correctness at the cost of join overhead; denormalize only after profiling proves a bottleneck.

Normalization ≠ Performance Penalty
Joins are not inherently slow — missing indexes are. A properly normalized schema with indexed foreign keys often outperforms a denormalized mess.
Production Insight
A team stored customer addresses in the orders table to avoid a join. When a customer moved, only 60% of their orders were updated — the rest shipped to the old address, causing a $50k refund incident.
The symptom: inconsistent address data across rows for the same customer, discovered only after angry customer calls.
Rule of thumb: if you can write a query that returns two different values for the same logical attribute (e.g., address for same customer_id), you have a partial dependency — normalize to 2NF immediately.
Key Takeaway
Normalization eliminates update anomalies by ensuring each fact is stored once.
Partial dependencies are the most common violation in production — always check composite keys for 2NF compliance.
Denormalize only after you measure a real join bottleneck; premature denormalization is a data integrity debt.
Normalization Pitfalls: Partial Dependency to BCNF THECODEFORGE.IO Normalization Pitfalls: Partial Dependency to BCNF Key stages and traps in database normalization from 1NF to BCNF 1NF: Atomic Values One value per cell, no repeating groups 2NF: Full Dependency No partial dependency on composite key 3NF: No Transitive Dependency Non-key column must not depend on another non-key BCNF: Stronger 3NF Every determinant must be a candidate key Strategic Denormalization Break rules for performance when justified ⚠ Partial dependency in 2NF: non-key depends on part of composite key Split table to remove partial dependency; ensure full functional dependency THECODEFORGE.IO
thecodeforge.io
Normalization Pitfalls: Partial Dependency to BCNF
Database Normalization

First Normal Form (1NF): One Value Per Cell, Every Row Unique

First Normal Form has two requirements that sound obvious until you see how often they're violated in the wild. First: every cell must contain exactly one atomic (indivisible) value. Second: every row must be uniquely identifiable by a primary key.

The most common 1NF violation is storing lists inside a single column — think a phone_numbers column with the value '555-1234, 555-5678'. It looks harmless until you need to find everyone with a specific number, and suddenly you're writing LIKE '%555-5678%' queries that can't use indexes and will break the moment someone adds a space after the comma.

The second violation is subtler: repeating groups. Instead of a list in one column, some designers create phone_number_1, phone_number_2, phone_number_3. This hits the same wall — what happens when a contact gets a fourth number? You're altering a production table schema instead of inserting a row.

Fixing both violations follows the same pattern: pull the repeating data into its own table and use a foreign key relationship. This is the foundational move that all higher normal forms build on.

normalize_to_1nf.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
-- ============================================================
-- BEFORE: A contacts table that violates First Normal Form
-- Problem 1: phone_numbers stores multiple values in one cell
-- Problem 2: No clean primary key (name isn't unique enough)
-- ============================================================

CREATE TABLE contacts_unnormalized (
    contact_name    VARCHAR(100),
    email           VARCHAR(150),
    phone_numbers   VARCHAR(255)   -- '555-1234, 555-5678' <-- BAD
);

INSERT INTO contacts_unnormalized VALUES
    ('Maria Garcia',  'maria@example.com',  '555-1234, 555-9999'),
    ('James Okafor',  'james@example.com',  '555-5678'),
    ('Maria Garcia',  'maria2@example.com', '555-0001');  -- duplicate name, now what?

-- ============================================================
-- AFTER: Restructured to satisfy First Normal Form
-- Each table has a clear primary key.
-- Each cell holds exactly one value.
-- Phone numbers get their own table — one row per number.
-- ============================================================

CREATE TABLE contacts (
    contact_id   INT           PRIMARY KEY AUTO_INCREMENT,
    full_name    VARCHAR(100)  NOT NULL,
    email        VARCHAR(150)  NOT NULL UNIQUE
);

CREATE TABLE contact_phone_numbers (
    phone_id     INT           PRIMARY KEY AUTO_INCREMENT,
    contact_id   INT           NOT NULL,
    phone_number VARCHAR(20)   NOT NULL,          -- one number per row
    phone_type   VARCHAR(20)   DEFAULT 'mobile',  -- 'mobile', 'home', 'work'
    FOREIGN KEY (contact_id) REFERENCES contacts(contact_id)
);

-- Insert contacts first (parent table)
INSERT INTO contacts (full_name, email) VALUES
    ('Maria Garcia', 'maria@example.com'),   -- gets contact_id = 1
    ('James Okafor', 'james@example.com');   -- gets contact_id = 2

-- Insert phone numbers (child table) — Maria has two, James has one
INSERT INTO contact_phone_numbers (contact_id, phone_number, phone_type) VALUES
    (1, '555-1234', 'mobile'),   -- Maria's mobile
    (1, '555-9999', 'work'),     -- Maria's work — clean, no comma lists
    (2, '555-5678', 'mobile');   -- James's mobile

-- Now finding all contacts with a specific number is fast and correct
SELECT
    c.full_name,
    p.phone_number,
    p.phone_type
FROM contacts c
JOIN contact_phone_numbers p ON c.contact_id = p.contact_id
ORDER BY c.full_name, p.phone_type;
Output
full_name | phone_number | phone_type
--------------+--------------+-----------
James Okafor | 555-5678 | mobile
Maria Garcia | 555-1234 | mobile
Maria Garcia | 555-9999 | work
Watch Out: JSON Columns Are a 1NF Trap
Modern databases support JSON column types (PostgreSQL's jsonb, MySQL's JSON). Storing '{"phones": ["555-1234", "555-9999"]}' in a JSON column is still a 1NF violation — you've just hidden the comma-separated list inside a different format. Use JSON columns for genuinely unstructured, schema-less data (user preferences, feature flags), not for structured relational data you'll need to query, join, or index.
Production Insight
A support app stored customer tags in a single TEXT column as comma-separated values.
A query to find customers with 'VIP' tag used LIKE '%VIP%' and accidentally matched 'VVIP' or 'VIPER'.
Switching to a normalized tag table eliminated false positives and allowed index usage.
Rule: if you need to query individual items in a list, normalize that list immediately.
Key Takeaway
1NF is about atomic values and unique rows.
If you're storing lists in a column or have no primary key, you're not even at the starting line.
Fix it by creating child tables — one row per value, one table per entity.
Diagnosing 1NF Violations
IfColumn contains multiple values (comma, pipe, JSON array)
UseViolates 1NF — extract to child table
IfNo primary key or duplicate rows exist
UseViolates 1NF — add a primary key or composite key
IfColumns named phone_1, phone_2, phone_3
UseViolates 1NF — replace with child table and phone_type
IfEvery cell holds one value and rows are unique
UseSatisfies 1NF — proceed to check 2NF

Second Normal Form (2NF): Every Column Must Depend on the Whole Key

Second Normal Form only applies to tables with a composite primary key — a key made of two or more columns. The rule is: every non-key column must depend on the entire primary key, not just part of it. When a column only depends on part of the key, that's called a partial dependency, and it's the engine that generates update anomalies.

Picture an order_items table with a composite key of (order_id, product_id). The quantity ordered absolutely depends on both — it's the quantity of that specific product in that specific order. But what about product_name? That only depends on product_id. If you ever rename a product, you now have to update every single row in order_items that references it. Miss one, and your order history lies.

The fix is the same move every time: extract the partially-dependent columns into their own table, keyed by the partial key they actually belong to. In this case, product_name moves to a products table keyed by product_id. The order_items table keeps the foreign key and nothing else about the product itself.

This is why well-designed databases look like a spider web of small, focused tables — each one stores exactly the facts it owns.

normalize_to_2nf.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
-- ============================================================
-- BEFORE: order_items violates Second Normal Form
-- Composite PK: (order_id, product_id)
-- PROBLEM: product_name and unit_price depend only on product_id
--          not on the full composite key.
-- This means updating a product name requires touching every order.
-- ============================================================

CREATE TABLE order_items_bad (
    order_id        INT,
    product_id      INT,
    product_name    VARCHAR(100),  -- partial dependency on product_id only
    unit_price      DECIMAL(10,2), -- partial dependency on product_id only
    quantity        INT,           -- this is fine: depends on BOTH order_id + product_id
    PRIMARY KEY (order_id, product_id)
);

INSERT INTO order_items_bad VALUES
    (101, 42, 'Wireless Keyboard', 49.99, 2),
    (102, 42, 'Wireless Keyboard', 49.99, 1),  -- product name duplicated
    (103, 42, 'Wireless Keyboard', 49.99, 3);  -- and again...

-- If the product is renamed to 'BT Keyboard', we must update 3 rows.
-- If we miss one row, we have contradictory product names in the database.
UPDATE order_items_bad
    SET product_name = 'BT Keyboard'
    WHERE product_id = 42;  -- easy to forget a WHERE clause here

-- ============================================================
-- AFTER: Properly separated into 2NF-compliant tables
-- Products table owns product facts.
-- order_items table owns only order-specific facts.
-- ============================================================

CREATE TABLE products (
    product_id    INT            PRIMARY KEY,
    product_name  VARCHAR(100)   NOT NULL,
    unit_price    DECIMAL(10,2)  NOT NULL  -- current price lives here
);

CREATE TABLE orders (
    order_id      INT   PRIMARY KEY AUTO_INCREMENT,
    customer_id   INT   NOT NULL,
    ordered_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
    order_id      INT             NOT NULL,
    product_id    INT             NOT NULL,
    quantity      INT             NOT NULL,
    -- Store price_at_purchase separately — product price can change later
    -- This is intentional: it records the price that was actually charged
    price_at_purchase DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id)   REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

INSERT INTO products VALUES
    (42, 'Wireless Keyboard', 49.99);

-- Now renaming the product only ever touches ONE row in ONE table
UPDATE products
    SET product_name = 'BT Keyboard'
    WHERE product_id = 42;

-- Query still works perfectly — no data inconsistency possible
SELECT
    oi.order_id,
    p.product_name,
    oi.quantity,
    oi.price_at_purchase,
    (oi.quantity * oi.price_at_purchase) AS line_total
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
ORDER BY oi.order_id;
Output
order_id | product_name | quantity | price_at_purchase | line_total
---------+----------------+----------+-------------------+-----------
101 | BT Keyboard | 2 | 49.99 | 99.98
102 | BT Keyboard | 1 | 49.99 | 49.99
103 | BT Keyboard | 3 | 49.99 | 149.97
Pro Tip: Price At Purchase Is Not a 2NF Violation
Notice that order_items stores price_at_purchase even though the product's current price is in the products table. This looks like duplication, but it isn't — it's recording a historical fact. The price a customer was charged is tied to both the order and the product at a specific moment in time. Removing it would mean you can never reconstruct an accurate invoice after a price change. This is a deliberate design choice, not a normalization error.
Production Insight
A B2B billing system stored product vendor details in invoice_lines — a 2NF violation.
When a vendor changed their address, 15,000 invoice rows needed updates. The batch job timed out and left inconsistent data.
Extracting vendor info to a vendors table fixed it. One UPDATE, zero inconsistency.
Rule: if a column's value is determined by a subset of the PK, it belongs in its own table.
Key Takeaway
2NF only applies to composite keys.
Every non-key column must depend on the whole key, not a subset.
If product_name only needs product_id to be determined, it belongs in a products table, not in order_items.
Diagnosing 2NF Violations
IfTable has a single-column primary key
UseAutomatically satisfies 2NF — check 3NF
IfComposite primary key and a non-key column depends on part of the key
UseViolates 2NF — extract to separate table keyed by the partial dependency
IfComposite primary key and all non-key columns depend on the full key
UseSatisfies 2NF — proceed to check 3NF

Third Normal Form (3NF): No Column Should Depend on a Non-Key Column

Third Normal Form builds directly on 2NF. Once you've eliminated partial dependencies, you look for transitive dependencies: situations where Column C depends on Column B, and Column B depends on the primary key — but Column C does not directly depend on the primary key itself. The chain A → B → C is the problem.

A classic example is storing a customer's city and zip_code in the same table as their orders. The zip code is tied to the customer (fair enough), but the city is determined by the zip code — not directly by the customer. If you update a zip code's city name in one row but not another, you've got contradictions again.

Another textbook case: storing an employee's department_name and department_budget in the employees table. The budget depends on the department, not on the employee. One budget change requires updating every row for every employee in that department.

The fix — as always — is extraction. Pull the transitively-dependent columns into their own table keyed by the column they actually depend on. After 3NF, your schema should feel almost boring in its consistency: every table has a primary key, every other column in that table tells you something directly and exclusively about that key.

normalize_to_3nf.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
-- ============================================================
-- BEFORE: employees table with a transitive dependency
-- PK: employee_id
-- department_name depends on department_id (not employee_id)
-- department_budget depends on department_id (not employee_id)
-- This is the transitive chain: employee_id -> department_id -> budget
-- ============================================================

CREATE TABLE employees_bad (
    employee_id       INT           PRIMARY KEY,
    employee_name     VARCHAR(100)  NOT NULL,
    department_id     INT           NOT NULL,
    department_name   VARCHAR(100),  -- depends on department_id, not employee_id
    department_budget DECIMAL(12,2)  -- depends on department_id, not employee_id
);

INSERT INTO employees_bad VALUES
    (1, 'Aisha Kamara',   10, 'Engineering', 500000.00),
    (2, 'Leo Petrov',     10, 'Engineering', 500000.00),  -- budget duplicated
    (3, 'Nadia Osei',     20, 'Marketing',   200000.00);

-- Engineering gets a budget increase. We must update every Engineering employee row.
-- Miss one row and the database contradicts itself.
UPDATE employees_bad
    SET department_budget = 600000.00
    WHERE department_id = 10;
-- If this WHERE clause had a bug, one employee would show the old budget forever.

-- ============================================================
-- AFTER: 3NF compliant — transitive dependency eliminated
-- departments table owns department facts.
-- employees table owns only employee facts + a FK to departments.
-- ============================================================

CREATE TABLE departments (
    department_id     INT            PRIMARY KEY,
    department_name   VARCHAR(100)   NOT NULL UNIQUE,
    department_budget DECIMAL(12,2)  NOT NULL
);

CREATE TABLE employees (
    employee_id     INT           PRIMARY KEY,
    employee_name   VARCHAR(100)  NOT NULL,
    department_id   INT           NOT NULL,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

INSERT INTO departments VALUES
    (10, 'Engineering', 500000.00),
    (20, 'Marketing',   200000.00);

INSERT INTO employees VALUES
    (1, 'Aisha Kamara', 10),
    (2, 'Leo Petrov',   10),
    (3, 'Nadia Osei',   20);

-- Budget increase: ONE row updated, zero risk of inconsistency
UPDATE departments
    SET department_budget = 600000.00
    WHERE department_id = 10;

-- Full view of employees with their department details via JOIN
SELECT
    e.employee_id,
    e.employee_name,
    d.department_name,
    d.department_budget
FROM employees e
JOIN departments d ON e.department_id = d.department_id
ORDER BY d.department_name, e.employee_name;
Output
employee_id | employee_name | department_name | department_budget
------------+---------------+---------------+------------------
1 | Aisha Kamara | Engineering | 600000.00
2 | Leo Petrov | Engineering | 600000.00
3 | Nadia Osei | Marketing | 200000.00
Interview Gold: BCNF — 3NF's Stricter Cousin
Boyce-Codd Normal Form (BCNF) is a slightly stricter version of 3NF that matters when a table has multiple overlapping candidate keys. In practice, the vast majority of schemas that reach 3NF also satisfy BCNF — you'll almost never need to go further unless you're designing a schema with complex multi-column candidate key overlaps. If an interviewer asks 'what comes after 3NF?', saying BCNF and explaining the distinction will immediately set you apart.
Production Insight
A hospital management system stored patient room number and room type in the appointments table.
When a room's category changed (e.g., private to semi-private), every future appointment for that room had to be updated.
Moving room attributes to a rooms table with room_id as FK eliminated the transitive dependency.
Rule: if Column C is determined by Column B, and B is not the PK, extract C and B into their own table.
Key Takeaway
3NF eliminates transitive dependencies.
If Column C is determined by Column B, and Column B is determined by the primary key, C belongs in its own table keyed by B.
The symptom is always the same: an update to one real-world fact requires touching multiple rows.
Diagnosing 3NF Violations
IfA non-key column is functionally dependent on another non-key column
UseViolates 3NF — extract the dependent columns into a new table
IfAll non-key columns depend only on the primary key
UseSatisfies 3NF — consider BCNF for edge cases
IfNeed to store historical snapshots (e.g., price at purchase)
UseThis is not a 3NF violation if the snapshot value is determined by the full key and time

When Senior Engineers Break the Rules: Strategic Denormalization

Everything above is the theory. Here's the reality: at scale, joins are expensive, and sometimes the right engineering decision is to deliberately denormalize. This isn't a failure of discipline — it's a calibrated trade-off. The key word is deliberately.

Denormalization is appropriate when you have a read-heavy workload where a complex multi-table join runs thousands of times per second and your profiler shows it's a bottleneck. A reporting dashboard that aggregates millions of orders shouldn't be recalculating totals from raw line items on every page load. In that case, storing a pre-computed order_total on the orders table — even though it's technically derivable — is a valid performance choice.

The discipline is this: normalize first, then denormalize with evidence. Never skip normalization because you think it'll be slow. Measure first. An unmeasured premature denormalization gives you all the complexity of maintaining redundant data with none of the proven performance benefit.

The other common case is read replicas and data warehouses. Your OLTP (transactional) database should be normalized. Your OLAP (analytical) data warehouse can use star schemas and wide, flat tables optimized for aggregation — because the write patterns are completely different (bulk loads, not row-by-row updates).

strategic_denormalization.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
-- ============================================================
-- SCENARIO: An e-commerce order summary page
-- Problem: Recalculating order totals from line items on every
--          page load is destroying database performance.
-- Solution: Store a denormalized order_total on the orders table
--           and keep it in sync via a trigger.
-- ============================================================

-- Normalized base tables (we keep these — they're the source of truth)
CREATE TABLE orders (
    order_id       INT            PRIMARY KEY AUTO_INCREMENT,
    customer_id    INT            NOT NULL,
    ordered_at     TIMESTAMP      DEFAULT CURRENT_TIMESTAMP,
    -- Denormalized cache column — intentionally redundant
    -- Updated automatically by the trigger below
    order_total    DECIMAL(12,2)  DEFAULT 0.00
);

CREATE TABLE order_items (
    item_id            INT            PRIMARY KEY AUTO_INCREMENT,
    order_id           INT            NOT NULL,
    product_id         INT            NOT NULL,
    quantity           INT            NOT NULL,
    price_at_purchase  DECIMAL(10,2)  NOT NULL,
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

-- ============================================================
-- Trigger: Automatically recalculates and updates order_total
-- whenever an order_item is inserted, updated, or deleted.
-- This keeps the denormalized column consistent with no manual effort.
-- ============================================================

DELIMITER //
CREATE TRIGGER sync_order_total
AFTER INSERT ON order_items
FOR EACH ROW
BEGIN
    -- Recalculate and store the total for the affected order
    UPDATE orders
    SET order_total = (
        SELECT COALESCE(SUM(quantity * price_at_purchase), 0)
        FROM order_items
        WHERE order_id = NEW.order_id  -- only recalculate the affected order
    )
    WHERE order_id = NEW.order_id;
END//
DELIMITER ;

-- Insert an order and its items
INSERT INTO orders (customer_id) VALUES (7);

INSERT INTO order_items (order_id, product_id, quantity, price_at_purchase) VALUES
    (1, 42, 2, 49.99),   -- 2x keyboards
    (1, 88, 1, 29.99);   -- 1x mouse

-- The order summary page now reads ONE row from orders — no join needed
-- This is the read that happens thousands of times per second
SELECT
    order_id,
    customer_id,
    order_total,  -- already computed, instant read
    ordered_at
FROM orders
WHERE order_id = 1;

-- The normalized detail view still works perfectly when you need it
SELECT
    oi.order_id,
    oi.product_id,
    oi.quantity,
    oi.price_at_purchase,
    (oi.quantity * oi.price_at_purchase) AS line_total
FROM order_items oi
WHERE oi.order_id = 1;
Output
-- Order summary (fast, single-row read):
order_id | customer_id | order_total | ordered_at
---------+-------------+-------------+-------------------
1 | 7 | 129.97 | 2024-01-15 14:32:01
-- Order detail (normalized breakdown):
order_id | product_id | quantity | price_at_purchase | line_total
---------+------------+----------+-------------------+-----------
1 | 42 | 2 | 49.99 | 99.98
1 | 88 | 1 | 29.99 | 29.99
Pro Tip: Document Every Denormalization Decision
When you add a denormalized column, add a SQL comment directly on the column definition explaining why it exists, what keeps it in sync, and what to check if it ever gets out of sync. Future you (and your teammates) will be grateful at 2am during an incident. Undocumented denormalization is just a bug that hasn't been discovered yet.
Production Insight
A fintech startup skipped normalization entirely to 'move fast'. They had a single orders table with 50 columns.
When they needed to add a new payment gateway, the schema change required a full table rebuild — 2 hours of downtime.
They had to revert and normalize first. The 'fast' approach cost them three sprint cycles of unplanned work.
Rule: you can always denormalize later, but you can't easily normalize a poorly designed schema under pressure.
Key Takeaway
Denormalization is a performance tool, not a design shortcut.
Always normalize first, measure your actual query bottlenecks with EXPLAIN ANALYZE, and only denormalize with a documented reason and a sync mechanism you trust.

How to Diagnose Normal Form Violations in an Existing Schema

You rarely get to design a new database from scratch. More often, you inherit a legacy schema with hundreds of tables and no documentation. How do you quickly identify which tables violate 1NF, 2NF, or 3NF?

The process is systematic. Start by listing all tables that have no primary key — those are immediate 1NF violations. Next, for tables with composite primary keys, query the data distribution: run SELECT partial_key_column, non_key_column, COUNT() FROM table GROUP BY partial_key_column, non_key_column HAVING COUNT() > 1. If a non-key column value appears with multiple different values of the other part of the key, there's a partial dependency.

For transitive dependencies, look for columns that logically depend on another non-key column. A heuristic: if two non-key columns always appear together (e.g., zip_code and city), one is likely a transitive dependency. Run SELECT column_a, column_b, COUNT(*) FROM table GROUP BY column_a, column_b HAVING COUNT(DISTINCT column_b) > 1 — if column_b varies while column_a is the same, column_b depends on column_a, not on the PK.

Finally, verify that every foreign key in your schema actually points to a primary key. Orphaned foreign keys are a symptom of a deeper normalization issue.

diagnose_normalization.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- ============================================================
-- Diagnosis queries to detect normal form violations
-- ============================================================

-- 1. Find tables without a primary key
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'your_db'
  AND TABLE_TYPE = 'BASE TABLE'
  AND TABLE_NAME NOT IN (
      SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
      WHERE CONSTRAINT_TYPE = 'PRIMARY KEY'
  );

-- 2. Detect partial dependencies (2NF) in a composite-key table
-- Example: order_items (order_id, product_id, product_name, quantity)
-- Check if product_name varies for the same product_id
SELECT product_id, product_name, COUNT(*) AS occurrences
FROM order_items
GROUP BY product_id, product_name
HAVING COUNT(*) > 1;
-- If product_id always maps to the same product_name, it's likely a partial dep

-- 3. Detect transitive dependencies (3NF)
-- Example: employees (employee_id, department_id, department_budget)
-- Check if department_budget varies for the same department_id
SELECT department_id, department_budget, COUNT(*) AS occurrences
FROM employees
GROUP BY department_id, department_budget
HAVING COUNT(*) > 1;
-- If department_id always maps to the same budget, transitive dependency exists

-- 4. Find foreign keys that don't have matching primary key rows
SELECT *
FROM order_items oi
LEFT JOIN products p ON oi.product_id = p.product_id
WHERE p.product_id IS NULL;
Output
-- Results for a typical violative schema:
-- 1: Tables without PK: `audit_log`
-- 2: Partial dependency confirmed: product_id 42 always maps to 'Wireless Keyboard'
-- 3: Transitive dependency confirmed: department_id 10 always maps to budget 500000
-- 4: Orphaned product_id 999 in order_items
Mental Model: The Dependency Chain
  • 1NF: Every cell holds exactly one fact. No compound values.
  • 2NF: Every non-key fact must be determined by the ENTIRE primary key, not just part of it.
  • 3NF: Every non-key fact must be directly about the primary key, not about some other non-key fact.
  • Denormalization: Deliberately break the rules when you have measured evidence that the cost of a join outweighs the risk of inconsistency.
Production Insight
During a migration to a normalized schema, a DBA ran diagnosis queries and found 14 tables with partial dependencies.
The fix took one weekend of schema redesign, but the performance impact was zero — the queries were identical after normalizing.
The real win: no more update anomalies. The team estimated it saved 8 hours per month of manual data correction.
Rule: invest one day in analysis to avoid months of inconsistent data.
Key Takeaway
Diagnosing violations is systematic.
List tables without PK (1NF). Check composite-key tables for partial deps (2NF). Look for non-key columns that depend on other non-key columns (3NF).
Fix each violation with extraction into a new table — the pattern is always the same.

Boyce-Codd Normal Form (BCNF): The 3NF Bug You Didn't Know You Had

You think 3NF means you're done. It doesn't. BCNF is what happens when 3NF lets a non-trivial dependency slip through the cracks because of overlapping candidate keys. The rule is brutally simple: for every functional dependency X → Y, X must be a superkey. Not a candidate key, not part of a composite key. A superkey.

Here's the concrete scenario that kills 3NF. You've got a table storing which engineers are assigned to which project phase. The business rule: each phase has exactly one lead engineer, but an engineer can lead multiple phases. In 3NF, this table looks clean until you try to add an engineer to a new phase that already has a lead. You can't, because the phase-plus-lead combination is your primary key, and that engineer isn't the lead. You've just discovered a hidden functional dependency: Phase → Lead. The phase determines the lead, but phase alone isn't a superkey. That dependency violates BCNF.

The fix is surgical: split the phase-lead assignment into its own table, then keep the engineer-phase assignments separate. This isn't academic. I've seen production schemas where this exact design caused silent data loss during ETL pipelines. The symptom was always the same: rows that should exist simply didn't, and no one knew why until we traced it back to this 3NF blind spot.

BcnfViolationFix.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// io.thecodeforge — database tutorial

-- BCNF violation hiding in plain sight
CREATE TABLE phase_assignments (
    engineer_id   INT,
    phase_id      INT,
    phase_lead    VARCHAR(100),
    PRIMARY KEY (engineer_id, phase_id)
);

-- Step 1: split out the lead dependency
CREATE TABLE phase_leads (
    phase_id      INT PRIMARY KEY,
    lead_engineer_id INT NOT NULL
);

-- Step 2: remove the redundant column
CREATE TABLE phase_engineers (
    engineer_id   INT,
    phase_id      INT,
    PRIMARY KEY (engineer_id, phase_id),
    FOREIGN KEY (phase_id) REFERENCES phase_leads(phase_id)
);
Output
Tables created. BCNF satisfied: every determinant is a superkey.
Production Trap:
If you have two candidate keys that overlap, you almost certainly have a BCNF violation. Check any composite primary key where a subset of its columns determines another column. That's your signal.
Key Takeaway
If a functional dependency's left side isn't a superkey, you have a BCNF violation — even if your schema passes 3NF.

Normalization vs. Denormalization: The Cost-To-Query Tradeoff

Every time you normalize a table, you trade write simplicity for read complexity. That's the transaction. A fully normalized schema means your inserts and updates are atomic — one row change, one table, no cascading failures. But your read queries? You're writing five-join monsters that make junior devs cry and your query planner work overtime.

Denormalization reverses that trade. You intentionally duplicate data so that a single SELECT can return a full report without touching six tables. The cost: update anomalies. Change one value in one place, and you must remember to change it everywhere else. Miss one, and your data is lying to you.

Here's when you should denormalize: your query-to-write ratio is 50:1 or worse. Reading dashboards, analytics, or audit logs — these rarely update, but they query constantly. Your read path should be fast, even if your write path becomes a choreographed dance of triggers and application-level consistency checks. The rule I follow: normalize for transactional integrity, denormalize for report speed. Never denormalize a column that changes more than once a month. Never normalize a column that is read a thousand times for every one write.

Real example: an e-commerce order table. You could normalize into orders, order_items, products, customers, addresses. Five tables. A simple order history page takes six joins. Or you stash the customer name and shipping address directly in the orders table. One table. One read. One second saved per request. On 10,000 requests per minute, that's real money.

DenormalizeForReads.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// io.thecodeforge — database tutorial

-- Normalized read: 6 tables, 5 joins
SELECT o.id, c.name, a.street, p.name, oi.quantity
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN addresses a ON o.shipping_address_id = a.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.id = 1042;

-- Denormalized read: 1 table, 0 joins
SELECT id, customer_name, shipping_street, product_name, quantity
FROM order_summary
WHERE id = 1042;

-- Cost: update_customer_address now must sync to order_summary
-- Rule: only denormalize fields that change less than 1% of the time.
Output
(1042, 'Acme Corp', '123 Industrial Blvd', 'Steel Bolt M8', 500)
Senior Shortcut:
Profile your query patterns before you normalize or denormalize. Run pg_stat_statements or equivalent. If reads dominate by 10x or more, you're paying the price of normalization where it hurts most.
Key Takeaway
Normalize for write integrity; denormalize for read performance. Know your ratio, and never optimize a path you can't measure.

Visualizing the 2NF to 3NF Transformation

The jump from 2NF to 3NF removes transitive dependencies. In 2NF, a non-key column depends on the whole key but can still depend on another non-key column. That transitive chain causes update anomalies: changing a lecturer's office requires updating every course row. The fix is the same pattern used in 2NF: extract the dependent columns into their own table. Visualize this as splitting a chain: Course -> Lecturer -> Office becomes Course -> Lecturer_ID and Lecturer -> Office. The arrow now points from a foreign key to a primary key, never between non-key attributes. This isolates each piece of data to one row in one table. The 3NF schema prevents the anomaly where one lecturer's office change forces a scan of every course row. It also reduces storage because office addresses appear once instead of duplicated per course. The cost: queries joining Courses to Lecturers need one extra JOIN, which is negligible with proper indexing.

ViolationTo3NF.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// io.thecodeforge — database tutorial

-- 2NF violation: transitive dependency Office -> Lecturer_Name
CREATE TABLE Course_Lecturer (
    CourseID INT,
    LecturerName VARCHAR(50),
    LecturerOffice VARCHAR(20),
    PRIMARY KEY (CourseID, LecturerName)
);

-- 3NF fix: split into two tables
CREATE TABLE Courses (
    CourseID INT PRIMARY KEY,
    LecturerID INT REFERENCES Lecturers(LecturerID)
);

CREATE TABLE Lecturers (
    LecturerID INT PRIMARY KEY,
    LecturerName VARCHAR(50),
    Office VARCHAR(20)
);
Output
Table created successfully.
Production Trap:
Developers often stop at 2NF thinking 'every column depends on the key.' But transitive dependencies silently breed duplication. Always check: does Column A determine Column B when neither is a key?
Key Takeaway
In 3NF, every non-key column depends only on the whole primary key, not on another non-key column.

Practical Tips for Normalizing Databases in SQL

Normalization in SQL isn't academic theory—it's a debugging workflow. Start by running SELECT DISTINCT on every column combination you suspect is duplicated. If the same City and ZipCode appear with the same Address 500 times, you found a 2NF violation. Next, profile candidate keys: use COUNT(DISTINCT column) vs COUNT() to test uniqueness. A ratio near 1.0 suggests a key. For transitive dependencies, query pairs of non-key columns: SELECT col1, col2, COUNT() FROM table GROUP BY col1, col2 HAVING COUNT(*) > 1. If every distinct col1 maps to exactly one col2, you have a 3NF violation. The fix is always CREATE TABLE new_table AS SELECT DISTINCT ... then ALTER the original to add a foreign key. Script the conversion in a transaction to roll back on error. Finally, index the foreign key columns immediately—without indexes, JOINs on normalized schemas kill query performance. Normalize first, then add indexes; never the reverse.

NormalizeCheck.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
// io.thecodeforge — database tutorial

-- Find candidate keys: high distinct ratio
SELECT
    COUNT(DISTINCT employee_id) / COUNT(*)::float AS key_candidate_ratio
FROM timesheets;

-- Detect transitive dependency: does department_id always give same department_name?
SELECT department_id, department_name, COUNT(*)
FROM employees
GROUP BY department_id, department_name
HAVING COUNT(DISTINCT department_name) > 1;
Output
key_candidate_ratio: 1.0000
(no rows) - no transitive violation found.
Production Trap:
Running normalization on live tables without a transaction is how data gets orphaned. Always wrap CREATE TABLE ... AS DISTINCT and ALTER TABLE in BEGIN/COMMIT with a ROLLBACK on error.
Key Takeaway
Use DISTINCT and GROUP BY with HAVING to mathematically prove normalization violations before writing any DDL.

Introduction

Database normalization is the methodical process of organizing relational data to minimize redundancy and prevent anomalies during insert, update, or delete operations. It decomposes larger, poorly structured tables into smaller, well-defined ones that adhere to formal constraints called normal forms. Each normal form introduces stricter rules: 1NF ensures atomic values and unique rows; 2NF eliminates partial dependencies; 3NF removes transitive dependencies. Without normalization, databases suffer from update inconsistencies (changing a value in one row but not duplicates), insertion anomalies (being unable to record a fact because it requires a related row to exist), and deletion anomalies (losing unintended data when removing a single record). The goal is not perfection, but a defensible structure that balances data integrity with query performance. Senior engineers evaluate tradeoffs: pure normalization often improves write reliability and storage efficiency, but may increase join complexity. Understanding why each normal form exists—not just how to apply it—is critical for designing resilient, long-lived database schemas.

Normalization_Flow.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
// io.thecodeforge — database tutorial
// 25 lines max

-- Unnormalized: repeating groups, duplicate rows
CREATE TABLE Orders_Unnormalized (
    OrderID INT,
    CustomerName TEXT,
    ProductNames TEXT, -- comma-separated
    PRIMARY KEY (OrderID)
);

-- 1NF: atomic values, unique rows
CREATE TABLE Orders_1NF (
    OrderID INT,
    CustomerName TEXT,
    ProductName TEXT,
    PRIMARY KEY (OrderID, ProductName)
);

-- 2NF: remove partial dependencies (ProductName depends on ProductID)
CREATE TABLE Orders_2NF (
    OrderID INT PRIMARY KEY,
    CustomerName TEXT
);

CREATE TABLE OrderItems (
    OrderID INT,
    ProductID INT,
    PRIMARY KEY (OrderID, ProductID)
);

-- 3NF: remove transitive (CustomerName depends on CustomerID)
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName TEXT
);

CREATE TABLE Orders_3NF (
    OrderID INT PRIMARY KEY,
    CustomerID INT REFERENCES Customers(CustomerID)
);
Output
Schema progression from unnormalized to 3NF
Production Trap:
Rushing to 3NF without understanding query patterns frequently leads to severe performance regressions. Measure first, then normalize.
Key Takeaway
Normalization systematically eliminates data anomalies by enforcing constraints: 1NF (atomicity), 2NF (full key dependency), and 3NF (no transitive dependencies).

Key Takeaways

First, normalization is not optional—it is a baseline engineering practice that prevents data corruption from the start. Second, each normal form solves a specific class of anomaly: 1NF bans repeating groups and ensures every row is identifiable; 2NF prohibits partial dependencies where a column depends on only part of a composite key; 3NF eliminates transitive dependencies where a non-key column determines another non-key column. Third, denormalization is a conscious, performance-driven exception—not an excuse for sloppy design. Fourth, diagnosing violations in production schemas requires inspecting functional dependencies, not just looking at table structures. Fifth, over-normalization introduces excessive joins, burdens write paths, and can make simple reads exponentially slower. Finally, the best normalized schema is one that aligns with your specific workload: high-write systems benefit from normalization’s consistency guarantees, while read-heavy analytic systems may strategically break rules. Master these principles to design databases that are both correct and performant.

Diagnose_Violation.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// io.thecodeforge — database tutorial
// 25 lines max

-- Check for partial dependency (2NF violation)
SELECT DISTINCT EmployeeID, DepartmentName
FROM Employees_Denormalized
WHERE EmployeeID IS NOT NULL;

-- Result: DepartmentName repeats for every row with same EmployeeID
-- Fix: move to separate EmployeeDepartments table

-- Check for transitive dependency (3NF violation)
SELECT DISTINCT ZipCode, City, State
FROM Addresses;

-- Result: City/State determined by ZipCode
-- Fix: create ZipCode lookup table

-- Detect 1NF violation: comma-separated values
SELECT OrderID
FROM Orders
WHERE ProductList LIKE '%,%'
LIMIT 10;

-- Run this weekly as a data quality check
Output
Result sets highlighting dependency violations
Senior Engineer Note:
Run the diagnosis queries as periodic health checks. If violations appear, document business rules before restructuring.
Key Takeaway
Use pattern-matching and dependency analysis queries to detect normal form violations in existing schemas before refactoring.
● Production incidentPOST-MORTEMseverity: high

The $500k Update Anomaly: When Renaming a Product Broke Order History

Symptom
After renaming 'Wireless Keyboard' to 'BT Keyboard' in the products table, 12% of order history records still showed the old name. Accounting reports flagged a discrepancy: the same product appeared under two names.
Assumption
The team assumed that because the product name existed in a separate products table, any rename would automatically propagate. They didn't realize that order_items also stored product_name directly — a 2NF violation.
Root cause
The order_items table had a composite primary key (order_id, product_id) but stored product_name, which depends only on product_id. This partial dependency meant renaming the product required updating every row in order_items — an update that failed silently due to an incomplete WHERE clause.
Fix
Removed product_name from order_items. Added a foreign key to products and used a historical price table to preserve invoice accuracy. The rename now requires a single UPDATE on the products table.
Key lesson
  • Every partial dependency is a ticking bomb. If a column depends on only part of a composite key, extract it to its own table.
  • Never store derivable facts in child tables just for convenience. The JOIN cost is lower than the cost of data inconsistency.
  • When migrating to a normalized schema, run data validation queries to catch hidden dependencies before deployment.
Production debug guideHow to identify 1NF, 2NF, and 3NF violations without a design document — using SQL queries alone.3 entries
Symptom · 01
Queries use LIKE '%value%' on a column that logically holds multiple values
Fix
Check if the column stores comma‑separated lists. Run: SELECT DISTINCT column_name FROM table WHERE column_name LIKE '%,%' . If >0 rows, it's a 1NF violation.
Symptom · 02
Updating a single fact (e.g., department budget) requires updating many rows
Fix
Check for transitive dependencies: list all non‑key columns and see if any logically depend on another non‑key column. For example, department_budget depends on department_id, not on employee_id.
Symptom · 03
Inserting a new entity (e.g., a product) fails because a foreign key to an order is required
Fix
This insertion anomaly indicates a table that conflates two entity types. Use a staging table or split into two tables with separate primary keys.
Normalized vs Denormalized: When to Use Each
AspectNormalized (3NF)Denormalized
Data redundancyMinimal — each fact stored onceIntentional duplication for speed
Update complexityUpdate one row in one tableMust update multiple rows or use triggers
Read performanceJoins can be expensive at scaleSingle-table reads are very fast
Write performanceFast inserts and updatesSlower writes due to sync overhead
Risk of inconsistencyNear zero — data has one sourceReal risk if sync logic has a bug
Best suited forOLTP systems (e-commerce, CRMs)OLAP / reporting / read-heavy dashboards
Storage costLower — no duplicated dataHigher — redundant columns and tables
Schema flexibilityEasier to extend and refactorChanges ripple across multiple places

Key takeaways

1
1NF is about atomic values and unique rows
if you're storing lists in a column or have no primary key, you're not even at the starting line. Fix it by creating child tables.
2
2NF only applies to composite keys
every non-key column must depend on the whole key, not a subset. If product_name only needs product_id to be determined, it belongs in a products table, not in order_items.
3
3NF eliminates transitive dependencies
if Column C is determined by Column B, and Column B is determined by the primary key, C belongs in its own table keyed by B. The symptom is always the same: an update to one real-world fact requires touching multiple rows.
4
Denormalization is a performance tool, not a design shortcut
always normalize first, measure your actual query bottlenecks with EXPLAIN ANALYZE, and only denormalize with a documented reason and a sync mechanism you trust.

Common mistakes to avoid

5 patterns
×

Storing comma-separated values in a single column

Symptom
Queries like WHERE phone_numbers LIKE '%555-1234%' that are slow, can't use indexes, and silently return wrong results when values are substrings of each other (e.g., 555-1234 matches 555-12345).
Fix
Create a child table with a foreign key and one value per row. If you're tempted by the convenience of a list column, you're about to create a query nightmare that'll haunt you for years.
×

Confusing 'no redundancy' with 'no repeated foreign keys'

Symptom
A developer tries to 'normalize further' by removing the department_id FK from every employee row and replacing it with some indirect lookup.
Fix
Foreign key columns are not redundancy. They're how relational databases express relationships. A normalized schema has many foreign keys and that's exactly right. Redundancy means storing the same non-key fact (like department_budget) in multiple places.
×

Skipping normalization for performance reasons without measuring

Symptom
A flat, denormalized table with 30 columns that has update anomalies causing silent data corruption in production.
Fix
Always design normalized first, then use EXPLAIN ANALYZE (or your database's equivalent) to identify actual bottlenecks before denormalizing anything. Premature denormalization is the database equivalent of premature optimization — you get the costs without the benefits.
×

Using JSON columns for structured relational data

Symptom
Storing customer addresses as JSON because 'it's flexible'. Later, you need to find all customers in a specific ZIP code, and the query is slow and error-prone.
Fix
Reserve JSON columns for truly schema-less data (e.g., user preferences, feature flags). For structured data with relationships, use normalized tables with proper foreign keys.
×

Ignoring the difference between historical snapshots and redundant facts

Symptom
A developer removes price_at_purchase from order_items because it duplicates the current price in the products table. Invoices become inaccurate after the product price changes.
Fix
Recognize that price_at_purchase is a historical snapshot determined by the order's time and product. It's not a 2NF violation — it's a completely different fact. Always preserve historical snapshots that are needed for audit or accounting.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR
Can you walk me through the difference between a partial dependency and ...
Q02SENIOR
We have a reporting dashboard that joins six tables on every page load a...
Q03SENIOR
If a table is in 3NF, is it automatically in BCNF? Walk me through a cas...
Q04SENIOR
What is an insertion anomaly and how does normalization prevent it?
Q01 of 04JUNIOR

Can you walk me through the difference between a partial dependency and a transitive dependency? Give me a concrete table example for each — not just the definition.

ANSWER
A partial dependency occurs in a table with a composite primary key when a non-key column depends on only part of the key. For example, in order_items(order_id, product_id, product_name), product_name depends only on product_id, not the full composite key. A transitive dependency occurs when a non-key column depends on another non-key column. For example, in employees(employee_id, department_id, department_budget), department_budget depends on department_id, which is not the primary key. Fix both by extracting the dependent column(s) into their own table.
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What is the difference between 2NF and 3NF?
02
Does every database need to be in 3NF?
03
Can I normalize too much? Is there such a thing as over-normalization?
04
How do I identify which normal form my table is in?
05
Is it okay to use JSON columns in PostgreSQL if they are indexed?
N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

Follow
Verified
production tested
May 23, 2026
last updated
1,554
articles · all by Naren
🔥

That's Database Design. Mark it forged?

12 min read · try the examples if you haven't

Previous
Apache HBase Basics
1 / 16 · Database Design
Next
1NF 2NF 3NF Explained