Home Database 1NF, 2NF, 3NF Explained: Database Normalization With Real Examples

1NF, 2NF, 3NF Explained: Database Normalization With Real Examples

In Plain English 🔥
Imagine you're keeping a notebook of your friend's pizza orders. If you write 'pepperoni, mushroom, olives' all in one cell, finding everyone who ordered mushroom becomes a nightmare. Normalization is the process of reorganizing that notebook so every fact has its own dedicated spot — no repetition, no ambiguity, no hunting. The three normal forms (1NF, 2NF, 3NF) are just three increasingly strict rules for how clean that notebook needs to be.
⚡ Quick Answer
Imagine you're keeping a notebook of your friend's pizza orders. If you write 'pepperoni, mushroom, olives' all in one cell, finding everyone who ordered mushroom becomes a nightmare. Normalization is the process of reorganizing that notebook so every fact has its own dedicated spot — no repetition, no ambiguity, no hunting. The three normal forms (1NF, 2NF, 3NF) are just three increasingly strict rules for how clean that notebook needs to be.

Every database that's ever ground to a halt under load, returned mysteriously duplicate rows, or made a simple update cascade into a data disaster has one thing in common: poor normalization. Database normalization isn't an academic exercise invented by theorists — it's the difference between a schema that scales gracefully and one that quietly corrupts your data the moment traffic picks up. Edgar Codd formalized these rules in the 1970s, and they're just as relevant today whether you're building a SaaS app on Postgres or a reporting system on MySQL.

The core problem normalization solves is called an 'anomaly' — and there are three flavors. An insertion anomaly means you can't add a new piece of data without inventing fake data to fill unrelated columns. An update anomaly means changing one real-world fact requires hunting down and updating that fact in dozens of rows. A deletion anomaly means deleting one record accidentally destroys unrelated information. All three anomalies trace back to the same root cause: storing multiple independent facts in the same table row.

By the end of this article you'll understand exactly what each normal form requires and — more importantly — why it requires it. You'll be able to look at any table schema and spot which anomalies it's vulnerable to, decompose it into clean normalized tables using real SQL, and explain your reasoning confidently in a design review or an interview. Let's build this up step by step, starting from a deliberately broken table and fixing it one normal form at a time.

First Normal Form (1NF): One Fact Per Cell, One Row Per Entity

1NF has one core rule: every column must hold a single, indivisible (atomic) value. No comma-separated lists stuffed into a cell. No repeating column groups like phone1, phone2, phone3. Every row must be uniquely identifiable by a primary key.

Why does this matter so much? Because relational databases are built on set theory — the engine is optimized to compare, filter, and join atomic values. The moment you put 'pepperoni, mushroom' into a single cell you've created a mini-document inside your relational database. Now every query that needs to find 'who ordered mushroom' has to do string parsing instead of a clean index lookup. Performance tanks and correctness becomes fragile.

The repeating column trap is just as bad. If you have phone1, phone2, phone3 columns, adding a fourth phone number requires an ALTER TABLE. You've baked a business assumption (max 3 phones) into the schema itself. The fix is always to spin those repeating values out into their own table with a foreign key relationship.

In the example below, we start with a broken orders table and normalize it to 1NF. Notice how the fix isn't just about cleanliness — it unlocks proper indexing and makes every query simpler.

normalize_to_1nf.sql · SQL
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051
-- ============================================================
-- BEFORE 1NF: The broken table
-- Problems:
--   1. 'ordered_items' column stores multiple values in one cell
--   2. No single-column primary key — rows aren't cleanly unique
-- ============================================================

CREATE TABLE orders_broken (
    order_id      INT,
    customer_name VARCHAR(100),
    ordered_items VARCHAR(255),   -- BAD: 'Burger, Fries, Soda' in one cell
    item_prices   VARCHAR(255)    -- BAD: '5.99, 2.49, 1.99' matched by position
);

INSERT INTO orders_broken VALUES
    (1, 'Alice Chen',  'Burger, Fries, Soda', '5.99, 2.49, 1.99'),
    (2, 'Bob Martins', 'Pizza, Soda',          '8.99, 1.99');

-- Try querying: who ordered Soda? You'd need a LIKE '%Soda%' search.
-- That can't use an index. On a million-row table, this is a full scan every time.
SELECT customer_name
FROM   orders_broken
WHERE  ordered_items LIKE '%Soda%'; -- Fragile and slow


-- ============================================================
-- AFTER 1NF: Decomposed into atomic rows
-- Every cell holds exactly one value.
-- Primary key: (order_id, item_name) — uniquely identifies each row.
-- ============================================================

CREATE TABLE orders_1nf (
    order_id      INT,
    customer_name VARCHAR(100),
    item_name     VARCHAR(100),   -- ONE item per row
    item_price    DECIMAL(6, 2),  -- ONE price per row
    PRIMARY KEY (order_id, item_name)  -- Composite PK enforces uniqueness
);

INSERT INTO orders_1nf VALUES
    (1, 'Alice Chen',  'Burger', 5.99),
    (1, 'Alice Chen',  'Fries',  2.49),
    (1, 'Alice Chen',  'Soda',   1.99),
    (2, 'Bob Martins', 'Pizza',  8.99),
    (2, 'Bob Martins', 'Soda',   1.99);

-- Now the same query is clean, indexable, and accurate:
SELECT DISTINCT customer_name
FROM   orders_1nf
WHERE  item_name = 'Soda';
-- Uses an index on item_name. Fast at any scale.
▶ Output
-- Query result after 1NF fix:
customer_name
--------------
Alice Chen
Bob Martins
(2 rows)
⚠️
Watch Out: JSON Columns Are a 1NF TrapModern databases (Postgres, MySQL 5.7+) support JSON columns, which lets you store nested objects in a single cell. This feels convenient but deliberately breaks 1NF. It's acceptable for truly unstructured, schemaless data — but the moment you find yourself querying *inside* that JSON with ->> operators in a WHERE clause, you've recreated the same performance and correctness problems 1NF was designed to prevent. If you query it, it belongs in its own column.

Second Normal Form (2NF): Kill Partial Dependencies on Composite Keys

2NF only applies when your table has a composite primary key (a key made of two or more columns). The rule is: every non-key column must depend on the whole primary key, not just part of it. A column that only depends on part of the key is called a partial dependency, and it's a one-way ticket to update anomalies.

Look at the 1NF table we just created. The primary key is (order_id, item_name). Ask yourself: does customer_name depend on the whole key, or just on order_id? Just order_id — Alice Chen placed order 1, regardless of what she ordered. That's a partial dependency. If Alice changes her name, we have to update every row where order_id = 1. Miss one and the data is inconsistent.

The fix is straightforward: extract the partially-dependent columns into their own table, using the part of the key they actually depend on as the primary key of that new table. This isn't just tidiness — it means each real-world fact (like a customer's name) lives in exactly one place. One update, guaranteed consistency.

This is why well-designed schemas have separate customers, orders, and order_items tables. They aren't separated for aesthetics — each split removes a specific category of anomaly.

normalize_to_2nf.sql · SQL
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677
-- ============================================================
-- STILL BROKEN (1NF but not 2NF):
-- Primary key: (order_id, item_name)
-- Problem: customer_name depends only on order_id, not on item_name
-- Symptom: If 'Alice Chen' updates her name, we touch N rows instead of 1
-- ============================================================

-- orders_1nf table (from previous step) violates 2NF:
-- order_id | customer_name | item_name | item_price
--    1     | Alice Chen    | Burger    | 5.99   <-- customer_name repeated
--    1     | Alice Chen    | Fries     | 2.49   <-- customer_name repeated
--    1     | Alice Chen    | Soda      | 1.99   <-- customer_name repeated


-- ============================================================
-- AFTER 2NF: Split into three focused tables
-- Each non-key attribute now depends on the FULL primary key of its table.
-- ============================================================

-- Table 1: Customers — customer_name depends only on customer_id
CREATE TABLE customers (
    customer_id   INT          PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL
);

INSERT INTO customers VALUES
    (101, 'Alice Chen'),
    (102, 'Bob Martins');


-- Table 2: Orders — links a customer to a specific order
-- customer_id depends on order_id (not on any item)
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

INSERT INTO orders VALUES
    (1, 101),
    (2, 102);


-- Table 3: Order Items — the composite PK is (order_id, item_name)
-- item_price depends on BOTH: which order AND which item (price can vary per order)
CREATE TABLE order_items (
    order_id   INT,
    item_name  VARCHAR(100),
    item_price DECIMAL(6, 2) NOT NULL,
    PRIMARY KEY (order_id, item_name),
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

INSERT INTO order_items VALUES
    (1, 'Burger', 5.99),
    (1, 'Fries',  2.49),
    (1, 'Soda',   1.99),
    (2, 'Pizza',  8.99),
    (2, 'Soda',   1.99);


-- Now a name update touches exactly ONE row, not N rows:
UPDATE customers
SET    customer_name = 'Alice Zhang'
WHERE  customer_id = 101;
-- 1 row updated. No risk of partial update leaving inconsistent data.


-- Joining back is clean and readable:
SELECT
    c.customer_name,
    oi.item_name,
    oi.item_price
FROM   orders      o
JOIN   customers   c  ON c.customer_id = o.customer_id
JOIN   order_items oi ON oi.order_id   = o.order_id
ORDER BY c.customer_name, oi.item_name;
▶ Output
-- After UPDATE and SELECT:
customer_name | item_name | item_price
--------------+-----------+-----------
Alice Zhang | Burger | 5.99
Alice Zhang | Fries | 2.49
Alice Zhang | Soda | 1.99
Bob Martins | Pizza | 8.99
Bob Martins | Soda | 1.99
(5 rows)
🔥
Key Insight: 2NF Is Only About Composite KeysIf every table in your schema has a single-column primary key (like a surrogate integer ID), your tables are automatically in 2NF — partial dependencies can't exist without a composite key to be partial about. This is one reason surrogate keys are so popular: they sidestep 2NF violations entirely. But don't let that make you complacent — tables with natural composite keys (like junction tables) still need careful 2NF analysis.

Third Normal Form (3NF): Eliminate Transitive Dependencies

You've hit 2NF — every non-key column depends on the full primary key. But there's one more trap: what if a non-key column depends on another non-key column rather than directly on the primary key? That's a transitive dependency, and it's what 3NF eliminates.

The classic way to spot it: if you can say 'Column C depends on Column B, and Column B depends on the primary key — but Column C doesn't directly depend on the primary key,' you have a transitive dependency. Removing Column C and Column B into their own table (with B as the new primary key) fixes it.

A real example: imagine a products table where each product belongs to a category, and each category has a fixed tax rate. The primary key is product_id. category_name depends on product_id (fine). But category_tax_rate depends on category_name, not directly on product_id. If you update the tax rate for a category, you'd need to update every product row in that category — a classic update anomaly.

The fix: create a categories table. Now category_tax_rate lives in one place. Change a category's tax rate once, and every product in that category instantly reflects it. This is how real-world schemas handle configuration, pricing tiers, regional rules, and permission levels — anything that is a property of a group rather than a property of an individual record.

normalize_to_3nf.sql · SQL
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879
-- ============================================================
-- BEFORE 3NF: products table has a transitive dependency
-- Primary key: product_id
-- Dependency chain:
--   product_id --> category_name --> category_tax_rate
-- category_tax_rate does NOT directly depend on product_id.
-- It depends on category_name, which depends on product_id.
-- ============================================================

CREATE TABLE products_broken (
    product_id       INT          PRIMARY KEY,
    product_name     VARCHAR(100) NOT NULL,
    category_name    VARCHAR(100) NOT NULL,
    category_tax_rate DECIMAL(5, 4) NOT NULL  -- e.g. 0.0750 = 7.5%
);

INSERT INTO products_broken VALUES
    (1, 'Wireless Mouse',   'Electronics', 0.0750),
    (2, 'Mechanical Keyboard', 'Electronics', 0.0750),  -- tax rate duplicated
    (3, 'Desk Lamp',        'Electronics', 0.0750),     -- duplicated again
    (4, 'Notebook',         'Stationery',  0.0500),
    (5, 'Ballpoint Pen',    'Stationery',  0.0500);     -- duplicated

-- Anomaly demo: Electronics tax rate changes from 7.5% to 8.25%
-- You MUST update every Electronics row or your data becomes inconsistent:
UPDATE products_broken
SET    category_tax_rate = 0.0825
WHERE  category_name = 'Electronics';  -- 3 rows touched. What if one fails mid-transaction?


-- ============================================================
-- AFTER 3NF: Extract the transitive dependency into its own table
-- category_tax_rate now lives in ONE place per category.
-- ============================================================

-- Table 1: Categories — category_tax_rate directly depends on category_id
CREATE TABLE categories (
    category_id   INT          PRIMARY KEY,
    category_name VARCHAR(100) NOT NULL UNIQUE, -- name is still unique
    tax_rate      DECIMAL(5, 4) NOT NULL         -- one source of truth
);

INSERT INTO categories VALUES
    (10, 'Electronics', 0.0750),
    (20, 'Stationery',  0.0500);


-- Table 2: Products — references category by FK, no tax rate duplication
CREATE TABLE products (
    product_id   INT          PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    category_id  INT          NOT NULL,
    FOREIGN KEY (category_id) REFERENCES categories(category_id)
);

INSERT INTO products VALUES
    (1, 'Wireless Mouse',      10),
    (2, 'Mechanical Keyboard', 10),
    (3, 'Desk Lamp',           10),
    (4, 'Notebook',            20),
    (5, 'Ballpoint Pen',       20);


-- Now the tax rate update touches exactly ONE row:
UPDATE categories
SET    tax_rate = 0.0825
WHERE  category_name = 'Electronics';
-- 1 row updated. Every Electronics product automatically reflects 8.25%.


-- Verify: all Electronics products now use the updated rate
SELECT
    p.product_name,
    c.category_name,
    c.tax_rate,
    ROUND(p.product_id * 10.00 * c.tax_rate, 2) AS sample_tax  -- illustrative
FROM   products   p
JOIN   categories c ON c.category_id = p.category_id
ORDER BY c.category_name, p.product_name;
▶ Output
-- After UPDATE categories SET tax_rate = 0.0825 WHERE category_name = 'Electronics':
-- UPDATE 1

-- SELECT result:
product_name | category_name | tax_rate | sample_tax
---------------------+---------------+----------+------------
Desk Lamp | Electronics | 0.0825 | 2.48
Mechanical Keyboard | Electronics | 0.0825 | 1.65
Wireless Mouse | Electronics | 0.0825 | 0.83
Ballpoint Pen | Stationery | 0.0500 | 2.50
Notebook | Stationery | 0.0500 | 2.00
(5 rows)
⚠️
Pro Tip: Use Codd's Plain-English Test for 3NFEdgar Codd's own way to remember 3NF: every non-key attribute must depend on 'the key, the whole key, and nothing but the key.' The first clause rules out non-atomic values (1NF). 'The whole key' rules out partial dependencies (2NF). 'Nothing but the key' rules out transitive dependencies (3NF). Recite that in an interview and you'll stand out immediately.

When to Normalize — and When to Deliberately Stop

Normalization isn't free. Every additional table you create means an additional JOIN at query time. For an OLTP system (online transaction processing — your typical web app handling inserts and updates), 3NF is usually the right target. Writes are fast, data is consistent, and modern query optimizers handle three-table JOINs in microseconds with proper indexing.

For OLAP systems (analytics, reporting, data warehouses), the calculus flips. Analysts run massive aggregation queries across hundreds of millions of rows. A JOIN at that scale is expensive, and data consistency is less critical because the warehouse is read-only. This is why data warehouse schemas are intentionally denormalized — star schemas and snowflake schemas trade 3NF purity for query speed.

There's also a practical middle ground: read-heavy web applications sometimes denormalize specific tables after profiling proves a JOIN is a bottleneck. The key word is after profiling. Premature denormalization is one of the most common and costly database design mistakes — you pay the price in update anomalies and data bugs long before you need the performance gain.

The professional move is to start at 3NF, measure, and denormalize surgically where evidence demands it. Never denormalize based on a hunch.

denormalization_example.sql · SQL
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657
-- ============================================================
-- SCENARIO: A product listing page needs to display
-- product name + category name for every product.
-- If this query runs 50,000 times/second, the JOIN might matter.
--
-- Step 1: Measure first. Run EXPLAIN ANALYZE in Postgres:
-- ============================================================

EXPLAIN ANALYZE
SELECT
    p.product_name,
    c.category_name
FROM   products   p
JOIN   categories c ON c.category_id = p.category_id;

-- If the planner uses index scans and costs are low, you're done.
-- Only proceed with denormalization if this is a proven bottleneck.


-- ============================================================
-- SURGICAL DENORMALIZATION: Cache category_name on products
-- Do this ONLY after profiling confirms the JOIN is too slow.
-- Accept the tradeoff: you now own the responsibility of keeping
-- category_name in sync when categories are renamed.
-- ============================================================

ALTER TABLE products
    ADD COLUMN category_name_cache VARCHAR(100); -- Denormalized column, explicitly named _cache

-- Populate the cache from the normalized source of truth:
UPDATE products p
SET    category_name_cache = c.category_name
FROM   categories c
WHERE  c.category_id = p.category_id;

-- You MUST maintain this cache via a trigger or application logic:
CREATE OR REPLACE FUNCTION sync_category_name_cache()
RETURNS TRIGGER AS $$
BEGIN
    -- When a category name changes, propagate it to the cache column
    UPDATE products
    SET    category_name_cache = NEW.category_name
    WHERE  category_id = NEW.category_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_category_name
AFTER UPDATE OF category_name ON categories
FOR EACH ROW
EXECUTE FUNCTION sync_category_name_cache();

-- Now the listing query needs no JOIN:
SELECT product_name, category_name_cache
FROM   products;
-- Faster read. But now you have a maintenance responsibility.
-- This is the tradeoff. Make it consciously, document it clearly.
▶ Output
-- EXPLAIN ANALYZE output (example, will vary by data volume):
Hash Join (cost=1.11..2.28 rows=5 width=25) (actual time=0.052..0.061 rows=5 loops=1)
-> Seq Scan on products (cost=0.00..1.05 rows=5 width=17)
-> Hash (cost=1.02..1.02 rows=2 width=16)
-> Seq Scan on categories (cost=0.00..1.02 rows=2 width=16)
Planning Time: 0.312 ms
Execution Time: 0.123 ms
-- At 0.123ms per query, the JOIN is NOT your bottleneck. Don't denormalize.
🔥
Interview Gold: The OLTP vs OLAP TradeoffWhen an interviewer asks 'would you always normalize to 3NF?' the right answer is: 'For OLTP systems, yes — 3NF is the baseline because write correctness matters most. For OLAP / data warehousing, I'd likely use a star schema which deliberately denormalizes dimensions for query performance. In both cases I'd start normalized and measure before making any tradeoffs.' That answer shows you understand both sides of the coin.
Aspect1NF2NF3NF
Core RuleAtomic values only; no repeating groupsNo partial dependencies on a composite PKNo transitive dependencies between non-key columns
PrerequisiteNone — it's the baselineMust already satisfy 1NFMust already satisfy 1NF and 2NF
Only applies when...Always — every tableTable has a composite primary keyA non-key column depends on another non-key column
Anomaly it preventsData that's impossible to query or index reliablyUpdate anomaly from duplicated partial-key factsUpdate anomaly from duplicated transitive facts
Typical fixOne value per cell; split multi-value rowsExtract partial-dependent columns to a new tableExtract transitively-dependent columns to a new table
Real-world example of violationStoring 'Burger, Fries' in one columncustomer_name in order_items (depends only on order_id)category_tax_rate in products (depends on category, not product)
Performance impact of fixImproves (enables indexing)Slight JOIN cost; huge write correctness gainSlight JOIN cost; huge write correctness gain
OLTP relevanceEssentialEssentialEssential
OLAP/Warehouse relevanceEssentialOften relaxed intentionallyOften relaxed intentionally for query speed

🎯 Key Takeaways

  • 1NF is about atomicity: one value per cell, no repeating column groups — it's the foundation that makes indexing and querying possible at all.
  • 2NF is only relevant with composite primary keys: every non-key column must depend on the full key, not just part of it — the fix is always to split the partial-dependent columns into their own table.
  • 3NF removes transitive chains: if a non-key column depends on another non-key column rather than directly on the PK, you have a transitive dependency — extract it to its own table to get one source of truth.
  • Normalization is the default, denormalization is the deliberate exception: always start at 3NF for OLTP, measure with EXPLAIN ANALYZE before touching it, and name any denormalized column explicitly (e.g., _cache suffix) so future developers know it's intentional and maintained.

⚠ Common Mistakes to Avoid

  • Mistake 1: Thinking 2NF applies to all tables — The symptom is spending time auditing a table with a single-column primary key for partial dependencies and finding nothing. 2NF violations require a composite primary key by definition — a single-column PK means every non-key column depends on that one column by default. Fix: check your PK first; if it's a single column (especially a surrogate integer ID), skip 2NF analysis and move straight to checking for 3NF transitive dependencies.
  • Mistake 2: Confusing a transitive dependency with a legitimate derived value — The symptom is over-normalizing by splitting out a column that is computed from the primary key, like storing a full_name column when first_name and last_name already exist in the same table. That's redundancy, not a transitive dependency — full_name is derived from key-dependent columns, not via a separate real-world entity. Fix: transitive dependencies involve a third real-world entity (like a category with its own properties). If you can delete the column and reconstruct it entirely from other columns in the same row, it's a derived value problem, not a 3NF violation.
  • Mistake 3: Normalizing to 3NF and then never indexing foreign keys — The symptom is that queries that were fast on the unnormalized table are now slower after normalization because every JOIN is doing a sequential scan on the FK column. The normalization was correct but incomplete. Fix: every foreign key column should have an index unless you've measured that it's never used in a JOIN or WHERE clause. In Postgres, foreign key constraints do NOT automatically create an index (unlike primary keys). Add them explicitly: CREATE INDEX idx_products_category_id ON products(category_id);

Interview Questions on This Topic

  • QCan a table be in 3NF without being in 2NF? Walk me through your reasoning — this trips up a lot of candidates who know the definitions but haven't thought about the dependency chain between the forms.
  • QI have an orders table with columns: order_id (PK), customer_id, customer_email, customer_city, order_total. Which normal form does this violate and how would you fix it — assuming customer_email and customer_city depend on customer_id, not on order_id?
  • QWhen would you deliberately denormalize a 3NF-compliant schema, and what operational responsibility does that create that wouldn't exist in the normalized version?

Frequently Asked Questions

What is the difference between 2NF and 3NF in simple terms?

2NF removes dependencies on part of a composite primary key — it's about columns that only care about half the key. 3NF removes dependencies on other non-key columns — it's about columns that care about a different column rather than the primary key directly. Both are about ensuring every fact lives in exactly one logical place.

Does normalization always improve database performance?

Normalization improves write performance and consistency but can add cost to read queries because of JOINs. For OLTP systems (apps with frequent inserts and updates), 3NF is almost always a net win. For analytical workloads with massive read queries, intentional denormalization (star schemas) is standard practice. Profile first — don't guess.

Can a table skip straight to 3NF without satisfying 1NF and 2NF first?

No — the normal forms are cumulative prerequisites. 3NF is defined as: the table is in 2NF AND has no transitive dependencies. 2NF requires 1NF. So satisfying 3NF logically means you've already satisfied 1NF and 2NF. You can't have a table that's in 3NF but violates 1NF — by definition it would fail the 3NF test.

🔥
TheCodeForge Editorial Team Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.

← PreviousDatabase NormalizationNext →ER Diagrams
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged