Database Beginner

Primary Key and Foreign Key Explained — Database Design for Beginners

📅 March 2026 ⏱ 8 min read 🎯 Beginner

In Plain English 🔥

Imagine a school library. Every book has a unique barcode — no two books share the same one. That barcode is the primary key: a one-of-a-kind label that identifies exactly one record. Now imagine a student borrows a book. The checkout slip has the student's ID on it — not the student's full name, address, and photo, just the ID. That ID pointing back to the student record is the foreign key. It's how two separate lists stay connected without duplicating all the data.

⚡ Quick Answer

Every app you've ever used — Instagram, Spotify, your banking app — stores its data in a database. Users, posts, songs, transactions: all of it lives in tables. But here's the thing: data almost never exists in isolation. A post belongs to a user. A transaction belongs to an account. An order belongs to a customer. The entire system only works because there's a mechanism that links these tables together reliably and without chaos. That mechanism starts with primary keys and foreign keys.

Without them, you'd face a nightmare. Imagine storing a customer's name, address, and phone number on every single order they ever placed. When they move house, you'd have to update hundreds of rows. Miss one? Now your database is lying to you. Or imagine deleting a customer record but leaving all their orphaned orders sitting in the database pointing to nobody. Primary keys and foreign keys exist specifically to prevent these disasters — they enforce identity and relationships at the database level, so your data stays truthful.

By the end of this article you'll understand what a primary key is and why every table needs one, what a foreign key is and how it stitches tables together, how to write the SQL to create both from scratch, the classic mistakes beginners make and exactly how to fix them, and the interview questions that trip people up. You'll walk away ready to design a real, properly connected database.

What Is a Primary Key — And Why Every Table Needs One

A primary key is a column (or combination of columns) in a database table whose value uniquely identifies every single row. Think of it like a passport number. Two people can share the same name, birthday, even the same hometown — but no two people share a passport number. That uniqueness is exactly what a primary key guarantees.

Why does this matter so much? Because databases need a reliable way to say 'I mean THIS exact row, not any other.' When you want to update a customer's email address, you can't search by name — what if two customers are both called 'James Brown'? You need one column that is guaranteed to be different for every row. That's your primary key.

A valid primary key follows three rules: it must be unique across all rows, it must never be NULL (empty), and it should never change once assigned. That last rule is important. If you use someone's email address as a primary key and they change their email, every reference to that key across your whole database breaks. This is why databases typically use a simple auto-incrementing integer — like 1, 2, 3, 4 — as the primary key. It's meaningless outside the database, which is exactly what you want: stable, dumb, permanent.

create_customers_table.sql · SQL

1234567891011121314151617181920212223242526272829

-- Create a customers table with a proper primary key
-- AUTO_INCREMENT means the database assigns the next number automatically
-- You never insert a value for customer_id yourself
CREATE TABLE customers (
    customer_id   INT           NOT NULL AUTO_INCREMENT,  -- the primary key column
    full_name     VARCHAR(100)  NOT NULL,                 -- customer's full name
    email_address VARCHAR(150)  NOT NULL,                 -- must be provided
    join_date     DATE          NOT NULL,                 -- when they signed up

    -- This line officially declares customer_id as the primary key
    PRIMARY KEY (customer_id)
);

-- Insert three customers — notice we do NOT provide customer_id
-- The database fills it in for us: 1, 2, 3
INSERT INTO customers (full_name, email_address, join_date)
VALUES
    ('Alice Mercer',  'alice@example.com',  '2023-01-15'),
    ('James Brown',   'jbrown@example.com', '2023-03-22'),
    ('James Brown',   'jb2@example.com',    '2024-07-01');  -- same name, different person

-- Retrieve all customers to see the auto-assigned IDs
SELECT customer_id, full_name, email_address FROM customers;

-- Try inserting a duplicate primary key to see the protection in action
-- This will FAIL — the database will reject it
INSERT INTO customers (customer_id, full_name, email_address, join_date)
VALUES (1, 'Hacker Person', 'hack@example.com', '2024-01-01');
-- ERROR: Duplicate entry '1' for key 'PRIMARY'

▶ Output

⚠️

Pro Tip: Never Use Real-World Data as a Primary KeyIt's tempting to use an email address or phone number as a primary key because they seem unique. Don't. People change emails. Phone numbers get recycled. Use a meaningless auto-incrementing integer (INT AUTO_INCREMENT in MySQL, SERIAL in PostgreSQL) — it's stable, fast to index, and has no real-world baggage.

What Is a Foreign Key — The Bridge Between Tables

Once every table has its own primary key, you need a way to say 'this row in Table A belongs to that row in Table B.' That's exactly what a foreign key does. A foreign key is a column in one table that stores the primary key value from another table. It's the bridge.

Back to the library analogy: when a student borrows a book, the checkout record doesn't copy out the student's full name, address, grade, and teacher. It just stores the student's ID number. That ID number is a foreign key — it points back to the full student record. To get the student's name, you follow the pointer.

This design is called normalization, and it's a fundamental principle of good database design. Store each piece of information in exactly one place. When Alice Mercer moves house, you update her address in one row of the customers table. Every order, every message, every record that references her customer_id automatically reflects the new address. No hunting, no inconsistency.

The foreign key also acts as a guardian. By default, a database with a foreign key constraint will refuse to let you insert an order for a customer_id that doesn't exist in the customers table. It also prevents you from deleting a customer who still has active orders. This is called referential integrity — the database enforces that your references always point to something real.

create_orders_table_with_foreign_key.sql · SQL

12345678910111213141516171819202122232425262728293031323334353637383940414243444546

-- We already have a customers table with customer_id as the primary key.
-- Now we create an orders table that links to it via a foreign key.

CREATE TABLE orders (
    order_id      INT            NOT NULL AUTO_INCREMENT,  -- primary key for this table
    customer_id   INT            NOT NULL,                  -- foreign key: references customers
    order_date    DATE           NOT NULL,
    total_amount  DECIMAL(10,2)  NOT NULL,
    order_status  VARCHAR(20)    NOT NULL DEFAULT 'pending',

    -- Declare order_id as this table's own primary key
    PRIMARY KEY (order_id),

    -- Declare the foreign key relationship
    -- customer_id in THIS table must match a customer_id in the customers table
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id)
        REFERENCES customers (customer_id)
        ON DELETE RESTRICT   -- prevent deleting a customer who has orders
        ON UPDATE CASCADE    -- if customer_id changes, update it here too
);

-- Insert a valid order for Alice (customer_id = 1) — this works
INSERT INTO orders (customer_id, order_date, total_amount)
VALUES (1, '2024-02-10', 59.99);

-- Insert another valid order for James Brown (customer_id = 2) — this works
INSERT INTO orders (customer_id, order_date, total_amount)
VALUES (2, '2024-03-05', 120.00);

-- Try to insert an order for customer_id = 999 — does not exist in customers
-- The foreign key constraint will BLOCK this
INSERT INTO orders (customer_id, order_date, total_amount)
VALUES (999, '2024-04-01', 45.00);
-- ERROR: Cannot add or update a child row: foreign key constraint fails

-- Join the two tables to see orders with the customer's name
-- This is the power of the foreign key — we can reunite the data
SELECT
    o.order_id,
    c.full_name       AS customer_name,
    o.order_date,
    o.total_amount,
    o.order_status
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;  -- follow the foreign key bridge

▶ Output

⚠️

Watch Out: ON DELETE and ON UPDATE Actually MatterMost tutorials skip ON DELETE and ON UPDATE, but they change everything. ON DELETE RESTRICT (the safest default) stops you deleting a parent row that has children. ON DELETE CASCADE automatically deletes all child rows when the parent is deleted — useful for things like deleting a user and all their messages, but dangerous if misapplied to financial records. Always think carefully about which behaviour you want before writing the constraint.

Composite Keys, Natural Keys, and When to Break the Rules

So far we've used a single auto-incrementing integer as the primary key, and that covers 90% of real-world cases. But you'll occasionally encounter two variations worth knowing about.

A composite primary key uses two or more columns together to form the unique identifier. Imagine a table tracking which students are enrolled in which courses. A student can enrol in many courses, and a course can have many students — but a specific student can only be enrolled in a specific course once. So the combination of (student_id, course_id) is what must be unique. Neither column alone is the primary key; together they are.

A natural key uses a real-world piece of data that happens to be unique — like a national insurance number, ISBN, or vehicle registration plate. Some teams prefer these because they carry meaning. The risk, as mentioned earlier, is that real-world data can change or have edge cases (two editions of a book with the same ISBN is a real problem). A surrogate key — the auto-incremented integer — has no meaning outside the database and never changes, which is why most modern systems prefer it.

The rule of thumb: use a surrogate key by default. Add a UNIQUE constraint on natural data (like email addresses) if you need to enforce uniqueness there too. That way you get the stability of a surrogate key AND the uniqueness guarantee on the real-world field.

composite_key_and_unique_constraint.sql · SQL

1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556

-- Example 1: Composite Primary Key
-- The enrolments table tracks student-course pairings
-- A student cannot enrol in the same course twice
CREATE TABLE enrolments (
    student_id  INT  NOT NULL,   -- foreign key pointing to students table
    course_id   INT  NOT NULL,   -- foreign key pointing to courses table
    enrol_date  DATE NOT NULL,
    grade       CHAR(2),         -- NULL until the course is completed

    -- The PRIMARY KEY is the COMBINATION of both columns
    -- Neither column alone is unique; the pair is
    PRIMARY KEY (student_id, course_id),

    CONSTRAINT fk_enrolment_student
        FOREIGN KEY (student_id) REFERENCES students (student_id),

    CONSTRAINT fk_enrolment_course
        FOREIGN KEY (course_id) REFERENCES courses (course_id)
);

-- This inserts fine — student 1 enrolling in course 10
INSERT INTO enrolments (student_id, course_id, enrol_date)
VALUES (1, 10, '2024-09-01');

-- This also inserts fine — student 1 enrolling in a DIFFERENT course
INSERT INTO enrolments (student_id, course_id, enrol_date)
VALUES (1, 11, '2024-09-01');

-- This FAILS — student 1 is already in course 10 (duplicate composite key)
INSERT INTO enrolments (student_id, course_id, enrol_date)
VALUES (1, 10, '2024-10-01');
-- ERROR: Duplicate entry '1-10' for key 'PRIMARY'

-- -------------------------------------------------------
-- Example 2: Surrogate key + UNIQUE constraint on email
-- The surrogate key (customer_id) is the real primary key
-- The UNIQUE constraint stops duplicate emails without making email the PK
CREATE TABLE members (
    member_id     INT          NOT NULL AUTO_INCREMENT,
    email_address VARCHAR(150) NOT NULL,
    full_name     VARCHAR(100) NOT NULL,

    PRIMARY KEY (member_id),

    -- Enforce email uniqueness separately — best of both worlds
    CONSTRAINT uq_members_email UNIQUE (email_address)
);

-- Valid insert
INSERT INTO members (email_address, full_name)
VALUES ('carol@example.com', 'Carol Danes');

-- This fails — email already exists
INSERT INTO members (email_address, full_name)
VALUES ('carol@example.com', 'Carol Smith');
-- ERROR: Duplicate entry 'carol@example.com' for key 'uq_members_email'

▶ Output

-- After the two valid enrolment inserts:
+------------+-----------+------------+-------+
| student_id | course_id | enrol_date | grade |
+------------+-----------+------------+-------+
| 1 | 10 | 2024-09-01 | NULL |
| 1 | 11 | 2024-09-01 | NULL |
+------------+-----------+------------+-------+

-- Duplicate composite key error:
ERROR 1062 (23000): Duplicate entry '1-10' for key 'enrolments.PRIMARY'

-- Duplicate email error:
ERROR 1062 (23000): Duplicate entry 'carol@example.com' for key 'uq_members_email'

🔥

Interview Gold: Surrogate vs Natural KeyInterviewers love asking 'should you use an email address as a primary key?' The strong answer is no — use a surrogate key (auto-increment integer or UUID) for stability, then add a UNIQUE constraint on the email column. This separates identity from data and future-proofs the table against changes in the real world.

Feature / Aspect	Primary Key	Foreign Key
Purpose	Uniquely identifies each row in its own table	Links a row in this table to a row in another table
Where it lives	In the table it identifies	In the child/referencing table
Must be unique?	Yes — always, no exceptions	No — many rows can share the same foreign key value
Can it be NULL?	Never — NULL is forbidden	Yes, if the relationship is optional (e.g. an order with no assigned salesperson)
Can there be duplicates?	No — each value must appear exactly once	Yes — multiple orders can reference the same customer_id
Auto-generated?	Commonly yes, via AUTO_INCREMENT or SERIAL	No — you supply the value from the parent table's primary key
What happens on violation?	INSERT/UPDATE is rejected with a duplicate key error	INSERT is rejected; DELETE of parent is blocked (RESTRICT) or cascades (CASCADE)
Typical column name	table_name + _id, e.g. customer_id in customers	Same name as the parent PK, e.g. customer_id in orders
Number allowed per table	Exactly one primary key (can span multiple columns)	Many foreign keys allowed — a table can reference several other tables

🎯 Key Takeaways

A primary key is a column whose value is unique and never NULL for every row — it's the table's permanent identity badge, ideally an auto-incrementing integer that has no real-world meaning.
A foreign key is a column that stores the primary key value from another table — it's the bridge that links related data across tables without duplicating information.
Foreign key constraints enforce referential integrity at the database level — the database itself will block you from inserting orphaned records or deleting parents that have dependent children, not just your application code.
Use ON DELETE RESTRICT for financial or critical data (safe default), ON DELETE CASCADE for owned child data like user messages, and ON DELETE SET NULL for optional relationships — the choice you make here has real consequences in production.

⚠ Common Mistakes to Avoid

✕Mistake 1: Using a mutable real-world value (like email or username) as a primary key — Symptom: UPDATE statements cascade-fail across multiple tables when a user changes their email, or you get mysterious orphaned records — Fix: Always use a surrogate key (INT AUTO_INCREMENT or UUID) as the primary key. Add a UNIQUE constraint on the email column separately if you need uniqueness there. The primary key should be meaningless, permanent, and auto-assigned.
✕Mistake 2: Forgetting to index the foreign key column — Symptom: JOIN queries on large tables become painfully slow as the table grows, even though the query looks correct — Fix: Most databases (PostgreSQL, SQL Server) do NOT automatically create an index on a foreign key column — only MySQL/InnoDB does. Always create an explicit index: CREATE INDEX idx_orders_customer_id ON orders (customer_id); This turns a full table scan into a lightning-fast lookup every time you JOIN the two tables.
✕Mistake 3: Inserting child rows before parent rows (or dropping tables in the wrong order) — Symptom: ERROR 1452 on INSERT or ERROR 1451 on DROP/DELETE even though the data looks correct — Fix: Always insert the parent record first (e.g. insert the customer before inserting their order). When dropping tables, drop child tables before parent tables, or temporarily disable foreign key checks: SET FOREIGN_KEY_CHECKS = 0; DROP TABLE orders; DROP TABLE customers; SET FOREIGN_KEY_CHECKS = 1; Re-enable checks immediately after — never leave them off.

Interview Questions on This Topic

QWhat is the difference between a primary key and a unique key? (The common trap: both enforce uniqueness, but a table can only have ONE primary key and it cannot contain NULLs, whereas you can have multiple UNIQUE constraints on a table and UNIQUE columns can contain NULL in most databases — though NULL behaviour varies by database engine.)
QCan a table have a foreign key that references itself? (Yes — this is called a self-referencing or recursive foreign key and it's used for hierarchical data like an employees table where each employee has a manager_id that points back to another employee_id in the same table.)
QWhat is referential integrity, and how do PRIMARY KEY and FOREIGN KEY constraints enforce it? (Referential integrity means every foreign key value must correspond to an existing primary key value in the parent table. The FK constraint enforces this by blocking inserts of non-existent references and by controlling what happens to child rows when a parent is deleted or updated, via ON DELETE and ON UPDATE rules like RESTRICT, CASCADE, SET NULL, and SET DEFAULT.)

Frequently Asked Questions

Can a foreign key reference a column that is not a primary key?

Yes, but only if that column has a UNIQUE constraint on it. The foreign key must reference a column that is guaranteed to contain unique values — which is true of primary keys and unique-constrained columns. In practice, referencing the primary key is almost always the right choice because it's always indexed and always unique.

Can a table have more than one foreign key?

Absolutely. A table can have as many foreign keys as it needs. An orders table, for example, might have a customer_id foreign key referencing customers, a product_id foreign key referencing products, and a salesperson_id foreign key referencing employees — all at the same time. Each constraint is independently enforced.

What is the difference between a primary key and a unique key?

Both guarantee that no two rows have the same value in that column. The differences: a table can have only one primary key but multiple unique keys; a primary key column can never contain NULL, but a unique key column can (in most databases, multiple NULLs are allowed in a unique column because NULL is considered 'unknown', not a duplicate value). Primary keys are also the default target for foreign key references.

🔥

TheCodeForge Editorial Team Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.

About Our Team Editorial Standards

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged