Primary Key and Foreign Key Explained — Database Design for Beginners
Every app you've ever used — Instagram, Spotify, your banking app — stores its data in a database. Users, posts, songs, transactions: all of it lives in tables. But here's the thing: data almost never exists in isolation. A post belongs to a user. A transaction belongs to an account. An order belongs to a customer. The entire system only works because there's a mechanism that links these tables together reliably and without chaos. That mechanism starts with primary keys and foreign keys.
Without them, you'd face a nightmare. Imagine storing a customer's name, address, and phone number on every single order they ever placed. When they move house, you'd have to update hundreds of rows. Miss one? Now your database is lying to you. Or imagine deleting a customer record but leaving all their orphaned orders sitting in the database pointing to nobody. Primary keys and foreign keys exist specifically to prevent these disasters — they enforce identity and relationships at the database level, so your data stays truthful.
By the end of this article you'll understand what a primary key is and why every table needs one, what a foreign key is and how it stitches tables together, how to write the SQL to create both from scratch, the classic mistakes beginners make and exactly how to fix them, and the interview questions that trip people up. You'll walk away ready to design a real, properly connected database.
What Is a Primary Key — And Why Every Table Needs One
A primary key is a column (or combination of columns) in a database table whose value uniquely identifies every single row. Think of it like a passport number. Two people can share the same name, birthday, even the same hometown — but no two people share a passport number. That uniqueness is exactly what a primary key guarantees.
Why does this matter so much? Because databases need a reliable way to say 'I mean THIS exact row, not any other.' When you want to update a customer's email address, you can't search by name — what if two customers are both called 'James Brown'? You need one column that is guaranteed to be different for every row. That's your primary key.
A valid primary key follows three rules: it must be unique across all rows, it must never be NULL (empty), and it should never change once assigned. That last rule is important. If you use someone's email address as a primary key and they change their email, every reference to that key across your whole database breaks. This is why databases typically use a simple auto-incrementing integer — like 1, 2, 3, 4 — as the primary key. It's meaningless outside the database, which is exactly what you want: stable, dumb, permanent.
-- Create a customers table with a proper primary key -- AUTO_INCREMENT means the database assigns the next number automatically -- You never insert a value for customer_id yourself CREATE TABLE customers ( customer_id INT NOT NULL AUTO_INCREMENT, -- the primary key column full_name VARCHAR(100) NOT NULL, -- customer's full name email_address VARCHAR(150) NOT NULL, -- must be provided join_date DATE NOT NULL, -- when they signed up -- This line officially declares customer_id as the primary key PRIMARY KEY (customer_id) ); -- Insert three customers — notice we do NOT provide customer_id -- The database fills it in for us: 1, 2, 3 INSERT INTO customers (full_name, email_address, join_date) VALUES ('Alice Mercer', 'alice@example.com', '2023-01-15'), ('James Brown', 'jbrown@example.com', '2023-03-22'), ('James Brown', 'jb2@example.com', '2024-07-01'); -- same name, different person -- Retrieve all customers to see the auto-assigned IDs SELECT customer_id, full_name, email_address FROM customers; -- Try inserting a duplicate primary key to see the protection in action -- This will FAIL — the database will reject it INSERT INTO customers (customer_id, full_name, email_address, join_date) VALUES (1, 'Hacker Person', 'hack@example.com', '2024-01-01'); -- ERROR: Duplicate entry '1' for key 'PRIMARY'
+-------------+-------------+-------------------+
| customer_id | full_name | email_address |
+-------------+-------------+-------------------+
| 1 | Alice Mercer | alice@example.com |
| 2 | James Brown | jbrown@example.com|
| 3 | James Brown | jb2@example.com |
+-------------+-------------+-------------------+
3 rows in set
-- Result of duplicate INSERT attempt:
ERROR 1062 (23000): Duplicate entry '1' for key 'customers.PRIMARY'
What Is a Foreign Key — The Bridge Between Tables
Once every table has its own primary key, you need a way to say 'this row in Table A belongs to that row in Table B.' That's exactly what a foreign key does. A foreign key is a column in one table that stores the primary key value from another table. It's the bridge.
Back to the library analogy: when a student borrows a book, the checkout record doesn't copy out the student's full name, address, grade, and teacher. It just stores the student's ID number. That ID number is a foreign key — it points back to the full student record. To get the student's name, you follow the pointer.
This design is called normalization, and it's a fundamental principle of good database design. Store each piece of information in exactly one place. When Alice Mercer moves house, you update her address in one row of the customers table. Every order, every message, every record that references her customer_id automatically reflects the new address. No hunting, no inconsistency.
The foreign key also acts as a guardian. By default, a database with a foreign key constraint will refuse to let you insert an order for a customer_id that doesn't exist in the customers table. It also prevents you from deleting a customer who still has active orders. This is called referential integrity — the database enforces that your references always point to something real.
-- We already have a customers table with customer_id as the primary key. -- Now we create an orders table that links to it via a foreign key. CREATE TABLE orders ( order_id INT NOT NULL AUTO_INCREMENT, -- primary key for this table customer_id INT NOT NULL, -- foreign key: references customers order_date DATE NOT NULL, total_amount DECIMAL(10,2) NOT NULL, order_status VARCHAR(20) NOT NULL DEFAULT 'pending', -- Declare order_id as this table's own primary key PRIMARY KEY (order_id), -- Declare the foreign key relationship -- customer_id in THIS table must match a customer_id in the customers table CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers (customer_id) ON DELETE RESTRICT -- prevent deleting a customer who has orders ON UPDATE CASCADE -- if customer_id changes, update it here too ); -- Insert a valid order for Alice (customer_id = 1) — this works INSERT INTO orders (customer_id, order_date, total_amount) VALUES (1, '2024-02-10', 59.99); -- Insert another valid order for James Brown (customer_id = 2) — this works INSERT INTO orders (customer_id, order_date, total_amount) VALUES (2, '2024-03-05', 120.00); -- Try to insert an order for customer_id = 999 — does not exist in customers -- The foreign key constraint will BLOCK this INSERT INTO orders (customer_id, order_date, total_amount) VALUES (999, '2024-04-01', 45.00); -- ERROR: Cannot add or update a child row: foreign key constraint fails -- Join the two tables to see orders with the customer's name -- This is the power of the foreign key — we can reunite the data SELECT o.order_id, c.full_name AS customer_name, o.order_date, o.total_amount, o.order_status FROM orders o JOIN customers c ON o.customer_id = c.customer_id; -- follow the foreign key bridge
+----------+---------------+------------+--------------+--------------+
| order_id | customer_name | order_date | total_amount | order_status |
+----------+---------------+------------+--------------+--------------+
| 1 | Alice Mercer | 2024-02-10 | 59.99 | pending |
| 2 | James Brown | 2024-03-05 | 120.00 | pending |
+----------+---------------+------------+--------------+--------------+
2 rows in set
-- Result of invalid INSERT (customer 999 does not exist):
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails
(`mydb`.`orders`, CONSTRAINT `fk_orders_customer` FOREIGN KEY (`customer_id`)
REFERENCES `customers` (`customer_id`))
Composite Keys, Natural Keys, and When to Break the Rules
So far we've used a single auto-incrementing integer as the primary key, and that covers 90% of real-world cases. But you'll occasionally encounter two variations worth knowing about.
A composite primary key uses two or more columns together to form the unique identifier. Imagine a table tracking which students are enrolled in which courses. A student can enrol in many courses, and a course can have many students — but a specific student can only be enrolled in a specific course once. So the combination of (student_id, course_id) is what must be unique. Neither column alone is the primary key; together they are.
A natural key uses a real-world piece of data that happens to be unique — like a national insurance number, ISBN, or vehicle registration plate. Some teams prefer these because they carry meaning. The risk, as mentioned earlier, is that real-world data can change or have edge cases (two editions of a book with the same ISBN is a real problem). A surrogate key — the auto-incremented integer — has no meaning outside the database and never changes, which is why most modern systems prefer it.
The rule of thumb: use a surrogate key by default. Add a UNIQUE constraint on natural data (like email addresses) if you need to enforce uniqueness there too. That way you get the stability of a surrogate key AND the uniqueness guarantee on the real-world field.
-- Example 1: Composite Primary Key -- The enrolments table tracks student-course pairings -- A student cannot enrol in the same course twice CREATE TABLE enrolments ( student_id INT NOT NULL, -- foreign key pointing to students table course_id INT NOT NULL, -- foreign key pointing to courses table enrol_date DATE NOT NULL, grade CHAR(2), -- NULL until the course is completed -- The PRIMARY KEY is the COMBINATION of both columns -- Neither column alone is unique; the pair is PRIMARY KEY (student_id, course_id), CONSTRAINT fk_enrolment_student FOREIGN KEY (student_id) REFERENCES students (student_id), CONSTRAINT fk_enrolment_course FOREIGN KEY (course_id) REFERENCES courses (course_id) ); -- This inserts fine — student 1 enrolling in course 10 INSERT INTO enrolments (student_id, course_id, enrol_date) VALUES (1, 10, '2024-09-01'); -- This also inserts fine — student 1 enrolling in a DIFFERENT course INSERT INTO enrolments (student_id, course_id, enrol_date) VALUES (1, 11, '2024-09-01'); -- This FAILS — student 1 is already in course 10 (duplicate composite key) INSERT INTO enrolments (student_id, course_id, enrol_date) VALUES (1, 10, '2024-10-01'); -- ERROR: Duplicate entry '1-10' for key 'PRIMARY' -- ------------------------------------------------------- -- Example 2: Surrogate key + UNIQUE constraint on email -- The surrogate key (customer_id) is the real primary key -- The UNIQUE constraint stops duplicate emails without making email the PK CREATE TABLE members ( member_id INT NOT NULL AUTO_INCREMENT, email_address VARCHAR(150) NOT NULL, full_name VARCHAR(100) NOT NULL, PRIMARY KEY (member_id), -- Enforce email uniqueness separately — best of both worlds CONSTRAINT uq_members_email UNIQUE (email_address) ); -- Valid insert INSERT INTO members (email_address, full_name) VALUES ('carol@example.com', 'Carol Danes'); -- This fails — email already exists INSERT INTO members (email_address, full_name) VALUES ('carol@example.com', 'Carol Smith'); -- ERROR: Duplicate entry 'carol@example.com' for key 'uq_members_email'
+------------+-----------+------------+-------+
| student_id | course_id | enrol_date | grade |
+------------+-----------+------------+-------+
| 1 | 10 | 2024-09-01 | NULL |
| 1 | 11 | 2024-09-01 | NULL |
+------------+-----------+------------+-------+
-- Duplicate composite key error:
ERROR 1062 (23000): Duplicate entry '1-10' for key 'enrolments.PRIMARY'
-- Duplicate email error:
ERROR 1062 (23000): Duplicate entry 'carol@example.com' for key 'uq_members_email'
| Feature / Aspect | Primary Key | Foreign Key |
|---|---|---|
| Purpose | Uniquely identifies each row in its own table | Links a row in this table to a row in another table |
| Where it lives | In the table it identifies | In the child/referencing table |
| Must be unique? | Yes — always, no exceptions | No — many rows can share the same foreign key value |
| Can it be NULL? | Never — NULL is forbidden | Yes, if the relationship is optional (e.g. an order with no assigned salesperson) |
| Can there be duplicates? | No — each value must appear exactly once | Yes — multiple orders can reference the same customer_id |
| Auto-generated? | Commonly yes, via AUTO_INCREMENT or SERIAL | No — you supply the value from the parent table's primary key |
| What happens on violation? | INSERT/UPDATE is rejected with a duplicate key error | INSERT is rejected; DELETE of parent is blocked (RESTRICT) or cascades (CASCADE) |
| Typical column name | table_name + _id, e.g. customer_id in customers | Same name as the parent PK, e.g. customer_id in orders |
| Number allowed per table | Exactly one primary key (can span multiple columns) | Many foreign keys allowed — a table can reference several other tables |
🎯 Key Takeaways
- A primary key is a column whose value is unique and never NULL for every row — it's the table's permanent identity badge, ideally an auto-incrementing integer that has no real-world meaning.
- A foreign key is a column that stores the primary key value from another table — it's the bridge that links related data across tables without duplicating information.
- Foreign key constraints enforce referential integrity at the database level — the database itself will block you from inserting orphaned records or deleting parents that have dependent children, not just your application code.
- Use ON DELETE RESTRICT for financial or critical data (safe default), ON DELETE CASCADE for owned child data like user messages, and ON DELETE SET NULL for optional relationships — the choice you make here has real consequences in production.
⚠ Common Mistakes to Avoid
- ✕Mistake 1: Using a mutable real-world value (like email or username) as a primary key — Symptom: UPDATE statements cascade-fail across multiple tables when a user changes their email, or you get mysterious orphaned records — Fix: Always use a surrogate key (INT AUTO_INCREMENT or UUID) as the primary key. Add a UNIQUE constraint on the email column separately if you need uniqueness there. The primary key should be meaningless, permanent, and auto-assigned.
- ✕Mistake 2: Forgetting to index the foreign key column — Symptom: JOIN queries on large tables become painfully slow as the table grows, even though the query looks correct — Fix: Most databases (PostgreSQL, SQL Server) do NOT automatically create an index on a foreign key column — only MySQL/InnoDB does. Always create an explicit index: CREATE INDEX idx_orders_customer_id ON orders (customer_id); This turns a full table scan into a lightning-fast lookup every time you JOIN the two tables.
- ✕Mistake 3: Inserting child rows before parent rows (or dropping tables in the wrong order) — Symptom: ERROR 1452 on INSERT or ERROR 1451 on DROP/DELETE even though the data looks correct — Fix: Always insert the parent record first (e.g. insert the customer before inserting their order). When dropping tables, drop child tables before parent tables, or temporarily disable foreign key checks: SET FOREIGN_KEY_CHECKS = 0; DROP TABLE orders; DROP TABLE customers; SET FOREIGN_KEY_CHECKS = 1; Re-enable checks immediately after — never leave them off.
Interview Questions on This Topic
- QWhat is the difference between a primary key and a unique key? (The common trap: both enforce uniqueness, but a table can only have ONE primary key and it cannot contain NULLs, whereas you can have multiple UNIQUE constraints on a table and UNIQUE columns can contain NULL in most databases — though NULL behaviour varies by database engine.)
- QCan a table have a foreign key that references itself? (Yes — this is called a self-referencing or recursive foreign key and it's used for hierarchical data like an employees table where each employee has a manager_id that points back to another employee_id in the same table.)
- QWhat is referential integrity, and how do PRIMARY KEY and FOREIGN KEY constraints enforce it? (Referential integrity means every foreign key value must correspond to an existing primary key value in the parent table. The FK constraint enforces this by blocking inserts of non-existent references and by controlling what happens to child rows when a parent is deleted or updated, via ON DELETE and ON UPDATE rules like RESTRICT, CASCADE, SET NULL, and SET DEFAULT.)
Frequently Asked Questions
Can a foreign key reference a column that is not a primary key?
Yes, but only if that column has a UNIQUE constraint on it. The foreign key must reference a column that is guaranteed to contain unique values — which is true of primary keys and unique-constrained columns. In practice, referencing the primary key is almost always the right choice because it's always indexed and always unique.
Can a table have more than one foreign key?
Absolutely. A table can have as many foreign keys as it needs. An orders table, for example, might have a customer_id foreign key referencing customers, a product_id foreign key referencing products, and a salesperson_id foreign key referencing employees — all at the same time. Each constraint is independently enforced.
What is the difference between a primary key and a unique key?
Both guarantee that no two rows have the same value in that column. The differences: a table can have only one primary key but multiple unique keys; a primary key column can never contain NULL, but a unique key column can (in most databases, multiple NULLs are allowed in a unique column because NULL is considered 'unknown', not a duplicate value). Primary keys are also the default target for foreign key references.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.