A primary key uniquely identifies each row in a table and must be unique and non-null.
A foreign key links rows across tables by storing the parent's primary key value.
Primary keys are always unique; foreign keys can have duplicates and NULLs.
Missing indexes on foreign key columns cause full table scans on JOINs.
ON DELETE RESTRICT protects data integrity; ON DELETE CASCADE can be dangerous.
Biggest mistake: using a mutable real-world value (like email) as primary key.
✦ Definition~90s read
What is Primary Key and Foreign Key?
A primary key is a column (or combination of columns) in a database table whose value uniquely identifies every single row. Think of it like a passport number. Two people can share the same name, birthday, even the same hometown — but no two people share a passport number. That uniqueness is exactly what a primary key guarantees.
★
Imagine a school library.
Why does this matter so much? Because databases need a reliable way to say 'I mean THIS exact row, not any other.' When you want to update a customer's email address, you can't search by name — what if two customers are both called 'James Brown'? You need one column that is guaranteed to be different for every row. That's your primary key.
A valid primary key follows three rules: it must be unique across all rows, it must never be NULL (empty), and it should never change once assigned. That last rule is important. If you use someone's email address as a primary key and they change their email, every reference to that key across your whole database breaks.
This is why databases typically use a simple auto-incrementing integer — like 1, 2, 3, 4 — as the primary key. It's meaningless outside the database, which is exactly what you want: stable, dumb, permanent.
Plain-English First
Imagine a school library. Every book has a unique barcode — no two books share the same one. That barcode is the primary key: a one-of-a-kind label that identifies exactly one record. Now imagine a student borrows a book. The checkout slip has the student's ID on it — not the student's full name, address, and photo, just the ID. That ID pointing back to the student record is the foreign key. It's how two separate lists stay connected without duplicating all the data.
Every app you've ever used — Instagram, Spotify, your banking app — stores its data in a database. Users, posts, songs, transactions: all of it lives in tables. But here's the thing: data almost never exists in isolation. A post belongs to a user. A transaction belongs to an account. An order belongs to a customer. The entire system only works because there's a mechanism that links these tables together reliably and without chaos. That mechanism starts with primary keys and foreign keys.
Without them, you'd face a nightmare. Imagine storing a customer's name, address, and phone number on every single order they ever placed. When they move house, you'd have to update hundreds of rows. Miss one? Now your database is lying to you. Or imagine deleting a customer record but leaving all their orphaned orders sitting in the database pointing to nobody. Primary keys and foreign keys exist specifically to prevent these disasters — they enforce identity and relationships at the database level, so your data stays truthful.
By the end of this article you'll understand what a primary key is and why every table needs one, what a foreign key is and how it stitches tables together, how to write the SQL to create both from scratch, the classic mistakes beginners make and exactly how to fix them, and the interview questions that trip people up. You'll walk away ready to design a real, properly connected database.
What Is a Primary Key — And Why Every Table Needs One
A primary key is a column (or combination of columns) in a database table whose value uniquely identifies every single row. Think of it like a passport number. Two people can share the same name, birthday, even the same hometown — but no two people share a passport number. That uniqueness is exactly what a primary key guarantees.
Why does this matter so much? Because databases need a reliable way to say 'I mean THIS exact row, not any other.' When you want to update a customer's email address, you can't search by name — what if two customers are both called 'James Brown'? You need one column that is guaranteed to be different for every row. That's your primary key.
A valid primary key follows three rules: it must be unique across all rows, it must never be NULL (empty), and it should never change once assigned. That last rule is important. If you use someone's email address as a primary key and they change their email, every reference to that key across your whole database breaks. This is why databases typically use a simple auto-incrementing integer — like 1, 2, 3, 4 — as the primary key. It's meaningless outside the database, which is exactly what you want: stable, dumb, permanent.
create_customers_table.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Create a customers table with a proper primary key-- AUTO_INCREMENT means the database assigns the next number automatically-- You never insert a value for customer_id yourselfCREATETABLEcustomers (
customer_id INTNOTNULL AUTO_INCREMENT, -- the primary key column
full_name VARCHAR(100) NOTNULL, -- customer's full name
email_address VARCHAR(150) NOTNULL, -- must be provided
join_date DATENOTNULL, -- when they signed up-- This line officially declares customer_id as the primary keyPRIMARYKEY (customer_id)
);
-- Insert three customers — notice we do NOT provide customer_id-- The database fills it in for us: 1, 2, 3INSERTINTOcustomers (full_name, email_address, join_date)
VALUES
('Alice Mercer', 'alice@example.com', '2023-01-15'),
('James Brown', 'jbrown@example.com', '2023-03-22'),
('JamesBrown', 'jb2@example.com', '2024-07-01'); -- same name, different person-- Retrieve all customers to see the auto-assigned IDsSELECT customer_id, full_name, email_address FROM customers;
-- Try inserting a duplicate primary key to see the protection in action-- This will FAIL — the database will reject itINSERTINTOcustomers (customer_id, full_name, email_address, join_date)
VALUES (1, 'Hacker Person', 'hack@example.com', '2024-01-01');
-- ERROR: Duplicate entry '1' for key 'PRIMARY'
Output
-- Result of SELECT:
+-------------+-------------+-------------------+
| customer_id | full_name | email_address |
+-------------+-------------+-------------------+
| 1 | Alice Mercer | alice@example.com |
| 2 | James Brown | jbrown@example.com|
| 3 | James Brown | jb2@example.com |
+-------------+-------------+-------------------+
3 rows in set
-- Result of duplicate INSERT attempt:
ERROR 1062 (23000): Duplicate entry '1' for key 'customers.PRIMARY'
Pro Tip: Never Use Real-World Data as a Primary Key
It's tempting to use an email address or phone number as a primary key because they seem unique. Don't. People change emails. Phone numbers get recycled. Use a meaningless auto-incrementing integer (INT AUTO_INCREMENT in MySQL, SERIAL in PostgreSQL) — it's stable, fast to index, and has no real-world baggage.
Production Insight
Auto-increment primary keys can overflow if not sized correctly. An INT (4 bytes) maxes out at 2,147,483,647. For high-traffic tables, use BIGINT to avoid production outages.
Monitoring the sequence high-water mark avoids surprise primary key exhaustion.
Rule: choose data type based on growth, not yesterday's volume.
Key Takeaway
A primary key must be unique, non-null, and stable.
Use surrogate auto-incrementing integers instead of natural keys.
The primary key is the table's identity; never change it.
thecodeforge.io
Primary Key & Foreign Key Indexing Pitfalls
Primary Key Foreign Key
What Is a Foreign Key — The Bridge Between Tables
Once every table has its own primary key, you need a way to say 'this row in Table A belongs to that row in Table B.' That's exactly what a foreign key does. A foreign key is a column in one table that stores the primary key value from another table. It's the bridge.
Back to the library analogy: when a student borrows a book, the checkout record doesn't copy out the student's full name, address, grade, and teacher. It just stores the student's ID number. That ID number is a foreign key — it points back to the full student record. To get the student's name, you follow the pointer.
This design is called normalization, and it's a fundamental principle of good database design. Store each piece of information in exactly one place. When Alice Mercer moves house, you update her address in one row of the customers table. Every order, every message, every record that references her customer_id automatically reflects the new address. No hunting, no inconsistency.
The foreign key also acts as a guardian. By default, a database with a foreign key constraint will refuse to let you insert an order for a customer_id that doesn't exist in the customers table. It also prevents you from deleting a customer who still has active orders. This is called referential integrity — the database enforces that your references always point to something real.
create_orders_table_with_foreign_key.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
-- We already have a customers table with customer_id as the primary key.-- Now we create an orders table that links to it via a foreign key.CREATETABLEorders (
order_id INTNOTNULL AUTO_INCREMENT, -- primary key for this table
customer_id INTNOTNULL, -- foreign key: references customers
order_date DATENOTNULL,
total_amount DECIMAL(10,2) NOTNULL,
order_status VARCHAR(20) NOTNULLDEFAULT'pending',
-- Declare order_id as this table's own primary keyPRIMARYKEY (order_id),
-- Declare the foreign key relationship-- customer_id in THIS table must match a customer_id in the customers tableCONSTRAINT fk_orders_customer
FOREIGNKEY (customer_id)
REFERENCEScustomers (customer_id)
ONDELETERESTRICT-- prevent deleting a customer who has ordersONUPDATECASCADE-- if customer_id changes, update it here too
);
-- Insert a valid order for Alice (customer_id = 1) — this worksINSERTINTOorders (customer_id, order_date, total_amount)
VALUES (1, '2024-02-10', 59.99);
-- Insert another valid order for James Brown (customer_id = 2) — this worksINSERTINTOorders (customer_id, order_date, total_amount)
VALUES (2, '2024-03-05', 120.00);
-- Try to insert an order for customer_id = 999 — does not exist in customers-- The foreign key constraint will BLOCK thisINSERTINTOorders (customer_id, order_date, total_amount)
VALUES (999, '2024-04-01', 45.00);
-- ERROR: Cannot add or update a child row: foreign key constraint fails-- Join the two tables to see orders with the customer's name-- This is the power of the foreign key — we can reunite the dataSELECT
o.order_id,
c.full_name AS customer_name,
o.order_date,
o.total_amount,
o.order_status
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id; -- follow the foreign key bridge
Watch Out: ON DELETE and ON UPDATE Actually Matter
Most tutorials skip ON DELETE and ON UPDATE, but they change everything. ON DELETE RESTRICT (the safest default) stops you deleting a parent row that has children. ON DELETE CASCADE automatically deletes all child rows when the parent is deleted — useful for things like deleting a user and all their messages, but dangerous if misapplied to financial records. Always think carefully about which behaviour you want before writing the constraint.
Production Insight
Foreign key constraints can cause deadlocks in high-concurrency environments. When multiple transactions insert child rows and delete parent rows, lock contention on the parent table escalates.
Use indexed foreign keys and keep transactions short.
Rule: always index foreign keys and consider lock ordering to prevent deadlocks.
Key Takeaway
A foreign key links tables without duplicating data.
It enforces referential integrity at the database level.
Always match data types between FK and PK columns.
Composite Keys, Natural Keys, and When to Break the Rules
So far we've used a single auto-incrementing integer as the primary key, and that covers 90% of real-world cases. But you'll occasionally encounter two variations worth knowing about.
A composite primary key uses two or more columns together to form the unique identifier. Imagine a table tracking which students are enrolled in which courses. A student can enrol in many courses, and a course can have many students — but a specific student can only be enrolled in a specific course once. So the combination of (student_id, course_id) is what must be unique. Neither column alone is the primary key; together they are.
A natural key uses a real-world piece of data that happens to be unique — like a national insurance number, ISBN, or vehicle registration plate. Some teams prefer these because they carry meaning. The risk, as mentioned earlier, is that real-world data can change or have edge cases (two editions of a book with the same ISBN is a real problem). A surrogate key — the auto-incremented integer — has no meaning outside the database and never changes, which is why most modern systems prefer it.
The rule of thumb: use a surrogate key by default. Add a UNIQUE constraint on natural data (like email addresses) if you need to enforce uniqueness there too. That way you get the stability of a surrogate key AND the uniqueness guarantee on the real-world field.
composite_key_and_unique_constraint.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
-- Example 1: Composite Primary Key-- The enrolments table tracks student-course pairings-- A student cannot enrol in the same course twiceCREATETABLEenrolments (
student_id INTNOTNULL, -- foreign key pointing to students table
course_id INTNOTNULL, -- foreign key pointing to courses table
enrol_date DATENOTNULL,
grade CHAR(2), -- NULL until the course is completed-- The PRIMARY KEY is the COMBINATION of both columns-- Neither column alone is unique; the pair isPRIMARYKEY (student_id, course_id),
CONSTRAINT fk_enrolment_student
FOREIGNKEY (student_id) REFERENCESstudents (student_id),
CONSTRAINT fk_enrolment_course
FOREIGNKEY (course_id) REFERENCEScourses (course_id)
);
-- This inserts fine — student 1 enrolling in course 10INSERTINTOenrolments (student_id, course_id, enrol_date)
VALUES (1, 10, '2024-09-01');
-- This also inserts fine — student 1 enrolling in a DIFFERENT courseINSERTINTOenrolments (student_id, course_id, enrol_date)
VALUES (1, 11, '2024-09-01');
-- This FAILS — student 1 is already in course 10 (duplicate composite key)INSERTINTOenrolments (student_id, course_id, enrol_date)
VALUES (1, 10, '2024-10-01');
-- ERROR: Duplicate entry '1-10' for key 'PRIMARY'-- --------------------------------------------------------- Example 2: Surrogate key + UNIQUE constraint on email-- The surrogate key (customer_id) is the real primary key-- The UNIQUE constraint stops duplicate emails without making email the PKCREATETABLEmembers (
member_id INTNOTNULL AUTO_INCREMENT,
email_address VARCHAR(150) NOTNULL,
full_name VARCHAR(100) NOTNULL,
PRIMARYKEY (member_id),
-- Enforce email uniqueness separately — best of both worldsCONSTRAINT uq_members_email UNIQUE (email_address)
);
-- Valid insertINSERTINTOmembers (email_address, full_name)
VALUES ('carol@example.com', 'Carol Danes');
-- This fails — email already existsINSERTINTOmembers (email_address, full_name)
VALUES ('carol@example.com', 'Carol Smith');
-- ERROR: Duplicate entry 'carol@example.com' for key 'uq_members_email'
Output
-- After the two valid enrolment inserts:
+------------+-----------+------------+-------+
| student_id | course_id | enrol_date | grade |
+------------+-----------+------------+-------+
| 1 | 10 | 2024-09-01 | NULL |
| 1 | 11 | 2024-09-01 | NULL |
+------------+-----------+------------+-------+
-- Duplicate composite key error:
ERROR 1062 (23000): Duplicate entry '1-10' for key 'enrolments.PRIMARY'
-- Duplicate email error:
ERROR 1062 (23000): Duplicate entry 'carol@example.com' for key 'uq_members_email'
Interview Gold: Surrogate vs Natural Key
Interviewers love asking 'should you use an email address as a primary key?' The strong answer is no — use a surrogate key (auto-increment integer or UUID) for stability, then add a UNIQUE constraint on the email column. This separates identity from data and future-proofs the table against changes in the real world.
Production Insight
Composite primary keys can hurt performance when used as foreign keys because they require multi-column indexes. Use a surrogate key for the primary table and a unique constraint on the composite columns.
In many-to-many relationships, the join table benefits from a surrogate PK plus unique constraint on the two FK columns.
Surrogate keys are simpler for foreign key references.
Add unique constraints on natural combinations instead of making them the PK.
ON DELETE and ON UPDATE: Choosing the Right Referential Action
When you create a foreign key, you have to decide what happens when the parent row is deleted or its primary key is updated. The database offers several options, and each has real consequences.
RESTRICT: prevents the operation entirely if any child rows exist. This is the safest default for financial or critical data — it forces you to handle children explicitly.
CASCADE: automatically deletes or updates all child rows. Useful when children are owned by the parent (e.g., user and their messages) but dangerous if misapplied to data that shouldn't be silently removed.
SET NULL: sets the foreign key column in child rows to NULL. Useful for optional relationships (e.g., an order with no assigned salesperson), but can leave orphaned rows.
SET DEFAULT: sets the foreign key column to a predefined default value. Rarely used in practice because of the tight coupling.
NO ACTION: similar to RESTRICT in most databases, but the check is deferred until commit in some systems (like PostgreSQL with BEGIN...COMMIT).
The choice depends entirely on your data semantics. For example, deleting a customer who has orders should be blocked (RESTRICT) or you should archive instead. Deleting a blog post should also delete its comments (CASCADE). There's no universal right answer.
referential_actions_examples.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Example 1: ON DELETE CASCADE — when child rows should be removed with the parentCREATETABLEuser_messages (
message_id INT AUTO_INCREMENT,
user_id INTNOTNULL,
content TEXT,
PRIMARYKEY (message_id),
CONSTRAINT fk_user_messages
FOREIGNKEY (user_id)
REFERENCESusers (user_id)
ONDELETECASCADE
);
-- Deleting a user will automatically delete all their messagesDELETEFROM users WHERE user_id = 42;
-- Example 2: ON DELETE SET NULL — for optional relationshipsCREATETABLEorders (
order_id INT AUTO_INCREMENT,
customer_id INT, -- nullable, because salesperson can be unassigned
salesperson_id INT, -- foreign key to employees tablePRIMARYKEY (order_id),
CONSTRAINT fk_orders_salesperson
FOREIGNKEY (salesperson_id)
REFERENCESemployees (employee_id)
ONDELETESETNULL
);
-- Deleting an employee will set salesperson_id to NULL in affected ordersDELETEFROM employees WHERE employee_id = 7;
-- Example 3: ON UPDATE CASCADE — ensures child rows follow the parent's new PKCREATETABLEorder_items (
order_id INTNOTNULL,
product_id INTNOTNULL,
quantity INT,
PRIMARYKEY (order_id, product_id),
CONSTRAINT fk_order_items_order
FOREIGNKEY (order_id)
REFERENCESorders (order_id)
ONUPDATECASCADE
);
-- If the order_id changes in the parent, it will be updated here too
Output
-- All three examples demonstrate the different behaviours.
-- The key is to choose based on data ownership, not convenience.
Cascade Danger: Unintended Data Loss
ON DELETE CASCADE can wipe out thousands of child rows in a single DELETE statement. Always test cascading deletes on a staging copy before applying to production. A single misconfigured CASCADE can cascade through multiple tables, deleting records you never intended to remove. Use CASCADE only for owned child data (e.g., user messages), never for shared or financial records.
Production Insight
ON DELETE CASCADE can cascade through multiple tables in a single DELETE, quickly wiping out unrelated data if foreign key chains exist.
SET NULL can leave orphaned rows that break business logic if not handled properly in application code.
Rule: use RESTRICT by default; CASCADE only for clearly owned child data; SET NULL for optional relationships with proper application null-handling.
Key Takeaway
Referential actions determine what happens to child rows when a parent is deleted or updated.
RESTRICT is the safest default for financial data.
CASCADE can be dangerous if misapplied; always understand the dependency graph.
Performance Considerations: Indexing Foreign Keys and Avoiding Common Pitfalls
A foreign key column without an index is the silent killer of JOIN performance. When you run a query that joins on the foreign key, the database engine must scan the entire child table to find matching rows if there's no index. As the table grows, this full table scan becomes a bottleneck.
But there's more: foreign key constraints themselves add overhead. On every INSERT into the child table, the database must verify that the foreign key value exists in the parent table. That's a lookup. If the foreign key is indexed, this lookup is fast (often an index unique scan). Without an index, it triggers a scan on the parent's primary key index (already indexed) but if the foreign key itself is not indexed, the validation step on the child side for certain operations like DELETE on the parent may require checking the child table for referencing rows — and that's where the missing index hurts.
The rule: always create an index on every foreign key column immediately after creating the constraint. Most databases like PostgreSQL and SQL Server do NOT auto-index foreign keys. MySQL/InnoDB does, but MySQL is the exception.
However, indexes come with write overhead. Every INSERT, UPDATE, or DELETE on the table must update all indexes. For high-write tables, batch operations can reduce the overhead. Use EXPLAIN to verify index usage in your queries.
index_foreign_key_performance.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- Before index: slow JOIN due to full table scanEXPLAINSELECT * FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
-- Output shows 'Type: ALL' on orders table — full table scan-- Create the missing indexCREATEINDEX idx_orders_customer_id ONorders (customer_id);
-- After index: index lookup is usedEXPLAINSELECT * FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
-- Output shows 'Type: ref' or 'Index lookup' — fast-- Verify index existsSHOWINDEXFROM orders;
Output
-- Before index: rows=50000 (full table), Extra: Using where
-- After index: rows=5, Extra: Using index condition
-- The difference is dramatic: from full scan to nearly instant.
Index Every Foreign Key — Immediately
Add the index as part of the same migration script that creates the foreign key. Don't rely on MySQL's auto-indexing (which is database-specific). In PostgreSQL, SQL Server, and Oracle, you must create the index explicitly. A missing index on a foreign key is the #1 cause of performance problems in systems that start small and grow.
Production Insight
Missing foreign key indexes are the #1 cause of slow JOIN queries in production databases. Always add an index immediately after creating a FK constraint.
On high-write tables, each FK check requires a lookup; batch inserts can reduce overhead.
Rule: create FK indexes as part of your migration script, not as an afterthought.
Key Takeaway
Index every foreign key column to avoid full table scans.
Use EXPLAIN to verify index usage.
Foreign key constraints add write overhead; batch writes to mitigate.
Primary Key vs. Foreign Key — The Two Laws of Relational Gravity
Stop thinking of them as features. Think of them as the laws that prevent your database from collapsing into a dumpster fire.
A primary key guarantees row-level uniqueness inside a single table. No nulls. No duplicates. Every row gets a single source of truth. Without one, your table is just a bag of records — unidentifiable, untrustworthy, and impossible to join without guesswork.
A foreign key is the contract between tables. It says "this column must exist over there before it can exist here." It's not a hint — it's a constraint enforced at the database level. The moment you rely on application code alone to maintain referential integrity, you've introduced a bug farm.
Here's the hard truth: primary keys are about identity. Foreign keys are about relationships. Mix them up and you'll end up with orphaned rows, broken joins, and debugging sessions that last until 3 AM.
Primary keys are internal. Foreign keys are external. One says "I am this row." The other says "I belong to that row." Respect the difference.
OrdersVsCustomers.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
// io.thecodeforge — database tutorial
CREATETABLEcustomers (
customer_id SERIALPRIMARYKEY, -- unique, not null, one per row
full_name TEXTNOTNULL
);
CREATETABLEorders (
order_id SERIALPRIMARYKEY,
customer_id INTNOTNULL,
order_date DATEDEFAULT CURRENT_DATE,
-- foreign key enforces existence in customersCONSTRAINT fk_orders_customers
FOREIGNKEY (customer_id)
REFERENCEScustomers(customer_id)
);
-- This fails: customer 999 doesn't existINSERTINTOorders (customer_id, order_date)
VALUES (999, '2024-03-15');
-- ERROR: insert or update on table "orders" violates foreign key constraint "fk_orders_customers"
Output
ERROR: insert or update on table "orders" violates foreign key constraint "fk_orders_customers"
Production Trap:
Never disable foreign key constraints — even temporarily — to bulk-load data. You will forget to re-enable them, and orphaned records will corrupt your reports for weeks before anyone notices.
Key Takeaway
Primary keys guarantee identity within a table; foreign keys enforce relationships between tables. Never let application logic substitute for a foreign key constraint.
Designing Relationships — One-to-Many vs. Many-to-Many
You can't throw foreign keys at tables and pray. You need to understand cardinality first, or your schema will look like spaghetti code.
One-to-many is the default relationship. One customer has many orders. You put the foreign key on the "many" side (orders table). That's it. Clean, fast, indexed.
Many-to-many is where rookies get wrecked. Students and courses. Orders and products. You can't jam a foreign key into either table because each row would need multiple IDs. Solution: a junction table. This is not optional. It's not academic. It's the only way to maintain referential integrity without breaking first normal form.
Self-referencing foreign keys are the dark corner. Employee table where manager_id points to employee_id. Works fine — but you must allow NULL for the root record. One mistake: cascade delete blows up your entire org tree. Always use ON DELETE SET NULL or write a recursive cleanup script.
Composite foreign keys? They exist, but avoid them unless you're dealing with legacy schemas or junction tables with natural keys. They eat index space, complicate migrations, and make your JOINs ugly.
Pick the right cardinality upfront. Changing it later means data migration, broken foreign keys, and a ticket that nobody wants to take.
StudentsCourses.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// io.thecodeforge — database tutorial
-- One-to-many: one student, many enrollmentsCREATETABLEenrollments (
enrollment_id SERIALPRIMARYKEY,
student_id INTNOTNULLREFERENCESstudents(student_id),
course_id INTNOTNULLREFERENCEScourses(course_id),
enrolled_at TIMESTAMPTZDEFAULTNOW(),
UNIQUE (student_id, course_id) -- prevents duplicate enrollments
);
-- Self-referencing: employee hierarchyCREATETABLEemployees (
employee_id SERIALPRIMARYKEY,
manager_id INTREFERENCESemployees(employee_id),
full_name TEXTNOTNULL
);
-- This fails: no row with id 0INSERTINTOemployees (employee_id, manager_id, full_name)
VALUES (1, 0, 'CEO');
Output
ERROR: insert or update on table "employees" violates foreign key constraint "employees_manager_id_fkey"
Senior Shortcut:
For many-to-many relationships, always add a UNIQUE constraint on the two foreign key columns in the junction table. This prevents accidental duplicate relationships without adding an artificial composite primary key that bloats your indexes.
Key Takeaway
Foreign keys follow cardinality — put them on the 'many' side for one-to-many, use a junction table for many-to-many, and never cascade delete on self-referencing keys.
Managing Constraints and Naming Conventions
Databases enforce referential integrity through constraints, but poorly named constraints become a maintenance nightmare. Every primary key, foreign key, and index constraint should have a clear name that reveals its purpose at a glance. A foreign key constraint named FK_Order_Customer instantly tells you it links the Order table to Customer. Without this, debugging deadlocks or schema changes wastes hours scanning system tables. Naming conventions also prevent accidental constraint drops: a generic name like FK__Orders__CustId__1234 offers no clue. Adopt a standard: FK_ChildTable_ParentTable for foreign keys, PK_TableName for primaries. This habit pays off during migrations, audits, and when ON DELETE CASCADE accidentally removes data you meant to keep. Constraints are not just rules—they are documentation. Name them accordingly.
Commands completed successfully. Two tables created with named constraints.
Production Trap:
Default constraint names like 'FK__Orders__CustId__5EBF139D' change across environments and become impossible to track in logs. Always name constraints explicitly.
Key Takeaway
Name every constraint explicitly using a convention like FK_Child_Parent to avoid debugging nightmares.
Managing Modifications and Dropping Constraints
Modifying or dropping primary and foreign key constraints in production risks data corruption or cascading failures. A foreign key drop might allow orphaned rows; a primary key drop disables indexing and breaks joins. Always script changes as transactions, and verify dependencies first. To drop a foreign key, use ALTER TABLE ChildTable DROP CONSTRAINT FK_Name. To drop a primary key, you must first drop all foreign keys referencing it. For modifications, never drop and recreate without WITH CHECK or the optimizer ignores the constraint. For large tables, adding a foreign key with NOCHECK then validating inline prevents table locks. The golden rule: test in staging, script every DROP and ADD, and wrap in explicit transactions. A single uncommitted ALTER can block an entire production pipeline.
DropConstraintSafe.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// io.thecodeforge — database tutorial
BEGINTRANSACTION;
-- Find all foreign keys referencing CustomerSELECT name FROM sys.foreign_keys
WHERE referenced_object_id = OBJECT_ID('Customer');
-- Drop FK before PKALTERTABLE [Order] DROPCONSTRAINT FK_Order_Customer;
-- Now safe to drop PK (rare, but needed for migration)ALTERTABLECustomerDROPCONSTRAINT PK_Customer;
-- Recreate PK with new structureALTERTABLECustomerADDCONSTRAINT PK_Customer PRIMARYKEY (CustomerId);
-- Re-add FK with validationALTERTABLE [Order] ADDCONSTRAINT FK_Order_Customer
FOREIGNKEY (CustomerId) REFERENCESCustomer(CustomerId)
WITHCHECK;
COMMIT;
Output
Transaction committed. Primary key dropped and recreated. Foreign key re-established with validation.
Production Trap:
Dropping a primary key without first dropping all referencing foreign keys triggers an immediate error. Worse, some databases silently drop child constraints if you use CASCADE—check your engine.
Key Takeaway
Always drop foreign keys before their referenced primary keys; wrap changes in a transaction and validate after each modification.
● Production incidentPOST-MORTEMseverity: high
The Unindexed Foreign Key That Took Down Checkout
Symptom
Users reported that placing an order took over 30 seconds; some orders failed with a timeout error. CPU usage on the database spiked to 100% during peak hours.
Assumption
The DBA assumed that foreign key constraints automatically create indexes on the referencing column. That's true for MySQL/InnoDB but not for PostgreSQL or SQL Server.
Root cause
The orders table had a foreign key on customer_id but no explicit index. Every INSERT and JOIN triggered a full table scan on orders to validate the foreign key constraint, because the database engine had to check for existing references when inserting or deleting parent rows.
Fix
CREATE INDEX idx_orders_customer_id ON orders (customer_id); After adding the index, order placement time dropped from 30 seconds to 50 milliseconds.
Key lesson
Foreign key constraints do NOT automatically create indexes in all databases.
Always create an explicit index on every foreign key column, especially in PostgreSQL and SQL Server.
Monitor query performance after adding foreign keys; a missing index is a silent performance killer.
Production debug guideSymptom-Action Guide for Referential Integrity Errors4 entries
Symptom · 01
INSERT fails with 'foreign key constraint fails'
→
Fix
Check if the parent record exists: SELECT * FROM parent_table WHERE pk_column = <value>. If not, insert it first.
Symptom · 02
DELETE fails with 'foreign key constraint fails'
→
Fix
Identify child records blocking the delete: SELECT * FROM child_table WHERE fk_column = <value>. Either delete or update them first, or use ON DELETE CASCADE.
Symptom · 03
JOIN query is unexpectedly slow
→
Fix
Check for missing index on the foreign key column: EXPLAIN SELECT ... Look for full table scan on the child table. Create index if missing.
Symptom · 04
UPDATE on primary key column fails
→
Fix
If ON UPDATE RESTRICT, update is blocked. Use ON UPDATE CASCADE or update child rows first. Check constraint definition: SHOW CREATE TABLE.
★ Quick Debug Cheat Sheet for Foreign Key IssuesCommon foreign key failures and immediate commands to diagnose and fix them.
INSERT fails with ERROR 1452 (23000)−
Immediate action
Stop and identify the missing parent record.
Commands
SELECT * FROM parent_table WHERE pk_column = <value>;
If not found, INSERT the parent record first, then retry the child INSERT.
Fix now
Use a transaction: BEGIN; INSERT INTO parent ...; INSERT INTO child ...; COMMIT;
DELETE fails with ERROR 1451 (23000)+
Immediate action
Find child records referencing the parent.
Commands
SELECT * FROM child_table WHERE fk_column = <value>;
If there are child records, decide: either delete them first, or temporarily disable FK check (not recommended in production).
Fix now
Set ON DELETE CASCADE if safe, or manually delete children before parent.
Slow JOIN queries on foreign key column+
Immediate action
Check execution plan for full table scan.
Commands
EXPLAIN SELECT ... FROM orders JOIN customers ON orders.customer_id = customers.customer_id;
Look for 'Seq Scan' or 'Table Scan' on orders. If found, add index: CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Fix now
Add the index and re-run the query.
UPDATE on primary key column fails with constraint violation+
Immediate action
Check foreign key constraint definition.
Commands
SHOW CREATE TABLE child_table;
If ON UPDATE RESTRICT is set, you must update child rows first or change to CASCADE.
Fix now
Use ON UPDATE CASCADE if appropriate, or update child fk values to match the new pk.
Primary Key vs Foreign Key
Feature / Aspect
Primary Key
Foreign Key
Purpose
Uniquely identifies each row in its own table
Links a row in this table to a row in another table
Where it lives
In the table it identifies
In the child/referencing table
Must be unique?
Yes — always, no exceptions
No — many rows can share the same foreign key value
Can it be NULL?
Never — NULL is forbidden
Yes, if the relationship is optional (e.g. an order with no assigned salesperson)
Can there be duplicates?
No — each value must appear exactly once
Yes — multiple orders can reference the same customer_id
Auto-generated?
Commonly yes, via AUTO_INCREMENT or SERIAL
No — you supply the value from the parent table's primary key
What happens on violation?
INSERT/UPDATE is rejected with a duplicate key error
INSERT is rejected; DELETE of parent is blocked (RESTRICT) or cascades (CASCADE)
Typical column name
table_name + _id, e.g. customer_id in customers
Same name as the parent PK, e.g. customer_id in orders
Number allowed per table
Exactly one primary key (can span multiple columns)
Many foreign keys allowed — a table can reference several other tables
Key takeaways
1
A primary key is a column whose value is unique and never NULL for every row
it's the table's permanent identity badge, ideally an auto-incrementing integer that has no real-world meaning.
2
A foreign key is a column that stores the primary key value from another table
it's the bridge that links related data across tables without duplicating information.
3
Foreign key constraints enforce referential integrity at the database level
the database itself will block you from inserting orphaned records or deleting parents that have dependent children, not just your application code.
4
Use ON DELETE RESTRICT for financial or critical data (safe default), ON DELETE CASCADE for owned child data like user messages, and ON DELETE SET NULL for optional relationships
the choice you make here has real consequences in production.
5
Always create an explicit index on every foreign key column
missing indexes are the #1 cause of slow JOINs in production databases.
Common mistakes to avoid
3 patterns
×
Using a mutable real-world value (like email) as a primary key
Symptom
UPDATE statements cascade-fail across multiple tables when a user changes their email, or you get mysterious orphaned records.
Fix
Always use a surrogate key (INT AUTO_INCREMENT or UUID) as the primary key. Add a UNIQUE constraint on the email column separately if you need uniqueness there. The primary key should be meaningless, permanent, and auto-assigned.
×
Forgetting to index the foreign key column
Symptom
JOIN queries on large tables become painfully slow as the table grows, even though the query looks correct.
Fix
Most databases (PostgreSQL, SQL Server) do NOT automatically create an index on a foreign key column — only MySQL/InnoDB does. Always create an explicit index: CREATE INDEX idx_orders_customer_id ON orders (customer_id); This turns a full table scan into a lightning-fast lookup every time you JOIN the two tables.
×
Inserting child rows before parent rows (or dropping tables in the wrong order)
Symptom
ERROR 1452 on INSERT or ERROR 1451 on DROP/DELETE even though the data looks correct.
Fix
Always insert the parent record first (e.g. insert the customer before inserting their order). When dropping tables, drop child tables before parent tables, or temporarily disable foreign key checks: SET FOREIGN_KEY_CHECKS = 0; DROP TABLE orders; DROP TABLE customers; SET FOREIGN_KEY_CHECKS = 1; Re-enable checks immediately after — never leave them off.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01JUNIOR
What is the difference between a primary key and a unique key?
Q02SENIOR
Can a table have a foreign key that references itself?
Q03SENIOR
What is referential integrity, and how do PRIMARY KEY and FOREIGN KEY co...
Q01 of 03JUNIOR
What is the difference between a primary key and a unique key?
ANSWER
Both enforce uniqueness, but a table can only have ONE primary key and it cannot contain NULLs, whereas you can have multiple UNIQUE constraints on a table and UNIQUE columns can contain NULL in most databases — though NULL behaviour varies by database engine. The primary key is also the default target for foreign key references and is typically clustered (in some database systems).
Q02 of 03SENIOR
Can a table have a foreign key that references itself?
ANSWER
Yes — this is called a self-referencing or recursive foreign key and it's used for hierarchical data like an employees table where each employee has a manager_id that points back to another employee_id in the same table.
Q03 of 03SENIOR
What is referential integrity, and how do PRIMARY KEY and FOREIGN KEY constraints enforce it?
ANSWER
Referential integrity means every foreign key value must correspond to an existing primary key value in the parent table. The FK constraint enforces this by blocking inserts of non-existent references and by controlling what happens to child rows when a parent is deleted or updated, via ON DELETE and ON UPDATE rules like RESTRICT, CASCADE, SET NULL, and SET DEFAULT. This ensures that your data references are always consistent.
01
What is the difference between a primary key and a unique key?
JUNIOR
02
Can a table have a foreign key that references itself?
SENIOR
03
What is referential integrity, and how do PRIMARY KEY and FOREIGN KEY constraints enforce it?
SENIOR
FAQ · 3 QUESTIONS
Frequently Asked Questions
01
Can a foreign key reference a column that is not a primary key?
Yes, but only if that column has a UNIQUE constraint on it. The foreign key must reference a column that is guaranteed to contain unique values — which is true of primary keys and unique-constrained columns. In practice, referencing the primary key is almost always the right choice because it's always indexed and always unique.
Was this helpful?
02
Can a table have more than one foreign key?
Absolutely. A table can have as many foreign keys as it needs. An orders table, for example, might have a customer_id foreign key referencing customers, a product_id foreign key referencing products, and a salesperson_id foreign key referencing employees — all at the same time. Each constraint is independently enforced.
Was this helpful?
03
What is the difference between a primary key and a unique key?
Both guarantee that no two rows have the same value in that column. The differences: a table can have only one primary key but multiple unique keys; a primary key column can never contain NULL, but a unique key column can (in most databases, multiple NULLs are allowed in a unique column because NULL is considered 'unknown', not a duplicate value). Primary keys are also the default target for foreign key references.