Mid-level 11 min · March 05, 2026

SQL NULL Comparison — Why = NULL Returns Zero Rows

A NULL comparison using '=' always yields unknown, causing zero rows.

N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.

Follow
Production
production tested
May 24, 2026
last updated
1,554
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer
  • SQL (Structured Query Language) is the universal language for querying and managing relational databases.
  • Core components: SELECT (columns), FROM (table), WHERE (filter), ORDER BY (sort), LIMIT (count).
  • Performance insight: Always specify columns instead of SELECT * — less data transfer and easier to optimize with indexes.
  • Production insight: A missing WHERE clause on an UPDATE or DELETE can modify or destroy all rows in a table — always test with SELECT first.
  • Biggest beginner mistake: Using = NULL instead of IS NULL — SQL silently returns no rows without error.
✦ Definition~90s read
What is Introduction to SQL?

SQL (Structured Query Language) is the standard language for managing and querying relational databases. It lets you retrieve, filter, sort, and aggregate data from tables, as well as insert, update, and delete records. The article's title highlights a classic gotcha: in SQL, = NULL returns zero rows because NULL is not a value but a marker for missing or unknown data.

Imagine a massive filing cabinet full of folders, and each folder holds a spreadsheet of information — one for customers, one for orders, one for products.

Comparisons with NULL require IS NULL or IS NOT NULL — a fundamental distinction that trips up even experienced developers. This article walks you through that concept and then builds from your first SELECT query through joins and data modification.

SQL operates on sets of rows, not procedural logic. You start with SELECT ... FROM ... WHERE to filter rows, then shape results with ORDER BY and LIMIT. Aggregate functions like COUNT, SUM, and GROUP BY let you summarize data — for example, total sales per region. INNER JOIN connects related tables (e.g., orders to customers) by matching keys.

Finally, INSERT, UPDATE, and DELETE let you modify data directly. Every major database — PostgreSQL, MySQL, SQL Server, SQLite — uses this same core syntax, though each has minor dialect differences.

Use SQL when you need to answer questions about structured data: "Which customers spent over $500 last month?" or "What's the average order value by category?" It's not the right tool for unstructured text search (use Elasticsearch) or real-time streaming (use Kafka). But for transactional systems, reporting, and analytics, SQL is the industry standard — handling billions of rows in production at companies like Uber, Airbnb, and Stripe.

Understanding NULL semantics and basic query patterns is the foundation for everything else.

Plain-English First

Imagine a massive filing cabinet full of folders, and each folder holds a spreadsheet of information — one for customers, one for orders, one for products. SQL is simply the language you use to talk to that filing cabinet: 'Hey, give me every customer who spent more than $100 last month.' The cabinet hands you exactly that list. No programming degree needed — SQL reads almost like plain English, and that's entirely by design.

SQL is the language your database speaks, and if you don't know it, you're guessing at what your data actually says. This isn't theory—it's the difference between a query that returns exactly what you need and a query that silently returns nothing because you forgot how NULL works. Skip the mental model, and you'll spend hours debugging why your WHERE clause is lying to you.

Why = NULL Returns Zero Rows

In SQL, NULL is not a value — it is a marker for the absence of a value. Because of this, the equality operator (=) cannot evaluate NULL on either side. Any comparison like column = NULL or NULL = NULL yields UNKNOWN, not TRUE, so the row is excluded from results. This is a direct consequence of SQL's three-valued logic: TRUE, FALSE, and UNKNOWN.

When you write WHERE column = NULL, the database engine does not treat NULL as a known value to match. Instead, it evaluates the expression as UNKNOWN, and since WHERE clauses only return rows where the condition is TRUE, zero rows are returned. The correct way to test for NULL is with IS NULL or IS NOT NULL. This is not a quirk — it is intentional and consistent across all SQL databases.

Use IS NULL whenever you need to filter for missing data. In real systems, this distinction matters for data integrity checks, ETL pipelines, and reporting queries. A common mistake is to write WHERE column = NULL expecting matches, which silently returns nothing — no error, no warning, just zero rows.

NULL ≠ NULL
In SQL, NULL = NULL evaluates to UNKNOWN, not TRUE. Always use IS NULL to test for missing values — never = NULL.
Production Insight
A payment reconciliation system once failed to flag unmatched transactions because the JOIN condition used t1.merchant_id = t2.merchant_id — but both sides were NULL for cash payments, so the rows were silently dropped.
The symptom: 12% of daily cash transactions were missing from the exception report, causing a $340K settlement delay.
Rule of thumb: When joining on nullable columns, explicitly handle NULLs with COALESCE or IS NULL checks — never assume = will match missing values.
Key Takeaway
NULL is not a value — it is the absence of a value, so = cannot compare it.
Use IS NULL or IS NOT NULL to test for NULL — never = NULL or != NULL.
Three-valued logic (TRUE, FALSE, UNKNOWN) governs all NULL comparisons — plan for it in every WHERE and JOIN.
SQL NULL Comparison: Why = NULL Returns Zero Rows THECODEFORGE.IO SQL NULL Comparison: Why = NULL Returns Zero Rows Illustrates the correct way to compare NULL values in SQL NULL = Unknown NULL is not a value, it's a missing/unknown marker = NULL Comparison Any comparison with NULL yields NULL (not TRUE) WHERE Clause Filters WHERE only returns rows where condition is TRUE IS NULL Operator Use IS NULL or IS NOT NULL to check for NULL ⚠ Using = NULL in WHERE returns zero rows Always use IS NULL or IS NOT NULL for NULL checks THECODEFORGE.IO
thecodeforge.io
SQL NULL Comparison: Why = NULL Returns Zero Rows
Introduction Sql

Your First SQL Query — SELECT, FROM, and WHERE Explained

SELECT is the most important SQL command you'll ever learn. It answers the question: 'Show me data.' The basic structure is almost a sentence: SELECT [what columns] FROM [which table].

The asterisk (*) is a wildcard meaning 'all columns'. Use it when exploring data. In production code, always name the columns explicitly — it's faster and far easier to read six months later.

The WHERE clause is how you filter. Without WHERE, SQL returns every single row in the table. WHERE adds a condition that each row must pass to be included in the results. Think of it as a bouncer at the door: only rows that satisfy the condition get in.

Conditions in WHERE can use these comparison operators: = (equals), != or <> (not equals), > (greater than), < (less than), >= (greater than or equal), <= (less than or equal). You can also combine multiple conditions using AND (both must be true) and OR (either can be true).

select_and_filter_books.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- ============================================================
-- Querying the bookstore database we created in the last step.
-- Run each query block one at a time to see the output.
-- ============================================================

-- Query 1: Get every column from every row in the books table.
-- Good for exploration, but avoid SELECT * in real apps.
SELECT * FROM books;

-- Query 2: Get only the title and price — we don't need everything.
SELECT title, price
FROM books;

-- Query 3: Filter with WHERE — only books cheaper than $10.00
SELECT title, price
FROM books
WHERE price < 10.00;

-- Query 4: Combine conditions with AND
-- Only books under $10 that were also published after 1980
SELECT title, price, published
FROM books
WHERE price < 10.00
  AND published > 1980;

-- Query 5: Use OR — books that are either Satire OR Dystopian genre
SELECT title, genre
FROM books
WHERE genre = 'Satire'
   OR genre = 'Dystopian';

-- Query 6: A tidier way to write the OR above, using IN
-- IN checks if a value matches any item in a list
SELECT title, genre
FROM books
WHERE genre IN ('Satire', 'Dystopian');
Output
-- Query 1 output (all rows, all columns):
-- book_id | title | author_id | genre | price | published
-- 1 | Nineteen Eighty-Four | 1 | Dystopian | 9.99 | 1949
-- 2 | Animal Farm | 1 | Satire | 7.49 | 1945
-- 3 | One Hundred Years of Solitude | 2 | Magical Realism | 12.99 | 1967
-- 4 | Americanah | 3 | Literary Fiction | 10.99 | 2013
-- 5 | Purple Hibiscus | 3 | Literary Fiction | 8.99 | 2003
-- 6 | Norwegian Wood | 4 | Literary Fiction | 11.50 | 1987
-- Query 3 output (price < 10.00):
-- title | price
-- Nineteen Eighty-Four | 9.99
-- Animal Farm | 7.49
-- Purple Hibiscus | 8.99
-- Query 4 output (price < 10.00 AND published > 1980):
-- title | price | published
-- Purple Hibiscus | 8.99 | 2003
-- Query 6 output (Satire OR Dystopian):
-- title | genre
-- Nineteen Eighty-Four | Dystopian
-- Animal Farm | Satire
Watch Out: AND vs OR Operator Precedence
SQL evaluates AND before OR, just like multiplication before addition in maths. If you write WHERE price < 10 OR genre = 'Satire' AND published > 1960, SQL reads the AND part first. Always use parentheses to make your intent explicit: WHERE price < 10 OR (genre = 'Satire' AND published > 1960). This single habit will save you hours of debugging wrong results.
Production Insight
A missing WHERE can return millions of rows from a large table.
Always use LIMIT when exploring, and never deploy SELECT * in code.
The AND/OR precedence trap is one of the most common bugs in production queries.
Key Takeaway
SELECT + FROM + WHERE = the core of every query.
Always name columns explicitly in production.
Use parentheses to control AND/OR logic.

Sorting, Limiting and Shaping Results with ORDER BY and LIMIT

Fetching data is one thing — getting it back in a useful order is another. ORDER BY sorts your results by one or more columns. ASC means ascending (A→Z, 0→9) and is the default. DESC means descending (Z→A, 9→0).

You can sort by multiple columns: ORDER BY genre ASC, price DESC sorts alphabetically by genre first, then within each genre sorts by price from highest to lowest. This is enormously useful for leaderboards, product listings, and reports.

LIMIT controls how many rows come back. Databases can hold millions of rows — you almost never want all of them at once. LIMIT 5 gives you the top five. Pair it with ORDER BY and you can answer questions like 'What are the 3 most expensive books?' in a single clean query.

ALIAS (the AS keyword) lets you rename a column in the output. This is purely cosmetic — it doesn't change the database — but it makes results far more readable, especially when column names are long or ambiguous.

sort_and_limit_books.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- ============================================================
-- Sorting, limiting and renaming output columns.
-- These techniques are used in virtually every real-world query.
-- ============================================================

-- Query 1: Sort all books by price, cheapest first (ASC is default).
SELECT title, price
FROM books
ORDER BY price ASC;

-- Query 2: Sort by price, most expensive first.
SELECT title, price
FROM books
ORDER BY price DESC;

-- Query 3: The 3 most expensive books — combine ORDER BY with LIMIT.
SELECT title, price
FROM books
ORDER BY price DESC
LIMIT 3;

-- Query 4: Sort by genre alphabetically, then by price within each genre.
-- This is multi-column sorting.
SELECT title, genre, price
FROM books
ORDER BY genre ASC, price ASC;

-- Query 5: Use AS to give columns friendlier names in the output.
-- 'title' becomes 'Book Title', 'price' becomes 'Price (USD)'.
-- Notice: multi-word aliases need to be wrapped in double quotes.
SELECT title   AS "Book Title",
       price   AS "Price (USD)",
       published AS "Year"
FROM books
ORDER BY price DESC
LIMIT 4;
Output
-- Query 1 output (cheapest first):
-- title | price
-- Animal Farm | 7.49
-- Purple Hibiscus | 8.99
-- Nineteen Eighty-Four | 9.99
-- Americanah | 10.99
-- Norwegian Wood | 11.50
-- One Hundred Years of Solitude | 12.99
-- Query 3 output (top 3 most expensive):
-- title | price
-- One Hundred Years of Solitude | 12.99
-- Norwegian Wood | 11.50
-- Americanah | 10.99
-- Query 5 output (aliased columns, top 4 by price):
-- Book Title | Price (USD) | Year
-- One Hundred Years of Solitude | 12.99 | 1967
-- Norwegian Wood | 11.50 | 1987
-- Americanah | 10.99 | 2013
-- Nineteen Eighty-Four | 9.99 | 1949
Pro Tip: Always LIMIT when exploring unknown data
If you're querying a production table for the first time and you have no idea how many rows it contains, always add LIMIT 10 to your SELECT. Running SELECT * FROM transactions on a table with 50 million rows without a LIMIT will either time out, crash your client, or make your DBA very unhappy. Explore first, then remove the limit once you know what you're working with.
Production Insight
Sorting on unindexed columns causes full table scans — slow on large tables.
Combining ORDER BY and LIMIT can enable optimised sorts.
Without LIMIT, your application might fetch millions of rows and crash the network.
Key Takeaway
ORDER BY sorts results (ASC/DEFAULT, DESC).
LIMIT restricts row count — always use it for exploration.
Aliases (AS) make output readable but don't affect the database.

Aggregate Functions — COUNT, SUM, AVG, MIN, MAX and GROUP BY

So far every query has returned individual rows. But sometimes you don't want individual records — you want a summary. How many books does each author have? What's the average price per genre? What's the most expensive book? Aggregate functions answer these questions by collapsing many rows into a single calculated value.

The five core aggregate functions: COUNT() counts rows, SUM() adds up numeric values, AVG() calculates the average, MIN() finds the smallest value, MAX() finds the largest.

Aggregate functions become truly powerful when combined with GROUP BY. GROUP BY splits your table into groups (one group per unique value in a column) and then runs the aggregate function on each group separately. The result is one summary row per group.

HAVING is like WHERE but for aggregated results. You can't use WHERE to filter on a COUNT or SUM because those values don't exist until after grouping happens. HAVING runs after the groups are formed, so it can filter on them.

aggregate_and_group_books.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
-- ============================================================
-- Aggregate functions and GROUP BY on our bookstore data.
-- This is where SQL starts to feel genuinely powerful.
-- ============================================================

-- Query 1: How many books are in the entire table?
-- COUNT(*) counts every row regardless of NULL values.
SELECT COUNT(*) AS total_books
FROM books;

-- Query 2: What is the average price of all books?
SELECT ROUND(AVG(price), 2) AS average_price
FROM books;
-- ROUND(value, 2) rounds the result to 2 decimal places.

-- Query 3: Most expensive and cheapest single book.
SELECT MAX(price) AS most_expensive,
       MIN(price) AS cheapest
FROM books;

-- Query 4: How many books does each author have?
-- GROUP BY splits the table into one group per author_id,
-- then COUNT(*) counts the rows in each group.
SELECT author_id,
       COUNT(*) AS number_of_books
FROM books
GROUP BY author_id
ORDER BY number_of_books DESC;

-- Query 5: Average price per genre.
SELECT genre,
       COUNT(*)            AS book_count,
       ROUND(AVG(price),2) AS avg_price
FROM books
GROUP BY genre
ORDER BY avg_price DESC;

-- Query 6: HAVING — only show genres that have MORE than 1 book.
-- We cannot use WHERE here because book_count doesn't exist yet
-- at the point WHERE runs. HAVING runs AFTER GROUP BY.
SELECT genre,
       COUNT(*) AS book_count
FROM books
GROUP BY genre
HAVING book_count > 1;
Output
-- Query 1 output:
-- total_books
-- 6
-- Query 2 output:
-- average_price
-- 10.16
-- Query 3 output:
-- most_expensive | cheapest
-- 12.99 | 7.49
-- Query 4 output (books per author):
-- author_id | number_of_books
-- 3 | 2 (Chimamanda — Americanah + Purple Hibiscus)
-- 1 | 2 (Orwell — 1984 + Animal Farm)
-- 2 | 1
-- 4 | 1
-- Query 5 output (avg price per genre):
-- genre | book_count | avg_price
-- Magical Realism | 1 | 12.99
-- Literary Fiction | 3 | 10.49
-- Dystopian | 1 | 9.99
-- Satire | 1 | 7.49
-- Query 6 output (genres with more than 1 book):
-- genre | book_count
-- Literary Fiction | 3
Interview Gold: WHERE vs HAVING
This is one of the most common SQL interview questions at every level. The rule is clean and memorable: WHERE filters individual rows BEFORE grouping happens. HAVING filters groups AFTER grouping happens. You literally cannot use WHERE to filter on an aggregate like COUNT(*) or AVG() — the database engine will throw an error because those values don't exist yet at the WHERE stage.
Production Insight
Aggregating without GROUP BY gives a single row — not always what you want.
Missing columns in GROUP BY cause errors in strict SQL databases.
Using WHERE instead of HAVING on aggregates silently returns wrong results.
Key Takeaway
COUNT, SUM, AVG, MIN, MAX collapse many rows into one.
GROUP BY creates groups; aggregates run per group.
WHERE filters rows before grouping; HAVING filters groups after.

Joining Tables with INNER JOIN — Connecting Data Across Tables

So far we've worked with one table. In real databases, data lives spread across multiple tables that relate to each other — customers -> orders -> products. To bring that data together in a single query, you need JOINs.

INNER JOIN returns only rows where the join condition is true in both tables. It's the most common JOIN type. The syntax: SELECT columns FROM table1 INNER JOIN table2 ON table1.column = table2.column. The ON clause specifies how the tables relate.

In our bookstore, books have an author_id that links to the authors table. To get each book's title alongside the author's full name, we join books to authors on books.author_id = authors.author_id.

If a book has an author_id that doesn't exist in authors, that book is excluded from the result. That's the 'inner' part — only matching records survive.

join_books_authors.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- ============================================================
-- Joining books and authors to show book title + author name.
-- INNER JOIN = only rows where the match exists in both tables.
-- ============================================================

-- Basic INNER JOIN: get every book with its author's full name.
SELECT books.title,
       authors.full_name
FROM books
INNER JOIN authors ON books.author_id = authors.author_id;

-- Add sorting to see them in a nice order.
SELECT books.title AS "Book Title",
       authors.full_name AS "Author"
FROM books
INNER JOIN authors ON books.author_id = authors.author_id
ORDER BY authors.full_name ASC;

-- Use table aliases to keep the query readable.
SELECT b.title,
       a.full_name
FROM books   AS b
INNER JOIN authors AS a ON b.author_id = a.author_id;
Output
-- Basic INNER JOIN output:
-- title | full_name
-- Nineteen Eighty-Four | George Orwell
-- Animal Farm | George Orwell
-- One Hundred Years of Solitude | Gabriel García Márquez
-- Americanah | Chimamanda Ngozi Adichie
-- Purple Hibiscus | Chimamanda Ngozi Adichie
-- Norwegian Wood | Haruki Murakami
Mental Model: JOIN is like a Venn diagram
  • INNER JOIN: rows that match in both tables.
  • LEFT JOIN: all rows from left table, matching from right (NULL if none).
  • RIGHT JOIN: all rows from right table, matching from left (NULL if none).
  • FULL JOIN: all rows from both tables, NULLs where no match.
Production Insight
A missing ON clause produces a CROSS JOIN — every pair of rows.
On two tables with 10k rows each, that's 100 million rows.
Always alias tables to avoid ambiguous column references.
Test JOINs with LIMIT 5 first to verify the row count is sane.
Key Takeaway
INNER JOIN returns only matching rows from both tables.
Always specify the JOIN condition in ON — never forget it.
Table aliases make queries readable and protect against ambiguous columns.
Which JOIN to use?
IfNeed only rows that exist in both tables
UseUse INNER JOIN
IfNeed all rows from the first table, with optional matches from the second
UseUse LEFT JOIN
IfNeed all rows from both tables regardless of matches
UseUse FULL OUTER JOIN (if supported)

Modifying Data — INSERT, UPDATE, and DELETE Basics

Querying data is half the story. To manage data, you need to add, change, and remove it. SQL provides three core commands for that: INSERT adds new rows, UPDATE modifies existing rows, and DELETE removes rows.

INSERT INTO table (columns) VALUES (values) adds one row at a time. You can also insert multiple rows in a single statement.

UPDATE table SET column = new_value WHERE condition changes data. The WHERE clause is critical here — if you omit it, every row in the table gets updated.

DELETE FROM table WHERE condition removes rows. Same warning: missing WHERE deletes every row.

Always run a SELECT with the same WHERE first to verify which rows will be affected before running UPDATE or DELETE.

modify_data.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- ============================================================
-- Adding, updating, and deleting data in our bookstore.
-- Run each block separately and verify with SELECT afterward.
-- ============================================================

-- 1. INSERT a new author.
INSERT INTO authors (author_id, full_name, country)
VALUES (5, 'Elena Ferrante', 'Italy');

-- 2. INSERT multiple books at once.
INSERT INTO books (book_id, title, author_id, genre, price, published)
VALUES
    (7, 'My Brilliant Friend',       5, 'Literary Fiction',  14.99, 2012),
    (8, 'The Story of a New Name',   5, 'Literary Fiction',  14.99, 2013);

-- 3. UPDATE the price of all books by a specific author (author_id = 1).
-- FIRST, preview which rows will be updated:
SELECT book_id, title, price
FROM books
WHERE author_id = 1;

-- THEN, run the UPDATE (10% price increase for George Orwell's books).
UPDATE books
SET price = price * 1.10
WHERE author_id = 1;

-- 4. DELETE a specific book by its ID.
-- Preview first:
SELECT * FROM books WHERE book_id = 8;

-- Then delete:
DELETE FROM books
WHERE book_id = 8;

-- 5. Danger: UPDATE without WHERE updates ALL rows.
-- Uncomment the next line only if you really mean it.
-- UPDATE books SET price = 0.00;
Output
-- INSERT: '2 rows affected' (for the two books)
-- After UPDATE (author_id = 1): Nineteen Eighty-Four price becomes 10.99, Animal Farm becomes 8.24.
-- After DELETE: book_id = 8 removed.
-- Always SELECT first to confirm scope.
Production Nightmare: UPDATE or DELETE without WHERE
Every DBA has a story about a junior dev running DELETE FROM users without a WHERE clause. The data is gone instantly — with no undo button (unless you have a backup or transaction). Always wrap modifications in a transaction (BEGIN; UPDATE ...; ROLLBACK; to test) and always SELECT first to preview affected rows.
Production Insight
A missing WHERE on UPDATE/DELETE is unrecoverable without backups.
Always run SELECT with the same WHERE first to count affected rows.
Use transactions: BEGIN, UPDATE, ROLLBACK to test, then COMMIT.
Schedule regular backups — modifications are irreversible without them.
Key Takeaway
INSERT adds rows, UPDATE modifies, DELETE removes.
Never run UPDATE or DELETE without a WHERE clause.
Preview with SELECT, protect with transactions, backup regularly.

Database Design: Why Your Schema Is Probably Wrong

Most developers jump straight to writing queries before they've thought about how their data actually relates. That's like building a house without a foundation. It works until it doesn't—and when it fails, it fails hard.

A relational database isn't just a bucket for data. It's a model of your business logic. Every table should represent one entity. Every column should be atomic. This is called normalization, and ignoring it is why you end up with columns like tags_csv or user_preferences_json that you're forced to parse in application code.

Here's the rule: If you can't explain what a table represents in one sentence, your schema is broken. A table called orders_items_products_users is a cry for help. Split it. Name things clearly. Define foreign keys upfront—not as an afterthought when your JOINs stop making sense.

Design for queries you'll write six months from now, not the one you're writing today. The database doesn't care about your feelings. It cares about constraints, indexes, and referential integrity.

SchemaDesignNightmare.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
// io.thecodeforge — database tutorial

-- Bad schema: everything in one table
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    product_name VARCHAR(100),
    product_price DECIMAL(10,2),
    order_date DATETIME,
    shipping_address TEXT
);

-- Good schema: normalized entities
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2)
);

CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    order_date DATETIME,
    shipping_address TEXT,
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);

CREATE TABLE order_items (
    id INT PRIMARY KEY,
    order_id INT,
    product_id INT,
    quantity INT,
    FOREIGN KEY (order_id) REFERENCES orders(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);
Output
Query successful. 0 rows affected.
Production Trap:
Don't fall for 'we'll normalize later.' Later never comes. You'll be debugging a 15-table JOIN at 2 AM because someone thought storing lists as comma-separated strings was 'faster.' It's not.
Key Takeaway
Normalize your schema first. You can always denormalize for performance, but you can't un-break a bad schema.

SQL Basics: The Mental Model That Saves Hours

SQL is not Python. It's not JavaScript. Stop trying to think procedurally—SQL is declarative. You tell it WHAT you want, not HOW to get it. This trips up everyone who comes from imperative languages.

Here's the mental model: Imagine SQL as a pipeline. Data flows through each clause in a specific order. Most developers think SELECT runs first. It doesn't. The actual execution order is:

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

This matters because you can't filter on an alias you defined in SELECT within WHERE—the WHERE clause hasn't seen it yet. I've seen junior devs spend an hour debugging this exact issue.

Another fundamental: SQL works on sets, not loops. When you write SELECT * FROM users WHERE id IN (1,2,3), the database is doing set operations, not iterating through a list. This is why SQL is faster at data operations than application code—it's built for batch processing.

Understand this mental model, and you'll stop writing queries that work accidentally and start writing queries that work by design.

ExecutionOrder.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
// io.thecodeforge — database tutorial

-- This WILL fail because WHERE can't see the alias
SELECT 
    u.id,
    CONCAT(u.first_name, ' ', u.last_name) AS full_name
FROM users u
WHERE full_name LIKE '%Smith%';

-- This WORKS: use HAVING or subquery
SELECT 
    u.id,
    CONCAT(u.first_name, ' ', u.last_name) AS full_name
FROM users u
HAVING full_name LIKE '%Smith%';

-- But best practice: filter before projection
SELECT 
    u.id,
    CONCAT(u.first_name, ' ', u.last_name) AS full_name
FROM users u
WHERE u.last_name = 'Smith';
Output
ERROR 1054: Unknown column 'full_name' in 'where clause'
Second query: 42 rows returned
Third query: 42 rows returned (0.002s)
Senior Shortcut:
When debugging a slow query, don't look at SELECT first. Look at FROM and WHERE. That's where you lose performance. SELECT is just formatting.
Key Takeaway
Memorize the execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. It will save you hours of debugging.

DBMS vs. SQL: Why You Need Both to Survive

Newcomers confuse SQL with the database itself. They're not the same thing. SQL is the language. DBMS (Database Management System) is the engine that runs it. You can write perfect SQL and still get destroyed by the DBMS you're running it on.

MySQL, PostgreSQL, SQL Server, SQLite—they all speak SQL, but each has its own quirks. PostgreSQL handles JSON natively. MySQL has GROUP BY behavior that'll bite you if you're not careful. SQLite doesn't support RIGHT JOINs at all. Knowing the difference keeps you from shipping code that works locally but explodes in production.

A DBMS does four things: store data, retrieve data, manage concurrent access, and ensure data integrity. That last one is why transactions exist. If you're not using transactions for operations that modify multiple tables, you're asking for data corruption.

Here's what matters: Pick one DBMS and learn it deeply before jumping to others. PostgreSQL is the industry standard for production systems. SQLite is fine for local dev. MySQL is popular but has enough foot-guns to fill a quarry. Know your toolkit.

DBMSQuirks.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// io.thecodeforge — database tutorial

-- PostgreSQL: returns full_name and count
SELECT 
    full_name, 
    COUNT(*)
FROM users
GROUP BY full_name;

-- MySQL: same query might run but returns garbage
-- MySQL allows SELECT columns not in GROUP BY
-- It picks arbitrary values. This is dangerous.

-- Safe MySQL version:
SELECT 
    full_name, 
    COUNT(*) AS cnt
FROM users
GROUP BY full_name
ORDER BY cnt DESC;
Output
PostgreSQL: Returns grouped results with only explicit columns.
MySQL: Returns grouped results with arbitrary values for non-aggregated columns (deprecated in 8.0+).
Production Trap:
MySQL's loose GROUP BY behavior has caused production bugs where seemingly correct queries returned wrong data. Use ONLY_FULL_GROUP_BY mode to prevent this. Or just use PostgreSQL.
Key Takeaway
SQL is the lingua franca, but your DBMS is the local dialect. Know which one you're talking to before you start writing queries.

Subqueries: When One Query Isn't Enough

You've mastered SELECT, JOIN, and GROUP BY. But real-world data doesn't always cooperate. You need to filter by aggregated results, compare against averages, or find rows that don't exist in another table. That's where subqueries — queries inside queries — save your ass.

A subquery can live in WHERE, FROM, or even SELECT. It runs first, producing a value or a set of rows. The outer query uses that result like a variable. This isn't just theory: you'll use it to find customers who spent above average, products never ordered, or employees in the bottom quartile.

Don't nest too deep. Two levels is usually enough. Three is a code smell. If you need four, you're not breaking down the problem properly. Subqueries are surgical tools, not a substitute for proper schema design.

AboveAverageOrders.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
// io.thecodeforge — database tutorial

-- Get customers who spent more than average
SELECT first_name, last_name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING SUM(total) > (
        SELECT AVG(total)
        FROM orders
    )
);
Output
first_name | last_name
-----------+----------
Alice | Chen
Bob | Torres
Production Trap:
Correlated subqueries (subquery references outer query) can tank performance. They run once per row. Rewrite them as JOINs or window functions when tables exceed 10k rows.
Key Takeaway
Subqueries run first. Outer queries use their results. Keep it shallow — two levels max.

Window Functions: Rank, Lag, and Moving Averages Without Self-Joins

You need a running total. Or to rank salespeople by region. Or find the difference between each month's revenue and the previous month. You could write a self-join with GROUP BY — ugly, slow, and hard to read. Or you could use window functions, the unsung heroes of analytical SQL.

Window functions compute across a set of rows related to the current row, without collapsing them into one result. OVER() defines the window. PARTITION BY splits into groups. ORDER BY within the window sets the sequence. ROW_NUMBER(), RANK(), LAG(), LEAD(), SUM() OVER() — these are your new tools.

Every senior dev uses window functions for reporting, deduplication, and time-series analysis. They're faster than subqueries and cleaner than self-joins. Learn them once, use them daily.

MonthlyRevenueChange.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// io.thecodeforge — database tutorial

-- Month-over-month revenue change per product
SELECT 
    product_id,
    month,
    revenue,
    LAG(revenue) OVER (
        PARTITION BY product_id 
        ORDER BY month
    ) AS prev_month_revenue,
    revenue - LAG(revenue) OVER (
        PARTITION BY product_id 
        ORDER BY month
    ) AS change
FROM product_monthly_revenue;
Output
product_id | month | revenue | prev_month_revenue | change
-----------+----------+---------+-------------------+--------
101 | 2024-01 | 5000 | NULL | NULL
101 | 2024-02 | 6200 | 5000 | 1200
Senior Shortcut:
Use ROW_NUMBER() with PARTITION BY to deduplicate rows — assign 1 to the first occurrence, then filter WHERE row_num = 1. No GROUP BY needed.
Key Takeaway
Window functions compute across related rows without collapsing them. Partition to group, order to sequence, then aggregate.

Why SQL Optimization Matters Before Your Query Runs

Slow queries don't fail—they just waste seconds that compound into hours. SQL optimization is the practice of reducing query execution time and resource consumption. The why: every unoptimized SELECT or JOIN scans more rows than necessary, locking tables, consuming memory, and frustrating users. The how starts with understanding the query execution plan—EXPLAIN ANALYZE reveals whether your database is doing full table scans when it could use an index. Indexes are the single most impactful tool: a B-tree index on a WHERE column can turn a 10-second query into 10 milliseconds. But indexes aren't free—they slow down writes. Other techniques include avoiding SELECT *, using EXISTS instead of IN for subqueries, filtering early with WHERE before JOINs, and choosing the correct join type. The result: faster APIs, lower hosting costs, and happier DBAs.

Index_Optimization.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
// io.thecodeforge — database tutorial

// Slow: full table scan on 1M rows
SELECT * FROM orders WHERE status = 'pending';

// Fast: B-tree index on status
CREATE INDEX idx_orders_status ON orders(status);

// Re-run same query — now uses index scan
EXPLAIN ANALYZE
SELECT id, customer_id, total
FROM orders
WHERE status = 'pending';
Output
Index Scan using idx_orders_status on orders (cost=0.42..124.30 rows=850 width=18) (actual time=0.015..2.310 rows=850 loops=1)
Production Trap:
Adding an index on a low-cardinality column like 'is_deleted' (only two values) is useless. The database will still scan half the table.
Key Takeaway
Always check the execution plan before and after indexing—only add indexes that measurably reduce scan rows.

The Most Dreaded Topics: NULLs, JOIN Confusion, and Subquery Bloat

Three topics make experienced developers sweat: NULL behavior, multi-table JOINs, and nested subqueries. NULL is not a value—it's the absence of a value. Comparing anything to NULL with = returns NULL, not TRUE or FALSE, which is why WHERE column = NULL gives zero rows. The fix: use IS NULL or IS DISTINCT FROM. JOIN confusion happens because developers forget LEFT JOIN keeps all rows from the left table, while INNER JOIN drops non-matching rows. Mixing OUTER and INNER joins in the same query is the top cause of unexpected row loss. Subquery bloat occurs when developers wrap multiple subqueries inside SELECT instead of using JOINs or CTEs. A subquery inside a WHERE clause runs once per row—this is a performance disaster. Use EXISTS for correlated subqueries or rewrite with JOIN. The mental model: every query is a set operation—if you can't picture the Venn diagram, you'll get wrong results.

Null_Join_Trap.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// io.thecodeforge — database tutorial

-- WRONG: returns zero rows for missing middle_name
SELECT id, name FROM users WHERE middle_name = NULL;

-- CORRECT: check for absence
SELECT id, name FROM users WHERE middle_name IS NULL;

-- JOIN trap: accidental cartesian product
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

-- Subquery bloat (runs N+1 times)
-- Rewrite as JOIN or use EXISTS
Output
id | name
---|-------
3 | Alice -- only rows with null middle_name
Production Trap:
Using LEFT JOIN with a WHERE condition on the right table (WHERE o.total > 100) silently converts it to an INNER JOIN and drops unmatched left rows.
Key Takeaway
Always treat NULL with IS NULL/IS NOT NULL, never with =, and always confirm JOIN type with a row count sanity check.

Course Syllabus: From Zero to Production-Ready SQL

This course is structured to build your mental model of SQL from the ground up, then rapidly push you into practical, complex querying. The syllabus is divided into four parts. Part One covers the foundational mental model: how a database engine interprets SELECT, WHERE, and FROM clauses differently than you read them. We then explore schema design — why most beginners (and many seniors) end up with conflicting, redundant tables. Part Two dives into aggregate functions, GROUP BY behavior, and the silent pitfalls of NULLs in calculations. Part Three is the core of real-world SQL: joining tables with INNER and LEFT JOINs, subqueries as derived tables, and window functions that eliminate fragile self-joins. Part Four focuses on data modification (INSERT, UPDATE, DELETE) with transactional safety, and finally SQL optimization — reading execution plans before your query hits production. Each module ends with a short quiz, and the final project is a multi-join warehouse report.

course_syllabus_check.sqlSQL
1
2
3
4
5
6
// io.thecodeforge — database tutorial
-- Quick check: list modules for review
SELECT module_id, title, duration_min
FROM course_syllabus
WHERE course_name = 'introduction_sql'
ORDER BY module_id;
Output
module_id | title | duration_min
1 | Mental Model & Schema Design | 45
2 | Aggregates & NULLs | 50
3 | Joins, Subqueries, Windows | 70
4 | Mutation, Transactions, Optimz | 55
Production Trap:
Skipping the mental model and jumping straight to JOIN syntax leads to confusion about row counts and duplicate rows. I've seen production queries return 10x expected rows because developers didn't understand JOIN order.
Key Takeaway
Master the theory before you write another JOIN.

Projects, Certification & Paid Features

This course includes two capstone projects: first, you'll design and populate a normalized inventory database for a fictitious e-commerce store — including customers, orders, products, and a shared many-to-many order_items table. You'll write queries that answer real business questions: 'Which products have the highest average order value?' and 'Which customers have not ordered in 90 days?' The second project is a performance analysis: you're given a slow production query (a 6-table JOIN with subqueries) and must rewrite it using window functions and indexing hints to cut execution time by 80%. Upon completing both projects and passing the final assessment (80%+ on a 20-question mixed quiz), you'll earn a certificate of completion from TheCodeForge, verifiable via a unique link. Current course rating: 4.7/5 from 1,200+ reviews. A paid plan unlocks downloadable PDF cheatsheets, a private Slack community for live query reviews, and early access to the 'Advanced Indexing & Query Tuning' module. Free tier includes all lessons and community forum access.

certificate_eligibility.sqlSQL
1
2
3
4
5
6
7
8
// io.thecodeforge — database tutorial
SELECT student_id
FROM course_progress
WHERE course_id = 'intro_sql'
  AND project_1_passed = 1
  AND project_2_passed = 1
  AND final_quiz_score >= 80
  AND plan_type IN ('free', 'paid');
Output
student_id
2041
3092
4783
Production Trap:
Projects are not toy exercises. They mimic real edge cases: duplicate customer emails, missing foreign keys, and timezone mismatches. Skipping them means you'll hit these bugs on Monday morning.
Key Takeaway
Finish both projects to earn the certificate and fix real DB pitfalls.
● Production incidentPOST-MORTEMseverity: high

The Silent NULL – How a Missing IS NULL Filtered Out Half Your Customers

Symptom
SELECT COUNT(*) FROM customers WHERE country = NULL returned 0, but the country column clearly had NULL values.
Assumption
The developer assumed that = works with NULL like any other value, and that NULL means 'empty'.
Root cause
In SQL, NULL represents 'unknown'. Any comparison with =, !=, <, > on NULL yields UNKNOWN, which is not TRUE, so the row is excluded. WHERE country = NULL is always false for every row.
Fix
Replace WHERE country = NULL with WHERE country IS NULL, or for non-null check use WHERE country IS NOT NULL.
Key lesson
  • NULL is not a value — it's the absence of a value. Never use = or != to compare against NULL.
  • Always use IS NULL or IS NOT NULL for NULL checks.
  • When debugging, first check if NULLs exist: SELECT COUNT(*) FROM table WHERE column IS NULL.
  • Use COALESCE or ISNULL to provide defaults for NULL values in queries.
Production debug guideCommon SQL query failures and the exact actions engineers take to resolve them5 entries
Symptom · 01
Query returns zero rows when you expect data
Fix
Check WHERE clause logic: are you using = NULL? Are you comparing strings that differ in case or whitespace? Are the column names correct? Run SELECT * LIMIT 5 to see raw data.
Symptom · 02
Query returns too many rows (cartesian product)
Fix
If using JOIN, check that you have a proper JOIN condition. Missing ON clause or cross join multiplies rows. Use SELECT COUNT(*) on each table individually.
Symptom · 03
Error: 'column must appear in GROUP BY clause or be used in an aggregate function'
Fix
Add the unaggregated column to the GROUP BY clause. For PostgreSQL, this is strict. For MySQL with ONLY_FULL_GROUP_BY disabled, it may run but give wrong results.
Symptom · 04
Query runs extremely slowly on large tables
Fix
Run EXPLAIN (or EXPLAIN ANALYZE) to see query plan. Look for full table scans. Ensure indexes exist on columns used in WHERE, JOIN, and ORDER BY. Consider adding LIMIT.
Symptom · 05
Subquery returns more than one row when you expected one
Fix
Check if the subquery can return multiple rows. Use LIMIT 1, or use aggregation (MAX, MIN), or restructure with JOIN and DISTINCT.
★ Quick SQL Debugging Cheat SheetThree commands to diagnose any SQL query problem fast
Unexpected result count (too many or zero)
Immediate action
Count rows in each involved table separately to isolate the issue
Commands
SELECT COUNT(*) FROM table_name;
SELECT * FROM table_name LIMIT 5;
Fix now
Check WHERE clause conditions and ensure correct NULL handling.
Syntax error or ambiguous column reference+
Immediate action
Check column names and table aliases, and verify quotes are correct
Commands
SHOW COLUMNS FROM table_name; (or \d table_name in psql)
SELECT column FROM table WHERE x = 'value'; (simplify to debug)
Fix now
Alias columns explicitly and use proper quote types.
Slow query performance+
Immediate action
Obtain the query execution plan to find bottlenecks
Commands
EXPLAIN SELECT ...; (or EXPLAIN ANALYZE)
SHOW INDEX FROM table_name;
Fix now
Add index on columns used in WHERE and JOIN, avoid SELECT *, use LIMIT.
FeatureWHEREHAVING
What it filtersIndividual rowsGroups of rows (post-aggregation)
When it runsBefore GROUP BYAfter GROUP BY
Works with aggregates?No — causes an errorYes — designed for it
Example use caseWHERE price > 10HAVING COUNT(*) > 2
Required clauseNo — optionalNo — only needed with GROUP BY
Can replace the other?NoNo — they serve different stages

Key takeaways

1
SQL reads like English on purpose
SELECT [columns] FROM [table] WHERE [condition] is the skeleton of almost every query you'll ever write.
2
NULL is not a value
it means 'unknown'. Never compare it with = or !=. Always use IS NULL or IS NOT NULL, or you'll silently filter out data that should be there.
3
WHERE filters rows before grouping; HAVING filters groups after grouping
confusing the two is one of the most common SQL errors at every experience level.
4
Always add LIMIT when exploring an unfamiliar table
running an unbounded SELECT * on a production table with millions of rows can lock up your client, slow the database, and make you very unpopular very fast.
5
INNER JOIN returns only matching rows across tables
test with SELECT and LIMIT before running on production.
6
UPDATE and DELETE without WHERE affect all rows. Always preview with SELECT first, and use transactions to allow rollback.

Common mistakes to avoid

3 patterns
×

Confusing single and double quotes

Symptom
Syntax error like 'column Satire does not exist' or unexpected results when quoting identifiers or strings.
Fix
Use single quotes for string literals: WHERE genre = 'Satire'. Use double quotes for column/table names that need quoting (e.g., if they contain spaces): SELECT "Book Title".
×

Using = NULL instead of IS NULL

Symptom
Query returns zero rows even though NULL values exist, with no error message.
Fix
Always use IS NULL for NULL checks and IS NOT NULL for non-NULL checks. Never use = NULL or != NULL.
×

Non-aggregated column in SELECT without including it in GROUP BY

Symptom
Error: 'column must appear in the GROUP BY clause or be used in an aggregate function' (PostgreSQL) or arbitrary/incorrect results (MySQL with sql_mode off).
Fix
Include the column in the GROUP BY clause or wrap it in an aggregate function like MAX(), MIN(), or AVG().
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
What's the difference between WHERE and HAVING in SQL, and can you give ...
Q02JUNIOR
If I run SELECT * FROM users WHERE last_login = NULL and get zero rows b...
Q03SENIOR
What does GROUP BY actually do to the rows in a table, and why does ever...
Q01 of 03SENIOR

What's the difference between WHERE and HAVING in SQL, and can you give a concrete example of when you'd use each?

ANSWER
WHERE filters individual rows before any grouping occurs. HAVING filters groups after GROUP BY is applied. For example: to find authors with more than 2 books, you cannot use WHERE COUNT() > 2 because COUNT is an aggregate. You must use GROUP BY author_id HAVING COUNT() > 2. Conversely, to find books with price > 10, you use WHERE price > 10 (no GROUP BY).
FAQ · 3 QUESTIONS

Frequently Asked Questions

01
Do I need to install anything to start learning SQL?
02
What's the difference between SQL and MySQL (or PostgreSQL)?
03
Is SQL still worth learning in 2025 with NoSQL and AI tools around?
N
Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.

Follow
Verified
production tested
May 24, 2026
last updated
1,554
articles · all by Naren
🔥

That's SQL Basics. Mark it forged?

11 min read · try the examples if you haven't

1 / 16 · SQL Basics
Next
SQL CREATE TABLE and Data Types