Beginner 11 min · March 05, 2026

Introduction to SQL

SQL NULL Comparison — Why = NULL Returns Zero Rows

A NULL comparison using '=' always yields unknown, causing zero rows.

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.

✓ Production

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

Before you start⏱ 20 min

✓Basic programming fundamentals
✓A computer with internet access
✓Willingness to follow along with examples

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

SQL (Structured Query Language) is the universal language for querying and managing relational databases.
Core components: SELECT (columns), FROM (table), WHERE (filter), ORDER BY (sort), LIMIT (count).
Performance insight: Always specify columns instead of SELECT * — less data transfer and easier to optimize with indexes.
Production insight: A missing WHERE clause on an UPDATE or DELETE can modify or destroy all rows in a table — always test with SELECT first.
Biggest beginner mistake: Using = NULL instead of IS NULL — SQL silently returns no rows without error.

✦ Definition~90s read

What is Introduction to SQL?

SQL (Structured Query Language) is the standard language for managing and querying relational databases. It lets you retrieve, filter, sort, and aggregate data from tables, as well as insert, update, and delete records. The article's title highlights a classic gotcha: in SQL, = NULL returns zero rows because NULL is not a value but a marker for missing or unknown data.

★

Imagine a massive filing cabinet full of folders, and each folder holds a spreadsheet of information — one for customers, one for orders, one for products.

Comparisons with NULL require IS NULL or IS NOT NULL — a fundamental distinction that trips up even experienced developers. This article walks you through that concept and then builds from your first SELECT query through joins and data modification.

SQL operates on sets of rows, not procedural logic. You start with SELECT ... FROM ... WHERE to filter rows, then shape results with ORDER BY and LIMIT. Aggregate functions like COUNT, SUM, and GROUP BY let you summarize data — for example, total sales per region. INNER JOIN connects related tables (e.g., orders to customers) by matching keys.

Finally, INSERT, UPDATE, and DELETE let you modify data directly. Every major database — PostgreSQL, MySQL, SQL Server, SQLite — uses this same core syntax, though each has minor dialect differences.

Use SQL when you need to answer questions about structured data: "Which customers spent over $500 last month?" or "What's the average order value by category?" It's not the right tool for unstructured text search (use Elasticsearch) or real-time streaming (use Kafka). But for transactional systems, reporting, and analytics, SQL is the industry standard — handling billions of rows in production at companies like Uber, Airbnb, and Stripe.

Understanding NULL semantics and basic query patterns is the foundation for everything else.

Plain-English First

Imagine a massive filing cabinet full of folders, and each folder holds a spreadsheet of information — one for customers, one for orders, one for products. SQL is simply the language you use to talk to that filing cabinet: 'Hey, give me every customer who spent more than $100 last month.' The cabinet hands you exactly that list. No programming degree needed — SQL reads almost like plain English, and that's entirely by design.

SQL is the language your database speaks, and if you don't know it, you're guessing at what your data actually says. This isn't theory—it's the difference between a query that returns exactly what you need and a query that silently returns nothing because you forgot how NULL works. Skip the mental model, and you'll spend hours debugging why your WHERE clause is lying to you.

Why = NULL Returns Zero Rows

In SQL, NULL is not a value — it is a marker for the absence of a value. Because of this, the equality operator (=) cannot evaluate NULL on either side. Any comparison like column = NULL or NULL = NULL yields UNKNOWN, not TRUE, so the row is excluded from results. This is a direct consequence of SQL's three-valued logic: TRUE, FALSE, and UNKNOWN.

When you write WHERE column = NULL, the database engine does not treat NULL as a known value to match. Instead, it evaluates the expression as UNKNOWN, and since WHERE clauses only return rows where the condition is TRUE, zero rows are returned. The correct way to test for NULL is with IS NULL or IS NOT NULL. This is not a quirk — it is intentional and consistent across all SQL databases.

Use IS NULL whenever you need to filter for missing data. In real systems, this distinction matters for data integrity checks, ETL pipelines, and reporting queries. A common mistake is to write WHERE column = NULL expecting matches, which silently returns nothing — no error, no warning, just zero rows.

⚠ NULL ≠ NULL

In SQL, NULL = NULL evaluates to UNKNOWN, not TRUE. Always use IS NULL to test for missing values — never = NULL.

📊 Production Insight

A payment reconciliation system once failed to flag unmatched transactions because the JOIN condition used t1.merchant_id = t2.merchant_id — but both sides were NULL for cash payments, so the rows were silently dropped.

The symptom: 12% of daily cash transactions were missing from the exception report, causing a $340K settlement delay.

Rule of thumb: When joining on nullable columns, explicitly handle NULLs with COALESCE or IS NULL checks — never assume = will match missing values.

🎯 Key Takeaway

NULL is not a value — it is the absence of a value, so = cannot compare it.

Use IS NULL or IS NOT NULL to test for NULL — never = NULL or != NULL.

Three-valued logic (TRUE, FALSE, UNKNOWN) governs all NULL comparisons — plan for it in every WHERE and JOIN.

thecodeforge.io

Introduction Sql

Your First SQL Query — SELECT, FROM, and WHERE Explained

SELECT is the most important SQL command you'll ever learn. It answers the question: 'Show me data.' The basic structure is almost a sentence: SELECT [what columns] FROM [which table].

The asterisk (*) is a wildcard meaning 'all columns'. Use it when exploring data. In production code, always name the columns explicitly — it's faster and far easier to read six months later.

The WHERE clause is how you filter. Without WHERE, SQL returns every single row in the table. WHERE adds a condition that each row must pass to be included in the results. Think of it as a bouncer at the door: only rows that satisfy the condition get in.

Conditions in WHERE can use these comparison operators: = (equals), != or <><> (not equals), > (greater than), < (less than), >= (greater than or equal), <= (less than or equal). You can also combine multiple conditions using AND (both must be true) and OR (either can be true).

select_and_filter_books.sqlSQL

-- ============================================================
-- Querying the bookstore database we created in the last step.
-- Run each query block one at a time to see the output.
-- ============================================================

-- Query 1: Get every column from every row in the books table.
-- Good for exploration, but avoid SELECT * in real apps.
SELECT * FROM books;

-- Query 2: Get only the title and price — we don't need everything.
SELECT title, price
FROM books;

-- Query 3: Filter with WHERE — only books cheaper than $10.00
SELECT title, price
FROM books
WHERE price < 10.00;

-- Query 4: Combine conditions with AND
-- Only books under $10 that were also published after 1980
SELECT title, price, published
FROM books
WHERE price < 10.00
  AND published > 1980;

-- Query 5: Use OR — books that are either Satire OR Dystopian genre
SELECT title, genre
FROM books
WHERE genre = 'Satire'
   OR genre = 'Dystopian';

-- Query 6: A tidier way to write the OR above, using IN
-- IN checks if a value matches any item in a list
SELECT title, genre
FROM books
WHERE genre IN ('Satire', 'Dystopian');

Output

-- Query 1 output (all rows, all columns):

-- 1 | Nineteen Eighty-Four | 1 | Dystopian | 9.99 | 1949

-- 2 | Animal Farm | 1 | Satire | 7.49 | 1945

-- 3 | One Hundred Years of Solitude | 2 | Magical Realism | 12.99 | 1967

-- 4 | Americanah | 3 | Literary Fiction | 10.99 | 2013

-- 5 | Purple Hibiscus | 3 | Literary Fiction | 8.99 | 2003

-- 6 | Norwegian Wood | 4 | Literary Fiction | 11.50 | 1987

-- Query 3 output (price < 10.00):

-- title | price

-- Nineteen Eighty-Four | 9.99

-- Animal Farm | 7.49

-- Purple Hibiscus | 8.99

-- Query 4 output (price < 10.00 AND published > 1980):

-- title | price | published

-- Purple Hibiscus | 8.99 | 2003

-- Query 6 output (Satire OR Dystopian):

-- title | genre

-- Nineteen Eighty-Four | Dystopian

-- Animal Farm | Satire

⚠ Watch Out: AND vs OR Operator Precedence

SQL evaluates AND before OR, just like multiplication before addition in maths. If you write WHERE price < 10 OR genre = 'Satire' AND published > 1960, SQL reads the AND part first. Always use parentheses to make your intent explicit: WHERE price < 10 OR (genre = 'Satire' AND published > 1960). This single habit will save you hours of debugging wrong results.

📊 Production Insight

A missing WHERE can return millions of rows from a large table.

Always use LIMIT when exploring, and never deploy SELECT * in code.

The AND/OR precedence trap is one of the most common bugs in production queries.

🎯 Key Takeaway

SELECT + FROM + WHERE = the core of every query.

Always name columns explicitly in production.

Use parentheses to control AND/OR logic.

Sorting, Limiting and Shaping Results with ORDER BY and LIMIT

Fetching data is one thing — getting it back in a useful order is another. ORDER BY sorts your results by one or more columns. ASC means ascending (A→Z, 0→9) and is the default. DESC means descending (Z→A, 9→0).

You can sort by multiple columns: ORDER BY genre ASC, price DESC sorts alphabetically by genre first, then within each genre sorts by price from highest to lowest. This is enormously useful for leaderboards, product listings, and reports.

LIMIT controls how many rows come back. Databases can hold millions of rows — you almost never want all of them at once. LIMIT 5 gives you the top five. Pair it with ORDER BY and you can answer questions like 'What are the 3 most expensive books?' in a single clean query.

ALIAS (the AS keyword) lets you rename a column in the output. This is purely cosmetic — it doesn't change the database — but it makes results far more readable, especially when column names are long or ambiguous.

sort_and_limit_books.sqlSQL

-- ============================================================
-- Sorting, limiting and renaming output columns.
-- These techniques are used in virtually every real-world query.
-- ============================================================

-- Query 1: Sort all books by price, cheapest first (ASC is default).
SELECT title, price
FROM books
ORDER BY price ASC;

-- Query 2: Sort by price, most expensive first.
SELECT title, price
FROM books
ORDER BY price DESC;

-- Query 3: The 3 most expensive books — combine ORDER BY with LIMIT.
SELECT title, price
FROM books
ORDER BY price DESC
LIMIT 3;

-- Query 4: Sort by genre alphabetically, then by price within each genre.
-- This is multi-column sorting.
SELECT title, genre, price
FROM books
ORDER BY genre ASC, price ASC;

-- Query 5: Use AS to give columns friendlier names in the output.
-- 'title' becomes 'Book Title', 'price' becomes 'Price (USD)'.
-- Notice: multi-word aliases need to be wrapped in double quotes.
SELECT title   AS "Book Title",
       price   AS "Price (USD)",
       published AS "Year"
FROM books
ORDER BY price DESC
LIMIT 4;

Output

-- Query 1 output (cheapest first):

-- title | price

-- Animal Farm | 7.49

-- Purple Hibiscus | 8.99

-- Nineteen Eighty-Four | 9.99

-- Americanah | 10.99

-- Norwegian Wood | 11.50

-- One Hundred Years of Solitude | 12.99

-- Query 3 output (top 3 most expensive):

-- title | price

-- One Hundred Years of Solitude | 12.99

-- Norwegian Wood | 11.50

-- Americanah | 10.99

-- Query 5 output (aliased columns, top 4 by price):

-- Book Title | Price (USD) | Year

-- One Hundred Years of Solitude | 12.99 | 1967

-- Norwegian Wood | 11.50 | 1987

-- Americanah | 10.99 | 2013

-- Nineteen Eighty-Four | 9.99 | 1949

💡Pro Tip: Always LIMIT when exploring unknown data

If you're querying a production table for the first time and you have no idea how many rows it contains, always add LIMIT 10 to your SELECT. Running SELECT * FROM transactions on a table with 50 million rows without a LIMIT will either time out, crash your client, or make your DBA very unhappy. Explore first, then remove the limit once you know what you're working with.

📊 Production Insight

Sorting on unindexed columns causes full table scans — slow on large tables.

Combining ORDER BY and LIMIT can enable optimised sorts.

Without LIMIT, your application might fetch millions of rows and crash the network.

🎯 Key Takeaway

ORDER BY sorts results (ASC/DEFAULT, DESC).

LIMIT restricts row count — always use it for exploration.

Aliases (AS) make output readable but don't affect the database.

thecodeforge.io

Introduction Sql

Aggregate Functions — COUNT, SUM, AVG, MIN, MAX and GROUP BY

So far every query has returned individual rows. But sometimes you don't want individual records — you want a summary. How many books does each author have? What's the average price per genre? What's the most expensive book? Aggregate functions answer these questions by collapsing many rows into a single calculated value.

The five core aggregate functions: COUNT() counts rows, SUM() adds up numeric values, AVG() calculates the average, MIN() finds the smallest value, MAX() finds the largest.

Aggregate functions become truly powerful when combined with GROUP BY. GROUP BY splits your table into groups (one group per unique value in a column) and then runs the aggregate function on each group separately. The result is one summary row per group.

HAVING is like WHERE but for aggregated results. You can't use WHERE to filter on a COUNT or SUM because those values don't exist until after grouping happens. HAVING runs after the groups are formed, so it can filter on them.

aggregate_and_group_books.sqlSQL

-- ============================================================
-- Aggregate functions and GROUP BY on our bookstore data.
-- This is where SQL starts to feel genuinely powerful.
-- ============================================================

-- Query 1: How many books are in the entire table?
-- COUNT(*) counts every row regardless of NULL values.
SELECT COUNT(*) AS total_books
FROM books;

-- Query 2: What is the average price of all books?
SELECT ROUND(AVG(price), 2) AS average_price
FROM books;
-- ROUND(value, 2) rounds the result to 2 decimal places.

-- Query 3: Most expensive and cheapest single book.
SELECT MAX(price) AS most_expensive,
       MIN(price) AS cheapest
FROM books;

-- Query 4: How many books does each author have?
-- GROUP BY splits the table into one group per author_id,
-- then COUNT(*) counts the rows in each group.
SELECT author_id,
       COUNT(*) AS number_of_books
FROM books
GROUP BY author_id
ORDER BY number_of_books DESC;

-- Query 5: Average price per genre.
SELECT genre,
       COUNT(*)            AS book_count,
       ROUND(AVG(price),2) AS avg_price
FROM books
GROUP BY genre
ORDER BY avg_price DESC;

-- Query 6: HAVING — only show genres that have MORE than 1 book.
-- We cannot use WHERE here because book_count doesn't exist yet
-- at the point WHERE runs. HAVING runs AFTER GROUP BY.
SELECT genre,
       COUNT(*) AS book_count
FROM books
GROUP BY genre
HAVING book_count > 1;

Output

-- Query 1 output:

-- total_books

-- 6

-- Query 2 output:

-- average_price

-- 10.16

-- Query 3 output:

-- most_expensive | cheapest

-- 12.99 | 7.49

-- Query 4 output (books per author):

-- author_id | number_of_books

-- 3 | 2 (Chimamanda — Americanah + Purple Hibiscus)

-- 1 | 2 (Orwell — 1984 + Animal Farm)

-- 2 | 1

-- 4 | 1

-- Query 5 output (avg price per genre):

-- genre | book_count | avg_price

-- Magical Realism | 1 | 12.99

-- Literary Fiction | 3 | 10.49

-- Dystopian | 1 | 9.99

-- Satire | 1 | 7.49

-- Query 6 output (genres with more than 1 book):

-- genre | book_count

-- Literary Fiction | 3

🔥Interview Gold: WHERE vs HAVING

This is one of the most common SQL interview questions at every level. The rule is clean and memorable: WHERE filters individual rows BEFORE grouping happens. HAVING filters groups AFTER grouping happens. You literally cannot use WHERE to filter on an aggregate like COUNT(*) or AVG() — the database engine will throw an error because those values don't exist yet at the WHERE stage.

📊 Production Insight

Aggregating without GROUP BY gives a single row — not always what you want.

Missing columns in GROUP BY cause errors in strict SQL databases.

Using WHERE instead of HAVING on aggregates silently returns wrong results.

🎯 Key Takeaway

COUNT, SUM, AVG, MIN, MAX collapse many rows into one.

GROUP BY creates groups; aggregates run per group.

WHERE filters rows before grouping; HAVING filters groups after.

Joining Tables with INNER JOIN — Connecting Data Across Tables

So far we've worked with one table. In real databases, data lives spread across multiple tables that relate to each other — customers -> orders -> products. To bring that data together in a single query, you need JOINs.

INNER JOIN returns only rows where the join condition is true in both tables. It's the most common JOIN type. The syntax: SELECT columns FROM table1 INNER JOIN table2 ON table1.column = table2.column. The ON clause specifies how the tables relate.

In our bookstore, books have an author_id that links to the authors table. To get each book's title alongside the author's full name, we join books to authors on books.author_id = authors.author_id.

If a book has an author_id that doesn't exist in authors, that book is excluded from the result. That's the 'inner' part — only matching records survive.

join_books_authors.sqlSQL

-- ============================================================
-- Joining books and authors to show book title + author name.
-- INNER JOIN = only rows where the match exists in both tables.
-- ============================================================

-- Basic INNER JOIN: get every book with its author's full name.
SELECT books.title,
       authors.full_name
FROM books
INNER JOIN authors ON books.author_id = authors.author_id;

-- Add sorting to see them in a nice order.
SELECT books.title AS "Book Title",
       authors.full_name AS "Author"
FROM books
INNER JOIN authors ON books.author_id = authors.author_id
ORDER BY authors.full_name ASC;

-- Use table aliases to keep the query readable.
SELECT b.title,
       a.full_name
FROM books   AS b
INNER JOIN authors AS a ON b.author_id = a.author_id;

Output

-- Basic INNER JOIN output:

-- title | full_name

-- Nineteen Eighty-Four | George Orwell

-- Animal Farm | George Orwell

-- One Hundred Years of Solitude | Gabriel García Márquez

-- Americanah | Chimamanda Ngozi Adichie

-- Purple Hibiscus | Chimamanda Ngozi Adichie

-- Norwegian Wood | Haruki Murakami

Mental Model

Mental Model: JOIN is like a Venn diagram

INNER JOIN is the overlapping center — only rows that exist in both circles.

INNER JOIN: rows that match in both tables.
LEFT JOIN: all rows from left table, matching from right (NULL if none).
RIGHT JOIN: all rows from right table, matching from left (NULL if none).
FULL JOIN: all rows from both tables, NULLs where no match.

📊 Production Insight

A missing ON clause produces a CROSS JOIN — every pair of rows.

On two tables with 10k rows each, that's 100 million rows.

Always alias tables to avoid ambiguous column references.

Test JOINs with LIMIT 5 first to verify the row count is sane.

🎯 Key Takeaway

INNER JOIN returns only matching rows from both tables.

Always specify the JOIN condition in ON — never forget it.

Table aliases make queries readable and protect against ambiguous columns.

Which JOIN to use?

IfNeed only rows that exist in both tables

→

UseUse INNER JOIN

IfNeed all rows from the first table, with optional matches from the second

→

UseUse LEFT JOIN

IfNeed all rows from both tables regardless of matches

→

UseUse FULL OUTER JOIN (if supported)

Modifying Data — INSERT, UPDATE, and DELETE Basics

Querying data is half the story. To manage data, you need to add, change, and remove it. SQL provides three core commands for that: INSERT adds new rows, UPDATE modifies existing rows, and DELETE removes rows.

INSERT INTO table (columns) VALUES (values) adds one row at a time. You can also insert multiple rows in a single statement.

UPDATE table SET column = new_value WHERE condition changes data. The WHERE clause is critical here — if you omit it, every row in the table gets updated.

DELETE FROM table WHERE condition removes rows. Same warning: missing WHERE deletes every row.

Always run a SELECT with the same WHERE first to verify which rows will be affected before running UPDATE or DELETE.

modify_data.sqlSQL

-- ============================================================
-- Adding, updating, and deleting data in our bookstore.
-- Run each block separately and verify with SELECT afterward.
-- ============================================================

-- 1. INSERT a new author.
INSERT INTO authors (author_id, full_name, country)
VALUES (5, 'Elena Ferrante', 'Italy');

-- 2. INSERT multiple books at once.
INSERT INTO books (book_id, title, author_id, genre, price, published)
VALUES
    (7, 'My Brilliant Friend',       5, 'Literary Fiction',  14.99, 2012),
    (8, 'The Story of a New Name',   5, 'Literary Fiction',  14.99, 2013);

-- 3. UPDATE the price of all books by a specific author (author_id = 1).
-- FIRST, preview which rows will be updated:
SELECT book_id, title, price
FROM books
WHERE author_id = 1;

-- THEN, run the UPDATE (10% price increase for George Orwell's books).
UPDATE books
SET price = price * 1.10
WHERE author_id = 1;

-- 4. DELETE a specific book by its ID.
-- Preview first:
SELECT * FROM books WHERE book_id = 8;

-- Then delete:
DELETE FROM books
WHERE book_id = 8;

-- 5. Danger: UPDATE without WHERE updates ALL rows.
-- Uncomment the next line only if you really mean it.
-- UPDATE books SET price = 0.00;

Output

-- INSERT: '2 rows affected' (for the two books)

-- After UPDATE (author_id = 1): Nineteen Eighty-Four price becomes 10.99, Animal Farm becomes 8.24.

-- After DELETE: book_id = 8 removed.

-- Always SELECT first to confirm scope.

⚠ Production Nightmare: UPDATE or DELETE without WHERE

Every DBA has a story about a junior dev running DELETE FROM users without a WHERE clause. The data is gone instantly — with no undo button (unless you have a backup or transaction). Always wrap modifications in a transaction (BEGIN; UPDATE ...; ROLLBACK; to test) and always SELECT first to preview affected rows.

📊 Production Insight

A missing WHERE on UPDATE/DELETE is unrecoverable without backups.

Always run SELECT with the same WHERE first to count affected rows.

Use transactions: BEGIN, UPDATE, ROLLBACK to test, then COMMIT.

Schedule regular backups — modifications are irreversible without them.

🎯 Key Takeaway

INSERT adds rows, UPDATE modifies, DELETE removes.

Never run UPDATE or DELETE without a WHERE clause.

Preview with SELECT, protect with transactions, backup regularly.

Database Design: Why Your Schema Is Probably Wrong

Most developers jump straight to writing queries before they've thought about how their data actually relates. That's like building a house without a foundation. It works until it doesn't—and when it fails, it fails hard.

A relational database isn't just a bucket for data. It's a model of your business logic. Every table should represent one entity. Every column should be atomic. This is called normalization, and ignoring it is why you end up with columns like tags_csv or user_preferences_json that you're forced to parse in application code.

Here's the rule: If you can't explain what a table represents in one sentence, your schema is broken. A table called orders_items_products_users is a cry for help. Split it. Name things clearly. Define foreign keys upfront—not as an afterthought when your JOINs stop making sense.

Design for queries you'll write six months from now, not the one you're writing today. The database doesn't care about your feelings. It cares about constraints, indexes, and referential integrity.

SchemaDesignNightmare.sqlSQL

// io.thecodeforge — database tutorial

-- Bad schema: everything in one table
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    product_name VARCHAR(100),
    product_price DECIMAL(10,2),
    order_date DATETIME,
    shipping_address TEXT
);

-- Good schema: normalized entities
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2)
);

CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    order_date DATETIME,
    shipping_address TEXT,
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);

CREATE TABLE order_items (
    id INT PRIMARY KEY,
    order_id INT,
    product_id INT,
    quantity INT,
    FOREIGN KEY (order_id) REFERENCES orders(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);

Output

Query successful. 0 rows affected.

⚠ Production Trap:

Don't fall for 'we'll normalize later.' Later never comes. You'll be debugging a 15-table JOIN at 2 AM because someone thought storing lists as comma-separated strings was 'faster.' It's not.

🎯 Key Takeaway

Normalize your schema first. You can always denormalize for performance, but you can't un-break a bad schema.

SQL Basics: The Mental Model That Saves Hours

SQL is not Python. It's not JavaScript. Stop trying to think procedurally—SQL is declarative. You tell it WHAT you want, not HOW to get it. This trips up everyone who comes from imperative languages.

Here's the mental model: Imagine SQL as a pipeline. Data flows through each clause in a specific order. Most developers think SELECT runs first. It doesn't. The actual execution order is:

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

This matters because you can't filter on an alias you defined in SELECT within WHERE—the WHERE clause hasn't seen it yet. I've seen junior devs spend an hour debugging this exact issue.

Another fundamental: SQL works on sets, not loops. When you write SELECT * FROM users WHERE id IN (1,2,3), the database is doing set operations, not iterating through a list. This is why SQL is faster at data operations than application code—it's built for batch processing.

Understand this mental model, and you'll stop writing queries that work accidentally and start writing queries that work by design.

ExecutionOrder.sqlSQL

// io.thecodeforge — database tutorial

-- This WILL fail because WHERE can't see the alias
SELECT 
    u.id,
    CONCAT(u.first_name, ' ', u.last_name) AS full_name
FROM users u
WHERE full_name LIKE '%Smith%';

-- This WORKS: use HAVING or subquery
SELECT 
    u.id,
    CONCAT(u.first_name, ' ', u.last_name) AS full_name
FROM users u
HAVING full_name LIKE '%Smith%';

-- But best practice: filter before projection
SELECT 
    u.id,
    CONCAT(u.first_name, ' ', u.last_name) AS full_name
FROM users u
WHERE u.last_name = 'Smith';

Output

ERROR 1054: Unknown column 'full_name' in 'where clause'

Second query: 42 rows returned

Third query: 42 rows returned (0.002s)

💡Senior Shortcut:

When debugging a slow query, don't look at SELECT first. Look at FROM and WHERE. That's where you lose performance. SELECT is just formatting.

🎯 Key Takeaway

Memorize the execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. It will save you hours of debugging.

DBMS vs. SQL: Why You Need Both to Survive

Newcomers confuse SQL with the database itself. They're not the same thing. SQL is the language. DBMS (Database Management System) is the engine that runs it. You can write perfect SQL and still get destroyed by the DBMS you're running it on.

MySQL, PostgreSQL, SQL Server, SQLite—they all speak SQL, but each has its own quirks. PostgreSQL handles JSON natively. MySQL has GROUP BY behavior that'll bite you if you're not careful. SQLite doesn't support RIGHT JOINs at all. Knowing the difference keeps you from shipping code that works locally but explodes in production.

A DBMS does four things: store data, retrieve data, manage concurrent access, and ensure data integrity. That last one is why transactions exist. If you're not using transactions for operations that modify multiple tables, you're asking for data corruption.

Here's what matters: Pick one DBMS and learn it deeply before jumping to others. PostgreSQL is the industry standard for production systems. SQLite is fine for local dev. MySQL is popular but has enough foot-guns to fill a quarry. Know your toolkit.

DBMSQuirks.sqlSQL

// io.thecodeforge — database tutorial

-- PostgreSQL: returns full_name and count
SELECT 
    full_name, 
    COUNT(*)
FROM users
GROUP BY full_name;

-- MySQL: same query might run but returns garbage
-- MySQL allows SELECT columns not in GROUP BY
-- It picks arbitrary values. This is dangerous.

-- Safe MySQL version:
SELECT 
    full_name, 
    COUNT(*) AS cnt
FROM users
GROUP BY full_name
ORDER BY cnt DESC;

Output

PostgreSQL: Returns grouped results with only explicit columns.

MySQL: Returns grouped results with arbitrary values for non-aggregated columns (deprecated in 8.0+).

🔥Production Trap:

MySQL's loose GROUP BY behavior has caused production bugs where seemingly correct queries returned wrong data. Use ONLY_FULL_GROUP_BY mode to prevent this. Or just use PostgreSQL.

🎯 Key Takeaway

SQL is the lingua franca, but your DBMS is the local dialect. Know which one you're talking to before you start writing queries.

Subqueries: When One Query Isn't Enough

You've mastered SELECT, JOIN, and GROUP BY. But real-world data doesn't always cooperate. You need to filter by aggregated results, compare against averages, or find rows that don't exist in another table. That's where subqueries — queries inside queries — save your ass.

A subquery can live in WHERE, FROM, or even SELECT. It runs first, producing a value or a set of rows. The outer query uses that result like a variable. This isn't just theory: you'll use it to find customers who spent above average, products never ordered, or employees in the bottom quartile.

Don't nest too deep. Two levels is usually enough. Three is a code smell. If you need four, you're not breaking down the problem properly. Subqueries are surgical tools, not a substitute for proper schema design.

AboveAverageOrders.sqlSQL

// io.thecodeforge — database tutorial

-- Get customers who spent more than average
SELECT first_name, last_name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING SUM(total) > (
        SELECT AVG(total)
        FROM orders
    )
);

Output

first_name | last_name

-----------+----------

Alice | Chen

Bob | Torres

⚠ Production Trap:

Correlated subqueries (subquery references outer query) can tank performance. They run once per row. Rewrite them as JOINs or window functions when tables exceed 10k rows.

🎯 Key Takeaway

Subqueries run first. Outer queries use their results. Keep it shallow — two levels max.

Window Functions: Rank, Lag, and Moving Averages Without Self-Joins

You need a running total. Or to rank salespeople by region. Or find the difference between each month's revenue and the previous month. You could write a self-join with GROUP BY — ugly, slow, and hard to read. Or you could use window functions, the unsung heroes of analytical SQL.

Window functions compute across a set of rows related to the current row, without collapsing them into one result. OVER() defines the window. PARTITION BY splits into groups. ORDER BY within the window sets the sequence. ROW_NUMBER(), RANK(), LAG(), LEAD(), SUM() OVER() — these are your new tools.

Every senior dev uses window functions for reporting, deduplication, and time-series analysis. They're faster than subqueries and cleaner than self-joins. Learn them once, use them daily.

MonthlyRevenueChange.sqlSQL

// io.thecodeforge — database tutorial

-- Month-over-month revenue change per product
SELECT 
    product_id,
    month,
    revenue,
    LAG(revenue) OVER (
        PARTITION BY product_id 
        ORDER BY month
    ) AS prev_month_revenue,
    revenue - LAG(revenue) OVER (
        PARTITION BY product_id 
        ORDER BY month
    ) AS change
FROM product_monthly_revenue;

Output

product_id | month | revenue | prev_month_revenue | change

-----------+----------+---------+-------------------+--------

101 | 2024-01 | 5000 | NULL | NULL

101 | 2024-02 | 6200 | 5000 | 1200

💡Senior Shortcut:

Use ROW_NUMBER() with PARTITION BY to deduplicate rows — assign 1 to the first occurrence, then filter WHERE row_num = 1. No GROUP BY needed.

🎯 Key Takeaway

Window functions compute across related rows without collapsing them. Partition to group, order to sequence, then aggregate.

Why SQL Optimization Matters Before Your Query Runs

Slow queries don't fail—they just waste seconds that compound into hours. SQL optimization is the practice of reducing query execution time and resource consumption. The why: every unoptimized SELECT or JOIN scans more rows than necessary, locking tables, consuming memory, and frustrating users. The how starts with understanding the query execution plan—EXPLAIN ANALYZE reveals whether your database is doing full table scans when it could use an index. Indexes are the single most impactful tool: a B-tree index on a WHERE column can turn a 10-second query into 10 milliseconds. But indexes aren't free—they slow down writes. Other techniques include avoiding SELECT *, using EXISTS instead of IN for subqueries, filtering early with WHERE before JOINs, and choosing the correct join type. The result: faster APIs, lower hosting costs, and happier DBAs.

Index_Optimization.sqlSQL

// io.thecodeforge — database tutorial

// Slow: full table scan on 1M rows
SELECT * FROM orders WHERE status = 'pending';

// Fast: B-tree index on status
CREATE INDEX idx_orders_status ON orders(status);

// Re-run same query — now uses index scan
EXPLAIN ANALYZE
SELECT id, customer_id, total
FROM orders
WHERE status = 'pending';

Output

Index Scan using idx_orders_status on orders (cost=0.42..124.30 rows=850 width=18) (actual time=0.015..2.310 rows=850 loops=1)

⚠ Production Trap:

Adding an index on a low-cardinality column like 'is_deleted' (only two values) is useless. The database will still scan half the table.

🎯 Key Takeaway

Always check the execution plan before and after indexing—only add indexes that measurably reduce scan rows.

The Most Dreaded Topics: NULLs, JOIN Confusion, and Subquery Bloat

Three topics make experienced developers sweat: NULL behavior, multi-table JOINs, and nested subqueries. NULL is not a value—it's the absence of a value. Comparing anything to NULL with = returns NULL, not TRUE or FALSE, which is why WHERE column = NULL gives zero rows. The fix: use IS NULL or IS DISTINCT FROM. JOIN confusion happens because developers forget LEFT JOIN keeps all rows from the left table, while INNER JOIN drops non-matching rows. Mixing OUTER and INNER joins in the same query is the top cause of unexpected row loss. Subquery bloat occurs when developers wrap multiple subqueries inside SELECT instead of using JOINs or CTEs. A subquery inside a WHERE clause runs once per row—this is a performance disaster. Use EXISTS for correlated subqueries or rewrite with JOIN. The mental model: every query is a set operation—if you can't picture the Venn diagram, you'll get wrong results.

Null_Join_Trap.sqlSQL

// io.thecodeforge — database tutorial

-- WRONG: returns zero rows for missing middle_name
SELECT id, name FROM users WHERE middle_name = NULL;

-- CORRECT: check for absence
SELECT id, name FROM users WHERE middle_name IS NULL;

-- JOIN trap: accidental cartesian product
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

-- Subquery bloat (runs N+1 times)
-- Rewrite as JOIN or use EXISTS

Output

id | name

---|-------

3 | Alice -- only rows with null middle_name

⚠ Production Trap:

Using LEFT JOIN with a WHERE condition on the right table (WHERE o.total > 100) silently converts it to an INNER JOIN and drops unmatched left rows.

🎯 Key Takeaway

Always treat NULL with IS NULL/IS NOT NULL, never with =, and always confirm JOIN type with a row count sanity check.

Course Syllabus: From Zero to Production-Ready SQL

This course is structured to build your mental model of SQL from the ground up, then rapidly push you into practical, complex querying. The syllabus is divided into four parts. Part One covers the foundational mental model: how a database engine interprets SELECT, WHERE, and FROM clauses differently than you read them. We then explore schema design — why most beginners (and many seniors) end up with conflicting, redundant tables. Part Two dives into aggregate functions, GROUP BY behavior, and the silent pitfalls of NULLs in calculations. Part Three is the core of real-world SQL: joining tables with INNER and LEFT JOINs, subqueries as derived tables, and window functions that eliminate fragile self-joins. Part Four focuses on data modification (INSERT, UPDATE, DELETE) with transactional safety, and finally SQL optimization — reading execution plans before your query hits production. Each module ends with a short quiz, and the final project is a multi-join warehouse report.

course_syllabus_check.sqlSQL

// io.thecodeforge — database tutorial
-- Quick check: list modules for review
SELECT module_id, title, duration_min
FROM course_syllabus
WHERE course_name = 'introduction_sql'
ORDER BY module_id;

Output

module_id | title | duration_min

1 | Mental Model & Schema Design | 45

2 | Aggregates & NULLs | 50

3 | Joins, Subqueries, Windows | 70

4 | Mutation, Transactions, Optimz | 55

⚠ Production Trap:

Skipping the mental model and jumping straight to JOIN syntax leads to confusion about row counts and duplicate rows. I've seen production queries return 10x expected rows because developers didn't understand JOIN order.

🎯 Key Takeaway

Master the theory before you write another JOIN.

Projects, Certification & Paid Features

This course includes two capstone projects: first, you'll design and populate a normalized inventory database for a fictitious e-commerce store — including customers, orders, products, and a shared many-to-many order_items table. You'll write queries that answer real business questions: 'Which products have the highest average order value?' and 'Which customers have not ordered in 90 days?' The second project is a performance analysis: you're given a slow production query (a 6-table JOIN with subqueries) and must rewrite it using window functions and indexing hints to cut execution time by 80%. Upon completing both projects and passing the final assessment (80%+ on a 20-question mixed quiz), you'll earn a certificate of completion from TheCodeForge, verifiable via a unique link. Current course rating: 4.7/5 from 1,200+ reviews. A paid plan unlocks downloadable PDF cheatsheets, a private Slack community for live query reviews, and early access to the 'Advanced Indexing & Query Tuning' module. Free tier includes all lessons and community forum access.

certificate_eligibility.sqlSQL

// io.thecodeforge — database tutorial
SELECT student_id
FROM course_progress
WHERE course_id = 'intro_sql'
  AND project_1_passed = 1
  AND project_2_passed = 1
  AND final_quiz_score >= 80
  AND plan_type IN ('free', 'paid');

Output

student_id

2041

3092

4783

⚠ Production Trap:

Projects are not toy exercises. They mimic real edge cases: duplicate customer emails, missing foreign keys, and timezone mismatches. Skipping them means you'll hit these bugs on Monday morning.

🎯 Key Takeaway

Finish both projects to earn the certificate and fix real DB pitfalls.

Modern SQL: SQL:2023 Standard Features

The SQL:2023 standard introduces several modern features that enhance query capabilities and developer productivity. Key additions include property graph queries for graph-based data relationships, improved JSON support with JSON_TABLE and JSON_VALUE functions, and new aggregate functions like ANY_VALUE for non-deterministic grouping. Additionally, SQL:2023 standardizes the GREATEST and LEAST functions, which were previously vendor-specific, and introduces the NULLS NOT DISTINCT clause for unique constraints that treat NULLs as equal. These features simplify complex queries and align SQL with modern data modeling needs. For example, JSON_TABLE allows you to parse JSON documents into relational rows directly within SQL:

``sql SELECT FROM JSON_TABLE( '{"items": [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]}', '$.items[]' COLUMNS ( id INT PATH '$.id', name VARCHAR(10) PATH '$.name' ) ) AS jt; ``

This returns two rows with id and name columns. The property graph query syntax enables pattern matching on graphs using MATCH clauses, similar to Cypher. While full support varies by database, these features represent the future of SQL standardization.

modern_sql_example.sqlSQL

-- SQL:2023 JSON_TABLE example
SELECT *
FROM JSON_TABLE(
  '{"items": [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]}',
  '$.items[*]' COLUMNS (
    id INT PATH '$.id',
    name VARCHAR(10) PATH '$.name'
  )
) AS jt;

-- GREATEST and LEAST (standardized)
SELECT GREATEST(10, 20, 30) AS max_val; -- returns 30

-- NULLS NOT DISTINCT for unique constraints
CREATE TABLE t (
  a INT,
  b INT,
  UNIQUE NULLS NOT DISTINCT (a, b)
);

🔥Vendor Adoption Varies

📊 Production Insight

When migrating to newer database versions, review SQL:2023 features to simplify ETL pipelines and reduce application code. For example, use JSON_TABLE instead of custom JSON parsing in application layers.

🎯 Key Takeaway

SQL:2023 standardizes JSON handling, graph queries, and NULL behavior in unique constraints, making SQL more expressive for modern applications.

SQL Dialects: PostgreSQL vs MySQL vs SQL Server vs Oracle Differences

While SQL is standardized, each database vendor implements its own dialect with unique syntax, functions, and behaviors. Understanding these differences is crucial for writing portable code and avoiding surprises. Key areas of divergence include:

String Concatenation: PostgreSQL and Oracle use ||, MySQL uses CONCAT(), SQL Server uses +.
LIMIT/OFFSET: PostgreSQL and MySQL use LIMIT ... OFFSET, SQL Server uses OFFSET ... FETCH NEXT, Oracle uses OFFSET ... ROWS FETCH NEXT or ROWNUM.
Date/Time Functions: PostgreSQL uses NOW(), MySQL uses NOW(), SQL Server uses GETDATE(), Oracle uses SYSDATE.
NULL Ordering: PostgreSQL treats NULLs as largest by default, MySQL as smallest, SQL Server as smallest, Oracle as largest. Use NULLS FIRST/LAST for control.
Auto-increment: PostgreSQL uses SERIAL or IDENTITY, MySQL uses AUTO_INCREMENT, SQL Server uses IDENTITY, Oracle uses SEQUENCE or IDENTITY.
Window Functions: All support them, but syntax for frames and aggregates varies slightly.

Example of LIMIT differences:

```sql -- PostgreSQL / MySQL SELECT * FROM users ORDER BY id LIMIT 10 OFFSET 20;

-- SQL Server SELECT * FROM users ORDER BY id OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;

-- Oracle (12c+) SELECT * FROM users ORDER BY id OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY; ```

When writing cross-database applications, use an ORM or abstract the SQL layer to handle these differences.

dialect_comparison.sqlSQL

-- String concatenation
-- PostgreSQL / Oracle
SELECT 'Hello' || ' ' || 'World';
-- MySQL
SELECT CONCAT('Hello', ' ', 'World');
-- SQL Server
SELECT 'Hello' + ' ' + 'World';

-- LIMIT/OFFSET
-- PostgreSQL / MySQL
SELECT * FROM products ORDER BY price LIMIT 5 OFFSET 10;
-- SQL Server / Oracle (12c+)
SELECT * FROM products ORDER BY price OFFSET 10 ROWS FETCH NEXT 5 ROWS ONLY;

-- NULL ordering (defaults vary)
SELECT * FROM employees ORDER BY salary NULLS LAST; -- explicit

⚠ Portability Pitfalls

📊 Production Insight

In multi-database environments, maintain a compatibility layer or use an ORM like SQLAlchemy to abstract dialect differences. For performance-critical code, write native SQL per database.

🎯 Key Takeaway

SQL dialects differ in string concatenation, pagination, date functions, and NULL handling; use ANSI-standard syntax where possible and test across databases.

SQL in the Cloud: BigQuery, Redshift, Snowflake Dialect Differences

Cloud data warehouses like Google BigQuery, Amazon Redshift, and Snowflake extend SQL with proprietary features optimized for large-scale analytics. While they support standard SQL, key differences affect query writing and optimization:

BigQuery: Uses GoogleSQL, which is ANSI-compliant but has unique features like SELECT EXCEPT, UNNEST for arrays, and PARTITION BY for clustering. It charges per query (data scanned), so SELECT is expensive. BigQuery requires backticks for table names with hyphens.

Redshift: Based on PostgreSQL 8.x, but with columnar storage and distribution keys. It uses SORTKEY and DISTKEY for physical design. Redshift does not support SELECT * EXCEPT or UNNEST natively; use JSON_EXTRACT_PATH_TEXT for JSON.

Snowflake: Uses a variant of PostgreSQL with features like LATERAL FLATTEN for semi-structured data, CLUSTERING KEY, and ZEROIFNULL. Snowflake supports SELECT * EXCLUDE and QUALIFY for window filters.

Example of semi-structured data querying:

```sql -- BigQuery: UNNEST arrays SELECT id, value FROM dataset.table, UNNEST(JSON_EXTRACT_ARRAY(json_column, '$.items')) AS value;

-- Snowflake: LATERAL FLATTEN SELECT id, value FROM dataset.table, LATERAL FLATTEN(INPUT => PARSE_JSON(json_column):items) AS value;

-- Redshift: JSON_EXTRACT_PATH_TEXT (limited) SELECT id, JSON_EXTRACT_PATH_TEXT(json_column, 'items') AS value; ```

Cost optimization differs: BigQuery charges by bytes processed, Redshift by cluster hours, Snowflake by credits. Use SELECT with specific columns and filter early to reduce costs.
cloud_sql_examples.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- BigQuery: SELECT * EXCEPT SELECT * EXCEPT (sensitive_column) FROM dataset.table; -- Snowflake: QUALIFY for window filters SELECT id, amount, RANK() OVER (ORDER BY amount DESC) AS rnk FROM sales QUALIFY rnk <= 10; -- Redshift: Distribution key example CREATE TABLE orders ( order_id INT, customer_id INT, order_date DATE ) DISTKEY (customer_id) SORTKEY (order_date); -- BigQuery: Partitioned table CREATE TABLE dataset.sales PARTITION BY DATE(order_date) CLUSTER BY customer_id AS SELECT * FROM source;
💡Cost-Aware Querying
📊 Production Insight
When migrating to a cloud data warehouse, rewrite SQL to use platform-native functions (e.g., UNNEST in BigQuery, LATERAL FLATTEN in Snowflake) and design physical schemas (distribution keys, clustering) for query patterns.
🎯 Key Takeaway
Cloud data warehouses have distinct SQL dialects and pricing models; optimize queries for cost and performance by leveraging platform-specific features like partitioning and clustering.

● Production incidentPOST-MORTEMseverity: high
The Silent NULL – How a Missing IS NULL Filtered Out Half Your Customers
Symptom
SELECT COUNT(*) FROM customers WHERE country = NULL returned 0, but the country column clearly had NULL values.
Assumption
The developer assumed that = works with NULL like any other value, and that NULL means 'empty'.
Root cause
In SQL, NULL represents 'unknown'. Any comparison with =, !=, <, > on NULL yields UNKNOWN, which is not TRUE, so the row is excluded. WHERE country = NULL is always false for every row.
Fix
Replace WHERE country = NULL with WHERE country IS NULL, or for non-null check use WHERE country IS NOT NULL.
Key lesson
NULL is not a value — it's the absence of a value. Never use = or != to compare against NULL.
Always use IS NULL or IS NOT NULL for NULL checks.
When debugging, first check if NULLs exist: SELECT COUNT(*) FROM table WHERE column IS NULL.
Use COALESCE or ISNULL to provide defaults for NULL values in queries.

Production debug guideCommon SQL query failures and the exact actions engineers take to resolve them5 entries
Symptom · 01
Query returns zero rows when you expect data
→
Fix
Check WHERE clause logic: are you using = NULL? Are you comparing strings that differ in case or whitespace? Are the column names correct? Run SELECT * LIMIT 5 to see raw data.
Symptom · 02
Query returns too many rows (cartesian product)
→
Fix
If using JOIN, check that you have a proper JOIN condition. Missing ON clause or cross join multiplies rows. Use SELECT COUNT(*) on each table individually.
Symptom · 03
Error: 'column must appear in GROUP BY clause or be used in an aggregate function'
→
Fix
Add the unaggregated column to the GROUP BY clause. For PostgreSQL, this is strict. For MySQL with ONLY_FULL_GROUP_BY disabled, it may run but give wrong results.
Symptom · 04
Query runs extremely slowly on large tables
→
Fix
Run EXPLAIN (or EXPLAIN ANALYZE) to see query plan. Look for full table scans. Ensure indexes exist on columns used in WHERE, JOIN, and ORDER BY. Consider adding LIMIT.
Symptom · 05
Subquery returns more than one row when you expected one
→
Fix
Check if the subquery can return multiple rows. Use LIMIT 1, or use aggregation (MAX, MIN), or restructure with JOIN and DISTINCT.

★ Quick SQL Debugging Cheat SheetThree commands to diagnose any SQL query problem fast
Unexpected result count (too many or zero)−
Immediate action
Count rows in each involved table separately to isolate the issue
Commands
SELECT COUNT(*) FROM table_name;
SELECT * FROM table_name LIMIT 5;
Fix now
Check WHERE clause conditions and ensure correct NULL handling.
Syntax error or ambiguous column reference+
Immediate action
Check column names and table aliases, and verify quotes are correct
Commands
SHOW COLUMNS FROM table_name; (or \d table_name in psql)
SELECT column FROM table WHERE x = 'value'; (simplify to debug)
Fix now
Alias columns explicitly and use proper quote types.
Slow query performance+
Immediate action
Obtain the query execution plan to find bottlenecks
Commands
EXPLAIN SELECT ...; (or EXPLAIN ANALYZE)
SHOW INDEX FROM table_name;
Fix now
Add index on columns used in WHERE and JOIN, avoid SELECT *, use LIMIT.

Feature WHERE HAVING
What it filters Individual rows Groups of rows (post-aggregation)
When it runs Before GROUP BY After GROUP BY
Works with aggregates? No — causes an error Yes — designed for it
Example use case WHERE price > 10 HAVING COUNT(*) > 2
Required clause No — optional No — only needed with GROUP BY
Can replace the other? No No — they serve different stages

⚙ Quick Reference
17 commands from this guide
File Command / Code Purpose
select_and_filter_books.sql SELECT * FROM books; Your First SQL Query
sort_and_limit_books.sql SELECT title, price Sorting, Limiting and Shaping Results with ORDER BY and LIMI
aggregate_and_group_books.sql SELECT COUNT(*) AS total_books Aggregate Functions
join_books_authors.sql SELECT books.title, Joining Tables with INNER JOIN
modify_data.sql INSERT INTO authors (author_id, full_name, country) Modifying Data
SchemaDesignNightmare.sql CREATE TABLE orders ( Database Design
ExecutionOrder.sql SELECT SQL Basics
DBMSQuirks.sql SELECT DBMS vs. SQL
AboveAverageOrders.sql SELECT first_name, last_name Subqueries
MonthlyRevenueChange.sql SELECT Window Functions
Index_Optimization.sql SELECT * FROM orders WHERE status = 'pending'; Why SQL Optimization Matters Before Your Query Runs
Null_Join_Trap.sql SELECT id, name FROM users WHERE middle_name = NULL; The Most Dreaded Topics
course_syllabus_check.sql SELECT module_id, title, duration_min Course Syllabus
certificate_eligibility.sql SELECT student_id Projects, Certification & Paid Features
modern_sql_example.sql SELECT * Modern SQL
dialect_comparison.sql SELECT 'Hello' || ' ' || 'World'; SQL Dialects
cloud_sql_examples.sql SELECT * EXCEPT (sensitive_column) FROM dataset.table; SQL in the Cloud

Key takeaways
1
SQL reads like English on purpose
SELECT [columns] FROM [table] WHERE [condition] is the skeleton of almost every query you'll ever write.
2
NULL is not a value
it means 'unknown'. Never compare it with = or !=. Always use IS NULL or IS NOT NULL, or you'll silently filter out data that should be there.
3
WHERE filters rows before grouping; HAVING filters groups after grouping
confusing the two is one of the most common SQL errors at every experience level.
4
Always add LIMIT when exploring an unfamiliar table
running an unbounded SELECT * on a production table with millions of rows can lock up your client, slow the database, and make you very unpopular very fast.
5
INNER JOIN returns only matching rows across tables
test with SELECT and LIMIT before running on production.
6
UPDATE and DELETE without WHERE affect all rows. Always preview with SELECT first, and use transactions to allow rollback.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
What's the difference between WHERE and HAVING in SQL, and can you give ...
Q02JUNIOR
If I run SELECT * FROM users WHERE last_login = NULL and get zero rows b...
Q03SENIOR
What does GROUP BY actually do to the rows in a table, and why does ever...

Q01 of 03SENIOR
What's the difference between WHERE and HAVING in SQL, and can you give a concrete example of when you'd use each?
ANSWER
WHERE filters individual rows before any grouping occurs. HAVING filters groups after GROUP BY is applied. For example: to find authors with more than 2 books, you cannot use WHERE COUNT() > 2 because COUNT is an aggregate. You must use GROUP BY author_id HAVING COUNT() > 2. Conversely, to find books with price > 10, you use WHERE price > 10 (no GROUP BY).
Q02 of 03JUNIOR
If I run SELECT * FROM users WHERE last_login = NULL and get zero rows back even though I know null values exist in that column, what's wrong and how do you fix it?
ANSWER
The problem is that NULL cannot be compared using = (or !=). In SQL, any comparison with NULL yields UNKNOWN, which is not TRUE, so the row is excluded. To check for NULL, you must use the IS NULL operator: WHERE last_login IS NULL. Similarly, use IS NOT NULL for non-null checks.
Q03 of 03SENIOR
What does GROUP BY actually do to the rows in a table, and why does every column in the SELECT list need to either be in the GROUP BY clause or wrapped in an aggregate function?
ANSWER
GROUP BY partitions the result set into groups based on the specified columns. Within each group, you can then apply aggregate functions (COUNT, SUM, AVG, MIN, MAX) to summarize the data. Every column in the SELECT list must be either a grouping column or wrapped in an aggregate because if you select a non-grouped, non-aggregated column, there could be multiple different values within a group — SQL wouldn't know which one to show, violating determinism. It's an error in most databases (except older MySQL with ONLY_FULL_GROUP_BY disabled, which picks an arbitrary value).

01
What's the difference between WHERE and HAVING in SQL, and can you give a concrete example of when you'd use each?
SENIOR
02
If I run SELECT * FROM users WHERE last_login = NULL and get zero rows back even though I know null values exist in that column, what's wrong and how do you fix it?
JUNIOR
03
What does GROUP BY actually do to the rows in a table, and why does every column in the SELECT list need to either be in the GROUP BY clause or wrapped in an aggregate function?
SENIOR

FAQ · 3 QUESTIONS

Frequently Asked Questions

01
Do I need to install anything to start learning SQL?
No installation needed to get started. SQLiteOnline.com and DB Fiddle both let you write and run SQL directly in your browser for free. If you want a local setup, SQLite is a single file with no server required — DB Browser for SQLite gives it a visual interface. All examples in this article run in any of these environments without modification.
Was this helpful?
02
What's the difference between SQL and MySQL (or PostgreSQL)?
SQL is the language — a standard defined by ISO. MySQL, PostgreSQL, SQLite, and SQL Server are database engines: software systems that store data and understand SQL. It's the same relationship as English being a language and Google Docs being a tool that accepts English. The core SQL you've learned here works across all of them; the differences only appear in advanced or edge-case features.
Was this helpful?
03
Is SQL still worth learning in 2025 with NoSQL and AI tools around?
Absolutely yes. SQL is the single most in-demand data skill across engineering, analytics, product, and business roles — consistently appearing in more job postings than any other technical skill. NoSQL databases handle unstructured or highly scalable data, but relational databases remain the backbone of most applications. Even AI data tools like Copilot for data analysis generate SQL under the hood — understanding what they produce is still your responsibility.
Was this helpful?

📖 What to Read Next
→SQL NULL Handling - 340 Employees Missing Bonuses340 employees missed bonus payments because NULL arithmetic produced $0 compensation. Avoid this production failure with →SQL CASE — Missing ELSE Silently Skips Rows in SUMAPAC revenue dropped to $0 when a CASE inside SUM lacked ELSE, silently excluding 18% of orders. Monitoring on query err →PL/SQL IF — NULL Comparison Silently Skips 12K Records12,000 invoices went unprocessed because PL/SQL's Three-Valued Logic treats NULL comparisons as NULL, not FALSE. Use IS →LEFT JOIN WHERE Filter: Missing Customers (SQL Interview)Finance quarterly report missing high-value customers: a LEFT JOIN WHERE filter silently turned it into an INNER JOIN. A →SQL WHERE: AND/OR Precedence Bug Sent 40k Wrong CustomersAND has higher precedence than OR - missing parentheses emailed 40k customers wrongly. Use parentheses in mixed conditio
From Other Categories
Biometric Authentication Explained: Types and How It WorksIntermediate

N

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.

Follow

✓ Verified

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

🔥

That's SQL Basics. Mark it forged?

11 min read · try the examples if you haven't

Feature	WHERE	HAVING
What it filters	Individual rows	Groups of rows (post-aggregation)
When it runs	Before GROUP BY	After GROUP BY
Works with aggregates?	No — causes an error	Yes — designed for it
Example use case	WHERE price > 10	HAVING COUNT(*) > 2
Required clause	No — optional	No — only needed with GROUP BY
Can replace the other?	No	No — they serve different stages

File	Command / Code	Purpose
select_and_filter_books.sql	SELECT * FROM books;	Your First SQL Query
sort_and_limit_books.sql	SELECT title, price	Sorting, Limiting and Shaping Results with ORDER BY and LIMI
aggregate_and_group_books.sql	SELECT COUNT(*) AS total_books	Aggregate Functions
join_books_authors.sql	SELECT books.title,	Joining Tables with INNER JOIN
modify_data.sql	INSERT INTO authors (author_id, full_name, country)	Modifying Data
SchemaDesignNightmare.sql	CREATE TABLE orders (	Database Design
ExecutionOrder.sql	SELECT	SQL Basics
DBMSQuirks.sql	SELECT	DBMS vs. SQL
AboveAverageOrders.sql	SELECT first_name, last_name	Subqueries
MonthlyRevenueChange.sql	SELECT	Window Functions
Index_Optimization.sql	SELECT * FROM orders WHERE status = 'pending';	Why SQL Optimization Matters Before Your Query Runs
Null_Join_Trap.sql	SELECT id, name FROM users WHERE middle_name = NULL;	The Most Dreaded Topics
course_syllabus_check.sql	SELECT module_id, title, duration_min	Course Syllabus
certificate_eligibility.sql	SELECT student_id	Projects, Certification & Paid Features
modern_sql_example.sql	SELECT *	Modern SQL
dialect_comparison.sql	SELECT 'Hello' \|\| ' ' \|\| 'World';	SQL Dialects
cloud_sql_examples.sql	SELECT * EXCEPT (sensitive_column) FROM dataset.table;	SQL in the Cloud