SQL for Beginners: Query, Filter and Manage Data From Scratch
Every app you've ever used — Instagram, Netflix, your bank — runs on a database humming quietly in the background. When Netflix recommends a show, a database is being asked a question. When your bank balance updates, a database is being changed. SQL (Structured Query Language) is the universal language for doing both of those things, and it has been for over 50 years. Learning it is one of the highest-ROI skills a developer — or honestly anyone who works with data — can pick up.
Before SQL existed, retrieving data from a computer system meant writing complex custom programs for every single question you wanted to ask. IBM researchers in the 1970s asked a radical question: what if non-programmers could just ask the data a question in something close to plain English? SQL was born from that idea. It solves the problem of making structured data accessible — whether you're a data analyst, a backend engineer, a product manager, or a founder trying to understand your users.
By the end of this article you'll understand what a database and a table actually are, you'll be able to write SELECT queries to fetch data, use WHERE clauses to filter it, sort results with ORDER BY, and combine conditions with AND and OR. You'll also know the most common traps beginners fall into and exactly how to sidestep them. Let's build this up from the ground floor.
What a Database and a Table Actually Are (No Hand-Waving)
A database is an organised collection of data stored on a computer so it can be retrieved quickly and reliably. Think of it as the filing cabinet from the analogy — but one that can hold millions of folders and find any single piece of information in milliseconds.
Inside a database, data lives in tables. A table is exactly what the word suggests: rows and columns, just like a spreadsheet. Each column has a name and a data type — for example, a 'price' column stores numbers, a 'customer_name' column stores text. Each row is one complete record — one customer, one product, one transaction.
When tables are related to each other — for example, an 'orders' table references a 'customers' table — the whole system is called a Relational Database. SQL is the language designed specifically for relational databases.
Popular relational databases you'll hear about include PostgreSQL, MySQL, SQLite, and Microsoft SQL Server. The good news: SQL is nearly identical across all of them. Learn it once, use it everywhere. The small differences (called 'dialects') only matter for advanced features.
-- ============================================================ -- We're building a tiny bookstore database from scratch. -- Two tables: 'authors' and 'books'. -- Run this in any SQL environment (SQLite, MySQL, PostgreSQL). -- ============================================================ -- Create the authors table first, because books will reference it. CREATE TABLE authors ( author_id INTEGER PRIMARY KEY, -- unique identifier for each author full_name TEXT NOT NULL, -- author's full name; NOT NULL means it's required country TEXT -- country of origin; optional field ); -- Create the books table. CREATE TABLE books ( book_id INTEGER PRIMARY KEY, title TEXT NOT NULL, author_id INTEGER NOT NULL, -- links back to the authors table genre TEXT, price REAL NOT NULL, -- REAL means a decimal number (e.g. 12.99) published INTEGER -- year published, stored as a number ); -- Insert four authors into the authors table. INSERT INTO authors (author_id, full_name, country) VALUES (1, 'George Orwell', 'United Kingdom'), (2, 'Gabriel García Márquez', 'Colombia'), (3, 'Chimamanda Ngozi Adichie', 'Nigeria'), (4, 'Haruki Murakami', 'Japan'); -- Insert six books into the books table. INSERT INTO books (book_id, title, author_id, genre, price, published) VALUES (1, 'Nineteen Eighty-Four', 1, 'Dystopian', 9.99, 1949), (2, 'Animal Farm', 1, 'Satire', 7.49, 1945), (3, 'One Hundred Years of Solitude',2, 'Magical Realism', 12.99, 1967), (4, 'Americanah', 3, 'Literary Fiction', 10.99, 2013), (5, 'Purple Hibiscus', 3, 'Literary Fiction', 8.99, 2003), (6, 'Norwegian Wood', 4, 'Literary Fiction', 11.50, 1987);
-- The database engine confirms each statement silently.
-- In most SQL tools you'll see: 'Query OK' or '4 rows affected' etc.
-- The data now lives in the database, ready to be queried.
Your First SQL Query — SELECT, FROM, and WHERE Explained
SELECT is the most important SQL command you'll ever learn. It answers the question: 'Show me data.' The basic structure is almost a sentence: SELECT [what columns] FROM [which table].
The asterisk (*) is a wildcard meaning 'all columns'. Use it when exploring data. In production code, always name the columns explicitly — it's faster and far easier to read six months later.
The WHERE clause is how you filter. Without WHERE, SQL returns every single row in the table. WHERE adds a condition that each row must pass to be included in the results. Think of it as a bouncer at the door: only rows that satisfy the condition get in.
Conditions in WHERE can use these comparison operators: = (equals), != or <> (not equals), > (greater than), < (less than), >= (greater than or equal), <= (less than or equal). You can also combine multiple conditions using AND (both must be true) and OR (either can be true).
-- ============================================================ -- Querying the bookstore database we created in the last step. -- Run each query block one at a time to see the output. -- ============================================================ -- Query 1: Get every column from every row in the books table. -- Good for exploration, but avoid SELECT * in real apps. SELECT * FROM books; -- Query 2: Get only the title and price — we don't need everything. SELECT title, price FROM books; -- Query 3: Filter with WHERE — only books cheaper than $10.00 SELECT title, price FROM books WHERE price < 10.00; -- Query 4: Combine conditions with AND -- Only books under $10 that were also published after 1980 SELECT title, price, published FROM books WHERE price < 10.00 AND published > 1980; -- Query 5: Use OR — books that are either Satire OR Dystopian genre SELECT title, genre FROM books WHERE genre = 'Satire' OR genre = 'Dystopian'; -- Query 6: A tidier way to write the OR above, using IN -- IN checks if a value matches any item in a list SELECT title, genre FROM books WHERE genre IN ('Satire', 'Dystopian');
-- book_id | title | author_id | genre | price | published
-- 1 | Nineteen Eighty-Four | 1 | Dystopian | 9.99 | 1949
-- 2 | Animal Farm | 1 | Satire | 7.49 | 1945
-- 3 | One Hundred Years of Solitude | 2 | Magical Realism | 12.99 | 1967
-- 4 | Americanah | 3 | Literary Fiction | 10.99 | 2013
-- 5 | Purple Hibiscus | 3 | Literary Fiction | 8.99 | 2003
-- 6 | Norwegian Wood | 4 | Literary Fiction | 11.50 | 1987
-- Query 3 output (price < 10.00):
-- title | price
-- Nineteen Eighty-Four | 9.99
-- Animal Farm | 7.49
-- Purple Hibiscus | 8.99
-- Query 4 output (price < 10.00 AND published > 1980):
-- title | price | published
-- Purple Hibiscus | 8.99 | 2003
-- Query 6 output (Satire OR Dystopian):
-- title | genre
-- Nineteen Eighty-Four | Dystopian
-- Animal Farm | Satire
Sorting, Limiting and Shaping Results with ORDER BY and LIMIT
Fetching data is one thing — getting it back in a useful order is another. ORDER BY sorts your results by one or more columns. ASC means ascending (A→Z, 0→9) and is the default. DESC means descending (Z→A, 9→0).
You can sort by multiple columns: ORDER BY genre ASC, price DESC sorts alphabetically by genre first, then within each genre sorts by price from highest to lowest. This is enormously useful for leaderboards, product listings, and reports.
LIMIT controls how many rows come back. Databases can hold millions of rows — you almost never want all of them at once. LIMIT 5 gives you the top five. Pair it with ORDER BY and you can answer questions like 'What are the 3 most expensive books?' in a single clean query.
ALIAS (the AS keyword) lets you rename a column in the output. This is purely cosmetic — it doesn't change the database — but it makes results far more readable, especially when column names are long or ambiguous.
-- ============================================================ -- Sorting, limiting and renaming output columns. -- These techniques are used in virtually every real-world query. -- ============================================================ -- Query 1: Sort all books by price, cheapest first (ASC is default). SELECT title, price FROM books ORDER BY price ASC; -- Query 2: Sort by price, most expensive first. SELECT title, price FROM books ORDER BY price DESC; -- Query 3: The 3 most expensive books — combine ORDER BY with LIMIT. SELECT title, price FROM books ORDER BY price DESC LIMIT 3; -- Query 4: Sort by genre alphabetically, then by price within each genre. -- This is multi-column sorting. SELECT title, genre, price FROM books ORDER BY genre ASC, price ASC; -- Query 5: Use AS to give columns friendlier names in the output. -- 'title' becomes 'Book Title', 'price' becomes 'Price (USD)'. -- Notice: multi-word aliases need to be wrapped in double quotes. SELECT title AS "Book Title", price AS "Price (USD)", published AS "Year" FROM books ORDER BY price DESC LIMIT 4;
-- title | price
-- Animal Farm | 7.49
-- Purple Hibiscus | 8.99
-- Nineteen Eighty-Four | 9.99
-- Americanah | 10.99
-- Norwegian Wood | 11.50
-- One Hundred Years of Solitude | 12.99
-- Query 3 output (top 3 most expensive):
-- title | price
-- One Hundred Years of Solitude | 12.99
-- Norwegian Wood | 11.50
-- Americanah | 10.99
-- Query 5 output (aliased columns, top 4 by price):
-- Book Title | Price (USD) | Year
-- One Hundred Years of Solitude | 12.99 | 1967
-- Norwegian Wood | 11.50 | 1987
-- Americanah | 10.99 | 2013
-- Nineteen Eighty-Four | 9.99 | 1949
Aggregate Functions — COUNT, SUM, AVG, MIN, MAX and GROUP BY
So far every query has returned individual rows. But sometimes you don't want individual records — you want a summary. How many books does each author have? What's the average price per genre? What's the most expensive book? Aggregate functions answer these questions by collapsing many rows into a single calculated value.
The five core aggregate functions: COUNT() counts rows, SUM() adds up numeric values, AVG() calculates the average, MIN() finds the smallest value, MAX() finds the largest.
Aggregate functions become truly powerful when combined with GROUP BY. GROUP BY splits your table into groups (one group per unique value in a column) and then runs the aggregate function on each group separately. The result is one summary row per group.
HAVING is like WHERE but for aggregated results. You can't use WHERE to filter on a COUNT or SUM because those values don't exist until after grouping happens. HAVING runs after the groups are formed, so it can filter on them.
-- ============================================================ -- Aggregate functions and GROUP BY on our bookstore data. -- This is where SQL starts to feel genuinely powerful. -- ============================================================ -- Query 1: How many books are in the entire table? -- COUNT(*) counts every row regardless of NULL values. SELECT COUNT(*) AS total_books FROM books; -- Query 2: What is the average price of all books? SELECT ROUND(AVG(price), 2) AS average_price FROM books; -- ROUND(value, 2) rounds the result to 2 decimal places. -- Query 3: Most expensive and cheapest single book. SELECT MAX(price) AS most_expensive, MIN(price) AS cheapest FROM books; -- Query 4: How many books does each author have? -- GROUP BY splits the table into one group per author_id, -- then COUNT(*) counts the rows in each group. SELECT author_id, COUNT(*) AS number_of_books FROM books GROUP BY author_id ORDER BY number_of_books DESC; -- Query 5: Average price per genre. SELECT genre, COUNT(*) AS book_count, ROUND(AVG(price),2) AS avg_price FROM books GROUP BY genre ORDER BY avg_price DESC; -- Query 6: HAVING — only show genres that have MORE than 1 book. -- We cannot use WHERE here because book_count doesn't exist yet -- at the point WHERE runs. HAVING runs AFTER GROUP BY. SELECT genre, COUNT(*) AS book_count FROM books GROUP BY genre HAVING book_count > 1;
-- total_books
-- 6
-- Query 2 output:
-- average_price
-- 10.16
-- Query 3 output:
-- most_expensive | cheapest
-- 12.99 | 7.49
-- Query 4 output (books per author):
-- author_id | number_of_books
-- 3 | 2 (Chimamanda — Americanah + Purple Hibiscus)
-- 1 | 2 (Orwell — 1984 + Animal Farm)
-- 2 | 1
-- 4 | 1
-- Query 5 output (avg price per genre):
-- genre | book_count | avg_price
-- Magical Realism | 1 | 12.99
-- Literary Fiction | 3 | 10.49
-- Dystopian | 1 | 9.99
-- Satire | 1 | 7.49
-- Query 6 output (genres with more than 1 book):
-- genre | book_count
-- Literary Fiction | 3
| Feature | WHERE | HAVING |
|---|---|---|
| What it filters | Individual rows | Groups of rows (post-aggregation) |
| When it runs | Before GROUP BY | After GROUP BY |
| Works with aggregates? | No — causes an error | Yes — designed for it |
| Example use case | WHERE price > 10 | HAVING COUNT(*) > 2 |
| Required clause | No — optional | No — only needed with GROUP BY |
| Can replace the other? | No | No — they serve different stages |
🎯 Key Takeaways
- SQL reads like English on purpose — SELECT [columns] FROM [table] WHERE [condition] is the skeleton of almost every query you'll ever write.
- NULL is not a value — it means 'unknown'. Never compare it with = or !=. Always use IS NULL or IS NOT NULL, or you'll silently filter out data that should be there.
- WHERE filters rows before grouping; HAVING filters groups after grouping — confusing the two is one of the most common SQL errors at every experience level.
- Always add LIMIT when exploring an unfamiliar table — running an unbounded SELECT * on a production table with millions of rows can lock up your client, slow the database, and make you very unpopular very fast.
⚠ Common Mistakes to Avoid
- ✕Mistake 1: Using single quotes for column/table names and double quotes for strings — SQL is the opposite of most languages. String values go in single quotes: WHERE genre = 'Satire'. Table and column names that need quoting use double quotes: SELECT "Book Title". Mixing them up gives you either wrong results or a syntax error like 'column Satire does not exist'.
- ✕Mistake 2: Forgetting that NULL is not a value — you cannot compare NULL with = or !=. Writing WHERE country = NULL will always return zero rows, silently and without error. The correct syntax is WHERE country IS NULL or WHERE country IS NOT NULL. This is one of the trickiest beginner traps because SQL won't warn you — it just quietly returns nothing.
- ✕Mistake 3: Selecting a non-aggregated column without including it in GROUP BY — if you write SELECT genre, title, COUNT(*) FROM books GROUP BY genre, most SQL databases will throw an error like 'column title must appear in the GROUP BY clause or be used in an aggregate function'. Every column in your SELECT list must either be in the GROUP BY or wrapped in an aggregate function like MAX(title). PostgreSQL enforces this strictly; older MySQL versions silently return random values, which is even more dangerous.
Interview Questions on This Topic
- QWhat's the difference between WHERE and HAVING in SQL, and can you give a concrete example of when you'd use each?
- QIf I run SELECT * FROM users WHERE last_login = NULL and get zero rows back even though I know null values exist in that column, what's wrong and how do you fix it?
- QWhat does GROUP BY actually do to the rows in a table, and why does every column in the SELECT list need to either be in the GROUP BY clause or wrapped in an aggregate function?
Frequently Asked Questions
Do I need to install anything to start learning SQL?
No installation needed to get started. SQLiteOnline.com and DB Fiddle both let you write and run SQL directly in your browser for free. If you want a local setup, SQLite is a single file with no server required — DB Browser for SQLite gives it a visual interface. All examples in this article run in any of these environments without modification.
What's the difference between SQL and MySQL (or PostgreSQL)?
SQL is the language — a standard defined by ISO. MySQL, PostgreSQL, SQLite, and SQL Server are database engines: software systems that store data and understand SQL. It's the same relationship as English being a language and Google Docs being a tool that accepts English. The core SQL you've learned here works across all of them; the differences only appear in advanced or edge-case features.
Is SQL still worth learning in 2025 with NoSQL and AI tools around?
Absolutely yes. SQL is the single most in-demand data skill across engineering, analytics, product, and business roles — consistently appearing in more job postings than any other technical skill. NoSQL databases handle unstructured or highly scalable data, but relational databases remain the backbone of most applications. Even AI data tools like Copilot for data analysis generate SQL under the hood — understanding what they produce is still your responsibility.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.