SQL GROUP BY — 40% Revenue Inflation from Missing Column
MySQL's lenient GROUP BY mode caused a 40% revenue inflation in reports — use these debug patterns to prevent silent data corruption in aggregate queries.
N
Naren · Founder
Plain-English first. Then code. Then the interview question.
GROUP BY collapses rows sharing the same value into groups — each group produces one aggregate row
Every non-aggregate SELECT column must appear in GROUP BY — this is a hard rule, not a style preference
HAVING filters groups after aggregation — WHERE filters rows before aggregation
GROUP BY column order matters for composite groupings but not for single-column groupings
HAVING can reference aggregate functions; WHERE cannot
Biggest mistake: putting an aggregate function in WHERE — it throws an error because aggregation hasn't happened yet
Plain-English First
Imagine you have a giant pile of receipts from a store — thousands of them, one per sale. GROUP BY is like sorting those receipts into separate piles by category: all the electronics together, all the groceries together, all the clothing together. Once you have those piles, HAVING is the rule you apply to the piles themselves — 'only show me the piles worth more than $500 total.' WHERE, by contrast, would be you throwing away individual receipts before you even start sorting.
Every real application that touches a database eventually needs to answer questions like 'which customers spent the most last month?' or 'which product categories have fewer than ten sales?' These aren't questions about individual rows — they're questions about groups of rows. That distinction is exactly what GROUP BY and HAVING are built for, and understanding them deeply separates developers who can write SQL from developers who can write SQL well.
The problem GROUP BY solves is aggregation: collapsing many rows into summarised results. Without it, you'd have to pull every row into your application layer and crunch the numbers in code — which is slow, memory-hungry, and frankly the database's job. HAVING then solves a second problem that naturally emerges: once you've grouped and aggregated, you need a way to filter those groups. WHERE can't do it because WHERE runs before aggregation happens. HAVING runs after, so it can see the computed totals and counts that WHERE never gets to touch.
By the end of this article you'll understand exactly when to reach for GROUP BY versus a simple WHERE clause, why HAVING exists as a separate keyword rather than just extending WHERE, how to combine them with multiple aggregate functions in a single query, and the specific mistakes that silently return wrong answers instead of errors. You'll also walk away with three interview questions that trip up a surprising number of candidates.
GROUP BY: Collapsing Rows Into Meaningful Summaries
GROUP BY tells the database to treat all rows that share the same value in a column as a single unit, then apply an aggregate function — SUM, COUNT, AVG, MAX, MIN — across that unit. The result set has one row per unique group, not one row per original record.
This is critical to internalise: after a GROUP BY, every column in your SELECT must either be the grouped column itself or wrapped in an aggregate function. If you select a non-grouped, non-aggregated column the database doesn't know which of the many original rows' values to display. Some databases (MySQL with loose mode) will silently pick a random row's value. PostgreSQL and SQL Server will flat-out error. Neither outcome is what you want.
The mental model that helps most: picture the database first sorting all rows by the GROUP BY column, then drawing a horizontal line between each new value, then running your aggregate function on each block between the lines. You only ever see the aggregated result per block — the individual rows are gone from view.
sales_by_category.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Sample table: orders-- Columns: order_id, customer_id, category, amount, order_date-- ── BASIC GROUP BY ──────────────────────────────────────────────-- Goal: find total revenue and number of orders per product categorySELECT
category, -- the grouping column — one row per unique valueCOUNT(order_id) AS order_count, -- how many orders fell into this groupSUM(amount) AS total_revenue,-- aggregate: sum of all amounts in the groupROUND(AVG(amount), 2) AS avg_order_value -- aggregate: mean order sizeFROM orders
GROUPBY category -- collapse every category into one summary rowORDERBY total_revenue DESC; -- show highest-revenue category first-- ── GROUPING BY MULTIPLE COLUMNS ────────────────────────────────-- Goal: break revenue down by category AND by the year of the order-- This gives one row per (category, year) combinationSELECT
category,
EXTRACT(YEARFROM order_date) AS order_year, -- derive year from the date columnCOUNT(order_id) AS order_count,
SUM(amount) AS total_revenue
FROM orders
GROUPBY
category, -- group first by category...EXTRACT(YEARFROM order_date) -- ...then by year within each categoryORDERBY order_year, total_revenue DESC;
If you SELECT a column that isn't in your GROUP BY clause and isn't wrapped in an aggregate, PostgreSQL throws 'column must appear in the GROUP BY clause or be used in an aggregate function.' MySQL in its default mode silently picks an arbitrary row's value — which means you get results that look correct but aren't. Always be explicit: either group by it or aggregate it.
Production Insight
GROUP BY on an unindexed column forces a hash aggregate or sort on the full table — expensive at scale.
Add an index on your most frequent GROUP BY column for repeated report queries: CREATE INDEX idx_orders_status ON orders(status).
For time-series reports, group by DATE_TRUNC('month', created_at) and index the created_at column.
Key Takeaway
GROUP BY is expensive without an index on the grouping column — add indexes for repeated aggregation queries.
NULL values in the GROUP BY column form their own group — decide explicitly whether to include or exclude them.
Every non-aggregate SELECT column must be in GROUP BY — this is SQL's strictest grouping rule.
HAVING: Filtering Groups After Aggregation (Not Before)
Here's the question that unlocks HAVING: once you've run GROUP BY and computed SUM(amount) per category, how do you show only categories where that total exceeds $50,000? You can't use WHERE — by the time WHERE runs, the aggregation hasn't happened yet. WHERE is a row-level filter applied to the raw table. HAVING is a group-level filter applied to the aggregated output.
Think of the SQL execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. WHERE filters raw rows before they enter the grouping step. HAVING filters the grouped rows after aggregation is complete. This is why you can reference SUM(amount) in HAVING but not in WHERE.
HAVING is also where you enforce data quality thresholds. 'Show me customers, but only if they've placed at least 5 orders' is a HAVING condition — COUNT(order_id) >= 5. It's common to combine both WHERE and HAVING in one query: WHERE removes irrelevant raw rows early (which improves performance by feeding fewer rows into the grouping step), and HAVING then prunes the resulting groups.
high_value_categories.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
-- ── HAVING: filter groups by their aggregate value ──────────────-- Goal: find product categories that have generated over $50,000-- and had more than 100 individual ordersSELECT
category,
COUNT(order_id) AS order_count,
SUM(amount) AS total_revenue
FROM orders
GROUPBY category
HAVINGSUM(amount) > 50000-- only keep groups where the SUM clears this barANDCOUNT(order_id) > 100-- AND the group contains more than 100 ordersORDERBY total_revenue DESC;
-- ── WHERE + GROUP BY + HAVING together ──────────────────────────-- Goal: same report, but exclude any orders placed before 2023-- (WHERE trims raw rows BEFORE grouping — faster than filtering in HAVING)SELECT
category,
COUNT(order_id) AS order_count,
SUM(amount) AS total_revenue
FROM orders
WHERE order_date >= '2023-01-01' -- row-level filter: remove old orders FIRSTGROUPBY category
HAVINGSUM(amount) > 50000-- group-level filter: applied AFTER aggregationANDCOUNT(order_id) > 100ORDERBY total_revenue DESC;
-- ── HAVING without GROUP BY (edge case worth knowing) ────────────-- HAVING can technically apply to the entire table as one group.-- This returns the row only if the whole orders table has > 1000 rows.-- Rarely useful, but interviewers love asking if it's valid SQL.SELECTCOUNT(*) AS total_orders
FROM orders
HAVINGCOUNT(*) > 1000;
Output
-- Result of first query:
category | order_count | total_revenue
------------+-------------+--------------
Electronics | 142 | 84350.00
Furniture | 89 | 61200.00
-- (Clothing had 310 orders but only $31,450 revenue — fails the $50k HAVING check)
-- (Groceries fails both conditions)
-- Result of WHERE + GROUP BY + HAVING query (2023 data only):
category | order_count | total_revenue
------------+-------------+--------------
Electronics | 142 | 84350.00
Furniture | 89 | 61200.00
-- Result of HAVING without GROUP BY:
total_orders
------------
1044
-- (Returns the count row because 1044 > 1000 — if < 1000, returns empty set)
Pro Tip: Push Filters Into WHERE When You Can
If a condition doesn't involve an aggregate (no SUM, COUNT, AVG, etc.), put it in WHERE — not HAVING. WHERE filters rows before the database does the expensive grouping work. HAVING filters after. A condition like HAVING category = 'Electronics' works, but WHERE category = 'Electronics' does the same job with less computation because the database skips non-Electronics rows entirely before it starts grouping.
Production Insight
HAVING is evaluated after all the grouping and aggregation work is done — it cannot short-circuit the computation.
For large datasets, use WHERE to pre-filter rows before GROUP BY whenever possible — this reduces the rows being aggregated.
HAVING MIN(total) > 0 is legitimate; WHERE MIN(total) > 0 throws an error.
Key Takeaway
HAVING filters groups, not rows — it runs after GROUP BY and can use aggregate results.
Pre-filter with WHERE before GROUP BY wherever possible — it reduces work before aggregation begins.
HAVING and WHERE can coexist: WHERE filters rows first, HAVING filters groups after.
Real-World Patterns: How GROUP BY and HAVING Are Actually Used
In production code these clauses rarely appear in isolation. Here are three patterns you'll encounter constantly and should be able to write from memory.
The first is the top-N per group pattern: find the most active customers, highest-revenue regions, or most common error codes. You GROUP BY the entity, aggregate a metric, HAVING optionally filters noise, then ORDER BY the metric with LIMIT to get your top results.
The second is the data quality audit pattern: find anomalies by grouping records that should be unique and counting how many duplicates exist. A COUNT(*) > 1 in HAVING is the classic duplicate-detection query every data engineer writes at some point.
The third is the cohort threshold pattern common in analytics: 'show me all users who took action X at least N times.' This is GROUP BY user_id, HAVING COUNT(*) >= N — the backbone of retention and engagement reports. Once you recognise these shapes, you stop treating GROUP BY and HAVING as separate features and start seeing them as a single analytical toolset.
real_world_groupby_patterns.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
-- ── PATTERN 1: Top-N customers by total spend ───────────────────-- Goal: find the 5 highest-spending customers in 2024SELECT
customer_id,
COUNT(order_id) AS total_orders,
SUM(amount) AS lifetime_spend
FROM orders
WHERE order_date >= '2024-01-01' -- narrow to 2024 first (WHERE = pre-filter)GROUPBY customer_id
HAVINGSUM(amount) > 0-- exclude any customer with $0 net (edge case)ORDERBY lifetime_spend DESCLIMIT5; -- take only the top 5 after sorting-- ── PATTERN 2: Duplicate detection ──────────────────────────────-- Goal: find any email addresses that appear more than once in the-- customers table — a sign of duplicate registrationsSELECT
email,
COUNT(*) AS registration_count -- count how many rows share this emailFROM customers
GROUPBY email
HAVINGCOUNT(*) > 1-- HAVING filters to only the duplicatesORDERBY registration_count DESC; -- worst offenders first-- ── PATTERN 3: Engagement cohort — users who placed 3+ orders ───-- Goal: identify loyal repeat customers for a marketing campaignSELECT
customer_id,
COUNT(order_id) AS order_count,
MIN(order_date) AS first_order_date, -- when they became a customerMAX(order_date) AS latest_order_date, -- most recent activitySUM(amount) AS total_spend
FROM orders
GROUPBY customer_id
HAVINGCOUNT(order_id) >= 3-- only customers with 3 or more ordersORDERBY total_spend DESC;
The Pattern 2 duplicate-detection query is one of the most commonly asked SQL problems in interviews. Memorise this shape: SELECT column, COUNT() FROM table GROUP BY column HAVING COUNT() > 1. It's also the foundation of data deduplication work in real data pipelines — interviewers ask it precisely because it's genuinely useful in production.
Production Insight
The ROLLUP modifier (GROUP BY category, subcategory WITH ROLLUP) produces subtotals and grand totals automatically — useful for hierarchical reports without multiple queries.
GROUP BY GROUPING SETS allows computing multiple groupings in a single pass — more efficient than UNION ALL of separate queries.
Date truncation in GROUP BY: DATE_TRUNC('month', created_at) groups by month; EXTRACT(YEAR FROM created_at) groups by year.
Key Takeaway
Real-world GROUP BY combines multiple dimensions: category + month + region are routine.
ROLLUP and GROUPING SETS produce multi-level summaries in one query — avoid N separate GROUP BY queries.
Always ORDER BY the GROUP BY column for deterministic, human-readable report output.
Common Mistakes That Silently Break Your Queries
Most GROUP BY and HAVING bugs don't throw errors — they return wrong results that look plausible, which makes them especially dangerous. These are the ones that catch experienced developers off guard, not just beginners.
The first is using WHERE where you need HAVING, or vice versa. It sounds obvious once you know the execution order, but under deadline pressure it's easy to write WHERE SUM(amount) > 50000 and then be confused when the database throws an error about aggregate functions not being allowed in WHERE.
The second is including a column in SELECT that isn't in GROUP BY and isn't aggregated — discussed earlier, but worth re-emphasising because MySQL's lenient default mode will run the query and silently return a non-deterministic value from one of the grouped rows.
The third is a subtler one: filtering on a column alias in HAVING. Because HAVING runs after SELECT in the logical execution order (but before the alias is fully materialised in most engines), referencing an alias defined in SELECT often fails. You need to repeat the expression.
groupby_mistakes_and_fixes.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
-- ── MISTAKE 1: Using WHERE on an aggregate ───────────────────────-- WRONG — this throws: "aggregate functions are not allowed in WHERE"SELECT category, SUM(amount) AS total_revenue
FROM orders
WHERESUM(amount) > 50000-- ❌ WHERE runs before aggregation — SUM doesn't exist yetGROUPBY category;
-- FIXED — move the aggregate condition to HAVINGSELECT category, SUM(amount) AS total_revenue
FROM orders
GROUPBY category
HAVINGSUM(amount) > 50000; -- ✅ HAVING runs after aggregation — SUM exists here-- ── MISTAKE 2: Referencing a SELECT alias in HAVING ──────────────-- WRONG — most databases can't resolve the alias 'total_revenue' in HAVING-- (PostgreSQL and SQL Server will error; MySQL may accept it as a non-standard extension)SELECT
category,
SUM(amount) AS total_revenue
FROM orders
GROUPBY category
HAVING total_revenue > 50000; -- ❌ alias not guaranteed to be visible in HAVING-- FIXED — repeat the full aggregate expression in HAVINGSELECT
category,
SUM(amount) AS total_revenue
FROM orders
GROUPBY category
HAVINGSUM(amount) > 50000; -- ✅ reference the expression, not the alias-- ── MISTAKE 3: Forgetting that COUNT(*) includes NULLs ───────────-- COUNT(*) counts all rows in the group, including rows with NULL values.-- COUNT(column_name) counts only non-NULL values in that column.-- These can give different numbers — and the difference matters.SELECT
category,
COUNT(*) AS all_rows, -- counts every row, NULLs includedCOUNT(amount) AS non_null_amounts -- counts only rows where amount IS NOT NULLFROM orders
GROUPBY category;
-- If your 'amount' column has NULLs, these two columns will differ.-- Use COUNT(column_name) when you specifically want to exclude NULLs from your count.
Output
-- Mistake 1 (wrong version): ERROR
-- ERROR: aggregate functions are not allowed in WHERE
-- Mistake 1 (fixed): OK
category | total_revenue
------------+--------------
Electronics | 84350.00
Furniture | 61200.00
-- Mistake 2 (wrong version): ERROR in PostgreSQL/SQL Server
-- ERROR: column 'total_revenue' does not exist
-- (MySQL may succeed due to its alias-in-HAVING extension — dangerous portability trap)
Watch Out: MySQL's Alias-in-HAVING Is Non-Standard
MySQL accepts HAVING alias_name > value as a convenience extension to standard SQL. If you write queries that way and then migrate to PostgreSQL or SQL Server, they'll break. Stick to repeating the full aggregate expression in HAVING — it works on every major database engine and makes the query explicit about what it's filtering on.
Production Insight
COUNT(column) vs COUNT(*) inside GROUP BY has the same NULL-skipping difference as in ungrouped aggregates.
Filtering with WHERE before GROUP BY is always faster than filtering groups with HAVING — reduce the dataset first.
For the 'top N per group' pattern (top 3 customers per country), use window functions instead of GROUP BY.
Key Takeaway
WHERE pre-filters rows; HAVING post-filters groups — use WHERE whenever the condition doesn't involve aggregates.
COUNT(column) skips NULLs inside groups just as it does globally — use COUNT(*) for row counts.
For top-N per group, window functions (ROW_NUMBER, RANK) are cleaner than GROUP BY tricks.
● Production incidentPOST-MORTEMseverity: high
Monthly Revenue Report Inflated by 40% Due to Missing GROUP BY Column
Symptom
Monthly product revenue totals were consistently ~40% higher than totals computed by the finance team from raw exports. No errors, no warnings.
Assumption
The developer assumed GROUP BY category was sufficient when the query also selected subcategory without aggregating it. MySQL's lenient mode silently picked an arbitrary subcategory value for each group.
Root cause
SELECT category, subcategory, SUM(revenue) ... GROUP BY category — MySQL with ONLY_FULL_GROUP_BY disabled picked an arbitrary subcategory value per group instead of raising an error. This caused subcategory-level revenue to be attributed to the wrong groups, inflating some and deflating others.
Fix
Added subcategory to GROUP BY: GROUP BY category, subcategory. Enabled ONLY_FULL_GROUP_BY mode in MySQL config so future missing GROUP BY columns throw an error rather than silently producing wrong results.
Key lesson
Enable ONLY_FULL_GROUP_BY in MySQL — lenient mode silently returns wrong data for ambiguous GROUP BY
Every non-aggregate SELECT column must be in GROUP BY — verify this for every GROUP BY query in code review
Compare aggregate query output against raw data exports as a post-deployment sanity check
Production debug guideDiagnosing wrong aggregations and filter errors3 entries
Symptom · 01
Error: aggregate functions not allowed in WHERE
→
Fix
Move aggregate conditions to HAVING: WHERE SUM(amount) > 1000 → HAVING SUM(amount) > 1000. WHERE runs before GROUP BY — aggregate values don't exist yet when WHERE is evaluated.
Symptom · 02
GROUP BY returns different counts than expected
→
Fix
Check for NULLs in the grouping column — NULL forms its own group. Run SELECT grouping_col, COUNT(*) ... GROUP BY grouping_col to see NULL as a separate group. Decide whether to COALESCE(grouping_col, 'Unknown') or filter with WHERE grouping_col IS NOT NULL.
Symptom · 03
MySQL returns results with non-grouped column without error
→
Fix
ONLY_FULL_GROUP_BY mode is disabled. Enable it: SET GLOBAL sql_mode = (SELECT CONCAT(@@sql_mode, ',ONLY_FULL_GROUP_BY')). Without it, MySQL picks arbitrary values for non-grouped columns — this produces silently wrong results.
Aspect
WHERE
HAVING
When it runs
Before GROUP BY — filters raw rows
After GROUP BY — filters aggregated groups
What it can filter on
Any column value in the raw table
Aggregate results (SUM, COUNT, AVG, etc.) or grouped columns
Can use aggregate functions?
No — causes a syntax error
Yes — that's its entire purpose
Performance impact
Reduces rows fed into grouping — faster
Runs after aggregation — no early row reduction benefit
Use case example
WHERE order_date >= '2024-01-01'
HAVING SUM(amount) > 50000
Works without GROUP BY?
Yes — filters all rows normally
Yes, but treats entire table as one group (rare / edge case)
Can they coexist in one query?
Yes — WHERE fires first, then GROUP BY, then HAVING
Yes — the typical production pattern uses both together
Key takeaways
1
GROUP BY collapses many rows into one summary row per unique group value
after that point, every SELECT column must be the group key or an aggregate function, with no exceptions.
2
SQL execution order is FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
WHERE never sees aggregate results; HAVING never sees raw pre-grouped rows. Knowing this order makes every GROUP BY / HAVING decision obvious.
3
HAVING and WHERE can and should coexist
use WHERE to strip irrelevant raw rows before grouping (cheaper), then use HAVING to filter the resulting groups by their aggregate values (necessary for conditions WHERE can't reach).
4
The duplicate-detection pattern
SELECT col, COUNT() FROM table GROUP BY col HAVING COUNT() > 1 — is one of the most practically useful SQL patterns in production data work and one of the most commonly asked interview queries.
Common mistakes to avoid
3 patterns
×
Putting an aggregate function in WHERE
Symptom
Error: aggregate functions are not allowed in WHERE — WHERE SUM(amount) > 1000 throws an error immediately
Fix
Use HAVING for aggregate conditions: GROUP BY category HAVING SUM(amount) > 1000. WHERE runs before grouping — aggregate values don't exist yet. Both WHERE and HAVING can appear together: WHERE filters rows first, HAVING filters groups after.
×
Selecting a non-grouped, non-aggregated column
Symptom
Error in PostgreSQL/SQL Server: column must appear in GROUP BY. Silent wrong results in MySQL with ONLY_FULL_GROUP_BY disabled — MySQL picks an arbitrary value for the ungrouped column.
Fix
Every column in SELECT must be in GROUP BY or wrapped in an aggregate function. Enable ONLY_FULL_GROUP_BY in MySQL to catch this automatically. Add the column to GROUP BY if you want per-column breakdowns, or remove it from SELECT if you don't need it.
×
Using COUNT(*) when COUNT(specific_column) is intended
Symptom
Group counts appear higher than expected — NULLs in the counted column are included in COUNT(*) but would be excluded from COUNT(column_name)
Fix
COUNT(*) counts every row in the group including NULLs. COUNT(column) counts only non-NULL rows. Use COUNT(DISTINCT column) to count unique non-NULL values per group. Run both side by side when debugging count discrepancies.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01JUNIOR
What is the difference between WHERE and HAVING, and can you use them to...
Q02JUNIOR
Write a query to find email addresses that appear more than once in a us...
Q03SENIOR
If you write HAVING category = 'Electronics' instead of WHERE category =...
Q01 of 03JUNIOR
What is the difference between WHERE and HAVING, and can you use them together?
ANSWER
WHERE filters individual rows before GROUP BY runs — it cannot use aggregate functions because aggregation hasn't happened yet. HAVING filters groups after GROUP BY and aggregation — it can use aggregate conditions like HAVING COUNT(*) > 5. Both can appear in the same query: WHERE runs first to reduce the rows being grouped, then GROUP BY aggregates the filtered rows, then HAVING removes groups that don't meet the aggregate condition. Example: SELECT dept, AVG(salary) FROM employees WHERE hire_year > 2020 GROUP BY dept HAVING AVG(salary) > 70000 — WHERE reduces to post-2020 employees, GROUP BY aggregates by department, HAVING keeps only high-paying departments.
Q02 of 03JUNIOR
Write a query to find email addresses that appear more than once in a users table.
ANSWER
SELECT email, COUNT() AS occurrences FROM users GROUP BY email HAVING COUNT() > 1 ORDER BY occurrences DESC. The key here is HAVING COUNT() > 1 — you cannot use WHERE COUNT() > 1 because COUNT hasn't been computed when WHERE runs. GROUP BY email creates one group per unique email address, COUNT(*) counts rows in each group, and HAVING filters to only groups with more than one row (duplicates).
Q03 of 03SENIOR
If you write HAVING category = 'Electronics' instead of WHERE category = 'Electronics', does it work?
ANSWER
Yes, it works — but it's inefficient. HAVING category = 'Electronics' filters groups after GROUP BY runs, meaning the database grouped all categories first and then discarded the non-Electronics groups. WHERE category = 'Electronics' runs before GROUP BY and eliminates non-Electronics rows before any grouping happens, so the database only groups Electronics rows. For non-aggregate conditions like this, WHERE is always more efficient than HAVING because it reduces the dataset before the aggregation work begins.
01
What is the difference between WHERE and HAVING, and can you use them together?
JUNIOR
02
Write a query to find email addresses that appear more than once in a users table.
JUNIOR
03
If you write HAVING category = 'Electronics' instead of WHERE category = 'Electronics', does it work?
SENIOR
FAQ · 3 QUESTIONS
Frequently Asked Questions
01
Can I use HAVING without GROUP BY in SQL?
Yes, technically. Without GROUP BY, the entire table is treated as one single group, and HAVING applies to it as a whole. The query SELECT COUNT() FROM orders HAVING COUNT() > 1000 is valid SQL — it returns the count if the table has over 1000 rows, or an empty result set if not. It's a valid edge case but rarely the right tool — a simple WHERE or a subquery is usually clearer.
Was this helpful?
02
Why can't I use a column alias defined in SELECT inside my HAVING clause?
Because of SQL's logical execution order: HAVING is evaluated before the SELECT list is fully resolved into its final aliases. Standard SQL requires you to repeat the full aggregate expression in HAVING (e.g. HAVING SUM(amount) > 1000, not HAVING total_revenue > 1000). MySQL allows the alias as a non-standard extension, but it's not portable to PostgreSQL, SQL Server, or Oracle.
Was this helpful?
03
What's the performance difference between filtering with WHERE versus HAVING?
WHERE is faster for non-aggregate conditions because it eliminates rows before the GROUP BY step even begins — meaning the database groups fewer rows, which is less work. HAVING runs after the full grouping and aggregation is complete, so it doesn't reduce the cost of the aggregation itself. As a rule: if the condition doesn't involve an aggregate function, put it in WHERE, not HAVING.