Beginner 4 min · March 06, 2026

SQL Aggregate Functions

SQL AVG NULL Behavior — Why Your LTV Report Was 23% Off

Q: Can I use multiple aggregate functions in the same SELECT?

Yes. SELECT COUNT(*), SUM(total), AVG(total), MIN(total), MAX(total) FROM orders is perfectly valid. All five aggregate functions run over the same set of rows and return their respective summary values in a single result row.

Q: What does SUM return if all values in the column are NULL?

SUM returns NULL, not zero. This catches developers off guard when building dashboards — an empty period returns NULL rather than 0. Wrap with COALESCE: COALESCE(SUM(column), 0) to return zero when no non-NULL rows exist.

Q: How do I count only rows that match a condition within an aggregate?

Use COUNT with a CASE expression: COUNT(CASE WHEN status = 'completed' THEN 1 END) counts only completed orders. This is the conditional count pattern — also works with SUM: SUM(CASE WHEN status = 'completed' THEN total ELSE 0 END). Some databases support COUNT(column) FILTER (WHERE condition) syntax as a cleaner alternative.

AVG(lifetime_value) gave 23% too-high LTV by skipping NULLs.

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

✓ Production

production tested

July 27, 2026

last updated

1,713

articles · all by Naren

Before you start⏱ 20 min

✓Basic programming fundamentals
✓A computer with internet access
✓Willingness to follow along with examples

● Production Incident 🔎 Debug Guide

⚡Quick Answer

Aggregate functions collapse multiple rows into a single summary value — COUNT, SUM, AVG, MIN, MAX
COUNT(*) counts every row including NULLs; COUNT(column) skips NULLs silently
AVG also skips NULLs — AVG of (10, 20, NULL) returns 15, not 10
GROUP BY groups rows by a column before aggregating — required when mixing aggregate and non-aggregate columns in SELECT
HAVING filters groups after aggregation; WHERE filters rows before aggregation
Biggest mistake: selecting a non-aggregated column without including it in GROUP BY — most databases throw an error

✦ Definition~90s read

What is SQL Aggregate Functions?

Aggregate functions are SQL operations that collapse multiple rows into a single summary value — they exist because raw row-level data is rarely useful for decision-making. Instead of scanning 10 million transaction records, you want to know the average order value, total revenue, or highest-spending customer.

★

Imagine you have a spreadsheet with 10,000 sales records.

These functions (COUNT, SUM, AVG, MIN, MAX) are the foundation of every reporting query, dashboard, and business metric. Without them, you'd be writing application-level loops over database cursors, which is both slow and error-prone.

In practice, aggregate functions are almost always paired with GROUP BY to produce per-category summaries — average revenue per customer cohort, total sales per region, or max transaction per day. The HAVING clause then filters those grouped results, acting as a WHERE for aggregates.

This trio (aggregate + GROUP BY + HAVING) is the core pattern for any analytical query in SQL, from PostgreSQL to Snowflake to BigQuery.

The critical trap — and the reason your LTV report was off by 23% — is NULL handling. AVG ignores NULLs entirely, dividing only by non-NULL rows. If your revenue column has NULLs for free trials or churned users, AVG silently computes a misleading average.

COUNT(*) counts all rows; COUNT(column) excludes NULLs. SUM returns NULL if all values are NULL, not zero. These behaviors are consistent across every major SQL dialect, but they routinely destroy reports when data pipelines produce unexpected NULLs. Understanding this is the difference between a dashboard that looks right and one that actually is right.

Plain-English First

Imagine you have a spreadsheet with 10,000 sales records. Aggregate functions are the 'Totals' row at the bottom — COUNT tells you how many rows you have, SUM adds them all up, AVG gives you the average, and MIN/MAX find the smallest and largest values. GROUP BY is what lets you get a separate total row for each category, region, or customer rather than just one grand total for everything.

Every real application — whether it's a shopping site showing 'Average rating: 4.7 stars', a bank dashboard showing 'Total spent this month: $1,240', or an HR tool reporting 'Headcount: 342 employees' — uses aggregate functions behind the scenes. They're the bridge between raw row-level data and the summary numbers that drive business decisions.

The rules are simple until NULL gets involved. COUNT(*) and COUNT(column_name) behave completely differently. AVG silently ignores NULL values. GROUP BY requires every non-aggregated column in your SELECT. These aren't obscure edge cases — they're the cause of a surprisingly large percentage of wrong report numbers in production.

What Aggregate Functions Actually Do (And Why They Exist)

A regular SQL query like SELECT name FROM employees returns one row per employee — every single row that matches your WHERE clause. An aggregate function changes that contract entirely. Instead of 'give me all the rows', it says 'give me one summarised value across all the rows.' This is what makes them aggregate — they combine many rows into one.

Five aggregate functions cover the vast majority of real-world analytics: COUNT (how many rows), SUM (total value), AVG (mean value), MIN (smallest value), MAX (largest value). All five share a critical characteristic: they handle NULL values by ignoring them — except COUNT(*), which is the one exception.

aggregate_basics.sqlSQL

-- Setup: a simple orders table
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    total       DECIMAL(8,2),
    status      VARCHAR(20)
);

INSERT INTO orders VALUES
    (1, 101, 49.99,  'completed'),
    (2, 102, 120.00, 'completed'),
    (3, 101, NULL,   'pending'),   -- NULL total: not yet charged
    (4, 103, 75.50,  'completed'),
    (5, 102, 30.00,  'refunded');

-- Five aggregates on the same column
SELECT
    COUNT(*)            AS total_rows,       -- 5: counts every row
    COUNT(total)        AS rows_with_total,  -- 4: skips the NULL row
    SUM(total)          AS revenue,          -- 275.49: skips NULL
    AVG(total)          AS avg_order,        -- 68.87: 275.49/4, not /5!
    MIN(total)          AS smallest,
    MAX(total)          AS largest
FROM orders;

Output

-----------+-----------------+---------+-----------+----------+--------

5 | 4 | 275.49 | 68.87 | 30.00 | 120.00

⚠ AVG Skips NULLs — This Is Almost Never What Stakeholders Expect

AVG(total) on these 5 rows divides by 4, not 5, because NULL is excluded. If the pending order should count as zero revenue, use AVG(COALESCE(total, 0)). If it should genuinely be excluded from the average, the implicit NULL skip is correct — but document it explicitly so the next engineer reading the query understands the intent.

📊 Production Insight

COUNT(*) vs COUNT(column) is the source of countless wrong dashboard numbers in production.

Always run both side by side when debugging a count discrepancy — the difference reveals your NULL row count.

For revenue calculations, always COALESCE NULL numeric columns explicitly rather than relying on implicit skip behavior.

🎯 Key Takeaway

Aggregate functions collapse many rows into one summary — that's their entire purpose.

All five functions skip NULLs, except COUNT(*) which counts every row regardless.

Document NULL treatment explicitly in any aggregate query that will be read by business stakeholders.

thecodeforge.io

Sql Aggregate Functions

COUNT, SUM and AVG — The Most-Used Trio

COUNT comes in two forms with very different behavior. COUNT(*) counts every row in the result set, including rows with NULL values in every column. COUNT(column_name) counts only rows where that specific column is not NULL. This distinction bites constantly in production.

SUM adds all non-NULL values. NULL rows contribute nothing. SUM of a column with all NULLs returns NULL, not zero — this surprises developers who expect 0. Use COALESCE(SUM(column), 0) when zero is the correct return for an all-NULL column.

AVG computes the mean of non-NULL values only. It's mathematically equivalent to SUM(column) / COUNT(column) — both numerator and denominator exclude NULLs.

count_sum_avg_queries.sqlSQL

-- Demonstrating the COUNT(*) vs COUNT(column) difference
SELECT
    COUNT(*)           AS all_orders,
    COUNT(total)       AS charged_orders,
    COUNT(*) - COUNT(total) AS pending_not_charged,
    SUM(total)         AS total_revenue,
    COALESCE(SUM(total), 0) AS revenue_safe,  -- 0 if all NULLs
    AVG(total)         AS avg_charged,        -- excludes NULLs
    AVG(COALESCE(total, 0)) AS avg_all        -- includes NULLs as zero
FROM orders
WHERE status != 'refunded';

-- COUNT DISTINCT: unique customer count
SELECT
    COUNT(DISTINCT customer_id) AS unique_customers,
    COUNT(customer_id)          AS total_orders
FROM orders
WHERE status = 'completed';

Output

-----------+----------------+---------------------+---------------+-------------+--------

4 | 3 | 1 | 245.49 | 81.83 | 61.37

💡COUNT DISTINCT is Your Row Deduplication Tool

COUNT(DISTINCT customer_id) gives you unique customers; COUNT(customer_id) gives you total orders. If Alice placed 3 orders, COUNT includes her 3 times but COUNT DISTINCT includes her once. This is the pattern behind 'unique visitors', 'active users', and 'distinct products ordered'.

📊 Production Insight

SUM of an all-NULL column returns NULL, not zero — wrap with COALESCE(SUM(col), 0) for dashboard totals.

COUNT DISTINCT is expensive on large tables — it requires sorting or hashing all values before counting.

For approximate unique counts at scale, consider HyperLogLog (available in PostgreSQL, Redis, BigQuery) which trades 1% accuracy for O(1) memory.

🎯 Key Takeaway

COUNT(*) ≠ COUNT(column) — the difference is exactly the number of NULL rows in that column.

SUM returns NULL (not zero) when all input values are NULL — always COALESCE for dashboard totals.

COUNT DISTINCT is the right tool for unique user/product counts but is expensive — index the column.

MIN and MAX — Finding the Extremes in Your Data

MIN and MAX are the simplest aggregate functions to understand, and also the most underestimated. They don't just work on numbers — they work on dates, times, and text too. MIN on a date column gives the earliest date. MAX on a VARCHAR column gives the last value alphabetically. This makes them useful for pattern like 'first order date', 'most recent login', and 'last employee hired'.

min_max_all_five_aggregates.sqlSQL

-- MIN/MAX on dates: first and most recent order per customer
SELECT
    customer_id,
    COUNT(*)            AS order_count,
    MIN(created_at)     AS first_order_date,
    MAX(created_at)     AS most_recent_order,
    SUM(total)          AS lifetime_value,
    MAX(total)          AS largest_single_order
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
ORDER BY lifetime_value DESC;

Output

------------+-------------+------------------+-------------------+----------------+--------------

101 | 5 | 2024-01-03 | 2026-03-15 | 1240.50 | 349.99

102 | 3 | 2024-06-12 | 2026-02-28 | 820.00 | 420.00

🔥MIN and MAX Skip NULLs Too

Like all aggregate functions, MIN and MAX skip NULL values. If a column has 4 values and one NULL, MIN and MAX operate on the 4 non-NULL values. This is usually correct behavior for date columns (NULLs typically mean 'not set yet') but worth being explicit about in documentation.

📊 Production Insight

MIN(created_at) per customer_id is the standard pattern for 'acquisition date' in cohort analysis.

MAX(updated_at) identifies which rows were most recently modified — useful for incremental ETL pipelines.

For the single row with the minimum or maximum value (not just the value itself), use ORDER BY + LIMIT 1 or a window function — MIN/MAX alone can't return the full row.

🎯 Key Takeaway

MIN and MAX work on dates and text, not just numbers — first_order_date and last_login are MIN/MAX patterns.

Neither function returns the full row — use ORDER BY + LIMIT 1 or ROW_NUMBER() when you need the whole record.

Both skip NULLs — document explicitly if NULL means something specific in your column.

thecodeforge.io

Sql Aggregate Functions

GROUP BY — Where Aggregate Functions Become Truly Powerful

Everything so far has given us a single row summarising the whole table. GROUP BY changes that: instead of one grand total, it calculates a separate aggregate for each unique value in the grouping column. This is what produces 'revenue by category', 'orders per customer', 'signups per month'.

The rule is strict: every column in SELECT must either be in GROUP BY or wrapped in an aggregate function. You cannot SELECT a column that isn't in GROUP BY and isn't aggregated — it's undefined which row's value should appear for the group.

group_by_with_aggregates.sqlSQL

-- Revenue and order count by status
SELECT
    status,
    COUNT(*)         AS order_count,
    SUM(total)       AS total_revenue,
    AVG(total)       AS avg_order_value,
    MIN(total)       AS smallest_order,
    MAX(total)       AS largest_order
FROM orders
GROUP BY status
ORDER BY total_revenue DESC;

-- HAVING: only show statuses with 2 or more orders
SELECT
    status,
    COUNT(*) AS order_count,
    SUM(total) AS revenue
FROM orders
GROUP BY status
HAVING COUNT(*) >= 2  -- filter AFTER aggregation
ORDER BY revenue DESC;

Output

status | order_count | total_revenue | avg_order_value

-----------+-------------+---------------+----------------

completed | 3 | 245.49 | 81.83

refunded | 1 | 30.00 | 30.00

pending | 1 | NULL | NULL

⚠ WHERE Filters Rows; HAVING Filters Groups

WHERE runs before GROUP BY — it removes individual rows. HAVING runs after GROUP BY — it removes entire groups. You cannot use aggregate functions in WHERE. HAVING COUNT() > 5 is valid; WHERE COUNT() > 5 throws an error. Use both in the same query when you need to: WHERE to pre-filter rows, HAVING to post-filter groups.

📊 Production Insight

GROUP BY without an ORDER BY returns groups in undefined order — always add ORDER BY for consistent report output.

Grouping by date truncated to a period (DATE_TRUNC('month', created_at)) is the standard pattern for monthly trend reports.

For large tables, GROUP BY on an unindexed column forces a hash aggregate or sort — add an index on the GROUP BY column for repeated report queries.

🎯 Key Takeaway

GROUP BY produces one aggregate row per unique value — the foundation of every summary report.

Every SELECT column must be in GROUP BY or an aggregate function — this is a hard rule, not a style preference.

WHERE filters rows before grouping; HAVING filters groups after aggregation — they serve different purposes.

HAVING — The WHERE Clause for Aggregated Data

You can't use WHERE on aggregate results. That's not a bug, it's a SQL evaluation order constraint. WHERE filters rows before aggregation happens. Once COUNT(), SUM(), or AVG() collapse your rows into a single value per group, WHERE has already finished its job.

HAVING is your escape hatch. It runs after GROUP BY and aggregation, letting you filter on the computed result. Think of it as WHERE for groups, not rows. If you want departments where average salary exceeds $80,000, HAVING AVG(salary) > 80000 is the only way. Writing that filter in WHERE would silently fail or return nonsense.

The catch: HAVING forces a full scan of all groups before filtering. On large datasets with many groups, this hurts. Use WHERE to eliminate rows early whenever possible. Push filtering down to WHERE before aggregation, then only use HAVING for post-aggregation conditions. Your query planner will thank you.

DepartmentSalaryAnalysis.sqlSQL

// io.thecodeforge — database tutorial

-- Find departments with above-average total payroll
SELECT 
    department_id,
    COUNT(employee_id) AS headcount,
    SUM(salary) AS total_payroll,
    ROUND(AVG(salary), 2) AS avg_salary
FROM employees
-- WHERE happens here, before aggregation
WHERE termination_date IS NULL
GROUP BY department_id
-- HAVING happens after aggregation
HAVING SUM(salary) > 500000
ORDER BY total_payroll DESC;

Output

department_id | headcount | total_payroll | avg_salary

--------------|-----------|---------------|-----------

17 | 23 | 1245000.00 | 54130.43

42 | 31 | 1089000.00 | 35129.03

5 | 19 | 823000.00 | 43315.79

⚠ Production Trap:

HAVING without GROUP BY is valid SQL but rarely what you want. It treats the entire result set as one group. You'll get a single row back with no way to slice further. Debugging that at 2 AM is not fun.

🎯 Key Takeaway

WHERE filters rows, HAVING filters groups. Mix them up and your queries either break silently or run 10x slower than they should.

NULL Handling — The Silent Data Poison

Aggregate functions and NULL have a complicated relationship. COUNT(*) counts everything, including rows where every column is NULL. COUNT(column_name) skips NULLs entirely. SUM(), AVG(), MIN(), MAX() all silently discard NULL values before computing. Your average might be calculated over 80% of your data and you'd never know.

This becomes a production nightmare when NULLs represent missing data versus actual zeros. A NULL salary could mean "not yet entered" or "contractor with no salary." Either way, AVG(salary) treats both identically — it skips them. The result looks fine but misrepresents reality. Suddenly your quarterly compensation report shows an average that's 15% too high because entry-level positions had NULLs in the pipeline.

Mitigation: Always check NULL percentages before running aggregates. Use COUNT(*) vs COUNT(column) to gauge nullability. For critical reports, consider COALESCE() or explicit NULL handling. Better yet, enforce NOT NULL constraints at the schema level if NULLs aren't semantically valid. Database design debt shows up first in aggregate queries.

NullExposureCheck.sqlSQL

// io.thecodeforge — database tutorial

-- Spot NULL poisoning before it hits your aggregates
SELECT 
    COUNT(*) AS total_rows,
    COUNT(employee_id) AS ids_present,
    COUNT(salary) AS salaries_present,
    COUNT(*) - COUNT(salary) AS missing_salaries,
    ROUND(100.0 * COUNT(salary) / COUNT(*), 1) AS salary_completeness_pct,
    ROUND(AVG(salary), 2) AS naive_average,
    ROUND(AVG(COALESCE(salary, 0)), 2) AS zero_filled_average
FROM employees
WHERE department_id = 17;

Output

-----------|-------------|-----------------|-----------------|------------------------|---------------|-------------------

23 | 23 | 18 | 5 | 78.3 | 54130.43 | 42391.30

🔥Senior Shortcut:

Run a NULL audit once per quarter on columns you aggregate. A 5% NULL rate on a 10-million-row table means your averages are built on only 9.5M rows. That's a half-million phantom rows of uncertainty.

🎯 Key Takeaway

Aggregate functions are NULL-agnostic by default. That's a feature until it's a bug. Always know your NULL ratio before trusting an average.

Understanding SQL aggregate functions is essential, but they don't exist in isolation. To build robust data pipelines, you should pair them with window functions like ROW_NUMBER() and RANK() for running totals without collapsing rows. JOINs are critical: aggregates often query multiple tables, so mastering INNER JOIN and LEFT JOIN ensures your SUM or COUNT reflects correct relationships. Subqueries let you nest aggregates — for instance, finding departments above average salary. Finally, indexes on grouped columns drastically improve performance; without them, large datasets turn aggregate queries into bottlenecks. Explore these topics to move beyond basic reporting into performant, analytic SQL.

Related_Aggregates.sqlSQL

// io.thecodeforge — database tutorial
// 25 lines max
WITH dept_avg AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT d.department_name, da.avg_salary
FROM departments d
JOIN dept_avg da ON d.id = da.department_id
WHERE da.avg_salary > (
    SELECT AVG(salary) FROM employees
);

Output

department_name | avg_salary

Engineering | 95000.00

Sales | 82000.00

⚠ Production Trap:

Aggregates on unindexed columns in JOINs cause full table scans. Always verify execution plans before deploying.

🎯 Key Takeaway

Pair aggregates with window functions and proper indexing for scalable queries.

Feedback

Your input directly shapes TheCodeForge.io's database content. Did the NULL handling section clarify how COUNT(*) differs from COUNT(column)? Was the HAVING example with real sales data intuitive, or did it need more edge cases? I want to know if the code snippets run cleanly in your environment — particularly the MIN/MAX on timestamps, which often trips up junior engineers. Drop a comment below or open an issue in our GitHub repo. Specific feedback like 'Show how AVG ignores NULLs in salary calculations' helps us refine future tutorials. Every response gets a direct reply from our senior engineering team. We build these guides for you, not for search engines.

Feedback_Example.sqlSQL

// io.thecodeforge — database tutorial
// 25 lines max
SELECT 
    department_id,
    COUNT(*) AS total_rows,
    COUNT(salary) AS non_null_salaries,
    AVG(salary) AS avg_salary
FROM employees
WHERE department_id IN (10, 20)
GROUP BY department_id;

Output

department_id | total_rows | non_null_salaries | avg_salary

10 | 15 | 12 | 68500.00

20 | 10 | 10 | 72000.00

⚠ Production Trap:

COUNT(*) includes rows with all-NULL columns; COUNT(column) does not. Misunderstanding this skews reporting metrics.

🎯 Key Takeaway

Test aggregate behavior with NULL data in your staging environment before production deployment.

Conclusion

SQL aggregate functions transform raw rows into actionable insights — but only when you apply them with precision. We've covered the mechanics of COUNT, SUM, AVG, MIN, MAX, and the critical grouping duo GROUP BY and HAVING. The key lesson: aggregates collapse data, so always ask 'What am I losing?' when grouping. Beware NULLs: they silently exclude rows from AVG and SUM unless you use COALESCE. Start your next query by identifying the business question, then pick the aggregate that answers it directly. Run EXPLAIN ANALYZE on heavy aggregations. Master these tools, and you'll move from writing queries to designing data narratives.

Conclusion_Check.sqlSQL

// io.thecodeforge — database tutorial
// 25 lines max
SELECT 
    EXTRACT(YEAR FROM order_date) AS year,
    COUNT(DISTINCT customer_id) AS unique_buyers,
    ROUND(AVG(total_amount)::numeric, 2) AS avg_order_value
FROM orders
WHERE status = 'completed'
GROUP BY year
HAVING COUNT(*) > 100
ORDER BY year DESC;

Output

year | unique_buyers | avg_order_value

2024 | 342 | 125.50

2023 | 298 | 118.75

⚠ Production Trap:

HAVING without GROUP BY treats entire table as one group — rarely intended. Always verify your GROUP BY logic.

🎯 Key Takeaway

Start with the business question, then choose your aggregate. Always test with edge cases.

● Production incidentPOST-MORTEMseverity: high

Customer Lifetime Value Report Was 23% Higher Than Reality

Symptom

Monthly business review showed customer LTV 23% above the finance team's number. Both teams were querying the same database. No data discrepancy — just different queries.

Assumption

The analytics team used AVG(lifetime_value) and assumed NULL lifetime_value meant the customer had zero value. Finance used SUM(lifetime_value) / COUNT(*) which treated NULLs as zero in the denominator.

Root cause

AVG(lifetime_value) skips NULL rows entirely — it only averages the rows where a value exists. Customers who had never completed a purchase had NULL lifetime_value. The analytics query divided by the number of paying customers only; finance divided by all customers including nulls.

Fix

Changed to AVG(COALESCE(lifetime_value, 0)) to treat non-purchasing customers as zero value. Added a comment explaining the NULL treatment so the query is unambiguous to future readers.

Key lesson

AVG skips NULLs — this is mathematically valid but rarely what business stakeholders expect
Explicitly COALESCE NULL numeric columns before aggregating when zero is the intended value
When two queries on the same data return different aggregates, NULL treatment is the first thing to check

Production debug guideDiagnosing wrong counts, inflated averages, and GROUP BY errors4 entries

Symptom · 01

COUNT returns fewer rows than expected

→

Fix

You are using COUNT(column_name) on a column with NULLs. Switch to COUNT() to count all rows, or run SELECT COUNT(), COUNT(column_name) side by side to see the gap. The difference is your NULL row count.

Symptom · 02

AVG is higher than expected

→

Fix

NULLs are being excluded from the average. Run SELECT COUNT(), COUNT(column_name), SUM(column_name) to verify. If COUNT() != COUNT(column_name), you have NULLs. Use AVG(COALESCE(column_name, 0)) if zero is the correct treatment.

Symptom · 03

Error: column must appear in GROUP BY or aggregate function

→

Fix

You have a non-aggregated column in SELECT that isn't in GROUP BY. Either add it to GROUP BY, wrap it in an aggregate (MAX, MIN), or remove it from SELECT. PostgreSQL and SQL Server are strict about this; MySQL's ONLY_FULL_GROUP_BY mode controls it.

Symptom · 04

HAVING filter isn't working as expected

→

Fix

Check if you meant WHERE. HAVING filters groups after aggregation — it can use aggregate conditions like HAVING COUNT(*) > 5. If you're filtering on a non-aggregated column, use WHERE before GROUP BY.

Function	What It Computes	NULL Behavior	Works On
COUNT(*)	Total row count	Includes NULLs	Any table
COUNT(col)	Non-NULL row count	Skips NULLs	Any column
SUM(col)	Total of all values	Skips NULLs; returns NULL if all NULL	Numeric
AVG(col)	Mean of all values	Skips NULLs — divides by non-NULL count	Numeric
MIN(col)	Smallest value	Skips NULLs	Numeric, date, text
MAX(col)	Largest value	Skips NULLs	Numeric, date, text

⚙ Quick Reference

9 commands from this guide

File	Command / Code	Purpose
aggregate_basics.sql	CREATE TABLE orders (	What Aggregate Functions Actually Do (And Why They Exist)
count_sum_avg_queries.sql	SELECT	COUNT, SUM and AVG
min_max_all_five_aggregates.sql	SELECT	MIN and MAX
group_by_with_aggregates.sql	SELECT	GROUP BY
DepartmentSalaryAnalysis.sql	SELECT	HAVING
NullExposureCheck.sql	SELECT	NULL Handling
Related_Aggregates.sql	WITH dept_avg AS (	Related Content
Feedback_Example.sql	SELECT	Feedback
Conclusion_Check.sql	SELECT	Conclusion

Key takeaways

COUNT(*) counts every row including NULLs; COUNT(column) skips NULLs

the difference equals your NULL row count.

All aggregate functions except COUNT(*) skip NULL values

document NULL treatment explicitly for any business metric.

AVG skips NULLs in the denominator

use AVG(COALESCE(col, 0)) when NULL should mean zero.

GROUP BY requires every non-aggregated SELECT column to be in the GROUP BY clause

this is a hard rule.

WHERE filters rows before GROUP BY; HAVING filters groups after

aggregate functions belong in HAVING, not WHERE.

Common mistakes to avoid

3 patterns

Using COUNT(column) when COUNT(*) is intended

Symptom

Row count is lower than expected — the discrepancy equals the number of NULL rows in that column, but no error is thrown

Fix

Use COUNT(*) to count all rows regardless of NULLs. Only use COUNT(column_name) when you explicitly want to count non-NULL entries in that specific column. Run both side by side to expose the difference.

Selecting a non-aggregated column without adding it to GROUP BY

Symptom

Error: column must appear in GROUP BY clause or be used in an aggregate function (PostgreSQL) or column is not in SELECT list (MySQL in strict mode)

Fix

Every column in SELECT must either appear in GROUP BY or be wrapped in an aggregate function (MAX, MIN, COUNT, SUM, AVG). Add the column to GROUP BY if you want one row per unique combination, or remove it from SELECT if you don't need it.

Using WHERE to filter aggregate results instead of HAVING

Symptom

Error: aggregate functions are not allowed in WHERE — WHERE SUM(amount) > 1000 throws an error immediately

Fix

Use HAVING to filter on aggregate results: HAVING SUM(amount) > 1000. WHERE filters individual rows before grouping; HAVING filters groups after aggregation. Both can appear in the same query.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR

What is the difference between COUNT(*) and COUNT(column_name)?

Q02JUNIOR

What is the difference between WHERE and HAVING? Why can't you use WHERE...

Q03SENIOR

A column contains five values — 10, 20, NULL, 30, NULL. What does AVG re...

Q01 of 03JUNIOR

What is the difference between COUNT(*) and COUNT(column_name)?

ANSWER

COUNT() counts every row in the result set, including rows where every column is NULL. COUNT(column_name) counts only rows where that specific column has a non-NULL value — NULL rows are silently skipped. If a table has 100 rows and 20 have NULL in the email column, COUNT() returns 100 and COUNT(email) returns 80. The difference tells you the NULL count: COUNT() - COUNT(email) = 20. In practice, use COUNT() when you want row counts and COUNT(column) only when you explicitly want to exclude NULLs from the count.

FAQ · 3 QUESTIONS

Frequently Asked Questions

Can I use multiple aggregate functions in the same SELECT?

What does SUM return if all values in the column are NULL?

How do I count only rows that match a condition within an aggregate?

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

✓ Verified

production tested

July 27, 2026

last updated

1,713

articles · all by Naren

🔥

That's SQL Basics. Mark it forged?

4 min read · try the examples if you haven't

SQL AVG NULL Behavior — Why Your LTV Report Was 23% Off

What Aggregate Functions Actually Do (And Why They Exist)

COUNT, SUM and AVG — The Most-Used Trio

MIN and MAX — Finding the Extremes in Your Data

GROUP BY — Where Aggregate Functions Become Truly Powerful

HAVING — The WHERE Clause for Aggregated Data

NULL Handling — The Silent Data Poison

Related Content

Feedback

Conclusion

Customer Lifetime Value Report Was 23% Higher Than Reality

Key takeaways

Common mistakes to avoid

Using COUNT(column) when COUNT(*) is intended

Selecting a non-aggregated column without adding it to GROUP BY

Using WHERE to filter aggregate results instead of HAVING

Interview Questions on This Topic

Frequently Asked Questions

That's SQL Basics. Mark it forged?