Mid-level 3 min · March 06, 2026

SQL AVG NULL Behavior — Why Your LTV Report Was 23% Off

AVG(lifetime_value) gave 23% too-high LTV by skipping NULLs.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • Aggregate functions collapse multiple rows into a single summary value — COUNT, SUM, AVG, MIN, MAX
  • COUNT(*) counts every row including NULLs; COUNT(column) skips NULLs silently
  • AVG also skips NULLs — AVG of (10, 20, NULL) returns 15, not 10
  • GROUP BY groups rows by a column before aggregating — required when mixing aggregate and non-aggregate columns in SELECT
  • HAVING filters groups after aggregation; WHERE filters rows before aggregation
  • Biggest mistake: selecting a non-aggregated column without including it in GROUP BY — most databases throw an error
Plain-English First

Imagine you have a spreadsheet with 10,000 sales records. Aggregate functions are the 'Totals' row at the bottom — COUNT tells you how many rows you have, SUM adds them all up, AVG gives you the average, and MIN/MAX find the smallest and largest values. GROUP BY is what lets you get a separate total row for each category, region, or customer rather than just one grand total for everything.

Every real application — whether it's a shopping site showing 'Average rating: 4.7 stars', a bank dashboard showing 'Total spent this month: $1,240', or an HR tool reporting 'Headcount: 342 employees' — uses aggregate functions behind the scenes. They're the bridge between raw row-level data and the summary numbers that drive business decisions.

The rules are simple until NULL gets involved. COUNT(*) and COUNT(column_name) behave completely differently. AVG silently ignores NULL values. GROUP BY requires every non-aggregated column in your SELECT. These aren't obscure edge cases — they're the cause of a surprisingly large percentage of wrong report numbers in production.

What Aggregate Functions Actually Do (And Why They Exist)

A regular SQL query like SELECT name FROM employees returns one row per employee — every single row that matches your WHERE clause. An aggregate function changes that contract entirely. Instead of 'give me all the rows', it says 'give me one summarised value across all the rows.' This is what makes them aggregate — they combine many rows into one.

Five aggregate functions cover the vast majority of real-world analytics: COUNT (how many rows), SUM (total value), AVG (mean value), MIN (smallest value), MAX (largest value). All five share a critical characteristic: they handle NULL values by ignoring them — except COUNT(*), which is the one exception.

aggregate_basics.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Setup: a simple orders table
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    total       DECIMAL(8,2),
    status      VARCHAR(20)
);

INSERT INTO orders VALUES
    (1, 101, 49.99,  'completed'),
    (2, 102, 120.00, 'completed'),
    (3, 101, NULL,   'pending'),   -- NULL total: not yet charged
    (4, 103, 75.50,  'completed'),
    (5, 102, 30.00,  'refunded');

-- Five aggregates on the same column
SELECT
    COUNT(*)            AS total_rows,       -- 5: counts every row
    COUNT(total)        AS rows_with_total,  -- 4: skips the NULL row
    SUM(total)          AS revenue,          -- 275.49: skips NULL
    AVG(total)          AS avg_order,        -- 68.87: 275.49/4, not /5!
    MIN(total)          AS smallest,
    MAX(total)          AS largest
FROM orders;
Output
total_rows | rows_with_total | revenue | avg_order | smallest | largest
-----------+-----------------+---------+-----------+----------+--------
5 | 4 | 275.49 | 68.87 | 30.00 | 120.00
AVG Skips NULLs — This Is Almost Never What Stakeholders Expect
AVG(total) on these 5 rows divides by 4, not 5, because NULL is excluded. If the pending order should count as zero revenue, use AVG(COALESCE(total, 0)). If it should genuinely be excluded from the average, the implicit NULL skip is correct — but document it explicitly so the next engineer reading the query understands the intent.
Production Insight
COUNT(*) vs COUNT(column) is the source of countless wrong dashboard numbers in production.
Always run both side by side when debugging a count discrepancy — the difference reveals your NULL row count.
For revenue calculations, always COALESCE NULL numeric columns explicitly rather than relying on implicit skip behavior.
Key Takeaway
Aggregate functions collapse many rows into one summary — that's their entire purpose.
All five functions skip NULLs, except COUNT(*) which counts every row regardless.
Document NULL treatment explicitly in any aggregate query that will be read by business stakeholders.

COUNT, SUM and AVG — The Most-Used Trio

COUNT comes in two forms with very different behavior. COUNT(*) counts every row in the result set, including rows with NULL values in every column. COUNT(column_name) counts only rows where that specific column is not NULL. This distinction bites constantly in production.

SUM adds all non-NULL values. NULL rows contribute nothing. SUM of a column with all NULLs returns NULL, not zero — this surprises developers who expect 0. Use COALESCE(SUM(column), 0) when zero is the correct return for an all-NULL column.

AVG computes the mean of non-NULL values only. It's mathematically equivalent to SUM(column) / COUNT(column) — both numerator and denominator exclude NULLs.

count_sum_avg_queries.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Demonstrating the COUNT(*) vs COUNT(column) difference
SELECT
    COUNT(*)           AS all_orders,
    COUNT(total)       AS charged_orders,
    COUNT(*) - COUNT(total) AS pending_not_charged,
    SUM(total)         AS total_revenue,
    COALESCE(SUM(total), 0) AS revenue_safe,  -- 0 if all NULLs
    AVG(total)         AS avg_charged,        -- excludes NULLs
    AVG(COALESCE(total, 0)) AS avg_all        -- includes NULLs as zero
FROM orders
WHERE status != 'refunded';

-- COUNT DISTINCT: unique customer count
SELECT
    COUNT(DISTINCT customer_id) AS unique_customers,
    COUNT(customer_id)          AS total_orders
FROM orders
WHERE status = 'completed';
Output
all_orders | charged_orders | pending_not_charged | total_revenue | avg_charged | avg_all
-----------+----------------+---------------------+---------------+-------------+--------
4 | 3 | 1 | 245.49 | 81.83 | 61.37
COUNT DISTINCT is Your Row Deduplication Tool
COUNT(DISTINCT customer_id) gives you unique customers; COUNT(customer_id) gives you total orders. If Alice placed 3 orders, COUNT includes her 3 times but COUNT DISTINCT includes her once. This is the pattern behind 'unique visitors', 'active users', and 'distinct products ordered'.
Production Insight
SUM of an all-NULL column returns NULL, not zero — wrap with COALESCE(SUM(col), 0) for dashboard totals.
COUNT DISTINCT is expensive on large tables — it requires sorting or hashing all values before counting.
For approximate unique counts at scale, consider HyperLogLog (available in PostgreSQL, Redis, BigQuery) which trades 1% accuracy for O(1) memory.
Key Takeaway
COUNT(*) ≠ COUNT(column) — the difference is exactly the number of NULL rows in that column.
SUM returns NULL (not zero) when all input values are NULL — always COALESCE for dashboard totals.
COUNT DISTINCT is the right tool for unique user/product counts but is expensive — index the column.

MIN and MAX — Finding the Extremes in Your Data

MIN and MAX are the simplest aggregate functions to understand, and also the most underestimated. They don't just work on numbers — they work on dates, times, and text too. MIN on a date column gives the earliest date. MAX on a VARCHAR column gives the last value alphabetically. This makes them useful for pattern like 'first order date', 'most recent login', and 'last employee hired'.

min_max_all_five_aggregates.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
-- MIN/MAX on dates: first and most recent order per customer
SELECT
    customer_id,
    COUNT(*)            AS order_count,
    MIN(created_at)     AS first_order_date,
    MAX(created_at)     AS most_recent_order,
    SUM(total)          AS lifetime_value,
    MAX(total)          AS largest_single_order
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
ORDER BY lifetime_value DESC;
Output
customer_id | order_count | first_order_date | most_recent_order | lifetime_value | largest_order
------------+-------------+------------------+-------------------+----------------+--------------
101 | 5 | 2024-01-03 | 2026-03-15 | 1240.50 | 349.99
102 | 3 | 2024-06-12 | 2026-02-28 | 820.00 | 420.00
MIN and MAX Skip NULLs Too
Like all aggregate functions, MIN and MAX skip NULL values. If a column has 4 values and one NULL, MIN and MAX operate on the 4 non-NULL values. This is usually correct behavior for date columns (NULLs typically mean 'not set yet') but worth being explicit about in documentation.
Production Insight
MIN(created_at) per customer_id is the standard pattern for 'acquisition date' in cohort analysis.
MAX(updated_at) identifies which rows were most recently modified — useful for incremental ETL pipelines.
For the single row with the minimum or maximum value (not just the value itself), use ORDER BY + LIMIT 1 or a window function — MIN/MAX alone can't return the full row.
Key Takeaway
MIN and MAX work on dates and text, not just numbers — first_order_date and last_login are MIN/MAX patterns.
Neither function returns the full row — use ORDER BY + LIMIT 1 or ROW_NUMBER() when you need the whole record.
Both skip NULLs — document explicitly if NULL means something specific in your column.

GROUP BY — Where Aggregate Functions Become Truly Powerful

Everything so far has given us a single row summarising the whole table. GROUP BY changes that: instead of one grand total, it calculates a separate aggregate for each unique value in the grouping column. This is what produces 'revenue by category', 'orders per customer', 'signups per month'.

The rule is strict: every column in SELECT must either be in GROUP BY or wrapped in an aggregate function. You cannot SELECT a column that isn't in GROUP BY and isn't aggregated — it's undefined which row's value should appear for the group.

group_by_with_aggregates.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- Revenue and order count by status
SELECT
    status,
    COUNT(*)         AS order_count,
    SUM(total)       AS total_revenue,
    AVG(total)       AS avg_order_value,
    MIN(total)       AS smallest_order,
    MAX(total)       AS largest_order
FROM orders
GROUP BY status
ORDER BY total_revenue DESC;

-- HAVING: only show statuses with 2 or more orders
SELECT
    status,
    COUNT(*) AS order_count,
    SUM(total) AS revenue
FROM orders
GROUP BY status
HAVING COUNT(*) >= 2  -- filter AFTER aggregation
ORDER BY revenue DESC;
Output
status | order_count | total_revenue | avg_order_value
-----------+-------------+---------------+----------------
completed | 3 | 245.49 | 81.83
refunded | 1 | 30.00 | 30.00
pending | 1 | NULL | NULL
WHERE Filters Rows; HAVING Filters Groups
WHERE runs before GROUP BY — it removes individual rows. HAVING runs after GROUP BY — it removes entire groups. You cannot use aggregate functions in WHERE. HAVING COUNT() > 5 is valid; WHERE COUNT() > 5 throws an error. Use both in the same query when you need to: WHERE to pre-filter rows, HAVING to post-filter groups.
Production Insight
GROUP BY without an ORDER BY returns groups in undefined order — always add ORDER BY for consistent report output.
Grouping by date truncated to a period (DATE_TRUNC('month', created_at)) is the standard pattern for monthly trend reports.
For large tables, GROUP BY on an unindexed column forces a hash aggregate or sort — add an index on the GROUP BY column for repeated report queries.
Key Takeaway
GROUP BY produces one aggregate row per unique value — the foundation of every summary report.
Every SELECT column must be in GROUP BY or an aggregate function — this is a hard rule, not a style preference.
WHERE filters rows before grouping; HAVING filters groups after aggregation — they serve different purposes.
● Production incidentPOST-MORTEMseverity: high

Customer Lifetime Value Report Was 23% Higher Than Reality

Symptom
Monthly business review showed customer LTV 23% above the finance team's number. Both teams were querying the same database. No data discrepancy — just different queries.
Assumption
The analytics team used AVG(lifetime_value) and assumed NULL lifetime_value meant the customer had zero value. Finance used SUM(lifetime_value) / COUNT(*) which treated NULLs as zero in the denominator.
Root cause
AVG(lifetime_value) skips NULL rows entirely — it only averages the rows where a value exists. Customers who had never completed a purchase had NULL lifetime_value. The analytics query divided by the number of paying customers only; finance divided by all customers including nulls.
Fix
Changed to AVG(COALESCE(lifetime_value, 0)) to treat non-purchasing customers as zero value. Added a comment explaining the NULL treatment so the query is unambiguous to future readers.
Key lesson
  • AVG skips NULLs — this is mathematically valid but rarely what business stakeholders expect
  • Explicitly COALESCE NULL numeric columns before aggregating when zero is the intended value
  • When two queries on the same data return different aggregates, NULL treatment is the first thing to check
Production debug guideDiagnosing wrong counts, inflated averages, and GROUP BY errors4 entries
Symptom · 01
COUNT returns fewer rows than expected
Fix
You are using COUNT(column_name) on a column with NULLs. Switch to COUNT() to count all rows, or run SELECT COUNT(), COUNT(column_name) side by side to see the gap. The difference is your NULL row count.
Symptom · 02
AVG is higher than expected
Fix
NULLs are being excluded from the average. Run SELECT COUNT(), COUNT(column_name), SUM(column_name) to verify. If COUNT() != COUNT(column_name), you have NULLs. Use AVG(COALESCE(column_name, 0)) if zero is the correct treatment.
Symptom · 03
Error: column must appear in GROUP BY or aggregate function
Fix
You have a non-aggregated column in SELECT that isn't in GROUP BY. Either add it to GROUP BY, wrap it in an aggregate (MAX, MIN), or remove it from SELECT. PostgreSQL and SQL Server are strict about this; MySQL's ONLY_FULL_GROUP_BY mode controls it.
Symptom · 04
HAVING filter isn't working as expected
Fix
Check if you meant WHERE. HAVING filters groups after aggregation — it can use aggregate conditions like HAVING COUNT(*) > 5. If you're filtering on a non-aggregated column, use WHERE before GROUP BY.
FunctionWhat It ComputesNULL BehaviorWorks On
COUNT(*)Total row countIncludes NULLsAny table
COUNT(col)Non-NULL row countSkips NULLsAny column
SUM(col)Total of all valuesSkips NULLs; returns NULL if all NULLNumeric
AVG(col)Mean of all valuesSkips NULLs — divides by non-NULL countNumeric
MIN(col)Smallest valueSkips NULLsNumeric, date, text
MAX(col)Largest valueSkips NULLsNumeric, date, text

Key takeaways

1
COUNT(*) counts every row including NULLs; COUNT(column) skips NULLs
the difference equals your NULL row count.
2
All aggregate functions except COUNT(*) skip NULL values
document NULL treatment explicitly for any business metric.
3
AVG skips NULLs in the denominator
use AVG(COALESCE(col, 0)) when NULL should mean zero.
4
GROUP BY requires every non-aggregated SELECT column to be in the GROUP BY clause
this is a hard rule.
5
WHERE filters rows before GROUP BY; HAVING filters groups after
aggregate functions belong in HAVING, not WHERE.

Common mistakes to avoid

3 patterns
×

Using COUNT(column) when COUNT(*) is intended

Symptom
Row count is lower than expected — the discrepancy equals the number of NULL rows in that column, but no error is thrown
Fix
Use COUNT(*) to count all rows regardless of NULLs. Only use COUNT(column_name) when you explicitly want to count non-NULL entries in that specific column. Run both side by side to expose the difference.
×

Selecting a non-aggregated column without adding it to GROUP BY

Symptom
Error: column must appear in GROUP BY clause or be used in an aggregate function (PostgreSQL) or column is not in SELECT list (MySQL in strict mode)
Fix
Every column in SELECT must either appear in GROUP BY or be wrapped in an aggregate function (MAX, MIN, COUNT, SUM, AVG). Add the column to GROUP BY if you want one row per unique combination, or remove it from SELECT if you don't need it.
×

Using WHERE to filter aggregate results instead of HAVING

Symptom
Error: aggregate functions are not allowed in WHERE — WHERE SUM(amount) > 1000 throws an error immediately
Fix
Use HAVING to filter on aggregate results: HAVING SUM(amount) > 1000. WHERE filters individual rows before grouping; HAVING filters groups after aggregation. Both can appear in the same query.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR
What is the difference between COUNT(*) and COUNT(column_name)?
Q02JUNIOR
What is the difference between WHERE and HAVING? Why can't you use WHERE...
Q03SENIOR
A column contains five values — 10, 20, NULL, 30, NULL. What does AVG re...
Q01 of 03JUNIOR

What is the difference between COUNT(*) and COUNT(column_name)?

ANSWER
COUNT() counts every row in the result set, including rows where every column is NULL. COUNT(column_name) counts only rows where that specific column has a non-NULL value — NULL rows are silently skipped. If a table has 100 rows and 20 have NULL in the email column, COUNT() returns 100 and COUNT(email) returns 80. The difference tells you the NULL count: COUNT() - COUNT(email) = 20. In practice, use COUNT() when you want row counts and COUNT(column) only when you explicitly want to exclude NULLs from the count.
FAQ · 3 QUESTIONS

Frequently Asked Questions

01
Can I use multiple aggregate functions in the same SELECT?
02
What does SUM return if all values in the column are NULL?
03
How do I count only rows that match a condition within an aggregate?
🔥

That's SQL Basics. Mark it forged?

3 min read · try the examples if you haven't

Previous
SQL GROUP BY and HAVING
9 / 16 · SQL Basics
Next
SQL Subqueries