SQL Window Functions: No PARTITION BY Crashes 50M Rows
Unpartitioned SUM(revenue) OVER(ORDER BY date) on 50M rows hit 100% CPU and blocked all dashboards for 12 minutes.
20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.
- Window functions compute values across related rows without collapsing them — unlike GROUP BY which removes individual rows
- OVER() defines the window: PARTITION BY groups, ORDER BY defines ordering within the group, frame clause controls which rows are included
- ROW_NUMBER() = unique sequential rank, RANK() = gaps on ties, DENSE_RANK() = no gaps on ties
- LAG(col, n) / LEAD(col, n) access n rows before/after the current row — essential for period-over-period comparisons
- Window functions cannot be filtered in the same SELECT where they are computed — wrap in a CTE or subquery
- Biggest trap: large PARTITION BY with no supporting index causes filesort on every execution
Imagine you're in a running race and you want to know your position — but you also want to see everyone else's position on the same leaderboard. GROUP BY would give you one row per runner (just the counts). A window function gives you each runner's row plus a computed position alongside it, without collapsing the data. You can see 'Alice: 1st place', 'Bob: 2nd place', 'Clara: 2nd place (tied)' all at once.
Every senior data engineer has hit the same wall: you need running totals, percentile ranks, or 'compare this row to its neighbours' logic, and suddenly GROUP BY feels like trying to paint a wall with a toothbrush. Window functions are the feature that separates analysts who write five nested subqueries from those who write one clean, efficient query.
The key mental model: a window function computes a value for each row using a defined set of related rows (the 'window') without removing any rows from the result. GROUP BY collapses rows into summaries. Window functions add derived columns to each row while leaving all rows intact.
Why Window Functions Are Not Just Fancy GROUP BYs
A window function computes a value across a set of rows related to the current row, without collapsing them into a single output row. Unlike aggregate functions (SUM, AVG) that reduce N rows to 1, window functions preserve every row while adding a computed column. The core mechanic is the OVER() clause, which defines the window frame — the subset of rows the function operates on. Without PARTITION BY, the frame defaults to the entire result set, meaning every row sees the same aggregate over all 50 million rows. That single-pass scan over the full dataset is O(n) per row, turning a query into a full table scan with a massive memory footprint. In practice, this means a query that should run in milliseconds can balloon into a multi-minute operation, spilling to disk and hammering tempdb or swap. The key property: window functions execute after WHERE, GROUP BY, and HAVING but before ORDER BY, so they operate on the filtered, grouped intermediate result. Use them when you need running totals, moving averages, row rankings, or comparisons to a group-level value (like 'each employee's salary vs. department average') without losing detail. In real systems, they replace self-joins and correlated subqueries, cutting query complexity from O(n²) to O(n log n) or better.
OVER() to compute total sales alongside each row. Symptom: query runs for 12 minutes, tempdb grows by 30 GB, and the reporting dashboard times out. Rule of thumb: Always specify PARTITION BY unless you explicitly need a global aggregate — and if you do, cache it in a CTE or temp table first.The OVER Clause — How Windows Are Defined
Every window function requires an OVER clause — this is what makes it a window function rather than a regular aggregate. The OVER clause defines the window: the set of rows the function can 'see' when computing its value for the current row.
OVER() with nothing inside it means the window is the entire result set. PARTITION BY inside OVER divides the result set into groups — the function computes separately within each partition. ORDER BY inside OVER defines the ordering within each partition, which is required for ranking and running total functions.
The critical difference from GROUP BY: GROUP BY collapses multiple rows into one summary row. OVER keeps all rows intact and adds a computed column alongside each row. You can see the individual transactions AND the running total on every row simultaneously.
- GROUP BY: N rows become 1 summary row — individual rows disappear
OVER(): N rows stay N rows — a computed column is added to each- PARTITION BY: divides the window into groups (like GROUP BY but without collapsing)
- ORDER BY inside OVER: controls accumulation order for running totals and rankings
- No PARTITION BY: window is the entire result set — dangerous on large tables
OVER() with no PARTITION BY = the entire table is one window = full table sort required.OVER() is what makes a function a window function — defines the rows it can see.SQL Evaluation Order — Where Window Functions Live
Understanding the logical order of SQL clause evaluation is essential for writing correct window function queries. The order is: FROM → WHERE → GROUP BY → HAVING → WINDOW → SELECT → DISTINCT → ORDER BY → LIMIT. Window functions are computed after WHERE, GROUP BY, and HAVING but before ORDER BY and SELECT. This means:
- You cannot reference a window function alias in the same SELECT's WHERE clause — it hasn't been computed yet. If you try, you get an 'column not found' error.
- You cannot filter on a window function result in WHERE. Wrap in a CTE or subquery.
- ORDER BY happens after window functions, so you can ORDER BY a window function alias without issues.
- GROUP BY aggregates are available before window functions — your window can reference GROUP BY output (like SUM(salary) for that group).
The WINDOW clause in the evaluation order refers to the window function computation phase, not the WINDOW clause keyword (which is syntactic sugar to name a window definition for reuse).
RANK() OVER (ORDER BY SUM(salary) DESC) AS dept_rank FROM employees GROUP BY department;Ranking Functions — ROW_NUMBER, RANK, and DENSE_RANK
Three ranking functions, three subtly different behaviours when ties exist.
ROW_NUMBER() assigns a unique sequential integer to every row, even when rows are tied. The order assigned to tied rows is arbitrary (unless you add a tiebreaker to ORDER BY). Use ROW_NUMBER() when you need to pick exactly one row per group — for example, the most recent order per customer.
RANK() assigns the same number to tied rows, then skips numbers. If two rows tie for position 2, both get rank 2 and the next row gets rank 4. The gap represents the number of tied rows.
DENSE_RANK() also assigns the same number to ties, but does not skip. Two rows tied at position 2 both get 2, and the next row gets 3. Use DENSE_RANK() when you want contiguous rank values regardless of ties — 'what is the 3rd highest salary' works with DENSE_RANK, not RANK.
RANK() OVER (...) AS r FROM table WHERE r = 1 throws an error — window functions are computed after WHERE filtering, so the alias is not yet available. Wrap in a CTE: WITH ranked AS (SELECT , RANK() OVER (...) AS r FROM table) SELECT * FROM ranked WHERE r = 1.ROW_NUMBER() is the correct function for 'get latest record per group' — use ORDER BY created_at DESC and filter WHERE rn = 1.ROW_NUMBER() can cause non-deterministic query results — always add a tiebreaker (usually the PK) to ORDER BY.ROW_NUMBER() in a CTE, filter WHERE rn = 1.ROW_NUMBER() — arbitrary tie-breaking produces inconsistent results.LAG and LEAD — Comparing Rows to Their Neighbours
LAG(column, n, default) returns the value from n rows before the current row within the window. LEAD(column, n, default) returns the value from n rows ahead. Both default n to 1 if omitted. The third argument sets what to return when there is no previous or next row (otherwise NULL).
These are indispensable for period-over-period analysis: month-over-month revenue change, day-over-day user growth, previous order date for a customer. Before LAG/LEAD existed, this required a self-join — and self-joins on large tables for row comparison are notoriously slow.
Window Frames — ROWS vs RANGE and the Subtle Default
The frame clause in an OVER definition controls exactly which rows are included in the window for each row's computation. This is the least understood part of window functions and the source of subtle bugs.
The default frame when you specify ORDER BY inside OVER is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — not ROWS. The RANGE default includes all rows that are tied on the ORDER BY value up to and including the current row. For dates with no duplicates this makes no visible difference. For financial data where multiple transactions share the same timestamp, the RANGE default groups all tied rows together — your running total will jump by the combined amount of all tied rows simultaneously, which is often not what you want.
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW processes each physical row individually, regardless of ties. For running totals and moving averages, ROWS is almost always the correct choice.
The WINDOW Clause — Reusing Window Definitions
When you use the same window definition in multiple aggregate functions, repeating the full OVER clause becomes error-prone and hard to maintain. The WINDOW clause (introduced in SQL:2003 and supported by PostgreSQL, MySQL 8+, SQL Server, and Oracle) lets you name a window specification and reference it by name in multiple OVER clauses.
Instead of writing: SUM(amount) OVER (PARTITION BY region ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_region_total, AVG(amount) OVER (PARTITION BY region ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg
You can define the window once and reuse it: WINDOW w AS (PARTITION BY region ORDER BY sale_date)
Then refer to it with a frame variation: SUM(amount) OVER w AS running_total, SUM(amount) OVER (w ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg
This reduces duplication, improves readability, and ensures the partition/order definition is consistent across all uses. If you need to change the partition key later, you only modify the WINDOW clause.
FIRST_VALUE, LAST_VALUE, and NTH_VALUE — Row Navigation Within Partitions
FIRST_VALUE(column), LAST_VALUE(column), and NTH_VALUE(column, n) return values from specified positions within a window partition. FIRST_VALUE gets the first row's value according to the window's ORDER BY. LAST_VALUE gets the last row's value — but careful: the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE returns the value from the current row within the current frame, not necessarily the partition's absolute last. To get the true last value in the partition, specify a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
NTH_VALUE(column, n) returns the value from the nth row (1-indexed) within the window. It's useful for getting, say, the second highest sale per region without a subquery.
These functions are most valuable when you need to compare the current row to the first or last row of a group — for example, showing the first order date for a customer alongside every subsequent order, or the percentage change from the first value.
Statistical Window Functions — NTILE and CUME_DIST
NTILE(n) divides the window into n as-equal-as-possible buckets (tiles) and assigns a bucket number from 1 to n to each row. It's commonly used for quartile analysis (NTILE(4)), decile analysis (NTILE(10)), or percentiles. Rows are evenly distributed across the buckets; if the count doesn't divide evenly, the first buckets get one extra row.
CUME_DIST() computes the cumulative distribution — the relative position of a row within the window. It returns a value between 0 and 1, representing the proportion of rows with a value less than or equal to the current row's value. For the row with the highest value, CUME_DIST = 1. For the lowest, it's 1/N (where N is the partition size). This is different from PERCENT_RANK(), which uses (rank-1)/(N-1).
NTILE is excellent for creating A/B test groups, splitting data into training and test sets, or identifying top/bottom deciles. CUME_DIST is ideal for percentile-based thresholds (top 10%, above median, etc.).
Types of Window Functions — A Battlefield Taxonomy
You don't need a PhD in SQL to understand window functions. You need a clear head and a mental map of which tool does what. Competitors love to dump you into 'Aggregate vs Ranking' and call it a day. That's like handing a recruit a toolbox with 'hammer' and 'saw' labels — it's technically true, but it doesn't help you pick the right one when the server's on fire.
There are three real categories, not two. Aggregate window functions (SUM, AVG, COUNT, MAX, MIN) let you run rolling totals or moving averages without collapsing rows. They answer "What's the running sum up to this row?" — a GROUP BY killer. Ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE, CUME_DIST) assign positions. They answer "Where does this row sit relative to its peers?" — essential for leaderboards, deduplication, percentile splits. Value functions (LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE) reach across rows within the same partition. They answer "What's the previous order value?" or "Who was the first customer this month?" — row-level comparisons without a self-join.
Why this matters: Pick the wrong category and your query either collapses into a single row (using GROUP BY when you meant a window) or explodes into a cartesian product (reaching outside your partition). Know your categories before you write a single OVER clause.
PERCENT_RANK and CUME_DIST — Your Stats Toolkit for Real Data
Every dashboard has a percentile request. "Show me customers in the top 10%." "Flag orders above the 95th percentile." Your juniors will reach for NTILE and then fight with bucket boundaries. Stop them. PERCENT_RANK and CUME_DIST are the right tools for this job, and most tutorials skip them entirely.
PERCENT_RANK gives you the relative rank of a row within a partition, normalized to a 0-to-1 scale. It computes as (RANK - 1) / (total_rows - 1). First row gets 0. Last gets 1. Perfect for "top 10% high spenders" when you need a continuous score, not fixed buckets.
CUME_DIST stands for cumulative distribution. It's simpler: it tells you the proportion of rows with values less than or equal to the current row. So the 100th percentile row shows 1.0, meaning all rows are <= its value. That's your "95th percentile threshold" calculator for anomaly detection.
Real talk: PERCENT_RANK is better than NTILE when you need precise cutoffs. NTILE divides into equal buckets — if you have 101 rows, NTILE(10) creates buckets of 10 or 11 rows. That's lumpy. PERCENT_RANK gives you a smooth gradient. Use it in WHERE clauses to filter your outliers.
Here's how they differ on the same data — watch the output closely.
PERCENT_RANK() OVER (ORDER BY revenue DESC) AS rnk FROM customers) WHERE rnk <= 0.1. For anomaly detection, CUME_DIST is your friend: find the 95th percentile threshold by selecting the minimum value where CUME_DIST >= 0.95.A Window Function Over an Unpartitioned 50M-Row Table Brought the Reporting Database Down
- Window functions without PARTITION BY operate over the entire table as one window — on large tables this is a full table sort
- Add a composite index on (PARTITION_BY_col, ORDER_BY_col) to support window function execution
- Test window function queries on production-scale row counts before deploying — performance does not extrapolate from sample data
ROW_NUMBER() OVER (...) AS rn FROM table) SELECT FROM ranked WHERE rn = 1.ROW_NUMBER() results are inconsistent between runs on tied valuesROW_NUMBER() assigns arbitrary order to ties. Add a tiebreaker column to the ORDER BY: ROW_NUMBER() OVER (ORDER BY created_at DESC, id DESC). The id column (PK) guarantees a deterministic result.Key takeaways
Common mistakes to avoid
3 patternsTrying to filter on a window function result in the same SELECT's WHERE clause
ROW_NUMBER() OVER (...) AS rn FROM table) SELECT FROM ranked WHERE rn = 1.Using OVER() without PARTITION BY on a large table
Relying on RANGE default for running totals when data has duplicate ORDER BY values
Interview Questions on This Topic
What is the difference between GROUP BY and a window function with PARTITION BY?
Frequently Asked Questions
20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.
That's SQL Advanced. Mark it forged?
9 min read · try the examples if you haven't