Advanced 9 min · March 12, 2026

SQL Window Functions: No PARTITION BY Crashes 50M Rows

Q: Can I use window functions in a WHERE clause?

No — window functions are computed after the WHERE clause filtering. To filter on a window function result, wrap the entire query in a CTE: WITH ranked AS (SELECT *, ROW_NUMBER() OVER (...) AS rn FROM table) SELECT * FROM ranked WHERE rn <= 5.

Q: What is the difference between SUM() OVER() and SUM() GROUP BY?

SUM() GROUP BY collapses all rows in a group into one summary row — individual rows disappear. SUM() OVER() keeps every row intact and adds the computed sum as a new column. Use GROUP BY when you need one summary row per group. Use SUM() OVER() when you need individual rows plus a derived aggregate value alongside each one.

Q: Are window functions supported in MySQL?

Yes, MySQL supports window functions from version 8.0 onward. Earlier versions (5.7 and below) require workarounds using user variables or self-joins. If you are on MySQL 5.7, consider upgrading — window functions are one of the most impactful features added in 8.0.

Unpartitioned SUM(revenue) OVER(ORDER BY date) on 50M rows hit 100% CPU and blocked all dashboards for 12 minutes.

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.

✓ Production

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

Before you start⏱ 30 min

✓Deep production experience
✓Understanding of internals and trade-offs
✓Experience debugging complex systems

● Production Incident 🔎 Debug Guide

⚡Quick Answer

Window functions compute values across related rows without collapsing them — unlike GROUP BY which removes individual rows
OVER() defines the window: PARTITION BY groups, ORDER BY defines ordering within the group, frame clause controls which rows are included
ROW_NUMBER() = unique sequential rank, RANK() = gaps on ties, DENSE_RANK() = no gaps on ties
LAG(col, n) / LEAD(col, n) access n rows before/after the current row — essential for period-over-period comparisons
Window functions cannot be filtered in the same SELECT where they are computed — wrap in a CTE or subquery
Biggest trap: large PARTITION BY with no supporting index causes filesort on every execution

✦ Definition~90s read

What is SQL Window Functions?

Window functions let you perform calculations across a set of rows related to the current row, without collapsing them into a single output row like GROUP BY does. They solve the problem of needing both detail-level data and aggregated context in the same query — for example, showing each employee's salary alongside the department average.

★

Imagine you're in a running race and you want to know your position — but you also want to see everyone else's position on the same leaderboard.

Without them, you'd resort to self-joins or multiple subqueries, which tank performance and readability. The OVER clause defines the window: PARTITION BY splits rows into groups (like GROUP BY), and ORDER BY determines the order within each group. The killer mistake is omitting PARTITION BY on large tables — that forces the entire 50M-row dataset into a single window, blowing up memory and crashing the engine.

Window frames (ROWS vs RANGE) control which rows the function sees; the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW can produce surprising results with ties. In the SQL evaluation order, window functions execute after WHERE, GROUP BY, and HAVING but before ORDER BY, so you can't filter on them directly — that's where a subquery or CTE comes in.

Ranking functions (ROW_NUMBER, RANK, DENSE_RANK) are the most common use case, but LAG and LEAD let you compare a row to its neighbors without a self-join, which is essential for time-series analysis or detecting changes in sequential data. When you need per-group calculations without losing row-level detail, window functions are the tool — just don't forget the PARTITION BY on big data.

Plain-English First

Imagine you're in a running race and you want to know your position — but you also want to see everyone else's position on the same leaderboard. GROUP BY would give you one row per runner (just the counts). A window function gives you each runner's row plus a computed position alongside it, without collapsing the data. You can see 'Alice: 1st place', 'Bob: 2nd place', 'Clara: 2nd place (tied)' all at once.

Every senior data engineer has hit the same wall: you need running totals, percentile ranks, or 'compare this row to its neighbours' logic, and suddenly GROUP BY feels like trying to paint a wall with a toothbrush. Window functions are the feature that separates analysts who write five nested subqueries from those who write one clean, efficient query.

The key mental model: a window function computes a value for each row using a defined set of related rows (the 'window') without removing any rows from the result. GROUP BY collapses rows into summaries. Window functions add derived columns to each row while leaving all rows intact.

Why Window Functions Are Not Just Fancy GROUP BYs

A window function computes a value across a set of rows related to the current row, without collapsing them into a single output row. Unlike aggregate functions (SUM, AVG) that reduce N rows to 1, window functions preserve every row while adding a computed column. The core mechanic is the OVER() clause, which defines the window frame — the subset of rows the function operates on. Without PARTITION BY, the frame defaults to the entire result set, meaning every row sees the same aggregate over all 50 million rows. That single-pass scan over the full dataset is O(n) per row, turning a query into a full table scan with a massive memory footprint. In practice, this means a query that should run in milliseconds can balloon into a multi-minute operation, spilling to disk and hammering tempdb or swap. The key property: window functions execute after WHERE, GROUP BY, and HAVING but before ORDER BY, so they operate on the filtered, grouped intermediate result. Use them when you need running totals, moving averages, row rankings, or comparisons to a group-level value (like 'each employee's salary vs. department average') without losing detail. In real systems, they replace self-joins and correlated subqueries, cutting query complexity from O(n²) to O(n log n) or better.

⚠ Missing PARTITION BY = Accidental Global Aggregate

Omitting PARTITION BY in a window function on a large table silently computes the aggregate over all rows, causing a full scan and massive memory pressure — not a syntax error, but a performance disaster.

📊 Production Insight

Production scenario: A reporting query on a 50M-row sales table uses SUM(amount) OVER() to compute total sales alongside each row. Symptom: query runs for 12 minutes, tempdb grows by 30 GB, and the reporting dashboard times out. Rule of thumb: Always specify PARTITION BY unless you explicitly need a global aggregate — and if you do, cache it in a CTE or temp table first.

🎯 Key Takeaway

Window functions preserve row count; aggregates reduce it — never confuse the two.

Without PARTITION BY, every row computes over the entire dataset — O(n) per row, O(n²) total.

Use window functions to replace self-joins and correlated subqueries for cleaner, faster SQL.

thecodeforge.io

Sql Window Functions

The OVER Clause — How Windows Are Defined

Every window function requires an OVER clause — this is what makes it a window function rather than a regular aggregate. The OVER clause defines the window: the set of rows the function can 'see' when computing its value for the current row.

OVER() with nothing inside it means the window is the entire result set. PARTITION BY inside OVER divides the result set into groups — the function computes separately within each partition. ORDER BY inside OVER defines the ordering within each partition, which is required for ranking and running total functions.

The critical difference from GROUP BY: GROUP BY collapses multiple rows into one summary row. OVER keeps all rows intact and adds a computed column alongside each row. You can see the individual transactions AND the running total on every row simultaneously.

window_basics_over.sqlSQL

-- Setup: monthly sales by salesperson
CREATE TABLE sales (
    sale_id    INT PRIMARY KEY,
    rep_name   VARCHAR(100),
    region     VARCHAR(50),
    sale_date  DATE,
    amount     DECIMAL(10,2)
);

-- GROUP BY collapses to one row per rep -- individual sales lost
SELECT rep_name, SUM(amount) AS total
FROM sales
GROUP BY rep_name;

-- Window function: all rows preserved + running total added
SELECT
    sale_id,
    rep_name,
    sale_date,
    amount,
    -- Running total per rep, ordered by date
    SUM(amount) OVER (
        PARTITION BY rep_name      -- separate window per rep
        ORDER BY sale_date         -- accumulate in date order
    ) AS running_total,
    -- Grand total for comparison (no PARTITION BY = whole table)
    SUM(amount) OVER () AS grand_total
FROM sales
ORDER BY rep_name, sale_date;

Output

1 | Alice Tanaka| 2024-01-05 | 1200.00 | 1200.00 | 42850.00

4 | Alice Tanaka| 2024-01-12 | 850.00 | 2050.00 | 42850.00

7 | Alice Tanaka| 2024-02-03 | 1500.00 | 3550.00 | 42850.00

2 | Ben Okafor | 2024-01-08 | 2200.00 | 2200.00 | 42850.00

Mental Model

Window vs GROUP BY — The Core Difference

GROUP BY answers: summarise all rows with value X into one row. Window function answers: for each row, what is the aggregate of the rows in my defined neighbourhood?

GROUP BY: N rows become 1 summary row — individual rows disappear
OVER(): N rows stay N rows — a computed column is added to each
PARTITION BY: divides the window into groups (like GROUP BY but without collapsing)
ORDER BY inside OVER: controls accumulation order for running totals and rankings
No PARTITION BY: window is the entire result set — dangerous on large tables

📊 Production Insight

OVER() with no PARTITION BY = the entire table is one window = full table sort required.

On large tables always PARTITION BY a column with an index to split the sort into manageable chunks.

For EXPLAIN: look for WindowAgg node — the Sort node before it reveals whether the sort is index-backed or a filesort.

🎯 Key Takeaway

OVER() is what makes a function a window function — defines the rows it can see.

PARTITION BY splits into independent groups; ORDER BY inside OVER controls accumulation order.

No PARTITION BY on large tables = full table sort = production incident waiting to happen.

SQL Evaluation Order — Where Window Functions Live

Understanding the logical order of SQL clause evaluation is essential for writing correct window function queries. The order is: FROM → WHERE → GROUP BY → HAVING → WINDOW → SELECT → DISTINCT → ORDER BY → LIMIT. Window functions are computed after WHERE, GROUP BY, and HAVING but before ORDER BY and SELECT. This means:

You cannot reference a window function alias in the same SELECT's WHERE clause — it hasn't been computed yet. If you try, you get an 'column not found' error.
You cannot filter on a window function result in WHERE. Wrap in a CTE or subquery.
ORDER BY happens after window functions, so you can ORDER BY a window function alias without issues.
GROUP BY aggregates are available before window functions — your window can reference GROUP BY output (like SUM(salary) for that group).

The WINDOW clause in the evaluation order refers to the window function computation phase, not the WINDOW clause keyword (which is syntactic sugar to name a window definition for reuse).

evaluation_order.sqlSQL

-- This query will fail because window function alias 'rn' is not available in WHERE
SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
FROM sales
WHERE rn = 1;  -- ERROR: column 'rn' does not exist

-- Proper pattern: wrap in a CTE (WITH) to allow filter after window computation
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
    FROM sales
)
SELECT * FROM ranked WHERE rn = 1;

-- ORDER BY can use window function alias directly
SELECT *,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_val
FROM sales
ORDER BY rank_val;  -- works fine

Output

sale_id | rep_name | region | amount | rn

3 | Alice Tanaka| East | 1500.00 | 1

6 | Ben Okafor | East | 1400.00 | 2

9 | Clara Webb | West | 2000.00 | 1

🔥The WINDOW Phase Happens After GROUP BY

This is why you can use GROUP BY aggregates inside a window function: the GROUP BY has already been computed, so SUM(salary) is available as a base column for the window. For example: SELECT department, SUM(salary) AS dept_total, RANK() OVER (ORDER BY SUM(salary) DESC) AS dept_rank FROM employees GROUP BY department;

📊 Production Insight

Knowing the evaluation order prevents the most common window function mistake — trying to filter on a result in WHERE. Every production code review should flag direct filtering on window function aliases. The CTE-pattern is the standard fix and is universally supported across PostgreSQL, MySQL 8+, SQL Server, and Oracle.

🎯 Key Takeaway

Window functions are computed after WHERE/GROUP BY/HAVING but before ORDER BY. To filter on window results, use a CTE. This order is logical, not physical — the database may optimise, but the result must match the logical order.

SQL Evaluation Order — Where Window Functions Are Computed

thecodeforge.io

Sql Window Functions

Ranking Functions — ROW_NUMBER, RANK, and DENSE_RANK

Three ranking functions, three subtly different behaviours when ties exist.

ROW_NUMBER() assigns a unique sequential integer to every row, even when rows are tied. The order assigned to tied rows is arbitrary (unless you add a tiebreaker to ORDER BY). Use ROW_NUMBER() when you need to pick exactly one row per group — for example, the most recent order per customer.

RANK() assigns the same number to tied rows, then skips numbers. If two rows tie for position 2, both get rank 2 and the next row gets rank 4. The gap represents the number of tied rows.

DENSE_RANK() also assigns the same number to ties, but does not skip. Two rows tied at position 2 both get 2, and the next row gets 3. Use DENSE_RANK() when you want contiguous rank values regardless of ties — 'what is the 3rd highest salary' works with DENSE_RANK, not RANK.

ranking_functions.sqlSQL

-- Compare all three ranking functions on the same data
SELECT
    rep_name,
    region,
    SUM(amount)  AS total_sales,
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY SUM(amount) DESC) AS row_num,
    RANK()       OVER (PARTITION BY region ORDER BY SUM(amount) DESC) AS rank_val,
    DENSE_RANK() OVER (PARTITION BY region ORDER BY SUM(amount) DESC) AS dense_rank_val
FROM sales
GROUP BY rep_name, region
ORDER BY region, total_sales DESC;

-- Common pattern: get the top 1 row per group
-- ROW_NUMBER guarantees exactly one row per region regardless of ties
WITH ranked_sales AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
    FROM sales
)
SELECT rep_name, region, amount
FROM ranked_sales
WHERE rn = 1;  -- cannot use window function directly in WHERE -- must wrap in CTE

Output

Alice Tanaka | East | 15200.00 | 1 | 1 | 1

Ben Okafor | East | 14800.00 | 2 | 2 | 2

Clara Webb | East | 14800.00 | 3 | 2 | 2 <- tied

Dana Kim | East | 12100.00 | 4 | 4 | 3 <- RANK gaps, DENSE_RANK does not

⚠ Window Functions Cannot Be Filtered in WHERE

SELECT , RANK() OVER (...) AS r FROM table WHERE r = 1 throws an error — window functions are computed after WHERE filtering, so the alias is not yet available. Wrap in a CTE: WITH ranked AS (SELECT , RANK() OVER (...) AS r FROM table) SELECT * FROM ranked WHERE r = 1.

📊 Production Insight

ROW_NUMBER() is the correct function for 'get latest record per group' — use ORDER BY created_at DESC and filter WHERE rn = 1.

Random tie-breaking in ROW_NUMBER() can cause non-deterministic query results — always add a tiebreaker (usually the PK) to ORDER BY.

Wrap ranking in a CTE before filtering on the rank value — this is a universal pattern you will write dozens of times.

🎯 Key Takeaway

ROW_NUMBER = unique rank, RANK = gaps on ties, DENSE_RANK = no gaps on ties.

To get top-1 per group: ROW_NUMBER() in a CTE, filter WHERE rn = 1.

Always add a tiebreaker column to ORDER BY inside ROW_NUMBER() — arbitrary tie-breaking produces inconsistent results.

LAG and LEAD — Comparing Rows to Their Neighbours

LAG(column, n, default) returns the value from n rows before the current row within the window. LEAD(column, n, default) returns the value from n rows ahead. Both default n to 1 if omitted. The third argument sets what to return when there is no previous or next row (otherwise NULL).

These are indispensable for period-over-period analysis: month-over-month revenue change, day-over-day user growth, previous order date for a customer. Before LAG/LEAD existed, this required a self-join — and self-joins on large tables for row comparison are notoriously slow.

lag_lead_comparison.sqlSQL

-- Month-over-month revenue change using LAG
WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', sale_date) AS month,
        SUM(amount)                    AS revenue
    FROM sales
    GROUP BY DATE_TRUNC('month', sale_date)
)
SELECT
    month,
    revenue,
    LAG(revenue, 1, 0) OVER (ORDER BY month) AS prev_month_revenue,
    revenue - LAG(revenue, 1, 0) OVER (ORDER BY month) AS change,
    ROUND(
        100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
        / NULLIF(LAG(revenue) OVER (ORDER BY month), 0),
    1) AS pct_change
FROM monthly_revenue
ORDER BY month;

-- Find customers whose next order came >90 days after the previous
-- LEAD gives us the next order date without a self-join
SELECT
    customer_id,
    order_date                                              AS this_order,
    LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order,
    LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date)
        - order_date                                        AS days_to_next
FROM orders
HAVING days_to_next > 90 OR days_to_next IS NULL;

Output

month | revenue | prev_revenue | change | pct_change

2024-01-01 | 42850.00 | 0.00 | 42850.00 | NULL

2024-02-01 | 51200.00 | 42850.00 | 8350.00 | 19.5

2024-03-01 | 48900.00 | 51200.00 | -2300.00 | -4.5

💡NULLIF Prevents Division by Zero in Percentage Calculations

When computing percentage change, the previous value might be zero (first month of data, new customer). Dividing by zero throws an error or returns NULL depending on the database. NULLIF(prev_value, 0) returns NULL when prev_value is zero, making the division return NULL safely instead of crashing.

📊 Production Insight

LAG/LEAD replaced thousands of correlated subqueries and self-joins in analytics codebases — if you see a self-join comparing a table to itself on adjacent rows, it is a LAG/LEAD rewrite opportunity.

Always provide a default value as the third argument (LAG(col, 1, 0)) — NULL arithmetic produces NULL results that silently break calculations.

For large tables, PARTITION BY customer_id + an index on (customer_id, order_date) is mandatory for acceptable performance.

🎯 Key Takeaway

LAG/LEAD access neighbouring rows without self-joins — the canonical tool for period-over-period comparisons.

Always supply a default (third argument) to avoid NULL arithmetic silently corrupting your calculations.

Add a composite index on (PARTITION_BY_col, ORDER_BY_col) — these queries require sorted access.

Window Frames — ROWS vs RANGE and the Subtle Default

The frame clause in an OVER definition controls exactly which rows are included in the window for each row's computation. This is the least understood part of window functions and the source of subtle bugs.

The default frame when you specify ORDER BY inside OVER is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — not ROWS. The RANGE default includes all rows that are tied on the ORDER BY value up to and including the current row. For dates with no duplicates this makes no visible difference. For financial data where multiple transactions share the same timestamp, the RANGE default groups all tied rows together — your running total will jump by the combined amount of all tied rows simultaneously, which is often not what you want.

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW processes each physical row individually, regardless of ties. For running totals and moving averages, ROWS is almost always the correct choice.

window_frames.sqlSQL

-- RANGE default vs ROWS explicit -- spot the difference
SELECT
    sale_date,
    amount,
    -- Default: RANGE -- ties share the same cumulative value
    SUM(amount) OVER (
        ORDER BY sale_date
        -- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (implicit default)
    ) AS running_range,
    -- Explicit ROWS: each physical row accumulated independently
    SUM(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_rows
FROM sales
WHERE rep_name = 'Alice Tanaka'
ORDER BY sale_date;

-- 7-day moving average: ROWS frame with sliding window
SELECT
    sale_date,
    amount,
    AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW  -- current row + 6 before it
    ) AS moving_avg_7day
FROM daily_sales
ORDER BY sale_date;

Output

-- When two sales share the same date:

-- RANGE groups them: both rows show the combined total immediately

-- ROWS accumulates: first row shows partial, second row shows full

sale_date | amount | running_range | running_rows

2024-01-05 | 800.00 | 2200.00 | 800.00 <- RANGE jumps to combined total

2024-01-05 | 1400.00| 2200.00 | 2200.00 <- ROWS accumulates each row

⚠ The RANGE Default Causes Surprising Behaviour on Ties

When you write SUM(...) OVER (ORDER BY date), the default frame is RANGE which groups all rows with the same date value. If two transactions happen on the same day, both rows will show the combined daily total immediately — not the individual accumulation. For running totals, always specify ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW explicitly.

📊 Production Insight

Running total queries in financial systems that show identical values for same-day transactions are almost always hitting the RANGE default rather than ROWS.

For moving averages, ROWS BETWEEN N PRECEDING AND CURRENT ROW is the correct frame — gives exactly N+1 rows of data regardless of value ties.

Specifying the frame explicitly (ROWS BETWEEN ...) is a production best practice — it documents intent and prevents subtle default-behaviour bugs.

🎯 Key Takeaway

Default frame with ORDER BY is RANGE — groups tied values, which is rarely what you want for running totals.

For running totals and moving averages, always specify ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW explicitly.

Explicit frame clauses are self-documenting and prevent subtle tie-handling bugs.

The WINDOW Clause — Reusing Window Definitions

When you use the same window definition in multiple aggregate functions, repeating the full OVER clause becomes error-prone and hard to maintain. The WINDOW clause (introduced in SQL:2003 and supported by PostgreSQL, MySQL 8+, SQL Server, and Oracle) lets you name a window specification and reference it by name in multiple OVER clauses.

Instead of writing: SUM(amount) OVER (PARTITION BY region ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_region_total, AVG(amount) OVER (PARTITION BY region ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg

You can define the window once and reuse it: WINDOW w AS (PARTITION BY region ORDER BY sale_date)

Then refer to it with a frame variation: SUM(amount) OVER w AS running_total, SUM(amount) OVER (w ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg

This reduces duplication, improves readability, and ensures the partition/order definition is consistent across all uses. If you need to change the partition key later, you only modify the WINDOW clause.

window_clause.sqlSQL

-- Without WINDOW clause: repetitive OVER definitions
SELECT
    region,
    sale_date,
    amount,
    SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) AS running_total,
    AVG(amount) OVER (PARTITION BY region ORDER BY sale_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS moving_avg
FROM sales
ORDER BY region, sale_date;

-- With WINDOW clause: define once, reuse
SELECT
    region,
    sale_date,
    amount,
    SUM(amount) OVER w                           AS running_total,
    AVG(amount) OVER (w ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS moving_avg,
    COUNT(*) OVER w                              AS sales_count_in_region_so_far
FROM sales
WINDOW w AS (PARTITION BY region ORDER BY sale_date)
ORDER BY region, sale_date;

Output

East | 2024-01-05 | 1200.00 | 1200.00 | 1200.00 | 1

East | 2024-01-12 | 850.00 | 2050.00 | 1025.00 | 2

East | 2024-02-03 | 1500.00 | 3550.00 | 1183.33 | 3

West | 2024-01-08 | 2200.00 | 2200.00 | 2200.00 | 1

💡Frame Can Be Overridden on Referenced Windows

When you reference a named window, you can still specify a different frame. The base partition and order must stay the same, but the frame clause can differ. This is ideal for computing both running totals (UNBOUNDED PRECEDING) and moving averages (e.g., 6 PRECEDING) from the same window.

📊 Production Insight

The WINDOW clause shines in complex reporting queries where you compute multiple window aggregates with the same partition/order. It reduces the risk of copy-paste errors, which in production can lead to inconsistent results between supposedly identical window definitions. Always use WINDOW for any query with two or more window functions sharing the same base frame.

🎯 Key Takeaway

The WINDOW clause names a window definition for reuse, eliminating duplication and preventing copy-paste bugs. Use it when you have multiple window functions that share the same PARTITION BY and ORDER BY.

FIRST_VALUE(column), LAST_VALUE(column), and NTH_VALUE(column, n) return values from specified positions within a window partition. FIRST_VALUE gets the first row's value according to the window's ORDER BY. LAST_VALUE gets the last row's value — but careful: the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE returns the value from the current row within the current frame, not necessarily the partition's absolute last. To get the true last value in the partition, specify a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

NTH_VALUE(column, n) returns the value from the nth row (1-indexed) within the window. It's useful for getting, say, the second highest sale per region without a subquery.

These functions are most valuable when you need to compare the current row to the first or last row of a group — for example, showing the first order date for a customer alongside every subsequent order, or the percentage change from the first value.

first_last_nth.sqlSQL

-- FIRST_VALUE and LAST_VALUE with proper frame
SELECT
    rep_name,
    sale_date,
    amount,
    FIRST_VALUE(amount) OVER (
        PARTITION BY rep_name
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_sale_amount,
    LAST_VALUE(amount) OVER (
        PARTITION BY rep_name
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_sale_amount,
    NTH_VALUE(amount, 2) OVER (
        PARTITION BY rep_name
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS second_sale_amount
FROM sales
ORDER BY rep_name, sale_date;

Output

Alice Tanaka | 2024-01-05 | 1200.00 | 1200.00 | 1500.00 | 850.00

Alice Tanaka | 2024-01-12 | 850.00 | 1200.00 | 1500.00 | 850.00

Alice Tanaka | 2024-02-03 | 1500.00 | 1200.00 | 1500.00 | 850.00

Ben Okafor | 2024-01-08 | 2200.00 | 2200.00 | 2200.00 | NULL

⚠ LAST_VALUE Traps with Default Frame

If you omit the frame when using LAST_VALUE, the default RANGE frame will only see rows up to the current row. LAST_VALUE will then return the same value as the current row for every row — not the partition's final row. Always use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING with LAST_VALUE and NTH_VALUE to get correct partition-level results.

📊 Production Insight

FIRST_VALUE and LAST_VALUE are useful for comparisons like 'show each order and how it compares to the customer's first order'. When combined with a CTE, you can compute differences such as days_since_first_order = sale_date - FIRST_VALUE(sale_date). Always check the frame — the default RANGE frame will not give you the true first/last of the partition. Use UNBOUNDED FOLLOWING for LAST_VALUE, and UNBOUNDED PRECEDING is fine for FIRST_VALUE but it's symmetric to be explicit.

🎯 Key Takeaway

FIRST_VALUE/LAST_VALUE/NTH_VALUE access rows by position within a partition. LAST_VALUE and NTH_VALUE require an explicit frame with UNBOUNDED FOLLOWING to return the true last/nth row of the partition. Default frame returns the current row for these functions.

Statistical Window Functions — NTILE and CUME_DIST

NTILE(n) divides the window into n as-equal-as-possible buckets (tiles) and assigns a bucket number from 1 to n to each row. It's commonly used for quartile analysis (NTILE(4)), decile analysis (NTILE(10)), or percentiles. Rows are evenly distributed across the buckets; if the count doesn't divide evenly, the first buckets get one extra row.

CUME_DIST() computes the cumulative distribution — the relative position of a row within the window. It returns a value between 0 and 1, representing the proportion of rows with a value less than or equal to the current row's value. For the row with the highest value, CUME_DIST = 1. For the lowest, it's 1/N (where N is the partition size). This is different from PERCENT_RANK(), which uses (rank-1)/(N-1).

NTILE is excellent for creating A/B test groups, splitting data into training and test sets, or identifying top/bottom deciles. CUME_DIST is ideal for percentile-based thresholds (top 10%, above median, etc.).

ntile_cume_dist.sqlSQL

-- NTILE: Divide reps into 4 quartiles by total sales
SELECT
    rep_name,
    SUM(amount) AS total_sales,
    NTILE(4) OVER (ORDER BY SUM(amount) DESC) AS sales_quartile,
    CUME_DIST() OVER (ORDER BY SUM(amount) DESC) AS cumulative_dist
FROM sales
GROUP BY rep_name
ORDER BY total_sales DESC;

-- Use NTILE for A/B test bucket assignment (deterministic with ORDER BY)
SELECT
    user_id,
    NTILE(3) OVER (ORDER BY user_id) AS test_group  -- 1,2,3
FROM users;

-- Find top 10% of customers by revenue using CUME_DIST
WITH customer_revenue AS (
    SELECT
        customer_id,
        SUM(amount) AS total_revenue,
        CUME_DIST() OVER (ORDER BY SUM(amount) DESC) AS percentile
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id, total_revenue
FROM customer_revenue
WHERE percentile <= 0.1
ORDER BY total_revenue DESC;

Output

rep_name | total_sales | sales_quartile | cumulative_dist

Ben Okafor | 22000.00 | 1 | 1.00

Alice Tanaka | 18500.00 | 1 | 0.67

Clara Webb | 14800.00 | 2 | 0.33

Dana Kim | 12100.00 | 2 | 0.00

🔥NTILE Requires Stable Order for Deterministic Buckets

NTILE's bucket assignment is deterministic only if the ORDER BY is unique. If rows tie, the distribution may vary between executions. For A/B test assignment, use a unique column like user_id to guarantee each user ends up in the same bucket every time.

📊 Production Insight

NTILE is the standard tool for splitting data into equal-sized groups. Use NTILE(5) for quintiles in performance reviews, NTILE(4) for quartile analysis, and NTILE(10) for decile reporting. CUME_DIST is more precise than NTILE for percentile thresholds — it gives the exact cumulative proportion, whereas NTILE only groups into buckets. For 'top 10%' use CUME_DIST <= 0.1, not NTILE(10) = 1, because NTILE forces equal bucket sizes and a hard cutoff at 10% boundaries.

🎯 Key Takeaway

NTILE divides rows into as-equal buckets for quantile analysis and A/B test split. CUME_DIST gives the exact cumulative proportion (0-1) for percentile-based thresholds. Use NTILE for grouping, CUME_DIST for precise cutoff thresholds.

Types of Window Functions — A Battlefield Taxonomy

You don't need a PhD in SQL to understand window functions. You need a clear head and a mental map of which tool does what. Competitors love to dump you into 'Aggregate vs Ranking' and call it a day. That's like handing a recruit a toolbox with 'hammer' and 'saw' labels — it's technically true, but it doesn't help you pick the right one when the server's on fire.

There are three real categories, not two. Aggregate window functions (SUM, AVG, COUNT, MAX, MIN) let you run rolling totals or moving averages without collapsing rows. They answer "What's the running sum up to this row?" — a GROUP BY killer. Ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE, CUME_DIST) assign positions. They answer "Where does this row sit relative to its peers?" — essential for leaderboards, deduplication, percentile splits. Value functions (LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE) reach across rows within the same partition. They answer "What's the previous order value?" or "Who was the first customer this month?" — row-level comparisons without a self-join.

Why this matters: Pick the wrong category and your query either collapses into a single row (using GROUP BY when you meant a window) or explodes into a cartesian product (reaching outside your partition). Know your categories before you write a single OVER clause.

TaxonomyDemo.sqlSQL

// io.thecodeforge — database tutorial

WITH sales AS (
  SELECT '2024-01-01' AS sale_date, 100.0 AS amount
  UNION ALL SELECT '2024-01-02', 150.0
  UNION ALL SELECT '2024-01-03', 200.0
  UNION ALL SELECT '2024-01-04', 120.0
)
SELECT
  sale_date,
  amount,
  SUM(amount) OVER (ORDER BY sale_date) AS rolling_total,      -- aggregate
  ROW_NUMBER() OVER (ORDER BY amount DESC) AS rank_by_amount,  -- ranking
  LAG(amount, 1) OVER (ORDER BY sale_date) AS prev_day_amount -- value
FROM sales;

Output

sale_date | amount | rolling_total | rank_by_amount | prev_day_amount

-----------+--------+---------------+----------------+----------------

2024-01-01 | 100.0 | 100.0 | 4 | NULL

2024-01-02 | 150.0 | 250.0 | 2 | 100.0

2024-01-03 | 200.0 | 450.0 | 1 | 150.0

2024-01-04 | 120.0 | 570.0 | 3 | 200.0

⚠ Production Trap:

Don't confuse aggregate window functions with aggregate GROUP BY. A window SUM retains every row; GROUP BY collapses. Accidentally omitting PARTITION BY in an aggregate window will compute a global total in every row — silent, wrong, and your dashboard will look flat. Always run a SELECT * to sanity-check row counts.

🎯 Key Takeaway

There are three categories of window functions: aggregate (rolling math), ranking (position assignment), and value (row-to-row navigation). Pick the category first, then the specific function.

PERCENT_RANK and CUME_DIST — Your Stats Toolkit for Real Data

Every dashboard has a percentile request. "Show me customers in the top 10%." "Flag orders above the 95th percentile." Your juniors will reach for NTILE and then fight with bucket boundaries. Stop them. PERCENT_RANK and CUME_DIST are the right tools for this job, and most tutorials skip them entirely.

PERCENT_RANK gives you the relative rank of a row within a partition, normalized to a 0-to-1 scale. It computes as (RANK - 1) / (total_rows - 1). First row gets 0. Last gets 1. Perfect for "top 10% high spenders" when you need a continuous score, not fixed buckets.

CUME_DIST stands for cumulative distribution. It's simpler: it tells you the proportion of rows with values less than or equal to the current row. So the 100th percentile row shows 1.0, meaning all rows are <= its value. That's your "95th percentile threshold" calculator for anomaly detection.

Real talk: PERCENT_RANK is better than NTILE when you need precise cutoffs. NTILE divides into equal buckets — if you have 101 rows, NTILE(10) creates buckets of 10 or 11 rows. That's lumpy. PERCENT_RANK gives you a smooth gradient. Use it in WHERE clauses to filter your outliers.

Here's how they differ on the same data — watch the output closely.

PercentileDemo.sqlSQL

// io.thecodeforge — database tutorial

WITH scores AS (
  SELECT 50 AS score UNION ALL SELECT 60 UNION ALL SELECT 70
  UNION ALL SELECT 80 UNION ALL SELECT 90
)
SELECT
  score,
  RANK() OVER (ORDER BY score) AS simple_rank,
  PERCENT_RANK() OVER (ORDER BY score) AS pct_rank,
  CUME_DIST() OVER (ORDER BY score) AS cum_distribution
FROM scores
ORDER BY score;

Output

score | simple_rank | pct_rank | cum_distribution

------+-------------+---------------------+-----------------

50 | 1 | 0.0 | 0.2

60 | 2 | 0.25 | 0.4

70 | 3 | 0.5 | 0.6

80 | 4 | 0.75 | 0.8

90 | 5 | 1.0 | 1.0

🔥Senior Shortcut:

Use PERCENT_RANK for 'top X%' filters. Wrap it in a subquery: SELECT FROM (SELECT , PERCENT_RANK() OVER (ORDER BY revenue DESC) AS rnk FROM customers) WHERE rnk <= 0.1. For anomaly detection, CUME_DIST is your friend: find the 95th percentile threshold by selecting the minimum value where CUME_DIST >= 0.95.

🎯 Key Takeaway

PERCENT_RANK gives smooth percentile scores; CUME_DIST gives cumulative distribution. Use them for precise thresholding instead of lumpy NTILE buckets.

Frame Specification: ROWS vs RANGE vs GROUPS

Window frames define the set of rows used for calculations within a partition. The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which can lead to unexpected results when duplicates exist. ROWS operates on physical row positions, RANGE uses logical ordering (same values are tied), and GROUPS treats ties as groups. For example, with ORDER BY salary, ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING includes exactly three rows, while RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING includes all rows with salaries within the range of current salary ±1. GROUPS is similar to RANGE but groups ties. Understanding these differences is critical for accurate analytics, especially with ordered data.

Consider a table with salaries: 50k, 60k, 60k, 70k. Using SUM(salary) OVER (ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) yields cumulative sums based on physical rows: 50k, 110k, 170k, 240k. With RANGE, the two 60k rows are tied, so the frame includes both when processing the first 60k: 50k, 170k, 170k, 240k. GROUPS would treat the ties as one group: 50k, 170k, 170k, 240k (same as RANGE here). Use ROWS for strict row-level control, RANGE for logical ranges, and GROUPS for grouping ties.

frame_specification.sqlSQL

-- ROWS frame: physical rows
SELECT salary,
       SUM(salary) OVER (ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_cumulative
FROM employee_salaries;

-- RANGE frame: logical range (ties included)
SELECT salary,
       SUM(salary) OVER (ORDER BY salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_cumulative
FROM employee_salaries;

-- GROUPS frame: groups of ties
SELECT salary,
       SUM(salary) OVER (ORDER BY salary GROUPS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS groups_cumulative
FROM employee_salaries;

⚠ Default Frame Pitfall

📊 Production Insight

In production, prefer ROWS for predictable performance and results, especially with large datasets where RANGE can cause excessive tie handling.

🎯 Key Takeaway

Choose ROWS for physical row positions, RANGE for logical ranges including ties, and GROUPS for grouping ties together.

Window Functions for Time Series: LEAD, LAG, FIRST_VALUE, LAST_VALUE

Time series analysis often requires comparing current rows to previous or subsequent rows. LEAD and LAG access rows at a specified offset, while FIRST_VALUE and LAST_VALUE retrieve the first or last value in a window frame. For example, to compute day-over-day change in stock prices: LAG(price, 1) OVER (ORDER BY date) gives the previous day's price. LEAD(price, 1) gives the next day's price. FIRST_VALUE(price) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) returns the earliest price, and LAST_VALUE returns the latest. These functions are essential for trend analysis, moving averages, and identifying peaks/troughs.

Consider a sales table with daily revenue. To calculate the 7-day moving average: AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW). To find the difference from the first day of the month: revenue - FIRST_VALUE(revenue) OVER (PARTITION BY month ORDER BY date). For detecting anomalies, compare current value to the average of surrounding values using LAG and LEAD.

time_series_window.sqlSQL

-- Day-over-day change
SELECT date, revenue,
       revenue - LAG(revenue, 1) OVER (ORDER BY date) AS daily_change
FROM daily_sales;

-- 7-day moving average
SELECT date, revenue,
       AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_7d
FROM daily_sales;

-- First and last revenue of each month
SELECT date, revenue,
       FIRST_VALUE(revenue) OVER (PARTITION BY EXTRACT(MONTH FROM date) ORDER BY date) AS first_of_month,
       LAST_VALUE(revenue) OVER (PARTITION BY EXTRACT(MONTH FROM date) ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_of_month
FROM daily_sales;

💡LAST_VALUE Needs Explicit Frame

📊 Production Insight

For large time series, ensure proper indexing on the ORDER BY column to avoid full table scans. Use ROWS frames for deterministic performance.

🎯 Key Takeaway

LEAD and LAG access neighboring rows; FIRST_VALUE and LAST_VALUE retrieve boundary values within a frame, essential for time series comparisons.

Window Functions vs GROUP BY: When to Use Each

GROUP BY collapses rows into aggregates, losing individual row details. Window functions retain all rows while computing aggregates over a window. Use GROUP BY when you need summary statistics per group (e.g., total sales per region). Use window functions when you need per-row calculations that depend on a group (e.g., rank within region, running total). For example, to get each employee's salary and the average salary in their department, a window function is ideal: AVG(salary) OVER (PARTITION BY department). With GROUP BY, you'd need a separate join. Window functions shine for ranking, moving averages, and comparisons without losing granularity.

However, GROUP BY is more efficient for pure aggregation because it reduces data volume early. Window functions process all rows, so they can be slower on large datasets. A common pattern is to use a subquery with GROUP BY and then join, but window functions often simplify the query. For instance, to find employees earning above their department average: use AVG(salary) OVER (PARTITION BY department) and filter in an outer query. This is cleaner than a self-join.

window_vs_groupby.sqlSQL

-- Window function: department average per employee
SELECT employee_id, department, salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

-- Equivalent with GROUP BY and join
SELECT e.employee_id, e.department, e.salary, d.dept_avg
FROM employees e
JOIN (SELECT department, AVG(salary) AS dept_avg FROM employees GROUP BY department) d
  ON e.department = d.department;

-- Filter above average using window
SELECT * FROM (
  SELECT employee_id, department, salary,
         AVG(salary) OVER (PARTITION BY department) AS dept_avg
  FROM employees
) sub
WHERE salary > dept_avg;

🔥Performance Consideration

📊 Production Insight

In production, test both approaches. Window functions can simplify code but may increase memory usage. For large datasets, consider partitioning and indexing the OVER columns.

🎯 Key Takeaway

Use GROUP BY for summary-only queries; use window functions for per-row calculations with group context, like ranks or running totals.

● Production incidentPOST-MORTEMseverity: high

A Window Function Over an Unpartitioned 50M-Row Table Brought the Reporting Database Down

Symptom

The reporting database CPU hit 100% and all other dashboard queries queued behind the blocking query. The analyst's query was still running after 12 minutes and had to be manually killed.

Assumption

Window functions are a database feature — the engine will handle them efficiently. The query looked clean and worked in development.

Root cause

The query used SUM(revenue) OVER (ORDER BY transaction_date) with no PARTITION BY clause, computing a running total across all 50 million rows ordered by date. Without PARTITION BY, the window function operates over the entire table as a single partition. The ORDER BY inside OVER requires sorting all 50 million rows. No index on transaction_date meant a filesort on the full dataset held an exclusive sort buffer for the duration.

Fix

Added PARTITION BY to split the computation into smaller groups (by region_id), added an index on (region_id, transaction_date) to support the window partition and ordering. Query time dropped from 12 minutes to 1.8 seconds.

Key lesson

Window functions without PARTITION BY operate over the entire table as one window — on large tables this is a full table sort
Add a composite index on (PARTITION_BY_col, ORDER_BY_col) to support window function execution
Test window function queries on production-scale row counts before deploying — performance does not extrapolate from sample data

Production debug guideDiagnosing slow queries involving window functions4 entries

Symptom · 01

Window function query is slow and EXPLAIN shows Sort or WindowAgg with high cost

→

Fix

Check if PARTITION BY has an index. Create a composite index on (partition_col, order_col): CREATE INDEX idx_sales_region_date ON sales(region_id, transaction_date). The index allows the engine to skip sorting entirely.

Symptom · 02

Cannot filter on a window function result in the WHERE clause

→

Fix

Window function values cannot be used in WHERE in the same SELECT. Wrap in a CTE or subquery: WITH ranked AS (SELECT , ROW_NUMBER() OVER (...) AS rn FROM table) SELECT FROM ranked WHERE rn = 1.

Symptom · 03

ROW_NUMBER() results are inconsistent between runs on tied values

→

Fix

ROW_NUMBER() assigns arbitrary order to ties. Add a tiebreaker column to the ORDER BY: ROW_NUMBER() OVER (ORDER BY created_at DESC, id DESC). The id column (PK) guarantees a deterministic result.

Symptom · 04

Running total resets unexpectedly at certain points

→

Fix

Your PARTITION BY is grouping data in a way you did not intend. Add PARTITION BY explicitly or remove it entirely. Check whether the ORDER BY inside OVER matches your intended calculation direction.

Function	Purpose	Handles Ties	Best Used For
`ROW_NUMBER()`	Unique sequential rank	Arbitrary order for ties	Select exactly one row per group — latest record, deduplication
`RANK()`	Rank with gaps on ties	Tied rows share rank, next rank skips	Competition rankings where tied positions matter
`DENSE_RANK()`	Rank without gaps on ties	Tied rows share rank, no skips	Nth highest value queries — 'find the 3rd highest salary'
SUM/AVG/COUNT `OVER()`	Running aggregates	Depends on frame clause	Running totals, moving averages, period comparisons
LAG(col, n)	Access n rows before current	n/a	Period-over-period comparison, previous value reference
LEAD(col, n)	Access n rows after current	n/a	Future value reference, days-to-next-event calculations
`PERCENT_RANK()`	Relative percentile position 0–1	Tied rows share rank	Distribution analysis — bottom 10% queries with CTE filter
NTILE(n)	Divide into n equal buckets	Distributes evenly	Quartile/decile analysis, A/B test bucket assignment

⚙ Quick Reference

13 commands from this guide

File	Command / Code	Purpose
window_basics_over.sql	CREATE TABLE sales (	The OVER Clause
evaluation_order.sql	SELECT	SQL Evaluation Order
ranking_functions.sql	SELECT	Ranking Functions
lag_lead_comparison.sql	WITH monthly_revenue AS (	LAG and LEAD
window_frames.sql	SELECT	Window Frames
window_clause.sql	SELECT	The WINDOW Clause
first_last_nth.sql	SELECT	FIRST_VALUE, LAST_VALUE, and NTH_VALUE
ntile_cume_dist.sql	SELECT	Statistical Window Functions
TaxonomyDemo.sql	WITH sales AS (	Types of Window Functions
PercentileDemo.sql	WITH scores AS (	PERCENT_RANK and CUME_DIST
frame_specification.sql	SELECT salary,	Frame Specification
time_series_window.sql	SELECT date, revenue,	Window Functions for Time Series
window_vs_groupby.sql	SELECT employee_id, department, salary,	Window Functions vs GROUP BY

Key takeaways

Window functions compute values across related rows without collapsing them

the opposite of GROUP BY which removes individual rows.

ROW_NUMBER for deduplication, RANK for competition rankings, DENSE_RANK for Nth-highest queries.

LAG/LEAD replace expensive self-joins for row-neighbour comparisons

the canonical tool for period-over-period analysis.

The default frame with ORDER BY is RANGE

specify ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW explicitly for running totals to avoid tie-handling bugs.

PARTITION BY without an index causes a filesort on every execution

always add a composite index on (partition_col, order_col) for large tables.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR

What is the difference between GROUP BY and a window function with PARTI...

Q02SENIOR

What is the difference between ROW_NUMBER(), RANK(), and DENSE_RANK()?

Q03SENIOR

Explain window function frames — what is the default frame and when does...

Q01 of 03JUNIOR

What is the difference between GROUP BY and a window function with PARTITION BY?

ANSWER

GROUP BY collapses multiple rows into a single summary row per group — the individual rows are gone from the result. A window function with PARTITION BY computes the aggregate within each partition but keeps every individual row in the output. The computed value appears as an additional column on each row. Example: GROUP BY rep_name returns one row per rep with a total. SUM(amount) OVER (PARTITION BY rep_name) returns every transaction with the rep's running total alongside each row. Use GROUP BY when you want summaries. Use window functions when you want derived values computed from a group but need the individual rows too.

FAQ · 3 QUESTIONS

Frequently Asked Questions

Can I use window functions in a WHERE clause?

What is the difference between SUM() OVER() and SUM() GROUP BY?

Are window functions supported in MySQL?

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.

✓ Verified

production tested

July 19, 2026

last updated

2,466

articles · all by Naren

🔥

That's SQL Advanced. Mark it forged?

9 min read · try the examples if you haven't

SQL Window Functions: No PARTITION BY Crashes 50M Rows

Why Window Functions Are Not Just Fancy GROUP BYs

The OVER Clause — How Windows Are Defined

SQL Evaluation Order — Where Window Functions Live

Ranking Functions — ROW_NUMBER, RANK, and DENSE_RANK

LAG and LEAD — Comparing Rows to Their Neighbours

Window Frames — ROWS vs RANGE and the Subtle Default

The WINDOW Clause — Reusing Window Definitions

FIRST_VALUE, LAST_VALUE, and NTH_VALUE — Row Navigation Within Partitions

Statistical Window Functions — NTILE and CUME_DIST

Types of Window Functions — A Battlefield Taxonomy

PERCENT_RANK and CUME_DIST — Your Stats Toolkit for Real Data

Frame Specification: ROWS vs RANGE vs GROUPS

Window Functions for Time Series: LEAD, LAG, FIRST_VALUE, LAST_VALUE

Window Functions vs GROUP BY: When to Use Each

A Window Function Over an Unpartitioned 50M-Row Table Brought the Reporting Database Down

Key takeaways

Interview Questions on This Topic

Frequently Asked Questions

That's SQL Advanced. Mark it forged?