CTE Referenced 3 Times — 50M Rows Scanned in SQL Server
A CTE referenced three times in SQL Server caused 150M logical reads on a 50M-row table.
- A CTE (WITH clause) is a named, query-scoped result set — exists only for one query execution, no schema changes needed
- Chain multiple CTEs in one WITH clause — later CTEs can reference earlier ones, creating a readable pipeline
- Recursive CTEs need UNION ALL (not UNION), an anchor member (base case), and a recursive member that references itself
- Materialization is engine-specific: PostgreSQL 12+ inlines by default; SQL Server may re-execute a CTE every time it is referenced
- Data-modifying CTEs (PostgreSQL): DELETE/INSERT with RETURNING enables atomic archive-and-delete in one transaction
- Biggest trap: referencing an expensive CTE multiple times in SQL Server — use a #temp table instead
Every production database eventually grows a query so complex it becomes a maintenance nightmare — subqueries nested inside subqueries, the same filter logic copy-pasted five times, and a WHERE clause that scrolls off the screen. CTEs were introduced precisely because real SQL work at scale demands readable, composable, and debuggable queries.
The core problem CTEs solve is scope and readability without sacrificing power. Before CTEs, your only option was derived tables (anonymous subqueries in FROM clauses) or views (which require DDL permissions and pollute your schema). CTEs give you a named, scoped, temporary result set that lives only for the duration of your query.
By the end of this article you'll write multi-step CTEs that replace tangled subquery nests, use recursive CTEs to walk org charts, understand exactly when your database materializes a CTE versus re-evaluates it, and avoid the three production gotchas that catch even experienced engineers off guard.
The Anatomy of a CTE — Structure, Scope and Why It Beats Subqueries
A CTE is defined with the WITH keyword, followed by a name, an optional column list, and the query body in parentheses. Everything after that closing parenthesis is the outer query. The CTE only exists for the lifetime of that single outer query execution.
You can define multiple CTEs in one WITH clause by separating them with commas. Crucially, a later CTE can reference an earlier one — this is what makes chaining possible. Think of it as a pipeline: each CTE refines the data one step further before handing it to the next stage.
Why does this beat a subquery? Readability is only part of the answer. The bigger win is debuggability. When you have a three-level nested subquery and something goes wrong, you can't easily isolate which level is producing bad data. With CTEs, you SELECT directly from each named stage during development — comment out the outer query and inspect intermediate results. That feedback loop is invaluable on complex data pipelines.
Also note: a CTE referenced multiple times in the outer query does NOT automatically get computed once and cached. Whether it's materialized or re-evaluated depends on your database engine — a critical performance detail covered in Section 3.
Recursive CTEs — Walking Trees, Org Charts and Hierarchical Data
A recursive CTE has two parts separated by UNION ALL: the anchor member (the base case — where the recursion starts) and the recursive member (which references the CTE itself to walk one level deeper). The engine executes the anchor once, then keeps executing the recursive member against the previous iteration's results until it returns zero rows.
The classic use case is an organizational hierarchy stored as an adjacency list — each employee row has a manager_id pointing to another employee_id. Without a recursive CTE, flattening that requires N queries for N levels. With recursive CTEs, one query handles any depth.
Cycles in data are the most dangerous edge case. If employee A is the manager of B and B is the manager of A (data corruption happens), a naive recursive CTE loops until it hits the depth limit. Always track visited IDs and break when you detect a repeat.
CTE Materialization vs. Re-evaluation — The Performance Truth Nobody Tells You
The SQL standard says nothing about whether a CTE must be materialized (computed once, stored) or re-evaluated (re-run every time it's referenced). Each database engine makes its own choice.
PostgreSQL prior to version 12 always materialized CTEs as optimization fences — the planner could not push WHERE predicates from the outer query into the CTE. PostgreSQL 12+ inlines non-recursive CTEs by default, allowing predicate pushdown. You can force materialization with the MATERIALIZED keyword.
SQL Server never guaranteed materialization — it treats CTEs as macro expansions. If you reference a CTE three times in a UNION ALL, SQL Server may execute that query three times. For expensive CTEs referenced repeatedly, load results into a #temp_table first.
MySQL 8.0+ inlines CTEs. Oracle materializes when it determines it's beneficial and accepts the WITH MATERIALIZE hint to force it.
Production CTE Patterns — Window Functions, Data Mutations and When Not to Use CTEs
CTEs compose beautifully with window functions because window function results can't be filtered in the same SELECT where they're computed — you need a wrapper, and a CTE is cleaner than a subquery for this.
CTEs also work with INSERT, UPDATE, DELETE and MERGE in PostgreSQL and SQL Server — called data-modifying CTEs. In PostgreSQL, data-modifying CTEs execute exactly once, and their effects are visible to the outer query but not to other CTEs in the same WITH clause.
Know when NOT to use a CTE. If your query is a single flat SELECT that runs fine as a subquery, wrapping it in a CTE adds no value. CTEs shine for multi-step logic, recursion, and avoiding repeated complex expressions. A CTE over a 50-million-row unindexed table is still a full table scan — CTEs don't fix missing indexes.
| Aspect | CTE (WITH clause) | Derived Table (Subquery in FROM) | Temporary Table | View |
|---|---|---|---|---|
| Readability | Excellent — named, reusable within query | Poor — anonymous, nested | Good — named | Good — named and shared |
| Reuse within query | Yes — reference same CTE multiple times | No — must repeat the subquery | Yes | Yes |
| Recursion | Yes — WITH RECURSIVE | No | No | No |
| Schema footprint | None — query-scoped only | None | Requires CREATE/DROP | Permanent schema object |
| Materialization | Engine-specific — may re-evaluate | Always inlined | Always materialized — explicit | Engine-specific |
| Best for | Multi-step reads, recursion, one-time use | Simple single-use subqueries | Expensive results reused many times | Shared logic across many queries |
Key Takeaways
- CTEs are query-scoped named result sets — no schema changes needed, live only for one query execution, and are the cleanest way to decompose multi-step SQL into readable stages.
- Recursive CTEs require UNION ALL (not UNION), always need a termination condition, and should include cycle-detection logic for real-world hierarchical data.
- CTE materialization is engine-specific — PostgreSQL 12+ inlines by default, SQL Server may re-execute. Verify with EXPLAIN and use temp tables for expensive CTEs referenced multiple times in SQL Server.
- Data-modifying CTEs in PostgreSQL enable atomic DELETE + INSERT in one transaction — but sibling CTEs cannot see each other's changes, only the outer query can.
Common Mistakes to Avoid
- Referencing a CTE before it is defined in the WITH clause
Symptom: 'relation does not exist' or 'invalid object name' error even though the CTE is present in the query
Fix: CTEs must be defined in dependency order. A CTE can only reference CTEs that appear earlier in the same WITH clause. Reorder your CTEs so dependencies come first. - Assuming a CTE is computed once when referenced multiple times in SQL Server
Symptom: Mysteriously slow queries where EXPLAIN shows the expensive base table scanned multiple times — one scan per reference
Fix: In SQL Server, CTEs referenced more than once may be re-evaluated each time. Run SET STATISTICS IO ON and check logical reads. For expensive CTEs referenced multiple times, materialize into a #temp_table first. - Writing a recursive CTE without cycle detection on potentially dirty hierarchical data
Symptom: Query throws 'maximum recursion depth exceeded' error — happens only in production where data has referential corruption not present in test data
Fix: Always track visited node IDs in an ARRAY (PostgreSQL) or add a depth counter with WHERE depth < N. Never assume hierarchical data is clean — code defensively against cycles.
Interview Questions on This Topic
- QWhat is the difference between a CTE and a subquery, and when would you choose one in production?Mid-levelReveal
- QExplain how a recursive CTE works — what are the anchor member and recursive member, and how does the engine know when to stop?SeniorReveal
- QIf you reference the same CTE three times in a SQL Server UNION ALL, how many times does the underlying query execute — and how do you prove it?SeniorReveal
Frequently Asked Questions
Does a SQL CTE improve query performance?
Not inherently. A CTE is a logical construct, not a performance optimization. Whether it improves performance depends on whether your engine materializes it. PostgreSQL 12+ inlines CTEs by default — no performance benefit over a subquery unless you force MATERIALIZED. SQL Server may re-execute CTE logic multiple times. For genuine performance gains with reused intermediate results, use a temp table.
Can you use a CTE in an UPDATE or DELETE statement?
Yes, in most major databases. In PostgreSQL and SQL Server you can define a CTE and then UPDATE or DELETE based on it. PostgreSQL additionally supports data-modifying CTEs where the CTE itself performs the DML using RETURNING to pass affected rows to the outer query — enabling atomic archive-and-delete patterns in a single statement.
What is the difference between WITH RECURSIVE and a regular CTE?
A regular CTE is evaluated once and cannot reference itself. A recursive CTE uses WITH RECURSIVE and contains two parts joined by UNION ALL: an anchor member (base case, evaluated once) and a recursive member (references the CTE itself, runs until it returns no new rows). Recursive CTEs are essential for querying hierarchical structures like org charts, category trees, or graph data in relational tables.
That's SQL Advanced. Mark it forged?
3 min read · try the examples if you haven't