CTE Referenced 3 Times — 50M Rows Scanned in SQL Server
A CTE referenced three times in SQL Server caused 150M logical reads on a 50M-row table.
20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.
- A CTE (WITH clause) is a named, query-scoped result set — exists only for one query execution, no schema changes needed
- Chain multiple CTEs in one WITH clause — later CTEs can reference earlier ones, creating a readable pipeline
- Recursive CTEs need UNION ALL (not UNION), an anchor member (base case), and a recursive member that references itself
- Materialization is engine-specific: PostgreSQL 12+ inlines by default; SQL Server may re-execute a CTE every time it is referenced
- Data-modifying CTEs (PostgreSQL): DELETE/INSERT with RETURNING enables atomic archive-and-delete in one transaction
- Biggest trap: referencing an expensive CTE multiple times in SQL Server — use a #temp table instead
Imagine you're building a massive LEGO set. Instead of dumping all 1,000 pieces on the floor and panicking, you sort them into labeled bags first — 'wheels bag', 'windows bag', 'roof bag'. A CTE is exactly that labeled bag. You name a mini-query, tuck it away with a label, then reference that label as if it were a real table throughout your main query. No mess, no repeated code, no confusion.
Every production database eventually grows a query so complex it becomes a maintenance nightmare — subqueries nested inside subqueries, the same filter logic copy-pasted five times, and a WHERE clause that scrolls off the screen. CTEs were introduced precisely because real SQL work at scale demands readable, composable, and debuggable queries.
The core problem CTEs solve is scope and readability without sacrificing power. Before CTEs, your only option was derived tables (anonymous subqueries in FROM clauses) or views (which require DDL permissions and pollute your schema). CTEs give you a named, scoped, temporary result set that lives only for the duration of your query.
By the end of this article you'll write multi-step CTEs that replace tangled subquery nests, use recursive CTEs to walk org charts, understand exactly when your database materializes a CTE versus re-evaluates it, and avoid the three production gotchas that catch even experienced engineers off guard.
What a CTE Actually Is (and Isn't)
A Common Table Expression (CTE) is a named, temporary result set defined within a single SQL statement's execution scope. It is not a temp table, not a view, and not materialized by default — it's a syntactic construct that the query optimizer can rewrite, inline, or execute multiple times. In SQL Server, a CTE referenced three times in the same query can cause the underlying table to be scanned three separate times, leading to 50M rows scanned where 16M would suffice.
A CTE is defined with the WITH clause and exists only for the duration of the query that follows it. Crucially, SQL Server does not automatically cache or spool the CTE's result set. Each reference to the CTE name is expanded independently by the optimizer, meaning the same complex join or aggregation inside the CTE definition may be executed once per reference. This is the root cause of the "CTE multiplied scans" antipattern.
Use CTEs for readability and recursive queries, but never assume they optimize like temp tables. When the same CTE is referenced multiple times — especially against large tables — explicitly materialize it into a temp table or table variable. A single #temp table with a clustered index can collapse three full scans into one, cutting logical reads from 150,000 to 50,000.
The Anatomy of a CTE — Structure, Scope and Why It Beats Subqueries
A CTE is defined with the WITH keyword, followed by a name, an optional column list, and the query body in parentheses. Everything after that closing parenthesis is the outer query. The CTE only exists for the lifetime of that single outer query execution.
You can define multiple CTEs in one WITH clause by separating them with commas. Crucially, a later CTE can reference an earlier one — this is what makes chaining possible. Think of it as a pipeline: each CTE refines the data one step further before handing it to the next stage.
Why does this beat a subquery? Readability is only part of the answer. The bigger win is debuggability. When you have a three-level nested subquery and something goes wrong, you can't easily isolate which level is producing bad data. With CTEs, you SELECT directly from each named stage during development — comment out the outer query and inspect intermediate results. That feedback loop is invaluable on complex data pipelines.
Also note: a CTE referenced multiple times in the outer query does NOT automatically get computed once and cached. Whether it's materialized or re-evaluated depends on your database engine — a critical performance detail covered in Section 3.
CTE Scope — Why You Can't Reference a CTE in Another Query
A CTE (common table expression) is defined within a single SQL statement. Its name only exists for the duration of that statement — from the WITH keyword to the final semicolon. Once the query finishes executing, the CTE disappears. This limited scope is by design: it allows the database to manage temporary names without creating any persistent objects.
Why can't you reference a CTE in a separate query? Because the database engine does not store the CTE's result anywhere persistent. It is part of the query plan that exists only during execution. If you try to use the CTE name in a different batch or even in a separate statement within the same script, you'll get an error like "invalid object name" or "relation does not exist." This is fundamentally different from a view or a temporary table, which both have a longer lifespan.
Consider two separate queries in a script: the first defines and uses a CTE; the second tries to reference that same CTE name. The second query fails immediately because the CTE was not defined in its own WITH clause. This is the most common "gotcha" for engineers new to CTEs — they expect the name to carry over like a variable in a programming language.
The fix is simple: if you need to reuse the same logic across multiple queries, create a view or a temporary table instead. CTEs are scoped to a single query for a reason — they encourage composability within that query without creating side effects that other queries might accidentally depend on.
Recursive CTEs — Walking Trees, Org Charts and Hierarchical Data
A recursive CTE has two parts separated by UNION ALL: the anchor member (the base case — where the recursion starts) and the recursive member (which references the CTE itself to walk one level deeper). The engine executes the anchor once, then keeps executing the recursive member against the previous iteration's results until it returns zero rows.
The classic use case is an organizational hierarchy stored as an adjacency list — each employee row has a manager_id pointing to another employee_id. Without a recursive CTE, flattening that requires N queries for N levels. With recursive CTEs, one query handles any depth.
Cycles in data are the most dangerous edge case. If employee A is the manager of B and B is the manager of A (data corruption happens), a naive recursive CTE loops until it hits the depth limit. Always track visited IDs and break when you detect a repeat.
Recursive CTE for Graph Traversal — Shortest Path in a Directed Graph
Recursive CTEs are not limited to tree hierarchies; they can also traverse arbitrary directed graphs. A common production need is finding the shortest path between two nodes in a graph stored as an adjacency list (edges table). The challenge is that a graph can have multiple paths to the same node, and we want the shortest (fewest edges). A recursive CTE with breadth-first exploration naturally finds the shortest path because it visits nodes in order of increasing depth — the first time a target node is reached gives the minimal edge count.
To make this work efficiently: track visited nodes to avoid infinite loops (cycles) and store the path taken so far. Use an array to accumulate visited node IDs, and break when the target is found. Because the recursive CTE processes each level before moving to the next (as long as the database uses a BFS strategy, which PostgreSQL and SQL Server do for recursive CTEs), the first appearance of the target node will be the shortest path.
Performance note: on large graphs, recursive CTEs can be slow because they are not indexed for graph traversal. Use a LIMIT clause to stop early once the target is found, and add a depth bound to prevent runaway queries.
CTE Materialization vs. Re-evaluation — The Performance Truth Nobody Tells You
The SQL standard says nothing about whether a CTE must be materialized (computed once, stored) or re-evaluated (re-run every time it's referenced). Each database engine makes its own choice.
PostgreSQL prior to version 12 always materialized CTEs as optimization fences — the planner could not push WHERE predicates from the outer query into the CTE. PostgreSQL 12+ inlines non-recursive CTEs by default, allowing predicate pushdown. You can force materialization with the MATERIALIZED keyword.
SQL Server never guaranteed materialization — it treats CTEs as macro expansions. If you reference a CTE three times in a UNION ALL, SQL Server may execute that query three times. For expensive CTEs referenced repeatedly, load results into a #temp_table first.
MySQL 8.0+ inlines CTEs. Oracle materializes when it determines it's beneficial and accepts the WITH MATERIALIZE hint to force it.
Production CTE Patterns — Window Functions, Data Mutations and When Not to Use CTEs
CTEs compose beautifully with window functions because window function results can't be filtered in the same SELECT where they're computed — you need a wrapper, and a CTE is cleaner than a subquery for this.
CTEs also work with INSERT, UPDATE, DELETE and MERGE in PostgreSQL and SQL Server — called data-modifying CTEs. In PostgreSQL, data-modifying CTEs execute exactly once, and their effects are visible to the outer query but not to other CTEs in the same WITH clause.
Know when NOT to use a CTE. If your query is a single flat SELECT that runs fine as a subquery, wrapping it in a CTE adds no value. CTEs shine for multi-step logic, recursion, and avoiding repeated complex expressions. A CTE over a 50-million-row unindexed table is still a full table scan — CTEs don't fix missing indexes.
CTE vs View vs Temporary Table — When to Use Which
Choosing between a CTE, a view, and a temporary table depends on scope, performance requirements, and reuse pattern. All three allow you to encapsulate a query, but they differ drastically in lifespan, materialization behavior, and DDL impact.
CTEs are ideal for single-query decomposition. They have no schema footprint, disappear immediately after the query finishes, and cannot be reused across separate statements. They are the best choice for breaking down complex logic within one query, especially when recursion is needed or when you want to chain multiple transformation steps.
Views are persistent schema objects that wrap a query. They can be referenced across multiple queries by multiple users. They are a security boundary and provide abstraction. However, they require DDL permissions (CREATE VIEW) and can become a maintenance burden if not managed. Views are best for shared business logic that must be used consistently across many reports or applications.
Temporary tables (prefixed with # in SQL Server, temp table in PostgreSQL) are explicitly materialized on disk (or in memory for small ones). They persist for the duration of the session (or transaction in some cases). They are the only option when you need to reuse an expensive intermediate result multiple times within a script or across multiple statements without re-executing the source query. They also allow indexes and statistics, giving you more control over query performance.
Here's a quick reference:
| Feature | CTE | View | Temp Table |
|---|---|---|---|
| Scope | Single query | Schema (persistent) | Session/session scope |
| Reuse across queries | No | Yes | Yes |
| Materialization | Engine-specific | Engine-specific | Always materialized |
| Recursion support | Yes (WITH RECURSIVE) | No | No |
| DDL permissions | None | CREATE VIEW | CREATE TABLE (usually allowed for temp tables) |
| Index support | No | Yes (indexed views) | Yes |
| Best for | Multi-step single query, recursion | Shared logic across many queries | Repeated reuse of expensive result within a session |
In production, the pattern is often: use CTEs for readability within a single query, use temp tables when you need to break that query into multiple steps with explicit materialization, and use views only when the same logic must be shared across teams or applications.
CTEs and Indexing — Why Your Recursive CTE Just Killed the Server and How to Fix It
Nobody tells you this until you page them at 3 AM. A recursive CTE looks innocent. It's not. The recursive member executes per row, not per batch. If your anchor set returns 10k rows and each recursion step does a full table scan, you're generating millions of logical reads before your monitoring even twitches.
The fix isn't 'add an index'. It's add the right index. A recursive CTE accessing a parent-child table needs a covering index on the join columns used in the recursive step. Without it, SQL Server (or Postgres, or MySQL) spools to tempdb for every iteration. That's where performance goes to die.
Test this yourself. Write the recursive CTE. Check the execution plan — look for 'Eager Spool' or 'Index Scan' on the recursive side. If you see either, you're one bad index away from a production incident. The index you need is composite: (parent_id, child_id) if your join is on parent_id. Include any columns you filter or return. That's it. No magic.
CTE Side Effects — When They Don't Execute Exactly Once and Why That Breaks Your Report
Here's a fun one: you write a CTE that calls a function with side effects — maybe it logs access, updates a counter, or reads from a sequence. You reference the CTE twice in the outer query. SQL Server says 'I'll just inline this' and executes the function twice. Now you've double-logged, double-counted, or burned two sequence values. Good luck explaining that to the auditor.
This isn't a bug. It's by design. The optimizer decides whether to materialize a CTE or inline it based on cost estimates. You don't control it. If your CTE uses a non-deterministic function (GETDATE(), NEWID(), RAND(), any custom function with side effects), you get one execution per reference, not one execution total.
Fix: If you must have side effects exactly once, force materialization. Use a temp table. Or wrap the CTE in another CTE that adds TOP 100 PERCENT or ORDER BY a non-clustered column — some optimizers back off inlining when they see that. Better yet: don't put side-effect functions in CTEs at all. Push them to an explicit temp table. You lose the 'elegance' but gain correctness. Production rewards correctness.
GETDATE(), force it into a temp table. The optimizer will inline non-deterministic CTEs every time — and your side effects will double.CTEs in WHERE and FROM Subqueries — The Nesting Trap That Kills Readability Fast
You can use a CTE inside a subquery. You can use a subquery inside a CTE. You can stack three CTEs in a WITH clause. The SQL standard lets you. Doesn't mean you should. I've seen a 'report query' with six CTEs, each referencing the prior one, then two CTEs used inside a HAVING clause. It took me 45 minutes to trace the data flow. Nobody could maintain it.
The real problem: CTEs don't show in the execution plan as named entities. When you nest them inside subqueries, the optimizer flattens everything into one giant blob. You lose the ability to isolate performance problems. A slow CTE inside a WHERE clause becomes invisible — you just see a big 'Filter' operator with no clue which part is killing you.
My rule: one level of CTE for readability. If you need intermediate results, write them to temp tables with appropriate indexes. Name them. Document them. Your future self (or the junior who inherits this) will thank you when the query needs tuning. You don't get points for 'elegant nesting' in production — you get points for being able to explain what your query does at 2 AM.
Advanced SQL CTE Techniques — Chaining, Modifying Data, and Debugging Like a Pro
You've mastered the single CTE. Now chain them. A CTE can reference previous CTEs in the same WITH block. This isn't just neat — it lets you build transformations step by step without nesting subqueries into a nightmare. Each CTE becomes a named pipeline stage. You can inspect, debug, and reuse each stage independently. That's the whole point.
But here's where most devs miss the mark: CTEs can modify data. In PostgreSQL and SQL Server, you can wrap INSERT, UPDATE, or DELETE inside a CTE and target the result set directly. This is massive for batch operations — update rows based on a complex join, then insert a log entry in the same statement. One atomic operation. No race conditions.
Production tip: Chain CTEs but watch the optimizer. Some engines materialize each CTE, others inline them. If your chain is deep and your data is wide, you might end up reading the same table five times. Profile before you profile your career.
Potential Issues or Limitations of CTEs in SQL — The Hidden Costs Nobody Warns You About
CTEs look clean but they hide landmines. First: no index hints. You can't force an index inside a CTE definition. If your recursive CTE walks a million-node tree and the join column isn't indexed, you're toast. The server will scan full tables every recursion level. That's not a bug — it's your poor schema design.
Second: CTEs are often materialized only once, but some engines (hello, SQL Server) might re-evaluate them multiple times inside the same query. If your CTE uses a volatile function like GETDATE() or RAND(), you'll get inconsistent results. Debugging that is a special kind of hell.
Third: recursion limits. MySQL defaults to 1000 recursion depth. PostgreSQL defaults to 100 and you have to explicitly SET session to raise it. Forget that, and your org chart query silently truncates at level 7. Nobody catches that until leadership asks why the CEO's reports are missing.
Fourth: CTEs can't be indexed, partitioned, or parallelized the way temp tables can. For large datasets, a temp table with proper indexes will blow past a CTE every time.
CTEs in Azure Synapse Analytics — Why Your On-Premise Query Breaks in the Cloud
Azure Synapse Analytics (formerly SQL Data Warehouse) and Analytics Platform System (PDW) support CTEs but with critical restrictions that catch teams off guard. Unlike SQL Server, Synapse requires that every CTE be referenced at least once in the main query body—orphan CTEs silently fail. Recursive CTEs are outright banned because Synapse's MPP architecture cannot handle stack-based recursion across distribution nodes. Non-recursive CTEs behave as inline views, not materialized staging tables, so referencing the same CTE twice causes double execution and massive slowdowns. CTEs also cannot contain ORDER BY without TOP or OFFSET/FETCH, nor can they use OPTION clauses. The biggest trap: Synapse CTEs are limited to one SELECT statement—you cannot chain CTEs with UNION in the same WITH block. Always test CTE logic in a dedicated SQL pool before production deploy.
Arguments and Usage Guidelines — How Databases Differ on CTE Syntax
Every major SQL dialect supports CTEs but with subtle argument differences that break portability. PostgreSQL and MySQL allow CTEs to have explicit column aliases: WITH cte (col1, col2) AS (SELECT ...). SQL Server and Oracle require this when subquery columns lack names (e.g., aggregate expressions). DB2 and Teradata enforce strict recursion depth limits via OPTION (MAXRECURSION n)—default is 100 in SQL Server, infinite in PostgreSQL unless you set max_recursive_iterations. The most ignored guideline: never use CTEs for singleton lookups. CTEs optimize poorly for single-row fetches because the optimizer scans the entire CTE definition before applying outer filters. Another hard rule—CTEs in subqueries or WHERE clauses force nested scopes that degrade readability faster than any other pattern. Always define CTEs at the top of the query, not inside derived tables. If you exceed 3 CTEs chained, extract the logic into a view.
A CTE Referenced Three Times Scanned 50M Rows Three Times in SQL Server
- SQL Server CTEs are not cached — if referenced N times, the underlying query may execute N times
- Use SET STATISTICS IO ON to count logical reads before assuming a query is efficient
- For expensive CTEs referenced more than once in SQL Server, always use a #temp table
Key takeaways
Common mistakes to avoid
3 patternsReferencing a CTE before it is defined in the WITH clause
Assuming a CTE is computed once when referenced multiple times in SQL Server
Writing a recursive CTE without cycle detection on potentially dirty hierarchical data
Interview Questions on This Topic
What is the difference between a CTE and a subquery, and when would you choose one in production?
Frequently Asked Questions
20+ years shipping high-throughput database systems. Notes here come from systems that actually shipped.
That's SQL Advanced. Mark it forged?
15 min read · try the examples if you haven't