SQL CTEs Explained: Recursive Queries, Performance Traps & Real-World Patterns
Every production database eventually grows a query so complex it becomes a maintenance nightmare — subqueries nested inside subqueries, the same filter logic copy-pasted five times, and a WHERE clause that scrolls off the screen. This isn't a hypothetical: it's Tuesday morning for most backend engineers. CTEs were introduced precisely because real SQL work at scale demands readable, composable, and debuggable queries. They're not just syntactic sugar — they change how you structure your thinking about data problems.
The core problem CTEs solve is scope and readability without sacrificing power. Before CTEs, your only option for breaking a query apart was either derived tables (anonymous subqueries in FROM clauses) or views (which require DDL permissions and pollute your schema with single-use objects). CTEs give you a named, scoped, temporary result set that lives only for the duration of your query — no schema changes, no extra permissions, no cleanup required. And with the RECURSIVE variant, they unlock an entirely different class of problem: hierarchical data, graph traversal, and sequence generation that would otherwise require procedural loops.
By the end of this article you'll be able to write multi-step CTEs that replace tangled subquery nests, use recursive CTEs to walk org charts and file trees, understand exactly when your database materializes a CTE versus re-evaluates it (and why that matters enormously for performance), and avoid the three production gotchas that catch even experienced engineers off guard.
The Anatomy of a CTE — Structure, Scope and Why It Beats Subqueries
A CTE is defined with the WITH keyword, followed by a name, an optional column list, and the query body in parentheses. Everything after that closing parenthesis is the 'outer query' — the part that actually returns results to your application. The CTE only exists for the lifetime of that single outer query execution. The moment it finishes, the CTE is gone.
You can define multiple CTEs in one WITH clause by separating them with commas. Crucially, a later CTE can reference an earlier one — this is what makes chaining possible. Think of it as a pipeline: each CTE refines the data one step further before handing it to the next stage.
Why does this beat a subquery? Readability is only part of the answer. The bigger win is debuggability. When you have a three-level nested subquery and something goes wrong, you can't easily isolate which level is producing garbage data. With CTEs, you SELECT directly from each named stage during development — you comment out the outer query and inspect intermediate results. That feedback loop is invaluable on complex data pipelines.
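A minimal sketch of that workflow (the table and column names here are illustrative, not a real schema): each stage gets a name, a later stage reads from an earlier one, and during development you can point a throwaway SELECT at any stage to inspect it.

```sql
WITH stage_one AS (
    SELECT customer_id, order_total
    FROM orders
    WHERE order_status = 'completed'
),
stage_two AS (
    SELECT customer_id, SUM(order_total) AS total_spend
    FROM stage_one              -- a later CTE can reference an earlier one
    GROUP BY customer_id
)
-- While debugging, comment out the real outer query and inspect a stage:
-- SELECT * FROM stage_one LIMIT 20;
SELECT customer_id, total_spend
FROM stage_two
WHERE total_spend > 100;
```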
Also note: a CTE referenced multiple times in the outer query does NOT automatically get computed once and cached. Whether it's materialized or re-evaluated depends on your database engine — a critical performance detail covered in Section 3.
```sql
-- CONTEXT: We have an e-commerce database.
-- GOAL: Find customers who placed more than 3 orders in 2023
--       AND whose total spend exceeded $500, then rank them by spend.

-- Step 1: Define the first CTE — filter orders from 2023 only
WITH orders_2023 AS (
    SELECT customer_id, order_id, order_total
    FROM orders
    WHERE EXTRACT(YEAR FROM order_date) = 2023
      AND order_status = 'completed'       -- ignore cancelled orders
),
-- Step 2: Second CTE references the first — aggregate per customer
customer_summary AS (
    SELECT customer_id,
           COUNT(order_id)  AS total_orders,
           SUM(order_total) AS total_spend
    FROM orders_2023          -- referencing the CTE above, not the raw table
    GROUP BY customer_id
),
-- Step 3: Third CTE — apply business rule filters
high_value_customers AS (
    SELECT customer_id, total_orders, total_spend
    FROM customer_summary
    WHERE total_orders > 3
      AND total_spend > 500.00
)
-- OUTER QUERY: Join filtered customers with their profile data and rank
SELECT c.customer_id,
       c.full_name,
       c.email,
       hvc.total_orders,
       hvc.total_spend,
       -- RANK assigns the same rank to ties; ROW_NUMBER would not
       RANK() OVER (ORDER BY hvc.total_spend DESC) AS spend_rank
FROM high_value_customers hvc
JOIN customers c ON c.customer_id = hvc.customer_id
ORDER BY spend_rank;
```
```
 customer_id | full_name         | email                    | total_orders | total_spend | spend_rank
-------------+-------------------+--------------------------+--------------+-------------+------------
       10482 | Sarah Okonkwo     | sarah.o@example.com      |            9 |     1842.50 |          1
        7291 | Marcus Tran       | m.tran@example.com       |            7 |     1205.00 |          2
        8816 | Priya Nambiar     | p.nambiar@example.com    |            5 |     1205.00 |          2
        3847 | Leon Ferreira     | leon.f@example.com       |            4 |      671.25 |          4
(4 rows)
```
Recursive CTEs — Walking Trees, Org Charts and Hierarchical Data
A recursive CTE has two parts separated by UNION ALL: the anchor member (the base case — where the recursion starts) and the recursive member (which references the CTE itself to walk one level deeper). The engine executes the anchor once, then keeps executing the recursive member against the previous iteration's results until the recursive member returns zero rows.
The classic use case is an organizational hierarchy stored as an adjacency list — each employee row has a manager_id pointing to another employee_id. Flatten that with a JOIN loop and you need N queries for N levels. With a recursive CTE, it's one query regardless of depth.
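Before the full org-chart traversal, the mechanics are easiest to see in the smallest possible recursive CTE — generating a number sequence. This is standard SQL (PostgreSQL, MySQL 8+, and SQLite accept it as written; SQL Server uses plain `WITH` without the `RECURSIVE` keyword):

```sql
WITH RECURSIVE numbers AS (
    SELECT 1 AS n        -- anchor member: executes exactly once
    UNION ALL
    SELECT n + 1         -- recursive member: runs against the previous iteration's rows
    FROM numbers
    WHERE n < 10         -- termination condition: produce no rows once n reaches 10
)
SELECT n FROM numbers;   -- returns 1 through 10
```

The engine stops when the recursive member returns zero rows — here, as soon as `n < 10` filters everything out.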
Two critical safety mechanisms exist. First, always include a termination condition in your recursive member's WHERE clause (a depth cap, a cycle guard, or both). Second, some engines enforce a maximum recursion depth: SQL Server defaults to 100, configurable with OPTION (MAXRECURSION n), where 0 removes the cap entirely — exceed it and you get a runtime error, not wrong data. PostgreSQL, by contrast, has no built-in depth limit at all: a runaway recursive CTE simply keeps executing until it exhausts memory or disk or hits a statement timeout. That's why an explicit depth guard is worth writing even when the hierarchy 'should' be shallow.
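A sketch of an explicit depth guard, reusing the article's employees/manager_id schema (the starting employee_id and the cap of 20 are illustrative):

```sql
WITH RECURSIVE reports AS (
    SELECT employee_id, manager_id, 0 AS depth
    FROM employees
    WHERE employee_id = 42        -- hypothetical starting employee
    UNION ALL
    SELECT e.employee_id, e.manager_id, r.depth + 1
    FROM employees e
    JOIN reports r ON e.manager_id = r.employee_id
    WHERE r.depth < 20            -- hard cap: stop even if the data contains a cycle
)
SELECT * FROM reports;

-- SQL Server's engine-level equivalent is a query hint on the outer statement:
-- SELECT ... FROM reports OPTION (MAXRECURSION 500);  -- 0 = unlimited (use with care)
```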
Cycles in data are the most dangerous edge case. If employee A is somehow the manager of B, and B is the manager of A (data corruption happens), a naive recursive CTE will loop until it hits the engine's depth limit — or, on engines without one, until it exhausts resources. The fix is to track visited IDs in an array and stop when you detect a repeat — shown in the code below.
```sql
-- CONTEXT: Company org chart stored as adjacency list.
--   employee_id | full_name | manager_id | department | salary
--   NULL manager_id means this person is the root (CEO).
-- GOAL: Starting from a given employee, traverse DOWN the org chart
--       to find all direct and indirect reports, their depth level,
--       and guard against circular references in dirty data.

WITH RECURSIVE org_hierarchy AS (
    -- ANCHOR MEMBER: Start with the target employee (e.g., the VP of Engineering)
    -- This executes exactly once.
    SELECT employee_id,
           full_name,
           manager_id,
           department,
           salary,
           0 AS depth,                          -- root is level 0
           ARRAY[employee_id] AS visited_ids,   -- track path to detect cycles
           full_name::TEXT AS reporting_path    -- human-readable trail
    FROM employees
    WHERE employee_id = 42                      -- 42 is our VP of Engineering

    UNION ALL

    -- RECURSIVE MEMBER: Find everyone who reports to the previous level.
    -- 'org_hierarchy' here refers to the PREVIOUS iteration's result set.
    SELECT e.employee_id,
           e.full_name,
           e.manager_id,
           e.department,
           e.salary,
           oh.depth + 1 AS depth,
           oh.visited_ids || e.employee_id AS visited_ids,
           oh.reporting_path || ' -> ' || e.full_name AS reporting_path
    FROM employees e
    -- JOIN the recursive CTE to itself — this is what walks the tree
    INNER JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
    -- CYCLE GUARD: stop if we've already visited this employee_id in this path.
    -- Without this, circular data causes runaway recursion.
    WHERE NOT (e.employee_id = ANY(oh.visited_ids))
)
-- OUTER QUERY: Present the full org tree with summary stats
SELECT depth,
       LPAD('', depth * 4, ' ') || full_name AS org_chart,   -- indent by level
       department,
       salary,
       reporting_path
FROM org_hierarchy
ORDER BY depth, full_name;
```
```
 depth | org_chart                | department  | salary    | reporting_path
-------+--------------------------+-------------+-----------+--------------------------------------------------
     0 | Dana Kimura              | Engineering | 185000.00 | Dana Kimura
     1 |     Alex Petrov          | Backend     | 140000.00 | Dana Kimura -> Alex Petrov
     1 |     Yemi Adeyinka        | Frontend    | 138000.00 | Dana Kimura -> Yemi Adeyinka
     2 |         Carlos Mendes    | Backend     | 110000.00 | Dana Kimura -> Alex Petrov -> Carlos Mendes
     2 |         Fatima Al-Rashid | Frontend    | 112000.00 | Dana Kimura -> Yemi Adeyinka -> Fatima Al-Rashid
     2 |         Jin-Ho Park      | Backend     | 108000.00 | Dana Kimura -> Alex Petrov -> Jin-Ho Park
(6 rows)
```
CTE Materialization vs. Re-evaluation — The Performance Truth Nobody Tells You
This is where most articles stop short and where most production bugs hide. The SQL standard says nothing about whether a CTE must be materialized (computed once, result stored in memory/temp space) or re-evaluated (the CTE's query re-runs every time the outer query references it). Each database engine makes its own choice — and those choices changed in recent versions.
PostgreSQL prior to version 12 always materialized CTEs, treating them as optimization fences. This meant the planner could NOT push WHERE-clause predicates from the outer query into the CTE — every row the CTE produced was computed before filtering happened. From version 12 onward, PostgreSQL inlines a non-recursive CTE by default when it is referenced exactly once and has no side effects (CTEs referenced multiple times are still materialized by default), allowing predicate pushdown. You can force the fence behavior explicitly with the MATERIALIZED keyword when you want it (e.g., to prevent volatile functions from running multiple times), or force inlining with NOT MATERIALIZED.
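A minimal sketch of both keywords (PostgreSQL 12+ syntax; the `events` table is hypothetical):

```sql
-- Force the fence: the CTE is computed in full, and the outer WHERE
-- is applied afterwards — an index on events.id cannot be used.
WITH big_scan AS MATERIALIZED (
    SELECT id, payload FROM events
)
SELECT * FROM big_scan WHERE id = 7;

-- Force inlining: the planner may push id = 7 into the CTE and use an
-- index on events.id. (NOT MATERIALIZED matters most when the CTE is
-- referenced more than once, since multi-reference CTEs are otherwise
-- materialized by default.)
WITH big_scan AS NOT MATERIALIZED (
    SELECT id, payload FROM events
)
SELECT * FROM big_scan WHERE id = 7;
```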
SQL Server never guaranteed materialization — it treats CTEs as macro expansions by default. If you reference a CTE three times in your outer query, SQL Server may execute that CTE's query three times. For expensive CTEs referenced repeatedly, this is a silent performance killer. The fix: load results into a temp table (#temp_table) first.
MySQL added CTE support in 8.0 and treats CTEs like derived tables: the optimizer may merge (inline) them into the outer query or materialize them, depending on the query. Oracle materializes CTEs when its cost-based optimizer determines it's beneficial, and the widely used (though undocumented) /*+ MATERIALIZE */ hint can force it. Understanding your specific engine's behavior isn't optional — it's the difference between a 50ms query and a 30-second one.
```sql
-- ============================================================
-- PATTERN 1: PostgreSQL — Force materialization to use the CTE
-- as an optimization fence (prevents predicate pushdown intentionally)
-- Use case: The CTE calls a volatile function you only want to run once
-- ============================================================
WITH daily_exchange_rates AS MATERIALIZED (
    -- get_live_rate() hits an external FDW (foreign data wrapper) — expensive!
    -- Without MATERIALIZED, if referenced 3x, it could be called 3x.
    SELECT currency_code,
           get_live_rate(currency_code) AS usd_rate,   -- volatile external call
           CURRENT_TIMESTAMP AS rate_fetched_at
    FROM supported_currencies
    WHERE is_active = true
)
SELECT o.order_id,
       o.order_total * der.usd_rate AS total_usd,
       der.rate_fetched_at
FROM orders o
JOIN daily_exchange_rates der ON der.currency_code = o.currency_code
WHERE o.order_status = 'pending';
```

```sql
-- ============================================================
-- PATTERN 2: SQL Server — CTE referenced multiple times.
-- DANGEROUS: SQL Server may re-run expensive_aggregation 3 times.
-- ============================================================
-- ANTI-PATTERN (avoid when CTE is expensive and referenced multiple times)
WITH expensive_aggregation AS (
    SELECT region_id,
           SUM(revenue) AS total_revenue,
           AVG(revenue) AS avg_revenue,
           COUNT(*)     AS transaction_count
    FROM sales_transactions
    WHERE transaction_date >= '2024-01-01'   -- scanning millions of rows
    GROUP BY region_id
)
SELECT 'Top' AS tier, * FROM expensive_aggregation WHERE total_revenue > 1000000
UNION ALL
SELECT 'Mid' AS tier, * FROM expensive_aggregation WHERE total_revenue BETWEEN 250000 AND 1000000
UNION ALL
SELECT 'Low' AS tier, * FROM expensive_aggregation WHERE total_revenue < 250000;
-- ^ SQL Server may scan sales_transactions THREE TIMES for this query

-- BETTER PATTERN for SQL Server: use a temp table for expensive CTEs
-- referenced multiple times
SELECT region_id,
       SUM(revenue) AS total_revenue,
       AVG(revenue) AS avg_revenue,
       COUNT(*)     AS transaction_count
INTO #region_aggregates          -- materialize ONCE into a temp table
FROM sales_transactions
WHERE transaction_date >= '2024-01-01'
GROUP BY region_id;

SELECT 'Top' AS tier, * FROM #region_aggregates WHERE total_revenue > 1000000
UNION ALL
SELECT 'Mid' AS tier, * FROM #region_aggregates WHERE total_revenue BETWEEN 250000 AND 1000000
UNION ALL
SELECT 'Low' AS tier, * FROM #region_aggregates WHERE total_revenue < 250000;

DROP TABLE #region_aggregates;   -- clean up
```
```
-- PATTERN 1 OUTPUT (PostgreSQL):
 order_id | total_usd | rate_fetched_at
----------+-----------+------------------------
   100291 |    523.84 | 2024-11-15 09:42:11+00
   100307 |   1204.17 | 2024-11-15 09:42:11+00
   100318 |     88.50 | 2024-11-15 09:42:11+00
(3 rows)

-- PATTERN 2 OUTPUT (SQL Server temp table approach):
 tier | region_id | total_revenue | avg_revenue | transaction_count
------+-----------+---------------+-------------+-------------------
 Top  |         3 |    2845000.00 |     9483.33 |               300
 Top  |         7 |    1120000.00 |     5600.00 |               200
 Mid  |         1 |     780000.00 |     3900.00 |               200
 Low  |        12 |     145000.00 |     1450.00 |               100
```
Production CTE Patterns — Window Functions, Data Mutations and When Not to Use CTEs
CTEs compose beautifully with window functions because window functions can't be filtered in the same SELECT where they're computed — you'd need a subquery or CTE to wrap them anyway. A common pattern: compute running totals or percentile rankings in a CTE, then filter by those derived values in the outer query.
CTEs also work with INSERT, UPDATE, DELETE and MERGE in most databases — this is called a data-modifying CTE, and it's powerful for complex archival and upsert logic. In PostgreSQL, data-modifying CTEs execute exactly once regardless of how many times they're referenced, and every sub-statement runs against the same snapshot taken at the start of the statement: neither sibling CTEs nor the outer query can see the table changes a data-modifying CTE made — the only way to pass affected rows forward is the CTE's RETURNING clause. That isolation is by design and catches people off guard.
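A minimal sketch of that snapshot rule (PostgreSQL; the `sessions` table is hypothetical). The sibling SELECT still counts rows the sibling DELETE removed, because both run against the snapshot from the start of the statement:

```sql
WITH deleted AS (
    DELETE FROM sessions
    WHERE expires_at < now()
    RETURNING session_id            -- the ONLY channel through which the
                                    -- deleted rows are visible downstream
),
still_there AS (
    -- Sees the pre-delete snapshot: expired rows are still counted here,
    -- even though the sibling CTE just deleted them.
    SELECT COUNT(*) AS cnt FROM sessions
)
SELECT (SELECT cnt FROM still_there)      AS count_as_of_statement_start,
       (SELECT COUNT(*) FROM deleted)     AS rows_deleted;
```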
Know when NOT to use a CTE. If your query is a single, flat SELECT that runs fine as a subquery, wrapping it in a CTE adds no value — it just adds visual complexity. CTEs shine when you're breaking apart multi-step logic, need recursion, or want to avoid repeating a complex expression. They don't shine when they become an excuse to avoid fixing a bad data model or missing index. A CTE over a 50-million-row unindexed column is still a full table scan — the CTE doesn't fix underlying problems.
```sql
-- ============================================================
-- PATTERN A: CTE + Window Function Pipeline
-- Find products in the bottom 10th percentile of sales velocity
-- so the warehouse team can flag them for clearance review
-- ============================================================
WITH product_sales_velocity AS (
    -- Step 1: Calculate units sold per day since launch for each product
    SELECT p.product_id,
           p.product_name,
           p.category,
           p.launch_date,
           SUM(oi.quantity) AS units_sold,
           CURRENT_DATE - p.launch_date AS days_since_launch,
           -- avoid division by zero for products launched today
           ROUND(SUM(oi.quantity)::NUMERIC
                 / NULLIF(CURRENT_DATE - p.launch_date, 0), 2) AS daily_velocity
    FROM products p
    LEFT JOIN order_items oi ON oi.product_id = p.product_id
    LEFT JOIN orders o ON o.order_id = oi.order_id
                      AND o.order_status = 'completed'
    WHERE p.is_active = true
      AND p.launch_date < CURRENT_DATE - INTERVAL '30 days'  -- exclude brand-new products
    GROUP BY p.product_id, p.product_name, p.category, p.launch_date
),
velocity_with_percentile AS (
    -- Step 2: Add percentile rank WITHIN each category.
    -- We can't filter on PERCENT_RANK directly in the same SELECT — hence the CTE
    SELECT product_id,
           product_name,
           category,
           units_sold,
           days_since_launch,
           daily_velocity,
           PERCENT_RANK() OVER (
               PARTITION BY category          -- rank within category, not globally
               ORDER BY daily_velocity ASC    -- ascending: slowest movers rank lowest
           ) AS velocity_percentile
    FROM product_sales_velocity
)
-- Step 3: Outer query — only pull the slow movers for clearance flagging
SELECT product_id,
       product_name,
       category,
       daily_velocity,
       -- cast to NUMERIC: two-argument ROUND doesn't accept double precision
       ROUND((velocity_percentile * 100)::NUMERIC, 1) AS percentile_position,
       'CLEARANCE_REVIEW' AS recommended_action
FROM velocity_with_percentile
WHERE velocity_percentile <= 0.10    -- bottom 10% within their category
ORDER BY category, daily_velocity ASC;
```

```sql
-- ============================================================
-- PATTERN B: Data-Modifying CTE (PostgreSQL)
-- Archive completed orders older than 2 years atomically:
-- delete from the live table, then insert the deleted rows into
-- the archive table. Both happen in one statement (one transaction)
-- — no risk of double data or data loss.
-- ============================================================
WITH orders_to_archive AS (
    -- Identify candidates
    SELECT order_id
    FROM orders
    WHERE order_status = 'completed'
      AND order_date < CURRENT_DATE - INTERVAL '2 years'
),
archived_orders AS (
    -- DELETE from the live table, RETURNING the deleted rows.
    -- In PostgreSQL, RETURNING lets data-modifying CTEs pass rows forward
    DELETE FROM orders
    WHERE order_id IN (SELECT order_id FROM orders_to_archive)
    RETURNING *                       -- return all columns of deleted rows
)
-- INSERT the returned rows into the archive table
INSERT INTO orders_archive
SELECT *, CURRENT_TIMESTAMP AS archived_at   -- add archival timestamp
FROM archived_orders
RETURNING order_id, archived_at;             -- confirm what was archived
```
```
-- PATTERN A OUTPUT:
 product_id | product_name          | category    | daily_velocity | percentile_position | recommended_action
------------+-----------------------+-------------+----------------+---------------------+--------------------
       1984 | Neoprene Phone Wallet | Accessories |           0.02 |                 2.7 | CLEARANCE_REVIEW
       2847 | Retro Desk Lamp v2    | Home Office |           0.03 |                 4.2 | CLEARANCE_REVIEW
       3102 | Bamboo Pen Holder Set | Home Office |           0.07 |                 8.1 | CLEARANCE_REVIEW
(3 rows)
```
```
-- PATTERN B OUTPUT:
 order_id | archived_at
----------+----------------------------
    10042 | 2024-11-15 09:55:03.412+00
    10091 | 2024-11-15 09:55:03.412+00
    10178 | 2024-11-15 09:55:03.412+00
(3 rows)
```
| Aspect | CTE (WITH clause) | Derived Table (Subquery in FROM) | Temporary Table | View |
|---|---|---|---|---|
| Readability | Excellent — named, reusable within query | Poor — anonymous, nested, hard to trace | Good — clearly named | Good — named and shared |
| Reusability within query | Yes — reference same CTE multiple times | No — must repeat the subquery | Yes — reference multiple times | Yes — reference multiple times |
| Recursion support | Yes — WITH RECURSIVE | No | No (needs stored proc loop) | No |
| Schema footprint | None — query-scoped only | None | Requires CREATE/DROP in tempdb | Yes — permanent schema object |
| Materialization control | Engine-dependent; some DBs allow MATERIALIZED hint | No — always inlined | Always materialized — explicit | Engine-dependent |
| Performance for repeated ref | Risky — may re-evaluate (engine-specific) | Risky — always re-evaluates | Safe — computed once | Risky — may re-evaluate |
| Data modification support | Yes (PostgreSQL, SQL Server) | No | Yes — full DML | Limited (updatable views) |
| Best for | Complex multi-step reads, recursion, one-time use | Simple single-use subqueries | Expensive subresults reused many times | Shared logic reused across many queries |
🎯 Key Takeaways
- CTEs are query-scoped named result sets — they exist only for the duration of one query execution, require no schema changes, and are the cleanest way to decompose multi-step SQL logic into readable, debuggable stages.
- Recursive CTEs use UNION ALL between the anchor and recursive members (SQL Server requires it; PostgreSQL also accepts plain UNION, which de-duplicates rows), always need a termination condition, and should include cycle-detection logic when traversing real-world hierarchical data that might contain referential corruption.
- CTE materialization behavior is engine-specific and version-specific — PostgreSQL 12+ inlines by default, SQL Server treats CTEs as macro expansions that may re-execute. Always verify with EXPLAIN and use temp tables for expensive CTEs referenced multiple times in SQL Server.
- Data-modifying CTEs in PostgreSQL (DELETE/INSERT with RETURNING) enable atomic multi-step data operations in a single statement — but every part of the query sees the same pre-statement snapshot: neither sibling CTEs nor the outer query see the modified tables, and changed rows flow forward only through RETURNING.
⚠ Common Mistakes to Avoid
- ✕Mistake 1: Referencing a CTE before it's defined in the WITH clause — Symptom: 'relation does not exist' or 'invalid object name' error even though the CTE is present — Fix: CTEs must be defined in order. A CTE can only reference CTEs that appear earlier in the same WITH clause. Reorder your CTEs so dependencies come first. Draw a dependency arrow diagram before writing complex multi-CTE queries.
- ✕Mistake 2: Assuming a CTE is always computed once in SQL Server — Symptom: Mysteriously slow queries where EXPLAIN shows the expensive base table scanned multiple times — Fix: In SQL Server, if you reference a CTE more than once in the outer query, it is typically re-evaluated each time. Load the result into a #temp_table first if the CTE is expensive and referenced more than once. Always verify with SET STATISTICS IO ON and check the logical reads count.
- ✕Mistake 3: Forgetting a recursion termination condition in RECURSIVE CTEs with potentially cyclic data — Symptom: in SQL Server, a runtime error like 'The maximum recursion 100 has been exhausted before statement completion'; in PostgreSQL, which has no depth cap, the query runs until it exhausts memory or disk or hits a statement timeout — Fix: Always track visited nodes in an array (PostgreSQL) or add a depth counter with a WHERE depth < N guard clause. Never assume your hierarchical data is clean — always code defensively against cycles, even if 'the data should never have cycles'.
Interview Questions on This Topic
- QWhat is the difference between a CTE and a subquery, and when would you choose one over the other in a production environment?
- QExplain how a recursive CTE works internally — what are the anchor member and recursive member, and how does the engine know when to stop recursing?
- QIf you reference the same CTE three times in a single query on SQL Server, how many times does the underlying query inside the CTE execute — and how would you prove your answer?
Frequently Asked Questions
Does a SQL CTE improve query performance?
Not inherently — a CTE is a logical construct, not a performance optimization. Whether it improves performance depends on whether your engine materializes it (computes once) or inlines it. PostgreSQL 12+ inlines single-reference CTEs by default, meaning no performance benefit over a subquery unless you force materialization. SQL Server may re-execute CTE logic multiple times. For genuine performance gains with reused intermediate results, use a temp table.
Can you use a CTE in an UPDATE or DELETE statement?
Yes, in most major databases. In PostgreSQL and SQL Server you can define a CTE and then UPDATE or DELETE based on it. PostgreSQL additionally supports data-modifying CTEs where the CTE itself performs the DML operation using RETURNING to pass affected rows to the outer query — enabling atomic archive-and-delete patterns in a single statement.
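A brief sketch of the simpler form — a CTE feeding an UPDATE (PostgreSQL date arithmetic shown; the table and column names are illustrative):

```sql
WITH lapsed_customers AS (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING MAX(order_date) < CURRENT_DATE - INTERVAL '1 year'
    -- SQL Server equivalent of the cutoff: DATEADD(year, -1, GETDATE())
)
UPDATE customers
SET status = 'lapsed'
WHERE customer_id IN (SELECT customer_id FROM lapsed_customers);
```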
What is the difference between WITH RECURSIVE and a regular CTE?
A regular CTE cannot reference itself. A recursive CTE uses WITH RECURSIVE (plain WITH in SQL Server) and contains two parts joined by UNION ALL: an anchor member (the base case, evaluated once) and a recursive member, which references the CTE itself and runs repeatedly until it returns no new rows. Recursive CTEs are essential for querying hierarchical structures like org charts, category trees, or graph data stored in relational tables.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.