Home Database Recursive SQL Queries with CTEs — How They Work, When to Use Them, and How to Avoid Infinite Loops

Recursive SQL Queries with CTEs — How They Work, When to Use Them, and How to Avoid Infinite Loops

In Plain English 🔥
Imagine you're looking for your great-great-grandmother in a family tree. You start with yourself, then find your parents, then their parents, and keep going up until there's no one left. A recursive SQL query does exactly that — it starts with one row of data, uses that result to find more rows, then uses THOSE results to find even more, repeating itself until it hits a dead end. It's like following a trail of breadcrumbs where each crumb tells you where the next one is.
⚡ Quick Answer
Imagine you're looking for your great-great-grandmother in a family tree. You start with yourself, then find your parents, then their parents, and keep going up until there's no one left. A recursive SQL query does exactly that — it starts with one row of data, uses that result to find more rows, then uses THOSE results to find even more, repeating itself until it hits a dead end. It's like following a trail of breadcrumbs where each crumb tells you where the next one is.

Every production database hides a secret: a huge chunk of its most important data is inherently hierarchical. Org charts, product categories, bill-of-materials assemblies, comment threads, file systems, permission inheritance chains — they all share the same shape. A node points to a parent, which points to another parent, which eventually points to nothing. Flattening that structure into a two-dimensional SQL result set was, for decades, genuinely painful. Recursive SQL queries — specifically recursive Common Table Expressions — exist precisely to solve that pain elegantly inside the database engine, without shipping raw data up to application code and looping in Python or Java.

The classical alternative was either a 'closure table' (storing every ancestor-descendant pair explicitly) or a 'nested sets' model (numbering left and right boundaries). Both work, but they push write-time complexity sky-high. With recursive CTEs, you store the simplest possible relationship — just parent_id — and let the query engine do the traversal at read time. The tradeoff is query complexity and, if you're not careful, runaway execution that locks up your database.

By the end of this article you'll understand exactly how the SQL engine executes a recursive CTE step by step, you'll be able to write recursive queries for tree traversal, path building, and cycle detection, and you'll know the three performance traps that catch senior engineers off guard in production.

The Anatomy of a Recursive CTE — What the Engine Actually Does

A recursive CTE has two mandatory parts separated by UNION ALL: the anchor member and the recursive member. The anchor member is a plain SELECT that returns the starting row(s) — your 'seed'. The recursive member is a SELECT that JOINs back to the CTE itself, producing the next level of results based on what the previous iteration found.

The engine doesn't actually call itself like a programming language function does. It uses an iterative worktable approach. First it runs the anchor and stores those rows in a temporary worktable. Then it runs the recursive member against ONLY the rows currently in the worktable, producing a new batch. That new batch replaces the worktable content. This repeats until the recursive member returns zero rows or you hit the MAXRECURSION limit.

This is critical to understand: the recursive member never sees the full accumulated result — only the most recently added batch. That's why it's called a 'working table' model, not true recursion. It's breadth-first by default in most engines (SQL Server, PostgreSQL). You can coerce depth-first ordering using a path column trick, which we'll cover shortly.

SQL Server defaults to MAXRECURSION 100. PostgreSQL has no hard cap but you can set one. MySQL added recursive CTE support in version 8.0. Always verify your engine version before betting your data pipeline on this feature.

employee_hierarchy_traversal.sql · SQL
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576
-- ============================================================
-- SETUP: A realistic employee table with a self-referencing
-- parent (manager) relationship.
-- Tested on PostgreSQL 15, SQL Server 2019, MySQL 8.0+
-- ============================================================

CREATE TABLE employee (
    employee_id   INT          PRIMARY KEY,
    full_name     VARCHAR(100) NOT NULL,
    job_title     VARCHAR(100) NOT NULL,
    manager_id    INT          NULL,  -- NULL means this person is the root (CEO)
    FOREIGN KEY (manager_id) REFERENCES employee(employee_id)
);

INSERT INTO employee (employee_id, full_name, job_title, manager_id) VALUES
    (1, 'Sandra Okafor',   'CEO',                 NULL),
    (2, 'Marcus Webb',     'VP of Engineering',   1),
    (3, 'Priya Nair',      'VP of Marketing',     1),
    (4, 'Leo Hartmann',    'Senior Engineer',     2),
    (5, 'Yuki Tanaka',     'Senior Engineer',     2),
    (6, 'Carla Mendez',    'Marketing Manager',   3),
    (7, 'James Osei',      'Junior Engineer',     4),
    (8, 'Fatima Al-Amin',  'Junior Engineer',     4),
    (9, 'Ravi Patel',      'Marketing Analyst',   6);

-- ============================================================
-- THE RECURSIVE CTE
-- Goal: Starting from Sandra (the CEO), walk down the entire
-- reporting chain and show each person's depth level and their
-- full reporting path as a readable string.
-- ============================================================

WITH RECURSIVE org_hierarchy AS (

    -- ---- ANCHOR MEMBER ----
    -- This runs exactly ONCE. It seeds the worktable with just
    -- the root node (the CEO who has no manager).
    SELECT
        employee_id,
        full_name,
        job_title,
        manager_id,
        0                          AS depth_level,      -- CEO is at depth 0
        full_name::TEXT            AS reporting_path    -- Path starts with just the CEO name
    FROM employee
    WHERE manager_id IS NULL                            -- The root: no manager above them

    UNION ALL

    -- ---- RECURSIVE MEMBER ----
    -- This runs repeatedly. Each iteration joins the CURRENT
    -- worktable batch (org_hierarchy) to the employee table
    -- to find one more level of direct reports.
    SELECT
        e.employee_id,
        e.full_name,
        e.job_title,
        e.manager_id,
        oh.depth_level + 1,                            -- Go one level deeper
        oh.reporting_path || ' → ' || e.full_name      -- Append this person to the path
    FROM employee e
    INNER JOIN org_hierarchy oh
        ON e.manager_id = oh.employee_id               -- Match employees whose manager is in the current batch

)
-- ============================================================
-- FINAL SELECT — pull everything out of the accumulated CTE
-- ============================================================
SELECT
    employee_id,
    REPEAT('  ', depth_level) || full_name   AS indented_name,  -- Visual indentation by depth
    job_title,
    depth_level,
    reporting_path
FROM org_hierarchy
ORDER BY reporting_path;  -- Path ordering gives us a clean depth-first visual
▶ Output
employee_id | indented_name | job_title | depth_level | reporting_path
------------+-----------------------------+---------------------+-------------+-----------------------------------------------
1 | Sandra Okafor | CEO | 0 | Sandra Okafor
2 | Marcus Webb | VP of Engineering | 1 | Sandra Okafor → Marcus Webb
4 | Leo Hartmann | Senior Engineer | 2 | Sandra Okafor → Marcus Webb → Leo Hartmann
7 | James Osei | Junior Engineer | 3 | Sandra Okafor → Marcus Webb → Leo Hartmann → James Osei
8 | Fatima Al-Amin | Junior Engineer | 3 | Sandra Okafor → Marcus Webb → Leo Hartmann → Fatima Al-Amin
5 | Yuki Tanaka | Senior Engineer | 2 | Sandra Okafor → Marcus Webb → Yuki Tanaka
3 | Priya Nair | VP of Marketing | 1 | Sandra Okafor → Priya Nair
6 | Carla Mendez | Marketing Manager | 2 | Sandra Okafor → Priya Nair → Carla Mendez
9 | Ravi Patel | Marketing Analyst | 3 | Sandra Okafor → Priya Nair → Carla Mendez → Ravi Patel
🔥
Engine Internals:The recursive member joins to the CTE name, but the engine only gives it the LATEST batch of rows — not every row accumulated so far. Think of it as a sliding window, not a growing pile. This is why building a 'path' string works: you carry state forward through each iteration by including it in your SELECT columns.

Cycle Detection and the MAXRECURSION Safety Net — Preventing Runaway Queries

Here's the production nightmare scenario: someone creates a circular reference in your data. Employee A reports to Employee B, who reports to Employee C, who somehow reports back to Employee A. Your recursive CTE will loop forever — or until the engine cuts it off — burning CPU and memory the whole time.

SQL Server throws an error at 100 iterations by default (you can raise or disable this). PostgreSQL keeps going until it runs out of memory or you add CYCLE detection syntax (added in PostgreSQL 14). MySQL respects the cte_max_recursion_depth session variable (default 1000).

The safest universal technique, which works across ALL engines including older PostgreSQL versions, is to maintain an array of visited IDs in a path column and check for membership before recursing. If the next employee_id already appears in the visited array, you stop. This is manual cycle detection and it's what you should use in production when you can't guarantee data integrity on the parent_id column.

PostgreSQL 14+ also gives you the declarative CYCLE clause which is cleaner but less portable. We'll show both. The key mental model: think of cycle detection as leaving a trail of footprints. Before you step somewhere, you look down and check if your own footprint is already there.

cycle_detection_safe_recursion.sql · SQL
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091
-- ============================================================
-- Simulating a CYCLE: Let's corrupt the data to create a loop.
-- James Osei (id=7) will be made to 'manage' Leo Hartmann (id=4)
-- creating a circle: Leo → James → Leo → James → ...
-- ============================================================

-- WARNING: Do not run this on the real table without a transaction!
-- This temporarily breaks the tree for demonstration purposes.
BEGIN;

    UPDATE employee
    SET manager_id = 7         -- James now 'manages' Leo (his own senior!)
    WHERE employee_id = 4;     -- Creating the cycle: 4 → 7 → 4 → 7 ...

    -- ---- APPROACH 1: Manual cycle detection using an integer ARRAY ----
    -- Works on PostgreSQL 12+. Replace INT[] with a VARCHAR path string
    -- for SQL Server / MySQL compatibility.

    WITH RECURSIVE safe_org_traversal AS (

        -- Anchor: start at CEO, initialise the visited_ids array
        SELECT
            employee_id,
            full_name,
            manager_id,
            0                              AS depth_level,
            ARRAY[employee_id]             AS visited_ids,   -- Track every ID we've seen
            FALSE                          AS cycle_detected  -- Flag: are we in a loop?
        FROM employee
        WHERE manager_id IS NULL

        UNION ALL

        SELECT
            e.employee_id,
            e.full_name,
            e.manager_id,
            sot.depth_level + 1,
            sot.visited_ids || e.employee_id,              -- Append new ID to the visited list
            e.employee_id = ANY(sot.visited_ids)           -- TRUE if we've seen this ID before
        FROM employee e
        INNER JOIN safe_org_traversal sot
            ON e.manager_id = sot.employee_id
        WHERE
            NOT sot.cycle_detected                         -- STOP recursing if we already flagged a cycle
            AND e.employee_id <> ALL(sot.visited_ids)      -- Don't recurse into an already-visited node

    )
    SELECT
        employee_id,
        full_name,
        depth_level,
        cycle_detected,
        visited_ids
    FROM safe_org_traversal
    ORDER BY depth_level, employee_id;

ROLLBACK;  -- Undo the corrupted data

-- ============================================================
-- APPROACH 2: PostgreSQL 14+ declarative CYCLE clause (cleaner)
-- ============================================================

WITH RECURSIVE org_with_cycle_check AS (
    SELECT employee_id, full_name, manager_id, 0 AS depth_level
    FROM employee
    WHERE manager_id IS NULL

    UNION ALL

    SELECT e.employee_id, e.full_name, e.manager_id, owc.depth_level + 1
    FROM employee e
    INNER JOIN org_with_cycle_check owc ON e.manager_id = owc.employee_id
)
CYCLE employee_id           -- Tell PostgreSQL WHICH column to watch for repeats
SET is_cycle                -- A boolean column PostgreSQL adds automatically
USING cycle_path            -- An array column PostgreSQL manages for you
SELECT
    employee_id,
    full_name,
    depth_level,
    is_cycle
FROM org_with_cycle_check
ORDER BY depth_level;

-- ============================================================
-- SQL SERVER EQUIVALENT: Setting a custom recursion limit
-- Use this when you know your maximum legitimate tree depth
-- ============================================================
-- OPTION (MAXRECURSION 50)  <-- Add this at the very end of the query
-- OPTION (MAXRECURSION 0)   <-- 0 means UNLIMITED — dangerous with dirty data
▶ Output
-- Output from APPROACH 1 (with corrupted data, before ROLLBACK)
-- Notice the cycle is caught cleanly — no infinite loop, no error

employee_id | full_name | depth_level | cycle_detected | visited_ids
------------+----------------+-------------+----------------+----------------------
1 | Sandra Okafor | 0 | false | {1}
2 | Marcus Webb | 1 | false | {1,2}
3 | Priya Nair | 1 | false | {1,3}
4 | Leo Hartmann | 2 | false | {1,2,4}
5 | Yuki Tanaka | 2 | false | {1,2,5}
6 | Carla Mendez | 2 | false | {1,3,6}
7 | James Osei | 3 | false | {1,2,4,7}
-- James's children are skipped because employee_id 4 (Leo) is already in visited_ids
-- No infinite loop. No engine crash. Clean stop.
⚠️
Watch Out:Never use OPTION (MAXRECURSION 0) on SQL Server in production unless you have cycle detection built into the query itself. It removes the engine's safety net entirely. A single bad row in your parent_id column will spin the query until the server runs out of memory or someone kills it manually. Always pair unlimited recursion with an explicit visited-IDs guard.

Real-World Patterns — Aggregating Subtrees and Finding All Ancestors

Two patterns come up constantly in production work. First: given a node, find ALL its ancestors up to the root (upward traversal). Second: given a node, find ALL its descendants and compute an aggregate across the whole subtree — like the total headcount under a manager, or the total cost of all components in a bill of materials.

Upward traversal just means you flip the JOIN. Instead of joining where employee.manager_id = cte.employee_id (going down), you join where cte.manager_id = employee.employee_id (going up). The anchor is the leaf node you're starting from.

Subtree aggregation is where recursive CTEs really shine against closure tables. You traverse the full subtree in the CTE, then wrap it in an outer query and GROUP BY the root. The key insight is that the CTE gives you a flat list of all descendants — you then aggregate that flat list however you need. No recursive summing required. The recursion only handles the traversal; the aggregation is just normal SQL on top of it.

Bill of Materials (BOM) queries are the canonical example. A product is made of components, each of which is itself made of sub-components, each potentially shared across multiple parent products. Recursive CTEs handle BOM explosions elegantly, though watch for shared components creating DAGs (Directed Acyclic Graphs) rather than pure trees — you may count the same component multiple times without careful deduplication.

subtree_aggregation_and_ancestor_lookup.sql · SQL
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485
-- ============================================================
-- PATTERN 1: Find ALL ancestors of a given employee (upward)
-- Use case: Show the full chain of command above 'James Osei'
-- ============================================================

WITH RECURSIVE ancestors_of_james AS (

    -- Anchor: start AT James Osei (our leaf node)
    SELECT
        employee_id,
        full_name,
        job_title,
        manager_id,
        0 AS levels_above
    FROM employee
    WHERE employee_id = 7      -- James Osei's ID

    UNION ALL

    -- Recursive: join UPWARD — find the manager of whoever we just found
    SELECT
        e.employee_id,
        e.full_name,
        e.job_title,
        e.manager_id,
        aj.levels_above + 1
    FROM employee e
    INNER JOIN ancestors_of_james aj
        ON aj.manager_id = e.employee_id   -- Flip! We go UP: CTE's manager_id matches parent's employee_id

)
SELECT
    levels_above,
    full_name,
    job_title
FROM ancestors_of_james
ORDER BY levels_above;

-- ============================================================
-- PATTERN 2: Subtree aggregation — headcount under each VP
-- Use case: HR dashboard showing how many people report to
-- each VP (including all levels below them)
-- ============================================================

WITH RECURSIVE full_subtree AS (

    -- Anchor: every direct report of the CEO (the VPs)
    -- We tag each person with the VP they ultimately roll up to
    SELECT
        employee_id,
        full_name,
        manager_id,
        employee_id   AS subtree_root_id   -- Each VP is the 'root' of their own subtree
    FROM employee
    WHERE manager_id = 1                   -- Direct reports of Sandra (CEO)

    UNION ALL

    -- Recursive: find everyone under the current batch,
    -- carrying forward the original subtree_root_id so we always
    -- know which VP this person ultimately rolls up to
    SELECT
        e.employee_id,
        e.full_name,
        e.manager_id,
        fs.subtree_root_id                 -- Preserve the root tag through all levels
    FROM employee e
    INNER JOIN full_subtree fs
        ON e.manager_id = fs.employee_id

)
-- Now aggregate the flat list — standard GROUP BY, no recursion needed here
SELECT
    root_emp.full_name                     AS vp_name,
    root_emp.job_title,
    COUNT(fs.employee_id)                  AS total_headcount_including_self_minus_one,
    COUNT(fs.employee_id) + 1              AS total_headcount_including_vp
FROM full_subtree fs
INNER JOIN employee root_emp
    ON fs.subtree_root_id = root_emp.employee_id
GROUP BY
    root_emp.employee_id,
    root_emp.full_name,
    root_emp.job_title
ORDER BY total_headcount_including_vp DESC;
▶ Output
-- PATTERN 1 OUTPUT: Ancestors of James Osei
levels_above | full_name | job_title
-------------+----------------+-------------------
0 | James Osei | Junior Engineer
1 | Leo Hartmann | Senior Engineer
2 | Marcus Webb | VP of Engineering
3 | Sandra Okafor | CEO

-- PATTERN 2 OUTPUT: Headcount under each VP
vp_name | job_title | total_headcount_including_vp
--------------+--------------------+-----------------------------
Marcus Webb | VP of Engineering | 5
Priya Nair | VP of Marketing | 3
⚠️
Pro Tip:Carry a 'root tag' column through your recursion instead of running a separate ancestor lookup for each row. The pattern of setting subtree_root_id in the anchor and propagating it unchanged through the recursive member is one of the most reusable tricks in recursive SQL — it collapses what would be N+1 queries into a single traversal pass.

Performance Internals and Production Tuning — Why Recursive CTEs Can Destroy Your Database

Recursive CTEs look elegant but they hide real performance danger. Understanding the execution model is non-negotiable before you ship them to production.

First: recursive CTEs are NOT inlined in PostgreSQL the way regular CTEs sometimes are. They're always materialised — the engine writes intermediate results to a worktable (typically in memory, spilling to disk if large). This means you cannot push WHERE clause predicates from the outer query INTO the CTE. If you filter on depth_level > 2 in your final SELECT, the CTE still computed every level. Always filter INSIDE the recursive member to prune the search space early.

Second: the JOIN inside the recursive member runs once per iteration. If your recursive member does a sequential scan of a 10-million-row table on each pass, and your tree is 8 levels deep, you're doing 8 full scans. An index on manager_id is not optional — it's the difference between milliseconds and minutes.

Third: avoid selecting columns you don't need inside the CTE. Every extra column gets materialised and written to the worktable on every iteration. This compounds fast for wide tables with TEXT or JSONB columns.

For very deep trees (thousands of levels, like a BOM explosion for complex machinery), consider whether a closure table or a materialised path (ltree in PostgreSQL) might be a better architectural choice. Recursive CTEs are ideal for trees up to a few hundred nodes per query invocation. Beyond that, benchmark aggressively.

recursive_cte_performance_tuning.sql · SQL
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970
-- ============================================================
-- PERFORMANCE DEMONSTRATION
-- Showing the difference between an untuned and a tuned
-- recursive CTE on a larger dataset.
-- ============================================================

-- STEP 1: Make sure this index EXISTS. Without it, every
-- recursive iteration does a full table scan.
-- On a 500k-row employee table this index reduces query time
-- from ~4s to ~30ms.
CREATE INDEX IF NOT EXISTS idx_employee_manager_id
    ON employee (manager_id);

-- STEP 2: EXPLAIN ANALYZE to see the execution plan
-- Always do this before shipping a recursive CTE to production.
-- Look for 'CTE Scan' and 'WorkTable Scan' nodes.
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
WITH RECURSIVE org_hierarchy AS (
    SELECT employee_id, full_name, manager_id, 0 AS depth_level
    FROM employee
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.employee_id, e.full_name, e.manager_id, oh.depth_level + 1
    FROM employee e
    INNER JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT * FROM org_hierarchy;

-- ============================================================
-- TUNING PATTERN: Prune INSIDE the recursive member.
-- If you only care about the top 2 levels, add the WHERE
-- clause IN the recursive member — not just in the outer query.
-- ============================================================

-- WRONG (inefficient): The CTE computes ALL levels, then filters
WITH RECURSIVE all_levels AS (
    SELECT employee_id, full_name, manager_id, 0 AS depth_level
    FROM employee WHERE manager_id IS NULL
    UNION ALL
    SELECT e.employee_id, e.full_name, e.manager_id, al.depth_level + 1
    FROM employee e
    INNER JOIN all_levels al ON e.manager_id = al.employee_id
    -- No depth guard here — recurses to the bottom of the entire tree
)
SELECT * FROM all_levels
WHERE depth_level <= 2;   -- Filter applied AFTER full traversal — wasteful!

-- RIGHT (efficient): Stop recursing as soon as depth limit is hit
WITH RECURSIVE top_two_levels AS (
    SELECT employee_id, full_name, manager_id, 0 AS depth_level
    FROM employee WHERE manager_id IS NULL
    UNION ALL
    SELECT e.employee_id, e.full_name, e.manager_id, t2l.depth_level + 1
    FROM employee e
    INNER JOIN top_two_levels t2l ON e.manager_id = t2l.employee_id
    WHERE t2l.depth_level < 2    -- Guard INSIDE recursive member — stops traversal early
)
SELECT * FROM top_two_levels
ORDER BY depth_level, employee_id;

-- ============================================================
-- PRODUCTION SAFETY: Always set MAXRECURSION (SQL Server)
-- or check pg_stat_activity for runaway queries (PostgreSQL)
-- ============================================================

-- SQL Server: append to any recursive query in production
-- OPTION (MAXRECURSION 200)   -- Tune to your known maximum tree depth + buffer

-- PostgreSQL: set per-session if needed
-- SET work_mem = '64MB';  -- Give the worktable more memory before spilling to disk
▶ Output
-- EXPLAIN ANALYZE output excerpt (PostgreSQL) — what to look for:

CTE Scan on org_hierarchy (cost=0.00..0.18 rows=9 width=50)
-> Recursive Union
-> Seq Scan on employee -- Anchor: full scan is fine for tiny tables
Filter: (manager_id IS NULL)
-> Hash Join
Hash Cond: (e.manager_id = oh.employee_id)
-> Seq Scan on employee -- !! On large tables this becomes:
-> WorkTable Scan on org_hierarchy -- Index Scan on employee (manager_id)

-- After adding idx_employee_manager_id, the 'Seq Scan on employee'
-- inside the recursive member becomes 'Index Scan' — that's the win.
-- 'WorkTable Scan' is normal — that's the engine reading its own temp results.

-- Output of top_two_levels query:
employee_id | full_name | manager_id | depth_level
------------+---------------+------------+------------
1 | Sandra Okafor | NULL | 0
2 | Marcus Webb | 1 | 1
3 | Priya Nair | 1 | 1
4 | Leo Hartmann | 2 | 2
5 | Yuki Tanaka | 2 | 2
6 | Carla Mendez | 3 | 2
⚠️
Production Gotcha:In PostgreSQL, a recursive CTE is ALWAYS materialised — unlike non-recursive CTEs which PostgreSQL 12+ can inline. This means outer-query WHERE filters are never pushed into the CTE. If you're filtering by depth, subtree root, or any other column computed inside the recursion, always put that filter INSIDE the recursive member to prune the worktable early. Failure to do this is the single most common reason recursive CTEs cause production incidents.
AspectRecursive CTEClosure TableNested Sets (Modified Preorder)
Storage overheadNone — just parent_idHigh — O(n²) rows for a balanced treeLow — 2 extra integer columns
Read performance (subtree)Moderate — traversal at query timeExcellent — single range scanExcellent — single range scan
Write performance (insert/move)Excellent — just update parent_idPoor — must update ancestor-descendant pairsVery poor — rebalances left/right values for whole tree
Cycle safetyManual detection requiredN/A (no recursion)N/A (no recursion)
Depth limitMAXRECURSION / engine limitUnlimitedUnlimited
Arbitrary depth queriesNative — that's its purposeNative — just one joinAwkward — requires range arithmetic
Schema complexityVery low — one FK columnMedium — separate table neededMedium — two extra columns to maintain
Best used whenTree depth < ~500, writes frequentReads dominate, tree rarely restructuredReads dominate, structure rarely changes
PostgreSQL supportFull (v8.4+)Manual implementationManual implementation
MySQL supportMySQL 8.0+ onlyManual implementationManual implementation

🎯 Key Takeaways

  • Recursive CTEs use a worktable model, not true function recursion — the recursive member only ever sees the most recent batch of rows, not the full accumulated result. Understanding this explains why you carry state (depth, path, visited IDs) forward as columns.
  • An index on the parent_id (manager_id) column is mandatory, not optional. Without it, the recursive member does a full table scan on every single iteration — on a deep tree over a large table this escalates from milliseconds to minutes.
  • Always filter INSIDE the recursive member to prune traversal early. WHERE clauses in the outer SELECT run AFTER the entire CTE has been materialised — they don't reduce the work the recursion does.
  • Cycle detection must be explicit. Never assume your parent_id column is clean in production. The visited-IDs array guard (check if next_id = ANY(visited_ids)) works cross-engine and is the most robust pattern. PostgreSQL 14+ CYCLE clause is cleaner but ties you to one engine.

⚠ Common Mistakes to Avoid

  • Mistake 1: Forgetting the termination condition — The recursive member never returns zero rows — [Symptom: query runs until MAXRECURSION error or crashes the session; on MySQL with cte_max_recursion_depth exceeded you get 'ERROR 3636: Recursive query aborted'] — Fix: always include a WHERE clause in the recursive member that provably reduces the result set each iteration. The most reliable guard is depth_level < N (where N is your known maximum tree depth) AND the join condition itself (which will return zero rows when there are no more children). Never rely on the join alone if your data might contain cycles.
  • Mistake 2: Filtering in the outer SELECT instead of inside the recursive member — [Symptom: query returns correct results but takes 10–100x longer than it should; EXPLAIN shows 'WorkTable Scan' processing thousands of rows] — Fix: move depth filters, subtree root filters, and any other pruning conditions INTO the WHERE clause of the recursive SELECT member. The CTE materialises every row it produces — the outer query filter runs after full materialisation. Pruning inside the recursion saves computation on every iteration.
  • Mistake 3: Using UNION instead of UNION ALL in the recursive CTE — [Symptom: query is catastrophically slow or hangs; on SQL Server you get 'Recursive common table expressions are not allowed to use UNION' error — on PostgreSQL it silently deduplicates rows which corrupts results for graphs where the same node can be reached via multiple paths] — Fix: always use UNION ALL between the anchor and recursive members. UNION ALL is the required syntax per SQL standard for recursive CTEs. If you need to deduplicate, do it in the final outer SELECT with DISTINCT or handle it with the visited-IDs cycle detection pattern.

Interview Questions on This Topic

  • QWalk me through exactly how the SQL engine executes a recursive CTE — what's happening behind the scenes on each iteration? (Interviewers want to hear 'worktable', 'anchor runs once', 'recursive member sees only the latest batch', and 'stops when zero rows returned' — not just 'it calls itself')
  • QYou've got a product bill-of-materials table where some components are shared across multiple parent products. You write a recursive CTE to explode the full BOM and sum up component costs, but you're getting numbers that are too high. What's the likely bug and how do you fix it? (Answer: shared components in a DAG get counted multiple times — once per path that reaches them. Fix: deduplicate with DISTINCT on component_id, or accumulate costs only at the leaf level, or use a closure table instead)
  • QAn engineer on your team says 'I'll just set MAXRECURSION 0 so the recursive CTE never fails.' What's wrong with that decision and what would you do instead? (Red flag answer: 'sounds fine'. Green flag: explain that MAXRECURSION 0 removes the safety net, means a single circular reference in the data causes an infinite loop that consumes server resources until killed, and that the right fix is either a database constraint preventing cycles or explicit visited-ID cycle detection inside the query itself)

Frequently Asked Questions

What is the difference between a recursive CTE and a regular CTE in SQL?

A regular CTE is just a named subquery — it runs once and produces a static result set. A recursive CTE has two parts: an anchor (runs once to seed results) and a recursive member (runs repeatedly, each time joining back to the CTE's own output from the previous iteration). The engine repeats the recursive member until it produces zero new rows. Regular CTEs can't reference themselves; recursive CTEs are specifically designed to do exactly that.

Does MySQL support recursive CTEs?

Yes, but only from MySQL 8.0 onwards. If you're on MySQL 5.x or MariaDB before 10.2, recursive CTEs are not available and you'll need to use stored procedures with loops, or restructure your schema to use a closure table or nested sets model instead. Always check your MySQL version with SELECT VERSION() before writing recursive CTEs.

How do I stop a recursive CTE from running forever?

Three layers of protection working together: first, make sure the recursive member's JOIN condition naturally produces fewer rows each iteration (it will return zero rows when there are no more children to find). Second, add an explicit depth guard (WHERE depth_level < 500) inside the recursive member as a hard cap. Third, implement visited-ID tracking using an array column to detect and break cycles caused by dirty data. In SQL Server, you also get OPTION (MAXRECURSION N) as a final backstop — set it to a value just above your known maximum legitimate tree depth.

🔥
TheCodeForge Editorial Team Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.

← PreviousRead Replicas in PostgreSQLNext →Database Locking Mechanisms
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged