Senior 5 min · March 17, 2026

Recursion vs Iteration — Recursion Crashed JSON Parser

RecursionError at 1000 nesting levels crashed payment API.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • Recursion calls itself to solve smaller subproblems — elegant but costs a stack frame per call
  • Iteration uses loops — lower overhead, predictable memory usage
  • Python's default recursion limit is 1000 frames — exceeding it raises RecursionError
  • Every recursive solution has an iterative equivalent via explicit stack management
  • Memoization (@lru_cache) transforms exponential recursive algorithms into linear time
  • Iteration is typically 3-10x faster than recursion due to avoided function call overhead
Plain-English First

Imagine you need to find a book on a shelf. Recursion is like asking a friend, who asks another friend, who asks another friend — each person handles one book and passes the rest along. Eventually someone finds it, and the answer bubbles back up through the chain. Iteration is like checking each book yourself, one at a time, moving along the shelf. Both find the book. But recursion uses more phone calls (stack frames), and if the chain gets long enough, someone's phone dies — that's your stack overflow. Iteration uses more of your own time (loop cycles) but never runs out of phone battery.

Fibonacci — Recursion vs Iteration vs Memoization

Fibonacci is the canonical demonstration of recursion's cost because the pathology is so clear. The naive recursive version recomputes the same subproblems an exponential number of times — fib(5) calls fib(4) and fib(3), fib(4) calls fib(3) and fib(2), and fib(3) gets called again from fib(4)'s branch. The call tree doubles in size with each increment of n. By fib(40), you're making roughly a billion function calls to compute a number that a loop would find in 40 iterations.

Memoization fixes the exponential redundancy while keeping the recursive structure intact. @lru_cache stores the result of each unique (n,) argument tuple, so fib(3) is computed once and every subsequent call to fib(3) returns the cached result immediately. This collapses the O(2^n) call tree into a O(n) chain — you still recurse, but each unique subproblem executes exactly once.

Iteration beats both on space. The memoized version caches n results — O(n) space. The iterative version uses two variables that get updated in place — O(1) space. For computing fib(1_000_000), the iterative version uses negligible memory; the memoized recursive version would require thousands of cached entries and stack frames.

ExamplePYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
import sys
import time
from functools import lru_cache

# --- Naive recursion: O(2^n) time, O(n) space ---
# DO NOT use this in production for n > 35
def fib_recursive(n):
    if n <= 1:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)

# --- Iterative: O(n) time, O(1) space ---
# This is what production code should use
def fib_iterative(n):
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

# --- Memoized recursion: O(n) time, O(n) space ---
# Useful when you need recursive structure for clarity
# and can afford the O(n) cache memory
@lru_cache(maxsize=None)
def fib_memo(n):
    if n <= 1:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Timing comparison — run on your machine to feel the difference
def time_it(fn, n, label):
    start = time.perf_counter()
    result = fn(n)
    elapsed = time.perf_counter() - start
    print(f"{label:30s} fib({n}) = {result:>20} in {elapsed*1000:.3f}ms")

time_it(fib_iterative, 50, "Iterative")
time_it(fib_memo, 50,      "Memoized recursive")
# fib_recursive(50) would take several minutes — do not run it

# Python's recursion limit — check yours
print(f"\nRecursion limit: {sys.getrecursionlimit()} frames")

# fib_memo(990) works; fib_memo(1001) raises RecursionError
# because the memoized version still recurses to compute uncached values
try:
    fib_memo(995)
    print("fib_memo(995): OK")
except RecursionError:
    print("fib_memo(995): RecursionError — cache not warm")

# Iterative has no such limit
print(f"fib_iterative(10000): {fib_iterative(10000)} (truncated)"[:60] + '...')
Output
Iterative fib(50) = 12586269025 in 0.004ms
Memoized recursive fib(50) = 12586269025 in 0.021ms
Recursion limit: 1000 frames
fib_memo(995): OK
fib_iterative(10000): 33644764876431783695163498604...(truncated)
The Call Stack as a Stack of Plates
  • Each plate = one function call's local variables, arguments, and return address
  • Base case = the point where you stop adding plates and start removing them from the top
  • Stack overflow = the stack grew taller than the OS allows — typically 1-8MB per thread
  • Memoization = writing the result on each plate before putting it down so you never have to redo that calculation
  • Iteration = using a single plate that you erase and rewrite in place — constant stack height regardless of n
Production Insight
Naive recursive fib(50) makes approximately 2^50 function calls — over one quadrillion. In CPython each stack frame costs roughly 100-200 bytes. This is not a depth problem (the maximum depth for fib(50) is only 50), it is a redundancy problem: the same subproblem fib(2) is computed roughly 2^48 times. The fix is memoization, not iteration — but know that even memoized recursion can hit the stack limit if you call fib(1500) cold (no cached values), because the first call recurses 1500 levels deep before the cache is warm. For large n with a cold cache, iterative is the only safe choice.
Key Takeaway
Naive recursion is exponential — it is a teaching example, not production code. Memoization preserves the recursive structure and readability while reducing time complexity to O(n). Iteration wins on space (O(1) vs O(n)) and avoids the cold-cache recursion depth problem entirely. For anything beyond small bounded n, iterate.
Which Fibonacci Implementation to Use
Ifn is small and bounded (< 30), code clarity matters most
UseMemoized recursion — the structure mirrors the mathematical definition and is easy to verify
Ifn can be large (> 100) or is determined by user input
UseIterative — O(1) space, no recursion limit risk, 3-5x faster than memoized
IfComputing many Fibonacci values repeatedly in a long-running process
UseMemoized recursion — lru_cache persists across calls, so each value is computed only once total
Ifn is unbounded or controlled by external input
UseIterative only — never recurse when depth is controlled by untrusted input

Converting Recursive DFS to Iterative

The pattern for converting any recursive algorithm to iterative is mechanical once you understand it: the call stack that the language manages implicitly becomes an explicit data structure you manage yourself. Where the recursive version calls itself with a smaller problem, the iterative version pushes that smaller problem onto a list or deque. Where the recursive version returns and picks up where it left off, the iterative version pops from the list and continues the loop.

DFS is the cleanest example because the conversion is direct. The recursive DFS works by calling itself for each unvisited neighbour — the call stack naturally implements LIFO ordering, visiting the deepest unvisited node first. The iterative version replaces the call stack with a Python list used as a stack (append for push, pop for pop), producing identical traversal order.

The one subtlety that trips people up: push order. The recursive version processes neighbours left-to-right because it calls itself for the first neighbour, recurses all the way down that branch, then comes back and calls itself for the second neighbour. To match this order iteratively, you push neighbours in reverse — the last neighbour goes on the stack first, so the first neighbour gets popped and processed first.

This matters in interview settings and in production when the traversal order is semantically meaningful — for example, when processing rules in priority order or when the graph represents a dependency chain where the order of resolution matters.

ExamplePYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
from collections import deque

# --- Recursive DFS ---
# Clean, readable, mirrors the algorithm definition
# Dangerous on graphs with unknown depth or high node count
def dfs_recursive(graph, node, visited=None):
    if visited is None:
        visited = set()  # IMPORTANT: never use mutable default arg
    visited.add(node)
    print(node, end=' ')
    for neighbour in graph.get(node, []):
        if neighbour not in visited:
            dfs_recursive(graph, neighbour, visited)
    return visited

# --- Iterative DFS ---
# Same traversal order, no stack overflow risk
# Uses heap memory (Python list) instead of the C call stack
def dfs_iterative(graph, start):
    visited = set()
    stack   = [start]  # explicit stack — append is push, pop is pop

    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        print(node, end=' ')
        # Push in reverse to match recursive left-to-right processing order
        for neighbour in reversed(graph.get(node, [])):
            if neighbour not in visited:
                stack.append(neighbour)
    return visited

# For contrast: BFS uses a deque instead of a list
def bfs_iterative(graph, start):
    visited = set([start])
    queue   = deque([start])  # FIFO — popleft instead of pop

    while queue:
        node = queue.popleft()
        print(node, end=' ')
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited

# Test graph — directed, acyclic
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': [],
    'E': ['F'],
    'F': []
}

print('Recursive DFS: ', end=''); dfs_recursive(graph, 'A'); print()
print('Iterative DFS: ', end=''); dfs_iterative(graph, 'A'); print()
print('Iterative BFS: ', end=''); bfs_iterative(graph, 'A'); print()

# Demonstrate the depth advantage
# Simulate a chain graph 5000 nodes long — recursive would crash
chain = {i: [i + 1] for i in range(5000)}
chain[5000] = []
visited = dfs_iterative(chain, 0)
print(f"\nIterative DFS visited {len(visited)} nodes in 5000-node chain without stack overflow")
Output
Recursive DFS: A B D E F C
Iterative DFS: A B D E F C
Iterative BFS: A B C D E F
Iterative DFS visited 5001 nodes in 5000-node chain without stack overflow
Order Matters When Converting — and Mutable Default Arguments Will Ruin Your Day
Two things to get right when converting recursive DFS to iterative: (1) Push neighbours in reverse order to match the LIFO behaviour of the call stack in the recursive version — without this, you get a valid DFS but in a different node order, which may or may not matter for your use case. (2) Never use a mutable object (set, list, dict) as a default argument in Python — def dfs(graph, node, visited=set()) creates the set once at function definition time and reuses it across all calls. The visited set from one traversal will pollute the next. Always use visited=None and create a fresh set inside the function.
Production Insight
A 5000-node chain graph is not unusual in production — think a deeply nested comment thread, a long dependency resolution chain, or a file system directory structure. Recursive DFS on that chain would push 5000 frames onto the C stack and crash with a RecursionError at frame 1000. The iterative version uses a Python list on the heap to track the same state — and heap memory is measured in gigabytes, not megabytes. The conversion is mechanical and the result is functionally identical. There is no reason to use recursive DFS on a graph whose depth you cannot bound at design time.
Key Takeaway
The conversion pattern from recursive to iterative DFS is universal: replace the call stack with an explicit list, push instead of recurse, pop and continue in the loop. Neighbours must be pushed in reverse order to match recursive traversal order. The iterative version uses heap memory instead of the C stack — heap is gigabytes, the C stack is megabytes. Always use iterative for graphs whose depth is controlled by external input.
Recursive vs Iterative Graph Traversal
IfGraph depth is bounded and small (e.g., balanced tree, bounded AST depth)
UseRecursive is fine — clean code, easy to verify, depth bounded by problem structure
IfGraph represents user-uploaded content, external data, or a file system
UseIterative mandatory — depth is controlled by external input and cannot be bounded safely
IfNeed BFS (level-order) traversal
UseAlways iterative — BFS has no natural recursive formulation; use a deque
IfGraph may contain cycles
UseBoth work with a visited set, but iterative is safer for large or unknown depth
IfProcessing order of nodes matters for correctness
UseVerify the push order in your iterative version — use the reverse-neighbours trick to match recursive order

Tail Recursion — Why Python Refuses to Optimize It

Tail recursion is a specific structural property of a recursive function: the recursive call is the very last operation executed before the function returns, with no pending computation waiting for the result. In a tail-recursive function, the current stack frame is no longer needed once the recursive call is made — all the information needed for the rest of the computation has been passed as arguments to the next call.

Languages like Scheme, Haskell, and Scala exploit this property through tail call optimization (TCO): the compiler recognizes that the current frame is no longer needed and reuses it for the next call instead of pushing a new one. The recursive call is effectively compiled into a jump instruction rather than a function call, making tail-recursive code consume constant stack space regardless of recursion depth. Tail recursion in these languages is genuinely equivalent to a while loop at the hardware level.

Python deliberately does not implement TCO. Guido van Rossum's reasoning has been documented publicly: stack traces are the primary debugging tool for Python developers, and TCO would make stack traces useless — instead of seeing the full chain of recursive calls, you would see a single frame. He also argued that Python is not a functional language and that the idiomatic solution for iteration is a loop, not compiler magic. This decision has never been reversed.

Java is in a similar position, though for a different reason. The JVM specification does not require TCO, and the HotSpot JVM does not implement it for general recursive calls. There is ongoing work in Project Loom and through invokedynamic to support functional-style TCO in some contexts, but as of 2026 you cannot rely on TCO in standard Java code.

The practical implication is stark: in Python and Java, tail-recursive code is exactly as expensive as non-tail-recursive code. Writing a function in tail-recursive style in Python provides zero performance benefit and no stack depth advantage. If you want iteration performance, write a loop.

ExamplePYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
import sys

# --- Tail-recursive factorial ---
# The recursive call is the LAST operation — nothing pending after it
# accumulator carries the result forward so no computation needed after return
def factorial_tail(n, accumulator=1):
    if n <= 1:
        return accumulator
    return factorial_tail(n - 1, n * accumulator)  # tail position — accumulator updated

# --- Non-tail-recursive factorial ---
# Multiplication happens AFTER the recursive call returns
# The stack frame must be retained to perform n * (result of recursive call)
def factorial_non_tail(n):
    if n <= 1:
        return 1
    return n * factorial_non_tail(n - 1)  # NOT tail position — n must be remembered

# --- Iterative factorial — what you should actually write ---
def factorial_iterative(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# Demonstrate that Python treats both recursive versions identically
# Both will fail at the same recursion depth
for fn_name, fn in [("tail", factorial_tail), ("non_tail", factorial_non_tail)]:
    sys.setrecursionlimit(200)  # low limit to make failure obvious
    try:
        result = fn(150)
        print(f"factorial_{fn_name}(150) = {str(result)[:20]}...")
    except RecursionError:
        print(f"factorial_{fn_name}(150) -> RecursionError at depth 150")

# Iterative has no such limit
sys.setrecursionlimit(1000)  # restore
print(f"factorial_iterative(150) = {str(factorial_iterative(150))[:20]}...")
print(f"factorial_iterative(10000): computed successfully, len={len(str(factorial_iterative(10000)))} digits")

# Show the tail-recursive -> iterative mechanical conversion
# Replace: return factorial_tail(n-1, n*acc) with loop variable reassignment
def factorial_tail_to_iterative(n):
    accumulator = 1
    while n > 1:          # was: if n <= 1: return accumulator
        accumulator = n * accumulator  # was: argument to recursive call
        n = n - 1         # was: first argument to recursive call
    return accumulator

print(f"\nTail-to-iterative conversion: factorial(20) = {factorial_tail_to_iterative(20)}")
Output
factorial_tail(150) -> RecursionError at depth 150
factorial_non_tail(150) -> RecursionError at depth 150
factorial_iterative(150) = 57133839564458545227...
factorial_iterative(10000): computed successfully, len=35660 digits
Tail-to-iterative conversion: factorial(20) = 2432902008176640000
Guido's Decision — and Why It Has Held for 30 Years
Guido van Rossum's rejection of tail call optimization is not a limitation waiting to be fixed — it is a deliberate design philosophy that has been reaffirmed multiple times. His argument: Python prioritizes explicit, debuggable code over compiler magic. TCO would silently transform recursive functions into iterative ones, making stack traces misleading and debugging harder. If you want iteration, write a loop — that is explicit, readable, and does not depend on the optimizer recognizing a structural pattern in your code. This philosophy ('explicit is better than implicit') is the fourth line of the Zen of Python and governs far more than just TCO.
Production Insight
I have reviewed code from engineers who came from functional language backgrounds (Haskell, Elixir, Scala) and wrote elegant tail-recursive Python because that style was natural to them. The code was correct and readable, but it had no performance advantage over the non-tail-recursive equivalent and crashed at the same 1000-frame depth. The review comment was always the same: convert the tail-recursive pattern to a while loop — the mechanical conversion takes five minutes and the result is idiomatic Python that anyone on the team can maintain without knowing what tail recursion means.
Key Takeaway
Tail recursion is a theoretical optimization — Python and Java deliberately do not implement it. Both tail-recursive and non-tail-recursive functions consume the same stack depth per call in Python and standard Java. The conversion from tail recursion to a while loop is mechanical: replace the recursive call with argument reassignment and wrap in a while loop. Write loops for iteration — do not expect the runtime to do the conversion for you.
Handling Tail Recursion in Python and Java
IfFunction is tail-recursive and depth is bounded and small (< 500)
UseLeave as-is if clarity is the priority — Python won't optimize it but it won't crash either
IfFunction is tail-recursive and depth could exceed 1000
UseConvert to while loop — the conversion is mechanical and the result is faster and safer
IfFunctional language developer wrote tail-recursive code expecting TCO
UseExplain TCO is not supported; convert to iterative during review
IfNeed recursion depth beyond 1000 in Java
UseConvert to iterative or use a thread with a larger stack — never rely on JVM TCO support

When Recursion Wins — Tree Traversal and Divide and Conquer

Despite its costs, recursion is the right choice for problems whose structure is inherently recursive. Tree traversal is the clearest example: a tree is defined recursively (a node with left and right subtrees that are themselves trees), so a recursive traversal directly mirrors the definition. You can read the three-line recursive inorder traversal and immediately verify it is correct because the code and the definition are the same thing. The iterative equivalent requires managing an explicit stack, tracking multiple state variables, and reasoning about edge cases that simply do not exist in the recursive version.

Divide and conquer algorithms (merge sort, quicksort, binary search) have the same property. The recursive version of merge sort — split the array in half, sort each half, merge — is directly expressible in code. The iterative version requires managing multiple pointers and merge widths through nested loops, and is substantially harder to understand and verify.

The important constraint that makes recursion safe for these problems: the recursion depth is bounded by the structure of the data, not by user-controlled input. A balanced binary tree with one billion nodes has a depth of approximately 30. Even a perfectly degenerate tree (a linked list) has a depth equal to its node count — but if you know the tree is balanced (because you built it that way), depth 30 is well within any platform's stack limit.

The practical rule: use recursion when the problem structure is recursive AND you can bound the depth at design time. The moment depth is controlled by external input — user-uploaded files, API responses, user-constructed data structures — you must switch to iteration.

ExamplePYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeNode:
    val: int
    left:  Optional['TreeNode'] = None
    right: Optional['TreeNode'] = None

# --- Recursive inorder: left -> root -> right ---
# 3 lines, directly mirrors the definition
# Depth = tree height — O(log n) for balanced, O(n) for degenerate
def inorder_recursive(node: Optional[TreeNode]) -> list[int]:
    if node is None:
        return []
    return inorder_recursive(node.left) + [node.val] + inorder_recursive(node.right)

# --- Iterative inorder: same result, harder to read ---
# Required when tree height is unbounded or user-controlled
def inorder_iterative(root: Optional[TreeNode]) -> list[int]:
    result, stack, current = [], [], root
    while current or stack:
        # Go as far left as possible
        while current:
            stack.append(current)
            current = current.left
        # Process the leftmost unprocessed node
        current = stack.pop()
        result.append(current.val)
        # Move to right subtree
        current = current.right
    return result

# --- Recursive merge sort: clean divide-and-conquer ---
def merge_sort(arr: list[int]) -> list[int]:
    if len(arr) <= 1:
        return arr
    mid   = len(arr) // 2
    left  = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left: list[int], right: list[int]) -> list[int]:
    result = []
    i = j  = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    return result + left[i:] + right[j:]

# Build a balanced BST for testing
#         4
#        / \
#       2   6
#      / \ / \
#     1  3 5  7
root = TreeNode(4,
    TreeNode(2, TreeNode(1), TreeNode(3)),
    TreeNode(6, TreeNode(5), TreeNode(7))
)

recursive_result = inorder_recursive(root)
iterative_result = inorder_iterative(root)
print(f"Recursive inorder: {recursive_result}")
print(f"Iterative inorder: {iterative_result}")
print(f"Results match: {recursive_result == iterative_result}")

# Merge sort demo
unsorted = [38, 27, 43, 3, 9, 82, 10]
print(f"\nMerge sort: {unsorted} -> {merge_sort(unsorted)}")

# Depth analysis: balanced tree with 1M nodes
import math
nodes = 1_000_000
balanced_depth = math.ceil(math.log2(nodes + 1))
print(f"\nBalanced tree with {nodes:,} nodes: max depth = {balanced_depth}")
print(f"Python's recursion limit: 1000 frames")
print(f"Safe for recursion? {'Yes' if balanced_depth < 900 else 'No'}")
Output
Recursive inorder: [1, 2, 3, 4, 5, 6, 7]
Iterative inorder: [1, 2, 3, 4, 5, 6, 7]
Results match: True
Merge sort: [38, 27, 43, 3, 9, 82, 10] -> [3, 9, 10, 27, 38, 43, 82]
Balanced tree with 1,000,000 nodes: max depth = 20
Python's recursion limit: 1000 frames
Safe for recursion? Yes
The Rule for Safe Recursive Tree Processing
Recursion on trees is safe when you built the tree yourself with a known balancing property — O(log n) depth is well within any platform's stack limit. Recursion on trees is dangerous when a user, an API, or a file built the tree — a degenerate tree (effectively a linked list) has depth equal to its node count. The question to ask at design time is: who controls the depth of this data structure? If the answer is 'the user' or 'an external system', use iteration.
Production Insight
The iterative inorder traversal in the code above has 8 meaningful lines — the recursive version has 3. Both are correct. For a BST you built and maintain with known balance guarantees, the recursive version is the right choice: shorter, easier to review, and the depth is bounded by your own invariants. For a tree that users construct (a file system directory, a user-uploaded JSON tree, a dependency graph from user-defined packages), the iterative version is the right choice: the depth is externally controlled and you cannot safely bound it at design time. Make this decision once per tree type and document it in the code.
Key Takeaway
Recursion wins when the problem structure IS recursive and you control the depth. A balanced binary tree with 1 million nodes has a maximum recursion depth of 20 — comfortably safe. Recursive tree traversal code is 3x shorter and directly verifiable against the problem definition. The line between safe recursive and mandatory iterative is simple: who controls the depth? If the answer includes any external party, iterate.
Recursion vs Iteration for Tree and Divide-and-Conquer Problems
IfTree is balanced and built by your own code (BST, heap, segment tree)
UseRecursion is safe — depth is O(log n) and bounded by your invariants
IfTree is user-provided, loaded from external data, or has no balance guarantee
UseUse iterative traversal — depth is externally controlled and cannot be bounded safely
IfImplementing divide-and-conquer (merge sort, quicksort, binary search)
UseRecursion is natural and safe — depth is O(log n) for standard implementations
IfImplementing backtracking (N-queens, Sudoku solver, permutation generation)
UseRecursion is idiomatic — depth is bounded by the problem size, not input depth
IfParsing a file format, processing nested configuration, traversing an AST
UseDepends on who produced the file — if external, use iterative; if your own toolchain, recursion is acceptable with a depth guard
● Production incidentPOST-MORTEMseverity: high

Recursive JSON Parser Crashed Production API

Symptom
The API returned 500 errors intermittently during peak traffic. Error logs showed RecursionError: maximum recursion depth exceeded in calls to parse_nested(). The errors were non-deterministic — some requests from the same payment provider succeeded and some failed — which initially made the team suspect a race condition rather than a depth issue. The pattern became clear only after correlating failed requests against payload size.
Assumption
The team assumed JSON payloads would never exceed 20 levels of nesting, which was true for all payloads they had tested in staging. The third-party payment provider's webhook documentation showed example payloads with 4-6 levels. What the documentation did not mention was that their webhook system could wrap payloads in metadata envelopes, retry context, and forwarding headers — each adding nesting levels. Under Black Friday load, the provider's system was routing payloads through additional intermediary systems, adding roughly 1200 levels of nesting total.
Root cause
The recursive JSON parser pushed one stack frame per nesting level. At 1000+ levels, Python's default recursion limit of 1000 frames was exhausted and the RecursionError was thrown. The on-call engineer's first response was to raise the limit with sys.setrecursionlimit(5000). This pushed the failure threshold higher but did not fix it — at around 3000 frames, the underlying C stack overflowed and the Python process exited with a segfault rather than a catchable exception. The application server's process manager restarted the process, which came back up and immediately crashed again on the next deeply nested payload.
Fix
1. Replaced the recursive parser with an iterative version using an explicit Python list as a stack — this moves the depth tracking from the C call stack to heap-allocated memory, which is effectively unlimited by comparison. 2. Added payload depth validation at the API boundary: any incoming webhook payload exceeding 100 nesting levels is rejected with a 400 status and logged for investigation. 3. Switched to the ijson streaming parser for all webhook processing — ijson parses JSON iteratively without loading the entire document into memory, making it safe for both deeply nested and very large payloads. 4. Added monitoring for the maximum nesting depth seen in the last hour, alerting when any payload exceeds 50 levels so the team can investigate before hitting the hard limit.
Key lesson
  • Never use recursion on untrusted input — you cannot control the depth, and the depth is often controlled by someone with different incentives than you
  • sys.setrecursionlimit() only adjusts Python's internal frame counter — it does not increase the actual C stack size allocated by the OS; raising it beyond a few thousand reliably causes segfaults
  • Always validate and reject input that exceeds your recursion budget before beginning recursive processing — fail fast with a 400, not slow with a 500
  • Iterative parsing with an explicit stack is mandatory for production systems processing external or user-controlled data; the performance difference is negligible and the safety difference is absolute
Production debug guideSymptom → Action mapping when your recursive solution breaks in production5 entries
Symptom · 01
RecursionError: maximum recursion depth exceeded
Fix
Check the input that triggered the error — log the depth of the structure being processed. Likely exceeding 1000 levels. Do not raise sys.setrecursionlimit() as the first fix — this delays the segfault rather than preventing it. Add a depth guard at the entry point: if depth > MAX_DEPTH: raise ValueError(f'Input exceeds maximum depth {MAX_DEPTH}'). Then convert the recursive path to iterative using an explicit stack.
Symptom · 02
StackOverflowError in Java during recursive processing
Fix
Run jcmd <pid> VM.flags | grep StackSize to check the current thread stack size. The default is 512KB-1MB per thread depending on the JVM and OS. For Java, you can increase it with -Xss2m for all threads, but this increases memory consumption for every thread in the JVM, not just the one that overflowed. The better fix is converting the recursive method to iterative. If the recursion is in a specific hot path, isolate it in a thread created with a larger stack size: new Thread(null, runnable, "deep-recursion-thread", 4 1024 1024).
Symptom · 03
Gradual memory increase during recursive processing that does not GC
Fix
Each recursive call retains its local variables on the stack until the call returns. If any local variable in the call chain holds a reference to a large data structure, that structure cannot be collected until the entire recursive path unwinds. Profile with tracemalloc in Python or take a heap dump in Java. Look for objects whose reference count does not decrease during processing. The fix is either to explicitly del or null out large intermediate results before the recursive call, or to convert to an iterative approach where you control exactly what is held in memory at each step.
Symptom · 04
Recursive function is dramatically slower than expected
Fix
Profile with cProfile in Python (python -m cProfile -s cumulative your_script.py) or async-profiler in Java. Look for repeated subproblem computation — if you see the same function being called with the same arguments thousands of times, you have exponential redundancy. Add memoization: @lru_cache(maxsize=None) for pure Python functions, or a manual HashMap cache in Java. Verify the function is pure (same inputs always produce same outputs, no side effects) before adding a cache.
Symptom · 05
Recursive algorithm produces correct output in tests but wrong output in production
Fix
Check whether the function uses mutable default arguments or module-level state that accumulates across calls. A common Python bug: def dfs(graph, node, visited=set()) — the set is created once at function definition, not per call, so it persists across invocations. The fix is def dfs(graph, node, visited=None) followed by if visited is None: visited = set() at the top. Also check for accidental global state modification inside the recursive function.
★ Recursion Debugging Cheat SheetFast diagnostics when recursive code misbehaves in production
RecursionError at runtime — Python process crashes or returns 500
Immediate action
Determine the current recursion limit and the depth of the input that triggered the error
Commands
python -c "import sys; print(sys.getrecursionlimit())"
python -c "import sys; print(sys.getrecursionlimit()); sys.setrecursionlimit(2000)"
Fix now
Do not raise the limit as a permanent fix — it moves the crash threshold without addressing the cause. Add a depth guard at the entry point and convert the recursive path to iterative with an explicit stack.
Java StackOverflowError — thread terminates or JVM crashes+
Immediate action
Check the thread stack size and identify which recursive method triggered the overflow
Commands
jcmd <pid> VM.flags | grep StackSize
jcmd <pid> Thread.print | grep -B5 -A30 'StackOverflowError'
Fix now
Increase -Xss for temporary relief (e.g., -Xss4m), but plan to convert the recursive method to iterative. Increasing Xss increases memory consumption for every thread in the process — it is not a free lunch.
Memory leak or growing heap during recursive batch processing+
Immediate action
Take a heap dump and check for retained references in stack frame local variables
Commands
jmap -dump:live,format=b,file=heap.hprof <pid>
python -c "import tracemalloc; tracemalloc.start()" followed by snapshot comparison between processing start and peak
Fix now
Ensure the base case is always reachable and that large objects are not captured in closures or held as local variables across the recursive call. Explicitly release references before recursing where possible.
Recursion vs Iteration
AspectRecursionIteration
Code clarityMirrors recursive problem structure (trees, divide-and-conquer) — algorithm and code are the same shapeClear for sequential processing (lists, ranges, streams) — explicit state is visible at every step
Memory usageO(depth) stack frames — each frame holds local variables, arguments, and return addressO(1) loop variables or O(n) explicit data structures on the heap — you control what is retained
PerformanceFunction call overhead per step: ~100-200ns in CPython, ~10-30ns in JVM depending on JIT warmupLoop overhead only: ~5-15ns per iteration in CPython, sub-nanosecond in JVM after JIT
Stack depth limitPython: 1000 frames default (sys.getrecursionlimit()). Java: 512KB-1MB C stack per thread (~5000-20000 frames depending on frame size)No stack depth limit — uses heap memory which is bounded only by available RAM
Debugging experienceFull call chain visible in stack traces — each frame shows the argument values at that depthSingle stack frame — must instrument loop variables explicitly to trace state through iterations
Tail call optimizationSupported in Scheme, Haskell, Scala. Explicitly NOT in Python or standard Java — both treat tail calls identically to regular callsN/A — loops are inherently iterative and require no compiler transformation
Untrusted or external inputDangerous — depth is controlled by input, stack overflow is guaranteed if input is malformed or adversarialSafe — depth has no bearing on stack consumption; all depth state lives in heap-allocated data structures
MemoizationNatural fit — @lru_cache decorates the recursive function directly; cache lookup and miss both follow the same code pathManual — must implement explicit caching layer; bottom-up DP is the idiomatic iterative alternative
Deletion and backtrackingNatural via call stack unwinding — return from recursive call naturally restores previous stateMust manually save and restore state — requires explicit undo operations or snapshot/restore pattern
Best fitTree traversal, graph DFS on bounded-depth structures, divide-and-conquer, backtracking with bounded search depthLinear processing, batch operations, parsing external data, any algorithm where input depth is externally controlled

Key takeaways

1
Recursion is the natural choice for tree traversal and divide-and-conquer
the code mirrors the problem structure and is easier to verify against the algorithm definition.
2
Python's default recursion limit is 1000 frames
deep recursion raises RecursionError; raising sys.setrecursionlimit() moves the crash threshold but causes a segfault when the C stack is exhausted.
3
Convert recursive solutions to iterative by replacing the call stack with an explicit stack data structure
the pattern is mechanical and preserves correctness while eliminating stack depth limits.
4
Memoization (@lru_cache) fixes the exponential redundancy of naive recursive algorithms
transforms O(2^n) into O(n) time while keeping the recursive structure; use only on pure functions.
5
Iteration is generally 3-10x faster than recursion due to eliminated function call overhead
O(1) stack space versus O(depth) stack frames.
6
Never use recursion on untrusted or externally controlled input
depth is unbounded and stack overflow is a certainty when input is adversarial or unexpected.
7
Tail recursion is not optimized in Python or Java
write while loops instead of relying on a transformation that neither runtime performs.

Common mistakes to avoid

5 patterns
×

Using recursion on untrusted or externally controlled input

Symptom
RecursionError or StackOverflowError when input has unexpected depth — deeply nested JSON, malicious payloads, user-uploaded files, API responses from third parties. Works perfectly in tests because test data is shallow; fails in production because real data is not.
Fix
Validate input depth at the entry point before beginning recursive processing: if depth > MAX_DEPTH: raise ValueError(f'Input depth {depth} exceeds limit {MAX_DEPTH}'). Use iterative parsers for any external data format. The rule is simple: if you did not build the data structure, do not recurse into it without a depth guard.
×

Raising sys.setrecursionlimit() as the fix for RecursionError

Symptom
After raising the limit (e.g., to 10000), the application now crashes with a segfault instead of a catchable RecursionError. The C stack overflows before Python's counter reaches the new limit. The process exits without a stack trace and the process manager restarts it, which immediately crashes again on the same input.
Fix
sys.setrecursionlimit() adjusts Python's frame counter but has no effect on the actual C stack size allocated by the OS. It is a temporary diagnostic tool, not a fix. Convert the recursive code to iterative. If you genuinely need deeper recursion for a bounded problem, increase the OS thread stack size with ulimit -s (Linux) or create a thread with a larger stack explicitly — but solve the root cause rather than shifting the crash threshold.
×

Not memoizing recursive functions with overlapping subproblems

Symptom
Function is exponentially slow — fib(40) takes several seconds, fib(50) appears to hang. Profiling shows the same function being called millions of times with the same arguments. The algorithm is O(2^n) when it should be O(n).
Fix
Add @lru_cache(maxsize=None) for pure functions in Python. Use functools.cache in Python 3.9+ (equivalent but slightly cleaner). For class methods, implement a dict-based cache manually or use a wrapper. Verify the function is pure — same inputs always produce same outputs, no side effects — before memoizing. A function that prints, writes to a file, or modifies global state must not be memoized because the cached return value will be used on repeated calls while the side effects are skipped.
×

Writing tail-recursive code expecting Python or Java to optimize it

Symptom
No performance improvement over non-tail recursion. Same stack depth limit, same call overhead per level. Code review from a functional programming background led to tail-recursive style that the team finds less familiar without providing any benefit.
Fix
Python and Java do not perform tail call optimization. Converting a function to tail-recursive style in these languages provides zero performance benefit. Convert tail recursion to a while loop instead — the conversion is mechanical (replace the recursive call with argument reassignment and wrap in a loop) and the result is idiomatic, readable, and genuinely faster.
×

Using a mutable object as a default argument in a recursive function

Symptom
The recursive function produces correct results the first time it is called but returns wrong results or no results on subsequent calls. The visited set, accumulated list, or cache dict from the first call persists into the second call, affecting results in unpredictable ways.
Fix
Never use mutable objects as default arguments: def dfs(graph, node, visited=set()) is wrong. The set is created once at function definition time and reused across all calls. The correct pattern: def dfs(graph, node, visited=None) with if visited is None: visited = set() as the first line of the function body. This creates a fresh set for each top-level call.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR
What is Python's default recursion limit, how do you change it, and what...
Q02SENIOR
How do you convert a recursive DFS to an iterative one? What are the got...
Q03JUNIOR
What is the time and space complexity of naive recursive Fibonacci, and ...
Q04SENIOR
Why doesn't Python optimize tail recursion, and what should you do inste...
Q05SENIOR
When would you choose recursion over iteration in a production system? W...
Q01 of 05JUNIOR

What is Python's default recursion limit, how do you change it, and what are the risks of changing it?

ANSWER
Python's default recursion limit is 1000 frames, accessible via sys.getrecursionlimit(). You can change it with sys.setrecursionlimit(n). The risk: this limit only adjusts Python's internal frame counter — it does not increase the actual C stack size that the OS allocates for the thread. The C stack is typically 1-8MB depending on the OS and how the thread was created. Setting the limit to 100000 and recursing that deep will cause a segfault (unhandled C stack overflow) rather than a catchable RecursionError. The process exits without a useful stack trace. The correct fix for deep recursion is always to convert to iterative — not to raise the limit.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
What is tail recursion and does Python support it?
02
When is recursion the better choice over iteration?
03
How do I know if my recursive function will hit the stack limit?
04
Can I memoize a recursive function with side effects?
🔥

That's Recursion. Mark it forged?

5 min read · try the examples if you haven't

Previous
Introduction to Recursion
2 / 9 · Recursion
Next
Tower of Hanoi Problem