SQL Subqueries — 40-Minute Timeout from Correlated Query
A correlated subquery ran AVG() 200k times — 2-second report became 40-minute timeout.
20+ years shipping high-throughput database systems. Everything here is grounded in real deployments.
- A subquery is a SELECT nested inside another query — appears in WHERE (filter), SELECT (compute a value), or FROM (derived table)
- Non-correlated subquery: executes once, result reused — fast
- Correlated subquery: references the outer query and re-executes once per outer row — potentially O(n²)
- EXISTS short-circuits on first match; IN collects the full result set first
- NOT IN is dangerous when the subquery can return NULL — returns empty set silently; use NOT EXISTS instead
- Rule: replace correlated subqueries with JOINs on large tables for an order-of-magnitude speedup
A subquery is a question inside a question. Asking 'which customers placed an order?' is one question. Asking 'which customers placed an order that was larger than the average order?' requires asking the average question first, then using that answer. SQL lets you nest the inner question directly inside the outer one. The engine answers the inner question first, then uses that result to answer the outer one.
Subqueries are one of the most flexible tools in SQL — they let you compose queries in ways that would otherwise require multiple steps or temporary tables. But they also hide performance traps, especially correlated subqueries that run once per row.
This guide covers the main subquery types, the EXISTS vs IN performance difference, the dangerous NULL behaviour of NOT IN, and when to reach for CTEs instead. Understanding the performance profile of each subquery type is the difference between a query that works and one that works at scale.
What a Subquery Actually Is
A subquery is a SELECT statement nested inside another SQL statement — inside a WHERE, FROM, or HAVING clause. Its result feeds the outer query as a value, a row set, or a correlation reference. The core mechanic: the outer query pauses, the inner query executes (once or per row), and the outer query uses that result to filter, compute, or join.
Subqueries come in two flavors: non-correlated (executed once, independent of the outer query) and correlated (re-executed for each row of the outer query — O(n*m) cost). A non-correlated subquery runs first, materializes its result, then the outer query uses it. A correlated subquery references columns from the outer query, forcing per-row execution. That distinction is the single most important performance property: correlated subqueries can turn a fast query into a 40-minute timeout.
Use subqueries when you need to compare a value against an aggregated result (e.g., "employees earning above department average") or when a JOIN would produce duplicate rows due to one-to-many relationships. In production systems, subqueries often replace complex JOINs for readability, but the trade-off is execution plan complexity. A well-placed subquery can reduce data scanned by orders of magnitude; a poorly placed one can scan the same table millions of times.
Scalar and Non-Correlated Subqueries
A scalar subquery returns exactly one value — one row, one column. You can use it anywhere a single value is expected: in SELECT to compute a derived column, in WHERE for a comparison, or in SET for an UPDATE. The key property of a non-correlated scalar subquery is that it executes once and its result is reused for every row of the outer query.
Subqueries in WHERE with IN accept a multi-row, single-column result — the outer query keeps rows where the column value appears in that list. Subqueries in FROM create a derived table (also called an inline view) — the result acts as a temporary table for the outer query to filter or join against.
Performance note: non-correlated subqueries execute once. The engine computes the inner result, then uses it for all outer row comparisons. This is fundamentally different from correlated subqueries, which re-execute per outer row.
Correlated Subqueries — Power and the O(n²) Trap
A correlated subquery references a column from the outer query. This means it cannot run independently — it must re-execute for each row the outer query processes. On a table with 100 rows, that's 100 executions. On a table with 100,000 rows, it's 100,000 executions. This is the O(n²) trap.
Correlated subqueries are useful for row-by-row comparisons that are hard to express any other way. The canonical example: 'find employees earning more than their own department's average.' The subquery needs to know which department the current outer row belongs to — and that changes row by row.
But for large tables, this pattern is almost always replaceable with a JOIN to a pre-aggregated derived table or CTE. The JOIN version computes the average once per department (not once per row) and is orders of magnitude faster.
EXISTS vs IN — Performance and the NULL Danger
EXISTS and IN are both used to check whether matching rows exist in another table, but they behave differently in important ways.
IN collects the full result set from the subquery, then checks membership for each outer row. If the subquery returns 10,000 rows, IN builds a 10,000-entry lookup structure first. EXISTS, on the other hand, is correlated — it runs the inner query for each outer row and stops as soon as it finds one match. This short-circuit makes EXISTS faster when you expect many matches.
The most important practical difference is NULL handling. NOT IN returns an empty result set if the subquery contains any NULL values — because SQL uses three-valued logic: a value compared to NULL is neither true nor false, it's unknown. One NULL in the subquery poisons the entire NOT IN check. NOT EXISTS does not have this problem because it never compares values directly to NULL.
Subqueries in the FROM Clause — Deriving Battle Tables
You understand subqueries in WHERE. Now weaponize them in FROM. A subquery in the FROM clause creates a derived table — a temporary result set you can treat like a real table. Why bother? Because it lets you pre-aggregate, filter, or reshape data before your outer query touches it, avoiding repeated scans and saving your production DB from unnecessary thrashing. The derived table must have an alias. Every database needs a name to reference it. Execution order: the inner query runs first, materializing the derived table (in memory or tempdb), then the outer query queries that result. Be warned: if your derived table returns millions of rows, you just created a bottleneck. Always check execution plans. Derive only what you need, when you need it.
Subqueries with UPDATE and DELETE — Precision Surgery
Subqueries aren't just for SELECT. You can embed them in UPDATE and DELETE to target rows dynamically without manual joins or multiple statements. This is how you fix bad data without a full table scan in a transaction. For UPDATE, the subquery provides the new value or determines which rows to change. For DELETE, it identifies the rows to remove based on complex logic. Both patterns let you avoid race conditions where your filter changes between a SELECT and a subsequent write. The inner query executes first, giving you a snapshot of what to change. Critical: If your subquery returns NULL inadvertently, your UPDATE might set a column to NULL or your DELETE might remove nothing silently. Always test with a SELECT first, wrap in a transaction, and commit only when you see the expected row count.
A Correlated Subquery Turned a 2-Second Report into a 40-Minute Timeout
AVG() computation for every row in the 200,000-row outer table. 200,000 × AVG() = 200,000 full group scans. The query had O(n²) complexity that only became visible at production scale.- Correlated subqueries have O(n×m) complexity — always verify execution plans on production-scale data
- A query that runs in 2 seconds on 5,000 rows can timeout at 200,000 rows with no other changes
- Replace correlated subqueries with a pre-aggregated JOIN or CTE whenever the outer table is large
Key takeaways
Common mistakes to avoid
3 patternsUsing a correlated subquery on a large outer table without checking the execution plan
Using NOT IN when the subquery column can contain NULL values
Using a scalar subquery in SELECT to compute a per-row derived value
Interview Questions on This Topic
What is a correlated subquery and why can it be slow?
Frequently Asked Questions
20+ years shipping high-throughput database systems. Everything here is grounded in real deployments.
That's SQL Basics. Mark it forged?
4 min read · try the examples if you haven't