SQL Subqueries — 40-Minute Timeout from Correlated Query
AVG() 200k times — 2-second report became 40-minute timeout.- Correlated subqueries run once per outer row — replace with JOINs on pre-aggregated CTEs for large tables.
- EXISTS short-circuits on first match; IN collects all values first — choose based on expected match rate.
- NOT IN is broken when the subquery contains any NULL — always use NOT EXISTS for negative existence checks.
- A subquery is a SELECT nested inside another query — appears in WHERE (filter), SELECT (compute a value), or FROM (derived table)
- Non-correlated subquery: executes once, result reused — fast
- Correlated subquery: references the outer query and re-executes once per outer row — potentially O(n²)
- EXISTS short-circuits on first match; IN collects the full result set first
- NOT IN is dangerous when the subquery can return NULL — returns empty set silently; use NOT EXISTS instead
- Rule: replace correlated subqueries with JOINs on large tables for an order-of-magnitude speedup
Production Incident
AVG() computation for every row in the 200,000-row outer table. 200,000 × AVG() = 200,000 full group scans. The query had O(n²) complexity that only became visible at production scale.Production Debug GuideDiagnosing slow queries caused by subquery patterns
Subqueries are one of the most flexible tools in SQL — they let you compose queries in ways that would otherwise require multiple steps or temporary tables. But they also hide performance traps, especially correlated subqueries that run once per row.
This guide covers the main subquery types, the EXISTS vs IN performance difference, the dangerous NULL behaviour of NOT IN, and when to reach for CTEs instead. Understanding the performance profile of each subquery type is the difference between a query that works and one that works at scale.
Scalar and Non-Correlated Subqueries
A scalar subquery returns exactly one value — one row, one column. You can use it anywhere a single value is expected: in SELECT to compute a derived column, in WHERE for a comparison, or in SET for an UPDATE. The key property of a non-correlated scalar subquery is that it executes once and its result is reused for every row of the outer query.
Subqueries in WHERE with IN accept a multi-row, single-column result — the outer query keeps rows where the column value appears in that list. Subqueries in FROM create a derived table (also called an inline view) — the result acts as a temporary table for the outer query to filter or join against.
Performance note: non-correlated subqueries execute once. The engine computes the inner result, then uses it for all outer row comparisons. This is fundamentally different from correlated subqueries, which re-execute per outer row.
-- Scalar subquery: compute one value, reuse for every row SELECT employee_name, salary, (SELECT AVG(salary) FROM employees) AS company_avg, salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg FROM employees; -- The AVG subquery executes ONCE, result reused for all rows -- Subquery in WHERE with IN SELECT product_name FROM products WHERE category_id IN ( SELECT id FROM categories WHERE type = 'Electronics' ); -- Inner query executes once, outer query checks membership -- Subquery in FROM (derived table / inline view) SELECT dept, avg_salary FROM ( SELECT department AS dept, AVG(salary) AS avg_salary FROM employees GROUP BY department ) dept_averages WHERE avg_salary > 60000; -- Derived table computes once, outer query filters the result
employee_name | salary | company_avg | diff_from_avg
Alice Tanaka | 95000.00 | 72000.00 | 23000.00
Marcus Webb | 58000.00 | 72000.00 | -14000.00
-- Derived table result:
dept | avg_salary
Engineering | 88500.00
Product | 79200.00
Correlated Subqueries — Power and the O(n²) Trap
A correlated subquery references a column from the outer query. This means it cannot run independently — it must re-execute for each row the outer query processes. On a table with 100 rows, that's 100 executions. On a table with 100,000 rows, it's 100,000 executions. This is the O(n²) trap.
Correlated subqueries are useful for row-by-row comparisons that are hard to express any other way. The canonical example: 'find employees earning more than their own department's average.' The subquery needs to know which department the current outer row belongs to — and that changes row by row.
But for large tables, this pattern is almost always replaceable with a JOIN to a pre-aggregated derived table or CTE. The JOIN version computes the average once per department (not once per row) and is orders of magnitude faster.
-- CORRELATED SUBQUERY: executes AVG() once per outer row -- On 100,000 employees: 100,000 AVG() computations SELECT e.name, e.salary, e.department FROM employees e WHERE e.salary > ( SELECT AVG(salary) FROM employees WHERE department = e.department -- references outer row! ); -- BETTER: pre-aggregate in a derived table (computes AVG once per dept) SELECT e.name, e.salary, e.department FROM employees e JOIN ( SELECT department, AVG(salary) AS dept_avg FROM employees GROUP BY department ) dept ON e.department = dept.department WHERE e.salary > dept.dept_avg; -- Or with a CTE (same performance, more readable): WITH dept_averages AS ( SELECT department, AVG(salary) AS dept_avg FROM employees GROUP BY department ) SELECT e.name, e.salary, e.department FROM employees e JOIN dept_averages d ON e.department = d.department WHERE e.salary > d.dept_avg;
-- JOIN version: 1.4 seconds on same data
-- EXPLAIN output correlated: loops=200000 on inner AVG()
-- EXPLAIN output JOIN: loops=1 on the aggregation
EXISTS vs IN — Performance and the NULL Danger
EXISTS and IN are both used to check whether matching rows exist in another table, but they behave differently in important ways.
IN collects the full result set from the subquery, then checks membership for each outer row. If the subquery returns 10,000 rows, IN builds a 10,000-entry lookup structure first. EXISTS, on the other hand, is correlated — it runs the inner query for each outer row and stops as soon as it finds one match. This short-circuit makes EXISTS faster when you expect many matches.
The most important practical difference is NULL handling. NOT IN returns an empty result set if the subquery contains any NULL values — because SQL uses three-valued logic: a value compared to NULL is neither true nor false, it's unknown. One NULL in the subquery poisons the entire NOT IN check. NOT EXISTS does not have this problem because it never compares values directly to NULL.
-- IN: collects all values from subquery, then checks membership SELECT name FROM customers WHERE id IN ( SELECT customer_id FROM orders ); -- EXISTS: short-circuits on first match — stops looking immediately SELECT name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders WHERE customer_id = c.id ); -- The SELECT 1 is a convention — any column works here, -- EXISTS only cares whether a row was found, not what it contains -- DANGEROUS: NOT IN with potential NULLs SELECT name FROM customers WHERE id NOT IN ( SELECT customer_id FROM orders -- if ANY row has NULL customer_id, ); -- this returns ZERO rows silently! -- SAFE: NOT EXISTS handles NULLs correctly SELECT name FROM customers c WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.id ); -- ALSO SAFE: explicit NULL guard on NOT IN SELECT name FROM customers WHERE id NOT IN ( SELECT customer_id FROM orders WHERE customer_id IS NOT NULL -- explicitly exclude NULLs );
name
Alice Tanaka
Ben Okafor
-- NOT IN result (if any order has NULL customer_id): empty set
-- NOT EXISTS result: correctly returns customers with no orders
name
Clara Svensson
| Pattern | Executes | NULL safe? | Short-circuits? | Best for |
|---|---|---|---|---|
| Non-correlated subquery | Once | Yes | No | Computing a single value or filtering against a static list |
| Correlated subquery | Once per outer row | Yes | No | Row-by-row comparisons — use JOIN for large tables |
| EXISTS | Once per outer row, stops at first match | Yes | Yes | Checking existence when most rows match |
| IN | Once (subquery), then membership check | Yes for IN, NO for NOT IN | No | Filtering against a small, known list |
| NOT IN | Once (subquery) | NO — NULL poisons result | No | Avoid — use NOT EXISTS instead |
| NOT EXISTS | Once per outer row, stops at first match | Yes | Yes | Safe negative existence check |
🎯 Key Takeaways
- Correlated subqueries run once per outer row — replace with JOINs on pre-aggregated CTEs for large tables.
- EXISTS short-circuits on first match; IN collects all values first — choose based on expected match rate.
- NOT IN is broken when the subquery contains any NULL — always use NOT EXISTS for negative existence checks.
- Subqueries in FROM (derived tables) execute once and are safe for large result sets — they are effectively pre-aggregated CTEs without the WITH syntax.
⚠ Common Mistakes to Avoid
Interview Questions on This Topic
- QWhat is a correlated subquery and why can it be slow?Mid-levelReveal
- QWhat is the difference between IN and EXISTS — when would you prefer one over the other?Mid-levelReveal
- QWhy is NOT IN dangerous when the subquery can contain NULL, and how do you fix it?SeniorReveal
Frequently Asked Questions
When should I use a CTE instead of a subquery?
Use a CTE when the subquery is reused multiple times, when the logic has multiple sequential steps, or when you need recursion. CTEs are named and easier to debug — you can comment out the outer query and SELECT from the CTE directly. For simple one-off subqueries, either works.
Can a subquery return multiple columns?
In a WHERE clause with IN, the subquery can return multiple columns using row value constructors: WHERE (col1, col2) IN (SELECT a, b FROM ...). In a scalar context, the subquery must return exactly one column and one row — otherwise you get a runtime error.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.