SQL Subqueries
- Correlated subqueries run once per row of the outer query — replace with JOINs for large tables.
- EXISTS short-circuits on first match; IN collects all values first.
- NOT IN returns empty set if the subquery contains any NULL — use NOT EXISTS instead.
A subquery is a SELECT statement nested inside another query. It can appear in WHERE (filter rows), SELECT (compute values), or FROM (as a derived table). Correlated subqueries reference the outer query and run once per row. EXISTS is usually faster than IN for large datasets because it short-circuits.
Scalar and Non-Correlated Subqueries
-- Scalar subquery: returns exactly one value SELECT employee_name, salary, (SELECT AVG(salary) FROM employees) AS avg_salary, salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg FROM employees; -- Subquery in WHERE with IN SELECT product_name FROM products WHERE category_id IN ( SELECT id FROM categories WHERE type = 'Electronics' ); -- Subquery in FROM (derived table / inline view) SELECT dept, avg_salary FROM ( SELECT department AS dept, AVG(salary) AS avg_salary FROM employees GROUP BY department ) dept_averages WHERE avg_salary > 60000;
Correlated Subqueries
A correlated subquery references the outer query. It re-executes for each row of the outer query — potentially slow on large tables.
-- Find employees earning more than their department's average -- Correlated: inner query references e.department from outer query SELECT e.name, e.salary, e.department FROM employees e WHERE e.salary > ( SELECT AVG(salary) FROM employees WHERE department = e.department -- references outer query! ); -- This runs the subquery ONCE PER ROW of the outer employees table -- On 100,000 employees: 100,000 AVG() computations — slow! -- Better: use a JOIN with a derived table (computes AVG once per department) SELECT e.name, e.salary, e.department FROM employees e JOIN ( SELECT department, AVG(salary) AS dept_avg FROM employees GROUP BY department ) dept ON e.department = dept.department WHERE e.salary > dept.dept_avg;
EXISTS vs IN
-- IN: collects all values from subquery, then checks membership SELECT name FROM customers WHERE id IN (SELECT customer_id FROM orders); -- EXISTS: short-circuits — stops as soon as it finds one match SELECT name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders WHERE customer_id = c.id ); -- Performance: -- IN: subquery runs once, returns a set. Fast when subquery returns few rows. -- EXISTS: correlated, but stops at first match. Fast when many matches expected. -- For NOT: NOT EXISTS is almost always faster than NOT IN -- NOT IN fails silently if subquery contains NULL — NOT EXISTS handles it correctly -- Dangerous NULL behaviour with NOT IN: SELECT name FROM customers WHERE id NOT IN (SELECT customer_id FROM orders WHERE customer_id IS NOT NULL); -- Without IS NOT NULL: returns empty set if ANY order has NULL customer_id
🎯 Key Takeaways
- Correlated subqueries run once per row of the outer query — replace with JOINs for large tables.
- EXISTS short-circuits on first match; IN collects all values first.
- NOT IN returns empty set if the subquery contains any NULL — use NOT EXISTS instead.
- Subqueries in FROM are derived tables — they execute once and can be indexed in some databases.
- CTEs (WITH clause) are cleaner than nested subqueries and execute once (in most databases).
Interview Questions on This Topic
- QWhat is a correlated subquery and why can it be slow?
- QWhat is the difference between IN and EXISTS?
- QWhy is NOT IN dangerous when the subquery can contain NULL?
Frequently Asked Questions
When should I use a CTE instead of a subquery?
Use a CTE when the subquery is reused multiple times, when it improves readability for complex logic, or when you need recursion. CTEs defined with WITH are named and readable. In PostgreSQL and SQL Server, CTEs execute once and are memoized. In MySQL 8+, CTE behaviour is similar. For simple one-off subqueries, either works.
Can a subquery return multiple columns?
In a WHERE clause with IN, the subquery can return multiple columns if you use row value constructors: WHERE (col1, col2) IN (SELECT a, b FROM ...). In a scalar context (SELECT or comparison operators), the subquery must return exactly one column and one row.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.