SQL UNION vs INTERSECT Explained — Real-World Patterns and Pitfalls
Every non-trivial application eventually needs to pull data from more than one source and stitch it together in a meaningful way. Maybe you're running an e-commerce platform and you need one report that shows customers from two different regional databases. Maybe you're a data analyst trying to find which products appear in both this quarter's bestseller list AND last quarter's. These aren't edge cases — they're everyday production problems that UNION and INTERSECT were built to solve cleanly and efficiently.
Before these set operators existed, developers worked around the problem with clunky JOINs or application-layer logic that merged result sets in code — slow, error-prone, and hard to read. UNION and INTERSECT push that merging logic where it belongs: inside the database engine, which is orders of magnitude better at set operations than your application server. They also make intent explicit. A UNION tells any future reader of your SQL exactly what you're doing: combining two independent result sets. That clarity is worth a lot in a codebase that needs to be maintained for years.
By the end of this article you'll understand not just the syntax but the mental model behind set operators. You'll know when to reach for UNION ALL instead of UNION to avoid a silent performance trap, when INTERSECT is cleaner than a correlated subquery, how duplicate handling works in both operators, and you'll walk away with three real-world query patterns you can adapt immediately.
UNION — Combining Two Result Sets Into One (And the UNION ALL Trap)
UNION stacks the rows from two SELECT statements on top of each other. Think of it as a vertical JOIN — instead of adding columns sideways, it adds rows downward. The critical rule is that both queries must return the same number of columns, and the corresponding columns must have compatible data types. The column names in the final output come from the first SELECT statement, not the second.
Here's the part that trips people up: plain UNION automatically removes duplicate rows across the combined result. This sounds helpful, but it means the database has to sort or hash the entire result set to find and remove those duplicates — even if you know there are none. That's wasted CPU and I/O on every execution.
UNION ALL skips the deduplication step entirely. It just appends. It's always faster than UNION. You should default to UNION ALL and only use plain UNION when you genuinely need duplicates removed. A surprising number of production queries use UNION where UNION ALL was intended, causing a quiet but real performance tax.
The most common real-world use case is consolidating data from partitioned tables — for example, an orders_2023 and orders_2024 table that were split for archival reasons but need to be queried together for a full-history report.
-- Scenario: orders are split across two yearly tables. -- We need a single list of all customers who placed an order in either year. -- Table: orders_2023 -- | customer_id | customer_email | order_total | -- |-------------|-------------------------|-------------| -- | 101 | alice@example.com | 250.00 | -- | 102 | bob@example.com | 89.99 | -- | 103 | carol@example.com | 410.50 | -- Table: orders_2024 -- | customer_id | customer_email | order_total | -- |-------------|-------------------------|-------------| -- | 102 | bob@example.com | 120.00 | -- Bob ordered in both years -- | 104 | dave@example.com | 305.75 | -- | 105 | eve@example.com | 55.00 | -- ── EXAMPLE 1: UNION (deduplicates — Bob appears only ONCE) ────────────────── SELECT customer_id, customer_email FROM orders_2023 UNION -- removes duplicate rows before returning results (slower) SELECT customer_id, customer_email FROM orders_2024; -- Output: -- | customer_id | customer_email | -- |-------------|-------------------| -- | 101 | alice@example.com | -- | 102 | bob@example.com | -- only one row for Bob -- | 103 | carol@example.com | -- | 104 | dave@example.com | -- | 105 | eve@example.com | -- ── EXAMPLE 2: UNION ALL (keeps ALL rows — Bob appears TWICE) ──────────────── SELECT customer_id, customer_email FROM orders_2023 UNION ALL -- no deduplication — just appends. Faster. SELECT customer_id, customer_email FROM orders_2024; -- Output: -- | customer_id | customer_email | -- |-------------|-------------------| -- | 101 | alice@example.com | -- | 102 | bob@example.com | -- first occurrence (2023) -- | 103 | carol@example.com | -- | 102 | bob@example.com | -- second occurrence (2024) -- | 104 | dave@example.com | -- | 105 | eve@example.com | -- ── EXAMPLE 3: UNION with ORDER BY and column alias ───────────────────────── -- ORDER BY can only appear ONCE — at the very end, after the final SELECT. -- It applies to the entire combined result, not just one half. SELECT customer_id, customer_email, order_total, 2023 AS order_year FROM orders_2023 UNION ALL SELECT customer_id, customer_email, order_total, 2024 AS order_year FROM orders_2024 ORDER BY customer_id, order_year; -- sorts the combined result -- Output: -- | customer_id | customer_email | order_total | order_year | -- |-------------|-------------------|-------------|------------| -- | 101 | alice@example.com | 250.00 | 2023 | -- | 102 | bob@example.com | 89.99 | 2023 | -- | 102 | bob@example.com | 120.00 | 2024 | -- | 103 | carol@example.com | 410.50 | 2023 | -- | 104 | dave@example.com | 305.75 | 2024 | -- | 105 | eve@example.com | 55.00 | 2024 |
Example 2 (UNION ALL): 6 rows — Bob appears twice
Example 3 (UNION ALL + ORDER BY): 6 rows sorted by customer_id then year
INTERSECT — Finding What's Common to Both Queries
INTERSECT returns only the rows that appear in the result of BOTH queries. Where UNION is additive, INTERSECT is a filter. It's essentially asking: 'What do these two result sets have in common?'
Like UNION, INTERSECT removes duplicates by default — if a value exists three times in both queries, it still only appears once in the output. Most databases don't have an INTERSECT ALL variant (PostgreSQL does; MySQL famously doesn't support INTERSECT natively at all before version 8.0.31).
The real power of INTERSECT is replacing complex correlated subqueries or EXISTS clauses with something far more readable. Consider finding customers who bought from your platform in both January and February. You could write a self-JOIN with GROUP BY, or a nested subquery with IN, but INTERSECT expresses the intent in plain English: 'Give me everyone from the January buyers AND the February buyers.'
INTERSECT compares entire rows, not just one column. Every column in both SELECT lists must match for a row to be included. This is both its power and a common source of confusion — we'll cover that in the gotchas section.
-- Scenario: A SaaS company wants to identify "sticky" users — -- customers who were active in BOTH January AND February 2024. -- These are the users worth targeting for an annual plan upgrade. -- Table: user_activity -- | user_id | activity_month | feature_used | -- |---------|----------------|--------------------| -- | 201 | 2024-01 | dashboard | -- | 202 | 2024-01 | reports | -- | 203 | 2024-01 | dashboard | -- | 201 | 2024-02 | api_integration | -- | 204 | 2024-02 | dashboard | -- | 202 | 2024-02 | reports | -- ── EXAMPLE 1: Basic INTERSECT ─────────────────────────────────────────────── -- Find user_ids active in BOTH months SELECT user_id FROM user_activity WHERE activity_month = '2024-01' INTERSECT -- only rows that appear in BOTH result sets pass through SELECT user_id FROM user_activity WHERE activity_month = '2024-02'; -- Output: -- | user_id | -- |---------| -- | 201 | -- active in both January and February -- | 202 | -- active in both January and February -- (user 203 only appears in Jan, user 204 only in Feb — excluded) -- ── EXAMPLE 2: INTERSECT as a cleaner alternative to IN + subquery ─────────── -- Same result, but compare readability: -- The subquery way (harder to read at a glance): SELECT DISTINCT user_id FROM user_activity WHERE activity_month = '2024-01' AND user_id IN ( SELECT user_id -- buried logic — reader must mentally trace this FROM user_activity WHERE activity_month = '2024-02' ); -- The INTERSECT way (reads like the business requirement itself): SELECT user_id FROM user_activity WHERE activity_month = '2024-01' INTERSECT SELECT user_id FROM user_activity WHERE activity_month = '2024-02'; -- Output for both: same two rows — user 201 and 202. -- INTERSECT wins on readability. Query optimizers often produce the same plan. -- ── EXAMPLE 3: INTERSECT with multiple columns ─────────────────────────────── -- Find (user_id, feature_used) pairs that appear in BOTH months. -- i.e. users who used the SAME feature in both Jan AND Feb. SELECT user_id, feature_used FROM user_activity WHERE activity_month = '2024-01' INTERSECT SELECT user_id, feature_used FROM user_activity WHERE activity_month = '2024-02'; -- Output: -- | user_id | feature_used | -- |---------|--------------| -- | 202 | reports | -- User 202 used 'reports' in BOTH months -- (User 201 is excluded here because they used 'dashboard' in Jan -- but 'api_integration' in Feb — different feature, so no match)
Example 2: Same 2 rows via subquery approach (equivalent result)
Example 3: 1 row — only user 202 used the same feature in both months
Combining Set Operators in Real-World Query Patterns
In production, you rarely use UNION or INTERSECT in isolation. The real skill is knowing how to chain them together and how to mix them with subqueries, CTEs, and aggregations to answer complex business questions.
Set operators follow a specific precedence order: INTERSECT binds more tightly than UNION or EXCEPT. So if you write Query A UNION Query B INTERSECT Query C, the database will evaluate B INTERSECT C first, then UNION that result with A. This is counterintuitive and the source of very subtle bugs. Always use parentheses when chaining more than two queries.
Another pattern worth knowing: wrapping a UNION or INTERSECT inside a CTE (Common Table Expression) lets you treat the combined result as a named table and then run further aggregations on top of it. This is much cleaner than nesting UNION queries inside subqueries three levels deep.
Finally, remember that set operators work on result sets, not tables directly. Each SELECT can have its own WHERE clause, JOINs, and even aggregations — as long as the final column list matches. This means you can build each 'half' of the query independently and then combine them, which is a great way to break down a complex reporting requirement into manageable pieces.
-- Scenario: A retail analytics team needs a 'VIP Customer Report'. -- Definition of VIP: -- (a) Customers who spent > $500 in 2023 OR > $500 in 2024 (high spenders) -- AND -- (b) That combined list must ALSO have made a purchase in the last 30 days -- (still active — not just historically high-value) -- Tables available: -- orders(order_id, customer_id, order_total, order_date) -- customers(customer_id, full_name, email, signup_date) -- ── Step 1: Use a CTE to build the 'high spender' pool via UNION ALL ───────── WITH high_spenders AS ( -- High spenders from 2023 SELECT customer_id FROM orders WHERE YEAR(order_date) = 2023 GROUP BY customer_id HAVING SUM(order_total) > 500 -- only customers whose 2023 total exceeded $500 UNION -- plain UNION because we want to deduplicate across years -- a customer shouldn't appear twice just for being high-value in both years -- High spenders from 2024 SELECT customer_id FROM orders WHERE YEAR(order_date) = 2024 GROUP BY customer_id HAVING SUM(order_total) > 500 ), -- ── Step 2: Find recently active customers ─────────────────────────────────── recently_active AS ( SELECT DISTINCT customer_id FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '30 days' -- activity in last 30 days ) -- ── Step 3: INTERSECT — only customers who are in BOTH groups ──────────────── -- We join back to customers for the display columns SELECT c.customer_id, c.full_name, c.email FROM customers c WHERE c.customer_id IN ( -- The INTERSECT here is the heart of the query: -- high spenders who are also recently active SELECT customer_id FROM high_spenders INTERSECT SELECT customer_id FROM recently_active ) ORDER BY c.full_name; -- ── Alternative: explicit parentheses when chaining UNION + INTERSECT ──────── -- Without parentheses, INTERSECT binds first — this is almost never what you want -- when mixing operators. Always parenthesize. -- WRONG — INTERSECT runs before UNION due to precedence: SELECT customer_id FROM orders_2023 UNION SELECT customer_id FROM orders_2024 INTERSECT -- binds to orders_2024 first! SELECT customer_id FROM recently_active; -- RIGHT — parentheses make intent unambiguous: (SELECT customer_id FROM orders_2023 UNION SELECT customer_id FROM orders_2024) INTERSECT SELECT customer_id FROM recently_active; -- now intersects the full union result
| customer_id | full_name | email |
|-------------|------------------|-------------------------|
| 101 | Alice Mercer | alice@example.com |
| 203 | Carlos Reyes | carlos@example.com |
(Results vary by data — these are illustrative rows)
Precedence demo: the 'WRONG' and 'RIGHT' versions return different row counts
when orders_2024 and recently_active have partial overlap.
| Feature / Aspect | UNION / UNION ALL | INTERSECT |
|---|---|---|
| What it returns | All rows from both queries combined | Only rows that appear in BOTH queries |
| Duplicate handling | UNION removes them; UNION ALL keeps them | Always removes duplicates (most databases) |
| Performance default | UNION ALL is fast; UNION runs DISTINCT | Similar cost to UNION — hashing/sorting required |
| MySQL support | Full support in all versions | Supported from MySQL 8.0.31 only |
| PostgreSQL support | Full support including UNION ALL | Full support including INTERSECT ALL |
| Use WHEN | Merging partitioned tables, combining report sources | Finding overlap between two independent datasets |
| Readable alternative to | Multiple OR conditions across tables | IN with subquery, EXISTS, or self-JOIN |
| Can chain multiple? | Yes — A UNION B UNION C | Yes — watch precedence with UNION mix |
| ORDER BY placement | Once, at the very end of the full query | Once, at the very end of the full query |
| Column count rule | Must match across all SELECT statements | Must match across all SELECT statements |
🎯 Key Takeaways
- UNION ALL is almost always what you want — plain UNION silently deduplicates and pays a sorting/hashing cost every time. Use it only when removing duplicates is a genuine requirement.
- INTERSECT replaces correlated subqueries and IN clauses with code that reads exactly like the business requirement — 'give me what's in both sets'. When your query can be stated that way, INTERSECT is the right tool.
- INTERSECT matches on entire rows, not single columns — if your SELECT returns two columns, both must match. This surprises almost every developer the first time they hit an unexpectedly empty result.
- INTERSECT has higher precedence than UNION — in a chained query without parentheses, INTERSECT binds first. Always parenthesize mixed set-operator queries or you'll spend an afternoon debugging a subtle logic bug.
⚠ Common Mistakes to Avoid
- ✕Mistake 1: Using UNION instead of UNION ALL when data can't overlap — If you're unioning orders_january and orders_february and you know a single order can't exist in both months, using plain UNION still forces a full deduplication scan. The symptom is queries that are mysteriously slow on large tables for no obvious reason. Fix: default to UNION ALL and only switch to UNION when you have a concrete reason to remove duplicates.
- ✕Mistake 2: Mismatched column counts or incompatible types between SELECT statements — You'll get an error like 'each UNION query must have the same number of columns' or a type coercion error. The sneaky version is when you SELECT from two tables that look the same but one has an extra column added by a recent migration. Fix: always list columns explicitly instead of using SELECT in UNION/INTERSECT queries — it's more verbose but immune to schema changes.
- ✕Mistake 3: Expecting INTERSECT to match on one column when both SELECTs return multiple columns — If you write SELECT user_id, email FROM table_a INTERSECT SELECT user_id, email FROM table_b, a row only appears in the result if BOTH user_id AND email match. Beginners expect it to behave like a JOIN where only user_id needs to match. The symptom is an empty or unexpectedly small result set. Fix: if you only want to match on one column, use a subquery with IN, or explicitly SELECT only the column you want to match on in both sides of the INTERSECT.
Interview Questions on This Topic
- QWhat's the difference between UNION and UNION ALL, and which would you choose by default and why? (Interviewers want to hear you mention the deduplication cost and that UNION ALL should be the default.)
- QHow would you rewrite an INTERSECT query using only JOIN or subquery syntax? Walk me through the equivalent logic. (Tests whether you understand what INTERSECT is actually computing, not just the keyword.)
- QIf you write 'SELECT a UNION SELECT b INTERSECT SELECT c' without parentheses, what executes first and why? What result would that produce versus '(SELECT a UNION SELECT b) INTERSECT SELECT c'? (A classic precedence trap that most developers can't answer without thinking hard.)
Frequently Asked Questions
What is the difference between SQL UNION and INTERSECT?
UNION combines all rows from two result sets into one (optionally removing duplicates with plain UNION, or keeping all rows with UNION ALL). INTERSECT returns only the rows that appear in BOTH result sets. Think of UNION as a list of everyone invited to either party, and INTERSECT as a list of people invited to both parties.
Does MySQL support INTERSECT?
MySQL added native INTERSECT support in version 8.0.31. Before that, you had to simulate it with a JOIN or an IN subquery. PostgreSQL, SQL Server, and Oracle have supported INTERSECT for much longer. Always check your database version before using INTERSECT in production.
Can I use ORDER BY with UNION or INTERSECT?
Yes, but only once — at the very end of the entire combined query, after the last SELECT statement. You cannot add ORDER BY to each individual SELECT in a UNION or INTERSECT. The final ORDER BY sorts the entire combined result set and can reference column names or positional numbers.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.