Home Database SQL UNION vs INTERSECT Explained — Real-World Patterns and Pitfalls

SQL UNION vs INTERSECT Explained — Real-World Patterns and Pitfalls

In Plain English 🔥
Imagine you have two guest lists for two separate parties. UNION combines both lists into one big list — everyone who was invited to either party. INTERSECT gives you only the names that appear on BOTH lists — the people who were invited to both parties. That's the whole idea. One operator is about 'everyone from anywhere', the other is about 'only the people everywhere'.
⚡ Quick Answer
Imagine you have two guest lists for two separate parties. UNION combines both lists into one big list — everyone who was invited to either party. INTERSECT gives you only the names that appear on BOTH lists — the people who were invited to both parties. That's the whole idea. One operator is about 'everyone from anywhere', the other is about 'only the people everywhere'.

Every non-trivial application eventually needs to pull data from more than one source and stitch it together in a meaningful way. Maybe you're running an e-commerce platform and you need one report that shows customers from two different regional databases. Maybe you're a data analyst trying to find which products appear in both this quarter's bestseller list AND last quarter's. These aren't edge cases — they're everyday production problems that UNION and INTERSECT were built to solve cleanly and efficiently.

Before these set operators existed, developers worked around the problem with clunky JOINs or application-layer logic that merged result sets in code — slow, error-prone, and hard to read. UNION and INTERSECT push that merging logic where it belongs: inside the database engine, which is orders of magnitude better at set operations than your application server. They also make intent explicit. A UNION tells any future reader of your SQL exactly what you're doing: combining two independent result sets. That clarity is worth a lot in a codebase that needs to be maintained for years.

By the end of this article you'll understand not just the syntax but the mental model behind set operators. You'll know when to reach for UNION ALL instead of UNION to avoid a silent performance trap, when INTERSECT is cleaner than a correlated subquery, how duplicate handling works in both operators, and you'll walk away with three real-world query patterns you can adapt immediately.

UNION — Combining Two Result Sets Into One (And the UNION ALL Trap)

UNION stacks the rows from two SELECT statements on top of each other. Think of it as a vertical JOIN — instead of adding columns sideways, it adds rows downward. The critical rule is that both queries must return the same number of columns, and the corresponding columns must have compatible data types. The column names in the final output come from the first SELECT statement, not the second.

Here's the part that trips people up: plain UNION automatically removes duplicate rows across the combined result. This sounds helpful, but it means the database has to sort or hash the entire result set to find and remove those duplicates — even if you know there are none. That's wasted CPU and I/O on every execution.

UNION ALL skips the deduplication step entirely. It just appends. It's always faster than UNION. You should default to UNION ALL and only use plain UNION when you genuinely need duplicates removed. A surprising number of production queries use UNION where UNION ALL was intended, causing a quiet but real performance tax.

The most common real-world use case is consolidating data from partitioned tables — for example, an orders_2023 and orders_2024 table that were split for archival reasons but need to be queried together for a full-history report.

union_orders_report.sql · SQL
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879
-- Scenario: orders are split across two yearly tables.
-- We need a single list of all customers who placed an order in either year.

-- Table: orders_2023
-- | customer_id | customer_email          | order_total |
-- |-------------|-------------------------|-------------|
-- | 101         | alice@example.com       | 250.00      |
-- | 102         | bob@example.com         | 89.99       |
-- | 103         | carol@example.com       | 410.50      |

-- Table: orders_2024
-- | customer_id | customer_email          | order_total |
-- |-------------|-------------------------|-------------|
-- | 102         | bob@example.com         | 120.00      |  -- Bob ordered in both years
-- | 104         | dave@example.com        | 305.75      |
-- | 105         | eve@example.com         | 55.00       |


-- ── EXAMPLE 1: UNION (deduplicates — Bob appears only ONCE) ──────────────────
SELECT customer_id, customer_email
FROM orders_2023

UNION  -- removes duplicate rows before returning results (slower)

SELECT customer_id, customer_email
FROM orders_2024;

-- Output:
-- | customer_id | customer_email    |
-- |-------------|-------------------|
-- | 101         | alice@example.com |
-- | 102         | bob@example.com   |  -- only one row for Bob
-- | 103         | carol@example.com |
-- | 104         | dave@example.com  |
-- | 105         | eve@example.com   |


-- ── EXAMPLE 2: UNION ALL (keeps ALL rows — Bob appears TWICE) ────────────────
SELECT customer_id, customer_email
FROM orders_2023

UNION ALL  -- no deduplication — just appends. Faster.

SELECT customer_id, customer_email
FROM orders_2024;

-- Output:
-- | customer_id | customer_email    |
-- |-------------|-------------------|
-- | 101         | alice@example.com |
-- | 102         | bob@example.com   |  -- first occurrence (2023)
-- | 103         | carol@example.com |
-- | 102         | bob@example.com   |  -- second occurrence (2024)
-- | 104         | dave@example.com  |
-- | 105         | eve@example.com   |


-- ── EXAMPLE 3: UNION with ORDER BY and column alias ─────────────────────────
-- ORDER BY can only appear ONCE — at the very end, after the final SELECT.
-- It applies to the entire combined result, not just one half.
SELECT customer_id, customer_email, order_total, 2023 AS order_year
FROM orders_2023

UNION ALL

SELECT customer_id, customer_email, order_total, 2024 AS order_year
FROM orders_2024

ORDER BY customer_id, order_year;  -- sorts the combined result

-- Output:
-- | customer_id | customer_email    | order_total | order_year |
-- |-------------|-------------------|-------------|------------|
-- | 101         | alice@example.com | 250.00      | 2023       |
-- | 102         | bob@example.com   | 89.99       | 2023       |
-- | 102         | bob@example.com   | 120.00      | 2024       |
-- | 103         | carol@example.com | 410.50      | 2023       |
-- | 104         | dave@example.com  | 305.75      | 2024       |
-- | 105         | eve@example.com   | 55.00       | 2024       |
▶ Output
Example 1 (UNION): 5 rows — Bob deduplicated
Example 2 (UNION ALL): 6 rows — Bob appears twice
Example 3 (UNION ALL + ORDER BY): 6 rows sorted by customer_id then year
⚠️
Watch Out: UNION's Hidden Performance TaxPlain UNION forces the database to run a DISTINCT operation over the entire combined result set — even on millions of rows. Always ask yourself: 'Do I actually need duplicates removed here?' If the two source queries are pulling from non-overlapping data (different date ranges, different regions), use UNION ALL. It can be 2–10x faster on large tables and the query plan will confirm the difference.

INTERSECT — Finding What's Common to Both Queries

INTERSECT returns only the rows that appear in the result of BOTH queries. Where UNION is additive, INTERSECT is a filter. It's essentially asking: 'What do these two result sets have in common?'

Like UNION, INTERSECT removes duplicates by default — if a value exists three times in both queries, it still only appears once in the output. Most databases don't have an INTERSECT ALL variant (PostgreSQL does; MySQL famously doesn't support INTERSECT natively at all before version 8.0.31).

The real power of INTERSECT is replacing complex correlated subqueries or EXISTS clauses with something far more readable. Consider finding customers who bought from your platform in both January and February. You could write a self-JOIN with GROUP BY, or a nested subquery with IN, but INTERSECT expresses the intent in plain English: 'Give me everyone from the January buyers AND the February buyers.'

INTERSECT compares entire rows, not just one column. Every column in both SELECT lists must match for a row to be included. This is both its power and a common source of confusion — we'll cover that in the gotchas section.

intersect_loyal_customers.sql · SQL
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576
-- Scenario: A SaaS company wants to identify "sticky" users —
-- customers who were active in BOTH January AND February 2024.
-- These are the users worth targeting for an annual plan upgrade.

-- Table: user_activity
-- | user_id | activity_month | feature_used       |
-- |---------|----------------|--------------------|
-- | 201     | 2024-01        | dashboard          |
-- | 202     | 2024-01        | reports            |
-- | 203     | 2024-01        | dashboard          |
-- | 201     | 2024-02        | api_integration    |
-- | 204     | 2024-02        | dashboard          |
-- | 202     | 2024-02        | reports            |


-- ── EXAMPLE 1: Basic INTERSECT ───────────────────────────────────────────────
-- Find user_ids active in BOTH months
SELECT user_id
FROM user_activity
WHERE activity_month = '2024-01'

INTERSECT  -- only rows that appear in BOTH result sets pass through

SELECT user_id
FROM user_activity
WHERE activity_month = '2024-02';

-- Output:
-- | user_id |
-- |---------|
-- | 201     |  -- active in both January and February
-- | 202     |  -- active in both January and February
-- (user 203 only appears in Jan, user 204 only in Feb — excluded)


-- ── EXAMPLE 2: INTERSECT as a cleaner alternative to IN + subquery ───────────
-- Same result, but compare readability:

-- The subquery way (harder to read at a glance):
SELECT DISTINCT user_id
FROM user_activity
WHERE activity_month = '2024-01'
  AND user_id IN (
      SELECT user_id          -- buried logic — reader must mentally trace this
      FROM user_activity
      WHERE activity_month = '2024-02'
  );

-- The INTERSECT way (reads like the business requirement itself):
SELECT user_id FROM user_activity WHERE activity_month = '2024-01'
INTERSECT
SELECT user_id FROM user_activity WHERE activity_month = '2024-02';

-- Output for both: same two rows — user 201 and 202.
-- INTERSECT wins on readability. Query optimizers often produce the same plan.


-- ── EXAMPLE 3: INTERSECT with multiple columns ───────────────────────────────
-- Find (user_id, feature_used) pairs that appear in BOTH months.
-- i.e. users who used the SAME feature in both Jan AND Feb.
SELECT user_id, feature_used
FROM user_activity
WHERE activity_month = '2024-01'

INTERSECT

SELECT user_id, feature_used
FROM user_activity
WHERE activity_month = '2024-02';

-- Output:
-- | user_id | feature_used |
-- |---------|--------------|
-- | 202     | reports      |  -- User 202 used 'reports' in BOTH months
-- (User 201 is excluded here because they used 'dashboard' in Jan
--  but 'api_integration' in Feb — different feature, so no match)
▶ Output
Example 1: 2 rows — user_id 201 and 202
Example 2: Same 2 rows via subquery approach (equivalent result)
Example 3: 1 row — only user 202 used the same feature in both months
⚠️
Pro Tip: INTERSECT as a Data Audit ToolINTERSECT is genuinely useful during database migrations and audits. Run INTERSECT between a query on the old table and the same query on the new table — if the row counts match and INTERSECT returns the same count, your migration preserved the data correctly. It's a fast sanity check that takes two minutes to write.

Combining Set Operators in Real-World Query Patterns

In production, you rarely use UNION or INTERSECT in isolation. The real skill is knowing how to chain them together and how to mix them with subqueries, CTEs, and aggregations to answer complex business questions.

Set operators follow a specific precedence order: INTERSECT binds more tightly than UNION or EXCEPT. So if you write Query A UNION Query B INTERSECT Query C, the database will evaluate B INTERSECT C first, then UNION that result with A. This is counterintuitive and the source of very subtle bugs. Always use parentheses when chaining more than two queries.

Another pattern worth knowing: wrapping a UNION or INTERSECT inside a CTE (Common Table Expression) lets you treat the combined result as a named table and then run further aggregations on top of it. This is much cleaner than nesting UNION queries inside subqueries three levels deep.

Finally, remember that set operators work on result sets, not tables directly. Each SELECT can have its own WHERE clause, JOINs, and even aggregations — as long as the final column list matches. This means you can build each 'half' of the query independently and then combine them, which is a great way to break down a complex reporting requirement into manageable pieces.

combined_set_operators_report.sql · SQL
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273
-- Scenario: A retail analytics team needs a 'VIP Customer Report'.
-- Definition of VIP:
--   (a) Customers who spent > $500 in 2023  OR  > $500 in 2024 (high spenders)
--   AND
--   (b) That combined list must ALSO have made a purchase in the last 30 days
--       (still active — not just historically high-value)

-- Tables available:
-- orders(order_id, customer_id, order_total, order_date)
-- customers(customer_id, full_name, email, signup_date)


-- ── Step 1: Use a CTE to build the 'high spender' pool via UNION ALL ─────────
WITH high_spenders AS (
    -- High spenders from 2023
    SELECT customer_id
    FROM orders
    WHERE YEAR(order_date) = 2023
    GROUP BY customer_id
    HAVING SUM(order_total) > 500  -- only customers whose 2023 total exceeded $500

    UNION  -- plain UNION because we want to deduplicate across years
           -- a customer shouldn't appear twice just for being high-value in both years

    -- High spenders from 2024
    SELECT customer_id
    FROM orders
    WHERE YEAR(order_date) = 2024
    GROUP BY customer_id
    HAVING SUM(order_total) > 500
),

-- ── Step 2: Find recently active customers ───────────────────────────────────
recently_active AS (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'  -- activity in last 30 days
)

-- ── Step 3: INTERSECT — only customers who are in BOTH groups ────────────────
-- We join back to customers for the display columns
SELECT
    c.customer_id,
    c.full_name,
    c.email
FROM customers c
WHERE c.customer_id IN (
    -- The INTERSECT here is the heart of the query:
    -- high spenders who are also recently active
    SELECT customer_id FROM high_spenders
    INTERSECT
    SELECT customer_id FROM recently_active
)
ORDER BY c.full_name;


-- ── Alternative: explicit parentheses when chaining UNION + INTERSECT ────────
-- Without parentheses, INTERSECT binds first — this is almost never what you want
-- when mixing operators. Always parenthesize.

-- WRONG — INTERSECT runs before UNION due to precedence:
SELECT customer_id FROM orders_2023
UNION
SELECT customer_id FROM orders_2024
INTERSECT                              -- binds to orders_2024 first!
SELECT customer_id FROM recently_active;

-- RIGHT — parentheses make intent unambiguous:
(SELECT customer_id FROM orders_2023
 UNION
 SELECT customer_id FROM orders_2024)
INTERSECT
SELECT customer_id FROM recently_active;  -- now intersects the full union result
▶ Output
VIP Customer Report:
| customer_id | full_name | email |
|-------------|------------------|-------------------------|
| 101 | Alice Mercer | alice@example.com |
| 203 | Carlos Reyes | carlos@example.com |
(Results vary by data — these are illustrative rows)

Precedence demo: the 'WRONG' and 'RIGHT' versions return different row counts
when orders_2024 and recently_active have partial overlap.
🔥
Interview Gold: Precedence Catches EveryoneInterviewers love asking 'what does UNION INTERSECT do without parentheses?' The answer: INTERSECT has higher precedence than UNION (similar to how * binds before + in arithmetic). In A UNION B INTERSECT C, the engine evaluates B INTERSECT C first. Most working developers don't know this. Knowing it signals genuine depth.
Feature / AspectUNION / UNION ALLINTERSECT
What it returnsAll rows from both queries combinedOnly rows that appear in BOTH queries
Duplicate handlingUNION removes them; UNION ALL keeps themAlways removes duplicates (most databases)
Performance defaultUNION ALL is fast; UNION runs DISTINCTSimilar cost to UNION — hashing/sorting required
MySQL supportFull support in all versionsSupported from MySQL 8.0.31 only
PostgreSQL supportFull support including UNION ALLFull support including INTERSECT ALL
Use WHENMerging partitioned tables, combining report sourcesFinding overlap between two independent datasets
Readable alternative toMultiple OR conditions across tablesIN with subquery, EXISTS, or self-JOIN
Can chain multiple?Yes — A UNION B UNION CYes — watch precedence with UNION mix
ORDER BY placementOnce, at the very end of the full queryOnce, at the very end of the full query
Column count ruleMust match across all SELECT statementsMust match across all SELECT statements

🎯 Key Takeaways

  • UNION ALL is almost always what you want — plain UNION silently deduplicates and pays a sorting/hashing cost every time. Use it only when removing duplicates is a genuine requirement.
  • INTERSECT replaces correlated subqueries and IN clauses with code that reads exactly like the business requirement — 'give me what's in both sets'. When your query can be stated that way, INTERSECT is the right tool.
  • INTERSECT matches on entire rows, not single columns — if your SELECT returns two columns, both must match. This surprises almost every developer the first time they hit an unexpectedly empty result.
  • INTERSECT has higher precedence than UNION — in a chained query without parentheses, INTERSECT binds first. Always parenthesize mixed set-operator queries or you'll spend an afternoon debugging a subtle logic bug.

⚠ Common Mistakes to Avoid

  • Mistake 1: Using UNION instead of UNION ALL when data can't overlap — If you're unioning orders_january and orders_february and you know a single order can't exist in both months, using plain UNION still forces a full deduplication scan. The symptom is queries that are mysteriously slow on large tables for no obvious reason. Fix: default to UNION ALL and only switch to UNION when you have a concrete reason to remove duplicates.
  • Mistake 2: Mismatched column counts or incompatible types between SELECT statements — You'll get an error like 'each UNION query must have the same number of columns' or a type coercion error. The sneaky version is when you SELECT from two tables that look the same but one has an extra column added by a recent migration. Fix: always list columns explicitly instead of using SELECT in UNION/INTERSECT queries — it's more verbose but immune to schema changes.
  • Mistake 3: Expecting INTERSECT to match on one column when both SELECTs return multiple columns — If you write SELECT user_id, email FROM table_a INTERSECT SELECT user_id, email FROM table_b, a row only appears in the result if BOTH user_id AND email match. Beginners expect it to behave like a JOIN where only user_id needs to match. The symptom is an empty or unexpectedly small result set. Fix: if you only want to match on one column, use a subquery with IN, or explicitly SELECT only the column you want to match on in both sides of the INTERSECT.

Interview Questions on This Topic

  • QWhat's the difference between UNION and UNION ALL, and which would you choose by default and why? (Interviewers want to hear you mention the deduplication cost and that UNION ALL should be the default.)
  • QHow would you rewrite an INTERSECT query using only JOIN or subquery syntax? Walk me through the equivalent logic. (Tests whether you understand what INTERSECT is actually computing, not just the keyword.)
  • QIf you write 'SELECT a UNION SELECT b INTERSECT SELECT c' without parentheses, what executes first and why? What result would that produce versus '(SELECT a UNION SELECT b) INTERSECT SELECT c'? (A classic precedence trap that most developers can't answer without thinking hard.)

Frequently Asked Questions

What is the difference between SQL UNION and INTERSECT?

UNION combines all rows from two result sets into one (optionally removing duplicates with plain UNION, or keeping all rows with UNION ALL). INTERSECT returns only the rows that appear in BOTH result sets. Think of UNION as a list of everyone invited to either party, and INTERSECT as a list of people invited to both parties.

Does MySQL support INTERSECT?

MySQL added native INTERSECT support in version 8.0.31. Before that, you had to simulate it with a JOIN or an IN subquery. PostgreSQL, SQL Server, and Oracle have supported INTERSECT for much longer. Always check your database version before using INTERSECT in production.

Can I use ORDER BY with UNION or INTERSECT?

Yes, but only once — at the very end of the entire combined query, after the last SELECT statement. You cannot add ORDER BY to each individual SELECT in a UNION or INTERSECT. The final ORDER BY sorts the entire combined result set and can reference column names or positional numbers.

🔥
TheCodeForge Editorial Team Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.

← PreviousSQL SubqueriesNext →SQL NULL Handling
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged