Junior 4 min · March 05, 2026

SQL UNION vs INTERSECT — Why UNION Dropped 1,200 Sales

Monthly reports missing 1,200 sales because UNION deduplicates rows that only matched by coincidence.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide
Quick Answer
  • UNION stacks result sets vertically — combines rows from two SELECT statements into one result
  • UNION deduplicates rows (slower); UNION ALL keeps all rows including duplicates (faster)
  • Both queries in a UNION must have the same number of columns with compatible data types
  • INTERSECT returns only rows appearing in both queries — the overlap between two result sets
  • EXCEPT (or MINUS in Oracle) returns rows in the first query but not the second — set difference
  • Biggest mistake: using UNION instead of UNION ALL when rows can't overlap — UNION's deduplication wastes CPU for no benefit
Plain-English First

Imagine you have two guest lists for two separate parties. UNION combines both lists into one big list — everyone who was invited to either party. INTERSECT gives you only the names that appear on BOTH lists — the people who were invited to both parties. That's the whole idea. One operator is about 'everyone from anywhere', the other is about 'only the people everywhere'.

Every non-trivial application eventually needs to pull data from more than one source and stitch it together in a meaningful way. Maybe you're running an e-commerce platform and you need one report that shows customers from two different regional databases. Maybe you're a data analyst trying to find which products appear in both this quarter's bestseller list AND last quarter's. These aren't edge cases — they're everyday production problems that UNION and INTERSECT were built to solve cleanly and efficiently.

Before these set operators existed, developers worked around the problem with clunky JOINs or application-layer logic that merged result sets in code — slow, error-prone, and hard to read. UNION and INTERSECT push that merging logic where it belongs: inside the database engine, which is orders of magnitude better at set operations than your application server. They also make intent explicit. A UNION tells any future reader of your SQL exactly what you're doing: combining two independent result sets. That clarity is worth a lot in a codebase that needs to be maintained for years.

By the end of this article you'll understand not just the syntax but the mental model behind set operators. You'll know when to reach for UNION ALL instead of UNION to avoid a silent performance trap, when INTERSECT is cleaner than a correlated subquery, how duplicate handling works in both operators, and you'll walk away with three real-world query patterns you can adapt immediately.

UNION — Combining Two Result Sets Into One (And the UNION ALL Trap)

UNION stacks the rows from two SELECT statements on top of each other. Think of it as a vertical JOIN — instead of adding columns sideways, it adds rows downward. The critical rule is that both queries must return the same number of columns, and the corresponding columns must have compatible data types. The column names in the final output come from the first SELECT statement, not the second.

Here's the part that trips people up: plain UNION automatically removes duplicate rows across the combined result. This sounds helpful, but it means the database has to sort or hash the entire result set to find and remove those duplicates — even if you know there are none. That's wasted CPU and I/O on every execution.

UNION ALL skips the deduplication step entirely. It just appends. It's always faster than UNION. You should default to UNION ALL and only use plain UNION when you genuinely need duplicates removed. A surprising number of production queries use UNION where UNION ALL was intended, causing a quiet but real performance tax.

The most common real-world use case is consolidating data from partitioned tables — for example, an orders_2023 and orders_2024 table that were split for archival reasons but need to be queried together for a full-history report.

union_orders_report.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
-- Scenario: orders are split across two yearly tables.
-- We need a single list of all customers who placed an order in either year.

-- Table: orders_2023
-- | customer_id | customer_email          | order_total |
-- |-------------|-------------------------|-------------|
-- | 101         | alice@example.com       | 250.00      |
-- | 102         | bob@example.com         | 89.99       |
-- | 103         | carol@example.com       | 410.50      |

-- Table: orders_2024
-- | customer_id | customer_email          | order_total |
-- |-------------|-------------------------|-------------|
-- | 102         | bob@example.com         | 120.00      |  -- Bob ordered in both years
-- | 104         | dave@example.com        | 305.75      |
-- | 105         | eve@example.com         | 55.00       |


-- ── EXAMPLE 1: UNION (deduplicates — Bob appears only ONCE) ──────────────────
SELECT customer_id, customer_email
FROM orders_2023

UNION  -- removes duplicate rows before returning results (slower)

SELECT customer_id, customer_email
FROM orders_2024;

-- Output:
-- | customer_id | customer_email    |
-- |-------------|-------------------|
-- | 101         | alice@example.com |
-- | 102         | bob@example.com   |  -- only one row for Bob
-- | 103         | carol@example.com |
-- | 104         | dave@example.com  |
-- | 105         | eve@example.com   |


-- ── EXAMPLE 2: UNION ALL (keeps ALL rows — Bob appears TWICE) ────────────────
SELECT customer_id, customer_email
FROM orders_2023

UNION ALL  -- no deduplication — just appends. Faster.

SELECT customer_id, customer_email
FROM orders_2024;

-- Output:
-- | customer_id | customer_email    |
-- |-------------|-------------------|
-- | 101         | alice@example.com |
-- | 102         | bob@example.com   |  -- first occurrence (2023)
-- | 103         | carol@example.com |
-- | 102         | bob@example.com   |  -- second occurrence (2024)
-- | 104         | dave@example.com  |
-- | 105         | eve@example.com   |


-- ── EXAMPLE 3: UNION with ORDER BY and column alias ─────────────────────────
-- ORDER BY can only appear ONCE — at the very end, after the final SELECT.
-- It applies to the entire combined result, not just one half.
SELECT customer_id, customer_email, order_total, 2023 AS order_year
FROM orders_2023

UNION ALL

SELECT customer_id, customer_email, order_total, 2024 AS order_year
FROM orders_2024

ORDER BY customer_id, order_year;  -- sorts the combined result

-- Output:
-- | customer_id | customer_email    | order_total | order_year |
-- |-------------|-------------------|-------------|------------|
-- | 101         | alice@example.com | 250.00      | 2023       |
-- | 102         | bob@example.com   | 89.99       | 2023       |
-- | 102         | bob@example.com   | 120.00      | 2024       |
-- | 103         | carol@example.com | 410.50      | 2023       |
-- | 104         | dave@example.com  | 305.75      | 2024       |
-- | 105         | eve@example.com   | 55.00       | 2024       |
Output
Example 1 (UNION): 5 rows — Bob deduplicated
Example 2 (UNION ALL): 6 rows — Bob appears twice
Example 3 (UNION ALL + ORDER BY): 6 rows sorted by customer_id then year
Watch Out: UNION's Hidden Performance Tax
Plain UNION forces the database to run a DISTINCT operation over the entire combined result set — even on millions of rows. Always ask yourself: 'Do I actually need duplicates removed here?' If the two source queries are pulling from non-overlapping data (different date ranges, different regions), use UNION ALL. It can be 2–10x faster on large tables and the query plan will confirm the difference.
Production Insight
UNION ALL is almost always what you want — it's faster because it skips the deduplication step.
UNION's deduplication is based on full row equality across all columns — not on any primary key.
Use UNION ALL + GROUP BY/DISTINCT for explicit, intentional deduplication where you control the uniqueness definition.
Key Takeaway
UNION ALL is faster and safer than UNION — it never silently drops rows based on coincidental column value equality.
Use UNION only when you explicitly want to deduplicate by full row equality.
If the two queries can never produce identical rows (different source tables, different date ranges), UNION ALL is always correct.

INTERSECT — Finding What's Common to Both Queries

INTERSECT returns only the rows that appear in the result of BOTH queries. Where UNION is additive, INTERSECT is a filter. It's essentially asking: 'What do these two result sets have in common?'

Like UNION, INTERSECT removes duplicates by default — if a value exists three times in both queries, it still only appears once in the output. Most databases don't have an INTERSECT ALL variant (PostgreSQL does; MySQL famously doesn't support INTERSECT natively at all before version 8.0.31).

The real power of INTERSECT is replacing complex correlated subqueries or EXISTS clauses with something far more readable. Consider finding customers who bought from your platform in both January and February. You could write a self-JOIN with GROUP BY, or a nested subquery with IN, but INTERSECT expresses the intent in plain English: 'Give me everyone from the January buyers AND the February buyers.'

INTERSECT compares entire rows, not just one column. Every column in both SELECT lists must match for a row to be included. This is both its power and a common source of confusion — we'll cover that in the gotchas section.

intersect_loyal_customers.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
-- Scenario: A SaaS company wants to identify "sticky" users —
-- customers who were active in BOTH January AND February 2024.
-- These are the users worth targeting for an annual plan upgrade.

-- Table: user_activity
-- | user_id | activity_month | feature_used       |
-- |---------|----------------|--------------------|
-- | 201     | 2024-01        | dashboard          |
-- | 202     | 2024-01        | reports            |
-- | 203     | 2024-01        | dashboard          |
-- | 201     | 2024-02        | api_integration    |
-- | 204     | 2024-02        | dashboard          |
-- | 202     | 2024-02        | reports            |


-- ── EXAMPLE 1: Basic INTERSECT ───────────────────────────────────────────────
-- Find user_ids active in BOTH months
SELECT user_id
FROM user_activity
WHERE activity_month = '2024-01'

INTERSECT  -- only rows that appear in BOTH result sets pass through

SELECT user_id
FROM user_activity
WHERE activity_month = '2024-02';

-- Output:
-- | user_id |
-- |---------|
-- | 201     |  -- active in both January and February
-- | 202     |  -- active in both January and February
-- (user 203 only appears in Jan, user 204 only in Feb — excluded)


-- ── EXAMPLE 2: INTERSECT as a cleaner alternative to IN + subquery ───────────
-- Same result, but compare readability:

-- The subquery way (harder to read at a glance):
SELECT DISTINCT user_id
FROM user_activity
WHERE activity_month = '2024-01'
  AND user_id IN (
      SELECT user_id          -- buried logic — reader must mentally trace this
      FROM user_activity
      WHERE activity_month = '2024-02'
  );

-- The INTERSECT way (reads like the business requirement itself):
SELECT user_id FROM user_activity WHERE activity_month = '2024-01'
INTERSECT
SELECT user_id FROM user_activity WHERE activity_month = '2024-02';

-- Output for both: same two rows — user 201 and 202.
-- INTERSECT wins on readability. Query optimizers often produce the same plan.


-- ── EXAMPLE 3: INTERSECT with multiple columns ───────────────────────────────
-- Find (user_id, feature_used) pairs that appear in BOTH months.
-- i.e. users who used the SAME feature in both Jan AND Feb.
SELECT user_id, feature_used
FROM user_activity
WHERE activity_month = '2024-01'

INTERSECT

SELECT user_id, feature_used
FROM user_activity
WHERE activity_month = '2024-02';

-- Output:
-- | user_id | feature_used |
-- |---------|--------------|
-- | 202     | reports      |  -- User 202 used 'reports' in BOTH months
-- (User 201 is excluded here because they used 'dashboard' in Jan
--  but 'api_integration' in Feb — different feature, so no match)
Output
Example 1: 2 rows — user_id 201 and 202
Example 2: Same 2 rows via subquery approach (equivalent result)
Example 3: 1 row — only user 202 used the same feature in both months
Pro Tip: INTERSECT as a Data Audit Tool
INTERSECT is genuinely useful during database migrations and audits. Run INTERSECT between a query on the old table and the same query on the new table — if the row counts match and INTERSECT returns the same count, your migration preserved the data correctly. It's a fast sanity check that takes two minutes to write.
Production Insight
INTERSECT is rarely used in OLTP but powerful for cohort analysis: find customers who placed orders in both January AND February.
INTERSECT is equivalent to a JOIN on all columns between the two result sets — the JOIN version is often faster with proper indexes.
For finding common records between two large result sets, IN with a subquery or INNER JOIN typically outperforms INTERSECT.
Key Takeaway
INTERSECT = 'rows that appear in both results' — useful for cohort overlap and data reconciliation.
For large datasets, an INNER JOIN on the relevant key often outperforms INTERSECT.
Not all databases support INTERSECT — MySQL added it in version 8.0.31.

Combining Set Operators in Real-World Query Patterns

In production, you rarely use UNION or INTERSECT in isolation. The real skill is knowing how to chain them together and how to mix them with subqueries, CTEs, and aggregations to answer complex business questions.

Set operators follow a specific precedence order: INTERSECT binds more tightly than UNION or EXCEPT. So if you write Query A UNION Query B INTERSECT Query C, the database will evaluate B INTERSECT C first, then UNION that result with A. This is counterintuitive and the source of very subtle bugs. Always use parentheses when chaining more than two queries.

Another pattern worth knowing: wrapping a UNION or INTERSECT inside a CTE (Common Table Expression) lets you treat the combined result as a named table and then run further aggregations on top of it. This is much cleaner than nesting UNION queries inside subqueries three levels deep.

Finally, remember that set operators work on result sets, not tables directly. Each SELECT can have its own WHERE clause, JOINs, and even aggregations — as long as the final column list matches. This means you can build each 'half' of the query independently and then combine them, which is a great way to break down a complex reporting requirement into manageable pieces.

combined_set_operators_report.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
-- Scenario: A retail analytics team needs a 'VIP Customer Report'.
-- Definition of VIP:
--   (a) Customers who spent > $500 in 2023  OR  > $500 in 2024 (high spenders)
--   AND
--   (b) That combined list must ALSO have made a purchase in the last 30 days
--       (still active — not just historically high-value)

-- Tables available:
-- orders(order_id, customer_id, order_total, order_date)
-- customers(customer_id, full_name, email, signup_date)


-- ── Step 1: Use a CTE to build the 'high spender' pool via UNION ALL ─────────
WITH high_spenders AS (
    -- High spenders from 2023
    SELECT customer_id
    FROM orders
    WHERE YEAR(order_date) = 2023
    GROUP BY customer_id
    HAVING SUM(order_total) > 500  -- only customers whose 2023 total exceeded $500

    UNION  -- plain UNION because we want to deduplicate across years
           -- a customer shouldn't appear twice just for being high-value in both years

    -- High spenders from 2024
    SELECT customer_id
    FROM orders
    WHERE YEAR(order_date) = 2024
    GROUP BY customer_id
    HAVING SUM(order_total) > 500
),

-- ── Step 2: Find recently active customers ───────────────────────────────────
recently_active AS (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'  -- activity in last 30 days
)

-- ── Step 3: INTERSECT — only customers who are in BOTH groups ────────────────
-- We join back to customers for the display columns
SELECT
    c.customer_id,
    c.full_name,
    c.email
FROM customers c
WHERE c.customer_id IN (
    -- The INTERSECT here is the heart of the query:
    -- high spenders who are also recently active
    SELECT customer_id FROM high_spenders
    INTERSECT
    SELECT customer_id FROM recently_active
)
ORDER BY c.full_name;


-- ── Alternative: explicit parentheses when chaining UNION + INTERSECT ────────
-- Without parentheses, INTERSECT binds first — this is almost never what you want
-- when mixing operators. Always parenthesize.

-- WRONG — INTERSECT runs before UNION due to precedence:
SELECT customer_id FROM orders_2023
UNION
SELECT customer_id FROM orders_2024
INTERSECT                              -- binds to orders_2024 first!
SELECT customer_id FROM recently_active;

-- RIGHT — parentheses make intent unambiguous:
(SELECT customer_id FROM orders_2023
 UNION
 SELECT customer_id FROM orders_2024)
INTERSECT
SELECT customer_id FROM recently_active;  -- now intersects the full union result
Output
VIP Customer Report:
| customer_id | full_name | email |
|-------------|------------------|-------------------------|
| 101 | Alice Mercer | alice@example.com |
| 203 | Carlos Reyes | carlos@example.com |
(Results vary by data — these are illustrative rows)
Precedence demo: the 'WRONG' and 'RIGHT' versions return different row counts
when orders_2024 and recently_active have partial overlap.
Interview Gold: Precedence Catches Everyone
Interviewers love asking 'what does UNION INTERSECT do without parentheses?' The answer: INTERSECT has higher precedence than UNION (similar to how * binds before + in arithmetic). In A UNION B INTERSECT C, the engine evaluates B INTERSECT C first. Most working developers don't know this. Knowing it signals genuine depth.
Production Insight
Set operators (UNION, INTERSECT, EXCEPT) require identical column counts and compatible types across both queries.
Column names in the final result come from the first SELECT — aliases in the second query are ignored.
For complex multi-source reports, CTEs with UNION ALL inside them are cleaner than chaining multiple UNION operators.
Key Takeaway
Column names come from the first SELECT in a UNION — aliases in subsequent queries are ignored.
All queries in a UNION chain must have the same column count and compatible types.
For readability, use CTEs (WITH clause) to name each source before combining with UNION ALL.
● Production incidentPOST-MORTEMseverity: high

Monthly Sales Report Missing 1,200 Transactions Due to Accidental UNION Deduplication

Symptom
Monthly sales totals were consistently 3-5% lower than expected. Finance found 1,200 transactions per month 'missing' from the report.
Assumption
UNION was used to 'clean up' the data, assuming duplicate rows in the combined result would be data errors that should be removed.
Root cause
Two customers placed orders for the exact same amount on the same day. UNION's deduplication compared full rows — and these rows matched on all columns (amount, date, product_id, region). UNION treated them as duplicates and kept only one, silently dropping real revenue.
Fix
Changed UNION to UNION ALL. Added a note explaining that order_id is the uniqueness key, not the full row content. Real deduplication on order_id was added as a separate step.
Key lesson
  • UNION ALL is almost always the correct choice — UNION's deduplication is based on all columns which is rarely the right uniqueness definition
  • If deduplication is needed, do it explicitly on the primary key: SELECT DISTINCT order_id, ... or GROUP BY order_id
  • Prefer UNION ALL and add explicit deduplication logic rather than relying on UNION's implicit row-equality check
Production debug guideDiagnosing missing rows, column count errors, and type mismatches3 entries
Symptom · 01
Error: each UNION query must have the same number of columns
Fix
Both SELECT statements in a UNION must return the same number of columns. Add NULL AS column_name placeholders for columns that don't exist in one query: SELECT id, name, NULL AS phone FROM employees UNION SELECT id, name, contact_number FROM contractors.
Symptom · 02
UNION returns fewer rows than expected
Fix
UNION deduplicates by full row equality — if two real rows happen to have identical values in all columns, one is dropped. Switch to UNION ALL to keep all rows, then add explicit deduplication on the primary key if needed.
Symptom · 03
INTERSECT returns empty result despite rows that appear to match
Fix
Column types or case sensitivity may differ. Check for trailing spaces: TRIM(col) in both queries. Check case: LOWER(col) in both. Also verify that all columns in the SELECT list match — INTERSECT compares full rows.
Feature / AspectUNION / UNION ALLINTERSECT
What it returnsAll rows from both queries combinedOnly rows that appear in BOTH queries
Duplicate handlingUNION removes them; UNION ALL keeps themAlways removes duplicates (most databases)
Performance defaultUNION ALL is fast; UNION runs DISTINCTSimilar cost to UNION — hashing/sorting required
MySQL supportFull support in all versionsSupported from MySQL 8.0.31 only
PostgreSQL supportFull support including UNION ALLFull support including INTERSECT ALL
Use WHENMerging partitioned tables, combining report sourcesFinding overlap between two independent datasets
Readable alternative toMultiple OR conditions across tablesIN with subquery, EXISTS, or self-JOIN
Can chain multiple?Yes — A UNION B UNION CYes — watch precedence with UNION mix
ORDER BY placementOnce, at the very end of the full queryOnce, at the very end of the full query
Column count ruleMust match across all SELECT statementsMust match across all SELECT statements

Key takeaways

1
UNION ALL is almost always what you want
plain UNION silently deduplicates and pays a sorting/hashing cost every time. Use it only when removing duplicates is a genuine requirement.
2
INTERSECT replaces correlated subqueries and IN clauses with code that reads exactly like the business requirement
'give me what's in both sets'. When your query can be stated that way, INTERSECT is the right tool.
3
INTERSECT matches on entire rows, not single columns
if your SELECT returns two columns, both must match. This surprises almost every developer the first time they hit an unexpectedly empty result.
4
INTERSECT has higher precedence than UNION
in a chained query without parentheses, INTERSECT binds first. Always parenthesize mixed set-operator queries or you'll spend an afternoon debugging a subtle logic bug.

Common mistakes to avoid

3 patterns
×

Using UNION instead of UNION ALL when rows cannot overlap

Symptom
Report is slower than necessary — UNION performs a sort-based or hash-based deduplication step even when the two queries could never produce identical rows (e.g., orders from January vs orders from February)
Fix
Use UNION ALL by default. Switch to UNION only when you explicitly need to remove duplicate rows and can verify that full-row equality is the correct uniqueness definition.
×

Different column counts or incompatible types in UNION queries

Symptom
Error: each UNION query must have the same number of columns — or a type casting error on the rows that actually differ
Fix
Match column counts by adding NULL placeholders: SELECT id, name, NULL AS phone FROM table1 UNION SELECT id, name, phone FROM table2. Cast incompatible types explicitly: CAST(int_col AS VARCHAR) to match a VARCHAR column in the other query.
×

Expecting INTERSECT to find rows matching on one key column only

Symptom
INTERSECT returns no results despite rows with matching IDs existing in both queries — because INTERSECT compares all columns, and the other columns differ
Fix
INTERSECT compares full rows across all SELECT columns. To find rows matching on a specific key, use a JOIN or IN subquery: SELECT id FROM query1 WHERE id IN (SELECT id FROM query2). This is more explicit and usually faster.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR
What is the difference between UNION and UNION ALL?
Q02SENIOR
When would you use INTERSECT in a real production query?
Q03JUNIOR
You need to combine employee records from two tables — employees and con...
Q01 of 03JUNIOR

What is the difference between UNION and UNION ALL?

ANSWER
UNION combines the result sets of two SELECT queries and removes duplicate rows — it performs an implicit DISTINCT on the combined result. UNION ALL combines the result sets and keeps all rows, including duplicates. UNION ALL is almost always preferable: it's faster (no deduplication step), it never silently drops real data, and deduplication based on full row equality is rarely the right uniqueness definition. Use UNION only when you explicitly need to remove rows where all column values are identical. For explicit deduplication on a specific key, use UNION ALL followed by SELECT DISTINCT on the key column.
FAQ · 3 QUESTIONS

Frequently Asked Questions

01
What is the difference between SQL UNION and INTERSECT?
02
Does MySQL support INTERSECT?
03
Can I use ORDER BY with UNION or INTERSECT?
🔥

That's SQL Basics. Mark it forged?

4 min read · try the examples if you haven't

Previous
SQL Subqueries
11 / 16 · SQL Basics
Next
SQL NULL Handling