Python Sets Explained — Creation, Operations and Real-World Use Cases
Every program eventually needs to answer questions like 'which users signed up twice?' or 'which items do these two shopping carts have in common?' Without the right tool, answering those questions means writing loops inside loops, tracking flags, and hoping you didn't miss an edge case. Sets exist to make that kind of work trivially easy — and they're built right into Python, no imports needed.
The core problem sets solve is uniqueness plus fast membership testing. If you store a million email addresses in a list and need to check whether one specific address is in there, Python may have to scan every single item, and that's slow. A set can answer the same question in roughly constant time, no matter how large it is. On top of that, sets give you mathematical operations (union, intersection, difference) with a single operator instead of hand-written loops.
By the end of this article you'll know how to create a set, add and remove items, use set operations to compare collections, and, crucially, recognise exactly when a set is the right tool for the job. You'll also know the three most common mistakes beginners make so you can skip straight past them.
Creating a Set and Understanding Why Duplicates Vanish
There are two ways to create a set in Python. The first is the curly-brace literal syntax — you put your items inside {}, separated by commas. The second is the set() constructor, which converts any iterable (like a list or string) into a set.
The moment you create a set, Python silently discards any duplicate values. This isn't an error — it's the point. If you pass in [1, 2, 2, 3], the set keeps {1, 2, 3}. The original list is untouched; the set is a new, deduplicated collection.
One thing that surprises beginners: the order you see when you print a set is NOT guaranteed to match the order you put items in. Sets are unordered by design, which is part of what makes them so fast. If order matters to you, a set is the wrong tool — use a list. If uniqueness matters and order doesn't, a set is perfect.
Also important: every item in a set must be hashable. Strings and numbers are fine, and so are tuples as long as everything inside the tuple is itself hashable. Lists, dictionaries, and other sets are NOT allowed as set members because they are mutable: Python can't safely hash something that might change.
# ── Way 1: curly-brace literal ──────────────────────────────────────────
favourite_fruits = {"apple", "mango", "banana", "apple", "mango"}
# Notice: "apple" and "mango" appear twice above; watch what Python keeps
print("Favourite fruits:", favourite_fruits)

# ── Way 2: set() constructor converts a list into a set ──────────────────
raw_signups = ["alice@mail.com", "bob@mail.com", "alice@mail.com", "carol@mail.com"]
unique_signups = set(raw_signups)  # duplicates dropped automatically
print("Unique signups:", unique_signups)
print("Total unique:", len(unique_signups))  # 3, not 4

# ── set() on a string splits it into unique CHARACTERS ──────────────────
letters_in_word = set("mississippi")  # only unique letters survive
print("Unique letters in 'mississippi':", letters_in_word)

# ── An empty set MUST use set(), NOT {} ─────────────────────────────────
empty_set = set()  # correct: this is an empty set
empty_dict = {}    # WRONG for a set: this creates an empty dictionary!
print("Type of set():", type(empty_set))  # <class 'set'>
print("Type of {}: ", type(empty_dict))   # <class 'dict'> ← gotcha!
Unique signups: {'carol@mail.com', 'alice@mail.com', 'bob@mail.com'}
Total unique: 3
Unique letters in 'mississippi': {'m', 'i', 's', 'p'}
Type of set(): <class 'set'>
Type of {}: <class 'dict'>
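The hashability rule from this section is easy to check for yourself. The snippet below is a small illustrative sketch; the variable names are invented for the example and the exact wording of the error message can vary between Python versions.

```python
# Hashable elements (here, tuples of numbers) are accepted.
coordinates = {(0, 0), (1, 2), (1, 2)}  # the duplicate tuple is dropped
print(len(coordinates))                 # 2

# A list is mutable and therefore unhashable, so Python rejects it.
try:
    bad_set = {[1, 2], [3, 4]}
except TypeError as error:
    list_error = str(error)
    print("TypeError:", list_error)

# A tuple is only hashable if everything inside it is hashable too.
try:
    also_bad = {(1, [2, 3])}
except TypeError as error:
    nested_error = str(error)
    print("TypeError:", nested_error)
```

Both failures happen at the moment the set is built, so you find out immediately rather than later in the program.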
Adding, Removing and Checking Items — The Everyday Set Operations
Once you have a set, you'll want to add new items, remove old ones, and check whether something is already in there. These are the three most common day-to-day operations.
To add a single item, use .add(). If the item is already in the set, nothing happens — no error, no duplicate, just silence. To add multiple items at once, use .update() and pass it any iterable.
Removing is where you get a choice. .remove() deletes an item but raises a KeyError if the item doesn't exist — use this when you're sure the item is there. .discard() does the same thing but does NOTHING if the item is missing — use this when you're not sure. Think of .discard() as the polite version: it won't complain.
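To make the difference concrete, here is a minimal sketch of both removal styles. The guests set and the name "Zara" are invented for illustration.

```python
guests = {"Alice", "Bob"}

# .remove() raises KeyError when the item is missing, so guard it
# whenever you cannot be sure the item is present.
try:
    guests.remove("Zara")
    removed = True
except KeyError:
    removed = False
    print("Zara was not on the list")

# .discard() is the no-error equivalent: a missing item is simply ignored.
guests.discard("Zara")
print(sorted(guests))  # ['Alice', 'Bob']
```

A reasonable rule of thumb: use .remove() when a missing item means something has gone wrong and you want to hear about it, and .discard() when a missing item is perfectly normal.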
The in keyword checks membership, and this is where sets genuinely shine. Checking item in my_set is O(1) on average (constant time), regardless of how large the set is, because a set is backed by a hash table. The same check on a list is O(n): it gets slower as the list grows. This speed difference is the main reason to reach for a set in lookup-heavy tasks.
# Starting set of confirmed attendees at an event
attendees = {"Alice", "Bob", "Carol"}

# ── Adding items ──────────────────────────────────────────────────────────
attendees.add("David")  # add one person
attendees.add("Alice")  # Alice is already there, so nothing changes
print("After adding Alice again:", attendees)  # still only one Alice

attendees.update(["Eve", "Frank", "Grace"])  # add several people at once
print("After batch add:", attendees)

# ── Removing items ────────────────────────────────────────────────────────
attendees.remove("Bob")  # Bob cancelled, and we're sure he's in the set
print("After removing Bob:", attendees)

attendees.discard("Zara")  # Zara was never there, but discard won't crash
print("After discarding Zara (who wasn't there):", attendees)
# attendees.remove("Zara")  # ← this WOULD raise KeyError, so it is commented out

# ── Membership testing: the fastest way to check ──────────────────────────
print("Is Alice attending?", "Alice" in attendees)  # True
print("Is Bob attending? ", "Bob" in attendees)     # False, we removed him

# ── Practical example: deduplicating user IDs from a raw login log ────────
app_logins = [101, 102, 103, 102, 104, 101]  # raw log with repeats
unique_users = set(app_logins)               # instant deduplication
print("Unique user IDs:", unique_users)
print("Count:", len(unique_users))  # 4 unique users
After batch add: {'Alice', 'Bob', 'Carol', 'David', 'Eve', 'Frank', 'Grace'}
After removing Bob: {'Alice', 'Carol', 'David', 'Eve', 'Frank', 'Grace'}
After discarding Zara (who wasn't there): {'Alice', 'Carol', 'David', 'Eve', 'Frank', 'Grace'}
Is Alice attending? True
Is Bob attending? False
Unique user IDs: {101, 102, 103, 104}
Count: 4
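If you want to see the O(1) versus O(n) difference for yourself, a rough timing sketch like the one below will do. The collection size and repeat count are arbitrary choices for illustration, and the absolute numbers will differ from machine to machine; only the gap between them matters.

```python
import timeit

size = 100_000
numbers_list = list(range(size))
numbers_set = set(numbers_list)
target = size - 1  # worst case for the list: the very last element

# Time 100 membership checks against each container.
list_seconds = timeit.timeit(lambda: target in numbers_list, number=100)
set_seconds = timeit.timeit(lambda: target in numbers_set, number=100)

print(f"list: {list_seconds:.5f}s")
print(f"set:  {set_seconds:.5f}s")
```

The list has to walk through every element before it reaches the target, while the set jumps straight to it through its hash.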
Set Math — Union, Intersection and Difference in Plain English
This is where sets go from 'nice to have' to genuinely powerful. Python sets support four mathematical operations that let you compare two collections in ways that would otherwise require several lines of loop logic.
Union (| or .union()) — give me EVERYTHING from both sets. Like combining two guest lists into one, no repeats.
Intersection (& or .intersection()) — give me only items that appear in BOTH sets. Like finding mutual friends between two people.
Difference (- or .difference()) — give me items in set A that are NOT in set B. Like finding which guests from list A didn't appear on list B.
Symmetric Difference (^ or .symmetric_difference()) — give me items that are in one set OR the other, but NOT both. Everything exclusive to each side.
These operations don't modify the original sets — they return a brand new set. If you want to modify the original in place, use the assignment versions: |=, &=, -=, ^=.
# Two streaming platforms and their exclusive shows
netflix_shows = {"Stranger Things", "Ozark", "The Crown", "Dark", "Squid Game"}
disney_shows = {"The Mandalorian", "WandaVision", "Squid Game", "The Crown", "Loki"}
# Note: "Squid Game" and "The Crown" are on both (hypothetically)

# ── UNION: everything available on either platform ────────────────────────
all_shows = netflix_shows | disney_shows
print("All shows across both platforms:")
print(all_shows)
print(f"Total unique titles: {len(all_shows)}\n")

# ── INTERSECTION: shows available on BOTH platforms ──────────────────────
shared_shows = netflix_shows & disney_shows
print("Shows on BOTH platforms (overlaps):")
print(shared_shows)  # {'Squid Game', 'The Crown'}
print()

# ── DIFFERENCE: shows ONLY on Netflix (not on Disney) ────────────────────
netflix_only = netflix_shows - disney_shows
print("Shows exclusive to Netflix:")
print(netflix_only)
print()

# ── SYMMETRIC DIFFERENCE: exclusives on each side ────────────────────────
exclusive_to_one_platform = netflix_shows ^ disney_shows
print("Shows exclusive to exactly one platform (not shared):")
print(exclusive_to_one_platform)
print()

# ── Real-world use case: which users are new today? ──────────────────────
users_yesterday = {"alice", "bob", "carol", "david"}
users_today = {"alice", "carol", "eve", "frank"}

new_users = users_today - users_yesterday    # signed up since yesterday
lost_users = users_yesterday - users_today   # didn't return today
loyal_users = users_today & users_yesterday  # came back both days

print("New users today: ", new_users)
print("Users who left: ", lost_users)
print("Loyal returning: ", loyal_users)
{'Stranger Things', 'Ozark', 'The Crown', 'Dark', 'Squid Game', 'The Mandalorian', 'WandaVision', 'Loki'}
Total unique titles: 8
Shows on BOTH platforms (overlaps):
{'The Crown', 'Squid Game'}
Shows exclusive to Netflix:
{'Stranger Things', 'Ozark', 'Dark'}
Shows exclusive to exactly one platform (not shared):
{'Stranger Things', 'Ozark', 'Dark', 'The Mandalorian', 'WandaVision', 'Loki'}
New users today: {'eve', 'frank'}
Users who left: {'bob', 'david'}
Loyal returning: {'alice', 'carol'}
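The examples above all build new sets. For contrast, here is a short sketch of the in-place assignment operators mentioned earlier. The shopping-cart data is made up for illustration.

```python
cart = {"milk", "eggs", "bread"}
original_cart = cart  # a second name for the SAME set object

# The plain operator builds a new set and leaves cart untouched.
bigger_cart = cart | {"butter"}
print(sorted(cart))  # ['bread', 'eggs', 'milk']

# The augmented operators change the existing set in place.
cart |= {"butter"}                 # union update
cart -= {"eggs"}                   # difference update
cart &= {"milk", "butter", "jam"}  # intersection update
print(sorted(cart))                # ['butter', 'milk']
print(original_cart is cart)       # True: still the same object
```

Because the in-place versions mutate the set, every other variable pointing at that set sees the change too. Use the plain operators when you need to keep the original intact.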
Frozen Sets — When You Need an Immutable Set
Regular sets are mutable — you can add and remove items after creation. But sometimes you need a set that nobody can change, one you can use as a dictionary key or store inside another set. That's what frozenset is for.
A frozenset is exactly like a regular set — same uniqueness guarantee, same fast membership testing, same mathematical operations — except it's locked after creation. You can't call .add() or .remove() on it. In exchange, it's hashable, which means you can use it as a dictionary key or put it inside another set.
When would you actually use this? Imagine you're building a permissions system where a group of permissions is a unit — you want to use that group as a dictionary key to look up what role it maps to. A regular set can't be a key. A frozenset can.
For most beginner work you won't need frozensets often, but knowing they exist saves you from confusion when you hit the 'unhashable type: set' error — and it will definitely come up in interviews.
# Regular set: mutable, cannot be used as a dictionary key
read_write_permissions = {"read", "write", "delete"}

# Frozenset: immutable, CAN be used as a dictionary key
admin_permissions = frozenset({"read", "write", "delete", "admin"})
viewer_permissions = frozenset({"read"})
editor_permissions = frozenset({"read", "write"})

# Using frozensets as dictionary KEYS, which is impossible with regular sets
permission_to_role = {
    admin_permissions: "Administrator",
    editor_permissions: "Editor",
    viewer_permissions: "Viewer",
}

# Look up what role a set of permissions maps to
user_perms = frozenset({"read", "write"})
print("User role:", permission_to_role[user_perms])  # Editor

# Frozensets support all the same math as regular sets
common = admin_permissions & editor_permissions
print("Shared permissions (admin & editor):", common)

# Attempting to modify a frozenset raises AttributeError
try:
    viewer_permissions.add("write")  # this will fail
except AttributeError as error:
    print(f"Cannot modify frozenset: {error}")

# You CAN put a frozenset inside a regular set
all_roles = {admin_permissions, editor_permissions, viewer_permissions}
print("Number of distinct roles:", len(all_roles))  # 3
Shared permissions (admin & editor): {'read', 'write'}
Cannot modify frozenset: 'frozenset' object has no attribute 'add'
Number of distinct roles: 3
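The 'unhashable type: set' error mentioned earlier is worth triggering once on purpose so you recognise it. This is a small sketch with invented names; the exact error text can vary slightly between Python versions.

```python
team_a = {"alice", "bob"}

# A regular set cannot be a dictionary key (or a member of another set).
try:
    scores = {team_a: 10}
except TypeError as error:
    set_key_error = str(error)
    print("TypeError:", set_key_error)

# Freezing it first makes it hashable.
scores = {frozenset(team_a): 10}

# The lookup works no matter what order the elements were written in,
# and a frozenset compares equal to a set with the same elements.
print(scores[frozenset({"bob", "alice"})])  # 10
print(frozenset(team_a) == team_a)          # True
```

Calling frozenset() on an existing set creates an immutable copy; the original set stays mutable and unchanged.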
| Feature | List | Set | Frozenset |
|---|---|---|---|
| Allows duplicates | Yes | No — unique only | No — unique only |
| Ordered (insertion order kept) | Yes | No | No |
| Mutable (can change after creation) | Yes | Yes | No — locked |
| Can be a dictionary key | No | No | Yes |
| Membership test speed (item in ...) | O(n) — slow on large data | O(1) — constant speed | O(1) — constant speed |
| Supports union / intersection / difference | No (manual loops needed) | Yes — built-in operators | Yes — built-in operators |
| Can contain lists as elements | Yes | No — lists aren't hashable | No — lists aren't hashable |
| Typical use case | Ordered collection, may repeat | Unique items, fast lookup, set math | Immutable unique group, dict key |
🎯 Key Takeaways
- A set guarantees uniqueness: adding a duplicate silently does nothing, which makes sets the cleanest way to deduplicate any collection with a single line: `unique = set(raw_list)`.
- Membership testing with `in` is O(1) for sets versus O(n) for lists; for large datasets this is the difference between an instant response and a noticeable lag.
- The four set operators, `|` (union), `&` (intersection), `-` (difference) and `^` (symmetric difference), replace complex nested loops with a single, readable expression.
- Always use `set()` not `{}` to create an empty set, and reach for `frozenset` whenever you need a set that's immutable or needs to act as a dictionary key.
⚠ Common Mistakes to Avoid
- Mistake 1: Using {} to create an empty set. `my_set = {}` looks like it should work, but Python interprets curly braces without items as an empty DICTIONARY. You'll get `<class 'dict'>` when you check `type(my_set)`, and operations like `.add()` will fail with `AttributeError: 'dict' object has no attribute 'add'`. Fix: always use `my_set = set()` to create an empty set.
- Mistake 2: Expecting a set to preserve insertion order. Beginners often print a set and are confused that the order is different from what they typed in. Sets are deliberately unordered; Python may print `{'banana', 'apple'}` even if you wrote `{'apple', 'banana'}`, and for strings the order can change from one run to the next. If you need the items in a specific order, convert to a sorted list first: `sorted_items = sorted(my_set)`. Never rely on set order for logic.
- Mistake 3: Trying to put a list inside a set. Writing `my_set = {[1, 2], [3, 4]}` raises `TypeError: unhashable type: 'list'` immediately. Sets require all elements to be hashable, and lists are mutable so they can't be hashed. Fix: convert the inner lists to tuples first, `my_set = {(1, 2), (3, 4)}`. Tuples are immutable and hashable (provided their contents are), so they work perfectly as set members.
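The first two mistakes are quick to reproduce and fix. The snippet below is an illustrative sketch with made-up data, not output copied from the article's earlier examples.

```python
# Mistake 1: {} is a dict, so .add() does not exist on it.
not_a_set = {}
try:
    not_a_set.add("x")
except AttributeError as error:
    add_error = str(error)
    print("AttributeError:", add_error)

real_set = set()  # the correct way to start with an empty set
real_set.add("x")
print(type(real_set).__name__)  # set

# Mistake 2: never depend on set order; sort when order matters.
tags = {"python", "sets", "basics"}
ordered_tags = sorted(tags)
print(ordered_tags)  # ['basics', 'python', 'sets']
```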
Interview Questions on This Topic
- Q: What is the time complexity of checking membership in a Python set versus a list, and why is there a difference?
- Q: How would you use sets to find elements that exist in one list but not another? Walk me through the code.
- Q: If I try to create a set of lists in Python, what happens and how would you fix it?
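For the second question, one possible answer is sketched below. The email lists are hypothetical sample data, and an interviewer may be looking for either variant depending on whether order matters.

```python
old_emails = ["a@x.com", "b@x.com", "c@x.com", "b@x.com"]
new_emails = ["b@x.com", "d@x.com"]

# Set difference: fast and short, but the result has no defined order.
only_in_old = set(old_emails) - set(new_emails)
print(sorted(only_in_old))  # ['a@x.com', 'c@x.com']

# If the original order must survive, filter the list against a set.
# Building the set once keeps each membership test fast.
new_lookup = set(new_emails)
only_in_old_ordered = [email for email in old_emails if email not in new_lookup]
print(only_in_old_ordered)  # ['a@x.com', 'c@x.com']
```

Note that the list-comprehension version keeps any duplicates that appear in the first list, while the pure set version removes them.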
Frequently Asked Questions
Can a Python set contain duplicate values?
No. A set automatically discards any duplicate values the moment they're added. If you create {1, 2, 2, 3}, Python silently keeps only {1, 2, 3}. This is the defining characteristic of a set — every element is guaranteed to be unique, always.
What is the difference between a Python set and a list?
Lists are ordered and allow duplicates; sets are unordered and allow only unique values. Lists support indexing (my_list[0]) but sets don't. Membership testing (item in collection) is much faster on a set — O(1) constant time — compared to O(n) linear time on a list. Use a list when order or duplicates matter; use a set when uniqueness or fast lookup matters.
Why can't I use a list as an element inside a Python set?
Sets use a hash table internally to achieve fast lookups, which means every element must be hashable — it must have a fixed hash value that never changes. Lists are mutable (you can change them after creation), so Python can't safely compute a stable hash for them. The fix is to use tuples instead of lists as set elements, since tuples are immutable and therefore hashable.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.