Python defaultdict and OrderedDict Explained — When and Why to Use Them
Every Python developer hits the same wall eventually: you're grouping data into a dictionary and your code is cluttered with 'if key not in dict' checks just to avoid a KeyError. Or you're building a feature where insertion order genuinely matters — maybe a browser history, a cache, or a configuration system — and you need a rock-solid guarantee that iteration always reflects the order things went in. These aren't edge cases. They're everyday problems in production code.
Python's collections module ships with two dictionary subclasses that solve these problems cleanly: defaultdict and OrderedDict. defaultdict eliminates defensive key-checking entirely by auto-initialising missing keys with a value you choose upfront. OrderedDict existed before Python 3.7 made regular dicts order-preserving, but it still carries unique powers — most notably the ability to move items to either end and to compare two dicts where order actually matters.
By the end of this article you'll know how to replace messy grouping boilerplate with a single defaultdict line, understand exactly when OrderedDict still earns its place in modern Python, and be able to explain the difference confidently in a technical interview. You'll also see the three mistakes that trip up even experienced developers — and precisely how to avoid them.
defaultdict — Stop Writing 'if key not in dict' Forever
A regular dict throws a KeyError the instant you access a key that doesn't exist. That's sensible behaviour for lookups, but it becomes a genuine nuisance when you're building up data — grouping log lines by severity, counting word frequencies, collecting related items under a shared label.
The standard workaround is dict.setdefault() or an if-else guard. Both work, but they add noise to every single access. defaultdict solves this at the source. When you create one, you hand it a 'default factory' — any callable that returns the starting value for any new key. Use list and every new key starts with an empty list. Use int and it starts at zero. Use set, lambda, or even your own function — defaultdict doesn't care as long as it's callable.
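The factory runs once per missing key, the first time you look that key up with square brackets. After that, the key exists like any other (delete it, and the next lookup calls the factory again). Lookups through .get() or 'in' never call the factory and never create a key. It's a small design choice with an outsized effect on readability: your grouping and counting code collapses from four lines to one.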
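To make the factory behaviour concrete, here is a minimal sketch that counts how often the factory is actually called and compares the result with the dict.setdefault() workaround. The names counting_factory, calls, scores and plain are made up for this illustration.

```python
from collections import defaultdict

calls = 0  # illustrative counter: how many times the factory has run


def counting_factory():
    """Hypothetical factory: returns an empty list and records the call."""
    global calls
    calls += 1
    return []


scores = defaultdict(counting_factory)

scores["alice"].append(90)  # first bracket lookup: factory runs, key is created
scores["alice"].append(85)  # key already exists: factory is not called again
assert calls == 1

# .get() and 'in' neither call the factory nor create a key
assert scores.get("bob") is None
assert "bob" not in scores
assert calls == 1

# The plain-dict equivalent has to repeat setdefault() on every access
plain = {}
plain.setdefault("alice", []).append(90)
plain.setdefault("alice", []).append(85)
assert plain == dict(scores)
```

Note that setdefault() builds its default argument on every call, even when the key already exists, whereas the defaultdict factory only runs when a key is genuinely missing.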
    from collections import defaultdict

    # --- Example 1: Grouping log entries by severity level ---
    # Without defaultdict you'd write:
    #     if severity not in logs: logs[severity] = []
    #     logs[severity].append(message)
    # With defaultdict, the empty list is created automatically.

    logs = defaultdict(list)  # default factory is 'list': new keys start as []

    raw_entries = [
        ("ERROR", "Database connection timed out"),
        ("INFO", "User alice logged in"),
        ("ERROR", "Disk space below 10%"),
        ("WARNING", "Retry attempt 3 of 5"),
        ("INFO", "Scheduled backup started"),
        ("ERROR", "Payment gateway unreachable"),
    ]

    for severity, message in raw_entries:
        logs[severity].append(message)  # no KeyError, no guard needed

    for severity, messages in logs.items():
        print(f"[{severity}] — {len(messages)} event(s)")
        for msg in messages:
            print(f" • {msg}")
        print()

    # --- Example 2: Counting word frequency with int default (starts at 0) ---
    sentence = "the quick brown fox jumps over the lazy dog the fox"

    word_counts = defaultdict(int)  # new keys start at 0
    for word in sentence.split():
        word_counts[word] += 1  # += 1 on a brand-new key works because default is 0

    # Show only words that appear more than once
    repeated = {word: count for word, count in word_counts.items() if count > 1}
    print("Words appearing more than once:", repeated)
    print()

    # --- Example 3: Nested defaultdict for a 2-level grouping ---
    # Tracking which users performed which actions on which day
    activity_log = defaultdict(lambda: defaultdict(list))

    events = [
        ("2024-01-15", "alice", "login"),
        ("2024-01-15", "bob", "purchase"),
        ("2024-01-15", "alice", "search"),
        ("2024-01-16", "alice", "logout"),
        ("2024-01-16", "bob", "login"),
    ]

    for date, user, action in events:
        activity_log[date][user].append(action)  # two levels, zero guards

    for date, users in activity_log.items():
        print(f"{date}:")
        for user, actions in users.items():
            print(f" {user}: {actions}")
Output:

    [ERROR] — 3 event(s)
     • Database connection timed out
     • Disk space below 10%
     • Payment gateway unreachable
    [INFO] — 2 event(s)
     • User alice logged in
     • Scheduled backup started
    [WARNING] — 1 event(s)
     • Retry attempt 3 of 5
    Words appearing more than once: {'the': 3, 'fox': 2}
    2024-01-15:
     alice: ['login', 'search']
     bob: ['purchase']
    2024-01-16:
     alice: ['logout']
     bob: ['login']
OrderedDict — When Order Isn't Just a Nice-to-Have
Since Python 3.7, regular dicts maintain insertion order — so why does OrderedDict still exist? Because 'maintains order' and 'cares about order' are different things.
A plain dict's ordering is a language guarantee from 3.7 onward (in CPython 3.6 it was only an implementation detail), but a plain dict offers no method for reordering existing items, and two dicts with the same keys and values but different insertion orders compare as equal. OrderedDict gives you three things a plain dict doesn't: move_to_end() to reposition any key to the front or back, a popitem() that lets you choose whether to pop from the front or back, and equality comparisons that consider order. Two OrderedDicts with different insertion orders are NOT equal even if their contents match.
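Here is a short sketch of the two reordering methods in isolation, before they get combined into a cache. The steps dictionary and its keys are invented for the example.

```python
from collections import OrderedDict

steps = OrderedDict([("build", 1), ("test", 2), ("deploy", 3)])

steps.move_to_end("build")  # default last=True: send the key to the back
assert list(steps) == ["test", "deploy", "build"]

steps.move_to_end("deploy", last=False)  # last=False: send the key to the front
assert list(steps) == ["deploy", "test", "build"]

first = steps.popitem(last=False)  # pop from the front (FIFO)
last = steps.popitem()             # default last=True: pop from the back (LIFO)
assert first == ("deploy", 3)
assert last == ("build", 1)
assert list(steps) == ["test"]
```

A plain dict's popitem() takes no argument and always removes the most recently inserted pair, which is why the front-pop is listed as an OrderedDict feature.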
The most practical use case today is an LRU (Least Recently Used) cache. Every time you access an item, you move it to the end. When the cache is full, you evict from the front. It's a clean O(1) cache with five lines of logic. OrderedDict also shines in configuration systems where the order of settings reflects priority, and in any protocol implementation where the sequence of fields in a message has semantic meaning.
    from collections import OrderedDict

    # --- Example 1: Order-sensitive equality (plain dict can't do this) ---
    plain_a = {"first": 1, "second": 2, "third": 3}
    plain_b = {"third": 3, "first": 1, "second": 2}
    print("Plain dicts equal (ignores order):", plain_a == plain_b)  # True

    ordered_a = OrderedDict([("first", 1), ("second", 2), ("third", 3)])
    ordered_b = OrderedDict([("third", 3), ("first", 1), ("second", 2)])
    print("OrderedDicts equal (respects order):", ordered_a == ordered_b)  # False
    print()

    # --- Example 2: A simple LRU cache built on OrderedDict ---
    # The rule: most recently used items live at the END.
    # When we're full, we evict the item at the FRONT (least recently used).

    class LRUCache:
        def __init__(self, capacity: int):
            self.capacity = capacity
            self.cache = OrderedDict()  # insertion order = access order

        def get(self, key: str):
            if key not in self.cache:
                return None
            # Move accessed item to the end: it's now the most recently used
            self.cache.move_to_end(key)
            return self.cache[key]

        def put(self, key: str, value):
            if key in self.cache:
                self.cache.move_to_end(key)  # refresh its position
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                # popitem(last=False) removes the FIRST item, the least recently used
                evicted_key, evicted_val = self.cache.popitem(last=False)
                print(f" [EVICTED] '{evicted_key}': {evicted_val}")

        def __repr__(self):
            return f"LRUCache({dict(self.cache)})"

    browser_cache = LRUCache(capacity=3)

    print("Loading pages into cache...")
    browser_cache.put("homepage", "<html>Home</html>")
    browser_cache.put("about", "<html>About</html>")
    browser_cache.put("contact", "<html>Contact</html>")
    print(browser_cache)

    print("\nUser revisits 'homepage' — refreshes its recency...")
    browser_cache.get("homepage")  # moves 'homepage' to the end
    print(browser_cache)

    print("\nLoading a new page — cache is full, eviction happens...")
    browser_cache.put("pricing", "<html>Pricing</html>")  # 'about' is now LRU
    print(browser_cache)

    print("\nFinal cache state:", list(browser_cache.cache.keys()))
Output:

    Plain dicts equal (ignores order): True
    OrderedDicts equal (respects order): False

    Loading pages into cache...
    LRUCache({'homepage': '<html>Home</html>', 'about': '<html>About</html>', 'contact': '<html>Contact</html>'})

    User revisits 'homepage' — refreshes its recency...
    LRUCache({'about': '<html>About</html>', 'contact': '<html>Contact</html>', 'homepage': '<html>Home</html>'})

    Loading a new page — cache is full, eviction happens...
     [EVICTED] 'about': <html>About</html>
    LRUCache({'contact': '<html>Contact</html>', 'homepage': '<html>Home</html>', 'pricing': '<html>Pricing</html>'})

    Final cache state: ['contact', 'homepage', 'pricing']
Gotchas, Real Mistakes, and How to Fix Them
Both collections are easy to reach for — which means the subtle traps catch people off-guard. Here are the three mistakes that appear most often in code reviews, with exact symptoms and fixes.
The most common defaultdict trap is accidentally creating a key when you only meant to check for it. Evaluating 'missing_key in my_defaultdict' correctly returns False and leaves the dict alone, but reading my_defaultdict[missing_key], even just to compare its value, creates the key with its default value on the spot. This silently inflates your dictionary with phantom keys, which then show up in iteration and serialisation.
The most common OrderedDict trap is trusting plain-dict equality when you actually need order-sensitive comparison. If you mix OrderedDicts and plain dicts in the same comparison (ordered == plain), Python falls back to content-only equality, silently ignoring order. You need to compare two OrderedDicts to get the order-sensitive behaviour.
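A subtler defaultdict trap involves pickling and multiprocessing. Lambdas can't be pickled, so pickling defaultdict(lambda: []) fails with a PicklingError (or an AttributeError when the lambda is defined inside another function) the moment you try to send it across a process boundary. The fix is a factory that pickle can look up by name: a built-in such as list, a named module-level function, or a functools.partial wrapping one of those.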
    import pickle
    from collections import defaultdict, OrderedDict
    from functools import partial

    # ─── GOTCHA 1: Checking membership vs. accessing a missing key ───
    visit_counts = defaultdict(int)
    visit_counts["homepage"] += 5
    visit_counts["about"] += 2

    # WRONG: This looks like a read, but it CREATES 'contact' with value 0
    if visit_counts["contact"] == 0:  # <- phantom key created here!
        print("Contact page not visited")

    print("Keys after wrong check:", list(visit_counts.keys()))
    # Output includes 'contact', which is probably not what you wanted

    # RIGHT: Use 'in' for membership testing; it never creates a key
    visit_counts2 = defaultdict(int)
    visit_counts2["homepage"] += 5

    if "contact" not in visit_counts2:  # <- safe, no key created
        print("Contact page not visited (safe check)")

    print("Keys after safe check:", list(visit_counts2.keys()))
    print()

    # ─── GOTCHA 2: OrderedDict vs plain dict equality silently drops order ───
    ordered = OrderedDict([("step1", "connect"), ("step2", "authenticate"), ("step3", "fetch")])
    plain = {"step3": "fetch", "step1": "connect", "step2": "authenticate"}  # different order

    # This returns True: order is ignored when one side is a plain dict!
    print("OrderedDict == plain dict (different order):", ordered == plain)  # True (TRAP!)

    # To compare order-sensitively, BOTH sides must be OrderedDicts
    plain_as_ordered = OrderedDict(plain.items())
    print("OrderedDict == OrderedDict (different order):", ordered == plain_as_ordered)  # False (correct!)
    print()

    # ─── GOTCHA 3: Lambda factories break pickling ───
    # WRONG: lambda cannot be pickled
    bad_cache = defaultdict(lambda: [])
    try:
        pickle.dumps(bad_cache)
    except Exception as error:
        print(f"Pickle error with lambda factory: {type(error).__name__}: {error}")

    # RIGHT: use a named function or functools.partial
    def make_empty_list():
        return []

    good_cache = defaultdict(make_empty_list)  # named function IS picklable
    good_cache["users"].append("alice")

    pickled = pickle.dumps(good_cache)  # works fine
    restored = pickle.loads(pickled)
    print("Restored from pickle:", dict(restored))

    # Also works: partial(list) is picklable
    good_cache2 = defaultdict(partial(list))
    good_cache2["admins"].append("bob")
    print("partial(list) factory works:", dict(good_cache2))
Output:

    Contact page not visited
    Keys after wrong check: ['homepage', 'about', 'contact']
    Contact page not visited (safe check)
    Keys after safe check: ['homepage']

    OrderedDict == plain dict (different order): True
    OrderedDict == OrderedDict (different order): False

    Pickle error with lambda factory: PicklingError: Can't pickle <function <lambda> at 0x...>: attribute lookup <lambda> on __main__ failed
    Restored from pickle: {'users': ['alice']}
    partial(list) factory works: {'admins': ['bob']}

The memory address and the exact wording of the pickle error vary between runs and Python versions.
| Feature / Aspect | defaultdict | OrderedDict |
|---|---|---|
| Primary purpose | Auto-initialise missing keys | Guarantee and manipulate key order |
| Inherited from | dict | dict |
| Missing key behaviour | Calls factory, returns default value | Raises KeyError (same as plain dict) |
| Key method / feature | default_factory attribute | move_to_end(key, last=True/False) |
| Pop from front | Not supported natively | popitem(last=False) removes first item |
| Order-sensitive equality | No (same as plain dict) | Yes — two OrderedDicts with different order are NOT equal |
| Pickling with lambda factory | Fails — use named function instead | N/A — no factory involved |
| Best real-world use case | Grouping, counting, adjacency lists | LRU cache, priority configs, protocol fields |
| Python version needed | 2.5+ (collections module) | 2.7+ and 3.1+ (collections module) |
| Still relevant in Python 3.7+? | Yes — factory auto-init is unique | Yes — move_to_end and order equality are unique |
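The table lists adjacency lists as a defaultdict use case. As a rough sketch of what that looks like, assuming a small undirected graph given as a list of edge pairs (the edges data and the graph name are invented for the example):

```python
from collections import defaultdict

# Sample undirected edges, made up for illustration
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

graph = defaultdict(set)  # each node maps to the set of its neighbours
for left, right in edges:
    graph[left].add(right)   # a new node gets an empty set automatically
    graph[right].add(left)

assert graph["a"] == {"b", "c"}
assert graph["c"] == {"a", "b", "d"}
assert "z" not in graph      # membership test does not create the node
```

Using set as the factory also removes duplicate edges for free; swap in list if you need to keep repeated edges or insertion order of neighbours.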
🎯 Key Takeaways
- defaultdict's factory runs once per new key on first access — not on every read — so after that first touch the key behaves exactly like any other dict entry.
- Accessing a missing defaultdict key with bracket notation CREATES the key with its default value; use 'in' for safe membership testing that doesn't mutate the dict.
- OrderedDict is still the right tool in 2024 when order is part of your logic: move_to_end() powers clean LRU caches and order-sensitive equality catches sequence bugs that plain dicts would silently miss.
- Never use a lambda as a defaultdict factory if the dict might be pickled or passed to a multiprocessing worker — use a named function or functools.partial instead.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Accessing a missing defaultdict key to check its value. Symptom: phantom keys appear in the dict and show up during iteration or JSON serialisation. Fix: always use 'key in my_dict' for membership tests; only use bracket access when you actually want to read or write the value.
- ✕ Mistake 2: Comparing an OrderedDict to a plain dict and expecting order to matter. Symptom: ordered == plain returns True even when insertion orders differ, masking a bug. Fix: ensure BOTH operands are OrderedDicts before relying on order-sensitive equality; wrap the plain dict with OrderedDict(plain.items()) first.
- ✕ Mistake 3: Using a lambda as the default_factory in a defaultdict that needs to be pickled or sent via multiprocessing. Symptom: PicklingError or AttributeError at runtime, often only discovered when scaling to multiple workers. Fix: replace the lambda with a built-in such as list or dict, a named module-level function, or a functools.partial wrapping one of them; all of these are picklable.
Interview Questions on This Topic
- Q: What is the difference between using dict.setdefault() and defaultdict, and when would you choose one over the other?
- Q: Since Python 3.7 guarantees dict insertion order, can you give me a concrete example where you'd still prefer OrderedDict over a plain dict today?
- Q: If I gave you a defaultdict(list) and asked you to prevent any NEW keys from being added after a certain point, how would you do it without changing the type?
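One possible answer to the last question is to set the default_factory attribute to None once the build-up phase is over. From then on a missing key raises KeyError, just like a plain dict. A minimal sketch, with the groups dictionary and its keys invented for the example:

```python
from collections import defaultdict

groups = defaultdict(list)
groups["fruit"].append("apple")  # build-up phase: keys are auto-created

groups.default_factory = None    # from here on, missing keys raise KeyError

try:
    groups["vegetable"].append("carrot")
    frozen = False
except KeyError:
    frozen = True

assert frozen
assert list(groups) == ["fruit"]   # no phantom 'vegetable' key was created
groups["fruit"].append("pear")     # existing keys still work normally
assert type(groups) is defaultdict  # the type has not changed
```

This only switches off automatic creation. An explicit assignment such as groups["vegetable"] = [] still adds a key, so a stricter freeze would need a different approach.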
Frequently Asked Questions
Is OrderedDict still useful in Python 3.7 and later?
Yes. Regular dicts preserve insertion order since 3.7, but OrderedDict adds three things a plain dict never will: move_to_end() to reposition any key, popitem(last=False) to pop from the front in O(1), and order-sensitive equality so two OrderedDicts with different insertion orders compare as not equal. If any of those behaviours matter to you, reach for OrderedDict.
What happens if I access a key that doesn't exist in a defaultdict?
The default_factory is called with no arguments, the returned value is stored under that key, and the value is returned to you — no KeyError is raised. This happens only on the first access; subsequent accesses just return the stored value like a normal dict key.
Can I use a defaultdict where the default value depends on the key itself?
Not directly — the factory receives no arguments, so it can't see the key. The workaround is to subclass defaultdict and override __missing__(self, key), which does receive the key. That gives you full control over what default value to generate per key while keeping all other defaultdict behaviour intact.
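A minimal sketch of that subclass might look like this. KeyAwareDefaultDict is a hypothetical name, and passing the key to default_factory is this sketch's own convention, not standard defaultdict behaviour.

```python
from collections import defaultdict


class KeyAwareDefaultDict(defaultdict):
    """Hypothetical subclass whose factory receives the missing key."""

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory(key)  # pass the key, unlike stock defaultdict
        self[key] = value                  # store it so later lookups skip the factory
        return value


lengths = KeyAwareDefaultDict(len)   # default value is the length of the key
assert lengths["hello"] == 5
assert "hello" in lengths            # the computed value was stored

squares = KeyAwareDefaultDict(lambda n: n * n)
assert squares[4] == 16
```

The same phantom-key caveat applies here: every bracket lookup on a missing key stores a new entry, so use 'in' or .get() when you only want to check.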
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.