Python defaultdict and OrderedDict Explained — When and Why to Use Them

📅 March 2026 ⏱ 8 min read 🎯 Intermediate

In Plain English 🔥

Imagine a school register where every new student automatically gets a blank attendance sheet the moment their name is called — you never have to set one up manually. That's defaultdict: a dictionary that creates a default value for any missing key instead of yelling at you. OrderedDict is like a numbered ticket system at a deli counter — it guarantees you remember exactly who arrived first, second, and third, in that order, every time.

⚡ Quick Answer

Every Python developer hits the same wall eventually: you're grouping data into a dictionary and your code is cluttered with 'if key not in dict' checks just to avoid a KeyError. Or you're building a feature where insertion order genuinely matters — maybe a browser history, a cache, or a configuration system — and you need a rock-solid guarantee that iteration always reflects the order things went in. These aren't edge cases. They're everyday problems in production code.

Python's collections module ships with two dictionary subclasses that solve these problems cleanly: defaultdict and OrderedDict. defaultdict eliminates defensive key-checking entirely by auto-initialising missing keys with a value you choose upfront. OrderedDict existed before Python 3.7 made regular dicts order-preserving, but it still carries unique powers — most notably the ability to move items to either end and to compare two dicts where order actually matters.

By the end of this article you'll know how to replace messy grouping boilerplate with a single defaultdict line, understand exactly when OrderedDict still earns its place in modern Python, and be able to explain the difference confidently in a technical interview. You'll also see the three mistakes that trip up even experienced developers — and precisely how to avoid them.

defaultdict — Stop Writing 'if key not in dict' Forever

A regular dict raises a KeyError the instant you access a key that doesn't exist. That's sensible behaviour for lookups, but it becomes a genuine nuisance when you're building up data: grouping log lines by severity, counting word frequencies, collecting related items under a shared label.

The standard workaround is dict.setdefault() or an if-else guard. Both work, but they add noise to every single access. defaultdict solves this at the source. When you create one, you hand it a 'default factory' — any callable that returns the starting value for any new key. Use list and every new key starts with an empty list. Use int and it starts at zero. Use set, lambda, or even your own function — defaultdict doesn't care as long as it's callable.
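To make the comparison concrete, here is a minimal sketch of the same grouping job written three ways: with an explicit guard, with dict.setdefault(), and with defaultdict(list). The department/employee pairs are made-up sample data, not from any real system.

```python
from collections import defaultdict

# Hypothetical sample data: (department, employee) pairs
pairs = [("eng", "alice"), ("ops", "bob"), ("eng", "carol")]

# 1. Explicit guard: check for the key before every append
by_guard = {}
for dept, name in pairs:
    if dept not in by_guard:
        by_guard[dept] = []
    by_guard[dept].append(name)

# 2. dict.setdefault(): one line, but a fresh [] is built on every call
by_setdefault = {}
for dept, name in pairs:
    by_setdefault.setdefault(dept, []).append(name)

# 3. defaultdict(list): the factory only runs when the key is missing
by_default = defaultdict(list)
for dept, name in pairs:
    by_default[dept].append(name)

print(by_guard)                                  # {'eng': ['alice', 'carol'], 'ops': ['bob']}
print(by_guard == by_setdefault == by_default)   # True
```

All three produce the same mapping. The difference is where the "start with an empty list" decision lives: at every access site for the first two, and once, at construction time, for defaultdict.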

The factory runs exactly once per new key, on the first bracket access (my_dict[key]); lookups through .get() or the in operator never trigger it. After that, the key exists like any other. It's a small design choice with an outsized effect on readability: your grouping and counting code collapses from four lines to one.
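If you want to see when the factory actually fires, a small experiment helps. The sketch below uses a hypothetical factory, tracked_list, that records each call so you can count them; the names are illustrative only.

```python
from collections import defaultdict

calls = []  # one entry is appended every time the factory runs


def tracked_list():
    """Hypothetical factory: logs the call, then returns a new empty list."""
    calls.append("called")
    return []


groups = defaultdict(tracked_list)

groups["a"].append(1)     # missing key: factory runs, [] is stored, 1 is appended
groups["a"].append(2)     # key already exists: factory does not run
value = groups.get("b")   # .get() bypasses the factory and returns None
has_c = "c" in groups     # membership test does not run it either

print(len(calls))         # 1
print(value, has_c)       # None False
print(dict(groups))       # {'a': [1, 2]}
```

Only the first bracket access to "a" called the factory. Neither "b" nor "c" was created, because .get() and in do not go through the missing-key hook.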

defaultdict_grouping.py · PYTHON

from collections import defaultdict

# --- Example 1: Grouping log entries by severity level ---
# Without defaultdict you'd write:
#   if severity not in logs: logs[severity] = []
#   logs[severity].append(message)
# With defaultdict, the empty list is created automatically.

logs = defaultdict(list)  # default factory is 'list' — new keys start as []

raw_entries = [
    ("ERROR",   "Database connection timed out"),
    ("INFO",    "User alice logged in"),
    ("ERROR",   "Disk space below 10%"),
    ("WARNING", "Retry attempt 3 of 5"),
    ("INFO",    "Scheduled backup started"),
    ("ERROR",   "Payment gateway unreachable"),
]

for severity, message in raw_entries:
    logs[severity].append(message)  # no KeyError, no guard needed

for severity, messages in logs.items():
    print(f"[{severity}] — {len(messages)} event(s)")
    for msg in messages:
        print(f"    • {msg}")

print()

# --- Example 2: Counting word frequency with int default (starts at 0) ---
sentence = "the quick brown fox jumps over the lazy dog the fox"
word_counts = defaultdict(int)  # new keys start at 0

for word in sentence.split():
    word_counts[word] += 1  # += 1 on a brand-new key works because default is 0

# Show only words that appear more than once
repeated = {word: count for word, count in word_counts.items() if count > 1}
print("Words appearing more than once:", repeated)

print()

# --- Example 3: Nested defaultdict for a 2-level grouping ---
# Tracking which users performed which actions on which day
activity_log = defaultdict(lambda: defaultdict(list))

events = [
    ("2024-01-15", "alice", "login"),
    ("2024-01-15", "bob",   "purchase"),
    ("2024-01-15", "alice", "search"),
    ("2024-01-16", "alice", "logout"),
    ("2024-01-16", "bob",   "login"),
]

for date, user, action in events:
    activity_log[date][user].append(action)  # two levels, zero guards

for date, users in activity_log.items():
    print(f"{date}:")
    for user, actions in users.items():
        print(f"  {user}: {actions}")

▶ Output

[ERROR] — 3 event(s)
    • Database connection timed out
    • Disk space below 10%
    • Payment gateway unreachable
[INFO] — 2 event(s)
    • User alice logged in
    • Scheduled backup started
[WARNING] — 1 event(s)
    • Retry attempt 3 of 5

Words appearing more than once: {'the': 3, 'fox': 2}

2024-01-15:
  alice: ['login', 'search']
  bob: ['purchase']
2024-01-16:
  alice: ['logout']
  bob: ['login']

⚠️ Pro Tip: defaultdict vs Counter

If you're only counting things, use collections.Counter instead of defaultdict(int). Counter gives you extras like most_common() and arithmetic between counters. Reserve defaultdict(int) for when you're doing custom arithmetic beyond simple counting.
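As a rough illustration of that tip, here is the word-frequency example redone with Counter. The monday and tuesday tallies are invented sample data.

```python
from collections import Counter

sentence = "the quick brown fox jumps over the lazy dog the fox"
word_counts = Counter(sentence.split())

# most_common(n) returns the n highest counts as (item, count) pairs
print(word_counts.most_common(2))   # [('the', 3), ('fox', 2)]

# Reading a missing key returns 0 and does NOT create the key
missing = word_counts["cat"]
print(missing, "cat" in word_counts)  # 0 False

# Arithmetic between counters (hypothetical daily tallies)
monday = Counter(login=3, purchase=1)
tuesday = Counter(login=2, search=4)
combined = monday + tuesday     # adds counts key by key
difference = monday - tuesday   # subtracts, dropping zero and negative results

print(dict(combined))     # {'login': 5, 'purchase': 1, 'search': 4}
print(dict(difference))   # {'login': 1, 'purchase': 1}
```

Note the second print: unlike defaultdict(int), a Counter lookup on a missing key leaves the mapping untouched.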

OrderedDict — When Order Isn't Just a Nice-to-Have

Since Python 3.7, regular dicts maintain insertion order — so why does OrderedDict still exist? Because 'maintains order' and 'cares about order' are different things.

A plain dict preserves insertion order as a language guarantee since Python 3.7 (in CPython 3.6 it was only an implementation detail), but it offers no way to reorder items, and two dicts with the same keys and values but different insertion orders compare as equal. OrderedDict gives you three things a plain dict doesn't: move_to_end() to reposition any key to the front or back, a popitem(last=...) that lets you choose whether to pop from the front or back (a plain dict's popitem() always removes the last item), and equality comparisons that consider order. Two OrderedDicts with different insertion orders are NOT equal even if their contents match.
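Here is a small sketch of the two reordering methods in isolation. The steps mapping is a made-up example.

```python
from collections import OrderedDict

# Hypothetical pipeline steps, in insertion order
steps = OrderedDict([("connect", 1), ("authenticate", 2), ("fetch", 3)])

steps.move_to_end("connect")             # default last=True: move to the back
order_after_back = list(steps)
print(order_after_back)                  # ['authenticate', 'fetch', 'connect']

steps.move_to_end("fetch", last=False)   # last=False: move to the front
order_after_front = list(steps)
print(order_after_front)                 # ['fetch', 'authenticate', 'connect']

first = steps.popitem(last=False)        # remove and return the FIRST pair
last = steps.popitem()                   # default last=True: remove the LAST pair
print(first, last)                       # ('fetch', 3) ('connect', 1)
print(list(steps))                       # ['authenticate']
```

Both methods raise KeyError when they have nothing to work with: move_to_end() for a key that is not present, popitem() on an empty OrderedDict.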

The most practical use case today is an LRU (Least Recently Used) cache. Every time you access an item, you move it to the end. When the cache is full, you evict from the front. It's a clean O(1) cache with five lines of logic. OrderedDict also shines in configuration systems where the order of settings reflects priority, and in any protocol implementation where the sequence of fields in a message has semantic meaning.

ordered_dict_lru_cache.py · PYTHON

from collections import OrderedDict

# --- Example 1: Order-sensitive equality (plain dict can't do this) ---
plain_a = {"first": 1, "second": 2, "third": 3}
plain_b = {"third": 3, "first": 1, "second": 2}
print("Plain dicts equal (ignores order):", plain_a == plain_b)  # True

ordered_a = OrderedDict([("first", 1), ("second", 2), ("third", 3)])
ordered_b = OrderedDict([("third", 3), ("first", 1), ("second", 2)])
print("OrderedDicts equal (respects order):", ordered_a == ordered_b)  # False

print()

# --- Example 2: A simple LRU cache built on OrderedDict ---
# The rule: most recently used items live at the END.
# When we're full, we evict the item at the FRONT (least recently used).

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()  # insertion order = access order

    def get(self, key: str):
        if key not in self.cache:
            return None
        # Move accessed item to the end — it's now the most recently used
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key: str, value):
        if key in self.cache:
            self.cache.move_to_end(key)  # refresh its position
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # popitem(last=False) removes the FIRST item — the least recently used
            evicted_key, evicted_val = self.cache.popitem(last=False)
            print(f"  [EVICTED] '{evicted_key}': {evicted_val}")

    def __repr__(self):
        return f"LRUCache({dict(self.cache)})"


browser_cache = LRUCache(capacity=3)

print("Loading pages into cache...")
browser_cache.put("homepage",    "<html>Home</html>")
browser_cache.put("about",       "<html>About</html>")
browser_cache.put("contact",     "<html>Contact</html>")
print(browser_cache)

print("\nUser revisits 'homepage' — refreshes its recency...")
browser_cache.get("homepage")   # moves 'homepage' to the end
print(browser_cache)

print("\nLoading a new page — cache is full, eviction happens...")
browser_cache.put("pricing", "<html>Pricing</html>")  # 'about' is now LRU
print(browser_cache)

print("\nFinal cache state:", list(browser_cache.cache.keys()))

▶ Output

Plain dicts equal (ignores order): True
OrderedDicts equal (respects order): False

Loading pages into cache...
LRUCache({'homepage': '<html>Home</html>', 'about': '<html>About</html>', 'contact': '<html>Contact</html>'})

User revisits 'homepage' — refreshes its recency...
LRUCache({'about': '<html>About</html>', 'contact': '<html>Contact</html>', 'homepage': '<html>Home</html>'})

Loading a new page — cache is full, eviction happens...
  [EVICTED] 'about': <html>About</html>
LRUCache({'contact': '<html>Contact</html>', 'homepage': '<html>Home</html>', 'pricing': '<html>Pricing</html>'})

Final cache state: ['contact', 'homepage', 'pricing']

🔥 Interview Gold: 'Isn't OrderedDict obsolete in Python 3.7+?'

The answer interviewers want: regular dicts preserve order but DON'T expose move_to_end(), DON'T support order-sensitive equality, and DON'T let you popitem() from the front. OrderedDict is still the right tool whenever order itself is part of your logic, not just a side-effect you're relying on.

Gotchas, Real Mistakes, and How to Fix Them

Both collections are easy to reach for — which means the subtle traps catch people off-guard. Here are the three mistakes that appear most often in code reviews, with exact symptoms and fixes.

The most common defaultdict trap is accidentally triggering key creation during a membership test. Calling missing_key in my_defaultdict returns False correctly, but accessing my_defaultdict[missing_key] — even just to check — creates the key with its default value on the spot. This silently inflates your dictionary with phantom keys, which then show up in iteration and serialisation.

The most common OrderedDict trap is trusting plain-dict equality when you actually need order-sensitive comparison. If you mix OrderedDicts and plain dicts in the same comparison (ordered == plain), Python falls back to content-only equality, silently ignoring order. You need to compare two OrderedDicts to get the order-sensitive behaviour.

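A subtler defaultdict trap involves pickling and multiprocessing. pickle serialises a function by its importable name, and a lambda has none, so defaultdict(lambda: []) fails the moment you try to send it across a process boundary. Depending on where the lambda is defined and on the Python version, the error is a PicklingError or an AttributeError. The fix is a picklable factory: a built-in type such as list, a named module-level function, or a functools.partial wrapping one.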

defaultdict_gotchas.py · PYTHON

from collections import defaultdict, OrderedDict
from functools import partial

# ─── GOTCHA 1: Checking membership vs. accessing a missing key ───
visit_counts = defaultdict(int)
visit_counts["homepage"] += 5
visit_counts["about"] += 2

# WRONG: This looks like a read, but it CREATES 'contact' with value 0
if visit_counts["contact"] == 0:           # <- phantom key created here!
    print("Contact page not visited")

print("Keys after wrong check:", list(visit_counts.keys()))
# Output includes 'contact' — probably not what you wanted

# RIGHT: Use 'in' for membership testing — never creates a key
visit_counts2 = defaultdict(int)
visit_counts2["homepage"] += 5

if "contact" not in visit_counts2:         # <- safe, no key created
    print("Contact page not visited (safe check)")

print("Keys after safe check:", list(visit_counts2.keys()))

print()

# ─── GOTCHA 2: OrderedDict vs plain dict equality silently drops order ───
ordered = OrderedDict([("step1", "connect"), ("step2", "authenticate"), ("step3", "fetch")])
plain   = {"step3": "fetch", "step1": "connect", "step2": "authenticate"}  # different order

# This returns True — order is ignored when one side is a plain dict!
print("OrderedDict == plain dict (different order):", ordered == plain)  # True — TRAP!

# To compare order-sensitively, BOTH sides must be OrderedDicts
plain_as_ordered = OrderedDict(plain.items())
print("OrderedDict == OrderedDict (different order):", ordered == plain_as_ordered)  # False — correct!

print()

# ─── GOTCHA 3: Lambda factories break pickling ───
import pickle

# WRONG: lambda cannot be pickled
bad_cache = defaultdict(lambda: [])
try:
    pickle.dumps(bad_cache)
except Exception as error:
    print(f"Pickle error with lambda factory: {type(error).__name__}: {error}")

# RIGHT: use a named function or functools.partial
def make_empty_list():
    return []

good_cache = defaultdict(make_empty_list)   # named function IS picklable
good_cache["users"].append("alice")
pickled = pickle.dumps(good_cache)          # works fine
restored = pickle.loads(pickled)
print("Restored from pickle:", dict(restored))

# Also works: partial(list) is picklable
good_cache2 = defaultdict(partial(list))
good_cache2["admins"].append("bob")
print("partial(list) factory works:", dict(good_cache2))

▶ Output

Contact page not visited
Keys after wrong check: ['homepage', 'about', 'contact']
Contact page not visited (safe check)
Keys after safe check: ['homepage']

OrderedDict == plain dict (different order): True
OrderedDict == OrderedDict (different order): False

Pickle error with lambda factory: PicklingError: Can't pickle <function <lambda> at 0x...>: attribute lookup <lambda> on __main__ failed
Restored from pickle: {'users': ['alice']}
partial(list) factory works: {'admins': ['bob']}

⚠️ Watch Out: dict(my_defaultdict) drops the default factory

If you copy a defaultdict with dict(my_defaultdict) you get a plain dict, and the default factory is lost. Use my_defaultdict.copy(), copy.copy() or defaultdict(factory, existing_dict) to preserve the factory when copying.
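The copying behaviour is easy to check for yourself. This sketch compares the conversions side by side; the variable names are illustrative.

```python
import copy
from collections import defaultdict

original = defaultdict(list)
original["users"].append("alice")

as_plain = dict(original)               # plain dict: no factory any more
via_method = original.copy()            # still a defaultdict, factory kept
via_copy = copy.copy(original)          # same result as .copy()
rebuilt = defaultdict(list, original)   # new defaultdict with an explicit factory

print(type(as_plain).__name__)                  # dict
print(via_method.default_factory is list)       # True
print(via_copy.default_factory is list)         # True
print(rebuilt.default_factory is list)          # True

# The plain dict is back to raising KeyError on missing keys
try:
    as_plain["admins"].append("bob")
    plain_raised = False
except KeyError:
    plain_raised = True
print(plain_raised)                             # True

# These are shallow copies: the list objects are shared with the original
via_copy["users"].append("carol")
print(original["users"])                        # ['alice', 'carol']
```

The last two lines are worth a second look. A shallow copy duplicates the mapping but not the values, so appending through the copy also changes the original's list. Reach for copy.deepcopy() if you need independent values.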

Feature / Aspect	defaultdict	OrderedDict
Primary purpose	Auto-initialise missing keys	Guarantee and manipulate key order
Inherited from	dict	dict
Missing key behaviour	Calls factory, returns default value	Raises KeyError (same as plain dict)
Key method / feature	default_factory attribute	move_to_end(key, last=True/False)
Pop from front	Not supported natively	popitem(last=False) removes first item
Order-sensitive equality	No (same as plain dict)	Yes — two OrderedDicts with different order are NOT equal
Pickling with lambda factory	Fails — use named function instead	N/A — no factory involved
Best real-world use case	Grouping, counting, adjacency lists	LRU cache, priority configs, protocol fields
Python version needed	2.5+ (collections module)	2.7+ (collections module)
Still relevant in Python 3.7+?	Yes — factory auto-init is unique	Yes — move_to_end and order equality are unique

🎯 Key Takeaways

defaultdict's factory runs once per new key, on the first bracket access, not on every read; after that first touch the key behaves exactly like any other dict entry.
Accessing a missing defaultdict key with bracket notation CREATES the key with its default value; use 'in' for safe membership testing that doesn't mutate the dict.
OrderedDict is still the right tool in modern Python when order is part of your logic: move_to_end() powers clean LRU caches and order-sensitive equality catches sequence bugs that plain dicts would silently miss.
Never use a lambda as a defaultdict factory if the dict might be pickled or passed to a multiprocessing worker — use a named function or functools.partial instead.

⚠ Common Mistakes to Avoid

✕ Mistake 1: Accessing a missing defaultdict key to check its value. Symptom: phantom keys appear in the dict and show up during iteration or JSON serialisation. Fix: always use 'key in my_dict' for membership tests; only use bracket access when you actually want to read or write the value.
✕ Mistake 2: Comparing an OrderedDict to a plain dict and expecting order to matter. Symptom: ordered == plain returns True even when insertion orders differ, masking a bug. Fix: ensure BOTH operands are OrderedDicts before relying on order-sensitive equality; wrap the plain dict with OrderedDict(plain.items()) first.
✕ Mistake 3: Using a lambda as the default_factory in a defaultdict that needs to be pickled or sent via multiprocessing. Symptom: PicklingError or AttributeError at runtime, often only discovered when scaling to multiple workers. Fix: replace the lambda with a named module-level function or use functools.partial(list) / functools.partial(dict), which are both picklable.

Interview Questions on This Topic

Q: What is the difference between using dict.setdefault() and defaultdict, and when would you choose one over the other?
Q: Since Python 3.7 guarantees dict insertion order, can you give me a concrete example where you'd still prefer OrderedDict over a plain dict today?
Q: If I gave you a defaultdict(list) and asked you to prevent any NEW keys from being added after a certain point, how would you do it without changing the type?
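For the last question, one possible answer (a sketch, not the only approach) is to set default_factory to None once the build phase is over. That switches off automatic key creation while the object stays a defaultdict. The key names below are made up.

```python
from collections import defaultdict

groups = defaultdict(list)
groups["errors"].append("timeout")      # build phase: keys appear on demand

groups.default_factory = None           # "freeze": missing keys now raise KeyError

try:
    groups["warnings"].append("retry")  # would have created 'warnings' before
    blocked = False
except KeyError:
    blocked = True

groups["errors"].append("disk full")    # existing keys still work as usual

print(blocked)                          # True
print(type(groups).__name__)            # defaultdict
print(dict(groups))                     # {'errors': ['timeout', 'disk full']}
```

One caveat worth mentioning in an interview: this only stops implicit creation through bracket access. An explicit assignment such as groups["warnings"] = [] still adds the key, so a truly read-only view would need something like types.MappingProxyType.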

Frequently Asked Questions

Is OrderedDict still useful in Python 3.7 and later?

Yes. Regular dicts preserve insertion order since 3.7, but OrderedDict adds three things a plain dict never will: move_to_end() to reposition any key, popitem(last=False) to pop from the front in O(1), and order-sensitive equality so two OrderedDicts with different insertion orders compare as not equal. If any of those behaviours matter to you, reach for OrderedDict.

What happens if I access a key that doesn't exist in a defaultdict?

The default_factory is called with no arguments, the returned value is stored under that key, and the value is returned to you — no KeyError is raised. This happens only on the first access; subsequent accesses just return the stored value like a normal dict key.

Can I use a defaultdict where the default value depends on the key itself?

Not directly — the factory receives no arguments, so it can't see the key. The workaround is to subclass defaultdict and override __missing__(self, key), which does receive the key. That gives you full control over what default value to generate per key while keeping all other defaultdict behaviour intact.
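A minimal sketch of that workaround follows. The class name KeyAwareDefaultDict and the greeting factory are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class KeyAwareDefaultDict(defaultdict):
    """Hypothetical subclass whose factory is called WITH the missing key."""

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)             # same behaviour as defaultdict with no factory
        value = self.default_factory(key)   # pass the key to the factory
        self[key] = value                   # store it so later reads are plain lookups
        return value


greetings = KeyAwareDefaultDict(lambda name: f"Hello, {name}!")

first = greetings["alice"]    # missing key: __missing__ builds the value from the key
print(first)                  # Hello, alice!
print(dict(greetings))        # {'alice': 'Hello, alice!'}
print("bob" in greetings)     # False
```

As with a regular defaultdict, only bracket access goes through __missing__, so the membership test on the last line does not create an entry for "bob".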

TheCodeForge Editorial Team Verified Author
