Senior 3 min · March 17, 2026

defaultdict Silent Key Creation — Memory Leak Pattern

Reading dd[key] creates entries without assignment — one pipeline generated 3M phantom keys before OOM.

Naren · Founder
Quick Answer
  • defaultdict calls a factory function for every missing key — no more if-key-in-dict checks
  • Pass a callable such as list, int, set, or a lambda to control the default value
  • Regular dicts preserve insertion order since Python 3.7, so OrderedDict is only needed for order-sensitive equality and move_to_end()
  • Accessing a missing key creates an entry — accidental side effect when iterating
  • Using defaultdict(list) for grouping removes the check-then-create boilerplate from data processing loops
  • The hidden trap: defaultdict can mask KeyError that should crash – validate early in production pipelines

Two classes from Python's collections module that every developer should know: defaultdict eliminates the most repetitive pattern in Python data processing (checking if a key exists before appending to a list), and OrderedDict adds order-aware equality and the ability to move entries.

With regular dicts now maintaining insertion order since Python 3.7, the question of when to use OrderedDict is worth understanding clearly.

defaultdict — No More KeyError

defaultdict is a dict subclass that calls a factory function to supply missing values. Instead of writing if key not in dict: dict[key] = [] every time you group data, you pass the factory (list, int, set, or any callable) to the constructor. When you access a missing key, defaultdict calls the factory with no arguments and stores the result.

The factory is called only on __getitem__ (dd[key]), not on get() or in. That's the key distinction: dd.get('missing') returns None, not a new entry.

For counting, int() returns 0; for grouping, list() returns []. You can also pass lambda: default_value for custom defaults.
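As a quick check of the distinction above, the following sketch compares the three access styles on an empty defaultdict. The variable names are illustrative only.

```python
from collections import defaultdict

dd = defaultdict(list)

# Membership tests and .get() never call the factory
print('x' in dd)          # False
print(dd.get('x'))        # None
print(dd.get('x', []))    # [] (caller-supplied default, not stored)
print(len(dd))            # 0

# Bracket access on a missing key calls list() and stores the result
print(dd['x'])            # []
print(len(dd))            # 1
```

Only the last bracket read changes the dictionary; the earlier lookups leave it empty.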

Example · PYTHON
from collections import defaultdict

# Group words by their first letter
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry', 'apricot']

# Without defaultdict — verbose
groups = {}
for word in words:
    if word[0] not in groups:   # check before every append
        groups[word[0]] = []
    groups[word[0]].append(word)

# With defaultdict — clean
groups = defaultdict(list)     # list() called for every new key
for word in words:
    groups[word[0]].append(word)  # KeyError impossible

print(dict(groups))
# {'a': ['apple', 'avocado', 'apricot'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Count occurrences
counts = defaultdict(int)     # int() returns 0
for word in words:
    counts[word[0]] += 1
print(dict(counts))  # {'a': 3, 'b': 2, 'c': 1}
Output
{'a': ['apple', 'avocado', 'apricot'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
{'a': 3, 'b': 2, 'c': 1}
Mental Model: defaultdict as a factory with auto-store
  • Missing key triggers factory() -> result is stored -> returned as value.
  • Only bracket access (dd[key]) triggers the factory — .get() does not.
  • Once stored, the key behaves like any other dict key — no special treatment.
  • The factory is called with zero arguments — so list, int, set work directly.
  • For custom defaults, use lambda: 'default' — but be aware lambda captures closure state.
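The mental model above can be sketched in pure Python. This is not the real implementation (defaultdict is written in C), just a minimal stand-in that relies on the documented __missing__ hook of dict subclasses. The class name SketchDefaultDict is an assumption made for illustration.

```python
class SketchDefaultDict(dict):
    """Hypothetical stand-in that mimics defaultdict's missing-key behavior."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ only when the key is absent
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()   # zero-argument call
        self[key] = value                # auto-store
        return value


sd = SketchDefaultDict(list)
sd['a'].append(1)      # factory runs, [] is stored, then 1 is appended
print(sd)              # {'a': [1]}
print(sd.get('b'))     # None, because .get() bypasses __missing__
print(len(sd))         # 1
```

Because .get() and the in operator never go through __missing__, they cannot trigger the factory, which matches the behavior described in the list above.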
Production Insight
Accidental key creation causes silent memory leaks.
Reading dd[key] inside a loop creates an entry for every key that was not already present.
Rule: use .get(key) or a Counter when you don't want to create entries.
Key Takeaway
defaultdict eliminates boilerplate for grouping and counting.
But bracket access on a missing key creates an entry, so be deliberate about read vs write.
The factory is called only on bracket access, not on .get() or 'in'.

defaultdict with Custom Factories

Any callable can be the factory. Common choices: list, int, set, dict, or a lambda. For nested grouping (two-level keys), use a lambda returning a defaultdict. This is especially useful when you need to group by two fields without writing nested loops.

The factory is called fresh for every missing key. That means if you use a mutable object like a list, each new key gets its own independent list — exactly what you want.

Beware of using a lambda that returns a mutable object shared across all keys — that's a classic Python gotcha. Always create a new object per key.

Example · PYTHON
from collections import defaultdict

# Any callable works as a factory
dd_set   = defaultdict(set)        # empty set for missing keys
dd_zero  = defaultdict(lambda: 0)  # 0 for missing keys
dd_const = defaultdict(lambda: 'N/A')  # string constant

# Nested defaultdict — for 2-level grouping
transactions = [
    ('2026-03-01', 'Engineering', 500),
    ('2026-03-01', 'Marketing',   300),
    ('2026-03-02', 'Engineering', 700),
]

# Date → Department → total
monthly = defaultdict(lambda: defaultdict(int))
for date, dept, amount in transactions:
    monthly[date][dept] += amount

print(dict(monthly['2026-03-01']))
# {'Engineering': 500, 'Marketing': 300}
Output
{'Engineering': 500, 'Marketing': 300}
Shared mutable gotcha
Never use a factory that returns the same mutable object for every key. Each key needs its own list, so the factory must create a new empty container on every call. This is correct: lambda: []. This is wrong: lambda: shared_list, where shared_list is a single list captured from the enclosing scope.
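To make the gotcha concrete, here is a small sketch that contrasts the two factories. The name shared_list is a hypothetical module-level list used only for the demonstration.

```python
from collections import defaultdict

shared_list = []  # hypothetical list captured by the lambda below

wrong = defaultdict(lambda: shared_list)   # every key gets the same object
wrong['a'].append(1)
wrong['b'].append(2)
print(wrong['a'])                 # [1, 2]
print(wrong['a'] is wrong['b'])   # True

right = defaultdict(lambda: [])   # a new list per key (same as defaultdict(list))
right['a'].append(1)
right['b'].append(2)
print(right['a'])                 # [1]
print(right['a'] is right['b'])   # False
```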
Production Insight
Nested defaultdict with lambda can hide missing keys in the outer layer.
Because monthly[date] returns an empty defaultdict for an unseen date instead of raising, a missing date can pass through without any error or log line.
Rule: validate data completeness before using nested defaults — or use a custom class.
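One possible way to keep nested defaults from hiding missing dates, offered as a sketch rather than a prescribed fix, is to build with defaultdict and then switch the factories off once loading is done. Setting default_factory to None makes later missing-key lookups raise KeyError. The freeze helper and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

transactions = [
    ('2026-03-01', 'Engineering', 500),
    ('2026-03-02', 'Engineering', 700),
]

monthly = defaultdict(lambda: defaultdict(int))
for date, dept, amount in transactions:
    monthly[date][dept] += amount

def freeze(nested):
    """Hypothetical helper: turn off default creation at both levels."""
    nested.default_factory = None
    for inner in nested.values():
        inner.default_factory = None

freeze(monthly)

try:
    monthly['2026-03-03']          # date never seen in the input
except KeyError as exc:
    print('missing date:', exc)    # missing date: '2026-03-03'

print(len(monthly))                # 2, no phantom date was created
```

The write phase keeps the convenience of auto-creation, while the read phase fails loudly on anything that was never loaded.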
Key Takeaway
Nested defaultdict(lambda: defaultdict(type)) is the standard pattern for two-level grouping.
Each callable factory creates independent objects — no sharing.
Use set for deduplication, int for counting, list for ordering.

When Not to Use defaultdict — The Gotchas

defaultdict isn't always the right choice. Three scenarios where it backfires: first, when you need to distinguish between missing keys and keys with empty values; second, when iterating over keys and unintentionally creating new ones; third, when you want to raise KeyError for missing keys in validation logic.

Use a regular dict with .setdefault() for explicit creation. Use Counter for counting (it gives most_common()). For read-heavy workflows, use a regular dict and handle KeyError explicitly.

Example · PYTHON
from collections import defaultdict

# Gotcha 1: Accidental creation in iteration
dd = defaultdict(list)
items = ['a', 'b', 'c']
for item in items:
    if dd[item]:  # This creates an empty list for 'a', 'b', 'c' even if not present
        pass
print(len(dd))  # 3 — all created

# Fix: use .get()
dd = defaultdict(list)
for item in items:
    if dd.get(item):  # does not create entry
        pass
print(len(dd))  # 0

# Gotcha 2: Masking KeyError that should propagate
def process_key(dd, key):
    # If key missing, we want to crash, not create a default
    value = dd[key]  # creates default silently
    return value.upper()  # possible AttributeError if default is [], not str
Output
3
0
Prefer Counter for counting
If you're counting hashable objects, use collections.Counter. It's a subclass of dict that provides most_common(), subtract(), and mathematical operations. Counter also returns 0 for missing keys (via __missing__) without creating entries — unlike defaultdict(int) which creates them.
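A short side-by-side sketch of that difference, using a made-up word list:

```python
from collections import Counter, defaultdict

words = ['apple', 'avocado', 'banana']

counter = Counter(word[0] for word in words)
dd = defaultdict(int)
for word in words:
    dd[word[0]] += 1

# Reading a missing key: both return 0, only defaultdict stores it
print(counter['z'], len(counter))   # 0 2
print(dd['z'], len(dd))             # 0 3

print(counter.most_common(1))       # [('a', 2)]
```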
Production Insight
Using defaultdict when you want to fail-fast on missing keys hides bugs.
In data pipelines, missing keys often indicate upstream corruption.
Rule: use regular dict and catch KeyError explicitly if missing keys are exceptional.
Key Takeaway
defaultdict is for grouping and counting — not for data validation.
Use .get(key) for read-only checks.
Use regular dict with explicit exception handling when missing keys should crash.
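The alternatives named in this section can be sketched with a plain dict. The require helper below is hypothetical; it only illustrates one way to surface a missing key with context instead of silently creating a default.

```python
records = [('a', 1), ('b', 2), ('a', 3)]

# Explicit creation: setdefault only runs where you write it
groups = {}
for key, value in records:
    groups.setdefault(key, []).append(value)
print(groups)                      # {'a': [1, 3], 'b': [2]}

# Read-only check: no entry is created
print(groups.get('z'))             # None

# Fail fast: a missing key is treated as an error
def require(mapping, key):
    """Hypothetical helper that surfaces missing keys with context."""
    try:
        return mapping[key]
    except KeyError:
        raise KeyError(f'expected key {key!r} is missing') from None

try:
    require(groups, 'z')
except KeyError as exc:
    print('validation failed:', exc)
```

With a plain dict the creation points are visible in the code, so a lookup that should never miss cannot quietly add an entry.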

OrderedDict — When It Still Matters

Since Python 3.7, regular dicts maintain insertion order, so OrderedDict is no longer needed for basic order preservation. But two features remain unique: first, order-sensitive equality (two OrderedDicts with the same items in different order compare as unequal); second, move_to_end(), which moves an existing key to the end (or front) without deleting and reinserting.

OrderedDict does not save memory: it uses more than a regular dict because it maintains a linked list to track order. The real value is the equality semantics and the reordering methods.

Use cases: LRU caches (though functools.lru_cache is better), custom ordering in configuration, test assertions where insertion order must be strict.

Example · PYTHON
from collections import OrderedDict

# Since Python 3.7, regular dicts maintain insertion order
# So when is OrderedDict useful?

# 1. Order-sensitive equality
od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])
regular1 = {'a': 1, 'b': 2}
regular2 = {'b': 2, 'a': 1}

print(od1 == od2)       # False — order matters for OrderedDict
print(regular1 == regular2)  # True — regular dict ignores order

# 2. move_to_end() — useful for LRU patterns
cache = OrderedDict()
cache['page1'] = 'content1'
cache['page2'] = 'content2'
cache['page3'] = 'content3'

cache.move_to_end('page1')  # move to most recently used
print(list(cache.keys()))   # ['page2', 'page3', 'page1']

cache.move_to_end('page3', last=False)  # move to front (least recently used position)
print(list(cache.keys()))   # ['page3', 'page2', 'page1']
Output
False
True
['page2', 'page3', 'page1']
['page3', 'page2', 'page1']
Performance note
OrderedDict uses roughly twice the memory of a regular dict because it maintains a doubly linked list over the keys. If you only need insertion order, a regular dict already provides it; reserve OrderedDict for collections that need reordering or order-sensitive equality.
Production Insight
OrderedDict equality can break test assertions that depend on order.
If you serialize to JSON, order is preserved—but consumers may not expect it.
Rule: only use OrderedDict when you need move_to_end() or order-sensitive comparisons.
Key Takeaway
Regular dict is insertion-ordered since 3.7 — use it most of the time.
OrderedDict is for order-equality and move_to_end().
LRU caches: consider functools.lru_cache or manual dict pop + reassign.
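Both alternatives from the takeaway can be sketched briefly. The render function is a hypothetical stand-in for an expensive call, and the plain-dict part relies on the insertion-order guarantee of Python 3.7+.

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def render(page):
    """Hypothetical expensive function; the body stands in for real work."""
    return f'content for {page}'

render('page1')
render('page2')
render('page1')                    # served from the cache
info = render.cache_info()
print(info.hits, info.misses)      # 1 2

# Manual alternative with a plain dict: pop and reassign moves a key to the end
cache = {'page1': 'c1', 'page2': 'c2', 'page3': 'c3'}
cache['page1'] = cache.pop('page1')    # mark page1 as most recently used
print(list(cache))                     # ['page2', 'page3', 'page1']

oldest = next(iter(cache))             # first key is the least recently used
del cache[oldest]
print(list(cache))                     # ['page3', 'page1']
```

lru_cache fits when the cached value is a pure function of its arguments; the manual pattern is closer to what OrderedDict offers when you need to manage entries yourself.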

Real-World Patterns: URL Routing and LRU Cache

defaultdict and OrderedDict are common in production frameworks. Two examples: a simple URL router that groups handlers by HTTP method using defaultdict(list), and an LRU cache that evicts the least recently used entry when its size exceeds a threshold, built on OrderedDict.move_to_end() and popitem(last=False).

These patterns show how dict subclasses bridge the gap between raw Python and production-grade data structures.

Example · PYTHON
from collections import defaultdict, OrderedDict

# URL router using defaultdict(list)
routes = defaultdict(list)
routes['GET'].append('/users')
routes['POST'].append('/users')
routes['GET'].append('/users/<id>')
print(dict(routes))
# {'GET': ['/users', '/users/<id>'], 'POST': ['/users']}

# LRU cache using OrderedDict
class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
Output
{'GET': ['/users', '/users/<id>'], 'POST': ['/users']}
Production Insight
An LRU cache with popitem(last=False) uses O(1) operations.
But if capacity is small, eviction happens often — monitor hit rate.
Rule: always add a hit counter or logging to detect thrashing.
Key Takeaway
defaultdict simplifies routing and grouping patterns.
OrderedDict makes LRU insertion/eviction trivial.
For high-concurrency use, guard get() and put() with a lock: each one performs several dict operations that are not atomic together.
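As a rough sketch of the hit-counter rule, and of one way to make the cache safer under threads, the variant below wraps each operation in a threading.Lock and tracks hits and misses. The class name, the default argument on get, and the hit_rate method are assumptions, not part of the LRUCache shown earlier.

```python
import threading
from collections import OrderedDict

class CountingLRUCache:
    """Hypothetical LRU cache variant with a lock and hit statistics."""

    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity
        self.hits = 0
        self.misses = 0
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:               # lookup and reorder happen as one step
            if key not in self.cache:
                self.misses += 1
                return default
            self.hits += 1
            self.cache.move_to_end(key)
            return self.cache[key]

    def put(self, key, value):
        with self._lock:
            self.cache[key] = value
            self.cache.move_to_end(key)
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict least recently used

    def hit_rate(self):
        with self._lock:
            total = self.hits + self.misses
            return self.hits / total if total else 0.0


lru = CountingLRUCache(capacity=2)
lru.put('a', 1)
lru.put('b', 2)
lru.get('a')          # hit, 'a' becomes most recent
lru.put('c', 3)       # evicts 'b'
lru.get('b')          # miss
print(list(lru.cache))    # ['a', 'c']
print(lru.hit_rate())     # 0.5
```

A hit rate that stays low under steady traffic is a sign the capacity is too small for the working set.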
● Production incident · POST-MORTEM · severity: high

Silent key creation in production grouping pipeline

Symptom
The pipeline processes millions of records grouped by date. After a month of running fine, it started consuming 8 GB memory and finally OOM-killed. Logs showed no errors, just high memory.
Assumption
Team assumed defaultdict only creates entries when explicitly assigned, not on access.
Root cause
The code iterated over a list and read dd[key] to check if key existed — each access created a new empty entry. Over 10 million records, this generated 3 million extra keys that were never used.
Fix
Use dd.get(key) for read-only checks, or use a regular dict with setdefault() for conditional creation.
Key lesson
  • defaultdict creates entries on bracket access to a missing key, not just on assignment.
  • When debugging memory growth, check for accidental key creation in loops.
  • Use .get() or a Counter when you only need to read, not write.
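When checking for accidental key creation, one simple heuristic is to look for entries that still hold the factory's empty default. The sketch below reproduces the bug on a tiny scale; the keys and the lookups list are made up for illustration.

```python
from collections import defaultdict

dd = defaultdict(list)
dd['2026-03-01'].append('order-1')      # real write

lookups = ['2026-03-01', '2026-03-02', '2026-03-03']   # hypothetical read-only checks
for key in lookups:
    if dd[key]:                          # bug: bracket read creates missing keys
        pass

# Entries that still equal the factory's empty default were likely never written to
phantom = [key for key, value in dd.items() if value == dd.default_factory()]
print(len(dd))       # 3
print(phantom)       # ['2026-03-02', '2026-03-03']
```

This is only a heuristic: a key that legitimately holds an empty value shows up in the same list, so treat the result as a lead to investigate rather than proof.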
Production debug guide · Quick symptom-action matrix for the most common production issues · 4 entries
Symptom · 01
Memory usage grows unexpectedly when processing large datasets with defaultdict
Fix
Check for unintentional key creation: replace dd[key] with dd.get(key) in read-only contexts, or switch to collections.Counter for counting.
Symptom · 02
OrderedDict equality behaves differently than regular dict
Fix
If order-insensitive comparison needed, convert both to regular dict with dict(od) before comparing.
Symptom · 03
Nested defaultdict raises KeyError on inner access?
Fix
Ensure the outer defaultdict factory returns a defaultdict, e.g., defaultdict(lambda: defaultdict(int)).
Symptom · 04
move_to_end() doesn't reorder cache as expected
Fix
Verify last parameter: last=True moves to end (most recent), last=False moves to front (least recent).
★ Quick Debug Cheat Sheet for defaultdict & OrderedDict · Three commands or checks to diagnose the most common problems immediately.
Accidental key creation in loop
Immediate action
Find all dd[key] access patterns in the loop
Commands
grep -rnE '\w+\[[^]]+\]' --include='*.py' .  # list subscript reads such as dd[key]
python -c "import collections; d = collections.defaultdict(int); print(d['missing'])" # creates key
Fix now
Replace dd[key] with dd.get(key) for read-only checks, or use a regular dict with setdefault.
OrderedDict equality mismatch
Immediate action
Check if both dicts are OrderedDict instances
Commands
type(od1), type(od2) # both must be OrderedDict for order-sensitive equality
print(list(od1.items()) == list(od2.items())) # compare order explicitly; item views alone compare like sets
Fix now
If order-insensitive, convert to dict: dict(od1) == dict(od2)
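The comparison rules behind these checks can be seen side by side in a short sketch (the sample mappings are illustrative):

```python
from collections import OrderedDict

od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])

print(od1 == od2)                               # False, both are OrderedDict
print(od1 == {'b': 2, 'a': 1})                  # True, OrderedDict vs dict ignores order
print(dict(od1) == dict(od2))                   # True, order-insensitive
print(list(od1.items()) == list(od2.items()))   # False, explicit order check
print(od1.items() == od2.items())               # True, item views compare like sets
```

Order only matters when both operands are OrderedDict instances, which is why checking the types first is a useful step.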
Nested defaultdict fails on inner key
Immediate action
Check the outer factory returns a defaultdict
Commands
print(isinstance(dd['outer'], defaultdict)) # should be True
dd = defaultdict(lambda: defaultdict(int)) # correct nested pattern
Fix now
Use lambda: defaultdict(inner_factory) as the outer factory.
dict vs defaultdict vs OrderedDict
Feature | dict | defaultdict | OrderedDict
Insertion order preservation (3.7+) | Yes | Yes | Yes
Default value on missing key | No (KeyError) | Yes (factory) | No (KeyError)
Order-sensitive equality | No | No | Yes
move_to_end() | No | No | Yes
Memory overhead vs plain dict | Baseline | Slightly more (factory) | ~2x (linked list)
Use case | General purpose | Grouping, counting | Order equality, LRU

Key takeaways

1. defaultdict(list) eliminates the if-key-not-in-dict pattern for grouping operations.
2. The factory function (list, int, set, or a lambda) is called with no arguments when a missing key is accessed.
3. Accessing a missing key in defaultdict creates it: be aware of this when iterating.
4. Regular dicts maintain insertion order since Python 3.7, so OrderedDict is mostly only needed for order-aware equality and move_to_end().
5. For counting, consider Counter from collections: it provides most_common() and is more expressive than defaultdict(int).
6. Nested defaultdict requires lambda: defaultdict(inner_type) for proper two-level grouping.
7. Use defaultdict for grouping and counting; use regular dict for validation where missing keys should raise exceptions.

Common mistakes to avoid

4 patterns

Using defaultdict when you need to raise KeyError for missing keys

Symptom
Silent creation of default values masks data integrity issues in production pipelines.
Fix
Use a regular dict with explicit try/except KeyError, or use get() to read without creation.

Iterating over keys and accidentally creating new entries

Symptom
Memory grows unexpectedly; the dict contains keys that were never supposed to exist.
Fix
Use .get(key) or key in dd for read-only checks inside loops.

Using OrderedDict when a regular dict suffices (no reordering needed)

Symptom
Unnecessary memory overhead and code that implies order-sensitive logic that isn't used.
Fix
Prefer regular dict for most cases; reserve OrderedDict for move_to_end() or order equality.

Nested defaultdict with wrong factory: defaultdict(defaultdict) instead of lambda: defaultdict(int)

Symptom
KeyError on inner key access, because the inner defaultdict is created without a factory (default_factory is None).
Fix
Use default_factory = lambda: defaultdict(inner_type) for nested defaultdicts.
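A minimal sketch of what goes wrong with the bare class as the outer factory, next to the lambda form. The date and department keys are sample values.

```python
from collections import defaultdict

wrong = defaultdict(defaultdict)        # inner dicts are created with no factory
inner = wrong['2026-03-01']
print(inner.default_factory)            # None
try:
    inner['Engineering'] += 500
except KeyError as exc:
    print('KeyError:', exc)             # KeyError: 'Engineering'

right = defaultdict(lambda: defaultdict(int))
right['2026-03-01']['Engineering'] += 500
print(right['2026-03-01']['Engineering'])   # 500
```

The outer lookup succeeds in both cases; only the inner level differs, which is why the failure appears one step later than expected.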
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR
What problem does defaultdict solve compared to a regular dict?
Q02 · SENIOR
Is OrderedDict still useful in Python 3.7 and later?
Q03 · SENIOR
How would you group a list of items by a property without using defaultd...
Q04 · SENIOR
Explain the difference between defaultdict.__missing__ and the factory f...
Q01 of 04 · JUNIOR

What problem does defaultdict solve compared to a regular dict?

ANSWER
It eliminates the boilerplate of checking if a key exists before inserting into a list or incrementing a counter. Instead of writing 'if key in d: d[key].append(x) else: d[key] = [x]', you just write 'd[key].append(x)' where d is a defaultdict(list). The factory function is called automatically for missing keys.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
Does accessing a missing key in defaultdict modify the dictionary?
02
Should I use OrderedDict or a regular dict in Python 3.7+?
03
What is the difference between defaultdict and Counter?
04
Can I use defaultdict for an LRU cache?