Python collections Module Explained — namedtuple, Counter, defaultdict & More
Every Python developer reaches a point where a regular dict just isn't cutting it. You're writing a word-frequency counter and you keep checking 'does this key exist yet?' before incrementing it. Or you're modelling a playing card and passing around a plain tuple, secretly hoping no one accesses index 0 when they meant index 1. These are the friction points that the collections module was built to eliminate — and it's been shipping with Python since version 2.4, which tells you how battle-tested it is.
The collections module solves a specific class of problem: everyday data-wrangling tasks that are just awkward enough with built-in types to make you write boilerplate, but not complex enough to justify a full third-party library. Instead of writing a three-line 'if key not in dict' dance every time you want a default value, defaultdict handles it in zero extra lines. Instead of sorting a list of tuples and trying to remember which index means 'price', namedtuple gives every field a readable name. The module trades a small learning investment for a massive reduction in repetitive, error-prone code.
By the end of this article you'll know exactly which collection to reach for when you're counting things, building queues, working with structured data, or grouping items. You'll also understand the performance trade-offs, the traps beginners fall into, and how to talk about these types confidently in a technical interview.
Counter — Count Anything in One Line
Counter is a subclass of dict built for tallying. You hand it any iterable — a string, a list, a file of words — and it hands back a dict-like object where each key is an element and each value is how many times that element appeared. That's it. That's the whole job, and it does it beautifully.
Where Counter shines beyond a plain dict is in the helper methods it ships with. most_common(n) returns the n highest-frequency items sorted descending — perfect for building a leaderboard or a word-cloud dataset. You can also add two Counters together with + to merge tallies, or subtract with - to find what's missing. These operations make Counter genuinely composable in real pipelines.
The most important mental model: treat Counter like a bag (multiset) rather than a set. Bags allow duplicates and track multiplicity. When you need to know not just what exists but how many times, Counter is your type. A common real-world use-case is analysing HTTP access logs to find the most-requested endpoints, or scoring a Scrabble hand by letter frequency.
```python
from collections import Counter

# Imagine analysing customer feedback from a support system
feedback_words = [
    "slow", "buggy", "slow", "great", "slow",
    "buggy", "excellent", "great", "slow", "excellent"
]

# Counter tallies every element automatically — no manual dict initialisation
word_tally = Counter(feedback_words)
print("Full tally:", word_tally)
# Output: Counter({'slow': 4, 'buggy': 2, 'great': 2, 'excellent': 2})

# most_common gives you a ranked list — top 3 complaints at a glance
top_issues = word_tally.most_common(3)
print("Top 3 words:", top_issues)
# Output: [('slow', 4), ('buggy', 2), ('great', 2)]

# Counters support arithmetic — merge two batches of feedback
week2_feedback = Counter(["slow", "excellent", "excellent", "buggy"])
combined = word_tally + week2_feedback
print("Combined over 2 weeks:", combined)
# Output: Counter({'slow': 5, 'excellent': 4, 'buggy': 3, 'great': 2})

# Accessing a missing key returns 0 — not a KeyError like a plain dict
print("Count of 'terrible':", word_tally["terrible"])
# Output: Count of 'terrible': 0
```
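Subtraction deserves a quick example of its own, because the two forms behave differently: the `-` operator keeps only positive counts, while `subtract()` mutates in place and keeps zeros and negatives. A minimal sketch (the inventory names and numbers are hypothetical):

```python
from collections import Counter

# Stock we should have vs stock actually counted on the shelf
expected = Counter({"widget": 10, "gadget": 5, "gizmo": 2})
counted = Counter({"widget": 7, "gadget": 5})

# '-' drops zero and negative results, so only genuine shortfalls remain
missing = expected - counted
print(missing)  # Counter({'widget': 3, 'gizmo': 2})

# subtract() mutates in place and keeps zero counts, useful for a full audit trail
audit = Counter(expected)
audit.subtract(counted)
print(audit)  # Counter({'widget': 3, 'gizmo': 2, 'gadget': 0})
```

The operator form is the one you usually want in pipelines; `subtract()` is for when a zero count is itself meaningful information.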
defaultdict — Stop Writing 'if key not in dict' Forever
A defaultdict is a dict that automatically creates a value for a key that doesn't exist yet, the moment you first access it. You supply a callable (like int, list, or set) when you create it — that callable is invoked to produce the default value. There's no KeyError, no boilerplate guard clause, no setdefault gymnastics.
The canonical use-case is grouping. Suppose you have a list of (student, subject) pairs and you want to build a dict that maps each student to a list of their subjects. With a plain dict you'd write three lines per insertion: check if key exists, create an empty list if not, then append. With defaultdict(list) you just append — the empty list is created automatically on first access.
Under the hood, defaultdict overrides __missing__, which is the method dict calls when a key lookup fails. This means it behaves identically to a regular dict in every other way — you can iterate it, json.dumps it directly (it is still a dict subclass), and pass it anywhere a dict is expected. The only difference is that absent-key lookups no longer raise; they construct. Note that only bracket access triggers __missing__ — the .get() method and the in operator never create keys.
```python
from collections import defaultdict

# Raw enrolment data — each tuple is (student_name, subject)
enrolments = [
    ("Alice", "Maths"), ("Bob", "Science"), ("Alice", "Science"),
    ("Charlie", "Maths"), ("Bob", "History"), ("Alice", "History"),
]

# defaultdict(list) creates an empty list automatically for any new key
student_subjects = defaultdict(list)
for student, subject in enrolments:
    # No 'if student not in student_subjects' needed — it just works
    student_subjects[student].append(subject)

print("Student subjects:", dict(student_subjects))
# Output: {'Alice': ['Maths', 'Science', 'History'], 'Bob': ['Science', 'History'], 'Charlie': ['Maths']}

# defaultdict(int) is perfect for manual counting without Counter
vote_counts = defaultdict(int)
votes = ["Alice", "Bob", "Alice", "Charlie", "Alice", "Bob"]
for candidate in votes:
    vote_counts[candidate] += 1  # 0 is the default, so += 1 works on first access

print("Vote counts:", dict(vote_counts))
# Output: {'Alice': 3, 'Bob': 2, 'Charlie': 1}

# defaultdict(set) lets you build unique-value groups effortlessly
page_visitors = defaultdict(set)
visits = [("home", "user_1"), ("home", "user_2"), ("home", "user_1"), ("about", "user_1")]
for page, user in visits:
    page_visitors[page].add(user)  # set deduplicates automatically

print("Unique visitors per page:", dict(page_visitors))
# Output: {'home': {'user_1', 'user_2'}, 'about': {'user_1'}}
```
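Because the default comes from any zero-argument callable, you are not limited to int, list, or set: a lambda works too. A small sketch with a hypothetical price list, also showing the side effect that catches beginners (a bracket lookup creates the key, while .get() does not):

```python
from collections import defaultdict

# A lambda lets you pick any default, here a fallback price of 9.99
prices = defaultdict(lambda: 9.99, {"coffee": 3.50, "sandwich": 6.00})

print(prices["coffee"])    # 3.5  (existing key: a normal dict lookup)
print(prices["pastry"])    # 9.99 (missing key: the factory runs via __missing__)

# Careful: that lookup CREATED the key as a side effect
print("pastry" in prices)  # True

# .get() bypasses __missing__ entirely, so no key is created
print(prices.get("tea"))   # None
print("tea" in prices)     # False
```

This is exactly why existence checks on a defaultdict should use `in` or `.get()`, never bracket access.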
namedtuple — Give Your Tuples a Memory
A plain tuple is positional amnesia. (52.3, -1.8) means nothing until you remember whether index 0 is latitude or longitude. namedtuple fixes this by generating a tuple subclass where every position has a name. It's immutable like a tuple, memory-efficient like a tuple, but readable like an object.
The magic is that namedtuple generates a real class at runtime — complete with __repr__, __eq__, and field-access by attribute name. You get all the benefits of a lightweight data class without the overhead of a full class definition. This is why named-field tuples show up constantly in the standard library: urllib.parse.urlparse and inspect.getfullargspec return genuine namedtuples, and C-level types such as os.stat_result and sys.version_info follow the same named-field pattern.
The best mental model: namedtuple is the right choice when your data is immutable, has a fixed number of fields, and you want readable field access without the overhead of a full class. If you need mutability or methods, reach for dataclasses instead. namedtuple slots neatly between raw tuples (too opaque) and full classes (too heavy).
```python
from collections import namedtuple

# Define the structure once — this creates a new class called 'Product'
Product = namedtuple('Product', ['name', 'price', 'stock', 'category'])

# Instantiate just like a class — no dict, no positional-index guessing
laptop = Product(name="ProBook 450", price=899.99, stock=12, category="Electronics")
headphones = Product(name="SoundWave Pro", price=149.99, stock=35, category="Audio")
desk = Product(name="Standing Desk", price=399.00, stock=5, category="Furniture")

# Access fields by name — code reads like a sentence, not a puzzle
print(f"{laptop.name} costs £{laptop.price} and has {laptop.stock} units in stock.")
# Output: ProBook 450 costs £899.99 and has 12 units in stock.

# namedtuple is still a tuple — indexing and unpacking both work
print("Price via index:", laptop[1])  # backwards-compatible with tuple code
# Output: Price via index: 899.99

# _replace creates a new instance with one field changed (remember: it's immutable)
updated_laptop = laptop._replace(stock=10)
print("Updated stock:", updated_laptop.stock)
# Output: Updated stock: 10

# Works seamlessly in a list — sort by price using attribute access
catalogue = [laptop, headphones, desk]
by_price = sorted(catalogue, key=lambda product: product.price)
for item in by_price:
    print(f"{item.name}: £{item.price}")
# Output:
# SoundWave Pro: £149.99
# Standing Desk: £399.0
# ProBook 450: £899.99

# _asdict returns a regular dict (an OrderedDict before Python 3.8) — handy for JSON serialisation
print(laptop._asdict())
# Output: {'name': 'ProBook 450', 'price': 899.99, 'stock': 12, 'category': 'Electronics'}
```
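Since Python 3.7, namedtuple also accepts a defaults parameter, applied to the rightmost fields, which is handy when most records share a value. A short sketch reusing the hypothetical Product shape from above:

```python
from collections import namedtuple

# defaults apply right-to-left: stock defaults to 0, category to "General"
Product = namedtuple('Product', ['name', 'price', 'stock', 'category'],
                     defaults=[0, "General"])

mug = Product("Coffee Mug", 7.99)  # stock and category fall back to defaults
print(mug)
# Output: Product(name='Coffee Mug', price=7.99, stock=0, category='General')

# _fields and _field_defaults document the structure at runtime
print(Product._fields)          # ('name', 'price', 'stock', 'category')
print(Product._field_defaults)  # {'stock': 0, 'category': 'General'}
```

Because defaults bind right-to-left, put your most optional fields last when designing the tuple.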
deque — The Double-Ended Queue That Outperforms Lists
A Python list is secretly bad at one thing: inserting or removing elements from the front. list.pop(0) or list.insert(0, item) are O(n) operations because Python has to shift every other element in memory. For small lists you'll never notice. For a queue processing thousands of events per second, it's a hidden bottleneck.
deque (pronounced 'deck', short for double-ended queue) solves this with O(1) appends and pops from both ends. It's backed by a doubly-linked list of fixed-size blocks, so adding or removing from either end never requires shifting. Use deque any time your data structure is conceptually a queue (first-in, first-out) or a stack (last-in, first-out), or when you need a sliding window of the last N items.
The maxlen parameter is one of deque's killer features. When maxlen is set, the deque automatically discards items from the opposite end when it fills up. This gives you a rolling window — a fixed-size buffer that always holds the most recent N items — in zero extra code. Think: last 100 log lines, last 10 sensor readings, last 5 user actions for an undo buffer.
```python
from collections import deque

# --- USE CASE 1: Efficient task queue ---
# Simulating a print job queue in an office
print_queue = deque()

# Staff submitting print jobs
print_queue.append("Invoice_March.pdf")        # Normal priority — appended to the right
print_queue.append("Report_Q1.xlsx")
print_queue.appendleft("URGENT_Contract.pdf")  # High priority — jumps to the front

print("Queue state:", list(print_queue))
# Output: Queue state: ['URGENT_Contract.pdf', 'Invoice_March.pdf', 'Report_Q1.xlsx']

# Process jobs FIFO — popleft takes from the front, O(1) not O(n)
while print_queue:
    job = print_queue.popleft()
    print(f"Printing: {job}")
# Output:
# Printing: URGENT_Contract.pdf
# Printing: Invoice_March.pdf
# Printing: Report_Q1.xlsx

print()

# --- USE CASE 2: Rolling window with maxlen ---
# Keeping a live feed of the last 4 server response times (milliseconds)
response_times = deque(maxlen=4)  # Only ever holds the 4 most recent readings

readings = [120, 135, 98, 210, 87, 310, 95]
for reading in readings:
    response_times.append(reading)  # When full, the oldest reading drops off the left automatically
    avg = sum(response_times) / len(response_times)
    print(f"Added {reading}ms | Window: {list(response_times)} | Avg: {avg:.1f}ms")
# Output shows the window sliding — oldest values drop as new ones arrive
```
```
Added 120ms | Window: [120] | Avg: 120.0ms
Added 135ms | Window: [120, 135] | Avg: 127.5ms
Added 98ms | Window: [120, 135, 98] | Avg: 117.7ms
Added 210ms | Window: [120, 135, 98, 210] | Avg: 140.8ms
Added 87ms | Window: [135, 98, 210, 87] | Avg: 132.5ms
Added 310ms | Window: [98, 210, 87, 310] | Avg: 176.2ms
Added 95ms | Window: [210, 87, 310, 95] | Avg: 175.5ms
```
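Beyond queues and windows, deque has one trick that lists lack entirely: rotate(n) shifts every element n steps to the right (negative n goes left) without copying. A sketch of a hypothetical round-robin scheduler built on it:

```python
from collections import deque

# Round-robin over worker names; rotate moves the head in O(k) with no copying
workers = deque(["alpha", "bravo", "charlie", "delta"])

for _ in range(6):
    current = workers[0]   # peek at whoever is up next
    print("Serving:", current)
    workers.rotate(-1)     # rotate left: the front worker moves to the back

print(list(workers))  # ['charlie', 'delta', 'alpha', 'bravo']
```

Doing the same with a list would mean pop(0) plus append on every turn, which is exactly the O(n) pattern this section warns against.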
| Collection Type | Best Used When | Key Advantage Over Built-in | Mutability |
|---|---|---|---|
| Counter | Tallying/frequency analysis | most_common(), arithmetic merging, 0 for missing keys | Mutable |
| defaultdict | Grouping or accumulating into collections | Auto-creates missing keys — eliminates KeyError guards | Mutable |
| namedtuple | Immutable records with named fields (e.g. DB rows) | Field names instead of index numbers, lighter than a regular class instance (no per-instance __dict__) | Immutable |
| deque | FIFO queues, stacks, or rolling windows | O(1) append/pop on both ends vs O(n) for list.pop(0) | Mutable |
| OrderedDict | Dicts where insertion order matters (pre-Python 3.7) | Remembers insertion order + reorder methods (move_to_end) | Mutable |
| ChainMap | Layered config (env > config file > defaults) | Logical merge of multiple dicts without copying | Mutable (first map only) |
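The last two rows of the table are easy to demonstrate. ChainMap searches its maps left to right without copying any of them, and OrderedDict adds reordering that a plain dict lacks. A sketch with hypothetical config layers:

```python
from collections import ChainMap, OrderedDict

# Lookup order: CLI args beat the config file, the config file beats defaults
defaults = {"theme": "light", "timeout": 30, "retries": 3}
config_file = {"theme": "dark"}
cli_args = {"timeout": 60}

settings = ChainMap(cli_args, config_file, defaults)
print(settings["timeout"])  # 60   (found in the first map)
print(settings["theme"])    # dark (first map misses, second map wins)
print(settings["retries"])  # 3    (falls through to defaults)

# Writes go to the FIRST map only; the layers underneath are untouched
settings["retries"] = 5
print(cli_args)             # {'timeout': 60, 'retries': 5}
print(defaults["retries"])  # 3

# OrderedDict.move_to_end reorders in place, the basis of many LRU caches
recent = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
recent.move_to_end("a")     # 'a' becomes the most recently used entry
print(list(recent))         # ['b', 'c', 'a']
```

The no-copy property is what makes ChainMap attractive for config: changing the underlying config_file dict is immediately visible through the chain.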
🎯 Key Takeaways
- Counter eliminates manual frequency-counting boilerplate and adds most_common() and arithmetic merging — reach for it any time you need to tally anything.
- defaultdict auto-creates missing keys on first access, making grouping patterns a single line — but never use bracket notation to check if a key exists, or you'll silently create phantom entries.
- namedtuple is a zero-overhead way to add field names to a tuple — it's the right choice for immutable records; graduate to dataclass when you need mutability or methods.
- deque is the correct type for any queue or sliding-window pattern — list.pop(0) is O(n) and will hurt you at scale; deque.popleft() is always O(1).
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Using list.pop(0) for a queue instead of deque — Symptom: code works fine in testing but gets dramatically slower as the list grows (O(n) per pop). Fix: replace your list with a deque and use popleft() — it's a one-line swap with identical semantics but O(1) performance.
- ✕ Mistake 2: Probing a defaultdict with bracket notation to check if a key exists — Symptom: the key now exists with its default value even though you only wanted to check — len() and iterations behave unexpectedly. Fix: always use 'if key in my_defaultdict' for existence checks; bracket access is for getting-or-creating, not inspecting.
- ✕ Mistake 3: Misspelling a field name in the keyword argument to _replace — Symptom: TypeError: got an unexpected keyword argument, and no IDE autocomplete catches the typo. Fix: double-check the keyword against the class definition (record._replace(price=99.99)) and rely on attribute access in the rest of your code, where your IDE can catch typos.
Interview Questions on This Topic
- Q: Why would you choose defaultdict over using dict.setdefault()? What are the performance and readability differences?
- Q: Explain why deque has O(1) append and popleft while a list has O(n) for pop(0). When would you still choose a list over a deque?
- Q: If Counter is a subclass of dict, what specifically does it add, and what happens when you access a key that doesn't exist — how does that differ from a regular dict and why?
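To make the second question concrete, here is a rough benchmark sketch using timeit. Absolute numbers will vary by machine; what matters is the trend as n grows:

```python
from collections import deque
import timeit

def drain_list(n):
    items = list(range(n))
    while items:
        items.pop(0)    # O(n) each time: every remaining element shifts left

def drain_deque(n):
    items = deque(range(n))
    while items:
        items.popleft() # O(1) each time: no shifting required

for n in (1_000, 10_000):
    t_list = timeit.timeit(lambda: drain_list(n), number=5)
    t_deque = timeit.timeit(lambda: drain_deque(n), number=5)
    print(f"n={n}: list {t_list:.4f}s vs deque {t_deque:.4f}s")
# Expect the list time to grow roughly quadratically with n, the deque roughly linearly
```

In an interview, the key point is the memory layout: a list is one contiguous array, so removing the head forces a shift, while a deque's block structure lets both ends move independently.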
Frequently Asked Questions
When should I use Python's collections module instead of a regular dict or list?
Use collections when you find yourself writing repetitive boilerplate around a plain dict or list: checking if a key exists before incrementing (use Counter or defaultdict), forgetting what tuple index means what (use namedtuple), or calling list.pop(0) frequently (use deque). The module doesn't replace built-ins; it replaces awkward patterns around them.
Is defaultdict slower than a regular dict in Python?
The overhead is negligible for most use-cases — a defaultdict has one extra attribute (default_factory) and one extra method call (__missing__) per new key creation. For existing keys it's identical to a plain dict lookup. The real-world performance difference is rarely measurable unless you're creating millions of new keys per second.
What's the difference between collections.namedtuple and Python 3.7 dataclasses?
namedtuple produces an immutable, tuple-compatible class with no method overhead — it's ideal for read-only records you want to pass around cheaply. dataclass produces a mutable class with full OOP support, default values, post-init processing and __slots__ optimisation. Use namedtuple for simple, immutable data bags; use dataclass for anything that has behaviour, needs mutation, or has complex defaults.
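A side-by-side sketch of the same record in both styles makes the trade-off concrete (the Point example here is hypothetical):

```python
from collections import namedtuple
from dataclasses import dataclass

PointNT = namedtuple('PointNT', ['x', 'y'])

@dataclass
class PointDC:
    x: float
    y: float
    label: str = "origin"   # dataclasses make defaults and extra fields trivial

nt = PointNT(1.0, 2.0)
dc = PointDC(1.0, 2.0)

# namedtuple: immutable and fully tuple-compatible
x, y = nt                   # unpacks like any tuple
# nt.x = 5.0 would raise AttributeError

# dataclass: mutable, with normal attribute assignment
dc.x = 5.0
print(dc)  # PointDC(x=5.0, y=2.0, label='origin')
```

If the record needs to flow through code that expects tuples (unpacking, dict keys, sorting), namedtuple wins; the moment it needs mutation or behaviour, the dataclass version is the cleaner home for it.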
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.