
Python Data Structures Interview Questions — Lists, Dicts, Sets & More

Where developers are forged. · Structured learning · Free forever.
📍 Part of: Python Interview → Topic 3 of 4
Python data structures interview questions explained with real-world analogies, runnable code, and the WHY behind every answer.
⚙️ Intermediate — basic interview knowledge assumed
In this tutorial, you'll learn
  • Tuple vs list isn't just about mutability — tuples are hashable and can be dict keys; lists cannot. Use tuples for fixed records, lists for growable collections.
  • Dict and set lookups are O(1) because they use hash tables — Python computes a hash, jumps to a memory location, done. List search is O(n). This single fact drives most data structure decisions in real code.
  • Never use list.pop(0) for queue behaviour — it's O(n). Use collections.deque.popleft() instead, which is O(1) and built for exactly this purpose.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer

Imagine you're organising a party. A list is your ordered guest list — everyone has a seat number. A dictionary is the name badge table — you look up a person by name, not by seat. A set is the unique-gift pile — no duplicates allowed. A tuple is the printed invitation — once it's sent, you can't change it. Python's data structures are just these everyday organising ideas, formalised into code.

Data structures aren't just a Computer Science exam topic — they're the daily tools every Python developer reaches for without thinking. Whether you're building a REST API, scraping a website, or crunching data in pandas, the choice between a list and a set, or a dict and a defaultdict, changes how fast your code runs and how readable it stays. Interviewers love data structure questions because they reveal whether you understand trade-offs, not just syntax.

The real problem most candidates have isn't that they don't know what a list is. It's that they can't articulate WHY they'd choose one structure over another under pressure, and they miss the subtle gotchas — mutability, hashing, ordering guarantees — that separate junior answers from senior ones.

By the end of this article you'll be able to answer questions about time complexity, pick the right structure for a given problem, explain the internals of dicts and sets in plain English, and dodge the three most common mistakes candidates make mid-interview. Let's build this up piece by piece.

Lists vs Tuples — It's Not Just About Mutability

Every candidate says 'lists are mutable, tuples are immutable' — and then the interviewer nods politely and waits for more. That answer is correct but incomplete.

The deeper WHY: because tuples are immutable, Python can store them more efficiently in memory and use them as dictionary keys or set members. A list can never be a dict key — try it and you'll get a TypeError. Tuples also communicate intent: when you return (latitude, longitude) from a function, you're signalling to the next developer 'these two values belong together and shouldn't be shuffled around'.
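That TypeError is worth seeing once. A tiny sketch (the cache dict is made up for illustration):

```python
# A hashed key must never change after insertion, so Python rejects lists outright.
cache = {}
cache[(1, 2)] = "tuple key works"      # tuples hash cleanly

try:
    cache[[1, 2]] = "list key fails"
except TypeError as e:
    print(e)                           # unhashable type: 'list'
```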

Lists shine when your collection needs to grow, shrink, or be sorted — think a queue of incoming HTTP requests or a shopping cart. Tuples shine when you want a fixed record — RGB colours, database row results, coordinate pairs.

Appending to a list is O(1) amortised — Python over-allocates memory in chunks, so you're not copying the whole list on every append. Inserting at position 0 is O(n), because every element shifts right. That detail trips up a lot of candidates.

io/thecodeforge/python/list_vs_tuple_demo.py · PYTHON
import time
import sys

# 1. Memory Efficiency: Tuples are smaller because they are fixed-size
tpl = (1, 2, 3, 4, 5)
lst = [1, 2, 3, 4, 5]
print(f"Tuple size: {sys.getsizeof(tpl)} bytes")
print(f"List size:  {sys.getsizeof(lst)} bytes")

# 2. Hashability: Using a tuple as a coordinate key
locations = {
    (40.7128, -74.0060): "New York",
    (51.5074, -0.1278): "London"
}
print(f"Location lookup: {locations[(40.7128, -74.0060)]}")

# 3. Performance Trap: O(n) vs O(1)
data = list(range(100_000))

start = time.perf_counter()
data.append(100_001)  # O(1) amortized
print(f"Append time: {time.perf_counter() - start:.8f}s")

start = time.perf_counter()
data.insert(0, -1)     # O(n) - moving 100k items!
print(f"Insert at front time: {time.perf_counter() - start:.8f}s")
▶ Output
Tuple size: 80 bytes
List size: 104 bytes
Location lookup: New York
Append time: 0.00000500s
Insert at front time: 0.00012000s
💡Interview Gold:
When an interviewer asks 'what's the difference between a list and a tuple?', go beyond mutability. Say: 'Tuples are hashable, so they can be dict keys and set members. They also signal to other developers that this is a fixed record, not a collection to be modified. And because Python knows a tuple won't change, it can allocate memory more efficiently.' That answer gets you to the next round.

Dictionaries and Sets — Hash Tables Under the Hood

Python dicts and sets are both built on hash tables — and understanding that one fact makes you dangerous in interviews.

Here's the mental model: when you do user_scores['alice'], Python doesn't scan the entire dict looking for 'alice'. It runs hash('alice'), gets a number, jumps directly to the corresponding memory bucket, and returns the value. That's why dict lookup is O(1) — it's a single mathematical jump, not a search.

Sets work identically but only store keys, no values. That's why 'alice' in large_set is blindingly fast while 'alice' in large_list gets slower as the list grows.

The critical interview implication: only hashable objects can be dict keys or set members. Integers, strings, and tuples are hashable. Lists and dicts are not — they're mutable, so their hash could change after insertion, which would corrupt the hash table.

Since Python 3.7, dicts maintain insertion order as a language guarantee — not an implementation detail. That matters when you're iterating and need predictable output.
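A quick demonstration of the ordering guarantee (names and scores are illustrative):

```python
# Since Python 3.7, iteration order is guaranteed to match insertion order.
scores = {}
scores["charlie"] = 3
scores["alice"] = 1
scores["bob"] = 2

print(list(scores))   # ['charlie', 'alice', 'bob'] -- insertion order, not alphabetical
scores["alice"] = 10  # updating an existing key does NOT move it
print(list(scores))   # ['charlie', 'alice', 'bob'] -- position unchanged
```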

io/thecodeforge/python/hashing_mechanics.py · PYTHON
import time
from collections import Counter

# 1. Membership Testing: List vs Set (The FAANG Favorite)
large_list = list(range(10_000_000))
large_set = set(large_list)

start = time.perf_counter()
9_999_999 in large_list  # O(n) - must check every item
print(f"List search: {time.perf_counter() - start:.5f}s")

start = time.perf_counter()
9_999_999 in large_set   # O(1) - immediate hash jump
print(f"Set search:  {time.perf_counter() - start:.5f}s")

# 2. Frequency Analysis with Counter
logs = ["ERROR", "INFO", "ERROR", "WARN", "ERROR", "INFO"]
error_counts = Counter(logs)
print(f"Most common: {error_counts.most_common(1)}")

# 3. Set Algebra: Finding common items
admin_ids = {101, 105, 110}
active_ids = {105, 110, 120}
print(f"Active Admins: {admin_ids & active_ids}") # Intersection
▶ Output
List search: 0.12000s
Set search: 0.00001s
Most common: [('ERROR', 3)]
Active Admins: {105, 110}
⚠ Watch Out:
Dict keys must be hashable — but that doesn't mean ALL immutable types are hashable. A tuple of lists like ([1,2], [3,4]) is immutable at the tuple level but contains mutable objects inside, so it's NOT hashable and will raise a TypeError if used as a dict key. Always ask: does every element inside the tuple hash cleanly?
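A minimal reproduction of that trap:

```python
# Immutable container, mutable contents: hashing fails on the inner lists.
clean = (1, 2, 3)
dirty = ([1, 2], [3, 4])

print(hash(clean))             # works: every element is hashable

try:
    hash(dirty)
except TypeError as e:
    print(f"TypeError: {e}")   # unhashable type: 'list'
```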

Stacks, Queues, and When the Built-ins Aren't Enough

Python doesn't have built-in Stack or Queue classes — and that confuses candidates who learned data structures in Java or C++. The Pythonic answer is: use what already exists, but use it correctly.

A list works fine as a stack (LIFO): append() to push, pop() to pop — both O(1). But using a list as a queue is a performance trap. list.pop(0) removes from the front and is O(n) because every remaining element shifts.
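A list-as-stack sketch, using a hypothetical undo history:

```python
# LIFO with a plain list: push with append(), pop from the SAME end.
undo_stack = []
undo_stack.append("type 'a'")    # push -- O(1) amortised
undo_stack.append("type 'b'")
undo_stack.append("delete line")

last_action = undo_stack.pop()   # pop -- O(1), removes from the end
print(f"Undoing: {last_action}")
print(f"Stack now: {undo_stack}")
```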

For queues (FIFO), use collections.deque. Under the hood it's a doubly-linked list of fixed-size blocks — appendleft and popleft are O(1). For priority queues — where you always want to retrieve the smallest (or largest) item first — use the heapq module. This comes up constantly in real interviews, especially in graph problems like Dijkstra's algorithm.

Knowing the right tool signals that you've actually used Python in production, not just read a tutorial.

io/thecodeforge/python/queue_and_heap.py · PYTHON
from collections import deque
import heapq

# 1. Efficient Queue with deque
visitor_queue = deque(["User_A", "User_B"])
visitor_queue.append("User_C")   # O(1)
served = visitor_queue.popleft() # O(1)
print(f"Served: {served}, Remaining: {list(visitor_queue)}")

# 2. Priority Queue (Min-Heap)
tasks = []
# Pushing tuples: (priority, data)
heapq.heappush(tasks, (3, "Low Priority Job"))
heapq.heappush(tasks, (1, "Critical Hotfix"))
heapq.heappush(tasks, (2, "Standard Feature"))

# Always pops the lowest priority number first
while tasks:
    prio, task = heapq.heappop(tasks)
    print(f"Processing: [{prio}] {task}")
▶ Output
Served: User_A, Remaining: ['User_B', 'User_C']
Processing: [1] Critical Hotfix
Processing: [2] Standard Feature
Processing: [3] Low Priority Job
🔥Pro Tip:
heapq is a min-heap. To simulate a max-heap (always pop the LARGEST item first), store your values as negatives: push -priority instead of priority, then negate when you pop. It's a classic interview trick that shows you understand the underlying structure, not just the API.
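A sketch of the negation trick (the scores are made up):

```python
import heapq

# heapq is a min-heap: negate on the way in, negate back on the way out.
scores = [30, 10, 50, 20]
max_heap = []
for s in scores:
    heapq.heappush(max_heap, -s)    # store negatives so the "smallest" is the largest

largest = -heapq.heappop(max_heap)  # -(-50) == 50
second = -heapq.heappop(max_heap)   # -(-30) == 30
print(largest, second)              # 50 30
```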

List Comprehensions, Memory, and Generator Expressions

List comprehensions are one of Python's most loved features — and one of the most misused. The interview trap isn't 'can you write one?' — it's 'do you know when NOT to use one?'

A list comprehension builds the entire list in memory at once. If you're filtering a million database records to find ten matches, you've just loaded a million items into RAM before discarding 999,990 of them. That's wasteful.

Generator expressions solve this. They look almost identical to list comprehensions but use parentheses instead of square brackets. Crucially, they're lazy — they compute each value only when you ask for it. Memory usage stays flat regardless of input size.

For interview questions about large datasets, memory efficiency, or streaming data, generator expressions are almost always the right answer. Interviewers listen carefully for whether you distinguish between the two.

io/thecodeforge/python/memory_optimization.py · PYTHON
import sys

# Generating 1 million squares
n = 1_000_000

# List Comprehension: All numbers stored in RAM
sq_list = [x**2 for x in range(n)]
print(f"List Comp Size: {sys.getsizeof(sq_list) / 1024:.2f} KB")

# Generator Expression: Only the object/logic stored
sq_gen = (x**2 for x in range(n))
print(f"Generator Size: {sys.getsizeof(sq_gen)} bytes")

# Consuming the generator (simulated file processing)
print(f"First item: {next(sq_gen)}")
print(f"Second item: {next(sq_gen)}")
▶ Output
List Comp Size: 8448.30 KB
Generator Size: 112 bytes
First item: 0
Second item: 1
💡Interview Gold:
If an interviewer asks 'how would you process a file with 10 million lines?', the answer is NOT a list comprehension. Say: 'I'd use a generator expression or iterate line-by-line with a for loop, so only one line lives in memory at a time. A list comprehension would load all 10 million lines at once and risk an out-of-memory crash.' That shows production awareness.
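A sketch of that streaming answer. io.StringIO stands in for a real open() call so the example is self-contained; in production the first line would be a file handle:

```python
import io

# Simulated log file; in real code: with open("huge.log") as fake_file: ...
fake_file = io.StringIO("INFO ok\nERROR disk full\nINFO ok\nERROR timeout\n")

# Generator pipeline: each stage pulls ONE line at a time, so memory stays
# flat no matter how large the underlying file is.
lines = (line.strip() for line in fake_file)
errors = (line for line in lines if line.startswith("ERROR"))

for err in errors:
    print(err)
```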
| Feature / Aspect | list | tuple | dict | set | deque |
|---|---|---|---|---|---|
| Ordered? | Yes (insertion order) | Yes (insertion order) | Yes (Python 3.7+) | No | Yes (insertion order) |
| Mutable? | Yes | No | Yes | Yes | Yes |
| Allows duplicates? | Yes | Yes | Keys: No, Values: Yes | No | Yes |
| Lookup time complexity | O(n) for search | O(n) for search | O(1) by key | O(1) membership | O(n) for search |
| Append / Push | O(1) amortised | N/A — immutable | O(1) | O(1) | O(1) both ends |
| Insert at front | O(n) — avoid! | N/A | N/A | N/A | O(1) — use deque! |
| Hashable (dict key)? | No | Yes (if contents hashable) | No | No | No |
| Best real-world use | Ordered, growable collection | Fixed record / DB row | Lookup by name / key | Deduplication / membership | Queue / sliding window |

🎯 Key Takeaways

  • Tuple vs list isn't just about mutability — tuples are hashable and can be dict keys; lists cannot. Use tuples for fixed records, lists for growable collections.
  • Dict and set lookups are O(1) because they use hash tables — Python computes a hash, jumps to a memory location, done. List search is O(n). This single fact drives most data structure decisions in real code.
  • Never use list.pop(0) for queue behaviour — it's O(n). Use collections.deque.popleft() instead, which is O(1) and built for exactly this purpose.
  • List comprehensions build everything in memory immediately. For large or streaming datasets, generator expressions (parentheses instead of brackets) compute lazily and use a tiny, constant amount of memory — the generator object itself is on the order of 100 bytes — regardless of input size.

⚠ Common Mistakes to Avoid

    Using a list as a queue with pop(0) — Every pop(0) call is O(n) because Python shifts every remaining element left by one position. On a list with 100,000 items, that's 100,000 moves per pop. Fix: switch to collections.deque and use popleft() — it's O(1) because deque is a doubly-linked list of fixed-size blocks designed for exactly this operation.
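To see the gap for yourself, a small timing sketch that drains both structures (N is deliberately modest so the quadratic version still finishes quickly):

```python
from collections import deque
import time

N = 20_000
lst, dq = list(range(N)), deque(range(N))

start = time.perf_counter()
while lst:
    lst.pop(0)                   # shifts every remaining element left: O(n) per pop
list_time = time.perf_counter() - start

start = time.perf_counter()
while dq:
    dq.popleft()                 # O(1) per pop
deque_time = time.perf_counter() - start

print(f"list.pop(0) drain:     {list_time:.4f}s")
print(f"deque.popleft() drain: {deque_time:.4f}s")
```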

    Mutating a list while iterating over it — Removing or adding items inside a for loop that's iterating the same list causes skipped elements or IndexErrors. The loop index still advances after each removal, so the element that slid into the removed slot gets skipped. Fix: iterate over a copy (for item in my_list[:]) or build a new list with a comprehension: my_list = [x for x in my_list if condition(x)].
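A minimal reproduction of the skip, followed by the comprehension fix:

```python
nums = [1, 2, 2, 3]

# BUG: after the first 2 is removed, the second 2 slides into its slot
# and the advancing loop index skips right past it.
for n in nums:
    if n % 2 == 0:
        nums.remove(n)
print(nums)                              # [1, 2, 3] -- a 2 survived!

# FIX: build a new list instead of mutating the one you're iterating.
nums = [1, 2, 2, 3]
nums = [n for n in nums if n % 2 != 0]
print(nums)                              # [1, 3]
```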

    Using a mutable default argument in a function — def add_item(item, cart=[]) looks harmless, but the empty list is created ONCE when the function is defined, not each time it's called. Every call that uses the default shares the SAME list object, so items accumulate across calls. Fix: use def add_item(item, cart=None) and set cart = [] if cart is None inside the function body.
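The bug and the fix side by side (function and item names are illustrative):

```python
def add_item_buggy(item, cart=[]):
    cart.append(item)                # same list object on every default call!
    return cart

print(add_item_buggy("apple"))       # ['apple']
print(add_item_buggy("bread"))       # ['apple', 'bread'] -- leaked from the last call

def add_item(item, cart=None):
    if cart is None:
        cart = []                    # fresh list created per call
    cart.append(item)
    return cart

print(add_item("apple"))             # ['apple']
print(add_item("bread"))             # ['bread']
```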

Interview Questions on This Topic

  • Q: You need to count the frequency of words in a 5GB log file that won't fit in RAM. Walk me through your approach — what data structures would you use and why?
  • Q: What's the time complexity of checking membership in a list versus a set, and can you explain WHY the difference exists at the implementation level?
  • Q: How does Python's dictionary maintain insertion order since version 3.7, and what are the performance implications of this change?
  • Q: Given a stream of numbers, how would you efficiently find the median at any given point in time? (Hint: the Two Heaps pattern.)
  • Q: Explain the LRU (Least Recently Used) cache implementation using OrderedDict. Why is this more efficient than a plain dict?
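A minimal sketch of that LRU idea (the class shape and capacity are illustrative, not a production cache). OrderedDict earns its keep here because move_to_end() and popitem(last=False) are O(1), while a plain dict has no way to reposition a key:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")             # 'a' becomes most recently used
cache.put("c", 3)          # capacity exceeded: evicts 'b', not 'a'
print(list(cache.data))    # ['a', 'c']
```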

Frequently Asked Questions

What is the difference between a list and a tuple in Python for interviews?

Lists are mutable and cannot be used as dictionary keys. Tuples are immutable and hashable, so they can be dict keys and set members. Beyond mutability, tuples communicate intent — they signal a fixed record like a coordinate or a database row. In interviews, always mention hashability — it's the detail most candidates miss.

Why is dictionary lookup O(1) in Python?

Python dicts are built on hash tables. When you look up a key, Python runs the built-in hash() function on it, which produces an integer that maps to a specific memory bucket. This is a direct mathematical jump — no scanning, no comparison loop. As long as hash collisions are rare (and Python manages them well), lookup stays constant time regardless of how many keys are in the dict.

What is the Big O complexity of Python's 'set' operations?

Average case for add, remove, and contains is O(1). However, set operations like Union (s|t) or Intersection (s&t) are O(len(s) + len(t)). In interviews, emphasize that membership testing in a set is the primary reason to use it over a list for deduplication tasks.

When should I use a generator expression instead of a list comprehension?

Use a generator expression when you only need to iterate over the results once and the dataset is large. A list comprehension materialises every result in memory at once, which can be gigabytes for large inputs. A generator computes each value on demand, using only about a hundred bytes for the generator object itself regardless of input size. If you need indexing, slicing, or multiple passes over the results, a list comprehension is the right choice.

Naren · Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← Previous: Python OOP Interview Questions · Next: Django Interview Questions →