Python Iterator Exhaustion — Silent Data Drop in ETL

Generator exhaustion silently dropped 50% of ETL records.
⚙️ Intermediate — basic Python knowledge assumed
In this tutorial, you'll learn
  • An iterable produces iterators; an iterator IS the stateful cursor — confusing the two causes silent, hard-to-debug empty-loop bugs.
  • Every Python for-loop is secretly calling iter() once and then next() on every cycle until StopIteration is raised — there is no other mechanism.
  • Iterators are one-shot by design: once exhausted, they stay exhausted. Always pass the original iterable — not the iterator — when the same data needs to be traversed more than once.
Quick Answer
  • An iterable has __iter__() that returns an iterator
  • An iterator has __next__() that yields items and raises StopIteration when done
  • Every Python for-loop calls iter() once then next() in a loop until StopIteration
  • Lists, tuples, strings are iterables; file objects are iterators
  • Generator functions are iterator factories written with yield: same protocol, less code
  • Reusing an iterator causes silent empty loops — always pass the iterable
🚨 START HERE

Iterator Debug Quick Reference

One-liners for the three most common iterator-related production issues.
🟡

Iterator exhausted too early

Immediate Action: Replace `iter(something)` with `list(something)` if you need multiple passes, or call the factory again.
Commands:
print(type(obj).__name__, hasattr(obj, '__next__'))
print(id(obj)) # Compare to original variable to confirm identity
Fix Now: Change `for item in my_iterator:` to `for item in list(my_iterable):` as a temporary fix, then refactor.
🟡

Custom iterator never finishes

Immediate Action: Add a print statement in __next__ to see if it's being called; check for a missing StopIteration.
Commands:
print('__next__ called') # inside __next__
print(f'position: {self.position}') # track state
Fix Now: Add `if self.position >= len(self.data): raise StopIteration` at the start of __next__.
🟡

Generator doesn't execute any statements

Immediate Action: Check that you called the function (with parentheses) and stored the result. Calling it only creates the generator; the body runs when something iterates it.
Commands:
print(type(gen_obj)) # Should be <class 'generator'>
list(gen_obj) # Force execution to see if anything happens (this consumes the generator)
Fix Now: Ensure the generator is iterated over: `for item in my_generator(): ...`
Production Incident

Exhausted Iterator Causes Silent Data Drop in ETL Pipeline

A batch processing pipeline lost half its records because a generator object was consumed once during validation and then passed to the main processing loop.
Symptom: The pipeline processed only the first half of the expected records. No errors were raised, just fewer rows in the output database.
Assumption: The team assumed that passing the generator object to a validation function and then to the processing function would work like passing a list. They expected the generator to reset or the data to be cached.
Root cause: The validation function called list() on the generator object it was given, which exhausted it. That same object was then passed downstream, already exhausted: its first __next__ call raised StopIteration immediately, so the processing loop body never executed.
Fix: Instead of sharing one generator object, call the generator function twice: once to produce an iterable for validation (e.g., converted to a list) and once to produce a fresh generator for the main processing loop. Alternatively, build a list once and pass that around, trading memory for safety.
Key Lesson
  • Generators are one-shot: never pass a generator object to more than one consumer.
  • If you need to iterate twice, either call the generator function twice or materialize the data into a list.
  • Always audit function signatures: a function that accepts an iterable may exhaust an iterator passed to it. Prefer passing the factory (callable) over the instance.
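
The incident can be reproduced in a few lines. This is a minimal sketch, not the pipeline's actual code: `read_records`, `validate`, and `process` are hypothetical names standing in for the real extract, validation, and load steps.

```python
def read_records():
    """Hypothetical source: yields records one at a time."""
    for record_id in range(1, 5):
        yield {"id": record_id}


def validate(records):
    """Hypothetical validator: consumes whatever iterable it is given."""
    return len(list(records))


def process(records):
    """Hypothetical loader: returns the ids it actually handled."""
    return [record["id"] for record in records]


# Buggy wiring: one generator object shared by two consumers
shared = read_records()
validated_count = validate(shared)   # drains the generator
buggy_result = process(shared)       # sees an exhausted iterator

# Fix A: call the generator function once per consumer
fixed_count = validate(read_records())
fixed_result = process(read_records())

# Fix B: materialize once and share the list (costs memory)
materialized = list(read_records())
list_result = process(materialized)

print(validated_count, buggy_result)   # 4 []
print(fixed_count, fixed_result)       # 4 [1, 2, 3, 4]
```

Fix A re-runs the source, so it only suits sources that can be read twice. Fix B reads the source once but holds every record in memory.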
Production Debug Guide

Common symptoms and their root causes when iterators behave unexpectedly.

Symptom: Loop body never executes for the second usage of the same variable.
Action: Check if the variable is an iterator (has __next__). If yes, the iterator was exhausted. Use the original iterable (list, tuple) to get a fresh iterator.

Symptom: Custom class used in a for-loop raises TypeError: 'MyClass' object is not iterable.
Action: Implement an __iter__ method that returns an iterator (or use yield in __iter__ to make it a generator). Ensure __iter__ returns an object with __next__.

Symptom: Generator function is defined but the loop never yields anything.
Action: Verify that you called the generator function and are iterating over the generator object it returns. If you write gen = my_generator without parentheses, gen is the function itself, not a generator, and looping over it raises TypeError. Add the parentheses.

Symptom: File object used in two for-loops; the second loop prints nothing.
Action: File objects are iterators. After the first loop, the file pointer is at EOF. Call file.seek(0) to reset the pointer, or re-open the file.

Symptom: Infinite loop when iterating a custom iterator that never raises StopIteration.
Action: Ensure every code path in __next__ either returns a value or raises StopIteration. Add a guard condition before returning. Test with a small dataset to verify exhaustion.
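
Several of the checks above come down to asking whether an object is an iterator or merely an iterable. A small helper can make that check explicit; `describe` is a hypothetical name, and the sketch relies on the abstract base classes in `collections.abc`.

```python
from collections.abc import Iterable, Iterator


def describe(obj):
    """Classify an object as 'iterator', 'iterable', or 'neither'."""
    if isinstance(obj, Iterator):
        return "iterator"      # has __next__: treat as one-shot
    if isinstance(obj, Iterable):
        return "iterable"      # has __iter__: usually safe to loop repeatedly
    return "neither"


def numbers():
    yield 1


print(describe([1, 2, 3]))        # iterable
print(describe(iter([1, 2, 3])))  # iterator
print(describe(numbers()))        # iterator
print(describe(numbers))          # neither (the function itself, not called)
print(describe(42))               # neither
```

One caveat: `Iterable` only detects `__iter__`, so a class that supports looping purely through `__getitem__` is reported as "neither" even though a for-loop accepts it.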

Every Python developer uses for-loops from day one, but almost nobody stops to ask: how does Python actually know what to do next on each loop cycle? The answer lives inside a two-method protocol — __iter__ and __next__ — and once you understand it, you'll see it everywhere: in file reading, database cursors, API pagination, and streaming data pipelines. This isn't just academic knowledge; it's the engine under the hood of the language itself.

The problem this solves is memory and control. If Python loaded every item from a collection into memory before you could loop over it, working with a 10-million-row CSV file or an infinite sequence of sensor readings would be impossible. Iterators let you process one item at a time, on demand, without ever needing to know how many items exist in total. This lazy evaluation is what makes Python practical for real data-engineering work.

By the end of this article you'll be able to explain exactly what happens when Python executes a for-loop, write your own custom iterator class from scratch, spot the difference between an iterable and an iterator in a code review, and avoid the subtle bugs that trip up even experienced developers when they assume an iterator can be reused.

The Two-Protocol System: Iterable vs Iterator

Python draws a firm line between two roles. An iterable is any object that knows how to produce an iterator — it has an __iter__ method that returns one. A list, a string, a tuple, a dict — all iterables. An iterator is the stateful worker that actually does the traversal. It has both __iter__ (which just returns itself) and __next__ (which delivers the next item or raises StopIteration when it's done).

This separation exists for a good reason: you want to be able to loop over the same list a hundred times without it 'running out'. The list (iterable) stays neutral. Each time you start a new loop, Python silently calls iter(your_list) to create a fresh iterator — a brand-new dealer for your deck of cards.

You can manually step through this process with Python's built-in iter() and next() functions. This is exactly what a for-loop does internally: one iter() call at the start, then one next() call per cycle. Understanding this unlocks everything else in this article.

iterable_vs_iterator.py · PYTHON
# A plain list is an ITERABLE — it can produce iterators on demand
playlist = ["Bohemian Rhapsody", "Hotel California", "Stairway to Heaven"]

# iter() calls playlist.__iter__() and returns a fresh ITERATOR
playlist_iterator = iter(playlist)

# next() calls playlist_iterator.__next__() each time
print(next(playlist_iterator))  # Fetches item 1 — iterator remembers position
print(next(playlist_iterator))  # Fetches item 2 — picks up exactly where it left off
print(next(playlist_iterator))  # Fetches item 3 — last item

# The iterator is now exhausted — calling next() again raises StopIteration
try:
    print(next(playlist_iterator))
except StopIteration:
    print("Iterator exhausted — no more songs!")

# The original list is UNCHANGED — start a fresh iterator anytime
fresh_iterator = iter(playlist)
print(next(fresh_iterator))  # Back to the beginning — "Bohemian Rhapsody"

# A for-loop does ALL of this invisibly on every iteration
for song in playlist:          # Python calls iter(playlist) once here
    print(f"Now playing: {song}")  # Then calls next() on each cycle
▶ Output
Bohemian Rhapsody
Hotel California
Stairway to Heaven
Iterator exhausted — no more songs!
Bohemian Rhapsody
Now playing: Bohemian Rhapsody
Now playing: Hotel California
Now playing: Stairway to Heaven
🔥What a for-loop actually does:
Python translates for item in collection into roughly: _iter = iter(collection), then a while-loop that calls next(_iter) and catches StopIteration to break. There is no magic — just these two protocol methods called repeatedly.
📊 Production Insight
Many engineers think for-loops are magic. They aren't.
The for-loop transforms into a while-loop that calls iter() and next().
If you accidentally pass an iterator to a for-loop twice, the second loop will silently do nothing.
🎯 Key Takeaway
iterable = can produce iterators (has __iter__).
iterator = stateful cursor (has __next__).
A for-loop uses both — but only ever calls iter() once per loop.

Building a Custom Iterator — A Real-World File Chunker

Here's where things get genuinely useful. Let's say you're processing a large log file and you want to read it in fixed-size chunks rather than line by line or all at once. You can't do this elegantly with a plain list. This is the exact scenario custom iterators were made for.

To make an object an iterator, you implement two methods: __iter__ returns self (because the iterator is its own iterable), and __next__ returns the next value or raises StopIteration. That's the entire contract.

The power here is state. Your iterator class can carry any state it needs between calls to __next__ — a file handle, a counter, a buffer, a database cursor position. This is what separates a custom iterator from a simple function: it pauses between calls and picks up exactly where it left off, making it perfect for streaming, pagination, and lazy computation.

file_chunk_iterator.py · PYTHON
class FileChunkIterator:
    """
    Reads a text file in fixed-size chunks.
    Useful for processing large files without loading them entirely into memory.
    """

    def __init__(self, filepath, chunk_size=5):
        self.filepath = filepath
        self.chunk_size = chunk_size  # How many lines per chunk
        self._file_handle = None      # Will hold the open file — initialised in __iter__
        self._line_buffer = []        # Accumulates lines until chunk is full

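    def __iter__(self):
        # Open the file fresh each time iteration starts so the object can be restarted.
        # Close any handle left over from an unfinished pass so it is not leaked.
        if self._file_handle is not None and not self._file_handle.closed:
            self._file_handle.close()
        self._file_handle = open(self.filepath, "r")
        self._line_buffer = []
        return self  # An iterator must return itself here

    def __next__(self):
        # Not started yet, or already exhausted: keep raising StopIteration
        if self._file_handle is None or self._file_handle.closed:
            raise StopIteration
        # Try to fill a chunk from the file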
        while len(self._line_buffer) < self.chunk_size:
            raw_line = self._file_handle.readline()
            if not raw_line:  # readline() returns '' at end of file
                break
            self._line_buffer.append(raw_line.strip())

        if not self._line_buffer:  # Buffer empty — file is fully consumed
            self._file_handle.close()  # Clean up the file handle
            raise StopIteration      # Signal to the for-loop: we're done

        # Pull exactly chunk_size lines (or fewer if we're near the end)
        chunk = self._line_buffer[:self.chunk_size]
        self._line_buffer = self._line_buffer[self.chunk_size:]
        return chunk


# --- Demo: create a temporary log file and iterate over it in chunks ---
import tempfile
import os

# Write 12 fake log lines to a temp file
log_lines = [f"[INFO] Event #{i} processed successfully" for i in range(1, 13)]

with tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False) as temp_log:
    temp_log.write("\n".join(log_lines))
    temp_filepath = temp_log.name

# Iterate over the log in chunks of 5 lines
chunk_reader = FileChunkIterator(temp_filepath, chunk_size=5)

for chunk_number, log_chunk in enumerate(chunk_reader, start=1):
    print(f"--- Chunk {chunk_number} ({len(log_chunk)} lines) ---")
    for log_entry in log_chunk:
        print(f"  {log_entry}")

os.unlink(temp_filepath)  # Clean up the temp file
▶ Output
--- Chunk 1 (5 lines) ---
  [INFO] Event #1 processed successfully
  [INFO] Event #2 processed successfully
  [INFO] Event #3 processed successfully
  [INFO] Event #4 processed successfully
  [INFO] Event #5 processed successfully
--- Chunk 2 (5 lines) ---
  [INFO] Event #6 processed successfully
  [INFO] Event #7 processed successfully
  [INFO] Event #8 processed successfully
  [INFO] Event #9 processed successfully
  [INFO] Event #10 processed successfully
--- Chunk 3 (2 lines) ---
  [INFO] Event #11 processed successfully
  [INFO] Event #12 processed successfully
💡Pro Tip — Close Resources in StopIteration:
Close file handles, database connections, or network sockets at the point where __next__ raises StopIteration. If cleanup lives only in a finally clause in the calling code, you are assuming every caller will remember it, and they often don't. Note that this covers only iterators that run to completion: if a consumer breaks out early, StopIteration is never raised and the handle stays open until the object is garbage collected. A __del__ method can act as a safety net for that case.
📊 Production Insight
Forgetting to close file handles on StopIteration leads to resource leaks in long-running services.
Once leaked handles reach the process's file-descriptor limit (often 1024 by default on Linux), open() fails with OSError: [Errno 24] Too many open files (EMFILE).
Always close resources inside the iterator class — don't rely on the consumer to do it.
🎯 Key Takeaway
Custom iterator = class with __iter__ (return self) and __next__ (return next/raise StopIteration).
State is held between calls — use it for streaming, pagination, chunking.
Clean up resources on StopIteration to prevent leaks.
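
An alternative shape keeps the class as a pure iterable and writes __iter__ as a generator. The sketch below is a variation on the chunker idea rather than a drop-in replacement: `ChunkedFile` is a hypothetical name, and the with-block is what handles closing the file.

```python
import os
import tempfile


class ChunkedFile:
    """Iterable (not an iterator): each iter() call starts a fresh pass."""

    def __init__(self, filepath, chunk_size=5):
        self.filepath = filepath
        self.chunk_size = chunk_size

    def __iter__(self):
        # Generator-based __iter__: the with-block closes the file when the
        # pass finishes or when the generator is closed early.
        with open(self.filepath, "r") as handle:
            chunk = []
            for raw_line in handle:
                chunk.append(raw_line.strip())
                if len(chunk) == self.chunk_size:
                    yield chunk
                    chunk = []
            if chunk:            # leftover lines at end of file
                yield chunk


with tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False) as tmp:
    tmp.write("\n".join(f"line {i}" for i in range(1, 8)))
    path = tmp.name

reader = ChunkedFile(path, chunk_size=3)
first_pass = [len(chunk) for chunk in reader]
second_pass = [len(chunk) for chunk in reader]   # works: fresh generator
os.unlink(path)

print(first_pass, second_pass)  # [3, 3, 1] [3, 3, 1]
```

Because every loop gets its own generator and its own file handle, two loops over the same `reader` cannot interfere with each other.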

Generator Functions — The Shortcut Python Gives You

Writing a full iterator class is powerful, but verbose. Python gives you a shortcut: generator functions. Any function that contains a yield statement automatically becomes a factory for iterator objects called generators. Python handles all the __iter__ and __next__ plumbing for you.

Under the hood, calling a generator function doesn't execute the body at all — it returns a generator object. Each call to next() on that object resumes execution from the last yield, suspending again at the next one. This is exactly the same pause-and-resume behaviour as our custom iterator, but expressed in a fraction of the code.
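
The deferred start is easy to observe. In this small sketch, `countdown` and the `events` list are illustrative names; the list records the order in which things actually happen.

```python
events = []


def countdown(start):
    events.append("body started")      # runs on the first next(), not on the call
    while start > 0:
        yield start
        start -= 1
    events.append("body finished")


gen = countdown(2)
events.append("generator created")     # recorded before any body code runs

first = next(gen)                       # runs the body up to the first yield
second = next(gen)
leftover = next(gen, "done")            # body ends; the default replaces StopIteration

print(events)
# ['generator created', 'body started', 'body finished']
print(first, second, leftover)          # 2 1 done
```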

The real-world sweet spot for generators is producing sequences that are either very large or computationally expensive — think paginated API responses, mathematical series, or streaming transformations. If your data source is 'pull-based' (you ask for the next item when you're ready), a generator is almost always the right tool.

api_pagination_generator.py · PYTHON
import time

# Simulates a paginated REST API that returns users in pages
def mock_api_fetch(page_number, page_size=3):
    """
    Pretend this is calling requests.get('https://api.example.com/users?page=N').
    Returns a list of users for that page, or empty list when pages run out.
    """
    all_users = [
        {"id": 1, "name": "Alice Nakamura"},
        {"id": 2, "name": "Ben Okafor"},
        {"id": 3, "name": "Carmen Reyes"},
        {"id": 4, "name": "David Chen"},
        {"id": 5, "name": "Elena Petrov"},
        {"id": 6, "name": "Farhan Malik"},
        {"id": 7, "name": "Grace Osei"},
    ]
    start_index = (page_number - 1) * page_size
    return all_users[start_index : start_index + page_size]


def paginated_user_stream(page_size=3):
    """
    Generator that fetches users page-by-page.
    Callers never need to know pagination exists — they just iterate.
    """
    current_page = 1
    while True:
        print(f"  [Generator] Fetching page {current_page} from API...")
        page_data = mock_api_fetch(current_page, page_size)

        if not page_data:  # Empty page means no more data
            print("  [Generator] All pages consumed — raising StopIteration")
            return  # 'return' inside a generator raises StopIteration automatically

        for user in page_data:
            yield user  # Suspend here, hand one user to the caller

        current_page += 1  # Only advance the page after all users on it are yielded


# The caller just iterates — zero pagination logic leaks out
print("Streaming all users from API:\n")
for user in paginated_user_stream(page_size=3):
    print(f"  Processing user: {user['name']} (ID: {user['id']})")

print("\nDone — all users processed.")
▶ Output
Streaming all users from API:

  [Generator] Fetching page 1 from API...
  Processing user: Alice Nakamura (ID: 1)
  Processing user: Ben Okafor (ID: 2)
  Processing user: Carmen Reyes (ID: 3)
  [Generator] Fetching page 2 from API...
  Processing user: David Chen (ID: 4)
  Processing user: Elena Petrov (ID: 5)
  Processing user: Farhan Malik (ID: 6)
  [Generator] Fetching page 3 from API...
  Processing user: Grace Osei (ID: 7)
  [Generator] Fetching page 4 from API...
  [Generator] All pages consumed — raising StopIteration

Done — all users processed.
🔥Interview Gold — Generator vs Iterator Class:
A generator function is, in effect, a compact way to write an iterator class. For plain iteration the two behave the same, but generators are dramatically more readable for sequential, single-pass data production. Use a class when you need multiple methods, complex state, or the ability to restart it cleanly.
📊 Production Insight
A generator that holds an open network connection (like a paginated API client) must release it both when the generator is exhausted and when it is abandoned partway through.
If the consuming code breaks out of a loop early, any cleanup written after the last yield never runs. Put the cleanup in a try/finally inside the generator so it runs when the generator is closed, and call close() explicitly (or wrap the generator in a context manager) rather than relying on garbage-collection timing.
🎯 Key Takeaway
Generators = implicit iterators via yield.
Each call to the generator function returns a fresh generator object.
Use try/finally inside generators for cleanup of external resources.
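
A minimal sketch of that cleanup pattern, assuming a hypothetical `stream_rows` generator whose "resource" is simulated by entries in a `log` list. `contextlib.closing` is used here because it calls the generator's close() method when the with-block exits.

```python
from contextlib import closing

log = []


def stream_rows():
    """Hypothetical source that owns a resource for the life of the stream."""
    log.append("open")
    try:
        for row in range(100):
            yield row
    finally:
        log.append("close")            # runs on exhaustion OR on gen.close()


# closing() calls rows.close() on exit, even after an early break
with closing(stream_rows()) as rows:
    for row in rows:
        if row == 2:
            break

print(log)  # ['open', 'close']
```

Without the with-block, the finally clause would still run once the abandoned generator is finalized, but when that happens depends on the interpreter's garbage collection.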

Lazy Evaluation — How Iterators Enable Streaming and Large Data Processing

The real superpower of iterators isn't just the protocol — it's that they evaluate values only when asked. This is lazy evaluation. Instead of building a whole list in memory, an iterator produces one element at a time. That means you can process data streams that would never fit in RAM: reading a 100 GB log file, iterating over an infinite mathematical sequence, or consuming a real-time sensor feed.

Python's standard library is full of lazy iterators: map(), filter(), zip(), enumerate(), reversed() (on sequences) — all return iterators. Even range() returns an iterable that produces numbers on demand, not a list of all numbers. This design is intentional: Python defaults to lazy unless you force it with list(), tuple(), or a comprehension with brackets.

Understanding lazy evaluation helps you design systems that are memory-efficient by default. If you find yourself calling list() on a generator just to pass it to a function, stop and ask: does that function truly need random access, or can it work with a stream? In many cases, the function itself can be refactored to iterate lazily.

lazy_evaluation.py · PYTHON
# Compare memory usage between eager (list) and lazy (generator) for a range

# Eager: builds a list of 10 million integers up front (roughly 400 MB on 64-bit CPython)
eager_squares = [x * x for x in range(10_000_000)]  # Skip this line if memory is tight

# Lazy: generator expression produces one square at a time — ~0 bytes for the data
lazy_squares = (x * x for x in range(10_000_000))

# Use a small slice to demonstrate laziness
small_lazy = (x * x for x in range(10))
for value in small_lazy:
    print(value, end=' ')  # Prints 0 1 4 9 16 25 36 49 64 81

print()

# map() and filter() are also lazy — they don't compute until iterated
nums = [1, 2, 3, 4, 5]
lazy_doubled = map(lambda n: n * 2, nums)
print(lazy_doubled)  # <map object at 0x...> — not a list!
print(list(lazy_doubled))  # [2, 4, 6, 8, 10]

# Practical example: reading lines from a huge file without loading all
# with open('giant_log.txt') as f:
#     for line in f:    # f is an iterator over lines
#         process(line)  # Only one line in memory at a time
▶ Output
0 1 4 9 16 25 36 49 64 81
<map object at 0x...>
[2, 4, 6, 8, 10]
Mental Model
Lazy vs Eager Mental Model
Think of lazy evaluation as a vending machine vs a warehouse: a vending machine hands over one item at a time, only when you ask for it (iterator). A warehouse gives you everything at once, but takes up huge space (list).
  • Lazy: only compute/load what the consumer asks for, one step at a time.
  • Eager: compute/load everything up front, storing it all in memory.
  • Python's built-in functions like map, filter, zip are lazy by default.
  • Converting a lazy sequence to a list forces eager evaluation — use sparingly.
📊 Production Insight
Lazy evaluation hides the cost of iteration until the moment you need it.
If you chain many lazy operations (map, filter, zip), each next() call on the outermost iterator pulls a single item through every stage of the chain.
This can lead to surprising performance: the cost is spread out, not batched. Use profiling to ensure latency is acceptable in hot loops.
🎯 Key Takeaway
Lazy = compute on demand, memory efficient.
Eager = compute up front, fast random access.
Prefer lazy by default; convert to eager only when necessary.
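
To see how work flows through a chain of lazy stages, it helps to record when each stage touches each item. In this sketch `tag` is a hypothetical pass-through generator and the stage names are arbitrary labels.

```python
trace = []


def tag(stage, items):
    """Pass items through unchanged, recording when each stage handles them."""
    for item in items:
        trace.append(f"{stage}:{item}")
        yield item


pipeline = tag("load", tag("clean", tag("read", range(3))))

first = next(pipeline)   # pulls ONE item through read -> clean -> load
print(trace)             # ['read:0', 'clean:0', 'load:0']

rest = list(pipeline)    # drains the remaining items the same way
print(len(trace))        # 9
```

Nothing is read ahead: item 1 is not touched by any stage until the consumer asks for it.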

The Exhaustion Trap and the iter() Sentinel Form

Here's the behaviour that catches almost everyone at some point: iterators are one-shot. Once an iterator is exhausted, it stays exhausted. Calling iter() on an already-exhausted iterator just returns the same dead object — it does not reset. This is different from calling iter() on an iterable like a list, which creates a brand-new iterator.

This distinction has a practical consequence: if you pass an iterator (not an iterable) to two functions, the second one will silently get nothing. No error. Just an empty loop. These bugs are genuinely hard to track down.

Python also has a lesser-known second form of iter() — iter(callable, sentinel) — which wraps any zero-argument callable into an iterator that keeps calling it until the return value equals the sentinel. This is incredibly useful for reading data in fixed-size blocks, processing queue items, or any situation where you have a 'pull until done' data source.

iterator_exhaustion_and_sentinel.py · PYTHON
# ── PART 1: The Exhaustion Trap ──────────────────────────────────────────────

team_members = ["Priya", "Jordan", "Kwame"]  # This is an ITERABLE
team_iterator = iter(team_members)            # This is an ITERATOR (stateful)

print("First loop:")
for member in team_iterator:
    print(f"  Hello, {member}")

print("\nSecond loop over the SAME iterator (iterator is exhausted):")
for member in team_iterator:  # This loop body never executes — silent failure!
    print(f"  Hello again, {member}")

print("  (nothing printed — iterator was already exhausted)")

print("\nSecond loop over the ORIGINAL LIST (always works):")
for member in team_members:  # list is an iterable — fresh iterator each time
    print(f"  Hello again, {member}")


# ── PART 2: The iter(callable, sentinel) Form ─────────────────────────────────

import io

# Simulate reading binary data in 4-byte blocks from a stream
binary_stream = io.BytesIO(b"TheCodeForge.io rocks!")

print("\nReading binary stream in 4-byte blocks:")

# iter(callable, sentinel): calls binary_stream.read(4) repeatedly
# Stops automatically when read() returns b'' (empty bytes — end of stream)
for block in iter(lambda: binary_stream.read(4), b""):
    print(f"  Block: {block}")  # Each block is exactly 4 bytes (except maybe last)
▶ Output
First loop:
  Hello, Priya
  Hello, Jordan
  Hello, Kwame

Second loop over the SAME iterator (iterator is exhausted):
  (nothing printed — iterator was already exhausted)

Second loop over the ORIGINAL LIST (always works):
  Hello again, Priya
  Hello again, Jordan
  Hello again, Kwame

Reading binary stream in 4-byte blocks:
  Block: b'TheC'
  Block: b'odeF'
  Block: b'orge'
  Block: b'.io '
  Block: b'rock'
  Block: b's!'
⚠ Watch Out — Passing Iterators to Multiple Functions:
If you do items = iter(some_list) and pass items to two different functions, the second function will see an exhausted iterator and loop over nothing. Always pass the original iterable (the list/set/etc.) unless you deliberately want to share position state. When in doubt, check: does this object have __next__? If yes, it's an iterator — treat it as one-shot.
📊 Production Insight
The one-shot nature of iterators causes data-loss bugs that are notoriously hard to reproduce.
If your ETL pipeline uses generators, always call the factory function again for each consumer.
A common pattern: expose a function that returns a fresh iterator, not a pre-wired iterator object.
🎯 Key Takeaway
Iterators are one-shot — once exhausted, they never recover.
To iterate twice, always use the original iterable (not the iterator).
The iter(callable, sentinel) form is a concise pattern for polling data sources.
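
One way to "expose a function that returns a fresh iterator" is to wrap the factory in a small iterable. This is a sketch under stated assumptions: `FreshEachTime` and `read_ids` are hypothetical names, and the source must be safe to read more than once.

```python
class FreshEachTime:
    """Iterable wrapper: every iter() call invokes the factory again."""

    def __init__(self, factory):
        self._factory = factory      # zero-argument callable returning an iterator

    def __iter__(self):
        return self._factory()


def read_ids():
    """Hypothetical generator function standing in for a real data source."""
    yield from (101, 102, 103)


records = FreshEachTime(read_ids)

total = sum(1 for _ in records)      # first consumer
ids = list(records)                  # second consumer still sees everything

print(total, ids)  # 3 [101, 102, 103]
```

Consumers can treat `records` like a list for looping purposes, at the cost of re-running the source on every pass.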

| Feature / Aspect | Iterable | Iterator |
| --- | --- | --- |
| Required methods | __iter__() only | __iter__() and __next__() |
| Holds state between calls | No, stateless | Yes, remembers current position |
| Can be looped multiple times | Yes, creates a fresh iterator each time | No, exhausted after one full pass |
| Examples in Python stdlib | list, tuple, str, dict, set | file objects, enumerate(), zip(), map() |
| What iter() returns | A new iterator object | Itself (self) |
| Memory usage pattern | Usually stores all data | Usually produces one item at a time |
| Can be passed to for-loop? | Yes | Yes |
| Restarting iteration | Automatic, just loop again | Must call iter() on the source iterable again |

🎯 Key Takeaways

  • An iterable produces iterators; an iterator IS the stateful cursor — confusing the two causes silent, hard-to-debug empty-loop bugs.
  • Every Python for-loop is secretly calling iter() once and then next() on every cycle until StopIteration is raised — there is no other mechanism.
  • Iterators are one-shot by design: once exhausted, they stay exhausted. Always pass the original iterable — not the iterator — when the same data needs to be traversed more than once.
  • Generator functions (using yield) are shorthand for writing iterator classes — use them for clean, memory-efficient lazy sequences; use a full class when you need restartable iteration or complex internal state.

⚠ Common Mistakes to Avoid

    Iterating an exhausted iterator and expecting results
    Symptom

    A for-loop runs but never executes its body — no output, no error. The iterator was already consumed by a previous loop.

    Fix

    Never store the result of iter() in a variable you plan to loop over more than once. Keep a reference to the original iterable (the list, tuple, or custom class) and call iter() fresh each time you need a new traversal. If you must reuse, convert to a list first: data = list(iterator).

    Forgetting to raise StopIteration in a custom __next__
    Symptom

    Infinite loop when your data source runs dry, or a cryptic RecursionError if you accidentally call __next__ recursively.

    Fix

    Every code path through __next__ must either return a value or raise StopIteration. Add a guard at the top: if self._position >= len(self._data): raise StopIteration before any return statement. Test with an empty collection.

    Treating a generator object as if it's reusable
    Symptom

    Looping a second time over a stored generator object produces no results. Downstream code that depends on the data silently processes nothing.

    Fix

    A generator function (using yield) returns a new generator object each time it's called. If you need to iterate the same data twice, call the generator function again to get a fresh generator, or convert the first pass to a list with list(my_generator()). Do not store the generator object and reuse it.
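
For the second mistake above (a __next__ that never raises StopIteration), the guard-first shape looks like the sketch below. `Countdown` is a hypothetical class written only to show where the guard goes.

```python
class Countdown:
    """Iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self._remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self._remaining <= 0:     # guard first: covers empty and exhausted cases
            raise StopIteration
        value = self._remaining
        self._remaining -= 1
        return value


full = list(Countdown(3))
empty = list(Countdown(0))           # the "test with an empty collection" case

spent = Countdown(1)
next(spent)
after_end = [next(spent, "still exhausted") for _ in range(2)]

print(full, empty)   # [3, 2, 1] []
print(after_end)     # ['still exhausted', 'still exhausted']
```

Putting the guard before any other work also means the iterator keeps raising StopIteration on every call after the end, which is what the protocol expects.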

Interview Questions on This Topic

  • Q: What's the difference between an iterable and an iterator in Python, and how does a for-loop use both of them internally? (Mid-level)
    An iterable is any object that implements __iter__(), which returns an iterator. An iterator implements both __iter__() (returning self) and __next__(), which returns the next element or raises StopIteration. A for-loop works by calling iter() on the target object to get an iterator, then repeatedly calling next() on that iterator, catching StopIteration to break. This means the for-loop creates a fresh iterator for iterables like lists, so you can loop multiple times. If you pass an iterator, it reuses the same exhausted one — that's why the second loop is silent.
  • Q: If you call iter() on an iterator object rather than a plain list, what does it return and why? What are the implications of this for writing reusable code? (Mid-level)
    Calling iter() on an iterator returns the iterator itself (because __iter__ returns self). This means that if you pass an iterator to a for-loop that calls iter() once, the same iterator is used — it does not create a fresh one. The implication: if you store my_iter = iter(some_list) and then loop over my_iter twice, the second loop will see an exhausted iterator and produce no results. For reusable code, always accept or store the original iterable (e.g., list) rather than an iterator, so that each loop gets a fresh iterator.
  • Q: You have a generator function that yields results from a paginated API. A colleague calls it once, stores the result in a variable, and passes that variable to two different processing functions. What bug will they hit, and how would you redesign the code to fix it? (Senior)
    The bug is that the generator is exhausted after the first processing function consumes it. The second processing function receives an exhausted iterator and processes zero items. The fix: instead of passing the generator object, pass the generator function itself (or a wrapper that calls it) to each processing function, so each gets a fresh generator. Alternatively, materialize the generator results into a list if the paginated data fits in memory, then pass the list. The best practice is to have the processing functions accept a callable that produces an iterator, not the iterator instance.

Frequently Asked Questions

Is every iterator also an iterable in Python?

Yes — by convention and by the protocol definition, every iterator must implement __iter__ returning self. This means you can pass an iterator directly to a for-loop or any function that expects an iterable. The reverse is not true: an iterable is not necessarily an iterator unless it also implements __next__.

What is the difference between a generator and an iterator in Python?

A generator is a specific type of iterator created either by a generator function (using yield) or a generator expression. All generators are iterators, but not all iterators are generators. The practical difference is implementation style: generators write themselves via yield, while iterators require explicit __iter__ and __next__ methods on a class.

Why does Python use StopIteration instead of returning None or a special value to signal the end?

Using an exception means the protocol works even when None or any other sentinel value is a legitimate item in your sequence. If your iterator yields None, returning None to signal 'done' would be ambiguous. StopIteration is unambiguous, and Python's for-loop catches it automatically so you never see it in normal usage — it's a control-flow mechanism, not an error.
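
A short illustration of why a return-value signal would be ambiguous. The `readings` data is made up, and `_MISSING` is a hypothetical private sentinel used only with the default argument of next().

```python
readings = [21.5, None, 19.0]        # None is real data here: a missed reading

it = iter(readings)
seen = []
while True:
    try:
        seen.append(next(it))        # None comes through like any other item
    except StopIteration:
        break                        # only the exception marks the end

# When using next() with a default, pick a sentinel that cannot appear in the data
_MISSING = object()
it = iter(readings)
count = 0
while next(it, _MISSING) is not _MISSING:
    count += 1

print(seen, count)  # [21.5, None, 19.0] 3
```

Had the second loop used `None` as its default, it would have stopped after the first reading.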

What is the `iter(callable, sentinel)` form and when should I use it?

It repeatedly calls a zero-argument callable and stops when the return value equals the sentinel. This is perfect for reading a socket in blocks, polling a queue, or any 'keep reading until we see a stop marker' pattern. Example: for chunk in iter(functools.partial(sock.recv, 1024), b'') reads a socket in 1K chunks until the connection closes.
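
The queue-polling case can be sketched with the standard library's `queue.Queue`. The job names are invented, and using `None` as the stop marker is a convention assumed for this example rather than something the queue module requires.

```python
import queue

jobs = queue.Queue()
for name in ("resize", "upload", "notify"):
    jobs.put(name)
jobs.put(None)                       # stop marker placed by the producer

# iter(callable, sentinel): call jobs.get() until it returns None
handled = [job for job in iter(jobs.get, None)]

print(handled)  # ['resize', 'upload', 'notify']
```

Since `jobs.get()` blocks on an empty queue, the producer must always enqueue the stop marker, otherwise the loop waits forever.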

Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
