Intermediate 14 min · March 05, 2026

Iterators and Iterables in Python

Python Iterator Exhaustion — Silent Data Drop in ETL

Q: Is every iterator also an iterable in Python?

Yes — by convention and by the protocol definition, every iterator must implement __iter__ returning self. This means you can pass an iterator directly to a for-loop or any function that expects an iterable. The reverse is not true: an iterable is not necessarily an iterator unless it also implements __next__.

Q: What is the difference between a generator and an iterator in Python?

A generator is a specific type of iterator created either by a generator function (using yield) or a generator expression. All generators are iterators, but not all iterators are generators. The practical difference is implementation style: a generator gets __iter__ and __next__ for free from yield, while a hand-written iterator needs both methods defined explicitly on a class.

Q: Why does Python use StopIteration instead of returning None or a special value to signal the end?

Using an exception means the protocol works even when None or any other sentinel value is a legitimate item in your sequence. If your iterator yields None, returning None to signal 'done' would be ambiguous. StopIteration is unambiguous, and Python's for-loop catches it automatically so you never see it in normal usage — it's a control-flow mechanism, not an error.

Q: What is the `iter(callable, sentinel)` form and when should I use it?

It repeatedly calls a zero-argument callable and stops when the return value equals the sentinel. This is perfect for reading a socket in blocks, polling a queue, or any 'keep reading until we see a stop marker' pattern. Example: `for chunk in iter(functools.partial(sock.recv, 1024), b'')` reads a socket in 1K chunks until the connection closes.
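
A minimal sketch of the two-argument form, using an in-memory io.BytesIO as a stand-in for the socket. The payload and the 4-byte block size are made up for illustration.

```python
import functools
import io

# In-memory stream standing in for a socket or file; the payload is illustrative.
stream = io.BytesIO(b"abcdefghij")

# stream.read(4) returns b"" at end of stream, which is the sentinel that stops iteration.
chunks = list(iter(functools.partial(stream.read, 4), b""))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

The sentinel itself is never yielded, so the consumer only ever sees real data.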

Generator exhaustion silently dropped 50% of ETL records.

Naren Founder & Principal Engineer

20+ years shipping production Python across data and backend systems. Lessons pulled from things that broke in production.

✓ Production

production tested

July 18, 2026

last updated


Before you start ⏱ 25 min

✓ Solid grasp of fundamentals
✓ Comfortable reading code examples
✓ Basic production concepts


⚡Quick Answer

An iterable has __iter__() that returns an iterator
An iterator has __next__() that yields items and raises StopIteration when done
Every Python for-loop calls iter() once then next() in a loop until StopIteration
Lists, tuples, strings are iterables; file objects are iterators
Generator functions are iterator factories written with yield: same protocol, less code
Reusing an iterator causes silent empty loops — always pass the iterable

✦ Definition~90s read

What are Iterators and Iterables in Python?

Iterator exhaustion is a class of bug where a Python iterator silently yields no data after being consumed once, often causing ETL pipelines to process empty datasets without raising errors. Unlike lists or tuples that can be iterated multiple times, iterators are single-use objects — once you've looped over them, they're empty.


This becomes a silent data drop when you pass an iterator to multiple consumers (e.g., logging, validation, transformation) and the first consumer drains it, leaving nothing for the rest. Real-world ETL tools like Apache Airflow or Pandas often expose iterators under the hood (e.g., DataFrame.iterrows(), file readers), and developers unknowingly exhaust them by calling list() or next() for debugging, then wondering why downstream stages receive zero rows.

Python's iterator protocol is a two-tier system: an iterable (e.g., list, tuple, dict) implements __iter__(), and a container like these returns a fresh iterator on every call, while an iterator (e.g., generator, map object, file object) implements both __iter__() (returning itself) and __next__(). The critical distinction is that containers can be looped over multiple times; iterators cannot.

When you write for x in obj, Python calls iter(obj). If obj is an iterator, iter() returns that same object, so a second loop picks up wherever the first one stopped; after a full pass, that means it yields nothing. This is why zip() or map() objects break silently when reused, and why csv.reader or open() file handles behave differently depending on whether their contents were first copied into a list.
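
One quick way to tell the two apart is sketched below: an iterator returns itself from iter(), while a container hands out a new iterator each time. The helper name is_one_shot is hypothetical, not a standard-library function.

```python
def is_one_shot(obj):
    """Return True if obj is its own iterator, i.e. it can be consumed only once."""
    return iter(obj) is obj

rows = [1, 2, 3]
doubled = map(lambda n: n * 2, rows)

print(is_one_shot(rows))     # False: each iter(rows) is a fresh list iterator
print(is_one_shot(doubled))  # True: a map object returns itself

first_pass = list(doubled)
second_pass = list(doubled)  # same object, already drained
print(first_pass, second_pass)  # [2, 4, 6] []
```

A check like this can be useful in a debugging session, though it is only a heuristic: a class could return a fresh object from __iter__ and still wrap a consumable source.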

To avoid exhaustion bugs, you have three practical options: (1) materialize the iterator into a list when you need multiple passes (costs memory but guarantees data), (2) use itertools.tee() to split an iterator into N independent copies (lazy, but it buffers every item that one copy has consumed and another has not, so it gets memory-intensive when the consumers drift far apart on a large stream), or (3) design your ETL functions to accept iterables rather than iterators, calling iter() internally to get a fresh iterator each time. The itertools module provides lazy building blocks like chain, islice, and groupby that compose iterators without materializing them, but they are still single-use, so you must structure your pipeline to consume them exactly once.
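
Options 2 and 3 might look roughly like this. It is a sketch under stated assumptions: read_records, RecordSource, and two_pass_total are hypothetical names, and the five records are invented.

```python
import itertools

def read_records():
    """Stand-in for a one-shot source such as a file reader or DB cursor."""
    for i in range(1, 6):
        yield {"id": i, "amount": i * 10}

# Option 2: split one stream into two independent iterators.
# After tee(), use only the copies, never the original iterator.
for_validation, for_load = itertools.tee(read_records(), 2)
valid_count = sum(1 for rec in for_validation if rec["amount"] > 0)
loaded_ids = [rec["id"] for rec in for_load]
print(valid_count, loaded_ids)  # 5 [1, 2, 3, 4, 5]

# Option 3: pass a re-iterable whose __iter__ starts a new generator each time.
class RecordSource:
    def __iter__(self):
        return read_records()

def two_pass_total(records):
    count = sum(1 for _ in records)                # first pass
    total = sum(rec["amount"] for rec in records)  # second pass, fresh iterator
    return count, total

totals = two_pass_total(RecordSource())
print(totals)  # (5, 150)
```

Because the validation pass above finishes before the load pass starts, tee() ends up buffering all five records, which is the memory cost mentioned earlier. Passing read_records() directly to two_pass_total would return (5, 0), since the second pass would see a drained generator.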

Understanding this protocol is the difference between a pipeline that silently drops 40GB of data and one that reliably processes it.

Plain-English First

Imagine a deck of playing cards. The deck itself is the iterable — it's the thing that holds all the cards. The dealer who picks up one card at a time, remembers where they left off, and hands each card to you one by one? That's the iterator. You never grab the whole deck at once — you get one card, use it, then ask for the next. Python's for-loops work exactly this way, quietly using an invisible dealer every single time.

Every Python developer uses for-loops from day one, but almost nobody stops to ask: how does Python actually know what to do next on each loop cycle? The answer lives inside a two-method protocol — __iter__ and __next__ — and once you understand it, you'll see it everywhere: in file reading, database cursors, API pagination, and streaming data pipelines. This isn't just academic knowledge; it's the engine under the hood of the language itself.

The problem this solves is memory and control. If Python loaded every item from a collection into memory before you could loop over it, working with a 10-million-row CSV file or an infinite sequence of sensor readings would be impossible. Iterators let you process one item at a time, on demand, without ever needing to know how many items exist in total. This lazy evaluation is what makes Python practical for real data-engineering work.

By the end of this article you'll be able to explain exactly what happens when Python executes a for-loop, write your own custom iterator class from scratch, spot the difference between an iterable and an iterator in a code review, and avoid the subtle bugs that trip up even experienced developers when they assume an iterator can be reused.

Why Iterator Exhaustion Is a Silent Data Bug

An iterator in Python is an object that implements __iter__() and __next__(), producing values one at a time and raising StopIteration when exhausted. An iterable is any object that can return an iterator via iter(). The core mechanic: iterators are stateful — they track position and can only be traversed once. This single-pass design is memory-efficient (O(1) space) but introduces a hidden failure mode when code assumes reusability.

In practice, passing an iterator to multiple consumers (e.g., two list() calls, or a for loop followed by sum()) silently yields empty results after the first consumption. The iterator doesn't reset; it stays exhausted. This contrasts with iterables like lists, which produce fresh iterators each time. The distinction matters because many built-in functions (map, filter, zip) and generators return iterators, not iterables.

Use iterators when processing large or infinite streams where memory is constrained. But never assume an iterator can be reused. In ETL pipelines, this mistake drops entire datasets without errors — the second consumer simply sees zero rows. Always convert to a concrete collection (list, tuple) if you need multiple passes, or restructure to a single-pass pattern.
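
The failure is easy to reproduce in a few lines. In this sketch, extract, validate, and load are hypothetical stand-ins for real pipeline stages.

```python
def extract():
    """Hypothetical extract step: yields rows lazily, as a CSV reader would."""
    for i in range(4):
        yield {"id": i}

def validate(rows):
    return sum(1 for row in rows if "id" in row)

def load(rows):
    return [row["id"] for row in rows]

# Buggy: both stages share one generator, so load() receives nothing.
rows = extract()
validated = validate(rows)
loaded_buggy = load(rows)
print(validated, loaded_buggy)  # 4 []

# Fix: materialize once when two passes are needed.
rows = list(extract())
loaded_fixed = load(rows) if validate(rows) == len(rows) else []
print(loaded_fixed)  # [0, 1, 2, 3]
```

No exception is raised in the buggy version; the only visible symptom is the empty result.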

⚠ Iterator != Iterable

An iterable can produce multiple iterators; an iterator is a one-time ticket. Calling iter() on an iterator returns the iterator itself, with whatever position it has already reached, so there is no way to rewind it.

📊 Production Insight

Scenario: an ETL pipeline reads a 10M-row CSV via a generator, then passes the same iterator to a validation function and a load function. The load function receives zero rows because validation already consumed the iterator. Symptom: an empty target table with no error logs. Rule: if you need to consume data more than once, materialize it first.

🎯 Key Takeaway

Iterators are single-use — treat them like a file handle, not a list.

Converting an iterator to a list is O(n) memory but prevents silent data loss.

Always check whether a function returns an iterator or an iterable before passing it to multiple consumers.

thecodeforge.io

Iterators Iterables Python

The Two-Protocol System: Iterable vs Iterator

Python draws a firm line between two roles. An iterable is any object that knows how to produce an iterator — it has an __iter__ method that returns one. A list, a string, a tuple, a dict — all iterables. An iterator is the stateful worker that actually does the traversal. It has both __iter__ (which just returns itself) and __next__ (which delivers the next item or raises StopIteration when it's done).

This separation exists for a good reason: you want to be able to loop over the same list a hundred times without it 'running out'. The list (iterable) stays neutral. Each time you start a new loop, Python silently calls iter(your_list) to create a fresh iterator — a brand-new dealer for your deck of cards.

You can manually step through this process with Python's built-in iter() and next() functions, which is exactly what a for-loop does internally: one iter() call up front, then one next() call per cycle. Understanding this unlocks everything else in this article.

iterable_vs_iterator.py · PYTHON

# A plain list is an ITERABLE — it can produce iterators on demand
playlist = ["Bohemian Rhapsody", "Hotel California", "Stairway to Heaven"]

# iter() calls playlist.__iter__() and returns a fresh ITERATOR
playlist_iterator = iter(playlist)

# next() calls playlist_iterator.__next__() each time
print(next(playlist_iterator))  # Fetches item 1 — iterator remembers position
print(next(playlist_iterator))  # Fetches item 2 — picks up exactly where it left off
print(next(playlist_iterator))  # Fetches item 3 — last item

# The iterator is now exhausted — calling next() again raises StopIteration
try:
    print(next(playlist_iterator))
except StopIteration:
    print("Iterator exhausted — no more songs!")

# The original list is UNCHANGED — start a fresh iterator anytime
fresh_iterator = iter(playlist)
print(next(fresh_iterator))  # Back to the beginning — "Bohemian Rhapsody"

# A for-loop does all of this for you: iter() once, then next() on every cycle
for song in playlist:          # Python calls iter(playlist) once here
    print(f"Now playing: {song}")  # Then calls next() on each cycle

Output

Bohemian Rhapsody

Hotel California

Stairway to Heaven

Iterator exhausted — no more songs!

Bohemian Rhapsody

Now playing: Bohemian Rhapsody

Now playing: Hotel California

Now playing: Stairway to Heaven

🔥What a for-loop actually does:

Python translates for item in collection into roughly: _iter = iter(collection), then a while-loop that calls next(_iter) and catches StopIteration to break. There is no magic — just these two protocol methods called repeatedly.
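
Written out by hand, that translation looks roughly like the sketch below. The names collection, seen, and _iter are illustrative only.

```python
collection = ["a", "b", "c"]
seen = []

_iter = iter(collection)       # called once, before the first cycle
while True:
    try:
        item = next(_iter)     # called once per cycle
    except StopIteration:
        break                  # the for-loop swallows this silently
    seen.append(item)          # the loop body goes here

print(seen)  # ['a', 'b', 'c']
```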

📊 Production Insight

Many engineers think for-loops are magic. They aren't.

The for-loop transforms into a while-loop that calls iter() and next().

If you accidentally pass an iterator to a for-loop twice, the second loop will silently do nothing.

🎯 Key Takeaway

iterable = can produce iterators (has __iter__).

iterator = stateful cursor (has __next__).

A for-loop uses both — but only ever calls iter() once per loop.

Building a Custom Iterator — A Real-World File Chunker

Here's where things get genuinely useful. Let's say you're processing a large log file and you want to read it in fixed-size chunks rather than line by line or all at once. You can't do this elegantly with a plain list. This is the exact scenario custom iterators were made for.

To make an object an iterator, you implement two methods: __iter__ returns self (because the iterator is its own iterable), and __next__ returns the next value or raises StopIteration. That's the entire contract.

The power here is state. Your iterator class can carry any state it needs between calls to __next__ — a file handle, a counter, a buffer, a database cursor position. This is what separates a custom iterator from a simple function: it pauses between calls and picks up exactly where it left off, making it perfect for streaming, pagination, and lazy computation.

file_chunk_iterator.py · PYTHON

class FileChunkIterator:
    """
    Reads a text file in fixed-size chunks.
    Useful for processing large files without loading them entirely into memory.
    """

    def __init__(self, filepath, chunk_size=5):
        self.filepath = filepath
        self.chunk_size = chunk_size  # How many lines per chunk
        self._file_handle = None      # Will hold the open file — initialised in __iter__
        self._line_buffer = []        # Accumulates lines until chunk is full

    def __iter__(self):
        # Open the file fresh each time iteration starts, closing any handle
        # left over from an earlier, unfinished pass so it does not leak
        if self._file_handle is not None and not self._file_handle.closed:
            self._file_handle.close()
        self._file_handle = open(self.filepath, "r")
        self._line_buffer = []
        return self  # An iterator must return itself here

    def __next__(self):
        # After exhaustion the handle is closed; keep raising StopIteration
        # rather than failing with ValueError on a closed file
        if self._file_handle is not None and self._file_handle.closed:
            raise StopIteration

        # Try to fill a chunk from the file
        while len(self._line_buffer) < self.chunk_size:
            raw_line = self._file_handle.readline()
            if not raw_line:  # readline() returns '' at end of file
                break
            self._line_buffer.append(raw_line.strip())

        if not self._line_buffer:  # Buffer empty: file is fully consumed
            self._file_handle.close()  # Clean up the file handle
            raise StopIteration      # Signal to the for-loop: we're done

        # Pull exactly chunk_size lines (or fewer if we're near the end)
        chunk = self._line_buffer[:self.chunk_size]
        self._line_buffer = self._line_buffer[self.chunk_size:]
        return chunk


# --- Demo: create a temporary log file and iterate over it in chunks ---
import tempfile
import os

# Write 12 fake log lines to a temp file
log_lines = [f"[INFO] Event #{i} processed successfully" for i in range(1, 13)]

with tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False) as temp_log:
    temp_log.write("\n".join(log_lines))
    temp_filepath = temp_log.name

# Iterate over the log in chunks of 5 lines
chunk_reader = FileChunkIterator(temp_filepath, chunk_size=5)

for chunk_number, log_chunk in enumerate(chunk_reader, start=1):
    print(f"--- Chunk {chunk_number} ({len(log_chunk)} lines) ---")
    for log_entry in log_chunk:
        print(f"  {log_entry}")

os.unlink(temp_filepath)  # Clean up the temp file

Output

--- Chunk 1 (5 lines) ---

[INFO] Event #1 processed successfully

[INFO] Event #2 processed successfully

[INFO] Event #3 processed successfully

[INFO] Event #4 processed successfully

[INFO] Event #5 processed successfully

--- Chunk 2 (5 lines) ---

[INFO] Event #6 processed successfully

[INFO] Event #7 processed successfully

[INFO] Event #8 processed successfully

[INFO] Event #9 processed successfully

[INFO] Event #10 processed successfully

--- Chunk 3 (2 lines) ---

[INFO] Event #11 processed successfully

[INFO] Event #12 processed successfully

💡Pro Tip — Close Resources in StopIteration:

Close file handles, database connections, or network sockets at the point where you raise StopIteration, rather than assuming the calling code will do it in a finally clause; callers often don't. That only covers full consumption, though: if the consumer breaks out of the loop early, StopIteration is never raised and the resource stays open. For that case, expose an explicit close() method or context-manager support, and treat __del__ as a safety net only.

📊 Production Insight

Forgetting to close file handles on StopIteration leads to resource leaks in long-running services.

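Leaked descriptors accumulate until the process hits its open-file limit (commonly 1024 by default on Linux), after which every open() fails with EMFILE ("Too many open files").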

Always close resources inside the iterator class — don't rely on the consumer to do it.

🎯 Key Takeaway

Custom iterator = class with __iter__ (return self) and __next__ (return next/raise StopIteration).

State is held between calls — use it for streaming, pagination, chunking.

Clean up resources on StopIteration to prevent leaks.


Custom Iterator Class Implementation — Step-by-Step Template

While the file chunker above is a practical example, you'll often need a generic blueprint for building your own custom iterators. The pattern is always the same, whether you're wrapping a database cursor, a paginated API, or a live data stream.

Step 1: Define the class and __init__ – Accept your data source and any configuration. Store everything you need to start fresh. Do not open external resources here yet — delay that to __iter__ to allow restartability.

Step 2: Implement __iter__ – This method returns the iterator object. For one-shot iterators, return self. If you want the iterator to be restartable (i.e., you can call iter() again and get a fresh start), reset all state here, re-open resources, and return self. Restartability is optional but powerful.

Step 3: Implement __next__ – This is the core. Check if there's a next item available. If yes, return it after advancing internal state. If no, raise StopIteration. Every code path must end with either a return or a StopIteration — no fall-through.

Step 4: Handle cleanup – Close files, connections, or release locks when StopIteration is raised. Alternatively, implement __del__ as a safety net, but don't rely on it solely because garbage collection timing is unpredictable.

Step 5: Test with edge cases – Empty data, single item, partial iteration (breaking out early), and multiple iterations (if restartable).

generic_iterator_template.py · PYTHON

class GenericIterator:
    """
    Template for a restartable custom iterator.
    Adapt __init__ to your data source and __next__ to your traversal logic.
    """
    def __init__(self, data_source):
        # Source could be a list, file path, API client, etc.
        self._source = data_source
        # Internal state (will be reset in __iter__ if restartable)
        self._position = 0
        self._resource = None

    def __iter__(self):
        # Reset state so we can iterate again
        self._position = 0
        # Re-open external resource (e.g., file, network socket)
        # self._resource = open(self._source, 'r')
        return self

    def __next__(self):
        # 1. Check stop condition
        if self._position >= len(self._source):  # example for a list
            # Cleanup before raising
            # if self._resource:
            #     self._resource.close()
            raise StopIteration
        # 2. Fetch next item
        item = self._source[self._position]
        # 3. Advance state
        self._position += 1
        return item

# Usage
my_iter = GenericIterator([10, 20, 30])
for val in my_iter:
    print(val)
print("Second pass:")
for val in my_iter:  # Works because __iter__ resets position
    print(val)

Output

10

20

30

Second pass:

10

20

30

🔥One-shot vs Restartable:

If your iterator wraps a consumable source (e.g., a network stream), it cannot be restarted. In that case, __iter__ should simply return self without resetting. If your iterator wraps a reusable source (e.g., a list or a file you can re-open), make __iter__ reset the state so consumers can iterate multiple times. Document whether your iterator is restartable or one-shot.

📊 Production Insight

In production, prefer restartable iterators when possible — they prevent the silent exhaustion bug we describe in the incident section. If the source is not restartable (e.g., a Kafka stream), enforce single-consumer patterns by documenting the contract and adding asserts in __iter__ to detect double iteration.
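
One possible shape for such a guard is sketched below. SingleUseStream is a hypothetical wrapper, and it raises RuntimeError instead of using assert, because assert statements are stripped when Python runs with -O.

```python
class SingleUseStream:
    """Wraps a one-shot source and fails loudly on a second iter() call."""

    def __init__(self, source):
        self._source = iter(source)
        self._started = False

    def __iter__(self):
        if self._started:
            raise RuntimeError("SingleUseStream was already iterated once")
        self._started = True
        return self

    def __next__(self):
        return next(self._source)


stream = SingleUseStream(n for n in range(3))
first = list(stream)
try:
    second = list(stream)      # a second consumer: fails instead of seeing []
except RuntimeError as exc:
    second = str(exc)

print(first)   # [0, 1, 2]
print(second)  # SingleUseStream was already iterated once
```

The trade-off is strictness: legitimate code that calls iter() twice on the same stream (for example, reading a header with next() and then looping) would also trip the guard, so this fits best at pipeline boundaries where a single consumer is the contract.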

🎯 Key Takeaway

Build custom iterators with __init__(source), __iter__(reset/return self), __next__(return or StopIteration). Always clean up resources on StopIteration. Make it restartable if the source allows it.

Generator Functions — The Shortcut Python Gives You

Writing a full iterator class is powerful, but verbose. Python gives you a shortcut: generator functions. Any function that contains a yield statement automatically becomes a factory for iterator objects called generators. Python handles all the __iter__ and __next__ plumbing for you.

Under the hood, calling a generator function doesn't execute the body at all — it returns a generator object. Each call to next() on that object resumes execution from the last yield, suspending again at the next one. This is exactly the same pause-and-resume behaviour as our custom iterator, but expressed in a fraction of the code.
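
A small sketch makes the pause-and-resume behaviour visible. The squares generator and the events list are illustrative names.

```python
events = []

def squares(limit):
    events.append("body started")
    for n in range(limit):
        yield n * n
    events.append("body finished")

gen = squares(2)
after_call = list(events)    # []: calling the function ran none of the body
first = next(gen)            # 0: the body runs up to the first yield
after_first = list(events)   # ['body started']
second = next(gen)           # 1: resumes right after the previous yield
last = next(gen, "done")     # the loop ends and the generator returns, so the default is used

print(after_call, first, after_first, second, last)
print(events)  # ['body started', 'body finished']
```

Passing a default to next() is a convenient way to probe a generator without writing a try/except for StopIteration.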

The real-world sweet spot for generators is producing sequences that are either very large or computationally expensive — think paginated API responses, mathematical series, or streaming transformations. If your data source is 'pull-based' (you ask for the next item when you're ready), a generator is almost always the right tool.

api_pagination_generator.py · PYTHON


# Simulates a paginated REST API that returns users in pages
def mock_api_fetch(page_number, page_size=3):
    """
    Pretend this is calling requests.get('https://api.example.com/users?page=N').
    Returns a list of users for that page, or empty list when pages run out.
    """
    all_users = [
        {"id": 1, "name": "Alice Nakamura"},
        {"id": 2, "name": "Ben Okafor"},
        {"id": 3, "name": "Carmen Reyes"},
        {"id": 4, "name": "David Chen"},
        {"id": 5, "name": "Elena Petrov"},
        {"id": 6, "name": "Farhan Malik"},
        {"id": 7, "name": "Grace Osei"},
    ]
    start_index = (page_number - 1) * page_size
    return all_users[start_index : start_index + page_size]


def paginated_user_stream(page_size=3):
    """
    Generator that fetches users page-by-page.
    Callers never need to know pagination exists — they just iterate.
    """
    current_page = 1
    while True:
        print(f"  [Generator] Fetching page {current_page} from API...")
        page_data = mock_api_fetch(current_page, page_size)

        if not page_data:  # Empty page means no more data
            print("  [Generator] All pages consumed — raising StopIteration")
            return  # 'return' inside a generator raises StopIteration automatically

        for user in page_data:
            yield user  # Suspend here, hand one user to the caller

        current_page += 1  # Only advance the page after all users on it are yielded


# The caller just iterates — zero pagination logic leaks out
print("Streaming all users from API:\n")
for user in paginated_user_stream(page_size=3):
    print(f"  Processing user: {user['name']} (ID: {user['id']})")

print("\nDone — all users processed.")

Output

Streaming all users from API:

[Generator] Fetching page 1 from API...

Processing user: Alice Nakamura (ID: 1)

Processing user: Ben Okafor (ID: 2)

Processing user: Carmen Reyes (ID: 3)

[Generator] Fetching page 2 from API...

Processing user: David Chen (ID: 4)

Processing user: Elena Petrov (ID: 5)

Processing user: Farhan Malik (ID: 6)

[Generator] Fetching page 3 from API...

Processing user: Grace Osei (ID: 7)

[Generator] Fetching page 4 from API...

[Generator] All pages consumed — raising StopIteration

Done — all users processed.

🔥Interview Gold — Generator vs Iterator Class:

A generator function is syntactic sugar for writing an iterator class. They are functionally equivalent — but generators are dramatically more readable for sequential, single-pass data production. Use a class when you need multiple methods, complex state, or the ability to restart it cleanly.
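
To make the comparison concrete, here is the same countdown written both ways. Both names are illustrative.

```python
class CountdownIterator:
    """Hand-written iterator: explicit state and both protocol methods."""

    def __init__(self, start):
        self._current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


def countdown(start):
    """Generator function: same sequence, with the state kept in a local variable."""
    while start > 0:
        yield start
        start -= 1


from_class = list(CountdownIterator(3))
from_generator = list(countdown(3))
print(from_class, from_generator)  # [3, 2, 1] [3, 2, 1]

gen = countdown(1)
print(iter(gen) is gen)  # True: a generator is its own iterator
```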

📊 Production Insight

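A generator that holds an open network connection (like a paginated API client) should release it once the generator is exhausted or abandoned.

If the consuming code breaks out of a loop early, the generator stays suspended at its last yield, and any cleanup after that point runs only when the generator is closed or garbage-collected, whose timing varies across Python implementations. Put the cleanup in try/finally inside the generator, and call close() on it (or wrap it in contextlib.closing) when you need the release to happen at a known point.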
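
A minimal sketch of that pattern follows. FakeConnection and stream_items are hypothetical stand-ins for a real client, and the paging logic is invented for the example.

```python
import contextlib

class FakeConnection:
    """Stand-in for a network connection; it only tracks whether it was closed."""

    def __init__(self):
        self.closed = False

    def fetch(self, page):
        # Three pages of three items each, then an empty page.
        return [page * 10 + i for i in range(3)] if page < 3 else []

    def close(self):
        self.closed = True


def stream_items(conn):
    try:
        page = 0
        while True:
            items = conn.fetch(page)
            if not items:
                return
            yield from items
            page += 1
    finally:
        conn.close()  # runs on exhaustion, on close(), and on errors


conn = FakeConnection()
with contextlib.closing(stream_items(conn)) as items:
    for item in items:
        if item >= 10:
            break  # early exit: the generator is still suspended at a yield

# Leaving the with-block called items.close(), which ran the finally clause.
print(conn.closed)  # True
```

Without the with-block, the connection would stay open until the suspended generator was garbage-collected.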

🎯 Key Takeaway

Generators = implicit iterators via yield.

Each call to the generator function returns a fresh generator object.

Use try/finally inside generators for cleanup of external resources.

itertools Quick Reference — Lazy Iterator Building Blocks

Python's itertools module is a collection of fast, memory-efficient iterator building blocks. Nearly every function in itertools returns a lazy iterator (tee returns a tuple of them), so nothing is evaluated until you loop over it. This makes them ideal for chaining transformations without blowing up memory.

Here's a quick reference table of the most commonly used itertools functions. Use this as a cheat sheet during development:

Function	Purpose	Example Usage
`count(start=0, step=1)`	Infinite arithmetic progression	`for i in itertools.count(10, 2):` yields 10, 12, 14,...
`cycle(iterable)`	Infinite repetition of an iterable	`for c in itertools.cycle('AB'):` yields A, B, A, B,...
`repeat(element, times=None)`	Repeat a single value	`itertools.repeat(3, 4)` yields 3, 3, 3, 3
`accumulate(iterable, func=operator.add)`	Running total (or any binary function)	`itertools.accumulate([1,2,3])` yields 1, 3, 6
`chain(*iterables)`	Treat multiple iterables as one	`itertools.chain([1,2], [3,4])` yields 1, 2, 3, 4
`compress(data, selectors)`	Filter data using a selector iterable	`itertools.compress('ABCD', [1,0,1,0])` yields A, C
`dropwhile(predicate, iterable)`	Drop items while predicate is true, then yield all	`itertools.dropwhile(lambda x: x<5, [1,4,6,2])` yields 6, 2
`takewhile(predicate, iterable)`	Yield items while predicate is true, stop on first false	`itertools.takewhile(lambda x: x<5, [1,4,6,2])` yields 1, 4
`filterfalse(predicate, iterable)`	Yield items where predicate is false	`itertools.filterfalse(lambda x: x%2, [1,2,3])` yields 2
`groupby(iterable, key=None)`	Consecutive keys and groups (sort first!)	`for key, group in itertools.groupby('AAABBC'):` yields groups A, B, C
`product(*iterables, repeat=1)`	Cartesian product	`itertools.product([0,1], repeat=2)` yields (0,0), (0,1), (1,0), (1,1)
`permutations(iterable, r=None)`	All r-length permutations	`itertools.permutations('AB', 2)` yields ('A','B'), ('B','A')
`combinations(iterable, r)`	All r-length combinations (order doesn't matter)	`itertools.combinations('AB', 2)` yields ('A','B')

When to use itertools? Any time you're writing a custom loop that involves skipping, grouping, or combining sequences. These functions are implemented in C and are significantly faster than equivalent pure-Python code.

itertools_examples.py · PYTHON

import itertools

# Infinite cycling – useful for round-robin routing
colors = ['red', 'blue', 'green']
color_cycle = itertools.cycle(colors)
print("Cycle:", [next(color_cycle) for _ in range(5)])

# Accumulate – running total of sales
sales = [100, 200, 150, 300]
running = list(itertools.accumulate(sales))
print("Accumulate:", running)

# Chain – merge two log files
log1 = ['ERROR: conn failed', 'WARN: timeout']
log2 = ['INFO: retry', 'ERROR: disk full']
merged = list(itertools.chain(log1, log2))
print("Chain:", merged)

# Product – all combinations of two features
features = ['a', 'b']
flags = [1, 2]
combos = list(itertools.product(features, flags))
print("Product:", combos)

# Takewhile – stop reading at a sentinel line, as in log parsing
lines = ['start', 'ok', 'END', 'more']
for line in itertools.takewhile(lambda x: x != 'END', lines):
    print("Takewhile:", line)

Output

Cycle: ['red', 'blue', 'green', 'red', 'blue']

Accumulate: [100, 300, 450, 750]

Chain: ['ERROR: conn failed', 'WARN: timeout', 'INFO: retry', 'ERROR: disk full']

Product: [('a', 1), ('a', 2), ('b', 1), ('b', 2)]

Takewhile: start

Takewhile: ok

💡Performance Boost:

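In CPython, the itertools functions are implemented in C. Replacing a hand-written Python loop with, say, itertools.chain or itertools.groupby can give a speedup in the region of 2x-5x, although the gain depends heavily on the workload, so measure before relying on it. Most of them also keep memory use flat regardless of input size because they are lazy; the exceptions are the ones that must remember their input, such as cycle, tee, and the combinatoric functions (product, permutations, combinations).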

📊 Production Insight

In production ETL pipelines, itertools functions like groupby, chain, and takewhile are often used to process streaming data without materializing intermediate lists. Combine them with generator functions to build complex lazy pipelines that handle gigabytes of data with minimal memory footprint.
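
A small lazy pipeline along those lines might look like this. The parse helper and the log lines are invented for the example.

```python
import itertools

def parse(lines):
    """Lazily split 'LEVEL message' lines into (level, message) tuples."""
    for line in lines:
        level, _, message = line.partition(" ")
        yield level, message

# Two sources whose lines are already grouped by level; the content is made up.
batch_a = ["ERROR disk full", "ERROR conn failed"]
batch_b = ["INFO retry", "INFO started", "WARN timeout"]

records = parse(itertools.chain(batch_a, batch_b))

# groupby only merges consecutive equal keys, so the input must arrive grouped.
counts = {
    level: sum(1 for _ in group)
    for level, group in itertools.groupby(records, key=lambda rec: rec[0])
}
print(counts)  # {'ERROR': 2, 'INFO': 2, 'WARN': 1}
```

Only one record is held at a time; no intermediate list of parsed records is ever built. Each group must be consumed before moving to the next key, because all groups share the single underlying iterator.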

🎯 Key Takeaway

itertools provides lazy, fast, and composable iterator building blocks. Use them to replace custom loops and reduce memory usage. Common functions: count, cycle, repeat, accumulate, chain, takewhile, groupby, product.

Lazy Evaluation — How Iterators Enable Streaming and Large Data Processing

The real superpower of iterators isn't just the protocol — it's that they evaluate values only when asked. This is lazy evaluation. Instead of building a whole list in memory, an iterator produces one element at a time. That means you can process data streams that would never fit in RAM: reading a 100 GB log file, iterating over an infinite mathematical sequence, or consuming a real-time sensor feed.

Python's standard library is full of lazy iterators: map(), filter(), zip(), enumerate(), reversed() (on sequences) — all return iterators. Even range() returns an iterable that produces numbers on demand, not a list of all numbers. This design is intentional: Python defaults to lazy unless you force it with list(), tuple(), or a comprehension with brackets.

Understanding lazy evaluation helps you design systems that are memory-efficient by default. If you find yourself calling list() on a generator just to pass it to a function, stop and ask: does that function truly need random access, or can it work with a stream? In many cases, the function itself can be refactored to iterate lazily.

lazy_evaluation.py · PYTHON

# Compare memory usage between eager (list) and lazy (generator) for a range

# Eager: creates a list of 10 million integers — ~400 MB in memory!
eager_squares = [x * x for x in range(10_000_000)]  # DON'T run this unless you have spare memory

# Lazy: generator expression produces one square at a time — ~0 bytes for the data
lazy_squares = (x * x for x in range(10_000_000))

# Use a small slice to demonstrate laziness
small_lazy = (x * x for x in range(10))
for value in small_lazy:
    print(value, end=' ')  # Prints 0 1 4 9 16 25 36 49 64 81

print()

# map() and filter() are also lazy — they don't compute until iterated
nums = [1, 2, 3, 4, 5]
lazy_doubled = map(lambda n: n * 2, nums)
print(lazy_doubled)  # <map object at 0x...> — not a list!
print(list(lazy_doubled))  # [2, 4, 6, 8, 10]

# Practical example: reading lines from a huge file without loading all
# with open('giant_log.txt') as f:
#     for line in f:    # f is an iterator over lines
#         process(line)  # Only one line in memory at a time

Output

0 1 4 9 16 25 36 49 64 81

<map object at 0x...>

[2, 4, 6, 8, 10]

Mental Model

Lazy vs Eager Mental Model

Think of lazy evaluation as a vending machine vs a warehouse: a vending machine hands over one item at a time, only when you ask for it (iterator). A warehouse gives you everything at once, but takes up huge space (list).

Lazy: only compute/load what the consumer asks for, one step at a time.
Eager: compute/load everything up front, storing it all in memory.
Python's built-in functions like map, filter, zip are lazy by default.
Converting a lazy sequence to a list forces eager evaluation — use sparingly.

📊 Production Insight

Lazy evaluation hides the cost of iteration until the moment you need it.

If you chain many lazy operations (map, filter, zip), nothing runs when the chain is built; each next() call on the outermost iterator then pulls one item through every stage.

This can lead to surprising performance: the cost is spread out, not batched. Use profiling to ensure latency is acceptable in hot loops.
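
A short trace shows where the work actually happens. The source and double functions and the trace list are illustrative names.

```python
trace = []

def source():
    for n in range(1, 5):
        trace.append(f"produce {n}")
        yield n

def double(n):
    trace.append(f"double {n}")
    return n * 2

# Building the chain does no work at all.
pipeline = filter(lambda n: n > 2, map(double, source()))
trace.append("pipeline built")

# One next() on the outer iterator drives every stage, and here it needs
# two source items because the first result (2) is filtered out.
first = next(pipeline)

print(first)  # 4
print(trace)
# ['pipeline built', 'produce 1', 'double 1', 'produce 2', 'double 2']
```

This is why the cost of a lazy pipeline shows up inside the consuming loop rather than at the line where the pipeline is defined.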

🎯 Key Takeaway

Lazy = compute on demand, memory efficient.

Eager = compute up front, fast random access.

Prefer lazy by default; convert to eager only when necessary.

Memory Efficiency Comparison: List vs Generator (Eager vs Lazy)

The single most important practical difference between lists and generators is memory consumption. For large datasets, a list stores all elements in memory simultaneously, while a generator produces each element on demand and discards it after use. That can decide whether a pipeline runs comfortably on a laptop or crashes with MemoryError.

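Below is a comparison table for a sequence of n integers (assuming CPython on a 64-bit system). The list column counts roughly 28 bytes per int object; the list's own pointer array adds about 8 bytes per element on top of that. The generator object's exact size depends on the Python version, but it does not grow with n. Actual numbers vary by Python version and system, but the ratios hold.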

Number of Integers	List Memory (approx.)	Generator Memory (approx.)
1,000	~28 KB	~112 bytes (generator object)
100,000	~2.8 MB	~112 bytes
10,000,000	~280 MB	~112 bytes
100,000,000	~2.8 GB	~112 bytes

As you can see, the list's memory grows linearly with n, whereas the generator object's size is constant because it doesn't store the data — only a reference to the generating function and current state.

The same applies to the other lazy built-ins: map() and filter() return iterators where the equivalent list comprehension builds a full list, and zip() is already lazy. Converting any lazy iterable with list() forces eager evaluation and consumes memory proportional to the entire sequence.

When to use a generator (lazy): When you only need to iterate once, and the sequence is large or expensive to compute.

When to use a list (eager): When you need random access (indexing), multiple passes over the data, or when the dataset is small enough that memory is not a concern.

The rule of thumb: If you can avoid storing the whole dataset in memory, do it. Start with a generator; only switch to a list if you run into a use case that genuinely requires it.
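One way to check this rule on your own data is to compare peak allocations for the two styles. The sketch below uses the standard-library `tracemalloc` module; the helper name `peak_bytes` and the size `N` are assumptions chosen for illustration, and the exact byte counts will differ between machines and Python versions.

```python
import tracemalloc

def peak_bytes(work):
    """Run work() and return the peak number of bytes allocated while it ran."""
    tracemalloc.start()
    try:
        work()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

N = 200_000  # small enough to run anywhere

# Eager: the whole list exists before sum() sees the first element.
list_peak = peak_bytes(lambda: sum([x * x for x in range(N)]))

# Lazy: sum() pulls one square at a time from the generator expression.
gen_peak = peak_bytes(lambda: sum(x * x for x in range(N)))

print(f"list peak:      {list_peak:,} bytes")
print(f"generator peak: {gen_peak:,} bytes")
```

Unlike `sys.getsizeof`, this measurement includes the int objects the list refers to, so it is closer to what a real pipeline pays.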

memory_comparison.pyPYTHON

# Quick memory measurement example
import sys

# Generator expression for 10 million squares
lazy_squares = (x * x for x in range(10_000_000))
print(f"Generator object size: {sys.getsizeof(lazy_squares)} bytes")

# Equivalent list (WARNING: don't run this on low-memory machines)
# eager_squares = [x * x for x in range(10_000_000)]
# print(f"List size: {sys.getsizeof(eager_squares)} bytes")

# Practical check for 1 million items. Note: sys.getsizeof is shallow, so for the
# list it reports only the pointer array, not the int objects it refers to.
small_list = [x for x in range(1_000_000)]
small_gen  = (x for x in range(1_000_000))
print(f"List of 1M ints: {sys.getsizeof(small_list) / 1024 / 1024:.1f} MB")
print(f"Generator for 1M ints: {sys.getsizeof(small_gen)} bytes")

Output

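Generator object size: 112 bytes (exact value varies by Python version)

List of 1M ints: ~8 MB (pointer array only; each int object adds roughly 28 bytes on top)

Generator for 1M ints: 112 bytes (exact value varies by Python version)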

⚠ Memory Traps in Production:

A common mistake is to call list() on a generator inside a loop or a function, not realizing the data size. This can cause OOM errors. Always profile memory usage before deploying. Use tools like memory_profiler or tracemalloc to detect accidental eager materialization.

📊 Production Insight

In production, always measure — not estimate. A generator that wraps a database cursor may seem cheap, but if the cursor fetches all rows eagerly under the hood, you're still paying the memory cost. Test with production-sized data to verify actual memory usage.
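As a rough sketch of a cursor wrapper that at least asks for rows in small batches, the example below uses the standard-library `sqlite3` module with an in-memory database. The function name `stream_rows`, the `events` table, and the batch size are hypothetical. Whether batching really bounds memory depends on the driver you use, which is exactly why measuring matters.

```python
import sqlite3

def stream_rows(conn, query, batch_size=2):
    """Yield rows one at a time, fetching them from the cursor in small batches."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is finished
            return
        yield from batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "click"), (2, "view"), (3, "click")],
)

rows = list(stream_rows(conn, "SELECT id, kind FROM events ORDER BY id"))
print(rows)  # [(1, 'click'), (2, 'view'), (3, 'click')]
conn.close()
```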

🎯 Key Takeaway

Generators use constant memory regardless of sequence length. Lists use O(n) memory. Default to generators; only materialize to a list when you need random access or multiple passes.

The Exhaustion Trap and the iter() Sentinel Form

Here's the behaviour that catches almost everyone at some point: iterators are one-shot. Once an iterator is exhausted, it stays exhausted. Calling iter() on an already-exhausted iterator just returns the same dead object — it does not reset. This is different from calling iter() on an iterable like a list, which creates a brand-new iterator.

This distinction has a practical consequence: if you pass an iterator (not an iterable) to two functions, the second one will silently get nothing. No error. Just an empty loop. These bugs are genuinely hard to track down.

Python also has a lesser-known second form of iter() — iter(callable, sentinel) — which wraps any zero-argument callable into an iterator that keeps calling it until the return value equals the sentinel. This is incredibly useful for reading data in fixed-size blocks, processing queue items, or any situation where you have a 'pull until done' data source.

iterator_exhaustion_and_sentinel.pyPYTHON

# ── PART 1: The Exhaustion Trap ──────────────────────────────────────────────

team_members = ["Priya", "Jordan", "Kwame"]  # This is an ITERABLE
team_iterator = iter(team_members)            # This is an ITERATOR (stateful)

print("First loop:")
for member in team_iterator:
    print(f"  Hello, {member}")

print("\nSecond loop over the SAME iterator (iterator is exhausted):")
for member in team_iterator:  # This loop body never executes — silent failure!
    print(f"  Hello again, {member}")

print("  (nothing printed — iterator was already exhausted)")

print("\nSecond loop over the ORIGINAL LIST (always works):")
for member in team_members:  # list is an iterable — fresh iterator each time
    print(f"  Hello again, {member}")


# ── PART 2: The iter(callable, sentinel) Form ─────────────────────────────────

import io

# Simulate reading binary data in 4-byte blocks from a stream
binary_stream = io.BytesIO(b"TheCodeForge.io rocks!")

print("\nReading binary stream in 4-byte blocks:")

# iter(callable, sentinel): calls binary_stream.read(4) repeatedly
# Stops automatically when read() returns b'' (empty bytes — end of stream)
for block in iter(lambda: binary_stream.read(4), b""):
    print(f"  Block: {block}")  # Each block is exactly 4 bytes (except maybe last)

Output

First loop:

Hello, Priya

Hello, Jordan

Hello, Kwame

Second loop over the SAME iterator (iterator is exhausted):

(nothing printed — iterator was already exhausted)

Second loop over the ORIGINAL LIST (always works):

Hello again, Priya

Hello again, Jordan

Hello again, Kwame

Reading binary stream in 4-byte blocks:

Block: b'TheC'

Block: b'odeF'

Block: b'orge'

Block: b'.io '

Block: b'rock'

Block: b's!'

⚠ Watch Out — Passing Iterators to Multiple Functions:

If you do items = iter(some_list) and pass items to two different functions, the second function will see an exhausted iterator and loop over nothing. Always pass the original iterable (the list/set/etc.) unless you deliberately want to share position state. When in doubt, check: does this object have __next__? If yes, it's an iterator — treat it as one-shot.
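The "does it have `__next__`?" check can be written as a small guard. This is a sketch, and the helper name `is_one_shot` is an assumption; it relies on `collections.abc.Iterator`, which recognizes any object whose class defines both `__iter__` and `__next__`.

```python
from collections.abc import Iterator

def is_one_shot(obj):
    """True when obj is an iterator, meaning it can be traversed only once."""
    return isinstance(obj, Iterator)

names = ["Priya", "Jordan", "Kwame"]

print(is_one_shot(names))                 # False: a list hands out fresh iterators
print(is_one_shot(iter(names)))           # True: a list iterator is single-pass
print(is_one_shot(n for n in names))      # True: generators are iterators
print(is_one_shot(range(3)))              # False: range is a re-iterable sequence
```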

📊 Production Insight

The one-shot nature of iterators causes data-loss bugs that are notoriously hard to reproduce.

If your ETL pipeline uses generators, always call the factory function again for each consumer.

A common pattern: expose a function that returns a fresh iterator, not a pre-wired iterator object.
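A minimal sketch of that factory pattern follows. The names `read_records`, `count_records`, and `collect_ids` are hypothetical stand-ins for a real source and its consumers.

```python
def read_records():
    """Hypothetical source: every call returns a brand-new generator."""
    yield from ({"id": 1}, {"id": 2}, {"id": 3})

def count_records(records):
    return sum(1 for _ in records)

def collect_ids(records):
    return [record["id"] for record in records]

# Sharing one generator object: the second consumer silently gets nothing.
shared = read_records()
counted = count_records(shared)
ids_from_shared = collect_ids(shared)

# Calling the factory again: each consumer gets its own fresh iterator.
ids_from_fresh = collect_ids(read_records())

print(counted)          # 3
print(ids_from_shared)  # []
print(ids_from_fresh)   # [1, 2, 3]
```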

🎯 Key Takeaway

Iterators are one-shot — once exhausted, they never recover.

To iterate twice, always use the original iterable (not the iterator).

The iter(callable, sentinel) form is a concise pattern for polling data sources.

When to Reach for an Iterator Instead of a List (And When to Run Away)

Here's where most devs get it wrong: they use iterators because they heard they're "memory-efficient" without asking if their data actually benefits from laziness. The decision isn't philosophical—it's about access patterns.

Use iterators when you're processing data one element at a time, never needing random access, and the dataset is larger than available RAM. Streaming CSV files, parsing network packets, generating sequences on the fly. These are iterator territory.

Avoid iterators when you need to index into the data, iterate over it multiple times, or modify elements in place during iteration. A list is not your enemy—it's the right tool when you need random access or multiple passes without re-initializing the iterator.

The rule is brutal but simple: if your data fits in memory and you access it more than once, use a list. If your data doesn't fit in memory or you only traverse it once, use an iterator. Don't cargo-cult memory efficiency.

DecisionCheck.pyPYTHON

# io.thecodeforge - python tutorial

# Bad: iterator for random access (explodes at runtime)
def parse_log_files(file_paths):
    lines = (line for path in file_paths for line in open(path))
    if lines[42]:  # TypeError! Generator not subscriptable
        print("Nope.")

# Good: iterator for streaming (memory constant)
def stream_recent_logs(file_path):
    with open(file_path) as log:
        for line in log:
            if "ERROR" in line:
                yield line

# Use a list when you need index access
with open("server.log") as log:   # the with-block closes the file handle
    log_entries = log.readlines()
print(log_entries[42])  # Works fine (given at least 43 lines)

Output

TypeError: 'generator' object is not subscriptable

⚠ Production Trap:

If you convert a generator to a list with list() just to index into it once, you've just paid the full memory cost with zero benefit. That's not lazy evaluation—that's a performance lie.
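If you truly need one item at a known position, `itertools.islice` can skip ahead without building a list. The helper below follows the well-known `nth` recipe from the itertools documentation; it is a sketch, and it still consumes every item up to and including position `n`.

```python
from itertools import islice

def nth(iterable, n, default=None):
    """Return item number n (0-based), or default if the source is too short."""
    return next(islice(iterable, n, None), default)

squares = (x * x for x in range(1_000_000))
print(nth(squares, 42))          # 1764
print(nth(iter([]), 42, "n/a"))  # n/a
```

Memory stays constant because the skipped items are discarded as they are produced rather than stored.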

🎯 Key Takeaway

Use iterators for single-pass streaming over large data; use lists when you need random access or multiple iterations.

Creating Different Types of Iterators: Yield Original, Transform, or Generate New Data

Not all iterators are created equal. Once you understand the iterator protocol, you need to know which flavor solves your problem. There are three distinct patterns you'll see in production—and mixing them up causes subtle bugs.

Yielding original data means your iterator doesn't modify the source—it just exposes it lazily. Think reading a file line by line without stripping or parsing. This is the simplest, safest pattern because the consumer controls transformation.

Transforming input data is where most pipeline code lives. Your iterator yields a modified version of each element—parsing raw bytes into structs, converting log timestamps, normalizing text. The key: every element transforms independently.

Generating new data means you're producing values that have no direct mapping to input. An iterator that produces Fibonacci numbers, a counter, or a series of synthetic IDs. No external source, just logic.

Each pattern demands different testing strategies. Yielding original data is trivial to unit test. Transforming requires input/output pairs. Generating needs termination checks so a test can never loop forever.
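As an illustration of the input/output-pair style of test, here is a small sketch. It assumes a transform that parses `name,count` text lines into tuples; the sample data is made up. Because the transform accepts any iterable of lines, a plain list can stand in for a file.

```python
def transform_data(lines):
    """Parse 'name,count' text lines into (str, int) tuples, one at a time."""
    for line in lines:
        name, count = line.strip().split(",")
        yield (name, int(count))

# A list of strings replaces the file, so the test needs no disk access.
sample_lines = ["alice,42\n", "bob,17\n"]
result = list(transform_data(sample_lines))

print(result)  # [('alice', 42), ('bob', 17)]
```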

IteratorPatterns.pyPYTHON

# io.thecodeforge - python tutorial

def original_data(path):
    """Yields original lines, no transformation."""
    with open(path) as f:
        yield from f

def transform_data(lines):
    """Transforms each line into a tuple."""
    for line in lines:
        parts = line.strip().split(",")
        yield (parts[0], int(parts[1]))

def generate_ids(prefix, count):
    """Generates new data from pure logic."""
    for i in range(count):
        yield f"{prefix}_{i:04d}"

# Usage
lines = original_data("users.csv")
transformed = transform_data(lines)
for uid, count in transformed:
    print(f"{uid}: {count} entries")

for uid in generate_ids("session", 3):
    print(f"Logging {uid}")

Output

alice: 42 entries

bob: 17 entries

Logging session_0000

Logging session_0001

Logging session_0002

💡Senior Shortcut:

Use yield from when you're just relaying an existing iterable. It's faster, cleaner, and handles StopIteration propagation automatically. Only write __next__ manually when you need state management.

🎯 Key Takeaway

Choose your iterator pattern deliberately: yield original for passthrough, transform for pipelines, generate for synthetic data.

Coding Potentially Infinite Iterators — The Pattern That Breaks Beginners

Infinite iterators aren't a gimmick—they're how you model real-time data streams, retry loops, or sensor feeds. But infinite means you never get a StopIteration. If you write a for loop over one, you hang. Forever.

The pattern is simple: write an iterator that never raises StopIteration, and control consumption from the caller side. Use itertools.islice, takewhile, or explicit break conditions. The generator function with yield is your cleanest tool here.

The danger? Forgetting to add a break condition in a production loop. I've seen a batch processing job run for 14 hours because an infinite iterator fed into a for loop with no termination logic. The code looked correct until you traced the data flow.

Best practice: always wrap infinite iterators with a bounded consumer. Either pass a count limit or use itertools.takewhile with a predicate. If someone else uses your iterator, they won't expect it to hang—make the infinite nature explicit in the function name.
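A predicate-bounded consumer might look like the sketch below. The generator name `backoff_delays_forever` and the 30-second cap are assumptions for illustration; the name itself advertises that the source never ends.

```python
import itertools

def backoff_delays_forever(base=1.0, factor=2.0):
    """Infinite by design: yields base, base*factor, ... and never stops."""
    delay = base
    while True:
        yield delay
        delay *= factor

# The caller bounds consumption with a predicate instead of a fixed count.
delays = list(itertools.takewhile(lambda d: d <= 30, backoff_delays_forever()))
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Note that `takewhile` stops at the first value that fails the predicate, so it only makes sense when failing values mean "we are done", as with a steadily growing delay.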

InfiniteSensor.pyPYTHON

# io.thecodeforge - python tutorial

import itertools
import random

def sensor_temperature():
    """Infinite iterator: never raises StopIteration."""
    while True:
        yield random.gauss(25.0, 2.0)

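def bounded_take(iterator, count):
    """Safe wrapper: yields at most `count` items."""
    # zip stops at the shorter input, so an exhausted source ends the loop
    # cleanly. A bare next() here would turn StopIteration into RuntimeError.
    for _, item in zip(range(count), iterator):
        yield item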

# Safe usage with islice
for temp in itertools.islice(sensor_temperature(), 100):
    if temp > 30:
        print(f"Alert: {temp:.1f}°C")

# Dangerous: forever loop
# for temp in sensor_temperature():  # Hangs!
#     print(temp)

Output (sample; readings are random, so alerts differ between runs)

Alert: 31.2°C

Alert: 32.7°C

Alert: 30.5°C

⚠ Production Trap:

Never put an infinite iterator as the source of a for loop without an explicit break or itertools.islice. Your CI pipeline doesn't have forever—your processor will run until OOM or forced kill.

🎯 Key Takeaway

Infinite iterators need bounded consumers. Always use itertools.islice or takewhile to limit consumption—don't rely on the caller remembering to break.

Stop Writing Boilerplate: Subclass collections.abc.Iterator Instead

Every time you hand-roll __iter__ and __next__ on a class, you're writing code that Python already gave you. The collections.abc module ships with Iterator — an abstract base class that automatically provides __iter__ for you. You just implement __next__. That's it.

Why does this matter? Because __iter__ returning self is boilerplate you will forget, and when you forget it, your iterator won't work in for loops. Iterator.__subclasshook__ also recognizes classes that define both __iter__ and __next__ without explicit inheritance, so isinstance checks stay duck-typing friendly.

In production, this pattern matters when you're building streaming data pipelines, file parsers, or any component that processes chunks. Subclassing Iterator signals intent — every dev on your team immediately knows this class is meant to be exhausted. No guesswork, no hidden state bugs.

SentinelReader.pyPYTHON

# io.thecodeforge - python tutorial

from collections.abc import Iterator

class LogTailer(Iterator):
    def __init__(self, path: str):
        self.file = open(path)
    
    def __next__(self) -> str:
        line = self.file.readline()
        if not line:
            self.file.close()
            raise StopIteration
        return line.strip()

# Usage
for entry in LogTailer("/var/log/syslog"):
    if "ERROR" in entry:
        print(entry)
        break

Output

Oct 15 10:32:17 server kernel: [12345.678] ERROR: disk I/O timeout

⚠ Production Trap:

If your __next__ raises StopIteration but you forgot to close resources, you've created a file descriptor leak. Always finalize in the exhaustion branch.
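Closing in the exhaustion branch does not help when the consumer breaks out early. One possible remedy, sketched below, is to make the iterator a context manager as well. The class name `ClosingLines` is hypothetical, and `io.StringIO` stands in for a real log file.

```python
import io
from collections.abc import Iterator

class ClosingLines(Iterator):
    """Iterates over lines and guarantees the stream gets closed."""

    def __init__(self, stream):
        self._stream = stream

    def __next__(self):
        line = self._stream.readline()
        if not line:
            self.close()          # normal exhaustion path
            raise StopIteration
        return line.rstrip("\n")

    def close(self):
        self._stream.close()      # closing twice is harmless for file objects

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()              # early-exit path: break, return, exception

stream = io.StringIO("ok\nERROR disk\nok\n")
found = None
with ClosingLines(stream) as lines:
    for entry in lines:
        if "ERROR" in entry:
            found = entry
            break                 # the iterator is NOT exhausted here

print(found)          # ERROR disk
print(stream.closed)  # True
```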

🎯 Key Takeaway

Inherit from collections.abc.Iterator — stop writing __iter__ yourself, the base class gives it to you.

Why You Should Inherit From collections.abc.Iterator (And Not Just Wing It)

Hand-rolled iterators break silently in subtle ways. Your custom class has __next__ but some copy-paste rookie forgets __iter__? Now it fails in list() and for loops with a TypeError: 'YourClass' object is not iterable. That's a 30-minute debugging session where you stare at code that clearly has __next__ and scream at your monitor.

Subclassing Iterator eliminates that entire class of bug. The ABC supplies __iter__ returning self as a mixin method, and it refuses to instantiate a subclass that forgot __next__, so the mistake surfaces at construction time instead of deep inside a loop. Your iterator becomes a first-class citizen: it works with for loops, list(), itertools, and any function that expects an iterable.

When you're building production data pipelines, this isn't about elegance — it's about consistency. Every team member follows the same contract. Your code review notes go from 'add __iter__' to 'approved', and you get back to shipping features.

ChunkReader.pyPYTHON

# io.thecodeforge - python tutorial

from collections.abc import Iterator

class ChunkReader(Iterator):
    def __init__(self, data: list, chunk_size: int):
        self.data = data
        self.chunk_size = chunk_size
        self._index = 0

    def __next__(self) -> list:
        if self._index >= len(self.data):
            raise StopIteration
        chunk = self.data[self._index:self._index + self.chunk_size]
        self._index += self.chunk_size
        return chunk

# Works instantly with standard library
from itertools import islice
logs = list(range(100))
for batch in islice(ChunkReader(logs, 10), 3):
    print(len(batch))

Output

10

10

10

💡Senior Shortcut:

Use isinstance(your_obj, Iterator) for type checks: it catches both explicit subclasses and any class that defines both __iter__ and __next__.
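The behaviour of that check can be verified with two throwaway classes. This is a small sketch; `OnlyNext` and `Both` are hypothetical names, and neither class inherits from `Iterator`.

```python
from collections.abc import Iterator

class OnlyNext:
    """Defines __next__ but not __iter__: not a complete iterator."""
    def __next__(self):
        raise StopIteration

class Both:
    """Defines both protocol methods, with no explicit inheritance."""
    def __iter__(self):
        return self
    def __next__(self):
        raise StopIteration

print(isinstance(OnlyNext(), Iterator))  # False
print(isinstance(Both(), Iterator))      # True
print(list(Both()))                      # []
```

The check looks at which methods the class defines. It does not call `__iter__`, so it cannot tell whether that method actually returns `self`.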

🎯 Key Takeaway

Subclass collections.abc.Iterator to eliminate boilerplate, prevent silent TypeError bugs, and make your code review pass on first submission.

Iterator Protocol: __iter__ and __next__ Deep-Dive

The iterator protocol is the foundation of all iteration in Python. It consists of two methods: __iter__ and __next__. Understanding these methods is crucial for building custom iterators and debugging iterator exhaustion.

__iter__ should return the iterator object itself. This allows an iterator to be used in a for loop, which implicitly calls iter() on the iterable. For a proper iterator, __iter__ returns self. __next__ should return the next item in the sequence. When there are no more items, it must raise StopIteration.

Here's a minimal iterator that yields numbers from n-1 down to 0:

```python
class CountDown:
    def __init__(self, n):
        self.n = n
        self.current = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current
```

Usage:

```python
for x in CountDown(3):
    print(x)  # prints 2, 1, 0
```

A common mistake is forgetting to implement __iter__ or returning something other than self. Without __iter__, the object cannot be used in a for loop directly (though it can still be used with next() manually). The protocol ensures that iterators are also iterables.

Another subtlety: once an iterator is exhausted, calling __next__ raises StopIteration indefinitely. There is no built-in way to reset an iterator; you must create a new instance. This is the root cause of the silent data drop in ETL pipelines when an iterator is consumed multiple times.

In practice, you rarely need to implement __iter__ and __next__ manually because generator functions and itertools cover most use cases. However, when you do, always remember to raise StopIteration and return self from __iter__.
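To see how the two protocol methods cooperate, it can help to write out by hand roughly what a for-loop does. The function `manual_for` below is a hypothetical teaching aid, not how CPython implements loops internally, but the observable behaviour is the same.

```python
def manual_for(iterable, body):
    """Rough equivalent of: for item in iterable: body(item)"""
    iterator = iter(iterable)      # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                  # the loop ends quietly, no error surfaces
        body(item)

seen = []
manual_for([1, 2, 3], seen.append)
print(seen)  # [1, 2, 3]

# An exhausted iterator makes the "loop" run zero times, silently.
spent = iter([])
manual_for(spent, seen.append)
print(seen)  # still [1, 2, 3]
```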

iterator_protocol.pyPYTHON

class CountDown:
    def __init__(self, n):
        self.n = n
        self.current = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current

# Usage
for x in CountDown(3):
    print(x)  # prints 2, 1, 0

# Manual iteration
cd = CountDown(2)
print(next(cd))  # 1
print(next(cd))  # 0
print(next(cd))  # StopIteration raised

⚠ Iterator Exhaustion is Permanent

📊 Production Insight

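In production ETL, always ensure that iterators are used only once or are explicitly recreated. Use itertools.tee if you need multiple independent iterators from a single source, but remember that tee buffers every item one consumer has read and another has not, so it saves no memory when one consumer runs far ahead of the other.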
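A minimal sketch of `itertools.tee` splitting one generator between two consumers is shown here. The variable names are illustrative. After calling `tee`, the original `source` should not be touched again, or the branches will miss items.

```python
import itertools

source = (n * n for n in range(5))

# Two independent cursors over the same underlying generator.
for_validation, for_loading = itertools.tee(source, 2)

validated = list(for_validation)  # runs fully first, so tee buffers all 5 items
loaded = list(for_loading)        # replays them from tee's internal buffer

print(validated)  # [0, 1, 4, 9, 16]
print(loaded)     # [0, 1, 4, 9, 16]
```

Because the first branch is drained completely before the second starts, this particular usage holds the whole sequence in memory. A plain list would be just as cheap and simpler to reason about.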

🎯 Key Takeaway

The iterator protocol requires __iter__ to return self and __next__ to return the next item or raise StopIteration. Exhaustion is irreversible.

itertools.count, cycle, repeat: Infinite Iterators

The itertools module provides several functions for creating infinite iterators. These are lazy and never raise StopIteration on their own, making them powerful but also dangerous if not used with a break condition.

itertools.count(start=0, step=1) returns an iterator that generates consecutive integers starting from start with a given step. It is often used with zip to add indices or with map to generate sequences.

import itertools
for i, item in zip(itertools.count(1), ['a', 'b', 'c']):
    print(i, item)  # 1 a, 2 b, 3 c

itertools.cycle(iterable) returns an iterator that repeats the elements of the iterable indefinitely. It is useful for cycling through a fixed set of values, like alternating colors or statuses.

colors = ['red', 'green', 'blue']
color_cycle = itertools.cycle(colors)
for _ in range(5):
    print(next(color_cycle))  # red, green, blue, red, green

itertools.repeat(object, times=None) returns an iterator that yields the same object repeatedly. If times is specified, it yields that many times; otherwise, it yields forever.

for x in itertools.repeat(42, 3):
    print(x)  # 42, 42, 42

Infinite iterators are useful for generating test data, implementing polling loops, or creating streaming pipelines. However, they must be used with caution: always include a termination condition (e.g., break, takewhile, or islice) to avoid infinite loops.

A common pattern is to combine count with takewhile to generate a finite sequence:

for i in itertools.takewhile(lambda x: x < 10, itertools.count()):
    print(i)  # 0 through 9

In ETL, infinite iterators can simulate real-time data streams. For example, cycle can be used to repeat configuration values, and count can generate unique IDs.

infinite_iterators.pyPYTHON

import itertools

# count: infinite counter
for i, item in zip(itertools.count(1), ['a', 'b', 'c']):
    print(i, item)

# cycle: infinite repeat
colors = ['red', 'green', 'blue']
color_cycle = itertools.cycle(colors)
for _ in range(5):
    print(next(color_cycle))

# repeat: repeat object
for x in itertools.repeat(42, 3):
    print(x)

# safe use with takewhile
for i in itertools.takewhile(lambda x: x < 10, itertools.count()):
    print(i)

⚠ Infinite Iterators Require Explicit Termination

📊 Production Insight

In production, use itertools.islice to limit infinite iterators to a fixed number of items, or combine with takewhile for condition-based termination. Avoid using infinite iterators in for loops without a break.
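For count-based limits, `itertools.islice` can be sketched like this. The start value, step, and colour names are arbitrary examples.

```python
from itertools import count, cycle, islice

# Take exactly three IDs from an endless counter.
first_ids = list(islice(count(100, 10), 3))
print(first_ids)  # [100, 110, 120]

# Take five values from an endless rotation.
rotation = list(islice(cycle(["red", "green"]), 5))
print(rotation)   # ['red', 'green', 'red', 'green', 'red']
```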

🎯 Key Takeaway

itertools.count, cycle, and repeat create infinite iterators. They are lazy and memory-efficient but must be used with a termination strategy.

Custom Iterator Implementation Patterns

Beyond the basic iterator protocol, there are several patterns for implementing custom iterators in Python. These patterns help you write clean, reusable, and efficient code.

1. Class-based iterator with state: This is the classic pattern where the iterator maintains internal state. It's useful when you need to track position, buffer data, or manage resources.

class FileLineReader:
    def __init__(self, filename):
        self.filename = filename
        self.file = None

    def __iter__(self):
        self.file = open(self.filename)
        return self

    def __next__(self):
        line = self.file.readline()
        if not line:
            self.file.close()
            raise StopIteration
        return line.rstrip('\n')

2. Generator-based iterator: Generators are the simplest way to create iterators. They automatically implement the iterator protocol.

def file_line_reader(filename):
    with open(filename) as f:
        for line in f:
            yield line.rstrip('\n')

3. Iterator with sentinel value: The two-argument form of iter() creates an iterator that calls a function until it returns a sentinel value. This is useful for reading from streams or APIs.

with open('data.txt') as f:
    for line in iter(f.readline, ''):
        print(line.rstrip('\n'))

4. Wrapping an existing iterator: Sometimes you want to add behavior to an existing iterator, like logging or transformation. You can wrap it in a custom class.

class LoggingIterator:
    def __init__(self, iterator):
        self.iterator = iterator

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self.iterator)
        print(f"Yielding: {item}")
        return item

5. Resettable iterator: Standard iterators cannot be reset, but you can implement a resettable iterator by storing the original data and recreating the state.

class ResettableIterator:
    def __init__(self, data):
        self.data = data
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        result = self.data[self.index]
        self.index += 1
        return result

    def reset(self):
        self.index = 0

Choose the pattern that best fits your use case. Generators are preferred for simplicity, while class-based iterators offer more control and resource management.
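A related option, when the goal is simply "loop over this more than once", is to make the class an iterable rather than an iterator, so every loop gets a fresh generator. The sketch below assumes a hypothetical `LineSource` class over an in-memory string.

```python
class LineSource:
    """Re-iterable: __iter__ is a generator function, so each call starts over."""

    def __init__(self, text):
        self._text = text

    def __iter__(self):
        for line in self._text.splitlines():
            yield line

source = LineSource("a\nb\nc")
first_pass = list(source)
second_pass = list(source)  # works again, no reset() call needed

print(first_pass)   # ['a', 'b', 'c']
print(second_pass)  # ['a', 'b', 'c']
```

The position state lives in each generator, not on the object, so two loops running at the same time cannot disturb each other.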

custom_iterator_patterns.pyPYTHON

# Pattern 1: Class-based
class FileLineReader:
    def __init__(self, filename):
        self.filename = filename
        self.file = None

    def __iter__(self):
        self.file = open(self.filename)
        return self

    def __next__(self):
        line = self.file.readline()
        if not line:
            self.file.close()
            raise StopIteration
        return line.rstrip('\n')

# Pattern 2: Generator
def file_line_reader(filename):
    with open(filename) as f:
        for line in f:
            yield line.rstrip('\n')

# Pattern 3: Sentinel
with open('data.txt') as f:
    for line in iter(f.readline, ''):
        print(line.rstrip('\n'))

# Pattern 4: Wrapper
class LoggingIterator:
    def __init__(self, iterator):
        self.iterator = iterator

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self.iterator)
        print(f"Yielding: {item}")
        return item

# Pattern 5: Resettable
class ResettableIterator:
    def __init__(self, data):
        self.data = data
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        result = self.data[self.index]
        self.index += 1
        return result

    def reset(self):
        self.index = 0

💡Prefer Generators for Simplicity

📊 Production Insight

In production, prefer generator functions for most cases. Use class-based iterators when you need to manage resources (e.g., file handles) or implement complex state machines. Handle cleanup with context managers or try/finally; treat __del__ as a last resort, because Python does not guarantee when it runs.

🎯 Key Takeaway

Custom iterators can be implemented as classes or generators. Generators are simpler; classes offer more control. The sentinel form of iter() is useful for stream reading.

● Production incident · POST-MORTEM · severity: high

Exhausted Iterator Causes Silent Data Drop in ETL Pipeline

Symptom

The pipeline processed only the first half of the expected records. No errors raised—just fewer rows in the output database.

Assumption

The team assumed that passing the generator object to a validation function and then to the processing function would work like passing a list. They thought the generator would reset or that the data would be cached.

Root cause

The validation function called list() on the generator object internally, which exhausted it. The same generator object was then passed downstream, but it was already exhausted: its first __next__ call raised StopIteration immediately, so the processing loop never executed.

Fix

Instead of passing the generator object directly, call the generator function twice: once to produce an iterable for validation (e.g., converted to a list) and once to produce a fresh generator for the main processing loop. Alternatively, produce a list once and pass that around—trade memory for safety.

Key lesson

Generators are one-shot—never pass a generator object to more than one consumer.
If you need to iterate twice, either call the generator function twice or materialize the data into a list.
Always audit function signatures: if a function expects an iterable, it may exhaust the iterator. Prefer passing the factory (callable) over the instance.
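One defensive option, sketched below, is a guard at the top of any function that needs more than one pass. The name `require_reiterable` is hypothetical. It relies on the protocol fact that calling `iter()` on an iterator returns the iterator itself.

```python
def require_reiterable(data):
    """Reject single-pass iterators; return data unchanged otherwise."""
    if iter(data) is data:
        raise TypeError("expected a re-iterable collection, got a one-shot iterator")
    return data

records = require_reiterable([1, 2, 3])  # a list passes the check
print(records)  # [1, 2, 3]

try:
    require_reiterable(n for n in range(3))  # a generator is rejected loudly
except TypeError as error:
    message = str(error)
    print(message)
```

Failing loudly at the boundary turns a silent data drop into an immediate, traceable error.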

Production debug guide

Common symptoms and their root causes when iterators behave unexpectedly.

Symptom · 01

Loop body never executes for the second usage of the same variable

→

Fix

Check if the variable is an iterator (has __next__). If yes, the iterator was exhausted. Use the original iterable (list, tuple) to get a fresh iterator.

Symptom · 02

Custom class used in for-loop raises 'TypeError: 'MyClass' object is not iterable'

→

Fix

Implement an __iter__ method that returns an iterator (or use yield in __iter__ to make it a generator function). Ensure the object __iter__ returns has a __next__ method.

Symptom · 03

Generator function returns but loop never yields anything

→

Fix

Verify that you are iterating over the result of calling the generator function, not over the function itself. If you write gen = my_generator without parentheses, gen is the function, not a generator object. Add the parentheses: gen = my_generator().

Symptom · 04

File object used in two for loops: second loop prints nothing

→

Fix

File objects are iterators. After the first loop, the file pointer is at EOF. Call file.seek(0) to reset the pointer, or re-open the file.

Symptom · 05

Infinite loop when iterating a custom iterator that never raises StopIteration

→

Fix

Ensure every code path in __next__ either returns a value or raises StopIteration. Add a guard condition before returning. Test with a small dataset to verify exhaustion.
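The file-object case from Symptom 04 can be reproduced without touching the disk. In this sketch, `io.StringIO` stands in for a real file opened in text mode; a real file object behaves the same way for this purpose.

```python
import io

log = io.StringIO("first\nsecond\n")

first_pass = [line.strip() for line in log]
second_pass = [line.strip() for line in log]  # pointer is at EOF: nothing left

log.seek(0)                                   # rewind to the start
third_pass = [line.strip() for line in log]

print(first_pass)   # ['first', 'second']
print(second_pass)  # []
print(third_pass)   # ['first', 'second']
```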

★ Iterator Debug Quick Reference

One-liners for the three most common iterator-related production issues.

Iterator exhausted too early

Immediate action

Replace `iter(something)` usage with `list(something)` if you need multiple passes, or call the factory again.

Commands

print(type(obj).__name__, hasattr(obj, '__next__'))

print(id(obj)) # Compare to original variable to confirm identity

Fix now

Change for item in my_iterator: to for item in list(my_iterable): as a temporary fix, then refactor.

Custom iterator never finishes

Generator doesn't execute any statements

Feature / Aspect	Iterable	Iterator
Required methods	`__iter__()` only	`__iter__()` and `__next__()`
Holds state between calls	No — stateless	Yes — remembers current position
Can be looped multiple times	Yes — creates a fresh iterator each time	No — exhausted after one full pass
Examples in Python stdlib	list, tuple, str, dict, set	file objects, `enumerate()`, `zip()`, `map()`
What `iter()` returns	A new iterator object	Itself (self)
Memory usage pattern	Usually stores all data	Usually produces one item at a time
Can be passed to for-loop?	Yes	Yes
Restarting iteration	Automatic — just loop again	Must call `iter()` on the source iterable again
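The "What `iter()` returns" row of the table can be checked directly. This is a short sketch with made-up playlist data.

```python
playlist = ["Intro", "Verse", "Outro"]

# Iterable: every iter() call hands out a new, independent cursor.
cursor_a = iter(playlist)
cursor_b = iter(playlist)
distinct_cursors = cursor_a is not cursor_b

next(cursor_a)                 # advance A past "Intro"
from_a = next(cursor_a)        # A is now at "Verse"
from_b = next(cursor_b)        # B is unaffected and still starts at "Intro"

# Iterator: iter() returns the very same object, position included.
same_object = iter(cursor_a) is cursor_a

print(distinct_cursors)  # True
print(from_a, from_b)    # Verse Intro
print(same_object)       # True
```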

⚙ Quick Reference

16 commands from this guide

File	Command / Code	Purpose
iterable_vs_iterator.py	playlist = ["Bohemian Rhapsody", "Hotel California", "Stairway to Heaven"]	The Two-Protocol System
file_chunk_iterator.py	class FileChunkIterator:	Building a Custom Iterator
generic_iterator_template.py	class GenericIterator:	Custom Iterator Class Implementation
api_pagination_generator.py	def mock_api_fetch(page_number, page_size=3):	Generator Functions
itertools_examples.py	colors = ['red', 'blue', 'green']	itertools Quick Reference
lazy_evaluation.py	eager_squares = [x * x for x in range(10_000_000)] # DON'T run this unless you ...	Lazy Evaluation
memory_comparison.py	lazy_squares = (x * x for x in range(10_000_000))	Memory Efficiency Comparison
iterator_exhaustion_and_sentinel.py	team_members = ["Priya", "Jordan", "Kwame"] # This is an ITERABLE	The Exhaustion Trap and the iter() Sentinel Form
DecisionCheck.py	def parse_log_files(file_paths):	When to Reach for an Iterator Instead of a List (And When to
IteratorPatterns.py	def original_data(path):	Creating Different Types of Iterators
InfiniteSensor.py	def sensor_temperature():	Coding Potentially Infinite Iterators
SentinelReader.py	from collections.abc import Iterator	Stop Writing boilerplate
ChunkReader.py	from collections.abc import Iterator	Why You Should Inherit From collections.abc.Iterator (And No
iterator_protocol.py	class CountDown:	Iterator Protocol
infinite_iterators.py	for i, item in zip(itertools.count(1), ['a', 'b', 'c']):	itertools.count, cycle, repeat
custom_iterator_patterns.py	class FileLineReader:	Custom Iterator Implementation Patterns

Key takeaways

An iterable produces iterators; an iterator IS the stateful cursor. Confusing the two causes silent, hard-to-debug empty-loop bugs.

Every Python for-loop is secretly calling iter() once and then next() on every cycle until StopIteration is raised. There is no other mechanism.

Iterators are one-shot by design: once exhausted, they stay exhausted. Always pass the original iterable, not the iterator, when the same data needs to be traversed more than once.

Generator functions (using yield) are shorthand for writing iterator classes. Use them for clean, memory-efficient lazy sequences; use a full class when you need restartable iteration or complex internal state.
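The last takeaway, sketched under assumed names (`countdown_once` and `Countdown` are hypothetical): a generator function gives a one-shot iterator, while a small class whose `__iter__` builds a new generator on each call gives restartable iteration.

```python
def countdown_once(start):
    """Generator function: calling it returns a one-shot iterator."""
    while start > 0:
        yield start
        start -= 1


class Countdown:
    """Hypothetical restartable iterable: every __iter__ call starts over."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # A new generator per call, so each loop gets its own cursor.
        return countdown_once(self.start)


one_shot = countdown_once(3)
one_shot_runs = (list(one_shot), list(one_shot))
print(one_shot_runs)  # ([3, 2, 1], [])

restartable = Countdown(3)
restartable_runs = (list(restartable), list(restartable))
print(restartable_runs)  # ([3, 2, 1], [3, 2, 1])
```

Note that `Countdown` is an iterable, not an iterator: it has no `__next__`, which is exactly what lets it be looped more than once.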

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

What's the difference between an iterable and an iterator in Python, and...

Q02 · SENIOR

If you call `iter()` on an iterator object rather than a plain list, wha...

Q03 · SENIOR

You have a generator function that yields results from a paginated API. ...

Q01 of 03 · SENIOR

What's the difference between an iterable and an iterator in Python, and how does a for-loop use both of them internally?

ANSWER

An iterable is any object that implements `__iter__()`, which returns an iterator. An iterator implements both `__iter__()` (returning self) and `__next__()`, which returns the next element or raises StopIteration. A for-loop calls `iter()` on the target object to get an iterator, then repeatedly calls `next()` on that iterator and exits when StopIteration is raised. Because `iter()` on a list returns a fresh iterator each time, a list can be looped over repeatedly. Calling `iter()` on an iterator returns the same object, so a second loop finds it already exhausted and runs zero times without raising any error. That is why the second loop is silent.
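As a rough illustration of the mechanism described in this answer, a for-loop can be written out by hand. `manual_for` is a hypothetical helper, and the real loop is implemented inside the interpreter rather than in Python code like this.

```python
def manual_for(iterable, body):
    """Approximate what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)      # step 1: ask for an iterator, once
    while True:
        try:
            item = next(iterator)  # step 2: pull the next item
        except StopIteration:
            break                  # step 3: exhaustion ends the loop quietly
        body(item)


from_list = []
letters = ["a", "b"]
manual_for(letters, from_list.append)
manual_for(letters, from_list.append)  # a list yields a fresh iterator each time
print(from_list)  # ['a', 'b', 'a', 'b']

from_iterator = []
cursor = iter(letters)
manual_for(cursor, from_iterator.append)
manual_for(cursor, from_iterator.append)  # same exhausted object: body never runs
print(from_iterator)  # ['a', 'b']
```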

FAQ · 4 QUESTIONS

Frequently Asked Questions

Is every iterator also an iterable in Python?

What is the difference between a generator and an iterator in Python?

Why does Python use StopIteration instead of returning None or a special value to signal the end?

What is the `iter(callable, sentinel)` form and when should I use it?



Last updated: July 18, 2026

