Python Generators — The Empty Log Report Bug
A log pipeline that ran fine for weeks suddenly outputs zero results — that's generator exhaustion.
- Generator functions use yield to pause and resume execution, freezing local state
- Calling a generator function returns an object — the body runs only when next() or for loop starts
- Memory stays O(1): only one value exists at a time, regardless of dataset size
- Performance cost: ~40ns overhead per yield call vs direct iteration; negligible for I/O-bound pipelines
- Production trap: exhaust a generator once and it's gone forever — silent empty iterations follow
- Biggest mistake: assuming the function runs at call time; side effects never fire until iteration
Imagine a vending machine that makes each snack on demand the moment you press a button — instead of baking every snack upfront and stuffing them all into a huge bag you have to carry. A Python generator is that vending machine. It produces values one at a time, only when you ask for the next one, and it remembers exactly where it left off each time. You get the same snacks, but without the heavy bag.
Every Python developer hits a wall: they write a reasonable script that loads a dataset, processes it, and crashes. The logic isn't wrong; they tried to hold a million rows in memory all at once. It's one of the most common avoidable performance problems, and generators exist to solve it. They're not niche: the same lazy, on-demand model underlies built-ins such as range(), map(), and zip(), even though those return their own lazy objects rather than generators.
The core problem is the cost of 'eagerness'. A regular list computes and stores every value immediately. Fine for 100 items. A disaster for 10 million log entries, infinite sequences, or streaming API responses where you don't even know the final count. Generators flip the model: they're lazy, producing each value only when the caller asks for the next one. Memory stays flat no matter how large the dataset.
By the end you'll understand why yield exists and how it differs from return, you'll write generator functions and generator expressions with confidence, and you'll know the real-world patterns — log file processing, data pipelines, infinite sequences — where generators genuinely shine. You'll also avoid the two traps that catch almost every developer the first time.
What yield Actually Does — and Why It's Not Just a Fancy return
The single most important thing to understand about generators is what happens to the function's execution state when it hits yield. With a normal return, the function runs, hands back a value, and is completely torn down — local variables gone, position in code gone, everything erased. When a function hits yield, Python does something different: it pauses the function, hands the yielded value to the caller, and freezes the entire execution frame in place — local variables, loop counters, everything. The next time the caller asks for a value by calling next(), Python thaws that frozen frame and continues from the exact line after yield.
This is why a generator function doesn't execute at all when you call it. Calling a generator function just returns a generator object. The body doesn't run until you start consuming that object with next() or a for loop. That single distinction trips up almost every developer the first time.
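A minimal sketch of this deferred execution follows. The countdown function and the events list are hypothetical names, used here only to make the timing of side effects visible.

```python
events = []  # records side effects so we can see when the body actually runs


def countdown(start):
    """Yield start, start - 1, ..., 1 (hypothetical example function)."""
    events.append("body started")
    current = start
    while current > 0:
        yield current  # pause here; 'current' stays frozen in the frame
        current -= 1
    events.append("body finished")


gen = countdown(3)                 # returns a generator object; body has not run
events_after_call = list(events)   # still empty at this point

first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes after the yield, loops, pauses again
rest = list(gen)     # drains the remaining values and lets the body finish
```

The side effect recorded in events appears only after the first next() call, which is the behaviour the paragraph above describes.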
Real-World Pattern — Processing Large Log Files
Log files are the textbook generator use case because they're naturally sequential and can grow into the gigabytes. Loading a 10 GB file into a list will crash most systems, but a generator pipeline handles it with a constant memory footprint. The pattern involves 'pipelining' where each step is a generator that pulls from the previous one, ensuring only one line of data exists in RAM at any given time.
By decoupling the reading, filtering, and parsing logic into separate generator functions, you create a modular, production-grade ETL (Extract, Transform, Load) system that is as readable as it is efficient.
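One way such a pipeline might look is sketched below. The stage names read_lines and parse_entries, the sample data, and the "date time LEVEL message" line format are all assumptions made for illustration. An in-memory io.StringIO stands in for a real log file so the example runs anywhere.

```python
import io

# Assumed log format: "<date> <time> <LEVEL> <message>"
SAMPLE_LOG = io.StringIO(
    "2024-01-01 12:00:00 INFO service started\n"
    "2024-01-01 12:00:05 ERROR database timeout\n"
    "2024-01-01 12:00:09 INFO request served\n"
    "2024-01-01 12:00:12 ERROR disk full\n"
)


def read_lines(stream):
    """Extract: yield one stripped line at a time from a file-like object."""
    for line in stream:
        yield line.rstrip("\n")


def filter_errors(lines):
    """Transform: pass through only the lines marked ERROR."""
    for line in lines:
        if " ERROR " in line:
            yield line


def parse_entries(lines):
    """Transform: split each line into a small dict."""
    for line in lines:
        date, time, level, message = line.split(" ", 3)
        yield {"timestamp": f"{date} {time}", "level": level, "message": message}


# Each stage pulls from the previous one; nothing is read until iteration starts.
pipeline = parse_entries(filter_errors(read_lines(SAMPLE_LOG)))

error_messages = []
for entry in pipeline:  # Load: consume one entry at a time
    error_messages.append(entry["message"])
```

With a real file, passing the open file object to read_lines would work the same way, since file objects also yield one line per iteration.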
Memory Usage Comparison: List vs Generator
One of the most compelling reasons to use generators is the dramatic difference in memory consumption. A list keeps every element alive at once, with references to all of them held in one contiguous array. A generator holds only its paused frame: it computes each element on demand and keeps no copy after yielding it. This comparison table crystallizes the practical trade-offs for common Python workloads.
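For a dataset of 10 million integers, the list's reference array alone consumes roughly 80 MB on a 64-bit build (8 bytes per reference * 10M + list overhead), and the integer objects it points to add more on top. A generator requires only the memory of the generator object itself, typically a couple of hundred bytes or less depending on the Python version. The speed difference is negligible for iteration (generators add about 40ns per yield), but the memory savings are enormous.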
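The size gap can be checked with sys.getsizeof, as in the rough sketch below. It uses 1 million items rather than 10 million to keep the run quick, and the exact byte counts vary by Python version and platform, so no specific figures are claimed here.

```python
import sys

N = 1_000_000  # smaller than the 10M in the text, to keep the example fast

as_list = list(range(N))
as_gen = (n for n in range(N))

# getsizeof on a list counts the reference array only, not the int objects
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)

ratio = list_bytes / gen_bytes   # how many times larger the list is

# Both forms produce the same values when consumed.
total_from_gen = sum(as_gen)
total_from_list = sum(as_list)
```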
The table below summarizes the key differences for production decision-making:
Be careful when calling sorted() or max() on a generator. sorted() consumes the entire generator into a list internally; max() keeps memory flat but still exhausts the generator. Always check the documentation: if the function returns a list, it materialises. Prefer functions that accept iterators (like heapq.nlargest) or build your own streaming aggregators.
Advanced Mechanics: Infinite Streams and .send()
Because generators are lazy, they are a natural way to represent infinite sequences. A while True loop inside a generator isn't a bug, it's a feature. Since the function pauses at every yield, it will never hang your CPU; it simply waits for the caller to request the next value, provided the caller stops asking at some point. Furthermore, the .send() method allows you to push data into the generator, effectively turning it into a coroutine for two-way communication.
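Both ideas can be sketched in a few lines. The names naturals and running_average are hypothetical, and itertools.islice is used here as one way to take a finite slice of an endless stream.

```python
from itertools import islice


def naturals():
    """Infinite stream 0, 1, 2, ... (pauses at every yield)."""
    n = 0
    while True:
        yield n
        n += 1


# Take only what is needed; the loop never runs ahead of the consumer.
first_five = list(islice(naturals(), 5))


def running_average():
    """Receive numbers via .send() and yield the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # .send(x) makes this expression evaluate to x
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)             # prime: advance to the first yield before sending
a1 = avg.send(10)
a2 = avg.send(20)
a3 = avg.send(60)
```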
Calling .send() with a value before the first next() call raises TypeError: can't send non-None value to a just-started generator. Call next() once after creating a .send()-based generator, or wrap initialization in a factory.
yield from: Generator Delegation Made Simple
When you have nested generators, you could write a for loop to yield all items from a sub-generator. But yield from does it cleaner and faster. It delegates to another generator (or any iterable) and yields each item as if it came from the outer generator. It also captures the sub-generator's return value (carried on StopIteration) as the value of the yield from expression, and forwards .send() and .throw() to the sub-generator, something a for loop doesn't do.
Use yield from when you need to flatten nested data, compose generators, or build recursive generators. It's the unsung hero of lazy pipeline designs.
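A small sketch of delegation, including the return-value capture that a plain for loop cannot provide. The names read_batch and read_all are hypothetical.

```python
def read_batch(items):
    """Yield each item, then return how many were yielded."""
    count = 0
    for item in items:
        yield item
        count += 1
    return count  # becomes the value of the 'yield from' expression


def read_all(batches):
    """Delegate to read_batch for every batch and report a grand total."""
    total = 0
    for batch in batches:
        # Items flow straight to our caller; the return value comes back here.
        total += yield from read_batch(batch)
    yield f"total={total}"


output = list(read_all([[1, 2], [3], [4, 5, 6]]))
```

The outer generator never touches the individual items; it only sees each sub-generator's return value.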
yield from for Recursive and Nested Generators
While the basic yield from works for simple delegation, its real power emerges when you need to recursively traverse deeply nested structures. Consider a file system tree, a JSON object with arbitrary nesting, or a game tree. A generator that recursively yields from sub-generators lets you produce a flat stream of elements without building intermediate lists.
The recursive pattern works because each call to yield from flatten(...) creates a new generator that yields items one by one. Python's call stack pushes frames for each level of nesting, but only one value exists at a time. This is a textbook example of lazy recursion: you can flatten a tree of any depth without running out of memory (though you can hit recursion depth limits for very deep trees).
For production use, combine this with itertools.islice or itertools.takewhile to limit output when you only need a subset of the nested data.
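The recursive pattern might be sketched as follows. This flatten treats only lists and tuples as nested containers, which is an assumption; a real structure may need different rules, for example for dicts or strings.

```python
from itertools import islice


def flatten(nested):
    """Lazily yield leaf values from arbitrarily nested lists/tuples."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)   # recurse without building a list
        else:
            yield item


tree = [1, [2, [3, 4], 5], [[6]], 7]

all_items = list(flatten(tree))               # full traversal, for comparison
first_three = list(islice(flatten(tree), 3))  # stops after three leaves
```

With islice, the traversal stops as soon as three leaves have been produced, so the rest of the tree is never visited.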
Calling list() on the generator to get all values defeats the purpose. Always consume lazily with a for loop.
Generators vs Lists vs Iterators: Knowing When to Use Each
The honest answer to 'when should I use a generator?' is: whenever you don't need all the values at once, or whenever you might not need all of them at all. If you need to sort, reverse, index by position, or pass the same sequence to multiple consumers, use a list — you need all values materialised. If you're transforming or filtering a sequence and consuming it exactly once from start to finish, a generator is almost always the better choice.
One critical difference that surprises people: generators are single-use. Once exhausted, they're done — calling iter() on them again doesn't restart them. A list can be iterated as many times as you like. This is the most common source of subtle bugs with generators in production code.
Custom iterator classes (with __iter__ and __next__) give you the same lazy behaviour as generators but with more control — you can maintain state, support multiple independent iterations, or define a length. Generators are the shortcut for the 80% case where you just need simple, one-shot lazy iteration.
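The contrast can be sketched with a hypothetical Squares class. As a shortcut, its __iter__ is written as a generator method instead of pairing it with a separate __next__ class; this is one common variation, not the only way to build a custom iterable.

```python
class Squares:
    """Re-iterable container of n*n for n in range(limit) (hypothetical)."""

    def __init__(self, limit):
        self.limit = limit

    def __len__(self):
        return self.limit   # a plain generator cannot offer this

    def __iter__(self):
        # Each call builds a fresh, independent iterator.
        for n in range(self.limit):
            yield n * n


squares = Squares(4)
first_pass = list(squares)
second_pass = list(squares)   # works again: __iter__ starts over

gen = (n * n for n in range(4))
gen_first = list(gen)
gen_second = list(gen)        # empty: the generator is already exhausted
same_object = iter(gen) is gen  # iter() on a generator returns itself
```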
Passing a generator to a function that consumes it (such as sorted(), max(), or list()) exhausts it silently; a short-circuiting function like any() may consume only part of it. Any subsequent attempt to iterate that generator produces nothing, or only the leftovers. If you need to reuse values, call list() on the generator once, store the result, and then operate on the list. Or redesign to merge the two consumers into one pass.
Advanced Generator Methods: .send() and .throw()
Beyond simple iteration, generators support two advanced methods that turn them into two-way communication channels: .send() and .throw(). These are often overlooked but essential for building coroutine-like patterns, cooperative multitasking, and generator-based pipelines with error handling.
.send(val) resumes the generator and passes a value into it, which becomes the result of the yield expression inside the generator. This lets you inject data from outside. .throw(exc) raises an exception at the point where the generator was paused (the older three-argument form .throw(type, value, traceback) is deprecated as of Python 3.12). The generator can catch it (via try/except around the yield) and yield another value, or let it propagate to terminate the generator.
A common use case for .throw() is to signal a generator to clean up or stop early, akin to a cancel signal. For pipelines, you can throw an exception into the middle of a chain to abort processing without manually draining the generator.
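Both outcomes of .throw() can be sketched with one generator: an exception it catches and recovers from, and one it lets propagate. SkipRecord and accumulate are hypothetical names introduced for illustration.

```python
class SkipRecord(Exception):
    """Hypothetical signal: ignore the current step and keep going."""


def accumulate():
    """Running total that survives SkipRecord but not other exceptions."""
    total = 0
    while True:
        try:
            value = yield total
        except SkipRecord:
            continue        # caught: loop back and yield the unchanged total
        total += value


gen = accumulate()
start = next(gen)                        # prime the generator
after_send = gen.send(5)                 # total is now 5
after_skip = gen.throw(SkipRecord())     # caught inside; yields 5 again
after_second_send = gen.send(3)          # total is now 8

aborted = False
try:
    gen.throw(ValueError("abort"))       # not caught inside: terminates it
except ValueError:
    aborted = True

leftover = next(gen, "exhausted")        # the generator is finished for good
```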
Call next() on a generator before using .send() or .throw(). The first call advances the generator to its first yield point. Forgetting this before a .send() with a non-None value raises TypeError: can't send non-None value to a just-started generator, and a .throw() on an unstarted generator raises the exception before any try/except in the body is active.
The Silent Empty Log Report: Generator Exhaustion in Production
The generator was passed to filter_errors(), which iterated it fully. When the count function later tried to iterate the same generator, it received nothing: StopIteration had already been raised. No error was thrown; the for loop just didn't execute.
- Generators are single-use. Passing one to a function that iterates it fully exhausts it silently.
- If multiple consumers need the same data, call list() on the generator once and store the result.
- Never assume iteration order or count; verify with a small test before deploying any generator pipeline.
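The failure can be reproduced in miniature. Apart from filter_errors, which the incident names, the helpers read_lines and count_lines and the sample data are assumptions; the real pipeline is not shown in the source.

```python
RAW = ["INFO ok", "ERROR boom", "ERROR again"]  # stand-in for real log lines


def read_lines(raw_lines):
    """Hypothetical source stage: yields lines one at a time."""
    for line in raw_lines:
        yield line


def filter_errors(lines):
    """Iterates its input fully, as in the incident."""
    return [line for line in lines if "ERROR" in line]


def count_lines(lines):
    """Hypothetical count function: counts whatever is left to iterate."""
    return sum(1 for _ in lines)


# Buggy version: two consumers share one generator.
lines = read_lines(RAW)
errors = filter_errors(lines)       # exhausts 'lines'
buggy_total = count_lines(lines)    # silently sees nothing

# Fixed version: materialise once, then share the list.
stored = list(read_lines(RAW))
fixed_errors = filter_errors(stored)
fixed_total = count_lines(stored)
```

Materialising once is the simplest fix when the data fits in memory; merging both consumers into a single pass avoids the list when it does not.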
- Wrap the generator in list() at the pipeline start and compare output. Add a debug print('Consumed by', func.__name__) in each consumer function.
- Look for a list() call inside the pipeline. For example, list(lines) in a filter function materialises everything. Replace it with lazy chaining.
- Use for value in generator: or list(generator) to trigger execution.
Key takeaways
- yield pauses the function; the following next() call resumes it from exactly where it stopped.
- Prime a .send()-based generator with a next() call to avoid TypeError.
Common mistakes to avoid
Expecting a generator function to run on call
You call my_gen_func() to trigger side effects (like printing) and nothing happens, or you print the return value and see '<generator object>' instead of your data. Wrap the call in list() or use a for loop to actually run it, e.g. list(my_gen_func()) or next(my_gen_func()).
Iterating an exhausted generator and getting no error
Materialise the values once (results = list(my_generator())) and iterate results repeatedly, or call the generator function again to get a fresh generator object.
Using a generator expression where you immediately need all values anyway
Interview Questions on This Topic
What is the difference between a generator function and a regular function, and what happens to the execution frame when yield is encountered?
When next() is called, execution resumes from right after the yield. The generator function returns a generator object when called, not a value.
Frequently Asked Questions