Python Generators — The Empty Log Report Bug
A log pipeline that ran fine for weeks suddenly outputs zero results — that's generator exhaustion.
20+ years shipping production Python across data and backend systems. Written from production experience, not tutorials.
- Generator functions use yield to pause and resume execution, freezing local state
- Calling a generator function returns a generator object; the body runs only when next() is called or a for loop starts
- Memory stays O(1): only one value exists at a time, regardless of dataset size
- Performance cost: ~40ns overhead per yield call vs direct iteration; negligible for I/O-bound pipelines
- Production trap: exhaust a generator once and it's gone forever — silent empty iterations follow
- Biggest mistake: assuming the function runs at call time; side effects never fire until iteration
Imagine a vending machine that makes each snack on demand the moment you press a button — instead of baking every snack upfront and stuffing them all into a huge bag you have to carry. A Python generator is that vending machine. It produces values one at a time, only when you ask for the next one, and it remembers exactly where it left off each time. You get the same snacks, but without the heavy bag.
Every Python developer hits a wall: they write a reasonable script that loads a dataset, processes it, and crashes, not because the logic is wrong, but because they tried to hold a million rows in memory all at once. It's one of the most common avoidable performance problems, and generators exist to solve it. They're not niche; the same lazy, one-value-at-a-time model is behind Python's own range(), map(), and zip(), which are lazy iterables rather than generators in the strict sense.
The core problem is the cost of 'eagerness'. A regular list computes and stores every value immediately. Fine for 100 items. A disaster for 10 million log entries, infinite sequences, or streaming API responses where you don't even know the final count. Generators flip the model: they're lazy, producing each value only when the caller asks for the next one. Memory stays flat no matter how large the dataset.
By the end you'll understand why yield exists and how it differs from return, you'll write generator functions and generator expressions with confidence, and you'll know the real-world patterns — log file processing, data pipelines, infinite sequences — where generators genuinely shine. You'll also avoid the two traps that catch almost every developer the first time.
What yield Actually Does — and Why It's Not Just a Fancy return
The single most important thing to understand about generators is what happens to the function's execution state when it hits yield. With a normal return, the function runs, hands back a value, and is completely torn down — local variables gone, position in code gone, everything erased. When a function hits yield, Python does something different: it pauses the function, hands the yielded value to the caller, and freezes the entire execution frame in place — local variables, loop counters, everything. The next time the caller asks for a value by calling next(), Python thaws that frozen frame and continues from the exact line after yield.
This is why a generator function doesn't execute at all when you call it. Calling a generator function just returns a generator object. The body doesn't run until you start consuming that object with next() or a for loop. That single distinction trips up almost every developer the first time.
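A minimal sketch of that behaviour follows. The countdown function and the events list are hypothetical names chosen for illustration; the point is only that the body does not start until the first next() call and that local state survives each pause.

```python
events = []  # records when the generator body actually runs

def countdown(start):
    # Hypothetical example generator; names and messages are illustrative.
    events.append("body started")
    while start > 0:
        yield start          # pause here and hand the value to the caller
        start -= 1           # execution resumes on this line
    events.append("body finished")

gen = countdown(3)           # returns a generator object; the body has not run
print(events)                # []
print(next(gen))             # 3 (the body ran up to the first yield)
print(events)                # ['body started']
print(list(gen))             # [2, 1] (resumed with 'start' intact)
print(events)                # ['body started', 'body finished']
```

Note that the side effect (appending to events) only fires once iteration begins, which is exactly the trap described above.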
Real-World Pattern — Processing Large Log Files
Log files are the textbook generator use case because they're naturally sequential and can grow into the gigabytes. Loading a 10 GB file into a list will crash most systems, but a generator pipeline handles it with a constant memory footprint. The pattern involves 'pipelining' where each step is a generator that pulls from the previous one, ensuring only one line of data exists in RAM at any given time.
By decoupling the reading, filtering, and parsing logic into separate generator functions, you create a modular, production-grade ETL (Extract, Transform, Load) system that is as readable as it is efficient.
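The pipeline described above might be sketched as follows. The function names, the log line layout (timestamp, level, message), and the in-memory io.StringIO stand-in for a real file are all assumptions made for illustration; with a real log you would pass an open file handle instead.

```python
import io

def read_lines(handle):
    # Stage 1: pull one line at a time from any file-like object.
    for line in handle:
        yield line.rstrip("\n")

def only_errors(lines):
    # Stage 2: keep only lines whose level field is ERROR.
    for line in lines:
        if " ERROR " in line:
            yield line

def parse(lines):
    # Stage 3: split "timestamp level message" into a dict (assumed layout).
    for line in lines:
        timestamp, level, message = line.split(" ", 2)
        yield {"timestamp": timestamp, "level": level, "message": message}

# Stand-in for a large log file; a real pipeline would use open(path).
sample_log = io.StringIO(
    "2024-01-01T00:00:00 INFO service started\n"
    "2024-01-01T00:00:05 ERROR database timeout\n"
    "2024-01-01T00:00:09 ERROR disk full\n"
)

pipeline = parse(only_errors(read_lines(sample_log)))

# Nothing has been read yet; each loop step pulls one line through all stages.
error_messages = []
for record in pipeline:
    error_messages.append(record["message"])

print(error_messages)  # ['database timeout', 'disk full']
```

Each stage holds only the line it is currently working on, so the same code should behave the same way whether the input is three lines or several gigabytes.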
Memory Usage Comparison: List vs Generator
One of the most compelling reasons to use generators is the dramatic difference in memory consumption. A list stores every element in contiguous memory. A generator stores nothing — it computes each element on demand and discards it after yielding. This comparison table crystallizes the practical trade-offs for common Python workloads.
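For a dataset of 10 million integers, the list's array of references alone takes roughly 80 MB on 64-bit CPython (8 bytes per reference * 10M, plus list overhead), and the int objects themselves add roughly 28 bytes each on top of that. A generator object takes on the order of 100 to 200 bytes, depending on the CPython version, no matter how many values it will produce. The speed difference is negligible for iteration (generators add about 40ns per yield), but the memory savings are enormous.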
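One rough way to check these figures on your own interpreter is sys.getsizeof. This is a sketch, assuming CPython; getsizeof reports only the container itself (the list's reference array or the generator object), not the integers a list keeps alive, and the exact numbers vary by version.

```python
import sys

n = 1_000_000

as_list = [i * 2 for i in range(n)]   # every value computed and stored now
as_gen = (i * 2 for i in range(n))    # nothing computed yet

list_bytes = sys.getsizeof(as_list)   # reference array only, not the int objects
gen_bytes = sys.getsizeof(as_gen)     # fixed-size generator object

print(f"list container: {list_bytes:,} bytes")
print(f"generator:      {gen_bytes:,} bytes")

# Both produce the same values; only the storage strategy differs.
same_total = sum(as_gen) == sum(as_list)
print(same_total)
```

The generator's size stays the same if you change n, while the list's size grows roughly linearly with it.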
The table below summarizes the key differences for production decision-making:
Watch out when calling sorted() or max() on a generator. sorted() consumes the entire generator into a list internally, while max() streams through it in constant memory but still exhausts it. Always check the documentation: if the function returns a list, it materialises. Prefer functions that accept iterators (like heapq.nlargest) or build your own streaming aggregators.
Advanced Mechanics: Infinite Streams and .send()
Because generators are lazy, they are a natural way to represent infinite sequences (itertools.count and custom iterator classes can do it too). A while True loop inside a generator isn't a bug; it's a feature. Since the function pauses at every yield, it won't spin your CPU; it simply waits for the caller to request the next value. Furthermore, the .send() method allows you to push data into the generator, effectively turning it into a coroutine for two-way communication.
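As an illustration of an infinite stream, here is a small sketch. The naturals generator is a hypothetical example; itertools.islice is used to take a bounded number of values so the consumer, not the generator, decides when to stop.

```python
from itertools import islice

def naturals(start=0):
    # Infinite by design: the loop only advances when the caller asks.
    n = start
    while True:
        yield n
        n += 1

first_five = list(islice(naturals(), 5))
print(first_five)            # [0, 1, 2, 3, 4]

# Lazy stages can be stacked on top of an infinite source.
evens = (n for n in naturals() if n % 2 == 0)
first_even_squares = [n * n for n in islice(evens, 4)]
print(first_even_squares)    # [0, 4, 16, 36]
```

Calling list(naturals()) without a bound would never finish, so an infinite generator should always be paired with islice, takewhile, or an explicit break.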
A .send()-based generator must be advanced to its first yield before it can receive a value. Calling .send() with a value without calling next() first causes a TypeError: can't send non-None value to a just-started generator. Call next() once after creating a .send()-based generator, or wrap initialization in a factory.
yield from: Generator Delegation Made Simple
When you have nested generators, you could write a for loop to yield all items from a sub-generator. But yield from does it more cleanly and with less overhead. It delegates to another generator (or any iterable) and yields each item as if it came from the outer generator. It also forwards .send() and .throw() to the sub-generator and evaluates to the sub-generator's return value once it finishes, something a plain for loop doesn't do.
Use yield from when you need to flatten nested data, compose generators, or build recursive generators. It's the unsung hero of lazy pipeline designs.
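The sketch below contrasts the manual loop with yield from and shows the return-value capture. All function names here are hypothetical and chosen only for the example.

```python
def header():
    yield "id,name"

def rows():
    yield "1,ada"
    yield "2,grace"

def report_with_loops():
    # Manual delegation: works for plain iteration, but verbose.
    for line in header():
        yield line
    for line in rows():
        yield line

def report():
    # Same output, delegated with yield from.
    yield from header()
    yield from rows()
    yield from ["# end"]      # any iterable works, not just generators

def summing(values):
    total = 0
    for value in values:
        yield value
        total += value
    return total              # becomes the value of the yield from expression

def with_total(values):
    total = yield from summing(values)
    yield f"total={total}"

print(list(report_with_loops()))   # ['id,name', '1,ada', '2,grace']
print(list(report()))              # ['id,name', '1,ada', '2,grace', '# end']
print(list(with_total([1, 2, 3]))) # [1, 2, 3, 'total=6']
```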
yield from for Recursive and Nested Generators
While the basic yield from works for simple delegation, its real power emerges when you need to recursively traverse deeply nested structures. Consider a file system tree, a JSON object with arbitrary nesting, or a game tree. A generator that recursively yields from sub-generators lets you produce a flat stream of elements without building intermediate lists.
The recursive pattern works because each call to yield from flatten(...) creates a new generator that yields items one by one. Python's call stack pushes a frame for each level of nesting, but only one value exists at a time. This is a textbook example of lazy recursion: you can flatten a large tree without building intermediate lists, although a very deeply nested tree can still hit Python's recursion depth limit.
For production use, combine this with itertools.islice or itertools.takewhile to limit output when you only need a subset of the nested data.
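A minimal sketch of the recursive pattern, assuming that only lists and tuples count as nested containers (strings and other iterables are treated as leaves). The sample tree is invented for the example.

```python
from itertools import islice

def flatten(nested):
    # Recursively walk lists/tuples and yield leaves one at a time.
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)   # delegate to a generator for the subtree
        else:
            yield item

tree = [1, [2, [3, 4]], (5, [6, [7]]), 8]

# Take only what is needed; the rest of the tree is never visited.
first_three = list(islice(flatten(tree), 3))
print(first_three)           # [1, 2, 3]

# Aggregate lazily without materialising the flattened sequence.
total = sum(flatten(tree))
print(total)                 # 36
```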
Avoid calling list() on the generator to get all values, which defeats the purpose. Always consume lazily with a for loop.
Generators vs Lists vs Iterators: Knowing When to Use Each
The honest answer to 'when should I use a generator?' is: whenever you don't need all the values at once, or whenever you might not need all of them at all. If you need to sort, reverse, index by position, or pass the same sequence to multiple consumers, use a list — you need all values materialised. If you're transforming or filtering a sequence and consuming it exactly once from start to finish, a generator is almost always the better choice.
One critical difference that surprises people: generators are single-use. Once exhausted, they're done — calling iter() on them again doesn't restart them. A list can be iterated as many times as you like. This is the most common source of subtle bugs with generators in production code.
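The single-use behaviour is easy to see in a few lines. This is a small illustrative sketch; the variable names are arbitrary.

```python
squares = (n * n for n in range(4))

first_pass = list(squares)
second_pass = list(squares)              # already exhausted: no error, no data
same_object = iter(squares) is squares   # iter() returns the same generator

print(first_pass)    # [0, 1, 4, 9]
print(second_pass)   # []
print(same_object)   # True

numbers = [0, 1, 4, 9]
list_passes = (list(numbers), list(numbers))
print(list_passes)   # ([0, 1, 4, 9], [0, 1, 4, 9])
```

The empty second pass raises nothing, which is why this bug tends to surface as missing output rather than as a traceback.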
Custom iterator classes (with __iter__ and __next__) give you the same lazy behaviour as generators but with more control — you can maintain state, support multiple independent iterations, or define a length. Generators are the shortcut for the 80% case where you just need simple, one-shot lazy iteration.
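For comparison, here is one way a custom iterator could look. The Countdown and CountdownIterator classes are hypothetical; the sketch separates the reusable iterable (which knows its length) from the one-shot iterator (which carries the position), so each for loop gets an independent pass.

```python
class CountdownIterator:
    """One-shot iterator: holds the current position."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


class Countdown:
    """Reusable iterable: every iter() call starts a fresh pass."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __iter__(self):
        return CountdownIterator(self.start)


countdown = Countdown(3)
print(list(countdown), list(countdown))  # [3, 2, 1] [3, 2, 1]
print(len(countdown))                    # 3

a, b = iter(countdown), iter(countdown)  # two independent iterations
progress = (next(a), next(a), next(b))
print(progress)                          # (3, 2, 3)
```

That is noticeably more code than a four-line generator function, which is the trade-off the paragraph above describes.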
Passing a generator to a function that consumes it (sorted(), max(), list(), or any()) uses it up silently; any() may stop at the first truthy value, but it still advances the generator. Any subsequent attempt to iterate that generator produces nothing, or only the leftovers. If you need to reuse values, call list() on the generator once, store the result, and then operate on the list. Or redesign to merge the two consumers into one pass.
Advanced Generator Methods: .send() and .throw()
Beyond simple iteration, generators support two advanced methods that turn them into two-way communication channels: .send() and .throw(). These are often overlooked but essential for building coroutine-like patterns, cooperative multitasking, and generator-based pipelines with error handling.
.send(val) resumes the generator and passes a value into it, which becomes the result of the yield expression inside the generator. This lets you inject data from outside. .throw(exc) raises the given exception at the point where the generator was paused; the older three-argument form .throw(type, value, traceback) is deprecated as of Python 3.12. The generator can catch the exception (via try/except around the yield) and yield another value, or let it propagate to terminate the generator.
A common use case for .throw() is to signal a generator to clean up or stop early, akin to a cancel signal. For pipelines, you can throw an exception into the middle of a chain to abort processing without manually draining the generator.
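A small sketch of the cancel-signal idea follows. The Cancelled exception and the worker generator are hypothetical names; the example uses the single-argument form of .throw() and catches the exception around the yield.

```python
class Cancelled(Exception):
    """Hypothetical signal asking the generator to stop early."""

def worker(items):
    processed = []
    try:
        for item in items:
            yield item.upper()        # paused here when throw() arrives
            processed.append(item)    # only runs if the caller resumes normally
    except Cancelled:
        # The generator may yield one more value in response to the signal.
        yield f"cancelled after {len(processed)} item(s)"

gen = worker(["a", "b", "c"])
first = next(gen)                     # 'A'
second = next(gen)                    # 'B'; 'a' is now recorded as processed
reply = gen.throw(Cancelled())        # raised inside the generator at its yield
finished = next(gen, "exhausted")     # the generator ends after its reply

print(first, second)                  # A B
print(reply)                          # cancelled after 1 item(s)
print(finished)                       # exhausted
```

If the generator did not catch Cancelled, the exception would propagate back out of the .throw() call and the generator would be finished.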
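Always call next() on a generator before using .send() or .throw(). The first call advances the generator to its first yield point. Forgetting this before .send(value) raises TypeError: can't send non-None value to a just-started generator, and .throw() on an unstarted generator raises the exception before any try/except in the body can catch it. Prime the generator with next() before using these methods.
Generator Expressions: The One-Liner That Saves Your Stack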
You've used list comprehensions. They're clean, readable, and will crash your box on a 10GB dataset. Generator expressions do the same thing without materializing the entire list. The syntax is almost identical — swap square brackets for parentheses.
But here's the catch: generator expressions are single-pass. You can't index them, you can't slice them, and once you've consumed them they're gone. This isn't a bug — it's the whole point. You trade random access for memory efficiency that scales to any dataset size.
The real power comes from chaining them. A pipeline of generator expressions processes data in a single pass without intermediate storage. Three comprehension-like transformations? That's three generator expressions linked together, streaming elements one at a time. No intermediate lists, no memory spikes, no surprise OOM kills in production.
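Here is a sketch of three chained generator expressions. The raw input list is invented for the example; in practice the source would typically be a file or another stream.

```python
raw = ["  12 ", "x", "7", "", " 30"]

stripped = (s.strip() for s in raw)              # stage 1: clean whitespace
digits = (s for s in stripped if s.isdigit())    # stage 2: drop non-numeric rows
numbers = (int(s) for s in digits)               # stage 3: convert to int

# No work has happened yet; sum() pulls each element through all three stages.
total = sum(numbers)
print(total)             # 49

# The pipeline is single-pass: a second consumer sees nothing.
leftover = list(numbers)
print(leftover)          # []
```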
Convert to a list with list() if you need multiple passes. Your logs won't debug this for you at 3 AM.
Profiling Generator Performance: When Lazy Isn't Faster
Developers often assume generators are always faster because they're memory-efficient. That's wrong. Generators have overhead: the frame's state must be kept alive, and every iteration pays for a suspend/resume cycle. For small datasets that are consumed in full, a list comprehension usually beats a generator expression. The question is where the crossover point lives.
List comprehension: allocate a list, compute all values, return. If you only need 5 items from a 10,000-element collection, that's 9,995 wasted computations. Generator: compute one value, yield, pause. If you break early, you skip the rest. Zero waste.
But if you iterate every single element and the computation per element is trivial, say a simple integer operation, the list's lower per-element overhead wins. The generator's yield machinery adds tens of nanoseconds per iteration. On a million elements that adds up to tens of milliseconds, which matters in a hot loop and is invisible next to disk or network I/O.
The rule: benchmark with your actual data shape. Profile before you optimize. And never replace a list comprehension with a generator expression just because someone on Reddit said it's "better." It's only better when you're memory-bound or you won't consume the entire sequence.
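A rough benchmarking sketch using the standard timeit module is shown below. The workload (doubling integers) and the sizes are arbitrary assumptions; timings depend heavily on the machine and Python version, so no expected numbers are given.

```python
import timeit

data = range(100_000)

def full_list():
    # Consume everything via a materialised list.
    return sum([x * 2 for x in data])

def full_gen():
    # Consume everything lazily.
    return sum(x * 2 for x in data)

def early_list():
    # Builds the whole list before taking the first match.
    return [x * 2 for x in data if x > 10][0]

def early_gen():
    # Stops as soon as the first match is produced.
    return next(x * 2 for x in data if x > 10)

for fn in (full_list, full_gen, early_list, early_gen):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__:<11} {seconds:.4f}s for 5 runs")
```

On typical CPython builds you would expect the two full-consumption variants to be close, and the early-exit generator to be far cheaper than its list counterpart, but measure with your own data shape before drawing conclusions.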
send() Is How You Talk Back to a Generator
Most devs treat generators as one-way data pipes. You call next(), you get a value. That's fine for iterating over log files. But generators can receive data mid-execution using .send(). This turns them into coroutines — lightweight cooperative threads.
The trick: .send() resumes the generator AND injects a value into the yield expression. The first call MUST be next() or send(None) because no yield has been hit yet. After that, each send(val) sets yield's return value. This is how you implement state machines, streaming pipelines, or cooperative task schedulers without threading overhead.
Why bother? Because you avoid global state, external queues, and callback hell. The generator keeps its own context on the stack. Send data in, get data out. Clean, testable, production-hardened.
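The following sketch shows a .send()-driven running average, plus a small priming decorator so callers cannot forget the initial next(). Both primed and running_average are hypothetical names written for this example.

```python
import functools

def primed(func):
    """Return a factory that advances the new generator to its first yield."""
    @functools.wraps(func)
    def factory(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)                     # run up to the first yield
        return gen
    return factory

@primed
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average         # send(value) lands here
        total += value
        count += 1
        average = total / count

avg = running_average()
results = [avg.send(v) for v in (10, 20, 60)]
print(results)                        # [10.0, 15.0, 30.0]

# Without priming, the first send() with a value fails.
unprimed = running_average.__wrapped__()
try:
    unprimed.send(10)
except TypeError as exc:
    error = str(exc)
print(error)
```

All the state (total, count) lives in the generator's own frame, so there is no global variable or external queue to manage.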
Skipping the initial next() and calling send() with a value raises TypeError. Wrap generator creation in a function that returns the primed generator, or you'll debug this at 2 AM.
close() Is How You Fire a Generator Cleanly
Generators hold resources: open file handles, database cursors, socket connections. If you stop iterating early — break out of a for loop, raise an exception — the generator's stack frame freezes. That file handle stays open until garbage collection kicks in. That's a leak waiting to happen in production.
.close() raises GeneratorExit inside the generator at its current yield point. If your generator has a try/finally block, that finally runs. No other exception is raised to the caller. It's a clean, deterministic shutdown.
Pair this with contextlib.closing() or wrap your generator in a context manager. Never rely on gc to clean up your I/O. Explicit shutdown beats implicit leaks every time. Treat .close() like closing a file handle: you don't walk away leaving files open, so don't walk away leaving generators open.
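A minimal sketch of deterministic cleanup follows. The read_records generator and the cleanup_log list are hypothetical; the list simply stands in for opening and closing a real resource such as a file or cursor.

```python
from contextlib import closing

cleanup_log = []   # stands in for a real resource being opened and closed

def read_records(records):
    cleanup_log.append("opened")
    try:
        for record in records:
            yield record
    finally:
        # Runs on normal exhaustion, on .close(), and on errors.
        cleanup_log.append("closed")

# Explicit close after taking only one value.
gen = read_records(["a", "b", "c"])
first = next(gen)
gen.close()                       # raises GeneratorExit at the paused yield
print(cleanup_log)                # ['opened', 'closed']

# contextlib.closing calls .close() when the with block exits.
seen = []
with closing(read_records(["x", "y", "z"])) as records:
    for record in records:
        seen.append(record)
        if record == "y":
            break                 # early exit; cleanup still runs at block end
print(seen)                       # ['x', 'y']
print(cleanup_log)                # ['opened', 'closed', 'opened', 'closed']
```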
The Silent Empty Log Report — Generator Exhaustion in Production
The log generator was passed to filter_errors(), which iterated it fully. When the count function later tried to iterate the same generator, it received nothing, because StopIteration had already been raised. No error was thrown; the for loop just didn't execute.
- Generators are single-use. Passing one to a function that iterates it fully exhausts it silently.
- If multiple consumers need the same data, call list() on the generator once and store the result.
- Never assume iteration order or count. Verify with a small test before deploying any generator pipeline.
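The incident can be reproduced in a few lines. This is a reconstruction under assumptions: only the name filter_errors() comes from the report above, while read_log, count_entries, and the sample data are hypothetical stand-ins.

```python
def read_log(lines):
    # Stand-in for a generator that streams log entries.
    for line in lines:
        yield line

def filter_errors(entries):
    # Iterates its input fully.
    return [entry for entry in entries if "ERROR" in entry]

def count_entries(entries):
    return sum(1 for _ in entries)

raw = ["INFO ok", "ERROR db down", "ERROR disk full"]

# Buggy version: two consumers share one generator.
entries = read_log(raw)
errors = filter_errors(entries)          # exhausts the generator
buggy_total = count_entries(entries)     # nothing left, and no exception
print(errors, buggy_total)               # ['ERROR db down', 'ERROR disk full'] 0

# One possible fix: materialise once, then share the list.
entries = list(read_log(raw))
fixed_errors = filter_errors(entries)
fixed_total = count_entries(entries)
print(fixed_errors, fixed_total)         # ['ERROR db down', 'ERROR disk full'] 3
```

If the data is too large to materialise, the alternative is to compute both results in a single loop over the generator.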
- Pipeline produces no output: temporarily wrap the source in list() at the pipeline start and compare output. Add a debug print('Consumed by', func.__name__) in each consumer function.
- Memory grows unexpectedly: look for a list() call inside the pipeline. For example, list(lines) in a filter function materialises everything. Replace with lazy chaining.
- Generator body never runs: nothing has called next(). Use for value in generator: or list(generator) to trigger execution.
- print(type(my_gen)): confirm it's a generator object, not a function.
- print(list(my_gen)): if empty, it's exhausted. Recreate by calling the generator function again (my_gen_func()) and then work with the data.
Key takeaways
- yield pauses a generator, and the following next() call resumes it from exactly where it stopped.
- Prime a .send()-based generator with an initial next() call to avoid TypeError.
Common mistakes to avoid
Expecting a generator function to run on call
You call my_gen_func() to trigger side effects (like printing) and nothing happens, or you print the return value and see '<generator object>' instead of your data. Wrap the call in list() or use a for loop to actually run it, e.g. list(my_gen_func()) or next(my_gen_func()).
Iterating an exhausted generator and getting no error
Store the values in a list (results = list(my_generator())) and iterate results repeatedly, or call the generator function again to get a fresh generator object.
Using a generator expression where you immediately need all values anyway
Interview Questions on This Topic
What is the difference between a generator function and a regular function, and what happens to the execution frame when yield is encountered?
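A regular function runs to completion and its frame is torn down when it returns. A generator function pauses at yield, hands the value to the caller, and keeps its frame (local variables and position in the code) frozen. When next() is called, execution resumes from right after the yield. The generator function returns a generator object when called, not a value.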