PHP Generators -- The 4 GB CSV That Crashed Production
Loading a 2M-row CSV with file() exhausted 1.
- PHP Generators create resumable sequences without storing all values in memory
- yield pauses execution and sends a value to the caller
- send() pushes data back into the generator — two-way communication
- yield from delegates to another iterator and captures its return value
- Memory stays flat (~1 MB) even for million-row datasets vs arrays that scale with size
- Biggest gotcha: calling send() on a brand-new generator does not throw. PHP first runs the generator to its first yield and discards that first yielded value, so read it with current() before the first send() if you need it
Most PHP developers hit a wall the first time they try to process a 500MB CSV file or generate a million-row report inside a web request. The naive approach — loading everything into an array — sends memory usage through the roof and often crashes the process entirely. Generators exist precisely to solve this class of problem, and they do it elegantly without forcing you to rearchitect your entire codebase.
Before generators (introduced in PHP 5.5), your only options were to build a full array and accept the memory cost, or to write a stateful iterator class with five methods and a lot of ceremony. Generators give you lazy, resumable sequences with a syntax so clean it looks almost too simple to be real. Under the hood, calling a generator function hands you a Generator object: an instance of a built-in class that implements Iterator and can also receive values mid-execution.
By the end of this article you'll understand not just how to write a generator, but WHY the VM pauses execution at yield, how to push values back INTO a running generator with send(), how to compose generators using yield from, how to extract a final return value, and where generators genuinely beat arrays — plus the exact gotchas that trip up experienced developers in production.
How PHP Compiles a Generator Function — The Internals You Actually Need
When PHP's parser sees the yield keyword inside a function body, it silently transforms the entire function into a generator factory. Calling that function no longer executes any of its code — it returns a Generator object immediately. This is the single most important thing to internalise, because it explains almost every 'why didn't my code run?' bug you'll ever see with generators.
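A minimal sketch of that lazy start, assuming a hypothetical `countToThree()` generator that records into a log array when its body begins to run. Nothing is logged by the call itself; the body only starts once `current()` asks for a value.

```php
<?php
// Hypothetical generator: it appends to $log so we can observe when its body runs.
function countToThree(array &$log): Generator
{
    $log[] = 'body started';
    for ($i = 1; $i <= 3; $i++) {
        yield $i;
    }
    $log[] = 'body finished';
}

$log = [];
$gen = countToThree($log);          // Returns a Generator; no body code has run yet.
$ranAfterCall = ($log !== []);      // Still false at this point.

$first = $gen->current();           // Runs the body up to the first yield.
$ranAfterCurrent = ($log !== []);   // Now true.
```

If a generator seems to "do nothing", checking whether anything ever iterates it is usually the first thing to look at.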
The Generator class implements the Iterator interface internally (rewind, current, key, next, valid) plus three extra methods: send(), throw(), and getReturn(). The Zend Engine stores the generator's execution context — local variable table, instruction pointer, call stack frame — in a heap-allocated zend_generator struct. This is why a generator can pause mid-function and resume: its stack frame is preserved off the real call stack.
Each time you call next() (or foreach triggers it), the engine restores that frame, runs until the next yield, stashes the yielded value, and suspends again. Memory for local variables stays alive for the generator's lifetime, but you only ever hold the current value in userland — not the entire sequence. That's the performance win in one sentence.
A generator function can yield zero or more times. After the last yield (or an explicit return), valid() returns false and the generator is exhausted. Generators cannot be rewound: calling rewind() once the generator has advanced past its first yield throws an Exception, which turns into a fatal error only if nothing catches it.
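The lifecycle described above can be sketched as follows, using a hypothetical two-value generator. The sketch drains the generator, checks `valid()`, and then shows that `rewind()` raises a catchable `Exception`.

```php
<?php
// Hypothetical generator with exactly two values.
function twoValues(): Generator
{
    yield 'a';
    yield 'b';
}

$gen = twoValues();
$seen = [];
foreach ($gen as $value) {
    $seen[] = $value;
}

$exhausted = !$gen->valid();   // true: the function body has finished

$rewindFailed = false;
try {
    $gen->rewind();            // the generator has already run past its first yield
} catch (Exception $e) {
    $rewindFailed = true;      // rewind() throws instead of restarting
}
```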
Practical Power — Processing Large Files and Infinite Sequences Without Blowing Memory
Here's where generators stop being a curiosity and start being a production tool. Reading a large CSV line-by-line with a generator keeps peak memory at roughly the size of one line — no matter if the file is 1 MB or 10 GB. The alternative, file() or array-based fgetcsv loops that collect into an array, scales memory linearly with file size.
The key insight is that generators compose beautifully. You can pipe a file-reading generator through a filtering generator through a transforming generator — each stage lazy, each holding only one record at a time. This is PHP's equivalent of Unix pipes and it's just as powerful.
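As a rough illustration of such a pipeline, the sketch below chains three hypothetical stages (`readRows()`, `onlyActive()`, `emails()`). An in-memory array stands in for the file reader so the example is self-contained; the row shape is an assumption made for the demo.

```php
<?php
// Stage 1: source. Assumption: an in-memory iterable stands in for a file reader.
function readRows(iterable $rows): Generator
{
    foreach ($rows as $row) {
        yield $row;
    }
}

// Stage 2: filter. Only rows flagged as active pass through.
function onlyActive(iterable $rows): Generator
{
    foreach ($rows as $row) {
        if ($row['active']) {
            yield $row;
        }
    }
}

// Stage 3: transform. Each row is reduced to a lower-cased email address.
function emails(iterable $rows): Generator
{
    foreach ($rows as $row) {
        yield strtolower($row['email']);
    }
}

$source = [
    ['email' => 'Ann@Example.com', 'active' => true],
    ['email' => 'bob@example.com', 'active' => false],
    ['email' => 'CAT@example.com', 'active' => true],
];

// Nothing runs until the foreach pulls the first value through all three stages.
$result = [];
foreach (emails(onlyActive(readRows($source))) as $email) {
    $result[] = $email;
}
```

Because every stage accepts any `iterable`, stages can be reordered or swapped without touching the others.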
Infinite sequences are another natural fit. A Fibonacci generator, a UUID stream, a paginated API cursor — anything where 'all values' is meaningless or impossibly large becomes trivial with a generator. You pull exactly as many values as you need, then let the generator get garbage-collected.
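A small sketch of the infinite-sequence idea: `fibonacci()` never terminates on its own, and a hypothetical `take()` helper (not a PHP built-in) stops pulling after a fixed number of values.

```php
<?php
// Infinite Fibonacci stream: the loop never ends, but it only advances on demand.
function fibonacci(): Generator
{
    [$a, $b] = [0, 1];
    while (true) {
        yield $a;
        [$a, $b] = [$b, $a + $b];
    }
}

// Hypothetical helper: yields at most $limit items from any iterable.
function take(iterable $items, int $limit): Generator
{
    if ($limit <= 0) {
        return;
    }
    foreach ($items as $item) {
        yield $item;
        if (--$limit === 0) {
            return;   // stop before asking the source for another value
        }
    }
}

$firstTen = [];
foreach (take(fibonacci(), 10) as $n) {
    $firstTen[] = $n;
}
```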
When you compare the two approaches on a file, watch the memory figures. On a real 100k-row CSV the array approach consumes ~80 MB; the generator approach holds steady at ~1 MB regardless of file length. That's not an optimisation; it's a different class of solution.
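A minimal sketch of a line-by-line CSV reader, assuming a hypothetical `readCsv()` function and a small temporary file generated on the spot so the example runs anywhere. It does not reproduce the memory figures quoted above; wrap the loop with `memory_get_peak_usage()` on your own data to measure them.

```php
<?php
// Lazily yields one parsed CSV row at a time; finally closes the handle.
function readCsv(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Separator, enclosure and escape are passed explicitly.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            yield $row;
        }
    } finally {
        fclose($handle);
    }
}

// Build a small temporary CSV so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'csv');
$out = fopen($path, 'w');
for ($i = 1; $i <= 1000; $i++) {
    fputcsv($out, [$i, "user$i@example.com"], ',', '"', '\\');
}
fclose($out);

$rowCount = 0;
$lastRow = null;
foreach (readCsv($path) as $row) {
    $rowCount++;
    $lastRow = $row;   // only the current row is kept
}
unlink($path);
```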
Two-Way Communication — send(), throw(), and getReturn() in Depth
Most tutorials stop at yield-as-output. But generators are bidirectional channels. The send() method lets you push a value INTO a running generator — the yield expression evaluates to whatever you sent. This unlocks coroutine-style patterns: think cooperative task schedulers, stateful parsers, or async I/O pipelines.
The flow is: you call send($value), the generator resumes, the yield expression evaluates to $value, execution continues to the next yield, and send() returns the newly yielded value. If there's no next yield, send() returns null.
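That flow can be traced with a hypothetical running-total coroutine. Each `yield` hands the current total out, and each `send()` delivers the next amount in. The sketch reads the first yielded value with `current()` before sending anything.

```php
<?php
// Hypothetical coroutine: yields the running total and receives the next amount.
function runningTotal(): Generator
{
    $total = 0;
    while (true) {
        $amount = yield $total;   // out: $total, in: whatever send() passes
        $total += $amount;
    }
}

$gen = runningTotal();
$initial   = $gen->current();   // runs to the first yield and reads its value
$afterTen  = $gen->send(10);    // yield evaluates to 10; returns the next yielded total
$afterFive = $gen->send(5);     // yield evaluates to 5; returns the next yielded total
```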
throw() is send()'s sibling: it injects an exception at the point of suspension instead of a value. This lets a scheduler cancel a coroutine cleanly without a global flag variable.
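A sketch of cancellation through `throw()`, assuming a hypothetical `CancelledException` class and a `worker()` generator that logs each step. The exception surfaces inside the generator at the suspended `yield`, where the generator's own `catch` and `finally` blocks handle it.

```php
<?php
// Hypothetical exception type used to signal cancellation.
class CancelledException extends Exception {}

function worker(array &$log): Generator
{
    try {
        $step = 0;
        while (true) {
            $step++;
            $log[] = "step $step";
            yield $step;
        }
    } catch (CancelledException $e) {
        $log[] = 'cancelled: ' . $e->getMessage();
    } finally {
        $log[] = 'cleanup';
    }
}

$log = [];
$gen = worker($log);
$gen->current();   // runs step 1
$gen->next();      // runs step 2

// The exception is raised inside the generator at the suspended yield.
$result = $gen->throw(new CancelledException('shutdown'));
$finished = !$gen->valid();
```

Here the generator catches the exception and finishes, so `throw()` returns null; if it did not catch it, the exception would propagate back to the caller.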
getReturn() extracts the value from an explicit return statement inside the generator. This is only accessible after the generator is exhausted (valid() === false). It's perfect for aggregations: let the generator yield intermediate progress and return the final computed result — callers get both streaming updates and a clean final answer without two separate functions.
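A short sketch of the progress-plus-result pattern, using a hypothetical `sumWithProgress()` generator. It also shows that asking for the return value too early raises an `Exception`.

```php
<?php
// Yields progress (items processed => running sum) and returns the final sum.
function sumWithProgress(array $numbers): Generator
{
    $sum = 0;
    foreach ($numbers as $index => $n) {
        $sum += $n;
        yield ($index + 1) => $sum;
    }
    return $sum;
}

$gen = sumWithProgress([5, 10, 20]);
$progress = [];
foreach ($gen as $processed => $runningSum) {
    $progress[$processed] = $runningSum;
}
$total = $gen->getReturn();   // safe: the generator has finished

// Asking too early fails: this generator is still suspended at its first yield.
$early = sumWithProgress([1, 2]);
$earlyFailed = false;
try {
    $early->getReturn();
} catch (Exception $e) {
    $earlyFailed = true;
}
```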
Generator Delegation with yield from — Composing Sequences Like a Pro
yield from lets one generator delegate iteration to another iterable — another generator, an array, or any Traversable. It's not just syntactic sugar for a nested foreach: it transparently proxies send() and throw() calls through to the inner generator and captures the inner generator's return value as the result of the yield from expression itself.
This is critical for recursive algorithms. Without yield from, you can't simply call a generator recursively and have it yield into the outer stream — you'd have to foreach the inner generator and re-yield each value. yield from collapses this into one clean expression.
The return value capture is the part most developers miss. When the delegated generator finishes, yield from evaluates to that generator's return value. This lets you build tree-walking or recursive descent parsers where each level both yields leaf values AND returns metadata to its parent — a pattern that's genuinely hard to achieve with iterators.
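To make the return-value capture concrete, here is a sketch of a recursive tree walk. The node shape (`value` and `children` keys) and the `walk()` function are assumptions made for the example. Each level yields leaf values into one flat stream and returns the deepest level it saw to its parent.

```php
<?php
// Hypothetical node shape: ['value' => mixed, 'children' => array of nodes].
function walk(array $node, int $depth = 0): Generator
{
    if ($node['children'] === []) {
        yield $node['value'];   // leaf: emit its value
        return $depth;          // and report its depth to the parent
    }
    $maxDepth = $depth;
    foreach ($node['children'] as $child) {
        // Delegation: the child's yields flow straight to our caller, and the
        // expression evaluates to the child's return value.
        $childDepth = yield from walk($child, $depth + 1);
        $maxDepth = max($maxDepth, $childDepth);
    }
    return $maxDepth;
}

$tree = ['value' => 'root', 'children' => [
    ['value' => 'a', 'children' => []],
    ['value' => 'b', 'children' => [
        ['value' => 'c', 'children' => []],
    ]],
]];

$gen = walk($tree);
$leaves = [];
foreach ($gen as $leaf) {
    $leaves[] = $leaf;
}
$maxDepth = $gen->getReturn();
```

One thing to keep in mind: `yield from` preserves the inner generator's keys, so if you collect the stream with `iterator_to_array()`, pass `false` as the second argument to avoid later values overwriting earlier ones.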
Performance note: yield from avoids the double-copy overhead of re-yielding in a loop. The Zend Engine links the inner generator's frame directly, so values flow through without an intermediate allocation per element.
Real-World Pattern: Generator-Based Paginated API Consumer
Combining yield from with send() creates a powerful pattern for consuming paginated REST APIs. The outer generator manages pagination state (cursor, page number, rate-limit handling), while the inner generator (yield from) yields individual records. The caller sees one flat stream of records, not caring about pagination logic.
This pattern also allows injection of rate-limit delays via send() — the caller can tell the generator to pause if a 429 response is received. The generator can then sleep and retry the same page, all transparently.
The memory guarantee is critical here: even if the API has 10,000 pages of 100 records each, the generator holds at most one page of records in memory at a time. An array-based approach would collect all records, blowing through memory limits on large data sets.
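A simplified sketch of the pagination part of this pattern. `fetchPage()` is a hypothetical stand-in for a real HTTP call, and the `records` / `next` response shape is an assumption. Rate-limit handling via send() is left out to keep the example short.

```php
<?php
// Hypothetical stand-in for an HTTP request: returns one page plus the next cursor.
function fetchPage(?int $cursor): array
{
    $page = $cursor ?? 1;
    $lastPage = 3;
    $records = [];
    for ($i = 1; $i <= 2; $i++) {
        $records[] = ['id' => ($page - 1) * 2 + $i];
    }
    return [
        'records' => $records,
        'next'    => $page < $lastPage ? $page + 1 : null,
    ];
}

// Outer generator: owns the cursor and exposes one flat stream of records.
function allRecords(callable $fetch): Generator
{
    $cursor = null;
    $pages = 0;
    do {
        $response = $fetch($cursor);
        $pages++;
        yield from $response['records'];   // only the current page is held
        $cursor = $response['next'];
    } while ($cursor !== null);

    return $pages;   // available through getReturn() once the stream is drained
}

$gen = allRecords('fetchPage');
$ids = [];
foreach ($gen as $record) {
    $ids[] = $record['id'];
}
$pagesFetched = $gen->getReturn();
```

Passing the fetcher in as a callable keeps the pagination loop testable without network access.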
| Aspect | Array-Based Iteration | Generator-Based Iteration |
|---|---|---|
| Memory usage (1M rows) | ~100–800 MB depending on row size | ~1–2 MB regardless of total rows |
| Time to first value | After ALL values computed/loaded | After FIRST value computed (immediate) |
| Rewindable | Yes — iterate as many times as needed | No — exhausted generator cannot rewind |
| Bidirectional (send values in) | No — read-only | Yes — via send() and throw() |
| Return a final value | The array IS the return value | Via return + getReturn() after exhaustion |
| Recursive composition | Manual re-merging of arrays | Clean via yield from with return capture |
| Lazy / short-circuit friendly | No — full computation always happens | Yes — unused values are never computed |
| Type-hinting in PHPDoc | array<int, MyClass> | Generator<TKey, TValue, TSend, TReturn> |
| Works as argument to foreach | Yes | Yes (implements Iterator) |
| Error propagation into sequence | Not applicable | Via throw() — injects exception at yield |
| PHP version requirement | All versions | PHP 5.5+ (yield from: PHP 7.0+) |
Key Takeaways
- A generator function returns a Generator object immediately: zero code inside it runs until you call current(), next(), or send(). This surprises even experienced developers.
- yield is a two-way valve: it sends a value out AND receives a value in (via send()). The yielded-OUT value is what callers see; the sent-IN value is what yield evaluates to inside the generator body.
- yield from isn't just a loop shortcut: it transparently proxies send() and throw() to the inner generator AND captures the inner generator's return value as an expression result, enabling clean recursive composition.
- Generators are single-use and cannot be rewound once started. If you need a replayable lazy sequence, encapsulate the generator factory in an IteratorAggregate class so getIterator() creates a fresh generator each time.
- Real memory savings come from not storing intermediate results. Measure memory with memory_get_usage() to validate your generator is actually lazy in production.
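The IteratorAggregate idea from the takeaways can be sketched like this. `ReplayableSequence` is a hypothetical class name; it stores a factory callable and builds a fresh generator on every `getIterator()` call.

```php
<?php
// Hypothetical wrapper: every foreach gets its own fresh generator.
final class ReplayableSequence implements IteratorAggregate
{
    /** @var callable(): Generator */
    private $factory;

    public function __construct(callable $factory)
    {
        $this->factory = $factory;
    }

    public function getIterator(): Generator
    {
        return ($this->factory)();
    }
}

$squares = new ReplayableSequence(function (): Generator {
    foreach ([1, 2, 3] as $n) {
        yield $n * $n;
    }
});

$firstPass = [];
foreach ($squares as $s) {
    $firstPass[] = $s;
}

$secondPass = [];
foreach ($squares as $s) {   // works again: getIterator() made a new generator
    $secondPass[] = $s;
}
```

The sequence stays lazy on each pass; the trade-off is that the underlying work is redone every time it is iterated.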
Common Mistakes to Avoid
- Calling send($value) on a brand-new generator without reading its first value
Symptom: The first yielded value silently disappears. PHP does not throw here: send() first advances the generator to its first yield, delivers $value as the result of that yield expression, and returns the next yielded value.
Fix: If you need the first yielded value, read it with current() before the first send(). Unlike Python generators, PHP generators do not have to be primed before send().
- Assuming a generator can be iterated twice
Symptom: After the first foreach exhausts it, a second foreach throws an Exception ('Cannot traverse an already closed generator').
Fix: Generators are single-use. If you need to iterate multiple times, wrap the generator factory in a class that re-calls the generator function on each getIterator() call (implement IteratorAggregate), or store results in an array explicitly.
- Forgetting that returning from a generator doesn't work like a normal function return
Symptom: Developers expect $result = someGenerator() to capture the return value, but it just gets the Generator object; the actual return value is only accessible via $gen->getReturn() AFTER the generator is fully exhausted.
Fix: Always drain the generator (via foreach or while($gen->valid())) before calling getReturn(), otherwise you'll get a 'Cannot get return value of a generator that hasn't returned' exception.
Interview Questions on This Topic
- Q: What is the difference between yield and return in a PHP generator, and what happens if you call getReturn() before the generator is exhausted? (Mid-level)
- Q: How does send() work internally: what value does the yield expression evaluate to, and what happens to the first yielded value if you call send() on a generator that has not started yet? (Senior)
- Q: Explain yield from in PHP 7+. How does it differ from a foreach loop that re-yields each value, and how do you capture the return value of a delegated generator? (Senior)
Frequently Asked Questions
What is the difference between a PHP generator and a normal iterator?
A normal iterator requires a full class with five methods (rewind, current, key, next, valid). A generator achieves the same result with a plain function using yield — PHP auto-generates the Iterator implementation. Generators also add bidirectional communication (send/throw) and return value capture, which standard iterators don't have.
Does using generators actually save memory in PHP?
Yes, measurably and significantly. A generator holds exactly one value in memory at a time — the currently yielded item. An equivalent array-based approach allocates memory for every element simultaneously. For a 100,000-row CSV, the difference is typically 1–2 MB (generator) versus 50–200 MB (array), depending on row width.
Can I use a PHP generator with functions that expect an array, like array_map or array_filter?
Not directly — array_map and array_filter require actual arrays. You have two options: convert the generator to an array with iterator_to_array($generator) (which defeats the memory benefit), or use a pipeline of chained generators as shown in this article. For truly large datasets, always prefer generator pipelines over converting to an array mid-stream.
What happens if I break out of a foreach loop that iterates a generator?
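Nothing happens to the generator at the moment of the break: it simply stays suspended at its current yield. PHP has no GeneratorExit exception (that is a Python concept). While you still hold a reference you can keep pulling values with next() and current(). Once the last reference is released, the generator is destroyed and any finally blocks around the suspended yield execute. Always place cleanup code (like file handle closing) in a finally block inside the generator to ensure resource release.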
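A small sketch of cleanup after an early break, assuming a hypothetical `guarded()` generator that logs from its `finally` block. It records what the log looks like right after the break and again after the last reference to the generator is dropped.

```php
<?php
// Hypothetical generator whose finally block records that cleanup happened.
function guarded(array &$log): Generator
{
    try {
        yield 1;
        yield 2;
        yield 3;
    } finally {
        $log[] = 'finally ran';
    }
}

$log = [];
$gen = guarded($log);
foreach ($gen as $value) {
    if ($value === 2) {
        break;               // stop early, with one value still unread
    }
}

$logAfterBreak = $log;       // snapshot taken while $gen is still referenced
unset($gen);                 // last reference gone: the generator is destroyed
$logAfterUnset = $log;       // cleanup has run by now
```

If cleanup must happen at a precise moment, dropping the reference explicitly (as with `unset()` here) makes that moment visible in the code.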
Can I use yield inside a try block?
Yes, but be careful: PHP generators have no close() method. If the generator is destroyed while suspended inside the try block (for example after a break in the loop, once the last reference is dropped), no exception is raised inside it, so catch blocks do not run. The finally block will still execute. So your resource cleanup should be in finally, not in catch.