PHP Generators -- The 4 GB CSV That Crashed Production
Loading a 2M-row CSV with file() exhausted 1.5 GB memory -- generators yield rows on the fly, slashing memory to KB.
20+ years shipping production PHP systems at scale. Notes here come from systems that actually shipped.
- PHP Generators create resumable sequences without storing all values in memory
- yield pauses execution and sends a value to the caller
- send() pushes data back into the generator — two-way communication
- yield from delegates to another iterator and captures its return value
- Memory stays flat (~1 MB) even for million-row datasets vs arrays that scale with size
- Biggest gotcha: send() on a brand-new generator first runs it to the first yield and then delivers the value, so the first yielded value is never returned to you. Read it with current() before the first send()
Imagine you ordered 10,000 cookies from a bakery. A normal function would bake ALL 10,000 cookies, stack them in a giant box, and hand you the whole box at once — your kitchen is trashed. A generator is a baker who hands you one cookie at a time, baking the next one only when you reach out your hand. The baker pauses between each cookie, keeping your kitchen spotless. That pause-and-resume magic is exactly what PHP generators do with data.
Most PHP developers hit a wall the first time they try to process a 500MB CSV file or generate a million-row report inside a web request. The naive approach — loading everything into an array — sends memory usage through the roof and often crashes the process entirely. Generators exist precisely to solve this class of problem, and they do it elegantly without forcing you to rearchitect your entire codebase.
Before generators (introduced in PHP 5.5), your only options were to either build a full array and accept the memory cost, or write a stateful iterator class with five methods and a lot of ceremony. Generators give you lazy, resumable sequences with a syntax so clean it looks almost too simple to be real. Under the hood, PHP compiles a generator function into a Generator object — a special internal class that implements both Iterator and the ability to receive values mid-execution.
By the end of this article you'll understand not just how to write a generator, but WHY the VM pauses execution at yield, how to push values back INTO a running generator with send(), how to compose generators using yield from, how to extract a final return value, and where generators genuinely beat arrays — plus the exact gotchas that trip up experienced developers in production.
What PHP Generators Actually Do (and Why Your CSV Cratered)
A PHP generator is a function that uses yield instead of return to produce a sequence of values lazily — one at a time, on demand, without building an array in memory. The core mechanic: execution pauses at each yield, preserving local state, and resumes when the next value is requested. This turns O(n) memory into O(1) for iteration.
Internally, a generator function returns a Generator object implementing Iterator. Each call to ->next() advances the internal state machine to the next yield, and ->current() reads the value it is paused on. Crucially, you can send values back into the generator via ->send(), and throw exceptions into it. But the practical superpower is memory: iterating a 4 GB CSV row by row consumes a few kilobytes, not gigabytes.
Use generators whenever you process streams, large files, database result sets, or any sequence where loading everything into an array would be wasteful or impossible. In production, they are the difference between a script that handles 10 million rows and one that OOMs at 500k. They also enable cooperative multitasking via libraries like Amp or ReactPHP.
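To make the pause-and-resume mechanic concrete, here is a minimal sketch. The helper name countUpTo() is made up for illustration; the only point is that the loop body suspends at yield and no array of a million integers is ever built.

```php
<?php
// Hypothetical helper: lazily yields the integers from $start to $end.
function countUpTo(int $start, int $end): Generator
{
    for ($i = $start; $i <= $end; $i++) {
        yield $i; // pause here until the consumer asks for the next value
    }
}

$sum = 0;
foreach (countUpTo(1, 1000000) as $n) {
    $sum += $n; // only the current integer is held, never the whole range
}

echo $sum, PHP_EOL;
```

Swapping countUpTo() for range(1, 1000000) gives the same sum but allocates the full array first, which is exactly the cost a generator avoids.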
With file() and array_map, the OOM killer took the pod down in 12 seconds.
How PHP Compiles a Generator Function: The Internals You Actually Need
When PHP's parser sees the yield keyword inside a function body, it silently transforms the entire function into a generator factory. Calling that function no longer executes any of its code — it returns a Generator object immediately. This is the single most important thing to internalise, because it explains almost every 'why didn't my code run?' bug you'll ever see with generators.
The Generator class implements the Iterator interface internally (rewind, current, key, next, valid) plus three extra methods: send(), throw(), and getReturn(). The Zend Engine stores the generator's execution context — local variable table, instruction pointer, call stack frame — in a heap-allocated zend_generator struct. This is why a generator can pause mid-function and resume: its stack frame is preserved off the real call stack.
Each time you call next() (or foreach triggers it), the engine restores that frame, runs until the next yield, stashes the yielded value, and suspends again. Memory for local variables stays alive for the generator's lifetime, but you only ever hold the current value in userland — not the entire sequence. That's the performance win in one sentence.
A generator function can yield zero or more times. After the last yield (or an explicit return), valid() returns false and the generator is exhausted. Generators are forward-only: calling rewind() on one that has already advanced past its first yield throws an exception.
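The lifecycle described above can be traced in a few lines. This is a sketch with a made-up demo() function; the comments mark when the body actually runs.

```php
<?php
function demo(): Generator
{
    echo "body started\n";
    yield 'first';
    yield 'second';
}

$gen = demo();             // prints nothing: no body code has run yet
$first = $gen->current();  // runs the body up to the first yield
$gen->next();              // resumes until the second yield
$second = $gen->current();

$rewindError = null;
try {
    $gen->rewind();        // already past the first yield, so this throws
} catch (Throwable $e) {
    $rewindError = $e->getMessage();
}

$gen->next();              // no more yields: the function body finishes
$exhausted = !$gen->valid();
```

To iterate again, call demo() a second time to get a fresh Generator object.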
current() on a freshly created Generator automatically runs the function body up to the first yield, so you don't need to call rewind() or next() first. But if you call next() before current(), you skip the first yielded value entirely. Always start with current() if you care about the first element.
Local variables stay alive for as long as the generator does, so release large ones with unset() once they're no longer needed inside the generator.
Calling the generator function runs none of its body; the first current()/next() moves execution to the first yield.
Practical Power: Processing Large Files and Infinite Sequences Without Blowing Memory
Here's where generators stop being a curiosity and start being a production tool. Reading a large CSV line-by-line with a generator keeps peak memory at roughly the size of one line — no matter if the file is 1 MB or 10 GB. The alternative, file() or array-based fgetcsv loops that collect into an array, scales memory linearly with file size.
The key insight is that generators compose beautifully. You can pipe a file-reading generator through a filtering generator through a transforming generator — each stage lazy, each holding only one record at a time. This is PHP's equivalent of Unix pipes and it's just as powerful.
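A pipeline of this kind might look like the sketch below. The stage names (source, keepIf, mapEach) and the sample orders are illustrative assumptions; an in-memory list stands in for a real file reader.

```php
<?php
// Stage 1: a source. An in-memory list stands in for a file-reading generator.
function source(iterable $rows): Generator
{
    foreach ($rows as $row) {
        yield $row;
    }
}

// Stage 2: keep only rows that satisfy the predicate.
function keepIf(iterable $rows, callable $predicate): Generator
{
    foreach ($rows as $row) {
        if ($predicate($row)) {
            yield $row;
        }
    }
}

// Stage 3: transform each surviving row.
function mapEach(iterable $rows, callable $fn): Generator
{
    foreach ($rows as $row) {
        yield $fn($row);
    }
}

$orders = [
    ['id' => 1, 'total' => 40],
    ['id' => 2, 'total' => 250],
    ['id' => 3, 'total' => 900],
];

$pipeline = mapEach(
    keepIf(source($orders), fn(array $o): bool => $o['total'] >= 100),
    fn(array $o): int => $o['id']
);

$ids = [];
foreach ($pipeline as $id) {
    $ids[] = $id; // each row travels through all three stages before the next is read
}
```

Nothing runs until the foreach starts pulling, and each stage only ever holds the row it is currently working on.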
Infinite sequences are another natural fit. A Fibonacci generator, a UUID stream, a paginated API cursor — anything where 'all values' is meaningless or impossibly large becomes trivial with a generator. You pull exactly as many values as you need, then let the generator get garbage-collected.
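As a sketch of the infinite case, here is a Fibonacci generator paired with a small take() helper. Both names are illustrative; take() is not a PHP built-in.

```php
<?php
// Never terminates on its own: the consumer decides how much to pull.
function fibonacci(): Generator
{
    [$a, $b] = [0, 1];
    while (true) {
        yield $a;
        [$a, $b] = [$b, $a + $b];
    }
}

// Hypothetical helper: yields at most $limit items from any iterable.
function take(iterable $items, int $limit): Generator
{
    if ($limit <= 0) {
        return;
    }
    foreach ($items as $item) {
        yield $item;
        if (--$limit === 0) {
            return; // stop pulling from the source
        }
    }
}

$firstTen = [];
foreach (take(fibonacci(), 10) as $n) {
    $firstTen[] = $n;
}
```

Once take() returns, nothing references the fibonacci() generator any more, so it is destroyed even though its loop never ended.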
For the file example below, watch the memory figures. On a real 100k-row CSV the array approach consumes ~80 MB; the generator approach holds steady at ~1 MB regardless of file length. That's not an optimisation — it's a different class of solution.
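A minimal sketch of the generator approach follows. The function name readCsvRows() and the tiny sample file are assumptions made for illustration; a real job would point at its own path.

```php
<?php
// Lazily yields one parsed CSV row at a time.
function readCsvRows(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Length 0 means "no line length limit"; delimiter, enclosure and escape are explicit.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() reports a blank line as [null]
            }
            yield $row;
        }
    } finally {
        fclose($handle);
    }
}

// Build a small sample file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "id,amount\n1,10\n2,32\n");

$total = 0;
foreach (readCsvRows($path) as $index => $row) {
    if ($index === 0) {
        continue; // skip the header row
    }
    $total += (int) $row[1];
}
unlink($path);
```

Wrapping the loop in memory_get_peak_usage() calls is one way to check the figures on your own data; the exact numbers depend on row width and PHP version.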
Breaking out of a foreach early means the generator never runs to completion; it is simply destroyed once nothing references it. A finally block inside the generator WILL still execute, making it the right place to release file handles, database cursors, or network connections. Never rely on the generator running to completion for cleanup.
Two-Way Communication: send(), throw(), and getReturn() in Depth
Most tutorials stop at yield-as-output. But generators are bidirectional channels. The send() method lets you push a value INTO a running generator — the yield expression evaluates to whatever you sent. This unlocks coroutine-style patterns: think cooperative task schedulers, stateful parsers, or async I/O pipelines.
The flow is: you call send($value), the generator resumes, the yield expression evaluates to $value, execution continues to the next yield, and send() returns the newly yielded value. If there's no next yield, send() returns null.
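The flow above can be traced with a small running-total coroutine. This is a sketch; runningTotal() and its "null means stop" convention are assumptions chosen for the example.

```php
<?php
// Each value sent in is added to the total; the new total is yielded back out.
function runningTotal(): Generator
{
    $total = 0;
    while (true) {
        $amount = yield $total; // yields $total out, then evaluates to the sent value
        if ($amount === null) {
            return;             // convention for this sketch: null means "stop"
        }
        $total += $amount;
    }
}

$acc = runningTotal();
$initial   = $acc->current();  // run to the first yield and read it
$afterTen  = $acc->send(10);   // resumes, adds 10, returns the next yielded total
$afterFive = $acc->send(5);
$done      = $acc->send(null); // generator returns, so there is no next yield
```

Reading current() before the first send() is what keeps the initial total from being skipped.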
throw() is send()'s sibling: it injects an exception at the point of suspension instead of a value. This lets a scheduler cancel a coroutine cleanly without a global flag variable.
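Here is a sketch of cancellation through throw(). The worker() function and the choice of RuntimeException as the cancel signal are illustrative assumptions.

```php
<?php
function worker(): Generator
{
    $processed = 0;
    try {
        while (true) {
            yield $processed;
            $processed++;
        }
    } catch (RuntimeException $e) {
        // The exception surfaces at the yield where the generator was suspended.
        yield "cancelled after $processed: " . $e->getMessage();
    }
}

$task = worker();
$task->current(); // run to the first yield
$task->next();    // one unit of work done
$task->next();    // two units of work done

// Inject the exception; throw() returns whatever the generator yields next.
$result = $task->throw(new RuntimeException('shutdown'));
```

If the generator does not catch the exception, it propagates out of the throw() call to the caller instead.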
getReturn() extracts the value from an explicit return statement inside the generator. This is only accessible after the generator is exhausted (valid() === false). It's perfect for aggregations: let the generator yield intermediate progress and return the final computed result — callers get both streaming updates and a clean final answer without two separate functions.
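A sketch of the progress-plus-result pattern, using a hypothetical importRows() function:

```php
<?php
function importRows(iterable $rows): Generator
{
    $imported = 0;
    foreach ($rows as $row) {
        $imported++;
        yield "imported row {$imported}"; // streaming progress for the caller
    }
    return $imported; // final result, read through getReturn()
}

$job = importRows(['a', 'b', 'c']);

$progress = [];
foreach ($job as $message) {
    $progress[] = $message;
}

$count = $job->getReturn(); // safe here: the loop above exhausted the generator
```

The caller keeps one handle ($job) for both the stream of progress messages and the final count.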
current() is the safe way to initialise a generator: it runs the body to the first yield and leaves it paused there. next() and send() also start a fresh generator, but they run it to the first yield and then immediately resume it, so the first yielded value is skipped. PHP does not throw when you call send(42) on a fresh generator; the value is delivered to the first yield and send() returns the second yielded value. This trips up almost everyone in interviews.
An unprimed send() in a production task scheduler can mask control flow bugs, because the first yielded value is silently lost.
Guard send() with a check on the generator's state using $gen->valid(), or wrap it in try-catch.
Treat send() as a two-phase handshake: initialise first, then communicate.
Generator Delegation with yield from: Composing Sequences Like a Pro
yield from lets one generator delegate iteration to another iterable — another generator, an array, or any Traversable. It's not just syntactic sugar for a nested foreach: it transparently proxies send() and throw() calls through to the inner generator and captures the inner generator's return value as the result of the yield from expression itself.
This is critical for recursive algorithms. Without yield from, you can't simply call a generator recursively and have it yield into the outer stream — you'd have to foreach the inner generator and re-yield each value. yield from collapses this into one clean expression.
The return value capture is the part most developers miss. When the delegated generator finishes, yield from evaluates to that generator's return value. This lets you build tree-walking or recursive descent parsers where each level both yields leaf values AND returns metadata to its parent — a pattern that's genuinely hard to achieve with iterators.
Performance note: yield from avoids the double-copy overhead of re-yielding in a loop. The Zend Engine links the inner generator's frame directly, so values flow through without an intermediate allocation per element.
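The recursive case can be sketched with a nested array standing in for a tree. walkLeaves() is a made-up name; each level yields its leaves and returns a leaf count to its parent through yield from.

```php
<?php
// Yields every leaf of a nested array and returns how many leaves were found.
function walkLeaves(array $node): Generator
{
    $count = 0;
    foreach ($node as $child) {
        if (is_array($child)) {
            // Delegate to the recursive call; yield from evaluates to its return value.
            $count += yield from walkLeaves($child);
        } else {
            yield $child;
            $count++;
        }
    }
    return $count;
}

$tree = [1, [2, 3, [4]], 5];
$walker = walkLeaves($tree);

$leaves = [];
foreach ($walker as $leaf) {
    $leaves[] = $leaf; // keys are ignored on purpose, see the note below
}
$total = $walker->getReturn();
```

One thing to watch: yield from keeps the inner generator's keys, so keys can repeat across levels. Collecting with iterator_to_array() and its default key preservation would overwrite entries, which is why the sketch appends values by hand.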
Real-World Pattern: Generator-Based Paginated API Consumer
Combining yield from with send() creates a powerful pattern for consuming paginated REST APIs. The outer generator manages pagination state (cursor, page number, rate-limit handling), while the inner generator (yield from) yields individual records. The caller sees one flat stream of records, not caring about pagination logic.
This pattern also allows injection of rate-limit delays via send() — the caller can tell the generator to pause if a 429 response is received. The generator can then sleep and retry the same page, all transparently.
The memory guarantee is critical here: even if the API has 10,000 pages of 100 records each, the generator holds at most one page of records in memory at a time. An array-based approach would collect all records, blowing through memory limits on large data sets.
- yield sends data out to the caller and waits for a reply (the sent value).
- send() and yield form a request-response pair — like a coroutine.
- This enables patterns: lazy pagination, interactive parsers, cooperative multitasking.
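A minimal sketch of the pattern, under loud assumptions: fetchPage() is a hypothetical stand-in for an HTTP client, the response shape (records, next) is invented, and the control signal shown is a simple 'stop' rather than a rate-limit pause.

```php
<?php
// Hypothetical page fetcher standing in for an HTTP client.
function fetchPage(int $page): array
{
    $pages = [
        1 => ['records' => ['a', 'b'], 'next' => 2],
        2 => ['records' => ['c', 'd'], 'next' => 3],
        3 => ['records' => ['e'],      'next' => null],
    ];
    return $pages[$page];
}

// Inner generator: yields one page's records and listens for a control signal.
function pageRecords(array $records): Generator
{
    foreach ($records as $record) {
        $signal = yield $record; // send() on the outer generator arrives here
        if ($signal === 'stop') {
            return true;         // tell the outer generator to stop paginating
        }
    }
    return false;
}

// Outer generator: owns pagination state, exposes one flat stream of records.
function allRecords(callable $fetch): Generator
{
    $page = 1;
    while ($page !== null) {
        $response = $fetch($page);
        $stopped = yield from pageRecords($response['records']);
        if ($stopped) {
            return;
        }
        $page = $response['next'];
    }
}

$seen = [];
$stream = allRecords('fetchPage');
while ($stream->valid()) {
    $record = $stream->current();
    $seen[] = $record;
    if ($record === 'c') {
        $stream->send('stop'); // control signal, forwarded through yield from
    } else {
        $stream->next();
    }
}
```

The same channel could carry a throttle signal: the inner generator would sleep and the outer one would refetch the page instead of returning. Page 3 is never fetched here because the consumer stopped on page 2.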
Passing control signals through send() keeps the generator decoupled from HTTP client logic.
Use send() for control signals (throttle, cancel, change parameters), not for data transformation.
Combining yield from with send() builds clean paginated consumers.
Yielding Keys and Values: Don't Forget Associative Arrays
Most devs treat generators like firehoses for numeric sequences. That's fine until you're mapping log lines to timestamps or building lookup tables from a 2GB CSV. You can yield key-value pairs just like an associative array. The syntax is yield $key => $value. This matters when your downstream code expects named indexes: think JSON encoding, database upserts, or feeding a template engine.
The WHY is simple: memory. Building an actual associative array for a million rows will crater your 256MB container. Yielding the pairs keeps the key-value shape without the allocation. If you don't yield keys explicitly, the generator assigns sequential integer keys starting at 0, and your team ends up writing brittle positional logic. Use associative yields, and your API responses stay readable without blowing the budget.
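A short sketch of keyed versus unkeyed yields. The log format and the function names parseLog() and unkeyed() are assumptions for illustration.

```php
<?php
// Yields "timestamp => message" pairs from lines shaped like "2024-05-01T10:00:00 started".
function parseLog(iterable $lines): Generator
{
    foreach ($lines as $line) {
        [$timestamp, $message] = explode(' ', $line, 2);
        yield $timestamp => $message;
    }
}

// Same data without explicit keys: PHP assigns 0, 1, 2, ...
function unkeyed(): Generator
{
    yield 'started';
    yield 'finished';
}

$lines = ['2024-05-01T10:00:00 started', '2024-05-01T10:00:05 finished'];

$byTime = [];
foreach (parseLog($lines) as $timestamp => $message) {
    $byTime[$timestamp] = $message;
}

$asObject = json_encode($byTime);                      // string keys encode as a JSON object
$asList   = json_encode(iterator_to_array(unkeyed())); // 0-based integer keys encode as a JSON array
```

Collecting into $byTime is only done here to show the JSON shapes; in a real stream you would consume each pair as it arrives.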
If you yield values without explicit keys and collect them into an array, array_key_exists() lookups by name will fail, and json_encode() will emit an array, not an object, unless you pass JSON_FORCE_OBJECT. That's a silent data-type bug that surfaces in your mobile client as 'Expected object, got array'.
Early Termination and Cleanup: Don't Leak Handles
Generators are lazy, which means they might never finish iterating. A foreach with a break or return? Your generator function stops mid-stream. That's a problem if you're holding file handles, database cursors, or temp streams. PHP will release them when the generator is eventually destroyed, but 'eventually' is not a production guarantee. You need a finally block. The WHY: leaked handles in long-running processes like workers or daemons compound silently.
A finally block wrapped around the yield runs when the generator completes, and also when a suspended generator is destroyed, for example when its last reference goes out of scope or you unset() it. That is your safety net. Wrap your resource acquisition in a try-finally, and yield inside the try. When the consumer bails, your cleanup fires predictably. Don't assume the loop completes. In production, consumers are cut off by timeouts, exceptions, or user cancellations. Protect the resource, not the iteration.
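The early-exit behaviour can be observed with a sketch like this one. A log array stands in for a real file handle or cursor, and guardedRows() is a made-up name.

```php
<?php
$log = [];

function guardedRows(array &$log): Generator
{
    $log[] = 'open';       // stands in for fopen() or opening a DB cursor
    try {
        foreach ([1, 2, 3, 4] as $row) {
            yield $row;
        }
    } finally {
        $log[] = 'close';  // runs on normal completion and on early destruction
    }
}

$gen = guardedRows($log);
foreach ($gen as $row) {
    if ($row === 2) {
        break;             // consumer bails out mid-stream
    }
}

unset($gen);               // last reference gone: the generator is destroyed and finally runs
```

In a long-running worker, unsetting the variable (or letting it fall out of scope promptly) makes the cleanup moment explicit instead of leaving it to whenever the reference happens to disappear.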
The 4 GB CSV That Crashed a Production Reporting Job
The team assumed that loading the file with file() and then iterating over the resulting array was fine because 'it worked fine on staging with 10k rows.' They didn't test at production scale.
The job called file($path), which loads the entire CSV into an array of strings, each string holding a full line. For a 2M row file with average 500 bytes per row, that's ~1 GB just for the raw lines, plus PHP overhead per string (~56 bytes) pushing total past 2 GB. The container had only 1.5 GB memory limit.
The fix replaced the file() call with a generator that reads one line at a time via fgetcsv() and yields each row. Peak memory dropped from ~2 GB to ~2 MB. The report now completes in under 3 minutes with 50 MB headroom.
- Always assume input size is unbounded when processing files or API responses in production.
- Test memory behaviour under realistic row counts — 10x your expected max is a good rule.
- Use generators or SplFixedArray for any operation that could grow beyond a few thousand items.
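- If a generator produces no values, check that the function does not return early without any yield.
- If the first value goes missing, look for send() or next() called on a fresh generator. Both run it to the first yield and then resume it, so read the first value with current() before you send anything.
- If memory still grows, check that the consumer is not collecting rows into an array inside the foreach loop. Also check that the generator itself doesn't hold references to large objects in local variables, since those stay alive for the generator's lifetime.
- If the generator never finishes, check how the caller uses send() and that the generator's yield expression is being assigned. If the loop condition depends on the sent value, ensure it eventually becomes falsy.
- Compare memory_get_usage(true) before and after the generator loop.
- Use xdebug_debug_zval('generatorVar') to inspect retained references.
Key takeaways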
- Calling a generator function executes none of its body; code only starts running on the first current(), next(), or send(). This surprises even experienced developers.
- yield is both an output statement and an input expression (via send()). The yielded-OUT value is what callers see; the sent-IN value is what yield evaluates to inside the generator body.
- yield from forwards send() and throw() to the inner generator AND captures the inner generator's return value as an expression result, enabling clean recursive composition.
- Use memory_get_usage() to validate your generator is actually lazy in production.
Common mistakes to avoid
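- Calling send($value) on a brand-new generator. PHP runs the generator to its first yield, delivers the value there, and returns the next yielded value, so the first yielded value is silently skipped. Call current() before sending a real value. Think of it as a handshake: the generator must reach its first yield before it can accept incoming data.
- Assuming a generator can be iterated twice. Generators are forward-only; once one has advanced it cannot be rewound, so call the generator function again to get a fresh one.
- Forgetting that returning from a generator doesn't work like a normal function return. The returned value is not handed to the caller of the function; it is only available through getReturn() after the generator is exhausted.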
Interview Questions on This Topic
What is the difference between yield and return in a PHP generator, and what happens if you call getReturn() before the generator is exhausted?
yield pauses execution and sends a value to the caller without exiting the function; return in a generator provides a final value that can be retrieved via getReturn() only after exhaustion. Calling getReturn() before the generator is exhausted throws an exception because the generator hasn't actually executed its return statement yet.