C++20 Coroutines Explained: Internals, Pitfalls and Real-World Patterns
Async programming in C++ has historically been a war zone. You either wrestled with raw threads and mutexes, stitched std::futures and hand-rolled callbacks together until the code looked like spaghetti, or reached for a third-party library like Boost.Asio and learned its entire universe of abstractions before writing a single useful line. The fundamental problem was that the language had no native concept of 'pause and resume' — every async pattern was bolted on top of a model that was never designed for it.
C++20 coroutines change that at the language level. They give the compiler a first-class way to transform an ordinary-looking function into a state machine that can suspend and resume without blocking a thread. The magic is opt-in and zero-cost when you don't use it: there's no runtime, no garbage collector, no hidden scheduler. What you get instead is a set of three keywords — co_await, co_yield, and co_return — plus a precise protocol of customisation points that lets library authors (and you) define exactly what 'suspend' and 'resume' mean for your use case.
By the end of this article you'll understand how the coroutine frame is laid out in memory, how the promise_type protocol drives the entire lifecycle, how to build a lazy generator and a simple async task from scratch, and — critically — which production traps will silently corrupt your program if you don't know they're there. This isn't a hello-world tour. It's the article you read before you ship coroutine code to production.
How the Compiler Transforms a Coroutine: The Frame and the Promise
When the compiler sees co_await, co_yield, or co_return inside a function, it quietly rewrites that function into something unrecognisable. The original function body becomes a state machine. All local variables that need to survive a suspension point are hoisted into a heap-allocated coroutine frame. The frame also contains a promise object — the central hub that controls what the coroutine returns to its caller, what happens on suspension, and whether the coroutine suspends immediately when called or runs to the first suspension point first.
The coroutine handle (std::coroutine_handle<promise_type>) is your lever on the frame from the outside: a pointer-sized, trivially copyable value with three core operations. resume() continues execution from the last suspension point, done() reports whether the coroutine has reached its final suspend point, and destroy() tears the frame down. Crucially, the handle is non-owning: copying it does not extend the frame's lifetime, which is why every practical coroutine type wraps the handle in an RAII owner that calls destroy() exactly once.
The lifecycle goes: caller invokes coroutine → frame is allocated → promise.get_return_object() is called to produce the return value → initial_suspend() decides whether to suspend immediately → body runs until a suspension point → final_suspend() decides whether to suspend before the frame is destroyed. Omitting a required hook from your promise_type is at least a compile error; getting the ownership side of the protocol wrong — destroying the frame twice, resuming a finished coroutine, never destroying the frame at all — is undefined behaviour or a silent leak with no diagnostic. That's what makes coroutines both powerful and dangerous.
```cpp
#include <coroutine>
#include <iostream>

// ---------------------------------------------------------------------------
// A minimal coroutine that lets us inspect exactly when each lifecycle hook
// fires. Nothing is hidden — every promise hook prints its name.
// ---------------------------------------------------------------------------
struct InspectableTask {
    // Every coroutine return type must have a nested promise_type.
    struct promise_type {
        // Called first: produces the object returned to the caller.
        InspectableTask get_return_object() {
            std::cout << "[promise] get_return_object()\n";
            return InspectableTask{
                std::coroutine_handle<promise_type>::from_promise(*this)};
        }

        // Returning suspend_always means: don't run any of the body yet.
        // The caller gets control back immediately after construction.
        std::suspend_always initial_suspend() noexcept {
            std::cout << "[promise] initial_suspend() — coroutine is lazy\n";
            return {};
        }

        // Returning suspend_always at the end keeps the frame alive so the
        // caller can inspect results. Returning suspend_never destroys it.
        std::suspend_always final_suspend() noexcept {
            std::cout << "[promise] final_suspend() — frame still alive\n";
            return {};
        }

        // co_return with no value lands here.
        void return_void() { std::cout << "[promise] return_void()\n"; }

        // Any exception that escapes the coroutine body arrives here.
        void unhandled_exception() {
            std::cout << "[promise] unhandled_exception() — rethrowing\n";
            std::rethrow_exception(std::current_exception());
        }
    };

    // The coroutine handle is the only data member — pointer-sized.
    std::coroutine_handle<promise_type> handle;

    explicit InspectableTask(std::coroutine_handle<promise_type> h) : handle(h) {}

    // We own the frame, so we destroy it in the destructor. Destroy whenever
    // the handle is valid — destroying only when done() would leak the frame
    // if the coroutine never ran to completion.
    ~InspectableTask() {
        if (handle) {
            if (handle.done()) std::cout << "[task] destroying completed frame\n";
            handle.destroy();
        }
    }

    // Let the caller drive execution one step at a time.
    void resume() {
        if (handle && !handle.done()) { handle.resume(); }
    }

    bool is_done() const { return handle.done(); }
};

// ---------------------------------------------------------------------------
// The coroutine itself — looks like a normal function, but the compiler
// rewrites it completely because it contains co_return.
// ---------------------------------------------------------------------------
InspectableTask demonstrate_lifecycle() {
    std::cout << "[body] coroutine body — step 1\n";
    // co_await suspend_always suspends right here; caller gets control back.
    co_await std::suspend_always{};
    std::cout << "[body] coroutine body — step 2 (resumed)\n";
    co_return;  // triggers return_void(), then final_suspend()
}

int main() {
    std::cout << "--- calling coroutine ---\n";
    // Because initial_suspend returns suspend_always, the body hasn't run yet.
    InspectableTask task = demonstrate_lifecycle();

    std::cout << "--- first resume ---\n";
    task.resume();  // runs until the co_await suspend_always inside the body

    std::cout << "--- second resume ---\n";
    task.resume();  // runs to co_return, then final_suspend

    std::cout << "--- done: " << std::boolalpha << task.is_done() << " ---\n";
    return 0;
}
```
```
--- calling coroutine ---
[promise] get_return_object()
[promise] initial_suspend() — coroutine is lazy
--- first resume ---
[body] coroutine body — step 1
--- second resume ---
[body] coroutine body — step 2 (resumed)
[promise] return_void()
[promise] final_suspend() — frame still alive
--- done: true ---
[task] destroying completed frame
```
Building a Lazy Generator with co_yield: Infinite Sequences Without the Bloat
A generator is the cleanest demonstration of why coroutines exist. Before C++20, producing a lazily-evaluated sequence meant either returning a fully-materialised container (bad for large or infinite sequences), writing a hand-rolled iterator with boilerplate state, or pulling in a library. With co_yield, a coroutine can produce one value, suspend, and wait to be asked for the next — exactly like Python's yield.
The promise_type for a generator needs one extra hook: yield_value(). When the compiler sees co_yield expr, it rewrites it as co_await promise.yield_value(expr). Your yield_value() stores the value somewhere the caller can read it and returns an awaitable that suspends the coroutine. The caller then calls .next() or advances an iterator, reads the stored value, and resumes.
The performance story is compelling. For a sequence of N items, a generator allocates exactly one coroutine frame (one heap allocation) regardless of N. A function returning a std::vector<T>, by contrast, pays O(N) memory to materialise every element up front — and it cannot represent an infinite sequence at all. The generator computes each value only at the moment the consumer asks for it.
One subtlety: generator coroutines are inherently single-threaded pull-based structures. Don't try to resume them from multiple threads — there's no synchronisation inside the frame.
```cpp
#include <coroutine>
#include <cstdint>
#include <iostream>
#include <iterator>   // std::default_sentinel_t
#include <optional>
#include <utility>    // std::exchange

// ---------------------------------------------------------------------------
// A reusable Generator<T> type. Produces values lazily on demand.
// Ownership model: the Generator object owns the coroutine frame.
// ---------------------------------------------------------------------------
template <typename ValueType>
class Generator {
public:
    struct promise_type {
        std::optional<ValueType> current_value;

        Generator get_return_object() {
            return Generator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }

        // Suspend immediately — don't produce anything until the caller asks.
        std::suspend_always initial_suspend() noexcept { return {}; }

        // Suspend at the end so the caller can detect completion via done().
        std::suspend_always final_suspend() noexcept { return {}; }

        // co_yield value → promise.yield_value(value) → suspend
        std::suspend_always yield_value(ValueType value) noexcept {
            current_value = std::move(value);
            return {};  // suspending here passes control back to the caller
        }

        void return_void() noexcept { current_value = std::nullopt; }

        void unhandled_exception() {
            std::rethrow_exception(std::current_exception());
        }
    };

    // -----------------------------------------------------------------------
    // A minimal forward iterator so range-for works: for (auto v : gen) {...}
    // -----------------------------------------------------------------------
    struct iterator {
        std::coroutine_handle<promise_type> handle;

        iterator& operator++() {
            handle.resume();  // ask the coroutine for the next value
            return *this;
        }
        const ValueType& operator*() const {
            return *handle.promise().current_value;
        }
        bool operator==(std::default_sentinel_t) const { return handle.done(); }
    };

    iterator begin() {
        handle_.resume();  // prime the generator: run to the first co_yield
        return iterator{handle_};
    }
    std::default_sentinel_t end() { return {}; }

    explicit Generator(std::coroutine_handle<promise_type> h) : handle_(h) {}

    // Non-copyable: a coroutine frame must have exactly one owner.
    Generator(const Generator&) = delete;
    Generator& operator=(const Generator&) = delete;
    Generator(Generator&& other) noexcept
        : handle_(std::exchange(other.handle_, nullptr)) {}

    ~Generator() {
        if (handle_) handle_.destroy();
    }

private:
    std::coroutine_handle<promise_type> handle_;
};

// ---------------------------------------------------------------------------
// An infinite Fibonacci sequence — impossible to express as a plain vector.
// The coroutine frame keeps 'a' and 'b' alive across every suspension.
// ---------------------------------------------------------------------------
Generator<uint64_t> fibonacci_sequence() {
    uint64_t a = 0;
    uint64_t b = 1;
    while (true) {  // infinite loop is fine — we suspend each iteration
        co_yield a;
        uint64_t next = a + b;
        a = b;
        b = next;
    }
}

// ---------------------------------------------------------------------------
// A finite range generator for comparison — shows early termination works.
// ---------------------------------------------------------------------------
Generator<int> range(int start, int stop, int step = 1) {
    for (int i = start; i < stop; i += step) {
        co_yield i;
    }
    // co_return is implicit when execution falls off the end
}

int main() {
    std::cout << "First 10 Fibonacci numbers:\n";
    int count = 0;
    for (uint64_t fib : fibonacci_sequence()) {
        std::cout << fib << ' ';
        if (++count == 10) break;  // early exit destroys the generator safely
    }
    std::cout << '\n';

    std::cout << "\nEven numbers in [0, 20):\n";
    for (int v : range(0, 20, 2)) {
        std::cout << v << ' ';
    }
    std::cout << '\n';
    return 0;
}
```
```
First 10 Fibonacci numbers:
0 1 1 2 3 5 8 13 21 34

Even numbers in [0, 20):
0 2 4 6 8 10 12 14 16 18
```
Writing an Async Task with co_await: Custom Awaitables and Thread Handoff
co_yield is the easy case — it always suspends. co_await is more nuanced because the awaitable you pass it gets to decide at runtime whether to suspend at all. When you write co_await expr, the compiler calls three methods on the awaitable: await_ready() (should we skip suspension entirely?), await_suspend(handle) (what do we do with this suspended coroutine?), and await_resume() (what value does the co_await expression produce when resumed?).
This three-method protocol is the extensibility seam that makes C++20 coroutines genuinely powerful. await_suspend receives the coroutine's own handle, so it can post the handle to a thread pool, store it in an event loop, attach it to an I/O completion port — anything. The coroutine is just data at that point. The scheduler decides when to call handle.resume().
For a real async task you need two more things: a way to propagate exceptions (store them in the promise, rethrow on retrieval), and a way to return a value from co_return (store it in the promise, retrieve it via get()). The example below builds a Task<int> that starts eagerly, hops onto a hand-rolled thread pool via a custom awaitable, and hands its result (or exception) back to a blocking .get() on the caller's side.
```cpp
#include <chrono>
#include <condition_variable>
#include <coroutine>
#include <exception>
#include <functional>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// ---------------------------------------------------------------------------
// A minimal thread pool that accepts work items (std::function<void()>).
// In production you'd use a battle-tested library, but this makes the
// coroutine-scheduler interaction completely transparent.
// ---------------------------------------------------------------------------
class ThreadPool {
public:
    explicit ThreadPool(size_t thread_count) {
        for (size_t i = 0; i < thread_count; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    ~ThreadPool() {
        {
            std::unique_lock lock(mutex_);
            shutdown_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    void post(std::function<void()> task) {
        {
            std::unique_lock lock(mutex_);
            work_queue_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void worker_loop() {
        while (true) {
            std::function<void()> task;
            {
                std::unique_lock lock(mutex_);
                cv_.wait(lock, [this] { return !work_queue_.empty() || shutdown_; });
                if (shutdown_ && work_queue_.empty()) return;
                task = std::move(work_queue_.front());
                work_queue_.pop();
            }
            task();  // execute the work item — could be handle.resume()
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> work_queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool shutdown_ = false;
};

// Global pool — in real code inject this via a scheduler abstraction.
ThreadPool global_pool{2};

// ---------------------------------------------------------------------------
// A custom awaitable that transfers the coroutine to the thread pool.
// This is the key pattern: await_suspend posts handle.resume() as a task.
// ---------------------------------------------------------------------------
struct TransferToPool {
    ThreadPool& pool;

    // Never skip the suspension — we always want a thread-hop.
    bool await_ready() const noexcept { return false; }

    // Store the coroutine handle in the pool's work queue.
    // When a pool thread picks it up, it calls handle.resume().
    void await_suspend(std::coroutine_handle<> handle) const {
        pool.post([handle]() mutable { handle.resume(); });
    }

    // No value produced by this await expression — it's purely a scheduling op.
    void await_resume() const noexcept {}
};

// ---------------------------------------------------------------------------
// Task<T>: a coroutine return type that carries a result (or exception).
// The caller synchronises with a condition variable via .get().
// ---------------------------------------------------------------------------
template <typename ResultType>
class Task {
public:
    struct promise_type {
        std::optional<ResultType> result;
        std::exception_ptr exception;
        std::mutex completion_mutex;
        std::condition_variable completion_cv;
        bool completed = false;

        Task get_return_object() {
            return Task{std::coroutine_handle<promise_type>::from_promise(*this)};
        }

        // Run the body immediately — no lazy start for tasks.
        std::suspend_never initial_suspend() noexcept { return {}; }

        // Signal completion from the final awaiter's await_suspend: by the
        // time it runs, the coroutine is already at its final suspend point,
        // so a waiter that wakes up sees a fully finished coroutine.
        // (A production Task keeps this synchronisation outside the frame —
        // as cppcoro's sync_wait does — to close a narrow race between
        // notify_all and frame destruction.)
        struct FinalAwaiter {
            bool await_ready() const noexcept { return false; }
            void await_suspend(std::coroutine_handle<promise_type> h) noexcept {
                auto& p = h.promise();
                {
                    std::unique_lock lock(p.completion_mutex);
                    p.completed = true;
                }
                p.completion_cv.notify_all();  // wake anyone blocked in .get()
            }
            void await_resume() const noexcept {}
        };
        FinalAwaiter final_suspend() noexcept { return {}; }

        void return_value(ResultType value) { result = std::move(value); }
        void unhandled_exception() { exception = std::current_exception(); }
    };

    // Block the calling thread until the coroutine finishes, then return result.
    ResultType get() {
        auto& p = handle_.promise();
        std::unique_lock lock(p.completion_mutex);
        p.completion_cv.wait(lock, [&p] { return p.completed; });
        if (p.exception) std::rethrow_exception(p.exception);
        return std::move(*p.result);
    }

    explicit Task(std::coroutine_handle<promise_type> h) : handle_(h) {}

    // A defaulted move would leave both objects holding the same handle and
    // cause a double destroy — transfer ownership explicitly instead.
    Task(Task&& other) noexcept : handle_(std::exchange(other.handle_, nullptr)) {}
    Task(const Task&) = delete;

    ~Task() {
        if (handle_) handle_.destroy();
    }

private:
    std::coroutine_handle<promise_type> handle_;
};

// ---------------------------------------------------------------------------
// A coroutine that hops to the thread pool, does 'heavy' work there, then
// returns a computed result. Notice it reads exactly like sync code.
// ---------------------------------------------------------------------------
Task<int> compute_on_pool(int input_value) {
    std::cout << "[coroutine] starting on thread "
              << std::this_thread::get_id() << '\n';

    // Hand off to the pool — execution resumes on a pool thread.
    co_await TransferToPool{global_pool};

    std::cout << "[coroutine] now running on pool thread "
              << std::this_thread::get_id() << '\n';

    // Simulate expensive computation (database query, file I/O, etc.)
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    int computed_result = input_value * input_value + 1;

    co_return computed_result;  // stored in promise.result, read via .get()
}

int main() {
    std::cout << "[main] thread id: " << std::this_thread::get_id() << '\n';

    Task<int> task = compute_on_pool(7);
    std::cout << "[main] task launched, doing other work while we wait...\n";

    int result = task.get();  // blocks until coroutine finishes
    std::cout << "[main] result: " << result << '\n';  // 7*7+1 = 50
    return 0;
}
```
```
[main] thread id: 140234567890112
[coroutine] starting on thread 140234567890112
[main] task launched, doing other work while we wait...
[coroutine] now running on pool thread 140234512345678
[main] result: 50
```
Production Gotchas: Allocator Elision, Dangling References, and the Rule of One Owner
The coroutine frame is heap-allocated by default via operator new. For hot paths — tight loops, high-frequency async operations — this matters. The good news: the compiler can elide the heap allocation entirely (Heap Allocation eLision Optimisation, HALO) when it can prove the coroutine's lifetime is contained within the caller's frame. This happens automatically when you don't store the handle externally and the optimiser can see both frames. Check your disassembly with -O2 before assuming allocation overhead.
The most insidious production bug is the dangling reference inside a coroutine frame. Parameters passed by value are safe: the compiler copies (or moves) them into the frame before the coroutine first suspends. But a reference, pointer, string_view, or span parameter stores only the reference in the frame — the bytes it points at stay in the caller's stack frame, and that caller may have returned by the time the coroutine resumes on a different thread. This is the same bug as returning a pointer to a local variable, but harder to spot because the coroutine call looks synchronous.
The other sharp edge is exception safety at the coroutine's boundaries. You might be tempted to throw from final_suspend() — you can't: the standard requires final_suspend to be noexcept, and an exception escaping the final-suspend machinery goes straight to std::terminate, never to your unhandled_exception hook. unhandled_exception only catches exceptions thrown from the coroutine body itself. This catches people off guard because unhandled_exception feels like a universal safety net — it isn't.
```cpp
#include <coroutine>
#include <iostream>
#include <string>
#include <string_view>

// ---------------------------------------------------------------------------
// Demonstrating the most common coroutine UB in production code:
// passing a reference to a temporary into a coroutine that suspends.
// ---------------------------------------------------------------------------
struct SimpleTask {
    struct promise_type {
        SimpleTask get_return_object() {
            return SimpleTask{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() noexcept {}
        void unhandled_exception() {
            std::rethrow_exception(std::current_exception());
        }
    };

    std::coroutine_handle<promise_type> handle;

    explicit SimpleTask(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~SimpleTask() {
        if (handle) handle.destroy();
    }
    void resume() {
        if (handle && !handle.done()) handle.resume();
    }
};

// ---------------------------------------------------------------------------
// WRONG: accepts string_view — a non-owning reference.
// If the string it views is destroyed before this coroutine resumes, BOOM.
// ---------------------------------------------------------------------------
SimpleTask process_message_WRONG(std::string_view message) {
    // The coroutine suspends here at initial_suspend().
    // 'message' is a string_view — it points into memory we don't own.
    co_await std::suspend_always{};  // suspend point

    // By the time we resume, the caller's temporary std::string may be gone.
    std::cout << "[WRONG] message: " << message << '\n';  // potential UB!
    co_return;
}

// ---------------------------------------------------------------------------
// RIGHT: accept by value — the string is copied into the coroutine frame.
// The frame owns the data for its entire lifetime.
// ---------------------------------------------------------------------------
SimpleTask process_message_CORRECT(std::string message) {
    co_await std::suspend_always{};  // safe to suspend: frame owns 'message'
    std::cout << "[CORRECT] message: " << message << '\n';
    co_return;
}

void demonstrate_dangling_reference() {
    SimpleTask bad_task = [] {
        std::string temporary_message = "hello from temporary";
        // string_view points into 'temporary_message' on this lambda's stack.
        return process_message_WRONG(temporary_message);
        // temporary_message is destroyed here — before the coroutine resumes!
    }();
    std::cout << "[caller] temporary_message is now out of scope\n";
    bad_task.resume();  // string_view now dangles — undefined behaviour
}

void demonstrate_safe_ownership() {
    SimpleTask good_task = [] {
        std::string source_message = "hello, safely owned";
        // std::string is copied by value into the coroutine frame.
        return process_message_CORRECT(source_message);
        // source_message destroyed here — but frame has its own copy, so fine.
    }();
    std::cout << "[caller] source string is out of scope, frame copy is safe\n";
    good_task.resume();  // perfectly safe
}

int main() {
    // demonstrate_dangling_reference();  // DO NOT run — UB, illustration only
    std::cout << "Dangling reference demo skipped (undefined behaviour)\n";
    demonstrate_safe_ownership();
    return 0;
}
```
```
Dangling reference demo skipped (undefined behaviour)
[caller] source string is out of scope, frame copy is safe
[CORRECT] message: hello, safely owned
```
| Aspect | co_yield (Generator) | co_await (Async Task) |
|---|---|---|
| Primary use case | Lazy sequences, pipelines, ranges | Async I/O, concurrency, thread handoff |
| Suspension trigger | Always suspends on every yield | Awaitable decides at runtime (await_ready) |
| Value flow direction | Coroutine → caller (producer) | Awaitable → coroutine (result injection) |
| Promise hook used | yield_value(T) | No special hook; awaitable protocol |
| Typical initial_suspend | suspend_always (lazy start) | suspend_never (eager start) |
| Thread safety concern | Single-threaded pull model | Must protect shared state on resume |
| Exception propagation | Rethrow in unhandled_exception | Store in promise, rethrow in await_resume |
| Stack growth risk | None — single resume chain | Symmetric transfer needed for deep chains |
| HALO elision possible? | Often yes (tight iteration loops) | Less likely — handle escapes to scheduler |
🎯 Key Takeaways
- The coroutine frame is heap-allocated (unless HALO elides the allocation) and contains all locals that cross a suspension point — its lifetime is controlled entirely by your promise_type and RAII wrapper, not automatically by the compiler.
- co_yield is syntactic sugar for co_await promise.yield_value(expr) — understanding this unifies generators and async tasks into one mental model: everything is an awaitable.
- Pass-by-value for all coroutine parameters that cross a suspension point — string_view, span, and raw references pointing to caller-stack data are time bombs the compiler will not warn you about.
- Symmetric transfer (returning a coroutine_handle from await_suspend) is not a micro-optimisation — it is how you prevent call-stack overflow in deeply-chained async coroutines and is essential in any production executor.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Storing a string_view, span, or reference as a coroutine parameter — The coroutine compiles and runs fine on small tests where the referred-to memory happens to still be alive, then silently corrupts memory in production when the caller's stack frame is gone by resume time — Fix it by taking all coroutine parameters by value if the function suspends; the standard guarantees value parameters are copied into the heap-allocated frame before initial_suspend fires.
- ✕ Mistake 2: Forgetting to call handle.destroy() when final_suspend returns suspend_always — The frame never deallocates, producing a slow memory leak that only shows up under load; Valgrind and AddressSanitizer will both report it as a definite leak — Fix it with RAII: always implement a destructor in your Task wrapper that calls handle.destroy() if the handle is non-null, following the same pattern as std::unique_ptr.
- ✕ Mistake 3: Throwing or performing non-trivial work inside final_suspend — The standard mandates that final_suspend is noexcept; any exception that escapes it calls std::terminate immediately, bypassing unhandled_exception entirely — Fix it by keeping final_suspend a pure signalling operation (notify a condition variable, set an atomic flag, or return a handle for symmetric transfer) and doing all cleanup before the final co_return.
Interview Questions on This Topic
- Q: Explain the three-method awaitable protocol (await_ready, await_suspend, await_resume). When would you return false from await_ready and what are the performance implications of doing so unconditionally?
- Q: What is symmetric transfer in coroutines, what problem does it solve, and how does returning a coroutine_handle from await_suspend differ mechanically from calling handle.resume() directly inside await_suspend?
- Q: If final_suspend returns suspend_always and the coroutine frame is destroyed by the Task destructor, but there's also a coroutine_handle copy held somewhere else, what happens when that second handle is resumed or destroyed — and what pattern prevents this class of bug?
Frequently Asked Questions
Do C++20 coroutines require a runtime or a scheduler?
No. C++20 provides only the language mechanism — the three keywords and the customisation-point protocol. There is no built-in scheduler, thread pool, or event loop. You either provide your own (as shown in the ThreadPool example) or use a library like cppcoro, libunifex, or ASIO that does so. This is intentional: the zero-overhead principle means you only pay for what you use.
What is the difference between co_return and a regular return in a coroutine?
A regular return statement inside a coroutine is a compile error — the compiler rejects it. You must use co_return. Under the hood, co_return expr calls promise.return_value(expr) (or promise.return_void() for co_return with no argument) and then falls through to the final_suspend() call. It does not immediately destroy the frame; final_suspend decides that.
Can I use std::generator from C++23 instead of writing my own Generator type?
Yes — C++23 standardises std::generator<T> in the <generator> header, and it should be your default once your toolchain ships it. It eliminates the boilerplate shown above, gets ownership and move semantics right, and supports yielding nested generators efficiently via std::ranges::elements_of. Writing your own Generator remains worthwhile for understanding the machinery — and for codebases pinned to C++20.