
C++20 Coroutines — Fixing Double-Free Handle Crashes

📍 Part of: C++ Advanced → Topic 13 of 18
Copying a coroutine_handle causes double-free crashes under load.
🔥 Advanced — solid C / C++ foundation required
In this tutorial, you'll learn
  • The coroutine frame is normally heap-allocated and holds every local that lives across a suspension point. Its lifetime is controlled by your promise_type and RAII wrapper, not by the compiler.
  • co_yield is syntactic sugar for co_await promise.yield_value(expr) — understanding this unifies generators and async tasks into one mental model: everything is an awaitable.
  • Pass-by-value for all coroutine parameters that cross a suspension point — string_view, span, and raw references pointing to caller-stack data are time bombs the compiler will not warn you about.
Quick Answer
  • Coroutines let a function suspend and resume later without blocking a thread
  • The compiler rewrites the function into a heap-allocated frame with a promise object
  • co_yield suspends whenever yield_value() returns suspend_always (the usual generator setup); co_await suspends only if await_ready() returns false
  • Frame lifetime is your responsibility — RAII wrappers prevent leaks and double-frees
  • Symmetric transfer (returning a handle from await_suspend) prevents stack overflow in chains
  • Production trap: string_view or reference parameters dangle after suspension
🚨 START HERE

Quick Debug Cheat Sheet: Coroutine Frame & Lifecycle

Five-minute fixes for the most common coroutine problems seen in production C++20 code.
🟡

Coroutine memory leak

Immediate Action: Check the final_suspend() return type. If it is suspend_always, make sure handle.destroy() is called in the wrapper's destructor.
Commands
grep -R 'final_suspend' src/ | grep suspend_always
valgrind --leak-check=full ./your_app 2>&1 | grep 'definitely lost'
Fix Now: Add a destructor to your wrapper that calls if (handle_) handle_.destroy();
🟡

Double-free crash after copying task object

Immediate Action: Delete the copy constructor and copy assignment operator immediately. Implement move semantics.
Commands
grep -rnE 'Task\(const Task&|operator=\(const Task&' src/   (the first pattern finds a copy constructor, the second a copy assignment operator; delete both)
git diff HEAD --name-only | xargs grep -l 'coroutine_handle'
Fix Now: Add Task(const Task&) = delete; Task& operator=(const Task&) = delete;
🟡

Coroutine resumes with wrong or corrupted data

Immediate Action: Suspect dangling references. Change all coroutine parameters that cross suspension points to pass-by-value.
Commands
Compile with -fsanitize=address and run your tests.
If still not caught, enable -fsanitize=undefined -D_GLIBCXX_DEBUG
Fix Now: Replace string_view with std::string, and raw pointers with unique_ptr/shared_ptr, in coroutine signatures.
🟡

Stack overflow in deeply chained coroutines

Immediate Action: Check whether await_suspend calls handle.resume() directly. If it does, switch to symmetric transfer.
Commands
grep -r 'handle.resume()' src/ | grep await_suspend
If found, change return type from void to std::coroutine_handle<> and return the next handle to resume.
Fix Now: Implement std::coroutine_handle<> await_suspend(auto handle) { return next_handle; }
Production Incident

Double-Free Disaster After Copying a Coroutine Handle

A seemingly innocent copy of your Task wrapper causes a crash that only happens under heavy load. The root cause? The coroutine handle is a raw pointer, and two destructors called destroy() on the same frame.
Symptom: Intermittent segfault or heap corruption after passing a Task by value into a function. Under light load the crash rarely reproduces; under stress it becomes deterministic.
Assumption: The Task wrapper behaves like a std::shared_ptr, so copying just increments a ref count. But there's no ref count. The handle is a bare pointer.
Root cause: The coroutine_handle is a pointer-sized value that does not own the frame. When you copy the Task, both copies hold the same handle. The first destructor to run destroys the frame; the second one calls destroy() on an already-freed pointer, which is undefined behaviour.
Fix: Delete the copy constructor and copy assignment operator in your coroutine wrapper. Implement move semantics using std::exchange to null out the source handle. The moved-from object's destructor must be a no-op.
Key Lesson
  • A coroutine_handle is not an owning pointer. Treat it like a raw resource that must have exactly one owner.
  • Always enforce the Rule of Five: implement the move constructor, move assignment, and destructor; delete the copy operations.
  • Enable AddressSanitizer with -fsanitize=address during development. It reports a double-free at the moment it happens.
Production Debug Guide

Symptom → Action table for the most common coroutine failures you'll encounter in the field.

Coroutine crashes on resume with heap-use-after-free → Check whether the coroutine frame was destroyed before the last resume. Look for a missing RAII wrapper or a premature handle.destroy(). Build with AddressSanitizer: -fsanitize=address.
Memory grows monotonically over time → Check whether final_suspend() returns suspend_always while the frame is never destroyed. Confirm your wrapper's destructor calls handle.destroy() whenever it still owns a handle, not only when done() is true. Also look for coroutine handles that are stored but never resumed or destroyed.
Coroutine behaves correctly in debug but fails in release → Suspect undefined behaviour from a stale reference or pointer in the frame. Check all parameters that refer to caller-owned data (references, string_view, span, raw pointers). Change them to pass-by-value. Enable UBSan (-fsanitize=undefined) to catch misuse.
Call stack grows without bound and eventually crashes → You're calling handle.resume() from within await_suspend instead of returning a handle for symmetric transfer. Refactor to return std::coroutine_handle<> from await_suspend to avoid nested resume calls.

Async programming in C++ has historically been a war zone. You either wrestled with raw threads and mutexes, chained together std::future callbacks until the code looked like spaghetti, or reached for a third-party library like Boost.Asio and learned its entire universe of abstractions before writing a single useful line. The fundamental problem was that the language had no native concept of 'pause and resume' — every async pattern was bolted on top of a model that was never designed for it.

C++20 coroutines change that at the language level. They give the compiler a first-class way to transform an ordinary-looking function into a state machine that can suspend and resume without blocking a thread. The magic is opt-in and zero-cost when you don't use it: there's no runtime, no garbage collector, no hidden scheduler. What you get instead is a set of three keywords — co_await, co_yield, and co_return — plus a precise protocol of customisation points that lets library authors (and you) define exactly what 'suspend' and 'resume' mean for your use case.

By the end of this article you'll understand how the coroutine frame is laid out in memory, how the promise_type protocol drives the entire lifecycle, how to build a lazy generator and a simple async task from scratch, and — critically — which production traps will silently corrupt your program if you don't know they're there. This isn't a hello-world tour. It's the article you read before you ship coroutine code to production.

How the Compiler Transforms a Coroutine: The Frame and the Promise

When the compiler sees co_await, co_yield, or co_return inside a function, it quietly rewrites that function into something unrecognisable. The original function body becomes a state machine. All local variables that need to survive a suspension point are hoisted into a heap-allocated coroutine frame. The frame also contains a promise object — the central hub that controls what the coroutine returns to its caller, what happens on suspension, and whether the coroutine suspends immediately when called or runs to the first suspension point first.

The coroutine handle (std::coroutine_handle<PromiseType>) is a lightweight pointer-sized value that represents a suspended coroutine. You can copy it, store it, pass it across threads, and — crucially — call .resume() on it from anywhere. The handle owns nothing: it's just an address. Ownership of the frame is a design decision you make through the promise.

The lifecycle goes: caller invokes coroutine → frame is allocated → promise.get_return_object() is called to produce the return value → initial_suspend() decides whether to suspend immediately → body runs until a suspension point → final_suspend() decides whether to suspend before the frame is destroyed. Leave out a required hook such as get_return_object() or final_suspend() and the compiler rejects the code. The dangerous mistakes are the ones it accepts: flowing off the end of a coroutine whose promise has no return_void(), or resuming or destroying a frame that is already gone, is undefined behaviour. That's what makes coroutines both powerful and dangerous.

CoroutineFrameInspection.cpp · CPP
#include <coroutine>
#include <exception>   // std::current_exception, std::rethrow_exception
#include <iostream>

// ---------------------------------------------------------------------------
// A minimal coroutine that lets us inspect exactly when each lifecycle hook
// fires.  Nothing is hidden — every promise hook prints its name.
// ---------------------------------------------------------------------------

struct InspectableTask {
    // Every coroutine return type must have a nested promise_type.
    struct promise_type {
        // Called first: produces the object returned to the caller.
        InspectableTask get_return_object() {
            std::cout << "[promise] get_return_object()\n";
            return InspectableTask{
                std::coroutine_handle<promise_type>::from_promise(*this)
            };
        }

        // Returning suspend_always means: don't run any of the body yet.
        // The caller gets control back immediately after construction.
        std::suspend_always initial_suspend() noexcept {
            std::cout << "[promise] initial_suspend() — coroutine is lazy\n";
            return {};
        }

        // Returning suspend_always at the end keeps the frame alive so the
        // caller can inspect results.  Returning suspend_never destroys it.
        std::suspend_always final_suspend() noexcept {
            std::cout << "[promise] final_suspend() — frame still alive\n";
            return {};
        }

        // co_return with no value lands here.
        void return_void() {
            std::cout << "[promise] return_void()\n";
        }

        // Any exception that escapes the coroutine body arrives here.
        void unhandled_exception() {
            std::cout << "[promise] unhandled_exception() — rethrowing\n";
            std::rethrow_exception(std::current_exception());
        }
    };

    // The coroutine handle is the only data member — pointer-sized.
    std::coroutine_handle<promise_type> handle;

    explicit InspectableTask(std::coroutine_handle<promise_type> h)
        : handle(h) {}

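    // We own the frame: copying is forbidden, moving transfers ownership, and
    // the destructor destroys the frame whether or not the body has finished.
    InspectableTask(const InspectableTask&) = delete;
    InspectableTask& operator=(const InspectableTask&) = delete;
    InspectableTask(InspectableTask&& other) noexcept : handle(other.handle) {
        other.handle = nullptr;
    }

    ~InspectableTask() {
        if (handle) {
            std::cout << (handle.done() ? "[task] destroying completed frame\n"
                                        : "[task] destroying suspended frame\n");
            handle.destroy();
        }
    }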

    // Let the caller drive execution one step at a time.
    void resume() {
        if (handle && !handle.done()) {
            handle.resume();
        }
    }

    bool is_done() const { return handle.done(); }
};

// ---------------------------------------------------------------------------
// The coroutine itself — looks like a normal function, but the compiler
// rewrites it completely because it contains co_return.
// ---------------------------------------------------------------------------
InspectableTask demonstrate_lifecycle() {
    std::cout << "[body] coroutine body — step 1\n";
    // co_await suspend_always suspends right here; caller gets control back.
    co_await std::suspend_always{};
    std::cout << "[body] coroutine body — step 2 (resumed)\n";
    co_return;  // triggers return_void(), then final_suspend()
}

int main() {
    std::cout << "--- calling coroutine ---\n";
    // Because initial_suspend returns suspend_always, the body hasn't run yet.
    InspectableTask task = demonstrate_lifecycle();

    std::cout << "--- first resume ---\n";
    task.resume();  // runs until the co_await suspend_always inside the body

    std::cout << "--- second resume ---\n";
    task.resume();  // runs to co_return, then final_suspend

    std::cout << "--- done: " << std::boolalpha << task.is_done() << " ---\n";
    return 0;
}
▶ Output
--- calling coroutine ---
[promise] get_return_object()
[promise] initial_suspend() — coroutine is lazy
--- first resume ---
[body] coroutine body — step 1
--- second resume ---
[body] coroutine body — step 2 (resumed)
[promise] return_void()
[promise] final_suspend() — frame still alive
--- done: true ---
[task] destroying completed frame
⚠ Watch Out: Frame Lifetime Is Your Responsibility
If final_suspend() returns suspend_always (frame stays alive) but you never call handle.destroy(), you leak memory. If final_suspend() returns suspend_never (frame auto-destroys) but you call handle.destroy() anyway, you get a double-free. Pick one ownership model and enforce it rigidly — preferably through RAII in your task type's destructor, exactly as shown above.
📊 Production Insight
The coroutine frame is normally heap-allocated, and that allocation is often the dominant cost. Suspending and resuming is itself about as cheap as a function call; it only becomes expensive when resumption involves a thread handoff. Measure with perf stat.
The frame is allocated when the coroutine is called, before initial_suspend() runs, whichever awaitable it returns. suspend_always defers only the start of the body, not the allocation.
Rule: if your coroutine never suspends, you pay for the frame anyway — only use coroutines when at least one suspension is likely.
🎯 Key Takeaway
The frame allocates on the heap; handle.destroy() must be called once.
If you miss the destroy, you leak. If you double-destroy, you crash.
RAII wrappers with deleted copy semantics are the only safe path.

Building a Lazy Generator with co_yield: Infinite Sequences Without the Bloat

A generator is the cleanest demonstration of why coroutines exist. Before C++20, producing a lazily-evaluated sequence meant either returning a fully-materialised container (bad for large or infinite sequences), writing a hand-rolled iterator with boilerplate state, or pulling in a library. With co_yield, a coroutine can produce one value, suspend, and wait to be asked for the next — exactly like Python's yield.

The promise_type for a generator needs one extra hook: yield_value(). When the compiler sees co_yield expr, it rewrites it as co_await promise.yield_value(expr). Your yield_value() stores the value somewhere the caller can read it and returns an awaitable that suspends the coroutine. The caller then calls .next() or advances an iterator, reads the stored value, and resumes.

The performance story is compelling. For a sequence of N items, a generator allocates exactly one coroutine frame (one heap allocation) regardless of N. A vector<int> of N items allocates proportionally to N. For an infinite sequence — like a Fibonacci stream — the vector approach is simply impossible. The generator frame is also cache-friendly because local variables live contiguously inside it.

One subtlety: generator coroutines are inherently single-threaded pull-based structures. Don't try to resume them from multiple threads — there's no synchronisation inside the frame.

LazyFibonacciGenerator.cpp · CPP
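#include <coroutine>
#include <cstdint>
#include <exception>   // std::current_exception, std::rethrow_exception
#include <iostream>
#include <iterator>    // std::default_sentinel_t
#include <optional>
#include <utility>     // std::exchange, std::move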

// ---------------------------------------------------------------------------
// A reusable Generator<T> type.  Produces values lazily on demand.
// Ownership model: the Generator object owns the coroutine frame.
// ---------------------------------------------------------------------------
template<typename ValueType>
class Generator {
public:
    struct promise_type {
        std::optional<ValueType> current_value;

        Generator get_return_object() {
            return Generator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }

        // Suspend immediately — don't produce anything until the caller asks.
        std::suspend_always initial_suspend() noexcept { return {}; }

        // Suspend at the end so the caller can detect completion via done().
        std::suspend_always final_suspend() noexcept { return {}; }

        // co_yield value  →  promise.yield_value(value)  →  suspend
        std::suspend_always yield_value(ValueType value) noexcept {
            current_value = std::move(value);
            return {};  // suspending here passes control back to the caller
        }

        void return_void() noexcept { current_value = std::nullopt; }
        void unhandled_exception() { std::rethrow_exception(std::current_exception()); }
    };

    // -----------------------------------------------------------------------
    // A minimal single-pass iterator so range-for works: for (auto v : gen) {...}
    // -----------------------------------------------------------------------
    struct iterator {
        std::coroutine_handle<promise_type> handle;

        iterator& operator++() {
            handle.resume();  // ask the coroutine for the next value
            return *this;
        }

        const ValueType& operator*() const {
            return *handle.promise().current_value;
        }

        bool operator==(std::default_sentinel_t) const {
            return handle.done();
        }
    };

    iterator begin() {
        handle_.resume();  // prime the generator: run to the first co_yield
        return iterator{handle_};
    }

    std::default_sentinel_t end() { return {}; }

    explicit Generator(std::coroutine_handle<promise_type> h) : handle_(h) {}

    // Non-copyable: a coroutine frame must have exactly one owner.
    Generator(const Generator&) = delete;
    Generator& operator=(const Generator&) = delete;

    Generator(Generator&& other) noexcept
        : handle_(std::exchange(other.handle_, nullptr)) {}

    ~Generator() {
        if (handle_) handle_.destroy();
    }

private:
    std::coroutine_handle<promise_type> handle_;
};

// ---------------------------------------------------------------------------
// An infinite Fibonacci sequence — impossible to express as a plain vector.
// The coroutine frame keeps 'a' and 'b' alive across every suspension.
// ---------------------------------------------------------------------------
Generator<uint64_t> fibonacci_sequence() {
    uint64_t a = 0;
    uint64_t b = 1;
    while (true) {          // infinite loop is fine — we suspend each iteration
        co_yield a;
        uint64_t next = a + b;
        a = b;
        b = next;
    }
}

// ---------------------------------------------------------------------------
// A finite range generator for comparison — shows early termination works.
// ---------------------------------------------------------------------------
Generator<int> range(int start, int stop, int step = 1) {
    for (int i = start; i < stop; i += step) {
        co_yield i;
    }
    // co_return is implicit when execution falls off the end
}

int main() {
    std::cout << "First 10 Fibonacci numbers:\n";
    int count = 0;
    for (uint64_t fib : fibonacci_sequence()) {
        std::cout << fib << ' ';
        if (++count == 10) break;  // early exit destroys the generator safely
    }
    std::cout << '\n';

    std::cout << "\nEven numbers in [0, 20):\n";
    for (int v : range(0, 20, 2)) {
        std::cout << v << ' ';
    }
    std::cout << '\n';

    return 0;
}
▶ Output
First 10 Fibonacci numbers:
0 1 1 2 3 5 8 13 21 34

Even numbers in [0, 20):
0 2 4 6 8 10 12 14 16 18
💡Pro Tip: Make Your Generator Non-Copyable From Day One
A coroutine handle is just a raw pointer under the hood. If you accidentally copy your Generator wrapper, both copies hold the same handle, and both destructors will call handle.destroy() — classic double-free. Explicitly delete the copy constructor and copy assignment operator, and implement move semantics using std::exchange(other.handle_, nullptr) so the moved-from object's destructor is a no-op.
📊 Production Insight
Generators pull values on demand — resuming from multiple threads races on the frame's state. Never do it.
The frame stays alive until the generator is destroyed. Breaking early from a range-for loop triggers destruction.
Rule: if you need multi-threaded iteration, use a channel (e.g., std::queue + mutex) instead of a generator.
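
As a rough idea of what such a channel could look like, here is a minimal sketch built from std::queue, std::mutex and std::condition_variable. The Channel name and its push/pop/close interface are assumptions made for this example, not an existing library type.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical unbounded multi-producer, multi-consumer channel (illustration only).
template <typename T>
class Channel {
public:
    void push(T value) {
        {
            std::lock_guard lock(mutex_);
            assert(!closed_ && "push after close");
            queue_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // After close(), pop() drains what is left and then returns nullopt.
    void close() {
        {
            std::lock_guard lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a value is available or the channel is closed and empty.
    std::optional<T> pop() {
        std::unique_lock lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable ready_;
    bool closed_ = false;
};
```

A single producer thread can drive the generator and push each value, while any number of consumer threads call pop(). The generator itself is still only ever resumed from one thread.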
🎯 Key Takeaway
co_yield = promise.yield_value(value) + suspend.
One frame per generator, regardless of how many values produced.
Non-copyable, move-only: enforce ownership from day one.

Writing an Async Task with co_await: Custom Awaitables and Thread Handoff

co_yield is the easy case: in the generator above it always suspends, because yield_value() returns suspend_always. co_await is more nuanced because the awaitable you pass it gets to decide at runtime whether to suspend at all. When you write co_await expr, the compiler calls three methods on the awaitable: await_ready() (should we skip suspension entirely?), await_suspend(handle) (what do we do with this suspended coroutine?), and await_resume() (what value does the co_await expression produce when resumed?).

This three-method protocol is the extensibility seam that makes C++20 coroutines genuinely powerful. await_suspend receives the coroutine's own handle, so it can post the handle to a thread pool, store it in an event loop, attach it to an I/O completion port — anything. The coroutine is just data at that point. The scheduler decides when to call handle.resume().

For a real async task you need two more things: a way to propagate exceptions (store them in the promise, rethrow in await_resume), and a way to return a value from co_return (store it in the promise, retrieve it via get()). The example below builds a Task<T> that runs on a simulated thread pool — small enough to read in one sitting, but complete enough that you could adapt it for production use with a real executor.

AsyncTaskWithThreadHandoff.cpp · CPP
#include <chrono>
#include <condition_variable>
#include <coroutine>
#include <cstddef>
#include <exception>
#include <functional>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

// ---------------------------------------------------------------------------
// A minimal thread pool that accepts work items (std::function<void()>).
// In production you'd use a battle-tested library, but this makes the
// coroutine-scheduler interaction completely transparent.
// ---------------------------------------------------------------------------
class ThreadPool {
public:
    explicit ThreadPool(size_t thread_count) {
        for (size_t i = 0; i < thread_count; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    ~ThreadPool() {
        {
            std::unique_lock lock(mutex_);
            shutdown_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    void post(std::function<void()> task) {
        {
            std::unique_lock lock(mutex_);
            work_queue_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void worker_loop() {
        while (true) {
            std::function<void()> task;
            {
                std::unique_lock lock(mutex_);
                cv_.wait(lock, [this] { return !work_queue_.empty() || shutdown_; });
                if (shutdown_ && work_queue_.empty()) return;
                task = std::move(work_queue_.front());
                work_queue_.pop();
            }
            task();  // execute the work item — could be handle.resume()
        }
    }

    std::vector<std::thread>          workers_;
    std::queue<std::function<void()>> work_queue_;
    std::mutex                        mutex_;
    std::condition_variable           cv_;
    bool                              shutdown_ = false;
};

// Global pool — in real code inject this via a scheduler abstraction.
ThreadPool global_pool{2};

// ---------------------------------------------------------------------------
// A custom awaitable that transfers the coroutine to the thread pool.
// This is the key pattern: await_suspend posts handle.resume() as a task.
// ---------------------------------------------------------------------------
struct TransferToPool {
    ThreadPool& pool;

    // Never skip the suspension — we always want a thread-hop.
    bool await_ready() const noexcept { return false; }

    // Store the coroutine handle in the pool's work queue.
    // When a pool thread picks it up, it calls handle.resume().
    void await_suspend(std::coroutine_handle<> handle) const {
        pool.post([handle]() mutable { handle.resume(); });
    }

    // No value produced by this await expression — it's purely a scheduling op.
    void await_resume() const noexcept {}
};

// ---------------------------------------------------------------------------
// Task<T>: a coroutine return type that carries a result (or exception).
// The caller synchronises with a condition variable via .get().
// ---------------------------------------------------------------------------
template<typename ResultType>
class Task {
public:
    struct promise_type {
        std::optional<ResultType>    result;
        std::exception_ptr           exception;
        std::mutex                   completion_mutex;
        std::condition_variable      completion_cv;
        bool                         completed = false;

        Task get_return_object() {
            return Task{std::coroutine_handle<promise_type>::from_promise(*this)};
        }

        // Run the body immediately — no lazy start for tasks.
        std::suspend_never initial_suspend() noexcept { return {}; }

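        // Suspend at the end so the promise (and its result) stays alive
        // until the Task wrapper reads the value in .get().  Completion is
        // signalled from await_suspend, where the coroutine already counts as
        // suspended, so a waiter that wakes up may safely destroy the frame.
        struct FinalAwaiter {
            bool await_ready() const noexcept { return false; }
            void await_suspend(std::coroutine_handle<promise_type> h) const noexcept {
                auto& p = h.promise();
                std::unique_lock lock(p.completion_mutex);
                p.completed = true;
                p.completion_cv.notify_all();  // under the lock: wakes anyone blocked in .get()
            }
            void await_resume() const noexcept {}
        };
        FinalAwaiter final_suspend() noexcept { return {}; }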

        void return_value(ResultType value) {
            result = std::move(value);
        }

        void unhandled_exception() {
            exception = std::current_exception();
        }
    };

    // Block the calling thread until the coroutine finishes, then return result.
    ResultType get() {
        auto& p = handle_.promise();
        std::unique_lock lock(p.completion_mutex);
        p.completion_cv.wait(lock, [&p] { return p.completed; });

        if (p.exception) std::rethrow_exception(p.exception);
        return std::move(*p.result);
    }

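    explicit Task(std::coroutine_handle<promise_type> h) : handle_(h) {}

    // Move-only.  A defaulted move would copy the handle and leave two owners,
    // so the source is nulled explicitly (same effect as std::exchange).
    Task(Task&& other) noexcept : handle_(other.handle_) { other.handle_ = nullptr; }
    Task(const Task&) = delete;
    Task& operator=(const Task&) = delete;

    ~Task() { if (handle_) handle_.destroy(); }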

private:
    std::coroutine_handle<promise_type> handle_;
};

// ---------------------------------------------------------------------------
// A coroutine that starts on the caller's thread, hops to the thread pool,
// does 'heavy' work there, then returns a computed result.  It never hops
// back: the caller collects the result with .get().  It reads like sync code.
// ---------------------------------------------------------------------------
Task<int> compute_on_pool(int input_value) {
    std::cout << "[coroutine] starting on thread "
              << std::this_thread::get_id() << '\n';

    // Hand off to the pool — execution resumes on a pool thread.
    co_await TransferToPool{global_pool};

    std::cout << "[coroutine] now running on pool thread "
              << std::this_thread::get_id() << '\n';

    // Simulate expensive computation (database query, file I/O, etc.)
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    int computed_result = input_value * input_value + 1;

    co_return computed_result;  // stored in promise.result, caller reads via .get()
}

int main() {
    std::cout << "[main] thread id: " << std::this_thread::get_id() << '\n';

    Task<int> task = compute_on_pool(7);

    std::cout << "[main] task launched, doing other work while we wait...\n";

    int result = task.get();  // blocks until coroutine finishes
    std::cout << "[main] result: " << result << '\n';  // 7*7+1 = 50

    return 0;
}
▶ Output
[main] thread id: 140234567890112
[coroutine] starting on thread 140234567890112
[main] task launched, doing other work while we wait...
[coroutine] now running on pool thread 140234512345678
[main] result: 50
🔥Interview Gold: Why Does await_suspend Return void Here?
await_suspend can return void (always suspend), bool (suspend conditionally — false means don't suspend), or std::coroutine_handle<> (symmetric transfer — immediately resume a different coroutine without growing the call stack). The symmetric transfer return type is critical for avoiding stack overflow in recursive coroutine chains, like coroutine A awaiting coroutine B awaiting coroutine C. Always prefer symmetric transfer in deeply-chained async pipelines.
📊 Production Insight
The thread pool's post() captures the handle by copy, which is fine because it's just a pointer. But if the Task (and with it the frame) is destroyed before the pool thread runs the queued resume(), the captured handle dangles.
Your coroutine's await_suspend runs on the calling thread. If it blocks (e.g., lock acquisition), you delay the caller.
Rule: keep await_suspend non-blocking and noexcept. If you must allocate, pre-allocate or use a pool.
🎯 Key Takeaway
co_await: if await_ready() returns false, await_suspend() runs; await_resume() then produces the result either way.
await_suspend can transfer the handle to another thread without blocking.
Symmetric transfer (returning a handle) stops stack growth in chains.

Symmetric Transfer: Preventing Stack Overflow in Deeply Chained Coroutines

When coroutine A awaits coroutine B, the naive implementation stores A's handle as B's continuation and calls A's resume() from within B's final suspension. That resume() call is nested inside B's stack frames, and if A then awaits C the same thing happens again, so the call stack grows with every completed await. In a long loop or a deep chain it eventually exhausts the stack.

The solution is symmetric transfer. Instead of calling handle.resume() from within await_suspend, you return the handle of the coroutine that should run next. That handle is designed to be resumed as a tail call, so control transfers directly without adding to the call stack. The return type of await_suspend becomes std::coroutine_handle<>.

This is not a micro-optimisation. It's required for any production coroutine executor that chains tasks, because real workloads create chains of arbitrary depth. Most production libraries (libunifex, cppcoro) use symmetric transfer internally. The example below shows a simple scheduler that chains two coroutines without stack growth.

SymmetricTransferExample.cpp · CPP
#include <coroutine>
#include <iostream>
#include <deque>

// ---------------------------------------------------------------------------
// A minimal trampoline-based executor that resumes coroutines sequentially.
// Uses symmetric transfer to avoid stack growth across coroutine boundaries.
// ---------------------------------------------------------------------------

struct Executor {
    std::deque<std::coroutine_handle<>> ready_queue;

    void schedule(std::coroutine_handle<> h) {
        ready_queue.push_back(h);
    }

    // Run all scheduled coroutines until the queue is empty.
    // Because we use symmetric transfer, this single loop can handle
    // arbitrarily deep chains without overflowing the stack.
    void run() {
        while (!ready_queue.empty()) {
            auto handle = ready_queue.front();
            ready_queue.pop_front();
            handle.resume();  // returns once the chain it started has suspended
            // Links hand off to each other through the handle returned from
            // await_suspend, so nothing is re-queued here.
        }
    }
};

// ---------------------------------------------------------------------------
// An awaitable that chains to the next coroutine via symmetric transfer.
// ---------------------------------------------------------------------------
struct ChainAwaitable {
    std::coroutine_handle<> next;

    bool await_ready() const noexcept { return false; }

    // Return the handle to resume next — this is symmetric transfer.
    std::coroutine_handle<> await_suspend(std::coroutine_handle<> current) noexcept {
        // 'next' is the coroutine that should run after this one suspends.
        // The compiler resumes 'next' as a tail call once 'current' is
        // suspended, so no resume() frame is nested inside 'current'.
        return next;
    }

    void await_resume() const noexcept {}
};

// ---------------------------------------------------------------------------
// A simple task that executes a chain and prints its depth.
// ---------------------------------------------------------------------------
struct ChainTask {
    struct promise_type {
        Executor* ex = nullptr;  // initialised so get_return_object never reads garbage

        ChainTask get_return_object() {
            return ChainTask{std::coroutine_handle<promise_type>::from_promise(*this), ex};
        }

        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() noexcept {}
        void unhandled_exception() { std::rethrow_exception(std::current_exception()); }
    };

    std::coroutine_handle<promise_type> handle;
    Executor* executor;

    ChainTask(std::coroutine_handle<promise_type> h, Executor* ex)
        : handle(h), executor(ex) {}
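    // Single owner: copying the wrapper would destroy the same frame twice.
    ChainTask(const ChainTask&) = delete;
    ChainTask& operator=(const ChainTask&) = delete;
    ChainTask(ChainTask&& other) noexcept
        : handle(other.handle), executor(other.executor) { other.handle = nullptr; }
    ~ChainTask() { if (handle) handle.destroy(); }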

    void resume() { if (!handle.done()) handle.resume(); }
};

// ---------------------------------------------------------------------------
// A coroutine that chains to another coroutine via symmetric transfer.
// co_await ChainAwaitable{next_handle} transfers control to the next task.
// ---------------------------------------------------------------------------
ChainTask chain_step(int depth, Executor& ex, int max_depth) {
    std::cout << "Depth " << depth << "\n";

    if (depth < max_depth) {
        // Create the next coroutine in the chain.
        auto next = chain_step(depth + 1, ex, max_depth);
        // Suspend and transfer to 'next' via symmetric transfer.
        co_await ChainAwaitable{next.handle};
    }

    co_return;
}

int main() {
    Executor ex;

    // Start the chain at depth 0 with max depth 1000.
    auto first = chain_step(0, ex, 1000);

    // Schedule the first coroutine.
    ex.schedule(first.handle);

    // Run the executor — even with 1000 chained coroutines, stack stays bounded.
    ex.run();

    std::cout << "All done. No stack overflow!\n";
    return 0;
}
▶ Output
Depth 0
Depth 1
Depth 2
...
Depth 1000
All done. No stack overflow!
Mental Model
Symmetric Transfer as Tail Calls
Think of await_suspend returning a handle as a tail call: the current coroutine yields control, and the next one runs in its place without growing the stack.
  • Calling resume() inside a void await_suspend nests the resume: the stack grows O(n).
  • Returning a handle triggers symmetric transfer: the stack stays O(1).
  • The compiler resumes the returned handle as a tail call (a jump) instead of a nested call+ret sequence.
  • C++20's symmetric transfer is the coroutine equivalent of tail-call optimisation.
📊 Production Insight
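Without symmetric transfer, every link in a chain of co_awaits adds native stack frames, so a sufficiently deep chain overflows the stack. The depth that triggers it depends on the frame size per resume and on the stack limit: typically 8 MB for the main thread on Linux, 1 MB on Windows, and often less for worker threads. Unoptimised debug builds hit it sooner.
Production task types (libunifex, folly::coro) are built on symmetric transfer; an awaitable that calls resume() from a void await_suspend reintroduces the nesting.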
Rule: if your awaitable represents a continuation, make await_suspend return the next handle. It costs nothing and saves the stack.
🎯 Key Takeaway
Return a coroutine_handle from await_suspend for symmetric transfer.
void await_suspend nests calls and risks stack overflow.
Symmetric transfer is free and mandatory for chained async pipelines.

Production Gotchas: Allocation Elision, Dangling References, and the Rule of One Owner

The coroutine frame is heap-allocated by default via operator new. For hot paths — tight loops, high-frequency async operations — this matters. The good news: the compiler can elide the heap allocation entirely (Heap Allocation eLision Optimisation, HALO) when it can prove the coroutine's lifetime is contained within the caller's frame. This happens automatically when you don't store the handle externally and the optimiser can see both frames. Check your disassembly with -O2 before assuming allocation overhead.

The most insidious production bug is the dangling reference inside a coroutine frame. When a coroutine takes a parameter by value, say some_async_op(std::string msg) called with local_string, the compiler copies local_string into the frame. But if the parameter is a reference or pointer to a variable in the caller's frame, only the reference or pointer is stored, and the caller may have returned by the time the coroutine resumes, possibly on a different thread. This is the same bug as returning a pointer to a local variable, but harder to spot because the coroutine call looks synchronous.

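The other sharp edge is exception safety around the suspend hooks. final_suspend must be noexcept: the standard requires the final_suspend call and its awaiter's operations to be non-throwing, so anything that does throw there hits std::terminate rather than your unhandled_exception hook. initial_suspend is different but equally surprising: an exception thrown before the coroutine body starts (from the promise constructor, get_return_object, or the initial_suspend call itself) propagates to the caller of the coroutine, again bypassing unhandled_exception. This catches people off guard because unhandled_exception feels like a universal safety net. It isn't: it only covers the coroutine body.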

DanglingReferenceInCoroutine.cpp · CPP
#include <coroutine>
#include <exception>  // std::current_exception, std::rethrow_exception
#include <iostream>
#include <string>
#include <string_view>

// ---------------------------------------------------------------------------
// Demonstrating the most common coroutine UB in production code:
// passing a reference to a temporary into a coroutine that suspends.
// ---------------------------------------------------------------------------

struct SimpleTask {
    struct promise_type {
        SimpleTask get_return_object() {
            return SimpleTask{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() noexcept {}
        void unhandled_exception() { std::rethrow_exception(std::current_exception()); }
    };

    std::coroutine_handle<promise_type> handle;
    explicit SimpleTask(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~SimpleTask() { if (handle) handle.destroy(); }
    void resume() { if (handle && !handle.done()) handle.resume(); }
};

// ---------------------------------------------------------------------------
// WRONG: accepts string_view — a non-owning reference.
// If the string it views is destroyed before this coroutine resumes, BOOM.
// ---------------------------------------------------------------------------
SimpleTask process_message_WRONG(std::string_view message) {
    // The coroutine already suspended once at initial_suspend(), before this body ran.
    // 'message' is a string_view: it points into memory we don't own.
    co_await std::suspend_always{};  // second, explicit suspend point
    // By the time we resume, the caller's std::string may be gone.
    std::cout << "[WRONG] message: " << message << '\n';  // potential UB!
    co_return;
}

// ---------------------------------------------------------------------------
// RIGHT: accept by value — the string is copied into the coroutine frame.
// The frame owns the data for its entire lifetime.
// ---------------------------------------------------------------------------
SimpleTask process_message_CORRECT(std::string message) {
    co_await std::suspend_always{};  // safe to suspend: frame owns 'message'
    std::cout << "[CORRECT] message: " << message << '\n';
    co_return;
}

void demonstrate_dangling_reference() {
    SimpleTask bad_task = [&] {
        std::string temporary_message = "hello from temporary";
        // string_view points into 'temporary_message' on this lambda's stack.
        return process_message_WRONG(temporary_message);
        // temporary_message is destroyed here — before the coroutine resumes!
    }();

    std::cout << "[caller] temporary_message is now out of scope\n";
    bad_task.resume();  // leaves initial_suspend, runs to the explicit co_await
    bad_task.resume();  // reads 'message': the string_view dangles, undefined behaviour
}

void demonstrate_safe_ownership() {
    SimpleTask good_task = [&] {
        std::string source_message = "hello, safely owned";
        // std::string is copied by value into the coroutine frame.
        return process_message_CORRECT(source_message);
        // source_message destroyed here — but frame has its own copy, so fine.
    }();

    std::cout << "[caller] source string is out of scope, frame copy is safe\n";
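    good_task.resume();  // leaves initial_suspend, runs to the explicit co_await
    good_task.resume();  // prints the message; safe because the frame owns its copy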
}

int main() {
    // demonstrate_dangling_reference();  // DO NOT run — UB for illustration only
    std::cout << "Dangling reference demo skipped (undefined behaviour)\n";
    demonstrate_safe_ownership();
    return 0;
}
▶ Output
Dangling reference demo skipped (undefined behaviour)
[caller] source string is out of scope, frame copy is safe
[CORRECT] message: hello, safely owned
⚠ Watch Out: Coroutine Parameters Are Copied Into the Frame
The standard says coroutine parameters are moved or copied into the frame before initial_suspend fires. This means value parameters are safe. But string_view, span, raw pointers, and references all copy the view, not the data it points to. Audit every coroutine signature: if a parameter is non-owning and the coroutine suspends, you have a time bomb. Enable AddressSanitizer (-fsanitize=address) during development: it catches many of these, especially when the dangling view points into freed heap memory.
📊 Production Insight
HALO works only when the optimiser can inline the entire coroutine — that's rarely possible when the handle escapes to a scheduler.
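Frames that are not elided cost one heap allocation each, roughly 30-50ns with a general-purpose allocator; measure on your own hardware. In services that create tens of thousands of coroutines per request, this can add up to milliseconds per request.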
Rule: profile with -O2 -DNDEBUG. If allocation shows up in perf, consider custom allocators via promise_type::operator new.
🎯 Key Takeaway
Parameters are copied into the frame; references dangle after suspension.
final_suspend must be noexcept — exceptions here call std::terminate.
HALO can remove the allocation when the frame doesn't escape, but it is an optimisation, not a guarantee; otherwise accept the allocation or supply your own allocator.
🗂 co_yield (Generator) vs co_await (Async Task)
Deciding which coroutine pattern fits your use case
| Aspect | co_yield (Generator) | co_await (Async Task) |
| --- | --- | --- |
| Primary use case | Lazy sequences, pipelines, ranges | Async I/O, concurrency, thread handoff |
| Suspension trigger | Always suspends on every yield | Awaitable decides at runtime (await_ready) |
| Value flow direction | Coroutine → caller (producer) | Awaitable → coroutine (result injection) |
| Promise hook used | yield_value(T) | No special hook; awaitable protocol |
| Typical initial_suspend | suspend_always (lazy start) | suspend_never (eager start) |
| Thread safety concern | Single-threaded pull model | Must protect shared state on resume |
| Exception propagation | Rethrow in unhandled_exception | Store in promise, rethrow in await_resume |
| Stack growth risk | None (single resume chain) | Symmetric transfer needed for deep chains |
| HALO elision possible? | Often yes (tight iteration loops) | Less likely (handle escapes to scheduler) |

🎯 Key Takeaways

  • The coroutine frame is heap-allocated and contains all locals that cross a suspension point — its lifetime is controlled entirely by your promise_type and RAII wrapper, not the compiler.
  • co_yield is syntactic sugar for co_await promise.yield_value(expr) — understanding this unifies generators and async tasks into one mental model: everything is an awaitable.
  • Pass-by-value for all coroutine parameters that cross a suspension point — string_view, span, and raw references pointing to caller-stack data are time bombs the compiler will not warn you about.
  • Symmetric transfer (returning a coroutine_handle from await_suspend) is not a micro-optimisation — it is how you prevent call-stack overflow in deeply-chained async coroutines and is essential in any production executor.
  • final_suspend must be noexcept — any exception there calls std::terminate, not unhandled_exception. Keep it a pure signalling operation.

⚠ Common Mistakes to Avoid

    Storing a string_view, span, or reference as a coroutine parameter
    Symptom

    The coroutine compiles and runs fine on small tests where the referred-to memory happens to still be alive, then silently corrupts memory in production when the caller's stack frame is gone by resume time.

    Fix

    Take all coroutine parameters by value if the function suspends; the standard guarantees value parameters are copied into the heap-allocated frame before initial_suspend fires.

    Forgetting to call handle.destroy() when final_suspend returns suspend_always
    Symptom

    The frame is never deallocated, producing a slow memory leak that only shows up under load; Valgrind and LeakSanitizer (bundled with AddressSanitizer) will report the leaked frames at exit.

    Fix

    Use RAII: always implement a destructor in your Task wrapper that calls handle.destroy() if the handle is non-null, following the same pattern as std::unique_ptr.

    Throwing or performing non-trivial work inside final_suspend
    Symptom

    The process aborts via std::terminate with no chance to recover. final_suspend must be noexcept, so any exception that escapes it bypasses unhandled_exception entirely.

    Fix

    Keep final_suspend a pure signalling operation (notify a condition variable, set an atomic flag, or return a handle for symmetric transfer) and do all cleanup before the final co_return.

    Calling handle.resume() from within await_suspend instead of using symmetric transfer
    Symptom

    Deep chains of coroutines cause stack overflow once the nested resume() frames exceed the thread's stack limit; the exact depth depends on frame size and stack size. The crash is hard to link to coroutines because the stack trace shows only a long run of nested resume frames.

    Fix

    Return a coroutine_handle from await_suspend instead of calling .resume() directly. The compiler will handle the tail-call-like transfer, keeping the stack flat.

Interview Questions on This Topic

  • Q (Senior): Explain the three-method awaitable protocol (await_ready, await_suspend, await_resume). When would you return false from await_ready and what are the performance implications of doing so unconditionally?
    await_ready returns true if the operation is already complete and suspension can be skipped. Returning false unconditionally means every co_await suspends, even when the result is immediately available, which adds the cost of a suspend/resume round trip and possibly a scheduler hop. Use await_ready to check fast conditions (e.g., a flag or cached value) and avoid unnecessary suspensions.
  • Q (Senior): What is symmetric transfer in coroutines, what problem does it solve, and how does returning a coroutine_handle from await_suspend differ mechanically from calling handle.resume() directly inside await_suspend?
    Symmetric transfer means await_suspend returns a coroutine_handle instead of void. The compiler then resumes that handle as a tail call, without adding a stack frame. If you call handle.resume() directly, the resume is nested inside await_suspend and the stack grows with every link. Symmetric transfer is essential for chained async pipelines where the depth is unbounded.
  • Q (Senior): If final_suspend returns suspend_always and the coroutine frame is destroyed by the Task destructor, but there's also a coroutine_handle copy held somewhere else, what happens when that second handle is resumed or destroyed, and what pattern prevents this class of bug?
    The second handle points to freed memory. Resuming it is use-after-free; destroying it is double-free. The fix is to make the coroutine wrapper move-only (delete the copy constructor and copy assignment). If you genuinely need shared ownership, wrap the handle in a reference-counted owner, such as a shared_ptr whose deleter calls destroy(), but the simplest and most common pattern is unique ownership via RAII with move semantics.

Frequently Asked Questions

Do C++20 coroutines require a runtime or a scheduler?

No. C++20 provides only the language mechanism — the three keywords and the customisation-point protocol. There is no built-in scheduler, thread pool, or event loop. You either provide your own (as shown in the ThreadPool example) or use a library like cppcoro, libunifex, or ASIO that does so. This is intentional: the zero-overhead principle means you only pay for what you use.

What is the difference between co_return and a regular return in a coroutine?

A regular return statement inside a coroutine is a compile error — the compiler rejects it. You must use co_return. Under the hood, co_return expr calls promise.return_value(expr) (or promise.return_void() for co_return with no argument) and then falls through to the final_suspend() call. It does not immediately destroy the frame; final_suspend decides that.

Can I use std::generator from C++23 instead of writing my own Generator type?

Yes. C++23 standardises std::generator<Ref, V, Allocator> in <generator>, which is a production-quality, allocator-aware, recursive generator with a proper range interface. If your compiler and standard library support it, prefer it over a hand-rolled generator for anything beyond learning purposes. The hand-rolled version in this article exists so you understand exactly what std::generator is doing internally.

What happens if I store a coroutine_handle in a container and the frame is destroyed?

The handle becomes a dangling pointer. Resuming or destroying it is undefined behaviour (crash, corruption, or silent data loss). The safest pattern is to ensure the container does not outlive the coroutine wrapper, or to use shared ownership (e.g., shared_ptr<coroutine_handle<>>) with a custom deleter that avoids double destruction.

Are coroutines compatible with C++ exceptions?

Yes. An exception thrown inside the coroutine body is caught by compiler-generated code and forwarded to promise.unhandled_exception(). For an async task, the usual pattern is to store the exception there (via std::current_exception()) and rethrow it later in await_resume() or .get(). If unhandled_exception rethrows instead, the exception propagates out of the resume() call to whoever resumed the coroutine; if it does nothing, the exception is silently lost.

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.
