C++20 Coroutines — Why co_yield Crashed After 200 Frames
Recursive coroutine generators still exhaust call stacks despite heap frames.
20+ years shipping performance-critical C and C++ systems. Drawn from code that ran under real load.
- Concepts constrain template parameters with compile-time predicates — a contract for types, not runtime checks.
- Ranges compose lazy transformations over sequences without allocating intermediate containers.
- Coroutines enable cooperative multitasking: functions that suspend/resume without blocking threads.
- Modules replace header files with a proper compilation boundary; no more #include order bugs.
- The three-way comparison operator (<=>) auto-generates all relational operators for a class.
- Biggest gotcha: coroutines allocate heap frames by default; not every function should be async.
Imagine you're a chef in a restaurant kitchen. Before C++20, you could hire any cook and hope they knew how to handle a knife — and only find out they didn't when dinner service collapsed. C++20 Concepts are like requiring a formal knife-skills certificate before they step foot in your kitchen. Ranges are like having a smart sous chef who preps ingredients in exactly the order you need them, on demand, instead of dumping everything on the counter at once. Coroutines are like a cook who can pause mid-recipe, hand the stove to someone else, and pick up exactly where they left off — no wasted effort, no lost progress.
C++20 is the biggest revision to the language since C++11 rewired how we think about modern C++. It didn't just add syntax sugar; it introduced four pillars that fundamentally change how you architect, template, and scale C++ codebases: Concepts, Ranges, Coroutines, and Modules. These aren't academic curiosities. Google, Microsoft, and JetBrains are already shipping production code that leans on these features, and the compilers (GCC 10+, Clang 10+, MSVC 19.28+) support enough of the core language that there's little excuse for ignoring them in greenfield projects. Library and Modules support landed later on some toolchains (GCC gained Modules in version 11, for example), so check your compiler's status page before committing to a feature.
Before C++20, template error messages were infamous horror shows — pages of substitution failures that pointed nowhere useful. Constraints on template parameters were enforced via SFINAE tricks that even seasoned engineers would copy-paste without fully understanding. Lazy data pipelines required external libraries like range-v3. Asynchronous code meant either raw threads, callback hell, or heavy framework dependencies. Every one of these pain points has a direct answer in C++20.
By the end of this article you'll understand not just what each feature does, but why it was designed that way, what trade-offs the committee made, where the sharp edges are in real production code, and how to answer the interview questions that trip up candidates who only skimmed the release notes. We'll write real, runnable code for each feature and look hard at the moments where things go wrong.
Why C++20 Coroutines Are Not Just Syntactic Sugar
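C++20 coroutines are stackless, resumable functions that can suspend execution at a suspension point (co_await or co_yield) and hand control back to the caller or resumer without blocking the thread; co_return ends the coroutine. Unlike stackful coroutines (e.g., Boost.Coroutine2), a coroutine does not get its own stack. Its frame is heap-allocated by default and holds the promise object, the parameters, and only the local variables that live across suspension points. This makes them lightweight: a suspended coroutine typically costs tens to a few hundred bytes, depending on what it keeps alive, not the kilobytes to megabytes of a dedicated stack.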
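The compiler transforms the function into a state machine. Each suspension point becomes a resume target; the coroutine frame stores an index identifying the current suspension point along with the variables that must survive it. Resumption jumps back to the matching point. The abstraction itself adds little overhead, but the frame is allocated when the coroutine is called, whether or not it ever suspends, unless the optimizer can prove the frame never outlives the caller and elides the allocation. The frame lifetime is also decoupled from the caller: std::coroutine_handle is a non-owning pointer, so if you drop a handle to a suspended coroutine without calling destroy(), the frame leaks.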
Use coroutines when you need to write asynchronous code that reads like synchronous logic: network I/O, generator pipelines, or cooperative multitasking. They are not a replacement for threads; they are a tool to express non-blocking waits without callback hell. In production, the most common mistake is assuming the coroutine frame lives as long as the function scope. It doesn't. The frame persists until the coroutine runs to completion without suspending at final_suspend, or until someone calls destroy() on its handle.
1. Concepts: Contracts for Template Parameters
Concepts allow you to specify constraints on template arguments that are checked at compile time, replacing SFINAE with readable, maintainable predicates. A concept is a compile-time boolean expression evaluated on the type — if it fails, the template is excluded from overload resolution (not an error unless no valid overload exists).
Before C++20, you'd use std::enable_if or tag dispatch to achieve the same. The difference? Concepts produce error messages that actually tell you what's wrong — not a 200-line template instantiation backtrace.
The standard library already provides predefined concepts like std::integral, std::floating_point, std::default_initializable, and std::ranges::range. You compose them with logical operators (&&, ||) and the requires clause.
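- A requires expression can also assert on the existence of member functions and nested types, not just operators.
- A concept that checks only for + while the code expects a numeric type is under-constrained: later operations like sqrt will still produce deep template errors.
- Use static_assert in the implementation to catch logic errors early.
- On C++20: use the standard concepts from <concepts> or define your own with requires. Avoid SFINAE entirely.
- For compound constraints: combine concepts with logical operators (&&, ||). Still cleaner than enable_if.
- On C++17 or earlier: fall back to SFINAE (enable_if, void_t). You can't use concepts in C++17.
2. Ranges: Composable Lazy Data Pipelines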
Ranges bring the composability of functional programming to C++ containers and views. A range is anything that can be iterated over — arrays, vectors, strings, or custom types. Views (std::views) are composable adaptors that lazily transform or filter ranges without allocating new containers.
Instead of writing nested loops or manually chaining algorithms, you write a pipeline: vec | views::filter(pred) | views::transform(f) | views::take(n). Each stage is lazy — only the elements actually needed are processed.
The performance win is twofold: no memory allocation for intermediate results, and early termination when the pipeline ends in take or take_while. But beware of dangling references: a view over an lvalue container refers to that container (and its iterators point into it), so if the underlying range dies, the view becomes invalid.
- Use std::views::common to convert a view into a range whose begin and end have the same type, for older APIs expecting a begin/end iterator pair.
- Put views::filter before views::transform when the filter rejects most elements, so the transform runs only on elements that pass.
- Put take before expensive transforms to limit work.
- take and take_while stop early, so the rest of the sequence is never processed.
- To materialize a view into a vector: auto vec = range | std::ranges::to<std::vector>(); (C++23), or copy manually.
3. Coroutines: Cooperative Multitasking Without Threads
Coroutines are functions that can suspend execution and resume later, preserving local state across suspension points. Unlike threads, coroutines don't require OS scheduling — the decision to suspend and resume is cooperative.
C++20 provides three keywords: co_await (suspend and wait for an awaitable), co_yield (produce a value and suspend), and co_return (final return from a coroutine). The compiler transforms the function into a state machine, allocating a coroutine frame on the heap (by default) to hold suspended state.
Coroutines shine in scenarios like streaming data, asynchronous I/O, and generators. But they have subtle pitfalls: heap allocation overhead, potential stack overflow when coroutines resume each other in deep nests, and difficult debugging, because a resumed coroutine's call stack shows whoever resumed it rather than the code that originally called it.
- Define promise_type::get_return_object_on_allocation_failure to handle frame allocation failure without an exception. Many production coroutine libraries (e.g., boost::asio) provide custom allocators to reduce overhead.
- Use std::experimental::generator or a custom promise with a pool allocator for high-throughput scenarios.
- Call coroutine_handle::resume explicitly in unit tests to control scheduling.
- For lazy sequences: the co_yield generator pattern.
- For asynchronous I/O: co_await with a proper executor (boost::asio, libunifex).
4. Modules: Modern Compilation Boundaries
Modules (export, import) provide a new way to organise C++ code that is faster to compile and more hygienic than headers. A module interface unit (conventionally .cppm or .ixx, depending on the compiler) declares what is exported, and module implementation units (.cpp) define non-exported details. Importing a module gives access only to the exported names, which eliminates macro leaks and include-order dependencies and reduces the risk of ODR violations.
Modules also improve build times because the compiler pre-compiles module interfaces into .pcm files (Clang) or .ifc files (MSVC). Translation units that import the module only need to load that precompiled interface — no reparsing of headers.
However, module support is not fully consistent across compilers. You may encounter issues with .cppm vs .ixx extensions, and interaction with precompiled headers (PCH) can be fragile.
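- GCC needs -fmodules-ts and may need -x c++ to treat a .cppm file as C++ source. Clang recognizes .cppm as a module interface under -std=c++20 (its -fmodules flag enables the separate Clang header-modules feature). MSVC uses .ixx for module interfaces, or /interface for other extensions. The extension matters: using .h for a module interface won't work. Always check your compiler's documentation for the exact file extensions and flags.
- Modules remove the need for include guards and #pragma once workarounds.
5. Three-Way Comparison Operator (Spaceship)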
The <=> operator (also called the spaceship operator) performs a three-way comparison, returning a comparison category type that indicates less, equal, or greater. The compiler can automatically generate ==, !=, <, <=, >, >= from <=> if you write = default. This eliminates the boilerplate of writing six comparison operators for a class.
There are three category types: std::strong_ordering (total order: no two values are incomparable), std::weak_ordering (equivalence classes, e.g., case-insensitive strings), and std::partial_ordering (values may be incomparable, e.g., floating-point NaN).
In C++20, comparison expressions are rewritten in terms of <=> and ==, and the compiler also considers candidates with the arguments reversed. That rewriting can lead to ambiguities when both <=> and == are custom defined.
- When you = default <=>, the compiler also declares a defaulted ==, whatever the comparison category, and != is rewritten from ==. A hand-written <=> does not give you == or !=; define == explicitly if you need equality.
- Pitfall: a class with <=> and a custom operator==. If the custom operator== is not const-qualified or takes asymmetric argument types, the reversed candidates can make a comparison ambiguous and the compiler may refuse to compile it. The fix: remove the custom operator== if it does the same as the generated one, or make it a const, symmetric overload.
- For floating-point members, a defaulted <=> returns partial_ordering. Be aware that NaN comparisons will be unordered, which breaks the strict weak ordering that sorted containers like std::set require. Use std::strong_ordering only if you guarantee no NaN.
- <=> with = default generates all six comparison operators.
- Watch for ambiguity between a custom <=> and ==.
- Floating-point members yield partial_ordering: handle NaN explicitly.
- Simple value types: auto operator<=>(const T&) const = default;. Generates all six operators cleanly.
- Types with floating-point members: write operator<=> manually to return strong_ordering after handling NaN, or accept partial_ordering but be aware of std::set incompatibility.
- Equality-only types such as hash-map keys: define only == and a hash function. Don't add <=> unless you need sortability.
6. std::to_array: Stop Using Raw Arrays in Templates
We had a bug last month. A junior wrote a template function expecting an array of integers, passed a braced-init-list, and got template deduction failure. The compiler couldn't figure out what type {1,2,3} was. Enter std::to_array. It's a factory function that takes a built-in array (which a braced-init-list can initialize) and deduces both size and element type at compile time. No more std::array size template arguments that mismatch. No more silent decays to pointers. The WHY: braced-init-lists have no type in template context. std::to_array converts them to a proper std::array with a concrete type. Use it for any compile-time sized container you pass into template code. It's in <array>, and its only cost is copying or moving the elements, which the compiler can do at compile time in a constexpr context. The compiler will reject mismatched sizes before your CI pipeline runs.
- Don't use std::to_array on std::vector or other dynamic data. It needs a built-in array whose size is known at compile time. Use std::array::data() and a manual copy if you need to move dynamic data into a fixed-size array.
- Use std::to_array whenever you need a compile-time array from an initializer list inside templates. It fixes deduction failures.
7. [[likely]] and [[unlikely]]: Assert Your Branch Expectations to the Optimizer
Your hot path was slow. We profiled. The code was laid out as if the branch were a 50/50 split when it was really 95/5. The compiler didn't know. [[likely]] and [[unlikely]] are attributes that tell the compiler which path is expected to run more often. No functional change. Pure performance hint. WHY: branch misprediction stalls cost roughly 10-20 cycles per miss. In tight loops, that adds up. Place [[likely]] on the path that runs 95% of the time: the success path after a validation check, fast-return paths, early exits. Place [[unlikely]] on error handling. The compiler can reorder blocks so the hot path falls through without taken branches. The attributes apply to statements and labels, so they work on if/else branches, switch cases, and loop bodies. One caveat: don't lie. If you mark a branch [[likely]] that actually runs 50%, performance can degrade. Profile first, annotate second.
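For placement, the attribute goes after the condition and before the statement it describes. The sketch below uses two made-up functions, parse_digit and sum_digits, and assumes digits are the overwhelmingly common input; whether that assumption holds for your data is something only a profile can tell you.

```cpp
#include <string_view>

// Returns the digit value, or -1 for a non-digit character.
constexpr int parse_digit(char c) {
    if (c >= '0' && c <= '9') [[likely]] {
        return c - '0';   // hot path: assumed to be the common case
    } else [[unlikely]] {
        return -1;        // cold path: malformed input
    }
}

// Sums the digits of `text`; returns -1 as soon as a non-digit appears.
constexpr long sum_digits(std::string_view text) {
    long total = 0;
    for (char c : text) {
        const int digit = parse_digit(c);
        if (digit < 0) [[unlikely]] return -1;  // early exit for bad input
        total += digit;
    }
    return total;
}
```

The attributes change nothing about the results. They only give the optimizer permission to arrange the generated code around the path you declared hot.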
- Use [[likely]] on fast paths and [[unlikely]] on error handling. Never guess: profile your branches first.
8. Map and Set .contains(): Kill the .count() Hack for Good
I've seen codebases with if (my_map.count(key) > 0) scattered everywhere. That works, but it's misleading. count answers "how many?", and for unique-key containers the answer is only ever 0 or 1; for std::multimap and std::multiset it has to walk every matching element. Confusing, and sometimes inefficient. C++20 gives us .contains(key), which returns bool directly. It communicates intent: "does this key exist?" not "how many times does this key appear?". The WHY: readability and performance. contains() can stop at the first match. For unordered containers, it's a single hash lookup. For ordered maps, it's one tree traversal, not a full count. Replace all your .count(key) > 0 checks with .contains(key). It's a safe mechanical transformation. Your code will be clearer. Your code reviewers will thank you. For unique-key containers the speed is about the same; the measurable win is on multi-containers with many duplicates.
- Don't use .contains() if you need to access the element after checking existence. Use .find() and compare against .end() instead. contains() doesn't return an iterator.
- Replace map.count(key) > 0 with map.contains(key). It's never slower and clearly expresses presence checking.
Coroutine Stack Exhaustion in a Production Video Transcoding Pipeline
The assumption was that the generator would suspend at each co_yield, preventing deep recursion. The generator was actually calling itself recursively without a base case.
The recursive generator reached co_yield but never allowed the promise to destroy the frame. The coroutine frames remained on the heap and the call stack grew unboundedly.
- Coroutines don't automatically protect against infinite recursion. They shift the frame allocation to the heap, but the call stack still grows with each nested resume.
- Always test coroutines with stress inputs that can cause deep nesting. Stack usage analysis is vital before production deployment.
- Prefer iterative state machines over recursive generators for unbounded data flows.
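The last bullet can be sketched without any coroutine machinery at all. This is not the pipeline's actual code; Node and PreorderWalker are hypothetical types that show the shape of the fix: keep the pending work in an explicit, heap-backed container so the call stack stays flat however deep the data nests.

```cpp
#include <optional>
#include <vector>

// Hypothetical nested structure, standing in for whatever the generator walked.
struct Node {
    int value{};
    std::vector<Node> children;
};

// Iterative state machine: each call to next() does a bounded amount of work
// and keeps its position in `pending_` instead of on the call stack.
class PreorderWalker {
public:
    // `root` must outlive the walker; only pointers into the tree are stored.
    explicit PreorderWalker(const Node& root) { pending_.push_back(&root); }

    // Returns the next value in pre-order, or std::nullopt when finished.
    std::optional<int> next() {
        if (pending_.empty()) return std::nullopt;
        const Node* node = pending_.back();
        pending_.pop_back();
        // Push children in reverse so the leftmost child is visited first.
        for (auto it = node->children.rbegin(); it != node->children.rend(); ++it) {
            pending_.push_back(&*it);
        }
        return node->value;
    }

private:
    std::vector<const Node*> pending_;  // grows on the heap, not the stack
};
```

The same idea carries over to a coroutine generator: a single coroutine that loops over an explicit work list yields every value from one frame, so resume depth stays at one no matter how the input is shaped.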
- Unreadable concept errors: on GCC, -fconcepts-diagnostics-depth=2 expands nested constraint failures one level deeper, so you can find the failing requirement without parsing the full substitution failure tree. On Clang, -std=c++20 is enough; the -Xclang -fconcepts-ts flag targeted the older Concepts TS and may be rejected by current releases.
- Coroutine frame leaks: use std::noop_coroutine or a custom promise with final_suspend. Ensure destroy() is called on the coroutine handle, or have final_suspend return std::suspend_never so the frame is freed when the coroutine finishes.
- Module not found: compile with -std=c++20 -fmodules-ts (GCC) or -std=c++20 (Clang). Check that the module interface unit has the .cppm extension and includes export module mymodule;.
- <=> generates wrong comparison order for custom types: if you write operator<=> manually, ensure it returns a comparison category type (std::strong_ordering, etc.). The compiler declares == for you only when <=> is defaulted and no operator== is explicitly declared; != is then rewritten from ==.
- Commands: g++ -std=c++20 -fconcepts-diagnostics-depth=2 file.cpp and clang++ -std=c++20 file.cpp
- Isolate the failing requires expression and check each sub-expression individually.
Common mistakes to avoid
Using `requires` without understanding substitution failure
- Symptom: the template satisfies the requires expression but fails on a later constraint. Error messages point to the instantiation site, not the concept, hiding the real cause.
- Fix: test each requires clause separately. Use static_assert(MyConcept<T>) at the template definition to verify constraints are sufficient.
Assuming coroutine frame is stack-allocated or cheap
Mixing modules with `#include` in the same translation unit
- Symptom: an #include placed directly before export module is not allowed unless it sits in the global module fragment (after a leading module; line).
- Fix: put #include directives in the global module fragment, between module; and the export module line, or use import for dependencies, e.g. import <iostream>; for a header unit. An #include after the export module line attaches the header's declarations to your module.
Defining both `<=>` and `==` in a class that should be value-ordered
- Symptom: the defaulted <=> conflicts with the custom ==.
- Fix: remove the custom operator== if it does the same as the generated one. If you need different equality semantics, define <=> manually and do not use = default.
Interview Questions on This Topic
Explain how concepts differ from SFINAE and why they improve error messages.
SFINAE relies on tricks like std::enable_if to conditionally disable templates. When a substitution fails, the compiler silently removes that overload and tries others, resulting in opaque error messages if no viable overload exists. Concepts replace this with explicit compile-time predicates. If a concept is not satisfied, the compiler reports exactly which constraint failed and on what type, at the call site. Concepts also participate in overload resolution as ordered constraints (using constraint subsumption), allowing partial ordering without additional traits.
Example:
```cpp
#include <concepts>

// Accepts only integral types; anything else is rejected at the call site.
template<typename T>
requires std::integral<T>
T add(T a, T b) { return a + b; }
```
If called with `float`, the error says: 'constraints not satisfied: std::integral<float>', far clearer than SFINAE noise.