Python asyncio Shutdown Hangs — CancelledError Swallowing
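CancelledError swallowed by broad except blocks (bare except:, except BaseException, or except Exception on Python 3.7 and earlier, where CancelledError still derived from Exception) causes 30+ second SIGTERM hangs.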
- Coroutine = async def function that yields control at each await point
- Event loop runs a single thread, switching tasks when they await I/O
- await does not block the thread — it suspends the coroutine until the future resolves
- asyncio.gather runs tasks concurrently; return_exceptions=True returns each failure as a result instead of raising only the first one
- CancelledError must be re-raised; swallowing it breaks shutdown
- Blocking the loop (e.g. time.sleep) freezes all concurrent tasks
Every modern Python backend eventually hits the same wall: your code spends 90% of its time waiting — waiting for a database to respond, waiting for an API call to return, waiting for a file to be read off disk. Threads are the classic answer, but they carry a tax: memory overhead, the GIL playing referee, and race conditions that appear only in production at 3am. There's a better tool for I/O-bound concurrency, and it's been in the standard library since Python 3.4, matured dramatically in 3.7, and is now the backbone of frameworks like FastAPI, aiohttp, and Starlette.
asyncio solves the 'waiting problem' through cooperative multitasking. Instead of the OS forcibly switching between threads, your coroutines voluntarily yield control back to an event loop whenever they'd otherwise block. One thread, one event loop, potentially thousands of concurrent operations — all without a mutex in sight. The catch is that everything in your call stack must understand this contract, which is what makes asyncio feel like learning a new dialect of Python at first.
By the end of this article you'll understand how the event loop actually schedules work, how coroutines differ from generators at the bytecode level, why blocking the event loop is the cardinal sin of async Python, and how to structure real production services — including proper cancellation, timeouts, error propagation, and the patterns that separate async code that scales from async code that silently serializes everything.
How the Event Loop Actually Works — Not the Simplified Version
Most explanations stop at 'the event loop runs coroutines'. That's not enough when something breaks in production. The event loop is essentially a tight loop around a system call — select, epoll, or kqueue depending on your OS — that asks the kernel: 'which of these file descriptors are ready?'. When one is ready, the loop wakes up the coroutine that was waiting on it and resumes execution from the exact point it yielded.
A coroutine is a function defined with async def. Under the hood it compiles to a code object whose frame can be suspended and resumed. When you await something, Python calls __await__ on the awaitable, which ultimately bottoms out in a Future object. That Future registers a callback with the event loop. The coroutine's frame is frozen — local variables and all — until the Future resolves, at which point the loop schedules its resumption.
This is why you can have 10,000 concurrent HTTP connections on a single thread: no connection holds the thread while waiting. Each coroutine's frame costs roughly 1-2 KB of heap memory — orders of magnitude cheaper than an OS thread's 1-8 MB stack.
Understanding this model is what makes the rule 'never block the event loop' feel obvious rather than arbitrary: if your coroutine calls time.sleep(5), the entire event loop freezes for five seconds because it's still on the same thread. Every other 'concurrent' coroutine is stuck waiting.
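The freeze described above can be measured rather than taken on faith. The following is a minimal sketch, assuming a hypothetical `ticker` coroutine that should wake every 10 ms and a hypothetical helper `max_tick_gap` that reports the longest gap between two wake-ups; neither name comes from asyncio itself.

```python
import asyncio
import time


async def ticker(gaps: list, period: float = 0.01) -> None:
    """Hypothetical heartbeat: record the real time between consecutive wake-ups."""
    loop = asyncio.get_running_loop()
    last = loop.time()
    while True:
        await asyncio.sleep(period)
        now = loop.time()
        gaps.append(now - last)
        last = now


async def max_tick_gap(blocking: bool, pause: float = 0.3) -> float:
    """Run the heartbeat next to a pause and return the worst gap it observed."""
    gaps: list = []
    task = asyncio.create_task(ticker(gaps))
    await asyncio.sleep(0.05)           # let the heartbeat settle
    if blocking:
        time.sleep(pause)               # holds the only thread: the heartbeat cannot run
    else:
        await asyncio.sleep(pause)      # yields to the loop: the heartbeat keeps running
    await asyncio.sleep(0.05)
    task.cancel()
    # Collect the cancelled task without letting its CancelledError escape here.
    await asyncio.gather(task, return_exceptions=True)
    return max(gaps)


if __name__ == "__main__":
    print("worst gap with time.sleep:   ", asyncio.run(max_tick_gap(blocking=True)))
    print("worst gap with asyncio.sleep:", asyncio.run(max_tick_gap(blocking=False)))
```

With the blocking pause the worst gap should be at least as long as the pause itself, while the awaited pause should leave the heartbeat close to its 10 ms period. Exact numbers depend on the machine.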
Tasks, Futures, and Awaitable Contracts — The Object Model Behind await
There are three things you can await in Python: coroutines, Tasks, and Futures. Understanding the difference is critical for writing correct async code.
A coroutine object (what you get when you call an async def function without await) is a lazy generator-like object. Nothing runs until the event loop drives it. If you write fetch_user_profile(42) without awaiting it, Python creates the object but never executes its body, and emits a "coroutine ... was never awaited" RuntimeWarning once the unused object is garbage-collected.
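The laziness of coroutine objects is easy to observe. This is a small sketch in which the body of `fetch_user_profile` is invented for illustration (the article only names the function), and the module-level `calls` list is a hypothetical probe that records when the body actually starts.

```python
import asyncio
import inspect

calls = []  # hypothetical probe: filled in only when the coroutine body runs


async def fetch_user_profile(user_id: int) -> dict:
    # Illustrative body; the real function is not shown in the article.
    calls.append(user_id)
    await asyncio.sleep(0)
    return {"id": user_id}


def demo():
    calls.clear()
    coro = fetch_user_profile(42)        # builds a coroutine object, runs nothing
    is_coro = inspect.iscoroutine(coro)
    ran_before = list(calls)             # still empty: the body has not started
    result = asyncio.run(coro)           # the event loop now drives the coroutine
    return is_coro, ran_before, result, list(calls)


if __name__ == "__main__":
    print(demo())
```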
A Future is a low-level promise object. It starts in a pending state and transitions to done (with a result or an exception) exactly once. You almost never create Futures manually in application code — they live inside the networking and I/O layers.
A Task wraps a coroutine and schedules it to run on the event loop immediately via asyncio.create_task(). This is the key difference from a bare await: await coroutine() runs that coroutine sequentially from your perspective, while asyncio.create_task(coroutine()) schedules it concurrently and returns a handle you can await later.
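A timing comparison makes the difference concrete. The sketch below assumes a hypothetical `fetch` coroutine that simply sleeps for 0.2 seconds; `sequential` awaits two calls back to back, while `concurrent` wraps them in Tasks first.

```python
import asyncio
import time


async def fetch(name: str, delay: float = 0.2) -> str:
    """Hypothetical I/O call, simulated with a non-blocking sleep."""
    await asyncio.sleep(delay)
    return name


async def sequential():
    start = time.perf_counter()
    a = await fetch("a")                 # second call does not start until this returns
    b = await fetch("b")
    return [a, b], time.perf_counter() - start


async def concurrent():
    start = time.perf_counter()
    task_a = asyncio.create_task(fetch("a"))   # scheduled on the loop right away
    task_b = asyncio.create_task(fetch("b"))
    results = [await task_a, await task_b]     # both delays overlap
    return results, time.perf_counter() - start


if __name__ == "__main__":
    print("sequential:", asyncio.run(sequential()))
    print("concurrent:", asyncio.run(concurrent()))
```

The sequential version should take roughly the sum of the delays and the concurrent version roughly the longest single delay.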
The await keyword calls __await__() on the right-hand side object, which must return an iterator. For Tasks and Futures that iterator suspends the current coroutine and resumes it when the Future resolves. This is the same protocol yield from used in Python 3.4 generators; asyncio is built on top of that generator machinery.
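Any object can take part in this protocol by defining `__await__`. As an illustration only, here is a hypothetical `Delayed` class (not part of asyncio) whose `__await__` is a generator that delegates to the iterator of an existing awaitable and then returns a value.

```python
import asyncio


class Delayed:
    """Hypothetical awaitable: yields control for `delay` seconds, then produces `value`."""

    def __init__(self, value, delay: float):
        self.value = value
        self.delay = delay

    def __await__(self):
        # Delegate suspension to asyncio.sleep's own iterator, generator style.
        yield from asyncio.sleep(self.delay).__await__()
        # The generator's return value becomes the result of the `await` expression.
        return self.value


async def main():
    return await Delayed("ready", 0.01)


if __name__ == "__main__":
    print(asyncio.run(main()))
```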
Cancellation, Timeouts, and Error Handling — The Production Minefield
Happy-path async code is easy. Production async code is defined by how it handles failure. Three scenarios trip up even experienced developers: task cancellation, timeouts, and exception propagation through gather.
Cancellation in asyncio is cooperative, not forcible. When you call task.cancel(), Python injects a CancelledError into the coroutine at its next await point. If the coroutine catches CancelledError and doesn't re-raise it, the cancellation is silently swallowed, which is a serious bug. Always re-raise CancelledError or use finally blocks for cleanup.
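The two behaviours can be compared side by side. In this sketch, `stubborn_worker` and `polite_worker` are hypothetical names: the first swallows CancelledError inside its loop, the second cleans up and re-raises. The `stop` event exists only so the demo can end the stubborn worker afterwards.

```python
import asyncio


async def stubborn_worker(stop: asyncio.Event) -> None:
    while not stop.is_set():
        try:
            await asyncio.sleep(0.01)
        except asyncio.CancelledError:
            pass  # BUG: the cancellation request is discarded and the loop continues


async def polite_worker(log: list) -> None:
    try:
        while True:
            await asyncio.sleep(0.01)
    except asyncio.CancelledError:
        log.append("cleanup")  # release resources here
        raise                  # let the cancellation complete


async def demo() -> dict:
    stop = asyncio.Event()
    log: list = []
    bad = asyncio.create_task(stubborn_worker(stop))
    good = asyncio.create_task(polite_worker(log))
    await asyncio.sleep(0.03)

    bad.cancel()
    good.cancel()
    await asyncio.wait({bad, good}, timeout=0.1)  # give both a chance to stop

    outcome = {
        "good_cancelled": good.cancelled(),
        "bad_still_running": not bad.done(),
        "log": list(log),
    }
    stop.set()   # only an out-of-band signal ends the stubborn worker
    await bad
    return outcome


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A shutdown routine that cancels its tasks and then waits for them would wait on `stubborn_worker` until an external grace period runs out, which is the hang this article is about.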
Timeouts are best handled with asyncio.timeout() (Python 3.11+) or asyncio.wait_for() on earlier versions. Both convert the internal CancelledError into a TimeoutError so you can distinguish 'took too long' from 'was cancelled by a parent task'.
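A minimal sketch of a timeout with a fallback, written with asyncio.wait_for() so it also runs on interpreters older than 3.11. `slow_query` and `query_with_timeout` are hypothetical names, and the `events` list is only there to show that the timed-out coroutine really does see a CancelledError.

```python
import asyncio


async def slow_query(events: list) -> str:
    """Hypothetical slow operation that cleans up correctly when cancelled."""
    try:
        await asyncio.sleep(0.2)
        return "rows"
    except asyncio.CancelledError:
        events.append("query cancelled")
        raise  # re-raise so wait_for can finish cancelling the task


async def query_with_timeout(timeout: float):
    events: list = []
    try:
        result = await asyncio.wait_for(slow_query(events), timeout=timeout)
    except asyncio.TimeoutError:
        # The caller sees TimeoutError, not CancelledError.
        result = "fallback"
    return result, events


if __name__ == "__main__":
    print(asyncio.run(query_with_timeout(0.02)))
    print(asyncio.run(query_with_timeout(1.0)))
```

On Python 3.11+, the same shape can be written with `async with asyncio.timeout(...)` around the awaited call.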
Exception propagation through gather has a sharp edge: by default, if one task raises, gather re-raises that first exception to the caller immediately but does not cancel the remaining tasks. They keep running unsupervised, and you lose access to their results, including those of tasks that succeeded. Pass return_exceptions=True in production so you can inspect every result individually.
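As a sketch of inspecting every outcome, the following uses a hypothetical `lookup` coroutine in which one of three calls fails. With return_exceptions=True the failure comes back as an item in the result list, in the same position as the call that produced it.

```python
import asyncio


async def lookup(key: str, delay: float, fail: bool = False) -> str:
    """Hypothetical cache lookup; `fail` simulates a miss that raises."""
    await asyncio.sleep(delay)
    if fail:
        raise KeyError(key)
    return f"value-for-{key}"


async def fetch_all():
    results = await asyncio.gather(
        lookup("a", 0.01),
        lookup("b", 0.02, fail=True),
        lookup("c", 0.03),
        return_exceptions=True,   # exceptions are returned, not raised
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    errors = [r for r in results if isinstance(r, BaseException)]
    return ok, errors


if __name__ == "__main__":
    print(asyncio.run(fetch_all()))
```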
asyncio.TaskGroup (Python 3.11+) is now the preferred pattern — it provides structured concurrency where all child tasks are cancelled if any one fails, and all exceptions are surfaced together via an ExceptionGroup.
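A short sketch of that behaviour, which requires Python 3.11 or newer. `step` and `run_group` are hypothetical names; the `log` list records what happens to the sibling of a failing task.

```python
import asyncio


async def step(name: str, delay: float, log: list, fail: bool = False) -> None:
    """Hypothetical unit of work that records whether it finished or was cancelled."""
    try:
        await asyncio.sleep(delay)
        if fail:
            raise ValueError(name)
        log.append(f"{name} finished")
    except asyncio.CancelledError:
        log.append(f"{name} cancelled")
        raise


async def run_group():
    log: list = []
    caught: list = []
    try:
        async with asyncio.TaskGroup() as tg:          # Python 3.11+
            tg.create_task(step("fast-failure", 0.01, log, fail=True))
            tg.create_task(step("slow-sibling", 1.0, log))
    except ExceptionGroup as group:                    # builtin since Python 3.11
        caught = [type(exc).__name__ for exc in group.exceptions]
    return log, caught


if __name__ == "__main__":
    print(asyncio.run(run_group()))
```

The slow sibling should be cancelled shortly after the failure instead of running for its full second, and the ValueError arrives wrapped in the group.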
Blocking the Event Loop — How to Detect It and What to Do Instead
Blocking the event loop is the cardinal sin of async Python and the most common source of 'asyncio isn't faster than sync code' complaints. Any call that holds the thread without yielding (a time.sleep(), a requests.get(), a CPU-heavy loop, even a naively-called json.loads() on a 50MB payload) freezes every other coroutine in your application.
The asyncio event loop has a built-in slow-callback detector: set loop.slow_callback_duration to a low threshold (e.g. 50ms) and enable debug mode. Python will log a warning whenever a callback holds the loop longer than that threshold. This is invaluable in production profiling.
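The detector can be exercised in a few lines. This sketch assumes a hypothetical `ListHandler` that captures records from the "asyncio" logger and a hypothetical `offender` coroutine that blocks with time.sleep(); note that slow_callback_duration is expressed in seconds.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted log messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def offender() -> None:
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05   # seconds; warn on anything slower than 50 ms
    time.sleep(0.2)                      # blocking call inside a coroutine


def run_with_detector() -> list:
    handler = ListHandler()
    logger = logging.getLogger("asyncio")
    logger.addHandler(handler)
    try:
        asyncio.run(offender(), debug=True)   # debug mode enables the detector
    finally:
        logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    for message in run_with_detector():
        print(message)
```

The captured warning names the task that held the loop and how long it took, which is usually enough to locate the blocking call.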
For blocking I/O you can't make async (a synchronous DB driver, a legacy library), use loop.run_in_executor() to offload work to a thread pool. For CPU-bound work, use ProcessPoolExecutor, because threads won't help here due to the GIL.
asyncio.to_thread() (Python 3.9+) is a clean shorthand for run_in_executor with the default thread pool. It's idiomatic for wrapping synchronous file I/O, synchronous HTTP calls, or any legacy synchronous function you can't replace yet.
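Both offloading calls look like this in practice. The sketch assumes a hypothetical blocking function `legacy_read` (simulated with time.sleep) that reports which thread it ran on, so the result shows the work left the event loop's thread.

```python
import asyncio
import threading
import time


def legacy_read(label: str) -> tuple:
    """Hypothetical synchronous call; time.sleep stands in for blocking I/O."""
    time.sleep(0.05)
    return label, threading.get_ident()


async def main():
    loop_thread = threading.get_ident()

    # Python 3.9+: run the function in the default thread pool.
    via_to_thread = await asyncio.to_thread(legacy_read, "to_thread")

    # Equivalent lower-level call; None selects the default executor.
    loop = asyncio.get_running_loop()
    via_executor = await loop.run_in_executor(None, legacy_read, "run_in_executor")

    return loop_thread, via_to_thread, via_executor


if __name__ == "__main__":
    print(asyncio.run(main()))
```

For CPU-bound functions the same run_in_executor call accepts a concurrent.futures.ProcessPoolExecutor instance in place of None, provided the function and its arguments can be pickled.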
The mental model: the event loop is the single thread. Think of it as a very important person's personal assistant. Every synchronous call is a task that physically occupies the assistant. Every await is handing a task to someone else while the assistant handles the next thing.
Production Patterns: Structured Concurrency with TaskGroup vs gather
asyncio.gather() and asyncio.TaskGroup both run multiple tasks concurrently, but they differ fundamentally in failure handling. By default, gather() propagates the first exception to the caller as soon as it occurs and does not cancel the other awaitables: they keep running in the background with nobody awaiting them, so their results and any later exceptions are easy to lose. TaskGroup (Python 3.11+) provides structured concurrency: every task is a child of the group, and if any one fails, the remaining children are cancelled. Once all children have finished, the failures are raised together as an ExceptionGroup.
The critical distinction: with gather(return_exceptions=False), you get the first exception and the remaining tasks are left running unsupervised. With gather(return_exceptions=True), gather waits for every task and returns a list that mixes results and exception objects in input order; nothing is cancelled, so you must check each entry yourself. TaskGroup avoids both surprises: you commit to either all succeed or the rest are cancelled, and you handle exceptions after the context manager exits.
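What happens to sibling tasks under the default gather() can be checked directly. In this sketch, `fail_fast` and `slow_sibling` are hypothetical coroutines, and the `log` list records whether the sibling finished or was cancelled after the failure reached the caller.

```python
import asyncio


async def fail_fast() -> None:
    await asyncio.sleep(0.01)
    raise RuntimeError("boom")


async def slow_sibling(log: list) -> None:
    try:
        await asyncio.sleep(0.05)
        log.append("sibling finished")
    except asyncio.CancelledError:
        log.append("sibling cancelled")
        raise


async def default_gather():
    log: list = []
    sibling = asyncio.create_task(slow_sibling(log))
    try:
        await asyncio.gather(fail_fast(), sibling)   # return_exceptions defaults to False
    except RuntimeError as exc:
        log.append(f"gather raised {exc}")
    sibling_done_at_failure = sibling.done()         # sampled right after the failure
    await sibling                                    # the sibling is still running; wait for it
    return log, sibling_done_at_failure


if __name__ == "__main__":
    print(asyncio.run(default_gather()))
```

The sibling runs to completion after gather() has already raised. Keeping a reference to each task, as done here, is what makes it possible to await or cancel the survivors explicitly.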
In high-throughput services, prefer gather with return_exceptions=True when you want to preserve successful results despite some failures — for example, fetching data from multiple caches where one miss is acceptable. Use TaskGroup when you want atomicity: if any part of the workflow fails, the whole operation should be abandoned.
Another pattern: asyncio.wait() gives fine-grained control over FIRST_COMPLETED, FIRST_EXCEPTION, ALL_COMPLETED. It's useful for race patterns (e.g., fetch from primary, fallback to secondary on timeout).
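A race between two sources is one way to use FIRST_COMPLETED. The sketch below assumes a hypothetical `replica` coroutine with simulated latencies; the first task to finish wins, and the loser is cancelled and collected so it does not linger.

```python
import asyncio


async def replica(name: str, delay: float) -> str:
    """Hypothetical backend call with a simulated latency."""
    await asyncio.sleep(delay)
    return f"response from {name}"


async def first_response():
    tasks = {
        asyncio.create_task(replica("primary", 0.2)),
        asyncio.create_task(replica("secondary", 0.02)),
    }
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    for task in pending:
        task.cancel()                                  # stop the slower request
    await asyncio.gather(*pending, return_exceptions=True)  # collect the cancellations

    winner = done.pop()
    return winner.result(), len(pending)


if __name__ == "__main__":
    print(asyncio.run(first_response()))
```

Unlike gather(), asyncio.wait() never raises a task's exception itself; calling result() on a finished task is where a failure would surface.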
| Aspect | asyncio (Coroutines) | Threading (Thread per task) | Multiprocessing |
|---|---|---|---|
| Best use case | I/O-bound: APIs, DBs, sockets | I/O-bound + legacy sync libs | CPU-bound: parsing, ML, crypto |
| Concurrency model | Cooperative (yields voluntarily) | Preemptive (OS switches threads) | True parallelism (multiple processes) |
| Memory per unit | ~1-2 KB per coroutine frame | ~1-8 MB per OS thread stack | High — full interpreter copy |
| GIL impact | No GIL contention (single thread) | GIL limits CPU parallelism | No GIL — full CPU parallelism |
| Max concurrency | 10,000s of tasks easily | ~100-1000 threads typical | Limited by CPU cores |
| Shared state safety | Mostly safe (no preemption, but state can change across an await) | Unsafe: need locks/queues | Unsafe: need IPC |
| Blocking call impact | Freezes ALL other tasks | Only blocks that one thread | Only blocks that one process |
| Library requirements | Must use async-native libs | Any synchronous library works | Must be picklable |
| Error propagation | CancelledError + ExceptionGroup | Exception in thread is silent unless joined | Exception must be passed via queue/pipe |
| Python version | 3.7+ for modern API, 3.11+ for TaskGroup | All versions | All versions |
Key Takeaways
- asyncio runs a single thread with cooperative multitasking: coroutines yield at each await.
- Coroutine frames cost ~1-2 KB vs 1-8 MB for threads — enabling 10,000s of connections.
- Never block the event loop with synchronous I/O or CPU-heavy code; use to_thread() or run_in_executor().
- Always re-raise CancelledError; swallowing it breaks shutdown.
- Prefer asyncio.gather with return_exceptions=True or use TaskGroup (3.11+) for structured concurrency.
- Enable debug mode (PYTHONASYNCIODEBUG=1) during development to catch blocking calls early.
Common Mistakes to Avoid
- Using await on each coroutine inside a loop (sequentialisation)
Symptom: Your async API performs no better than synchronous code. Wall time is sum of all operations instead of max.
Fix: Collect coroutines into a list and pass to asyncio.gather(), or create tasks with asyncio.create_task() before awaiting.
- Silently swallowing CancelledError
Symptom: Application hangs on shutdown. Pod takes >30 seconds to terminate. Cleanup never runs.
Fix: Re-raise CancelledError in except blocks. Use except Exception: instead of bare except: to avoid catching CancelledError inadvertently (this works on Python 3.8+, where CancelledError derives from BaseException). Always use finally for cleanup that must run regardless.
- Using time.sleep() instead of asyncio.sleep() inside a coroutine
Symptom: Entire application freezes for the duration of sleep. Other requests time out.
Fix: Replace time.sleep() with await asyncio.sleep(). For synchronous sleep during startup/shutdown, use a thread pool.
- Calling synchronous I/O directly (requests.get, file read) inside a coroutine
Symptom: Performance degradation under load. Slow-callback warnings if debug mode enabled.
Fix: Use async libraries (aiohttp, httpx.AsyncClient, aiofiles) or offload to thread pool with asyncio.to_thread().
Interview Questions on This Topic
- Q: Explain the difference between asyncio.gather with return_exceptions=False and True. When would you use each? (Mid-level)
- Q: What happens if you catch CancelledError and don't re-raise it? (Senior)
- Q: How does the event loop detect blocking calls, and how can you debug them in production? (Senior)
Frequently Asked Questions
Why does my async code run slower than sync code?
You're likely awaiting coroutines one by one inside a loop (sequentialisation) or calling blocking functions like time.sleep() or requests.get() that freeze the event loop. Verify you use asyncio.gather() for independent tasks and replace blocking calls with async versions or offload to a thread pool.
What's the difference between asyncio.run() and creating an event loop manually?
asyncio.run() creates a new event loop, runs the coroutine to completion, and closes the loop afterwards. It's the recommended entry point. Manual loop creation (loop = asyncio.new_event_loop()) is only needed when integrating asyncio with other event loops (e.g., in Jupyter notebooks, GUI frameworks). Never nest asyncio.run() calls.
How do I handle timeouts properly in asyncio?
Use asyncio.timeout() (Python 3.11+) or asyncio.wait_for(). Both raise TimeoutError when the deadline is exceeded. Internally, they cancel the underlying task by injecting CancelledError. Ensure the wrapped coroutine handles CancelledError correctly (re-raises). For graceful timeouts, prefer wait_for with a timeout that allows fallback logic.