Python asyncio Shutdown Hangs — CancelledError Swallowing
CancelledError swallowed by broad except blocks (a bare except:, except BaseException:, or except Exception: on Python 3.7 and earlier) causes 30+ second SIGTERM hangs.
20+ years shipping production Python across data and backend systems. Written from production experience, not tutorials.
- Coroutine = async def function that yields control at each await point
- Event loop runs a single thread, switching tasks when they await I/O
- await does not block the thread – it suspends the coroutine until the future resolves
- asyncio.gather runs awaitables concurrently; return_exceptions=True returns failures as results instead of raising the first one
- CancelledError must be re-raised; swallowing it breaks shutdown
- Blocking the loop (e.g. time.sleep) freezes all concurrent tasks
Imagine you're a chef cooking three dishes at once. Instead of standing over the pasta staring at it until it boils, you start the water, then chop vegetables, then check the sauce — switching tasks whenever one needs waiting. Python's asyncio works exactly like that chef: one worker (the event loop) juggles many tasks by switching between them the moment a task hits a waiting period, like a network request. You get the speed of doing many things at once without the chaos of hiring multiple cooks (threads).
Every modern Python backend eventually hits the same wall: your code spends 90% of its time waiting — waiting for a database to respond, waiting for an API call to return, waiting for a file to be read off disk. Threads are the classic answer, but they carry a tax: memory overhead, the GIL playing referee, and race conditions that appear only in production at 3am. There's a better tool for I/O-bound concurrency, and it's been in the standard library since Python 3.4, matured dramatically in 3.7, and is now the backbone of frameworks like FastAPI, aiohttp, and Starlette.
asyncio solves the 'waiting problem' through cooperative multitasking. Instead of the OS forcibly switching between threads, your coroutines voluntarily yield control back to an event loop whenever they'd otherwise block. One thread, one event loop, potentially thousands of concurrent operations — all without a mutex in sight. The catch is that everything in your call stack must understand this contract, which is what makes asyncio feel like learning a new dialect of Python at first.
By the end of this article you'll understand how the event loop actually schedules work, how coroutines differ from generators at the bytecode level, why blocking the event loop is the cardinal sin of async Python, and how to structure real production services — including proper cancellation, timeouts, error propagation, and the patterns that separate async code that scales from async code that silently serializes everything.
Why asyncio Shutdown Hangs — CancelledError Swallowing
asyncio is Python's built-in library for writing concurrent code using the async/await syntax. It implements cooperative multitasking on a single thread via an event loop that schedules coroutines — functions defined with async def that can suspend execution at await points. The core mechanic is explicit yielding: a coroutine voluntarily pauses, allowing the loop to run other coroutines until the awaited operation completes.
In practice, asyncio works by wrapping I/O-bound operations (network requests, sleep; regular file reads have to be offloaded to a thread) into awaitable objects. The event loop polls registered file descriptors and timers, resuming coroutines when data arrives or time passes. Key properties: all coroutines share one thread, so code between two await points runs without interruption and needs no lock, but shared state can still change across an await, and a long-running CPU-bound coroutine blocks the entire loop. Cancellation is cooperative: a CancelledError is raised at the next await, but if a coroutine catches and ignores it, shutdown hangs.
Use asyncio when your application is I/O-bound with many concurrent connections: web servers, API gateways, data pipelines. It matters because it enables handling tens of thousands of connections per process without the overhead of threads or processes. The critical nuance: asyncio does not make Python faster; it makes waiting faster. If your workload is CPU-bound, use multiprocessing; threads will not help because of the GIL.
- Use asyncio.shield() explicitly when you intend to suppress cancellation.
- task.cancel() with explicit handling.
How the Event Loop Actually Works — Not the Simplified Version
Most explanations stop at 'the event loop runs coroutines'. That's not enough when something breaks in production. The event loop is essentially a tight loop around a system call — select, epoll, or kqueue depending on your OS — that asks the kernel: 'which of these file descriptors are ready?'. When one is ready, the loop wakes up the coroutine that was waiting on it and resumes execution from the exact point it yielded.
A coroutine is a function defined with async def. Under the hood it compiles to a code object whose frame can be suspended and resumed. When you await something, Python calls __await__ on the awaitable, which ultimately bottoms out in a Future object. That Future registers a callback with the event loop. The coroutine's frame is frozen — local variables and all — until the Future resolves, at which point the loop schedules its resumption.
This is why you can have 10,000 concurrent HTTP connections on a single thread: no connection holds the thread while waiting. Each coroutine's frame costs roughly 1-2 KB of heap memory — orders of magnitude cheaper than an OS thread's 1-8 MB stack.
Understanding this model is what makes the rule 'never block the event loop' feel obvious rather than arbitrary: if your coroutine calls time.sleep(5), the entire event loop freezes for five seconds because it's still on the same thread. Every other 'concurrent' coroutine is stuck waiting.
Event Loop Architecture — Visualising the Lifecycle
The event loop is not a black box. It follows a predictable cycle, and understanding each phase helps diagnose hangs and latency. At its core, the loop continuously runs three phases:
- Polling: asks the OS (via epoll/kqueue/select) which file descriptors are ready for reading or writing. This is where the loop blocks when there is truly nothing to do, with a timeout derived from the nearest scheduled timer so it wakes in time to run it.
- Running ready callbacks: for each ready FD, the loop invokes the associated callback (e.g., resuming a coroutine that was waiting on a socket). Each iteration runs the callbacks that were ready when the iteration started; callbacks added in the meantime wait for the next iteration, which keeps one busy task from starving the others.
- Scheduling: the loop moves timers that have come due (e.g., from asyncio.sleep or call_later) onto the ready queue. Timers are stored in a heap ordered by deadline. The loop also discards cancelled timers and computes the next poll timeout.
The loop also maintains internal data structures: a ready deque, a heap of scheduled timers, and a mapping of FDs to callbacks. Debug mode, enabled with the PYTHONASYNCIODEBUG=1 environment variable, logs any callback that runs longer than loop.slow_callback_duration.
A common performance win is tuning the loop.slow_callback_duration: set it to something aggressive (e.g., 10ms) in development to catch when a callback takes too long. In production, use structured logging to emit metric-like timestamps for each loop iteration.
Here’s a simplified flow of the event loop’s main loop:
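The cycle can be sketched as a toy loop. This is a minimal illustration and not asyncio's actual implementation: ToyLoop and its call_soon, call_later and run methods are hypothetical names modelled loosely on the asyncio API, and because the sketch has no sockets, the polling phase is replaced by a plain sleep until the next timer is due.

```python
import heapq
import itertools
import time
from collections import deque


class ToyLoop:
    """Toy event loop: a ready deque plus a heap of timers (no real I/O polling)."""

    def __init__(self):
        self.ready = deque()           # callbacks to run on the next pass
        self.timers = []               # heap of (deadline, seq, callback)
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def call_soon(self, callback):
        self.ready.append(callback)

    def call_later(self, delay, callback):
        deadline = time.monotonic() + delay
        heapq.heappush(self.timers, (deadline, next(self._seq), callback))

    def run(self):
        while self.ready or self.timers:
            # 1. "Poll": with nothing ready, sleep until the nearest timer is due.
            if not self.ready and self.timers:
                time.sleep(max(0.0, self.timers[0][0] - time.monotonic()))
            # 2. Move timers whose deadline has passed onto the ready queue.
            now = time.monotonic()
            while self.timers and self.timers[0][0] <= now:
                self.ready.append(heapq.heappop(self.timers)[2])
            # 3. Run only the callbacks that were ready at the start of this pass.
            for _ in range(len(self.ready)):
                self.ready.popleft()()


order = []
loop = ToyLoop()
loop.call_later(0.02, lambda: order.append("timer 20ms"))
loop.call_soon(lambda: order.append("soon"))
loop.call_later(0.01, lambda: order.append("timer 10ms"))
loop.run()
print(order)  # ['soon', 'timer 10ms', 'timer 20ms']
```

The real loop does the same bookkeeping, except that step 1 is a selector call that also reports ready file descriptors.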
- In development, set loop.slow_callback_duration=0.01.
- In production, log iteration_time, pending_tasks, active_handles.
Tasks, Futures, and Awaitable Contracts — The Object Model Behind await
There are three things you can await in Python: coroutines, Tasks, and Futures. Understanding the difference is critical for writing correct async code.
A coroutine object (what you get when you call an async def function without await) is a lazy generator-like object. Nothing runs until the event loop drives it. If you write fetch_user_profile(42) without awaiting it, Python will create the object, run none of its code, and emit a "coroutine ... was never awaited" RuntimeWarning when the object is garbage-collected.
A Future is a low-level promise object. It starts in a pending state and transitions to done (with a result or an exception) exactly once. You almost never create Futures manually in application code — they live inside the networking and I/O layers.
A Task wraps a coroutine and schedules it to run on the event loop immediately via asyncio.create_task(). This is the key difference from a bare await: await coroutine() runs that coroutine sequentially from your perspective, while asyncio.create_task(coroutine()) schedules it concurrently and returns a handle you can await later.
The await keyword calls __await__() on the right-hand side object, which must return an iterator. For Tasks and Futures that iterator suspends the current coroutine and resumes it when the Future resolves. This is the same protocol yield from used in Python 3.4 generators; asyncio is built on top of that generator machinery.
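The difference between a bare await and create_task() is easiest to see with a timer. The sketch below is illustrative only: fetch is a hypothetical stand-in for an I/O call, and the 0.1 s latency is an assumption chosen to keep the run short.

```python
import asyncio
import time


async def fetch(item_id: int) -> int:
    """Stand-in for an I/O call; the 0.1 s sleep is an assumed latency."""
    await asyncio.sleep(0.1)
    return item_id * 2


async def sequential() -> list:
    # Each await finishes before the next call starts: roughly 3 x 0.1 s.
    return [await fetch(i) for i in range(3)]


async def concurrent() -> list:
    # create_task schedules all three up front, so the waits overlap: roughly 0.1 s.
    tasks = [asyncio.create_task(fetch(i)) for i in range(3)]
    return [await task for task in tasks]


async def timed(coro_func):
    start = time.perf_counter()
    result = await coro_func()
    return result, time.perf_counter() - start


seq_result, seq_time = asyncio.run(timed(sequential))
con_result, con_time = asyncio.run(timed(concurrent))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both functions return the same list; only the elapsed time differs.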
If you write result = await my_coroutine() inside a loop, you've written synchronous code with extra steps. The coroutine only becomes concurrent when it's wrapped in a Task via create_task() or gather(). A sequential approach over three independent calls takes about 3x longer than the concurrent one, and it is still async code. Profile first if async 'isn't helping'.
Avoid await coro() inside a loop for independent operations; use gather or TaskGroup.
Cancellation, Timeouts, and Error Handling — The Production Minefield
Happy-path async code is easy. Production async code is defined by how it handles failure. Three scenarios trip up even experienced developers: task cancellation, timeouts, and exception propagation through gather.
Cancellation in asyncio is cooperative, not forcible. When you call task.cancel(), Python injects a CancelledError into the coroutine at its next await point. If the coroutine catches CancelledError and doesn't re-raise it, the cancellation is silently swallowed, which is a serious bug. Always re-raise CancelledError or use finally blocks for cleanup.
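A small sketch of both behaviours side by side may help. The two coroutines are hypothetical: swallows_cancel shows the bug, cooperates shows the re-raise pattern, and the sleeps stand in for real work.

```python
import asyncio

log = []


async def swallows_cancel():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        log.append("swallowed")    # BUG: no re-raise, so the task carries on
    await asyncio.sleep(0.05)      # work continues despite the cancel request
    return "still finished"


async def cooperates():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        log.append("cleaning up")
        raise                      # let the cancellation propagate
    finally:
        log.append("finally ran")  # cleanup that must run either way


async def main():
    bad = asyncio.create_task(swallows_cancel())
    good = asyncio.create_task(cooperates())
    await asyncio.sleep(0)         # let both tasks reach their first await
    bad.cancel()
    good.cancel()
    results = await asyncio.gather(bad, good, return_exceptions=True)
    return results, bad.cancelled(), good.cancelled()


results, bad_cancelled, good_cancelled = asyncio.run(main())
print(results[0], bad_cancelled, good_cancelled)
```

The task that swallowed the error reports cancelled() as False and returns a normal result, which is exactly what makes a shutdown routine wait on it.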
Timeouts are best handled with asyncio.timeout() (Python 3.11+) or asyncio.wait_for() on earlier versions. Both convert the internal CancelledError into a TimeoutError so you can distinguish 'took too long' from 'was cancelled by a parent task'.
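As a minimal sketch, here is the wait_for() form, which also runs on versions before 3.11. slow_query is a hypothetical stand-in for a slow call, and the 0.05 s timeout is an arbitrary illustrative value.

```python
import asyncio

events = []


async def slow_query() -> str:
    """Hypothetical slow operation standing in for a database call."""
    try:
        await asyncio.sleep(1)
        return "rows"
    except asyncio.CancelledError:
        # On timeout, wait_for cancels the inner coroutine: clean up, then re-raise.
        events.append("inner cancelled")
        raise


async def main() -> str:
    try:
        return await asyncio.wait_for(slow_query(), timeout=0.05)
    except asyncio.TimeoutError:
        # The caller sees TimeoutError, not CancelledError.
        return "timed out"


outcome = asyncio.run(main())
print(outcome)  # timed out
```

asyncio.TimeoutError is used here because it resolves correctly on every supported version; since Python 3.11 it is an alias of the built-in TimeoutError.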
Exception propagation through gather has a sharp edge: by default, if one task raises, gather re-raises that first exception to the caller immediately, but it does not cancel the remaining awaitables; they keep running in the background with nobody collecting their results. You lose the results of tasks that succeeded. Pass return_exceptions=True in production so you can inspect every result individually.
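The following sketch shows both modes of gather. The job coroutine and its delays are hypothetical; the second half relies on the documented behaviour that, in the default mode, the other awaitables are not cancelled when one of them raises.

```python
import asyncio

finished = []


async def job(name: str, delay: float, fail: bool = False) -> str:
    await asyncio.sleep(delay)
    if fail:
        raise ValueError(f"{name} failed")
    finished.append(name)
    return name


async def main():
    # return_exceptions=True: everything runs to completion, errors come back as values.
    mixed = await asyncio.gather(
        job("a", 0.01), job("b", 0.02, fail=True), job("c", 0.03),
        return_exceptions=True,
    )

    finished.clear()
    first_error = None
    # Default mode: the first exception is raised to the caller right away...
    try:
        await asyncio.gather(job("x", 0.01, fail=True), job("y", 0.05))
    except ValueError as exc:
        first_error = str(exc)
    done_at_failure = list(finished)   # "y" has not finished yet
    await asyncio.sleep(0.1)           # ...but "y" was not cancelled and completes later
    return mixed, first_error, done_at_failure, list(finished)


mixed, first_error, done_at_failure, done_later = asyncio.run(main())
print(mixed, first_error, done_at_failure, done_later)
```

If a failure should stop the siblings as well, keep the Task objects and cancel them yourself, or use a TaskGroup.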
asyncio.TaskGroup (Python 3.11+) is now the preferred pattern — it provides structured concurrency where all child tasks are cancelled if any one fails, and all exceptions are surfaced together via an ExceptionGroup.
Blocking the Event Loop — How to Detect It and What to Do Instead
Blocking the event loop is the cardinal sin of async Python and the most common source of 'asyncio isn't faster than sync code' complaints. Any call that holds the thread without yielding (a time.sleep(), a requests.get(), a CPU-heavy loop, even a naively-called json.loads() on a 50MB payload) freezes every other coroutine in your application.
The asyncio event loop has a built-in slow-callback detector: set loop.slow_callback_duration to a low threshold (e.g. 50ms) and enable debug mode. Python will log a warning whenever a callback holds the loop longer than that threshold. This is invaluable in production profiling.
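A minimal sketch of the detector in action, assuming the warning is emitted on the standard "asyncio" logger (which is where debug mode reports slow callbacks). ListHandler is a hypothetical helper that collects log messages so they can be inspected; the 10 ms threshold and the deliberate 50 ms time.sleep are illustrative values.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects formatted log messages in a list (helper for this sketch only)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def blocks_the_loop():
    asyncio.get_running_loop().slow_callback_duration = 0.01  # 10 ms threshold
    time.sleep(0.05)        # deliberate blocking call, held for about 50 ms
    await asyncio.sleep(0)


handler = ListHandler()
logging.getLogger("asyncio").addHandler(handler)
asyncio.run(blocks_the_loop(), debug=True)   # debug=True enables the detector

slow_warnings = [m for m in handler.messages if m.startswith("Executing")]
print(slow_warnings)
```

Setting PYTHONASYNCIODEBUG=1 has the same effect as debug=True without touching the code.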
For blocking I/O you can't make async (a synchronous DB driver, a legacy library), use loop.run_in_executor() to offload work to a thread pool. For CPU-bound work, use ProcessPoolExecutor; threads won't help here because of the GIL.
asyncio.to_thread() (Python 3.9+) is a clean shorthand for run_in_executor with the default thread pool. It's idiomatic for wrapping synchronous file I/O, synchronous HTTP calls, or any legacy synchronous function you can't replace yet.
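Here is a small sketch of asyncio.to_thread() keeping the loop responsive. legacy_read is a hypothetical blocking function (imagine a synchronous driver), and heartbeat exists only to show that other coroutines keep running while the thread is busy.

```python
import asyncio
import time


def legacy_read() -> str:
    """Hypothetical blocking call, e.g. a synchronous driver or file read."""
    time.sleep(0.2)
    return "payload"


async def heartbeat(ticks: list) -> None:
    # Keeps ticking only if the event loop stays free.
    for _ in range(5):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def main():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    data = await asyncio.to_thread(legacy_read)   # runs in the default thread pool
    done_at = time.perf_counter()
    await beat
    return data, ticks, done_at


data, ticks, done_at = asyncio.run(main())
print(data, len(ticks))
```

All five heartbeat ticks land before the blocking call returns; calling legacy_read() directly inside main would delay every tick until it finished.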
The mental model: the event loop is the single thread. Think of it as a very important person's personal assistant. Every synchronous call is a task that physically occupies the assistant. Every await is handing a task to someone else while the assistant handles the next thing.
Use asyncio.to_thread() for legacy sync code; use ProcessPoolExecutor for CPU-bound tasks.
Handling Blocking Code — A Strategy Guide for run_in_executor and to_thread
When you inevitably encounter a blocking call inside a coroutine, you have three options: replace it with an async alternative, offload it to a thread pool, or offload it to a process pool. The right choice depends on the nature of the work.
Decision Strategy:
- Is there an async-native library? Use it directly (e.g., httpx.AsyncClient instead of requests, aiofiles instead of file I/O). This is the fastest path, with zero context switching overhead.
- Is the work I/O-bound but synchronous? (e.g., legacy database driver, filesystem with open()). Use asyncio.to_thread() (Python 3.9+) or loop.run_in_executor(None, func, ...). Both submit the call to the default ThreadPoolExecutor, freeing the event loop. The default thread pool has min(32, os.cpu_count() + 4) workers, which is sufficient for most I/O workloads. Tune max_workers via a custom executor if you see thread starvation (e.g., many slow synchronous calls).
- Is the work CPU-bound? (e.g., parsing JSON, image processing, cryptography). Use ProcessPoolExecutor via loop.run_in_executor(pool, func, ...). Threads won’t help because of the GIL. A process pool gives true parallelism. Be mindful of the overhead: each call pickles the function and arguments, so only use it for work that takes at least several hundred milliseconds.
- Is the work a blocking C extension that releases the GIL? (e.g., some numpy routines). You can use a thread pool: the GIL is released during the operation, so threads provide parallelism. Profile to confirm.
Here’s a quick reference table:
| Scenario | Recommended Approach | Pitfall to Avoid |
|---|---|---|
| Synchronous HTTP call | httpx.AsyncClient or aiohttp | Using requests directly → blocks loop |
| Synchronous file read | aiofiles or asyncio.to_thread(open) | Blocking the loop for large files |
| CPU-heavy pure Python | ProcessPoolExecutor | Using threads → still GIL-bound |
| Legacy sync library (I/O) | asyncio.to_thread() or loop.run_in_executor() | Forgetting to await the future |
| Multiple independent blocking calls | asyncio.gather with to_thread | Sequential offloading (defeats concurrency) |
Always wrap blocking calls with a timeout using asyncio.wait_for on the offloaded future, so you can detect when the executor worker is stuck. The timeout abandons the await, not the worker: the thread keeps running until the blocking call returns, but your coroutine no longer hangs on it.
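The timeout pattern can be sketched with a dedicated pool. Everything named here is illustrative: stuck_call is a hypothetical blocking function, max_workers=4 is an arbitrary pool size, and the timings are shortened for the example.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def stuck_call() -> str:
    """Hypothetical blocking call that takes longer than we are willing to wait."""
    time.sleep(0.3)
    return "late result"


async def main(pool: ThreadPoolExecutor) -> str:
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(pool, stuck_call)   # returns an awaitable future
    try:
        return await asyncio.wait_for(future, timeout=0.05)
    except asyncio.TimeoutError:
        # The await is abandoned; the worker thread still runs stuck_call to the end.
        return "gave up waiting"


pool = ThreadPoolExecutor(max_workers=4)   # dedicated pool, sized for illustration
start = time.perf_counter()
outcome = asyncio.run(main(pool))
gave_up_after = time.perf_counter() - start
pool.shutdown(wait=True)                   # blocks until the stuck worker finishes
print(outcome, f"{gave_up_after:.2f}s")
```

Because a timed-out worker still occupies a pool slot, repeated timeouts against a small pool can exhaust it; that is the starvation case a larger custom pool is meant to absorb.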
The default thread pool has min(32, os.cpu_count() + 4) workers. This is usually fine for short I/O calls. If you have many long-running synchronous calls (e.g., a DB driver that blocks for seconds), increase the pool size to avoid starvation. Monitor the number of active threads and adjust. For CPU-bound work, stick to ProcessPoolExecutor; the GIL limits threads even if you have many cores.
Use asyncio.to_thread() for simple I/O offload; use loop.run_in_executor with a custom pool for advanced control.
Production Patterns: Structured Concurrency with TaskGroup vs gather
asyncio.gather() and asyncio.TaskGroup both run multiple tasks concurrently, but they differ fundamentally in failure handling. By default gather() raises the first exception to the caller and leaves the other awaitables running unobserved; their results, and any later exceptions, never reach you. TaskGroup (Python 3.11+) provides structured concurrency: every task is a child of the group, and if any one fails, all remaining children are cancelled. When all have finished, the exceptions are merged into an ExceptionGroup.
The critical distinction: with gather(return_exceptions=False), you get the first exception while the siblings silently keep running. With gather(return_exceptions=True), nothing is raised or cancelled; every awaitable runs to completion and you get a list of mixed results and exceptions to inspect. TaskGroup avoids leaked work: you commit to either all succeed or the rest are cancelled, and you handle exceptions after the context manager exits.
In high-throughput services, prefer gather with return_exceptions=True when you want to preserve successful results despite some failures — for example, fetching data from multiple caches where one miss is acceptable. Use TaskGroup when you want atomicity: if any part of the workflow fails, the whole operation should be abandoned.
Another pattern: asyncio.wait() gives fine-grained control over FIRST_COMPLETED, FIRST_EXCEPTION, ALL_COMPLETED. It's useful for race patterns (e.g., fetch from primary, fallback to secondary on timeout).
- gather(return_exceptions=False): first failure is re-raised immediately; siblings are not cancelled and keep running. Use when a single failure should abort the whole batch, and cancel the remaining tasks yourself.
- gather(return_exceptions=True): no cancellation on failure; you get a list of mixed results/exceptions. Use for fault-tolerant batches (e.g. multiple cache lookups).
- TaskGroup: if any child fails, all siblings are cancelled. Exceptions collected in ExceptionGroup after all children finish. Use for atomic workflows.
- wait(FIRST_COMPLETED): get the first task that finishes, useful for race patterns (primary/fallback) or timeout as a task.
Asyncio Batching Strategies — gather() vs TaskGroup vs as_completed
Choosing the right batching primitive is critical for both performance and correctness. Below is a comparison of the three main approaches. Use this as a quick reference when designing production async workflows.
| Feature | asyncio.gather() | asyncio.TaskGroup | asyncio.as_completed() |
|---|---|---|---|
| Python version | 3.5+ | 3.11+ | 3.5+ |
| Concurrency type | Wraps coroutines in Tasks implicitly | Structured concurrency with explicit task creation | Iterate over futures as they complete |
| Failure behavior (default) | First exception raised immediately; siblings keep running | Remaining children cancelled; all exceptions collected in ExceptionGroup | Each future yields result or exception as it completes; no cancellation of others |
| Partial results on failure | Not returned unless return_exceptions=True | Lost (siblings cancelled) | Preserved (each future independent) |
| Works with existing tasks? | Yes, pass list of awaitables | Only via tg.create_task() | Yes, iterate over any iterable of futures |
| Memory overhead | Low (internal list of futures) | Low (tracked by context manager) | Low (one future at a time) |
| Use case | Fire-and-wait, when you need a single list of results | Atomic workflows, where all-or-nothing semantics are required | Streaming results, e.g., making many HTTP requests and processing each as soon as it arrives |
| Error handling pattern | Wrap with try/except or use return_exceptions=True | Use except* to handle multiple exception types | Handle each result as it comes; use try/except around each await for fine-grained control |
When to use each:
- gather: Best for most cases where you have a fixed set of I/O operations and want to collect all results at once. Use return_exceptions=True for fault tolerance.
- TaskGroup: Use when you need structured concurrency and atomicity: if one task fails, you want the entire group to abort cleanly. Excellent for complex workflows where cleanup matters.
- as_completed: Use when you want to process results as soon as they arrive, rather than waiting for the slowest task. Useful for progressive UI updates or streaming pipelines.
All three can be combined: for example, use TaskGroup for a batch of related operations, and inside a task use gather to collect sub-results. The key is to match the semantic to your failure tolerance.
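For the streaming case, a minimal as_completed sketch looks like this. The fetch coroutine and its delays are hypothetical; the tasks are created up front so that handles to the pending work are available.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)   # stand-in for a request with this latency
    return name


async def main() -> list:
    tasks = [
        asyncio.create_task(fetch("slow", 0.06)),
        asyncio.create_task(fetch("fast", 0.01)),
        asyncio.create_task(fetch("medium", 0.03)),
    ]
    arrival_order = []
    for next_done in asyncio.as_completed(tasks):
        arrival_order.append(await next_done)   # handle each result as it lands
    return arrival_order


arrival_order = asyncio.run(main())
print(arrival_order)  # ['fast', 'medium', 'slow']
```

Results arrive in completion order rather than submission order, so any per-result identity has to travel inside the result itself.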
With requests of uneven latency, gather returns only when the slowest request completes, meaning that if the slowest takes 2s you wait 2s before processing the first result. as_completed gives you each result as soon as it finishes, allowing you to start processing faster, especially useful for real-time dashboards or progressive data loading.
as_completed can reduce time-to-first-byte for end users. However, it requires managing each future individually, which can increase code complexity. gather with return_exceptions=True remains the workhorse for most batch operations.
When using as_completed, prefer passing a list of create_task results: as_completed schedules bare coroutines concurrently too, but holding the Task objects lets you cancel or inspect whatever is still pending.
Use gather for simplicity, TaskGroup for atomicity, and as_completed for streaming.
The async/await Keyword Contract — What Really Happens Under the Hood
Stop treating async def as magic. It's a compiler transformation that rewrites your function into a state machine. When you call an async function, Python doesn't execute a single line of your code. It returns a coroutine object — a frozen generator that holds your function's frame and local variables.
The await keyword is the yield point. It suspends execution by returning control to the event loop, passing a Future or awaitable that signals when the coroutine should resume. The event loop tracks these awaitables in its run queue. When the underlying I/O completes or a timer fires, the loop calls .send(None) on your coroutine, resuming exactly where it paused.
This is not threads. There's no preemption. Your coroutine runs until it hits an await that blocks, then it parks itself. That's why blocking inside a coroutine kills concurrency — the loop can't switch to another task until your coroutine voluntarily yields.
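To make the suspend-and-resume mechanics concrete, the sketch below drives a coroutine by hand with .send(None), with no event loop involved. Pause is a hypothetical minimal awaitable written for this example; real code would await a Future or another coroutine instead.

```python
class Pause:
    """Minimal custom awaitable: suspends once, then produces a value."""

    def __await__(self):
        yield "paused"    # hands control to whoever is driving the coroutine
        return 42


trace = []


async def compute():
    trace.append("body started")
    value = await Pause()
    trace.append("resumed")
    return value + 1


coro = compute()                 # nothing in the body has run yet
trace.append("coroutine object created")
yielded = coro.send(None)        # run until the first suspension point
try:
    coro.send(None)              # resume; the coroutine runs to completion
except StopIteration as stop:
    result = stop.value          # the return value travels on StopIteration

print(trace, yielded, result)
```

This is the same exchange a Task performs on the coroutine's behalf each time the awaited Future completes.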
Forgetting await returns the coroutine object, not the result, and silently drops the exception. The linter won't catch this in complex chaining.
async def compiles to a state machine. await is the yield point that surrenders control to the event loop: no preemption, only voluntary suspension.
Coroutine Chaining — Why Your Pipeline Must Be an Awaitable Stack
Real async applications chain coroutines. Your entry point awaits a fetch_payment(), which awaits validate_card(), which awaits authorize_gateway(). Each await peels back one layer of the call stack. This is not free: every suspension and resume costs a microsecond of loop overhead.
The performance trap: if you chain too many shallow coroutines that just delegate to the next, you create a cascading series of mini-suspensions. The event loop switches between runnable tasks, but each chain switch requires a context switch in the C implementation. Benchmark with asyncio.get_running_loop().slow_callback_duration to detect when your chain latency exceeds 100ms.
Design rule: flatten your chains. Combine multiple small async operations into one coroutine if they share I/O context. Use asyncio.gather() only when tasks are genuinely independent; false parallelism through chains that serialize on the same resource is slower than synchronous code.
Reserve gather() for fire-and-forget fan-out where you don't need cancellation propagation.
The asyncio Event Loop — Multitasking Without a Scheduler (and Why It Matters)
The event loop is not a scheduler. There's no preemptive time-slicing. It's a select()-style reactor loop that polls registered file descriptors and callbacks. When you await asyncio.sleep(5), the loop registers a timer callback in its internal heap. When the timer fires, the loop calls .send() on your coroutine to resume it.
The critical difference from threads: the loop can only run one task at a time. If you have 10,000 concurrent I/O operations, the loop cycles through them, checking readiness. No context switch overhead. No GIL contention. But the moment any single coroutine does CPU work without yielding — parsing a JSON response, hashing a password — the entire loop freezes. All 10,000 connections stall.
Design your tasks to yield frequently. For blocking I/O, use loop.run_in_executor(None, blocking_func): passing None selects the default ThreadPoolExecutor, which has min(32, os.cpu_count() + 4) workers. For CPU work, pass a ProcessPoolExecutor instead, as in loop.run_in_executor(pool, cpu_bound_func), because threads stay serialized by the GIL. Tune the pool size based on your I/O vs CPU ratio. Blocking for more than 50ms in a single coroutine is a production incident waiting to happen.
Use run_in_executor for CPU work.
Rate Limiting with Semaphores — Controlling Concurrency in Production
Without rate limiting, async tasks can overwhelm external APIs, databases, or rate-limited services. asyncio.Semaphore caps the number of concurrent coroutines executing a critical section. Create a semaphore with the max concurrency count, then await it before each protected operation. The semaphore blocks when full, releasing a slot after the context exits. This prevents 429 errors and kernel resource exhaustion. Use it with TaskGroup or gather to batch parallel work. Always set an upper bound based on the downstream service limits, never guess. Combine with exponential backoff for retries to build resilient pipelines.
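A minimal sketch of the pattern, using async with so the slot is released even if the call raises. MAX_CONCURRENCY, call_api and the stats dictionary are hypothetical; in a real service the limit would come from the downstream provider's documented quota.

```python
import asyncio

MAX_CONCURRENCY = 3   # assumed downstream limit, for illustration only


async def call_api(i: int, sem: asyncio.Semaphore, stats: dict) -> int:
    async with sem:                    # waits here when all slots are taken
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)      # stand-in for the real request
        stats["active"] -= 1
        return i


async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENCY)   # created inside the running loop
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(call_api(i, sem, stats) for i in range(10)))
    return results, stats["peak"]


results, peak = asyncio.run(main())
print(results, peak)
```

Ten calls are started at once, but the measured peak never exceeds the semaphore's limit.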
Retry Logic with Exponential Backoff — Resilient Async Calls
Network calls fail. A retry strategy with exponential backoff prevents cascading failures and respects server recovery time. The classic pattern: on failure, wait 2^attempt seconds (plus jitter), then retry up to a max retry count. In asyncio, wrap the call in a loop: try the operation, catch expected exceptions, sleep asynchronously with asyncio.sleep (never time.sleep—that blocks the event loop). Add random jitter to avoid thundering herd. Use a timeout per attempt to avoid hanging. Integrate with semaphore rate limiting for production-grade reliability.
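One possible shape for such a retry helper is sketched below. It is not a definitive implementation: retry, its parameters, and the flaky operation are all hypothetical, the delays are scaled down to milliseconds so the example runs quickly, and it uses "full jitter" (a random wait between zero and the exponential delay) as one of several reasonable jitter strategies.

```python
import asyncio
import random


async def retry(operation, *, attempts=4, base_delay=0.01, timeout=1.0):
    """Retry an async callable with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            # Per-attempt timeout so a hung call cannot stall the whole retry loop.
            return await asyncio.wait_for(operation(), timeout=timeout)
        except (ConnectionError, asyncio.TimeoutError):
            if attempt == attempts - 1:
                raise                                      # out of retries
            delay = base_delay * (2 ** attempt)            # 1x, 2x, 4x, ...
            await asyncio.sleep(random.uniform(0, delay))  # never time.sleep here


calls = {"count": 0}


async def flaky():
    """Hypothetical operation that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = asyncio.run(retry(flaky))
print(result, calls["count"])  # ok 3
```

Only the listed exception types are retried; CancelledError is deliberately not caught, so a shutdown request still interrupts the loop between attempts.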
Async Iterators and Async Comprehensions — Streaming Data Efficiently
Async iterators let you consume data as it arrives, not all at once. Define an __aiter__ returning self and __anext__ raising StopAsyncIteration when done. Use async for in loops to process chunks—ideal for streaming API responses or file lines. Async comprehensions (async for inside list/dict/set creation) build collections from async sources in one expression. Both require an async context (inside an async def). They avoid buffering entire datasets, reducing memory pressure. Combine with async generators (yield in async def) for clean, lazy pipelines.
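The three constructs fit in one short sketch. Countdown and lines are hypothetical examples, and asyncio.sleep(0) stands in for waiting on the next chunk of data.

```python
import asyncio


class Countdown:
    """Class-based async iterator: __aiter__ returns self, __anext__ ends the stream."""

    def __init__(self, start: int):
        self.current = start

    def __aiter__(self):
        return self

    async def __anext__(self) -> int:
        if self.current <= 0:
            raise StopAsyncIteration
        await asyncio.sleep(0)     # stand-in for waiting on the next chunk
        self.current -= 1
        return self.current + 1


async def lines():
    """Async generator: the same idea with far less boilerplate."""
    for text in ("alpha", "beta", "gamma"):
        await asyncio.sleep(0)
        yield text


async def main():
    counted = [n async for n in Countdown(3)]               # async list comprehension
    lengths = {text: len(text) async for text in lines()}   # async dict comprehension
    return counted, lengths


counted, lengths = asyncio.run(main())
print(counted, lengths)
```

For most streaming code the async generator form is enough; the class form is useful when the iterator also has to carry state or expose extra methods.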
Async I/O Isn’t Simple
Many developers treat async I/O as a magic performance switch: just add async/await and everything speeds up. The reality is far more subtle. Async Python runs all coroutines on a single thread, interleaving them via cooperative yielding. If any coroutine blocks, even for 50ms, the entire event loop stalls. Unlike threading, the OS won’t preempt a misbehaving task. The cognitive load skyrockets when debugging race conditions in shared state because asyncio gives you no locks for free. Mixing synchronous libraries (requests, time.sleep) with async code silently serializes your program. Profiling async bottlenecks requires specialized tools like asyncio's debug mode and loop.slow_callback_duration. The paradox: async I/O simplifies throughput for I/O-bound problems but adds complexity for every other concern: error propagation, cancellation, and resource cleanup. Before adopting asyncio, ask whether your bottleneck is genuinely I/O latency, not CPU work or developer productivity.
Use asyncio.to_thread() or loop.run_in_executor() to offload synchronously blocking work.
Libraries Supporting Async I/O
The async ecosystem is fragmented but essential for production systems. For HTTP, aiohttp and httpx (with async support) replace requests. Databases: asyncpg (PostgreSQL) and aiosqlite (SQLite) give true non-blocking queries. Redis: aioredis (merged into redis-py as redis.asyncio). For task queues, use aio-pika (RabbitMQ) or nats-py. If you need HTTP servers, aiohttp or FastAPI with uvicorn’s async workers provide high throughput. The tricky part is that many popular libraries (Django ORM, SQLAlchemy 1.x, boto3) lack native async support, forcing you to run them in executors. Newer tools like SQLAlchemy 2.0 async, beanie (MongoDB ODM), and motor (MongoDB driver) fill gaps. Always verify the library’s async support by checking for async def methods or look for an async submodule. Pro tip: use anyio’s backends to write library-agnostic async code that works with both asyncio and trio, future-proofing your stack against event loop lock-in.
Verify a library's async support under asyncio.run() before trusting it.
asyncio.sleep() vs time.sleep() — The One That Kills Your Event Loop
You call time.sleep(2) inside an async function thinking you're just pausing. What you've actually done is freeze the entire event loop: all your other coroutines, their network calls, your semaphores, everything stops dead for 2 seconds. time.sleep() is a blocking call that suspends the whole Python thread. The event loop can't switch to another task because it never gets a chance to run. asyncio.sleep() yields control back to the event loop, which then schedules other pending coroutines during that 2-second pause. The fundamental distinction: blocking vs yielding. One starves the loop, the other feeds it. Never import time.sleep in async code unless you enjoy watching your concurrent system collapse into sequential misery.
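The effect is easy to measure. In this sketch the two nap coroutines are hypothetical, and the pause is shortened to 50 ms so the comparison runs quickly.

```python
import asyncio
import time


async def blocking_nap():
    time.sleep(0.05)            # holds the thread: nothing else can run


async def cooperative_nap():
    await asyncio.sleep(0.05)   # yields: other coroutines run in the meantime


async def run_five(nap) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(nap() for _ in range(5)))
    return time.perf_counter() - start


blocking_time = asyncio.run(run_five(blocking_nap))        # about 5 x 0.05 s
cooperative_time = asyncio.run(run_five(cooperative_nap))  # about 0.05 s
print(f"time.sleep: {blocking_time:.2f}s, asyncio.sleep: {cooperative_time:.2f}s")
```

Five "concurrent" blocking naps take five times as long as one, because gather cannot overlap work that never yields.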
Synchronous libraries can call time.sleep() via retry loops. Always wrap sync calls in loop.run_in_executor() to avoid accidentally nuking your event loop's throughput.
time.sleep() blocks the loop; asyncio.sleep() yields to the loop. Use the right one or watch your concurrency die.
Eager Task Factory — Fire Coroutines Now, Schedule Later
You write task = asyncio.create_task(my_coro()) and think it starts immediately. It doesn't: the coroutine gets scheduled on the event loop's run queue and won't execute until the current task yields control via an await. That's lazy scheduling overhead. When you need a coroutine to begin execution right now, not after you finish your current await, you use the eager task factory (Python 3.12+): asyncio.Task(my_coro(), eager_start=True), or loop.set_task_factory(asyncio.eager_task_factory) to apply it loop-wide. This creates the task and runs it until its first suspension point before returning. Perfect for kicking off background work like health checks or cache warming that must start immediately while your main coroutine continues. The factory bypasses the queue and executes directly. Use it when latency of first execution matters. Overusing it on trivial coroutines adds unnecessary stack overhead, so reserve it for operations where starting instantly beats waiting for the next event loop iteration.
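The ordering difference can be observed directly. This sketch uses asyncio.eager_task_factory, which exists only on Python 3.12 and newer, so it checks for the attribute first and falls back to the default lazy behaviour on older interpreters; worker and the events list are hypothetical.

```python
import asyncio

EAGER_AVAILABLE = hasattr(asyncio, "eager_task_factory")   # Python 3.12+


async def main() -> list:
    events = []

    async def worker():
        events.append("worker started")
        await asyncio.sleep(0)

    if EAGER_AVAILABLE:
        # Loop-wide switch: new tasks run up to their first await right away.
        asyncio.get_running_loop().set_task_factory(asyncio.eager_task_factory)

    task = asyncio.create_task(worker())
    events.append("create_task returned")
    await task
    return events


events = asyncio.run(main())
print(events)
```

With the eager factory, "worker started" is recorded before create_task() returns; with the default factory the order is reversed.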
The Silent Shutdown Hang: A CancelledError Swallowed in Production
We assumed asyncio.run() handles cancellation automatically. We thought we didn't need to re-raise CancelledError because we caught all exceptions in the top-level coroutine.
The fix was to re-raise CancelledError with a bare raise. We also enabled debug mode (PYTHONASYNCIODEBUG=1) to log pending tasks during shutdown.
- Since Python 3.8, CancelledError inherits from BaseException, not Exception, so except Exception: will never catch it, but a bare except: or except BaseException: will. On Python 3.7 and earlier, except Exception: catches it too. Always re-raise CancelledError.
- Use asyncio.TaskGroup or asyncio.gather() with return_exceptions=True to minimise the risk of silent cancellation swallowing.
- Enable slow-callback detection and debug mode during dev and staging to catch hanging tasks before they reach production.
- Use asyncio.all_tasks(loop) to list all pending tasks.
- If coroutines run one after another, use asyncio.gather(*coros) or asyncio.create_task() for each. Check for accidental blocking calls (e.g. time.sleep instead of asyncio.sleep). Enable debug mode and check slow-callback warnings.
- When you use asyncio.wait_for() and a timeout occurs, the inner coroutine gets a CancelledError. Ensure the inner coroutine re-raises and doesn't hold resources. Use asyncio.to_thread() for any synchronous I/O that might have caused a real delay.
- Avoid nested asyncio.run() calls. If you need to run async code inside a synchronous context (e.g. pytest), use asyncio.run() only once at the entry point. For testing, use pytest-asyncio with the @pytest.mark.asyncio decorator.
- Run PYTHONASYNCIODEBUG=1 python app.py and check logs for 'Executing <Task ...> took N seconds'.
- Move the offending blocking call into asyncio.to_thread().
Key takeaways
- Offload blocking work with to_thread() or run_in_executor().
Common mistakes to avoid
- Using await on each coroutine inside a loop (sequentialisation): use asyncio.gather(), or create tasks with asyncio.create_task() before awaiting.
- Silently swallowing CancelledError: use except Exception: instead of bare except: to avoid catching CancelledError inadvertently. Always use finally for cleanup that must run regardless.
- Using time.sleep() instead of asyncio.sleep() inside a coroutine: replace time.sleep() with await asyncio.sleep(). For synchronous sleep during startup/shutdown, use a thread pool.
- Calling synchronous I/O directly (requests.get, file read) inside coroutine: wrap the call in asyncio.to_thread().
Interview Questions on This Topic
Explain the difference between asyncio.gather with return_exceptions=False and True. When would you use each?
Frequently Asked Questions