Advanced 6 min · March 05, 2026

Python asyncio Shutdown Hangs — CancelledError Swallowing

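CancelledError swallowed by a bare except: or except BaseException: block (or by except Exception: on Python 3.7 and earlier, where it still derived from Exception) causes 30+ second SIGTERM hangs.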

Naren · Founder
Quick Answer
  • Coroutine = async def function that yields control at each await point
  • Event loop runs a single thread, switching tasks when they await I/O
  • await does not block the thread — it suspends the coroutine until the future resolves
  • asyncio.gather runs awaitables concurrently; return_exceptions=True returns exceptions as results instead of raising the first one
  • CancelledError must be re-raised; swallowing it breaks shutdown
  • Blocking the loop (e.g. time.sleep) freezes all concurrent tasks

Every modern Python backend eventually hits the same wall: your code spends 90% of its time waiting — waiting for a database to respond, waiting for an API call to return, waiting for a file to be read off disk. Threads are the classic answer, but they carry a tax: memory overhead, the GIL playing referee, and race conditions that appear only in production at 3am. There's a better tool for I/O-bound concurrency, and it's been in the standard library since Python 3.4, matured dramatically in 3.7, and is now the backbone of frameworks like FastAPI, aiohttp, and Starlette.

asyncio solves the 'waiting problem' through cooperative multitasking. Instead of the OS forcibly switching between threads, your coroutines voluntarily yield control back to an event loop whenever they'd otherwise block. One thread, one event loop, potentially thousands of concurrent operations — all without a mutex in sight. The catch is that everything in your call stack must understand this contract, which is what makes asyncio feel like learning a new dialect of Python at first.

By the end of this article you'll understand how the event loop actually schedules work, how coroutines differ from generators at the bytecode level, why blocking the event loop is the cardinal sin of async Python, and how to structure real production services — including proper cancellation, timeouts, error propagation, and the patterns that separate async code that scales from async code that silently serializes everything.

How the Event Loop Actually Works — Not the Simplified Version

Most explanations stop at 'the event loop runs coroutines'. That's not enough when something breaks in production. The event loop is essentially a tight loop around a system call — select, epoll, or kqueue depending on your OS — that asks the kernel: 'which of these file descriptors are ready?'. When one is ready, the loop wakes up the coroutine that was waiting on it and resumes execution from the exact point it yielded.

A coroutine is a function defined with async def. Under the hood it compiles to a code object whose frame can be suspended and resumed. When you await something, Python calls __await__ on the awaitable, which ultimately bottoms out in a Future object. That Future registers a callback with the event loop. The coroutine's frame is frozen — local variables and all — until the Future resolves, at which point the loop schedules its resumption.

This is why you can have 10,000 concurrent HTTP connections on a single thread: no connection holds the thread while waiting. Each coroutine's frame costs roughly 1-2 KB of heap memory — orders of magnitude cheaper than an OS thread's 1-8 MB stack.

Understanding this model is what makes the rule 'never block the event loop' feel obvious rather than arbitrary: if your coroutine calls time.sleep(5), the entire event loop freezes for five seconds because it's still on the same thread. Every other 'concurrent' coroutine is stuck waiting.
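The effect of a blocking call can be measured directly. The following is a minimal sketch, assuming three simulated I/O operations of 0.2 seconds each; the function names fake_io and blocking_io are hypothetical and exist only for illustration.

```python
import asyncio
import time


async def fake_io(delay: float) -> float:
    # asyncio.sleep suspends this coroutine and hands control back to the loop.
    await asyncio.sleep(delay)
    return delay


async def blocking_io(delay: float) -> float:
    # time.sleep holds the only thread, so no other coroutine can run meanwhile.
    time.sleep(delay)
    return delay


async def timed(worker) -> float:
    # Run three copies of the worker "concurrently" and measure wall time.
    start = time.perf_counter()
    await asyncio.gather(worker(0.2), worker(0.2), worker(0.2))
    return time.perf_counter() - start


concurrent_elapsed = asyncio.run(timed(fake_io))
blocked_elapsed = asyncio.run(timed(blocking_io))
print(f"asyncio.sleep: {concurrent_elapsed:.2f}s  time.sleep: {blocked_elapsed:.2f}s")
```

With asyncio.sleep the three waits overlap, so wall time stays close to the longest single wait. With time.sleep they run back to back, so wall time approaches the sum.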

Tasks, Futures, and Awaitable Contracts — The Object Model Behind await

There are three things you can await in Python: coroutines, Tasks, and Futures. Understanding the difference is critical for writing correct async code.

A coroutine object (what you get when you call an async def function without await) is a lazy generator-like object. Nothing runs until the event loop drives it. If you write fetch_user_profile(42) without awaiting it, Python creates the object, runs none of its body, and emits a RuntimeWarning ("coroutine ... was never awaited") once the unused object is garbage-collected.

A Future is a low-level promise object. It starts in a pending state and transitions to done (with a result or an exception) exactly once. You almost never create Futures manually in application code — they live inside the networking and I/O layers.
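To make the pending-to-done transition concrete, here is a small sketch that creates a Future by hand. Application code rarely does this; loop.call_later stands in for the I/O layer that would normally resolve the Future, and the "payload" value is an arbitrary placeholder.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()        # starts in the pending state
    states = [future.done()]

    # Stand-in for the I/O layer: resolve the future 50 ms from now.
    loop.call_later(0.05, future.set_result, "payload")

    result = await future                # suspends until set_result has run
    states.append(future.done())
    return result, states


result, states = asyncio.run(main())
print(result, states)
```

The first recorded state is pending and the second is done, matching the single transition described above.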

A Task wraps a coroutine and schedules it on the event loop via asyncio.create_task(); the task starts running as soon as the current coroutine next yields to the loop. This is the key difference from a bare await: await coroutine() runs that coroutine sequentially from your perspective, while asyncio.create_task(coroutine()) schedules it concurrently and returns a handle you can await later.
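The difference between a bare await and create_task shows up in wall time. This is a sketch under simple assumptions: fetch is a hypothetical coroutine that simulates a 0.1 second network call.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)           # simulated network latency
    return name


async def sequential() -> float:
    start = time.perf_counter()
    await fetch("a", 0.1)                # second call starts only after the first ends
    await fetch("b", 0.1)
    return time.perf_counter() - start


async def concurrent() -> float:
    start = time.perf_counter()
    task_a = asyncio.create_task(fetch("a", 0.1))   # both are scheduled up front
    task_b = asyncio.create_task(fetch("b", 0.1))
    await task_a
    await task_b
    return time.perf_counter() - start


sequential_elapsed = asyncio.run(sequential())
concurrent_elapsed = asyncio.run(concurrent())
print(f"sequential: {sequential_elapsed:.2f}s  concurrent: {concurrent_elapsed:.2f}s")
```

The sequential version pays for both delays; the task-based version pays roughly for the longer one.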

The await keyword calls __await__() on the right-hand side object, which must return an iterator. For Tasks and Futures that iterator suspends the current coroutine and resumes it when the Future resolves. This is the same protocol yield from used in Python 3.4 generators — asyncio is built on top of that generator machinery.

Cancellation, Timeouts, and Error Handling — The Production Minefield

Happy-path async code is easy. Production async code is defined by how it handles failure. Three scenarios trip up even experienced developers: task cancellation, timeouts, and exception propagation through gather.

Cancellation in asyncio is cooperative, not forcible. When you call task.cancel(), Python injects a CancelledError into the coroutine at its next await point. If the coroutine catches CancelledError and doesn't re-raise it, the cancellation is silently swallowed — a serious bug. Always re-raise CancelledError or use finally blocks for cleanup.
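The two behaviours can be compared side by side. In this sketch, swallowing_worker and cooperative_worker are hypothetical names; the first discards the cancellation request, the second re-raises it and does its cleanup in finally.

```python
import asyncio


async def swallowing_worker(log: list) -> str:
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        log.append("swallowed")          # BUG: the cancellation request is discarded
    return "finished normally"


async def cooperative_worker(log: list) -> None:
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        log.append("cancel seen")
        raise                            # let the cancellation propagate
    finally:
        log.append("cleanup")            # runs on success, error, or cancellation


async def main():
    log = []
    bad = asyncio.create_task(swallowing_worker(log))
    good = asyncio.create_task(cooperative_worker(log))
    await asyncio.sleep(0)               # let both tasks reach their first await
    bad.cancel()
    good.cancel()
    await asyncio.gather(bad, good, return_exceptions=True)
    return bad.cancelled(), good.cancelled(), log


bad_cancelled, good_cancelled, log = asyncio.run(main())
print(bad_cancelled, good_cancelled, log)
```

The swallowing task ends up reporting cancelled() as False even though cancel() was called on it. A worker that loops instead of returning would simply keep running, which is the shape of the shutdown hang.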

Timeouts are best handled with asyncio.timeout() (Python 3.11+) or asyncio.wait_for() on earlier versions. Both cancel the inner work when the deadline passes and convert the resulting CancelledError into a TimeoutError, so you can distinguish 'took too long' from 'was cancelled by a parent task'.
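A minimal timeout sketch follows. It uses asyncio.wait_for so it runs on versions before 3.11; slow_query and the "fallback" value are hypothetical placeholders.

```python
import asyncio


async def slow_query() -> str:
    await asyncio.sleep(5)               # simulated slow backend
    return "rows"


async def query_with_timeout(limit: float) -> str:
    try:
        return await asyncio.wait_for(slow_query(), timeout=limit)
    except asyncio.TimeoutError:         # alias of the builtin TimeoutError on 3.11+
        return "fallback"


outcome = asyncio.run(query_with_timeout(0.05))
print(outcome)
```

On Python 3.11+ the same shape can be written with async with asyncio.timeout(limit): around the awaited call.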

Exception propagation through gather has a sharp edge: by default, if one awaitable raises, gather re-raises that first exception to the caller immediately. The remaining awaitables are not cancelled; they keep running in the background, and you lose access to their results through gather. Pass return_exceptions=True in production so you can inspect every result individually.
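A short sketch of inspecting every result individually, assuming a hypothetical lookup coroutine that fails for one particular key:

```python
import asyncio


async def lookup(key: str) -> str:
    await asyncio.sleep(0.01)
    if key == "missing":
        raise KeyError(key)              # simulated failure for one input
    return f"value-for-{key}"


async def main():
    keys = ["a", "missing", "b"]
    # Results come back in input order; failures appear as exception objects.
    results = await asyncio.gather(
        *(lookup(k) for k in keys), return_exceptions=True
    )
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = [r for r in results if isinstance(r, BaseException)]
    return successes, failures


successes, failures = asyncio.run(main())
print(successes, failures)
```

Checking against BaseException rather than Exception also catches a CancelledError that a child task may have produced.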

asyncio.TaskGroup (Python 3.11+) is now the preferred pattern — it provides structured concurrency where all child tasks are cancelled if any one fails, and all exceptions are surfaced together via an ExceptionGroup.

Blocking the Event Loop — How to Detect It and What to Do Instead

Blocking the event loop is the cardinal sin of async Python and the most common source of 'asyncio isn't faster than sync code' complaints. Any call that holds the thread without yielding — a requests.get(), a time.sleep(), a CPU-heavy loop, even a naively-called json.loads() on a 50MB payload — freezes every other coroutine in your application.

The asyncio event loop has a built-in slow-callback detector: enable debug mode and set loop.slow_callback_duration (in seconds; the default is 0.1) to a lower threshold such as 0.05. Python will then log a warning whenever a callback or task step holds the loop longer than that threshold. This is invaluable when profiling, though debug mode adds some overhead of its own.
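Here is a sketch of the detector in action. The ListHandler class is a hypothetical helper that collects log messages so they can be inspected; in a real service the warning would simply reach your normal logging configuration.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory (illustrative helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


async def main() -> None:
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05   # seconds
    time.sleep(0.1)                      # deliberate blocking call
    await asyncio.sleep(0)


handler = ListHandler()
logging.getLogger("asyncio").addHandler(handler)
asyncio.run(main(), debug=True)          # debug mode enables the detector
slow_warnings = [m for m in handler.messages if "took" in m]
print(slow_warnings)
```

The collected message names the offending task and how long it held the loop, which is usually enough to locate the blocking call.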

For blocking I/O you can't make async (a synchronous DB driver, a legacy library), use loop.run_in_executor() to offload work to a thread pool. For CPU-bound work, use ProcessPoolExecutor — threads won't help here because of the GIL.

asyncio.to_thread() (Python 3.9+) is a clean shorthand for run_in_executor with the default thread pool. It's idiomatic for wrapping synchronous file I/O, synchronous HTTP calls, or any legacy synchronous function you can't replace yet.
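As a sketch of that offloading pattern (Python 3.9+), assume legacy_read is a hypothetical synchronous function you cannot rewrite, and heartbeat is a coroutine that should keep ticking while it runs:

```python
import asyncio
import time


def legacy_read(path: str, events: list) -> str:
    time.sleep(0.2)                      # stands in for a blocking driver call
    events.append("read done")
    return f"contents of {path}"


async def heartbeat(events: list) -> None:
    for _ in range(3):
        await asyncio.sleep(0.05)
        events.append("tick")


async def main():
    events = []
    # The blocking function runs in the default thread pool; the loop stays free.
    data, _ = await asyncio.gather(
        asyncio.to_thread(legacy_read, "report.csv", events),
        heartbeat(events),
    )
    return data, events


data, events = asyncio.run(main())
print(data, events)
```

The heartbeat ticks appear while the blocking read is still in flight, which would not happen if legacy_read were called directly inside a coroutine.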

The mental model: the event loop is the single thread. Think of it as a very important person's personal assistant. Every synchronous call is a task that physically occupies the assistant. Every await is handing a task to someone else while the assistant handles the next thing.

Production Patterns: Structured Concurrency with TaskGroup vs gather

asyncio.gather() and asyncio.TaskGroup both run multiple tasks concurrently, but they differ fundamentally in failure handling. By default gather() re-raises the first exception to the caller and leaves the other awaitables running in the background, unsupervised; it cancels them only if the gather itself is cancelled. TaskGroup (Python 3.11+) provides structured concurrency: every task is a child of the group, and if any one fails, all remaining children are cancelled. When all have finished, the exceptions are merged into an ExceptionGroup.

The critical distinction: with gather(return_exceptions=False), you get the first exception while the rest keep running with nobody awaiting them, so later results and errors are easy to lose. With gather(return_exceptions=True), nothing is cancelled; once every awaitable has finished you get a list, in input order, that mixes results and exception objects. TaskGroup avoids the orphaned-task surprise: you commit to either all succeed or all are cancelled, and you handle exceptions after the context manager exits.

In high-throughput services, prefer gather with return_exceptions=True when you want to preserve successful results despite some failures — for example, fetching data from multiple caches where one miss is acceptable. Use TaskGroup when you want atomicity: if any part of the workflow fails, the whole operation should be abandoned.

Another pattern: asyncio.wait() gives fine-grained control over FIRST_COMPLETED, FIRST_EXCEPTION, ALL_COMPLETED. It's useful for race patterns (e.g., fetch from primary, fallback to secondary on timeout).
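A race can be sketched with FIRST_COMPLETED as follows. The fetch_from coroutine and the source names are hypothetical, and the delays are chosen so the secondary source wins.

```python
import asyncio


async def fetch_from(source: str, delay: float) -> str:
    await asyncio.sleep(delay)           # simulated latency per source
    return f"data from {source}"


async def race() -> str:
    tasks = {
        asyncio.create_task(fetch_from("primary", 0.2)),
        asyncio.create_task(fetch_from("secondary", 0.02)),
    }
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                    # the loser is no longer needed
    # Wait for cancelled tasks to unwind so nothing is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


winner = asyncio.run(race())
print(winner)
```

Unlike gather, asyncio.wait never cancels anything on its own, so cancelling and awaiting the pending set is the caller's job.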

Comparing asyncio, Threading, and Multiprocessing
| Aspect | asyncio (Coroutines) | Threading (Thread per task) | Multiprocessing |
|---|---|---|---|
| Best use case | I/O-bound: APIs, DBs, sockets | I/O-bound + legacy sync libs | CPU-bound: parsing, ML, crypto |
| Concurrency model | Cooperative (yields voluntarily) | Preemptive (OS switches threads) | True parallelism (multiple processes) |
| Memory per unit | ~1-2 KB per coroutine frame | ~1-8 MB per OS thread stack | High: full interpreter copy |
| GIL impact | No GIL contention (single thread) | GIL limits CPU parallelism | No GIL: full CPU parallelism |
| Max concurrency | 10,000s of tasks easily | ~100-1000 threads typical | Limited by CPU cores |
| Shared state safety | Safe: single thread, no races | Unsafe: need locks/queues | Unsafe: need IPC |
| Blocking call impact | Freezes ALL other tasks | Only blocks that one thread | Only blocks that one process |
| Library requirements | Must use async-native libs | Any synchronous library works | Must be picklable |
| Error propagation | CancelledError + ExceptionGroup | Exception in thread is silent unless joined | Exception must be passed via queue/pipe |
| Python version | 3.7+ for modern API, 3.11+ for TaskGroup | All versions | All versions |

Key Takeaways

  • asyncio runs a single thread with cooperative multitasking: coroutines yield at each await.
  • Coroutine frames cost ~1-2 KB vs 1-8 MB for threads — enabling 10,000s of connections.
  • Never block the event loop with synchronous I/O or CPU-heavy code; use to_thread() or run_in_executor().
  • Always re-raise CancelledError; swallowing it breaks shutdown.
  • Prefer asyncio.gather with return_exceptions=True or use TaskGroup (3.11+) for structured concurrency.
  • Enable debug mode (PYTHONASYNCIODEBUG=1) during development to catch blocking calls early.

Common Mistakes to Avoid

  • Using await on each coroutine inside a loop (sequentialisation)
    Symptom: Your async API performs no better than synchronous code. Wall time is sum of all operations instead of max.
    Fix: Collect coroutines into a list and pass to asyncio.gather(), or create tasks with asyncio.create_task() before awaiting.
  • Silently swallowing CancelledError
    Symptom: Application hangs on shutdown. Pod takes >30 seconds to terminate. Cleanup never runs.
    Fix: Re-raise CancelledError in except blocks. On Python 3.8+, where CancelledError derives from BaseException, use except Exception: instead of bare except: so you do not catch it inadvertently. Always use finally for cleanup that must run regardless.
  • Using time.sleep() instead of asyncio.sleep() inside a coroutine
    Symptom: Entire application freezes for the duration of sleep. Other requests time out.
    Fix: Replace time.sleep() with await asyncio.sleep(). For synchronous sleep during startup/shutdown, use a thread pool.
  • Calling synchronous I/O directly (requests.get, file read) inside coroutine
    Symptom: Performance degradation under load. Slow-callback warnings if debug mode enabled.
    Fix: Use async libraries (aiohttp, httpx.AsyncClient, aiofiles) or offload to thread pool with asyncio.to_thread().

Interview Questions on This Topic

  • Q: Explain the difference between asyncio.gather with return_exceptions=False and True. When would you use each? (Mid-level)
    With return_exceptions=False (default), the first exception raised is propagated immediately to the code awaiting gather. The other awaitables are not cancelled and keep running in the background, but their results are no longer delivered through gather. Use this when any failure should abort the caller's handling of the batch, and cancel the remaining tasks yourself (or use TaskGroup) if they must stop. With return_exceptions=True, exceptions are returned as values in the results list; no tasks are cancelled. Use this for fault-tolerant operations where partial success is valuable, e.g., fetching data from multiple caches where some may fail.
  • Q: What happens if you catch CancelledError and don't re-raise it? (Senior)
    The coroutine swallows the cancellation and continues running. The event loop thinks the task is still alive. If this happens during application shutdown, the process will not exit until the watchdog kills it, potentially leaving resources unclosed and connections dangling. Always re-raise CancelledError or let it propagate. Use finally blocks for cleanup that must happen regardless of cancellation.
  • Q: How does the event loop detect blocking calls, and how can you debug them in production? (Senior)
    Set loop.slow_callback_duration to a small threshold (e.g., 0.05s) and enable debug mode via asyncio.run(main(), debug=True) or the PYTHONASYNCIODEBUG=1 environment variable. Python will log a message like Executing <Task ...> took 0.123 seconds for any callback that runs longer than the threshold. This is the primary tool for identifying blocking calls. In production, you can also use asyncio.all_tasks(loop) to list all tasks and inspect their state.

Frequently Asked Questions

Why does my async code run slower than sync code?

You're likely awaiting coroutines one by one inside a loop (sequentialisation) or calling blocking functions like time.sleep() or requests.get() that freeze the event loop. Verify you use asyncio.gather() for independent tasks and replace blocking calls with async versions or offload to a thread pool.

What's the difference between asyncio.run() and creating an event loop manually?

asyncio.run() creates a new event loop, runs the coroutine to completion, cancels any tasks still pending, and closes the loop; the loop is not reused. It's the recommended entry point. Manual loop creation (loop = asyncio.new_event_loop()) is only needed when integrating asyncio with other event loops (e.g., in Jupyter notebooks, GUI frameworks). Never nest asyncio.run() calls: calling it while another loop is running in the same thread raises RuntimeError.

How do I handle timeouts properly in asyncio?

Use asyncio.timeout() (Python 3.11+) or asyncio.wait_for(). Both raise TimeoutError when the deadline is exceeded (before 3.11 this was the separate asyncio.TimeoutError class; since 3.11 it is an alias of the builtin). Internally, they cancel the underlying work by injecting CancelledError, so ensure the wrapped coroutine handles CancelledError correctly (re-raises). For graceful degradation, catch the TimeoutError at the call site and fall back to a default or cached value.
