Python asyncio — 47-Second Freeze from Sync Calls
- asyncio is Python's single-threaded concurrency library using async/await syntax
- The Event Loop is the central scheduler — it multiplexes I/O across coroutines without OS threads
- Coroutines yield control at every await point, enabling cooperative multitasking
- asyncio.gather() runs independent I/O tasks concurrently: total time equals the slowest task, not the sum
- Blocking the loop with sync calls (requests.get, time.sleep) freezes ALL coroutines — use run_in_executor for sync code
- For CPU-bound work, use multiprocessing — asyncio cannot bypass the GIL
asyncio solves a specific and important scalability problem: how do you handle thousands of concurrent I/O operations without spawning thousands of OS threads? The answer is cooperative multitasking — coroutines voluntarily yield control at await points, allowing a single-threaded event loop to multiplex work across all of them efficiently. No thread management, no locking, no context switch overhead from the OS.
The distinction senior engineers must genuinely internalise is that asyncio is concurrency, not parallelism. One thread handles all scheduling. This means any blocking call — a synchronous HTTP request, a CPU-heavy computation, even time.sleep() — freezes the entire loop and every coroutine waiting on it. Not some of them. All of them. This is the property that makes asyncio both powerful and dangerous: the performance gains from concurrency and the catastrophic failure mode from a single blocking call live right next to each other in the same codebase.
This guide covers production-grade patterns: orchestrating concurrent tasks with gather(), building fault-tolerant batch operations against unreliable dependencies, understanding the timeout and cancellation model deeply enough to use it correctly under pressure, and the operational mistakes I have seen repeatedly bring down async services. The examples are written for Python 3.11+ but the core patterns apply from 3.8 onward.
Coroutines and the Event Loop: The Engine Room
A coroutine is a suspendable function, built on the same machinery as Python generators but written with async/await syntax. When you define a function with async def, calling it does not execute a single line of the function body; it returns a coroutine object. To actually run it, you must either await it inside another coroutine, schedule it on a running event loop via asyncio.create_task(), or pass it to asyncio.run() as the program's entry point. This is not a convention or a style choice. It is how the object model works.
A common misconception is that async def makes a function asynchronous in some broad sense. It does not. It makes the function coroutine-returning. The function body executes zero lines until the coroutine is awaited or scheduled. Forgetting this distinction leads to one of the most common asyncio bugs in production codebases: creating a coroutine object, not awaiting it, and then wondering why the operation never happened. Python emits a RuntimeWarning ('coroutine ... was never awaited') when the un-awaited coroutine object is destroyed, but it is only a warning on stderr, not an exception, so in high-throughput code with noisy logs it is easy to miss.
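A minimal sketch of that behaviour, assuming a hypothetical record_visit() coroutine function invented for illustration: calling it produces an inert coroutine object, and the body runs only once the object is awaited.

```python
import asyncio
import inspect


async def record_visit(name: str, log: list) -> str:
    # Hypothetical coroutine function, used only for illustration.
    log.append(name)  # side effect that proves the body ran
    return f"visited {name}"


async def main():
    log = []
    coro = record_visit("home", log)      # creates a coroutine object, runs nothing
    is_coroutine = inspect.iscoroutine(coro)
    ran_before_await = bool(log)          # still empty: the body has not executed
    result = await coro                   # the body runs here
    return is_coroutine, ran_before_await, result, log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If the await line were removed, the log would stay empty and the only trace would be the "never awaited" warning.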
The event loop is the heartbeat of every asyncio application. It maintains a queue of ready callbacks, a selector watching registered I/O file descriptors (using epoll on Linux, kqueue on macOS, IOCP on Windows), and a heap of scheduled callbacks ordered by their scheduled time. When a coroutine awaits an I/O operation, it registers a callback with the selector and suspends — control returns to the loop, which picks the next ready callback and runs it. When the OS signals that I/O is complete, the selector delivers the event, the callback is scheduled, and the original coroutine resumes from exactly where it yielded. This entire mechanism happens in a single thread. No OS context switches. No lock contention. No stack per concurrent operation beyond the coroutine frame itself.
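To make the single-threaded scheduling concrete, here is a small sketch with two hypothetical workers that yield to the loop with asyncio.sleep(0) after each step. The worker names and step counts are assumptions for illustration; the point is that both coroutines make progress in turn on one OS thread.

```python
import asyncio
import threading


async def worker(name: str, steps: int, trace: list) -> None:
    for step in range(steps):
        # Record which coroutine ran, at which step, on which OS thread.
        trace.append((name, step, threading.get_ident()))
        await asyncio.sleep(0)  # yield control back to the event loop


async def main() -> list:
    trace = []
    await asyncio.gather(worker("a", 3, trace), worker("b", 3, trace))
    return trace


if __name__ == "__main__":
    for name, step, thread_id in asyncio.run(main()):
        print(name, step, thread_id)
```

The trace alternates between the two workers, and every entry carries the same thread identifier: the switching is done by the loop at await points, not by the OS.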
asyncio.gather() — Orchestrating True Concurrency
When you have independent I/O operations, awaiting them sequentially one by one is leaving performance on the table. If you have three health checks that each take one second, awaiting them in series costs three seconds. Running them with gather() costs one second — the duration of the slowest one. That is the entire value proposition of gather(), and it is substantial.
asyncio.gather() takes awaitables as positional arguments (unpack a list with *), wraps each coroutine into a Task internally, and schedules them all onto the event loop. It then suspends the calling coroutine until every task completes, and returns a list of results in the same order as the input arguments, regardless of which tasks finished first. This ordering guarantee is important and worth relying on: you can safely unpack results positionally.
One nuance worth understanding: gather() creates tasks at the moment it is called, not at the moment the await resolves. This means all tasks start running as soon as the event loop gets control after the gather() call, which is immediately when the caller awaits gather(). If you are constructing a list of coroutines before calling gather(), those coroutines have not started yet — they are still inert coroutine objects. Only gather() turns them into running Tasks.
The performance model is straightforward: gather() converts sequential wait time into concurrent wait time. The total duration is bounded by max(all task durations) rather than sum(all task durations). For workloads involving many small network calls — health checks, fanout requests to microservices, parallel database lookups — this can reduce latency by an order of magnitude.
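The max-versus-sum model can be checked with a short timing sketch. The check_health() coroutine and its delays are hypothetical stand-ins for real network calls, simulated here with asyncio.sleep().

```python
import asyncio
import time


async def check_health(service: str, delay: float) -> str:
    # Hypothetical health check; the sleep simulates network latency.
    await asyncio.sleep(delay)
    return f"{service}: ok"


async def run_sequential(checks):
    # Each await finishes before the next call starts: total is the sum.
    return [await check_health(name, delay) for name, delay in checks]


async def run_concurrent(checks):
    # All checks wait at the same time: total is roughly the slowest one.
    return await asyncio.gather(*(check_health(n, d) for n, d in checks))


async def main():
    checks = [("db", 0.05), ("cache", 0.10), ("queue", 0.15)]
    start = time.perf_counter()
    sequential = await run_sequential(checks)
    sequential_elapsed = time.perf_counter() - start
    start = time.perf_counter()
    concurrent = await run_concurrent(checks)
    concurrent_elapsed = time.perf_counter() - start
    return sequential, sequential_elapsed, concurrent, concurrent_elapsed


if __name__ == "__main__":
    seq, seq_t, conc, conc_t = asyncio.run(main())
    print(f"sequential {seq_t:.2f}s, concurrent {conc_t:.2f}s")
```

Both versions return the results in input order; only the elapsed time differs.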
Fault Tolerance: Exceptions, Timeouts, and Cancellation
In production, external APIs fail, network partitions happen, and upstream services degrade. The question is not whether these events will occur — it is whether your async code handles them gracefully or cascades them into wider outages.
The two primary tools for fault tolerance in asyncio are gather(return_exceptions=True) for batch resilience and asyncio.wait_for() for individual operation timeouts. They address different failure modes and are commonly used together.
gather(return_exceptions=True) is the production standard for any batch operation against multiple dependencies. Instead of letting the first exception short-circuit the entire batch, it captures all exceptions as return values: exceptions are just results that happen to be error objects. You iterate the results list, check isinstance(result, Exception) for each entry (or BaseException if you also want to catch CancelledError, which has derived from BaseException since Python 3.8), and handle successes and failures individually. Every task gets a chance to complete. No orphaned tasks. Full visibility into which dependencies failed and which succeeded.
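A sketch of the batch pattern, assuming a hypothetical call_dependency() coroutine that fails on demand. It checks against BaseException so that a cancelled child is also classified as a failure.

```python
import asyncio


async def call_dependency(name: str, should_fail: bool) -> str:
    # Hypothetical downstream call; failure is simulated with a flag.
    await asyncio.sleep(0.01)
    if should_fail:
        raise ConnectionError(f"{name} unreachable")
    return f"{name}: ok"


async def check_all(targets: dict):
    names = list(targets)
    results = await asyncio.gather(
        *(call_dependency(name, targets[name]) for name in names),
        return_exceptions=True,  # exceptions come back as ordinary results
    )
    successes, failures = {}, {}
    for name, result in zip(names, results):  # results keep input order
        if isinstance(result, BaseException):
            failures[name] = result
        else:
            successes[name] = result
    return successes, failures


if __name__ == "__main__":
    ok, failed = asyncio.run(check_all({"db": False, "cache": True, "queue": False}))
    print("ok:", ok)
    print("failed:", failed)
```

One failing dependency does not prevent the other two from reporting their results.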
asyncio.wait_for(coroutine, timeout=seconds) enforces a maximum duration on a single coroutine. If the coroutine does not complete within the timeout, asyncio cancels the wrapped coroutine and raises TimeoutError (asyncio.TimeoutError before Python 3.11, where it became an alias of the built-in TimeoutError). This is the mechanism for implementing SLAs on individual downstream calls: if your upstream health check should complete in under 3 seconds, wrap it in wait_for() with timeout=3.0 and handle TimeoutError explicitly.
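A minimal sketch of a per-call deadline, assuming a hypothetical slow_health_check() coroutine. It catches asyncio.TimeoutError, which works on both older and newer Python versions.

```python
import asyncio


async def slow_health_check(delay: float) -> str:
    # Hypothetical upstream call; the sleep simulates its response time.
    await asyncio.sleep(delay)
    return "healthy"


async def check_with_deadline(delay: float, timeout: float) -> str:
    try:
        return await asyncio.wait_for(slow_health_check(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # The wrapped coroutine has already been cancelled at this point.
        return "timed out"


if __name__ == "__main__":
    print(asyncio.run(check_with_deadline(delay=0.01, timeout=0.5)))
    print(asyncio.run(check_with_deadline(delay=0.5, timeout=0.05)))
```

Returning a sentinel string is only one choice for illustration; a real service might raise a domain-specific error or fall back to a cached value.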
Cancellation is where most engineers get tripped up. When wait_for() fires, it sends a CancelledError to the wrapped coroutine at its current await point. If the coroutine has cleanup logic that also awaits — closing a database connection, writing an audit log, releasing a lock — that cleanup will only run if it is inside a try/finally block. Code after a cancelled await does not execute. This is cooperative cancellation, and respecting it correctly is what separates async code that is safe from async code that merely appears to work in testing.
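The effect of try/finally under cancellation can be seen in a short sketch. The guarded_work() coroutine and its event names are assumptions for illustration; the events list records which lines actually ran.

```python
import asyncio


async def guarded_work(events: list) -> None:
    events.append("acquired")
    try:
        await asyncio.sleep(10)     # CancelledError is raised here on timeout
        events.append("finished")   # skipped when the await is cancelled
    finally:
        events.append("released")   # runs on success, error, or cancellation


async def main() -> list:
    events = []
    try:
        await asyncio.wait_for(guarded_work(events), timeout=0.05)
    except asyncio.TimeoutError:
        events.append("timeout")
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The "finished" entry never appears, while "released" does: only the code inside finally survives the cancelled await.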
The Golden Rule: Never Block the Event Loop
This is the number one cause of production performance degradation in Python async services, and it is also the most insidious because it does not fail loudly. A blocking call does not raise an exception. It does not log a warning by default. It simply holds the event loop thread for its entire duration, during which every other coroutine in the process is frozen. One synchronous HTTP call taking 200ms freezes 10,000 concurrent connections for 200ms. Under load, these micro-freezes compound into p99 latency spikes that look like intermittent upstream degradation but are entirely self-inflicted.
The most common offenders in codebases I have reviewed: the requests library (always synchronous, even for simple GET calls), time.sleep() used as a delay inside handlers, CPU-heavy operations like image resizing or report generation, and third-party SDK clients that were written before async was widespread. The pattern is usually introduced by someone who understood the application's sync codebase well and did not fully internalise the async execution model.
For unavoidable synchronous code, such as a legacy library that cannot be replaced or a blocking client with no async equivalent, the correct approach is loop.run_in_executor(), which offloads the blocking call to a thread pool and returns an awaitable that the event loop can wait on without blocking itself (asyncio.to_thread() is a convenience wrapper for the same idea in Python 3.9+). The worker thread runs alongside the event loop thread, so the loop remains free to process other coroutines while the blocking work proceeds. Because of the GIL, this helps most when the blocking call spends its time waiting on I/O; pure-Python CPU work in a thread still competes with the loop for the interpreter. This adds thread overhead, but it is categorically better than blocking the loop.
For CPU-bound work that needs true parallelism, ProcessPoolExecutor bypasses the GIL by running work in separate processes. The inter-process communication overhead makes this appropriate for coarse-grained work (process this batch) rather than fine-grained work (transform this value).
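A sketch of the thread-pool offload, assuming a hypothetical blocking_lookup() function that stands in for a synchronous library call. A heartbeat coroutine runs alongside it to show that the loop keeps ticking while the worker thread is blocked.

```python
import asyncio
import time


def blocking_lookup(key: str) -> str:
    # Hypothetical synchronous call; time.sleep stands in for blocking I/O.
    time.sleep(0.3)
    return key.upper()


async def heartbeat(ticks: list, interval: float = 0.02, count: int = 5) -> None:
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.perf_counter())


async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    # None selects the loop's default ThreadPoolExecutor.
    value = await loop.run_in_executor(None, blocking_lookup, "user-42")
    # If the loop had been blocked, the heartbeat could not have finished yet.
    loop_stayed_responsive = beat.done()
    await beat
    return value, loop_stayed_responsive, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling blocking_lookup() directly inside main() instead would delay every heartbeat tick until the lookup returned.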
| Dimension | asyncio | threading | multiprocessing |
|---|---|---|---|
| Concurrency model | Cooperative — coroutines yield voluntarily at await points, single OS thread | Preemptive — OS scheduler switches between threads at arbitrary points | Parallel — separate OS processes, each with its own Python interpreter and GIL |
| Best for | High-concurrency I/O: thousands of simultaneous network calls, database queries, WebSocket connections | Legacy synchronous libraries, blocking I/O that cannot be rewritten, integrating with callback-based frameworks | CPU-bound computation: image processing, ML inference, encryption, data transformation that saturates a single core |
| Memory overhead | Very low — coroutines are lightweight objects, roughly a few KB each | Moderate — each OS thread has a stack, typically 1-8 MB depending on OS and configuration | High — each process has a separate heap, a copy of imported modules, and its own interpreter state |
| GIL interaction | Single thread — the GIL is entirely irrelevant, there is no contention | GIL limits true parallelism — only one thread executes Python bytecode at a time, I/O releases the GIL | Each process has its own GIL — true CPU parallelism across cores, but IPC overhead for data exchange |
| Context switch cost | Zero OS overhead — context switches are Python-level coroutine frame swaps at await points | OS kernel context switch — roughly 1-10 microseconds per switch, adds up under high thread counts | OS process switch — roughly 10-100 microseconds plus IPC serialisation overhead for any data passed between processes |
| Cancellation support | Cooperative via CancelledError delivered at await points — cleanup code in try/finally runs reliably | No safe cancellation mechanism — you can set daemon=True to kill on process exit but cannot interrupt mid-execution | Terminate the process (abrupt) or send a signal — no cooperative cancellation, cleanup code may not run |
| Debugging complexity | Moderate — stack traces span await boundaries, async context is lost across yields, debug mode helps significantly | High — race conditions, deadlocks, and data races are non-deterministic and difficult to reproduce reliably | High — IPC issues, serialisation failures, shared memory races, and zombie processes add significant diagnostic complexity |
| Scaling limit | Tens of thousands of concurrent coroutines on a single process with appropriate I/O multiplexing | Hundreds of threads before stack memory and context switch overhead degrades performance noticeably | Limited by the number of physical CPU cores and available memory — inter-process communication becomes the bottleneck |
Key Takeaways
- asyncio is single-threaded concurrency — it excels at I/O-bound tasks with thousands of concurrent operations but provides zero CPU parallelism. For CPU-bound work, the correct tool is multiprocessing.
- Coroutines are non-blocking — they yield control back to the event loop at every await point. A coroutine object not yet awaited or scheduled executes zero lines of code and produces no side effects.
- Concurrency versus parallelism: asyncio is concurrent — many things progressing at once on one thread. Multiprocessing is parallel — many things running simultaneously on separate cores. These are different properties solving different bottlenecks.
- gather(return_exceptions=True) is the production standard for batch operations against flaky dependencies — every task completes, every exception is inspectable, no orphaned tasks leak resources.
- Prefer async-native libraries in all async code paths: httpx over requests, motor over pymongo, redis.asyncio (the successor to aioredis, bundled with redis-py 4.2+) over the synchronous redis-py client, aiofiles over open(). A single synchronous library call can freeze the entire service.
- One blocking call in the event loop thread freezes every coroutine in the process. Instrument event loop latency as a separate metric from request latency, because average latency can look healthy while the loop is periodically freezing.
- Store a reference to every Task created with create_task(): unreferenced tasks can be garbage collected before they complete with no error or warning in non-debug mode.
- For Python 3.11+, prefer asyncio.TaskGroup over gather() for structured concurrency: automatic sibling cancellation on failure and ExceptionGroup handling make it safer by default.
Common Mistakes to Avoid
- Calling a coroutine without awaiting it
Symptom: The coroutine function is called and returns immediately with no output, no side effect, and no error raised. The coroutine object is created and discarded silently. Python emits a RuntimeWarning saying 'coroutine ... was never awaited' when the object is destroyed (asyncio debug mode adds the traceback of where it was created), but it is only a warning on stderr, which is easy to miss under load.
Fix: Always await coroutine calls: result = await my_coroutine(). For fire-and-forget patterns where you want to start work without waiting for it, use asyncio.create_task(my_coroutine()) and store the returned Task in a set or list. Add a done callback to remove it from the set on completion: task.add_done_callback(active_tasks.discard). Without storing the reference, the Task itself may be garbage collected before it completes.
- Using synchronous libraries inside async handlers
Symptom: Event loop freezes for the full duration of every synchronous call. Under load, p99 latency spikes to the duration of the blocking call while average latency stays deceptively low — because most requests complete fine, only the ones unlucky enough to run while a blocking call is in progress are affected. All concurrent connections stall simultaneously during the freeze, causing correlated timeouts across unrelated requests.
Fix: Replace synchronous libraries with async equivalents: requests becomes httpx, pymongo becomes motor, the synchronous redis-py client becomes redis.asyncio (the successor to aioredis, bundled with redis-py 4.2+), open() becomes aiofiles. For synchronous code that cannot be replaced, wrap it in loop.run_in_executor(None, sync_function). This offloads the call to Python's default thread pool and returns an awaitable that the event loop can wait on without blocking itself.
- Using gather() without return_exceptions=True in production
Symptom: A single failing coroutine raises an exception that propagates immediately to the caller. The remaining coroutines continue running as orphaned tasks with no owner — they consume database connections, HTTP connections, and memory until they complete or timeout on their own. Under sustained load with any upstream failure rate, orphaned tasks accumulate and connection pools are silently exhausted.
Fix: Always pass return_exceptions=True to asyncio.gather() in production paths. Iterate the results list and handle exceptions individually: for i, result in enumerate(results): if isinstance(result, Exception): log_and_handle(i, result). This gives you full visibility into which tasks failed and which succeeded, with no orphaned tasks and no abandoned resources.
- Creating tasks without storing references to them
Symptom: Tasks disappear mid-execution — they are garbage collected because no strong reference exists to keep them alive. The coroutine never completes, writes nothing to the database, sends nothing to the network, and raises no error. This manifests as intermittent missing records or skipped processing steps that are extremely difficult to reproduce because they depend on GC timing.
Fix: Store every Task reference in a set or list for its full lifetime: background_tasks = set(); task = asyncio.create_task(coro()); background_tasks.add(task); task.add_done_callback(background_tasks.discard). For Python 3.11+, asyncio.TaskGroup provides structured lifecycle management that eliminates this class of bug entirely.
- Using time.sleep() instead of asyncio.sleep() in async code
Symptom: The entire event loop blocks for the sleep duration. Every coroutine in the process freezes. A time.sleep(5) inside a handler makes the entire service unresponsive for 5 full seconds. Load balancer health checks fail, triggering cascading restarts that amplify the incident by resetting all in-flight connections.
Fix: Always use await asyncio.sleep(seconds) in async code. Search the codebase for time.sleep and replace every instance in an async context. If a third-party library internally calls time.sleep() and cannot be replaced, wrap the entire call in loop.run_in_executor(None, library_function) to move it to a thread pool where it can block safely without holding the event loop.
- Ignoring CancelledError in cleanup logic after a timeout
Symptom: When a task is cancelled by wait_for() timeout, cleanup code that follows the cancelled await statement never executes. Database connections are not closed, file handles are left open, distributed locks are not released, and temporary resources leak. Over hours of production traffic, connection pools are gradually exhausted without a clear explanation in the logs.
Fix: Use try/finally to ensure cleanup always runs regardless of how the coroutine exits: try: result = await work(); finally: await cleanup(). If the cleanup itself must complete even if the outer scope is cancelled, wrap it in asyncio.shield(): try: result = await work(); finally: await asyncio.shield(critical_cleanup()). The shield prevents the cleanup coroutine from being cancelled along with its parent.
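The task-reference fix from the list above can be sketched as a small helper. The spawn() helper and write_audit_log() coroutine are hypothetical names for illustration; the pattern is the set plus done-callback described in the fix.

```python
import asyncio

background_tasks = set()  # strong references keep running tasks alive


def spawn(coro) -> asyncio.Task:
    # Hypothetical helper: create a task and track it until it finishes.
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)  # untrack on completion
    return task


async def write_audit_log(entry: str, sink: list) -> None:
    # Hypothetical fire-and-forget job; the sleep simulates a slow write.
    await asyncio.sleep(0.01)
    sink.append(entry)


async def main():
    sink = []
    spawn(write_audit_log("login", sink))
    spawn(write_audit_log("logout", sink))
    tracked_while_running = len(background_tasks)
    # Drain outstanding work before returning so nothing is cancelled at exit.
    await asyncio.gather(*background_tasks)
    await asyncio.sleep(0)  # give any remaining done callbacks a turn
    return tracked_while_running, sorted(sink), len(background_tasks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The set empties itself as tasks finish, so it does not grow without bound in a long-running service.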
Interview Questions on This Topic
- Explain the starvation problem in an event loop. How does a single blocking call affect other unrelated coroutines? (Senior)
- What is the difference between await task and asyncio.gather(task)? When would you use one over the other? (Mid-level)
- How would you implement a rate limiter that allows only 5 concurrent coroutines to run at a time using asyncio.Semaphore? (Mid-level)
- How does the Python GIL interact with asyncio? Does asyncio allow Python to bypass the GIL? (Senior)
- What is the asyncio.shield() pattern and when would you use it to protect a coroutine from cancellation? (Senior)
Frequently Asked Questions
What is the difference between asyncio and threading in Python?
Threading relies on the OS to schedule and switch between threads. Because of Python's GIL, only one thread executes Python bytecode at a time regardless of how many threads exist — so threading does not provide CPU parallelism for Python code, though it does release the GIL during I/O operations and C extension calls.
asyncio uses a single OS thread with cooperative multitasking. Context switches happen only when a coroutine explicitly yields with await, which means zero OS overhead per switch. The memory footprint is dramatically lower — a coroutine is a few kilobytes; an OS thread stack is typically one to eight megabytes. For high-concurrency I/O workloads, asyncio scales to tens of thousands of concurrent operations where threading would exhaust memory and OS thread limits at a few hundred.
Use asyncio for new high-concurrency I/O code. Use threading when integrating with legacy synchronous libraries via run_in_executor() or when a specific library explicitly requires a dedicated thread.
When should I use asyncio.create_task() instead of gather()?
Use asyncio.create_task() when you want to start a background operation now and collect its result at a later point in the same code path, or when you want fire-and-forget behaviour where you do not need the result at all. The task starts immediately when create_task() is called and runs concurrently with whatever the current coroutine does next.
Use asyncio.gather() when you have a known list of independent operations and you need all of their results before you can proceed. gather() is the right tool for fanout patterns: fetch from all these sources, wait for all results, then process them together.
The practical distinction: create_task() gives you fine-grained control over individual task lifetimes. gather() is a convenient all-or-nothing collection mechanism. For structured concurrency in Python 3.11+, TaskGroup combines the control of create_task() with the wait-for-all semantics of gather() and adds automatic sibling cancellation on failure.
Can I use asyncio for CPU-intensive tasks like image processing or ML inference?
No, not directly. asyncio is single-threaded — a CPU-intensive operation running inside the event loop will block the loop for its entire duration, freezing all other coroutines. CPU work does not release the event loop the way I/O does.
For CPU-bound work, use multiprocessing to bypass the GIL and utilise multiple cores. If you need to integrate CPU-bound work into an async application, create a concurrent.futures.ProcessPoolExecutor instance and call loop.run_in_executor(pool, cpu_function, data). The first argument must be an executor instance, not the class. This offloads the CPU work to a separate process and returns an awaitable so the event loop remains free.
The overhead of IPC serialisation makes ProcessPoolExecutor most appropriate for coarse-grained work — process this entire batch — rather than fine-grained per-request work where the IPC cost exceeds the computation time.
How do I test async code with pytest?
Use the pytest-asyncio plugin. Install it with pip install pytest-asyncio, then mark test functions with @pytest.mark.asyncio and define them as async def. The plugin manages the event loop lifecycle for each test.
For async fixtures, use @pytest_asyncio.fixture with async def and await inside them normally. Configure the plugin mode in pytest.ini or pyproject.toml: asyncio_mode = auto applies the asyncio mark automatically to all async test functions, removing boilerplate.
For testing timeout and cancellation behaviour specifically, asyncio.wait_for() in combination with asyncio.sleep() inside test fixtures allows you to simulate slow dependencies. Mock async functions with AsyncMock from unittest.mock rather than MagicMock — AsyncMock returns an awaitable, MagicMock does not and will cause errors when the code under test awaits it.
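The AsyncMock versus MagicMock difference can be shown without any test plugin. This sketch uses plain asyncio.run() and a hypothetical fetch_username() function as the code under test; inside a pytest-asyncio suite the same bodies would live in async def test functions.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def fetch_username(client, user_id: int) -> str:
    # Hypothetical code under test: awaits an async client method.
    profile = await client.get_profile(user_id)
    return profile["name"]


async def run_with_async_mock() -> str:
    client = MagicMock()
    client.get_profile = AsyncMock(return_value={"name": "ada"})
    name = await fetch_username(client, 7)
    client.get_profile.assert_awaited_once_with(7)
    return name


async def run_with_plain_mock() -> str:
    client = MagicMock()
    # A plain MagicMock returns the dict directly, which is not awaitable.
    client.get_profile = MagicMock(return_value={"name": "ada"})
    try:
        await fetch_username(client, 7)
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


if __name__ == "__main__":
    print(asyncio.run(run_with_async_mock()))
    print(asyncio.run(run_with_plain_mock()))
```

The second function fails inside the code under test, not in the test itself, which is why this mistake can be confusing to diagnose.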
What happens to pending tasks when the event loop closes?
When asyncio.run() returns, it cancels all tasks that are still pending and runs the loop briefly to allow them to handle CancelledError. Tasks that have try/finally or async context manager cleanup will execute their cleanup code. Tasks that do not handle CancelledError will have any code after their current await point skipped.
For graceful shutdown where you need all tasks to complete cleanly, implement explicit shutdown logic before returning from main(): collect all tasks with asyncio.all_tasks(), exclude asyncio.current_task() so the shutdown coroutine does not cancel itself, cancel each remaining one, and then await asyncio.gather(*pending_tasks, return_exceptions=True) to let each task run its cleanup. After this, the loop closes with no orphaned tasks.
For long-running services, wire SIGTERM handling to this shutdown sequence so that Kubernetes pod termination or systemd stop triggers a clean drain rather than an abrupt kill.
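The shutdown sequence described above can be sketched as follows. The worker() coroutine and its cleanup log are hypothetical; the sketch triggers shutdown directly rather than from a signal handler so that it runs on any platform.

```python
import asyncio


async def worker(name: str, log: list) -> None:
    # Hypothetical long-running task with cleanup in finally.
    try:
        await asyncio.sleep(3600)
    finally:
        log.append(f"{name} cleaned up")


async def shutdown() -> list:
    # Cancel everything except the coroutine running the shutdown itself.
    current = asyncio.current_task()
    pending = [task for task in asyncio.all_tasks() if task is not current]
    for task in pending:
        task.cancel()
    # Wait for each task to process its CancelledError and finish cleanup.
    return await asyncio.gather(*pending, return_exceptions=True)


async def main():
    log = []
    tasks = [asyncio.create_task(worker(f"worker-{i}", log)) for i in range(3)]
    await asyncio.sleep(0.01)  # let the workers reach their first await
    results = await shutdown()
    all_cancelled = all(isinstance(r, asyncio.CancelledError) for r in results)
    return sorted(log), all_cancelled, all(task.done() for task in tasks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

On Unix, one way to wire this to SIGTERM is loop.add_signal_handler(), with the handler setting an asyncio.Event that main() waits on before calling shutdown(); that wiring is left out here as an assumption about the deployment.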