Python asyncio — 47-Second Freeze from Sync Calls
A sync requests.get() blocked the event loop for 47s—zero CPU, all coroutines frozen.
20+ years shipping production Python across data and backend systems. Notes here come from systems that actually shipped.
- asyncio is Python's single-threaded concurrency library using async/await syntax
- The Event Loop is the central scheduler — it multiplexes I/O across coroutines without OS threads
- Coroutines yield control at every await point, enabling cooperative multitasking
- asyncio.gather() runs independent I/O tasks concurrently: total time equals the slowest task, not the sum
- Blocking the loop with sync calls (requests.get, time.sleep) freezes ALL coroutines — use run_in_executor for sync code
- For CPU-bound work, use multiprocessing — asyncio cannot bypass the GIL
Imagine you are a chef in a busy kitchen. In a synchronous kitchen, you put toast in the toaster and stand motionless staring at it until it pops before you touch anything else. That is a waste of time and the customers notice. In an asyncio kitchen, you put the toast in, set a timer — that is the await — and immediately start grinding coffee beans while the toast browns. You are not growing extra arms, which would be threads. You are just being smart about using the waiting time productively. The event loop is the head chef managing all those timers simultaneously, making sure nothing burns and breakfast reaches the table faster. The moment you drop a heavy cookbook on the floor and spend five minutes picking it up, everyone in the kitchen freezes waiting for you. That cookbook is a blocking call.
asyncio solves a specific and important scalability problem: how do you handle thousands of concurrent I/O operations without spawning thousands of OS threads? The answer is cooperative multitasking — coroutines voluntarily yield control at await points, allowing a single-threaded event loop to multiplex work across all of them efficiently. No thread management, no locking, no context switch overhead from the OS.
The distinction senior engineers must genuinely internalise is that asyncio is concurrency, not parallelism. One thread handles all scheduling. This means any blocking call — a synchronous HTTP request, a CPU-heavy computation, even time.sleep() — freezes the entire loop and every coroutine waiting on it. Not some of them. All of them. This is the property that makes asyncio both powerful and dangerous: the performance gains from concurrency and the catastrophic failure mode from a single blocking call live right next to each other in the same codebase.
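Here is a small sketch of that failure mode using only the standard library. Three coroutines each wait 0.1 seconds: first with the blocking time.sleep(), then with asyncio.sleep(). The function names are illustrative and not taken from any real service, and exact timings will vary slightly by machine.

```python
import asyncio
import time

async def blocking_handler() -> None:
    time.sleep(0.1)            # sync call: holds the loop thread, nothing else can run

async def non_blocking_handler() -> None:
    await asyncio.sleep(0.1)   # suspends this coroutine and lets the loop run the others

async def run_three(handler) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(3)))
    return time.perf_counter() - start

blocking = asyncio.run(run_three(blocking_handler))
non_blocking = asyncio.run(run_three(non_blocking_handler))
print(f"blocking: {blocking:.2f}s, non-blocking: {non_blocking:.2f}s")
# Expect roughly 0.30s vs 0.10s: the blocking calls run one after another.
```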
This guide covers production-grade patterns: orchestrating concurrent tasks with gather(), building fault-tolerant batch operations against unreliable dependencies, understanding the timeout and cancellation model deeply enough to use it correctly under pressure, and the operational mistakes I have seen repeatedly bring down async services. The examples are written for Python 3.11+ but the core patterns apply from 3.8 onward.
Why Your Async Code Freezes for 47 Seconds
asyncio is Python's cooperative concurrency model: a single-threaded event loop that multiplexes I/O-bound tasks by yielding control at explicit await points. The core mechanic is that only one coroutine runs at a time, and it must voluntarily suspend itself before another can proceed. This means any synchronous blocking call (time.sleep(), a CPU-bound loop, or a blocking I/O operation) stalls the entire event loop for its full duration. Short blocking calls also add up under load. If every request makes a 47-millisecond sync call, 1,000 queued requests serialize into 47 seconds of loop time during which nothing else is scheduled. Use asyncio when your workload is I/O-bound with many concurrent operations (network requests, file reads, database queries) and you need to maximize throughput without the overhead of threads. It is not a solution for CPU-bound work. That requires multiprocessing, since a thread pool only helps when the underlying code releases the GIL. The real-world impact: a single sync call in a web server's async handler can cut a service that handles thousands of requests per second down to one request at a time.
Coroutines and the Event Loop: The Engine Room
A coroutine is a generator-like object produced by a function defined with async def. Calling such a function does not execute a single line of its body; it returns a coroutine object. To actually run it, you must do one of three things:
- await it inside another coroutine
- schedule it on the event loop with asyncio.create_task()
- pass it to asyncio.run() as the program's entry point

This is not a convention or a style choice; it is how the object model works.
A common misconception is that async def makes a function asynchronous in some broad sense. It does not. It makes the function coroutine-returning. The function body executes zero lines until the coroutine is awaited or scheduled. Forgetting this distinction leads to one of the most common asyncio bugs in production codebases: creating a coroutine object, not awaiting it, and then wondering why the operation never happened. Python emits a RuntimeWarning ("coroutine ... was never awaited") when such an object is garbage-collected, but that warning is easy to miss. It goes to stderr through the warnings module rather than to your application logs. And if something keeps a reference to the coroutine (for example, a list of pending work), the warning is delayed until that reference goes away.
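Here is a minimal sketch of that object model; record_call is a hypothetical helper. Calling the async function produces a coroutine object and leaves the side-effect list empty until the coroutine is awaited.

```python
import asyncio

calls: list[str] = []

async def record_call(tag: str) -> str:
    calls.append(tag)   # first line of the body: runs only once the coroutine is awaited or scheduled
    return tag

async def main() -> tuple[list[str], str]:
    coro = record_call("hello")   # creates a coroutine object, executes nothing
    before = list(calls)          # still empty at this point
    result = await coro           # now the body runs
    return before, result

before, result = asyncio.run(main())
print(before, result, calls)  # [] hello ['hello']
```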
The event loop is the heartbeat of every asyncio application. It maintains three structures:
- a queue of ready callbacks
- a selector watching registered I/O file descriptors (epoll on Linux, kqueue on macOS; on Windows the default proactor loop uses IOCP completion events instead of a selector)
- a heap of timer callbacks ordered by their scheduled time

When a coroutine awaits an I/O operation, it registers a callback with the selector and suspends. Control returns to the loop, which picks the next ready callback and runs it. When the OS signals that I/O is complete, the selector delivers the event and the callback is scheduled. The original coroutine then resumes from exactly where it yielded. This entire mechanism happens in a single thread. No OS context switches. No lock contention. No stack per concurrent operation beyond the coroutine frame itself.
- Each await is a voluntary yield — the coroutine says 'I am waiting on I/O, run something else while I am paused'
- The loop maintains a selector that monitors all registered I/O file descriptors and a callback heap ordered by scheduled time
- When I/O completes, the OS notifies the selector, the loop schedules the corresponding callback, and the suspended coroutine resumes
- No parallel execution ever happens: at any given instant, at most one coroutine is executing Python bytecode
- Context switches happen only at await boundaries — never mid-expression, never between two lines in the same function body
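The bullets above can be observed directly. In this sketch, worker and the log list are illustrative names. The two tasks are created but do not start until main() yields. Each one is then interrupted only at its await, here asyncio.sleep(0), which does nothing except hand control back to the loop.

```python
import asyncio

log: list[str] = []

async def worker(name: str) -> None:
    log.append(f"{name} start")
    await asyncio.sleep(0)        # a pure yield: hand control back to the loop
    log.append(f"{name} resume")

async def main() -> None:
    t1 = asyncio.create_task(worker("A"))   # scheduled, not yet running
    t2 = asyncio.create_task(worker("B"))
    log.append("main scheduled both")        # main is still running: no await has happened yet
    await asyncio.gather(t1, t2)             # main yields; now the tasks get their turns

asyncio.run(main())
print(log)
# ['main scheduled both', 'A start', 'B start', 'A resume', 'B resume']
```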
- A coroutine that has not been awaited or passed to create_task() is not running; there is no in-between state
- await coroutine() suspends the current coroutine until the result is ready, with no scheduling overhead
- asyncio.create_task(coroutine()) schedules the coroutine on the loop immediately and returns a Task handle you can await later
- Prefer create_task() over raw Futures for clarity

asyncio.gather(): Orchestrating True Concurrency
When you have independent I/O operations, awaiting them sequentially one by one is leaving performance on the table. If you have three health checks that each take one second, awaiting them in series costs three seconds. Running them with gather() costs one second — the duration of the slowest one. That is the entire value proposition of gather(), and it is substantial.
asyncio.gather() takes coroutines (or other awaitables) as positional arguments, so a list is passed as gather(*coros). It wraps each coroutine in a Task, schedules them all onto the event loop at once, and suspends the calling coroutine until every task completes. It then returns a list of results in the same order as the input arguments, regardless of which tasks finished first. This ordering guarantee is important and worth relying on: you can safely unpack results positionally.
One nuance worth understanding: gather() creates tasks at the moment it is called, not at the moment the await resolves. This means all tasks start running as soon as the event loop gets control after the gather() call, which is immediately when the caller awaits gather(). If you are constructing a list of coroutines before calling gather(), those coroutines have not started yet — they are still inert coroutine objects. Only gather() turns them into running Tasks.
The performance model is straightforward: gather() converts sequential wait time into concurrent wait time. The total duration is bounded by max(all task durations) rather than sum(all task durations). For workloads involving many small network calls — health checks, fanout requests to microservices, parallel database lookups — this can reduce latency by an order of magnitude.
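Here is a sketch of that max-versus-sum arithmetic. It assumes asyncio.sleep() as a stand-in for real network latency; health_check and the delays are made up for illustration. With delays of 0.1, 0.3 and 0.2 seconds, sequential awaits take about 0.6 seconds and gather() about 0.3.

```python
import asyncio
import time

async def health_check(name: str, delay: float) -> str:
    # Stand-in for a network call; asyncio.sleep yields to the loop the way real I/O would.
    await asyncio.sleep(delay)
    return f"{name}: ok"

CHECKS = [("db", 0.1), ("cache", 0.3), ("queue", 0.2)]

async def sequential() -> list[str]:
    # Each await finishes before the next starts: total is the SUM of the delays.
    return [await health_check(name, delay) for name, delay in CHECKS]

async def concurrent() -> list[str]:
    # gather() schedules all three at once: total is roughly the MAX of the delays,
    # and results come back in argument order, not completion order.
    return await asyncio.gather(*(health_check(name, delay) for name, delay in CHECKS))

def timed(coro_fn):
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start

seq_results, seq_time = timed(sequential)
con_results, con_time = timed(concurrent)
print(f"sequential {seq_time:.2f}s, gather {con_time:.2f}s")  # roughly 0.6s vs 0.3s
print(con_results)  # ['db: ok', 'cache: ok', 'queue: ok']
```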
- Default (return_exceptions=False): if any coroutine raises, the exception propagates to the caller immediately and gather() resolves, but the remaining tasks continue running in the background as orphaned tasks with no caller awaiting their results
- Orphaned tasks are not cancelled: they consume connections, memory, and file descriptors until they complete or time out on their own
- Over time in a busy service, orphaned tasks accumulate and connection pools are silently exhausted
- Always use return_exceptions=True in production — it captures every exception as a return value, no task is abandoned, and you inspect each result individually
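The orphan behaviour in the first two bullets is easy to reproduce. In this sketch, fails_fast and slow_ok are hypothetical stand-ins for upstream calls. gather() raises as soon as one coroutine fails, while its sibling is still running and was never cancelled.

```python
import asyncio

async def fails_fast() -> str:
    await asyncio.sleep(0.01)
    raise ValueError("upstream returned 500")

async def slow_ok(state: dict) -> str:
    await asyncio.sleep(0.2)
    state["finished"] = True
    return "ok"

async def main() -> tuple[bool, bool]:
    state = {"finished": False}
    slow = asyncio.create_task(slow_ok(state))
    still_running = False
    try:
        await asyncio.gather(fails_fast(), slow)   # default: return_exceptions=False
    except ValueError:
        # gather() has already raised, but the sibling task was NOT cancelled.
        still_running = not slow.done()
    await slow   # only because this demo kept a handle can it still collect the result
    return still_running, state["finished"]

outcome = asyncio.run(main())
print(outcome)  # (True, True)
```

Here the demo keeps a Task handle so it can await the survivor. In the real failure mode nobody holds one, which is what makes the task an orphan.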
In a service making many gather() calls per second against an upstream with a 5% failure rate, the default behaviour can create dozens of orphaned tasks every second.
- asyncio.create_task() schedules immediately and returns a Task handle; await it whenever you need the result
- create_task() and asyncio.wait() cover cases gather() cannot, because gather() requires all coroutines to be specified upfront

Fault Tolerance: Exceptions, Timeouts, and Cancellation
In production, external APIs fail, network partitions happen, and upstream services degrade. The question is not whether these events will occur — it is whether your async code handles them gracefully or cascades them into wider outages.
The two primary tools for fault tolerance in asyncio are gather(return_exceptions=True) for batch resilience and asyncio.wait_for() for individual operation timeouts. They address different failure modes and are commonly used together.
gather(return_exceptions=True) is the production standard for any batch operation against multiple dependencies. Instead of letting the first exception short-circuit the entire batch, it captures all exceptions as return values: exceptions are just results that happen to be error objects. You iterate the results list, check isinstance(result, Exception) for each entry, and handle successes and failures individually. Every task gets a chance to complete. No orphaned tasks. Full visibility into which dependencies failed and which succeeded.
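Here is a sketch of that triage loop. It assumes a hypothetical call_upstream() coroutine that fails for one of three services. One detail worth knowing: with return_exceptions=True, a cancelled child shows up in the results as a CancelledError. That is a BaseException, so checking against BaseException rather than Exception catches that case too.

```python
import asyncio

async def call_upstream(name: str, fail: bool) -> str:
    await asyncio.sleep(0.01)                  # stand-in for a real network call
    if fail:
        raise ConnectionError(f"{name} unreachable")
    return f"{name}: 200"

async def check_all() -> tuple[list[str], list[str]]:
    names = ["billing", "search", "profile"]
    results = await asyncio.gather(
        *(call_upstream(n, fail=(n == "search")) for n in names),
        return_exceptions=True,                # failures come back as values, nothing is orphaned
    )
    ok, failed = [], []
    for name, result in zip(names, results):   # results are in argument order
        if isinstance(result, BaseException):  # also catches CancelledError from cancelled children
            failed.append(f"{name}: {result!r}")
        else:
            ok.append(result)
    return ok, failed

ok, failed = asyncio.run(check_all())
print(ok)      # ['billing: 200', 'profile: 200']
print(failed)  # ["search: ConnectionError('search unreachable')"]
```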
asyncio.wait_for(coroutine, timeout=seconds) enforces a maximum duration on a single coroutine. If the coroutine does not complete within the timeout, asyncio cancels the wrapped coroutine and raises TimeoutError. Before Python 3.11 this was a separate asyncio.TimeoutError class; it is now an alias of the built-in. This is the mechanism for implementing SLAs on individual downstream calls. If your upstream health check should complete in under 3 seconds, wrap it in wait_for() with timeout=3.0 and handle TimeoutError explicitly.
Cancellation is where most engineers get tripped up. When wait_for() fires, it sends a CancelledError to the wrapped coroutine at its current await point. If the coroutine has cleanup logic that also awaits — closing a database connection, writing an audit log, releasing a lock — that cleanup will only run if it is inside a try/finally block. Code after a cancelled await does not execute. This is cooperative cancellation, and respecting it correctly is what separates async code that is safe from async code that merely appears to work in testing.
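The following sketch shows both halves of that rule; query_with_cleanup is a hypothetical stand-in for a downstream call. The line after the cancelled await never runs, while the finally block does, and it can still await. The example catches asyncio.TimeoutError, which works on every version because it became an alias of the built-in TimeoutError in Python 3.11.

```python
import asyncio

events: list[str] = []

async def query_with_cleanup() -> None:
    events.append("connection opened")
    try:
        await asyncio.sleep(10)          # stand-in for a downstream call that hangs
        events.append("after await")     # never runs: cancellation is raised at the await above
    finally:
        await asyncio.sleep(0)           # cleanup is allowed to await
        events.append("connection closed")

async def main() -> None:
    try:
        await asyncio.wait_for(query_with_cleanup(), timeout=0.05)
    except asyncio.TimeoutError:         # alias of built-in TimeoutError on 3.11+
        events.append("timed out")

asyncio.run(main())
print(events)  # ['connection opened', 'connection closed', 'timed out']
```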
- Default mode (return_exceptions=False): first exception propagates to the caller, gather resolves, remaining tasks continue running as orphans with no owner
- Safe mode (return_exceptions=True): all exceptions are captured as return values, no task is abandoned, you inspect each result individually
- wait_for() raises TimeoutError and cancels the target coroutine at its current await point — cleanup code must be in try/finally
- CancelledError is a BaseException, not an Exception — catching Exception does not catch it, which is intentional
- asyncio.shield(coroutine) protects a coroutine from external cancellation — the inner task continues even if the outer scope is cancelled
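Here is a sketch of shield() under external cancellation, with hypothetical names. The handler task is cancelled mid-write, yet the shielded audit write still completes. The example keeps a strong reference to the inner task, as the asyncio documentation advises for tasks passed to shield().

```python
import asyncio

async def write_audit_log(store: list) -> None:
    await asyncio.sleep(0.1)               # stand-in for a slow write that must not be lost
    store.append("audit written")

async def handler(store: list, keep: set) -> None:
    inner = asyncio.create_task(write_audit_log(store))
    keep.add(inner)                        # strong reference so the inner task cannot be collected
    inner.add_done_callback(keep.discard)
    # If handler is cancelled, this await raises CancelledError here,
    # but shield() leaves the inner task running.
    await asyncio.shield(inner)

async def main() -> tuple[bool, list]:
    store, keep = [], set()
    task = asyncio.create_task(handler(store, keep))
    await asyncio.sleep(0.01)              # let handler start the audit write
    task.cancel()                          # simulate the request being cancelled
    try:
        await task
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0.2)               # the shielded write finishes anyway
    return task.cancelled(), store

outcome = asyncio.run(main())
print(outcome)  # (True, ['audit written'])
```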
The Golden Rule: Never Block the Event Loop
This is the number one cause of production performance degradation in Python async services, and it is also the most insidious because it does not fail loudly. A blocking call does not raise an exception. It does not log a warning by default. It simply holds the event loop thread for its entire duration, during which every other coroutine in the process is frozen. One synchronous HTTP call taking 200ms freezes 10,000 concurrent connections for 200ms. Under load, these micro-freezes compound into p99 latency spikes that look like intermittent upstream degradation but are entirely self-inflicted.
The most common offenders in codebases I have reviewed: the requests library (always synchronous, even for simple GET calls), time.sleep() used as a delay inside handlers, CPU-heavy operations like image resizing or report generation, and third-party SDK clients that were written before async was widespread. The pattern is usually introduced by someone who understood the application's sync codebase well and did not fully internalise the async execution model.
For unavoidable synchronous code — a legacy library that cannot be replaced, a CPU-intensive operation that has no async equivalent — the correct approach is loop.run_in_executor(), which offloads the blocking call to a thread pool and returns an awaitable that the event loop can wait on without blocking itself. The thread pool runs in parallel with the event loop thread. The event loop remains free to process other coroutines while the thread pool handles the blocking work. This adds thread overhead, but it is categorically better than blocking the loop.
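Here is a minimal sketch of that hand-off; legacy_blocking_call is a hypothetical stand-in for a sync SDK method. While it blocks a worker thread for 0.2 seconds, a heartbeat coroutine on the loop keeps ticking roughly every 10 ms. On Python 3.9+, await asyncio.to_thread(legacy_blocking_call, 0.2) is an equivalent shorthand.

```python
import asyncio
import time

def legacy_blocking_call(seconds: float) -> str:
    # Hypothetical stand-in for a synchronous SDK call such as requests.get():
    # it blocks whatever thread runs it.
    time.sleep(seconds)
    return "legacy result"

async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    # Only ticks if the loop thread is free to run it.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def main() -> tuple[str, int]:
    loop = asyncio.get_running_loop()
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    # None selects the loop's default ThreadPoolExecutor; the await suspends main() only.
    result = await loop.run_in_executor(None, legacy_blocking_call, 0.2)
    stop.set()
    await hb
    return result, len(ticks)

result, tick_count = asyncio.run(main())
print(result, tick_count)  # the tick count varies, typically around 20
```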
For CPU-bound work that needs true parallelism, ProcessPoolExecutor bypasses the GIL by running work in separate processes. The inter-process communication overhead makes this appropriate for coarse-grained work (process this batch) rather than fine-grained work (transform this value).
- Enable debug mode and set loop.slow_callback_duration = 0.05 (50ms; the default is 100ms) to log a warning every time a callback takes longer than the threshold. The setting is only consulted in debug mode. This is the first tool to reach for
- Set PYTHONASYNCIODEBUG=1 in staging to enable full debug mode including unawaited coroutine detection
- Monitor event loop latency as a separate Prometheus metric — a blocked loop shows as latency spikes even when request volume is constant
- A healthy request rate combined with near-zero CPU is the operational signature of a blocked event loop — add an alert for this combination
- Use py-spy to sample a running async process without restarting it; yappi is coroutine-aware but runs inside the process, so it has to be enabled in code before you can profile with it
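Here is one way to build the loop-latency metric from the third bullet, sketched under stated assumptions. A watchdog coroutine sleeps for a fixed interval and records how late it actually woke up. The Prometheus export is left out, and monitor_loop_lag and accidentally_blocking are illustrative names. On a healthy loop the lag stays near zero; the injected time.sleep(0.3) shows up as a spike of roughly 250 to 300 ms.

```python
import asyncio
import time

async def monitor_loop_lag(samples: list, interval: float = 0.05, count: int = 6) -> None:
    # Sleep for a fixed interval and record how much later than requested we woke up.
    loop = asyncio.get_running_loop()
    for _ in range(count):
        start = loop.time()
        await asyncio.sleep(interval)
        samples.append(loop.time() - start - interval)

async def accidentally_blocking() -> None:
    await asyncio.sleep(0.1)
    time.sleep(0.3)   # the bug: a synchronous call inside a coroutine

async def main() -> float:
    samples: list[float] = []
    await asyncio.gather(monitor_loop_lag(samples), accidentally_blocking())
    return max(samples)

worst = asyncio.run(main())
print(f"worst loop lag: {worst * 1000:.0f} ms")
```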
A blocking requests.get() call with a 30-second timeout does not degrade your service; it takes it completely offline for as long as the call blocks. If the upstream is slow, that can be the full 30 seconds.

What Is Asyncio (And What It Absolutely Isn't)
Asyncio is a concurrency framework for I/O-bound work running in a single thread. It's not parallelism. It won't make your CPU-bound loops faster. What it does is let you juggle a thousand open network connections without breaking a sweat.
The event loop sits in the middle. It polls file descriptors, schedules coroutines when data arrives, and yields control back to waiting tasks. No threads, no GIL contention — just cooperative multitasking with explicit yield points.
When you call asyncio.run(main()), that creates a new event loop, runs main() as a coroutine, and blocks until done. Inside that loop, every await is a handshake: "I'm waiting on something, go run someone else's code while I do." If you forget to await, you get a coroutine object, not execution. If you block the loop with time.sleep() instead of asyncio.sleep(), you freeze the entire show for every other task.
The hard truth: asyncio is powerful, but it demands discipline. One synchronous call to requests.get() inside a coroutine and your async web server is now serving one request at a time.
Never call time.sleep(), requests.get(), or any blocking library inside a coroutine. They starve the event loop and make your async code run like synchronous garbage. If you must use blocking code, shove it into a thread with asyncio.to_thread() (Python 3.9+).

Asyncio vs Threading: When to Use a Sledgehammer vs a Scalpel
Threading gives you preemptive multitasking. The OS decides when to swap threads. Asyncio gives you cooperative multitasking. You decide when to yield. The difference isn't academic — it dictates what kind of work you can do.
Threading works for I/O-bound work. It also works for CPU-bound work, but only when the heavy lifting happens in code that releases the GIL, and it's expensive. Each thread reserves a 1–8 MB stack, and context switching costs real CPU cycles. Worse, for pure-Python code the GIL means threads don't parallelise CPU work at all. For 10k concurrent connections, threads will eat your memory; asyncio handles the same 10k connections in maybe 10 MB total.
Asyncio shines when you're waiting on something external: a database query, an HTTP response, a file read. While you wait, other coroutines run on the same thread. Zero context switch overhead. No GIL contention. But if you need to parse a 200 MB JSON blob or run a Monte Carlo simulation, asyncio won't help — you still block the loop. That's when you reach for multiprocessing or push work to a task queue.
Rule of thumb: I/O-bound and many concurrent tasks → asyncio. CPU-bound or complex locking → threads or processes. Mixing both? Carefully. You can offload blocking work to an executor with loop.run_in_executor(), using a ProcessPoolExecutor for pure-Python CPU work since threads still share the GIL. But you've just added complexity. Choose the right tool from the start.
For a blocking library you can't replace, use asyncio.to_thread(). It offloads the blocking work to a thread pool without you writing thread management code. But better yet, use an async driver like asyncpg.

Asyncio Best Practices That Save Your Prod Deploy
Most asyncio code breaks in production because devs treat it like threads with async/await syntax. Stop that.

Rule one: never call asyncio.run() from inside a running event loop. You'll get 'RuntimeError: asyncio.run() cannot be called from a running event loop', and your service crashes at 3 AM. When you're inside a coroutine, use asyncio.create_task() rather than the low-level loop.create_task(). The high-level API always schedules onto the currently running loop, so there is no loop object to pass around or get wrong.
Second: always wrap your main entry point in asyncio.run(main()). That function creates a fresh event loop, runs your coroutine, and cleans up all pending tasks. Never call loop.close() yourself unless you're writing framework internals. Third: for timeouts, use asyncio.timeout() (Python 3.11+) or asyncio.wait_for() — never implement your own sleep-polling loop. That's how you get 47-second freezes. Fourth: debug mode is your friend. Set PYTHONASYNCIODEBUG=1 or pass debug=True to asyncio.run(). It catches forgotten awaits and slow callbacks.
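Here is a small sketch of debug mode catching a blocking call, using only the standard library. asyncio reports slow callbacks through the "asyncio" logger, so a throwaway logging handler is attached to collect the messages. The handler class and coroutine names are illustrative.

```python
import asyncio
import logging
import time

class ListHandler(logging.Handler):
    """Collects log messages in memory so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())

collector = ListHandler()
logging.getLogger("asyncio").addHandler(collector)

async def handler_with_sync_call() -> None:
    time.sleep(0.1)   # the bug: blocks the loop thread for 100 ms

async def main() -> None:
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05   # 50 ms threshold; only consulted in debug mode
    await handler_with_sync_call()

asyncio.run(main(), debug=True)
slow = [m for m in collector.messages if "took" in m]
print(slow)  # e.g. ['Executing <Task finished ...> took 0.100 seconds']
```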
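Fifth: if you're doing CPU-bound work inside a coroutine, you've already lost. Offload it with run_in_executor(). Use a ProcessPoolExecutor for pure-Python work, and a ThreadPoolExecutor only when the work runs in code that releases the GIL. The event loop isn't magic; it's an I/O scheduler.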
The short version: asyncio.run() at the entry point, create_task() inside, timeouts on every I/O call.

Real-World Asyncio: Where It Pays and Where It Chokes
Asyncio shines when you're waiting on I/O — network calls, file reads, database queries, API requests. Think web scrapers hitting a hundred endpoints, or a chat server handling ten thousand connections. Each coroutine yields the CPU while waiting, so one thread handles all of them. That's the sweet spot: high-concurrency I/O where latency dominates, not CPU cycles.
Where does asyncio choke? CPU-bound work. Parsing a 10GB JSON file, image processing, or running ML inference inside a coroutine blocks the event loop. Your 10k connections freeze. Threading or multiprocessing is the right tool there. Also: complex synchronous libraries like some database drivers (looking at you, older MySQL connectors) silently block the loop. Wrap them in run_in_executor() or choose an async-native driver.
Real production pattern: FastAPI or aiohttp for the web layer, async Redis and asyncpg for data, and a separate multiprocessing pool for heavy lifting. The event loop handles thousands of concurrent I/O tasks; the process pool handles the CPU-bound grunt work. Mix them properly, and you scale to thousands of requests per second on a single box.
The Inner Workings of Coroutines
Coroutines are not just async functions. Behind the scenes, every async def produces a generator-like coroutine object with __await__, send() and throw() methods.

When you await something, the coroutine suspends itself, saving its local state and instruction pointer, and yields control back to the event loop. In asyncio, the Task wrapping the coroutine keeps a reference to it. When the awaited signal arrives (like a socket becoming readable), the loop runs the Task's next step. That step calls send(None) on the coroutine, which resumes execution exactly where it paused.

This is fundamentally cooperative: no preemption, no OS thread stack. Every await point is a voluntary yield. Understanding this explains why blocking in a coroutine is catastrophic: you aren't giving the loop a chance to call send() on other waiting coroutines. The loop resumes only one coroutine at a time, in a single thread. It can still juggle thousands, because each resumed coroutine runs only until its next await. This mechanism is why asyncio achieves concurrency without parallelism: each coroutine is a tiny state machine driven by the loop.
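The send() mechanics can be driven by hand, outside asyncio entirely. YieldOnce below is a hypothetical toy awaitable. Real asyncio Tasks only accept awaitables that yield Futures, so this one would be rejected inside asyncio.run(). It still shows exactly what the loop does on each resume.

```python
class YieldOnce:
    """Hypothetical toy awaitable: suspends exactly once, like a coroutine parked on I/O."""

    def __await__(self):
        # A generator-based __await__: the yielded value travels out through send(),
        # and whatever the driver sends back in becomes the result of the await.
        value = yield "waiting on I/O"
        return value

async def fetch() -> str:
    data = await YieldOnce()
    return f"got {data}"

coro = fetch()
first = coro.send(None)        # starts the body; runs until the first suspension point
print(first)                   # waiting on I/O
final = None
try:
    coro.send("payload")       # resume it, the way a Task does when the I/O completes
except StopIteration as finished:
    final = finished.value     # a coroutine's return value rides on StopIteration
print(final)                   # got payload
```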
A Homemade asyncio.sleep
Built-in asyncio.sleep(n) seems like magic, but you can build one from scratch to internalize how the event loop schedules work. The key is a future: an object that signals readiness. When you create a future and call loop.call_later(seconds, future.set_result, None), you schedule a callback to mark the future as done after the delay. Then await future suspends the coroutine until that callback fires. Your homemade sleep must return an awaitable that does exactly two things: (1) schedule a callback on the loop with your delay, (2) yield control by awaiting a future that the callback resolves. That's it. No busy waiting, no threads. The event loop holds the timer in its internal heap; when the timer expires, it runs the callback, which marks the future done, which resumes your coroutine. This reveals that asyncio.sleep is a thin wrapper around loop.call_later + future. Understanding this lets you create custom timed waits, cancellable delays, or polling loops that don't block.
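Here is a sketch of that two-step recipe; homemade_sleep and _wake are illustrative names. The callback checks fut.done() first, because the awaiting task may have been cancelled before the timer fires. Calling set_result() on a cancelled future raises InvalidStateError. In that case the finally block also cancels the pending timer.

```python
import asyncio
import time

def _wake(fut: asyncio.Future) -> None:
    # The waiter may have been cancelled before the timer fired; calling
    # set_result() on a cancelled future would raise InvalidStateError.
    if not fut.done():
        fut.set_result(None)

async def homemade_sleep(seconds: float) -> None:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()                      # fresh future on every call
    handle = loop.call_later(seconds, _wake, fut)   # (1) schedule the wake-up on the loop's timer heap
    try:
        await fut                                   # (2) suspend until the callback resolves it
    finally:
        handle.cancel()                             # harmless if already fired; drops the timer if we were cancelled

async def main() -> float:
    start = time.perf_counter()
    await asyncio.gather(*(homemade_sleep(0.2) for _ in range(3)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"{elapsed:.2f}s")  # about 0.2s, not 0.6s: the three sleeps overlap
```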
Calling set_result() more than once on the same future raises InvalidStateError. Your homemade sleep must create a fresh future on each call; never reuse one.

The 47-Second Freeze: How a Single requests.get() Call Locked an Entire Async Service
Because requests.get() is a blocking call, it held the event loop thread for the full 47 seconds. During that time, the event loop could not schedule any other coroutine. No new requests could be accepted. All in-flight awaitable operations stalled. The process was alive but effectively brain-dead. The real danger here was the silence: no exception was raised, no warning was logged, and the call eventually succeeded from the application's perspective. It just took 47 seconds and took the entire service down with it.

The fix:
- Replaced requests.get() with httpx.AsyncClient.get() using await.
- Added an asyncio.wait_for() wrapper with a 3-second timeout to enforce an SLA on the health check.
- Instrumented event loop latency as a separate Prometheus metric, so that blocking calls above 50ms would trigger an alert before manifesting as a user-visible outage.
- Implemented a flake8-async linter rule in the CI pipeline that flags synchronous blocking calls inside async function definitions. Humans forget under deadline pressure; CI does not.

Lessons:
- Never call synchronous blocking functions inside the event loop thread: one blocked call freezes every coroutine running in that process, not just the one making the call
- Use async-native libraries for all network I/O in async code paths: httpx instead of requests, motor instead of pymongo, aiofiles instead of the built-in open()
- Enforce linting rules that detect sync blocking calls in async contexts at CI time; code review catches many things but not this class of subtle correctness issue reliably
- Monitor event loop latency as a first-class metric separate from request latency — a healthy request count combined with near-zero CPU is the operational signature of a blocked event loop, and it needs its own alert
- Add return_exceptions=True to the gather() call and re-run. Inspect the full results list for Exception instances rather than letting the first failure short-circuit. Count asyncio.all_tasks() before and after to verify no orphaned tasks remain running after the gather resolves.
- Run gc.collect() followed by len(asyncio.all_tasks()) to count live tasks. A task count that grows monotonically under load points to fire-and-forget coroutines that were scheduled but never awaited: they are running, but nobody is collecting their results.
- Check whether the asyncio.wait_for() timeout is sufficient for current load conditions, and whether the event loop itself is under contention. The timeout is measured on the loop's monotonic clock, but the wrapped operation only makes progress when the loop schedules it. If the loop is busy for 200ms between iterations, a 100ms timeout will fire even if the target operation only needed 50ms of actual work.
- A one-off python -c command cannot enable slow-callback monitoring, because it configures a throwaway loop in a separate process. Do it inside the application instead: loop = asyncio.get_running_loop(); loop.set_debug(True); loop.slow_callback_duration = 0.05
- Attach strace to the live process to count network and read/write syscalls: strace -p "$(pgrep -f your_app)" -e trace=network,read,write -c

Key takeaways
- … open(). A single synchronous library call can freeze the entire service.
- … create_task() … gather() for structured concurrency

Common mistakes to avoid
Calling a coroutine without awaiting it
Always await the coroutine: await my_coroutine(). For fire-and-forget patterns where you want to start work without waiting for it, use asyncio.create_task(my_coroutine()) and store the returned Task in a set or list. Add a done callback to remove it from the set on completion: task.add_done_callback(active_tasks.discard). Without storing the reference, the Task itself may be garbage collected before it completes.

Using synchronous libraries inside async handlers
Replace each blocking library with its async-native counterpart: requests becomes httpx, pymongo becomes motor, and the built-in open() becomes aiofiles. For synchronous code that cannot be replaced, wrap it in loop.run_in_executor(None, sync_function). This offloads the call to the loop's default thread pool and returns an awaitable that the event loop can wait on without blocking itself.

Using gather() without return_exceptions=True in production
Pass return_exceptions=True to asyncio.gather() in production paths. Iterate the results list and handle exceptions individually: for i, result in enumerate(results): if isinstance(result, Exception): log_and_handle(i, result). This gives you full visibility into which tasks failed and which succeeded, with no orphaned tasks and no abandoned resources.

Creating tasks without storing references to them
Keep a strong reference to every background task: background_tasks = set(); task = asyncio.create_task(coro()); background_tasks.add(task); task.add_done_callback(background_tasks.discard). For Python 3.11+, asyncio.TaskGroup provides structured lifecycle management that eliminates this class of bug entirely.

Using time.sleep() instead of asyncio.sleep() in async code
Replace time.sleep(n) with await asyncio.sleep(n). If a third-party library calls time.sleep() internally and cannot be replaced, wrap the entire call in loop.run_in_executor(None, library_function). That moves it to a thread pool where it can block safely without holding the event loop.

Ignoring CancelledError in cleanup logic after a timeout
When a coroutine is cancelled by a wait_for() timeout, cleanup code that follows the cancelled await statement never executes. Database connections are not closed, file handles are left open, distributed locks are not released, and temporary resources leak. Over hours of production traffic, connection pools are gradually exhausted without a clear explanation in the logs.

Put cleanup in a finally block: try: result = await work(); finally: await cleanup(). If the cleanup itself must complete even if the outer scope is cancelled, wrap it in asyncio.shield(): try: result = await work(); finally: await asyncio.shield(critical_cleanup()). The shield prevents the cleanup coroutine from being cancelled along with its parent.

Interview Questions on This Topic
Explain the starvation problem in an event loop. How does a single blocking call affect other unrelated coroutines?
Suppose a coroutine makes a blocking call: time.sleep(), a CPU-intensive computation, or any call that does not hand control back to the loop. It then holds the OS thread for the entire duration of that call. The event loop cannot interrupt it. No other coroutine can be scheduled. No new I/O events can be processed. The entire process is effectively frozen.
This is starvation: coroutines that are ready to run, that have I/O results waiting for them, sit in the loop's ready queue with no CPU time. They are not waiting on anything external — they are waiting for one coroutine to stop monopolising the thread.
The solutions are:
- replace blocking calls with async equivalents for I/O-bound work
- use loop.run_in_executor() with a thread pool for unavoidable synchronous code
- use ProcessPoolExecutor for CPU-bound work that needs real parallelism

Detecting it in production requires instrumenting event loop latency separately from request latency: a blocked loop shows up as scheduling delay even when the underlying operations would be fast.

Frequently Asked Questions