
FastAPI Background Tasks and Async Endpoints

Master FastAPI concurrency.
⚙️ Intermediate — basic Python knowledge assumed
In this tutorial, you'll learn
  • BackgroundTasks is ideal for fire-and-forget logic that does not require a durable distributed system — but know its limits precisely: no retries, no persistence, no survival on restart. The boundary between BackgroundTasks and a durable queue is the question 'does losing this task cause a business impact?'
  • Endpoints declared with async def must use only non-blocking calls at every I/O boundary. time.sleep(), requests.get(), synchronous database drivers, and synchronous file I/O inside async def are production incidents in waiting — they block the entire event loop and freeze all concurrent requests with no error signal.
  • Regular def endpoints are safe for synchronous and CPU-bound code because FastAPI automatically submits them to the AnyIO thread pool. The pool's default limit is 40 worker threads — exhaust it and sync endpoint requests queue silently.
Quick Answer
  • FastAPI handles concurrency via two paths: BackgroundTasks for post-response work and async/await for non-blocking I/O
  • Use async def only when calling async libraries (httpx, asyncpg, motor) — never for synchronous blocking code
  • Use regular def for synchronous or CPU-bound work — FastAPI auto-offloads it to a thread pool via AnyIO
  • BackgroundTasks execute AFTER the response is sent — the client never waits for them
  • Calling time.sleep() inside async def blocks the entire event loop and freezes every concurrent request on the server
  • BackgroundTasks are lost on server restart — use Celery or ARQ for mission-critical jobs that must survive crashes
  • The AnyIO thread pool that runs sync endpoints allows 40 threads by default — exhaust it and sync endpoint requests start queuing silently
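The `asyncio.to_thread` rule from the bullets above can be seen in a minimal, self-contained sketch using only the standard library (no FastAPI involved): a heartbeat coroutine keeps ticking while a blocking call runs in a worker thread, proving the event loop stays free.

```python
import asyncio
import time


def blocking_io() -> str:
    # Stands in for requests.get(), smtplib, or a sync database driver.
    time.sleep(0.2)
    return "done"


async def main() -> tuple[str, int]:
    ticks = 0

    async def heartbeat() -> None:
        # A well-behaved coroutine that should keep ticking while we wait.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    # Offload the blocking call to a worker thread; the loop stays free.
    result = await asyncio.to_thread(blocking_io)
    hb.cancel()
    return result, ticks


result, ticks = asyncio.run(main())
# ticks stays well above zero: the heartbeat kept running during blocking_io
```

If `blocking_io()` were called directly inside `main()` instead, the heartbeat would never get a turn: the tick count stays at zero until the blocking call returns.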
🚨 START HERE
FastAPI Concurrency Debug Cheat Sheet
When your FastAPI server is slow or background tasks are failing, run these checks in order. Start with the event loop before touching infrastructure.
🟠Event loop blocked — all endpoints frozen, CPU low, health checks timing out
Immediate Action: Find which async endpoint is making a synchronous blocking call — this is the cause in 90% of cases
Commands
PYTHONASYNCIODEBUG=1 uvicorn main:app --log-level debug 2>&1 | grep -i 'Executing.*took\|slow callback\|blocked'
grep -rl 'async def' app/ | xargs grep -l 'time\.sleep\|requests\.get\|smtplib\|open(' 2>/dev/null
Fix Now: Wrap the blocking call with `await asyncio.to_thread(sync_function, args)` or convert the endpoint to regular def so FastAPI offloads it to the thread pool automatically
🟡Background task silently failing — API returns success but side effects never happen
Immediate Action: Check server logs for unhandled exceptions in the background function scope — if none exist, the function has no error handling
Commands
docker compose logs api --tail=200 | grep -i 'error\|exception\|traceback\|background'
grep -rn 'add_task' app/ | head -20
Fix Now: Add try/except with structured logging inside every background function body. Add a metric counter for failures so silent errors produce an observable signal in your monitoring platform
🟠Low throughput despite low CPU — requests queuing with no obvious bottleneck
Immediate Action: Check Uvicorn worker count and thread pool saturation before looking at the application code
Commands
ps aux | grep uvicorn | grep -v grep
curl -s http://localhost:8000/metrics | grep -i 'thread\|worker\|active'
Fix Now: Increase workers: `uvicorn main:app --workers $(nproc)`. If thread pool saturation is the issue, convert high-traffic sync endpoints to async with native async libraries, or raise the AnyIO limit at startup: `anyio.to_thread.current_default_thread_limiter().total_tokens = 100`
Production Incident: Registration Endpoint Frozen — async def with Synchronous SMTP Killed the Event Loop
A developer added email sending inside an `async def` endpoint using the synchronous `smtplib` library. Under load, every registration request blocked the event loop for 3 seconds, causing all other endpoints to queue. P99 latency across the entire API jumped from 50ms to 30 seconds within two minutes of a marketing campaign launch.
Symptom: All API endpoints became sluggish simultaneously during registration spikes — not just the registration endpoint. Health check endpoints started timing out. Kubernetes liveness probes began failing. No CPU spike was visible, no memory leak, no database errors — the server appeared nearly idle by resource metrics but refused new connections and returned responses with multi-second delays.
Assumption: The team assumed the PostgreSQL database was the bottleneck, since registration is a write-heavy operation. They spent two hours profiling slow queries, checking connection pool saturation, and reviewing the EXPLAIN ANALYZE output for the INSERT statement. The database was completely healthy — 200 idle connections, sub-5ms query times, no lock contention. The investigation was looking in entirely the wrong layer.
Root cause: The registration endpoint was declared as async def and called smtplib.SMTP().sendmail() — a synchronous blocking call that holds the executing thread for the duration of the SMTP handshake and data transfer. In asyncio, there is only one thread running the event loop. A blocking call on that thread does not just slow down the current request — it prevents every other coroutine from advancing for the entire duration of the block. With a 3-second SMTP handshake, every registration request held the event loop hostage for 3 seconds. Under modest load, this stacked: 10 concurrent registrations meant the event loop was blocked for 30 seconds of every 30-second window. Health checks, order lookups, search requests — everything queued behind SMTP.
Fix: Moved email sending to BackgroundTasks using a regular synchronous function, which FastAPI automatically offloads to its internal thread pool. The event loop is no longer involved in the SMTP handshake. For services where email delivery reliability mattered more, switched to aiosmtplib for a truly async SMTP implementation that works correctly inside async def. Added a ruff lint rule (ASYNC210) to the CI pipeline to flag synchronous I/O calls inside async functions at the pull request stage, before they reach production. Added event loop lag monitoring via a middleware that records the delta between when a request callback is scheduled and when it actually executes — a lag above 100ms now triggers an alert.
Key Lesson
  • Never call synchronous I/O inside async def. The event loop is single-threaded and cooperative — a blocking call does not yield, it occupies. Every other request on the server waits.
  • If you must use a synchronous library inside an async endpoint, wrap it: result = await asyncio.to_thread(sync_function, arg1, arg2). This delegates execution to the thread pool without blocking the event loop.
  • BackgroundTasks with a regular def function is the correct pattern for fire-and-forget sync work like email sending. The task runs in the thread pool after the response is sent, and the event loop is never involved.
  • Monitor event loop lag as a first-class production metric. CPU and memory metrics will look normal during an event loop blockage — the server appears idle while being completely unresponsive. Lag monitoring is the only reliable way to catch blocking calls before users notice.
  • Add async-aware linting to CI. Ruff's ASYNC rule set detects common blocking patterns inside async functions — time.sleep, requests.get, synchronous file I/O. Catching these at review time costs nothing; catching them in production costs hours.
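The lag-monitoring idea can be approximated outside FastAPI with a few lines of stdlib asyncio. This is a hedged sketch of the concept, not the team's actual middleware: schedule a short sleep, measure how late the loop wakes you, and treat drift above a threshold as an alert condition.

```python
import asyncio
import time


async def watch_loop_lag(samples: int = 8, interval: float = 0.05) -> float:
    """Measure how late the event loop resumes us after each sleep.

    On a healthy loop the drift is near zero. A blocking call anywhere
    on the loop shows up here as a large spike, even though CPU is idle.
    """
    worst = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lag = (time.perf_counter() - start) - interval
        worst = max(worst, lag)
    return worst


async def demo() -> float:
    monitor = asyncio.create_task(watch_loop_lag())
    await asyncio.sleep(0.1)   # let a few clean samples accumulate
    time.sleep(0.3)            # the offending blocking call (like sync SMTP)
    return await monitor


worst_lag = asyncio.run(demo())
# worst_lag spikes far above a 100 ms alert threshold because of time.sleep()
```

In a real deployment you would run the watcher as a long-lived task and export the lag value as a gauge metric rather than returning it.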
Production Debug Guide
Symptom → Action mapping for async/blocking problems
All endpoints slow during traffic spikes, but CPU usage is low and the database looks healthy
This pattern — low CPU, healthy database, slow responses — is almost always an event loop blockage. Enable PYTHONASYNCIODEBUG=1 to activate asyncio's slow callback detector, which logs any callback that takes longer than 100ms to complete. Search your async endpoints for time.sleep(), requests.get(), synchronous file operations, or any database driver that is not async-native. The blocking call is almost always in an async def function that looks correct at a glance. Once found, either convert the endpoint to regular def or wrap the blocking call with await asyncio.to_thread().
BackgroundTask email or notification never arrives, but the API returns 201 every time
FastAPI silently swallows exceptions raised inside background functions — the HTTP response has already been sent by the time the exception occurs, so there is nowhere to propagate it. Check server logs for unhandled exceptions in background functions, but if you have not added explicit try/except with logging inside the background function, there will be nothing to find. Add structured error logging as an immediate fix. Add a Prometheus counter or Datadog metric for background task failures so silent failures produce an observable signal going forward.
Background tasks execute slowly and subsequent requests are noticeably delayed
Check whether the background function is declared as async def. Async background functions run on the event loop, not in the thread pool, which means a slow async background task can starve other event loop operations. For CPU-bound or slow I/O background work, the background function should be a regular def — FastAPI runs it in the thread pool, leaving the event loop free. If the function genuinely needs to be async (it calls async libraries), ensure it properly awaits all I/O and does not contain any blocking calls.
Server handles only 10–20 concurrent requests despite low CPU and memory usage
Check the ASGI server worker count first — Uvicorn defaults to a single worker process, which means a single event loop. Increase with --workers to match available CPU cores: uvicorn main:app --workers 4. If you are already running multiple workers, check for thread pool exhaustion on sync endpoints. The default AnyIO thread limiter allows 40 threads. If all threads are occupied by long-running synchronous operations, subsequent sync requests queue. Monitor active thread count and consider converting high-traffic sync endpoints to async with native async libraries.
Background task data is lost after server restart or rolling deploy
This is expected behavior, not a bug. BackgroundTasks are in-memory and tied to the process lifecycle. There is no persistence layer, no broker, no checkpointing. Tasks that are queued or in-flight when the process receives SIGTERM are dropped. For tasks that must survive restarts — user notifications, order confirmation emails, data pipeline steps — migrate to Celery with a Redis or RabbitMQ broker, or ARQ if you prefer a lighter async-native alternative. The migration is straightforward: the function body stays the same; you change how you enqueue the task.

FastAPI is built on Starlette and asyncio, which makes it one of the fastest Python web frameworks available. That speed is not free — it is entirely conditional on the developer understanding how the event loop works and respecting its rules. Unlike traditional synchronous frameworks where every request gets its own thread and isolation is automatic, FastAPI can handle thousands of concurrent connections on a single thread. The moment you introduce a blocking call into that single thread, every one of those connections stalls together.

The core decision point is deceptively simple on the surface: async def or def. Get it right and you get the throughput numbers in the benchmarks. Get it wrong and your production server handles 10 requests per second instead of 10,000 — with no obvious error, no alarm, just a wall of slow responses and a CPU that looks inexplicably idle.

This guide covers both concurrency paths — BackgroundTasks for fire-and-forget post-response logic, and async/await for non-blocking I/O — with the kind of specificity that only comes from seeing both patterns succeed and fail in production. The failure modes are predictable. The fixes are mechanical once you understand the model.

BackgroundTasks — Run After Response

The standard HTTP request-response cycle forces the client to wait for every operation inside the endpoint to complete before receiving a response. For operations that the user does not need to wait for — sending a welcome email, writing an audit log entry, invalidating a CDN cache — this is pure latency overhead with no user-facing benefit.

FastAPI's BackgroundTasks class solves this with a clean hook into the ASGI response lifecycle. You register a function (and its arguments) with the task manager inside your endpoint. FastAPI sends the HTTP response to the client first, then executes the registered functions afterward. The client's round-trip time reflects only the endpoint logic, not the background work.

The execution model has a few characteristics worth understanding clearly. Tasks are executed sequentially, not in parallel — if you register three background tasks, they run one after another in registration order, not concurrently. Regular def background functions run in FastAPI's thread pool, keeping the event loop free. Async def background functions run on the event loop itself — which means a slow async background function can delay other requests if it does not yield control frequently.

The hardest thing to accept about BackgroundTasks is what it explicitly is not. It is not a task queue. It has no retry mechanism. It has no persistence layer. It has no scheduling capability. It has no visibility into task status after the response is sent. If the server process dies while a background task is running or queued, the task is gone with no record of it. This is not a limitation to work around — it is a design boundary that defines where BackgroundTasks is appropriate and where a durable task queue is required.

The appropriate use cases are narrow but common: sending transactional emails, writing audit or activity log entries, cache invalidation, incrementing counters in analytics systems where occasional data loss is acceptable. The inappropriate use cases are equally clear: payment processing, order fulfillment, document generation, any operation where losing the task in a crash would cause a business impact.

io/thecodeforge/tasks/registration.py · PYTHON
from fastapi import FastAPI, BackgroundTasks, status
from pydantic import BaseModel, EmailStr
import logging
import time

app = FastAPI()
log = logging.getLogger(__name__)


class RegistrationRequest(BaseModel):
    email: EmailStr
    username: str


def send_welcome_email(email: str, username: str) -> None:
    """
    Regular def: FastAPI runs this in the thread pool after the response is sent.
    The event loop is not involved. time.sleep() is safe here because this
    runs in a worker thread, not on the event loop thread.

    Critical: wrap everything in try/except. FastAPI silently swallows
    exceptions raised inside background functions. Without this block,
    a failed SMTP connection produces no log entry and no alert — the
    user simply never receives the email and no one knows.
    """
    try:
        log.info(
            "welcome_email_started",
            extra={"email": email, "username": username}
        )
        # Simulate SMTP handshake — safe in a thread pool, catastrophic on event loop
        time.sleep(2)
        # In production: use smtplib here, or aiosmtplib in an async background task
        log.info(
            "welcome_email_sent",
            extra={"email": email, "username": username}
        )
    except Exception:
        # Log with exc_info=True to capture the full traceback
        # Also increment a metric counter here in production:
        # metrics.counter("forge.bg.email.failure").increment()
        log.error(
            "welcome_email_failed",
            extra={"email": email},
            exc_info=True
        )


def write_registration_audit_log(email: str, username: str) -> None:
    """
    Second background task — runs sequentially after send_welcome_email.
    Tasks registered with add_task() execute in registration order, not in parallel.
    """
    try:
        log.info(
            "registration_audit_written",
            extra={"email": email, "username": username, "event": "user_registered"}
        )
    except Exception:
        log.error("audit_log_failed", extra={"email": email}, exc_info=True)


@app.post('/forge/register', status_code=status.HTTP_201_CREATED)
async def register_user(payload: RegistrationRequest, tasks: BackgroundTasks):
    """
    Registers a user and enqueues post-response work.

    Execution order:
      1. Validate payload (Pydantic)
      2. Write user to database (not shown)
      3. Send HTTP 201 to client  <-- client connection returns here
      4. send_welcome_email() runs in thread pool
      5. write_registration_audit_log() runs in thread pool

    The client's measured latency includes only steps 1-3.
    Steps 4-5 are invisible to the client and independent of client connectivity.
    """
    # Step 1: In production, write to database here
    # await db.users.insert({"email": payload.email, "username": payload.username})

    # Step 2: Register background tasks — they do not run yet
    tasks.add_task(send_welcome_email, payload.email, payload.username)
    tasks.add_task(write_registration_audit_log, payload.email, payload.username)

    # Step 3: Return response — background tasks run after this is sent
    return {
        "status": "created",
        "detail": "Registration complete. Welcome email is on its way.",
    }


# POST /forge/register {"email": "dev@thecodeforge.io", "username": "forge-dev"}
# Client receives immediately:
# -> {"status": "created", "detail": "Registration complete. Welcome email is on its way."}
#
# Server logs after response (invisible to client):
# INFO  welcome_email_started    email=dev@thecodeforge.io
# INFO  welcome_email_sent       email=dev@thecodeforge.io
# INFO  registration_audit_written email=dev@thecodeforge.io
▶ Output
{"status": "created", "detail": "Registration complete. Welcome email is on its way."}
Mental Model
BackgroundTasks as a Lightweight Post-Response Hook
BackgroundTasks is a sequential post-response executor — not a queue, not a scheduler, not a retry system. It runs registered functions after the HTTP response is dispatched, in the order they were registered, with no persistence and no fault tolerance. Know exactly what it is, and it is extremely useful. Expect more from it and it will disappoint you at the worst possible moment.
  • Tasks run AFTER the response is fully sent — the client's round-trip time excludes background work entirely.
  • Tasks run sequentially in registration order — a slow first task delays all subsequent tasks. There is no parallelism between background tasks.
  • Regular def background functions run in the thread pool — synchronous I/O is safe. Async def background functions run on the event loop — blocking calls inside them affect all concurrent requests.
  • Exceptions inside background functions are silently swallowed by FastAPI. Without explicit try/except and logging inside the background function, failures are invisible. This is the most common operational mistake with BackgroundTasks.
  • No retries, no persistence, no scheduling — if the process dies, tasks die with it. Use for: welcome emails, audit logs, cache invalidation. Do NOT use for: payment processing, order fulfillment, document generation, or any operation where losing the task causes a business impact.
📊 Production Insight
FastAPI silently swallows exceptions raised inside background functions because the HTTP response has already been sent by the time the exception occurs — there is nowhere to propagate it to the client, and the framework makes a deliberate choice not to crash the server process over a background task failure.
The consequence: if your email function crashes due to an SMTP timeout, an invalid recipient address, or an uncaught import error, the API returns 201 and the user never receives the email. No error log appears unless you wrote one. No alert fires unless you instrumented one. The failure is completely invisible to both the user and the operator.
Rule: every background function must have a top-level try/except block that logs the exception with exc_info=True and increments a failure counter metric. This is not optional defensive programming — it is the minimum viable error observability for a function that has no other error reporting path.
🎯 Key Takeaway
BackgroundTasks is a post-response hook — not a queue, not a retry system, not a scheduler. It runs registered functions sequentially after the HTTP response is sent, in the same process, with no persistence layer.
Exceptions are silently swallowed. Always wrap background function bodies in try/except with structured logging and a failure metric.
Use it for fire-and-forget work where occasional data loss is acceptable. The moment a task must complete reliably or survive a server restart, you are past what BackgroundTasks can offer — reach for a durable task queue.
Choosing BackgroundTasks vs a Durable Task Queue
If: Task is non-critical and losing it on server restart has no business impact (welcome email, cache bust, analytics event)
Use: BackgroundTasks — zero infrastructure overhead, no broker to operate, fast to implement
If: Task must complete even if the server restarts, crashes, or is killed mid-execution
Use: Celery with Redis or RabbitMQ, or ARQ for async-native workloads — durable broker required
If: Task needs automatic retries on failure, exponential backoff, or dead-letter handling
Use: Celery with retry policies — BackgroundTasks has no retry mechanism whatsoever
If: Task is CPU-intensive (image resizing, PDF generation, video transcoding, ML inference)
Use: Celery with dedicated workers in separate processes — background tasks in the API process compete for CPU with request handling and will degrade both

async def vs def — When to Use Which

This is the single most consequential decision in any FastAPI codebase, and it is the most frequently misunderstood. The surface-level explanation — 'use async def for async code and def for sync code' — is correct but incomplete. Understanding why FastAPI behaves differently based on the function signature is what separates code that performs from code that silently degrades under load.

When FastAPI receives a request for a regular def endpoint, it does not execute the function on the event loop. It submits the function to AnyIO's thread pool executor and awaits the result. The event loop is free to handle other requests while the sync function runs in a worker thread. This is why synchronous code inside a regular def endpoint is completely safe — it runs in isolation from the event loop.

When FastAPI receives a request for an async def endpoint, it awaits the coroutine directly on the event loop thread. No thread is involved. The coroutine runs cooperatively — it progresses until it hits an await expression, yields control back to the event loop, and resumes when the awaited operation completes. This cooperative yielding is what allows thousands of concurrent requests to share a single thread efficiently. The contract is simple: every operation inside an async def function must yield at every I/O boundary.

Break that contract — call time.sleep(), call requests.get(), open a synchronous database connection — and the cooperative model collapses. The event loop thread is occupied for the duration of the blocking call, and every other coroutine waits. There is no error, no warning, no exception. The server simply becomes unresponsive in proportion to how long and how frequently the blocking call occurs.
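That collapse is easy to reproduce outside FastAPI. In this stdlib-only sketch, five coroutines that await asyncio.sleep overlap their waits, while five that call time.sleep serialize on the single loop thread:

```python
import asyncio
import time


async def yields_properly() -> None:
    await asyncio.sleep(0.2)   # yields: other coroutines run during the wait


async def hogs_the_loop() -> None:
    time.sleep(0.2)            # never yields: nothing else can run meanwhile


async def timed(coro_fn, n: int = 5) -> float:
    # Run n copies concurrently and report total wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(coro_fn() for _ in range(n)))
    return time.perf_counter() - start


overlapped = asyncio.run(timed(yields_properly))   # five 0.2 s waits overlap
serialized = asyncio.run(timed(hogs_the_loop))     # five 0.2 s blocks stack up
```

The cooperative version finishes in roughly one sleep interval; the blocking version takes roughly five, exactly the degradation a blocked FastAPI worker exhibits under concurrent load.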

The thread pool that handles regular def endpoints is capped by AnyIO's default limiter at 40 threads. If all 40 are executing simultaneously — each holding a thread for a database query, a file read, or an external HTTP call — request 41 must wait for a thread to become available. This is thread pool exhaustion, and it produces the same symptom as event loop blockage: slow responses with low CPU usage. Monitor active thread count and tune the pool size or migrate high-traffic sync endpoints to async libraries accordingly.
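Thread pool exhaustion can be simulated with the standard library. Here a toy two-thread pool stands in for the server's worker threads; this is an illustration of the queuing behavior, not FastAPI's internals:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_sync_endpoint() -> float:
    # Stands in for a blocking DB query or external HTTP call.
    time.sleep(0.2)
    return time.perf_counter()


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:   # toy pool: only 2 threads
    futures = [pool.submit(slow_sync_endpoint) for _ in range(3)]
    finish = [f.result() - start for f in futures]
# The first two requests finish around 0.2 s; the third had to wait for a
# free thread and finishes around 0.4 s, with the CPU idle the whole time.
```

Scale the numbers up (40 threads, multi-second database calls) and this is exactly the "slow responses, low CPU" signature described above.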

The practical decision framework is mechanical: does the code call an async library? Use async def with await. Does the code call a synchronous library? Use regular def. Does the code mix both? Use async def and wrap every synchronous call with await asyncio.to_thread(). Follow this consistently and the framework handles the rest.

io/thecodeforge/concurrency/endpoint_types.py · PYTHON
from fastapi import FastAPI
import asyncio
import httpx
import hashlib
import time

app = FastAPI()


# ─────────────────────────────────────────────────────────────────────────────
# CASE 1: async def — the correct choice for async I/O
# ─────────────────────────────────────────────────────────────────────────────
@app.get('/forge/weather/{city}')
async def get_weather(city: str):
    """
    Uses httpx.AsyncClient — a truly async HTTP client.
    Every network operation yields to the event loop via await.
    While waiting for the external API response, FastAPI handles other requests.
    This endpoint scales to thousands of concurrent requests on a single worker.
    """
    async with httpx.AsyncClient(timeout=10.0) as client:
        response = await client.get(
            f'https://api.thecodeforge.io/v1/weather/{city}'
        )
        response.raise_for_status()
        return response.json()


# ─────────────────────────────────────────────────────────────────────────────
# CASE 2: regular def — the correct choice for synchronous or CPU-bound work
# ─────────────────────────────────────────────────────────────────────────────
@app.get('/forge/hash')
def compute_password_hash(password: str):
    """
    CPU-bound operation using hashlib — synchronous and compute-intensive.
    FastAPI detects regular def and submits this to the thread pool.
    The event loop is free to handle other requests while this runs.
    Thread pool default: 40 threads (AnyIO's default limiter).
    """
    # Simulate bcrypt-level work with high iteration SHA-256
    digest = hashlib.pbkdf2_hmac(
        'sha256',
        password.encode(),
        b'forge-salt',
        iterations=100_000
    )
    return {"hash": digest.hex()}


# ─────────────────────────────────────────────────────────────────────────────
# CASE 3: async def with mixed sync/async — use asyncio.to_thread()
# ─────────────────────────────────────────────────────────────────────────────
@app.post('/forge/report')
async def generate_report(report_id: str):
    """
    Needs both async (fetch data from async DB) and sync (write to filesystem).
    Wrapping the synchronous file write in asyncio.to_thread() delegates it
    to the thread pool, keeping the event loop free during file I/O.
    """
    # Async database call — yields to event loop
    # data = await db.reports.find_one({"id": report_id})
    data = {"report_id": report_id, "rows": 1500}  # Simulated

    # Synchronous file I/O — must NOT run on event loop directly
    # Wrap with asyncio.to_thread() to delegate to thread pool
    def write_to_disk(payload: dict) -> str:
        path = f"/tmp/forge-report-{payload['report_id']}.json"
        import json
        with open(path, 'w') as f:
            json.dump(payload, f)
        return path

    output_path = await asyncio.to_thread(write_to_disk, data)
    return {"report_id": report_id, "output": output_path}


# ─────────────────────────────────────────────────────────────────────────────
# CASE 4: THE LOOP KILLER — async def with synchronous blocking call
# This is the pattern behind most FastAPI production incidents.
# ─────────────────────────────────────────────────────────────────────────────
#
# @app.get('/forge/disaster')
# async def event_loop_hostage():
#     time.sleep(10)  # BLOCKS THE ENTIRE SERVER FOR 10 SECONDS
#                     # Every concurrent request queues behind this.
#                     # CPU shows nearly 0% — the server appears idle.
#                     # Health checks time out. Kubernetes kills the pod.
#                     # There is no error. There is no warning.
#                     # There is just silence and then a wall of timeouts.


# ─────────────────────────────────────────────────────────────────────────────
# CASE 5: asyncio.sleep() — the correct non-blocking delay in async context
# ─────────────────────────────────────────────────────────────────────────────
@app.get('/forge/delayed-response')
async def delayed_response():
    """
    asyncio.sleep() yields control to the event loop for the duration.
    During this 2-second wait, FastAPI processes thousands of other requests.
    time.sleep(2) here would freeze all of them.
    """
    await asyncio.sleep(2)
    return {"status": "ready", "source": "thecodeforge"}


# Summary of behavior:
# Case 1: async def + httpx.AsyncClient -> event loop, non-blocking, scales to 10k+ concurrent
# Case 2: regular def + hashlib        -> thread pool, safe for sync/CPU work
# Case 3: async def + to_thread()      -> event loop for async parts, thread pool for sync parts
# Case 4: async def + time.sleep()     -> event loop BLOCKED, all other requests frozen
# Case 5: async def + asyncio.sleep()  -> event loop free, correct non-blocking delay
▶ Output
Case 1: {"city": "london", "temp_c": 14}
Case 2: {"hash": "a3f2c1..."}
Case 3: {"report_id": "RPT-001", "output": "/tmp/forge-report-RPT-001.json"}
Case 5: {"status": "ready", "source": "thecodeforge"}
⚠ The async def Trap — Blocking the Event Loop Has No Error Signal
📊 Production Insight
The thread pool that handles regular def endpoints is managed by AnyIO in modern FastAPI versions (0.95+). The default limiter allows 40 worker threads, regardless of core count.
Forty threads sounds generous, and for I/O-bound sync work (database queries, external HTTP calls) it handles substantial concurrency — each thread spends most of its time waiting for I/O, not executing CPU instructions. The real ceiling appears when all 40 threads are occupied with long-running operations simultaneously. Request 41 waits. This produces the same symptom as event loop blockage — slow responses, low CPU — but the cause is different and the fix is different.
For I/O-bound sync endpoints under heavy load: consider migrating to async libraries. For CPU-bound sync endpoints that are legitimately compute-heavy: increase workers at the Uvicorn level to scale across cores, or offload to Celery workers. You can also raise the AnyIO limit at startup with `anyio.to_thread.current_default_thread_limiter().total_tokens = N`, but increasing thread count does not solve the underlying bottleneck — it defers it.
🎯 Key Takeaway
async def means you own the concurrency contract — every operation inside must yield at every I/O boundary via await. Regular def means FastAPI owns it — synchronous code is submitted to the thread pool and the event loop stays free.
The most dangerous code in a FastAPI codebase is synchronous I/O inside async def. It produces no error, no warning, and no stack trace — just a progressively unresponsive server that looks idle on every resource dashboard. Prevent it with linting. Detect it in production with event loop lag monitoring.
Choosing async def vs def for Your Endpoint
| If | Use |
| --- | --- |
| Calling async-native libraries: httpx, asyncpg, motor, aioredis, aiosmtplib | async def with await — this is the intended fast path. The event loop handles cooperative scheduling. |
| Calling synchronous libraries: requests, psycopg2, smtplib, synchronous file I/O | Regular def — FastAPI submits to the thread pool automatically. Sync code is safe and does not block the event loop. |
| CPU-bound work: cryptographic hashing, image processing, data transformation, ML inference | Regular def for thread pool isolation. For very heavy CPU work, use Celery with dedicated workers in separate processes to avoid GIL contention. |
| Mixing async and sync calls within a single endpoint | async def, wrapping every synchronous call with await asyncio.to_thread(sync_call, args). Never call a sync function directly from async def. |
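The mixed-endpoint rule can be sketched without a web framework. Here endpoint_body stands in for the body of a hypothetical async def route, and slow_checksum for a blocking library call:

```python
import asyncio
import hashlib
import time

def slow_checksum(data: bytes) -> str:
    # Synchronous, blocking work: must never run directly on the event loop.
    time.sleep(0.1)  # stand-in for a blocking driver or HTTP call
    return hashlib.sha256(data).hexdigest()

async def endpoint_body(payload: bytes) -> dict:
    # Delegate the sync call to the thread pool; the event loop stays free
    # to serve other requests while the thread waits.
    digest = await asyncio.to_thread(slow_checksum, payload)
    return {"sha256": digest}

print(asyncio.run(endpoint_body(b"thecodeforge")))
```

The same function body works unchanged inside a real async def FastAPI endpoint.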
🗂 BackgroundTasks vs Celery vs Direct Async
Choosing the right concurrency pattern for your use case
| Feature | BackgroundTasks | Celery | Direct Async Endpoint |
| --- | --- | --- | --- |
| Execution Timing | After response is sent — client never waits | On worker pickup, independent of API process | During request — client waits for completion |
| Persistence | None — in-memory, lost on process death | Yes — written to broker (Redis/RabbitMQ) before execution | N/A — completes or fails within the request lifecycle |
| Retries | No — one attempt, silent failure on exception | Yes — configurable retry policies, exponential backoff, dead-letter queues | No — handle failures in endpoint logic manually |
| Infrastructure | None — built into FastAPI, zero dependencies | Broker (Redis or RabbitMQ) + separate worker processes | None — runs in the same ASGI worker as the request |
| Survives Restart | No — queued and in-progress tasks are dropped | Yes — broker holds tasks until a worker picks them up | N/A |
| Best For | Welcome emails, audit logs, cache invalidation, analytics events | Video processing, payment workflows, PDF generation, data pipelines | Database queries, external API calls, any work the user needs to wait for |
| Performance Impact | Minimal — sync tasks run in thread pool, async tasks on event loop | Scales independently from API — dedicated worker fleet | Blocks event loop if sync code is used in async def; safe in regular def via thread pool |

🎯 Key Takeaways

  • BackgroundTasks is ideal for fire-and-forget logic that does not require a durable distributed system — but know its limits precisely: no retries, no persistence, no survival on restart. The boundary between BackgroundTasks and a durable queue is the question 'does losing this task cause a business impact?'
  • Endpoints declared with async def must use only non-blocking calls at every I/O boundary. time.sleep(), requests.get(), synchronous database drivers, and synchronous file I/O inside async def are production incidents in waiting — they block the entire event loop and freeze all concurrent requests with no error signal.
  • Regular def endpoints are safe for synchronous and CPU-bound code because FastAPI automatically submits them to the AnyIO thread pool. The pool's default capacity limiter allows 40 threads — exhaust it and sync endpoint requests queue silently.
  • The client receives the HTTP response before any code in BackgroundTasks starts executing. The client's measured latency excludes all background work. This is the correct pattern for any side effect the user does not need to wait for.
  • Exceptions inside BackgroundTasks are silently swallowed — there is no error path to the client after the response has been sent. Every background function must have a top-level try/except block with structured logging and a failure metric counter. Without it, failures are completely invisible.
  • For complex, long-running, or mission-critical jobs that must survive server restarts, use a durable task queue. Celery with Redis or RabbitMQ for most teams; ARQ for async-native workflows where Celery's operational weight is not justified.
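The try/except takeaway can be sketched as follows. send_welcome_email, the logger name, and the FAILURES dict are hypothetical stand-ins; in production the counter would be a Prometheus or Datadog metric:

```python
import logging

logger = logging.getLogger("forge.bg")
FAILURES = {"send_welcome_email": 0}  # stand-in for a real failure metric

def send_welcome_email(address: str) -> None:
    # Hypothetical task body; replace with a real SMTP/provider call.
    raise ConnectionError("smtp unreachable")

def safe_send_welcome_email(address: str) -> None:
    # Top-level try/except: the only way a BackgroundTasks failure
    # becomes observable after the response has already been sent.
    try:
        send_welcome_email(address)
    except Exception:
        logger.exception("background task failed: send_welcome_email")
        FAILURES["send_welcome_email"] += 1

safe_send_welcome_email("dev@example.com")
print(FAILURES)  # → {'send_welcome_email': 1}
```

Register the wrapper, not the raw function: background_tasks.add_task(safe_send_welcome_email, user.email).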

⚠ Common Mistakes to Avoid

    Using async def with synchronous I/O calls — the #1 FastAPI production incident
    Symptom

    The entire server freezes for the duration of every blocking call. Under load, P99 latency spikes from milliseconds to seconds. CPU usage stays near zero — the server appears idle by every resource metric. Other endpoints, including health checks, become unresponsive. Kubernetes starts restarting pods due to liveness probe failures. The blocking endpoint itself looks fine in isolation; the problem only surfaces under concurrent load.

    Fix

    Option 1 (preferred): Replace the synchronous library with an async-native equivalent — httpx instead of requests, asyncpg instead of psycopg2, aiosmtplib instead of smtplib. Option 2: Wrap the blocking call with result = await asyncio.to_thread(sync_function, arg1, arg2) to delegate it to the thread pool. Option 3: Convert the endpoint from async def to regular def if it has no legitimate async operations. Enable ruff ASYNC lint rules in CI to catch this class of mistake before it reaches production.
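A hedged sketch of the lint guard, assuming ruff is configured via pyproject.toml; selecting the whole ASYNC family covers blocking sleeps, sync HTTP calls, and blocking file I/O inside async functions:

```toml
[tool.ruff.lint]
# flake8-async rules: flag time.sleep, sync HTTP clients, blocking open(),
# and subprocess calls inside `async def` bodies.
extend-select = ["ASYNC"]
```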

    Expecting BackgroundTasks to survive server restarts, rolling deploys, or OOM kills
    Symptom

    After a rolling deploy during a traffic spike, a batch of users registered in the 30 seconds before the old pods were terminated never receive welcome emails. No errors appear in logs — the tasks were queued but the process was killed before they executed. Support receives user complaints hours later with no record of the failure.

    Fix

    BackgroundTasks are in-memory and tied entirely to the process lifecycle. There is no persistence, no handoff, no graceful drain on shutdown. For tasks that must complete regardless of server state, use Celery with Redis or RabbitMQ as the broker. Tasks written to the broker survive process death — a new worker picks them up when it starts. ARQ is a lighter async-native alternative if you are already using Redis and want to avoid Celery's operational complexity.

    Not handling exceptions inside background functions
    Symptom

    API returns 201 Created consistently. Users report never receiving welcome emails. Server logs contain no errors. The failure is completely invisible — there is no error path from a background function exception to any observable output unless you explicitly create one. The team discovers the problem via user complaints, not monitoring.

    Fix

    Wrap every background function body in a top-level try/except block. Log the exception with exc_info=True so the full traceback is captured. In production, also increment a failure metric: a Prometheus counter or Datadog metric named something like forge_bg_task_failures_total with a task_name label. This converts a silent failure into an observable signal that your alerting can detect.

    Running CPU-bound work inside async def without offloading to a separate executor
    Symptom

    Single CPU core spikes to 100% while all other cores remain idle. The event loop is occupied by CPU computation, which does not yield between iterations. All concurrent requests queue behind the CPU work. Scaling to more Uvicorn workers partially helps (each worker gets its own CPU core) but the GIL still prevents true parallelism within a single worker process for pure Python code.

    Fix

    For moderate CPU work that needs to stay in the API process: wrap with await asyncio.to_thread(cpu_function, args) to run in the thread pool. For heavy CPU work: use a ProcessPoolExecutor explicitly (await asyncio.get_event_loop().run_in_executor(process_pool, cpu_function, args)) to escape the GIL and use multiple cores. For production-scale CPU-intensive work: use Celery workers running in dedicated processes, scaled independently from the API fleet.
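The ProcessPoolExecutor option can be sketched as follows; cpu_heavy is an invented stand-in for real compute such as image resizing:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n: int) -> int:
    # Pure-Python loop: holds the GIL and never yields, so it must not
    # run inline on the event loop thread.
    return sum(i * i for i in range(n))

async def endpoint_body(n: int) -> int:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # A separate process escapes the GIL; the event loop stays
        # responsive while the child computes.
        return await loop.run_in_executor(pool, cpu_heavy, n)

if __name__ == "__main__":
    # The guard matters: ProcessPoolExecutor re-imports this module in
    # child processes on spawn-based platforms.
    print(asyncio.run(endpoint_body(10_000)))
```

In a real service you would create the pool once at startup (for example in the lifespan handler) rather than per request.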

    Using time.sleep() instead of asyncio.sleep() inside async endpoints
    Symptom

    An endpoint with a time.sleep(2) call for a polling delay or rate limit freezes the entire server for 2 seconds under concurrent load. Health check liveness probes time out at 1-second intervals. Kubernetes marks the pod as unhealthy and restarts it. After restart, the same behavior recurs immediately because the code was not changed — only the infrastructure responded to the symptom.

    Fix

    Replace every instance of time.sleep(n) inside async def with await asyncio.sleep(n). The asyncio version yields control to the event loop for the duration, so other requests are processed normally during the wait. Add a pre-commit hook or ruff rule (ASYNC101 in older ruff releases, ASYNC251 in current ones) to flag time.sleep calls in files that contain async def functions. This is a tiny mistake with severe production consequences and a trivially detectable pattern: catch it in tooling, not in postmortems.

Interview Questions on This Topic

  • Q (Senior): Explain how FastAPI's internal thread pool handles synchronous def functions versus async def functions — and what happens when a developer gets the choice wrong.
    FastAPI's behavior is fundamentally different depending on function signature, and the difference is not cosmetic — it is architectural. For regular def endpoints, FastAPI submits the function to AnyIO's thread pool executor and awaits the result asynchronously. The event loop posts the work item and immediately returns to handling other requests. When the thread pool completes the function, the event loop resumes the request coroutine and sends the response. The event loop is never blocked, even if the function takes several seconds. For async def endpoints, FastAPI awaits the coroutine directly on the event loop thread. No thread is allocated. The function runs cooperatively — it progresses until it hits an await expression, yields control, and resumes when the awaited operation completes. The assumption is that every I/O operation is async-native and will yield quickly. When a developer writes synchronous blocking code inside async def — time.sleep(), requests.get(), a synchronous database driver — the cooperative model breaks. The blocking call does not yield. The event loop thread is occupied for the duration of the call. No other coroutine can run: no other request can be processed, no health checks can respond, no background tasks can execute. The server appears idle by CPU and memory metrics while being completely unresponsive to new requests. The thread pool's default capacity under AnyIO is 40 threads. If 40 sync endpoints hold threads simultaneously, request 41 queues. This is thread pool exhaustion — same symptom as event loop blockage, different root cause, different fix. The diagnostic tells them apart: event loop blockage shows zero CPU with all requests slow. Thread pool exhaustion shows moderate CPU with sync endpoint requests queuing while async endpoints respond normally.
  • Q (Mid-level): What is the event loop, and how does calling time.sleep() inside an async def route impact every other concurrent request on the server?
    The asyncio event loop is the single-threaded scheduler that coordinates all coroutines in an asyncio application. It works on a cooperative multitasking model — coroutines take turns executing, and each one must explicitly yield control via await for others to run. The event loop processes one thing at a time on one thread. Its efficiency comes from the fact that I/O-bound work spends most of its time waiting, and during that wait, the event loop runs other coroutines. time.sleep() is a synchronous function that suspends the calling thread. When called from an async def function running on the event loop thread, it suspends that thread — which is the event loop thread. No await is issued. Control does not return to the event loop. Every coroutine that was waiting to run continues waiting until time.sleep() returns. A concrete example: 100 concurrent requests arrive at a FastAPI server. The first request enters an async def endpoint that calls time.sleep(10). For the next 10 seconds, the remaining 99 requests cannot make any progress. They are not queued in a thread pool — they are frozen in place on the event loop, waiting for a scheduler that is not scheduling. The fix is await asyncio.sleep(10). This is semantically equivalent — it pauses execution for 10 seconds — but it yields control to the event loop at the await point. During those 10 seconds, the event loop processes all 99 other requests normally. When the sleep completes, the event loop resumes the original coroutine. Monitoring event loop lag — the time between when a callback is scheduled and when it actually executes — is the reliable production signal for detecting blocking calls. A lag above 100ms sustained over multiple seconds indicates a blocking call in an async context. CPU and memory will look normal. Only loop lag exposes it.
  • Q (Senior): Scenario: You need to process an image upload and generate a thumbnail. Would you use BackgroundTasks or an async endpoint? Justify your choice based on CPU intensity and reliability requirements.
    Neither option is appropriate without additional consideration, and the right answer depends on two independent dimensions: whether the client needs to wait, and whether the operation must complete reliably. Image thumbnail generation is CPU-bound work. CPU-bound work inside either BackgroundTasks or an async endpoint blocks the event loop if written naively. The processor does not yield between iterations of an image resize algorithm — there is no await point, no I/O boundary, no natural yield. Even in a regular def background function, the CPU work runs in the thread pool and is bounded by the GIL — you do not get true parallelism from multiple threads doing pure Python CPU work. If the thumbnail must be visible before the user can proceed — the UI shows a progress indicator — I would return HTTP 202 Accepted immediately with a job ID, process the thumbnail in a Celery worker with a dedicated CPU-optimized queue, and expose a polling endpoint or WebSocket to notify the client when the thumbnail URL is ready. This pattern is reliable (Celery persists the task to Redis before execution), scalable (worker fleet scales independently from the API), and correct (CPU work runs in a separate process, escaping both the GIL and the event loop entirely). If a placeholder thumbnail is acceptable and eventual generation is fine, BackgroundTasks is tempting but still inadequate for CPU-bound work. The correct implementation would be a regular def background function that uses ProcessPoolExecutor for the image processing step, not the default thread pool. That escapes the GIL and uses a dedicated CPU core. The key principle: BackgroundTasks solves the 'client does not wait' problem. It does not solve the 'CPU-bound work competes with request handling' problem. Those are separate problems that require separate solutions.
  • Q (Mid-level): How does FastAPI's BackgroundTasks differ from a raw threading.Thread implementation in terms of the HTTP lifecycle and execution semantics?
    The most important difference is lifecycle integration. BackgroundTasks is managed by the Starlette ASGI framework and executes within a well-defined phase of the HTTP response lifecycle. FastAPI guarantees that the HTTP response is fully serialized and sent to the client before any registered background task begins executing. The client's connection is returned to the pool, the response time metric is recorded, and only then does the first background task start. With raw threading.Thread, the thread starts executing the moment you call .start() — which may be before, during, or after the response is serialized, depending on thread scheduling. The thread competes for CPU and I/O with the response serialization path. There is no lifecycle guarantee. Execution model: BackgroundTasks executes registered functions sequentially in registration order. There is no parallelism between background tasks — task 2 waits for task 1 to complete. threading.Thread creates a truly parallel thread that runs concurrently with both the request handler and other threads. Error handling: exceptions in BackgroundTasks are silently swallowed after the response is sent. Exceptions in threading.Thread terminate the thread silently unless you have a thread-level exception handler configured. For production use, BackgroundTasks is the correct choice when you want framework-managed lifecycle semantics and the sequential execution model is acceptable. threading.Thread is a raw concurrency primitive — it offers more control at the cost of no lifecycle integration, no framework awareness, and more complex error handling. BackgroundTasks is almost always the right answer for post-response work in FastAPI; threading.Thread is rarely the right answer inside a web request handler.
  • Q (Senior): If a FastAPI server restarts while a BackgroundTask is executing, what happens to that task? How does this shape your architectural decisions about what belongs in BackgroundTasks versus a durable queue?
    The task is lost, completely and silently. BackgroundTasks execute in the same OS process as the FastAPI application. There is no persistence layer, no write-ahead log, no state checkpoint. When the process receives SIGTERM (a normal graceful shutdown), SIGKILL (a forced kill from Kubernetes or OOM), or crashes due to an unhandled exception, every background task that is queued or in-flight at that moment is dropped. No error is logged unless you have explicit logging inside the task function. No retry occurs. The operation simply does not happen. Graceful shutdown adds a partial mitigation — Uvicorn will wait for in-flight requests to complete before shutting down, and Starlette attempts to complete background tasks that have already started. But tasks that are queued but not yet started, and tasks that are in the middle of a blocking I/O operation when the shutdown deadline expires, are still lost. This shapes the architectural decision clearly: BackgroundTasks is appropriate only for operations where losing the task has no business impact. Welcome email for a new registration — acceptable to lose in a rare restart, can be resent manually if the user complains. Audit log entry — acceptable to occasionally miss one in a crash scenario. Cache invalidation — acceptable, will self-correct on next request. Not acceptable: payment capture, order fulfillment step, password reset email (security-critical, user cannot proceed without it), data pipeline write, compliance audit record. For these, the task must be written to a durable broker before execution begins. Celery with Redis satisfies this — the task is serialized and written to Redis before the endpoint returns the response. If the worker process dies mid-execution, the broker holds the task and another worker picks it up. The failure is recoverable. The rule I apply: if a business or compliance stakeholder would care about a lost task, it does not belong in BackgroundTasks.
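The event loop lag monitoring mentioned in these answers can be sketched with nothing but the standard library; the interval and the deliberate 0.2-second block are illustrative values:

```python
import asyncio
import time

LAGS: list[float] = []  # in production, feed these into a metrics gauge

async def monitor_loop_lag(interval: float = 0.05) -> None:
    # Repeatedly schedule a short sleep and record how late the wake-up was.
    # A healthy loop wakes up almost on time; a blocked loop wakes up late.
    while True:
        start = time.monotonic()
        await asyncio.sleep(interval)
        LAGS.append(time.monotonic() - start - interval)

async def demo() -> float:
    monitor = asyncio.create_task(monitor_loop_lag())
    await asyncio.sleep(0.01)  # let the monitor begin its first sleep
    time.sleep(0.2)            # simulate a blocking call on the event loop
    await asyncio.sleep(0.1)   # give the monitor a chance to record the lag
    monitor.cancel()
    return max(LAGS)

worst = asyncio.run(demo())
print(f"worst observed loop lag: {worst * 1000:.0f} ms")
```

In a real service the monitor would run for the process lifetime and export the lag as a gauge, with an alert on sustained values above roughly 100 ms.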

Frequently Asked Questions

What is the difference between BackgroundTasks and Celery?

BackgroundTasks is built into FastAPI and runs within the same process and memory space as your application. Setup requires zero additional infrastructure — no broker, no worker processes, no configuration beyond importing BackgroundTasks. The cost of that simplicity: no persistence, no retries, and no survival on process death. If the server crashes while a background task is running, the task is gone.

Celery is a distributed task queue that requires a message broker (Redis or RabbitMQ) and separate worker processes. Tasks are written to the broker before execution begins — if the API process or a worker process dies, the task survives in the broker and another worker picks it up. Celery supports configurable retry policies, task scheduling, rate limiting, and worker fleet scaling independently from the API.

The decision rule: use BackgroundTasks for sending a welcome email, writing an analytics event, or invalidating a cache — operations where occasional data loss is acceptable. Use Celery for generating a 500-page PDF report, processing a video upload, capturing a payment, or any operation where losing the task would require manual intervention or causes a user-facing failure.

Can I use async def with a database driver that has only a synchronous API?

Technically yes, but you will introduce an event loop blocking problem that degrades performance proportionally to database call frequency and duration. Every synchronous database call inside an async def endpoint blocks the event loop for the duration of the network round-trip to the database — typically 1–50ms per query. Under load, this stacks and becomes measurable latency degradation across all endpoints.

The two correct approaches: wrap the synchronous database call with result = await asyncio.to_thread(sync_db_call, args) to delegate it to the thread pool without blocking the event loop. This is acceptable as a migration step or when the sync driver is unavoidable. The professional approach is to replace the synchronous driver with an async-native one: asyncpg or psycopg3 for PostgreSQL, motor for MongoDB, aioredis for Redis. These libraries are designed to yield to the event loop during network I/O, enabling genuine concurrency.

Is it possible to return data from a BackgroundTask to the user?

No, and this is by design. When a BackgroundTask begins executing, the HTTP response has already been fully sent and the client connection has returned to the pool. There is no open channel to push additional data to the client.

If you need the user to receive the result of an asynchronous operation, there are two standard patterns. For short-lived operations (seconds): implement a polling endpoint — return a job ID in the initial response, start the work in a background task or Celery worker, and expose a GET endpoint that the client polls until the job status is 'complete'. For long-lived operations or real-time feedback: use WebSockets to push the result to the client when the work finishes. Both patterns require the work to be tracked somewhere persistent (database, Redis) so the polling endpoint or WebSocket handler can retrieve the result when it becomes available.
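A framework-free sketch of the polling pattern described above. JOBS is an in-memory stand-in for the persistent store (Redis or a database) that a real deployment needs, and the function names are hypothetical endpoint bodies:

```python
import asyncio
import uuid

JOBS: dict[str, dict] = {}  # stand-in for Redis or a database table

async def start_job() -> str:
    # POST endpoint body: register the job, kick off the work, and return
    # a job ID immediately so the client never waits for the result.
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "pending", "result": None}
    asyncio.create_task(run_job(job_id))
    return job_id

async def run_job(job_id: str) -> None:
    await asyncio.sleep(0.05)  # stand-in for the real work
    JOBS[job_id] = {"status": "complete", "result": "thumbnail-url"}

async def poll_job(job_id: str) -> dict:
    # GET /jobs/{id} endpoint body: the client polls until "complete".
    return JOBS[job_id]

async def demo() -> dict:
    job_id = await start_job()
    while (await poll_job(job_id))["status"] != "complete":
        await asyncio.sleep(0.01)  # client-side polling interval
    return await poll_job(job_id)

print(asyncio.run(demo()))  # → {'status': 'complete', 'result': 'thumbnail-url'}
```

Swapping asyncio.create_task for a Celery enqueue and JOBS for Redis turns this sketch into the durable version.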

Naren · Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged