FastAPI Background Tasks and Async Endpoints
- FastAPI handles concurrency via two paths: BackgroundTasks for post-response work and async/await for non-blocking I/O.
- Use `async def` only when calling async libraries (httpx, asyncpg, motor) — never for synchronous blocking code.
- Use regular `def` for synchronous or CPU-bound work — FastAPI auto-offloads it to a thread pool via AnyIO.
- BackgroundTasks execute AFTER the response is sent — the client never waits for them.
- Calling `time.sleep()` inside `async def` blocks the entire event loop and freezes every concurrent request on the server.
- BackgroundTasks are lost on server restart — use Celery or ARQ for mission-critical jobs that must survive crashes.
- The thread pool default size is min(32, os.cpu_count() + 4) — exhaust it and sync endpoint requests start queuing silently.
Production Debug Guide — Symptom → Action mapping for async/blocking problems

Symptom: Event loop blocked — all endpoints frozen, CPU low, health checks timing out

```shell
PYTHONASYNCIODEBUG=1 uvicorn main:app --log-level debug 2>&1 | grep -i 'Executing.*took\|slow callback\|blocked'
grep -rn 'async def' app/ | xargs grep -l 'time.sleep\|requests.get\|smtplib\|open(' 2>/dev/null
```

Symptom: Background task silently failing — API returns success but side effects never happen

```shell
docker compose logs api --tail=200 | grep -i 'error\|exception\|traceback\|background'
grep -rn 'add_task' app/ | head -20
```

Symptom: Low throughput despite low CPU — requests queuing with no obvious bottleneck

```shell
ps aux | grep uvicorn | grep -v grep
curl -s http://localhost:8000/metrics | grep -i 'thread\|worker\|active'
```

Production Incident
A registration endpoint was declared async def and called smtplib.SMTP().sendmail() — a synchronous blocking call that holds the executing thread for the duration of the SMTP handshake and data transfer. In asyncio, there is only one thread running the event loop. A blocking call on that thread does not just slow down the current request — it prevents every other coroutine from advancing for the entire duration of the block. With a 3-second SMTP handshake, every registration request held the event loop hostage for 3 seconds. Under modest load, this stacked: 10 concurrent registrations meant the event loop was blocked for 30 seconds of every 30-second window. Health checks, order lookups, search requests — everything queued behind SMTP.

The fix was threefold. Switched to aiosmtplib for a truly async SMTP implementation that works correctly inside async def. Added a ruff lint rule (ASYNC210) to the CI pipeline to flag synchronous I/O calls inside async functions at the pull request stage, before they reach production. Added event loop lag monitoring via a middleware that records the delta between when a request callback is scheduled and when it actually executes — a lag above 100ms now triggers an alert.

Lessons:
- Wrap unavoidable synchronous calls with `result = await asyncio.to_thread(sync_function, arg1, arg2)`. This delegates execution to the thread pool without blocking the event loop.
- BackgroundTasks with a regular def function is the correct pattern for fire-and-forget sync work like email sending. The task runs in the thread pool after the response is sent, and the event loop is never involved.
- Monitor event loop lag as a first-class production metric. CPU and memory metrics will look normal during an event loop blockage — the server appears idle while being completely unresponsive. Lag monitoring is the only reliable way to catch blocking calls before users notice.
- Add async-aware linting to CI. Ruff's ASYNC rule set detects common blocking patterns inside async functions — time.sleep, requests.get, synchronous file I/O. Catching these at review time costs nothing; catching them in production costs hours.
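The lag-monitoring idea from the incident can be sketched with nothing but the standard library: measure how much later than scheduled `asyncio.sleep()` actually wakes up. This is an illustrative stand-in for the middleware described above (all names here are invented for the sketch), not a FastAPI component:

```python
import asyncio
import time

async def monitor_loop_lag(interval: float = 0.05, samples: int = 10) -> float:
    """Return the worst observed event-loop lag: how much later than
    `interval` each sleep actually woke up."""
    worst = 0.0
    for _ in range(samples):
        t0 = time.monotonic()
        await asyncio.sleep(interval)
        lag = (time.monotonic() - t0) - interval
        worst = max(worst, lag)
    return worst

async def demo() -> float:
    monitor = asyncio.create_task(monitor_loop_lag())
    await asyncio.sleep(0.1)  # let the monitor take a few clean samples
    time.sleep(0.3)           # a blocking call lands on the event loop
    return await monitor

worst_lag = asyncio.run(demo())
print(f"worst event-loop lag: {worst_lag:.3f}s")  # roughly the 0.3s block
```

In production the same measurement runs continuously and feeds an alert when lag exceeds a threshold such as 100ms.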
If the event loop is blocked, search the codebase for time.sleep(), requests.get(), synchronous file operations, or any database driver that is not async-native. The blocking call is almost always in an async def function that looks correct at a glance. Once found, either convert the endpoint to regular def or wrap the blocking call with await asyncio.to_thread().

If throughput is low, scale out with `uvicorn main:app --workers 4`. If you are already running multiple workers, check for thread pool exhaustion on sync endpoints. The default thread pool is min(32, cpu_count + 4) threads. If all threads are occupied by long-running synchronous operations, subsequent sync requests queue. Monitor active thread count and consider converting high-traffic sync endpoints to async with native async libraries.

Overview

FastAPI is built on Starlette and asyncio, which makes it one of the fastest Python web frameworks available. That speed is not free — it is entirely conditional on the developer understanding how the event loop works and respecting its rules. Unlike traditional synchronous frameworks, where every request gets its own thread and isolation is automatic, FastAPI can handle thousands of concurrent connections on a single thread. The moment you introduce a blocking call into that single thread, every one of those connections stalls together.
The core decision point is deceptively simple: async def or def. Get it right and you get the throughput numbers in the benchmarks. Get it wrong and your production server handles 10 requests per second instead of 10,000 — with no obvious error, no alarm, just a wall of slow responses and a CPU that looks inexplicably idle.
This guide covers both concurrency paths — BackgroundTasks for fire-and-forget post-response logic, and async/await for non-blocking I/O — with the kind of specificity that only comes from seeing both patterns succeed and fail in production. The failure modes are predictable. The fixes are mechanical once you understand the model.
BackgroundTasks — Run After Response
The standard HTTP request-response cycle forces the client to wait for every operation inside the endpoint to complete before receiving a response. For operations that the user does not need to wait for — sending a welcome email, writing an audit log entry, invalidating a CDN cache — this is pure latency overhead with no user-facing benefit.
FastAPI's BackgroundTasks class solves this with a clean hook into the ASGI response lifecycle. You register a function (and its arguments) with the task manager inside your endpoint. FastAPI sends the HTTP response to the client first, then executes the registered functions afterward. The client's round-trip time reflects only the endpoint logic, not the background work.
The execution model has a few characteristics worth understanding clearly. Tasks are executed sequentially, not in parallel — if you register three background tasks, they run one after another in registration order, not concurrently. Regular def background functions run in FastAPI's thread pool, keeping the event loop free. Async def background functions run on the event loop itself — which means a slow async background function can delay other requests if it does not yield control frequently.
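The sequential, after-response semantics can be modeled in a few lines of plain Python. This is a toy model for intuition only — Starlette's real implementation differs, and `MiniBackgroundTasks` is an invented name:

```python
class MiniBackgroundTasks:
    """Toy model of BackgroundTasks semantics: tasks run sequentially,
    in registration order, only after the response exists."""

    def __init__(self) -> None:
        self._tasks = []

    def add_task(self, func, *args) -> None:
        # Registration only — nothing executes yet
        self._tasks.append((func, args))

    def run_all(self) -> None:
        # One after another, never in parallel
        for func, args in self._tasks:
            func(*args)

order = []
tasks = MiniBackgroundTasks()
tasks.add_task(order.append, "email")
tasks.add_task(order.append, "audit")

response = {"status": "created"}  # the "response" is produced first
assert order == []                # tasks have not run yet
tasks.run_all()                   # now they run, in registration order
print(order)                      # ['email', 'audit']
```

The key property the toy preserves: `add_task()` is pure registration, and execution order is exactly registration order.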
The hardest thing to accept about BackgroundTasks is what it explicitly is not. It is not a task queue. It has no retry mechanism. It has no persistence layer. It has no scheduling capability. It has no visibility into task status after the response is sent. If the server process dies while a background task is running or queued, the task is gone with no record of it. This is not a limitation to work around — it is a design boundary that defines where BackgroundTasks is appropriate and where a durable task queue is required.
The appropriate use cases are narrow but common: sending transactional emails, writing audit or activity log entries, cache invalidation, incrementing counters in analytics systems where occasional data loss is acceptable. The inappropriate use cases are equally clear: payment processing, order fulfillment, document generation, any operation where losing the task in a crash would cause a business impact.
```python
from fastapi import FastAPI, BackgroundTasks, status
from pydantic import BaseModel, EmailStr
import logging
import time

app = FastAPI()
log = logging.getLogger(__name__)


class RegistrationRequest(BaseModel):
    email: EmailStr
    username: str


def send_welcome_email(email: str, username: str) -> None:
    """
    Regular def — FastAPI runs this in the thread pool after the response
    is sent. The event loop is not involved. time.sleep() is safe here
    because this runs in a worker thread, not on the event loop thread.

    Critical: wrap everything in try/except. FastAPI silently swallows
    exceptions raised inside background functions. Without this block, a
    failed SMTP connection produces no log entry and no alert — the user
    simply never receives the email and no one knows.
    """
    try:
        log.info(
            "welcome_email_started",
            extra={"email": email, "username": username},
        )
        # Simulate SMTP handshake — safe in a thread pool, catastrophic on the event loop
        time.sleep(2)
        # In production: use smtplib here, or aiosmtplib in an async background task
        log.info(
            "welcome_email_sent",
            extra={"email": email, "username": username},
        )
    except Exception:
        # Log with exc_info=True to capture the full traceback.
        # Also increment a metric counter here in production:
        # metrics.counter("forge.bg.email.failure").increment()
        log.error("welcome_email_failed", extra={"email": email}, exc_info=True)


def write_registration_audit_log(email: str, username: str) -> None:
    """
    Second background task — runs sequentially after send_welcome_email.
    Tasks registered with add_task() execute in registration order, not in parallel.
    """
    try:
        log.info(
            "registration_audit_written",
            extra={"email": email, "username": username, "event": "user_registered"},
        )
    except Exception:
        log.error("audit_log_failed", extra={"email": email}, exc_info=True)


@app.post('/forge/register', status_code=status.HTTP_201_CREATED)
async def register_user(payload: RegistrationRequest, tasks: BackgroundTasks):
    """
    Registers a user and enqueues post-response work.

    Execution order:
      1. Validate payload (Pydantic)
      2. Write user to database (not shown)
      3. Send HTTP 201 to client  <-- client connection returns here
      4. send_welcome_email() runs in thread pool
      5. write_registration_audit_log() runs in thread pool

    The client's measured latency includes only steps 1-3. Steps 4-5 are
    invisible to the client and independent of client connectivity.
    """
    # Step 1: In production, write to the database here
    # await db.users.insert({"email": payload.email, "username": payload.username})

    # Step 2: Register background tasks — they do not run yet
    tasks.add_task(send_welcome_email, payload.email, payload.username)
    tasks.add_task(write_registration_audit_log, payload.email, payload.username)

    # Step 3: Return response — background tasks run after this is sent
    return {
        "status": "created",
        "detail": "Registration complete. Welcome email is on its way.",
    }


# POST /forge/register  {"email": "dev@thecodeforge.io", "username": "forge-dev"}
#
# Client receives immediately:
# -> {"status": "created", "detail": "Registration complete. Welcome email is on its way."}
#
# Server logs after the response (invisible to the client):
# INFO welcome_email_started email=dev@thecodeforge.io
# INFO welcome_email_sent email=dev@thecodeforge.io
# INFO registration_audit_written email=dev@thecodeforge.io
```
- Tasks run AFTER the response is fully sent — the client's round-trip time excludes background work entirely.
- Tasks run sequentially in registration order — a slow first task delays all subsequent tasks. There is no parallelism between background tasks.
- Regular def background functions run in the thread pool — synchronous I/O is safe. Async def background functions run on the event loop — blocking calls inside them affect all concurrent requests.
- Exceptions inside background functions are silently swallowed by FastAPI. Without explicit try/except and logging inside the background function, failures are invisible. This is the most common operational mistake with BackgroundTasks.
- No retries, no persistence, no scheduling — if the process dies, tasks die with it. Use for: welcome emails, audit logs, cache invalidation. Do NOT use for: payment processing, order fulfillment, document generation, or any operation where losing the task causes a business impact.
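One way to make the try/except rule hard to forget is a small decorator applied to every background function. This is an illustrative sketch — `safe_background_task` and `flaky_email_task` are invented names, not FastAPI APIs:

```python
import functools
import logging

log = logging.getLogger(__name__)

def safe_background_task(func):
    """Wrap a background function so exceptions are logged with a full
    traceback instead of being silently swallowed by the background runner."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # exc_info=True captures the full traceback in the log record
            log.error(
                "background_task_failed",
                extra={"task": func.__name__},
                exc_info=True,
            )
            return None  # never re-raise: no caller exists after the response
    return wrapper

@safe_background_task
def flaky_email_task(email: str) -> str:
    if "@" not in email:
        raise ValueError("invalid address")
    return f"sent to {email}"
```

Wrapped functions are registered as usual: `tasks.add_task(flaky_email_task, payload.email)`. A failed task now produces a log entry with a traceback instead of silence.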
async def vs def — When to Use Which
This is the single most consequential decision in any FastAPI codebase, and it is the most frequently misunderstood. The surface-level explanation — 'use async def for async code and def for sync code' — is correct but incomplete. Understanding why FastAPI behaves differently based on the function signature is what separates code that performs from code that silently degrades under load.
When FastAPI receives a request for a regular def endpoint, it does not execute the function on the event loop. It submits the function to AnyIO's thread pool executor and awaits the result. The event loop is free to handle other requests while the sync function runs in a worker thread. This is why synchronous code inside a regular def endpoint is completely safe — it runs in isolation from the event loop.
When FastAPI receives a request for an async def endpoint, it awaits the coroutine directly on the event loop thread. No thread is involved. The coroutine runs cooperatively — it progresses until it hits an await expression, yields control back to the event loop, and resumes when the awaited operation completes. This cooperative yielding is what allows thousands of concurrent requests to share a single thread efficiently. The contract is simple: every operation inside an async def function must yield at every I/O boundary.
Break that contract — call time.sleep(), call requests.get(), open a synchronous database connection — and the cooperative model collapses. The event loop thread is occupied for the duration of the blocking call, and every other coroutine waits. There is no error, no warning, no exception. The server simply becomes unresponsive in proportion to how long and how frequently the blocking call occurs.
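The collapse is easy to reproduce with the standard library alone. In this sketch (all names invented for illustration), a heartbeat coroutine stands in for "every other request", and the largest gap between its ticks shows how long the loop was frozen:

```python
import asyncio
import time

async def heartbeat(ticks: list) -> None:
    # Appends a timestamp every 50 ms — a stand-in for concurrent requests
    for _ in range(10):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

async def blocking_handler() -> None:
    time.sleep(0.5)  # occupies the event loop thread — nothing else runs

async def offloaded_handler() -> None:
    await asyncio.to_thread(time.sleep, 0.5)  # runs in a worker thread

async def measure(handler) -> float:
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat take its first tick
    await handler()
    await hb
    # Largest gap between consecutive ticks = worst event-loop stall
    return max(b - a for a, b in zip(ticks, ticks[1:]))

blocked_gap = asyncio.run(measure(blocking_handler))
offloaded_gap = asyncio.run(measure(offloaded_handler))
print(f"blocking: {blocked_gap:.2f}s  offloaded: {offloaded_gap:.2f}s")
```

With the blocking handler the heartbeat stalls for the full half second; with `asyncio.to_thread()` it keeps ticking at its normal 50ms cadence while the same sleep runs in a worker thread.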
The thread pool that handles regular def endpoints has a default size of min(32, os.cpu_count() + 4). On a 4-core machine, that is 8 threads. If 8 sync endpoints are all executing simultaneously — each holding a thread for a database query, a file read, or an external HTTP call — request 9 must wait for a thread to become available. This is thread pool exhaustion, and it produces the same symptom as event loop blockage: slow responses with low CPU usage. Monitor active thread count and tune the pool size or migrate high-traffic sync endpoints to async libraries accordingly.
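When exhaustion is confirmed, the pool can be widened at startup. A configuration sketch, assuming the AnyIO that Starlette already depends on — `current_default_thread_limiter()` returns the CapacityLimiter that gates regular def endpoints, and 64 is an arbitrary example value, not a recommendation:

```python
from contextlib import asynccontextmanager

import anyio.to_thread
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Default capacity is min(32, os.cpu_count() + 4). Raise it only after
    # confirming threads are parked on I/O, not burning CPU — more threads
    # make CPU-bound contention worse, not better.
    anyio.to_thread.current_default_thread_limiter().total_tokens = 64
    yield

app = FastAPI(lifespan=lifespan)
```

Treat this as a stopgap: the durable fix for high-traffic sync endpoints is migrating them to async-native libraries.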
The practical decision framework is mechanical: does the code call an async library? Use async def with await. Does the code call a synchronous library? Use regular def. Does the code mix both? Use async def and wrap every synchronous call with await asyncio.to_thread(). Follow this consistently and the framework handles the rest.
```python
from fastapi import FastAPI
import asyncio
import hashlib
import json

import httpx

app = FastAPI()

# ─────────────────────────────────────────────────────────────────────────────
# CASE 1: async def — the correct choice for async I/O
# ─────────────────────────────────────────────────────────────────────────────
@app.get('/forge/weather/{city}')
async def get_weather(city: str):
    """
    Uses httpx.AsyncClient — a truly async HTTP client. Every network
    operation yields to the event loop via await. While waiting for the
    external API response, FastAPI handles other requests. This endpoint
    scales to thousands of concurrent requests on a single worker.
    """
    async with httpx.AsyncClient(timeout=10.0) as client:
        response = await client.get(
            f'https://api.thecodeforge.io/v1/weather/{city}'
        )
        response.raise_for_status()
        return response.json()

# ─────────────────────────────────────────────────────────────────────────────
# CASE 2: regular def — the correct choice for synchronous or CPU-bound work
# ─────────────────────────────────────────────────────────────────────────────
@app.get('/forge/hash')
def compute_password_hash(password: str):
    """
    CPU-bound operation using hashlib — synchronous and compute-intensive.
    FastAPI detects regular def and submits this to the thread pool.
    The event loop is free to handle other requests while this runs.
    Thread pool default: min(32, os.cpu_count() + 4).
    """
    # Simulate bcrypt-level work with high-iteration PBKDF2-SHA256
    digest = hashlib.pbkdf2_hmac(
        'sha256', password.encode(), b'forge-salt', iterations=100_000
    )
    return {"hash": digest.hex()}

# ─────────────────────────────────────────────────────────────────────────────
# CASE 3: async def with mixed sync/async — use asyncio.to_thread()
# ─────────────────────────────────────────────────────────────────────────────
@app.post('/forge/report')
async def generate_report(report_id: str):
    """
    Needs both async (fetch data from async DB) and sync (write to filesystem).
    Wrapping the synchronous file write in asyncio.to_thread() delegates it
    to the thread pool, keeping the event loop free during file I/O.
    """
    # Async database call — yields to event loop
    # data = await db.reports.find_one({"id": report_id})
    data = {"report_id": report_id, "rows": 1500}  # Simulated

    # Synchronous file I/O — must NOT run on the event loop directly.
    # Wrap with asyncio.to_thread() to delegate to the thread pool.
    def write_to_disk(payload: dict) -> str:
        path = f"/tmp/forge-report-{payload['report_id']}.json"
        with open(path, 'w') as f:
            json.dump(payload, f)
        return path

    output_path = await asyncio.to_thread(write_to_disk, data)
    return {"report_id": report_id, "output": output_path}

# ─────────────────────────────────────────────────────────────────────────────
# CASE 4: THE LOOP KILLER — async def with a synchronous blocking call
# This is the pattern behind most FastAPI production incidents.
# ─────────────────────────────────────────────────────────────────────────────
#
# @app.get('/forge/disaster')
# async def event_loop_hostage():
#     time.sleep(10)  # BLOCKS THE ENTIRE SERVER FOR 10 SECONDS
#     # Every concurrent request queues behind this.
#     # CPU shows nearly 0% — the server appears idle.
#     # Health checks time out. Kubernetes kills the pod.
#     # There is no error. There is no warning.
#     # There is just silence and then a wall of timeouts.

# ─────────────────────────────────────────────────────────────────────────────
# CASE 5: asyncio.sleep() — the correct non-blocking delay in async context
# ─────────────────────────────────────────────────────────────────────────────
@app.get('/forge/delayed-response')
async def delayed_response():
    """
    asyncio.sleep() yields control to the event loop for the duration.
    During this 2-second wait, FastAPI processes thousands of other requests.
    time.sleep(2) here would freeze all of them.
    """
    await asyncio.sleep(2)
    return {"status": "ready", "source": "thecodeforge"}

# Summary of behavior:
# Case 1: async def + httpx.AsyncClient -> event loop, non-blocking, scales to 10k+ concurrent
# Case 2: regular def + hashlib         -> thread pool, safe for sync/CPU work
# Case 3: async def + to_thread()       -> event loop for async parts, thread pool for sync parts
# Case 4: async def + time.sleep()      -> event loop BLOCKED, all other requests frozen
# Case 5: async def + asyncio.sleep()   -> event loop free, correct non-blocking delay
```
Sample responses:
- Case 2: `{"hash": "a3f2c1..."}`
- Case 3: `{"report_id": "RPT-001", "output": "/tmp/forge-report-RPT-001.json"}`
- Case 5: `{"status": "ready", "source": "thecodeforge"}`
The thread pool behind regular def endpoints defaults to min(32, os.cpu_count() + 4) threads. On a typical 4-core container, that is 8 threads. To call sync code from an async endpoint, use `await asyncio.to_thread(sync_call, args)`. Never call a sync function directly from async def.

| Feature | BackgroundTasks | Celery | Direct Async Endpoint |
|---|---|---|---|
| Execution Timing | After response is sent — client never waits | On worker pickup, independent of API process | During request — client waits for completion |
| Persistence | None — in-memory, lost on process death | Yes — written to broker (Redis/RabbitMQ) before execution | N/A — completes or fails within the request lifecycle |
| Retries | No — one attempt, silent failure on exception | Yes — configurable retry policies, exponential backoff, dead-letter queues | No — handle failures in endpoint logic manually |
| Infrastructure | None — built into FastAPI, zero dependencies | Broker (Redis or RabbitMQ) + separate worker processes | None — runs in the same ASGI worker as the request |
| Survives Restart | No — queued and in-progress tasks are dropped | Yes — broker holds tasks until a worker picks them up | N/A |
| Best For | Welcome emails, audit logs, cache invalidation, analytics events | Video processing, payment workflows, PDF generation, data pipelines | Database queries, external API calls, any work the user needs to wait for |
| Performance Impact | Minimal — sync tasks run in thread pool, async tasks on event loop | Scales independently from API — dedicated worker fleet | Blocks event loop if sync code is used in async def; safe in regular def via thread pool |
🎯 Key Takeaways
- BackgroundTasks is ideal for fire-and-forget logic that does not require a durable distributed system — but know its limits precisely: no retries, no persistence, no survival on restart. The boundary between BackgroundTasks and a durable queue is the question 'does losing this task cause a business impact?'
- Endpoints declared with `async def` must use only non-blocking calls at every I/O boundary. `time.sleep()`, `requests.get()`, synchronous database drivers, and synchronous file I/O inside async def are production incidents in waiting — they block the entire event loop and freeze all concurrent requests with no error signal.
- Regular `def` endpoints are safe for synchronous and CPU-bound code because FastAPI automatically submits them to the AnyIO thread pool. The thread pool default size is min(32, os.cpu_count() + 4) — exhaust it and sync endpoint requests queue silently.
- The client receives the HTTP response before any code in BackgroundTasks starts executing. The client's measured latency excludes all background work. This is the correct pattern for any side effect the user does not need to wait for.
- Exceptions inside BackgroundTasks are silently swallowed — there is no error path to the client after the response has been sent. Every background function must have a top-level try/except block with structured logging and a failure metric counter. Without it, failures are completely invisible.
- For complex, long-running, or mission-critical jobs that must survive server restarts, use a durable task queue. Celery with Redis or RabbitMQ for most teams; ARQ for async-native workflows where Celery's operational weight is not justified.
Interview Questions on This Topic
- (Senior) Explain how FastAPI's internal thread pool handles synchronous def functions versus async def functions — and what happens when a developer gets the choice wrong.
- (Mid-level) What is the event loop, and how does calling `time.sleep()` inside an async def route impact every other concurrent request on the server?
- (Senior) Scenario: You need to process an image upload and generate a thumbnail. Would you use BackgroundTasks or an async endpoint? Justify your choice based on CPU intensity and reliability requirements.
- (Mid-level) How does FastAPI's BackgroundTasks differ from a raw threading.Thread implementation in terms of the HTTP lifecycle and execution semantics?
- (Senior) If a FastAPI server restarts while a BackgroundTask is executing, what happens to that task? How does this shape your architectural decisions about what belongs in BackgroundTasks versus a durable queue?
Frequently Asked Questions
What is the difference between BackgroundTasks and Celery?
BackgroundTasks is built into FastAPI and runs within the same process and memory space as your application. Setup requires zero additional infrastructure — no broker, no worker processes, no configuration beyond importing BackgroundTasks. The cost of that simplicity: no persistence, no retries, and no survival on process death. If the server crashes while a background task is running, the task is gone.
Celery is a distributed task queue that requires a message broker (Redis or RabbitMQ) and separate worker processes. Tasks are written to the broker before execution begins — if the API process or a worker process dies, the task survives in the broker and another worker picks it up. Celery supports configurable retry policies, task scheduling, rate limiting, and worker fleet scaling independently from the API.
The decision rule: use BackgroundTasks for sending a welcome email, writing an analytics event, or invalidating a cache — operations where occasional data loss is acceptable. Use Celery for generating a 500-page PDF report, processing a video upload, capturing a payment, or any operation where losing the task would require manual intervention or causes a user-facing failure.
Can I use async def with a database driver that has only a synchronous API?
Technically yes, but you will introduce an event loop blocking problem that degrades performance proportionally to database call frequency and duration. Every synchronous database call inside an async def endpoint blocks the event loop for the duration of the network round-trip to the database — typically 1–50ms per query. Under load, this stacks and becomes measurable latency degradation across all endpoints.
The two correct approaches: wrap the synchronous database call with result = await asyncio.to_thread(sync_db_call, args) to delegate it to the thread pool without blocking the event loop. This is acceptable as a migration step or when the sync driver is unavoidable. The professional approach is to replace the synchronous driver with an async-native one: asyncpg or psycopg3 for PostgreSQL, motor for MongoDB, aioredis for Redis. These libraries are designed to yield to the event loop during network I/O, enabling genuine concurrency.
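The bridge pattern can be shown end to end with the stdlib's sqlite3 standing in for a synchronous driver. The endpoint is written as a bare coroutine so the sketch stays framework-free; names are illustrative:

```python
import asyncio
import sqlite3

def count_users(db_path: str) -> int:
    # sqlite3 is a synchronous driver: called directly inside async def,
    # this would hold the event loop for the entire query round-trip
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER)")
        conn.execute("INSERT INTO users VALUES (1)")
        return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    finally:
        conn.close()

async def user_count_endpoint() -> dict:
    # The bridge: delegate the sync driver call to the thread pool,
    # so the event loop stays free while the query runs
    count = await asyncio.to_thread(count_users, ":memory:")
    return {"users": count}

result = asyncio.run(user_count_endpoint())
print(result)  # {'users': 1}
```

The shape is identical for any sync driver: keep the blocking work in a plain function and cross the async boundary once, through `asyncio.to_thread()`.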
Is it possible to return data from a BackgroundTask to the user?
No, and this is by design. When a BackgroundTask begins executing, the HTTP response has already been fully sent and the client connection has returned to the pool. There is no open channel to push additional data to the client.
If you need the user to receive the result of an asynchronous operation, there are two standard patterns. For short-lived operations (seconds): implement a polling endpoint — return a job ID in the initial response, start the work in a background task or Celery worker, and expose a GET endpoint that the client polls until the job status is 'complete'. For long-lived operations or real-time feedback: use WebSockets to push the result to the client when the work finishes. Both patterns require the work to be tracked somewhere persistent (database, Redis) so the polling endpoint or WebSocket handler can retrieve the result when it becomes available.
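The polling pattern reduces to three pieces: a submit handler that returns a job ID, a worker that updates a shared store, and a status handler the client polls. A framework-free sketch with invented names — in production `jobs` would be Redis or a database table, and the handlers would be FastAPI routes:

```python
import asyncio
import uuid

jobs = {}  # in production: Redis or a database table, not process memory

async def run_job(job_id: str) -> None:
    await asyncio.sleep(0.05)  # stand-in for the real work
    jobs[job_id] = {"status": "complete", "result": 42}

async def submit_job() -> dict:
    """POST /forge/jobs — respond immediately with a pollable job ID."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "pending", "result": None}
    asyncio.create_task(run_job(job_id))  # work continues after the "response"
    return {"job_id": job_id}

async def poll_job(job_id: str) -> dict:
    """GET /forge/jobs/{job_id} — client polls until status is complete."""
    return jobs[job_id]

async def client_flow() -> dict:
    job_id = (await submit_job())["job_id"]
    first_status = (await poll_job(job_id))["status"]  # still "pending"
    while (await poll_job(job_id))["status"] != "complete":
        await asyncio.sleep(0.01)  # the client's poll interval
    return {"first": first_status, "final": await poll_job(job_id)}

outcome = asyncio.run(client_flow())
```

The client's first poll sees "pending", later polls see "complete" with the result attached — exactly the lifecycle a WebSocket push would replace for long-lived jobs.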
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.