FastAPI WebSockets — Real-time Communication
Master full-duplex communication in FastAPI.
- WebSockets upgrade an HTTP connection to a persistent, full-duplex TCP stream via the WS protocol
- Use @app.websocket('/ws') and await websocket.accept() to establish the handshake
- A ConnectionManager pattern maps user IDs to WebSocket objects for targeted routing and broadcast
- Each open socket consumes a coroutine slot and a file descriptor; 10K connections require tuned ulimits
- Without try/except WebSocketDisconnect, crashed clients leak server-side file descriptors silently
- Browsers cannot send custom headers on WS handshake — use signed query parameters or cookies for auth
Stateful Connection Management
In a production environment you rarely deal with a single socket in isolation. The moment you have more than one user you need a centralized structure that tracks active sessions, routes messages to specific clients, and handles lifecycle transitions cleanly. The ConnectionManager pattern is that structure.
The ForgeSocketManager below stores WebSocket objects in a dictionary keyed by user ID. This makes targeted messaging (send_personal_message) a dictionary lookup, and broadcast a single pass over the values. The critical invariant is that every connect() call must have a corresponding disconnect() call — otherwise the dictionary grows unbounded and you accumulate the exact kind of zombie entries described in the incident above.
The try/except WebSocketDisconnect block in the endpoint is what guarantees the disconnect() call happens for clean client closures. For network-level drops where no close frame arrives, you need the heartbeat mechanism covered later in this guide — the try/except block alone is not sufficient.
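The manager and endpoint described above can be sketched roughly as follows. This is a minimal, framework-agnostic illustration and not the original ForgeSocketManager listing: TextSocket and ClientGone are hypothetical stand-ins for FastAPI's WebSocket and WebSocketDisconnect so the example runs without a server, and serve_client is an assumed name for the endpoint body.

```python
import asyncio
from typing import Dict, Protocol


class TextSocket(Protocol):
    """The coroutines this sketch relies on; FastAPI's WebSocket provides them."""

    async def accept(self) -> None: ...
    async def send_text(self, data: str) -> None: ...
    async def receive_text(self) -> str: ...


class ClientGone(Exception):
    """Hypothetical stand-in for fastapi.WebSocketDisconnect."""


class ForgeSocketManager:
    def __init__(self) -> None:
        self.active: Dict[str, TextSocket] = {}

    async def connect(self, user_id: str, websocket: TextSocket) -> None:
        await websocket.accept()
        self.active[user_id] = websocket

    def disconnect(self, user_id: str) -> None:
        # pop() with a default keeps a repeated disconnect harmless.
        self.active.pop(user_id, None)

    async def send_personal_message(self, user_id: str, message: str) -> bool:
        websocket = self.active.get(user_id)
        if websocket is None:
            return False
        await websocket.send_text(message)
        return True

    async def broadcast(self, message: str) -> None:
        # Snapshot the values so a disconnect during the send cannot mutate the dict.
        sockets = list(self.active.values())
        await asyncio.gather(
            *(ws.send_text(message) for ws in sockets), return_exceptions=True
        )


async def serve_client(manager: ForgeSocketManager, user_id: str, websocket: TextSocket) -> None:
    await manager.connect(user_id, websocket)
    try:
        while True:
            text = await websocket.receive_text()
            await manager.broadcast(f"{user_id}: {text}")
    except ClientGone:
        pass  # clean client closure
    finally:
        # Pairs every connect() with a disconnect(), including on handler errors.
        manager.disconnect(user_id)
```

In a FastAPI application the body of serve_client would sit inside a function decorated with @app.websocket(...), and the except clause would catch WebSocketDisconnect instead of the stand-in exception.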
Securing the Handshake
WebSockets start life as an HTTP request, but the browser's WebSocket API does not allow you to set custom HTTP headers like Authorization: Bearer <token> on that initial request. This is a deliberate browser security restriction, not a FastAPI limitation. The two workable alternatives are signed query parameters and HttpOnly cookies set during a prior HTTP login flow.
Query parameters are the most common choice for API clients and native apps. Cookies are preferable for browser-based applications because they are never visible in access logs or browser history. The code below uses a query parameter because it is easier to demonstrate, but the authentication logic is identical for cookies: inside a WebSocket handler you read from websocket.cookies instead of the query string.
The single rule that matters most: validate before you call accept(). Once accept() is called the handshake is complete, the connection is established, and the client is inside your system. Closing immediately after an invalid accept() is not equivalent to rejecting it — between accept() and close() the client may have already received a broadcast message or had its user ID added to the manager.
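One way to picture the validate-before-accept rule is the sketch below. It assumes a hypothetical HMAC-signed token of the form base64(user_id:expiry).base64(signature); sign_token, verify_token, authenticate_then_accept, and SECRET_KEY are illustrative names, and a real deployment would more likely use an established JWT library. The websocket argument is duck-typed and only needs the accept() and close(code=...) coroutines that FastAPI's WebSocket exposes.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical; load from configuration in practice
POLICY_VIOLATION = 1008


def sign_token(user_id: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    payload = f"{user_id}:{expires_at}".encode()
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def verify_token(token: str, now: Optional[float] = None, key: bytes = SECRET_KEY) -> Optional[str]:
    """Return the user id for a valid, unexpired token, otherwise None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
        user_id, expires_at = payload.decode().rsplit(":", 1)
        expiry = int(expires_at)
    except (ValueError, UnicodeDecodeError):
        return None
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return None
    current = time.time() if now is None else now
    return user_id if current < expiry else None


async def authenticate_then_accept(websocket, token: str) -> Optional[str]:
    user_id = verify_token(token)
    if user_id is None:
        # Reject before the handshake completes; the client never joins the manager.
        await websocket.close(code=POLICY_VIOLATION)
        return None
    await websocket.accept()
    return user_id
```

The point of the ordering is that nothing (manager registration, broadcasts) can reach the client until verify_token has succeeded.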
Scaling Broadcasts with Redis Pub/Sub
A WebSocket connection is bound to the specific OS process that accepted it. The ConnectionManager on that process has no visibility into connections on any other process. When you run multiple Uvicorn workers (--workers 4) or deploy multiple instances behind a load balancer, each worker has its own isolated ConnectionManager. Calling broadcast() on Worker A silently drops all messages destined for clients connected to Workers B, C, and D.
The standard fix is a pub/sub broker as a shared nervous system. Each FastAPI instance subscribes to a Redis channel on startup. When any instance needs to broadcast, it publishes to Redis. Redis delivers the message to every subscribed instance simultaneously, and each instance fans it out to its own local connections. The originating instance doesn't need to know where any given client is connected — Redis handles the routing.
Two things to understand about Redis pub/sub before you commit to it: first, it is fire-and-forget. If an instance is temporarily disconnected from Redis when a message is published, that message is gone — no replay, no delivery guarantee. Second, if a client is not currently connected (offline), the message is lost entirely because there is no storage layer. For systems where offline clients must receive missed messages on reconnect, replace pub/sub with Redis Streams (XADD/XREAD with consumer groups) or a proper message queue. The latency trade-off is real — Streams add microseconds of persistence overhead, but for notification systems that matters less than you might think.
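The fan-out flow can be illustrated without a running Redis server. The sketch below replaces Redis with a hypothetical in-memory broker (InMemoryBroker) so that it is self-contained; Worker stands in for one FastAPI process, and each "client" is simply a list that collects delivered messages. It shows the shape of the pattern only, not a Redis client API.

```python
import asyncio
from typing import Dict, List


class InMemoryBroker:
    """Stand-in for a Redis channel: every subscriber queue gets every message."""

    def __init__(self) -> None:
        self.subscribers: List[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self.subscribers.append(queue)
        return queue

    async def publish(self, message: str) -> None:
        # Fire-and-forget: a worker that has not subscribed yet misses the message.
        for queue in self.subscribers:
            await queue.put(message)


class Worker:
    """One process with its own isolated set of local connections."""

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.broker = broker
        self.inbox = broker.subscribe()
        self.local_clients: Dict[str, List[str]] = {}  # user_id -> delivered messages

    async def broadcast(self, message: str) -> None:
        # Publish only; local delivery happens in listen(), same as on every other worker.
        await self.broker.publish(message)

    async def listen(self) -> None:
        while True:
            message = await self.inbox.get()
            for delivered in self.local_clients.values():
                delivered.append(message)


async def demo() -> Dict[str, List[str]]:
    broker = InMemoryBroker()
    worker_a, worker_b = Worker("A", broker), Worker("B", broker)
    worker_a.local_clients["alice"] = []
    worker_b.local_clients["bob"] = []
    listeners = [asyncio.create_task(w.listen()) for w in (worker_a, worker_b)]
    await worker_a.broadcast("deploy finished")
    await asyncio.sleep(0.01)  # give both listeners a chance to drain their inbox
    for task in listeners:
        task.cancel()
    await asyncio.gather(*listeners, return_exceptions=True)
    return {**worker_a.local_clients, **worker_b.local_clients}
```

Worker A never touches Worker B's connections; it only publishes, and each worker delivers to the clients it owns.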
Heartbeat and Liveness Detection
TCP connections can become half-open when the underlying network disappears without sending a FIN or RST packet. This is not an edge case — it is routine. Mobile clients switching between Wi-Fi and cellular, corporate firewalls with idle connection timeouts, cloud NAT gateways that evict long-lived sessions, and VPNs reconnecting all produce this behavior. The network layer drops the connection; the application layer never finds out.
The WebSocket protocol addresses this with ping and pong control frames. The server sends a ping frame; the client must respond with a pong. If the pong never arrives, the connection is dead and should be closed and removed from the manager.
In FastAPI with Uvicorn, you have two options. The first is Uvicorn-level heartbeat: set ws_ping_interval and ws_ping_timeout in uvicorn.run() and Uvicorn handles everything transparently using native WebSocket ping frames. This is the preferred approach — it requires no application code and covers every endpoint automatically. The second option is application-level heartbeat: your server sends a JSON message like {"type": "ping"} on a timer and expects {"type": "pong"} back. This is useful when you need application-aware liveness logic (for example, detecting an authenticated session that has gone stale vs. a dead TCP connection).
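The application-level option might look like the sketch below. It is an assumed design rather than the original listing: the HeartbeatManager class and its record_pong method are illustrative, and the websocket argument only needs a send_text() coroutine. Create the object inside a running event loop, for example at the top of the endpoint.

```python
import asyncio
import json


class HeartbeatManager:
    def __init__(self, websocket, interval: float = 30.0, timeout: float = 10.0) -> None:
        self.websocket = websocket
        self.interval = interval
        self.timeout = timeout
        self._pong = asyncio.Event()

    def record_pong(self) -> None:
        """Call from the receive loop when a {"type": "pong"} message arrives."""
        self._pong.set()

    async def run(self) -> None:
        """Return as soon as one ping goes unanswered; the caller then closes the socket."""
        while True:
            await asyncio.sleep(self.interval)
            self._pong.clear()
            await self.websocket.send_text(json.dumps({"type": "ping"}))
            try:
                await asyncio.wait_for(self._pong.wait(), timeout=self.timeout)
            except asyncio.TimeoutError:
                return
```

A typical arrangement would start run() as a background task next to the receive loop and, when it returns, close the connection and remove it from the manager.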
The heartbeat interval is a real trade-off. A 30-second interval with a 10-second response window means dead connections are detected within 30-40 seconds. The overhead is roughly 20 bytes per connection per 30 seconds — for 10,000 connections that is about 6.5 KB/s of heartbeat traffic, which is negligible. Going shorter than 15 seconds starts to add up at very high connection counts and increases pressure on the Redis pub/sub channel if you are propagating liveness events.
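The arithmetic behind those figures is short enough to check directly. The helper names below are illustrative, and the 20-byte frame size is the estimate used in the text rather than a measured value.

```python
def heartbeat_bandwidth(connections: int, frame_bytes: int, interval_s: float) -> float:
    """Bytes per second of heartbeat traffic in one direction."""
    return connections * frame_bytes / interval_s


def worst_case_detection(interval_s: float, timeout_s: float) -> float:
    """A link that dies right after a successful ping waits a full interval plus the timeout."""
    return interval_s + timeout_s


bytes_per_second = heartbeat_bandwidth(10_000, 20, 30)  # about 6,667 B/s
kib_per_second = bytes_per_second / 1024                # about 6.5 KiB/s
detection_s = worst_case_detection(30, 10)              # 40 seconds
```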
Concurrency Limits and Worker Tuning
A WebSocket coroutine lives for the entire duration of the connection — minutes, hours, sometimes days. This is fundamentally different from HTTP request handlers, which complete in milliseconds and release all their resources. You need to think about capacity differently.
The good news is that asyncio coroutines are cheap. Each one uses roughly 4-8 KB of memory on the stack. 10,000 concurrent WebSocket connections consume about 40-80 MB of coroutine stack space — manageable on any modern server. The limiting factor is almost never memory.
The actual bottleneck is file descriptors. Every open WebSocket holds one OS file descriptor for its TCP socket. The default ulimit on most Linux distributions is 1,024. On some distributions and container environments it is 65,535. Neither is adequate for a production WebSocket server expecting more than a few thousand concurrent connections. You need to raise this limit before the service goes live, not after the first EMFILE incident.
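A process can inspect its own descriptor limit at startup, and raise the soft limit up to the hard limit, using the standard library's resource module (Unix only). The function name below is illustrative. This does not replace setting the hard limit in systemd or the container runtime, since an unprivileged process cannot exceed it.

```python
import resource
from typing import Tuple


def raise_fd_soft_limit(target: int) -> Tuple[int, int]:
    """Raise the soft RLIMIT_NOFILE toward target, capped by the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    ceiling = target if hard == unlimited else min(target, hard)
    if soft != unlimited and soft < ceiling:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (ceiling, hard))
        except (ValueError, OSError):
            pass  # the platform refused; leave the limit unchanged
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Logging the returned (soft, hard) pair at startup makes a too-low limit visible long before the first EMFILE.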
For CPU-bound work inside WebSocket handlers — JWT verification, JSON schema validation, image processing — asyncio's single-threaded model becomes the bottleneck. The event loop cannot proceed with other coroutines while a synchronous CPU-bound function is running. The solution is either run_in_executor to offload to a thread pool, or multiple Uvicorn workers (--workers N, one per CPU core) so each worker has its own event loop. With multiple workers you must have Redis pub/sub in place — without it, broadcast stops working correctly the moment you have more than one worker.
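The run_in_executor option can be sketched as below. expensive_digest is a made-up placeholder for whatever synchronous CPU-bound step a handler performs, and handle_message is an assumed name for the handler logic.

```python
import asyncio
import hashlib


def expensive_digest(payload: bytes, rounds: int = 50_000) -> str:
    """Synchronous, CPU-bound placeholder work."""
    digest = payload
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


async def handle_message(payload: bytes) -> str:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, expensive_digest, payload)
```

A thread keeps the event loop responsive while the work runs, but pure-Python CPU work still competes for the interpreter lock. For real parallelism, passing a concurrent.futures.ProcessPoolExecutor instead of None, or running more workers, is the usual next step.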
uvloop is a drop-in replacement for Python's built-in asyncio event loop implemented in Cython on top of libuv. Benchmarks consistently show 2-4x throughput improvement for I/O-bound workloads, which covers the overwhelming majority of WebSocket use cases. It is a one-line change and there is no reason not to use it in production.
| Protocol | Direction | Connection Model | Best For |
|---|---|---|---|
| WebSocket (WS/WSS) | Full-duplex bidirectional | Persistent TCP connection after HTTP upgrade handshake | Chat, live dashboards, collaborative editing, multiplayer gaming — anything requiring low-latency bidirectional communication |
| Server-Sent Events (SSE) | Server to client only | Persistent HTTP connection with text/event-stream MIME type — client cannot send back over the same connection | Push notifications, live feeds, stock tickers, build log streaming — one-way server push with automatic reconnect built into the browser |
| HTTP Long Polling | Client-initiated, server-deferred response | Repeated HTTP requests held open until data is available or timeout, then immediately re-issued by the client | Compatibility fallback for environments where WebSocket or SSE is blocked (some corporate proxies, older CDNs) |
| HTTP Short Polling | Client-initiated request/response | Periodic HTTP GET at fixed intervals — standard request/response, connection closes after each response | Low-frequency updates where latency above 5 seconds is acceptable and implementation simplicity outweighs efficiency |
| gRPC Streaming | Bidirectional (client, server, or both directions independently) | Persistent HTTP/2 connection with binary protobuf framing — supports client streaming, server streaming, and bidirectional streaming | Microservice-to-microservice communication, high-throughput binary streams, mobile clients where bandwidth efficiency matters |
Key Takeaways
- Always call await websocket.accept() before sending or receiving, and never call it before validating auth.
- Wrap the receive loop in try/except WebSocketDisconnect to handle clean client closures, but know that network-level drops never trigger this exception.
- The while True receive loop is intentional — it keeps the coroutine alive for the connection's lifetime. This is the correct pattern, not a bug.
- Server-side heartbeat via ws_ping_interval and ws_ping_timeout in Uvicorn is mandatory — without it, half-open sockets accumulate until EMFILE kills the worker.
- Fan out broadcast with asyncio.gather() and return_exceptions=True; sequential awaits on 1K+ connections make every client wait behind the slowest one and cause delivery jitter.
- File descriptors, not memory, are the capacity bottleneck: set LimitNOFILE=1000000 in systemd or container config before going to production.
- WebSocket connections are local to a single process — any multi-worker or multi-instance deployment requires Redis pub/sub for broadcast to work correctly.
- Use the lifespan context manager for Redis and background task setup — @app.on_event('startup') is deprecated as of FastAPI 0.93.
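The lifespan pattern from the last takeaway can be sketched as follows. pubsub_listener is a hypothetical placeholder for the loop that would read from Redis, and the app argument is duck-typed: it only needs a state attribute that accepts new attributes, which FastAPI's app.state does.

```python
import asyncio
import contextlib
from typing import AsyncIterator


async def pubsub_listener(state) -> None:
    """Placeholder for the loop that would read from Redis and fan out locally."""
    state.listener_running = True
    try:
        while True:
            await asyncio.sleep(3600)
    finally:
        state.listener_running = False


@contextlib.asynccontextmanager
async def lifespan(app) -> AsyncIterator[None]:
    # Startup: everything before the yield runs once, before traffic is served.
    app.state.listener_task = asyncio.create_task(pubsub_listener(app.state))
    try:
        yield
    finally:
        # Shutdown: cancel the background task and wait for it to unwind.
        app.state.listener_task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await app.state.listener_task
```

The function is handed to the application at construction time, as in FastAPI(lifespan=lifespan).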
Common Mistakes to Avoid
- Calling websocket.accept() before validating the auth token
Symptom: Unauthenticated clients briefly connect, get registered in the ConnectionManager, and may receive broadcast messages sent between accept() and close(). Security audits flag the endpoint as having an authentication bypass window. The issue is non-deterministic and hard to reproduce in testing because the window is small.
Fix: Validate the token (from query parameter or cookie) before any call to websocket.accept(). If validation fails, call await websocket.close(code=1008) and return immediately. The 1008 close code signals Policy Violation to the client; use it instead of 1000 (Normal Closure), which implies success.
- Using a sequential for loop in broadcast() without asyncio.gather()
Symptom: Broadcast latency scales linearly with connection count. Each send waits for the previous one to finish, so one slow or stalled client delays delivery to every client after it in the loop, and the broadcasting coroutine is tied up until the last send completes. With 1,000 connections the pass takes ~100ms; with 10,000 connections it stretches into seconds, causing visible message delivery jitter and delayed application-level heartbeat handling.
Fix: Replace the sequential loop with: await asyncio.gather(*[conn.send_text(msg) for conn in connections.values()], return_exceptions=True). The return_exceptions=True is critical: without it, the first failed send (dead connection) propagates out of gather() and aborts the broadcast call, so the outcome of every other send is never inspected and dead connections are never pruned.
- No server-side heartbeat to detect network-level disconnections
Symptom: File descriptor count grows at a slow, steady rate over hours. The metric looks benign in isolation until the process hits the ulimit ceiling and stops accepting new connections entirely. Application logs show nothing — no exceptions, no errors, no warnings. The only signal is the fd count trend and eventually EMFILE in kernel logs.
Fix: Set ws_ping_interval=30 and ws_ping_timeout=10 in uvicorn.run(). This is the lowest-effort, highest-reliability fix: Uvicorn handles the ping/pong at the WebSocket protocol level with no application code required. If you need application-level liveness metadata, implement a HeartbeatManager as shown in the Heartbeat section.
- Broadcasting without Redis pub/sub in a multi-worker or multi-instance deployment
Symptom: Users connected to different workers or server instances cannot see each other's messages. The behavior appears random — some users see all messages, others miss some, depending on which worker their connection landed on. The bug is invisible in local development where a single worker handles all test clients.
Fix: Add Redis pub/sub: each worker subscribes to a shared channel via the lifespan context manager, publishes outgoing messages to Redis, and forwards received messages to its local connection set. Every worker receives every published message and delivers it to its own connected clients.
- Not setting ulimit/file descriptor limits before deploying the WebSocket service
Symptom: The service appears healthy in staging (low connection count) and works fine in the first hours of production. After enough users connect, the process hits the OS default ulimit and all new connection attempts fail with EMFILE. Existing connections continue working, making the issue look like an admission control problem rather than a resource limit.
Fix: Set LimitNOFILE=1000000 in the [Service] section of the systemd unit file. For Docker, add --ulimit nofile=1000000:1000000 to the run command or ulimits in docker-compose.yml. Verify with cat /proc/<pid>/limits | grep 'open files' after the service starts. Do this before the service handles any production traffic — raising ulimit requires a service restart.
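The concurrent broadcast fix from the list above can be sketched as below, together with a way to learn which sends failed. DeadSocket and LiveSocket are test doubles invented for this example, and broadcast is written as a free function for brevity.

```python
import asyncio
from typing import Dict, List


class DeadSocket:
    async def send_text(self, data: str) -> None:
        raise ConnectionError("peer is gone")


class LiveSocket:
    def __init__(self) -> None:
        self.sent: List[str] = []

    async def send_text(self, data: str) -> None:
        await asyncio.sleep(0)  # simulate yielding to the event loop during I/O
        self.sent.append(data)


async def broadcast(connections: Dict[str, object], message: str) -> List[str]:
    """Send concurrently; return the ids whose send failed so the caller can prune them."""
    ids = list(connections)
    results = await asyncio.gather(
        *(connections[i].send_text(message) for i in ids),
        return_exceptions=True,  # failures come back as values instead of being raised
    )
    return [i for i, result in zip(ids, results) if isinstance(result, Exception)]


async def demo() -> List[str]:
    connections = {"a": LiveSocket(), "dead": DeadSocket(), "b": LiveSocket()}
    failed = await broadcast(connections, "hello")
    for user_id in failed:
        connections.pop(user_id)  # drop connections that could not be reached
    return sorted(connections)
```

Returning the failed ids gives the manager a natural place to clean up entries that the heartbeat has not caught yet.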
Interview Questions on This Topic
- What is the difference between WSGI and ASGI, and why is the latter required for FastAPI WebSockets? (Mid-level)
- Explain the 'C10k problem' and how Python's asyncio library helps FastAPI handle thousands of concurrent WebSocket connections. (Senior)
- How would you implement a distributed 'Presence' system (showing who is online) using FastAPI and Redis? (Senior)
- Describe the lifecycle of a WebSocket handshake. What happens at the HTTP level during the Upgrade request? (Mid-level)
- In a System Design context: how would you design a scalable notification service using WebSockets for 10 million users? (Senior)
Frequently Asked Questions
How do I authenticate a WebSocket connection?
The browser's WebSocket API does not allow setting custom HTTP headers like Authorization: Bearer on the initial handshake request. Your two options are query parameters and cookies.
Query parameters are straightforward for API clients and native apps: @app.websocket('/ws') async def ws(websocket: WebSocket, token: str = Query(...)). Validate the token before calling websocket.accept(), and close with websocket.close(code=1008) if it is invalid. The problem with query parameters is that they appear verbatim in server access logs and proxy logs — a long-lived API key in a query parameter is a permanent credential leak. Use short-lived JWTs with a 60-second expiry: the client obtains a fresh token via a normal HTTP endpoint, opens the WebSocket connection with it, and the token is worthless by the time it appears in a log.
Cookies are the better choice for browser-based applications. HttpOnly cookies set during a prior HTTP login flow are sent automatically with the WebSocket upgrade request, are never visible in JavaScript, and do not appear in browser history. Read the cookie in the WebSocket handler via websocket.cookies.get('session') and validate it before accepting.
How do WebSockets scale across multiple FastAPI instances?
A WebSocket connection is permanently bound to the specific process that accepted it. The ConnectionManager on that process has no visibility into connections on any other process. If you broadcast on Instance A, clients on Instances B, C, and D receive nothing.
The solution is Redis pub/sub: each instance subscribes to a shared Redis channel in the lifespan startup hook. When any instance needs to broadcast, it publishes to Redis. Redis delivers the message to every subscribed instance simultaneously, and each instance fans it out to its own local connections. The originating instance doesn't need to know which instance any given client is connected to.
For durable delivery — where clients that are temporarily offline should receive missed messages when they reconnect — replace pub/sub with Redis Streams. Use XADD to publish and XREAD with consumer groups to deliver, with the stream acting as a persistent log. Pub/sub is fire-and-forget; Streams are durable and replayable.
Can I use FastAPI Middleware with WebSockets?
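Standard HTTP middleware defined with @app.middleware('http') does not execute for WebSocket connections. That decorator builds on Starlette's BaseHTTPMiddleware, which only handles ASGI scopes of type 'http'. A WebSocket connection arrives with scope type 'websocket' and is passed straight through to the endpoint, handshake included. Pure ASGI middleware (a class implementing __call__(scope, receive, send)) does see WebSocket scopes, but it works with raw ASGI messages rather than request and response objects.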
For cross-cutting concerns on WebSocket endpoints — authentication, rate limiting, structured logging, request tracing — the practical approaches are: a dependency function injected into the endpoint (the cleanest option for auth and validation), a decorator wrapping the endpoint function, or a class-based wrapper that implements the same connect/receive/send lifecycle. For rate limiting specifically, implement it in the dependency using Redis counters keyed by IP or user_id — the same Redis instance you are already using for pub/sub.
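The rate-limiting idea can be sketched with an in-memory fixed-window counter. FixedWindowLimiter is a hypothetical class that stands in for the Redis counters mentioned above, so the example runs without external services; across several workers the count would have to live in Redis, for example as an incremented per-window key with an expiry.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` events per key in each window of `window_s` seconds."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._counters: Dict[str, Tuple[int, int]] = {}  # key -> (window index, count)

    def allow(self, key: str) -> bool:
        window = int(self.clock() // self.window_s)
        seen_window, count = self._counters.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window started; reset the counter
        count += 1
        self._counters[key] = (window, count)
        return count <= self.limit
```

Inside a dependency, a False result from allow(user_id) would lead to closing the socket with code 1008 before accept() is called.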
What happens to WebSocket connections during a server deploy or restart?
When Uvicorn receives SIGTERM (the standard shutdown signal from systemd, Kubernetes, or a container orchestrator), it begins graceful shutdown. Active WebSocket connections receive a close frame; depending on the server version and protocol implementation the code is typically 1001 (Going Away) or 1012 (Service Restart), so clients should treat both as a reconnect signal. The client's WebSocket implementation fires the onclose event, and well-written clients use this as the trigger to implement exponential-backoff reconnect logic.
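The reconnect schedule mentioned above might be computed as in the sketch below. It uses exponential backoff with full jitter; the function name and the default values are assumptions chosen for illustration, and a browser client would apply the same schedule in its onclose handler.

```python
import random
from typing import Iterator, Optional


def reconnect_delays(base: float = 1.0, cap: float = 30.0, attempts: int = 6,
                     rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one randomized delay (in seconds) per reconnect attempt."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))  # 1, 2, 4, 8, 16, then capped at 30
        # Full jitter spreads clients out so a restart does not trigger a reconnect stampede.
        yield rng.uniform(0, ceiling)
```

Without the jitter, every client disconnected by the same deploy would retry at the same instants and hit the new instance in synchronized waves.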
For zero-downtime deploys, use rolling deployments with connection draining: stop routing new connections to the instance being replaced (remove it from the load balancer's target group or upstream), then wait for existing connections to close naturally or hit the drain timeout, then send SIGTERM. Configure the drain timeout based on your expected connection lifetime — for notification services where connections are typically short-lived, 30 seconds is usually sufficient; for long-lived collaborative editing sessions, you may need minutes.
In Kubernetes, preStop hooks give you a window to drain before SIGTERM arrives. Set terminationGracePeriodSeconds to at least 2x your expected drain window. With Redis pub/sub, a client that reconnects to a new pod immediately rejoins the same broadcast graph — the user experience is a brief reconnect rather than a visible outage.