Senior 11 min · March 05, 2026

FastAPI WebSockets — Real-time Communication

Master full-duplex communication in FastAPI.

Naren Founder & Principal Engineer

20+ years shipping production Python across data and backend systems. Written from production experience, not tutorials.

June 10, 2026
last updated
Quick Answer
  • WebSockets upgrade an HTTP connection to a persistent, full-duplex TCP stream via the WS protocol
  • Use @app.websocket('/ws') and await websocket.accept() to establish the handshake
  • A ConnectionManager pattern maps user IDs to WebSocket objects for targeted routing and broadcast
  • Each open socket consumes a coroutine slot and a file descriptor; 10K connections require tuned ulimits
  • Without try/except WebSocketDisconnect, crashed clients leak server-side file descriptors silently
  • Browsers cannot send custom headers on WS handshake — use signed query parameters or cookies for auth
✦ Definition~90s read
What is FastAPI WebSockets?

FastAPI WebSockets let you maintain persistent, bidirectional connections between clients and your Python server — essential for real-time features like live dashboards, chat, or collaborative editing. Unlike HTTP request-response cycles, WebSockets keep a socket open after the initial upgrade handshake, so you can push data from server to client without polling.

FastAPI handles this natively with WebSocket endpoints, giving you async I/O out of the box, automatic dependency injection, and the same Pydantic validation you use for REST — but with a stateful, long-lived connection model that demands different patterns for lifecycle management, concurrency, and backpressure.

In production, you'll quickly hit limits: a single uvicorn worker can handle maybe 1,000–10,000 concurrent WebSocket connections depending on payload size and CPU, but scaling across multiple workers or machines requires an external pub/sub layer like Redis to broadcast messages to all connected clients. You also need to manage connection state explicitly — FastAPI doesn't track open sockets for you — so you'll maintain a dict or set of active connections per room or user, and clean them up on disconnect.

Security is another layer: the initial HTTP upgrade request can carry cookies or tokens, but you must validate them manually inside the WebSocket handler before accepting the connection, because middleware registered with @app.middleware("http") only processes HTTP requests and is skipped for WebSocket connections.

Heartbeat mechanisms (ping/pong frames) are critical to detect dead clients and free resources — without them, half-open connections accumulate silently. And because WebSocket handlers are long-running coroutines, you must tune worker concurrency (e.g., --workers 4 --limit-concurrency 1000) and set timeouts to avoid starving other requests.

Redis Pub/Sub is the go-to for horizontal scaling: each worker subscribes to channels and publishes messages, so a broadcast from one worker reaches all others. Alternatives like raw TCP sockets or Server-Sent Events exist, but WebSockets give you full-duplex communication with lower latency than SSE and simpler browser APIs than raw sockets — at the cost of more complex state management and scaling.

Plain-English First

Think of HTTP as sending letters through the postal service — each message needs a new envelope, a stamp, and a full round trip before you hear back. A WebSocket is more like picking up the phone: once both sides say hello, you have an open line where either person can speak at any time without redialing. The server can push updates the instant something happens, rather than waiting for the client to ask 'anything new?' every few seconds. That difference — push versus poll — is what makes WebSockets the right tool for anything that needs to feel live.

FastAPI WebSockets give you persistent, bidirectional connections for real-time features, but production deployment forces hard problems: connection state tracking, horizontal scaling across workers, and detecting dead clients before they exhaust resources. This guide covers the ConnectionManager pattern, Redis Pub/Sub for broadcast scaling, JWT validation on the upgrade handshake, and heartbeat mechanisms to prevent half-open connections from accumulating silently.

FastAPI WebSockets — Real-time Communication Without the Fluff

FastAPI WebSockets provide a bidirectional, persistent connection between a client and server over a single TCP socket, enabling real-time data exchange without the overhead of HTTP request-response cycles. The core mechanic is the WebSocket protocol (RFC 6455), which FastAPI exposes via the WebSocket class and the @app.websocket decorator. Once a WebSocket handshake upgrades an HTTP connection, both sides can send messages freely (typically JSON or text frames) until either party closes the link. This differs from polling, where the server can only answer when asked, and from Server-Sent Events, where the server can push at any time but the client cannot send on the same connection.

In practice, FastAPI handles WebSocket lifecycle events (connect, receive, disconnect) asynchronously, leveraging Python's async/await to manage thousands of concurrent connections efficiently. The key property that matters: WebSockets are stateful. The server must track each open socket, handle reconnection logic, and manage backpressure. Without careful design, a single slow client can stall every send queued behind it.

Use WebSockets when you need low-latency, bidirectional updates: chat applications, live dashboards, collaborative editing, or real-time gaming. They are not a replacement for REST; they solve a specific problem: pushing data from server to client with sub-100ms latency, where polling would waste bandwidth and degrade UX.

Stateful ≠ Stateless
WebSockets are stateful — your server must track open connections and handle reconnection. Don't treat them like stateless HTTP endpoints.
Production Insight
A real-time trading dashboard using FastAPI WebSockets crashed under 500 concurrent users because the server didn't implement backpressure — a single slow client with a 1Mbps connection caused the event loop to block on send(), dropping all other connections.
Symptom: asyncio.TimeoutError on websocket.send() after 30 seconds of no response, followed by a cascade of disconnects.
Rule: Always set a send timeout (e.g., 5 seconds) and implement a per-client send queue with a bounded size — drop messages for slow clients rather than blocking the loop.
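The rule above can be sketched as a bounded per-client queue drained by a dedicated writer task. This is a minimal illustration, not part of FastAPI: `ClientChannel`, `max_queue`, and `send_timeout` are hypothetical names, and `send` stands for any awaitable callable such as `websocket.send_text`.

```python
import asyncio
from typing import Awaitable, Callable


class ClientChannel:
    """Bounded outbound queue for one client (hypothetical helper, not a FastAPI API)."""

    def __init__(
        self,
        send: Callable[[str], Awaitable[None]],  # e.g. websocket.send_text
        max_queue: int = 100,
        send_timeout: float = 5.0,
    ) -> None:
        self._send = send
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)
        self._send_timeout = send_timeout
        self.dropped = 0  # Count of messages discarded because the client was behind

    def offer(self, message: str) -> bool:
        """Enqueue without waiting; drop the message if the queue is full."""
        try:
            self._queue.put_nowait(message)
            return True
        except asyncio.QueueFull:
            self.dropped += 1
            return False

    async def writer(self) -> None:
        """Drain the queue; return when a single send exceeds the timeout."""
        while True:
            message = await self._queue.get()
            try:
                await asyncio.wait_for(self._send(message), self._send_timeout)
            except asyncio.TimeoutError:
                return  # The caller should close and unregister this client
```

Producers call `offer()` and never wait, so a stalled client costs at most `max_queue` buffered messages. One writer task per connection would be started after `accept()` and cancelled on disconnect.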
Key Takeaway
WebSockets are bidirectional and stateful — design for connection lifecycle, not request-response.
Backpressure is not optional: a single slow client can block your entire event loop without explicit send timeouts and queues.
Use WebSockets only when you need server-push with <100ms latency — for everything else, REST or SSE is simpler and more resilient.
Figure: FastAPI WebSockets real-time communication flow, from handshake to broadcast with Redis and heartbeat (thecodeforge.io). Stages: WebSocket handshake (upgrade HTTP to WS, validate origin and token); stateful connection manager (track active connections per user or room); message protocol (use JSON with a type field, not raw text); Redis Pub/Sub broadcast (scale across workers with channel publish); heartbeat and liveness (periodic ping/pong to detect stale sockets); concurrency and worker tuning (limit workers, use async for I/O). Warning: handshake failure is silent if not validated, so always verify upgrade headers and reject invalid origins.

Stateful Connection Management

In a production environment you rarely deal with a single socket in isolation. The moment you have more than one user you need a centralized structure that tracks active sessions, routes messages to specific clients, and handles lifecycle transitions cleanly. The ConnectionManager pattern is that structure.

The ForgeSocketManager below stores WebSocket objects in a dictionary keyed by user ID. This makes targeted messaging (send_personal_message) a dictionary lookup, and broadcast a single pass over the values. The critical invariant is that every connect() call must have a corresponding disconnect() call — otherwise the dictionary grows unbounded and you accumulate the exact kind of zombie entries described in the incident above.

The try/except WebSocketDisconnect block in the endpoint is what guarantees the disconnect() call happens for clean client closures. For network-level drops where no close frame arrives, you need the heartbeat mechanism covered later in this guide — the try/except block alone is not sufficient.

io/thecodeforge/realtime/connection_manager.py (Python)
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from typing import Dict
import asyncio

app = FastAPI()


class ForgeSocketManager:
    """Manages active WebSocket connections keyed by user ID.

    The dictionary is the single source of truth for who is currently
    connected. Every code path that touches it must respect the
    connect/disconnect contract — no exceptions.
    """

    def __init__(self):
        # Keyed by user_id so we can route messages without iterating everything
        self.active_connections: Dict[str, WebSocket] = {}

    async def connect(self, user_id: str, websocket: WebSocket) -> None:
        await websocket.accept()
        self.active_connections[user_id] = websocket

    def disconnect(self, user_id: str) -> None:
        # .pop() with a default is safer than del — avoids KeyError on double-disconnect
        self.active_connections.pop(user_id, None)

    async def send_personal_message(self, message: str, user_id: str) -> None:
        ws = self.active_connections.get(user_id)
        if ws is not None:
            await ws.send_text(message)

    async def broadcast(self, message: str) -> None:
        # asyncio.gather fans out all sends concurrently instead of awaiting each one.
        # return_exceptions=True collects failures as results, so one dead socket
        # cannot raise out of broadcast() and abort the caller's receive loop
        await asyncio.gather(
            *[ws.send_text(message) for ws in self.active_connections.values()],
            return_exceptions=True,
        )


manager = ForgeSocketManager()


@app.websocket("/ws/{user_id}")
async def websocket_endpoint(websocket: WebSocket, user_id: str):
    await manager.connect(user_id, websocket)
    try:
        while True:
            data = await websocket.receive_text()
            await manager.broadcast(f"User {user_id}: {data}")
    except WebSocketDisconnect:
        # Clean close frame received — remove from manager and notify others
        manager.disconnect(user_id)
        await manager.broadcast(f"User {user_id} has left the forge.")
Output
Broadcasting enabled: messages are fanned out concurrently to all connected users via asyncio.gather.
Connection Manager as a Phone Book
  • connect() = add a new entry after the handshake completes — not before
  • disconnect() = remove the entry when the call ends — skip this and you have ghost listings that accumulate forever
  • send_personal_message() = look up a number and call it directly — .get() with a None check handles the case where the user already left
  • broadcast() with asyncio.gather() = conference call to everyone simultaneously; with a sequential await loop, one slow recipient delays every recipient queued behind it
  • The try/except WebSocketDisconnect is the hang-up detector for clean closes — heartbeats handle the network-level drops that never fire this exception
Production Insight
The original broadcast() pattern in most tutorials uses a sequential for loop that awaits each send one at a time. With 100 connections that is imperceptible. With 1,000 connections at 100 microseconds per send, the last recipient waits 100ms on every broadcast, and a single client with a full TCP buffer holds up everyone behind it. asyncio.gather() runs the sends concurrently, so a slow recipient delays only its own message. Use return_exceptions=True; otherwise the first failed send raises out of broadcast() and, in the endpoint above, ends the sender's handler without running disconnect().
Key Takeaway
Every connect() must pair with a disconnect() — the try/except block covers clean closes, heartbeats cover silent drops.
asyncio.gather() for broadcast is not a micro-optimisation: it keeps one slow or dead client from delaying delivery to everyone else.
The manager dict is the single source of truth for active sessions — never bypass it.
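Because asyncio.gather() returns results in the same order as its inputs, failed sends can be mapped back to user IDs and removed in the same pass. The sketch below assumes a plain dict of objects exposing `async send_text(str)`, as FastAPI's WebSocket does; `broadcast_and_prune` is a hypothetical helper name.

```python
import asyncio
from typing import Any, Dict, List


async def broadcast_and_prune(connections: Dict[str, Any], message: str) -> List[str]:
    """Send to every connection concurrently and remove the ones whose send failed.

    Returns the user IDs that were removed.
    """
    # Snapshot first: the dict may change while the sends are awaited
    targets = list(connections.items())
    results = await asyncio.gather(
        *(ws.send_text(message) for _, ws in targets),
        return_exceptions=True,
    )
    removed: List[str] = []
    for (user_id, ws), result in zip(targets, results):
        # Only remove the entry if it still points at the socket that failed;
        # the user may have reconnected while the sends were in flight
        if isinstance(result, Exception) and connections.get(user_id) is ws:
            del connections[user_id]
            removed.append(user_id)
    return removed
```

Pruning on send failure complements the try/except block rather than replacing it: it catches sockets that broke between a client's last message and the next broadcast.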

Securing the Handshake

WebSockets start life as an HTTP request, but the browser's WebSocket API does not allow you to set custom HTTP headers like Authorization: Bearer <token> on that initial request. This is a deliberate browser security restriction, not a FastAPI limitation. The two workable alternatives are signed query parameters and HttpOnly cookies set during a prior HTTP login flow.

Query parameters are the most common choice for API clients and native apps. Cookies are preferable for browser-based applications because they are never visible in access logs or browser history. The code below uses a query parameter because it is easier to demonstrate, but the authentication logic is identical for cookies: you read the token from websocket.cookies instead of the query string.

The single rule that matters most: validate before you call accept(). Once accept() is called the handshake is complete, the connection is established, and the client is inside your system. Closing immediately after an invalid accept() is not equivalent to rejecting it — between accept() and close() the client may have already received a broadcast message or had its user ID added to the manager.

io/thecodeforge/security/socket_auth.py (Python)
from fastapi import FastAPI, WebSocket, WebSocketDisconnect, Query, status
from jose import JWTError, jwt
from datetime import datetime, timezone

app = FastAPI()

SECRET_KEY = "your-secret-key"  # Load from environment in production
ALGORITHM = "HS256"


def decode_ws_token(token: str) -> dict | None:
    """Returns the decoded payload or None if the token is invalid or expired."""
    try:
        # jwt.decode verifies the signature and raises JWTError on failure
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        # Re-check expiry explicitly as a defensive measure. Use an aware UTC
        # datetime: utcnow().timestamp() would be interpreted as local time
        if payload.get("exp") and datetime.now(timezone.utc).timestamp() > payload["exp"]:
            return None
        return payload
    except JWTError:
        return None


@app.websocket("/secure-ws")
async def secure_socket(websocket: WebSocket, token: str = Query(...)):
    # Validate BEFORE accept() — this is the hard rule
    payload = decode_ws_token(token)
    if payload is None:
        # 1008 = Policy Violation, the conventional close code for auth failure
        # (1000 = Normal Closure would imply success). Closing before accept()
        # rejects the handshake itself, which clients observe as an HTTP 403
        await websocket.close(code=status.WS_1008_POLICY_VIOLATION)
        return

    user_id: str = payload.get("sub")
    await websocket.accept()
    await websocket.send_text(f"Authenticated. Welcome, {user_id}.")

    try:
        while True:
            data = await websocket.receive_text()
            await websocket.send_text(f"Echo: {data}")
    except WebSocketDisconnect:
        pass  # Clean up handled by ConnectionManager in the full implementation
Output
Unauthorized connections are rejected before the accept() handshake. Authenticated users receive a welcome message with their user ID from the JWT payload.
Never Accept Before Validating
Calling websocket.accept() before token validation creates a window where an unauthenticated client is fully connected. Even if you close immediately after, they have already been registered in the event loop as a live connection, may have received a broadcast message that went out between accept() and close(), and their connection attempt has consumed a file descriptor. For JWT tokens, decode and verify the signature, expiry, and audience claim before accepting. For cookies, read and validate before accepting. The order is non-negotiable.
Production Insight
Query parameters containing tokens appear verbatim in nginx access logs, application logs, and any proxy sitting between the client and server. A long-lived API key in a query parameter is a credential leak waiting to happen. In production, use short-lived JWTs with a 60-second expiry — the client obtains a fresh token via a normal HTTP endpoint before opening each WebSocket connection. The token is useless by the time it appears in a log. Alternatively, use HttpOnly cookies set during the HTTP login flow — they are sent automatically with the WebSocket upgrade request and never appear in logs.
Key Takeaway
Browsers cannot send custom Authorization headers on WebSocket handshake — query params or cookies are your only options.
Validate the token BEFORE calling accept(): accept() is the point of no return.
Close with code 1008 (Policy Violation) for auth failures — not 1000 (Normal Closure) which signals success.
Use short-lived tokens (60s expiry) or HttpOnly cookies to keep credentials out of access logs.
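The short-lived token idea can be illustrated without a JWT library. The sketch below uses the standard library's hmac module as a stand-in for the jose-based code in this section, purely to show the signed-and-expiring shape of a query-parameter credential; `issue_ticket`, `verify_ticket`, and `TICKET_SECRET` are hypothetical names.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

TICKET_SECRET = b"change-me"  # Assumption: loaded from the environment in a real deployment


def issue_ticket(user_id: str, ttl_seconds: int = 60, now: Optional[float] = None) -> str:
    """Return 'payload.signature', where payload encodes 'user_id:expiry'."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    payload = base64.urlsafe_b64encode(f"{user_id}:{expires}".encode()).decode()
    signature = hmac.new(TICKET_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_ticket(ticket: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user ID if the signature is valid and the ticket has not expired."""
    payload, sep, signature = ticket.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(TICKET_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so it does not leak how much matched
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    try:
        user_id, expires = base64.urlsafe_b64decode(payload).decode().rsplit(":", 1)
    except (ValueError, UnicodeDecodeError):
        return None
    if (time.time() if now is None else now) > int(expires):
        return None
    return user_id
```

A normal HTTP endpoint would call `issue_ticket()` for an already authenticated user, and the WebSocket handler would call `verify_ticket()` before `accept()`. With a 60-second lifetime, a ticket that later shows up in an access log has already expired.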

Scaling Broadcasts with Redis Pub/Sub

A WebSocket connection is bound to the specific OS process that accepted it. The ConnectionManager on that process has no visibility into connections on any other process. When you run multiple Uvicorn workers (--workers 4) or deploy multiple instances behind a load balancer, each worker has its own isolated ConnectionManager. Calling broadcast() on Worker A silently drops all messages destined for clients connected to Workers B, C, and D.

The standard fix is a pub/sub broker as a shared nervous system. Each FastAPI instance subscribes to a Redis channel on startup. When any instance needs to broadcast, it publishes to Redis. Redis delivers the message to every subscribed instance simultaneously, and each instance fans it out to its own local connections. The originating instance doesn't need to know where any given client is connected — Redis handles the routing.

Two things to understand about Redis pub/sub before you commit to it: first, it is fire-and-forget. If an instance is temporarily disconnected from Redis when a message is published, that message is gone — no replay, no delivery guarantee. Second, if a client is not currently connected (offline), the message is lost entirely because there is no storage layer. For systems where offline clients must receive missed messages on reconnect, replace pub/sub with Redis Streams (XADD/XREAD with consumer groups) or a proper message queue. The latency trade-off is real — Streams add microseconds of persistence overhead, but for notification systems that matters less than you might think.
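For the replay case, a minimal sketch with Redis Streams might look like the following. It assumes `r` is a `redis.asyncio.Redis` client created with `decode_responses=False` and the default RESP2 reply format, and it uses plain XREAD from a client's last-seen ID rather than a consumer group, which is the simpler fit when every client should see every message. The stream name and function names are hypothetical.

```python
from typing import Any, List, Tuple

STREAM = "forge:events"  # Hypothetical stream name


async def publish_event(r: Any, text: str) -> bytes:
    """Append an event to the stream; Redis assigns and returns its ID."""
    return await r.xadd(STREAM, {"data": text})


async def replay_since(r: Any, last_id: str = "0-0", batch: int = 100) -> List[Tuple[bytes, bytes]]:
    """Return (id, data) pairs for every entry newer than last_id."""
    missed: List[Tuple[bytes, bytes]] = []
    cursor: Any = last_id
    while True:
        # Without block=..., XREAD returns immediately with an empty reply
        # once there is nothing newer than the cursor
        reply = await r.xread({STREAM: cursor}, count=batch)
        if not reply:
            return missed
        for _stream, entries in reply:
            for entry_id, fields in entries:
                missed.append((entry_id, fields[b"data"]))
                cursor = entry_id
```

On reconnect, the client would send the last entry ID it processed, the server would call `replay_since()` with it, and live delivery would resume afterwards. A production stream would also need trimming (for example a MAXLEN on XADD) so it does not grow without bound.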

io/thecodeforge/realtime/redis_broadcast.py (Python)
import asyncio
import redis.asyncio as aioredis
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from contextlib import asynccontextmanager


local_connections: dict[str, WebSocket] = {}
CHANNEL = "forge:broadcast"
redis: aioredis.Redis | None = None


async def redis_listener(r: aioredis.Redis) -> None:
    """Long-running coroutine: subscribes to Redis and forwards to local clients."""
    pubsub = r.pubsub()
    await pubsub.subscribe(CHANNEL)
    async for message in pubsub.listen():
        if message["type"] != "message":
            continue
        data: str = message["data"].decode()
        # Snapshot the dict so connects and disconnects that happen while the
        # sends are awaited cannot change it mid-iteration
        targets = list(local_connections.items())
        results = await asyncio.gather(
            *[ws.send_text(data) for _, ws in targets],
            return_exceptions=True,
        )
        # gather returns results in input order, so each failure maps back to
        # the connection that raised; drop those stale entries
        for (uid, ws), result in zip(targets, results):
            if isinstance(result, Exception) and local_connections.get(uid) is ws:
                local_connections.pop(uid, None)


@asynccontextmanager
async def lifespan(app: FastAPI):
    global redis
    redis = aioredis.from_url("redis://localhost:6379", decode_responses=False)
    # Start the listener as a background task tied to the worker's lifespan
    listener_task = asyncio.create_task(redis_listener(redis))
    yield
    # Graceful shutdown — cancel the listener and close the Redis connection
    listener_task.cancel()
    await redis.aclose()


app = FastAPI(lifespan=lifespan)


@app.websocket("/ws/{user_id}")
async def ws_endpoint(websocket: WebSocket, user_id: str):
    await websocket.accept()
    local_connections[user_id] = websocket
    try:
        while True:
            data = await websocket.receive_text()
            # Publish to Redis — all instances (including this one) receive it
            await redis.publish(CHANNEL, f"{user_id}: {data}")
    except WebSocketDisconnect:
        local_connections.pop(user_id, None)
Output
Messages published by any worker instance are delivered to all local connection sets across the cluster via Redis pub/sub.
Redis as the Shared Nervous System
  • Each worker subscribes to the same Redis channel on startup via the lifespan context manager
  • Publishing to Redis delivers to every subscribed instance in microseconds regardless of where the publisher is
  • Each instance fans out to its own local connection set — it never touches another instance's connections directly
  • Pub/sub is ephemeral — an instance that loses Redis connectivity during a publish misses that message entirely
  • For offline delivery or message replay, use Redis Streams (XADD/XREAD) with a consumer group instead of pub/sub
  • Note: the lifespan context manager replaces the deprecated @app.on_event('startup') pattern as of FastAPI 0.93
Production Insight
A common mistake is running four Uvicorn workers and testing locally with a single client — everything appears to work perfectly because all connections happen to land on the same worker. The bug only surfaces when you scale out and clients on different workers stop seeing each other's messages. The rule is simple: if you have more than one worker process or more than one server instance, Redis pub/sub is not optional. Add it before you scale, not after you debug the symptom.
Key Takeaway
WebSocket connections are local to a single process — broadcast without Redis only reaches that process's clients.
Redis pub/sub decouples broadcast from server topology — publish once, every instance delivers to its own connections.
Pub/sub is fire-and-forget — use Redis Streams with consumer groups for durable, replayable delivery.
Use the lifespan context manager for Redis setup and teardown — @app.on_event is deprecated.

Heartbeat and Liveness Detection

TCP connections can become half-open when the underlying network disappears without sending a FIN or RST packet. This is not an edge case — it is routine. Mobile clients switching between Wi-Fi and cellular, corporate firewalls with idle connection timeouts, cloud NAT gateways that evict long-lived sessions, and VPNs reconnecting all produce this behavior. The network layer drops the connection; the application layer never finds out.

The WebSocket protocol addresses this with ping and pong control frames. The server sends a ping frame; the client must respond with a pong. If the pong never arrives, the connection is dead and should be closed and removed from the manager.

In FastAPI with Uvicorn, you have two options. The first is Uvicorn-level heartbeat: set ws_ping_interval and ws_ping_timeout in uvicorn.run() and Uvicorn handles everything transparently using native WebSocket ping frames. This is the preferred approach — it requires no application code and covers every endpoint automatically. The second option is application-level heartbeat: your server sends a JSON message like {"type": "ping"} on a timer and expects {"type": "pong"} back. This is useful when you need application-aware liveness logic (for example, detecting an authenticated session that has gone stale vs. a dead TCP connection).

The heartbeat interval is a real trade-off. A 30-second interval with a 10-second response window means a dead connection is detected within about 40 seconds in the worst case: up to 30 seconds until the next ping, plus 10 seconds waiting for the pong. The overhead is roughly 20 bytes per connection per 30 seconds; for 10,000 connections that is about 6.5 KB/s of heartbeat traffic, which is negligible. Going shorter than 15 seconds starts to add up at very high connection counts and increases pressure on the Redis pub/sub channel if you are propagating liveness events.
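The arithmetic behind those figures is easy to rerun for other settings. The helper names below are hypothetical, and the 20-byte frame size is the article's rough estimate rather than a measured value.

```python
def heartbeat_bytes_per_second(connections: int, frame_bytes: int, interval_s: float) -> float:
    """Aggregate ping traffic when every connection gets one frame per interval."""
    return connections * frame_bytes / interval_s


def worst_case_detection_s(interval_s: float, timeout_s: float) -> float:
    """A client that dies right after a pong is noticed one interval plus one timeout later."""
    return interval_s + timeout_s


rate = heartbeat_bytes_per_second(10_000, 20, 30)
print(f"{rate / 1024:.1f} KiB/s")       # 6.5 KiB/s
print(worst_case_detection_s(30, 10))   # 40
```

Halving the interval doubles the traffic and shortens the worst case by only the interval difference, which is why very short intervals rarely pay off.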

io/thecodeforge/realtime/heartbeat.py (Python)
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from typing import Dict

app = FastAPI()


class HeartbeatManager:
    """Connection manager with per-connection heartbeat tasks.

    Each connection gets an independent heartbeat coroutine. If the client
    fails to respond to a ping within the timeout window, the connection is
    closed and removed from the active set.
    """

    def __init__(self, interval: int = 30, timeout: int = 10):
        self.connections: Dict[str, WebSocket] = {}
        self._last_pong: Dict[str, float] = {}
        # Hold references to heartbeat tasks so they are not garbage-collected
        self._tasks: set = set()
        self.interval = interval
        self.timeout = timeout

    async def connect(self, user_id: str, websocket: WebSocket) -> None:
        await websocket.accept()
        self.connections[user_id] = websocket
        self._last_pong[user_id] = asyncio.get_running_loop().time()
        task = asyncio.create_task(self._heartbeat_loop(user_id, websocket))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    def record_pong(self, user_id: str) -> None:
        """Called when the client sends a pong-type message."""
        self._last_pong[user_id] = asyncio.get_running_loop().time()

    def disconnect(self, user_id: str) -> None:
        self.connections.pop(user_id, None)
        self._last_pong.pop(user_id, None)

    async def _heartbeat_loop(self, user_id: str, ws: WebSocket) -> None:
        loop = asyncio.get_running_loop()
        # Compare by identity so a reconnect under the same user_id ends this loop
        while self.connections.get(user_id) is ws:
            await asyncio.sleep(self.interval)
            if self.connections.get(user_id) is not ws:
                break
            # Check when we last heard from this client
            elapsed = loop.time() - self._last_pong.get(user_id, 0)
            if elapsed > self.interval + self.timeout:
                # No pong within the timeout window: treat the client as dead
                self.disconnect(user_id)
                try:
                    await ws.close(code=1001)  # 1001 = Going Away
                except Exception:
                    pass  # The socket may already be unusable
                break
            try:
                await ws.send_json({"type": "ping"})
            except Exception:
                # Send failed: the connection is already dead at the network layer
                self.disconnect(user_id)
                break


manager = HeartbeatManager(interval=30, timeout=10)


@app.websocket("/ws/{user_id}")
async def ws_endpoint(websocket: WebSocket, user_id: str):
    await manager.connect(user_id, websocket)
    try:
        while True:
            data = await websocket.receive_json()
            if data.get("type") == "pong":
                manager.record_pong(user_id)
                continue  # Heartbeat response — no further processing needed
            # Handle actual application messages here
            await websocket.send_json({"echo": data})
    except WebSocketDisconnect:
        manager.disconnect(user_id)
Output
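Dead connections are detected within roughly one to two heartbeat intervals (30-60 seconds with these settings, because the loop only checks once per interval) and cleaned up. The last_pong timestamp approach tolerates slow or intermittent clients without false-positive disconnects.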
Zombie Connections Drain Resources Without Any Warning
A half-open WebSocket holds a file descriptor, keeps a coroutine parked in the event loop, and occupies a slot in the ConnectionManager dictionary. None of these resources are released until something explicitly closes the connection. No error is logged. No metric spikes. The process just slowly accumulates dead weight until it hits the file descriptor ceiling and stops accepting any new connections — HTTP or WebSocket. By the time the symptom is visible, the damage has been building for hours. Heartbeats are the only reliable way to detect and reclaim these resources within a bounded time window.
Production Insight
If you are using Uvicorn directly, ws_ping_interval and ws_ping_timeout in uvicorn.run() give you native WebSocket-level ping frames with no application code required. This is preferable to an application-level heartbeat because it uses the actual WebSocket control frame mechanism, works transparently with browser WebSocket implementations, and doesn't require the client to implement any JSON message protocol. Reserve application-level heartbeats for cases where you need to carry additional liveness metadata — for example, sending the server's current timestamp so the client can detect clock drift.
Key Takeaway
TCP drops without FIN/RST create half-open sockets that never fire WebSocketDisconnect — they are completely invisible to the application layer.
Uvicorn ws_ping_interval + ws_ping_timeout is the simplest heartbeat implementation — prefer it over application-level ping/pong unless you need custom liveness metadata.
30-second interval with 10-second timeout means dead sockets are detected and reclaimed in at most 40 seconds.

Concurrency Limits and Worker Tuning

A WebSocket coroutine lives for the entire duration of the connection — minutes, hours, sometimes days. This is fundamentally different from HTTP request handlers, which complete in milliseconds and release all their resources. You need to think about capacity differently.

The good news is that asyncio coroutines are cheap. Each one holds roughly 4-8 KB of memory for its frame and task state. 10,000 concurrent WebSocket connections therefore consume about 40-80 MB for the coroutines themselves, which is manageable on any modern server. The limiting factor is almost never memory.

The actual bottleneck is file descriptors. Every open WebSocket holds one OS file descriptor for its TCP socket. The default ulimit on most Linux distributions is 1,024. On some distributions and container environments it is 65,535. The first is exhausted by roughly a thousand connections, and the second leaves little headroom once a process serves tens of thousands of sockets alongside log files, Redis connections, and listening sockets. You need to raise this limit before the service goes live, not after the first EMFILE incident.
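One way to catch a low limit before the first EMFILE is to check it at startup. The sketch below uses the standard library's resource module, which is available on Unix-like systems only. The function names and the 256-descriptor reserve are assumptions chosen for illustration.

```python
import resource
from typing import Optional


def fd_headroom(soft_limit: int, expected_connections: int, reserve: int = 256) -> int:
    """Descriptors left after the expected sockets plus a reserve for logs, Redis, and so on."""
    return soft_limit - reserve - expected_connections


def current_soft_limit() -> Optional[int]:
    """Soft RLIMIT_NOFILE for this process, or None when it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


def check_fd_limit(expected_connections: int) -> None:
    """Raise early if the process cannot hold the expected number of sockets."""
    soft = current_soft_limit()
    if soft is not None and fd_headroom(soft, expected_connections) < 0:
        raise RuntimeError(
            f"RLIMIT_NOFILE={soft} is too low for {expected_connections} connections"
        )
```

Calling `check_fd_limit(10_000)` from the application's startup path turns a slow-burning capacity problem into an immediate, readable failure.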

For CPU-bound work inside WebSocket handlers — JWT verification, JSON schema validation, image processing — asyncio's single-threaded model becomes the bottleneck. The event loop cannot proceed with other coroutines while a synchronous CPU-bound function is running. The solution is either run_in_executor to offload to a thread pool, or multiple Uvicorn workers (--workers N, one per CPU core) so each worker has its own event loop. With multiple workers you must have Redis pub/sub in place — without it, broadcast stops working correctly the moment you have more than one worker.
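The run_in_executor option can be sketched as follows. `expensive_digest` is a hypothetical stand-in for CPU-heavy synchronous work, and the pool size is illustrative. Threads help here because hashlib's key derivation releases the GIL while it runs; pure-Python CPU work would need a process pool or extra workers instead.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)  # Assumption: sized for illustration only


def expensive_digest(payload: bytes, rounds: int = 50_000) -> str:
    """Stand-in for CPU-heavy synchronous work such as key derivation."""
    return hashlib.pbkdf2_hmac("sha256", payload, b"salt", rounds).hex()


async def handle_message(payload: bytes) -> str:
    loop = asyncio.get_running_loop()
    # The event loop keeps serving other connections while the worker thread runs
    return await loop.run_in_executor(_pool, expensive_digest, payload)
```

Inside a WebSocket handler, `await handle_message(data)` would replace a direct call to the synchronous function, so other coroutines continue to make progress during the computation.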

uvloop is a drop-in replacement for Python's built-in asyncio event loop implemented in Cython on top of libuv. Benchmarks consistently show 2-4x throughput improvement for I/O-bound workloads, which covers the overwhelming majority of WebSocket use cases. It is a one-line change and there is no reason not to use it in production.

io/thecodeforge/config/uvicorn_ws.py (Python)
import uvicorn

# Production WebSocket server configuration
# Run this with: python -m io.thecodeforge.config.uvicorn_ws
# Or directly: uvicorn app:app --workers 4 --loop uvloop

if __name__ == "__main__":
    uvicorn.run(
        "io.thecodeforge.realtime.connection_manager:app",
        host="0.0.0.0",
        port=8000,
        workers=4,                # One per physical CPU core for CPU-bound handler work
        loop="uvloop",            # 2-4x faster than default asyncio on I/O-bound workloads
        ws="websockets",          # websockets library; Uvicorn's "auto" default also picks it when installed
        ws_ping_interval=30,      # Send native WS ping frame every 30 seconds
        ws_ping_timeout=10,       # Close connection if no pong within 10 seconds
        limit_concurrency=10000,  # Reject new connections beyond this per worker
        limit_max_requests=50000, # Restart worker after N requests — guards against slow leaks
        timeout_keep_alive=5,     # HTTP keep-alive timeout (not WebSocket — separate setting)
        access_log=True,          # Log WS upgrade requests alongside HTTP — useful for debugging
    )
Output
Server starts with 4 workers, native 30s/10s ping/pong heartbeat, 10K per-worker connection limit, and uvloop for maximum throughput.
Capacity Planning: File Descriptors First, Memory Second
  • Each WebSocket coroutine uses ~4-8 KB of stack memory — 10K connections is roughly 40-80 MB, well within budget
  • Each connection holds exactly one OS file descriptor — this is the real ceiling and it defaults to 1,024 or 65,535
  • Set LimitNOFILE=1000000 in the systemd unit file — not ulimit in a shell script, which doesn't persist across restarts
  • In Docker, set --ulimit nofile=1000000:1000000 on the container or adjust in docker-compose.yml
  • ws_ping_interval and ws_ping_timeout at the Uvicorn level handle heartbeat transparently — you don't need application-level ping/pong unless you need custom liveness metadata
  • limit_max_requests is a safety valve against slow memory leaks — the worker restarts after N requests, Uvicorn handles graceful handoff
Production Insight
A common misconfiguration is setting ulimit in a shell script that runs before the service starts. Shell ulimit changes only apply to the shell and its child processes in that session. If systemd starts the service, the shell's ulimit has no effect — the process inherits systemd's default limits. Always set LimitNOFILE in the [Service] section of the systemd unit file, or in the container's runtime configuration. Verify it actually took effect with cat /proc/<pid>/limits | grep 'open files' after the service starts.
Key Takeaway
Each WebSocket holds a coroutine and a file descriptor for its entire lifetime — unlike HTTP which releases both in milliseconds.
File descriptors are the capacity bottleneck — set LimitNOFILE=1000000 in systemd or container config before going to production.
Uvicorn's ws_ping_interval + ws_ping_timeout replaces manual heartbeat code — it is the simplest and most reliable approach.
Multiple workers require Redis pub/sub — without it, broadcast is silently broken for any client not on the same worker.

The Handshake: Don't Let Your Upgrade Fail Silently

Every WebSocket connection starts with an HTTP upgrade request. The client sends Upgrade: websocket, Connection: Upgrade, plus a Sec-WebSocket-Key header. Your server returns 101 Switching Protocols with a computed Sec-WebSocket-Accept. If that handshake fails, you get a silent hang or a 426 status that most frontends don't handle gracefully.
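For reference, the Sec-WebSocket-Accept value is derived from the client's key as specified in RFC 6455. The sketch below reproduces that computation with the standard library; the function name compute_accept is illustrative, and in practice Uvicorn's WebSocket implementation does this for you.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key taken from the RFC 6455 handshake example.
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

A mismatch between this value and what the client expects is one reason a handshake can fail before any application code runs.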

The common mistake is assuming the WebSocket client library handles retry logic. The browser's native WebSocket API does not reconnect on its own, so you own the retry strategy. In FastAPI, the @app.websocket decorator does not accept the handshake for you: the connection stays pending until your handler calls websocket.accept(), so validate the Origin header before that call. Otherwise, any domain can open a socket to your server.
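Since the retry strategy is yours to define, one common choice is exponential backoff with full jitter. The sketch below only computes the delay schedule; backoff_delays and its parameters are hypothetical, and a browser client would implement the same arithmetic in JavaScript.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one randomized reconnect delay (seconds) per attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles each attempt until it reaches the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick uniformly in [0, ceiling] to spread reconnect storms.
        delays.append(rng.uniform(0, ceiling))
    return delays


if __name__ == "__main__":
    print(backoff_delays(6))
```

The jitter matters after a deploy or outage, when every client would otherwise reconnect at the same instant.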

Production tip: log websocket.client.host and the handshake headers when the upgrade arrives. When a socket drops after 30 seconds, that log entry tells you whether it was a bad upgrade or a network interruption. Don't waste hours debugging the wrong layer.

HandshakeValidator.pyPYTHON
# io.thecodeforge - python tutorial

from fastapi import FastAPI, WebSocket, WebSocketDisconnect, status

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    origin = websocket.headers.get("origin", "")
    allowed_origins = ["https://your-frontend.com"]

    if origin not in allowed_origins:
        await websocket.close(code=status.WS_1008_POLICY_VIOLATION)
        return

    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_text()
            await websocket.send_text(f"Echo: {data}")
    except WebSocketDisconnect:
        print("Client disconnected cleanly")
Output
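➜ Client sends upgrade with mismatched origin
➜ Server rejects the handshake; because close() runs before accept(), the client sees the upgrade fail with HTTP 403 rather than receiving a 1008 close frame
➜ No log noise, no resource leak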
Production Trap: The 5-Second Window
If your server calls websocket.accept() before validating the origin, the socket is open and consuming resources. Attackers can open thousands of sockets and leave them idle. Always validate before accept.
Key Takeaway
Validate origin headers before calling accept() or you're paying for sockets that waste your connection pool.

Message Protocol: Why Raw Text Is a Liability

Most tutorials send plain text strings over WebSockets. In production, that's a disaster waiting to happen. You get one message type, no schema, no error handling. The moment you need to send different payloads — user join, typing indicator, system alert — you're parsing fragile string prefixes.

Instead, adopt a lightweight message envelope with a type field and a data payload. JSON is fine. If you need binary efficiency, use MessagePack. The pattern is simple: {"type": "chat_message", "data": {"user": "alice", "text": "hello"}}. Your handler dispatches on type, not on string hacking.

The payoff: testability, schema validation with Pydantic, and clean routing. When a client sends garbage, you reject the message, not the socket. Keep the connection alive for legitimate traffic.

MessageRouter.pyPYTHON
# io.thecodeforge - python tutorial

from pydantic import BaseModel, ValidationError
from fastapi import WebSocket
from typing import Any, Dict

class WSMessage(BaseModel):
    type: str
    data: Dict[str, Any]

async def handle_message(websocket: WebSocket, raw: str):
    try:
        msg = WSMessage.model_validate_json(raw)
    except ValidationError:
        await websocket.send_json({"error": "invalid message format"})
        return

    if msg.type == "ping":
        await websocket.send_json({"type": "pong"})
    elif msg.type == "chat":
        # process chat message
        await websocket.send_json({"type": "chat_ack", "data": msg.data})
    else:
        await websocket.send_json({"error": f"unknown type: {msg.type}"})
Output
➜ Client sends: {"type":"chat","data":{"text":"hi"}}
➜ Server responds: {"type":"chat_ack","data":{"text":"hi"}}
➜ Client sends: garbage string
➜ Server responds: {"error":"invalid message format"}
Senior Shortcut: Schema-First Sockets
Key Takeaway
Use a typed message envelope with Pydantic. Raw text strings are for prototypes, not production.
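As the number of message types grows, a dispatch table can replace the if/elif chain. The following is a standard-library sketch of that idea, kept free of FastAPI and Pydantic so it can be unit-tested in isolation; the on decorator, HANDLERS registry, and dispatch function are hypothetical names, and the handlers return reply payloads rather than sending them.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]
HANDLERS: Dict[str, Handler] = {}


def on(message_type: str) -> Callable[[Handler], Handler]:
    """Register a handler for one message type."""
    def register(func: Handler) -> Handler:
        HANDLERS[message_type] = func
        return func
    return register


@on("ping")
def handle_ping(data: Dict[str, Any]) -> Dict[str, Any]:
    return {"type": "pong"}


@on("chat")
def handle_chat(data: Dict[str, Any]) -> Dict[str, Any]:
    return {"type": "chat_ack", "data": data}


def dispatch(raw: str) -> Dict[str, Any]:
    """Parse the envelope and route it; bad input yields an error reply."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "invalid message format"}
    if not isinstance(msg, dict) or not isinstance(msg.get("type"), str):
        return {"error": "invalid message format"}
    data = msg.get("data", {})
    if not isinstance(data, dict):
        return {"error": "invalid message format"}
    handler = HANDLERS.get(msg["type"])
    if handler is None:
        return {"error": f"unknown type: {msg['type']}"}
    return handler(data)
```

Inside the receive loop, the endpoint would then only need to call dispatch on each incoming frame and send the returned payload with websocket.send_json.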

Deployment: Why Your Dev Websocket Dies at 10 Users

Your local FastAPI websocket works perfectly. Deploy it to a PaaS with a single process and watch it brick under load. WebSockets hold a persistent connection per user. Most platforms terminate idle connections after about 60 seconds. An AWS Application Load Balancer closes idle connections after 60 seconds by default; the idle timeout is configurable, and stickiness does not change it. GCP Cloud Run treats a WebSocket as one long request and cuts it off at the service's request timeout, which defaults to 5 minutes and tops out at 60. You are fighting infrastructure defaults built for HTTP.

You need a process manager that keeps the socket alive. Use Gunicorn with Uvicorn workers. Set --worker-class uvicorn.workers.UvicornWorker and --timeout 0, which disables Gunicorn's worker timeout so the master never restarts a worker that is holding long-lived connections. For AWS, enable sticky sessions (target group stickiness) and raise the load balancer idle timeout to the max. For Docker, your health check must be an HTTP endpoint, not a websocket: probing a websocket with an HTTP check speaks the wrong protocol and gets your container killed. Serve your websocket on a separate subdomain so you can tune timeout rules without touching your REST API.

ProcfilePYTHON
# io.thecodeforge - python tutorial

# Production Gunicorn config for websockets
web: gunicorn main:app -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:$PORT --workers 4 --timeout 0
Output
No output — runs as a process
Production Trap:
Cloud Run and Lambda in front of a websocket will time out your connection at the HTTP load balancer long before your WebSocket logic fails. Always test with a 15-minute idle session before pushing to prod.
Key Takeaway
Websockets need process managers with zero timeout and load balancers with sticky sessions — anything else is a 503 disguised as a success.

Dependency Injection Over Websockets — Depends Works, But Not Like You Think

You can slap Depends() on a websocket endpoint. It works for connection setup: auth headers, query params, cookies. FastAPI injects them once when the websocket handshake happens. After that, the dependency is gone — no re-injection per message. That catches people. Your get_current_user runs once at connect. If the user’s token expires mid-session, your code won't know unless you handle it manually.

Use Depends for one-time validation. Extract the user, validate the token, store it in the connection state object. Then inside the websocket.receive_text() loop, check freshness yourself. For per-message dependencies — rate limiting, schema validation — wire a custom middleware or a callable inside the loop. Do not expect FastAPI to re-resolve Depends per message. It won't. Decouple connection-level setup (Depends) from per-message logic (manual). That keeps your endpoints testable and your production bugs confined to the loop — not the handshake.

chat_server.pyPYTHON
# io.thecodeforge - python tutorial

from fastapi import (
    Depends, FastAPI, WebSocket, WebSocketDisconnect, WebSocketException, status,
)

app = FastAPI()

def get_user(ws: WebSocket):
    # Runs once, during the handshake, before accept()
    token = ws.cookies.get("session")
    if token != "valid_secret":
        # Reject the upgrade with a policy-violation code, not a bare Exception
        raise WebSocketException(code=status.WS_1008_POLICY_VIOLATION)
    return {"user": "alice"}

@app.websocket("/ws")
async def chat(websocket: WebSocket, user: dict = Depends(get_user)):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_text()
            if data == "ping":
                await websocket.send_text(f"pong {user['user']}")
            else:
                await websocket.send_text(f"echo: {data}")
    except WebSocketDisconnect:
        pass  # Client closed the connection
Output
Client sends 'ping' → receives 'pong alice'
Senior Shortcut:
Never put expensive per-message work (DB queries, token refresh) inside Depends. Inject once, validate per message inside the loop with a guard function. Your socket stays fast under 1000 concurrent connections.
Key Takeaway
FastAPI's Depends runs once at websocket connect — use it for handshake auth, not for per-message validation.
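The per-message guard mentioned above could look roughly like the following. This is a sketch under assumptions: SessionGuard is a hypothetical helper, the expiry is expressed on the same clock the guard reads, and the rate limit is a simple sliding window rather than anything FastAPI provides.

```python
import time
from collections import deque
from typing import Callable, Deque


class SessionGuard:
    """Per-connection checks run inside the receive loop, not via Depends."""

    def __init__(
        self,
        expires_at: float,
        max_messages: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._expires_at = expires_at
        self._max = max_messages
        self._window = window_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()

    def check(self) -> str:
        """Return 'ok', 'expired', or 'rate_limited' for the next message."""
        now = self._clock()
        if now >= self._expires_at:
            return "expired"
        # Drop timestamps that have left the sliding window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) >= self._max:
            return "rate_limited"
        self._sent.append(now)
        return "ok"
```

One guard would be created after the dependency resolves and consulted before each message is processed; on "expired" the handler can close the socket with a policy-violation code.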

Introduction

Real-time communication is the backbone of modern web apps — from live dashboards to collaborative editing. FastAPI's WebSocket support lets you skip polling and push updates instantly to clients, but most examples stop at echo servers. This guide bridges the gap between toy demos and production-grade systems. We'll explore stateful connections, secure handshakes, Redis-backed broadcasting, heartbeat monitoring, and worker tuning. You'll learn why raw text messages are fragile, how to handle concurrency limits without crashing your server, and why your dev WebSocket fails under load. By the end, you'll have a battle-tested pattern for real-time features that scale. No fluff — just the mechanics that matter when your app goes live.

introduction_example.pyPYTHON
# io.thecodeforge - python tutorial
from fastapi import FastAPI, WebSocket
from fastapi.responses import HTMLResponse

app = FastAPI()

html = """
<!DOCTYPE html>
<html>
    <body>
        <h1>WebSocket Test</h1>
        <script>
            const ws = new WebSocket("ws://localhost:8000/ws");
            ws.onmessage = (event) => alert(event.data);
        </script>
    </body>
</html>
"""

@app.get("/")
async def get():
    return HTMLResponse(html)

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    await websocket.send_text("Hello from server!")
    await websocket.close()
Output
WebSocket connection accepted, client receives 'Hello from server!'
Production Trap:
Never trust WebSocket origin headers in production — always validate manually via a token or cookie.
Key Takeaway
Real-time features demand more than a basic echo — plan for security, scaling, and connection lifecycle from day one.

Conclusion

Building production-grade WebSockets in FastAPI requires more than just slapping @app.websocket on a handler. You need to manage connection state, authenticate upgrades, handle disconnects gracefully, and scale horizontally without breaking message ordering. We covered secure handshakes, Redis Pub/Sub for broadcasting, heartbeat pings to detect zombie clients, and concurrency limits to prevent worker exhaustion. The key takeaway: always design your message protocol with versioned, structured payloads (e.g., JSON with a type field) instead of raw text. This makes debugging, routing, and migrating versions painless. Deploy with a mature ASGI server like Uvicorn behind a reverse proxy, and tune worker count based on your expected concurrent connections. FastAPI gives you the tools — but production reliability comes from understanding how WebSockets interact with asynchronous loops, shared state, and network failures. Apply these patterns, and your real-time features will survive the real world.

conclusion_protocol.pyPYTHON
# io.thecodeforge - python tutorial
import json

from fastapi import WebSocket

# Structured message protocol example
async def handle_message(websocket: WebSocket, raw: str):
    # Always parse into structured format
    try:
        data = json.loads(raw)
        if data.get("type") == "ping":
            await websocket.send_json({"type": "pong", "ts": data.get("ts")})
        elif data.get("type") == "subscribe":
            channel = data.get("channel", "default")
            await websocket.send_json({"type": "subscribed", "channel": channel})
        else:
            await websocket.send_json({"type": "error", "msg": "unknown type"})
    except json.JSONDecodeError:
        await websocket.send_json({"type": "error", "msg": "invalid json"})
Output
Client receives structured JSON responses instead of raw text — easier to parse and version.
Production Trap:
Without heartbeat logic, zombie connections accumulate in your worker's memory, eventually exhausting file descriptors and crashing the server.
Key Takeaway
Production WebSockets succeed on structured protocols, heartbeat monitoring, and stateless scaling — not on raw text or naive upgrade handling.
● Production incident · POST-MORTEM · severity: high

The Silent Leak: 47,000 Zombie WebSockets Exhausted Our File Descriptors

Symptom
After roughly 14 hours of uptime, Uvicorn workers stopped accepting both new HTTP and WebSocket connections. Server metrics looked normal — CPU at 12%, memory at 40%, no application exceptions in logs. The first signal was EMFILE (Too many open files) errors appearing in kernel logs, followed by Uvicorn logging 'accept failed' on every new connection attempt. Existing connections were still alive and functioning, which made the diagnosis non-obvious.
Assumption
The team's mental model was straightforward: when a client disconnects, WebSocketDisconnect fires, the except block runs, and the manager removes the entry. That assumption holds perfectly for clients that send a proper WebSocket close frame. It completely breaks for network-level drops where no TCP FIN or RST packet ever reaches the server — in that case, from the server's perspective, the connection is still alive and the event loop never wakes up for it.
Root cause
Mobile clients transitioning between Wi-Fi and cellular networks drop the underlying TCP connection at the radio layer. No FIN, no RST, no WebSocket close frame — the packets simply stop arriving. The server's asyncio event loop, built on epoll, only wakes up for a socket when data or a close event arrives on the file descriptor. With no event to trigger, the coroutine sits parked on await websocket.receive_text() indefinitely. The ConnectionManager kept every one of these WebSocket objects in its active_connections dict. Each held an open file descriptor. Over 14 hours across thousands of mobile users doing normal things — opening the app, locking their phone, switching networks — 47,000 of these zombie entries accumulated. The default per-process file descriptor limit on the deployment was 65,535. Once that ceiling was hit, the OS refused to open new file descriptors for incoming connections, making the server appear completely unresponsive to new traffic while still serving existing live connections.
Fix
The immediate fix was a server-side heartbeat: every 30 seconds the server sends a WebSocket ping frame to each connection. If no pong arrives within 10 seconds, the connection is explicitly closed and removed from the manager. This bounds the zombie accumulation window to at most 40 seconds per dead connection regardless of network behavior. The broader fixes were: raising LimitNOFILE to 1,000,000 in the systemd unit file; adding open file descriptor count (reading /proc/<pid>/fd) to the health endpoint so the monitoring dashboard would catch growth trends before they became incidents; and setting nginx proxy_read_timeout 3600s as a safety net at the reverse proxy layer.
Key lesson
  • WebSocketDisconnect only fires on a clean close frame — network-level TCP drops are completely invisible to the application layer and leave sockets open indefinitely
  • Implement server-side ping/pong heartbeats to detect dead connections within a bounded, predictable time window regardless of client behavior
  • Monitor open file descriptor count with ls /proc/<pid>/fd | wc -l as a leading indicator — it trends upward hours before the process becomes unresponsive
  • Set proxy_read_timeout at the reverse proxy level as a defense-in-depth safety net — it catches anything your application-level heartbeat misses
  • Raise LimitNOFILE to at least 1,000,000 in systemd before the service ever goes to production — the default is inadequate for any real WebSocket workload
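The 40-second bound from the fix (30-second ping interval plus 10-second pong timeout) can be expressed as a small sweep function. This is an illustrative sketch only: find_stale and the last_pong mapping are hypothetical, and with Uvicorn's ws_ping_interval and ws_ping_timeout the server performs the equivalent check for you.

```python
from typing import Dict, List


def find_stale(
    last_pong: Dict[str, float],
    now: float,
    interval: float = 30.0,
    timeout: float = 10.0,
) -> List[str]:
    """Return connection ids whose last pong is older than interval + timeout."""
    deadline = interval + timeout
    return sorted(cid for cid, seen in last_pong.items() if now - seen > deadline)


if __name__ == "__main__":
    # Timestamps are seconds on any monotonic clock.
    seen = {"a": 100.0, "b": 55.0, "c": 61.0}
    print(find_stale(seen, now=100.0))
```

Connections returned by the sweep would be closed and removed from the manager, which releases their file descriptors.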
Production debug guide · From symptom to resolution for common WebSocket production issues · 5 entries
Symptom · 01
Workers stop accepting new connections after hours of uptime
Fix
Check open file descriptors immediately: ls /proc/<pid>/fd | wc -l. If the count is near the ulimit ceiling, zombie WebSocket connections are accumulating — the application has no heartbeat and dead sockets are never cleaned up. Implement Uvicorn-level ping/pong with ws_ping_interval=30 and ws_ping_timeout=10. Raise LimitNOFILE in the systemd unit. Add fd count to your health endpoint so you see the trend before it becomes an outage.
Symptom · 02
Clients report random 1006 (Abnormal Closure) disconnects
Fix
This close code means the connection was terminated at the TCP layer without a WebSocket close frame — almost always a proxy timeout. Check nginx: grep -r 'proxy_read_timeout' /etc/nginx/. The default is 60 seconds — any connection idle for 60 seconds gets cut. Set proxy_read_timeout 3600s for WebSocket locations and enable server-side heartbeats so idle connections generate periodic traffic that resets the proxy timer.
Symptom · 03
Broadcast latency increases linearly with connection count
Fix
The broadcast loop is awaiting each send serially. With 1,000 connections each taking ~100 microseconds per send, a single broadcast takes 100ms and blocks the event loop for everything else. Replace the for loop with asyncio.gather(*[conn.send_text(msg) for conn in connections.values()], return_exceptions=True). For more than 5,000 connections or cross-instance delivery requirements, move to Redis pub/sub where each worker only fans out to its own local connection set.
Symptom · 04
Memory usage grows steadily even with a stable connection count
Fix
The ConnectionManager is almost certainly accumulating per-connection state — message history, event logs, or unbounded buffers attached to each WebSocket entry. Profile with tracemalloc: import tracemalloc; tracemalloc.start() then snapshot periodically. The manager should store only the WebSocket reference, nothing else. Message history belongs in Redis or a database, not in-process memory.
Symptom · 05
WebSocket handshake fails with 403 behind a load balancer
Fix
The load balancer is not forwarding the Upgrade and Connection headers, so the backend sees an ordinary HTTP request and rejects it. On nginx, add proxy_http_version 1.1, proxy_set_header Upgrade $http_upgrade and proxy_set_header Connection upgrade inside the location block. On AWS ALB, verify the target group protocol is HTTP (not HTTPS) and that the listener rule supports WebSocket; ALB supports WebSocket natively but only when the target group is configured correctly.
★ WebSocket Quick Debug Reference · Rapid diagnostics for WebSocket issues in production FastAPI services
EMFILE errors — too many open files
Immediate action
Count open file descriptors for the Uvicorn worker process
Commands
ls /proc/$(pgrep -f uvicorn)/fd | wc -l
cat /proc/$(pgrep -f uvicorn)/limits | grep 'open files'
Fix now
Add ws_ping_interval=30 and ws_ping_timeout=10 to uvicorn.run(). Set LimitNOFILE=1000000 in the systemd unit file and restart the service. Verify the new limit took effect by re-running the second command.
Clients disconnect after exactly 60 seconds of inactivity
Immediate action
Confirm the reverse proxy read timeout is the culprit
Commands
grep -r 'proxy_read_timeout' /etc/nginx/
curl -s -o /dev/null -w '%{time_total}' http://localhost:8000/ws/test
Fix now
Add proxy_read_timeout 3600s and proxy_set_header Upgrade $http_upgrade to the nginx location block serving WebSocket traffic. Enable Uvicorn-level heartbeats so idle connections generate periodic ping/pong traffic that keeps the proxy timer reset.
High CPU on broadcast to many clients
Immediate action
Confirm the event loop is blocked in sequential sends
Commands
py-spy top --pid $(pgrep -f uvicorn)
strace -c -p $(pgrep -f uvicorn) -e trace=write 2>&1 | head -20
Fix now
Replace the sequential broadcast loop with: await asyncio.gather(*[conn.send_text(msg) for conn in connections.values()], return_exceptions=True). With return_exceptions=True, a failed send comes back as an exception object in the results list instead of being raised into the broadcast, so one dead connection does not abort your handling of the rest.
1006 Abnormal Closure behind load balancer
Immediate action
Verify WebSocket upgrade headers are being forwarded through the proxy
Commands
tcpdump -i any -A port 8000 | grep -i upgrade
nginx -T | grep -A5 'location /ws'
Fix now
Add proxy_http_version 1.1, proxy_set_header Upgrade $http_upgrade and proxy_set_header Connection upgrade to the nginx location block. If using AWS ALB, ensure the target group listener protocol is set to HTTP and sticky sessions are enabled if you are not using Redis pub/sub for cross-instance state.
Real-time Communication Protocols in FastAPI
| Protocol | Direction | Connection Model | Best For |
| --- | --- | --- | --- |
| WebSocket (WS/WSS) | Full-duplex bidirectional | Persistent TCP connection after HTTP upgrade handshake | Chat, live dashboards, collaborative editing, multiplayer gaming: anything requiring low-latency bidirectional communication |
| Server-Sent Events (SSE) | Server to client only | Persistent HTTP connection with text/event-stream MIME type; client cannot send back over the same connection | Push notifications, live feeds, stock tickers, build log streaming: one-way server push with automatic reconnect built into the browser |
| HTTP Long Polling | Client-initiated, server-deferred response | Repeated HTTP requests held open until data is available or timeout, then immediately re-issued by the client | Compatibility fallback for environments where WebSocket or SSE is blocked (some corporate proxies, older CDNs) |
| HTTP Short Polling | Client-initiated request/response | Periodic HTTP GET at fixed intervals; standard request/response, connection closes after each response | Low-frequency updates where latency above 5 seconds is acceptable and implementation simplicity outweighs efficiency |
| gRPC Streaming | Bidirectional (client, server, or both directions independently) | Persistent HTTP/2 connection with binary protobuf framing; supports client streaming, server streaming, and bidirectional streaming | Microservice-to-microservice communication, high-throughput binary streams, mobile clients where bandwidth efficiency matters |

Key takeaways

1. Always call await websocket.accept() before sending or receiving, and never call it before validating auth.
2. Wrap the receive loop in try/except WebSocketDisconnect to handle clean client closures, but know that network-level drops never trigger this exception.
3. The while True receive loop is intentional: it keeps the coroutine alive for the connection's lifetime. This is the correct pattern, not a bug.
4. Server-side heartbeat via ws_ping_interval and ws_ping_timeout in Uvicorn is mandatory: without it, half-open sockets accumulate until EMFILE kills the worker.
5. Fan out broadcast with asyncio.gather() and return_exceptions=True: sequential await on 1K+ connections blocks the event loop and causes delivery jitter.
6. File descriptors, not memory, are the capacity bottleneck: set LimitNOFILE=1000000 in systemd or container config before going to production.
7. WebSocket connections are local to a single process: any multi-worker or multi-instance deployment requires Redis pub/sub for broadcast to work correctly.
8. Use the lifespan context manager for Redis and background task setup: @app.on_event('startup') is deprecated as of FastAPI 0.93.
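The gather-based fan-out from takeaway 5 can be sketched as follows. FakeConnection is a hypothetical stand-in for a WebSocket object so the example runs without a server; the pattern of pairing results with connection ids to prune dead entries is one reasonable design, not the only one.

```python
import asyncio
from typing import Dict, List


class FakeConnection:
    """Stand-in for a WebSocket: records messages or simulates a dead peer."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.received: List[str] = []

    async def send_text(self, message: str) -> None:
        if self.fail:
            raise ConnectionError("peer is gone")
        self.received.append(message)


async def broadcast(connections: Dict[str, FakeConnection], message: str) -> List[str]:
    """Send to every connection concurrently and return the ids that failed."""
    ids = list(connections)
    results = await asyncio.gather(
        *(connections[cid].send_text(message) for cid in ids),
        return_exceptions=True,  # failures come back as values, not raised
    )
    dead = [cid for cid, result in zip(ids, results) if isinstance(result, Exception)]
    for cid in dead:
        connections.pop(cid, None)  # prune so the next broadcast skips them
    return dead
```

Returning the failed ids lets the caller log or count them, which doubles as a cheap signal for connection churn.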

Common mistakes to avoid

5 patterns
×

Calling websocket.accept() before validating the auth token

Symptom
Unauthenticated clients briefly connect, get registered in the ConnectionManager, and may receive broadcast messages sent between accept() and close(). Security audits flag the endpoint as having an authentication bypass window. The issue is non-deterministic and hard to reproduce in testing because the window is small.
Fix
Validate the token (from query parameter or cookie) before any call to websocket.accept(). If validation fails, call await websocket.close(code=1008) and return immediately. The 1008 close code signals Policy Violation to the client — use it instead of 1000 (Normal Closure) which implies success.
×

Using a sequential for loop in broadcast() without asyncio.gather()

Symptom
Broadcast latency scales linearly with connection count. The event loop is blocked for the entire duration of the broadcast, which delays all other coroutines including incoming messages and heartbeats. With 1,000 connections the block is ~100ms; with 10,000 connections it stretches into seconds, causing visible message delivery jitter and missed heartbeat responses.
Fix
Replace the sequential loop with: await asyncio.gather(*[conn.send_text(msg) for conn in connections.values()], return_exceptions=True). The return_exceptions=True is critical: without it, the first failed send (dead connection) is raised straight into your broadcast call, so you lose the per-connection results and cannot tell which sockets to prune. With it, each failure comes back as an exception object in the results list.
×

No server-side heartbeat to detect network-level disconnections

Symptom
File descriptor count grows at a slow, steady rate over hours. The metric looks benign in isolation until the process hits the ulimit ceiling and stops accepting new connections entirely. Application logs show nothing — no exceptions, no errors, no warnings. The only signal is the fd count trend and eventually EMFILE in kernel logs.
Fix
Set ws_ping_interval=30 and ws_ping_timeout=10 in uvicorn.run(). This is the lowest-effort, highest-reliability fix — Uvicorn handles the ping/pong at the WebSocket protocol level with no application code required. If you need application-level liveness metadata, implement a HeartbeatManager as shown in the Heartbeat section.
×

Broadcasting without Redis pub/sub in a multi-worker or multi-instance deployment

Symptom
Users connected to different workers or server instances cannot see each other's messages. The behavior appears random — some users see all messages, others miss some, depending on which worker their connection landed on. The bug is invisible in local development where a single worker handles all test clients.
Fix
Add Redis pub/sub: each worker subscribes to a shared channel via the lifespan context manager, publishes outgoing messages to Redis, and forwards received messages to its local connection set. Every worker receives every published message and delivers it to its own connected clients.
×

Not setting ulimit/file descriptor limits before deploying the WebSocket service

Symptom
The service appears healthy in staging (low connection count) and works fine in the first hours of production. After enough users connect, the process hits the OS default ulimit and all new connection attempts fail with EMFILE. Existing connections continue working, making the issue look like an admission control problem rather than a resource limit.
Fix
Set LimitNOFILE=1000000 in the [Service] section of the systemd unit file. For Docker, add --ulimit nofile=1000000:1000000 to the run command or ulimits in docker-compose.yml. Verify with cat /proc/<pid>/limits | grep 'open files' after the service starts. Do this before the service handles any production traffic — raising ulimit requires a service restart.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
What is the difference between WSGI and ASGI, and why is the latter requ...
Q02 · SENIOR
Explain the 'C10k problem' and how Python's asyncio library helps FastAP...
Q03 · SENIOR
How would you implement a distributed 'Presence' system (showing who is ...
Q04 · SENIOR
Describe the lifecycle of a WebSocket handshake. What happens at the HTT...
Q05 · SENIOR
In a System Design context: how would you design a scalable notification...
Q01 of 05 · SENIOR

What is the difference between WSGI and ASGI, and why is the latter required for FastAPI WebSockets?

ANSWER
WSGI (Web Server Gateway Interface) is a synchronous, request-response protocol defined in PEP 3333. A WSGI server calls your application with a request, your application processes it synchronously, returns a response, and the call stack unwinds. There is no mechanism to hold a connection open, push data asynchronously, or handle multiple requests concurrently on a single thread. One thread handles one request, so long-lived connections would block that thread indefinitely.

ASGI (Asynchronous Server Gateway Interface) is the async successor defined in the ASGI specification. Instead of a synchronous callable, your application is an async callable that receives a scope (describing the connection type: http, websocket, or lifespan), a receive coroutine (to await incoming events), and a send coroutine (to push outgoing events). This three-component interface supports long-lived connections natively because the event loop can switch between thousands of coroutines while each one is waiting for I/O.

For WebSockets specifically: the HTTP-to-WebSocket upgrade handshake must be handled in the same connection, after which the protocol switches. ASGI handles this by keeping the same scope alive across the upgrade, with receive() yielding websocket.connect, websocket.receive, and websocket.disconnect events as they arrive. A WSGI server has no model for this: the call stack returns after the HTTP response, and there is nowhere to attach the subsequent WebSocket events.

FastAPI is built on Starlette, which is an ASGI framework. Uvicorn is the ASGI server. Together they give you an asyncio event loop that multiplexes thousands of WebSocket coroutines on a single OS thread with no threading overhead.
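To make the scope/receive/send interface concrete, here is a minimal sketch of a raw ASGI WebSocket echo application with no framework involved. The drive helper is a hypothetical test harness that plays the server's role by feeding events in and collecting what the app sends; a real deployment would hand echo_app to Uvicorn instead.

```python
import asyncio
from typing import Any, Dict, List


async def echo_app(scope: Dict[str, Any], receive, send) -> None:
    """Raw ASGI callable: one invocation lives as long as the connection."""
    if scope["type"] != "websocket":
        return
    while True:
        event = await receive()
        if event["type"] == "websocket.connect":
            await send({"type": "websocket.accept"})
        elif event["type"] == "websocket.receive":
            await send({"type": "websocket.send", "text": f"Echo: {event.get('text')}"})
        elif event["type"] == "websocket.disconnect":
            break


async def drive(app, events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical harness: replay events and capture outgoing messages."""
    pending = list(events)
    sent: List[Dict[str, Any]] = []

    async def receive() -> Dict[str, Any]:
        return pending.pop(0)

    async def send(message: Dict[str, Any]) -> None:
        sent.append(message)

    await app({"type": "websocket"}, receive, send)
    return sent
```

The single long-running call to echo_app is the part WSGI cannot express: the callable stays suspended on receive() between events rather than returning after one response.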
FAQ · 4 QUESTIONS

Frequently Asked Questions

01. How do I authenticate a WebSocket connection?
02. How do WebSockets scale across multiple FastAPI instances?
03. Can I use FastAPI Middleware with WebSockets?
04. What happens to WebSocket connections during a server deploy or restart?