FastAPI WebSockets — Real-time Communication
- WebSockets upgrade an HTTP connection to a persistent, full-duplex TCP stream via the WS protocol
- Use @app.websocket('/ws') and await websocket.accept() to establish the handshake — a minimal example follows this list
- A ConnectionManager pattern maps user IDs to WebSocket objects for targeted routing and broadcast
- Each open socket consumes a coroutine slot and a file descriptor; 10K connections require tuned ulimits
- Without try/except WebSocketDisconnect, crashed clients leak server-side file descriptors silently
- Browsers cannot send custom headers on WS handshake — use signed query parameters or cookies for auth
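To ground those bullets, here is a minimal sketch of the basic pattern — the handshake, the intentional while True receive loop, and the clean-close handler. The /ws/echo path is illustrative, not part of the project code:

```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


@app.websocket("/ws/echo")  # illustrative path
async def echo(websocket: WebSocket):
    await websocket.accept()  # completes the HTTP -> WS upgrade handshake
    try:
        while True:  # keeps the coroutine alive for the connection's lifetime
            data = await websocket.receive_text()
            await websocket.send_text(f"echo: {data}")
    except WebSocketDisconnect:
        pass  # clean close frame received — nothing else to clean up here
```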
Production Incident
Mobile clients that dropped off the network without ever sending a close frame left their server-side coroutines blocked on websocket.receive_text() indefinitely. The ConnectionManager kept every one of these WebSocket objects in its active_connections dict. Each held an open file descriptor. Over 14 hours across thousands of mobile users doing normal things — opening the app, locking their phone, switching networks — 47,000 of these zombie entries accumulated. The default per-process file descriptor limit on the deployment was 65,535. Once that ceiling was hit, the OS refused to open new file descriptors for incoming connections, making the server appear completely unresponsive to new traffic while still serving existing live connections.
Production Debug Guide
From symptom to resolution for common WebSocket production issues:
- EMFILE errors — too many open files: count open descriptors with `ls /proc/$(pgrep -f uvicorn)/fd | wc -l` and compare against the limit shown by `cat /proc/$(pgrep -f uvicorn)/limits | grep 'open files'`.
- Clients disconnect after exactly 60 seconds of inactivity: check for proxy idle timeouts with `grep -r 'proxy_read_timeout' /etc/nginx/` and measure the connection with `curl -s -o /dev/null -w '%{time_total}' http://localhost:8000/ws/test`.
- High CPU on broadcast to many clients: profile with `py-spy top --pid $(pgrep -f uvicorn)` and `strace -c -p $(pgrep -f uvicorn) -e trace=write 2>&1 | head -20`. Fan the sends out with asyncio.gather(*[ws.send_text(msg) for ws in connections.values()], return_exceptions=True). For more than 5,000 connections or cross-instance delivery requirements, move to Redis pub/sub where each worker only fans out to its own local connection set.
- 1006 Abnormal Closure behind load balancer: capture the upgrade with `tcpdump -i any -A port 8000 | grep -i upgrade` and inspect the proxy config with `nginx -T | grep -A5 'location /ws'`.
- Memory growth over time: run tracemalloc.start() then snapshot periodically. The manager should store only the WebSocket reference, nothing else. Message history belongs in Redis or a database, not in-process memory.
Stateful Connection Management
In a production environment you rarely deal with a single socket in isolation. The moment you have more than one user you need a centralized structure that tracks active sessions, routes messages to specific clients, and handles lifecycle transitions cleanly. The ConnectionManager pattern is that structure.
The ForgeSocketManager below stores WebSocket objects in a dictionary keyed by user ID. This makes targeted messaging (send_personal_message) a dictionary lookup, and broadcast a single pass over the values. The critical invariant is that every connect() call must have a corresponding disconnect() call — otherwise the dictionary grows unbounded and you accumulate the exact kind of zombie entries described in the incident above.
The try/except WebSocketDisconnect block in the endpoint is what guarantees the disconnect() call happens for clean client closures. For network-level drops where no close frame arrives, you need the heartbeat mechanism covered later in this guide — the try/except block alone is not sufficient.
```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from typing import Dict
import asyncio

app = FastAPI()


class ForgeSocketManager:
    """Manages active WebSocket connections keyed by user ID.

    The dictionary is the single source of truth for who is currently
    connected. Every code path that touches it must respect the
    connect/disconnect contract — no exceptions.
    """

    def __init__(self):
        # Keyed by user_id so we can route messages without iterating everything
        self.active_connections: Dict[str, WebSocket] = {}

    async def connect(self, user_id: str, websocket: WebSocket) -> None:
        await websocket.accept()
        self.active_connections[user_id] = websocket

    def disconnect(self, user_id: str) -> None:
        # .pop() with a default is safer than del — avoids KeyError on double-disconnect
        self.active_connections.pop(user_id, None)

    async def send_personal_message(self, message: str, user_id: str) -> None:
        ws = self.active_connections.get(user_id)
        if ws is not None:
            await ws.send_text(message)

    async def broadcast(self, message: str) -> None:
        # asyncio.gather fans out all sends concurrently instead of awaiting each one
        # return_exceptions=True prevents one failed send from cancelling the rest
        await asyncio.gather(
            *[ws.send_text(message) for ws in self.active_connections.values()],
            return_exceptions=True,
        )


manager = ForgeSocketManager()


@app.websocket("/ws/{user_id}")
async def websocket_endpoint(websocket: WebSocket, user_id: str):
    await manager.connect(user_id, websocket)
    try:
        while True:
            data = await websocket.receive_text()
            await manager.broadcast(f"User {user_id}: {data}")
    except WebSocketDisconnect:
        # Clean close frame received — remove from manager and notify others
        manager.disconnect(user_id)
        await manager.broadcast(f"User {user_id} has left the forge.")
```
- connect() = add a new entry after the handshake completes — not before
- disconnect() = remove the entry when the call ends — skip this and you have ghost listings that accumulate forever
- send_personal_message() = look up a number and call it directly — .get() with a None check handles the case where the user already left
- broadcast() with asyncio.gather() = conference call to everyone simultaneously — sequential awaits add O(n) latency and block the event loop
- The try/except WebSocketDisconnect block is the hang-up detector for clean closes — heartbeats handle the network-level drops that never fire this exception
The broadcast() pattern in most tutorials uses a sequential for loop — await each send one at a time. With 100 connections that is imperceptible. With 1,000 connections at 100 microseconds per send you are blocking the event loop for 100 ms on every broadcast. asyncio.gather() makes all the sends concurrent within the same event loop tick, which keeps broadcast latency flat regardless of connection count. Use return_exceptions=True or a single failed send will cancel the entire gather and silence everyone else. Every connect() must pair with a disconnect() — the try/except block covers clean closes, heartbeats cover silent drops.
Securing the Handshake
WebSockets start life as an HTTP request, but the browser's WebSocket API does not allow you to set custom HTTP headers like Authorization: Bearer <token> on that initial request. This is a deliberate browser security restriction, not a FastAPI limitation. The two workable alternatives are signed query parameters and HttpOnly cookies set during a prior HTTP login flow.
Query parameters are the most common choice for API clients and native apps. Cookies are preferable for browser-based applications because they are never visible in access logs or browser history. The code below uses a query parameter because it is easier to demonstrate, but the authentication logic is identical for cookies — you just read from websocket.cookies instead of a query string.
The single rule that matters most: validate before you call accept(). Once accept() is called the handshake is complete, the connection is established, and the client is inside your system. Closing immediately after an invalid accept() is not equivalent to rejecting it — between accept() and close() the client may have already received a broadcast message or had its user ID added to the manager.
```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect, Query, status
from jose import JWTError, jwt
from datetime import datetime

app = FastAPI()

SECRET_KEY = "your-secret-key"  # Load from environment in production
ALGORITHM = "HS256"


def decode_ws_token(token: str) -> dict | None:
    """Returns the decoded payload or None if the token is invalid or expired."""
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        # Verify expiry explicitly — some JWT libraries are lenient about this
        if payload.get("exp") and datetime.utcnow().timestamp() > payload["exp"]:
            return None
        return payload
    except JWTError:
        return None


@app.websocket("/secure-ws")
async def secure_socket(websocket: WebSocket, token: str = Query(...)):
    # Validate BEFORE accept() — this is the hard rule
    payload = decode_ws_token(token)
    if payload is None:
        # 1008 = Policy Violation — the correct code for auth failure
        # 1000 = Normal Closure — wrong signal; implies success
        await websocket.close(code=status.WS_1008_POLICY_VIOLATION)
        return

    user_id: str = payload.get("sub")
    await websocket.accept()
    await websocket.send_text(f"Authenticated. Welcome, {user_id}.")
    try:
        while True:
            data = await websocket.receive_text()
            await websocket.send_text(f"Echo: {data}")
    except WebSocketDisconnect:
        pass  # Clean-up handled by ConnectionManager in the full implementation
```
Calling websocket.accept() before token validation creates a window where an unauthenticated client is fully connected. Even if you close immediately after, they have already been registered in the event loop as a live connection, may have received a broadcast message that went out between accept() and close(), and their connection attempt has consumed a file descriptor. For JWT tokens, decode and verify the signature, expiry, and audience claim before accepting. For cookies, read and validate before accepting. The order is non-negotiable: validate before accept() — accept() is the point of no return.
Scaling Broadcasts with Redis Pub/Sub
A WebSocket connection is bound to the specific OS process that accepted it. The ConnectionManager on that process has no visibility into connections on any other process. When you run multiple Uvicorn workers (--workers 4) or deploy multiple instances behind a load balancer, each worker has its own isolated ConnectionManager. Calling broadcast() on Worker A silently drops all messages destined for clients connected to Workers B, C, and D.
The standard fix is a pub/sub broker as a shared nervous system. Each FastAPI instance subscribes to a Redis channel on startup. When any instance needs to broadcast, it publishes to Redis. Redis delivers the message to every subscribed instance simultaneously, and each instance fans it out to its own local connections. The originating instance doesn't need to know where any given client is connected — Redis handles the routing.
Two things to understand about Redis pub/sub before you commit to it: first, it is fire-and-forget. If an instance is temporarily disconnected from Redis when a message is published, that message is gone — no replay, no delivery guarantee. Second, if a client is not currently connected (offline), the message is lost entirely because there is no storage layer. For systems where offline clients must receive missed messages on reconnect, replace pub/sub with Redis Streams (XADD/XREAD with consumer groups) or a proper message queue. The latency trade-off is real — Streams add microseconds of persistence overhead, but for notification systems that matters less than you might think.
```python
import asyncio
from contextlib import asynccontextmanager

import redis.asyncio as aioredis
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

local_connections: dict[str, WebSocket] = {}
CHANNEL = "forge:broadcast"
redis: aioredis.Redis | None = None


async def redis_listener(r: aioredis.Redis) -> None:
    """Long-running coroutine: subscribes to Redis and forwards to local clients."""
    pubsub = r.pubsub()
    await pubsub.subscribe(CHANNEL)
    async for message in pubsub.listen():
        if message["type"] != "message":
            continue
        data: str = message["data"].decode()
        # Snapshot the user IDs so the dict is never modified mid-iteration,
        # which would raise RuntimeError
        user_ids = list(local_connections.keys())
        results = await asyncio.gather(
            *[local_connections[uid].send_text(data) for uid in user_ids],
            return_exceptions=True,
        )
        # Clean up any connections whose send raised during the gather
        for uid, result in zip(user_ids, results):
            if isinstance(result, Exception):
                local_connections.pop(uid, None)


@asynccontextmanager
async def lifespan(app: FastAPI):
    global redis
    redis = aioredis.from_url("redis://localhost:6379", decode_responses=False)
    # Start the listener as a background task tied to the worker's lifespan
    listener_task = asyncio.create_task(redis_listener(redis))
    yield
    # Graceful shutdown — cancel the listener and close the Redis connection
    listener_task.cancel()
    await redis.aclose()


app = FastAPI(lifespan=lifespan)


@app.websocket("/ws/{user_id}")
async def ws_endpoint(websocket: WebSocket, user_id: str):
    await websocket.accept()
    local_connections[user_id] = websocket
    try:
        while True:
            data = await websocket.receive_text()
            # Publish to Redis — all instances (including this one) receive it
            await redis.publish(CHANNEL, f"{user_id}: {data}")
    except WebSocketDisconnect:
        local_connections.pop(user_id, None)
```
- Each worker subscribes to the same Redis channel on startup via the lifespan context manager
- Publishing to Redis delivers to every subscribed instance in microseconds regardless of where the publisher is
- Each instance fans out to its own local connection set — it never touches another instance's connections directly
- Pub/sub is ephemeral — an instance that loses Redis connectivity during a publish misses that message entirely
- For offline delivery or message replay, use Redis Streams (XADD/XREAD) with a consumer group instead of pub/sub
- Note: the lifespan context manager replaces the deprecated @app.on_event('startup') pattern as of FastAPI 0.93
Heartbeat and Liveness Detection
TCP connections can become half-open when the underlying network disappears without sending a FIN or RST packet. This is not an edge case — it is routine. Mobile clients switching between Wi-Fi and cellular, corporate firewalls with idle connection timeouts, cloud NAT gateways that evict long-lived sessions, and VPNs reconnecting all produce this behavior. The network layer drops the connection; the application layer never finds out.
The WebSocket protocol addresses this with ping and pong control frames. The server sends a ping frame; the client must respond with a pong. If the pong never arrives, the connection is dead and should be closed and removed from the manager.
In FastAPI with Uvicorn, you have two options. The first is Uvicorn-level heartbeat: set ws_ping_interval and ws_ping_timeout in uvicorn.run() and Uvicorn handles everything transparently using native WebSocket ping frames. This is the preferred approach — it requires no application code and covers every endpoint automatically. The second option is application-level heartbeat: your server sends a JSON message like {"type": "ping"} on a timer and expects {"type": "pong"} back. This is useful when you need application-aware liveness logic (for example, detecting an authenticated session that has gone stale vs. a dead TCP connection).
The heartbeat interval is a real trade-off. A 30-second interval with a 10-second response window means dead connections are detected within 30-40 seconds. The overhead is roughly 20 bytes per connection per 30 seconds — for 10,000 connections that is about 6.5 KB/s of heartbeat traffic, which is negligible. Going shorter than 15 seconds starts to add up at very high connection counts and increases pressure on the Redis pub/sub channel if you are propagating liveness events.
```python
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from typing import Dict

app = FastAPI()


class HeartbeatManager:
    """Connection manager with per-connection heartbeat tasks.

    Each connection gets an independent heartbeat coroutine. If the client
    fails to respond to a ping within the timeout window, the connection is
    closed and removed from the active set.
    """

    def __init__(self, interval: int = 30, timeout: int = 10):
        self.connections: Dict[str, WebSocket] = {}
        self._last_pong: Dict[str, float] = {}
        self.interval = interval
        self.timeout = timeout

    async def connect(self, user_id: str, websocket: WebSocket) -> None:
        await websocket.accept()
        self.connections[user_id] = websocket
        self._last_pong[user_id] = asyncio.get_event_loop().time()
        asyncio.create_task(self._heartbeat_loop(user_id, websocket))

    def record_pong(self, user_id: str) -> None:
        """Called when the client sends a pong-type message."""
        self._last_pong[user_id] = asyncio.get_event_loop().time()

    def disconnect(self, user_id: str) -> None:
        self.connections.pop(user_id, None)
        self._last_pong.pop(user_id, None)

    async def _heartbeat_loop(self, user_id: str, ws: WebSocket) -> None:
        while user_id in self.connections:
            await asyncio.sleep(self.interval)
            if user_id not in self.connections:
                break
            # Check when we last heard from this client
            elapsed = asyncio.get_event_loop().time() - self._last_pong.get(user_id, 0)
            if elapsed > self.interval + self.timeout:
                # Client has not responded within the timeout window — treat as dead
                await ws.close(code=1001)  # 1001 = Going Away
                self.disconnect(user_id)
                break
            try:
                await ws.send_json({"type": "ping"})
            except Exception:
                # Send failed — connection is already dead at the network layer
                self.disconnect(user_id)
                break


manager = HeartbeatManager(interval=30, timeout=10)


@app.websocket("/ws/{user_id}")
async def ws_endpoint(websocket: WebSocket, user_id: str):
    await manager.connect(user_id, websocket)
    try:
        while True:
            data = await websocket.receive_json()
            if data.get("type") == "pong":
                manager.record_pong(user_id)
                continue  # Heartbeat response — no further processing needed
            # Handle actual application messages here
            await websocket.send_json({"echo": data})
    except WebSocketDisconnect:
        manager.disconnect(user_id)
```
Setting ws_ping_interval and ws_ping_timeout in uvicorn.run() gives you native WebSocket-level ping frames with no application code required. This is preferable to an application-level heartbeat because it uses the actual WebSocket control frame mechanism, works transparently with browser WebSocket implementations, and doesn't require the client to implement any JSON message protocol. Reserve application-level heartbeats for cases where you need to carry additional liveness metadata — for example, sending the server's current timestamp so the client can detect clock drift.
Concurrency Limits and Worker Tuning
A WebSocket coroutine lives for the entire duration of the connection — minutes, hours, sometimes days. This is fundamentally different from HTTP request handlers, which complete in milliseconds and release all their resources. You need to think about capacity differently.
The good news is that asyncio coroutines are cheap. Each one uses roughly 4-8 KB of memory on the stack. 10,000 concurrent WebSocket connections consume about 40-80 MB of coroutine stack space — manageable on any modern server. The limiting factor is almost never memory.
The actual bottleneck is file descriptors. Every open WebSocket holds one OS file descriptor for its TCP socket. The default ulimit on most Linux distributions is 1,024; some distributions and container environments raise it to 65,535. Neither is adequate for a production WebSocket server — 1,024 falls over at a few thousand connections, and the incident above shows how quickly 65,535 can be exhausted. You need to raise this limit before the service goes live, not after the first EMFILE incident.
For CPU-bound work inside WebSocket handlers — JWT verification, JSON schema validation, image processing — asyncio's single-threaded model becomes the bottleneck. The event loop cannot proceed with other coroutines while a synchronous CPU-bound function is running. The solution is either run_in_executor to offload to a thread pool, or multiple Uvicorn workers (--workers N, one per CPU core) so each worker has its own event loop. With multiple workers you must have Redis pub/sub in place — without it, broadcast stops working correctly the moment you have more than one worker.
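As a rough sketch of the run_in_executor option — the /ws/heavy path and the verify_token_cpu_bound placeholder below are hypothetical stand-ins for whatever CPU-heavy work your handler actually does:

```python
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


def verify_token_cpu_bound(token: str) -> bool:
    """Placeholder for synchronous, CPU-heavy work (signature checks, schema validation)."""
    return bool(token)


@app.websocket("/ws/heavy")  # illustrative path
async def heavy(websocket: WebSocket, token: str = ""):
    loop = asyncio.get_running_loop()
    # Offload the blocking call to the default thread pool so the event loop
    # keeps servicing every other open WebSocket while this runs
    ok = await loop.run_in_executor(None, verify_token_cpu_bound, token)
    if not ok:
        await websocket.close(code=1008)
        return
    await websocket.accept()
    try:
        while True:
            await websocket.send_text(await websocket.receive_text())
    except WebSocketDisconnect:
        pass
```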
uvloop is a drop-in replacement for Python's built-in asyncio event loop implemented in Cython on top of libuv. Benchmarks consistently show 2-4x throughput improvement for I/O-bound workloads, which covers the overwhelming majority of WebSocket use cases. It is a one-line change and there is no reason not to use it in production.
```python
import uvicorn

# Production WebSocket server configuration
# Run this with: python -m io.thecodeforge.config.uvicorn_ws
# Or directly: uvicorn app:app --workers 4 --loop uvloop

if __name__ == "__main__":
    uvicorn.run(
        "io.thecodeforge.realtime.connection_manager:app",
        host="0.0.0.0",
        port=8000,
        workers=4,                 # One per physical CPU core for CPU-bound handler work
        loop="uvloop",             # 2-4x faster than default asyncio on I/O-bound workloads
        ws="websockets",           # websockets library — better performance than wsproto default
        ws_ping_interval=30,       # Send native WS ping frame every 30 seconds
        ws_ping_timeout=10,        # Close connection if no pong within 10 seconds
        limit_concurrency=10000,   # Reject new connections beyond this per worker
        limit_max_requests=50000,  # Restart worker after N requests — guards against slow leaks
        timeout_keep_alive=5,      # HTTP keep-alive timeout (not WebSocket — separate setting)
        access_log=True,           # Log WS upgrade requests alongside HTTP — useful for debugging
    )
```
- Each WebSocket coroutine uses ~4-8 KB of stack memory — 10K connections is roughly 40-80 MB, well within budget
- Each connection holds exactly one OS file descriptor — this is the real ceiling and it defaults to 1,024 or 65,535
- Set LimitNOFILE=1000000 in the systemd unit file — not ulimit in a shell script, which doesn't persist across restarts
- In Docker, set --ulimit nofile=1000000:1000000 on the container or adjust in docker-compose.yml
- ws_ping_interval and ws_ping_timeout at the Uvicorn level handle heartbeat transparently — you don't need application-level ping/pong unless you need custom liveness metadata
- limit_max_requests is a safety valve against slow memory leaks — the worker restarts after N requests, Uvicorn handles graceful handoff
| Protocol | Direction | Connection Model | Best For |
|---|---|---|---|
| WebSocket (WS/WSS) | Full-duplex bidirectional | Persistent TCP connection after HTTP upgrade handshake | Chat, live dashboards, collaborative editing, multiplayer gaming — anything requiring low-latency bidirectional communication |
| Server-Sent Events (SSE) | Server to client only | Persistent HTTP connection with text/event-stream MIME type — client cannot send back over the same connection | Push notifications, live feeds, stock tickers, build log streaming — one-way server push with automatic reconnect built into the browser |
| HTTP Long Polling | Client-initiated, server-deferred response | Repeated HTTP requests held open until data is available or timeout, then immediately re-issued by the client | Compatibility fallback for environments where WebSocket or SSE is blocked (some corporate proxies, older CDNs) |
| HTTP Short Polling | Client-initiated request/response | Periodic HTTP GET at fixed intervals — standard request/response, connection closes after each response | Low-frequency updates where latency above 5 seconds is acceptable and implementation simplicity outweighs efficiency |
| gRPC Streaming | Bidirectional (client, server, or both directions independently) | Persistent HTTP/2 connection with binary protobuf framing — supports client streaming, server streaming, and bidirectional streaming | Microservice-to-microservice communication, high-throughput binary streams, mobile clients where bandwidth efficiency matters |
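For contrast with the SSE row above, a minimal one-way push endpoint in FastAPI might look like the sketch below (the /events path and tick payload are illustrative); the browser consumes it with the built-in EventSource API, which also handles automatic reconnects:

```python
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def event_stream():
    # Yield one SSE-formatted event per second — server push only, no client channel
    for i in range(10):
        yield f"data: tick {i}\n\n"
        await asyncio.sleep(1)


@app.get("/events")  # illustrative path
async def sse_endpoint():
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```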
🎯 Key Takeaways
- Always call await websocket.accept() before sending or receiving — and never call it before validating auth.
- Wrap the receive loop in try/except WebSocketDisconnect to handle clean client closures — but know that network-level drops never trigger this exception.
- The while True receive loop is intentional — it keeps the coroutine alive for the connection's lifetime. This is the correct pattern, not a bug.
- Server-side heartbeat via ws_ping_interval and ws_ping_timeout in Uvicorn is mandatory — without it, half-open sockets accumulate until EMFILE kills the worker.
- Fan out broadcast with asyncio.gather() and return_exceptions=True — sequential await on 1K+ connections blocks the event loop and causes delivery jitter.
- File descriptors, not memory, are the capacity bottleneck — set LimitNOFILE=1000000 in systemd or container config before going to production.
- WebSocket connections are local to a single process — any multi-worker or multi-instance deployment requires Redis pub/sub for broadcast to work correctly.
- Use the lifespan context manager for Redis and background task setup — @app.on_event('startup') is deprecated as of FastAPI 0.93.
Interview Questions on This Topic
- What is the difference between WSGI and ASGI, and why is the latter required for FastAPI WebSockets? (Mid-level)
- Explain the 'C10k problem' and how Python's asyncio library helps FastAPI handle thousands of concurrent WebSocket connections. (Senior)
- How would you implement a distributed 'Presence' system (showing who is online) using FastAPI and Redis? (Senior)
- Describe the lifecycle of a WebSocket handshake. What happens at the HTTP level during the Upgrade request? (Mid-level)
- In a System Design context: how would you design a scalable notification service using WebSockets for 10 million users? (Senior)
Frequently Asked Questions
How do I authenticate a WebSocket connection?
The browser's WebSocket API does not allow setting custom HTTP headers like Authorization: Bearer on the initial handshake request. Your two options are query parameters and cookies.
Query parameters are straightforward for API clients and native apps: @app.websocket('/ws') async def ws(websocket: WebSocket, token: str = Query(...)). Validate the token before calling websocket.accept(), and close with websocket.close(code=1008) if it is invalid. The problem with query parameters is that they appear verbatim in server access logs and proxy logs — a long-lived API key in a query parameter is a permanent credential leak. Use short-lived JWTs with a 60-second expiry: the client obtains a fresh token via a normal HTTP endpoint, opens the WebSocket connection with it, and the token is worthless by the time it appears in a log.
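A minimal sketch of that token-minting flow — the /ws-token route name and the get_current_user placeholder are assumptions, not part of the article's codebase:

```python
from datetime import datetime, timedelta
from fastapi import FastAPI, Depends
from jose import jwt

app = FastAPI()
SECRET_KEY = "your-secret-key"  # Load from environment in production
ALGORITHM = "HS256"


def get_current_user() -> str:
    """Placeholder for your normal HTTP auth dependency (session, OAuth2, etc.)."""
    return "user-123"


@app.get("/ws-token")  # hypothetical route name
def issue_ws_token(user_id: str = Depends(get_current_user)) -> dict:
    # 60-second expiry: the token is already worthless by the time it lands in a log
    claims = {"sub": user_id, "exp": datetime.utcnow() + timedelta(seconds=60)}
    return {"token": jwt.encode(claims, SECRET_KEY, algorithm=ALGORITHM)}
```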
Cookies are the better choice for browser-based applications. HttpOnly cookies set during a prior HTTP login flow are sent automatically with the WebSocket upgrade request, are never visible in JavaScript, and do not appear in browser history. Read the cookie in the WebSocket handler via websocket.cookies.get('session') and validate it before accepting.
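A minimal sketch of the cookie variant, assuming a session cookie named 'session' and a hypothetical validate_session lookup:

```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect, status

app = FastAPI()


def validate_session(session_id: str | None) -> str | None:
    """Placeholder: look the session up in your store and return the user_id, or None."""
    return "user-123" if session_id else None


@app.websocket("/ws/cookie-auth")  # illustrative path
async def cookie_ws(websocket: WebSocket):
    # The HttpOnly cookie set during the prior HTTP login rides along on the upgrade request
    user_id = validate_session(websocket.cookies.get("session"))
    if user_id is None:
        await websocket.close(code=status.WS_1008_POLICY_VIOLATION)  # validate BEFORE accept()
        return
    await websocket.accept()
    try:
        while True:
            await websocket.send_text(await websocket.receive_text())
    except WebSocketDisconnect:
        pass
```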
How do WebSockets scale across multiple FastAPI instances?
A WebSocket connection is permanently bound to the specific process that accepted it. The ConnectionManager on that process has no visibility into connections on any other process. If you broadcast on Instance A, clients on Instances B, C, and D receive nothing.
The solution is Redis pub/sub: each instance subscribes to a shared Redis channel in the lifespan startup hook. When any instance needs to broadcast, it publishes to Redis. Redis delivers the message to every subscribed instance simultaneously, and each instance fans it out to its own local connections. The originating instance doesn't need to know which instance any given client is connected to.
For durable delivery — where clients that are temporarily offline should receive missed messages when they reconnect — replace pub/sub with Redis Streams. Use XADD to publish and XREAD with consumer groups to deliver, with the stream acting as a persistent log. Pub/sub is fire-and-forget; Streams are durable and replayable.
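A rough sketch of what the Streams variant could look like with redis-py's asyncio client — the stream key, group name, and field names are assumptions:

```python
import redis.asyncio as aioredis

STREAM = "forge:messages"  # hypothetical stream key
GROUP = "ws-workers"       # hypothetical consumer group


async def publish(r: aioredis.Redis, user_id: str, text: str) -> None:
    # XADD appends to a persistent log instead of fire-and-forget pub/sub
    await r.xadd(STREAM, {"user": user_id, "text": text})


async def consume(r: aioredis.Redis, consumer: str) -> None:
    # Each worker reads only messages not yet delivered to its consumer group
    try:
        await r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except aioredis.ResponseError:
        pass  # group already exists
    while True:
        entries = await r.xreadgroup(GROUP, consumer, {STREAM: ">"}, count=100, block=5000)
        for _stream, messages in entries:
            for msg_id, fields in messages:
                # ... fan fields out to this worker's local WebSocket connections ...
                await r.xack(STREAM, GROUP, msg_id)
```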
Can I use FastAPI Middleware with WebSockets?
Standard HTTP middleware defined with @app.middleware('http') does not execute for WebSocket connections. The middleware chain is built around the HTTP request/response model — it sees the initial upgrade request, but after the 101 handshake the connection transitions to the WebSocket protocol and the middleware stack no longer intercepts it.
For cross-cutting concerns on WebSocket endpoints — authentication, rate limiting, structured logging, request tracing — the practical approaches are: a dependency function injected into the endpoint (the cleanest option for auth and validation), a decorator wrapping the endpoint function, or a class-based wrapper that implements the same connect/receive/send lifecycle. For rate limiting specifically, implement it in the dependency using Redis counters keyed by IP or user_id — the same Redis instance you are already using for pub/sub.
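A minimal sketch of the dependency approach for auth (rate limiting follows the same shape, with a Redis counter incremented inside the dependency); the decode_token placeholder and /ws/limited path are hypothetical:

```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect, Depends, Query, status

app = FastAPI()


def decode_token(token: str) -> str | None:
    """Placeholder for real JWT/session validation — returns a user_id or None."""
    return token or None


async def ws_user(websocket: WebSocket, token: str = Query(...)) -> str | None:
    # Dependencies run before the endpoint body, so rejection happens before accept()
    user_id = decode_token(token)
    if user_id is None:
        await websocket.close(code=status.WS_1008_POLICY_VIOLATION)
    return user_id


@app.websocket("/ws/limited")  # illustrative path
async def limited(websocket: WebSocket, user_id: str | None = Depends(ws_user)):
    if user_id is None:
        return  # the dependency already closed the socket
    await websocket.accept()
    try:
        while True:
            await websocket.send_text(await websocket.receive_text())
    except WebSocketDisconnect:
        pass
```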
What happens to WebSocket connections during a server deploy or restart?
When Uvicorn receives SIGTERM (the standard shutdown signal from systemd, Kubernetes, or a container orchestrator), it begins graceful shutdown. Active WebSocket connections receive a close frame with code 1001 (Going Away). The client's WebSocket implementation fires the onclose event, and well-written clients use this as the trigger to implement exponential-backoff reconnect logic.
For zero-downtime deploys, use rolling deployments with connection draining: stop routing new connections to the instance being replaced (remove it from the load balancer's target group or upstream), then wait for existing connections to close naturally or hit the drain timeout, then send SIGTERM. Configure the drain timeout based on your expected connection lifetime — for notification services where connections are typically short-lived, 30 seconds is usually sufficient; for long-lived collaborative editing sessions, you may need minutes.
In Kubernetes, preStop hooks give you a window to drain before SIGTERM arrives. Set terminationGracePeriodSeconds to at least 2x your expected drain window. With Redis pub/sub, a client that reconnects to a new pod immediately rejoins the same broadcast graph — the user experience is a brief reconnect rather than a visible outage.
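On the client side, a reconnect loop with exponential backoff might look like this sketch — shown in Python with the third-party websockets library as a stand-in for whatever client stack you actually run; the URL is illustrative:

```python
import asyncio
import random
import websockets  # third-party client library — an assumption about the client stack


async def connect_with_backoff(url: str) -> None:
    delay = 1.0
    while True:
        try:
            async with websockets.connect(url) as ws:
                delay = 1.0  # reset the backoff after a successful connection
                async for message in ws:
                    print("received:", message)
        except (OSError, websockets.ConnectionClosed):
            # Server went away (e.g. a deploy sent close code 1001) — back off and retry
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
            delay = min(delay * 2, 30.0)


asyncio.run(connect_with_backoff("ws://localhost:8000/ws/demo"))
```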