Senior 9 min · April 11, 2026

Health Check DB Query — Load Balancer 503 Outage

A DB query in health checks caused a 12-minute 503 outage.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • A load balancer distributes incoming network traffic across multiple backend servers
  • It prevents any single server from becoming overwhelmed and improves availability
  • Layer 4 (transport) balances by IP and port; Layer 7 (application) balances by HTTP content
  • Health checks remove unhealthy servers from rotation automatically
  • Production outages often trace back to misconfigured health checks or missing connection draining
  • Biggest mistake: treating load balancing as set-and-forget without monitoring distribution skew

Plain-English First

A load balancer is like a traffic officer at a busy intersection directing cars to different lanes. Instead of all cars piling into one lane, the officer spreads them out so every lane moves smoothly. In computing, the load balancer sits in front of your servers and spreads incoming requests so no single server gets overwhelmed.

Load balancers are critical infrastructure components that distribute client requests across a pool of backend servers. They improve application availability, enable horizontal scaling, and provide fault tolerance by routing traffic away from failed instances. Every production web service behind more than one server requires a load balancing layer.

Misunderstanding load balancer algorithms, health check configurations, and session persistence mechanisms causes some of the most common production incidents. A misconfigured health check can remove all servers from rotation simultaneously, causing a complete outage. An incorrect algorithm choice can create hotspots where one server handles 80% of traffic while others sit idle.

What Is a Load Balancer?

A load balancer is a device or software component that distributes incoming network traffic across multiple backend servers. It acts as a single entry point for client requests and routes them to available servers based on a configured algorithm.

Load balancers solve three fundamental problems: availability by removing failed servers from rotation, scalability by enabling horizontal addition of servers, and performance by preventing any single server from becoming a bottleneck. Without a load balancer, every client would need to know individual server addresses, and a single server failure would cause service disruption.

io.thecodeforge.loadbalancer.core.py (Python)
from __future__ import annotations

import threading
from dataclasses import dataclass
from enum import Enum
from typing import TYPE_CHECKING, List, Optional

if TYPE_CHECKING:
    # Type-only import: algorithms.py imports BackendServer from this module,
    # so importing it at runtime here would be circular.
    from io.thecodeforge.loadbalancer.algorithms import LoadBalancingAlgorithm


class BackendState(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DRAINING = "draining"


@dataclass
class BackendServer:
    host: str
    port: int
    weight: int = 1
    state: BackendState = BackendState.HEALTHY
    active_connections: int = 0
    last_health_check: float = 0.0
    consecutive_failures: int = 0

    @property
    def address(self) -> str:
        return f"{self.host}:{self.port}"

    def is_available(self) -> bool:
        # DRAINING backends are still finishing in-flight requests.
        return self.state in (BackendState.HEALTHY, BackendState.DRAINING)


class LoadBalancer:
    """
    Production-grade load balancer with health checking,
    connection draining, and multiple routing algorithms.
    """

    def __init__(self, algorithm: LoadBalancingAlgorithm, health_check_interval: float = 5.0):
        # Deferred import: health.py imports BackendServer and BackendState from here.
        from io.thecodeforge.loadbalancer.health import (
            HealthCheckConfig,
            HealthChecker,
            HealthCheckType,
        )

        self.backends: List[BackendServer] = []
        self.algorithm = algorithm
        # HealthChecker takes a HealthCheckConfig, not a bare interval.
        self.health_checker = HealthChecker(
            config=HealthCheckConfig(
                check_type=HealthCheckType.HTTP,
                interval_seconds=health_check_interval,
            )
        )
        self._lock = threading.Lock()
        self._minimum_healthy_hosts: int = 0

    def add_backend(self, host: str, port: int, weight: int = 1) -> BackendServer:
        """
        Register a new backend server with the load balancer.
        """
        with self._lock:
            backend = BackendServer(host=host, port=port, weight=weight)
            self.backends.append(backend)
            self.health_checker.register(backend)
            return backend

    def remove_backend(self, backend: BackendServer) -> None:
        """
        Start graceful removal: mark the backend DRAINING and stop health-checking it.
        Deleting it from self.backends once its in-flight connections finish
        is left to the caller.
        """
        with self._lock:
            backend.state = BackendState.DRAINING
            self.health_checker.unregister(backend)

    def select_backend(self) -> Optional[BackendServer]:
        """
        Select the next backend using the configured algorithm.
        If fewer than minimum_healthy_hosts backends are HEALTHY, fall back to
        every available backend (HEALTHY or DRAINING) instead of the healthy subset.
        """
        with self._lock:
            available = [b for b in self.backends if b.is_available()]
            healthy = [b for b in available if b.state == BackendState.HEALTHY]

            if len(healthy) < self._minimum_healthy_hosts and available:
                return self.algorithm.select(available)

            if not healthy:
                return None  # typically surfaced to clients as a 503

            return self.algorithm.select(healthy)

    def set_minimum_healthy_hosts(self, count: int) -> None:
        """
        Set the healthy-backend count below which select_backend widens its
        candidate pool to all available backends, so routing does not stop
        outright when most of the pool fails.
        """
        self._minimum_healthy_hosts = count


# Example usage (guarded so importing this module does not trigger a circular import)
if __name__ == "__main__":
    from io.thecodeforge.loadbalancer.algorithms import RoundRobinAlgorithm

    lb = LoadBalancer(algorithm=RoundRobinAlgorithm(), health_check_interval=5.0)
    lb.add_backend("10.0.1.10", 8080)
    lb.add_backend("10.0.1.11", 8080)
    lb.add_backend("10.0.1.12", 8080)
    lb.set_minimum_healthy_hosts(1)

    # Backends start HEALTHY and the health checker is not started,
    # so round-robin cycles 10 -> 11 -> 12 -> 10 -> ...
    for i in range(6):
        backend = lb.select_backend()
        if backend:
            print(f"Request {i} -> {backend.address}")
Load Balancer as Traffic Director
  • Clients connect to the load balancer, never directly to backend servers
  • The balancer decides which backend receives each request
  • Failed servers are removed automatically via health checks
  • New servers are added without client-side changes
  • The balancer itself must be redundant to avoid becoming a single point of failure
Production Insight
Load balancers become the single point of entry for all traffic.
If the balancer fails, all backends become unreachable.
Rule: always deploy load balancers in redundant pairs or use managed services.
Key Takeaway
Load balancers distribute traffic across servers for availability and scale.
They are the single entry point — making them redundant is critical.
Health checks and connection draining prevent cascading failures.
Load Balancer Deployment Decision
  • If you serve simple HTTP traffic with standard routing needs: use a managed load balancer such as AWS ALB or GCP HTTP(S) LB
  • If you serve TCP/UDP traffic or non-HTTP protocols: use a Layer 4 load balancer such as AWS NLB or HAProxy in TCP mode
  • If you need full control over routing logic: deploy HAProxy, NGINX, or Envoy as a self-managed load balancer
  • If you run Kubernetes-based microservices: use an ingress controller (NGINX Ingress, Istio Gateway) with a service mesh

Types of Load Balancers

Load balancers operate at different layers of the network stack, each with distinct capabilities and trade-offs. The two primary categories are Layer 4 (transport) and Layer 7 (application) load balancers.

Layer 4 load balancers make routing decisions based on IP address and port information. They are fast and protocol-agnostic but cannot inspect request content. Layer 7 load balancers operate at the application layer and can route based on HTTP headers, URLs, cookies, and request content. They enable sophisticated routing but add latency from content inspection.

io.thecodeforge.loadbalancer.types.py (Python)
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Optional
from io.thecodeforge.loadbalancer.core import BackendServer, LoadBalancer
from io.thecodeforge.loadbalancer.models import Request, Connection


class LoadBalancerType(ABC):
    """
    Abstract base for Layer 4 and Layer 7 load balancer types.
    """

    @abstractmethod
    def route(self, request: Any) -> Optional[BackendServer]:
        pass


class Layer4LoadBalancer(LoadBalancerType):
    """
    Transport-layer load balancer.
    Routes based on source/destination IP and port.
    Does not inspect packet contents.
    """

    def __init__(self, balancer: LoadBalancer):
        self.balancer = balancer

    def route(self, connection: Connection) -> Optional[BackendServer]:
        """
        Pick a backend for a new connection. Only the 5-tuple (src_ip, src_port,
        dst_ip, dst_port, protocol) is visible at this layer; this sketch
        delegates the choice to the balancer's algorithm.
        """
        backend = self.balancer.select_backend()
        if backend:
            # The caller must decrement this when the connection closes.
            backend.active_connections += 1
        return backend

    def get_capabilities(self) -> Dict[str, Any]:
        return {
            "protocol_agnostic": True,
            "url_routing": False,
            "header_inspection": False,
            "cookie_persistence": False,
            "content_based_routing": False,
            "ssl_termination": False,
            "websocket_support": True,
            "latency_overhead": "minimal",
        }


class Layer7LoadBalancer(LoadBalancerType):
    """
    Application-layer load balancer.
    Routes based on HTTP headers, URL path, hostname, cookies.
    """

    def __init__(self, balancer: LoadBalancer):
        self.balancer = balancer  # default pool when no rule matches
        self.pools: Dict[str, LoadBalancer] = {}
        self.rules: List[Dict[str, Any]] = []

    def add_pool(self, name: str, pool: LoadBalancer) -> None:
        """
        Register a named backend pool that routing rules can target.
        """
        self.pools[name] = pool

    def add_routing_rule(self, condition: Callable[[Request], bool], target_pool: str) -> None:
        """
        Add a content-based routing rule. Rules are checked in the order
        they were added; the first match wins.
        """
        self.rules.append({"condition": condition, "pool": target_pool})

    def route(self, request: Request) -> Optional[BackendServer]:
        """
        Route based on HTTP request content, falling back to the default pool.
        """
        for rule in self.rules:
            if rule["condition"](request):
                pool = self.pools.get(rule["pool"])
                if pool is not None:
                    return pool.select_backend()

        return self.balancer.select_backend()

    def get_capabilities(self) -> Dict[str, Any]:
        return {
            "protocol_agnostic": False,
            "url_routing": True,
            "header_inspection": True,
            "cookie_persistence": True,
            "content_based_routing": True,
            "ssl_termination": True,
            "websocket_support": True,
            "latency_overhead": "moderate",
        }


# Example: Layer 7 routing rules
from io.thecodeforge.loadbalancer.algorithms import (
    RoundRobinAlgorithm,
    WeightedRoundRobinAlgorithm,
)

# Each pool is its own LoadBalancer; register backends with add_backend().
l7 = Layer7LoadBalancer(balancer=LoadBalancer(algorithm=WeightedRoundRobinAlgorithm()))
l7.add_pool("api-servers", LoadBalancer(algorithm=RoundRobinAlgorithm()))
l7.add_pool("static-servers", LoadBalancer(algorithm=RoundRobinAlgorithm()))

# Route API traffic to API servers
l7.add_routing_rule(
    condition=lambda req: req.path.startswith("/api/"),
    target_pool="api-servers"
)

# Route static assets to CDN-backed servers
l7.add_routing_rule(
    condition=lambda req: req.path.startswith("/static/"),
    target_pool="static-servers"
)

# Route by hostname
l7.add_routing_rule(
    condition=lambda req: req.host == "api.example.com",
    target_pool="api-servers"
)
Layer 4 vs Layer 7 Trade-offs
  • Layer 4 is faster — no content parsing means lower latency per request
  • Layer 7 enables URL-based routing, header inspection, and SSL termination
  • Layer 4 preserves raw TCP connections — required for non-HTTP protocols
  • Layer 7 can modify requests and responses — add headers, rewrite paths
  • Choose Layer 4 for raw performance, Layer 7 for routing flexibility
Production Insight
Layer 7 load balancers add latency from HTTP parsing.
For latency-sensitive paths, consider Layer 4 with client-side routing.
Rule: measure added latency from the load balancer tier independently.
Key Takeaway
Layer 4 routes by IP and port — fast and protocol-agnostic.
Layer 7 routes by HTTP content — flexible but adds latency.
Choose based on routing requirements, not default preference.
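The Layer4LoadBalancer sketch above delegates to the balancer's algorithm rather than using the connection tuple it mentions. The block below is a minimal illustration of tuple-based routing, assuming a plain hash-mod-N over the 5-tuple; FiveTuple and pick_backend are hypothetical names, and production Layer 4 balancers use their own hash functions and lookup tables. It shows the key property: every packet of one connection maps to the same backend without reading any payload.

```python
import hashlib
from typing import List, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # e.g. "tcp" or "udp"


def pick_backend(conn: FiveTuple, backends: List[str]) -> str:
    """Map a connection to a backend using only transport-layer fields."""
    key = "|".join(str(part) for part in conn).encode()
    # First 8 bytes of SHA-256 as an unsigned integer: stable across processes,
    # unlike Python's built-in hash() for strings.
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[digest % len(backends)]


backends = ["10.0.1.10:8080", "10.0.1.11:8080", "10.0.1.12:8080"]
conn = FiveTuple("203.0.113.7", 51514, "198.51.100.1", 443, "tcp")

# Every packet of one connection carries the same 5-tuple, so it always maps
# to the same backend.
first = pick_backend(conn, backends)

# A second connection from the same client uses a new source port and may
# land on a different backend.
second = pick_backend(conn._replace(src_port=51515), backends)
print(first, second)
```

Because the source port changes per connection, this spreads one client's connections across the pool; it provides connection affinity, not session affinity.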

Load Balancing Algorithms

The load balancing algorithm determines how the balancer selects a backend server for each incoming request. Algorithm choice directly impacts traffic distribution, server utilization, and response latency. No single algorithm is optimal for all workloads.

The most common algorithms are round-robin (sequential distribution), weighted round-robin (proportional to server capacity), least connections (route to server with fewest active connections), and IP hash (consistent routing based on client IP). Each algorithm makes different assumptions about server capacity, request duration, and client behavior.

io.thecodeforge.loadbalancer.algorithms.py (Python)
from abc import ABC, abstractmethod
from typing import List, Optional
import hashlib
import random
from io.thecodeforge.loadbalancer.core import BackendServer


class LoadBalancingAlgorithm(ABC):
    """
    Abstract base for all load balancing algorithms.
    """
    
    @abstractmethod
    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        pass


class RoundRobinAlgorithm(LoadBalancingAlgorithm):
    """
    Distributes requests sequentially across all healthy backends.
    Simple and fair when servers have equal capacity.
    """
    
    def __init__(self):
        self._index = 0
    
    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        if not backends:
            return None
        backend = backends[self._index % len(backends)]
        self._index += 1
        return backend


class WeightedRoundRobinAlgorithm(LoadBalancingAlgorithm):
    """
    Distributes requests proportionally based on server weights.
    Higher weight servers receive proportionally more requests.
    Uses the "smooth" variant (as in NGINX), which interleaves picks instead
    of sending a burst of consecutive requests to the heaviest server.
    """

    def __init__(self):
        # Running "current weight" per backend address.
        self._current_weights: dict = {}

    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        if not backends:
            return None

        total_weight = sum(b.weight for b in backends)

        # Every backend earns its weight on each selection...
        for backend in backends:
            addr = backend.address
            self._current_weights[addr] = self._current_weights.get(addr, 0) + backend.weight

        # ...and the highest current weight wins, then pays back the total.
        selected = max(backends, key=lambda b: self._current_weights[b.address])
        self._current_weights[selected.address] -= total_weight
        return selected


class LeastConnectionsAlgorithm(LoadBalancingAlgorithm):
    """
    Routes to the server with the fewest active connections.
    Best for workloads with variable request durations.
    """
    
    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        if not backends:
            return None
        return min(backends, key=lambda b: b.active_connections)


class IpHashAlgorithm(LoadBalancingAlgorithm):
    """
    Routes based on hash of client IP address.
    Provides session affinity without cookies.
    """
    
    def __init__(self, client_ip_getter: callable = None):
        self._get_client_ip = client_ip_getter or (lambda: "127.0.0.1")
    
    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        if not backends:
            return None
        
        client_ip = self._get_client_ip()
        hash_value = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
        index = hash_value % len(backends)
        return backends[index]


class P2CLeastConnectionsAlgorithm(LoadBalancingAlgorithm):
    """
    Power of Two Choices: randomly pick two backends,
    then route to the one with fewer connections.
    Near-optimal load distribution with O(1) selection.
    """
    
    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        if not backends:
            return None
        if len(backends) == 1:
            return backends[0]
        
        a, b = random.sample(backends, 2)
        return a if a.active_connections <= b.active_connections else b


# Algorithm comparison
algorithms = {
    "Round Robin": "Simple sequential distribution. Assumes equal server capacity.",
    "Weighted Round Robin": "Proportional distribution based on server weight. For heterogeneous pools.",
    "Least Connections": "Routes to fewest active connections. Best for variable-duration requests.",
    "IP Hash": "Consistent routing by client IP. Provides session affinity without cookies.",
    "P2C Least Connections": "Near-optimal distribution with O(1) complexity. Used by Envoy and gRPC."
}
Algorithm Selection Heuristic
  • Short uniform requests: round-robin is simple and effective
  • Variable-length requests (WebSockets, streams): least connections prevents hotspots
  • Session-dependent state: IP hash or cookie-based persistence
  • Heterogeneous server capacities: weighted algorithms respect capacity differences
  • High-scale random routing: P2C least connections gives near-optimal distribution in O(1)
Production Insight
Round-robin creates hotspots with variable-duration requests.
Long-running connections tie up server capacity unevenly.
Rule: use least-connections for any workload where request duration varies significantly.
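To make the hotspot concrete, here is a small simulation under a deliberately contrived assumption: one request arrives per tick, long (10-tick) and short (1-tick) requests strictly alternate, and there are two backends. SimServer, round_robin, least_connections, and simulate are hypothetical names for this sketch only. Round-robin sends every long request to the same backend, so that backend peaks at 5 concurrent requests while the other never exceeds 1; least connections keeps the busiest backend lower.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SimServer:
    name: str
    # Remaining lifetime, in ticks, of each in-flight request.
    in_flight: List[int] = field(default_factory=list)

    @property
    def active(self) -> int:
        return len(self.in_flight)


Picker = Callable[[List[SimServer], int], SimServer]


def round_robin(servers: List[SimServer], i: int) -> SimServer:
    return servers[i % len(servers)]


def least_connections(servers: List[SimServer], i: int) -> SimServer:
    return min(servers, key=lambda s: s.active)  # ties go to the first server


def simulate(pick: Picker, durations: List[int], n_servers: int = 2) -> List[int]:
    """Send one request per tick; return each server's peak concurrency."""
    servers = [SimServer(f"s{i}") for i in range(n_servers)]
    peaks = [0] * n_servers
    for i, duration in enumerate(durations):
        # Advance time: each in-flight request loses a tick; finished ones drop out.
        for s in servers:
            s.in_flight = [t - 1 for t in s.in_flight if t > 1]
        pick(servers, i).in_flight.append(duration)
        peaks = [max(p, s.active) for p, s in zip(peaks, servers)]
    return peaks


workload = [10, 1] * 10  # alternating long and short requests

rr_peaks = simulate(round_robin, workload)
lc_peaks = simulate(least_connections, workload)
print("round robin peaks:", rr_peaks)  # [5, 1]
print("least connections peaks:", lc_peaks)
```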
Key Takeaway
Algorithm choice determines traffic distribution pattern.
Round-robin fails with variable request durations.
P2C least connections provides near-optimal distribution at scale.
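The "near-optimal" claim for P2C comes from the classic balls-into-bins result: sampling two backends and keeping the less loaded one sharply reduces the peak compared with a single random pick. The sketch below assumes long-lived requests that never complete (so load only grows) and a fixed random seed; pick_random, pick_two_choices, and max_load are hypothetical names. With 10,000 requests over 100 backends the mean load is 100, and the P2C peak stays within a few requests of that mean while the random peak is noticeably higher.

```python
import random
from typing import Callable, List

Picker = Callable[[List[int], random.Random], int]


def pick_random(loads: List[int], rng: random.Random) -> int:
    return rng.randrange(len(loads))


def pick_two_choices(loads: List[int], rng: random.Random) -> int:
    # Sample two distinct backends and keep the less loaded one (ties go to the first).
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b


def max_load(pick: Picker, n_requests: int = 10_000, n_backends: int = 100, seed: int = 7) -> int:
    """Assign requests that never complete; return the busiest backend's count."""
    rng = random.Random(seed)
    loads = [0] * n_backends
    for _ in range(n_requests):
        loads[pick(loads, rng)] += 1
    return max(loads)


random_peak = max_load(pick_random)
p2c_peak = max_load(pick_two_choices)
print(f"mean load 100, random peak {random_peak}, P2C peak {p2c_peak}")
```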
Algorithm Selection Guide
  • If all servers have equal capacity and requests are uniform: use round-robin for simplicity
  • If servers have different capacities: use weighted round-robin with capacity-based weights
  • If request durations vary significantly: use least connections or P2C least connections
  • If session affinity is required without cookies: use IP hash with consistent hashing for stable routing
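The IpHashAlgorithm shown earlier uses plain hash-mod-N, which is why "with consistent hashing" matters. The comparison below, a sketch with hypothetical helper names, scales a pool from 4 to 5 backends and counts how many clients change backend. With modulo hashing a client stays put only when h mod 4 equals h mod 5, which holds for 4 of every 20 hash values, so about 80% of clients are remapped in expectation. A ring with virtual nodes, built the same way as ConsistentHashPersistence, remaps roughly 1/5 of clients, and every remapped client moves to the new backend.

```python
import bisect
import hashlib
from typing import Callable, Dict, List


def _hash(key: str) -> int:
    # MD5 only spreads keys evenly here; it is not used for security.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def modulo_pick(client_ip: str, backends: List[str]) -> str:
    # Same idea as IpHashAlgorithm: hash(client) % number_of_backends.
    return backends[_hash(client_ip) % len(backends)]


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, backends: List[str], virtual_nodes: int = 150):
        self._ring: Dict[int, str] = {}
        for backend in backends:
            for i in range(virtual_nodes):
                self._ring[_hash(f"{backend}#{i}")] = backend
        self._points = sorted(self._ring)

    def pick(self, client_ip: str) -> str:
        # First ring point at or after the client's hash, wrapping to the start.
        idx = bisect.bisect_left(self._points, _hash(client_ip))
        return self._ring[self._points[idx % len(self._points)]]


def moved_fraction(before: Callable[[str], str], after: Callable[[str], str], clients: List[str]) -> float:
    return sum(before(c) != after(c) for c in clients) / len(clients)


old = [f"10.0.1.{i}:8080" for i in range(10, 14)]  # 4 backends
new = old + ["10.0.1.14:8080"]                      # scale out to 5
clients = [f"192.168.{i // 256}.{i % 256}" for i in range(2000)]

mod_moved = moved_fraction(lambda c: modulo_pick(c, old), lambda c: modulo_pick(c, new), clients)
old_ring, new_ring = HashRing(old), HashRing(new)
ring_moved = moved_fraction(old_ring.pick, new_ring.pick, clients)

print(f"modulo hashing remapped {mod_moved:.0%} of clients")
print(f"consistent hashing remapped {ring_moved:.0%} of clients")
```

Using bisect on a pre-sorted list of ring points also avoids re-sorting the ring on every lookup.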

Health Checks and Connection Draining

Health checks are the mechanism by which a load balancer determines whether a backend server is capable of handling traffic. Without health checks, the balancer would route requests to failed servers, causing errors for clients. Connection draining ensures in-flight requests complete before a server is removed from rotation.

Health checks come in two types: active checks where the balancer periodically probes the backend, and passive checks where the balancer monitors real request failures. Active checks detect failures proactively but add load. Passive checks detect failures only after real client requests fail.

io.thecodeforge.loadbalancer.health.py (Python)
import time
import threading
import requests
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional
from io.thecodeforge.loadbalancer.core import BackendServer, BackendState


class HealthCheckType(Enum):
    HTTP = "http"
    TCP = "tcp"
    GRPC = "grpc"


@dataclass
class HealthCheckConfig:
    check_type: HealthCheckType
    path: str = "/health"
    port: Optional[int] = None
    interval_seconds: float = 5.0
    timeout_seconds: float = 2.0  # keep below interval_seconds so probes never overlap
    healthy_threshold: int = 2
    unhealthy_threshold: int = 3
    # default_factory gives each config its own list of accepted status codes.
    expected_status_codes: List[int] = field(default_factory=lambda: [200])


class HealthChecker:
    """
    Production health checker with configurable thresholds,
    grace periods, and passive failure detection.
    """
    
    def __init__(self, config: HealthCheckConfig = None):
        self.config = config or HealthCheckConfig(
            check_type=HealthCheckType.HTTP,
            path="/health"
        )
        self._backends: dict = {}
        self._running = False
        self._thread: Optional[threading.Thread] = None
    
    def register(self, backend: BackendServer) -> None:
        """
        Register a backend for health checking.
        """
        self._backends[backend.address] = {
            "backend": backend,
            "consecutive_successes": 0,
            "consecutive_failures": 0,
            "last_check_time": 0.0
        }
    
    def unregister(self, backend: BackendServer) -> None:
        """
        Remove a backend from health checking.
        """
        self._backends.pop(backend.address, None)
    
    def check_http(self, backend: BackendServer) -> bool:
        """
        Perform HTTP health check against backend.
        """
        port = self.config.port or backend.port
        url = f"http://{backend.host}:{port}{self.config.path}"
        
        try:
            response = requests.get(
                url,
                timeout=self.config.timeout_seconds
            )
            return response.status_code in self.config.expected_status_codes
        except requests.RequestException:
            # Any request failure counts as unhealthy instead of killing the checker thread.
            return False
    
    def check_tcp(self, backend: BackendServer) -> bool:
        """
        Perform TCP connection check against backend.
        """
        import socket
        port = self.config.port or backend.port

        try:
            # The context manager closes the socket on success and on error.
            with socket.create_connection(
                (backend.host, port), timeout=self.config.timeout_seconds
            ):
                return True
        except OSError:  # refused, unreachable, or timed out
            return False
    
    def run_check(self, backend: BackendServer) -> bool:
        """
        Execute health check and update backend state based on thresholds.
        """
        if self.config.check_type == HealthCheckType.HTTP:
            is_healthy = self.check_http(backend)
        else:
            # TCP and GRPC both fall back to a plain TCP connect in this sketch.
            is_healthy = self.check_tcp(backend)
        
        entry = self._backends.get(backend.address)
        if not entry:
            return is_healthy
        
        if is_healthy:
            entry["consecutive_failures"] = 0
            entry["consecutive_successes"] += 1
            
            if entry["consecutive_successes"] >= self.config.healthy_threshold:
                backend.state = BackendState.HEALTHY
                backend.consecutive_failures = 0
        else:
            entry["consecutive_successes"] = 0
            entry["consecutive_failures"] += 1
            
            if entry["consecutive_failures"] >= self.config.unhealthy_threshold:
                backend.state = BackendState.UNHEALTHY
                backend.consecutive_failures = entry["consecutive_failures"]
        
        entry["last_check_time"] = time.time()
        return is_healthy
    
    def start(self) -> None:
        """
        Start background health checking thread.
        """
        self._running = True
        self._thread = threading.Thread(target=self._check_loop, daemon=True)
        self._thread.start()
    
    def stop(self) -> None:
        """
        Stop background health checking.
        """
        self._running = False
        if self._thread:
            self._thread.join(timeout=10.0)
    
    def _check_loop(self) -> None:
        """
        Continuous health check loop.
        """
        while self._running:
            for entry in list(self._backends.values()):
                backend = entry["backend"]
                if backend.state != BackendState.DRAINING:
                    self.run_check(backend)
            time.sleep(self.config.interval_seconds)


# Example: configure health checks
config = HealthCheckConfig(
    check_type=HealthCheckType.HTTP,
    path="/health/ready",
    interval_seconds=5.0,
    timeout_seconds=2.0,
    healthy_threshold=2,
    unhealthy_threshold=3,
    expected_status_codes=[200, 204]
)

checker = HealthChecker(config=config)
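The HealthChecker above only runs active probes. Passive detection watches real responses instead. The sketch below is loosely modeled on the consecutive_5xx and max_ejection_percent settings of Envoy's outlier detection; PassiveOutlierDetector and its fields are hypothetical. The ejection cap matters for this article's failure mode: if a shared dependency makes every backend return 5xx, the detector refuses to empty the pool. Ejected backends are never re-admitted here, whereas real implementations restore them after an ejection period.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PassiveOutlierDetector:
    """Ejects backends after consecutive 5xx responses on real traffic."""
    pool_size: int
    consecutive_5xx_limit: int = 5
    max_ejection_percent: int = 50  # never eject more than this share of the pool
    ejected: Set[str] = field(default_factory=set)
    streaks: Dict[str, int] = field(default_factory=dict, init=False)

    def record(self, backend_address: str, status_code: int) -> None:
        """Call once per proxied response."""
        if status_code < 500:
            self.streaks[backend_address] = 0
            return
        streak = self.streaks.get(backend_address, 0) + 1
        self.streaks[backend_address] = streak
        if streak < self.consecutive_5xx_limit or backend_address in self.ejected:
            return
        # Respect the cap so a shared-dependency failure cannot empty the pool.
        if (len(self.ejected) + 1) * 100 <= self.pool_size * self.max_ejection_percent:
            self.ejected.add(backend_address)


detector = PassiveOutlierDetector(pool_size=4)
for _ in range(5):
    detector.record("10.0.1.10:8080", 503)
    detector.record("10.0.1.11:8080", 500)
# A success in the middle resets the streak, so this backend stays in rotation.
for status in [500] * 4 + [200] + [500] * 4:
    detector.record("10.0.1.12:8080", status)
# Ejecting a third backend would exceed 50% of a 4-backend pool.
for _ in range(5):
    detector.record("10.0.1.13:8080", 502)
print(sorted(detector.ejected))
```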
Health Check Pitfalls
  • Health check endpoints must be lightweight — never query databases or external services
  • Separate liveness (is the process running?) from readiness (can it accept traffic?)
  • Set unhealthy_threshold > 1 to prevent flapping from transient network issues
  • Keep the health check timeout shorter than the interval so each probe completes or fails before the next one is due
  • A health check that depends on a shared resource can cause all-backends-down cascades
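The cascade in the last bullet can be reproduced with the same threshold logic as HealthChecker.run_check and the thresholds from the HealthCheckConfig example above (healthy_threshold=2, unhealthy_threshold=3). The scenario is hypothetical: three healthy processes, and a shared database that holds a lock from tick 2 through tick 6. A "deep" probe that queries the database fails on every backend at the same tick, so the pool drops from 3 to 0 at once; a process-only probe never notices.

```python
from dataclasses import dataclass
from typing import Callable, List

HEALTHY_THRESHOLD = 2    # mirrors the HealthCheckConfig example above
UNHEALTHY_THRESHOLD = 3


@dataclass
class SimBackend:
    name: str
    healthy: bool = True
    successes: int = 0
    failures: int = 0


def shared_db_ok(tick: int) -> bool:
    # Hypothetical shared database holding a lock during ticks 2 through 6.
    return not 2 <= tick <= 6


def deep_check(tick: int) -> bool:
    # The process is fine, but the probe also runs a database query.
    return shared_db_ok(tick)


def shallow_check(tick: int) -> bool:
    # Process-only probe; the processes never fail in this scenario.
    return True


def healthy_counts(check: Callable[[int], bool], ticks: int = 10) -> List[int]:
    """Apply consecutive-threshold logic; return the healthy count per tick."""
    pool = [SimBackend(f"app-{i}") for i in range(3)]
    counts = []
    for tick in range(ticks):
        for b in pool:
            if check(tick):
                b.failures, b.successes = 0, b.successes + 1
                if b.successes >= HEALTHY_THRESHOLD:
                    b.healthy = True
            else:
                b.successes, b.failures = 0, b.failures + 1
                if b.failures >= UNHEALTHY_THRESHOLD:
                    b.healthy = False
        counts.append(sum(b.healthy for b in pool))
    return counts


deep = healthy_counts(deep_check)        # [3, 3, 3, 3, 0, 0, 0, 0, 3, 3]
shallow = healthy_counts(shallow_check)  # [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
```

The thresholds delay the outage but cannot prevent it: all backends share one failure cause, so they cross the threshold together, and the healthy_threshold adds an extra tick of downtime after the database recovers.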
Production Insight
Health checks that depend on external services cause cascade failures.
A database lock can mark all backends unhealthy simultaneously.
Rule: health check endpoints must test only the process, not dependencies.
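One way to follow that rule is to split liveness from readiness and keep both free of I/O. The sketch below uses only Python's standard library http.server; the /health/live path and the ready flag are hypothetical, and /health/ready matches the path in the HealthCheckConfig example above. Readiness reads an in-memory flag the application flips after warm-up (and clears on shutdown, which also helps connection draining), rather than running a database query on every probe.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical flag the application sets after warm-up and clears on shutdown.
    ready = threading.Event()

    def do_GET(self) -> None:
        if self.path == "/health/live":
            # Liveness: the process is up and serving HTTP. No I/O at all.
            self._reply(200, b"ok")
        elif self.path == "/health/ready":
            # Readiness: an in-memory flag, not a database query, so a slow or
            # locked shared database cannot fail every backend at once.
            if self.ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"not ready")
        else:
            self._reply(404, b"not found")

    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep probe traffic out of stderr


# Ignore any HTTP(S)_PROXY settings so probes go straight to localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def status_of(url: str) -> int:
    try:
        with _opener.open(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


server = ThreadingHTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

live_status = status_of(base + "/health/live")
ready_before = status_of(base + "/health/ready")  # flag not set yet
HealthHandler.ready.set()                         # e.g. caches warmed, config loaded
ready_after = status_of(base + "/health/ready")

server.shutdown()
server.server_close()
```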
Key Takeaway
Health checks remove failed servers before clients see errors.
Consecutive threshold prevents flapping from transient failures.
Connection draining preserves in-flight requests during server removal.
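LoadBalancer.remove_backend in the first listing only flips the state to DRAINING; something still has to wait for in-flight work. The sketch below shows that waiting step with a condition variable; DrainableBackend, request, and drain are hypothetical names. drain() refuses new requests immediately, then blocks until the in-flight count reaches zero or a timeout expires, which is the same idea as the deregistration delay on AWS target groups.

```python
import threading
import time
from contextlib import contextmanager
from typing import Iterator


class DrainableBackend:
    """Counts in-flight requests so removal can wait for them."""

    def __init__(self, address: str):
        self.address = address
        self.draining = False
        self._in_flight = 0
        self._cond = threading.Condition()

    @contextmanager
    def request(self) -> Iterator[None]:
        with self._cond:
            if self.draining:
                # A real balancer would route elsewhere instead of raising.
                raise RuntimeError(f"{self.address} is draining")
            self._in_flight += 1
        try:
            yield
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def drain(self, timeout: float) -> bool:
        """Refuse new requests, then wait up to `timeout` seconds for in-flight ones.

        Returns True if every in-flight request finished before the deadline.
        """
        with self._cond:
            self.draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)


backend = DrainableBackend("10.0.1.12:8080")
started = threading.Event()


def slow_request() -> None:
    with backend.request():
        started.set()
        time.sleep(0.2)  # simulated in-flight work


worker = threading.Thread(target=slow_request)
worker.start()
started.wait()                        # the request is now in flight

drained = backend.drain(timeout=2.0)  # returns once slow_request finishes
worker.join()

try:
    with backend.request():
        pass
    new_request_accepted = True
except RuntimeError:
    new_request_accepted = False
```

If drain() returns False, the timeout expired with requests still running; the caller then decides whether to cut them off or keep waiting.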

Session Persistence and Sticky Sessions

Session persistence, also called sticky sessions, ensures that requests from the same client are consistently routed to the same backend server. This is required when backend servers maintain in-memory session state that is not shared across the pool.

Sticky sessions are implemented through three mechanisms: cookie-based persistence (the balancer sets a cookie identifying the backend), IP-based persistence (routing by client IP hash), or application-controlled persistence (the application signals which backend to use). Each mechanism has different trade-offs for reliability and scalability.

io.thecodeforge.loadbalancer.persistence.py (Python)
import hashlib
import time
from typing import Dict, Optional
from dataclasses import dataclass
from io.thecodeforge.loadbalancer.core import BackendServer


@dataclass
class SessionEntry:
    backend_address: str
    created_at: float
    last_used: float
    ttl_seconds: float
    
    def is_expired(self) -> bool:
        return time.time() - self.last_used > self.ttl_seconds


class SessionPersistenceManager:
    """
    Manages session affinity between clients and backends.
    Supports cookie-based and IP-based persistence.
    """
    
    def __init__(self, ttl_seconds: int = 3600, cookie_name: str = "SERVERID"):
        self._sessions: Dict[str, SessionEntry] = {}
        self._ttl = ttl_seconds
        self._cookie_name = cookie_name
    
    def get_backend_for_client(
        self,
        client_id: str,
        healthy_backends: list
    ) -> Optional[BackendServer]:
        """
        Look up persisted backend for client.
        Returns None if session expired or backend unhealthy.
        """
        entry = self._sessions.get(client_id)
        
        if entry is None or entry.is_expired():
            return None
        
        for backend in healthy_backends:
            if backend.address == entry.backend_address:
                entry.last_used = time.time()
                return backend
        
        del self._sessions[client_id]
        return None
    
    def persist_session(self, client_id: str, backend: BackendServer) -> None:
        """
        Create or update session affinity for a client.
        """
        self._sessions[client_id] = SessionEntry(
            backend_address=backend.address,
            created_at=time.time(),
            last_used=time.time(),
            ttl_seconds=self._ttl
        )
    
    def extract_client_id_from_cookie(self, cookies: dict) -> Optional[str]:
        """
        Extract client identifier from load balancer cookie.
        """
        return cookies.get(self._cookie_name)
    
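    def create_session_cookie(self, client_id: str, backend: BackendServer) -> dict:
        """
        Create cookie attributes for session persistence.
        The value is the opaque client_id: it is the key that persist_session
        stores and extract_client_id_from_cookie returns, so the lookup in
        get_backend_for_client succeeds and backend addresses never reach the client.
        """
        return {
            "name": self._cookie_name,
            "value": client_id,
            "max_age": self._ttl,
            "path": "/",
            "http_only": True,
            "secure": True
        }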
    
    def cleanup_expired(self) -> int:
        """
        Remove expired session entries.
        Returns count of removed entries.
        """
        expired = [
            k for k, v in self._sessions.items()
            if v.is_expired()
        ]
        for key in expired:
            del self._sessions[key]
        return len(expired)
    
    @property
    def active_sessions(self) -> int:
        return len(self._sessions)


class ConsistentHashPersistence:
    """
    IP-based persistence using consistent hashing.
    Minimizes redistribution when backends are added or removed.
    """
    
    def __init__(self, virtual_nodes: int = 150):
        self._virtual_nodes = virtual_nodes
        self._ring: Dict[int, str] = {}
    
    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
    
    def add_backend(self, backend: BackendServer) -> None:
        for i in range(self._virtual_nodes):
            vnode_key = f"{backend.address}#{i}"
            hash_val = self._hash(vnode_key)
            self._ring[hash_val] = backend.address
    
    def remove_backend(self, backend: BackendServer) -> None:
        for i in range(self._virtual_nodes):
            vnode_key = f"{backend.address}#{i}"
            hash_val = self._hash(vnode_key)
            self._ring.pop(hash_val, None)
    
    def get_backend(self, client_ip: str) -> Optional[str]:
        if not self._ring:
            return None
        
        hash_val = self._hash(client_ip)
        sorted_hashes = sorted(self._ring.keys())
        
        for h in sorted_hashes:
            if h >= hash_val:
                return self._ring[h]
        
        return self._ring[sorted_hashes[0]]
When to Avoid Sticky Sessions
  • Sticky sessions create uneven load distribution when some clients are more active
  • Server failure loses all sessions bound to that server
  • Horizontal scaling is limited — new servers get no existing traffic
  • Prefer shared session stores (Redis, Memcached) over sticky sessions when possible
  • If sticky sessions are required, set reasonable TTLs and monitor session distribution
Production Insight
Sticky sessions create hotspots when client activity follows a power law, with a few heavy clients generating most of the requests.
A single active client can saturate one backend while others idle.
Rule: monitor per-backend connection counts and alert on distribution skew exceeding 2x.
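One way to express the 2x rule is to compare the busiest backend's connection count with the pool average. A minimal sketch, assuming you already collect per-backend active-connection counts into a dict; the `distribution_skew` and `skew_alert` helpers and the sample numbers are hypothetical.

```python
from statistics import mean
from typing import Dict


def distribution_skew(connections: Dict[str, int]) -> float:
    """Ratio of the busiest backend's connections to the pool average."""
    avg = mean(connections.values())
    return max(connections.values()) / avg if avg else 0.0


def skew_alert(connections: Dict[str, int], threshold: float = 2.0) -> bool:
    return distribution_skew(connections) > threshold


# Example: one sticky heavy client keeps web-01 far busier than its peers
counts = {"web-01": 420, "web-02": 60, "web-03": 60}
print(f"skew={distribution_skew(counts):.2f}x alert={skew_alert(counts)}")
# skew=2.33x alert=True
```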
Key Takeaway
Sticky sessions route repeat clients to the same backend.
Cookie-based persistence is more reliable than IP-based.
Shared session stores eliminate the need for sticky sessions entirely.

The Real Cost of Ignoring Network Topology

Most tutorials skip the part where your load balancer becomes a single point of failure because you plopped it on one switch in one rack. Here's the truth: if your load balancer goes down, so does your entire service. This isn't a theoretical problem — I've cleaned up the mess when a misconfigured load balancer in a single availability zone took out a payment processing pipeline during Black Friday.

Before you install anything, draw out your physical and logical topology. Where does the load balancer sit relative to your upstream routers, firewalls, and backend servers? Are you using anycast IPs for global distribution? Do you have redundant load balancers in different physical locations? The rule is simple: two load balancers in different fault domains, each capable of handling 100% of your traffic. Anything less is gambling.

For software load balancers like HAProxy, this means spinning up at least two instances with a floating IP between them. For cloud setups, use multiple availability zones and a DNS-based failover mechanism. Don't assume your cloud provider's single load balancer has built-in redundancy — check the documentation and test the failure scenario yourself.

NetworkTopologyCheck.pyPYTHON
# io.thecodeforge — system-design tutorial

import socket

# Simulate checking load balancer placement and redundancy
load_balancer_ips = ["10.0.1.10", "10.0.2.10"]  # one LB per AZ: survives a single-AZ failure
backend_pools = {
    "us-east-1a": ["10.0.1.20", "10.0.1.21"],
    "us-east-1b": ["10.0.2.20", "10.0.2.21"]
}

def verify_topology(lb_ips, pools):
    print(f"Checking {len(lb_ips)} load balancer IPs...")
    for ip in lb_ips:
        try:
            # Open and immediately close a TCP connection to the listener
            with socket.create_connection((ip, 80), timeout=2):
                print(f"  {ip}: reachable")
        except OSError:  # timeouts, refused connections, no route to host
            print(f"  {ip}: UNREACHABLE — check firewall or routing")

    for az, servers in pools.items():
        if len(servers) < 2:
            print(f"  WARNING: {az} has {len(servers)} server(s) — single point of failure")
        else:
            print(f"  {az}: {len(servers)} servers — redundant")

verify_topology(load_balancer_ips, backend_pools)
Output
Checking 2 load balancer IPs...
10.0.1.10: reachable
10.0.2.10: reachable
us-east-1a: 2 servers — redundant
us-east-1b: 2 servers — redundant
Production Trap:
If you're running a single load balancer instance, you already have a failure mode. Always deploy an active-standby pair with health checks that automatically fail over. Test this by pulling the plug on the primary — not by reading a document.
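On Linux, the automatic failover half of an active-standby pair is commonly handled by keepalived (VRRP), which moves the floating IP when the primary stops advertising. The standby's decision logic can be sketched as a heartbeat counter. This is a hypothetical simulation, not keepalived's implementation, and `missed_limit` is an assumed parameter.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Standby side of an active-standby LB pair (simulated heartbeats)."""
    missed_limit: int = 3    # consecutive missed heartbeats before promotion
    missed: int = 0
    is_active: bool = False  # True once this node owns the floating IP

    def on_tick(self, heartbeat_received: bool) -> bool:
        if heartbeat_received:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.missed_limit and not self.is_active:
                # A real pair would claim the floating IP here (keepalived,
                # for example, sends gratuitous ARP); we only flip a flag
                self.is_active = True
        return self.is_active


standby = StandbyMonitor()
# Primary answers twice, then someone pulls its plug
for tick, alive in enumerate([True, True, False, False, False]):
    print(tick, "active" if standby.on_tick(alive) else "standby")
```

Requiring several consecutive misses trades a few seconds of failover time for protection against promoting the standby on a single dropped packet.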
Key Takeaway
Two load balancers in different fault domains, each sized for 100% traffic, is the minimum for production.

Scale or Die: Why Static Configs Are a Liability

You configured your load balancer for 100 requests per second. Two months later, you're getting 10,000. What happens? Either the load balancer falls over, or you've manually hacked together a scaling script that's three days behind. Neither is acceptable.

Scaling a load balancer isn't just about adding more instances. It's about making sure your configuration can grow with your traffic without requiring a restart or manual intervention. That means using service discovery — tools like Consul, etcd, or cloud-native autoscaling groups — to automatically register and deregister backend servers as they come and go. Hardcoding IPs in a config file is a short-term hack; it's tech debt that will bite you when your deployment pipeline pushes 50 new instances in a rolling update and the load balancer is still pointing at the old ones.

The pattern is simple: your load balancer doesn't know individual server addresses. Instead, it queries a registry for the current pool of healthy instances. If a server is being decommissioned, the registry removes it, and the load balancer stops sending traffic within a health check interval. For cloud environments, this maps directly to auto-scaling groups — the load balancer is the entry point, and the group handles the rest. For on-prem, you'll need a service mesh or a dynamic DNS resolver.

AutoScaleDiscovery.pyPYTHON
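# io.thecodeforge — system-design tutorial

import time

import requests

# Poll Consul's health API for "web-server" instances whose checks all pass
def get_backend_servers(registry_url, service="web-server"):
    try:
        response = requests.get(
            f"{registry_url}/v1/health/service/{service}",
            params={"passing": "true"},
            timeout=2,
        )
        response.raise_for_status()
        # Each entry has Node, Service and Checks; an empty Service.Address
        # means the instance uses its node's address
        return [e["Service"]["Address"] or e["Node"]["Address"] for e in response.json()]
    except (requests.RequestException, ValueError, KeyError) as e:
        print(f"Service discovery failed: {e}")
        return None  # unknown, which is not the same as "zero healthy backends"

# Simulate the load balancer refreshing its pool
current_pool = []
for cycle in range(3):
    new_pool = get_backend_servers("http://consul-server:8500")
    if new_pool is None:
        # Never empty the pool just because the registry is unreachable
        print(f"Cycle {cycle+1}: discovery failed, keeping last known pool")
    else:
        print(f"Cycle {cycle+1}: discovered {len(new_pool)} healthy backends")
        if set(new_pool) != set(current_pool):
            print("  Pool changed — applying new configuration")
            current_pool = new_pool
        else:
            print("  Pool unchanged")
    time.sleep(5)  # refresh interval (assumed)

print(f"Final active pool: {current_pool}")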
Output
Cycle 1: discovered 3 healthy backends
Pool changed — applying new configuration
Cycle 2: discovered 5 healthy backends
Pool changed — applying new configuration
Cycle 3: discovered 5 healthy backends
Pool unchanged
Final active pool: ['10.0.1.20', '10.0.1.21', '10.0.1.22', '10.0.2.20', '10.0.2.21']
Senior Shortcut:
Use service discovery such as Consul, or an AWS ALB whose target group is attached to an Auto Scaling group. Either one eliminates manual reconfiguration during scaling events. If you're still hand-editing HAProxy configs for new servers, you are doing ops wrong.
Key Takeaway
Static IPs in load balancer configs are a bug waiting to happen. Always use dynamic service discovery for auto-scaling.

Maintenance: The Silent Load Balancer Killer

Load balancers don't fail at 2 PM on a Tuesday. They fail at 3 AM on a Saturday, three hours after you pushed a config change that nobody reviewed. The root cause is almost never the algorithm or the hardware. It's the maintenance workflow you ignored because it wasn't on the diagram.

Treat your load balancer config like production code. Version control it. Peer review every change. Automate rollbacks before you need them. If your config is a mess of manual SSH commands or a GUI that three people know how to use, you're one bad click away from an outage. The same applies to certificate rotation, routing rule updates, and backend pool changes — automate them or they will become your incident.

Senior engineers don't ask which load balancer to buy. They ask how you maintain it over 18 months. That's where the real cost lives.

maintenance_check.pyPYTHON
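# io.thecodeforge — system-design tutorial

import subprocess
from pathlib import Path

# Assumed locations: the live HAProxy config and a git checkout of the reviewed one
RUNNING_CONFIG = Path("/etc/haproxy/haproxy.cfg")
REPO_DIR, REPO_FILE = "/srv/lb-config", "haproxy.cfg"

def load_from_git(ref):
    # `git show <ref>:<path>` prints the file exactly as committed on that ref
    cmd = ["git", "-C", REPO_DIR, "show", f"{ref}:{REPO_FILE}"]
    return subprocess.run(cmd, capture_output=True, check=True).stdout

# Config drift check: catch it before prod does
current_config = RUNNING_CONFIG.read_bytes()
git_config = load_from_git("main")

if current_config != git_config:
    print(">> Config drift detected between running LB and git HEAD.")
    # Roll back: restore the committed file and reload HAProxy in place
    RUNNING_CONFIG.write_bytes(git_config)
    subprocess.run(["systemctl", "reload", "haproxy"], check=True)
    # Opening the review ticket depends on your tooling (not shown)
    print(">> Auto-rollback initiated. Review scheduled.")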
Output
>> Config drift detected between running LB and git HEAD.
>> Auto-rollback initiated. Review scheduled.
Production Trap:
Manual SSH tweaks to load balancer configs are one of the most common causes of silent outages. If it's not in git, it's not production.
Key Takeaway
Automate every maintenance task: config validation, rollback, certificate renewal. Your load balancer is only as reliable as your deployment pipeline.

The Real Conclusion: Stop Asking Which, Ask How

You don't need a load balancer because you read a blog post about system design. You need one because your single server is on fire, your users are angry, and you're losing money. But a load balancer is not the finish line. It's a tool that introduces its own problems: state management, backend health, network topology, config drift, and maintenance debt.

The junior question is 'Which load balancer should I use?' The senior question is 'How do I operate this thing at 3 AM when it breaks?' Every section in this article — health checks, sticky sessions, algorithms, topology — points to the same reality: the load balancer is a system, not a switch. Design your operations around it before you design your traffic flow.

Ship it. Watch it. Automate fixing it. Then move on to the next bottleneck. That's production engineering.

production_checklist.pyPYTHON
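# io.thecodeforge — system-design tutorial
import os, subprocess, urllib.request

# Assumed values for illustration: repo path, health URL, drain timeout, rollback script
REPO, CONFIG, HEALTH_URL = "/srv/lb-config", "haproxy.cfg", "http://10.0.1.20:8080/health"
DRAIN_TIMEOUT_S, ROLLBACK = 30, "/srv/lb-config/rollback.sh"

def health_checks_work():
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=2) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError and timeouts
        return False

# Minimum viable load balancer ops checklist
def pre_deploy_check():
    # `git ls-files --error-unmatch` exits non-zero if the file is untracked
    tracked = subprocess.run(["git", "-C", REPO, "ls-files", "--error-unmatch", CONFIG],
                             capture_output=True).returncode == 0
    # Explicit checks rather than assert, which `python -O` strips out
    checks = {"Config not in git": tracked, "Health checks failing": health_checks_work(),
              "Drain not set": DRAIN_TIMEOUT_S > 0, "No rollback": os.access(ROLLBACK, os.X_OK)}
    failed = [msg for msg, ok in checks.items() if not ok]
    if failed:
        raise SystemExit("Not ready: " + "; ".join(failed))
    print(">> Ready to deploy")

pre_deploy_check()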
Output
>> Ready to deploy
Senior Shortcut:
When you finish a load balancer setup, remove write access to the UI for everyone and leave it read-only. Force all config changes through the API.
Key Takeaway
A load balancer is a system you operate, not a box you plug in. Prioritize ops automation over feature selection.

Test Your Load Balancer Before It Tests You

Most teams deploy a load balancer, configure it once, and assume it works. That assumption eventually costs you an outage. Load balancers must be tested under real traffic patterns, failure modes, and scale. Start with synthetic traffic from tools like wrk or locust to verify algorithm behavior: round robin gives every backend the same number of requests, but that only means equal load when requests cost roughly the same. Then inject failures: kill a backend and confirm that health checks remove it and that connection draining completes before client timeouts fire. Test session persistence: sticky cookies must survive restarts and scaling events. Finally, test at your target traffic volume plus 50% headroom. Without testing, your load balancer is a single point of failure disguised as resilience. Automate these tests in CI/CD so every deployment validates the full path from client to backend. A load balancer that hasn't been tested is a liability, not a solution.
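Equal request counts are not equal load. A small deterministic simulation, with hypothetical arrival times and durations, shows round robin sending every long request to the same server when durations follow a repeating pattern, while least connections spreads them out. The `simulate` helper is illustrative only, not a benchmark.

```python
from typing import Callable, List, Tuple

Request = Tuple[int, int]  # (arrival time, duration)


def simulate(requests: List[Request], n_servers: int,
             pick: Callable[[List[int], int], int]) -> List[int]:
    """Return total busy time per server for a given selection strategy."""
    work = [0] * n_servers
    ends: List[List[int]] = [[] for _ in range(n_servers)]  # finish times per server
    for i, (arrival, duration) in enumerate(requests):
        # Active connections = requests on that server not yet finished
        active = [sum(1 for e in srv if e > arrival) for srv in ends]
        s = pick(active, i)
        ends[s].append(arrival + duration)
        work[s] += duration
    return work


def round_robin(active: List[int], i: int) -> int:
    return i % len(active)


def least_connections(active: List[int], i: int) -> int:
    return min(range(len(active)), key=lambda s: active[s])  # ties: lowest index


# One request per time unit; every third request is 10x slower
reqs = [(t, [10, 1, 1][t % 3]) for t in range(30)]
print("round robin:      ", simulate(reqs, 3, round_robin))  # [100, 10, 10]
print("least connections:", simulate(reqs, 3, least_connections))
```

A load test with uniform requests would report round robin as perfectly balanced here, which is exactly why the traffic mix in your tests should resemble production.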

load_balancer_test.pyPYTHON
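# io.thecodeforge — system-design tutorial

import re
import subprocess

import requests

def health_check_test(backends):
    # Hit each backend's health endpoint directly, bypassing the LB,
    # so one dead backend cannot hide behind its healthy peers
    for backend in backends:
        response = requests.get(f"http://{backend}/health", timeout=2)
        assert response.status_code == 200, f"{backend} unhealthy"

def extract_throughput(wrk_stdout):
    # wrk prints a summary line such as "Requests/sec:  12400.52"
    match = re.search(r"Requests/sec:\s+([\d.]+)", wrk_stdout)
    return float(match.group(1)) if match else None

def concurrent_request_test(lb_url, threads=4, connections=100, duration="30s"):
    # Drive load through the LB with wrk (must be installed and on PATH)
    cmd = ["wrk", f"-t{threads}", f"-c{connections}", f"-d{duration}", lb_url]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    print(f"Requests/sec: {extract_throughput(result.stdout)}")

if __name__ == "__main__":
    lb = "http://my-lb.example.com"
    health_check_test(["10.0.1.1:8080", "10.0.1.2:8080"])
    concurrent_request_test(lb)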
Output
Health checks passed. Throughput: 12,400 req/s with 0.5% error rate.
Production Trap:
Unit tests against a mock load balancer miss real-world issues like concurrency bugs and socket exhaustion. Always test against a real instance with production-like traffic.
Key Takeaway
Automate load balancer tests under realistic traffic, failure, and scale conditions before going live.

Lock Down Your Load Balancer – Security Is Not Optional

A load balancer is the front door to your infrastructure. If it's insecure, every backend behind it is exposed. Start with TLS termination: serve HTTPS only, require TLS 1.2 or later with modern cipher suites, and rotate certificates automatically. Never trust client-supplied IP headers: X-Forwarded-For is trivial to spoof, so the load balancer should overwrite or sanitize it, and backends should trust only entries added by proxies you control. Configure rate limiting per IP and per path to blunt abusive clients and application-layer floods before they reach your application. Disable unused ports and protocols; a load balancer is not a general-purpose proxy. For API backends, enforce authentication at the load balancer using tokens or mTLS. Finally, log every request with source IP, method, path, and response code; audit logs are your forensic evidence after an incident. A misconfigured load balancer leaks data, amplifies attacks, and erodes trust. Treat it as a security boundary, not a networking convenience.
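A common way to avoid trusting a spoofed X-Forwarded-For is to read the header from the right and skip only addresses of proxies you control; everything left of the first untrusted hop was written by the client. A minimal sketch using the standard `ipaddress` module, assuming the trusted proxy range below (hypothetical, adjust to your network):

```python
import ipaddress
from typing import Iterable


def client_ip(peer_addr: str, xff_header: str,
              trusted: Iterable[str] = ("10.0.0.0/8",)) -> str:
    """Pick the real client IP from X-Forwarded-For, ignoring spoofable entries.

    peer_addr is the address of the TCP connection that reached this hop.
    """
    nets = [ipaddress.ip_network(n) for n in trusted]

    def is_trusted(addr: str) -> bool:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:  # garbage in the header is never trusted
            return False
        return any(ip in net for net in nets)

    # Hops in arrival order: header entries, then the directly connected peer
    hops = [h.strip() for h in xff_header.split(",") if h.strip()] + [peer_addr]
    # Walk right to left; the first hop we do not control is the client
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return hops[0]  # every hop is internal: the request originated inside


# Client sent a forged "6.6.6.6"; our LB (10.0.0.5) appended the real address
print(client_ip("10.0.0.5", "6.6.6.6, 203.0.113.7"))  # 203.0.113.7
```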

secure_lb_config.pyPYTHON
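# io.thecodeforge — system-design tutorial

import socket
import ssl
import time
from datetime import datetime, timezone

from cryptography import x509  # third-party: cryptography >= 42

# Per-process counters for illustration only; a multi-node LB tier needs a shared
# store (e.g. Redis INCR + EXPIRE) so every node sees the same counts
_counters = {}

def rate_limiter(client_ip, max_requests=100, window=60):
    # Fixed-window limit: one counter per client per window-second bucket.
    # Old buckets are never evicted here; a real store would expire them.
    key = f"{client_ip}:{int(time.time() // window)}"
    count = _counters.get(key, 0)
    if count >= max_requests:
        return False
    _counters[key] = count + 1
    return True

def cert_not_expired(cert_path):
    # Checks certificate expiry only; the TLS protocol version is a property
    # of the connection, not of the certificate
    with open(cert_path, "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    return cert.not_valid_after_utc > datetime.now(timezone.utc)

def negotiates_modern_tls(host, port=443):
    # Handshake against the LB and confirm TLS 1.2 or 1.3 was negotiated
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version() in ("TLSv1.2", "TLSv1.3")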
Output
Rate limit active. TLS 1.2 certificate valid until 2026-04-12.
Production Trap:
Allowing HTTP on the load balancer for 'internal tools' creates a downgrade path. Enforce HTTPS globally, even for health checks—use a separate management port if needed.
Key Takeaway
Treat your load balancer as the security perimeter: enforce TLS, rate limiting, IP validation, and logging from day one.

Introduction

A load balancer is the unsung hero of distributed systems—a traffic cop that distributes incoming requests across multiple servers to ensure reliability, scalability, and performance. Without it, a single server bears the full load, becoming a bottleneck and a single point of failure. Load balancers work at different OSI layers: Layer 4 (transport) forwards TCP/UDP traffic based on IP and port, while Layer 7 (application) inspects HTTP headers, cookies, or URLs for smarter routing. They hide backend complexity from clients, enabling horizontal scaling, fault tolerance, and seamless maintenance. Think of it as a reverse proxy with traffic management superpowers. Why care? Because your users expect zero downtime, and your backend needs to handle spikes without breaking. A load balancer isn't just an optional nicety—it's the first brick in a resilient architecture.

load_balancer_intro.pyPYTHON
# io.thecodeforge — system-design tutorial

def simple_round_robin(servers, request_idx):
    return servers[request_idx % len(servers)]

servers = ['web-01', 'web-02', 'web-03']
for i in range(6):
    print(f'Request {i} -> {simple_round_robin(servers, i)}')
Production Trap:
Start simple: round robin is fine, but on its own it keeps sending traffic to a dead server. Always pair the algorithm with health monitoring.
Key Takeaway
Always design for failure; a load balancer is your first line of defense.
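Pairing round robin with health monitoring comes down to one rule: the rotation only ever contains servers that pass their checks. A minimal sketch, assuming health results land in a dict that some checker updates elsewhere; the `HealthAwareRoundRobin` class is hypothetical.

```python
from typing import Dict, List, Optional


class HealthAwareRoundRobin:
    """Round robin that skips servers currently marked unhealthy."""

    def __init__(self, servers: List[str]):
        self._servers = servers
        self._next = 0
        self.healthy: Dict[str, bool] = {s: True for s in servers}

    def pick(self) -> Optional[str]:
        # Try each server at most once per call, starting where we left off
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if self.healthy[server]:
                return server
        return None  # nothing healthy: caller decides whether to fail open


lb = HealthAwareRoundRobin(['web-01', 'web-02', 'web-03'])
lb.healthy['web-02'] = False  # health check just failed
print([lb.pick() for _ in range(4)])  # ['web-01', 'web-03', 'web-01', 'web-03']
```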

Advantages

Load balancers deliver immediate wins: high availability by rerouting traffic away from failed servers; scalability by adding servers without client changes; performance via request distribution and, on some products, caching; and security by hiding backend IPs and offloading TLS. They enable zero-downtime deployments through connection draining and rolling updates. Operational benefits include centralized health monitoring, simpler maintenance windows, and cost efficiency: you can run commodity hardware instead of a single super-server. Why is this better? Because users see lower latency, fewer timeouts, and consistent response times during traffic spikes. Load balancing can also help absorb DDoS traffic by spreading it across machines. The elegance is in the abstraction: clients see one IP, while the backend fleet evolves independently. In microservices architectures, load balancers become ingress controllers, routing API calls to the right service. The bottom line: they turn a fragile single server into a resilient, elastic system that grows with your business.
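Connection draining, named above as an enabler of zero-downtime deployments, means two things: stop giving a backend new requests, then wait (up to a deadline) for its in-flight requests to finish before removing it. A minimal thread-safe sketch of that state machine; the class and method names are hypothetical, not any load balancer's API.

```python
import threading


class DrainableBackend:
    """Tracks in-flight requests so a backend can be drained before removal."""

    def __init__(self, name: str):
        self.name = name
        self._in_flight = 0
        self._draining = False
        self._cond = threading.Condition()

    def try_acquire(self) -> bool:
        # New requests are refused once draining starts
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout: float) -> bool:
        """Stop new work, then wait for in-flight requests; False on timeout."""
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


backend = DrainableBackend("web-02")
backend.try_acquire()                          # one request already in flight
threading.Timer(0.1, backend.release).start()  # it finishes 100 ms later
print(backend.drain(timeout=30))   # True: drained before the deadline
print(backend.try_acquire())       # False: a draining backend takes no new work
```

If `drain` returns False, the deadline passed with requests still running; at that point a real deployment either waits longer or accepts cutting those connections.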

lb_advantages.pyPYTHON
# io.thecodeforge — system-design tutorial
# Simulated health state: pretend web-02 has stopped responding
DOWN = {"web-02"}

def health_check(server):
    return server not in DOWN

active = [s for s in ['web-01', 'web-02', 'web-03'] if health_check(s)]
print(f'Healthy servers: {active}')  # Healthy servers: ['web-01', 'web-03']
Production Trap:
Benefits vanish without proper health checks—a dead server still gets traffic if you skip them.
Key Takeaway
Load balancers multiply availability but demand robust health monitoring to realize gains.

Disadvantages

Load balancers introduce complexity: a single instance is itself a single point of failure unless you deploy a redundant pair (active-passive or active-active). Configuration drift across environments can cause routing nightmares. Debugging gets harder because traffic flows through extra hops that obscure root causes. Each extra hop adds latency, and Layer 7 processing costs more CPU than Layer 4. Sticky sessions (session persistence) can cause uneven load distribution. Costs increase: hardware load balancers are expensive, and cloud ones add monthly fees. Misconfigured health checks can take healthy servers offline or leave dead servers in rotation. Static configurations break under scale, and dynamic discovery (Consul, etcd) adds another layer. The underlying reason: these disadvantages stem from treating load balancers as magic boxes instead of stateful components. You must monitor them, plan for failure, and test changes. The tradeoff is simple: added complexity for a large reliability gain, but only if you design for the downsides up front.

lb_disadvantages.pyPYTHON
# io.thecodeforge — system-design tutorial
import random

def weighted_routing(servers, weights):
    total = sum(weights)
    r = random.uniform(0, total)
    cumulative = 0
    for s, w in zip(servers, weights):
        cumulative += w
        if r <= cumulative:
            return s

print(weighted_routing(['fast', 'slow', 'medium'], [5, 1, 3]))
Production Trap:
Sticky sessions hide server failures—users get errors when the stuck server dies.
Key Takeaway
Load balancers amplify complexity; always pair them with monitoring and failover planning.
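A common guard against both health-check failure modes above (healthy servers dropped on one blip, dead servers kept too long) is hysteresis: mark a server down only after several consecutive failures, and up only after several consecutive passes. HAProxy exposes this idea through the `rise` and `fall` parameters on a server line; the sketch below imitates the idea and is not HAProxy code. The default counts here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Debounced health: flip state only after consecutive agreeing checks."""
    rise: int = 2    # consecutive passes needed to mark a down server up
    fall: int = 3    # consecutive failures needed to mark an up server down
    up: bool = True
    streak: int = 0  # consecutive results that disagree with the current state

    def record(self, passed: bool) -> bool:
        if passed == self.up:
            self.streak = 0  # result agrees with the current state
        else:
            self.streak += 1
            if self.streak >= (self.fall if self.up else self.rise):
                self.up = passed
                self.streak = 0
        return self.up


server = HealthState()
checks = [False, True, False, False, False, True, True]
print([server.record(c) for c in checks])
# [True, True, True, True, False, False, True]
```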
Production incident (post-mortem, severity: high)

Load Balancer Health Check Misconfiguration Causes Complete Outage

Symptom
All HTTP requests returned 503 Service Unavailable for 12 minutes during a routine deployment. Zero servers were listed as healthy in the load balancer dashboard.
Assumption
The deployment introduced a bug in the application code that broke the health check endpoint.
Root cause
The health check endpoint performed a database query. During deployment, a schema migration locked the users table for 90 seconds. Every health check query timed out, marking all servers unhealthy simultaneously. The load balancer had no minimum healthy server threshold configured.
Fix
Changed the health check endpoint to a lightweight liveness probe that does not query the database. Added a separate readiness probe for database connectivity. Configured the minimum_healthy_hosts setting so the load balancer keeps at least one server in rotation even when every server fails its health check. Implemented connection draining with a 30-second grace period during deployments.
Key lesson
  • Health check endpoints must be lightweight — never depend on external services
  • Separate liveness checks from readiness checks to prevent cascading removal
  • Configure minimum_healthy_hosts to prevent complete pool exhaustion
  • Always use connection draining during deployments to preserve in-flight requests
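The first two lessons can be sketched with the standard library: `/healthz` reports only that the process is serving, while `/ready` checks the database separately, so a locked table fails readiness without failing the check the load balancer uses. The endpoint paths, the database address, and the TCP-connect readiness test are assumptions; a real check might run `SELECT 1` with a statement timeout instead.

```python
import socket
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable

DB_HOST, DB_PORT = "db.internal", 5432  # assumed database address
DB_TIMEOUT_S = 0.5                      # assumed readiness budget


def db_reachable() -> bool:
    # Assumption: a bounded TCP connect is enough signal for this sketch
    try:
        with socket.create_connection((DB_HOST, DB_PORT), timeout=DB_TIMEOUT_S):
            return True
    except OSError:
        return False


def probe_status(path: str, db_check: Callable[[], bool]) -> int:
    if path == "/healthz":
        # Liveness: never touches the database, so a locked table
        # cannot take the whole pool out of rotation
        return 200
    if path == "/ready":
        # Readiness: dependency check, reported separately from liveness
        return 200 if db_check() else 503
    return 404


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(probe_status(self.path, db_reachable))
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port: int = 8080) -> None:
    # Point the load balancer's health check at /healthz, not /ready
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```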
Production debug guide: common symptoms when load balancing behaves unexpectedly
Symptom · 01
One server receives significantly more traffic than others
Fix
Check if session persistence (sticky sessions) is enabled. Verify the load balancing algorithm matches your traffic pattern. Inspect connection pooling behavior in clients.
Symptom · 02
Intermittent 502 or 503 errors during deployments
Fix
Enable connection draining with adequate grace period. Verify health check frequency and thresholds allow for deployment lag. Check if new instances pass health checks before receiving traffic.
Symptom · 03
Latency spikes correlate with specific backend servers
Fix
Compare per-server request rates and response times. Check for noisy neighbor issues on shared infrastructure. Verify instance types are identical across the pool.
Symptom · 04
All requests fail after adding new servers to the pool
Fix
Verify new servers pass health checks before traffic is routed. Check security group and network ACL rules allow load balancer to reach new instances. Confirm application is fully started on new servers.
Load Balancer Quick Debug Reference: symptom-based guide to diagnosing load balancer issues
Uneven traffic distribution across backends
Immediate action
Check load balancer algorithm and session persistence settings
Commands
kubectl get endpoints <service-name> -o wide
aws elbv2 describe-target-health --target-group-arn <arn>
Fix now
Disable sticky sessions unless session affinity is explicitly required, and let round robin or least connections spread the load
Health checks failing across all backends simultaneously
Immediate action
Test health check endpoint directly on each backend server
Commands
curl -v http://localhost:8080/health
kubectl logs <pod-name> --tail=50 | grep -i health
Fix now
Verify health check path, port, and timeout are correct. Ensure endpoint does not depend on external services
Connection refused errors after scaling event
Immediate action
Verify new instances are registered and passing health checks
Commands
aws elbv2 describe-target-health --target-group-arn <arn> --query 'TargetHealthDescriptions[*].TargetHealth.State'
ss -tlnp | grep <backend-port>
Fix now
Add a startup probe with a longer initial delay to prevent premature traffic routing
Load Balancing Algorithm Comparison

| Algorithm | Distribution | Session Affinity | Best For | Drawback |
|---|---|---|---|---|
| Round Robin | Sequential, equal | None | Uniform short requests | Hotspots with variable-duration requests |
| Weighted Round Robin | Proportional to weight | None | Heterogeneous server capacities | Requires accurate weight configuration |
| Least Connections | Fewest active connections | None | Variable request durations | Slightly higher selection overhead |
| IP Hash | Consistent by client IP | Yes (implicit) | Session affinity without cookies | Uneven distribution with few clients |
| P2C Least Connections | Random pair, pick fewer | None | High-scale uniform distribution | Randomness can cause temporary imbalance |
| Cookie-based | Consistent by cookie | Yes (explicit) | Stateful web applications | Session loss on server failure |
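The P2C row is short enough to sketch: sample two distinct backends at random and send the request to the one with fewer active connections. This is a generic illustration with a seeded RNG and made-up backend names, not any particular proxy's implementation.

```python
import random
from typing import Dict


def p2c_pick(active: Dict[str, int], rng: random.Random) -> str:
    """Power of two choices: compare two random backends, keep the less loaded."""
    if len(active) == 1:
        return next(iter(active))
    a, b = rng.sample(list(active), 2)  # two distinct candidates
    return a if active[a] <= active[b] else b


rng = random.Random(42)
active = {"web-01": 0, "web-02": 0, "web-03": 0, "web-04": 0}
for _ in range(1000):
    chosen = p2c_pick(active, rng)
    active[chosen] += 1  # long-lived connections: nothing ever closes here
print(active)
```

Comparing only two candidates avoids scanning the whole pool on every request, yet the most loaded backend is picked only when its rival is equally loaded, which keeps the spread tight.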

Key takeaways

1. Load balancers distribute traffic across servers for availability, scalability, and fault tolerance
2. Layer 4 routes by IP/port: fast and protocol-agnostic. Layer 7 routes by HTTP content: flexible but slower
3. Algorithm choice matters: least-connections for variable workloads, round-robin for uniform requests
4. Health checks must be lightweight and independent: never depend on shared resources
5. Connection draining and minimum_healthy_hosts prevent cascading failures during deployments
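What a minimum-healthy-hosts setting buys you can be sketched as a fail-open rule: if too few backends pass health checks, route across the whole pool instead of returning 503 for everything. Envoy's panic threshold is a similar idea; the function below is a generic illustration, and its exact semantics are assumptions rather than any product's rules.

```python
from typing import Dict, List


def routable_pool(health: Dict[str, bool], min_healthy: int = 1) -> List[str]:
    """Return the backends to route to, failing open below min_healthy."""
    healthy = [b for b, ok in health.items() if ok]
    if len(healthy) >= min_healthy:
        return healthy
    # Too few pass: the checks themselves may be the problem (e.g. a shared
    # DB lock), so degrade across the full pool instead of a total outage
    return list(health)


# A schema migration locks the table and every DB-backed check times out
print(routable_pool({"web-01": False, "web-02": False, "web-03": False}))
# ['web-01', 'web-02', 'web-03']
print(routable_pool({"web-01": True, "web-02": False, "web-03": True}))
# ['web-01', 'web-03']
```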

Common mistakes to avoid


Health check endpoint depends on database or external service

Symptom
All backends marked unhealthy simultaneously during database maintenance, causing complete outage
Fix
Use a lightweight liveness endpoint that checks only process status. Add a separate readiness endpoint for dependency checks.

No connection draining configured during deployments

Symptom
In-flight requests fail with connection reset errors when servers are removed from the pool
Fix
Configure connection draining with at least 30-second grace period. Verify load balancer waits for active connections to complete before removing backends.

Using round-robin with long-lived WebSocket connections

Symptom
Servers that were in the pool early keep accumulating long-lived WebSocket connections, while servers added or restarted later receive little traffic
Fix
Switch to the least-connections algorithm for workloads with persistent connections. Monitor per-backend connection counts.

No minimum healthy hosts configured

Symptom
Brief health check failures remove all servers from rotation, causing total outage instead of partial degradation
Fix
Set minimum_healthy_hosts to at least 1. Accept degraded service with unhealthy backends rather than complete failure.

Sticky sessions with no TTL or cleanup

Symptom
Session table grows unbounded, consuming memory. Entries that still reference removed servers cause routing failures
Fix
Set explicit TTL on session entries. Implement periodic cleanup of expired sessions. Remove session entries when their backend is decommissioned.

Interview Questions on This Topic

Q01 (Junior): What is the difference between a Layer 4 and Layer 7 load balancer?
Q02 (Senior): How would you design health checks for a microservices architecture?
Q03 (Senior): A production system shows one backend server handling 80% of traffic whi...
Q01 of 03 (Junior)

What is the difference between a Layer 4 and Layer 7 load balancer?

ANSWER
A Layer 4 load balancer operates at the transport layer and makes routing decisions based on IP address and TCP/UDP port information. It does not inspect packet contents, making it fast and protocol-agnostic. It works for any TCP or UDP traffic, including non-HTTP protocols. A Layer 7 load balancer operates at the application layer and can inspect HTTP headers, URLs, cookies, and request content. This enables sophisticated routing rules like directing /api requests to API servers and /static requests to static file servers. It can also perform SSL termination, modify headers, and implement content-based routing. The trade-off is performance versus flexibility. Layer 4 adds minimal latency. Layer 7 adds latency from content parsing but enables routing decisions impossible at Layer 4.
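The contrast can be sketched as two routing functions: a Layer 4 balancer sees only addresses and ports, so it can hash the connection onto a backend, while a Layer 7 balancer has parsed the HTTP request and can choose a pool by path. Pool names and path prefixes below are hypothetical.

```python
import hashlib
from typing import Dict, List

API_POOL = ["api-01:8080", "api-02:8080"]
STATIC_POOL = ["static-01:8080"]
DEFAULT_POOL = ["web-01:8080", "web-02:8080"]


def _stable_index(key: str, n: int) -> int:
    # Stable across processes, unlike Python's salted built-in hash()
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % n


def l4_route(client_ip: str, client_port: int, pool: List[str]) -> str:
    # Layer 4: only addresses and ports are visible; the payload is opaque
    return pool[_stable_index(f"{client_ip}:{client_port}", len(pool))]


L7_RULES: Dict[str, List[str]] = {"/api/": API_POOL, "/static/": STATIC_POOL}


def l7_route(path: str, client_ip: str) -> str:
    # Layer 7: the HTTP request is parsed, so the path can pick the pool
    for prefix, pool in L7_RULES.items():
        if path.startswith(prefix):
            return pool[_stable_index(client_ip, len(pool))]
    return DEFAULT_POOL[_stable_index(client_ip, len(DEFAULT_POOL))]


print(l7_route("/api/orders", "203.0.113.7"))     # one of the api-* backends
print(l7_route("/static/app.js", "203.0.113.7"))  # static-01:8080
```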

Frequently Asked Questions

01. What is a load balancer in simple terms?
02. What is the difference between hardware and software load balancers?
03. What is the best load balancing algorithm?
04. Can a load balancer itself be a single point of failure?
05. What is connection draining?