What Is a Load Balancer? Types, Algorithms, and How They Work
- Load balancers distribute traffic across servers for availability, scalability, and fault tolerance
- Layer 4 routes by IP/port: fast and protocol-agnostic. Layer 7 routes by HTTP content: flexible but slower
- Algorithm choice matters: least-connections for variable workloads, round-robin for uniform requests
- A load balancer distributes incoming network traffic across multiple backend servers
- It prevents any single server from becoming overwhelmed and improves availability
- Layer 4 (transport) balances by IP and port; Layer 7 (application) balances by HTTP content
- Health checks remove unhealthy servers from rotation automatically
- Production outages often trace back to misconfigured health checks or missing connection draining
- Biggest mistake: treating load balancing as set-and-forget without monitoring distribution skew
Production Debug Guide

Common symptoms when load balancing behaves unexpectedly, with commands to start diagnosing each one.

Uneven traffic distribution across backends:

```shell
kubectl get endpoints <service-name> -o wide
aws elbv2 describe-target-health --target-group-arn <arn>
```

Health checks failing across all backends simultaneously:

```shell
curl -v http://localhost:8080/health
kubectl logs <pod-name> --tail=50 | grep -i health
```

Connection refused errors after a scaling event:

```shell
aws elbv2 describe-target-health --target-group-arn <arn> --query 'TargetHealthDescriptions[*].TargetHealth.State'
ss -tlnp | grep <backend-port>
```
Load balancers are critical infrastructure components that distribute client requests across a pool of backend servers. They improve application availability, enable horizontal scaling, and provide fault tolerance by routing traffic away from failed instances. Any production web service running on more than one server needs a load-balancing layer.
Misunderstanding load balancer algorithms, health check configurations, and session persistence mechanisms causes some of the most common production incidents. A misconfigured health check can remove all servers from rotation simultaneously, causing a complete outage. An incorrect algorithm choice can create hotspots where one server handles 80% of traffic while others sit idle.
What Is a Load Balancer?
A load balancer is a device or software component that distributes incoming network traffic across multiple backend servers. It acts as a single entry point for client requests and routes them to available servers based on a configured algorithm.
Load balancers solve three fundamental problems: availability by removing failed servers from rotation, scalability by enabling horizontal addition of servers, and performance by preventing any single server from becoming a bottleneck. Without a load balancer, every client would need to know individual server addresses, and a single server failure would cause service disruption.
```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional
import threading

from io.thecodeforge.loadbalancer.health import HealthChecker
from io.thecodeforge.loadbalancer.algorithms import LoadBalancingAlgorithm


class BackendState(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DRAINING = "draining"


@dataclass
class BackendServer:
    host: str
    port: int
    weight: int = 1
    state: BackendState = BackendState.HEALTHY
    active_connections: int = 0
    last_health_check: float = 0.0
    consecutive_failures: int = 0

    @property
    def address(self) -> str:
        return f"{self.host}:{self.port}"

    def is_available(self) -> bool:
        return self.state in (BackendState.HEALTHY, BackendState.DRAINING)


class LoadBalancer:
    """
    Production-grade load balancer with health checking,
    connection draining, and multiple routing algorithms.
    """

    def __init__(
        self,
        algorithm: LoadBalancingAlgorithm,
        health_check_interval: float = 5.0,
    ):
        self.backends: List[BackendServer] = []
        self.algorithm = algorithm
        self.health_checker = HealthChecker(interval=health_check_interval)
        self._lock = threading.Lock()
        self._minimum_healthy_hosts: int = 0

    def add_backend(self, host: str, port: int, weight: int = 1) -> BackendServer:
        """Register a new backend server with the load balancer."""
        with self._lock:
            backend = BackendServer(host=host, port=port, weight=weight)
            self.backends.append(backend)
            self.health_checker.register(backend)
            return backend

    def remove_backend(self, backend: BackendServer) -> None:
        """Gracefully remove a backend with connection draining."""
        with self._lock:
            backend.state = BackendState.DRAINING
            self.health_checker.unregister(backend)

    def select_backend(self) -> Optional[BackendServer]:
        """
        Select the next backend using the configured algorithm.
        Respects the minimum_healthy_hosts threshold.
        """
        with self._lock:
            available = [b for b in self.backends if b.is_available()]
            healthy = [b for b in available if b.state == BackendState.HEALTHY]
            if len(healthy) < self._minimum_healthy_hosts and available:
                # Below the healthy floor: route across all available
                # backends (including draining ones) rather than
                # overloading the few remaining healthy hosts.
                return self.algorithm.select(available)
            if not healthy:
                return None
            return self.algorithm.select(healthy)

    def set_minimum_healthy_hosts(self, count: int) -> None:
        """
        Configure the minimum number of healthy backends; below this
        floor, routing falls back to the full available pool.
        Prevents complete pool exhaustion during failures.
        """
        self._minimum_healthy_hosts = count


# Example usage
from io.thecodeforge.loadbalancer.algorithms import RoundRobinAlgorithm

lb = LoadBalancer(algorithm=RoundRobinAlgorithm(), health_check_interval=5.0)
lb.add_backend("10.0.1.10", 8080)
lb.add_backend("10.0.1.11", 8080)
lb.add_backend("10.0.1.12", 8080)
lb.set_minimum_healthy_hosts(1)

for i in range(6):
    backend = lb.select_backend()
    if backend:
        print(f"Request {i} -> {backend.address}")
```
- Clients connect to the load balancer, never directly to backend servers
- The balancer decides which backend receives each request
- Failed servers are removed automatically via health checks
- New servers are added without client-side changes
- The balancer itself must be redundant to avoid becoming a single point of failure
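That interaction can be sketched end to end in a few lines. This is a minimal illustration (the backend list and the round-robin cursor below are made up for the example, not part of the `io.thecodeforge` code above): the client always calls one entry point, and a backend that failed its health checks simply never appears in rotation.

```python
# Minimal failover sketch: one entry point, unhealthy backends skipped.
backends = [
    {"addr": "10.0.1.10:8080", "healthy": True},
    {"addr": "10.0.1.11:8080", "healthy": False},  # failed its health checks
    {"addr": "10.0.1.12:8080", "healthy": True},
]

_next = 0  # round-robin cursor

def select_backend():
    """Round-robin over healthy backends only; None if the pool is empty."""
    global _next
    healthy = [b for b in backends if b["healthy"]]
    if not healthy:
        return None
    chosen = healthy[_next % len(healthy)]
    _next += 1
    return chosen["addr"]

routed = [select_backend() for _ in range(4)]
print(routed)
# ['10.0.1.10:8080', '10.0.1.12:8080', '10.0.1.10:8080', '10.0.1.12:8080']
```

Note that 10.0.1.11 never receives traffic, and no client needed to know it existed or failed.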
Types of Load Balancers
Load balancers operate at different layers of the network stack, each with distinct capabilities and trade-offs. The two primary categories are Layer 4 (transport) and Layer 7 (application) load balancers.
Layer 4 load balancers make routing decisions based on IP address and port information. They are fast and protocol-agnostic but cannot inspect request content. Layer 7 load balancers operate at the application layer and can route based on HTTP headers, URLs, cookies, and request content. They enable sophisticated routing but add latency from content inspection.
```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Optional

from io.thecodeforge.loadbalancer.core import BackendServer
from io.thecodeforge.loadbalancer.models import Request, Connection


class LoadBalancerType(ABC):
    """Abstract base for Layer 4 and Layer 7 load balancer types."""

    @abstractmethod
    def route(self, request: Any) -> Optional[BackendServer]:
        ...


class Layer4LoadBalancer(LoadBalancerType):
    """
    Transport-layer load balancer.
    Routes based on source/destination IP and port.
    Does not inspect packet contents.
    """

    def __init__(self, balancer):
        self.balancer = balancer

    def route(self, connection: Connection) -> Optional[BackendServer]:
        """Route based on the 5-tuple: src_ip, src_port, dst_ip, dst_port, protocol."""
        backend = self.balancer.select_backend()
        if backend:
            backend.active_connections += 1
        return backend

    def get_capabilities(self) -> Dict[str, Any]:
        return {
            "protocol_agnostic": True,
            "url_routing": False,
            "header_inspection": False,
            "cookie_persistence": False,
            "content_based_routing": False,
            "ssl_termination": False,
            "websocket_support": True,
            "latency_overhead": "minimal",
        }


class Layer7LoadBalancer(LoadBalancerType):
    """
    Application-layer load balancer.
    Routes based on HTTP headers, URL path, hostname, cookies.
    """

    def __init__(self, balancer):
        self.balancer = balancer
        self.rules: List[dict] = []

    def add_routing_rule(self, condition: Callable[[Request], bool], target_pool: str) -> None:
        """Add a content-based routing rule."""
        self.rules.append({"condition": condition, "pool": target_pool})

    def route(self, request: Request) -> Optional[BackendServer]:
        """Route based on HTTP request content."""
        for rule in self.rules:
            if rule["condition"](request):
                return self.balancer.select_backend_from_pool(rule["pool"])
        return self.balancer.select_backend()

    def get_capabilities(self) -> Dict[str, Any]:
        return {
            "protocol_agnostic": False,
            "url_routing": True,
            "header_inspection": True,
            "cookie_persistence": True,
            "content_based_routing": True,
            "ssl_termination": True,
            "websocket_support": True,
            "latency_overhead": "moderate",
        }


# Example: Layer 7 routing rules
from io.thecodeforge.loadbalancer.core import LoadBalancer
from io.thecodeforge.loadbalancer.algorithms import WeightedRoundRobinAlgorithm

l7 = Layer7LoadBalancer(
    balancer=LoadBalancer(algorithm=WeightedRoundRobinAlgorithm())
)

# Route API traffic to API servers
l7.add_routing_rule(
    condition=lambda req: req.path.startswith("/api/"),
    target_pool="api-servers",
)

# Route static assets to CDN-backed servers
l7.add_routing_rule(
    condition=lambda req: req.path.startswith("/static/"),
    target_pool="static-servers",
)

# Route by hostname
l7.add_routing_rule(
    condition=lambda req: req.host == "api.example.com",
    target_pool="api-servers",
)
```
Load Balancing Algorithms
The load balancing algorithm determines how the balancer selects a backend server for each incoming request. Algorithm choice directly impacts traffic distribution, server utilization, and response latency. No single algorithm is optimal for all workloads.
The most common algorithms are round-robin (sequential distribution), weighted round-robin (proportional to server capacity), least connections (route to server with fewest active connections), and IP hash (consistent routing based on client IP). Each algorithm makes different assumptions about server capacity, request duration, and client behavior.
```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional
import hashlib
import random

from io.thecodeforge.loadbalancer.core import BackendServer


class LoadBalancingAlgorithm(ABC):
    """Abstract base for all load balancing algorithms."""

    @abstractmethod
    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        ...


class RoundRobinAlgorithm(LoadBalancingAlgorithm):
    """
    Distributes requests sequentially across all healthy backends.
    Simple and fair when servers have equal capacity.
    """

    def __init__(self):
        self._index = 0

    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        if not backends:
            return None
        backend = backends[self._index % len(backends)]
        self._index += 1
        return backend


class WeightedRoundRobinAlgorithm(LoadBalancingAlgorithm):
    """
    Distributes requests proportionally based on server weights
    (smooth weighted round-robin). Higher-weight servers receive
    proportionally more requests.
    """

    def __init__(self):
        self._current_weights: dict = {}

    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        if not backends:
            return None
        total_weight = sum(b.weight for b in backends)
        for backend in backends:
            addr = backend.address
            if addr not in self._current_weights:
                self._current_weights[addr] = 0
            self._current_weights[addr] += backend.weight
        selected = max(backends, key=lambda b: self._current_weights[b.address])
        self._current_weights[selected.address] -= total_weight
        return selected


class LeastConnectionsAlgorithm(LoadBalancingAlgorithm):
    """
    Routes to the server with the fewest active connections.
    Best for workloads with variable request durations.
    """

    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        if not backends:
            return None
        return min(backends, key=lambda b: b.active_connections)


class IpHashAlgorithm(LoadBalancingAlgorithm):
    """
    Routes based on a hash of the client IP address.
    Provides session affinity without cookies.
    """

    def __init__(self, client_ip_getter: Optional[Callable[[], str]] = None):
        self._get_client_ip = client_ip_getter or (lambda: "127.0.0.1")

    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        if not backends:
            return None
        client_ip = self._get_client_ip()
        hash_value = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
        return backends[hash_value % len(backends)]


class P2CLeastConnectionsAlgorithm(LoadBalancingAlgorithm):
    """
    Power of Two Choices: randomly pick two backends, then route to
    the one with fewer connections. Near-optimal load distribution
    with O(1) selection.
    """

    def select(self, backends: List[BackendServer]) -> Optional[BackendServer]:
        if not backends:
            return None
        if len(backends) == 1:
            return backends[0]
        a, b = random.sample(backends, 2)
        return a if a.active_connections <= b.active_connections else b


# Algorithm comparison
algorithms = {
    "Round Robin": "Simple sequential distribution. Assumes equal server capacity.",
    "Weighted Round Robin": "Proportional distribution based on server weight. For heterogeneous pools.",
    "Least Connections": "Routes to fewest active connections. Best for variable-duration requests.",
    "IP Hash": "Consistent routing by client IP. Provides session affinity without cookies.",
    "P2C Least Connections": "Near-optimal distribution with O(1) complexity. Used by Envoy and gRPC.",
}
```
- Short uniform requests: round-robin is simple and effective
- Variable-length requests (WebSockets, streams): least connections prevents hotspots
- Session-dependent state: IP hash or cookie-based persistence
- Heterogeneous server capacities: weighted algorithms respect capacity differences
- High-scale random routing: P2C least connections gives near-optimal distribution in O(1)
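A small simulation makes the first two bullets concrete. This is a sketch under simplified assumptions (two servers, one request arriving per tick, durations known in advance, and a deliberately adversarial workload that alternates short and long requests): round-robin happens to send every long-lived request to the same server, while least connections spreads the work.

```python
import heapq

def simulate(strategy, durations, n_servers=2):
    """Assign each request to a server; return total work (ticks) per server."""
    active = [0] * n_servers       # connections open right now
    load = [0] * n_servers         # total ticks of work assigned
    finishing = []                 # min-heap of (finish_time, server)
    rr = 0
    for t, d in enumerate(durations):
        while finishing and finishing[0][0] <= t:   # close finished requests
            _, s = heapq.heappop(finishing)
            active[s] -= 1
        if strategy == "round_robin":
            s = rr % n_servers
            rr += 1
        else:  # least connections
            s = min(range(n_servers), key=lambda i: active[i])
        active[s] += 1
        load[s] += d
        heapq.heappush(finishing, (t + d, s))
    return load

durations = [1, 50] * 30           # alternating short and long requests
rr_load = simulate("round_robin", durations)
lc_load = simulate("least_connections", durations)
print("round robin      :", rr_load)   # all long requests land on one server
print("least connections:", lc_load)   # work spread roughly evenly
```

With round-robin the even-numbered (short) requests all hit server 0 and the odd-numbered (long) ones all hit server 1, so the load splits 30 vs 1500 ticks; least connections keeps the split close to even.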
Health Checks and Connection Draining
Health checks are the mechanism by which a load balancer determines whether a backend server is capable of handling traffic. Without health checks, the balancer would route requests to failed servers, causing errors for clients. Connection draining ensures in-flight requests complete before a server is removed from rotation.
Health checks come in two types: active checks where the balancer periodically probes the backend, and passive checks where the balancer monitors real request failures. Active checks detect failures proactively but add load. Passive checks detect failures only after real client requests fail.
```python
import socket
import time
import threading
import requests
from dataclasses import dataclass
from enum import Enum
from typing import Optional

from io.thecodeforge.loadbalancer.core import BackendServer, BackendState


class HealthCheckType(Enum):
    HTTP = "http"
    TCP = "tcp"
    GRPC = "grpc"


@dataclass
class HealthCheckConfig:
    check_type: HealthCheckType
    path: str = "/health"
    port: Optional[int] = None
    interval_seconds: float = 5.0
    timeout_seconds: float = 2.0
    healthy_threshold: int = 2
    unhealthy_threshold: int = 3
    expected_status_codes: Optional[list] = None

    def __post_init__(self):
        if self.expected_status_codes is None:
            self.expected_status_codes = [200]


class HealthChecker:
    """
    Production health checker with configurable thresholds,
    grace periods, and passive failure detection.
    """

    def __init__(self, config: Optional[HealthCheckConfig] = None):
        self.config = config or HealthCheckConfig(
            check_type=HealthCheckType.HTTP, path="/health"
        )
        self._backends: dict = {}
        self._running = False
        self._thread: Optional[threading.Thread] = None

    def register(self, backend: BackendServer) -> None:
        """Register a backend for health checking."""
        self._backends[backend.address] = {
            "backend": backend,
            "consecutive_successes": 0,
            "consecutive_failures": 0,
            "last_check_time": 0.0,
        }

    def unregister(self, backend: BackendServer) -> None:
        """Remove a backend from health checking."""
        self._backends.pop(backend.address, None)

    def check_http(self, backend: BackendServer) -> bool:
        """Perform an HTTP health check against the backend."""
        port = self.config.port or backend.port
        url = f"http://{backend.host}:{port}{self.config.path}"
        try:
            response = requests.get(url, timeout=self.config.timeout_seconds)
            return response.status_code in self.config.expected_status_codes
        except (requests.ConnectionError, requests.Timeout):
            return False

    def check_tcp(self, backend: BackendServer) -> bool:
        """Perform a TCP connection check against the backend."""
        port = self.config.port or backend.port
        try:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.settimeout(self.config.timeout_seconds)
            result = sock.connect_ex((backend.host, port))
            sock.close()
            return result == 0
        except socket.error:
            return False

    def run_check(self, backend: BackendServer) -> bool:
        """Execute a health check and update backend state based on thresholds."""
        if self.config.check_type == HealthCheckType.HTTP:
            is_healthy = self.check_http(backend)
        else:
            is_healthy = self.check_tcp(backend)

        entry = self._backends.get(backend.address)
        if not entry:
            return is_healthy

        if is_healthy:
            entry["consecutive_failures"] = 0
            entry["consecutive_successes"] += 1
            if entry["consecutive_successes"] >= self.config.healthy_threshold:
                backend.state = BackendState.HEALTHY
                backend.consecutive_failures = 0
        else:
            entry["consecutive_successes"] = 0
            entry["consecutive_failures"] += 1
            if entry["consecutive_failures"] >= self.config.unhealthy_threshold:
                backend.state = BackendState.UNHEALTHY
                backend.consecutive_failures = entry["consecutive_failures"]

        entry["last_check_time"] = time.time()
        return is_healthy

    def start(self) -> None:
        """Start the background health checking thread."""
        self._running = True
        self._thread = threading.Thread(target=self._check_loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        """Stop background health checking."""
        self._running = False
        if self._thread:
            self._thread.join(timeout=10.0)

    def _check_loop(self) -> None:
        """Continuous health check loop."""
        while self._running:
            for entry in list(self._backends.values()):
                backend = entry["backend"]
                if backend.state != BackendState.DRAINING:
                    self.run_check(backend)
            time.sleep(self.config.interval_seconds)


# Example: configure health checks
config = HealthCheckConfig(
    check_type=HealthCheckType.HTTP,
    path="/health/ready",
    interval_seconds=5.0,
    timeout_seconds=2.0,
    healthy_threshold=2,
    unhealthy_threshold=3,
    expected_status_codes=[200, 204],
)
checker = HealthChecker(config=config)
```
Session Persistence and Sticky Sessions
Session persistence, also called sticky sessions, ensures that requests from the same client are consistently routed to the same backend server. This is required when backend servers maintain in-memory session state that is not shared across the pool.
Sticky sessions are implemented through three mechanisms: cookie-based persistence (the balancer sets a cookie identifying the backend), IP-based persistence (routing by client IP hash), or application-controlled persistence (the application signals which backend to use). Each mechanism has different trade-offs for reliability and scalability.
```python
import hashlib
import time
from typing import Dict, Optional
from dataclasses import dataclass

from io.thecodeforge.loadbalancer.core import BackendServer


@dataclass
class SessionEntry:
    backend_address: str
    created_at: float
    last_used: float
    ttl_seconds: float

    def is_expired(self) -> bool:
        return time.time() - self.last_used > self.ttl_seconds


class SessionPersistenceManager:
    """
    Manages session affinity between clients and backends.
    Supports cookie-based and IP-based persistence.
    """

    def __init__(self, ttl_seconds: int = 3600, cookie_name: str = "SERVERID"):
        self._sessions: Dict[str, SessionEntry] = {}
        self._ttl = ttl_seconds
        self._cookie_name = cookie_name

    def get_backend_for_client(
        self, client_id: str, healthy_backends: list
    ) -> Optional[BackendServer]:
        """
        Look up the persisted backend for a client.
        Returns None if the session expired or the backend is unhealthy.
        """
        entry = self._sessions.get(client_id)
        if entry is None or entry.is_expired():
            return None
        for backend in healthy_backends:
            if backend.address == entry.backend_address:
                entry.last_used = time.time()
                return backend
        del self._sessions[client_id]
        return None

    def persist_session(self, client_id: str, backend: BackendServer) -> None:
        """Create or update session affinity for a client."""
        self._sessions[client_id] = SessionEntry(
            backend_address=backend.address,
            created_at=time.time(),
            last_used=time.time(),
            ttl_seconds=self._ttl,
        )

    def extract_client_id_from_cookie(self, cookies: dict) -> Optional[str]:
        """Extract the client identifier from the load balancer cookie."""
        return cookies.get(self._cookie_name)

    def create_session_cookie(self, client_id: str, backend: BackendServer) -> dict:
        """Create a cookie header for session persistence."""
        return {
            "name": self._cookie_name,
            "value": backend.address,
            "max_age": self._ttl,
            "path": "/",
            "http_only": True,
            "secure": True,
        }

    def cleanup_expired(self) -> int:
        """Remove expired session entries. Returns the count of removed entries."""
        expired = [k for k, v in self._sessions.items() if v.is_expired()]
        for key in expired:
            del self._sessions[key]
        return len(expired)

    @property
    def active_sessions(self) -> int:
        return len(self._sessions)


class ConsistentHashPersistence:
    """
    IP-based persistence using consistent hashing.
    Minimizes redistribution when backends are added or removed.
    """

    def __init__(self, virtual_nodes: int = 150):
        self._virtual_nodes = virtual_nodes
        self._ring: Dict[int, str] = {}

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_backend(self, backend: BackendServer) -> None:
        for i in range(self._virtual_nodes):
            vnode_key = f"{backend.address}#{i}"
            self._ring[self._hash(vnode_key)] = backend.address

    def remove_backend(self, backend: BackendServer) -> None:
        for i in range(self._virtual_nodes):
            vnode_key = f"{backend.address}#{i}"
            self._ring.pop(self._hash(vnode_key), None)

    def get_backend(self, client_ip: str) -> Optional[str]:
        """Walk clockwise to the first virtual node at or after the hash."""
        if not self._ring:
            return None
        hash_val = self._hash(client_ip)
        sorted_hashes = sorted(self._ring.keys())
        for h in sorted_hashes:
            if h >= hash_val:
                return self._ring[h]
        return self._ring[sorted_hashes[0]]
```
- Sticky sessions create uneven load distribution when some clients are more active
- Server failure loses all sessions bound to that server
- Horizontal scaling is limited: new servers get no existing traffic
- Prefer shared session stores (Redis, Memcached) over sticky sessions when possible
- If sticky sessions are required, set reasonable TTLs and monitor session distribution
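The shared-store alternative from the list above can be shown in a few lines. A plain dict stands in for Redis or Memcached here (a real deployment would use a client library such as redis-py); the point is that once session state lives outside the backend, any server can handle any request and stickiness becomes unnecessary.

```python
# Shared session store sketch: a dict plays the role of Redis/Memcached.
session_store = {}

def handle_request(backend, session_id, counter_key="views"):
    """Any backend reads and increments the same shared session state."""
    session = session_store.setdefault(session_id, {})
    session[counter_key] = session.get(counter_key, 0) + 1
    return backend, session[counter_key]

# Requests for one session land on different backends; state persists:
print(handle_request("10.0.1.10:8080", "sess-abc"))  # ('10.0.1.10:8080', 1)
print(handle_request("10.0.1.11:8080", "sess-abc"))  # ('10.0.1.11:8080', 2)
```

With this pattern the load balancer can use any algorithm it likes, and losing a backend loses no sessions.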
| Algorithm | Distribution | Session Affinity | Best For | Drawback |
|---|---|---|---|---|
| Round Robin | Sequential, equal | None | Uniform short requests | Hotspots with variable-duration requests |
| Weighted Round Robin | Proportional to weight | None | Heterogeneous server capacities | Requires accurate weight configuration |
| Least Connections | Fewest active connections | None | Variable request durations | Slightly higher selection overhead |
| IP Hash | Consistent by client IP | Yes (implicit) | Session affinity without cookies | Uneven distribution with few clients |
| P2C Least Connections | Random pair, pick fewer | None | High-scale uniform distribution | Randomness can cause temporary imbalance |
| Cookie-based | Consistent by cookie | Yes (explicit) | Stateful web applications | Session loss on server failure |
🎯 Key Takeaways
- Load balancers distribute traffic across servers for availability, scalability, and fault tolerance
- Layer 4 routes by IP/port: fast and protocol-agnostic. Layer 7 routes by HTTP content: flexible but slower
- Algorithm choice matters: least-connections for variable workloads, round-robin for uniform requests
- Health checks must be lightweight and independent: never depend on shared resources
- Connection draining and minimum_healthy_hosts prevent cascading failures during deployments
Interview Questions on This Topic
- Junior: What is the difference between a Layer 4 and Layer 7 load balancer?
- Mid-level: How would you design health checks for a microservices architecture?
- Senior: A production system shows one backend server handling 80% of traffic while three other servers handle 20% combined. How do you diagnose and fix this?
Frequently Asked Questions
What is a load balancer in simple terms?
A load balancer is a system that sits in front of your servers and distributes incoming traffic across them. Instead of all users hitting one server, the load balancer spreads the load so no single server gets overwhelmed. If one server goes down, the load balancer automatically stops sending traffic to it.
What is the difference between hardware and software load balancers?
Hardware load balancers are dedicated physical appliances; software load balancers (such as NGINX or HAProxy) run as ordinary processes on general-purpose servers. Modern production systems overwhelmingly use software load balancers or managed cloud load balancers (AWS ALB/NLB, GCP Load Balancer) for flexibility, cost, and scalability.
What is the best load balancing algorithm?
There is no universally best algorithm. Round-robin works well for simple, uniform workloads. Least connections is best when request durations vary. IP hash provides session affinity without cookies. P2C least connections offers near-optimal distribution at high scale with O(1) complexity. The right choice depends on your traffic pattern, session requirements, and server capacity.
Can a load balancer itself be a single point of failure?
Yes, a single load balancer is a single point of failure. Production systems deploy load balancers in redundant pairs using active-passive or active-active configurations. Cloud providers offer managed load balancers with built-in redundancy across availability zones. DNS-based load balancing across multiple load balancer instances provides another layer of fault tolerance.
What is connection draining?
Connection draining is the process of allowing in-flight requests to complete before removing a server from the load balancer pool. When a server is marked for removal (during deployment or scaling), the load balancer stops sending new requests but waits for existing connections to finish. This prevents users from experiencing connection reset errors during deployments.
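That sequence can be sketched in a few lines. This is a minimal illustration (the class and function names below are made up for the example): mark the backend as draining so it refuses new connections, then poll until in-flight requests reach zero or a timeout expires.

```python
import time

class DrainingBackend:
    """Minimal connection-draining sketch."""

    def __init__(self, addr):
        self.addr = addr
        self.draining = False
        self.active_connections = 0

    def start_request(self):
        if self.draining:
            raise RuntimeError("draining: no new connections accepted")
        self.active_connections += 1

    def finish_request(self):
        self.active_connections -= 1

def drain(backend, timeout=30.0, poll=0.01):
    """Stop new traffic, then wait for in-flight requests to finish."""
    backend.draining = True
    deadline = time.monotonic() + timeout
    while backend.active_connections > 0 and time.monotonic() < deadline:
        time.sleep(poll)
    return backend.active_connections == 0

b = DrainingBackend("10.0.1.10:8080")
b.start_request()
b.finish_request()
print(drain(b))   # True: no in-flight requests remained
```

Real load balancers expose the timeout directly, for example as the deregistration delay on AWS target groups; requests still open at the deadline are forcibly closed.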
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.