Design a Rate Limiter — Clock Skew Lets 2x Traffic Through
- Fixed window = simple but burst-prone. Use for soft limits only.
- Token bucket = burst-friendly, O(1) memory. Ideal for most production APIs.
- Sliding window counter = O(1) memory, ~1% precision error. Best for high-traffic APIs.
- Rate limiting stops API abuse by capping requests per key (user, IP, endpoint) in a time window. Prevents DDoS, cost spikes, and database overload.
- Token bucket: stores up to capacity tokens; each request consumes one, replenished at fixed rate. Burst-friendly, memory O(1) per key.
- Sliding window log: stores timestamp per request; count those in last window. Memory O(window requests) — can spike.
- Sliding window counter: hybrid of fixed window + log. Memory O(1), approximates log precision with 10-100M Redis ops/day.
- Fixed window counter: counts per discrete window (e.g., 1-min intervals). Burst at boundaries allows 2x rate in worst case (e.g., 59s and 60s).
- Production trap: Redis cluster + clock skew breaks rate limiting. Node A sees request at 12:00:01, node B at 11:59:59. Without synchronised clocks, limiters drift.
Rate Limiter Debug Cheat Sheet
Redis memory full — OOM kills rate limiter
redis-cli INFO memory | grep used_memory_humanredis-cli --bigkeysRate limit exceeded errors for legitimate users behind shared IP
grep -c "client_ip='203.0.113.45'" api-access.logredis-cli GET "rate_limit:203.0.113.45"Race condition allows requests beyond limit at high concurrency
grep -n 'INCR\|GET\|SET\|MULTI\|EXEC' rate_limiter.pyredis-cli --latency-historyRedis latency spikes during rate limiting
redis-cli SLOWLOG GET 10redis-cli INFO commandstats | grep evalAttackers bypass limit by rotating IPs (proxy pool)
SELECT COUNT(DISTINCT client_ip) FROM api_logs WHERE endpoint = '/login'tcpdump -i eth0 -n 'host your-api.com and port 443' | grep -c 'client_ip'Production Incident
CLOCK_SKEW_TOLERANCE_MS = 100 config; if multiple nodes disagree on window, use the earliest window to be conservative.Production Debug GuideSymptom → Action mapping for common rate limiter failures in production APIs.
Every public API you've ever used has a rate limiter quietly running behind it. GitHub limits REST calls to 5,000 per hour. Stripe throttles card-creation endpoints. OpenAI caps tokens per minute. These aren't arbitrary annoyances — they're the difference between a stable, profitable service and one that gets DDOSed into bankruptcy by a single misconfigured cron job. Rate limiting is one of the most deceptively simple concepts in system design: easy to describe in an interview, brutally hard to get right in production.
The core problem rate limiting solves is resource fairness under pressure. Without it, one aggressive client can saturate your database connection pool, spike your compute costs, or starve every other tenant on a shared platform. It's a first-line defense against abuse, a cost-control mechanism, and an SLA enforcement tool — all rolled into one component that typically adds less than 1ms of latency per request when built correctly.
By the end of this article you'll understand the four major rate-limiting algorithms from first principles — including their memory footprints and failure modes — know exactly how Redis-based distributed rate limiters work under the hood (including the Lua script trick that makes them atomic), be able to design a rate limiter that survives node failures and clock skew, and walk into any system design interview and confidently discuss trade-offs that most candidates never even consider.
Fixed Window Counter — Simple but Burst-Prone
The fixed window algorithm divides time into discrete intervals (e.g., 1 minute, 1 hour). For each key (user ID, IP address), the counter resets at the start of each interval.
Implementation: Redis INCR with key like rate_limit:user_123:2026-03-18T15:34. Set an expiry of TTL seconds. If the counter after increment exceeds the limit, reject the request.
This is the easiest algorithm to implement and the most memory-efficient (one integer per key). However, it allows bursts at the window boundary. If the limit is 1000 requests per hour, a user can send 1000 requests at 11:59:59 and another 1000 at 12:00:00 — 2000 requests within 2 seconds. The worst-case rate is 2x the limit.
In production, fixed window is acceptable for soft rate limits (logging, analytics) or when the window size is very small (1 second). For hard limits that enforce service-level agreements, you need a sliding window algorithm.
import time import redis from datetime import datetime class FixedWindowRateLimiter: def __init__(self, redis_client, limit_per_window, window_seconds): self.redis = redis_client self.limit = limit_per_window self.window_seconds = window_seconds def _get_window_key(self, key): """Generate key with current window number (UNIX timestamp / window size).""" current_window = int(time.time() / self.window_seconds) return f"rate_limit:{key}:{current_window}" def allow_request(self, key): """ Returns (allowed, current_count). WARNING: Not atomic — race condition possible at high concurrency. Use Lua script for production (see sliding_window_counter.py). """ window_key = self._get_window_key(key) # Pipeline: INCR + EXPIRE (if key doesn't exist, EXPIRE sets TTL) pipe = self.redis.pipeline() pipe.incr(window_key) pipe.expire(window_key, self.window_seconds) result = pipe.execute() current_count = result[0] # INCR result if current_count <= self.limit: return True, current_count else: return False, current_count # ───────────────────────────────────────────────────────────── # PRODUCTION-READY VERSION (Atomic Lua script) # ───────────────────────────────────────────────────────────── FIXED_WINDOW_LUA = """ local key = KEYS[1] local limit = tonumber(ARGV[1]) local window_seconds = tonumber(ARGV[2]) local current = redis.call('INCR', key) if current == 1 then redis.call('EXPIRE', key, window_seconds) end if current <= limit then return {1, current} else return {0, current} end """ def allow_request_atomic(self, key): """Use Lua script for atomic operation.""" window_key = self._get_window_key(key) allowed, current = self.redis.eval( self.FIXED_WINDOW_LUA, 1, # number of keys window_key, self.limit, self.window_seconds ) return allowed == 1, current
Sliding Window Log — Precise But Memory-Intensive
The sliding window log algorithm stores a timestamp of each request in a sorted set (ZSET) keyed by user ID. When a new request arrives, it removes entries older than the window (e.g., last 60 seconds), counts the remaining entries, and accepts if the count is below the limit.
Implementation: ZADD key timestamp request_id. ZREMRANGEBYSCORE key 0 current_time - window_seconds. ZCARD key to get count.
This algorithm provides perfect precision — no burst at boundaries because the window slides continuously rather than resetting at fixed intervals.
Memory consumption grows with request volume. At 10,000 requests per second, each window stores up to 600,000 timestamps (for 60-second window). Each timestamp is ~50 bytes in Redis = 30MB per key. For 10,000 keys, that's 300GB — unacceptable.
In production, sliding window log is only suitable for low-traffic APIs (<10 QPS per key). For high-traffic, use sliding window counter (approximate, O(1) memory).
import time import redis import uuid class SlidingWindowLogRateLimiter: def __init__(self, redis_client, limit_per_window, window_seconds): self.redis = redis_client self.limit = limit_per_window self.window_seconds = window_seconds def allow_request(self, key): """ Sliding window log using Redis ZSET. Memory O(number of requests in window) per key. """ zset_key = f"rate_limit_log:{key}" now = time.time() window_start = now - self.window_seconds # Use Lua script for atomic cleanup + count lua_script = """ local key = KEYS[1] local now = tonumber(ARGV[1]) local window_start = tonumber(ARGV[2]) local limit = tonumber(ARGV[3]) local request_id = ARGV[4] -- Remove entries outside current window redis.call('ZREMRANGEBYSCORE', key, 0, window_start) -- Get current count local count = redis.call('ZCARD', key) if count < limit then -- Add new request redis.call('ZADD', key, now, request_id) -- Set TTL to window_seconds + buffer redis.call('EXPIRE', key, window_start + 60) return {1, count + 1} else return {0, count} end """ request_id = str(uuid.uuid4()) allowed, count = self.redis.eval( lua_script, 1, # number of keys zset_key, now, window_start, self.limit, request_id ) return allowed == 1, count def get_memory_usage(self, key): """Estimate memory usage for a key (in bytes).""" zset_key = f"rate_limit_log:{key}" return self.redis.memory_usage(zset_key) or 0
Token Bucket — Burst-Friendly and Predictable
The token bucket algorithm maintains a bucket for each key with a capacity (max burst) and a refill_rate (tokens per second). Each request consumes one token. Tokens are added at the refill rate up to the capacity.
Implementation: Store last_refill_timestamp and tokens in Redis. On each request, calculate tokens added since last refill, cap at capacity, deduct one token.
This algorithm is ideal for APIs that need to handle bursts (e.g., 10 requests in 1 second) but enforce a sustained average rate (e.g., 60 requests per minute). It's also memory-efficient: O(1) per key.
Token bucket is the default choice for most production rate limiters (used by AWS API Gateway, Stripe, GitHub).
Distributed implementation: use Redis Lua script for atomic token calculation. The script reads current_tokens and last_refill, computes new token count, deducts if sufficient, and updates Redis.
Corner case: if last_refill is too far in the past (e.g., user hasn't used the API for days), the token count caps at capacity. This prevents unlimited token accumulation.
import time import redis class TokenBucketRateLimiter: def __init__(self, redis_client, capacity, refill_rate_per_second): self.redis = redis_client self.capacity = capacity self.refill_rate = refill_rate_per_second def allow_request(self, key): """ Token bucket using Redis Lua script. Memory O(1) per key (stores tokens and last_refill timestamp). """ bucket_key = f"rate_limit_token:{key}" now = time.time() lua_script = """ local key = KEYS[1] local now = tonumber(ARGV[1]) local capacity = tonumber(ARGV[2]) local refill_rate = tonumber(ARGV[3]) local data = redis.call('HMGET', key, 'tokens', 'last_refill') local tokens = tonumber(data[1]) or capacity local last_refill = tonumber(data[2]) or now -- Calculate tokens to add local elapsed = now - last_refill local tokens_to_add = elapsed * refill_rate tokens = math.min(capacity, tokens + tokens_to_add) local allowed = 0 if tokens >= 1 then allowed = 1 tokens = tokens - 1 end -- Store updated state redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now) redis.call('EXPIRE', key, 3600) -- TTL 1 hour return {allowed, tokens} """ allowed, tokens = self.redis.eval( lua_script, 1, bucket_key, now, self.capacity, self.refill_rate ) return allowed == 1, tokens def get_token_count(self, key): """Get current token count for debugging.""" bucket_key = f"rate_limit_token:{key}" tokens = self.redis.hget(bucket_key, 'tokens') return float(tokens) if tokens else self.capacity
- Capacity = maximum burst. If capacity = 10, you can handle 10 requests instantly.
- Refill rate = sustained average. If refill_rate = 1 token/second, sustained limit is 3600 requests/hour.
- Spike handling: burst uses stored tokens, then refills at constant rate.
- If user hasn't used API for hours, tokens accumulate up to capacity (not unlimited).
- Token bucket is the algorithm behind AWS API Gateway's rate limiting.
Sliding Window Counter — Memory-Efficient and Production-Proven
The sliding window counter algorithm is a hybrid of fixed window and sliding window log. It tracks two counters per key: current window count and previous window count. The weighted average of the previous and current windows approximates the sliding window without storing per-request timestamps.
Formula: estimated_count = previous_window_count * ((window_end - now) / window_size) + current_count
Implementation: Redis key for current window (e.g., rate_limit:user_123:{current_minute}) and previous window (rate_limit:user_123:{previous_minute}). On each request, get current and previous counts, compute weighted estimate, reject if above limit.
This algorithm was popularised by Cloudflare and is used in many production systems. It provides ~1% precision error compared to sliding window log but O(1) memory per key.
Distributed clock skew issue: if nodes have different system times, they may write to different window keys for the same request. Solution: use centralised Redis TIME command in Lua script to get a consistent timestamp across all nodes.
Memory optimisation: Set TTL on window keys to 2 * window size (e.g., 60-second window → TTL 120 seconds). This prevents stale keys from accumulating.
import time import redis class SlidingWindowCounterRateLimiter: def __init__(self, redis_client, limit_per_window, window_seconds): self.redis = redis_client self.limit = limit_per_window self.window_seconds = window_seconds def allow_request(self, key): """ Sliding window counter using two Redis keys (current + previous windows). Memory O(1) per key. Precision ~1% vs sliding window log. Uses Redis TIME for consistent cluster-wide timestamp. """ lua_script = """ local key = KEYS[1] local limit = tonumber(ARGV[1]) local window_seconds = tonumber(ARGV[2]) local now = tonumber(ARGV[3]) local window_start = now - window_seconds -- Each window is 1 second (or window_seconds, but we use 1-second granularity) -- Calculate current and previous window identifiers local current_window = math.floor(now) local previous_window = math.floor(window_start) local current_key = key .. ':' .. current_window local previous_key = key .. ':' .. previous_window -- Get counts for both windows local current_count = redis.call('GET', current_key) or 0 local previous_count = redis.call('GET', previous_key) or 0 -- Calculate weighted count: previous window weight * (overlap / window_seconds) local overlap = window_seconds - (now - previous_window) local weight = overlap / window_seconds local estimated_count = previous_count * weight + current_count if estimated_count < limit then -- Increment current window counter local new_count = redis.call('INCR', current_key) if new_count == 1 then redis.call('EXPIRE', current_key, window_seconds * 2) end return {1, new_count} else return {0, estimated_count} end """ # Get consistent timestamp from Redis (not local time) redis_time = self.redis.time() # returns [seconds, microseconds] now = redis_time[0] + redis_time[1] / 1000000.0 allowed, count = self.redis.eval( lua_script, 1, f"rate_limit_sliding:{key}", self.limit, self.window_seconds, now ) return allowed == 1, count def get_estimated_usage(self, key): """Get current estimated count without consuming a request.""" # Similar to allow_request but without INCR # ... (omitted for brevity) pass
redis.time() to get a consistent timestamp from the Redis leader. Adds ~2ms latency but ensures correctness.redis.time() to get a centralised timestamp. Each Lua script called redis.time() at the beginning, ensuring all nodes use the same time for window calculation.redis.time()) to avoid clock skew in distributed clusters.| Algorithm | Memory per key | Precision | Burst handling | Distributed clock skew safe? | Best for |
|---|---|---|---|---|---|
| Fixed Window Counter | O(1) | Poor (2x rate at boundaries) | Poor (hard reset at window edge) | No (requires central timestamp) | Soft limits, logging, non-critical features |
| Sliding Window Log | O(window_requests) | Perfect (100%) | Excellent (sliding) | Yes (timestamps per request) | Low-traffic APIs (<10 QPS per key) |
| Token Bucket | O(1) | Excellent (sustained rate) | Excellent (configurable burst) | Yes (with refill timestamp) | Production APIs with burst allowance |
| Sliding Window Counter | O(1) | ~99% (1% error) | Good (sliding approximation) | Yes (with centralised timestamp) | High-traffic APIs (>100 QPS per key) |
🎯 Key Takeaways
- Fixed window = simple but burst-prone. Use for soft limits only.
- Token bucket = burst-friendly, O(1) memory. Ideal for most production APIs.
- Sliding window counter = O(1) memory, ~1% precision error. Best for high-traffic APIs.
- Sliding window log = perfect precision, O(window_requests) memory. Low-traffic only.
- Distributed rate limiters need centralised timestamps (
redis.time()) to avoid clock skew. - Atomic increments are mandatory — non-atomic read-then-write allows 10-20% over-limit at high concurrency.
- Set TTL = 2× window size to prevent stale keys from accumulating.
- Rate limiting by IP is easily bypassed by proxy pools. Prefer per-authenticated-user limits.
- For Redis outages, choose: fail open (availability), fail closed (security), or degrade to local limiter (balanced).
⚠ Common Mistakes to Avoid
Interview Questions on This Topic
- QExplain the trade-offs between fixed window counter and sliding window counter. When would you choose one over the other?SeniorReveal
- QHow do you implement a distributed rate limiter that survives Redis cluster node failures?SeniorReveal
- QWhat happens when the Redis node storing rate limit counters is unreachable? How does your system behave?SeniorReveal
- QHow does Stripe's rate limiting work at 100,000 RPS scale?SeniorReveal
- QHow do you test a rate limiter in production without breaking existing clients?SeniorReveal
Frequently Asked Questions
What is the difference between rate limiting and throttling?
Rate limiting caps the number of requests over a time window (e.g., 1000 requests per hour). Throttling reduces the rate of processing (e.g., delay requests to 10 per second). Throttling often uses a token bucket with queuing; rate limiting rejects excess requests outright. In practice, the terms are used interchangeably, but throttling implies a gradual slowdown, while rate limiting implies explicit rejection.
Should I rate limit by IP, user ID, or API key?
Prioritise: API key (authenticated) > user ID > IP address. IP-based limits are easily bypassed by proxy pools (attackers can rotate through thousands of IPs). For unauthenticated endpoints, combine IP + User-Agent fingerprinting + global aggregate limits as a safety net. For authenticated users, limit by API key first, user ID second, and IP only as a last resort.
How do I choose the right window size for rate limiting?
Match window size to your business SLA and infrastructure capacity: - 1 second: DDoS protection, per-second billing (e.g., 10 requests/sec). - 1 minute: API rate limits for moderate traffic (e.g., 300 requests/min). - 1 hour: Standard API quotas (e.g., 5000 requests/hour). - 1 day: Tiered plans (free: 10k/day, pro: 100k/day).
Trade-off: smaller windows enable faster burst recovery but increase Redis key churn. Larger windows smooth out bursts but may allow short-term overload.
What is the memory footprint of Redis-based rate limiters?
Fixed window or token bucket: O(1) per key → 10 million keys × 100 bytes = 1GB. Sliding window counter: similar O(1) per key. Sliding window log (ZSET): O(window_requests) per key → 1M QPS × 60s × 50 bytes = 3GB per key. Never use sliding window log for high-traffic APIs.
How do you handle rate limiting for WebSocket connections?
WebSockets are long-lived; rate limiting per open connection is insufficient. Instead, limit messages per time window per connection. Use an in-memory token bucket on the WebSocket server (e.g., 10 messages per second per connection). For distributed WebSocket servers (multiple nodes), use consistent hashing to assign each connection to a specific node, then use local rate limiter. Avoid Redis per-message overhead (adds latency). For cross-node coordination, use Redis Pub/Sub to broadcast rate limit updates, but accept that some messages will slip through during propagation delay (eventual consistency is acceptable).
Can I implement rate limiting without Redis?
Yes, for single-node deployments use in-memory token bucket (Guava RateLimiter in Java, time.Sleep-based in Go). For distributed systems, you can use consistent hashing + local limiters with total capacity = global limit / number of nodes + small buffer. However, Redis (or another central KV store) is the standard solution because it provides atomic operations, persistence, and high throughput. Alternatives: Hazelcast (in-memory grid), etcd (consensus-based, slower), or local limiters with gossip protocol (complex).
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.