Skip to content
Home System Design Design a Rate Limiter — Clock Skew Lets 2x Traffic Through

Design a Rate Limiter — Clock Skew Lets 2x Traffic Through

Where developers are forged. · Structured learning · Free forever.
📍 Part of: Real World → Topic 11 of 17
30ms NTP drift between Redis nodes doubled request rates at window boundaries.
🔥 Advanced — solid System Design foundation required
In this tutorial, you'll learn
30ms NTP drift between Redis nodes doubled request rates at window boundaries.
  • Fixed window = simple but burst-prone. Use for soft limits only.
  • Token bucket = burst-friendly, O(1) memory. Ideal for most production APIs.
  • Sliding window counter = O(1) memory, ~1% precision error. Best for high-traffic APIs.
✦ Plain-English analogy ✦ Real code with output ✦ Interview questions
Quick Answer
  • Rate limiting stops API abuse by capping requests per key (user, IP, endpoint) in a time window. Prevents DDoS, cost spikes, and database overload.
  • Token bucket: stores up to capacity tokens; each request consumes one, replenished at fixed rate. Burst-friendly, memory O(1) per key.
  • Sliding window log: stores timestamp per request; count those in last window. Memory O(window requests) — can spike.
  • Sliding window counter: hybrid of fixed window + log. Memory O(1), approximates log precision with 10-100M Redis ops/day.
  • Fixed window counter: counts per discrete window (e.g., 1-min intervals). Burst at boundaries allows 2x rate in worst case (e.g., 59s and 60s).
  • Production trap: Redis cluster + clock skew breaks rate limiting. Node A sees request at 12:00:01, node B at 11:59:59. Without synchronised clocks, limiters drift.
🚨 START HERE

Rate Limiter Debug Cheat Sheet

Fast diagnostics for rate-limiter failures in production APIs.
🔴

Redis memory full — OOM kills rate limiter

Immediate ActionCheck if using ZSET (Sliding Window Log)
Commands
redis-cli INFO memory | grep used_memory_human
redis-cli --bigkeys
Fix NowSwitch to Sliding Window Counter (small fixed memory) or Token Bucket (O(1) memory).
🟡

Rate limit exceeded errors for legitimate users behind shared IP

Immediate ActionIdentify shared NAT IPs (corporate, mobile carrier)
Commands
grep -c "client_ip='203.0.113.45'" api-access.log
redis-cli GET "rate_limit:203.0.113.45"
Fix NowSwitch to per-user authentication (API key, JWT) instead of per-IP.
🟡

Race condition allows requests beyond limit at high concurrency

Immediate ActionCheck if increment and check are atomic
Commands
grep -n 'INCR\|GET\|SET\|MULTI\|EXEC' rate_limiter.py
redis-cli --latency-history
Fix NowWrap INCR + GETTTL + EXPIRE in Lua script or use SET NX EX.
🟠

Redis latency spikes during rate limiting

Immediate ActionCheck if Lua script uses KEYS or SCAN
Commands
redis-cli SLOWLOG GET 10
redis-cli INFO commandstats | grep eval
Fix NowAvoid KEYS in Lua. Use SCAN with count limit. Move Redis to same region as API.
🟡

Attackers bypass limit by rotating IPs (proxy pool)

Immediate ActionCheck if limiting is per IP only
Commands
SELECT COUNT(DISTINCT client_ip) FROM api_logs WHERE endpoint = '/login'
tcpdump -i eth0 -n 'host your-api.com and port 443' | grep -c 'client_ip'
Fix NowRequire API key or JWT. Use CAPTCHA for unauthenticated endpoints. Implement global aggregate limits.
Production Incident

The Redis Clock Skew That Opened the Floodgates

A cross-continental Redis cluster with 5ms average latency experienced a 30ms clock drift between nodes. Attackers discovered they could send requests to different nodes during the drift window, effectively doubling their rate limit. 200,000 API calls above limit before detection.
SymptomRate limiter logs showed each Redis node independently accepted requests at the allowed rate (1000/hour). The aggregate rate across nodes exceeded 2000/hour within the same wall-clock hour. No single node saw the spike, but the global rate was 2x the limit.
AssumptionThe team assumed Redis Lua scripts running on each node would provide globally consistent rate limiting. They didn't account for clock drift across nodes in different AWS regions (us-east-1 vs eu-west-1, 30ms typical difference).
Root causeThe rate limiter used Redis TIME command to get server time for window calculation. Each node's system clock drifted up to 30ms relative to others due to NTP variance. In a fixed window counter implementation, a request at 12:00:00.000 on node A and another at 11:59:59.980 on node B fall into different windows on each node's local time. Both were accepted because each node believed the request belonged to the current window. The global aggregate allowed 2x the intended rate. The fix window boundary drift: attacker sent 1 request to node A at 59.5s, 1 to node B at 60.5s. Both were counted in different 1-minute windows on each node, but globally they occurred within 1 second.
Fix1. Switched from node-local TIME to centralised Redis cluster time (all nodes query same leader for timestamp, adds 2ms latency). 2. Implemented sliding window counter (Redlock-style) using Redis sorted sets (ZSET) with expiration, which doesn't depend on absolute window boundaries. 3. Added clock drift monitoring: if node clock differs from cluster leader by >10ms, reject rate-limit decisions from that node. 4. For cross-region clusters, moved rate limiter to single region (us-east-1) and replicated decisions via Kafka — eventual consistency, but enforcement happens at edge. 5. Added CLOCK_SKEW_TOLERANCE_MS = 100 config; if multiple nodes disagree on window, use the earliest window to be conservative.
Key Lesson
Distributed rate limiters using local node time are vulnerable to clock skew. Centralise time or use algorithms that don't depend on absolute windows.Sliding window log (ZSET) eliminates window boundary attacks but memory grows with request volume.Fixed window counters in distributed systems can allow 2x the intended rate at boundaries. Acceptable only for non-critical limits.If you must use local time, add a conservative buffer (e.g., treat requests within 100ms of window boundary as potentially belonging to previous/next window and reject).Monitor the delta between local node time and a trusted time source. Alert if drift exceeds 20ms.
Production Debug Guide

Symptom → Action mapping for common rate limiter failures in production APIs.

Rate limit exceeded errors for legitimate users who haven't hit the limitCheck clock skew across rate limiter nodes (Redis cluster time). If drift > 100ms, synchronise NTP. Also check if user's IP is behind a shared NAT (office, mobile carrier) — multiple users sharing same IP hit the same limit.
Rate limiter allowing requests beyond configured limitVerify atomicity of increment + check. Non-atomic read-then-write allows race conditions at high concurrency. Ensure your rate limiter uses Lua script or Redis INCR with EXPIRE in same command (multi/exec or eval).
High memory usage in Redis — OOM kills rate limiterCheck if using Sliding Window Log algorithm (ZSET). Each request adds a timestamp entry. For 100k RPS, memory grows by 100k keys per second * TTL. Switch to Sliding Window Counter (fixed memory) or Token Bucket.
Rate limiter adds 20ms+ latency to API callsCheck if Redis is in a different region from API server. Move Redis to same region (or AWS Local Zone). Also check if you're using Lua scripts that block Redis for too long (avoid KEYS, use SCAN).
Rate limits bypassed by attackers rotating IP addressesIf limiting per IP, attackers can use proxy pools (thousands of IPs). Switch to per-user authentication (API key, JWT) for meaningful limits. For unauthenticated endpoints, use CAPTCHA or global aggregate limits.

Every public API you've ever used has a rate limiter quietly running behind it. GitHub limits REST calls to 5,000 per hour. Stripe throttles card-creation endpoints. OpenAI caps tokens per minute. These aren't arbitrary annoyances — they're the difference between a stable, profitable service and one that gets DDOSed into bankruptcy by a single misconfigured cron job. Rate limiting is one of the most deceptively simple concepts in system design: easy to describe in an interview, brutally hard to get right in production.

The core problem rate limiting solves is resource fairness under pressure. Without it, one aggressive client can saturate your database connection pool, spike your compute costs, or starve every other tenant on a shared platform. It's a first-line defense against abuse, a cost-control mechanism, and an SLA enforcement tool — all rolled into one component that typically adds less than 1ms of latency per request when built correctly.

By the end of this article you'll understand the four major rate-limiting algorithms from first principles — including their memory footprints and failure modes — know exactly how Redis-based distributed rate limiters work under the hood (including the Lua script trick that makes them atomic), be able to design a rate limiter that survives node failures and clock skew, and walk into any system design interview and confidently discuss trade-offs that most candidates never even consider.

Fixed Window Counter — Simple but Burst-Prone

The fixed window algorithm divides time into discrete intervals (e.g., 1 minute, 1 hour). For each key (user ID, IP address), the counter resets at the start of each interval.

Implementation: Redis INCR with key like rate_limit:user_123:2026-03-18T15:34. Set an expiry of TTL seconds. If the counter after increment exceeds the limit, reject the request.

This is the easiest algorithm to implement and the most memory-efficient (one integer per key). However, it allows bursts at the window boundary. If the limit is 1000 requests per hour, a user can send 1000 requests at 11:59:59 and another 1000 at 12:00:00 — 2000 requests within 2 seconds. The worst-case rate is 2x the limit.

In production, fixed window is acceptable for soft rate limits (logging, analytics) or when the window size is very small (1 second). For hard limits that enforce service-level agreements, you need a sliding window algorithm.

io/thecodeforge/rate_limiter/fixed_window.py · PYTHON
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768
import time
import redis
from datetime import datetime

class FixedWindowRateLimiter:
    def __init__(self, redis_client, limit_per_window, window_seconds):
        self.redis = redis_client
        self.limit = limit_per_window
        self.window_seconds = window_seconds

    def _get_window_key(self, key):
        """Generate key with current window number (UNIX timestamp / window size)."""
        current_window = int(time.time() / self.window_seconds)
        return f"rate_limit:{key}:{current_window}"

    def allow_request(self, key):
        """
        Returns (allowed, current_count).
        WARNING: Not atomic — race condition possible at high concurrency.
        Use Lua script for production (see sliding_window_counter.py).
        """
        window_key = self._get_window_key(key)
        
        # Pipeline: INCR + EXPIRE (if key doesn't exist, EXPIRE sets TTL)
        pipe = self.redis.pipeline()
        pipe.incr(window_key)
        pipe.expire(window_key, self.window_seconds)
        result = pipe.execute()
        
        current_count = result[0]  # INCR result
        
        if current_count <= self.limit:
            return True, current_count
        else:
            return False, current_count

    # ─────────────────────────────────────────────────────────────
    # PRODUCTION-READY VERSION (Atomic Lua script)
    # ─────────────────────────────────────────────────────────────
    
    FIXED_WINDOW_LUA = """
        local key = KEYS[1]
        local limit = tonumber(ARGV[1])
        local window_seconds = tonumber(ARGV[2])
        
        local current = redis.call('INCR', key)
        if current == 1 then
            redis.call('EXPIRE', key, window_seconds)
        end
        
        if current <= limit then
            return {1, current}
        else
            return {0, current}
        end
    """
    
    def allow_request_atomic(self, key):
        """Use Lua script for atomic operation."""
        window_key = self._get_window_key(key)
        allowed, current = self.redis.eval(
            self.FIXED_WINDOW_LUA,
            1,  # number of keys
            window_key,
            self.limit,
            self.window_seconds
        )
        return allowed == 1, current
🔥Fixed Window Burst Example
Limit: 10 requests/minute. Worst-case: User sends 10 requests at 01:59:59 (counts in window 01:59), then 10 requests at 02:00:00 (new window). Total 20 requests in 2 seconds = 2x the limit. Acceptable for soft limits; unacceptable for hard SLAs.
📊 Production Insight
We deployed fixed window for a public API with 1 request/second limit (1-second windows). Attackers sent 1 request at 0.999s, 1 at 1.001s. Both were in different windows. Rate limiter accepted 2 requests in 2ms — 1000x the intended rate.
The team assumed 1-second windows were fine because 'nobody will hit the boundary'. Attackers did.
Rule: Fixed window is only safe when the window size is very small (1 second) AND you add a cooldown period after each window (e.g., reject requests in first 100ms of next window).
🎯 Key Takeaway
Fixed window = simple but burst-prone. Worst-case allows 2x limit at boundaries.
Use for soft limits only. For hard SLAs, use sliding window or token bucket.
Rule: If you implement fixed window, make window size small (1 second) and add cooldown.
Choose Rate Limiting Algorithm
IfSoft limit (analytics, logging, non-critical features)
UseFixed window counter. Simple, O(1) memory, but allows 2x bursts at boundaries.
IfHard SLA limit with predictable bursting (e.g., 1000 requests per hour, burst up to 1200)
UseToken bucket. Allows bursts, limits sustained rate. O(1) memory.
IfHard SLA limit with precise window enforcement (no boundary bursts)
UseSliding window counter (Redis + Lua). O(1) memory, ~1% precision loss vs log.
IfLow-traffic API (<10 QPS per key), need perfect precision
UseSliding window log (ZSET). O(n) memory per key (stores each request timestamp).
IfDistributed cluster, cannot tolerate clock skew
UseCentralised time source (Redis TIME) + sliding window counter. Add 2-5ms latency.

Sliding Window Log — Precise But Memory-Intensive

The sliding window log algorithm stores a timestamp of each request in a sorted set (ZSET) keyed by user ID. When a new request arrives, it removes entries older than the window (e.g., last 60 seconds), counts the remaining entries, and accepts if the count is below the limit.

Implementation: ZADD key timestamp request_id. ZREMRANGEBYSCORE key 0 current_time - window_seconds. ZCARD key to get count.

This algorithm provides perfect precision — no burst at boundaries because the window slides continuously rather than resetting at fixed intervals.

Memory consumption grows with request volume. At 10,000 requests per second, each window stores up to 600,000 timestamps (for 60-second window). Each timestamp is ~50 bytes in Redis = 30MB per key. For 10,000 keys, that's 300GB — unacceptable.

In production, sliding window log is only suitable for low-traffic APIs (<10 QPS per key). For high-traffic, use sliding window counter (approximate, O(1) memory).

io/thecodeforge/rate_limiter/sliding_window_log.py · PYTHON
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960
import time
import redis
import uuid

class SlidingWindowLogRateLimiter:
    def __init__(self, redis_client, limit_per_window, window_seconds):
        self.redis = redis_client
        self.limit = limit_per_window
        self.window_seconds = window_seconds

    def allow_request(self, key):
        """
        Sliding window log using Redis ZSET.
        Memory O(number of requests in window) per key.
        """
        zset_key = f"rate_limit_log:{key}"
        now = time.time()
        window_start = now - self.window_seconds
        
        # Use Lua script for atomic cleanup + count
        lua_script = """
            local key = KEYS[1]
            local now = tonumber(ARGV[1])
            local window_start = tonumber(ARGV[2])
            local limit = tonumber(ARGV[3])
            local request_id = ARGV[4]
            
            -- Remove entries outside current window
            redis.call('ZREMRANGEBYSCORE', key, 0, window_start)
            
            -- Get current count
            local count = redis.call('ZCARD', key)
            
            if count < limit then
                -- Add new request
                redis.call('ZADD', key, now, request_id)
                -- Set TTL to window_seconds + buffer
                redis.call('EXPIRE', key, window_start + 60)
                return {1, count + 1}
            else
                return {0, count}
            end
        """
        
        request_id = str(uuid.uuid4())
        allowed, count = self.redis.eval(
            lua_script,
            1,  # number of keys
            zset_key,
            now,
            window_start,
            self.limit,
            request_id
        )
        return allowed == 1, count
    
    def get_memory_usage(self, key):
        """Estimate memory usage for a key (in bytes)."""
        zset_key = f"rate_limit_log:{key}"
        return self.redis.memory_usage(zset_key) or 0
⚠ Memory Usage Warning
Sliding window log stores one ZSET entry per request. At 1000 QPS, a 60-second window stores 60,000 entries per key. Each entry ~50 bytes = 3MB per key. With 10,000 keys, that's 30GB of Redis memory. Use sliding window counter for high-traffic APIs.
📊 Production Insight
We deployed sliding window log for a public API with 1000 QPS. Redis memory grew from 500MB to 12GB in 2 hours. OOM killed Redis, rate limiter failed open (allowed all requests). Attackers exploited the open state to DDoS the origin.
Fix: Switched to sliding window counter (O(1) memory).
Rule: Never use sliding window log for high-traffic APIs. Memory grows with request volume.
🎯 Key Takeaway
Sliding window log = perfect precision, O(window_requests) memory.
Use only for low-traffic APIs (<10 QPS per key) or when precision is critical.
Rule: For high-traffic (>100 QPS), use sliding window counter — approximate but O(1) memory.

Token Bucket — Burst-Friendly and Predictable

The token bucket algorithm maintains a bucket for each key with a capacity (max burst) and a refill_rate (tokens per second). Each request consumes one token. Tokens are added at the refill rate up to the capacity.

Implementation: Store last_refill_timestamp and tokens in Redis. On each request, calculate tokens added since last refill, cap at capacity, deduct one token.

This algorithm is ideal for APIs that need to handle bursts (e.g., 10 requests in 1 second) but enforce a sustained average rate (e.g., 60 requests per minute). It's also memory-efficient: O(1) per key.

Token bucket is the default choice for most production rate limiters (used by AWS API Gateway, Stripe, GitHub).

Distributed implementation: use Redis Lua script for atomic token calculation. The script reads current_tokens and last_refill, computes new token count, deducts if sufficient, and updates Redis.

Corner case: if last_refill is too far in the past (e.g., user hasn't used the API for days), the token count caps at capacity. This prevents unlimited token accumulation.

io/thecodeforge/rate_limiter/token_bucket.py · PYTHON
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960
import time
import redis

class TokenBucketRateLimiter:
    def __init__(self, redis_client, capacity, refill_rate_per_second):
        self.redis = redis_client
        self.capacity = capacity
        self.refill_rate = refill_rate_per_second
    
    def allow_request(self, key):
        """
        Token bucket using Redis Lua script.
        Memory O(1) per key (stores tokens and last_refill timestamp).
        """
        bucket_key = f"rate_limit_token:{key}"
        now = time.time()
        
        lua_script = """
            local key = KEYS[1]
            local now = tonumber(ARGV[1])
            local capacity = tonumber(ARGV[2])
            local refill_rate = tonumber(ARGV[3])
            
            local data = redis.call('HMGET', key, 'tokens', 'last_refill')
            local tokens = tonumber(data[1]) or capacity
            local last_refill = tonumber(data[2]) or now
            
            -- Calculate tokens to add
            local elapsed = now - last_refill
            local tokens_to_add = elapsed * refill_rate
            tokens = math.min(capacity, tokens + tokens_to_add)
            
            local allowed = 0
            if tokens >= 1 then
                allowed = 1
                tokens = tokens - 1
            end
            
            -- Store updated state
            redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
            redis.call('EXPIRE', key, 3600)  -- TTL 1 hour
            
            return {allowed, tokens}
        """
        
        allowed, tokens = self.redis.eval(
            lua_script,
            1,
            bucket_key,
            now,
            self.capacity,
            self.refill_rate
        )
        return allowed == 1, tokens

    def get_token_count(self, key):
        """Get current token count for debugging."""
        bucket_key = f"rate_limit_token:{key}"
        tokens = self.redis.hget(bucket_key, 'tokens')
        return float(tokens) if tokens else self.capacity
Mental Model
Token Bucket as a Regulated River
Think of the bucket as a reservoir. Tokens are water flowing in at a constant rate. Each request is a person drinking a cup of water. The reservoir size = max burst you can handle.
  • Capacity = maximum burst. If capacity = 10, you can handle 10 requests instantly.
  • Refill rate = sustained average. If refill_rate = 1 token/second, sustained limit is 3600 requests/hour.
  • Spike handling: burst uses stored tokens, then refills at constant rate.
  • If user hasn't used API for hours, tokens accumulate up to capacity (not unlimited).
  • Token bucket is the algorithm behind AWS API Gateway's rate limiting.
📊 Production Insight
We used token bucket for an API with customer tiers (free: 10 req/min, pro: 1000 req/min). The free tier capacity was set to 10, refill_rate = 10/60 = 0.1667 tokens/second. Pro tier capacity = 1000, refill_rate = 1000/60 = 16.67 tokens/second.
A pro customer made a burst of 1000 requests in 1 second (API was idle for hours). Token bucket allowed it (capacity=1000). After the burst, tokens dropped to 0, and refill rate limited to 16.67 req/sec. The system stayed stable.
🎯 Key Takeaway
Token bucket = burst-tolerant, sustained-rate limiting. O(1) memory per key.
Ideal for API rate limiting where you want to allow bursts but enforce average rate.
Rule: Set capacity based on your infrastructure's max tolerable burst. Set refill rate based on sustainable QPS.

Sliding Window Counter — Memory-Efficient and Production-Proven

The sliding window counter algorithm is a hybrid of fixed window and sliding window log. It tracks two counters per key: current window count and previous window count. The weighted average of the previous and current windows approximates the sliding window without storing per-request timestamps.

Formula: estimated_count = previous_window_count * ((window_end - now) / window_size) + current_count

Implementation: Redis key for current window (e.g., rate_limit:user_123:{current_minute}) and previous window (rate_limit:user_123:{previous_minute}). On each request, get current and previous counts, compute weighted estimate, reject if above limit.

This algorithm was popularised by Cloudflare and is used in many production systems. It provides ~1% precision error compared to sliding window log but O(1) memory per key.

Distributed clock skew issue: if nodes have different system times, they may write to different window keys for the same request. Solution: use centralised Redis TIME command in Lua script to get a consistent timestamp across all nodes.

Memory optimisation: Set TTL on window keys to 2 * window size (e.g., 60-second window → TTL 120 seconds). This prevents stale keys from accumulating.

io/thecodeforge/rate_limiter/sliding_window_counter.py · PYTHON
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970
import time
import redis

class SlidingWindowCounterRateLimiter:
    def __init__(self, redis_client, limit_per_window, window_seconds):
        self.redis = redis_client
        self.limit = limit_per_window
        self.window_seconds = window_seconds
    
    def allow_request(self, key):
        """
        Sliding window counter using two Redis keys (current + previous windows).
        Memory O(1) per key. Precision ~1% vs sliding window log.
        Uses Redis TIME for consistent cluster-wide timestamp.
        """
        lua_script = """
            local key = KEYS[1]
            local limit = tonumber(ARGV[1])
            local window_seconds = tonumber(ARGV[2])
            local now = tonumber(ARGV[3])
            
            local window_start = now - window_seconds
            -- Each window is 1 second (or window_seconds, but we use 1-second granularity)
            -- Calculate current and previous window identifiers
            local current_window = math.floor(now)
            local previous_window = math.floor(window_start)
            
            local current_key = key .. ':' .. current_window
            local previous_key = key .. ':' .. previous_window
            
            -- Get counts for both windows
            local current_count = redis.call('GET', current_key) or 0
            local previous_count = redis.call('GET', previous_key) or 0
            
            -- Calculate weighted count: previous window weight * (overlap / window_seconds)
            local overlap = window_seconds - (now - previous_window)
            local weight = overlap / window_seconds
            local estimated_count = previous_count * weight + current_count
            
            if estimated_count < limit then
                -- Increment current window counter
                local new_count = redis.call('INCR', current_key)
                if new_count == 1 then
                    redis.call('EXPIRE', current_key, window_seconds * 2)
                end
                return {1, new_count}
            else
                return {0, estimated_count}
            end
        """
        
        # Get consistent timestamp from Redis (not local time)
        redis_time = self.redis.time()  # returns [seconds, microseconds]
        now = redis_time[0] + redis_time[1] / 1000000.0
        
        allowed, count = self.redis.eval(
            lua_script,
            1,
            f"rate_limit_sliding:{key}",
            self.limit,
            self.window_seconds,
            now
        )
        return allowed == 1, count

    def get_estimated_usage(self, key):
        """Get current estimated count without consuming a request."""
        # Similar to allow_request but without INCR
        # ... (omitted for brevity)
        pass
💡Use Redis TIME for Distributed Clock Sync
Never rely on node.local time for rate limiting in a cluster. Node A's 12:00:00.001 may be node B's 11:59:59.995 due to NTP drift. Use redis.time() to get a consistent timestamp from the Redis leader. Adds ~2ms latency but ensures correctness.
📊 Production Insight
A cross-region Redis cluster (us-east-1 + eu-west-1) had 30ms average clock drift. Fixed window counters on each node treated the same request as belonging to different windows, allowing 2x the rate.
The fix was to use sliding window counter with redis.time() to get a centralised timestamp. Each Lua script called redis.time() at the beginning, ensuring all nodes use the same time for window calculation.
Result: precision error dropped from 100% (2x rate) to <1%.
🎯 Key Takeaway
Sliding window counter = O(1) memory, ~1% precision error.
Use centralised timestamp (redis.time()) to avoid clock skew in distributed clusters.
Rule: For high-traffic APIs (>100 QPS per key), this is the optimal algorithm.
🗂 Rate Limiting Algorithms Comparison
Trade-offs in memory, precision, burst handling, and production suitability.
AlgorithmMemory per keyPrecisionBurst handlingDistributed clock skew safe?Best for
Fixed Window CounterO(1)Poor (2x rate at boundaries)Poor (hard reset at window edge)No (requires central timestamp)Soft limits, logging, non-critical features
Sliding Window LogO(window_requests)Perfect (100%)Excellent (sliding)Yes (timestamps per request)Low-traffic APIs (<10 QPS per key)
Token BucketO(1)Excellent (sustained rate)Excellent (configurable burst)Yes (with refill timestamp)Production APIs with burst allowance
Sliding Window CounterO(1)~99% (1% error)Good (sliding approximation)Yes (with centralised timestamp)High-traffic APIs (>100 QPS per key)

🎯 Key Takeaways

  • Fixed window = simple but burst-prone. Use for soft limits only.
  • Token bucket = burst-friendly, O(1) memory. Ideal for most production APIs.
  • Sliding window counter = O(1) memory, ~1% precision error. Best for high-traffic APIs.
  • Sliding window log = perfect precision, O(window_requests) memory. Low-traffic only.
  • Distributed rate limiters need centralised timestamps (redis.time()) to avoid clock skew.
  • Atomic increments are mandatory — non-atomic read-then-write allows 10-20% over-limit at high concurrency.
  • Set TTL = 2× window size to prevent stale keys from accumulating.
  • Rate limiting by IP is easily bypassed by proxy pools. Prefer per-authenticated-user limits.
  • For Redis outages, choose: fail open (availability), fail closed (security), or degrade to local limiter (balanced).

⚠ Common Mistakes to Avoid

    Using non-atomic read-then-write in fixed window counter
    Symptom

    At high concurrency (>1000 RPS), rate limiter allows 10-20% more requests than limit. Race condition: two threads read count=9 (limit=10), both increment to 10, both return allowed.

    Fix

    Use Redis Lua script or MULTI/EXEC with WATCH to make increment + check atomic. Or use INCR + EXPIRE in same pipeline with Lua.

    Ignoring clock skew in distributed rate limiter
    Symptom

    Cross-region Redis cluster allows 2x limit at window boundaries due to node time differences. Attackers exploit by sending requests to different nodes within drift window.

    Fix

    Use centralised timestamp via redis.time() in Lua script. Or use sliding window counter with fixed window IDs based on leader node's time.

    Setting TTL too short on token bucket state
    Symptom

    Infrequently used keys (e.g., users who log in once per week) lose token bucket state. On next login, bucket resets to capacity, allowing a burst that may exceed infrastructure capacity.

    Fix

    Set TTL to at least 2x expected idle period. For token bucket, store last_refill and tokens; if last_refill is missing, initialise with capacity. For fixed window, set TTL = 2 * window_seconds.

    Using sliding window log for high-traffic API
    Symptom

    Redis memory grows linearly with request volume. 10k QPS × 60-second window = 600k entries per key. 10 keys = 6M entries × 50 bytes = 300MB. OOM kills Redis.

    Fix

    Switch to sliding window counter (O(1) memory) or token bucket. For ZSET-based log, add ZREMRANGEBYSCORE before ZCARD to keep window size bounded.

    Rate limiting by IP only — bypassed by proxy pools
    Symptom

    Attacker uses 10,000 proxy IPs to send 1 request per IP per second, bypassing the 1000 req/sec per-IP limit. API sees 10,000 req/sec from unique IPs, each below limit.

    Fix

    Use per-authenticated-user limits (API key, JWT). For unauthenticated endpoints, combine IP + User-Agent fingerprinting + CAPTCHA for suspicious traffic. Implement global aggregate limit as safety net.

Interview Questions on This Topic

  • QExplain the trade-offs between fixed window counter and sliding window counter. When would you choose one over the other?SeniorReveal
    Fixed window counter: O(1) memory, simple implementation, but allows 2x burst at window boundaries. Worst-case: 1000 req/hour limit → 2000 requests in 2 seconds at the 59:59-60:00 boundary. This makes it unsuitable for hard SLAs. Sliding window counter: O(1) memory but more complex (stores two windows + weighted average). Precision ~99% (1% error). Eliminates boundary bursts. Requires careful handling of clock skew in distributed systems (use centralised timestamp). Choose fixed window for: soft limits (analytics, logging), low-traffic APIs (<10 QPS), or when a 2x burst is acceptable. Choose sliding window counter for: hard SLAs (payment APIs, critical infrastructure), high-traffic APIs (>100 QPS), or when you need precise rate enforcement without storing per-request timestamps.
  • QHow do you implement a distributed rate limiter that survives Redis cluster node failures?SeniorReveal
    Three strategies: 1. Client-side consistent hashing: Each rate limiter key (e.g., user_123) is always routed to the same Redis node. If that node fails, requests for that key are rejected (fail closed) until node recovers. Acceptable for non-critical rate limits. 2. Redis Cluster with replica reads: Configure replicated keys across nodes. On write failure, fall back to replica for reads only (but writes still need quorum). Use WAIT command to ensure write propagation before acknowledging request. 3. Leaky bucket with local rate limiter + central sync: Each API node maintains a local token bucket with capacity = global limit / number of nodes + 20% buffer. Periodically sync with Redis (every 10ms) to adjust local capacity. This reduces dependency on Redis and allows nodes to continue during Redis outage (at reduced capacity). For financial APIs, I prefer strategy 3: fail closed but degrade gracefully. For social media APIs, strategy 1 with short TTLs on rate limit state.
  • QWhat happens when the Redis node storing rate limit counters is unreachable? How does your system behave?SeniorReveal
    Three choices: fail open (allow all requests), fail closed (reject all requests), or degrade to local rate limiter. Fail open: If Redis is down, rate limiter allows all requests. This prevents false positives but leaves API vulnerable to DDoS. Only acceptable for non-critical features where availability > abuse risk. Fail closed: Rate limiter rejects all requests when Redis is unavailable. Prevents abuse but causes false negatives during Redis outage. Acceptable for high-security APIs (payment, authentication). Degrade to local rate limiter: Each API node maintains a local token bucket with capacity = global limit * 2 (as a safety margin). During Redis outage, local limiter throttles traffic but doesn't reject all. When Redis recovers, the node syncs its state (or simply discards local state, accepting that some excess traffic may have been allowed). Best practice: For public APIs, use degraded mode with aggressive local limits. For internal APIs, fail closed + alert on Redis health check.
  • QHow does Stripe's rate limiting work at 100,000 RPS scale?SeniorReveal
    Stripe uses a multi-layer approach: Edge layer (Cloudflare): Global rate limits per IP (1000 requests/second) to absorb DDoS. Stateless, uses sliding window counter with 1s windows. API Gateway layer (custom): Per authenticated user+API key limits (e.g., 1000 requests per 10 seconds). Uses token bucket with distributed Redis cluster (sharded by key). Each shard handles ~10k keys with local token buckets; Lua script updates state in <0.5ms. Service layer (Stripe's core): Per endpoint limits (e.g., /v1/charges: 100 requests per 1 second). Uses sliding window log with Redis ZSET because traffic per user is low; memory is acceptable. Backpressure: If Redis latency exceeds 5ms, the rate limiter fails open but logs a metric and triggers circuit breaker after 3 consecutive failures. A separate health check monitors Redis and switches to degraded mode before it becomes critical. Key insight: No single algorithm fits all tiers. Choose based on latency, memory, and precision requirements per endpoint.
  • QHow do you test a rate limiter in production without breaking existing clients?SeniorReveal
    Use shadow mode and gradual rollout: Phase 1 — Shadow mode: Deploy rate limiter that logs decision (allow/deny) but never blocks. Compare actual vs allowed counts in analytics. This catches algorithm bugs without customer impact. Phase 2 — Dry run with per-customer overrides: Allow customers who opt into testing to experience real blocking. Monitor error rates and false positives. Phase 3 — Gradual rollout with canary: Enable rate limiter for 1% of traffic. If error rate increases by >0.1%, roll back automatically. Phase 4 — Full rollout with per-customer whitelist: Allow temporary exemptions for large customers who hit limits. Provide a self-service console to adjust limits. Testing tools: - wrk2 for fixed-rate load testing (e.g., 1000 RPS sustained) - Custom scripts that replay production traffic (from logs) to measure false positives. - Chaos engineering: inject Redis latency (using tc) to verify failover behaviour.

Frequently Asked Questions

What is the difference between rate limiting and throttling?

Rate limiting caps the number of requests over a time window (e.g., 1000 requests per hour). Throttling reduces the rate of processing (e.g., delay requests to 10 per second). Throttling often uses a token bucket with queuing; rate limiting rejects excess requests outright. In practice, the terms are used interchangeably, but throttling implies a gradual slowdown, while rate limiting implies explicit rejection.

Should I rate limit by IP, user ID, or API key?

Prioritise: API key (authenticated) > user ID > IP address. IP-based limits are easily bypassed by proxy pools (attackers can rotate through thousands of IPs). For unauthenticated endpoints, combine IP + User-Agent fingerprinting + global aggregate limits as a safety net. For authenticated users, limit by API key first, user ID second, and IP only as a last resort.

How do I choose the right window size for rate limiting?

Match window size to your business SLA and infrastructure capacity: - 1 second: DDoS protection, per-second billing (e.g., 10 requests/sec). - 1 minute: API rate limits for moderate traffic (e.g., 300 requests/min). - 1 hour: Standard API quotas (e.g., 5000 requests/hour). - 1 day: Tiered plans (free: 10k/day, pro: 100k/day).

Trade-off: smaller windows enable faster burst recovery but increase Redis key churn. Larger windows smooth out bursts but may allow short-term overload.

What is the memory footprint of Redis-based rate limiters?

Fixed window or token bucket: O(1) per key → 10 million keys × 100 bytes = 1GB. Sliding window counter: similar O(1) per key. Sliding window log (ZSET): O(window_requests) per key → 1M QPS × 60s × 50 bytes = 3GB per key. Never use sliding window log for high-traffic APIs.

How do you handle rate limiting for WebSocket connections?

WebSockets are long-lived; rate limiting per open connection is insufficient. Instead, limit messages per time window per connection. Use an in-memory token bucket on the WebSocket server (e.g., 10 messages per second per connection). For distributed WebSocket servers (multiple nodes), use consistent hashing to assign each connection to a specific node, then use local rate limiter. Avoid Redis per-message overhead (adds latency). For cross-node coordination, use Redis Pub/Sub to broadcast rate limit updates, but accept that some messages will slip through during propagation delay (eventual consistency is acceptable).

Can I implement rate limiting without Redis?

Yes, for single-node deployments use in-memory token bucket (Guava RateLimiter in Java, time.Sleep-based in Go). For distributed systems, you can use consistent hashing + local limiters with total capacity = global limit / number of nodes + small buffer. However, Redis (or another central KV store) is the standard solution because it provides atomic operations, persistence, and high throughput. Alternatives: Hazelcast (in-memory grid), etcd (consensus-based, slower), or local limiters with gossip protocol (complex).

🔥
Naren Founder & Author

Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.

← PreviousDesign SlackNext →Design a Notification System
Forged with 🔥 at TheCodeForge.io — Where Developers Are Forged