Design a Rate Limiter — Clock Skew Lets 2x Traffic Through
30ms NTP drift between Redis nodes doubled request rates at window boundaries.
20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.
- Rate limiting stops API abuse by capping requests per key (user, IP, endpoint) in a time window. Prevents DDoS, cost spikes, and database overload.
- Token bucket: stores up to capacity tokens; each request consumes one, replenished at fixed rate. Burst-friendly, memory O(1) per key.
- Sliding window log: stores timestamp per request; count those in last window. Memory O(window requests) — can spike.
- Sliding window counter: hybrid of fixed window + log. Memory O(1), approximates log precision with 10-100M Redis ops/day.
- Fixed window counter: counts per discrete window (e.g., 1-min intervals). Burst at boundaries allows 2x rate in worst case (e.g., 59s and 60s).
- Production trap: Redis cluster + clock skew breaks rate limiting. Node A sees request at 12:00:01, node B at 11:59:59. Without synchronised clocks, limiters drift.
Imagine a theme park ride that only lets 10 people board every 5 minutes — no matter how long the line gets. That limit exists so the ride doesn't break and everyone stays safe. A rate limiter does the exact same thing for your API: it decides how many requests a user (or system) can make in a given window, and politely — or not so politely — turns away the rest. It's the velvet rope outside a club, except it runs at microsecond speed inside your servers.
Choosing the right rate limiter is a high-stakes engineering decision, not a copy-paste config. Without one fixed correctly, your system silently bleeds resources under a spike or, worse, lets a malicious burst slide right through a clock-skewed window. This article walks the trade-offs between memory, precision, and burst tolerance so you can pick the strategy your backend won’t regret at 3 AM.
Why Rate Limiters Fail Under Clock Skew
A rate limiter controls how many requests a client can make in a given time window. The core mechanic is a counter per client, reset periodically — typically using a fixed window (e.g., 100 requests per minute) or a sliding window (e.g., 10 requests per second with 1-second granularity). The naive implementation stores a timestamp and a count in a hash map, checks if the window has elapsed, and either resets or increments. That's O(1) per request but breaks under clock skew.
In practice, distributed systems rely on synchronized clocks. If node A's clock is 2 seconds ahead of node B's, a client hitting A then B can double its allowed rate. The sliding window algorithm using a sorted set (e.g., Redis Sorted Set with ZREMRANGEBYSCORE) avoids this by storing each request's timestamp and counting within a moving window — but only if all nodes share the same monotonic clock. Without it, you get 2x traffic through.
Use a rate limiter when you must protect a shared resource — an API, a database, a queue — from a single misbehaving client or a coordinated burst. The choice of algorithm (token bucket, leaky bucket, sliding window) depends on whether you need to smooth bursts or enforce a hard cap. In production, clock skew is the silent killer that turns a 100 req/min limit into 200.
Fixed Window Counter — Simple but Burst-Prone
The fixed window algorithm divides time into discrete intervals (e.g., 1 minute, 1 hour). For each key (user ID, IP address), the counter resets at the start of each interval.
Implementation: Redis INCR with key like rate_limit:user_123:2026-03-18T15:34. Set an expiry of TTL seconds. If the counter after increment exceeds the limit, reject the request.
This is the easiest algorithm to implement and the most memory-efficient (one integer per key). However, it allows bursts at the window boundary. If the limit is 1000 requests per hour, a user can send 1000 requests at 11:59:59 and another 1000 at 12:00:00 — 2000 requests within 2 seconds. The worst-case rate is 2x the limit.
In production, fixed window is acceptable for soft rate limits (logging, analytics) or when the window size is very small (1 second). For hard limits that enforce service-level agreements, you need a sliding window algorithm.
Sliding Window Log — Precise But Memory-Intensive
The sliding window log algorithm stores a timestamp of each request in a sorted set (ZSET) keyed by user ID. When a new request arrives, it removes entries older than the window (e.g., last 60 seconds), counts the remaining entries, and accepts if the count is below the limit.
Implementation: ZADD key timestamp request_id. ZREMRANGEBYSCORE key 0 current_time - window_seconds. ZCARD key to get count.
This algorithm provides perfect precision — no burst at boundaries because the window slides continuously rather than resetting at fixed intervals.
Memory consumption grows with request volume. At 10,000 requests per second, each window stores up to 600,000 timestamps (for 60-second window). Each timestamp is ~50 bytes in Redis = 30MB per key. For 10,000 keys, that's 300GB — unacceptable.
In production, sliding window log is only suitable for low-traffic APIs (<10 QPS per key). For high-traffic, use sliding window counter (approximate, O(1) memory).
Token Bucket — Burst-Friendly and Predictable
The token bucket algorithm maintains a bucket for each key with a capacity (max burst) and a refill_rate (tokens per second). Each request consumes one token. Tokens are added at the refill rate up to the capacity.
Implementation: Store last_refill_timestamp and tokens in Redis. On each request, calculate tokens added since last refill, cap at capacity, deduct one token.
This algorithm is ideal for APIs that need to handle bursts (e.g., 10 requests in 1 second) but enforce a sustained average rate (e.g., 60 requests per minute). It's also memory-efficient: O(1) per key.
Token bucket is the default choice for most production rate limiters (used by AWS API Gateway, Stripe, GitHub).
Distributed implementation: use Redis Lua script for atomic token calculation. The script reads current_tokens and last_refill, computes new token count, deducts if sufficient, and updates Redis.
Corner case: if last_refill is too far in the past (e.g., user hasn't used the API for days), the token count caps at capacity. This prevents unlimited token accumulation.
- Capacity = maximum burst. If capacity = 10, you can handle 10 requests instantly.
- Refill rate = sustained average. If refill_rate = 1 token/second, sustained limit is 3600 requests/hour.
- Spike handling: burst uses stored tokens, then refills at constant rate.
- If user hasn't used API for hours, tokens accumulate up to capacity (not unlimited).
- Token bucket is the algorithm behind AWS API Gateway's rate limiting.
Sliding Window Counter — Memory-Efficient and Production-Proven
The sliding window counter algorithm is a hybrid of fixed window and sliding window log. It tracks two counters per key: current window count and previous window count. The weighted average of the previous and current windows approximates the sliding window without storing per-request timestamps.
Formula: estimated_count = previous_window_count * ((window_end - now) / window_size) + current_count
Implementation: Redis key for current window (e.g., rate_limit:user_123:{current_minute}) and previous window (rate_limit:user_123:{previous_minute}). On each request, get current and previous counts, compute weighted estimate, reject if above limit.
This algorithm was popularised by Cloudflare and is used in many production systems. It provides ~1% precision error compared to sliding window log but O(1) memory per key.
Distributed clock skew issue: if nodes have different system times, they may write to different window keys for the same request. Solution: use centralised Redis TIME command in Lua script to get a consistent timestamp across all nodes.
Memory optimisation: Set TTL on window keys to 2 * window size (e.g., 60-second window → TTL 120 seconds). This prevents stale keys from accumulating.
redis.time() to get a consistent timestamp from the Redis leader. Adds ~2ms latency but ensures correctness.redis.time() to get a centralised timestamp. Each Lua script called redis.time() at the beginning, ensuring all nodes use the same time for window calculation.redis.time()) to avoid clock skew in distributed clusters.Where to Park the Cop: Placement of a Rate Limiter
Where you put the rate limiter in your stack changes what it protects. Slap it on the API gateway and you shield your entire backend from bad actors, but you lose per-service nuance. Embed it in the application layer and you get full control over business logic — but now every service has to implement its own throttle logic, which is how you end up with five different rate limiting bugs in production.
The right answer depends on what you're defending. If you're blocking DDoS or scraping, put the limiter at the edge — cloud load balancer or API gateway. That's where you kill traffic before it consumes compute. If you're rate limiting per-user for billing tiers, the application layer is the only place that knows who the user is and what they paid for.
Hybrid approach works best. Edge limiter for broad strokes (IP-based, global request rate). App-layer limiter for fine-grained control (user ID, API key, endpoint-specific rules). Just don't try to do everything in the database. Your counter store needs to be fast — Redis or similar in-memory store, not PostgreSQL.
High-Level Design: The Skeleton Before the Algorithms
A rate limiter isn't one thing. It's a decision engine with three pieces: a rule store, a counter store, and a decision point. The rule store holds your configuration — who gets what limit and for how long. The counter store tracks how many requests each user has burned. The decision point sits between the client and your backend, checks the counters against the rules, and either passes the request or returns HTTP 429.
Counter store is where most designs fail. You need sub-millisecond reads and writes because every request hits it. Relational databases don't cut it. Use Redis with the INCR command and a TTL — that's the simplest production-proven counter. For multi-region setups, you need CRDTs or a consensus layer like Redis Cluster with write-your-locality. Don't use global locks. You'll serialize all your traffic and defeat the purpose of a distributed system.
The decision point itself must be stateless. No local counters, no in-memory state that disappears when the pod restarts. The gateway or middleware reads the rule from a cache (rules rarely change), queries the counter store, and makes the call. If the counter store goes down, you fail open or closed — pick your poison. Fail open allows a flood. Fail closed blocks everything. Production systems fail open with a degraded mode (allow all but log it) because a 429 storm from a healthy client is worse than a few extra requests.
Functional Requirements — What the System Must Do
Functional requirements define the exact behaviors a rate limiter must support. The core requirement is rejecting requests that exceed a defined threshold within a time window, returning HTTP 429 with a Retry-After header. The limiter must support configurable limits per client (API key, user ID, IP address) and per endpoint. It must distinguish between different time windows (second, minute, hour, day) and allow hierarchical limits (e.g., 100 requests per second and 10,000 per hour). The system must expose current rate-limit status via headers (X-RateLimit-Remaining, X-RateLimit-Reset). Clients must be able to query remaining quota without deducting from it (dry-run mode). For multi-instance deployments, the limiter must enforce global, not local, counts. Rate limit definitions must be hot-reloadable — no restarts when adding new API keys. Finally, the system must log all rejected requests.
Non-Functional Requirements — The Constraints That Shape Design
Non-functional requirements dictate how the rate limiter behaves under pressure. Latency must stay under 5 milliseconds per decision — every millisecond added eats into the business logic time budget. Throughput must scale horizontally to handle 100,000 requests per second without central bottlenecks. Availability must exceed 99.99% — a rate limiter outage means no API traffic flows, costing revenue. Consistency matters: once a request is counted, all subsequent decisions must see that count within 100 milliseconds (eventual consistency). Durability of counter state is optional for in-memory windows but critical for billing-level limits. The system must degrade gracefully under backpressure — when Redis is down, fallback to a local token bucket with relaxed limits rather than blocking all traffic. Security demands tamper-proof client identifiers and protection against spoofed headers. Cost efficiency requires memory-optimized counters in shared clusters.
The Redis Clock Skew That Opened the Floodgates
CLOCK_SKEW_TOLERANCE_MS = 100 config; if multiple nodes disagree on window, use the earliest window to be conservative.- Distributed rate limiters using local node time are vulnerable to clock skew. Centralise time or use algorithms that don't depend on absolute windows.
- Sliding window log (ZSET) eliminates window boundary attacks but memory grows with request volume.
- Fixed window counters in distributed systems can allow 2x the intended rate at boundaries. Acceptable only for non-critical limits.
- If you must use local time, add a conservative buffer (e.g., treat requests within 100ms of window boundary as potentially belonging to previous/next window and reject).
- Monitor the delta between local node time and a trusted time source. Alert if drift exceeds 20ms.
redis-cli INFO memory | grep used_memory_humanredis-cli --bigkeysKey takeaways
redis.time()) to avoid clock skew.Common mistakes to avoid
5 patternsUsing non-atomic read-then-write in fixed window counter
Ignoring clock skew in distributed rate limiter
redis.time() in Lua script. Or use sliding window counter with fixed window IDs based on leader node's time.Setting TTL too short on token bucket state
Using sliding window log for high-traffic API
ZREMRANGEBYSCORE before ZCARD to keep window size bounded.Rate limiting by IP only — bypassed by proxy pools
Interview Questions on This Topic
Explain the trade-offs between fixed window counter and sliding window counter. When would you choose one over the other?
Frequently Asked Questions
20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.
That's Real World. Mark it forged?
8 min read · try the examples if you haven't