Senior 5 min · March 06, 2026

ASP.NET Core Rate Limiting — Redis Pool Starvation Bug

A Redis pool starvation in IRateLimiterPolicy caused ASP.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Core concept: ASP.NET Core's RateLimiterMiddleware enforces request quotas per client before application logic runs.
  • Key components: FixedWindow, SlidingWindow, TokenBucket, Concurrency algorithms plus partitioned limiters per IP/user/key.
  • Performance insight: Token bucket supports bursts up to TokenLimit while smoothing sustained load — ideal for public APIs.
  • Production insight: In-memory state breaks under multiple instances; use Redis-backed policies for horizontal scaling.
  • Biggest mistake: Relying on RemoteIpAddress behind a proxy without forwarded headers, so every user is throttled as a single client.
Plain-English First

Imagine a popular lemonade stand that can only serve 10 cups per minute. If 50 kids show up at once, the stand doesn't collapse — it just makes everyone wait their turn or politely says 'come back in a minute.' Rate limiting does exactly that for your API: it controls how many requests a client can make in a given time window so your server never gets crushed by traffic spikes, accidental hammering, or malicious bots.

Every public API you've ever used has a rate limiter quietly working behind the scenes. GitHub's API caps you at 5,000 requests per hour. Stripe throttles card-creation calls per second. Twitter once killed third-party apps overnight by tightening limits. Rate limiting isn't a nice-to-have — it's the difference between an API that scales gracefully under load and one that falls over the moment a client's for-loop goes rogue or a DDoS probe starts knocking.

How ASP.NET Core's Built-in Rate Limiting Middleware Actually Works

Before .NET 7, the only way to rate-limit in ASP.NET Core was to bolt on a third-party library like AspNetCoreRateLimit or write custom middleware from scratch. Both approaches worked, but they weren't first-class citizens — they lived outside the framework and couldn't tap into the built-in routing pipeline or endpoint metadata system.

.NET 7 introduced the Microsoft.AspNetCore.RateLimiting namespace and the RateLimiterMiddleware. Internally, it integrates with the IRateLimiterPolicy<TPartitionKey> interface and the lower-level System.Threading.RateLimiting primitives, which were deliberately shipped as a standalone NuGet package (System.Threading.RateLimiting) so you can use the algorithm implementations anywhere — not just in web apps.

The middleware sits in the request pipeline and calls RateLimiter.AcquireAsync() before your endpoint handler ever runs. If a lease is granted, the request flows through. If the limiter rejects the request, the middleware short-circuits with a 429 Too Many Requests response — your controller code is never touched. This is important: rate limiting is enforced at the infrastructure layer, not the application layer, which means you get protection even for endpoints you haven't explicitly coded defensively.
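
The grant-or-reject flow can be tried outside a web app, because the limiter types live in System.Threading.RateLimiting. The following is a minimal console sketch, not the middleware's own code. The permit limit of 3, the one-minute window and the disabled auto-replenishment are assumptions chosen only to keep the run deterministic.

```csharp
using System;
using System.Threading.RateLimiting;

// Illustrative values: 3 permits per window, no queue.
using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 3,
    Window = TimeSpan.FromMinutes(1),
    QueueLimit = 0,
    AutoReplenishment = false // no background timer, so the window never resets in this demo
});

var results = new bool[5];
for (var i = 0; i < results.Length; i++)
{
    // AttemptAcquire returns a lease immediately; IsAcquired says whether it was granted.
    using RateLimitLease lease = limiter.AttemptAcquire();
    results[i] = lease.IsAcquired;
    Console.WriteLine($"request {i + 1}: {(lease.IsAcquired ? "allowed" : "rejected")}");
}
```

The first three calls are granted and the last two are not; in the middleware, a lease that is not acquired is what turns into the 429 response.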

Program.cs (C#)
using Microsoft.AspNetCore.RateLimiting;
using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();

// ── Register the rate limiting services and configure policies ──────────────
builder.Services.AddRateLimiter(rateLimiterOptions =>
{
    // Reject callbacks let you customise the 429 response body and headers.
    rateLimiterOptions.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    rateLimiterOptions.OnRejected = async (context, cancellationToken) =>
    {
        // Add a Retry-After header so well-behaved clients know when to retry.
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        await context.HttpContext.Response.WriteAsync(
            "Too many requests. Please slow down.",
            cancellationToken);
    };

    // ── Fixed Window policy: 10 requests per 10-second window, shared by all callers ──
    rateLimiterOptions.AddFixedWindowLimiter(
        policyName: "FixedWindowPolicy",
        options =>
        {
            options.PermitLimit         = 10;                     // max requests allowed
            options.Window              = TimeSpan.FromSeconds(10); // window duration
            options.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            options.QueueLimit          = 2;  // allow 2 requests to queue; rest get 429
        });

    // ── Sliding Window policy: one 30-second window shared by all callers (not per IP) ──
    rateLimiterOptions.AddSlidingWindowLimiter(
        policyName: "SlidingWindowPerIp",
        options =>
        {
            options.PermitLimit         = 20;
            options.Window              = TimeSpan.FromSeconds(30);
            options.SegmentsPerWindow   = 3;   // divides 30s into 3 × 10s segments
            options.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            options.QueueLimit          = 0;   // no queuing — reject immediately
        });
});

var app = builder.Build();

// ── UseRateLimiter MUST come after UseRouting but before MapControllers ──────
app.UseRouting();
app.UseRateLimiter();   // <── this is the middleware that does the work
app.UseAuthorization();
app.MapControllers();

app.Run();
Output
// No direct console output — the middleware operates silently on each request.
// When a client exceeds the limit you will see in the HTTP response:
//
// HTTP/1.1 429 Too Many Requests
// Retry-After: 10
// Content-Type: text/plain; charset=utf-8
//
// Too many requests. Please slow down.
Watch Out: Middleware Order Is Not Optional
If you place UseRateLimiter() before UseRouting(), the middleware can't resolve endpoint metadata — named policies applied via [EnableRateLimiting] attributes will silently do nothing. Always order it: UseRouting → UseRateLimiter → UseAuthorization → MapControllers.
Production Insight
Middleware order is critical: call UseRateLimiter after UseRouting, or endpoint metadata is unavailable.
The middleware short-circuits with 429 before controller code runs, so rejected requests never reach your handlers.
Always verify middleware order in your pipeline when rate limiting seems ignored.
Key Takeaway
Rate limiting in ASP.NET Core is infrastructure-level, not application-level.
It intercepts requests before your endpoint logic executes.
Named policies apply only to endpoints that opt in via [EnableRateLimiting] or RequireRateLimiting; set GlobalLimiter if every route must be covered.
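To make the scope of a policy concrete, here is a hedged sketch of a minimal API that combines a named policy with a global limiter. It assumes a web SDK project with implicit usings; the route names, the "global" partition key and the limits are hypothetical.

```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // Named policy: only takes effect on endpoints that opt in.
    options.AddFixedWindowLimiter("FixedWindowPolicy", o =>
    {
        o.PermitLimit = 10;
        o.Window = TimeSpan.FromSeconds(10);
    });

    // Global limiter: evaluated for every request that reaches the middleware.
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
        RateLimitPartition.GetConcurrencyLimiter("global", _ => new ConcurrencyLimiterOptions
        {
            PermitLimit = 100,
            QueueLimit = 0
        }));
});

var app = builder.Build();
app.UseRateLimiter();

app.MapGet("/limited", () => "ok").RequireRateLimiting("FixedWindowPolicy");
app.MapGet("/open", () => "ok"); // no named policy: only the global limiter applies

app.Run();
```

With this layout, /limited is subject to both limiters while /open is covered by the global one alone.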

Choosing the Right Algorithm — Fixed Window vs Sliding Window vs Token Bucket vs Concurrency

Each of the four built-in algorithms solves a slightly different problem. Picking the wrong one doesn't just hurt correctness — it can crater performance or give clients a worse experience than they deserve.

Fixed Window is the simplest. It resets a counter at the start of each window. The dark side: a client can fire all 10 allowed requests in the last millisecond of window N, then fire another 10 in the first millisecond of window N+1 — hitting you with 20 requests in a 2ms burst. This 'boundary burst' is a well-known flaw.
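
The boundary burst is easy to reproduce on paper. The sketch below uses a hand-rolled counter with an explicit clock (it is not the framework's FixedWindowRateLimiter) so the timing is fully controlled; the helper name TryAdmit is made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Hand-rolled fixed-window counter, for illustration only.
const int permitLimit = 10;
var window = TimeSpan.FromSeconds(10);
var counts = new Dictionary<long, int>(); // window index -> requests admitted

bool TryAdmit(TimeSpan now)
{
    long windowIndex = now.Ticks / window.Ticks; // which window does this instant fall in?
    counts.TryGetValue(windowIndex, out var used);
    if (used >= permitLimit) return false;
    counts[windowIndex] = used + 1;
    return true;
}

// 10 requests in the last millisecond of window 0, 10 more in the first millisecond of window 1.
int admitted = 0;
for (var i = 0; i < 10; i++) if (TryAdmit(TimeSpan.FromMilliseconds(9_999))) admitted++;
for (var i = 0; i < 10; i++) if (TryAdmit(TimeSpan.FromMilliseconds(10_000))) admitted++;

Console.WriteLine($"admitted {admitted} requests about 1 ms apart");
```

All 20 requests are admitted even though the configured limit is 10 per window, because the counter resets between the two batches.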

Sliding Window eliminates boundary bursts by tracking requests across overlapping segments. It's more accurate but uses more memory because it maintains per-segment counters.

Token Bucket is the industry favourite for APIs that want to allow short bursts but smooth out sustained traffic. Tokens refill at a steady rate; a client can save up unused tokens and spend them in a burst. This maps naturally to 'burst-friendly' API contracts.
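
A small console sketch of the burst behaviour, using TokenBucketRateLimiter directly. The numbers are illustrative assumptions, and AutoReplenishment is switched off so that no tokens are added while the loop runs.

```csharp
using System;
using System.Threading.RateLimiting;

using var bucket = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit = 5,                                 // bucket capacity = maximum burst
    TokensPerPeriod = 1,                            // refill amount per period
    ReplenishmentPeriod = TimeSpan.FromSeconds(1),
    QueueLimit = 0,
    AutoReplenishment = false                       // no refill during this demo
});

// The bucket starts full, so a burst of up to TokenLimit requests is served at once.
int burstAllowed = 0;
for (var i = 0; i < 8; i++)
{
    using RateLimitLease lease = bucket.AttemptAcquire();
    if (lease.IsAcquired) burstAllowed++;
}

Console.WriteLine($"burst: {burstAllowed} of 8 requests allowed");
```

Five of the eight requests succeed; after that the caller is held to the refill rate, which is the smoothing half of the algorithm.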

Concurrency Limiter isn't about time at all — it limits how many requests can be in-flight simultaneously. This is the right tool when your bottleneck is a downstream resource (a database connection pool, an external API) rather than raw request rate.
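
The difference from the time-based limiters shows up in when a permit comes back: it is returned when the lease is disposed, not when a window elapses. A minimal sketch with an assumed limit of two in-flight operations:

```csharp
using System;
using System.Threading.RateLimiting;

using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2, // at most two operations in flight
    QueueLimit = 0
});

RateLimitLease first = limiter.AttemptAcquire();
RateLimitLease second = limiter.AttemptAcquire();
RateLimitLease third = limiter.AttemptAcquire(); // both slots are busy
bool thirdAcquired = third.IsAcquired;

first.Dispose(); // the first operation finishes and frees its slot

RateLimitLease fourth = limiter.AttemptAcquire();
bool fourthAcquired = fourth.IsAcquired;

Console.WriteLine($"third: {thirdAcquired}, fourth: {fourthAcquired}");

second.Dispose();
third.Dispose();
fourth.Dispose();
```

The third attempt fails while two leases are held, and the fourth succeeds as soon as one of them is released, regardless of how much time has passed.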

RateLimitPolicies.cs (C#)
using Microsoft.AspNetCore.RateLimiting;
using System.Threading.RateLimiting;

// This file shows ALL four algorithms side-by-side so you can compare them.
// Wire these up inside AddRateLimiter() in Program.cs.

public static class RateLimitPolicies
{
    public const string FixedWindow   = nameof(FixedWindow);
    public const string SlidingWindow = nameof(SlidingWindow);
    public const string TokenBucket   = nameof(TokenBucket);
    public const string Concurrency   = nameof(Concurrency);

    public static void Register(RateLimiterOptions options)
    {
        // ── 1. Fixed Window ──────────────────────────────────────────────────
        // Good for: simple internal endpoints where occasional boundary bursts are tolerable.
        options.AddFixedWindowLimiter(FixedWindow, opt =>
        {
            opt.PermitLimit          = 100;                      // 100 reqs per minute
            opt.Window               = TimeSpan.FromMinutes(1);
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit           = 10;
        });

        // ── 2. Sliding Window ────────────────────────────────────────────────
        // Good for: search endpoints, user-facing UIs where burst spikes feel bad.
        // SegmentsPerWindow=6 means the 60s window is divided into 6×10s buckets.
        // Requests from the oldest bucket are 'forgotten' as time moves forward.
        options.AddSlidingWindowLimiter(SlidingWindow, opt =>
        {
            opt.PermitLimit          = 100;
            opt.Window               = TimeSpan.FromMinutes(1);
            opt.SegmentsPerWindow    = 6;                        // granularity tradeoff
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit           = 5;
        });

        // ── 3. Token Bucket ──────────────────────────────────────────────────
        // Good for: payment processors, upload APIs — burst-friendly but smooth overall.
        // TokensPerPeriod=20 means 20 tokens are added every ReplenishmentPeriod.
        // TokenLimit=50 is the bucket capacity — max burst size.
        options.AddTokenBucketLimiter(TokenBucket, opt =>
        {
            opt.TokenLimit           = 50;                       // bucket capacity
            opt.ReplenishmentPeriod  = TimeSpan.FromSeconds(10);
            opt.TokensPerPeriod      = 20;                       // refill rate
            opt.AutoReplenishment    = true;                     // background timer handles refill
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit           = 3;
        });

        // ── 4. Concurrency Limiter ───────────────────────────────────────────
        // Good for: endpoints hitting a DB connection pool or a slow third-party API.
        // This says: max 5 requests can execute at the same time; up to 2 can queue.
        options.AddConcurrencyLimiter(Concurrency, opt =>
        {
            opt.PermitLimit          = 5;                        // max concurrent requests
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit           = 2;
        });
    }
}
Output
// No runtime output — these are policy registrations.
// At runtime, when requests hit the concurrency policy with 5 already in-flight:
//
// The next 2 callers wait in the queue (QueueLimit = 2).
// The 8th concurrent caller is rejected immediately:
// HTTP/1.1 429 Too Many Requests
Pro Tip: Token Bucket Is the API Industry Standard
AWS API Gateway and Stripe both describe token-bucket throttling for their APIs, because it lets power users burst without punishing steady-state callers. If you're building a public-facing API and you're not sure which algorithm to pick, start with token bucket: it gives you the most natural 'fair usage' behaviour with the fewest angry support tickets.
Production Insight
Fixed window boundary bursts can cause up to 2x traffic at window edges, breaking downstream dependencies.
With AutoReplenishment = true the token bucket refills itself on an internal timer every ReplenishmentPeriod; set it to false only if you intend to call TryReplenish() yourself.
Concurrency limiter is often confused with rate limiter — use it for I/O bottlenecks, not request rates.
Key Takeaway
Token bucket is the industry default for public APIs because it allows bursts while smoothing sustained usage.
Sliding window eliminates boundary bursts at the cost of memory per partition.
Concurrency limiter is for controlling simultaneous in-flight requests, not time-based rate limiting.

Partitioned Rate Limiting — Per-User, Per-API-Key and Per-IP Policies in Production

A global rate limiter that throttles every caller equally is almost never what you want in production. Your paying enterprise customer shouldn't share a quota with an anonymous crawler. A background service you own shouldn't compete with end-user traffic.

This is where partitioned limiters come in. Instead of one shared RateLimiter instance, a partitioned limiter creates (or retrieves) a separate limiter instance for each partition key — typically an IP address, a user ID, an API key, or some combination. Each partition has its own counter that doesn't affect anyone else.

The AddPolicy<TPartitionKey> overload is how you build a partitioned policy. The factory delegate receives the HttpContext and must return a RateLimitPartition<TPartitionKey> — a value type that pairs a key with a factory function that creates the limiter for that key.

Critically, these per-partition limiter instances are cached inside a PartitionedRateLimiter<TResource>. You're not allocating a new object on every request; the runtime reuses the instance for the same key. However, if you have millions of unique keys (e.g. one per user), the cache can still grow very large, because a limiter only becomes a candidate for cleanup after it has sat idle. Keep the key space bounded where you can, and monitor memory in production.
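
The per-key isolation can be seen without HttpContext. The sketch below is an illustration only: it partitions on a plain string (a caller name) rather than a request, and the limit of 2 and the helper name Try are assumptions.

```csharp
using System;
using System.Threading.RateLimiting;

// One fixed-window limiter is created lazily for each distinct caller name and then reused.
using PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(
    caller => RateLimitPartition.GetFixedWindowLimiter(
        partitionKey: caller,
        factory: _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 2,
            Window = TimeSpan.FromMinutes(1),
            QueueLimit = 0
        }));

bool Try(string name)
{
    using RateLimitLease lease = limiter.AttemptAcquire(name);
    return lease.IsAcquired;
}

bool alice1 = Try("alice");
bool alice2 = Try("alice");
bool alice3 = Try("alice"); // alice has used both of her permits
bool bob1 = Try("bob");     // bob's counter is untouched

Console.WriteLine($"alice: {alice1}, {alice2}, {alice3}; bob: {bob1}");
```

Alice's third call is rejected while Bob's first still succeeds, which is the behaviour AddPolicy gives each partition key inside the middleware.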

PartitionedRateLimitPolicy.cs (C#)
using Microsoft.AspNetCore.RateLimiting;
using System.Security.Claims;
using System.Threading.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // ── Partitioned policy: anonymous vs authenticated users get different limits ──
    options.AddPolicy(
        policyName: "UserTierPolicy",
        partitioner: httpContext =>
        {
            // Prefer the authenticated user ID as the partition key.
            // Fall back to IP address for anonymous callers.
            var userId = httpContext.User.FindFirstValue(ClaimTypes.NameIdentifier);

            if (!string.IsNullOrEmpty(userId))
            {
                // Authenticated users get a generous token bucket.
                return RateLimitPartition.GetTokenBucketLimiter(
                    partitionKey: $"user:{userId}",   // unique key per user
                    factory: _ => new TokenBucketRateLimiterOptions
                    {
                        TokenLimit          = 200,               // large bucket for auth'd users
                        ReplenishmentPeriod = TimeSpan.FromSeconds(30),
                        TokensPerPeriod     = 50,
                        AutoReplenishment   = true,
                        QueueLimit          = 0
                    });
            }

            // Anonymous callers get a much stricter fixed window per IP.
            var clientIpAddress =
                httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";

            return RateLimitPartition.GetFixedWindowLimiter(
                partitionKey: $"anon:{clientIpAddress}",
                factory: _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit          = 10,                   // tight limit for anonymous
                    Window               = TimeSpan.FromSeconds(10),
                    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                    QueueLimit           = 0
                });
        });
});

// ── Apply the policy to a specific controller via attribute ─────────────────
[ApiController]
[Route("api/[controller]")]
[EnableRateLimiting("UserTierPolicy")]   // <── applied to every action in this controller
public class ProductsController : ControllerBase
{
    [HttpGet]
    public IActionResult GetAll()
    {
        return Ok(new[] { "Widget A", "Widget B", "Widget C" });
    }

    [HttpGet("{id:int}")]
    [DisableRateLimiting]   // <── opt-out for this specific action (e.g. health/status endpoints)
    public IActionResult GetById(int id)
    {
        return Ok($"Product {id}");
    }
}
Output
// Anonymous caller from 192.168.1.5 — 11th request in 10 seconds:
// HTTP/1.1 429 Too Many Requests
//
// Authenticated user 'alice@example.com', 201st request in a burst before any tokens are replenished:
// HTTP/1.1 429 Too Many Requests
//
// GET /api/products/42 — always succeeds because [DisableRateLimiting] is applied:
// HTTP/1.1 200 OK
Watch Out: IP Address Is Not a Safe Partition Key Behind a Proxy
If your app runs behind a reverse proxy (NGINX, Azure Front Door, AWS ALB), HttpContext.Connection.RemoteIpAddress will be the proxy's IP, meaning ALL your users share one rate limit. Configure ForwardedHeadersOptions with your trusted proxies and call UseForwardedHeaders() before UseRateLimiter() so that RemoteIpAddress is rewritten from X-Forwarded-For. Avoid reading X-Forwarded-For or X-Real-IP directly from untrusted requests, since clients can spoof them. Alternatively, partition by an API key or JWT claim, which doesn't have this problem.
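A configuration sketch for the proxy case, assuming a web SDK project with implicit usings. The proxy address 10.0.0.100 and the /whoami route are hypothetical; substitute the addresses or networks of your own load balancer.

```csharp
using System.Net;
using Microsoft.AspNetCore.HttpOverrides;

var builder = WebApplication.CreateBuilder(args);

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    options.ForwardedHeaders = ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;
    // Hypothetical proxy address: forwarded headers are honoured only from trusted proxies.
    options.KnownProxies.Add(IPAddress.Parse("10.0.0.100"));
});
builder.Services.AddRateLimiter(_ => { });

var app = builder.Build();

app.UseForwardedHeaders(); // rewrites Connection.RemoteIpAddress from X-Forwarded-For
app.UseRateLimiter();      // partition keys built from RemoteIpAddress now see the client IP

// Diagnostic endpoint to confirm which address the app sees for a caller.
app.MapGet("/whoami", (HttpContext ctx) =>
    ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown");

app.Run();
```

Calling /whoami through the proxy is a quick way to check that the app sees the client address rather than the proxy's before trusting an IP-based partition key.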
Production Insight
Partition keys based on IP behind a proxy require ForwardedHeaders — without it, all users share one bucket.
Partitioned limiter instances are cached; monitor memory if you have millions of unique keys.
Anonymous fallback to IP can be abused via IP rotation; prefer API key or JWT when possible.
Key Takeaway
Partitioned limiters give each client their own quota — essential for multi-tenant APIs.
Partition key selection is the most common source of rate limiting bugs.
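Always test with the actual client identifier, not the connection IP. User-based keys also depend on authentication having run before UseRateLimiter(); otherwise httpContext.User may still be anonymous when the partition key is computed.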

Production Gotchas — Distributed Environments, Load Balancers and Custom Rejection Responses

The built-in middleware stores limiter state in process memory. On a single-node deployment that's fine. The moment you scale to two or more instances — behind a load balancer, in Kubernetes with multiple pods — each instance maintains its own independent counter. A client can now hit 10× your intended limit just by round-robin luck across 10 pods.

For distributed rate limiting you have two practical options. First, implement IRateLimiterPolicy<TPartitionKey> backed by a Redis counter (using StackExchange.Redis with Lua scripts for atomic increment-and-check). Second, use a gateway-level rate limiter (NGINX limit_req, Azure API Management, AWS WAF) that sits in front of all instances and is the single source of truth.

Another common production issue is queue behaviour under a thundering herd. When QueueLimit is greater than zero, queued requests don't block a thread (they await asynchronously), but each one still holds memory and an open connection. If a sudden spike queues 10,000 requests against a 1-second window, they are all released or timed out at roughly the same moment, and your clients get a terrible experience. For public APIs, QueueLimit = 0 is often the safer choice: fail fast and let clients back off.

Finally, always expose rate limit headers. RFC 6585 defines the 429 status code, Retry-After is defined by the HTTP semantics specification (RFC 9110), and the RateLimit-* headers from the IETF draft (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) are being adopted broadly. Well-behaved clients rely on these to back off without guessing.

DistributedRateLimitPolicy.cs (C#)
using Microsoft.AspNetCore.RateLimiting;
using StackExchange.Redis;
using System.Threading.RateLimiting;

// ── Custom IRateLimiterPolicy backed by Redis for distributed rate limiting ──
// This is the pattern you NEED when running multiple instances in production.
//
// Redis Lua script ensures the increment + expiry check is atomic (no race conditions).
public class RedisFixedWindowPolicy : IRateLimiterPolicy<string>
{
    private readonly IConnectionMultiplexer _redis;
    private readonly ILogger<RedisFixedWindowPolicy> _logger;

    // How many requests are allowed per window
    private const int PermitLimit = 100;
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(1);

    // Lua script: atomically increment and set expiry on first call
    private const string IncrementScript = """
        local current = redis.call('INCR', KEYS[1])
        if current == 1 then
            redis.call('EXPIRE', KEYS[1], ARGV[1])
        end
        return current
        """;

    public RedisFixedWindowPolicy(
        IConnectionMultiplexer redis,
        ILogger<RedisFixedWindowPolicy> logger)
    {
        _redis  = redis;
        _logger = logger;
    }

    // GetPartition is called on EVERY request — keep it fast.
    public RateLimitPartition<string> GetPartition(HttpContext httpContext)
    {
        // Use a claim-based key for authenticated users; IP for anonymous.
        var partitionKey = httpContext.User.Identity?.IsAuthenticated == true
            ? $"rl:user:{httpContext.User.Identity.Name}"
            : $"rl:ip:{httpContext.Connection.RemoteIpAddress}";

        // RedisBackedRateLimiter is a custom RateLimiter subclass (not shown here) that
        // runs IncrementScript against Redis and compares the result with PermitLimit.
        // The factory runs once per partition key and receives the shared multiplexer.
        return RateLimitPartition.Get(
            partitionKey: partitionKey,
            factory: key => new RedisBackedRateLimiter(key, _redis, _logger));
    }

    public Func<OnRejectedContext, CancellationToken, ValueTask>? OnRejected =>
        async (context, cancellationToken) =>
        {
            context.HttpContext.Response.StatusCode = 429;

            // Expose the limit and the seconds until reset so clients can schedule a retry.
            // The draft RateLimit-Reset header is a number of seconds, not a Unix timestamp;
            // the full Window is used here as a conservative upper bound.
            context.HttpContext.Response.Headers["RateLimit-Limit"]  = PermitLimit.ToString();
            context.HttpContext.Response.Headers["RateLimit-Reset"]  =
                ((int)Window.TotalSeconds).ToString();

            await context.HttpContext.Response.WriteAsJsonAsync(
                new { error = "rate_limit_exceeded", retryAfterSeconds = (int)Window.TotalSeconds },
                cancellationToken);
        };
}

// ── Register in Program.cs ───────────────────────────────────────────────────
// builder.Services.AddSingleton<IConnectionMultiplexer>(
//     ConnectionMultiplexer.Connect("localhost:6379"));
// builder.Services.AddRateLimiter(opt =>
//     opt.AddPolicy<string, RedisFixedWindowPolicy>("DistributedPolicy"));
Output
// When client exceeds the Redis-backed limit across ANY pod in the cluster:
//
// HTTP/1.1 429 Too Many Requests
// RateLimit-Limit: 100
// RateLimit-Reset: 60
// Content-Type: application/json
//
// {"error":"rate_limit_exceeded","retryAfterSeconds":60}
Interview Gold: In-Memory vs Distributed Rate Limiting
Interviewers love this question. The built-in ASP.NET Core middleware is in-process only — state lives in RAM and dies with the process. For horizontally scaled deployments, you need a shared backing store (Redis is the industry default). The key insight: Redis Lua scripts are used for atomic counter increments because a read-then-write in application code has a TOCTOU race condition under concurrent load.
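The policy code refers to a RedisBackedRateLimiter type that the article does not show. The sketch below is one possible shape for it, not the author's implementation: the class body, the fail-closed choice and the idle tracking are all assumptions. It also relies on the middleware falling back to AcquireAsync when AttemptAcquire is not granted, which is worth verifying against the framework version you target.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.RateLimiting;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using StackExchange.Redis;

// Hypothetical limiter: one instance per partition key, all sharing one multiplexer.
public sealed class RedisBackedRateLimiter : RateLimiter
{
    private const int PermitLimit = 100;
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(1);

    // Same script as in the policy: atomic increment, expiry set on the first hit.
    private const string IncrementScript = """
        local current = redis.call('INCR', KEYS[1])
        if current == 1 then
            redis.call('EXPIRE', KEYS[1], ARGV[1])
        end
        return current
        """;

    private readonly string _key;
    private readonly IConnectionMultiplexer _redis;
    private readonly ILogger _logger;
    private long _lastUsedMs = Environment.TickCount64;

    public RedisBackedRateLimiter(string key, IConnectionMultiplexer redis, ILogger logger)
    {
        _key = key;
        _redis = redis;
        _logger = logger;
    }

    // Reports time since last use so the host can discard instances for keys that went quiet.
    public override TimeSpan? IdleDuration =>
        TimeSpan.FromMilliseconds(Environment.TickCount64 - Interlocked.Read(ref _lastUsedMs));

    public override RateLimiterStatistics? GetStatistics() => null;

    // No synchronous Redis call: report "not acquired" so callers use the async path.
    protected override RateLimitLease AttemptAcquireCore(int permitCount) => Lease.Denied;

    // Assumes permitCount == 1; the script always increments by one.
    protected override async ValueTask<RateLimitLease> AcquireAsyncCore(
        int permitCount, CancellationToken cancellationToken)
    {
        Interlocked.Exchange(ref _lastUsedMs, Environment.TickCount64);
        try
        {
            RedisResult result = await _redis.GetDatabase().ScriptEvaluateAsync(
                IncrementScript,
                new RedisKey[] { _key },
                new RedisValue[] { (int)Window.TotalSeconds });
            return (long)result <= PermitLimit ? Lease.Granted : Lease.Denied;
        }
        catch (Exception ex) when (ex is RedisException or TimeoutException)
        {
            // Fail closed: an unreachable store rejects instead of allowing unlimited traffic.
            _logger.LogWarning(ex, "Rate limit store unavailable for {Key}; rejecting", _key);
            return Lease.Denied;
        }
    }

    private sealed class Lease : RateLimitLease
    {
        public static readonly Lease Granted = new(true);
        public static readonly Lease Denied = new(false);
        private Lease(bool isAcquired) => IsAcquired = isAcquired;
        public override bool IsAcquired { get; }
        public override IEnumerable<string> MetadataNames => Array.Empty<string>();
        public override bool TryGetMetadata(string metadataName, out object? metadata)
        {
            metadata = null;
            return false;
        }
    }
}
```

The constructor matches the call in GetPartition, so the only Redis connection in play is the singleton multiplexer passed in from DI.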
Production Insight
A queue limit above zero under a thundering herd creates synchronised timeouts and memory pressure.
Redis Lua scripts make the increment-and-check atomic and avoid TOCTOU races; run them over one shared IConnectionMultiplexer rather than a connection per request.
RateLimit-* headers are becoming standard; implement them now to avoid future client breakage.
Key Takeaway
In-process state dies with the process — distributed rate limiting is mandatory for horizontal scaling.
Fail fast with QueueLimit=0 for public APIs to avoid cascading failures.
Always expose rate limit headers to let clients implement proper backoff.

Implementing Custom Rate Limiter Policies with IRateLimiterPolicy

When the built-in policies fall short — maybe you need a hybrid of token bucket and concurrency, or a rate limit that scales with a JWT claim — you'll reach for IRateLimiterPolicy<TPartitionKey>. This interface gives you full control over how partitions are created and how the limiter behaves on rejection.

The key method is GetPartition(HttpContext), which returns a RateLimitPartition<TPartitionKey>. You choose the algorithm and configuration per partition. The factory delegate you pass inside is cached: it runs once per unique partition key, then the resulting limiter is reused. That's great for performance but dangerous if your factory allocates expensive resources (like a new Redis connection) — you'll leak connections fast.

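Here's a pattern in the io.thecodeforge namespace that shows the shape of such a policy: a singleton IConnectionMultiplexer injected via DI and a sliding window per partition key. Note that the snippet as written returns the built-in in-memory sliding window limiter; the injected multiplexer is where a Redis-backed limiter would get its connection, and any fail-closed handling (refusing requests when Redis is unreachable rather than allowing unlimited traffic) would have to live in that limiter.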

IoTheCodeforgeRateLimiter.cs (C#)
using Microsoft.AspNetCore.RateLimiting;
using StackExchange.Redis;
using System.Threading.RateLimiting;

namespace io.thecodeforge.RateLimiting;

public class CustomSlidingWindowPolicy : IRateLimiterPolicy<string>
{
    private readonly IConnectionMultiplexer _redis;
    private const int PermitLimit = 100;
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(1);
    private const int Segments = 6;

    public CustomSlidingWindowPolicy(IConnectionMultiplexer redis)
    {
        _redis = redis;
    }

    public RateLimitPartition<string> GetPartition(HttpContext httpContext)
    {
        var partitionKey = httpContext.User.Identity?.IsAuthenticated == true
            ? $"rl:user:{httpContext.User.Identity.Name}"
            : $"rl:ip:{httpContext.Connection.RemoteIpAddress}";

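        // Factory is called once per partitionKey and cached.
        // This sample returns the built-in in-memory sliding window; a Redis-backed limiter
        // would receive the shared _redis multiplexer here. Never create a new
        // ConnectionMultiplexer inside the factory.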
        return RateLimitPartition.GetSlidingWindowLimiter(
            partitionKey,
            _ => new SlidingWindowRateLimiterOptions
            {
                PermitLimit = PermitLimit,
                Window = Window,
                SegmentsPerWindow = Segments,
                QueueLimit = 0  // fail fast for public endpoints
            });
    }

    public Func<OnRejectedContext, CancellationToken, ValueTask>? OnRejected =>
        async (context, cancellationToken) =>
        {
            context.HttpContext.Response.StatusCode = 429;
            context.HttpContext.Response.Headers["RateLimit-Limit"] = PermitLimit.ToString();
            context.HttpContext.Response.Headers["Retry-After"] = "60";
            await context.HttpContext.Response.WriteAsJsonAsync(
                new { error = "rate_limit_exceeded", retry_after_seconds = 60 },
                cancellationToken);
        };
}

// Registration in Program.cs:
// builder.Services.AddSingleton<IConnectionMultiplexer>(
//     ConnectionMultiplexer.Connect(builder.Configuration.GetConnectionString("Redis")!));
// builder.Services.AddRateLimiter(opt =>
//     opt.AddPolicy<string, CustomSlidingWindowPolicy>("CustomSlidingPolicy"));
Output
// With this policy registered and applied, each client gets their own sliding window.
// RateLimit-Limit and Retry-After are set only in OnRejected, so they appear on 429 responses;
// returning rate limit headers on successful requests needs additional middleware.
// On failure: HTTP 429 with JSON body and Retry-After header.
Fail Closed vs Fail Open: Decide Intentionally
If your custom policy's backing store (Redis, database) goes down, calls into it will throw or time out. What happens next is decided by your own limiter code: catch the error and grant the lease and you fail open; deny the lease and you fail closed. For security-critical rate limits you may want to fail closed, but that can cause a total outage. Use a circuit breaker or a fallback in-memory limiter for degraded mode.
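A degraded-mode fallback can be sketched without any Redis dependency. Everything here is hypothetical: the store delegate stands in for the shared-store check, TimeoutException stands in for whatever your Redis client throws, and the local limiter is deliberately strict.

```csharp
using System;
using System.Threading.Tasks;

// store: hypothetical check against the shared store; may throw when it is unreachable.
// local: hypothetical in-memory limiter consulted only while the store is down.
async Task<bool> IsAllowedAsync(Func<Task<bool>> store, Func<bool> local)
{
    try
    {
        return await store();
    }
    catch (TimeoutException)
    {
        // Degraded mode: neither unlimited (fail open) nor a blanket 429 (fail closed).
        return local();
    }
}

int localUsed = 0;
bool LocalLimiter() => ++localUsed <= 1; // strict fallback: a single request

Task<bool> HealthyStore() => Task.FromResult(true);
Task<bool> BrokenStore() =>
    Task.FromException<bool>(new TimeoutException("store unreachable"));

bool healthy = await IsAllowedAsync(HealthyStore, LocalLimiter);       // store decides
bool degradedFirst = await IsAllowedAsync(BrokenStore, LocalLimiter);  // fallback allows one
bool degradedSecond = await IsAllowedAsync(BrokenStore, LocalLimiter); // fallback rejects

Console.WriteLine($"{healthy}, {degradedFirst}, {degradedSecond}");
```

While the store is healthy the fallback is never consulted; once it fails, traffic is held to the stricter local limit rather than dropping to zero or becoming unlimited.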
Production Insight
Custom policies that allocate resources in the factory cause connection leaks — inject singletons instead.
Fail-closed behaviour prevents abuse during backing store outages but risks total blackout.
Always test custom policies under load with a backing store failure scenario.
Key Takeaway
IRateLimiterPolicy gives you complete control over partition logic and rejection responses.
Resource management in the factory delegate is the #1 source of production bugs.
Decide fail-closed vs fail-open based on your API's security vs availability requirements.
Production incident · Post-mortem · Severity: high

The Redis Throttle That Became the Throttle

Symptom
After deploying a distributed rate limiter with Redis, the API started allowing unlimited requests from clients that should have been rate-limited.
Assumption
The rate limiter was working correctly because unit tests passed and the Redis connection was healthy in staging.
Root cause
The custom IRateLimiterPolicy acquired a new Redis connection from the pool on every request, but never released it — leading to pool starvation. Once the pool was exhausted, the rate limiter fell back to a default NoLimiter, bypassing all limits.
Fix
Register the Redis client as a singleton and reuse it across all requests instead of acquiring a connection per request. Rely on connection multiplexing, and set a reasonable pool size and timeouts.
Key lesson
  • Always monitor connection pool metrics for external dependencies used in rate limiter policies.
  • Implement a circuit breaker or fallback that rejects requests when the backing store is unavailable, rather than defaulting to unlimited.
  • Load-test rate limited endpoints with simulated distributed traffic to catch resource exhaustion before production.
Production debug guide
Symptom → action guide for common rate limiter failures
Symptom · 01
Client gets 429 even though they are below the limit
Fix
Check that the partition key (IP, user ID) is correctly extracted. If behind a proxy, verify ForwardedHeaders middleware is configured.
Symptom · 02
Rate limiting appears inconsistent across multiple pods
Fix
Check if you're using in-memory or distributed limiter. If in-memory, switch to Redis-backed policy.
Symptom · 03
Retry-After header is missing or incorrect
Fix
Verify the OnRejected callback sets RetryAfter using lease metadata. Ensure the policy returns metadata via TryGetMetadata.
Symptom · 04
Rate limit policy not applying to certain endpoints
Fix
Check middleware order: UseRouting, UseRateLimiter, UseAuthorization, MapControllers. Also verify EnableRateLimiting attribute is present.
Quick Reference for Rate Limiting Debugging
Immediate commands and fixes for the most common rate limiting issues
429 returned unexpectedly
Immediate action
Check the partition key used for the request.
Commands
curl -v https://api.example.com/endpoint
docker compose logs api | grep 'RateLimiter'
Fix now
Add logging to your OnRejected callback to see the partition key and limit count.
Rate limit not enforced horizontally
Immediate action
Confirm if you're using Redis-backed policy.
Commands
kubectl exec -it pod -- curl localhost:5000/health
redis-cli --scan --pattern 'rl:*' | head -20
Fix now
Replace the in-memory limiter with a custom RedisBackedRateLimiter.
All users share the same limit behind proxy
Immediate action
Check if X-Forwarded-For is being respected.
Commands
tail -f /var/log/nginx/access.log | grep 'X-Forwarded-For'
curl -H 'X-Forwarded-For: 10.0.0.1' https://api.example.com/endpoint
Fix now
Add app.UseForwardedHeaders() and configure ForwardedHeadersOptions.
Rate Limiting Algorithm Comparison

| Aspect | Fixed Window | Sliding Window | Token Bucket | Concurrency |
|---|---|---|---|---|
| What it limits | Requests per time window | Requests across rolling window | Requests via token refill | Simultaneous in-flight requests |
| Boundary burst vulnerability | Yes (double-tap at window edge) | No (segments smooth it out) | No (token drain prevents it) | N/A (time-independent) |
| Memory usage per partition | Low (1 counter) | Medium (N segment counters) | Low (1 counter + metadata) | Low (1 semaphore) |
| Burst-friendly? | No | No | Yes (up to TokenLimit) | No (hard concurrency ceiling) |
| Best use case | Simple internal APIs | User-facing search / feeds | Public APIs, payment endpoints | DB-backed or slow I/O endpoints |
| Queue support | Yes | Yes | Yes | Yes |
| Refill mechanism | Window reset | Segment expiry | Timer-based replenishment | Release on request completion |
| .NET class name | FixedWindowRateLimiter | SlidingWindowRateLimiter | TokenBucketRateLimiter | ConcurrencyLimiter |

Key takeaways

1
The built-in RateLimiterMiddleware in .NET 7+ is in-process only.
The moment you scale horizontally, you need Redis-backed distributed limiting or gateway-level enforcement, full stop.
2
Token bucket is the right default for public APIs because it allows short bursts (great UX) while preventing sustained abuse.
AWS API Gateway and Stripe both document token-bucket throttling for their APIs.
3
QueueLimit = 0 is usually the correct choice for public APIs.
Queuing under a thundering herd consumes memory and causes synchronised retry storms when the window expires.
4
Always emit Retry-After and the draft RateLimit-* headers in your 429 responses.
Without them, well-intentioned clients have no choice but to hammer you with retries at full speed.
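The burst behaviour behind takeaways 2 and 3 can be seen directly with TokenBucketRateLimiter from System.Threading.RateLimiting. A minimal sketch, assuming a bucket of 5 tokens and 8 back-to-back requests (both numbers are arbitrary); AutoReplenishment is switched off so that no refill happens during the loop and the result is deterministic. In a plain console project you may need to reference the System.Threading.RateLimiting package; ASP.NET Core projects already include it.

```csharp
using System;
using System.Threading.RateLimiting;

// Bucket starts full with 5 tokens. QueueLimit = 0 means "fail fast, never queue".
var options = new TokenBucketRateLimiterOptions
{
    TokenLimit = 5,
    TokensPerPeriod = 1,
    ReplenishmentPeriod = TimeSpan.FromSeconds(1),
    QueueLimit = 0,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    AutoReplenishment = false // no background timer, so nothing refills during the demo
};

using var limiter = new TokenBucketRateLimiter(options);

int accepted = 0;
int rejected = 0;

// Simulate a burst of 8 requests arriving at once.
for (int i = 0; i < 8; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire(1);
    if (lease.IsAcquired)
    {
        accepted++;
    }
    else
    {
        rejected++;
    }
}

Console.WriteLine($"accepted={accepted} rejected={rejected}"); // accepted=5 rejected=3
```

The first five requests drain the bucket (the permitted burst) and the remaining three are rejected immediately because nothing is queued; with replenishment enabled, sustained traffic would then be capped at TokensPerPeriod per ReplenishmentPeriod.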

Common mistakes to avoid

3 patterns
×

Relying on in-memory rate limiting in a multi-instance deployment

Symptom
Clients easily exceed their supposed limit because each pod has an independent counter. A user hitting 10 pods round-robin gets 10× the intended limit.
Fix
Use a distributed backing store (Redis) via a custom IRateLimiterPolicy, or push rate limiting up to the API gateway / load balancer layer where there's a single enforcement point.
×

Using HttpContext.Connection.RemoteIpAddress as the partition key without handling reverse proxy forwarded headers

Symptom
All users appear to come from the same IP (the proxy's IP), so one user exhausting the limit blocks everyone.
Fix
Add app.UseForwardedHeaders() with ForwardedHeadersOptions configured for your proxy, and call it before app.UseRateLimiter(). Then read httpContext.Connection.RemoteIpAddress, which will now reflect the real client IP. Better yet, prefer a user ID or API key from an authenticated claim.
×

Setting QueueLimit too high on public-facing endpoints

Symptom
During a traffic spike, thousands of requests queue up, consuming memory and connection handles. When the window expires they all flush simultaneously, causing a secondary spike, and slow clients that have already disconnected still hold queue slots.
Fix
For public APIs set QueueLimit = 0 to fail fast and return 429 immediately. Reserve queuing (QueueLimit > 0) only for internal or authenticated endpoints where the caller can reliably handle a delayed response.
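The second and third fixes above can be combined in one Program.cs sketch. Everything specific here is an assumption made for illustration: the proxy address 10.0.0.100, the token bucket numbers, the "unknown" fallback key, and the /ping endpoint. The partition key prefers an authenticated user ID claim and only falls back to the (forwarded) client IP.

```csharp
using System;
using System.Net;
using System.Security.Claims;
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.HttpOverrides;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

builder.Services.Configure<ForwardedHeadersOptions>(o =>
{
    o.ForwardedHeaders = ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;
    // Placeholder address: only trust forwarded headers from proxies you control.
    o.KnownProxies.Add(IPAddress.Parse("10.0.0.100"));
});

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
    {
        // Prefer a stable user identity; fall back to the client IP, then a shared bucket.
        string key = ctx.User.FindFirst(ClaimTypes.NameIdentifier)?.Value
                     ?? ctx.Connection.RemoteIpAddress?.ToString()
                     ?? "unknown";

        return RateLimitPartition.GetTokenBucketLimiter(key, _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = 20,
            TokensPerPeriod = 10,
            ReplenishmentPeriod = TimeSpan.FromSeconds(1),
            QueueLimit = 0 // public endpoint: fail fast with 429 instead of queuing
        });
    });
});

var app = builder.Build();

app.UseForwardedHeaders(); // must run before anything reads RemoteIpAddress
app.UseRateLimiter();

app.MapGet("/ping", () => "pong");

app.Run();
```

This sketch registers no authentication, so the claim lookup always falls through to the IP address. In an application that authenticates callers, UseAuthentication would need to run before UseRateLimiter for the user ID branch to take effect.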
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
ASP.NET Core's built-in rate limiting stores state in memory. What break...
Q02 · SENIOR
Walk me through the difference between the token bucket and sliding wind...
Q03 · SENIOR
A client tells you they're getting 429s even though they claim they're w...
Q01 of 03 · SENIOR

ASP.NET Core's built-in rate limiting stores state in memory. What breaks when you deploy multiple instances behind a load balancer, and what are your options for fixing it?

ANSWER
Each instance maintains its own in-memory counters, so a client whose requests are round-robined across N pods can reach N times the intended limit. Fixes: (1) Use a distributed backing store like Redis with Lua scripts for atomic counters, implemented as a custom IRateLimiterPolicy. (2) Use a gateway-level rate limiter (NGINX limit_req, Azure API Management, AWS WAF) that sits in front of all instances. (3) Use sticky sessions (session affinity), although this is generally not recommended because it reduces resilience.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
Is rate limiting in ASP.NET Core available without any extra NuGet packages?
02
How do I apply rate limiting to only specific endpoints, not the whole application?
03
What HTTP status code should a rate limiter return and what headers should it include?
04
How do I test rate limiting policies in my integration tests?