
Rate Limiting in ASP.NET Core — Built-in Algorithms, Policies and Production Gotchas

In Plain English 🔥
Imagine a popular lemonade stand that can only serve 10 cups per minute. If 50 kids show up at once, the stand doesn't collapse — it just makes everyone wait their turn or politely says 'come back in a minute.' Rate limiting does exactly that for your API: it controls how many requests a client can make in a given time window so your server never gets crushed by traffic spikes, accidental hammering, or malicious bots.
⚡ Quick Answer
Rate limiting caps how many requests a client may make in a given time window. Since .NET 7, ASP.NET Core ships it in the box: register policies with builder.Services.AddRateLimiter(), enable the middleware with app.UseRateLimiter(), and choose from four built-in algorithms (fixed window, sliding window, token bucket, concurrency). Requests that exceed the limit are rejected with 429 Too Many Requests before your handler ever runs.

Every public API you've ever used has a rate limiter quietly working behind the scenes. GitHub's API caps you at 5,000 requests per hour. Stripe throttles card-creation calls per second. Twitter once killed third-party apps overnight by tightening limits. Rate limiting isn't a nice-to-have — it's the difference between an API that scales gracefully under load and one that falls over the moment a client's for-loop goes rogue or a DDoS probe starts knocking.

How ASP.NET Core's Built-in Rate Limiting Middleware Actually Works

Before .NET 7, the only way to rate-limit in ASP.NET Core was to bolt on a third-party library like AspNetCoreRateLimit or write custom middleware from scratch. Both approaches worked, but they weren't first-class citizens — they lived outside the framework and couldn't tap into the built-in routing pipeline or endpoint metadata system.

.NET 7 introduced the Microsoft.AspNetCore.RateLimiting namespace and the RateLimiterMiddleware. Internally, it integrates with the IRateLimiterPolicy interface and the lower-level System.Threading.RateLimiting primitives, which were deliberately shipped as a standalone NuGet package (System.Threading.RateLimiting) so you can use the algorithm implementations anywhere — not just in web apps.

The middleware sits in the request pipeline and calls RateLimiter.AcquireAsync() before your endpoint handler ever runs. If a lease is granted, the request flows through. If the limiter rejects the request, the middleware short-circuits with a 429 Too Many Requests response — your controller code is never touched. This is important: rate limiting is enforced at the infrastructure layer, not the application layer, which means you get protection even for endpoints you haven't explicitly coded defensively.

Program.cs · CSHARP
using Microsoft.AspNetCore.RateLimiting;
using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();

// ── Register the rate limiting services and configure policies ──────────────
builder.Services.AddRateLimiter(rateLimiterOptions =>
{
    // Reject callbacks let you customise the 429 response body and headers.
    rateLimiterOptions.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    rateLimiterOptions.OnRejected = async (context, cancellationToken) =>
    {
        // Add a Retry-After header so well-behaved clients know when to retry.
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        await context.HttpContext.Response.WriteAsync(
            "Too many requests. Please slow down.",
            cancellationToken);
    };

    // ── Fixed Window policy: 10 requests per 10-second window per IP ─────────
    rateLimiterOptions.AddFixedWindowLimiter(
        policyName: "FixedWindowPolicy",
        options =>
        {
            options.PermitLimit         = 10;                     // max requests allowed
            options.Window              = TimeSpan.FromSeconds(10); // window duration
            options.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            options.QueueLimit          = 2;  // allow 2 requests to queue; rest get 429
        });

    // ── Partitioned policy: each IP address gets its OWN sliding window ──────
    rateLimiterOptions.AddSlidingWindowLimiter(
        policyName: "SlidingWindowPerIp",
        options =>
        {
            options.PermitLimit         = 20;
            options.Window              = TimeSpan.FromSeconds(30);
            options.SegmentsPerWindow   = 3;   // divides 30s into 3 × 10s segments
            options.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            options.QueueLimit          = 0;   // no queuing — reject immediately
        });
});

var app = builder.Build();

// ── UseRateLimiter MUST come after UseRouting but before MapControllers ──────
app.UseRouting();
app.UseRateLimiter();   // <── this is the middleware that does the work
app.UseAuthorization();
app.MapControllers();

app.Run();
▶ Output
// No direct console output — the middleware operates silently on each request.
// When a client exceeds the limit you will see in the HTTP response:
//
// HTTP/1.1 429 Too Many Requests
// Retry-After: 10
// Content-Type: text/plain; charset=utf-8
//
// Too many requests. Please slow down.
⚠️
Watch Out: Middleware Order Is Not Optional
If you place `UseRateLimiter()` before `UseRouting()`, the middleware can't resolve endpoint metadata — named policies applied via `[EnableRateLimiting]` attributes will silently do nothing. Always order it: UseRouting → UseRateLimiter → UseAuthorization → MapControllers.

Choosing the Right Algorithm — Fixed Window vs Sliding Window vs Token Bucket vs Concurrency

Each of the four built-in algorithms solves a slightly different problem. Picking the wrong one doesn't just hurt correctness — it can crater performance or give clients a worse experience than they deserve.

Fixed Window is the simplest. It resets a counter at the start of each window. The dark side: a client can fire all 10 allowed requests in the last millisecond of window N, then fire another 10 in the first millisecond of window N+1 — hitting you with 20 requests in a 2ms burst. This 'boundary burst' is a well-known flaw.
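The boundary burst is easy to reproduce with the raw System.Threading.RateLimiting primitives. Here is a minimal console sketch; the 1-second window and sleep duration are illustrative values chosen for the demo, not recommendations:

```csharp
using System;
using System.Threading;
using System.Threading.RateLimiting;

// 10 permits per 1-second fixed window; a background timer resets the counter.
var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit       = 10,
    Window            = TimeSpan.FromSeconds(1),
    QueueLimit        = 0,
    AutoReplenishment = true
});

int granted = 0;
for (int i = 0; i < 12; i++)
    if (limiter.AttemptAcquire().IsAcquired) granted++;

Console.WriteLine($"Window N: {granted}/12 granted");   // 10 — the 11th and 12th are rejected

Thread.Sleep(1500);   // cross the window boundary; the counter resets

for (int i = 0; i < 10; i++)
    if (limiter.AttemptAcquire().IsAcquired) granted++;

Console.WriteLine($"Total: {granted} granted");         // 20 — a 20-request burst straddling the edge
```

Twenty requests succeed across a boundary barely 1.5 seconds wide, even though the nominal limit is 10 per second — exactly the double-tap the sliding window algorithm exists to prevent.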

Sliding Window eliminates boundary bursts by tracking requests across overlapping segments. It's more accurate but uses more memory because it maintains per-segment counters.

Token Bucket is the industry favourite for APIs that want to allow short bursts but smooth out sustained traffic. Tokens refill at a steady rate; a client can save up unused tokens and spend them in a burst. This maps naturally to 'burst-friendly' API contracts.

Concurrency Limiter isn't about time at all — it limits how many requests can be in-flight simultaneously. This is the right tool when your bottleneck is a downstream resource (a database connection pool, an external API) rather than raw request rate.
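The burst-then-throttle behaviour of the token bucket can be sketched the same way. Capacity, refill rate and the sleep are illustrative demo values:

```csharp
using System;
using System.Threading;
using System.Threading.RateLimiting;

// Bucket capacity 5, refilling 2 tokens every 500 ms via the background timer.
var bucket = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit          = 5,
    TokensPerPeriod     = 2,
    ReplenishmentPeriod = TimeSpan.FromMilliseconds(500),
    AutoReplenishment   = true,
    QueueLimit          = 0
});

int burst = 0;
for (int i = 0; i < 8; i++)
    if (bucket.AttemptAcquire().IsAcquired) burst++;

Console.WriteLine($"Burst: {burst}/8 granted");      // 5 — the full bucket drains, then rejections

Thread.Sleep(600);   // roughly one replenishment period elapses

int afterRefill = 0;
for (int i = 0; i < 5; i++)
    if (bucket.AttemptAcquire().IsAcquired) afterRefill++;

Console.WriteLine($"After refill: {afterRefill} granted");   // typically 2 — only the refilled tokens
```

A client that saved up a full bucket gets a free burst of 5, but sustained traffic is capped at the refill rate of 2 tokens per period.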

RateLimitPolicies.cs · CSHARP
using Microsoft.AspNetCore.RateLimiting;
using System.Threading.RateLimiting;

// This file shows ALL four algorithms side-by-side so you can compare them.
// Wire these up inside AddRateLimiter() in Program.cs.

public static class RateLimitPolicies
{
    public const string FixedWindow   = nameof(FixedWindow);
    public const string SlidingWindow = nameof(SlidingWindow);
    public const string TokenBucket   = nameof(TokenBucket);
    public const string Concurrency   = nameof(Concurrency);

    public static void Register(RateLimiterOptions options)
    {
        // ── 1. Fixed Window ──────────────────────────────────────────────────
        // Good for: simple public endpoints where occasional boundary bursts are tolerable.
        options.AddFixedWindowLimiter(FixedWindow, opt =>
        {
            opt.PermitLimit          = 100;                      // 100 reqs per minute
            opt.Window               = TimeSpan.FromMinutes(1);
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit           = 10;
        });

        // ── 2. Sliding Window ────────────────────────────────────────────────
        // Good for: search endpoints, user-facing UIs where burst spikes feel bad.
        // SegmentsPerWindow=6 means the 60s window is divided into 6×10s buckets.
        // Requests from the oldest bucket are 'forgotten' as time moves forward.
        options.AddSlidingWindowLimiter(SlidingWindow, opt =>
        {
            opt.PermitLimit          = 100;
            opt.Window               = TimeSpan.FromMinutes(1);
            opt.SegmentsPerWindow    = 6;                        // granularity tradeoff
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit           = 5;
        });

        // ── 3. Token Bucket ──────────────────────────────────────────────────
        // Good for: payment processors, upload APIs — burst-friendly but smooth overall.
        // TokensPerPeriod=20 means 20 tokens are added every ReplenishmentPeriod.
        // TokenLimit=50 is the bucket capacity — max burst size.
        options.AddTokenBucketLimiter(TokenBucket, opt =>
        {
            opt.TokenLimit           = 50;                       // bucket capacity
            opt.ReplenishmentPeriod  = TimeSpan.FromSeconds(10);
            opt.TokensPerPeriod      = 20;                       // refill rate
            opt.AutoReplenishment    = true;                     // background timer handles refill
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit           = 3;
        });

        // ── 4. Concurrency Limiter ───────────────────────────────────────────
        // Good for: endpoints hitting a DB connection pool or a slow third-party API.
        // This says: max 5 requests can execute at the same time; up to 2 can queue.
        options.AddConcurrencyLimiter(Concurrency, opt =>
        {
            opt.PermitLimit          = 5;                        // max concurrent requests
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit           = 2;
        });
    }
}
▶ Output
// No runtime output — these are policy registrations.
// At runtime, when a request hits the concurrency policy with 5 already in-flight:
//
// HTTP/1.1 429 Too Many Requests
// (the QueueLimit=2 requests will wait; the 8th concurrent caller gets 429 immediately)
⚠️
Pro Tip: Token Bucket Is the API Industry Standard
AWS API Gateway, Stripe, and GitHub all use token bucket or a close variant because it lets power users burst without punishing steady-state callers. If you're building a public-facing API and you're not sure which algorithm to pick, start with token bucket — it gives you the most natural 'fair usage' behaviour with the fewest angry support tickets.

Partitioned Rate Limiting — Per-User, Per-API-Key and Per-IP Policies in Production

A global rate limiter that throttles every caller equally is almost never what you want in production. Your paying enterprise customer shouldn't share a quota with an anonymous crawler. A background service you own shouldn't compete with end-user traffic.

This is where partitioned limiters come in. Instead of one shared RateLimiter instance, a partitioned limiter creates (or retrieves) a separate limiter instance for each partition key — typically an IP address, a user ID, an API key, or some combination. Each partition has its own counter that doesn't affect anyone else.

The AddPolicy overload is how you build a partitioned policy. The factory delegate receives the HttpContext and must return a RateLimitPartition — a value type that pairs a key with a factory function that creates the limiter for that key.

Critically, these per-partition limiter instances are cached inside a PartitionedRateLimiter. You're not allocating a new object on every request — the runtime reuses the instance for the same key. However, if you have millions of unique keys (e.g. one per user), the cache can grow large. The framework periodically disposes limiters that have been idle for a while, but you should still keep your partition keys bounded and monitor memory in production.
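The same partitioning behaviour can be observed outside the middleware with PartitionedRateLimiter.Create from System.Threading.RateLimiting. A small console sketch, where the client keys "alice" and "bob" are hypothetical:

```csharp
using System;
using System.Threading.RateLimiting;

// Each distinct key lazily gets (and then reuses) its own limiter instance.
var perClient = PartitionedRateLimiter.Create<string, string>(clientKey =>
    RateLimitPartition.GetFixedWindowLimiter(clientKey, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = 2,
        Window      = TimeSpan.FromMinutes(1),
        QueueLimit  = 0
    }));

// "alice" burns through her own 2-permit window...
perClient.AttemptAcquire("alice");
perClient.AttemptAcquire("alice");
bool aliceThird = perClient.AttemptAcquire("alice").IsAcquired;
Console.WriteLine($"alice's 3rd request granted: {aliceThird}");   // False — alice is throttled

// ...but "bob" has an independent counter and is unaffected.
bool bobFirst = perClient.AttemptAcquire("bob").IsAcquired;
Console.WriteLine($"bob's 1st request granted: {bobFirst}");       // True
```

Note that the factory delegate only runs the first time a key is seen; subsequent requests for the same key hit the cached limiter.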

PartitionedRateLimitPolicy.cs · CSHARP
using Microsoft.AspNetCore.RateLimiting;
using System.Security.Claims;
using System.Threading.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // ── Partitioned policy: anonymous vs authenticated users get different limits ──
    options.AddPolicy(
        policyName: "UserTierPolicy",
        partitioner: httpContext =>
        {
            // Prefer the authenticated user ID as the partition key.
            // Fall back to IP address for anonymous callers.
            var userId = httpContext.User.FindFirstValue(ClaimTypes.NameIdentifier);

            if (!string.IsNullOrEmpty(userId))
            {
                // Authenticated users get a generous token bucket.
                return RateLimitPartition.GetTokenBucketLimiter(
                    partitionKey: $"user:{userId}",   // unique key per user
                    factory: _ => new TokenBucketRateLimiterOptions
                    {
                        TokenLimit          = 200,               // large bucket for auth'd users
                        ReplenishmentPeriod = TimeSpan.FromSeconds(30),
                        TokensPerPeriod     = 50,
                        AutoReplenishment   = true,
                        QueueLimit          = 0
                    });
            }

            // Anonymous callers get a much stricter fixed window per IP.
            var clientIpAddress =
                httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";

            return RateLimitPartition.GetFixedWindowLimiter(
                partitionKey: $"anon:{clientIpAddress}",
                factory: _ => new FixedWindowRateLimiterOptions
                {
                    PermitLimit          = 10,                   // tight limit for anonymous
                    Window               = TimeSpan.FromSeconds(10),
                    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                    QueueLimit           = 0
                });
        });
});

// ── Apply the policy to a specific controller via attribute ─────────────────
[ApiController]
[Route("api/[controller]")]
[EnableRateLimiting("UserTierPolicy")]   // <── applied to every action in this controller
public class ProductsController : ControllerBase
{
    [HttpGet]
    public IActionResult GetAll()
    {
        return Ok(new[] { "Widget A", "Widget B", "Widget C" });
    }

    [HttpGet("{id:int}")]
    [DisableRateLimiting]   // <── opt-out for this specific action (e.g. health/status endpoints)
    public IActionResult GetById(int id)
    {
        return Ok($"Product {id}");
    }
}
▶ Output
// Anonymous caller from 192.168.1.5 — 11th request in 10 seconds:
// HTTP/1.1 429 Too Many Requests
//
// Authenticated user 'alice@example.com' — 201st token consumed within window:
// HTTP/1.1 429 Too Many Requests
//
// GET /api/products/42 — always succeeds because [DisableRateLimiting] is applied:
// HTTP/1.1 200 OK
⚠️
Watch Out: IP Address Is Not a Safe Partition Key Behind a Proxy
If your app runs behind a reverse proxy (NGINX, Azure Front Door, AWS ALB), `HttpContext.Connection.RemoteIpAddress` will be the proxy's IP — meaning ALL your users share one rate limit. Always configure `ForwardedHeadersOptions` and use `X-Forwarded-For` or `X-Real-IP` to extract the real client IP. Alternatively, partition by an API key or JWT claim which doesn't have this problem.

Production Gotchas — Distributed Environments, Load Balancers and Custom Rejection Responses

The built-in middleware stores limiter state in process memory. On a single-node deployment that's fine. The moment you scale to two or more instances — behind a load balancer, in Kubernetes with multiple pods — each instance maintains its own independent counter. A client can now hit 10× your intended limit just by round-robin luck across 10 pods.

For distributed rate limiting you have two practical options. First, implement IRateLimiterPolicy backed by a Redis counter (using StackExchange.Redis with Lua scripts for atomic increment-and-check). Second, use a gateway-level rate limiter (NGINX limit_req, Azure API Management, AWS WAF) that sits in front of all instances and is the single source of truth.

Another common production issue is queue behaviour under a thundering herd. When QueueLimit is greater than zero, queued requests do not block a thread (they await asynchronously), but they do hold memory and connection handles while they wait. If a sudden spike queues 10,000 requests against a 1-second window, those requests all complete or time out at almost the same moment and your clients get a terrible experience. For public APIs, QueueLimit = 0 is often the safer choice: fail fast and let clients back off.
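The fail-fast behaviour of QueueLimit = 0 is easy to see in isolation with a ConcurrencyLimiter; the limits below are chosen purely for illustration:

```csharp
using System;
using System.Threading.RateLimiting;

// One concurrent slot, no queue: a second simultaneous caller is rejected immediately.
var gate = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 1,
    QueueLimit  = 0
});

var firstLease = gate.AttemptAcquire();
bool firstOk   = firstLease.IsAcquired;              // the single slot is taken

bool secondOk  = gate.AttemptAcquire().IsAcquired;   // rejected immediately — 429 territory

firstLease.Dispose();                                // first request completes, slot released
bool thirdOk   = gate.AttemptAcquire().IsAcquired;   // slot is available again

Console.WriteLine($"{firstOk} {secondOk} {thirdOk}");   // True False True
```

With QueueLimit = 0 the rejected caller learns instantly that it should back off, instead of sitting in a queue consuming resources until a timeout.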

Finally, always expose rate limit headers. RFC 6585 defines the 429 status code, the core HTTP semantics spec (RFC 9110) defines Retry-After, and the emerging IETF RateLimit header fields draft (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) is being adopted broadly. Well-behaved clients rely on these to implement exponential backoff without guessing.

DistributedRateLimitPolicy.cs · CSHARP
using Microsoft.AspNetCore.RateLimiting;
using StackExchange.Redis;
using System.Threading.RateLimiting;

// ── Custom IRateLimiterPolicy backed by Redis for distributed rate limiting ──
// This is the pattern you NEED when running multiple instances in production.
//
// Redis Lua script ensures the increment + expiry check is atomic (no race conditions).
public class RedisFixedWindowPolicy : IRateLimiterPolicy<string>
{
    private readonly IConnectionMultiplexer _redis;
    private readonly ILogger<RedisFixedWindowPolicy> _logger;

    // How many requests are allowed per window
    private const int PermitLimit = 100;
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(1);

    // Lua script: atomically increment and set expiry on first call
    private const string IncrementScript = """
        local current = redis.call('INCR', KEYS[1])
        if current == 1 then
            redis.call('EXPIRE', KEYS[1], ARGV[1])
        end
        return current
        """;

    public RedisFixedWindowPolicy(
        IConnectionMultiplexer redis,
        ILogger<RedisFixedWindowPolicy> logger)
    {
        _redis  = redis;
        _logger = logger;
    }

    // GetPartition is called on EVERY request — keep it fast.
    public RateLimitPartition<string> GetPartition(HttpContext httpContext)
    {
        // Use a claim-based key for authenticated users; IP for anonymous.
        var partitionKey = httpContext.User.Identity?.IsAuthenticated == true
            ? $"rl:user:{httpContext.User.Identity.Name}"
            : $"rl:ip:{httpContext.Connection.RemoteIpAddress}";

        // RedisBackedRateLimiter (not shown here) is a custom RateLimiter subclass:
        // its acquire path runs IncrementScript against Redis and only grants a
        // lease while the counter for this key is <= PermitLimit.
        return RateLimitPartition.Get(
            partitionKey: partitionKey,
            factory: key => new RedisBackedRateLimiter(key, _redis, _logger));
    }

    public Func<OnRejectedContext, CancellationToken, ValueTask>? OnRejected =>
        async (context, cancellationToken) =>
        {
            context.HttpContext.Response.StatusCode = 429;

            // Expose the reset time so clients can schedule a retry intelligently.
            context.HttpContext.Response.Headers["RateLimit-Limit"]  = PermitLimit.ToString();
            context.HttpContext.Response.Headers["RateLimit-Reset"]  =
                DateTimeOffset.UtcNow.Add(Window).ToUnixTimeSeconds().ToString();

            await context.HttpContext.Response.WriteAsJsonAsync(
                new { error = "rate_limit_exceeded", retryAfterSeconds = (int)Window.TotalSeconds },
                cancellationToken);
        };
}

// ── Register in Program.cs ───────────────────────────────────────────────────
// builder.Services.AddSingleton<IConnectionMultiplexer>(
//     ConnectionMultiplexer.Connect("localhost:6379"));
// builder.Services.AddRateLimiter(opt =>
//     opt.AddPolicy<string, RedisFixedWindowPolicy>("DistributedPolicy"));
▶ Output
// When client exceeds the Redis-backed limit across ANY pod in the cluster:
//
// HTTP/1.1 429 Too Many Requests
// RateLimit-Limit: 100
// RateLimit-Reset: 1718293200
// Content-Type: application/json
//
// {"error":"rate_limit_exceeded","retryAfterSeconds":60}
🔥
Interview Gold: In-Memory vs Distributed Rate Limiting
Interviewers love this question. The built-in ASP.NET Core middleware is in-process only — state lives in RAM and dies with the process. For horizontally scaled deployments, you need a shared backing store (Redis is the industry default). The key insight: Redis Lua scripts are used for atomic counter increments because a read-then-write in application code has a TOCTOU race condition under concurrent load.
| Aspect | Fixed Window | Sliding Window | Token Bucket | Concurrency |
| --- | --- | --- | --- | --- |
| What it limits | Requests per time window | Requests across rolling window | Requests via token refill | Simultaneous in-flight requests |
| Boundary burst vulnerability | Yes — double-tap at window edge | No — segments smooth it out | No — token drain prevents it | N/A — time-independent |
| Memory usage per partition | Low (1 counter) | Medium (N segment counters) | Low (1 counter + metadata) | Low (1 semaphore) |
| Burst-friendly? | No | No | Yes — up to TokenLimit | No — hard concurrency ceiling |
| Best use case | Simple internal APIs | User-facing search / feeds | Public APIs, payment endpoints | DB-backed or slow I/O endpoints |
| Queue support | Yes | Yes | Yes | Yes |
| Refill mechanism | Window reset | Segment expiry | Timer-based replenishment | Release on request completion |
| .NET class name | FixedWindowRateLimiter | SlidingWindowRateLimiter | TokenBucketRateLimiter | ConcurrencyLimiter |

🎯 Key Takeaways

  • The built-in RateLimiterMiddleware in .NET 7+ is in-process only — the moment you scale horizontally, you need Redis-backed distributed limiting or gateway-level enforcement, full stop.
  • Token bucket is the right default for public APIs because it allows short bursts (great UX) while preventing sustained abuse — this is why AWS, Stripe and GitHub all use it.
  • QueueLimit = 0 is usually the correct choice for public APIs — queuing under a thundering herd consumes memory and causes synchronised retry storms when the window expires.
  • Always emit Retry-After and the draft RateLimit-* headers in your 429 responses — without them, well-intentioned clients have no choice but to hammer you with retries at full speed.

⚠ Common Mistakes to Avoid

  • Mistake 1: Relying on in-memory rate limiting in a multi-instance deployment — Symptom: clients easily exceed their supposed limit because each pod has an independent counter. A user hitting 10 pods round-robin gets 10× the intended limit. Fix: use a distributed backing store (Redis) via a custom IRateLimiterPolicy, or push rate limiting up to the API gateway / load balancer layer where there's a single enforcement point.
  • Mistake 2: Using HttpContext.Connection.RemoteIpAddress as the partition key without handling reverse proxy forwarded headers — Symptom: all users appear to come from the same IP (the proxy's IP), so one user exhausting the limit blocks everyone. Fix: add app.UseForwardedHeaders() with ForwardedHeadersOptions configured for your proxy, then read httpContext.Connection.RemoteIpAddress which will now correctly reflect the real client IP. Better yet, prefer a user ID or API key from an authenticated claim.
  • Mistake 3: Setting QueueLimit too high on public-facing endpoints — Symptom: during a traffic spike, thousands of requests queue up consuming memory and connection handles; when the window expires they all flush simultaneously causing a secondary spike, and slow clients that have already disconnected still hold queue slots. Fix: for public APIs set QueueLimit = 0 to fail fast and return 429 immediately. Reserve queuing (QueueLimit > 0) only for internal or authenticated endpoints where the caller can reliably handle a delayed response.

Interview Questions on This Topic

  • Q: ASP.NET Core's built-in rate limiting stores state in memory. What breaks when you deploy multiple instances behind a load balancer, and what are your options for fixing it?
  • Q: Walk me through the difference between the token bucket and sliding window algorithms. Give me a concrete scenario where picking the wrong one causes production problems.
  • Q: A client tells you they're getting 429s even though they claim they're well within the rate limit. After checking, you discover they're running 3 instances of their app against your API. How does your rate limiting policy need to change, and how would you communicate remaining quota back to them in the HTTP response?

Frequently Asked Questions

Is rate limiting in ASP.NET Core available without any extra NuGet packages?

Yes — from .NET 7 onwards, Microsoft.AspNetCore.RateLimiting is included in the framework. The underlying algorithm primitives (TokenBucketRateLimiter, FixedWindowRateLimiter etc.) live in System.Threading.RateLimiting, which is also inbox in .NET 7+ but can be installed as a standalone NuGet package if you need the algorithms in a non-web project like a console app or background service.

How do I apply rate limiting to only specific endpoints, not the whole application?

Use the [EnableRateLimiting("PolicyName")] attribute on a controller class or individual action method. Conversely, use [DisableRateLimiting] to exempt specific actions from a policy applied at the controller level. You can also apply policies fluently in the routing pipeline using .RequireRateLimiting("PolicyName") on a MapGet / MapPost call.
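As a sketch, the fluent minimal-API form might look like this — the endpoint paths here are hypothetical, and the snippet assumes a policy named "SlidingWindowPerIp" was registered via AddRateLimiter as in the earlier example:

```csharp
// Fragment for Program.cs, after builder.Build() and app.UseRateLimiter().
app.MapGet("/api/search", (string q) => Results.Ok($"results for {q}"))
   .RequireRateLimiting("SlidingWindowPerIp");   // apply a named policy to one endpoint

app.MapGet("/health", () => Results.Ok("healthy"))
   .DisableRateLimiting();                       // exempt liveness probes from every policy
```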

What HTTP status code should a rate limiter return and what headers should it include?

RFC 6585 specifies 429 Too Many Requests as the correct status code. You should always include a Retry-After header (value in seconds) so clients know when they can retry. The evolving IETF draft also specifies RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset headers — adopting these now future-proofs your API and enables SDK clients to implement automatic backoff without guesswork.

TheCodeForge Editorial Team · Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects.