Rate Limiting in ASP.NET Core — Built-in Algorithms, Policies and Production Gotchas
Every public API you've ever used has a rate limiter quietly working behind the scenes. GitHub's API caps you at 5,000 requests per hour. Stripe throttles card-creation calls per second. Twitter once killed third-party apps overnight by tightening limits. Rate limiting isn't a nice-to-have — it's the difference between an API that scales gracefully under load and one that falls over the moment a client's for-loop goes rogue or a DDoS probe starts knocking.
How ASP.NET Core's Built-in Rate Limiting Middleware Actually Works
Before .NET 7, the only way to rate-limit in ASP.NET Core was to bolt on a third-party library like AspNetCoreRateLimit or write custom middleware from scratch. Both approaches worked, but they weren't first-class citizens — they lived outside the framework and couldn't tap into the built-in routing pipeline or endpoint metadata system.
.NET 7 introduced the Microsoft.AspNetCore.RateLimiting namespace and the RateLimiterMiddleware. Internally, it integrates with the IRateLimiterPolicy interface and the lower-level System.Threading.RateLimiting primitives, which were deliberately shipped as a standalone NuGet package (System.Threading.RateLimiting) so you can use the algorithm implementations anywhere — not just in web apps.
The middleware sits in the request pipeline and calls RateLimiter.AcquireAsync() before your endpoint handler ever runs. If a lease is granted, the request flows through. If the limiter rejects the request, the middleware short-circuits with a 429 Too Many Requests response — your controller code is never touched. This is important: rate limiting is enforced at the infrastructure layer, not the application layer, which means you get protection even for endpoints you haven't explicitly coded defensively.
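That lease handshake is easy to see with the standalone primitives alone, no web stack required. A minimal sketch of the acquire-then-short-circuit flow, using a `ConcurrencyLimiter` with illustrative values (the middleware uses whatever limiter your policy configures, not these numbers):

```csharp
using System;
using System.Threading.RateLimiting;

// Illustrative values, not middleware defaults.
var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,  // at most 2 concurrent "requests"
    QueueLimit = 0,   // don't queue, reject immediately
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

// Mirrors what the middleware does per request: acquire a lease first,
// run the handler only if it was granted, otherwise short-circuit with 429.
using RateLimitLease lease = await limiter.AcquireAsync(permitCount: 1);

if (lease.IsAcquired)
{
    Console.WriteLine("lease granted, handler runs");
    // Endpoint handler would execute here; disposing the lease releases the permit.
}
else
{
    Console.WriteLine("lease denied, middleware would return 429 here");
}
```

With no contention, the first acquisition succeeds; a third concurrent caller against the same limiter would get a failed lease immediately because `QueueLimit` is zero.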
```csharp
using Microsoft.AspNetCore.RateLimiting;
using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();

// ── Register the rate limiting services and configure policies ──────────────
builder.Services.AddRateLimiter(rateLimiterOptions =>
{
    // Rejection callbacks let you customise the 429 response body and headers.
    rateLimiterOptions.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    rateLimiterOptions.OnRejected = async (context, cancellationToken) =>
    {
        // Add a Retry-After header so well-behaved clients know when to retry.
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        await context.HttpContext.Response.WriteAsync(
            "Too many requests. Please slow down.", cancellationToken);
    };

    // ── Fixed Window policy: 10 requests per 10-second window ───────────────
    rateLimiterOptions.AddFixedWindowLimiter(policyName: "FixedWindowPolicy", options =>
    {
        options.PermitLimit = 10;                  // max requests allowed
        options.Window = TimeSpan.FromSeconds(10); // window duration
        options.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
        options.QueueLimit = 2;                    // allow 2 requests to queue; the rest get 429
    });

    // ── Sliding Window policy: 20 requests per rolling 30-second window ─────
    // Note: this is a single limiter shared by ALL callers. Per-IP / per-user
    // partitioning requires AddPolicy, covered later in this article.
    rateLimiterOptions.AddSlidingWindowLimiter(policyName: "SlidingWindowPolicy", options =>
    {
        options.PermitLimit = 20;
        options.Window = TimeSpan.FromSeconds(30);
        options.SegmentsPerWindow = 3; // divides 30s into 3 × 10s segments
        options.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
        options.QueueLimit = 0;        // no queuing — reject immediately
    });
});

var app = builder.Build();

// ── UseRateLimiter must come after UseRouting but before MapControllers ─────
app.UseRouting();
app.UseRateLimiter(); // <── this is the middleware that does the work
app.UseAuthorization();
app.MapControllers();
app.Run();
```
// When a client exceeds the limit you will see in the HTTP response:
//
// HTTP/1.1 429 Too Many Requests
// Retry-After: 10
// Content-Type: text/plain; charset=utf-8
//
// Too many requests. Please slow down.
Choosing the Right Algorithm — Fixed Window vs Sliding Window vs Token Bucket vs Concurrency
Each of the four built-in algorithms solves a slightly different problem. Picking the wrong one doesn't just hurt correctness — it can crater performance or give clients a worse experience than they deserve.
Fixed Window is the simplest. It resets a counter at the start of each window. The dark side: a client can fire all 10 allowed requests in the last millisecond of window N, then fire another 10 in the first millisecond of window N+1 — hitting you with 20 requests in a 2ms burst. This 'boundary burst' is a well-known flaw.
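The window mechanics are easy to demonstrate with `FixedWindowRateLimiter` from System.Threading.RateLimiting directly. A self-contained sketch with illustrative values: within a single window, the first 10 synchronous acquisitions succeed and the rest are rejected until the window resets.

```csharp
using System;
using System.Threading.RateLimiting;

var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 10,                 // 10 requests allowed per window
    Window = TimeSpan.FromSeconds(10),
    QueueLimit = 0,                   // reject instead of queueing
    AutoReplenishment = true          // window resets on a background timer
});

int granted = 0, rejected = 0;
for (int i = 0; i < 15; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire();
    if (lease.IsAcquired) granted++; else rejected++;
}

// All 15 attempts land in the same window, so: granted=10, rejected=5
Console.WriteLine($"granted={granted}, rejected={rejected}");
```

Run the same loop again straddling a window boundary and you'd see the boundary-burst effect: up to 20 grants in a fraction of a second.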
Sliding Window eliminates boundary bursts by tracking requests across overlapping segments. It's more accurate but uses more memory because it maintains per-segment counters.
Token Bucket is the industry favourite for APIs that want to allow short bursts but smooth out sustained traffic. Tokens refill at a steady rate; a client can save up unused tokens and spend them in a burst. This maps naturally to 'burst-friendly' API contracts.
Concurrency Limiter isn't about time at all — it limits how many requests can be in-flight simultaneously. This is the right tool when your bottleneck is a downstream resource (a database connection pool, an external API) rather than raw request rate.
```csharp
using Microsoft.AspNetCore.RateLimiting;
using System.Threading.RateLimiting;

// This file shows ALL four algorithms side-by-side so you can compare them.
// Wire these up inside AddRateLimiter() in Program.cs.
public static class RateLimitPolicies
{
    public const string FixedWindow = nameof(FixedWindow);
    public const string SlidingWindow = nameof(SlidingWindow);
    public const string TokenBucket = nameof(TokenBucket);
    public const string Concurrency = nameof(Concurrency);

    public static void Register(RateLimiterOptions options)
    {
        // ── 1. Fixed Window ──────────────────────────────────────────────────
        // Good for: simple public endpoints where occasional boundary bursts are tolerable.
        options.AddFixedWindowLimiter(FixedWindow, opt =>
        {
            opt.PermitLimit = 100; // 100 requests per minute
            opt.Window = TimeSpan.FromMinutes(1);
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit = 10;
        });

        // ── 2. Sliding Window ────────────────────────────────────────────────
        // Good for: search endpoints, user-facing UIs where burst spikes feel bad.
        // SegmentsPerWindow = 6 means the 60s window is divided into 6 × 10s buckets.
        // Requests from the oldest bucket are 'forgotten' as time moves forward.
        options.AddSlidingWindowLimiter(SlidingWindow, opt =>
        {
            opt.PermitLimit = 100;
            opt.Window = TimeSpan.FromMinutes(1);
            opt.SegmentsPerWindow = 6; // granularity tradeoff
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit = 5;
        });

        // ── 3. Token Bucket ──────────────────────────────────────────────────
        // Good for: payment processors, upload APIs — burst-friendly but smooth overall.
        // TokensPerPeriod = 20 means 20 tokens are added every ReplenishmentPeriod.
        // TokenLimit = 50 is the bucket capacity — the maximum burst size.
        options.AddTokenBucketLimiter(TokenBucket, opt =>
        {
            opt.TokenLimit = 50;                                // bucket capacity
            opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
            opt.TokensPerPeriod = 20;                           // refill rate
            opt.AutoReplenishment = true;                       // background timer handles refill
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit = 3;
        });

        // ── 4. Concurrency Limiter ───────────────────────────────────────────
        // Good for: endpoints hitting a DB connection pool or a slow third-party API.
        // This says: max 5 requests can execute at the same time; up to 2 can queue.
        options.AddConcurrencyLimiter(Concurrency, opt =>
        {
            opt.PermitLimit = 5; // max concurrent requests
            opt.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
            opt.QueueLimit = 2;
        });
    }
}
```
// At runtime, when a request hits the concurrency policy with 5 already in-flight:
//
// HTTP/1.1 429 Too Many Requests
// (the QueueLimit=2 requests will wait; the 8th concurrent caller gets 429 immediately)
Partitioned Rate Limiting — Per-User, Per-API-Key and Per-IP Policies in Production
A global rate limiter that throttles every caller equally is almost never what you want in production. Your paying enterprise customer shouldn't share a quota with an anonymous crawler. A background service you own shouldn't compete with end-user traffic.
This is where partitioned limiters come in. Instead of one shared RateLimiter instance, a partitioned limiter creates (or retrieves) a separate limiter instance for each partition key — typically an IP address, a user ID, an API key, or some combination. Each partition has its own counter that doesn't affect anyone else.
The AddPolicy overload is how you build a partitioned policy. The factory delegate receives the HttpContext and must return a RateLimitPartition — a value type that pairs a key with a factory function that creates the limiter for that key.
Critically, these per-partition limiter instances are cached inside a PartitionedRateLimiter. You're not allocating a new limiter on every request — the runtime reuses the instance for the same key, and idle limiters are periodically disposed in the background. Even so, if you have millions of unique keys (e.g. one per user), the cache can grow large between cleanup passes — monitor memory in production.
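The same partition-and-cache behaviour is available outside the middleware via `PartitionedRateLimiter.Create` in System.Threading.RateLimiting. A minimal sketch, with made-up client IDs and deliberately tiny limits so the independence of the counters is visible:

```csharp
using System;
using System.Threading.RateLimiting;

// One fixed-window limiter per client ID; limiter instances are created
// lazily and cached per key inside the PartitionedRateLimiter.
PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(
    clientId => RateLimitPartition.GetFixedWindowLimiter(
        partitionKey: clientId,
        factory: _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 2,
            Window = TimeSpan.FromSeconds(10),
            QueueLimit = 0,
            AutoReplenishment = true
        }));

// "alice" and "bob" draw from independent counters.
bool a1 = limiter.AttemptAcquire("alice").IsAcquired; // true
bool a2 = limiter.AttemptAcquire("alice").IsAcquired; // true
bool a3 = limiter.AttemptAcquire("alice").IsAcquired; // false: alice's window is exhausted
bool b1 = limiter.AttemptAcquire("bob").IsAcquired;   // true: bob is unaffected

Console.WriteLine($"{a1} {a2} {a3} {b1}"); // True True False True
```

This is exactly what `AddPolicy` builds for you inside the middleware, keyed by whatever your partitioner delegate returns.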
```csharp
using Microsoft.AspNetCore.RateLimiting;
using System.Security.Claims;
using System.Threading.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // ── Partitioned policy: anonymous vs authenticated users get different limits ──
    options.AddPolicy(policyName: "UserTierPolicy", partitioner: httpContext =>
    {
        // Prefer the authenticated user ID as the partition key.
        // Fall back to IP address for anonymous callers.
        var userId = httpContext.User.FindFirstValue(ClaimTypes.NameIdentifier);

        if (!string.IsNullOrEmpty(userId))
        {
            // Authenticated users get a generous token bucket.
            return RateLimitPartition.GetTokenBucketLimiter(
                partitionKey: $"user:{userId}", // unique key per user
                factory: _ => new TokenBucketRateLimiterOptions
                {
                    TokenLimit = 200, // large bucket for auth'd users
                    ReplenishmentPeriod = TimeSpan.FromSeconds(30),
                    TokensPerPeriod = 50,
                    AutoReplenishment = true,
                    QueueLimit = 0
                });
        }

        // Anonymous callers get a much stricter fixed window per IP.
        var clientIpAddress = httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";

        return RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: $"anon:{clientIpAddress}",
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 10, // tight limit for anonymous callers
                Window = TimeSpan.FromSeconds(10),
                QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                QueueLimit = 0
            });
    });
});

// ── Apply the policy to a specific controller via attribute ─────────────────
[ApiController]
[Route("api/[controller]")]
[EnableRateLimiting("UserTierPolicy")] // <── applied to every action in this controller
public class ProductsController : ControllerBase
{
    [HttpGet]
    public IActionResult GetAll()
    {
        return Ok(new[] { "Widget A", "Widget B", "Widget C" });
    }

    [HttpGet("{id:int}")]
    [DisableRateLimiting] // <── opt-out for this specific action (e.g. health/status endpoints)
    public IActionResult GetById(int id)
    {
        return Ok($"Product {id}");
    }
}
```
// Anonymous caller — 11th request from the same IP inside the 10-second window:
// HTTP/1.1 429 Too Many Requests
//
// Authenticated user 'alice@example.com' — bucket drained (200 tokens spent faster than they refill):
// HTTP/1.1 429 Too Many Requests
//
// GET /api/products/42 — always succeeds because [DisableRateLimiting] is applied:
// HTTP/1.1 200 OK
Production Gotchas — Distributed Environments, Load Balancers and Custom Rejection Responses
The built-in middleware stores limiter state in process memory. On a single-node deployment that's fine. The moment you scale to two or more instances — behind a load balancer, in Kubernetes with multiple pods — each instance maintains its own independent counter. A client can now hit 10× your intended limit just by round-robin luck across 10 pods.
For distributed rate limiting you have two practical options. First, implement IRateLimiterPolicy backed by a Redis counter (using StackExchange.Redis with Lua scripts for atomic increment-and-check). Second, use a gateway-level rate limiter (NGINX limit_req, Azure API Management, AWS WAF) that sits in front of all instances and is the single source of truth.
Another common production issue is queue behaviour under a thundering herd. When QueueLimit is greater than zero, queued requests don't block threads (they await asynchronously), but each one does hold memory, a connection, and per-request state while it waits. If a sudden spike queues 10,000 requests against a 1-second window, those requests all flush or time out at once and your clients get a terrible experience. For public APIs, QueueLimit = 0 is often the safer choice: fail fast and let clients back off.
Finally, always expose rate limit headers. RFC 6585 defines the 429 status code, Retry-After is defined in the core HTTP semantics specification (RFC 9110), and the emerging IETF RateLimit headers draft (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) is being adopted broadly. Well-behaved clients rely on these to implement exponential backoff without guessing.
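The built-in middleware only fires OnRejected for 429s; it doesn't stamp RateLimit-* headers on successful responses. One way to do that is a small inline middleware. A sketch only: the header values here are hard-coded to mirror the policy, because the built-in limiters don't expose a per-request "remaining" count — a real implementation would track that itself (or read it from Redis).

```csharp
// Stamp draft RateLimit-* headers on every response so clients can pace
// themselves before they ever hit a 429. Values are illustrative.
app.Use(async (context, next) =>
{
    context.Response.OnStarting(() =>
    {
        context.Response.Headers["RateLimit-Limit"] = "100"; // matches the policy's PermitLimit
        context.Response.Headers["RateLimit-Reset"] = "60";  // seconds until the window resets
        return Task.CompletedTask;
    });

    await next(context);
});
```

Register this before `MapControllers()` so it wraps every endpoint; `OnStarting` ensures the headers are set before the response body begins streaming.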
```csharp
using Microsoft.AspNetCore.RateLimiting;
using StackExchange.Redis;
using System.Threading.RateLimiting;

// ── Custom IRateLimiterPolicy backed by Redis for distributed rate limiting ──
// This is the pattern you NEED when running multiple instances in production.
//
// A Redis Lua script ensures the increment + expiry check is atomic (no race conditions).
public class RedisFixedWindowPolicy : IRateLimiterPolicy<string>
{
    private readonly IConnectionMultiplexer _redis;
    private readonly ILogger<RedisFixedWindowPolicy> _logger;

    // How many requests are allowed per window
    private const int PermitLimit = 100;
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(1);

    // Lua script: atomically increment and set expiry on first call
    private const string IncrementScript = """
        local current = redis.call('INCR', KEYS[1])
        if current == 1 then
            redis.call('EXPIRE', KEYS[1], ARGV[1])
        end
        return current
        """;

    public RedisFixedWindowPolicy(
        IConnectionMultiplexer redis,
        ILogger<RedisFixedWindowPolicy> logger)
    {
        _redis = redis;
        _logger = logger;
    }

    // GetPartition is called on EVERY request — keep it fast.
    public RateLimitPartition<string> GetPartition(HttpContext httpContext)
    {
        // Use a claim-based key for authenticated users; IP for anonymous.
        var partitionKey = httpContext.User.Identity?.IsAuthenticated == true
            ? $"rl:user:{httpContext.User.Identity.Name}"
            : $"rl:ip:{httpContext.Connection.RemoteIpAddress}";

        // The actual Redis call happens inside RedisBackedRateLimiter — a custom
        // RateLimiter subclass (not shown here) that runs IncrementScript and
        // grants or denies the lease based on the returned counter.
        return RateLimitPartition.Get(
            partitionKey: partitionKey,
            factory: key => new RedisBackedRateLimiter(key, _redis, _logger));
    }

    public Func<OnRejectedContext, CancellationToken, ValueTask>? OnRejected =>
        async (context, cancellationToken) =>
        {
            context.HttpContext.Response.StatusCode = 429;

            // Expose the limit and reset time so clients can schedule a retry intelligently.
            context.HttpContext.Response.Headers["RateLimit-Limit"] = PermitLimit.ToString();
            context.HttpContext.Response.Headers["RateLimit-Reset"] =
                DateTimeOffset.UtcNow.Add(Window).ToUnixTimeSeconds().ToString();

            await context.HttpContext.Response.WriteAsJsonAsync(
                new { error = "rate_limit_exceeded", retryAfterSeconds = (int)Window.TotalSeconds },
                cancellationToken);
        };
}

// ── Register in Program.cs ───────────────────────────────────────────────────
// builder.Services.AddSingleton<IConnectionMultiplexer>(
//     ConnectionMultiplexer.Connect("localhost:6379"));
// builder.Services.AddRateLimiter(opt =>
//     opt.AddPolicy<string, RedisFixedWindowPolicy>("DistributedPolicy"));
```
//
// HTTP/1.1 429 Too Many Requests
// RateLimit-Limit: 100
// RateLimit-Reset: 1718293200
// Content-Type: application/json
//
// {"error":"rate_limit_exceeded","retryAfterSeconds":60}
| Aspect | Fixed Window | Sliding Window | Token Bucket | Concurrency |
|---|---|---|---|---|
| What it limits | Requests per time window | Requests across rolling window | Requests via token refill | Simultaneous in-flight requests |
| Boundary burst vulnerability | Yes — double-tap at window edge | No — segments smooth it out | No — token drain prevents it | N/A — time-independent |
| Memory usage per partition | Low (1 counter) | Medium (N segment counters) | Low (1 counter + metadata) | Low (1 semaphore) |
| Burst-friendly? | No | No | Yes — up to TokenLimit | No — hard concurrency ceiling |
| Best use case | Simple internal APIs | User-facing search / feeds | Public APIs, payment endpoints | DB-backed or slow I/O endpoints |
| Queue support | Yes | Yes | Yes | Yes |
| Refill mechanism | Window reset | Segment expiry | Timer-based replenishment | Release on request completion |
| .NET class name | FixedWindowRateLimiter | SlidingWindowRateLimiter | TokenBucketRateLimiter | ConcurrencyLimiter |
🎯 Key Takeaways

- The built-in `RateLimiterMiddleware` in .NET 7+ is in-process only — the moment you scale horizontally, you need Redis-backed distributed limiting or gateway-level enforcement, full stop.
- Token bucket is the right default for public APIs because it allows short bursts (great UX) while preventing sustained abuse — it's the model burst-friendly providers such as Stripe and AWS describe in their API documentation.
- `QueueLimit = 0` is usually the correct choice for public APIs — queuing under a thundering herd consumes memory and causes synchronised retry storms when the window expires.
- Always emit `Retry-After` and the draft `RateLimit-*` headers in your 429 responses — without them, well-intentioned clients have no choice but to hammer you with retries at full speed.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Relying on in-memory rate limiting in a multi-instance deployment — Symptom: clients easily exceed their supposed limit because each pod has an independent counter; a user hitting 10 pods round-robin gets 10× the intended limit. Fix: use a distributed backing store (Redis) via a custom `IRateLimiterPolicy`, or push rate limiting up to the API gateway / load balancer layer where there's a single enforcement point.
- ✕ Mistake 2: Using `HttpContext.Connection.RemoteIpAddress` as the partition key without handling reverse-proxy forwarded headers — Symptom: all users appear to come from the same IP (the proxy's IP), so one user exhausting the limit blocks everyone. Fix: add `app.UseForwardedHeaders()` with `ForwardedHeadersOptions` configured for your proxy, then read `httpContext.Connection.RemoteIpAddress`, which will now correctly reflect the real client IP. Better yet, prefer a user ID or API key from an authenticated claim.
- ✕ Mistake 3: Setting `QueueLimit` too high on public-facing endpoints — Symptom: during a traffic spike, thousands of requests queue up, consuming memory and connection handles; when the window expires they all flush simultaneously, causing a secondary spike, and slow clients that have already disconnected still hold queue slots. Fix: for public APIs set `QueueLimit = 0` to fail fast and return 429 immediately. Reserve queuing (`QueueLimit > 0`) for internal or authenticated endpoints where the caller can reliably handle a delayed response.
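The forwarded-headers fix from Mistake 2 looks roughly like this in Program.cs. A sketch under assumptions: the proxy address `10.0.0.1` is hypothetical and must be replaced with your real proxy, and only the `X-Forwarded-*` headers your proxy actually sets should be trusted.

```csharp
using Microsoft.AspNetCore.HttpOverrides;
using System.Net;

var builder = WebApplication.CreateBuilder(args);

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    options.ForwardedHeaders =
        ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;

    // Trust ONLY your own proxy. Without this, any client can spoof
    // X-Forwarded-For and dodge per-IP rate limits entirely.
    options.KnownProxies.Add(IPAddress.Parse("10.0.0.1")); // hypothetical proxy IP
});

var app = builder.Build();

// Must run BEFORE UseRateLimiter so the partition key sees the real client IP.
app.UseForwardedHeaders();
app.UseRateLimiter();

app.Run();
```

The ordering is the point: if `UseForwardedHeaders()` runs after the rate limiter, every partition key collapses to the proxy's address and you're back to Mistake 2.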
Interview Questions on This Topic
- Q: ASP.NET Core's built-in rate limiting stores state in memory. What breaks when you deploy multiple instances behind a load balancer, and what are your options for fixing it?
- Q: Walk me through the difference between the token bucket and sliding window algorithms. Give me a concrete scenario where picking the wrong one causes production problems.
- Q: A client tells you they're getting 429s even though they claim they're well within the rate limit. After checking, you discover they're running 3 instances of their app against your API. How does your rate limiting policy need to change, and how would you communicate remaining quota back to them in the HTTP response?
Frequently Asked Questions
Is rate limiting in ASP.NET Core available without any extra NuGet packages?
Yes — from .NET 7 onwards, Microsoft.AspNetCore.RateLimiting is included in the framework. The underlying algorithm primitives (TokenBucketRateLimiter, FixedWindowRateLimiter etc.) live in System.Threading.RateLimiting, which is also included in .NET 7+ but is additionally published as a standalone NuGet package if you need the algorithms in a non-web project like a console app or background service.
How do I apply rate limiting to only specific endpoints, not the whole application?
Use the [EnableRateLimiting("PolicyName")] attribute on a controller class or individual action method. Conversely, use [DisableRateLimiting] to exempt specific actions from a policy applied at the controller level. You can also apply policies fluently in the routing pipeline using .RequireRateLimiting("PolicyName") on a MapGet / MapPost call.
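For minimal APIs, the fluent form mentioned above looks like this. The endpoint paths are illustrative; "FixedWindowPolicy" is assumed to be a policy already registered in `AddRateLimiter`, as in the first example in this article.

```csharp
// Attach a named policy to a single minimal-API endpoint...
app.MapGet("/api/search", (string q) => Results.Ok(new { query = q }))
   .RequireRateLimiting("FixedWindowPolicy");

// ...or to a whole group of endpoints at once.
var admin = app.MapGroup("/admin").RequireRateLimiting("FixedWindowPolicy");
admin.MapGet("/stats", () => Results.Ok("stats"));

// Exempt a single endpoint even though its group is limited,
// the fluent counterpart of [DisableRateLimiting].
admin.MapGet("/health", () => Results.Ok("healthy")).DisableRateLimiting();
```

Because these are endpoint conventions, the middleware resolves the effective policy from endpoint metadata, so the most specific setting (the endpoint's own) wins over the group's.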
What HTTP status code should a rate limiter return and what headers should it include?
RFC 6585 specifies 429 Too Many Requests as the correct status code. You should always include a Retry-After header (value in seconds) so clients know when they can retry. The evolving IETF draft also specifies RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset headers — adopting these now future-proofs your API and enables SDK clients to implement automatic backoff without guesswork.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.