Spring Cloud Gateway: The Complete Production Guide
Master Spring Cloud Gateway: RouteLocator, predicates, filters, rate limiting with Redis, JWT auth filters, and lb:// load balancing for production microservices.
- Define routes via RouteLocator bean or YAML with predicates (Path, Header, Method) and filters (AddRequestHeader, RewritePath)
- Rate limit with RequestRateLimiter filter backed by Redis using KeyResolver beans
- JWT authentication via a GlobalFilter that validates tokens before routing
- Use lb://service-name URI scheme for automatic Eureka/Consul load balancing
- Circuit breaker integration via CircuitBreaker filter with fallback URIs for resilience
Spring Cloud Gateway is the front door of your microservices house. Every request from the outside world passes through this door, which checks credentials, rate-limits abusive visitors, rewrites messy URLs, and then directs each visitor to the right room inside the house. The rooms (microservices) never talk directly to the outside world.
Before API gateways became standard, microservice teams faced a brutal choice: either expose every service directly to the internet (an operational and security nightmare) or build a bespoke routing layer in Nginx that nobody wanted to maintain. Spring Cloud Gateway was created to solve this by providing a programmable, reactive routing engine that integrates natively with the Spring ecosystem.
The production pain point that drives teams to Spring Cloud Gateway is the proliferation of cross-cutting concerns. Authentication, rate limiting, request tracing, CORS, circuit breaking, and response caching all need to happen before a request reaches its target service. Without a gateway, every service implements these independently — and inconsistently. A single missed Authorization header check in one service becomes a security incident.
Spring Cloud Gateway is built on Spring WebFlux and Project Reactor, making it fully non-blocking and reactive. This architectural choice means a single gateway instance can handle tens of thousands of concurrent connections without the thread-per-request overhead of a Servlet-based gateway. Benchmarks commonly cited for it report 3-5x more requests per second than Zuul 1.x on the same hardware, though the exact ratio depends on workload, payload size, and configuration.
The routing model is predicate-based: each route has a set of predicates that must match the incoming request (path pattern, header values, HTTP method, query parameters, time of day, cookie values), and a set of filters that transform the request before forwarding and transform the response before returning. This declarative model makes complex routing logic readable and testable.
Rate limiting is one of the most critical production concerns. Without it, a single misbehaving client or a DDoS attack can exhaust backend service capacity. Gateway's Redis-backed RequestRateLimiter implements the token bucket algorithm with per-key limits, where the key can be the authenticated user ID, API key, or client IP. This is far superior to per-service rate limiting because it enforces limits at the edge.
This guide walks through every major Gateway capability with production-grade configuration, real incident analysis, and the exact patterns used in high-traffic Spring Boot microservice deployments.
RouteLocator: Java DSL vs YAML Configuration
Spring Cloud Gateway supports two configuration styles: Java DSL via RouteLocatorBuilder and YAML/properties configuration. Both produce identical runtime behavior, but the Java DSL is more expressive for complex routing logic and provides compile-time type checking. YAML is more readable for simple routing tables and better for environments where configuration is managed separately from code.
The Java DSL uses a fluent builder API where you define each route with an ID, URI, predicates, and filters. The RouteLocatorBuilder.routes() method chains route definitions, and the resulting RouteLocator bean supplements rather than replaces YAML-defined routes: Gateway combines both sources into a single route table.
A critical detail often missed: route order matters. Gateway sorts routes by their order value (default 0) and, among routes with the same order, evaluates them in the order they're defined, selecting the first match. Put more specific routes before more general ones, or give them a lower order value. In YAML, routes are ordered by their position in the list. In Java DSL, they're ordered by definition order within the routes() builder.
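The first-match rule can be illustrated with a small plain-Java model. This is a sketch of the matching idea only, not Gateway's implementation: SketchRequest, SketchRoute, FirstMatchRouter, and the lb://export-service target are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical, simplified request: just an HTTP method and a path.
final class SketchRequest {
    final String method;
    final String path;

    SketchRequest(String method, String path) {
        this.method = method;
        this.path = path;
    }
}

// Hypothetical route: an ID, a target URI, and one combined predicate.
final class SketchRoute {
    final String id;
    final String uri;
    final Predicate<SketchRequest> predicate;

    SketchRoute(String id, String uri, Predicate<SketchRequest> predicate) {
        this.id = id;
        this.uri = uri;
        this.predicate = predicate;
    }
}

final class FirstMatchRouter {
    // Routes are kept in definition order, like a YAML list or a routes() chain.
    private final List<SketchRoute> routes = new ArrayList<>();

    FirstMatchRouter add(SketchRoute route) {
        routes.add(route);
        return this;
    }

    // Return the first route whose predicate accepts the request, if any.
    Optional<SketchRoute> match(SketchRequest request) {
        return routes.stream().filter(r -> r.predicate.test(request)).findFirst();
    }

    static Predicate<SketchRequest> pathPrefix(String prefix) {
        return r -> r.path.startsWith(prefix);
    }

    static Predicate<SketchRequest> method(String method) {
        return r -> r.method.equalsIgnoreCase(method);
    }

    // The specific route is defined first, the general route second.
    static FirstMatchRouter example() {
        return new FirstMatchRouter()
            .add(new SketchRoute("orders-export", "lb://export-service",
                pathPrefix("/api/orders/export").and(method("GET"))))
            .add(new SketchRoute("orders", "lb://order-service",
                pathPrefix("/api/orders")));
    }
}
```

If the two routes were swapped, the general /api/orders route would capture every export request and the specific route would never be reached.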
For dynamic routing that changes at runtime without redeployment, implement a custom RouteDefinitionRepository backed by a database or Redis. This enables admin APIs that add or remove routes without restarting the Gateway. The InMemoryRouteDefinitionRepository (the default) supports this via the Gateway Actuator endpoints (POST /actuator/gateway/routes/{id} to create a route, DELETE /actuator/gateway/routes/{id} to remove one), followed by POST /actuator/gateway/refresh so the route cache picks up the change. Routes held only in memory are lost when the gateway restarts.
Rate Limiting with Redis RequestRateLimiter
Rate limiting at the API gateway level is the most effective defense against API abuse, DDoS attacks, and accidental client bugs that cause thundering herds. Spring Cloud Gateway's RequestRateLimiter filter implements the token bucket algorithm backed by Redis Lua scripts, providing accurate, distributed rate limiting that works correctly across multiple gateway instances.
The token bucket algorithm maintains a bucket with a maximum capacity (burst-capacity) that refills at a fixed rate (replenish-rate tokens per second). Each request consumes one or more tokens. If the bucket is empty, the request is rejected with 429 Too Many Requests. This allows short bursts of traffic (up to burst-capacity) while enforcing a long-term average rate (replenish-rate).
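The refill-and-consume arithmetic can be sketched in plain Java. This is a single-threaded, in-memory model under simplifying assumptions (whole-second timestamps supplied by the caller); the real filter performs the equivalent bookkeeping in Redis, and the class name TokenBucketSketch is hypothetical.

```java
// In-memory token bucket; time is passed in explicitly so behavior is deterministic.
final class TokenBucketSketch {
    private final long replenishRate;  // tokens added per second
    private final long burstCapacity;  // maximum tokens the bucket can hold
    private long tokens;
    private long lastRefillSeconds;

    TokenBucketSketch(long replenishRate, long burstCapacity, long nowSeconds) {
        this.replenishRate = replenishRate;
        this.burstCapacity = burstCapacity;
        this.tokens = burstCapacity;   // the bucket starts full
        this.lastRefillSeconds = nowSeconds;
    }

    // Returns true if the request may proceed; false models a 429 response.
    boolean tryAcquire(long requestedTokens, long nowSeconds) {
        // Refill for the elapsed time, capped at the burst capacity.
        long elapsed = Math.max(0, nowSeconds - lastRefillSeconds);
        tokens = Math.min(burstCapacity, tokens + elapsed * replenishRate);
        lastRefillSeconds = Math.max(lastRefillSeconds, nowSeconds);

        if (tokens < requestedTokens) {
            return false;
        }
        tokens -= requestedTokens;
        return true;
    }

    long remaining() {
        return tokens;
    }
}
```

With a replenish rate of 10 and a burst capacity of 20, a client can send 20 requests in the first second, then is held to 10 per second until it pauses long enough for the bucket to refill.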
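The key resolver is the most important configuration decision. It determines the granularity of rate limiting. Common strategies: by authenticated user ID (prevents power users from starving others), by API key (for quota-based monetization), by client IP (for unauthenticated endpoints), or by request path (to protect expensive endpoints). The filter takes a single KeyResolver per route, so to combine strategies, write one resolver that builds a composite key or falls back from one source to the next.
Redis connectivity is critical, and two failure modes are easy to confuse. When the KeyResolver produces no key (for example, a missing header or principal), deny-empty-key=true (the default) rejects the request, so a resolver that fails for every request turns into 100% rejections. When Redis itself is unreachable, the stock RedisRateLimiter logs the error and lets the request through, so rate limiting silently stops protecting your backends. In production, use Redis Sentinel or Redis Cluster for HA, configure appropriate connection pool settings, decide deliberately whether deny-empty-key should be false for routes where a key may legitimately be absent, and add monitoring alerts so you know when rate limiting is degraded.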
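One way to combine strategies is a fallback chain: prefer the user ID, then the API key, then the client IP. The sketch below shows only the key-selection logic in plain Java; KeyStrategySketch and the key prefixes are hypothetical, and in a real gateway this logic would live inside a KeyResolver bean.

```java
import java.util.Optional;

final class KeyStrategySketch {
    // Prefix each key with its source so a user ID can never collide with an IP or API key.
    static Optional<String> resolve(String userId, String apiKey, String clientIp) {
        if (userId != null && !userId.isEmpty()) {
            return Optional.of("user:" + userId);
        }
        if (apiKey != null && !apiKey.isEmpty()) {
            return Optional.of("apikey:" + apiKey);
        }
        if (clientIp != null && !clientIp.isEmpty()) {
            return Optional.of("ip:" + clientIp);
        }
        // No usable source: the caller decides whether an empty key is denied.
        return Optional.empty();
    }
}
```

Falling back to the client IP means unauthenticated traffic is still limited, at the cost of grouping clients that share an address behind a NAT or proxy.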
The rate limiter headers in the response (X-RateLimit-Remaining, X-RateLimit-Replenish-Rate, X-RateLimit-Burst-Capacity) are valuable for clients and should be preserved. They allow clients to implement backoff before hitting the limit rather than polling until they get a 429.
JWT Authentication Global Filter
Authentication at the gateway level enforces a single, consistent security boundary across all microservices. A Global Filter that validates JWT tokens runs before any route filter, ensuring unauthenticated requests never reach downstream services regardless of which route is matched.
The Global Filter implements GlobalFilter and Ordered. The order value determines priority — lower numbers run first. Authentication should run at a very low order number (high priority) so it runs before any other filter. The filter receives a ServerWebExchange (containing the request and response) and a GatewayFilterChain, and it either calls chain.filter(exchange) to proceed or completes the exchange with a 401/403 response.
After validating the JWT, the filter should extract user claims and forward them to downstream services as request headers. This allows downstream services to trust the user identity without performing their own JWT validation. Common headers include X-User-ID, X-User-Roles, X-User-Email. Use a prefix like X-Auth- to distinguish gateway-injected headers from client-provided ones, and strip any X-Auth- headers from incoming requests before validation to prevent header spoofing.
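The strip-then-add order can be shown on a plain header map. This is a minimal sketch, assuming the X-Auth- prefix convention described above; AuthHeaderSketch and the claim names are hypothetical, and a real filter would apply the same two steps through the exchange's request mutation API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class AuthHeaderSketch {
    static final String PREFIX = "X-Auth-";

    // Returns a new header map: client-supplied X-Auth-* removed, validated claims added.
    static Map<String, String> forwardHeaders(Map<String, String> incoming,
                                              Map<String, String> validatedClaims) {
        // HTTP header names are case-insensitive, so store and compare them that way.
        Map<String, String> out = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String lowerPrefix = PREFIX.toLowerCase(Locale.ROOT);

        // Step 1: copy everything except headers in the reserved namespace.
        for (Map.Entry<String, String> header : incoming.entrySet()) {
            if (!header.getKey().toLowerCase(Locale.ROOT).startsWith(lowerPrefix)) {
                out.put(header.getKey(), header.getValue());
            }
        }
        // Step 2: only now add values taken from the verified token.
        for (Map.Entry<String, String> claim : validatedClaims.entrySet()) {
            out.put(PREFIX + claim.getKey(), claim.getValue());
        }
        return out;
    }
}
```

Because the strip step matches the whole prefix case-insensitively, a client sending x-auth-user-id: admin cannot smuggle an identity past the gateway.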
Whitelist public endpoints (health checks, Swagger UI, auth endpoints themselves) by path pattern. Use a configurable list of patterns stored in configuration, not hardcoded in the filter class, so new public endpoints can be added without code changes. AntPathMatcher works for pattern matching in WebFlux contexts.
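A configurable public-path check might look like the following plain-Java sketch. It implements a deliberately small glob dialect ("**" spans path segments, "*" stays within one) and is not a drop-in replacement for AntPathMatcher, whose matching rules differ in edge cases; PublicPathSketch and the example patterns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class PublicPathSketch {
    private final List<Pattern> patterns = new ArrayList<>();

    // The glob list would normally come from configuration, not from code.
    PublicPathSketch(List<String> globs) {
        for (String glob : globs) {
            patterns.add(Pattern.compile(toRegex(glob)));
        }
    }

    // Translate the glob to a regex: "**" -> any characters, "*" -> any characters except '/'.
    static String toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            if (glob.startsWith("**", i)) {
                regex.append(".*");
                i += 2;
            } else if (glob.charAt(i) == '*') {
                regex.append("[^/]*");
                i += 1;
            } else {
                // Quote literal characters so '.', '?', and similar are not regex operators.
                regex.append(Pattern.quote(String.valueOf(glob.charAt(i))));
                i += 1;
            }
        }
        return regex.toString();
    }

    // A path is public only if some pattern matches it in full.
    boolean isPublic(String path) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

Requiring a full match matters: a prefix-style check on /actuator/health would also expose /actuator/healthz or anything else that happens to start with the same characters.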
The mutate().headers(h -> h.remove(...)) call must happen before you add the validated values.
Offload any blocking call to Schedulers.boundedElastic() to avoid blocking the Netty event loop, which causes latency spikes under load.
Circuit Breaker Filter and Resilience Patterns
The CircuitBreaker filter integrates Resilience4j circuit breaker logic at the gateway level. When a downstream service begins failing or responding slowly, the circuit opens and requests are immediately routed to a fallback URI, preventing slow calls from piling up and exhausting the gateway's connections and memory.
The fallback URI can be a local gateway endpoint (forward:/fallback/orders) that returns a cached response, a default error response, or even a redirect to a maintenance page. For read-heavy endpoints, the fallback can serve stale cached data from Redis, giving users a degraded but functional experience instead of an error.
Timeout configuration is a separate concern from circuit breaking but works alongside it. The gateway's HttpClient timeout (spring.cloud.gateway.httpclient.response-timeout) applies to all routes. Per-route timeouts override this through route metadata (the response-timeout and connect-timeout keys). Set timeouts aggressively: a 30-second timeout means a slow downstream can hold gateway connections for 30 seconds per request, quickly exhausting the connection pool.
Resilience4j's sliding window configuration deserves careful tuning. COUNT_BASED uses the last N calls; TIME_BASED uses calls in the last N seconds. For low-traffic services, COUNT_BASED is more responsive because TIME_BASED windows may not have enough samples to make accurate decisions. The failure rate threshold (default 50%) means half your traffic must fail before the circuit opens — in production, lower this to 30-40% for critical services.
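The COUNT_BASED decision can be modelled with a ring buffer of the last N outcomes. The sketch below covers only the closed-state question "should the circuit open now?", using the same three inputs discussed above (window size, minimum number of calls, failure rate threshold); it omits the half-open state and slow-call tracking, and CountBasedWindowSketch is a hypothetical class, not Resilience4j code.

```java
final class CountBasedWindowSketch {
    private final boolean[] failed;        // ring buffer holding the last N outcomes
    private final int minimumCalls;        // no decision until this many calls are recorded
    private final double thresholdPercent; // e.g. 50.0 for the default threshold
    private long recorded;                 // total outcomes ever recorded
    private int failuresInWindow;          // failures currently inside the window

    CountBasedWindowSketch(int windowSize, int minimumCalls, double thresholdPercent) {
        this.failed = new boolean[windowSize];
        this.minimumCalls = minimumCalls;
        this.thresholdPercent = thresholdPercent;
    }

    void record(boolean callFailed) {
        int slot = (int) (recorded % failed.length);
        // Once the buffer is full, the slot being overwritten is the oldest outcome.
        if (recorded >= failed.length && failed[slot]) {
            failuresInWindow--;
        }
        failed[slot] = callFailed;
        if (callFailed) {
            failuresInWindow++;
        }
        recorded++;
    }

    int callsInWindow() {
        return (int) Math.min(recorded, failed.length);
    }

    double failureRatePercent() {
        int calls = callsInWindow();
        return calls == 0 ? 0.0 : 100.0 * failuresInWindow / calls;
    }

    // Open when enough calls were seen and the failure rate reaches the threshold.
    boolean shouldOpen() {
        return callsInWindow() >= minimumCalls && failureRatePercent() >= thresholdPercent;
    }
}
```

With a window of 10 and a 40% threshold, seven successes followed by three failures leave the circuit closed at 30%; one more failure pushes out the oldest success and opens it at 40%.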
Load Balancing with lb:// URI Scheme
The lb:// URI scheme in Spring Cloud Gateway integrates with Spring Cloud LoadBalancer to automatically resolve service names to physical instance addresses. When Gateway sees lb://order-service, it queries the service registry (Eureka, Consul, or Kubernetes) for available instances, applies the configured load balancing strategy, and forwards the request to the selected instance.
The default load balancer is RoundRobinLoadBalancer. For sticky sessions (sending requests from the same client to the same instance), use a custom ServiceInstanceListSupplier. For canary deployments, use a WeightedServiceInstanceListSupplier that routes a percentage of traffic to new instances.
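Round-robin selection, and one simple way to get a weighted split out of it, can be sketched in plain Java. This models the selection idea only and makes no claim about how Spring Cloud LoadBalancer implements its suppliers; RoundRobinSketch, expandByWeight, and the instance names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinSketch {
    private final List<String> instances;
    private final AtomicInteger position = new AtomicInteger();

    RoundRobinSketch(List<String> instances) {
        this.instances = List.copyOf(instances);
    }

    // Pick the next instance in rotation; safe to call from multiple threads.
    String next() {
        // Mask the sign bit so the index stays non-negative after integer overflow.
        int pos = position.getAndIncrement() & Integer.MAX_VALUE;
        return instances.get(pos % instances.size());
    }

    // Repeat each instance by its weight so plain round-robin yields a weighted split.
    static List<String> expandByWeight(List<String> instances, List<Integer> weights) {
        List<String> expanded = new ArrayList<>();
        for (int i = 0; i < instances.size(); i++) {
            for (int w = 0; w < weights.get(i); w++) {
                expanded.add(instances.get(i));
            }
        }
        return expanded;
    }
}
```

Weights of 9 and 1 send one request in ten to the canary. This naive expansion delivers each instance's share in a run rather than interleaving it, which is acceptable for a sketch but worth smoothing in a real supplier.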
The connection pool to downstream services is managed by Reactor Netty's connection provider. Each unique host:port combination gets its own pool. Key settings: max-connections, pending-acquire-max-count (requests waiting for a connection), and connect-timeout. Reactor Netty's fixed pool defaults to 500 connections and 1000 pending acquisitions, but the effective limits depend on the configured pool type and library version, so verify them for your deployment. When the pending queue is full, or a request waits longer than the acquire timeout, the request fails with a pool acquisition exception (PoolAcquirePendingLimitException or PoolAcquireTimeoutException) instead of queuing indefinitely.
Healthy instance filtering is crucial: without it, the load balancer may route to instances that are running but unhealthy (DOWN in Actuator). Use ServiceInstanceListSupplier.builder().withDiscoveryClient().withHealthChecks().build(context), where context is the application context passed to your supplier bean method, to filter unhealthy instances before the load balancing algorithm selects one. This requires that your services expose /actuator/health and are registered in the service registry.
Global CORS, Logging, and Observability
CORS configuration at the Gateway eliminates the need for CORS config in every downstream service. Configure it once globally or per-route, and ensure all downstream services remove their CORS configuration to prevent duplicate headers. A response in which Access-Control-Allow-Origin appears twice fails the browser's CORS check, so the browser blocks it even though the request succeeded on the server. If a downstream service cannot be changed, the DedupeResponseHeader filter can remove the duplicates at the gateway.
Request logging for debugging and audit trails should be implemented as a Global Filter that logs before and after each proxied request. Include: request ID (generate one if not present), path, method, user ID (from JWT), response status, and duration. Structured JSON logging with these fields makes log aggregation and querying in Elasticsearch or CloudWatch straightforward.
Micrometer integration provides metrics for every route: spring.cloud.gateway.requests with tags including routeId, routeUri, outcome, and status. Export to Prometheus and create dashboards for: P50/P95/P99 latency per route, error rate per route, rate limiter rejection rate, and circuit breaker state changes. The first two come directly from the Gateway timer; the rejection rate can be derived from 429 responses, and circuit breaker state comes from Resilience4j's own Micrometer metrics. Together these four dashboards cover the most important gateway signals with little or no custom instrumentation.
Distributed tracing with Micrometer Tracing (Spring Boot 3.x) automatically propagates trace IDs through the gateway to downstream services via HTTP headers. Configure a 100% sampling rate for development and 1-5% for production. A head-based sampler decides before the response status is known, so to keep 100% of requests that return 5xx, use tail-based sampling, which is typically done in the tracing collector or backend.
Gateway OOM Crash During Traffic Spike Due to Response Caching Filter Misconfiguration
- Global filters in Spring Cloud Gateway apply to every request.
- Never apply body-buffering or transformation filters globally — scope them to specific routes.
- Always load test the gateway with realistic payload sizes and concurrency levels before traffic spikes.
Mono.error() in the KeyResolver causes all requests to be rate-limited. Check the replenish-rate and burst-capacity values; burst-capacity must be greater than or equal to replenish-rate.
curl -s http://gateway:8080/actuator/gateway/routes | python3 -m json.tool
curl -s http://gateway:8080/actuator/gateway/routefilters
Key takeaways
Use Schedulers.boundedElastic() for any synchronous I/O
Common mistakes to avoid
Applying body-buffering filters (ModifyResponseBodyFilter) globally
Forgetting to strip injected auth headers from incoming requests
Call request.mutate().headers(h -> h.remove("X-User-ID")) before adding validated values from the JWT; strip first, then add
Configuring Retry filter for POST/PUT/DELETE endpoints
Setting deny-empty-key=true (default) without Redis HA for rate limiting
Configuring CORS on both Gateway and downstream services
Not setting explicit route IDs
Blocking operations in Gateway filters on the event loop thread
Move them to Schedulers.boundedElastic() using subscribeOn(); keep event loop threads non-blocking
Interview Questions on This Topic
What is the difference between a GlobalFilter and a GatewayFilter in Spring Cloud Gateway?
Frequently Asked Questions