Performance: Stateless gateways scale horizontally behind any cloud LB — a 3-instance Kong cluster handles 50k+ RPS
Production trap: In-memory rate limiting on multiple gateway instances — each instance has its own counter, user gets 3x their limit
Circuit breakers at the gateway prevent cascade failures — failing service only degrades, not the whole system
Biggest mistake: Business logic in the gateway — if it would break when swapping gateway providers, it doesn't belong there
Plain-English First
Imagine a massive hotel. Guests don't wander into the kitchen, the laundry room, or the staff quarters — they talk to the front desk, and the front desk figures out who to send them to. An API Gateway is that front desk for your backend services. Every request from the outside world hits the gateway first, and the gateway decides where it goes, whether the person is allowed in, and how fast they can make requests.
Every time you open Uber and request a ride, your app doesn't talk to seventeen different backend services directly. It talks to one door — the API Gateway — and that gateway orchestrates the chaos behind the scenes. In a world where a single product might run on dozens of microservices, having a clean, single entry point isn't a luxury; it's what keeps the whole system from falling apart under real traffic.
But gateways are also dangerous. Put business logic in one and you've created a new monolith. Use in-memory rate limiting across two instances and your limits silently double. Misconfigure timeouts and a slow database takes down your entire checkout flow. The difference between a gateway that helps and one that hurts is understanding the components — not just copying a config from a tutorial.
By the end you'll know exactly what each component does, where they break in production, and how to design for scale. You'll walk away with patterns (BFF, aggregation, circuit breaking) that senior engineers use in interviews and real systems.
The Core Job of an API Gateway: One Door, Many Rooms
Before microservices became the norm, you had a monolith — one big application that handled everything. A client made one request, the app handled it, done. But once you split that monolith into ten, twenty, or fifty services (authentication, payments, user profiles, notifications…), clients suddenly need to know where everything lives. That's chaos.
An API Gateway solves this by being the single entry point for all client traffic. It receives every inbound request, applies a set of cross-cutting concerns (auth, rate limiting, logging), and then routes the request to the right downstream service. The client only ever needs to know one URL.
The key insight here is that the gateway isn't just a proxy that forwards traffic blindly. It actively transforms, validates, and enriches requests before they ever touch your services. Think of it as a bouncer, a receptionist, and a traffic cop rolled into one.
# Example: AWS API Gateway / Kong-style declarative configuration
# This shows how a gateway maps external routes to internal services
gateway:
  name: ecommerce-gateway
  base_url: https://api.shopexample.com
  routes:
    # Route 1: Public product catalog, no auth required
    - path: /products
      method: GET
      upstream_service: http://product-service:8001/api/products
      auth_required: false          # Public endpoint, anyone can browse
      rate_limit:
        requests_per_minute: 300    # Still throttled to prevent scraping
    # Route 2: Place an order, must be authenticated
    - path: /orders
      method: POST
      upstream_service: http://order-service:8002/api/orders
      auth_required: true           # Gateway checks JWT before forwarding
      rate_limit:
        requests_per_minute: 30     # Stricter limit on write operations
      timeout_ms: 5000              # Gateway cancels request after 5s
    # Route 3: User profile, auth + request transformation
    - path: /users/{userId}/profile
      method: GET
      upstream_service: http://user-service:8003/api/profile
      auth_required: true
      transform_request:
        add_header:
          X-Internal-Request-Id: "${generate_uuid}"   # Gateway injects trace ID
          X-Caller-Service: "api-gateway"             # Downstream knows origin
      rate_limit:
        requests_per_minute: 60
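To make the routing step concrete, here is a minimal sketch of how a gateway might match an incoming method and path against a route table like the one above. The `routes` array, `compilePath`, and `matchRoute` are illustrative names, not part of any real gateway's API, and production routers handle many more cases (trailing slashes, wildcards, priorities).

```javascript
// Hypothetical route table mirroring the config above (field names are illustrative).
const routes = [
  { method: 'GET', path: '/products', upstream: 'http://product-service:8001/api/products', authRequired: false },
  { method: 'POST', path: '/orders', upstream: 'http://order-service:8002/api/orders', authRequired: true },
  { method: 'GET', path: '/users/{userId}/profile', upstream: 'http://user-service:8003/api/profile', authRequired: true },
];

// Turn "/users/{userId}/profile" into a regex with one named group per placeholder.
function compilePath(template) {
  const pattern = template
    .split('/')
    .map((segment) => {
      const placeholder = segment.match(/^\{(\w+)\}$/);
      if (placeholder) return `(?<${placeholder[1]}>[^/]+)`;
      return segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // escape literal segments
    })
    .join('/');
  return new RegExp(`^${pattern}$`);
}

const compiledRoutes = routes.map((route) => ({ ...route, regex: compilePath(route.path) }));

// Return the first route whose method and path both match, or null if none does.
function matchRoute(method, path) {
  for (const route of compiledRoutes) {
    if (route.method !== method) continue;
    const match = route.regex.exec(path);
    if (match) return { route, params: { ...(match.groups || {}) } };
  }
  return null;
}

const hit = matchRoute('GET', '/users/42/profile');
console.log(hit.route.upstream, hit.params);
console.log(matchRoute('DELETE', '/orders')); // no route for this method
```

Compiling the templates once at startup keeps the per-request work to a handful of regex tests, which is one reason routing is rarely the bottleneck in a gateway.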
Why This Matters:
Notice that the Authorization header is stripped before reaching the downstream service in some patterns. Your internal services can then trust that any request arriving from the gateway is already authenticated — they don't each need to implement JWT validation themselves. This is called the 'trusted network' pattern and it cuts duplicated auth logic across every service.
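The trusted-header idea only holds if clients cannot set those headers themselves. The sketch below shows one way a gateway could build the header set it forwards: drop the raw `Authorization` header, drop any client-supplied `X-Auth-*` headers, then inject the verified identity. The function name `buildDownstreamHeaders` and the `x-auth-` prefix convention are assumptions for illustration.

```javascript
// Assumed naming convention: every identity header the gateway injects starts with this prefix.
const INTERNAL_HEADER_PREFIX = 'x-auth-';

// Build the headers forwarded to an internal service from the client's headers
// plus the identity the gateway has already verified.
function buildDownstreamHeaders(incomingHeaders, identity) {
  const outgoing = {};
  for (const [name, value] of Object.entries(incomingHeaders)) {
    const lower = name.toLowerCase();
    if (lower === 'authorization') continue;                 // raw token stays at the gateway
    if (lower.startsWith(INTERNAL_HEADER_PREFIX)) continue;  // drop spoofed identity headers
    outgoing[lower] = value;
  }
  outgoing['x-auth-user'] = identity.userId;
  if (identity.roles && identity.roles.length > 0) {
    outgoing['x-auth-roles'] = identity.roles.join(',');
  }
  return outgoing;
}

console.log(
  buildDownstreamHeaders(
    { Authorization: 'Bearer abc', 'X-Auth-User': 'admin', Accept: 'application/json' },
    { userId: 'user-99', roles: ['user'] }
  )
);
```

Header stripping alone does not stop a caller who can reach the internal services directly, so this pattern assumes those services are only reachable through the gateway.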
Production Insight
A single gateway instance is a single point of failure. Run at least two behind a cloud load balancer.
The gateway becomes the most critical piece of infrastructure the moment it's deployed.
Rule: Stateless gateways (all state in Redis) scale horizontally. Stateful gateways don't. Design for statelessness from day one.
Key Takeaway
An API Gateway is not just a proxy — it's a composition of five distinct components: router, auth layer, rate limiter, load balancer, and request/response transformer.
Knowing each one separately is what makes you credible in system design interviews and production incidents.
Rule: If your gateway config contains business logic (if userId == 'admin'), you're doing it wrong.
Gateway or No Gateway?
If: You have 1-3 services, all internal, low traffic
→ Skip the gateway. Use a simple load balancer (Nginx, HAProxy) + internal service discovery.
If: You have >3 services, multiple client types (web, mobile, third-party)
→ Deploy an API gateway. The cross-cutting concerns (auth, rate limiting, aggregation) will otherwise be duplicated per service.
If: You need per-client API customization (iOS vs Android vs Web)
→ Use the Backend for Frontend (BFF) pattern: a separate gateway per client type. Mobile gets lightweight payloads; web gets full data.
If: Your only concern is SSL termination and basic routing
→ Use Nginx or HAProxy. A full API gateway is overkill. Add AWS API Gateway or Kong only when you need auth or rate limiting.
If: Your team lacks operational experience with gateways
→ Use a managed gateway (AWS API Gateway, GCP Apigee). Building and operating your own Kong cluster is non-trivial.
The Five Components Every API Gateway Must Have
A gateway is more than a reverse proxy. It's a composition of distinct components, each with a specific job. Understanding each one separately is how you answer system design questions confidently — and how you avoid misconfiguring production systems.
1. Request Router — Maps incoming URLs and HTTP methods to upstream services. This is the core. Without routing, nothing works.
2. Authentication & Authorization Layer — Validates identity (AuthN) and checks permissions (AuthZ). The gateway is the ideal place for this because it's centralised. JWT validation, OAuth token introspection, API key checks — all happen here before requests go anywhere.
3. Rate Limiter & Throttler — Protects your services from being overwhelmed. Rate limiting says 'you get 100 requests per minute.' Throttling says 'requests beyond that get queued or slowed down, not just rejected.'
4. Load Balancer — When multiple instances of a service are running, the gateway distributes traffic across them. Round-robin, least-connections, and weighted routing are common strategies.
5. Request/Response Transformer — The gateway can reshape payloads in both directions. Strip sensitive fields from responses, add internal headers, translate between REST and gRPC, or aggregate responses from multiple services into one (the Backend for Frontend pattern).
// Simulating the middleware pipeline of an API Gateway in Node.js
// This shows HOW each component processes a request in sequence
// Real gateways (Kong, AWS API GW, Nginx) do this in compiled code,
// but the pipeline logic is identical.
const express = require('express');
const { v4: uuidv4 } = require('uuid');

const app = express();
app.use(express.json());

// ─── COMPONENT 1: Request Logger ────────────────────────────────────────────
// Every request gets a unique trace ID the moment it arrives
app.use((req, res, next) => {
  req.traceId = uuidv4(); // Unique ID for distributed tracing
  req.arrivalTime = Date.now();
  console.log(`[GATEWAY] Incoming: ${req.method} ${req.path} | traceId=${req.traceId}`);
  next();
});

// ─── COMPONENT 2: Authentication Layer ──────────────────────────────────────
// Validate the token ONCE here; downstream services don't need to
const PUBLIC_ROUTES = ['/products']; // Routes that skip auth

function authMiddleware(req, res, next) {
  if (PUBLIC_ROUTES.includes(req.path)) {
    return next(); // Skip auth for public routes
  }
  const authHeader = req.headers['authorization'];
  if (!authHeader || !authHeader.startsWith('Bearer ')) {
    return res.status(401).json({
      error: 'Authentication required',
      traceId: req.traceId // Always return traceId for debugging
    });
  }
  const token = authHeader.split(' ')[1];
  // In production: verify JWT signature, check expiry, decode claims
  // For demo: we accept any token that starts with 'valid-'
  if (!token.startsWith('valid-')) {
    return res.status(401).json({ error: 'Invalid token', traceId: req.traceId });
  }
  // Stand-in for the user ID a real JWT would carry in its claims
  req.authenticatedUserId = token.replace('valid-', '');
  // Set on the REQUEST (not the response) so the identity travels downstream when forwarded
  req.headers['x-auth-user'] = req.authenticatedUserId;
  next();
}
app.use(authMiddleware);

// ─── COMPONENT 3: Rate Limiter ───────────────────────────────────────────────
// Simple in-memory rate limiter (production: use Redis for distributed state)
const requestCounts = {};      // ip -> { count, windowStart }
const RATE_LIMIT = 5;          // Max 5 requests per window
const WINDOW_MS = 60 * 1000;   // 60-second window

function rateLimiter(req, res, next) {
  const clientIp = req.ip;
  const now = Date.now();
  if (!requestCounts[clientIp] || now - requestCounts[clientIp].windowStart > WINDOW_MS) {
    requestCounts[clientIp] = { count: 1, windowStart: now }; // Reset window
    return next();
  }
  requestCounts[clientIp].count++;
  if (requestCounts[clientIp].count > RATE_LIMIT) {
    const retryAfter = Math.ceil((WINDOW_MS - (now - requestCounts[clientIp].windowStart)) / 1000);
    res.setHeader('Retry-After', retryAfter); // Tell client when to retry
    return res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfterSeconds: retryAfter,
      traceId: req.traceId
    });
  }
  next();
}
app.use(rateLimiter);

// ─── COMPONENT 4: Router + Load Balancer ────────────────────────────────────
// Round-robin across multiple instances of a service
const productServiceInstances = [
  'http://product-service-1:8001',
  'http://product-service-2:8001',
  'http://product-service-3:8001'
];
let roundRobinIndex = 0;

function getNextProductInstance() {
  const instance = productServiceInstances[roundRobinIndex];
  roundRobinIndex = (roundRobinIndex + 1) % productServiceInstances.length; // Wrap around
  return instance;
}

// Public product route, no auth needed
app.get('/products', (req, res) => {
  const targetInstance = getNextProductInstance();
  console.log(`[GATEWAY] Routing GET /products -> ${targetInstance} | traceId=${req.traceId}`);
  // In production: forward the actual HTTP request using axios/node-fetch
  // For demo: simulate the downstream response
  res.json({
    _meta: { routedTo: targetInstance, traceId: req.traceId },
    products: [{ id: 1, name: 'Wireless Headphones', price: 79.99 }]
  });
});

// Protected order route
app.post('/orders', (req, res) => {
  console.log(`[GATEWAY] Routing POST /orders for user=${req.authenticatedUserId} | traceId=${req.traceId}`);

  // ─── COMPONENT 5: Request Transformer ─────────────────────────────────────
  // Strip the raw Authorization header; order-service trusts X-Auth-User instead
  const internalPayload = {
    ...req.body,
    requestedByUserId: req.authenticatedUserId, // Inject verified identity
    gatewayTraceId: req.traceId                 // Inject trace ID for observability
    // Note: we do NOT forward req.headers.authorization to internal services
  };
  console.log('[GATEWAY] Transformed payload for order-service:', internalPayload);
  res.status(201).json({
    message: 'Order created',
    orderId: `ORD-${Date.now()}`,
    traceId: req.traceId
  });
});

app.listen(3000, () => {
  console.log('[GATEWAY] API Gateway running on port 3000');
});
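The auth middleware above accepts any token that starts with 'valid-'. As a rough illustration of what "verify JWT signature, check expiry, decode claims" involves, here is a sketch of HS256 verification using only Node's built-in `crypto` module. The helper names `signHs256` and `verifyHs256` are hypothetical, it assumes a Node version that supports the `base64url` encoding, and it covers only the HS256 algorithm and the `exp` claim. In a real gateway you would normally rely on a maintained JWT library rather than hand-rolled code.

```javascript
const crypto = require('crypto');

function base64UrlJson(value) {
  return Buffer.from(JSON.stringify(value)).toString('base64url');
}

// Demo-only signer so the example has a token to verify;
// real tokens are issued by an identity provider.
function signHs256(payload, secret) {
  const signingInput = `${base64UrlJson({ alg: 'HS256', typ: 'JWT' })}.${base64UrlJson(payload)}`;
  const signature = crypto.createHmac('sha256', secret).update(signingInput).digest('base64url');
  return `${signingInput}.${signature}`;
}

// Verify structure, algorithm, signature, and expiry. Returns { valid, claims } or { valid, reason }.
function verifyHs256(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split('.');
  if (parts.length !== 3) return { valid: false, reason: 'malformed' };
  const [headerB64, payloadB64, signatureB64] = parts;

  let header;
  let payload;
  try {
    header = JSON.parse(Buffer.from(headerB64, 'base64url').toString('utf8'));
    payload = JSON.parse(Buffer.from(payloadB64, 'base64url').toString('utf8'));
  } catch {
    return { valid: false, reason: 'malformed' };
  }
  if (!header || !payload || typeof payload !== 'object') return { valid: false, reason: 'malformed' };
  if (header.alg !== 'HS256') return { valid: false, reason: 'unsupported alg' };

  // Recompute the signature and compare in constant time.
  const expected = crypto.createHmac('sha256', secret).update(`${headerB64}.${payloadB64}`).digest();
  const actual = Buffer.from(signatureB64, 'base64url');
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    return { valid: false, reason: 'bad signature' };
  }
  if (typeof payload.exp === 'number' && payload.exp <= nowSeconds) {
    return { valid: false, reason: 'expired' };
  }
  return { valid: true, claims: payload };
}

const demoToken = signHs256({ sub: 'user-99', exp: Math.floor(Date.now() / 1000) + 60 }, 'demo-secret');
console.log(verifyHs256(demoToken, 'demo-secret'));
console.log(verifyHs256(demoToken, 'wrong-secret'));
```

Pinning the accepted algorithm (here, rejecting anything other than HS256) matters as much as the signature check itself, because the `alg` field comes from the untrusted token.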
Watch Out: In-Memory Rate Limiting Breaks at Scale
The in-memory rate limiter above works perfectly on a single gateway instance. The moment you run two gateway instances (which you will in production for high availability), each instance has its own counter — so a user effectively gets double their limit. Always back your rate limiter with a shared store like Redis using atomic INCR + EXPIRE commands. This is one of the most common production bugs teams hit after their first gateway scaling event.
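As a sketch of the shared-store approach, the limiter below keeps one counter per user per minute in a store that every gateway instance talks to. `FakeSharedStore` is a hypothetical in-memory stand-in whose two methods mimic the semantics of Redis INCR and EXPIRE so the example runs without a Redis server; `createRateLimiter` is likewise an illustrative name.

```javascript
// In-memory stand-in for a shared store. incr/expire mimic Redis INCR and EXPIRE;
// the class itself is hypothetical and exists only so the demo is self-contained.
class FakeSharedStore {
  constructor() {
    this.entries = new Map();
  }
  async incr(key) {
    const now = Date.now();
    const entry = this.entries.get(key);
    if (!entry || (entry.expiresAt !== null && entry.expiresAt <= now)) {
      this.entries.set(key, { value: 1, expiresAt: null });
      return 1;
    }
    entry.value += 1;
    return entry.value;
  }
  async expire(key, seconds) {
    const entry = this.entries.get(key);
    if (entry) entry.expiresAt = Date.now() + seconds * 1000;
  }
}

// Fixed-window limiter: one counter per user per minute, held in the shared store.
function createRateLimiter(store, limitPerMinute) {
  return async function isAllowed(userId, nowMs = Date.now()) {
    const minuteWindow = Math.floor(nowMs / 60000);
    const key = `rate_limit:${userId}:${minuteWindow}`;
    const count = await store.incr(key);
    if (count === 1) await store.expire(key, 60); // first hit in this window sets the TTL
    return count <= limitPerMinute;
  };
}

// Two "gateway instances" sharing one store enforce a single combined limit.
async function demo() {
  const store = new FakeSharedStore();
  const instanceA = createRateLimiter(store, 100);
  const instanceB = createRateLimiter(store, 100);
  let allowed = 0;
  for (let i = 0; i < 150; i++) {
    const limiter = i % 2 === 0 ? instanceA : instanceB; // alternate like a load balancer
    if (await limiter('user-99', 0)) allowed++;          // fixed timestamp keeps one window
  }
  return allowed;
}

demo().then((allowed) => console.log(`allowed ${allowed} of 150 requests`)); // prints "allowed 100 of 150 requests"
```

Because the increment and the expiry are two separate calls here, a gateway crash between them could leave a counter without a TTL. Wrapping both steps in a single server-side script closes that gap.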
Production Insight
The in-memory rate limiter in the code above will fail silently in production the moment you scale to two gateway instances.
Each instance tracks its own counters independently, so a user hitting both instances gets double the intended limit.
Rule: All gateway state must live in a shared external store (Redis). Stateless gateways are horizontally scalable; stateful gateways are not.
Key Takeaway
Authenticate once at the gateway, not in every service. Downstream services should trust identity headers the gateway injects (X-Auth-User).
Rate limiter state must live in a shared store (Redis) the moment you have more than one gateway instance.
Rule: In-memory rate limiting works perfectly in development and fails silently in production. Don't learn this the hard way.
Rate Limiter Storage Decision
If: Single gateway instance, development or low-traffic production
→ In-memory rate limiter is acceptable. Use a per-process map (a Map in Node.js, a ConcurrentHashMap in Java) with a sliding window. No external dependency.
If: Multiple gateway instances (HA or scaling), any production traffic
→ Use Redis with atomic INCR + EXPIRE operations. Key format: rate_limit:{userId}:{minute_window} with TTL=60 seconds.
If: Very high throughput (>100k RPS), and Redis as a single point of failure is not acceptable
→ Use Redis Cluster to shard counters across nodes, with a Lua script for increment-and-check atomicity. Redis Cluster fails over to replicas on its own; Redis Sentinel provides failover for a non-clustered primary/replica setup.
If: Rate limiting must be exact, not approximate, and no Redis available
→ Use consistent hashing on the user ID to pin each user to one gateway instance. Each user always hits the same instance. Trade-off: uneven load distribution.
If: You need advanced features (sliding windows, burst handling, per-route limits)
→ Use an existing library (token bucket, leaky bucket) with a Redis backend. Implement sliding window counters using sorted sets.
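For the burst-handling case, a token bucket is a common choice: the bucket holds up to `capacity` tokens, refills at a steady rate, and each request spends one token. The class below is a minimal single-process sketch with an injectable clock (the `nowMs` parameters are an assumption made for testability). With more than one gateway instance, the token count and last-refill timestamp would have to live in the shared store instead.

```javascript
class TokenBucket {
  constructor(capacity, refillPerSecond, nowMs = Date.now()) {
    this.capacity = capacity;               // maximum burst size
    this.refillPerSecond = refillPerSecond; // sustained request rate
    this.tokens = capacity;                 // start full
    this.lastRefillMs = nowMs;
  }

  // Returns true and spends one token if the request is allowed, false otherwise.
  tryConsume(nowMs = Date.now()) {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// A burst of 5 is allowed immediately; after that, one request per second.
const bucket = new TokenBucket(5, 1, 0);
const burst = [];
for (let i = 0; i < 6; i++) burst.push(bucket.tryConsume(0));
console.log(burst);                    // [ true, true, true, true, true, false ]
console.log(bucket.tryConsume(2000));  // true (two tokens refilled after 2 seconds)
```

Compared with a fixed window, the bucket separates the burst size from the sustained rate, so a client can spike briefly without being able to exceed the long-run limit.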
Gateway Patterns You'll Actually Use: BFF, Aggregation, and Circuit Breaking
Knowing the components is step one. Knowing the patterns built on top of them is what separates a junior engineer from someone who can design systems confidently.
Backend for Frontend (BFF) — Mobile apps and web apps have different data needs. A mobile screen might need a simplified user profile summary; the web dashboard needs the full version with activity history. Instead of having clients make multiple calls or your services maintain multiple response shapes, you create a dedicated gateway layer per frontend. Each BFF cherry-picks and reshapes data for its specific client.
Request Aggregation — Some UI screens need data from three services: user info, recent orders, and loyalty points. Without aggregation, the client makes three serial or parallel calls. With the gateway aggregating, the client makes one call and the gateway fans out to all three services, merges the responses, and returns a single payload. Latency drops dramatically.
Circuit Breaker at the Gateway — If the inventory service is down, you don't want every request piling up and timing out at 5 seconds each. A circuit breaker tracks failure rates and 'opens' — immediately rejecting requests to a failing service with a fallback response — until the service recovers. The gateway is the perfect place to implement this because it's the chokepoint for all traffic.
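The BFF idea described above can be sketched as a per-client response shaper. The profile fields, the client type names, and `shapeProfileFor` are all hypothetical; a real BFF would usually be a separate deployable per client rather than a lookup table, but the reshaping step looks much the same.

```javascript
// Hypothetical full profile as a user-service might return it.
const fullProfile = {
  userId: 'user-99',
  name: 'Amara Osei',
  tier: 'Gold',
  email: 'amara@example.com',
  activityHistory: [
    { at: '2024-01-01T10:00:00Z', action: 'login' },
    { at: '2024-01-02T12:30:00Z', action: 'order_placed' },
  ],
};

// One shaping function per client type. Mobile gets a summary, web gets everything.
const shapers = new Map([
  ['mobile', (profile) => ({ userId: profile.userId, name: profile.name, tier: profile.tier })],
  ['web', (profile) => ({ ...profile })],
]);

function shapeProfileFor(clientType, profile) {
  const shaper = shapers.get(clientType);
  if (!shaper) throw new Error(`Unknown client type: ${clientType}`);
  return shaper(profile);
}

console.log(shapeProfileFor('mobile', fullProfile));
console.log(Object.keys(shapeProfileFor('web', fullProfile)));
```

Keeping the shaping in the BFF means the user-service returns one canonical shape and never needs to know which client is asking.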
// Demonstrates two advanced gateway patterns:
// 1. Response Aggregation (fan-out to multiple services, merge results)
// 2. Circuit Breaker (fail fast instead of cascading timeouts)

// ─── Circuit Breaker State Machine ──────────────────────────────────────────
// States: CLOSED (normal) -> OPEN (failing) -> HALF_OPEN (testing recovery)
const CircuitState = { CLOSED: 'CLOSED', OPEN: 'OPEN', HALF_OPEN: 'HALF_OPEN' };

class CircuitBreaker {
  constructor(serviceName, failureThreshold = 3, recoveryTimeoutMs = 10000) {
    this.serviceName = serviceName;
    this.failureThreshold = failureThreshold; // Open circuit after 3 failures
    this.recoveryTimeout = recoveryTimeoutMs; // Try again after 10 seconds
    this.state = CircuitState.CLOSED;
    this.failureCount = 0;
    this.lastFailureTime = null;
  }

  async call(serviceCallFn) {
    // If circuit is OPEN, check if recovery timeout has passed
    if (this.state === CircuitState.OPEN) {
      const timeSinceFailure = Date.now() - this.lastFailureTime;
      if (timeSinceFailure < this.recoveryTimeout) {
        // Still in open state: fail fast without calling the service
        console.log(`[CIRCUIT] ${this.serviceName} is OPEN, fast failing`);
        throw new Error(`${this.serviceName} circuit is open, service unavailable`);
      }
      // Timeout elapsed: try one probe request
      console.log(`[CIRCUIT] ${this.serviceName} moving to HALF_OPEN, sending probe`);
      this.state = CircuitState.HALF_OPEN;
    }
    try {
      const result = await serviceCallFn(); // Attempt the actual service call
      this.onSuccess();                     // Reset on success
      return result;
    } catch (error) {
      this.onFailure();                     // Track failure
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = CircuitState.CLOSED; // Restore normal operation
    console.log(`[CIRCUIT] ${this.serviceName}: circuit CLOSED (healthy)`);
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold || this.state === CircuitState.HALF_OPEN) {
      this.state = CircuitState.OPEN; // Trip the circuit
      console.log(`[CIRCUIT] ${this.serviceName}: circuit OPENED after ${this.failureCount} failures`);
    }
  }
}

// ─── Simulated Downstream Services ──────────────────────────────────────────
let inventoryCallCount = 0;

async function fetchUserProfile(userId) {
  // Simulates user-service responding successfully
  return { userId, name: 'Amara Osei', tier: 'Gold' };
}

async function fetchRecentOrders(userId) {
  // Simulates order-service responding successfully
  return { userId, orders: [{ orderId: 'ORD-001', total: 149.99, status: 'Shipped' }] };
}

async function fetchInventoryStatus(productId) {
  // Simulates a flaky inventory service that fails intermittently
  inventoryCallCount++;
  if (inventoryCallCount <= 3) { // First 3 calls fail
    throw new Error('inventory-service: connection refused');
  }
  return { productId, inStock: true, quantity: 47 };
}

// ─── Circuit Breakers (one per downstream service) ───────────────────────────
const inventoryCircuit = new CircuitBreaker('inventory-service', 3, 5000);

// ─── Gateway Aggregation Handler ────────────────────────────────────────────
// Client makes ONE call to /dashboard; the gateway fans out to 3 services
async function handleDashboardRequest(userId, productId) {
  console.log(`\n[GATEWAY] Dashboard request for userId=${userId}, productId=${productId}`);

  // Fan out: run user profile and orders in PARALLEL (faster than serial)
  const [userProfile, recentOrders] = await Promise.all([
    fetchUserProfile(userId),
    fetchRecentOrders(userId)
  ]);

  // Inventory goes through the circuit breaker; it's a non-critical enhancement
  let inventoryData = null;
  try {
    inventoryData = await inventoryCircuit.call(() => fetchInventoryStatus(productId));
  } catch (circuitError) {
    // GRACEFUL DEGRADATION: dashboard still works without inventory data
    console.log(`[GATEWAY] Inventory unavailable, degraded response: ${circuitError.message}`);
    inventoryData = { productId, inStock: null, message: 'Inventory temporarily unavailable' };
  }

  // Aggregate all responses into one payload for the client
  return {
    user: userProfile,
    orders: recentOrders,
    inventory: inventoryData
  };
}

// ─── Simulate 5 consecutive dashboard requests ──────────────────────────────
// Requests 1-3 hit the failing service and trip the circuit; requests 4-5 are
// rejected fast because the 5-second recovery timeout has not elapsed yet.
(async () => {
  for (let requestNum = 1; requestNum <= 5; requestNum++) {
    const response = await handleDashboardRequest('user-99', 'product-headphones');
    console.log(`[GATEWAY] Response aggregated:`, JSON.stringify(response.inventory));
  }
})();
Interview Gold: Graceful Degradation vs. Failure
The circuit breaker above returns a degraded response (inventory null) instead of failing the entire dashboard request. This is called graceful degradation — a core resilience principle. Interviewers love asking 'what happens when service X goes down?' The answer they want is: the gateway catches it at the circuit breaker level, returns a partial response with sensible defaults, and protects the rest of the system from the cascade. Saying 'the whole request fails' is the wrong answer.
Production Insight
A circuit breaker in the OPEN state rejects calls to the failing service immediately, which lets the gateway degrade gracefully; a breaker that stays CLOSED keeps passing requests to a failing service and can cascade the failure.
Without aggregation, a dashboard that needs data from 3 services makes 3 round trips. With aggregation, the client makes 1 round trip.
Rule: Use BFF when mobile and web have different data needs. Use aggregation when a client needs data from multiple services to render one screen.
Key Takeaway
Circuit breakers belong at the gateway, not just in service-to-service calls. When a downstream service is degraded, fail fast and return a partial response.
Aggregation reduces client complexity and latency. The gateway becomes the orchestrator, not just a proxy.
Rule: A dashboard that makes 5 separate calls from the client is a performance problem. Move aggregation to the gateway.
Gateway Pattern Selection
If: Mobile and web clients need different data shapes from the same services
→ Use Backend for Frontend (BFF). Deploy separate gateway instances per client type. Mobile BFF returns lightweight payloads; web BFF returns full data.
If: Client needs data from 3+ services to render a single screen (dashboard, profile, home feed)
→ Use Request Aggregation. Gateway fans out to all services in parallel and merges responses. Client makes 1 call instead of 3+.
If: A non-critical downstream service failing should not break the entire request
→ Use Circuit Breaker + Graceful Degradation. Return default or cached data for the failing service. Mark it as degraded in the response.
If: Service is critical (payments, auth) and must not be called if already failing
→ Use a Circuit Breaker that fails fast: when the circuit is OPEN, reject the request with an error. Do NOT fall back to degraded data for critical services.
If: You need to combine the above patterns (BFF + Aggregation + Circuit Breaker)
→ Implement middleware that applies patterns in order: BFF selector (per client), Aggregation fan-out, Circuit Breaker per downstream call.
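One way to combine parallel fan-out with graceful degradation is `Promise.allSettled` plus a per-call timeout, so a slow or failing service costs the client one field rather than the whole response. The sketch below is a minimal illustration: `withTimeout`, `aggregate`, the 50 ms budget, and the service names are all assumptions, and it does not cancel the slow call (that would need something like an `AbortController` passed to the HTTP client).

```javascript
// Reject if the promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// `calls` maps a field name to a function returning a promise.
// Every call runs in parallel; failures become null and are listed in `degraded`.
async function aggregate(calls, timeoutMs) {
  const names = Object.keys(calls);
  const settled = await Promise.allSettled(
    names.map((name) =>
      withTimeout(Promise.resolve().then(() => calls[name]()), timeoutMs, name)
    )
  );
  const data = {};
  const degraded = [];
  settled.forEach((result, index) => {
    const name = names[index];
    if (result.status === 'fulfilled') {
      data[name] = result.value;
    } else {
      data[name] = null;
      degraded.push(name);
    }
  });
  return { data, degraded };
}

// Hypothetical downstream calls: two healthy, one too slow for the 50ms budget.
aggregate(
  {
    user: async () => ({ userId: 'user-99', tier: 'Gold' }),
    orders: async () => [{ orderId: 'ORD-001' }],
    loyalty: () => new Promise((resolve) => setTimeout(() => resolve({ points: 1200 }), 200)),
  },
  50
).then((response) => console.log(JSON.stringify(response)));
```

Returning the `degraded` list alongside the data lets the client decide how to render missing sections instead of guessing why a field is null.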
Production incident · Post-mortem · Severity: high
The Double Rate Limit That Killed Checkout
Symptom
Inventory sold out 75% faster than expected. Angry customers tweeted that the promotion was a 'scam' — they couldn't check out fast enough. The team's monitoring showed each gateway instance counted requests correctly, but per-user rate limiting was double the intended value.
Assumption
The team assumed rate limit state was shared because they'd configured the same limit value (100 per minute) in each gateway's config file. They didn't know each gateway instance maintains its own in-memory counter. The load balancer distributed requests across instances, so a user could send up to 100 requests per minute to instance A and another 100 to instance B without being blocked: 100 per instance, 200 in total.
Root cause
Rate limiting was implemented as a simple in-memory sliding window counter; no shared store such as Redis was used. Each gateway instance tracked its own HashMap<userId, requestCount>. Production deployed two instances for high availability. A user making 150 requests per minute would hit instance A for 80 and instance B for 70, never exceeding either instance's 100 limit. The intended limit of 100 per user was effectively doubled to 200 per user. This is a classic distributed-state failure: the system worked correctly per instance but failed at the aggregate level.
Fix
Moved rate limit state to a shared Redis cluster with atomic INCR and EXPIRE commands. Each request now increments a Redis key rate_limit:{userId}:{minute_window} with TTL=60. The gateway checks the Redis value before forwarding. Added a Lua script to increment and check in one atomic operation to eliminate race conditions between instances. After the fix, a user's 101st request was correctly rejected regardless of which instance received it, even at 50k RPS.
Key lesson
In-memory rate limiting only works with one gateway instance. The moment you run two (which you must for HA), each instance has its own counter and users get effectively multiplied limits.
Any state shared across gateway instances must live in an external store — Redis is the industry standard. Use atomic operations (INCR + EXPIRE) or Lua scripts.
Load test your rate limiter with multiple gateway instances before a major sale. Simulate a single user's traffic across all instances using a consistent hash ring or a test harness.
Monitor per-user rate limit rejections at the load balancer level, not per gateway instance. A rejection rate of 0 per instance can hide aggregate overages.
Production debug guide: Symptom → Action mapping for common gateway failures
Symptom · 01
Rate limits not applying correctly — some users get through way more requests than allowed
→
Fix
Check number of gateway instances. If >1, rate limit state is likely in-memory per instance. Verify rate limiter uses a shared Redis or equivalent. Look for Map<userId, count> in code — that's the smoking gun. Switch to Redis with atomic INCR operations.
Symptom · 02
Requests timing out sporadically — some succeed, some fail with 504 Gateway Timeout
→
Fix
Check downstream service health and gateway timeout settings. Most likely a slow service is absorbing all gateway threads. Implement circuit breaker that fails fast after 3 consecutive failures. Set timeout_ms per route shorter than client timeout so gateway fails fast, not slow.
Symptom · 03
Tracing fails — you have logs across services but can't correlate a single request
→
Fix
Gateway is not generating or propagating trace IDs. Add middleware that generates a UUID on every incoming request and injects it as X-Trace-Id and X-Request-Id headers to all downstream calls. Each service must log that ID. Now you can grep across all logs.
Symptom · 04
Authentication works inconsistently — some requests to same endpoint succeed, others fail with 401
→
Fix
Multiple gateway instances may have different JWT validation configurations or cached public keys. Verify all instances have identical config. Check if JWT is being stripped before reaching auth layer. Add X-Auth-User header after validation so downstream services trust it.
Symptom · 05
Circuit breaker stuck in OPEN state — service is healthy but gateway still fast-failing
→
Fix
Circuit breaker recovery timeout may be too long. When circuit is HALF_OPEN, check that the probe request is actually reaching the service and succeeding. Some circuit breakers require successful probe before closing; others require manual reset. Implement half-open retry logic.
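For the tracing symptom above, the sketch below shows one way a gateway could generate or continue a trace and produce the headers to forward. It emits both a W3C `traceparent` header (version, 32-hex trace ID, 16-hex parent ID, 2-hex flags) and the `X-Trace-Id` header mentioned in the fix. The function name `ensureTraceContext` and the choice to mark new traces as sampled are assumptions.

```javascript
const crypto = require('crypto');

// Shape of a version-00 traceparent header: 00-<32 hex>-<16 hex>-<2 hex>
const TRACEPARENT_PATTERN = /^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$/;

// Continue the caller's trace if a valid traceparent arrived, otherwise start a new one.
function ensureTraceContext(incomingHeaders) {
  const incoming = incomingHeaders['traceparent'];
  let traceId = null;
  let flags = '01'; // assumption: new traces are marked as sampled
  if (typeof incoming === 'string' && TRACEPARENT_PATTERN.test(incoming)) {
    const [, incomingTraceId, , incomingFlags] = incoming.split('-');
    if (!/^0+$/.test(incomingTraceId)) { // an all-zero trace ID is not valid
      traceId = incomingTraceId;
      flags = incomingFlags;
    }
  }
  if (traceId === null) traceId = crypto.randomBytes(16).toString('hex');
  const spanId = crypto.randomBytes(8).toString('hex'); // new ID for the gateway hop

  return {
    traceId,
    headers: {
      traceparent: `00-${traceId}-${spanId}-${flags}`,
      'x-trace-id': traceId,
    },
  };
}

console.log(ensureTraceContext({}).headers);
console.log(
  ensureTraceContext({
    traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01',
  }).headers
);
```

Merging these headers into every downstream call, and logging `traceId` in each service, is what makes a single grep across all logs possible.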
API Gateway Quick Debug Cheat Sheet: Fast diagnostics for production gateway issues. Run these commands to confirm the root cause before changing config.
Rate limiter not working as expected: users exceeding limits
Immediate action
Check if multiple gateway instances are running and rate limit state sharing
If gateway instances > 1 and Redis keys are missing, rate limiter is in-memory. Deploy Redis and switch to atomic INCR+EXPIRE in the gateway. If Redis keys exist but limits still wrong, check TTL and window alignment.
504 Gateway Timeout errors: some requests succeed, some time out
Immediate action
Check downstream service latency and gateway thread pool utilization
Increase timeout_ms per route if downstream is slow. Add circuit breaker to fail fast after 3 timeouts. Increase gateway thread pool size if connections saturated.
No trace IDs in downstream logs: can't correlate requests
Immediate action
Verify gateway is generating and propagating trace headers
Add middleware that generates UUID on each request. Inject as X-Trace-Id header to downstream services. Configure log formatter to include this header value in every log line.
Circuit breaker open but service is healthy
Immediate action
Check circuit breaker state and health probe configuration
Force close circuit breaker via admin API if service is confirmed healthy. Adjust failureThreshold from 3 to 5 to reduce false positives. Set recoveryTimeoutMs to 30000 (30 seconds).
Authentication passes on some instances, fails on others
Immediate action
Check JWT validation consistency across gateway instances
Commands
for pod in $(kubectl get pods -l app=gateway -o name); do kubectl exec $pod -- cat /etc/gateway/jwt-config.json; done
Mount JWT public key from a shared ConfigMap. Disable per-instance caching of validation results. Use stateless JWT validation (no shared cache needed).
API Gateway vs Reverse Proxy
Aspect | API Gateway | Simple Reverse Proxy (e.g. Nginx)
Primary role | Cross-cutting concerns + smart routing | Traffic forwarding + SSL termination
Authentication | Built-in JWT/OAuth/API key validation | Not natively: requires plugins or custom Lua
Rate limiting | Per-client, per-route, configurable policies | Basic: IP-based, limited granularity
Request transformation | Yes: reshape payloads, add/strip headers | Limited: mostly header manipulation
Service aggregation (BFF) | Yes: fan out and merge multiple service calls | No: 1:1 proxy only
Circuit breaking | Varies by product: built in for some gateways (e.g. Kong), absent in others | No: must use NGINX Plus or an external sidecar
Developer portal / API docs | Yes: most managed gateways include this | No
Operational complexity | Higher: another stateful layer to manage | Lower: battle-hardened, config is simple
Best for | Microservices with many clients and policies | Simple routing, static content, TLS offload
Key takeaways
1. An API Gateway is not just a proxy: it's a composition of five distinct components: router, auth layer, rate limiter, load balancer, and request/response transformer. Knowing each one separately is what makes you credible in system design interviews.
2. Authenticate once at the gateway, not in every service. Downstream services should trust the identity headers the gateway injects (X-Auth-User) rather than re-validate tokens themselves. This eliminates duplicated auth logic across your entire microservice fleet.
3. Circuit breakers belong at the gateway, not just in service-to-service calls. When a downstream service is degraded, fail fast and return a partial (gracefully degraded) response. Letting a slow service absorb all your gateway threads causes cascading failures that are much harder to debug.
4. Rate limiter state must live in a shared store (Redis) the moment you have more than one gateway instance. In-memory rate limiting works perfectly in development and fails silently in production, because each instance tracks its own counters independently.
5. BFF (Backend for Frontend) is the evolution of the single gateway: separate gateway instances per client type (mobile, web, third-party). Each tailors responses to its client, reducing latency and payload size at the cost of operational complexity.
Common mistakes to avoid
×
Putting business logic inside the gateway
Symptom
Gateway config contains conditional logic about order values, user tiers, or product categories. Deploying a config change takes 10 minutes. You're afraid to update it. The gateway has become a new monolith.
Fix
The gateway should only handle infrastructure concerns (auth, routing, rate limiting, transformation). Any logic that touches your domain model belongs in a service. A good rule: if the logic would break if you swapped to a different gateway provider, it doesn't belong in the gateway.
×
Using a single gateway instance without a fallback
Symptom
Your gateway goes down during a deploy, and your entire product is offline. The load balancer has no other healthy target. Customers see 503 Service Unavailable.
Fix
Run at minimum two gateway instances behind a cloud load balancer (AWS ALB, GCP LB) in different availability zones. The gateway is now the most critical piece of your infrastructure — treat it like one. Stateless gateway design (externalising state to Redis) makes horizontal scaling trivial.
×
Not propagating trace IDs end-to-end
Symptom
A request fails in production. You have logs in five different services but can't correlate them. Each service logs different IDs. Debugging takes 2 hours instead of 2 minutes.
Fix
The gateway must generate a unique traceId (UUID or W3C traceparent header) on every incoming request and inject it into every downstream call as a header. Each service logs that ID with every log line. Now you can grep a single ID across all services and reconstruct the full request journey in seconds.
×
Setting gateway timeouts shorter than downstream latency
Symptom
Requests occasionally fail with 504 Gateway Timeout. Downstream service logs show the request succeeded after 3 seconds, but the gateway timed out at 2 seconds.
Fix
The gateway timeout for a route must exceed the downstream processing time plus network latency, including the time spent on any retries. Set timeout_ms per route to at least 2x the 99th percentile latency of the downstream service, and configure retries with exponential backoff.
×
Stripping authentication headers before downstream validation
Symptom
Gateway validates JWT, strips Authorization header, then forwards request. Downstream service implements its own JWT validation but the header is gone. Request fails with 401 Unauthorized.
Fix
After validation, inject a trusted identity header like X-Auth-User: userId and optionally X-Auth-Roles: admin,user. Downstream services should trust these headers if they come from the gateway's trusted network. Use mTLS between gateway and services to prevent spoofing.
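The header-related fixes above (a gateway-generated trace ID and trusted identity headers) can be sketched as one transformation step. This is an illustrative sketch: the function name `build_downstream_headers` is hypothetical, the header names follow the text, and dropping the raw Authorization header assumes downstream services do not validate tokens themselves.

```python
import uuid
from typing import Dict, List

# Headers the gateway owns. Client-supplied copies are discarded so they cannot be spoofed.
# Dropping Authorization is an assumption: keep it if downstream services still validate tokens.
STRIPPED = {"authorization", "x-auth-user", "x-auth-roles", "x-trace-id"}


def build_downstream_headers(incoming: Dict[str, str], user_id: str,
                             roles: List[str]) -> Dict[str, str]:
    """Return the headers to forward once the gateway has validated the token."""
    headers = {name: value for name, value in incoming.items()
               if name.lower() not in STRIPPED}
    headers["X-Trace-Id"] = str(uuid.uuid4())  # one ID per incoming request
    headers["X-Auth-User"] = user_id
    headers["X-Auth-Roles"] = ",".join(roles)
    return headers
```

Removing any client-supplied `X-Auth-User` before injecting the real one matters as much as the injection itself; otherwise a caller could claim any identity. mTLS between gateway and services covers the case where a caller bypasses the gateway entirely.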
Interview Questions on This Topic
Q01 of 04 · SENIOR
How does an API Gateway differ from a Service Mesh like Istio, and when would you choose one over the other?
ANSWER
An API Gateway operates at Layer 7 (HTTP/HTTPS) as an ingress point for external clients. It handles north-south traffic: browser, mobile app, third-party API calls to your system. Service mesh (Istio, Linkerd) operates as a sidecar proxy on every pod, handling east-west traffic between internal services. Service mesh provides service discovery, mTLS encryption, circuit breaking, and observability for internal communication. Choose API Gateway when: you need authentication, rate limiting, or request transformation for external clients; or you want a single entry point for all client traffic. Choose Service Mesh when: you have many internal services needing mTLS; you want fine-grained retry/timeout policies between services; or you need traffic splitting for canary deployments. The two are complementary — you can (and should) use both: an API Gateway for north-south traffic and a service mesh for east-west traffic.
Q02 of 04 · SENIOR
Walk me through what happens — component by component — when an unauthenticated user hits POST /checkout on a system with an API Gateway in front of five microservices.
ANSWER
The request hits the gateway's load balancer, which picks one healthy gateway instance.
Step 1 (Request Logger): the gateway generates a traceId and logs the incoming request.
Step 2 (Authentication Layer): the gateway checks the Authorization header, finds none, and immediately returns 401 Unauthorized with the traceId.
No other components run because auth failed early. This is the 'fail fast' principle. The request never touches the checkout service or any other downstream service, saving resources and making the failure obvious.
If the user had a valid token, the flow would continue:
Step 3 (Rate Limiter): checks Redis for the user's request count, increments it if under the limit, and rejects with 429 if over.
Step 4 (Router): maps POST /checkout to checkout-service:8004.
Step 5 (Load Balancer): picks an instance of checkout-service (round-robin).
Step 6 (Request Transformer): injects X-Auth-User and X-Trace-Id, and strips the original Authorization header.
Step 7 (Circuit Breaker): checks whether the circuit for checkout-service is open; if healthy, forwards the request; if open, returns 503 with a graceful degradation message.
Step 8 (Response Transformer): on success, strips internal headers, formats the response for the client, and adds the traceId.
The client receives a consistent response shape regardless of which service handled the request.
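The fail-fast ordering in this answer can be sketched as a list of stages, where any stage may end the request early. This is a simplified illustration: only the auth and routing stages are shown, the stage functions and `run_pipeline` are hypothetical names, and token verification is replaced by a placeholder.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Request = Dict[str, Any]
Response = Tuple[int, str]
# A stage returns a Response to stop the pipeline, or None to let the request continue.
Stage = Callable[[Request], Optional[Response]]

ROUTES = {("POST", "/checkout"): "checkout-service:8004"}  # route taken from the answer above


def authenticate(req: Request) -> Optional[Response]:
    if "Authorization" not in req["headers"]:
        return 401, "Unauthorized"
    req["user"] = "user-42"  # placeholder: a real gateway would verify the token here
    return None


def route(req: Request) -> Optional[Response]:
    target = ROUTES.get((req["method"], req["path"]))
    if target is None:
        return 404, "Not Found"
    req["target"] = target
    return None


def run_pipeline(req: Request, stages: List[Stage]) -> Tuple[Response, List[str]]:
    """Run stages in order and stop at the first one that produces a response."""
    ran: List[str] = []
    for stage in stages:
        ran.append(stage.__name__)
        early = stage(req)
        if early is not None:
            return early, ran
    return (200, f"forwarded to {req.get('target')}"), ran
```

Calling `run_pipeline` with an empty `headers` dict returns the 401 after running only `authenticate`, so the router and everything behind it never see the request.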
Q03 of 04 · SENIOR
If your rate limiter is deployed across three gateway instances and a user is allowed 100 requests per minute, how do you prevent them from actually making 300 requests per minute — and what trade-offs does your solution involve?
ANSWER
The correct solution is a shared external store, typically Redis with atomic INCR operations. Key format: rate_limit:{userId}:{minute_window} with TTL=60 seconds. Each request does: count = INCR(rate_limit_key); if count == 1: EXPIRE(key, 60); if count > 100: reject. INCR itself is atomic, so counts stay correct across all instances. The INCR and EXPIRE pair is not atomic, so run both in a Lua script or a MULTI/EXEC transaction if a crash between the two calls must not leave a key without a TTL. Trade-offs: (1) Latency: each request adds a Redis round trip (~1-2ms). (2) Availability: Redis becomes a single point of failure, so use Redis Sentinel or Cluster for HA. (3) Consistency: a fixed window lets a user burst up to 2x the limit across a window boundary, and strict sliding windows require sorted sets (ZADD + ZREMRANGEBYSCORE), which are more complex. Alternative approaches: token buckets in Redis, or sticky routing via consistent hashing at the load balancer so a user always hits the same gateway instance, though this causes uneven load distribution. The Redis approach is the common choice; 1-2ms of added latency is acceptable for most workloads.
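The fixed-window counter described in this answer can be sketched as follows. To keep the example runnable without a Redis server, it uses a hypothetical `InMemoryStore` stand-in that exposes only `incr` and `expire`; a redis-py client offers methods with the same names and could be passed in its place. The `allow_request` name and its parameters are assumptions for illustration.

```python
import time
from typing import Dict, Optional


class InMemoryStore:
    """Stand-in for a shared Redis instance (records TTLs but does not enforce them)."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.ttls: Dict[str, int] = {}

    def incr(self, name: str) -> int:
        self.counts[name] = self.counts.get(name, 0) + 1
        return self.counts[name]

    def expire(self, name: str, seconds: int) -> bool:
        self.ttls[name] = seconds
        return True


def allow_request(store, user_id: str, limit: int = 100, window_s: int = 60,
                  now: Optional[float] = None) -> bool:
    """Fixed-window check: True if this request is within the user's limit."""
    now = time.time() if now is None else now
    window = int(now // window_s)
    key = f"rate_limit:{user_id}:{window}"
    count = store.incr(key)
    if count == 1:
        # First hit in this window sets the TTL. The two calls are separate round trips
        # here; a Lua script or transaction would make the pair atomic.
        store.expire(key, window_s)
    return count <= limit
```

Because every gateway instance calls the same store, 300 requests spread across three instances still produce exactly 100 allowed requests in a window, not 300.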
Q04 of 04 · SENIOR
What is the Backend for Frontend (BFF) pattern, and when would you use it instead of a single shared gateway?
ANSWER
BFF means deploying separate gateway instances per client type: one for mobile (iOS/Android), one for web browsers, one for third-party APIs, and so on. Each BFF tailors responses to its specific client's needs. A mobile BFF returns lightweight payloads with only essential fields, aggregates multiple calls into one, and may use a compact binary format instead of JSON. A web BFF returns richer data, includes metadata for debugging, and may support partial responses. Use BFF instead of a single gateway when: mobile and web have fundamentally different data needs; you need to version APIs per client; or one client (e.g., mobile) is more sensitive to latency/payload size than others. The trade-off is operational complexity: you now maintain multiple gateway codebases or configs, and routing becomes more complex. BFF is increasingly common once clients diverge, because a single shared gateway can become a bottleneck or force compromises that hurt both mobile and web.
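As a rough illustration of per-client tailoring, the sketch below trims one backend record differently for mobile and web. The field list and the `shape_for_client` function are hypothetical; a real BFF would derive them from what each client's screens actually need.

```python
from typing import Any, Dict

# Hypothetical set of fields the mobile client renders.
MOBILE_FIELDS = ("id", "name", "price")


def shape_for_client(product: Dict[str, Any], client: str) -> Dict[str, Any]:
    """Return the slice of a backend record appropriate for the given client type."""
    if client == "mobile":
        # Mobile gets only the essential fields to keep the payload small.
        return {field: product[field] for field in MOBILE_FIELDS if field in product}
    # Web (and anything else) gets the full record in this sketch.
    return dict(product)
```

Keeping this shaping in a per-client layer is what lets the mobile payload shrink without forcing the web client to give up fields it uses.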
FAQ · 4 QUESTIONS
Frequently Asked Questions
01
What is the difference between an API Gateway and a load balancer?
A load balancer distributes traffic across identical instances of the same service — it doesn't understand what's in the request. An API Gateway routes traffic to different services based on the request path, method, or headers, and applies cross-cutting concerns like auth and rate limiting along the way. In practice, you use both: a cloud load balancer distributes traffic across multiple gateway instances, and the gateway then routes to the right downstream services.
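The distinction can be sketched in a few lines: a load balancer picks the next instance without looking at the request, while a gateway inspects the path to choose a service. Both functions below are illustrative assumptions (round-robin selection and longest-prefix path matching), not how any specific product implements them.

```python
import itertools
from typing import Callable, Dict, List, Optional


def make_round_robin(instances: List[str]) -> Callable[[], str]:
    """Load balancer view: rotate through identical instances, ignoring request content."""
    cycle = itertools.cycle(instances)
    return lambda: next(cycle)


def route_by_path(path: str, routes: Dict[str, str]) -> Optional[str]:
    """Gateway view: choose a service from the request path (longest matching prefix wins)."""
    matches = [prefix for prefix in routes
               if path == prefix or path.startswith(prefix.rstrip("/") + "/")]
    if not matches:
        return None
    return routes[max(matches, key=len)]
```

In the combined setup the answer describes, the round-robin step would sit in front of the gateway instances, and each gateway instance would then apply the path-based step to pick a downstream service.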
02
Should I build my own API Gateway or use a managed solution like AWS API Gateway or Kong?
For almost every team, use an existing gateway rather than building one, whether fully managed (AWS API Gateway, Apigee) or an off-the-shelf product such as Kong that you can self-host or buy as a managed service. Building and operating your own gateway means owning TLS termination, rate limit state management, auth plugin security, and high availability, none of which is your core product. Existing gateways provide these as tested, configurable features. Roll your own only if you have extremely specific requirements that no existing solution meets, and even then, treat it as a long-term maintenance commitment.
03
Can the API Gateway become a single point of failure?
Yes — which is exactly why you always run multiple gateway instances behind a cloud load balancer, distributed across availability zones. The gateway itself should be stateless (externalise all state like rate limit counters to Redis), so any instance can handle any request. Health checks and automatic instance replacement handle individual failures. The real risk isn't the gateway going down — it's misconfiguring it, which is why change management and staged rollouts for gateway config are critical.
04
How do you handle authentication across multiple gateway instances?
JWT authentication is stateless by design: each gateway instance validates the JWT signature using a shared public key, so no shared state is needed. OAuth token introspection may require calling the authorization server; cache introspection results in Redis for a short TTL (for example 5 minutes, and never past the token's expiry) to avoid overloading the auth server. For API keys, store the key-to-user mapping in Redis with a TTL. Avoid per-instance in-memory caches for auth decisions when running multiple instances, because stale entries on one instance can cause inconsistent auth results.