Mid-level 3 min · March 05, 2026

API Gateway — Why 2 Instances Doubled Our Rate Limits

Per-instance rate limiters silently doubled user limits during our flash sale.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • API Gateway = single entry point for all clients, routing requests to downstream services while applying cross-cutting concerns
  • Five core components: Router (maps URL to service), Auth layer (JWT/OAuth validation), Rate limiter (protects services), Load balancer (distributes traffic), Request transformer (reshapes payloads)
  • Performance: Stateless gateways scale horizontally behind any cloud LB — a 3-instance Kong cluster handles 50k+ RPS
  • Production trap: In-memory rate limiting on multiple gateway instances. Each instance has its own counter, so a user gets up to N times their limit across N instances (2x with two instances)
  • Circuit breakers at the gateway prevent cascade failures — failing service only degrades, not the whole system
  • Biggest mistake: Business logic in the gateway — if it would break when swapping gateway providers, it doesn't belong there
Plain-English First

Imagine a massive hotel. Guests don't wander into the kitchen, the laundry room, or the staff quarters — they talk to the front desk, and the front desk figures out who to send them to. An API Gateway is that front desk for your backend services. Every request from the outside world hits the gateway first, and the gateway decides where it goes, whether the person is allowed in, and how fast they can make requests.

Every time you open Uber and request a ride, your app doesn't talk to seventeen different backend services directly. It talks to one door — the API Gateway — and that gateway orchestrates the chaos behind the scenes. In a world where a single product might run on dozens of microservices, having a clean, single entry point isn't a luxury; it's what keeps the whole system from falling apart under real traffic.

But gateways are also dangerous. Put business logic in one and you've created a new monolith. Use in-memory rate limiting across two instances and your limits silently double. Misconfigure timeouts and a slow database takes down your entire checkout flow. The difference between a gateway that helps and one that hurts is understanding the components — not just copying a config from a tutorial.

By the end you'll know exactly what each component does, where they break in production, and how to design for scale. You'll walk away with patterns (BFF, aggregation, circuit breaking) that senior engineers use in interviews and real systems.

The Core Job of an API Gateway: One Door, Many Rooms

Before microservices became the norm, you had a monolith — one big application that handled everything. A client made one request, the app handled it, done. But once you split that monolith into ten, twenty, or fifty services (authentication, payments, user profiles, notifications…), clients suddenly need to know where everything lives. That's chaos.

An API Gateway solves this by being the single entry point for all client traffic. It receives every inbound request, applies a set of cross-cutting concerns (auth, rate limiting, logging), and then routes the request to the right downstream service. The client only ever needs to know one URL.

The key insight here is that the gateway isn't just a proxy that forwards traffic blindly. It actively transforms, validates, and enriches requests before they ever touch your services. Think of it as a bouncer, a receptionist, and a traffic cop rolled into one.

io/thecodeforge/gateway/api-gateway-config.yaml (YAML)
# Example: illustrative declarative configuration in the style of Kong or AWS API Gateway
# (simplified for teaching; not the exact schema of either product)
# This shows how a gateway maps external routes to internal services

gateway:
  name: ecommerce-gateway
  base_url: https://api.shopexample.com

routes:
  # Route 1: Public product catalog — no auth required
  - path: /products
    method: GET
    upstream_service: http://product-service:8001/api/products
    auth_required: false          # Public endpoint, anyone can browse
    rate_limit:
      requests_per_minute: 300    # Still throttled to prevent scraping

  # Route 2: Place an order — must be authenticated
  - path: /orders
    method: POST
    upstream_service: http://order-service:8002/api/orders
    auth_required: true           # Gateway checks JWT before forwarding
    rate_limit:
      requests_per_minute: 30     # Stricter limit on write operations
    timeout_ms: 5000              # Gateway cancels request after 5s

  # Route 3: User profile — auth + request transformation
  - path: /users/{userId}/profile
    method: GET
    upstream_service: http://user-service:8003/api/profile
    auth_required: true
    transform_request:
      add_header:
        X-Internal-Request-Id: "${generate_uuid}"   # Gateway injects trace ID
        X-Caller-Service: "api-gateway"              # Downstream knows origin
    rate_limit:
      requests_per_minute: 60
Why This Matters:
In some setups, the gateway strips the Authorization header after validating it and forwards only a verified identity header instead. Your internal services can then trust that any request arriving from the gateway is already authenticated, so they don't each need to implement JWT validation themselves. This is called the 'trusted network' pattern and it cuts duplicated auth logic across every service. It only holds if internal services cannot be reached except through the gateway.
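To make "validate the token once at the gateway" concrete, here is a minimal sketch of HS256 token verification using only Node's built-in crypto module. The helper names (encodeSegment, signToken, verifyToken), the secret, and the claim names are assumptions for illustration, not part of any gateway product. In a real deployment you would normally rely on the gateway's JWT plugin or a vetted library rather than hand-rolled code.

```javascript
const crypto = require('crypto');

// Hypothetical helper: encode a JSON object as a base64url JWT segment.
function encodeSegment(obj) {
  return Buffer.from(JSON.stringify(obj)).toString('base64url');
}

// Hypothetical helper used only to mint demo tokens for this sketch.
function signToken(claims, secret) {
  const signingInput = `${encodeSegment({ alg: 'HS256', typ: 'JWT' })}.${encodeSegment(claims)}`;
  const signature = crypto.createHmac('sha256', secret).update(signingInput).digest('base64url');
  return `${signingInput}.${signature}`;
}

// Returns the decoded claims if the token is well-formed, correctly signed
// with HS256, and not expired. Returns null in every other case.
function verifyToken(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, signatureB64] = parts;

  const expected = crypto.createHmac('sha256', secret)
    .update(`${headerB64}.${payloadB64}`)
    .digest();
  const actual = Buffer.from(signatureB64, 'base64url');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    return null;
  }

  try {
    const header = JSON.parse(Buffer.from(headerB64, 'base64url').toString('utf8'));
    const claims = JSON.parse(Buffer.from(payloadB64, 'base64url').toString('utf8'));
    if (header.alg !== 'HS256') return null;                       // Only the expected algorithm
    if (typeof claims.exp === 'number' && claims.exp <= nowSeconds) return null; // Expired
    return claims;
  } catch {
    return null;                                                   // Malformed JSON segment
  }
}
```

A gateway auth middleware could call verifyToken once per request and, on success, forward only a verified identity header to the upstream service.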
Production Insight
A single gateway instance is a single point of failure. Run at least two behind a cloud load balancer.
The gateway becomes the most critical piece of infrastructure the moment it's deployed.
Rule: Stateless gateways (all state in Redis) scale horizontally. Stateful gateways don't. Design for statelessness from day one.
Key Takeaway
An API Gateway is not just a proxy — it's a composition of five distinct components: router, auth layer, rate limiter, load balancer, and request/response transformer.
Knowing each one separately is what makes you credible in system design interviews and production incidents.
Rule: If your gateway config contains business logic (if userId == 'admin'), you're doing it wrong.
Gateway or No Gateway?
If: You have 1-3 services, all internal, low traffic
Then: Skip the gateway. Use a simple load balancer (Nginx, HAProxy) + internal service discovery.
If: You have >3 services, multiple client types (web, mobile, third-party)
Then: Deploy an API gateway. The cross-cutting concerns (auth, rate limiting, aggregation) will otherwise be duplicated per service.
If: You need per-client API customization (iOS vs Android vs Web)
Then: Use the Backend for Frontend (BFF) pattern: a separate gateway per client type. Mobile gets lightweight payloads; web gets full data.
If: Your only concern is SSL termination and basic routing
Then: Use Nginx or HAProxy. A full API gateway is overkill. Add AWS API Gateway or Kong only when you need auth or rate limiting.
If: Your team lacks operational experience with gateways
Then: Use a managed gateway (AWS API Gateway, GCP Apigee). Building and operating your own Kong cluster is non-trivial.

The Five Components Every API Gateway Must Have

A gateway is more than a reverse proxy. It's a composition of distinct components, each with a specific job. Understanding each one separately is how you answer system design questions confidently — and how you avoid misconfiguring production systems.

1. Request Router — Maps incoming URLs and HTTP methods to upstream services. This is the core. Without routing, nothing works.

2. Authentication & Authorization Layer — Validates identity (AuthN) and checks permissions (AuthZ). The gateway is the ideal place for this because it's centralised. JWT validation, OAuth token introspection, API key checks — all happen here before requests go anywhere.

3. Rate Limiter & Throttler — Protects your services from being overwhelmed. Rate limiting says 'you get 100 requests per minute.' Throttling says 'requests beyond that get queued or slowed down, not just rejected.'

4. Load Balancer — When multiple instances of a service are running, the gateway distributes traffic across them. Round-robin, least-connections, and weighted routing are common strategies.

5. Request/Response Transformer: The gateway can reshape payloads in both directions. Strip sensitive fields from responses, add internal headers, translate between REST and gRPC, or aggregate responses from multiple services into one (request aggregation, often combined with the Backend for Frontend pattern).
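Of the load-balancing strategies named in the list above, least-connections is the one that needs per-instance bookkeeping. The following is a minimal sketch of that idea; the class name LeastConnectionsBalancer and its acquire/release methods are assumptions made for illustration, and a real gateway tracks in-flight requests inside its proxy layer.

```javascript
// Hypothetical in-process tracker: one in-flight counter per upstream instance.
class LeastConnectionsBalancer {
  constructor(instanceUrls) {
    if (instanceUrls.length === 0) {
      throw new Error('At least one upstream instance is required');
    }
    this.active = new Map(instanceUrls.map((url) => [url, 0]));
  }

  // Pick the instance with the fewest in-flight requests.
  // On a tie, the instance listed first wins.
  acquire() {
    let chosen = null;
    let fewest = Infinity;
    for (const [url, count] of this.active) {
      if (count < fewest) {
        chosen = url;
        fewest = count;
      }
    }
    this.active.set(chosen, fewest + 1);
    return chosen;
  }

  // Call when the proxied request finishes, whether it succeeded or failed.
  release(url) {
    const count = this.active.get(url);
    if (count > 0) this.active.set(url, count - 1);
  }
}
```

Compared with round-robin, this favors instances that finish requests quickly, which helps when some requests are much slower than others.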

io/thecodeforge/gateway/gateway-middleware-pipeline.js (JavaScript)
// Simulating the middleware pipeline of an API Gateway in Node.js
// This shows HOW each component processes a request in sequence
// Real gateways (Kong, AWS API GW, Nginx) do this in their own optimized runtimes,
// but the pipeline logic follows the same idea.

const express = require('express');
const { v4: uuidv4 } = require('uuid');
const app = express();

app.use(express.json());

// ─── COMPONENT 1: Request Logger ────────────────────────────────────────────
// Every request gets a unique trace ID the moment it arrives
app.use((req, res, next) => {
  req.traceId = uuidv4();                         // Unique ID for distributed tracing
  req.arrivalTime = Date.now();
  console.log(`[GATEWAY] Incoming: ${req.method} ${req.path} | traceId=${req.traceId}`);
  next();
});

// ─── COMPONENT 2: Authentication Layer ──────────────────────────────────────
// Validate the token ONCE here — downstream services don't need to
const PUBLIC_ROUTES = ['/products'];              // Routes that skip auth

function authMiddleware(req, res, next) {
  if (PUBLIC_ROUTES.includes(req.path)) {
    return next();                                // Skip auth for public routes
  }

  const authHeader = req.headers['authorization'];
  if (!authHeader || !authHeader.startsWith('Bearer ')) {
    return res.status(401).json({
      error: 'Authentication required',
      traceId: req.traceId                        // Always return traceId for debugging
    });
  }

  const token = authHeader.split(' ')[1];
  // In production: verify JWT signature, check expiry, decode claims
  // For demo: we accept any token that starts with 'valid-'
  if (!token.startsWith('valid-')) {
    return res.status(401).json({ error: 'Invalid token', traceId: req.traceId });
  }

  req.authenticatedUserId = token.replace('valid-', '');  // Extracted from JWT claims
  req.headers['x-auth-user'] = req.authenticatedUserId;   // Attach identity to the request that gets forwarded downstream (not to the client response)
  next();
}

app.use(authMiddleware);

// ─── COMPONENT 3: Rate Limiter ───────────────────────────────────────────────
// Simple in-memory rate limiter (production: use Redis for distributed state)
const requestCounts = {};                         // ip -> { count, windowStart }
const RATE_LIMIT = 5;                             // Max 5 requests per window
const WINDOW_MS  = 60 * 1000;                     // 60-second window

function rateLimiter(req, res, next) {
  const clientIp = req.ip;
  const now = Date.now();

  if (!requestCounts[clientIp] || now - requestCounts[clientIp].windowStart > WINDOW_MS) {
    requestCounts[clientIp] = { count: 1, windowStart: now };  // Reset window
    return next();
  }

  requestCounts[clientIp].count++;

  if (requestCounts[clientIp].count > RATE_LIMIT) {
    const retryAfter = Math.ceil((WINDOW_MS - (now - requestCounts[clientIp].windowStart)) / 1000);
    res.setHeader('Retry-After', retryAfter);     // Tell client when to retry
    return res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfterSeconds: retryAfter,
      traceId: req.traceId
    });
  }

  next();
}

app.use(rateLimiter);

// ─── COMPONENT 4: Router + Load Balancer ────────────────────────────────────
// Round-robin across multiple instances of a service
const productServiceInstances = [
  'http://product-service-1:8001',
  'http://product-service-2:8001',
  'http://product-service-3:8001'
];
let roundRobinIndex = 0;

function getNextProductInstance() {
  const instance = productServiceInstances[roundRobinIndex];
  roundRobinIndex = (roundRobinIndex + 1) % productServiceInstances.length;  // Wrap around
  return instance;
}

// Public product route — no auth needed
app.get('/products', (req, res) => {
  const targetInstance = getNextProductInstance();
  console.log(`[GATEWAY] Routing GET /products -> ${targetInstance} | traceId=${req.traceId}`);

  // In production: forward the actual HTTP request using axios/node-fetch
  // For demo: simulate the downstream response
  res.json({
    _meta: { routedTo: targetInstance, traceId: req.traceId },
    products: [{ id: 1, name: 'Wireless Headphones', price: 79.99 }]
  });
});

// Protected order route
app.post('/orders', (req, res) => {
  console.log(`[GATEWAY] Routing POST /orders for user=${req.authenticatedUserId} | traceId=${req.traceId}`);

  // ─── COMPONENT 5: Request Transformer ─────────────────────────────────────
  // Strip the raw Authorization header — order-service trusts X-Auth-User instead
  const internalPayload = {
    ...req.body,
    requestedByUserId: req.authenticatedUserId,   // Inject verified identity
    gatewayTraceId: req.traceId                   // Inject trace ID for observability
    // Note: we do NOT forward req.headers.authorization to internal services
  };

  console.log('[GATEWAY] Transformed payload for order-service:', internalPayload);

  res.status(201).json({
    message: 'Order created',
    orderId: `ORD-${Date.now()}`,
    traceId: req.traceId
  });
});

app.listen(3000, () => {
  console.log('[GATEWAY] API Gateway running on port 3000');
});
Watch Out: In-Memory Rate Limiting Breaks at Scale
The in-memory rate limiter above works perfectly on a single gateway instance. The moment you run two gateway instances (which you will in production for high availability), each instance has its own counter, so a user can effectively get double their limit. Always back your rate limiter with a shared store like Redis, and make the increment-and-expire step atomic (for example with a Lua script or a MULTI/EXEC transaction), because INCR followed by a separate EXPIRE can leave a counter without a TTL if the gateway dies between the two calls. This is one of the most common production bugs teams hit after their first gateway scaling event.
Production Insight
The in-memory rate limiter in the code above will fail silently in production the moment you scale to two gateway instances.
Each instance tracks its own counters independently, so a user hitting both instances gets double the intended limit.
Rule: All gateway state must live in a shared external store (Redis). Stateless gateways are horizontally scalable; stateful gateways are not.
Key Takeaway
Authenticate once at the gateway, not in every service. Downstream services should trust identity headers the gateway injects (X-Auth-User).
Rate limiter state must live in a shared store (Redis) the moment you have more than one gateway instance.
Rule: In-memory rate limiting works perfectly in development and fails silently in production. Don't learn this the hard way.
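The failure is easy to reproduce without any infrastructure. The sketch below simulates two gateway instances behind a round-robin load balancer, first with a private counter per instance and then with one shared counter. The plain Map standing in for Redis, the helper names (makeLimiter, simulate), and the limit of 100 are assumptions for illustration only, and window expiry is left out to keep the arithmetic visible.

```javascript
const LIMIT = 100; // Intended per-user limit within one window

// A fixed-window counter. `store` is any Map-like object holding the counts.
function makeLimiter(store) {
  return function allow(userId) {
    const count = (store.get(userId) || 0) + 1;
    store.set(userId, count);
    return count <= LIMIT;
  };
}

// Send `total` requests from one user, alternating between gateway instances
// the way a round-robin load balancer would. Returns how many were allowed.
function simulate(instances, total) {
  let allowed = 0;
  for (let i = 0; i < total; i++) {
    if (instances[i % instances.length]('user-99')) allowed++;
  }
  return allowed;
}

// Case 1: each instance owns a private Map (the in-memory setup).
const perInstance = [makeLimiter(new Map()), makeLimiter(new Map())];

// Case 2: both instances share one store (standing in for Redis here).
const sharedCounts = new Map();
const sharedStore = [makeLimiter(sharedCounts), makeLimiter(sharedCounts)];

console.log(simulate(perInstance, 300)); // 200: each instance allows its own 100
console.log(simulate(sharedStore, 300)); // 100: the limit holds across instances
```

The limiter logic is identical in both cases. Only where the counter lives changes the outcome.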
Rate Limiter Storage Decision
If: Single gateway instance, development or low-traffic production
Then: An in-memory rate limiter is acceptable. Use an in-process map (a Map in Node.js, a ConcurrentHashMap in Java) with a sliding window. No external dependency.
If: Multiple gateway instances (HA or scaling), any production traffic
Then: Use Redis with an atomic increment-and-expire (INCR + EXPIRE inside a Lua script or transaction). Key format: rate_limit:{userId}:{minute_window} with TTL=60 seconds.
If: Very high throughput (>100k RPS), and a single Redis node as a single point of failure is not acceptable
Then: Use Redis Cluster, which shards keys across hash slots and promotes replicas on failure. Use a Lua script for increment-and-check atomicity. For a non-clustered deployment, Redis Sentinel provides failover instead.
If: Rate limiting must be exact, not approximate, and no Redis available
Then: Have the load balancer hash on user ID (consistent hashing) so each user always hits the same gateway instance. Trade-off: uneven load distribution.
If: You need advanced features (sliding windows, burst handling, per-route limits)
Then: Use an existing library (token bucket, leaky bucket) with a Redis backend. Sliding window counters can be implemented with sorted sets.
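Burst handling, mentioned in the last row above, is usually done with a token bucket. Here is a minimal single-process sketch of the algorithm; the class name TokenBucket and its parameters are assumptions for illustration. With more than one gateway instance the bucket state would have to live in a shared store, exactly as with the counters discussed earlier.

```javascript
// Token bucket: holds up to `capacity` tokens and refills at a steady rate.
// Each request consumes one token, so short bursts up to `capacity` pass
// while the long-run rate is capped at `refillPerSecond`.
class TokenBucket {
  constructor(capacity, refillPerSecond, nowMs = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;           // Start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes a token if one is available, otherwise false.
  tryConsume(nowMs = Date.now()) {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A bucket created with a capacity of 5 and a refill rate of 1 per second lets five requests through immediately, then roughly one per second after that.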

Gateway Patterns You'll Actually Use: BFF, Aggregation, and Circuit Breaking

Knowing the components is step one. Knowing the patterns built on top of them is what separates a junior engineer from someone who can design systems confidently.

Backend for Frontend (BFF) — Mobile apps and web apps have different data needs. A mobile screen might need a simplified user profile summary; the web dashboard needs the full version with activity history. Instead of having clients make multiple calls or your services maintain multiple response shapes, you create a dedicated gateway layer per frontend. Each BFF cherry-picks and reshapes data for its specific client.

Request Aggregation: Some UI screens need data from three services: user info, recent orders, and loyalty points. Without aggregation, the client makes three serial or parallel calls. With the gateway aggregating, the client makes one call and the gateway fans out to all three services, merges the responses, and returns a single payload. Latency typically drops because the fan-out happens over the fast internal network instead of as three separate round trips from the client.

Circuit Breaker at the Gateway — If the inventory service is down, you don't want every request piling up and timing out at 5 seconds each. A circuit breaker tracks failure rates and 'opens' — immediately rejecting requests to a failing service with a fallback response — until the service recovers. The gateway is the perfect place to implement this because it's the chokepoint for all traffic.
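The BFF idea is mostly about response shaping, which a small sketch can show. The client types ('mobile', 'web'), the field names, and the shapeForClient helper below are assumptions for illustration; in practice each BFF is usually its own deployable service rather than a lookup table.

```javascript
// One shaper per frontend. Each receives the full profile from the internal
// user service and returns only what that client needs.
const shapers = new Map([
  // Mobile gets a lightweight summary.
  ['mobile', (profile) => ({ id: profile.id, name: profile.name, tier: profile.tier })],
  // Web gets the full profile, copied so callers cannot mutate the original.
  ['web', (profile) => ({ ...profile })],
]);

function shapeForClient(clientType, profile) {
  const shaper = shapers.get(clientType);
  if (!shaper) {
    throw new Error(`Unknown client type: ${clientType}`);
  }
  return shaper(profile);
}

// Hypothetical full profile as returned by an internal user service.
const fullProfile = {
  id: 'user-99',
  name: 'Amara Osei',
  tier: 'Gold',
  activityHistory: [{ action: 'login', at: '2026-03-01T10:00:00Z' }],
};

console.log(shapeForClient('mobile', fullProfile)); // { id, name, tier } only
```

The internal user service keeps a single response shape, and each frontend still gets a payload sized for its screen.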

io/thecodeforge/gateway/gateway-aggregation-and-circuit-breaker.js (JavaScript)
// Demonstrates two advanced gateway patterns:
// 1. Response Aggregation (fan-out to multiple services, merge results)
// 2. Circuit Breaker (fail fast instead of cascading timeouts)

// ─── Circuit Breaker State Machine ──────────────────────────────────────────
// States: CLOSED (normal) -> OPEN (failing) -> HALF_OPEN (testing recovery)
const CircuitState = { CLOSED: 'CLOSED', OPEN: 'OPEN', HALF_OPEN: 'HALF_OPEN' };

class CircuitBreaker {
  constructor(serviceName, failureThreshold = 3, recoveryTimeoutMs = 10000) {
    this.serviceName       = serviceName;
    this.failureThreshold  = failureThreshold;  // Open circuit after 3 failures
    this.recoveryTimeout   = recoveryTimeoutMs; // Try again after 10 seconds
    this.state             = CircuitState.CLOSED;
    this.failureCount      = 0;
    this.lastFailureTime   = null;
  }

  async call(serviceCallFn) {
    // If circuit is OPEN, check if recovery timeout has passed
    if (this.state === CircuitState.OPEN) {
      const timeSinceFailure = Date.now() - this.lastFailureTime;
      if (timeSinceFailure < this.recoveryTimeout) {
        // Still in open state — fail fast without calling the service
        console.log(`[CIRCUIT] ${this.serviceName} is OPEN — fast failing`);
        throw new Error(`${this.serviceName} circuit is open — service unavailable`);
      }
      // Timeout elapsed — try one probe request
      console.log(`[CIRCUIT] ${this.serviceName} moving to HALF_OPEN — sending probe`);
      this.state = CircuitState.HALF_OPEN;
    }

    try {
      const result = await serviceCallFn();      // Attempt the actual service call
      this.onSuccess();                          // Reset on success
      return result;
    } catch (error) {
      this.onFailure();                          // Track failure
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = CircuitState.CLOSED;            // Restore normal operation
    console.log(`[CIRCUIT] ${this.serviceName} — circuit CLOSED (healthy)`);
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold || this.state === CircuitState.HALF_OPEN) {
      this.state = CircuitState.OPEN;            // Trip the circuit
      console.log(`[CIRCUIT] ${this.serviceName} — circuit OPENED after ${this.failureCount} failures`);
    }
  }
}

// ─── Simulated Downstream Services ──────────────────────────────────────────
let inventoryCallCount = 0;

async function fetchUserProfile(userId) {
  // Simulates user-service responding successfully
  return { userId, name: 'Amara Osei', tier: 'Gold' };
}

async function fetchRecentOrders(userId) {
  // Simulates order-service responding successfully
  return { userId, orders: [{ orderId: 'ORD-001', total: 149.99, status: 'Shipped' }] };
}

async function fetchInventoryStatus(productId) {
  // Simulates an inventory service that fails its first 3 calls, then recovers
  inventoryCallCount++;
  if (inventoryCallCount <= 3) {                 // First 3 calls fail
    throw new Error('inventory-service: connection refused');
  }
  return { productId, inStock: true, quantity: 47 };
}

// ─── Circuit Breakers (one per downstream service) ───────────────────────────
const inventoryCircuit = new CircuitBreaker('inventory-service', 3, 5000);

// ─── Gateway Aggregation Handler ────────────────────────────────────────────
// Client makes ONE call to /dashboard — gateway fans out to 3 services
async function handleDashboardRequest(userId, productId) {
  console.log(`\n[GATEWAY] Dashboard request for userId=${userId}, productId=${productId}`);

  // Fan out: run user profile and orders in PARALLEL (faster than serial)
  const [userProfile, recentOrders] = await Promise.all([
    fetchUserProfile(userId),
    fetchRecentOrders(userId)
  ]);

  // Inventory goes through circuit breaker — it's a non-critical enhancement
  let inventoryData = null;
  try {
    inventoryData = await inventoryCircuit.call(() => fetchInventoryStatus(productId));
  } catch (circuitError) {
    // GRACEFUL DEGRADATION: dashboard still works without inventory data
    console.log(`[GATEWAY] Inventory unavailable — degraded response: ${circuitError.message}`);
    inventoryData = { productId, inStock: null, message: 'Inventory temporarily unavailable' };
  }

  // Aggregate all responses into one payload for the client
  return {
    user:      userProfile,
    orders:    recentOrders,
    inventory: inventoryData
  };
}

// ─── Simulate 5 consecutive dashboard requests ──────────────────────────────
(async () => {
  for (let requestNum = 1; requestNum <= 5; requestNum++) {
    const response = await handleDashboardRequest('user-99', 'product-headphones');
    console.log(`[GATEWAY] Response aggregated:`, JSON.stringify(response.inventory));
  }
})();
Interview Gold: Graceful Degradation vs. Failure
The circuit breaker above returns a degraded response (inventory null) instead of failing the entire dashboard request. This is called graceful degradation — a core resilience principle. Interviewers love asking 'what happens when service X goes down?' The answer they want is: the gateway catches it at the circuit breaker level, returns a partial response with sensible defaults, and protects the rest of the system from the cascade. Saying 'the whole request fails' is the wrong answer.
Production Insight
An OPEN circuit rejects calls to the failing service immediately, which lets the gateway degrade gracefully; a circuit that stays CLOSED keeps passing requests to a failing service and can cascade the failure.
Without aggregation, a dashboard that needs data from 3 services makes 3 round trips. With aggregation, the client makes 1 round trip.
Rule: Use BFF when mobile and web have different data needs. Use aggregation when a client needs data from multiple services to render one screen.
Key Takeaway
Circuit breakers belong at the gateway, not just in service-to-service calls. When a downstream service is degraded, fail fast and return a partial response.
Aggregation reduces client complexity and latency. The gateway becomes the orchestrator, not just a proxy.
Rule: A dashboard that makes 5 separate calls from the client is a performance problem. Move aggregation to the gateway.
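One way to combine aggregation with partial responses is Promise.allSettled, which waits for every downstream call and reports each outcome separately. The sketch below assumes a hypothetical aggregate helper and a response field called degraded; neither comes from a specific gateway product.

```javascript
// Calls every fetcher in parallel and merges the results into one payload.
// `fetchers` maps a response field name to a zero-argument function that
// returns (or resolves to) that field's data. Failed calls become null and
// their names are listed under `degraded`, so the client can tell which
// parts of the screen are missing. A fetcher should not be named `degraded`.
async function aggregate(fetchers) {
  const names = Object.keys(fetchers);
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(names.map(async (name) => fetchers[name]()));

  const payload = { degraded: [] };
  settled.forEach((result, index) => {
    const name = names[index];
    if (result.status === 'fulfilled') {
      payload[name] = result.value;
    } else {
      payload[name] = null;
      payload.degraded.push(name);
    }
  });
  return payload;
}
```

Because all calls start at once, total latency is roughly that of the slowest downstream call, not the sum of all of them.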
Gateway Pattern Selection
If: Mobile and web clients need different data shapes from the same services
Then: Use Backend for Frontend (BFF). Deploy separate gateway instances per client type. Mobile BFF returns lightweight payloads; web BFF returns full data.
If: Client needs data from 3+ services to render a single screen (dashboard, profile, home feed)
Then: Use Request Aggregation. Gateway fans out to all services in parallel and merges responses. Client makes 1 call instead of 3+.
If: A non-critical downstream service failing should not break the entire request
Then: Use Circuit Breaker + Graceful Degradation. Return default or cached data for the failing service. Mark it as degraded in the response.
If: Service is critical (payments, auth) and must not be called if already failing
Then: Use a Circuit Breaker that rejects requests with an explicit error while the circuit is OPEN. Do NOT fall back to degraded data for critical services.
If: You need to combine the above patterns (BFF + Aggregation + Circuit Breaker)
Then: Implement middleware that applies patterns in order: BFF selector (per client), Aggregation fan-out, Circuit Breaker per downstream call.
Production incident · Post-mortem · Severity: high

The Double Rate Limit That Killed Checkout

Symptom
Inventory sold out 75% faster than expected. Angry customers tweeted that the promotion was a 'scam' — they couldn't check out fast enough. The team's monitoring showed each gateway instance counted requests correctly, but per-user rate limiting was double the intended value.
Assumption
The team assumed rate limit state was shared because they'd configured the same limit value (100 per minute) in each gateway's config file. They didn't know each gateway instance maintains its own in-memory counter. The load balancer distributed requests randomly across instances, so a user could send 100 requests to instance A and another 100 to instance B in the same minute without being blocked: 100 per instance, 200 total.
Root cause
Rate limiting was implemented as a simple in-memory sliding window counter; no shared store such as Redis was used. Each gateway instance tracked its own HashMap<userId, requestCount>. Production deployed two instances for high availability. A user making 150 requests per minute would hit instance A for 80 and instance B for 70, never exceeding either instance's 100 limit. The intended limit of 100 per user was effectively doubled to 200 per user. This is a classic partitioned-state failure rather than a race condition: the system worked correctly per instance but failed at the aggregate level.
Fix
Moved rate limit state to a shared Redis cluster with atomic INCR and EXPIRE commands. Each request now increments a Redis key rate_limit:{userId}:{minute_window} with TTL=60. The gateway checks the Redis value before forwarding. Added a Lua script to increment and check in one atomic operation to eliminate race conditions between instances. After the fix, a user's 101st request was correctly rejected regardless of which instance received it, even at 50k RPS.
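The increment-and-check step described in the fix can be sketched as follows. The Lua body uses the standard Redis INCR and EXPIRE commands; everything else is an assumption for illustration, in particular evalScript, a hypothetical adapter you would write around your Redis client's EVAL call, and the helper names windowKey and isAllowed.

```javascript
// Lua executed inside Redis: the increment and the TTL assignment run as one
// atomic step, so no other client can interleave between them.
const INCR_WITH_TTL_LUA = `
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
`;

// Key format from the fix above: rate_limit:{userId}:{minute_window}
function windowKey(userId, nowMs) {
  return `rate_limit:${userId}:${Math.floor(nowMs / 60000)}`;
}

// `evalScript(script, keys, args)` is a hypothetical adapter over the Redis
// client's EVAL command. It must resolve to the value the script returns.
async function isAllowed(evalScript, userId, limit, nowMs = Date.now()) {
  const count = await evalScript(INCR_WITH_TTL_LUA, [windowKey(userId, nowMs)], ['60']);
  return Number(count) <= limit;
}
```

Every gateway instance calls the same script against the same key, so the 101st request in a window is rejected no matter which instance receives it.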
Key lesson
  • In-memory rate limiting only works with one gateway instance. The moment you run two (which you must for HA), each instance has its own counter and users get effectively multiplied limits.
  • Any state shared across gateway instances must live in an external store — Redis is the industry standard. Use atomic operations (INCR + EXPIRE) or Lua scripts.
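  • Load test your rate limiter with multiple gateway instances before a major sale. Simulate a single user's traffic spread across all instances, for example with a test harness that sends requests to each instance in turn. (A consistent hash ring would pin the user to one instance and hide the bug.)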
  • Monitor per-user rate limit rejections at the load balancer level, not per gateway instance. A rejection rate of 0 per instance can hide aggregate overages.
Production debug guide: Symptom → Action mapping for common gateway failures
Symptom · 01
Rate limits not applying correctly — some users get through way more requests than allowed
Fix
Check number of gateway instances. If >1, rate limit state is likely in-memory per instance. Verify rate limiter uses a shared Redis or equivalent. Look for Map<userId, count> in code — that's the smoking gun. Switch to Redis with atomic INCR operations.
Symptom · 02
Requests timing out sporadically — some succeed, some fail with 504 Gateway Timeout
Fix
Check downstream service health and gateway timeout settings. Most likely a slow service is absorbing all gateway threads. Implement circuit breaker that fails fast after 3 consecutive failures. Set timeout_ms per route shorter than client timeout so gateway fails fast, not slow.
Symptom · 03
Tracing fails — you have logs across services but can't correlate a single request
Fix
Gateway is not generating or propagating trace IDs. Add middleware that generates a UUID on every incoming request and injects it as X-Trace-Id and X-Request-Id headers to all downstream calls. Each service must log that ID. Now you can grep across all logs.
Symptom · 04
Authentication works inconsistently — some requests to same endpoint succeed, others fail with 401
Fix
Multiple gateway instances may have different JWT validation configurations or cached public keys. Verify all instances have identical config. Check if JWT is being stripped before reaching auth layer. Add X-Auth-User header after validation so downstream services trust it.
Symptom · 05
Circuit breaker stuck in OPEN state — service is healthy but gateway still fast-failing
Fix
Circuit breaker recovery timeout may be too long. When circuit is HALF_OPEN, check that the probe request is actually reaching the service and succeeding. Some circuit breakers require successful probe before closing; others require manual reset. Implement half-open retry logic.
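The per-route timeout advice under Symptom 02 can be sketched with a small wrapper. The withTimeout helper below is an assumption for illustration, not a gateway API; it uses the AbortController that ships with current Node.js versions so the upstream call can be cancelled, not just abandoned.

```javascript
// Rejects with a timeout error if `work` does not settle within `timeoutMs`.
// `work` receives an AbortSignal so it can cancel the underlying request.
function withTimeout(work, timeoutMs) {
  const controller = new AbortController();
  let timer;

  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();   // Tell the upstream call to stop
      reject(new Error(`Upstream timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });

  // Promise.resolve().then(...) converts a synchronous throw into a rejection,
  // so the timer is always cleared in finally().
  const attempt = Promise.resolve().then(() => work(controller.signal));

  return Promise.race([attempt, timeout]).finally(() => clearTimeout(timer));
}
```

A route handler could wrap its upstream call as withTimeout((signal) => callUpstream(signal), 5000), where callUpstream is whatever HTTP call the route makes, and map the rejection to a 504 response.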
API Gateway Quick Debug Cheat Sheet
Fast diagnostics for production gateway issues. Run these commands to confirm the root cause before changing config.
Rate limiter not working as expected — users exceeding limits
Immediate action
Check if multiple gateway instances are running and rate limit state sharing
Commands
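kubectl get pods -l app=gateway --no-headers | wc -l # count instances (without the header row)
redis-cli --scan --pattern 'rate_limit:*' | wc -l # count rate limit keys in Redis (SCAN avoids blocking the server the way KEYS can)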
Fix now
If gateway instances > 1 and Redis keys are missing, rate limiter is in-memory. Deploy Redis and switch to atomic INCR+EXPIRE in the gateway. If Redis keys exist but limits still wrong, check TTL and window alignment.
504 Gateway Timeout errors: some requests succeed, some time out
Immediate action
Check downstream service latency and gateway thread pool utilization
Commands
curl -w "@curl-format.txt" -o /dev/null -s http://downstream-service/health
netstat -an | grep :8080 | grep ESTABLISHED | wc -l
Fix now
Increase timeout_ms per route if downstream is slow. Add circuit breaker to fail fast after 3 timeouts. Increase gateway thread pool size if connections saturated.
No trace IDs in downstream logs: can't correlate requests
Immediate action
Verify gateway is generating and propagating trace headers
Commands
curl -v http://gateway/api/orders 2>&1 | grep -i 'x-trace-id'
grep -r 'X-Trace-Id' /var/log/gateway/ | head -1
Fix now
Add middleware that generates UUID on each request. Inject as X-Trace-Id header to downstream services. Configure log formatter to include this header value in every log line.
Circuit breaker open but service is healthy
Immediate action
Check circuit breaker state and health probe configuration
Commands
curl http://gateway/debug/circuit-breakers | jq '.[] | {service: .name, state: .state, failures: .failureCount}'
kubectl logs -l app=gateway --tail=100 | grep -i 'circuit\|breaker'
Fix now
Force-close the circuit breaker via the admin API once the service is confirmed healthy. Raise failureThreshold from 3 to 5 to reduce false positives. Set recoveryTimeoutMs to 30000 (30 seconds).
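For readers who have not seen the state machine behind those two settings, here is a minimal Python sketch. It assumes a hypothetical `CircuitBreaker` class whose parameters mirror failureThreshold and recoveryTimeoutMs; real gateways differ in detail, and this version does not limit how many probes pass while the circuit is HALF_OPEN.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Illustrative CLOSED -> OPEN -> HALF_OPEN -> CLOSED state machine."""

    CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF_OPEN"

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout_ms: int = 30_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout_s = recovery_timeout_ms / 1000.0
        self._clock = clock
        self.state = self.CLOSED
        self.failure_count = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        """Fail fast while OPEN; after the recovery timeout, let a probe through."""
        if self.state == self.OPEN:
            if self._clock() - self._opened_at >= self.recovery_timeout_s:
                self.state = self.HALF_OPEN
                return True
            return False
        return True

    def record_success(self) -> None:
        # Any success (including a successful probe) closes the circuit.
        self.state = self.CLOSED
        self.failure_count = 0

    def record_failure(self) -> None:
        self.failure_count += 1
        # A failed probe reopens immediately; otherwise open at the threshold.
        if self.state == self.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self.state = self.OPEN
            self._opened_at = self._clock()
```

If a breaker stays open while the service is healthy, the usual suspects in this model are a probe that never reaches the service or a recovery timeout that has not yet elapsed.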
Authentication passes on some instances, fails on others
Immediate action
Check JWT validation consistency across gateway instances
Commands
for pod in $(kubectl get pods -l app=gateway -o name); do echo "== $pod"; kubectl exec "$pod" -- cat /etc/gateway/jwt-config.json; done
kubectl logs -l app=gateway --tail=10 | grep -i 'jwt\|bearer'
Fix now
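Mount the JWT public key from a shared ConfigMap so every instance validates against the same key. Disable per-instance caching of validation results. Prefer stateless JWT validation, which verifies the signature on each request and needs no shared cache.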
API Gateway vs Reverse Proxy
Aspect | API Gateway | Simple Reverse Proxy (e.g. Nginx)
Primary role | Cross-cutting concerns + smart routing | Traffic forwarding + SSL termination
Authentication | Built-in JWT/OAuth/API key validation | Not natively; requires plugins or custom Lua
Rate limiting | Per-client, per-route, configurable policies | Basic: IP-based, limited granularity
Request transformation | Yes: reshape payloads, add/strip headers | Limited: mostly header manipulation
Service aggregation (BFF) | Yes: fan out and merge multiple service calls | No: 1:1 proxy only
Circuit breaking | Yes: native in Kong, AWS, Apigee | No: must use NGINX Plus or an external sidecar
Developer portal / API docs | Yes: most managed gateways include this | No
Operational complexity | Higher: another stateful layer to manage | Lower: battle-hardened, config is simple
Best for | Microservices with many clients and policies | Simple routing, static content, TLS offload

Key takeaways

1
An API Gateway is not just a proxy. It's a composition of five distinct components: router, auth layer, rate limiter, load balancer, and request/response transformer. Knowing each one separately is what makes you credible in system design interviews.
2
Authenticate once at the gateway, not in every service. Downstream services should trust the identity headers the gateway injects (X-Auth-User) rather than re-validating tokens themselves. This eliminates duplicated auth logic across your entire microservice fleet.
3
Circuit breakers belong at the gateway, not just in service-to-service calls. When a downstream service is degraded, fail fast and return a partial (gracefully degraded) response. Letting a slow service absorb all your gateway threads causes cascading failures that are much harder to debug.
4
Rate limiter state must live in a shared store (Redis) the moment you have more than one gateway instance. In-memory rate limiting works perfectly in development and fails silently in production, because each instance tracks its own counters independently.
5
BFF (Backend for Frontend) is the evolution of the single gateway: separate gateway instances per client type (mobile, web, third-party). Each tailors responses to its client, reducing latency and payload size at the cost of operational complexity.

Common mistakes to avoid

5 patterns

Putting business logic inside the gateway

Symptom
Gateway config contains conditional logic about order values, user tiers, or product categories. Deploying a config change takes 10 minutes. You're afraid to update it. The gateway has become a new monolith.
Fix
The gateway should only handle infrastructure concerns (auth, routing, rate limiting, transformation). Any logic that touches your domain model belongs in a service. A good rule: if the logic would break when you swap to a different gateway provider, it doesn't belong in the gateway.

Using a single gateway instance without a fallback

Symptom
Your gateway goes down during a deploy, and your entire product is offline. The load balancer has no other healthy target. Customers see 503 Service Unavailable.
Fix
Run at least two gateway instances behind a cloud load balancer (AWS ALB, GCP LB) in different availability zones. The gateway is now the most critical piece of your infrastructure, so treat it like one. Stateless gateway design (externalising state to Redis) makes horizontal scaling trivial.

Not propagating trace IDs end-to-end

Symptom
A request fails in production. You have logs in five different services but can't correlate them. Each service logs different IDs. Debugging takes 2 hours instead of 2 minutes.
Fix
The gateway must generate a unique traceId (UUID or W3C traceparent header) on every incoming request and inject it into every downstream call as a header. Each service logs that ID with every log line. Now you can grep a single ID across all services and reconstruct the full request journey in seconds.
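On the service side, "logs that ID with every log line" can be sketched with Python's standard logging module. The names here are assumptions for illustration: `current_trace_id`, `TraceIdFilter`, `build_logger`, and `handle_request` are hypothetical, and the request is modelled as a plain dict of headers.

```python
import contextvars
import io
import logging
from typing import Mapping, TextIO

# Holds the trace ID of the request currently being handled ("-" when there is none).
current_trace_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "current_trace_id", default="-"
)


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every record so the formatter can print it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


def build_logger(name: str, stream: TextIO) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
    handler.addFilter(TraceIdFilter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, headers: Mapping[str, str]) -> None:
    """Bind the inbound X-Trace-Id for the duration of one request."""
    token = current_trace_id.set(headers.get("X-Trace-Id", "-"))
    try:
        logger.info("order lookup started")
        logger.info("order lookup finished")
    finally:
        current_trace_id.reset(token)


if __name__ == "__main__":
    buffer = io.StringIO()
    handle_request(build_logger("orders", buffer), {"X-Trace-Id": "abc-123"})
    print(buffer.getvalue())
```

Using a context variable means handler code never has to pass the ID around by hand, which is usually what makes the practice stick.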

Misconfiguring timeouts — gateway timeout < downstream timeout

Symptom
Requests occasionally fail with 504 Gateway Timeout. Downstream service logs show the request succeeded after 3 seconds. Gateway timed out at 2 seconds.
Fix
The gateway timeout must exceed the longest time the downstream can legitimately take: processing time plus network latency, across every retry attempt. Set timeout_ms per route to at least 2x the downstream service's 99th percentile latency. If you configure retries, use exponential backoff and count the backoff waits in the timeout budget.
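The arithmetic behind that rule can be written down as a small helper. This is a sketch under stated assumptions: `gateway_timeout_ms` is a hypothetical function, the 50 ms network allowance and 100 ms backoff base are illustrative defaults, and the worst case assumes every attempt uses its full per-attempt budget.

```python
def gateway_timeout_ms(
    p99_latency_ms: float,
    network_ms: float = 50,
    retries: int = 0,
    backoff_base_ms: float = 100,
    safety_factor: float = 2.0,
) -> int:
    """Worst-case route timeout covering every attempt plus the backoff waits between them."""
    # Each attempt gets safety_factor x p99 plus a network allowance.
    per_attempt = safety_factor * p99_latency_ms + network_ms
    attempts = retries + 1
    # Exponential backoff: base, 2x base, 4x base, ... one wait before each retry.
    backoff_total = sum(backoff_base_ms * (2 ** i) for i in range(retries))
    return int(per_attempt * attempts + backoff_total)


if __name__ == "__main__":
    # Downstream p99 of 1500 ms, no retries: 2 * 1500 + 50
    print(gateway_timeout_ms(1500))             # 3050
    # Same route with 2 retries: 3 * 3050 + (100 + 200)
    print(gateway_timeout_ms(1500, retries=2))  # 9450
```

A 2-second gateway timeout in front of a service that sometimes needs 3 seconds fails this check immediately, which matches the symptom above.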

Stripping authentication headers before downstream validation

Symptom
Gateway validates JWT, strips Authorization header, then forwards request. Downstream service implements its own JWT validation but the header is gone. Request fails with 401 Unauthorized.
Fix
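Pick one trust model and apply it consistently. Preferably, after validation the gateway injects a trusted identity header such as X-Auth-User: userId and optionally X-Auth-Roles: admin,user, and downstream services trust these headers when they arrive from the gateway's trusted network instead of validating the JWT again. If a service must validate the token itself, the gateway has to keep forwarding the Authorization header. Use mTLS between gateway and services to prevent spoofing.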
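A minimal Python sketch of the injection step, assuming the token has already been validated elsewhere. The helper name `forward_headers` is hypothetical, and removing client-supplied X-Auth-* headers is an added precaution in this sketch, not something the fix above spells out.

```python
from typing import Dict, Iterable, Mapping


def forward_headers(
    incoming: Mapping[str, str], user_id: str, roles: Iterable[str]
) -> Dict[str, str]:
    """Build downstream headers after the gateway has validated the caller's token."""
    # Drop any client-supplied X-Auth-* headers so a caller cannot spoof an identity.
    outgoing = {
        name: value
        for name, value in incoming.items()
        if not name.lower().startswith("x-auth-")
    }
    # Inject the identity the gateway actually verified.
    outgoing["X-Auth-User"] = user_id
    outgoing["X-Auth-Roles"] = ",".join(roles)
    # The Authorization header is left untouched here, so services that still
    # validate the JWT themselves keep working.
    return outgoing


if __name__ == "__main__":
    headers = forward_headers(
        {"Authorization": "Bearer <token>", "X-Auth-User": "attacker"},
        user_id="user-42",
        roles=["admin", "user"],
    )
    print(headers["X-Auth-User"])   # user-42
    print(headers["X-Auth-Roles"])  # admin,user
```

Header trust only holds if nothing but the gateway can reach the services, which is the job mTLS does in the fix above.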
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
How does an API Gateway differ from a Service Mesh like Istio, and when ...
Q02 · SENIOR
Walk me through what happens — component by component — when an unauthen...
Q03 · SENIOR
If your rate limiter is deployed across three gateway instances and a us...
Q04 · SENIOR
What is the Backend for Frontend (BFF) pattern, and when would you use i...
Q01 of 04 · SENIOR

How does an API Gateway differ from a Service Mesh like Istio, and when would you choose one over the other?

ANSWER
An API Gateway operates at Layer 7 (HTTP/HTTPS) as an ingress point for external clients. It handles north-south traffic: browser, mobile app, and third-party API calls into your system. A service mesh (Istio, Linkerd) typically runs as a sidecar proxy on every pod and handles east-west traffic between internal services. It provides service discovery, mTLS encryption, circuit breaking, and observability for internal communication.

Choose an API Gateway when you need authentication, rate limiting, or request transformation for external clients, or when you want a single entry point for all client traffic.

Choose a service mesh when you have many internal services that need mTLS, when you want fine-grained retry and timeout policies between services, or when you need traffic splitting for canary deployments.

The two are complementary. You can (and should) use both: an API Gateway for north-south traffic and a service mesh for east-west traffic.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
What is the difference between an API Gateway and a load balancer?
02
Should I build my own API Gateway or use a managed solution like AWS API Gateway or Kong?
03
Can the API Gateway become a single point of failure?
04
How do you handle authentication across multiple gateway instances?