Intermediate 4 min · March 05, 2026

API Gateway — Why 2 Instances Doubled Our Rate Limits

Q: What is the difference between an API Gateway and a load balancer?

A load balancer distributes traffic across identical instances of the same service — it doesn't understand what's in the request. An API Gateway routes traffic to different services based on the request path, method, or headers, and applies cross-cutting concerns like auth and rate limiting along the way. In practice, you use both: a cloud load balancer distributes traffic across multiple gateway instances, and the gateway then routes to the right downstream services.

Q: Should I build my own API Gateway or use a managed solution like AWS API Gateway or Kong?

For almost every team, use an existing gateway rather than building one. Building and operating your own means owning TLS termination, rate limit state management, auth plugin security, and high availability, none of which is your core product. Fully managed services like AWS API Gateway or Apigee take on most of that; Kong gives you a mature gateway, but when you self-host it you still operate the cluster. Roll your own only if you have extremely specific requirements that no existing solution meets, and even then, treat it as a long-term maintenance commitment.

Q: Can the API Gateway become a single point of failure?

Yes — which is exactly why you always run multiple gateway instances behind a cloud load balancer, distributed across availability zones. The gateway itself should be stateless (externalise all state like rate limit counters to Redis), so any instance can handle any request. Health checks and automatic instance replacement handle individual failures. The real risk isn't the gateway going down — it's misconfiguring it, which is why change management and staged rollouts for gateway config are critical.

Q: How do you handle authentication across multiple gateway instances?

JWT authentication is stateless by design — each gateway instance validates the JWT signature using a shared public key (no shared state needed). OAuth token introspection may require calling the authorization server. Use Redis to cache introspection results for 5 minutes to avoid overloading the auth server. For API keys, store the key-to-user mapping in Redis with a TTL. Never use in-memory caching for auth decisions across multiple instances — stale cache entries can cause inconsistent auth results.
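
To make the stateless JWT case concrete, here is a minimal sketch of RS256 verification using only Node's built-in crypto module. The helper names (signJwt, verifyJwt) and the throwaway key pair are assumptions for illustration, not part of any gateway product; in production you would normally rely on a maintained JWT library and load the public key from your identity provider.

```javascript
const crypto = require('node:crypto');

const toBase64Url = (value) => Buffer.from(value).toString('base64url');

// Hypothetical helper: only used here to mint a demo token.
function signJwt(payload, privateKey) {
  const header = { alg: 'RS256', typ: 'JWT' };
  const signingInput = `${toBase64Url(JSON.stringify(header))}.${toBase64Url(JSON.stringify(payload))}`;
  const signature = crypto.sign('RSA-SHA256', Buffer.from(signingInput), privateKey);
  return `${signingInput}.${toBase64Url(signature)}`;
}

// Returns the claims if the token is valid, otherwise null.
// Needs only the public key, so every gateway instance can run it independently.
function verifyJwt(token, publicKey, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, signatureB64] = parts;

  let header;
  let payload;
  try {
    header = JSON.parse(Buffer.from(headerB64, 'base64url').toString('utf8'));
    payload = JSON.parse(Buffer.from(payloadB64, 'base64url').toString('utf8'));
  } catch {
    return null; // Not valid JSON
  }

  if (header.alg !== 'RS256') return null; // Pin the algorithm; never trust the header blindly

  const signatureOk = crypto.verify(
    'RSA-SHA256',
    Buffer.from(`${headerB64}.${payloadB64}`),
    publicKey,
    Buffer.from(signatureB64, 'base64url')
  );
  if (!signatureOk) return null;

  if (typeof payload.exp !== 'number' || payload.exp <= nowSeconds) return null; // Expired
  return payload;
}

// Demo: a throwaway key pair stands in for the identity provider's keys.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
const demoToken = signJwt({ sub: 'user-99', exp: Math.floor(Date.now() / 1000) + 300 }, privateKey);
console.log(verifyJwt(demoToken, publicKey));
```

Because verification touches no shared state, it behaves the same on one instance or twenty; only revocation and introspection need a shared cache.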

Per-instance rate limiters silently doubled user limits during our flash sale.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Lessons pulled from things that broke in production.

✓ Production

production tested

July 18, 2026

last updated

2,466

articles · all by Naren

Before you start · ⏱ 25 min

✓ Solid grasp of fundamentals
✓ Comfortable reading code examples
✓ Basic production concepts

⚡Quick Answer

API Gateway = single entry point for all clients, routing requests to downstream services while applying cross-cutting concerns
Five core components: Router (maps URL to service), Auth layer (JWT/OAuth validation), Rate limiter (protects services), Load balancer (distributes traffic), Request transformer (reshapes payloads)
Performance: Stateless gateways scale horizontally behind any cloud LB; a 3-instance Kong cluster can handle 50k+ RPS, depending on plugins and hardware
Production trap: In-memory rate limiting on multiple gateway instances. Each instance has its own counter, so with N instances a user gets up to N times their limit (double with two instances)
Circuit breakers at the gateway prevent cascade failures — failing service only degrades, not the whole system
Biggest mistake: Business logic in the gateway — if it would break when swapping gateway providers, it doesn't belong there

✦ Definition · ~90s read

What is API Gateway?

An API gateway is a reverse proxy that sits between your clients and backend services, acting as a single entry point for all API traffic. It exists to solve the fundamental problem of distributed systems: as you scale from one service to many, each client shouldn't need to know about every service's location, authentication scheme, or rate limit.

Instead, the gateway handles cross-cutting concerns (auth, rate limiting, routing, request transformation) in one place, so your backend services stay focused on business logic. Companies like Netflix, Amazon, and Stripe run gateways in front of very high request volumes precisely because they decouple client-facing APIs from internal service architecture.

In the ecosystem, API gateways overlap with service meshes (like Istio), which focus on service-to-service traffic, and they are an alternative to direct client-to-service calls. You don't need a gateway for a monolith or a handful of services, where it mostly adds latency and operational complexity. But once you have more than a handful of services, or need to enforce rate limits per client, or support multiple client types (web, mobile, IoT), a gateway becomes essential.

Tools like Kong, AWS API Gateway, NGINX, and Envoy dominate production deployments, each with trade-offs in latency overhead (typically 2-10ms) and configuration complexity.

The core job is straightforward: one door, many rooms. The gateway inspects each request, applies policies (authentication via JWT/OAuth, rate limiting per API key, request validation), then routes to the correct backend. It also aggregates responses from multiple services—critical for mobile clients that can't afford N round trips.

When a backend fails, the gateway circuit-breaks to prevent cascading failures. This single control point is also where the incident in the title came from: each gateway instance enforced limits against its own counter, so adding a second instance let a client whose requests were spread across both use up to twice its quota.

Plain-English First

Imagine a massive hotel. Guests don't wander into the kitchen, the laundry room, or the staff quarters — they talk to the front desk, and the front desk figures out who to send them to. An API Gateway is that front desk for your backend services. Every request from the outside world hits the gateway first, and the gateway decides where it goes, whether the person is allowed in, and how fast they can make requests.

Every time you open Uber and request a ride, your app doesn't talk to seventeen different backend services directly. It talks to one door — the API Gateway — and that gateway orchestrates the chaos behind the scenes. In a world where a single product might run on dozens of microservices, having a clean, single entry point isn't a luxury; it's what keeps the whole system from falling apart under real traffic.

But gateways are also dangerous. Put business logic in one and you've created a new monolith. Use in-memory rate limiting across two instances and your limits silently double. Misconfigure timeouts and a slow database takes down your entire checkout flow. The difference between a gateway that helps and one that hurts is understanding the components — not just copying a config from a tutorial.

By the end you'll know exactly what each component does, where they break in production, and how to design for scale. You'll walk away with patterns (BFF, aggregation, circuit breaking) that senior engineers use in interviews and real systems.

API Gateway: The Single Point of Control That Multiplied Our Rate Limits

An API gateway is a reverse proxy that sits between clients and backend services, acting as a single entry point for all API traffic. It intercepts every request to enforce cross-cutting concerns — authentication, rate limiting, routing, and logging — before forwarding to the appropriate service. Without it, each service must implement these policies independently, leading to duplication and inconsistency.

In practice, the gateway evaluates each request against a set of rules. For rate limiting, it typically uses a token bucket or sliding window algorithm, tracking counts per client IP or API key in a fast store: either the gateway process's own memory or a shared store such as Redis. Our gateways used the in-memory option. When we deployed two instances behind a load balancer, each instance maintained its own counter, effectively doubling the allowed requests per client before hitting the limit. The default configuration treated each instance as an independent rate limiter, not a shared pool.

Use an API gateway when you have multiple microservices that need unified authentication, throttling, or observability. It becomes critical at scale — without it, you'll either over-provision resources to handle bursts or leave services vulnerable to abuse. The key is to ensure shared state (like rate limit counters) lives in a distributed cache, not in-memory per instance.
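
The doubling effect described above is easy to reproduce in a few lines. The sketch below is a simulation, not the incident's actual code: createFixedWindowLimiter and countAllowed are hypothetical helpers, the load balancer is modelled as plain round-robin, and a shared Map stands in for a distributed store.

```javascript
// A fixed-window limiter. By default each limiter owns its counters,
// which models one gateway instance with in-memory state.
function createFixedWindowLimiter(limit, windowMs, counters = new Map()) {
  return function allow(clientId, now) {
    const entry = counters.get(clientId);
    if (!entry || now - entry.windowStart >= windowMs) {
      counters.set(clientId, { count: 1, windowStart: now }); // New window
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

// Send every request from one client through a round-robin "load balancer".
function countAllowed(instances, totalRequests) {
  let allowed = 0;
  for (let i = 0; i < totalRequests; i++) {
    const allow = instances[i % instances.length];
    if (allow('client-a', 0)) allowed++; // All requests land in the same window
  }
  return allowed;
}

const LIMIT = 5;
const WINDOW_MS = 60000;

// Case 1: two instances, each with its own in-memory counters.
const perInstance = [
  createFixedWindowLimiter(LIMIT, WINDOW_MS),
  createFixedWindowLimiter(LIMIT, WINDOW_MS),
];

// Case 2: two instances sharing one counter map (stand-in for Redis).
const sharedCounters = new Map();
const shared = [
  createFixedWindowLimiter(LIMIT, WINDOW_MS, sharedCounters),
  createFixedWindowLimiter(LIMIT, WINDOW_MS, sharedCounters),
];

const perInstanceAllowed = countAllowed(perInstance, 20);
const sharedAllowed = countAllowed(shared, 20);

console.log(`per-instance counters: ${perInstanceAllowed} allowed`); // 10
console.log(`shared counters:       ${sharedAllowed} allowed`);      // 5
```

With a limit of 5, the per-instance setup lets 10 requests through while the shared setup lets 5 through, which is the behaviour the rest of this article is about.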

⚠ Rate Limiting Is Not Stateless

Two gateway instances with in-memory rate limiters will each allow the full quota, effectively doubling throughput. Always use a distributed store like Redis for shared counters.

📊 Production Insight

We deployed two API gateway instances for redundancy and saw rate limit violations spike. The symptom: clients reported 429 errors inconsistently — one request passed, the next was blocked. The rule: never rely on in-memory rate limit counters across multiple instances; use a centralized store with atomic increments.

🎯 Key Takeaway

An API gateway centralizes cross-cutting concerns, but its state must be shared across instances.

Rate limiting without a distributed counter is a security hole, not a safeguard.

Always test gateway behavior under multi-instance load before production deployment.

The Core Job of an API Gateway: One Door, Many Rooms

Before microservices became the norm, you had a monolith — one big application that handled everything. A client made one request, the app handled it, done. But once you split that monolith into ten, twenty, or fifty services (authentication, payments, user profiles, notifications…), clients suddenly need to know where everything lives. That's chaos.

An API Gateway solves this by being the single entry point for all client traffic. It receives every inbound request, applies a set of cross-cutting concerns (auth, rate limiting, logging), and then routes the request to the right downstream service. The client only ever needs to know one URL.

The key insight here is that the gateway isn't just a proxy that forwards traffic blindly. It actively transforms, validates, and enriches requests before they ever touch your services. Think of it as a bouncer, a receptionist, and a traffic cop rolled into one.

io/thecodeforge/gateway/api-gateway-config.yaml (YAML)

# Illustrative, vendor-neutral gateway configuration.
# Field names are simplified; this is not the exact syntax of AWS API Gateway or Kong.
# It shows how a gateway maps external routes to internal services

gateway:
  name: ecommerce-gateway
  base_url: https://api.shopexample.com

routes:
  # Route 1: Public product catalog — no auth required
  - path: /products
    method: GET
    upstream_service: http://product-service:8001/api/products
    auth_required: false          # Public endpoint, anyone can browse
    rate_limit:
      requests_per_minute: 300    # Still throttled to prevent scraping

  # Route 2: Place an order — must be authenticated
  - path: /orders
    method: POST
    upstream_service: http://order-service:8002/api/orders
    auth_required: true           # Gateway checks JWT before forwarding
    rate_limit:
      requests_per_minute: 30     # Stricter limit on write operations
    timeout_ms: 5000              # Gateway cancels request after 5s

  # Route 3: User profile — auth + request transformation
  - path: /users/{userId}/profile
    method: GET
    upstream_service: http://user-service:8003/api/profile
    auth_required: true
    transform_request:
      add_header:
        X-Internal-Request-Id: "${generate_uuid}"   # Gateway injects trace ID
        X-Caller-Service: "api-gateway"              # Downstream knows origin
    rate_limit:
      requests_per_minute: 60
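
To show what the router does with a table like the one above, here is a minimal sketch of path matching with `{param}` placeholders. The route entries are copied from the config; compilePath and matchRoute are hypothetical names, and real gateways use more capable matchers (prefixes, wildcards, priorities).

```javascript
// Route table mirroring the YAML above.
const routes = [
  { method: 'GET',  path: '/products',               upstream: 'http://product-service:8001/api/products', authRequired: false },
  { method: 'POST', path: '/orders',                 upstream: 'http://order-service:8002/api/orders',     authRequired: true },
  { method: 'GET',  path: '/users/{userId}/profile', upstream: 'http://user-service:8003/api/profile',     authRequired: true },
];

// Turn '/users/{userId}/profile' into a RegExp plus the list of parameter names.
function compilePath(path) {
  const paramNames = [];
  const pattern = path
    .split('/')
    .map((segment) => {
      const param = /^\{(\w+)\}$/.exec(segment);
      if (param) {
        paramNames.push(param[1]);
        return '([^/]+)'; // A parameter matches exactly one path segment
      }
      return segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // Escape literal text
    })
    .join('/');
  return { regex: new RegExp('^' + pattern + '$'), paramNames };
}

// Compile once at startup, not per request.
const compiledRoutes = routes.map((route) => ({ route, ...compilePath(route.path) }));

// Returns { route, params } for the first match, or null (the gateway would answer 404).
function matchRoute(method, path) {
  for (const { route, regex, paramNames } of compiledRoutes) {
    if (route.method !== method) continue;
    const match = regex.exec(path);
    if (!match) continue;
    const params = {};
    paramNames.forEach((name, i) => { params[name] = match[i + 1]; });
    return { route, params };
  }
  return null;
}

console.log(matchRoute('GET', '/users/42/profile'));
console.log(matchRoute('GET', '/orders')); // null: only POST is routed
```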

🔥Why This Matters:

In the 'trusted network' pattern, the gateway validates the caller's token, strips the Authorization header, and forwards a verified identity header instead (the config above only shows header injection; the stripping step is part of the pattern, not of this file). Internal services can then trust that any request arriving from the gateway is already authenticated, so they don't each need to implement JWT validation themselves. This cuts duplicated auth logic across every service, but it only holds if services are reachable solely through the gateway.
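
A sketch of the header handling this pattern implies, under the assumption that identity travels in an X-Auth-User header and the trace ID in X-Internal-Request-Id. The function name buildUpstreamHeaders is hypothetical. The important detail is that any client-supplied copy of the identity header is discarded before the verified one is set.

```javascript
// Headers that must never be forwarded as received from the client.
const HEADERS_TO_DROP = ['authorization', 'x-auth-user'];

// Builds the header set for the upstream call from the incoming headers
// plus values the gateway itself has verified or generated.
function buildUpstreamHeaders(incomingHeaders, verifiedUserId, traceId) {
  const upstream = {};
  for (const [name, value] of Object.entries(incomingHeaders)) {
    const lower = name.toLowerCase();
    if (!HEADERS_TO_DROP.includes(lower)) {
      upstream[lower] = value;
    }
  }
  upstream['x-auth-user'] = verifiedUserId;       // Set only from the validated token
  upstream['x-internal-request-id'] = traceId;    // Trace ID injected by the gateway
  return upstream;
}

const forwarded = buildUpstreamHeaders(
  { Authorization: 'Bearer abc', 'X-Auth-User': 'admin', Accept: 'application/json' },
  'user-99',
  'trace-1'
);
console.log(forwarded);
```

Here the spoofed `X-Auth-User: admin` never reaches the upstream service; it is replaced by the identity taken from the validated token.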

📊 Production Insight

A single gateway instance is a single point of failure. Run at least two behind a cloud load balancer.

The gateway becomes the most critical piece of infrastructure the moment it's deployed.

Rule: Stateless gateways (all state in Redis) scale horizontally. Stateful gateways don't. Design for statelessness from day one.

🎯 Key Takeaway

An API Gateway is not just a proxy — it's a composition of five distinct components: router, auth layer, rate limiter, load balancer, and request/response transformer.

Knowing each one separately is what makes you credible in system design interviews and production incidents.

Rule: If your gateway config contains business logic (if userId == 'admin'), you're doing it wrong.

Gateway or No Gateway?

If: You have 1-3 services, all internal, low traffic
→ Use: Skip the gateway. Use a simple load balancer (Nginx, HAProxy) + internal service discovery.

If: You have >3 services, multiple client types (web, mobile, third-party)
→ Use: Deploy an API gateway. The cross-cutting concerns (auth, rate limiting, aggregation) will otherwise be duplicated per service.

If: You need per-client API customization (iOS vs Android vs Web)
→ Use: Backend for Frontend (BFF) pattern, a separate gateway per client type. Mobile gets lightweight payloads; web gets full data.

If: Your only concern is SSL termination and basic routing
→ Use: Nginx or HAProxy. A full API gateway is overkill. Add AWS API Gateway or Kong only when you need auth or rate limiting.

If: Your team lacks operational experience with gateways
→ Use: A managed gateway (AWS API Gateway, GCP Apigee). Building and operating your own Kong cluster is non-trivial.

The Five Components Every API Gateway Must Have

A gateway is more than a reverse proxy. It's a composition of distinct components, each with a specific job. Understanding each one separately is how you answer system design questions confidently — and how you avoid misconfiguring production systems.

1. Request Router — Maps incoming URLs and HTTP methods to upstream services. This is the core. Without routing, nothing works.

2. Authentication & Authorization Layer — Validates identity (AuthN) and checks permissions (AuthZ). The gateway is the ideal place for this because it's centralised. JWT validation, OAuth token introspection, API key checks — all happen here before requests go anywhere.

3. Rate Limiter & Throttler — Protects your services from being overwhelmed. Rate limiting says 'you get 100 requests per minute.' Throttling says 'requests beyond that get queued or slowed down, not just rejected.'

4. Load Balancer — When multiple instances of a service are running, the gateway distributes traffic across them. Round-robin, least-connections, and weighted routing are common strategies.

5. Request/Response Transformer — The gateway can reshape payloads in both directions. Strip sensitive fields from responses, add internal headers, translate between REST and gRPC, or aggregate responses from multiple services into one (the Backend for Frontend pattern).
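
For component 3, the token bucket is one common way to get both behaviours from the same state: it allows short bursts up to a capacity, and when empty it can tell the caller how long to wait, which is the hook a throttler needs. The sketch below is a single-process illustration with hypothetical names (TokenBucket, take); it is not how any specific gateway implements it.

```javascript
class TokenBucket {
  // capacity: maximum burst size; refillPerSecond: sustained request rate
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;   // Start full so an idle client can burst
    this.lastRefill = now;
  }

  // Try to spend one token. Returns whether the request may proceed and,
  // if not, how many milliseconds until one token will be available.
  take(now = Date.now()) {
    const elapsedSeconds = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true, retryAfterMs: 0 };
    }
    const retryAfterMs = Math.ceil(((1 - this.tokens) / this.refillPerSecond) * 1000);
    return { allowed: false, retryAfterMs };
  }
}

// Burst of 2, then 1 request per second.
const bucket = new TokenBucket(2, 1, 0);
console.log(bucket.take(0));    // allowed
console.log(bucket.take(0));    // allowed (burst used up)
console.log(bucket.take(0));    // rejected, retryAfterMs: 1000
console.log(bucket.take(1000)); // allowed again after refill
```

A rate limiter would turn the rejected result into a 429 with Retry-After; a throttler would instead hold the request for retryAfterMs and then forward it.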

io/thecodeforge/gateway/gateway-middleware-pipeline.js (JavaScript)

// Simulating the middleware pipeline of an API Gateway in Node.js
// This shows HOW each component processes a request in sequence
// Real gateways (Kong, AWS API GW, Nginx) implement this in their own runtimes,
// but the pipeline idea is the same.

const express = require('express');
const { v4: uuidv4 } = require('uuid');
const app = express();

app.use(express.json());

// ─── COMPONENT 1: Request Logger ────────────────────────────────────────────
// Every request gets a unique trace ID the moment it arrives
app.use((req, res, next) => {
  req.traceId = uuidv4();                         // Unique ID for distributed tracing
  req.arrivalTime = Date.now();
  console.log(`[GATEWAY] Incoming: ${req.method} ${req.path} | traceId=${req.traceId}`);
  next();
});

// ─── COMPONENT 2: Authentication Layer ──────────────────────────────────────
// Validate the token ONCE here — downstream services don't need to
const PUBLIC_ROUTES = ['/products'];              // Routes that skip auth

function authMiddleware(req, res, next) {
  if (PUBLIC_ROUTES.includes(req.path)) {
    return next();                                // Skip auth for public routes
  }

  const authHeader = req.headers['authorization'];
  if (!authHeader || !authHeader.startsWith('Bearer ')) {
    return res.status(401).json({
      error: 'Authentication required',
      traceId: req.traceId                        // Always return traceId for debugging
    });
  }

  const token = authHeader.split(' ')[1];
  // In production: verify JWT signature, check expiry, decode claims
  // For demo: we accept any token that starts with 'valid-'
  if (!token.startsWith('valid-')) {
    return res.status(401).json({ error: 'Invalid token', traceId: req.traceId });
  }

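  req.authenticatedUserId = token.replace('valid-', '');  // Extracted from JWT claims
  // Attach identity to the request that gets forwarded upstream (not to the
  // response). Overwriting any client-supplied value prevents header spoofing.
  req.headers['x-auth-user'] = req.authenticatedUserId;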
  next();
}

app.use(authMiddleware);

// ─── COMPONENT 3: Rate Limiter ───────────────────────────────────────────────
// Simple in-memory rate limiter (production: use Redis for distributed state)
const requestCounts = {};                         // ip -> { count, windowStart }
const RATE_LIMIT = 5;                             // Max 5 requests per window
const WINDOW_MS  = 60 * 1000;                     // 60-second window

function rateLimiter(req, res, next) {
  const clientIp = req.ip;
  const now = Date.now();

  if (!requestCounts[clientIp] || now - requestCounts[clientIp].windowStart > WINDOW_MS) {
    requestCounts[clientIp] = { count: 1, windowStart: now };  // Reset window
    return next();
  }

  requestCounts[clientIp].count++;

  if (requestCounts[clientIp].count > RATE_LIMIT) {
    const retryAfter = Math.ceil((WINDOW_MS - (now - requestCounts[clientIp].windowStart)) / 1000);
    res.setHeader('Retry-After', retryAfter);     // Tell client when to retry
    return res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfterSeconds: retryAfter,
      traceId: req.traceId
    });
  }

  next();
}

app.use(rateLimiter);

// ─── COMPONENT 4: Router + Load Balancer ────────────────────────────────────
// Round-robin across multiple instances of a service
const productServiceInstances = [
  'http://product-service-1:8001',
  'http://product-service-2:8001',
  'http://product-service-3:8001'
];
let roundRobinIndex = 0;

function getNextProductInstance() {
  const instance = productServiceInstances[roundRobinIndex];
  roundRobinIndex = (roundRobinIndex + 1) % productServiceInstances.length;  // Wrap around
  return instance;
}

// Public product route — no auth needed
app.get('/products', (req, res) => {
  const targetInstance = getNextProductInstance();
  console.log(`[GATEWAY] Routing GET /products -> ${targetInstance} | traceId=${req.traceId}`);

  // In production: forward the actual HTTP request using axios/node-fetch
  // For demo: simulate the downstream response
  res.json({
    _meta: { routedTo: targetInstance, traceId: req.traceId },
    products: [{ id: 1, name: 'Wireless Headphones', price: 79.99 }]
  });
});

// Protected order route
app.post('/orders', (req, res) => {
  console.log(`[GATEWAY] Routing POST /orders for user=${req.authenticatedUserId} | traceId=${req.traceId}`);

  // ─── COMPONENT 5: Request Transformer ─────────────────────────────────────
  // Strip the raw Authorization header — order-service trusts X-Auth-User instead
  const internalPayload = {
    ...req.body,
    requestedByUserId: req.authenticatedUserId,   // Inject verified identity
    gatewayTraceId: req.traceId                   // Inject trace ID for observability
    // Note: we do NOT forward req.headers.authorization to internal services
  };

  console.log('[GATEWAY] Transformed payload for order-service:', internalPayload);

  res.status(201).json({
    message: 'Order created',
    orderId: `ORD-${Date.now()}`,
    traceId: req.traceId
  });
});

app.listen(3000, () => {
  console.log('[GATEWAY] API Gateway running on port 3000');
});

⚠ Watch Out: In-Memory Rate Limiting Breaks at Scale

The in-memory rate limiter above works perfectly on a single gateway instance. The moment you run two gateway instances (which you will in production for high availability), each instance has its own counter — so a user effectively gets double their limit. Always back your rate limiter with a shared store like Redis using atomic INCR + EXPIRE commands. This is one of the most common production bugs teams hit after their first gateway scaling event.
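
As a rough sketch of the INCR + EXPIRE approach: the limiter below talks to a store through two async methods, incr and expire, named after the Redis commands they stand for. InMemoryCounterStore is a hypothetical stand-in so the example runs without a Redis server; with a real client you would pass an adapter exposing the same two methods. The key layout (rate_limit:{userId}:{window}) is an assumption.

```javascript
// Stand-in for a shared store. Real Redis would hold these counters outside
// the gateway process so that every instance sees the same values.
class InMemoryCounterStore {
  constructor() {
    this.counters = new Map();
    this.ttls = new Map();
  }
  async incr(key) {
    const next = (this.counters.get(key) ?? 0) + 1;
    this.counters.set(key, next);
    return next; // Like INCR: returns the value after incrementing
  }
  async expire(key, seconds) {
    this.ttls.set(key, seconds); // Records the TTL only; real Redis deletes the key
    return 1;
  }
}

const WINDOW_SECONDS = 60;

// Fixed-window check: one counter per user per minute window.
async function isAllowed(store, userId, limit, nowMs = Date.now()) {
  const windowId = Math.floor(nowMs / (WINDOW_SECONDS * 1000));
  const key = `rate_limit:${userId}:${windowId}`;
  const count = await store.incr(key);          // Atomic on the store side
  if (count === 1) {
    await store.expire(key, WINDOW_SECONDS);    // First hit in the window sets the TTL
  }
  return count <= limit;
}

// Two "gateway instances" are just two callers sharing the same store.
(async () => {
  const sharedStore = new InMemoryCounterStore();
  let allowed = 0;
  for (let i = 0; i < 12; i++) {
    if (await isAllowed(sharedStore, 'user-99', 5, 0)) allowed++;
  }
  console.log(`allowed ${allowed} of 12 requests`);
})();
```

INCR and EXPIRE are two separate commands, so a crash between them can leave a key without a TTL. Because the window ID is part of the key, that only leaves a stale key behind rather than a wrong limit; a Lua script can make the pair a single atomic step.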

📊 Production Insight

The in-memory rate limiter in the code above will fail silently in production the moment you scale to two gateway instances.

Each instance tracks its own counters independently, so a user hitting both instances gets double the intended limit.

Rule: All gateway state must live in a shared external store (Redis). Stateless gateways are horizontally scalable; stateful gateways are not.

🎯 Key Takeaway

Authenticate once at the gateway, not in every service. Downstream services should trust identity headers the gateway injects (X-Auth-User).

Rate limiter state must live in a shared store (Redis) the moment you have more than one gateway instance.

Rule: In-memory rate limiting works perfectly in development and fails silently in production. Don't learn this the hard way.

Rate Limiter Storage Decision

If: Single gateway instance, development or low-traffic production
→ Use: An in-memory rate limiter is acceptable. Use a per-process map (a Map in Node.js, a ConcurrentHashMap in Java) with a sliding window. No external dependency.

If: Multiple gateway instances (HA or scaling), any production traffic
→ Use: Redis with atomic INCR followed by EXPIRE. Key format: rate_limit:{userId}:{minute_window} with TTL=60 seconds.

If: Very high throughput (>100k RPS), and Redis as a single point of failure is not acceptable
→ Use: Redis Cluster, which shards keys by hash slot and promotes replicas on failure (Redis Sentinel plays that failover role for a non-clustered deployment). Use a Lua script for increment-and-check atomicity.

If: Rate limiting must be exact, not approximate, and no Redis available
→ Use: Consistent hashing at the load balancer, keyed on user ID, so each user always hits the same gateway instance. Trade-off: uneven load distribution, and counters move when instances are added or removed.

If: You need advanced features (sliding windows, burst handling, per-route limits)
→ Use: An existing rate limiting library (token bucket, leaky bucket) with a Redis backend. Sliding window counters can be implemented with sorted sets.
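
The sliding-window option in the last row can be sketched as a timestamp log per client. The version below keeps the log in process memory purely for illustration (SlidingWindowLimiter is a hypothetical name); the comments note which Redis sorted-set command each step would correspond to in a shared deployment.

```javascript
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.log = new Map(); // clientId -> array of request timestamps (ms)
  }

  allow(clientId, now = Date.now()) {
    const cutoff = now - this.windowMs;

    // Drop timestamps that fell out of the window (Redis: ZREMRANGEBYSCORE).
    const recent = (this.log.get(clientId) ?? []).filter((t) => t > cutoff);

    // Count what is left (Redis: ZCARD) and record this request if allowed (Redis: ZADD).
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);

    this.log.set(clientId, recent);
    return allowed;
  }
}

// 3 requests per rolling second.
const limiter = new SlidingWindowLimiter(3, 1000);
console.log(limiter.allow('client-a', 0));    // true
console.log(limiter.allow('client-a', 100));  // true
console.log(limiter.allow('client-a', 200));  // true
console.log(limiter.allow('client-a', 300));  // false: 3 requests in the last second
console.log(limiter.allow('client-a', 1001)); // true: the request at t=0 has aged out
```

Unlike a fixed window, this avoids the burst at window boundaries, at the cost of storing one entry per request.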

Gateway Patterns You'll Actually Use: BFF, Aggregation, and Circuit Breaking

Knowing the components is step one. Knowing the patterns built on top of them is what separates a junior engineer from someone who can design systems confidently.

Backend for Frontend (BFF) — Mobile apps and web apps have different data needs. A mobile screen might need a simplified user profile summary; the web dashboard needs the full version with activity history. Instead of having clients make multiple calls or your services maintain multiple response shapes, you create a dedicated gateway layer per frontend. Each BFF cherry-picks and reshapes data for its specific client.

Request Aggregation — Some UI screens need data from three services: user info, recent orders, and loyalty points. Without aggregation, the client makes three serial or parallel calls. With the gateway aggregating, the client makes one call and the gateway fans out to all three services, merges the responses, and returns a single payload. The client pays for one round trip over the public network instead of three, and the fan-out happens over the much faster internal network.

Circuit Breaker at the Gateway — If the inventory service is down, you don't want every request piling up and timing out at 5 seconds each. A circuit breaker tracks failure rates and 'opens' — immediately rejecting requests to a failing service with a fallback response — until the service recovers. The gateway is the perfect place to implement this because it's the chokepoint for all traffic.
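
The BFF idea from the first pattern above fits in a few lines: one upstream response, one shaping function per client type. The field names and the shapeForClient helper are assumptions for illustration; a real BFF would usually be a separate deployable per frontend, not a lookup table.

```javascript
// What user-service returns (illustrative data).
const fullProfile = {
  userId: 'user-99',
  name: 'Amara Osei',
  tier: 'Gold',
  email: 'amara@example.com',
  activityHistory: [
    { at: '2026-03-01', action: 'login' },
    { at: '2026-03-02', action: 'order_placed' },
  ],
};

// One shaper per frontend. Each BFF decides what its client actually needs.
const shapers = new Map([
  // Mobile: small summary, no history, fewer bytes over a slow connection
  ['mobile', (profile) => ({ userId: profile.userId, name: profile.name, tier: profile.tier })],
  // Web dashboard: the full record
  ['web', (profile) => ({ ...profile })],
]);

function shapeForClient(clientType, profile) {
  const shaper = shapers.get(clientType);
  if (!shaper) {
    throw new Error(`Unknown client type: ${clientType}`);
  }
  return shaper(profile);
}

console.log(shapeForClient('mobile', fullProfile));
console.log(Object.keys(shapeForClient('web', fullProfile)));
```

The services behind the gateway keep a single response shape; only the BFF layer knows that mobile and web differ.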

io/thecodeforge/gateway/gateway-aggregation-and-circuit-breaker.js (JavaScript)

// Demonstrates two advanced gateway patterns:
// 1. Response Aggregation (fan-out to multiple services, merge results)
// 2. Circuit Breaker (fail fast instead of cascading timeouts)

// ─── Circuit Breaker State Machine ──────────────────────────────────────────
// States: CLOSED (normal) -> OPEN (failing) -> HALF_OPEN (testing recovery)
const CircuitState = { CLOSED: 'CLOSED', OPEN: 'OPEN', HALF_OPEN: 'HALF_OPEN' };

class CircuitBreaker {
  constructor(serviceName, failureThreshold = 3, recoveryTimeoutMs = 10000) {
    this.serviceName       = serviceName;
    this.failureThreshold  = failureThreshold;  // Open circuit after 3 failures
    this.recoveryTimeout   = recoveryTimeoutMs; // Try again after 10 seconds
    this.state             = CircuitState.CLOSED;
    this.failureCount      = 0;
    this.lastFailureTime   = null;
  }

  async call(serviceCallFn) {
    // If circuit is OPEN, check if recovery timeout has passed
    if (this.state === CircuitState.OPEN) {
      const timeSinceFailure = Date.now() - this.lastFailureTime;
      if (timeSinceFailure < this.recoveryTimeout) {
        // Still in open state — fail fast without calling the service
        console.log(`[CIRCUIT] ${this.serviceName} is OPEN — fast failing`);
        throw new Error(`${this.serviceName} circuit is open — service unavailable`);
      }
      // Timeout elapsed — try one probe request
      console.log(`[CIRCUIT] ${this.serviceName} moving to HALF_OPEN — sending probe`);
      this.state = CircuitState.HALF_OPEN;
    }

    try {
      const result = await serviceCallFn();      // Attempt the actual service call
      this.onSuccess();                          // Reset on success
      return result;
    } catch (error) {
      this.onFailure();                          // Track failure
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = CircuitState.CLOSED;            // Restore normal operation
    console.log(`[CIRCUIT] ${this.serviceName} — circuit CLOSED (healthy)`);
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold || this.state === CircuitState.HALF_OPEN) {
      this.state = CircuitState.OPEN;            // Trip the circuit
      console.log(`[CIRCUIT] ${this.serviceName} — circuit OPENED after ${this.failureCount} failures`);
    }
  }
}

// ─── Simulated Downstream Services ──────────────────────────────────────────
let inventoryCallCount = 0;

async function fetchUserProfile(userId) {
  // Simulates user-service responding successfully
  return { userId, name: 'Amara Osei', tier: 'Gold' };
}

async function fetchRecentOrders(userId) {
  // Simulates order-service responding successfully
  return { userId, orders: [{ orderId: 'ORD-001', total: 149.99, status: 'Shipped' }] };
}

async function fetchInventoryStatus(productId) {
  // Simulates a flaky inventory service that fails intermittently
  inventoryCallCount++;
  if (inventoryCallCount <= 3) {                 // First 3 calls fail
    throw new Error('inventory-service: connection refused');
  }
  return { productId, inStock: true, quantity: 47 };
}

// ─── Circuit Breakers (one per downstream service) ───────────────────────────
const inventoryCircuit = new CircuitBreaker('inventory-service', 3, 5000);

// ─── Gateway Aggregation Handler ────────────────────────────────────────────
// Client makes ONE call to /dashboard — gateway fans out to 3 services
async function handleDashboardRequest(userId, productId) {
  console.log(`\n[GATEWAY] Dashboard request for userId=${userId}, productId=${productId}`);

  // Fan out: run user profile and orders in PARALLEL (faster than serial)
  const [userProfile, recentOrders] = await Promise.all([
    fetchUserProfile(userId),
    fetchRecentOrders(userId)
  ]);

  // Inventory goes through circuit breaker — it's a non-critical enhancement
  let inventoryData = null;
  try {
    inventoryData = await inventoryCircuit.call(() => fetchInventoryStatus(productId));
  } catch (circuitError) {
    // GRACEFUL DEGRADATION: dashboard still works without inventory data
    console.log(`[GATEWAY] Inventory unavailable — degraded response: ${circuitError.message}`);
    inventoryData = { productId, inStock: null, message: 'Inventory temporarily unavailable' };
  }

  // Aggregate all responses into one payload for the client
  return {
    user:      userProfile,
    orders:    recentOrders,
    inventory: inventoryData
  };
}

// ─── Simulate 5 consecutive dashboard requests ──────────────────────────────
(async () => {
  for (let requestNum = 1; requestNum <= 5; requestNum++) {
    const response = await handleDashboardRequest('user-99', 'product-headphones');
    console.log(`[GATEWAY] Response aggregated:`, JSON.stringify(response.inventory));
  }
})();

💡Interview Gold: Graceful Degradation vs. Failure

The circuit breaker above returns a degraded response (inventory null) instead of failing the entire dashboard request. This is called graceful degradation — a core resilience principle. Interviewers love asking 'what happens when service X goes down?' The answer they want is: the gateway catches it at the circuit breaker level, returns a partial response with sensible defaults, and protects the rest of the system from the cascade. Saying 'the whole request fails' is the wrong answer.
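
Graceful degradation also depends on slow calls being cut off, since a breaker only counts a failure once the call actually fails. The sketch below combines a per-call timeout with Promise.allSettled so that one slow or failing service yields a partial response. The helper names (withTimeout, aggregateWithTimeout) and the response shape are assumptions, not part of the example above.

```javascript
// Reject if the promise has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// `calls` maps a field name to a function returning a promise.
// All calls run in parallel; failures become null and are listed in `degraded`.
async function aggregateWithTimeout(calls, timeoutMs) {
  const names = Object.keys(calls);
  const settled = await Promise.allSettled(
    names.map((name) => withTimeout(Promise.resolve().then(() => calls[name]()), timeoutMs, name))
  );

  const data = {};
  const degraded = [];
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      data[names[i]] = result.value;
    } else {
      data[names[i]] = null;       // Sensible default instead of failing everything
      degraded.push(names[i]);     // Tell the client which parts are missing
    }
  });
  return { data, degraded };
}

// Demo: inventory is slower than the 50ms budget, so it is reported as degraded.
(async () => {
  const response = await aggregateWithTimeout({
    user: async () => ({ userId: 'user-99', name: 'Amara Osei' }),
    orders: async () => ({ orders: [{ orderId: 'ORD-001' }] }),
    inventory: () => new Promise((resolve) => setTimeout(() => resolve({ inStock: true }), 200)),
  }, 50);
  console.log(JSON.stringify(response));
})();
```

Note that the timeout only stops the gateway from waiting; the upstream call itself keeps running unless the HTTP client is also given a cancellation signal.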

📊 Production Insight

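When the circuit is open, the breaker rejects calls to the failing service immediately, so callers get a fast fallback instead of a timeout. A breaker that stays closed while the service is failing keeps passing requests through, and that is how failures cascade.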

Without aggregation, a dashboard that needs data from 3 services makes 3 round trips. With aggregation, the client makes 1 round trip.

Rule: Use BFF when mobile and web have different data needs. Use aggregation when a client needs data from multiple services to render one screen.

🎯 Key Takeaway

Circuit breakers belong at the gateway, not just in service-to-service calls. When a downstream service is degraded, fail fast and return a partial response.

Aggregation reduces client complexity and latency. The gateway becomes the orchestrator, not just a proxy.

Rule: A dashboard that makes 5 separate calls from the client is a performance problem. Move aggregation to the gateway.

Gateway Pattern Selection

If: Mobile and web clients need different data shapes from the same services
→ Use: Backend for Frontend (BFF). Deploy separate gateway instances per client type. Mobile BFF returns lightweight payloads; web BFF returns full data.

If: Client needs data from 3+ services to render a single screen (dashboard, profile, home feed)
→ Use: Request Aggregation. Gateway fans out to all services in parallel and merges responses. Client makes 1 call instead of 3+.

If: A non-critical downstream service failing should not break the entire request
→ Use: Circuit Breaker + Graceful Degradation. Return default or cached data for the failing service. Mark it as degraded in the response.

If: Service is critical (payments, auth) and must not be called if already failing
→ Use: A circuit breaker that rejects requests outright while the circuit is open (fail fast with an explicit error). Do NOT fall back to degraded data for critical services.

If: You need to combine the above patterns (BFF + Aggregation + Circuit Breaker)
→ Use: Middleware that applies the patterns in order: BFF selector (per client), Aggregation fan-out, Circuit Breaker per downstream call.

Request Validation: The Silent Gate That Stops Production Fires

You don't want invalid payloads reaching your downstream services. They crash. They corrupt data. They cost you a weekend. Your API Gateway should be the first line of defense, not your business logic. Why push malformed JSON to a microservice when you can reject it at the door? The Gateway validates requests before routing: schema checks, type enforcement, missing field detection. If the request fails, it returns a 400 with a clear error message. Your backend teams thank you. Your on-call phone stops buzzing. This is not optional. Every Gateway worth its salt offers middleware hooks for custom validation. Use them. Define your schemas in OpenAPI or a JSON Schema file. Load them at startup. Validate every inbound payload. Yes, it adds a few milliseconds. No, a malformed request hitting three services before failing is not 'faster'.

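GatewayValidator.java · JAVA

// io.thecodeforge.gateway
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SchemaLocation;
import com.networknt.schema.SpecVersion;
import com.networknt.schema.ValidationMessage;

import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class GatewayValidator {
    // Local result type. Explicit imports (no wildcard) keep it from colliding
    // with any same-named class in the validator library.
    public record ValidationResult(boolean success, List<String> errors) {}

    private final ObjectMapper mapper = new ObjectMapper();
    private final JsonSchema schema;

    public GatewayValidator(String schemaPath) {
        // Load and compile the schema once at startup, not per request.
        JsonSchemaFactory factory = JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V7);
        this.schema = factory.getSchema(SchemaLocation.of(schemaPath));
    }

    public ValidationResult validate(String requestBody) {
        JsonNode body;
        try {
            body = mapper.readTree(requestBody); // the validator works on a parsed tree
        } catch (JsonProcessingException e) {
            return new ValidationResult(false, List.of("request body is not valid JSON"));
        }
        Set<ValidationMessage> errors = schema.validate(body);
        return new ValidationResult(errors.isEmpty(), errors.stream()
                .map(ValidationMessage::getMessage)
                .collect(Collectors.toList()));
    }
}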

Output

ValidationResult(success=false, errors=[$.price: must be a number, $.email: string "bademail" is not a valid email])

⚠ Production Trap:

Don't let teams bypass the Gateway 'for performance'. That's how a single bad mobile client takes down the inventory service. Enforce validation at the Gateway, or enforce it in four different service repos. Pick your poison.

🎯 Key Takeaway

Validate requests at the Gateway, not the service. Fail fast, fail at the edge.

Rate Limiting: Because One Rogue Client Will Wreck Your SLA

Your API is a shared resource. One bad actor (a misconfigured script, a DDoS attack, a junior's infinite loop) can saturate your database connections and knock out legitimate users. Rate limiting is the airbag. Every API Gateway worth deploying supports it. You configure limits per client, per endpoint, per time window. The Gateway tracks counters in memory or Redis. When a client exceeds the limit, it returns HTTP 429 'Too Many Requests' with a Retry-After header. The client's problem stays the client's problem, not your SRE team's. Why is this here and not in each microservice? Because if you implement rate limiting per service, you need N different implementations, and it only takes one bug or one inconsistent configuration to open a hole. Worse, a client can stay under each service's individual limit and still topple the cluster. Centralized rate limiting gives you a single panel to tweak limits without redeploying ten services. Use it.

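RateLimiterFilter.java · JAVA

// io.thecodeforge.gateway.ratelimit
import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.Refill;
import org.springframework.cloud.gateway.filter.GatewayFilter;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.http.HttpStatus;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

import java.time.Duration;

public class RateLimiterFilter implements GatewayFilter {
    // Single in-memory bucket: one global limit for this gateway instance only.
    // Per-client limits need one bucket per client key, and multi-instance
    // deployments need shared state (see the Production Trap below).
    private final Bucket bucket;

    public RateLimiterFilter(int capacity, int tokensPerMinute) {
        Bandwidth limit = Bandwidth.classic(capacity, Refill.greedy(tokensPerMinute, Duration.ofMinutes(1)));
        // Bucket.builder() is the entry point in Bucket4j 7+; it replaces the deprecated Bucket4j.builder().
        this.bucket = Bucket.builder().addLimit(limit).build();
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        if (bucket.tryConsume(1)) {
            return chain.filter(exchange); // token available: forward the request
        }
        // Bucket empty: reject with 429 and tell the client when to retry.
        exchange.getResponse().setStatusCode(HttpStatus.TOO_MANY_REQUESTS);
        exchange.getResponse().getHeaders().add("Retry-After", "60");
        return exchange.getResponse().setComplete();
    }
}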

Output

Client receives 429 with Retry-After: 60 header. Logs show token bucket depleted.

⚠ Production Trap:

Never use in-memory rate limiting for multi-instance Gateway deployments. Each pod has its own counter. Client hits pod A three times, pod B three times—nobody fails. Use Redis-backed distributed limits or your gateway is a sieve.

🎯 Key Takeaway

Rate limit at the Gateway, not the service. Use a distributed backend (Redis) for multi-instance deployments.

● Production incident · POST-MORTEM · severity: high

The Double Rate Limit That Killed Checkout

Symptom

Inventory sold out 75% faster than expected. Angry customers tweeted that the promotion was a 'scam' — they couldn't check out fast enough. The team's monitoring showed each gateway instance counted requests correctly, but per-user rate limiting was double the intended value.

Assumption

The team assumed rate limit state was shared because they'd configured the same limit value (100 per minute) in each gateway's config file. They didn't know each gateway instance maintains its own in-memory counter. The load balancer distributed requests randomly across instances, so a user could send up to 100 requests to instance A and another 100 to instance B without being blocked: 100 per instance, 200 in total.

Root cause

Rate limiting was implemented as a simple in-memory sliding window counter; no shared store such as Redis was used. Each gateway instance tracked its own HashMap<userId, requestCount>. Production deployed two instances for high availability. A user making 150 requests per minute would hit instance A for 80 and instance B for 70, never exceeding either instance's limit of 100. The intended limit of 100 per user was effectively doubled to 200 per user. This is a classic distributed-state failure: the system worked correctly per instance but failed at the aggregate level.

Fix

Moved rate limit state to a shared Redis cluster with atomic INCR and EXPIRE commands. Each request now increments a Redis key rate_limit:{userId}:{minute_window} with TTL=60. The gateway checks the Redis value before forwarding. Added a Lua script to increment and check in one atomic operation to eliminate race conditions between instances. After the fix, a user's 101st request was correctly rejected regardless of which instance received it, even at 50k RPS.
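The fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the team's actual code: CounterStore and InMemoryStore are hypothetical names, and the in-memory store only stands in for Redis so the example runs without a server. The Lua string shows one way to express "increment, set the TTL on first hit, return the count" as a single atomic script; a Redis-backed CounterStore would send it with EVAL.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class SharedWindowRateLimiter {
    /** Lua a Redis-backed store could run via EVAL: KEYS[1] is the counter key, ARGV[1] the TTL. */
    public static final String INCR_WITH_TTL_LUA =
            "local count = redis.call('INCR', KEYS[1]) "
          + "if count == 1 then redis.call('EXPIRE', KEYS[1], ARGV[1]) end "
          + "return count";

    /** Hypothetical abstraction over the shared store. */
    public interface CounterStore {
        long incrementAndGet(String key, int ttlSeconds);
    }

    /** In-memory stand-in for Redis so the sketch is runnable. It ignores the TTL. */
    public static final class InMemoryStore implements CounterStore {
        private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

        @Override
        public long incrementAndGet(String key, int ttlSeconds) {
            return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
        }
    }

    private final CounterStore store;
    private final int limitPerMinute;

    public SharedWindowRateLimiter(CounterStore store, int limitPerMinute) {
        this.store = store;
        this.limitPerMinute = limitPerMinute;
    }

    /** Key format from the fix: rate_limit:{userId}:{minute_window}. */
    public static String keyFor(String userId, long epochSeconds) {
        return "rate_limit:" + userId + ":" + (epochSeconds / 60);
    }

    /** Increment first, then compare: the increment itself is the atomic check. */
    public boolean allow(String userId, long epochSeconds) {
        return store.incrementAndGet(keyFor(userId, epochSeconds), 60) <= limitPerMinute;
    }

    public static void main(String[] args) {
        CounterStore shared = new InMemoryStore();
        SharedWindowRateLimiter instanceA = new SharedWindowRateLimiter(shared, 100);
        SharedWindowRateLimiter instanceB = new SharedWindowRateLimiter(shared, 100);

        int allowed = 0;
        for (int i = 0; i < 150; i++) {
            // Alternate between the two "gateway instances", as a load balancer might.
            SharedWindowRateLimiter gateway = (i % 2 == 0) ? instanceA : instanceB;
            if (gateway.allow("user-42", 1_700_000_000L)) allowed++;
        }
        System.out.println("allowed=" + allowed); // prints allowed=100
    }
}
```

Because both instances increment the same key, the 101st request in a window is rejected no matter which instance receives it. Give each instance its own InMemoryStore and the same loop lets all 150 requests through, which is the incident in miniature.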

Key lesson

In-memory rate limiting only works with one gateway instance. The moment you run two (which you must for HA), each instance has its own counter and users get effectively multiplied limits.
Any state shared across gateway instances must live in an external store — Redis is the industry standard. Use atomic operations (INCR + EXPIRE) or Lua scripts.
Load test your rate limiter with multiple gateway instances before a major sale. Simulate a single user's traffic spread across all instances, for example with round-robin routing or a test harness that targets each instance in turn.
Monitor per-user rate limit rejections at the load balancer level, not per gateway instance. A rejection rate of 0 per instance can hide aggregate overages.

Production debug guide: Symptom → Action mapping for common gateway failures (5 entries)

Symptom · 01

Rate limits not applying correctly — some users get through way more requests than allowed

→

Fix

Check number of gateway instances. If >1, rate limit state is likely in-memory per instance. Verify rate limiter uses a shared Redis or equivalent. Look for Map<userId, count> in code — that's the smoking gun. Switch to Redis with atomic INCR operations.

Symptom · 02

Requests timing out sporadically — some succeed, some fail with 504 Gateway Timeout

→

Fix

Check downstream service health and gateway timeout settings. Most likely a slow service is absorbing all gateway threads. Implement circuit breaker that fails fast after 3 consecutive failures. Set timeout_ms per route shorter than client timeout so gateway fails fast, not slow.

Symptom · 03

Tracing fails — you have logs across services but can't correlate a single request

→

Fix

Gateway is not generating or propagating trace IDs. Add middleware that generates a UUID on every incoming request and injects it as X-Trace-Id and X-Request-Id headers to all downstream calls. Each service must log that ID. Now you can grep across all logs.

Symptom · 04

Authentication works inconsistently — some requests to same endpoint succeed, others fail with 401

→

Fix

Multiple gateway instances may have different JWT validation configurations or cached public keys. Verify all instances have identical config. Check if JWT is being stripped before reaching auth layer. Add X-Auth-User header after validation so downstream services trust it.

Symptom · 05

Circuit breaker stuck in OPEN state — service is healthy but gateway still fast-failing

→

Fix

Circuit breaker recovery timeout may be too long. When circuit is HALF_OPEN, check that the probe request is actually reaching the service and succeeding. Some circuit breakers require successful probe before closing; others require manual reset. Implement half-open retry logic.
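The half-open behaviour described for Symptom 05 can be sketched as a small state machine. This is a minimal illustration, not the breaker from any particular gateway product: the class name and the 5-second recovery timeout in the usage are assumptions, while the threshold of 3 consecutive failures follows the guidance in Symptom 02. The clock is injected so the recovery timeout can be exercised without sleeping.

```java
import java.util.function.LongSupplier;

public class GatewayCircuitBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long recoveryTimeoutMs;
    private final LongSupplier clock; // injectable time source, in milliseconds

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAtMs = 0;

    public GatewayCircuitBreaker(int failureThreshold, long recoveryTimeoutMs, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.recoveryTimeoutMs = recoveryTimeoutMs;
        this.clock = clock;
    }

    /** Returns true if the request may be forwarded downstream. */
    public synchronized boolean allowRequest() {
        if (state == State.OPEN && clock.getAsLong() - openedAtMs >= recoveryTimeoutMs) {
            state = State.HALF_OPEN; // timeout elapsed: let exactly one probe through
            return true;
        }
        // CLOSED forwards; OPEN and HALF_OPEN (probe already in flight) fail fast.
        return state == State.CLOSED;
    }

    /** A successful call, including a successful probe, closes the circuit. */
    public synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** A failed probe reopens immediately; otherwise open once the threshold is reached. */
    public synchronized void recordFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAtMs = clock.getAsLong();
        }
    }

    public synchronized State state() {
        return state;
    }

    public static void main(String[] args) {
        long[] now = {0};
        GatewayCircuitBreaker breaker = new GatewayCircuitBreaker(3, 5_000, () -> now[0]);

        for (int i = 0; i < 3; i++) breaker.recordFailure();
        System.out.println(breaker.state());        // OPEN
        System.out.println(breaker.allowRequest()); // false: still inside the recovery timeout

        now[0] = 5_000;
        System.out.println(breaker.allowRequest()); // true: this request is the probe
        breaker.recordSuccess();
        System.out.println(breaker.state());        // CLOSED
    }
}
```

If a breaker stays OPEN against a healthy service, the usual suspects map onto this sketch: the recovery timeout is too long, the probe never reaches the service, or nothing calls the equivalent of recordSuccess when the probe returns.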

★ API Gateway Quick Debug Cheat Sheet

Fast diagnostics for production gateway issues. Run these commands to confirm the root cause before changing config.

Rate limiter not working as expected: users exceeding limits

Immediate action

Check whether multiple gateway instances are running and whether rate limit state is shared

Commands

kubectl get pods -l app=gateway --no-headers | wc -l # count instances (without --no-headers the header row is counted too)

redis-cli --scan --pattern 'rate_limit:*' | wc -l # count rate limit keys in Redis (SCAN avoids blocking the server the way KEYS can)

Fix now

If gateway instances > 1 and Redis keys are missing, rate limiter is in-memory. Deploy Redis and switch to atomic INCR+EXPIRE in the gateway. If Redis keys exist but limits are still wrong, check TTL and window alignment.

504 Gateway Timeout errors: some requests succeed, some time out

No trace IDs in downstream logs: can't correlate requests

Circuit breaker open but service is healthy

Authentication passes on some instances, fails on others
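For the missing trace ID symptom (Symptom 03 in the debug guide above), the middleware step might look like the sketch below. It models headers as a plain map to stay framework-neutral; the class and method names are hypothetical. Reusing an incoming X-Trace-Id when one is present is an assumption added here, since the guide only says to generate a UUID per request.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class TraceIdMiddleware {
    public static final String TRACE_HEADER = "X-Trace-Id";
    public static final String REQUEST_HEADER = "X-Request-Id";

    /** Returns a copy of the incoming headers with both correlation headers set to the same ID. */
    public static Map<String, String> withTraceHeaders(Map<String, String> incoming) {
        Map<String, String> outgoing = new LinkedHashMap<>(incoming);

        // Assumption: keep an upstream trace ID if present, otherwise mint a new UUID.
        // Simplification: lookups are case-sensitive here; real HTTP header names are not.
        String existing = incoming.get(TRACE_HEADER);
        String traceId = (existing == null || existing.isBlank())
                ? UUID.randomUUID().toString()
                : existing;

        outgoing.put(TRACE_HEADER, traceId);
        outgoing.put(REQUEST_HEADER, traceId);
        return outgoing;
    }

    public static void main(String[] args) {
        Map<String, String> downstream = withTraceHeaders(Map.of("Accept", "application/json"));
        // Every downstream service logs this ID, so one grep finds the whole request path.
        System.out.println(downstream.get(TRACE_HEADER));
    }
}
```

The original headers pass through untouched, and the input map is never mutated, so the same function can be applied once per downstream call.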

API Gateway vs Reverse Proxy

Aspect	API Gateway	Simple Reverse Proxy (e.g. Nginx)
Primary role	Cross-cutting concerns + smart routing	Traffic forwarding + SSL termination
Authentication	Built-in JWT/OAuth/API key validation	Not natively — requires plugins or custom Lua
Rate limiting	Per-client, per-route, configurable policies	Basic — IP-based, limited granularity
Request transformation	Yes — reshape payloads, add/strip headers	Limited — mostly header manipulation
Service aggregation (BFF)	Yes — fan out and merge multiple service calls	No — 1:1 proxy only
Circuit breaking	Yes — native in Kong, AWS, Apigee	No — must use Nginx+ or external sidecar
Developer portal / API docs	Yes — most managed gateways include this	No
Operational complexity	Higher: another critical layer to deploy, configure, and keep highly available	Lower: battle-hardened, config is simple
Best for	Microservices with many clients and policies	Simple routing, static content, TLS offload

⚙ Quick Reference

5 commands from this guide

File	Command / Code	Purpose
io/thecodeforge/gateway/api-gateway-config.yaml	gateway:	The Core Job of an API Gateway
io/thecodeforge/gateway/gateway-middleware-pipeline.js	const express = require('express');	The Five Components Every API Gateway Must Have
io/thecodeforge/gateway/gateway-aggregation-and-circuit-breaker.js	const CircuitState = { CLOSED: 'CLOSED', OPEN: 'OPEN', HALF_OPEN: 'HALF_OPEN' };	Gateway Patterns You'll Actually Use
GatewayValidator.java	public class GatewayValidator {	Request Validation
RateLimiterFilter.java	public class RateLimiterFilter implements GatewayFilter {	Rate Limiting

Key takeaways

An API Gateway is not just a proxy: it's a composition of five distinct components: router, auth layer, rate limiter, load balancer, and request/response transformer. Knowing each one separately is what makes you credible in system design interviews.

Authenticate once at the gateway, not in every service. Downstream services should trust the identity headers the gateway injects (X-Auth-User), not re-validate tokens themselves. This eliminates duplicated auth logic across your entire microservice fleet.

Circuit breakers belong at the gateway, not just in service-to-service calls. When a downstream service is degraded, fail fast and return a partial (gracefully degraded) response. Letting a slow service absorb all your gateway threads causes cascading failures that are much harder to debug.

Rate limiter state must live in a shared store (Redis) the moment you have more than one gateway instance. In-memory rate limiting works perfectly in development and fails silently in production, because each instance tracks its own counters independently.

BFF (Backend for Frontend) is the evolution of the single gateway: separate gateway instances per client type (mobile, web, third-party). Each tailors responses to its client, reducing latency and payload size at the cost of operational complexity.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

How does an API Gateway differ from a Service Mesh like Istio, and when ...

Q02SENIOR

Walk me through what happens — component by component — when an unauthen...

Q03SENIOR

If your rate limiter is deployed across three gateway instances and a us...

Q04SENIOR

What is the Backend for Frontend (BFF) pattern, and when would you use i...

Q01 of 04SENIOR

How does an API Gateway differ from a Service Mesh like Istio, and when would you choose one over the other?

ANSWER

An API Gateway operates at Layer 7 (HTTP/HTTPS) as an ingress point for external clients. It handles north-south traffic: browser, mobile app, third-party API calls to your system. Service mesh (Istio, Linkerd) operates as a sidecar proxy on every pod, handling east-west traffic between internal services. Service mesh provides service discovery, mTLS encryption, circuit breaking, and observability for internal communication. Choose API Gateway when: you need authentication, rate limiting, or request transformation for external clients; or you want a single entry point for all client traffic. Choose Service Mesh when: you have many internal services needing mTLS; you want fine-grained retry/timeout policies between services; or you need traffic splitting for canary deployments. The two are complementary — you can (and should) use both: an API Gateway for north-south traffic and a service mesh for east-west traffic.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Lessons pulled from things that broke in production.

✓ Verified

production tested

July 18, 2026

last updated
