
API Gateway Explained: Components, Patterns and Real-World Design

In Plain English 🔥
Imagine a massive hotel. Guests don't wander into the kitchen, the laundry room, or the staff quarters — they talk to the front desk, and the front desk figures out who to send them to. An API Gateway is that front desk for your backend services. Every request from the outside world hits the gateway first, and the gateway decides where it goes, whether the person is allowed in, and how fast they can make requests.

Every time you open Uber and request a ride, your app doesn't talk to seventeen different backend services directly. It talks to one door — the API Gateway — and that gateway orchestrates the chaos behind the scenes. In a world where a single product might run on dozens of microservices, having a clean, single entry point isn't a luxury; it's what keeps the whole system from falling apart under real traffic.

The Core Job of an API Gateway: One Door, Many Rooms

Before microservices became the norm, you had a monolith — one big application that handled everything. A client made one request, the app handled it, done. But once you split that monolith into ten, twenty, or fifty services (authentication, payments, user profiles, notifications…), clients suddenly need to know where everything lives. That's chaos.

An API Gateway solves this by being the single entry point for all client traffic. It receives every inbound request, applies a set of cross-cutting concerns (auth, rate limiting, logging), and then routes the request to the right downstream service. The client only ever needs to know one URL.

The key insight here is that the gateway isn't just a proxy that forwards traffic blindly. It actively transforms, validates, and enriches requests before they ever touch your services. Think of it as a bouncer, a receptionist, and a traffic cop rolled into one.

api-gateway-config.yaml · YAML
# Example: AWS API Gateway / Kong-style declarative configuration
# This shows how a gateway maps external routes to internal services

gateway:
  name: ecommerce-gateway
  base_url: https://api.shopexample.com

routes:
  # Route 1: Public product catalog — no auth required
  - path: /products
    method: GET
    upstream_service: http://product-service:8001/api/products
    auth_required: false          # Public endpoint, anyone can browse
    rate_limit:
      requests_per_minute: 300    # Still throttled to prevent scraping

  # Route 2: Place an order — must be authenticated
  - path: /orders
    method: POST
    upstream_service: http://order-service:8002/api/orders
    auth_required: true           # Gateway checks JWT before forwarding
    rate_limit:
      requests_per_minute: 30     # Stricter limit on write operations
    timeout_ms: 5000              # Gateway cancels request after 5s

  # Route 3: User profile — auth + request transformation
  - path: /users/{userId}/profile
    method: GET
    upstream_service: http://user-service:8003/api/profile
    auth_required: true
    transform_request:
      add_header:
        X-Internal-Request-Id: "${generate_uuid}"   # Gateway injects trace ID
        X-Caller-Service: "api-gateway"              # Downstream knows origin
    rate_limit:
      requests_per_minute: 60
▶ Output
# When a client hits POST /orders without a token:
HTTP 401 Unauthorized
{ "error": "Missing or invalid Authorization header" }

# When a client exceeds 30 req/min on /orders:
HTTP 429 Too Many Requests
{ "error": "Rate limit exceeded", "retry_after_seconds": 12 }

# When a valid request reaches order-service, it sees these headers:
X-Internal-Request-Id: 7f3a9c21-4d12-4e88-b1f0-9c3e7d8a2b56
X-Caller-Service: api-gateway
Authorization: (stripped — gateway already validated it)
🔥 Why This Matters: Notice that the Authorization header is stripped before reaching the downstream service in some patterns. Your internal services can then trust that any request arriving from the gateway is already authenticated — they don't each need to implement JWT validation themselves. This is called the 'trusted network' pattern, and it cuts duplicated auth logic across every service.
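On the service side, the trusted-network pattern can be a few lines of middleware. Here is a minimal sketch — the `X-Auth-User` header name mirrors the config above, but the function and its wiring are illustrative, not taken from any specific gateway product:

```javascript
// Sketch: middleware for an internal service that trusts the identity header
// injected by the gateway instead of re-validating the JWT itself.
// Assumption: network isolation (or mTLS) keeps untrusted callers out — the
// header alone is spoofable, so the network boundary does the real enforcement.

function trustGatewayIdentity(req, res, next) {
  const userId = req.headers['x-auth-user'];    // Set by the gateway after JWT validation
  if (!userId) {
    // For authenticated routes the gateway always sets this header;
    // its absence means the request somehow bypassed the gateway.
    res.statusCode = 403;
    return res.end(JSON.stringify({ error: 'Request did not come through the gateway' }));
  }
  req.userId = userId;                          // Verified identity — no JWT parsing needed here
  next();
}
```

The payoff is that a token-format change (say, migrating JWT libraries) touches one component — the gateway — instead of every service.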

The Five Components Every API Gateway Must Have

A gateway is more than a reverse proxy. It's a composition of distinct components, each with a specific job. Understanding each one separately is how you answer system design questions confidently — and how you avoid misconfiguring production systems.

1. Request Router — Maps incoming URLs and HTTP methods to upstream services. This is the core. Without routing, nothing works.

2. Authentication & Authorization Layer — Validates identity (AuthN) and checks permissions (AuthZ). The gateway is the ideal place for this because it's centralised. JWT validation, OAuth token introspection, API key checks — all happen here before requests go anywhere.

3. Rate Limiter & Throttler — Protects your services from being overwhelmed. Rate limiting says 'you get 100 requests per minute.' Throttling says 'requests beyond that get queued or slowed down, not just rejected.'

4. Load Balancer — When multiple instances of a service are running, the gateway distributes traffic across them. Round-robin, least-connections, and weighted routing are common strategies.

5. Request/Response Transformer — The gateway can reshape payloads in both directions. Strip sensitive fields from responses, add internal headers, translate between REST and gRPC, or aggregate responses from multiple services into one (the Backend for Frontend pattern).
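Round-robin is the simplest balancing strategy; least-connections is often a better default when requests have uneven cost, because it routes around an instance stuck on a slow request. A minimal sketch of a least-connections picker (the class and its API are illustrative, not a real gateway's interface):

```javascript
// Sketch: least-connections selection — route each request to the instance
// currently handling the fewest in-flight requests.

class LeastConnectionsBalancer {
  constructor(instances) {
    // Track in-flight request count per instance URL
    this.inFlight = new Map(instances.map(url => [url, 0]));
  }

  acquire() {
    // Pick the instance with the fewest active connections
    let best = null;
    for (const [url, count] of this.inFlight) {
      if (best === null || count < this.inFlight.get(best)) best = url;
    }
    this.inFlight.set(best, this.inFlight.get(best) + 1);
    return best;
  }

  release(url) {
    // Call this when the proxied request completes
    this.inFlight.set(url, this.inFlight.get(url) - 1);
  }
}
```

Under perfectly even load this behaves like round-robin; the difference appears the moment one instance slows down and its in-flight count stays high.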

gateway-middleware-pipeline.js · JAVASCRIPT
// Simulating the middleware pipeline of an API Gateway in Node.js
// This shows HOW each component processes a request in sequence
// Real gateways (Kong, AWS API GW, Nginx) do this in compiled code,
// but the pipeline logic is identical.

const express = require('express');
const { v4: uuidv4 } = require('uuid');
const app = express();

app.use(express.json());

// ─── COMPONENT 1: Request Logger ────────────────────────────────────────────
// Every request gets a unique trace ID the moment it arrives
app.use((req, res, next) => {
  req.traceId = uuidv4();                         // Unique ID for distributed tracing
  req.arrivalTime = Date.now();
  console.log(`[GATEWAY] Incoming: ${req.method} ${req.path} | traceId=${req.traceId}`);
  next();
});

// ─── COMPONENT 2: Authentication Layer ──────────────────────────────────────
// Validate the token ONCE here — downstream services don't need to
const PUBLIC_ROUTES = ['/products'];              // Routes that skip auth

function authMiddleware(req, res, next) {
  if (PUBLIC_ROUTES.includes(req.path)) {
    return next();                                // Skip auth for public routes
  }

  const authHeader = req.headers['authorization'];
  if (!authHeader || !authHeader.startsWith('Bearer ')) {
    return res.status(401).json({
      error: 'Authentication required',
      traceId: req.traceId                        // Always return traceId for debugging
    });
  }

  const token = authHeader.split(' ')[1];
  // In production: verify JWT signature, check expiry, decode claims
  // For demo: we accept any token that starts with 'valid-'
  if (!token.startsWith('valid-')) {
    return res.status(401).json({ error: 'Invalid token', traceId: req.traceId });
  }

  req.authenticatedUserId = token.replace('valid-', '');  // Extracted from JWT claims
  req.headers['x-auth-user'] = req.authenticatedUserId;   // Injected into the request forwarded downstream
  next();
}

app.use(authMiddleware);

// ─── COMPONENT 3: Rate Limiter ───────────────────────────────────────────────
// Simple in-memory rate limiter (production: use Redis for distributed state)
const requestCounts = {};                         // ip -> { count, windowStart }
const RATE_LIMIT = 5;                             // Max 5 requests per window
const WINDOW_MS  = 60 * 1000;                     // 60-second window

function rateLimiter(req, res, next) {
  const clientIp = req.ip;
  const now = Date.now();

  if (!requestCounts[clientIp] || now - requestCounts[clientIp].windowStart > WINDOW_MS) {
    requestCounts[clientIp] = { count: 1, windowStart: now };  // Reset window
    return next();
  }

  requestCounts[clientIp].count++;

  if (requestCounts[clientIp].count > RATE_LIMIT) {
    const retryAfter = Math.ceil((WINDOW_MS - (now - requestCounts[clientIp].windowStart)) / 1000);
    res.setHeader('Retry-After', retryAfter);     // Tell client when to retry
    return res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfterSeconds: retryAfter,
      traceId: req.traceId
    });
  }

  next();
}

app.use(rateLimiter);

// ─── COMPONENT 4: Router + Load Balancer ────────────────────────────────────
// Round-robin across multiple instances of a service
const productServiceInstances = [
  'http://product-service-1:8001',
  'http://product-service-2:8001',
  'http://product-service-3:8001'
];
let roundRobinIndex = 0;

function getNextProductInstance() {
  const instance = productServiceInstances[roundRobinIndex];
  roundRobinIndex = (roundRobinIndex + 1) % productServiceInstances.length;  // Wrap around
  return instance;
}

// Public product route — no auth needed
app.get('/products', (req, res) => {
  const targetInstance = getNextProductInstance();
  console.log(`[GATEWAY] Routing GET /products -> ${targetInstance} | traceId=${req.traceId}`);

  // In production: forward the actual HTTP request using axios/node-fetch
  // For demo: simulate the downstream response
  res.json({
    _meta: { routedTo: targetInstance, traceId: req.traceId },
    products: [{ id: 1, name: 'Wireless Headphones', price: 79.99 }]
  });
});

// Protected order route
app.post('/orders', (req, res) => {
  console.log(`[GATEWAY] Routing POST /orders for user=${req.authenticatedUserId} | traceId=${req.traceId}`);

  // ─── COMPONENT 5: Request Transformer ─────────────────────────────────────
  // Strip the raw Authorization header — order-service trusts X-Auth-User instead
  const internalPayload = {
    ...req.body,
    requestedByUserId: req.authenticatedUserId,   // Inject verified identity
    gatewayTraceId: req.traceId                   // Inject trace ID for observability
    // Note: we do NOT forward req.headers.authorization to internal services
  };

  console.log('[GATEWAY] Transformed payload for order-service:', internalPayload);

  res.status(201).json({
    message: 'Order created',
    orderId: `ORD-${Date.now()}`,
    traceId: req.traceId
  });
});

app.listen(3000, () => {
  console.log('[GATEWAY] API Gateway running on port 3000');
});
▶ Output
[GATEWAY] API Gateway running on port 3000

# Request 1: GET /products (no auth needed)
[GATEWAY] Incoming: GET /products | traceId=a1b2c3d4-...
[GATEWAY] Routing GET /products -> http://product-service-1:8001 | traceId=a1b2c3d4-...
Response: { "_meta": { "routedTo": "http://product-service-1:8001", "traceId": "a1b2c3d4-..." }, "products": [...] }

# Request 2: POST /orders without token
[GATEWAY] Incoming: POST /orders | traceId=e5f6a7b8-...
Response: HTTP 401 { "error": "Authentication required", "traceId": "e5f6a7b8-..." }

# Request 3: POST /orders with valid token
[GATEWAY] Incoming: POST /orders | traceId=c9d0e1f2-...
[GATEWAY] Routing POST /orders for user=user-42 | traceId=c9d0e1f2-...
[GATEWAY] Transformed payload: { "item": "headphones", "requestedByUserId": "user-42", "gatewayTraceId": "c9d0e1f2-..." }
Response: HTTP 201 { "message": "Order created", "orderId": "ORD-1718234567890", "traceId": "c9d0e1f2-..." }
⚠️ Watch Out: In-Memory Rate Limiting Breaks at Scale
The in-memory rate limiter above works perfectly on a single gateway instance. The moment you run two gateway instances (which you will in production for high availability), each instance has its own counter — so a user effectively gets double their limit. Always back your rate limiter with a shared store like Redis, using atomic INCR + EXPIRE commands. This is one of the most common production bugs teams hit after their first gateway scaling event.
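The distributed version replaces the in-memory map with atomic Redis commands. Below is a sketch of the fixed-window pattern, run here against a tiny in-process stand-in for Redis so the example is self-contained; with a real client the two calls would be `INCR` and `EXPIRE` against a shared server, ideally pipelined or wrapped in a Lua script so they stay atomic:

```javascript
// Sketch: distributed fixed-window rate limiting via Redis INCR + EXPIRE.
// 'redis' below is a minimal in-process stand-in so this runs standalone;
// in production every gateway instance would talk to the same Redis server.

const store = new Map();                        // key -> { value, expiresAt }

const redis = {
  incr(key) {
    const entry = store.get(key);
    if (!entry || entry.expiresAt <= Date.now()) {
      store.set(key, { value: 1, expiresAt: Infinity });
      return 1;                                 // First hit in this window
    }
    return ++entry.value;
  },
  expire(key, seconds) {
    const entry = store.get(key);
    if (entry) entry.expiresAt = Date.now() + seconds * 1000;
  }
};

const RATE_LIMIT = 100;                         // Requests allowed per window
const WINDOW_SECONDS = 60;

function isAllowed(clientId) {
  // Key includes the window number, so counters roll over automatically
  const windowNum = Math.floor(Date.now() / (WINDOW_SECONDS * 1000));
  const key = `ratelimit:${clientId}:${windowNum}`;
  const count = redis.incr(key);                // Atomic on a real Redis server
  if (count === 1) {
    redis.expire(key, WINDOW_SECONDS);          // Clean up the key after the window ends
  }
  return count <= RATE_LIMIT;
}
```

Because INCR executes atomically on the Redis server, three gateway instances incrementing the same key cannot over-count — which is exactly the failure mode of the in-memory version.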

Gateway Patterns You'll Actually Use: BFF, Aggregation, and Circuit Breaking

Knowing the components is step one. Knowing the patterns built on top of them is what separates a junior engineer from someone who can design systems confidently.

Backend for Frontend (BFF) — Mobile apps and web apps have different data needs. A mobile screen might need a simplified user profile summary; the web dashboard needs the full version with activity history. Instead of having clients make multiple calls or your services maintain multiple response shapes, you create a dedicated gateway layer per frontend. Each BFF cherry-picks and reshapes data for its specific client.
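In code, a BFF is often just a thin reshaping layer per client type. A minimal sketch — the upstream profile shape and field names here are illustrative, not from a real API:

```javascript
// Sketch: two BFF view functions reshaping the same upstream profile
// for different frontends. Field names are illustrative.

const fullProfile = {
  userId: 'user-42',
  name: 'Amara Osei',
  email: 'amara@example.com',
  tier: 'Gold',
  activityHistory: [],                 // Would hold many entries in a real system
  preferences: { newsletter: true, theme: 'dark' }
};

// Mobile BFF: tiny payload — only what the profile screen actually renders
function mobileProfileView(profile) {
  return { name: profile.name, tier: profile.tier };
}

// Web BFF: full dashboard view, plus a derived field the web UI wants
function webProfileView(profile) {
  return { ...profile, activityCount: profile.activityHistory.length };
}
```

Each frontend team owns its BFF, so the mobile payload can shrink without a web release, and vice versa.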

Request Aggregation — Some UI screens need data from three services: user info, recent orders, and loyalty points. Without aggregation, the client makes three serial or parallel calls. With the gateway aggregating, the client makes one call and the gateway fans out to all three services, merges the responses, and returns a single payload. Latency drops dramatically.

Circuit Breaker at the Gateway — If the inventory service is down, you don't want every request piling up and timing out at 5 seconds each. A circuit breaker tracks failure rates and 'opens' — immediately rejecting requests to a failing service with a fallback response — until the service recovers. The gateway is the perfect place to implement this because it's the chokepoint for all traffic.

gateway-aggregation-and-circuit-breaker.js · JAVASCRIPT
// Demonstrates two advanced gateway patterns:
// 1. Response Aggregation (fan-out to multiple services, merge results)
// 2. Circuit Breaker (fail fast instead of cascading timeouts)

// ─── Circuit Breaker State Machine ──────────────────────────────────────────
// States: CLOSED (normal) -> OPEN (failing) -> HALF_OPEN (testing recovery)
const CircuitState = { CLOSED: 'CLOSED', OPEN: 'OPEN', HALF_OPEN: 'HALF_OPEN' };

class CircuitBreaker {
  constructor(serviceName, failureThreshold = 3, recoveryTimeoutMs = 10000) {
    this.serviceName       = serviceName;
    this.failureThreshold  = failureThreshold;  // Open circuit after 3 failures
    this.recoveryTimeout   = recoveryTimeoutMs; // Try again after 10 seconds
    this.state             = CircuitState.CLOSED;
    this.failureCount      = 0;
    this.lastFailureTime   = null;
  }

  async call(serviceCallFn) {
    // If circuit is OPEN, check if recovery timeout has passed
    if (this.state === CircuitState.OPEN) {
      const timeSinceFailure = Date.now() - this.lastFailureTime;
      if (timeSinceFailure < this.recoveryTimeout) {
        // Still in open state — fail fast without calling the service
        console.log(`[CIRCUIT] ${this.serviceName} is OPEN — fast failing`);
        throw new Error(`${this.serviceName} circuit is open — service unavailable`);
      }
      // Timeout elapsed — try one probe request
      console.log(`[CIRCUIT] ${this.serviceName} moving to HALF_OPEN — sending probe`);
      this.state = CircuitState.HALF_OPEN;
    }

    try {
      const result = await serviceCallFn();      // Attempt the actual service call
      this.onSuccess();                          // Reset on success
      return result;
    } catch (error) {
      this.onFailure();                          // Track failure
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = CircuitState.CLOSED;            // Restore normal operation
    console.log(`[CIRCUIT] ${this.serviceName} — circuit CLOSED (healthy)`);
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold || this.state === CircuitState.HALF_OPEN) {
      this.state = CircuitState.OPEN;            // Trip the circuit
      console.log(`[CIRCUIT] ${this.serviceName} — circuit OPENED after ${this.failureCount} failures`);
    }
  }
}

// ─── Simulated Downstream Services ──────────────────────────────────────────
let inventoryCallCount = 0;

async function fetchUserProfile(userId) {
  // Simulates user-service responding successfully
  return { userId, name: 'Amara Osei', tier: 'Gold' };
}

async function fetchRecentOrders(userId) {
  // Simulates order-service responding successfully
  return { userId, orders: [{ orderId: 'ORD-001', total: 149.99, status: 'Shipped' }] };
}

async function fetchInventoryStatus(productId) {
  // Simulates a flaky inventory service that fails intermittently
  inventoryCallCount++;
  if (inventoryCallCount <= 3) {                 // First 3 calls fail
    throw new Error('inventory-service: connection refused');
  }
  return { productId, inStock: true, quantity: 47 };
}

// ─── Circuit Breakers (one per downstream service) ───────────────────────────
const inventoryCircuit = new CircuitBreaker('inventory-service', 3, 5000);

// ─── Gateway Aggregation Handler ────────────────────────────────────────────
// Client makes ONE call to /dashboard — gateway fans out to 3 services
async function handleDashboardRequest(userId, productId) {
  console.log(`\n[GATEWAY] Dashboard request for userId=${userId}, productId=${productId}`);

  // Fan out: run user profile and orders in PARALLEL (faster than serial)
  const [userProfile, recentOrders] = await Promise.all([
    fetchUserProfile(userId),
    fetchRecentOrders(userId)
  ]);

  // Inventory goes through circuit breaker — it's a non-critical enhancement
  let inventoryData = null;
  try {
    inventoryData = await inventoryCircuit.call(() => fetchInventoryStatus(productId));
  } catch (circuitError) {
    // GRACEFUL DEGRADATION: dashboard still works without inventory data
    console.log(`[GATEWAY] Inventory unavailable — degraded response: ${circuitError.message}`);
    inventoryData = { productId, inStock: null, message: 'Inventory temporarily unavailable' };
  }

  // Aggregate all responses into one payload for the client
  return {
    user:      userProfile,
    orders:    recentOrders,
    inventory: inventoryData
  };
}

// ─── Simulate 5 consecutive dashboard requests ──────────────────────────────
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

(async () => {
  for (let requestNum = 1; requestNum <= 5; requestNum++) {
    const response = await handleDashboardRequest('user-99', 'product-headphones');
    console.log(`[GATEWAY] Response aggregated:`, JSON.stringify(response.inventory));
    if (requestNum === 4) {
      await sleep(5500);   // Wait past the 5s recovery timeout so the HALF_OPEN probe can fire
    }
  }
})();
▶ Output
[GATEWAY] Dashboard request for userId=user-99, productId=product-headphones
[GATEWAY] Inventory unavailable — degraded response: inventory-service: connection refused
[GATEWAY] Response aggregated: {"productId":"product-headphones","inStock":null,"message":"Inventory temporarily unavailable"}

[GATEWAY] Dashboard request for userId=user-99, productId=product-headphones
[GATEWAY] Inventory unavailable — degraded response: inventory-service: connection refused
[GATEWAY] Response aggregated: {"productId":"product-headphones","inStock":null,"message":"Inventory temporarily unavailable"}

[GATEWAY] Dashboard request for userId=user-99, productId=product-headphones
[CIRCUIT] inventory-service — circuit OPENED after 3 failures
[GATEWAY] Inventory unavailable — degraded response: inventory-service: connection refused
[GATEWAY] Response aggregated: {"productId":"product-headphones","inStock":null,"message":"Inventory temporarily unavailable"}

[GATEWAY] Dashboard request for userId=user-99, productId=product-headphones
[CIRCUIT] inventory-service is OPEN — fast failing
[GATEWAY] Inventory unavailable — degraded response: inventory-service circuit is open — service unavailable
[GATEWAY] Response aggregated: {"productId":"product-headphones","inStock":null,"message":"Inventory temporarily unavailable"}

[GATEWAY] Dashboard request for userId=user-99, productId=product-headphones
[CIRCUIT] inventory-service moving to HALF_OPEN — sending probe
[CIRCUIT] inventory-service — circuit CLOSED (healthy)
[GATEWAY] Response aggregated: {"productId":"product-headphones","inStock":true,"quantity":47}
⚠️ Interview Gold: Graceful Degradation vs. Failure
The circuit breaker above returns a degraded response (inventory null) instead of failing the entire dashboard request. This is called graceful degradation — a core resilience principle. Interviewers love asking 'what happens when service X goes down?' The answer they want is: the gateway catches it at the circuit breaker level, returns a partial response with sensible defaults, and protects the rest of the system from the cascade. Saying 'the whole request fails' is the wrong answer.
| Aspect | API Gateway | Simple Reverse Proxy (e.g. Nginx) |
|---|---|---|
| Primary role | Cross-cutting concerns + smart routing | Traffic forwarding + SSL termination |
| Authentication | Built-in JWT/OAuth/API key validation | Not natively — requires plugins or custom Lua |
| Rate limiting | Per-client, per-route, configurable policies | Basic — IP-based, limited granularity |
| Request transformation | Yes — reshape payloads, add/strip headers | Limited — mostly header manipulation |
| Service aggregation (BFF) | Yes — fan out and merge multiple service calls | No — 1:1 proxy only |
| Circuit breaking | Yes — native in Kong, AWS, Apigee | No — must use Nginx+ or external sidecar |
| Developer portal / API docs | Yes — most managed gateways include this | No |
| Operational complexity | Higher — another stateful layer to manage | Lower — battle-hardened, config is simple |
| Best for | Microservices with many clients and policies | Simple routing, static content, TLS offload |

🎯 Key Takeaways

  • An API Gateway is not just a proxy — it's a composition of five distinct components: router, auth layer, rate limiter, load balancer, and request/response transformer. Knowing each one separately is what makes you credible in system design interviews.
  • Authenticate once at the gateway, not in every service. Downstream services should trust the identity headers the gateway injects (X-Auth-User) — not re-validate tokens themselves. This eliminates duplicated auth logic across your entire microservice fleet.
  • Circuit breakers belong at the gateway, not just in service-to-service calls. When a downstream service is degraded, fail fast and return a partial (gracefully degraded) response. Letting a slow service absorb all your gateway threads causes cascading failures that are much harder to debug.
  • Rate limiter state must live in a shared store (Redis) the moment you have more than one gateway instance. In-memory rate limiting works perfectly in development and fails silently in production — because each instance tracks its own counters independently.

⚠ Common Mistakes to Avoid

  • Mistake 1: Putting business logic inside the gateway — Symptom: your gateway config starts containing if/else conditions about order values or user tiers, and deployments require gateway restarts — Fix: the gateway should only handle infrastructure concerns (auth, routing, rate limiting). Any logic that touches your domain model belongs in a service. A useful rule: if the logic would break if you swapped to a different gateway provider, it doesn't belong in the gateway.
  • Mistake 2: Using a single gateway instance without a fallback — Symptom: your gateway goes down for a deploy and your entire product is offline — Fix: run at minimum two gateway instances behind a cloud load balancer (AWS ALB, GCP LB) in different availability zones. The gateway is now the most critical piece of your infrastructure — treat it like one. Stateless gateway design (externalising rate limit state to Redis) makes horizontal scaling trivial.
  • Mistake 3: Not propagating trace IDs end-to-end — Symptom: a request fails in production, you have logs in five different services but can't correlate them because there's no shared identifier — Fix: the gateway must generate a unique traceId (UUID or W3C traceparent header) on every incoming request and inject it into every downstream call as a header. Each service logs that ID with every log line. Now you can grep a single ID across all services and reconstruct the full request journey.

Interview Questions on This Topic

  • Q: How does an API Gateway differ from a Service Mesh like Istio, and when would you choose one over the other?
  • Q: Walk me through what happens — component by component — when an unauthenticated user hits POST /checkout on a system with an API Gateway in front of five microservices.
  • Q: If your rate limiter is deployed across three gateway instances and a user is allowed 100 requests per minute, how do you prevent them from actually making 300 requests per minute — and what trade-offs does your solution involve?

Frequently Asked Questions

What is the difference between an API Gateway and a load balancer?

A load balancer distributes traffic across identical instances of the same service — it doesn't understand what's in the request. An API Gateway routes traffic to different services based on the request path, method, or headers, and applies cross-cutting concerns like auth and rate limiting along the way. In practice, you use both: a cloud load balancer distributes traffic across multiple gateway instances, and the gateway then routes to the right downstream services.

Should I build my own API Gateway or use a managed solution like AWS API Gateway or Kong?

For almost every team, use a managed solution. Building and operating a gateway means owning TLS termination, rate limit state management, auth plugin security, and high availability — none of which is your core product. Managed gateways like Kong, AWS API Gateway, or Apigee handle all of this. Roll your own only if you have extremely specific requirements that no existing solution meets, and even then, treat it as a long-term maintenance commitment.

Can the API Gateway become a single point of failure?

Yes — which is exactly why you always run multiple gateway instances behind a cloud load balancer, distributed across availability zones. The gateway itself should be stateless (externalise all state like rate limit counters to Redis), so any instance can handle any request. Health checks and automatic instance replacement handle individual failures. The real risk isn't the gateway going down — it's misconfiguring it, which is why change management and staged rollouts for gateway config are critical.

TheCodeForge Editorial Team · Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.
