API Gateway Explained: Components, Patterns and Real-World Design
Every time you open Uber and request a ride, your app doesn't talk to seventeen different backend services directly. It talks to one door — the API Gateway — and that gateway orchestrates the chaos behind the scenes. In a world where a single product might run on dozens of microservices, having a clean, single entry point isn't a luxury; it's what keeps the whole system from falling apart under real traffic.
The Core Job of an API Gateway: One Door, Many Rooms
Before microservices became the norm, you had a monolith — one big application that handled everything. A client made one request, the app handled it, done. But once you split that monolith into ten, twenty, or fifty services (authentication, payments, user profiles, notifications…), clients suddenly need to know where everything lives. That's chaos.
An API Gateway solves this by being the single entry point for all client traffic. It receives every inbound request, applies a set of cross-cutting concerns (auth, rate limiting, logging), and then routes the request to the right downstream service. The client only ever needs to know one URL.
The key insight here is that the gateway isn't just a proxy that forwards traffic blindly. It actively transforms, validates, and enriches requests before they ever touch your services. Think of it as a bouncer, a receptionist, and a traffic cop rolled into one.
```yaml
# Example: AWS API Gateway / Kong-style declarative configuration
# This shows how a gateway maps external routes to internal services
gateway:
  name: ecommerce-gateway
  base_url: https://api.shopexample.com
  routes:
    # Route 1: Public product catalog — no auth required
    - path: /products
      method: GET
      upstream_service: http://product-service:8001/api/products
      auth_required: false          # Public endpoint, anyone can browse
      rate_limit:
        requests_per_minute: 300    # Still throttled to prevent scraping

    # Route 2: Place an order — must be authenticated
    - path: /orders
      method: POST
      upstream_service: http://order-service:8002/api/orders
      auth_required: true           # Gateway checks JWT before forwarding
      rate_limit:
        requests_per_minute: 30     # Stricter limit on write operations
      timeout_ms: 5000              # Gateway cancels request after 5s

    # Route 3: User profile — auth + request transformation
    - path: /users/{userId}/profile
      method: GET
      upstream_service: http://user-service:8003/api/profile
      auth_required: true
      transform_request:
        add_header:
          X-Internal-Request-Id: "${generate_uuid}"  # Gateway injects trace ID
          X-Caller-Service: "api-gateway"            # Downstream knows origin
      rate_limit:
        requests_per_minute: 60
```
```
# When a request without a valid token hits a protected route:
HTTP 401 Unauthorized
{ "error": "Missing or invalid Authorization header" }

# When a client exceeds 30 req/min on /orders:
HTTP 429 Too Many Requests
{ "error": "Rate limit exceeded", "retry_after_seconds": 12 }

# When a valid request reaches order-service, it sees these headers:
X-Internal-Request-Id: 7f3a9c21-4d12-4e88-b1f0-9c3e7d8a2b56
X-Caller-Service: api-gateway
Authorization: (stripped — gateway already validated it)
```
The Five Components Every API Gateway Must Have
A gateway is more than a reverse proxy. It's a composition of distinct components, each with a specific job. Understanding each one separately is how you answer system design questions confidently — and how you avoid misconfiguring production systems.
1. Request Router — Maps incoming URLs and HTTP methods to upstream services. This is the core. Without routing, nothing works.
2. Authentication & Authorization Layer — Validates identity (AuthN) and checks permissions (AuthZ). The gateway is the ideal place for this because it's centralised. JWT validation, OAuth token introspection, API key checks — all happen here before requests go anywhere.
3. Rate Limiter & Throttler — Protects your services from being overwhelmed. Rate limiting says 'you get 100 requests per minute.' Throttling says 'requests beyond that get queued or slowed down, not just rejected.'
4. Load Balancer — When multiple instances of a service are running, the gateway distributes traffic across them. Round-robin, least-connections, and weighted routing are common strategies.
5. Request/Response Transformer — The gateway can reshape payloads in both directions. Strip sensitive fields from responses, add internal headers, translate between REST and gRPC, or aggregate responses from multiple services into one (the Backend for Frontend pattern).
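The rate-limiting vs throttling distinction from point 3 can be made concrete with a small queue: instead of rejecting excess requests, a throttler parks them until capacity frees up. This is an illustrative sketch, not taken from any real gateway — the `Throttler` class and its numbers are our own:

```javascript
// Minimal queue-based throttler: at most `maxConcurrent` requests run at
// once; the rest wait in a FIFO queue instead of receiving a 429.
class Throttler {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
    this.queue = []; // resolvers for parked requests
  }

  async run(taskFn) {
    if (this.active >= this.maxConcurrent) {
      // Park the caller until a slot frees up
      await new Promise(resolve => this.queue.push(resolve));
    }
    this.active++;
    try {
      return await taskFn();
    } finally {
      this.active--;
      const next = this.queue.shift();
      if (next) next(); // Wake the oldest waiting request
    }
  }
}

// Usage: 2 requests run immediately, the 3rd waits for a free slot
const throttler = new Throttler(2);
const slowTask = id => () =>
  new Promise(resolve => setTimeout(() => resolve(id), 50));

Promise.all([1, 2, 3].map(id => throttler.run(slowTask(id))))
  .then(results => console.log(results)); // [ 1, 2, 3 ]
```

Real gateways combine both: a hard per-client limit (429) plus a concurrency cap toward each upstream so a burst degrades into queueing rather than failure.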
```javascript
// Simulating the middleware pipeline of an API Gateway in Node.js
// This shows HOW each component processes a request in sequence
// Real gateways (Kong, AWS API GW, Nginx) do this in compiled code,
// but the pipeline logic is identical.
const express = require('express');
const { v4: uuidv4 } = require('uuid');

const app = express();
app.use(express.json());

// ─── COMPONENT 1: Request Logger ────────────────────────────────────────────
// Every request gets a unique trace ID the moment it arrives
app.use((req, res, next) => {
  req.traceId = uuidv4();       // Unique ID for distributed tracing
  req.arrivalTime = Date.now();
  console.log(`[GATEWAY] Incoming: ${req.method} ${req.path} | traceId=${req.traceId}`);
  next();
});

// ─── COMPONENT 2: Authentication Layer ──────────────────────────────────────
// Validate the token ONCE here — downstream services don't need to
const PUBLIC_ROUTES = ['/products']; // Routes that skip auth

function authMiddleware(req, res, next) {
  if (PUBLIC_ROUTES.includes(req.path)) {
    return next(); // Skip auth for public routes
  }
  const authHeader = req.headers['authorization'];
  if (!authHeader || !authHeader.startsWith('Bearer ')) {
    return res.status(401).json({
      error: 'Authentication required',
      traceId: req.traceId // Always return traceId for debugging
    });
  }
  const token = authHeader.split(' ')[1];
  // In production: verify JWT signature, check expiry, decode claims
  // For demo: we accept any token that starts with 'valid-'
  if (!token.startsWith('valid-')) {
    return res.status(401).json({ error: 'Invalid token', traceId: req.traceId });
  }
  req.authenticatedUserId = token.replace('valid-', ''); // Extracted from JWT claims
  res.setHeader('X-Auth-User', req.authenticatedUserId); // Pass identity downstream
  next();
}
app.use(authMiddleware);

// ─── COMPONENT 3: Rate Limiter ──────────────────────────────────────────────
// Simple in-memory rate limiter (production: use Redis for distributed state)
const requestCounts = {};    // ip -> { count, windowStart }
const RATE_LIMIT = 5;        // Max 5 requests per window
const WINDOW_MS = 60 * 1000; // 60-second window

function rateLimiter(req, res, next) {
  const clientIp = req.ip;
  const now = Date.now();
  if (!requestCounts[clientIp] || now - requestCounts[clientIp].windowStart > WINDOW_MS) {
    requestCounts[clientIp] = { count: 1, windowStart: now }; // Reset window
    return next();
  }
  requestCounts[clientIp].count++;
  if (requestCounts[clientIp].count > RATE_LIMIT) {
    const retryAfter = Math.ceil((WINDOW_MS - (now - requestCounts[clientIp].windowStart)) / 1000);
    res.setHeader('Retry-After', retryAfter); // Tell client when to retry
    return res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfterSeconds: retryAfter,
      traceId: req.traceId
    });
  }
  next();
}
app.use(rateLimiter);

// ─── COMPONENT 4: Router + Load Balancer ────────────────────────────────────
// Round-robin across multiple instances of a service
const productServiceInstances = [
  'http://product-service-1:8001',
  'http://product-service-2:8001',
  'http://product-service-3:8001'
];
let roundRobinIndex = 0;

function getNextProductInstance() {
  const instance = productServiceInstances[roundRobinIndex];
  roundRobinIndex = (roundRobinIndex + 1) % productServiceInstances.length; // Wrap around
  return instance;
}

// Public product route — no auth needed
app.get('/products', (req, res) => {
  const targetInstance = getNextProductInstance();
  console.log(`[GATEWAY] Routing GET /products -> ${targetInstance} | traceId=${req.traceId}`);
  // In production: forward the actual HTTP request using axios/node-fetch
  // For demo: simulate the downstream response
  res.json({
    _meta: { routedTo: targetInstance, traceId: req.traceId },
    products: [{ id: 1, name: 'Wireless Headphones', price: 79.99 }]
  });
});

// Protected order route
app.post('/orders', (req, res) => {
  console.log(`[GATEWAY] Routing POST /orders for user=${req.authenticatedUserId} | traceId=${req.traceId}`);
  // ─── COMPONENT 5: Request Transformer ─────────────────────────────────────
  // Strip the raw Authorization header — order-service trusts X-Auth-User instead
  const internalPayload = {
    ...req.body,
    requestedByUserId: req.authenticatedUserId, // Inject verified identity
    gatewayTraceId: req.traceId                 // Inject trace ID for observability
    // Note: we do NOT forward req.headers.authorization to internal services
  };
  console.log('[GATEWAY] Transformed payload for order-service:', internalPayload);
  res.status(201).json({
    message: 'Order created',
    orderId: `ORD-${Date.now()}`,
    traceId: req.traceId
  });
});

app.listen(3000, () => {
  console.log('[GATEWAY] API Gateway running on port 3000');
});
```
```
# Request 1: GET /products (no auth needed)
[GATEWAY] Incoming: GET /products | traceId=a1b2c3d4-...
[GATEWAY] Routing GET /products -> http://product-service-1:8001 | traceId=a1b2c3d4-...
Response: { "_meta": { "routedTo": "http://product-service-1:8001", "traceId": "a1b2c3d4-..." }, "products": [...] }

# Request 2: POST /orders without token
[GATEWAY] Incoming: POST /orders | traceId=e5f6a7b8-...
Response: HTTP 401 { "error": "Authentication required", "traceId": "e5f6a7b8-..." }

# Request 3: POST /orders with valid token
[GATEWAY] Incoming: POST /orders | traceId=c9d0e1f2-...
[GATEWAY] Routing POST /orders for user=user-42 | traceId=c9d0e1f2-...
[GATEWAY] Transformed payload: { "item": "headphones", "requestedByUserId": "user-42", "gatewayTraceId": "c9d0e1f2-..." }
Response: HTTP 201 { "message": "Order created", "orderId": "ORD-1718234567890", "traceId": "c9d0e1f2-..." }
```
Gateway Patterns You'll Actually Use: BFF, Aggregation, and Circuit Breaking
Knowing the components is step one. Knowing the patterns built on top of them is what separates a junior engineer from someone who can design systems confidently.
Backend for Frontend (BFF) — Mobile apps and web apps have different data needs. A mobile screen might need a simplified user profile summary; the web dashboard needs the full version with activity history. Instead of having clients make multiple calls or your services maintain multiple response shapes, you create a dedicated gateway layer per frontend. Each BFF cherry-picks and reshapes data for its specific client.
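The reshaping a BFF does can be sketched as a pair of view functions over one upstream payload. The field names and view functions here are illustrative, not from any particular system:

```javascript
// The full profile as user-service returns it
const fullProfile = {
  userId: 'user-42',
  name: 'Amara Osei',
  email: 'amara@example.com',
  tier: 'Gold',
  activityHistory: [
    { at: '2024-05-01T10:00:00Z', action: 'login' },
    { at: '2024-05-02T12:30:00Z', action: 'order_placed' }
  ],
  internalRiskScore: 0.12 // Must never leave the backend
};

// Mobile BFF: small payload, no history, sensitive fields stripped
function mobileProfileView(profile) {
  return { name: profile.name, tier: profile.tier };
}

// Web BFF: richer payload for the dashboard, still strips internals
function webProfileView(profile) {
  const { internalRiskScore, ...safe } = profile;
  return { ...safe, activityCount: profile.activityHistory.length };
}

console.log(mobileProfileView(fullProfile)); // { name: 'Amara Osei', tier: 'Gold' }
console.log(webProfileView(fullProfile).activityCount); // 2
```

The point is ownership: each frontend team owns its BFF and its response shapes, so the mobile team can change its payload without coordinating with the web team or the upstream service.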
Request Aggregation — Some UI screens need data from three services: user info, recent orders, and loyalty points. Without aggregation, the client makes three serial or parallel calls. With the gateway aggregating, the client makes one call and the gateway fans out to all three services, merges the responses, and returns a single payload. Latency drops dramatically.
Circuit Breaker at the Gateway — If the inventory service is down, you don't want every request piling up and timing out at 5 seconds each. A circuit breaker tracks failure rates and 'opens' — immediately rejecting requests to a failing service with a fallback response — until the service recovers. The gateway is the perfect place to implement this because it's the chokepoint for all traffic.
```javascript
// Demonstrates two advanced gateway patterns:
// 1. Response Aggregation (fan-out to multiple services, merge results)
// 2. Circuit Breaker (fail fast instead of cascading timeouts)

// ─── Circuit Breaker State Machine ──────────────────────────────────────────
// States: CLOSED (normal) -> OPEN (failing) -> HALF_OPEN (testing recovery)
const CircuitState = { CLOSED: 'CLOSED', OPEN: 'OPEN', HALF_OPEN: 'HALF_OPEN' };

class CircuitBreaker {
  constructor(serviceName, failureThreshold = 3, recoveryTimeoutMs = 10000) {
    this.serviceName = serviceName;
    this.failureThreshold = failureThreshold; // Open circuit after N failures
    this.recoveryTimeout = recoveryTimeoutMs; // Probe again after this long
    this.state = CircuitState.CLOSED;
    this.failureCount = 0;
    this.lastFailureTime = null;
  }

  async call(serviceCallFn) {
    // If circuit is OPEN, check if recovery timeout has passed
    if (this.state === CircuitState.OPEN) {
      const timeSinceFailure = Date.now() - this.lastFailureTime;
      if (timeSinceFailure < this.recoveryTimeout) {
        // Still in open state — fail fast without calling the service
        console.log(`[CIRCUIT] ${this.serviceName} is OPEN — fast failing`);
        throw new Error(`${this.serviceName} circuit is open — service unavailable`);
      }
      // Timeout elapsed — try one probe request
      console.log(`[CIRCUIT] ${this.serviceName} moving to HALF_OPEN — sending probe`);
      this.state = CircuitState.HALF_OPEN;
    }
    try {
      const result = await serviceCallFn(); // Attempt the actual service call
      this.onSuccess();                     // Reset on success
      return result;
    } catch (error) {
      this.onFailure();                     // Track failure
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = CircuitState.CLOSED; // Restore normal operation
    console.log(`[CIRCUIT] ${this.serviceName} — circuit CLOSED (healthy)`);
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold || this.state === CircuitState.HALF_OPEN) {
      this.state = CircuitState.OPEN; // Trip the circuit
      console.log(`[CIRCUIT] ${this.serviceName} — circuit OPENED after ${this.failureCount} failures`);
    }
  }
}

// ─── Simulated Downstream Services ──────────────────────────────────────────
let inventoryCallCount = 0;

async function fetchUserProfile(userId) {
  // Simulates user-service responding successfully
  return { userId, name: 'Amara Osei', tier: 'Gold' };
}

async function fetchRecentOrders(userId) {
  // Simulates order-service responding successfully
  return { userId, orders: [{ orderId: 'ORD-001', total: 149.99, status: 'Shipped' }] };
}

async function fetchInventoryStatus(productId) {
  // Simulates a flaky inventory service that fails intermittently
  inventoryCallCount++;
  if (inventoryCallCount <= 3) { // First 3 calls fail
    throw new Error('inventory-service: connection refused');
  }
  return { productId, inStock: true, quantity: 47 };
}

// ─── Circuit Breakers (one per downstream service) ──────────────────────────
// Short 2-second recovery window so the demo below can show the full cycle
const inventoryCircuit = new CircuitBreaker('inventory-service', 3, 2000);

// ─── Gateway Aggregation Handler ────────────────────────────────────────────
// Client makes ONE call to /dashboard — gateway fans out to 3 services
async function handleDashboardRequest(userId, productId) {
  console.log(`\n[GATEWAY] Dashboard request for userId=${userId}, productId=${productId}`);

  // Fan out: run user profile and orders in PARALLEL (faster than serial)
  const [userProfile, recentOrders] = await Promise.all([
    fetchUserProfile(userId),
    fetchRecentOrders(userId)
  ]);

  // Inventory goes through the circuit breaker — it's a non-critical enhancement
  let inventoryData = null;
  try {
    inventoryData = await inventoryCircuit.call(() => fetchInventoryStatus(productId));
  } catch (circuitError) {
    // GRACEFUL DEGRADATION: dashboard still works without inventory data
    console.log(`[GATEWAY] Inventory unavailable — degraded response: ${circuitError.message}`);
    inventoryData = { productId, inStock: null, message: 'Inventory temporarily unavailable' };
  }

  // Aggregate all responses into one payload for the client
  return { user: userProfile, orders: recentOrders, inventory: inventoryData };
}

// ─── Simulate 5 dashboard requests, one second apart ────────────────────────
// The gaps matter: without them, the recovery timeout never elapses and the
// circuit would stay OPEN for the whole demo.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

(async () => {
  for (let requestNum = 1; requestNum <= 5; requestNum++) {
    const response = await handleDashboardRequest('user-99', 'product-headphones');
    console.log(`[GATEWAY] Response aggregated:`, JSON.stringify(response.inventory));
    await sleep(1000); // Space requests out so the circuit can recover
  }
})();
```
```
[GATEWAY] Dashboard request for userId=user-99, productId=product-headphones
[GATEWAY] Inventory unavailable — degraded response: inventory-service: connection refused
[GATEWAY] Response aggregated: {"productId":"product-headphones","inStock":null,"message":"Inventory temporarily unavailable"}

[GATEWAY] Dashboard request for userId=user-99, productId=product-headphones
[GATEWAY] Inventory unavailable — degraded response: inventory-service: connection refused
[GATEWAY] Response aggregated: {"productId":"product-headphones","inStock":null,"message":"Inventory temporarily unavailable"}

[GATEWAY] Dashboard request for userId=user-99, productId=product-headphones
[CIRCUIT] inventory-service — circuit OPENED after 3 failures
[GATEWAY] Inventory unavailable — degraded response: inventory-service: connection refused
[GATEWAY] Response aggregated: {"productId":"product-headphones","inStock":null,"message":"Inventory temporarily unavailable"}

[GATEWAY] Dashboard request for userId=user-99, productId=product-headphones
[CIRCUIT] inventory-service is OPEN — fast failing
[GATEWAY] Inventory unavailable — degraded response: inventory-service circuit is open — service unavailable
[GATEWAY] Response aggregated: {"productId":"product-headphones","inStock":null,"message":"Inventory temporarily unavailable"}

[GATEWAY] Dashboard request for userId=user-99, productId=product-headphones
[CIRCUIT] inventory-service moving to HALF_OPEN — sending probe
[CIRCUIT] inventory-service — circuit CLOSED (healthy)
[GATEWAY] Response aggregated: {"productId":"product-headphones","inStock":true,"quantity":47}
```
| Aspect | API Gateway | Simple Reverse Proxy (e.g. Nginx) |
|---|---|---|
| Primary role | Cross-cutting concerns + smart routing | Traffic forwarding + SSL termination |
| Authentication | Built-in JWT/OAuth/API key validation | Not natively — requires plugins or custom Lua |
| Rate limiting | Per-client, per-route, configurable policies | Basic — IP-based, limited granularity |
| Request transformation | Yes — reshape payloads, add/strip headers | Limited — mostly header manipulation |
| Service aggregation (BFF) | Yes — fan out and merge multiple service calls | No — 1:1 proxy only |
| Circuit breaking | Yes — native in Kong, AWS, Apigee | No — must use Nginx+ or external sidecar |
| Developer portal / API docs | Yes — most managed gateways include this | No |
| Operational complexity | Higher — another stateful layer to manage | Lower — battle-hardened, config is simple |
| Best for | Microservices with many clients and policies | Simple routing, static content, TLS offload |
🎯 Key Takeaways
- An API Gateway is not just a proxy — it's a composition of five distinct components: router, auth layer, rate limiter, load balancer, and request/response transformer. Knowing each one separately is what makes you credible in system design interviews.
- Authenticate once at the gateway, not in every service. Downstream services should trust the identity headers the gateway injects (X-Auth-User) — not re-validate tokens themselves. This eliminates duplicated auth logic across your entire microservice fleet.
- Circuit breakers belong at the gateway, not just in service-to-service calls. When a downstream service is degraded, fail fast and return a partial (gracefully degraded) response. Letting a slow service absorb all your gateway threads causes cascading failures that are much harder to debug.
- Rate limiter state must live in a shared store (Redis) the moment you have more than one gateway instance. In-memory rate limiting works perfectly in development and fails silently in production — because each instance tracks its own counters independently.
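The shared-store takeaway above comes down to two atomic Redis commands, INCR and EXPIRE: every gateway instance increments the same per-window counter, so the limit holds fleet-wide. The sketch below uses a hypothetical in-memory stand-in implementing those two commands so it runs without a server; a real deployment would issue the same calls through a Redis client:

```javascript
// In-memory stand-in exposing the same two commands a Redis client would.
// Real Redis makes INCR atomic across ALL gateway instances — that atomicity
// is the whole point of externalising the counter.
class FakeRedis {
  constructor() { this.data = new Map(); }
  async incr(key) {
    const next = (this.data.get(key) || 0) + 1;
    this.data.set(key, next);
    return next;
  }
  async expire(key, seconds) {
    setTimeout(() => this.data.delete(key), seconds * 1000).unref?.();
  }
}

// Fixed-window limiter: one counter per client per time window
async function isAllowed(store, clientId, limit, windowSeconds) {
  const windowId = Math.floor(Date.now() / (windowSeconds * 1000));
  const key = `rl:${clientId}:${windowId}`;   // e.g. "rl:user-42:28637242"
  const count = await store.incr(key);        // Atomic even across instances
  if (count === 1) await store.expire(key, windowSeconds); // First hit sets TTL
  return count <= limit;
}

// Usage: limit of 3 per 60s window — the 4th call is rejected
(async () => {
  const store = new FakeRedis();
  for (let i = 1; i <= 4; i++) {
    console.log(i, await isAllowed(store, 'user-42', 3, 60));
  }
})();
```

Fixed windows allow a brief burst at window boundaries; sliding-window or token-bucket variants smooth that out at the cost of slightly more state per client.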
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Putting business logic inside the gateway — Symptom: your gateway config starts containing if/else conditions about order values or user tiers, and deployments require gateway restarts — Fix: the gateway should only handle infrastructure concerns (auth, routing, rate limiting). Any logic that touches your domain model belongs in a service. A useful rule: if the logic would break if you swapped to a different gateway provider, it doesn't belong in the gateway.
- ✕ Mistake 2: Using a single gateway instance without a fallback — Symptom: your gateway goes down for a deploy and your entire product is offline — Fix: run at minimum two gateway instances behind a cloud load balancer (AWS ALB, GCP LB) in different availability zones. The gateway is now the most critical piece of your infrastructure — treat it like one. Stateless gateway design (externalising rate limit state to Redis) makes horizontal scaling trivial.
- ✕ Mistake 3: Not propagating trace IDs end-to-end — Symptom: a request fails in production, you have logs in five different services but can't correlate them because there's no shared identifier — Fix: the gateway must generate a unique traceId (UUID or W3C traceparent header) on every incoming request and inject it into every downstream call as a header. Each service logs that ID with every log line. Now you can grep a single ID across all services and reconstruct the full request journey.
Interview Questions on This Topic
- Q: How does an API Gateway differ from a Service Mesh like Istio, and when would you choose one over the other?
- Q: Walk me through what happens — component by component — when an unauthenticated user hits POST /checkout on a system with an API Gateway in front of five microservices.
- Q: If your rate limiter is deployed across three gateway instances and a user is allowed 100 requests per minute, how do you prevent them from actually making 300 requests per minute — and what trade-offs does your solution involve?
Frequently Asked Questions
What is the difference between an API Gateway and a load balancer?
A load balancer distributes traffic across identical instances of the same service — it doesn't understand what's in the request. An API Gateway routes traffic to different services based on the request path, method, or headers, and applies cross-cutting concerns like auth and rate limiting along the way. In practice, you use both: a cloud load balancer distributes traffic across multiple gateway instances, and the gateway then routes to the right downstream services.
Should I build my own API Gateway or use a managed solution like AWS API Gateway or Kong?
For almost every team, use a managed solution. Building and operating a gateway means owning TLS termination, rate limit state management, auth plugin security, and high availability — none of which is your core product. Managed gateways like Kong, AWS API Gateway, or Apigee handle all of this. Roll your own only if you have extremely specific requirements that no existing solution meets, and even then, treat it as a long-term maintenance commitment.
Can the API Gateway become a single point of failure?
Yes — which is exactly why you always run multiple gateway instances behind a cloud load balancer, distributed across availability zones. The gateway itself should be stateless (externalise all state like rate limit counters to Redis), so any instance can handle any request. Health checks and automatic instance replacement handle individual failures. The real risk isn't the gateway going down — it's misconfiguring it, which is why change management and staged rollouts for gateway config are critical.
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.