API Keys Explained: How They Work and Why They Fail
A developer at a Y Combinator startup pushed to GitHub on a Friday afternoon. By Sunday, a bot had scraped their AWS API key from the commit history, spun up 47 GPU instances for crypto mining, and run up a $17,000 bill. The key had been in the code for exactly 11 minutes before the push. Eleven minutes. The bill took three months to dispute.
API keys are everywhere: every third-party service you integrate, every payment processor, every mapping library, every SMS gateway. They're one of the most common authentication mechanisms in modern software, and they're also among the most commonly mishandled. Not because developers are careless, but because nobody sits down and explains what these things actually are, how they flow through a system, and specifically what blows up when you treat them carelessly.
By the end of this, you'll know exactly what an API key is and isn't, how to generate and store one safely, how to pass it correctly in HTTP requests, what rate limiting and key rotation actually look like in practice, and, most critically, the exact mistakes that get people paged at 3am or handed a five-figure cloud bill. No handwaving. No 'just be careful with your keys.' Concrete mechanics, real failure modes, specific fixes.
What an API Key Actually Is (And What It Is Not)
Before you can protect an API key, you need to know what it's doing in the first place. Most explanations skip straight to 'keep it secret' without ever explaining the mechanism. That's why people make mistakes: they're following rules they don't understand.
An API is just a door into someone else's software. Stripe's API is a door into their payment system. The Google Maps API is a door into their mapping engine. You're not running their code. You're sending HTTP requests to their servers, and their servers do the work and send back a response. Simple.
The problem is: that door can't be wide open. Stripe needs to know which requests came from your account so they can bill you, rate-limit you, and lock you out if you do something sketchy. They can't ask you to type a username and password every single time your checkout page needs to verify a card β that would happen dozens of times per second at scale. So instead, they give you a key: a long random string that you attach to every request. Their server sees the key, looks it up in their database, finds your account, and knows who's asking.
Here's the critical thing most juniors get wrong: an API key is NOT encryption. It doesn't scramble your data. It's not a token that proves who you are through math. It's purely a lookup mechanism: a secret identifier that maps to an account in someone else's database. That distinction matters enormously when you're deciding how to store and transmit it.
    // io.thecodeforge - System Design tutorial
    // Tracing a single API call from your app to a third-party service
    // Scenario: Your e-commerce checkout calls Stripe to charge a card
    // The provider-side steps below are an illustrative model, not Stripe's documented internals

    // ------------------------------------------------------------
    // STEP 1 - Your checkout service builds an HTTP request
    // ------------------------------------------------------------
    POST https://api.stripe.com/v1/charges
    Headers:
      Authorization: Bearer sk_live_4eC39HqLyjWDarjtT1zdp7dc   // <-- the API key
      Content-Type: application/x-www-form-urlencoded
    Body:
      amount=2000        // $20.00 in cents
      currency=usd
      source=tok_visa    // tokenised card from Stripe.js

    // ------------------------------------------------------------
    // STEP 2 - Stripe's server receives the request
    // ------------------------------------------------------------
    Stripe API Gateway:
      1. Extract key from Authorization header
           key = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"
      2. Look up key in the internal key store
           SELECT account_id, permissions, rate_limit, is_active
           FROM api_keys
           WHERE key_hash = SHA256("sk_live_4eC39HqLyjWDarjtT1zdp7dc")
           // NOTE: in this model the provider stores a HASH of your key, not the key itself,
           // so a leak of the key table does not hand out usable keys
      3. Key found ->
           account_id  = "acct_1A2B3C4D5E6F"
           is_active   = true
           permissions = ["charges:write", "refunds:write"]
           rate_limit  = 100 requests/second
      4. Check rate limit -> current usage: 23/100 req/sec -> OK
      5. Process the charge against account acct_1A2B3C4D5E6F

    // ------------------------------------------------------------
    // STEP 3 - Stripe responds
    // ------------------------------------------------------------
    HTTP 200 OK
    {
      "id": "ch_3MqLiJKZ2eZvKYlo2T9UW2GX",
      "object": "charge",
      "amount": 2000,
      "status": "succeeded"
    }

    // ------------------------------------------------------------
    // WHAT HAPPENS WITH A BAD KEY
    // ------------------------------------------------------------
    Stripe API Gateway (bad key scenario):
      1. Extract key: "sk_live_INVALIDKEYHERE"
      2. Hash and look up -> no matching row in api_keys table
      3. Return immediately -> no account check, no charge processing
    HTTP 401 Unauthorized
    {
      "error": {
        "code": "api_key_invalid",
        "message": "No such API key: 'sk_live_INVA...HERE'"
      }
    }
// Valid key:
HTTP 200 -> { "id": "ch_3MqLiJKZ2eZvKYlo2T9UW2GX", "status": "succeeded" }
// Invalid key:
HTTP 401 -> { "error": { "code": "api_key_invalid" } }
// Correct key, wrong permissions:
HTTP 403 -> { "error": { "code": "permission_denied", "message": "This key does not have permission for charges:write" } }
// Rate limit hit:
HTTP 429 -> { "error": { "code": "rate_limit_exceeded", "message": "Too many requests" } }
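To make the "lookup, not math" point concrete, here is a minimal sketch of the provider side: issuing a key and resolving it back to an account. Everything in it is an assumption for illustration (the in-memory KEY_STORE, the sk_demo_ prefix, the function names); a real provider would use a database and its own key format.

```python
import hashlib
import secrets
from typing import Optional

# Hypothetical in-memory key store: SHA-256 digest of the key -> account record.
KEY_STORE: dict = {}


def issue_key(account_id: str, permissions: list) -> str:
    """Generate a random key, store only its hash, and return the raw key once."""
    raw_key = "sk_demo_" + secrets.token_urlsafe(32)  # "sk_demo_" is a made-up prefix
    digest = hashlib.sha256(raw_key.encode()).hexdigest()
    KEY_STORE[digest] = {
        "account_id": account_id,
        "permissions": permissions,
        "is_active": True,
    }
    return raw_key


def authenticate(presented_key: str) -> Optional[dict]:
    """Return the account record for a presented key, or None if unknown or revoked."""
    digest = hashlib.sha256(presented_key.encode()).hexdigest()
    record = KEY_STORE.get(digest)
    if record is None or not record["is_active"]:
        return None
    return record


if __name__ == "__main__":
    key = issue_key("acct_demo", ["charges:write"])
    print(authenticate(key)["account_id"])  # acct_demo
    print(authenticate("sk_demo_wrong"))    # None
```

Notice there is no signature check anywhere: whoever presents the string gets the account. That is the whole mechanism.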
Where API Keys Live, Travel, and Get Stolen
The key gets generated once. After that, it has to live somewhere in your system, travel with every request, and never appear anywhere a human or bot shouldn't see it. Every one of those three moments is a potential leak point, and I've seen all three fail in production.
Storage is where most teams fail first. The lazy path, and I've seen it in codebases at companies you've heard of, is hardcoding the key directly in source code. It's fast, it works locally, and it will eventually destroy you. GitHub's secret scanning catches some of these and emails the vendor, but by the time the email arrives, automated bots have already scraped the commit. Those bots watch GitHub's public event stream in real time. Real time. Not a crawl. A live stream.
The correct storage pattern is environment variables at minimum, and a secrets manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager) in any production system that matters. The key lives in the secrets manager, your app fetches it at startup or at request time, and it never touches your source control, your logs, or your error reporting service. That last one trips people up constantly: Sentry, Datadog, and similar tools often log full request objects on errors. If your API key is in a request header and you log the full request on a 500 error, you just wrote your key into your observability stack.
    # io.thecodeforge - System Design tutorial
    # Scenario: Payment service loading a Stripe key safely at startup
    # Demonstrating: env vars (dev), secrets manager (prod), and the logging trap
    import json
    import logging
    import os
    from functools import lru_cache

    import boto3
    import requests
    from botocore.exceptions import ClientError

    logger = logging.getLogger(__name__)


    # ------------------------------------------------------------
    # PATTERN 1 - Environment variable (acceptable for local dev)
    # ------------------------------------------------------------
    def load_stripe_key_from_env() -> str:
        key = os.environ.get("STRIPE_SECRET_KEY")
        if not key:
            # Fail loud at startup - better than a cryptic 401 at checkout time
            raise EnvironmentError(
                "STRIPE_SECRET_KEY is not set. "
                "Check your .env file or deployment environment variables."
            )
        # Same default as the startup code below: an unset APP_ENV means development
        if key.startswith("sk_live") and os.environ.get("APP_ENV", "development") == "development":
            # Catch the classic mistake: live key used in local dev
            raise EnvironmentError(
                "Live Stripe key detected in development environment. "
                "Use sk_test_ keys for local development."
            )
        return key


    # ------------------------------------------------------------
    # PATTERN 2 - AWS Secrets Manager (required for production)
    # ------------------------------------------------------------
    @lru_cache(maxsize=1)  # Cache the secret - don't call Secrets Manager on every request
    def load_stripe_key_from_secrets_manager(secret_name: str, region: str) -> str:
        client = boto3.client("secretsmanager", region_name=region)
        try:
            response = client.get_secret_value(SecretId=secret_name)
        except ClientError as error:
            # Access-denied errors surface as a generic ClientError, so branch on the code
            error_code = error.response["Error"]["Code"]
            if error_code == "ResourceNotFoundException":
                raise RuntimeError(f"Secret '{secret_name}' not found in Secrets Manager.") from error
            if error_code == "AccessDeniedException":
                # This usually means your IAM role doesn't have secretsmanager:GetSecretValue
                raise RuntimeError(
                    f"IAM permission denied reading '{secret_name}'. "
                    "Check your task role policy for secretsmanager:GetSecretValue."
                ) from error
            raise
        secret = json.loads(response["SecretString"])
        return secret["stripe_secret_key"]


    # ------------------------------------------------------------
    # THE LOGGING TRAP - this is how keys end up in Datadog
    # ------------------------------------------------------------
    def charge_card_unsafe(stripe_key: str, amount_cents: int, card_token: str):
        headers = {"Authorization": f"Bearer {stripe_key}"}
        response = requests.post(
            "https://api.stripe.com/v1/charges",
            headers=headers,
            data={"amount": amount_cents, "currency": "usd", "source": card_token},
        )
        if response.status_code != 200:
            # DANGER: logging response.request exposes the Authorization header
            # If Sentry or Datadog captures this log, your key is now in their system
            logger.error(f"Stripe charge failed. Request: {response.request.headers}")
        return response.json()


    def charge_card_safe(stripe_key: str, amount_cents: int, card_token: str):
        headers = {"Authorization": f"Bearer {stripe_key}"}
        response = requests.post(
            "https://api.stripe.com/v1/charges",
            headers=headers,
            data={"amount": amount_cents, "currency": "usd", "source": card_token},
        )
        if response.status_code != 200:
            # Log only what you need to debug - never log headers containing credentials
            logger.error(
                "Stripe charge failed",
                extra={
                    "status_code": response.status_code,
                    "stripe_error_code": response.json().get("error", {}).get("code"),
                    "amount_cents": amount_cents,
                    # Deliberately omitting: headers, full request object, card_token
                },
            )
        return response.json()


    # ------------------------------------------------------------
    # STARTUP - how the service wires this together
    # ------------------------------------------------------------
    if __name__ == "__main__":
        env = os.environ.get("APP_ENV", "development")
        if env == "production":
            stripe_key = load_stripe_key_from_secrets_manager(
                secret_name="prod/payment-service/stripe",
                region="us-east-1",
            )
            print("Loaded Stripe key from Secrets Manager")
        else:
            stripe_key = load_stripe_key_from_env()
            print("Loaded Stripe key from environment variable")

        # Sanity check - log key PREFIX only so you can confirm which key is active
        # Never log the full key, even in debug mode
        print(f"Active Stripe key prefix: {stripe_key[:12]}...")
# Production startup:
Loaded Stripe key from Secrets Manager
Active Stripe key prefix: sk_live_4eC3...
# Development startup with test key:
Loaded Stripe key from environment variable
Active Stripe key prefix: sk_test_51Lk...
# Development startup with LIVE key (caught at startup, not at runtime):
EnvironmentError: Live Stripe key detected in development environment. Use sk_test_ keys for local development.
# Production with missing IAM permission:
RuntimeError: IAM permission denied reading 'prod/payment-service/stripe'. Check your task role policy for secretsmanager:GetSecretValue.
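Discipline at each call site only goes so far, so it can help to scrub credentials at the logging layer too. The sketch below uses the standard library's logging.Filter to mask anything that looks like a bearer credential before a handler emits it. The class name, the regex, and the demo logger are assumptions for illustration, and this does not replace the scrubbing settings of your error-reporting tool.

```python
import logging
import re

# Matches "Bearer <token>" so the token part can be masked.
_BEARER = re.compile(r"(Bearer\s+)([A-Za-z0-9_\-\.]+)")


def redact(text: str) -> str:
    """Replace the credential after 'Bearer ' with a fixed placeholder."""
    return _BEARER.sub(r"\1[REDACTED]", text)


class RedactBearerFilter(logging.Filter):
    """Mask bearer credentials in log messages before the handler emits them."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() merges %-style args into the message first
        record.msg = redact(record.getMessage())
        record.args = ()  # args are already merged into msg
        return True       # keep the record, just scrubbed


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RedactBearerFilter())  # attach to the handler, not the logger
    demo_logger = logging.getLogger("redaction_demo")
    demo_logger.addHandler(handler)
    demo_logger.error("Charge failed, auth header was: %s", "Bearer sk_live_abc123")
    # Emits: Charge failed, auth header was: Bearer [REDACTED]
```

The filter sits on the handler because filters attached to a logger are not applied to records that propagate up from child loggers.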
Rate Limiting, Key Rotation, and Scoping: The Three Things That Save You
Generating an API key is easy. Managing it across the lifecycle of a production system is where teams fall apart. There are three practices that separate systems that recover from a leaked key in five minutes from systems that spend a week cleaning up the blast radius.
Rate limiting is your circuit breaker. Every serious API provider implements it: they'll return HTTP 429 Too Many Requests when you exceed your quota. But here's what most juniors don't realize: the provider's rate limit protects the provider, not you. It caps how fast a leaked key can burn through quota, but it doesn't stop an attacker from making 99 requests per minute, just under a 100 req/min limit, indefinitely. You need your own rate limiting on inbound requests to your service, separate from whatever the upstream API enforces.
Key rotation means proactively replacing your API keys on a schedule, even if they haven't leaked. The argument against it, 'why fix what isn't broken?', ignores the reality that you often don't know a key is compromised until damage is done. Rotate quarterly at minimum. Rotate immediately any time a developer with access leaves the company. Rotate immediately if the key appears anywhere it shouldn't. The operational cost of rotation is low if you've already externalized keys to a secrets manager: it's a one-line update, not a deployment.
Scoping means giving each key only the permissions it actually needs. Don't use your master admin key in your read-only reporting service. If that reporting service gets compromised, the attacker should get read access to your data, not write access, not billing access, not the ability to create new API keys. Most providers let you scope keys to specific operations. Use it every time.
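One common way to build that inbound limit is a token bucket per caller. The sketch below is a minimal single-process version; the class name and parameters are assumptions for illustration, and a multi-instance service would need shared state (for example in a cache server) rather than an in-memory object.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so the first burst is allowed
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10)
    results = [bucket.allow() for _ in range(12)]
    # The first 10 calls fit in the burst; later ones depend on how much time passed
    print(results[:10])
```

In a request handler you would keep one bucket per caller identity and return HTTP 429 whenever allow() comes back False.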
    # io.thecodeforge - System Design tutorial
    # Scenario: Internal API gateway managing keys for microservices
    # Demonstrates: scoped keys, rotation tracking, and handling 429s correctly
    import logging
    from dataclasses import dataclass, field
    from datetime import datetime, timedelta

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    logger = logging.getLogger(__name__)


    # ------------------------------------------------------------
    # DATA MODEL - what a managed API key looks like internally
    # ------------------------------------------------------------
    @dataclass
    class ScopedAPIKey:
        service_name: str        # which internal service owns this key
        provider: str            # e.g. "stripe", "sendgrid", "googlemaps"
        permissions: list[str]   # e.g. ["charges:write"] - not ["*"]
        created_at: datetime = field(default_factory=datetime.utcnow)
        rotate_by: datetime = field(default_factory=lambda: datetime.utcnow() + timedelta(days=90))
        _raw_key: str = field(default="", repr=False)  # never printed in logs or repr

        @property
        def key_prefix(self) -> str:
            # Safe to log - enough to identify which key is active without exposing it
            return self._raw_key[:12] + "..."

        @property
        def days_until_rotation(self) -> int:
            return (self.rotate_by - datetime.utcnow()).days

        @property
        def needs_rotation(self) -> bool:
            return self.days_until_rotation <= 7  # warn a week out


    # ------------------------------------------------------------
    # RETRY LOGIC - handling 429s without hammering the upstream
    # ------------------------------------------------------------
    def build_resilient_http_session(total_retries: int = 3) -> requests.Session:
        session = requests.Session()
        # Retry on 429 (rate limit) and 503 (upstream temporarily unavailable)
        # backoff_factor=2 gives exponentially growing waits between retries
        # (the exact schedule depends on the urllib3 version)
        # By default urllib3 only retries idempotent methods, so the POST in
        # create_charge is not retried and cannot double-charge through this path
        retry_strategy = Retry(
            total=total_retries,
            status_forcelist=[429, 503],
            backoff_factor=2,
            respect_retry_after_header=True,  # honour the provider's Retry-After header
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        session.mount("https://", adapter)
        return session


    # ------------------------------------------------------------
    # SCOPED REQUEST BUILDER - enforces least-privilege per service
    # ------------------------------------------------------------
    class ScopedStripeClient:
        """
        Each internal service gets its own ScopedStripeClient with its own key.
        The checkout service gets charges:write. The reporting service gets
        charges:read only. A compromised reporting service cannot create charges.
        """

        def __init__(self, api_key: ScopedAPIKey):
            self._key = api_key
            self._session = build_resilient_http_session()
            self._check_rotation_status()

        def _check_rotation_status(self):
            if self._key.needs_rotation:
                # Warn loudly at startup - gives ops team time to rotate before expiry
                logger.warning(
                    "API key rotation due soon",
                    extra={
                        "service": self._key.service_name,
                        "provider": self._key.provider,
                        "key_prefix": self._key.key_prefix,
                        "days_remaining": self._key.days_until_rotation,
                    },
                )

        def get_charge(self, charge_id: str) -> dict:
            # Reporting service uses this - read-only, no ability to create/modify
            if "charges:read" not in self._key.permissions:
                raise PermissionError(
                    f"Key for '{self._key.service_name}' lacks charges:read permission. "
                    f"Granted permissions: {self._key.permissions}"
                )
            response = self._session.get(
                f"https://api.stripe.com/v1/charges/{charge_id}",
                headers={"Authorization": f"Bearer {self._key._raw_key}"},
            )
            response.raise_for_status()
            return response.json()

        def create_charge(self, amount_cents: int, card_token: str) -> dict:
            # Checkout service uses this - requires explicit write permission
            if "charges:write" not in self._key.permissions:
                raise PermissionError(
                    f"Key for '{self._key.service_name}' lacks charges:write permission. "
                    f"This is likely a scoping error - do not expand permissions. "
                    f"Create a dedicated key with charges:write for the checkout service."
                )
            response = self._session.post(
                "https://api.stripe.com/v1/charges",
                headers={"Authorization": f"Bearer {self._key._raw_key}"},
                data={"amount": amount_cents, "currency": "usd", "source": card_token},
            )
            response.raise_for_status()
            return response.json()


    # ------------------------------------------------------------
    # EXAMPLE USAGE - wiring up two services with different scopes
    # ------------------------------------------------------------
    if __name__ == "__main__":
        # Checkout service key - write access
        checkout_api_key = ScopedAPIKey(
            service_name="checkout-service",
            provider="stripe",
            permissions=["charges:write", "refunds:write"],
            rotate_by=datetime.utcnow() + timedelta(days=5),  # triggers rotation warning
        )
        checkout_api_key._raw_key = "sk_live_checkout_key_here"

        # Reporting service key - read access only
        reporting_api_key = ScopedAPIKey(
            service_name="reporting-service",
            provider="stripe",
            permissions=["charges:read"],
            rotate_by=datetime.utcnow() + timedelta(days=60),
        )
        reporting_api_key._raw_key = "sk_live_reporting_key_here"

        checkout_client = ScopedStripeClient(checkout_api_key)
        reporting_client = ScopedStripeClient(reporting_api_key)

        # This works:
        print("Checkout client permissions:", checkout_api_key.permissions)

        # This raises PermissionError - the reporting client cannot create charges
        try:
            reporting_client.create_charge(2000, "tok_visa")
        except PermissionError as e:
            print(f"Caught expected permission error: {e}")
WARNING: API key rotation due soon | service=checkout-service | provider=stripe | key_prefix=sk_live_chec... | days_remaining=5
# No warning for reporting key (60 days out):
[no rotation warning]
# Permissions check:
Checkout client permissions: ['charges:write', 'refunds:write']
# Reporting client attempting to create a charge:
Caught expected permission error: Key for 'reporting-service' lacks charges:write permission. This is likely a scoping error - do not expand permissions. Create a dedicated key with charges:write for the checkout service.
The API Key Graveyard: Real Failure Modes and How to Detect Them
Every API key failure I've seen fits one of three patterns. Learn to recognise the smell of each, because by the time you're debugging them under pressure they all look like generic 'service unavailable' errors.
The first pattern is the silent leak. The key is out in the wild (in a public GitHub repo, in a Slack message, in a Confluence page someone made public) and you don't know yet. The attacker isn't being dramatic. They're making exactly 80 requests per minute to stay under your 100 req/min rate limit. Your metrics look normal. Your error rate is zero. Your bill is climbing. Detection: set up spend anomaly alerts on every API provider that has billing. AWS, Stripe, SendGrid: they all have it. Set the threshold low. A 20% spike in API usage at 2am is worth a PagerDuty alert.
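If your provider exposes usage counts, the 20% rule above reduces to a comparison against a trailing baseline. This is a minimal sketch under that assumption; the function name, the hourly granularity, and the sample numbers are all illustrative, and a production alert would also want to account for daily and weekly seasonality.

```python
from statistics import mean


def is_usage_anomalous(
    baseline_counts: list,
    current_count: int,
    max_increase: float = 0.20,
) -> bool:
    """True when current_count exceeds the baseline mean by more than max_increase."""
    if not baseline_counts:
        return False  # no baseline yet, nothing to compare against
    baseline = mean(baseline_counts)
    if baseline == 0:
        return current_count > 0  # any traffic on a normally idle key is suspicious
    return (current_count - baseline) / baseline > max_increase


if __name__ == "__main__":
    # Hypothetical request counts for the same hour on previous nights
    previous_nights = [100, 110, 90, 100]
    print(is_usage_anomalous(previous_nights, 115))  # False (15% above baseline)
    print(is_usage_anomalous(previous_nights, 125))  # True (25% above baseline)
```

Comparing against the same hour on earlier days is what lets a quiet 2am window trip the alert even when the absolute request rate is far below your limit.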
The second pattern is the rotation death spiral. Someone rotates a key, updates it in the secrets manager, but forgets that four services read that secret at startup and cache it with lru_cache. They're all still using the old key. You start seeing 401s in production. Panicked, someone reverts the rotation. Now you're back to the leaked key and have to do the whole thing again. Fix: implement a cache TTL on secret fetches, and build a /healthz endpoint that validates the API key is still active without caching the result.
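The cache TTL fix can be sketched with nothing but the standard library. The TTLSecret class below is a hypothetical helper, not part of any SDK: it wraps whatever function fetches your secret and re-fetches once the TTL has elapsed, with an invalidate() hook you could call when the upstream answers 401.

```python
import time
from typing import Callable, Optional


class TTLSecret:
    """Cache a fetched secret for ttl_seconds, then fetch it again."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()  # e.g. a call to your secrets manager
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self) -> None:
        """Force the next get() to re-fetch, e.g. after a 401 from the provider."""
        self._value = None


if __name__ == "__main__":
    store = {"stripe": "sk_test_old"}  # stand-in for a secrets manager
    secret = TTLSecret(fetch=lambda: store["stripe"], ttl_seconds=60.0)
    print(secret.get())                # sk_test_old
    store["stripe"] = "sk_test_new"    # simulate a rotation
    print(secret.get())                # sk_test_old (still cached)
    secret.invalidate()
    print(secret.get())                # sk_test_new
```

With a 60-second TTL, a rotated key is picked up within a minute without a redeploy, at the cost of roughly one secrets-manager call per minute per process.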
The third pattern is the scope creep accident. Someone needs a quick fix in staging, expands a key's permissions 'temporarily,' and that change makes it to production. Now your read-only analytics service has write access. It doesn't matter until the analytics service has a bug that starts writing garbage data. Audit your key permissions quarterly: not just whether keys are rotated, but whether their scopes still match what they actually need.
    # io.thecodeforge - System Design tutorial
    # Scenario: A lightweight health-check system that validates API keys are alive
    # and alerts on anomalous usage patterns before the bill arrives
    import logging
    import time
    from collections import deque
    from datetime import datetime

    import requests

    logger = logging.getLogger(__name__)


    # ------------------------------------------------------------
    # PATTERN: Sliding window usage tracker
    # Detects anomalous request spikes that could indicate a leaked key
    # being used by someone else against your quota
    # ------------------------------------------------------------
    class APIKeyUsageMonitor:
        def __init__(
            self,
            service_name: str,
            rate_limit_per_minute: int,
            spike_alert_threshold: float = 0.75,  # alert at 75% of rate limit
        ):
            self.service_name = service_name
            self.rate_limit_per_minute = rate_limit_per_minute
            self.spike_alert_threshold = spike_alert_threshold
            # Deque of timestamps - we keep only the last 60 seconds of requests
            self._request_timestamps: deque = deque()

        def record_request(self):
            now = time.monotonic()
            self._request_timestamps.append(now)
            self._evict_old_timestamps(now)
            self._check_for_spike()

        def _evict_old_timestamps(self, now: float):
            # Remove timestamps older than 60 seconds
            cutoff = now - 60.0
            while self._request_timestamps and self._request_timestamps[0] < cutoff:
                self._request_timestamps.popleft()

        def _check_for_spike(self):
            current_rate = len(self._request_timestamps)
            alert_threshold = int(self.rate_limit_per_minute * self.spike_alert_threshold)
            if current_rate >= alert_threshold:
                logger.warning(
                    "API key usage spike detected - possible key leak or runaway client",
                    extra={
                        "service": self.service_name,
                        "requests_last_60s": current_rate,
                        "rate_limit": self.rate_limit_per_minute,
                        "threshold": alert_threshold,
                        "pct_of_limit": round(current_rate / self.rate_limit_per_minute * 100, 1),
                    },
                )

        def current_usage(self) -> dict:
            now = time.monotonic()
            self._evict_old_timestamps(now)
            return {
                "service": self.service_name,
                "requests_last_60s": len(self._request_timestamps),
                "rate_limit_per_minute": self.rate_limit_per_minute,
                "headroom_remaining": self.rate_limit_per_minute - len(self._request_timestamps),
            }


    # ------------------------------------------------------------
    # PATTERN: Active key health check
    # Run this from your /healthz endpoint - does NOT use lru_cache
    # so it always validates the current key, even after rotation
    # ------------------------------------------------------------
    def validate_stripe_key_is_active(stripe_key: str) -> dict:
        """
        Stripe's /v1/account endpoint requires a valid key and returns account
        metadata, which makes it a convenient 'is this key alive?' check.
        Costs one API call. Cache the RESULT for 60 seconds max, never the key.
        """
        try:
            response = requests.get(
                "https://api.stripe.com/v1/account",
                headers={"Authorization": f"Bearer {stripe_key}"},
                timeout=5,  # never let a health check block indefinitely
            )
            if response.status_code == 200:
                account_data = response.json()
                return {
                    "status": "healthy",
                    "account_id": account_data.get("id"),
                    "charges_enabled": account_data.get("charges_enabled"),
                    "checked_at": datetime.utcnow().isoformat(),
                }
            elif response.status_code == 401:
                # The key is dead - either revoked, rotated, or never valid
                return {
                    "status": "invalid_key",
                    "error": response.json().get("error", {}).get("message"),
                    "action_required": "Rotate key immediately and update secrets manager",
                }
            else:
                return {"status": "unexpected_response", "http_status": response.status_code}
        except requests.Timeout:
            return {"status": "timeout", "note": "Stripe API did not respond within 5s"}
        except requests.ConnectionError:
            return {"status": "network_error", "note": "Cannot reach api.stripe.com"}


    # ------------------------------------------------------------
    # DEMO - simulating usage tracking and health check
    # ------------------------------------------------------------
    if __name__ == "__main__":
        monitor = APIKeyUsageMonitor(
            service_name="checkout-service",
            rate_limit_per_minute=100,
            spike_alert_threshold=0.75,
        )

        # Simulate normal traffic (30 requests)
        for _ in range(30):
            monitor.record_request()
        print("After 30 requests:", monitor.current_usage())

        # Simulate spike (46 more requests, 76 in total - crosses the 75% threshold)
        for _ in range(46):
            monitor.record_request()
        print("After 76 requests:", monitor.current_usage())

        # Health check output (mocked - would hit real Stripe in prod)
        print("\nKey health check result:")
        print({
            "status": "healthy",
            "account_id": "acct_1A2B3C4D5E6F",
            "charges_enabled": True,
            "checked_at": datetime.utcnow().isoformat(),
        })
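After 30 requests: {'service': 'checkout-service', 'requests_last_60s': 30, 'rate_limit_per_minute': 100, 'headroom_remaining': 70}
WARNING: API key usage spike detected - possible key leak or runaway client | service=checkout-service | requests_last_60s=75 | rate_limit=100 | threshold=75 | pct_of_limit=75.0
WARNING: API key usage spike detected - possible key leak or runaway client | service=checkout-service | requests_last_60s=76 | rate_limit=100 | threshold=75 | pct_of_limit=76.0
After 76 requests: {'service': 'checkout-service', 'requests_last_60s': 76, 'rate_limit_per_minute': 100, 'headroom_remaining': 24}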
Key health check result:
{'status': 'healthy', 'account_id': 'acct_1A2B3C4D5E6F', 'charges_enabled': True, 'checked_at': '2024-03-15T03:42:17.221483'}
| Aspect | API Key | OAuth 2.0 Bearer Token (JWT) |
|---|---|---|
| What it proves | Caller has the string, nothing more | Caller authenticated via a trusted identity provider |
| Expiry | Never expires unless manually revoked | Short-lived (typically 15 min to 1 hr), auto-expires |
| Revocation speed | Instant: delete the key server-side | Cannot revoke before expiry without a blocklist |
| Theft impact | Attacker has permanent access until manual revoke | Attacker has access for the remaining token lifetime only |
| Ideal use case | Server-to-server with a secret you fully control | User-facing auth, or anywhere expiry matters |
| Cryptographic proof | None, pure lookup | Yes: signature verified with public key, no DB call needed |
| Storage location | Secrets manager / environment variable | Short-lived, often stored in memory only |
| Rotation complexity | Manual process, operationally risky if cached | Automatic via token expiry and refresh flow |
| Provider-side DB hit per request | Yes: key must be looked up every request | No: signature verification is stateless |
| Setup complexity | Trivial: generate, copy, use | High: OAuth flows, identity providers, token endpoints |
Key Takeaways
- An API key is a lookup token, not a cryptographic proof. Whoever holds the string has the permission, which is why storage and transmission are everything
- The logging trap kills you quietly: Sentry, Datadog, and similar tools will happily capture your Authorization header in error breadcrumbs unless you explicitly scrub them. Go check your existing error logs before finishing this article
- One scoped key per service is the single highest-leverage change you can make. When (not if) a key leaks, scope isolation determines whether you have a five-minute fix or a three-hour incident
- An attacker with your API key doesn't need to hammer your rate limit. They'll stay just under it indefinitely, which means spend anomaly alerts catch leaks that error-rate monitoring completely misses
Common Mistakes to Avoid
- Mistake 1: Hardcoding an API key directly in source code. Symptom: 'No such API key' 401 errors appear in production after key rotation, AND the old key is permanently exposed in git history. Fix: run git log -S 'sk_live' --all right now to audit history; use git filter-repo or BFG Repo-Cleaner to scrub the key, then rotate it immediately; move storage to Secrets Manager going forward
- Mistake 2: Using the same API key across all services (checkout, reporting, cron jobs). Symptom: when any one service is compromised or its key leaks, you revoke it and take down every other service simultaneously. Fix: generate one scoped key per service with least-privilege permissions; use your provider's restricted key feature (Stripe calls these 'Restricted Keys', Twilio calls them 'API Key SIDs')
- Mistake 3: Caching the API key from Secrets Manager with no TTL using lru_cache. Symptom: after key rotation, services keep sending the revoked key and receive HTTP 401, but they won't recover until redeployed. Fix: replace @lru_cache with a timed cache that re-fetches after 60 seconds (use cachetools.TTLCache in Python); add a /healthz route that validates the live key against the upstream API on every call
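Alongside the git history audit in Mistake 1, a small scanner can flag key-shaped strings before they are committed. The sketch below assumes keys look like sk_live_ or sk_test_ followed by at least 16 alphanumeric characters; that pattern and the function name are assumptions for illustration, so extend the regex with whatever formats your providers actually use.

```python
import re

# Assumed key shape for illustration: sk_live_/sk_test_ plus 16+ alphanumerics
KEY_PATTERN = re.compile(r"\bsk_(?:live|test)_[A-Za-z0-9]{16,}\b")


def find_key_candidates(text: str) -> list:
    """Return (line_number, masked_match) for each string that looks like a key."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in KEY_PATTERN.finditer(line):
            key = match.group(0)
            # Report only the prefix so the scanner's own output never leaks the key
            findings.append((line_number, key[:12] + "..."))
    return findings


if __name__ == "__main__":
    sample = 'DEBUG = True\nSTRIPE_KEY = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"\n'
    print(find_key_candidates(sample))  # [(2, 'sk_live_4eC3...')]
```

Run against staged file contents in a pre-commit hook, a non-empty result is a reason to block the commit.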
Interview Questions on This Topic
- Q: Your team rotates a Stripe API key in AWS Secrets Manager. Within two minutes, your checkout service starts returning 401 errors, but your reporting service is fine. Both services use the same key name in Secrets Manager. What's causing the discrepancy, and how do you fix it without a redeployment?
- Q: When would you use an API key instead of OAuth 2.0 for authenticating a microservice, and at what point does that choice become a liability you need to revisit?
- Q: A leaked API key is being used by an attacker at exactly 80 req/min, just under your rate limit of 100 req/min. Your error rate is zero and your latency is normal. How would you detect this attack in a system you're designing today?
Frequently Asked Questions
Is it safe to put an API key in a frontend JavaScript file?
No. Never put a secret API key in frontend code. Anything shipped to the browser is public, full stop. Anyone can open DevTools, go to the Network tab, and read every header your frontend sends. If you need to call a third-party API from the frontend, proxy the call through your backend server, which holds the key. The only keys safe for frontend use are explicitly designated 'public keys' (like Stripe's pk_live_ publishable key), which providers scope to read-only, non-sensitive operations by design.
What's the difference between an API key and an API token?
An API key is a static credential that doesn't expire and maps directly to an account. Think of it as a permanent password. An API token (usually a JWT or OAuth Bearer token) is short-lived, cryptographically signed, and expires automatically. Use API keys for server-to-server integrations where you fully control the secret. Use tokens for anything involving user identity, or anywhere automatic expiry matters more than operational simplicity.
How do I rotate an API key in production without downtime?
Generate the new key first, then deploy it. Don't revoke the old key until you've confirmed the new key is working in production. The sequence: (1) generate new key in the provider dashboard, (2) update the value in Secrets Manager or your secrets manager of choice, (3) trigger a rolling restart of services (or wait for the TTL cache to expire if you've implemented one), (4) verify your health check endpoint returns healthy with the new key, (5) only then revoke the old key. Skipping step 4 before step 5 is how teams create 3am incidents.
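The ordering matters more than the tooling, so here is the five-step sequence as a small orchestration function. Every callable passed in is a hypothetical hook you would implement against your own provider and secrets manager; nothing here is a real SDK call.

```python
from typing import Callable


def rotate_key(
    create_new_key: Callable[[], str],
    store_secret: Callable[[str], None],
    reload_services: Callable[[], None],
    health_check: Callable[[str], bool],
    revoke_old_key: Callable[[], None],
) -> bool:
    """Run the rotation steps in order; revoke the old key only after a healthy check."""
    new_key = create_new_key()      # (1) generate the new key at the provider
    store_secret(new_key)           # (2) update the secrets manager
    reload_services()               # (3) rolling restart, or wait out the cache TTL
    if not health_check(new_key):   # (4) confirm the new key works in production
        return False                #     old key is still valid, so nothing is down
    revoke_old_key()                # (5) only now revoke the old key
    return True


if __name__ == "__main__":
    steps = []
    succeeded = rotate_key(
        create_new_key=lambda: "sk_test_new",
        store_secret=lambda key: steps.append("store"),
        reload_services=lambda: steps.append("reload"),
        health_check=lambda key: True,
        revoke_old_key=lambda: steps.append("revoke"),
    )
    print(succeeded, steps)  # True ['store', 'reload', 'revoke']
```

If the health check fails, the function returns without revoking, so both keys stay valid while you investigate.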
If an attacker gets my API key, can I tell what they did with it?
Only if your API provider logs per-key request history. Most do, but retention windows are short (Stripe keeps 30 days, AWS CloudTrail keeps 90 days by default). The hard reality: if a key was silently leaking for six months at just-under-rate-limit usage, you may never reconstruct the full damage. This is exactly why you should set up spend anomaly alerts and per-key request dashboards proactively, not forensically. After an incident, the first thing to pull is your provider's API usage logs filtered by key prefix, timestamp, and source IP. Source IP mismatches between your known datacenter ranges and unknown ranges are your clearest signal.
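The source-IP comparison is straightforward to automate once you have the log export. This sketch assumes each log entry is a dict with a source_ip field and that you know your own egress ranges as CIDR strings; the field name, the function name, and the addresses (taken from the reserved documentation ranges) are illustrative assumptions.

```python
import ipaddress


def unknown_source_entries(log_entries: list, known_ranges: list) -> list:
    """Return the log entries whose source IP falls outside every known range."""
    networks = [ipaddress.ip_network(cidr) for cidr in known_ranges]
    suspicious = []
    for entry in log_entries:
        address = ipaddress.ip_address(entry["source_ip"])
        if not any(address in network for network in networks):
            suspicious.append(entry)
    return suspicious


if __name__ == "__main__":
    known = ["203.0.113.0/24"]  # hypothetical datacenter egress range
    entries = [
        {"request_id": "req_1", "source_ip": "203.0.113.10"},
        {"request_id": "req_2", "source_ip": "198.51.100.7"},
    ]
    for entry in unknown_source_entries(entries, known):
        print(entry["request_id"], entry["source_ip"])  # req_2 198.51.100.7
```

Anything this returns is a request made with your key from somewhere you don't operate, which is the starting list for the incident timeline.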
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.