Think of an API key like a loyalty card at a coffee shop. The barista doesn't know your name, doesn't check your ID — they just scan the card and know you're allowed to order, how many free drinks you have left, and whether you're a VIP. The card itself IS the permission. Lose it, and whoever finds it can order on your tab until you cancel it. That's it. That's an API key. It's a password-shaped permission slip that you hand to every service call instead of logging in each time.
A developer at a Y Combinator startup pushed to GitHub on a Friday afternoon. By Sunday, a bot had scraped their AWS API key from the commit history, spun up 47 GPU instances for crypto mining, and run up a $17,000 bill. The key had been in the code for exactly 11 minutes before the push. Eleven minutes. The bill took three months to dispute.
API keys are everywhere: every third-party service you integrate, every payment processor, every mapping library, every SMS gateway. They're among the most common authentication mechanisms in modern software, and among the most commonly mishandled. Not because developers are careless, but because nobody sits down and explains what these things actually are, how they flow through a system, and specifically what blows up when you treat them carelessly.
By the end of this, you'll know exactly what an API key is and isn't, how to generate and store one safely, how to pass it correctly in HTTP requests, what rate limiting and key rotation actually look like in practice, and — most critically — the exact mistakes that get people paged at 3am or handed a five-figure cloud bill. No handwaving. No 'just be careful with your keys.' Concrete mechanics, real failure modes, specific fixes.
What an API Key Actually Is (And What It Is Not)
Before you can protect an API key, you need to know what it's doing in the first place. Most explanations skip straight to 'keep it secret' without ever explaining the mechanism. That's why people make mistakes — they're following rules they don't understand.
An API is just a door into someone else's software. Stripe's API is a door into their payment system. The Google Maps API is a door into their mapping engine. You're not running their code — you're sending HTTP requests to their servers, and their servers do the work and send back a response. Simple.
The problem is: that door can't be wide open. Stripe needs to know which requests came from your account so they can bill you, rate-limit you, and lock you out if you do something sketchy. They can't ask you to type a username and password every single time your checkout page needs to verify a card — that would happen dozens of times per second at scale. So instead, they give you a key: a long random string that you attach to every request. Their server sees the key, looks it up in their database, finds your account, and knows who's asking.
Here's the critical thing most juniors get wrong: an API key is NOT encryption. It doesn't scramble your data. It's not a token that proves who you are through math. It's purely a lookup mechanism — a secret identifier that maps to an account in someone else's database. That distinction matters enormously when you're deciding how to store and transmit it.
APIKeyFlowDiagram.systemdesign (plain text)
// io.thecodeforge: SystemDesign tutorial
// Tracing a single API call from your app to a third-party service
// Scenario: Your e-commerce checkout calls Stripe to charge a card

// ─────────────────────────────────────────────────────────────
// STEP 1: Your checkout service builds an HTTP request
// ─────────────────────────────────────────────────────────────
POST https://api.stripe.com/v1/charges
Headers:
  Authorization: Bearer sk_live_4eC39HqLyjWDarjtT1zdp7dc   // <-- the API key
  Content-Type: application/x-www-form-urlencoded
Body:
  amount=2000        // $20.00 in cents
  currency=usd
  source=tok_visa    // tokenised card from Stripe.js

// ─────────────────────────────────────────────────────────────
// STEP 2: Stripe's server receives the request
// (table and column names are illustrative, not Stripe's real schema)
// ─────────────────────────────────────────────────────────────
StripeAPIGateway:
  1. Extract key from Authorization header
       key = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"
  2. Look up key in the provider's internal key store
       SELECT account_id, permissions, rate_limit, is_active
       FROM api_keys
       WHERE key_hash = SHA256("sk_live_4eC39HqLyjWDarjtT1zdp7dc")
       // NOTE: this diagram assumes the provider stores a HASH of your key, not the key itself
       // That way a dump of this table does not hand out usable keys
  3. Key found → account_id  = "acct_1A2B3C4D5E6F"
                 is_active   = true
                 permissions = ["charges:write", "refunds:write"]
                 rate_limit  = 100 requests/second
  4. Check rate limit: current usage 23/100 req/sec → OK
  5. Process the charge against account acct_1A2B3C4D5E6F

// ─────────────────────────────────────────────────────────────
// STEP 3: Stripe responds
// ─────────────────────────────────────────────────────────────
HTTP 200 OK
{
  "id": "ch_3MqLiJKZ2eZvKYlo2T9UW2GX",
  "object": "charge",
  "amount": 2000,
  "status": "succeeded"
}

// ─────────────────────────────────────────────────────────────
// WHAT HAPPENS WITH A BAD KEY
// ─────────────────────────────────────────────────────────────
StripeAPIGateway (bad key scenario):
  1. Extract key: "sk_live_INVALIDKEYHERE"
  2. Hash and look up → no matching row in api_keys table
  3. Return immediately: no account check, no charge processing

HTTP 401 Unauthorized
{
  "error": {
    "code": "api_key_invalid",
    "message": "No such API key: 'sk_live_INVA...HERE'"
  }
}
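The lookup in step 2 can be sketched in a few lines of Python. This is a minimal illustration, not any provider's real implementation: the in-memory `API_KEYS` dict stands in for the key table, and the `sk_demo_` prefix, `issue_key`, and `authenticate` are hypothetical names made up for the example.

```python
import hashlib
import secrets
from typing import Optional

# Hypothetical stand-in for the provider's api_keys table:
# SHA-256 hash of the raw key -> account record.
API_KEYS: dict = {}


def hash_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def issue_key(account_id: str, permissions: list) -> str:
    """Generate a random key, store only its hash, and return the raw key once."""
    raw_key = "sk_demo_" + secrets.token_urlsafe(24)
    API_KEYS[hash_key(raw_key)] = {
        "account_id": account_id,
        "permissions": permissions,
        "is_active": True,
    }
    return raw_key


def authenticate(authorization_header: str) -> Optional[dict]:
    """Return the account record for a 'Bearer <key>' header, or None (a 401)."""
    scheme, _, raw_key = authorization_header.partition(" ")
    if scheme != "Bearer" or not raw_key:
        return None
    record = API_KEYS.get(hash_key(raw_key))
    if record is None or not record["is_active"]:
        return None
    return record
```

Notice there is no cryptographic check on the caller anywhere: whoever presents a string whose hash is in the table gets the account. That is the whole mechanism.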
Never Do This: Confusing API Keys with Authentication
An API key proves nothing about identity through cryptography: it only proves that the caller holds the string. If someone steals your key, the server cannot tell them apart from you. Unlike a JWT, which is cryptographically signed and normally carries an expiry claim, a stolen API key typically stays valid until you revoke it. Build your threat model around that fact.
Where API Keys Live, Travel, and Get Stolen
The key gets generated once. After that, it has to live somewhere in your system, travel with every request, and never appear anywhere a human or bot shouldn't see it. Every one of those three moments is a potential leak point, and I've seen all three fail in production.
Storage is where most teams fail first. The lazy path — and I've seen it in codebases at companies you've heard of — is hardcoding the key directly in source code. It's fast, it works locally, and it will eventually destroy you. GitHub's secret scanning catches some of these and emails the vendor, but by the time the email arrives, automated bots have already scraped the commit. Those bots watch GitHub's public event stream in real time. Real time. Not a crawl — a live stream.
The correct storage pattern is environment variables at minimum, and a secrets manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager) in any production system that matters. The key lives in the secrets manager, your app fetches it at startup or at request time, and it never touches your source control, your logs, or your error reporting service. That last one trips people up constantly: error reporting and observability tools such as Sentry and Datadog can capture request data on errors, depending on how they are configured. If your API key is in a request header and you log the full request on a 500 error, you just wrote your key into your observability stack.
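One cheap safety net for the logging problem is a filter that masks key-shaped strings before a log record leaves the process. The sketch below assumes Stripe-style `sk_live_` / `sk_test_` prefixes; `KEY_PATTERN` and `RedactKeysFilter` are hypothetical names, and the pattern would need adjusting for other providers.

```python
import logging
import re

# Assumed key shape for illustration: "sk_live_..." or "sk_test_..." strings.
KEY_PATTERN = re.compile(r"(sk_(?:live|test)_)[A-Za-z0-9]+")


class RedactKeysFilter(logging.Filter):
    """Mask anything that looks like a secret key in the formatted log message."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # applies %-style args first
        record.msg = KEY_PATTERN.sub(r"\1[REDACTED]", message)
        record.args = ()  # the message is already formatted
        return True  # never drop the record, only rewrite it


# Attach to a handler so every record passing through it is scrubbed.
handler = logging.StreamHandler()
handler.addFilter(RedactKeysFilter())
```

This only rewrites the message text. Values passed via `extra=` or attached by an error reporter are not covered, so treat it as a backstop rather than a replacement for not logging credentials in the first place.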
SecureKeyLoading.py (Python)
# io.thecodeforge: System Design tutorial
# Scenario: Payment service loading a Stripe key safely at startup
# Demonstrating: env vars (dev), secrets manager (prod), and the logging trap
import json
import logging
import os
from functools import lru_cache

import boto3
import requests
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


# ─────────────────────────────────────────────────────────────
# PATTERN 1: Environment variable (acceptable for local dev)
# ─────────────────────────────────────────────────────────────
def load_stripe_key_from_env() -> str:
    key = os.environ.get("STRIPE_SECRET_KEY")
    if not key:
        # Fail loud at startup: better than a cryptic 401 at checkout time
        raise EnvironmentError(
            "STRIPE_SECRET_KEY is not set. "
            "Check your .env file or deployment environment variables."
        )
    # Same default as the startup code below, so an unset APP_ENV counts as development
    if key.startswith("sk_live") and os.environ.get("APP_ENV", "development") == "development":
        # Catch the classic mistake: live key used in local dev
        raise EnvironmentError(
            "Live Stripe key detected in development environment. "
            "Use sk_test_ keys for local development."
        )
    return key


# ─────────────────────────────────────────────────────────────
# PATTERN 2: AWS Secrets Manager (required for production)
# ─────────────────────────────────────────────────────────────
@lru_cache(maxsize=1)  # Cache the secret: don't call Secrets Manager on every request
def load_stripe_key_from_secrets_manager(secret_name: str, region: str) -> str:
    client = boto3.client("secretsmanager", region_name=region)
    try:
        response = client.get_secret_value(SecretId=secret_name)
    except client.exceptions.ResourceNotFoundException as err:
        raise RuntimeError(f"Secret '{secret_name}' not found in Secrets Manager.") from err
    except ClientError as err:
        # Access denials surface as a generic ClientError; check the error code.
        # This usually means your IAM role doesn't have secretsmanager:GetSecretValue
        if err.response["Error"]["Code"] == "AccessDeniedException":
            raise RuntimeError(
                f"IAM permission denied reading '{secret_name}'. "
                "Check your task role policy for secretsmanager:GetSecretValue."
            ) from err
        raise
    secret = json.loads(response["SecretString"])
    return secret["stripe_secret_key"]


# ─────────────────────────────────────────────────────────────
# THE LOGGING TRAP: this is how keys end up in Datadog
# ─────────────────────────────────────────────────────────────
def charge_card_unsafe(stripe_key: str, amount_cents: int, card_token: str):
    headers = {"Authorization": f"Bearer {stripe_key}"}
    response = requests.post(
        "https://api.stripe.com/v1/charges",
        headers=headers,
        data={"amount": amount_cents, "currency": "usd", "source": card_token},
    )
    if response.status_code != 200:
        # DANGER: logging response.request.headers exposes the Authorization header
        # If Sentry or Datadog captures this log, your key is now in their system
        logger.error(f"Stripe charge failed. Request: {response.request.headers}")
    return response.json()


def charge_card_safe(stripe_key: str, amount_cents: int, card_token: str):
    headers = {"Authorization": f"Bearer {stripe_key}"}
    response = requests.post(
        "https://api.stripe.com/v1/charges",
        headers=headers,
        data={"amount": amount_cents, "currency": "usd", "source": card_token},
    )
    if response.status_code != 200:
        # Log only what you need to debug: never log headers containing credentials
        logger.error(
            "Stripe charge failed",
            extra={
                "status_code": response.status_code,
                "stripe_error_code": response.json().get("error", {}).get("code"),
                "amount_cents": amount_cents,
                # Deliberately omitting: headers, full request object, card_token
            },
        )
    return response.json()


# ─────────────────────────────────────────────────────────────
# STARTUP: how the service wires this together
# ─────────────────────────────────────────────────────────────
if __name__ == "__main__":
    env = os.environ.get("APP_ENV", "development")
    if env == "production":
        stripe_key = load_stripe_key_from_secrets_manager(
            secret_name="prod/payment-service/stripe",
            region="us-east-1",
        )
        print("Loaded Stripe key from Secrets Manager")
    else:
        stripe_key = load_stripe_key_from_env()
        print("Loaded Stripe key from environment variable")

    # Sanity check: log key PREFIX only so you can confirm which key is active
    # Never log the full key, even in debug mode
    print(f"Active Stripe key prefix: {stripe_key[:12]}...")
Output
# Production startup:
Loaded Stripe key from Secrets Manager
Active Stripe key prefix: sk_live_4eC3...
# Development startup with test key:
Loaded Stripe key from environment variable
Active Stripe key prefix: sk_test_51Lk...
# Development startup with LIVE key (caught at startup, not at runtime; Python 3 reports EnvironmentError as OSError):
OSError: Live Stripe key detected in development environment. Use sk_test_ keys for local development.
# Production with missing IAM permission:
RuntimeError: IAM permission denied reading 'prod/payment-service/stripe'. Check your task role policy for secretsmanager:GetSecretValue.
Production Trap: Your Error Reporter Is Logging Your Keys
Error reporters attach HTTP request data to the events they capture. Sentry's Django and Flask integrations do this for unhandled exceptions, and if send_default_pii is enabled (or your own code logs request headers), Authorization: Bearer sk_live_... goes straight to Sentry's servers. Fix it: keep sentry_sdk's send_default_pii=False, and configure a before_send hook that scrubs Authorization headers as a second line of defence. Check your existing Sentry issues right now: search for 'Authorization' in the request and breadcrumb data.
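A before_send hook along these lines is one way to do the scrubbing. It is a sketch under assumptions: the event is treated as a plain dict with headers under `event["request"]["headers"]`, `scrub_credentials` and `SENSITIVE_HEADERS` are hypothetical names, and the `sentry_sdk.init` call is left as a comment because it needs the sentry-sdk package and a real DSN.

```python
# Header names treated as credential-bearing (an illustrative list, extend as needed).
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}


def scrub_credentials(event: dict, hint: dict) -> dict:
    """before_send hook: mask credential-bearing request headers in an event."""
    request = event.get("request") or {}
    headers = request.get("headers")
    if isinstance(headers, dict):
        for name in headers:
            if name.lower() in SENSITIVE_HEADERS:
                headers[name] = "[Filtered]"
    return event  # returning the event tells the SDK to send it


# Wiring (requires the sentry-sdk package; the DSN is a placeholder):
# import sentry_sdk
# sentry_sdk.init(
#     dsn="<your DSN>",
#     send_default_pii=False,
#     before_send=scrub_credentials,
# )
```

The hook only covers the request headers on the event. Keys that reach Sentry through log messages or local variables need their own handling.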
Rate Limiting, Key Rotation, and Scoping: The Three Things That Save You
Generating an API key is easy. Managing it across the lifecycle of a production system is where teams fall apart. There are three practices that separate systems that recover from a leaked key in five minutes from systems that spend a week cleaning up the blast radius.
Rate limiting is your circuit breaker. Every serious API provider implements it: they'll return HTTP 429 Too Many Requests when you exceed your quota. But here's what most juniors don't realize: the provider's rate limit protects the provider, not you. It caps how fast any one key can consume their capacity, but it doesn't stop an attacker holding your key from making 99 requests per minute (just under a 100 per minute limit) indefinitely, all billed to you. You also need your own rate limiting on inbound requests to your service, separate from whatever the upstream API enforces.
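For the inbound side, a token bucket is one common way to do it. The sketch below is a single in-process bucket with an injectable clock. `TokenBucket` and its parameters are hypothetical names, and a real deployment would typically keep one bucket per client and store the state somewhere shared.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_second` tokens."""

    def __init__(self, rate_per_second: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._last_refill = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._last_refill
        # Refill based on elapsed time, never exceeding capacity
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last_refill = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False  # caller should respond with HTTP 429
```

In a request handler you would call `allow()` before touching the upstream API and return a 429 yourself when it says no, so a runaway client burns your budget instead of your provider quota.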
Key rotation means proactively replacing your API keys on a schedule, even if they haven't leaked. The argument against it — 'why fix what isn't broken?' — ignores the reality that you often don't know a key is compromised until damage is done. Rotate quarterly at minimum. Rotate immediately any time a developer with access leaves the company. Rotate immediately if the key appears anywhere it shouldn't. The operational cost of rotation is low if you've already externalized keys to a secrets manager — it's a one-line update, not a deployment.
Scoping means giving each key only the permissions it actually needs. Don't use your master admin key in your read-only reporting service. If that reporting service gets compromised, the attacker should get read access to your data — not write access, not billing access, not the ability to create new API keys. Most providers let you scope keys to specific operations. Use it every time.
APIKeyRotationAndScoping.py (Python)
# io.thecodeforge: System Design tutorial
# Scenario: Internal API gateway managing keys for microservices
# Demonstrates: scoped keys, rotation tracking, and handling 429s correctly
import logging
from dataclasses import dataclass, field
from datetime import datetime, timedelta

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

logger = logging.getLogger(__name__)


# ─────────────────────────────────────────────────────────────
# DATA MODEL: what a managed API key looks like internally
# ─────────────────────────────────────────────────────────────
@dataclass
class ScopedAPIKey:
    service_name: str        # which internal service owns this key
    provider: str            # e.g. "stripe", "sendgrid", "googlemaps"
    permissions: list[str]   # e.g. ["charges:write"], not ["*"]
    created_at: datetime = field(default_factory=datetime.utcnow)
    rotate_by: datetime = field(default_factory=lambda: datetime.utcnow() + timedelta(days=90))
    _raw_key: str = field(default="", repr=False)  # never printed in logs or repr

    @property
    def key_prefix(self) -> str:
        # Safe to log: enough to identify which key is active without exposing it
        return self._raw_key[:12] + "..."

    @property
    def days_until_rotation(self) -> int:
        # timedelta.days rounds down, so "5 days minus a few milliseconds" reports 4
        return (self.rotate_by - datetime.utcnow()).days

    @property
    def needs_rotation(self) -> bool:
        return self.days_until_rotation <= 7  # warn a week out


# ─────────────────────────────────────────────────────────────
# RETRY LOGIC: handling 429s without hammering the upstream
# ─────────────────────────────────────────────────────────────
def build_resilient_http_session(total_retries: int = 3) -> requests.Session:
    session = requests.Session()
    # Retry on 429 (rate limit) and 503 (upstream temporarily unavailable)
    # backoff_factor=2 gives exponential waits (roughly 2s, 4s, 8s; the exact
    # schedule depends on the urllib3 version)
    # By default urllib3 only retries idempotent methods on these status codes,
    # so the POST in create_charge is not replayed automatically
    retry_strategy = Retry(
        total=total_retries,
        status_forcelist=[429, 503],
        backoff_factor=2,
        respect_retry_after_header=True,  # honour the upstream's Retry-After header
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    return session


# ─────────────────────────────────────────────────────────────
# SCOPED REQUEST BUILDER: enforces least-privilege per service
# ─────────────────────────────────────────────────────────────
class ScopedStripeClient:
    """
    Each internal service gets its own ScopedStripeClient with its own key.
    The checkout service gets charges:write.
    The reporting service gets charges:read only.
    A compromised reporting service cannot create charges.
    """

    def __init__(self, api_key: ScopedAPIKey):
        self._key = api_key
        self._session = build_resilient_http_session()
        self._check_rotation_status()

    def _check_rotation_status(self):
        if self._key.needs_rotation:
            # Warn loudly at startup: gives the ops team time to rotate before the deadline
            logger.warning(
                "API key rotation due soon",
                extra={
                    "service": self._key.service_name,
                    "provider": self._key.provider,
                    "key_prefix": self._key.key_prefix,
                    "days_remaining": self._key.days_until_rotation,
                },
            )

    def get_charge(self, charge_id: str) -> dict:
        # Reporting service uses this: read-only, no ability to create/modify
        if "charges:read" not in self._key.permissions:
            raise PermissionError(
                f"Key for '{self._key.service_name}' lacks charges:read permission. "
                f"Granted permissions: {self._key.permissions}"
            )
        response = self._session.get(
            f"https://api.stripe.com/v1/charges/{charge_id}",
            headers={"Authorization": f"Bearer {self._key._raw_key}"},
        )
        response.raise_for_status()
        return response.json()

    def create_charge(self, amount_cents: int, card_token: str) -> dict:
        # Checkout service uses this: requires explicit write permission
        if "charges:write" not in self._key.permissions:
            raise PermissionError(
                f"Key for '{self._key.service_name}' lacks charges:write permission. "
                f"This is likely a scoping error — do not expand permissions. "
                f"Create a dedicated key with charges:write for the checkout service."
            )
        response = self._session.post(
            "https://api.stripe.com/v1/charges",
            headers={"Authorization": f"Bearer {self._key._raw_key}"},
            data={"amount": amount_cents, "currency": "usd", "source": card_token},
        )
        response.raise_for_status()
        return response.json()


# ─────────────────────────────────────────────────────────────
# EXAMPLE USAGE: wiring up two services with different scopes
# ─────────────────────────────────────────────────────────────
if __name__ == "__main__":
    # Checkout service key: write access
    checkout_api_key = ScopedAPIKey(
        service_name="checkout-service",
        provider="stripe",
        permissions=["charges:write", "refunds:write"],
        rotate_by=datetime.utcnow() + timedelta(days=5),  # triggers rotation warning
    )
    checkout_api_key._raw_key = "sk_live_checkout_key_here"

    # Reporting service key: read access only
    reporting_api_key = ScopedAPIKey(
        service_name="reporting-service",
        provider="stripe",
        permissions=["charges:read"],
        rotate_by=datetime.utcnow() + timedelta(days=60),
    )
    reporting_api_key._raw_key = "sk_live_reporting_key_here"

    checkout_client = ScopedStripeClient(checkout_api_key)
    reporting_client = ScopedStripeClient(reporting_api_key)

    # This works:
    print("Checkout client permissions:", checkout_api_key.permissions)

    # This raises PermissionError: the reporting client cannot create charges
    try:
        reporting_client.create_charge(2000, "tok_visa")
    except PermissionError as e:
        print(f"Caught expected permission error: {e}")
Output
# Startup warning as rendered by a structured log formatter (checkout key is due within 5 days; timedelta.days rounds down, so 4 is reported):
WARNING: API key rotation due soon | service=checkout-service | provider=stripe | key_prefix=sk_live_chec... | days_remaining=4
Caught expected permission error: Key for 'reporting-service' lacks charges:write permission. This is likely a scoping error — do not expand permissions. Create a dedicated key with charges:write for the checkout service.
Senior Shortcut: One Key Per Service, Never One Key Per Company
The single biggest operational upgrade you can make today: stop using one shared API key across all your services. Give each service its own scoped key. When a key leaks, you revoke exactly that key, you know exactly which service was compromised, and every other service keeps running. With a shared key, a leak in your reporting cron job takes down your payment flow while you rotate. Scope isolation is your blast radius limiter.
The API Key Graveyard: Real Failure Modes and How to Detect Them
Every API key failure I've seen fits one of three patterns. Learn to recognise the smell of each, because by the time you're debugging them under pressure they all look like generic 'service unavailable' errors.
The first pattern is the silent leak. The key is out in the wild — in a public GitHub repo, in a Slack message, in a Confluence page someone made public — and you don't know yet. The attacker isn't being dramatic. They're making exactly 80 requests per minute to stay under your 100 req/min rate limit. Your metrics look normal. Your error rate is zero. Your bill is climbing. Detection: set up spend anomaly alerts on every API provider that has billing. AWS, Stripe, SendGrid — they all have it. Set the threshold low. A 20% spike in API usage at 2am is worth a PagerDuty alert.
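The spike check behind a spend anomaly alert can be as simple as comparing the current window against a baseline. This is a sketch under stated assumptions: `is_usage_anomalous` is a hypothetical helper, the counts are whatever your provider's usage export or your own metrics give you, and the 20% default mirrors the threshold mentioned above.

```python
from statistics import mean


def is_usage_anomalous(
    baseline_counts: list,
    current_count: float,
    spike_ratio: float = 0.20,
    min_baseline: float = 1.0,
) -> bool:
    """Return True if current usage exceeds the baseline average by more than spike_ratio.

    baseline_counts: request counts for comparable past windows
    (for example, the same hour of day over the last two weeks).
    """
    if not baseline_counts:
        return False  # no history yet, nothing to compare against
    # min_baseline avoids alerting on tiny absolute numbers (0 -> 1 requests)
    baseline = max(mean(baseline_counts), min_baseline)
    return current_count > baseline * (1.0 + spike_ratio)
```

Comparing against the same hour of day matters: an attacker running 80 requests per minute at 2am stands out against a quiet overnight baseline even though it would vanish inside daytime traffic.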
The second pattern is the rotation death spiral. Someone rotates a key, updates it in the secrets manager, but forgets that four services read that secret at startup and cache it with lru_cache. They're all still using the old key. You start seeing 401s in production. Panicked, someone reverts the rotation. Now you're back to the leaked key and have to do the whole thing again. Fix: implement a cache TTL on secret fetches, and build a /healthz endpoint that validates the current key against the provider instead of trusting a long-lived cached copy.
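A cache TTL on secret fetches might look like the sketch below. `TTLSecret` is a hypothetical helper, not part of any SDK: `fetch` is whatever callable reads the secret (for example a function wrapping a secrets manager call), and the clock is injectable so the expiry logic can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class TTLSecret:
    """Cache a secret for at most ttl_seconds, then re-fetch it."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self, force_refresh: bool = False) -> str:
        now = self._clock()
        if force_refresh or self._value is None or now >= self._expires_at:
            # Cache miss, expired entry, or an explicit bypass (e.g. from /healthz)
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

With this in place a rotated key is picked up within one TTL window without a restart, and a health check can call `get(force_refresh=True)` to validate what the secrets manager currently holds.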
The third pattern is the scope creep accident. Someone needs a quick fix in staging, expands a key's permissions 'temporarily,' and that change makes it to production. Now your read-only analytics service has write access. It doesn't matter until the analytics service has a bug that starts writing garbage data. Audit your key permissions quarterly — not just whether keys are rotated, but whether their scopes still match what they actually need.
APIKeyHealthMonitor.py (Python)
# io.thecodeforge: System Design tutorial
# Scenario: A lightweight health-check system that validates API keys are alive
# and alerts on anomalous usage patterns before the bill arrives
import logging
import time
from collections import deque
from datetime import datetime

import requests

logger = logging.getLogger(__name__)


# ─────────────────────────────────────────────────────────────
# PATTERN: Sliding window usage tracker
# Detects anomalous request spikes that could indicate a leaked key
# being used by someone else against your quota
# ─────────────────────────────────────────────────────────────
class APIKeyUsageMonitor:
    def __init__(
        self,
        service_name: str,
        rate_limit_per_minute: int,
        spike_alert_threshold: float = 0.75,  # alert at 75% of rate limit
    ):
        self.service_name = service_name
        self.rate_limit_per_minute = rate_limit_per_minute
        self.spike_alert_threshold = spike_alert_threshold
        # Deque of timestamps: we keep only the last 60 seconds of requests
        self._request_timestamps: deque = deque()

    def record_request(self):
        now = time.monotonic()
        self._request_timestamps.append(now)
        self._evict_old_timestamps(now)
        self._check_for_spike()

    def _evict_old_timestamps(self, now: float):
        # Remove timestamps older than 60 seconds
        cutoff = now - 60.0
        while self._request_timestamps and self._request_timestamps[0] < cutoff:
            self._request_timestamps.popleft()

    def _check_for_spike(self):
        current_rate = len(self._request_timestamps)
        alert_threshold = int(self.rate_limit_per_minute * self.spike_alert_threshold)
        if current_rate >= alert_threshold:
            logger.warning(
                "API key usage spike detected: possible key leak or runaway client",
                extra={
                    "service": self.service_name,
                    "requests_last_60s": current_rate,
                    "rate_limit": self.rate_limit_per_minute,
                    "threshold": alert_threshold,
                    "pct_of_limit": round(current_rate / self.rate_limit_per_minute * 100, 1),
                },
            )

    def current_usage(self) -> dict:
        now = time.monotonic()
        self._evict_old_timestamps(now)
        return {
            "service": self.service_name,
            "requests_last_60s": len(self._request_timestamps),
            "rate_limit_per_minute": self.rate_limit_per_minute,
            "headroom_remaining": self.rate_limit_per_minute - len(self._request_timestamps),
        }


# ─────────────────────────────────────────────────────────────
# PATTERN: Active key health check
# Run this from your /healthz endpoint. It does NOT use lru_cache,
# so it always validates the current key, even after rotation
# ─────────────────────────────────────────────────────────────
def validate_stripe_key_is_active(stripe_key: str) -> dict:
    """
    Stripe's /v1/account endpoint requires a valid key and returns
    account metadata, which makes it a convenient 'is this key alive?' check.
    Costs one API call. Cache the RESULT for 60 seconds max, never the key.
    """
    try:
        response = requests.get(
            "https://api.stripe.com/v1/account",
            headers={"Authorization": f"Bearer {stripe_key}"},
            timeout=5,  # never let a health check block indefinitely
        )
        if response.status_code == 200:
            account_data = response.json()
            return {
                "status": "healthy",
                "account_id": account_data.get("id"),
                "charges_enabled": account_data.get("charges_enabled"),
                "checked_at": datetime.utcnow().isoformat(),
            }
        elif response.status_code == 401:
            # The key is dead: either revoked, rotated, or never valid
            return {
                "status": "invalid_key",
                "error": response.json().get("error", {}).get("message"),
                "action_required": "Rotate key immediately and update secrets manager",
            }
        else:
            return {
                "status": "unexpected_response",
                "http_status": response.status_code,
            }
    except requests.Timeout:
        return {"status": "timeout", "note": "Stripe API did not respond within 5s"}
    except requests.ConnectionError:
        return {"status": "network_error", "note": "Cannot reach api.stripe.com"}


# ─────────────────────────────────────────────────────────────
# DEMO: simulating usage tracking and health check
# ─────────────────────────────────────────────────────────────
if __name__ == "__main__":
    monitor = APIKeyUsageMonitor(
        service_name="checkout-service",
        rate_limit_per_minute=100,
        spike_alert_threshold=0.75,
    )

    # Simulate normal traffic (30 requests)
    for _ in range(30):
        monitor.record_request()
    print("After 30 requests:", monitor.current_usage())

    # Simulate spike (46 more requests, 76 in total: crosses the 75% threshold)
    for _ in range(46):
        monitor.record_request()
    print("After 76 requests:", monitor.current_usage())

    # Health check output (mocked; would hit real Stripe in prod)
    print("\nKey health check result:")
    print({
        "status": "healthy",
        "account_id": "acct_1A2B3C4D5E6F",
        "charges_enabled": True,
        "checked_at": datetime.utcnow().isoformat(),
    })
Interviewers love this one: 'You rotate an API key in Secrets Manager but services start returning 401 — why?' The answer: services fetched the old key at startup and cached it in memory with no TTL. Fix with two things: set a max cache TTL of 60 seconds on secret fetches, and have your health check endpoint always re-fetch the key from Secrets Manager (bypassing cache) so you catch rotation failures within one health check cycle.
| Aspect | API Key | OAuth 2.0 Bearer Token (JWT) |
| --- | --- | --- |
| What it proves | Caller has the string, nothing more | Caller authenticated via a trusted identity provider |
| Expiry | Never expires unless manually revoked | Short-lived (typically 15min–1hr), auto-expires |
| Revocation speed | Instant: delete the key server-side | Cannot revoke before expiry without a blocklist |
| Theft impact | Attacker has permanent access until manual revoke | Attacker has access for the remaining token lifetime only |
| Ideal use case | Server-to-server with a secret you fully control | User-facing auth, or anywhere expiry matters |
| Cryptographic proof | None, pure lookup | Yes: signature verified with public key, no DB call needed |
| Storage location | Secrets manager / environment variable | Short-lived, often stored in memory only |
| Rotation complexity | Manual process, operationally risky if cached | Automatic via token expiry and refresh flow |
| Provider-side DB hit per request | Yes: key must be looked up every request | No: signature verification is stateless |
| Setup complexity | Trivial: generate, copy, use | High: OAuth flows, identity providers, token endpoints |
Key takeaways
1. An API key is a lookup token, not a cryptographic proof: whoever holds the string has the permission, which is why storage and transmission are everything.
2. The logging trap kills you quietly: Sentry, Datadog, and similar tools will happily capture your Authorization header in error breadcrumbs unless you explicitly scrub them, so go check your existing error logs before finishing this article.
3. One scoped key per service is the single highest-leverage change you can make: when (not if) a key leaks, scope isolation determines whether you have a five-minute fix or a three-hour incident.
4. An attacker with your API key doesn't need to hammer your rate limit: they'll stay just under it indefinitely, which means spend anomaly alerts catch leaks that error-rate monitoring completely misses.
Frequently Asked Questions
01. Is it safe to put an API key in a frontend JavaScript file?
No. Never put a secret API key in frontend code. Anything shipped to the browser is public, full stop. Anyone can open DevTools, go to the Network tab, and read every header your frontend sends. If you need to call a third-party API from the frontend, proxy the call through your backend server, which holds the key. The only keys safe for frontend use are explicitly designated 'public keys' (like Stripe's pk_live_ publishable key), which providers restrict by design to a narrow set of low-risk operations, such as tokenising card details.
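The proxy idea boils down to this: the browser sends your backend a payload, and the backend builds the upstream request with a key the browser never sees. A minimal sketch, assuming a made-up upstream URL and a hypothetical `THIRD_PARTY_API_KEY` environment variable. It only builds the request object and leaves sending it to your HTTP client of choice.

```python
import json
import os
import urllib.request

UPSTREAM_URL = "https://api.example.com/v1/geocode"  # hypothetical third-party endpoint


def build_upstream_request(browser_payload: dict) -> urllib.request.Request:
    """Build the server-side call; the key comes from the server's environment."""
    api_key = os.environ["THIRD_PARTY_API_KEY"]  # hypothetical variable name
    # Forward only the fields the browser is allowed to control,
    # so a client cannot smuggle in its own headers or parameters.
    body = json.dumps({"address": browser_payload.get("address", "")}).encode("utf-8")
    return urllib.request.Request(
        UPSTREAM_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Your backend endpoint should still authenticate and rate-limit its own callers. Otherwise the proxy simply becomes a free, keyless way for anyone to spend your quota.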
02. What's the difference between an API key and an API token?
An API key is a static credential that doesn't expire and maps directly to an account — think of it as a permanent password. An API token (usually a JWT or OAuth Bearer token) is short-lived, cryptographically signed, and expires automatically. Use API keys for server-to-server integrations where you fully control the secret. Use tokens for anything involving user identity, or anywhere automatic expiry matters more than operational simplicity.
03. How do I rotate an API key in production without downtime?
Generate the new key first, then deploy it. Don't revoke the old key until you've confirmed the new key is working in production. The sequence: (1) generate new key in the provider dashboard, (2) update the value in Secrets Manager or your secrets manager of choice, (3) trigger a rolling restart of services (or wait for the TTL cache to expire if you've implemented one), (4) verify your health check endpoint returns healthy with the new key, (5) only then revoke the old key. Skipping step 4 before step 5 is how teams create 3am incidents.
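The five steps can be sketched as a single function with the provider-specific parts injected as callables. All names here (`rotate_key` and its parameters) are hypothetical. The rollback on a failed health check is an added assumption rather than part of the sequence above, and step 3 (the rolling restart or TTL expiry) happens outside the function.

```python
from typing import Callable


def rotate_key(
    generate_new_key: Callable[[], str],
    store_secret: Callable[[str], None],
    health_check: Callable[[str], bool],
    revoke_key: Callable[[str], None],
    old_key: str,
) -> str:
    """Rotate with overlap: the old key stays valid until the new one is verified."""
    new_key = generate_new_key()   # (1) create the new key at the provider
    store_secret(new_key)          # (2) publish it to the secrets manager
    # (3) services pick it up via rolling restart or cache TTL expiry
    if not health_check(new_key):  # (4) verify before touching the old key
        # Assumption: put the still-valid old key back so services keep working
        store_secret(old_key)
        raise RuntimeError("New key failed health check; old key left active")
    revoke_key(old_key)            # (5) only now is it safe to revoke
    return new_key
```

The ordering is the whole point: `revoke_key` is unreachable unless the health check has passed, which is exactly the step 4 before step 5 rule.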
04. If an attacker gets my API key, can I tell what they did with it?
Only if your API provider logs per-key request history — and most do, but retention windows are short (Stripe keeps 30 days, AWS CloudTrail keeps 90 days by default). The hard reality: if a key was silently leaking for six months at just-under-rate-limit usage, you may never reconstruct the full damage. This is exactly why you should set up spend anomaly alerts and per-key request dashboards proactively, not forensically. After an incident, the first thing to pull is your provider's API usage logs filtered by key prefix, timestamp, and source IP — source IP mismatches between your known datacenter ranges and unknown ranges are your clearest signal.
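The source IP comparison is easy to automate once you have the usage logs exported. A rough sketch, assuming each log entry is a dict with a `source_ip` field and that `KNOWN_RANGES` lists your own egress networks. Both the field name and the example ranges are made up for illustration.

```python
import ipaddress

# Hypothetical egress ranges for your own infrastructure
KNOWN_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def find_unknown_sources(log_entries: list, known_ranges: list) -> list:
    """Return the log entries whose source IP falls outside every known range."""
    suspicious = []
    for entry in log_entries:
        address = ipaddress.ip_address(entry["source_ip"])
        if not any(address in network for network in known_ranges):
            suspicious.append(entry)
    return suspicious
```

Anything this returns is a request made with your key from somewhere you don't run code, which is the clearest starting point for reconstructing what an attacker did.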