Senior 7 min · March 06, 2026

SLA Uptime Calculation — Why 4×99.9% Services Fail 99.6%

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • SLA uptime calculation converts percentage promises into concrete downtime numbers you can act on.
  • 99.9% (three nines) allows ~43 minutes per month; 99.99% allows ~4 minutes; 99.999% allows ~26 seconds.
  • Compound SLAs: multiply each service's uptime. Two 99.9% services in series yield only about 99.8% overall, which is roughly 86 minutes of allowed downtime per month instead of 43.
  • Performance insight: bumping from 99.9% to 99.99% requires roughly 10× in infrastructure cost, not 10%.
  • Production insight: most outage budgets are eaten by planned maintenance because teams forget to exclude it from the calculation.
  • Biggest mistake: assuming 99.9% uptime means you'll only have 8 hours of downtime a year — it's actually 8.76 hours, and that's only if you measure correctly.
Plain-English First

Imagine you hire a babysitter who promises to show up 99% of the time. That sounds great — until you realise that 1% of the year is nearly four days they might just not turn up. An SLA (Service Level Agreement) is that same promise, but between a software service and its users. Uptime is simply the percentage of time your service is actually working. The tricky part is that 99% and 99.9% sound almost identical, but the difference in real downtime is enormous — and that gap is exactly what engineers fight over.

Every time you open Netflix, tap your bank app, or hit send on a Slack message, there is a number quietly sitting behind that experience: an uptime percentage. That number is the spine of every SLA — the contractual promise a service makes about how reliably it will be available. For most users it is invisible. For engineers, it is one of the most consequential numbers they will ever design around.

The problem is that uptime percentages are deeply deceptive. 99% sounds like near-perfection, but it allows for over seven hours of downtime every month. Worse, most real systems are not a single service — they are chains of services, and each link in that chain multiplies the risk. Without understanding how to calculate and compose SLAs correctly, you can architect a system that looks resilient on paper but bleeds reliability in production.

By the end of this article you will be able to read an SLA and immediately translate it into concrete downtime minutes, calculate the real availability of a multi-service architecture, understand error budgets and how teams use them to make deployment decisions, and spot the most common mistakes engineers make when reasoning about nines.

What Is SLA and Uptime Calculation?

SLA and Uptime Calculation is the discipline of converting a service availability promise into measurable downtime budgets. The core concept is simple: an SLA states a target percentage (e.g., 99.9%) over a defined time window (typically a month or year). Uptime calculation then tracks actual availability against that target. But the simplicity ends there — real-world nuances like measurement windows, planned maintenance, and compound services make this a minefield for the unprepared.

The counterpart to an SLA is the error budget: the amount of downtime you're allowed while still meeting the target. Error budgets align engineering velocity with reliability: when the budget is full, you can deploy aggressively; when it's nearly exhausted, you freeze risk. This turns a static contract into a dynamic decision tool.

Key components of any uptime calculation
  • Measurement window: Calendar month, rolling 30 days, or trailing year. Each changes how early you detect problems.
  • Allowed downtime: Total minutes the service can be unavailable per window.
  • Exclusions: Planned maintenance windows (if allowed by contract) must be subtracted from the denominator.
  • Monitoring resolution: The granularity at which you sample uptime — too coarse and you miss short blips.
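The components above can be combined into one small calculation. This is a minimal sketch, assuming downtime has already been totalled in minutes and that the contract allows maintenance to be excluded; the function name `availability_percent` and its parameters are illustrative, not taken from any monitoring tool.

```python
def availability_percent(window_minutes, unplanned_down_minutes,
                         excluded_maintenance_minutes=0.0):
    """Availability over a window, in percent.

    Contract-excluded maintenance is removed from the denominator.
    unplanned_down_minutes must not include downtime that fell inside
    an excluded maintenance window.
    """
    measured = window_minutes - excluded_maintenance_minutes
    if measured <= 0:
        raise ValueError("measured window must be positive")
    return 100.0 * (measured - unplanned_down_minutes) / measured


window = 30 * 24 * 60  # 43,200 minutes in a 30-day month
# 30 minutes of unplanned downtime, 120 minutes of excluded maintenance
print(f"{availability_percent(window, 30, 120):.3f}%")  # 99.930%
```

Without the exclusion, the same month would report (43,200 - 150) / 43,200 ≈ 99.65%, which is why the scope of the denominator has to be agreed before anyone computes a number.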
ForgeExample.java · SYSTEM DESIGN
// TheCodeForge: SLA and Uptime Calculation example
// Always use meaningful names, not x or n
public class ForgeExample {
    public static void main(String[] args) {
        String topic = "SLA and Uptime Calculation";
        System.out.println("Learning: " + topic + " 🔥");
    }
}
Output
Learning: SLA and Uptime Calculation 🔥
Forge Tip:
Type this code yourself rather than copy-pasting. The muscle memory of writing it will help it stick.
Production Insight
The biggest mistake teams make is treating SLA as a single number.
In production, uptime is measured differently depending on whose perspective you take — the end user, the network, or the compute layer.
Rule: always define the measurement window and scope before calculating uptime.
Key Takeaway
SLA is a promise, not a measurement.
Uptime calculation is the math that turns that promise into a budget.
Get the denominator right first — everything else follows.

The Math of Nines — Converting Percentage to Real Downtime

A 'nine' represents a factor of 10 improvement in downtime. 99% (two nines) means 1% downtime. 99.9% (three nines) means 0.1% downtime. The trick is that the percentage looks close, but the absolute downtime differences are massive.

Let's put it in real terms. One year has 365 days × 24 hours × 60 minutes = 525,600 minutes.
  • 99% uptime = 1% downtime = 5,256 minutes = 87.6 hours = 3.65 days
  • 99.9% uptime = 0.1% downtime = 525.6 minutes = 8.76 hours
  • 99.99% uptime = 0.01% downtime = 52.56 minutes = ~53 minutes
  • 99.999% uptime = 0.001% downtime = 5.256 minutes = ~5 minutes
Each extra nine reduces downtime by a factor of 10.

Now imagine you're running a payment gateway. A 99.9% SLA means you can be down for almost 9 hours a year. If your average transaction is $50 and you process 1,000 transactions per minute, 9 hours of downtime could cost $27 million. That's why finance apps demand 99.99% or higher.
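The arithmetic behind that figure is worth making explicit. The sketch below assumes every transaction blocked during an outage is lost outright (none are retried later), so it gives an upper bound; `downtime_cost` is a hypothetical helper name.

```python
def downtime_cost(avg_transaction_usd, transactions_per_minute, downtime_hours):
    """Transaction volume blocked during an outage, in USD (upper bound)."""
    return avg_transaction_usd * transactions_per_minute * 60 * downtime_hours


# The rounded figure from the text: 9 hours of downtime
print(f"${downtime_cost(50, 1000, 9):,.0f}")     # $27,000,000
# The exact 99.9% yearly allowance of 8.76 hours
print(f"${downtime_cost(50, 1000, 8.76):,.0f}")  # $26,280,000
```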

But the cost of achieving higher nines scales nonlinearly. Moving from 99.9% to 99.99% typically requires redundant load balancers, multi-AZ deployment, automated failover, and often database multi-region replication. Infrastructure costs roughly 10× for that single extra nine. You need to be certain the business value justifies the spend.

downtime_calculator.py · PYTHON
def downtime_minutes(uptime_percent, period_days=365):
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100

# Example usage
for nines in [99, 99.9, 99.99, 99.999]:
    mins = downtime_minutes(nines)
    print(f"{nines}% uptime = {mins:.1f} minutes downtime/year")

# Output:
# 99% uptime = 5256.0 minutes downtime/year
# 99.9% uptime = 525.6 minutes downtime/year
# 99.99% uptime = 52.6 minutes downtime/year
# 99.999% uptime = 5.3 minutes downtime/year
Production Insight
Engineers often mistake 99.9% for 'almost perfect' and then budget infrastructure accordingly.
The jump from 99.9% to 99.99% requires redundant hardware, multi-region failover, and often 10× cost.
Don't promise what you can't afford to build.
Key Takeaway
One extra nine reduces downtime by 90%.
But it also multiplies infrastructure cost by about 10×.
Match your SLA to what the business can afford, not what sounds impressive.
Which SLA Target Should You Choose?
If: Internal dev/test environment
Use: 99% (two nines). Acceptable, with downtime of ~3.65 days/year
If: Customer-facing web app, non-critical
Use: 99.9% (three nines). ~8.76 hours/year, typical for SaaS
If: E-commerce or payment processing
Use: 99.99% (four nines). ~53 minutes/year, requires redundancy
If: Real-time trading or medical systems
Use: 99.999% (five nines). ~5 minutes/year, extreme cost and complexity

Compound SLAs — How Microservices Multiply Risk

In a multi-service architecture, the overall availability is not the average of individual service uptimes. It is the product, assuming the services fail independently and every request needs all of them. If Service A is up 99.9% of the time and Service B is up 99.9% of the time, the end-to-end availability is 0.999 × 0.999 ≈ 0.998 = 99.8%. That extra 0.1% loss translates to ~17.5 hours of combined downtime per year instead of 8.76.

This gets worse fast: a chain of four services each at 99.9% yields only 99.6%. That's about 35 hours of downtime per year, roughly four times the downtime of a single 99.9% service.

To compensate, you need at least one service to have a much higher SLA, or you need to build redundant paths so that a single service failure doesn't take down the whole chain. For example, if you have four services in series and you want overall 99.9%, each service must be at least 99.975% reliable. That's a far higher bar than most teams naturally plan for.
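The 99.975% figure comes from inverting the product: if n identical services in series must reach a combined target, each needs the n-th root of that target. A minimal sketch, assuming independent failures and equal per-service targets (the helper name `required_per_service` is illustrative):

```python
def required_per_service(target, n_services):
    """Availability each of n identical serial services needs
    so that their product reaches the overall target."""
    return target ** (1.0 / n_services)


for n in (2, 4, 8):
    per_service = required_per_service(0.999, n)
    print(f"{n} services in series -> each needs {per_service * 100:.4f}%")
```

For four services this prints 99.9750%, matching the figure above. Doubling the chain to eight services pushes the per-service requirement to roughly 99.9875%.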

In production, the weakest link determines the chain's strength — but because the math is multiplicative, even two strong links can't compensate for one weak one. Always compute the compound SLA before setting individual targets.
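Redundant paths change the math from multiplying availabilities to multiplying failure probabilities. The sketch below assumes replicas fail independently and that failover is instant and perfect, which real systems rarely achieve, so treat the result as an optimistic bound. `parallel_availability` is a hypothetical helper.

```python
import math


def parallel_availability(replica_availability, replicas):
    """Availability of a stage that is up if at least one replica is up."""
    return 1.0 - (1.0 - replica_availability) ** replicas


# One stage: two independent 99.9% replicas
stage = parallel_availability(0.999, 2)
print(f"Redundant stage: {stage * 100:.4f}%")

# Four such redundant stages in series
chain = math.prod([stage] * 4)
print(f"Four redundant stages in series: {chain * 100:.4f}%")
```

Under these assumptions, duplicating each 99.9% stage lifts the four-stage chain from about 99.6% to roughly 99.9996%. Correlated failures (shared region, shared deploy, shared config) erode that gain quickly.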

compound_sla.py · PYTHON
import math

def compound_sla(service_uptimes):
    # Availability of services in series is the product of their availabilities
    return math.prod(service_uptimes)

services = [0.999, 0.999, 0.999, 0.999]  # each 99.9%
overall = compound_sla(services)
print(f"Combined uptime: {overall*100:.4f}%")
print(f"Downtime per year: {(1-overall)*525600:.0f} minutes")
# Output:
# Combined uptime: 99.6006%
# Downtime per year: 2099 minutes
# (2099 minutes is about 35 hours)
Chain of Failures
  • Two 99.9% services in series = 99.8% (downtime doubles).
  • Three 99.9% services = 99.7% (downtime triples).
  • Four 99.9% services = 99.6% (downtime quadruples).
  • To maintain overall 99.9% with 4 services, each must be at least 99.975%.
Production Insight
I've seen teams proudly maintain 99.99% on each microservice and still fail the monthly SLA.
They forgot to multiply.
The fix: compute the compound SLA before setting individual service targets and build redundancy for the weakest link.
Key Takeaway
Compound SLA is multiplicative, not additive.
With four services at 99.9%, you've already lost your three-nines promise.
Design each service's SLA knowing the chain length.

Error Budgets — Turning SLAs Into Deployment Decisions

An error budget is the amount of downtime your service is allowed to have within a given period while still meeting the SLA. For a 99.9% monthly SLA, the error budget is 0.1% of the month's total minutes — about 43 minutes.

Teams use error budgets to decide when to deploy. If you've consumed most of your error budget, you can freeze risky deployments until you recover margin. If you're well within budget, you can deploy more aggressively.

This aligns engineering velocity with reliability: you don't have to choose between moving fast and staying up. The error budget tells you exactly where you stand.

In practice, error budgets work best when they are automatically enforced. CI/CD pipelines should query a monitoring system (e.g., Prometheus) for remaining budget before approving a deployment. If the budget is below a threshold (say 20%), the pipeline automatically blocks non-critical changes. This removes the human override problem.
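A deployment gate of this kind can be sketched in a few lines. This version takes the month-to-date downtime as a plain number rather than querying a monitoring system, and the function name, the 20% threshold default, and the `critical_fix` escape hatch are all assumptions for illustration.

```python
def deploy_allowed(sla_percent, window_minutes, downtime_minutes,
                   min_remaining_fraction=0.20, critical_fix=False):
    """Return True if a deployment may proceed under the error budget policy.

    Blocks non-critical changes when the remaining budget falls below
    min_remaining_fraction of the total budget.
    """
    budget = window_minutes * (100 - sla_percent) / 100
    remaining_fraction = (budget - downtime_minutes) / budget
    return critical_fix or remaining_fraction >= min_remaining_fraction


month = 30 * 24 * 60
print(deploy_allowed(99.9, month, downtime_minutes=20))  # True  (about 54% left)
print(deploy_allowed(99.9, month, downtime_minutes=40))  # False (about 7% left)
print(deploy_allowed(99.9, month, downtime_minutes=40, critical_fix=True))  # True
```

In a real pipeline the `downtime_minutes` argument would come from the monitoring system, and the script would exit non-zero to fail the pipeline step when the function returns False.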

Error budgets also highlight reliability debt. If you consistently exhaust your budget early in the month, your architecture is not meeting its target — you need to invest in reliability before features.

error_budget_tracker.py · PYTHON
def error_budget_remaining(sla_percent, month_days, actual_downtime_minutes):
    total_minutes = month_days * 24 * 60
    budget = total_minutes * (100 - sla_percent) / 100
    return budget - actual_downtime_minutes

# Example: April 2026 (30 days), SLA=99.9%
remaining = error_budget_remaining(99.9, 30, 20)
print(f"Error budget remaining: {remaining:.0f} minutes")
# Output: Error budget remaining: 23 minutes
# So you have only 23 minutes left for the rest of the month.
Common Misconception
Error budgets aren't meant to be fully consumed. They're a safety margin. If you regularly hit 90% consumption, your SLA target is too ambitious for the current architecture or your reliability strategy is failing.
Production Insight
Error budgets fail when teams don't enforce the freeze after exhaustion.
Managers override the rule to ship a feature, and the SLA is missed by a few minutes.
If you set a budget, honour it — or change the SLA.
Key Takeaway
An error budget turns a static SLA into a dynamic deployment throttle.
When budget is low, stop risky changes and focus on reliability.
If you never use the budget, you're probably over-engineering reliability.

Monitoring and Reporting — How to Actually Track Uptime

Uptime is only as good as the monitoring that measures it. You need to decide:

  1. Measurement window: Rolling year, calendar month, or sliding 30 days? Most SLAs use calendar month, but rolling windows are better for early detection.
  2. What counts as downtime: Is it binary (up/down) or threshold-based (latency > 5s)? Typically, a period is 'down' only if the service is completely unreachable for a minimum number of consecutive seconds (e.g., 30 seconds).
  3. Planned maintenance: Should it be excluded? If your SLA doesn't explicitly exclude maintenance, all downtime counts. Most enterprise SLAs exclude planned windows with prior notice.
  4. Synthetic monitoring: Probes that simulate user requests are more reliable than server-side metrics alone. Use them to measure availability from the user's perspective.
  5. Alerting: Don't wait for the monthly report. Set up real-time alerts when error budget consumption crosses thresholds like 50%, 75%, 90%.
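The alerting thresholds in point 5 reduce to a comparison between consumed and total budget. A minimal sketch, assuming downtime is tracked in minutes for the current window; `budget_alerts` is a hypothetical helper, not part of any alerting product.

```python
def budget_alerts(sla_percent, window_minutes, downtime_minutes,
                  thresholds=(0.50, 0.75, 0.90)):
    """Return (consumed_fraction, thresholds_crossed) for the current window."""
    budget = window_minutes * (100 - sla_percent) / 100
    consumed = downtime_minutes / budget
    return consumed, [t for t in thresholds if consumed >= t]


consumed, crossed = budget_alerts(99.9, 30 * 24 * 60, downtime_minutes=35)
print(f"Consumed {consumed:.0%} of budget, thresholds crossed: {crossed}")
# Consumed 81% of budget, thresholds crossed: [0.5, 0.75]
```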

A common trap: monitoring resolution that is too coarse. If you check availability every 5 minutes, a 3-minute outage can fall entirely between two checks and stay invisible. You'll report 100% uptime while customers are failing. Rule of thumb: your check interval should be no longer than the shortest outage you want to detect.
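A small simulation makes the resolution problem concrete. It assumes probes fire at fixed multiples of the interval and that a probe detects an outage only if it lands inside it; the function name `detected` and the 60-minute horizon are illustrative choices.

```python
def detected(outage_start, outage_minutes, probe_interval, horizon=60):
    """True if any probe (at t = 0, interval, 2*interval, ...) lands inside
    the outage window [outage_start, outage_start + outage_minutes)."""
    outage_end = outage_start + outage_minutes
    return any(outage_start <= t < outage_end
               for t in range(0, horizon, probe_interval))


# A 3-minute outage starting at minute 1 falls between probes at 0 and 5
print(detected(1, 3, probe_interval=5))  # False
print(detected(1, 3, probe_interval=1))  # True

# Share of possible start minutes at which 5-minute probes catch a 3-minute outage
starts = range(0, 50)
hit_rate = sum(detected(s, 3, 5) for s in starts) / len(starts)
print(hit_rate)  # 0.6
```

With 5-minute probes, two out of five 3-minute outages go completely unseen in this model, and the ones that are seen get recorded as a single failed check rather than as three minutes of downtime.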

prometheus_uptime_query.promql · PROMQL
# Uptime percentage over the last 30 days (sliding window)
# Assumes the standard 'up' metric: 1 when the scrape succeeds, 0 when it fails,
# and that Prometheus retains at least 30 days of data
# Returns one value per scraped target; aggregate (e.g. avg by (job)) if the job has several instances
100 * avg_over_time(up{job="api"}[30d])

# Error budget remaining in minutes
# Budget = 0.001 * (30 * 24 * 60) = 43.2 minutes
# Downtime = fraction of failed scrapes * 43200 minutes in the window
43.2 - (1 - avg_over_time(up{job="api"}[30d])) * 43200
Production Insight
A team I worked with used a 30-minute data resolution and missed three 5-minute blips each month.
Their reported uptime was 99.95% but actual was 99.85% — they were violating their SLA for six months without knowing.
Rule: monitoring resolution must be finer than your shortest expected outage window.
Key Takeaway
Uptime reporting is only as good as the monitoring resolution.
Choose your measurement window, and exclude planned maintenance only if the contract allows it.
Always cross-check synthetic probes against server metrics.

Real-World SLA Patterns and Trade-offs

Designing an SLA isn't just math — it's a business decision. Here are the patterns that actually show up in production contracts:

Inclusion of planned maintenance: Some SLAs carve out maintenance windows (e.g., 2 hours/month). Others expect zero-downtime upgrades. If your SLA excludes maintenance, your monitoring must tag those periods and subtract them from the denominator. A common mistake: failing to exclude maintenance leads to false violation alarms.

Penalty clauses vs credits: Most SLAs offer service credits for violations (e.g., 10% monthly fee refund per 0.1% below target). Credits align incentives — they compensate customers without lawsuits. But credits alone don't fix reliability. Some teams treat credits as a budget line, which is dangerous.

Measurement authority: Who measures uptime — you, the customer, or a third party? If both sides measure differently, disputes arise. Define the measurement method explicitly (synthetic probes from specific locations, using agreed tooling).

Compensation caps: SLAs often cap total credits (e.g., 100% of monthly fee). That means beyond a certain point, you can't compensate for catastrophic downtime. For critical systems, negotiators sometimes add termination rights for repeated violations.
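The credit and cap mechanics described above can be sketched as follows. The 10% per 0.1% schedule and the 100% cap are the examples from the text; rounding a partial 0.1% step up to a full step is an assumption, since real contracts define their own tiers. `service_credit` is a hypothetical helper.

```python
import math


def service_credit(monthly_fee, target_percent, actual_percent,
                   credit_per_step=0.10, step_percent=0.1, cap_fraction=1.0):
    """Service credit owed for a month, in the same currency as monthly_fee."""
    shortfall = target_percent - actual_percent
    if shortfall <= 0:
        return 0.0
    # round() guards against float noise such as 2.0000000000000284 steps
    steps = math.ceil(round(shortfall / step_percent, 9))
    return monthly_fee * min(steps * credit_per_step, cap_fraction)


print(round(service_credit(10_000, 99.9, 99.95), 2))  # 0.0     (target met)
print(round(service_credit(10_000, 99.9, 99.61), 2))  # 3000.0  (3 steps below)
print(round(service_credit(10_000, 99.9, 98.0), 2))   # 10000.0 (capped)
```

The capped case shows the limit of credits as compensation: once the cap is hit, further downtime costs the provider nothing more under the contract, which is why termination rights get negotiated for critical systems.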

Blast radius: A single SLA for a monolithic service is simpler, but fails to account for partial failures. Consider splitting SLAs by criticality: the payment path may need 99.99%, while the reporting path can tolerate 99.9%.

Pattern Insight
Enterprise SLAs often exclude planned maintenance but require minimum advance notice (e.g., 7 days). Automate the tagging of maintenance windows in your monitoring system to avoid false violation alarms.
Production Insight
I've seen a startup lose a major client because they promised 99.99% without understanding the infrastructure cost.
They spent 60% of their burn rate on multi-region deployment and still missed the SLA due to a config error.
Rule: map SLA targets to real costs before signing the contract.
Key Takeaway
SLA design is a trade-off between cost, complexity, and risk appetite.
Don't promise what you can't measure — and don't measure what you can't defend.
Get the contract terms (maintenance, measurement, credits) right before signing.
SLA Contract Decision Tree
If: Client demands 99.99% but budget is < $5k/month
Use: Push back. Explain that achieving 99.99% requires multi-region deployment costing ~$20k+/month. Offer 99.9% with strict maintenance windows.
If: Service is single-node and cannot failover
Use: Do not promise more than 99%. Single-node can't survive hardware failure without downtime.
If: You have full control over the stack (no third-party dependencies)
Use: You can realistically aim for 99.99% with proper redundancy and monitoring.
● Production incident · POST-MORTEM · severity: high

The Microservices Downtime Chain Reaction That Wrecked a Monthly Target

Symptom
End-of-month report showed overall platform uptime of 99.61% despite each individual service meeting its 99.9% target. Customer support reported that the billing API was intermittently unreachable for 15-20 minute periods three times a month.
Assumption
The team assumed that if each service hit 99.9%, the overall system uptime would be around 99.9%. They never multiplied the individual availabilities.
Root cause
Four services chained: 0.999^4 ≈ 0.996. The downtime of each service overlapped only partially, so the combined window of any service being down was about 2.8 hours per month, roughly four times the 43.2 minutes allowed for 99.9%.
Fix
Add a global SLA tracker that computes the product of all service uptimes. Introduce redundancy for the payment and auth services to raise their individual SLAs to 99.99%. Deploy a health check that measures end-to-end availability, not per-service.
Key lesson
  • Always multiply nines across service boundaries — it's not additive, it's multiplicative.
  • A single 99.99% service in the chain can't compensate for three 99.9% services.
  • Monitor end-to-end uptime, not just individual service dashboards.
Production debug guide
Step-by-step symptom-to-action mapping for when your uptime reporting shows red.
Symptom · 01
Monthly uptime report shows value below target by 0.2%
Fix
Check the calculation window — did you include planned maintenance? If yes, subtract that time from the denominator. Recalculate.
Symptom · 02
System is up but customers report intermittent failures
Fix
Look at the error rate separately from the uptime calculation. Uptime only measures total unavailability, not degraded performance. Add an SLO for latency.
Symptom · 03
Individual service dashboards show green, but aggregated SLA is red
Fix
Assume compound SLA failure. Multiply the uptime of all services in the critical path. Find the weakest link and increase its availability or add redundancy.
Symptom · 04
Downtime budget exhausted mid-month
Fix
Freeze all deployments that aren't critical bug fixes. Review incident logs for root causes. Adjust alert thresholds to catch early signs of potential downtime.
★ SLA Violation Quick Debug Cheat Sheet
Five common symptoms when your SLA is at risk, with commands and immediate actions.
Uptime < 99.9% for current month
Immediate action
Calculate remaining downtime budget: (0.1% × total_minutes_in_month) - downtime_minutes_so_far. If remaining <= 0, critical.
Commands
SELECT 100.0 * count(*) FILTER (WHERE status = 'DOWN') / count(*) AS downtime_pct FROM uptime_log WHERE month = current_month;
echo "Remaining downtime minutes: $(( minutes_in_month / 1000 - downtime_minutes ))"  # integer arithmetic; downtime_minutes is the month-to-date total (assumed variable)
Fix now
Pause non-critical deploys, reduce rollout velocity, increase canary duration.
End-to-end monitoring shows red but service dashboards green
Immediate action
Assume compound SLA issue. Collect per-service uptime for the last 24 hours.
Commands
curl -s -G http://prometheus/api/v1/query --data-urlencode "query=avg_over_time(up{job=~'.+'}[24h])" | jq '.data.result[] | {service: .metric.job, uptime: .value[1]}'
python3 -c 'import math; services=[0.999,0.999,0.999,0.999]; print("Combined uptime:", 100*math.prod(services), "%")'
Fix now
Identify which service has the lowest uptime; it's the bottleneck. Add replicas or upgrade its SLA.
Planned maintenance incorrectly included in uptime calculation
Immediate action
Compute correct uptime excluding maintenance windows.
Commands
total_minutes = (end_date - start_date).total_seconds() / 60; maintenance_minutes = sum(maintenance_durations); measured_minutes = total_minutes - maintenance_minutes; uptime = (measured_minutes - unplanned_outage_minutes) / measured_minutes
UPDATE sla_report SET uptime = correct_uptime WHERE month = '2026-04';
Fix now
Automate exclusion of planned maintenance from monitoring queries. Use annotation tags in your monitoring system.
Error budget nearly exhausted mid-month
Immediate action
Calculate daily allowable error budget and compare to actual daily error rate.
Commands
total_budget_minutes = 0.001 * total_month_minutes; daily_budget = total_budget_minutes / 30;
SELECT date, sum(downtime_minutes) AS actual FROM incidents GROUP BY date HAVING sum(downtime_minutes) > 1.44;  -- 1.44 = daily_budget for a 99.9% target over 30 days (43.2 / 30)
Fix now
Implement manual change freeze until error budget recovers. Set up proactive alerts at 50% budget consumption.
Customer complaints about occasional latency, but uptime is fine
Immediate action
Add a latency SLO (e.g., p99 < 200ms) and measure separately.
Commands
curl -s -G http://prometheus/api/v1/query --data-urlencode 'query=histogram_quantile(0.99, sum by (le, job) (rate(request_duration_bucket[5m])))' | jq '.data.result[] | {service: .metric.job, p99: .value[1]}'
echo 'Degraded% = (count where p99 > 200ms) / total_sample * 100'
Fix now
Optimise the slowest endpoint or add caching. SLA for availability is binary; latency degradation is invisible to uptime metrics.
Uptime Levels Comparison
SLA (%) | Downtime per Year | Downtime per Month (30d) | Downtime per Week | Typical Use Case
99% (two nines) | 87.6 hours (~3.65 days) | 7.2 hours | 1.68 hours | Internal dev/test
99.9% (three nines) | 8.76 hours | 43.2 minutes | ~10 minutes | SaaS web apps
99.99% (four nines) | 52.56 minutes | 4.32 minutes | ~1 minute | E-commerce, payments
99.999% (five nines) | 5.26 minutes | 25.9 seconds | ~6 seconds | Real-time trading, healthcare

Key takeaways

  1. You now understand what SLA and Uptime Calculation is and why it exists.
  2. You've seen it working in a real runnable example.
  3. Practice daily: the forge only works when it's hot 🔥
  4. Uptime percentages are deceptive: the difference between 99% and 99.9% is about 79 hours of downtime per year.
  5. Compound SLAs multiply risk; always compute the product of service uptimes.
  6. Error budgets turn a static SLA into a deployment throttle; respect the freeze when budget is low.
  7. Monitoring resolution must be finer than your shortest expected outage to get accurate uptime.
  8. SLA design is a business decision: match the target to cost, not to ego.

Common mistakes to avoid

6 patterns
×

Memorising syntax before understanding the concept

Symptom
Engineers can recite the nines table but can't explain when to exclude maintenance windows or how to handle compound SLA in a real system.
Fix
Focus on understanding the rationale behind uptime calculations. Practice with real data from your monitoring system. Don't just memorise formulas — apply them.
×

Skipping practice and only reading theory

Symptom
After reading, you know the math but your first SLA calculation for a multi-service system has an error because you didn't consider overlapping downtime windows.
Fix
Set up a small project with two mock services and calculate compound SLA for different scenarios. Use Python to simulate and verify.
×

Treating SLA as a single percentage without defining scope

Symptom
Different teams measure uptime differently — one uses server-side pings, another uses client-side latency, yet they report the same 99.9% number. Discrepancies cause false SLA violations.
Fix
Define scope in the SLA contract: measurement method (synthetic vs real-user), measurement window (rolling vs calendar), and exclusion criteria (planned maintenance, force majeure).
×

Assuming linear additive downtime for multi-service architecture

Symptom
System reports 99.95% uptime but end-to-end availability is only 99.7%. Customers experience frequent timeouts but individual service dashboards show green.
Fix
Compute compound SLA as the product of all critical service uptimes. Track end-to-end availability with a single synthetic transaction that exercises the full path.
×

Including planned maintenance in uptime calculation

Symptom
Uptime report shows 99.44% despite no unplanned outages. Team panics, but later realises they didn't exclude a 4-hour maintenance window, which is 0.56% of a 30-day month. The real uptime is 100%.
Fix
Always subtract planned maintenance minutes from the denominator before calculating uptime. Automate this in monitoring queries with annotation tags.
×

Setting SLA target without understanding cost impact

Symptom
Startup promises 99.99% uptime to first customer. After scaling analysis, they realise it requires multi-region deployment costing $50k/month — unsustainable.
Fix
Map SLA targets to real infrastructure costs first. Use the decision tree to choose a target that matches budget and criticality.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
A client promises you 99.9% uptime for their API. What does that mean in...
Q02SENIOR
Explain how compound SLA works. If you have three services each with 99....
Q03SENIOR
How do you use error budgets in practice to decide whether to deploy on ...
Q04SENIOR
Your team promises 99.99% uptime but your current monitoring resolution ...
Q01 of 04SENIOR

A client promises you 99.9% uptime for their API. What does that mean in real terms, and how would you verify it?

ANSWER
99.9% uptime means the API can be down for at most 0.1% of the measurement period. For a 30-day month, that's 43.2 minutes. To verify, I'd set up synthetic monitoring that probes the API every minute from multiple locations, logging each failure. At the end of the month, I'd calculate availability as (total probes - failed probes) / total probes, excluding any planned maintenance windows agreed in the SLA contract. I'd also check the measurement window — calendar month or rolling.
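The verification step in that answer can be sketched directly from raw probe results. This assumes one probe per minute recorded as `(minute, ok)` pairs and maintenance windows given as `(start, end)` minute ranges with an exclusive end; the data layout and the name `probe_availability` are illustrative.

```python
def probe_availability(probes, maintenance_windows=()):
    """Availability in percent: successful probes / counted probes,
    ignoring probes that fall inside an agreed maintenance window."""
    def in_maintenance(t):
        return any(start <= t < end for start, end in maintenance_windows)

    counted = [ok for t, ok in probes if not in_maintenance(t)]
    if not counted:
        raise ValueError("no probes outside maintenance windows")
    return 100.0 * sum(counted) / len(counted)


# 30-day month, one probe per minute
failed = set(range(100, 120)) | set(range(600, 660))  # 20 unplanned + 60 during maintenance
probes = [(t, t not in failed) for t in range(30 * 24 * 60)]

print(f"{probe_availability(probes):.3f}%")                # 99.815%, maintenance counted
print(f"{probe_availability(probes, [(600, 720)]):.3f}%")  # 99.954%, maintenance excluded
```

The same month passes or fails a 99.9% target depending on whether the maintenance window is excluded, which is why the exclusion rules belong in the contract rather than in the reporting script.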
FAQ · 5 QUESTIONS

Frequently Asked Questions

01
What is SLA and Uptime Calculation in simple terms?
02
How is downtime calculated if a service is partially degraded (slow but not down)?
03
What's the difference between Availability and Reliability?
04
Should I include planned maintenance in my uptime calculation?
05
What is the difference between an SLA and an SLO?