Mid-level 10 min · March 06, 2026

Back of Envelope Estimation — The 5 Critical Numbers

100k users × 50 MB daily × 30 days × 3x replication = 450 TB, not 10 TB.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Back-of-envelope estimation = rough system sizing using basic math and memorised constants. 5-30 minutes, ±2x accuracy.
  • Key numbers: 20k QPS per commodity server, 5k writes/sec per DB, 2-10ms disk read, 1ms network RTT, 500KB per social user, 100MB per video minute.
  • Storage formula: users × data per user × retention × overhead (replication 3x, indexes 1.3x, logs 1.5x) = total GB.
  • Throughput: peak QPS = DAU × actions/user/day ÷ 86400 × peak factor (3-10x). Bandwidth = QPS × request size.
  • Performance insight: underestimating by 10% at launch costs more than overestimating by 2x — add safety margin.
  • Production trap: forgetting egress costs ($0.09/GB) turns $5k estimate into $28k bill. Always model data transfer.
  • Biggest mistake: using average instead of peak throughput. Traffic spikes to 3-10x the average; without a peak factor, your system crashes at 2 PM.
Plain-English First

Imagine a contractor walks through your house and says 'yeah, this kitchen reno will run about $15,000' — without pulling out a calculator or measuring tape. They're using years of experience, rough rules, and known costs-per-square-foot to give you a number that's close enough to act on. That's back-of-envelope estimation. In system design, it means quickly calculating whether your architecture can handle 10 million users before you spend six months building it — using nothing but a few key numbers you've memorised and some basic math.

Every large-scale system that ever failed probably had an engineer somewhere who skipped the math. Twitter's early 'fail whale' wasn't bad code — it was a system designed for far fewer requests than it actually received. Back-of-envelope estimation is the skill that separates engineers who build systems that survive launch day from those who scramble to add servers at 2 AM. It's not about being precise; it's about being right enough, fast enough, to make good architectural decisions.

Here's the hard truth: if you can't estimate storage within 2x and throughput within 3x before you write a line of code, you're flying blind. I've seen teams burn six-figure cloud bills because they forgot to multiply by replication factor. Estimation isn't the cost — it's the insurance.

The most common failure in production? Forgetting the peak factor. Your system handles 1000 QPS on average, so you provision for 1500. Then a viral tweet hits at 2 PM and traffic spikes to 8000 QPS. Your database melts. Provisioning for a peak factor near the top of the 3-10x range would have saved you. Estimate for peak, not average.

The Key Numbers Every Engineer Must Memorise

Back-of-envelope estimation relies on a small set of well-known numbers. Memorise these and you can size almost any system.

  • Single server (commodity): ~10,000–20,000 QPS for simple API endpoints
  • Database writes: ~1,000–5,000 writes/sec on a standard MySQL instance
  • Disk read latency: ~2–10 ms for SSD, ~10–40 ms for HDD
  • Network round trip in same region: ~0.5–2 ms
  • Storage per user (social app): ~500 KB per user, including profile and photo thumbnails
  • Storage per video minute: ~100 MB compressed (HD)
  • Bandwidth per user for real-time chat: ~1 KB/s per active connection
  • Caching (Redis) memory cost: ~$0.1/GB/month on cloud
  • Database storage cost: ~$0.25/GB/month for SSD-backed cloud DB
  • Egress cost (AWS): $0.09/GB to internet; $0.02/GB inter-region

These numbers change slowly and vary by cloud provider. Update them every six months. Use the cloud provider's pricing calculator for current rates.

Here's a trick: keep a personal cheat sheet in your notes app. When you run a load test or see a production bottleneck, update the constants. Over time, you'll build your own tuned set of numbers that are far more accurate than any generic list.

Don't trust cloud provider defaults blindly. I once saw an engineer use AWS's advertised RDS IOPS for a db.r5.large — actual throughput was less than half. Always baseline with your own load test.

Another trap: using 20k QPS for a Node.js server when your app does heavy JSON parsing. That number assumes simple requests. If each request deserialises a 50KB payload, your actual QPS drops to maybe 5k. Know your workload pattern.
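One rough way to derate the per-server figure is to work from CPU time per request. The sketch below assumes a purely CPU-bound handler, where throughput is capped at available CPU time divided by the CPU time one request consumes. The function name, the 8-core server, and the per-request CPU times are illustrative assumptions, not measurements.

```python
def max_qps_per_server(cores: int, cpu_ms_per_request: float,
                       target_utilisation: float = 1.0) -> float:
    """Upper bound on QPS for a CPU-bound service.

    Each core supplies 1000 ms of CPU time per second; dividing by the CPU
    time one request consumes gives the request rate the server can sustain.
    """
    if cpu_ms_per_request <= 0:
        raise ValueError("cpu_ms_per_request must be positive")
    return cores * 1000.0 * target_utilisation / cpu_ms_per_request


# Hypothetical 8-core server with made-up CPU costs per request
light = max_qps_per_server(cores=8, cpu_ms_per_request=0.4)  # simple request
heavy = max_qps_per_server(cores=8, cpu_ms_per_request=1.6)  # large JSON payload
print(f"light requests: ~{light:,.0f} QPS")
print(f"heavy requests: ~{heavy:,.0f} QPS")
```

With these invented CPU times the bound drops from about 20k to about 5k QPS, the same 4x derating described above. A load test that records CPU time per request replaces the guesses with real inputs.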

The single most important number for cost modelling? Egress. Many teams model compute and storage perfectly, then forget data transfer. On AWS, 1 PB of egress costs $90,000. That's not a rounding error — it's a hiring decision.
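The egress arithmetic is simple enough to script. This is a minimal sketch that assumes the flat $0.09/GB internet rate from the list above and decimal units (1 GB = 1e9 bytes); actual AWS data transfer pricing is tiered by volume, so treat the result as a rough upper bound. The function name and the example traffic figures are hypothetical.

```python
# Assumption: flat $0.09/GB egress rate; decimal units (1 GB = 1e9 bytes).
EGRESS_COST_PER_GB = 0.09
SECONDS_PER_MONTH = 30 * 24 * 3600


def monthly_egress_cost(avg_qps: float, response_bytes: int,
                        cost_per_gb: float = EGRESS_COST_PER_GB) -> float:
    """Estimate monthly internet egress cost from average QPS and response size."""
    bytes_per_month = avg_qps * response_bytes * SECONDS_PER_MONTH
    return bytes_per_month / 1e9 * cost_per_gb


# 1 PB (1,000,000 GB) of egress at the flat rate
one_pb_cost = 1_000_000 * EGRESS_COST_PER_GB
print(f"1 PB egress: ${one_pb_cost:,.0f}")

# Hypothetical service: 1,000 average QPS returning 50 KB responses
example_cost = monthly_egress_cost(1_000, 50_000)
print(f"1k QPS x 50 KB: ${example_cost:,.0f}/month")
```

Cost scales with average traffic over the month, not peak, so this function takes average QPS. Peak QPS still matters for sizing the network link.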

io/thecodeforge/estimation/Constants.ts · TypeScript
// TheCodeForge: estimation constants
// Use your own production baselines over generic numbers
const KEY_NUMBERS = {
  QPS_PER_SERVER: 20000,              // simple API endpoints, upper end
  DB_WRITES_PER_SEC: 5000,            // standard MySQL instance, upper end
  DISK_READ_LATENCY_MS: 5,
  NETWORK_RTT_MS: 1,                  // same region
  STORAGE_PER_USER_KB: 500,
  VIDEO_MINUTE_MB: 100,
  CHAT_BYTES_PER_SEC_PER_USER: 1000,  // ~1 KB/s per active connection
  REDIS_COST_PER_GB_MONTH: 0.1,
  DB_STORAGE_COST_PER_GB_MONTH: 0.25,
  EGRESS_COST_PER_GB: 0.09
};

// Raw user-data storage in GB (1 GB = 1,000,000 KB), before replication and overhead
export function estimateStorageGB(totalUsers: number): number {
  return (totalUsers * KEY_NUMBERS.STORAGE_PER_USER_KB) / 1_000_000;
}
Output
Defines constants for rapid estimation
Know Your Numbers Before You Need Them
  • 20k QPS per server: baseline for API endpoints. Heavy processing → 5k.
  • 5k DB writes/sec: MySQL/PostgreSQL. Read replicas can scale reads higher.
  • 2ms SSD read: your bottleneck is rarely disk; it's usually network or CPU.
  • $0.09/GB egress: the hidden cost that dominates bills. Always model it.
  • 100 MB per video minute: HD H.264. 4K is 4x. Use this for streaming estimates.
Production Insight
Using outdated numbers is the most common estimation mistake.
An engineer once used 10k QPS for a serverless function that only handled 2k.
Always baseline against your actual production telemetry.
The numbers from cloud provider docs are optimistic — your workload will be slower.
Key Takeaway
Memorise 10 key numbers — they're the building blocks of all estimates.
Numbers from cloud provider documentation are optimistic — add a 2x safety margin.
Refresh your constants every release cycle. Update from production telemetry.
The most forgotten number is egress cost ($0.09/GB). It dominates bills.
Choosing Your Reference Numbers
If: Your app is CPU-bound (compute-heavy, JSON parsing, crypto)
Use: Lower QPS per server (~5,000); focus on CPU capacity (vCPU count). Add 2x headroom.
If: Your app is I/O-bound (database-heavy, waiting on network)
Use: Lower DB writes/sec estimate (~1,000); optimise queries before scaling. Add read replicas.
If: Your app is memory-bound (caching, session storage)
Use: Memory per user (e.g., 10 KB session) multiplied out to total RAM needed. Add 50% headroom.
If: Your app is bandwidth-bound (video, images, large API responses)
Use: Egress cost as the primary constraint. Deploy a CDN before scaling origin. Estimate bandwidth first, compute second.

How to Estimate Storage Requirements — Including Replication and Retention

Storage estimation is the most common use case and the most frequently miscalculated. Here's a step-by-step approach:

  1. Define your data model — what entities do you store? e.g., user profile, posts, media, messages.
  2. Estimate size per entity — average size of a user profile (name, bio, avatar) ~10 KB; a post with text and one image ~500 KB; a video minute ~100 MB.
  3. Multiply by the number of entities per user over the retention period (30 days, 90 days, 1 year): for example, each user creates 2 posts and receives 20 messages per day.
  4. Include indexes and replication — indexes add 20–50% overhead; replication factor of 3 triples storage. B-tree indexes are larger than hash indexes; factor in index type.
  5. Apply retention policy — data older than 90 days goes to cold storage. Hot storage costs 10x cold.
  6. Add operational overhead — logs, transaction logs, backups. Add 50% overhead.

Example: 10 million users, each storing 100 MB over a year (including media uploads). That's 10,000,000 × 100 MB = 1 PB of primary storage. With replication factor 3, that becomes 3 PB. With indexes (1.3x), about 4 PB. With operational overhead (1.5x), about 6 PB. At $0.25/GB/month for hot storage, 6 PB = $1,500,000/month. That's not affordable for most teams. Back to the envelope: keep only 30 days in hot storage (roughly a 90% reduction in hot data), move older data to cold storage ($0.004/GB, about 98% cheaper), compress media (60% reduction), and use 2x replication for data that's not mission-critical.
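The multiplier chain in this example can be traced step by step. A minimal sketch, assuming decimal units (1 PB = 1,000,000 GB) and the same multipliers as the text: replication 3x, indexes 1.3x, operational overhead 1.5x. The variable names are illustrative.

```python
# Walk through the 10M-user example; decimal units (1 PB = 1e6 GB).
users = 10_000_000
mb_per_user_per_year = 100

primary_gb = users * mb_per_user_per_year / 1_000  # MB -> GB
steps = [
    ("replication x3", 3.0),
    ("indexes x1.3", 1.3),
    ("logs and backups x1.5", 1.5),
]

total_gb = primary_gb
print(f"primary: {total_gb / 1e6:.2f} PB")
for label, multiplier in steps:
    total_gb *= multiplier
    print(f"after {label}: {total_gb / 1e6:.2f} PB")

hot_cost = total_gb * 0.25  # $/GB/month hot rate from the key numbers
print(f"all-hot monthly cost: ${hot_cost:,.0f}")
```

Without rounding at each step the chain ends at 5.85 PB and about $1.46M per month, which the prose rounds up to 6 PB and $1.5M. That gap is well inside back-of-envelope tolerance.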

Don't forget logs and backups. A common trap: estimate only user data, then discover that database transaction logs, application logs, and nightly backups add substantially to the storage footprint. Budget a 50% overhead for these operational data stores as a starting point.

Another trap: assuming all users are equal. A small percentage of power users can generate 10x the average data. Run a percentile analysis if you can — P99 storage per user can be 5x the median.
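A percentile analysis takes only a few lines once you have per-user sizes. The sketch below uses synthetic data: the log-normal distribution and its parameters are assumptions standing in for real measurements, chosen only because user-generated data is often heavily skewed.

```python
import random
import statistics

# Synthetic data only: a log-normal spread is an assumption standing in for
# real per-user storage measurements exported from production.
random.seed(42)
per_user_mb = [random.lognormvariate(3.0, 1.0) for _ in range(100_000)]

median_mb = statistics.median(per_user_mb)
mean_mb = statistics.fmean(per_user_mb)
# quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
p99_mb = statistics.quantiles(per_user_mb, n=100)[98]

print(f"median: {median_mb:.1f} MB, mean: {mean_mb:.1f} MB, P99: {p99_mb:.1f} MB")
print(f"P99 / median: {p99_mb / median_mb:.1f}x")
```

For a skewed distribution the mean sits well above the median, so total storage should be sized from the mean (users × mean), while the P99 tells you how large per-user quotas need to be.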

Also consider storage tiering: hot data on SSD, warm on HDD, cold on object storage. Each tier has different cost and latency. Estimate access patterns to decide tier boundaries.

Real-world example: a photo-sharing app estimated 100KB per photo. But users uploaded in RAW format. Actual size was 5MB per photo. That 50x miss blew their storage budget in a week.

The retention trap: a team kept all user data forever. After 2 years, 80% of their storage cost was for data older than 90 days that nobody accessed. Moving old data to Glacier cut costs by 80% overnight.

io/thecodeforge/estimation/storage.py · Python
# TheCodeForge: storage estimation with overhead
class StorageEstimator:
    STORAGE_PER_USER_KB = 1024 * 100  # 100 MB per user over a year
    REPLICATION_FACTOR = 3
    COLD_REPLICATION_FACTOR = 2  # lower replication for cold data
    INDEX_OVERHEAD = 1.3
    OPERATIONAL_OVERHEAD = 1.5  # logs, backups
    COST_PER_GB_MONTH = 0.25
    COLD_STORAGE_COST_PER_GB = 0.004  # Glacier-class archive tier
    HOT_RETENTION_DAYS = 30

    @classmethod
    def total_storage_gb(cls, users: int, retention_days: int = 365) -> float:
        """Raw user data in GB accumulated over retention_days."""
        return users * cls.STORAGE_PER_USER_KB / (1024 * 1024) * (retention_days / 365)

    @classmethod
    def hot_multiplier(cls) -> float:
        """Combined overhead applied to hot data: replication, indexes, logs and backups."""
        return cls.REPLICATION_FACTOR * cls.INDEX_OVERHEAD * cls.OPERATIONAL_OVERHEAD

    @classmethod
    def cost_per_month(cls, users: int) -> float:
        """Hot storage for the first 30 days, cold for the remaining 335 days."""
        hot_gb = cls.total_storage_gb(users, cls.HOT_RETENTION_DAYS)
        cold_gb = cls.total_storage_gb(users, 365) - hot_gb
        hot_cost = hot_gb * cls.hot_multiplier() * cls.COST_PER_GB_MONTH
        cold_cost = cold_gb * cls.COLD_REPLICATION_FACTOR * cls.COLD_STORAGE_COST_PER_GB
        return hot_cost + cold_cost

    @classmethod
    def cost_per_month_all_hot(cls, users: int) -> float:
        """Everything kept in hot storage for the full year."""
        return cls.total_storage_gb(users, 365) * cls.hot_multiplier() * cls.COST_PER_GB_MONTH

print(f"10M users cost/month with tiering: ${StorageEstimator.cost_per_month(10_000_000):,.0f}")
print(f"Without cold tiering: ${StorageEstimator.cost_per_month_all_hot(10_000_000):,.0f}")
Output
10M users cost/month with tiering: $124,559
Without cold tiering: $1,428,223
Storage Scales with Users × Data × Retention — Don't Let Either Term Surprise You
  • Retention is the easiest lever: 30 days hot, then cold. 90% cost reduction.
  • Replication factor 3 is default, but not all data needs it. Cold data can use 2x or erasure coding (1.4x overhead).
  • Compression reduces storage by 50-80% for media, 80-90% for logs.
  • Operational overhead (logs, backups) adds 50% — many teams forget this.
Production Insight
The biggest storage surprise is replication and index bloat.
Teams often forget that a replication factor of 3 combined with a 30% index overhead nearly quadruples the raw storage (3 × 1.3 = 3.9x).
Estimate total storage, not just user data — factor in logs, backups, and cold tier.
A 30-day hot retention policy is the single most effective cost control.
Key Takeaway
Storage = users × size per user × retention days × overhead (replication 3x, indexes 1.3x, logs 1.5x).
Hot storage for 30 days, cold for archives — 98% cost reduction.
If cost exceeds budget, reduce retention, compress, or use erasure coding.

How to Estimate Throughput (QPS and Bandwidth) — Peak Factor Is Not Optional

Throughput estimation tells you if your server fleet can handle the load. The most common mistake is using average QPS instead of peak.

  1. Estimate total daily requests
     • Daily Active Users (DAU) × average actions per user per day = total daily actions
     • Example: 10M DAU, 10 actions/user/day = 100M actions/day
  2. Convert to QPS
     • Average QPS = total daily actions / 86,400 (seconds per day)
     • Peak QPS = average QPS × peak factor. Peak factor is 3-10x for consumer apps, 5-20x for e-commerce during sales.
     • Example: 100M / 86,400 × 3 (peak factor) ≈ 3,472 QPS
  3. Estimate bandwidth
     • Each action involves data transfer (request + response). Assume an average of 50 KB per request.
     • Bandwidth = peak QPS × data per request = 3,472 × 50 KB ≈ 174 MB/s ≈ 1.4 Gbps
     • Check whether your network link and load balancer can sustain 1.4 Gbps.
  4. Compare to server capacity
     • If each server handles 20k QPS, ceil(3,472 / 20,000) = 1 server looks sufficient, but that covers only one service. In a microservice architecture, each request passes through a chain (web server, auth, business logic, database), and each tier has a different capacity.
     • For a chain of 3 services, size each one separately: servers per service = ceil(that service's peak QPS / its per-server QPS). The fleet total is the sum across services.

A common mistake: forget that each request often triggers multiple internal requests. A single API call may hit auth, user service, and feed service, each adding to total QPS on downstream services. Account for fan-out (typical 3-5x).

Bonus: for write-heavy endpoints, DB writes are often the bottleneck before CPU. Estimate DB writes per second separately and compare against your DB's max transactional throughput (∼5k writes/sec for standard MySQL).

Also account for retry amplification: if a downstream service times out, the client may retry, multiplying load. Use circuit breakers to limit retry storms.

A real-world case: a payment service estimated 1000 TPS. But when the downstream was slow, automatic retries meant each payment call was attempted up to 3 times. Actual load hit 3000 TPS, overwhelming the DB. A circuit breaker was the fix.
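Retry amplification can be approximated with a short geometric sum. This is a minimal sketch that assumes every failed attempt is retried and that failures are independent with the same probability each time, which is a simplification because real outages are correlated. It uses up to 3 attempts per call (the original plus 2 retries), which reproduces the 3,000 TPS worst case; the function name is hypothetical.

```python
def retry_amplification(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per original request when each failed attempt is retried.

    Assumes independent failures with the same probability on every attempt.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    # 1 original attempt, plus p retries on average, plus p^2, and so on
    return sum(failure_rate ** k for k in range(max_retries + 1))


base_tps = 1_000
for failure_rate in (0.01, 0.5, 1.0):
    factor = retry_amplification(failure_rate, max_retries=2)
    print(f"failure rate {failure_rate:.0%}: {factor:.2f}x -> {base_tps * factor:,.0f} TPS")
```

At a healthy 1% failure rate the extra load is negligible, but once most calls time out the multiplier approaches the attempt limit. That cliff is why the estimate should use the worst case, and why a circuit breaker matters more than the retry count.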

The peak factor mistake: sizing for average QPS leads to a crash under load. A social media app had 1000 QPS average and 3000 QPS peak. They provisioned for 1500 QPS. At the 3 PM daily peak, the system collapsed. Applying a 3x peak factor would have saved them. Always multiply by 3-10x.

io/thecodeforge/estimation/throughput.py · Python
# TheCodeForge: throughput estimation with fan-out factor
import math


class ThroughputEstimator:
    FAN_OUT_FACTOR = 3  # each external action triggers 3 internal calls

    @staticmethod
    def qps(dau: int, actions_per_user: int, peak_factor: float = 3.0) -> tuple[float, float]:
        """Return (external peak QPS, internal peak QPS after fan-out)."""
        total_actions = dau * actions_per_user
        avg_qps = total_actions / (24 * 3600)
        external_peak_qps = avg_qps * peak_factor
        internal_peak_qps = external_peak_qps * ThroughputEstimator.FAN_OUT_FACTOR
        return external_peak_qps, internal_peak_qps

    @staticmethod
    def bandwidth(qps: float, bytes_per_request: int) -> float:
        """Bandwidth in Gbps (decimal: 1 Gbps = 1e9 bits/s)."""
        bytes_per_sec = qps * bytes_per_request
        return bytes_per_sec * 8 / 1e9

    @staticmethod
    def servers_needed(qps: float, qps_per_server: int = 20000) -> int:
        return math.ceil(qps / qps_per_server)

# Example
ext_qps, int_qps = ThroughputEstimator.qps(10_000_000, 10, peak_factor=3)
print(f"External peak QPS: {ext_qps:.0f}")
print(f"Internal peak QPS (with fan-out): {int_qps:.0f}")
print(f"Bandwidth (Gbps): {ThroughputEstimator.bandwidth(ext_qps, 50_000):.2f}")
# One 20k-QPS server covers the aggregate, but each service in the chain needs its own fleet
print(f"Servers needed for internal QPS: {ThroughputEstimator.servers_needed(int_qps)}")
Output
External peak QPS: 3472
Internal peak QPS (with fan-out): 10417
Bandwidth (Gbps): 1.39
Servers needed for internal QPS: 1
Peak Factor Isn't Guesswork — It's Your System's Survival Margin
Don't always use 3x. For e-commerce Black Friday, peak can be 20x average. For viral social apps, peak can be 50x. Use two weeks of production traffic data if you have it. If you don't have data, assume 10x for consumer apps, 5x for enterprise SaaS, and add auto-scaling.
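If you do have traffic data, the peak factor is simply the busiest interval divided by the average interval. A minimal sketch, assuming hourly request counts exported from your metrics system; the sample numbers are invented for illustration. Hourly buckets smooth out short bursts, so the true per-second peak is likely higher than this ratio suggests.

```python
import statistics


def peak_factor(request_counts: list[int]) -> float:
    """Ratio of the busiest interval to the average interval."""
    if not request_counts:
        raise ValueError("request_counts must not be empty")
    return max(request_counts) / statistics.fmean(request_counts)


# Hypothetical hourly request counts for one day, in thousands.
# With real data, pass two weeks of samples at the finest granularity you have.
one_day = [20, 15, 10, 8, 8, 12, 30, 60, 90, 110, 120, 130,
           150, 140, 120, 110, 100, 105, 115, 100, 80, 60, 40, 25]

factor = peak_factor(one_day)
print(f"peak hour / average hour: {factor:.2f}x")
```

Measuring at one-minute or one-second resolution, and over a window that includes your worst known day, gives a factor you can defend instead of a guessed 3x.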
Production Insight
Peak factor is the most controversial number — it varies wildly.
For e-commerce, Black Friday peak can be 20x average.
Real trick: measure your actual traffic for two weeks instead of guessing.
If you can't measure, overprovision by 10x and use auto-scaling.
Key Takeaway
Average QPS × peak factor = peak QPS. Use 3-10x for consumer apps, 20x for e-commerce.
Bandwidth = peak QPS × data per request — often the bottleneck you forget.
Estimate per service in the chain, including fan-out (3-5x internal calls).
Choosing Peak Factor
If: Social media or messaging app
Use: Peak factor 3-5x average; use 5x as default, monitor and adjust.
If: E-commerce during flash sales or Black Friday
Use: Peak factor up to 20x; use historical data from last sale. Add 2x buffer.
If: Enterprise SaaS with 9–5 usage pattern
Use: Peak factor 5-10x; lunch hour and Monday mornings are highest.
If: Viral consumer app with unpredictable growth
Use: Peak factor 10-50x; use auto-scaling based on real-time metrics, not static provisioning.

How to Estimate Cache and Memory Requirements — Working Set vs Total Data

Caching is a critical lever for performance, but it's easy to underestimate the memory you'll need. The key concept: working set — the subset of data accessed frequently — not total data.

  1. Estimate cache working set
     • Which data is accessed most frequently? User profiles, session tokens, product catalog, etc.
     • Size of one entry × number of unique entries accessed in a time window (e.g., 1 hour).
     • Example: 10M daily active users, each with a 1 KB session token. Only 20% are active in any given hour, so 2M users × 1 KB = 2 GB working set for sessions.
  2. Determine cache hit ratio target
     • Pareto principle: 80% of reads hit 20% of data. Estimate the hot set size.
     • If the hot set is 2 GB and you dedicate 4 GB of cache, you'll likely get a >95% hit rate. Beyond that, diminishing returns.
     • For a product catalog (read-only), the working set can be 100% of the data. For user sessions, the working set is active users only.
  3. Account for eviction overhead
     • Caches like Redis use eviction policies (LRU, LFU). Under memory pressure, evictions cause cache misses and increased DB load.
     • Rule: set the memory limit to 1.5x your estimated working set to leave headroom for spikes.
  4. Consider replication overhead
     • If you use Redis Cluster or a replicated cache, memory multiplies by replica count.
     • Example: 2 GB working set, replication factor 2 → 4 GB total cache memory.
  5. Don't forget TTL overhead
     • Each cached entry has TTL metadata. With millions of keys, this adds 10-20% overhead to memory.
     • Use redis-cli --bigkeys to analyse actual memory usage vs estimates.
  6. Cache access pattern matters
     • Read-heavy (90% reads): high hit ratio, memory is the constraint.
     • Write-heavy (90% writes): write-through cache, memory still helps but write amplification may occur.
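The hit ratio target matters because every miss becomes a database read. A minimal sketch of that relationship, assuming each miss triggers exactly one DB read; the function name and the 10,000 QPS figure are hypothetical.

```python
def db_read_qps(total_read_qps: float, hit_ratio: float) -> float:
    """Reads that miss the cache and fall through to the database."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_read_qps * (1.0 - hit_ratio)


peak_reads = 10_000  # hypothetical peak read QPS
for hit_ratio in (0.60, 0.90, 0.95, 0.99):
    misses = db_read_qps(peak_reads, hit_ratio)
    print(f"hit ratio {hit_ratio:.0%}: {misses:,.0f} DB reads/sec")
```

The relationship is linear in the miss rate, not the hit rate: slipping from 95% to 90% doubles the database load. Size the database for the hit ratio you expect during an eviction spike, not the steady-state one.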

A real-world trap: a team cached entire database rows instead of just hot columns. Their cache memory grew 3x beyond estimate, causing OOM and cascading failures.

Another gotcha: using a single Redis instance as both cache and session store. A flood of sessions from one workload evicts the other workload's cached data. Separate your caches by use case.

The working set mistake: estimating total data instead of working set leads to over-provisioning by 10x or more. For a 1 TB database, the working set might be only 10 GB (1%). Cache that, not the whole database.

io/thecodeforge/estimation/cache_estimate.py · Python
# TheCodeForge: cache memory estimation
class CacheEstimator:
    @staticmethod
    def working_set(total_users: int, size_per_entry_bytes: int, activity_pct: float = 0.2) -> float:
        """Working set in GB. activity_pct = fraction of total_users active in the peak hour."""
        active_in_window = total_users * activity_pct
        return active_in_window * size_per_entry_bytes / (1024**3)  # GB

    @staticmethod
    def total_memory(working_set_gb: float, replica_factor: int = 2,
                     headroom: float = 1.5, metadata_overhead: float = 0.15) -> float:
        """Working set plus headroom and key metadata, multiplied by replica count."""
        base = working_set_gb * headroom
        with_metadata = base * (1 + metadata_overhead)
        return with_metadata * replica_factor

# Example: 5M total users, 20% active per hour, 1KB session each
ws = CacheEstimator.working_set(5_000_000, 1024, activity_pct=0.2)
mem = CacheEstimator.total_memory(ws, replica_factor=2)
print(f"Working set (active users): {ws:.1f} GB")
print(f"Total cache memory needed: {mem:.1f} GB")
Output
Working set (active users): 1.0 GB
Total cache memory needed: 3.3 GB
Cache Working Set = Active Data, Not All Data
  • Working set is where caching actually helps. Cold data in cache is wasted memory.
  • Session cache: active users only (20-30% of total users in peak hour).
  • Product catalog: 80% of sales come from 20% of products — cache that 20%.
  • Time-series data: only cache recent N hours/days, not all history.
Production Insight
The most common cache estimation failure is ignoring the working set concept.
A team cached 100GB of data when the working set was 5GB. Redis OOM killed the cache.
Always estimate active users in cache, not total users.
If 80% of reads are from 20% of data, cache that 20%.
Key Takeaway
Cache memory = working set × (1 + metadata overhead) × headroom × replica factor.
Working set = active users × data per user, not total users.
A 60% hit rate is the minimum for caching to be worth it; target 90%+.

Estimating for Distributed Systems: Chain Latency and Fan-Out

In a microservice architecture, you don't just estimate a single endpoint; you estimate the whole chain. A user request might hit:

  • API Gateway
  • Authentication service
  • Business logic service
  • Database/cache

Each hop adds latency, network bandwidth, and processing overhead.

How to estimate:

  1. Identify the critical path: the longest chain of services for a request.
  2. Estimate per-service QPS: external QPS × (fan-out factor per service). A single API call might fan out to 5 internal services (user lookup, inventory check, payment, notification, audit).
  3. Estimate per-service bandwidth: QPS × request+response size for that service.
  4. Estimate latency budget: service latency + network RTT. With 5 services + 5 network hops at 1ms each, that's 5× service latency + 5ms.
  5. Add headroom: distributed systems degrade under load, so add 30% overhead for retries, timeouts, and request amplification.

Example: Social feed request — Gateway (1ms + 0.5ms net) → Auth (5ms + 0.5ms net) → Feed service (20ms + 0.5ms net) → Cache (1ms) → DB (10ms). Total: ∼38ms for a single request. If you need to serve 10M users at peak 3k QPS, you need to ensure each service can handle its share of the QPS.

The chain latency number is critical for SLOs. If your SLO is 200ms P99, and your critical path estimate is 40ms average, you have room for tail latencies. But if the estimate is already 150ms, you're in trouble before you've built anything.

Also consider network bandwidth between services. If services are in different availability zones, bandwidth costs add up. Estimate inter-service traffic using average response sizes and QPS. At 10k QPS, 1KB per request = 10MB/s = 80Mbps, which is fine. At 100k QPS, it's 800Mbps — approaching 1Gbps limits.

The fan-out trap: assuming each external request generates one internal request. In reality, a single API call often generates 5-15 internal requests due to service decomposition. A team estimated 1000 QPS external, provisioned for 1000 QPS internal. Actual internal QPS was 10,000. The system collapsed.
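Tracing the call graph is the reliable way to get per-service load. The sketch below propagates external QPS through a small, hypothetical call graph; the service names, call multipliers, and per-server capacities are all assumptions for illustration, and the traversal assumes the graph has no cycles.

```python
import math

# Hypothetical call graph: caller -> list of (callee, calls per incoming request).
CALL_GRAPH = {
    "gateway": [("auth", 1), ("user", 1), ("feed", 2)],
    "feed": [("db", 1)],
    "auth": [],
    "user": [],
    "db": [],
}

# Hypothetical per-server capacity for each service, in QPS.
CAPACITY = {"gateway": 20_000, "auth": 10_000, "user": 10_000,
            "feed": 5_000, "db": 5_000}


def per_service_qps(graph: dict, entry: str, external_qps: float) -> dict:
    """Propagate external QPS through an acyclic call graph."""
    load = {service: 0.0 for service in graph}

    def visit(service: str, qps: float) -> None:
        load[service] += qps
        for callee, calls in graph[service]:
            visit(callee, qps * calls)

    visit(entry, external_qps)
    return load


load = per_service_qps(CALL_GRAPH, "gateway", external_qps=1_000)
internal_qps = sum(qps for svc, qps in load.items() if svc != "gateway")
print(f"internal QPS: {internal_qps:,.0f}")
for svc, qps in load.items():
    print(f"{svc}: {qps:,.0f} QPS -> {math.ceil(qps / CAPACITY[svc])} server(s)")
```

With these multipliers, 1,000 external QPS becomes 6,000 internal QPS. In practice the multipliers come from distributed traces, not guesses, and each service is sized against its own measured capacity.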

A classic failure: an e-commerce team estimated 50ms for their checkout chain. In reality, each DB call took 30ms, and there were 4 sequential DB calls, so the chain was 120ms. They missed their estimate by more than 2x and blew their P99 target. They had to add caching and async processing post-launch.
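The checkout miss comes down to whether the calls run one after another or concurrently. A minimal sketch of the two cases, assuming the four DB calls from the example; treating them as safe to parallelise is an assumption that only holds when no call depends on another's result.

```python
def sequential_latency_ms(calls_ms: list[float]) -> float:
    """Calls issued one after another: latencies add up."""
    return sum(calls_ms)


def parallel_latency_ms(calls_ms: list[float]) -> float:
    """Independent calls issued concurrently: the slowest one dominates."""
    return max(calls_ms)


db_calls_ms = [30.0, 30.0, 30.0, 30.0]  # four DB calls at 30 ms each
estimate_ms = 50.0

sequential = sequential_latency_ms(db_calls_ms)
parallel = parallel_latency_ms(db_calls_ms)
print(f"sequential: {sequential:.0f} ms ({sequential / estimate_ms:.1f}x the estimate)")
print(f"parallel (only if the calls are independent): {parallel:.0f} ms")
```

Counting the calls on the critical path and multiplying by measured per-call latency would have flagged the 120 ms chain before launch.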

io/thecodeforge/estimation/distributed.py · Python
# TheCodeForge: distributed system estimation
class DistributedEstimator:
    @staticmethod
    def chain_latency(services: list[dict], headroom: float = 1.3) -> float:
        """services: list of {'service_latency_ms': float, 'network_rtt_ms': float}"""
        total = 0.0
        for svc in services:
            total += svc['service_latency_ms'] + svc['network_rtt_ms']
        return total * headroom

    @staticmethod
    def per_service_qps(external_qps: float, fan_out: dict[str, float]) -> dict:
        """fan_out: service_name -> multiplier (e.g., {'auth': 1, 'feed': 1, 'cache': 0.5})"""
        return {svc: external_qps * mult for svc, mult in fan_out.items()}

    @staticmethod
    def inter_service_bandwidth(qps: float, bytes_per_request: int) -> float:
        """Bandwidth in Gbps (decimal: 1 Gbps = 1e9 bits/s)."""
        return qps * bytes_per_request * 8 / 1e9

# Example: raw chain sums to 38.9 ms; the 30% headroom brings it to about 50.6 ms
chain = [{'service_latency_ms':1,'network_rtt_ms':0.5},
         {'service_latency_ms':5,'network_rtt_ms':0.5},
         {'service_latency_ms':20,'network_rtt_ms':0.5},
         {'service_latency_ms':1,'network_rtt_ms':0.2},
         {'service_latency_ms':10,'network_rtt_ms':0.2}]
print(f"Chain latency (ms): {DistributedEstimator.chain_latency(chain):.1f}")
print(f"Service QPS: {DistributedEstimator.per_service_qps(3472, {'auth':1, 'feed':1, 'cache':0.5})}")
print(f"Bandwidth between services at 3472 QPS, 10KB per call: {DistributedEstimator.inter_service_bandwidth(3472, 10_000):.2f} Gbps")
Output
Chain latency (ms): 50.6
Service QPS: {'auth': 3472, 'feed': 3472, 'cache': 1736.0}
Bandwidth between services at 3472 QPS, 10KB per call: 0.28 Gbps
Fan-Out Is the Silent Multiplier
  • API Gateway → Auth (1x) → User Service (1x) → Feed Service (2x internal calls) → DB (2x). Total fan-out = 6x.
  • If external QPS = 1000, internal QPS = 6000. That's 6x the load you sized for.
  • Retries multiply again: at a 1% error rate, up to 3 retries add only about 1% load, but when a downstream stalls and most calls time out, the same policy multiplies load by up to 4x.
  • Circuit breakers prevent retry storms — add them before estimating fan-out.
Production Insight
The biggest surprise in distributed estimation is request amplification.
A single user action can generate 5-15 internal requests due to retries and fan-out.
We saw a service melt down because each retry on timeout multiplied the load, not added linearly.
Estimate per-service QPS by tracing the call graph, not by guessing.
Key Takeaway
Estimation for distributed systems must include chain latency, fan-out QPS, and headroom.
External QPS underestimates internal load by 3-15x — always account for fan-out.
Set latency budgets before building, not after.
Handling Distributed Overhead
If: Service chain is more than 3 hops deep
Use: Estimate chain latency and compare to SLO. If >50ms, consider async processing or batching.
If: High fan-out ratio (each external call triggers >5 internal)
Use: Add retry budget and circuit breaking. Estimate QPS at each service separately. Consider service mesh for observability.
If: Service uses external APIs (e.g., payment, email, SMS)
Use: Add 100-200ms latency buffer and estimate bandwidth costs for outbound traffic. Plan for circuit breakers.
● Production incident · Post-mortem · Severity: high

The $250k AWS Bill Nobody Expected

Symptom
AWS bill skyrockets from $2,000 to $250,000 in one month. Database write latency spikes from 10ms to 5000ms. User uploads fail with 'storage full' errors. The storage metric shows 200TB consumed — they planned for 10TB.
Assumption
The team assumed each user would upload 2 photos per week, 500KB each, but never turned that into a written estimate; the plan was 'we'll scale when we need to'. They also assumed cloud storage was 'basically free' and didn't model replication overhead.
Root cause
Actual usage: each user uploaded 50 MB of media daily (high-res photos, short videos). 100k active users × 50 MB × 30 days = 150 TB raw. With replication factor 3 (database default) = 450 TB. With indexes and backups = 600 TB. At $0.25/GB/month = $150,000 just for storage. Plus compute, egress, and managed services = $250k. They also forgot that viral loops cause exponential growth: 100k users became 500k users in 3 days. Against the 10 TB plan, the raw 150 TB alone was a 15x miss, and the replicated 450 TB was a 45x miss. Retention was uncapped, so data from 3 years ago was still in hot storage. 80% of the storage cost was for data older than 30 days that nobody accessed. Lesson: a data retention policy isn't optional. Hot storage costs 10x cold storage.
Fix
  1. Implemented a 30-day retention policy for raw media and moved old data to S3 Glacier (cost drops from $0.25/GB to $0.004/GB, 98% savings).
  2. Added compression before upload (60% size reduction).
  3. Switched from database BLOB storage to S3 with CDN caching.
  4. Added capacity alerts at 50%, 75%, 90% of estimated storage.
  5. Created a monthly cost forecast dashboard comparing estimate to actual.
  6. Implemented user quotas: free tier = 5GB, paid tier = 100GB.
  7. Changed from 3x replication to 2x + erasure coding for cold data (50% storage reduction).
Key lesson
  • Always estimate storage per user and set retention limits before launch. Hot storage for recent data, cold for archives.
  • Assume worst-case upload rates — double the expected number. Users upload more than you think.
  • Build cost monitoring into your deployment pipeline. Alert when actual usage exceeds 80% of estimated capacity.
  • Replication factor is not free. 3x replication = 3x storage cost. Use erasure coding for cold data.
  • Data older than 30 days belongs in cold storage. Hot storage is for active data only.
  • Peak factor applies to storage too — viral growth can 10x user count in days.
Production debug guide · Symptom → Action mapping for estimation-driven failures · 6 entries
Symptom · 01
Database starts rejecting writes or queries timeout
Fix
Check current QPS vs estimated write/query throughput. Use SHOW GLOBAL STATUS (MySQL) or nodetool tablestats (Cassandra). Re-run estimate with actual peak load. Add 3x peak factor if missing.
Symptom · 02
Application responds slowly under peak traffic
Fix
Measure latency at each component (web server, app tier, database). Compare to estimated latency budget (e.g., 200ms total). Identify bottleneck: CPU, memory, or I/O. Add servers or cache.
Symptom · 03
AWS bill exceeds forecast by 3x — storage line item is huge
Fix
Check per-resource cost breakdown. Look for unestimated replication overhead or data retention. Use AWS Storage Lens to find cold data in hot tiers. Move old data to Glacier.
Symptom · 04
Load balancer shows high request queue depth
Fix
Calculate current QPS vs estimated backend capacity. Use docker stats or kubectl top pods to see per-container resource usage. Adjust scaling thresholds based on actual load, not estimate.
Symptom · 05
Memory cache eviction rate spikes >5%
Fix
Estimate cache working set size. Use redis-cli info stats to see evicted_keys. Compare to cache memory budget. If evictions > 1%, increase cache size or reduce TTL. Add 50% headroom to estimate.
Symptom · 06
Network bandwidth saturated — egress costs dominate bill
Fix
Calculate egress per request × peak QPS. If egress > 100 Mbps, deploy CDN. Check if inter-region transfer is necessary — move services to same region.
★ Back-of-Envelope Estimation Cheat Sheet
Use this when you need a quick sanity check on your system's capacity or cost.
System is struggling under load — suspect underestimation
Immediate action
Stop further deployments and roll back to last known stable version
Commands
`vmstat 1 5` → check CPU idle, swap, and interrupt rates
`netstat -s | grep -i retransmit` → detect network congestion (the exact counter wording varies between net-tools versions)
Fix now
Add temporary capacity (scale out EC2 or increase pod count). Then re-estimate using actual traffic data with 3x peak factor.
Storage filling faster than expected — check estimate assumptions
Immediate action
Enable compression or deduplication if not already active
Commands
`df -h` → see filesystem usage per mount point
`du -sh /data/* | sort -rh | head -5` → find top consumers
Fix now
Set retention limits and move cold data to cheaper storage tier (e.g., S3 Glacier). Update storage estimate with actual growth rate and 3x replication factor.
Latency exceeds SLO — investigate component delays
Immediate action
Activate detailed logging for the slowest endpoints
Commands
`curl -s -o /dev/null -w '%{time_total}\n' https://api.example.com/health` → measure end-to-end latency in seconds
`ping -c 10 db-internal.example.com` → check network round trip to DB
Fix now
Implement caching where latency budget is exceeded (e.g., Redis for read-heavy endpoints). Re-estimate with cached vs uncached ratios.
Network bandwidth saturates — transmission errors increase
Immediate action
Throttle non-critical traffic or use QoS
Commands
`iftop -n` → see bandwidth usage per connection
`sar -n DEV 1 5` → sample bandwidth per interface (five 1-second samples)
Fix now
Add bandwidth limit per service or scale horizontally. Re-estimate bandwidth with actual request sizes and peak QPS.
Cloud bill 10x estimate — check egress and replication
Immediate action
Pull cost breakdown by service from cloud provider console
Commands
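`aws ce get-cost-and-usage --time-period Start=2026-04-01,End=2026-05-01 --granularity MONTHLY --metrics "UnblendedCost" --group-by Type=DIMENSION,Key=SERVICE` → monthly cost per service (the End date is exclusive)
`aws s3 ls s3://<your-bucket> --recursive --summarize --human-readable | grep 'Total Size'` → total size of one bucket (replace `<your-bucket>` with a real bucket name)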
Fix now
Move cold storage to Glacier, set retention policy (30 days hot), enable compression, add CDN for egress.
Estimation Methods Comparison
| Method | Accuracy | Time Required | When to Use | Key Trade-off |
| --- | --- | --- | --- | --- |
| Back of envelope | ±2x | 5–30 minutes | Initial design, interview discussions, quick sanity checks | Speed over precision. Acceptable for go/no-go decisions. |
| Detailed capacity planning | ±20% | Days to weeks | Before production launch, budget planning, procurement | Precision over speed. Expensive but necessary at large scale. |
| Prototype + measurement | ±10% | Weeks to months | Known bottlenecks, or when precise cost optimisation is needed | Most accurate but slow. Reserve for critical, resource-bound services. |

Key takeaways

1
Memorise the 8 key numbers
20k QPS per server, 5k DB writes/sec, 2ms SSD read, 1ms RTT, 500KB per social user, 100MB per video minute, $0.25/GB hot storage, $0.09/GB egress — refresh quarterly.
2
Storage = users × size per user × retention × overhead (replication 3x, indexes 1.3x, logs 1.5x). Keep data hot for 30 days and cold for the rest: 98% cost reduction.
3
Throughput
average QPS × peak factor (3-10x) = peak QPS. Bandwidth (bits/sec) = peak QPS × request size (bytes) × 8. Egress cost = GB transferred × $0.09/GB.
4
Cache memory = working set (active users, not total) × 1.5x headroom × replication × metadata overhead. Working set is 20% of total data.
5
Distributed systems
external QPS × fan-out (3-15x) = internal QPS. Chain latency = sum(service latency + network RTT) × 1.3 headroom.
6
Peak factor is the most commonly underestimated number. For consumer apps, assume 5-10x. For e-commerce sales, 20x. For viral apps, 50x.
7
Validation
compare estimates to load test results pre-launch and production metrics post-launch. Iterate constants quarterly.
8
Cost modelling
compute + storage + network + managed services. Add 30% buffer. The most forgotten cost is egress — it dominates video and API-heavy apps.
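Takeaways 2, 3, and 5 are plain multiplication, so they fit in three small functions. The sketch below is one way to write them down, not a canonical implementation: the function names are invented for illustration, and the sample inputs (30-day retention for the headline example, 10M DAU, 20 actions per user, a three-service call chain) are assumptions. The multipliers (3x, 1.3x, 1.5x, 1.3 headroom, 1ms RTT) are the constants from this article.

```python
SECONDS_PER_DAY = 86_400


def storage_gb(users, gb_per_user_per_day, retention_days,
               replication=3.0, index_overhead=1.3, log_overhead=1.5):
    """Total storage = raw data x replication x index overhead x log overhead."""
    raw_gb = users * gb_per_user_per_day * retention_days
    return raw_gb * replication * index_overhead * log_overhead


def peak_qps(dau, actions_per_user_per_day, peak_factor=5.0):
    """Average QPS (daily actions / 86,400) scaled by the peak factor."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY * peak_factor


def chain_latency_ms(service_latencies_ms, rtt_ms=1.0, headroom=1.3):
    """Sum of (service latency + one network RTT) per hop, plus headroom."""
    return sum(latency + rtt_ms for latency in service_latencies_ms) * headroom


# Headline example: 100k users x 50 MB/day x 3x replication.
# Assumes 30 days of retention; index and log overhead switched off.
replicated_only = storage_gb(100_000, 0.05, 30,
                             replication=3, index_overhead=1, log_overhead=1)
print(f"Replication only: {replicated_only / 1000:.0f} TB")

# Same workload with every overhead applied.
full = storage_gb(100_000, 0.05, 30)
print(f"All overheads:    {full / 1000:.1f} TB")

# Hypothetical consumer app: 10M DAU, 20 actions each, 5x peak factor.
print(f"Peak QPS: {peak_qps(10_000_000, 20):,.0f}")

# Hypothetical chain: gateway 20ms -> app 35ms -> database 50ms.
print(f"Chain latency: {chain_latency_ms([20, 35, 50]):.1f} ms")
```

Keeping each overhead as a named parameter makes the assumptions visible, which is the point of the exercise: when production numbers arrive, you change one argument instead of redoing the sum.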

Common mistakes to avoid

7 patterns
×

Using average instead of peak for throughput

Symptom
System fails under load during peak hours (e.g., 4x average traffic causes timeout errors). Database CPU spikes to 100% at 2 PM daily.
Fix
Apply a peak factor of 3–10x to your average QPS estimate. Use historical data or assume 5x if no data exists. Set up auto-scaling based on real-time metrics.
×

Forgetting replication, index, and operational overhead in storage estimates

Symptom
Database storage fills 2–4x faster than expected, causing downtime and high costs. The team checks raw data size (10TB), but actual usage is 40TB.
Fix
Always multiply raw data size by replication factor (usually 3) and add 20–30% for indexes. Add 50% for logs and backups. Use tiered storage (cold after 30 days).
×

Ignoring bandwidth egress costs — only estimating CPU/memory

Symptom
Network interface maxes out at 1 Gbps, causing packet loss and high latency even though CPU is idle. Cloud bill shows $50k in egress charges.
Fix
Estimate bandwidth = QPS × average request+response size in bytes. Check your network link capacity. Deploy a CDN. Egress cost = GB transferred × $0.09/GB.
×

Using optimistic cloud provider default limits as your estimate

Symptom
You provision 100 servers expecting each to handle 20k QPS, but actual throughput is 5k due to virtual machine overhead and noisy neighbours.
Fix
Reduce your per-server QPS number by 2x for initial estimates. Then baseline with actual load tests on your specific instance type and workload.
×

Omitting operational storage overhead (logs, backups, transaction logs)

Symptom
Production storage runs out 3 months after launch because logs and backups were not factored in. Database purge job fills transaction logs.
Fix
Add 50% overhead to raw storage estimate for logs, transaction logs, and backup copies. Monitor actual consumption vs estimate weekly.
×

Not accounting for TTL and metadata overhead in cache estimation

Symptom
Cache memory fills up faster than expected, causing high eviction rates (>10%) and increased DB load. Redis OOM kills the cache.
Fix
Add 15-20% overhead for metadata and TTL structures. Use `redis-cli --bigkeys` to find the largest keys, then `MEMORY USAGE <key>` to validate actual per-key memory usage. Set the memory limit to 1.5x the working set.
×

Forgetting fan-out in distributed system QPS estimation

Symptom
External QPS is 1000, but internal QPS is 10,000 due to service fan-out. Downstream services crash under load.
Fix
Map the call graph. Each external request may trigger 3-15 internal requests. Estimate per-service QPS separately and sum for total server count.
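The last two mistakes, plus the per-server derating from the fourth, can be combined into one quick calculation. Treat this as a sketch under stated assumptions: the function names, the 500 GB dataset, and the 25,000 external QPS are hypothetical. The 20% working set, 1.5x headroom, 15-20% metadata overhead, 3-15x fan-out, 20k QPS per server, and 2x derating are the figures used in this article.

```python
import math


def cache_memory_gb(total_data_gb, working_set_ratio=0.20, headroom=1.5,
                    replicas=1, metadata_overhead=1.2):
    """Cache size = working set x headroom x replicas x metadata/TTL overhead."""
    working_set_gb = total_data_gb * working_set_ratio
    return working_set_gb * headroom * replicas * metadata_overhead


def internal_qps(external_qps, fan_out):
    """Each external request triggers `fan_out` internal requests."""
    return external_qps * fan_out


def servers_needed(qps, per_server_qps=20_000, derate=2.0):
    """Server count using a derated per-server number, rounded up."""
    return math.ceil(qps / (per_server_qps / derate))


# Hypothetical dataset: 500 GB total, single cache replica.
print(f"Cache budget: {cache_memory_gb(500):.0f} GB")

# The fan-out example from the symptom above: 1,000 external at 10x.
print(f"Internal QPS: {internal_qps(1_000, 10):,}")

# Hypothetical larger system: 25,000 external QPS at 10x fan-out.
total_internal = internal_qps(25_000, 10)
print(f"Servers: {servers_needed(total_internal)}")
```

In a real call graph the fan-out differs per service, so run `internal_qps` once per downstream service and sum the server counts rather than applying a single multiplier.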
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
Design a URL shortener and estimate the storage needed for 100 million U...
Q02 · SENIOR
Estimate the QPS for a messaging app with 500 million daily active users...
Q03 · SENIOR
You're migrating from a monolithic database to microservices. How do you...
Q04 · JUNIOR
Explain back-of-envelope estimation to a non-technical product manager. ...
Q05 · SENIOR
You are designing a video sharing platform. Estimate the bandwidth neede...
Q06 · SENIOR
A startup expects 1 million users in the first year. Each user stores 50...
Q01 of 06 · SENIOR

Design a URL shortener and estimate the storage needed for 100 million URLs per month. Include replication and indexing.

ANSWER
Estimation steps:
1. Each shortened URL entry: original URL (~200 bytes avg) + short key (7 bytes) + timestamps (8 bytes) + metadata (~50 bytes) ≈ 265 bytes.
2. Monthly new URLs: 100M × 265 bytes ≈ 26.5 GB/month raw.
3. With replication factor 3: ≈ 79.5 GB/month.
4. With indexes (20% overhead): ≈ 95.4 GB/month.
5. With logs and backups (50% overhead): ≈ 143 GB/month.
6. Over 5 years (assuming full retention): 143 GB × 60 months ≈ 8.6 TB.

That's a small dataset for modern SSDs. Use a relational DB or a fast key-value store.

Key insight: storage is not the bottleneck; QPS for redirects is the real challenge. Each redirect must be fast, so caching is essential. Estimate read QPS: 100M URLs created, each read 1000x/month on average. Reads per second = 100M × 1000 / (30 × 86400) ≈ 38,600 QPS on average, before any peak factor. That's the real sizing driver.
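To check the arithmetic in the answer above, the same steps can be run as code. This is only a restatement of the answer's own assumptions (265 bytes per entry, 3x replication, 20% index overhead, 50% logs and backups, 1,000 reads per URL per month); the function name and dictionary keys are made up for the example.

```python
SECONDS_PER_MONTH = 30 * 86_400


def url_shortener_estimate(urls_per_month=100_000_000,
                           reads_per_url_per_month=1_000,
                           retention_months=60):
    """Walk through the storage and read-QPS steps from the answer."""
    entry_bytes = 200 + 7 + 8 + 50                # URL + key + timestamps + metadata
    raw_gb = urls_per_month * entry_bytes / 1e9   # step 2: raw data per month
    replicated_gb = raw_gb * 3                    # step 3: replication factor 3
    indexed_gb = replicated_gb * 1.2              # step 4: +20% for indexes
    monthly_gb = indexed_gb * 1.5                 # step 5: +50% logs and backups
    total_tb = monthly_gb * retention_months / 1000
    avg_read_qps = urls_per_month * reads_per_url_per_month / SECONDS_PER_MONTH
    return {
        "entry_bytes": entry_bytes,
        "raw_gb_per_month": raw_gb,
        "monthly_gb_with_overhead": monthly_gb,
        "five_year_tb": total_tb,
        "avg_read_qps": avg_read_qps,
    }


for name, value in url_shortener_estimate().items():
    print(f"{name}: {value:,.1f}")
```

The read QPS here is an average over the month. In an interview, say that out loud and then multiply by a peak factor before sizing the cache and redirect tier.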
FAQ · 8 QUESTIONS

Frequently Asked Questions

01
What is Back of Envelope Estimation in simple terms?
02
How accurate do back-of-envelope estimates need to be?
03
What numbers should I memorise for system design interviews?
04
Can I use back-of-envelope estimation for cloud cost estimation?
05
How often should I update my estimation constants?
06
Should I include data transfer costs in my estimate?
07
How do I estimate for a system with unpredictable traffic (viral spikes)?
08
What's the difference between back-of-envelope and capacity planning?