Back of Envelope Estimation — The 5 Critical Numbers
100k users × 50 MB daily × 3x replication × 30 days = 450 TB, not 10 TB.
20+ years shipping large-scale distributed systems. Written from production experience, not tutorials.
- Back-of-envelope estimation = rough system sizing using basic math and memorised constants. 5-30 minutes, ±2x accuracy.
- Key numbers: 20k QPS per commodity server, 5k writes/sec per DB, 2-10ms disk read, 1ms network RTT, 500KB per social user, 100MB per video minute.
- Storage formula: users × data per user × retention × overhead (replication 3x, indexes 1.3x, logs 1.5x) = total GB.
- Throughput: peak QPS = DAU × actions/user/day ÷ 86400 × peak factor (3-10x). Bandwidth = QPS × request size.
- Performance insight: underestimating by 10% at launch costs more than overestimating by 2x — add safety margin.
- Production trap: forgetting egress costs ($0.09/GB) turns $5k estimate into $28k bill. Always model data transfer.
- Biggest mistake: using average instead of peak throughput. Traffic spikes to 3-10x the average; without a peak factor, your system crashes at 2 PM.
Imagine a contractor walks through your house and says 'yeah, this kitchen reno will run about $15,000' — without pulling out a calculator or measuring tape. They're using years of experience, rough rules, and known costs-per-square-foot to give you a number that's close enough to act on. That's back-of-envelope estimation. In system design, it means quickly calculating whether your architecture can handle 10 million users before you spend six months building it — using nothing but a few key numbers you've memorised and some basic math.
Every large-scale system that ever failed probably had an engineer somewhere who skipped the math. Twitter's early 'fail whale' wasn't bad code — it was a system designed for far fewer requests than it actually received. Back-of-envelope estimation is the skill that separates engineers who build systems that survive launch day from those who scramble to add servers at 2 AM. It's not about being precise; it's about being right enough, fast enough, to make good architectural decisions.
Here's the hard truth: if you can't estimate storage within 2x and throughput within 3x before you write a line of code, you're flying blind. I've seen teams burn six-figure cloud bills because they forgot to multiply by replication factor. Estimation isn't the cost — it's the insurance.
The most common failure in production? Forgetting the peak factor. Your system handles 1000 QPS on average, so you provision for 1500. Then a viral tweet hits at 2 PM and traffic spikes to 8000 QPS. Your database melts. Provisioning for an 8-10x peak instead of 1.5x would have saved you. Estimate for peak, not average.
Back of Envelope Estimation — The 5 Critical Numbers
Back of envelope estimation is the practice of approximating system capacity, latency, or throughput using simple arithmetic and a handful of known reference numbers — no profiler, no load test, just a napkin and a pen. The core mechanic is to break a high-level question (e.g., 'Can this service handle 10K QPS?') into a chain of multiplications and divisions using constants like '1 request = 2 KB' or 'disk seek = 10 ms'. You then compare the result against a known bottleneck (network bandwidth, CPU cores, memory bandwidth) to get a yes/no sanity check.
In practice, the method relies on three properties: (a) you work in orders of magnitude: 1 ms vs 10 ms matters, 1.2 ms vs 1.4 ms does not; (b) you use the 5 critical numbers: ~1 ns (L1 cache reference), ~10 GB/s (memory bandwidth), ~0.5 ms (network round trip within a datacenter), ~10 ms (disk seek), 1 Gbps (network bandwidth); (c) you always include a safety factor of 2–5x for real-world variance. The goal is not precision but to catch orders-of-magnitude mismatches before you build.
Use back of envelope estimation during design reviews, capacity planning, or when choosing between architectures (e.g., in-memory cache vs. disk-backed store). It matters because it forces you to surface hidden assumptions — like assuming a database can do 10K writes/sec when each write triggers a 10 ms disk seek, capping you at 100 writes/sec per disk. Without this check, teams routinely over-provision or, worse, under-provision and crash in production.
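The seek-bound write ceiling mentioned above can be checked in a few lines. This is a minimal sketch that assumes every write pays exactly one 10 ms seek and that writes are strictly serial per disk; the function and constant names are illustrative, not from any library.

```python
import math

def max_serial_ops_per_sec(latency_ms: float) -> float:
    """Ceiling on back-to-back operations per second when each one costs latency_ms."""
    return 1000.0 / latency_ms

# Assumptions taken from the paragraph above: one 10 ms seek per write,
# and a target of 10K writes/sec.
SEEK_MS = 10.0
TARGET_WRITES_PER_SEC = 10_000

per_disk_writes = max_serial_ops_per_sec(SEEK_MS)
disks_needed = math.ceil(TARGET_WRITES_PER_SEC / per_disk_writes)

print(f"one disk sustains ~{per_disk_writes:.0f} seek-bound writes/sec")
print(f"{TARGET_WRITES_PER_SEC} writes/sec needs ~{disks_needed} disks, or batching and caching")
```

The point of the sketch is the comparison, not the exact figure: a two-orders-of-magnitude gap between the target and the per-disk ceiling is the kind of mismatch the estimate is meant to surface.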
The Key Numbers Every Engineer Must Memorise
Back-of-envelope estimation relies on a small set of well-known numbers. Memorise these and you can size almost any system.
- Single server (commodity): ~10,000–20,000 QPS for simple API endpoints
- Database writes: ~1,000–5,000 writes/sec on a standard MySQL instance
- Disk read latency: ~2–10 ms for SSD, ~10–40 ms for HDD
- Network round trip in same region: ~0.5–2 ms
- Storage per user (social app): ~500 KB per user, including profile and photo thumbnails
- Storage per video minute: ~100 MB compressed (HD)
- Bandwidth per user for real-time chat: ~1 KB/s per active connection
- Caching (Redis) memory cost: ~$0.1/GB/month on cloud
- Database storage cost: ~$0.25/GB/month for SSD-backed cloud DB
- Egress cost (AWS): $0.09/GB to internet; $0.02/GB inter-region
These numbers change slowly and vary by cloud provider. Update them every six months. Use the cloud provider's pricing calculator for current rates.
Here's a trick: keep a personal cheat sheet in your notes app. When you run a load test or see a production bottleneck, update the constants. Over time, you'll build your own tuned set of numbers that are far more accurate than any generic list.
Don't trust cloud provider defaults blindly. I once saw an engineer use AWS's advertised RDS IOPS for a db.r5.large — actual throughput was less than half. Always baseline with your own load test.
Another trap: using 20k QPS for a Node.js server when your app does heavy JSON parsing. That number assumes simple requests. If each request deserialises a 50KB payload, your actual QPS drops to maybe 5k. Know your workload pattern.
The single most important number for cost modelling? Egress. Many teams model compute and storage perfectly, then forget data transfer. On AWS, 1 PB of egress costs $90,000. That's not a rounding error — it's a hiring decision.
- 20k QPS per server: baseline for API endpoints. Heavy processing → 5k.
- 5k DB writes/sec: MySQL/PostgreSQL. Read replicas can scale reads higher.
- 2ms SSD read: your bottleneck is rarely disk; it's usually network or CPU.
- $0.09/GB egress: the hidden cost that dominates bills. Always model it.
- 100 MB per video minute: HD H.264. 4K is 4x. Use this for streaming estimates.
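Egress is easy to model once it is written down. The sketch below uses the $0.09/GB rate from the list above and treats it as a flat rate (real pricing is tiered, so this is an upper-bound assumption). The 1,000 QPS and 50 KB response size are hypothetical inputs chosen for illustration.

```python
EGRESS_USD_PER_GB = 0.09          # internet egress rate quoted in the list above
SECONDS_PER_MONTH = 86_400 * 30   # assumption: 30-day month

def egress_cost_usd(gb: float, usd_per_gb: float = EGRESS_USD_PER_GB) -> float:
    """Flat-rate egress cost for a given volume in decimal GB."""
    return gb * usd_per_gb

def monthly_egress_gb(avg_qps: float, response_kb: float) -> float:
    """Monthly outbound volume in GB. Uses average QPS, since cost follows volume, not peak."""
    return avg_qps * response_kb * SECONDS_PER_MONTH / 1_000_000

petabyte_cost = egress_cost_usd(1_000_000)            # 1 PB = 1,000,000 GB
service_gb = monthly_egress_gb(avg_qps=1_000, response_kb=50)
service_cost = egress_cost_usd(service_gb)

print(f"1 PB of egress: ${petabyte_cost:,.0f}")
print(f"1,000 QPS x 50 KB: {service_gb:,.0f} GB/month, ${service_cost:,.0f}/month")
```

Note the asymmetry with capacity planning: servers are sized for peak QPS, but transfer bills follow average QPS over the whole month.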
How to Estimate Storage Requirements — Including Replication and Retention
Storage estimation is the most common use case and the most frequently miscalculated. Here's a step-by-step approach:
- Define your data model — what entities do you store? e.g., user profile, posts, media, messages.
- Estimate size per entity — average size of a user profile (name, bio, avatar) ~10 KB; a post with text and one image ~500 KB; a video minute ~100 MB.
- Multiply by number of entities per user — e.g., each user creates 2 posts per day, receives 20 messages. Over retention period (30 days, 90 days, 1 year).
- Include indexes and replication — indexes add 20–50% overhead; replication factor of 3 triples storage. B-tree indexes are larger than hash indexes; factor in index type.
- Apply retention policy — data older than 90 days goes to cold storage. Hot storage costs 10x cold.
- Add operational overhead — logs, transaction logs, backups. Add 50% overhead.
Example: 10 million users, each storing 100 MB over a year (including media uploads). That's 10e6 × 100 MB = 1 PB primary storage. With replication factor 3 → 3 PB. With indexes → 4 PB. With operational overhead → 6 PB. At $0.25/GB/month for hot storage, 6 PB = $1,500,000/month. That's not affordable for most. Back to the envelope: you need to reduce retention to 30 days for hot storage (80% reduction), move older data to cold ($0.004/GB → 98% cheaper), compress media (60% reduction), and use 2x replication for data that's not mission-critical.
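The worked example above can be reproduced as a small function. This is a sketch under the same assumptions as the text (3x replication, 1.3x for indexes, 1.5x for operational overhead, $0.25/GB/month); the function name and parameters are illustrative. It does not round up at each step, so it lands slightly under the 6 PB and $1.5M figures quoted above.

```python
def total_storage_gb(users: int, mb_per_user: float,
                     replication: float = 3.0,
                     index_overhead: float = 1.3,
                     ops_overhead: float = 1.5) -> float:
    """users x data per user x replication x index overhead x operational overhead, in GB."""
    primary_gb = users * mb_per_user / 1_000   # decimal units: 1 GB = 1,000 MB
    return primary_gb * replication * index_overhead * ops_overhead

HOT_USD_PER_GB_MONTH = 0.25   # hot storage price used in the example above

total_gb = total_storage_gb(users=10_000_000, mb_per_user=100)
monthly_cost = total_gb * HOT_USD_PER_GB_MONTH

print(f"total: {total_gb / 1_000_000:.2f} PB")
print(f"hot storage bill: ${monthly_cost:,.0f}/month")

# Lever check: 2x replication instead of 3x for non-critical data.
cheaper_gb = total_storage_gb(10_000_000, 100, replication=2.0)
print(f"with 2x replication: {cheaper_gb / 1_000_000:.2f} PB")
```

Keeping each multiplier as a named parameter makes it obvious which lever (retention, replication, compression) moves the total the most.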
Don't forget logs and backups. A common trap: estimate only user data, then discover that database transaction logs, application logs, and nightly backups triple the storage footprint. Add a 50% overhead for these operational data stores.
Another trap: assuming all users are equal. A small percentage of power users can generate 10x the average data. Run a percentile analysis if you can — P99 storage per user can be 5x the median.
Also consider storage tiering: hot data on SSD, warm on HDD, cold on object storage. Each tier has different cost and latency. Estimate access patterns to decide tier boundaries.
Real-world example: a photo-sharing app estimated 100KB per photo. But users uploaded in RAW format. Actual size was 5MB per photo. That 50x miss blew their storage budget in a week.
The retention trap: a team kept all user data forever. After 2 years, 80% of their storage cost was for data older than 90 days that nobody accessed. Moving old data to Glacier cut costs by 80% overnight.
- Retention is the easiest lever: 30 days hot, then cold. 90% cost reduction.
- Replication factor 3 is default, but not all data needs it. Cold data can use 2x or erasure coding (1.4x overhead).
- Compression reduces storage by 50-80% for media, 80-90% for logs.
- Operational overhead (logs, backups) adds 50% — many teams forget this.
How to Estimate Throughput (QPS and Bandwidth) — Peak Factor Is Not Optional
Throughput estimation tells you if your server fleet can handle the load. The most common mistake is using average QPS instead of peak.
1. Estimate total daily requests
   - Daily Active Users (DAU) × average actions per user per day = total daily actions
   - Example: 10M DAU, 10 actions/user/day = 100M actions/day
2. Convert to QPS
   - Average QPS = total daily actions / 86400 (seconds per day)
   - Peak QPS = average QPS × peak factor. Peak factor is 3-10x for consumer apps, 5-20x for e-commerce during sales.
   - Peak QPS = 100M / 86400 × 3 (peak factor) ≈ 3,472 QPS
3. Estimate bandwidth
   - Each action involves data transfer (request + response). Assume average 50 KB per request.
   - Bandwidth = peak QPS × data per request = 3,472 × 50 KB ≈ 174 MB/s ≈ 1.4 Gbps
   - Check if your network link and load balancer can handle 1.4 Gbps sustained.
4. Compare to server capacity
   - If each server handles 20k QPS, ceil(3,472 / 20,000) = 1 server covers the raw request rate. That holds for a single service only. In a microservice architecture, each request passes through several services (web server, auth, business logic, database), and each has a different per-server capacity.
   - For a chain of 3 services, size each one separately: servers per service = ceil(that service's peak QPS ÷ its per-server QPS). The fleet total is the sum across services.
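The four steps above can be sketched as three small functions. The inputs are the ones from the example (10M DAU, 10 actions per user, 3x peak factor, 50 KB per request, 20k QPS per server); the function names are illustrative, and decimal units are assumed throughout.

```python
import math

SECONDS_PER_DAY = 86_400

def peak_qps(dau: int, actions_per_user: float, peak_factor: float) -> float:
    """Average QPS over a day, scaled by the peak factor."""
    return dau * actions_per_user / SECONDS_PER_DAY * peak_factor

def bandwidth_gbps(qps: float, kb_per_request: float) -> float:
    """KB/s converted to gigabits per second (1 Gb = 1,000,000 kilobits)."""
    return qps * kb_per_request * 8 / 1_000_000

def servers_needed(qps: float, per_server_qps: float) -> int:
    """Whole servers required for one service at the given load."""
    return math.ceil(qps / per_server_qps)

qps = peak_qps(dau=10_000_000, actions_per_user=10, peak_factor=3)
gbps = bandwidth_gbps(qps, kb_per_request=50)
servers = servers_needed(qps, per_server_qps=20_000)

print(f"peak QPS: {qps:,.0f}")
print(f"bandwidth: {gbps:.2f} Gbps")
print(f"servers for this one service: {servers}")
```

Calling `servers_needed` once per service, each with its own per-server capacity, and summing the results gives the fleet size for a multi-service chain.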
A common mistake: forget that each request often triggers multiple internal requests. A single API call may hit auth, user service, and feed service, each adding to total QPS on downstream services. Account for fan-out (typical 3-5x).
Bonus: for write-heavy endpoints, DB writes are often the bottleneck before CPU. Estimate DB writes per second separately and compare against your DB's max transactional throughput (∼5k writes/sec for standard MySQL).
Also account for retry amplification: if a downstream service times out, the client may retry, multiplying load. Use circuit breakers to limit retry storms.
A real-world case: a payment service estimated 1000 TPS. But each payment call triggered 3 retries automatically when the downstream was slow. Actual load hit 3000 TPS, overwhelming the DB. Circuit breaker was the fix.
The peak factor mistake: using average QPS leads to crash under load. A social media app had 1000 QPS average, 3000 QPS peak. They provisioned for 1500 QPS. At 3 PM daily peak, the system collapsed. Adding a 3x peak factor would have saved them. Always multiply by 3-10x.
How to Estimate Cache and Memory Requirements — Working Set vs Total Data
Caching is a critical lever for performance, but it's easy to underestimate the memory you'll need. The key concept: working set — the subset of data accessed frequently — not total data.
1. Estimate cache working set
   - Which data is accessed most frequently? User profiles, session tokens, product catalog, etc.
   - Size of one entry × number of unique entries accessed in a time window (e.g., 1 hour).
   - Example: 10M daily active users, each with a 1 KB session token. But only 20% are active in any given hour = 2M users × 1 KB = 2 GB working set for sessions.
2. Determine cache hit ratio target
   - Pareto principle: 80% of reads hit 20% of data. Estimate the hot set size.
   - If hot set is 2 GB and you dedicate 4 GB cache, you'll likely get >95% hit rate. Beyond that, diminishing returns.
   - For product catalog (read-only), working set can be 100% of data. For user sessions, working set is active users only.
3. Account for eviction overhead
   - Caches like Redis use eviction policies (LRU, LFU). Under memory pressure, evictions cause cache misses and increased DB load.
   - Rule: set memory limit to 1.5x your estimated working set to leave headroom for spikes.
4. Consider replication overhead
   - If you use Redis Cluster or a replicated cache, memory multiplies by replica count.
   - Example: 2 GB working set, replication factor 2 → 4 GB total cache memory.
5. Don't forget TTL overhead
   - Each cached entry has TTL metadata. With millions of keys, this adds 10-20% overhead to memory.
   - Use redis-cli --bigkeys to analyse actual memory usage vs estimates.
6. Cache access pattern matters
   - Read-heavy (90% reads): high hit ratio, memory is the constraint.
   - Write-heavy (90% writes): write-through cache, memory still helps but write amplification may occur.
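The sizing steps above compose into one multiplication. The sketch below combines them for the session example, assuming 15% metadata overhead (the midpoint of the 10-20% range), 1.5x headroom, and two copies of the data. The function name and parameters are illustrative, and real per-key overhead depends on the cache and its data structures.

```python
def cache_memory_gb(active_entries: int, entry_kb: float,
                    metadata_overhead: float = 0.15,
                    headroom: float = 1.5,
                    copies: int = 1) -> float:
    """Working set plus per-key metadata, scaled by headroom and number of copies."""
    working_set_gb = active_entries * entry_kb / 1_000_000   # decimal: 1 GB = 1,000,000 KB
    return working_set_gb * (1 + metadata_overhead) * headroom * copies

dau = 10_000_000
active_in_peak_hour = dau * 20 // 100      # 20% of DAU active in any given hour

working_set_only = cache_memory_gb(active_in_peak_hour, entry_kb=1,
                                   metadata_overhead=0.0, headroom=1.0)
provisioned = cache_memory_gb(active_in_peak_hour, entry_kb=1, copies=2)

print(f"raw working set: {working_set_only:.1f} GB")
print(f"provisioned memory: {provisioned:.1f} GB")
```

The gap between the raw working set and the provisioned figure is the part that tends to be forgotten: each multiplier is small on its own, but together they more than triple the memory budget.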
A real-world trap: a team cached entire database rows instead of just hot columns. Their cache memory grew 3x beyond estimate, causing OOM and cascading failures.
Another gotcha: using a single Redis instance for both cache and session store. A flood of sessions from one app can evict the other app's cached entries. Separate your caches by use case.
The working set mistake: estimating total data instead of working set leads to 10-100x over-provisioning. For a 1 TB database, the working set might be only 10 GB (1%). Cache that, not the whole database.
- Working set is where caching actually helps. Cold data in cache is wasted memory.
- Session cache: active users only (20-30% of total users in peak hour).
- Product catalog: 80% of sales come from 20% of products — cache that 20%.
- Time-series data: only cache recent N hours/days, not all history.
Estimating for Distributed Systems: Chain Latency and Fan-Out
In a microservice architecture, you don't just estimate a single endpoint; you estimate the whole chain. A user request might hit:
- API Gateway
- Authentication service
- Business logic service
- Database/cache
Each hop adds latency, network bandwidth, and processing overhead.
How to estimate:
1. Identify the critical path: the longest chain of services for a request.
2. Estimate per-service QPS: external QPS × (fan-out factor per service). A single API call might fan out to 5 internal services (user lookup, inventory check, payment, notification, audit).
3. Estimate per-service bandwidth: QPS × request+response size for that service.
4. Estimate latency budget: sum of service latencies + network RTT per hop. With 5 services and 5 network hops at 1 ms each, that's the sum of the 5 service latencies + 5 ms.
5. Add headroom: distributed systems degrade under load, so add 30% overhead for retries, timeouts, and request amplification.
Example: Social feed request — Gateway (1ms + 0.5ms net) → Auth (5ms + 0.5ms net) → Feed service (20ms + 0.5ms net) → Cache (1ms) → DB (10ms). Total: ∼38ms for a single request. If you need to serve 10M users at peak 3k QPS, you need to ensure each service can handle its share of the QPS.
The chain latency number is critical for SLOs. If your SLO is 200ms P99, and your critical path estimate is 40ms average, you have room for tail latencies. But if the estimate is already 150ms, you're in trouble before you've built anything.
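The social feed example can be summed mechanically and compared against the SLO. This sketch uses the per-hop figures from the example above and the 200 ms P99 target; the tuple layout and variable names are illustrative. It adds averages only, so it says nothing about tail latency beyond how much budget is left for it.

```python
# (service, processing_ms, network_ms) for each hop on the critical path,
# using the figures from the social feed example above.
CRITICAL_PATH = [
    ("gateway", 1.0, 0.5),
    ("auth", 5.0, 0.5),
    ("feed_service", 20.0, 0.5),
    ("cache", 1.0, 0.0),
    ("db", 10.0, 0.0),
]
SLO_P99_MS = 200.0

chain_ms = sum(processing + network for _, processing, network in CRITICAL_PATH)
tail_budget_ms = SLO_P99_MS - chain_ms
slowest_hop = max(CRITICAL_PATH, key=lambda hop: hop[1] + hop[2])[0]

print(f"average chain latency: {chain_ms} ms")
print(f"budget left for tail latency: {tail_budget_ms} ms")
print(f"slowest hop: {slowest_hop}")
```

Identifying the slowest hop is usually more useful than the total: it shows where caching or async processing would buy the most.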
Also consider network bandwidth between services. If services are in different availability zones, bandwidth costs add up. Estimate inter-service traffic using average response sizes and QPS. At 10k QPS, 1KB per request = 10MB/s = 80Mbps, which is fine. At 100k QPS, it's 800Mbps — approaching 1Gbps limits.
The fan-out trap: assuming each external request generates one internal request. In reality, a single API call often generates 5-15 internal requests due to service decomposition. A team estimated 1000 QPS external, provisioned for 1000 QPS internal. Actual internal QPS was 10,000. The system collapsed.
A classic failure: an e-commerce team estimated 50ms for their checkout chain. In reality, each DB call took 30ms, and there were 4 DB calls. The chain was 120ms, more than 2x the estimate, and they missed their P99 target. They had to add caching and async processing post-launch.
- API Gateway → Auth (1x) → User Service (1x) → Feed Service (2x internal calls) → DB (2x). Total fan-out = 6x.
- If external QPS = 1000, internal QPS = 6000. That's 6x more servers than you thought.
- Retries multiply again: with up to 3 retries, a 1% error rate adds only about 1% load, but during an outage, when most calls fail, the same policy multiplies load by up to 4x.
- Circuit breakers prevent retry storms — add them before estimating fan-out.
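Fan-out and retry amplification can be modelled together. The sketch below uses the call counts from the list above and a deliberately simple retry model: each attempt is assumed to fail independently with the same probability, and a call is retried until it succeeds or the retry limit is reached. That independence assumption and the function names are illustrative simplifications, not a description of any particular client library.

```python
# Internal calls generated by one external request, per the list above.
CALLS_PER_REQUEST = {"auth": 1, "user_service": 1, "feed_service": 2, "db": 2}

def internal_qps(external_qps: float, calls_per_request: dict) -> float:
    """Total internal requests per second generated by the external load."""
    return external_qps * sum(calls_per_request.values())

def retry_multiplier(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per call: 1 + p + p^2 + ... + p^max_retries."""
    return sum(failure_rate ** k for k in range(max_retries + 1))

fan_out = sum(CALLS_PER_REQUEST.values())
steady_state = internal_qps(1_000, CALLS_PER_REQUEST) * retry_multiplier(0.01, 3)
during_outage = internal_qps(1_000, CALLS_PER_REQUEST) * retry_multiplier(1.0, 3)

print(f"fan-out factor: {fan_out}x")
print(f"internal QPS at 1% errors: {steady_state:,.0f}")
print(f"internal QPS when every call fails: {during_outage:,.0f}")
```

The two scenarios show why retries look harmless in steady state and dangerous in an outage: the multiplier is close to 1 at low error rates and approaches `max_retries + 1` when the downstream is failing.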
The Latency Numbers You Must Internalize (Because Your Cache Isn't Magic)
You can't estimate back-of-envelope if you don't know what 'fast' and 'slow' actually mean. The numbers haven't changed much since Jeff Dean published them, and that's the point:
- L1 cache reference: 1 nanosecond
- Main memory reference: 100 nanoseconds
- Network round trip in same datacenter: 500 microseconds
- Disk seek: 10 milliseconds
- Read 1 MB sequentially from disk: 30 milliseconds
These aren't trivia. They're the difference between a design that works and a design that melts under load. When a junior tells me 'we'll just cache it in Redis,' I ask them: what's your cache hit ratio? What's your working set? If your answer starts with 'I think,' you're guessing.
Memorize these numbers. Internalize the orders of magnitude. Then you'll know that reading 1 MB from memory is roughly 100x faster than from disk. That a single disk seek costs you 10 million nanoseconds, 10 million times more than an L1 cache hit. This is why prefetching, batching, and in-memory data structures matter. Not because they're fancy. Because the physics of hardware doesn't care about your feelings.
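Keeping the latency figures in one table makes the ratios easy to check. The sketch below stores the approximate values quoted in this section in nanoseconds; the dictionary keys are illustrative names, and the figures are order-of-magnitude reference points rather than measurements of any specific hardware.

```python
# Approximate latencies from this section, all in nanoseconds.
LATENCY_NS = {
    "l1_cache_ref": 1,
    "main_memory_ref": 100,
    "datacenter_round_trip": 500_000,
    "disk_seek": 10_000_000,
    "read_1mb_from_disk": 30_000_000,
}

def times_slower(slow: str, fast: str) -> int:
    """How many of the fast operation fit inside one slow operation."""
    return LATENCY_NS[slow] // LATENCY_NS[fast]

seek_vs_l1 = times_slower("disk_seek", "l1_cache_ref")
seek_vs_memory = times_slower("disk_seek", "main_memory_ref")
round_trips_per_seek = times_slower("disk_seek", "datacenter_round_trip")

print(f"disk seek vs L1 cache hit: {seek_vs_l1:,}x")
print(f"disk seek vs main memory reference: {seek_vs_memory:,}x")
print(f"datacenter round trips per disk seek: {round_trips_per_seek}")
```

Working in a single unit avoids the most common slip in latency arithmetic, which is mixing milliseconds, microseconds, and nanoseconds in one comparison.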
Availability Numbers: The Nines That Keep You Up at Night (And How to Estimate Them)
When a system goes down, you don't just lose money; you lose trust. Back-of-envelope estimation forces you to quantify uptime. The industry counts 'nines': 99% is 87.6 hours of downtime per year. 99.9% is 8.76 hours. 99.99% is 52.6 minutes. 99.999% (five nines) is 5.26 minutes. Most teams overestimate their availability. They build for 99.99% but test for happy path. Reality hits when a DNS provider fails or a database replica lags.
To estimate your own availability, map the chain: each component has an uptime. If your load balancer is 99.99%, database is 99.95%, and cache is 99.9%, the combined availability is 0.9999 × 0.9995 × 0.999 = 0.9984. That's 99.84%, or 14 hours of downtime a year. Two nines less than you thought.
This is why distributed systems introduce redundancy. A single server can't give you five nines. But a cluster with a 99% uptime per node, with three replicas and automatic failover, gets you closer. Back-of-envelope estimation tells you whether your SLA is even possible with your current architecture before you sign it.
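The serial and redundant cases can both be written as one-line formulas. This sketch assumes component failures are independent and, for the redundant case, that failover is instantaneous and never fails itself; both assumptions are optimistic, so treat the results as upper bounds. The function names are illustrative.

```python
import math

HOURS_PER_YEAR = 8_760

def downtime_hours_per_year(availability: float) -> float:
    """Expected yearly downtime for a given availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR

def serial_availability(components: list) -> float:
    """Every component must be up: availabilities multiply."""
    return math.prod(components)

def redundant_availability(node_availability: float, replicas: int) -> float:
    """At least one replica must be up: unavailabilities multiply."""
    return 1 - (1 - node_availability) ** replicas

chain = serial_availability([0.9999, 0.9995, 0.999])   # LB, database, cache
cluster = redundant_availability(0.99, replicas=3)

print(f"chain availability: {chain:.4%}")
print(f"chain downtime: {downtime_hours_per_year(chain):.1f} hours/year")
print(f"3-replica cluster of 99% nodes: {cluster:.4%}")
```

In a serial chain the weakest component dominates, which is why adding one more dependency to a request path always costs availability.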
Putting It All Together: Estimating Twitter-Scale Traffic and Storage (The Example You'll Be Asked in Every Interview)
You've memorized the numbers. You know the nines. Now apply them to a concrete example: estimate Twitter's QPS and storage needs. Assume 500 million monthly active users (MAU). 40% log in daily, which gives 200 million DAU. Each user posts 0.5 tweets per day on average, reads 100 tweets, and likes 10 tweets.
Traffic first. Daily tweets: 200M × 0.5 = 100M. Assume an 8x peak factor over average. Average write QPS for tweets: 100M / (24 × 3600) ≈ 1157. Peak write QPS: 1157 × 8 ≈ 9256. Reads: 200M × 100 reads/day = 20B reads/day. Average read QPS: 20B / 86400 ≈ 231,481. Peak read QPS: 231k × 8 ≈ 1.85M.
Now storage. Each tweet: text 280 bytes (Unicode), metadata 64 bytes, user ID and timestamps 36 bytes. Call it 400 bytes. Plus media: 1% of tweets have an image (200 KB), 0.1% have a video (2 MB). Average storage per tweet: 400 bytes + (0.01 × 200 KB) + (0.001 × 2 MB) = 400 + 2048 + 2048 ≈ 4.5 KB. Daily tweet storage: 100M × 4.5 KB = 450 GB/day. Yearly: 450 GB × 365 ≈ 164 TB. Add replication (3x): 492 TB/year. Retain for 5 years with a 90/10 hot/cold tier (hot on SSDs, cold on HDDs): total storage ≈ 2.5 PB.
Now you have the numbers to size your caching layer, database shards, and bandwidth. This is the kind of estimation that separates a senior engineer from someone who just 'feels' the system is right.
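The whole estimate fits in a short script, which makes it easy to change one assumption and see what moves. The sketch below uses the inputs from the example above. It works in decimal units and does not round intermediate values, so its results land a few percent away from the rounded figures in the text (for example about 4.4 KB per tweet and roughly 2.4 PB over five years). All variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 8
REPLICATION = 3
RETENTION_YEARS = 5

mau = 500_000_000
dau = mau * 40 // 100              # 40% of MAU log in daily
daily_tweets = dau // 2            # 0.5 tweets per user per day
daily_reads = dau * 100            # 100 tweets read per user per day

write_qps_peak = daily_tweets / SECONDS_PER_DAY * PEAK_FACTOR
read_qps_peak = daily_reads / SECONDS_PER_DAY * PEAK_FACTOR

# 400 bytes of text and metadata, plus the expected media bytes per tweet:
# 1% carry a 200 KB image, 0.1% carry a 2 MB video (decimal units).
bytes_per_tweet = 400 + 0.01 * 200_000 + 0.001 * 2_000_000

daily_gb = daily_tweets * bytes_per_tweet / 1e9
yearly_tb = daily_gb * 365 / 1_000
total_pb = yearly_tb * REPLICATION * RETENTION_YEARS / 1_000

print(f"peak write QPS: {write_qps_peak:,.0f}")
print(f"peak read QPS: {read_qps_peak:,.0f}")
print(f"storage per tweet: {bytes_per_tweet / 1_000:.1f} KB")
print(f"five-year replicated storage: {total_pb:.2f} PB")
```

The read path is about 200x heavier than the write path here, which is the result that should drive the architecture: caching and read replicas matter far more than write throughput.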
The $250k AWS Bill Nobody Expected
- Always estimate storage per user and set retention limits before launch. Hot storage for recent data, cold for archives.
- Assume worst-case upload rates — double the expected number. Users upload more than you think.
- Build cost monitoring into your deployment pipeline. Alert when actual usage exceeds 80% of estimated capacity.
- Replication factor is not free. 3x replication = 3x storage cost. Use erasure coding for cold data.
- Data older than 30 days belongs in cold storage. Hot storage is for active data only.
- Peak factor applies to storage too — viral growth can 10x user count in days.
- SHOW GLOBAL STATUS (MySQL) or stats.write_rate (Cassandra). Re-run estimate with actual peak load. Add 3x peak factor if missing.
- docker stats or kubectl top pods to see per-container resource usage. Adjust scaling thresholds based on actual load, not estimate.
- redis-cli info stats to see evicted_keys. Compare to cache memory budget. If evictions > 1%, increase cache size or reduce TTL. Add 50% headroom to estimate.
- `vmstat 1 5` → check CPU idle, swap, and interrupt rates
- `netstat -s | grep 'segments retransmited'` → detect network congestion
Common mistakes to avoid
- Using average instead of peak for throughput
- Forgetting replication, index, and operational overhead in storage estimates
- Ignoring bandwidth egress costs and only estimating CPU/memory
- Using optimistic cloud provider default limits as your estimate
- Omitting operational storage overhead (logs, backups, transaction logs)
- Not accounting for TTL and metadata overhead in cache estimation. Use redis-cli --bigkeys to validate actual per-key memory usage. Set memory limit 1.5x working set.
- Forgetting fan-out in distributed system QPS estimation
Interview Questions on This Topic
Design a URL shortener and estimate the storage needed for 100 million URLs per month. Include replication and indexing.