Back of Envelope Estimation — The 5 Critical Numbers
100k users × 50 MB/day × 30 days × 3x replication = 450 TB, not 10 TB.
- Back-of-envelope estimation = rough system sizing using basic math and memorised constants. 5-30 minutes, ±2x accuracy.
- Key numbers: 20k QPS per commodity server, 5k writes/sec per DB, 2-10ms disk read, 1ms network RTT, 500KB per social user, 100MB per video minute.
- Storage formula: users × data per user per day × retention days × overhead (replication 3x, indexes 1.3x, logs 1.5x) = total GB.
- Throughput: peak QPS = DAU × actions/user/day ÷ 86400 × peak factor (3-10x). Bandwidth = peak QPS × request size.
- Performance insight: underestimating by 10% at launch costs more than overestimating by 2x, so add a safety margin.
- Production trap: forgetting egress costs ($0.09/GB) turns a $5k estimate into a $28k bill. Always model data transfer.
- Biggest mistake: using average instead of peak throughput. Traffic spikes 3-10x above average; without a peak factor, your system crashes at 2 PM.
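The storage formula above is easy to keep as code. The Python sketch below is illustrative: `storage_bytes` is a hypothetical helper, and it assumes the 450 TB headline figure uses a 30-day retention window, which is what 100k users × 50 MB/day × 3x replication needs in order to reach 450 TB.

```python
MB = 10**6
TB = 10**12

def storage_bytes(users, bytes_per_user_per_day, retention_days, overhead=1.0):
    """Storage formula: users x data per user per day x retention x overhead."""
    return users * bytes_per_user_per_day * retention_days * overhead

# Overhead multipliers from the formula: replication 3x, indexes 1.3x, logs 1.5x.
FULL_OVERHEAD = 3 * 1.3 * 1.5  # 5.85x

# Headline: 100k users x 50 MB/day x 30 days x 3x replication.
replicated_only = storage_bytes(100_000, 50 * MB, 30, overhead=3)
everything = storage_bytes(100_000, 50 * MB, 30, overhead=FULL_OVERHEAD)
print(f"{replicated_only / TB:.0f} TB replicated only")  # 450 TB replicated only
print(f"{everything / TB:.1f} TB with indexes and logs too")
```

Applying all three multipliers roughly doubles the replicated-only figure, which is why leaving any of them out is such an expensive mistake.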
Imagine a contractor walks through your house and says 'yeah, this kitchen reno will run about $15,000' — without pulling out a calculator or measuring tape. They're using years of experience, rough rules, and known costs-per-square-foot to give you a number that's close enough to act on. That's back-of-envelope estimation. In system design, it means quickly calculating whether your architecture can handle 10 million users before you spend six months building it — using nothing but a few key numbers you've memorised and some basic math.
Every large-scale system that ever failed probably had an engineer somewhere who skipped the math. Twitter's early 'fail whale' wasn't bad code — it was a system designed for far fewer requests than it actually received. Back-of-envelope estimation is the skill that separates engineers who build systems that survive launch day from those who scramble to add servers at 2 AM. It's not about being precise; it's about being right enough, fast enough, to make good architectural decisions.
Here's the hard truth: if you can't estimate storage within 2x and throughput within 3x before you write a line of code, you're flying blind. I've seen teams burn six-figure cloud bills because they forgot to multiply by replication factor. Estimation isn't the cost — it's the insurance.
The most common failure in production? Forgetting the peak factor. Your system handles 1000 QPS on average, so you provision for 1500. Then a viral tweet hits at 2 PM and traffic spikes to 8000 QPS. Your database melts. Even a 3x peak factor (3000 QPS) would not have covered an 8x spike; when traffic can go viral, size for the top of the 3-10x range. Estimate for peak, not average.
The Key Numbers Every Engineer Must Memorise
Back-of-envelope estimation relies on a small set of well-known numbers. Memorise these and you can size almost any system.
- Single server (commodity): ~10,000–20,000 QPS for simple API endpoints
- Database writes: ~1,000–5,000 writes/sec on a standard MySQL instance
- Disk read latency: ~2–10 ms for network-attached cloud SSD volumes (local NVMe SSDs are closer to 0.1 ms), ~10–40 ms for HDD
- Network round trip in same region: ~0.5–2 ms
- Storage per user (social app): ~500 KB per user, including profile and photo thumbnails
- Storage per video minute: ~100 MB compressed (HD)
- Bandwidth per user for real-time chat: ~1 KB/s per active connection
- Caching (Redis) memory cost: RAM is far pricier than disk; managed in-memory cache nodes run on the order of $10/GB/month or more on cloud
- Database storage cost: ~$0.25/GB/month for SSD-backed cloud DB
- Egress cost (AWS): $0.09/GB to internet; $0.02/GB inter-region
These numbers change slowly and vary by cloud provider. Update them every six months. Use the cloud provider's pricing calculator for current rates.
Here's a trick: keep a personal cheat sheet in your notes app. When you run a load test or see a production bottleneck, update the constants. Over time, you'll build your own tuned set of numbers that are far more accurate than any generic list.
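A minimal sketch of such a cheat sheet in Python, assuming you keep estimates in code. The `SizingConstants` class and its field names are hypothetical; the defaults are copied from the list above, and `dataclasses.replace` returns a copy with a measured value swapped in while leaving the defaults untouched.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SizingConstants:
    """Rough sizing constants; defaults mirror the list above."""
    qps_per_server: int = 20_000              # simple API endpoints
    db_writes_per_sec: int = 5_000            # standard MySQL instance, upper end
    network_rtt_ms: float = 1.0               # same region
    storage_per_user_kb: int = 500            # social app
    video_mb_per_minute: int = 100            # HD, compressed
    egress_usd_per_gb: float = 0.09           # AWS, to internet
    db_storage_usd_per_gb_month: float = 0.25 # SSD-backed cloud DB

DEFAULTS = SizingConstants()

# Suppose a load test shows your real workload tops out near 5k QPS per server:
# override only that constant and keep every other default.
mine = replace(DEFAULTS, qps_per_server=5_000)
print(mine.qps_per_server, DEFAULTS.qps_per_server)  # 5000 20000
```

Keeping the generic defaults next to your measured overrides makes it obvious which numbers came from a load test and which are still guesses.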
Don't trust cloud provider defaults blindly. I once saw an engineer use AWS's advertised RDS IOPS for a db.r5.large — actual throughput was less than half. Always baseline with your own load test.
Another trap: using 20k QPS for a Node.js server when your app does heavy JSON parsing. That number assumes simple requests. If each request deserialises a 50KB payload, your actual QPS drops to maybe 5k. Know your workload pattern.
The single most important number for cost modelling? Egress. Many teams model compute and storage perfectly, then forget data transfer. On AWS, 1 PB of egress costs $90,000. That's not a rounding error — it's a hiring decision.
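The egress arithmetic is worth making explicit. A minimal sketch, assuming the flat $0.09/GB internet egress rate from the list above; AWS's published rates drop in higher-volume tiers, so a flat rate somewhat overstates very large transfers. `egress_cost_usd` is a hypothetical helper.

```python
EGRESS_USD_PER_GB = 0.09  # AWS internet egress list rate used above

def egress_cost_usd(gb_transferred, usd_per_gb=EGRESS_USD_PER_GB):
    """Flat-rate egress cost; treat it as an upper bound at large volumes."""
    return gb_transferred * usd_per_gb

PB_IN_GB = 1_000_000  # decimal units: 1 PB = 10^6 GB
print(f"${egress_cost_usd(PB_IN_GB):,.0f}")  # $90,000
```

The same helper covers inter-region traffic by passing the $0.02/GB rate as `usd_per_gb`.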
- 20k QPS per server: baseline for API endpoints. Heavy processing → 5k.
- 5k DB writes/sec: MySQL/PostgreSQL. Read replicas can scale reads higher.
- 2ms SSD read: your bottleneck is rarely disk; it's usually network or CPU.
- $0.09/GB egress: the hidden cost that dominates bills. Always model it.
- 100 MB per video minute: HD H.264. 4K is 4x. Use this for streaming estimates.
How to Estimate Storage Requirements — Including Replication and Retention
Storage estimation is the most common use case and the most frequently miscalculated. Here's a step-by-step approach:
- Define your data model — what entities do you store? e.g., user profile, posts, media, messages.
- Estimate size per entity — average size of a user profile (name, bio, avatar) ~10 KB; a post with text and one image ~500 KB; a video minute ~100 MB.
- Multiply by the number of entities per user: e.g., each user creates 2 posts per day and receives 20 messages per day. Then multiply over the retention period (30 days, 90 days, 1 year).
- Include indexes and replication — indexes add 20–50% overhead; replication factor of 3 triples storage. B-tree indexes are larger than hash indexes; factor in index type.
- Apply retention policy — data older than 90 days goes to cold storage. Hot storage costs 10x cold.
- Add operational overhead — logs, transaction logs, backups. Add 50% overhead.
Example: 10 million users, each storing 100 MB over a year (including media uploads). That's 10M × 100 MB = 1 PB of primary storage. With replication factor 3 → 3 PB. With indexes (1.3x) → ~4 PB. With operational overhead (1.5x) → ~6 PB. At $0.25/GB/month for hot storage, 6 PB = $1,500,000/month. That's not affordable for most teams. Back to the envelope: reduce hot retention to 30 days (roughly a 90% reduction, since 30 days is about 8% of a year), move older data to cold storage ($0.004/GB/month, 98% cheaper), compress media (60% reduction), and use 2x replication for data that's not mission-critical.
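The example's multiplication chain can be checked in a few lines of Python. This sketch uses decimal units (1 PB = 10^15 bytes) and the multipliers named above; without the intermediate rounding, the total comes to 5.85 PB and about $1.46M/month, which the example rounds to 6 PB and $1.5M.

```python
GB = 10**9
PB = 10**15

users = 10_000_000
bytes_per_user_per_year = 100 * 10**6   # 100 MB, including media

primary = users * bytes_per_user_per_year   # 1 PB
replicated = primary * 3                    # replication factor 3
with_indexes = replicated * 1.3             # +30% index overhead
total = with_indexes * 1.5                  # +50% logs, backups, transaction logs

HOT_USD_PER_GB_MONTH = 0.25
monthly_cost = total / GB * HOT_USD_PER_GB_MONTH

print(f"primary {primary / PB:.1f} PB, total {total / PB:.2f} PB")  # primary 1.0 PB, total 5.85 PB
print(f"hot-storage bill ${monthly_cost:,.0f}/month")               # hot-storage bill $1,462,500/month
```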
Don't forget logs and backups. A common trap: estimating only user data, then discovering that database transaction logs, application logs, and nightly backups inflate the storage footprint well beyond that figure. Add at least 50% overhead for these operational data stores.
Another trap: assuming all users are equal. A small percentage of power users can generate 10x the average data. Run a percentile analysis if you can — P99 storage per user can be 5x the median.
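A percentile check needs nothing beyond Python's standard library: `statistics.quantiles` (Python 3.8+) returns cut points, and with `n=100` the last one is P99. The `p99_to_median` helper is hypothetical, and the sample data below is invented purely for illustration.

```python
import statistics

def p99_to_median(per_user_sizes):
    """Ratio of 99th-percentile to median per-user storage (needs 2+ samples)."""
    cuts = statistics.quantiles(per_user_sizes, n=100)  # 99 cut points
    p99 = cuts[98]
    return p99 / statistics.median(per_user_sizes)

# Invented sample (KB per user): 90 typical users, 10 power users at 10x.
sample = [200] * 90 + [2_000] * 10
print(p99_to_median(sample))  # 10.0
```

If the ratio is large, size storage from the high percentiles (or from the per-user sum over the real distribution) rather than from the median user.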
Also consider storage tiering: hot data on SSD, warm on HDD, cold on object storage. Each tier has different cost and latency. Estimate access patterns to decide tier boundaries.
Real-world example: a photo-sharing app estimated 100KB per photo. But users uploaded in RAW format. Actual size was 5MB per photo. That 50x miss blew their storage budget in a week.
The retention trap: a team kept all user data forever. After 2 years, 80% of their storage cost was for data older than 90 days that nobody accessed. Moving old data to Glacier cut costs by 80% overnight.
- Retention is the easiest lever: 30 days hot, then cold. 90% cost reduction.
- Replication factor 3 is default, but not all data needs it. Cold data can use 2x or erasure coding (1.4x overhead).
- Compression reduces storage by 50-80% for media, 80-90% for logs.
- Operational overhead (logs, backups) adds 50% — many teams forget this.
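The levers above interact, so it helps to model them together. A rough sketch, assuming data accrues evenly over a year, compression applies to all stored bytes, and index and operational overhead apply to both tiers. The function and its defaults (30-day hot window, 60% compression, 3 hot copies, 2 cold copies, $0.25 and $0.004 per GB-month) are illustrative choices drawn from the figures in this section, not a pricing model.

```python
DAYS_PER_YEAR = 365

def tiered_monthly_cost(primary_gb, hot_days=30, compression=0.6,
                        hot_copies=3, cold_copies=2,
                        index=1.3, ops=1.5,
                        hot_usd=0.25, cold_usd=0.004):
    """Monthly bill for one year of data split into hot and cold tiers.

    Assumes data accrues evenly over the year, `compression` is the fraction
    of bytes removed, and index/ops overhead applies to both tiers.
    """
    stored = primary_gb * (1 - compression)
    hot_gb = stored * hot_days / DAYS_PER_YEAR
    cold_gb = stored - hot_gb
    overhead = index * ops
    hot = hot_gb * hot_copies * overhead * hot_usd
    cold = cold_gb * cold_copies * overhead * cold_usd
    return hot + cold

PRIMARY_GB = 1_000_000  # 1 PB, as in the 10M-user storage example

baseline = tiered_monthly_cost(PRIMARY_GB, hot_days=365, compression=0, cold_copies=3)
levers = tiered_monthly_cost(PRIMARY_GB)
print(f"all hot: ${baseline:,.0f}/month, with levers: ${levers:,.0f}/month")
```

Under these assumptions, 1 PB of primary data drops from about $1.46M/month all-hot to roughly $54k/month. Passing `cold_copies=1.4` models erasure coding for the cold tier instead of 2x replication.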
How to Estimate Throughput (QPS and Bandwidth) — Peak Factor Is Not Optional
Throughput estimation tells you if your server fleet can handle the load. The most common mistake is using average QPS instead of peak.
1. Estimate total daily requests
   - Daily Active Users (DAU) × average actions per user per day = total daily actions
   - Example: 10M DAU × 10 actions/user/day = 100M actions/day
2. Convert to QPS
   - Average QPS = total daily actions / 86,400 (seconds per day)
   - Peak QPS = average QPS × peak factor. The peak factor is 3-10x for consumer apps and 5-20x for e-commerce during sales.
   - Example: 100M / 86,400 × 3 (peak factor) ≈ 3,472 peak QPS
3. Estimate bandwidth
   - Each action involves data transfer (request + response). Assume an average of 50 KB per request.
   - Bandwidth = peak QPS × data per request = 3,472 × 50 KB ≈ 174 MB/s ≈ 1.4 Gbps
   - Check that your network links and load balancer can sustain 1.4 Gbps.
4. Compare to server capacity
   - If each server handles 20k QPS, one service needs ceil(3,472 / 20,000) = 1 server. But that covers only one service. In a microservice architecture, each request passes through a chain (web server, auth, business logic, database), and each tier has its own per-server capacity.
   - For a chain of services, size each tier separately: servers for a tier = ceil(that tier's QPS / that tier's per-server QPS), then sum across tiers.
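The four steps reduce to a few functions. A sketch in Python; the helper names are hypothetical, and the per-tier capacities in the usage example (20k, 10k, and 2k QPS per server) are made-up figures to show per-tier sizing, not benchmarks.

```python
import math

SECONDS_PER_DAY = 86_400

def peak_qps(dau, actions_per_user_per_day, peak_factor):
    """Steps 1-2: daily actions spread over a day, scaled to the peak."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY * peak_factor

def bandwidth_gbps(qps, bytes_per_request):
    """Step 3: bytes per second converted to gigabits per second."""
    return qps * bytes_per_request * 8 / 1e9

def servers_needed(tier_qps, qps_per_server):
    """Step 4: size each tier on its own capacity, then sum across tiers."""
    return sum(math.ceil(tier_qps[t] / qps_per_server[t]) for t in tier_qps)

qps = peak_qps(10_000_000, 10, 3)
print(f"peak {qps:,.0f} QPS, {bandwidth_gbps(qps, 50_000):.2f} Gbps")  # peak 3,472 QPS, 1.39 Gbps

# Hypothetical per-server capacities for a three-tier chain.
capacity = {"web": 20_000, "auth": 10_000, "logic": 2_000}
tiers_servers = servers_needed({t: qps for t in capacity}, capacity)
print(tiers_servers)  # 4
```

The slowest tier dominates: here the 2k-QPS business-logic tier needs two servers while the others need one each.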
A common mistake: forgetting that each request often triggers multiple internal requests. A single API call may hit auth, user service, and feed service, each adding to total QPS on downstream services. Account for fan-out (typically 3-5x).
Bonus: for write-heavy endpoints, DB writes are often the bottleneck before CPU. Estimate DB writes per second separately and compare against your DB's max transactional throughput (∼5k writes/sec for standard MySQL).
Also account for retry amplification: if a downstream service times out, the client may retry, multiplying load. Use circuit breakers to limit retry storms.
A real-world case: a payment service estimated 1000 TPS. But when the downstream was slow, each payment call was retried automatically until it had been attempted 3 times. Actual load hit 3000 TPS, overwhelming the DB. A circuit breaker was the fix.
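Retry amplification can be bounded two ways. The sketch below uses hypothetical helpers and models no backoff or circuit breaker. It computes a worst case where every attempt fails, which matches the payment incident when each call is attempted 3 times in total, and an expected case where attempts fail independently at a fixed rate.

```python
def worst_case_load(base_rps, attempts_per_call, layers=1):
    """Upper bound when every attempt fails or times out.

    `attempts_per_call` counts the original try plus retries; when each of
    `layers` services retries independently, attempts multiply per layer.
    """
    return base_rps * attempts_per_call ** layers

def expected_load(base_rps, failure_rate, retries):
    """Expected attempts when each attempt fails independently."""
    return base_rps * sum(failure_rate ** k for k in range(retries + 1))

print(worst_case_load(1000, 3))          # 3000: payment example
print(worst_case_load(1, 4, layers=3))   # 64: 1 try + 3 retries at each of 3 layers
print(f"{expected_load(1000, 0.01, 3):.1f}")  # 1010.1: 1% errors barely add load
```

The gap between the two numbers is the point: retries are cheap while failures are rare and independent, and explosive when a dependency is down, which is the case circuit breakers exist for.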
The peak factor mistake: using average QPS leads to crashes under load. A social media app had 1000 QPS average and 3000 QPS peak. They provisioned for 1500 QPS. At the 3 PM daily peak, the system collapsed. A 3x peak factor would have saved them. Always multiply by 3-10x.
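A one-line headroom check catches this before launch. A minimal sketch; `peak_headroom` is a hypothetical helper, and the numbers are the ones from the example above.

```python
def peak_headroom(provisioned_qps, avg_qps, peak_factor):
    """Spare capacity at peak; a negative result is the shortfall."""
    return provisioned_qps - avg_qps * peak_factor

print(peak_headroom(1500, 1000, 3))  # -1500: short by half the peak
print(peak_headroom(1500, 1000, 8))  # -6500: a viral 8x spike
print(peak_headroom(3000, 1000, 3))  # 0: provisioned exactly for a 3x peak
```

Zero headroom is still a warning sign; any safety margin you add goes on top of the peak figure, not the average.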
How to Estimate Cache and Memory Requirements — Working Set vs Total Data
Caching is a critical lever for performance, but it's easy to underestimate the memory you'll need. The key concept: working set — the subset of data accessed frequently — not total data.
1. Estimate cache working set
   - Which data is accessed most frequently? User profiles, session tokens, product catalog, etc.
   - Size of one entry × number of unique entries accessed in a time window (e.g., 1 hour).
   - Example: 10M daily active users, each with a 1 KB session token. Only 20% are active in any given hour, so 2M users × 1 KB = 2 GB working set for sessions.
2. Determine cache hit ratio target
   - Pareto principle: 80% of reads hit 20% of data. Estimate the hot set size.
   - If the hot set is 2 GB and you dedicate 4 GB of cache, you'll likely get a >95% hit rate. Beyond that, returns diminish.
   - For a product catalog (read-only), the working set can be 100% of the data. For user sessions, the working set is active users only.
3. Account for eviction overhead
   - Caches like Redis use eviction policies (LRU, LFU). Under memory pressure, evictions cause cache misses and increased DB load.
   - Rule: set the memory limit to 1.5x your estimated working set to leave headroom for spikes.
4. Consider replication overhead
   - If you use Redis Cluster or a replicated cache, memory multiplies by the number of copies.
   - Example: 2 GB working set, replication factor 2 → 4 GB total cache memory.
5. Don't forget TTL overhead
   - Each cached entry carries TTL metadata. With millions of keys, this adds 10-20% to memory use.
   - Use redis-cli --bigkeys to compare actual memory usage against your estimates.
6. Cache access pattern matters
   - Read-heavy (90% reads): high hit ratio; memory is the constraint.
   - Write-heavy (90% writes): a write-through cache still helps, but write amplification may occur.
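The six steps combine into one multiplication. A sketch, assuming a 15% metadata overhead (the midpoint of the 10-20% range above), 1.5x eviction headroom, and the session example with one replica (2 copies); `cache_memory_gb` is a hypothetical helper.

```python
def cache_memory_gb(entries, bytes_per_entry, metadata_overhead=0.15,
                    headroom=1.5, copies=1):
    """Working set x metadata overhead x eviction headroom x copies, in GB."""
    working_set = entries * bytes_per_entry
    return working_set * (1 + metadata_overhead) * headroom * copies / 1e9

# 10M DAU, 20% active in the peak hour, 1 KB session token, primary + 1 replica.
active = int(10_000_000 * 0.20)
print(f"{cache_memory_gb(active, 1_000, copies=2):.1f} GB")  # 6.9 GB
```

Setting `metadata_overhead=0, headroom=1` gives back the raw 2 GB working set, which makes it easy to see how much of the budget is safety margin.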
A real-world trap: a team cached entire database rows instead of just hot columns. Their cache memory grew 3x beyond estimate, causing OOM and cascading failures.
Another gotcha: using a single Redis instance for both cache and session store. When one app's sessions flood the instance, they evict the other app's cached data. Separate your caches by use case.
The working set mistake: estimating total data instead of working set leads to 10x over-provisioning. For a 1TB database, the working set might be only 10GB (1%). Cache that, not the whole database.
- Working set is where caching actually helps. Cold data in cache is wasted memory.
- Session cache: active users only (20-30% of total users in peak hour).
- Product catalog: 80% of sales come from 20% of products — cache that 20%.
- Time-series data: only cache recent N hours/days, not all history.
Estimating for Distributed Systems: Chain Latency and Fan-Out
In a microservice architecture, you don't just estimate a single endpoint; you estimate the whole chain. A user request might hit:
- API Gateway
- Authentication service
- Business logic service
- Database/cache
Each hop adds latency, network bandwidth, and processing overhead.
How to estimate:
1. Identify the critical path: the longest chain of services for a request.
2. Estimate per-service QPS: external QPS × the fan-out factor for that service. A single API call might fan out to 5 internal services (user lookup, inventory check, payment, notification, audit).
3. Estimate per-service bandwidth: QPS × request+response size for that service.
4. Estimate the latency budget: sum of service latencies + network RTTs. With 5 services and 5 network hops at 1 ms each, that's the sum of the 5 service latencies + 5 ms.
5. Add headroom: distributed systems degrade under load, so add 30% overhead for retries, timeouts, and request amplification.
Example: Social feed request — Gateway (1ms + 0.5ms net) → Auth (5ms + 0.5ms net) → Feed service (20ms + 0.5ms net) → Cache (1ms) → DB (10ms). Total: ∼38ms for a single request. If you need to serve 10M users at peak 3k QPS, you need to ensure each service can handle its share of the QPS.
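The critical-path sum from the example is easy to script and compare against an SLO. A sketch, assuming the stages run sequentially and using the example's per-stage figures; these are averages, so the slack is room for tail latency, not a P99 guarantee. The names are hypothetical.

```python
# Hypothetical stage list: (name, service time ms, network hop ms to the next stage)
CRITICAL_PATH = [
    ("gateway", 1.0, 0.5),
    ("auth", 5.0, 0.5),
    ("feed", 20.0, 0.5),
    ("cache", 1.0, 0.0),
    ("db", 10.0, 0.0),
]

def chain_latency_ms(path):
    """Sequential critical path: sum of service times plus network hops."""
    return sum(service_ms + hop_ms for _, service_ms, hop_ms in path)

def slo_slack_ms(path, slo_ms):
    """Budget left for tail latency; negative means the design already misses."""
    return slo_ms - chain_latency_ms(path)

print(chain_latency_ms(CRITICAL_PATH))     # 38.5
print(slo_slack_ms(CRITICAL_PATH, 200.0))  # 161.5
```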
The chain latency number is critical for SLOs. If your SLO is 200ms P99, and your critical path estimate is 40ms average, you have room for tail latencies. But if the estimate is already 150ms, you're in trouble before you've built anything.
Also consider network bandwidth between services. If services are in different availability zones, bandwidth costs add up. Estimate inter-service traffic using average response sizes and QPS. At 10k QPS, 1KB per request = 10MB/s = 80Mbps, which is fine. At 100k QPS, it's 800Mbps — approaching 1Gbps limits.
The fan-out trap: assuming each external request generates one internal request. In reality, a single API call often generates 5-15 internal requests due to service decomposition. A team estimated 1000 QPS external, provisioned for 1000 QPS internal. Actual internal QPS was 10,000. The system collapsed.
A classic failure: an e-commerce team estimated 50ms for their checkout chain. In reality, each DB call took 30ms, and there were 4 DB calls. The chain was 120ms. They missed their P99 by 2x. They had to add caching and async processing post-launch.
- API Gateway → Auth (1x) → User Service (1x) → Feed Service (2x internal calls) → DB (2x). Total fan-out = 6x.
- If external QPS = 1000, internal QPS = 6000. That's 6x more servers than you thought.
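- Retries multiply again. With independent failures, a 1% error rate with up to 3 retries adds only about 1% load, but when a dependency is down every attempt fails, and retries compound across layers: 1 try + 3 retries at each of 3 layers means up to 4^3 = 64 attempts at the bottom.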
- Circuit breakers prevent retry storms; decide where they go before you estimate fan-out, since they cap how far retries can multiply load.
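Per-service load follows directly from the fan-out map. A sketch using the multipliers from the chain above; the service names and the `internal_qps` helper are illustrative.

```python
# Calls each service receives per external request, from the chain above.
FAN_OUT = {"auth": 1, "user_service": 1, "feed_service": 2, "db": 2}

def internal_qps(external_qps, fan_out):
    """Per-service QPS generated by one stream of external requests."""
    return {svc: external_qps * calls for svc, calls in fan_out.items()}

per_service = internal_qps(1000, FAN_OUT)
print(per_service)                # {'auth': 1000, 'user_service': 1000, 'feed_service': 2000, 'db': 2000}
print(sum(per_service.values()))  # 6000
```

Feeding each entry into a per-service capacity figure, rather than using the total, tells you which tier runs out first.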
The $250k AWS Bill Nobody Expected
- Always estimate storage per user and set retention limits before launch. Hot storage for recent data, cold for archives.
- Assume worst-case upload rates — double the expected number. Users upload more than you think.
- Build cost monitoring into your deployment pipeline. Alert when actual usage exceeds 80% of estimated capacity.
- Replication factor is not free. 3x replication = 3x storage cost. Use erasure coding for cold data.
- Data older than 30 days belongs in cold storage. Hot storage is for active data only.
- Peak factor applies to storage too — viral growth can 10x user count in days.
- Check SHOW GLOBAL STATUS (MySQL) or stats.write_rate (Cassandra). Re-run the estimate with actual peak load. Add a 3x peak factor if it is missing.
- Use docker stats or kubectl top pods to see per-container resource usage. Adjust scaling thresholds based on actual load, not the estimate.
- Use redis-cli info stats to see evicted_keys. Compare it to the cache memory budget. If evictions exceed 1%, increase cache size or reduce TTL. Add 50% headroom to the estimate.
Common mistakes to avoid
- Using average instead of peak for throughput
- Forgetting replication, index, and operational overhead in storage estimates
- Ignoring bandwidth egress costs and only estimating CPU/memory
- Using optimistic cloud provider default limits as your estimate
- Omitting operational storage overhead (logs, backups, transaction logs)
- Not accounting for TTL and metadata overhead in cache estimation: use redis-cli --bigkeys to validate actual per-key memory usage, and set the memory limit to 1.5x the working set.
- Forgetting fan-out in distributed system QPS estimation
Interview Questions on This Topic
Design a URL shortener and estimate the storage needed for 100 million URLs per month. Include replication and indexing.
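One way to work this question, as a sketch under stated assumptions: each record is ~500 bytes (short code, long URL, timestamps, owner) and URLs are kept for 5 years. Both figures are assumptions to state aloud in an interview, not given by the question; the replication and index multipliers follow the storage section.

```python
URLS_PER_MONTH = 100_000_000
BYTES_PER_RECORD = 500   # assumption: short code + long URL + timestamps + owner
RETENTION_YEARS = 5      # assumption
REPLICATION = 3
INDEX_OVERHEAD = 1.3

records = URLS_PER_MONTH * 12 * RETENTION_YEARS
primary_tb = records * BYTES_PER_RECORD / 1e12
total_tb = primary_tb * REPLICATION * INDEX_OVERHEAD
print(f"{records:,} URLs, {primary_tb:.1f} TB primary, {total_tb:.1f} TB with replication + indexes")
# 6,000,000,000 URLs, 3.0 TB primary, 11.7 TB with replication + indexes
```

Per month that is 100M × 500 B = 50 GB of primary data, or about 195 GB after 3x replication and 1.3x indexes.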