Biggest mistake: using total registered users — always start with DAU
What is QPS?
QPS (Queries Per Second) is the rate at which requests reach a system. The most common QPS estimation error is sizing capacity from average traffic instead of applying a peak multiplier, a mistake that causes production outages at scale. Average QPS (total daily queries / 86,400 seconds) looks safe on paper but ignores traffic bursts: social feeds spike during commute hours, e-commerce peaks on Black Friday, and authentication systems can see 10x their average load at 9 AM.
Sizing for the average alone means provisioning for the mean while failing at the 95th or 99th percentile of load, which is where real users experience latency spikes and timeouts. The fix is a three-step formula: start with Daily Active Users (DAU), multiply by average requests per user per day, then divide by the seconds in your busy window rather than the full 86,400. For consumer apps that window is typically 10-20% of the day, which amounts to a 5-10x peak multiplier.
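The two routes from DAU to peak QPS line up: dividing by a busy window that covers a fraction f of the day gives the same result as multiplying the daily average by 1/f. A minimal Python sketch, assuming all of the day's traffic lands inside that window; the function names and the 10M DAU × 20 requests workload are hypothetical:

```python
SECONDS_PER_DAY = 86_400


def peak_qps_via_multiplier(dau: int, requests_per_user: float, multiplier: float) -> float:
    """Average QPS over the full day, scaled up by a peak multiplier."""
    return dau * requests_per_user / SECONDS_PER_DAY * multiplier


def peak_qps_via_busy_window(dau: int, requests_per_user: float, busy_fraction: float) -> float:
    """Assume every request lands in a busy window covering busy_fraction of the day."""
    return dau * requests_per_user / (SECONDS_PER_DAY * busy_fraction)


# A 20% busy window behaves like a 5x multiplier; a 10% window like 10x
print(round(peak_qps_via_busy_window(10_000_000, 20, 0.20)))
print(round(peak_qps_via_multiplier(10_000_000, 20, 5.0)))
```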
A common rule of thumb is a 5-10x multiplier between average and peak QPS for social/messaging systems, and 20-50x for event-driven platforms like ticket sales. This failure mode is why Netflix uses chaos engineering to test how its systems hold up under stress, why companies such as Stripe write publicly about rate limiting to survive traffic spikes, and why your database connection pool should never be sized on average QPS alone.
The reference tables in this article — power of two for storage, latency numbers from L1 cache (0.5 ns) to cross-datacenter RTT (100+ ms), and availability from 99% (3.65 days downtime/year) to 99.999% (5.26 minutes) — give you the mental math to sanity-check any QPS estimate before it hits production.
Plain-English First
Imagine a lemonade stand at a summer fair. If 600 kids show up over 10 minutes and each buys one cup, your stand is handling 1 cup per second — that's your QPS. If you only have one pitcher, you're in trouble. QPS is just the speed at which requests hit your system, and estimating it correctly tells you exactly how big your 'pitcher' needs to be before the fair even starts.
Every system that has ever gone down under traffic had one thing in common: the engineers underestimated how many requests per second were coming in. QPS — Queries Per Second — is the single most important number in a capacity planning conversation. It's the heartbeat of your system, and if you don't know it, you're flying blind. Twitter's 2013 Super Bowl outage, Ticketmaster collapsing during Taylor Swift's Eras Tour presale, and countless startup launch-day crashes all trace back to the same root cause: nobody did the math ahead of time.
QPS estimation solves the problem of 'how much infrastructure do I actually need?' It bridges the gap between product thinking ('we expect 10 million users!') and engineering reality ('so that means we need X database replicas, Y cache nodes, and a load balancer that can sustain Z connections per second'). Without this translation step, you're guessing — and guessing with servers is expensive.
By the end of this article you'll be able to take any back-of-the-napkin user metric — daily active users, monthly signups, event-driven spikes — and convert it into a concrete QPS number with a peak multiplier, read/write split, and storage growth rate. You'll also know the three most dangerous estimation mistakes engineers make in system design interviews, and how to sidestep them all confidently.
Why Average QPS Is a Dangerous Metric
QPS (queries per second) measures the rate at which a system processes requests over a one-second window. The core mechanic is simple: count the requests completed in a second. The trap is using average QPS to size capacity. A system averaging 100 QPS can see spikes of 500 QPS lasting 200 ms, and the average hides the burst. Peak-to-average ratios of 5:1 to 10:1 are common in web traffic, especially under flash crowds or synchronized cron jobs. If you provision for the average, you guarantee tail latency spikes and risk cascading failures under load.
The key property: QPS is a rate, not a count, so it must be measured over short sliding windows (for example 100 ms or 1 s) to capture bursts; a 200 ms burst barely moves a 10 s or 1-minute average. Use P99 or max QPS over short windows for capacity planning, not the mean. In real systems this matters because autoscalers driven by average QPS react too late: by the time the average rises, the burst has already caused timeouts and retries, which amplify load further. Measure peak QPS over 100 ms intervals and set your headroom accordingly.
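To see how the window length changes the answer, here is a minimal sketch that buckets request arrival times (integer milliseconds, to avoid floating-point bucketing surprises) into fixed windows and reports the busiest one as a per-second rate. Fixed buckets are a simplification of a true sliding window, and the traffic pattern is made up to mirror the 100 QPS / 500 QPS burst example above:

```python
from collections import Counter


def qps_profile(timestamps_ms: list[int], duration_ms: int, window_ms: int = 100) -> tuple[float, float]:
    """Return (average_qps, peak_qps) for request arrival times in integer milliseconds.

    Peak QPS is the busiest fixed window of length window_ms, scaled to a per-second
    rate. Fixed buckets are a simplification of a true sliding window.
    """
    average_qps = len(timestamps_ms) / (duration_ms / 1000)
    buckets = Counter(t // window_ms for t in timestamps_ms)
    peak_qps = max(buckets.values(), default=0) * (1000 / window_ms)
    return average_qps, peak_qps


steady = list(range(0, 10_000, 10))    # 100 QPS for 10 s: one request every 10 ms
burst = list(range(5_000, 5_200, 2))   # 100 extra requests packed into 200 ms (500 QPS)
traffic = steady + burst

for window_ms in (100, 1_000):
    avg, peak = qps_profile(traffic, duration_ms=10_000, window_ms=window_ms)
    print(f"window {window_ms:>5} ms: average {avg:.0f} QPS, peak {peak:.0f} QPS")
```

With 100 ms buckets the burst shows up as a 600 QPS peak (the steady 100 QPS plus the 500 QPS burst); with 1 s buckets it shrinks to 200 QPS, and the 10 s average reports just 110.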
Average QPS Lies
A system running at 50% of its capacity on average can still hit 100% CPU for 500 ms out of every second; the average masks the burst that kills latency.
Production Insight
A flash sale caused a 10x QPS spike for 3 seconds; an autoscaler keyed to 1-minute average QPS never triggered.
Symptom: 503s and connection pool exhaustion during the spike, then idle resources after.
Rule: Always measure peak QPS over 100ms windows and set autoscale threshold at 60% of that peak.
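The arithmetic behind that incident is worth seeing once. A minimal sketch, assuming a hypothetical steady baseline of 1,000 QPS: a 3-second, 10x spike inside a 60-second averaging window lifts the average the autoscaler sees to only 1.45x baseline, so a threshold of, say, 2x baseline never fires.

```python
def windowed_average_qps(baseline_qps: float, spike_multiplier: float,
                         spike_s: float, window_s: float) -> float:
    """Average QPS over window_s when a spike lasting spike_s seconds lands inside it."""
    spike_requests = baseline_qps * spike_multiplier * spike_s
    normal_requests = baseline_qps * (window_s - spike_s)
    return (spike_requests + normal_requests) / window_s


baseline = 1_000  # hypothetical steady-state QPS before the flash sale
seen_by_autoscaler = windowed_average_qps(baseline, spike_multiplier=10, spike_s=3, window_s=60)
actual_peak = baseline * 10

print(f"1-minute average: {seen_by_autoscaler:,.0f} QPS ({seen_by_autoscaler / baseline:.2f}x baseline)")
print(f"actual peak:      {actual_peak:,} QPS")
print(f"threshold at 60% of measured peak: {0.6 * actual_peak:,.0f} QPS")
```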
Key Takeaway
Average QPS hides bursts — always measure peak QPS over sub-second windows.
Provision for P99 QPS, not mean QPS, to avoid tail latency disasters.
Autoscalers using average QPS are too slow; use leading indicators like request queue depth.
Figure: QPS Estimation, Average vs Peak Multiplier
The Core Formula — From DAU to QPS in Three Steps
The foundation of every QPS estimate is the same simple chain: how many users, how many actions each, spread over how many seconds.
Step 1 — Anchor on Daily Active Users (DAU). This is your starting point. Product gives you this number, or you derive it from total registered users multiplied by an engagement rate. A typical consumer app sees 10–20% of registered users active on any given day.
Step 2 — Estimate actions per user per day. Think about what a single user actually does in a session. For a Twitter-like feed app: they open the app (1 read), scroll through 20 posts (20 reads), post once (1 write), and like 5 things (5 writes). That's 27 requests per user per day, about 78% of them reads. Most systems fall in the 80-95% read range.
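Steps 1 and 2 can be sketched in a few lines of Python. The registered-user count and the 15% engagement rate are hypothetical inputs; the per-user actions are the ones listed in Step 2, which tally to 21 reads and 6 writes (27 requests) per user per day:

```python
registered_users = 100_000_000   # hypothetical figure from product
engagement_pct = 15              # within the 10-20% daily-active range from Step 1
dau = registered_users * engagement_pct // 100

# Step 2: one user's day in a Twitter-like feed app, as (kind, count)
daily_actions = {
    "open_app": ("read", 1),
    "scroll_posts": ("read", 20),
    "post": ("write", 1),
    "like": ("write", 5),
}
reads = sum(n for kind, n in daily_actions.values() if kind == "read")
writes = sum(n for kind, n in daily_actions.values() if kind == "write")

print(f"DAU: {dau:,}")
print(f"Requests per user per day: {reads + writes} ({reads / (reads + writes):.0%} reads)")
```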
Step 3 — Divide by seconds in a day. One day has 86,400 seconds. Divide total daily requests by 86,400 to get your average QPS.
Average QPS is never your target. It's your baseline. Real traffic is not flat — it spikes. Always apply a peak multiplier (typically 2x–5x for consumer apps) to get the number your infrastructure must actually survive. That peak QPS is what you design for.
io_thecodeforge_qps_estimator.py
# io.thecodeforge QPS Estimator: back-of-napkin system design calculator
# Run this with Python 3. No external libraries needed.


def estimate_qps(
    daily_active_users: int,
    reads_per_user_per_day: int,
    writes_per_user_per_day: int,
    peak_multiplier: float = 3.0,
) -> dict:
    """
    Converts user-level product metrics into engineering-level QPS numbers.
    peak_multiplier: how much higher than average your busiest hour gets.
    Use 2x for stable enterprise apps, 5x for viral consumer apps.
    """
    SECONDS_IN_A_DAY = 86_400  # 60 seconds * 60 minutes * 24 hours

    total_daily_reads = daily_active_users * reads_per_user_per_day
    total_daily_writes = daily_active_users * writes_per_user_per_day
    total_daily_requests = total_daily_reads + total_daily_writes

    # Average QPS: assumes traffic is perfectly flat across 24 hours (it never is)
    avg_read_qps = total_daily_reads / SECONDS_IN_A_DAY
    avg_write_qps = total_daily_writes / SECONDS_IN_A_DAY
    avg_total_qps = total_daily_requests / SECONDS_IN_A_DAY

    # Peak QPS: the number your system MUST handle without degrading
    peak_read_qps = avg_read_qps * peak_multiplier
    peak_write_qps = avg_write_qps * peak_multiplier
    peak_total_qps = avg_total_qps * peak_multiplier

    # Read/write ratio: critical for choosing DB architecture (replicas, caching strategy)
    read_write_ratio = total_daily_reads / max(total_daily_writes, 1)  # guard against division by zero

    return {
        "avg_read_qps": round(avg_read_qps, 1),
        "avg_write_qps": round(avg_write_qps, 1),
        "avg_total_qps": round(avg_total_qps, 1),
        "peak_read_qps": round(peak_read_qps, 1),
        "peak_write_qps": round(peak_write_qps, 1),
        "peak_total_qps": round(peak_total_qps, 1),
        "read_write_ratio": round(read_write_ratio, 1),
    }


# --- Example: Twitter-like social feed app ---
# Assumptions:
#   50 million DAU (mid-size social network)
#   Each user reads 30 tweets per day (timeline, explore, notifications)
#   Each user writes 1 tweet + 5 likes = 6 write actions per day
#   Peak multiplier of 3x (busy evenings vs. quiet early mornings)
results = estimate_qps(
    daily_active_users=50_000_000,
    reads_per_user_per_day=30,
    writes_per_user_per_day=6,
    peak_multiplier=3.0,
)

print("=== QPS Estimation: Twitter-like App ===")
print(f" Average Read QPS : {results['avg_read_qps']:>10,.1f}")
print(f" Average Write QPS : {results['avg_write_qps']:>10,.1f}")
print(f" Average Total QPS : {results['avg_total_qps']:>10,.1f}")
print()
print(f" Peak Read QPS : {results['peak_read_qps']:>10,.1f} <-- design your read path for this")
print(f" Peak Write QPS : {results['peak_write_qps']:>10,.1f} <-- design your write path for this")
print(f" Peak Total QPS : {results['peak_total_qps']:>10,.1f}")
print()
print(f" Read/Write Ratio : {results['read_write_ratio']:>10,.1f}x <-- heavy read bias → caching is critical")
Output
=== QPS Estimation: Twitter-like App ===
Average Read QPS : 17,361.1
Average Write QPS : 3,472.2
Average Total QPS : 20,833.3
Peak Read QPS : 52,083.3 <-- design your read path for this
Peak Write QPS : 10,416.7 <-- design your write path for this
Peak Total QPS : 62,500.0
Read/Write Ratio : 5.0x <-- heavy read bias → caching is critical
The 86,400 Anchor
Memorise this: one day = 86,400 seconds. In interviews, round it to 100,000 for faster mental math. That overstates the length of the day by about 16%, which makes your QPS roughly 14% low, so round your final answer up rather than down. Interviewers care about your reasoning process, not your arithmetic precision.
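A quick check of what the shortcut costs, using a hypothetical 1 billion requests per day: dividing by 100,000 instead of 86,400 understates QPS by about 13.6%.

```python
daily_requests = 1_000_000_000  # hypothetical: 1 billion requests per day

exact_qps = daily_requests / 86_400
shortcut_qps = daily_requests / 100_000   # interview rounding

print(f"exact:    {exact_qps:,.0f} QPS")
print(f"shortcut: {shortcut_qps:,.0f} QPS ({shortcut_qps / exact_qps - 1:+.1%})")
```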
Production Insight
Using registered users instead of DAU is the #1 estimation error.
Always ask: 'What percentage of users are active daily?'
Rule: DAU = total users × engagement rate — never skip this step.
Key Takeaway
Average QPS = (DAU × actions) ÷ 86,400.
Peak QPS = average × 2-5x.
The formula is simple; the assumptions are hard.
Power of Two Reference Table — Mental Math for Storage and Bandwidth
Computers operate in binary, so memory, disk, and network capacities are traditionally expressed as powers of two. Knowing these conversions lets you switch between bytes, kilobytes, megabytes, and terabytes in your head, which is essential for quick back-of-envelope calculations during system design interviews or production capacity planning.
When to use which? When calculating storage requirements for databases, use the binary (powers of two) values, because memory addressing and file system allocation are binary. For network bandwidth, service providers often use SI values (1 Gbps = 1,000,000,000 bps). The safest approach in interviews is to use the binary definition but round loosely: treat 1 GB ≈ 10^9 bytes for quick division.
Memory aid: write down the exponent: 10 = KB, 20 = MB, 30 = GB, 40 = TB. Each step multiplies by 1,024 ≈ 1,000. So to go from bytes to MB, divide by 2^20 (exactly) or roughly 10^6. Think of it as a slider: every 10 you subtract from the power-of-two exponent moves you up one unit.
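Rather than memorizing the table, you can generate it. This sketch prints each binary unit next to its SI counterpart and how far apart they drift; the labels KB, MB, and so on refer to the binary values here (strictly KiB, MiB, ...):

```python
UNITS = ["KB", "MB", "GB", "TB", "PB"]


def binary_vs_si_gap(step: int) -> float:
    """How much larger 2^(10*step) bytes is than 10^(3*step) bytes, as a fraction."""
    return 2 ** (10 * step) / 10 ** (3 * step) - 1


for step, unit in enumerate(UNITS, start=1):
    print(f"1 {unit}: 2^{10 * step} = {2 ** (10 * step):,} bytes vs 10^{3 * step} "
          f"(binary is {binary_vs_si_gap(step):.1%} larger)")
```

The gap grows from 2.4% at KB to about 10% at TB, which is why the rough 1,000 rounding is fine for interviews but worth pinning down in long-range capacity plans.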
Interview Math Shortcut
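Round 2^10 to 1,000 for quick divisions. If your storage estimate is 375 TB (the one-year figure from the storage-growth example in this article), that's about 3.75 × 10^14 bytes in SI units, versus about 4.1 × 10^14 bytes if you read it as 375 × 2^40: a 10% gap that rarely matters at interview precision. Interviewers rarely check exact digits; they check your approach.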
Production Insight
Always clarify with your infrastructure team whether monitoring alerts use binary or SI units. Mixing up KiB (2^10 bytes) and kB (10^3 bytes) is a 2.4% error, and the gap widens with each unit (about 7.4% at GB and 10% at TB), compounding in long-term projections.
Key Takeaway
Memorize: 2^10 = 1,024 ≈ 1K; 2^20 = 1M; 2^30 = 1G; 2^40 = 1T. Use binary for storage, SI for bandwidth unless specified otherwise.
Latency Numbers Every Programmer Should Know — From L1 Cache to Cross-Datacenter Round Trip
Understanding latency at different levels of the hardware stack is critical for making architectural trade-offs. These numbers, popularised by Jeff Dean and updated for modern hardware, give you an intuition for where to invest optimization effort.
Memory is ~100,000x faster than HDD, ~1,000x faster than SSD. If your read working set fits in RAM, you can serve it orders of magnitude faster than from disk.
Network within the same datacenter costs about 500μs round trip; at 2 GHz that's roughly 1,000,000 CPU cycles. Every remote call you avoid saves a large amount of time.
Cross-datacenter calls are so slow that they dominate response times. Design for data locality.
How to use these numbers: when you see a design that makes synchronous HTTP calls across regions, you can immediately flag it as high latency. When you propose caching in Redis, you're trading a 10ms HDD read for a Redis GET that costs about one in-datacenter round trip: 500μs by the reference figure above (a 20x improvement), and closer to 50μs on a fast modern network (about 200x).
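The reference numbers are easiest to use as ratios. A minimal sketch using figures from the widely circulated "Latency Numbers Every Programmer Should Know" list (the SSD value comes from later revisions of that list); treat them as order-of-magnitude values, not measurements of your hardware:

```python
# Order-of-magnitude reference latencies, in nanoseconds
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "main memory reference": 100,
    "SSD random read": 150_000,
    "round trip within same datacenter": 500_000,
    "HDD seek": 10_000_000,
    "packet CA -> Netherlands -> CA": 150_000_000,
}


def times_slower(slow: str, fast: str) -> float:
    """How many `fast` operations fit in the time of one `slow` operation."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


for slow, fast in [
    ("HDD seek", "main memory reference"),
    ("SSD random read", "main memory reference"),
    ("HDD seek", "round trip within same datacenter"),
]:
    print(f"{slow} is ~{times_slower(slow, fast):,.0f}x a {fast}")
```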
Production Insight
In production, monitor actual latencies rather than relying solely on these reference numbers. Cloud provider network latency can vary 2x between availability zones. Always measure before optimizing.
Key Takeaway
Memory access ~100ns, SSD ~150μs, HDD ~10ms, datacenter network round trip ~500μs. Use these to judge when in-memory caching, SSDs, or cross-region calls are justified.
Availability Numbers Table — Annual Downtime for 99% to 99.999%
Uptime percentages translate directly into how much downtime your users experience per year. Understanding this table helps you set realistic SLAs and choose the right redundancy strategy.
Availability vs. Annual Downtime
| Availability Level | Annual Downtime | Typical Deployment |
|--------------------|-----------------|--------------------|
| 99% (2 nines) | 3.65 days | Single server, no redundancy |
| 99.9% (3 nines) | 8.76 hours | Single server with monitoring & recovery |
| 99.99% (4 nines) | 52.56 minutes | Multi-AZ, load balancer, automated failover |
| 99.999% (5 nines) | 5.26 minutes | Multi-region active-active, redundancy at all layers |
| 99.9999% (6 nines) | 31.54 seconds | Global distribution with instant failover, extreme cost |
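The downtime column follows directly from the availability percentage. A short sketch, assuming a 365-day year (leap years ignored):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def annual_downtime_minutes(availability_pct: float) -> float:
    """Minutes per year a service can be down and still meet availability_pct."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for pct in (99, 99.9, 99.99, 99.999, 99.9999):
    minutes = annual_downtime_minutes(pct)
    print(f"{pct}%: {minutes:,.2f} min/year ({minutes / 60:,.2f} h, {minutes / 1440:.2f} days)")
```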
Cost vs. benefit: a common rule of thumb is that each extra nine of availability roughly doubles infrastructure cost. For most consumer startups, 99.9% (3 nines) is acceptable: you can be down for about 8.8 hours per year while retaining user trust. For financial systems, 99.99% (4 nines) is the baseline. Promising 5 nines requires a full multi-region design with continuous traffic redirection.
Practical rule of thumb: If your system's peak QPS estimate is a few thousand, a single DB instance with a warm standby can hit 3–4 nines. If your peak QPS is tens of thousands and you need 4+ nines, you must architect for component failure from the start — meaning multiple AZs, stateless application servers, and database replication with automatic failover.
Production Insight
Availability promises drive capacity planning. A 4-nines SLA means you must design for the failure of any single AZ, which directly affects how you provision static vs. autoscaling capacity. With load spread evenly across N AZs, the survivors must absorb a failed AZ's share, so you need peak QPS × N/(N-1) in total: 2x peak across two AZs, 1.5x across three.
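The N-AZ capacity rule can be checked with a couple of lines. This sketch assumes load is spread evenly across AZs and that only one AZ fails at a time; the function name and the 62,500 QPS peak are illustrative:

```python
def total_capacity_multiple(num_azs: int) -> float:
    """Total capacity, as a multiple of peak QPS, so that the remaining AZs
    can still serve full peak after any one AZ fails (load split evenly)."""
    if num_azs < 2:
        raise ValueError("surviving an AZ failure needs at least two AZs")
    per_az_share = 1 / (num_azs - 1)  # each AZ must be able to carry peak / (N - 1)
    return per_az_share * num_azs


peak_qps = 62_500  # hypothetical peak, e.g. the Twitter-like example
for n in (2, 3, 4):
    multiple = total_capacity_multiple(n)
    print(f"{n} AZs: provision {multiple:.2f}x peak = {peak_qps * multiple:,.0f} QPS total")
```

Adding AZs lowers the overhead (2x, 1.5x, about 1.33x), which is one reason three-AZ layouts are a common default.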
Key Takeaway
99% = 3.65 days/yr; 99.9% = 8.76 hrs; 99.99% = 52.6 min; 99.999% = 5.26 min. Don't promise more nines than your architecture can afford or deliver.
Peak QPS vs. Average QPS — Why Average Will Get You Fired
If you provision infrastructure for average QPS, your system will collapse during every spike — which is exactly when your users need you most. The Super Bowl, Black Friday, a viral tweet, a product launch: all of these are predictable spike patterns, and none of them look like an average day.
Traffic follows a diurnal pattern (fancy word for 'it changes with the time of day'). For a US-based consumer app, traffic is lowest at 3–5am Eastern and peaks between 7–9pm Eastern. The ratio between peak hour and the overnight trough can easily be 10:1 or higher.
There are two types of peak you must plan for separately. The first is the predictable daily peak — use a 2x–3x multiplier above your daily average. The second is the burst peak — think of a celebrity tweeting your app link or a DDoS. This can be 10x–50x and you handle it with rate limiting and autoscaling, not by provisioning for it statically.
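For the burst tier, a token bucket is one common rate-limiting algorithm (not necessarily what any given team runs): it admits short bursts up to a fixed allowance, then holds traffic to a steady refill rate. A minimal single-process sketch; the 100 requests/s rate, 200-request burst allowance, and injectable clock are illustrative assumptions, and a production limiter usually lives in a shared store or at the load balancer:

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: bursts up to `capacity`, sustained `rate` requests/s."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Return True if a request may proceed, consuming one token."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Deterministic demo with a fake clock
fake_now = [0.0]
bucket = TokenBucket(rate=100, capacity=200, clock=lambda: fake_now[0])

burst_allowed = sum(bucket.allow() for _ in range(1_000))  # 1,000 requests at t = 0
fake_now[0] = 1.0
after_refill = sum(bucket.allow() for _ in range(1_000))   # 1,000 more at t = 1 s
print(burst_allowed, after_refill)
```

The first wave gets the full 200-request allowance and the rest are rejected; one second later only 100 tokens have refilled, so only 100 more get through.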
The multiplier you choose also informs your architectural decisions. At 2x peak you might be fine with a single primary database with replicas. At 10x peak you're looking at tiered caching, read replicas, and queue-based write buffering. The number drives the architecture — not the other way around.
Conclusion: provisioning for average QPS means your system is overloaded for roughly 8 hours every single day.
Watch Out: The 'Average' Trap
Presenting average QPS as your design target in a system design interview is a red flag for experienced interviewers. Always state your peak multiplier explicitly and justify it — 'I'll use 3x because this is a consumer app with strong evening traffic patterns.' That one sentence shows you understand real production systems.
Production Insight
Provisioning for average QPS creates an 8-hour daily overload window.
Each hour above capacity adds latency, retries, and dropped connections.
Rule: design for peak, autoscale for bursts.
Key Takeaway
Average QPS is a cost metric, not a capacity metric.
Always state peak QPS explicitly with multiplier.
Your architecture is only as strong as your peak assumption.
Storage Growth Rate — QPS Has a Write Side Effect
QPS estimation doesn't stop at request throughput. Every write request creates data, and that data accumulates. If you only estimate QPS for request handling but ignore storage growth, you'll build a system that handles traffic fine on day one but runs out of disk space on day 90.
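The storage calculation is a direct extension of write QPS. Take your write QPS, multiply by the average payload size per write, and you get bytes per second written to disk. Multiply that by seconds per day, then by 365, and you know your one-year raw storage requirement. Multiply by your replication factor, then add 20-30% overhead for indexes and metadata. One caveat: data accumulates at the average write rate, so multiplying peak write QPS by a full day of seconds overstates yearly storage by roughly your peak multiplier. The examples here use peak write QPS as a deliberately conservative upper bound; use average write QPS for a tighter estimate, and peak write QPS for sizing disk write throughput.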
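To see how much the choice between average and peak write QPS matters for accumulated storage, this sketch applies the same formula to the Twitter-like example's average write QPS (about 3,472/s) and its 3x peak (about 10,417/s). It assumes 300-byte records, 3x replication, and 30% index overhead, and reports decimal TB (10^12 bytes):

```python
SECONDS_PER_YEAR = 86_400 * 365


def yearly_storage_tb(write_qps: float, record_bytes: int,
                      replication: int = 3, index_overhead: float = 0.30) -> float:
    """On-disk bytes accumulated in one year at a steady write_qps, in decimal TB."""
    bytes_per_second = write_qps * record_bytes * replication * (1 + index_overhead)
    return bytes_per_second * SECONDS_PER_YEAR / 1e12


avg_tb = yearly_storage_tb(3_472.2, 300)    # average write QPS
peak_tb = yearly_storage_tb(10_416.7, 300)  # 3x peak write QPS
print(f"average-based: {avg_tb:.0f} TB/year, peak-based: {peak_tb:.0f} TB/year")
```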
This calculation also drives replication decisions. If your write QPS is 10,000 and you're running 3 replicas, each replica must sustain 10,000 writes per second. That includes read replicas: they must apply the full replicated write stream on top of serving reads, so size their disks and IOPS for the write load as well, or replication lag will grow during every peak.
In interviews, connecting QPS → storage growth → replication factor is the sign of someone who's actually run production systems. It shows you're thinking about the full lifecycle of data, not just the happy path.
io_thecodeforge_storage_growth_estimator.py
# io.thecodeforge Storage Growth Estimator: connects write QPS to long-term storage needs
# This is the calculation interviewers want to see AFTER you state your write QPS


def estimate_storage_growth(
    peak_write_qps: float,
    avg_record_size_bytes: int,
    replication_factor: int = 3,
    index_overhead_pct: float = 0.30,
    projection_years: int = 3,
) -> None:
    """
    Given a write QPS, calculates raw and replicated storage growth over time.
    replication_factor: how many copies of each write are stored (e.g., 3 for most distributed DBs)
    index_overhead_pct: extra space consumed by DB indexes (30% is a safe default for B-tree indexes)
    """
    SECONDS_PER_DAY = 86_400
    BYTES_PER_GB = 1_073_741_824  # 2^30 bytes (binary GB)
    BYTES_PER_TB = BYTES_PER_GB * 1_024  # 2^40 bytes (binary TB)

    raw_bytes_per_second = peak_write_qps * avg_record_size_bytes
    replicated_bytes_per_second = raw_bytes_per_second * replication_factor
    total_bytes_per_second = replicated_bytes_per_second * (1 + index_overhead_pct)

    print("=== Storage Growth Estimation ===")
    print(f"Peak Write QPS : {peak_write_qps:>12,.0f} writes/sec")
    print(f"Avg Record Size : {avg_record_size_bytes:>12,} bytes")
    print(f"Replication Factor : {replication_factor:>12}x")
    print(f"Index Overhead : {index_overhead_pct*100:>11.0f}%")
    print()
    # Throughput is shown in decimal MB (10^6 bytes); the table below uses binary GB/TB
    print(f"Raw write throughput : {raw_bytes_per_second/1_000_000:>11.1f} MB/sec")
    print(f"Total disk write rate: {total_bytes_per_second/1_000_000:>11.1f} MB/sec (with replication + indexes)")
    print()
    print(f"{'Period':<12} | {'Raw Storage':>14} | {'With Replication + Index':>24}")
    print("-" * 58)
    for period_label, seconds in [
        ("1 Day", SECONDS_PER_DAY),
        ("1 Month", SECONDS_PER_DAY * 30),
        ("1 Year", SECONDS_PER_DAY * 365),
        (f"{projection_years} Years", SECONDS_PER_DAY * 365 * projection_years),
    ]:
        raw_total = raw_bytes_per_second * seconds
        on_disk_total = total_bytes_per_second * seconds
        # Switch from GB to TB once a total exceeds 1 TB
        raw_str = f"{raw_total / BYTES_PER_TB:.2f} TB" if raw_total > BYTES_PER_TB else f"{raw_total / BYTES_PER_GB:.1f} GB"
        on_disk_str = f"{on_disk_total / BYTES_PER_TB:.2f} TB" if on_disk_total > BYTES_PER_TB else f"{on_disk_total / BYTES_PER_GB:.1f} GB"
        print(f"{period_label:<12} | {raw_str:>14} | {on_disk_str:>24}")

    print()
    print("Architecture implications:")
    yearly_tb = (total_bytes_per_second * SECONDS_PER_DAY * 365) / BYTES_PER_TB
    if yearly_tb < 10:
        print(" → Single-region storage likely sufficient for year 1")
    elif yearly_tb < 100:
        print(" → Plan for sharding or partitioning by end of year 1")
    else:
        print(" → Distributed storage (e.g. Cassandra, S3 + data lake) required from day 1")


# Using the write QPS from our Twitter-like example: 10,417 peak writes/sec
# Tweet record: ~300 bytes (tweet text + user_id + timestamp + metadata)
estimate_storage_growth(
    peak_write_qps=10_417,
    avg_record_size_bytes=300,
    replication_factor=3,
    index_overhead_pct=0.30,
    projection_years=3,
)
Output
=== Storage Growth Estimation ===
Peak Write QPS : 10,417 writes/sec
Avg Record Size : 300 bytes
Replication Factor : 3x
Index Overhead : 30%
Raw write throughput : 3.1 MB/sec
Total disk write rate: 12.2 MB/sec (with replication + indexes)
Period       |    Raw Storage | With Replication + Index
----------------------------------------------------------
1 Day        |       251.5 GB |                 980.7 GB
1 Month      |        7.37 TB |                 28.73 TB
1 Year       |       89.63 TB |                349.57 TB
3 Years      |      268.90 TB |               1048.71 TB
Architecture implications:
→ Distributed storage (e.g. Cassandra, S3 + data lake) required from day 1
Interview Gold: The Storage Chain
Interviewers love when you say: 'My write QPS is 10,000/sec at peak. At 300 bytes per record with 3x replication and 30% index overhead, I'm writing about 12 MB/sec to disk — that's roughly 375 TB per year. That means I need a distributed storage solution from day one, which rules out a single Postgres instance.' That chain of reasoning — from QPS to architecture — is what separates senior candidates from junior ones.
Production Insight
10,000 writes/sec at peak with 300-byte records is 3 MB/s of raw writes.
With 3 replicas and 30% index overhead, that's about 12 MB/s on disk and roughly 375 TB/year: distributed storage from day 1.
Rule: never commit to a single-node database without running this calc.
Key Takeaway
Write QPS × record size × replication × (1 + index overhead) = storage growth per second; multiply by 86,400 × 365 for a yearly figure.
Bandwidth Estimation — Calculating Network Throughput from QPS
Most engineers stop at storage when estimating infrastructure from QPS, but network bandwidth is equally critical. Your servers must not only handle requests and store data — they must also push that data out to clients, replicas, and caches. Underestimating bandwidth leads to network saturation, TCP congestion, and retry storms that amplify latency.
Bandwidth formula: Bandwidth (bits per second) = QPS × average response size (bytes) × 8. For a system serving 50,000 peak read QPS with a 50KB average response (JSON payload including HTTP headers):
- Bandwidth = 50,000 × 50,000 × 8 = 20,000,000,000 bps = 20 Gbps.
That's a significant fraction of a typical 40 Gbps NIC. If you're running 10 servers, each needs at least 2 Gbps of outbound bandwidth, which modern cloud instances easily provide, but you must request high-bandwidth instance types.
Inbound vs. outbound: For reads, the bulk is outbound (from server to client). For writes, inbound bandwidth matters (request bodies). A write-heavy system with 10,000 peak write QPS and 1KB average request body pushes 80 Mbps inbound — negligible compared to reads. Always compute both directions separately.
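Both directions of the bandwidth formula fit in one helper. A minimal sketch using the figures above (50,000 peak read QPS with 50 KB responses, 10,000 peak write QPS with 1 KB request bodies, 10 servers), with KB treated as 1,000 bytes and Gbps as 10^9 bits/s:

```python
def bandwidth_gbps(qps: float, payload_bytes: float) -> float:
    """Network throughput in Gbps (10^9 bits/s) for a request rate and payload size."""
    return qps * payload_bytes * 8 / 1e9


outbound = bandwidth_gbps(qps=50_000, payload_bytes=50_000)  # reads: 50 KB responses
inbound = bandwidth_gbps(qps=10_000, payload_bytes=1_000)    # writes: 1 KB request bodies
servers = 10

print(f"outbound: {outbound:.0f} Gbps total, {outbound / servers:.0f} Gbps per server")
print(f"inbound:  {inbound * 1_000:.0f} Mbps total")
```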
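Bandwidth with replicas: each replica also consumes network. The primary ships every write to every replica, so replication traffic is roughly write bandwidth × replica count. Spreading reads across 3 read replicas splits the client-facing outbound bandwidth between them but does not shrink the total, and on a cache miss the same 50KB response crosses the network twice (database to app server, then app server to client). This often pushes you toward caching at the CDN level, which takes cacheable traffic off your servers entirely.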
Practical rule: monitor NIC-level bandwidth metrics. If you see >70% utilization during peak, you're at risk. Add more instances or optimize payload sizes (gzip, field selection, pagination).
Bandwidth scales linearly with response size. If your API returns a 500KB JSON blob instead of a 50KB paginated response, bandwidth jumps 10x. Always design APIs to return only what's needed. For large objects, use CDN or pre-signed URLs.
Production Insight
Bandwidth bottlenecks often appear silently — network buffers fill, TCP windowing kicks in, and clients experience timeouts before CPU or memory alarms trigger. Always graph outbound bandwidth alongside QPS.
Key Takeaway
Bandwidth (Gbps) = (QPS × response size × 8 × overhead) / 1e9. Outbound dominates for read-heavy systems. Monitor and cache at the CDN to reduce network load.
Read/Write Split — Why Your Cache Strategy Depends on It
Most engineers estimate total QPS and stop there. That's a mistake. Read QPS and write QPS drive fundamentally different architectural decisions. If your read/write ratio is 10:1, you invest in caching and read replicas. If it's 1:1, you focus on write durability, replication lag, and partition tolerance.
Take the Twitter-like app from earlier: read QPS 52,083 at peak, write QPS 10,417 — a 5:1 ratio. That means the read path must handle 5x the throughput of the write path. With a read cache that serves 80% of reads from memory, you reduce read database QPS to 10,417 — exactly matching write QPS. Suddenly a symmetric read replica setup works.
Inversely, a write-heavy system like a monitoring metrics pipeline (where every agent writes every second) will have a 1:1 or even write-dominated ratio. There, caching reads doesn't help because most reads are dashboards querying recent data. Instead, you partition writes across time-based shards and use columnar compression.
Knowing the split isn't a nice-to-have. It's the difference between spending $50k/month on Redis instances you don't need and spending it on write-optimised storage you do need.
io_thecodeforge_read_write_split_planner.py
# io.thecodeforge Read/Write Split Calculator: determines optimal architecture from ratio


def recommend_architecture(
    avg_read_qps: float,
    avg_write_qps: float,
    peak_multiplier: float,
) -> None:
    """
    Given average read/write QPS, prints architectural recommendations.
    """
    read_write_ratio = avg_read_qps / max(avg_write_qps, 1)  # guard against division by zero
    peak_read_qps = avg_read_qps * peak_multiplier
    peak_write_qps = avg_write_qps * peak_multiplier

    print("=== Read/Write Split Analysis ===")
    print(f"Average Read QPS : {avg_read_qps:>10,.1f}")
    print(f"Average Write QPS: {avg_write_qps:>10,.1f}")
    print(f"Ratio (Read:Write): {read_write_ratio:.1f}:1")
    print(f"Peak Read QPS : {peak_read_qps:>10,.1f}")
    print(f"Peak Write QPS : {peak_write_qps:>10,.1f}")
    print()
    print("Architecture Recommendations:")
    if read_write_ratio >= 10:
        print(" → Heavy read bias: Use CDN + Redis cache for reads.")
        print(" → Multiple read replicas; consider read-only shards.")
        print(" → Write path: single primary with async replication.")
    elif read_write_ratio >= 3:
        print(" → Moderate read bias: Use Redis or Memcached for hot data.")
        print(" → A few read replicas; database primary with replicas is often enough.")
        print(" → Write path: single primary with replication and backups.")
    elif read_write_ratio >= 0.5:
        print(" → Balanced: Both read and write paths need scaling.")
        print(" → Consider sharded database (e.g., Vitess, CockroachDB).")
        print(" → Caching still useful but less impact; focus on partition tolerance.")
    else:
        print(" → Write-heavy: Optimize for write throughput.")
        print(" → Use distributed database like Cassandra or ScyllaDB.")
        print(" → Queue writes (Kafka) before database to absorb bursts.")
        print(" → Caching reads is modest; consider materialized views.")
    print()
    # Assume a cache absorbs 80% of reads; only the 20% that miss reach the database
    print("Key insight: If you can cache 80% of reads,")
    print(f" effective read QPS reduces to {peak_read_qps * 0.2:>10,.0f}")
    print(f" which is {peak_read_qps * 0.2 / max(peak_write_qps, 1):.1f}x your write QPS.")


# Example: Twitter-like app (read 52k, write 10k peak).
# These are already peak values, so they are passed as "averages" with a multiplier of 1.0.
recommend_architecture(
    avg_read_qps=52_083,
    avg_write_qps=10_417,
    peak_multiplier=1.0,  # already peak values
)
Output
=== Read/Write Split Analysis ===
Average Read QPS : 52,083.0
Average Write QPS: 10,417.0
Ratio (Read:Write): 5.0:1
Peak Read QPS : 52,083.0
Peak Write QPS : 10,417.0
Architecture Recommendations:
→ Moderate read bias: Use Redis or Memcached for hot data.
→ A few read replicas; database primary with replicas is often enough.
→ Write path: single primary with replication and backups.
Key insight: If you can cache 80% of reads,
effective read QPS reduces to 10,417
which is 1.0x your write QPS.
The Cache Leverage Point
If reads = 100k/s and cache hits 80%, DB reads drop to 20k/s
That 20k/s matches a 20k/s write workload — symmetric architecture possible
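With the same 100k/s reads but 50k/s writes, even a 90% cache hit rate leaves 10k/s of DB reads, just 0.2x the write load: the writes are now the bottleneck
Cache doesn't fix write bottlenecks; it pays off most where the read/write ratio is 5:1 or higher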
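The callout's numbers can be reproduced with a short sweep over cache hit rates. A minimal sketch using the same hypothetical 100k/s reads and 50k/s writes:

```python
def db_read_qps(read_qps: float, cache_hit_rate: float) -> float:
    """Reads that miss the cache and fall through to the database."""
    return read_qps * (1 - cache_hit_rate)


reads, writes = 100_000, 50_000
for hit_rate in (0.0, 0.8, 0.9):
    misses = db_read_qps(reads, hit_rate)
    print(f"hit rate {hit_rate:.0%}: {misses:,.0f} DB reads/s = {misses / writes:.1f}x write QPS")
```

Past a certain hit rate, further cache investment barely moves total database load, because the untouched write stream dominates.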
Production Insight
A 5:1 read/write ratio means you need 5x read capacity vs write.
Caching 80% of reads reduces read QPS by 80% — database load drops dramatically.
Rule: ratio >10:1? Use CDN and in-memory cache. Ratio 1:1? Use write-optimized storage like Cassandra.
Key Takeaway
Read QPS drives cache and replica count.
Write QPS drives durability and replication strategy.
Never treat QPS as a single number — split it first.
Burst Traffic and Autoscaling — Handling the Unexpected Spikes
You've estimated your average QPS, applied a peak multiplier, and provisioned for sustained peak. But what about the unexpected? A viral post, a PR disaster, a partner integration activation — these can dump 10x your peak QPS in seconds. If you provision statically for that, your cloud bill becomes a joke. If you don't, your site goes down.
Your autoscaling design must account for three things: detection speed, scaling velocity, and cooldown physics. CPU-based autoscaling is too slow — by the time CPU hits 80% and new instances register, you're already in backpressure. Instead, use request queue depth or a dedicated QPS metric. Set scaling to trigger at, say, 70% of provisioned capacity, and aim to add 20% capacity every 60 seconds.
Bursts that exceed sustained peak by 20% or more should trigger a different response — not more instances, but load shedding. Queue non-critical writes to Kafka, drop analytics requests, serve stale cache. Define a tiered degradation plan before you need it.
In production, we've seen teams buy 3x peak capacity after a single spike. That's wasteful. The right approach: provision for 1.5x sustained peak, autoscale to 3x within 2 minutes, and rate-limit beyond that. And always test your autoscaling with a real traffic spike before the Super Bowl.
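The tiered plan above can be expressed as a simple decision function. A minimal sketch, assuming the 1.5x static and 3x autoscale factors from the text and using the Twitter-like example's 62,500 peak total QPS as a hypothetical sustained peak; a real system would act on live metrics rather than a single number:

```python
def capacity_plan(sustained_peak_qps: float, static_factor: float = 1.5,
                  autoscale_factor: float = 3.0) -> dict:
    """Static floor and autoscale ceiling, both in QPS."""
    return {
        "static_qps": sustained_peak_qps * static_factor,
        "autoscale_ceiling_qps": sustained_peak_qps * autoscale_factor,
    }


def response_for(incoming_qps: float, plan: dict) -> str:
    """Which tier of the plan handles a given incoming load."""
    if incoming_qps <= plan["static_qps"]:
        return "serve from static capacity"
    if incoming_qps <= plan["autoscale_ceiling_qps"]:
        return "serve while autoscaling adds capacity"
    return "rate-limit and shed non-critical work"


plan = capacity_plan(62_500)
for qps in (50_000, 150_000, 400_000):
    print(f"{qps:>8,} QPS -> {response_for(qps, plan)}")
```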
io_thecodeforge_autoscaling_config.yaml
```yaml
# io.thecodeforge Autoscaling Configuration Example (Kubernetes HPA)
# A Pods metric like requests_per_second requires a custom metrics adapter
# (e.g. Prometheus Adapter) to expose it to the HPA.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: io-thecodeforge-qps-hpa   # object names allow lowercase letters, digits, '-' and '.', not '_'
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-service
  minReplicas: 5
  maxReplicas: 30
  metrics:
    - type: Pods
      pods:
        metric:
          name: requests_per_second
        target:
          type: AverageValue
          averageValue: "2000"            # target 2k QPS per pod
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0       # scale up immediately
      policies:
        - type: Pods
          value: 5                        # add up to 5 pods per period
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300     # wait 5 min before scaling down
      policies:
        - type: Percent
          value: 20                       # remove at most 20% of pods per period
          periodSeconds: 60
```
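To see how that policy plays out, here is a simplified sketch of one HPA evaluation per minute. It uses the scaling rule from the Kubernetes HPA documentation (desired = ceil(current replicas × current metric ÷ target)), plus the config's cap of +5 pods per 60 seconds and its 5 to 30 replica bounds. It ignores the default 10% tolerance and models the scale-down window only crudely. `hpa_step` is a hypothetical helper, not part of any Kubernetes client.

```python
import math


def hpa_step(current_replicas: int, total_qps: float,
             target_per_pod: float = 2000, max_added_per_period: int = 5,
             min_replicas: int = 5, max_replicas: int = 30) -> int:
    """One 60-second HPA evaluation, simplified (scale-up path only)."""
    current_per_pod = total_qps / current_replicas
    # HPA rule: desired = ceil(current * currentMetric / targetMetric)
    desired = math.ceil(current_replicas * current_per_pod / target_per_pod)
    # scaleUp policy from the config: at most +5 pods per 60-second period
    desired = min(desired, current_replicas + max_added_per_period)
    # Crude stand-in for the 300 s scale-down stabilization window: never shrink here
    desired = max(desired, current_replicas)
    return max(min_replicas, min(desired, max_replicas))


replicas = 5
for minute in range(1, 7):
    replicas = hpa_step(replicas, total_qps=60_000)
    print(f"after {minute} min: {replicas} pods = {replicas * 2000:,} QPS capacity")
```

Starting from `minReplicas: 5`, the policy reaches 15 pods (3x) after two periods. It hits the 30-pod ceiling, about 60,000 QPS at 2,000 QPS per pod, after five. Anything beyond that has to be rate-limited.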
Don't Let Autoscaling Be Your Only Defense
Autoscaling has a reaction delay. If traffic doubles in 30 seconds, by the time your scaling takes effect the database has already fallen over. Always layer rate limiting, connection pooling, and circuit breakers under your autoscaling.
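Rate limiting is the layer that holds while autoscaling catches up. Below is a minimal token-bucket sketch. `TokenBucket` is a hypothetical class, not a specific library. It is single-threaded, so a real limiter shared across workers needs a lock or a shared store. The fake clock keeps the demo deterministic.

```python
import time


class TokenBucket:
    """Single-threaded token-bucket limiter: `rate` tokens/s, up to `burst` saved."""

    def __init__(self, rate_per_sec: float, burst: float, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst          # start full so a short spike is absorbed
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                 # caller should reject, e.g. with HTTP 429


# Deterministic demo with a fake clock instead of real time
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=100, burst=10, clock=lambda: fake_now[0])
first_batch = sum(bucket.allow() for _ in range(50))   # 50 requests at t = 0
fake_now[0] = 0.05                                      # 50 ms later
second_batch = sum(bucket.allow() for _ in range(50))
print(first_batch, second_batch)
```

The bucket lets the first 10 requests of the spike through, then refills at 100 tokens per second. Excess traffic is rejected cheaply instead of piling onto the database.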
Production Insight
Autoscaling based on CPU is too slow for burst traffic — use request queue depth.
Provision statically for 1.5x sustained peak; autoscale to 3x for bursts.
Rule: test autoscaling with a real spike before it matters.
Key Takeaway
Burst QPS ≠ peak QPS.
Use a tiered plan: static provisioning + autoscaling + load shedding.
Never rely on autoscaling alone for sudden spikes.
The Interview Walkthrough — Tying It All Together
Let's walk through a typical system design interview problem: 'Design a URL shortener like bit.ly with 100 million daily active users.'
Step 1: Anchor on DAU. Already given: 100M DAU. No need to derive from MAU (but if they gave MAU, you'd use 10-20% to get DAU).
Step 2: Estimate actions per user per day.
- Users visit short URLs (read): assume each user clicks 3 links per day → 300M reads/day.
- Users create short URLs (write): assume each user creates 0.1 short URLs per day → 10M writes/day.
- Total daily requests: 310M.
Step 3: Average QPS.
- Average read QPS = 300M / 86,400 ≈ 3,470.
- Average write QPS = 10M / 86,400 ≈ 116.
- Average total QPS ≈ 3,586.
Step 4: Peak multiplier. URL shorteners have a diurnal pattern but also viral bursts (a link goes viral). Use 3x for sustained peak and 10x for bursts (handled by autoscaling).
- Peak read QPS ≈ 10,400.
- Peak write QPS ≈ 350.
Step 5: Storage growth. Write QPS is 350 at peak; record size is ~500 bytes (URL + user ID + timestamp + metadata). With 3x replication and 30% overhead:
- Write rate ≈ 350 × 500 × 3 × 1.3 ≈ 682 KB/s.
- Storage per year ≈ 682 KB/s × 86,400 × 365 ≈ 21.5 TB/year.
- This assumes the peak write rate all year, so treat it as an upper bound. At the average write QPS (~116), the same formula gives roughly 7.1 TB/year.
- That's moderate: a few database nodes can handle it, but you'll need to plan for sharding after year 2.
Step 6: Architecture decisions from QPS.
- Read-heavy (30:1 ratio): use a CDN for redirects (they're cacheable), Redis for the top-N popular URLs, and read replicas.
- Write QPS is low: a single primary database with replication is fine, but you'll want it write-optimized (e.g., Aurora or PostgreSQL with streaming replication).
- Storage growth: a distributed key-value store isn't required in year 1, but plan for Vitess or Cassandra by year 3.
- Caching: with an 80% cache hit ratio on reads, database read QPS drops to ~2,080 (10,400 × 0.2). Together with ~350 peak write QPS, that is comfortable for a primary with read replicas.
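The whole chain fits in a few lines. This minimal sketch uses the walkthrough's assumptions: 3 reads and 0.1 writes per user per day, a 3x peak multiplier, 500-byte records, 3x replication, 30% overhead, and an 80% cache hit ratio. `estimate` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365


def estimate(dau, reads_per_user, writes_per_user, peak_multiplier=3,
             record_bytes=500, replication=3, overhead=1.3, cache_hit=0.80):
    """DAU -> QPS -> storage -> cached DB load, using the walkthrough's assumptions."""
    avg_read = dau * reads_per_user / SECONDS_PER_DAY
    avg_write = dau * writes_per_user / SECONDS_PER_DAY
    peak_read = avg_read * peak_multiplier
    peak_write = avg_write * peak_multiplier
    # Storage at the peak write rate (an upper bound, as in Step 5)
    bytes_per_sec = peak_write * record_bytes * replication * overhead
    return {
        "avg_read_qps": avg_read,
        "avg_write_qps": avg_write,
        "peak_read_qps": peak_read,
        "peak_write_qps": peak_write,
        "storage_kb_per_sec": bytes_per_sec / 1_000,
        "storage_tb_per_year": bytes_per_sec * SECONDS_PER_YEAR / 1e12,
        "db_read_qps_after_cache": peak_read * (1 - cache_hit),
    }


for name, value in estimate(dau=100_000_000, reads_per_user=3, writes_per_user=0.1).items():
    print(f"{name}: {value:,.1f}")
```

The computed figures come out slightly below the hand-calculated ones: about 347 peak write QPS, 677 KB/s, and 21.4 TB/year. The difference arises because the walkthrough rounds 347 up to 350 before multiplying.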
That chain — from DAU to QPS to storage to architecture — is what interviewers want to see. Practice it until it's second nature.
Interviewers care less about your exact numbers and more about your assumptions. Say: 'I'm assuming each active user clicks 3 URLs per day. That might be higher during a viral campaign, so I'll note that and handle bursts with autoscaling.' Then state your peak multiplier and justify it.
Production Insight
Interviewers test your assumptions more than your arithmetic.
Stating 'I'll use 15% DAU/MAU' is better than 'I assume 100M DAU.'
Rule: always justify each assumption with a brief rationale.
Key Takeaway
The chain: DAU → actions → QPS → storage → architecture is the interview backbone.
Always verbalize your assumptions and multipliers.
Use a write-optimized database (Cassandra, ScyllaDB). Queue writes to absorb bursts. Caching is less impactful.
If read and write are balanced (ratio 0.5:1 to 2:1)
→
Use a sharded database with partition tolerance (Vitess, CockroachDB). Both paths need horizontal scaling.
If peak QPS exceeds capacity by >5x
→
Add rate limiting, graceful degradation, and anticipatory autoscaling. Consider async processing for non-critical writes.
Tools for Measuring QPS — Don't Fly Blind
You can't improve what you don't measure. And you can't measure QPS with gut feelings or a cron job that polls top every five seconds. Real production monitoring requires sub-second sampling, especially if you want to catch bursts and tail-latency spikes.
Your first stop: application performance monitoring tools. Datadog, New Relic, and Grafana with Prometheus all expose QPS as a first-class metric. The difference between them is how they aggregate. Prometheus is commonly configured to scrape every 15 seconds (its built-in default is one minute). That is fine for steady state and useless for burst detection. You need histogram-enabled tools that can expose p50, p95, and p99 per-second QPS across your endpoint matrix.
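Here is why a 15-second scrape hides bursts. The per-second counts in this minimal sketch are made up: a steady 100 QPS with a single one-second burst to 1,000 requests. A rate averaged over the whole window smooths that burst away.

```python
# Hypothetical per-second request counts inside one 15-second scrape window:
# steady 100 QPS, plus a single one-second burst to 1,000 requests.
per_second = [100] * 15
per_second[7] = 1_000

window_avg_qps = sum(per_second) / len(per_second)  # what a 15 s average reports
worst_second_qps = max(per_second)                 # what the system actually absorbed

print(f"15 s average: {window_avg_qps:.0f} QPS, worst second: {worst_second_qps} QPS")
```

The dashboard shows 160 QPS while the database took a 1,000-QPS hit for one full second.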
Don't forget server-side logs. Nginx access logs with millisecond timestamps (the $msec variable) give you raw traffic patterns you can pipe into ELK or Loki. Parse them with a simple Python script to sanity-check your APM numbers. They should match within 2-3%. If they don't, you have a sampling or clock-skew problem.
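Below is a minimal sketch of that script. It assumes you've defined a custom `log_format` whose first field is `$msec`, for example `log_format qps '$msec $request_method $uri $status';`. The default `combined` format logs `$time_local` with only one-second resolution. `qps_from_log` is a hypothetical helper.

```python
from collections import Counter


def qps_from_log(lines):
    """Count requests per whole second from lines whose first field is $msec."""
    per_second = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            ts = float(fields[0])
        except ValueError:
            continue  # skip malformed lines
        per_second[int(ts)] += 1
    return per_second


sample = [
    "1700000000.120 GET /api/users 200",
    "1700000000.480 GET /api/users 200",
    "1700000000.999 POST /api/checkout 500",
    "1700000001.010 GET /api/users 200",
]
counts = qps_from_log(sample)
print(f"peak: {max(counts.values())} QPS, mean: {sum(counts.values()) / len(counts):.1f} QPS")

# Real use (path is an assumption about your setup):
# with open("/var/log/nginx/access.log") as f:
#     counts = qps_from_log(f)
```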
One trap: measuring QPS only at the load balancer. That shows client-to-proxy traffic, not actual application throughput. Measure at the application tier too. A request the application drops, times out, or fails still looks like served traffic in the load balancer's request count.
QpsMonitor.py (Python)
```python
# io.thecodeforge: system-design tutorial
import time


class QpsMonitor:
    """Learning-only QPS monitor: keeps every request in memory."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.requests = []

    def record_request(self, endpoint, status_code):
        self.requests.append({
            "timestamp": time.time(),
            "endpoint": endpoint,
            "status_code": status_code,
        })
        # Prune entries older than the retention window
        cutoff = time.time() - self.window_seconds
        self.requests = [r for r in self.requests if r["timestamp"] > cutoff]

    def current_qps(self, endpoint=None):
        cutoff = time.time() - 1  # count requests from the last second
        relevant = [
            r for r in self.requests
            if r["timestamp"] > cutoff and (endpoint is None or r["endpoint"] == endpoint)
        ]
        return len(relevant)


monitor = QpsMonitor(window_seconds=300)
monitor.record_request("/api/users", 200)
monitor.record_request("/api/users", 200)
monitor.record_request("/api/checkout", 500)
# Query right away: sleeping a full second here would age every request out of the 1-second window
print(f"Current QPS: {monitor.current_qps()}")
print(f"Users endpoint QPS: {monitor.current_qps(endpoint='/api/users')}")
```
Output
Current QPS: 3
Users endpoint QPS: 2
Production Trap:
Don't keep every request in an in-memory list at high QPS. At 10K QPS with a 300-second window, that list holds 3 million entries and is rebuilt on every request, which will kill your garbage collector. Use a sliding window built on a ring buffer of per-second atomic counters instead. This code is for learning, not production.
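Here is a minimal sketch of the ring-buffer approach. It keeps one counter per second, so memory stays fixed at the window size no matter how many requests arrive. Python has no atomic integers, so this version is single-threaded; under threads you would wrap `record` in a lock. `RingQpsCounter` is a hypothetical class, and the fake clock keeps the demo deterministic.

```python
import time


class RingQpsCounter:
    """Per-second counters in a fixed ring: memory stays O(window), not O(requests)."""

    def __init__(self, window_seconds: int = 60, clock=time.time):
        self.window = window_seconds
        self.counts = [0] * window_seconds
        self.stamps = [-1] * window_seconds  # absolute second each slot belongs to
        self.clock = clock

    def record(self) -> None:
        now = int(self.clock())
        slot = now % self.window
        if self.stamps[slot] != now:  # slot still holds an older second: recycle it
            self.stamps[slot] = now
            self.counts[slot] = 0
        self.counts[slot] += 1

    def qps(self, last_n_seconds: int = 1) -> float:
        """Average QPS over the last n seconds (n <= window), including the current one."""
        now = int(self.clock())
        total = sum(count for count, stamp in zip(self.counts, self.stamps)
                    if now - last_n_seconds < stamp <= now)
        return total / last_n_seconds


t = [1000.2]
counter = RingQpsCounter(window_seconds=10, clock=lambda: t[0])
for _ in range(3):
    counter.record()          # 3 requests during second 1000
t[0] = 1001.5
for _ in range(5):
    counter.record()          # 5 requests during second 1001
print(counter.qps(1), counter.qps(2))
```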
Key Takeaway
Always measure QPS at the application tier, not just the load balancer, and sample at sub-second intervals to catch bursts.
Improving QPS — Stop Throwing Hardware at Bad Code
You've got your baseline QPS. It's embarrassingly low. Your first instinct is to scale up — more CPUs, more nodes. Stop. That's the expensive, lazy fix. Real QPS improvements come from removing bottlenecks that make each request take longer than it should.
Start with database queries. N+1 queries are the silent QPS killers. One API call triggers 50 database round trips — your QPS drops while your connection pool drowns. Use eager loading, denormalization, or a read-through cache. PostgreSQL's EXPLAIN ANALYZE is your weapon. If you see sequential scans on indexed columns, fix the index. That one change can 10x your QPS.
Next: optimize your code path. Hot spots are loops over large collections, blocking I/O in critical sections, and unnecessary serialization. Profile with cProfile or a flame graph tool. Anything that blocks the event loop in async frameworks like FastAPI kills throughput. Offload that work to a task queue.
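This minimal sketch shows what blocking the event loop costs. A heartbeat coroutine ticks every 10 ms, and we count how many ticks land while 200 ms of blocking work runs, first inline and then via `asyncio.to_thread` (Python 3.9+). `to_thread` is an in-process stand-in used here for illustration. For heavy or long-running work, the task queue the text recommends is still the better home.

```python
import asyncio
import time


def blocking_work():
    time.sleep(0.2)  # stands in for blocking I/O or CPU-heavy work


async def heartbeat(ticks, beats=20):
    # Appends a tick roughly every 10 ms while the event loop is free
    for _ in range(beats):
        await asyncio.sleep(0.01)
        ticks.append(time.monotonic())


async def run(offload: bool) -> int:
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat start
    if offload:
        await asyncio.to_thread(blocking_work)  # runs in a worker thread
    else:
        blocking_work()                          # freezes the event loop
    ticks_during_work = len(ticks)
    await hb
    return ticks_during_work


async def main():
    blocked = await run(offload=False)
    offloaded = await run(offload=True)
    print(f"loop ticks during 200 ms of work: blocking={blocked}, offloaded={offloaded}")
    return blocked, offloaded


blocked, offloaded = asyncio.run(main())
```

With the inline call, the loop records zero ticks: every other request on that worker waits too. With the offloaded call, the heartbeat keeps ticking.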
Horizontal scaling works, but only after you've made each node efficient. Doubling your cluster from 10 to 20 nodes when one node handles 100 QPS badly just gives you 2000 QPS of bad performance. Fix the single-node QPS first, then scale out.
SlowEndpointFix.py (Python)
```python
# io.thecodeforge: system-design tutorial
import asyncio
import time


async def fetch_user_from_db(uid):
    # Simulated single-row DB call: ~10 ms round trip
    await asyncio.sleep(0.01)
    return {"id": uid, "name": f"user_{uid}"}


async def fetch_users_batch(ids):
    # Simulated batch DB call (e.g. WHERE id IN (...)): ~15 ms total
    await asyncio.sleep(0.015)
    return [{"id": i, "name": f"user_{i}"} for i in ids]


# Before: one sequential database round trip per user (N+1)
async def get_users_slow(user_ids):
    users = []
    for uid in user_ids:
        user = await fetch_user_from_db(uid)  # 10 ms each
        users.append(user)
    return users


# After: batch query, one round trip
async def get_users_fast(user_ids):
    return await fetch_users_batch(user_ids)  # 15 ms total


async def main():
    user_ids = list(range(10))
    start = time.time()
    await get_users_slow(user_ids)
    print(f"Slow (N+1): {time.time() - start:.3f}s")
    start = time.time()
    await get_users_fast(user_ids)
    print(f"Fast (batch): {time.time() - start:.3f}s")


asyncio.run(main())
```
Output
Slow (N+1): 0.103s
Fast (batch): 0.016s
Senior Shortcut:
If your APM shows high time spent in database calls, don't optimize the query first. Fix the number of queries. Batching 10 queries into 1 cuts round trips 10x, a gain that query tuning alone rarely matches.
Key Takeaway
Fix N+1 queries and blocking I/O before scaling horizontally — single-node optimization is 10x cheaper than adding servers.
Trade-offs and Tech Choices — Why Your Database Dies at 10K QPS
Every QPS number you calculate eventually points at the same bottleneck: the database. You can throw SSDs and connection pools at it, but that's treating the symptom. The real question: what are your read/write pattern and consistency requirements?
Read-heavy at 50K QPS? You need a cache layer and read replicas. Don't touch the primary. Write-heavy at 10K QPS? Now you care about write-ahead logs, batching, and maybe Cassandra or ScyllaDB. PostgreSQL with synchronous replication will choke. MySQL with Group Replication is better but still has upper bounds.
Your choices cascade: SQL vs NoSQL, synchronous vs asynchronous replication, sharding vs partitioning. If you pick a single PostgreSQL instance for a 20K QPS write-heavy feed, you've already lost. The trade-off is always consistency vs throughput. You don't get both. Decide before you write a single CREATE TABLE.
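The read-heavy case can be sized quickly. In this minimal sketch, reads that miss the cache spread across read replicas, while every write still lands on one primary. It borrows this section's rough figure of ~5K QPS per single PostgreSQL node, which is an assumption you should replace with your own benchmark. `read_path_plan` is a hypothetical helper.

```python
import math


def read_path_plan(read_qps: float, write_qps: float, cache_hit_ratio: float,
                   per_node_qps: float = 5_000):
    """Hypothetical sizing: cache misses spread across read replicas, writes stay on one primary."""
    db_read_qps = read_qps * (1 - cache_hit_ratio)
    read_replicas = math.ceil(db_read_qps / per_node_qps)
    primary_can_take_writes = write_qps <= per_node_qps
    return db_read_qps, read_replicas, primary_can_take_writes


print(read_path_plan(50_000, 500, cache_hit_ratio=0.0))     # no cache
print(read_path_plan(50_000, 500, cache_hit_ratio=0.80))    # 80% cache hits
print(read_path_plan(2_000, 10_000, cache_hit_ratio=0.80))  # write-heavy
```

An 80% hit ratio takes the 50K-read case from 10 replicas to 2. In the write-heavy case, neither the cache nor replicas help, because the primary alone cannot absorb 10K write QPS. That is where sharding or a write-optimized store comes in.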
estimate_bottleneck.py (Python)
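```python
# io.thecodeforge: system-design tutorial
def db_bottleneck(read_qps: int, write_qps: int, db_max: int = 5000) -> bool:
    """Return True if one database instance can absorb the combined load."""
    total = read_qps + write_qps
    if total > db_max:
        print(f"Bottleneck at {total} QPS — single DB max is {db_max}")
        # Ceiling division: instances needed if load spreads evenly.
        # Read replicas only scale reads; every write still hits the primary.
        replicas_needed = (total + db_max - 1) // db_max
        print(f"Minimum replicas: {replicas_needed}")
        return False
    print("Single instance survives")
    return True


if __name__ == "__main__":
    db_bottleneck(30000, 500, 5000)
```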
Output
Bottleneck at 30500 QPS — single DB max is 5000
Minimum replicas: 7
Production Trap:
A single PostgreSQL instance dies around 5K QPS on commodity hardware. Your benchmark showed 15K? That's with zero writes, zero indexes, and a tiny working set. Production reality is different.
Key Takeaway
Below 5K QPS, any SQL works. Above 10K QPS, you need caching, sharding, or a different database. Pick your poison before you pick your tech.
Following and Favorites — The Fan-Out Pattern That Burns You
Two features look innocent until you do the math. Following and Favorites. They're deceptively write-heavy. Every follow writes to a social graph table. Every favorite writes to an activity log. But the real pain? Reading them at scale.
When a user has 10K followers and posts, you either fan-out write (push to every follower's timeline) or fan-out read (pull on demand). Fan-out write explodes QPS: 1 post becomes 10K writes. Fan-out read hits cache like a truck when the post goes viral. Neither is painless.
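The push-vs-pull cost is easy to quantify per posting round. This minimal sketch uses a made-up author mix and a hypothetical celebrity threshold. Fan-out-on-write costs one timeline insert per follower. The hybrid skips authors above the threshold and merges their posts at read time instead. `timeline_writes_per_round` is an illustrative name.

```python
def timeline_writes_per_round(follower_counts, celebrity_threshold=None):
    """Timeline inserts needed when every listed author posts once.

    Fan-out-on-write pushes one insert per follower. With a hybrid threshold,
    authors above it are skipped here and merged into timelines at read time.
    """
    total = 0
    for followers in follower_counts:
        if celebrity_threshold is not None and followers > celebrity_threshold:
            continue  # fan-out-on-read for this author
        total += followers
    return total


# Hypothetical author mix: 1,000 regular users with 200 followers, one celebrity with 5M
authors = [200] * 1_000 + [5_000_000]
push_all = timeline_writes_per_round(authors)
hybrid = timeline_writes_per_round(authors, celebrity_threshold=100_000)
print(f"push everyone: {push_all:,} writes; hybrid: {hybrid:,} writes")
```

The saving is not free. Every timeline read for the celebrity's followers now pays an extra lookup, so that read path needs to be cached.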
Favorites have the same trap. A single popular post with 100K favorites? Every read of that post's count fights for cache slots. Denormalize the count into the post row, or you'll watch your Redis hit rate drop to 40% during a spike. Production truth: trade storage cost for write amplification. Always.
favorite_amplification.py (Python)
```python
# io.thecodeforge: system-design tutorial
# Simulating the write amplification of following/favorites
POSTS_PER_USER = 0.1        # posts per user per day
FOLLOWER_COUNT = 10_000     # every post is fanned out to 10K followers
FAVORITES_PER_POST = 500


def write_amplification():
    daily_users = 1_000_000
    posts = daily_users * POSTS_PER_USER
    # Fan-out write: each post pushed to every follower's timeline
    fan_out_writes = posts * FOLLOWER_COUNT
    # Favorites: each favorite is one write plus one count update
    favorite_writes = posts * FAVORITES_PER_POST * 2
    print(f"Naive fan-out writes/day: {fan_out_writes:,.0f}")
    print(f"Favorites writes/day: {favorite_writes:,.0f}")
    print(f"Combined write QPS: {(fan_out_writes + favorite_writes) / 86400:,.0f}")


if __name__ == "__main__":
    write_amplification()
```
Output
Naive fan-out writes/day: 1,000,000,000
Favorites writes/day: 100,000,000
Combined write QPS: 12,731
Senior Shortcut:
For following, use fan-out-on-read for power users (celebrities) and fan-out-on-write for normal users. Hybrid approach cuts write QPS by 60% without losing timeline freshness.
Key Takeaway
Following and favorites are write amplification bombs. Always denormalize counts and use hybrid fan-out. Your write QPS will thank you.
Case Study: E-Commerce Traffic at Scale
Most systems die not under average load but during flash crowds. An e-commerce site with 10 million DAU, 3 daily sessions per user, and 20 page views per session yields 600 million page views per day. The average QPS is 6,944, which is manageable. But Black Friday triggers 10x spikes. Every second, customers search, browse, add to cart, and pay. The write path is the bottleneck: order creation triggers inventory deduction, payment processing, and notification fan-out. A single checkout API call can generate 15 internal writes. At 50,000 QPS of peak write traffic, a monolithic database crashes. The fix: shard orders by user ID, use a write-ahead log for durability, and serve the product catalog from a CDN with stale-while-revalidate. Read-heavy search endpoints must be decoupled via Elasticsearch, not queried against the primary DB. Autoscaling based on request queue depth (not CPU) starts adding capacity earlier in the spike than CPU-based scaling would.
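The case-study numbers can be checked in a few lines. This minimal sketch reproduces the page-view math and the 10x spike. It then asks how many checkouts per second it takes for the 15-writes-per-checkout fan-out alone to reach 20K write QPS. `internal_write_qps` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400

dau = 10_000_000
sessions_per_user = 3
views_per_session = 20
spike_multiplier = 10      # Black Friday spike from the case study
writes_per_checkout = 15   # internal writes triggered by one checkout call

page_views_per_day = dau * sessions_per_user * views_per_session
avg_qps = page_views_per_day / SECONDS_PER_DAY
spike_qps = avg_qps * spike_multiplier


def internal_write_qps(checkout_qps: float) -> float:
    """Database writes generated by a given checkout rate."""
    return checkout_qps * writes_per_checkout


# Checkout rate at which fan-out alone reaches 20K write QPS
checkouts_for_20k_writes = 20_000 / writes_per_checkout

print(f"page views/day: {page_views_per_day:,}")
print(f"average QPS: {avg_qps:,.0f}, Black Friday QPS: {spike_qps:,.0f}")
print(f"~{checkouts_for_20k_writes:,.0f} checkouts/s -> "
      f"{internal_write_qps(checkouts_for_20k_writes):,.0f} write QPS")
```

Roughly 1,333 checkouts per second is enough to generate 20K write QPS. That is a small fraction of the ~69K spike QPS, which is why write fan-out must be modeled separately from page views.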
Assuming peak traffic is uniform across endpoints hides the write amplification. Checkout spikes at 20K write QPS melt single-node Postgres. Shard before you need to.
Key Takeaway
Always model write fan-out separately from read QPS. A single user action can multiply DB writes by 10–20x.
Challenges and Considerations in QPS Design
QPS estimation is not a one-time calculation. Every assumption shifts under real traffic. First, the coherency challenge: high read QPS demands caching, but caches introduce staleness. E-commerce can tolerate product prices that are 1 second stale, but a payment gateway cannot serve stale account balances. Second, the tail latency trap: at 10K QPS, the 99th-percentile request can take 500ms while the median takes 10ms. Autoscaling on average latency ignores the 1% of users who time out. Third, cross-datacenter QPS: a global user base means requests fan out across regions. A single 100K QPS API behind a global load balancer can add 300ms of cross-region overhead to writes. Solution: write-local, read-global with async replication. Fourth, the cost of idle capacity: overprovisioning for peak QPS burns budget. Use spot instances for background workers and reserved instances for the steady-state read path. Fifth, and hardest: testing at scale. Load testing at 10K QPS in staging doesn't replicate real user behavior, because cache warmup, connection pooling, and garbage collection patterns differ. The golden rule: every 10x QPS increase introduces a new class of failure.
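The tail latency trap is easy to demonstrate. This minimal sketch uses hypothetical latencies: 98 fast requests and 2 slow ones. It computes the mean alongside nearest-rank percentiles. `percentile` is a small illustrative helper, not a library call.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies: 98 fast requests and 2 that hit a slow dependency
latencies_ms = [10] * 98 + [500] * 2
mean_ms = sum(latencies_ms) / len(latencies_ms)

print(f"mean={mean_ms:.1f} ms  p50={percentile(latencies_ms, 50)} ms  "
      f"p99={percentile(latencies_ms, 99)} ms")
```

The mean (19.8 ms) looks healthy and would never trip an average-latency autoscaler, while p99 sits at 500 ms.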
Every 10x QPS growth introduces a new failure mode. At 1K QPS, connection limits bite. At 10K, network bandwidth. At 100K, garbage collection pauses kill throughput.
Key Takeaway
QPS design is iterative. Validate each tier's tail latency and cost before scaling to the next order of magnitude.
● Production incident · POST-MORTEM · severity: high
The $2M Missing Cache: When Average QPS Killed the Launch
Symptom
The new feature page loaded slowly, then returned 502 errors. Database CPU hit 100% and connections were exhausted. Autoscaling triggered, but new instances came up while the system was already in backpressure.
Assumption
The engineer assumed average QPS would equal traffic during the launch hour. They took a 1-hour window of beta data and divided by 3,600 seconds, ignoring that the beta had 1/10th of the launch user base.
Root cause
They calculated QPS as (total daily requests) / 86,400 but used total daily requests from a small beta test. They didn't apply a peak multiplier because 'average should be safe with autoscaling.' But autoscaling was configured on CPU, which hit 100% before new instances could register with the load balancer.
Fix
Added a Redis cache layer in front of the database (hit ratio 85%). Set autoscaling to trigger on request queue depth instead of CPU. Provisioned static capacity for 20,000 QPS (sustained peak) and allowed autoscaling to 60,000 QPS.
Key lesson
Never use average QPS as your provisioning target — always compute peak QPS with a multiplier based on traffic patterns.
Autoscaling based on CPU is too slow for sudden spikes; use request queue depth or request rate metrics.
Beta traffic patterns don't scale linearly — apply a risk multiplier when extrapolating to full launch.
Production debug guide: use these symptom-action pairs to rapidly identify capacity misconfigurations.
Symptom · 01
Average QPS is 5,000 but database CPU is 100% at 2 PM
→
Fix
Check peak hour QPS — the diurnal pattern may push actual QPS to 3x average. Review hourly metrics. If peak QPS exceeds provisioned capacity, add read replicas or cache.
Symptom · 02
Writes are slow, reads are fine
→
Fix
Compare write QPS vs write throughput capacity. Check if replication lag is causing write throttling. Consider partitioning (sharding) the write path.
Symptom · 03
Disk filling 3x faster than expected
→
Fix
Recalculate storage growth: write QPS × record size × replication factor × (1 + index overhead). Verify actual record sizes vs estimate. Increase projection if necessary.
Symptom · 04
Autoscaling never triggers during traffic spikes
→
Fix
Check scaling metric configuration. CPU lags behind request rate; use request queue depth or custom QPS metric. Set scaling cooldown to 60 seconds to avoid flapping.
★ QPS Estimation Quick Debug Cheat Sheet: When your system behaves unexpectedly, run these commands and checks to verify your QPS assumptions are correct.
Database CPU at 100% during normal business hours
Immediate action
Check hourly QPS distribution to identify peak times.
Commands
SELECT DATE_TRUNC('hour', created_at) AS hour, COUNT(*) AS requests FROM requests GROUP BY hour ORDER BY hour;
SHOW FULL PROCESSLIST; (MySQL) or SELECT * FROM pg_stat_activity; (Postgres) to see active queries.
Fix now
Add read replicas or increase cache TTL to absorb peak reads. If peak exceeds current static capacity by >2x, provision more replicas immediately.
Write latency spikes every few hours
Immediate action
Check if writes are queued and if replication lag is accumulating.
Commands
SHOW REPLICA STATUS\G (MySQL 8.0.22+; SHOW SLAVE STATUS\G on older versions) or SELECT * FROM pg_stat_replication; (Postgres) for lag.
Check disk IOPS: iostat -x 1 (Linux) or CloudWatch metrics.
Fix now
Reduce synchronous commit if replication is the bottleneck, or increase write throughput by partitioning the write table.
Disk space warning after deploying a new feature
Immediate action
Recalculate storage growth based on actual write QPS and record size.
Commands
SELECT pg_database_size(current_database()); (Postgres) or SELECT table_schema AS 'Database', SUM(data_length + index_length) / 1024 / 1024 AS 'Size (MB)' FROM information_schema.tables GROUP BY table_schema; (MySQL)
Check logs for unexpected large writes or indexing overhead.
Fix now
Add disk space or enable auto-scaling for storage. If growth is >10% per week, plan to shard within three months.
Common mistakes to avoid
×
Using total registered users instead of DAU
Symptom
QPS estimate is 10x higher than actual traffic, leading to over-provisioned infrastructure and wasted cloud spend.
Fix
Always start with Daily Active Users (DAU). Apply an engagement rate: DAU = total users × (% active daily). Default to 15% if unknown.
×
Ignoring read/write split
Symptom
Cache strategy fails because write path is the real bottleneck, or read replicas are underutilized.
Fix
Separate read and write QPS. Use the read/write ratio to decide caching vs. write-optimized storage. Cache only if ratio ≥5:1.
×
Using average QPS as the provisioning target
Symptom
System collapses during peak hours despite appearing 'within capacity' on daily averages.
Fix
Apply a peak multiplier (2x-5x for consumer apps). Design for sustained peak QPS, not average. Use autoscaling for bursts only.
Interview Questions on This Topic
Q01 of 03SENIOR
How would you estimate QPS for a social media platform with 50 million DAU? Assume each user performs 10 reads and 2 writes per day.
ANSWER
Average QPS = (50M × (10+2)) / 86,400 ≈ 6,944. With a 3x peak multiplier, peak QPS ≈ 20,833. Split: peak read QPS ≈ 17,361, peak write QPS ≈ 3,472. Storage: 3,472 writes/sec × 500 bytes × 3 replicas × 1.3 overhead ≈ 6.8 MB/s → ~214 TB/year at the peak rate (~71 TB/year at the average write rate). Architecture: read-heavy (5:1), so use a Redis cache and read replicas. The write path can start as a single primary with replication, but storage growth of this size means planning for sharding early rather than waiting until year 2.
Q02 of 03JUNIOR
What is the most common mistake in QPS estimation during system design interviews?
ANSWER
Using total registered users instead of DAU. Also, ignoring the peak multiplier and treating average QPS as the design target. Interviewers look for explicit assumptions and justification of multipliers.
Q03 of 03SENIOR
Explain how QPS estimation connects to storage growth and how that influences database choice.
ANSWER
Write QPS × record size × replication factor × index overhead gives bytes/second storage growth. For example, 10,000 writes/sec × 300 bytes × 3 replicas × 1.3 = 11.7 MB/s → 369 TB/year. If storage >100 TB/year, you need distributed storage like Cassandra or sharded SQL from day one. If <10 TB/year, a single node is fine.
Frequently Asked Questions
01
What is QPS and why is it important?
QPS stands for Queries Per Second (or Requests Per Second). It measures the rate at which your system receives requests. It's critical for capacity planning: without knowing your QPS, you can't size your infrastructure correctly. Underestimating QPS leads to outages; overestimating leads to wasted cloud spend.
02
How do I calculate QPS from user metrics?
Start with Daily Active Users (DAU). Multiply by the average number of actions per user per day (reads + writes). Divide by 86,400 (seconds in a day) to get average QPS. Then apply a peak multiplier (2x-5x) to get the design target. Always split into read and write QPS.
03
Should I use CPU utilization for autoscaling around QPS?
No. CPU lags behind request rate. Use request queue depth or a custom QPS metric instead. CPU-based autoscaling is too slow for sudden traffic spikes, leading to backpressure and failures before new instances register.