Intermediate 12 min · March 06, 2026

QPS Estimation Error: Average vs Peak Multiplier

Using average QPS from beta tests caused 100% database CPU and 502 errors at launch.

Naren · Founder
Quick Answer
  • QPS estimation converts user metrics (DAU, actions/user) into requests per second
  • Core formula: (DAU × actions per user) ÷ 86,400 = average QPS
  • Always split into read QPS and write QPS — drives cache vs replica decisions
  • Peak QPS = average × 2-5x; average provisioning guarantees production outages
  • Storage grows from write QPS: 10,000 writes/sec × 300 bytes × 3 replicas × 1.3 index overhead ≈ 12 MB/s, or roughly 375 TB/year
  • Biggest mistake: using total registered users — always start with DAU
Plain-English First

Imagine a lemonade stand at a summer fair. If 600 kids show up over 10 minutes and each buys one cup, your stand is handling 1 cup per second — that's your QPS. If you only have one pitcher, you're in trouble. QPS is just the speed at which requests hit your system, and estimating it correctly tells you exactly how big your 'pitcher' needs to be before the fair even starts.

Every system that has ever gone down under traffic had one thing in common: the engineers underestimated how many requests per second were coming in. QPS — Queries Per Second — is the single most important number in a capacity planning conversation. It's the heartbeat of your system, and if you don't know it, you're flying blind. Twitter's 2013 Super Bowl outage, Ticketmaster collapsing during Taylor Swift's Eras Tour presale, and countless startup launch-day crashes all trace back to the same root cause: nobody did the math ahead of time.

QPS estimation solves the problem of 'how much infrastructure do I actually need?' It bridges the gap between product thinking ('we expect 10 million users!') and engineering reality ('so that means we need X database replicas, Y cache nodes, and a load balancer that can sustain Z connections per second'). Without this translation step, you're guessing — and guessing with servers is expensive.

By the end of this article you'll be able to take any back-of-the-napkin user metric — daily active users, monthly signups, event-driven spikes — and convert it into a concrete QPS number with a peak multiplier, read/write split, and storage growth rate. You'll also know the three most dangerous estimation mistakes engineers make in system design interviews, and how to sidestep them all confidently.

The Core Formula — From DAU to QPS in Three Steps

The foundation of every QPS estimate is the same simple chain: how many users, how many actions each, spread over how many seconds.

Step 1 — Anchor on Daily Active Users (DAU). This is your starting point. Product gives you this number, or you derive it from total registered users multiplied by an engagement rate. A typical consumer app sees 10–20% of registered users active on any given day.

Step 2: Estimate actions per user per day. Think about what a single user actually does in a session. For a Twitter-like feed app: they open the app (1 read), scroll through 20 posts (20 reads), post once (1 write), and like 5 things (5 writes). That's 27 requests per user per day: 21 reads and 6 writes, or about 78% reads. Most systems are 80–95% reads.

Step 3 — Divide by seconds in a day. One day has 86,400 seconds. Divide total daily requests by 86,400 to get your average QPS.

Average QPS is never your target. It's your baseline. Real traffic is not flat — it spikes. Always apply a peak multiplier (typically 2x–5x for consumer apps) to get the number your infrastructure must actually survive. That peak QPS is what you design for.

io_thecodeforge_qps_estimator.py (Python)
# io.thecodeforge QPS Estimator — Back-of-Napkin System Design Calculator
# Run this with Python 3. No external libraries needed.

def estimate_qps(
    daily_active_users: int,
    reads_per_user_per_day: int,
    writes_per_user_per_day: int,
    peak_multiplier: float = 3.0
) -> dict:
    """
    Converts user-level product metrics into engineering-level QPS numbers.
    peak_multiplier: how much higher than average your busiest hour gets.
                     Use 2x for stable enterprise apps, 5x for viral consumer apps.
    """
    SECONDS_IN_A_DAY = 86_400  # 60 seconds * 60 minutes * 24 hours

    total_daily_reads  = daily_active_users * reads_per_user_per_day
    total_daily_writes = daily_active_users * writes_per_user_per_day
    total_daily_requests = total_daily_reads + total_daily_writes

    # Average QPS — assumes traffic is perfectly flat across 24 hours (it never is)
    avg_read_qps  = total_daily_reads  / SECONDS_IN_A_DAY
    avg_write_qps = total_daily_writes / SECONDS_IN_A_DAY
    avg_total_qps = total_daily_requests / SECONDS_IN_A_DAY

    # Peak QPS — the number your system MUST handle without degrading
    peak_read_qps  = avg_read_qps  * peak_multiplier
    peak_write_qps = avg_write_qps * peak_multiplier
    peak_total_qps = avg_total_qps * peak_multiplier

    # Read/write ratio — critical for choosing DB architecture (replicas, caching strategy)
    read_write_ratio = total_daily_reads / max(total_daily_writes, 1)  # guard against division by zero

    return {
        "avg_read_qps":    round(avg_read_qps,  1),
        "avg_write_qps":   round(avg_write_qps,  1),
        "avg_total_qps":   round(avg_total_qps,  1),
        "peak_read_qps":   round(peak_read_qps,  1),
        "peak_write_qps":  round(peak_write_qps, 1),
        "peak_total_qps":  round(peak_total_qps, 1),
        "read_write_ratio": round(read_write_ratio, 1),
    }


# --- Example: Twitter-like social feed app ---
# Assumptions:
#   50 million DAU (mid-size social network)
#   Each user reads 30 tweets per day (timeline, explore, notifications)
#   Each user writes 1 tweet + 5 likes = 6 write actions per day
#   Peak multiplier of 3x (busy evenings vs. quiet early mornings)

results = estimate_qps(
    daily_active_users=50_000_000,
    reads_per_user_per_day=30,
    writes_per_user_per_day=6,
    peak_multiplier=3.0
)

print("=== QPS Estimation: Twitter-like App ===")
print(f"  Average Read  QPS : {results['avg_read_qps']:>10,.1f}")
print(f"  Average Write QPS : {results['avg_write_qps']:>10,.1f}")
print(f"  Average Total QPS : {results['avg_total_qps']:>10,.1f}")
print()
print(f"  Peak Read  QPS    : {results['peak_read_qps']:>10,.1f}  <-- design your read path for this")
print(f"  Peak Write QPS    : {results['peak_write_qps']:>10,.1f}  <-- design your write path for this")
print(f"  Peak Total QPS    : {results['peak_total_qps']:>10,.1f}")
print()
print(f"  Read/Write Ratio  : {results['read_write_ratio']:>10.1f}x  <-- heavy read bias → caching is critical")
Output
=== QPS Estimation: Twitter-like App ===
Average Read QPS : 17,361.1
Average Write QPS : 3,472.2
Average Total QPS : 20,833.3
Peak Read QPS : 52,083.3 <-- design your read path for this
Peak Write QPS : 10,416.7 <-- design your write path for this
Peak Total QPS : 62,500.0
Read/Write Ratio : 5.0x <-- heavy read bias → caching is critical
The 86,400 Anchor
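Memorise this: one day = 86,400 seconds. In interviews, round it to 100,000 for faster mental math. That overstates the length of the day by about 16%, so the QPS you compute comes out roughly 14% low; your peak multiplier comfortably absorbs the difference, and the arithmetic stays clean. Interviewers care about your reasoning process, not your arithmetic precision.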
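To see what the rounding shortcut costs, the sketch below compares the exact and rounded divisors on the Twitter-like workload used above (50 million DAU, 36 requests per user per day). The helper name `average_qps` is illustrative, not part of the estimator script.

```python
SECONDS_PER_DAY = 86_400
ROUNDED_SECONDS_PER_DAY = 100_000  # mental-math shortcut


def average_qps(daily_requests: int, seconds_per_day: int = SECONDS_PER_DAY) -> float:
    """Average requests per second, assuming flat traffic over one day."""
    return daily_requests / seconds_per_day


# Workload from the example above: 50M DAU x (30 reads + 6 writes).
daily_requests = 50_000_000 * 36

exact = average_qps(daily_requests)
rounded = average_qps(daily_requests, ROUNDED_SECONDS_PER_DAY)
shortfall_pct = (exact - rounded) / exact * 100

print(f"Exact   (86,400 s) : {exact:>10,.0f} QPS")
print(f"Rounded (100,000 s): {rounded:>10,.0f} QPS")
print(f"Rounded estimate is {shortfall_pct:.1f}% below the exact figure")
```

Because the shortcut errs on the low side, it is worth rounding the final QPS up rather than down when you state it.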
Production Insight
Using registered users instead of DAU is the #1 estimation error.
Always ask: 'What percentage of users are active daily?'
Rule: DAU = total users × engagement rate — never skip this step.
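A minimal sketch of that rule, assuming a hypothetical app with 10 million registered users and the 10–20% engagement range mentioned earlier. The function name `estimate_dau` and the sample numbers are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def estimate_dau(registered_users: int, engagement_rate: float) -> int:
    """DAU = registered users x fraction of them active on a given day."""
    if not 0.0 <= engagement_rate <= 1.0:
        raise ValueError("engagement_rate must be between 0 and 1")
    return round(registered_users * engagement_rate)


registered = 10_000_000      # hypothetical registered user count
actions_per_user = 27        # per-user daily requests from the Step 2 example

# What you get if you (wrongly) treat every registered user as active.
naive_qps = registered * actions_per_user / SECONDS_PER_DAY
print(f"Using registered users: {naive_qps:,.0f} avg QPS")

for rate in (0.10, 0.20):
    dau = estimate_dau(registered, rate)
    qps = dau * actions_per_user / SECONDS_PER_DAY
    print(f"Engagement {rate:.0%}: DAU {dau:,} -> {qps:,.0f} avg QPS")
```

Skipping the engagement step inflates the estimate by 5x to 10x in this range, which is why it is worth asking for the active-user percentage before doing any other arithmetic.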
Key Takeaway
Average QPS = (DAU × actions) ÷ 86,400.
Peak QPS = average × 2-5x.
The formula is simple; the assumptions are hard.

Power of Two Reference Table — Mental Math for Storage and Bandwidth

Computers operate in binary, so memory, disk, and network capacities are traditionally expressed as powers of two. Knowing these conversions lets you switch between bytes, kilobytes, megabytes, and terabytes in your head, which is essential for quick back-of-envelope calculations during system design interviews or production capacity planning.

Powers of Two Table

| Unit | Abbreviation | Value | Exact Bytes | Common Rounding |
|------|--------------|-------|-------------|-----------------|
| Kilobyte | KB | 2^10 | 1,024 | 1,000 (SI) |
| Megabyte | MB | 2^20 | 1,048,576 | 1,000,000 |
| Gigabyte | GB | 2^30 | 1,073,741,824 | 1,000,000,000 |
| Terabyte | TB | 2^40 | 1,099,511,627,776 | 1,000,000,000,000 |
| Petabyte | PB | 2^50 | 1,125,899,906,842,624 | 1,000,000,000,000,000 |

When to use which? When calculating storage requirements for databases, always use the binary (powers of two) values because memory addressing and file system allocation are binary. For network bandwidth, service providers often use SI values (1 Gbps = 1,000,000,000 bps). The safest approach in interviews is to use the binary definition but round loosely—convert 1 GB ≈ 1e9 bytes for quick division.

Memory aid: Write down the exponent: 10 = KB, 20 = MB, 30 = GB, 40 = TB. Each step multiplies by 1,024 ≈ 1,000. So to go from bytes to MB, divide by 10^6 (roughly) or 2^20 (exactly). Think of it as a slider: every 10 you add to the exponent moves you up one unit.
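The conversion can be sketched as a small helper. This is an illustration, not a library API: `humanize` and `rounding_gap_pct` are hypothetical names, and the binary labels use the IEC spellings (KiB, MiB, ...) to keep them visibly distinct from the SI ones, whereas the table above uses KB/MB for the binary values.

```python
BINARY_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
DECIMAL_UNITS = ["B", "kB", "MB", "GB", "TB", "PB"]


def humanize(num_bytes: float, binary: bool = True) -> str:
    """Format a byte count in binary (1024-based) or SI (1000-based) units."""
    base = 1024 if binary else 1000
    units = BINARY_UNITS if binary else DECIMAL_UNITS
    value = float(num_bytes)
    index = 0
    # Step up one unit for every factor of `base`, stopping at the largest unit.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


def rounding_gap_pct(unit_steps: int) -> float:
    """How much larger 1024**n is than 1000**n, as a percentage."""
    return (1024 ** unit_steps / 1000 ** unit_steps - 1) * 100


yearly_bytes = 3.75e14  # the 375 TB/year figure from the Quick Answer
print(humanize(yearly_bytes, binary=False))
print(humanize(yearly_bytes, binary=True))

for steps, name in enumerate(["KB", "MB", "GB", "TB"], start=1):
    print(f"{name}: binary is {rounding_gap_pct(steps):.1f}% larger than SI")
```

The gap grows with each unit step, so the "round 1,024 to 1,000" shortcut is safest for kilobytes and megabytes and loosest at terabyte scale.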

Interview Math Shortcut
Round 2^10 to 1,000 for quick divisions. The 375 TB/year storage figure from the Quick Answer is about 3.75 × 10^14 bytes in decimal units; 375 × 2^40 is about 4.12 × 10^14, roughly 10% more. Interviewers rarely check exact digits; they check your approach.
Production Insight
Always clarify with your infrastructure team whether they use binary or SI units in monitoring alerts. Mixing up KiB (2^10) and kB (10^3) causes a 2.4% error, and the gap widens to about 10% at terabyte scale, which adds up in long-term projections.
Key Takeaway
Memorize: 2^10 = 1,024 ≈ 1K; 2^20 ≈ 1M; 2^30 ≈ 1G; 2^40 ≈ 1T. Use binary for storage, SI for bandwidth unless specified otherwise.

Latency Numbers Every Programmer Should Know — From L1 Cache to Cross-Datacenter Round Trip

Understanding latency at different levels of the hardware stack is critical for making architectural trade-offs. These numbers, popularised by Jeff Dean and updated for modern hardware, give you an intuition for where to invest optimization effort.

Latency Reference Table (Approximate, 2026 hardware)

| Operation | Latency | Relative Scale |
|-----------|---------|----------------|
| L1 cache reference | 0.5 ns | 1x |
| L2 cache reference | 7 ns | 14x |
| L3 cache reference | 15 ns | 30x |
| Main memory access (RAM) | 100 ns | 200x |
| SSD random read | 150 μs | 300,000x |
| HDD random read | 10 ms | 20,000,000x |
| Network packet within datacenter (round trip) | 500 μs | 1,000,000x |
| Datacenter to region (coast-to-coast) | 40 ms | 80,000,000x |
| Cross-datacenter round trip (active-passive) | 100–500 ms | 200,000,000–1,000,000,000x |

Key takeaways from the table
  • Memory is ~100,000x faster than HDD and ~1,500x faster than SSD. If your read working set fits in RAM, you can serve it orders of magnitude faster than from disk.
  • Network within the same datacenter costs about 500μs round trip. That is the time of a million L1 cache references, or on the order of a million CPU cycles. Every remote call you avoid saves a huge amount of time.
  • Cross-datacenter calls are so slow that they dominate response times. Design for data locality.

How to use these numbers: When you see a design that makes synchronous HTTP calls across regions, you can immediately flag it as high latency. When you propose caching in Redis, you're replacing a 10ms HDD read with a Redis GET that costs about one in-datacenter round trip (~500μs per the table), which is roughly a 20x improvement.
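One way to apply the table is to add up the operations on a request path. The sketch below is a rough budget calculator using the reference latencies above; the dictionary keys, the `request_latency_ms` helper, and the three sample paths are illustrative assumptions, not measurements.

```python
# Reference latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "l1_cache": 0.5,
    "ram": 100,
    "ssd_random_read": 150_000,
    "datacenter_round_trip": 500_000,
    "hdd_random_read": 10_000_000,
    "coast_to_coast": 40_000_000,
}


def request_latency_ms(steps: list[tuple[str, int]]) -> float:
    """Sum the latency of sequential operations; each step is (operation, count)."""
    total_ns = sum(LATENCY_NS[operation] * count for operation, count in steps)
    return total_ns / 1_000_000


# Hypothetical request paths for comparison.
paths = {
    "cache hit (network + RAM)": [("datacenter_round_trip", 1), ("ram", 1)],
    "DB on SSD (network + SSD)": [("datacenter_round_trip", 1), ("ssd_random_read", 1)],
    "DB on HDD (network + HDD)": [("datacenter_round_trip", 1), ("hdd_random_read", 1)],
    "sync call to another coast": [("coast_to_coast", 1), ("ssd_random_read", 1)],
}

for name, steps in paths.items():
    print(f"{name:<28}: {request_latency_ms(steps):7.2f} ms")
```

The model ignores queueing, serialization, and parallel fan-out, so treat the totals as lower bounds for comparing designs rather than predictions.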

Production Insight
In production, monitor actual latencies rather than relying solely on these reference numbers. Cloud provider network latency can vary 2x between availability zones. Always measure before optimizing.
Key Takeaway
Memory access ~100ns, SSD ~150μs, HDD ~10ms, datacenter network round trip ~500μs. Use these to judge when in-memory caching, SSDs, or cross-region calls are justified.

Availability Numbers Table — Annual Downtime for 99% to 99.999%

Uptime percentages translate directly into how much downtime your users experience per year. Understanding this table helps you set realistic SLAs and choose the right redundancy strategy.

Availability vs. Annual Downtime

| Availability Level | Annual Downtime | Typical Deployment |
|--------------------|-----------------|--------------------|
| 99% (2 nines) | 3.65 days | Single server, no redundancy |
| 99.9% (3 nines) | 8.76 hours | Single server with monitoring & recovery |
| 99.99% (4 nines) | 52.56 minutes | Multi-AZ, load balancer, automated failover |
| 99.999% (5 nines) | 5.26 minutes | Multi-region active-active, redundant all layers |
| 99.9999% (6 nines) | 31.54 seconds | Global distribution with instant failover, extreme cost |

Cost vs. benefit: Each extra nine of availability roughly doubles infrastructure cost. For most consumer startups, 99.9% (3 nines) is acceptable — you can lose 8 hours per year while retaining user trust. For financial systems, 99.99% (4 nines) is the baseline. Promising 5 nines requires a full multi-region design with continuous traffic redirection.

Practical rule of thumb: If your system's peak QPS estimate is a few thousand, a single DB instance with a warm standby can hit 3–4 nines. If your peak QPS is tens of thousands and you need 4+ nines, you must architect for component failure from the start — meaning multiple AZs, stateless application servers, and database replication with automatic failover.
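The downtime column follows directly from the availability percentage. A minimal sketch, assuming a 365-day year as the table does; the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, matching the table's 365-day year


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours per year a system may be down while still meeting the target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


def format_downtime(hours: float) -> str:
    """Pick the most readable unit for a downtime budget."""
    if hours >= 24:
        return f"{hours / 24:.2f} days"
    if hours >= 1:
        return f"{hours:.2f} hours"
    if hours * 60 >= 1:
        return f"{hours * 60:.2f} minutes"
    return f"{hours * 3600:.2f} seconds"


for target in (99.0, 99.9, 99.99, 99.999, 99.9999):
    budget = annual_downtime_hours(target)
    print(f"{target:>8}% -> {format_downtime(budget)} per year")
```

Dividing the yearly budget by 12 gives a monthly error budget, which is usually the more practical number to track against an SLA.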

Production Insight
Availability promises drive capacity planning. A 4-nines SLA means you must design for the failure of any single AZ, which directly affects how you provision static vs. autoscaling capacity. The surviving AZs must carry full peak QPS on their own: with two AZs that means provisioning 2x peak capacity in total, with three AZs 1.5x.
Key Takeaway
99% = 3.65 days/yr; 99.9% = 8.76 hrs; 99.99% = 52.6 min; 99.999% = 5.26 min. Don't promise more nines than your architecture can afford or deliver.

Peak QPS vs. Average QPS — Why Average Will Get You Fired

If you provision infrastructure for average QPS, your system will collapse during every spike — which is exactly when your users need you most. The Super Bowl, Black Friday, a viral tweet, a product launch: all of these are predictable spike patterns, and none of them look like an average day.

Traffic follows a diurnal pattern (fancy word for 'it changes with the time of day'). For a US-based consumer app, traffic is lowest at 3–5am Eastern and peaks between 7–9pm Eastern. The ratio between peak hour and the overnight trough can easily be 10:1 or higher.

There are two types of peak you must plan for separately. The first is the predictable daily peak — use a 2x–3x multiplier above your daily average. The second is the burst peak — think of a celebrity tweeting your app link or a DDoS. This can be 10x–50x and you handle it with rate limiting and autoscaling, not by provisioning for it statically.

The multiplier you choose also informs your architectural decisions. At 2x peak you might be fine with a single primary database with replicas. At 10x peak you're looking at tiered caching, read replicas, and queue-based write buffering. The number drives the architecture — not the other way around.
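To make "the number drives the architecture" concrete, the sketch below turns a QPS figure into a server count. The per-server capacity of 2,000 QPS and the 70% target utilization are assumptions chosen for illustration; in practice both come from load tests of your own service.

```python
import math


def servers_needed(
    target_qps: float,
    qps_per_server: float,
    target_utilization: float = 0.7,
) -> int:
    """Servers required to carry target_qps while each stays under target_utilization."""
    if qps_per_server <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity and utilization must be positive")
    return math.ceil(target_qps / (qps_per_server * target_utilization))


AVERAGE_QPS = 20_833          # Twitter-like example from the estimator above
QPS_PER_SERVER = 2_000        # hypothetical, measure this with a load test

scenarios = {
    "average (1x)": 1,
    "daily peak (3x), provision statically": 3,
    "burst (10x), autoscaling ceiling": 10,
}

for label, multiplier in scenarios.items():
    count = servers_needed(AVERAGE_QPS * multiplier, QPS_PER_SERVER)
    print(f"{label:<40}: {count:>4} servers")
```

The gap between the daily-peak count and the burst count is the range your autoscaling group has to cover; rate limiting protects you while new instances start.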

io_thecodeforge_traffic_pattern_simulator.py (Python)
# io.thecodeforge Traffic Pattern Simulator: shows how QPS changes over 24 hours
# Shows WHY designing for average QPS is dangerous

import math

def hourly_traffic_multiplier(hour_of_day: int) -> float:
    """
    Returns traffic at a given hour as a multiple of the daily average.
    Toy model of a consumer app's diurnal (time-of-day) pattern with an evening peak.
    Hour 0 = midnight, Hour 12 = noon, Hour 20 = 8pm (peak).
    """
    # Cosine wave centred on 8pm (hour 20), cubed so the evening peak is sharp
    phase_shift = (hour_of_day - 20) * (math.pi / 12)
    raw_wave = math.cos(phase_shift)                    # -1 to +1
    # Constants are chosen so the 24 hourly values average 1.0 (so AVERAGE_QPS
    # really is the daily average) and the 8pm peak is 3.0x
    return (1 + 4 * (1 + raw_wave) ** 3) / 11

AVERAGE_QPS = 20_833  # from our previous estimation for the Twitter-like app

print("Hour | Multiplier | Actual QPS  | Safe to provision at average? ")
print("-" * 65)

overloaded_hours = 0
for hour in range(24):
    multiplier = hourly_traffic_multiplier(hour)
    actual_qps = AVERAGE_QPS * multiplier
    over_capacity = actual_qps > AVERAGE_QPS
    if over_capacity:
        overloaded_hours += 1
    danger_flag = "⚠ OVERLOADED" if over_capacity else "  OK"
    print(f"  {hour:02d}:00 |  {multiplier:5.2f}x     | {actual_qps:>10,.0f}  | {danger_flag}")

print()
peak_hour_qps = AVERAGE_QPS * hourly_traffic_multiplier(20)
print(f"Average QPS (baseline):  {AVERAGE_QPS:>10,.0f}")
print(f"Peak hour QPS (8pm):     {peak_hour_qps:>10,.0f}")
print(f"Peak-to-average ratio:   {peak_hour_qps / AVERAGE_QPS:>10.1f}x")
print()
print("Conclusion: provisioning for average QPS means your system is")
print(f"overloaded for {overloaded_hours} of 24 hours every single day.")
Output
Hour | Multiplier | Actual QPS | Safe to provision at average?
-----------------------------------------------------------------
00:00 | 1.32x | 27,462 | ⚠ OVERLOADED
01:00 | 0.82x | 17,005 | OK
02:00 | 0.45x | 9,470 | OK
03:00 | 0.24x | 4,978 | OK
04:00 | 0.14x | 2,841 | OK
05:00 | 0.10x | 2,084 | OK
06:00 | 0.09x | 1,912 | OK
07:00 | 0.09x | 1,894 | OK
08:00 | 0.09x | 1,894 | OK
09:00 | 0.09x | 1,894 | OK
10:00 | 0.09x | 1,912 | OK
11:00 | 0.10x | 2,084 | OK
12:00 | 0.14x | 2,841 | OK
13:00 | 0.24x | 4,978 | OK
14:00 | 0.45x | 9,470 | OK
15:00 | 0.82x | 17,005 | OK
16:00 | 1.32x | 27,462 | ⚠ OVERLOADED
17:00 | 1.90x | 39,582 | ⚠ OVERLOADED
18:00 | 2.45x | 51,117 | ⚠ OVERLOADED
19:00 | 2.85x | 59,454 | ⚠ OVERLOADED
20:00 | 3.00x | 62,499 | ⚠ OVERLOADED
21:00 | 2.85x | 59,454 | ⚠ OVERLOADED
22:00 | 2.45x | 51,117 | ⚠ OVERLOADED
23:00 | 1.90x | 39,582 | ⚠ OVERLOADED
Average QPS (baseline): 20,833
Peak hour QPS (8pm): 62,499
Peak-to-average ratio: 3.0x
Conclusion: provisioning for average QPS means your system is
overloaded for 9 of 24 hours every single day.
Watch Out: The 'Average' Trap
Presenting average QPS as your design target in a system design interview is a red flag for experienced interviewers. Always state your peak multiplier explicitly and justify it: 'I'll use 3x because this is a consumer app with strong evening traffic patterns.' That one sentence shows you understand real production systems.
Production Insight
Provisioning for average QPS creates a daily overload window of about 9 hours in this model.
Each hour above capacity adds latency, retries, and dropped connections.
Rule: design for peak, autoscale for bursts.
Key Takeaway
Average QPS is a cost metric, not a capacity metric.
Always state peak QPS explicitly with multiplier.
Your architecture is only as strong as your peak assumption.

Storage Growth Rate — QPS Has a Write Side Effect

QPS estimation doesn't stop at request throughput. Every write request creates data, and that data accumulates. If you only estimate QPS for request handling but ignore storage growth, you'll build a system that handles traffic fine on day one but runs out of disk space on day 90.

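The storage calculation is a direct extension of write QPS. Take your peak write QPS, multiply by the average payload size per write, and you get bytes per second written to disk. Multiply that by seconds per day, then by 365, and you know your one-year raw storage requirement. Using peak write QPS here is deliberately conservative: data actually accumulates at the average write rate, so the result is an upper bound that overstates growth by roughly the peak multiplier. Then add 20–30% overhead for indexes and metadata, and multiply by the replication factor.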

This calculation also drives replication decisions. If your write QPS is 10,000 and you're running 3 replicas, each replica must sustain 10,000 writes per second. That's a very different machine spec from a node that only serves cached reads: every replica has to absorb the full write stream on top of whatever reads it serves.

In interviews, connecting QPS → storage growth → replication factor is the sign of someone who's actually run production systems. It shows you're thinking about the full lifecycle of data, not just the happy path.

io_thecodeforge_storage_growth_estimator.py (Python)
# io.thecodeforge Storage Growth Estimator — connects write QPS to long-term storage needs
# This is the calculation interviewers want to see AFTER you state your write QPS

def estimate_storage_growth(
    peak_write_qps: float,
    avg_record_size_bytes: int,
    replication_factor: int = 3,
    index_overhead_pct: float = 0.30,
    projection_years: int = 3
) -> None:
    """
    Given a write QPS, calculates raw and replicated storage growth over time.
    replication_factor: how many copies of each write are stored (e.g., 3 for most distributed DBs)
    index_overhead_pct: extra space consumed by DB indexes (30% is a safe default for B-tree indexes)
    """
    SECONDS_PER_DAY  = 86_400
    BYTES_PER_GB     = 1_073_741_824
    BYTES_PER_TB     = BYTES_PER_GB * 1_024

    raw_bytes_per_second = peak_write_qps * avg_record_size_bytes
    replicated_bytes_per_second = raw_bytes_per_second * replication_factor
    total_bytes_per_second = replicated_bytes_per_second * (1 + index_overhead_pct)

    print(f"=== Storage Growth Estimation ===")
    print(f"Peak Write QPS       : {peak_write_qps:>12,.0f} writes/sec")
    print(f"Avg Record Size      : {avg_record_size_bytes:>12,} bytes")
    print(f"Replication Factor   : {replication_factor:>12}x")
    print(f"Index Overhead       : {index_overhead_pct*100:>11.0f}%")
    print()
    print(f"Raw write throughput : {raw_bytes_per_second/1_000_000:>11.1f} MB/sec")
    print(f"Total disk write rate: {total_bytes_per_second/1_000_000:>11.1f} MB/sec  (with replication + indexes)")
    print()

    print(f"{'Period':<12} | {'Raw Storage':>14} | {'With Replication + Index':>24}")
    print("-" * 58)

    for period_label, seconds in [
        ("1 Day",   SECONDS_PER_DAY),
        ("1 Month", SECONDS_PER_DAY * 30),
        ("1 Year",  SECONDS_PER_DAY * 365),
        (f"{projection_years} Years", SECONDS_PER_DAY * 365 * projection_years),
    ]:
        raw_total    = raw_bytes_per_second       * seconds
        on_disk_total = total_bytes_per_second    * seconds

        raw_str     = f"{raw_total    / BYTES_PER_TB:.2f} TB" if raw_total     > BYTES_PER_TB else f"{raw_total    / BYTES_PER_GB:.1f} GB"
        on_disk_str = f"{on_disk_total / BYTES_PER_TB:.2f} TB" if on_disk_total > BYTES_PER_TB else f"{on_disk_total / BYTES_PER_GB:.1f} GB"

        print(f"{period_label:<12} | {raw_str:>14} | {on_disk_str:>24}")

    print()
    print("Architecture implications:")
    yearly_tb = (total_bytes_per_second * SECONDS_PER_DAY * 365) / BYTES_PER_TB
    if yearly_tb < 10:
        print("  → Single-region storage likely sufficient for year 1")
    elif yearly_tb < 100:
        print("  → Plan for sharding or partitioning by end of year 1")
    else:
        print("  → Distributed storage (e.g. Cassandra, S3 + data lake) required from day 1")


# Using the write QPS from our Twitter-like example: 10,417 peak writes/sec
# Tweet record: ~300 bytes (tweet text + user_id + timestamp + metadata)
estimate_storage_growth(
    peak_write_qps=10_417,
    avg_record_size_bytes=300,
    replication_factor=3,
    index_overhead_pct=0.30,
    projection_years=3
)
Output
=== Storage Growth Estimation ===
Peak Write QPS : 10,417 writes/sec
Avg Record Size : 300 bytes
Replication Factor : 3x
Index Overhead : 30%
Raw write throughput : 3.1 MB/sec
Total disk write rate: 12.2 MB/sec (with replication + indexes)
Period | Raw Storage | With Replication + Index
----------------------------------------------------------
1 Day | 251.5 GB | 980.7 GB
1 Month | 7.37 TB | 28.73 TB
1 Year | 89.63 TB | 349.57 TB
3 Years | 268.90 TB | 1048.71 TB
Architecture implications:
→ Distributed storage (e.g. Cassandra, S3 + data lake) required from day 1
Interview Gold: The Storage Chain
Interviewers love when you say: 'My write QPS is 10,000/sec at peak. At 300 bytes per record with 3x replication and 30% index overhead, I'm writing about 12 MB/sec to disk, which is roughly 375 TB per year in decimal terabytes (closer to 350 in binary terabytes). That means I need a distributed storage solution from day one, which rules out a single Postgres instance.' That chain of reasoning, from QPS to architecture, is what separates senior candidates from junior ones.
Production Insight
10,000 writes/sec at peak with 300-byte records is 3 MB/s of raw writes.
With 3 replicas and 30% index overhead, that becomes about 12 MB/s on disk, roughly 375 TB/year: distributed storage from day 1.
Rule: never commit to a single-node database without running this calc.
Key Takeaway
Write QPS × record size × replication × (1 + index overhead) = storage growth.
Storage drives architecture: <10 TB/year = single region; 10–100 TB/year = plan for sharding; >100 TB/year = distributed storage from day 1.
Connect QPS to storage every time in interviews.
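The estimates in this section feed peak write QPS into the yearly total, which gives a conservative ceiling. Data accumulates at the average write rate, so it can help to quote both numbers. The sketch below compares them for the Twitter-like example, using decimal terabytes; `yearly_storage_tb` is an illustrative helper, not part of the estimator script.

```python
SECONDS_PER_YEAR = 86_400 * 365
BYTES_PER_TB = 10 ** 12  # decimal terabytes, for easy mental math


def yearly_storage_tb(
    write_qps: float,
    record_bytes: int,
    replication_factor: int = 3,
    index_overhead_pct: float = 0.30,
) -> float:
    """Yearly on-disk growth: write QPS x record size x replication x (1 + index overhead)."""
    bytes_per_second = (
        write_qps * record_bytes * replication_factor * (1 + index_overhead_pct)
    )
    return bytes_per_second * SECONDS_PER_YEAR / BYTES_PER_TB


AVG_WRITE_QPS = 3_472.2     # average from the QPS estimator output
PEAK_WRITE_QPS = 10_416.7   # 3x peak from the same output
RECORD_BYTES = 300

expected = yearly_storage_tb(AVG_WRITE_QPS, RECORD_BYTES)
ceiling = yearly_storage_tb(PEAK_WRITE_QPS, RECORD_BYTES)

print(f"Expected growth (average write QPS): {expected:6.1f} TB/year")
print(f"Upper bound     (peak write QPS)   : {ceiling:6.1f} TB/year")
```

Both figures land above 100 TB/year here, so the architectural conclusion (distributed storage from the start) holds either way; the distinction matters more when the two numbers straddle a threshold.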

Bandwidth Estimation — Calculating Network Throughput from QPS

Most engineers stop at storage when estimating infrastructure from QPS, but network bandwidth is equally critical. Your servers must not only handle requests and store data — they must also push that data out to clients, replicas, and caches. Underestimating bandwidth leads to network saturation, TCP congestion, and retry storms that amplify latency.

Bandwidth formula: Bandwidth (bits per second) = QPS × average response size (bytes) × 8. For a system serving 50,000 peak read QPS with a 50KB average response (JSON payload including HTTP headers), bandwidth = 50,000 × 50,000 × 8 = 20,000,000,000 bps = 20 Gbps. That is half the line rate of a 40 Gbps NIC. Spread across 10 servers, each needs at least 2 Gbps of outbound bandwidth, which modern cloud instances can deliver, but you must request high-bandwidth instance types.

Inbound vs. outbound: For reads, the bulk is outbound (from server to client). For writes, inbound bandwidth matters (request bodies). A write-heavy system with 10,000 peak write QPS and 1KB average request body pushes 80 Mbps inbound — negligible compared to reads. Always compute both directions separately.

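Bandwidth with replicas: Internal traffic adds to the client-facing number. The primary ships every write to each replica, and on a cache miss the same 50KB payload crosses the network twice: once from the database to the application server and once from the application server to the client. With a low cache hit rate, internal traffic can rival what you send to clients, which often pushes you toward caching at the CDN level to reduce duplicate network load.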

Practical rule: Monitor bandwidth at the NIC-level metrics. If you see >70% utilization during peak, you're at risk. Add more instances or optimize payload sizes (gzip, field selection, pagination).
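The 70% rule can be checked ahead of time with a quick calculation. In the sketch below, the NIC speeds and server counts are hypothetical inputs and the helper names are illustrative; the 20 Gbps total comes from the worked example above.

```python
import math


def per_server_utilization(total_gbps: float, server_count: int, nic_gbps: float) -> float:
    """Fraction of each server's NIC consumed if traffic is spread evenly."""
    return total_gbps / server_count / nic_gbps


def min_servers_for_bandwidth(
    total_gbps: float, nic_gbps: float, max_utilization: float = 0.7
) -> int:
    """Smallest fleet that keeps every NIC at or below max_utilization."""
    return math.ceil(total_gbps / (nic_gbps * max_utilization))


TOTAL_OUTBOUND_GBPS = 20.0   # 50,000 QPS x 50KB x 8 from the example above
SERVER_COUNT = 10

for nic_gbps in (2.5, 10.0, 25.0):  # hypothetical per-instance NIC limits
    utilization = per_server_utilization(TOTAL_OUTBOUND_GBPS, SERVER_COUNT, nic_gbps)
    status = "AT RISK" if utilization > 0.7 else "ok"
    needed = min_servers_for_bandwidth(TOTAL_OUTBOUND_GBPS, nic_gbps)
    print(
        f"NIC {nic_gbps:>4} Gbps: {utilization:6.0%} per server ({status}), "
        f"need >= {needed} servers to stay under 70%"
    )
```

Even load distribution is an assumption; hot keys or uneven load balancing will push individual servers above the fleet average, so leave extra margin.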

io_thecodeforge_bandwidth_estimator.py (Python)
# io.thecodeforge Bandwidth Estimator — from QPS to network requirements

def estimate_bandwidth(
    peak_qps: float,
    avg_response_bytes: int,
    bytes_overhead_factor: float = 1.2,  # TCP/IP headers
    replication_factor_reads: int = 1,
    inbound_qps: float = 0,
    inbound_request_bytes: int = 0
) -> dict:
    """Calculate outbound and inbound bandwidth in Gbps."""
    bits_per_byte = 8
    gigabit = 1_000_000_000

    # Outbound (responses to clients + to replicas if needed)
    total_outbound_bps = peak_qps * avg_response_bytes * bytes_overhead_factor * bits_per_byte * replication_factor_reads
    # Inbound (request bodies from clients)
    inbound_bps = inbound_qps * inbound_request_bytes * bytes_overhead_factor * bits_per_byte

    return {
        "outbound_bandwidth_gbps": round(total_outbound_bps / gigabit, 2),
        "inbound_bandwidth_gbps": round(inbound_bps / gigabit, 2)
    }

# Example: 50,000 peak read QPS, 50KB avg response, 1x replication (no read replicas)
result = estimate_bandwidth(
    peak_qps=50_000,
    avg_response_bytes=50_000,
    replication_factor_reads=1
)
print(f"Outbound bandwidth: {result['outbound_bandwidth_gbps']} Gbps")
print(f"Inbound bandwidth: {result['inbound_bandwidth_gbps']} Gbps")
Output
Outbound bandwidth: 24.0 Gbps
Inbound bandwidth: 0.0 Gbps
Bandwidth vs. QPS: Payload Size Is the Multiplier
Bandwidth scales linearly with response size. If your API returns a 500KB JSON blob instead of a 50KB paginated response, bandwidth jumps 10x. Always design APIs to return only what's needed. For large objects, use CDN or pre-signed URLs.
Production Insight
Bandwidth bottlenecks often appear silently — network buffers fill, TCP windowing kicks in, and clients experience timeouts before CPU or memory alarms trigger. Always graph outbound bandwidth alongside QPS.
Key Takeaway
Bandwidth (Gbps) = (QPS × response size × 8 × overhead) / 1e9. Outbound dominates for read-heavy systems. Monitor and cache at the CDN to reduce network load.

Read/Write Split — Why Your Cache Strategy Depends on It

Most engineers estimate total QPS and stop there. That's a mistake. Read QPS and write QPS drive fundamentally different architectural decisions. If your read/write ratio is 10:1, you invest in caching and read replicas. If it's 1:1, you focus on write durability, replication lag, and partition tolerance.

Take the Twitter-like app from earlier: read QPS 52,083 at peak, write QPS 10,417 — a 5:1 ratio. That means the read path must handle 5x the throughput of the write path. With a read cache that serves 80% of reads from memory, you reduce read database QPS to 10,417 — exactly matching write QPS. Suddenly a symmetric read replica setup works.

Inversely, a write-heavy system like a monitoring metrics pipeline (where every agent writes every second) will have a 1:1 or even write-dominated ratio. There, caching reads doesn't help because most reads are dashboards querying recent data. Instead, you partition writes across time-based shards and use columnar compression.

Knowing the split isn't a nice-to-have. It's the difference between spending $50k/month on Redis instances you don't need or $50k/month on write-optimised storage you do.

io_thecodeforge_read_write_split_planner.py (Python)
# io.thecodeforge Read/Write Split Calculator – determines optimal architecture from ratio

def recommend_architecture(
    avg_read_qps: float,
    avg_write_qps: float,
    peak_multiplier: float
) -> None:
    """
    Given average read/write QPS, prints architectural recommendations.
    """
    read_write_ratio = avg_read_qps / max(avg_write_qps, 1)
    peak_read_qps = avg_read_qps * peak_multiplier
    peak_write_qps = avg_write_qps * peak_multiplier

    print(f"=== Read/Write Split Analysis ===")
    print(f"Average Read QPS : {avg_read_qps:>10,.1f}")
    print(f"Average Write QPS: {avg_write_qps:>10,.1f}")
    print(f"Ratio (Read:Write): {read_write_ratio:.1f}:1")
    print(f"Peak Read QPS    : {peak_read_qps:>10,.1f}")
    print(f"Peak Write QPS   : {peak_write_qps:>10,.1f}")
    print()

    print("Architecture Recommendations:")
    if read_write_ratio >= 10:
        print("  → Heavy read bias: Use CDN + Redis cache for reads.")
        print("  → Multiple read replicas; consider read-only shards.")
        print("  → Write path: single primary with async replication.")
    elif read_write_ratio >= 3:
        print("  → Moderate read bias: Use Redis or Memcached for hot data.")
        print("  → A few read replicas; database primary with replicas is often enough.")
        print("  → Write path: single primary with replication and backups.")
    elif read_write_ratio >= 0.5:
        print("  → Balanced: Both read and write paths need scaling.")
        print("  → Consider sharded database (e.g., Vitess, CockroachDB).")
        print("  → Caching still useful but less impact; focus on partition tolerance.")
    else:
        print("  → Write-heavy: Optimize for write throughput.")
        print("  → Use distributed database like Cassandra or ScyllaDB.")
        print("  → Queue writes (Kafka) before database to absorb bursts.")
        print("  → Caching reads is modest; consider materialized views.")

    print()
    print("Key insight: If you can cache 80% of reads,")
    print(f"  effective read QPS reduces to {peak_read_qps * 0.2:>10,.0f}")
    print(f"  which is {peak_read_qps * 0.2 / max(peak_write_qps, 1):.1f}x your write QPS.")


# Example: Twitter-like app (read 52k, write 10k peak)
recommend_architecture(
    avg_read_qps=52_083,
    avg_write_qps=10_417,
    peak_multiplier=1.0  # already peak values
)
Output
=== Read/Write Split Analysis ===
Average Read QPS :   52,083.0
Average Write QPS:   10,417.0
Ratio (Read:Write): 5.0:1
Peak Read QPS    :   52,083.0
Peak Write QPS   :   10,417.0

Architecture Recommendations:
  → Moderate read bias: Use Redis or Memcached for hot data.
  → A few read replicas; database primary with replicas is often enough.
  → Write path: single primary with replication and backups.

Key insight: If you can cache 80% of reads,
  effective read QPS reduces to     10,417
  which is 1.0x your write QPS.
The Cache Leverage Point
  • If reads = 100k/s and cache hits 80%, DB reads drop to 20k/s
  • That 20k/s matches a 20k/s write workload — symmetric architecture possible
  • If writes = 50k/s and reads are still 100k/s, even a 90% cache hit ratio leaves 10k/s of DB reads, only 0.2x the write load
  • Cache doesn't fix write bottlenecks. Use it only where ratio ≥5:1
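The arithmetic behind these bullets can be sketched in a few lines of Python. The function names `db_read_qps_after_cache` and `read_to_write_ratio` are hypothetical, and the model assumes every cache miss becomes exactly one database read while writes bypass the cache entirely.

```python
def db_read_qps_after_cache(read_qps: float, hit_ratio: float) -> float:
    """Reads that still reach the database after the cache absorbs its hits."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return read_qps * (1.0 - hit_ratio)


def read_to_write_ratio(read_qps: float, write_qps: float, hit_ratio: float) -> float:
    """Post-cache database reads per write (assumption: writes are never cached)."""
    return db_read_qps_after_cache(read_qps, hit_ratio) / max(write_qps, 1.0)


# 100k reads/s at an 80% hit ratio, compared with a 20k writes/s workload.
print(round(db_read_qps_after_cache(100_000, 0.80)))
print(round(read_to_write_ratio(100_000, 20_000, 0.80), 2))

# 100k reads/s at a 90% hit ratio, compared with a 50k writes/s workload.
print(round(read_to_write_ratio(100_000, 50_000, 0.90), 2))
```

Raising the hit ratio only shrinks the read term; the write term is untouched, which is why a cache cannot rescue a write-bound system.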
Production Insight
A 5:1 read/write ratio means you need 5x read capacity vs write.
Caching 80% of reads reduces read QPS by 80% — database load drops dramatically.
Rule: ratio >10:1? Use CDN and in-memory cache. Write-heavy (below 1:2)? Use write-optimized storage like Cassandra. Near 1:1? Shard both paths.
Key Takeaway
Read QPS drives cache and replica count.
Write QPS drives durability and replication strategy.
Never treat QPS as a single number — split it first.

Burst Traffic and Autoscaling — Handling the Unexpected Spikes

You've estimated your average QPS, applied a peak multiplier, and provisioned for sustained peak. But what about the unexpected? A viral post, a PR disaster, a partner integration activation — these can dump 10x your peak QPS in seconds. If you provision statically for that, your cloud bill becomes a joke. If you don't, your site goes down.

Your autoscaling design must account for three things: detection speed, scaling velocity, and cooldown physics. CPU-based autoscaling is too slow — by the time CPU hits 80% and new instances register, you're already in backpressure. Instead, use request queue depth or a dedicated QPS metric. Set scaling to trigger at, say, 70% of provisioned capacity, and aim to add 20% capacity every 60 seconds.
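To sanity-check a scaling velocity such as 20% per 60 seconds, a rough sketch can count how many scaling periods it takes to reach a target multiple of current capacity. The function name `periods_to_reach` is hypothetical, and the model assumes each step adds a fraction of the capacity already running (compounding) with no instance warm-up delay.

```python
def periods_to_reach(target_factor: float, step_fraction: float) -> int:
    """Count scaling periods needed to grow capacity by target_factor.

    Assumes each period adds step_fraction of the capacity already running.
    """
    if step_fraction <= 0:
        raise ValueError("step_fraction must be positive")
    capacity = 1.0  # current capacity, normalised to 1x
    periods = 0
    while capacity < target_factor:
        capacity *= 1.0 + step_fraction
        periods += 1
    return periods


PERIOD_SECONDS = 60  # one scaling step per minute, as suggested above

for factor in (2.0, 10.0):
    steps = periods_to_reach(factor, 0.20)
    print(f"{factor:>4.0f}x capacity: {steps} steps, about {steps * PERIOD_SECONDS} s")
```

Under these assumptions doubling capacity takes about four minutes and a 10x burst about thirteen, so the first minutes of a spike have to be absorbed by headroom, rate limiting, or load shedding rather than by new instances.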

Bursts that exceed sustained peak by 20% or more should trigger a different response — not more instances, but load shedding. Queue non-critical writes to Kafka, drop analytics requests, serve stale cache. Define a tiered degradation plan before you need it.

In production, we've seen teams buy 3x peak capacity after a single spike. That's wasteful. The right approach: provision for 1.5x sustained peak, autoscale to 3x within 2 minutes, and rate-limit beyond that. And always test your autoscaling with a real traffic spike before the Super Bowl.
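One way to make the tiers concrete is a small classifier that maps current load to a response. This is only a sketch: the function name `burst_response` and the tier labels are hypothetical, while the 1.5x and 3x thresholds are the ones suggested above.

```python
STATIC_HEADROOM = 1.5     # statically provisioned multiple of sustained peak
AUTOSCALE_CEILING = 3.0   # multiple of sustained peak that autoscaling may reach


def burst_response(current_qps: float, sustained_peak_qps: float) -> str:
    """Pick the tier that should be handling the current load."""
    if sustained_peak_qps <= 0:
        raise ValueError("sustained_peak_qps must be positive")
    ratio = current_qps / sustained_peak_qps
    if ratio <= STATIC_HEADROOM:
        return "static"       # already covered by provisioned capacity
    if ratio <= AUTOSCALE_CEILING:
        return "autoscale"    # add instances while headroom is consumed
    return "rate_limit"       # beyond the ceiling: shed or throttle traffic


# Hypothetical service with a sustained peak of 10,000 QPS.
for qps in (12_000, 25_000, 40_000):
    print(f"{qps:>6,} QPS -> {burst_response(qps, 10_000)}")
```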

io_thecodeforge_autoscaling_config.yaml · YAML
# io.thecodeforge Autoscaling Configuration Example (Kubernetes HPA)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: io-thecodeforge-qps-hpa  # object names allow lowercase alphanumerics, '-' and '.'; underscores are rejected
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-service
  minReplicas: 5
  maxReplicas: 30
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: 2000  # target 2k QPS per pod
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0  # scale up immediately
      policies:
      - type: Pods
        value: 5                    # add up to 5 pods per period
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 min before scaling down
      policies:
      - type: Percent
        value: 20                   # remove at most 20% pods per period
        periodSeconds: 60
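As a back-of-the-envelope check on this config, the replica count the autoscaler moves toward can be approximated with the documented HPA ratio: desired = ceil(current replicas × current metric ÷ target metric). The sketch below is a simplification. It ignores the controller's tolerance band and pod readiness handling, models only the scale-up pod cap, and the function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_qps_per_pod: float,
    target_qps_per_pod: float = 2000,  # averageValue in the config
    min_replicas: int = 5,
    max_replicas: int = 30,
    max_pods_added: int = 5,           # scaleUp policy: 5 pods per period
) -> int:
    """Approximate replica count after one scale-up evaluation period."""
    raw = math.ceil(current_replicas * current_qps_per_pod / target_qps_per_pod)
    capped = min(raw, current_replicas + max_pods_added)
    return max(min_replicas, min(max_replicas, capped))


# 30,000 QPS lands on 5 pods (6,000 QPS each): 15 pods are wanted, 10 are allowed.
print(desired_replicas(5, 6000))
# One period later the same 30,000 QPS is spread over 10 pods (3,000 QPS each).
print(desired_replicas(10, 3000))
```

With these settings the deployment tops out at 30 × 2,000 = 60,000 QPS and needs two periods to absorb a sudden 30,000 QPS load, so the pod cap per period matters as much as the target value.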
Don't Let Autoscaling Be Your Only Defense
Autoscaling has a reaction delay. If traffic doubles in 30 seconds, by the time your scaling takes effect the database has already fallen over. Always layer rate limiting, connection pooling, and circuit breakers under your autoscaling.
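Rate limiting is the layer that reacts fastest, so here is a minimal token-bucket sketch. The class name `TokenBucket` and its parameters are illustrative rather than taken from any library, and the caller supplies the clock value (for example `time.monotonic()`) so the behaviour is easy to test.

```python
class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: float, start: float = 0.0) -> None:
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst   # start full so an initial burst is admitted
        self.last = start     # timestamp of the previous refill

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_sec=2.0, burst=3.0)
# Five requests arrive in the same instant: the burst of 3 passes, the rest are rejected.
print([bucket.allow(0.0) for _ in range(5)])
# One second later two tokens have been refilled, so the next request passes.
print(bucket.allow(1.0))
```

Rejected requests would typically receive an HTTP 429 or be queued; which one is appropriate depends on whether the request is critical.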
Production Insight
Autoscaling based on CPU is too slow for burst traffic — use request queue depth.
Provision statically for 1.5x sustained peak; autoscale to 3x for bursts and rate-limit beyond that.
Rule: test autoscaling with a real spike before it matters.
Key Takeaway
Burst QPS ≠ peak QPS.
Use tiered plan: static provision + autoscale + load shed.
Never rely on autoscaling alone for sudden spikes.

The Interview Walkthrough — Tying It All Together

Let's walk through a typical system design interview problem: 'Design a URL shortener like bit.ly with 100 million daily active users.'

Step 1: Anchor on DAU. Already given: 100M DAU. No need to derive from MAU (but if they gave MAU, you'd use 10-20% to get DAU).
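If the prompt had given MAU instead, the 10-20% rule of thumb yields a range rather than a single number. A minimal sketch, where the function name `dau_range_from_mau` and the 500M MAU input are hypothetical:

```python
def dau_range_from_mau(mau: int, low: float = 0.10, high: float = 0.20) -> tuple[int, int]:
    """Return a (low, high) DAU estimate from monthly active users."""
    return round(mau * low), round(mau * high)


low_dau, high_dau = dau_range_from_mau(500_000_000)
print(f"DAU estimate: {low_dau:,} to {high_dau:,}")
```

Carrying both ends of the range through the later steps shows the interviewer how sensitive the final QPS is to this one assumption.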

Step 2: Estimate actions per user per day.
  • Users visit short URLs (read): assume each user clicks 3 links per day → 300M reads/day.
  • Users create short URLs (write): assume each user creates 0.1 short URLs per day → 10M writes/day.
  • Total daily requests: 310M.

Step 3: Average QPS.
  • Average read QPS = 300M / 86,400 ≈ 3,472.
  • Average write QPS = 10M / 86,400 ≈ 116.
  • Average total QPS ≈ 3,588.

Step 4: Peak multiplier. URL shorteners have a diurnal pattern but also viral bursts (a link goes viral). Use 3x for sustained peak and 10x for bursts (handled by autoscaling).
  • Peak read QPS ≈ 10,400.
  • Peak write QPS ≈ 350.

Step 5: Storage growth. Peak write QPS is 350; record size is ~500 bytes (URL + user ID + timestamp + metadata). With 3x replication and 30% index overhead:
  • Write rate ≈ 350 × 500 × 3 × 1.3 ≈ 682 KB/s.
  • Storage per year ≈ 682 KB/s × 86,400 × 365 ≈ 21.5 TB/year.
  • Running the peak rate all year is a deliberately conservative upper bound; at the average 116 writes/s the same formula gives about 7.1 TB/year.
  • That's moderate: a few database nodes can handle it, but you'll need to plan for sharding after year 2.

Step 6: Architecture decisions from QPS.
  • Read-heavy (30:1 ratio): use a CDN for redirects (they're cacheable), Redis for top-N popular URLs, and read replicas.
  • Write QPS is low: a single primary database with replication is fine, as long as it is tuned for writes (e.g., Aurora or PostgreSQL with streaming replication).
  • Storage growth: a distributed key-value store is not required in year 1, but plan for Vitess or Cassandra by year 3.
  • Caching: with an 80% cache hit ratio on reads, database read QPS drops to ~2,080, about 6x the peak write QPS, which a primary plus read replicas can typically absorb.

That chain — from DAU to QPS to storage to architecture — is what interviewers want to see. Practice it until it's second nature.

io_thecodeforge_url_shortener_qps.py · PYTHON
# io.thecodeforge URL Shortener QPS Estimation — Interview Practice

def url_shortener_estimate(dau: int) -> dict:
    SECONDS_IN_DAY = 86_400
    reads_per_user = 3
    writes_per_user = 0.1
    record_size = 500
    replication = 3
    index_overhead = 0.30
    peak_multiplier = 3.0

    avg_read_qps = (dau * reads_per_user) / SECONDS_IN_DAY
    avg_write_qps = (dau * writes_per_user) / SECONDS_IN_DAY
    peak_read_qps = avg_read_qps * peak_multiplier
    peak_write_qps = avg_write_qps * peak_multiplier

    storage_per_second = peak_write_qps * record_size * replication * (1 + index_overhead)
    storage_per_year = storage_per_second * SECONDS_IN_DAY * 365

    return {
        "avg_read_qps": round(avg_read_qps, 0),
        "avg_write_qps": round(avg_write_qps, 0),
        "peak_read_qps": round(peak_read_qps, 0),
        "peak_write_qps": round(peak_write_qps, 0),
        "read_write_ratio": round(avg_read_qps / max(avg_write_qps, 1), 1),
        "storage_gb_per_year": round(storage_per_year / 1e9, 1)
    }

result = url_shortener_estimate(100_000_000)
for k, v in result.items():
    print(f"{k:25s} : {v}")
Output
avg_read_qps              : 3472.0
avg_write_qps             : 116.0
peak_read_qps             : 10417.0
peak_write_qps            : 347.0
read_write_ratio          : 30.0
storage_gb_per_year       : 21352.5
The Assumption Audit
Interviewers care less about your exact numbers and more about your assumptions. Say: 'I'm assuming each active user clicks 3 URLs per day. That might be higher during a viral campaign, so I'll note that and handle bursts with autoscaling.' Then state your peak multiplier and justify it.
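A quick way to rehearse this is to vary the assumptions and see how far the answer moves. The sketch below is illustrative only: the function name `peak_read_qps` is hypothetical, and the alternative values (2 and 5 clicks per user, 2x and 5x multipliers) are made up to bracket the 3 clicks and 3x used in the walkthrough.

```python
SECONDS_IN_DAY = 86_400


def peak_read_qps(dau: int, clicks_per_user: float, peak_multiplier: float) -> float:
    """Peak read QPS for a given set of assumptions."""
    return dau * clicks_per_user / SECONDS_IN_DAY * peak_multiplier


DAU = 100_000_000
for clicks in (2, 3, 5):
    row = [round(peak_read_qps(DAU, clicks, m)) for m in (2.0, 3.0, 5.0)]
    print(f"{clicks} clicks/user -> peak read QPS at 2x/3x/5x: {row}")
```

The spread between the most and least aggressive combination is roughly 6x, which is a useful number to say out loud when justifying headroom.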
Production Insight
Interviewers test your assumptions more than your arithmetic.
Stating 'I'll use 15% DAU/MAU' is better than 'I assume 100M DAU.'
Rule: always justify each assumption with a brief rationale.
Architecture Decision Based on QPS Profile
If: Read QPS >> Write QPS (read:write ratio >10:1)
Use: Prioritize caching (CDN, Redis, read replicas). Write path can stay simple (primary-replica).
If: Write QPS >> Read QPS (write:read ratio >2:1)
Use: A write-optimized database (Cassandra, ScyllaDB). Queue writes to absorb bursts. Caching is less impactful.
If: Read and Write are balanced (read:write ratio 0.5:1 to 2:1)
Use: Sharded database with partition tolerance (Vitess, CockroachDB). Both paths need horizontal scaling.
If: Peak QPS exceeds capacity by >5x
Use: Rate limiting, graceful degradation, and anticipatory autoscaling. Consider async processing for non-critical writes.
● Production incident · POST-MORTEM · severity: high

The $2M Missing Cache: When Average QPS Killed the Launch

Symptom
The new feature page loaded slowly, then returned 502 errors. Database CPU hit 100% and connections were exhausted. Autoscaling triggered, but new instances came up already under backpressure.
Assumption
The engineer assumed average QPS would equal traffic during the launch hour. They took a one-hour window of beta data, divided by 3,600 seconds, and ignored that the beta had one-tenth of the launch user base.
Root cause
They computed average QPS as requests divided by seconds, but the request count came from a small beta test and was never scaled to the launch audience. They didn't apply a peak multiplier because 'average should be safe with autoscaling.' But autoscaling was configured on CPU, which hit 100% before new instances could register with the load balancer.
Fix
Added a Redis cache layer in front of the database (hit ratio 85%). Set autoscaling to trigger on request queue depth instead of CPU. Provisioned static capacity for 20,000 QPS (sustained peak) and allowed autoscaling to 60,000 QPS.
Key lesson
  • Never use average QPS as your provisioning target — always compute peak QPS with a multiplier based on traffic patterns.
  • Autoscaling based on CPU is too slow for sudden spikes; use request queue depth or request rate metrics.
  • Beta traffic patterns don't scale linearly — apply a risk multiplier when extrapolating to full launch.
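The extrapolation these lessons call for can be sketched as a single function. Everything here is illustrative: the function name `launch_peak_qps` and the beta request count are made up, and only the 10x user-base gap comes from the incident description.

```python
def launch_peak_qps(
    beta_requests: int,
    window_seconds: int,
    user_scale: float,
    peak_multiplier: float,
    risk_multiplier: float = 1.0,
) -> float:
    """Scale an observed beta request rate up to a launch-day provisioning target."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    beta_avg_qps = beta_requests / window_seconds  # what the beta actually saw
    launch_avg_qps = beta_avg_qps * user_scale     # same behaviour, full audience
    return launch_avg_qps * peak_multiplier * risk_multiplier


# Hypothetical beta: 2.4M requests in one hour, with one-tenth of the launch users.
target = launch_peak_qps(2_400_000, 3_600, user_scale=10, peak_multiplier=3.0)
print(f"Provision for roughly {target:,.0f} QPS, not the beta's {2_400_000 / 3_600:,.0f}")
```

The `risk_multiplier` argument is a placeholder for the non-linear effects the third lesson warns about; leaving it at 1.0 assumes launch users behave exactly like beta users.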
Production debug guide
Use these symptom-action pairs to rapidly identify capacity misconfigurations.
Symptom · 01
Average QPS is 5,000 but database CPU is 100% at 2 PM
Fix
Check peak hour QPS — the diurnal pattern may push actual QPS to 3x average. Review hourly metrics. If peak QPS exceeds provisioned capacity, add read replicas or cache.
Symptom · 02
Writes are slow, reads are fine
Fix
Compare write QPS vs write throughput capacity. Check if replication lag is causing write throttling. Consider partitioning (sharding) the write path.
Symptom · 03
Disk filling 3x faster than expected
Fix
Recalculate storage growth: write QPS × record size × replication factor × (1 + index overhead). Verify actual record sizes vs estimate. Increase projection if necessary.
Symptom · 04
Autoscaling never triggers during traffic spikes
Fix
Check the scaling metric configuration. CPU lags behind request rate; use request queue depth or a custom QPS metric. Set the scaling cooldown to 60 seconds to avoid flapping.
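For Symptom 01, the gap between average and peak can be measured directly from hourly request counts. A minimal sketch, assuming 24 hourly buckets are available from your metrics store; the function name `peak_to_average` and the sample day are hypothetical.

```python
def peak_to_average(hourly_requests: list[int]) -> tuple[float, float, float]:
    """Return (average QPS, peak-hour QPS, peak multiplier) for one day of hourly counts."""
    if len(hourly_requests) != 24:
        raise ValueError("expected 24 hourly buckets")
    total = sum(hourly_requests)
    if total == 0:
        raise ValueError("no traffic recorded")
    avg_qps = total / 86_400
    peak_qps = max(hourly_requests) / 3_600
    return avg_qps, peak_qps, peak_qps / avg_qps


# Hypothetical day: quiet baseline, a busy early afternoon, and a 2 PM peak hour.
hourly = [13_500_000] * 13 + [36_000_000, 54_000_000, 36_000_000, 36_000_000] + [13_500_000] * 7
avg, peak, multiplier = peak_to_average(hourly)
print(f"avg={avg:,.0f} QPS  peak={peak:,.0f} QPS  multiplier={multiplier:.1f}x")
```

An hourly bucket still averages away bursts shorter than an hour, so the measured multiplier is a floor rather than a ceiling.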
★ QPS Estimation Quick Debug Cheat Sheet
When your system behaves unexpectedly, run these commands and checks to verify your QPS assumptions are correct.
Database CPU at 100% during normal business hours
Immediate action
Check hourly QPS distribution to identify peak times.
Commands
SELECT DATE_TRUNC('hour', created_at) AS hour, COUNT(*) AS requests FROM requests GROUP BY hour ORDER BY hour;
SHOW FULL PROCESSLIST; (MySQL) or SELECT * FROM pg_stat_activity; (Postgres) to see active queries.
Fix now
Add read replicas or increase cache TTL to absorb peak reads. If peak exceeds current static capacity by >2x, provision more replicas immediately.
Write latency spikes every few hours
Immediate action
Check if writes are queued and if replication lag is accumulating.
Commands
SHOW REPLICA STATUS\G (MySQL 8.0.22+; SHOW SLAVE STATUS\G on older versions) or SELECT * FROM pg_stat_replication; (Postgres) for lag.
Check disk IOPS: iostat -x 1 (Linux) or CloudWatch metrics.
Fix now
Relax synchronous commit if synchronous replication is the bottleneck (this trades some durability for latency), or increase write throughput by partitioning the write table.
Disk space warning after deploying new feature
Immediate action
Recalculate storage growth based on actual write QPS and record size.
Commands
SELECT pg_database_size(current_database()); (Postgres) or SELECT table_schema AS 'Database', SUM(data_length + index_length) / 1024 / 1024 AS 'Size (MB)' FROM information_schema.tables GROUP BY table_schema; (MySQL)
Check logs for unexpected large writes or indexing overhead.
Fix now
Add disk space or enable auto-scaling for storage. If growth is >10% per week, plan to shard within three months.
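To see why 10% weekly growth justifies a three-month sharding deadline, a rough sketch can project when a volume fills. The function name `weeks_until_full` and the disk sizes are hypothetical, and the model assumes growth compounds at a constant weekly rate.

```python
def weeks_until_full(used_gb: float, capacity_gb: float, weekly_growth: float) -> int:
    """Weeks until usage reaches capacity at a constant compounding growth rate."""
    if weekly_growth <= 0:
        raise ValueError("weekly_growth must be positive")
    weeks = 0
    while used_gb < capacity_gb:
        used_gb *= 1.0 + weekly_growth
        weeks += 1
    return weeks


# Hypothetical volume: 400 GB used out of 1,000 GB, growing 10% per week.
print(weeks_until_full(400, 1_000, 0.10), "weeks until the disk is full")
```

In this example a disk that is only 40% full runs out in about ten weeks, inside the three-month window.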