Advanced 12 min · March 06, 2026

How to Answer System Design Q

System Design Interview: Fan-out OOM & Duplicate Tweets

Q: How long should I spend on requirements in a system design interview?

Five minutes maximum — but make them count. The goal is to lock in 2-3 functional requirements (which flows are in scope), 3 specific NFRs (QPS, latency SLA, availability target), and one explicit consistency model decision. After that, move to high-level design. Spending more than 5 minutes on requirements signals analysis paralysis, not thoroughness — interviewers have a fixed 45 minutes and need to see the design, not just the scoping conversation.

Q: Should I memorize specific architectures like 'design Twitter' or 'design YouTube'?

Don't memorize architectures — memorize patterns. The fan-out problem in Twitter is the same shape as the notification delivery problem in any social app, or the event propagation problem in a multiplayer game, or the price update broadcast problem in a trading system. If you internalize the underlying pattern — write amplification vs. read amplification as a function of follower distribution — you can derive the solution for any prompt. Memorized answers fall apart the moment the interviewer adds a constraint you didn't rehearse for, like 'now add topics' or 'now make this work offline-first.' Patterns are durable; memorized diagrams are not.

Q: What's the difference between a 'hire' and a 'strong hire' in a system design round?

A 'hire' candidate completes a reasonable design that addresses the core flows and mentions the right trade-offs when prompted by the interviewer. A 'strong hire' candidate proactively surfaces edge cases and bottlenecks before being asked, quantifies every design decision against specific numbers, demonstrates that they've seen these problems break in production rather than just read about them in a textbook, and closes with a crisp one-paragraph summary that a non-technical stakeholder could understand. The difference is almost entirely in the depth and proactiveness of trade-off reasoning and the clarity of the closing summary — not the final diagram on the whiteboard.

Fan-out worker OOM caused duplicate timeline entries and 12-min Kafka lag.

Naren Founder & Principal Engineer

20+ years shipping production code across the stack, with years spent interviewing engineers. Lessons pulled from things that broke in production.

Last updated: July 18, 2026

Before you start ⏱ 30 min

✓ Deep production experience
✓ Understanding of internals and trade-offs
✓ Experience debugging complex systems


⚡Quick Answer

System design interviews test structured ambiguity navigation, not knowledge recall — process matters more than the final diagram
Four phases: clarify requirements (5 min), high-level design (15 min), deep dive (20 min), trade-offs and close (5 min)
Never draw a component until you have 3 specific NFRs with real numbers: QPS, latency SLA, availability target
Hybrid fan-out (push for normal users, pull for celebrities) is the production answer to social feed scaling
Proactively surface your own design weaknesses before the interviewer finds them — this is the highest-signal senior behavior
Biggest mistake: jumping to Kafka in the first 60 seconds without numbers to justify it — interviewers read this as guessing

✦ Definition~90s read

What Does This Article Cover?

This article walks through a realistic system design interview for a Twitter-like feed service, focusing on two specific failure modes that separate senior candidates from juniors: fan-out out-of-memory (OOM) errors and duplicate tweets. Fan-out is the pattern where a user's tweet is pushed into the timelines of all their followers; the OOM problem occurs when a celebrity with millions of followers triggers a massive write storm that can crash a single server or queue.

Duplicate tweets arise from retry logic, network partitions, or idempotency gaps in the write path. The article treats these not as edge cases but as the core test of your ability to reason about distributed systems under load. It covers the full 45-minute interview arc: clarifying requirements, sketching a high-level design with API contracts, deep-diving into the hard problems (fan-out OOM mitigation via hybrid push-pull, deduplication via idempotency keys or bloom filters), and closing with trade-offs like consistency vs. availability.

It also includes a Phase 0 on capacity estimation—because designing without numbers is like debugging without logs. Real-world references include Twitter's early fan-out issues, Redis cluster limits, and Kafka partitioning strategies. If you're preparing for senior+ system design rounds at FAANG or similar, this article gives you the concrete failure modes and decision frameworks that interviewers actually care about.

Plain-English First

Imagine someone walks up to you and says: 'Design a city.' You wouldn't just start drawing roads randomly — you'd ask how many people live there, what they need, and what the budget is. System design interviews work exactly the same way. The interviewer hands you a blank whiteboard and says 'design YouTube' — and what they're actually testing is whether you can think like an architect before you start laying bricks. The framework in this article is your blueprint for that conversation.

System design interviews are the highest-signal round in any senior engineering hiring process. They separate engineers who can implement from engineers who can think at scale. A candidate who writes flawless LeetCode solutions can still completely bomb a system design round because the skills are fundamentally different — one is puzzle-solving, the other is structured ambiguity navigation.

The problem isn't that engineers don't know distributed systems. Most senior candidates have read about consistent hashing, Kafka, and sharding. The problem is they don't know how to structure a 45-minute conversation into a coherent narrative that demonstrates both technical depth and product instinct simultaneously.

By the end of this article you'll have a repeatable, time-boxed framework you can apply to any system design prompt. You'll know exactly what to say in each phase, which trade-offs to surface proactively, how to handle follow-up pressure, and what signals separate a 'strong hire' from a 'hire' at FAANG-level bars.

Why Fan-out OOM & Duplicate Tweets Are the Real System Design Test

System design interviews evaluate your ability to build scalable, fault-tolerant distributed systems under realistic constraints. The core mechanic is not memorizing architectures but reasoning through trade-offs: consistency vs. availability, latency vs. throughput, and cost vs. reliability. For fan-out, you must decide between push (write-time fan-out) and pull (read-time fan-out), each with distinct memory and latency profiles. Duplicate tweets expose idempotency failures: when a retry fires and the write path has no dedup key, the same tweet is written to a timeline twice, corrupting the user experience and downstream analytics.

In practice, fan-out OOM occurs when a celebrity tweet triggers a push to millions of followers, overwhelming the fan-out service’s heap. The key property is that fan-out size is O(followers) per tweet, not O(1). Without backpressure or batching, a single hot tweet can exhaust memory in seconds. Duplicate tweets arise from at-least-once delivery semantics in message queues: a producer retry after a timeout may succeed twice, and if the consumer lacks idempotency (e.g., a unique tweet ID check before insert), duplicates persist.

Use push fan-out for read-heavy systems (e.g., Twitter timelines) where timeline reads must stay fast and most authors have modest follower counts. Use pull fan-out for authors with very large follower counts, or when write throughput dominates. Idempotency keys are non-negotiable for any system with retries. The key must be stable across retries (for example, the tweet ID for a tweet write, or tweet ID plus follower ID for a timeline entry), never a value regenerated on each attempt. These decisions matter because a single design flaw, like unbounded fan-out or missing dedup, can cascade well beyond the component where it started.

⚠ Fan-out ≠ Scalability

Push fan-out reduces read latency but shifts memory pressure to write time — a celebrity tweet can OOM your fan-out service if you don't cap batch size or use async queues.

📊 Production Insight

Twitter's early capacity problems (the 'fail whale' era) are the usual reference point for this failure pattern: a single celebrity tweet triggers fan-out to tens of millions of followers, and a fan-out service that builds the whole recipient batch in memory runs out of heap.

Symptom: fan-out service heap spikes to 95%+ in seconds, followed by cascading timeouts in downstream timeline caches.

Rule of thumb: always batch fan-out writes to at most 1,000 recipients per batch, and use a bounded thread pool with a rejection handler (e.g., backpressure to queue).
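The batching rule above can be sketched in a few lines. This is a minimal illustration, assuming follower IDs arrive as a lazy iterator (for example, a paginated graph query). The names `iter_batches`, `fan_out`, and `write_batch` are hypothetical and do not come from any real fan-out service.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

MAX_BATCH = 1_000  # cap taken from the rule of thumb above


def iter_batches(ids: Iterable[int], size: int = MAX_BATCH) -> Iterator[List[int]]:
    """Yield lists of at most `size` IDs without materializing the whole input."""
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fan_out(tweet_id: int, follower_ids: Iterable[int],
            write_batch: Callable[[int, List[int]], None]) -> int:
    """Push tweet_id to followers one bounded batch at a time; return the batch count."""
    batches = 0
    for batch in iter_batches(follower_ids):
        write_batch(tweet_id, batch)  # hypothetical sink, e.g. one pipelined timeline write
        batches += 1
    return batches


# Usage: range() is lazy, so the follower IDs are never all held in memory together.
sizes: List[int] = []
fan_out(42, range(2_500), lambda _tweet, batch: sizes.append(len(batch)))
print(sizes)  # [1000, 1000, 500]
```

Peak memory is bounded by one batch regardless of follower count, which is the property that prevents the heap spike described above.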

🎯 Key Takeaway

Fan-out OOM is a memory-bound problem, not a CPU problem — always model worst-case fan-out size per request.

Duplicate tweets are a data integrity problem solved by idempotency keys, not by fixing retry logic alone.

The best design trades off write amplification for read simplicity — know your read-to-write ratio before choosing push vs. pull.
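To make the idempotency-key takeaway concrete, here is a minimal single-threaded sketch of a write path that tolerates client retries. `TweetStore` and its in-memory dictionaries are hypothetical stand-ins for the tweets table. In a real database the lookup and insert would need to be atomic (for example, via a unique constraint on the key), which this sketch does not model.

```python
import itertools
from typing import Dict


class TweetStore:
    """In-memory stand-in for a tweets table with an idempotency-key index."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.tweets: Dict[int, str] = {}
        self.by_key: Dict[str, int] = {}

    def post(self, idempotency_key: str, content: str) -> int:
        """Create a tweet once per key; a retry with the same key returns the same ID."""
        existing = self.by_key.get(idempotency_key)
        if existing is not None:
            return existing  # retry after a timeout: no second row is written
        tweet_id = next(self._next_id)
        self.tweets[tweet_id] = content
        self.by_key[idempotency_key] = tweet_id
        return tweet_id


store = TweetStore()
first = store.post("client-req-001", "hello world")
retry = store.post("client-req-001", "hello world")  # simulated producer retry
print(first == retry, len(store.tweets))  # True 1
```

The key is generated once by the client and reused on every retry, so fixing the retry logic alone is not required for correctness.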

thecodeforge.io

How To Answer System Design

Phase 1 — Clarify Requirements Before Touching the Whiteboard (Minutes 0–5)

The single biggest mistake candidates make is starting to design immediately. Interviewers deliberately leave the prompt vague because the ability to identify the right questions is itself a senior engineering skill. 'Design Twitter' could mean the timeline, the search, the DM system, or the ad platform — they're completely different problems with completely different bottlenecks.

Spend the first five minutes asking structured clarifying questions in two buckets: functional requirements and non-functional requirements.

Functional requirements define what the system does. Which features are in scope? 'Design Twitter' — are we doing tweet posting, home timeline, follower graph, search, or all of them? Pin down one or two core flows and explicitly park the rest: 'I'll focus on posting tweets and reading the home timeline. I'll note search and DMs as out of scope unless we have time.'

Non-functional requirements define how the system behaves under load. This is where most candidates are too vague. Don't ask 'how many users?'. Ask specifically: 'Are we targeting Twitter-like scale of roughly 500M daily active users, or is this a startup at 1M DAU?' Then drive the numbers yourself. Assume a read:write ratio, calculate QPS, estimate storage needs. Saying 'let's assume 500M DAU, 100M tweets per day, roughly 1,200 tweet writes per second, and a 100:1 read:write ratio, so about 120,000 timeline reads per second on average and roughly 3x that at peak' shows the interviewer you can reason from first principles rather than recite memorized numbers.

Also clarify consistency requirements explicitly. Is eventual consistency acceptable for the timeline? Almost certainly yes — a user seeing a tweet 2 seconds late is tolerable. For payments or inventory? Never. Lock in the consistency model you're designing to before you propose any components, because it determines whether you can use async fan-out at all.

io/thecodeforge/interview/RequirementsClarificationScript.txt · TEXT

// ── PHASE 1: REQUIREMENT CLARIFICATION SCRIPT ──────────────────────────────
// Use this verbatim in your next interview. Adapt the numbers to the prompt.

// STEP 1 — Functional Scope
"Before I start designing, I want to make sure we're aligned on scope.
 Let's say we're building Twitter's core. I'm going to focus on:
   (1) A user posting a tweet
   (2) A user reading their home timeline
 Out of scope for now: search, DMs, ads, trending topics.
 Does that work for you, or would you like me to reprioritize?"

// STEP 2 — Scale Estimation (do this out loud, show your work)
"Let me size this system:
  - 500M daily active users
  - Assume 20% post content: ~100M tweets/day
  - 100M tweets / 86,400 seconds ≈ 1,200 tweet writes/second (peak ~3x = 3,600 wps)
  - Read:Write ratio is typically 100:1 for social feeds
  - So reads: ~120,000 reads/second (peak ~360,000 rps)
  - Tweet storage: avg 200 bytes per tweet * 100M/day = 20GB/day, ~7TB/year
  - Media (images/video) stored separately in object storage — not in the tweet DB"

// STEP 3 — Non-Functional Requirements
"A few more constraints I want to nail down:
  - Latency: home timeline should load in under 200ms p99
  - Availability: 99.99% uptime (four nines) — no scheduled downtime
  - Consistency: eventual consistency is OK for the timeline
    (a user seeing a tweet 2 seconds late is fine)
  - Durability: tweets must never be lost once written — acknowledged writes
    must survive a single node failure
  - Geo: assume global, so we need multi-region support with latency-based routing"

// OUTPUT: You now have a design contract. Everything you build next
// is justified against these numbers. This is what separates architects
// from engineers who just know technology names.

Output

Interviewer feedback pattern after this phase:

'Great, yes, let's focus on posting and timeline. Your scale numbers look reasonable. Let's proceed.'

(This is the green light. You've demonstrated product + engineering instinct.)
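The Step 2 arithmetic is easy to reproduce, which helps when an interviewer changes one input mid-conversation. The sketch below uses a hypothetical helper, `estimate_load`, and assumes one tweet per posting user per day, as the script does. Note that the exact write rate is about 1,157 per second; the script rounds up to 1,200 before multiplying by the read:write ratio.

```python
SECONDS_PER_DAY = 86_400


def estimate_load(dau: int, posting_percent: int = 20, read_write_ratio: int = 100,
                  bytes_per_tweet: int = 200, peak_factor: int = 3) -> dict:
    """Back-of-envelope sizing; assumes one tweet per posting user per day."""
    tweets_per_day = dau * posting_percent // 100
    writes_per_sec = tweets_per_day / SECONDS_PER_DAY
    reads_per_sec = writes_per_sec * read_write_ratio
    storage_gb_per_day = tweets_per_day * bytes_per_tweet / 1e9
    return {
        "tweets_per_day": tweets_per_day,
        "writes_per_sec": writes_per_sec,
        "peak_writes_per_sec": writes_per_sec * peak_factor,
        "reads_per_sec": reads_per_sec,
        "peak_reads_per_sec": reads_per_sec * peak_factor,
        "storage_gb_per_day": storage_gb_per_day,
        "storage_tb_per_year": storage_gb_per_day * 365 / 1000,
    }


for name, value in estimate_load(500_000_000).items():
    print(f"{name}: {value:,.1f}")
```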

⚠ Watch Out: The Vague Non-Functional Trap

Saying 'the system should be highly available and scalable' is a red flag to interviewers. Every system should be. Instead, anchor your NFRs to specific numbers: '99.99% availability means we can afford ~52 minutes of downtime per year — that shapes our failover strategy and rules out any single-region design.' Specificity signals seniority because it proves you've had to justify these numbers to a real stakeholder.
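The downtime figure quoted above follows directly from the availability target. A small sketch, using a hypothetical helper name and a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} min/year")
```

Each extra nine cuts the budget by a factor of ten, which is why the jump from three nines to four usually forces a multi-region conversation.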

📊 Production Insight

Vague NFRs like 'highly available and scalable' signal inexperience to interviewers. Every component choice must trace back to a specific NFR with a number. Rule: never draw a single box until you have QPS, latency SLA, and availability target written down — these three numbers are the load-bearing walls of your entire design.

🎯 Key Takeaway

Requirements before architecture — this is the single most important rule in any system design interview. Every component you draw must trace back to a specific NFR with a real number. If you can't justify a component against a requirement, you're pattern-matching to things you've read rather than designing for the problem in front of you — and experienced interviewers can tell the difference immediately.

Requirements Clarification Decision Tree

If: Interviewer gives a vague prompt like 'design Twitter'
→ Use: Clarify functional scope first. Pick 1-2 core flows, explicitly park the rest, and confirm the scope with the interviewer before moving on.

If: Interviewer does not specify scale or user count
→ Use: Propose reasonable numbers yourself and get confirmation. Show first-principles reasoning out loud, not just the final number.

If: Interviewer does not mention consistency requirements
→ Use: Ask explicitly: 'Is eventual consistency acceptable for this flow, or do we need read-after-write guarantees?' The answer changes the entire architecture.

If: Requirements take more than 5 minutes to clarify
→ Use: Move forward with your best assumptions, state them clearly, and note them on the whiteboard. Analysis paralysis is a stronger negative signal than slightly imprecise numbers.

Phase 2 — High-Level Design and the API Contract (Minutes 5–20)

Once requirements are locked, draw the 30,000-foot view before zooming in anywhere. Most candidates zoom in too fast — they start talking about database sharding before they've even established what services exist. Interviewers want to see that you can hold the whole system in your head before optimizing any part of it.

Start with the client-to-server path. Draw: client → load balancer → API gateway → application services → data stores. Then define your API contract for the core flows. This is non-negotiable for senior roles — defining the API before designing the backend proves you think contract-first, which is how production systems are actually built. Teams agree on the API surface first, then build services independently against that contract.

For tweet posting: POST /v1/tweets, body: {user_id, content, media_ids[]}, response: {tweet_id, created_at}. For timeline: GET /v1/users/{user_id}/timeline?cursor={cursor}&limit=20, response: {tweets: [...], next_cursor: string}. Note the cursor-based pagination — explain why explicitly: offset pagination requires the database to scan and discard N rows to reach the offset, which degrades to a full table scan at depth. Cursor pagination hits the index directly.
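One way to keep the cursor opaque to clients is to wrap the last-seen tweet ID in base64-encoded JSON. The sketch below is an assumption about how such a cursor could be built, not a description of any real API; `encode_cursor` and `decode_cursor` are hypothetical names.

```python
import base64
import json


def encode_cursor(last_tweet_id: str) -> str:
    """Wrap the last tweet ID seen in an opaque, URL-safe token."""
    raw = json.dumps({"tweet_id": last_tweet_id}, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> str:
    """Recover the tweet ID; raises ValueError on a malformed token."""
    try:
        payload = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
        return str(payload["tweet_id"])
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("invalid cursor") from exc


token = encode_cursor("123456")
print(token)                 # eyJ0d2VldF9pZCI6IjEyMzQ1NiJ9
print(decode_cursor(token))  # 123456
```

Because the token is opaque, the server can later change what it encodes (for example, adding a timestamp) without breaking clients.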

Decompose into services early. For Twitter, you need at minimum: a Tweet Service (writes and reads), a Timeline Service (fan-out and reads), a User Service (auth and profiles), and a Media Service (upload/serve images). Keep services small and name them after the business capability, not the technology. Don't say 'the Java microservice' — say 'the Timeline Service.' This signals domain-driven thinking and maps directly to how you'd structure teams around the system.

For each service, identify the primary data it owns. The Tweet Service owns a tweets table. The Timeline Service owns precomputed timeline caches. The User Service owns the social graph. Call out the anti-pattern explicitly before the interviewer asks about it: 'I want to be clear that each service owns its own datastore. The Timeline Service does not query the Tweet Service's database directly. Cross-service communication happens via APIs or async events on a message queue. Shared databases create hidden coupling that makes services impossible to scale or deploy independently — I've seen this pattern cause production outages where a slow query from one service brought down an unrelated service sharing the same DB connection pool.'

io/thecodeforge/interview/HighLevelDesignScript.txt · TEXT

// ── PHASE 2: HIGH-LEVEL DESIGN WALKTHROUGH ──────────────────────────────────
// Narrate this while drawing on the whiteboard. Never draw in silence.

// STEP 1 — API Contract (define before backend)

API DESIGN — Tweet Posting Flow:
  POST /v1/tweets
  Request:  { user_id: UUID, content: string(280), media_ids: UUID[] }
  Response: { tweet_id: UUID, created_at: ISO8601, status: 'published' }

API DESIGN — Home Timeline Read:
  GET /v1/users/{user_id}/timeline?cursor={opaque_cursor}&limit=20
  Response: {
    tweets: [ { tweet_id, author_id, content, media_url, created_at, like_count } ],
    next_cursor: "eyJ0d2VldF9pZCI6IjEyMzQ1NiJ9"  // base64-encoded last tweet ID
  }

  // Why cursor pagination on the timeline read?
  // Offset-based: SELECT * FROM tweets ORDER BY id DESC LIMIT 20 OFFSET 10000
  //   → DB scans 10,020 rows and discards 10,000. O(n) cost. Breaks at scale.
  // Cursor-based: SELECT * FROM tweets WHERE id < :last_seen_id ORDER BY id DESC LIMIT 20
  //   → Seeks into the index directly. O(log n) to find the start. Stable even when rows are inserted.

// STEP 2 — Service Decomposition

  [Mobile / Web Client]
       ↓ HTTPS
  [CDN — static assets, cached timeline responses for inactive users]
       ↓ cache miss
  [API Gateway — rate limiting, auth token validation, routing]
       ↓
  ┌──────────────────────────────────────────────┐
  │  Tweet Service    Timeline Service           │
  │  User Service     Notification Service       │
  │  Media Service    Search Service (out-scope) │
  └──────────────────────────────────────────────┘
       ↓                    ↓
  [Tweet DB]         [Timeline Cache — Redis]
  [User DB]          [Fan-out Queue — Kafka]
  [Object Store]     [Notification Queue — Kafka]

// STEP 3 — Call out the coupling anti-pattern proactively
"One thing I want to be explicit about: each service owns its own datastore.
 The Timeline Service does NOT query the Tweet Service's database directly.
 Communication between services is via APIs or async events on a message queue.
 Shared databases create hidden coupling — I've seen this cause incidents where
 a slow analytics query from the reporting service saturated the connection pool
 on the tweet DB and caused write timeouts on the hot path. Separate datastores
 eliminate that failure mode entirely."

Output

Expected interviewer response:

'Good. Now let's talk about the timeline fan-out: how does a tweet from a celebrity with 50M followers get delivered without destroying your system?'

(You've earned the deep-dive. This is the inflection point.)
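The pagination comparison in the walkthrough can be checked against an in-memory SQLite database. This is an illustrative sketch only: the `tweets` table here holds 100 synthetic rows, and `make_db`, `cursor_page`, and `offset_page` are hypothetical helpers.

```python
import sqlite3
from typing import List, Optional, Tuple


def make_db(n: int = 100) -> sqlite3.Connection:
    """Create an in-memory tweets table with IDs 1..n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany("INSERT INTO tweets (id, content) VALUES (?, ?)",
                     [(i, f"tweet {i}") for i in range(1, n + 1)])
    return conn


def cursor_page(conn: sqlite3.Connection, last_seen_id: Optional[int] = None,
                limit: int = 20) -> Tuple[List[int], Optional[int]]:
    """Keyset pagination: seek past the last ID seen, newest first."""
    if last_seen_id is None:
        rows = conn.execute("SELECT id FROM tweets ORDER BY id DESC LIMIT ?", (limit,))
    else:
        rows = conn.execute(
            "SELECT id FROM tweets WHERE id < ? ORDER BY id DESC LIMIT ?",
            (last_seen_id, limit))
    ids = [row[0] for row in rows]
    next_cursor = ids[-1] if len(ids) == limit else None
    return ids, next_cursor


def offset_page(conn: sqlite3.Connection, offset: int, limit: int = 20) -> List[int]:
    """Offset pagination, shown only for comparison."""
    rows = conn.execute(
        "SELECT id FROM tweets ORDER BY id DESC LIMIT ? OFFSET ?", (limit, offset))
    return [row[0] for row in rows]


conn = make_db()
page_one, cursor = cursor_page(conn)                      # IDs 100..81
conn.execute("INSERT INTO tweets (id, content) VALUES (101, 'new tweet')")
page_two, _ = cursor_page(conn, cursor)                   # IDs 80..61, no overlap
print(page_two[0], offset_page(conn, 20)[0])              # 80 81
```

After the insert, the offset query returns ID 81 again even though it was already on page one. That is the page drift that cursor pagination avoids.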

💡Pro Tip: Narrate Your Trade-offs, Don't Just Draw

Every architectural decision is a trade-off. When you choose REST over gRPC, say why: 'REST for external APIs because it's firewall-friendly and easier for third-party clients to consume without generated stubs. If this were internal service-to-service communication at high throughput, I'd pick gRPC for the strongly-typed contracts and binary serialization.' Interviewers give credit for the reasoning, not just the answer — a correct choice with no justification scores lower than a slightly imperfect choice with clear trade-off reasoning.

📊 Production Insight

Shared databases between services create hidden coupling that blocks independent scaling and deployment. API contract definition before backend design proves you think contract-first — this is the behavior that distinguishes senior engineers from mid-level engineers who start with the implementation and infer the interface. Rule: name services after business capabilities, not technologies — 'Timeline Service' not 'the Java microservice'.

🎯 Key Takeaway

Define the API contract before designing the backend — contract-first is how production teams build systems independently without stepping on each other. Cursor-based pagination is non-negotiable at scale — offset pagination degrades to O(n) scans and produces page drift when new rows are inserted. Shared databases between services are a coupling anti-pattern — call it out proactively and explain the production failure mode it prevents.

High-Level Design Decision Tree

If: External-facing API for third-party clients
→ Use: REST with JSON. Firewall-friendly, universally supported, no client-side code generation required.

If: Internal service-to-service communication with high throughput requirements
→ Use: gRPC with protobuf. Strongly-typed contracts and compact binary serialization, which typically means smaller payloads and lower latency than JSON.

If: Paginated API serving production-scale datasets
→ Use: Cursor-based pagination. Offset pagination degrades to O(n) scans at depth and produces inconsistent results when rows are inserted between pages.

If: Multiple services need access to the same data
→ Use: Each service owns its datastore. Communicate via APIs or async events on a message queue, never via shared databases.

Phase 3 — Deep Dive on the Hard Problems (Minutes 20–40)

This is where the interview is won or lost. After the high-level design, a good interviewer will steer you toward the hardest sub-problem in your design. For Twitter, that's the fan-out problem: when Katy Perry tweets, how do you push that tweet to more than 100M followers without your system catching fire?

There are two approaches — fan-out on write (push) and fan-out on read (pull) — and neither is universally correct. This is a classic system design trade-off that interviewers use specifically because it has no single right answer. The right choice depends on the read:write ratio and the distribution of follower counts in your user base.

Fan-out on write: when a tweet is posted, the Tweet Service publishes an event to a fan-out queue. A fleet of fan-out workers picks up the event and writes the tweet_id into each follower's precomputed timeline in Redis. Timeline reads are then a simple Redis ZSET lookup — sub-millisecond. The cost? Writing one tweet for a celebrity with 50M followers means 50M Redis writes. This is write amplification at its most extreme. At 3,600 writes per second at peak, and assuming the average user has 1,000 followers, that's 3.6M fan-out writes per second for average users alone — before you account for any celebrity traffic.

Fan-out on read: when a user opens their timeline, the Timeline Service queries the social graph to get their followed accounts, fetches recent tweets from each of those accounts, merges and sorts them, and returns the result. No write amplification — but read time scales linearly with the number of followed accounts. Following 5,000 accounts means up to 5,000 DB lookups on every timeline refresh, which violates the 200ms p99 SLA we set in Phase 1.
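The read-time merge described above is a k-way merge over per-author lists that are already sorted newest first. A minimal sketch, assuming each author's recent tweets are available as `(created_at, tweet_id)` tuples; `pull_timeline` and the in-memory dictionary are hypothetical stand-ins for the per-author DB lookups.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

Tweet = Tuple[int, str]  # (created_at, tweet_id)


def pull_timeline(followed: List[str], recent_by_author: Dict[str, List[Tweet]],
                  limit: int = 20) -> List[str]:
    """Merge each followed author's newest-first tweets and keep the newest `limit`."""
    streams = [recent_by_author.get(author, []) for author in followed]  # one lookup per author
    merged = heapq.merge(*streams, key=lambda tweet: tweet[0], reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]


recent = {
    "alice": [(30, "a3"), (10, "a1")],
    "bob": [(20, "b2")],
}
print(pull_timeline(["alice", "bob"], recent))  # ['a3', 'b2', 'a1']
```

The merge itself is cheap. The cost that breaks the SLA is the one lookup per followed account that feeds it.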

The production answer, widely described for large social feeds such as Twitter's, is a hybrid: fan-out on write for normal users (fewer than 10K followers), fan-out on read for celebrities (10K followers or more). When Katy Perry's tweet appears in your timeline, it was lazily fetched at read time and merged in memory with your precomputed timeline from non-celebrity follows. This caps write amplification at a manageable level while keeping reads fast for the overwhelming majority of cases. The 10K threshold is a tunable configuration value, not a magic number: you'd calibrate it against your actual follower distribution and Redis write capacity.
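The hybrid routing can be simulated in memory to see both paths working together. This sketch is an illustration under simplifying assumptions: plain dictionaries stand in for Redis timelines and the tweets DB, and `HybridFeed` is a hypothetical class, not part of any real system.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

CELEBRITY_THRESHOLD = 10_000  # tunable, as described above


class HybridFeed:
    def __init__(self, threshold: int = CELEBRITY_THRESHOLD) -> None:
        self.threshold = threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        self.following: Dict[str, Set[str]] = defaultdict(set)
        self.timelines: Dict[str, Dict[str, int]] = defaultdict(dict)   # push path
        self.tweets_by_author: Dict[str, List[Tuple[int, str]]] = defaultdict(list)

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.threshold

    def post(self, author: str, tweet_id: str, created_at: int) -> int:
        """Store the tweet; return how many timeline writes the post caused."""
        self.tweets_by_author[author].append((created_at, tweet_id))
        if self.is_celebrity(author):
            return 0  # pull path: no fan-out writes
        for follower in self.followers[author]:
            self.timelines[follower][tweet_id] = created_at
        return len(self.followers[author])

    def read(self, user: str, limit: int = 20) -> List[str]:
        """Merge the precomputed timeline with celebrity tweets fetched at read time."""
        candidates = dict(self.timelines[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                for created_at, tweet_id in self.tweets_by_author[author][-limit:]:
                    candidates[tweet_id] = created_at
        ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
        return [tweet_id for tweet_id, _ in ranked[:limit]]


feed = HybridFeed(threshold=3)  # tiny threshold so the demo stays small
for fan in ("u1", "u2", "u3"):
    feed.follow(fan, "star")
feed.follow("u1", "bob")
print(feed.post("bob", "t1", 100), feed.post("star", "t2", 200))  # 1 0
print(feed.read("u1"))  # ['t2', 't1']
```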

io/thecodeforge/interview/FanOutDeepDive.txt · TEXT

// ── PHASE 3: FAN-OUT DEEP DIVE — THE HARD PROBLEM ──────────────────────────

// HYBRID FAN-OUT ALGORITHM (production-grade)

When a tweet is posted:
  1. Tweet Service writes tweet to tweets DB (source of truth)
  2. Tweet Service publishes event to Fan-out Queue (Kafka topic: 'new-tweets')

Fan-out Worker picks up the event:
  3. Fetch author's follower list from User Service (graph DB or followers table)
  4. Check follower count:
     IF author.follower_count < CELEBRITY_THRESHOLD (e.g., 10,000):
       → PUSH path: for each follower_id, write tweet_id to their Redis timeline
           ZADD timeline:{follower_id} {timestamp_score} {tweet_id}
           // Redis ZSET keeps timeline sorted by time automatically
           // Trim to last 800 entries to bound memory per user:
           ZREMRANGEBYRANK timeline:{follower_id} 0 -801
     ELSE:
       → PULL path: mark tweet as 'celebrity-tweet', store only in tweets DB
           No fan-out write. Will be fetched at read time.
           SET celebrity_tweet:{tweet_id} 1 EX 86400  // 24hr marker

When a user requests their timeline:
  5. Timeline Service reads precomputed timeline from Redis:
       ZREVRANGE timeline:{user_id} 0 19  // last 20 tweet_ids, newest first
  6. Identify which users the requester follows AND have celebrity status:
       celebrity_follows = graph.get_celebrity_follows(user_id)
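  7. For each celebrity, fetch their latest tweets from tweets DB:
       SELECT tweet_id, created_at FROM (
         SELECT tweet_id, created_at,
                ROW_NUMBER() OVER (PARTITION BY author_id
                                   ORDER BY created_at DESC) AS rn
         FROM tweets
         WHERE author_id IN (:celebrity_ids)  // batch query, not N+1
       ) ranked
       WHERE rn <= 20  // at most 20 newest tweets per author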
  8. MERGE precomputed timeline + celebrity tweets in memory:
       merged = sort_by_time(redis_tweets + celebrity_tweets)[:20]
  9. Hydrate tweet_ids → full tweet objects via Tweet Service (batch GET)
  10. Return hydrated, merged, paginated timeline to client

// WHY THIS WORKS:
// - Normal users (99.9% of authors): write amplification is bounded
//   1,000 followers * 3,600 writes/sec = 3.6M Redis ops/sec — achievable
//   with a Redis cluster at this scale
// - Celebrities: no write amplification. 50M followers * 0 writes = 0
// - Read path: one Redis lookup + a handful of DB queries for celebrities
//   This keeps p99 latency under 200ms even at peak scale

// EDGE CASES TO CALL OUT IN YOUR INTERVIEW:
// 1. Threshold transition: if @JohnDoe crosses 10,000 followers, do we backfill
//    existing timelines? No. Apply the new path to future tweets only.
//    The slight inconsistency window is acceptable under eventual consistency.
// 2. Redis eviction: if a user hasn't logged in for 30 days, their timeline
//    key expires. On return, trigger an async cold-start timeline rebuild.
//    The user sees a loading state for ~500ms on first login, then normal UX.
// 3. Follower list pagination: fetching 50M follower IDs in one call will OOM
//    your fan-out worker. Stream in batches of 10,000 using cursor pagination.
// 4. At-least-once delivery: the Kafka consumer here is at-least-once. A fan-out
//    worker crash mid-batch can re-process the event. ZADD makes the replay safe:
//    a ZSET holds each tweet_id once, so re-adding it cannot create a duplicate.
//    Derive the score from the tweet's created_at, not the worker's clock,
//    so a replay writes the same score.

Output

Timeline read latency breakdown (production approximation):

Redis ZREVRANGE lookup:            ~1ms
Celebrity tweet DB batch query:    ~10ms (indexed on author_id + created_at)
Tweet hydration (batch GET):       ~20ms
Merge + sort in memory:            <1ms
Network + serialization:           ~15ms
─────────────────────────────────────────
Total p50:                         ~47ms
Total p99 (cold Redis / slow DB):  ~180ms ✓ (within our 200ms SLA)
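Edge case 4 relies on sorted-set semantics, which can be demonstrated without a Redis server. In the sketch below a plain dictionary stands in for one `timeline:{follower_id}` key, and `zadd_and_trim` is a hypothetical helper that mimics `ZADD` followed by `ZREMRANGEBYRANK`. It does not call Redis.

```python
from typing import Dict

MAX_TIMELINE = 800  # same bound as the trim step in the script above


def zadd_and_trim(timeline: Dict[int, float], tweet_id: int, score: float,
                  max_len: int = MAX_TIMELINE) -> None:
    """Add a member with its score, then drop the oldest entries beyond max_len."""
    timeline[tweet_id] = score  # a member exists at most once, like a sorted set
    overflow = len(timeline) - max_len
    if overflow > 0:
        for member in sorted(timeline, key=timeline.get)[:overflow]:
            del timeline[member]


timeline: Dict[int, float] = {}
zadd_and_trim(timeline, tweet_id=7, score=1_700_000_000.0)
zadd_and_trim(timeline, tweet_id=7, score=1_700_000_000.0)  # replay after a worker crash
print(len(timeline))  # 1
```

Using the tweet's creation time as the score means the replayed write is byte-for-byte the same operation, so the timeline order does not shift either.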

🔥Interview Gold: The Celebrity Problem Has a Name

Interviewers call this the 'hotspot' or 'hot key' problem. In distributed systems, any time one entity generates disproportionate traffic — a celebrity tweet, a viral product listing, a trending hashtag — uniform approaches break because they assume traffic is evenly distributed. The pattern to recognize is: identify the outlier distribution, handle outliers on a separate path, and keep the common path cheap and predictable. This reasoning applies to Kafka partition hotspots, database hot rows, CDN cache stampedes, and rate limiter fairness problems — same root cause, same solution shape.

📊 Production Insight

Uniform approaches break when one entity generates disproportionate traffic — this is the hotspot problem and it shows up everywhere in distributed systems. Hybrid fan-out caps write amplification for celebrities while keeping reads O(1) for normal users. Rule: always identify outlier distributions early in the design and handle them on a separate code path from the common case — conflating the two paths is what causes incidents.

🎯 Key Takeaway

The hybrid fan-out pattern is the production answer — push for normal users, pull for celebrities. The hotspot problem (one entity causing disproportionate load) appears everywhere in distributed systems: Kafka partitions, database rows, CDN cache keys. The solution pattern is always the same: identify the outliers early, route them to a separate path, and keep the common path cheap. If you can articulate this principle and apply it across different problem shapes, you're reasoning like a principal engineer rather than reciting a memorized architecture.

Fan-out Strategy Selection

If: Author has fewer than 10K followers (99.9% of users)
→ Use: Fan-out on write. Push tweet_id to each follower's Redis ZSET timeline at write time.

If: Author has 10K followers or more (celebrity accounts)
→ Use: Fan-out on read. Store the tweet in the DB only, then fetch and merge lazily at read time.

If: Mixed workload with both normal and celebrity authors
→ Use: Hybrid approach. Threshold-based routing in the fan-out worker; the threshold is a tunable config value.

If: Kafka consumer lag exceeds 30 seconds during a traffic spike
→ Use: Auto-scale fan-out workers on the lag metric (for example, with a Kubernetes Horizontal Pod Autoscaler). If lag exceeds 5 minutes, degrade gracefully to full fan-out-on-read for all accounts temporarily.
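The lag thresholds in the last rule can be expressed as a small decision function. This is a sketch of the policy only: `FanOutAction` and `action_for_lag` are hypothetical names, and wiring the result to an autoscaler or a feature flag is left out.

```python
from enum import Enum


class FanOutAction(Enum):
    STEADY = "steady"
    SCALE_OUT = "scale_out_workers"
    DEGRADE_TO_PULL = "degrade_to_fan_out_on_read"


SCALE_OUT_LAG_SECONDS = 30       # from the rule above
DEGRADE_LAG_SECONDS = 5 * 60     # from the rule above


def action_for_lag(lag_seconds: float) -> FanOutAction:
    """Map observed consumer lag to the action the decision tree calls for."""
    if lag_seconds > DEGRADE_LAG_SECONDS:
        return FanOutAction.DEGRADE_TO_PULL
    if lag_seconds > SCALE_OUT_LAG_SECONDS:
        return FanOutAction.SCALE_OUT
    return FanOutAction.STEADY


print(action_for_lag(12 * 60).value)  # degrade_to_fan_out_on_read
```

A 12-minute lag, like the one in the incident this article opens with, falls well inside the degraded-mode range.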

Phase 4 — Trade-offs, Bottlenecks, and How to Close Strong (Minutes 40–45)

The last five minutes are your chance to demonstrate that you think about systems holistically — not just 'does it work?' but 'how does it fail, and how do we recover?' Strong candidates proactively surface the weaknesses in their own design before the interviewer has to find them. This is not just a performance trick — it reflects genuine production experience, because engineers who've actually operated systems at scale know that the interesting questions are always about the failure modes, not the happy path.

Walk through your design and call out at least three potential bottlenecks with mitigations. Don't wait to be asked. For our Twitter design: (1) The fan-out queue: if Kafka falls behind during a traffic spike, timeline freshness degrades. Mitigation: monitor consumer lag per partition, auto-scale fan-out workers on the lag metric, implement a dead-letter queue for failed fan-out events, and fall back to full fan-out-on-read as a degraded mode if lag exceeds five minutes. (2) Redis memory: precomputing timelines for 500M users at ~800 tweet_ids each, stored as 8-byte integers, is roughly 3.2TB of raw ID payload, and actual Redis memory will be higher once scores and per-entry sorted-set overhead are counted. Manageable with a Redis cluster, but it requires LRU eviction, timeline key expiry for inactive users, and strict enforcement that only tweet_ids (not full objects) are stored in Redis. (3) The single-region failure mode: the design as described has no geographic failover. For 99.99% global availability, deploy identical stacks in three regions with latency-based DNS routing, and replicate tweets asynchronously across regions via Kafka MirrorMaker 2, accepting ~500ms of cross-region replication lag.
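The Redis figure in bottleneck (2) is easy to check on the spot. The sketch below reproduces the 3.2TB number and then adds the 8-byte score that every sorted-set entry also carries; the constant names are illustrative, and data-structure overhead is deliberately left out, so treat both results as lower bounds.

```python
USERS = 500_000_000
ENTRIES_PER_TIMELINE = 800
MEMBER_BYTES = 8   # tweet_id as int64
SCORE_BYTES = 8    # sorted-set scores are 64-bit doubles


def timeline_bytes(users: int, entries: int, bytes_per_entry: int) -> int:
    """Raw payload size, ignoring Redis data-structure overhead."""
    return users * entries * bytes_per_entry


ids_only = timeline_bytes(USERS, ENTRIES_PER_TIMELINE, MEMBER_BYTES)
with_scores = timeline_bytes(USERS, ENTRIES_PER_TIMELINE,
                             MEMBER_BYTES + SCORE_BYTES)

print(f"IDs only:        {ids_only / 1e12:.1f} TB")     # 3.2 TB
print(f"IDs plus scores: {with_scores / 1e12:.1f} TB")  # 6.4 TB
```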

Then close with a design summary — this is almost universally skipped by candidates but it's one of the most powerful things you can do: 'To summarize — we designed a Twitter clone handling 1,200 tweet writes and 120,000 timeline reads per second. The key architectural insight was the hybrid fan-out strategy that caps write amplification while keeping read latency under 200ms p99. The main trade-off was operational complexity — the fan-out worker requires careful idempotency handling and batch checkpointing that would not exist in a simpler pull-only design.' One paragraph. Architecture recapped. Trade-off named. This demonstrates that you can communicate to a non-technical stakeholder or an engineering manager, not just to the person next to you at a whiteboard.

Finally, leave time to ask the interviewer a genuine question about their actual systems. 'How does your team handle the fan-out problem today — did you go hybrid, or take a different approach?' This signals collaborative instinct and intellectual curiosity. Interviewers remember candidates who made the conversation feel like a peer discussion rather than an oral exam.

io/thecodeforge/interview/BottlenecksAndClosingScript.txtTEXT

// ── PHASE 4: BOTTLENECK REVIEW AND CLOSING STATEMENT ────────────────────────

// PROACTIVE BOTTLENECK IDENTIFICATION (say this before they ask)

"Let me stress-test my own design before we wrap up."

// BOTTLENECK 1 — Fan-out Queue Lag
"If we get a sudden traffic spike — say, a major world event drives 10x normal
 tweet volume — our Kafka consumer group (fan-out workers) may fall behind.
 The symptom: users see stale timelines. The mitigation:
  - Monitor consumer lag per Kafka partition (alert if lag > 30 seconds)
  - Auto-scale fan-out worker pods based on lag metric (HPA in Kubernetes)
  - For extreme outliers, shed load: if fan-out lag exceeds 5 minutes,
    temporarily switch all users to fan-out-on-read as a degraded mode
  - Dead-letter queue for fan-out events that fail after 3 retries"

// BOTTLENECK 2 — Redis Memory at Scale
"500M users * 800 tweet_ids * 8 bytes each = ~3.2TB of Redis storage.
 Achievable with a Redis cluster, but requires active management:
  - Expire timeline keys for users inactive > 30 days (EXPIRE command)
  - Store only tweet_id (int64, 8 bytes) not full tweet objects in Redis
  - Use Redis cluster with consistent hashing across 10+ shards
  - Monitor memory fragmentation ratio — above 1.5 means Redis is
    allocating but not efficiently reusing memory; trigger MEMORY PURGE
  - Set a hard cap of 800 entries per timeline ZSET via ZREMRANGEBYRANK"

// BOTTLENECK 3 — Single-Region Failure
"My design so far is single-region. For 99.99% availability globally:
  - Deploy identical stacks in 3 regions: US-East, EU-West, AP-Southeast
  - Use latency-based DNS routing to pin users to their nearest region
  - Tweets written to local region, asynchronously replicated to other
    regions via Kafka MirrorMaker 2 with ~500ms replication lag
  - Conflict resolution: last-write-wins with tweet_id as tiebreaker
  - Timeline reads always served from local region Redis — cross-region
    fan-out happens async and is acceptable under eventual consistency"

// CLOSING SUMMARY (always do this — most candidates skip it)
"To summarize what we built:
  - Core entities: tweets, users, follower graph, precomputed timelines
  - Write path: Tweet Service → Kafka → Fan-out Workers → Redis + Tweet DB
  - Read path: API Gateway → Timeline Service → Redis ZSET + celebrity DB queries
  - Key trade-off: hybrid fan-out adds operational complexity — idempotency
    handling and batch checkpointing — but is the only approach that keeps
    both write amplification and read latency bounded simultaneously
  - Scale targets: 1,200 wps, 120,000 rps, 7TB/year tweet storage, 3.2TB Redis
  - Reliability target: 99.99% via multi-region active-active, async replication"

// YOUR QUESTION TO THE INTERVIEWER
"One thing I'd genuinely love to know — how does your team actually handle
 the fan-out problem today? Did you go hybrid, or take a different approach?
 And what was the hardest operational problem you ran into with it?"

Output

This closing pattern typically triggers one of two responses:

1. 'That's a thorough summary. We actually do something similar but use a different threshold for the celebrity cutoff based on write capacity. Let me tell you about the operational issues we ran into...'

→ You've turned the interview into a peer conversation. Strong hire signal.

2. 'Great. Follow-up: what happens if a fan-out worker crashes mid-batch? Do you lose fan-out events for those 30,000 followers?'

→ You already handled this with idempotent ZADD and batch checkpointing. Answer calmly and reference your earlier design decision directly.

💡Pro Tip: Critique Your Own Design First

Interviewers are trained to probe weaknesses in your design. If you surface them first, you demonstrate self-awareness and production experience. If they find them first, it reads as a gap in your thinking — even if you would have gotten there eventually. The phrase 'let me stress-test my own design before we wrap up' is one of the most powerful things you can say in the last five minutes of a system design interview.

📊 Production Insight

Self-critique before the interviewer probes demonstrates that you've operated systems under real failure conditions, not just designed them on a whiteboard. A closing summary that names the primary trade-off shows you can communicate design decisions to stakeholders who weren't in the room. Rule: always end with a genuine question about the interviewer's real systems — the best interviews feel like peer conversations, and that perception starts with you.

🎯 Key Takeaway

Proactively surfacing your own design weaknesses is the single highest-signal senior behavior in a system design interview — it demonstrates that you think about failure modes, not just happy paths. A one-paragraph closing summary that names the key trade-off shows stakeholder communication ability. The difference between 'hire' and 'strong hire' is almost entirely in the depth and proactiveness of trade-off reasoning, not the final diagram on the whiteboard.

Closing Strategy Decision Tree

If: You have 5 minutes remaining and have covered the deep dive

→

Use: Proactively surface 3 bottlenecks with concrete mitigations, then give a one-paragraph design summary that names the key trade-off

If: Interviewer finds a weakness you didn't surface

→

Use: Acknowledge it honestly, propose a mitigation immediately, and connect it back to the trade-off decision you made earlier. Never deflect.

If: Interviewer asks a follow-up about a component you already addressed

→

Use: Reference your earlier design decision calmly and directly: 'I covered this with idempotent ZADD and batch offset checkpointing in Redis'

If: You have time for a final question to the interviewer

→

Use: Ask about their real system and its hardest operational problem: 'How does your team handle this today, and what was the ugliest failure mode you hit?' This is the question that turns an interview into a conversation.

Phase 0: Capacity Estimation

You don't build a bridge by guessing how many cars cross it. Yet most candidates jump into designing distributed systems without a single back-of-the-envelope calculation. That's how you end up with a 50-node Cassandra cluster for a service that gets 200 requests per day.

Capacity estimation isn't busywork. It forces you to surface hard constraints before they become production incidents. Start with traffic: read-heavy or write-heavy? A typical Twitter clone might see 80% reads, 20% writes. Take the monthly active users (say 100M), the daily active ratio (50%), and the average requests per user per day (say 200, mostly cached newsfeed refreshes on open). That's roughly 50M users doing 200 requests/day = 10 billion daily requests, or about 115k requests/second averaged over the day. Peak is 5x that. Now you know your API gateway needs to handle roughly 580k requests/second at peak without sweating.

Storage follows: each tweet payload is ~700 bytes after metadata. One year of all tweets at 100M/day is about 25 TB. No big deal. But the media pipeline? Each image averages 500 KB. If every tweet carried one image, you're looking at 50 TB per day. Suddenly object storage isn't optional, it's mandatory. The network bandwidth calc tells you: don't even think about moving 50 TB/day through a single 1 Gbps link. That's about 4.6 Gbps sustained, more than four times what the link can carry, before any replication traffic. You need at least a 10 Gbps interconnect or a CDN.

The number isn't the point. The clarity is. Write these estimates on the whiteboard before drawing a single box.
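The bandwidth step is the one most often fumbled, so it helps to see it as arithmetic. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and a hypothetical helper name `sustained_gbps`:

```python
def sustained_gbps(bytes_per_day: float) -> float:
    """Average line rate needed to move a daily volume, in gigabits per second."""
    return bytes_per_day * 8 / 86_400 / 1e9


media_per_day = 50e12   # 50 TB/day of images, the figure used above
rate = sustained_gbps(media_per_day)

print(f"{rate:.2f} Gbps sustained")             # 4.63 Gbps sustained
print(f"fits on a 1 Gbps link: {rate < 1.0}")   # fits on a 1 Gbps link: False
```

This is an average over 24 hours; applying the same 5x peak factor used for requests would push the requirement well past 10 Gbps, which is the usual argument for serving media from a CDN.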

CapacityPlanner.pyPYTHON

# io.thecodeforge: interview tutorial

MAU = 100_000_000      # monthly active users
DAILY_ACTIVE_RATIO = 0.5
dau = int(MAU * DAILY_ACTIVE_RATIO)  # 50M

REQUESTS_PER_USER_PER_DAY = 200
daily_requests = dau * REQUESTS_PER_USER_PER_DAY  # 10B

PEAK_FACTOR = 5
SECONDS_PER_DAY = 86_400

# Convert daily volume to per-second rates
avg_rps = daily_requests / SECONDS_PER_DAY
peak_rps = avg_rps * PEAK_FACTOR

print(f"Daily requests: {daily_requests:,}")
print(f"Average RPS: {avg_rps:,.0f}")
print(f"Peak RPS: {peak_rps:,.0f}")

TWEETS_PER_DAY = 100_000_000  # roughly 1,200 writes/second
BYTES_PER_TWEET = 700
AVG_IMAGE_SIZE_KB = 500

# Worst case: every tweet carries one image
storage_per_tweet = BYTES_PER_TWEET + (AVG_IMAGE_SIZE_KB * 1024)
storage_per_day = TWEETS_PER_DAY * storage_per_tweet

print(f"Storage per tweet: {storage_per_tweet:,} bytes")
print(f"Storage per day: {storage_per_day / 1e12:.2f} TB")

Output

Daily requests: 10,000,000,000

Average RPS: 115,741

Peak RPS: 578,704

Storage per tweet: 512,700 bytes

Storage per day: 51.27 TB

⚠ Production Trap:

Interviewers watch for candidates who treat capacity estimation as a formality. When asked 'how did you arrive at 115k RPS?', if you can't explain your DAU-to-MAU ratio or your peak factor assumption, you just torpedoed your seniority signal.

🎯 Key Takeaway

Always open with traffic and storage estimates — it proves you understand the load before you architect for it.

Phase 5: Database Schema — Where the Interview Happens

You've drawn the boxes and arrows. Now the interviewer says: 'How do you store it?' This is where 70% of candidates fold. They wave their hands at a cylinder labeled 'database' and move on. Don't be that person. The schema reveals whether you understand read patterns, write patterns, and consistency trade-offs.

For a Twitter timeline system, you have three core tables: users, tweets, and timeline_cache. The users table is standard SQL — UUID, handle, profile metadata. Tweets need careful indexing. A global tweet_id as primary key (use Snowflake IDs, not auto-increment) and a secondary index on author_id sorted by created_at descending for the user's own timeline. That's your hot path query.
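For readers who have not built one, a Snowflake-style ID packs a millisecond timestamp, a worker number, and a per-millisecond sequence into 64 bits, which is why the IDs sort by creation time. The sketch below uses the commonly described 41/10/12 bit layout and the custom epoch usually cited for Twitter's Snowflake; the class and function names are hypothetical, and a production generator would need a stricter policy for clock regressions.

```python
import threading
import time

EPOCH_MS = 1_288_834_974_657   # custom epoch commonly cited for Snowflake
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id: int):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                now = self._last_ms              # clock stepped back: hold position
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:          # 4,096 IDs used this millisecond
                    while now <= self._last_ms:  # spin until the next millisecond
                        now = int(time.time() * 1000)
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)
                    | self.worker_id << SEQUENCE_BITS
                    | self._sequence)


def timestamp_ms(snowflake_id: int) -> int:
    """Recover the creation time (Unix milliseconds) from an ID."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS


gen = SnowflakeGenerator(worker_id=7)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(ids))   # True: IDs are time-ordered
```

Because the timestamp occupies the high bits, ordering by tweet_id approximates ordering by creation time, which is what lets the ID double as a sort key.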

The hard part is the home timeline. You don't query the tweets table for 800 followees on every refresh. That's hundreds of index lookups plus a merge and sort on every request. Instead, build a timeline_cache: a Redis sorted set per user holding tweet IDs as members and timestamps as scores. Write to it on tweet creation via fan-out, capped at 800 entries. If a fan-out write is delayed, the user sees stale data for 5 seconds or so. That's acceptable. Strong consistency isn't worth its cost on this path.

Now the DB selection: use Postgres for relational integrity (users, relationships). Use Cassandra for write-heavy tweet ingestion at massive scale. Use Redis for in-memory timeline materialization. Three databases with clear ownership. No silver bullet. Each has a job, and you know why.

TimelineSchema.pyPYTHON

# io.thecodeforge: interview tutorial
# The DDL is PostgreSQL; it is kept in a string so this .py file runs as-is.

SCHEMA_SQL = """
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    handle VARCHAR(15) UNIQUE NOT NULL,
    profile_pic_url TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE tweets (
    tweet_id BIGINT PRIMARY KEY,  -- Snowflake ID
    author_id UUID NOT NULL REFERENCES users(user_id),
    content TEXT NOT NULL,
    media_urls TEXT[],
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_tweets_author_timestamp
ON tweets(author_id, created_at DESC);

-- Timeline cache is a Redis sorted set per user
-- ZADD user:<user_id>:timeline <timestamp> <tweet_id>
-- ZREVRANGE user:<user_id>:timeline 0 799 WITHSCORES  (newest 800 entries)
"""

print("Schema ready. 3 databases, 2 indexes, 0 regrets.")

Output

Schema ready. 3 databases, 2 indexes, 0 regrets.

💡Senior Shortcut:

When the interviewer asks 'why not just use a single Postgres table?' say 'A single table works for 10K users. At 10M, assembling a home feed means an index lookup per followee, so timeline reads become O(n) in the number of accounts followed, and one feed can take seconds to build. A Redis sorted set serves the precomputed timeline in a couple of milliseconds, and that's why we pay for dedicated cache nodes.'

🎯 Key Takeaway

Your schema must map one-to-one with query patterns, not some ERD from a textbook. Know which query is hot and index for it. If you can't explain why Redis wins over Postgres for timeline assembly, you haven't thought about the data path.

AI Components in System Design: Embeddings, Vector DBs, RAG

Modern system design interviews increasingly test candidates on AI integration. For a Twitter-like feed, embeddings can power personalized recommendations, search, and duplicate detection. Embeddings convert tweets into dense vectors capturing semantic meaning. Use a pre-trained model like BERT or Sentence-BERT to generate a fixed-size vector for each tweet (768 dimensions for BERT-base; smaller models such as all-MiniLM-L6-v2 produce 384). Store these in a vector database (e.g., Pinecone, Weaviate, or Postgres with pgvector) for efficient similarity search.

For duplicate tweet detection, compute cosine similarity between the incoming tweet's embedding and those of recent tweets; if similarity exceeds a threshold (e.g., 0.95), flag it as a duplicate. This complements traditional exact-match dedup.

For feed ranking, use the retrieve-then-rank pattern that also underlies Retrieval-Augmented Generation (RAG): retrieve the top-K relevant tweets based on a user embedding (derived from their history) and feed them to a ranking model. Example: user embedding = average of the last 50 liked tweet embeddings. Pipeline: (1) embed the user query or load the user vector, (2) query the vector DB for nearest neighbors, (3) pass the results to a lightweight ML model for final ranking. This reduces latency versus running a full neural ranker over every candidate.

Trade-offs: a vector DB adds operational complexity and cost, and embedding generation typically needs GPU inference at scale. Use approximate nearest neighbor (ANN) indexes (e.g., HNSW) for sub-100ms queries. In an interview, discuss accuracy vs. latency, cold start for new users, and incremental embedding updates via CDC.

duplicate_detection.pyPYTHON

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')  # 384-dimensional embeddings

def is_duplicate(new_tweet: str, recent_tweets: list[str], threshold: float = 0.95) -> bool:
    """Return True if new_tweet is semantically close to any recent tweet."""
    if not recent_tweets:
        return False  # nothing to compare against
    new_emb = model.encode([new_tweet])
    recent_embs = model.encode(recent_tweets)
    sims = cosine_similarity(new_emb, recent_embs)[0]
    return bool(np.any(sims > threshold))  # cast numpy.bool_ to a plain bool

🔥Vector DB Selection

📊 Production Insight

At Twitter scale, embed tweets asynchronously via a Kafka pipeline and store embeddings in a separate column family (e.g., Cassandra with vector support) to avoid blocking write path.

🎯 Key Takeaway

Embeddings + vector DB enable semantic duplicate detection and personalized ranking, but add latency and cost; use ANN indexes and incremental updates.
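The retrieval step described in this section (average the liked-tweet embeddings into a user vector, then fetch nearest neighbors) can be sketched without any ML dependencies. The toy 3-dimensional vectors and the names `mean_vector`, `cosine`, and `top_k` are illustrative assumptions; a real system would use model-generated embeddings and an ANN index instead of the brute-force scan shown here.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(user_vec, candidates, k):
    """Brute-force nearest neighbours; a vector DB would use an ANN index."""
    ranked = sorted(candidates.items(),
                    key=lambda item: cosine(user_vec, item[1]),
                    reverse=True)
    return [tweet_id for tweet_id, _ in ranked[:k]]


liked = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]   # toy embeddings of liked tweets
user_vec = mean_vector(liked)
candidates = {
    "t1": [1.0, 0.1, 0.0],
    "t2": [0.0, 1.0, 0.0],
    "t3": [0.7, 0.0, 0.7],
}
print(top_k(user_vec, candidates, k=2))      # ['t1', 't3']
```

The returned IDs are what step (3) would hand to the ranking model.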

Event-Driven Architecture: Kafka, Debezium, CDC Patterns

Fan-out OOM and duplicate tweets are exacerbated by synchronous fan-out. An event-driven architecture decouples tweet ingestion from fan-out. Use Apache Kafka as the central event bus. When a user posts a tweet, the API gateway publishes a 'TweetCreated' event to Kafka. Multiple consumers process this event asynchronously: (1) Fan-out service: reads the event and writes the tweet ID to followers' timelines (using batched writes to a distributed cache like Redis). (2) Duplicate detection service: checks for duplicates via exact match (tweet ID) and semantic match (embedding similarity). (3) Notification service: sends push notifications.

Debezium, a CDC (Change Data Capture) tool, can stream changes from the primary database (e.g., PostgreSQL) to Kafka without dual writes. For example, when a tweet is inserted into the 'tweets' table, Debezium captures the change from the write-ahead log and publishes it to a Kafka topic. This removes the dual-write inconsistency and reduces application complexity, but delivery is at-least-once by default, so consumers still need to be idempotent.

For fan-out, key the per-follower fan-out messages by follower user ID so that partitions parallelize the writes. To avoid OOM, the fan-out service processes events in batches (e.g., 100 events per batch) and applies backpressure (Kafka consumer pause/resume). For duplicate tweets, the duplicate detection service can use a sliding window cache (e.g., Redis with TTL) of recent tweet IDs and embeddings. CDC also enables rebuilding the timeline cache from scratch by replaying Kafka topics, as long as topic retention covers the window you need.

Trade-offs: Kafka adds latency (milliseconds) and operational overhead; CDC requires careful schema management. In an interview, discuss how to handle failures: dead letter queues for failed events, idempotent consumers, and exactly-once processing with Kafka transactions (which cover Kafka-to-Kafka flows; writes to Redis sit outside the transaction and still rely on idempotency).

fanout_consumer.pyPYTHON

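import json

import redis
from kafka import KafkaConsumer

TIMELINE_CAP = 800
FLUSH_EVERY = 1_000  # flush the pipeline periodically to bound worker memory

def get_followers(author_id):
    """Placeholder: yield follower IDs from the graph store, cursor-paginated."""
    raise NotImplementedError

consumer = KafkaConsumer('tweet-created', bootstrap_servers='localhost:9092',
                         group_id='fan-out-workers', enable_auto_commit=False)
r = redis.Redis()

for msg in consumer:
    tweet = json.loads(msg.value)  # assumes the event carries id, author_id, created_at
    pipeline = r.pipeline(transaction=False)
    for n, follower_id in enumerate(get_followers(tweet['author_id']), start=1):
        key = f'timeline:{follower_id}'
        # ZADD is idempotent per member, so a replayed message cannot duplicate entries
        pipeline.zadd(key, {str(tweet['id']): tweet['created_at']})
        pipeline.zremrangebyrank(key, 0, -(TIMELINE_CAP + 1))  # keep newest 800
        if n % FLUSH_EVERY == 0:
            pipeline.execute()
    pipeline.execute()
    consumer.commit()  # commit only after every write succeeded (at-least-once)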

⚠ Exactly-Once Semantics

📊 Production Insight

In this design, fan-out combines Kafka for async processing with Redis for the timeline cache. Debezium captures DB changes into Kafka, which keeps the event stream consistent with the database and enables replay for cache rebuilds.

🎯 Key Takeaway

Event-driven architecture with Kafka and Debezium decouples tweet ingestion from fan-out; batching and backpressure in the consumer then keep memory bounded, and the same event stream feeds scalable duplicate detection.
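The sliding-window cache of recent tweet IDs mentioned in this section can be modeled in a few lines. This is an in-memory stand-in for illustration only: the class name and the explicit `now` parameter are assumptions made so the behavior is easy to follow, and in Redis the same check is typically a single `SET key value NX EX ttl` per tweet ID.

```python
import time
from collections import OrderedDict


class RecentTweetWindow:
    """In-memory stand-in for one Redis key per tweet ID with a TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._seen = OrderedDict()   # tweet_id -> first-seen time, oldest first

    def seen_before(self, tweet_id, now=None) -> bool:
        """Return True if tweet_id was recorded within the last ttl seconds."""
        now = time.time() if now is None else now
        # Evict entries that have fallen out of the window.
        while self._seen:
            _, first_seen = next(iter(self._seen.items()))
            if now - first_seen < self.ttl:
                break
            self._seen.popitem(last=False)
        if tweet_id in self._seen:
            return True
        self._seen[tweet_id] = now
        return False


window = RecentTweetWindow(ttl_seconds=60)
print(window.seen_before("t1", now=0))    # False: first sighting
print(window.seen_before("t1", now=10))   # True: replay inside the window
print(window.seen_before("t1", now=61))   # False: the entry expired
```

The window length is a trade-off: longer windows catch slower replays but cost more memory.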

Observability-Driven Design: Tracing, Metrics, Logging Strategy

System design interviews often overlook observability, but it's critical for debugging fan-out OOM and duplicate tweets. Design for observability from the start. Use distributed tracing (e.g., OpenTelemetry) to trace a tweet's lifecycle: API gateway → Kafka producer → fan-out consumer → Redis write. Each span captures latency and errors. For example, if a fan-out consumer OOMs, the last span it emitted, tagged with the follower count, shows which step it was in when memory spiked.

Metrics: (1) fan-out latency p50/p99, (2) Kafka consumer lag, (3) Redis memory usage per timeline, (4) duplicate detection rate (false positives/negatives). Use Prometheus to collect metrics and Grafana for dashboards. Logging: structured logs (JSON) with correlation IDs. For duplicate tweets, log the similarity score and threshold. Set up alerts: if fan-out latency > 500ms or consumer lag > 1,000 messages, page on-call. For OOM prevention, monitor heap usage and set a high-water mark at which consumers pause. Example: a Go fan-out service can expose pprof heap profiles to track down memory leaks.

In an interview, discuss how observability helps during the deep dive: e.g., tracing reveals that duplicate detection is slow due to embedding generation, prompting a switch to a faster model. Also discuss cost: tracing adds overhead, so high-throughput paths usually sample (for example at 1%). Zipkin, one widely used tracing system, originated at Twitter; hosted platforms such as Datadog offer unified dashboards.

tracing_setup.pyPYTHON

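from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# The console exporter keeps this example self-contained; in a real deployment,
# swap in an OTLP exporter pointed at your collector (Jaeger, Zipkin, etc.).
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

tweet_id = 12345             # placeholder values for illustration
followers = [789, 790, 791]

with tracer.start_as_current_span('fanout_process') as span:
    span.set_attribute('tweet_id', tweet_id)
    span.set_attribute('follower_count', len(followers))
    # fan-out logic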

💡Sampling Strategy

📊 Production Insight

Twitter uses a custom observability platform with high-cardinality metrics (per user timeline) and distributed tracing to debug latency spikes and memory leaks in real-time.

🎯 Key Takeaway

Observability with tracing, metrics, and logging is essential for diagnosing fan-out OOM and duplicate tweet issues in production.
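The high-water mark idea from this section is simple to get subtly wrong: without a separate low-water mark, a worker hovering near the limit flaps between paused and running. A minimal sketch with hysteresis follows; the class name, the 80%/60% thresholds, and the byte readings are assumptions chosen for the example.

```python
class MemoryGate:
    """Pause above the high-water mark; resume only below the low-water mark."""

    def __init__(self, limit_bytes: int, high: float = 0.80, low: float = 0.60):
        if not 0 < low < high <= 1:
            raise ValueError("expected 0 < low < high <= 1")
        self.high_bytes = limit_bytes * high
        self.low_bytes = limit_bytes * low
        self.paused = False

    def update(self, used_bytes: int) -> bool:
        """Record a memory reading; return True while consumption should pause."""
        if not self.paused and used_bytes >= self.high_bytes:
            self.paused = True
        elif self.paused and used_bytes <= self.low_bytes:
            self.paused = False
        return self.paused


gate = MemoryGate(limit_bytes=1_000)
readings = [500, 850, 700, 590, 650]
print([gate.update(r) for r in readings])
# [False, True, True, False, False]
```

In a consumer loop, the returned flag would drive the pause and resume calls on the Kafka consumer; with kafka-python those are `KafkaConsumer.pause()` and `KafkaConsumer.resume()`.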

● Production incident · POST-MORTEM · severity: high

Fan-out Worker Crash on Celebrity Tweet Causes Duplicate Timeline Entries and OOM

Symptom

Timeline feeds showed duplicate tweets for users following the celebrity. Simultaneously, the fan-out worker pod restarted with an OOMKill status. Kafka consumer lag spiked to 12 minutes during the incident window.

Assumption

A Kafka broker failure caused message redelivery, and the worker had a memory leak.

Root cause

The fan-out worker fetched the entire 50M follower list into memory in a single API call (no cursor pagination), causing OOM after consuming ~3GB of heap. When Kubernetes restarted the pod, Kafka redelivered the same message (at-least-once semantics). The worker reprocessed the entire batch, but the ZADD idempotency guarantee was bypassed because the worker used a different timestamp encoding on the second pass (millisecond precision vs second precision), causing duplicate entries in the Redis ZSET. Note that ZADD on an existing member only updates its score, so a score mismatch by itself reorders an entry rather than duplicating it; duplicates of this kind imply the timestamp was also part of the member value being written.

Fix

Three changes: (1) stream follower list in batches of 10,000 using cursor pagination against the graph store, capping memory at ~80MB per batch. (2) Normalize timestamp scores to second-precision before ZADD to guarantee idempotency on replay. (3) Add a process_batch_offset checkpoint in Redis so a restarted worker can resume from the last committed follower batch instead of replaying from zero.

Key lesson

Never fetch unbounded lists in a single call — always use cursor pagination with a batch size limit.
At-least-once delivery requires idempotent writes — test your idempotency guarantee by simulating a crash-and-replay scenario in staging before shipping.
Score encoding precision matters for ZSET idempotency — normalize to a consistent unit before writing and enforce it in a shared utility function so no caller can deviate.
Checkpoint batch progress externally (Redis or DB) so workers can resume, not restart, after a crash — the difference between resuming at offset 30,001 and replaying from zero is the difference between a 2-minute recovery and a 90-minute incident.
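The pagination and checkpointing lessons can be exercised together in a small simulation. Everything here is a stand-in chosen for illustration: a Python list plays the cursor-paginated graph store, plain dicts play Redis (timelines and the checkpoint key), and `crash_after` simulates the OOMKill.

```python
BATCH_SIZE = 3   # tiny for the demo; the post-mortem above uses 10,000


def follower_pages(followers, cursor, batch_size=BATCH_SIZE):
    """Yield (next_cursor, batch) pairs starting at `cursor`."""
    while cursor < len(followers):
        batch = followers[cursor:cursor + batch_size]
        cursor += len(batch)
        yield cursor, batch


def fan_out(tweet_id, score, followers, timelines, checkpoints, crash_after=None):
    """Write tweet_id to each follower timeline, checkpointing after every batch."""
    cursor = checkpoints.get(tweet_id, 0)   # resume point; 0 on the first run
    pages = follower_pages(followers, cursor)
    for batches_done, (next_cursor, batch) in enumerate(pages, start=1):
        for follower in batch:
            # Dict assignment mirrors ZADD: one entry per member, score overwritten.
            timelines.setdefault(follower, {})[tweet_id] = score
        checkpoints[tweet_id] = next_cursor
        if crash_after is not None and batches_done == crash_after:
            raise RuntimeError("simulated OOMKill")


followers = list(range(10))
timelines, checkpoints = {}, {}
try:
    fan_out(12345, 1713000000, followers, timelines, checkpoints, crash_after=2)
except RuntimeError:
    pass
print(checkpoints[12345])   # 6: two batches of three were committed

fan_out(12345, 1713000000, followers, timelines, checkpoints)   # resumes at 6
print(len(timelines), all(list(t) == [12345] for t in timelines.values()))
# 10 True
```

Running the crash-and-replay path in a test like this is a cheap version of the staging drill the lessons recommend.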

Production debug guide

Diagnostic steps when the designed system exhibits production symptoms

Symptom · 01

Timeline shows stale tweets — users see content from 5+ minutes ago

→

Fix

Check Kafka consumer lag per partition. If lag is growing, scale fan-out workers or check for a single-partition hotspot caused by all messages for one celebrity routing to the same partition key. If lag is zero, check Redis ZSET freshness — the timeline key may have expired due to user inactivity and needs a cold-start rebuild triggered asynchronously on the next login.

Symptom · 02

Timeline read latency exceeds 200ms p99 during peak hours

→

Fix

Profile the read path in three segments: Redis ZRANGE latency, celebrity tweet DB query time, and tweet hydration batch GET time. The most common culprit is an N+1 query in the hydration step — fetching one tweet per network call instead of batching all tweet IDs into a single batch GET. Switch to WHERE tweet_id IN (...) or a multi-key MGET pattern and verify the query hits a covering index.

Symptom · 03

Fan-out worker pods keep restarting with OOMKill

→

Fix

Check whether the triggering fan-out event involves a high-follower-count author whose follower list is being fetched without cursor pagination. Add batch streaming with a hard cap of 10,000 follower IDs per batch and a memory limit on the worker pod. Also verify the celebrity threshold check is firing correctly — a near-threshold account that slips through to the write path can allocate follower lists larger than the pod's memory limit.

Symptom · 04

Write throughput drops suddenly during a viral event

→

Fix

Check whether the celebrity threshold logic is working correctly — a viral tweet from a near-threshold account may be hitting the fan-out-on-write path instead of the fan-out-on-read path, flooding fan-out workers unexpectedly. Also check Kafka producer backpressure metrics and topic partition rebalancing logs — a rebalance during a traffic spike can cause producer timeouts and reduce effective write throughput by 40-60% for the duration of the rebalance.

★ System Design Debugging Cheat Sheet

Symptom-based diagnostic commands for the Twitter timeline design

Timeline shows stale tweets from minutes ago

Immediate action

Check Kafka consumer lag per partition for the fan-out consumer group

Commands

kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group fan-out-workers

kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group fan-out-workers | awk '$5 > 0 {print $1, $2, $5}'

Fix now

Scale fan-out worker replicas: kubectl scale deployment fan-out-worker --replicas=10 — verify lag starts dropping within 60 seconds of scaling
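Positional field numbers in an awk filter are fragile because the column layout of `kafka-consumer-groups.sh --describe` has changed between Kafka versions. A hedged alternative is to locate columns by header name. The parser below is a sketch, and the sample text is illustrative rather than captured output, so check it against what your Kafka version actually prints.

```python
def partition_lag(describe_output: str) -> dict:
    """Map (topic, partition) -> lag, locating columns by header name."""
    rows = [line.split() for line in describe_output.splitlines() if line.strip()]
    header_index = next(i for i, row in enumerate(rows) if "LAG" in row)
    header = rows[header_index]
    topic_col, part_col, lag_col = (header.index(name)
                                    for name in ("TOPIC", "PARTITION", "LAG"))
    lag = {}
    for row in rows[header_index + 1:]:
        # Skip rows where lag is unknown (shown as "-") or the row is short.
        if len(row) > lag_col and row[lag_col].isdigit():
            lag[(row[topic_col], int(row[part_col]))] = int(row[lag_col])
    return lag


# Illustrative sample only; real output has more columns and varies by version.
sample = """
GROUP            TOPIC          PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
fan-out-workers  tweet-created  0          1500            1500            0
fan-out-workers  tweet-created  1          1200            1950            750
"""
lagging = {key: value for key, value in partition_lag(sample).items() if value > 0}
print(lagging)   # {('tweet-created', 1): 750}
```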


Fan-out Strategy Comparison

Decision Point	Fan-out on Write (Push)	Fan-out on Read (Pull)
Read latency	O(1) — Redis ZSET lookup, ~1ms	O(followees) — one DB query per followed account, degrades with follow count
Write cost	O(followers) — 50M Redis writes for a celebrity tweet	O(1) — write the tweet once to the DB, no fan-out
Storage cost	High — 3.2TB Redis for 500M precomputed timelines at 800 entries each	Low — no precomputed state, only the source tweet DB
Freshness	Eventual — fan-out lag under load can be 30-120 seconds during spikes	Strong — always reads latest data from source at read time
Best for	Normal users with fewer than 10K followers — the 99.9% case	Celebrity accounts with more than 10K followers — the write amplification outlier
Production example	Twitter (normal users), Instagram feed delivery	Twitter celebrities, LinkedIn feed for high-connection accounts
Failure mode	Kafka consumer lag → stale timelines during traffic spikes	Large followee lists → high read latency; N+1 DB queries if not batched
Recommended approach	Hybrid: write path for normal users, read path for celebrities	Hybrid: write path for normal users, read path for celebrities

⚙ Quick Reference

9 commands from this guide

File	Command / Code	Purpose
io/thecodeforge/interview/RequirementsClarificationScript.txt	"Before I start designing, I want to make sure we're aligned on scope.	Phase 1
io/thecodeforge/interview/HighLevelDesignScript.txt	API DESIGN — Tweet Posting Flow:	Phase 2
io/thecodeforge/interview/FanOutDeepDive.txt	When a tweet is posted:	Phase 3
io/thecodeforge/interview/BottlenecksAndClosingScript.txt	"Let me stress-test my own design before we wrap up."	Phase 4
CapacityPlanner.py	MAU = 100_000_000 # monthly active users	Phase 0: Capacity Estimation
TimelineSchema.py	CREATE TABLE users (	Phase 5: Database Schema
duplicate_detection.py	from sentence_transformers import SentenceTransformer	AI Components in System Design
fanout_consumer.py	from kafka import KafkaConsumer	Event-Driven Architecture
tracing_setup.py	from opentelemetry import trace	Observability-Driven Design

Key takeaways

Requirements before architecture

never draw a single component until you have at least 3 specific NFRs with real numbers. QPS, latency SLA, and availability target are the minimum. Every component choice must be traceable to one of these requirements — if it isn't, you're guessing.

The hybrid fan-out pattern is the production answer to social feed scaling

push (fan-out on write) for normal users keeps reads O(1), pull (fan-out on read) for celebrities caps write amplification. The threshold (~10K followers) is a tunable configuration value calibrated against your Redis write capacity, not a magic number pulled from a blog post.

Critique your own design proactively

surface bottlenecks before the interviewer finds them. This is the single highest-signal behavior that separates senior candidates from mid-level candidates in system design rounds. Engineers who have operated systems at scale think about failure modes first, not last.

Cursor-based pagination is non-negotiable at scale

offset pagination (LIMIT x OFFSET n) makes the database scan and discard n rows before returning anything, so cost grows linearly with page depth, and it produces inconsistent results when rows are inserted between page fetches. Always use a cursor (a last-seen ID or an opaque base64 token) for any paginated API that will serve production traffic.
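A last-seen-ID cursor looks like this in practice. The sketch uses the standard-library `sqlite3` module and a toy `tweets` table so it runs anywhere; the function name `fetch_page` and the page size are assumptions, and the same `WHERE tweet_id < ?` shape applies to any database with an index on the sort key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany("INSERT INTO tweets VALUES (?, ?)",
                 [(i, f"tweet {i}") for i in range(1, 11)])


def fetch_page(conn, before_id=None, limit=4):
    """Return (rows, next_cursor): tweets older than before_id, newest first."""
    if before_id is None:
        rows = conn.execute(
            "SELECT tweet_id, content FROM tweets "
            "ORDER BY tweet_id DESC LIMIT ?", (limit,)).fetchall()
    else:
        rows = conn.execute(
            "SELECT tweet_id, content FROM tweets WHERE tweet_id < ? "
            "ORDER BY tweet_id DESC LIMIT ?", (before_id, limit)).fetchall()
    # A short page means the end was reached; otherwise continue after the last row.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor


seen, cursor = [], None
while True:
    rows, cursor = fetch_page(conn, cursor)
    seen.extend(tweet_id for tweet_id, _ in rows)
    if cursor is None:
        break
print(seen)   # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

Each page is an index seek to the cursor position, so the tenth page costs the same as the first, and rows inserted while paging cannot shift the pages already fetched.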

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

You've designed a fan-out system using Kafka and Redis. A fan-out worker...

Q02SENIOR

Your timeline service is hitting 200ms p99 during peak load. Walk me thr...

Q03SENIOR

We want to add a 'Topics' feature — users can follow topics like #NBA in...

Q04SENIOR

Walk me through how you would estimate the QPS and storage requirements ...

Q01 of 04SENIOR

You've designed a fan-out system using Kafka and Redis. A fan-out worker crashes after processing 30,000 of a celebrity's 50M followers. When it restarts and replays the Kafka message, how do you ensure those 30,000 followers don't get duplicate timeline entries?

ANSWER

Two-part answer: idempotency and checkpointing.

Idempotency: Redis ZADD with the same member and same score is a no-op: re-adding tweet_id 12345 to timeline:user_789 with score 1713000000 produces no change if that entry already exists. A sorted set holds each member once, so a replay that sends the same member with a different score (say, millisecond precision on the first pass and second precision on the replay) overwrites the score and shifts the entry's position rather than adding a second entry. True duplicates appear when the member itself differs between passes, for example if the timestamp is embedded in the member string. The fix: write the bare tweet_id as the member, normalize all timestamp scores to second precision in a shared utility function before any ZADD call, and enforce this through a code review rule so no caller can deviate from it.

Checkpointing: without checkpointing, the restarted worker reads the Kafka message from the beginning and replays all 50M followers starting from offset zero. Even with idempotent ZADD, this wastes compute and extends the delay for followers 30,001 through 50M. With checkpointing, the worker writes a progress key like fan-out:progress:{tweet_id}:{kafka_partition} = {last_committed_follower_cursor} in Redis after each batch of 10,000 followers. On restart, it reads this checkpoint and resumes from cursor 30,001 instead of zero.

Combined: idempotent ZADD guarantees correctness on any replay path, and checkpointing guarantees efficiency. Both are required: idempotency without checkpointing is correct but slow; checkpointing without idempotency is fast but potentially wrong if the checkpoint itself is written non-atomically.
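When rehearsing this answer it helps to be precise about what a sorted set deduplicates. Redis keys sorted-set entries on the member and stores one score per member, and ZADD returns the number of members newly added. The toy model below mirrors just that behavior; `SortedSetModel` is a hypothetical teaching aid, not a Redis client.

```python
class SortedSetModel:
    """Tiny model of a Redis sorted set: unique members, one score each."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member: str, score: float) -> int:
        """Return 1 if the member is new, 0 if it existed (score overwritten)."""
        is_new = member not in self._scores
        self._scores[member] = float(score)
        return int(is_new)

    def zcard(self) -> int:
        return len(self._scores)


timeline = SortedSetModel()
print(timeline.zadd("12345", 1713000000))      # 1: new entry
print(timeline.zadd("12345", 1713000000))      # 0: exact replay, no change
print(timeline.zadd("12345", 1713000000000))   # 0: same member, score overwritten
print(timeline.zcard())                        # 1

# A duplicate needs a different member, e.g. a timestamp baked into the value.
print(timeline.zadd("12345:1713000000000", 1713000000000))   # 1
print(timeline.zcard())                                       # 2
```

The practical rule that falls out: keep the member a bare tweet_id and keep the score encoding consistent, so a replay can neither duplicate nor reorder entries.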


Naren Founder & Principal Engineer

20+ years shipping production code across the stack, with years spent interviewing engineers. Lessons pulled from things that broke in production.


Last updated: July 18, 2026
