Design Twitter Feed — Surviving 50M Fan-Out Writes
One celebrity tweet triggered 50M writes, freezing feeds for 30 seconds.
- Fan-out models: push (write-time) vs pull (read-time) vs hybrid — each trades off write cost vs read latency
- Timeline cache: pre-computed per-user feed reduces read cost from O(N), where N is the number of followed accounts, to O(1)
- Celebrity (hot key) problem: followers of high-profile users cause write amplification — solution: separate celebrity fan-out into pull path
- Ranking: ML-based scoring on recency, engagement, relevance — production models update in near-real-time
- Performance insight: push fan-out at scale requires ~5ms per follower write; hybrid reduces peak write load by 40%
Imagine a school notice board, but instead of one board everyone walks to, each student gets their own personal copy of only the notices that matter to them — delivered the moment someone posts. When you follow 300 people on Twitter, you want to see their tweets instantly without Twitter searching through billions of posts every time you open the app. The feed system is basically a very smart mail-sorting room that pre-packages your personal newspaper so it's ready the instant you ask for it.
Twitter serves roughly 500 million tweets per day to hundreds of millions of active users. When you tap the home icon, you expect your feed to load in under 200 milliseconds — faster than a blink. Behind that blink is one of the most studied, most debated, and most instructive system design problems in the industry. If you can reason about a Twitter feed from first principles, you can design virtually any social content platform that has ever existed.
The core tension is simple to state but brutally hard to solve: reads vastly outnumber writes (people scroll far more than they tweet), yet writes need to fan out to potentially millions of followers in near-real time. Solve for reads, and you stress writes. Solve for writes, and reads become expensive. Every architectural decision in this problem is a negotiation between these two forces, and the right answer changes based on traffic patterns you can only learn from production.
By the end of this article you'll be able to whiteboard the full Twitter feed pipeline — from the moment a user hits 'Post', through fan-out, caching, ranking, and eventual delivery — explain the celebrity (hot key) problem and its solutions, articulate the trade-offs between push vs. pull vs. hybrid fan-out, and answer the follow-up questions that trip up even strong candidates.
What is Design Twitter Feed?
Design Twitter Feed is the system design problem of building a timeline that shows tweets from followed users in near-real time. The core challenge is fan-out: distributing a single tweet to all of the author's followers. Three models exist:
- Push (write fan-out): when a tweet is posted, write it into every follower's timeline cache. Read is O(1). Write cost is O(followers of the author).
- Pull (read fan-out): on timeline load, fetch recent tweets from every account the reader follows and merge them. Write is O(1). Read cost is O(accounts the reader follows).
- Hybrid: use push for average users (follower count < threshold) and pull for celebrities (follower count >= threshold).
The hybrid model is what production Twitter uses. The threshold is typically around 1 million followers.
Another key component is ranking: after fetching the raw timeline, apply a machine learning model to score tweets by recency, engagement, and relevance. The ranked list is cached per user for a short TTL (typically 5 minutes) to balance freshness and read performance.
- Push = mail merge: each subscriber gets a personalised copy (high write cost)
- Pull = RSS feed: subscriber fetches when they want (high read cost)
- Hybrid = tweets from celebrity authors are pulled at read time, everyone else's are pushed at write time – balances load
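The three models can be made concrete with a small in-memory sketch. Everything here is illustrative: the `HybridFanout` class, its method names, and the dictionaries standing in for the follower graph and the timeline caches are assumptions, and the demo uses a threshold of 3 followers so the celebrity path is easy to trigger.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # assumed cutoff, matching the "around 1 million" above


class HybridFanout:
    """In-memory stand-in for the follower graph and the timeline caches."""

    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)       # author -> follower ids
        self.following = defaultdict(set)       # user -> followed authors
        self.timelines = defaultdict(list)      # user -> pushed (ts, tweet_id) entries
        self.author_tweets = defaultdict(list)  # author -> own (ts, tweet_id) entries

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def post(self, author, tweet_id, ts):
        """Write path: always record the tweet, push only for non-celebrities."""
        self.author_tweets[author].append((ts, tweet_id))
        if self.is_celebrity(author):
            return 0  # O(1) write; followers pull at read time
        for follower in self.followers[author]:
            self.timelines[follower].append((ts, tweet_id))
        return len(self.followers[author])  # number of timeline writes performed

    def read(self, user, limit=200):
        """Read path: pushed entries plus a pull from followed celebrities."""
        entries = set(self.timelines[user])  # a set avoids duplicates if an author
        for author in self.following[user]:  # crossed the threshold after a push
            if self.is_celebrity(author):
                entries.update(self.author_tweets[author])
        newest_first = sorted(entries, reverse=True)
        return [tweet_id for _, tweet_id in newest_first[:limit]]


# Demo with a tiny threshold so "star" counts as a celebrity.
feed = HybridFanout(threshold=3)
for user in ("a", "b", "c"):
    feed.follow(user, "star")
feed.follow("a", "bob")
bob_writes = feed.post("bob", "t1", ts=1)    # pushed to bob's single follower
star_writes = feed.post("star", "t2", ts=2)  # not pushed at all
print(bob_writes, star_writes, feed.read("a"))
```

The write cost of `post` grows with the author's follower count only on the push path, while `read` pays one extra lookup per followed celebrity. That is the trade the hybrid model makes.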
Timeline Cache Architecture
The timeline cache is the heart of read performance. For each user, a Redis sorted set stores tweet IDs scored by timestamp. When a new tweet is fanned out, it is added to each follower's timeline cache. On read, the cache returns the top 200 tweet IDs, which are then hydrated with full tweet content from a separate tweet content cache. The cache TTL is typically 5 minutes. After the TTL expires, the next read triggers a rebuild: the system fetches all followed users' recent tweets (pulling from celebrity records and recent tweet caches), runs ranking, and repopulates the cache. This rebuild is expensive, so cache hit ratio is a critical production metric. To prevent cache stampedes, use request coalescing (often called single-flight): only one request per user rebuilds the cache; the others wait on a shared future for its result.
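The "one request rebuilds, the others wait on a future" idea can be sketched with the standard library. This is a minimal single-process illustration, not how any particular production system does it: `SingleFlight`, `rebuild_timeline`, and the key format are hypothetical names, and a real deployment would need coalescing across processes as well.

```python
import threading
import time
from concurrent.futures import Future


class SingleFlight:
    """Coalesce concurrent rebuilds of the same key into a single call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Future shared by every waiter

    def do(self, key, rebuild):
        with self._lock:
            future = self._inflight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._inflight[key] = future
        if not leader:
            return future.result()  # block until the leader finishes
        try:
            future.set_result(rebuild())
        except BaseException as exc:
            future.set_exception(exc)  # waiters see the same failure
        finally:
            with self._lock:
                del self._inflight[key]
        return future.result()


# Demo: eight concurrent readers hit an expired timeline for the same user.
calls = []
results = []


def rebuild_timeline():
    calls.append(1)
    time.sleep(0.2)  # stands in for the expensive fetch, rank, repopulate step
    return ["t3", "t2", "t1"]


flight = SingleFlight()
threads = [
    threading.Thread(target=lambda: results.append(flight.do("user:42", rebuild_timeline)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), len(results))
```

All eight readers receive the same list, and as long as they arrive while the first rebuild is still running, the rebuild function runs once.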
Quick check: redis-cli info stats | grep keyspace_hits
Ranking Pipeline: From Raw Tweets to Personalised Feed
Ranking is what turns a chronological list of tweets into a personalised feed that maximizes engagement. The pipeline: gather candidate tweets (from timeline cache or rebuild), extract features (recency, author engagement, content type, user's past interactions), score using an ML model (typically a lightweight gradient-boosted tree or neural network), and return top N (usually 200). The model is trained offline on implicit user signals (clicks, dwell time, retweets). Inference is done online per timeline load. Model latency must be < 50ms to keep overall feed load < 200ms. Use model quantization or distillation for speed. Feature freshness matters: some features like tweet recency decay rapidly. Incorporate time-decay in the score.
Ranking is also used in the pull path: when a user pulls a celebrity's tweets, the ranking model scores which of those recent tweets to show prominently.
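As a rough illustration of time-decay in the score, the sketch below multiplies an engagement estimate by an exponential decay on tweet age. The `Candidate` fields, the one-hour half-life, and the scoring formula are assumptions chosen for the example; a production model would learn the weighting rather than hard-code it.

```python
from dataclasses import dataclass

HALF_LIFE_SECONDS = 3600.0  # assumed: a tweet loses half its weight every hour


@dataclass
class Candidate:
    tweet_id: str
    age_seconds: float
    engagement: float  # stand-in for a model's predicted engagement in [0, 1]


def score(candidate, half_life=HALF_LIFE_SECONDS):
    """Engagement estimate scaled by exponential decay on age."""
    decay = 0.5 ** (candidate.age_seconds / half_life)
    return candidate.engagement * decay


def rank(candidates, top_n=200):
    """Return the top_n candidates, highest score first."""
    return sorted(candidates, key=score, reverse=True)[:top_n]


candidates = [
    Candidate("old_viral", age_seconds=7200, engagement=0.9),
    Candidate("mid", age_seconds=3600, engagement=0.5),
    Candidate("fresh", age_seconds=60, engagement=0.3),
]
ranked_ids = [c.tweet_id for c in rank(candidates)]
print(ranked_ids)
```

With these numbers the two-hour-old tweet keeps only a quarter of its engagement weight, so a fresh tweet with a much lower engagement estimate can outrank it.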
Quick check: run kubectl top pod -l app=ranking for CPU/memory.
Celebrity (Hot Key) Problem – Deep Dive
When a user with millions of followers tweets, a push fan-out would require writing to millions of timeline caches. This creates a 'hot key': the write load on the fan-out workers spikes dramatically. In production, Twitter observed that a single celebrity tweet could take down the fan-out infrastructure. The solution: split the world into two groups. 'Normal' users (followers < threshold) get push; 'celebrity' users (followers >= threshold) get pull. For celebrities, we don't pre-push; instead, when a follower loads their timeline, we fetch recent tweets from the celebrity's own tweet cache.
But even the pull path needs to be fast. For each celebrity, maintain a small cache (e.g., 100 most recent tweets) served from memory. When a follower's timeline rebuild hits, it queries all followed celebrities' recent caches in parallel.
Threshold selection is critical: too high, and accounts with large followings still go through push and cause write spikes; too low, and too many authors end up on the pull path, which makes every timeline read more expensive. Typically, the threshold is set dynamically based on the current fan-out worker queue depth and cluster capacity.
- Push fan-out = each fan buys a ticket individually – overwhelms the system
- Pull fan-out = fans stay home and receive a notification to watch the livestream (fetch)
- Hybrid = VIPs get reserved seats (pull); general admission gets pre-assigned tickets (push)
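The read side of the celebrity path, querying each followed celebrity's recent-tweet cache in parallel and merging with the pushed timeline, might look roughly like this. `CELEBRITY_CACHE`, `fetch_recent`, and `build_timeline` are hypothetical stand-ins; in a real system the lookup would be a network call to an in-memory store.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical stand-in for the per-celebrity "recent tweets" cache.
# Each list holds (timestamp, tweet_id) pairs, newest first.
CELEBRITY_CACHE = {
    "star_a": [(50, "a3"), (30, "a2"), (10, "a1")],
    "star_b": [(45, "b2"), (20, "b1")],
}


def fetch_recent(celebrity_id, limit=100):
    """Stand-in for a cache lookup of one celebrity's most recent tweets."""
    return CELEBRITY_CACHE.get(celebrity_id, [])[:limit]


def build_timeline(pushed, celebrity_ids, top_n=200):
    """Merge the pushed timeline with celebrity tweets pulled in parallel.

    `pushed` must already be sorted newest first, like the cached lists.
    """
    pulled = []
    if celebrity_ids:
        workers = min(32, len(celebrity_ids))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            pulled = list(pool.map(fetch_recent, celebrity_ids))
    # heapq.merge streams a k-way merge; reverse=True expects descending inputs.
    merged = heapq.merge(pushed, *pulled, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, top_n)]


pushed_timeline = [(40, "p2"), (15, "p1")]
timeline = build_timeline(pushed_timeline, ["star_a", "star_b"], top_n=5)
print(timeline)
```

Because every input list is already sorted, the merge stops after `top_n` items instead of sorting everything, which keeps the pull path cheap even when a reader follows many celebrities.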
Timeline Consistency and Freshness
Users expect their timeline to be fresh – new tweets should appear within seconds. But with caching and hybrid fan-out, achieving strong consistency is expensive. Twitter uses a relaxed consistency model: eventual consistency for timeline writes, with a best-effort refresh.
The core mechanism: when a user tweets, they themselves get an immediate push to their own timeline cache. For followers, the push happens asynchronously within a few seconds. If a follower loads the timeline before the push completes, they may not see the tweet. To mitigate this, the pull path includes recent tweets from the author (especially if the author is a celebrity).
For critical timeliness (e.g., breaking news), some systems implement a 'real-time feed' that bypasses caching and does a full pull from a small set of followed users. This is more expensive but ensures sub-second freshness.
Freshness is monitored by comparing the timestamp of the newest tweet visible in the cached timeline with the timestamp of the actual latest tweet from followed authors. If the drift exceeds 30 seconds, alert.
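The drift check described above reduces to a small comparison. The function names and the shape of the inputs (a newest cached timestamp plus a mapping of author to latest tweet timestamp) are assumptions for illustration; only the 30-second alert threshold comes from the text.

```python
DRIFT_ALERT_SECONDS = 30.0  # alert threshold stated in the text


def freshness_drift(cached_newest_ts, latest_ts_by_author):
    """Seconds by which the cached timeline lags the newest followed tweet."""
    if not latest_ts_by_author:
        return 0.0
    actual_newest = max(latest_ts_by_author.values())
    return max(0.0, actual_newest - cached_newest_ts)


def should_alert(cached_newest_ts, latest_ts_by_author, threshold=DRIFT_ALERT_SECONDS):
    """True when the cached timeline is staler than the allowed drift."""
    return freshness_drift(cached_newest_ts, latest_ts_by_author) > threshold


# Cached timeline's newest tweet is at t=1000; one followed author posted at t=1045.
drift = freshness_drift(1000.0, {"alice": 1020.0, "bob": 1045.0})
print(drift, should_alert(1000.0, {"alice": 1020.0, "bob": 1045.0}))
```

In practice a check like this would run on a sample of users rather than on every read, since computing the true latest tweet per followed author is itself a pull.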
The Celebrity Tweet That Melted the Fan-Out Cluster
- Always design for hot keys – a single celebrity tweet can dwarf normal traffic
- Hybrid fan-out is not optional for any social platform with power users
- Monitor fan-out worker queue depth per celebrity as a leading indicator
- curl <internal-metrics>/fanout/lag?userId=<id>: if lag > 5s, inspect the fan-out worker queues for the tweet author. Verify the author is not classified as celebrity; if so, check pull path health.
- redis-cli --latency. Also verify ranking model prediction time: if model inference > 100ms, consider caching the ranked list.
- kubectl rollout history deployment/ranking-model.
- SELECT count(*) FROM timeline WHERE tweet_id = X to confirm duplicates. Add a distributed lock per (userId, tweetId) before insert.
- toggle_ranking_off?tweetId=Y
Key takeaways
Common mistakes to avoid
Assuming push fan-out scales to all users
Caching only tweet content, not the merged timeline
Using a single TTL for all timeline caches
Not having a fallback for ranking model failures
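For the last mistake in the list, one possible guard is to put a deadline on the ranking call and fall back to reverse-chronological order when the model is slow or fails. This is only a sketch under assumptions: `rank_with_fallback`, the `model_rank` callable, and the 50ms default (borrowed from the latency budget mentioned earlier) are illustrative, not a description of any real service.

```python
import time
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)


def rank_with_fallback(candidates, model_rank, timeout_s=0.05):
    """Rank (timestamp, tweet_id) candidates, degrading to newest-first.

    Returns (ordered tweet ids, source), where source names the path taken.
    """
    future = _executor.submit(model_rank, candidates)
    try:
        return future.result(timeout=timeout_s), "model"
    except Exception:
        # Covers both a timeout and an error raised inside the model call.
        future.cancel()  # no effect if the call is already running
    newest_first = sorted(candidates, reverse=True)
    return [tweet_id for _, tweet_id in newest_first], "chronological"


def broken_model(candidates):
    raise RuntimeError("model unavailable")


def slow_model(candidates):
    time.sleep(0.3)  # well past the deadline
    return []


def healthy_model(candidates):
    # Toy "model": oldest first, just to differ from the fallback order.
    return [tweet_id for _, tweet_id in sorted(candidates)]


tweets = [(1, "t1"), (3, "t3"), (2, "t2")]
print(rank_with_fallback(tweets, broken_model))
print(rank_with_fallback(tweets, slow_model))
print(rank_with_fallback(tweets, healthy_model, timeout_s=1.0))
```

Returning the source alongside the ids makes it easy to count fallbacks in metrics, which is the signal that the ranking service needs attention.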
Interview Questions on This Topic
Design the Twitter timeline. Walk through the fan-out strategy and caching architecture.
Frequently Asked Questions