Senior 4 min · March 05, 2026

Design Twitter Feed — Surviving 50M Fan-Out Writes

One celebrity tweet triggered 50M writes, freezing feeds for 30 seconds.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Fan-out models: push (write-time) vs pull (read-time) vs hybrid — each trades off write cost vs read latency
  • Timeline cache: pre-computed per-user feed reduces read latency from O(N) to O(1)
  • Celebrity (hot key) problem: followers of high-profile users cause write amplification — solution: separate celebrity fan-out into pull path
  • Ranking: ML-based scoring on recency, engagement, relevance — production models update in near-real-time
  • Performance insight: push fan-out at scale requires ~5ms per follower write; hybrid reduces peak write load by 40%
Plain-English First

Imagine a school notice board, but instead of one board everyone walks to, each student gets their own personal copy of only the notices that matter to them — delivered the moment someone posts. When you follow 300 people on Twitter, you want to see their tweets instantly without Twitter searching through billions of posts every time you open the app. The feed system is basically a very smart mail-sorting room that pre-packages your personal newspaper so it's ready the instant you ask for it.

Twitter serves roughly 500 million tweets per day to hundreds of millions of active users. When you tap the home icon, you expect your feed to load in under 200 milliseconds — faster than a blink. Behind that blink is one of the most studied, most debated, and most instructive system design problems in the industry. If you can reason about a Twitter feed from first principles, you can design virtually any social content platform that has ever existed.

The core tension is simple to state but brutally hard to solve: reads vastly outnumber writes (people scroll far more than they tweet), yet writes need to fan out to potentially millions of followers in near-real time. Solve for reads, and you stress writes. Solve for writes, and reads become expensive. Every architectural decision in this problem is a negotiation between these two forces, and the right answer changes based on traffic patterns you can only learn from production.

By the end of this article you'll be able to whiteboard the full Twitter feed pipeline — from the moment a user hits 'Post', through fan-out, caching, ranking, and eventual delivery — explain the celebrity (hot key) problem and its solutions, articulate the trade-offs between push vs. pull vs. hybrid fan-out, and answer the follow-up questions that trip up even strong candidates.

What is Design Twitter Feed?

Design Twitter Feed is the system design problem of building a timeline that shows tweets from followed users in near-real time. The core challenge is fan-out: distributing a single tweet to all of the author's followers. Three models exist:
  • Push (write fan-out): when a tweet is posted, store it in every follower's timeline cache. Read is O(1). Write cost is O(followers).
  • Pull (read fan-out): when a timeline loads, fetch tweets from every followed account and merge them. Write is O(1). Read is O(accounts followed).
  • Hybrid: use push for average users (follower count < threshold) and pull for celebrities (follower count > threshold).

The hybrid model is what production Twitter uses. The threshold is typically around 1 million followers.

Another key component is ranking: after fetching the raw timeline, apply a machine learning model to score tweets by recency, engagement, and relevance. The ranked list is cached per user for a short TTL (typically 5 minutes) to balance freshness and read performance.

io/thecodeforge/feed/FanoutService.java (Java)
package io.thecodeforge.feed;

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Tweet, TimelineCache and FollowerGraph are project types not shown here.
public class FanoutService {
    private static final long CELEBRITY_THRESHOLD = 1_000_000L;
    private final ExecutorService fanOutPool = Executors.newFixedThreadPool(64);
    private final TimelineCache cache; // Redis-backed
    private final FollowerGraph followers; // Cassandra-backed

    public FanoutService(TimelineCache cache, FollowerGraph followers) {
        this.cache = cache;
        this.followers = followers;
    }

    public void fanoutTweet(Tweet tweet) {
        long authorId = tweet.authorId();
        long followerCount = followers.count(authorId);
        if (followerCount > CELEBRITY_THRESHOLD) {
            // Celebrity: one write to the author's recent list; followers pull it at read time
            fanOutPool.submit(() -> pushToRecentCelebrityCache(authorId, tweet));
        } else {
            // Normal author: push to every follower's timeline (one task per follower)
            List<Long> followerIds = followers.getFollowerIds(authorId);
            for (long uid : followerIds) {
                fanOutPool.submit(() -> cache.append(uid, tweet));
            }
        }
    }

    private void pushToRecentCelebrityCache(long authorId, Tweet tweet) {
        // Store in a small cache of recent tweets by this celebrity
        cache.appendToCelebrityList(authorId, tweet); // TTL 1 hour
    }
}
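The Java service above covers the write side only. As a complement, here is a minimal in-memory sketch of how a hybrid read might merge pushed entries with tweets pulled from followed celebrities. Everything in it is an assumption for illustration: the `HybridFeed` class, its method names, and the tiny threshold of 3 are hypothetical, and plain dictionaries stand in for Redis and the follower graph.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # tiny demo value (assumption); the article uses ~1M


class HybridFeed:
    def __init__(self):
        self.followers = defaultdict(set)       # author -> follower ids
        self.following = defaultdict(set)       # user -> followed author ids
        self.timelines = defaultdict(list)      # user -> [(ts, tweet_id)] pushed at write time
        self.author_recent = defaultdict(list)  # celebrity -> [(ts, tweet_id)] pulled at read time

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > CELEBRITY_THRESHOLD

    def post(self, author, tweet_id, ts):
        if self.is_celebrity(author):
            # Pull path: a single write, regardless of follower count
            self.author_recent[author].append((ts, tweet_id))
        else:
            # Push path: one write per follower
            for user in self.followers[author]:
                self.timelines[user].append((ts, tweet_id))

    def read(self, user, limit=200):
        # Gather recent tweets from every celebrity this user follows
        pulled = [
            item
            for author in self.following[user]
            if self.is_celebrity(author)
            for item in self.author_recent[author]
        ]
        # Merge pushed and pulled entries, newest first
        merged = heapq.nlargest(limit, self.timelines[user] + pulled)
        return [tweet_id for _, tweet_id in merged]
```

The sketch ignores accounts that cross the threshold between posts; a real system would need a migration rule for that case.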
Fan-out Analogy: Email Newsletter
  • Push = mail merge: each subscriber gets a personalised copy (high write cost)
  • Pull = RSS feed: subscriber fetches when they want (high read cost)
  • Hybrid = the few senders with huge lists publish a feed that subscribers fetch (pull); everyone else is mail-merged (push), which balances load
Production Insight
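Push fan-out at scale requires ~5ms per follower write.
At 10M followers, that's 50,000,000 ms = 50,000 seconds of write work for one tweet (about 14 hours if done serially, still about 13 minutes across a 64-thread pool).
Hybrid cuts this by skipping celebrity fan-out entirely.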
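The per-write figure can be worked through with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming the ~5ms per write quoted above and perfectly parallel workers with no queueing or batching; the function name is hypothetical.

```python
def fanout_seconds(followers: int, ms_per_write: float = 5.0, workers: int = 1) -> float:
    """Wall-clock seconds to push one tweet, assuming ideal parallelism."""
    return followers * ms_per_write / 1000.0 / workers


serial = fanout_seconds(10_000_000)              # one worker
pooled = fanout_seconds(10_000_000, workers=64)  # a 64-thread pool
print(serial, pooled)  # 50000.0 781.25
```

Even with 64 workers the push takes roughly 13 minutes, which is why raising the worker count alone does not solve the celebrity case.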
Key Takeaway
Choose fan-out model based on follower distribution.
Push optimises reads; pull optimises writes.
Hybrid is the pragmatic production choice.
Choosing a Fan-out Model
If: Most users have < 1000 followers
Use: Push fan-out. Read latency is critical; write load manageable.
If: Few users have > 1M followers
Use: Hybrid. Move celebrities to pull path to prevent write spikes.
If: Timeline freshness must be sub-second even for celebrities
Use: Pull-only with aggressive caching of popular authors' recent tweets.

Timeline Cache Architecture

The timeline cache is the heart of read performance. For each user, a Redis sorted set stores tweet IDs scored by timestamp. When a new tweet is fanned out, it is added to each follower's timeline cache. On read, the cache returns the top 200 tweet IDs, which are then hydrated with full tweet content from a separate tweet content cache. The cache TTL is typically 5 minutes. After the TTL expires, the next read triggers a rebuild: the system fetches all followed users' recent tweets (pulling from celebrity records and recent tweet caches), runs ranking, and repopulates the cache. This rebuild is expensive, so cache hit ratio is a critical production metric. To prevent cache stampedes, coalesce rebuild requests (a pattern often called single-flight): only one request per user rebuilds the cache; others wait on a future.

io/thecodeforge/feed/TimelineCacheManager.java (Java)
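package io.thecodeforge.feed;

import redis.clients.jedis.JedisCluster;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Tweet and RankingModel are project types not shown here.
public class TimelineCacheManager {
    private static final int PAGE_SIZE = 200;
    private static final int TTL_SECONDS = 300; // 5 min TTL

    private final JedisCluster redis;
    private final RankingModel rankingModel;
    // One in-flight rebuild per user; concurrent readers share the same future
    private final ConcurrentHashMap<Long, CompletableFuture<List<Tweet>>> rebuilds =
            new ConcurrentHashMap<>();

    public TimelineCacheManager(JedisCluster redis, RankingModel rankingModel) {
        this.redis = redis;
        this.rankingModel = rankingModel;
    }

    public List<Tweet> getTimeline(long userId) {
        String key = "timeline:" + userId;
        // Collection works whether the client returns a Set or a List (this differs across Jedis versions)
        Collection<String> rawIds = redis.zrevrange(key, 0, PAGE_SIZE - 1);
        if (!rawIds.isEmpty()) {
            return hydrateTweets(rawIds);
        }
        // Cache miss – rebuild
        return rebuildAndCache(userId);
    }

    private List<Tweet> rebuildAndCache(long userId) {
        // Only the first caller starts a rebuild; the rest join its future
        CompletableFuture<List<Tweet>> future = rebuilds.computeIfAbsent(
                userId, id -> CompletableFuture.supplyAsync(() -> doRebuild(id)));
        try {
            return future.join();
        } finally {
            rebuilds.remove(userId, future);
        }
    }

    private List<Tweet> doRebuild(long userId) {
        List<Tweet> allTweets = fetchRecentFromFollowed(userId);
        List<Tweet> scored = rankingModel.score(allTweets);
        String key = "timeline:" + userId;
        redis.del(key);
        for (Tweet tweet : scored) {
            redis.zadd(key, tweet.score(), String.valueOf(tweet.id()));
        }
        redis.expire(key, TTL_SECONDS);
        return scored.subList(0, Math.min(PAGE_SIZE, scored.size()));
    }

    // hydrateTweets(...) and fetchRecentFromFollowed(...) are omitted for brevity
}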
Cache Stampede Danger
When a cache entry expires, multiple read requests may all trigger a rebuild simultaneously. This can overwhelm the database and ranking service. Always use a rebuild future or mutex per key.
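The "rebuild future per key" idea can be shown in isolation. The sketch below is one possible shape, not the article's production code: the `SingleFlight` class and its `do` method are hypothetical names, and it only coalesces callers within a single process (a multi-node deployment would need a shared lock or lease as well).

```python
import threading
from concurrent.futures import Future


class SingleFlight:
    """Coalesces concurrent calls for the same key into one execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> Future shared by all concurrent callers

    def do(self, key, fn):
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future
        if leader:
            # Only the first caller runs the expensive rebuild
            try:
                future.set_result(fn())
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[key]
        # Followers block here until the leader publishes a result or an error
        return future.result()
```

A caller would wrap the expensive path, for example `sf.do("timeline:42", lambda: rebuild(42))`, where `rebuild` is whatever function repopulates the cache.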
Production Insight
Timeline cache hit rate should be > 90%.
A 10% drop increases read latency by 400ms on average.
Monitor: redis-cli info stats | grep keyspace_hits.
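Redis reports hits and misses as raw counters, so the hit rate has to be derived. A small sketch of that calculation, assuming the text passed in is the output of `INFO stats`; the function name is hypothetical.

```python
def hit_ratio(info_stats: str) -> float:
    """Compute the cache hit ratio from the text returned by Redis INFO stats."""
    fields = dict(
        line.split(":", 1)
        for line in info_stats.splitlines()
        if ":" in line and not line.startswith("#")
    )
    hits = int(fields["keyspace_hits"])
    misses = int(fields["keyspace_misses"])
    total = hits + misses
    return hits / total if total else 0.0


sample = "# Stats\r\nkeyspace_hits:9000\r\nkeyspace_misses:1000\r\n"
print(hit_ratio(sample))  # 0.9
```

These counters are cumulative since the last restart or `CONFIG RESETSTAT`, so alerting usually works on the difference between two samples rather than the lifetime ratio.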
Key Takeaway
Cache the merged feed, not just individual tweets.
Avoid stampede with co-ordinated rebuilds.
5-minute TTL balances freshness and write load.

Ranking Pipeline: From Raw Tweets to Personalised Feed

Ranking is what turns a chronological list of tweets into a personalised feed that maximizes engagement. The pipeline: gather candidate tweets (from timeline cache or rebuild), extract features (recency, author engagement, content type, user's past interactions), score using an ML model (typically a lightweight gradient boosted tree or neural network), and return top N (usually 200). The model is trained offline on implicit user signals (clicks, dwell time, retweets). Inference is done online per timeline load. Model latency must be < 50ms to keep overall feed load < 200ms. Use model quantization or distillation for speed. Feature freshness matters: some features like tweet recency decay rapidly. Incorporate time-decay in the score.

Ranking is also used in the pull path: when a user pulls a celebrity's tweets, the ranking model scores which of those recent tweets to show prominently.

io/thecodeforge/feed/ranking_model.py (Python)
# TheCodeForge ranking model (simplified)
import numpy as np
import joblib

class RankingScorer:
    def __init__(self, model_path='model/feed_ranker.pkl'):
        # Expects a pickled classifier that exposes predict_proba (scikit-learn style)
        self.model = joblib.load(model_path)

    def score_tweets(self, tweets, user_profile):
        if not tweets:
            return []  # predict_proba cannot score an empty feature matrix
        features = np.array([
            self._extract_features(t, user_profile) for t in tweets
        ], dtype=float)
        scores = self.model.predict_proba(features)[:, 1]  # probability of engagement
        for t, s in zip(tweets, scores):
            t.rank_score = float(s)
        return sorted(tweets, key=lambda t: t.rank_score, reverse=True)[:200]

    def _extract_features(self, tweet, profile):
        return [
            tweet.recent_engagement_rate,    # 0-1
            tweet.hours_since_post,          # raw age in hours; higher means older
            profile.interaction_score.get(tweet.author_id, 0.5),  # 0.5 = neutral default
            tweet.media_type_encoded,        # 0=text,1=image,2=video
            int(tweet.is_from_celebrity),    # boolean as 0/1
        ]
Feature Engineering Warning
Recency is often the most predictive feature, but over-weighting it can bury high-quality content. Use a time-decay function (e.g., exponential decay with half-life of 6 hours) to balance recency and relevance.
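The exponential decay mentioned in the warning is a one-line formula. A minimal sketch, assuming the 6-hour half-life from the text and an engagement value already normalised to 0-1; the function name is hypothetical.

```python
def decayed_score(engagement: float, hours_since_post: float,
                  half_life_hours: float = 6.0) -> float:
    """Halve the engagement signal every half_life_hours."""
    return engagement * 0.5 ** (hours_since_post / half_life_hours)


fresh_but_weak = decayed_score(0.3, 0)   # 0.3
older_but_strong = decayed_score(0.9, 6)  # 0.45
```

With these inputs the six-hour-old tweet still outranks the brand-new one, which is the balance between recency and relevance that the decay is meant to preserve.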
Production Insight
Ranking model inference must complete within 50ms at p99.
If model latency exceeds 100ms, fall back to chronological sort to avoid feed timeouts.
Monitor: kubectl top pod -l app=ranking for CPU/memory.
Key Takeaway
Ranking is a real-time ML pipeline with strict latency budgets.
Always maintain a chronological fallback.
Feature freshness is as important as model accuracy.
Ranking Fallback Strategy
If: Model latency > 100ms at p50
Use: Fall back to chronological sort + recency boost. Notify ML team.
If: Model AUC drops below 0.65
Use: Roll back to previous model version. Compare offline metrics.
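One way to enforce a latency budget with a chronological fallback is to run the scorer under a timeout. This is a sketch under stated assumptions: tweets are dictionaries with a `created_at` field, `score_fn` is any callable that returns a ranked list, and the 100ms default mirrors the fallback threshold above. None of these names come from a real service.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def chronological(tweets):
    """Fallback ordering: newest first."""
    return sorted(tweets, key=lambda t: t["created_at"], reverse=True)


def rank_with_fallback(tweets, score_fn, budget_seconds=0.1):
    future = _pool.submit(score_fn, tweets)
    try:
        return future.result(timeout=budget_seconds)
    except FutureTimeout:
        future.cancel()  # best effort; a scorer that is already running keeps going
        return chronological(tweets)
    except Exception:
        # The model raised: serve the feed anyway
        return chronological(tweets)
```

The timed-out scorer still occupies a worker until it finishes, so a real deployment would pair this with a circuit breaker or a feature flag rather than rely on timeouts alone.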

Celebrity (Hot Key) Problem – Deep Dive

When a user with millions of followers tweets, push fan-out would require writing to millions of timeline caches. This creates a 'hot key': the write load on the fan-out workers spikes dramatically. In production, Twitter observed that a single celebrity tweet could take down the fan-out infrastructure. The solution: split the world into two groups. 'Normal' users (followers < threshold) get push; 'celebrity' users (followers >= threshold) get pull. For celebrities, we don't pre-push; instead, when a follower loads their timeline, we fetch recent tweets from the celebrity's own tweet cache.

But even the pull path needs to be fast. For each celebrity, maintain a small cache (e.g., 100 most recent tweets) served from memory. When a follower's timeline rebuild hits, it queries all followed celebrities' recent caches in parallel.

Threshold selection is critical: too high, and accounts with very large followings still go through push and cause write spikes; too low, and so many authors move to the pull path that every timeline read has to merge many extra sources. Typically, the threshold is set dynamically based on the current fan-out worker queue depth and cluster capacity.

io/thecodeforge/feed/CelebrityCache.java (Java)
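package io.thecodeforge.feed;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Tweet is a project type not shown here.
public class CelebrityCache {
    private static final int MAX_PER_CELEBRITY = 100;
    private final ConcurrentHashMap<Long, List<Tweet>> recent = new ConcurrentHashMap<>();

    public void append(long authorId, Tweet tweet) {
        // Copy-on-write: build a fresh, bounded list so readers never see a list being mutated
        recent.compute(authorId, (k, old) -> {
            List<Tweet> updated = new ArrayList<>(MAX_PER_CELEBRITY);
            updated.add(tweet); // newest first
            if (old != null) {
                // Keep at most MAX_PER_CELEBRITY - 1 older tweets; copying avoids retaining subList views
                updated.addAll(old.subList(0, Math.min(old.size(), MAX_PER_CELEBRITY - 1)));
            }
            return Collections.unmodifiableList(updated);
        });
    }

    public List<Tweet> getRecent(long authorId) {
        // Returned lists are immutable snapshots, safe to read from any thread
        return recent.getOrDefault(authorId, Collections.emptyList());
    }
}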
Hot Key Analogy: Concert Ticketing
  • Push fan-out = each fan buys a ticket individually – overwhelms the system
  • Pull fan-out = fans stay home and tune in to the livestream whenever they choose (fetch on demand)
  • Hybrid = VIPs get reserved seats (pull); general admission gets pre-assigned tickets (push)
Production Insight
A single celebrity tweet can generate 10 million timeline writes.
Hybrid fan-out reduces peak write load by 40%.
Auto-tune the threshold based on fan-out queue depth every 5 seconds.
Key Takeaway
Hot keys break push fan-out.
Separate celebrities into pull path.
Dynamically adjust threshold based on system load.
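Auto-tuning the threshold from queue depth could look like the sketch below. It is an illustration only: the halving and 10% recovery steps, the target depth, and the clamp bounds are all assumed values, not figures from the article or from any production system.

```python
def tune_threshold(current: int, queue_depth: int, target_depth: int,
                   min_threshold: int = 10_000, max_threshold: int = 5_000_000) -> int:
    """Return the next celebrity threshold given the fan-out queue depth."""
    if queue_depth > target_depth:
        # Overloaded: halve the threshold so more authors move to the pull path
        proposed = current // 2
    elif queue_depth < target_depth // 2:
        # Plenty of headroom: raise the threshold slowly (10% per step)
        proposed = current + current // 10
    else:
        proposed = current
    return max(min_threshold, min(max_threshold, proposed))
```

Cutting fast and recovering slowly keeps the threshold from oscillating when the queue hovers around the target. A scheduler would call this on each tuning interval with the latest queue depth.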

Timeline Consistency and Freshness

Users expect their timeline to be fresh: new tweets should appear within seconds. But with caching and hybrid fan-out, achieving strong consistency is expensive. Twitter uses a relaxed consistency model: eventual consistency for timeline writes, with a best-effort refresh. The core mechanism: when a user tweets, they themselves get an immediate push to their own timeline cache. For followers, the push happens asynchronously within a few seconds. If a follower loads the timeline before the push completes, they may not see the tweet. To mitigate this, the pull path includes recent tweets from the author (especially if the author is a celebrity).

For critical timeliness (e.g., breaking news), some systems implement a 'real-time feed' that bypasses caching and does a full pull from a small set of followed users. This is more expensive but ensures sub-second freshness.

Freshness is monitored by comparing the timestamp of the newest tweet in the cached timeline with the actual latest tweet from followed authors. If the drift exceeds 30 seconds, alert.

io/thecodeforge/feed/FreshnessMonitor.java (Java)
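package io.thecodeforge.feed;

import java.time.Duration;
import java.time.Instant;

// TimelineCache and FollowerGraph are project types not shown here.
public class FreshnessMonitor implements Runnable {
    private static final long THRESHOLD_SECONDS = 30;
    private final TimelineCache cache;
    private final FollowerGraph followers;

    public FreshnessMonitor(TimelineCache cache, FollowerGraph followers) {
        this.cache = cache;
        this.followers = followers;
    }

    @Override
    public void run() {
        // Sample 1% of users
        for (long userId : cache.sampleUserIds(0.01)) {
            Instant lastCached = cache.getLatestTimestamp(userId);
            Instant latestTweet = followers.getLatestTweetFromFollowed(userId);
            if (latestTweet == null) {
                continue; // followed accounts have no tweets, so nothing can be stale
            }
            boolean stale = lastCached == null
                    || latestTweet.isAfter(lastCached.plusSeconds(THRESHOLD_SECONDS));
            if (stale) {
                // lastCached is null when the user has no cached timeline at all
                String drift = lastCached == null
                        ? "n/a (no cached timeline)"
                        : Duration.between(lastCached, latestTweet).getSeconds() + "s";
                System.err.println("Freshness breach for userId=" + userId + " drift=" + drift);
                // Trigger immediate rebuild for this user
                cache.invalidate(userId);
            }
        }
    }
}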
Eventual Consistency Traps
Users may see their own tweet immediately but followers don't see it for seconds. This asymmetry can cause user confusion. Provide a manual refresh button that triggers a pull path update.
Production Insight
Timeline freshness drift > 30 seconds is a P1 incident.
Monitor using synthetic user accounts that tweet and then check if they appear on a follower's feed.
Target: 95% of tweets visible within 5 seconds.
Key Takeaway
Accept eventual consistency, but bound freshness.
Use synthetic users to measure real user-perceived latency.
Provide a manual refresh fallback.
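The "95% of tweets visible within 5 seconds" target can be checked from synthetic probe results. A minimal sketch, assuming each probe reports the delay in seconds between posting and appearing on a follower's feed, or `None` if the tweet never appeared; the function name and sample numbers are hypothetical.

```python
def visible_within(delays_seconds, limit_seconds=5.0):
    """Fraction of probe tweets that became visible within the limit."""
    if not delays_seconds:
        return 0.0
    on_time = sum(1 for d in delays_seconds if d is not None and d <= limit_seconds)
    return on_time / len(delays_seconds)


probes = [0.8, 1.2, 2.0, 4.9, 6.5, None, 1.1, 0.4, 3.3, 2.2]
ratio = visible_within(probes)  # 8 of 10 probes were on time
meets_target = ratio >= 0.95    # False for this sample
```

Counting probes that never appeared as failures matters: dropping them from the sample would hide exactly the incidents the monitor exists to catch.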
Production incident · POST-MORTEM · severity: high

The Celebrity Tweet That Melted the Fan-Out Cluster

Symptom
Feeds for normal users stopped updating for up to 30 seconds. Logs showed massive write queue backlogs on the fan-out workers. CPU on Cassandra nodes hit 95%.
Assumption
The push fan-out system could handle any tweet volume because it was horizontally scalable.
Root cause
A single celebrity tweet generated 50M timeline writes, the same cost as 50,000 average tweets. The fan-out workers were sized for average load, not for the burst a hot key produces.
Fix
Moved celebrity users (follower count > 1M) to a hybrid fan-out: their tweets are not pre-pushed; instead, followers pull those tweets during timeline read. This reduced peak write load by 40% and eliminated the hot-key bottleneck.
Key lesson
  • Always design for hot keys – a single celebrity tweet can dwarf normal traffic
  • Hybrid fan-out is not optional for any social platform with power users
  • Monitor fan-out worker queue depth per celebrity as a leading indicator
Production debug guide
Symptom → Action guide for feed reliability engineers (4 entries)
Symptom · 01
User's feed is missing recent tweets from some followed accounts
Fix
Check fan-out lag: curl <internal-metrics>/fanout/lag?userId=<id> — if lag > 5s, inspect the fan-out worker queues for the tweet author. Verify the author is not classified as celebrity; if so, check pull path health.
Symptom · 02
Feed loads slowly (>500ms) for a specific user
Fix
Profile timeline cache hit rate. If < 80%, check cache cluster health: redis-cli --latency. Also verify ranking model prediction time – if model inference > 100ms, consider caching the ranked list.
Symptom · 03
Inconsistent feed ordering across devices
Fix
Check which ranking model version is deployed. Different ranking pods behind the load balancer may be serving different model versions. Inspect the rollout with kubectl rollout history deployment/ranking-model, and roll out model updates via canary.
Symptom · 04
Feed shows duplicate tweets
Fix
Inspect idempotency key in timeline write path. Fan-out may have retried due to timeout. Use SELECT count(*) FROM timeline WHERE tweet_id = X to confirm duplicates. Add distributed lock per (userId, tweetId) before insert.
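An alternative to locking is to make the timeline write itself idempotent, which is how a Redis sorted set behaves: adding a member that already exists updates its score instead of creating a second entry. The sketch below imitates that behaviour with a dictionary; the `Timeline` class is a hypothetical stand-in, not a Redis client.

```python
class Timeline:
    """In-memory stand-in for a per-user sorted set keyed by tweet id."""

    def __init__(self):
        self._scores = {}  # tweet_id -> timestamp score

    def add(self, tweet_id, ts) -> bool:
        """Insert or update; returns True only when the tweet is new."""
        is_new = tweet_id not in self._scores
        self._scores[tweet_id] = ts
        return is_new

    def top(self, n=200):
        # Newest first, like a reverse range over the sorted set
        return sorted(self._scores, key=self._scores.get, reverse=True)[:n]


tl = Timeline()
first = tl.add(101, ts=5)   # True: new entry
retry = tl.add(101, ts=5)   # False: a retried fan-out write changes nothing
tl.add(102, ts=7)
```

If duplicates still appear with this kind of structure, the likely cause is two different IDs for the same logical tweet, which points upstream of the timeline write.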
Feed System Quick Debug Cheat Sheet
Instant commands and actions for the top 3 timeline emergencies.
Fan-out worker queue builds up – write latency spikes
Immediate action
Identify the offending tweet author via fan-out trends dashboard. If celebrity, temporarily enable pull path for that author.
Commands
curl -X POST 'internal/admin/fanout/mode?authorId=X&mode=pull'
kubectl scale deployment fanout-worker --replicas=100
Fix now
Pause ranking for this tweet during backfill by setting a feature flag: toggle_ranking_off?tweetId=Y
Timeline cache hit rate drops below 70%
Immediate action
Verify cache cluster CPU and memory. If the cluster is saturated, increase its size or add read replicas as a fallback.
Commands
redis-cli info stats | grep keyspace_
kubectl exec pod/redis-master-0 -- redis-cli info keyspace
Fix now
Set temporary TTL extension in config: timeline_cache_ttl_seconds: 1800 (increase from default 300)
Ranking model response times exceed 200ms
Immediate action
Check model inference server health and model version. Fallback to a simpler heuristic ranking (temporal sort) if needed.
Commands
curl internal/metrics/ranking/p99
kubectl set env deployment/ranking FEATURE_FLAG=simple_sort
Fix now
Scale out model servers by adding new pods: kubectl scale deployment ranking --replicas=20
Fan-out Model Comparison
Model | Write Cost | Read Cost | Freshness | Production Use
Push fan-out | O(followers) | O(1) | Best | Normal users (follower count < 1M)
Pull fan-out | O(1) | O(accounts followed) | Good (depends on query) | Celebrities (follower count >= 1M)
Hybrid | O(followers) for normal authors, O(1) for celebrities | O(1) plus O(celebrities followed) | Best for normal, Good for celebrities | Twitter, Instagram, Facebook

Key takeaways

1. You now understand what Design Twitter Feed is and why it exists
2. You've seen it working in a real runnable example
3. Practice daily: the forge only works when it's hot 🔥
4. Hybrid fan-out separates normal users (push) from celebrities (pull) to avoid hot keys
5. Timeline cache (Redis sorted set) is the key to O(1) reads
6. Ranking must have a real-time latency budget and a fallback
7. Cache stampedes can destroy the backend: always coalesce rebuilds

Common mistakes to avoid


Assuming push fan-out scales to all users

Symptom
Fan-out worker queues overflow during celebrity tweets, causing timeline delays for everyone.
Fix
Implement hybrid fan-out: push for users with < 1M followers, pull for celebrities. Use dynamic threshold based on current fan-out load.

Caching only tweet content, not the merged timeline

Symptom
Each feed request requires merging tweets from all followed users, causing high read latency and DB load.
Fix
Cache the pre-merged timeline (list of tweet IDs) per user. Use Redis sorted sets for efficient update and range retrieval.

Using a single TTL for all timeline caches

Symptom
A TTL that is too long leaves active users with stale feeds; one that is too short wastes memory and rebuild work on inactive users' timelines.
Fix
Use adaptive TTL: short TTL (2 min) for active users, longer TTL (30 min) for infrequent users. Track last access time per user.
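The adaptive TTL rule reduces to a small function of last access time. A sketch assuming the 2-minute and 30-minute TTLs from the fix above; the one-hour window that defines an "active" user is an assumed value, and the function name is hypothetical.

```python
def timeline_ttl_seconds(seconds_since_last_access: float,
                         active_window: float = 3600,
                         short_ttl: int = 120,
                         long_ttl: int = 1800) -> int:
    """Short TTL for recently active users, long TTL for everyone else."""
    if seconds_since_last_access <= active_window:
        return short_ttl
    return long_ttl


active_user_ttl = timeline_ttl_seconds(90)        # 120
dormant_user_ttl = timeline_ttl_seconds(86_400)   # 1800
```

The value would be applied when the timeline is written back to the cache, so each rebuild picks up the user's current activity level.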

Not having a fallback for ranking model failures

Symptom
If the ML model serving crashes, feed loads fail or return empty results.
Fix
Always maintain a simple chronological sort fallback. Use feature flags to switch instantly.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
Design the Twitter timeline. Walk through the fan-out strategy and cachi...
Q02 · SENIOR
How do you handle the 'super user' or 'celebrity' problem at scale?
Q03 · SENIOR
Explain the trade-offs between push, pull, and hybrid fan-out in a feed ...
Q04 · SENIOR
How does the ranking pipeline work in a feed system? How do you ensure i...
Q01 of 04 · SENIOR

Design the Twitter timeline. Walk through the fan-out strategy and caching architecture.

ANSWER
I'd start by defining the requirements: 500M tweets/day, 300M MAU, read latency < 200ms, eventual consistency acceptable. The fan-out model is hybrid: push for users under a follower threshold (e.g., 1M), pull for celebrities. The timeline cache uses Redis sorted sets per user, storing tweet IDs scored by timestamp. On write, fan-out workers push to followers' caches asynchronously. On read, get the top 200 IDs from the cache and hydrate them from the content cache. On a cache miss, rebuild by fetching from followed users' recent tweet stores, running ranking, and populating the cache. Celebrity tweets are not pre-pushed; when rebuilding, we fetch from a small celebrity recent cache. The ranking model scores tweets within 50ms. Trade-offs: push scales reads but creates hot keys; hybrid reduces write peaks. We also prevent cache stampedes with per-user rebuild futures.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01. What is Design Twitter Feed in simple terms?
02. Why is push fan-out not used for all users?
03. How does ranking avoid being slow?
04. What happens if the timeline cache is empty?