TikTok's system design relies on a hybrid of precomputed and real-time feed generation, combined with object storage, a CDN, and a custom recommendation engine. At its core, it uses fan-out-on-write for active users and fan-out-on-read for inactive users and mega-creators.
What is Design TikTok?
"Design TikTok" is the system design exercise of architecting a short-video platform that serves on the order of a billion users. It covers feed generation, video storage, recommendation systems, and real-time scaling.
Plain-English First
Think of TikTok as a massive library where every visitor wants a personalized stack of books. Instead of having a librarian fetch books on demand for each person (too slow), the library pre-stacks popular books on tables (precomputed feeds). For less popular books, the librarian grabs them when asked (real-time fetch). The library also remembers what you liked before to guess what you want next.
You think scaling a read-heavy app is hard? Try scaling a read-heavy app where every user expects a unique, personalized feed of short videos, and where a single viral video can spike traffic by 10x in minutes. That's TikTok. Most system design tutorials give you textbook diagrams with load balancers and databases, but they skip the hard parts: how do you generate a feed for 1 billion users without melting your database? How do you store and serve petabytes of video without bankrupting your company on CDN costs? This article walks you through the real production decisions behind TikTok's architecture—the trade-offs, the failure modes, and the patterns that actually work at scale. By the end, you'll know how to design a feed system that handles viral spikes, reduces storage costs, and keeps latency under 200ms.
Feed Generation: The Core Problem
Every social media app has a feed. But TikTok's feed is different: it's not just chronological. It's a personalized, infinite scroll of short videos. The naive approach—query all videos, rank them, return top N—breaks at scale. A single query could scan billions of videos. The real solution is a two-tier approach: precompute feeds for active users (fan-out-on-write) and compute on read for inactive users. Why? Because active users expect instant updates, while inactive users can tolerate a few seconds of latency. The trade-off is storage: precomputed feeds take space. But with a TTL of 1 hour and a limit of 500 videos per feed, it's manageable.
FeedGenerator.systemdesign
// io.thecodeforge: SystemDesign tutorial
// Feed generation service: caches feeds and invalidates them on new uploads
// Active user = logged in within last 24h
// Cache, Video, VideoRanker, CreatorGraph and fetchRecentVideos are assumed
// project types/helpers, not library classes.
import java.time.Duration;
import java.util.List;
import java.util.Set;

class FeedGenerator {
    private final Cache<String, List<Video>> feedCache;
    private final VideoRanker ranker;
    private final CreatorGraph graph; // who user follows

    FeedGenerator(Cache<String, List<Video>> feedCache, VideoRanker ranker, CreatorGraph graph) {
        this.feedCache = feedCache;
        this.ranker = ranker;
        this.graph = graph;
    }

    public List<Video> getFeed(String userId) {
        // Check cache first
        List<Video> cached = feedCache.get(userId);
        if (cached != null) return cached;

        // Fan-out-on-read: compute feed on demand
        Set<String> followees = graph.getFollowees(userId);
        List<Video> candidates = fetchRecentVideos(followees, 500); // 500-video feed limit
        List<Video> ranked = ranker.rank(candidates, userId);
        feedCache.put(userId, ranked, Duration.ofHours(1)); // 1-hour TTL, as stated in the text
        return ranked;
    }

    // Called when a creator uploads a new video
    public void onNewVideo(String creatorId, Video video) {
        // Fan-out-on-write: push to active followers only
        Set<String> activeFollowers = graph.getActiveFollowers(creatorId);
        for (String followerId : activeFollowers) {
            feedCache.evict(followerId); // force recompute on next read
            // Alternatively, prepend the video to a precomputed list
        }
    }
}
Output
Feed for user 'abc123' returned in 45ms (cache hit) or 2.1s (cache miss, computed on read).
Production Trap: Write Amplification
If you fan-out-on-write to every follower, a celebrity with 10M followers causes 10M cache updates per video. That's 10M writes for one upload. Mitigation: only fan-out to active followers (logged in last 24h). For mega-creators, switch to fan-out-on-read entirely.
Feed Generation Strategy Decision Tree
If user active in last 24h → use fan-out-on-write: precompute feed on new video upload
If user inactive >24h → use fan-out-on-read: compute feed on login
If creator has >1M followers → use fan-out-on-read for all followers to avoid write amplification
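The decision tree can be read as a small routing function. The sketch below is one possible encoding, not TikTok's actual logic: the class name `FanoutPolicy`, the `Strategy` enum, and the method signature are hypothetical, while the 24-hour window and the 1M-follower cutoff come from the tree above. The creator check is evaluated first, on the assumption that a mega-creator's followers should never receive a write-time push.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper that encodes the decision tree above. */
final class FanoutPolicy {
    enum Strategy { FAN_OUT_ON_WRITE, FAN_OUT_ON_READ }

    // Thresholds taken from the text: 24h activity window, 1M followers.
    static final Duration ACTIVE_WINDOW = Duration.ofHours(24);
    static final long MEGA_CREATOR_FOLLOWERS = 1_000_000L;

    /** Decides how a new video from one creator should reach one follower. */
    static Strategy choose(long creatorFollowers, Instant followerLastSeen, Instant now) {
        // Mega-creators are always served on read to avoid write amplification.
        if (creatorFollowers > MEGA_CREATOR_FOLLOWERS) {
            return Strategy.FAN_OUT_ON_READ;
        }
        boolean active =
                Duration.between(followerLastSeen, now).compareTo(ACTIVE_WINDOW) <= 0;
        return active ? Strategy.FAN_OUT_ON_WRITE : Strategy.FAN_OUT_ON_READ;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-02T00:00:00Z");
        System.out.println(choose(5_000, now.minus(Duration.ofHours(2)), now));
        System.out.println(choose(5_000, now.minus(Duration.ofHours(48)), now));
        System.out.println(choose(10_000_000, now.minus(Duration.ofHours(2)), now));
    }
}
```

Keeping the thresholds as named constants makes it easy to tune them per region or per load level without touching the call sites.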
[Figure: TikTok Feed System Design: Zero to 1B]
[Figure: Feed Generation Pipeline]
Video Storage: Where the Money Goes
Storing and serving videos is the biggest cost in TikTok. Raw video files are huge. The solution: transcode videos into multiple resolutions (240p to 4K), store in object storage (S3, GCS), and serve via CDN. But CDN egress is expensive. The trick is to only cache viral videos at the edge. Use a two-tier CDN: edge nodes cache only the top 1% of videos (by views in last hour), regional nodes cache the top 10%, and the rest come from origin. This cuts CDN costs by 60% while keeping p95 latency under 300ms.
VideoStorage.systemdesign
// io.thecodeforge: SystemDesign tutorial
// Video storage service with tiered caching
// ObjectStore, Cache and ViewCountService are assumed project types.
class VideoStorage {
    private final ObjectStore objectStore;
    private final Cache<String, byte[]> edgeCache;     // small, fast
    private final Cache<String, byte[]> regionalCache; // larger, slower
    private final ViewCountService viewCountService;   // hourly view analytics

    VideoStorage(ObjectStore objectStore, Cache<String, byte[]> edgeCache,
                 Cache<String, byte[]> regionalCache, ViewCountService viewCountService) {
        this.objectStore = objectStore;
        this.edgeCache = edgeCache;
        this.regionalCache = regionalCache;
        this.viewCountService = viewCountService;
    }

    public byte[] getVideo(String videoId, String resolution) {
        String key = videoId + ":" + resolution;

        // Check edge cache first
        byte[] video = edgeCache.get(key);
        if (video != null) return video;

        // Check regional cache, then fall back to the object store
        video = regionalCache.get(key);
        if (video == null) {
            video = objectStore.get(key);
            if (video == null) return null; // unknown video or resolution
            regionalCache.put(key, video);
        }

        // Promote to edge only if the video is viral (views > threshold),
        // so regional hits on non-viral videos do not fill the edge tier
        if (isViral(videoId)) {
            edgeCache.put(key, video);
        }
        return video;
    }

    private boolean isViral(String videoId) {
        // Check view count in last hour from analytics
        return viewCountService.getHourlyViews(videoId) > 100_000;
    }
}
Output
Video 'v98765' at 720p returned in 12ms (edge cache hit), 45ms (regional), or 320ms (origin).
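The 1% / 10% split described above can also be computed as a periodic batch job rather than a fixed view threshold. The following is a minimal sketch under that assumption: `CacheTierPlanner`, its `Tier` enum, and the input map of hourly view counts are hypothetical names, and rounding the tier sizes up (so a small catalog still gets at least one edge slot) is a choice made here, not something the article specifies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical planner: assigns each video a cache tier from its hourly views. */
final class CacheTierPlanner {
    enum Tier { EDGE, REGIONAL, ORIGIN }

    static Map<String, Tier> plan(Map<String, Long> hourlyViews) {
        // Rank video IDs by hourly views, highest first; ties broken by ID for determinism.
        List<String> ranked = new ArrayList<>(hourlyViews.keySet());
        ranked.sort(Comparator.comparingLong((String id) -> hourlyViews.get(id))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));

        int n = ranked.size();
        int edgeCount = (n + 99) / 100;   // top 1%, rounded up
        int regionalCount = (n + 9) / 10; // top 10%, rounded up (includes the edge set)

        Map<String, Tier> tiers = new HashMap<>();
        for (int i = 0; i < n; i++) {
            Tier tier = i < edgeCount ? Tier.EDGE
                    : i < regionalCount ? Tier.REGIONAL
                    : Tier.ORIGIN;
            tiers.put(ranked.get(i), tier);
        }
        return tiers;
    }

    public static void main(String[] args) {
        Map<String, Long> views = new HashMap<>();
        for (int i = 0; i < 200; i++) {
            views.put(String.format("v%03d", i), (long) (1000 - i));
        }
        Map<String, Tier> tiers = plan(views);
        System.out.println("v000 -> " + tiers.get("v000"));
        System.out.println("v010 -> " + tiers.get("v010"));
        System.out.println("v150 -> " + tiers.get("v150"));
    }
}
```

A rank-based plan adapts automatically when overall traffic rises or falls, whereas a fixed threshold such as 100,000 hourly views has to be retuned by hand.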
Senior Shortcut: Content-Aware Compression
Use different compression for different content types. Talking head videos compress well with H.265 at lower bitrates. High-motion content (dance, sports) needs higher bitrate. Analyze video content on upload and set compression profile accordingly. Saves 30% storage without quality loss.
[Figure: Video Storage Strategy]
Recommendation System: The Secret Sauce
TikTok's recommendation system is what keeps users hooked. It's not just collaborative filtering. It's a multi-stage ranking pipeline: first, retrieve candidates from multiple sources (follows, trending, similar users, content-based). Then, rank them using a deep learning model that predicts engagement (likes, shares, watch time). Finally, apply business rules (diversity, freshness, anti-spam). The key insight: the retrieval stage must be fast (sub-50ms) and the ranking stage must be accurate. Use approximate nearest neighbor (ANN) search for content-based retrieval, and a lightweight neural network for ranking. Don't try to rank all candidates—use a funnel: retrieve 500, rank 100, return 10.
Example latency budget: recommendations for user 'xyz789' returned 10 videos in 180ms total (45ms retrieve, 95ms rank, 40ms filter).
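The funnel can be sketched as two successive top-k cuts. This is an illustration under stated assumptions, not the production pipeline: `RecommendationFunnel`, `topK`, and `recommend` are hypothetical names, the "cheap" and "model" scorers stand in for a lightweight pre-ranker and the deep ranking model, and "retrieve 500, rank 100, return 10" is interpreted here as capping retrieval at 500, shortlisting 100 with the cheap scorer, and returning the 10 best by the model score.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical three-stage funnel: retrieve 500, shortlist 100, return 10. */
final class RecommendationFunnel {

    /** Scores each item once and keeps the k highest, best first. */
    static <T> List<T> topK(List<T> items, ToDoubleFunction<T> score, int k) {
        return items.stream()
                .map(item -> Map.entry(item, score.applyAsDouble(item)))
                .sorted(Map.Entry.<T, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    static <T> List<T> recommend(List<T> candidates,
                                 ToDoubleFunction<T> cheapScore,
                                 ToDoubleFunction<T> modelScore) {
        // Stage 1: retrieval output is capped at 500 candidates.
        List<T> retrieved = candidates.subList(0, Math.min(500, candidates.size()));
        // Stage 2: an inexpensive score narrows the pool to 100.
        List<T> shortlist = topK(retrieved, cheapScore, 100);
        // Stage 3: the expensive model only ever sees 100 items; return its top 10.
        return topK(shortlist, modelScore, 10);
    }

    public static void main(String[] args) {
        List<Integer> ids = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
        // Toy scorers: the cheap score prefers high IDs, the model score prefers low IDs.
        List<Integer> feed = recommend(ids, id -> id.doubleValue(), id -> -id.doubleValue());
        System.out.println(feed);
    }
}
```

The point of the shape is cost control: the expensive scorer runs on a bounded number of items regardless of how large the catalog grows.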
Interview Gold: Cold Start Problem
New users have no history. Solution: use a default model based on device type, location, and time of day. Also, force-show a few popular videos to bootstrap the profile. Once user interacts, switch to personalized model.
Scaling Under Viral Load
A single viral video can cause a 10x traffic spike in minutes. Your system must handle this without falling over. The key is to design for overload: use a queue for video uploads (so transcoding doesn't block the API), use a CDN with a high cache hit ratio, and implement rate limiting at the API gateway. But the most important pattern is circuit breakers: if the feed generation service starts timing out, stop sending requests to it and return a stale cached feed instead. This prevents cascading failures. Also, use autoscaling with a fast startup time—container images should be small (<500MB) and pre-warmed.
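ViralSpikeHandler.systemdesign
// io.thecodeforge: SystemDesign tutorial
// Circuit breaker for feed generation
// FeedGenerator, Cache and Video are assumed project types; feedGenerator.getFeed
// is assumed to declare TimeoutException.
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class FeedCircuitBreaker {
    private final AtomicInteger failureCount = new AtomicInteger(0);
    private final int threshold = 10; // consecutive timeouts before opening
    private final AtomicBoolean open = new AtomicBoolean(false);
    private final FeedGenerator feedGenerator;
    private final Cache<String, List<Video>> staleFeedCache;
    private final ScheduledExecutorService scheduler;

    FeedCircuitBreaker(FeedGenerator feedGenerator, Cache<String, List<Video>> staleFeedCache,
                       ScheduledExecutorService scheduler) {
        this.feedGenerator = feedGenerator;
        this.staleFeedCache = staleFeedCache;
        this.scheduler = scheduler;
    }

    public List<Video> getFeed(String userId) throws TimeoutException {
        if (open.get()) {
            // Return stale cached feed
            return staleFeedCache.get(userId);
        }
        try {
            List<Video> feed = feedGenerator.getFeed(userId);
            failureCount.set(0); // any success resets the streak
            return feed;
        } catch (TimeoutException e) {
            // compareAndSet lets exactly one thread open the breaker and schedule the reset
            if (failureCount.incrementAndGet() >= threshold && open.compareAndSet(false, true)) {
                // Schedule reset after 30 seconds
                scheduler.schedule(() -> { failureCount.set(0); open.set(false); }, 30, TimeUnit.SECONDS);
            }
            throw e;
        }
    }
}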
Output
During a viral spike, feed requests that would have timed out now return stale cached feeds in 5ms. Circuit breaker opens after 10 failures, closes after 30s.
Never Do This: Synchronous Transcoding
Never transcode videos synchronously on upload. A 4K video can take minutes to transcode. Use an async queue (SQS, Kafka) and return a 'processing' status. The client polls until the video is ready.
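A minimal sketch of the async pattern follows, assuming an in-memory queue as a stand-in for SQS or Kafka. `AsyncUploadSketch`, its `Status` enum, and the method names are hypothetical. The upload path only records the job and returns, while a separate worker step drains the queue. In a real deployment the worker would run in its own process and the status map would live in a shared store.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical upload flow: enqueue on upload, transcode later, poll for status. */
final class AsyncUploadSketch {
    enum Status { PROCESSING, READY }

    // Stand-in for a durable queue such as SQS or Kafka.
    private final BlockingQueue<String> jobs = new LinkedBlockingQueue<>();
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();

    /** API path: records the upload and returns immediately, without transcoding. */
    Status upload(String videoId) {
        statuses.put(videoId, Status.PROCESSING);
        jobs.add(videoId);
        return Status.PROCESSING;
    }

    /** Polling endpoint: returns null for an unknown video. */
    Status status(String videoId) {
        return statuses.get(videoId);
    }

    /** Worker step: handles one queued job if present and reports whether it did. */
    boolean processNext() {
        String videoId = jobs.poll();
        if (videoId == null) {
            return false;
        }
        // Real transcoding into multiple resolutions would happen here.
        statuses.put(videoId, Status.READY);
        return true;
    }

    public static void main(String[] args) {
        AsyncUploadSketch service = new AsyncUploadSketch();
        System.out.println(service.upload("v1"));  // client sees PROCESSING right away
        service.processNext();                     // worker picks up the job later
        System.out.println(service.status("v1"));  // a later poll sees READY
    }
}
```

Because the API path never waits on transcoding, upload latency stays flat even when the transcoder backlog grows during a spike.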
Production incident · Post-mortem · Severity: high
The 4GB Container That Kept Dying
Symptom
Feed generation service containers were OOM-killed every 10 minutes during peak hours. CPU was fine, memory was the killer.
Assumption
Team assumed a memory leak in the feed generator code. Spent days profiling heap dumps.
Root cause
The feed generator was loading full video metadata (including user profile pictures) for each video in the candidate pool. For a pool of 500 videos, that's 500 user objects with profile images. Each image was 500KB on average, leading to 250MB per request. With 20 concurrent requests, that's 5GB—over the 4GB container limit.
Fix
Lazy-load user profile images only when rendering the feed card. Reduced per-request memory from 250MB to 2MB. Also added a per-container request concurrency limit of 10.
Key lesson
Never load full user objects in feed generation.
Only load what's needed for ranking, then hydrate display data on the client or via a separate lightweight API.
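The per-container concurrency limit from the fix can be sketched with a counting semaphore. This is an assumed implementation, not the team's actual code: `RequestLimiter` and its `call` method are hypothetical, and shedding excess requests to a fallback (rather than queueing them) is a design choice made for this example. With a limit of 10 and roughly 2MB per request after the fix, feed candidates account for about 20MB per container; at the original 250MB per request, the same limit would still have allowed about 2.5GB.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical per-container limiter: caps in-flight requests, sheds the rest. */
final class RequestLimiter {
    private final Semaphore permits;

    RequestLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the task if a slot is free; otherwise returns the fallback without waiting. */
    <T> T call(Supplier<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always give the slot back, even if the task throws
        }
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        RequestLimiter limiter = new RequestLimiter(10); // limit of 10, as in the fix above
        String feed = limiter.call(() -> "fresh feed", "stale feed");
        System.out.println(feed + ", free slots: " + limiter.available());
    }
}
```

Bounding concurrency turns a memory problem into a predictable product: worst-case memory is roughly the limit times the per-request footprint.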
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit.
Symptom · 01
Feed generation latency >2s for cache misses
Fix
1. Check if the candidate retrieval is slow (query database or cache).
2. Ensure ANN index is loaded in memory.
3. Check if ranking model inference is CPU-bound.
4. Add more feed generator pods.
Symptom · 02
Video uploads failing with 503 Service Unavailable
Fix
1. Check object store write throughput limits.
2. Ensure upload queue is not full.
3. Verify presigned URL expiration.
4. Increase queue consumers.
Symptom · 03
CDN costs increasing 50% month-over-month
Fix
1. Check cache hit ratio (should be >90% for edge).
2. Verify viral threshold is correct.
3. Implement content-aware compression.
4. Negotiate CDN contract for egress discounts.
Design TikTok Triage Cheat Sheet
First-response commands for when things go wrong, copy-paste ready.
Feed slow for all users
Immediate action
Check feed cache hit ratio in Redis
Commands
redis-cli info stats | grep -E 'keyspace_(hits|misses)'   # hit ratio = hits / (hits + misses)
redis-cli --bigkeys
Fix now
Increase cache size or precompute feeds for top users
Scale up transcoder workers or reduce video resolution options
High CDN egress costs
Immediate action
Check CDN cache hit ratio
Commands
curl <cdn-endpoint>/stats?cache_hit_ratio
Check top 100 videos by egress in CDN logs
Fix now
Increase edge cache TTL for viral videos, implement tiered caching
Recommendations not personalized
Immediate action
Check user embedding freshness
Commands
curl <ml-service>/user-embedding/<userId>
Check last interaction timestamp in user profile
Fix now
Force recompute embedding on next interaction
Aspect | Fan-out-on-write | Fan-out-on-read
--- | --- | ---
Latency for active users | Low (precomputed) | High (computed on read)
Write amplification | High (updates for each follower) | None
Storage cost | High (cached feeds) | Low (no cache)
Best for | Active users with few followees | Inactive users or mega-creators
Key takeaways
1. Feed generation is a trade-off between latency and write amplification: use fan-out-on-write for active users with few followees, fan-out-on-read for inactive users and mega-creators.
2. Video storage costs dominate: use tiered caching (edge for viral, regional for popular, origin for the rest) to cut CDN costs by about 60%, and content-aware compression to save about 30% on storage.
3. Recommendation is a funnel: retrieve 500 candidates from multiple sources, rank 100 with ML, return 10. Use ANN for fast content-based retrieval.
4. The counterintuitive truth: precomputing feeds for everyone is a trap. The most scalable systems use a hybrid approach that adapts to user activity and creator popularity.
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01 of 06 · SENIOR
How does TikTok's feed generation handle a user who follows 10,000 people? Does it precompute feeds for all of them?
ANSWER
No, that would be write amplification hell. For users with many followees, the system uses fan-out-on-read: when the user requests their feed, it fetches recent videos from all followees (using a time-ordered index) and ranks them. Precomputation is only for active users with a moderate number of followees (e.g., <1000).
Q02 of 06 · SENIOR
When would you choose fan-out-on-write over fan-out-on-read in a production feed system?
ANSWER
Fan-out-on-write is best when the read-to-write ratio is high and latency is critical (e.g., active users). Fan-out-on-read is better when each write would fan out to a very large audience (e.g., mega-creators) or when storage cost is a concern. The decision also depends on the number of followers per creator: if most creators have <1000 followers, fan-out-on-write is fine; if some have millions, use fan-out-on-read for those.
Q03 of 06 · SENIOR
What happens when a video goes viral and millions of users request it simultaneously? How do you prevent a thundering herd on the origin storage?
ANSWER
The CDN absorbs most requests. But if the video is not cached, the first few requests hit the origin. Use a 'request coalescing' pattern: only one request goes to origin while others wait on a lock (e.g., Redis mutex). Also, pre-warm the CDN by proactively pushing viral videos to edge nodes based on view count velocity.
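Request coalescing can be sketched in-process with a map of in-flight futures. This is a single-node illustration, assuming a hypothetical `CoalescingLoader` class and an `origin` function supplied by the caller. The Redis mutex mentioned in the answer would play the same role across nodes and is not shown here. The first caller for a key performs the fetch, and any caller that arrives while it is in progress receives the same future instead of hitting origin again.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical single-node request coalescer ("single flight") for origin fetches. */
final class CoalescingLoader {
    private final ConcurrentMap<String, CompletableFuture<byte[]>> inFlight =
            new ConcurrentHashMap<>();
    private final Function<String, byte[]> origin;

    CoalescingLoader(Function<String, byte[]> origin) {
        this.origin = origin;
    }

    CompletableFuture<byte[]> load(String key) {
        CompletableFuture<byte[]> mine = new CompletableFuture<>();
        CompletableFuture<byte[]> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            return existing; // another caller is already fetching; share its result
        }
        try {
            // This caller won the race and performs the only origin fetch.
            mine.complete(origin.apply(key));
        } catch (RuntimeException e) {
            mine.completeExceptionally(e); // waiters see the same failure
        } finally {
            inFlight.remove(key, mine); // later requests trigger a fresh fetch
        }
        return mine;
    }

    public static void main(String[] args) {
        CoalescingLoader loader = new CoalescingLoader(key -> ("bytes of " + key).getBytes());
        System.out.println(new String(loader.load("v98765:720p").join()));
    }
}
```

In practice the completed bytes would be written to the cache before the in-flight entry is removed, so requests arriving just after completion hit the cache rather than origin.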
Q04 of 06 · JUNIOR
What is the role of approximate nearest neighbor (ANN) search in TikTok's recommendation system?
ANSWER
ANN is used in the content-based retrieval stage to find videos similar to ones the user has liked. Each video is represented as an embedding vector, and an ANN index (e.g., HNSW) finds approximate nearest neighbors, typically within a few milliseconds. This is much faster than brute-force cosine similarity over millions of vectors, at the cost of occasionally missing a true nearest neighbor.
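For reference, the brute-force baseline that ANN replaces looks like the sketch below. It is not an ANN index: it scores every vector against the query, which is exactly the linear scan that HNSW-style indexes avoid. The class name `BruteForceSimilarity` and the tiny two-dimensional vectors are illustrative assumptions, and all vectors are assumed to be non-zero.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/** Brute-force cosine top-k: the exact baseline that an ANN index approximates. */
final class BruteForceSimilarity {

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indexes of the k videos most similar to the query, best first. */
    static int[] topK(double[] query, double[][] videos, int k) {
        return IntStream.range(0, videos.length)
                .boxed()
                .sorted(Comparator.comparingDouble(
                        (Integer i) -> cosine(query, videos[i])).reversed())
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        double[] liked = {1, 0};
        double[][] catalog = {{0, 1}, {1, 0}, {1, 1}, {-1, 0}};
        System.out.println(Arrays.toString(topK(liked, catalog, 2)));
    }
}
```

The scan costs one similarity computation per catalog vector per query, which is why it stops being viable once the catalog reaches millions of videos.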
Q05 of 06 · SENIOR
A production incident: the feed generation service is OOM-killed every 10 minutes. How do you debug and fix it?
ANSWER
First, check heap dumps to see what objects are consuming memory. Likely culprit: loading full user objects (including profile images) for each video in the candidate pool. Fix: lazy-load profile images only when rendering the feed card. Also, reduce concurrency per container and set a memory limit with a buffer.
Q06 of 06 · SENIOR
How would you design the system to handle a new feature like 'live streaming' alongside the existing video feed?
ANSWER
Live streams are real-time and ephemeral. They need a separate pipeline: ingest via RTMP, transcode in real time, and deliver over a low-latency protocol (e.g., WebRTC, or HLS with small segments served through the CDN). The feed should include live streams, ranked with their own signals and marked with a 'live now' badge. Storage is minimal (only for replays).
FAQ · 4 QUESTIONS
Frequently Asked Questions
01
How does TikTok generate personalized feeds for billions of users without melting the database?
TikTok uses a hybrid approach: for active users, it precomputes feeds (fan-out-on-write) and caches them in Redis. For inactive users, it computes feeds on demand (fan-out-on-read). This balances latency and write amplification. The database is only queried for candidate retrieval, not for full feed generation.
02
What's the difference between fan-out-on-write and fan-out-on-read in system design?
Fan-out-on-write precomputes feeds when a creator uploads a video, pushing it to all followers' caches. Fan-out-on-read computes the feed when the user requests it, fetching recent videos from followees. Use fan-out-on-write for low-latency reads and many reads per write; use fan-out-on-read to avoid write amplification when creators have many followers.
03
How do I reduce CDN costs for a video streaming service like TikTok?
Implement tiered caching: edge CDN caches only viral videos (top 1% by views), regional caches popular videos (top 10%), and the rest come from origin. Use content-aware compression (lower bitrate for static content). Also, negotiate egress discounts with CDN providers and use peer-to-peer CDN for very large files.
04
What happens when a video goes viral and millions of users request it at once?
The CDN absorbs most requests. If the video is not cached, use request coalescing (only one request to origin, others wait on a lock). Proactively pre-warm CDN by pushing videos to edge nodes when view count velocity exceeds a threshold. Also, use a circuit breaker to fall back to stale content if origin is overloaded.