Senior 3 min · June 25, 2026

Design TikTok: From Zero to 1B Feeds with Real-World System Design

System design for TikTok: feed generation, storage, caching, and scaling.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Written from production experience, not tutorials.

Production
production tested
June 25, 2026
last updated
1,663
articles · all by Naren
Quick Answer

TikTok's system design relies on a hybrid of precomputed and real-time feed generation, backed by object storage, a CDN, and a custom recommendation engine. The core is fan-out-on-write for active users and fan-out-on-read for inactive users and mega-creators.

✦ Definition · ~90s read
What is Design TikTok?

Design TikTok is the system architecture behind a short-video platform serving billions of users. It covers feed generation, video storage, recommendation systems, and real-time scaling.

Plain-English First

Think of TikTok as a massive library where every visitor wants a personalized stack of books. Instead of having a librarian fetch books on demand for each person (too slow), the library pre-stacks popular books on tables (precomputed feeds). For less popular books, the librarian grabs them when asked (real-time fetch). The library also remembers what you liked before to guess what you want next.

You think scaling a read-heavy app is hard? Try scaling a read-heavy app where every user expects a unique, personalized feed of short videos, and where a single viral video can spike traffic by 10x in minutes. That's TikTok. Most system design tutorials give you textbook diagrams with load balancers and databases, but they skip the hard parts: how do you generate a feed for 1 billion users without melting your database? How do you store and serve petabytes of video without bankrupting your company on CDN costs? This article walks you through the real production decisions behind TikTok's architecture—the trade-offs, the failure modes, and the patterns that actually work at scale. By the end, you'll know how to design a feed system that handles viral spikes, reduces storage costs, and keeps latency under 200ms.

Feed Generation: The Core Problem

Every social media app has a feed. But TikTok's feed is different: it's not just chronological. It's a personalized, infinite scroll of short videos. The naive approach—query all videos, rank them, return top N—breaks at scale. A single query could scan billions of videos. The real solution is a two-tier approach: precompute feeds for active users (fan-out-on-write) and compute on read for inactive users. Why? Because active users expect instant updates, while inactive users can tolerate a few seconds of latency. The trade-off is storage: precomputed feeds take space. But with a TTL of 1 hour and a limit of 500 videos per feed, it's manageable.
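A back-of-the-envelope estimate shows why the precomputed tier can be called manageable. The sketch below assumes each feed stores only 8-byte video IDs rather than full video objects, and uses a hypothetical 100 million active users; neither figure comes from the article, and the class and method names are illustrative.

```java
// Rough storage estimate for precomputed feeds.
// Assumptions (not from the article): 8-byte video IDs, 100M active users.
class FeedStorageEstimate {
    static long bytesPerFeed(int videosPerFeed, int bytesPerVideoId) {
        return (long) videosPerFeed * bytesPerVideoId;
    }

    static long totalBytes(long activeUsers, int videosPerFeed, int bytesPerVideoId) {
        return activeUsers * bytesPerFeed(videosPerFeed, bytesPerVideoId);
    }

    public static void main(String[] args) {
        long perFeed = bytesPerFeed(500, 8);            // 500 videos per feed, as in the text
        long total = totalBytes(100_000_000L, 500, 8);
        System.out.println(perFeed + " bytes per feed");          // 4000 bytes per feed
        System.out.println(total / 1_000_000_000L + " GB total"); // 400 GB total
    }
}
```

Under these assumptions the whole tier fits in a few hundred gigabytes of cache, and the 1-hour TTL means inactive users' feeds age out on their own.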

FeedGenerator.systemdesign
// io.thecodeforge — System Design tutorial

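// Feed generation service: serves cached feeds and recomputes on a miss.
// Active user = logged in within the last 24h.
// Cache, Video, VideoRanker, and CreatorGraph are application-defined types (not shown).

import java.time.Duration;
import java.util.List;
import java.util.Set;

class FeedGenerator {
    private static final int MAX_FEED_SIZE = 500;
    private static final Duration FEED_TTL = Duration.ofHours(1);

    private final Cache<String, List<Video>> feedCache;
    private final VideoRanker ranker;
    private final CreatorGraph graph; // who the user follows

    FeedGenerator(Cache<String, List<Video>> feedCache, VideoRanker ranker, CreatorGraph graph) {
        this.feedCache = feedCache;
        this.ranker = ranker;
        this.graph = graph;
    }

    public List<Video> getFeed(String userId) {
        // Check cache first
        List<Video> cached = feedCache.get(userId);
        if (cached != null) return cached;

        // Fan-out-on-read: compute the feed on demand
        Set<String> followees = graph.getFollowees(userId);
        List<Video> candidates = fetchRecentVideos(followees, MAX_FEED_SIZE);
        List<Video> ranked = ranker.rank(candidates, userId);
        feedCache.put(userId, ranked, FEED_TTL);
        return ranked;
    }

    // Called when a creator uploads a new video
    public void onNewVideo(String creatorId, Video video) {
        // Fan-out-on-write, simplest form: invalidate each active follower's
        // cached feed so the next read recomputes it with the new video.
        Set<String> activeFollowers = graph.getActiveFollowers(creatorId);
        for (String followerId : activeFollowers) {
            feedCache.evict(followerId);
            // Alternatively, prepend the video to the precomputed list
        }
    }

    // Returns up to `limit` recent videos from the given creators.
    private List<Video> fetchRecentVideos(Set<String> creatorIds, int limit) {
        throw new UnsupportedOperationException("data access not shown");
    }
}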
Output
Feed for user 'abc123' returned in 45ms (cache hit) or 2.1s (cache miss, computed on read).
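The comment in onNewVideo mentions prepending to a precomputed list instead of evicting the cached feed. A minimal sketch of that alternative follows, assuming feeds are kept as in-memory lists of video IDs capped at a fixed size. `PrecomputedFeedStore` and its methods are hypothetical names, and a real deployment would likely keep these lists in a shared cache rather than in process memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a precomputed feed store.
class PrecomputedFeedStore {
    private final int maxFeedSize;
    private final Map<String, Deque<String>> feeds = new ConcurrentHashMap<>();

    PrecomputedFeedStore(int maxFeedSize) {
        this.maxFeedSize = maxFeedSize;
    }

    // Fan-out-on-write: push one new video ID to the front of each active follower's feed.
    void fanOut(String videoId, Iterable<String> activeFollowerIds) {
        for (String followerId : activeFollowerIds) {
            feeds.compute(followerId, (id, feed) -> {
                Deque<String> updated = (feed == null) ? new ArrayDeque<>() : feed;
                updated.addFirst(videoId);
                while (updated.size() > maxFeedSize) {
                    updated.removeLast(); // drop the oldest entry to stay within the cap
                }
                return updated;
            });
        }
    }

    // Returns a snapshot, newest first; empty if nothing was precomputed for this user.
    List<String> getFeed(String userId) {
        List<String> snapshot = new ArrayList<>();
        // computeIfPresent runs atomically per key, so the copy never races with fanOut.
        feeds.computeIfPresent(userId, (id, feed) -> {
            snapshot.addAll(feed);
            return feed;
        });
        return snapshot;
    }
}
```

Compared with eviction, prepending avoids a burst of recomputation when many followers open the app shortly after an upload, at the cost of skipping the ranker for the newly added video.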
Production Trap: Write Amplification
If you fan-out-on-write to every follower, a celebrity with 10M followers triggers 10M cache updates for a single upload. Mitigation: only fan out to active followers (logged in within the last 24h). For mega-creators, switch to fan-out-on-read entirely.
Feed Generation Strategy Decision Tree
If: User active in last 24h
Use: Fan-out-on-write (precompute feed on new video upload)
If: User inactive >24h
Use: Fan-out-on-read (compute feed on login)
If: Creator has >1M followers
Use: Fan-out-on-read for all followers, to avoid write amplification
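The decision tree above can be sketched as a single function. The 24-hour window and the 1M-follower threshold come from the tree; `FanOutStrategy`, `FanOutPolicy`, and the parameter names are hypothetical. The mega-creator check runs first because it overrides the activity rule.

```java
import java.time.Duration;
import java.time.Instant;

enum FanOutStrategy { ON_WRITE, ON_READ }

// Hypothetical policy class encoding the decision tree.
class FanOutPolicy {
    static final Duration ACTIVE_WINDOW = Duration.ofHours(24);
    static final long MEGA_CREATOR_FOLLOWERS = 1_000_000L;

    static FanOutStrategy choose(Instant followerLastActive, long creatorFollowerCount, Instant now) {
        // Mega-creators always use fan-out-on-read to avoid write amplification.
        if (creatorFollowerCount > MEGA_CREATOR_FOLLOWERS) {
            return FanOutStrategy.ON_READ;
        }
        // Otherwise precompute only for followers seen within the active window.
        boolean active = Duration.between(followerLastActive, now).compareTo(ACTIVE_WINDOW) <= 0;
        return active ? FanOutStrategy.ON_WRITE : FanOutStrategy.ON_READ;
    }
}
```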
[Figure: TikTok Feed System Design: Zero to 1B. Core components for scaling feed generation and recommendation: Feed Generation Pipeline (pre-compute candidate feeds for each user), Video Storage Layer (cost-optimized tiered storage: hot/warm/cold), Recommendation Engine (real-time ranking with collaborative filtering), Viral Load Scaling (auto-scaling with circuit breaker pattern). Warning: hot video cache miss under viral spike; pre-warm cache for predicted trending content.]
[Figure: Feed Generation Pipeline. From billions of videos to a personalized top-N: Query All Videos (scan billions of videos), Rank by Relevance (personalized scoring per user), Top-N Selection (return only the top results), Infinite Scroll (paginate with cursor-based keys). Warning: scanning all videos per request is O(n) and breaks at scale.]

Video Storage: Where the Money Goes

Storing and serving videos is the biggest cost in TikTok. Raw video files are huge. The solution: transcode videos into multiple resolutions (240p to 4K), store in object storage (S3, GCS), and serve via CDN. But CDN egress is expensive. The trick is to only cache viral videos at the edge. Use a two-tier CDN: edge nodes cache only the top 1% of videos (by views in last hour), regional nodes cache the top 10%, and the rest come from origin. This cuts CDN costs by 60% while keeping p95 latency under 300ms.

VideoStorage.systemdesign
// io.thecodeforge — System Design tutorial

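// Video storage service with tiered caching.
// ObjectStore, Cache, and ViewCountService are application-defined types (not shown).

class VideoStorage {
    private static final long VIRAL_HOURLY_VIEWS = 100_000;

    private final ObjectStore objectStore;
    private final Cache<String, byte[]> edgeCache;     // small, fast
    private final Cache<String, byte[]> regionalCache; // larger, slower
    private final ViewCountService viewCountService;

    VideoStorage(ObjectStore objectStore, Cache<String, byte[]> edgeCache,
                 Cache<String, byte[]> regionalCache, ViewCountService viewCountService) {
        this.objectStore = objectStore;
        this.edgeCache = edgeCache;
        this.regionalCache = regionalCache;
        this.viewCountService = viewCountService;
    }

    public byte[] getVideo(String videoId, String resolution) {
        String key = videoId + ":" + resolution;
        // Check edge cache first
        byte[] video = edgeCache.get(key);
        if (video != null) return video;

        // Check regional cache, then fall back to the object store
        video = regionalCache.get(key);
        if (video == null) {
            video = objectStore.get(key);
            regionalCache.put(key, video);
        }

        // Promote to the edge only while the video is viral, so the small
        // edge cache is not filled with long-tail content
        if (isViral(videoId)) {
            edgeCache.put(key, video);
        }
        return video;
    }

    private boolean isViral(String videoId) {
        // View count over the last hour, from analytics
        return viewCountService.getHourlyViews(videoId) > VIRAL_HOURLY_VIEWS;
    }
}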
Output
Video 'v98765' at 720p returned in 12ms (edge cache hit), 45ms (regional), or 320ms (origin).
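The code above uses a fixed view threshold, while the text describes the tiers as percentiles (top 1% at the edge, top 10% regionally). One way to derive tier membership from hourly view counts is sketched below. `CacheTierPlanner` and `CacheTier` are hypothetical names, and the sketch assumes the view counts for a time window fit in memory, which would not hold at full scale without sampling or streaming top-k.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum CacheTier { EDGE, REGIONAL, ORIGIN }

// Hypothetical planner: assigns each video the closest tier it qualifies for.
class CacheTierPlanner {
    static Map<String, CacheTier> plan(Map<String, Long> hourlyViews) {
        List<Map.Entry<String, Long>> byViews = new ArrayList<>(hourlyViews.entrySet());
        byViews.sort((a, b) -> Long.compare(b.getValue(), a.getValue())); // most viewed first

        int total = byViews.size();
        int edgeCount = (total + 99) / 100;   // ceil(1% of videos)
        int regionalCount = (total + 9) / 10; // ceil(10% of videos)

        Map<String, CacheTier> tiers = new HashMap<>();
        for (int i = 0; i < total; i++) {
            CacheTier tier = i < edgeCount ? CacheTier.EDGE
                    : i < regionalCount ? CacheTier.REGIONAL
                    : CacheTier.ORIGIN;
            tiers.put(byViews.get(i).getKey(), tier);
        }
        return tiers;
    }
}
```

With 200 videos, this puts the 2 most viewed at the edge, the next 18 in the regional tier, and leaves the remaining 180 to be served from origin.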
Senior Shortcut: Content-Aware Compression
Use different compression for different content types. Talking-head videos compress well with H.265 at lower bitrates. High-motion content (dance, sports) needs a higher bitrate. Analyze video content on upload and set the compression profile accordingly. This can save around 30% of storage without visible quality loss.
[Figure: Video Storage Strategy. Cost vs. performance for viral vs. cold content. Viral videos: transcoded to multiple resolutions, cached in CDN edge nodes, high egress cost but high hit rate. Cold videos: stored in object storage only, served directly from origin, low egress cost but rare access. Only cache viral content to balance CDN cost and latency.]

Recommendation System: The Secret Sauce

TikTok's recommendation system is what keeps users hooked. It's not just collaborative filtering. It's a multi-stage ranking pipeline: first, retrieve candidates from multiple sources (follows, trending, similar users, content-based). Then, rank them using a deep learning model that predicts engagement (likes, shares, watch time). Finally, apply business rules (diversity, freshness, anti-spam). The key insight: the retrieval stage must be fast (sub-50ms) and the ranking stage must be accurate. Use approximate nearest neighbor (ANN) search for content-based retrieval, and a lightweight neural network for ranking. Don't try to rank all candidates—use a funnel: retrieve 500, rank 100, return 10.

RecommendationPipeline.systemdesign
// io.thecodeforge — System Design tutorial

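// Multi-stage recommendation pipeline.
// Video, ScoredVideo, RankingModel, and FilterChain are application-defined types (not shown).

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RecommendationPipeline {
    private static final int RETRIEVE_LIMIT = 500;

    private final CandidateRetriever retriever;
    private final RankingModel ranker;
    private final FilterChain filters;

    RecommendationPipeline(CandidateRetriever retriever, RankingModel ranker, FilterChain filters) {
        this.retriever = retriever;
        this.ranker = ranker;
        this.filters = filters;
    }

    public List<Video> recommend(String userId, int count) {
        // Stage 1: Retrieve candidates (sub-50ms)
        List<Video> candidates = retriever.retrieve(userId, RETRIEVE_LIMIT);

        // Stage 2: Score with the ML model (sub-100ms per batch), highest score first.
        // To follow the 500 -> 100 -> 10 funnel, trim candidates with a cheap
        // pre-filter before this call.
        List<ScoredVideo> scored = ranker.score(candidates, userId);

        // Stage 3: Apply filters (diversity, freshness, anti-spam)
        List<Video> filtered = filters.apply(scored);

        return filtered.subList(0, Math.min(count, filtered.size()));
    }
}

class CandidateRetriever {
    // Splits the budget evenly across four retrieval strategies and removes
    // duplicates (assumes Video implements equals/hashCode by video ID).
    // The per-source retrieval methods are not shown.
    public List<Video> retrieve(String userId, int limit) {
        int perSource = limit / 4;
        Set<Video> candidates = new LinkedHashSet<>();
        candidates.addAll(getFolloweeVideos(userId, perSource));
        candidates.addAll(getTrendingVideos(perSource));
        candidates.addAll(getSimilarUserVideos(userId, perSource));
        candidates.addAll(getContentBasedVideos(userId, perSource));
        return new ArrayList<>(candidates);
    }
}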
Output
Recommendations for user 'xyz789' returned 10 videos in 180ms total (45ms retrieve, 95ms rank, 40ms filter).
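The "retrieve 500, rank 100, return 10" funnel can be sketched generically. The example below assumes two scoring functions: a cheap one used to cut the retrieved set down to the ranking budget, and an expensive one standing in for the ML model. `RankingFunnel`, `topK`, and `run` are hypothetical names, and how the cheap score is computed is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical two-stage funnel: cheap shortlist first, expensive ranking second.
class RankingFunnel {
    // Scores each item exactly once and keeps the `keep` highest-scoring items.
    static <T> List<T> topK(List<T> items, ToDoubleFunction<T> score, int keep) {
        List<Map.Entry<T, Double>> scored = new ArrayList<>();
        for (T item : items) {
            scored.add(Map.entry(item, Double.valueOf(score.applyAsDouble(item))));
        }
        scored.sort((a, b) -> Double.compare(b.getValue(), a.getValue())); // highest first
        List<T> out = new ArrayList<>();
        for (int i = 0; i < Math.min(keep, scored.size()); i++) {
            out.add(scored.get(i).getKey());
        }
        return out;
    }

    static <T> List<T> run(List<T> retrieved, ToDoubleFunction<T> cheapScore,
                           ToDoubleFunction<T> modelScore, int rankBudget, int returnCount) {
        List<T> shortlist = topK(retrieved, cheapScore, rankBudget); // e.g. 500 -> 100
        return topK(shortlist, modelScore, returnCount);             // e.g. 100 -> 10
    }
}
```

The point of the shape is that the expensive model runs only on the shortlist: with a budget of 100, it is invoked 100 times no matter how many candidates retrieval returns.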
Interview Gold: Cold Start Problem
New users have no history. Solution: use a default model based on device type, location, and time of day. Also show a few popular videos up front to bootstrap the profile. Once the user interacts, switch to the personalized model.

Scaling Under Viral Load

A single viral video can cause a 10x traffic spike in minutes. Your system must handle this without falling over. The key is to design for overload: use a queue for video uploads (so transcoding doesn't block the API), use a CDN with a high cache hit ratio, and implement rate limiting at the API gateway. But the most important pattern is circuit breakers: if the feed generation service starts timing out, stop sending requests to it and return a stale cached feed instead. This prevents cascading failures. Also, use autoscaling with a fast startup time—container images should be small (<500MB) and pre-warmed.

ViralSpikeHandler.systemdesign
// io.thecodeforge — System Design tutorial

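// Circuit breaker for feed generation.
// FeedGenerator, Cache, and Video are application-defined types (not shown).
// Timeouts are assumed to surface as unchecked exceptions.

import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class FeedCircuitBreaker {
    private static final int FAILURE_THRESHOLD = 10; // consecutive failures before opening
    private static final long OPEN_SECONDS = 30;

    private final AtomicInteger failureCount = new AtomicInteger(0);
    private final AtomicBoolean open = new AtomicBoolean(false);
    private final FeedGenerator feedGenerator;
    private final Cache<String, List<Video>> staleFeedCache;
    private final ScheduledExecutorService scheduler;

    FeedCircuitBreaker(FeedGenerator feedGenerator, Cache<String, List<Video>> staleFeedCache,
                       ScheduledExecutorService scheduler) {
        this.feedGenerator = feedGenerator;
        this.staleFeedCache = staleFeedCache;
        this.scheduler = scheduler;
    }

    public List<Video> getFeed(String userId) {
        if (open.get()) {
            // Breaker is open: serve the stale cached feed (may be null if none exists)
            return staleFeedCache.get(userId);
        }
        try {
            List<Video> feed = feedGenerator.getFeed(userId);
            failureCount.set(0); // any success resets the consecutive-failure count
            return feed;
        } catch (RuntimeException e) {
            int failures = failureCount.incrementAndGet();
            // compareAndSet ensures only one thread opens the breaker and schedules the reset
            if (failures >= FAILURE_THRESHOLD && open.compareAndSet(false, true)) {
                scheduler.schedule(() -> {
                    failureCount.set(0);
                    open.set(false);
                }, OPEN_SECONDS, TimeUnit.SECONDS);
            }
            // Degrade to the stale feed instead of failing the request
            return staleFeedCache.get(userId);
        }
    }
}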
Output
During a viral spike, feed requests that would have timed out now return stale cached feeds in 5ms. Circuit breaker opens after 10 failures, closes after 30s.
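The section also calls for rate limiting at the API gateway. A token bucket is one common way to do this; the sketch below is a minimal single-process version, not a description of any particular gateway product. `TokenBucket` is a hypothetical name, and the caller passes the current time explicitly (in production, `System.nanoTime()`) so the behavior is easy to test.

```java
// Minimal token bucket: allows bursts up to `capacity`, then a steady `refillPerSecond`.
class TokenBucket {
    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // start full so an idle client can burst
        this.lastRefillNanos = nowNanos;
    }

    // Returns true if the request may proceed, false if it should be rejected (e.g. HTTP 429).
    synchronized boolean tryAcquire(long nowNanos) {
        long elapsedNanos = Math.max(0L, nowNanos - lastRefillNanos);
        tokens = Math.min(capacity, tokens + (elapsedNanos / 1_000_000_000.0) * refillPerSecond);
        lastRefillNanos = Math.max(lastRefillNanos, nowNanos);
        if (tokens < 1.0) {
            return false;
        }
        tokens -= 1.0;
        return true;
    }
}
```

A gateway would typically hold one bucket per client or API key, and a multi-node deployment needs shared state for the counters, which this sketch does not cover.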
Never Do This: Synchronous Transcoding
Never transcode videos synchronously on upload. A 4K video can take minutes to transcode. Use an async queue (SQS, Kafka) and return a 'processing' status. The client polls until the video is ready.
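The submit-then-poll flow can be sketched in a few lines. Here an in-process thread pool stands in for the queue and workers (SQS or Kafka consumers in the text), and a map holds the status that the client polls. `AsyncTranscoder` and `TranscodeStatus` are hypothetical names, and a real system would persist status in a database rather than in memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

enum TranscodeStatus { PROCESSING, READY, FAILED }

// Hypothetical stand-in for "upload API + queue + transcoder workers".
class AsyncTranscoder implements AutoCloseable {
    private final Map<String, TranscodeStatus> status = new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    // Returns immediately; the upload API responds with PROCESSING while work continues.
    TranscodeStatus submit(String videoId, Runnable transcodeJob) {
        status.put(videoId, TranscodeStatus.PROCESSING);
        workers.submit(() -> {
            try {
                transcodeJob.run();
                status.put(videoId, TranscodeStatus.READY);
            } catch (RuntimeException e) {
                status.put(videoId, TranscodeStatus.FAILED);
            }
        });
        return TranscodeStatus.PROCESSING;
    }

    // What the client's polling endpoint would return; null for an unknown video.
    TranscodeStatus poll(String videoId) {
        return status.get(videoId);
    }

    @Override
    public void close() {
        workers.shutdown(); // lets queued jobs finish, then stops the worker threads
    }
}
```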
● Production incident · POST-MORTEM · severity: high

The 4GB Container That Kept Dying

Symptom
Feed generation service containers were OOM-killed every 10 minutes during peak hours. CPU was fine, memory was the killer.
Assumption
Team assumed a memory leak in the feed generator code. Spent days profiling heap dumps.
Root cause
The feed generator was loading full video metadata (including user profile pictures) for each video in the candidate pool. For a pool of 500 videos, that's 500 user objects with profile images. Each image was 500KB on average, leading to 250MB per request. With 20 concurrent requests, that's 5GB—over the 4GB container limit.
Fix
Lazy-load user profile images only when rendering the feed card. Reduced per-request memory from 250MB to 2MB. Also added a per-container request concurrency limit of 10.
Key lesson
  • Never load full user objects in feed generation.
  • Only load what's needed for ranking, then hydrate display data on the client or via a separate lightweight API.
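The fix combines two levers: less memory per request and a cap on concurrent requests. The sketch below shows one way to enforce the cap with a semaphore and to check the worst-case arithmetic from the post-mortem. `ConcurrencyLimiter` is a hypothetical name, and shedding to a fallback (rather than queuing) is an assumption about the desired overload behavior.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical per-container concurrency cap.
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the task if a slot is free; otherwise returns the fallback without queuing.
    <T> T callOrShed(Supplier<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;
        }
        try {
            return task.get();
        } finally {
            permits.release();
        }
    }

    // Worst-case memory if every slot is busy at once.
    static long worstCaseBytes(int maxConcurrent, long bytesPerRequest) {
        return maxConcurrent * bytesPerRequest;
    }

    public static void main(String[] args) {
        // Before the fix: 20 requests x 250 MB each
        System.out.println(worstCaseBytes(20, 250_000_000L) / 1_000_000L + " MB"); // 5000 MB
        // After the fix: 10 requests x 2 MB each
        System.out.println(worstCaseBytes(10, 2_000_000L) / 1_000_000L + " MB");   // 20 MB
    }
}
```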
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit. 3 entries
Symptom · 01
Feed generation latency >2s for cache misses
Fix
1. Check if the candidate retrieval is slow (query database or cache). 2. Ensure ANN index is loaded in memory. 3. Check if ranking model inference is CPU-bound. 4. Add more feed generator pods.
Symptom · 02
Video uploads failing with 503 Service Unavailable
Fix
1. Check object store write throughput limits. 2. Ensure upload queue is not full. 3. Verify presigned URL expiration. 4. Increase queue consumers.
Symptom · 03
CDN costs increasing 50% month-over-month
Fix
1. Check cache hit ratio (should be >90% for edge). 2. Verify viral threshold is correct. 3. Implement content-aware compression. 4. Negotiate CDN contract for egress discounts.
★ Design TikTok Triage Cheat Sheet
First-response commands for when things go wrong, copy-paste ready.
Feed slow for all users
Immediate action
Check feed cache hit ratio in Redis
Commands
redis-cli info stats | grep -E 'keyspace_(hits|misses)'
redis-cli --bigkeys
Fix now
Increase cache size or precompute feeds for top users
Video transcoding backlog
Immediate action
Check queue depth in SQS/Kafka
Commands
aws sqs get-queue-attributes --queue-url <url> --attribute-names ApproximateNumberOfMessages
kafka-consumer-groups --bootstrap-server <broker> --group transcoders --describe
Fix now
Scale up transcoder workers or reduce video resolution options
High CDN egress costs
Immediate action
Check CDN cache hit ratio
Commands
curl <cdn-endpoint>/stats?cache_hit_ratio
Check top 100 videos by egress in CDN logs
Fix now
Increase edge cache TTL for viral videos, implement tiered caching
Recommendations not personalized
Immediate action
Check user embedding freshness
Commands
curl <ml-service>/user-embedding/<userId>
Check last interaction timestamp in user profile
Fix now
Force recompute embedding on next interaction
Aspect | Fan-out-on-write | Fan-out-on-read
Latency for active users | Low (precomputed) | High (computed on read)
Write amplification | High (updates for each follower) | None
Storage cost | High (cached feeds) | Low (no cache)
Best for | Active users with few followees | Inactive users or mega-creators

Key takeaways

1. Feed generation is a trade-off between latency and write amplification: use fan-out-on-write for active users with few followees, fan-out-on-read for inactive users and mega-creators.
2. Video storage costs dominate: use tiered caching (edge for viral, regional for popular, origin for the rest) to cut CDN costs by about 60%, and content-aware compression to save about 30% of storage.
3. Recommendation is a funnel: retrieve 500 candidates from multiple sources, rank 100 with ML, return 10. Use ANN for fast content-based retrieval.
4. The counterintuitive truth: precomputing feeds for everyone is a trap. The most scalable systems use a hybrid approach that adapts to user activity and creator popularity.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · Senior
How does TikTok's feed generation handle a user who follows 10,000 peopl...
Q02 · Senior
When would you choose fan-out-on-write over fan-out-on-read in a product...
Q03 · Senior
What happens when a video goes viral and millions of users request it si...
Q04 · Junior
What is the role of approximate nearest neighbor (ANN) search in TikTok'...
Q05 · Senior
A production incident: the feed generation service is OOM-killed every 1...
Q06 · Senior
How would you design the system to handle a new feature like 'live strea...
Q01 of 06 · Senior

How does TikTok's feed generation handle a user who follows 10,000 people? Does it precompute feeds for all of them?

ANSWER
No. A precomputed feed for a user who follows 10,000 accounts would be rewritten on almost every upload from any of them, and most of those writes would never be read. For users with many followees, the system uses fan-out-on-read: when the user requests their feed, it fetches recent videos from all followees (using a time-ordered index) and ranks them. Precomputation is only for active users with a moderate number of followees (e.g., <1000).
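The read path described in this answer amounts to a k-way merge over per-creator lists that are already sorted newest first. A minimal sketch follows, assuming each followee's time-ordered index can be read as an in-memory list. `FeedMerger`, `Video`, and `mergeNewest` are hypothetical names, and the ranking step that would follow the merge is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical fan-out-on-read merge over per-creator, newest-first video lists.
class FeedMerger {
    static final class Video {
        final String id;
        final long uploadedAtMillis;

        Video(String id, long uploadedAtMillis) {
            this.id = id;
            this.uploadedAtMillis = uploadedAtMillis;
        }
    }

    // Position within one creator's list.
    private static final class Cursor {
        final List<Video> videos;
        final int index;

        Cursor(List<Video> videos, int index) {
            this.videos = videos;
            this.index = index;
        }

        long timestamp() {
            return videos.get(index).uploadedAtMillis;
        }
    }

    // Returns the `limit` newest videos across all lists without sorting everything.
    static List<Video> mergeNewest(List<List<Video>> perCreatorNewestFirst, int limit) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                (Cursor a, Cursor b) -> Long.compare(b.timestamp(), a.timestamp())); // newest on top
        for (List<Video> videos : perCreatorNewestFirst) {
            if (!videos.isEmpty()) {
                heap.add(new Cursor(videos, 0));
            }
        }
        List<Video> merged = new ArrayList<>();
        while (merged.size() < limit && !heap.isEmpty()) {
            Cursor top = heap.poll();
            merged.add(top.videos.get(top.index));
            if (top.index + 1 < top.videos.size()) {
                heap.add(new Cursor(top.videos, top.index + 1)); // advance within that creator
            }
        }
        return merged;
    }
}
```

With k followees and a limit of n, the merge costs roughly O((k + n) log k), so only the head of each creator's list is touched instead of every video they have posted.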
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
How does TikTok generate personalized feeds for billions of users without melting the database?
02
What's the difference between fan-out-on-write and fan-out-on-read in system design?
03
How do I reduce CDN costs for a video streaming service like TikTok?
04
What happens when a video goes viral and millions of users request it at once?