Senior 3 min · June 25, 2026

Design TikTok: From Zero to 1B Feeds with Real-World System Design

Q: How does TikTok generate personalized feeds for billions of users without melting the database?

TikTok uses a hybrid approach: for active users, it precomputes feeds (fan-out-on-write) and caches them in Redis. For inactive users, it computes feeds on demand (fan-out-on-read). This balances latency and write amplification. The database is only queried for candidate retrieval, not for full feed generation.

Q: What's the difference between fan-out-on-write and fan-out-on-read in system design?

Fan-out-on-write precomputes feeds when a creator uploads a video, pushing it to all followers' caches. Fan-out-on-read computes the feed when the user requests it, fetching recent videos from followees. Use fan-out-on-write for low-latency reads and many reads per write; use fan-out-on-read to avoid write amplification when creators have many followers.

Q: How do I reduce CDN costs for a video streaming service like TikTok?

Implement tiered caching: edge CDN caches only viral videos (top 1% by views), regional caches popular videos (top 10%), and the rest come from origin. Use content-aware compression (lower bitrate for static content). Also, negotiate egress discounts with CDN providers and use peer-to-peer CDN for very large files.

Q: What happens when a video goes viral and millions of users request it at once?

The CDN absorbs most requests. If the video is not cached, use request coalescing (only one request to origin, others wait on a lock). Proactively pre-warm CDN by pushing videos to edge nodes when view count velocity exceeds a threshold. Also, use a circuit breaker to fall back to stale content if origin is overloaded.
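The request-coalescing idea above can be sketched in a few lines. This is a minimal illustration, not TikTok's implementation: `RequestCoalescer` and the `originLoader` callback are hypothetical names, and the in-flight map stands in for the lock the answer describes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: collapses concurrent loads of the same key into one origin call.
class RequestCoalescer<K, V> {
    // One pending future per key that is currently being fetched from origin
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> originLoader) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // Another caller is already fetching this key: wait for its result
            return existing.join();
        }
        try {
            V value = originLoader.apply(key); // the single origin request
            mine.complete(value);
            return value;
        } catch (RuntimeException | Error e) {
            mine.completeExceptionally(e);     // wake waiters with the failure
            throw e;
        } finally {
            inFlight.remove(key, mine);        // later requests start a fresh fetch
        }
    }
}
```

Note that this only coalesces requests that overlap in time. It is not a cache, so it belongs in front of the origin fetch and behind the CDN or cache lookup.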

System design for TikTok: feed generation, storage, caching, and scaling.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Written from production experience, not tutorials.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren

⚡Quick Answer

TikTok's system design relies on a hybrid of precomputed and real-time feed generation, backed by object storage, a CDN, and a custom recommendation engine. The core is fan-out-on-write for active users and fan-out-on-read for inactive users and mega-creators.

✦ Definition

What is Design TikTok?

Design TikTok is the system architecture behind a short-video platform serving billions of users. It covers feed generation, video storage, recommendation systems, and real-time scaling.

Plain-English First

Think of TikTok as a massive library where every visitor wants a personalized stack of books. Instead of having a librarian fetch books on demand for each person (too slow), the library pre-stacks popular books on tables (precomputed feeds). For less popular books, the librarian grabs them when asked (real-time fetch). The library also remembers what you liked before to guess what you want next.

You think scaling a read-heavy app is hard? Try scaling a read-heavy app where every user expects a unique, personalized feed of short videos, and where a single viral video can spike traffic by 10x in minutes. That's TikTok. Most system design tutorials give you textbook diagrams with load balancers and databases, but they skip the hard parts: how do you generate a feed for 1 billion users without melting your database? How do you store and serve petabytes of video without bankrupting your company on CDN costs? This article walks you through the real production decisions behind TikTok's architecture—the trade-offs, the failure modes, and the patterns that actually work at scale. By the end, you'll know how to design a feed system that handles viral spikes, reduces storage costs, and keeps latency under 200ms.

Feed Generation: The Core Problem

Every social media app has a feed. But TikTok's feed is different: it's not just chronological. It's a personalized, infinite scroll of short videos. The naive approach—query all videos, rank them, return top N—breaks at scale. A single query could scan billions of videos. The real solution is a two-tier approach: precompute feeds for active users (fan-out-on-write) and compute on read for inactive users. Why? Because active users expect instant updates, while inactive users can tolerate a few seconds of latency. The trade-off is storage: precomputed feeds take space. But with a TTL of 1 hour and a limit of 500 videos per feed, it's manageable.

FeedGenerator.systemdesign

// io.thecodeforge — System Design tutorial

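// Feed generation service: serves cached feeds and invalidates them on upload
// Active user = logged in within last 24h
// Cache, VideoRanker, CreatorGraph, VideoIndex and Video are assumed project types

import java.time.Duration;
import java.util.List;
import java.util.Set;

class FeedGenerator {
    private static final int MAX_FEED_SIZE = 500;                 // videos per feed
    private static final Duration FEED_TTL = Duration.ofHours(1); // matches the 1-hour TTL above

    private final Cache<String, List<Video>> feedCache;
    private final VideoRanker ranker;
    private final CreatorGraph graph;    // who user follows
    private final VideoIndex videoIndex; // assumed time-ordered index of uploads

    FeedGenerator(Cache<String, List<Video>> feedCache, VideoRanker ranker,
                  CreatorGraph graph, VideoIndex videoIndex) {
        this.feedCache = feedCache;
        this.ranker = ranker;
        this.graph = graph;
        this.videoIndex = videoIndex;
    }

    public List<Video> getFeed(String userId) {
        // Check cache first
        List<Video> cached = feedCache.get(userId);
        if (cached != null) return cached;

        // Fan-out-on-read: compute feed on demand
        Set<String> followees = graph.getFollowees(userId);
        List<Video> candidates = videoIndex.recentVideos(followees, MAX_FEED_SIZE);
        List<Video> ranked = ranker.rank(candidates, userId);
        feedCache.put(userId, ranked, FEED_TTL);
        return ranked;
    }

    // Called when a creator uploads a new video
    public void onNewVideo(String creatorId, Video video) {
        // Fan-out-on-write: touch only active followers' cached feeds
        Set<String> activeFollowers = graph.getActiveFollowers(creatorId);
        for (String followerId : activeFollowers) {
            feedCache.evict(followerId); // force recompute on next read
            // Alternatively, prepend to a precomputed list
        }
    }
}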

Output

Feed for user 'abc123' returned in 45ms (cache hit) or 2.1s (cache miss, computed on read).
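The listing evicts a follower's cached feed on upload. The alternative it mentions, prepending to a precomputed list, might look like the sketch below. `PrecomputedFeeds` is a hypothetical in-memory stand-in for the Redis-backed cache, and it stores video ids only; the cap corresponds to the 500-video limit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of per-user precomputed feeds (newest first).
class PrecomputedFeeds {
    private final int maxFeedSize;
    private final Map<String, Deque<String>> feeds = new ConcurrentHashMap<>();

    PrecomputedFeeds(int maxFeedSize) {
        this.maxFeedSize = maxFeedSize;
    }

    // Fan-out-on-write: push the new video id to the head of each active follower's feed
    void onNewVideo(String videoId, Set<String> activeFollowerIds) {
        for (String followerId : activeFollowerIds) {
            Deque<String> feed = feeds.computeIfAbsent(followerId, id -> new ArrayDeque<>());
            synchronized (feed) {
                feed.addFirst(videoId);
                while (feed.size() > maxFeedSize) {
                    feed.removeLast(); // drop the oldest entry once the cap is exceeded
                }
            }
        }
    }

    // Returns a copy so callers never see a feed mid-update
    List<String> getFeed(String userId) {
        Deque<String> feed = feeds.get(userId);
        if (feed == null) return List.of();
        synchronized (feed) {
            return new ArrayList<>(feed);
        }
    }
}
```

Prepending keeps reads cheap at the price of one write per active follower, which is exactly the write amplification trade-off.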

Production Trap: Write Amplification

If you fan-out-on-write to every follower, a celebrity with 10M followers causes 10M cache updates per video. That's 10M writes for one upload. Mitigation: only fan-out to active followers (logged in last 24h). For mega-creators, switch to fan-out-on-read entirely.

Feed Generation Strategy Decision Tree

If user active in last 24h → use fan-out-on-write: precompute feed on new video upload

If user inactive >24h → use fan-out-on-read: compute feed on login

If creator has >1M followers → use fan-out-on-read for all followers to avoid write amplification
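The decision tree translates directly into a small policy function. This is a sketch: `FanOutPolicy` and `FanOutStrategy` are hypothetical names, and the thresholds are the 24-hour and 1M-follower values from the tree. The mega-creator rule is checked first because it applies to all followers.

```java
import java.time.Duration;
import java.time.Instant;

enum FanOutStrategy { ON_WRITE, ON_READ }

// Hypothetical policy object encoding the decision tree above.
class FanOutPolicy {
    static final Duration ACTIVE_WINDOW = Duration.ofHours(24);
    static final long MEGA_CREATOR_FOLLOWERS = 1_000_000L;

    static FanOutStrategy choose(Instant followerLastSeen, long creatorFollowerCount, Instant now) {
        // Mega-creators: never fan out on write, regardless of follower activity
        if (creatorFollowerCount > MEGA_CREATOR_FOLLOWERS) {
            return FanOutStrategy.ON_READ;
        }
        // Active means seen within the last 24 hours (inclusive)
        boolean active = Duration.between(followerLastSeen, now).compareTo(ACTIVE_WINDOW) <= 0;
        return active ? FanOutStrategy.ON_WRITE : FanOutStrategy.ON_READ;
    }
}
```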

[Figure: TikTok Feed System Design: Zero to 1B]

[Figure: Feed Generation Pipeline]

Video Storage: Where the Money Goes

Storing and serving videos is the biggest cost in TikTok. Raw video files are huge. The solution: transcode videos into multiple resolutions (240p to 4K), store in object storage (S3, GCS), and serve via CDN. But CDN egress is expensive. The trick is to only cache viral videos at the edge. Use a two-tier CDN: edge nodes cache only the top 1% of videos (by views in last hour), regional nodes cache the top 10%, and the rest come from origin. This cuts CDN costs by 60% while keeping p95 latency under 300ms.

VideoStorage.systemdesign

// io.thecodeforge — System Design tutorial

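// Video storage service with tiered caching
// ObjectStore, Cache and ViewCountService are assumed project types

class VideoStorage {
    private static final long VIRAL_HOURLY_VIEWS = 100_000;  // edge tier threshold
    private static final long POPULAR_HOURLY_VIEWS = 10_000; // regional tier threshold (illustrative assumption)

    private final ObjectStore objectStore;
    private final Cache<String, byte[]> edgeCache; // small, fast
    private final Cache<String, byte[]> regionalCache; // larger, slower
    private final ViewCountService viewCountService;

    VideoStorage(ObjectStore objectStore, Cache<String, byte[]> edgeCache,
                 Cache<String, byte[]> regionalCache, ViewCountService viewCountService) {
        this.objectStore = objectStore;
        this.edgeCache = edgeCache;
        this.regionalCache = regionalCache;
        this.viewCountService = viewCountService;
    }

    public byte[] getVideo(String videoId, String resolution) {
        String key = videoId + ":" + resolution;
        // Check edge cache first
        byte[] video = edgeCache.get(key);
        if (video != null) return video;

        // View count in last hour from analytics decides which tiers may hold the video
        long hourlyViews = viewCountService.getHourlyViews(videoId);

        // Check regional cache
        video = regionalCache.get(key);
        if (video == null) {
            // Fetch from object store; keep only popular videos in the regional tier
            video = objectStore.get(key);
            if (hourlyViews > POPULAR_HOURLY_VIEWS) {
                regionalCache.put(key, video);
            }
        }

        // Promote to edge only if the video is viral (the original promoted every regional hit)
        if (hourlyViews > VIRAL_HOURLY_VIEWS) {
            edgeCache.put(key, video);
        }
        return video;
    }
}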

Output

Video 'v98765' at 720p returned in 12ms (edge cache hit), 45ms (regional), or 320ms (origin).

Senior Shortcut: Content-Aware Compression

Use different compression for different content types. Talking head videos compress well with H.265 at lower bitrates. High-motion content (dance, sports) needs higher bitrate. Analyze video content on upload and set compression profile accordingly. Saves 30% storage without visible quality loss.
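As a rough sketch of what "set compression profile accordingly" could mean in code: the `motionScore` input, the cut-offs, and the bitrates below are all illustrative assumptions, not measured values or TikTok's actual profiles. A real pipeline would derive the score from an upload-time content analysis step.

```java
// Hypothetical mapping from a content-analysis score to an encoding profile.
class CompressionProfiles {
    enum Profile { LOW_MOTION, MEDIUM_MOTION, HIGH_MOTION }

    // motionScore: assumed 0.0 (static talking head) to 1.0 (dance, sports)
    static Profile classify(double motionScore) {
        if (motionScore < 0.0 || motionScore > 1.0) {
            throw new IllegalArgumentException("motionScore must be in [0, 1]");
        }
        if (motionScore < 0.3) return Profile.LOW_MOTION;
        if (motionScore < 0.7) return Profile.MEDIUM_MOTION;
        return Profile.HIGH_MOTION;
    }

    // Illustrative 720p target bitrates in kbps; tune against your own quality metrics
    static int targetBitrateKbps720p(Profile profile) {
        switch (profile) {
            case LOW_MOTION:
                return 1200;
            case MEDIUM_MOTION:
                return 2000;
            default:
                return 3000;
        }
    }
}
```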

[Figure: Video Storage Strategy]

Recommendation System: The Secret Sauce

TikTok's recommendation system is what keeps users hooked. It's not just collaborative filtering. It's a multi-stage ranking pipeline: first, retrieve candidates from multiple sources (follows, trending, similar users, content-based). Then, rank them using a deep learning model that predicts engagement (likes, shares, watch time). Finally, apply business rules (diversity, freshness, anti-spam). The key insight: the retrieval stage must be fast (sub-50ms) and the ranking stage must be accurate. Use approximate nearest neighbor (ANN) search for content-based retrieval, and a lightweight neural network for ranking. Don't try to rank all candidates—use a funnel: retrieve 500, rank 100, return 10.

RecommendationPipeline.systemdesign

// io.thecodeforge — System Design tutorial

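// Multi-stage recommendation pipeline: retrieve 500, rank 100, return `count`
// RankingModel, FilterChain, Video and ScoredVideo are assumed project types

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RecommendationPipeline {
    private static final int RETRIEVE_LIMIT = 500;
    private static final int RANK_LIMIT = 100;

    private final CandidateRetriever retriever;
    private final RankingModel ranker;
    private final FilterChain filters;

    RecommendationPipeline(CandidateRetriever retriever, RankingModel ranker, FilterChain filters) {
        this.retriever = retriever;
        this.ranker = ranker;
        this.filters = filters;
    }

    public List<Video> recommend(String userId, int count) {
        // Stage 1: Retrieve candidates (sub-50ms)
        List<Video> candidates = retriever.retrieve(userId, RETRIEVE_LIMIT);

        // Stage 2a: cheap pre-selection so the model scores 100 videos, not 500.
        // preselect() is an assumed heuristic pass; it is not part of the original listing.
        List<Video> shortlist = ranker.preselect(candidates, userId, RANK_LIMIT);

        // Stage 2b: Rank using ML model (sub-100ms per batch)
        List<ScoredVideo> scored = ranker.score(shortlist, userId);

        // Stage 3: Apply filters (diversity, freshness)
        List<Video> filtered = filters.apply(scored);

        return filtered.subList(0, Math.min(count, filtered.size()));
    }
}

class CandidateRetriever {
    // Multiple retrieval strategies; the four source methods are assumed to exist elsewhere
    public List<Video> retrieve(String userId, int limit) {
        // LinkedHashSet drops videos returned by more than one source
        // (assumes Video implements equals/hashCode on its id)
        Set<Video> candidates = new LinkedHashSet<>();
        candidates.addAll(getFolloweeVideos(userId, limit/4));
        candidates.addAll(getTrendingVideos(limit/4));
        candidates.addAll(getSimilarUserVideos(userId, limit/4));
        candidates.addAll(getContentBasedVideos(userId, limit/4));
        return new ArrayList<>(candidates);
    }
}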

Output

Recommendations for user 'xyz789' returned 10 videos in 180ms total (45ms retrieve, 95ms rank, 40ms filter).
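Stage 3 is left abstract in the pipeline. One common business rule, a per-creator cap for diversity, could be sketched as follows. `DiversityFilter` and its `Scored` record are hypothetical stand-ins for the article's `FilterChain` and `ScoredVideo`, and the cap value is up to you.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diversity rule: at most maxPerCreator videos from any one creator.
class DiversityFilter {
    // Minimal stand-in for ScoredVideo; the fields are assumptions
    static final class Scored {
        final String videoId;
        final String creatorId;
        final double score;

        Scored(String videoId, String creatorId, double score) {
            this.videoId = videoId;
            this.creatorId = creatorId;
            this.score = score;
        }
    }

    static List<Scored> apply(List<Scored> scored, int maxPerCreator, int count) {
        // Highest score first, without mutating the caller's list
        List<Scored> sorted = new ArrayList<>(scored);
        sorted.sort((a, b) -> Double.compare(b.score, a.score));

        Map<String, Integer> perCreator = new HashMap<>();
        List<Scored> out = new ArrayList<>();
        for (Scored s : sorted) {
            if (out.size() == count) break;
            int seen = perCreator.getOrDefault(s.creatorId, 0);
            if (seen >= maxPerCreator) continue; // creator already has enough slots
            perCreator.put(s.creatorId, seen + 1);
            out.add(s);
        }
        return out;
    }
}
```

Because the cap is applied after scoring, a lower-scored video from a new creator can displace a higher-scored one from a creator who already fills their slots. That is the intended trade of a little predicted engagement for variety.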

Interview Gold: Cold Start Problem

New users have no history. Solution: use a default model based on device type, location, and time of day. Also, force-show a few popular videos to bootstrap the profile. Once the user starts interacting, switch to the personalized model.

Scaling Under Viral Load

A single viral video can cause a 10x traffic spike in minutes. Your system must handle this without falling over. The key is to design for overload: use a queue for video uploads (so transcoding doesn't block the API), use a CDN with a high cache hit ratio, and implement rate limiting at the API gateway. But the most important pattern is circuit breakers: if the feed generation service starts timing out, stop sending requests to it and return a stale cached feed instead. This prevents cascading failures. Also, use autoscaling with a fast startup time—container images should be small (<500MB) and pre-warmed.

ViralSpikeHandler.systemdesign

// io.thecodeforge — System Design tutorial

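// Circuit breaker for feed generation
// FeedGenerator, Cache and Video are assumed project types; getFeed is assumed to declare TimeoutException

import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class FeedCircuitBreaker {
    private final AtomicInteger failureCount = new AtomicInteger(0);
    private final int threshold = 10; // consecutive timeouts before opening
    private final AtomicBoolean open = new AtomicBoolean(false);
    private final FeedGenerator feedGenerator;
    private final Cache<String, List<Video>> staleFeedCache;
    private final ScheduledExecutorService scheduler;

    FeedCircuitBreaker(FeedGenerator feedGenerator, Cache<String, List<Video>> staleFeedCache,
                       ScheduledExecutorService scheduler) {
        this.feedGenerator = feedGenerator;
        this.staleFeedCache = staleFeedCache;
        this.scheduler = scheduler;
    }

    public List<Video> getFeed(String userId) throws TimeoutException {
        if (open.get()) {
            // Return stale cached feed (empty if this user has none)
            List<Video> stale = staleFeedCache.get(userId);
            return stale != null ? stale : List.of();
        }
        try {
            List<Video> feed = feedGenerator.getFeed(userId);
            failureCount.set(0);
            return feed;
        } catch (TimeoutException e) {
            int failures = failureCount.incrementAndGet();
            // compareAndSet ensures only one caller opens the breaker and schedules the reset
            if (failures >= threshold && open.compareAndSet(false, true)) {
                // Close again after 30 seconds with a clean failure count
                scheduler.schedule(() -> {
                    failureCount.set(0);
                    open.set(false);
                }, 30, TimeUnit.SECONDS);
            }
            throw e;
        }
    }
}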

Output

During a viral spike, feed requests that would have timed out now return stale cached feeds in 5ms. Circuit breaker opens after 10 failures, closes after 30s.
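Rate limiting at the API gateway is often done with a token bucket. The sketch below is one minimal way to write it, not tied to any particular gateway product: `TokenBucket` is a hypothetical class, and the clock is passed in explicitly (use `System.nanoTime()` in real code) so the behavior is easy to test.

```java
// Hypothetical token bucket: allows bursts up to `capacity`, refills at a steady rate.
class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    // Returns true if the request may proceed, false if it should be rejected (e.g. HTTP 429)
    synchronized boolean tryAcquire(long nowNanos) {
        long elapsed = Math.max(0, nowNanos - lastRefillNanos);
        tokens = Math.min(capacity, tokens + elapsed * refillPerNano);
        lastRefillNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In practice you would keep one bucket per user or per API key, so a single noisy client cannot starve everyone else during a spike.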

Never Do This: Synchronous Transcoding

Never transcode videos synchronously on upload. A 4K video can take minutes to transcode. Use an async queue (SQS, Kafka) and return a 'processing' status. The client polls until the video is ready.
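The enqueue-and-poll flow can be sketched with an in-memory queue standing in for SQS or Kafka. `TranscodeJobs` and its method names are hypothetical; the point is that `submit` returns immediately while a separate worker drains the queue.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical job tracker: in-memory stand-in for a durable queue plus a status table.
class TranscodeJobs {
    enum Status { PROCESSING, READY, FAILED }

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Map<String, Status> status = new ConcurrentHashMap<>();

    // Upload handler: record the job, enqueue it, and return without transcoding
    Status submit(String videoId) {
        status.put(videoId, Status.PROCESSING);
        queue.add(videoId);
        return Status.PROCESSING;
    }

    // Worker step: transcode one queued video. Returns false when the queue is empty.
    boolean processNext(Consumer<String> transcoder) {
        String videoId = queue.poll();
        if (videoId == null) return false;
        try {
            transcoder.accept(videoId);
            status.put(videoId, Status.READY);
        } catch (RuntimeException e) {
            status.put(videoId, Status.FAILED);
        }
        return true;
    }

    // Client polling endpoint: null means the video id is unknown
    Status statusOf(String videoId) {
        return status.get(videoId);
    }
}
```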

● Production incident · POST-MORTEM · severity: high

The 4GB Container That Kept Dying

Symptom

Feed generation service containers were OOM-killed every 10 minutes during peak hours. CPU was fine, memory was the killer.

Assumption

Team assumed a memory leak in the feed generator code. Spent days profiling heap dumps.

Root cause

The feed generator was loading full video metadata (including user profile pictures) for each video in the candidate pool. For a pool of 500 videos, that's 500 user objects with profile images. Each image was 500KB on average, leading to 250MB per request. With 20 concurrent requests, that's 5GB—over the 4GB container limit.

Fix

Lazy-load user profile images only when rendering the feed card. Reduced per-request memory from 250MB to 2MB. Also added a per-container request concurrency limit of 10.

Key lesson

Never load full user objects in feed generation.
Only load what's needed for ranking, then hydrate display data on the client or via a separate lightweight API.
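The per-container concurrency limit from the fix can be enforced with a semaphore. This is a sketch under assumptions: `ConcurrencyLimiter` is a hypothetical wrapper, and rejecting immediately (rather than queueing) is one possible policy. The arithmetic behind the limit: at 250MB per request, 10 concurrent requests would still need 2.5GB, while at 2MB they need about 20MB.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical guard: caps in-flight feed requests per container.
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the task if a slot is free; otherwise rejects so the caller can shed load
    <T> T callOrReject(Supplier<T> task) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("too many concurrent feed requests");
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always free the slot, even if the task throws
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```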

Production debug guide

Systematic recovery paths for the failure modes engineers actually hit. (3 entries)

Symptom · 01

Feed generation latency >2s for cache misses

→

Fix

1. Check if the candidate retrieval is slow (query database or cache). 2. Ensure ANN index is loaded in memory. 3. Check if ranking model inference is CPU-bound. 4. Add more feed generator pods.

Symptom · 02

Video uploads failing with 503 Service Unavailable

→

Fix

1. Check object store write throughput limits. 2. Ensure upload queue is not full. 3. Verify presigned URL expiration. 4. Increase queue consumers.

Symptom · 03

CDN costs increasing 50% month-over-month

→

Fix

1. Check cache hit ratio (should be >90% for edge). 2. Verify viral threshold is correct. 3. Implement content-aware compression. 4. Negotiate CDN contract for egress discounts.

★ Design TikTok Triage Cheat Sheet

First-response commands for when things go wrong, copy-paste ready.

Feed slow for all users

Immediate action

Check feed cache hit ratio in Redis (keyspace_hits / (keyspace_hits + keyspace_misses))

Commands

redis-cli info stats | grep -E 'keyspace_(hits|misses)'

redis-cli --bigkeys

Fix now

Increase cache size or precompute feeds for top users
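Redis reports cache effectiveness as two counters in `INFO stats`, `keyspace_hits` and `keyspace_misses`, so the hit ratio has to be computed from them. A small helper for dashboards or alerts might look like this; `RedisHitRatio` is a hypothetical name, and it expects the raw text output of `redis-cli info stats`.

```java
// Hypothetical helper: computes hits / (hits + misses) from raw INFO stats output.
class RedisHitRatio {
    static double fromInfoStats(String infoStats) {
        long hits = -1;
        long misses = -1;
        // INFO output uses CRLF line endings; accept plain LF too
        for (String line : infoStats.split("\\r?\\n")) {
            if (line.startsWith("keyspace_hits:")) {
                hits = Long.parseLong(line.substring("keyspace_hits:".length()).trim());
            } else if (line.startsWith("keyspace_misses:")) {
                misses = Long.parseLong(line.substring("keyspace_misses:".length()).trim());
            }
        }
        if (hits < 0 || misses < 0) {
            throw new IllegalArgumentException("keyspace_hits/keyspace_misses not found");
        }
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

These counters are cumulative since the server started (or since the last `CONFIG RESETSTAT`), so compare two samples taken a few minutes apart if you want the current ratio rather than the lifetime one.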

Aspect	Fan-out-on-write	Fan-out-on-read
Latency for active users	Low (precomputed)	High (computed on read)
Write amplification	High (updates for each follower)	None
Storage cost	High (cached feeds)	Low (no cache)
Best for	Active users with few followees	Inactive users or mega-creators

Key takeaways

Feed generation is a trade-off between latency and write amplification: use fan-out-on-write for active users with few followees, fan-out-on-read for inactive users and mega-creators.

Video storage costs dominate: use tiered caching (edge for viral, regional for popular, origin for rest) and content-aware compression to cut costs by 60%.

Recommendation is a funnel: retrieve 500 candidates from multiple sources, rank 100 with ML, return 10. Use ANN for fast content-based retrieval.

The counterintuitive truth: precomputing feeds for everyone is a trap. The most scalable systems use a hybrid approach that adapts to user activity and creator popularity.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

How does TikTok's feed generation handle a user who follows 10,000 peopl...

Q02 · SENIOR

When would you choose fan-out-on-write over fan-out-on-read in a product...

Q03 · SENIOR

What happens when a video goes viral and millions of users request it si...

Q04 · JUNIOR

What is the role of approximate nearest neighbor (ANN) search in TikTok'...

Q05 · SENIOR

A production incident: the feed generation service is OOM-killed every 1...

Q06 · SENIOR

How would you design the system to handle a new feature like 'live strea...

Q01 of 06 · SENIOR

How does TikTok's feed generation handle a user who follows 10,000 people? Does it precompute feeds for all of them?

ANSWER

No. A feed built from 10,000 followees would be invalidated or rewritten on nearly every upload, so a precomputed copy would rarely survive long enough to be read. For users with many followees, the system uses fan-out-on-read: when the user requests their feed, it fetches recent videos from all followees (using a time-ordered index) and ranks them. Precomputation is only for active users with a moderate number of followees (e.g., <1000).
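Fetching "recent videos from all followees" via a time-ordered index is essentially a k-way merge. Here is a sketch, assuming each followee's timeline is already sorted newest-first; `TimelineMerger` and `Post` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-way merge over per-creator timelines (each sorted newest-first).
class TimelineMerger {
    static final class Post {
        final String videoId;
        final long createdAtMillis;

        Post(String videoId, long createdAtMillis) {
            this.videoId = videoId;
            this.createdAtMillis = createdAtMillis;
        }
    }

    static List<Post> mergeNewest(List<List<Post>> timelines, int limit) {
        // Heap entries are {timelineIndex, positionInTimeline}, ordered newest-first
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Long.compare(
                timelines.get(b[0]).get(b[1]).createdAtMillis,
                timelines.get(a[0]).get(a[1]).createdAtMillis));
        for (int i = 0; i < timelines.size(); i++) {
            if (!timelines.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }

        List<Post> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            int[] top = heap.poll();
            List<Post> timeline = timelines.get(top[0]);
            out.add(timeline.get(top[1]));
            // Advance within the same timeline, if it has more posts
            if (top[1] + 1 < timeline.size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return out;
    }
}
```

The heap never holds more than one entry per followee, so building a 500-video candidate list from 10,000 timelines costs on the order of 500 heap operations after the initial load, rather than a scan of every recent post.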
