Senior 11 min · March 06, 2026

Design Instagram — Hybrid Push-Pull Feed at 100M Followers

A celebrity post triggered 100M Kafka fan-out tasks, spiking latency from 200ms to 20s.

N
Naren Founder & Principal Engineer

20+ years shipping production code across the stack, with years spent interviewing engineers. Drawn from code that ran under real load.

Follow
Production
production tested
May 23, 2026
last updated
1,554
articles · all by Naren
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer
  • Instagram serves 2B+ users uploading 100M photos/videos daily, delivering feeds in under 300ms
  • Core components: Load balancer, API gateway, Object Store (S3), SQL/NoSQL databases, CDN, Feed Service
  • Feed generation uses a hybrid push-pull model: push for normal users (fan-out on write), pull for celebrities (fan-out on read)
  • Caching layers: Redis for precomputed feeds, LRU for profiles/metadata, CDN for static media – reduces DB reads by 80%
  • Storage: 200TB new data daily; hot in S3 for 30 days, then cold archive; CDN edge caches popular media globally
  • Scaling: shard by user_id via consistent hashing, use message queues (Kafka) for async fan-out, watch for celebrity hot spots
✦ Definition~90s read
What is Design Instagram?

This is a deep-dive system design walkthrough for building Instagram's feed at 100 million daily active users, framed as a senior-level interview problem. The core challenge is generating a personalized, real-time feed for each user from a pool of 100 million daily photo uploads, while keeping p95 latency under 300ms.

Imagine a giant post office where a billion people each send photo postcards every day.

The article dissects why this specific problem is a favorite in FAANG interviews—it forces you to balance read vs. write amplification, choose between push (fanout-on-write) and pull (fanout-on-read) architectures, and justify tradeoffs at scale. You'll see concrete numbers: 500 million active users, 100 million daily uploads, and the math behind why a pure push model breaks at around 10,000 followers per user average.

The solution presented is a hybrid push-pull model: push for high-value users (celebrities with millions of followers) and pull for everyone else, with a fanout threshold tuned at runtime. The article covers database sharding by user_id hash (not follower count), multi-level caching with Redis clusters for hot feeds and local LRU caches for edge nodes, and object storage tiering from S3/CloudFront CDN to Glacier for cold archival.

You'll see why Cassandra or ScyllaDB is preferred over Postgres for the feed timeline table, and how to shard the social graph (follower edges) across 100+ nodes to avoid hot spots. The caching strategy alone—L1 (in-memory), L2 (Redis), L3 (CDN for media)—is designed to serve 95% of feed requests from L1/L2, with a fallback to pull from the database for cache misses.

This isn't theory; it's the playbook used by teams at Meta, TikTok, and Twitter to handle billions of feed refreshes daily.

Plain-English First

Imagine a giant post office where a billion people each send photo postcards every day. The post office has to instantly sort every postcard, deliver it only to the people who care about that sender, store the original photo safely forever, and let anyone pull up an old postcard in under a second — even at 3 AM during a major event. Instagram is exactly that post office, and designing it means figuring out every room, shelf, conveyor belt, and delivery truck needed to make it all work without ever losing a single photo.

Instagram serves over two billion monthly active users, processes roughly 100 million photo and video uploads every day, and is expected to return a personalised feed in under 300 milliseconds. When an interviewer asks you to design it, they're not looking for a diagram of boxes connected by arrows — they're watching whether you can reason about trade-offs at scale, make deliberate architectural decisions, and defend them under pressure. This is one of the most common system design questions in FAANG-level loops, and candidates who haven't internalised the nuances consistently get stuck on the feed generation problem or grossly underestimate storage requirements.

The core challenge Instagram solves is deceptively simple on the surface: store media, show it to followers, let people discover new content. Underneath, it's a collision of three genuinely hard distributed-systems problems — write-heavy media ingestion, read-heavy personalised feed delivery, and near-real-time social graph queries — all happening concurrently at planetary scale. Each of those problems demands different storage engines, caching strategies, and consistency models, and they have to coexist inside one coherent product.

By the end of this article you'll have a complete, defensible Instagram design you can present in a 45-minute interview. You'll know the exact numbers to anchor your estimates, the right database choices for each data type, how to generate feeds without melting your servers, where CDNs fit, how to shard your data, and — critically — which trade-offs to call out proactively so the interviewer knows you're thinking like an engineer who has shipped things to production, not just read about it.

Why the Instagram Interview Tests Feed Design

The Design Instagram interview asks you to architect a social feed system for 100M+ followers. The core mechanic is the hybrid push-pull model: pre-compute feeds for high-value users (push) and generate on-read for long-tail users (pull). This balances write amplification against read latency.

In practice, you must decide the fanout threshold — typically around 10K followers. Below that, push to all followers' inboxes. Above that, pull from a timeline cache on read. The key properties: push gives O(1) read but O(followers) write; pull gives O(1) write but O(followers) read. Hybrid keeps both under control.

You use this pattern when writes must be fast (posting a photo) and reads must be fresh (scrolling feed). It matters because naive push at 100M followers means a single post triggers 100M writes — impossible. Hybrid is the only way to serve a billion daily active users without collapsing the database.

Fanout Threshold Is Not Static
Teams often hardcode the threshold at 10K, but it must be dynamic — based on current write load, cache hit rate, and follower distribution.
Production Insight
When a celebrity posts during a live event, the pull path for their 50M followers sees a thundering herd on the timeline cache — each read triggers a full fanout query. The symptom is P99 latency spikes from 50ms to 5s. The rule: always add a short-lived write-through cache (TTL 30s) for high-fanout users' recent posts.
Key Takeaway
Push is for high-write-volume users with moderate read volume; pull is for everyone else.
The fanout threshold is the single most impactful tuning knob — get it wrong and you either burn writes or starve reads.
Always design for the 99th percentile user, not the median — a single celebrity post can dominate your system's load.
Instagram Hybrid Push-Pull Feed Design THECODEFORGE.IO Instagram Hybrid Push-Pull Feed Design Architecture for 100M followers with push-pull hybrid model User Upload 100M daily uploads to object store Fanout Service Push to active followers, pull for passive Sharded Database User ID-based sharding for feed tables Multi-Level Cache Redis + local cache for sub-300ms reads Feed Generation Merge push and pull results in real-time CDN & Cold Archive Media served via CDN, old data archived ⚠ Push to all followers at 100M causes write amplification Use hybrid: push to active users, pull for inactive or large fanouts THECODEFORGE.IO
thecodeforge.io
Instagram Hybrid Push-Pull Feed Design
Design Instagram Interview

Core Component Architecture: Handling 100M Uploads Daily

Designing Instagram isn't just about 'uploading a file.' It's about a decoupled architecture where the Write Path (Uploading) and the Read Path (Feed Generation) are optimized independently. On the write side, we use a Load Balancer to distribute traffic to an API Gateway, which handles authentication and rate limiting. The media itself (Photos/Videos) never touches our relational database; instead, it's streamed to an Object Store like AWS S3 or Google Cloud Storage.

We store the metadata (User ID, Photo URL, Timestamp, Location) in a distributed NoSQL database like Cassandra or a sharded PostgreSQL cluster. This separation ensures that even if our metadata database is busy, our media storage remains performant and durable. To make the images load instantly worldwide, we push them to Edge Locations via a Content Delivery Network (CDN).

io.thecodeforge.instagram.MediaService.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
package io.thecodeforge.instagram;

import java.util.UUID;
import java.time.Instant;

/**
 * Represents the Metadata record for a high-scale media upload.
 * In production, this would be persisted to a sharded DB cluster.
 */
public class PhotoMetadata {
    private final String photoId;
    private final String userId;
    private final String s3Url;
    private final long timestamp;

    public PhotoMetadata(String userId, String s3Url) {
        this.photoId = UUID.randomUUID().toString();
        this.userId = userId;
        this.s3Url = s3Url;
        this.timestamp = Instant.now().getEpochSecond();
    }

    public void saveToDatabase() {
        // High-level logic for sharded database insertion
        System.out.println("Persisting metadata to shard based on userId: " + userId);
        System.out.println("Photo ID: " + photoId + " | Storage Path: " + s3Url);
    }

    public static void main(String[] args) {
        PhotoMetadata upload = new PhotoMetadata("user_8821", "https://s3.thecodeforge.io/bucket/img_99.jpg");
        upload.saveToDatabase();
    }
}
Output
Persisting metadata to shard based on userId: user_8821
Photo ID: 7c9e... | Storage Path: https://s3.thecodeforge.io/bucket/img_99.jpg
Forge Tip: The 'Pull' vs 'Push' Feed Model
For regular users, 'Push' their posts to followers' pre-computed feeds (Fan-out). For celebrities with millions of followers, 'Pull' their content only when a follower refreshes their feed. This hybrid approach prevents 'Celebrity Fan-out' from crashing your message queues.
Production Insight
Straight push to all followers works only when your fan-out ratio is low.
At 100M followers per celebrity, a single write triggers 100M queue entries – that's enough to OOM any worker pool.
Rule: always add a dynamic threshold that switches to pull for users above a certain follower count.
Debug it: monitor Kafka partition lag per user – if one user's partition lag grows while others are idle, you've hit the celebrity edge case.
Key Takeaway
Separate write and read paths are non-negotiable.
Push vs pull: efficiency comes from knowing where your curve breaks.
If you don't set a celebrity threshold, a single viral post will take down your feed.
Feed Generation Strategy Decision
IfFollower count < 10,000
UsePush: write post to each follower's feed cache immediately. Low write cost.
IfFollower count 10,000 – 1,000,000
UsePush with batch fan-out: enqueue fan-out tasks in Kafka; use 50 workers per partition.
IfFollower count > 1,000,000
UsePull: store post in a hot table; on feed request, merge recent posts from all followed accounts via a read-path query.

Database Sharding & Scalability Strategies

A single database instance will fail at Instagram's scale. We must shard our data. The best strategy is to shard by User_ID. This ensures that all photos from a single user live on the same shard, making the 'View Profile' query extremely fast. However, for the 'Global Feed', we might need a secondary index or a specialized search service like Elasticsearch.

We also implement a multi-layered caching strategy: Redis for the 'Latest Feed' (Pre-computed), and an LRU cache at the application level for frequently accessed user profiles. This reduces the DB read pressure by over 80%.

SchemaDesign.sqlSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- io.thecodeforge.instagram - Database Schema Concepts
-- Sharding Key: user_id

CREATE TABLE io_thecodeforge.users (
    user_id BIGINT PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE io_thecodeforge.photos (
    photo_id BIGINT PRIMARY KEY,
    user_id BIGINT REFERENCES io_thecodeforge.users(user_id),
    image_path VARCHAR(255) NOT NULL,
    caption TEXT,
    latitude DECIMAL(9,6),
    longitude DECIMAL(9,6),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Index for fast Feed Generation (Sorted by Time)
CREATE INDEX idx_user_photos_time ON io_thecodeforge.photos(user_id, created_at DESC);
Output
Schema created. Sharding logic should be handled by the application layer or middleware like Vitess.
Production Reality:
Don't use auto-incrementing IDs in a distributed system. Use a 'Snowflake' ID generator (like Twitter's) to create unique, time-sortable 64-bit IDs across multiple shards.
Production Insight
Sharding by user_id makes profile queries fast but creates hot spots for celebrities.
One celebrity shard gets 100x the writes of others, causing latency tail to spike.
Rule: use consistent hashing with virtual nodes to spread celebrity write load.
Debug it: monitor per-shard write latency; if one shard is hotter, reassign virtual nodes dynamically.
Key trade-off: you give up easy range queries across users – secondary indexes are needed for global feed or search.
Key Takeaway
Shard by user_id for profile locality.
Consistent hashing handles elastic scaling without downtime.
Always plan for hot keys – they're not a rare edge case, they're guaranteed as you grow.
Shard Key Selection
IfEqual distribution of users across shards
UseUse user_id modulo N – simple, but leads to incremental rebalancing when adding shards.
IfHandle celebrity hot spots without manual rebalancing
UseUse consistent hashing with 1000 virtual nodes per physical shard – automatically distributes hot keys.
IfNeed cross-shard queries (e.g., find photos near a location)
UseAdd a secondary index table sharded differently (e.g., by geo hash) or use Elasticsearch.

Feed Generation: The Push-Pull Hybrid Model

The feed is the heart of Instagram. It must show recent posts from followed users in reverse chronological order (or ranked by engagement). Two classic approaches exist: pull (fan-out on read) and push (fan-out on write). Pull means when a user opens the app, we query all followed users' recent posts and merge. Push means when a user posts, we insert that post into every follower's precomputed feed list.

Pull is efficient for celebrities because you don't push to millions; but it's slow for users following hundreds of accounts because you need many queries. Push is fast for reading but costly for writes, especially for popular users. The solution: use push for regular users (fan-out on write) and pull for users with >1M followers (fan-out on read). This hybrid model balances the trade-offs.

io.thecodeforge.instagram.FeedGenerationService.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
package io.thecodeforge.instagram;

import java.util.*;
import java.util.concurrent.*;

public class FeedGenerationService {
    private static final long CELEBRITY_THRESHOLD = 1_000_000;
    private final FollowerService followerService;
    private final FeedCache feedCache;
    private final ExecutorService fanoutExecutor = Executors.newFixedThreadPool(20);

    public void onNewPost(Post post, String userId) {
        long followerCount = followerService.getFollowerCount(userId);
        if (followerCount < CELEBRITY_THRESHOLD) {
            fanoutToAllFollowers(post, userId);
        } else {
            // celebrity: store post for pull-based retrieval
            feedCache.storeCelebrityPost(userId, post);
        }
    }

    private void fanoutToAllFollowers(Post post, String userId) {
        List<String> followers = followerService.getFollowers(userId);
        for (String followerId : followers) {
            fanoutExecutor.submit(() -> feedCache.addToFeed(followerId, post));
        }
    }

    public List<Post> getFeed(String userId, int limit) {
        // merge cached feed with recent posts from followed celebrities
        List<Post> cachedPosts = feedCache.getFeed(userId);
        List<String> followedCelebrities = followerService.getFollowedCelebrities(userId);
        List<Post> celebrityPosts = feedCache.getRecentCelebrityPosts(followedCelebrities);
        List<Post> merged = mergeAndSort(cachedPosts, celebrityPosts);
        return merged.subList(0, Math.min(limit, merged.size()));
    }

    private List<Post> mergeAndSort(List<Post> a, List<Post> b) {
        // merge two sorted (by timestamp descending) lists
        List<Post> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            result.add(a.get(i).getTimestamp() >= b.get(j).getTimestamp() ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) result.add(a.get(i++));
        while (j < b.size()) result.add(b.get(j++));
        return result;
    }
}
Output
Feed generation service uses hybrid push/pull based on follower count.
The Pub/Sub Hybrid Analogy
  • Push: writer writes once, but many recipients must read. Good for small fan-outs.
  • Pull: reader reads from many sources; good for large fan-outs because writer isn't burdened.
  • Hybrid: set a threshold where the cost of push exceeds the benefit of instant delivery.
  • Threshold should be configurable and adjusted based on system load – you can even make it dynamic.
Production Insight
Push fan-out with 100M followers creates 100M writes per post – that's a 100M spike in write traffic.
If your message queue isn't partitioned properly, a single partition gets backed up and all feeds slow down.
Rule: use multiple Kafka partitions and route fan-out tasks by hash of follower ID, not user ID.
Debug it: if one partition lag is high, check if that user's fan-out tasks are all in one partition due to bad key selection.
Performance impact: hybrid reduces write amplification by 99% for celebrities while keeping feed generation under 50ms for normal users.
Key Takeaway
Feed generation is the main bottleneck at scale.
Push works for small groups, pull works for massive groups – hybrid is the engineering answer.
Make the threshold configurable: you'll tune it based on observed consumer throughput and latency SLAs.

Storage Strategy: Object Store, CDN, and Cold Archival

Media storage at Instagram's scale requires a tiered approach. The primary storage is an object store (AWS S3, Google Cloud Storage) because they offer near-infinite capacity, strong durability (99.999999999%), and pay-per-GB pricing. However, serving every image directly from S3 would be too slow for users far from the data center and expensive in egress costs. That's where CDNs come in: we push popular media to edge servers worldwide so users download from a nearby node.

For cold data – photos older than 30 days with zero views – we move them to a cheaper archival tier (Amazon S3 Glacier, Google Cloud Archive) and serve a placeholder if accessed. The CDN cache also holds a copy for a shorter TTL (e.g., 7 days for popular content, 1 day for normal). Videos are stored as HLS segments for adaptive bitrate streaming.

io.thecodeforge.instagram.StorageStrategy.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
package io.thecodeforge.instagram;

import java.time.*;

public class StorageStrategy {
    enum StorageTier { HOT, COLD, ARCHIVE }

    public StorageTier classifyMedia(String mediaId, Instant lastAccessTime) {
        long daysSinceLastAccess = Duration.between(lastAccessTime, Instant.now()).toDays();
        if (daysSinceLastAccess <= 30) {
            return StorageTier.HOT;
        } else if (daysSinceLastAccess <= 365) {
            return StorageTier.COLD;
        } else {
            return StorageTier.ARCHIVE;
        }
    }

    // Production: this method would invoke S3 lifecycle policies
    public String getMediaUrl(String mediaId, StorageTier tier) {
        switch(tier) {
            case HOT:
                return "https://s3.us-east-1.amazonaws.com/instagram-hot/" + mediaId;
            case COLD:
                return "https://s3.us-west-2.amazonaws.com/instagram-cold/" + mediaId;
            case ARCHIVE:
                return "https://glacier.amazonaws.com/instagram-archive/" + mediaId;
            default:
                throw new IllegalArgumentException("Unknown tier");
        }
    }

    public static void main(String[] args) {
        StorageStrategy s = new StorageStrategy();
        StorageTier tier = s.classifyMedia("photo_abc123", Instant.now().minus(60, ChronoUnit.DAYS));
        System.out.println("Media URL: " + s.getMediaUrl("photo_abc123", tier));
    }
}
Output
Media URL: https://s3.us-west-2.amazonaws.com/instagram-cold/photo_abc123
Data Lifecycle Management
Set S3 lifecycle rules to transition objects: 30 days to Infrequent Access (reduced cost), 365 days to Glacier (archive). CDN TTLs should be shorter (1-7 days) for freshness; use invalidation handles for immediate removal.
Production Insight
Storing all media in one S3 bucket makes it easy to manage but creates a permission nightmare.
Use separate buckets per storage tier (hot, cold, archive) with IAM roles restricting write access to the upload service only.
Rule: never give public read access to S3 – always use CDN signed URLs with expiry.
Performance impact: CDN adds 10-20ms latency but reduces origin load by 95% and saves 80% on egress costs.
Key trade-off: faster global delivery costs more for cache invalidation – you can't immediately purge all edges; use versioned URLs (e.g., include upload timestamp) to avoid cache staleness.
Key Takeaway
S3 for durability, CDN for speed, Glacier for cost.
Never expose S3 direct URLs – always use CDN signed URLs.
Lifecycle policies are cheap automation – set them on day 1, or pay the cost later.

Caching Strategy: Multi-Level Cache for Sub-300ms Feeds

To achieve sub-300ms feed loads, we need a multi-layer cache. The first layer is a CDN for static media (images, video thumbs). The second layer is an in-memory cache (Redis) for precomputed feeds of active users. The third layer is an application-level LRU cache for frequently accessed metadata (user profiles, popular photos).

For feed data: we store the top N recent posts for each user in Redis (capped at 1000 per user). When a user posts, we push to followers' feed caches (for non-celebrities) or store in a 'celebrity post list' in Redis. For membership services (follower counts, liked), we use a separate Redis cluster with eventual consistency. Cache invalidation is handled via version numbers: each user has a feed version; when a new version is available (due to new post), the client refetches.

io.thecodeforge.instagram.CacheStrategy.javaJAVA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
package io.thecodeforge.instagram;

import redis.clients.jedis.Jedis;
import java.util.*;

public class CacheStrategy {
    private static final int FEED_CACHE_SIZE = 1000;
    private final Jedis jedis;

    public CacheStrategy(String redisHost) {
        this.jedis = new Jedis(redisHost);
    }

    public void addPostToUserFeed(String userId, Post post) {
        String key = "feed:" + userId;
        // Use sorted set with timestamp as score
        jedis.zadd(key, post.getTimestamp(), post.getId());
        // Trim to keep only top 1000
        jedis.zremrangeByRank(key, 0, -(FEED_CACHE_SIZE + 1));
    }

    public List<String> getUserFeed(String userId, int start, int count) {
        String key = "feed:" + userId;
        Set<String> ids = jedis.zrevrange(key, start, start + count - 1);
        return new ArrayList<>(ids);
    }

    public long getFeedSize(String userId) {
        return jedis.zcard("feed:" + userId);
    }

    // Cache aside pattern for user profile
    public UserProfile getUserProfile(String userId) {
        String key = "profile:" + userId;
        UserProfile profile = (UserProfile) jedis.get(key); // assuming serialization
        if (profile == null) {
            profile = loadFromDatabase(userId);
            jedis.setex(key, 300, profile.toString()); // 5 min TTL
        }
        return profile;
    }

    private UserProfile loadFromDatabase(String userId) {
        // Placeholder – in reality query sharded DB
        return new UserProfile(userId, "user_" + userId, System.currentTimeMillis());
    }

    public static void main(String[] args) {
        CacheStrategy cache = new CacheStrategy("localhost:6379");
        cache.addPostToUserFeed("user123", new Post("photo1", 1000000L));
        System.out.println("Feed: " + cache.getUserFeed("user123", 0, 10));
    }
}
Output
Feed: [photo1, ...]
Cache Layers as Distance from User
  • Layer 1 (CDN): Static media – miss penalty = 50-100ms (fetch from origin). Hit ratio > 95%.
  • Layer 2 (Redis): Feed data, popularity scores – miss penalty = 5-10ms to get from DB. Hit ratio > 80%.
  • Layer 3 (LRU): User profiles, photo metadata – miss penalty = 1-5ms (local). Hit ratio > 90%.
  • If you have to go to DB, your response time jumps from microseconds to milliseconds – that's where your SLAs break.
Production Insight
Putting all feed data in Redis sounds great until you estimate the memory cost.
500M active users × 1KB per feed entry × 1000 entries = 500TB of RAM. That's $15M/month just for Redis.
Rule: only cache feeds for active users (logged in last 24h). For inactive, regenerate on login.
Memory optimisation: use Redis with compressed data (snappy) and cap feed entries to 200 per user.
Debug it: monitor Redis INFO keyspace; if keyspace_hits ratio drops below 90%, you're caching too little or TTLs are too short.
Performance impact: 99th percentile latency drops from 500ms (no cache) to 180ms (multi-level cache).
Key Takeaway
Cache the feed, not the whole world.
Active-only caching reduces memory by 70% without losing performance.
Always have a cache miss plan: if Redis goes down, your feeds should still work (just slower) by falling back to DB reads.

Capacity Estimation: The Math They Actually Ask For, But Nobody Does Right

Everyone waves hands about "millions of users" in interviews. But when a senior engineer asks you how much storage you need per day, or what bandwidth your CDN has to provision, they're not being pedantic — they're testing if you've dealt with real infra limits. Let's do the numbers, because a 50TB storage requirement you didn't anticipate will kill your architecture faster than any cache miss.

Assume 500M daily active users. Average photo size after compression: 200KB. Average video size after transcoding: 2MB for a 30-second clip. Ratio of photos to videos: 80/20. Uploads per user per day: 0.5 (most users aren't power posters). That's 250M photos (50TB) and 62.5M videos (125TB) daily. Total: 175TB/day raw storage. Before replication. Before cold archival.

Bandwidth? On read-heavy workloads like feed generation, you serve 10x what you ingest. Add in thumbnail generation (multiple resolutions per upload), and your egress from object store to CDN to user is roughly 2PB/day. You do not handle that with a single region or a naive storage strategy. You design for it from day one.

CapacityEstimator.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// io.thecodeforge — interview tutorial

def estimate_daily_ingest(daily_active_users: int, uploads_per_user: float = 0.5,
                          photo_fraction: float = 0.8, avg_photo_kb: int = 200,
                          video_fraction: float = 0.2, avg_video_mb: int = 2) -> dict:
    total_uploads = int(daily_active_users * uploads_per_user)
    photos = int(total_uploads * photo_fraction)
    videos = total_uploads - photos

    photo_storage_gb = (photos * avg_photo_kb) / (1024 * 1024)
    video_storage_gb = (videos * avg_video_mb)

    return {
        "total_daily_uploads": total_uploads,
        "total_storage_gb": round(photo_storage_gb + video_storage_gb, 2),
        "estimated_egress_gb": round((photo_storage_gb + video_storage_gb) * 10, 2)
    }

capacity = estimate_daily_ingest(500_000_000)
print(f"Daily storage: {capacity['total_storage_gb']} GB")
print(f"Daily egress: {capacity['estimated_egress_gb']} GB")
Output
Daily storage: 175000.0 GB
Daily egress: 1750000.0 GB
Production Trap: Underestimating Replication Factor
New engineers always forget replication. If you use S3 standard (3x replication) with a 30-day hot retention before moving to Glacier, your first month's raw storage bill is 175TB * 3 = 525TB. At $0.023/GB hot storage, that's over $12,000/month. Before CDN costs. Do the math before you build.
Key Takeaway
Always include replication factor and retention policies in capacity estimates — it's the hidden cost that kills budgets.

System Requirements: The Black Box Your Interviewer Actually Opens

Competitor pages list Instagram features like a grocery list: "Post photos, follow users, like posts." Bored. Useless. The interviewer doesn't care if you can list features — they care if you can identify which ones create architectural tension. Here's the real distinction:

Core vs. Edge requirements. Posting photos is core. Creating a 4K video reel and expecting sub-second playback latency? That's a hard edge requirement that forces content-addressable storage and adaptive bitrate streaming. Follow/unfollow is easy until you have 10 million followers on a single celebrity account — then your fan-out model breaks.

Non-functional requirements are where the fight lives. Latency: feed generation must complete under 500ms for 99th percentile. Availability: 99.99% for read paths, but writes can tolerate brief inconsistency. Consistency: eventual consistency is fine for feed — users won't notice a 2-minute lag. But likes on a post? That expects real-time increment visibility. Different tradeoffs.

Capacity estimation isn't a separate section — it's how you validate every other choice. Store photos in S3? Great, what's your upload throughput per second? 100M daily uploads / 86400 seconds ≈ 1,157 uploads/second peak, likely 5x that during a viral event. Your upload API needs to handle 6,000 req/s with graceful degradation. That drives autoscaling policy, connection pooling, and database write throughput calculations.

ThroughputCalculator.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// io.thecodeforge — interview tutorial

def estimate_write_throughput(daily_uploads: int, peak_multiplier: float = 5.0):
    seconds_per_day = 86400
    avg_throughput = daily_uploads / seconds_per_day
    # Assume peak hour is 5x average (Instagram's pattern)
    peak_throughput = avg_throughput * peak_multiplier
    return {
        "average_writes_per_second": round(avg_throughput, 2),
        "peak_writes_per_second": round(peak_throughput, 2),
        "peak_p95_percentage": f"{peak_multiplier * 100}% of average"
    }

io_requirements = estimate_write_throughput(100_000_000)
print(f"Avg writes/sec: {io_requirements['average_writes_per_second']}")
print(f"Peak writes/sec: {io_requirements['peak_writes_per_second']}")
Output
Avg writes/sec: 1157.41
Peak writes/sec: 5787.04
Senior Shortcut: State Your Assumptions Out Loud
Tell the interviewer: 'I'm assuming 50/50 read-write split and 5x peak traffic — if your actual profile differs, I'll adjust the database connection pool and caching strategy accordingly.' It shows you model uncertainty, not just ideal numbers.
Key Takeaway
Functional requirements are cheap; non-functional requirements (latency, consistency, throughput) are the real system design constraints.

Sharding Keys: The Decision That Makes or Breaks Your Database

Competitors vaguely mention 'database sharding' but never show you the knife fight: choosing the shard key. Instagram's core tables — User, Post, Follow, Like — each require different sharding strategies, and picking wrong means data hotspots that no cache can fix.

For the Post table, don't shard on user_id. Why? A celebrity posts once and gets 10M likes in an hour. That's 10M writes to a single shard. You've just created a burning hot partition. Instead, shard on post_id (or a hash of post_id). Spreads writes evenly because each post gets its own shard. Reads? Feed generation hits many post_ids anyway, so scatter-gather is unavoidable — but you parallelize that across shards.

For the Follow table? User_id shard works — a user's follow list is read-heavy and small (<10K follows, typically). Hot shard problem exists for celebrities (10M followers), but the read pattern is one user reading their follow graph, not 10M writes. Use a separate adjacency list in a cache layer like Redis for celebrity follow graph lookups. Don't torture your primary database.

For Likes, shard on (post_id, user_id) composite key. The goal is to evenly distribute like events across shards while supporting fast UNIQUE checks to prevent duplicate likes. Composite keys pre-split the write load. If you use auto-increment IDs, wrap them with a shard ID prefix — standard Ticket Server pattern.

ShardKeyDesigner.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
// io.thecodeforge — interview tutorial

import hashlib

class ShardRouter:
    def __init__(self, num_shards: int = 64):
        self.num_shards = num_shards

    def _hash_to_shard(self, key: str) -> int:
        # Consistent hashing avoids reshuffling on shard count change
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % self.num_shards

    def post_shard(self, post_id: str) -> int:
        return self._hash_to_shard(post_id)

    def like_shard(self, post_id: str, user_id: str) -> int:
        # Composite key for even distribution
        return self._hash_to_shard(f"{post_id}:{user_id}")

router = ShardRouter(num_shards=64)
print(f"Post 'abc123' -> shard {router.post_shard('abc123')}")
print(f"Like (post='abc123', user='user_99') -> shard {router.like_shard('abc123', 'user_99')}")
Output
Post 'abc123' -> shard 47
Like (post='abc123', user='user_99') -> shard 12
Production Trap: Don't Use MOD for Shard Keys
MOD-based sharding (post_id % num_shards) causes massive data migration when you add shards. Instagram grew from 8 to 64 shards — that's 8x reshuffling of all data. Use consistent hashing (hash ring) from day one, even if you only have 4 shards. The code is trivial, the cost of changing later is catastrophic.
Key Takeaway
Shard keys must be chosen per query pattern — not a one-size-fits-all. Post by post_id, likes by composite key, follows by user_id with separate cache for celebrity hot keys.

Interactions: The Hot Path You Will Mistake for a Side-Show

Interviews focus on feed reads but the hot path is writes from interactions. Like, comment, share, save — these hit the write-heavy chunk of your architecture every time a user breathes. You get 10x more interaction writes than uploads. And they need immediate consistency because no one wants to see their like disappear after a refresh.

Your read-replica strategy breaks here. Interaction writes must hit the primary shard containing the post owner's data. If you shard by user_id, the post's interaction traffic fans out across shards — every 'like' becomes a cross-shard transaction waiting to fail. Instead, shard interactions by post_id, colocating the post's metadata and interaction counters on the same node. This turns a multi-shard nightmare into a local increment.

The trap is treating interactions as low-priority. They generate the feed's engagement signals, trigger real-time notifications, and feed the ranking algorithm. If your design can't handle 500 concurrent likes on a single post, your Instagram clone dies on the first viral photo.

interaction_handler.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// io.thecodeforge — interview tutorial

class InteractionService:
    def like_post(self, user_id: str, post_id: str):
        # shard by post_id to keep metadata colocated
        post_shard = self.shard_for_post(post_id)
        
        # atomic increment on the primary, no cross-shard
        with post_shard.transaction():
            post_shard.increment('like_count', post_id)
            post_shard.insert('like', {
                'user_id': user_id,
                'post_id': post_id,
                'timestamp': self.now()
            })
        
        # fire async to feed ranking
        self.queue.send('interaction_rank', {
            'post_id': post_id,
            'type': 'like'
        })
Output
like_count incremented for post_abc123
like record inserted
interaction_rank event enqueued
Sharding Trap:
Sharding interactions by user_id creates a cross-shard write storm on viral posts. Always shard by post_id so every like becomes a local single-shard operation.
Key Takeaway
Interactions are the write-heavy hot path. Shard by post_id, not user_id, to avoid cross-shard atomic increment failures.

User_Follows: The Graph Problem Your NoSQL Can't Solve

Your SQL fanboy starts talking about a join table. Wrong. Instagram has 500M+ daily active users. A user follows 500 accounts. That's 250B edges. joins at that scale are a death sentence. You need a directed graph stored as a lookup table in a document store or a specialized graph database like Amazon Neptune or Neo4j.

But here's the real problem: when a post goes viral, your feed generation needs to check 'does user A follow user B?' This check happens per post, per feed request, for every user scrolling. A simple key-value check — user_id -> set of followee_ids — handles this at sub-millisecond latency. Store that in Redis as a sorted set. Use the follow timestamp as the score so the feed generator can paginate by recency without a costly join.

The follow/unfollow operation is another trap. Stale followers in the cache cause phantom content in the feed. Use a write-through cache pattern: on unfollow, delete the cached following set for that user. Then lazy-load it on the next feed request. This keeps consistency without cascading invalidations across nodes.

follow_graph.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// io.thecodeforge — interview tutorial

class FollowGraph:
    def __init__(self, redis, db):
        self.cache = redis
        self.store = db
    
    def get_following(self, user_id: str) -> set:
        # check cache first
        cached = self.cache.smembers(f"following:{user_id}")
        if cached:
            return cached
        
        # lazy load from DB if cache miss
        following = self.store.query(
            "SELECT followee_id FROM follows WHERE follower_id = ?", user_id
        )
        self.cache.sadd(f"following:{user_id}", *following)
        self.cache.expire(f"following:{user_id}", 3600)
        return following
    
    def unfollow(self, follower: str, followee: str):
        self.store.execute(
            "DELETE FROM follows WHERE follower_id = ? AND followee_id = ?",
            follower, followee
        )
        # write-through cache invalidation
        self.cache.delete(f"following:{follower}")
Output
following set cache-missed and loaded from DB
unfollow executed: follower_1234 no longer follows followee_5678
cache cleared for next request
Senior Shortcut:
Store follow relationships as sorted sets in Redis with follow timestamp as the score. This lets you paginate the feed by recency without querying the database for every scroll.
Key Takeaway
The follows graph is a lookup problem, not a join problem. Use Redis sorted sets for sub-millisecond 'does user A follow user B?' checks.

High-Level Design (HLD): The Diagram That Defines Your Interview

Your HLD is not a network topology. It is a story about data flow under constraints. Start with the read vs. write asymmetry: 100M uploads/day vs. 500M feed refreshes/day. Every component choice follows from that ratio. Use a single API gateway for authentication, rate limiting, and request routing. Separate upload and feed read paths at the gateway level—uploads hit a dedicated ingestion service, feeds hit a read-optimized service. The fan-out service writes to each follower's feed cache (push) while a lazy loader fills gaps for inactive users (pull). Your HLD must show the reverse index for comments and likes as separate logical stores, even if physically shared. The interviewer wants to see you encode trade-offs—like why you pick Kafka over RabbitMQ for the async fan-out (throughput, replayability) or why you split the feed read service from the user service (independent scaling, failure isolation). A strong HLD for Instagram is a decision tree, not a checklist.

HLD_Decisions.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// io.thecodeforge — interview tutorial

# HLD decision rules translated to code logic

class FeedHLD:
    def __init__(self):
        self.read_write_ratio = 5_000_000_000 / 100_000_000  # 50:1
        self.use_kafka = True  # async fan-out needs replay

    def choose_gateway_split(self):
        return {"upload": "ingestion_service", "feed": "read_service"}

    def scaling_rules(self):
        # only scale read service horizontally
        return "feed_read instances >> upload instances"
Output
{'upload': 'ingestion_service', 'feed': 'read_service'}
Production Trap:
Putting feed generation and upload processing behind the same service causes cascading failures. A sudden upload spike (e.g., Super Bowl) stalls feed reads for all users.
Key Takeaway
Your HLD is a set of forced trade-offs driven by the read-write ratio.

Low-Level Design (LLD): The Feed Cache That Must Survive Queries per Second

Feed generation at scale fails in data structures, not algorithms. Your LLD must specify the exact Redis data types for the feed cache: a Redis sorted set per user where the key is post_id and score is creation timestamp. Each post stores a serialized object with media_id, user_id, and like_count (not the full image URL—that's in CDN). The fan-out worker writes to these sets in batch: for each new post, it inserts into the top 10,000 followers' sorted sets (push), then signals a lazy loader thread for the rest. The lazy loader uses a Bloom filter to check if a post was already pushed to a user's cache before falling back to the follower's SQL index. Eviction policy is TTL-based (72 hours for active users, 24 for dormant). Cache misses warm from a denormalized Postgres table with index on (user_id, created_at DESC). Your LLD must also handle likes and comments as counter shards: each post's interaction count lives in a Redis hash updated asynchronously via Kafka streams, avoiding write amplification on the feed read path.

FeedCacheLLD.pyPYTHON
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// io.thecodeforge — interview tutorial

import redis

r = redis.Redis(decode_responses=True)

def push_post_to_feed(follower_id, post_id, timestamp):
    # sorted set: key = feed:{user_id}, score = timestamp
    r.zadd(f"feed:{follower_id}", {post_id: timestamp})

def get_feed(user_id, start, end):
    # reverse range, newest first
    return r.zrevrange(f"feed:{user_id}", start, end, withscores=True)

# Bloom filter check before lazy load
# store follower count per post to decide push vs pull
Output
[(b'post_abc', 1701274000), (b'post_xyz', 1701273900)]
Production Trap:
Storing user_follows as a list in Redis causes O(n) walk for top followers. Always store as a sorted set with follower count as score.
Key Takeaway
At LLD, every cache miss equals a relational query you cannot afford.
● Production incidentPOST-MORTEMseverity: high

Celebrity Post Causes Global Feed Delay

Symptom
Feed updates stalled globally for 2 hours. API latency spiked from 200ms to 20s. Message queues (Kafka) backed up with millions of unprocessed fan-out tasks. OOM errors on feed workers.
Assumption
The push-based fan-out model could handle any user because the system was horizontally scalable. The threshold for 'celebrity' handling was not defined.
Root cause
The system applied the same fan-out behaviour to all users. A celebrity with 100M followers triggered 100M entries in a single Kafka partition, overwhelming the consumer group's ability to process before new messages arrived.
Fix
Introduced a hybrid feed generation model: users with follower count > 10M are treated as 'celebrities'. Their posts are stored in a hot table and pulled into follower feeds only when that follower requests their feed. The threshold is configurable via a dynamic feature flag.
Key lesson
  • Profile user follower distributions regularly – the tail (celebrities) is where your system breaks.
  • Always have a safety threshold for push vs pull – don't treat all writes equally.
  • Monitor per-partition lag in Kafka (or equivalent) – it's the first signal that you're overwhelming a single consumer.
Production debug guideSymptom → Immediate Action → Root Cause Analysis4 entries
Symptom · 01
Feed loads slowly (>500ms) or shows stale content
Fix
Check Redis cache hit ratio via redis-cli info stats – if hits < 85%, check feed precomputation worker health. Examine Kafka consumer lag for the feed partition.
Symptom · 02
Photo loading takes >2 seconds or shows broken images
Fix
Verify CDN cache status via curl -I https://cdn.instagram.com/p/.... If X-Cache: MISS, check origin S3 bucket for file existence. Ensure CDN purge didn't wipe popular content.
Symptom · 03
Upload fails with 503 or timeout
Fix
Check API gateway rate limiting logs – if rate exceeded, scale gateway or adjust per-user limits. Also verify auth token expiration and S3 bucket permissions.
Symptom · 04
User sees incorrect follower count or feed order
Fix
Check database replication lag – if SQL/NoSQL replicas are behind, downgrade consistency to LOCAL_QUORUM for reads. Examine async job queue for pending follower count updates.
★ Cheat sheet: Instant Diagnosis for Instagram-Scale IssuesWhen the system goes wrong, these commands get you to the root cause in under 2 minutes.
High API latency
Immediate action
Check if the bottleneck is CPU, memory, or I/O on feed workers.
Commands
kubectl top pods -n feed-service
kubectl logs -l app=feed-worker --tail=100 | grep 'fanout' | head -20
Fix now
Scale feed workers: kubectl scale deployment feed-worker --replicas=20
CDN cache miss ratio > 20%+
Immediate action
Check origin S3 request rate – it might be under heavy load.
Commands
aws cloudwatch get-metric-statistics --metric-name GetRequests --namespace AWS/S3 --statistics Sum --period 300
curl -I <cdn-url> | grep -i 'x-cache-status'
Fix now
Increase CDN TTL for popular media, or use CDN pre-warming for upcoming events.
Kafka consumer lag growing for feed partition+
Immediate action
Identify if one partition is hotter than others (uneven fan-out distribution).
Commands
kafkacat -L -b broker:9092 -t feed-events -J | jq '.topics[].partitions[].partition, .topics[].partitions[].leader'
kafkacat -C -b broker:9092 -t feed-events -p <hot-partition> -o -100 -e | wc -l
Fix now
Temporarily increase consumer threads: kafka-consumer-groups --bootstrap-server broker:9092 --group feed-workers --topic feed-events --reset-offsets --to-latest --execute
Upload error: 'Permission denied'+
Immediate action
Check IAM role permissions for the media upload service.
Commands
aws sts assume-role --role-arn 'arn:aws:iam::123456789012:role/MediaUploadRole' --role-session-name test
aws iam simulate-principal-policy --policy-source-arn 'arn:aws:iam::123456789012:role/MediaUploadRole' --action-names s3:PutObject --resource-arns 'arn:aws:s3:::instagram-media/*'
Fix now
Attach s3:PutObject permission to the role, then test upload again.
Database Choice Comparison for Instagram Components
ComponentRelational (PostgreSQL)NoSQL (Cassandra/HBase)RecommendedWhy
User ProfilesACID compliant, join-friendlyHigh availability, flexible schemaPostgreSQL (sharded)Referential integrity matters; strong consistency for critical user data.
Feed Data (Likes, Comments)High write overhead, scaling issuesColumn-family, fast writes, linear scalabilityCassandraWrite-heavy, tunable consistency; compressible columns.
Media Metadata (Photo URLs, Captions)Possible with shardingExcellent for high throughput writes and readsCassandraHigh write volume (100M/day); no complex joins; eventual consistency fine.
Follower GraphRequires junction table; slow for massive fan-out queriesNot ideal for graph traversalCassandra (with denormalisation) or dedicated graph DB (Neo4j for recommendations)Follower graph is read-heavy; denormalise for fast fan-out; graph DB for recommendation.
Search / ExploreFull-text search limitedElasticsearch/Solr is not NoSQL in the traditional senseElasticsearch (search) + Cassandra (data)Use Elasticsearch for full-text search over posts/descriptions; store source of truth in Cassandra.

Key takeaways

1
Separate Read and Write paths to handle lopsided traffic patterns (1:100 write-to-read ratio).
2
Use Consistent Hashing and Sharding by User_ID to manage data growth across multiple servers.
3
Implement a Hybrid Feed Model (Push for normal users, Pull for celebrities) to avoid the 'Thundering Herd' problem.
4
Leverage CDNs to minimize 'Time to First Byte' (TTFB) for global users.
5
Cache only for active users; use multi-level cache (CDN, Redis, LRU) to achieve sub-300ms feed loads.
6
Always set a celebrity follower threshold
it's not optional, it's a hard requirement for feed stability.

Common mistakes to avoid

5 patterns
×

Storing images directly in the database (BLOB columns)

Symptom
Database size balloons to petabytes, backups take days, read/write latency skyrockets as DB becomes I/O-bound.
Fix
Always store images/videos in an object store (S3, GCS). Keep only the URL string in the database. Use presigned URLs for access control.
×

Ignoring the CDN or deploying a single-region CDN

Symptom
Users in Europe and Asia experience >5s load times for photos stored in US-West; high egress costs from origin.
Fix
Use a global CDN (CloudFront, Cloudflare). Set proper cache-control headers to max-age=86400 for popular content. Pre-warm CDN for major events.
×

Assuming strong consistency is needed for all data (e.g., like counts)

Symptom
Write latency increases due to cross-region replication; availability drops when replicas are down.
Fix
Use eventual consistency for social metrics (likes, comments, follower counts). Show approximate counts with a + indicator. Only enforce strong consistency for user settings and posts.
×

Under-calculating storage growth for videos

Symptom
After 1 year, storage costs exceed projected budget; OOM errors on ingestion pipeline due to slow compression.
Fix
Estimate storage: 100M photos/day * 2MB = 200TB daily. Compress videos with H.265, transcode to multiple resolutions, and move older videos to cold storage (Glacier). Set lifecycle policies from day one.
×

Using auto-increment IDs in distributed sharded databases

Symptom
ID collisions when inserting concurrently; coordination overhead kills write throughput.
Fix
Use a Snowflake-style ID generator (64-bit, time-sortable, datacenter + worker bits). Or use UUID v7 (sortable). Avoid auto-increment entirely.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR
How would you design Instagram's feed generation system to handle celebr...
Q02SENIOR
How would you shard the database for Instagram? Which shard key would yo...
Q03SENIOR
How do you ensure high availability and durability for photo uploads?
Q04SENIOR
Compare using Cassandra vs PostgreSQL for the photo metadata table. Whic...
Q05SENIOR
How would you estimate the storage and bandwidth needed for Instagram?
Q01 of 05SENIOR

How would you design Instagram's feed generation system to handle celebrities with millions of followers?

ANSWER
Use a hybrid push-pull model. For users below a threshold (e.g., 1M followers), push posts to followers' feed caches (fan-out on write) using Kafka for async fan-out. For celebrities above the threshold, store their posts in a hot table in Redis; at feed request time, merge the precomputed cached feed with recent celebrity posts. The threshold should be configurable and monitored via Kafka consumer lag. This prevents write amplification from destroying the message queue and keeps read latency acceptable.
FAQ · 6 QUESTIONS

Frequently Asked Questions

01
How do you handle the 'Celebrity' problem in Instagram's feed?
02
Which database is better for Instagram: SQL or NoSQL?
03
How do you ensure 'High Availability' for image viewing?
04
How do you handle video uploads and streaming?
05
What happens when a user deletes a photo?
06
How do you handle search in Instagram (Explore page)?
N
Naren Founder & Principal Engineer

20+ years shipping production code across the stack, with years spent interviewing engineers. Drawn from code that ran under real load.

Follow
Verified
production tested
May 23, 2026
last updated
1,554
articles · all by Naren
🔥

That's System Design Interview. Mark it forged?

11 min read · try the examples if you haven't

Previous
Design TinyURL — Interview
4 / 7 · System Design Interview
Next
Design a Caching System