Senior 4 min · March 06, 2026

Design Instagram — Hybrid Push-Pull Feed at 100M Followers

A celebrity post triggered 100M Kafka fan-out tasks, spiking latency from 200ms to 20s.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • Instagram serves 2B+ users uploading 100M photos/videos daily, delivering feeds in under 300ms
  • Core components: Load balancer, API gateway, Object Store (S3), SQL/NoSQL databases, CDN, Feed Service
  • Feed generation uses a hybrid push-pull model: push for normal users (fan-out on write), pull for celebrities (fan-out on read)
  • Caching layers: Redis for precomputed feeds, LRU for profiles/metadata, CDN for static media – reduces DB reads by 80%
  • Storage: 200TB new data daily; hot in S3 for 30 days, then cold archive; CDN edge caches popular media globally
  • Scaling: shard by user_id via consistent hashing, use message queues (Kafka) for async fan-out, watch for celebrity hot spots
Plain-English First

Imagine a giant post office where a billion people each send photo postcards every day. The post office has to instantly sort every postcard, deliver it only to the people who care about that sender, store the original photo safely forever, and let anyone pull up an old postcard in under a second — even at 3 AM during a major event. Instagram is exactly that post office, and designing it means figuring out every room, shelf, conveyor belt, and delivery truck needed to make it all work without ever losing a single photo.

Instagram serves over two billion monthly active users, processes roughly 100 million photo and video uploads every day, and is expected to return a personalised feed in under 300 milliseconds. When an interviewer asks you to design it, they're not looking for a diagram of boxes connected by arrows — they're watching whether you can reason about trade-offs at scale, make deliberate architectural decisions, and defend them under pressure. This is one of the most common system design questions in FAANG-level loops, and candidates who haven't internalised the nuances consistently get stuck on the feed generation problem or grossly underestimate storage requirements.

The core challenge Instagram solves is deceptively simple on the surface: store media, show it to followers, let people discover new content. Underneath, it's a collision of three genuinely hard distributed-systems problems — write-heavy media ingestion, read-heavy personalised feed delivery, and near-real-time social graph queries — all happening concurrently at planetary scale. Each of those problems demands different storage engines, caching strategies, and consistency models, and they have to coexist inside one coherent product.

By the end of this article you'll have a complete, defensible Instagram design you can present in a 45-minute interview. You'll know the exact numbers to anchor your estimates, the right database choices for each data type, how to generate feeds without melting your servers, where CDNs fit, how to shard your data, and — critically — which trade-offs to call out proactively so the interviewer knows you're thinking like an engineer who has shipped things to production, not just read about it.

Core Component Architecture: Handling 100M Uploads Daily

Designing Instagram isn't just about 'uploading a file.' It's about a decoupled architecture where the Write Path (Uploading) and the Read Path (Feed Generation) are optimized independently. On the write side, we use a Load Balancer to distribute traffic to an API Gateway, which handles authentication and rate limiting. The media itself (Photos/Videos) never touches our relational database; instead, it's streamed to an Object Store like AWS S3 or Google Cloud Storage.

We store the metadata (User ID, Photo URL, Timestamp, Location) in a distributed NoSQL database like Cassandra or a sharded PostgreSQL cluster. This separation ensures that even if our metadata database is busy, our media storage remains performant and durable. To make the images load instantly worldwide, we push them to Edge Locations via a Content Delivery Network (CDN).

io.thecodeforge.instagram.PhotoMetadata.java · JAVA
package io.thecodeforge.instagram;

import java.util.UUID;
import java.time.Instant;

/**
 * Represents the Metadata record for a high-scale media upload.
 * In production, this would be persisted to a sharded DB cluster.
 */
public class PhotoMetadata {
    private final String photoId;
    private final String userId;
    private final String s3Url;
    private final long timestamp;

    public PhotoMetadata(String userId, String s3Url) {
        this.photoId = UUID.randomUUID().toString();
        this.userId = userId;
        this.s3Url = s3Url;
        this.timestamp = Instant.now().getEpochSecond();
    }

    public void saveToDatabase() {
        // High-level logic for sharded database insertion
        System.out.println("Persisting metadata to shard based on userId: " + userId);
        System.out.println("Photo ID: " + photoId + " | Storage Path: " + s3Url);
    }

    public static void main(String[] args) {
        PhotoMetadata upload = new PhotoMetadata("user_8821", "https://s3.thecodeforge.io/bucket/img_99.jpg");
        upload.saveToDatabase();
    }
}
Output
Persisting metadata to shard based on userId: user_8821
Photo ID: 7c9e... | Storage Path: https://s3.thecodeforge.io/bucket/img_99.jpg
Forge Tip: The 'Pull' vs 'Push' Feed Model
For regular users, 'Push' their posts to followers' pre-computed feeds (Fan-out). For celebrities with millions of followers, 'Pull' their content only when a follower refreshes their feed. This hybrid approach prevents 'Celebrity Fan-out' from crashing your message queues.
Production Insight
Straight push to all followers works only when your fan-out ratio is low.
At 100M followers per celebrity, a single write triggers 100M queue entries – that's enough to OOM any worker pool.
Rule: always add a dynamic threshold that switches to pull for users above a certain follower count.
Debug it: monitor Kafka consumer lag per partition – if one partition's lag grows while the others sit idle, a single author's fan-out is pinned to it and you've hit the celebrity edge case.
Key Takeaway
Separate write and read paths are non-negotiable.
Push vs pull: efficiency comes from knowing where your curve breaks.
If you don't set a celebrity threshold, a single viral post will take down your feed.
Feed Generation Strategy Decision
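If: Follower count < 10,000
Use: Push. Write the post to each follower's feed cache immediately. Low write cost.
If: Follower count 10,000 – 1,000,000
Use: Push with batch fan-out. Enqueue fan-out tasks in Kafka and size the partition count to the consumer pool; within a consumer group each partition is read by one consumer, which can hand batches to its own worker threads (e.g., 50).
If: Follower count > 1,000,000
Use: Pull. Store the post in a hot table; on feed request, merge recent posts from followed celebrity accounts into the precomputed feed via a read-path query.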
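The decision table above can be sketched as a small policy object. This is a minimal illustration, not the article's implementation: `FanoutPolicy`, `Strategy`, and `choose` are hypothetical names, and the two thresholds are constructor arguments so they can be tuned at runtime.

```java
// Hypothetical sketch: picks a fan-out strategy from a follower count,
// mirroring the three rows of the decision table.
class FanoutPolicy {
    enum Strategy { PUSH_DIRECT, PUSH_BATCHED, PULL }

    private final long batchThreshold;     // below this: write straight to feed caches
    private final long celebrityThreshold; // above this: fan-out on read

    FanoutPolicy(long batchThreshold, long celebrityThreshold) {
        if (batchThreshold < 0 || batchThreshold > celebrityThreshold) {
            throw new IllegalArgumentException("thresholds must satisfy 0 <= batch <= celebrity");
        }
        this.batchThreshold = batchThreshold;
        this.celebrityThreshold = celebrityThreshold;
    }

    Strategy choose(long followerCount) {
        if (followerCount < batchThreshold) {
            return Strategy.PUSH_DIRECT;
        }
        if (followerCount <= celebrityThreshold) {
            return Strategy.PUSH_BATCHED; // enqueue fan-out tasks for async workers
        }
        return Strategy.PULL;
    }

    public static void main(String[] args) {
        FanoutPolicy policy = new FanoutPolicy(10_000L, 1_000_000L);
        System.out.println(policy.choose(500L));         // PUSH_DIRECT
        System.out.println(policy.choose(250_000L));     // PUSH_BATCHED
        System.out.println(policy.choose(100_000_000L)); // PULL
    }
}
```

Keeping the thresholds as fields rather than constants is what lets a feature flag or config service adjust them without a deploy.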

Database Sharding & Scalability Strategies

A single database instance will fail at Instagram's scale. We must shard our data. The best strategy is to shard by User_ID. This ensures that all photos from a single user live on the same shard, making the 'View Profile' query extremely fast. However, for the 'Global Feed', we might need a secondary index or a specialized search service like Elasticsearch.

We also implement a multi-layered caching strategy: Redis for the 'Latest Feed' (Pre-computed), and an LRU cache at the application level for frequently accessed user profiles. This reduces the DB read pressure by over 80%.

SchemaDesign.sql · SQL
-- io.thecodeforge.instagram - Database Schema Concepts
-- Sharding Key: user_id

CREATE TABLE io_thecodeforge.users (
    user_id BIGINT PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE io_thecodeforge.photos (
    photo_id BIGINT PRIMARY KEY,
    user_id BIGINT REFERENCES io_thecodeforge.users(user_id),
    image_path VARCHAR(255) NOT NULL,
    caption TEXT,
    latitude DECIMAL(9,6),
    longitude DECIMAL(9,6),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Index for fast Feed Generation (Sorted by Time)
CREATE INDEX idx_user_photos_time ON io_thecodeforge.photos(user_id, created_at DESC);
Output
Schema created. Sharding logic should be handled by the application layer or middleware like Vitess.
Production Reality:
Don't use auto-incrementing IDs in a distributed system. Use a 'Snowflake' ID generator (like Twitter's) to create unique, time-sortable 64-bit IDs across multiple shards.
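A Snowflake-style generator could look roughly like the sketch below. It assumes the commonly described layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits; the class name, the custom epoch, and the single 10-bit worker field (which a deployment might split into datacenter and worker bits) are illustrative choices.

```java
// Hypothetical sketch of a Snowflake-style ID generator:
// [41 bits: ms since custom epoch][10 bits: worker id][12 bits: per-ms sequence]
class SnowflakeIdGenerator {
    private static final long CUSTOM_EPOCH_MS = 1_577_836_800_000L; // 2020-01-01T00:00:00Z (assumed)
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestampMs = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestampMs) {
            throw new IllegalStateException("Clock moved backwards; refusing to generate IDs");
        }
        if (now == lastTimestampMs) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while (now <= lastTimestampMs) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestampMs = now;
        return ((now - CUSTOM_EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator generator = new SnowflakeIdGenerator(7L);
        System.out.println(generator.nextId());
        System.out.println(generator.nextId()); // always larger than the previous ID
    }
}
```

Because the timestamp occupies the high bits, IDs from one worker sort by creation time, which lines up with the `(user_id, created_at DESC)` access pattern in the schema.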
Production Insight
Sharding by user_id makes profile queries fast but creates hot spots for celebrities.
One celebrity shard gets 100x the writes of others, causing latency tail to spike.
Rule: use consistent hashing with virtual nodes to keep load even across shards; a single celebrity's key still maps to one shard, so back it with caching.
Debug it: monitor per-shard write latency; if one shard is hotter, reassign virtual nodes dynamically.
Key trade-off: you give up easy range queries across users – secondary indexes are needed for global feed or search.
Key Takeaway
Shard by user_id for profile locality.
Consistent hashing handles elastic scaling without downtime.
Always plan for hot keys – they're not a rare edge case, they're guaranteed as you grow.
Shard Key Selection
If: Equal distribution of users across shards
Use: user_id modulo N. Simple, but changing N remaps most keys, so adding shards means a near-full rebalance.
If: Handle celebrity hot spots without manual rebalancing
Use: Consistent hashing with 1000 virtual nodes per physical shard. Load spreads evenly and only a fraction of keys move when shards change; a single hot key still lands on one shard, so pair it with caching.
If: Need cross-shard queries (e.g., find photos near a location)
Use: A secondary index table sharded differently (e.g., by geo hash), or Elasticsearch.
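To make the consistent-hashing option concrete, here is a minimal ring with virtual nodes. It is a sketch under assumptions: `ConsistentHashRing` and its methods are hypothetical, MD5 is used only as a convenient well-mixed hash, and vnode hash collisions are ignored.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each physical shard is placed on the ring many times
// (virtual nodes); a user maps to the first vnode at or after its hash.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    void addShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(shard + "#" + i), shard);
        }
    }

    void removeShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(shard + "#" + i));
        }
    }

    String shardFor(long userId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no shards");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(hash("user:" + userId));
        // Wrap around to the first vnode when the hash is past the last one.
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    // First 8 bytes of the MD5 digest, read as a signed 64-bit value.
    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0L;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(1000);
        for (int s = 0; s < 4; s++) {
            ring.addShard("shard-" + s);
        }
        String before = ring.shardFor(8821L);
        ring.addShard("shard-4");
        System.out.println("user 8821: " + before + " -> " + ring.shardFor(8821L));
    }
}
```

When a shard is added, a user either stays where it was or moves to the new shard. With modulo N, by contrast, most users change shards.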

Feed Generation: The Push-Pull Hybrid Model

The feed is the heart of Instagram. It must show recent posts from followed users in reverse chronological order (or ranked by engagement). Two classic approaches exist: pull (fan-out on read) and push (fan-out on write). Pull means when a user opens the app, we query all followed users' recent posts and merge. Push means when a user posts, we insert that post into every follower's precomputed feed list.

Pull is efficient for celebrities because you don't push to millions; but it's slow for users following hundreds of accounts because you need many queries. Push is fast for reading but costly for writes, especially for popular users. The solution: use push for regular users (fan-out on write) and pull for users with >1M followers (fan-out on read). This hybrid model balances the trade-offs.

io.thecodeforge.instagram.FeedGenerationService.java · JAVA
package io.thecodeforge.instagram;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Post, FollowerService and FeedCache are collaborators assumed to be defined elsewhere in this package.
public class FeedGenerationService {
    private static final long CELEBRITY_THRESHOLD = 1_000_000;
    private final FollowerService followerService;
    private final FeedCache feedCache;
    private final ExecutorService fanoutExecutor = Executors.newFixedThreadPool(20);

    // The final fields must be assigned, so the dependencies are injected here.
    public FeedGenerationService(FollowerService followerService, FeedCache feedCache) {
        this.followerService = followerService;
        this.feedCache = feedCache;
    }

    public void onNewPost(Post post, String userId) {
        long followerCount = followerService.getFollowerCount(userId);
        if (followerCount < CELEBRITY_THRESHOLD) {
            fanoutToAllFollowers(post, userId);
        } else {
            // celebrity: store post for pull-based retrieval
            feedCache.storeCelebrityPost(userId, post);
        }
    }

    private void fanoutToAllFollowers(Post post, String userId) {
        // Loads the full follower list; acceptable only below the celebrity threshold.
        List<String> followers = followerService.getFollowers(userId);
        for (String followerId : followers) {
            fanoutExecutor.submit(() -> feedCache.addToFeed(followerId, post));
        }
    }

    public List<Post> getFeed(String userId, int limit) {
        // merge cached feed with recent posts from followed celebrities
        List<Post> cachedPosts = feedCache.getFeed(userId);
        List<String> followedCelebrities = followerService.getFollowedCelebrities(userId);
        List<Post> celebrityPosts = feedCache.getRecentCelebrityPosts(followedCelebrities);
        List<Post> merged = mergeAndSort(cachedPosts, celebrityPosts);
        return merged.subList(0, Math.min(limit, merged.size()));
    }

    private List<Post> mergeAndSort(List<Post> a, List<Post> b) {
        // Two-pointer merge; both inputs must already be sorted by timestamp descending.
        List<Post> result = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            result.add(a.get(i).getTimestamp() >= b.get(j).getTimestamp() ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) result.add(a.get(i++));
        while (j < b.size()) result.add(b.get(j++));
        return result;
    }
}
Output
Feed generation service uses hybrid push/pull based on follower count.
The Pub/Sub Hybrid Analogy
  • Push: the writer writes once per follower, and each reader reads one precomputed list. Good for small fan-outs.
  • Pull: reader reads from many sources; good for large fan-outs because writer isn't burdened.
  • Hybrid: set a threshold where the cost of push exceeds the benefit of instant delivery.
  • Threshold should be configurable and adjusted based on system load – you can even make it dynamic.
Production Insight
Push fan-out with 100M followers turns one post into 100M feed writes.
If your message queue isn't partitioned properly, a single partition gets backed up and all feeds slow down.
Rule: use multiple Kafka partitions and route fan-out tasks by hash of follower ID, not the author's user ID.
Debug it: if one partition's lag is high, check whether that author's fan-out tasks all share one partition because of bad key selection.
Performance impact: hybrid reduces write amplification by 99% for celebrities while keeping feed generation under 50ms for normal users.
Key Takeaway
Feed generation is the main bottleneck at scale.
Push works for small groups, pull works for massive groups – hybrid is the engineering answer.
Make the threshold configurable: you'll tune it based on observed consumer throughput and latency SLAs.
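The key-selection rule can be illustrated with a toy hash-mod partitioner. This is a sketch, assuming `String.hashCode` as a stand-in for the hash a Kafka producer applies to the record key; `FanoutPartitioner`, `partitionFor`, and the key strings are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: shows how the choice of record key decides
// whether one post's fan-out tasks pile into a single partition.
class FanoutPartitioner {
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 12;
        Set<Integer> keyedByAuthor = new HashSet<>();
        Set<Integer> keyedByFollower = new HashSet<>();
        for (int i = 0; i < 10_000; i++) {
            // Same author on every task: every task hashes to the same partition.
            keyedByAuthor.add(partitionFor("celebrity_1", partitions));
            // Different follower on every task: tasks spread over the partitions.
            keyedByFollower.add(partitionFor("follower_" + i, partitions));
        }
        System.out.println("partitions used, keyed by author:   " + keyedByAuthor.size());
        System.out.println("partitions used, keyed by follower: " + keyedByFollower.size());
    }
}
```

Keying by author always reports one partition, which is the backlog pattern described in the insight above.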

Storage Strategy: Object Store, CDN, and Cold Archival

Media storage at Instagram's scale requires a tiered approach. The primary storage is an object store (AWS S3, Google Cloud Storage) because they offer near-infinite capacity, strong durability (99.999999999%), and pay-per-GB pricing. However, serving every image directly from S3 would be too slow for users far from the data center and expensive in egress costs. That's where CDNs come in: we push popular media to edge servers worldwide so users download from a nearby node.

For cold data – photos older than 30 days with zero views – we move them to a cheaper archival tier (Amazon S3 Glacier, Google Cloud Archive) and serve a placeholder if accessed. The CDN cache also holds a copy for a shorter TTL (e.g., 7 days for popular content, 1 day for normal). Videos are stored as HLS segments for adaptive bitrate streaming.

io.thecodeforge.instagram.StorageStrategy.java · JAVA
package io.thecodeforge.instagram;

import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit; // not covered by java.time.*

public class StorageStrategy {
    enum StorageTier { HOT, COLD, ARCHIVE }

    // Classifies media by days since last access: <= 30 hot, <= 365 cold, older is archive.
    public StorageTier classifyMedia(String mediaId, Instant lastAccessTime) {
        long daysSinceLastAccess = Duration.between(lastAccessTime, Instant.now()).toDays();
        if (daysSinceLastAccess <= 30) {
            return StorageTier.HOT;
        } else if (daysSinceLastAccess <= 365) {
            return StorageTier.COLD;
        } else {
            return StorageTier.ARCHIVE;
        }
    }

    // Production: tier transitions are configured as S3 lifecycle rules on the bucket,
    // not triggered per request. The URLs below are illustrative; archived objects
    // must be restored before they can be read.
    public String getMediaUrl(String mediaId, StorageTier tier) {
        switch (tier) {
            case HOT:
                return "https://s3.us-east-1.amazonaws.com/instagram-hot/" + mediaId;
            case COLD:
                return "https://s3.us-west-2.amazonaws.com/instagram-cold/" + mediaId;
            case ARCHIVE:
                return "https://glacier.amazonaws.com/instagram-archive/" + mediaId;
            default:
                throw new IllegalArgumentException("Unknown tier");
        }
    }

    public static void main(String[] args) {
        StorageStrategy s = new StorageStrategy();
        StorageTier tier = s.classifyMedia("photo_abc123", Instant.now().minus(60, ChronoUnit.DAYS));
        System.out.println("Media URL: " + s.getMediaUrl("photo_abc123", tier));
    }
}
Output
Media URL: https://s3.us-west-2.amazonaws.com/instagram-cold/photo_abc123
Data Lifecycle Management
Set S3 lifecycle rules to transition objects: 30 days to Infrequent Access (reduced cost), 365 days to Glacier (archive). CDN TTLs should be shorter (1-7 days) for freshness; use invalidation requests for immediate removal.
Production Insight
Storing all media in one S3 bucket makes it easy to manage but creates a permission nightmare.
Use separate buckets per storage tier (hot, cold, archive) with IAM roles restricting write access to the upload service only.
Rule: never give public read access to S3 – always use CDN signed URLs with expiry.
Performance impact: CDN adds 10-20ms latency but reduces origin load by 95% and saves 80% on egress costs.
Key trade-off: faster global delivery costs more for cache invalidation – you can't immediately purge all edges; use versioned URLs (e.g., include upload timestamp) to avoid cache staleness.
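The versioned-URL idea from the trade-off above can be sketched in a few lines. The host `cdn.example.com`, the `/media/{id}/v{timestamp}.jpg` path layout, and the `VersionedMediaUrl` class are all hypothetical; the version sits in the path because some CDN configurations leave query strings out of the cache key.

```java
// Hypothetical sketch: a new upload timestamp yields a new URL, so edges
// treat replaced media as a different object instead of serving a stale copy.
class VersionedMediaUrl {
    static String build(String cdnBase, String mediaId, long uploadEpochSeconds) {
        return cdnBase + "/media/" + mediaId + "/v" + uploadEpochSeconds + ".jpg";
    }

    public static void main(String[] args) {
        String base = "https://cdn.example.com";
        // prints https://cdn.example.com/media/photo_abc123/v1700000000.jpg
        System.out.println(build(base, "photo_abc123", 1_700_000_000L));
        // A re-upload gets a later timestamp and therefore a fresh cache key.
        System.out.println(build(base, "photo_abc123", 1_700_000_500L));
    }
}
```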
Key Takeaway
S3 for durability, CDN for speed, Glacier for cost.
Never expose S3 direct URLs – always use CDN signed URLs.
Lifecycle policies are cheap automation – set them on day 1, or pay the cost later.
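For sizing the tiers, the article's own figures (100M uploads per day at roughly 2MB each) can be turned into a quick back-of-envelope calculation. The sketch below uses decimal units and ignores replication and multiple transcoded resolutions, which would raise the totals; `StorageEstimate` and its method names are hypothetical.

```java
// Hypothetical sketch: back-of-envelope storage growth from the article's figures.
class StorageEstimate {
    static final long UPLOADS_PER_DAY = 100_000_000L;
    static final long AVG_BYTES_PER_UPLOAD = 2_000_000L; // ~2 MB, decimal units

    static long dailyBytes() {
        return UPLOADS_PER_DAY * AVG_BYTES_PER_UPLOAD;
    }

    static double toTerabytes(long bytes) {
        return bytes / 1e12;
    }

    // Raw data accumulated over a retention window, in TB.
    static double retainedTerabytes(int days) {
        return toTerabytes(dailyBytes()) * days;
    }

    public static void main(String[] args) {
        System.out.printf("New data per day: %.0f TB%n", toTerabytes(dailyBytes())); // 200 TB
        System.out.printf("30-day hot tier:  %.0f TB%n", retainedTerabytes(30));     // 6000 TB
        System.out.printf("One year total:   %.0f TB%n", retainedTerabytes(365));    // 73000 TB
    }
}
```

A 30-day hot window therefore holds about 6PB before anything transitions to a colder tier.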

Caching Strategy: Multi-Level Cache for Sub-300ms Feeds

To achieve sub-300ms feed loads, we need a multi-layer cache. The first layer is a CDN for static media (images, video thumbs). The second layer is an in-memory cache (Redis) for precomputed feeds of active users. The third layer is an application-level LRU cache for frequently accessed metadata (user profiles, popular photos).

For feed data: we store the top N recent posts for each user in Redis (capped at 1000 per user). When a user posts, we push to followers' feed caches (for non-celebrities) or store in a 'celebrity post list' in Redis. For membership services (follower counts, likes), we use a separate Redis cluster with eventual consistency. Cache invalidation is handled via version numbers: each user has a feed version; when a new version is available (due to new post), the client refetches.

io.thecodeforge.instagram.CacheStrategy.java · JAVA
package io.thecodeforge.instagram;

import redis.clients.jedis.Jedis;
import java.util.ArrayList;
import java.util.List;

// Post and UserProfile are assumed to be defined elsewhere in this package.
public class CacheStrategy {
    private static final int FEED_CACHE_SIZE = 1000;
    private static final int PROFILE_TTL_SECONDS = 300; // 5 min
    private final Jedis jedis;

    public CacheStrategy(String redisHost, int redisPort) {
        this.jedis = new Jedis(redisHost, redisPort);
    }

    public void addPostToUserFeed(String userId, Post post) {
        String key = "feed:" + userId;
        // Sorted set: score = post timestamp, member = post ID
        jedis.zadd(key, post.getTimestamp(), post.getId());
        // Drop the oldest entries so only the newest FEED_CACHE_SIZE remain
        jedis.zremrangeByRank(key, 0, -(FEED_CACHE_SIZE + 1));
    }

    public List<String> getUserFeed(String userId, int start, int count) {
        String key = "feed:" + userId;
        // Newest first. The return type of zrevrange differs between Jedis major
        // versions (Set vs List); copying into a list works with either.
        return new ArrayList<>(jedis.zrevrange(key, start, start + count - 1));
    }

    public long getFeedSize(String userId) {
        return jedis.zcard("feed:" + userId);
    }

    // Cache-aside pattern for user profile
    public UserProfile getUserProfile(String userId) {
        String key = "profile:" + userId;
        String cached = jedis.get(key); // Redis returns a String, not a UserProfile
        if (cached != null) {
            return UserProfile.deserialize(cached); // assumed helper, e.g. JSON parsing
        }
        UserProfile profile = loadFromDatabase(userId);
        jedis.setex(key, PROFILE_TTL_SECONDS, profile.serialize()); // assumed helper
        return profile;
    }

    private UserProfile loadFromDatabase(String userId) {
        // Placeholder – in reality query sharded DB
        return new UserProfile(userId, "user_" + userId, System.currentTimeMillis());
    }

    public static void main(String[] args) {
        CacheStrategy cache = new CacheStrategy("localhost", 6379);
        cache.addPostToUserFeed("user123", new Post("photo1", 1000000L));
        System.out.println("Feed: " + cache.getUserFeed("user123", 0, 10));
    }
}
Output
Feed: [photo1, ...]
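The version-number invalidation mentioned earlier in this section can be sketched with an in-memory counter per user. This is an illustration only: `FeedVersionTracker` and its methods are hypothetical, and a real deployment would more likely keep the counter in a shared store (for example a Redis counter) so that every API server sees the same value.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: each user's feed has a version that is bumped whenever
// a post lands in it; a client refetches only if its version is behind.
class FeedVersionTracker {
    private final ConcurrentHashMap<String, AtomicLong> versions = new ConcurrentHashMap<>();

    // Called by the fan-out path after adding a post to the user's feed.
    long bump(String userId) {
        return versions.computeIfAbsent(userId, id -> new AtomicLong()).incrementAndGet();
    }

    long current(String userId) {
        AtomicLong version = versions.get(userId);
        return version == null ? 0L : version.get();
    }

    // The client sends the version it last saw with each poll.
    boolean needsRefetch(String userId, long clientVersion) {
        return clientVersion < current(userId);
    }

    public static void main(String[] args) {
        FeedVersionTracker tracker = new FeedVersionTracker();
        long seen = tracker.current("user123");                        // 0: nothing yet
        tracker.bump("user123");                                       // a followed user posted
        System.out.println(tracker.needsRefetch("user123", seen));     // true
        System.out.println(tracker.needsRefetch("user123", tracker.current("user123"))); // false
    }
}
```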
Cache Layers as Distance from User
  • Layer 1 (CDN): Static media – miss penalty = 50-100ms (fetch from origin). Hit ratio > 95%.
  • Layer 2 (Redis): Feed data, popularity scores – miss penalty = 5-10ms to get from DB. Hit ratio > 80%.
  • Layer 3 (LRU): User profiles, photo metadata – miss penalty = 1-5ms (local). Hit ratio > 90%.
  • If you have to go to DB, your response time jumps from microseconds to milliseconds – that's where your SLAs break.
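Layer 3, the application-level LRU, can be approximated with `LinkedHashMap` in access order. This is a minimal sketch, assuming a single-threaded caller; `LruCache` is a hypothetical name, and a production service would wrap it for thread safety or use a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a fixed-capacity LRU cache built on LinkedHashMap.
// accessOrder=true moves an entry to the end on every get/put, so the
// eldest entry is always the least recently used one.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once the cap is exceeded
    }

    public static void main(String[] args) {
        LruCache<String, String> profiles = new LruCache<>(2);
        profiles.put("user_1", "Alice");
        profiles.put("user_2", "Bob");
        profiles.get("user_1");          // user_1 is now the most recently used
        profiles.put("user_3", "Carol"); // evicts user_2
        System.out.println(profiles.keySet()); // [user_1, user_3]
    }
}
```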
Production Insight
Putting all feed data in Redis sounds great until you estimate the memory cost.
500M active users × 1KB per feed entry × 1000 entries = 500TB of RAM. That's $15M/month just for Redis.
Rule: only cache feeds for active users (logged in last 24h). For inactive, regenerate on login.
Memory optimisation: use Redis with compressed data (snappy) and cap feed entries to 200 per user.
Debug it: run redis-cli info stats and compute keyspace_hits / (keyspace_hits + keyspace_misses); if the ratio drops below 90%, you're caching too little or TTLs are too short.
Performance impact: 99th percentile latency drops from 500ms (no cache) to 180ms (multi-level cache).
Key Takeaway
Cache the feed, not the whole world.
Active-only caching reduces memory by 70% without losing performance.
Always have a cache miss plan: if Redis goes down, your feeds should still work (just slower) by falling back to DB reads.
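A cache-miss plan can be as simple as treating a cache failure like a miss. In the sketch below, `FeedReader` and the two lookup functions are hypothetical stand-ins for the Redis client and the sharded database query.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: read the feed from cache, and fall back to the
// database both on a miss and when the cache itself is unavailable.
class FeedReader {
    private final Function<String, List<String>> cacheLookup;    // may throw if Redis is down
    private final Function<String, List<String>> databaseLookup; // slower source of truth

    FeedReader(Function<String, List<String>> cacheLookup,
               Function<String, List<String>> databaseLookup) {
        this.cacheLookup = cacheLookup;
        this.databaseLookup = databaseLookup;
    }

    List<String> readFeed(String userId) {
        try {
            List<String> cached = cacheLookup.apply(userId);
            if (cached != null && !cached.isEmpty()) {
                return cached;
            }
        } catch (RuntimeException cacheFailure) {
            // Cache outage: degrade to the database instead of failing the request.
        }
        return databaseLookup.apply(userId);
    }

    public static void main(String[] args) {
        FeedReader reader = new FeedReader(
                userId -> { throw new IllegalStateException("redis unavailable"); },
                userId -> List.of("photo_from_db"));
        System.out.println(reader.readFeed("user123")); // [photo_from_db]
    }
}
```

In practice the fallback path needs its own rate limit, since a full cache outage sends every feed read to the database at once.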
Production incident · POST-MORTEM · severity: high

Celebrity Post Causes Global Feed Delay

Symptom
Feed updates stalled globally for 2 hours. API latency spiked from 200ms to 20s. Message queues (Kafka) backed up with millions of unprocessed fan-out tasks. OOM errors on feed workers.
Assumption
The push-based fan-out model could handle any user because the system was horizontally scalable. The threshold for 'celebrity' handling was not defined.
Root cause
The system applied the same fan-out behaviour to all users. A celebrity with 100M followers triggered 100M entries in a single Kafka partition, overwhelming the consumer group's ability to process before new messages arrived.
Fix
Introduced a hybrid feed generation model: users with follower count > 10M are treated as 'celebrities'. Their posts are stored in a hot table and pulled into follower feeds only when that follower requests their feed. The threshold is configurable via a dynamic feature flag.
Key lesson
  • Profile user follower distributions regularly – the tail (celebrities) is where your system breaks.
  • Always have a safety threshold for push vs pull – don't treat all writes equally.
  • Monitor per-partition lag in Kafka (or equivalent) – it's the first signal that you're overwhelming a single consumer.
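The per-partition lag check in the last lesson could be automated along these lines. The sketch assumes lag figures have already been collected into a map (for instance from a metrics system); `HotPartitionDetector`, the upper-median baseline, and the multiplier are illustrative choices, not an established alerting rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: flags partitions whose consumer lag is far above
// the typical (upper-median) lag across the topic.
class HotPartitionDetector {
    static List<Integer> hotPartitions(Map<Integer, Long> lagByPartition, double factor) {
        List<Integer> hot = new ArrayList<>();
        if (lagByPartition.isEmpty()) {
            return hot;
        }
        long[] sorted = lagByPartition.values().stream()
                .mapToLong(Long::longValue).sorted().toArray();
        // Upper median; floor of 1 avoids flagging everything when the median is 0.
        long baseline = Math.max(sorted[sorted.length / 2], 1L);
        // TreeMap gives a stable, ascending partition order in the result.
        for (Map.Entry<Integer, Long> entry : new TreeMap<>(lagByPartition).entrySet()) {
            if (entry.getValue() > factor * baseline) {
                hot.add(entry.getKey());
            }
        }
        return hot;
    }

    public static void main(String[] args) {
        Map<Integer, Long> lag = new HashMap<>();
        lag.put(0, 120L);
        lag.put(1, 95L);
        lag.put(2, 4_000_000L); // the partition holding a celebrity's fan-out
        lag.put(3, 110L);
        System.out.println(hotPartitions(lag, 10.0)); // [2]
    }
}
```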
Production debug guide: Symptom → Immediate Action → Root Cause Analysis (4 entries)
Symptom · 01
Feed loads slowly (>500ms) or shows stale content
Fix
Check Redis cache hit ratio via redis-cli info stats – if hits < 85%, check feed precomputation worker health. Examine Kafka consumer lag for the feed partition.
Symptom · 02
Photo loading takes >2 seconds or shows broken images
Fix
Verify CDN cache status via curl -I https://cdn.instagram.com/p/.... If X-Cache: MISS, check origin S3 bucket for file existence. Ensure CDN purge didn't wipe popular content.
Symptom · 03
Upload fails with 503 or timeout
Fix
Check API gateway rate limiting logs – if rate exceeded, scale gateway or adjust per-user limits. Also verify auth token expiration and S3 bucket permissions.
Symptom · 04
User sees incorrect follower count or feed order
Fix
Check database replication lag – if SQL/NoSQL replicas are behind, raise read consistency (e.g., LOCAL_QUORUM in Cassandra) or route the read to the primary. Examine async job queue for pending follower count updates.
Cheat sheet: Instant Diagnosis for Instagram-Scale Issues
When the system goes wrong, these commands get you to the root cause in under 2 minutes.
High API latency
Immediate action
Check if the bottleneck is CPU, memory, or I/O on feed workers.
Commands
kubectl top pods -n feed-service
kubectl logs -l app=feed-worker --tail=100 | grep 'fanout' | head -20
Fix now
Scale feed workers: kubectl scale deployment feed-worker --replicas=20
CDN cache miss ratio > 20%
Immediate action
Check origin S3 request rate – it might be under heavy load.
Commands
aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name GetRequests --dimensions Name=BucketName,Value=<bucket> Name=FilterId,Value=<filter-id> --start-time <start> --end-time <end> --statistics Sum --period 300
curl -sI <cdn-url> | grep -i 'x-cache'
Fix now
Increase CDN TTL for popular media, or use CDN pre-warming for upcoming events.
Kafka consumer lag growing for feed partition
Immediate action
Identify if one partition is hotter than others (uneven fan-out distribution).
Commands
kafka-consumer-groups --bootstrap-server broker:9092 --describe --group feed-workers
kafkacat -C -b broker:9092 -t feed-events -p <hot-partition> -o -100 -e | wc -l
Fix now
Add consumers, up to the partition count: kubectl scale deployment feed-worker --replicas=<partition-count>. Do not clear the lag with kafka-consumer-groups --bootstrap-server broker:9092 --group feed-workers --topic feed-events --reset-offsets --to-latest --execute unless you accept losing the queued fan-out tasks: it skips every unprocessed message and only runs while the group is inactive.
Upload error: 'Permission denied'
Immediate action
Check IAM role permissions for the media upload service.
Commands
aws sts assume-role --role-arn 'arn:aws:iam::123456789012:role/MediaUploadRole' --role-session-name test
aws iam simulate-principal-policy --policy-source-arn 'arn:aws:iam::123456789012:role/MediaUploadRole' --action-names s3:PutObject --resource-arns 'arn:aws:s3:::instagram-media/*'
Fix now
Attach s3:PutObject permission to the role, then test upload again.
Database Choice Comparison for Instagram Components
| Component | Relational (PostgreSQL) | NoSQL (Cassandra/HBase) | Recommended | Why |
|---|---|---|---|---|
| User Profiles | ACID compliant, join-friendly | High availability, flexible schema | PostgreSQL (sharded) | Referential integrity matters; strong consistency for critical user data. |
| Feed Data (Likes, Comments) | High write overhead, scaling issues | Column-family, fast writes, linear scalability | Cassandra | Write-heavy, tunable consistency; compressible columns. |
| Media Metadata (Photo URLs, Captions) | Possible with sharding | Excellent for high throughput writes and reads | Cassandra | High write volume (100M/day); no complex joins; eventual consistency fine. |
| Follower Graph | Requires junction table; slow for massive fan-out queries | Not ideal for graph traversal | Cassandra (with denormalisation) or dedicated graph DB (Neo4j for recommendations) | Follower graph is read-heavy; denormalise for fast fan-out; graph DB for recommendation. |
| Search / Explore | Full-text search limited | Elasticsearch/Solr is not NoSQL in the traditional sense | Elasticsearch (search) + Cassandra (data) | Use Elasticsearch for full-text search over posts/descriptions; store source of truth in Cassandra. |

Key takeaways

1. Separate Read and Write paths to handle lopsided traffic patterns (1:100 write-to-read ratio).
2. Use Consistent Hashing and Sharding by User_ID to manage data growth across multiple servers.
3. Implement a Hybrid Feed Model (Push for normal users, Pull for celebrities) to avoid celebrity fan-out write amplification.
4. Leverage CDNs to minimize 'Time to First Byte' (TTFB) for global users.
5. Cache only for active users; use multi-level cache (CDN, Redis, LRU) to achieve sub-300ms feed loads.
6. Always set a celebrity follower threshold: it's not optional, it's a hard requirement for feed stability.

Common mistakes to avoid


Storing images directly in the database (BLOB columns)

Symptom
Database size balloons to petabytes, backups take days, read/write latency skyrockets as DB becomes I/O-bound.
Fix
Always store images/videos in an object store (S3, GCS). Keep only the URL string in the database. Use presigned URLs for access control.

Ignoring the CDN or deploying a single-region CDN

Symptom
Users in Europe and Asia experience >5s load times for photos stored in US-West; high egress costs from origin.
Fix
Use a global CDN (CloudFront, Cloudflare). Set proper cache-control headers to max-age=86400 for popular content. Pre-warm CDN for major events.

Assuming strong consistency is needed for all data (e.g., like counts)

Symptom
Write latency increases due to cross-region replication; availability drops when replicas are down.
Fix
Use eventual consistency for social metrics (likes, comments, follower counts). Show approximate counts with a + indicator. Only enforce strong consistency for user settings and posts.

Under-calculating storage growth for videos

Symptom
After 1 year, storage costs exceed projected budget; OOM errors on ingestion pipeline due to slow compression.
Fix
Estimate storage: 100M photos/day * 2MB = 200TB daily. Compress videos with H.265, transcode to multiple resolutions, and move older videos to cold storage (Glacier). Set lifecycle policies from day one.

Using auto-increment IDs in distributed sharded databases

Symptom
ID collisions when inserting concurrently; coordination overhead kills write throughput.
Fix
Use a Snowflake-style ID generator (64-bit, time-sortable, datacenter + worker bits). Or use UUID v7 (sortable). Avoid auto-increment entirely.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR: How would you design Instagram's feed generation system to handle celebr...
Q02 · SENIOR: How would you shard the database for Instagram? Which shard key would yo...
Q03 · SENIOR: How do you ensure high availability and durability for photo uploads?
Q04 · SENIOR: Compare using Cassandra vs PostgreSQL for the photo metadata table. Whic...
Q05 · SENIOR: How would you estimate the storage and bandwidth needed for Instagram?
Q01 of 05 · SENIOR

How would you design Instagram's feed generation system to handle celebrities with millions of followers?

ANSWER
Use a hybrid push-pull model. For users below a threshold (e.g., 1M followers), push posts to followers' feed caches (fan-out on write) using Kafka for async fan-out. For celebrities above the threshold, store their posts in a hot table in Redis; at feed request time, merge the precomputed cached feed with recent celebrity posts. The threshold should be configurable and monitored via Kafka consumer lag. This prevents write amplification from destroying the message queue and keeps read latency acceptable.
FAQ · 6 QUESTIONS

Frequently Asked Questions

01. How do you handle the 'Celebrity' problem in Instagram's feed?
02. Which database is better for Instagram: SQL or NoSQL?
03. How do you ensure 'High Availability' for image viewing?
04. How do you handle video uploads and streaming?
05. What happens when a user deletes a photo?
06. How do you handle search in Instagram (Explore page)?