Advanced 12 min · March 06, 2026

Design Instagram — Interview

Design Instagram — Hybrid Push-Pull Feed at 100M Followers

Q: How do you handle the 'Celebrity' problem in Instagram's feed?

We use a hybrid approach. For regular users, we 'fan-out' (push) their posts to followers' feeds. For celebrities like Cristiano Ronaldo, we don't push to 600M+ people. Instead, we store their post in a 'Hot Table' and pull it into a follower's feed only at request time.

Q: Which database is better for Instagram: SQL or NoSQL?

Both. SQL (sharded) is better for relational data like User Profiles and Followers where referential integrity matters. NoSQL (like Cassandra) is better for the massive, write-heavy stream of Likes, Comments, and Activity Feeds.

Q: How do you ensure 'High Availability' for image viewing?

We use Object Storage with cross-region replication and a global CDN. If one region goes down, the CDN automatically routes requests to the next nearest healthy edge location or origin server.

Q: How do you handle video uploads and streaming?

Videos are uploaded as single files, then a transcoding service (e.g., AWS Elastic Transcoder) converts them to HLS segments at multiple bitrates. The segments are stored in S3 and served via a CDN with adaptive bitrate playback. Metadata (duration, thumbnail URLs) is stored alongside photo metadata in Cassandra. A separate worker handles thumbnail generation.

Q: What happens when a user deletes a photo?

The delete request goes to the API gateway, which marks the photo as deleted in Cassandra (soft delete). A background worker then removes the file from S3 and invalidates the CDN cache for that URL. The metadata is retained for a short period (30 days) for legal compliance, then hard-deleted. The feed cache is updated to remove the post from followers' feeds via a delete marker.

Q: How do you handle search in Instagram (Explore page)?

We index photo captions, hashtags, and locations into Elasticsearch. When a user opens Explore, we serve a curated feed using collaborative filtering and popularity signals. Search queries hit Elasticsearch for full-text matching, while the actual photo metadata (URLs) is fetched from Cassandra in a second round trip. This decoupling allows each system to scale independently.

A celebrity post triggered 100M Kafka fan-out tasks, spiking latency from 200ms to 20s.

Naren Founder & Principal Engineer

20+ years shipping production code across the stack, with years spent interviewing engineers. Drawn from code that ran under real load.

July 18, 2026

last updated

Before you start⏱ 30 min

✓Deep production experience
✓Understanding of internals and trade-offs
✓Experience debugging complex systems

⚡Quick Answer

Instagram serves 2B+ users uploading 100M photos/videos daily, delivering feeds in under 300ms
Core components: Load balancer, API gateway, Object Store (S3), SQL/NoSQL databases, CDN, Feed Service
Feed generation uses a hybrid push-pull model: push for normal users (fan-out on write), pull for celebrities (fan-out on read)
Caching layers: Redis for precomputed feeds, LRU for profiles/metadata, CDN for static media – reduces DB reads by 80%
Storage: 200TB new data daily; hot in S3 for 30 days, then cold archive; CDN edge caches popular media globally
Scaling: shard by user_id via consistent hashing, use message queues (Kafka) for async fan-out, watch for celebrity hot spots

✦ Definition~90s read

What is Design Instagram?

This is a deep-dive system design walkthrough for building Instagram's feed at 100 million daily active users, framed as a senior-level interview problem. The core challenge is generating a personalized, real-time feed for each user from a pool of 100 million daily photo uploads, while keeping p95 latency under 300ms.

The article dissects why this specific problem is a favorite in FAANG interviews—it forces you to balance read vs. write amplification, choose between push (fanout-on-write) and pull (fanout-on-read) architectures, and justify tradeoffs at scale. You'll see concrete numbers: 500 million active users, 100 million daily uploads, and the math behind why a pure push model breaks at around 10,000 followers per user average.

The solution presented is a hybrid push-pull model: push for ordinary users and pull for high-follower accounts (celebrities with millions of followers), with a fanout threshold tuned at runtime. The article covers database sharding by user_id hash (not follower count), multi-level caching with Redis clusters for hot feeds and local LRU caches for edge nodes, and object storage tiering from S3/CloudFront CDN to Glacier for cold archival.

You'll see why Cassandra or ScyllaDB is preferred over Postgres for the feed timeline table, and how to shard the social graph (follower edges) across 100+ nodes to avoid hot spots. The caching strategy alone—L1 (in-memory), L2 (Redis), L3 (CDN for media)—is designed to serve 95% of feed requests from L1/L2, with a fallback to pull from the database for cache misses.

This isn't theory; it's the playbook used by teams at Meta, TikTok, and Twitter to handle billions of feed refreshes daily.

Plain-English First

Imagine a giant post office where a billion people each send photo postcards every day. The post office has to instantly sort every postcard, deliver it only to the people who care about that sender, store the original photo safely forever, and let anyone pull up an old postcard in under a second — even at 3 AM during a major event. Instagram is exactly that post office, and designing it means figuring out every room, shelf, conveyor belt, and delivery truck needed to make it all work without ever losing a single photo.

Instagram serves over two billion monthly active users, processes roughly 100 million photo and video uploads every day, and is expected to return a personalised feed in under 300 milliseconds. When an interviewer asks you to design it, they're not looking for a diagram of boxes connected by arrows — they're watching whether you can reason about trade-offs at scale, make deliberate architectural decisions, and defend them under pressure. This is one of the most common system design questions in FAANG-level loops, and candidates who haven't internalised the nuances consistently get stuck on the feed generation problem or grossly underestimate storage requirements.

The core challenge Instagram solves is deceptively simple on the surface: store media, show it to followers, let people discover new content. Underneath, it's a collision of three genuinely hard distributed-systems problems — write-heavy media ingestion, read-heavy personalised feed delivery, and near-real-time social graph queries — all happening concurrently at planetary scale. Each of those problems demands different storage engines, caching strategies, and consistency models, and they have to coexist inside one coherent product.

By the end of this article you'll have a complete, defensible Instagram design you can present in a 45-minute interview. You'll know the exact numbers to anchor your estimates, the right database choices for each data type, how to generate feeds without melting your servers, where CDNs fit, how to shard your data, and — critically — which trade-offs to call out proactively so the interviewer knows you're thinking like an engineer who has shipped things to production, not just read about it.

Why the Instagram Interview Tests Feed Design

The Design Instagram interview asks you to architect a social feed that still works when a single account has 100M+ followers. The core mechanic is the hybrid push-pull model: precompute feeds on write for ordinary accounts (push) and assemble posts on read for the few very-high-follower accounts (pull). This balances write amplification against read latency.

In practice, you must decide the fanout threshold, typically around 10K followers. Below it, push each post into every follower's inbox. Above it, pull from a timeline cache on read. The key properties: push gives O(1) reads but O(followers) writes per post; pull gives O(1) writes but O(accounts followed) reads per feed load. Hybrid keeps both under control.

You use this pattern when writes must be fast (posting a photo) and reads must be fresh (scrolling the feed). It matters because naive push at 100M followers means a single post triggers 100M feed writes, a burst that delays every other user's fan-out while the queue drains. At this scale a hybrid is the practical way to keep both posting and feed reads responsive.
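The asymmetry between the two models can be made concrete with a small cost model. This is a minimal sketch for reasoning in an interview, not a measurement: the function names are hypothetical, and it counts only feed-store operations, ignoring caching and batching.

```python
def push_cost(follower_count: int) -> dict:
    """Fan-out on write: one feed insert per follower, one read per feed load."""
    return {"writes_per_post": follower_count, "reads_per_feed_load": 1}


def pull_cost(followee_count: int) -> dict:
    """Fan-out on read: one write per post, one lookup per followed account."""
    return {"writes_per_post": 1, "reads_per_feed_load": followee_count}


# Compare an ordinary account with a celebrity-sized one.
for followers in (300, 10_000, 100_000_000):
    writes = push_cost(followers)["writes_per_post"]
    print(f"{followers:>11,} followers: push costs {writes:,} writes per post")

# A reader following 500 accounts pays 500 lookups per feed load under pure pull.
print(pull_cost(500))
```

The point of the model is that neither column is cheap everywhere: push moves the cost to the author's follower count, pull moves it to the reader's follow list.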

⚠ Fanout Threshold Is Not Static

Teams often hardcode the threshold at 10K, but it must be dynamic — based on current write load, cache hit rate, and follower distribution.
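One way to make the threshold dynamic is a simple feedback rule driven by fan-out queue lag. The sketch below is an assumption-heavy illustration: it uses only queue lag (not cache hit rate or follower distribution), and the function name, the 5-second target, the halving step, and the floor and ceiling are all hypothetical values chosen for readability.

```python
def adjust_threshold(current: int, queue_lag_seconds: float,
                     target_lag_seconds: float = 5.0,
                     floor: int = 1_000, ceiling: int = 1_000_000) -> int:
    """Lower the push threshold when fan-out workers fall behind, raise it when idle."""
    if queue_lag_seconds > target_lag_seconds:
        proposed = current // 2          # shed write load: more accounts switch to pull
    elif queue_lag_seconds < target_lag_seconds / 2:
        proposed = int(current * 1.25)   # spare capacity: more accounts switch to push
    else:
        proposed = current               # inside the comfort band: leave it alone
    return max(floor, min(ceiling, proposed))


print(adjust_threshold(10_000, queue_lag_seconds=12.0))  # workers behind
print(adjust_threshold(10_000, queue_lag_seconds=1.0))   # workers idle
```

Halving quickly and growing slowly is a deliberate choice here: overshooting toward pull costs some read latency, while overshooting toward push can back up the queue for everyone.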

📊 Production Insight

When a celebrity posts during a live event, the pull path for their 50M followers sees a thundering herd on the timeline cache — each read triggers a full fanout query. The symptom is P99 latency spikes from 50ms to 5s. The rule: always add a short-lived write-through cache (TTL 30s) for high-fanout users' recent posts.
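A short-lived cache in front of the celebrity post lookup is one way to absorb that herd. The sketch below is a read-through variant with request coalescing (swapped in for the write-through cache named above because it is easier to show in isolation). `ShortTtlCache` and the loader are hypothetical names, and a single global lock is used for brevity where a production version would more likely lock per key.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple


class ShortTtlCache:
    """Caches a loader's result per key for a few seconds, loading once per expiry."""

    def __init__(self, loader: Callable[[str], List[str]], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> List[str]:
        # Holding the lock across the load coalesces concurrent readers:
        # they wait for one load instead of all hitting the backing store.
        with self._lock:
            now = self._clock()
            entry = self._entries.get(key)
            if entry is not None and now < entry[0]:
                return entry[1]
            value = self._loader(key)
            self._entries[key] = (now + self._ttl, value)
            return value


backend_calls = []

def load_recent_posts(celebrity_id: str) -> List[str]:
    backend_calls.append(celebrity_id)   # stands in for the expensive query
    return [f"{celebrity_id}:latest"]

cache = ShortTtlCache(load_recent_posts)
for _ in range(1000):
    cache.get("celeb_42")
print(f"backend queries for 1000 reads: {len(backend_calls)}")
```

The trade-off is staleness: followers may see a celebrity's new post up to one TTL late, which is usually acceptable for a feed.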

🎯 Key Takeaway

Push is for accounts with modest follower counts; pull is for the few accounts whose follower counts make fan-out on write too expensive.

The fanout threshold is the single most impactful tuning knob — get it wrong and you either burn writes or starve reads.

Always design for the 99th percentile user, not the median — a single celebrity post can dominate your system's load.

thecodeforge.io

Design Instagram Interview

Core Component Architecture: Handling 100M Uploads Daily

Designing Instagram isn't just about 'uploading a file.' It's about a decoupled architecture where the Write Path (Uploading) and the Read Path (Feed Generation) are optimized independently. On the write side, we use a Load Balancer to distribute traffic to an API Gateway, which handles authentication and rate limiting. The media itself (Photos/Videos) never touches our relational database; instead, it's streamed to an Object Store like AWS S3 or Google Cloud Storage.

We store the metadata (User ID, Photo URL, Timestamp, Location) in a distributed NoSQL database like Cassandra or a sharded PostgreSQL cluster. This separation ensures that even if our metadata database is busy, our media storage remains performant and durable. To make the images load instantly worldwide, we push them to Edge Locations via a Content Delivery Network (CDN).

io.thecodeforge.instagram.PhotoMetadata.java (Java)

package io.thecodeforge.instagram;

import java.util.UUID;
import java.time.Instant;

/**
 * Represents the Metadata record for a high-scale media upload.
 * In production, this would be persisted to a sharded DB cluster.
 */
public class PhotoMetadata {
    private final String photoId;
    private final String userId;
    private final String s3Url;
    private final long timestamp;

    public PhotoMetadata(String userId, String s3Url) {
        this.photoId = UUID.randomUUID().toString();
        this.userId = userId;
        this.s3Url = s3Url;
        this.timestamp = Instant.now().getEpochSecond();
    }

    public void saveToDatabase() {
        // High-level logic for sharded database insertion
        System.out.println("Persisting metadata to shard based on userId: " + userId);
        System.out.println("Photo ID: " + photoId + " | Storage Path: " + s3Url);
    }

    public static void main(String[] args) {
        PhotoMetadata upload = new PhotoMetadata("user_8821", "https://s3.thecodeforge.io/bucket/img_99.jpg");
        upload.saveToDatabase();
    }
}

Output

Persisting metadata to shard based on userId: user_8821

Photo ID: 7c9e... | Storage Path: https://s3.thecodeforge.io/bucket/img_99.jpg

🔥Forge Tip: The 'Pull' vs 'Push' Feed Model

For regular users, 'Push' their posts to followers' pre-computed feeds (Fan-out). For celebrities with millions of followers, 'Pull' their content only when a follower refreshes their feed. This hybrid approach prevents 'Celebrity Fan-out' from crashing your message queues.

📊 Production Insight

Straight push to all followers works only when your fan-out ratio is low.

At 100M followers per celebrity, a single write triggers 100M queue entries – that's enough to OOM any worker pool.

Rule: always add a dynamic threshold that switches to pull for users above a certain follower count.

Debug it: monitor Kafka consumer lag per partition; if one partition's lag keeps growing while the others sit idle, a single account's fan-out is probably pinned to it and you've hit the celebrity edge case.

🎯 Key Takeaway

Separate write and read paths are non-negotiable.

Push vs pull: efficiency comes from knowing where your curve breaks.

If you don't set a celebrity threshold, a single viral post will take down your feed.

Feed Generation Strategy Decision

If follower count < 10,000 → Use push: write the post to each follower's feed cache immediately. Write cost is low.

If follower count is 10,000 – 1,000,000 → Use push with batched fan-out: enqueue fan-out tasks in Kafka and let each partition's consumer hand batches to a worker pool (about 50 workers).

If follower count > 1,000,000 → Use pull: store the post in a hot table; on feed request, merge recent posts from the followed high-follower accounts via a read-path query.
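The three tiers above can be written as a single selection function. This is a minimal sketch: the enum and function names are hypothetical, and the two limits are simply the figures from the table, exposed as parameters so they can be tuned.

```python
from enum import Enum


class FanoutStrategy(Enum):
    PUSH_DIRECT = "push_direct"     # write straight into follower feed caches
    PUSH_BATCHED = "push_batched"   # enqueue fan-out tasks for background workers
    PULL = "pull"                   # store once, merge at read time


def select_fanout(follower_count: int, direct_limit: int = 10_000,
                  pull_limit: int = 1_000_000) -> FanoutStrategy:
    """Pick a fan-out strategy from the author's follower count."""
    if follower_count < direct_limit:
        return FanoutStrategy.PUSH_DIRECT
    if follower_count <= pull_limit:
        return FanoutStrategy.PUSH_BATCHED
    return FanoutStrategy.PULL


for count in (850, 250_000, 600_000_000):
    print(f"{count:>11,} followers -> {select_fanout(count).value}")
```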

Database Sharding & Scalability Strategies

A single database instance will fail at Instagram's scale. We must shard our data. The best strategy is to shard by User_ID. This ensures that all photos from a single user live on the same shard, making the 'View Profile' query extremely fast. However, for the 'Global Feed', we might need a secondary index or a specialized search service like Elasticsearch.

We also implement a multi-layered caching strategy: Redis for the 'Latest Feed' (Pre-computed), and an LRU cache at the application level for frequently accessed user profiles. This reduces the DB read pressure by over 80%.

SchemaDesign.sql (SQL)

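-- io.thecodeforge.instagram - Database Schema Concepts
-- Sharding Key: user_id

-- The schema must exist before tables can be created in it
CREATE SCHEMA IF NOT EXISTS io_thecodeforge;

CREATE TABLE io_thecodeforge.users (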
    user_id BIGINT PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE io_thecodeforge.photos (
    photo_id BIGINT PRIMARY KEY,
    user_id BIGINT REFERENCES io_thecodeforge.users(user_id),
    image_path VARCHAR(255) NOT NULL,
    caption TEXT,
    latitude DECIMAL(9,6),
    longitude DECIMAL(9,6),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Index for fast Feed Generation (Sorted by Time)
CREATE INDEX idx_user_photos_time ON io_thecodeforge.photos(user_id, created_at DESC);

Output

Schema created. Sharding logic should be handled by the application layer or middleware like Vitess.

⚠ Production Reality:

Don't use auto-incrementing IDs in a distributed system. Use a 'Snowflake' ID generator (like Twitter's) to create unique, time-sortable 64-bit IDs across multiple shards.
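A Snowflake-style generator packs a timestamp, a worker id, and a per-millisecond sequence into one 64-bit integer. The sketch below follows the commonly described 41/10/12 bit layout; the class name and the custom epoch are assumptions made for illustration, and a production generator would also need a policy for assigning worker ids.

```python
import threading
import time


class SnowflakeGenerator:
    """64-bit IDs: 41 bits of milliseconds | 10 bits of worker id | 12 bits of sequence."""

    EPOCH_MS = 1_600_000_000_000   # arbitrary fixed past instant (an assumption)
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id must fit in 10 bits")
        self._worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return int(time.time() * 1000)

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp_part = (now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS)
            return timestamp_part | (self._worker_id << self.SEQUENCE_BITS) | self._sequence


generator = SnowflakeGenerator(worker_id=7)
first, second = generator.next_id(), generator.next_id()
print(first < second)   # later IDs always compare greater on the same worker
```

Because the timestamp occupies the high bits, sorting by ID approximates sorting by creation time, which is what makes these IDs useful as feed cursors.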

📊 Production Insight

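Sharding by user_id makes profile queries fast but creates hot spots for celebrities.

The shard that holds a celebrity's data can see 100x the traffic of its neighbours, which spikes tail latency.

Rule: use consistent hashing with virtual nodes to even out key placement, and split a hot key itself (for example, user_id plus a bucket suffix) or front it with a cache, because a single key always hashes to a single shard.

Debug it: monitor per-shard read and write latency; if one shard runs hotter, move some of its virtual nodes to other machines.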

Key trade-off: you give up easy range queries across users – secondary indexes are needed for global feed or search.

🎯 Key Takeaway

Shard by user_id for profile locality.

Consistent hashing limits data movement when you add or remove shards, which makes elastic scaling far less disruptive.

Always plan for hot keys – they're not a rare edge case, they're guaranteed as you grow.

Shard Key Selection

If you need an equal distribution of users across shards → Use user_id modulo N: simple, but changing N remaps most keys, so adding a shard means a near-total reshuffle.

If you need to rebalance load without manual resharding → Use consistent hashing with about 1,000 virtual nodes per physical shard: adding a shard moves roughly 1/N of the keys. A single hot key still lands on one shard, so celebrity accounts need key splitting or caching on top.

If you need cross-shard queries (e.g., find photos near a location) → Add a secondary index table sharded differently (e.g., by geohash) or use Elasticsearch.
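The difference between modulo sharding and consistent hashing shows up when a shard is added. The sketch below builds a small hash ring and measures how many keys change shards when going from 8 to 9 shards. All names are hypothetical, MD5 is used only as a stable hash, and 100 virtual nodes per shard (rather than 1,000) keeps the example fast.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, shards, vnodes_per_shard: int = 100):
        # Each shard owns many points on the ring; a key belongs to the next point clockwise.
        self._ring = sorted(
            (stable_hash(f"{shard}#vn{i}"), shard)
            for shard in shards
            for i in range(vnodes_per_shard)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


def fraction_moved(before, after, keys) -> float:
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user_{i}" for i in range(20_000)]
ring8 = ConsistentHashRing([f"shard{i}" for i in range(8)])
ring9 = ConsistentHashRing([f"shard{i}" for i in range(9)])

ring_moved = fraction_moved(ring8.shard_for, ring9.shard_for, keys)
mod_moved = fraction_moved(lambda k: stable_hash(k) % 8, lambda k: stable_hash(k) % 9, keys)
print(f"consistent hashing moved {ring_moved:.1%} of keys; modulo moved {mod_moved:.1%}")
```

Note what the ring does not do: `shard_for("user_1")` always returns the same shard, so one extremely hot account is still served by one shard unless its key is split or cached.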

Feed Generation: The Push-Pull Hybrid Model

The feed is the heart of Instagram. It must show recent posts from followed users in reverse chronological order (or ranked by engagement). Two classic approaches exist: pull (fan-out on read) and push (fan-out on write). Pull means when a user opens the app, we query all followed users' recent posts and merge. Push means when a user posts, we insert that post into every follower's precomputed feed list.

Pull is efficient for celebrities because you don't push to millions; but it's slow for users following hundreds of accounts because you need many queries. Push is fast for reading but costly for writes, especially for popular users. The solution: use push for regular users (fan-out on write) and pull for users with >1M followers (fan-out on read). This hybrid model balances the trade-offs.

io.thecodeforge.instagram.FeedGenerationService.java (Java)

package io.thecodeforge.instagram;

import java.util.*;
import java.util.concurrent.*;

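public class FeedGenerationService {
    private static final long CELEBRITY_THRESHOLD = 1_000_000;
    private final FollowerService followerService;
    private final FeedCache feedCache;
    private final ExecutorService fanoutExecutor = Executors.newFixedThreadPool(20);

    // FollowerService, FeedCache and Post are assumed collaborators defined elsewhere.
    // The final fields must be assigned in a constructor for the class to compile.
    public FeedGenerationService(FollowerService followerService, FeedCache feedCache) {
        this.followerService = followerService;
        this.feedCache = feedCache;
    }
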
    public void onNewPost(Post post, String userId) {
        long followerCount = followerService.getFollowerCount(userId);
        if (followerCount < CELEBRITY_THRESHOLD) {
            fanoutToAllFollowers(post, userId);
        } else {
            // celebrity: store post for pull-based retrieval
            feedCache.storeCelebrityPost(userId, post);
        }
    }

    private void fanoutToAllFollowers(Post post, String userId) {
        List<String> followers = followerService.getFollowers(userId);
        for (String followerId : followers) {
            fanoutExecutor.submit(() -> feedCache.addToFeed(followerId, post));
        }
    }

    public List<Post> getFeed(String userId, int limit) {
        // merge cached feed with recent posts from followed celebrities
        List<Post> cachedPosts = feedCache.getFeed(userId);
        List<String> followedCelebrities = followerService.getFollowedCelebrities(userId);
        List<Post> celebrityPosts = feedCache.getRecentCelebrityPosts(followedCelebrities);
        List<Post> merged = mergeAndSort(cachedPosts, celebrityPosts);
        return merged.subList(0, Math.min(limit, merged.size()));
    }

    private List<Post> mergeAndSort(List<Post> a, List<Post> b) {
        // merge two sorted (by timestamp descending) lists
        List<Post> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            result.add(a.get(i).getTimestamp() >= b.get(j).getTimestamp() ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) result.add(a.get(i++));
        while (j < b.size()) result.add(b.get(j++));
        return result;
    }
}

Output

Feed generation service uses hybrid push/pull based on follower count.
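On the read path, a user may follow several high-follower accounts, so the merge is k-way rather than two-way. A minimal sketch using the standard library's `heapq.merge` is shown below; the `Post` tuple and `merge_feeds` are hypothetical names, and every input list is assumed to be already sorted newest first.

```python
import heapq
from itertools import islice
from typing import Iterable, List, NamedTuple


class Post(NamedTuple):
    post_id: str
    author_id: str
    timestamp: int


def merge_feeds(precomputed: List[Post], celebrity_lists: Iterable[List[Post]],
                limit: int) -> List[Post]:
    """Lazily merge newest-first lists and stop after `limit` posts."""
    merged = heapq.merge(precomputed, *celebrity_lists,
                         key=lambda post: post.timestamp, reverse=True)
    return list(islice(merged, limit))


pushed = [Post("a3", "friend", 300), Post("a1", "friend", 100)]
pulled = [
    [Post("c2", "star", 250)],
    [Post("d4", "idol", 400), Post("d0", "idol", 50)],
]
print([post.post_id for post in merge_feeds(pushed, pulled, limit=3)])
```

Because the merge is lazy, asking for the first 20 posts does not sort or even touch the tail of each list.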

Mental Model

The Pub/Sub Hybrid Analogy

Think of it like this: a regular user's post is dropped into every follower's mailbox (push); a celebrity's post stays on a shelf at the store, and each follower picks it up the next time they walk in (pull).

Push: the writer pays once per follower, and each reader reads a single precomputed list. Good for small fan-outs.
Pull: the reader gathers from many sources at read time. Good for large fan-outs because the writer isn't burdened.
Hybrid: set a threshold where the cost of push exceeds the benefit of instant delivery.
Threshold should be configurable and adjusted based on system load – you can even make it dynamic.

📊 Production Insight

Push fan-out with 100M followers creates 100M writes per post, and they arrive as a single burst of write traffic.

If your message queue isn't partitioned properly, a single partition gets backed up and all feeds slow down.

Rule: use multiple Kafka partitions and route fan-out tasks by hash of follower ID, not user ID.

Debug it: if one partition lag is high, check if that user's fan-out tasks are all in one partition due to bad key selection.

Performance impact: hybrid reduces write amplification by 99% for celebrities while keeping feed generation under 50ms for normal users.
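The effect of key selection can be seen by counting which partition each fan-out task lands on. This is an illustrative sketch only: `partition_for` is a hypothetical stand-in that uses MD5, whereas Kafka's default partitioner uses its own hash, and the partition and follower counts are made up.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 12


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


followers = [f"follower_{i}" for i in range(60_000)]

# Keyed by the author: every task for this post shares one key, hence one partition.
by_author = Counter(partition_for("celebrity_1") for _ in followers)

# Keyed by the follower: tasks spread across all partitions.
by_follower = Counter(partition_for(follower) for follower in followers)

print(f"keyed by author:   {len(by_author)} partition(s) used")
print(f"keyed by follower: {len(by_follower)} partition(s) used, "
      f"busiest holds {max(by_follower.values())} tasks")
```

Keying by follower also keeps all writes to one follower's feed on one partition, which preserves their relative order.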

🎯 Key Takeaway

Feed generation is the main bottleneck at scale.

Push works for small groups, pull works for massive groups – hybrid is the engineering answer.

Make the threshold configurable: you'll tune it based on observed consumer throughput and latency SLAs.

Storage Strategy: Object Store, CDN, and Cold Archival

Media storage at Instagram's scale requires a tiered approach. The primary storage is an object store (AWS S3, Google Cloud Storage) because they offer near-infinite capacity, strong durability (99.999999999%), and pay-per-GB pricing. However, serving every image directly from S3 would be too slow for users far from the data center and expensive in egress costs. That's where CDNs come in: we push popular media to edge servers worldwide so users download from a nearby node.

For cold data – photos older than 30 days with zero views – we move them to a cheaper archival tier (Amazon S3 Glacier, Google Cloud Archive) and serve a placeholder if accessed. The CDN cache also holds a copy for a shorter TTL (e.g., 7 days for popular content, 1 day for normal). Videos are stored as HLS segments for adaptive bitrate streaming.

io.thecodeforge.instagram.StorageStrategy.java (Java)

package io.thecodeforge.instagram;

import java.time.*;
import java.time.temporal.ChronoUnit; // not covered by java.time.*; needed in main()

public class StorageStrategy {
    enum StorageTier { HOT, COLD, ARCHIVE }

    public StorageTier classifyMedia(String mediaId, Instant lastAccessTime) {
        long daysSinceLastAccess = Duration.between(lastAccessTime, Instant.now()).toDays();
        if (daysSinceLastAccess <= 30) {
            return StorageTier.HOT;
        } else if (daysSinceLastAccess <= 365) {
            return StorageTier.COLD;
        } else {
            return StorageTier.ARCHIVE;
        }
    }

    // Production: this method would invoke S3 lifecycle policies
    public String getMediaUrl(String mediaId, StorageTier tier) {
        switch(tier) {
            case HOT:
                return "https://s3.us-east-1.amazonaws.com/instagram-hot/" + mediaId;
            case COLD:
                return "https://s3.us-west-2.amazonaws.com/instagram-cold/" + mediaId;
            case ARCHIVE:
                // Illustrative URL only: archived objects must be restored before they can be read.
                return "https://glacier.amazonaws.com/instagram-archive/" + mediaId;
            default:
                throw new IllegalArgumentException("Unknown tier");
        }
    }

    public static void main(String[] args) {
        StorageStrategy s = new StorageStrategy();
        StorageTier tier = s.classifyMedia("photo_abc123", Instant.now().minus(60, ChronoUnit.DAYS));
        System.out.println("Media URL: " + s.getMediaUrl("photo_abc123", tier));
    }
}

Output

Media URL: https://s3.us-west-2.amazonaws.com/instagram-cold/photo_abc123

🔥Data Lifecycle Management

Set S3 lifecycle rules to transition objects: after 30 days to Infrequent Access (reduced cost), after 365 days to Glacier (archive). CDN TTLs should be shorter (1-7 days) for freshness; use explicit CDN invalidation requests when content must be removed immediately.
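The two transitions described above can be expressed as a lifecycle configuration document. The sketch below only builds the dictionary; its shape follows what boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`, while the rule ID and the `media/` prefix are assumptions. Note that lifecycle transitions are driven by object age, not last access time; access-based tiering is what S3 Intelligent-Tiering is for.

```python
import json


def build_lifecycle_rules(ia_after_days: int = 30, glacier_after_days: int = 365) -> dict:
    """Build an age-based tiering rule: Standard -> Infrequent Access -> Glacier."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("the Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "tier-media-by-age",          # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": "media/"},     # hypothetical key prefix
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


print(json.dumps(build_lifecycle_rules(), indent=2))
```

Applying it would be a single call along the lines of `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=build_lifecycle_rules())`, left out here so the example runs without credentials.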

📊 Production Insight

Storing all media in one S3 bucket makes it easy to manage but creates a permission nightmare.

Use separate buckets per storage tier (hot, cold, archive) with IAM roles restricting write access to the upload service only.

Rule: never give public read access to S3 – always use CDN signed URLs with expiry.

Performance impact: a CDN miss adds roughly 10-20ms for the extra hop to origin, but the CDN reduces origin load by about 95% and saves about 80% on egress costs.

Key trade-off: faster global delivery costs more for cache invalidation – you can't immediately purge all edges; use versioned URLs (e.g., include upload timestamp) to avoid cache staleness.
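Versioned paths and expiring signatures can be combined in one URL. The sketch below is a generic HMAC scheme meant to show the idea; it is not the signed-URL format of CloudFront or any other CDN, and the path layout, parameter names, host, and secret are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def _sign(path: str, expires_at: int, secret: bytes) -> str:
    message = f"{path}?expires={expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def signed_media_url(base_url: str, media_id: str, version: int,
                     expires_at: int, secret: bytes) -> str:
    """Put the version in the path so a re-upload gets a new cache key."""
    path = f"/media/{media_id}/v{version}.jpg"
    query = urlencode({"expires": expires_at, "sig": _sign(path, expires_at, secret)})
    return f"{base_url}{path}?{query}"


def is_valid(path: str, expires_at: int, signature: str, secret: bytes, now: int) -> bool:
    """Reject expired links and any path or expiry that was tampered with."""
    if now >= expires_at:
        return False
    return hmac.compare_digest(_sign(path, expires_at, secret), signature)


SECRET = b"demo-secret"   # placeholder; a real key would come from a secret store
url = signed_media_url("https://cdn.example.com", "photo_abc123", 3, 2_000_000_000, SECRET)
parsed = urlparse(url)
query = parse_qs(parsed.query)
print(parsed.path)
print(is_valid(parsed.path, int(query["expires"][0]), query["sig"][0], SECRET, now=1_999_999_999))
```

Bumping the version number changes the path, so edges treat the new file as a different object and no purge is needed for the old one to stop being served to new requests.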

🎯 Key Takeaway

S3 for durability, CDN for speed, Glacier for cost.

Never expose S3 direct URLs – always use CDN signed URLs.

Lifecycle policies are cheap automation – set them on day 1, or pay the cost later.

Caching Strategy: Multi-Level Cache for Sub-300ms Feeds

To achieve sub-300ms feed loads, we need a multi-layer cache. The first layer is a CDN for static media (images, video thumbs). The second layer is an in-memory cache (Redis) for precomputed feeds of active users. The third layer is an application-level LRU cache for frequently accessed metadata (user profiles, popular photos).

For feed data: we store the top N recent posts for each user in Redis (capped at 1000 per user). When a user posts, we push to followers' feed caches (for non-celebrities) or store in a 'celebrity post list' in Redis. For membership services (follower counts, likes), we use a separate Redis cluster with eventual consistency. Cache invalidation is handled via version numbers: each user has a feed version; when a new version is available (due to a new post), the client refetches.

io.thecodeforge.instagram.CacheStrategy.java (Java)

package io.thecodeforge.instagram;

import redis.clients.jedis.Jedis;
import java.util.*;

public class CacheStrategy {
    private static final int FEED_CACHE_SIZE = 1000;
    private final Jedis jedis;

    // Post and UserProfile are assumed to be simple domain classes defined elsewhere.
    public CacheStrategy(String redisHost, int redisPort) {
        this.jedis = new Jedis(redisHost, redisPort);
    }

    public void addPostToUserFeed(String userId, Post post) {
        String key = "feed:" + userId;
        // Sorted set with the post timestamp as score
        jedis.zadd(key, post.getTimestamp(), post.getId());
        // Drop the oldest entries so only the newest 1000 remain
        jedis.zremrangeByRank(key, 0, -(FEED_CACHE_SIZE + 1));
    }

    public List<String> getUserFeed(String userId, int start, int count) {
        String key = "feed:" + userId;
        // Newest first; copying into a list works whether the client returns a Set or a List
        return new ArrayList<>(jedis.zrevrange(key, start, start + count - 1));
    }

    public long getFeedSize(String userId) {
        return jedis.zcard("feed:" + userId);
    }

    // Cache-aside pattern for user profiles
    public UserProfile getUserProfile(String userId) {
        String key = "profile:" + userId;
        String cached = jedis.get(key); // Redis returns the serialized form, not an object
        if (cached != null) {
            return UserProfile.deserialize(cached); // assumed helper on UserProfile
        }
        UserProfile profile = loadFromDatabase(userId);
        jedis.setex(key, 300, profile.serialize()); // assumed helper; 5 min TTL
        return profile;
    }

    private UserProfile loadFromDatabase(String userId) {
        // Placeholder: in reality this queries the sharded DB
        return new UserProfile(userId, "user_" + userId, System.currentTimeMillis());
    }

    public static void main(String[] args) {
        // Host and port are separate arguments; "localhost:6379" is not a valid host name
        CacheStrategy cache = new CacheStrategy("localhost", 6379);
        cache.addPostToUserFeed("user123", new Post("photo1", 1000000L));
        System.out.println("Feed: " + cache.getUserFeed("user123", 0, 10));
    }
}

Output

Feed: [photo1]

Mental Model

Cache Layers as Distance from User

Every cache miss moves you one step further from the user, adding 20-50ms per step. Your goal is to minimise steps.

Layer 1 (CDN): Static media – miss penalty = 50-100ms (fetch from origin). Hit ratio > 95%.
Layer 2 (Redis): Feed data, popularity scores – miss penalty = 5-10ms to get from DB. Hit ratio > 80%.
Layer 3 (LRU): User profiles, photo metadata – miss penalty = 1-5ms (local). Hit ratio > 90%.
If you have to go to DB, your response time jumps from microseconds to milliseconds – that's where your SLAs break.

📊 Production Insight

Putting all feed data in Redis sounds great until you estimate the memory cost.

500M active users × 1KB per feed entry × 1000 entries = 500TB of RAM. That's $15M/month just for Redis.

Rule: only cache feeds for active users (logged in last 24h). For inactive, regenerate on login.

Memory optimisation: use Redis with compressed data (snappy) and cap feed entries to 200 per user.

Debug it: watch keyspace_hits and keyspace_misses in the stats section of Redis INFO; if hits / (hits + misses) drops below 90%, you're caching too little or TTLs are too short.

Performance impact: 99th percentile latency drops from 500ms (no cache) to 180ms (multi-level cache).
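The memory estimate is worth redoing for each mitigation. The sketch below repeats the arithmetic in decimal terabytes. The 150M figure (30% of users active) and the 16 bytes for an ID-plus-score entry are assumptions added for illustration, and Redis's own per-entry overhead is ignored.

```python
TB = 10**12


def feed_cache_bytes(users: int, entries_per_user: int, bytes_per_entry: int) -> int:
    """Raw payload size of all cached feeds, ignoring data-structure overhead."""
    return users * entries_per_user * bytes_per_entry


scenarios = {
    "all users, 1000 entries, 1KB each": feed_cache_bytes(500_000_000, 1000, 1000),
    "all users, 200 entries, 1KB each": feed_cache_bytes(500_000_000, 200, 1000),
    "active users only, 200 entries": feed_cache_bytes(150_000_000, 200, 1000),
    "active users, 200 post IDs only": feed_cache_bytes(150_000_000, 200, 16),
}

for label, size in scenarios.items():
    print(f"{label:<36} {size / TB:>8.2f} TB")
```

The last row is the largest lever: caching post IDs and fetching metadata separately shrinks the feed cache by orders of magnitude, at the cost of a second lookup.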

🎯 Key Takeaway

Cache the feed, not the whole world.

Active-only caching reduces memory by 70% without losing performance.

Always have a cache miss plan: if Redis goes down, your feeds should still work (just slower) by falling back to DB reads.

Capacity Estimation: The Math They Actually Ask For, But Nobody Does Right

Everyone waves hands about "millions of users" in interviews. But when a senior engineer asks you how much storage you need per day, or what bandwidth your CDN has to provision, they're not being pedantic — they're testing if you've dealt with real infra limits. Let's do the numbers, because a 50TB storage requirement you didn't anticipate will kill your architecture faster than any cache miss.

Assume 500M daily active users. Average photo size after compression: 200KB. Average video size after transcoding: 2MB for a 30-second clip. Ratio of photos to videos: 80/20. Uploads per user per day: 0.5 (most users aren't power posters). That's 250M uploads: 200M photos (40TB) and 50M videos (100TB) daily. Total: 140TB/day raw storage. Before replication. Before cold archival.

Bandwidth? On read-heavy workloads like feed generation, you serve 10x what you ingest, which is already 1.4PB/day. Add in thumbnail generation (multiple resolutions per upload), and your egress from object store to CDN to user approaches 2PB/day. You do not handle that with a single region or a naive storage strategy. You design for it from day one.

CapacityEstimator.py (Python)

# io.thecodeforge interview tutorial

def estimate_daily_ingest(daily_active_users: int, uploads_per_user: float = 0.5,
                          photo_fraction: float = 0.8, avg_photo_kb: int = 200,
                          video_fraction: float = 0.2, avg_video_mb: int = 2) -> dict:
    total_uploads = int(daily_active_users * uploads_per_user)
    photos = int(total_uploads * photo_fraction)
    videos = total_uploads - photos  # whatever is not a photo; video_fraction is informational

    # Decimal units (1 GB = 1,000 MB = 1,000,000 KB) so the result matches the prose
    photo_storage_gb = (photos * avg_photo_kb) / 1_000_000
    video_storage_gb = (videos * avg_video_mb) / 1_000

    return {
        "total_daily_uploads": total_uploads,
        "total_storage_gb": round(photo_storage_gb + video_storage_gb, 2),
        "estimated_egress_gb": round((photo_storage_gb + video_storage_gb) * 10, 2)
    }

capacity = estimate_daily_ingest(500_000_000)
print(f"Daily storage: {capacity['total_storage_gb']} GB")
print(f"Daily egress: {capacity['estimated_egress_gb']} GB")

Output

Daily storage: 140000.0 GB

Daily egress: 1400000.0 GB

⚠ Production Trap: Underestimating Replication Factor

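New engineers forget two multipliers: retention and replication. A 30-day hot window means you hold 30 days of ingest at once, so every 100TB/day of ingest becomes about 3PB of hot data. At $0.023/GB for hot storage, that is roughly $69,000/month per 100TB/day. Managed object stores such as S3 Standard already include their internal redundancy in that per-GB price, but cross-region replication or a self-hosted store with 3x replication multiplies the bill again. Before CDN costs. Do the math before you build.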
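A rough cost model makes the retention and replication multipliers explicit. The sketch below is illustrative: the function name is hypothetical, the $0.023/GB price is the figure quoted in this article rather than a current price list, and `billed_copies` stands for however many full copies you actually pay for (for example, 2 with one cross-region replica).

```python
def hot_tier_monthly_cost(daily_ingest_tb: float, retention_days: int = 30,
                          price_per_gb_month: float = 0.023,
                          billed_copies: int = 1) -> dict:
    """Steady-state hot storage: retention_days of ingest held at once, per billed copy."""
    stored_tb = daily_ingest_tb * retention_days * billed_copies
    monthly_cost = stored_tb * 1000 * price_per_gb_month   # decimal TB -> GB
    return {"stored_tb": stored_tb, "monthly_cost_usd": round(monthly_cost)}


single_region = hot_tier_monthly_cost(100)
with_replica = hot_tier_monthly_cost(100, billed_copies=2)
print(f"100TB/day, one copy:   {single_region}")
print(f"100TB/day, two copies: {with_replica}")
```

Scaling the first argument to your own daily ingest gives the number to quote in the interview, alongside the assumptions that produced it.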

🎯 Key Takeaway

Always include replication factor and retention policies in capacity estimates — it's the hidden cost that kills budgets.

System Requirements: The Black Box Your Interviewer Actually Opens

Competitor pages list Instagram features like a grocery list: "Post photos, follow users, like posts." Bored. Useless. The interviewer doesn't care if you can list features — they care if you can identify which ones create architectural tension. Here's the real distinction:

Core vs. Edge requirements. Posting photos is core. Creating a 4K video reel and expecting sub-second playback latency? That's a hard edge requirement that forces content-addressable storage and adaptive bitrate streaming. Follow/unfollow is easy until you have 10 million followers on a single celebrity account — then your fan-out model breaks.

Non-functional requirements are where the fight lives. Latency: feed generation must complete under 500ms for 99th percentile. Availability: 99.99% for read paths, but writes can tolerate brief inconsistency. Consistency: eventual consistency is fine for feed — users won't notice a 2-minute lag. But likes on a post? That expects real-time increment visibility. Different tradeoffs.

Capacity estimation isn't a separate section; it's how you validate every other choice. Store photos in S3? Great, what's your upload throughput per second? 100M daily uploads / 86,400 seconds ≈ 1,157 uploads/second on average, and likely 5x that (about 5,800/s) at peak or during a viral event. Your upload API needs to handle roughly 6,000 req/s with graceful degradation. That drives autoscaling policy, connection pooling, and database write throughput calculations.

ThroughputCalculator.py · PYTHON

# io.thecodeforge — interview tutorial

def estimate_write_throughput(daily_uploads: int, peak_multiplier: float = 5.0):
    seconds_per_day = 86400
    avg_throughput = daily_uploads / seconds_per_day
    # Assumption: peak traffic is peak_multiplier (default 5x) times the daily average
    peak_throughput = avg_throughput * peak_multiplier
    return {
        "average_writes_per_second": round(avg_throughput, 2),
        "peak_writes_per_second": round(peak_throughput, 2),
        "peak_vs_average": f"{peak_multiplier * 100:.0f}% of average"
    }

io_requirements = estimate_write_throughput(100_000_000)
print(f"Avg writes/sec: {io_requirements['average_writes_per_second']}")
print(f"Peak writes/sec: {io_requirements['peak_writes_per_second']}")

Output

Avg writes/sec: 1157.41

Peak writes/sec: 5787.04

💡Senior Shortcut: State Your Assumptions Out Loud

Tell the interviewer: 'I'm assuming 50/50 read-write split and 5x peak traffic — if your actual profile differs, I'll adjust the database connection pool and caching strategy accordingly.' It shows you model uncertainty, not just ideal numbers.

🎯 Key Takeaway

Functional requirements are cheap; non-functional requirements (latency, consistency, throughput) are the real system design constraints.

Sharding Keys: The Decision That Makes or Breaks Your Database

Competitors vaguely mention 'database sharding' but never show you the knife fight: choosing the shard key. Instagram's core tables — User, Post, Follow, Like — each require different sharding strategies, and picking wrong means data hotspots that no cache can fix.

For the Post table, don't shard on user_id. Why? A celebrity posts once and gets 10M likes in an hour. If everything that belongs to that user lives on one shard, that's 10M writes landing on a single shard. You've just created a burning hot partition. Instead, shard on post_id (or a hash of post_id). That spreads writes more evenly because each post is placed independently of its author, so one account's traffic is no longer pinned to one shard. Reads? Feed generation hits many post_ids anyway, so scatter-gather is unavoidable, but you parallelize that across shards.

For the Follow table? User_id shard works — a user's follow list is read-heavy and small (<10K follows, typically). Hot shard problem exists for celebrities (10M followers), but the read pattern is one user reading their follow graph, not 10M writes. Use a separate adjacency list in a cache layer like Redis for celebrity follow graph lookups. Don't torture your primary database.

For Likes, shard on (post_id, user_id) composite key. The goal is to evenly distribute like events across shards while supporting fast UNIQUE checks to prevent duplicate likes. Composite keys pre-split the write load. If you use auto-increment IDs, wrap them with a shard ID prefix — standard Ticket Server pattern.
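The "shard ID prefix" idea in the paragraph above can be sketched by packing the shard number into the high bits of a 64-bit ID. This is an illustration only: the bit widths (16 for the shard, 47 for the per-shard auto-increment value) and the function names are assumptions, not a documented Instagram layout.

```python
SHARD_BITS = 16      # assumed width of the shard prefix
COUNTER_BITS = 47    # assumed width of the per-shard auto-increment value

def make_global_id(shard_id: int, local_id: int) -> int:
    """Prefix a shard-local auto-increment ID with its shard number."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    if not 0 <= local_id < (1 << COUNTER_BITS):
        raise ValueError("local_id out of range")
    # 16 + 47 = 63 bits, so the result fits a signed 64-bit column
    return (shard_id << COUNTER_BITS) | local_id

def split_global_id(global_id: int) -> tuple[int, int]:
    """Recover (shard_id, local_id) without any lookup table."""
    return global_id >> COUNTER_BITS, global_id & ((1 << COUNTER_BITS) - 1)

like_id = make_global_id(shard_id=12, local_id=981_234)
print(split_global_id(like_id))  # (12, 981234)
```

Because the shard is recoverable from the ID itself, a router can send a lookup to the right shard without consulting a directory service.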

ShardKeyDesigner.py · PYTHON

# io.thecodeforge — interview tutorial

import hashlib

class ShardRouter:
    def __init__(self, num_shards: int = 64):
        self.num_shards = num_shards

    def _hash_to_shard(self, key: str) -> int:
        # Hash-then-modulo placement: simple, but NOT consistent hashing
        # (changing num_shards remaps most keys)
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % self.num_shards

    def post_shard(self, post_id: str) -> int:
        return self._hash_to_shard(post_id)

    def like_shard(self, post_id: str, user_id: str) -> int:
        # Composite key for even distribution
        return self._hash_to_shard(f"{post_id}:{user_id}")

router = ShardRouter(num_shards=64)
print(f"Post 'abc123' -> shard {router.post_shard('abc123')}")
print(f"Like (post='abc123', user='user_99') -> shard {router.like_shard('abc123', 'user_99')}")

Output

Post 'abc123' -> shard 3

Like (post='abc123', user='user_99') -> shard <0-63, fixed by the MD5 digest of 'abc123:user_99'>

🔥Production Trap: Don't Use MOD for Shard Keys

MOD-based sharding (hash(post_id) % num_shards) forces a massive data migration when you add shards. Growing from 8 to 64 shards, for example, moves roughly 7 out of every 8 keys to a different shard. Use consistent hashing (a hash ring) from day one, even if you only have 4 shards. The code is small; the cost of changing later is catastrophic.
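To make the trap concrete, here is a minimal hash ring sketch compared against modulo placement. The class name, the 100 virtual nodes per shard, and the `shard-N` labels are assumptions for illustration; production rings usually add replication and weighting on top of this.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent hash ring with virtual nodes (a sketch)."""

    def __init__(self, shards, vnodes=100):
        ring = []
        for shard in shards:
            for v in range(vnodes):
                # Each shard owns many points on the ring to even out load
                ring.append((self._point(f"{shard}#{v}"), shard))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [shard for _, shard in ring]

    @staticmethod
    def _point(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end
        idx = bisect.bisect_right(self._points, self._point(key)) % len(self._points)
        return self._owners[idx]

keys = [f"post_{i}" for i in range(10_000)]
before = HashRing([f"shard-{i}" for i in range(8)])
after = HashRing([f"shard-{i}" for i in range(9)])

ring_moved = sum(before.shard_for(k) != after.shard_for(k) for k in keys)
mod_moved = sum(HashRing._point(k) % 8 != HashRing._point(k) % 9 for k in keys)
print(f"ring moved {ring_moved} keys, modulo moved {mod_moved} keys")
```

Adding a ninth shard to the ring only moves keys onto the new shard, while modulo placement reassigns most of the keyspace.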

🎯 Key Takeaway

Shard keys must be chosen per query pattern — not a one-size-fits-all. Post by post_id, likes by composite key, follows by user_id with separate cache for celebrity hot keys.

Interactions: The Hot Path You Will Mistake for a Side-Show

Interviews focus on feed reads but the hot path is writes from interactions. Like, comment, share, save — these hit the write-heavy chunk of your architecture every time a user breathes. You get 10x more interaction writes than uploads. And they need immediate consistency because no one wants to see their like disappear after a refresh.

Your read-replica strategy breaks here. Interaction writes must hit the primary shard containing the post owner's data. If you shard by user_id, the post's interaction traffic fans out across shards — every 'like' becomes a cross-shard transaction waiting to fail. Instead, shard interactions by post_id, colocating the post's metadata and interaction counters on the same node. This turns a multi-shard nightmare into a local increment.

The trap is treating interactions as low-priority. They generate the feed's engagement signals, trigger real-time notifications, and feed the ranking algorithm. If your design can't handle 500 concurrent likes on a single post, your Instagram clone dies on the first viral photo.

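interaction_handler.py · PYTHON

# io.thecodeforge — interview tutorial

class InteractionService:
    def __init__(self, shard_router, queue, clock):
        # Injected collaborators; their interfaces here are illustrative,
        # not a real client API
        self.shard_for_post = shard_router.shard_for_post
        self.queue = queue
        self.now = clock.now

    def like_post(self, user_id: str, post_id: str):
        # shard by post_id to keep metadata colocated
        post_shard = self.shard_for_post(post_id)

        # single-shard transaction on the primary, no cross-shard coordination
        with post_shard.transaction():
            post_shard.increment('like_count', post_id)
            # a unique (post_id, user_id) key makes a duplicate like fail here,
            # which rolls back the increment above
            post_shard.insert('like', {
                'user_id': user_id,
                'post_id': post_id,
                'timestamp': self.now()
            })

        # fire async to feed ranking
        self.queue.send('interaction_rank', {
            'post_id': post_id,
            'type': 'like'
        })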

Output (conceptual trace; the method itself prints nothing)

like_count incremented for post_abc123

like record inserted

interaction_rank event enqueued

⚠ Sharding Trap:

Sharding interactions by user_id creates a cross-shard write storm on viral posts. Always shard by post_id so every like becomes a local single-shard operation.
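Sharding by post_id keeps a like on one shard, but a viral post still funnels every increment into one counter row. One common mitigation is to split the counter into several slots and sum them on read. The sketch below uses an in-memory dict as a stand-in for rows on the post's shard; the class name and the 16-slot default are assumptions.

```python
import random

class ShardedCounter:
    """Spreads increments for one post across several slots (a sketch)."""

    def __init__(self, num_slots: int = 16):
        self.num_slots = num_slots
        # (post_id, slot) -> count; stands in for rows or keys on the post's shard
        self._slots = {}

    def increment(self, post_id: str, rng=random) -> None:
        # Pick a random slot so concurrent writers rarely touch the same row
        key = (post_id, rng.randrange(self.num_slots))
        self._slots[key] = self._slots.get(key, 0) + 1

    def total(self, post_id: str) -> int:
        # Reads pay a small fan-in cost: one lookup per slot
        return sum(self._slots.get((post_id, slot), 0) for slot in range(self.num_slots))

counter = ShardedCounter()
for _ in range(500):
    counter.increment("post_abc123")
print(counter.total("post_abc123"))  # 500
```

The trade-off is explicit: writes contend less, and reads sum a fixed number of slots.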

🎯 Key Takeaway

Interactions are the write-heavy hot path. Shard by post_id, not user_id, to avoid cross-shard atomic increment failures.

User_Follows: The Graph Problem Your NoSQL Can't Solve

Your SQL fanboy starts talking about a join table. Wrong. Instagram has 500M+ daily active users. A user follows 500 accounts. That's 250B edges. Joins at that scale are a death sentence. You need a directed graph stored as a lookup table in a document store or a specialized graph database like Amazon Neptune or Neo4j.

But here's the real problem: when a post goes viral, your feed generation needs to check 'does user A follow user B?' This check happens per post, per feed request, for every user scrolling. A simple key-value check — user_id -> set of followee_ids — handles this at sub-millisecond latency. Store that in Redis as a sorted set. Use the follow timestamp as the score so the feed generator can paginate by recency without a costly join.

The follow/unfollow operation is another trap. Stale entries in the cache cause phantom content in the feed. Use cache invalidation (the cache-aside pattern): on unfollow, delete the cached following set for that user, then lazy-load it on the next feed request. This keeps the cache consistent without cascading invalidations across nodes.

follow_graph.py · PYTHON

# io.thecodeforge — interview tutorial

class FollowGraph:
    def __init__(self, redis, db):
        self.cache = redis
        self.store = db

    def get_following(self, user_id: str) -> set:
        key = f"following:{user_id}"
        # check cache first (a plain Redis set here; switch to a sorted set
        # if you need follow timestamps)
        cached = self.cache.smembers(key)
        if cached:
            return cached

        # lazy load from DB on cache miss
        following = set(self.store.query(
            "SELECT followee_id FROM follows WHERE follower_id = ?", user_id
        ))
        if following:
            # SADD rejects an empty argument list, so only cache non-empty sets
            self.cache.sadd(key, *following)
            self.cache.expire(key, 3600)
        return following

    def unfollow(self, follower: str, followee: str):
        self.store.execute(
            "DELETE FROM follows WHERE follower_id = ? AND followee_id = ?",
            follower, followee
        )
        # cache-aside invalidation: drop the cached set, reload lazily on next read
        self.cache.delete(f"following:{follower}")

Output (conceptual trace; the methods themselves print nothing)

following set cache-missed and loaded from DB

unfollow executed: follower_1234 no longer follows followee_5678

cache cleared for next request

💡Senior Shortcut:

Store follow relationships as sorted sets in Redis with follow timestamp as the score. This lets you paginate the feed by recency without querying the database for every scroll.

🎯 Key Takeaway

The follows graph is a lookup problem, not a join problem. Use Redis sorted sets for sub-millisecond 'does user A follow user B?' checks.
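The sorted-set idea can be sketched without a Redis server. The class below is an in-memory stand-in, with one mapping per user where the member is the followee and the score is the follow timestamp. In Redis the same operations would map to ZADD, ZSCORE, and ZREVRANGE; the class and method names here are hypothetical.

```python
class FollowIndex:
    """In-memory stand-in for one sorted set per user (a sketch)."""

    def __init__(self):
        # follower_id -> {followee_id: followed_at}
        self._following = {}

    def follow(self, follower: str, followee: str, followed_at: int) -> None:
        self._following.setdefault(follower, {})[followee] = followed_at

    def unfollow(self, follower: str, followee: str) -> None:
        self._following.get(follower, {}).pop(followee, None)

    def is_following(self, follower: str, followee: str) -> bool:
        # Constant-time membership check, the hot path during feed generation
        return followee in self._following.get(follower, {})

    def recent(self, follower: str, offset: int = 0, limit: int = 2) -> list:
        # Newest follows first; offset and limit paginate by recency
        ranked = sorted(self._following.get(follower, {}).items(),
                        key=lambda item: item[1], reverse=True)
        return [followee for followee, _ in ranked[offset:offset + limit]]

index = FollowIndex()
index.follow("user_a", "user_b", followed_at=100)
index.follow("user_a", "user_c", followed_at=300)
index.follow("user_a", "user_d", followed_at=200)
index.unfollow("user_a", "user_b")

print(index.is_following("user_a", "user_b"))  # False
print(index.recent("user_a"))                  # ['user_c', 'user_d']
```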

High-Level Design (HLD): The Diagram That Defines Your Interview

Your HLD is not a network topology. It is a story about data flow under constraints. Start with the read vs. write asymmetry: 100M uploads/day vs. 5B feed refreshes/day, roughly 50 reads for every write. Every component choice follows from that ratio.

Use a single API gateway for authentication, rate limiting, and request routing. Separate upload and feed read paths at the gateway level: uploads hit a dedicated ingestion service, feeds hit a read-optimized service. The fan-out service writes to each follower's feed cache (push) while a lazy loader fills gaps for inactive users (pull). Your HLD must show the reverse index for comments and likes as separate logical stores, even if physically shared.

The interviewer wants to see you encode trade-offs, like why you pick Kafka over RabbitMQ for the async fan-out (throughput, replayability) or why you split the feed read service from the user service (independent scaling, failure isolation). A strong HLD for Instagram is a decision tree, not a checklist.

HLD_Decisions.py · PYTHON

# io.thecodeforge — interview tutorial

# HLD decision rules translated to code logic

class FeedHLD:
    def __init__(self):
        self.read_write_ratio = 5_000_000_000 / 100_000_000  # 50:1
        self.use_kafka = True  # async fan-out needs replay

    def choose_gateway_split(self):
        return {"upload": "ingestion_service", "feed": "read_service"}

    def scaling_rules(self):
        # only scale read service horizontally
        return "feed_read instances >> upload instances"

Output

{'upload': 'ingestion_service', 'feed': 'read_service'}

⚠ Production Trap:

Putting feed generation and upload processing behind the same service causes cascading failures. A sudden upload spike (e.g., Super Bowl) stalls feed reads for all users.

🎯 Key Takeaway

Your HLD is a set of forced trade-offs driven by the read-write ratio.
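The failure-isolation argument can be illustrated with a bulkhead: each path gets its own concurrency budget, so an upload spike exhausts only the upload pool. This is a minimal single-threaded sketch; the path prefixes, pool names, and limits are hypothetical, and a real gateway would enforce this with thread-safe counters or separate deployments.

```python
ROUTES = {"/upload": "ingestion_service", "/feed": "read_service"}  # hypothetical prefixes

def route(path: str):
    """Map a request path to the pool that should serve it."""
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return pool
    return None

class Bulkhead:
    """Per-pool concurrency limits so one path cannot starve the other."""

    def __init__(self, limits: dict):
        self._limits = dict(limits)
        self._in_flight = {pool: 0 for pool in limits}

    def try_acquire(self, pool: str) -> bool:
        if self._in_flight[pool] >= self._limits[pool]:
            return False  # shed load for this pool only
        self._in_flight[pool] += 1
        return True

    def release(self, pool: str) -> None:
        self._in_flight[pool] = max(0, self._in_flight[pool] - 1)

bulkhead = Bulkhead({"ingestion_service": 2, "read_service": 100})
upload_results = [bulkhead.try_acquire(route("/upload/photo")) for _ in range(3)]
feed_ok = bulkhead.try_acquire(route("/feed/home"))
print(upload_results, feed_ok)  # [True, True, False] True
```

The third upload is rejected, yet the feed request still gets through, which is the behaviour the trap above asks for.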

Low-Level Design (LLD): The Feed Cache That Must Survive Queries per Second

Feed generation at scale fails in data structures, not algorithms. Your LLD must specify the exact Redis data types for the feed cache: one Redis sorted set per user (key feed:{user_id}) where each member is a post_id and the score is the creation timestamp. Each post is stored separately as a serialized object with media_id, user_id, and like_count (not the full image URL; that's in the CDN).

The fan-out worker writes to these sets in batch: for each new post, it inserts into the top 10,000 followers' sorted sets (push), then signals a lazy loader thread for the rest. The lazy loader uses a Bloom filter to check whether a post was already pushed to a user's cache before falling back to the follower's SQL index. Eviction policy is TTL-based (72 hours for active users, 24 for dormant). Cache misses warm from a denormalized Postgres table with an index on (user_id, created_at DESC).

Your LLD must also handle likes and comments as counter shards: each post's interaction count lives in a Redis hash updated asynchronously via Kafka streams, avoiding write amplification on the feed read path.

FeedCacheLLD.py · PYTHON

# io.thecodeforge — interview tutorial

import redis

r = redis.Redis(decode_responses=True)

def push_post_to_feed(follower_id, post_id, timestamp):
    # sorted set: key = feed:{user_id}, score = timestamp
    r.zadd(f"feed:{follower_id}", {post_id: timestamp})

def get_feed(user_id, start, end):
    # reverse range, newest first
    return r.zrevrange(f"feed:{user_id}", start, end, withscores=True)

# Bloom filter check before lazy load
# store follower count per post to decide push vs pull

Output

[('post_abc', 1701274000.0), ('post_xyz', 1701273900.0)]

⚠ Production Trap:

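Storing user_follows as a list in Redis forces an O(n) walk for every membership check or ranking query. Store it as a sorted set instead; if the fan-out worker needs to pick an author's top followers, use follower count as the score.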

🎯 Key Takeaway

At LLD, every cache miss equals a relational query you cannot afford.
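The Bloom filter check mentioned in this section can be sketched in a few lines. This is a minimal sketch, assuming a key format of `user_id:post_id` and arbitrary sizing (8,192 bits, 4 hash functions); a production filter would be sized from the expected item count and target false-positive rate.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self._bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self._bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

pushed = BloomFilter()
pushed.add("user_1:post_abc")  # hypothetical "follower:post" key

print(pushed.might_contain("user_1:post_abc"))  # True: already pushed, skip the pull
print(pushed.might_contain("user_2:post_abc"))
```

A "no" from the filter is definitive, so the lazy loader only pays for the SQL fallback when the filter says the post may be missing.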

Feed Generation: Fanout-on-Write vs Fanout-on-Read Hybrid

In designing Instagram's feed, the core challenge is generating a personalized feed for each user from millions of follow relationships. Two primary strategies exist: fanout-on-write (push) and fanout-on-read (pull). Fanout-on-write pre-computes feeds by pushing each new post to all followers' inboxes at write time. This yields fast reads (a single cache lookup) but is write-heavy, especially for celebrities with millions of followers. Fanout-on-read generates feeds on demand by pulling recent posts from followed users at read time, which saves write costs but increases read latency and database load.

The design here uses a hybrid approach: for most users (those with fewer than ~100,000 followers in this section's example; the cutoff is a tuning parameter), fanout-on-write is used to ensure low-latency feed reads. For celebrities or high-follower accounts, fanout-on-read is employed to avoid overwhelming the system with write operations. This hybrid model balances write amplification and read performance. For example, when a celebrity posts, the system does not push to all followers immediately; instead, followers fetch the post when they refresh their feed, and it is merged into their precomputed feed from a cached list of recent celebrity posts.

The hybrid approach also leverages a 'pull' cache for high-follower users, storing their recent posts in a distributed cache (e.g., Redis) to reduce database load. This design is critical for scaling to hundreds of millions of daily active users, as it prevents the write bottleneck while maintaining sub-300ms feed load times.

feed_generation.py · PYTHON

class FeedGenerator:
    def __init__(self, follow_graph, cache, db):
        self.follow_graph = follow_graph
        self.cache = cache
        self.db = db
        self.FANOUT_THRESHOLD = 100000

    def on_post_created(self, post, author_id):
        follower_count = self.follow_graph.get_follower_count(author_id)
        if follower_count <= self.FANOUT_THRESHOLD:
            self._fanout_on_write(post, author_id)
        else:
            self._fanout_on_read(post, author_id)

    def _fanout_on_write(self, post, author_id):
        followers = self.follow_graph.get_followers(author_id)
        pipeline = self.cache.pipeline()
        for follower_id in followers:
            pipeline.lpush(f'feed:{follower_id}', post['id'])
            pipeline.ltrim(f'feed:{follower_id}', 0, 500)
        pipeline.execute()

    def _fanout_on_read(self, post, author_id):
        # Store post in a celebrity timeline cache
        self.cache.lpush(f'celebrity_timeline:{author_id}', post['id'])
        self.cache.ltrim(f'celebrity_timeline:{author_id}', 0, 1000)

    def get_feed(self, user_id):
        # Pushed entries from accounts below the fan-out threshold
        feed_ids = list(self.cache.lrange(f'feed:{user_id}', 0, 100))
        # Always merge in celebrity posts: they are never pushed, so a
        # non-empty pushed feed must not skip this step
        celebrities = self.follow_graph.get_followed_celebrities(user_id)
        for celeb_id in celebrities:
            celeb_posts = self.cache.lrange(f'celebrity_timeline:{celeb_id}', 0, 10)
            feed_ids.extend(celeb_posts)
        # Assumes post IDs are time-sortable, so sorting IDs orders by recency
        feed_ids = sorted(set(feed_ids), reverse=True)[:100]
        return self.db.get_posts_by_ids(feed_ids)

💡Hybrid Threshold Tuning

📊 Production Insight

At scale, the hybrid approach reduces write amplification by 90% for celebrity posts while keeping feed generation under 300ms for 99% of users.

🎯 Key Takeaway

Instagram uses a hybrid push-pull model: fanout-on-write for users with fewer than 100k followers, fanout-on-read for celebrities, balancing write amplification and read latency.
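The read-time merge in the hybrid model can be sketched with `heapq.merge`, which combines already-sorted timelines without re-sorting everything. This is a minimal sketch, assuming each timeline is a list of `(timestamp, post_id)` tuples sorted newest first; the function name and sample data are hypothetical.

```python
import heapq

def merge_feed(pushed, celebrity_timelines, limit=100):
    """Merge a precomputed feed with pulled celebrity timelines, newest first.

    Every input must already be sorted by timestamp in descending order.
    """
    merged = heapq.merge(pushed, *celebrity_timelines,
                         key=lambda entry: entry[0], reverse=True)
    seen, page = set(), []
    for _, post_id in merged:
        if post_id in seen:
            continue  # the same post can appear in more than one source
        seen.add(post_id)
        page.append(post_id)
        if len(page) == limit:
            break
    return page

pushed_feed = [(105, "p5"), (101, "p1")]
celebrity_feeds = [[(110, "c9"), (100, "c1")], [(103, "d3")]]

print(merge_feed(pushed_feed, celebrity_feeds))           # ['c9', 'p5', 'd3', 'p1', 'c1']
print(merge_feed(pushed_feed, celebrity_feeds, limit=3))  # ['c9', 'p5', 'd3']
```

Because the merge is lazy, a page of 100 posts only touches the newest entries of each timeline instead of loading every celebrity's full history.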

Instagram Reels: Video Processing Pipeline and CDN Strategy

Instagram Reels are short-form videos (up to 90 seconds) that require efficient processing and global delivery. The video processing pipeline involves several stages: ingestion, transcoding, thumbnail generation, and quality assessment. When a user uploads a Reel, the video is first stored in a temporary object store (e.g., S3). A message is published to a task queue (e.g., Kafka) for asynchronous processing. The transcoding service converts the video into multiple resolutions (e.g., 360p, 720p, 1080p) and codecs (H.264, H.265) to support various devices and network conditions. Thumbnails are generated at key moments (e.g., first frame, mid-point) for previews. Quality assessment checks for blur, audio sync, and policy compliance.

After processing, the final renditions are stored in object storage and served through a CDN (e.g., Akamai, CloudFront) with edge caching. The CDN strategy uses geographic routing to serve videos from the nearest edge node, reducing latency. For popular Reels, content is pre-cached at edge nodes based on trending algorithms. Adaptive bitrate streaming (ABR) is implemented using HLS or MPEG-DASH, allowing the client to switch resolutions dynamically based on network bandwidth. For example, a user on a slow connection receives 360p, while a user on fast Wi-Fi gets 1080p.

The pipeline also includes a 'hot' cache layer (e.g., Redis) for metadata like view counts and likes. To handle viral spikes, the system uses a 'thundering herd' protection mechanism: if a Reel goes viral, the CDN origin shield collapses duplicate edge misses into a single origin fetch and caches the content aggressively. This design targets sub-2-second startup time for Reels globally.

video_pipeline.py · PYTHON

class VideoPipeline:
    def __init__(self, storage, queue, transcoder, cdn):
        self.storage = storage
        self.queue = queue
        self.transcoder = transcoder
        self.cdn = cdn

    def process_upload(self, video_id, user_id):
        # Step 1: Store raw video
        raw_url = self.storage.upload_temp(video_id)
        # Step 2: Enqueue processing task
        self.queue.publish('video_processing', {
            'video_id': video_id,
            'raw_url': raw_url,
            'user_id': user_id
        })

    def transcode_video(self, video_id, raw_url):
        # Transcode to multiple resolutions
        resolutions = [360, 720, 1080]
        urls = {}
        for res in resolutions:
            output_url = self.transcoder.transcode(raw_url, res)
            urls[res] = output_url
        # Generate thumbnails
        thumbnails = self.transcoder.generate_thumbnails(raw_url, [0, 0.5, 1.0])
        # Store manifest
        manifest = {
            'video_id': video_id,
            'urls': urls,
            'thumbnails': thumbnails,
            'status': 'ready'
        }
        self.storage.save_manifest(video_id, manifest)
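        # Pre-cache popular content (a brand-new upload has few views, so this
        # check is more useful when re-run after the Reel gains traction)
        if self._is_trending(video_id):
            self.cdn.pre_cache(list(urls.values()))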
        return manifest

    def _is_trending(self, video_id):
        # Simplified trending detection
        return self.storage.get_view_count(video_id) > 10000

🔥CDN Origin Shield

📊 Production Insight

Pre-caching trending Reels at CDN edges reduces origin load by 60% and improves startup time by 40% during viral events.

🎯 Key Takeaway

Reels processing uses a multi-resolution transcoding pipeline with CDN edge caching and adaptive bitrate streaming to deliver videos quickly worldwide.
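Adaptive bitrate streaming has two halves: the server publishes a list of renditions, and the client picks one that fits its bandwidth. The sketch below builds an HLS master playlist and shows a simple selection rule. The bitrates, the path layout, and the 0.8 safety factor are illustrative assumptions, not measured values.

```python
# (height, width, bandwidth in bits per second): illustrative values only
RENDITIONS = [
    (360, 640, 800_000),
    (720, 1280, 2_800_000),
    (1080, 1920, 5_000_000),
]

def build_master_playlist(video_id: str, renditions=RENDITIONS) -> str:
    """Build an HLS master playlist that points at one playlist per rendition."""
    lines = ["#EXTM3U"]
    for height, width, bandwidth in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}")
        lines.append(f"{video_id}/{height}p/index.m3u8")  # hypothetical path layout
    return "\n".join(lines) + "\n"

def pick_rendition(measured_bps: float, renditions=RENDITIONS, safety: float = 0.8) -> int:
    """Pick the highest rendition that fits within a safety margin of bandwidth."""
    usable = measured_bps * safety
    fitting = [r for r in renditions if r[2] <= usable]
    if not fitting:
        # Nothing fits, so fall back to the lowest bitrate rather than stall
        return min(renditions, key=lambda r: r[2])[0]
    return max(fitting, key=lambda r: r[2])[0]

print(build_master_playlist("reel_123"))
print(pick_rendition(1_500_000))   # 360
print(pick_rendition(10_000_000))  # 1080
```

Real players make this choice continuously per segment; the single call here only shows the decision rule.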

Stories and Ephemeral Content: Storage and Expiration Design

Instagram Stories are ephemeral content that disappears after 24 hours. The design must handle high write throughput (millions of uploads per day) and efficient expiration. The storage strategy combines an object store (e.g., S3) for media files with a wide-column store (e.g., Cassandra) for metadata. Metadata includes user_id, story_id, timestamp, expiration_time, and view counts.

To enforce expiration, a TTL (time-to-live) mechanism is used. In Cassandra, the TTL is set on the write (USING TTL), so the inserted cells expire automatically after 24 hours. For S3, lifecycle rules can expire objects, but they work in whole days and run asynchronously, so they are a cleanup backstop rather than a precise 24-hour cutoff. To avoid scanning all objects, the system uses a 'bucket per day' approach, implemented as a date prefix inside one bucket: each story is stored under a key like stories/2025-03-20/{user_id}/{story_id}.jpg, and a whole day's prefix is deleted once every object in it is more than 24 hours old. This reduces cleanup overhead.

For active users, stories are cached in Redis with a TTL of 24 hours to serve the 'Stories tray' quickly. When a user views a story, the system updates view counts asynchronously using a message queue to avoid write amplification. Expired stories leave the cache when their Redis TTL fires; keyspace notifications or periodic scans can trigger any follow-up cleanup. For example, when a story expires, the cache key story:{story_id} is evicted, and the S3 cleanup removes the object later. Because object deletion lags behind, the read path should check expiration_time rather than rely on the object being gone. This design ensures that expired content is not served and storage costs are minimized. The system also handles 'highlighted' stories (permanent) by moving them to a separate prefix with no expiration.

story_expiration.py · PYTHON

import time
import uuid
from datetime import datetime, timedelta, timezone

class StoryManager:
    BUCKET = 'stories'  # hypothetical bucket name; days are key prefixes inside it

    def __init__(self, s3, cassandra, redis):
        self.s3 = s3
        self.cassandra = cassandra
        self.redis = redis
        self.TTL_SECONDS = 24 * 3600

    def upload_story(self, user_id, media_bytes):
        story_id = uuid.uuid4().hex
        timestamp = int(time.time())
        # Store under a daily prefix ("bucket per day" is a key prefix, not a real bucket)
        day = datetime.fromtimestamp(timestamp, timezone.utc).strftime("%Y-%m-%d")
        key = f'{day}/{user_id}/{story_id}.jpg'
        self.s3.put_object(Bucket=self.BUCKET, Key=key, Body=media_bytes)
        # Store metadata in Cassandra with TTL
        self.cassandra.execute(
            "INSERT INTO stories (user_id, story_id, timestamp, expiration) VALUES (%s, %s, %s, %s) USING TTL %s",
            (user_id, story_id, timestamp, timestamp + self.TTL_SECONDS, self.TTL_SECONDS)
        )
        # Cache in Redis with the same 24-hour TTL
        self.redis.setex(f'story:{story_id}', self.TTL_SECONDS, media_bytes)
        return story_id

    def get_story(self, story_id):
        # Try cache first
        cached = self.redis.get(f'story:{story_id}')
        if cached:
            return cached
        # A full version would fall back to S3 after checking expiration in
        # Cassandra; this sketch treats a cache miss as expired
        return None

    def expire_old_stories(self):
        # Delete the prefix from two days ago: yesterday's prefix can still
        # hold stories that are less than 24 hours old
        old_day = datetime.now(timezone.utc) - timedelta(days=2)
        prefix = f'{old_day.strftime("%Y-%m-%d")}/'
        # S3 has no "delete prefix" call, so list the keys and delete in batches
        paginator = self.s3.get_paginator('list_objects_v2')
        for page in paginator.paginate(Bucket=self.BUCKET, Prefix=prefix):
            keys = [{'Key': obj['Key']} for obj in page.get('Contents', [])]
            if keys:
                self.s3.delete_objects(Bucket=self.BUCKET, Delete={'Objects': keys})
        # Cassandra TTL handles automatic deletion of metadata

⚠ TTL Accuracy

📊 Production Insight

Bucket-per-day strategy reduces S3 listing costs by 99% compared to per-object expiration, and Cassandra TTL avoids background cleanup jobs.

🎯 Key Takeaway

Instagram Stories use daily S3 prefixes ('bucket per day') with scheduled cleanup and Cassandra TTLs for automatic 24-hour expiration, with Redis caching for fast access.
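The timing rules behind daily cleanup are easy to get wrong, so here is a minimal sketch of the two checks involved: whether a story is still visible, and which day groups are safe to delete. The function names and the `YYYY-MM-DD` day labels are assumptions; all times are UTC.

```python
from datetime import datetime, timedelta, timezone

STORY_TTL = timedelta(hours=24)

def is_visible(posted_at: datetime, now: datetime) -> bool:
    """A story is served only while it is younger than 24 hours."""
    return now - posted_at < STORY_TTL

def days_safe_to_delete(existing_days, now: datetime) -> list:
    """Return day labels (YYYY-MM-DD) whose newest possible story has expired."""
    safe = []
    for day in existing_days:
        day_start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        day_end = day_start + timedelta(days=1)
        # The latest story in this group was posted just before day_end
        if now - day_end >= STORY_TTL:
            safe.append(day)
    return safe

now = datetime(2025, 3, 22, 0, 30, tzinfo=timezone.utc)
days = ["2025-03-19", "2025-03-20", "2025-03-21", "2025-03-22"]

print(days_safe_to_delete(days, now))                 # ['2025-03-19', '2025-03-20']
print(is_visible(now - timedelta(hours=23), now))     # True
```

Note that the group for 2025-03-21 is not safe at 00:30 on 2025-03-22: a story posted late on the 21st is still well inside its 24-hour window.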

● Production incident · POST-MORTEM · severity: high

Celebrity Post Causes Global Feed Delay

Symptom

Feed updates stalled globally for 2 hours. API latency spiked from 200ms to 20s. Message queues (Kafka) backed up with millions of unprocessed fan-out tasks. OOM errors on feed workers.

Assumption

The push-based fan-out model could handle any user because the system was horizontally scalable. The threshold for 'celebrity' handling was not defined.

Root cause

The system applied the same fan-out behaviour to all users. A celebrity with 100M followers triggered 100M entries in a single Kafka partition, overwhelming the consumer group's ability to process before new messages arrived.

Fix

Introduced a hybrid feed generation model: users with follower count > 10M are treated as 'celebrities'. Their posts are stored in a hot table and pulled into follower feeds only when that follower requests their feed. The threshold is configurable via a dynamic feature flag.

Key lesson

Profile user follower distributions regularly – the tail (celebrities) is where your system breaks.
Always have a safety threshold for push vs pull – don't treat all writes equally.
Monitor per-partition lag in Kafka (or equivalent) – it's the first signal that you're overwhelming a single consumer.
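The third lesson can be turned into a simple check: compute lag per partition and flag any partition that is far behind its peers. This is a minimal sketch over plain dictionaries; in practice the offsets would come from your Kafka client or monitoring system, and the `factor` and `min_lag` thresholds are assumptions to tune.

```python
def partition_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Lag per partition = latest offset minus the consumer group's committed offset."""
    return {p: end - committed_offsets.get(p, 0) for p, end in end_offsets.items()}

def hot_partitions(lag_by_partition: dict, factor: float = 5.0, min_lag: int = 1000) -> list:
    """Flag partitions whose lag is far above the median of the group."""
    lags = sorted(lag_by_partition.values())
    median = lags[len(lags) // 2]
    return sorted(p for p, lag in lag_by_partition.items()
                  if lag >= min_lag and lag > factor * max(median, 1))

# Hypothetical snapshot: partition 2 is absorbing a celebrity fan-out
end = {0: 10_000, 1: 10_500, 2: 5_000_000, 3: 9_800}
committed = {0: 9_900, 1: 10_400, 2: 1_000_000, 3: 9_750}

lag = partition_lag(end, committed)
print(lag)                  # {0: 100, 1: 100, 2: 4000000, 3: 50}
print(hot_partitions(lag))  # [2]
```

Comparing against the median rather than a fixed number keeps the alert meaningful when overall traffic rises evenly across partitions.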

Production debug guide · Symptom → Immediate Action → Root Cause Analysis · 4 entries

Symptom · 01

Feed loads slowly (>500ms) or shows stale content

→

Fix

Check Redis cache hit ratio via redis-cli info stats – if hits < 85%, check feed precomputation worker health. Examine Kafka consumer lag for the feed partition.

Symptom · 02

Photo loading takes >2 seconds or shows broken images

→

Fix

Verify CDN cache status via curl -I https://cdn.instagram.com/p/.... If X-Cache: MISS, check origin S3 bucket for file existence. Ensure CDN purge didn't wipe popular content.

Symptom · 03

Upload fails with 503 or timeout

→

Fix

Check API gateway rate limiting logs – if rate exceeded, scale gateway or adjust per-user limits. Also verify auth token expiration and S3 bucket permissions.

Symptom · 04

User sees incorrect follower count or feed order

→

Fix

Check database replication lag. If replicas are behind and reads are served at a weak consistency level, raise the read consistency (for example, to LOCAL_QUORUM in Cassandra) or route the read to the primary. Examine the async job queue for pending follower count updates.
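For the cache check in Symptom 01, the hit ratio has to be derived from two counters in the `redis-cli info stats` output, `keyspace_hits` and `keyspace_misses`. The helper below is a minimal sketch that parses that text; the function name and the sample numbers are hypothetical.

```python
def cache_hit_ratio(info_stats_text: str):
    """Compute hits / (hits + misses) from `redis-cli info stats` output."""
    fields = {}
    for line in info_stats_text.splitlines():
        line = line.strip()
        # Skip blanks and section headers such as "# Stats"
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        fields[name] = value
    hits = int(fields.get("keyspace_hits", 0))
    misses = int(fields.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None

sample = "# Stats\r\nkeyspace_hits:8200\r\nkeyspace_misses:1800\r\n"
ratio = cache_hit_ratio(sample)
print(f"hit ratio: {ratio:.2%}")
print("below 85% threshold" if ratio < 0.85 else "healthy")
```

These counters are cumulative since the server started, so compare two snapshots taken a few minutes apart when diagnosing a recent regression.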

★ Cheat sheet: Instant Diagnosis for Instagram-Scale Issues

When the system goes wrong, these commands get you to the root cause in under 2 minutes.

High API latency

Immediate action

Check if the bottleneck is CPU, memory, or I/O on feed workers.

Commands

kubectl top pods -n feed-service

kubectl logs -l app=feed-worker --tail=100 | grep 'fanout' | head -20

Fix now

Scale feed workers: kubectl scale deployment feed-worker --replicas=20

CDN cache miss ratio > 20%

Kafka consumer lag growing for feed partition

Upload error: 'Permission denied'

Database Choice Comparison for Instagram Components

Component	Relational (PostgreSQL)	NoSQL (Cassandra/HBase)	Recommended	Why
User Profiles	ACID compliant, join-friendly	High availability, flexible schema	PostgreSQL (sharded)	Referential integrity matters; strong consistency for critical user data.
Feed Data (Likes, Comments)	High write overhead, scaling issues	Column-family, fast writes, linear scalability	Cassandra	Write-heavy, tunable consistency; compressible columns.
Media Metadata (Photo URLs, Captions)	Possible with sharding	Excellent for high throughput writes and reads	Cassandra	High write volume (100M/day); no complex joins; eventual consistency fine.
Follower Graph	Requires junction table; slow for massive fan-out queries	Not ideal for graph traversal	Cassandra (with denormalisation) or dedicated graph DB (Neo4j for recommendations)	Follower graph is read-heavy; denormalise for fast fan-out; graph DB for recommendation.
Search / Explore	Full-text search limited	Elasticsearch/Solr is not NoSQL in the traditional sense	Elasticsearch (search) + Cassandra (data)	Use Elasticsearch for full-text search over posts/descriptions; store source of truth in Cassandra.

Key takeaways

Separate Read and Write paths to handle lopsided traffic patterns (roughly 1:50 write-to-read in this guide's estimate).

Use Consistent Hashing and choose shard keys per access pattern (post_id for posts, a composite key for likes, user_id for follows) to manage data growth across multiple servers.

Implement a Hybrid Feed Model (Push for normal users, Pull for celebrities) to avoid fan-out write amplification on celebrity posts.

Leverage CDNs to minimize 'Time to First Byte' (TTFB) for global users.

Cache only for active users; use multi-level cache (CDN, Redis, LRU) to achieve sub-300ms feed loads.

Always set a celebrity follower threshold: it's not optional, it's a hard requirement for feed stability.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

How would you design Instagram's feed generation system to handle celebr...

Q02 · SENIOR

How would you shard the database for Instagram? Which shard key would yo...

Q03 · SENIOR

How do you ensure high availability and durability for photo uploads?

Q04 · SENIOR

Compare using Cassandra vs PostgreSQL for the photo metadata table. Whic...

Q05 · SENIOR

How would you estimate the storage and bandwidth needed for Instagram?

Q01 of 05 · SENIOR

How would you design Instagram's feed generation system to handle celebrities with millions of followers?

ANSWER

Use a hybrid push-pull model. For users below a threshold (e.g., 1M followers), push posts to followers' feed caches (fan-out on write) using Kafka for async fan-out. For celebrities above the threshold, store their posts in a hot table in Redis; at feed request time, merge the precomputed cached feed with recent celebrity posts. The threshold should be configurable and monitored via Kafka consumer lag. This prevents write amplification from destroying the message queue and keeps read latency acceptable.

Naren Founder & Principal Engineer

20+ years shipping production code across the stack, with years spent interviewing engineers. Drawn from code that ran under real load.

Last updated: July 18, 2026
