Design Instagram — Hybrid Push-Pull Feed at 100M Followers
A celebrity post triggered 100M Kafka fan-out tasks, spiking latency from 200ms to 20s.
20+ years shipping production code across the stack, with years spent interviewing engineers. Drawn from code that ran under real load.
- Instagram serves 2B+ users uploading 100M photos/videos daily, delivering feeds in under 300ms
- Core components: Load balancer, API gateway, Object Store (S3), SQL/NoSQL databases, CDN, Feed Service
- Feed generation uses a hybrid push-pull model: push for normal users (fan-out on write), pull for celebrities (fan-out on read)
- Caching layers: Redis for precomputed feeds, LRU for profiles/metadata, CDN for static media – reduces DB reads by 80%
- Storage: 200TB new data daily; hot in S3 for 30 days, then cold archive; CDN edge caches popular media globally
- Scaling: shard by user_id via consistent hashing, use message queues (Kafka) for async fan-out, watch for celebrity hot spots
Imagine a giant post office where a billion people each send photo postcards every day. The post office has to instantly sort every postcard, deliver it only to the people who care about that sender, store the original photo safely forever, and let anyone pull up an old postcard in under a second — even at 3 AM during a major event. Instagram is exactly that post office, and designing it means figuring out every room, shelf, conveyor belt, and delivery truck needed to make it all work without ever losing a single photo.
Instagram serves over two billion monthly active users, processes roughly 100 million photo and video uploads every day, and is expected to return a personalised feed in under 300 milliseconds. When an interviewer asks you to design it, they're not looking for a diagram of boxes connected by arrows — they're watching whether you can reason about trade-offs at scale, make deliberate architectural decisions, and defend them under pressure. This is one of the most common system design questions in FAANG-level loops, and candidates who haven't internalised the nuances consistently get stuck on the feed generation problem or grossly underestimate storage requirements.
The core challenge Instagram solves is deceptively simple on the surface: store media, show it to followers, let people discover new content. Underneath, it's a collision of three genuinely hard distributed-systems problems — write-heavy media ingestion, read-heavy personalised feed delivery, and near-real-time social graph queries — all happening concurrently at planetary scale. Each of those problems demands different storage engines, caching strategies, and consistency models, and they have to coexist inside one coherent product.
By the end of this article you'll have a complete, defensible Instagram design you can present in a 45-minute interview. You'll know the exact numbers to anchor your estimates, the right database choices for each data type, how to generate feeds without melting your servers, where CDNs fit, how to shard your data, and — critically — which trade-offs to call out proactively so the interviewer knows you're thinking like an engineer who has shipped things to production, not just read about it.
Why the Instagram Interview Tests Feed Design
The Design Instagram interview asks you to architect a social feed system for 100M+ followers. The core mechanic is the hybrid push-pull model: pre-compute feeds for high-value users (push) and generate on-read for long-tail users (pull). This balances write amplification against read latency.
In practice, you must decide the fanout threshold — typically around 10K followers. Below that, push to all followers' inboxes. Above that, pull from a timeline cache on read. The key properties: push gives O(1) read but O(followers) write; pull gives O(1) write but O(followers) read. Hybrid keeps both under control.
You use this pattern when writes must be fast (posting a photo) and reads must be fresh (scrolling feed). It matters because naive push at 100M followers means a single post triggers 100M writes — impossible. Hybrid is the only way to serve a billion daily active users without collapsing the database.
Core Component Architecture: Handling 100M Uploads Daily
Designing Instagram isn't just about 'uploading a file.' It's about a decoupled architecture where the Write Path (Uploading) and the Read Path (Feed Generation) are optimized independently. On the write side, we use a Load Balancer to distribute traffic to an API Gateway, which handles authentication and rate limiting. The media itself (Photos/Videos) never touches our relational database; instead, it's streamed to an Object Store like AWS S3 or Google Cloud Storage.
We store the metadata (User ID, Photo URL, Timestamp, Location) in a distributed NoSQL database like Cassandra or a sharded PostgreSQL cluster. This separation ensures that even if our metadata database is busy, our media storage remains performant and durable. To make the images load instantly worldwide, we push them to Edge Locations via a Content Delivery Network (CDN).
Database Sharding & Scalability Strategies
A single database instance will fail at Instagram's scale. We must shard our data. The best strategy is to shard by User_ID. This ensures that all photos from a single user live on the same shard, making the 'View Profile' query extremely fast. However, for the 'Global Feed', we might need a secondary index or a specialized search service like Elasticsearch.
We also implement a multi-layered caching strategy: Redis for the 'Latest Feed' (Pre-computed), and an LRU cache at the application level for frequently accessed user profiles. This reduces the DB read pressure by over 80%.
Feed Generation: The Push-Pull Hybrid Model
The feed is the heart of Instagram. It must show recent posts from followed users in reverse chronological order (or ranked by engagement). Two classic approaches exist: pull (fan-out on read) and push (fan-out on write). Pull means when a user opens the app, we query all followed users' recent posts and merge. Push means when a user posts, we insert that post into every follower's precomputed feed list.
Pull is efficient for celebrities because you don't push to millions; but it's slow for users following hundreds of accounts because you need many queries. Push is fast for reading but costly for writes, especially for popular users. The solution: use push for regular users (fan-out on write) and pull for users with >1M followers (fan-out on read). This hybrid model balances the trade-offs.
- Push: writer writes once, but many recipients must read. Good for small fan-outs.
- Pull: reader reads from many sources; good for large fan-outs because writer isn't burdened.
- Hybrid: set a threshold where the cost of push exceeds the benefit of instant delivery.
- Threshold should be configurable and adjusted based on system load – you can even make it dynamic.
Storage Strategy: Object Store, CDN, and Cold Archival
Media storage at Instagram's scale requires a tiered approach. The primary storage is an object store (AWS S3, Google Cloud Storage) because they offer near-infinite capacity, strong durability (99.999999999%), and pay-per-GB pricing. However, serving every image directly from S3 would be too slow for users far from the data center and expensive in egress costs. That's where CDNs come in: we push popular media to edge servers worldwide so users download from a nearby node.
For cold data – photos older than 30 days with zero views – we move them to a cheaper archival tier (Amazon S3 Glacier, Google Cloud Archive) and serve a placeholder if accessed. The CDN cache also holds a copy for a shorter TTL (e.g., 7 days for popular content, 1 day for normal). Videos are stored as HLS segments for adaptive bitrate streaming.
Caching Strategy: Multi-Level Cache for Sub-300ms Feeds
To achieve sub-300ms feed loads, we need a multi-layer cache. The first layer is a CDN for static media (images, video thumbs). The second layer is an in-memory cache (Redis) for precomputed feeds of active users. The third layer is an application-level LRU cache for frequently accessed metadata (user profiles, popular photos).
For feed data: we store the top N recent posts for each user in Redis (capped at 1000 per user). When a user posts, we push to followers' feed caches (for non-celebrities) or store in a 'celebrity post list' in Redis. For membership services (follower counts, liked), we use a separate Redis cluster with eventual consistency. Cache invalidation is handled via version numbers: each user has a feed version; when a new version is available (due to new post), the client refetches.
- Layer 1 (CDN): Static media – miss penalty = 50-100ms (fetch from origin). Hit ratio > 95%.
- Layer 2 (Redis): Feed data, popularity scores – miss penalty = 5-10ms to get from DB. Hit ratio > 80%.
- Layer 3 (LRU): User profiles, photo metadata – miss penalty = 1-5ms (local). Hit ratio > 90%.
- If you have to go to DB, your response time jumps from microseconds to milliseconds – that's where your SLAs break.
INFO keyspace; if keyspace_hits ratio drops below 90%, you're caching too little or TTLs are too short.Capacity Estimation: The Math They Actually Ask For, But Nobody Does Right
Everyone waves hands about "millions of users" in interviews. But when a senior engineer asks you how much storage you need per day, or what bandwidth your CDN has to provision, they're not being pedantic — they're testing if you've dealt with real infra limits. Let's do the numbers, because a 50TB storage requirement you didn't anticipate will kill your architecture faster than any cache miss.
Assume 500M daily active users. Average photo size after compression: 200KB. Average video size after transcoding: 2MB for a 30-second clip. Ratio of photos to videos: 80/20. Uploads per user per day: 0.5 (most users aren't power posters). That's 250M photos (50TB) and 62.5M videos (125TB) daily. Total: 175TB/day raw storage. Before replication. Before cold archival.
Bandwidth? On read-heavy workloads like feed generation, you serve 10x what you ingest. Add in thumbnail generation (multiple resolutions per upload), and your egress from object store to CDN to user is roughly 2PB/day. You do not handle that with a single region or a naive storage strategy. You design for it from day one.
System Requirements: The Black Box Your Interviewer Actually Opens
Competitor pages list Instagram features like a grocery list: "Post photos, follow users, like posts." Bored. Useless. The interviewer doesn't care if you can list features — they care if you can identify which ones create architectural tension. Here's the real distinction:
Core vs. Edge requirements. Posting photos is core. Creating a 4K video reel and expecting sub-second playback latency? That's a hard edge requirement that forces content-addressable storage and adaptive bitrate streaming. Follow/unfollow is easy until you have 10 million followers on a single celebrity account — then your fan-out model breaks.
Non-functional requirements are where the fight lives. Latency: feed generation must complete under 500ms for 99th percentile. Availability: 99.99% for read paths, but writes can tolerate brief inconsistency. Consistency: eventual consistency is fine for feed — users won't notice a 2-minute lag. But likes on a post? That expects real-time increment visibility. Different tradeoffs.
Capacity estimation isn't a separate section — it's how you validate every other choice. Store photos in S3? Great, what's your upload throughput per second? 100M daily uploads / 86400 seconds ≈ 1,157 uploads/second peak, likely 5x that during a viral event. Your upload API needs to handle 6,000 req/s with graceful degradation. That drives autoscaling policy, connection pooling, and database write throughput calculations.
Sharding Keys: The Decision That Makes or Breaks Your Database
Competitors vaguely mention 'database sharding' but never show you the knife fight: choosing the shard key. Instagram's core tables — User, Post, Follow, Like — each require different sharding strategies, and picking wrong means data hotspots that no cache can fix.
For the Post table, don't shard on user_id. Why? A celebrity posts once and gets 10M likes in an hour. That's 10M writes to a single shard. You've just created a burning hot partition. Instead, shard on post_id (or a hash of post_id). Spreads writes evenly because each post gets its own shard. Reads? Feed generation hits many post_ids anyway, so scatter-gather is unavoidable — but you parallelize that across shards.
For the Follow table? User_id shard works — a user's follow list is read-heavy and small (<10K follows, typically). Hot shard problem exists for celebrities (10M followers), but the read pattern is one user reading their follow graph, not 10M writes. Use a separate adjacency list in a cache layer like Redis for celebrity follow graph lookups. Don't torture your primary database.
For Likes, shard on (post_id, user_id) composite key. The goal is to evenly distribute like events across shards while supporting fast UNIQUE checks to prevent duplicate likes. Composite keys pre-split the write load. If you use auto-increment IDs, wrap them with a shard ID prefix — standard Ticket Server pattern.
Interactions: The Hot Path You Will Mistake for a Side-Show
Interviews focus on feed reads but the hot path is writes from interactions. Like, comment, share, save — these hit the write-heavy chunk of your architecture every time a user breathes. You get 10x more interaction writes than uploads. And they need immediate consistency because no one wants to see their like disappear after a refresh.
Your read-replica strategy breaks here. Interaction writes must hit the primary shard containing the post owner's data. If you shard by user_id, the post's interaction traffic fans out across shards — every 'like' becomes a cross-shard transaction waiting to fail. Instead, shard interactions by post_id, colocating the post's metadata and interaction counters on the same node. This turns a multi-shard nightmare into a local increment.
The trap is treating interactions as low-priority. They generate the feed's engagement signals, trigger real-time notifications, and feed the ranking algorithm. If your design can't handle 500 concurrent likes on a single post, your Instagram clone dies on the first viral photo.
User_Follows: The Graph Problem Your NoSQL Can't Solve
Your SQL fanboy starts talking about a join table. Wrong. Instagram has 500M+ daily active users. A user follows 500 accounts. That's 250B edges. joins at that scale are a death sentence. You need a directed graph stored as a lookup table in a document store or a specialized graph database like Amazon Neptune or Neo4j.
But here's the real problem: when a post goes viral, your feed generation needs to check 'does user A follow user B?' This check happens per post, per feed request, for every user scrolling. A simple key-value check — user_id -> set of followee_ids — handles this at sub-millisecond latency. Store that in Redis as a sorted set. Use the follow timestamp as the score so the feed generator can paginate by recency without a costly join.
The follow/unfollow operation is another trap. Stale followers in the cache cause phantom content in the feed. Use a write-through cache pattern: on unfollow, delete the cached following set for that user. Then lazy-load it on the next feed request. This keeps consistency without cascading invalidations across nodes.
High-Level Design (HLD): The Diagram That Defines Your Interview
Your HLD is not a network topology. It is a story about data flow under constraints. Start with the read vs. write asymmetry: 100M uploads/day vs. 500M feed refreshes/day. Every component choice follows from that ratio. Use a single API gateway for authentication, rate limiting, and request routing. Separate upload and feed read paths at the gateway level—uploads hit a dedicated ingestion service, feeds hit a read-optimized service. The fan-out service writes to each follower's feed cache (push) while a lazy loader fills gaps for inactive users (pull). Your HLD must show the reverse index for comments and likes as separate logical stores, even if physically shared. The interviewer wants to see you encode trade-offs—like why you pick Kafka over RabbitMQ for the async fan-out (throughput, replayability) or why you split the feed read service from the user service (independent scaling, failure isolation). A strong HLD for Instagram is a decision tree, not a checklist.
Low-Level Design (LLD): The Feed Cache That Must Survive Queries per Second
Feed generation at scale fails in data structures, not algorithms. Your LLD must specify the exact Redis data types for the feed cache: a Redis sorted set per user where the key is post_id and score is creation timestamp. Each post stores a serialized object with media_id, user_id, and like_count (not the full image URL—that's in CDN). The fan-out worker writes to these sets in batch: for each new post, it inserts into the top 10,000 followers' sorted sets (push), then signals a lazy loader thread for the rest. The lazy loader uses a Bloom filter to check if a post was already pushed to a user's cache before falling back to the follower's SQL index. Eviction policy is TTL-based (72 hours for active users, 24 for dormant). Cache misses warm from a denormalized Postgres table with index on (user_id, created_at DESC). Your LLD must also handle likes and comments as counter shards: each post's interaction count lives in a Redis hash updated asynchronously via Kafka streams, avoiding write amplification on the feed read path.
Celebrity Post Causes Global Feed Delay
- Profile user follower distributions regularly – the tail (celebrities) is where your system breaks.
- Always have a safety threshold for push vs pull – don't treat all writes equally.
- Monitor per-partition lag in Kafka (or equivalent) – it's the first signal that you're overwhelming a single consumer.
redis-cli info stats – if hits < 85%, check feed precomputation worker health. Examine Kafka consumer lag for the feed partition.curl -I https://cdn.instagram.com/p/.... If X-Cache: MISS, check origin S3 bucket for file existence. Ensure CDN purge didn't wipe popular content.kubectl top pods -n feed-servicekubectl logs -l app=feed-worker --tail=100 | grep 'fanout' | head -20Key takeaways
Common mistakes to avoid
5 patternsStoring images directly in the database (BLOB columns)
Ignoring the CDN or deploying a single-region CDN
Assuming strong consistency is needed for all data (e.g., like counts)
Under-calculating storage growth for videos
Using auto-increment IDs in distributed sharded databases
Interview Questions on This Topic
How would you design Instagram's feed generation system to handle celebrities with millions of followers?
Frequently Asked Questions
20+ years shipping production code across the stack, with years spent interviewing engineers. Drawn from code that ran under real load.
That's System Design Interview. Mark it forged?
11 min read · try the examples if you haven't