Senior 4 min · June 25, 2026

Design Instagram: The Real-World System Behind 500M Daily Active Users

System design for Instagram: sharding, caching, feed generation, and the production traps that bring down photo-sharing at scale.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.

Quick Answer

Instagram's system design relies on a combination of microservices for upload, feed generation, and storage, with a CDN for media delivery, sharded databases for user metadata and posts, and a distributed cache for hot feeds. The key challenge is generating personalized feeds at low latency while absorbing a steady stream of uploads.

What is Design Instagram?

Design Instagram is a system design exercise that models how to build a scalable photo-sharing service handling millions of uploads, billions of feed views, and low-latency content delivery across a global user base.

Plain-English First

Think of Instagram like a massive photo album shared by the whole world. When you upload a photo, it's like putting a print into a central warehouse (object storage), then making copies to display in your friends' albums (feed generation). The tricky part is that millions of people are adding photos every second, and each person's album (feed) is different. You can't just show everyone the same photos—you have to sort and rank them just for you, and do it fast enough that you don't notice the delay.

Here's what everyone gets wrong about designing Instagram: they focus on the upload path. The real nightmare is the read path—generating a personalized feed for 500 million daily active users in under 500 milliseconds. I've seen teams burn months optimizing photo storage while their feed latency crawled to 10 seconds because they ignored fan-out. This article walks through the actual architecture that makes Instagram work at scale: the sharding strategy, the cache hierarchy, and the feed generation patterns that separate a working prototype from a production system. By the end, you'll know exactly how to design a photo-sharing service that doesn't fall over when a celebrity posts a selfie.

Why Instagram's Read Path Is the Hard Part

Most tutorials start with upload: client sends photo, server stores it, done. That's the easy 10%. The hard 90% is the read path: generating a feed for each user that's personalized, ranked, and delivered in under 500ms. Without a proper design, a single celebrity post can cause a thundering herd that takes down your feed service. The core problem is fan-out. When a user with 10 million followers posts a photo, a push model has to insert that post into 10 million feed caches. Do it synchronously and your write latency explodes. Skip the push and compute every feed lazily, and each read has to gather posts from every followed account, so read latency suffers instead. The usual answer, and the one generally attributed to Instagram, is a hybrid: push posts from accounts with few followers into their followers' feeds at write time, pull posts from accounts with many followers at read time, and use a threshold (e.g., 10K followers) to decide which path a poster takes.

FeedGeneration.systemdesign
// io.thecodeforge — System Design tutorial

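// Hybrid feed read: small accounts push posts into the reader's cached feed
// at write time; posts from large accounts (more than 10,000 followers) are
// pulled at read time and merged in. The threshold applies to the poster,
// not to the reader.
function generateFeed(userId, limit) {
  // Posts already fanned out to this reader (assumed to be post objects)
  const candidates = new Map();
  for (const post of getCachedFeed(userId, limit)) {
    candidates.set(post.id, post);
  }
  // Pull recent posts from followed accounts that are too large to fan out
  for (const followeeId of getFollowees(userId)) {
    if (getUserFollowerCount(followeeId) > 10000) {
      for (const post of getRecentPosts(followeeId, limit)) {
        candidates.set(post.id, post); // Map de-duplicates by post ID
      }
    }
  }
  // Rank the merged candidates and trim to the requested page size
  return rankPosts([...candidates.values()], userId).slice(0, limit);
}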
Output
Returns a list of post IDs, ranked by recency and relevance, from cache or computed on the fly.
Production Trap: Synchronous Fan-Out
If you push every post to every follower synchronously, a single post from a celebrity can cause 10 million database writes. Your write throughput will collapse. Always use async fan-out with a message queue (Kafka) and batch writes to the feed cache.
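The async fan-out described in this trap can be sketched as follows: split the follower list into fixed-size batches and enqueue one job per batch instead of writing to every feed inline. This is a minimal sketch under stated assumptions: `planFanout`, the job shape, and the batch size of 1,000 are hypothetical, and the `enqueue` callback stands in for a real queue producer such as a Kafka client.

```javascript
// Sketch only: planFanout, the job shape, and the batch size are hypothetical.
// `enqueue` stands in for a message-queue producer (e.g. a Kafka client).
function planFanout(postId, timestamp, followerIds, enqueue, batchSize = 1000) {
  let jobs = 0;
  for (let i = 0; i < followerIds.length; i += batchSize) {
    // One queue message per batch; workers apply each batch to the feed cache
    enqueue({ postId, timestamp, followerIds: followerIds.slice(i, i + batchSize) });
    jobs += 1;
  }
  return jobs;
}

// Usage: 2,500 followers become 3 queue messages instead of 2,500 inline writes
const queue = [];
const followers = Array.from({ length: 2500 }, (_, i) => `user-${i}`);
const jobCount = planFanout('post-1', 1700000000000, followers, (job) => queue.push(job));
console.log(jobCount, queue[2].followerIds.length); // 3 500
```

The post request returns as soon as the jobs are enqueued; the feed-cache writes happen in consumers that can be scaled independently.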
[Figure: Instagram System Design. Key components for scaling to 500M daily active users: read path, sharding, caching, upload pipeline, feed ranking, celebrity fan-out, database choice.]
[Figure: Read Path: Feed Generation. How a user's feed is assembled in under 500ms: client request, Redis cache check, pull generator on a cache miss, ML ranking, return.]

Sharding User Data Without Losing Your Mind

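User data (profiles, followers, posts) needs to be sharded across databases. The naive approach is to shard by user ID modulo N, where N is the number of databases. That works until you need to add a database: changing N remaps most users to a different machine, which forces a bulk migration. Instagram's published approach is two-level sharding: hash the user ID to one of a large, fixed number of logical shards, then map each logical shard to a physical database through configuration. Resharding then means moving whole logical shards and updating the map, which can be done without downtime. Note that neither scheme stops a hot user (celebrity) from hammering a single shard; that is a job for caching and read replicas. For the social graph (follows), they reportedly use TAO, Meta's graph store, which keeps edges as key-value pairs with locality. The key insight: always keep related data (user + their posts) on the same shard to avoid cross-shard queries.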

ShardingStrategy.systemdesign
// io.thecodeforge — System Design tutorial

// Two-level sharding: logical shard -> physical database
function getShard(userId) {
  const logicalShard = hash(userId) % TOTAL_LOGICAL_SHARDS;
  // Map logical shard to physical DB using a config file
  const physicalDb = shardMap[logicalShard];
  return physicalDb;
}

// When resharding, update shardMap gradually and move data in background
Output
Returns the database connection for a given user ID.
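To see why the extra level of indirection matters, the sketch below compares how many users change physical database when a fifth database is added, first with plain modulo sharding and then with a logical-shard map. It assumes numeric user IDs, uses the ID itself as the hash so the arithmetic stays visible, and picks 1,024 logical shards arbitrarily; all function names are hypothetical.

```javascript
// Sketch only: numeric user IDs and an identity "hash" keep the arithmetic visible.
const TOTAL_LOGICAL_SHARDS = 1024; // hypothetical; fixed for the life of the system

// Naive scheme: the physical database is userId % dbCount
function movedNaive(userCount, oldDbCount, newDbCount) {
  let moved = 0;
  for (let id = 0; id < userCount; id++) {
    if (id % oldDbCount !== id % newDbCount) moved++;
  }
  return moved / userCount;
}

// Two-level scheme: userId -> logical shard (never changes) -> physical DB (via map)
function buildShardMap(dbCount) {
  return Array.from({ length: TOTAL_LOGICAL_SHARDS }, (_, shard) => shard % dbCount);
}

function movedTwoLevel(userCount, oldMap, newMap) {
  let moved = 0;
  for (let id = 0; id < userCount; id++) {
    const shard = id % TOTAL_LOGICAL_SHARDS;
    if (oldMap[shard] !== newMap[shard]) moved++;
  }
  return moved / userCount;
}

// Add a 5th database (index 4) by reassigning every 5th logical shard to it
const oldMap = buildShardMap(4);
const newMap = oldMap.map((db, shard) => (shard % 5 === 0 ? 4 : db));

const naive = movedNaive(102400, 4, 5);
const twoLevel = movedTwoLevel(102400, oldMap, newMap);
console.log(naive.toFixed(2), twoLevel.toFixed(2)); // 0.80 0.20
```

With modulo sharding about 80% of users move; with the map only the users in the reassigned logical shards move, and each shard can be migrated in the background before its map entry is flipped.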
Senior Shortcut: Pre-Join Data
Store a user's recent posts in the same database row as the user profile. This avoids a join across tables or shards when loading a profile page. Denormalize aggressively for read-heavy workloads.

Caching: The Only Way to Survive Peak Traffic

Instagram's feed is cached in Redis clusters. Each user's feed is a sorted set of post IDs with score = timestamp. When a user scrolls, they read from cache. Cache misses fall back to the pull-based generator. The cache is pre-warmed for the top 1% of users (by follower count) to handle sudden spikes. For media (photos, videos), a CDN (e.g., Akamai) caches at edge locations. The CDN cache hit ratio should be >95% for static content; if it drops below 90%, you're paying too much for origin bandwidth. The classic rookie mistake is caching the entire post object in every feed: store only post IDs in the feed's sorted set, keep post metadata (caption, media URL) in a separate per-post cache entry, and let clients fetch the media bytes from the CDN.

CacheStrategy.systemdesign
// io.thecodeforge — System Design tutorial

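// Cache feed as sorted set of post IDs (hotUsers is a Set of user IDs)
function addPostToFeeds(postId, userId, timestamp) {
  const followers = getFollowers(userId);
  for (const followerId of followers) {
    // Set membership needs .has(); the `in` operator checks property names
    if (hotUsers.has(followerId)) {
      const key = `feed:${followerId}`;
      redis.zadd(key, timestamp, postId);
      redis.zremrangebyrank(key, 0, -1001); // keep only the newest 1,000 entries
      redis.expire(key, 3600); // TTL 1 hour
    }
  }
}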

// Read feed: try cache first, then generate
function getFeed(userId, limit) {
  const cached = redis.zrevrange(`feed:${userId}`, 0, limit - 1);
  if (cached.length > 0) return cached;
  return generateFeed(userId, limit);
}
Output
List of post IDs from cache or generated on the fly.
Interview Gold: Cache Invalidation
When a user unfollows someone, you must delete that user's posts from the follower's feed cache. Do this asynchronously with a background job. Otherwise, stale posts appear in feeds indefinitely.
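The invalidation job described here can be sketched with in-memory stand-ins for the Redis structures. Everything in it is an assumption for illustration: `feeds` mimics the per-user sorted set, `recentPostsByAuthor` is a hypothetical index of each author's recent post IDs, and `removeAuthorFromFeed` is the background job an unfollow event would trigger.

```javascript
// Sketch only: in-memory stand-ins for the Redis structures described above.
// feeds: followerId -> Map(postId -> timestamp), mimicking a sorted set
// recentPostsByAuthor: authorId -> array of that author's recent post IDs
const feeds = new Map();
const recentPostsByAuthor = new Map();

// Hypothetical background job run after an unfollow event
function removeAuthorFromFeed(followerId, authorId) {
  const feed = feeds.get(followerId);
  if (!feed) return 0; // nothing cached, nothing to invalidate
  let removed = 0;
  for (const postId of recentPostsByAuthor.get(authorId) || []) {
    // Equivalent in Redis: ZREM feed:<followerId> <postId>
    if (feed.delete(postId)) removed++;
  }
  return removed;
}

// Usage: alice unfollows bob; only bob's posts leave her cached feed
feeds.set('alice', new Map([['p1', 100], ['p2', 200], ['p3', 300]]));
recentPostsByAuthor.set('bob', ['p1', 'p3']);
const removedCount = removeAuthorFromFeed('alice', 'bob');
console.log(removedCount, [...feeds.get('alice').keys()]); // 2 [ 'p2' ]
```

Only the unfollowed author's entries are removed, so the rest of the cached feed stays warm instead of being rebuilt from scratch.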

Upload Pipeline: From Client to CDN

When a user uploads a photo, the client sends it directly to object storage (for example, S3 via a presigned URL) rather than through your application servers. The CDN then serves that object from its edge locations. Your server receives only metadata (URL, caption, location) and stores it in the database. This offloads bandwidth from your servers. Resizing and format conversion (WebP, AVIF) happen in the media pipeline or at the CDN edge, not in the upload service. The upload service is stateless and can be scaled horizontally. The bottleneck is the database write for the post metadata. Use a durable log (Kafka) to buffer writes and batch them into the database. At this scale, writing to the database synchronously on every upload risks saturating connections and disk I/O during spikes.

UploadPipeline.systemdesign
// io.thecodeforge — System Design tutorial

// Upload handler: receive metadata, queue for DB write
function handleUpload(req, res) {
  const { userId, imageUrl, caption, location } = req.body;
  // Validate and sanitize
  const post = { userId, imageUrl, caption, location, timestamp: Date.now() };
  // Send to Kafka for async batch insert
  kafka.send('post-uploads', post);
  res.status(202).json({ message: 'Upload accepted' });
}

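// Kafka consumer batches writes: flush at 100 posts or every second,
// whichever comes first, so a slow trickle of uploads is not stranded
function batchInsertPosts() {
  let batch = [];
  const flush = () => {
    if (batch.length === 0) return;
    db.insertMany(batch);
    batch = [];
  };
  kafka.consume('post-uploads', (post) => {
    batch.push(post);
    if (batch.length >= 100) flush();
  });
  setInterval(flush, 1000);
}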
Output
HTTP 202 Accepted response. Post metadata is eventually consistent in the database.
Never Do This: Synchronous DB Write on Upload
Writing to the database synchronously on every upload can exhaust the connection pool under load, which typically surfaces as connection timeouts or pool-exhaustion errors in your logs. Buffer writes with a queue.

Feed Ranking: Beyond Recency

Instagram's feed isn't just chronological. It's ranked by a machine learning model that considers affinity (how often you interact with the poster), timeliness, and content type. The ranking service runs as a separate microservice that takes a list of candidate post IDs and returns a scored list. The model is updated daily. The challenge is latency: ranking must complete in under 100ms. Use a lightweight model (e.g., logistic regression with feature precomputation) rather than a deep neural network. Precompute features like 'average likes per post from this user' and store them in a key-value store (Cassandra).

FeedRanking.systemdesign
// io.thecodeforge — System Design tutorial

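// Simplified ranking: score = affinity * recency
function rankPosts(posts, userId) {
  const now = Date.now();
  return posts.map(post => {
    const affinity = getAffinity(userId, post.userId); // 0-1
    // Age in hours; the +1 below avoids division by zero for brand-new posts
    const ageHours = Math.max(0, now - post.timestamp) / 3600000;
    const recency = 1 / (1 + ageHours); // 1 for a new post, decays with age
    return { postId: post.id, score: affinity * recency };
  }).sort((a, b) => b.score - a.score);
}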
Output
Sorted list of post IDs with scores.
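The lightweight model mentioned in this section can be sketched as a logistic regression over precomputed features. The feature names and weights below are invented for illustration; a real model would learn its weights from engagement data and read the features from the precomputed store.

```javascript
// Sketch only: the feature names and weights are made up for illustration.
const WEIGHTS = { bias: -1.0, affinity: 3.0, recency: 2.0, isVideo: 0.5 };

function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// features: { affinity: 0..1, recency: 0..1, isVideo: 0 or 1 }, all precomputed
function scorePost(features) {
  const z = WEIGHTS.bias
    + WEIGHTS.affinity * features.affinity
    + WEIGHTS.recency * features.recency
    + WEIGHTS.isVideo * features.isVideo;
  return sigmoid(z); // estimated engagement probability in (0, 1)
}

function rankCandidates(candidates) {
  return candidates
    .map((c) => ({ postId: c.postId, score: scorePost(c.features) }))
    .sort((a, b) => b.score - a.score);
}

const ranked = rankCandidates([
  { postId: 'fresh-from-stranger', features: { affinity: 0.1, recency: 0.9, isVideo: 0 } },
  { postId: 'old-from-close-friend', features: { affinity: 0.9, recency: 0.1, isVideo: 0 } },
]);
console.log(ranked.map((r) => r.postId));
```

Scoring is a handful of multiplications per candidate, which is what makes a tight latency budget realistic compared with running a deep network per request.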
Senior Shortcut: Precompute Affinity
Don't compute affinity on the fly. Run a daily batch job that calculates affinity scores for each user-followee pair and stores them in a cache. This reduces ranking latency from 500ms to 10ms.

Handling the Celebrity Problem

A user with 50 million followers posts a photo. If you push to all followers, that single post turns into 50 million feed cache writes. The solution: use a pull-based model for posts from accounts with >10K followers. Those posts are not fanned out; each follower's feed picks them up at read time by fetching recent posts from the large accounts they follow and merging them with the pushed feed. The celebrity's own profile timeline still receives the post directly. Additionally, you can cap the fan-out instead of skipping it entirely: push to the 10K most active followers and let the rest pull. Instagram is widely reported to use a hybrid along these lines.

CelebrityFanout.systemdesign
// io.thecodeforge — System Design tutorial

// Fan-out with threshold
function fanoutPost(post, userId) {
  const followerCount = getUserFollowerCount(userId);
  if (followerCount <= 10000) {
    // Push to all followers
    const followers = getFollowers(userId);
    for (let followerId of followers) {
      redis.zadd(`feed:${followerId}`, post.timestamp, post.id);
    }
  } else {
    // Push only to top 10K active followers
    const activeFollowers = getActiveFollowers(userId, 10000);
    for (let followerId of activeFollowers) {
      redis.zadd(`feed:${followerId}`, post.timestamp, post.id);
    }
    // Mark post for pull-based retrieval (a sorted set keeps posts in time order)
    redis.zadd(`recent-posts:${userId}`, post.timestamp, post.id);
  }
}
Output
Post is added to feeds of active followers; others fetch on read.
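The write amplification behind this threshold is simple arithmetic. The sketch below estimates how long a pure push would take to drain for a single post, assuming a sustained feed-cache throughput of 500,000 writes per second; that throughput is an assumed figure for illustration, not a measurement.

```javascript
// Sketch only: the sustained write throughput is an assumed figure, not a measurement.
function fanoutDrainSeconds(followerCount, cacheWritesPerSecond) {
  // A pure push model needs one feed-cache write per follower
  return followerCount / cacheWritesPerSecond;
}

const celebrity = fanoutDrainSeconds(50000000, 500000); // full push to 50M followers
const capped = fanoutDrainSeconds(10000, 500000);       // push capped at 10K followers
console.log(celebrity, capped); // 100 0.02
```

Under that assumption a single celebrity post occupies the whole write path for 100 seconds, while the capped push finishes in about 20 milliseconds.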
Production Trap: Thundering Herd
When a celebrity posts, the pull-based followers will all try to generate their feed simultaneously. Use a cache-aside pattern with a mutex lock per user to prevent multiple requests from regenerating the same feed. Otherwise, your database will be hammered.
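One way to sketch the per-user lock is request coalescing (sometimes called single-flight): concurrent requests for the same user's feed share one in-progress generation instead of each starting their own. This is an in-process variant chosen for brevity, not a distributed mutex; across several servers a shared lock would be needed. `getFeedOnce` and `slowGenerate` are hypothetical names.

```javascript
// Sketch only: per-process request coalescing. Across several servers this would
// need a shared lock (for example in Redis), which is not shown here.
const inFlight = new Map(); // userId -> Promise of the feed being generated

function getFeedOnce(userId, generate) {
  if (inFlight.has(userId)) return inFlight.get(userId); // join the running request
  const promise = Promise.resolve()
    .then(() => generate(userId))
    .finally(() => inFlight.delete(userId)); // allow a fresh run next time
  inFlight.set(userId, promise);
  return promise;
}

// Usage: three concurrent reads trigger a single expensive generation
let generations = 0;
const slowGenerate = async (userId) => {
  generations += 1;
  await new Promise((resolve) => setTimeout(resolve, 10));
  return [`post-for-${userId}`];
};

const demo = Promise.all([
  getFeedOnce('u1', slowGenerate),
  getFeedOnce('u1', slowGenerate),
  getFeedOnce('u1', slowGenerate),
]).then((results) => {
  console.log(generations, results.length); // 1 3
  return results;
});
```

All three callers receive the same result, and the entry is cleared once the generation settles so the next cache miss starts a fresh run.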
[Figure: Push vs Pull for Celebrity Posts. Handling a user with 50M followers: the push model writes the post to all 50M feeds; the pull model stores it once in the celebrity's timeline and fetches it at read time.]

Database Choice: SQL vs NoSQL

Instagram uses PostgreSQL for user metadata and posts, Cassandra for the social graph (follows), and Redis for caching. Why not all in one? PostgreSQL provides strong consistency for transactions (e.g., user registration). Cassandra provides high write throughput for the constant stream of follows and unfollows. Redis provides low-latency reads for feeds. The trade-off: eventual consistency between systems. A user might follow someone and not see their posts for a few seconds. That's acceptable. At this scale, a single monolithic database will hit scaling limits.

The Classic Bug: Cross-Shard Joins
If you store user profiles in PostgreSQL and posts in Cassandra, don't try to join them in application code. You'll end up with N+1 queries. Instead, denormalize: store a user's recent post IDs in the user profile row.

When Not to Use This Design

This architecture is overkill for a photo-sharing app with fewer than 1 million users. If you're building an MVP, use a monolithic backend with a single PostgreSQL database and a CDN for images. The complexity of microservices, sharding, and async fan-out will slow you down. Only adopt these patterns when you see specific pain points: feed latency >1 second, database CPU >80%, or upload failures due to write contention. Also, if your app is read-heavy but not write-heavy (e.g., a gallery), a simpler pull-based feed with a CDN cache is sufficient.

Senior Shortcut: Start Simple
For the first 100K users, a single PostgreSQL instance with a Redis cache and a CDN will handle the load. Don't pre-optimize. When you hit 1M users, start sharding. When you hit 10M, add async fan-out. Premature scaling is the root of all evil.
● Production incident · POST-MORTEM · severity: high

The 4GB Container That Kept Dying

Symptom
Feed generation service containers were OOM-killed every 30 minutes during peak hours.
Assumption
Memory leak in feed ranking algorithm.
Root cause
Each feed generation request loaded the requesting user's entire graph (followers and followees) into memory. With 10K concurrent requests, memory spiked to 6GB per container despite the 4GB limit.
Fix
Changed to lazy-load user graph with a TTL cache (Redis) and reduced batch size to 100 users per request. Set container memory to 8GB with a 6GB soft limit.
Key lesson
  • Never load the full social graph in memory per request—cache it with a TTL and paginate the fan-out.
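The fix in this post-mortem combines two ideas: load graph data lazily behind a TTL cache, and process followers in fixed-size pages. The sketch below shows both with an in-process cache and an injectable clock; the incident's actual fix used Redis, and `TtlCache`, `paginate`, and the 60-second TTL are assumptions for illustration.

```javascript
// Sketch only: a tiny in-process TTL cache; the fix described above used Redis.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock makes expiry testable
    this.entries = new Map();
  }

  // Return the cached value, or load it lazily and keep it until the TTL passes
  getOrLoad(key, load) {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = load(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Split a large follower list into pages so no request holds the whole graph
function* paginate(items, pageSize) {
  for (let i = 0; i < items.length; i += pageSize) {
    yield items.slice(i, i + pageSize);
  }
}

// Usage with a fake clock
let clock = 0;
let loads = 0;
const followeeCache = new TtlCache(60000, () => clock);
const loadFollowees = () => { loads += 1; return ['a', 'b', 'c']; };

followeeCache.getOrLoad('u1', loadFollowees); // miss: loads
followeeCache.getOrLoad('u1', loadFollowees); // hit: served from cache
clock = 60001;
followeeCache.getOrLoad('u1', loadFollowees); // expired: loads again
const pages = [...paginate(Array.from({ length: 250 }, (_, i) => i), 100)];
console.log(loads, pages.map((p) => p.length)); // 2 [ 100, 100, 50 ]
```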
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit. 3 entries
Symptom · 01
Feed loading slowly for all users
Fix
1. Check Redis cache hit ratio (should be >90%). 2. If low, check for cache stampede: add mutex locks per user. 3. Pre-warm cache for top 1% users.
Symptom · 02
Uploads failing with 503
Fix
1. Check Kafka consumer lag. 2. Increase number of consumers. 3. If DB write throughput is bottleneck, batch larger (e.g., 500 per batch).
Symptom · 03
Feed inconsistent across devices
Fix
1. Check if fan-out is async and eventual consistency is expected. 2. If not, add version vector to feed cache entries. 3. Force cache invalidation on unfollow.
★ Design Instagram Triage Cheat Sheet
First-response commands for when things go wrong, copy-paste ready.
Feed latency >2s for all users
Immediate action
Check Redis cache hit ratio
Commands
redis-cli INFO stats | grep -E 'keyspace_(hits|misses)'   # hit ratio = hits / (hits + misses)
redis-cli --bigkeys
Fix now
Increase Redis memory or add replicas. Pre-warm cache for top users.
Uploads returning 503
Immediate action
Check Kafka consumer lag
Commands
kafka-consumer-groups --bootstrap-server localhost:9092 --group upload-group --describe
Check DB connections (PostgreSQL): SELECT count(*) FROM pg_stat_activity;
Fix now
Scale Kafka consumers horizontally. Increase DB max_connections.
Feed shows stale posts after unfollow
Immediate action
Check if unfollow triggers cache invalidation
Commands
redis-cli --scan --pattern 'feed:*' | head -5
Check application logs for 'unfollow' events
Fix now
Add async job to delete posts from unfollowed user in follower's feed cache.
CDN miss ratio >10%
Immediate action
Check CDN cache configuration
Commands
curl -sI https://cdn.example.com/photo.jpg | grep -i x-cache
Check origin server load
Fix now
Increase TTL on CDN for static assets. Pre-fetch popular content.
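For the first triage entry, note that Redis does not report a hit ratio directly; it has to be derived from the `keyspace_hits` and `keyspace_misses` counters in `INFO stats`. The sketch below parses that text output. The function name and the sample values are made up for illustration.

```javascript
// Sketch only: parses the text that `redis-cli INFO stats` prints.
// keyspace_hits and keyspace_misses are real INFO fields; the sample values are made up.
function cacheHitRatio(infoStatsText) {
  const stats = {};
  for (const line of infoStatsText.split(/\r?\n/)) {
    const [key, value] = line.split(':');
    if (value !== undefined) stats[key] = Number(value);
  }
  const hits = stats.keyspace_hits || 0;
  const misses = stats.keyspace_misses || 0;
  const total = hits + misses;
  return total === 0 ? null : hits / total; // null when there has been no traffic
}

const sample = '# Stats\r\nkeyspace_hits:9200\r\nkeyspace_misses:800\r\n';
console.log(cacheHitRatio(sample)); // 0.92
```

These counters are cumulative since the last restart or stats reset, so comparing two samples taken a minute apart gives a better picture of the current ratio than a single reading.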
Feature / Aspect | Push-Based Feed | Pull-Based Feed
Write Latency | High (fan-out to all followers) | Low (only write post metadata)
Read Latency | Low (pre-computed) | High (compute on read)
Consistency | Immediate | Eventual (seconds delay)
Best For | Users with <10K followers | Users with >10K followers

Key takeaways

1. The read path (feed generation) is the hard part, not the upload path. Focus on fan-out and caching.
2. Hybrid push/pull fan-out with a threshold (e.g., 10K followers) prevents write amplification from celebrities.
3. Cache only post IDs in Redis, not full objects. Use CDN for media delivery with >95% hit ratio.
4. Start monolithic, scale with sharding and async processing only when you see real pain points.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR: How does Instagram handle the fan-out problem for users with millions of...
Q02 · SENIOR: When would you choose push-based feed over pull-based in a production sy...
Q03 · SENIOR: What happens when a celebrity posts and 10 million followers pull their ...
Q04 · JUNIOR: What database would you use for the social graph (follows) and why?
Q05 · SENIOR: A user reports that their feed shows a post from someone they unfollowed...
Q06 · SENIOR: How would you design the feed ranking to scale to 500M users?
Q01 of 06 · SENIOR

How does Instagram handle the fan-out problem for users with millions of followers?

ANSWER
Instagram uses a hybrid approach: posts from accounts with fewer than 10K followers are pushed into followers' feed caches when they are published, while posts from accounts with more than 10K followers are pulled at read time and merged into each follower's feed. This avoids writing to millions of feed caches for a single post.
FAQ · 4 QUESTIONS

Frequently Asked Questions

1. How does Instagram generate the feed for each user?
2. What's the difference between push-based and pull-based feed generation?
3. How do I handle the celebrity problem in system design?
4. What happens when a user unfollows someone in Instagram's system?