Senior 4 min · June 25, 2026

Design Instagram: The Real-World System Behind 500M Daily Active Users

Q: How does Instagram generate the feed for each user?

Instagram uses a hybrid fan-out. When an account with few followers posts, the new post is pushed into each follower's feed cache at upload time. Posts from accounts with many followers are not fanned out; followers pull them at read time and merge them with the cached entries. The merged feed is then ranked by a machine learning model.

Q: What's the difference between push-based and pull-based feed generation?

Push-based pre-computes the feed when a post is created, giving low read latency but high write cost. Pull-based computes the feed on read, reducing write cost but increasing read latency. Instagram uses push for small accounts and pull for large ones.

Q: How do I handle the celebrity problem in system design?

Do not fan out posts from accounts with many followers to everyone. Push only to a subset of their most active followers (e.g., the top 10K). For the rest, build the feed on read by fetching recent posts from the large accounts they follow. Put a mutex lock around feed regeneration to prevent a thundering herd.

Q: What happens when a user unfollows someone in Instagram's system?

The unfollow event triggers an async job that removes the unfollowed user's posts from the follower's feed cache. Until the job runs, stale posts may still appear, which is an acceptable cost of eventual consistency.

System design for Instagram: sharding, caching, feed generation, and the production traps that bring down photo-sharing at scale.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren

⚡Quick Answer

Instagram's system design relies on a combination of microservices for upload, feed generation, and storage, with a CDN for media delivery, a NoSQL database for user metadata, and a distributed cache for hot feeds. The key challenge is generating personalized feeds at low latency while handling write-heavy uploads.

✦ Definition · ~90s read

What is Design Instagram?

Design Instagram is a system design exercise that models how to build a scalable photo-sharing service handling millions of uploads, billions of feed views, and low-latency content delivery across a global user base.

Plain-English First

Think of Instagram like a massive photo album shared by the whole world. When you upload a photo, it's like putting a print into a central warehouse (object storage), then making copies to display in your friends' albums (feed generation). The tricky part is that millions of people are adding photos every second, and each person's album (feed) is different. You can't just show everyone the same photos—you have to sort and rank them just for you, and do it fast enough that you don't notice the delay.

Here's what everyone gets wrong about designing Instagram: they focus on the upload path. The real nightmare is the read path—generating a personalized feed for 500 million daily active users in under 500 milliseconds. I've seen teams burn months optimizing photo storage while their feed latency crawled to 10 seconds because they ignored fan-out. This article walks through the actual architecture that makes Instagram work at scale: the sharding strategy, the cache hierarchy, and the feed generation patterns that separate a working prototype from a production system. By the end, you'll know exactly how to design a photo-sharing service that doesn't fall over when a celebrity posts a selfie.

Why Instagram's Read Path Is the Hard Part

Most tutorials start with upload: client sends photo, server stores it, done. That's the easy 10%. The hard 90% is the read path: generating a feed for each user that's personalized, ranked, and delivered in under 500ms. Without a proper design, a single celebrity post can cause a thundering herd that takes down your feed service. The core problem is fan-out: when a user with 10 million followers posts a photo, you need to insert that post into 10 million feed caches. Do it synchronously and your write latency explodes. Do it lazily and followers see stale feeds. The real Instagram uses a hybrid approach: posts from accounts with few followers are pushed into followers' feed caches, posts from accounts with many followers are pulled at read time, and a follower-count threshold (e.g., 10K) decides which path a post takes.

FeedGeneration.systemdesign

// Hybrid fan-out, read side: merge pushed posts with pulled posts.
// Helpers (getCachedFeed, getFollowees, getRecentPosts, ...) are assumed to exist.
const PULL_THRESHOLD = 10000; // posts from accounts above this are not fanned out

function generateFeed(userId, limit) {
  const candidates = new Map(); // post ID -> post, so duplicates collapse
  // Posts from small accounts were pushed into this user's cache at write time
  for (const post of getCachedFeed(userId, limit)) {
    candidates.set(post.id, post);
  }
  // Posts from large accounts are pulled at read time
  for (const followeeId of getFollowees(userId)) {
    if (getUserFollowerCount(followeeId) > PULL_THRESHOLD) {
      for (const post of getRecentPosts(followeeId, limit)) {
        candidates.set(post.id, post);
      }
    }
  }
  // Rank the merged candidates and keep the top entries
  return rankPosts([...candidates.values()], userId).slice(0, limit);
}

Output

Returns up to "limit" scored post IDs, merged from the feed cache and read-time pulls, ranked by recency and relevance.

Production Trap: Synchronous Fan-Out

If you push every post to every follower synchronously, a single post from a celebrity can cause 10 million database writes. Your write throughput will collapse. Always use async fan-out with a message queue (Kafka) and batch writes to the feed cache.
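As a rough illustration of the batching half of that advice, the sketch below splits a follower list into fixed-size fan-out jobs that independent queue consumers could process. The function name `buildFanoutJobs`, the job shape, and the batch size of 1,000 are assumptions for illustration; none of them come from a Kafka API.

```javascript
// Split one post's fan-out into fixed-size jobs for a message queue.
// Hypothetical helper: the job shape and batch size are illustrative.
function buildFanoutJobs(postId, followerIds, batchSize = 1000) {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const jobs = [];
  for (let start = 0; start < followerIds.length; start += batchSize) {
    jobs.push({
      postId,
      // Each job carries only its own slice of followers
      followerIds: followerIds.slice(start, start + batchSize),
    });
  }
  return jobs;
}

// Example: 2,500 followers become three jobs (1000 + 1000 + 500)
const followers = Array.from({ length: 2500 }, (_, i) => `user-${i}`);
const jobs = buildFanoutJobs('post-42', followers);
console.log(jobs.length, jobs[2].followerIds.length); // 3 500
```

Each job can then be published as one queue message, so a slow or failed batch is retried on its own instead of blocking the whole fan-out.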

Sharding User Data Without Losing Your Mind

User data (profiles, followers, posts) needs to be sharded across databases. The naive approach is to shard by user ID modulo N. That works until you need to change N, which remaps almost every key, or until a hot user (celebrity) hammers a single shard. Instagram uses two-level sharding: a user ID hashes to one of a fixed, configurable number of logical shards, and each logical shard is mapped to a physical database. Resharding then means moving whole logical shards between databases and updating the map, which can be done without downtime. For the social graph (follows), they use a graph store (TAO) that stores edges as key-value pairs with locality. The key insight: always keep related data (user + their posts) on the same shard to avoid cross-shard queries.

ShardingStrategy.systemdesign

// Two-level sharding: logical shard -> physical database
// hash(), TOTAL_LOGICAL_SHARDS, and shardMap are assumed to be defined elsewhere
function getShard(userId) {
  // The logical shard count is fixed, so a user's logical shard never changes
  const logicalShard = hash(userId) % TOTAL_LOGICAL_SHARDS;
  // Map logical shard to physical DB using a config file
  const physicalDb = shardMap[logicalShard];
  return physicalDb;
}

// When resharding, update shardMap gradually and move data in background

Output

Returns the database connection for a given user ID.
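To make the resharding step concrete, here is a minimal runnable sketch of the same idea. The FNV-1a hash, the count of 4,096 logical shards, and the database names `db-a`, `db-b`, and `db-c` are all assumptions chosen for illustration, not details confirmed by the design above.

```javascript
// Two-level sharding sketch. All names and sizes are illustrative.
const TOTAL_LOGICAL_SHARDS = 4096; // fixed for the lifetime of the system

// FNV-1a 32-bit hash over UTF-16 code units: deterministic across processes
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function logicalShardFor(userId) {
  return fnv1a(String(userId)) % TOTAL_LOGICAL_SHARDS;
}

// Logical shard -> physical database. Starts with two databases.
const shardMap = new Map();
for (let shard = 0; shard < TOTAL_LOGICAL_SHARDS; shard++) {
  shardMap.set(shard, shard % 2 === 0 ? 'db-a' : 'db-b');
}

function getShard(userId) {
  return shardMap.get(logicalShardFor(userId));
}

// Resharding: after copying one logical shard's data, repoint only that shard.
const userId = 'user-12345';
const shard = logicalShardFor(userId);
const before = getShard(userId);
shardMap.set(shard, 'db-c');
const after = getShard(userId);
console.log(shard === logicalShardFor(userId), before, after);
```

The user's logical shard is unchanged by the move; only the map entry changes, which is why no key needs to be rehashed.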

Senior Shortcut: Pre-Join Data

Store a user's recent posts in the same database row as the user profile. This avoids a join across tables or shards when loading a profile page. Denormalize aggressively for read-heavy workloads.

Caching: The Only Way to Survive Peak Traffic

Instagram's feed is cached in Redis clusters. Each user's feed is a sorted set of post IDs with score = timestamp. When a user scrolls, they read from cache. Cache misses fall back to the pull-based generator. The cache is pre-warmed for the top 1% of users (by follower count) to handle sudden spikes. For media (photos, videos), a CDN (Akamai) caches at edge locations. The CDN cache hit ratio should be >95% for static content. If it drops below 90%, you're paying too much for origin bandwidth. The classic rookie mistake is caching the entire post object in Redis—cache only post IDs and metadata, fetch media URLs from CDN.

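CacheStrategy.systemdesign

// Cache each feed as a Redis sorted set of post IDs (score = timestamp).
// `redis` is an assumed client; calls are shown synchronously for brevity
// and argument order varies by library.
const FEED_TTL_SECONDS = 3600; // TTL 1 hour
const MAX_FEED_LENGTH = 500;   // illustrative cap so feeds don't grow unbounded

function addPostToFeeds(postId, userId, timestamp) {
  const followers = getFollowers(userId);
  for (const followerId of followers) {
    // hotUsers is assumed to be a Set of user IDs whose feeds are kept warm
    if (hotUsers.has(followerId)) {
      const key = `feed:${followerId}`;
      redis.zadd(key, timestamp, postId);
      redis.zremrangebyrank(key, 0, -(MAX_FEED_LENGTH + 1)); // keep the newest entries
      redis.expire(key, FEED_TTL_SECONDS);
    }
  }
}

// Read feed: try cache first, then generate
function getFeed(userId, limit) {
  const cached = redis.zrevrange(`feed:${userId}`, 0, limit - 1);
  if (cached.length > 0) return cached;
  return generateFeed(userId, limit);
}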

Output

List of post IDs from cache or generated on the fly.

Interview Gold: Cache Invalidation

When a user unfollows someone, you must delete that user's posts from the follower's feed cache. Do this asynchronously with a background job. Otherwise, stale posts appear in feeds indefinitely.
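The body of that background job can be sketched as below. Because the feed cache holds only post IDs, the job needs some way to map a post ID back to its author; the in-memory `feedCache` and `authorOfPost` maps here are hypothetical stand-ins for Redis and a database lookup.

```javascript
// In-memory stand-ins for the feed cache and a postId -> authorId lookup.
// Both structures are hypothetical; a real system would use Redis and a DB.
const feedCache = new Map([
  ['alice', ['p9', 'p8', 'p7', 'p6']], // newest first
]);
const authorOfPost = new Map([
  ['p9', 'bob'], ['p8', 'carol'], ['p7', 'bob'], ['p6', 'dave'],
]);

// Background job body: drop every post by `unfollowedId` from one feed.
function removeAuthorFromFeed(followerId, unfollowedId) {
  const feed = feedCache.get(followerId) ?? [];
  const kept = feed.filter((postId) => authorOfPost.get(postId) !== unfollowedId);
  feedCache.set(followerId, kept);
  return feed.length - kept.length; // number of entries removed
}

const removed = removeAuthorFromFeed('alice', 'bob');
console.log(removed, feedCache.get('alice')); // 2 [ 'p8', 'p6' ]
```

Running the job twice is harmless: the second pass finds nothing to remove, which matters when the queue redelivers an unfollow event.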

Upload Pipeline: From Client to CDN

When a user uploads a photo, the client sends it directly to the media upload endpoint in front of the CDN's origin storage, not through your application servers. The upload returns a URL. Your server then receives only metadata (URL, caption, location) and stores it in the database. This offloads bandwidth from your servers. The CDN also handles resizing and format conversion (WebP, AVIF). The upload service is stateless and can be scaled horizontally. The bottleneck is the database write for the post metadata. Use a durable log (Kafka) to buffer writes and batch them into the database. At this scale, avoid a synchronous database write on every upload; you'll saturate the disk I/O.

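UploadPipeline.systemdesign

// Upload handler: receive metadata, queue for DB write.
// `kafka` and `db` are assumed client wrappers, not a specific library's API.
function handleUpload(req, res) {
  const { userId, imageUrl, caption, location } = req.body;
  // Validate before queueing; in production, take userId from the auth session
  if (!userId || !imageUrl) {
    return res.status(400).json({ message: 'userId and imageUrl are required' });
  }
  const post = { userId, imageUrl, caption, location, timestamp: Date.now() };
  // Send to Kafka for async batch insert
  kafka.send('post-uploads', post);
  res.status(202).json({ message: 'Upload accepted' });
}

// Kafka consumer batches writes: flush on size or on a timer,
// so a partial batch is never stranded when traffic is low
function batchInsertPosts() {
  let batch = [];
  const flush = () => {
    if (batch.length === 0) return;
    const toWrite = batch;
    batch = [];
    db.insertMany(toWrite);
  };
  kafka.consume('post-uploads', (post) => {
    batch.push(post);
    if (batch.length >= 100) flush();
  });
  setInterval(flush, 1000); // flush at least once per second
}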

Output

HTTP 202 Accepted response. Post metadata is eventually consistent in the database.

Never Do This: Synchronous DB Write on Upload

Writing to the database on every upload will cause connection pool exhaustion under load. You'll see 'Error: Connection pool exhausted' in your logs. Always buffer writes with a queue.
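One way to buffer those writes is a batcher that flushes on size or on age, whichever comes first. The sketch below is an assumption-laden illustration: `PostBatcher`, its option names, and the `writeBatch` callback are hypothetical, and the clock is injectable only so the behavior can be checked deterministically.

```javascript
// Size-or-age batcher with an injectable clock so behavior is deterministic.
// Hypothetical helper; `writeBatch` stands in for a bulk database insert.
class PostBatcher {
  constructor(writeBatch, { maxSize = 100, maxAgeMs = 1000, now = Date.now } = {}) {
    this.writeBatch = writeBatch;
    this.maxSize = maxSize;
    this.maxAgeMs = maxAgeMs;
    this.now = now;
    this.batch = [];
    this.oldestAt = null; // arrival time of the oldest buffered post
  }

  add(post) {
    if (this.batch.length === 0) this.oldestAt = this.now();
    this.batch.push(post);
    if (this.batch.length >= this.maxSize) this.flush();
  }

  // Call periodically (e.g., from a timer) to flush partial batches.
  tick() {
    if (this.batch.length > 0 && this.now() - this.oldestAt >= this.maxAgeMs) {
      this.flush();
    }
  }

  flush() {
    if (this.batch.length === 0) return;
    const toWrite = this.batch;
    this.batch = [];
    this.oldestAt = null;
    this.writeBatch(toWrite);
  }
}

// Example with a fake clock: 3 posts hit the size limit, 1 waits for the timer.
let fakeTime = 0;
const written = [];
const batcher = new PostBatcher((rows) => written.push(rows.length), {
  maxSize: 3, maxAgeMs: 1000, now: () => fakeTime,
});
for (let i = 0; i < 4; i++) batcher.add({ id: i });
fakeTime = 1500;
batcher.tick();
console.log(written); // [ 3, 1 ]
```

The age limit bounds how long an accepted upload can sit unwritten, which is the number to quote when someone asks how stale the database can be.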

Feed Ranking: Beyond Recency

Instagram's feed isn't just chronological. It's ranked by a machine learning model that considers affinity (how often you interact with the poster), timeliness, and content type. The ranking service runs as a separate microservice that takes a list of candidate post IDs and returns a scored list. The model is updated daily. The challenge is latency: ranking must complete in under 100ms. Use a lightweight model (e.g., logistic regression with feature precomputation) rather than a deep neural network. Precompute features like 'average likes per post from this user' and store them in a key-value store (Cassandra).

FeedRanking.systemdesign

// Simplified ranking: score = affinity * recency
function rankPosts(posts, userId) {
  const now = Date.now();
  return posts.map(post => {
    const affinity = getAffinity(userId, post.userId); // 0-1
    // Age in hours, clamped at 0 so clock skew can't produce a negative age
    const ageHours = Math.max(0, now - post.timestamp) / 3600000;
    // 1 for a brand-new post, decaying toward 0; never divides by zero
    const recency = 1 / (1 + ageHours);
    return { postId: post.id, score: affinity * recency };
  }).sort((a, b) => b.score - a.score);
}

Output

Sorted list of post IDs with scores.

Senior Shortcut: Precompute Affinity

Don't compute affinity on the fly. Run a daily batch job that calculates affinity scores for each user-followee pair and stores them in a cache. This reduces ranking latency from 500ms to 10ms.
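A batch job of that kind might look like the sketch below. The interaction event shape and the normalization rule (divide by each viewer's most-interacted-with author) are assumptions for illustration; the article does not specify how affinity is actually computed.

```javascript
// Batch job sketch: turn raw interaction events into 0-1 affinity scores.
// The event shape and the normalization rule are assumptions for illustration.
function computeAffinities(interactions) {
  const countsByViewer = new Map(); // viewerId -> Map(authorId -> count)
  for (const { viewerId, authorId } of interactions) {
    if (!countsByViewer.has(viewerId)) countsByViewer.set(viewerId, new Map());
    const counts = countsByViewer.get(viewerId);
    counts.set(authorId, (counts.get(authorId) ?? 0) + 1);
  }

  const affinities = new Map(); // viewerId -> Map(authorId -> score in (0, 1])
  for (const [viewerId, counts] of countsByViewer) {
    let max = 0;
    for (const count of counts.values()) max = Math.max(max, count);
    const scores = new Map();
    for (const [authorId, count] of counts) scores.set(authorId, count / max);
    affinities.set(viewerId, scores);
  }
  return affinities;
}

// Read-time lookup: a plain map read, with 0 for pairs that never interacted.
function getAffinity(affinities, viewerId, authorId) {
  return affinities.get(viewerId)?.get(authorId) ?? 0;
}

const events = [
  { viewerId: 'alice', authorId: 'bob' },
  { viewerId: 'alice', authorId: 'bob' },
  { viewerId: 'alice', authorId: 'bob' },
  { viewerId: 'alice', authorId: 'bob' },
  { viewerId: 'alice', authorId: 'carol' },
];
const affinities = computeAffinities(events);
console.log(getAffinity(affinities, 'alice', 'bob'), getAffinity(affinities, 'alice', 'carol')); // 1 0.25
```

The ranking path then pays for one lookup per candidate post instead of scanning interaction history.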

Handling the Celebrity Problem

A user with 50 million followers posts a photo. If you push to all followers, that single post triggers a burst of 50 million feed-cache writes, and doing it synchronously stalls the write path. The solution: don't fan out posts from users with >10K followers. The post is written once, to a per-author list of recent posts, and each follower's feed pulls from that list at read time and merges the result with the pushed entries already in its cache. Additionally, cap the fan-out: push only to the 10K most active followers and let the rest pull. This is exactly what Instagram does.

CelebrityFanout.systemdesign

// Fan-out with threshold
function fanoutPost(post, userId) {
  const followerCount = getUserFollowerCount(userId);
  // Small accounts push to everyone; large accounts push only to their
  // 10K most active followers and let the rest pull on read
  const targets = followerCount <= 10000
    ? getFollowers(userId)
    : getActiveFollowers(userId, 10000);
  for (const followerId of targets) {
    redis.zadd(`feed:${followerId}`, post.timestamp, post.id);
  }
  if (followerCount > 10000) {
    // Record the post for pull-based retrieval. A sorted set (not a plain set)
    // keeps posts ordered by time so readers can fetch the most recent ones.
    redis.zadd(`recent-posts:${userId}`, post.timestamp, post.id);
  }
}

Output

Post is added to feeds of active followers; others fetch on read.

Production Trap: Thundering Herd

When a celebrity posts, the pull-based followers will all try to generate their feed simultaneously. Use a cache-aside pattern with a mutex lock per user to prevent multiple requests from regenerating the same feed. Otherwise, your database will be hammered.
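The per-key mutex can be approximated inside one process with a "single-flight" map of pending promises, sketched below. This is a swapped-in, in-process technique: the names `singleFlight` and `inFlight` are hypothetical, and coordinating across servers would need a distributed lock that this sketch does not attempt.

```javascript
// Single-flight sketch: concurrent callers for the same key share one
// in-progress computation. In-process only; not a distributed lock.
const inFlight = new Map(); // key -> Promise of the pending result

function singleFlight(key, compute) {
  if (inFlight.has(key)) return inFlight.get(key);
  const promise = Promise.resolve()
    .then(compute)
    .finally(() => inFlight.delete(key)); // allow a fresh run next time
  inFlight.set(key, promise);
  return promise;
}

// Example: three concurrent requests for one feed trigger one regeneration.
let regenerations = 0;
async function regenerateFeed() {
  regenerations += 1;
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulate slow work
  return ['p3', 'p2', 'p1'];
}

async function demo() {
  const results = await Promise.all([
    singleFlight('feed:alice', regenerateFeed),
    singleFlight('feed:alice', regenerateFeed),
    singleFlight('feed:alice', regenerateFeed),
  ]);
  return { regenerations, sameResult: results.every((r) => r === results[0]) };
}

demo().then((outcome) => console.log(outcome)); // { regenerations: 1, sameResult: true }
```

Once the pending promise settles, the key is released, so the next cache miss starts a new regeneration instead of reusing a stale result.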

Database Choice: SQL vs NoSQL

Instagram uses PostgreSQL for user metadata and posts, Cassandra for the social graph (follows), and Redis for caching. (The sharding section names TAO for the graph; what matters for this design is the role, a store built for very high write throughput on follow edges.) Why not all in one? PostgreSQL provides strong consistency for transactions (e.g., user registration). Cassandra provides high write throughput for the constant stream of follows and unfollows. Redis provides low-latency reads for feeds. The trade-off: eventual consistency between systems. A user might follow someone and not see their posts for a few seconds. That's acceptable. At this scale, a single monolithic database will hit scaling limits; below it, one database is fine.

The Classic Bug: Cross-Shard Joins

If you store user profiles in PostgreSQL and posts in Cassandra, don't try to join them in application code. You'll end up with N+1 queries. Instead, denormalize: store a user's recent post IDs in the user profile row.
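A minimal sketch of that denormalization: keep a capped, newest-first list of post IDs on the profile record and update it whenever the user posts. The record shape, the helper name `withNewPost`, and the cap of 12 are assumptions for illustration.

```javascript
// Keep a capped, newest-first list of post IDs on the profile record itself.
// The record shape and the cap of 12 are assumptions for illustration.
const RECENT_POSTS_CAP = 12;

function withNewPost(profile, postId) {
  // Return a new record rather than mutating the input
  const recentPostIds = [postId, ...profile.recentPostIds].slice(0, RECENT_POSTS_CAP);
  return { ...profile, recentPostIds };
}

let profile = { userId: 'alice', displayName: 'Alice', recentPostIds: [] };
for (let i = 1; i <= 15; i++) profile = withNewPost(profile, `p${i}`);
console.log(profile.recentPostIds.length, profile.recentPostIds[0]); // 12 p15
```

Loading a profile page then needs one read for the profile plus one batched fetch for the listed post IDs, rather than a query per post.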

When Not to Use This Design

This architecture is overkill for a photo-sharing app with fewer than 1 million users. If you're building an MVP, use a monolithic backend with a single PostgreSQL database and a CDN for images. The complexity of microservices, sharding, and async fan-out will slow you down. Only adopt these patterns when you see specific pain points: feed latency >1 second, database CPU >80%, or upload failures due to write contention. Also, if your app is read-heavy but not write-heavy (e.g., a gallery), a simpler pull-based feed with a CDN cache is sufficient.

Senior Shortcut: Start Simple

For the first 100K users, a single PostgreSQL instance with a Redis cache and a CDN will handle the load. Don't pre-optimize. When you hit 1M users, start sharding. When you hit 10M, add async fan-out. Premature scaling is the root of all evil.

● Production incident · POST-MORTEM · severity: high

The 4GB Container That Kept Dying

Symptom

Feed generation service containers were OOM-killed every 30 minutes during peak hours.

Assumption

Memory leak in feed ranking algorithm.

Root cause

Each feed generation request loaded the entire user graph (followers, followees) into memory for every user. With 10K concurrent requests, memory spiked to 6GB per container despite 4GB limit.

Fix

Changed to lazy-load user graph with a TTL cache (Redis) and reduced batch size to 100 users per request. Set container memory to 8GB with a 6GB soft limit.

Key lesson

Never load the full social graph in memory per request—cache it with a TTL and paginate the fan-out.
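The two halves of that lesson can be sketched together: a TTL cache in front of the graph lookup, and a generator that walks IDs one page at a time. The fix above used Redis for the cache; this in-memory `TtlCache` class, the `pages` helper, and the injectable clock are hypothetical and only illustrate the idea.

```javascript
// TTL cache sketch with an injectable clock. A real deployment would use
// Redis with key expiry; this in-memory version only illustrates the idea.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  // Return the cached value, or load and cache it if missing or expired.
  getOrLoad(key, load) {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > this.now()) return entry.value;
    const value = load(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Page through a large ID list instead of processing all of it at once.
function* pages(ids, pageSize) {
  for (let start = 0; start < ids.length; start += pageSize) {
    yield ids.slice(start, start + pageSize);
  }
}

let fakeNow = 0;
let loads = 0;
const followeeCache = new TtlCache(60000, () => fakeNow);
const loadFollowees = () => { loads += 1; return ['u1', 'u2', 'u3', 'u4', 'u5']; };

followeeCache.getOrLoad('alice', loadFollowees); // miss: loads
followeeCache.getOrLoad('alice', loadFollowees); // hit: served from cache
fakeNow = 61000;
const followees = followeeCache.getOrLoad('alice', loadFollowees); // expired: reloads
const pageSizes = [...pages(followees, 2)].map((page) => page.length);
console.log(loads, pageSizes); // 2 [ 2, 2, 1 ]
```

Note that this sketch never evicts expired entries, so a production version would also need a size bound to keep the cache itself from becoming the memory problem.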

Production debug guide

Systematic recovery paths for the failure modes engineers actually hit. 3 entries

Symptom · 01

Feed loading slowly for all users

→

Fix

1. Check Redis cache hit ratio (should be >90%). 2. If low, check for cache stampede: add mutex locks per user. 3. Pre-warm cache for top 1% users.

Symptom · 02

Uploads failing with 503

→

Fix

1. Check Kafka consumer lag. 2. Increase number of consumers. 3. If DB write throughput is bottleneck, batch larger (e.g., 500 per batch).

Symptom · 03

Feed inconsistent across devices

→

Fix

1. Check if fan-out is async and eventual consistency is expected. 2. If not, add version vector to feed cache entries. 3. Force cache invalidation on unfollow.

★ Design Instagram Triage Cheat Sheet

First-response commands for when things go wrong, copy-paste ready.

Feed latency >2s for all users

Immediate action

Check Redis cache hit ratio

Commands

redis-cli INFO stats | grep -E 'keyspace_(hits|misses)'   # hit ratio = hits / (hits + misses)

redis-cli --bigkeys

Fix now

Increase Redis memory or add replicas. Pre-warm cache for top users.
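Redis reports raw `keyspace_hits` and `keyspace_misses` counters in `INFO stats` rather than a ready-made ratio, so the ratio has to be computed. The helper below is a small sketch that does this from the command's text output; the function name `hitRatioFromInfo` and the sample text are made up for illustration.

```javascript
// Compute the cache hit ratio from the text returned by `redis-cli INFO stats`.
// keyspace_hits and keyspace_misses are real fields in that output.
function hitRatioFromInfo(infoText) {
  const read = (field) => {
    const match = infoText.match(new RegExp(`^${field}:(\\d+)`, 'm'));
    return match ? Number(match[1]) : 0;
  };
  const hits = read('keyspace_hits');
  const misses = read('keyspace_misses');
  const total = hits + misses;
  return total === 0 ? null : hits / total; // null when there is no traffic yet
}

const sample = 'total_commands_processed:1000\r\nkeyspace_hits:900\r\nkeyspace_misses:100\r\n';
console.log(hitRatioFromInfo(sample)); // 0.9
```

These counters are cumulative since the server started (or since the stats were last reset), so compare two samples taken a minute apart to see the current ratio rather than the lifetime average.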

Uploads returning 503

Feed shows stale posts after unfollow

CDN miss ratio >10%

Feature / Aspect	Push-Based Feed	Pull-Based Feed
Write Latency	High (fan-out to all followers)	Low (only write post metadata)
Read Latency	Low (pre-computed)	High (compute on read)
Consistency	Immediate	Eventual (seconds delay)
Best For	Users with <10K followers	Users with >10K followers

Key takeaways

The read path (feed generation) is the hard part, not the upload path. Focus on fan-out and caching.

Hybrid push/pull fan-out with a threshold (e.g., 10K followers) prevents write amplification from celebrities.

Cache only post IDs in Redis, not full objects. Use CDN for media delivery with >95% hit ratio.

Start monolithic, scale with sharding and async processing only when you see real pain points.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR: How does Instagram handle the fan-out problem for users with millions of...

Q02 · SENIOR: When would you choose push-based feed over pull-based in a production sy...

Q03 · SENIOR: What happens when a celebrity posts and 10 million followers pull their ...

Q04 · JUNIOR: What database would you use for the social graph (follows) and why?

Q05 · SENIOR: A user reports that their feed shows a post from someone they unfollowed...

Q06 · SENIOR: How would you design the feed ranking to scale to 500M users?

Q01 of 06 · SENIOR

How does Instagram handle the fan-out problem for users with millions of followers?

ANSWER

Instagram uses a hybrid approach. Posts from accounts with fewer than 10K followers are pushed: each follower's feed cache is updated at post time. Posts from accounts with more than 10K followers are pulled: followers fetch them at read time and merge them into their feed. This avoids writing to millions of feed caches for a single post.
