Design Instagram: The Real-World System Behind 500M Daily Active Users
System design for Instagram: sharding, caching, feed generation, and the production traps that bring down photo-sharing at scale..
20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.
Instagram's system design relies on a combination of microservices for upload, feed generation, and storage, with a CDN for media delivery, a NoSQL database for user metadata, and a distributed cache for hot feeds. The key challenge is generating personalized feeds at low latency while handling write-heavy uploads.
Think of Instagram like a massive photo album shared by the whole world. When you upload a photo, it's like putting a print into a central warehouse (object storage), then making copies to display in your friends' albums (feed generation). The tricky part is that millions of people are adding photos every second, and each person's album (feed) is different. You can't just show everyone the same photos—you have to sort and rank them just for you, and do it fast enough that you don't notice the delay.
Here's what everyone gets wrong about designing Instagram: they focus on the upload path. The real nightmare is the read path—generating a personalized feed for 500 million daily active users in under 500 milliseconds. I've seen teams burn months optimizing photo storage while their feed latency crawled to 10 seconds because they ignored fan-out. This article walks through the actual architecture that makes Instagram work at scale: the sharding strategy, the cache hierarchy, and the feed generation patterns that separate a working prototype from a production system. By the end, you'll know exactly how to design a photo-sharing service that doesn't fall over when a celebrity posts a selfie.
Why Instagram's Read Path Is the Hard Part
Most tutorials start with upload: client sends photo, server stores it, done. That's the easy 10%. The hard 90% is the read path—generating a feed for each user that's personalized, ranked, and delivered under 500ms. Without a proper design, a single celebrity post can cause a thundering herd that takes down your feed service. The core problem is fan-out: when a user with 10 million followers posts a photo, you need to insert that post into 10 million feed caches. Do it synchronously and your write latency explodes. Do it lazily and followers see stale feeds. The real Instagram uses a hybrid approach: push for users with few followers, pull for users with many followers, and a threshold (e.g., 10K followers) to switch.
Sharding User Data Without Losing Your Mind
User data—profiles, followers, posts—needs to be sharded across databases. The naive approach is to shard by user ID modulo N. That works until a hot user (celebrity) causes a single shard to be hammered. Instagram uses a two-level sharding: first by user ID hash, then by a configurable number of logical shards per physical database. This allows resharding without downtime. For the social graph (follows), they use a graph database (TAO) that stores edges as key-value pairs with locality. The key insight: always keep related data (user + their posts) on the same shard to avoid cross-shard queries.
Caching: The Only Way to Survive Peak Traffic
Instagram's feed is cached in Redis clusters. Each user's feed is a sorted set of post IDs with score = timestamp. When a user scrolls, they read from cache. Cache misses fall back to the pull-based generator. The cache is pre-warmed for the top 1% of users (by follower count) to handle sudden spikes. For media (photos, videos), a CDN (Akamai) caches at edge locations. The CDN cache hit ratio should be >95% for static content. If it drops below 90%, you're paying too much for origin bandwidth. The classic rookie mistake is caching the entire post object in Redis—cache only post IDs and metadata, fetch media URLs from CDN.
Upload Pipeline: From Client to CDN
When a user uploads a photo, the client sends it directly to a CDN upload endpoint (not your server). The CDN returns a URL. Your server then receives only metadata (URL, caption, location) and stores it in the database. This offloads bandwidth from your servers. The CDN also handles resizing and format conversion (WebP, AVIF). The upload service is stateless and can be scaled horizontally. The bottleneck is the database write for the post metadata. Use a write-ahead log (Kafka) to buffer writes and batch them into the database. Never write directly to the database on every upload—you'll saturate the disk I/O.
Feed Ranking: Beyond Recency
Instagram's feed isn't just chronological. It's ranked by a machine learning model that considers affinity (how often you interact with the poster), timeliness, and content type. The ranking service runs as a separate microservice that takes a list of candidate post IDs and returns a scored list. The model is updated daily. The challenge is latency: ranking must complete in under 100ms. Use a lightweight model (e.g., logistic regression with feature precomputation) rather than a deep neural network. Precompute features like 'average likes per post from this user' and store them in a key-value store (Cassandra).
Handling the Celebrity Problem
A user with 50 million followers posts a photo. If you push to all followers synchronously, your feed cache write rate spikes to 50 million writes per second. The solution: use a pull-based model for users with >10K followers. Their feed is generated on read by fetching recent posts from followees. For the celebrity's own feed, they see a push-based feed of their own posts. Additionally, rate-limit the fan-out: only push to the first 10K followers (the most active ones) and let the rest pull. This is exactly what Instagram does.
Database Choice: SQL vs NoSQL
Instagram uses PostgreSQL for user metadata and posts, Cassandra for the social graph (follows), and Redis for caching. Why not all in one? PostgreSQL provides strong consistency for transactions (e.g., user registration). Cassandra provides high write throughput for the social graph (millions of follows/unfollows per second). Redis provides low-latency reads for feeds. The trade-off: eventual consistency between systems. A user might follow someone and not see their posts for a few seconds. That's acceptable. Never use a single monolithic database—you'll hit scaling limits.
When Not to Use This Design
This architecture is overkill for a photo-sharing app with fewer than 1 million users. If you're building an MVP, use a monolithic backend with a single PostgreSQL database and a CDN for images. The complexity of microservices, sharding, and async fan-out will slow you down. Only adopt these patterns when you see specific pain points: feed latency >1 second, database CPU >80%, or upload failures due to write contention. Also, if your app is read-heavy but not write-heavy (e.g., a gallery), a simpler pull-based feed with a CDN cache is sufficient.
The 4GB Container That Kept Dying
- Never load the full social graph in memory per request—cache it with a TTL and paginate the fan-out.
redis-cli INFO stats | grep hit_rateredis-cli --bigkeysKey takeaways
Interview Questions on This Topic
How does Instagram handle the fan-out problem for users with millions of followers?
Frequently Asked Questions
20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.
That's Real World. Mark it forged?
4 min read · try the examples if you haven't