Design Reddit: Building a Scalable Social News Platform from Scratch
Learn how to design Reddit's core features: scalable feed, voting, comments, and subreddits.
20+ years shipping large-scale distributed systems. Notes here come from systems that actually shipped.
Design Reddit by separating read and write paths: use a distributed database for posts/comments, a caching layer for hot feeds, and a message queue for asynchronous vote aggregation. Subreddits are partitions. The feed is precomputed and cached per user session.
Imagine a giant corkboard where anyone can pin a note. People walk by and either thumbs-up or thumbs-down each note. The most popular notes float to the top. Now imagine millions of people doing this every second — you need a system that doesn't collapse. That's Reddit. You need a way to collect notes fast, count votes without fighting over the same paper, and show each person a personalized board without recalculating everything from scratch.
Reddit serves 430 million monthly active users. Every second, someone submits a post, casts a vote, or writes a comment. The naive approach — a single SQL database with joins — would collapse under its own weight within minutes. The real challenge isn't the feature set; it's the scale. This article walks you through building a Reddit clone that won't fall over when you hit the front page of Hacker News. You'll learn how to design the data model, handle the hot path of voting, serve personalized feeds, and partition subreddits — all with production-tested patterns. By the end, you'll be able to architect a social platform that handles millions of concurrent users without breaking a sweat.
Data Model: The Foundation That Won't Crumble
Start with the data model. Reddit's core entities are users, posts, comments, votes, and subreddits. The naive approach is a normalized relational schema with foreign keys everywhere. That works for a few thousand users. At Reddit's scale, joins become death. You need to denormalize strategically. For posts, store the subreddit ID, author ID, title, URL or text, and a score (denormalized from votes). For comments, use a materialized path — a string like 'root_id.parent_id.child_id' — so you can fetch an entire thread with a single index scan. Votes are the hottest path: every upvote/downvote is a write. Store them in a separate table with a composite primary key (user_id, post_id) to enforce one vote per user per post. But don't update the post score synchronously — that's a write bottleneck. Instead, use a message queue to batch vote updates. The subreddit is a partition key: shard posts and comments by subreddit ID. This keeps hot subreddits from starving cold ones.
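The comment and vote tables described above can be sketched as follows. This is a minimal illustration using SQLite from the Python standard library, not a production schema; the table names, column names, and helper functions (make_db, add_comment, fetch_subtree, cast_vote) are assumptions made for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with the comments and votes tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE comments (
            id      INTEGER PRIMARY KEY,
            post_id INTEGER NOT NULL,
            path    TEXT    NOT NULL,  -- materialized path, e.g. '1.2.4'
            body    TEXT    NOT NULL
        );
        CREATE INDEX idx_comments_post_path ON comments (post_id, path);
        CREATE TABLE votes (
            user_id INTEGER NOT NULL,
            post_id INTEGER NOT NULL,
            value   INTEGER NOT NULL CHECK (value IN (-1, 1)),
            PRIMARY KEY (user_id, post_id)  -- one vote per user per post
        );
    """)
    return conn


def add_comment(conn, comment_id, post_id, parent_path, body):
    """Insert a comment; its path is the parent's path plus its own id."""
    path = f"{parent_path}.{comment_id}" if parent_path else str(comment_id)
    conn.execute(
        "INSERT INTO comments (id, post_id, path, body) VALUES (?, ?, ?, ?)",
        (comment_id, post_id, path, body),
    )
    return path


def fetch_subtree(conn, post_id, root_path):
    """Fetch a comment and all of its descendants with one prefix match."""
    rows = conn.execute(
        "SELECT id, path FROM comments "
        "WHERE post_id = ? AND (path = ? OR path LIKE ?) ORDER BY path",
        (post_id, root_path, root_path + ".%"),
    )
    return [(row[0], row[1]) for row in rows]


def cast_vote(conn, user_id, post_id, value):
    """Record a vote; a second vote by the same user replaces the first."""
    conn.execute(
        "INSERT OR REPLACE INTO votes (user_id, post_id, value) VALUES (?, ?, ?)",
        (user_id, post_id, value),
    )
```

Note that ordering by a plain string path only matches tree order when the id segments have the same width; a real system would zero-pad the segments or use a dedicated type such as Postgres ltree.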
Voting: The Hot Path That Burns the Naive
Voting is the highest-write-throughput operation on Reddit. Every upvote or downvote is a write to the votes table and an update to the post's score. If you do both synchronously in one transaction, every vote on the same post queues behind the lock on that post's row, so a popular post serializes its own writes. The fix: decouple vote recording from the score update. Write the vote to a fast append-only log (Kafka, Pulsar) and let a consumer batch-update the post score every few seconds. The displayed score becomes eventually consistent, but users don't notice a 5-second lag. For the live count, keep a counter in Redis (INCR/DECR) as a hot cache and persist it to Postgres asynchronously. This pattern is called a write-behind cache. The classic rookie mistake is updating the post score in the same transaction as the vote insert: under high concurrency you get long lock waits on the post row, and 'deadlock detected' errors once transactions touch more than one row in different orders.
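The batching step can be sketched like this. It is an in-memory simulation only: VoteLog stands in for a Kafka or Pulsar topic, the scores dict stands in for the posts table, and all names (VoteLog, aggregate, apply_batch) are hypothetical. The point it illustrates is that many vote events on one post collapse into a single score update.

```python
from collections import Counter, deque


class VoteLog:
    """In-memory stand-in for an append-only log such as a Kafka topic."""

    def __init__(self):
        self._events = deque()

    def append(self, post_id, delta):
        """Record one vote event; delta is +1 or -1."""
        self._events.append((post_id, delta))

    def drain(self, max_events):
        """Remove and return up to max_events events, oldest first."""
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch


def aggregate(batch):
    """Collapse a batch of vote events into one net delta per post."""
    deltas = Counter()
    for post_id, delta in batch:
        deltas[post_id] += delta
    # Posts whose votes cancel out need no write at all.
    return {post_id: d for post_id, d in deltas.items() if d != 0}


def apply_batch(scores, log, max_events=1000):
    """Apply one batch to the score store; returns the number of row updates."""
    deltas = aggregate(log.drain(max_events))
    for post_id, delta in deltas.items():
        scores[post_id] = scores.get(post_id, 0) + delta
    return len(deltas)
```

In this sketch seven vote events across two posts can end up as a single row update, which is where the write savings come from. A real consumer would also need to commit log offsets only after the database write succeeds.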
Feed Generation: Precompute or Die
Reddit's front page and subreddit feeds are the most-read data. Generating them on the fly by scanning all posts and sorting by score is far too expensive at scale. Instead, precompute the feed for each subreddit every few minutes and cache it. For the home feed (the aggregate of a user's subscribed subreddits), merge the top N posts from each subscribed subreddit's cached feed. Use a fan-out-on-write pattern for high-engagement users: while a user is active, push new posts from their subscribed subreddits into a per-user Redis list. For the rest, use a pull-based approach: on request, fetch the top posts from each subscribed subreddit's cache and merge them. The trade-off: fan-out writes more data but gives real-time feeds for active users. Pull is simpler and works for the vast majority of users. A hybrid covers both: active users get fan-out, others get pull. The key insight: don't sort the entire post set. Use a hotness score that decays over time, and only keep the top 1000 posts per subreddit in cache.
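A small sketch of the pull path: a time-decayed hotness score and a merge of already-sorted per-subreddit feeds. The decay formula and its gravity parameter are assumptions chosen for illustration (a common score-over-age shape), not Reddit's actual ranking function, and the function names are hypothetical.

```python
import heapq
from itertools import islice


def hotness(score, age_hours, gravity=1.8):
    """Score that decays with age; higher gravity means faster decay."""
    return score / (age_hours + 2) ** gravity


def merge_feeds(cached_feeds, limit):
    """Merge per-subreddit feeds into one home feed.

    Each feed is a list of (hotness, post_id) tuples already sorted from
    hottest to coldest, as it would be stored in the cache. heapq.merge
    walks the lists lazily, so only about `limit` entries are examined.
    """
    merged = heapq.merge(*cached_feeds, key=lambda entry: entry[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

Because each cached feed is already sorted, the merge never needs to sort the full post set, which is the property the section relies on.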
Subreddits as Partitions: The Scalability Multiplier
Subreddits are the natural partition key. Each subreddit is independent: posts, comments, and votes are scoped to a subreddit. Shard your database by subreddit_id. Use consistent hashing to map subreddit IDs to shards by default, plus a lookup table (subreddit_id -> shard_id) that overrides the hash and can be updated without downtime. With the override, a hot subreddit (like r/AskReddit) can be pinned to its own shard so it doesn't affect others. For cross-subreddit queries (home feed), you query all relevant shards in parallel and merge results. This is a scatter-gather pattern. The downside: adding a new shard requires rebalancing, though consistent hashing limits the move to the keys that land on the new shard. For the comment tree, comments are partitioned with their post (which lives under a subreddit), so the entire tree lives on one shard. That's fine: a single post's comment tree fits on one node. The gotcha: if a subreddit grows too hot (e.g., r/place), you may need to split it further. Use a two-level partition: subreddit_id, then hash of post_id.
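The routing logic can be sketched as a consistent hash ring with a lookup-table override. This is a minimal illustration: the class name ShardRouter, the shard names, and the number of virtual nodes are all assumptions, and a real deployment would keep the override table in a shared store rather than in process memory.

```python
import bisect
import hashlib


def _hash(key):
    """Stable 64-bit hash of a string key."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ShardRouter:
    """Maps subreddit IDs to shards via a hash ring, with manual overrides."""

    def __init__(self, shards, vnodes=64):
        # Each shard owns several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{shard}#{i}"), shard) for shard in shards for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self.overrides = {}  # subreddit_id -> shard_id lookup table

    def shard_for(self, subreddit_id):
        """Return the pinned shard if one exists, else the ring successor."""
        if subreddit_id in self.overrides:
            return self.overrides[subreddit_id]
        idx = bisect.bisect(self._points, _hash(str(subreddit_id)))
        return self._ring[idx % len(self._ring)][1]
```

Pinning a hot subreddit is then a single table update, for example router.overrides[42] = "shard-hot", and adding a shard to the ring only moves the subreddits that now hash to the new shard.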
Caching Strategy: Three Layers Deep
Reddit's read-to-write ratio is about 80:20. Caching is critical. Use three layers: a CDN for static assets (images, CSS), Redis for hot data (feeds, scores, user sessions), and Memcached for less hot data (user profiles, subreddit info). The feed cache is the most important. Cache the top 1000 posts per subreddit with a TTL of 5 minutes. For the home feed, cache the merged result per user for 1 minute. On the read side, use the cache-aside pattern: on a miss, load from the DB and populate the cache. On the write side, update the cache when a vote lands (write-through), but only the post's score, not the entire feed. The feed is eventually consistent. The classic mistake: caching entire post objects in the feed. Cache only post IDs and scores, then fetch full post data on demand (with a separate cache). This reduces cache size and avoids invalidating feeds on post edits.
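The cache-aside read path for a subreddit feed can be sketched as below. TTLCache is an in-memory stand-in for Redis so the example runs on its own; the class, the get_subreddit_feed helper, and the load_from_db callback are hypothetical names, and the key format simply follows the 'subreddit:<id>:top' shape.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-key expiry (stand-in for Redis)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def get_subreddit_feed(cache, subreddit_id, load_from_db, ttl=300):
    """Cache-aside read: try the cache, fall back to the DB, then populate.

    The cached value is a list of (post_id, score) pairs only; full post
    bodies are expected to come from a separate per-post cache.
    """
    key = f"subreddit:{subreddit_id}:top"
    feed = cache.get(key)
    if feed is None:
        feed = load_from_db(subreddit_id)
        cache.set(key, feed, ttl)
    return feed
```

The default ttl of 300 seconds mirrors the 5-minute feed TTL; the clock is injectable so expiry can be exercised without sleeping.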
Real-Time Comments and Notifications
Reddit needs real-time updates for comments and notifications. Use WebSockets for live comment threads. When a user posts a comment, the server broadcasts it to all clients viewing that post via a pub/sub channel (Redis Pub/Sub or Kafka). For notifications (replies, mentions), use a message queue to decouple notification sending from the comment write. The notification service consumes from the queue and pushes to the user via WebSocket or push notification. The gotcha: if a post has 10,000 viewers, broadcasting to all of them from one place can overwhelm a single server. Use a fan-out approach: each WebSocket server subscribes to one channel per post it has viewers for, the pub/sub system delivers each message once to every subscribed server, and each server relays it to its own connected clients. Don't send the full comment object. Send only the comment ID and let the client fetch the details.
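The two-stage fan-out can be sketched with an in-memory broker. Broker stands in for Redis Pub/Sub, WebSocketServer stands in for one server process, and a plain list plays the role of a client connection; all names and the 'post:<id>' channel format are assumptions for the example.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for a pub/sub system such as Redis Pub/Sub."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver a message to every subscriber; returns how many got it."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


class WebSocketServer:
    """One server process holding many client connections."""

    def __init__(self, broker):
        self._broker = broker
        self._clients = defaultdict(list)  # post_id -> client outboxes

    def connect(self, post_id, outbox):
        """Register a viewer; subscribe to the post's channel on first viewer."""
        if post_id not in self._clients:
            self._broker.subscribe(
                f"post:{post_id}", lambda msg, p=post_id: self._relay(p, msg)
            )
        self._clients[post_id].append(outbox)

    def _relay(self, post_id, message):
        for outbox in self._clients[post_id]:
            outbox.append(message)


def publish_comment(broker, post_id, comment_id):
    """Broadcast only identifiers; clients fetch the comment body themselves."""
    payload = {"post_id": post_id, "comment_id": comment_id}
    return broker.publish(f"post:{post_id}", payload)
```

The broker sees one delivery per server rather than one per viewer, so its load grows with the number of servers and not with the size of the audience.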
Search: Full-Text Search at Scale
Reddit's search needs to index posts, comments, and subreddits. Use Elasticsearch for full-text search. Index posts with fields: title, body, subreddit_id, author_id, score, created_at. Use a separate index for comments. For subreddit search, use a lightweight autocomplete index (e.g., Elasticsearch's completion suggester). The challenge: keeping the index in sync with the database. Use a change data capture (CDC) pipeline: Debezium reads the Postgres WAL and pushes changes to Kafka, then a consumer updates Elasticsearch. This ensures near-real-time search without dual-write complexity. The classic mistake: indexing every comment immediately. That's expensive. Instead, index comments in batches every 30 seconds. Search results can be slightly stale — users don't notice.
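The comment batching step can be sketched as a small buffer that emits one bulk request body per interval. The body uses the newline-delimited JSON layout that Elasticsearch's bulk endpoint accepts (an action line followed by a document line); the class name CommentIndexBuffer, the index name "comments", and the comment fields are assumptions, and actually sending the body over HTTP is left out.

```python
import json


class CommentIndexBuffer:
    """Collects comment changes and emits one bulk body per flush interval."""

    def __init__(self, flush_interval=30.0, index="comments"):
        self._interval = flush_interval
        self._index = index
        self._pending = {}         # comment id -> latest version of the doc
        self._window_start = None  # time of the first unflushed change

    def add(self, comment, now):
        """Buffer a change; a later edit to the same comment replaces it."""
        if self._window_start is None:
            self._window_start = now
        self._pending[comment["id"]] = comment

    def flush_if_due(self, now):
        """Return an NDJSON bulk body if the interval has elapsed, else None."""
        if not self._pending or now - self._window_start < self._interval:
            return None
        lines = []
        for comment_id, comment in self._pending.items():
            action = {"index": {"_index": self._index, "_id": str(comment_id)}}
            lines.append(json.dumps(action))
            lines.append(json.dumps(comment))
        self._pending.clear()
        self._window_start = None
        return "\n".join(lines) + "\n"  # bulk bodies end with a newline
```

Keying the buffer by comment id means a comment edited three times within one window is indexed once, which is part of why batching is cheaper than indexing every change immediately.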
When Not to Use This Architecture
This architecture is overkill for a small community (<10k users). For a small Reddit clone, a single Postgres instance with proper indexing and caching (Redis) is sufficient. The partitioning, Kafka, and Elasticsearch add operational complexity. Only invest in this when you have >1M DAU or anticipate rapid growth. Also, if your content is ephemeral (e.g., disappearing posts), you can skip the search index and use a simpler in-memory store. Another case: if you don't need real-time feeds, you can generate feeds on request with a simple SQL query and cache the result. The fan-out-on-write pattern is only justified when users expect sub-second feed updates. For most apps, a 5-minute stale feed is acceptable.
The 4GB Container That Kept Dying
Set work_mem = '64MB' and replaced the recursive CTE with a materialized path approach using the ltree extension. Added an index on the path column.
- Recursive queries on hierarchical data at scale are a trap.
- Always use materialized paths or nested sets for comment trees.
redis-cli -h redis-cluster keys 'subreddit:*:top' | head -5
redis-cli -h redis-cluster get 'subreddit:1:top'
docker-compose restart feed-generator
Key takeaways
Interview Questions on This Topic
How would you handle a sudden spike in votes on a single post (e.g., a front-page post)?