System Design — Replication Lag Breaks Read-Your-Writes
After posting, users saw comments missing for up to 5 seconds — replication lag broke read-your-writes.
- System design decides how your software handles growth, not just user features
- Core components: load balancers, caches, databases, and message queues
- Horizontal scaling adds machines; vertical scaling adds power to one machine
- Caching reduces database load by 10x, but cache invalidation is the hard part
- Biggest mistake: microservices for a 10-user app — start monolithic, split later
Every application you've ever loved — Instagram, Spotify, Google Maps — was built twice. Once in code, and once in architecture. The code handles what the app does. The architecture determines whether it survives Monday morning when a million users show up at once. System design is the discipline of making those architectural decisions deliberately, before production teaches you the hard way. It's the difference between a service that scales gracefully and one that crumbles under its own success.
The problem system design solves isn't a coding problem — it's a coordination problem. A single server can handle a few hundred users just fine. But what happens when you hit ten thousand? A hundred thousand? The naive answer is 'add a bigger server,' but that only buys you time. Real scalability means distributing work intelligently, caching aggressively, tolerating failure gracefully, and keeping data consistent without grinding everything to a halt. Each of those goals pulls in a different direction, and system design is the art of finding the right tension between them.
By the end of this article you'll understand the core building blocks every system is made of — load balancers, caches, databases, and message queues — why each one exists, when to reach for it, and what you give up when you do. You'll be able to look at a system description in an interview or a design doc and immediately start asking the right questions instead of staring at a blank whiteboard.
The Core Pillar: Scaling from One to a Million Users
In system design, we differentiate between Vertical Scaling (buying a bigger machine) and Horizontal Scaling (buying more machines). While Vertical Scaling is easy, it has a hard ceiling. Horizontal Scaling is the industry standard for modern distributed systems, but it introduces the need for Load Balancing and Data Consistency management.
To handle this, we use a Load Balancer (LB) as the entry point. The LB sits in front of your application servers and conducts traffic based on algorithms like Round Robin or Least Connections. This ensures no single server is overwhelmed while others sit idle.
Database Scaling and The CAP Theorem
When scaling databases, you'll eventually face the CAP Theorem: you can only have two out of three: Consistency, Availability, and Partition Tolerance. For a global system, we often use 'Read Replicas' to handle heavy traffic. We write to a 'Primary' node and read from 'Secondary' nodes. This improves performance but introduces 'Eventual Consistency'—where a user might not see their own post for a few milliseconds while data synchronizes.
Caching: The Fast Lane to Performance
Caching stores frequently accessed data in a faster storage layer (memory) so that repeated requests avoid expensive database calls. Common caching patterns include Cache-Aside (application checks cache first), Read-Through (cache loads from DB on miss), and Write-Through (write to cache and DB simultaneously). The most widely used cache is Redis, an in-memory data store with sub-millisecond latency.
Caching isn't free. You trade memory for speed, and you introduce the problem of cache invalidation — keeping the cache in sync with the source of truth. A stale cache can serve outdated data, sometimes breaking business logic.
Load Balancers: Beyond Round-Robin
Load balancers distribute incoming traffic across multiple servers. But the algorithm matters. Round-Robin is simple but assumes equal server capacity and health. Least Connections sends requests to the server with fewest active connections — better for variable-length requests. IP Hash can enable session persistence without sticky cookies, by hashing the client IP to a specific server.
In production, LBs also perform health checks (active and passive), SSL termination, and rate limiting. They can be L4 (TCP/UDP) or L7 (HTTP/HTTPS). L7 allows content-based routing, e.g., forwarding /api/ to one pool and /static/ to another.
Message Queues: Decouple and Scale Asynchronously
Message queues (e.g., RabbitMQ, Kafka, SQS) decouple producers from consumers. When a user uploads a video, the web server returns immediately and enqueues a task saying 'process video'. A separate worker picks that task, does the heavy lifting, and updates the database. This pattern keeps the web server responsive even under load.
Key concepts: Producer sends messages to a queue; Consumer pulls them; Queue buffers during spikes. Kafka adds the idea of log-based persistence and replayability. The trade-off is complexity: you now have at-least-once delivery, idempotency, and ordering challenges.
| Feature | Vertical Scaling (Scale Up) | Horizontal Scaling (Scale Out) |
|---|---|---|
| Complexity | Low (No code changes needed) | High (Requires Load Balancer & Distributed Logic) |
| Hardware | Increasing CPU/RAM on one box | Adding more standard commodity servers |
| Availability | SPOF (If server dies, app dies) | High (Other nodes stay up if one fails) |
| Cost | Exponentially expensive at high end | Linear and more predictable |
Key Takeaways
- Scalability is not just about size; it's about the ability to handle growth gracefully without total architectural rewrites.
- Caching is your best friend for performance—use Redis or Memcached to store 'hot' data and avoid expensive database hits.
- State is the enemy of scaling. Keep your application servers 'stateless' so you can add or remove them at will without losing user sessions.
- The CAP Theorem is a law of nature in distributed systems—accepting Eventual Consistency is often the price of high availability.
Common Mistakes to Avoid
- Premature Optimization
Symptom: Building a distributed microservices architecture for a product with 10 users leads to high complexity and slow feature delivery.
Fix: Start monolithic but design with modularity so you can split services later when the user base grows. - Ignoring Latency
Symptom: Adding too many network hops (e.g., App -> LB -> Cache -> DB) without measuring RTT, causing slow responses.
Fix: Profile network hops early. Use a CDN for static assets, co-locate services in the same region, and batch requests where possible. - Hard-coding IPs
Symptom: Service discovery fails when servers are replaced or scaled. Manual IP updates are error-prone and cause outages.
Fix: Use service discovery (Consul, Eureka) or internal DNS names. Never reference IPs directly in config or code.
Interview Questions on This Topic
- QYou are designing a service like 'URL Shortener' (TinyURL). How would you handle 100,000 requests per second with low latency?SeniorReveal
- QExplain the trade-offs between a NoSQL (e.g., MongoDB) and a Relational (e.g., PostgreSQL) database for a social media feed. When would you prefer one over the other?Mid-levelReveal
- QHow does a Content Delivery Network (CDN) reduce latency for global users, and where does it sit in the system hierarchy?Mid-levelReveal
Frequently Asked Questions
What is the difference between Scalability and Reliability?
Scalability is the system's ability to handle increased load by adding resources. Reliability is the system's ability to remain functional even when components fail. A system can be scalable but unreliable if its parts break frequently under that load.
When should I use a Message Queue like RabbitMQ or Kafka?
Use a Message Queue to decouple heavy tasks from the user request cycle. For example, when a user uploads a video, return 'Success' immediately and put the video processing task into a queue to be handled asynchronously by a worker service.
What is 'Sticky Sessions' and why are they generally avoided?
Sticky sessions force a specific user to always talk to the same server. While this makes session management easy, it makes scaling hard and load balancing uneven. It's better to use a distributed session store like Redis so any server can handle any user's request.
That's Fundamentals. Mark it forged?
3 min read · try the examples if you haven't