Intermediate 3 min · March 05, 2026

System Design — Replication Lag Breaks Read-Your-Writes

After posting, users saw comments missing for up to 5 seconds — replication lag broke read-your-writes.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
Quick Answer
  • System design decides how your software handles growth, not just user features
  • Core components: load balancers, caches, databases, and message queues
  • Horizontal scaling adds machines; vertical scaling adds power to one machine
  • Caching reduces database load by 10x, but cache invalidation is the hard part
  • Biggest mistake: microservices for a 10-user app — start monolithic, split later

Every application you've ever loved — Instagram, Spotify, Google Maps — was built twice. Once in code, and once in architecture. The code handles what the app does. The architecture determines whether it survives Monday morning when a million users show up at once. System design is the discipline of making those architectural decisions deliberately, before production teaches you the hard way. It's the difference between a service that scales gracefully and one that crumbles under its own success.

The problem system design solves isn't a coding problem — it's a coordination problem. A single server can handle a few hundred users just fine. But what happens when you hit ten thousand? A hundred thousand? The naive answer is 'add a bigger server,' but that only buys you time. Real scalability means distributing work intelligently, caching aggressively, tolerating failure gracefully, and keeping data consistent without grinding everything to a halt. Each of those goals pulls in a different direction, and system design is the art of finding the right tension between them.

By the end of this article you'll understand the core building blocks every system is made of — load balancers, caches, databases, and message queues — why each one exists, when to reach for it, and what you give up when you do. You'll be able to look at a system description in an interview or a design doc and immediately start asking the right questions instead of staring at a blank whiteboard.

The Core Pillar: Scaling from One to a Million Users

In system design, we differentiate between Vertical Scaling (buying a bigger machine) and Horizontal Scaling (buying more machines). While Vertical Scaling is easy, it has a hard ceiling. Horizontal Scaling is the industry standard for modern distributed systems, but it introduces the need for Load Balancing and Data Consistency management.

To handle this, we use a Load Balancer (LB) as the entry point. The LB sits in front of your application servers and conducts traffic based on algorithms like Round Robin or Least Connections. This ensures no single server is overwhelmed while others sit idle.

Database Scaling and The CAP Theorem

When scaling databases, you'll eventually face the CAP Theorem: you can only have two out of three: Consistency, Availability, and Partition Tolerance. For a global system, we often use 'Read Replicas' to handle heavy traffic. We write to a 'Primary' node and read from 'Secondary' nodes. This improves performance but introduces 'Eventual Consistency'—where a user might not see their own post for a few milliseconds while data synchronizes.

Caching: The Fast Lane to Performance

Caching stores frequently accessed data in a faster storage layer (memory) so that repeated requests avoid expensive database calls. Common caching patterns include Cache-Aside (application checks cache first), Read-Through (cache loads from DB on miss), and Write-Through (write to cache and DB simultaneously). The most widely used cache is Redis, an in-memory data store with sub-millisecond latency.

Caching isn't free. You trade memory for speed, and you introduce the problem of cache invalidation — keeping the cache in sync with the source of truth. A stale cache can serve outdated data, sometimes breaking business logic.

Load Balancers: Beyond Round-Robin

Load balancers distribute incoming traffic across multiple servers. But the algorithm matters. Round-Robin is simple but assumes equal server capacity and health. Least Connections sends requests to the server with fewest active connections — better for variable-length requests. IP Hash can enable session persistence without sticky cookies, by hashing the client IP to a specific server.

In production, LBs also perform health checks (active and passive), SSL termination, and rate limiting. They can be L4 (TCP/UDP) or L7 (HTTP/HTTPS). L7 allows content-based routing, e.g., forwarding /api/ to one pool and /static/ to another.

Message Queues: Decouple and Scale Asynchronously

Message queues (e.g., RabbitMQ, Kafka, SQS) decouple producers from consumers. When a user uploads a video, the web server returns immediately and enqueues a task saying 'process video'. A separate worker picks that task, does the heavy lifting, and updates the database. This pattern keeps the web server responsive even under load.

Key concepts: Producer sends messages to a queue; Consumer pulls them; Queue buffers during spikes. Kafka adds the idea of log-based persistence and replayability. The trade-off is complexity: you now have at-least-once delivery, idempotency, and ordering challenges.

Vertical vs Horizontal Scaling
FeatureVertical Scaling (Scale Up)Horizontal Scaling (Scale Out)
ComplexityLow (No code changes needed)High (Requires Load Balancer & Distributed Logic)
HardwareIncreasing CPU/RAM on one boxAdding more standard commodity servers
AvailabilitySPOF (If server dies, app dies)High (Other nodes stay up if one fails)
CostExponentially expensive at high endLinear and more predictable

Key Takeaways

  • Scalability is not just about size; it's about the ability to handle growth gracefully without total architectural rewrites.
  • Caching is your best friend for performance—use Redis or Memcached to store 'hot' data and avoid expensive database hits.
  • State is the enemy of scaling. Keep your application servers 'stateless' so you can add or remove them at will without losing user sessions.
  • The CAP Theorem is a law of nature in distributed systems—accepting Eventual Consistency is often the price of high availability.

Common Mistakes to Avoid

  • Premature Optimization
    Symptom: Building a distributed microservices architecture for a product with 10 users leads to high complexity and slow feature delivery.
    Fix: Start monolithic but design with modularity so you can split services later when the user base grows.
  • Ignoring Latency
    Symptom: Adding too many network hops (e.g., App -> LB -> Cache -> DB) without measuring RTT, causing slow responses.
    Fix: Profile network hops early. Use a CDN for static assets, co-locate services in the same region, and batch requests where possible.
  • Hard-coding IPs
    Symptom: Service discovery fails when servers are replaced or scaled. Manual IP updates are error-prone and cause outages.
    Fix: Use service discovery (Consul, Eureka) or internal DNS names. Never reference IPs directly in config or code.

Interview Questions on This Topic

  • QYou are designing a service like 'URL Shortener' (TinyURL). How would you handle 100,000 requests per second with low latency?SeniorReveal
    Key components: design a database to store mappings (wide row like Cassandra or distributed SQL), use consistent hashing for sharding, cache hot URLs in Redis with TTL, and use a CDN for redirects. For write-heavy, consider pre-generating keys or using a distributed ID generator (e.g., Snowflake). Scale horizontally with stateless web servers behind an LB that does rate limiting.
  • QExplain the trade-offs between a NoSQL (e.g., MongoDB) and a Relational (e.g., PostgreSQL) database for a social media feed. When would you prefer one over the other?Mid-levelReveal
    PostgreSQL gives you strong consistency, joins, and ACID transactions. Good for user profiles, comments where references matter. MongoDB offers flexible schema and horizontal scaling with sharding. Better for activity feeds where the data is denormalized and read-heavy. In production, many social feeds use both: relational for user data, NoSQL for feed items, with eventual consistency where acceptable.
  • QHow does a Content Delivery Network (CDN) reduce latency for global users, and where does it sit in the system hierarchy?Mid-levelReveal
    A CDN caches static assets (images, CSS, JS, video) at edge servers near users. When a user requests a file, the CDN serves it from memory instead of the origin server. This cuts latency from hundreds of ms to tens of ms. The CDN sits in front of your load balancer: user -> CDN -> LB -> web server -> DB. It also offloads traffic from your origin, reducing server load.

Frequently Asked Questions

What is the difference between Scalability and Reliability?

Scalability is the system's ability to handle increased load by adding resources. Reliability is the system's ability to remain functional even when components fail. A system can be scalable but unreliable if its parts break frequently under that load.

When should I use a Message Queue like RabbitMQ or Kafka?

Use a Message Queue to decouple heavy tasks from the user request cycle. For example, when a user uploads a video, return 'Success' immediately and put the video processing task into a queue to be handled asynchronously by a worker service.

What is 'Sticky Sessions' and why are they generally avoided?

Sticky sessions force a specific user to always talk to the same server. While this makes session management easy, it makes scaling hard and load balancing uneven. It's better to use a distributed session store like Redis so any server can handle any user's request.

🔥

That's Fundamentals. Mark it forged?

3 min read · try the examples if you haven't

1 / 10 · Fundamentals
Next
Scalability Concepts