CQRS Pattern — Projection Lag and Stale Read Pitfalls
A 300ms projection lag caused duplicate payments and chargebacks.
20+ years shipping large-scale distributed systems. Everything here is grounded in real deployments.
- CQRS separates write models (commands) from read models (queries) for independent optimisation
- Commands change state using normalised stores with full business logic
- Queries read from denormalised, pre-joined views optimised for display
- Performance benefit: query latency can drop by around 70% because joins are eliminated at read time, though the exact gain depends on the workload
- Production gotcha: eventual consistency means stale reads until projection catches up
- Biggest mistake: applying CQRS to simple CRUD — the complexity tax outweighs the gain
Imagine you have a library with one desk for checking out books (write) and separate, faster desks just for looking up books (read). The checkout desk has all the rules — you need a card, books must be returned — but the lookup desks are lean, with pre-sorted shelves so you find any book instantly. CQRS is like having these two different desks instead of one that tries to do both.
A 300ms projection lag between your write and read models can cause duplicate payments, chargebacks, and angry customers. CQRS trades strong consistency for performance, but most teams underestimate the engineering required to make that trade-off safe. This article covers how to bound eventual consistency, build idempotent commands, and avoid serving stale data when it actually matters.
CQRS: Separating Read and Write Models to Tame Complex Domains
Command Query Responsibility Segregation (CQRS) splits a system's data model into two distinct paths: commands (writes) and queries (reads). Instead of one monolithic model that both updates and retrieves data, CQRS uses separate models, often backed by different stores or schemas, each optimized for its own operations. Physically separate databases are optional, though: CQRS is at heart a logical pattern that decouples the write-side invariants from the read-side projection logic.
In practice, commands validate business rules and produce events, while queries consume pre-computed projections built from those events. The write model enforces consistency and concurrency control (e.g., optimistic locking), while the read model can be denormalized, cached, or even served from a different engine (e.g., Elasticsearch). The two models are eventually consistent: a command's effect may not be immediately visible to queries. This lag is inherent, not a bug.
Use CQRS when your domain has high write contention or complex validation that would make a single model slow for reads. It shines in event-sourced systems, audit-heavy domains, or when different read shapes (aggregations, search, reporting) are needed. Avoid it for simple CRUD—you'll pay the complexity tax without benefit. Real systems like e-commerce order management or financial trading platforms use CQRS to scale reads independently from writes.
The Core Pattern
CQRS splits your system into two logical halves: the command side (write) and the query side (read). The command side receives commands—imperative instructions like 'PlaceOrder' or 'UpdateProfile'—validates business rules, writes to a normalised store, and publishes an event. The query side subscribes to those events and builds denormalised read models that serve UI or API responses without joins. This separation lets you scale read and write independently: you might have 10 read replicas and 1 write master, or use different database technologies entirely (e.g., PostgreSQL for writes, Elasticsearch for reads).
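The flow described above can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: `EventBus`, `OrderCommandHandler`, `OrdersByUserView`, and the field names are all hypothetical, and the synchronous in-process bus stands in for a real message broker (so it hides the lag a real system would have).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceOrder:
    """Command: an instruction that the write side may still reject."""
    order_id: str
    user_id: str
    total_cents: int


@dataclass(frozen=True)
class OrderPlaced:
    """Event: a fact that has already happened."""
    order_id: str
    user_id: str
    total_cents: int


class EventBus:
    """Hypothetical in-process bus; a real system would use a broker."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)


class OrderCommandHandler:
    """Write side: validate, store normalised state, publish an event."""

    def __init__(self, bus):
        self._bus = bus
        self.orders = {}  # stand-in for the normalised write store

    def handle(self, cmd):
        if cmd.total_cents <= 0:
            raise ValueError("order total must be positive")
        if cmd.order_id in self.orders:
            raise ValueError("order already exists")
        self.orders[cmd.order_id] = cmd
        self._bus.publish(OrderPlaced(cmd.order_id, cmd.user_id, cmd.total_cents))


class OrdersByUserView:
    """Read side: a denormalised view keyed by the question the UI asks."""

    def __init__(self):
        self._totals = defaultdict(int)

    def on_order_placed(self, event):
        self._totals[event.user_id] += event.total_cents

    def total_spent(self, user_id):
        return self._totals.get(user_id, 0)
```

Wiring it up means subscribing `view.on_order_placed` to the bus and sending commands through `OrderCommandHandler.handle`; the query path only ever touches the view.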
Maintaining the Read Model
Read models are not kept in sync by the write side. Instead, they are built and updated by projections: event handlers that listen to events and denormalise data into query-optimised tables. For example, an OrderProjection class can listen for OrderCreated events and upsert a row in user_orders_view that includes denormalised user info and pre-joined item names. This removes all joins from the read path, making queries extremely fast. The trade-off is eventual consistency: there is a window between the write commit and the projection update where the read model is stale.
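A projection along these lines might look like the sketch below. The names OrderProjection, OrderCreated, and user_orders_view come from the text; the event fields, the column layout, and the use of SQLite (chosen only so the example runs anywhere) are assumptions.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderCreated:
    # Hypothetical event shape: the real payload is not specified in the text.
    order_id: str
    user_id: str
    user_name: str
    item_names: tuple
    total_cents: int


class OrderProjection:
    """Builds the denormalised user_orders_view from OrderCreated events."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            """CREATE TABLE IF NOT EXISTS user_orders_view (
                   order_id    TEXT PRIMARY KEY,
                   user_id     TEXT NOT NULL,
                   user_name   TEXT NOT NULL,
                   item_names  TEXT NOT NULL,
                   total_cents INTEGER NOT NULL
               )"""
        )

    def on_order_created(self, event):
        # Upsert keyed on order_id, so a redelivered event overwrites the
        # same row instead of creating a duplicate.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO user_orders_view VALUES (?, ?, ?, ?, ?)",
                (
                    event.order_id,
                    event.user_id,
                    event.user_name,
                    ", ".join(event.item_names),
                    event.total_cents,
                ),
            )

    def orders_for_user(self, user_id):
        # Single-table read: user name and item names are already in the row.
        return self._conn.execute(
            "SELECT order_id, user_name, item_names, total_cents "
            "FROM user_orders_view WHERE user_id = ? ORDER BY order_id",
            (user_id,),
        ).fetchall()
```

Keying the upsert on the order ID is a deliberate choice here: message brokers commonly redeliver events, and a projection that tolerates replays is much easier to rebuild after a failure.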
When to Apply CQRS (and When to Avoid It)
CQRS adds significant complexity: you now maintain two models, an event pipeline, and deal with eventual consistency. Apply it only when the benefits clearly outweigh the cost. The sweet spot is when read and write workloads have fundamentally different performance characteristics—for example, writes are transactional with frequent updates to many related tables, while reads need to aggregate data from multiple sources and serve high traffic with low latency. Avoid CQRS for simple CRUD apps where a single normalised model can handle both tasks efficiently. Start with a monolithic model, measure, and extract read models only when you hit a measurable performance bottleneck.
CQRS and Event Sourcing: Separate Patterns That Work Well Together
CQRS and Event Sourcing are often mentioned together but are independent. CQRS separates read and write models. Event Sourcing stores all state changes as an ordered sequence of immutable events, instead of current state. They combine naturally: the event store becomes the write model, and the projections build read models from those events. However, you can absolutely use CQRS without Event Sourcing — you can implement a simple write model that updates a normalised table and emits events to update the read model. Conversely, you can use Event Sourcing without CQRS by building a single model from events for both reads and writes (though that's unusual). The key insight: CQRS is about separation of concerns, not about storage strategy.
- CQRS separates the act of writing (commands) from the act of reading (queries).
- Event Sourcing stores every state change as an event — you replay events to get current state.
- You can have CQRS without Event Sourcing: just update a normalised write table and publish events for projections.
- You can have Event Sourcing without CQRS: though rare, you can read from the event stream directly.
- Together: Event Sourcing provides the event stream that CQRS projections consume.
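To make the distinction concrete, here is a small sketch under assumed names (`Deposited`, `Withdrawn`, `current_balance`, and `statement_view` are hypothetical). Replaying the stream to get current state is the Event Sourcing part; deriving a display-shaped view from the same stream is the CQRS part.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Deposited:
    amount_cents: int


@dataclass(frozen=True)
class Withdrawn:
    amount_cents: int


def apply_event(balance, event):
    """Fold one event into the running balance."""
    if isinstance(event, Deposited):
        return balance + event.amount_cents
    if isinstance(event, Withdrawn):
        return balance - event.amount_cents
    raise TypeError(f"unknown event: {event!r}")


def current_balance(events):
    """Event Sourcing: current state is the replay of all events from zero."""
    return reduce(apply_event, events, 0)


def statement_view(events):
    """CQRS projection: a read model shaped for display, built from the stream."""
    lines = []
    balance = 0
    for event in events:
        balance = apply_event(balance, event)
        sign = "+" if isinstance(event, Deposited) else "-"
        lines.append(f"{sign}{event.amount_cents} -> {balance}")
    return lines
```

Either function can be dropped without affecting the other, which mirrors the point that the two patterns are independent.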
Production Trade-offs: Consistency, Complexity, and Cost
Deploying CQRS in production introduces three core trade-offs you must design for. First, eventual consistency: your read model lags behind the write model. You must decide acceptable staleness per use case, for example 1 second for dashboards and near-zero for payment confirmations. Second, operational complexity: you now run a write store, a read store, an event bus, projections, and monitoring for all of them. Each component becomes a failure domain. Third, cost: storing two copies of the data (write model plus read model) roughly doubles storage, and denormalised read models can be larger still because of redundant data. You also pay for the event bus infrastructure. However, faster read queries can offset these costs by reducing the need for read replicas and expensive joins.
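One way to act on a per-use-case staleness budget is to version the read model and let each query state the minimum version it will accept. The sketch below is an assumption-laden illustration: `ReadModel`, `read_balance`, the version counter, and the fallback to the write store are all hypothetical design choices, not something the pattern prescribes.

```python
import time


class ReadModel:
    """Read side that remembers the version of the last event it applied."""

    def __init__(self):
        self.applied_version = 0
        self.balances = {}

    def apply(self, version, account_id, balance_cents):
        self.balances[account_id] = balance_cents
        self.applied_version = version


def read_balance(read_model, write_store, account_id, min_version,
                 max_wait_s=0.0, poll_s=0.01):
    """Return (balance, source).

    Serve from the read model once it has caught up to min_version.
    If it has not caught up within max_wait_s, fall back to the write
    store so a critical path never sees stale data.
    """
    deadline = time.monotonic() + max_wait_s
    while True:
        if read_model.applied_version >= min_version:
            return read_model.balances.get(account_id), "read_model"
        if time.monotonic() >= deadline:
            return write_store[account_id], "write_store"
        time.sleep(poll_s)
```

A dashboard would pass `min_version=0` and accept whatever the projection has; a payment confirmation would pass the version returned by its own command, so it either waits for the projection or reads from the write side.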
CQRS Without Event Sourcing: The Database That Can't Decide What It Is
Here's where juniors get burned: they see CQRS and immediately pair it with Event Sourcing. That pairing is not mandatory. You can implement CQRS today with two separate databases, for example PostgreSQL for writes and Elasticsearch for reads. The command side writes normalized aggregates; the query side denormalizes them into a search-optimized index. This buys you independent scaling. Writes spike? Scale the write database. Reads spike? Scale the read database. No shared lock contention. The cost is eventual consistency: your write side commits, and your read side lags by milliseconds or seconds depending on your sync mechanism. That's a trade-off you accept when your business logic demands audit trails and your read side needs to answer 'how many orders were placed in the last hour' without five JOINs. The production pattern is simple: write to Postgres, publish a domain event to a message queue, then consume that event and update Elasticsearch. No Event Sourcing. No replay. Just two databases talking through a queue.
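The write, publish, consume loop can be sketched with standard-library stand-ins: plain dicts play the roles of Postgres and Elasticsearch, and `queue.Queue` plays the message queue. `WriteSide`, `ReadSide`, and the event shape are hypothetical. The point of the sketch is the window in which the write has committed but the read side has not yet caught up.

```python
import queue


class WriteSide:
    def __init__(self, events):
        self.orders = {}       # stand-in for the PostgreSQL write store
        self._events = events  # stand-in for the message queue

    def place_order(self, order_id, total_cents):
        self.orders[order_id] = total_cents
        # Note: in a real system the store write and the publish are two
        # separate steps, and a crash between them must be handled.
        self._events.put(
            {"type": "OrderPlaced", "order_id": order_id, "total_cents": total_cents}
        )


class ReadSide:
    def __init__(self, events):
        self.index = {}        # stand-in for the Elasticsearch index
        self._events = events

    def drain(self):
        """Apply every queued event; return how many were applied."""
        applied = 0
        while True:
            try:
                event = self._events.get_nowait()
            except queue.Empty:
                return applied
            self.index[event["order_id"]] = event["total_cents"]
            applied += 1
```

Between `place_order` returning and `drain` running, a query against `index` will not find the order. That gap is the projection lag the rest of this article is about.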
Why Your Repository Pattern Won't Cut It for Complex Writes
The repository pattern is fine for CRUD. It is a death sentence for domain logic with invariants. Here's the problem: a repository exposes methods like save(Order), each of which is one atomic operation. But in a real system, placing an order might involve checking inventory, validating credit, applying a discount, and updating an account balance, all in one transaction. With repositories, you end up with a fat service that calls orderRepo.save(), inventoryRepo.update(), and accountRepo.adjust() in sequence. That's where race conditions live: two simultaneous orders for the last item in stock both pass the check. CQRS forces you to model the write as a command. A command is not a dumb data holder. It carries intent: PlaceOrderCommand, CancelOrderCommand, AddItemCommand. The command handler orchestrates the business logic in isolation and fires domain events after success. The query side picks up those events and builds whatever read projections it needs. This separation enforces a boundary: writes are transactions, reads are snapshots. Your code and your database both stop mixing concerns. The complexity moves from 'how do I debug this race condition' to 'how do I handle a failed sync.' That's a better problem to have.
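A command handler for the last-item-in-stock case might look like the sketch below. The conditional UPDATE that checks and decrements stock in one statement is one common way to close the race; it is an assumption of this example, not the only option (optimistic locking on an aggregate version is another). The table layout, `PlaceOrderHandler`, and `OutOfStock` are hypothetical, and SQLite is used only to keep the example runnable.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceOrderCommand:
    order_id: str
    sku: str
    quantity: int


class OutOfStock(Exception):
    pass


def create_schema(conn):
    conn.execute(
        "CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders (order_id TEXT PRIMARY KEY, "
        "sku TEXT NOT NULL, quantity INTEGER NOT NULL)"
    )


class PlaceOrderHandler:
    def __init__(self, conn):
        self._conn = conn

    def handle(self, cmd):
        # One transaction: commits on success, rolls back on any exception.
        with self._conn:
            # Check and decrement in a single statement, so two concurrent
            # orders cannot both pass a separate "is it in stock?" read.
            cur = self._conn.execute(
                "UPDATE inventory SET stock = stock - ? "
                "WHERE sku = ? AND stock >= ?",
                (cmd.quantity, cmd.sku, cmd.quantity),
            )
            if cur.rowcount != 1:
                raise OutOfStock(cmd.sku)
            self._conn.execute(
                "INSERT INTO orders (order_id, sku, quantity) VALUES (?, ?, ?)",
                (cmd.order_id, cmd.sku, cmd.quantity),
            )
        # Domain event for the query side, returned only after the commit.
        return {"type": "OrderPlaced", "order_id": cmd.order_id, "sku": cmd.sku}
```

The handler owns the whole invariant, so there is no service layer stitching several repository calls together and no gap between the stock check and the stock update.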
Stale Read Model Leads to Customer Chargeback Spike
- Eventual consistency has a measurable latency bound — quantify it under peak load, don't assume 'eventual' means 'fast enough'.
- Always make commands idempotent on the write side so that duplicate commands from user retries are handled safely.
- Use read-after-write consistency for critical paths (payment confirmations, balance checks).
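The idempotency point can be illustrated with a short sketch. It assumes the client generates an idempotency key once per payment attempt and resends the same key on retry; `ChargeCard`, `PaymentHandler`, and the receipt shape are hypothetical, and the in-memory dict stands in for a durable table with a unique constraint on the key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChargeCard:
    idempotency_key: str  # hypothetical: created once by the client, reused on retry
    account_id: str
    amount_cents: int


class PaymentHandler:
    def __init__(self):
        self._results = {}  # idempotency key -> receipt (would be durable storage)
        self.charges = []   # stand-in for charges actually sent to the processor

    def handle(self, cmd):
        previous = self._results.get(cmd.idempotency_key)
        if previous is not None:
            # Retry of a command already processed: return the original
            # outcome and charge nothing.
            return previous
        receipt = {
            "charge_id": f"ch_{len(self.charges) + 1}",
            "amount_cents": cmd.amount_cents,
        }
        self.charges.append(cmd)
        self._results[cmd.idempotency_key] = receipt
        return receipt
```

With this in place, a user who sees a stale "unpaid" status and clicks Pay again resends the same key and gets the original receipt back instead of a second charge.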
kafka-consumer-groups --bootstrap-server localhost:9092 --group order-projection --describe
tail -n 100 /var/log/projection/application.log | grep 'LAG'
Key takeaways
Common mistakes to avoid
- Not handling projection failures
- Assuming eventual consistency is negligible
- Using the same database for both write and read models
- Not designing commands to be idempotent
Interview Questions on This Topic
What is CQRS and what problem does it solve?
Frequently Asked Questions