Senior 5 min · March 17, 2026

CQRS Pattern — Projection Lag and Stale Read Pitfalls

A 300ms projection lag caused duplicate payments and chargebacks.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Everything here is grounded in real deployments.

Last updated: June 10, 2026
Quick Answer
  • CQRS separates write models (commands) from read models (queries) for independent optimisation
  • Commands change state using normalised stores with full business logic
  • Queries read from denormalised, pre-joined views optimised for display
  • Performance benefit: query latency can drop substantially (around 70% in join-heavy workloads) because joins are eliminated at read time
  • Production gotcha: eventual consistency means stale reads until projection catches up
  • Biggest mistake: applying CQRS to simple CRUD — the complexity tax outweighs the gain
Definition
What is CQRS Pattern?

CQRS (Command Query Responsibility Segregation) is an architectural pattern that separates the data mutation path (commands) from the data retrieval path (queries) into distinct models, often backed by separate data stores. It exists to solve the impedance mismatch that occurs when a single domain model tries to serve both transactional writes and complex reads efficiently — a problem that becomes acute in high-traffic or domain-rich systems where the write model is optimized for consistency and invariants, while the read model needs denormalized projections for fast queries.

By splitting these concerns, you can scale reads and writes independently, choose different storage engines (e.g., PostgreSQL for commands, Elasticsearch for queries), and avoid the performance tax of joining across aggregates on every read.

In practice, CQRS introduces a projection layer that consumes events or data changes from the write side and builds read-optimized views. This asynchronous update mechanism is the source of the pattern's primary trade-off: projection lag. When a command completes, the read model is not immediately consistent — there's a window where stale data may be served.

This isn't a bug; it's a deliberate design choice that trades strong consistency for availability and performance. You'll see this in production systems like event-driven microservices (e.g., at Uber or Netflix) where eventual consistency is acceptable for read-heavy UIs, but it's a non-starter for financial ledgers or inventory systems that require immediate read-after-write consistency.

CQRS is often conflated with Event Sourcing, but they are independent patterns. Event Sourcing stores state as a sequence of events; CQRS separates read/write models. They pair well because events from the write side naturally feed projections, but you can implement CQRS without Event Sourcing (e.g., using change data capture from a relational database) or use Event Sourcing without CQRS (rare, but possible).

Avoid CQRS when your domain is CRUD-heavy with simple queries, your team is small, or you can't tolerate the operational complexity of maintaining multiple data stores and handling projection failures. For most applications, a well-tuned relational database with materialized views or a caching layer is simpler and more appropriate.

Plain-English First

Imagine you have a library with one desk for checking out books (write) and separate, faster desks just for looking up books (read). The checkout desk has all the rules — you need a card, books must be returned — but the lookup desks are lean, with pre-sorted shelves so you find any book instantly. CQRS is like having these two different desks instead of one that tries to do both.

A 300ms projection lag between your write and read models can cause duplicate payments, chargebacks, and angry customers. CQRS trades strong consistency for performance, but most teams underestimate the engineering required to make that trade-off safe. This article covers how to bound eventual consistency, build idempotent commands, and avoid serving stale data when it actually matters.

CQRS: Separating Read and Write Models to Tame Complex Domains

Command Query Responsibility Segregation (CQRS) splits a system's data model into two distinct paths: commands (writes) and queries (reads). Instead of one monolithic model that both updates and retrieves data, CQRS uses separate models—often backed by different stores or schemas—optimized for their respective operations. This is not about physical separation of databases; it's a logical pattern that decouples the write-side invariants from the read-side projection logic.

In practice, commands validate business rules and produce events, while queries consume pre-computed projections built from those events. The write model enforces consistency and concurrency control (e.g., optimistic locking), while the read model can be denormalized, cached, or even served from a different engine (e.g., Elasticsearch). The two models are eventually consistent: a command's effect may not be immediately visible to queries. This lag is inherent, not a bug.

Use CQRS when your domain has high write contention or complex validation that would make a single model slow for reads. It shines in event-sourced systems, audit-heavy domains, or when different read shapes (aggregations, search, reporting) are needed. Avoid it for simple CRUD—you'll pay the complexity tax without benefit. Real systems like e-commerce order management or financial trading platforms use CQRS to scale reads independently from writes.

Eventual Consistency Is Not Optional
CQRS without eventual consistency is just two databases. If every read in your system requires immediate read-after-write consistency, asynchronous projections are the wrong fit; if only a few critical paths do, route those reads to the write model.
Production Insight
A payment service used CQRS with a 5-second projection lag. During a flash sale, users saw 'insufficient funds' on the read model while the write model had already deducted the amount, causing double-spend attempts and a 30-minute incident.
Symptom: stale read projections showing outdated balances, triggering false-positive fraud alerts and user-facing errors.
Rule of thumb: always measure and document the maximum projection lag, and design compensating actions (e.g., retry, fallback to write model) for reads that must be fresh.
Key Takeaway
CQRS decouples write and read models, but introduces eventual consistency—design for lag, not against it.
Projection lag is a first-class concern: monitor it, cap it, and expose it to callers via staleness headers.
Never use CQRS for simple CRUD; the complexity is justified only when write and read shapes diverge significantly.
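One way to expose lag to callers is to attach a staleness value to every query response. The sketch below assumes the projection records the timestamp of the newest event it has applied; `ProjectionState` and the `X-Read-Model-Staleness-Ms` header name are hypothetical, not part of any standard. The value is an upper bound: when no writes are arriving, the read model is fully caught up even though the number keeps growing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ProjectionState:
    """Newest event the projection has applied (hypothetical bookkeeping)."""
    last_event_at: datetime


def staleness_ms(state: ProjectionState, now: datetime) -> int:
    """Upper bound, in whole milliseconds, on how far the read model trails."""
    return max(0, (now - state.last_event_at) // timedelta(milliseconds=1))


def with_staleness_header(body: dict, state: ProjectionState, now: datetime) -> dict:
    """Wrap a query result with a header the caller can inspect."""
    return {
        "headers": {"X-Read-Model-Staleness-Ms": str(staleness_ms(state, now))},
        "body": body,
    }


now = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
state = ProjectionState(last_event_at=now - timedelta(milliseconds=300))
response = with_staleness_header({"balance": 120}, state, now)
print(response["headers"])  # {'X-Read-Model-Staleness-Ms': '300'}
```

A client that sees a value above its own tolerance can retry or ask for a fresh read instead of trusting the body.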
Figure: CQRS flow from write model to read model with consistency trade-offs. Write Model (Command Side: handles commands, emits events) → Event Store / Bus (persists and publishes domain events) → Projection Builder (subscribes to events, updates read model) → Read Model (Query Side: optimized for queries, eventually consistent). Projection lag along this path is the stale-read risk; use idempotent projections and monitor lag metrics.

The Core Pattern

CQRS splits your system into two logical halves: the command side (write) and the query side (read). The command side receives commands—imperative instructions like 'PlaceOrder' or 'UpdateProfile'—validates business rules, writes to a normalised store, and publishes an event. The query side subscribes to those events and builds denormalised read models that serve UI or API responses without joins. This separation lets you scale read and write independently: you might have 10 read replicas and 1 write master, or use different database technologies entirely (e.g., PostgreSQL for writes, Elasticsearch for reads).

Example (Python)
# Package: io.thecodeforge.python.system_design
# orders_db, order_items_db and event_bus are assumed to be injected
# infrastructure handles; they are not defined in this snippet.
from datetime import datetime, timezone

# COMMAND: changes state, returns nothing or just an ID
class CreateOrderCommand:
    def __init__(self, user_id: int, items: list, total: float):
        self.user_id = user_id
        self.items = items
        self.total = total

class OrderCommandHandler:
    def handle(self, cmd: CreateOrderCommand) -> int:
        # Business logic: validate, apply rules
        if cmd.total <= 0:
            raise ValueError('Order total must be positive')

        # Write to normalised store
        order_id = orders_db.insert({
            'user_id': cmd.user_id,
            'total': cmd.total,
            'status': 'pending'
        })
        for item in cmd.items:
            order_items_db.insert({'order_id': order_id, **item})

        # Publish event for read model update.
        # The projection reads event['timestamp'], so include it here.
        event_bus.publish('OrderCreated', {
            'order_id': order_id,
            'timestamp': datetime.now(timezone.utc).isoformat(),
            **vars(cmd),
        })
        return order_id

# QUERY: reads state — returns data, changes nothing
class GetUserOrdersQuery:
    def __init__(self, user_id: int, page: int = 1):
        self.user_id = user_id
        self.page = page

class OrderQueryHandler:
    def handle(self, query: GetUserOrdersQuery):
        # Read from DENORMALISED read model — no joins needed
        return read_db.query(
            'SELECT * FROM user_orders_view WHERE user_id = ? ORDER BY created_at DESC LIMIT 20 OFFSET ?',
            [query.user_id, (query.page - 1) * 20]
        )
Output
# Commands write to normalised DB; queries read from denormalised view
Production Insight
Commands must be idempotent. If the process fails after the write commits but before the event is published, the order exists without an event, and a retry of the command creates a duplicate order.
Use an outbox table: write both command result and event in the same database transaction.
Never let the command side depend on the read model being up to date.
Key Takeaway
Commands change state via normalised stores.
Queries read from denormalised, pre-joined views.
In this design the two sides communicate only through events, not through shared tables.
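The outbox advice above can be sketched with the standard-library `sqlite3` module. This is a minimal illustration, not the article's implementation: the `orders` and `outbox` table layouts and the `relay_outbox` function are assumptions. The point is that the order row and its event row commit in one transaction, and a separate relay publishes pending rows afterwards.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, event_type TEXT, payload TEXT,
                         published INTEGER DEFAULT 0);
""")


def create_order(user_id: int, total: float) -> int:
    """Insert the order and its event in ONE transaction: both commit or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT INTO orders (user_id, total, status) VALUES (?, ?, 'pending')",
            (user_id, total),
        )
        order_id = cur.lastrowid
        payload = json.dumps({"order_id": order_id, "user_id": user_id, "total": total})
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", payload),
        )
    return order_id


def relay_outbox(publish) -> int:
    """Publish pending events in order, marking each as sent. Delivery is at-least-once."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


published = []
order_id = create_order(user_id=7, total=42.0)
relay_outbox(lambda event_type, payload: published.append((event_type, payload)))
print(published)
```

Because the relay can crash between publishing and marking a row as sent, consumers may see the same event twice, which is why projections need to be idempotent.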

Maintaining the Read Model

Read models are not kept in sync by the write side. Instead, they are built and updated by projections—event handlers that listen to events and denormalise data into query-optimised tables. In the example below, an OrderProjection class listens for OrderCreated events and upserts a row in user_orders_view that includes denormalised user info and pre-joined item names. This removes all joins from the read path, making queries extremely fast. The trade-off is eventual consistency: there is a window between the write commit and the projection update where the read model is stale.

Example (Python)
# Read model is updated asynchronously by consuming events
class OrderProjection:
    """Keeps user_orders_view up to date by handling OrderCreated events."""

    def on_order_created(self, event):
        # Denormalise: join order + user + items into one read-optimised row
        user = users_db.get(event['user_id'])
        items = event['items']

        read_db.upsert('user_orders_view', {
            'order_id':   event['order_id'],
            'user_id':    event['user_id'],
            'user_name':  user['name'],         # denormalised
            'user_email': user['email'],         # denormalised
            'total':      event['total'],
            'item_count': len(items),            # pre-computed
            'item_names': ', '.join(i['name'] for i in items),  # pre-joined
            'status':     'pending',
            'created_at': event['timestamp']
        })

# Trade-off: read model is eventually consistent with the write model
# Between OrderCreated event and projection update, a brief window of inconsistency
Output
# Read model updated asynchronously — eventual consistency
Production Insight
Projection failures can silently leave the read model stale or incomplete. If the event handler crashes mid-upsert, the read model stays stale until you replay the event.
Always make projections idempotent: use upsert with a deterministic key (e.g., order_id).
Monitor projection lag as a standard metric — set SLO thresholds and alert when breached.
Key Takeaway
Projections build denormalised read models from events.
Idempotency ensures safe replays after crashes.
Monitor lag — don't assume 'eventual' is always fast enough.
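To make the idempotency point concrete, here is a small sketch of a projection that upserts by `order_id` and ignores events that are not newer than what it already holds. The per-order `version` field is an assumption for illustration; the article's events do not define one. A plain dict stands in for `user_orders_view`.

```python
read_model = {}  # keyed by order_id; stands in for user_orders_view


def apply_event(event: dict) -> bool:
    """Upsert by order_id; skip events that are not newer than the stored row.

    Returns True when the read model changed.
    """
    current = read_model.get(event["order_id"])
    if current is not None and current["version"] >= event["version"]:
        return False  # duplicate or out-of-order delivery: safe to ignore
    read_model[event["order_id"]] = {
        "order_id": event["order_id"],
        "status": event["status"],
        "version": event["version"],
    }
    return True


events = [
    {"order_id": 1, "status": "pending", "version": 1},
    {"order_id": 1, "status": "paid", "version": 2},
]
for e in events:
    apply_event(e)

# Replaying the whole stream after a crash leaves the view unchanged.
changed = [apply_event(e) for e in events]
print(read_model[1]["status"], changed)  # paid [False, False]
```

With this shape, replaying from any earlier offset is safe: old events are no-ops and only genuinely newer ones change the view.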

When to Apply CQRS (and When to Avoid It)

CQRS adds significant complexity: you now maintain two models, an event pipeline, and deal with eventual consistency. Apply it only when the benefits clearly outweigh the cost. The sweet spot is when read and write workloads have fundamentally different performance characteristics—for example, writes are transactional with frequent updates to many related tables, while reads need to aggregate data from multiple sources and serve high traffic with low latency. Avoid CQRS for simple CRUD apps where a single normalised model can handle both tasks efficiently. Start with a monolithic model, measure, and extract read models only when you hit a measurable performance bottleneck.

Production Insight
A common mistake is adopting CQRS preemptively 'because we might scale later'. Premature CQRS adds months of development overhead for no immediate gain.
Measure first: profile your read queries. If 80% of reads are simple lookups with <5ms latency, CQRS will not help.
Start with a query-optimised view (materialised view, secondary index) before splitting into a full read model.
Key Takeaway
CQRS adds complexity — use only when read/write needs genuinely diverge.
Measure before committing, not after.
Start simple, extract read models as a proven optimisation.
Should You Use CQRS?
If: Read and write workloads have similar latency and throughput requirements
Use: Do not use CQRS. A single model with proper indexing is simpler and correct.
If: Reads require expensive joins across multiple tables at high throughput
Use: CQRS is a good fit. Denormalise into a read model to eliminate joins.
If: You need different storage technologies for reads vs writes (e.g., Elasticsearch for search, PostgreSQL for transactions)
Use: CQRS is required. You need separate models per storage engine.
If: Team is new to event-driven architecture and eventual consistency
Use: Delay CQRS until the team is comfortable with async error handling and monitoring projection lag.

CQRS and Event Sourcing: Separate Patterns That Work Well Together

CQRS and Event Sourcing are often mentioned together but are independent. CQRS separates read and write models. Event Sourcing stores all state changes as an ordered sequence of immutable events, instead of current state. They combine naturally: the event store becomes the write model, and the projections build read models from those events. However, you can absolutely use CQRS without Event Sourcing — you can implement a simple write model that updates a normalised table and emits events to update the read model. Conversely, you can use Event Sourcing without CQRS by building a single model from events for both reads and writes (though that's unusual). The key insight: CQRS is about separation of concerns, not about storage strategy.

CQRS vs Event Sourcing
  • CQRS separates the act of writing (commands) from the act of reading (queries).
  • Event Sourcing stores every state change as an event — you replay events to get current state.
  • You can have CQRS without Event Sourcing: just update a normalised write table and publish events for projections.
  • You can have Event Sourcing without CQRS: though rare, you can read from the event stream directly.
  • Together: Event Sourcing provides the event stream that CQRS projections consume.
Production Insight
Teams often conflate CQRS with Event Sourcing and implement both when only one is needed.
If you need a complete, replayable history of every state change, Event Sourcing is the natural fit; CQRS is optional.
If you need separate read/write models for performance, CQRS is necessary; Event Sourcing is optional.
Know which problem you're solving before picking both patterns.
Key Takeaway
CQRS ≠ Event Sourcing.
CQRS separates reads from writes.
Event Sourcing stores history as events.
They complement but do not require each other.
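A short sketch may help separate the two ideas. Event Sourcing on its own means current state is derived by folding over the event history, with no separate read model involved. The event names `OrderPaid` and `OrderCancelled` and the state shape below are illustrative assumptions.

```python
from functools import reduce


def apply(state: dict, event: dict) -> dict:
    """Pure transition function: previous state + one event -> next state."""
    kind = event["type"]
    if kind == "OrderCreated":
        return {"status": "pending", "total": event["total"]}
    if kind == "OrderPaid":
        return {**state, "status": "paid"}
    if kind == "OrderCancelled":
        return {**state, "status": "cancelled"}
    return state  # unknown event types are ignored


def rehydrate(events: list) -> dict:
    """Replay the full history to obtain the current state."""
    return reduce(apply, events, {})


history = [
    {"type": "OrderCreated", "total": 42.0},
    {"type": "OrderPaid"},
]
print(rehydrate(history))  # {'status': 'paid', 'total': 42.0}
```

Adding CQRS on top would mean feeding the same events into projections that maintain query-shaped views, rather than replaying on every read.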

Production Trade-offs: Consistency, Complexity, and Cost

Deploying CQRS in production introduces three core trade-offs you must design for. First, eventual consistency: your read model lags behind the write model. You must decide acceptable staleness per use case — 1 second for dashboards, near-zero for payment confirmations. Second, operational complexity: you now have two databases, an event bus, projections, and monitoring. Each component becomes a failure domain. Third, cost: storing two copies of data (write model + read model) doubles storage. Denormalised read models can be larger due to redundant data. You also pay for the event bus infrastructure. However, read query performance improvements can offset these costs by reducing need for read replicas and expensive joins.

Production Insight
Choose the right consistency guarantees per endpoint. Not all reads need immediate consistency.
Set SLOs for projection lag (e.g., p99 < 200ms) and alert on violation.
Plan for failure of event bus — have fallback that queries write model directly for critical reads.
Document the data flow: each read should know whether it's eventually consistent or not.
Key Takeaway
Eventual consistency is a design parameter, not a bug.
Complexity scales with number of read models — each one must be built, tested, and monitored.
Cost of storage and infrastructure is offset by query performance and scalability.
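Choosing consistency per endpoint can be as simple as a staleness budget table consulted at read time. The sketch below is one possible shape, with hypothetical endpoint names and budgets: reads go to the read model only when the measured projection lag fits the endpoint's budget, and otherwise fall back to the write model.

```python
# Hypothetical staleness budgets per endpoint, in milliseconds.
# A budget of 0 means "always read from the write model".
STALENESS_BUDGET_MS = {
    "dashboard": 1000,
    "order_history": 5000,
    "payment_confirmation": 0,
}


def choose_source(endpoint: str, projection_lag_ms: int) -> str:
    """Return 'read_model' when lag is within budget, else 'write_model'."""
    budget = STALENESS_BUDGET_MS.get(endpoint, 0)  # unknown endpoints default to fresh reads
    if budget > 0 and projection_lag_ms <= budget:
        return "read_model"
    return "write_model"


print(choose_source("dashboard", 300))             # read_model
print(choose_source("dashboard", 5000))            # write_model
print(choose_source("payment_confirmation", 0))    # write_model
```

Keeping the budgets in one table also doubles as the documentation of which reads are eventually consistent.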

CQRS Without Event Sourcing: The Database That Can't Decide What It Is

Here's where juniors get burned. They see CQRS and immediately pair it with Event Sourcing. That's not mandatory. You can implement CQRS with two separate databases right now: PostgreSQL for writes, Elasticsearch for reads. The command side writes normalized aggregates. The query side denormalizes into a search-optimized index.

This buys you independent scaling. Writes spike? Scale the write database. Reads spike? Scale the read database. No shared lock contention. The cost is eventual consistency: your write side commits, and your read side lags by milliseconds or seconds depending on your sync mechanism. That's a tradeoff you accept when your business logic demands audit trails and your read side needs to answer 'how many orders were placed in the last hour' without five JOINs.

The production pattern is simple: write to Postgres, publish a domain event to a message queue, consume that event and update Elasticsearch. No Event Sourcing. No replay. Just two databases talking through a queue.

CreateOrderHandler.java (Java)
// io.thecodeforge
import java.time.Instant;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class CreateOrderHandler {
    private final OrderRepository writeRepo;
    private final EventPublisher publisher;

    public CreateOrderHandler(OrderRepository writeRepo, EventPublisher publisher) {
        this.writeRepo = writeRepo;
        this.publisher = publisher;
    }

    @Transactional
    public OrderId handle(CreateOrderCommand command) {
        Order order = Order.create(command.customerId(), command.items());
        writeRepo.save(order);
        publisher.publish(new OrderCreatedEvent(
            order.id(),
            order.customerId(),
            order.total(),
            Instant.now()
        ));
        return order.id();
    }
}
Output
Order saved to write DB. Event published to queue.
Production Trap:
If your sync mechanism (queue, CDC, scheduled batch) fails, your read model goes stale. Monitor read-model lag aggressively. Set alerts at 30 seconds. Anything beyond that is a P1.
Key Takeaway
CQRS is about separate models, not separate histories. Use two databases and a queue. Skip Event Sourcing until you need replay.

Why Your Repository Pattern Won't Cut It for Complex Writes

Repository pattern is fine for CRUD. It is a death sentence for domain logic with invariants. Here's the problem: a repository exposes methods like save(Order). That's one atomic operation. But in a real system, placing an order might involve checking inventory, validating credit, applying a discount, and updating an account balance, all in one transaction. With a repository, you end up with a fat service that calls orderRepo.save(), inventoryRepo.update(), accountRepo.adjust(). That's where race conditions live: two simultaneous orders for the last item in stock both pass the check.

CQRS forces you to model the write as a command. A command is not a dumb data holder. It carries intent: PlaceOrderCommand, CancelOrderCommand, AddItemCommand. The command handler orchestrates the business logic in isolation, inside one transaction boundary where invariants can be enforced with concurrency control such as optimistic locking. It fires domain events after success. The query side picks up those events and builds whatever read projections it needs.

This separation enforces a boundary. Writes are transactions. Reads are snapshots. You stop mixing concerns, in your code and in your database. The complexity moves from 'how do I debug this race condition' to 'how do I handle a failed sync.' That's a better problem to have.

OrderProjectionUpdater.java (Java)
// io.thecodeforge
import org.springframework.stereotype.Component;
import org.springframework.transaction.event.TransactionalEventListener;

@Component
public class OrderProjectionUpdater {
    private final OrderReadRepository readRepo;

    public OrderProjectionUpdater(OrderReadRepository readRepo) {
        this.readRepo = readRepo;
    }

    @TransactionalEventListener
    public void on(OrderCreatedEvent event) {
        OrderProjection projection = new OrderProjection(
            event.orderId(),
            event.customerId(),
            event.total(),
            event.createdAt()
        );
        readRepo.upsert(projection);
    }

    @TransactionalEventListener
    public void on(OrderCancelledEvent event) {
        readRepo.updateStatus(event.orderId(), "CANCELLED");
    }
}
Output
Read projection updated after the write transaction commits (add @Async if the update should run on a separate thread).
Senior Insight:
Commands should be idempotent. If the queue redelivers a message, your command handler should detect that the order already exists and return. Defensive design pays off on Fridays at 5 PM.
Key Takeaway
Repository pattern is for simple CRUD wrappers. Commands are for complex business operations with invariants. CQRS forces the split.
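The "last item in stock" race is worth seeing in code. The sketch below shows one common way a command handler can enforce the invariant: an optimistic version check inside a single conditional UPDATE. The `inventory` table and `reserve_item` function are assumptions for illustration, using the standard-library `sqlite3` module, and the two calls simulate two handlers that both read version 1 before writing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER, version INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 1, 1)")
conn.commit()


def reserve_item(sku: str, seen_version: int) -> bool:
    """Decrement stock only if the row is unchanged since it was read."""
    with conn:
        cur = conn.execute(
            "UPDATE inventory SET stock = stock - 1, version = version + 1 "
            "WHERE sku = ? AND version = ? AND stock > 0",
            (sku, seen_version),
        )
    return cur.rowcount == 1  # 0 rows updated means someone else got there first


# Two handlers read version 1 at the same time, then both try to reserve.
first = reserve_item("widget", seen_version=1)
second = reserve_item("widget", seen_version=1)
print(first, second)  # True False
```

The losing handler gets a clean failure it can turn into a retry or a rejection, instead of overselling the item.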
Production incident · Post-mortem · Severity: high

Stale Read Model Leads to Customer Chargeback Spike

Symptom
Customers reported seeing old balances after making payments. Support tickets surged with 'I paid but my balance still shows due'. Payment gateway showed duplicate payment attempts at scale.
Assumption
The team assumed eventual consistency meant 'within a few seconds' — acceptable for the use case.
Root cause
The read model projection consumed events from a single Kafka partition that backed up under peak load. Projection latency grew linearly with event volume, reaching over 300ms on average, with tail latencies of 2+ seconds. Users hitting the read model within that window saw stale data and retried payments.
Fix
1) Added a read-after-write consistency check on the order confirmation endpoint — forces a projection refresh before returning 200. 2) Partitioned the event stream by customer ID to parallelise projections. 3) Added monotonic counters to the write model so the read model can reject stale updates.
Key lesson
  • Eventual consistency has a measurable latency bound — quantify it under peak load, don't assume 'eventual' means 'fast enough'.
  • Always add idempotency in the write side to handle duplicate commands from user retries.
  • Use read-after-write consistency for critical paths (payment confirmations, balance checks).
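Read-after-write on a critical path can be sketched with a version token: the command returns the version it produced, and the query waits briefly until the projection has applied at least that version, falling back to the write model on timeout. Everything here is a simplified assumption (a single global version counter, an in-memory `ReadModel`, polling instead of notifications), not the fix the team actually shipped.

```python
import time


class ReadModel:
    """In-memory stand-in for a projection that tracks the last applied version."""

    def __init__(self):
        self.applied_version = 0
        self.balances = {}

    def apply(self, version: int, customer_id: str, balance: int) -> None:
        self.balances[customer_id] = balance
        self.applied_version = version


def read_balance(read_model, customer_id, min_version, read_from_write_model,
                 timeout_s=0.2, poll_s=0.01):
    """Wait until the projection reaches min_version, else fall back to the write model."""
    deadline = time.monotonic() + timeout_s
    while True:
        if read_model.applied_version >= min_version:
            return read_model.balances[customer_id], "read_model"
        if time.monotonic() >= deadline:
            return read_from_write_model(customer_id), "write_model"
        time.sleep(poll_s)


rm = ReadModel()
write_model = {"c1": 0}  # the payment (version 5) has already committed here

# Projection is still lagging: the read falls back after the timeout.
print(read_balance(rm, "c1", 5, write_model.__getitem__, timeout_s=0.05))  # (0, 'write_model')

rm.apply(5, "c1", 0)  # projection catches up
print(read_balance(rm, "c1", 5, write_model.__getitem__))  # (0, 'read_model')
```

The timeout bounds how long a confirmation endpoint can block, so a stuck projection degrades to a slower read rather than a stale one.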
Production debug guide
Symptom → action guide for projection lag and stale read models (4 entries)
Symptom · 01
User sees stale data after a write (e.g., old balance, missing order)
Fix
Check projection lag: compare last event timestamp on the read model vs current time. If lag > expected, inspect event bus (Kafka consumer lag, RabbitMQ queue depth).
Symptom · 02
Read model missing some records entirely
Fix
Verify event stream completeness — replay events from the start and count expected vs actual projection writes. Look for event deserialisation errors in projection logs.
Symptom · 03
Inconsistent read model across replicas (same query returns different results)
Fix
Check if each replica consumes from the same event log with same offset. Enable deterministic projection replay (idempotent, order-independent).
Symptom · 04
Write succeeds but read model never updates
Fix
Confirm the write side publishes the event after commit. Use outbox pattern to ensure atomic write + event publish. Check event routing — the projection must subscribe to the correct event type.
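For the "missing records" symptom, a quick reconciliation of IDs between the two sides narrows down what to replay. This is a minimal sketch: `find_projection_gaps` is a hypothetical helper, and in practice the two ID lists would come from queries against the write and read stores.

```python
def find_projection_gaps(write_ids, read_ids) -> dict:
    """Compare IDs on both sides and report what the projection missed or kept wrongly."""
    write_set, read_set = set(write_ids), set(read_ids)
    return {
        "missing_in_read_model": sorted(write_set - read_set),
        "orphaned_in_read_model": sorted(read_set - write_set),
    }


gaps = find_projection_gaps(write_ids=[1, 2, 3, 4], read_ids=[1, 2, 4, 9])
print(gaps)  # {'missing_in_read_model': [3], 'orphaned_in_read_model': [9]}
```

Missing IDs point at dropped or failed events; orphaned IDs usually mean a delete or cancel event was never projected.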
CQRS Projection Lag Debug Cheat Sheet
Commands you can run to diagnose stale read models in a typical CQRS system with Kafka and PostgreSQL
Projection lag unknown
Immediate action
Check Kafka consumer group lag
Commands
kafka-consumer-groups --bootstrap-server localhost:9092 --group order-projection --describe
tail -n 100 /var/log/projection/application.log | grep 'LAG'
Fix now
If lag > 1000, restart the projection service to force rebalance. If persistent, partition the event stream.
Read model row count doesn't match write model
Immediate action
Count rows in both tables
Commands
SELECT COUNT(*) FROM write_orders; SELECT COUNT(*) FROM read_user_orders;
SELECT event_id, order_id FROM events WHERE event_type='OrderCreated' ORDER BY event_id DESC LIMIT 10;
Fix now
Replay projection from last known good offset: docker-compose run projection --replay-from-offset=12345
Read model returns stale data after write
Immediate action
Measure the time between write response and read model update
Commands
curl -w '%{time_total}' -X POST http://orders/create -d '{"item":"test"}'
SELECT NOW() - created_at AS age FROM read_user_orders WHERE order_id = <id>;
Fix now
If latency > 500ms, add a read-after-write sync endpoint for critical reads.

Key takeaways

1. CQRS separates the write model (commands) from the read model (queries): optimise each independently.
2. Read models are denormalised and pre-computed: fast reads, no joins at query time.
3. Read models are eventually consistent: updated asynchronously via events.
4. CQRS complexity is high: use only when read/write performance requirements genuinely diverge.
5. CQRS pairs naturally with Event Sourcing but does not require it.

Common mistakes to avoid

Not handling projection failures

Symptom
Read model becomes permanently stale after a transient error, and users see old data for days until manual intervention.
Fix
Make projections idempotent (upsert by key). Implement a replay mechanism that can reprocess events from any offset. Monitor projection error rate and lag with alerts.
Assuming eventual consistency is negligible

Symptom
The read model lag grows unbounded under load, causing critical features (e.g., balance checks) to behave incorrectly.
Fix
Quantify acceptable lag per query type. For time-sensitive reads, implement read-after-write consistency or route to write model directly.
Using the same database for both write and read models

Symptom
You don't get the performance or scalability benefits of CQRS — still fighting with the same bottleneck.
Fix
Use separate databases (or at least separate schemas/instances) optimised for each access pattern. Write model: normalised, ACID. Read model: denormalised, potentially no joins, can use columnar or document store.
Not designing commands to be idempotent

Symptom
If a command is retried due to network timeout, the system creates duplicate records or inconsistent state.
Fix
Assign a unique command ID to each command. The write side stores processed IDs and rejects duplicates. This prevents double processing when the event publish fails and the command is retried.
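The command-ID fix can be sketched in a few lines. `PaymentHandler` and its in-memory `processed` map are illustrative assumptions; a real write side would persist processed IDs in the same transaction as the write, typically behind a unique constraint.

```python
import uuid


class PaymentHandler:
    """Deduplicates commands by ID so client retries do not create second payments."""

    def __init__(self):
        self.processed = {}  # command_id -> result; persisted in a real system
        self.payments = []

    def handle(self, command_id: str, customer_id: str, amount: int) -> str:
        if command_id in self.processed:
            return self.processed[command_id]  # retry: return the original result
        payment_id = f"pay-{len(self.payments) + 1}"
        self.payments.append((payment_id, customer_id, amount))
        self.processed[command_id] = payment_id
        return payment_id


handler = PaymentHandler()
command_id = str(uuid.uuid4())  # generated once by the client, reused on every retry
first = handler.handle(command_id, "c1", 500)
retry = handler.handle(command_id, "c1", 500)
print(first == retry, len(handler.payments))  # True 1
```

The key design choice is that the client, not the server, generates the ID, so a retry after a timeout carries the same identity as the original attempt.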
Interview Questions on This Topic

Q01 (Junior): What is CQRS and what problem does it solve?
Q02 (Senior): What is eventual consistency in the context of CQRS?
Q03 (Senior): What is the difference between CQRS and Event Sourcing?
Q04 (Senior): How do you handle idempotency in a CQRS system?

Answer to Q01
CQRS stands for Command Query Responsibility Segregation. It solves the performance problem that arises when the same data model is used for both writes (normalised, with enforced business rules) and reads (often requiring joins). By separating into a write model that handles commands and a read model that serves queries, you can optimise each independently. The write model uses a normalised store with full validation, the read model uses denormalised, pre-joined views for fast queries. The trade-off is eventual consistency between the two models.
Frequently Asked Questions

01. What is the difference between CQRS and Event Sourcing?
02. When should I NOT use CQRS?
03. Can I use CQRS without Event Sourcing?
04. How do I ensure read-after-write consistency in CQRS?