Senior 3 min · March 17, 2026

CQRS Pattern — Projection Lag and Stale Read Pitfalls

A 300ms projection lag caused duplicate payments and chargebacks.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • CQRS separates write models (commands) from read models (queries) for independent optimisation
  • Commands change state using normalised stores with full business logic
  • Queries read from denormalised, pre-joined views optimised for display
  • Performance benefit: query latency can drop by around 70% on join-heavy reads because joins are eliminated at read time; measure your own workload before relying on that figure
  • Production gotcha: eventual consistency means stale reads until projection catches up
  • Biggest mistake: applying CQRS to simple CRUD — the complexity tax outweighs the gain
Plain-English First

Imagine you have a library with one desk for checking out books (write) and separate, faster desks just for looking up books (read). The checkout desk has all the rules — you need a card, books must be returned — but the lookup desks are lean, with pre-sorted shelves so you find any book instantly. CQRS is like having these two different desks instead of one that tries to do both.

The Core Pattern

CQRS splits your system into two logical halves: the command side (write) and the query side (read). The command side receives commands—imperative instructions like 'PlaceOrder' or 'UpdateProfile'—validates business rules, writes to a normalised store, and publishes an event. The query side subscribes to those events and builds denormalised read models that serve UI or API responses without joins. This separation lets you scale read and write independently: you might have 10 read replicas and 1 write master, or use different database technologies entirely (e.g., PostgreSQL for writes, Elasticsearch for reads).

Example (Python)
# Package: io.thecodeforge.python.system_design
from datetime import datetime, timezone

# NOTE: orders_db, order_items_db, event_bus and read_db are assumed to be
# injected infrastructure clients; they are not defined in this snippet.

# COMMAND: changes state; returns nothing or just an ID
class CreateOrderCommand:
    def __init__(self, user_id: int, items: list, total: float):
        self.user_id = user_id
        self.items = items
        self.total = total

class OrderCommandHandler:
    def handle(self, cmd: CreateOrderCommand) -> int:
        # Business logic: validate, apply rules
        if cmd.total <= 0:
            raise ValueError('Order total must be positive')

        # Write to normalised store
        order_id = orders_db.insert({
            'user_id': cmd.user_id,
            'total': cmd.total,
            'status': 'pending'
        })
        for item in cmd.items:
            order_items_db.insert({'order_id': order_id, **item})

        # Publish event for read model update. The timestamp is included
        # because the projection stores it as created_at.
        event_bus.publish('OrderCreated', {
            'order_id': order_id,
            'timestamp': datetime.now(timezone.utc).isoformat(),
            **vars(cmd),
        })
        return order_id

# QUERY: reads state — returns data, changes nothing
class GetUserOrdersQuery:
    def __init__(self, user_id: int, page: int = 1):
        self.user_id = user_id
        self.page = page

class OrderQueryHandler:
    def handle(self, query: GetUserOrdersQuery):
        # Read from DENORMALISED read model — no joins needed
        return read_db.query(
            'SELECT * FROM user_orders_view WHERE user_id = ? ORDER BY created_at DESC LIMIT 20 OFFSET ?',
            [query.user_id, (query.page - 1) * 20]
        )
Output
# Commands write to normalised DB; queries read from denormalised view
Production Insight
Commands must be idempotent. If the process fails after the write commits but before the event is published, the order exists with no event, and a retried command creates a duplicate order.
Use an outbox table: write both command result and event in the same database transaction.
Never let the command side depend on the read model being up to date.
Key Takeaway
Commands change state via normalised stores.
Queries read from denormalised, pre-joined views.
The two sides share no data store — only events.
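
The outbox approach mentioned in the Production Insight above can be sketched as follows. This is a minimal illustration using the standard-library sqlite3 module, not the article's actual implementation: the table names (orders, outbox), the function names and the publish callback are all assumptions made for the example.

```python
import json
import sqlite3

def create_order_with_outbox(conn, user_id, total):
    """Insert the order and its OrderCreated event in ONE transaction."""
    if total <= 0:
        raise ValueError("Order total must be positive")
    with conn:  # commits on success, rolls back on any exception
        cur = conn.execute(
            "INSERT INTO orders (user_id, total, status) VALUES (?, ?, 'pending')",
            (user_id, total),
        )
        order_id = cur.lastrowid
        payload = json.dumps({"order_id": order_id, "user_id": user_id, "total": total})
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", payload),
        )
    return order_id

def relay_outbox(conn, publish):
    """Publish unsent outbox rows; mark each as sent only after publish succeeds."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # may raise; the row stays unsent
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)

# Hypothetical schema and an in-memory "event bus" for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, event_type TEXT, payload TEXT,
                         sent INTEGER DEFAULT 0);
""")
published = []
order_id = create_order_with_outbox(conn, user_id=7, total=49.90)
relay_outbox(conn, lambda event_type, payload: published.append((event_type, payload)))
```

Because a row is marked as sent only after publishing, a crash between the two steps republishes the event on the next relay run. Delivery is therefore at-least-once, which is one more reason projections need to be idempotent.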

Maintaining the Read Model

Read models are not kept in sync by the write side. Instead, they are built and updated by projections—event handlers that listen to events and denormalise data into query-optimised tables. In the example below, an OrderProjection class listens for OrderCreated events and upserts a row in user_orders_view that includes denormalised user info and pre-joined item names. This removes all joins from the read path, making queries extremely fast. The trade-off is eventual consistency: there is a window between the write commit and the projection update where the read model is stale.

Example (Python)
# Read model is updated asynchronously by consuming events
class OrderProjection:
    """Keeps user_orders_view up to date by handling OrderCreated events."""

    def on_order_created(self, event):
        # Denormalise: join order + user + items into one read-optimised row
        user = users_db.get(event['user_id'])
        items = event['items']

        read_db.upsert('user_orders_view', {
            'order_id':   event['order_id'],
            'user_id':    event['user_id'],
            'user_name':  user['name'],         # denormalised
            'user_email': user['email'],         # denormalised
            'total':      event['total'],
            'item_count': len(items),            # pre-computed
            'item_names': ', '.join(i['name'] for i in items),  # pre-joined
            'status':     'pending',
            'created_at': event['timestamp']
        })

# Trade-off: the read model is eventually consistent with the write model.
# Between the OrderCreated event and the projection update there is a brief window of inconsistency.
Output
# Read model updated asynchronously — eventual consistency
Production Insight
Projection failures can leave the read model silently incomplete. If the event handler crashes mid-upsert, the affected row stays stale or missing until you replay the event.
Always make projections idempotent: use upsert with a deterministic key (e.g., order_id).
Monitor projection lag as a standard metric — set SLO thresholds and alert when breached.
Key Takeaway
Projections build denormalised read models from events.
Idempotency ensures safe replays after crashes.
Monitor lag — don't assume 'eventual' is always fast enough.
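
To make the idempotency and lag-monitoring points concrete, here is a small in-memory sketch. It assumes a dict stands in for user_orders_view and that each event carries a timezone-aware timestamp; the class name OrderProjectionSketch and the lag_samples list are illustrative, not part of the original example.

```python
from datetime import datetime, timedelta, timezone

class OrderProjectionSketch:
    """In-memory stand-in for user_orders_view, keyed by order_id."""

    def __init__(self):
        self.rows = {}         # order_id -> denormalised row
        self.lag_samples = []  # seconds between event creation and projection

    def on_order_created(self, event, now=None):
        now = now or datetime.now(timezone.utc)
        items = event["items"]
        # Upsert by deterministic key: replaying the event overwrites the
        # row with identical values instead of creating a duplicate.
        self.rows[event["order_id"]] = {
            "order_id": event["order_id"],
            "user_id": event["user_id"],
            "total": event["total"],
            "item_count": len(items),
            "item_names": ", ".join(i["name"] for i in items),
            "status": "pending",
            "created_at": event["timestamp"],
        }
        # Record projection lag for this event so it can be exported as a metric.
        self.lag_samples.append((now - event["timestamp"]).total_seconds())

created = datetime(2026, 3, 17, 12, 0, 0, tzinfo=timezone.utc)
event = {
    "order_id": 42, "user_id": 7, "total": 30.0,
    "items": [{"name": "pen"}, {"name": "ink"}],
    "timestamp": created,
}
projection = OrderProjectionSketch()
projection.on_order_created(event, now=created + timedelta(milliseconds=300))
projection.on_order_created(event, now=created + timedelta(seconds=2))  # replay
```

After the replay there is still exactly one row for order 42, while lag_samples holds one measurement per delivery. In a real system those samples would feed whatever metrics pipeline you already use.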

When to Apply CQRS (and When to Avoid It)

CQRS adds significant complexity: you now maintain two models, an event pipeline, and deal with eventual consistency. Apply it only when the benefits clearly outweigh the cost. The sweet spot is when read and write workloads have fundamentally different performance characteristics—for example, writes are transactional with frequent updates to many related tables, while reads need to aggregate data from multiple sources and serve high traffic with low latency. Avoid CQRS for simple CRUD apps where a single normalised model can handle both tasks efficiently. Start with a monolithic model, measure, and extract read models only when you hit a measurable performance bottleneck.

Production Insight
A common mistake is adopting CQRS preemptively 'because we might scale later'. Premature CQRS adds months of development overhead for no immediate gain.
Measure first: profile your read queries. If 80% of reads are simple lookups with <5ms latency, CQRS will not help.
Start with a query-optimised view (materialised view, secondary index) before splitting into a full read model.
Key Takeaway
CQRS adds complexity — use only when read/write needs genuinely diverge.
Measure before committing, not after.
Start simple, extract read models as a proven optimisation.
Should You Use CQRS?
If: Read and write workloads have similar latency and throughput requirements
Then: Do not use CQRS. A single model with proper indexing is simpler and correct.
If: Reads require expensive joins across multiple tables at high throughput
Then: CQRS is a good fit. Denormalise into a read model to eliminate joins.
If: You need different storage technologies for reads vs writes (e.g., Elasticsearch for search, PostgreSQL for transactions)
Then: CQRS is required. You need separate models per storage engine.
If: Team is new to event-driven architecture and eventual consistency
Then: Delay CQRS until the team is comfortable with async error handling and monitoring projection lag.

CQRS and Event Sourcing: Separate Patterns That Work Well Together

CQRS and Event Sourcing are often mentioned together but are independent. CQRS separates read and write models. Event Sourcing stores all state changes as an ordered sequence of immutable events, instead of current state. They combine naturally: the event store becomes the write model, and the projections build read models from those events. However, you can absolutely use CQRS without Event Sourcing — you can implement a simple write model that updates a normalised table and emits events to update the read model. Conversely, you can use Event Sourcing without CQRS by building a single model from events for both reads and writes (though that's unusual). The key insight: CQRS is about separation of concerns, not about storage strategy.

CQRS vs Event Sourcing
  • CQRS separates the act of writing (commands) from the act of reading (queries).
  • Event Sourcing stores every state change as an event — you replay events to get current state.
  • You can have CQRS without Event Sourcing: just update a normalised write table and publish events for projections.
  • You can have Event Sourcing without CQRS: though rare, you can read from the event stream directly.
  • Together: Event Sourcing provides the event stream that CQRS projections consume.
Production Insight
Teams often conflate CQRS with Event Sourcing and implement both when only one is needed.
If you need audit trails and full history, Event Sourcing is the natural fit; CQRS is optional.
If you need separate read/write models for performance, CQRS is necessary; Event Sourcing is optional.
Know which problem you're solving before picking both patterns.
Key Takeaway
CQRS ≠ Event Sourcing.
CQRS separates reads from writes.
Event Sourcing stores history as events.
They complement but do not require each other.
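
As a rough illustration of what "replay events to get current state" means in Event Sourcing, the sketch below folds an ordered event list into a state dict. The event names OrderPaid and OrderCancelled and the state shape are assumptions for the example; only OrderCreated appears in the article's own code.

```python
from functools import reduce

def apply(state, event):
    """Return the new state after applying one (kind, data) event."""
    kind, data = event
    if kind == "OrderCreated":
        return {"status": "pending", "total": data["total"]}
    if kind == "OrderPaid":
        return {**state, "status": "paid"}
    if kind == "OrderCancelled":
        return {**state, "status": "cancelled"}
    return state  # unknown event types are ignored

def current_state(events):
    """Rebuild current state by replaying every event in order."""
    return reduce(apply, events, {})

events = [("OrderCreated", {"total": 30.0}), ("OrderPaid", {})]
state = current_state(events)
```

Nothing here involves a separate read model, which is the point: Event Sourcing is a storage strategy, and CQRS projections are just one possible consumer of the same event list.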

Production Trade-offs: Consistency, Complexity, and Cost

Deploying CQRS in production introduces three core trade-offs you must design for. First, eventual consistency: your read model lags behind the write model. You must decide acceptable staleness per use case, for example 1 second for dashboards and near-zero for payment confirmations. Second, operational complexity: you now have two databases, an event bus, projections, and monitoring. Each component becomes a failure domain. Third, cost: storing two copies of the data (write model + read model) roughly doubles storage, and denormalised read models can be larger still due to redundant data. You also pay for the event bus infrastructure. However, read query performance improvements can offset these costs by reducing the need for read replicas and expensive joins.

Production Insight
Choose the right consistency guarantees per endpoint. Not all reads need immediate consistency.
Set SLOs for projection lag (e.g., p99 < 200ms) and alert on violation.
Plan for event bus failure: have a fallback that queries the write model directly for critical reads.
Document the data flow: each read endpoint should state whether it is eventually consistent or not.
Key Takeaway
Eventual consistency is a design parameter, not a bug.
Complexity scales with number of read models — each one must be built, tested, and monitored.
Cost of storage and infrastructure is offset by query performance and scalability.
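
A lag SLO such as "p99 < 200ms" can be checked with a few lines of code. The sketch below uses a nearest-rank percentile over a list of lag samples in milliseconds; the function names and the sample data are made up for illustration, and a production system would more likely compute this in its metrics backend.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

def lag_slo_breached(lag_ms_samples, slo_ms=200, pct=99):
    """True when the chosen percentile of projection lag exceeds the SLO."""
    return percentile(lag_ms_samples, pct) > slo_ms

# 98 fast projections plus two slow ones, in milliseconds (invented data).
samples = [50] * 98 + [300, 2000]
breached = lag_slo_breached(samples)
```

With these samples the median looks healthy at 50ms, yet the p99 check fails. Averages hide exactly the tail that users hit when they refresh right after a write.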
Production incident · Post-mortem · Severity: high

Stale Read Model Leads to Customer Chargeback Spike

Symptom
Customers reported seeing old balances after making payments. Support tickets surged with 'I paid but my balance still shows due'. Payment gateway showed duplicate payment attempts at scale.
Assumption
The team assumed that eventual consistency "within a few seconds" was acceptable for the use case.
Root cause
The read model projection consumed events from a single Kafka partition that backed up under peak load. Projection latency grew linearly with event volume, reaching over 300ms on average, with tail latencies of 2+ seconds. Users hitting the read model within that window saw stale data and retried payments.
Fix
1) Added a read-after-write consistency check on the order confirmation endpoint — forces a projection refresh before returning 200. 2) Partitioned the event stream by customer ID to parallelise projections. 3) Added monotonic counters to the write model so the read model can reject stale updates.
Key lesson
  • Eventual consistency has a measurable latency bound — quantify it under peak load, don't assume 'eventual' means 'fast enough'.
  • Always add idempotency in the write side to handle duplicate commands from user retries.
  • Use read-after-write consistency for critical paths (payment confirmations, balance checks).
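
One way to approximate the first and third fixes is sketched below. It is a variation rather than the team's actual code: instead of forcing a projection refresh, the read path waits until the read model has reached the version returned by the write, and falls back to the write model on timeout. The class VersionedReadModel, the read_after_write helper and the version numbers are assumptions for the example.

```python
import time

class VersionedReadModel:
    """Read model whose rows carry the write-side monotonic version."""

    def __init__(self):
        self.rows = {}  # key -> (version, value)

    def apply(self, key, version, value):
        """Apply an update only if it is newer than what is stored."""
        current = self.rows.get(key)
        if current is not None and version <= current[0]:
            return False  # stale or duplicate update: ignore it
        self.rows[key] = (version, value)
        return True

    def get(self, key):
        return self.rows.get(key)

def read_after_write(read_model, key, min_version, read_from_write_model,
                     timeout_s=0.5, poll_s=0.01):
    """Serve from the read model once it reaches min_version, else fall back."""
    deadline = time.monotonic() + timeout_s
    while True:
        row = read_model.get(key)
        if row is not None and row[0] >= min_version:
            return row[1], "read-model"
        if time.monotonic() >= deadline:
            return read_from_write_model(key), "write-model"
        time.sleep(poll_s)

balances = VersionedReadModel()
balances.apply("cust-1", 1, 100.0)
balances.apply("cust-1", 3, 40.0)
accepted = balances.apply("cust-1", 2, 70.0)  # late, out-of-order update
fresh = read_after_write(balances, "cust-1", 3, lambda key: None)
fallback = read_after_write(balances, "cust-1", 4, lambda key: 25.0, timeout_s=0.05)
```

The late version-2 update is rejected, so the balance never moves backwards. The second read asks for a version the projection has not produced yet, so after the timeout it is answered from the write model instead of returning stale data.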
Production debug guide
Symptom → action guide for projection lag and stale read models
Symptom · 01
User sees stale data after a write (e.g., old balance, missing order)
Fix
Check projection lag: compare last event timestamp on the read model vs current time. If lag > expected, inspect event bus (Kafka consumer lag, RabbitMQ queue depth).
Symptom · 02
Read model missing some records entirely
Fix
Verify event stream completeness — replay events from the start and count expected vs actual projection writes. Look for event deserialisation errors in projection logs.
Symptom · 03
Inconsistent read model across replicas (same query returns different results)
Fix
Check whether each replica consumes from the same event log at the same offset. Enable deterministic projection replay (idempotent, order-independent).
Symptom · 04
Write succeeds but read model never updates
Fix
Confirm the write side publishes the event after commit. Use outbox pattern to ensure atomic write + event publish. Check event routing — the projection must subscribe to the correct event type.
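
For Symptom 02, comparing row counts only tells you that something is missing, not what. A set difference over order IDs pinpoints the gaps. The helper below is a hypothetical sketch that assumes you can export the order IDs from the event log and from the read model as plain lists.

```python
def missing_from_read_model(event_order_ids, read_model_order_ids):
    """Order IDs that have an OrderCreated event but no read model row."""
    return sorted(set(event_order_ids) - set(read_model_order_ids))

# Invented IDs: events exist for five orders, the read model has three rows.
event_ids = [101, 102, 103, 104, 105]
projected_ids = [101, 103, 105]
gaps = missing_from_read_model(event_ids, projected_ids)
```

The resulting list is the set of events to replay, which is usually cheaper than rebuilding the whole projection from the start.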
CQRS Projection Lag Debug Cheat Sheet
Commands you can run to diagnose stale read models in a typical CQRS system with Kafka and PostgreSQL
Projection lag unknown
Immediate action
Check Kafka consumer group lag
Commands
kafka-consumer-groups --bootstrap-server localhost:9092 --group order-projection --describe
tail -n 100 /var/log/projection/application.log | grep 'LAG'
Fix now
If consumer lag exceeds 1000 messages, restart the projection service to force a rebalance. If the lag persists, partition the event stream.
Read model row count doesn't match write model
Immediate action
Count rows in both tables
Commands
SELECT COUNT(*) FROM write_orders; SELECT COUNT(*) FROM read_user_orders;
SELECT event_id, order_id FROM events WHERE event_type='OrderCreated' ORDER BY event_id DESC LIMIT 10;
Fix now
Replay projection from last known good offset: docker-compose run projection --replay-from-offset=12345
Read model returns stale data after write
Immediate action
Measure the time between write response and read model update
Commands
curl -w '%{time_total}' -X POST http://orders/create -d '{"item":"test"}'
SELECT NOW() - created_at AS age FROM read_user_orders WHERE order_id = <id>;
Fix now
If latency > 500ms, add a read-after-write sync endpoint for critical reads.

Key takeaways

1. CQRS separates the write model (commands) from the read model (queries): optimise each independently.
2. Read models are denormalised and pre-computed: fast reads, no joins at query time.
3. Read models are eventually consistent: updated asynchronously via events.
4. CQRS complexity is high: use only when read/write performance requirements genuinely diverge.
5. CQRS pairs naturally with Event Sourcing but does not require it.

Common mistakes to avoid

Not handling projection failures

Symptom
Read model becomes permanently stale after a transient error, and users see old data for days until manual intervention.
Fix
Make projections idempotent (upsert by key). Implement a replay mechanism that can reprocess events from any offset. Monitor projection error rate and lag with alerts.

Assuming eventual consistency is negligible

Symptom
The read model lag grows unbounded under load, causing critical features (e.g., balance checks) to behave incorrectly.
Fix
Quantify acceptable lag per query type. For time-sensitive reads, implement read-after-write consistency or route to write model directly.

Using the same database for both write and read models

Symptom
You don't get the performance or scalability benefits of CQRS — still fighting with the same bottleneck.
Fix
Use separate databases (or at least separate schemas/instances) optimised for each access pattern. Write model: normalised, ACID. Read model: denormalised, potentially no joins, can use columnar or document store.

Not designing commands to be idempotent

Symptom
If a command is retried due to network timeout, the system creates duplicate records or inconsistent state.
Fix
Assign a unique command ID to each command. The write side stores processed IDs and rejects duplicates. This prevents double processing when the event publish fails and the command is retried.
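
The command ID approach can be sketched as below. This is an in-memory illustration under simplifying assumptions: the handler class and its dicts are hypothetical, and in a real system the processed-ID record and the order insert would need to be committed in the same database transaction.

```python
import uuid

class IdempotentOrderHandler:
    """Remembers processed command IDs and returns the earlier result on retry."""

    def __init__(self):
        self.orders = {}     # order_id -> order data
        self.processed = {}  # command_id -> order_id
        self._next_id = 1

    def handle(self, command_id, user_id, total):
        if command_id in self.processed:
            # Duplicate delivery or client retry: do not write again.
            return self.processed[command_id]
        if total <= 0:
            raise ValueError("Order total must be positive")
        order_id = self._next_id
        self._next_id += 1
        self.orders[order_id] = {"user_id": user_id, "total": total, "status": "pending"}
        self.processed[command_id] = order_id
        return order_id

handler = IdempotentOrderHandler()
command_id = str(uuid.uuid4())  # generated once by the client, reused on retry
first = handler.handle(command_id, user_id=7, total=49.90)
retry = handler.handle(command_id, user_id=7, total=49.90)
other = handler.handle(str(uuid.uuid4()), user_id=7, total=10.0)
```

The retry returns the same order ID as the first call and creates no second order, while a command with a fresh ID is processed normally. The key design choice is that the client, not the server, generates the ID, so a retry after a timeout carries the same identity.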

Interview Questions on This Topic

Q01 (Junior): What is CQRS and what problem does it solve?
Q02 (Senior): What is eventual consistency in the context of CQRS?
Q03 (Senior): What is the difference between CQRS and Event Sourcing?
Q04 (Senior): How do you handle idempotency in a CQRS system?

Q01 of 04 · Junior

What is CQRS and what problem does it solve?

ANSWER
CQRS stands for Command Query Responsibility Segregation. It solves the performance problem that arises when the same data model is used for both writes (normalised, with enforced business rules) and reads (often requiring joins). By separating into a write model that handles commands and a read model that serves queries, you can optimise each independently. The write model uses a normalised store with full validation, the read model uses denormalised, pre-joined views for fast queries. The trade-off is eventual consistency between the two models.

Frequently Asked Questions

01. What is the difference between CQRS and Event Sourcing?
02. When should I NOT use CQRS?
03. Can I use CQRS without Event Sourcing?
04. How do I ensure read-after-write consistency in CQRS?