Advanced 9 min · May 23, 2026

CQRS in Spring Boot: You Split the Database, Now Your Queries Are Stale

Q: Is CQRS the same as using a read replica database?

No. A read replica still reads from the same logical database as the write. CQRS uses a separate data store with a denormalized schema designed specifically for queries. A read replica is a copy of the write model. CQRS is a different model entirely.

Q: Do I need CQRS for my microservices?

Only if your read and write workloads have different performance requirements. If you have a heavy reporting query that runs 100 times per second and a write endpoint that runs 5 times per second, CQRS helps. If your CRUD is simple and uniform, start with normal CRUD. Don't add CQRS for fun — it adds complexity.

Q: What database should I use for the read model?

Elasticsearch for full-text search and analytics. DynamoDB for low-latency, single-key lookups. MongoDB for flexible document projections. PostgreSQL with denormalized tables for small-to-medium setups. Avoid Cassandra — it's not built for ad-hoc queries.

Q: How do I handle idempotency in CQRS?

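Use event IDs as idempotency keys. The projection service checks whether an event with that ID has already been processed. If yes, skip it. If no, process it, and record the event ID in the same transaction as the projection update. For the outbox, put a unique constraint on the event ID column to prevent duplicate rows. (A constraint on (aggregate_id, event_type) would block an aggregate from ever emitting the same event type twice, such as two status changes.)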
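A minimal sketch of that check, assuming each event carries a producer-assigned `eventId`. `OrderEvent` and `IdempotentProjector` are hypothetical names, and the in-memory set stands in for a processed-events table that a real projection would update in the same transaction as the read-model row.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical event shape: every event carries a unique ID assigned by the producer.
record OrderEvent(UUID eventId, String orderId, String status) {}

class IdempotentProjector {
    // Stand-in for a processed_events table keyed by event ID.
    private final Set<UUID> processedEventIds = new HashSet<>();
    // Stand-in for the read model: orderId -> status.
    final Map<String, String> statusByOrder = new HashMap<>();

    /** Applies the event once; returns false if this event ID was already processed. */
    boolean handle(OrderEvent event) {
        // Set.add returns false when the ID is already present: a duplicate delivery.
        if (!processedEventIds.add(event.eventId())) {
            return false;
        }
        statusByOrder.put(event.orderId(), event.status());
        return true;
    }
}
```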

Q: Can CQRS be used within a single monolithic application?

Yes. Use separate database schemas or separate databases. The write side uses a normalized schema. The read side uses a denormalized schema. Events can flow through an in-memory event bus instead of Kafka. This gives you the benefits of CQRS without the distributed system complexity. It's a good stepping stone before migrating to microservices.
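For the monolith case, the in-memory bus can be very small. This plain-Java sketch (`InMemoryEventBus`, `OrderPlaced`, and the projection class are hypothetical names) routes events by type to subscribers, and the read side keeps its own denormalized structure. It calls subscribers synchronously on the publisher's thread. In Spring Boot, `ApplicationEventPublisher` with `@TransactionalEventListener` (which runs listeners after the commit by default), plus `@Async`, is the usual way to keep that work off the write path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical event published by the write side.
record OrderPlaced(String orderId, String customerId, long totalCents) {}

/** Minimal in-process event bus: publish() calls every subscriber registered for that event type. */
class InMemoryEventBus {
    private final Map<Class<?>, List<Consumer<Object>>> subscribers = new HashMap<>();

    <E> void subscribe(Class<E> type, Consumer<E> handler) {
        subscribers.computeIfAbsent(type, t -> new ArrayList<>())
                   .add(event -> handler.accept(type.cast(event)));
    }

    void publish(Object event) {
        subscribers.getOrDefault(event.getClass(), List.of()).forEach(h -> h.accept(event));
    }
}

/** Read side: a denormalized "orders per customer" count kept in its own structure. */
class CustomerOrderCountProjection {
    final Map<String, Integer> ordersPerCustomer = new HashMap<>();

    void on(OrderPlaced e) {
        ordersPerCustomer.merge(e.customerId(), 1, Integer::sum);
    }
}
```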

CQRS pattern in Spring Boot microservices: real production failures, eventual consistency traps, and debug commands that save your on-call rotation.

Naren, Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Lessons pulled from things that broke in production.

✓ Production tested · Last updated July 04, 2026 · 1,697 articles, all by Naren

Before you start (⏱ 30 min)

✓Deep production experience
✓Understanding of internals and trade-offs
✓Experience debugging complex systems


⚡Quick Answer

CQRS separates read and write models so each can scale independently.
The write model emits events; the read model is a denormalized projection built from them.
Eventual consistency means stale reads: your users will sometimes see old data.
Kafka + Debezium is the battle-tested pipeline, not shared DBs.
If your read store is down, writes still succeed — and users get 500s on reads.

✦ Definition (~90s read)

What Is the CQRS Pattern in Spring Boot Microservices?

CQRS stands for Command Query Responsibility Segregation. You split your database into a write side (commands) and a read side (queries). Commands produce events. Queries consume those events to build denormalized projections. The write model is normalized, ACID, and transactional.


The read model is flat, indexed for the exact queries your UI needs. They are not the same database. If they are, you don't have CQRS. You have one database behind two APIs.

The split solves two problems. First, your write model doesn't get crushed by read traffic. Second, your read model doesn't get bogged down by write locks. But it introduces a third problem: eventual consistency. The read side is always behind. A few milliseconds in the best case. Seconds or minutes if the event pipeline breaks. You cannot hide this. You must design for it.

You need CQRS when your read queries have different performance and scaling requirements than your write commands. When reporting queries join 12 tables and take 30 seconds per request while your writes need sub-10ms latency. When your GraphQL API needs drastically different data shapes than your REST command endpoints. When one-size-fits-all CRUD gives you a database that can't do anything well.

Plain-English First

You have two cash registers. One takes orders and writes them in a book. The other just reads the book and tells customers their total. If the reader lags behind the writer, the customer sees yesterday's price. That's eventual consistency — and you have to tell the customer they might be wrong.

You're on-call. It's 2 AM. The ticket queue is exploding. Users report that their order total shows double the product price. Every third refresh, the total changes. The product team is screaming. Your instinct says 'data race' or 'cache bug'. You check Redis TTLs. You check the database for phantom writes. Nothing.

You go deeper. The orders service writes to PostgreSQL. The pricing service writes to MongoDB. A Kafka topic pipes events between them. You find it. The pricing projection lagged by 45 seconds. A burst of 10,000 orders arrived at the same time as a pricing update. The read model hadn't processed the new price yet. Users saw old prices on new orders.

Welcome to CQRS. You split the database for scaling and now your queries are stale.

I've seen this exact pattern kill a Black Friday launch. We had three services sharing a single PostgreSQL database. Reads and writes fought each other at 500 QPS. The fix was CQRS. But the first implementation was naive — one database with two connection pools. That's not CQRS. That's a shared database with extra latency.

Real CQRS means separate data stores. Writes go to an event store or a write-optimized database like Aurora. Reads come from a denormalized store like Elasticsearch or DynamoDB. Events flow through Kafka. Projections update asynchronously. This buys you independent scaling. It costs you eventual consistency.

You can't wave that away with 'it's fine for our use case'. It breaks. Predictably. Your web app shows stale data until the projection catches up. Your tests pass because they poll and wait. Production users don't poll. They hit refresh and get a different answer. You have to tell them the answer may be stale and show a loading state. That's the contract.

If you can't handle that, don't do CQRS. If you can, it saves your systems under load. Let me show you how to build it right and what breaks when you don't.

Command Model: The Write Side Is Not an ORM Dump

The command model handles writes. It validates invariants, applies business logic, and produces events. It does not serve queries. End of story. If your command handler returns the saved entity, you're doing CRUD, not CQRS.

The command handler should be a single-purpose class. One command, one handler. No generic save() methods. The handler receives a command object (DTO), validates it, checks invariants, and if all passes, applies the change and publishes an event. The event is the only output. If the client needs the new state, they query the read model.
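Here is a minimal sketch of that shape, assuming a hypothetical `ChangeShippingAddress` command and in-memory stand-ins for the write table and the outbox. The handler validates, applies the change, records exactly one event, and returns only a correlation ID, which a controller would send back with 202 Accepted. Business rejections are recorded as events rather than thrown, matching the validation advice later in this article.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical command DTO and the two events its handler can record.
record ChangeShippingAddress(String orderId, String newAddress) {}
record ShippingAddressChanged(UUID correlationId, String orderId, String newAddress) {}
record CommandRejected(UUID correlationId, String orderId, String reason) {}

/** One command, one handler. Its only outputs are an outbox entry and a correlation ID. */
class ChangeShippingAddressHandler {
    // In-memory stand-ins for the normalized write tables and the outbox table.
    final Map<String, String> addressByOrder = new HashMap<>();
    final Set<String> shippedOrders = new HashSet<>();
    final List<Object> outbox = new ArrayList<>();

    UUID handle(ChangeShippingAddress cmd) {
        // Malformed input is a client error, not a business rule.
        if (cmd.orderId() == null || cmd.newAddress() == null || cmd.newAddress().isBlank()) {
            throw new IllegalArgumentException("orderId and newAddress are required");
        }
        UUID correlationId = UUID.randomUUID();
        // Invariant: a shipped order's address can't change. Record the rejection as an event.
        if (shippedOrders.contains(cmd.orderId())) {
            outbox.add(new CommandRejected(correlationId, cmd.orderId(), "order already shipped"));
            return correlationId;
        }
        // In a real service, the state change and the outbox insert commit in one DB transaction.
        addressByOrder.put(cmd.orderId(), cmd.newAddress());
        outbox.add(new ShippingAddressChanged(correlationId, cmd.orderId(), cmd.newAddress()));
        return correlationId; // no entity, no state: clients query the read model
    }
}
```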

I've seen teams handwave this and return the entity from the command endpoint 'for convenience'. Then the client starts using that data instead of querying the read model. Fast forward six months. The write model has leaky abstractions. The read model is stale because no one uses it. You have a shared database with a facade.

Use the transactional outbox pattern. Write the event to an outbox table in the same transaction as the domain change. A separate process polls that table and publishes to Kafka. This guarantees at-least-once delivery. If you use Eventuate Tram or Debezium, you get this out of the box.

The command model database should be normalized. 3NF or BCNF. Indexes for the write path only. No read-oriented denormalization. That's the read model's job.

One more thing: commands are not queries. A command changes state. If it doesn't, it's a query. Don't call a query a command. Don't call a command a query. The distinction is behavioral, not architectural.

Production Trap:

If your command handler loads the entity and returns it, you've broken the read-write separation. The client will cache that response and never query the read model. Your read model becomes a dead letter. Always return 202 Accepted with no body, or a correlation ID only.

Production Insight

Saw a team use Jackson JSON views to return different fields for commands vs queries. That's not CQRS. That's a different JSON serializer.

Key Takeaway

Commands produce events, not responses. If your command returns data, you're doing CRUD.

[Figure: Microservices CQRS pattern (thecodeforge.io)]

Query Model: Denormalize Aggressively, But Don't Forget Versioning

The query model is a denormalized projection of the events. One table per query shape. No joins. No subqueries. You eat the storage cost to gain read speed. Each projection is built by a consumer that listens to events and writes directly to the query store.

This is where the real work lives. Your events are JSON blobs with nested objects. Your query store is flat columns and arrays. You need three pieces: the event consumer (a Kafka listener), the projection function, and the query repository.
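The projection function is the easiest piece to get right in isolation: a pure function from event to row. A sketch, assuming a hypothetical nested `OrderPlacedEvent` and a flat `OrderSummaryRow` shaped for a single list screen; the Kafka listener would call `project` and hand the row to the query repository.

```java
import java.util.List;

// Hypothetical nested event, as the write side publishes it.
record Customer(String id, String name) {}
record LineItem(String sku, int quantity, long unitPriceCents) {}
record OrderPlacedEvent(String orderId, Customer customer, List<LineItem> items) {}

// Flat row shaped for one query: the "order summary" list screen. No joins needed to serve it.
record OrderSummaryRow(String orderId, String customerName, int itemCount,
                       long totalCents, List<String> skus) {}

final class OrderSummaryProjection {
    private OrderSummaryProjection() {}

    /** Pure function: event in, read-model row out. No I/O, so it is trivial to unit test. */
    static OrderSummaryRow project(OrderPlacedEvent e) {
        int itemCount = e.items().stream().mapToInt(LineItem::quantity).sum();
        long totalCents = e.items().stream()
                .mapToLong(i -> i.quantity() * i.unitPriceCents())
                .sum();
        List<String> skus = e.items().stream().map(LineItem::sku).toList();
        return new OrderSummaryRow(e.orderId(), e.customer().name(), itemCount, totalCents, skus);
    }
}
```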

Version every projection. Add a version column or field. When the consumer applies an event, it records that event's version (or timestamp). Queries compare it with the latest write for the same entity. If the gap is within your acceptable staleness window (N seconds), return the data. If it's larger, return a 503 with a Retry-After header, or a loading state. This turns an eventual consistency problem into a user experience problem.
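A sketch of that staleness decision, assuming the query side knows two timestamps per entity: when the newest applied event was written, and when the latest write happened. Where that second timestamp comes from is an assumption (a watermark the consumer tracks, or a write token the client sends back). `StalenessGuard` and `QueryResponse` are hypothetical; a real controller would map the 503 case onto an HTTP response with a `Retry-After` header.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical response shape: HTTP status, body, and an optional Retry-After in seconds.
record QueryResponse(int status, String body, Long retryAfterSeconds) {}

class StalenessGuard {
    private final Duration window; // acceptable staleness for this query

    StalenessGuard(Duration window) {
        this.window = window;
    }

    /**
     * projectedUpTo: write time of the last event applied to this projection row.
     * latestWrite:   latest committed write time for the same entity.
     */
    QueryResponse respond(String body, Instant projectedUpTo, Instant latestWrite) {
        Duration behind = Duration.between(projectedUpTo, latestWrite);
        if (behind.compareTo(window) <= 0) {
            return new QueryResponse(200, body, null);
        }
        // Too stale: ask the client to retry instead of returning possibly wrong data with a 200.
        long retryAfter = Math.max(1, window.toSeconds());
        return new QueryResponse(503, null, retryAfter);
    }
}
```

Using the staleness window as the Retry-After value is an arbitrary choice in this sketch; any short delay works.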

Failures happen. The consumer crashes mid-event. The projection updates half the fields. You need idempotent projections — the same event applied twice produces the same result, not a duplicate row. Use upsert logic: if the entity exists, update; if not, insert. The event ID is your idempotency key.
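A sketch of an idempotent upsert, assuming hypothetical `PriceChanged` events and a `ProductRow` that remembers the last event ID applied to it. `ConcurrentHashMap.compute` builds the complete new row and swaps it in atomically per key, so readers never see a half-applied event, and a redelivered event leaves the row unchanged. In a database, the equivalent is one upsert statement per event.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical event and read-model row; the row remembers the last event applied to it.
record PriceChanged(String eventId, String sku, long priceCents) {}
record ProductRow(String sku, long priceCents, int updates, String lastEventId) {}

class ProductProjection {
    final Map<String, ProductRow> rows = new ConcurrentHashMap<>();

    /** Upsert: insert if absent, replace if present. The whole row is swapped, never field by field. */
    void apply(PriceChanged e) {
        rows.compute(e.sku(), (sku, current) -> {
            if (current == null) {
                return new ProductRow(sku, e.priceCents(), 1, e.eventId()); // insert
            }
            if (current.lastEventId().equals(e.eventId())) {
                return current; // same event redelivered: no change
            }
            return new ProductRow(sku, e.priceCents(), current.updates() + 1, e.eventId()); // update
        });
    }
}
```

Storing only the last event ID catches the common case of the same event delivered twice in a row. An older event replayed after a newer one still needs a version comparison, like the one in the OrderProjection.java example later in this article.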

Database choice matters. For high-read, low-write projections, use Elasticsearch or OpenSearch. For consistent, low-latency reads, use DynamoDB with DAX. For simple projections, use PostgreSQL with a denormalized schema and no joins. Do not use Cassandra for projections unless you love debugging tombstone issues.

One more thing: the query service must not talk to the write database. Not even for health checks. Separate connection pools, separate schemas, separate clusters. If they share a database, you don't have CQRS. You have a monolith with two APIs.

Senior Shortcut:

Use MongoDB or DynamoDB for projections. They handle partial updates and schema changes better than relational DBs. For concurrent writers, Spring Data's @Version gives you optimistic locking on MongoDB documents, and DynamoDB conditional updates do the same check server-side, which tends to hold up better under high concurrency.

Production Insight

Don't put read models in a relational database if you can avoid it. The normalization tax kills query performance at scale.

Key Takeaway

Denormalize until there are no joins. Store the query in the shape your API sends to the client. Version everything.

Event Sourcing vs CQRS: You Don't Need Both

People conflate CQRS with event sourcing. They're separate. CQRS splits read and write. Event sourcing stores state as a sequence of events. You can have one without the other.

CQRS without event sourcing: you have a normal write database and a separate read database. Events flow between them. The write database stores current state. Events are produced on change. The read database is built from those events. This is simpler. It also means you lose historical state: you can't replay events to rebuild the read model from scratch unless you kept them.

Event sourcing without CQRS: you store all events, but you still query the event store directly. This is painful. Querying a sequence of events to answer 'what's the current state?' requires replaying all events. You need snapshots. You need projection indexes. It's a write-optimized nightmare for reads.
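To make the replay cost concrete, here is a sketch of rebuilding current state from a snapshot plus the events after it. The points events are hypothetical (they mirror the loyalty example later in this section), and the sketch assumes events arrive sorted by `sequence`. Without a snapshot you start from `new Snapshot(0, 0)` and fold every event ever written.

```java
import java.util.List;

// Hypothetical loyalty-points events, each with a per-aggregate sequence number.
interface PointsEvent {
    long sequence();
}
record PointsEarned(long sequence, long points) implements PointsEvent {}
record PointsRedeemed(long sequence, long points) implements PointsEvent {}
record PointsExpired(long sequence, long points) implements PointsEvent {}

// A snapshot stores the balance as of a given sequence number.
record Snapshot(long uptoSequence, long balance) {}

final class PointsReplayer {
    private PointsReplayer() {}

    /** Balance = snapshot balance + every later event, applied in sequence order. */
    static long currentBalance(Snapshot snapshot, List<PointsEvent> events) {
        long balance = snapshot.balance();
        for (PointsEvent e : events) {
            if (e.sequence() <= snapshot.uptoSequence()) {
                continue; // already folded into the snapshot
            }
            if (e instanceof PointsEarned earned) {
                balance += earned.points();
            } else if (e instanceof PointsRedeemed redeemed) {
                balance -= redeemed.points();
            } else if (e instanceof PointsExpired expired) {
                balance -= expired.points();
            } else {
                throw new IllegalArgumentException("Unknown event: " + e);
            }
        }
        return balance;
    }
}
```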

You need both when you need complete audit trails. Financial systems. Compliance-heavy domains. Multi-version concurrency with serializable isolation. If you don't need those, don't pay the complexity tax.

Here's the production truth: event sourcing is for when you need to know not just the current state, but how you got there. CQRS is for when your read and write workloads have fundamentally different scaling profiles. They overlap but are not dependent.

My team once built an event-sourced CQRS system for a customer loyalty program. The events were 'points earned', 'points redeemed', 'points expired'. The read model was a current balance. We never needed the event history. We could have stored just the balance in a single row. The event store was dead weight. Don't be that team.

Interview Gold:

When asked 'How do you rebuild a read model?', say 'Replay all events from the event store, or from the last checkpoint. If you don't have an event store, you need a snapshot table or a periodic full rebuild from the write database.'

Production Insight

Most CQRS systems don't need event sourcing. If you retain the event stream between write and read long enough, it's enough for replay.

Key Takeaway

CQRS splits read from write. Event sourcing stores all events. They are orthogonal. Choose both only when you need audit trails.


Eventual Consistency: You Can't Hide It, So Design For It

Eventual consistency is not a bug. It's a feature of distributed systems. If you try to hide it, you end up with synchronous event processing, which defeats the purpose of CQRS. The write model slows down waiting for the read model to update. You lose the scaling benefit.

Design for staleness. The UI must handle stale data. Show loading spinners, skeleton screens, or a 'this data is N seconds old' indicator. The API must return stale data gracefully. If the read model is behind, return a warning header such as Warning: 299 - "stale-data". (RFC 9111 marks the Warning header obsolete, so a custom header of your own works just as well.) The client can decide to retry or show cached data.

Set SLAs. Define acceptable staleness per entity. Order status can be 5 seconds stale. Inventory count can be 30 seconds. User profile can be 2 minutes. The projection service must meet these SLAs. Monitor consumer lag. Alert when lag exceeds the SLA.
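A sketch of a per-entity SLA check using the numbers above. It assumes time-based lag: the write timestamp of the oldest event the projection has not yet processed, empty when it is caught up. Message-count lag from `kafka-consumer-groups --describe` doesn't translate directly into seconds, so how you obtain that timestamp is an assumption here. The entity-type keys are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;

/** Per-entity staleness SLAs (values from the text; the keys are hypothetical). */
final class StalenessSla {
    static final Map<String, Duration> SLA = Map.of(
            "order-status", Duration.ofSeconds(5),
            "inventory-count", Duration.ofSeconds(30),
            "user-profile", Duration.ofMinutes(2));

    private StalenessSla() {}

    /**
     * oldestUnprocessedWrite: write time of the oldest event the projection hasn't applied yet,
     * or empty if it is caught up. Returns true when lag exceeds the SLA, i.e. time to alert.
     */
    static boolean breachesSla(String entityType, Optional<Instant> oldestUnprocessedWrite, Instant now) {
        Duration sla = SLA.get(entityType);
        if (sla == null) {
            throw new IllegalArgumentException("No SLA defined for " + entityType);
        }
        return oldestUnprocessedWrite
                .map(written -> Duration.between(written, now).compareTo(sla) > 0)
                .orElse(false); // caught up: no lag, even if the last event is old
    }
}
```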

What happens when the projection service is down for 10 minutes? The read model is 10 minutes stale. Users create orders against a 10-minute-old inventory count. You sell items that no longer exist. The fix is to reject the command at the write model if the read model is too stale. Yes, that couples them again. But it's better than overselling.

Another pattern: read-your-writes consistency. When a user writes data, they should see it in the next read. For that user only. Other users may see stale data. Implement this by returning a write token (a timestamp or version) from the command. The client includes that token in the next query. The read model checks whether its projection has reached that token. If not, it waits briefly for the projection to catch up, then returns a 503 if it still hasn't.
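A sketch of the read side of read-your-writes, assuming the write token is the event's version and that a single counter covers the whole projection (real systems usually track this per partition or per entity). `ReadYourWritesGate` is a hypothetical name: the consumer calls `markApplied`, and the query handler calls `awaitVersion` with the client's token, answering 503 when it returns false.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;

class ReadYourWritesGate {
    private final AtomicLong appliedVersion = new AtomicLong(0);
    private final Object lock = new Object();

    /** Called by the projection consumer after it applies an event with this version. */
    void markApplied(long version) {
        appliedVersion.accumulateAndGet(version, Math::max); // never move backwards
        synchronized (lock) {
            lock.notifyAll(); // wake queries waiting for this version
        }
    }

    /** True once the projection reaches writeToken; false on timeout or interrupt. */
    boolean awaitVersion(long writeToken, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        synchronized (lock) {
            while (appliedVersion.get() < writeToken) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000;
                if (remainingMs <= 0) {
                    return false; // caller answers 503
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
            return true;
        }
    }
}
```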

I've seen teams put a distributed cache like Redis between the command and read model as a write-through cache. The command writes to both the write DB and Redis. The read model reads from Redis. This gives near-immediate reads after a write, at the cost of write latency. It is not strong consistency, though: it's another dual write, so a Redis failure after the DB commit leaves the two disagreeing. It works for small datasets. It fails for large ones because Redis memory fills up.

Never Do This:

Don't make the projection service synchronous. Don't block the command handler waiting for the read model to update. You'll bottleneck the write path and lose the entire benefit of CQRS.

Production Insight

A stale read that returns 200 is worse than a 503 with Retry-After. Users understand 'try again'. They don't understand 'the total changed'.

Key Takeaway

Design for eventual consistency, don't fight it. Set SLAs, monitor lag, and tell the client when data is stale.

Transactional Outbox Pattern: The Backbone of Reliable CQRS

The outbox pattern is how you guarantee events are published reliably. Write the event to an outbox table in the same database transaction as the domain change. A separate publisher polls the outbox for unpublished events and sends them to Kafka.

Why not publish the event directly from the command handler? Because the DB transaction might fail after the event is published. Now you have an event that says 'order created' but the order doesn't exist in the write database. This is the classic dual-write problem.

Use Debezium if you want CDC (change data capture). It reads the database transaction log and publishes events. No outbox table needed. But it requires the write database to support CDC (PostgreSQL logical replication, MySQL binlog). It also adds latency — the event is available only after the transaction log is flushed to disk.

Use a simple outbox table if you want low latency. Poll interval of 100ms. Batch size of 100. Deduplicate by event ID. This is what I use in production. It's simple, predictable, and easy to debug.
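A sketch of one relay pass with those settings, assuming an in-memory list in place of the outbox table and a `Consumer` in place of the Kafka send. `OutboxRow` and `OutboxRelay` are hypothetical. `pollOnce` mirrors a `SELECT ... WHERE published = false ORDER BY created_at LIMIT :n` query, and `purgePublished` is the TTL cleanup; in Spring you would drive `pollOnce` from something like `@Scheduled(fixedDelay = 100)`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

// One outbox row; 'published' flips to true after a successful send.
final class OutboxRow {
    final String eventId;
    final Instant createdAt;
    final String payload;
    boolean published;
    Instant publishedAt;

    OutboxRow(String eventId, Instant createdAt, String payload) {
        this.eventId = eventId;
        this.createdAt = createdAt;
        this.payload = payload;
    }
}

/** In-memory stand-in for the outbox table plus its relay: poll, publish, mark, clean up. */
class OutboxRelay {
    final List<OutboxRow> table = new ArrayList<>();
    private final Consumer<OutboxRow> sender; // a Kafka send in a real service

    OutboxRelay(Consumer<OutboxRow> sender) {
        this.sender = sender;
    }

    /** Oldest unpublished rows first, at most batchSize per pass. Returns rows sent. */
    int pollOnce(int batchSize, Instant now) {
        List<OutboxRow> batch = table.stream()
                .filter(r -> !r.published)
                .sorted(Comparator.comparing((OutboxRow r) -> r.createdAt))
                .limit(batchSize)
                .toList();
        for (OutboxRow row : batch) {
            sender.accept(row);   // a crash here, before the mark below, means a resend later
            row.published = true;
            row.publishedAt = now;
        }
        return batch.size();
    }

    /** TTL cleanup: delete published rows older than the retention window. */
    int purgePublished(Duration retention, Instant now) {
        int before = table.size();
        table.removeIf(r -> r.published && r.publishedAt.isBefore(now.minus(retention)));
        return before - table.size();
    }
}
```

The comment in `pollOnce` marks where duplicates come from: if the process dies between the send and the mark, the row goes out again on the next pass. That is the at-least-once guarantee, and it's why consumers dedupe by event ID.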

The outbox publisher must be idempotent. If it crashes after sending the event but before marking it as published, the event is sent twice. The consumer must handle duplicates — use idempotent projections (upsert by event ID).

I once saw an outbox publisher with a bug: it published events but never marked them as published. The table grew to 5 million rows. The poll query (SELECT * FROM outbox WHERE published = false) timed out. The publisher stopped. Not a single event was processed for 8 hours. The fix: add a LIMIT and an index on (published, created_at). Also add a TTL cleanup job — delete published events after 24 hours.

Don't use Quartz or cron jobs for polling. Use Spring's @Scheduled with a configurable rate. Or use a dedicated library like Outbox Runner. Keep it simple.

Senior Shortcut:

Add an index on (published, created_at) on the outbox table. Without it, the poll query scans the entire table as it grows. Add a TTL: delete published events older than 24 hours. The outbox table is not an event store — don't keep it forever.

Production Insight

The #1 outbox failure is the poll query timing out because of missing index. Add the index before you ship to production.

Key Takeaway

Outbox pattern guarantees at-least-once delivery. Poll with LIMIT. Index the right columns. Clean up old records.

Testing CQRS: It's Not Just Unit Tests

CQRS breaks your test assumptions. You can't test with a single database and in-memory mocks. You need three things: component tests for the command handler, component tests for the query handler, and integration tests for the event pipeline.

Command handler tests: mock the repository and the event publisher (or outbox writer). Test invariants and event generation. Don't test the database. Use a fake or an in-memory repository.

Query handler tests: mock the read repository. Test that the correct data is returned. Test that stale data warnings are generated. Test the projection function by feeding it events and checking the output in the read store.

Integration tests: spin up a real Kafka (use Testcontainers). Start the command service, the outbox publisher, and the query service. Send a command via REST. Wait for the event to flow through the pipeline. Query the read model and assert the result. This is the only test that proves the system works end-to-end.
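"Wait for the event" should mean polling with a deadline, not a fixed `Thread.sleep`. A minimal plain-Java helper with a hypothetical name; libraries such as Awaitility offer the same idea as `await().atMost(...).until(...)`.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Polls a condition until it holds or the timeout expires. */
final class Eventually {
    private Eventually() {}

    static boolean await(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false; // the read model never caught up: fail the test
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In the integration test, the condition would query the read model (through whatever client the test uses) and check for the new order, so the test passes as soon as the pipeline delivers and fails only after the deadline.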

Don't stub the Kafka producer. Don't mock the outbox publisher. These are the parts that break in production. Test them with real infrastructure.

Resilience tests: simulate consumer crash. Kill the projection service while events are in flight. Restart it. Assert that each event's effect is applied exactly once (idempotent projection). Simulate outbox publisher crash. Kill it. Restart. Assert no events are lost.

Performance tests: generate 10,000 events. Measure consumer lag. Measure read model update latency. Verify it meets your SLA. If it doesn't, tune the consumer batch size, parallelism, and read store write throughput.

I've seen teams skip integration tests because 'they're slow'. Then they ship an outbox bug that drops every 100th event. Logging found it 3 hours after deployment. 5,000 orders lost. Don't be that team.

The Classic Bug:

Your outbox publisher uses @Scheduled but the application has multiple instances. Now every instance polls the same outbox table and publishes the same events. Duplicate events everywhere. Fix: use a distributed lock (e.g., ShedLock) or a leader election.
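A simplified, in-memory sketch of the lease idea behind ShedLock's JDBC lock table (a row per lock name with `lock_until` and `locked_by` columns): acquiring is a conditional update that only succeeds once the previous lease has expired. `LeaseLock` is a hypothetical name, and ShedLock itself adds minimum and maximum lock durations (`lockAtLeastFor` and `lockAtMostFor` on `@SchedulerLock`).

```java
import java.time.Duration;
import java.time.Instant;

/** Only the current lease holder runs the outbox poll; everyone else skips this tick. */
class LeaseLock {
    private String owner;
    private Instant lockUntil = Instant.EPOCH;

    /** Like: UPDATE lock SET lock_until = :now + :lease, locked_by = :me WHERE lock_until <= :now */
    synchronized boolean tryAcquire(String instanceId, Duration lease, Instant now) {
        if (lockUntil.isAfter(now)) {
            return false; // another instance holds an unexpired lease
        }
        owner = instanceId;
        lockUntil = now.plus(lease);
        return true;
    }

    /** The holder releases early once its poll finishes. */
    synchronized void release(String instanceId, Instant now) {
        if (instanceId.equals(owner)) {
            lockUntil = now;
        }
    }
}
```

A lease can expire while its holder is paused (a long GC, for example), letting a second instance in, so idempotent consumers are still required.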

Production Insight

Duplicate events from multiple outbox publishers cost me a weekend. Use ShedLock. Or better, use Debezium which naturally single-threads the CDC pipeline.

Key Takeaway

Test the full pipeline with real infrastructure. Don't trust mocks for event-driven architecture.

Synchronizing Read and Write Models: The Glue That Breaks First

You split the models. Great. Now how does data get from the command side to the query side? This is where most teams fall apart. The synchronization layer isn't a simple DB trigger or a scheduled batch job. It's a message-driven pipeline that must handle failures, duplicates, and ordering.

When you update an order status in the write model, you publish an event. A subscriber picks up that event, transforms it into the read model's shape, and upserts the denormalized view. The hard part: your read model can lag or, worse, process events out of order. That's why your event payload must carry a version number or sequence ID. The subscriber must reject stale events or reorder them on arrival.

And never, ever rely on the event bus for exactly-once delivery. Your consumer must be idempotent. Check the version before applying the update. If the event's version is not newer than the current state, drop it. If it's newer, apply it. That's it. No magic.

OrderProjection.java (Java)

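import java.util.Optional;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

// Assumes OrderReadRepository extends Spring Data's CrudRepository<OrderReadModel, String>,
// so findById returns Optional and existsById is available
@Component
public class OrderProjection {

    private static final Logger log = LoggerFactory.getLogger(OrderProjection.class);

    private final OrderReadRepository readRepo;

    public OrderProjection(OrderReadRepository readRepo) {
        this.readRepo = readRepo;
    }

    @EventListener
    public void on(OrderCreatedEvent event) {
        // A replayed create event must not reset a row that later events already advanced
        if (readRepo.existsById(event.orderId())) {
            return;
        }
        OrderReadModel model = new OrderReadModel(
            event.orderId(),
            event.customerId(),
            event.totalAmount(),
            event.createdAt(),
            1  // version start
        );
        readRepo.save(model);
    }

    @EventListener
    public void on(OrderStatusUpdatedEvent event) {
        Optional<OrderReadModel> found = readRepo.findById(event.orderId());
        if (found.isEmpty()) {
            // Update arrived before the create event: skip here (or park it for retry)
            log.warn("No projection row yet for order {}", event.orderId());
            return;
        }
        OrderReadModel current = found.get();

        // Idempotent by version: drop duplicates and stale (out-of-order) events
        if (event.version() <= current.version()) {
            log.warn("Stale event dropped: {} <= {}", event.version(), current.version());
            return;
        }

        current.setStatus(event.newStatus());
        current.setVersion(event.version());
        readRepo.save(current);
    }
}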

Output

Order projection processes events. Version check prevents stale updates.

Production Trap:

Don't assume message brokers preserve order across retries. If a consumer crashes mid-flight, replayed events can arrive out of sequence. Always use versioning on the read model. Without it, you'll corrupt denormalized data silently.

Key Takeaway

The synchronizer is a state machine, not a copier. Check version before write, or your read side will lie to users.

Command Validation: Don't Let Garbage Enter the Write Side

The command model is the gatekeeper. It's the only place where business rules are enforced. Every command must be validated before it changes an aggregate. But here's the catch: don't put business validation in the controller. That's the wrong layer. The controller should only check for malformed input (null IDs, empty strings). The real guard lives in the command handler, right after it loads the aggregate, because business rules depend on the aggregate's current state and they change over time. For example: 'Can this user cancel an order that's already shipped?' The controller doesn't know that. The aggregate does.

Pattern: give your command a validation method that returns either a success or a failure. If it fails, publish a CommandRejectedEvent. Don't throw exceptions for expected business violations. Exceptions are for unexpected errors, not for 'order already fulfilled.' Use a validation result type. This keeps your audit log clean and your code predictable.

CancelOrderCommand.java (Java)

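import java.time.Instant;
import lombok.Value;
import org.springframework.stereotype.Component;

// Lombok @Value: final fields, all-args constructor, and getters (getOrderId(), getReason(), ...)
@Value
public class CancelOrderCommand {
    String orderId;
    String reason;
    Instant cancelledAt;

    public ValidationResult validate(OrderAggregate aggregate) {
        if (aggregate.isShipped()) {
            return ValidationResult.rejected("Cannot cancel shipped order");
        }
        if (aggregate.isCancelled()) {
            return ValidationResult.rejected("Order already cancelled");
        }
        return ValidationResult.accepted();
    }
}

// OrderRepository, EventPublisher, ValidationResult and the event types are application classes (not shown)
@Component
public class CancelOrderHandler {

    private final OrderRepository orders;
    private final EventPublisher events;

    public CancelOrderHandler(OrderRepository orders, EventPublisher events) {
        this.orders = orders;
        this.events = events;
    }

    public void handle(CancelOrderCommand cmd) {
        // Load first: business rules are checked against the aggregate's current state
        OrderAggregate aggregate = orders.load(cmd.getOrderId());
        ValidationResult result = cmd.validate(aggregate);
        if (result.isRejected()) {
            // Expected business violation: record a rejection event instead of throwing
            events.publish(new CommandRejectedEvent(cmd.getOrderId(), result.reason()));
            return;
        }
        aggregate.apply(new OrderCancelledEvent(cmd.getOrderId(), cmd.getReason(), cmd.getCancelledAt()));
        orders.save(aggregate);
    }
}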

Output

Command validation returns rejection event instead of throwing exception.

Sharp Edge:

If you throw a RuntimeException for a business rejection, the transaction rolls back and the attempt never reaches your event log. It shows up only as an error in application logs and metrics, mixed in with real failures. Use explicit rejection events for expected failures. They're cheaper, cleaner, and queryable.

Key Takeaway

Business rules are not exceptions. Validate commands in the handler, return rejection events, and keep your aggregates honest.

● Production incident · POST-MORTEM · severity: high

The 45-Second Pricing Lag — A Black Friday Near-Miss

Symptom

PagerDuty alert: 'Price mismatch rate > 5%'. Users report order total changes between step 2 and step 3 of checkout. Revenue reconciliation job fails — says 12% of orders have wrong totals.

Assumption

Thought it was a race condition in the checkout service. Checked write locks and transaction isolation levels. All normal.

Root cause

The pricing projection service processed Kafka messages with batch size 500 and poll interval 5 seconds. A price update event and 10,000 orders hit the same window. The projection service processed orders with old prices before processing the price update. The read model was 45 seconds stale under peak load.

Fix

Reduced the Kafka consumer batch size (max.poll.records) to 50. Set max.poll.interval.ms to 10 seconds. Added a version column to the materialized view. Queries now compare the view's version timestamp with the latest event timestamp. If they differ by more than 2 seconds, the UI shows a loading skeleton instead of stale data.

Key lesson

Eventual consistency isn't eventual enough for checkout.
Build staleness detection into the read path or don't use CQRS for pricing.

Production debug guide: symptom → root cause → fix for the failures that actually happen (4 entries)

Symptom · 01

Read model returns stale data — user sees yesterday's order status.

→

Fix

Check the Kafka consumer lag for your projection topic (kafka-consumer-groups --bootstrap-server ... --group ... --describe). High lag means consumer is stuck or slow. Check CPU on the consumer pod — likely too small. Check for poison pill messages (non-deserializable events). Also check if the write service published the event. If the write succeeded but the event didn't publish, you have no data for the projection to consume.

Symptom · 02

Commands return success but reads show no change.

→

Fix

Check the write service logs for successful event publication. Then check the Kafka topic for the message. If it's there, the projection service didn't consume it. Check the consumer group offset: it might have committed without processing due to auto-commit. Enable manual offset commit after processing. If the event is missing, the write service didn't publish it. Without a transactional outbox, the Kafka send can fail after the DB commit succeeds; with one, check that the outbox publisher is running and whether the row is still marked unpublished.

Symptom · 03

Read model inconsistent across replicas — two API calls return different data.

→

Fix

Check if your read replicas are behind different write leaders. If using DynamoDB global tables, check replication lag via CloudWatch. For Elasticsearch, check cluster health — unassigned shards cause partial results. The root cause is usually a partial deploy of the projection service — some pods have old code that processes events differently. Roll back to a single version.

Symptom · 04

High latency on read queries despite CQRS.

→

Fix

Check if your read model is still doing joins. If your projection service writes to a normalized schema, you've defeated CQRS. The read model must be denormalized. Check for missing indexes. Check the read model's write path — if the projection service writes too slowly, queries pile up. Also check if the read model is being used as a write cache (write-through, write-behind) — that violates the pattern.

★ Debug Cheat Sheet: commands for fast diagnosis in production

Kafka consumer lag > 1000

Immediate action

Check consumer pod CPU and memory. Check for poison pill messages.

Commands

kafka-consumer-groups --bootstrap-server broker:9092 --group pricing-projection-group --describe

kubectl top pod -l app=pricing-projection

Fix now

Increase consumer parallelism: add more partitions and consumers. Set max.poll.records=50. Enable manual offset commit.
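The consumer settings from this entry, as a sketch that uses Kafka's documented config key names as plain strings so it runs without the client library. `ProjectionConsumerConfig` is hypothetical; you would pass these `Properties` to a `KafkaConsumer` and call `commitSync()` only after the projection write succeeds.

```java
import java.util.Properties;

final class ProjectionConsumerConfig {
    private ProjectionConsumerConfig() {}

    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("max.poll.records", "50");      // smaller batches, as in the incident fix
        props.put("enable.auto.commit", "false"); // commit offsets only after the projection write
        return props;
    }
}
```

On parallelism: a consumer group assigns each partition to at most one consumer, so consumers beyond the partition count sit idle. That's why the fix adds partitions and consumers together.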

Write succeeds but read shows old data after 30 seconds

Elasticsearch returns partial results (some fields null)

Read replica behind primary by > 10 seconds

CQRS vs Traditional CRUD vs Event Sourcing

Dimension	Traditional CRUD	CQRS	CQRS + Event Sourcing
Read/Write coupling	Tight — one DB for both	Loose — separate stores	Loose — separate stores
Consistency model	Strong (within one DB)	Eventual (across stores)	Eventual (across stores + events)
Scaling	Shared bottleneck	Independent scaling	Independent scaling + audit
Historical state	Last known state only	Last known state only	Full event history
Complexity	Low	Medium — outbox, projections, lag	High — event store, snapshots, versioning
Best for	Small teams, simple CRUD	High read/write skew, reporting	Compliance, audit, multi-version concurrency

⚙ Quick Reference

2 commands from this guide

File	Command / Code	Purpose
OrderProjection.java	@Component	Synchronizing Read and Write Models
CancelOrderCommand.java	@Value	Command Validation

Key takeaways

CQRS separates read and write databases. If they share storage, you don't have CQRS.

Eventual consistency is a design constraint, not a bug. Design the UI for staleness.

Always use the transactional outbox pattern; never publish events from the command handler directly.

Version your projections. Detect staleness. Return warnings or 503 instead of wrong data.

Test the full pipeline with real Kafka and real databases. Mocks hide production failures.

Common mistakes to avoid

5 patterns

Using a single database table for both command and query models with two connection pools.

Symptom

Read queries still block writes. Performance doesn't improve.

Fix

Move queries to a separate, denormalized read store fed by events. A read replica only offloads traffic; it still carries the write model's schema. The command model should never serve queries.

Making the projection service synchronous — waiting for the read model to update before returning from the command.

Symptom

Command latency spikes as reads catch up. Throughput drops.

Fix

Don't wait for the projection. Write the event to the outbox in the command's transaction and return 202 immediately. The read model updates asynchronously. Set SLA monitoring.

Not versioning the projection — no way to detect staleness.

Symptom

Users see inconsistent data across refreshes. No metric to track lag.

Fix

Add a version column to the read model. The projection service increments it on each update. The query controller checks it against the write timestamp.

Publishing events from the command handler without using the outbox pattern.

Symptom

Events published for transactions that rolled back. Or events lost when the Kafka producer fails.

Fix

Use transactional outbox. Write event to outbox table in same DB transaction. Poll and publish separately.

Running multiple outbox publisher instances without a distributed lock.

Symptom

Duplicate events in Kafka. Read model processes each event multiple times (unless idempotent).

Fix

Use ShedLock or a leader election pattern. Or use Debezium which handles this natively.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

What happens to the read model when the write model publishes an event b...

Q02 · JUNIOR

Explain the difference between CQRS and event sourcing in simple terms.

Q03 · SENIOR

Design a CQRS system for an e-commerce catalog that supports 10,000 read...

Q04 · SENIOR

How do you handle schema changes in CQRS when the read model needs a new...

Q05 · SENIOR

Your CQRS system has 3 read replicas for the query database. Why does on...

Q06 · JUNIOR

Can CQRS work without a message broker like Kafka?

Q07 · SENIOR

How do you test eventual consistency in CQRS without relying on Thread.s...

Q08 · SENIOR

You need to rebuild the read model from scratch after a bug in the proje...

Q01 of 08 · SENIOR

What happens to the read model when the write model publishes an event but the projection service crashes before processing? How do you recover?

ANSWER

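If the consumer uses manual commit, the offset for that event was never committed. When the projection service restarts, it resumes from the last committed offset and reprocesses the event. If the consumer used auto-commit (bad), the offset may already have moved past the event, so the projection skips it. The message is still in the topic until retention expires, so you can reset the group's offsets (kafka-consumer-groups --reset-offsets) and replay; after that, recovery means republishing from the outbox table or a backup. Always use manual offset commits and idempotent projections.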


