Advanced 14 min · March 06, 2026

CQRS with Databases

CQRS Outbox Polling Failures — Connection Pool Exhaustion

Q: Do I need Event Sourcing to implement CQRS?

No — they're complementary but independent patterns. CQRS just means separating your read and write models. You can implement it with a traditional state-based write database (PostgreSQL with normal UPDATE statements) and use the Outbox Pattern to emit events that sync to read projections. Event Sourcing (storing events as the primary record instead of current state) pairs naturally with CQRS and adds an audit log and replayability, but it carries significant additional complexity and isn't required.

Q: How much eventual consistency lag is acceptable in a CQRS system?

That entirely depends on your domain. For an e-commerce order list, 500ms is invisible to users. For a bank balance display after a transfer, even 2 seconds may be unacceptable. The right answer is: measure your actual lag under realistic load, then decide per-feature whether to accept it, use read-your-writes consistency (serve the write-side response once after a command), or use optimistic UI updates on the client. There's no universal number.

Q: When should I NOT use CQRS?

CQRS adds real operational complexity — you're now maintaining two data models, a synchronization pipeline, idempotent consumers, and eventual consistency handling. Don't use it for simple CRUD services where reads and writes have the same shape, low-traffic internal tools, or any domain where the team isn't ready to operationalize the event pipeline. The clearest signals that CQRS is worth it: read-to-write ratio above 10:1, distinct read patterns per consumer (dashboards vs. lists vs. search), or a need for a full audit trail of state changes.

Q: What database indexing strategy should I use for the domain_events outbox table?

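The critical index is a partial index on `occurred_at` with the predicate `WHERE published = FALSE`, so the poller can fetch batches of unpublished events in order without touching published rows. Without a partial index, the poller's cost grows with the whole table as published rows accumulate. A separate index on `occurred_at` is not needed for the poller, because the partial index already supplies the ordering. If you use Debezium, it reads changes from the WAL rather than querying the table, so these indexes matter much less.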

Q: Can I use CQRS with a single database instance?

Technically yes, but you lose most benefits. If both read and write models live in the same PostgreSQL instance, they still compete for I/O, memory, and connections. PostgreSQL's MVCC means plain SELECTs don't block writers, but heavy or long-running read queries still consume shared buffers and CPU and can hold back vacuum on the write tables. To get real separation, run read and write on separate database instances (or at least separate schemas with different connection pools and roles). The minimal viable CQRS uses two independent database connections pointing to the same server but separate databases.

One broken connection stalled 50K+ events.

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

✓ Production tested · Last updated July 18, 2026 · 2,466 articles, all by Naren

Before you start (⏱ 30 min)

✓ Deep production experience
✓ Understanding of internals and trade-offs
✓ Experience debugging complex systems

⚡Quick Answer

CQRS splits your database into a write-optimized command model and read-optimized query models.
The write side uses normalized tables with full constraints for correctness.
The read side uses denormalized projections built from domain events.
Synchronization via the Outbox Pattern prevents dual-write failures.
Typical replication lag: 50–500ms under normal load; up to seconds during backpressure.
Biggest mistake: sharing the same database instance for both models, defeating the purpose.

✦ Definition (~90s read)

What is CQRS with Databases?

CQRS (Command Query Responsibility Segregation) is an architectural pattern that separates read and write operations into distinct models, each optimized for its specific purpose. Instead of forcing a single data model to handle both complex business logic updates and high-performance queries, CQRS lets you design a write model that enforces invariants and a read model that directly serves UI needs.

This pattern exists because traditional CRUD approaches often create a tug-of-war between transactional consistency and query performance — especially in systems with high concurrency or complex domain logic. You'd use CQRS when your read and write workloads have fundamentally different shapes, like an e-commerce platform where order placement (write) requires strict validation while product catalog queries (read) need denormalized, pre-joined data.

Don't use it for simple CRUD apps where a single model suffices — the added complexity of synchronization and eventual consistency isn't worth it.

In the ecosystem, CQRS often pairs with Event Sourcing (where state changes are stored as an immutable event log) and polyglot persistence (using different databases for different read models — PostgreSQL for relational queries, Elasticsearch for full-text search, Redis for caching). The core challenge is keeping the read side eventually consistent with the write side, typically via event-driven synchronization.

This is where patterns like the Outbox (writing events to a local table alongside the domain model) and Change Data Capture (CDC) come in. Real-world implementations include companies like Uber (using CQRS for trip management) and Shopify (for order processing), where write throughput and read latency are decoupled.

The trade-off is operational complexity: you now manage multiple data stores, handle replication lag, and deal with eventual consistency failures — like connection pool exhaustion when your Outbox polling mechanism overwhelms the database.

Plain-English First

Imagine a busy library. One desk handles all book returns and new arrivals — it carefully updates every record, checks for duplicates, and logs everything precisely. A completely separate desk handles 'what books do you have?' questions — it uses a big, pre-sorted poster on the wall for instant answers instead of digging through the filing cabinet. CQRS does exactly this for your database: one path writes data carefully and correctly, a completely separate path reads it fast. They don't share the same table, the same model, or even the same database engine. That's the whole idea.

Every system eventually hits the same wall: reads and writes want completely different things. Your write path needs strict consistency, complex validation, and transactional safety. Your read path needs denormalized, pre-joined data that returns in milliseconds without locking a single row. Shoving both responsibilities into one data model is like making your head librarian personally answer every 'where is the fiction section?' question between processing every book return. Something always suffers. In high-traffic production systems, it's usually reads — or worse, reads start killing writes through lock contention on shared tables.

Command Query Responsibility Segregation (CQRS) solves this by treating reads and writes as fundamentally different concerns at the database level, not just the application layer. It gives each side its own optimized data model, its own storage engine if needed, and its own scaling strategy. This isn't just an architectural buzzword — it's a direct response to the CAP theorem tensions, OLTP vs OLAP impedance mismatch, and the practical reality that read-to-write ratios in most production systems are anywhere from 10:1 to 1000:1.

By the end of this article you'll understand exactly how to design the write-side (command) and read-side (query) database models, how to build the synchronization layer between them without data loss, how to handle the eventual consistency window safely, and where CQRS genuinely helps versus where it adds complexity you don't need. You'll leave with runnable code, real schema designs, and the mental model to defend these decisions in a production architecture review.

CQRS with Databases — The Core Mechanic

CQRS (Command Query Responsibility Segregation) separates write and read data models. Instead of one schema for both inserts and queries, you maintain a command model optimized for writes and a separate read model optimized for queries. The two models are synchronized asynchronously, typically via an event-driven pipeline. This decoupling lets you scale reads independently, use different storage engines per model, and avoid write contention blocking read performance.

In practice, the write model emits events (e.g., OrderCreated) that a background process consumes to update the read model. The read model can be a denormalized projection, a materialized view, or even a completely different database. The synchronization introduces eventual consistency — the read model lags behind the write model by milliseconds to seconds. This trade-off is acceptable for many domains but requires careful handling of idempotency and ordering.
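The flow described above can be sketched in memory, without any database. This is only an illustration under stated assumptions: the class names `WriteModel` and `ReadModel`, the `sync` function, and the event fields are hypothetical and stand in for the real command store, projection store, and background consumer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    event_type: str   # e.g. "OrderCreated"
    order_id: str
    data: dict


@dataclass
class WriteModel:
    """Command side: validates, records state, then emits an event."""
    orders: Dict[str, dict] = field(default_factory=dict)
    pending_events: List[Event] = field(default_factory=list)

    def create_order(self, order_id: str, customer: str, total_cents: int) -> None:
        if total_cents < 0:
            raise ValueError("total_cents must be non-negative")
        if order_id in self.orders:
            raise ValueError(f"order {order_id} already exists")
        self.orders[order_id] = {"customer": customer, "total_cents": total_cents}
        self.pending_events.append(
            Event("OrderCreated", order_id,
                  {"customer": customer, "total_cents": total_cents})
        )


@dataclass
class ReadModel:
    """Query side: a denormalized per-customer total, ready to serve."""
    totals_by_customer: Dict[str, int] = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        if event.event_type == "OrderCreated":
            customer = event.data["customer"]
            self.totals_by_customer[customer] = (
                self.totals_by_customer.get(customer, 0) + event.data["total_cents"]
            )


def sync(write: WriteModel, read: ReadModel) -> int:
    """Stand-in for the background consumer: drain events into the read model."""
    applied = 0
    while write.pending_events:
        read.apply(write.pending_events.pop(0))
        applied += 1
    return applied


write, read = WriteModel(), ReadModel()
write.create_order("o-1", "alice", 4599)
stale = dict(read.totals_by_customer)   # read model has not caught up yet
sync(write, read)
fresh = dict(read.totals_by_customer)   # now reflects the command
```

The gap between `stale` and `fresh` is the eventual-consistency window: the command has succeeded, but the query side only sees it once the consumer has run.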

Use CQRS when your system has asymmetric read/write patterns: many more reads than writes, complex query shapes that don't map well to the write schema, or when write throughput is bottlenecked by read indexes. It's common in event-sourced systems, high-traffic e-commerce, and multi-tenant SaaS platforms. The cost is operational complexity — you now manage two data stores and a sync mechanism — so only adopt it when the benefits clearly outweigh that overhead.

🔥CQRS ≠ Event Sourcing

CQRS and event sourcing are often paired but are independent. You can do CQRS without event sourcing, and vice versa.

📊 Production Insight

A team ran CQRS with a single PostgreSQL instance for both command and read models, using a polling publisher to forward events. Under peak load, the polling queries and read-model updates consumed all 50 connections in the pool, starving the write path. The symptom: intermittent 500s on command endpoints with 'connection pool exhausted' errors. Rule of thumb: size the connection pool for the peak concurrent queries from both command and read sides, give the poller and projection workers their own bounded pool so they cannot starve command handlers, and consider a dedicated database instance for the read model.

🎯 Key Takeaway

CQRS decouples write and read models to optimize each independently.

Synchronization introduces eventual consistency — design for it, don't fight it.

Only adopt CQRS when the operational cost is justified by asymmetric read/write demands.

thecodeforge.io

Cqrs With Databases

The Write Side: Designing a Command Model That Protects Invariants

The command side of your database has one job: accept a change, validate it completely, and persist it durably. That means your write model is normalized, constraint-heavy, and optimized for correctness — not speed. You're not trying to return data fast here. You're trying to make sure that when you say 'this order is confirmed', it actually is, with no race conditions and no partial writes.

In practice, the write model typically lives in a relational database (PostgreSQL is a popular choice) with proper foreign keys, unique constraints, check constraints, and row-level locking where needed. The schema reflects your domain's invariants, not your UI's data needs. An 'Order' table doesn't have a denormalized customer name column — it has a foreign key to a Customers table because that constraint matters at write time.

The critical design decision is what you persist. Many teams using CQRS go one step further and adopt Event Sourcing on the write side: instead of storing the current state of an order, you store the sequence of events that produced it (OrderPlaced, PaymentReceived, OrderShipped). This makes the command model an append-only event log, which has profound implications: you get a full audit trail for free, replaying events rebuilds any read model, and in-place UPDATE contention on state rows goes away (concurrent commands against the same aggregate still need a version check so that conflicting appends are rejected). Even if you don't use Event Sourcing, your write model should emit domain events after each successful command. These events are the bridge to your read side.
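To make the Event Sourcing idea concrete, here is a minimal sketch of rebuilding an order's current state by folding over its events. The event names come from the paragraph above; the field names (`amount_cents`, `paid_cents`) and the functions `apply_event` and `rebuild_state` are assumptions made for illustration.

```python
from functools import reduce
from typing import Dict, List


def apply_event(state: Dict, event: Dict) -> Dict:
    """Return the next state after one event; never mutates the input."""
    kind = event["type"]
    if kind == "OrderPlaced":
        return {"order_id": event["order_id"], "status": "pending", "paid_cents": 0}
    if kind == "PaymentReceived":
        return {**state, "paid_cents": state["paid_cents"] + event["amount_cents"]}
    if kind == "OrderShipped":
        return {**state, "status": "shipped"}
    return state  # unknown event types leave the state unchanged


def rebuild_state(events: List[Dict]) -> Dict:
    """Current state is just a left fold over the append-only log."""
    return reduce(apply_event, events, {})


event_log = [
    {"type": "OrderPlaced", "order_id": "o-1"},
    {"type": "PaymentReceived", "order_id": "o-1", "amount_cents": 4599},
    {"type": "OrderShipped", "order_id": "o-1"},
]

current_state = rebuild_state(event_log)
state_after_first_event = rebuild_state(event_log[:1])
```

Because state is derived, replaying a prefix of the log gives the state at that point in time, which is where the audit-trail and replay properties come from.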

write_side_schema.sql (SQL)

-- ============================================================
-- WRITE SIDE (Command Model) — PostgreSQL
-- Normalized, constraint-heavy, optimized for correctness.
-- This schema is NEVER queried by the UI directly.
-- ============================================================

-- Core customer record — source of truth for identity
CREATE TABLE customers (
    customer_id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email           TEXT NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    -- Enforce unique emails at the DB level, not just the app layer
    CONSTRAINT uq_customers_email UNIQUE (email)
);

-- Orders table — normalized, references customers by FK
CREATE TABLE orders (
    order_id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id     UUID NOT NULL REFERENCES customers(customer_id),
    status          TEXT NOT NULL DEFAULT 'pending',
    total_cents     INTEGER NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    -- Prevent nonsense states at the DB level
    CONSTRAINT chk_orders_status CHECK (status IN ('pending','confirmed','shipped','cancelled')),
    CONSTRAINT chk_orders_total  CHECK (total_cents >= 0)
);

-- Order line items — fully normalized, no denormalized product names here
CREATE TABLE order_items (
    item_id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id        UUID NOT NULL REFERENCES orders(order_id) ON DELETE CASCADE,
    product_id      UUID NOT NULL,
    quantity        INTEGER NOT NULL,
    unit_price_cents INTEGER NOT NULL,
    CONSTRAINT chk_order_items_qty   CHECK (quantity > 0),
    CONSTRAINT chk_order_items_price CHECK (unit_price_cents > 0)
);

-- ============================================================
-- DOMAIN EVENTS TABLE — the outbox that feeds the read side
-- Every successful command inserts a row here atomically
-- in the same transaction as the state change.
-- ============================================================
CREATE TABLE domain_events (
    event_id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type  TEXT NOT NULL,          -- e.g. 'Order'
    aggregate_id    UUID NOT NULL,          -- e.g. the order_id
    event_type      TEXT NOT NULL,          -- e.g. 'OrderConfirmed'
    payload         JSONB NOT NULL,         -- full event data
    occurred_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    published       BOOLEAN NOT NULL DEFAULT FALSE  -- outbox pattern flag
);

-- Index for the outbox poller — only scans unpublished events
CREATE INDEX idx_domain_events_unpublished
    ON domain_events (occurred_at)
    WHERE published = FALSE;

-- ============================================================
-- Example: confirming an order (run as a single transaction)
-- Both the state change AND the event are written atomically.
-- If either fails, both roll back — no lost events.
-- ============================================================
BEGIN;

    -- 1. Update the write-side state
    UPDATE orders
    SET    status     = 'confirmed',
           updated_at = now()
    WHERE  order_id   = 'a1b2c3d4-0000-0000-0000-000000000001'
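    AND    status     = 'pending';   -- optimistic guard
    -- The application must check that this UPDATE reported exactly 1 row.
    -- If it reported 0 (the order was not pending), ROLLBACK instead of
    -- inserting the event below, or the read side will see a confirmation
    -- that never happened.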

    -- 2. Write the domain event in the SAME transaction (Outbox Pattern)
    INSERT INTO domain_events (aggregate_type, aggregate_id, event_type, payload)
    VALUES (
        'Order',
        'a1b2c3d4-0000-0000-0000-000000000001',
        'OrderConfirmed',
        '{"order_id": "a1b2c3d4-0000-0000-0000-000000000001", "confirmed_at": "2024-01-15T10:30:00Z"}'
    );

COMMIT;
-- Both rows land or neither does. The read side cannot miss this event.

Output

UPDATE 1

INSERT 0 1

COMMIT

⚠ Watch Out: The Dual-Write Trap

Never update the write-side state AND publish to a message broker (Kafka, RabbitMQ) in two separate operations. If the app crashes between them, your read model diverges silently. The Outbox Pattern shown above — writing the event to a DB table in the same transaction, then having a separate poller publish it — is the production-safe solution. Debezium with PostgreSQL logical replication is the zero-polling alternative.
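The atomicity argument can be checked locally with a small sketch. It uses Python's built-in `sqlite3` in place of PostgreSQL purely so it runs anywhere; the simplified tables, the `confirm_order` helper, and the `crash_before_commit` flag are assumptions for illustration, not part of the schema above.

```python
import json
import sqlite3
import uuid


def make_write_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            order_id TEXT PRIMARY KEY,
            status   TEXT NOT NULL
        );
        CREATE TABLE domain_events (
            event_id     TEXT PRIMARY KEY,
            aggregate_id TEXT NOT NULL,
            event_type   TEXT NOT NULL,
            payload      TEXT NOT NULL,
            published    INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def confirm_order(conn: sqlite3.Connection, order_id: str,
                  crash_before_commit: bool = False) -> bool:
    """State change and outbox row share one transaction: both land or neither."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            updated = conn.execute(
                "UPDATE orders SET status = 'confirmed' "
                "WHERE order_id = ? AND status = 'pending'",
                (order_id,),
            ).rowcount
            if updated != 1:
                return False  # guard did not match, so emit no event
            conn.execute(
                "INSERT INTO domain_events "
                "(event_id, aggregate_id, event_type, payload) "
                "VALUES (?, ?, 'OrderConfirmed', ?)",
                (str(uuid.uuid4()), order_id, json.dumps({"order_id": order_id})),
            )
            if crash_before_commit:
                raise RuntimeError("simulated crash before COMMIT")
    except RuntimeError:
        return False
    return True


conn = make_write_db()
with conn:
    conn.executemany("INSERT INTO orders VALUES (?, 'pending')", [("o-1",), ("o-2",)])

first = confirm_order(conn, "o-1")
crashed = confirm_order(conn, "o-2", crash_before_commit=True)
repeat = confirm_order(conn, "o-1")  # already confirmed, guard rejects it
statuses = dict(conn.execute("SELECT order_id, status FROM orders"))
event_count = conn.execute("SELECT count(*) FROM domain_events").fetchone()[0]
```

The simulated crash leaves `o-2` pending with no event, and the repeated command on `o-1` emits nothing, so the outbox only ever contains events for state changes that actually committed.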

📊 Production Insight

If the Outbox poller fails silently, events pile up and read models diverge.

Monitor domain_events table for unpublished rows and set up alerts.

Add dead-letter handling for events that repeatedly fail to process.
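One way to turn the monitoring advice into an alert is a small health check over the unpublished rows. The sketch below assumes the timestamps come from a query such as `SELECT occurred_at FROM domain_events WHERE published = FALSE`; the function name `outbox_health` and both thresholds are hypothetical and should be tuned to your own lag budget.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def outbox_health(
    unpublished_occurred_at: List[datetime],
    now: datetime,
    max_backlog: int = 1000,                              # assumed threshold
    max_oldest_age: timedelta = timedelta(seconds=30),    # assumed threshold
) -> Dict[str, Any]:
    """Summarize the outbox backlog and flag conditions worth alerting on."""
    backlog = len(unpublished_occurred_at)
    oldest_age = now - min(unpublished_occurred_at) if backlog else timedelta(0)

    alerts = []
    if backlog > max_backlog:
        alerts.append("backlog")   # events arriving faster than they drain
    if oldest_age > max_oldest_age:
        alerts.append("stalled")   # the oldest event has waited too long

    return {
        "backlog": backlog,
        "oldest_age_seconds": oldest_age.total_seconds(),
        "alerts": alerts,
    }


now = datetime(2024, 1, 15, 10, 30, 0, tzinfo=timezone.utc)
healthy = outbox_health([now - timedelta(seconds=1)], now)
stalled = outbox_health(
    [now - timedelta(minutes=5), now - timedelta(seconds=2)], now
)
```

Alerting on the age of the oldest unpublished event, not only on the row count, catches a silently dead poller even when write traffic is low.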

🎯 Key Takeaway

The write model enforces invariants with constraints.

The Outbox Pattern guarantees the read side never misses a committed change.

Dual-writes to DB and broker are a production antipattern.

Choose Your Event Transport

If: You have a single database and low event volume
→ Use: Simple Outbox poller with sleep interval works fine.

If: You need sub-100ms lag and can operate Debezium
→ Use: PostgreSQL logical replication + Debezium CDC.

If: You need high throughput and multiple consumers
→ Use: Outbox poller publishes to Kafka; consumers read from Kafka.

The Read Side: Denormalized Query Models Built for Your UI

The read side exists for one reason: return exactly the data a consumer needs in a single, cheap query. No joins. No aggregations at read time. No shared locks with the write side. The read model is a pre-computed view of your data, shaped specifically around how it will be consumed.

This is where CQRS pays its maintenance cost back with interest. Instead of every UI component issuing a 5-table join query, each feature has its own denormalized projection stored in a read-optimized store. An order summary page gets a flat 'order_summaries' view with customer name, item count, and total pre-joined. A customer history page gets a 'customer_order_history' projection sorted and paginated. Each projection is rebuilt by consuming the domain events emitted from the write side.

You're free to use different storage engines per projection. Order summaries might live in PostgreSQL with a covering index. A full-text product search projection might live in Elasticsearch. A real-time analytics projection might live in Redis sorted sets. This is called polyglot persistence — using the right tool for each specific read pattern instead of forcing one database to be everything to everyone.

The read model is also entirely disposable. Because it's derived from events, you can delete any projection and rebuild it by replaying the event log from the beginning. This is a superpower — it means schema migrations on the read side are never scary. Drop the old projection, deploy the new schema, replay events, cut over. No risky ALTER TABLE on a live system.
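The drop, replay, and cut-over sequence can be sketched as follows. Everything here is illustrative: the projector functions `project_v1` and `project_v2`, the `live` dictionary standing in for "whatever readers currently query", and the event fields are assumptions, and a real rebuild would replay from the stored event log instead of an in-memory list.

```python
from typing import Callable, Dict, List


def project_v1(events: List[dict]) -> Dict[str, dict]:
    """Old projection shape: status only."""
    rows: Dict[str, dict] = {}
    for event in events:
        if event["type"] == "OrderPlaced":
            rows[event["order_id"]] = {"status": "pending"}
        elif event["type"] == "OrderConfirmed":
            rows[event["order_id"]]["status"] = "confirmed"
    return rows


def project_v2(events: List[dict]) -> Dict[str, dict]:
    """New projection shape: adds total_cents without any ALTER TABLE."""
    rows: Dict[str, dict] = {}
    for event in events:
        if event["type"] == "OrderPlaced":
            rows[event["order_id"]] = {
                "status": "pending",
                "total_cents": event["total_cents"],
            }
        elif event["type"] == "OrderConfirmed":
            rows[event["order_id"]]["status"] = "confirmed"
    return rows


def rebuild_and_cut_over(
    live: Dict[str, Dict[str, dict]],
    name: str,
    projector: Callable[[List[dict]], Dict[str, dict]],
    events: List[dict],
) -> None:
    shadow = projector(events)   # build the new version off to the side
    live[name] = shadow          # readers see the old version until this swap


events = [
    {"type": "OrderPlaced", "order_id": "o-1", "total_cents": 4599},
    {"type": "OrderConfirmed", "order_id": "o-1"},
    {"type": "OrderPlaced", "order_id": "o-2", "total_cents": 1299},
]

live = {"order_summary": project_v1(events)}
before = dict(live["order_summary"]["o-1"])
rebuild_and_cut_over(live, "order_summary", project_v2, events)
after = dict(live["order_summary"]["o-1"])
```

Note that this only works if the events themselves were retained; an outbox that deletes published rows cannot be replayed from the beginning.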

read_side_projections.sql (SQL)

-- ============================================================
-- READ SIDE (Query Models): separate, writable PostgreSQL read database
-- Denormalized. No foreign keys enforced. No joins needed at query time.
-- These tables are populated by the event consumer, NOT by the app.
-- ============================================================

-- Projection 1: Order Summary List
-- Powers the "My Orders" page — returns everything needed in ONE query
CREATE TABLE order_summary_projection (
    order_id            UUID PRIMARY KEY,
    customer_id         UUID NOT NULL,
    customer_email      TEXT NOT NULL,    -- denormalized from Customers
    customer_name       TEXT NOT NULL,    -- denormalized from Customers
    status              TEXT NOT NULL,
    item_count          INTEGER NOT NULL, -- pre-aggregated
    total_cents         INTEGER NOT NULL,
    first_item_name     TEXT,            -- denormalized first item for display
    created_at          TIMESTAMPTZ NOT NULL,
    last_updated_at     TIMESTAMPTZ NOT NULL
);

-- Covering index: every column the list query selects is in the index,
-- so Postgres can use an index-only scan and skip the heap for pages
-- that VACUUM has marked all-visible.
CREATE INDEX idx_order_summary_customer_created
    ON order_summary_projection (customer_id, created_at DESC)
    INCLUDE (order_id, status, item_count, total_cents, first_item_name);

-- Projection 2: Customer Lifetime Value
-- Powers analytics — pre-aggregated so no GROUP BY at query time
CREATE TABLE customer_ltv_projection (
    customer_id         UUID PRIMARY KEY,
    customer_email      TEXT NOT NULL,
    total_orders        INTEGER NOT NULL DEFAULT 0,
    total_spent_cents   BIGINT  NOT NULL DEFAULT 0,
    last_order_at       TIMESTAMPTZ,
    first_order_at      TIMESTAMPTZ
);

-- ============================================================
-- Event consumer: applies domain events to keep projections current
-- This runs in a background worker that polls the domain_events outbox
-- (or listens to Kafka/RabbitMQ if you use a broker)
-- ============================================================

-- Handler for 'OrderConfirmed' event
-- Called by your event consumer with the raw event payload
CREATE OR REPLACE FUNCTION apply_order_confirmed_event(
    p_order_id      UUID,
    p_customer_id   UUID,
    p_confirmed_at  TIMESTAMPTZ
) RETURNS VOID AS $$
BEGIN
    -- Update the order summary projection status
    UPDATE order_summary_projection
    SET    status          = 'confirmed',
           last_updated_at = p_confirmed_at
    WHERE  order_id        = p_order_id;

    -- No need to touch order_items or customers — they're already denormalized
END;
$$ LANGUAGE plpgsql;

-- Handler for 'OrderPlaced' event — builds the initial projection row
CREATE OR REPLACE FUNCTION apply_order_placed_event(
    p_order_id          UUID,
    p_customer_id       UUID,
    p_customer_email    TEXT,
    p_customer_name     TEXT,
    p_item_count        INTEGER,
    p_total_cents       INTEGER,
    p_first_item_name   TEXT,
    p_placed_at         TIMESTAMPTZ
) RETURNS VOID AS $$
BEGIN
    -- INSERT ... ON CONFLICT makes this handler idempotent:
    -- replaying the same event twice won't create duplicate rows
    INSERT INTO order_summary_projection (
        order_id, customer_id, customer_email, customer_name,
        status, item_count, total_cents, first_item_name,
        created_at, last_updated_at
    ) VALUES (
        p_order_id, p_customer_id, p_customer_email, p_customer_name,
        'pending', p_item_count, p_total_cents, p_first_item_name,
        p_placed_at, p_placed_at
    )
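    ON CONFLICT (order_id) DO NOTHING;  -- idempotent: safe to replay

    -- FOUND is false when the row already existed, meaning this event was
    -- applied before. Return early so a replay cannot double-count the
    -- order in the cumulative LTV projection below.
    IF NOT FOUND THEN
        RETURN;
    END IF;

    -- Also update or insert the customer LTV projection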
    INSERT INTO customer_ltv_projection (
        customer_id, customer_email,
        total_orders, total_spent_cents,
        last_order_at, first_order_at
    ) VALUES (
        p_customer_id, p_customer_email,
        1, p_total_cents,
        p_placed_at, p_placed_at
    )
    ON CONFLICT (customer_id) DO UPDATE SET
        total_orders      = customer_ltv_projection.total_orders + 1,
        total_spent_cents = customer_ltv_projection.total_spent_cents + EXCLUDED.total_spent_cents,
        last_order_at     = GREATEST(customer_ltv_projection.last_order_at, EXCLUDED.last_order_at);
END;
$$ LANGUAGE plpgsql;

-- ============================================================
-- The actual query — no joins, no aggregations, blazing fast
-- ============================================================
SELECT
    order_id,
    status,
    item_count,
    total_cents,
    first_item_name,
    created_at
FROM  order_summary_projection
WHERE customer_id = 'cust-uuid-here'
ORDER BY created_at DESC
LIMIT 20;

Output

-------------------------------------+-----------+------------+-------------+----------------------+-------------------------

a1b2c3d4-0000-0000-0000-000000000001 | confirmed | 3 | 4599 | Mechanical Keyboard | 2024-01-15 10:28:00+00

b2c3d4e5-0000-0000-0000-000000000002 | pending | 1 | 1299 | USB-C Hub | 2024-01-14 08:15:00+00

(2 rows)

Time: 0.842 ms <-- single index scan, no joins

💡Pro Tip: Make Every Event Handler Idempotent

Your event consumer WILL process the same event twice eventually — network retries, consumer restarts, at-least-once delivery guarantees. Every handler must produce the same result whether it runs once or ten times. The ON CONFLICT DO NOTHING / DO UPDATE pattern shown above is your best friend. Store a processed_event_ids table as a second safeguard for events that don't map cleanly to upsert semantics.
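For cumulative projections, where a plain upsert would count a replayed event twice, the processed_event_ids safeguard might look like the sketch below. It uses Python's built-in `sqlite3` so it runs without a server; the `customer_ltv` table, the `apply_once` helper, and the event fields are simplified assumptions, not the article's exact schema.

```python
import sqlite3


def make_read_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processed_event_ids (event_id TEXT PRIMARY KEY);
        CREATE TABLE customer_ltv (
            customer_id       TEXT PRIMARY KEY,
            total_orders      INTEGER NOT NULL,
            total_spent_cents INTEGER NOT NULL
        );
        """
    )
    return conn


def apply_once(conn: sqlite3.Connection, event: dict) -> bool:
    """Apply an OrderPlaced event at most once; returns False for duplicates."""
    with conn:  # dedup marker and projection update commit together
        inserted = conn.execute(
            "INSERT OR IGNORE INTO processed_event_ids (event_id) VALUES (?)",
            (event["event_id"],),
        ).rowcount
        if inserted == 0:
            return False  # already processed, skip the cumulative update
        conn.execute(
            """
            INSERT INTO customer_ltv (customer_id, total_orders, total_spent_cents)
            VALUES (?, 1, ?)
            ON CONFLICT(customer_id) DO UPDATE SET
                total_orders      = total_orders + 1,
                total_spent_cents = total_spent_cents + excluded.total_spent_cents
            """,
            (event["customer_id"], event["total_cents"]),
        )
    return True


conn = make_read_db()
event = {"event_id": "e-1", "customer_id": "c-1", "total_cents": 1299}

first_apply = apply_once(conn, event)
replayed = apply_once(conn, event)  # redelivery of the same event
totals = conn.execute(
    "SELECT total_orders, total_spent_cents FROM customer_ltv WHERE customer_id = ?",
    ("c-1",),
).fetchone()
```

The key design choice is that the dedup row and the projection change are written in the same transaction; recording the event ID separately would reintroduce the dual-write problem on the read side.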

📊 Production Insight

Covering indexes can remove most heap lookups for known query patterns, provided every selected column is in the index and the table is vacuumed regularly.

Each projection is disposable — schema migrations are safe and cheap.

Idempotent handlers prevent double-counting in cumulative projections.

🎯 Key Takeaway

Read projections are flat, pre-computed views derived from domain events.

They are disposable — drop and rebuild without affecting writes.

Polyglot persistence: choose the right engine per read pattern.

Choose Storage Engine for Read Projection

If: Low latency, strong consistency needed
→ Use: PostgreSQL read replica with covering indexes.

If: Full-text search required
→ Use: Elasticsearch projection updated from events.

If: Real-time leaderboard or sorted set
→ Use: Redis sorted sets with event consumers.

Synchronization, Eventual Consistency, and the Replication Lag Problem

Here's the uncomfortable truth that many CQRS tutorials gloss over: after a command succeeds, there is a window — however brief — where your read model is stale. A user places an order, the write side commits, then they immediately refresh their order list and see nothing. This isn't a bug — it's eventual consistency by design. But if you don't handle it explicitly, your users will think your system is broken.

The synchronization pipeline has several moving parts, each with its own failure mode. The Outbox poller (or Debezium CDC process) reads unpublished events, publishes them to a broker, and marks them published. The consumer reads from the broker and applies events to projections. Each step introduces latency. Under normal conditions this is 50-500ms. Under load, broker backpressure, or a consumer restart, it can be seconds or minutes.

Production strategies for managing this lag window are critical. The simplest is 'read-your-writes consistency': after a successful command, the API response includes the current state of the relevant data (pulled from the write side, just this once) or a version number the client can poll against. The client-side strategy is to optimistically update the UI immediately on command success and reconcile when the read model catches up, which is the optimistic-update pattern many modern frontend frameworks support.
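A version-based read-your-writes check could be sketched like this. The `OrderApi` class, its `min_version` parameter, and the `source` field are hypothetical names used only to show the decision; in a real service the two dictionaries would be the write database and the read projection.

```python
from typing import Dict


class OrderApi:
    def __init__(self) -> None:
        self.write_side: Dict[str, dict] = {}   # source of truth
        self.read_side: Dict[str, dict] = {}    # projection, may lag behind

    def place_order(self, order_id: str) -> dict:
        self.write_side[order_id] = {"status": "pending", "version": 1}
        return dict(self.write_side[order_id])

    def confirm_order(self, order_id: str) -> dict:
        row = self.write_side[order_id]
        row["status"] = "confirmed"
        row["version"] += 1
        return dict(row)  # response carries write-side state and its version

    def catch_up(self, order_id: str) -> None:
        """Stand-in for the event consumer applying pending events."""
        self.read_side[order_id] = dict(self.write_side[order_id])

    def get_order(self, order_id: str, min_version: int = 0) -> dict:
        projected = self.read_side.get(order_id)
        if projected is not None and projected["version"] >= min_version:
            return {**projected, "source": "read"}
        # Projection is behind what this client already saw: fall back once.
        return {**self.write_side[order_id], "source": "write"}


api = OrderApi()
api.place_order("o-1")
api.catch_up("o-1")

response = api.confirm_order("o-1")
during_lag = api.get_order("o-1", min_version=response["version"])
api.catch_up("o-1")
after_sync = api.get_order("o-1", min_version=response["version"])
```

Only clients that pass a `min_version` pay for the write-side fallback, so the read path stays cheap for everyone who did not just issue a command.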

For the synchronization infrastructure itself, you have two main patterns: the polling Outbox (simple, works with any DB) and Change Data Capture with Debezium (zero-polling, sub-100ms lag, but adds operational complexity). In both cases, the event consumer must track its position (a cursor or Kafka offset) durably so it can resume after a crash without reprocessing from the beginning or missing events.
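Durable position tracking can be illustrated with a cursor row that advances in the same transaction as the projection update. This sketch uses Python's built-in `sqlite3` and keeps events, projection, and cursor in one database for brevity; the table names (`events`, `applied`, `consumer_cursor`), the monotonically increasing `seq` column, and `consume_batch` are assumptions. With a separate read database, the cursor would live next to the projection so both commit together.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (seq INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE applied (seq INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE consumer_cursor (
            name     TEXT PRIMARY KEY,
            last_seq INTEGER NOT NULL
        );
        """
    )
    return conn


def consume_batch(conn: sqlite3.Connection, name: str, batch_size: int = 2) -> int:
    """Apply the next batch and advance the cursor atomically."""
    with conn:  # projection rows and cursor move together or not at all
        row = conn.execute(
            "SELECT last_seq FROM consumer_cursor WHERE name = ?", (name,)
        ).fetchone()
        last_seq = row[0] if row else 0

        batch = conn.execute(
            "SELECT seq, body FROM events WHERE seq > ? ORDER BY seq LIMIT ?",
            (last_seq, batch_size),
        ).fetchall()
        if not batch:
            return 0

        conn.executemany("INSERT INTO applied (seq, body) VALUES (?, ?)", batch)
        conn.execute(
            "INSERT OR REPLACE INTO consumer_cursor (name, last_seq) VALUES (?, ?)",
            (name, batch[-1][0]),
        )
    return len(batch)


conn = make_db()
with conn:
    conn.executemany(
        "INSERT INTO events (body) VALUES (?)", [(f"e{i}",) for i in range(1, 6)]
    )

counts = [consume_batch(conn, "order_summary") for _ in range(4)]
applied = [row[0] for row in conn.execute("SELECT seq FROM applied ORDER BY seq")]
cursor_pos = conn.execute(
    "SELECT last_seq FROM consumer_cursor WHERE name = ?", ("order_summary",)
).fetchone()[0]
```

If the process dies mid-batch, the transaction rolls back and the cursor still points at the last fully applied event, so a restart neither skips events nor starts over from the beginning.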

outbox_poller_and_event_consumer.py (Python)

# ============================================================
# Outbox Poller + Event Consumer — Python with psycopg2
# Runs as a separate background service alongside your main app.
# Polls the domain_events outbox, publishes to a channel,
# and applies events to read-side projections.
# ============================================================

import psycopg2
import psycopg2.extras
import json
import time
import logging
from datetime import datetime, timezone
from typing import Dict, Callable, Any

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s')
logger = logging.getLogger(__name__)

# ── Database connection settings ──────────────────────────────
WRITE_DB_DSN = "postgresql://app_user:secret@write-db:5432/orders_write"
READ_DB_DSN  = "postgresql://app_user:secret@read-db:5432/orders_read"

# ── How many events to process in one polling batch ──────────
BATCH_SIZE = 100

# ── Pause between polls when no events are pending (seconds) ─
IDLE_SLEEP_SECONDS = 0.5


def fetch_unpublished_events(write_conn, batch_size: int) -> list:
    """Pull a batch of unpublished events from the outbox, ordered by time.
    We use SELECT ... FOR UPDATE SKIP LOCKED so multiple poller instances
    never process the same event — safe for horizontal scaling."""
    with write_conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cursor:
        cursor.execute("""
            SELECT event_id, aggregate_type, aggregate_id,
                   event_type, payload, occurred_at
            FROM   domain_events
            WHERE  published = FALSE
            ORDER  BY occurred_at ASC
            LIMIT  %s
            FOR UPDATE SKIP LOCKED
        """, (batch_size,))
        return cursor.fetchall()


def mark_events_published(write_conn, event_ids: list) -> None:
    """Mark the batch as published only AFTER the read side has committed.
    If the read-side apply fails, these stay unpublished and will be retried."""
    with write_conn.cursor() as cursor:
        cursor.execute("""
            UPDATE domain_events
            SET    published = TRUE
            WHERE  event_id  = ANY(%s::uuid[])
        """, (event_ids,))
    write_conn.commit()  # commit the mark-published update


def apply_order_placed(read_conn, payload: Dict[str, Any]) -> None:
    """Update the order_summary_projection and customer_ltv_projection.
    Uses ON CONFLICT to be idempotent — safe to replay."""
    with read_conn.cursor() as cursor:
        cursor.execute("""
            SELECT apply_order_placed_event(
                %s::uuid, %s::uuid, %s, %s, %s, %s, %s, %s::timestamptz
            )
        """, (
            payload['order_id'],
            payload['customer_id'],
            payload['customer_email'],
            payload['customer_name'],
            payload['item_count'],
            payload['total_cents'],
            payload.get('first_item_name'),
            payload['placed_at']
        ))
    read_conn.commit()
    logger.info(f"Applied OrderPlaced for order {payload['order_id']}")


def apply_order_confirmed(read_conn, payload: Dict[str, Any]) -> None:
    """Update status in the order summary projection."""
    with read_conn.cursor() as cursor:
        cursor.execute("""
            SELECT apply_order_confirmed_event(%s::uuid, %s::uuid, %s::timestamptz)
        """, (
            payload['order_id'],
            payload['customer_id'],
            payload['confirmed_at']
        ))
    read_conn.commit()
    logger.info(f"Applied OrderConfirmed for order {payload['order_id']}")


# ── Registry mapping event_type strings to handler functions ─
# Adding a new event type is one line here + one handler function.
EVENT_HANDLERS: Dict[str, Callable] = {
    'OrderPlaced':     apply_order_placed,
    'OrderConfirmed':  apply_order_confirmed,
    # 'OrderShipped':  apply_order_shipped,   # add as needed
    # 'OrderCancelled': apply_order_cancelled,
}


def run_outbox_poller():
    """Main loop: poll → apply → mark published. Runs forever."""
    write_conn = psycopg2.connect(WRITE_DB_DSN)
    read_conn  = psycopg2.connect(READ_DB_DSN)
    write_conn.autocommit = False  # we control transactions manually
    read_conn.autocommit  = False

    logger.info("Outbox poller started — watching for domain events...")

    while True:
        try:
            # Open a transaction on the write side to lock the batch
            events = fetch_unpublished_events(write_conn, BATCH_SIZE)

            if not events:
                write_conn.rollback()  # release the FOR UPDATE lock
                time.sleep(IDLE_SLEEP_SECONDS)
                continue

            logger.info(f"Processing batch of {len(events)} event(s)")
            processed_ids = []

            for event in events:
                event_type = event['event_type']
                payload    = event['payload']  # already a dict from JSONB

                handler = EVENT_HANDLERS.get(event_type)

                if handler is None:
                    # Unknown event type — log and skip (don't crash the poller)
                    logger.warning(f"No handler for event type '{event_type}' — skipping")
                    processed_ids.append(event['event_id'])
                    continue

                try:
                    # Apply the event to the read-side projection
                    handler(read_conn, payload)
                    processed_ids.append(event['event_id'])

                except Exception as apply_error:
                    # Read-side apply failed: roll back both sides and stop the batch.
                    # This event and any after it stay unpublished and retry next poll.
                    read_conn.rollback()
                    write_conn.rollback()
                    logger.error(
                        f"Failed to apply event {event['event_id']} "
                        f"({event_type}): {apply_error}"
                    )
                    break  # stop this batch; the back-off below delays the retry

            if processed_ids:
                # Mark as published only the events whose read-side apply already committed
                mark_events_published(write_conn, processed_ids)
                logger.info(f"Marked {len(processed_ids)} event(s) as published")

            if len(processed_ids) < len(events):
                # A handler failed part-way through the batch. Back off so a
                # poison event does not turn the poller into a hot retry loop.
                time.sleep(2)

        except Exception as poll_error:
            logger.error(f"Poller error: {poll_error}")
            try:
                write_conn.rollback()
                read_conn.rollback()
            except Exception:
                pass
            time.sleep(2)  # back off before retrying


if __name__ == '__main__':
    run_outbox_poller()

Output

2024-01-15 10:30:00,001 INFO Outbox poller started — watching for domain events...

2024-01-15 10:30:00,542 INFO Processing batch of 2 event(s)

2024-01-15 10:30:00,581 INFO Applied OrderPlaced for order a1b2c3d4-0000-0000-0000-000000000001

2024-01-15 10:30:00,612 INFO Applied OrderConfirmed for order a1b2c3d4-0000-0000-0000-000000000001

2024-01-15 10:30:00,643 INFO Marked 2 event(s) as published

-- (no unpublished events: nothing is logged, the poller sleeps 0.5s, then polls again)

2024-01-15 10:30:01,645 INFO Processing batch of 1 event(s)

2024-01-15 10:30:01,678 INFO Applied OrderPlaced for order b2c3d4e5-0000-0000-0000-000000000002

2024-01-15 10:30:01,699 INFO Marked 1 event(s) as published

🔥Interview Gold: Debezium vs. Outbox Polling

Interviewers love this trade-off. The Outbox Poller is simple, DB-agnostic, and needs no extra infrastructure, but polling adds query load on the write database and the minimum latency equals your sleep interval. Debezium with PostgreSQL logical replication reads the WAL directly, which can bring lag under 100ms with no polling overhead, but it requires a replication slot, careful WAL retention settings, and another service to operate. For most teams, start with the Outbox Poller. Graduate to Debezium when you can measure that you need it.

📊 Production Insight

Poller connection validation prevents silent failures.

Read-your-writes consistency solves the refresh-after-command problem.

Client optimistic updates make lag invisible to users.

🎯 Key Takeaway

Eventual consistency is not a bug — it's a design choice.

Handle the lag window with read-your-writes or optimistic UI.

Outbox Poller is production-proven; Debezium for sub-100ms lag.

Manage Consistency Window

If: User-facing feature with immediate feedback need → Use: Return write-side state in command response, plus optimistic UI update.

If: Background process (e.g., reporting) → Use: Accept eventual consistency; measure lag and alert on threshold.

If: Financial transactions requiring strong read consistency → Use: Read from write side for critical checks; use read model for non-critical queries.
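The first rule above (serve write-side state right after a command) can be sketched as a small router. This is a minimal illustration, not part of the poller shown earlier: `ReadYourWritesRouter`, its method names, and the per-user tracking are all assumptions made for the example, and the lag window should come from your own measurements.

```python
import time
from typing import Callable, Dict


class ReadYourWritesRouter:
    """Route a user's reads to the write side for a short window after a command."""

    def __init__(self, lag_window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.lag_window_seconds = lag_window_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_write_at: Dict[str, float] = {}

    def record_write(self, user_id: str) -> None:
        """Call after a command for this user commits on the write side."""
        self._last_write_at[user_id] = self._clock()

    def choose_source(self, user_id: str) -> str:
        """Return 'write' while the projection may be stale for this user, else 'read'."""
        last_write = self._last_write_at.get(user_id)
        if last_write is None:
            return "read"
        if self._clock() - last_write < self.lag_window_seconds:
            return "write"
        # Window expired: drop the entry so the map does not grow without bound.
        del self._last_write_at[user_id]
        return "read"
```

In a single-process service this state can live in memory; with several API instances it would need a shared store or a client-supplied token, which this sketch does not cover.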

Event Sourcing Integration: When Events Become the Source of Truth

Pure CQRS doesn't require Event Sourcing, but the two patterns form a natural pair. Event Sourcing means storing the stream of events that led to the current state, rather than the current state itself. The command side becomes an append-only event store. The read side still uses projections, but now they are built entirely from the event stream.

Why go this route? You get a complete audit log: every state change is recorded. You can rebuild any read projection from scratch at any time, even years later. You can debug production issues by replaying the exact sequence of events that led to a bug. And because the event store is append-only, write contention drops sharply: commands append new rows instead of updating shared ones, so row-level locks on hot rows stop blocking concurrent commands. Writers to the same stream still have to agree on the next version, which is what an optimistic concurrency check handles.

But there's a practical cost. Your queries for current state now require replaying events or maintaining snapshots. Event store schemas are less familiar to most developers. And you must version your events — adding a field to OrderPlaced after it's in production means handling old events that lack that field. Schema evolution on events is a real skill.
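One common way to handle old events that lack a newer field is an "upcaster" that lifts stored payloads to the latest shape before any handler sees them. The sketch below is illustrative only: the `currency` field, the version numbers, and the `UPCASTERS` registry are assumptions invented for the example, not part of the schema used elsewhere in this article.

```python
from typing import Any, Callable, Dict, Tuple

Payload = Dict[str, Any]


def _order_placed_v1_to_v2(payload: Payload) -> Payload:
    upgraded = dict(payload)  # never mutate the stored event
    # Hypothetical field added in version 2; old events get a default.
    upgraded.setdefault("currency", "USD")
    return upgraded


# Each entry lifts a payload from version N to version N + 1.
UPCASTERS: Dict[Tuple[str, int], Callable[[Payload], Payload]] = {
    ("OrderPlaced", 1): _order_placed_v1_to_v2,
}
LATEST_VERSION: Dict[str, int] = {"OrderPlaced": 2}


def upcast(event_type: str, version: int, payload: Payload) -> Payload:
    """Apply upcasters step by step until the payload is at the latest version."""
    latest = LATEST_VERSION.get(event_type, version)
    while version < latest:
        step = UPCASTERS.get((event_type, version))
        if step is None:
            raise ValueError(f"No upcaster for {event_type} v{version}")
        payload = step(payload)
        version += 1
    return payload
```

Handlers then only ever deal with the latest payload shape, and the stored events stay untouched.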

The common compromise: use a traditional normalized database on the write side (as shown earlier), but still emit domain events and store them durably. You get the replayability and audit trail without the full Event Sourcing operational burden. You can migrate to full Event Sourcing later if the need arises.

event_store_client.py (Python)

# ============================================================
# Minimal Event Store Client — io.thecodeforge.eventstore
# Append-only stream for Event Sourcing.
# Uses PostgreSQL as the store with snapshot support.
# ============================================================

import psycopg2
import psycopg2.extras
import json
import uuid
from datetime import datetime, timezone
from typing import Optional, Dict, List, Any


class EventStore:
    def __init__(self, dsn: str):
        self.dsn = dsn

    def append_event(self, stream_id: str, event_type: str,
                     data: Dict[str, Any], expected_version: Optional[int] = None) -> int:
        """Append an event to the stream. If expected_version is provided,
        the operation will fail if the stream version doesn't match (optimistic concurrency)."""
        conn = psycopg2.connect(self.dsn)
        try:
            with conn.cursor() as cur:
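                # Retrieve and lock the current version row for the stream.
                # For a brand-new stream there is no row to lock, so two concurrent
                # first appends can both read version 0. This sketch assumes a
                # UNIQUE (stream_id, version) constraint on events to reject the loser.
                cur.execute("""
                    SELECT version FROM io_thecodeforge.event_streams
                    WHERE stream_id = %s
                    FOR UPDATE
                """, (stream_id,))
                row = cur.fetchone()
                current_version = row[0] if row else 0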

                if expected_version is not None and current_version != expected_version:
                    raise ValueError(
                        f"Concurrency error: expected version {expected_version}, "
                        f"current version {current_version}"
                    )

                new_version = current_version + 1

                # Upsert the stream row with new version
                cur.execute("""
                    INSERT INTO io_thecodeforge.event_streams (stream_id, version, last_updated)
                    VALUES (%s, %s, %s)
                    ON CONFLICT (stream_id) DO UPDATE SET
                        version = EXCLUDED.version,
                        last_updated = EXCLUDED.last_updated
                """, (stream_id, new_version, datetime.now(timezone.utc)))

                # Insert the event
                cur.execute("""
                    INSERT INTO io_thecodeforge.events (event_id, stream_id, version, event_type, data, occurred_at)
                    VALUES (%s, %s, %s, %s, %s, %s)
                """, (
                    str(uuid.uuid4()), stream_id, new_version,
                    event_type, json.dumps(data), datetime.now(timezone.utc)
                ))
                conn.commit()
                return new_version
        finally:
            conn.close()

    def read_events(self, stream_id: str, from_version: int = 1) -> List[Dict[str, Any]]:
        """Return all events for a stream starting from a given version."""
        conn = psycopg2.connect(self.dsn)
        try:
            with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
                cur.execute("""
                    SELECT event_id, version, event_type, data, occurred_at
                    FROM io_thecodeforge.events
                    WHERE stream_id = %s
                    AND version >= %s
                    ORDER BY version ASC
                """, (stream_id, from_version))
                return cur.fetchall()
        finally:
            conn.close()

    def create_snapshot(self, stream_id: str, snapshot_data: Dict[str, Any], version: int) -> None:
        """Store a snapshot for efficient rebuilding of current state."""
        conn = psycopg2.connect(self.dsn)
        try:
            with conn.cursor() as cur:
                cur.execute("""
                    INSERT INTO io_thecodeforge.event_snapshots (stream_id, version, snapshot_data, created_at)
                    VALUES (%s, %s, %s, %s)
                    ON CONFLICT (stream_id, version) DO NOTHING
                """, (stream_id, version, json.dumps(snapshot_data), datetime.now(timezone.utc)))
                conn.commit()
        finally:
            conn.close()

Output

Event appended with version 3.

Mental Model

Event Sourcing vs State-Based Write Model

Think of Event Sourcing as a bank statement — you record every transaction, and the balance is derived from them. State-based is a spreadsheet that only stores the current balance.

Event Sourcing: append-only, full audit trail, can rebuild any point in time.
State-based: simpler, lower storage, but no history and potential for lost updates.
Snapshotting is essential for performance — replaying 10 million events on read is not practical.
Event versioning is mandatory: never change or remove fields in a published event schema; add optional fields or introduce a new event version instead.
CQRS works without Event Sourcing, but Event Sourcing amplifies CQRS's benefits.
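To make the bank-statement analogy concrete, here is a minimal sketch of rebuilding state by folding events, optionally starting from a snapshot. The `Deposited` and `Withdrawn` event types and the `(balance, version)` snapshot shape are assumptions for illustration; the event dictionaries loosely mirror the rows returned by `read_events` above.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def apply_event(balance_cents: int, event: Dict[str, Any]) -> int:
    """Pure transition: current state plus one event gives the next state."""
    if event["event_type"] == "Deposited":
        return balance_cents + event["data"]["amount_cents"]
    if event["event_type"] == "Withdrawn":
        return balance_cents - event["data"]["amount_cents"]
    return balance_cents  # ignore event types this aggregate does not use


def rebuild_balance(events: Iterable[Dict[str, Any]],
                    snapshot: Optional[Tuple[int, int]] = None) -> Tuple[int, int]:
    """Return (balance_cents, version).

    snapshot is an optional (balance_cents, version) pair; events at or
    below the snapshot version are skipped instead of being replayed.
    """
    balance, version = snapshot if snapshot is not None else (0, 0)
    for event in events:
        if event["version"] <= version:
            continue
        balance = apply_event(balance, event)
        version = event["version"]
    return balance, version
```

With a snapshot at version N, only events after N need to be read from the store, which is why snapshots keep rebuild time bounded.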

📊 Production Insight

Event Sourcing reduces update contention: appends replace in-place updates, so fewer writes deadlock. Writers to the same stream still serialize on the version check.

Snapshot regularly: replaying from event 0 on every restart kills startup time.

Version events using a schema registry to handle backward compatibility.

🎯 Key Takeaway

Event Sourcing pairs naturally with CQRS but adds operational complexity.

Snapshots are non-negotiable for performant state rebuilds.

Schema evolution on events is a skill — plan for backward compatibility.

State-Based vs Event-Sourced Write Model

If: Need full audit trail and time-travel queries → Use: Use Event Sourcing for the write model.

If: Simple CRUD, low write contention, team new to CQRS → Use: State-based write model with Outbox is sufficient.

If: High write throughput, need to avoid locks → Use: Event Sourcing append-only store is ideal.

Polyglot Persistence: Choosing the Right Database for Each Read Model

One of CQRS's biggest advantages is that you're no longer forced to use the same database for everything. Each read projection can use the storage engine that best matches its access pattern. This is polyglot persistence — and it's a direct reason to adopt CQRS.

Let's say you have three distinct read consumers: an order summary list (needs low latency, consistent sort), a full-text search for products (needs inverted indexes), and a real-time analytics dashboard (needs fast aggregations on time windows). With CQRS, you can serve these from three different databases: PostgreSQL for the order summaries (covering index), Elasticsearch for search, and Redis with Sorted Sets + Timeseries for the dashboard. Each projection is updated by the same event stream but stored and queried independently.

This freedom comes with a cost: you now have three databases to operate, three connection pools, three backup strategies. And data consistency is entirely your responsibility — there's no cross-database transaction. The event stream is your single source of truth, and each projection must handle its own failure and retry logic.

A pragmatic rule: start with a single read replica for all projections. Split into specialized engines only when you have measured the performance gap. Premature polyglot is just over-engineering.

multi_store_event_consumer.py (Python)

# ============================================================
# Multi-store event consumer — io.thecodeforge.eventconsumer
# Applies the same event to different projections in different stores.
# Each projection is idempotent and can fail independently.
# ============================================================

import json
import logging
from typing import Dict, Any, Callable

logger = logging.getLogger(__name__)

class ProjectionHandler:
    """Base class for a projection that updates a specific store."""
    def handle(self, event_type: str, payload: Dict[str, Any]) -> None:
        raise NotImplementedError

class PostgresOrderSummaryProjection(ProjectionHandler):
    def handle(self, event_type: str, payload: Dict[str, Any]) -> None:
        # ... PostgreSQL upsert logic
        pass

class ElasticsearchProductSearchProjection(ProjectionHandler):
    def handle(self, event_type: str, payload: Dict[str, Any]) -> None:
        # ... Elasticsearch index update logic
        pass

class RedisAnalyticsProjection(ProjectionHandler):
    def handle(self, event_type: str, payload: Dict[str, Any]) -> None:
        # ... Redis sorted set increments
        pass


def process_event(event: Dict[str, Any], handlers: Dict[str, ProjectionHandler]):
    """Apply event to all registered projections.
    Each projection runs independently; failures are logged but don't block others."""
    event_type = event['event_type']
    payload = event['payload']
    for name, handler in handlers.items():
        try:
            handler.handle(event_type, payload)
            logger.info(f"Projection '{name}' updated for event {event['event_id']}")
        except Exception as e:
            logger.error(f"Projection '{name}' failed for event {event['event_id']}: {e}")
            # Optionally send to dead-letter queue

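# Example usage
handlers = {
    'order_summary': PostgresOrderSummaryProjection(),
    'product_search': ElasticsearchProductSearchProjection(),
    'analytics': RedisAnalyticsProjection(),
}

# Stand-in batch; in practice this comes from the outbox poller or a stream consumer.
event_batch = [
    {'event_id': 'evt-1', 'event_type': 'OrderPlaced', 'payload': {'order_id': 'ord-1'}},
]

for event in event_batch:
    process_event(event, handlers)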

⚠ Warning: Polyglot Adds Operational Complexity

Each additional database is another system to patch, monitor, back up, and scale. If your team is not experienced with Elasticsearch or Redis, the learning curve will slow you down. Start with a single read replica for all projections. Split only when you can measure the performance benefit and have the operational capacity to support it.

📊 Production Insight

Event stream is the only cross-database consistency guarantee.

Each projection store fails independently — handle per-store errors gracefully.

Start with one read replica; split into specialized stores only after measurement.
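Handling per-store errors gracefully usually means recording what failed so it can be retried, rather than only logging it. The sketch below shows one way to do that with an in-memory dead-letter list; `apply_to_projections` and the record shape are assumptions for illustration, and a real system would write the records to a durable table or queue.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def apply_to_projections(event: Event,
                         projections: Dict[str, Callable[[Event], None]],
                         dead_letters: List[Dict[str, str]]) -> List[str]:
    """Apply one event to every projection.

    Returns the names of the projections that succeeded. Each failure is
    appended to dead_letters so it can be retried without replaying the
    event against the stores that already applied it.
    """
    succeeded: List[str] = []
    for name, apply in projections.items():
        try:
            apply(event)
            succeeded.append(name)
        except Exception as error:
            dead_letters.append({
                "projection": name,
                "event_id": event["event_id"],
                "error": str(error),
            })
    return succeeded
```

Because a retry may re-deliver an event to a store that has already seen it, each projection's `apply` still needs to be idempotent.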

🎯 Key Takeaway

Polyglot persistence is a benefit, not a requirement.

Each projection store can fail independently without affecting others.

Let measured performance drive the decision, not architecture hype.

When to Split into Specialized Stores

If: Measure: read queries take > 10ms despite covering indexes → Use: Move that projection to a dedicated store (e.g., Elasticsearch for search).

If: One projection consumes 80% of read replica IOPS → Use: Move that projection to its own dedicated replica or engine.

If: Team is unfamiliar with the target database (e.g., Neo4j) → Use: Defer until trained; use existing store with optimized schema.

When CQRS Is a Trap: The Overengineering Threshold

Every second startup slaps CQRS on a CRUD app with three tables and calls it architecture. That's cargo culting. CQRS solves a specific class of pain: when your read and write workloads have opposing access patterns that a single model cannot satisfy without compromises you can't stomach.

You need CQRS when your writes require strict consistency and complex invariants (think financial ledger, booking system) while reads demand denormalized projections that slice data across dimensions the write model never considered. If you're building a blog CMS where you write once and read occasionally, a single PostgreSQL table with proper indexing will embarrass your CQRS setup on latency, complexity, and developer sanity.

The threshold is measurable: when your query-side ORM starts doing five-join monstrosities that block your command-side ACID transactions, you've got a candidate. Until then, you're adding distributed systems complexity for zero gain. CQRS is a surgical tool, not a foundation.

DetectCQRSNeed.sql (SQL)

-- io.thecodeforge - database tutorial

-- Check whether sessions are stuck behind locks (for example, writes blocked by long reads)
SELECT
  wait_event_type,
  wait_event,
  count(*) AS blocked_sessions,
  pg_blocking_pids(pid) AS blockers
FROM pg_stat_activity
WHERE state = 'active'
  AND wait_event_type = 'Lock'
GROUP BY wait_event_type, wait_event, pg_blocking_pids(pid)
HAVING count(*) > 2
ORDER BY blocked_sessions DESC;

Output

wait_event_type | wait_event | blocked_sessions | blockers

-----------------+-----------------+------------------+----------

Lock | relation | 12 | {24583}

Lock | tuple | 8 | {24901}

(2 rows)

⚠ Production Trap:

If you can't name the specific locking problem CQRS solves in your current system, you're building abstraction on abstraction. Start with a read replica before reaching for separate databases.

🎯 Key Takeaway

CQRS is justified when your write model's invariants are fundamentally incompatible with your read model's access patterns — not when your ORM queries look ugly.

The Atomicity Nightmare: Coordinating Command Side Failure Modes

Separating commands from queries doesn't eliminate atomicity requirements — it makes them harder. When a command updates the write database but the event propagation to the read model fails, you have data that exists on one side and not the other. This isn't eventual consistency, it's corruption.

Your command side must handle this with transactional outboxes or saga patterns. The command handler writes to the domain table and the outbox in the same database transaction. A separate process — structured as a CDC consumer or a poller with at-least-once delivery — picks up those outbox events and applies them to the read model. If the read model write fails, the outbox entry remains. You retry. You dead-letter. You never silently swallow.

This is where event sourcing integrates naturally: the event store becomes both your outbox and your source of truth. Without event sourcing, you still need the outbox pattern. Don't skip it because microservices blogs made it sound optional. I've debugged three-day-old data drift caused by command handlers that committed but never published. Every single time, the root cause was "we thought the message broker would handle it."
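The outbox flow described above can be sketched end to end with the standard library. This is a minimal illustration under stated assumptions: it uses an in-memory SQLite database instead of PostgreSQL, the function names are invented for the example, and the read model is whatever the `apply` callback writes to.

```python
import json
import sqlite3
import uuid
from typing import Any, Callable, Dict, Optional


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            seq        INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id   TEXT NOT NULL UNIQUE,
            event_type TEXT NOT NULL,
            payload    TEXT NOT NULL
        );
    """)


def place_order(conn: sqlite3.Connection, order_id: str,
                event_id: Optional[str] = None) -> None:
    payload = json.dumps({"order_id": order_id, "status": "pending"})
    # 'with conn' commits both inserts together, or rolls both back if either fails.
    with conn:
        conn.execute(
            "INSERT INTO orders (order_id, status) VALUES (?, 'pending')",
            (order_id,),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, event_type, payload) VALUES (?, ?, ?)",
            (event_id or str(uuid.uuid4()), "OrderCreated", payload),
        )


def relay_once(conn: sqlite3.Connection,
               apply: Callable[[str, Dict[str, Any]], None]) -> int:
    """Deliver pending outbox rows in order; return how many were delivered.

    A row is deleted only after apply() succeeds, so a failure leaves it
    (and every later row) in the outbox for the next poll.
    """
    rows = conn.execute(
        "SELECT seq, event_type, payload FROM outbox ORDER BY seq"
    ).fetchall()
    delivered = 0
    for seq, event_type, payload in rows:
        try:
            apply(event_type, json.loads(payload))
        except Exception:
            break
        with conn:
            conn.execute("DELETE FROM outbox WHERE seq = ?", (seq,))
        delivered += 1
    return delivered
```

If the process crashes after `apply` succeeds but before the delete commits, the row is delivered again on the next poll, which is why the read-side update has to be idempotent.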

TransactionalOutbox.sql (SQL)

-- io.thecodeforge - database tutorial

BEGIN;

-- The actual command: create order
INSERT INTO orders (
  order_id, customer_id, status, total, created_at
) VALUES (
  'ord_2024_03_001', 'cus_4821', 'pending', 129.50, NOW()
);

-- The outbox: same transaction, different table
INSERT INTO outbox (
  event_id, aggregate_type, aggregate_id, event_type, payload, created_at
) VALUES (
  gen_random_uuid(),
  'Order',
  'ord_2024_03_001',
  'OrderCreated',
  jsonb_build_object(
    'order_id', 'ord_2024_03_001',
    'customer_id', 'cus_4821',
    'total', 129.50
  ),
  NOW()
);

COMMIT;

Output

COMMIT

-- Both order and outbox entry written atomically

-- Background worker polls: SELECT ... FROM outbox ORDER BY created_at, applies each event to the read model, then DELETEs only the rows it delivered

-- On failure: outbox entry stays; re-delivery is safe because the read model update is idempotent

🔥Senior Shortcut:

Use the same database for your outbox and command model. Putting the outbox in a different database introduces a distributed transaction problem worse than the one you're solving.

🎯 Key Takeaway

Transactional outbox or CDC — pick one. If your command writes succeed but reads never see them, you don't have CQRS, you have a bug that's invisible until the customer complains.

Problems and Considerations: Where CQRS Bites Back

Everyone talks about the glory of separated read and write models. Nobody warns you about the operational debt. The first problem is consistency. You can't just slap CQRS on a legacy database and call it a day. Your write model enforces invariants; your read model serves stale data. That gap is a feature until your business demands real-time reads. Then it's a nightmare.

The second problem is complexity. Every command needs validation, handling, and error recovery. Every read model needs its own sync strategy. You're not building one system—you're maintaining two or more. Polyglot persistence sounds cool until your team has to debug a replication failure across PostgreSQL, Redis, and Elasticsearch at 3 AM. The hidden cost here is cognitive load. Junior devs will conflate command and query responsibilities. Code reviews turn into lectures about bounded contexts. If your domain isn't screaming for this separation, don't do it. Start with a simple CRUD layer and refactor into CQRS when you can prove the pain. Otherwise, you're optimizing for a problem you don't have.

CqrsFailureModes.sql (SQL)

-- io.thecodeforge - database tutorial

-- Shows sync failure between write and read models

CREATE SCHEMA IF NOT EXISTS write_model;
CREATE SCHEMA IF NOT EXISTS read_model;

CREATE TABLE write_model.orders (
    order_id UUID PRIMARY KEY,
    status TEXT NOT NULL CHECK (status IN ('pending', 'shipped', 'cancelled')),
    version INT NOT NULL DEFAULT 1
);

CREATE TABLE read_model.orders (
    order_id UUID PRIMARY KEY,
    status TEXT NOT NULL,
    customer_name TEXT NOT NULL,
    sync_timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Replication lag meant read model showed 'pending' for 30 seconds
-- while write model had 'shipped'. Support got 15 tickets.
-- Output shows the problem:

-- write_model: order_id='abc-123', status='shipped'
-- read_model:  order_id='abc-123', status='pending', sync_timestamp='2024-09-12 14:22:30'

Output

write_model: order_id='abc-123', status='shipped'

read_model: order_id='abc-123', status='pending', sync_timestamp='2024-09-12 14:22:30'

⚠ Production Trap:

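How much lag users notice depends on the feature, but as a starting point: a read model more than 5 seconds behind is usually visible to users. Set alerting on replication lag, and treat sustained lag above 10 seconds as a P1 incident unless the feature is known to tolerate it.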
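A lag check along these lines can be sketched in a few lines. The function name, the alert levels, and the default thresholds (taken from the 5-second and 10-second figures above) are assumptions for illustration; tune them per feature.

```python
from datetime import datetime
from typing import Tuple


def classify_lag(write_ts: datetime, read_ts: datetime,
                 warn_after_s: float = 5.0,
                 page_after_s: float = 10.0) -> Tuple[float, str]:
    """Return (lag_seconds, level) where level is 'ok', 'warn', or 'page'.

    write_ts is the commit time of the newest write-side change and read_ts
    is the time of the newest change applied to the read model.
    """
    # A read model can never be meaningfully "ahead"; clamp clock skew to zero.
    lag = max((write_ts - read_ts).total_seconds(), 0.0)
    if lag > page_after_s:
        return lag, "page"
    if lag > warn_after_s:
        return lag, "warn"
    return lag, "ok"
```

Feeding this from a periodic job and exporting `lag_seconds` as a metric gives you something to alert on instead of waiting for support tickets.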

🎯 Key Takeaway

CQRS doubles your operational surface area. If you can't measure and alert on sync lag, you're flying blind.

Challenges: The Versioning Apocalypse

Versioning is the silent killer in CQRS. Your command model evolves with new business rules. Your read models are denormalized for specific UI screens. When you change the write side—add a field, split an aggregate, change a validation rule—every read model must be updated or it breaks silently. This isn't a schema migration problem. It's a contract problem.

The challenge is that read models have their own lifecycle. They get optimized for different queries. One read model might join five tables for a dashboard. Another might be a flat JSON dump for a mobile API. A change to the order status enum on the write side means all those read models need new mapping logic. Miss one, and your dashboard shows invalid states. The only sane approach is to version your events and make read models subscribe to specific versions. That means every consuming service or database must handle backward compatibility. Otherwise, you get a cascading failure when a new event type hits an old subscriber. Test this path explicitly in staging. Or enjoy debugging production data corruption at 4 AM.

VersioningChallenge.sql (SQL)

-- io.thecodeforge - database tutorial

-- Simulates read model breaking on version change

CREATE TABLE write_model.events (
    event_id UUID PRIMARY KEY,
    aggregate_type TEXT NOT NULL,
    event_version INT NOT NULL,
    payload JSONB NOT NULL
);

INSERT INTO write_model.events VALUES
    ('00000000-0000-0000-0000-0000000000e1', 'Order', 1, '{"status": "pending", "total": 100}'),
    ('00000000-0000-0000-0000-0000000000e2', 'Order', 2, '{"status": "shipped", "total": 100, "tracking": "USPS123"}');

-- Old read model only understands version 1.
-- It never sees the version 2 'shipped' event, and tracking is NULL
-- for every row it does read. Nothing errors, so the gap goes unnoticed.
SELECT
    payload->>'status' AS status,
    payload->>'tracking' AS tracking
FROM write_model.events
WHERE event_version = 1;

-- Output shows NULL for tracking, which the UI does not expect

-- status: 'pending'
-- tracking: NULL

Output

status: 'pending'

tracking: NULL

🔥Senior Shortcut:

Define a read model compatibility matrix in your CI pipeline. Every version change triggers automated tests against all registered read model schemas. If one breaks, the build fails before deploy.
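A compatibility matrix of this kind can be as simple as running every read model's projection function against a sample payload for every event version. The sketch below is a hypothetical illustration: the two projection functions and the `SAMPLE_EVENTS` registry are invented for the example, with payloads borrowed from the SQL above.

```python
from typing import Any, Callable, Dict, List, Tuple

Payload = Dict[str, Any]

# One representative payload per (aggregate_type, event_version).
SAMPLE_EVENTS: Dict[Tuple[str, int], Payload] = {
    ("Order", 1): {"status": "pending", "total": 100},
    ("Order", 2): {"status": "shipped", "total": 100, "tracking": "USPS123"},
}


def dashboard_row(payload: Payload) -> Payload:
    # Tolerates both versions: tracking is optional.
    return {"status": payload["status"], "tracking": payload.get("tracking")}


def shipping_row(payload: Payload) -> Payload:
    # Assumes version 2: raises KeyError on a version 1 payload.
    return {"tracking": payload["tracking"]}


def compatibility_failures(
    read_models: Dict[str, Callable[[Payload], Payload]],
    samples: Dict[Tuple[str, int], Payload],
) -> List[Tuple[str, str, int]]:
    """Return (read_model, aggregate_type, version) for every combination that raises."""
    failures: List[Tuple[str, str, int]] = []
    for model_name, project in read_models.items():
        for (aggregate_type, version), payload in samples.items():
            try:
                project(dict(payload))
            except Exception:
                failures.append((model_name, aggregate_type, version))
    return failures
```

In CI, a test would assert that the returned list is empty, so a new event version that breaks any registered read model fails the build.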

🎯 Key Takeaway

A write model change is a broadcast to everyone. If your read models don't pin to an event version, you're one deploy away from data corruption.

When building CQRS with databases, you're never doing it alone. The pattern draws on an ecosystem of tools, methods, and canonical references that can speed up your implementation and prevent common mistakes. Start with Greg Young's original CQRS documents and his talks on event sourcing; they remain the clearest articulation of the why behind the pattern. For the database side, Martin Kleppmann's "Designing Data-Intensive Applications" covers the transaction boundaries and replication lag scenarios that haunt CQRS systems.

On the tooling front, look at PostgreSQL's logical replication for syncing write models to read models, Debezium for change data capture (CDC) pipelines that turn database commits into event streams, and Kafka or Pulsar for durable event transport. For operational monitoring, platforms such as Datadog and New Relic can chart and alert on the lag metrics you export, so you know when read models drift too far from the write side.

Avoid cargo-culting from blog posts that show CQRS in toy examples; they omit the failure modes. Instead, study engineering write-ups and incident reviews from companies such as Uber, Netflix, and Shopify that run event-driven systems at scale. These resources teach you that CQRS is not a database problem. It's a consistency and observability problem that happens to involve databases.

monitor_read_lag.sql (SQL)

-- io.thecodeforge - database tutorial
-- Query to detect read model replication lag
SELECT
    rm.model_name,
    rm.last_updated AS read_model_ts,
    wm.last_committed AS write_model_ts,
    EXTRACT(EPOCH FROM (wm.last_committed - rm.last_updated)) AS lag_seconds
FROM
    read_models rm
JOIN
    write_models wm ON rm.command_id = wm.command_id
WHERE
    wm.last_committed > rm.last_updated
ORDER BY
    lag_seconds DESC
LIMIT 10;

Output

model_name | read_model_ts | write_model_ts | lag_seconds

----------------|-----------------------|-----------------------|------------

order_summary | 2025-03-20 14:32:10 | 2025-03-20 14:32:45 | 35

user_profile | 2025-03-20 14:31:55 | 2025-03-20 14:32:45 | 50

⚠ Production Trap:

Don't rely solely on your ORM to proxy read models. Using raw CDC tools like Debezium avoids the false-consistency illusion that an ORM gives you—you'll see actual replication timestamps instead of cached expectations.

🎯 Key Takeaway

CQRS requires observability tooling (CDC, lag dashboards) and canonical references like Kleppmann's book to avoid building on faulty assumptions about consistency.

Putting It All Together

CQRS with databases isn't a single decision but a layered architecture that demands disciplined orchestration. At the top, your application exposes two surfaces: a command API that accepts mutations and a query API that returns views. The command API flows into a write database, typically a normalized, ACID-compliant store like PostgreSQL that enforces your domain invariants via constraints and transactions. Once a command commits, an event is published (through CDC, the outbox pattern, or a direct event bus) into a messaging layer.

Downstream, subscribers process these events and update denormalized read models stored in purpose-built databases: a PostgreSQL materialized view for operational dashboards, a Redis cache for low-latency lookups, an Elasticsearch index for full-text search, and Cassandra or DynamoDB for time-series analytics. The critical integration point is the synchronizer, a dedicated service that runs idempotent update loops, handles retries with exponential backoff, and tracks offset positions in an event log. This service also monitors lag metrics and raises alarms if read models fall behind beyond your SLA (typically 1–5 seconds for near-real-time, minutes for batch-heavy systems).

You must also plan for schema evolution: version your events (e.g., OrderPlacedV1, OrderPlacedV2) so that old read models can coexist with new ones during rolling deployments. Finally, wire in circuit breakers on the command side to reject writes if read model lag exceeds a threshold; this prevents a cascading inconsistency avalanche. When you wire these pieces together correctly, CQRS becomes transparent to users: they see consistent views, and your system scales horizontally without global locks.
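The "retries with exponential backoff" part of the synchronizer can be sketched as a small helper. This is an illustration under assumptions: the function name, parameters, and defaults are invented for the example, and production code would usually add jitter and retry only on errors known to be transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay_s: float = 0.1,
                       max_delay_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation until it succeeds or max_attempts is reached.

    The delay doubles after each failure (base, 2x base, 4x base, ...)
    and is capped at max_delay_s. The last failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(min(base_delay_s * 2 ** (attempt - 1), max_delay_s))
    raise ValueError("max_attempts must be at least 1")
```

Wrapping each read-model update in a helper like this only makes sense if the update is idempotent, since a retry may repeat work that partly succeeded.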

synchronizer_offset_management.sql (SQL)

-- io.thecodeforge - database tutorial
-- Idempotent read model replays using event offset tracking
CREATE TABLE synchronizer_offsets (
    read_model_name VARCHAR(100) PRIMARY KEY,
    last_event_id BIGINT NOT NULL,
    last_event_type VARCHAR(50),
    version INTEGER DEFAULT 1,
    updated_at TIMESTAMP DEFAULT NOW()
);

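-- Upsert keeps offset tracking idempotent: re-running it for the same event is a no-op.
-- The conflict target must match a unique constraint, here the primary key.
INSERT INTO synchronizer_offsets (read_model_name, last_event_id, last_event_type)
VALUES ('order_summary', 4096, 'OrderPlacedV2')
ON CONFLICT (read_model_name)
DO UPDATE SET
    last_event_id = EXCLUDED.last_event_id,
    last_event_type = EXCLUDED.last_event_type,
    version = synchronizer_offsets.version + 1,
    updated_at = NOW()
WHERE synchronizer_offsets.last_event_id < EXCLUDED.last_event_id;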

Output

INSERT 0 1

🔥Key Integration Point:

The synchronizer offset table is your source of truth for read model freshness. If you ever need to rebuild a read model, truncate it and reset the offset to zero—the system replays all events idempotently.
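The rebuild procedure described above can be sketched with an in-memory event log and offset map. This is a simplified illustration: `synchronize`, the dictionary-based offset store, and the integer `event_id` ordering are assumptions that stand in for the `synchronizer_offsets` table and a real event log.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def synchronize(event_log: List[Event],
                offsets: Dict[str, int],
                read_model_name: str,
                apply: Callable[[Event], None]) -> int:
    """Apply every event past the stored offset; return how many were applied."""
    last_applied = offsets.get(read_model_name, 0)
    applied = 0
    for event in event_log:
        if event["event_id"] <= last_applied:
            continue  # already reflected in the read model
        apply(event)
        # Advance the offset only after the apply succeeded, so a crash
        # re-delivers the event instead of skipping it.
        last_applied = event["event_id"]
        offsets[read_model_name] = last_applied
        applied += 1
    return applied
```

Rebuilding a read model is then a matter of clearing it, setting its offset back to zero, and calling `synchronize` again; running it a second time applies nothing because the offset is already at the head of the log.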

🎯 Key Takeaway

A working CQRS stack is command API → write DB → event stream → synchronizer (idempotent, offset-tracked) → purpose-built read models, with circuit breakers and event versioning as safety nets.

CQRS with PostgreSQL: Read Models with Materialized Views

Materialized views in PostgreSQL provide an efficient way to implement read models in a CQRS architecture. Unlike regular views, materialized views physically store the query result, which can be refreshed periodically or on demand. This is ideal for denormalized read models that aggregate data from multiple tables, reducing query complexity and improving read performance.

To create a materialized view for a read model, define it with the desired denormalized structure. For example, consider an e-commerce system whose write model keeps orders, customers, and order items in separate tables. A read model for order summaries might join these tables:

```sql
CREATE MATERIALIZED VIEW order_summary AS
SELECT o.id AS order_id,
       o.customer_id,
       c.name AS customer_name,
       o.order_date,
       SUM(oi.quantity * oi.unit_price) AS total_amount,
       COUNT(oi.product_id) AS item_count
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY o.id, o.customer_id, c.name, o.order_date;
```

This view can be refreshed with REFRESH MATERIALIZED VIEW order_summary. In a CQRS setup, the refresh can be triggered by an outbox processor after events are published, ensuring eventual consistency. For high-frequency updates, consider a concurrent refresh (REFRESH MATERIALIZED VIEW CONCURRENTLY) to avoid blocking reads; it requires at least one unique index on the view.

Materialized views support indexing, which further accelerates read queries. For instance, create an index on customer_id for faster lookups:

```sql
CREATE INDEX idx_order_summary_customer ON order_summary(customer_id);
```

However, be mindful of refresh overhead. For near-real-time requirements, incremental updates using triggers or logical replication may be more suitable. Materialized views are best for read models that change infrequently or can tolerate some staleness.
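One way to keep refresh overhead bounded when an outbox processor asks for a refresh after every batch is to coalesce those requests. The sketch below is an assumption-laden illustration: `RefreshThrottle` is a hypothetical helper, and the `refresh` callable stands in for whatever actually runs `REFRESH MATERIALIZED VIEW CONCURRENTLY`.

```python
import time
from typing import Callable, List, Optional


class RefreshThrottle:
    """Run `refresh` at most once per `min_interval_s`; remember skipped requests."""

    def __init__(
        self,
        refresh: Callable[[], None],
        min_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._refresh = refresh
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_run: Optional[float] = None
        self.dirty = False  # True while the view is known to be stale

    def request(self) -> bool:
        """Mark the view stale and refresh it if the interval has elapsed."""
        self.dirty = True
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._min_interval_s:
            return False  # too soon: the view stays dirty until a later request
        self._refresh()
        self._last_run = now
        self.dirty = False
        return True


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
refresh_times: List[float] = []
throttle = RefreshThrottle(
    refresh=lambda: refresh_times.append(fake_now[0]),  # stand-in for the SQL refresh
    min_interval_s=30.0,
    clock=lambda: fake_now[0],
)

first = throttle.request()    # no previous run, so it refreshes
fake_now[0] = 10.0
second = throttle.request()   # only 10s since the last refresh, skipped
fake_now[0] = 31.0
third = throttle.request()    # interval elapsed, refreshes again
```

A throttle like this trades freshness for predictable load: the staleness bound becomes roughly the interval plus the refresh duration, which fits the "can tolerate some staleness" case described above.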

create_materialized_view.sqlSQL

CREATE MATERIALIZED VIEW order_summary AS
SELECT o.id AS order_id,
       o.customer_id,
       c.name AS customer_name,
       o.order_date,
       SUM(oi.quantity * oi.unit_price) AS total_amount,
       COUNT(oi.product_id) AS item_count
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY o.id, o.customer_id, c.name, o.order_date;

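-- Unique index: required before REFRESH ... CONCURRENTLY can be used
CREATE UNIQUE INDEX idx_order_summary_order ON order_summary(order_id);

-- Add index for performance
CREATE INDEX idx_order_summary_customer ON order_summary(customer_id);

-- Refresh concurrently so reads are not blocked during the refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY order_summary;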

💡Refresh Strategy

📊 Production Insight

In production, schedule refreshes during low-traffic periods or use logical replication for near-real-time updates. Monitor refresh duration and index usage to avoid bottlenecks.

🎯 Key Takeaway

Materialized views in PostgreSQL enable efficient denormalized read models in CQRS, but require careful refresh strategies to balance consistency and performance.

CQRS vs CRUD: When Is Each Appropriate

Choosing between CQRS (Command Query Responsibility Segregation) and CRUD (Create, Read, Update, Delete) depends on the complexity of your domain and scalability needs. CRUD is straightforward: a single model handles both reads and writes, suitable for simple applications with low contention. CQRS separates read and write models, optimizing each for its purpose.

When to use CRUD:

- Simple domains with few business rules.
- Low read-to-write ratio or balanced workloads.
- Teams unfamiliar with event-driven architectures.
- Rapid prototyping or small-scale applications.

When to use CQRS:

- Complex domains with distinct read and write shapes (e.g., reporting vs. transactional).
- High read throughput requiring denormalized views.
- Need for independent scaling of reads and writes.
- Event sourcing integration for audit trails.

Consider a blog platform: CRUD works for basic posts and comments. But for an analytics dashboard with aggregated data, CQRS with separate read models (e.g., materialized views) is better. Example CRUD operation:

```sql
-- CRUD: single table update
UPDATE posts SET title = 'New Title' WHERE id = 1;
```

In CQRS, the write side might emit an event, and the read side updates a denormalized view:

```sql
-- Write side: insert into outbox
INSERT INTO outbox (event_type, payload) VALUES ('PostUpdated', '{"id":1,"title":"New Title"}');

-- Read side: refresh materialized view
REFRESH MATERIALIZED VIEW post_summary;
```

CQRS introduces eventual consistency, which may be unacceptable for some use cases (e.g., banking transactions). CRUD provides immediate consistency. Evaluate trade-offs: CQRS adds complexity but enables scalability and flexibility.

crud_vs_cqrs.sqlSQL

-- CRUD approach: direct update
UPDATE posts SET title = 'New Title' WHERE id = 1;

-- CQRS approach: change state and record the event in one transaction
BEGIN;
UPDATE posts SET title = 'New Title' WHERE id = 1;
INSERT INTO outbox (event_type, payload) VALUES ('PostUpdated', '{"id":1,"title":"New Title"}');
COMMIT;
-- Later, the outbox processor refreshes the materialized view
-- (CONCURRENTLY requires a unique index on post_summary)
REFRESH MATERIALIZED VIEW CONCURRENTLY post_summary;
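The write-side half of the CQRS approach only holds up if the state change and the outbox row commit together. The following sketch shows that property, using Python's built-in `sqlite3` as a stand-in for PostgreSQL; the `rename_post` function and its `fail` flag are hypothetical and exist only to simulate a crash mid-transaction.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL
    );
    INSERT INTO posts (id, title) VALUES (1, 'Old Title');
    """
)


def rename_post(db: sqlite3.Connection, post_id: int, title: str, fail: bool = False) -> None:
    # `with db` commits if the block succeeds and rolls back if it raises,
    # so the UPDATE and the outbox INSERT succeed or fail as one unit.
    with db:
        db.execute("UPDATE posts SET title = ? WHERE id = ?", (title, post_id))
        if fail:
            raise RuntimeError("simulated crash before the event is recorded")
        db.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("PostUpdated", json.dumps({"id": post_id, "title": title})),
        )


def current_state(db: sqlite3.Connection):
    title = db.execute("SELECT title FROM posts WHERE id = 1").fetchone()[0]
    pending = db.execute("SELECT count(*) FROM outbox").fetchone()[0]
    return title, pending


try:
    rename_post(conn, 1, "Lost Title", fail=True)
except RuntimeError:
    pass
after_failure = current_state(conn)   # neither the title nor the outbox changed

rename_post(conn, 1, "New Title")
after_success = current_state(conn)   # both changed together
```

If the two statements ran in separate transactions, the failure case would leave a renamed post with no event, which is exactly the silent divergence the outbox is meant to prevent.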

🔥Consistency Trade-off

📊 Production Insight

Start with CRUD and refactor to CQRS only when needed. Premature CQRS can overcomplicate a system without tangible benefits.

🎯 Key Takeaway

Use CRUD for simple, consistent data access; adopt CQRS when read and write workloads diverge significantly or when scalability demands separate optimization.


CQRS in Distributed Systems: Separate Read/Write Databases

In distributed systems, CQRS often involves physically separate databases for reads and writes. This allows independent scaling, different storage engines, and optimized schemas. For example, use PostgreSQL for transactional writes and Elasticsearch for full-text search reads.

Implementation considerations:

- Data synchronization: Use an outbox pattern or change data capture (CDC) to propagate changes from write to read databases. For instance, Debezium can stream PostgreSQL WAL to update read models.
- Consistency: Eventual consistency is inherent. Design for stale reads where acceptable.
- Failure handling: If the read database is down, writes continue; reads may fail or fall back to the write database.
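The failure-handling point above can be sketched as a read path that tries the read database first and falls back to the write database. This is only an illustration: `read_order_summary` and the two lookup callables are hypothetical stand-ins for real database clients, and `ConnectionError` is assumed to be how an outage surfaces.

```python
from typing import Callable, Dict, Tuple

Lookup = Callable[[str], Dict[str, object]]


def read_order_summary(
    order_id: str, read_store: Lookup, write_store: Lookup
) -> Tuple[Dict[str, object], str]:
    """Return the row plus the name of the store that served it."""
    try:
        return read_store(order_id), "read_db"
    except ConnectionError:
        # Read database unavailable: serve from the write side instead.
        return write_store(order_id), "write_db"


def read_db_down(order_id: str) -> Dict[str, object]:
    raise ConnectionError("read database unavailable")


def read_db_up(order_id: str) -> Dict[str, object]:
    return {"order_id": order_id, "total_amount": 59.9, "stale": True}


def write_db_lookup(order_id: str) -> Dict[str, object]:
    return {"order_id": order_id, "total_amount": 59.9, "stale": False}


fallback_row, fallback_source = read_order_summary("o-1", read_db_down, write_db_lookup)
normal_row, normal_source = read_order_summary("o-1", read_db_up, write_db_lookup)
```

A fallback like this moves read load onto the write database during an outage, so it is usually worth limiting to a few critical queries rather than applying it to every read.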

Example: A write database stores orders in normalized form; a read database stores denormalized order summaries. The outbox processor sends events to a message queue, which updates the read database:

```sql
-- Write database: outbox table
CREATE TABLE outbox (
  id UUID PRIMARY KEY,
  event_type TEXT,
  payload JSONB,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Read database: denormalized table
CREATE TABLE order_summary (
  order_id UUID PRIMARY KEY,
  customer_name TEXT,
  total_amount DECIMAL,
  item_count INT
);
```

A worker polls the outbox, processes events, and updates the read database:

```sql
-- Worker: insert into read database
INSERT INTO order_summary (order_id, customer_name, total_amount, item_count)
VALUES ($1, $2, $3, $4)
ON CONFLICT (order_id) DO UPDATE SET
  customer_name = EXCLUDED.customer_name,
  total_amount = EXCLUDED.total_amount,
  item_count = EXCLUDED.item_count;
```

Separate databases enable polyglot persistence: choose the best database for each read model. However, operational complexity increases (backup, monitoring, network latency). Ensure idempotent updates to handle duplicate events.

separate_databases.sqlSQL

-- Write database schema
CREATE TABLE outbox (
  id UUID PRIMARY KEY,
  event_type TEXT NOT NULL,
  payload JSONB NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Read database schema
CREATE TABLE order_summary (
  order_id UUID PRIMARY KEY,
  customer_name TEXT,
  total_amount DECIMAL(10,2),
  item_count INT,
  last_updated TIMESTAMP DEFAULT NOW()
);

-- Worker update (idempotent)
INSERT INTO order_summary (order_id, customer_name, total_amount, item_count)
VALUES ($1, $2, $3, $4)
ON CONFLICT (order_id) DO UPDATE SET
  customer_name = EXCLUDED.customer_name,
  total_amount = EXCLUDED.total_amount,
  item_count = EXCLUDED.item_count,
  last_updated = NOW();
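To see why the upsert makes duplicate deliveries harmless, the sketch below applies the same event twice and ends with a single row. It uses Python's built-in `sqlite3` as a stand-in for the PostgreSQL read database (SQLite 3.24+ accepts the same `ON CONFLICT ... DO UPDATE` form); the `apply_event` helper and the event dictionary shape are assumptions for illustration.

```python
import sqlite3

UPSERT = """
INSERT INTO order_summary (order_id, customer_name, total_amount, item_count)
VALUES (?, ?, ?, ?)
ON CONFLICT (order_id) DO UPDATE SET
    customer_name = excluded.customer_name,
    total_amount = excluded.total_amount,
    item_count = excluded.item_count
"""

read_db = sqlite3.connect(":memory:")
read_db.execute(
    """
    CREATE TABLE order_summary (
        order_id TEXT PRIMARY KEY,
        customer_name TEXT,
        total_amount REAL,
        item_count INTEGER
    )
    """
)


def apply_event(db: sqlite3.Connection, event: dict) -> None:
    # The event carries the full new state of the row, so writing it again
    # overwrites the row with identical values instead of adding a duplicate.
    with db:
        db.execute(
            UPSERT,
            (
                event["order_id"],
                event["customer_name"],
                event["total_amount"],
                event["item_count"],
            ),
        )


event = {
    "order_id": "o-1",
    "customer_name": "Ada",
    "total_amount": 59.9,
    "item_count": 3,
}

apply_event(read_db, event)
apply_event(read_db, event)  # redelivery of the same event

rows = read_db.execute(
    "SELECT order_id, customer_name, total_amount, item_count FROM order_summary"
).fetchall()
```

This protects against duplicates but not against out-of-order delivery: an older event arriving late would still overwrite newer values unless the handler also compares a version or sequence number.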

⚠ Operational Complexity

📊 Production Insight

Use CDC tools like Debezium for reliable synchronization. Implement circuit breakers for read databases to prevent cascading failures.

🎯 Key Takeaway

Separate read/write databases in CQRS enable independent scaling and optimized storage, but introduce eventual consistency and operational overhead.

● Production incident · POST-MORTEM · severity: high

The Midnight Order Disappearance

Symptom

Users reported 'order placed successfully' confirmation but the order never appeared in their order history. Internal monitoring showed the domain_events outbox growing unboundedly.

Assumption

The team assumed the Outbox poller was stateless and could restart safely, and that the database had enough connections to handle retries.

Root cause

The Outbox poller used a single database connection that was not released on failure. After a transient network blip, the poller's connection entered a broken state, but the connection pool kept it alive. The poller loop consumed all available connections waiting for the broken connection to recover, causing connection exhaustion. Meanwhile, new events kept piling up unread.

Fix

Implemented connection validation before each poll: test the connection with SELECT 1 and reconnect if broken. Added a maximum retry limit per event to prevent infinite loops. Deployed two poller instances with FOR UPDATE SKIP LOCKED to allow zero-downtime failover.

Key lesson

Always validate database connections before using them in background loops.
Add dead-letter queues for events that fail repeatedly.
Monitor the outbox table size and poller lag as first-class metrics.
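The fix and lessons above can be sketched as a single poll step that validates its connection, caps retries per event, and parks poison events. This is a simplified illustration using Python's built-in `sqlite3` with a temporary file, not the production poller: `MAX_ATTEMPTS`, the `attempts` and `dead` columns (a stand-in for a dead-letter queue), and the `publish` callable are all assumptions.

```python
import os
import sqlite3
import tempfile
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # hypothetical per-event retry limit
DB_PATH = os.path.join(tempfile.mkdtemp(), "outbox_demo.db")


def connect() -> sqlite3.Connection:
    return sqlite3.connect(DB_PATH)


def healthy(conn: sqlite3.Connection) -> bool:
    # Cheap validation query, as in the fix: a broken connection raises here.
    try:
        conn.execute("SELECT 1").fetchone()
        return True
    except sqlite3.Error:
        return False


def poll_once(
    conn: Optional[sqlite3.Connection], publish: Callable[[str], None]
) -> sqlite3.Connection:
    """Validate the connection, then try to publish every pending event once."""
    if conn is None or not healthy(conn):
        if conn is not None:
            conn.close()      # release the broken connection instead of leaking it
        conn = connect()
    pending = conn.execute(
        "SELECT id, payload FROM domain_events "
        "WHERE published = 0 AND dead = 0 ORDER BY id"
    ).fetchall()
    for event_id, payload in pending:
        try:
            publish(payload)
        except Exception:
            # Count the failure; flag the event as dead once the limit is reached.
            with conn:
                conn.execute(
                    "UPDATE domain_events "
                    "SET attempts = attempts + 1, dead = (attempts + 1 >= ?) "
                    "WHERE id = ?",
                    (MAX_ATTEMPTS, event_id),
                )
            continue
        with conn:
            conn.execute("UPDATE domain_events SET published = 1 WHERE id = ?", (event_id,))
    return conn


# Demo: one good event, one poison event, and a connection that is already broken.
conn = connect()
with conn:
    conn.execute(
        "CREATE TABLE domain_events ("
        " id INTEGER PRIMARY KEY, payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0,"
        " attempts INTEGER NOT NULL DEFAULT 0,"
        " dead INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT INTO domain_events (payload) VALUES (?)", [("ok",), ("poison",)])
conn.close()  # simulate the connection breaking between polls


def publish(payload: str) -> None:
    if payload == "poison":
        raise RuntimeError("broker rejected event")


for _ in range(MAX_ATTEMPTS):
    conn = poll_once(conn, publish)

final_rows = conn.execute(
    "SELECT id, published, attempts, dead FROM domain_events ORDER BY id"
).fetchall()
```

Because dead events are excluded from the pending query, one poison event can no longer hold up the events behind it, and the `dead` count becomes a natural metric to alert on alongside outbox size.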

Production debug guide

Quick reference for common read-model lag and divergence issues in production · 5 entries

Symptom · 01

Read model misses a recently processed command

→

Fix

Check the domain_events table for unpublished events: SELECT count(*) FROM domain_events WHERE published = FALSE; Then verify the Outbox poller is running and not stuck.

Symptom · 02

Read model shows stale data more than 10 seconds old

→

Fix

Measure poller lag as the age of the oldest unpublished event: SELECT now() - min(occurred_at) FROM domain_events WHERE published = FALSE; Then investigate consumer bottlenecks (broker queue depth, DB connection pool).

Symptom · 03

Duplicate events in read model (idempotency broken)

→

Fix

Check event handler logic: look for missing ON CONFLICT clauses or non-idempotent UPDATE statements. Verify processed_event_ids tracking table.

Symptom · 04

Event consumer crashes with OOM

→

Fix

Review batch size – large batches with complex projections cause memory pressure. Reduce BATCH_SIZE and add per-event commit with checkpointing.

Symptom · 05

Read model out of sync after DB restore

→

Fix

The event log is the source of truth: replay events from the last known good position. If there is no snapshot, rebuild the projections from scratch from the event log.
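For the stale-data symptom in the guide above, one way to define lag is the age of the oldest event that has not yet been published. The sketch below computes that from in-memory rows; the `poller_lag` function and the `(occurred_at, published)` tuple shape are assumptions for illustration, mirroring the `occurred_at` and `published` columns of `domain_events`.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple


def poller_lag(events: Iterable[Tuple[datetime, bool]], now: datetime) -> timedelta:
    """Age of the oldest unpublished event, or zero when nothing is pending."""
    pending = [occurred_at for occurred_at, published in events if not published]
    if not pending:
        return timedelta(0)
    return now - min(pending)


now = datetime(2026, 3, 6, 12, 0, 0, tzinfo=timezone.utc)
events = [
    (now - timedelta(seconds=90), True),    # already published, ignored
    (now - timedelta(seconds=42), False),   # oldest pending event
    (now - timedelta(seconds=5), False),
]

lag = poller_lag(events, now)
idle_lag = poller_lag([(now - timedelta(seconds=90), True)], now)
```

Measuring from the oldest pending event rather than the newest one means a stuck poller shows steadily growing lag even when no new writes arrive.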

★ CQRS Sync Troubleshooting Quick Reference

Five commands to diagnose the most common CQRS data pipeline failures in production.

Read model not reflecting new writes

Immediate action

Check domain_events outbox growth

Commands

SELECT count(*) FROM io_thecodeforge.domain_events WHERE published = FALSE;

SELECT event_type, count(*) FROM io_thecodeforge.domain_events WHERE published = FALSE GROUP BY event_type;

Fix now

Restart the Outbox poller service after verifying database connectivity.

Event consumer stuck on a single event

Projection shows duplicate rows

Poller lag exceeds 1 minute

Write Side vs Read Side

Aspect	Write Side (Command Model)	Read Side (Query Model)
Primary goal	Correctness and consistency	Speed and consumer convenience
Schema shape	Normalized — 3NF or higher	Denormalized — flat projections
Constraints	Foreign keys, check constraints, unique indexes	Minimal or none — data integrity guaranteed by events
Typical DB engine	PostgreSQL (OLTP, strong ACID)	PostgreSQL read replica, Redis, Elasticsearch, or DynamoDB
Indexing strategy	Indexes on write-path lookup keys	Covering indexes for exact read patterns; search indexes
Who writes to it	Application (via commands)	Event consumer / projection updater only
Who reads from it	Only for consistency checks in commands	API layer, dashboards, reporting tools
Schema migrations	Risky — ALTER TABLE on live write traffic	Safe — drop projection, replay events, swap in new schema
Scaling strategy	Vertical + connection pooling (PgBouncer)	Horizontal — multiple read replicas or sharded per projection
Consistency model	Strong (synchronous, ACID)	Eventual (asynchronous, updated via events)
Data freshness	Always current	Delayed by replication lag (typically 50ms–5s)
Rebuild possible?	No — it is the source of truth	Yes — replay all events to rebuild any projection from scratch

⚙ Quick Reference

14 commands from this guide

File	Command / Code	Purpose
write_side_schema.sql	CREATE TABLE customers (	The Write Side
read_side_projections.sql	CREATE TABLE order_summary_projection (	The Read Side
outbox_poller_and_event_consumer.py	from datetime import datetime, timezone	Synchronization, Eventual Consistency, and the Replication L
event_store_client.py	from datetime import datetime, timezone	Event Sourcing Integration
multi_store_event_consumer.py	from typing import Dict, Any, Callable	Polyglot Persistence
DetectCQRSNeed.sql	SELECT	When CQRS Is a Trap
TransactionalOutbox.sql	BEGIN;	The Atomicity Nightmare
CqrsFailureModes.sql	CREATE TABLE write_model.orders (	Problems and Considerations
VersioningChallenge.sql	CREATE TABLE write_model.events (	Challenges
monitor_read_lag.sql	SELECT	Related Resources
synchronizer_offset_management.sql	CREATE TABLE synchronizer_offsets (	Putting It All Together
create_materialized_view.sql	CREATE MATERIALIZED VIEW order_summary AS	CQRS with PostgreSQL
crud_vs_cqrs.sql	UPDATE posts SET title = 'New Title' WHERE id = 1;	CQRS vs CRUD
separate_databases.sql	CREATE TABLE outbox (	CQRS in Distributed Systems

Key takeaways

The write model enforces invariants; the read model serves consumers. They have opposite optimization goals and should never share the same table structure or storage engine.

The Outbox Pattern (writing domain events in the same transaction as state changes) is the only reliable way to prevent the dual-write problem and guarantee your read model never silently diverges.

Every event handler in your consumer must be idempotent by design, not by hope. At-least-once delivery is a guarantee from every serious message broker, so double processing is not a theoretical edge case.

Read projections are disposable by design. The ability to drop and rebuild any projection from the event log is CQRS's most underrated superpower: it turns read-side schema migrations from a production risk into a routine operation.

Polyglot persistence is a benefit, not a requirement. Start with one read replica and split only when measured performance demands it.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

How does CQRS handle the consistency problem when a user submits a comma...

Q02SENIOR

If your read-side projection gets corrupted or falls behind by several t...

Q03SENIOR

A colleague says 'we should use CQRS for our user authentication service...

Q01 of 03SENIOR

How does CQRS handle the consistency problem when a user submits a command and immediately queries the read model — what patterns exist to bridge that gap, and what are their trade-offs?

ANSWER

Three main patterns: (1) Read-your-writes consistency — after a successful command, the API returns the current state from the write side, and the client uses that as immediate feedback. (2) Optimistic UI update — the client assumes success and updates the UI immediately, reconciling when the read model eventually reflects the change. (3) Version token — the command returns a version number, and the client polls the read model until it sees that version. Each trades complexity for user experience. The first two are most common; polling adds latency and wasted requests.
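The version token pattern (3) can be sketched as a small polling helper. This is an illustration under stated assumptions: `wait_for_version` is a hypothetical name, `read_version` stands in for whatever query returns the read model's current version for the entity, and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_version(
    read_version: Callable[[], int],
    expected: int,
    timeout_s: float = 2.0,
    interval_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the read model until it reaches `expected`, or give up at the timeout."""
    deadline = clock() + timeout_s
    while True:
        if read_version() >= expected:
            return True           # the read model has caught up with the command
        if clock() >= deadline:
            return False          # caller decides: show stale data, or read the write side
        sleep(interval_s)


# The read model catches up on the third poll.
versions = iter([1, 1, 2])
caught_up = wait_for_version(lambda: next(versions), expected=2, sleep=lambda _: None)

# The read model never catches up; a fake clock makes the timeout deterministic.
ticks = iter(range(100))
timed_out = wait_for_version(
    lambda: 0,
    expected=1,
    timeout_s=3,
    sleep=lambda _: None,
    clock=lambda: next(ticks),
)
```

Each extra poll is a request that returns nothing new, which is the latency and wasted-request cost the answer refers to; a short timeout with a fallback keeps that cost bounded.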

FAQ · 5 QUESTIONS

Frequently Asked Questions

Do I need Event Sourcing to implement CQRS?

How much eventual consistency lag is acceptable in a CQRS system?

When should I NOT use CQRS?

What database indexing strategy should I use for the domain_events outbox table?

Can I use CQRS with a single database instance?

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Written from production experience, not tutorials.

✓ Verified

production tested

July 18, 2026

last updated

2,466

articles · all by Naren

