Event Sourcing with Databases: Design, Patterns & Production Pitfalls
Event Sourcing with databases explained deeply — schema design, snapshot strategies, projection rebuilds, CQRS integration, and real production gotchas you won't find elsewhere..
20+ years shipping high-throughput database systems. Written from production experience, not tutorials.
- Event sourcing stores every state change as an immutable event in an append-only log
- Current state is derived by replaying events — never stored directly
- Snapshot strategies prevent replay time from growing linearly with event count
- Event store schema must index on aggregate ID, event type, and timestamp
- Projections rebuilt from scratch by replaying history — design them idempotent
- Biggest mistake: treating events as fixed — schema evolution is mandatory, not optional
Imagine your bank never stored your 'current balance' — instead it kept every deposit and withdrawal ever made, and calculated your balance by replaying them. That's event sourcing. Instead of saving the current state of something, you save every change that ever happened to it. Your database becomes a permanent, ordered diary of facts, not a whiteboard that gets erased and rewritten.
Most applications are built around a simple mental model: save the current state, overwrite it when it changes, query it when you need it. That model works until it doesn't — and the moment it stops working, you feel it hard. You can't answer 'what did this record look like three weeks ago?' You can't audit who changed what and why. You can't replay business logic against historical data. You've lost information the moment you hit UPDATE.
Event sourcing flips this model on its head. Instead of storing current state, you store an immutable, append-only log of every state transition — every event that caused the world to change. The current state becomes a derived view, computed on demand by replaying that log. This gives you a complete audit trail, time-travel debugging, the ability to rebuild any projection from scratch, and a natural fit for event-driven architectures. The database stops being a snapshot of 'now' and becomes a ledger of 'everything that ever happened'.
By the end of this article you'll know how to design an event store schema from scratch, implement snapshot strategies to keep replay times sane at scale, wire up CQRS read models as SQL projections, handle schema evolution without breaking old events, and avoid the production mistakes that bite teams six months after they ship. This is the article I wish existed when I first built an event-sourced system in production.
What is Event Sourcing with Databases?
The core idea is simple: instead of updating a row in-place to reflect the latest state, you append an event to an append-only log. Each event captures what happened, when, and why. The current state of any entity — an account, an order, a document — is computed by replaying all its events in order.
This is fundamentally different from CRUD. In CRUD, an UPDATE erases the previous state. You lose history. Event sourcing preserves every fact forever. That's hugely powerful for auditing, debugging, and building multiple read models without fighting the write schema.
But it's also a trade-off. Replaying events on every read is slow — that's where snapshots come in. Storing immutable events makes schema evolution harder — you have to handle old event formats gracefully. And because you're appending to a single table at high concurrency, your database needs to handle write throughput, not just reads.
The snippet below shows a minimal Java example. This is not what you'd use in production, but it illustrates the append-replay loop.
Event Store Schema Design
Your event store schema is the foundation of your entire system. Get it wrong and you'll fight ordering issues, slow queries, and painful migrations.
The essential columns: aggregate_id (string/ UUID), sequence_number (integer), event_type (string), event_data (JSON/JSONB), timestamp (transaction time). The primary key MUST be (aggregate_id, sequence_number). This enforces ordering per aggregate and allows fast replay.
You also need indexes for specific query patterns: projectors that need to scan events by type or by timestamp range. Consider a composite index on (event_type, timestamp) for subscriptions. But remember: you trade write throughput for read performance.
Avoid using high-latency sequences like UUIDs for sequence_number unless you can guarantee monotonic ordering. Database sequences (like SERIAL in PostgreSQL) work well for single-writer per aggregate. If you have concurrent writers for the same aggregate, you need optimistic locking with a version field.
Below is a PostgreSQL DDL for a production-grade event store table.
- Aggregate ID = account number. Every event belongs to exactly one account.
- Sequence number = increasing check number. Gaps indicate missing transactions.
- Event type = transaction type (deposit, withdrawal, transfer).
- Event data = transaction details (amount, counterparty, notes).
- Timestamp = when the transaction was recorded (business time, not system time).
Snapshot Strategies for Efficient Replay
Replaying every event from the beginning of time on every read gets expensive fast. If an aggregate has 100k events, a single read takes 100k SQL queries or a huge IN clause. Snapshots are the answer.
A snapshot captures the aggregate state at a specific point (sequence number). On read, you load the latest snapshot and replay only events after that snapshot. This keeps replay cost constant relative to the window between snapshots, not total history.
Decide when to take snapshots: after every N events, or after a time interval. Use a background process (e.g., cron job or scheduled task) that reads events since the last snapshot and writes the aggregated state. Ensure snapshot creation is idempotent: if two snapshot jobs overlap, the later one wins.
Store snapshots in a separate table with the same primary key structure but with the state data. Projections also need snapshots if they're rebuilt often.
Here's a Java snippet for snapshot creation logic.
Projections and CQRS Integration
Event sourcing naturally pairs with CQRS. The write side appends events, the read side builds projections — denormalised tables optimised for query patterns. Projections are updated asynchronously by subscribing to the event stream.
This decoupling lets you build multiple read models without touching the write schema. You can have a relational projection for reporting, a cache projection for APIs, and a search projection for Elasticsearch — all fed from the same event stream.
But projections introduce complexity: eventual consistency, idempotency, and rebuild capability. When a projection fails, you need to rebuild it from scratch by replaying all events. Idempotency ensures that replaying events multiple times doesn't double-count. Your projection handlers must be deterministic and handle duplicate events gracefully.
Use a dedicated projection table that tracks the last processed position (often the event store's transaction ID or sequence number). This allows the projection to resume after a crash without missing events.
Here's a simple projector example for reading account balances.
Schema Evolution: Handling Event Versioning
Events are immutable, but your understanding of the domain changes. You'll need to add fields, rename them, or change structure. The challenge: you have millions of events in the old format. You can't go back and update them — they're immutable.
Two strategies: versioned event types and upcasters.
Versioned event types: each event type carries a version number (e.g., OrderPlacedV1, OrderPlacedV2). New events are written in the new version. On replay, you check the version and apply the appropriate handler. Old events stay as-is.
Upcasters: conversion functions that transform an old event into the new format on-the-fly during replay. This centralizes the transformation and keeps application code dealing only with the latest version. Upcasters run after deserialization but before handler application.
Choose upcasters over versioned types if the number of event types grows large — you avoid handler proliferation. Choose versioned types if different handlers need different logic per version.
Real-world tools like EventStoreDB or Axon Framework have built-in upcaster support. For custom databases, you implement upcasters in your application layer.
Here's an upcaster example in Java.
Production Pitfalls: Real-World Mistakes and Fixes
After running event sourcing in production for months, teams hit predictable but painful problems. Here are the three that hurt most.
- Missing per-aggregate ordering. As covered in the incident report, without a correct sequence number and single-writer per aggregate, events get reordered. The fix: enforce that all events for an aggregate are appended by the same thread/process, or use optimistic locking with a version field in the aggregate.
- Snapshot corruption due to concurrent writes. If your snapshot job reads events while a new event is being written, it may snap an incomplete state. The snapshot appears valid but when replaying after it, you get duplicate or missing events. Solution: run snapshots on a read replica with a known lag, or use snapshot isolation level.
- Projection rebuild timeout. When a projection fails and needs a full rebuild, replaying millions of events can exceed HTTP timeout for APIs or exhaust memory. Test rebuild times regularly. Consider rebuilding via a background job with progress tracking, and serve stale read models until rebuild completes.
Below is a checklist for avoiding these pitfalls.
- Journal = truth. Append only, never update or delete.
- Snapshot = derived cache. Can be recomputed from journal at any time.
- Projections = further derived caches. Also recomputable.
- Read models will always be eventually consistent with the journal.
- If journal is corrupted, you lose everything — back up the journal.
Why Your Standard CRUD Setup Is Begging for Event Sourcing
Your typical CRUD app is a lie. You store the current state, overwrite it on every update, and pretend nothing happened before. That works fine until you need to explain how a customer's account went from $10,000 to zero in three minutes.
Event sourcing doesn't store state. It stores facts. Every change is an immutable event appended to a log. Want to know the balance at any point in time? Replay the events. Need to debug a race condition? Read the event stream. Auditors knocking? Hand them the entire history.
The tradeoff is real: you're trading simple reads for complex reconstruction. If your problem isn't auditability, temporal queries, or high-write contention, don't touch this pattern. But if you're building financial systems, inventory management, or anything where "how did we get here" matters more than "where are we now," CRUD is a liability.
Event sourcing swaps 'latest state' for 'immutable truth.' That's a powerful shift, but it demands discipline. You need idempotent events, careful versioning, and a strategy for snapshots. Skip any of those and you'll be debugging replay logic at 3 AM.
Event Sourcing Is Not CQRS — Stop Confusing the Two
Every junior who reads a blog post about event sourcing immediately starts drawing boxes labeled 'Command' and 'Query.' That's CQRS. It's a separate pattern that often pairs with event sourcing, but they are not the same thing.
Event sourcing is about storage: how you persist facts. CQRS is about architecture: how you separate reads from writes. You can do event sourcing without CQRS by materializing state directly from events on read. Ugly but functional. You can also do CQRS with a plain relational database — just point your queries at a read replica.
Here's the confusion that causes real damage: teams slap CQRS on top of event sourcing too early, before they understand their query patterns. They build expensive projections for queries that could be answered by a simple snapshot. Then they complain event sourcing is slow.
Start with a single store. Write events, build snapshots, and query from snapshots. Only introduce separate read models when you have data — not guesses — proving you need them. Premature CQRS is the microservices of event sourcing: a solution in search of a problem.
When you do need CQRS (high read/write disparity, different read models for different consumers), design your projections as idempotent subscribers to the event stream. Miss an event? Reprocess from last checkpoint. That's fault tolerance, not failure.
What Are Event-Driven Microservices in Kafka?
Event-driven microservices decouple services by communicating through immutable event streams instead of direct HTTP calls. Kafka acts as the durable, ordered log where each service produces and consumes events. Unlike a shared database, Kafka stores events as topics — each event is a fact that any service can replay. This matches event sourcing's append-only nature: the Kafka log becomes your event store. Each microservice owns its state by reading from shared topics and projecting its local view. The WHY: synchronous coupling kills scalability. Event-driven architectures let services fail independently, scale separately, and evolve without coordinated deployments. The HOW: producers write events to partitioned topics, consumers track offsets, and brokers guarantee ordering per partition. Combine this with a schema registry to enforce event versions. Kafka handles replay via consumer group rebalancing — add instances to scale reads. Production trap: never treat Kafka as a message queue with ephemeral state; events are permanent records.
Integrating with Webhooks and Database
Webhooks push event notifications to external systems when state changes. The WHY: polling is wasteful and slow — webhooks give instant, targeted delivery. The HOW: after appending an event to your event store, a background process (or database trigger) publishes the event to registered webhook endpoints. PostgreSQL LISTEN/NOTIFY or logical replication slots can notify subscribers without polling. For high throughput, push events into a queue (e.g., Redis, RabbitMQ) and have workers dispatch HTTP POSTs to URLs with retries. The event store remains the source of truth — webhooks are a downstream projection. Production trap: webhooks fail silently; implement exponential backoff and dead-letter queues. Never send raw internal event structures; map to a stable external schema. Use idempotency keys so external systems can safely retry. The key rule: persist before notify — the event store commit must be durable before any webhook fires.
The Missing Events Incident — Why Ordering Broke Our Audit Trail
- Never rely on physical clocks for event ordering — use sequence numbers per aggregate.
- If you need global event order, use a distributed sequencer or accept eventual consistency.
- Always test event replay under concurrent write load across time-skewed machines.
SELECT aggregate_id, sequence_number FROM events WHERE aggregate_id = '<id>' ORDER BY sequence_number;Check application logs for commit errors or timeouts.Key takeaways
Common mistakes to avoid
5 patternsUsing timestamp as event ordering key
Not implementing snapshots from day one
Treating events as immutable and never evolving the schema
Projection handler not idempotent
Running snapshot creation concurrently with event writes without isolation
Interview Questions on This Topic
Explain the difference between event sourcing and the event-driven architecture (EDA). Can they be combined?
Frequently Asked Questions
20+ years shipping high-throughput database systems. Written from production experience, not tutorials.
That's Database Design. Mark it forged?
9 min read · try the examples if you haven't