Event Sourcing with Databases: Design, Patterns & Production Pitfalls
Event Sourcing with databases explained deeply — schema design, snapshot strategies, projection rebuilds, CQRS integration, and real production gotchas you won't find elsewhere.
- Event sourcing stores every state change as an immutable event in an append-only log
- Current state is derived by replaying events — never stored directly
- Snapshot strategies prevent replay time from growing linearly with event count
- Event store schema must index on aggregate ID, event type, and timestamp
- Projections rebuilt from scratch by replaying history — design them idempotent
- Biggest mistake: treating events as fixed — schema evolution is mandatory, not optional
Imagine your bank never stored your 'current balance' — instead it kept every deposit and withdrawal ever made, and calculated your balance by replaying them. That's event sourcing. Instead of saving the current state of something, you save every change that ever happened to it. Your database becomes a permanent, ordered diary of facts, not a whiteboard that gets erased and rewritten.
Most applications are built around a simple mental model: save the current state, overwrite it when it changes, query it when you need it. That model works until it doesn't — and the moment it stops working, you feel it hard. You can't answer 'what did this record look like three weeks ago?' You can't audit who changed what and why. You can't replay business logic against historical data. You've lost information the moment you hit UPDATE.
Event sourcing flips this model on its head. Instead of storing current state, you store an immutable, append-only log of every state transition — every event that caused the world to change. The current state becomes a derived view, computed on demand by replaying that log. This gives you a complete audit trail, time-travel debugging, the ability to rebuild any projection from scratch, and a natural fit for event-driven architectures. The database stops being a snapshot of 'now' and becomes a ledger of 'everything that ever happened'.
By the end of this article you'll know how to design an event store schema from scratch, implement snapshot strategies to keep replay times sane at scale, wire up CQRS read models as SQL projections, handle schema evolution without breaking old events, and avoid the production mistakes that bite teams six months after they ship. This is the article I wish existed when I first built an event-sourced system in production.
What is Event Sourcing with Databases?
The core idea is simple: instead of updating a row in-place to reflect the latest state, you append an event to an append-only log. Each event captures what happened, when, and why. The current state of any entity — an account, an order, a document — is computed by replaying all its events in order.
This is fundamentally different from CRUD. In CRUD, an UPDATE erases the previous state. You lose history. Event sourcing preserves every fact forever. That's hugely powerful for auditing, debugging, and building multiple read models without fighting the write schema.
But it's also a trade-off. Replaying events on every read is slow — that's where snapshots come in. Storing immutable events makes schema evolution harder — you have to handle old event formats gracefully. And because you're appending to a single table at high concurrency, your database needs to handle write throughput, not just reads.
The snippet below shows a minimal Java example. This is not what you'd use in production, but it illustrates the append-replay loop.
Event Store Schema Design
Your event store schema is the foundation of your entire system. Get it wrong and you'll fight ordering issues, slow queries, and painful migrations.
The essential columns: aggregate_id (string/ UUID), sequence_number (integer), event_type (string), event_data (JSON/JSONB), timestamp (transaction time). The primary key MUST be (aggregate_id, sequence_number). This enforces ordering per aggregate and allows fast replay.
You also need indexes for specific query patterns: projectors that need to scan events by type or by timestamp range. Consider a composite index on (event_type, timestamp) for subscriptions. But remember: you trade write throughput for read performance.
Avoid using high-latency sequences like UUIDs for sequence_number unless you can guarantee monotonic ordering. Database sequences (like SERIAL in PostgreSQL) work well for single-writer per aggregate. If you have concurrent writers for the same aggregate, you need optimistic locking with a version field.
Below is a PostgreSQL DDL for a production-grade event store table.
- Aggregate ID = account number. Every event belongs to exactly one account.
- Sequence number = increasing check number. Gaps indicate missing transactions.
- Event type = transaction type (deposit, withdrawal, transfer).
- Event data = transaction details (amount, counterparty, notes).
- Timestamp = when the transaction was recorded (business time, not system time).
Snapshot Strategies for Efficient Replay
Replaying every event from the beginning of time on every read gets expensive fast. If an aggregate has 100k events, a single read takes 100k SQL queries or a huge IN clause. Snapshots are the answer.
A snapshot captures the aggregate state at a specific point (sequence number). On read, you load the latest snapshot and replay only events after that snapshot. This keeps replay cost constant relative to the window between snapshots, not total history.
Decide when to take snapshots: after every N events, or after a time interval. Use a background process (e.g., cron job or scheduled task) that reads events since the last snapshot and writes the aggregated state. Ensure snapshot creation is idempotent: if two snapshot jobs overlap, the later one wins.
Store snapshots in a separate table with the same primary key structure but with the state data. Projections also need snapshots if they're rebuilt often.
Here's a Java snippet for snapshot creation logic.
Projections and CQRS Integration
Event sourcing naturally pairs with CQRS. The write side appends events, the read side builds projections — denormalised tables optimised for query patterns. Projections are updated asynchronously by subscribing to the event stream.
This decoupling lets you build multiple read models without touching the write schema. You can have a relational projection for reporting, a cache projection for APIs, and a search projection for Elasticsearch — all fed from the same event stream.
But projections introduce complexity: eventual consistency, idempotency, and rebuild capability. When a projection fails, you need to rebuild it from scratch by replaying all events. Idempotency ensures that replaying events multiple times doesn't double-count. Your projection handlers must be deterministic and handle duplicate events gracefully.
Use a dedicated projection table that tracks the last processed position (often the event store's transaction ID or sequence number). This allows the projection to resume after a crash without missing events.
Here's a simple projector example for reading account balances.
Schema Evolution: Handling Event Versioning
Events are immutable, but your understanding of the domain changes. You'll need to add fields, rename them, or change structure. The challenge: you have millions of events in the old format. You can't go back and update them — they're immutable.
Two strategies: versioned event types and upcasters.
Versioned event types: each event type carries a version number (e.g., OrderPlacedV1, OrderPlacedV2). New events are written in the new version. On replay, you check the version and apply the appropriate handler. Old events stay as-is.
Upcasters: conversion functions that transform an old event into the new format on-the-fly during replay. This centralizes the transformation and keeps application code dealing only with the latest version. Upcasters run after deserialization but before handler application.
Choose upcasters over versioned types if the number of event types grows large — you avoid handler proliferation. Choose versioned types if different handlers need different logic per version.
Real-world tools like EventStoreDB or Axon Framework have built-in upcaster support. For custom databases, you implement upcasters in your application layer.
Here's an upcaster example in Java.
Production Pitfalls: Real-World Mistakes and Fixes
After running event sourcing in production for months, teams hit predictable but painful problems. Here are the three that hurt most.
- Missing per-aggregate ordering. As covered in the incident report, without a correct sequence number and single-writer per aggregate, events get reordered. The fix: enforce that all events for an aggregate are appended by the same thread/process, or use optimistic locking with a version field in the aggregate.
- Snapshot corruption due to concurrent writes. If your snapshot job reads events while a new event is being written, it may snap an incomplete state. The snapshot appears valid but when replaying after it, you get duplicate or missing events. Solution: run snapshots on a read replica with a known lag, or use snapshot isolation level.
- Projection rebuild timeout. When a projection fails and needs a full rebuild, replaying millions of events can exceed HTTP timeout for APIs or exhaust memory. Test rebuild times regularly. Consider rebuilding via a background job with progress tracking, and serve stale read models until rebuild completes.
Below is a checklist for avoiding these pitfalls.
- Journal = truth. Append only, never update or delete.
- Snapshot = derived cache. Can be recomputed from journal at any time.
- Projections = further derived caches. Also recomputable.
- Read models will always be eventually consistent with the journal.
- If journal is corrupted, you lose everything — back up the journal.
The Missing Events Incident — Why Ordering Broke Our Audit Trail
- Never rely on physical clocks for event ordering — use sequence numbers per aggregate.
- If you need global event order, use a distributed sequencer or accept eventual consistency.
- Always test event replay under concurrent write load across time-skewed machines.
Key takeaways
Common mistakes to avoid
5 patternsUsing timestamp as event ordering key
Not implementing snapshots from day one
Treating events as immutable and never evolving the schema
Projection handler not idempotent
Running snapshot creation concurrently with event writes without isolation
Interview Questions on This Topic
Explain the difference between event sourcing and the event-driven architecture (EDA). Can they be combined?
Frequently Asked Questions
That's Database Design. Mark it forged?
6 min read · try the examples if you haven't