Senior 4 min · June 25, 2026

Transactional Outbox Pattern: Stop Losing Events in Production — The Definitive Guide

Transactional outbox pattern explained with production code, failure modes, and debugging.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Lessons pulled from things that broke in production.

June 25, 2026
last updated
1,663
articles · all by Naren
Quick Answer

Use the transactional outbox pattern when you need to guarantee that an event is published at least once after a database transaction commits. Write the event to an outbox table in the same DB transaction, then have a background process poll and publish those events to your message broker. Delivery is at-least-once, not exactly-once, so pair it with idempotent consumers.

✦ Definition · ~90s read
What is Transactional Outbox Pattern?

The transactional outbox pattern ensures reliable message delivery by writing events to a database table within the same transaction as the business operation; a separate process then publishes them to the message broker. This prevents data loss when the broker is unavailable or the service crashes after the DB commit but before the message is sent.

Plain-English First

Imagine you're a bartender who needs to both pour a drink and tell the waitstaff it's ready. If you pour the drink but forget to call out, the waitstaff never picks it up. The outbox pattern is like writing the order on a slip of paper and putting it on a spike — the drink is poured (DB commit) and the order slip is there (outbox). A runner checks the spike every few seconds and yells the order. If the runner crashes, the slip is still on the spike when they restart. No drink gets lost.

You've been there. A payment succeeds in the database, but the confirmation email never sends. Or an order is placed, but the inventory service never gets the message. The root cause? Your service committed the transaction, then tried to publish a message — and crashed between the two. That's the dual-write problem, and it's been burning production systems since the dawn of microservices.

The transactional outbox pattern is the battle-tested solution. Instead of sending messages directly from your business logic, you write them to a database table in the same transaction. A separate process — the publisher — reads that table and sends the messages to the broker. If the publisher crashes, it picks up where it left off. No more lost events.

By the end of this article, you'll be able to implement the transactional outbox pattern in your own services, handle edge cases like duplicate messages and backpressure, and debug the most common production failures. You'll also know exactly when this pattern is overkill and what simpler alternatives exist.

Why You Can't Trust Direct Message Publishing

The naive approach: after your business logic commits the DB transaction, you publish a message to Kafka/RabbitMQ/SQS. This works 99.9% of the time. But that 0.1%? That's your 3am call. The service crashes between commit and publish. Or the broker is briefly unavailable. Or the network times out. The DB says the order is placed, but the rest of the system never knows.

Before the outbox pattern, teams hacked around this with two-phase commits (too slow), delayed retries (complex), or just hoped for the best (production incidents). The outbox pattern is the pragmatic middle ground: atomicity without distributed transactions.

NaiveApproach.systemdesignSYSTEMDESIGN
// io.thecodeforge — System Design tutorial

// Naive approach: send message after DB commit
// THIS IS BROKEN — don't do this
public void placeOrder(Order order) {
    try {
        orderRepository.save(order);  // DB commit
        messagePublisher.publish("order.created", order);  // network call — can fail
    } catch (Exception e) {
        log.error("Failed to publish order event", e);
        // Order is saved, but event is lost. No retry.
    }
}
Output
Order saved to DB. Event not published. User gets no confirmation. Support ticket filed at 3am.
Production Trap:
The naive approach fails silently when the message broker is under load. You'll see 'Connection refused' or 'TimeoutException' in logs, but the business operation already committed. The event is gone forever unless you have a compensating transaction.
[Figure: Transactional Outbox Pattern Flow. Stages: direct publish fails (no atomicity, risk of lost events); outbox table write (insert the event in the same DB transaction); polling publisher (scheduled job reads and sends events); CDC publisher (streams DB logs for near-real-time capture); idempotent consumer (deduplicates via unique event ID). Trap: an outbox table that is never cleaned grows without bound; add a TTL or batch delete after successful publish.]

The Outbox Pattern: Atomic Writes to the Rescue

The fix is brutally simple: write the event to a database table (the outbox) in the same transaction as your business operation. If the transaction commits, the event is persisted. If it rolls back, the event disappears with the business data. A separate process — the publisher — reads the outbox table and sends events to the broker.

This decouples the reliability of the DB from the reliability of the network. The DB is local, fast, and transactional. The network is none of those things. By writing the event first, you guarantee it won't be lost even if the publisher crashes mid-flight.

ExampleSYSTEMDESIGN
// io.thecodeforge — System Design tutorial

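// Correct: write event to outbox in same transaction
// rollbackFor is needed because Spring does not roll back on checked exceptions by default,
// and writeValueAsString can throw the checked JsonProcessingException
@Transactional(rollbackFor = Exception.class)
public void placeOrder(Order order) throws JsonProcessingException {
    orderRepository.save(order);
    OutboxEvent event = new OutboxEvent(
        UUID.randomUUID(),
        "order.created",
        objectMapper.writeValueAsString(order),
        LocalDateTime.now()
    );
    outboxRepository.save(event);
    // When the method returns, the transaction commits: order and event are persisted atomically
}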

// Publisher: polls outbox and sends events
@Scheduled(fixedDelay = 100)
public void publishOutboxEvents() {
    List<OutboxEvent> events = outboxRepository.findTop100ByPublishedFalseOrderByCreatedAt();
    for (OutboxEvent event : events) {
        try {
            messagePublisher.publish(event.getType(), event.getPayload());
            event.setPublished(true);
            event.setPublishedAt(LocalDateTime.now());
            outboxRepository.save(event);
        } catch (Exception e) {
            log.error("Failed to publish event {}", event.getId(), e);
            // Will be retried on next poll
        }
    }
}
Output
Order saved. Event saved. Publisher sends event within 100ms. If publisher crashes, event is still in outbox and will be picked up on restart.
Senior Shortcut:
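Give every event a unique ID and enforce it with a unique constraint on the outbox table. A constraint on (aggregate_id, event_type) only fits event types that can occur once per aggregate; it would reject a legitimate second event of the same type. Neither constraint makes the publisher idempotent: if it crashes after publishing but before marking the event as published, the next poll will re-publish, and the consumer has to deduplicate via the event ID.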
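The atomicity argument can be made concrete without a real database. The sketch below is a minimal in-memory stand-in, not the article's Spring code: `InMemoryDb`, `UnitOfWork`, and `AtomicOutboxWriteSketch` are hypothetical names introduced for illustration. Writes are staged and applied together on commit, or discarded together on rollback, which is the property the outbox relies on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a database with two tables.
// Writes are staged and only become visible on commit, like a real transaction.
class InMemoryDb {
    final List<String> orders = new ArrayList<>();
    final List<String> outbox = new ArrayList<>();

    interface UnitOfWork {
        void run(List<String> stagedOrders, List<String> stagedOutbox);
    }

    // Runs the unit of work; applies all staged writes or none of them.
    boolean inTransaction(UnitOfWork work) {
        List<String> stagedOrders = new ArrayList<>();
        List<String> stagedOutbox = new ArrayList<>();
        try {
            work.run(stagedOrders, stagedOutbox);
        } catch (RuntimeException e) {
            return false; // rollback: staged writes are discarded together
        }
        orders.addAll(stagedOrders); // commit: both tables change together
        outbox.addAll(stagedOutbox);
        return true;
    }
}

class AtomicOutboxWriteSketch {
    static InMemoryDb run() {
        InMemoryDb db = new InMemoryDb();

        // Successful transaction: order row and outbox row commit together.
        db.inTransaction((stagedOrders, stagedOutbox) -> {
            stagedOrders.add("order-1");
            stagedOutbox.add("order.created:order-1");
        });

        // Failing transaction: the exception rolls back the order and its event.
        db.inTransaction((stagedOrders, stagedOutbox) -> {
            stagedOrders.add("order-2");
            stagedOutbox.add("order.created:order-2");
            throw new IllegalStateException("validation failed after staging writes");
        });
        return db;
    }

    public static void main(String[] args) {
        InMemoryDb db = run();
        // prints: orders=[order-1] outbox=[order.created:order-1]
        System.out.println("orders=" + db.orders + " outbox=" + db.outbox);
    }
}
```

There is never a state where the order exists without its event or the event without its order, which is exactly what the direct-publish approach cannot promise.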
[Figure: Outbox Pattern: Atomic Write Flow. Business logic and event insert happen in one DB transaction. Steps: business op (update order status in DB); insert event (write event to outbox table); commit TX (both changes persist atomically); publisher (poll or CDC reads outbox); send to broker (publish event to Kafka/RabbitMQ). Note: if the commit fails, the event rolls back with the business data.]

Polling vs CDC: Which Publisher Strategy Wins?

The simplest publisher polls the outbox table every N milliseconds. This works fine for most systems, but has two drawbacks: (1) polling adds latency proportional to the poll interval, and (2) it puts load on the database, especially if you have many rows.

Change Data Capture (CDC) is the alternative. Tools like Debezium read the database's transaction log (WAL in PostgreSQL, binlog in MySQL) and stream changes to Kafka. This gives you low latency that isn't tied to a poll interval, and it adds no polling queries against the outbox table. The trade-off: you now manage a CDC pipeline, which is another moving part.

My rule of thumb: if your latency requirement is <100ms and you already have Kafka, use CDC. If you're on a simpler stack or latency isn't critical, polling is fine. I've run polling-based outboxes at 500ms intervals handling 10k events/sec without issues.

ExampleSYSTEMDESIGN
// io.thecodeforge — System Design tutorial

// Polling publisher with batch processing
@Scheduled(fixedDelay = 500)
public void publishBatch() {
    List<OutboxEvent> batch = outboxRepository.findTop500ByPublishedFalseOrderByCreatedAt();
    if (batch.isEmpty()) return;
    
    for (OutboxEvent event : batch) {
        try {
            messagePublisher.publish(event.getType(), event.getPayload());
            event.setPublished(true);
        } catch (Exception e) {
            log.warn("Retrying event {} later", event.getId());
            // Don't mark as published — will retry
        }
    }
    outboxRepository.saveAll(batch);  // Batch update
}
Output
Processes up to 500 events every 500ms. Failed events are retried on next cycle. Batch save reduces DB round-trips.
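Batch size and poll delay together set a hard ceiling on what a polling publisher can drain. The arithmetic below is a rough sketch under a stated assumption: each poll finishes well within the delay, so query time and broker round-trips are ignored. `PollingBudget` and its method names are made up for this example.

```java
// Back-of-the-envelope numbers for a polling publisher.
// Assumes each poll finishes well within the delay; real throughput is lower
// once query time and broker round-trips are included.
class PollingBudget {
    // Upper bound on events published per second.
    static double maxEventsPerSecond(int batchSize, long pollDelayMs) {
        return batchSize * 1000.0 / pollDelayMs;
    }

    // Average wait before an event is picked up, assuming events arrive
    // uniformly between polls.
    static double averageAddedLatencyMs(long pollDelayMs) {
        return pollDelayMs / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(maxEventsPerSecond(500, 500)); // prints 1000.0
        System.out.println(averageAddedLatencyMs(500));   // prints 250.0
    }
}
```

With the settings in the example above (500 rows every 500ms), the ceiling is about 1,000 events per second. Sustaining a higher rate means a larger batch, a shorter delay, or looping until the outbox is empty before sleeping.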
Interview Gold:
Interviewers love asking about the trade-off between polling and CDC. The key insight: both are eventually consistent, but polling delivers on a poll-interval delay while CDC is near-real-time. CDC adds operational complexity: you need to manage Debezium connectors, handle schema changes, and deal with the fact that the transaction log is a finite resource.
[Figure: Polling vs CDC Publisher. Two strategies to read the outbox table. Polling: simple SELECT every N ms; adds latency equal to the poll interval; DB load grows with rows; easy to implement. CDC (Debezium): streams DB transaction logs; near real-time, low latency; minimal DB impact; requires Kafka Connect. Summary: polling for simplicity, CDC for scale and low latency.]

Handling Duplicates: Idempotent Consumers Save Your Sanity

Even with the outbox pattern, duplicates can happen. The publisher might crash after publishing but before marking the event as published. On restart, it re-publishes the same event. Your consumer sees it twice.

The fix is idempotent consumers. Each event carries a unique ID (UUID). The consumer stores processed event IDs in a deduplication table (or uses a Redis set with TTL). Before processing an event, it checks if the ID was already processed. If yes, it skips.

This is not optional. Every production outbox implementation I've seen eventually produces duplicates — network retries, publisher crashes, DB replication lag. Idempotent consumers are your safety net.
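The crash window described above is easy to reproduce in miniature. The sketch below is illustrative only: `Broker`, `IdempotentConsumer`, and the in-memory sets are hypothetical stand-ins for a real broker and dedup store. It simulates a publisher that sends an event, "crashes" before marking it published, and re-sends after restart, while the consumer applies the side effect once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateDeliverySketch {
    // Hypothetical broker: just records every delivery it receives.
    static final class Broker {
        final List<String> delivered = new ArrayList<>();
        void publish(String eventId) { delivered.add(eventId); }
    }

    // Consumer that remembers processed event IDs and skips repeats.
    static final class IdempotentConsumer {
        final Set<String> processedIds = new HashSet<>();
        int sideEffects = 0;

        void handle(String eventId) {
            if (!processedIds.add(eventId)) {
                return; // already processed: skip the duplicate
            }
            sideEffects++; // stands in for reserving stock, sending email, ...
        }
    }

    // Returns {number of deliveries, number of side effects applied}.
    static int[] run() {
        Broker broker = new Broker();
        Set<String> markedPublished = new HashSet<>();
        String eventId = "evt-1";

        // First attempt: the publish succeeds, then the publisher "crashes"
        // before it can mark the row as published.
        broker.publish(eventId);

        // After restart the row is still unpublished, so it is sent again.
        if (!markedPublished.contains(eventId)) {
            broker.publish(eventId);
            markedPublished.add(eventId);
        }

        IdempotentConsumer consumer = new IdempotentConsumer();
        for (String id : broker.delivered) {
            consumer.handle(id);
        }
        return new int[] { broker.delivered.size(), consumer.sideEffects };
    }

    public static void main(String[] args) {
        int[] result = run();
        // prints: deliveries=2 sideEffects=1
        System.out.println("deliveries=" + result[0] + " sideEffects=" + result[1]);
    }
}
```

The broker sees two deliveries, but the consumer's side effect runs once. That gap is why the event ID has to travel with the message.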

ExampleSYSTEMDESIGN
// io.thecodeforge — System Design tutorial

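// Idempotent consumer (uses java.time.Duration)
public void handleOrderCreated(OrderCreatedEvent event) {
    String eventId = event.getEventId();
    // One key per event, so each ID gets its own 24-hour TTL.
    // A single shared set would have its TTL reset on every event and never expire.
    String dedupKey = "processed_events:" + eventId;

    // Check deduplication store
    if (Boolean.TRUE.equals(redisTemplate.hasKey(dedupKey))) {
        log.info("Skipping duplicate event {}", eventId);
        return;
    }

    // Process event
    inventoryService.reserveStock(event.getOrder());
    emailService.sendConfirmation(event.getOrder());

    // Mark as processed only after the work succeeds:
    // a crash before this line causes a retry, not a lost event
    redisTemplate.opsForValue().set(dedupKey, "1", Duration.ofHours(24));
}
Output
Duplicate events are skipped. Each deduplication key expires 24 hours after its event was processed, so the store stays bounded. Two concurrent deliveries of the same event can still both pass the check, so the downstream operations should tolerate a rare repeat.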
Never Do This:
Don't let a unique-constraint violation from the consumer's dedup table surface as a processing failure. If the dedup record commits but the acknowledgement to the broker is lost, the redelivered event hits the constraint. Unless you catch that violation and treat it as "already processed", the message is retried over and over and turns into a poison message.

When the Outbox Pattern Is Overkill

The outbox pattern adds complexity: an extra table, a publisher process, and deduplication logic. Don't use it if you don't need it.

When to skip it
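  • Your state and your messages live in the same transactional system (e.g., a Kafka-to-Kafka pipeline using transactional producers for exactly-once semantics). Kafka transactions do not cover your relational database, though, so they don't remove the dual-write problem for a DB-backed service, and they add latency and fragility of their own.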
  • Your system can tolerate occasional message loss. For example, a cache invalidation event can be retried on the next read.
  • You're building a prototype or internal tool where losing a message means a manual retry.

My rule: if losing a single message costs you money or reputation, use the outbox pattern. Otherwise, keep it simple.

Senior Shortcut:
Before implementing the outbox pattern, ask: 'What happens if this message is lost?' If the answer is 'We'll catch it in the next reconciliation job' or 'It's just a notification', skip the outbox. Save your complexity budget for things that matter.

Production Gotchas: What Will Burn You

I've seen three common failures in production outbox implementations:

  1. Outbox table growth: If the publisher falls behind, the outbox table grows unbounded. This slows down your business transactions because every write also inserts into the outbox. Fix: add a TTL or archive processed events to a separate table. Or use a partitioned table and drop old partitions.
  2. Deadlocks on the outbox table: If your publisher uses SELECT FOR UPDATE to claim events, and your business transaction also writes to the outbox, you can get deadlocks. Fix: claim rows with SELECT ... FOR UPDATE SKIP LOCKED in short transactions so publishers never wait on each other's rows, or use optimistic locking with a version column.
  3. Publisher backpressure: If the message broker is slow, the publisher's poll loop blocks, and events pile up. Fix: put a bounded queue between the poller and the publishing workers. When the queue is full, stop enqueueing and leave the rows in the outbox for a later poll; they are already durable there, so nothing needs to be dropped. Reserve the dead-letter table for events that keep failing.
ExampleSYSTEMDESIGN
// io.thecodeforge - System Design tutorial

// Publisher with backpressure: bounded queue, events wait in the outbox when it is full
ExecutorService executor = Executors.newFixedThreadPool(4);
BlockingQueue<OutboxEvent> queue = new LinkedBlockingQueue<>(1000);
// IDs currently queued or being published, so a poll does not enqueue them twice
Set<UUID> inFlight = ConcurrentHashMap.newKeySet();

@Scheduled(fixedDelay = 100)
public void pollAndEnqueue() {
    List<OutboxEvent> events = outboxRepository.findTop100ByPublishedFalseOrderByCreatedAt();
    for (OutboxEvent event : events) {
        if (!inFlight.add(event.getId())) {
            continue;  // already queued by an earlier poll
        }
        if (!queue.offer(event)) {
            // Queue full: stop here. The row is still unpublished in the outbox,
            // so a later poll picks it up once the workers catch up.
            inFlight.remove(event.getId());
            log.warn("Outbox queue full, deferring event {} to a later poll", event.getId());
            break;
        }
    }
}

// Worker threads process the queue
public void startWorkers() {
    for (int i = 0; i < 4; i++) {
        executor.submit(() -> {
            while (true) {
                OutboxEvent event = queue.take();
                try {
                    messagePublisher.publish(event.getType(), event.getPayload());
                    event.setPublished(true);
                    outboxRepository.save(event);
                } catch (Exception e) {
                    log.error("Failed to publish event {}", event.getId(), e);
                    // Row stays unpublished and is retried on a later poll;
                    // move it to the dead-letter table only after repeated failures
                } finally {
                    inFlight.remove(event.getId());
                }
            }
        });
    }
}
Output
Events are enqueued into a bounded queue. When the queue is full, the poller backs off and the remaining events wait in the outbox table. Workers publish concurrently, and a failed event is retried on a later poll.
The Classic Bug:
Forgetting to index the outbox table on (published, created_at). Without this index, the publisher's SELECT query does a full table scan, which kills performance as the table grows. On PostgreSQL, add a partial index that covers only unpublished rows: CREATE INDEX idx_outbox_unpublished ON outbox_events (published, created_at) WHERE published = false; On databases without partial indexes (MySQL, for example), drop the WHERE clause and use a plain composite index.
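The cleanup advice from the first gotcha (TTL or archiving of processed events) comes down to one rule: delete rows that were published longer ago than a retention window, and never touch unpublished rows. Here is a minimal in-memory sketch of that rule. The `Row` class, its field names, and the 24-hour retention are assumptions for illustration; against a real database this would be a batched DELETE or a dropped partition.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class OutboxCleanupSketch {
    // Minimal stand-in for an outbox row; field names are illustrative.
    static final class Row {
        final String id;
        final boolean published;
        final Instant publishedAt; // null while unpublished

        Row(String id, boolean published, Instant publishedAt) {
            this.id = id;
            this.published = published;
            this.publishedAt = publishedAt;
        }
    }

    // Removes rows that were published more than `retention` ago.
    // Unpublished rows are never removed, whatever their age.
    static int purgePublished(List<Row> table, Duration retention, Instant now) {
        Instant cutoff = now.minus(retention);
        int before = table.size();
        table.removeIf(row -> row.published
                && row.publishedAt != null
                && row.publishedAt.isBefore(cutoff));
        return before - table.size();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2026-06-25T12:00:00Z");
        List<Row> table = new ArrayList<>();
        table.add(new Row("old-published", true, now.minus(Duration.ofDays(2))));
        table.add(new Row("recent-published", true, now.minus(Duration.ofHours(1))));
        table.add(new Row("old-unpublished", false, null));

        int purged = purgePublished(table, Duration.ofHours(24), now);
        // prints: purged=1 remaining=2
        System.out.println("purged=" + purged + " remaining=" + table.size());
    }
}
```

Keeping a retention window instead of deleting immediately after publish leaves recent rows available for debugging and replay.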
● Production incident · POST-MORTEM · severity: high

The 3AM Payment That Vanished

Symptom
Users reported successful payments but never received confirmation emails. The payment service logs showed the DB transaction committed, but the message queue had no corresponding event.
Assumption
The team assumed the message broker was dropping messages under load.
Root cause
The payment service sent the message directly after the DB commit, but thread pool exhaustion caused the send to fail. The exception was caught and logged, but nothing retried the event, so the failure went unnoticed.
Fix
Moved to transactional outbox: write the event to an outbox table in the same DB transaction. A scheduled task polls the outbox every 100ms and publishes events. Added a unique constraint on event_id to prevent duplicates.
Key lesson
  • Never trust a network call after a DB commit.
  • Always persist the intent to send in the same transaction as the business change.
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit. 3 entries
Symptom · 01
Outbox table growing unboundedly; events not being published
Fix
1. Check publisher logs for errors (connection refused, timeout). 2. Verify the message broker is healthy. 3. Check if the publisher is running (process alive, no OOM). 4. If publisher is stuck, restart it. 5. If broker is down, fix broker first, then events will drain.
Symptom · 02
Duplicate events being processed by consumers
Fix
1. Check if the publisher marks events as published after sending. 2. Verify the consumer has idempotency logic (dedup table/Redis). 3. If not, add dedup. 4. Repair the side effects the duplicates already caused (for example, with a reconciliation job). Do not delete processed event IDs from the dedup store; that lets redelivered events be processed again.
Symptom · 03
High latency in event delivery (minutes instead of seconds)
Fix
1. Check the poll interval configuration. 2. Check if the publisher is CPU-bound or I/O-bound. 3. Increase the number of publisher threads. 4. Consider switching to CDC if latency requirements are strict.
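When chasing delivery latency, a single number helps more than log reading: the age of the oldest unpublished event. The helper below is a hypothetical sketch (`OutboxLagSketch` and its input shape are assumptions); in practice the timestamps would come from a query over unpublished rows, and the result would be exported as a metric.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class OutboxLagSketch {
    // Age of the oldest unpublished event; empty when nothing is pending.
    static Optional<Duration> lag(List<Instant> unpublishedCreatedAt, Instant now) {
        return unpublishedCreatedAt.stream()
                .min(Comparator.naturalOrder())
                .map(oldest -> Duration.between(oldest, now));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2026-06-25T12:00:00Z");
        List<Instant> pending = Arrays.asList(
                now.minusSeconds(5),
                now.minusSeconds(90));
        // prints: PT1M30S
        System.out.println(lag(pending, now).map(Duration::toString).orElse("no backlog"));
    }
}
```

A lag that stays close to the poll interval means the publisher is keeping up. A lag that climbs steadily points at the publisher or the broker, not the poll configuration.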
★ Transactional Outbox Pattern Triage Cheat Sheet
First-response commands for when things go wrong, copy-paste ready.
Events not being published; outbox table growing
Immediate action
Check publisher logs for errors
Commands
kubectl logs deployment/outbox-publisher --tail=100
SELECT COUNT(*) FROM outbox_events WHERE published = false;
Fix now
Restart publisher: kubectl rollout restart deployment/outbox-publisher
Duplicate events in consumer
Immediate action
Check consumer dedup store
Commands
redis-cli --scan --pattern 'processed_events*' | head -10
SELECT COUNT(*) FROM outbox_events WHERE published = true AND published_at > NOW() - INTERVAL '5 minutes';
Fix now
Add idempotency logic to consumer if missing
High latency in event delivery
Immediate action
Check poll interval and publisher threads
Commands
kubectl exec deployment/outbox-publisher -- cat /app/config/application.yml | grep poll
kubectl top pod -l app=outbox-publisher
Fix now
Reduce poll interval to 100ms or increase thread pool to 8
Deadlock errors in application logs
Immediate action
Check if publisher uses SELECT FOR UPDATE
Commands
grep 'deadlock' /var/log/app/error.log
SHOW ENGINE INNODB STATUS;
Fix now
Claim rows with SELECT ... FOR UPDATE SKIP LOCKED where the database supports it, or switch to optimistic locking
| Feature / Aspect | Polling Publisher | CDC Publisher |
| --- | --- | --- |
| Latency | Poll interval (e.g., 100ms) | Near real-time, not tied to a poll interval |
| Database Load | SELECT queries every poll interval | No polling queries (reads the transaction log) |
| Operational Complexity | Low (just a scheduled task) | High (Debezium, Kafka Connect) |
| Durability | Event survives if publisher crashes | Event survives if CDC connector crashes |
| Best For | Simple systems, low event volume | High-throughput, low-latency systems |

Key takeaways

1. The transactional outbox pattern guarantees message delivery by persisting events in the same DB transaction as the business operation: never trust a network call after a commit.
2. Always implement idempotent consumers: duplicates are inevitable due to publisher crashes and retries. Use a deduplication store (Redis or DB) with TTL.
3. Index your outbox table on (published, created_at) with a partial index for unpublished events: otherwise your publisher will full-table-scan and kill performance.
4. The outbox pattern is overkill for systems that can tolerate occasional message loss. Save complexity for when losing a message costs you money or reputation.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR
How does the transactional outbox pattern handle concurrent writes to th...
Q02 · SENIOR
When would you choose the transactional outbox pattern over a two-phase ...
Q03 · SENIOR
What happens if the publisher crashes after marking an event as publishe...
Q04 · JUNIOR
What is the transactional outbox pattern?
Q05 · SENIOR
You notice that events are being published but the consumer is processin...
Q06 · SENIOR
How would you design the outbox pattern for a system processing 100k eve...
Q01 of 06 · SENIOR

How does the transactional outbox pattern handle concurrent writes to the outbox table from multiple service instances?

ANSWER
Each instance writes to the outbox table in its own transaction. The publisher uses SELECT FOR UPDATE SKIP LOCKED (PostgreSQL) or a similar mechanism to claim events without blocking other instances. This ensures each event is claimed by only one publisher instance at a time. Delivery is still at-least-once: a publisher that crashes after sending but before marking the row as published will cause a re-send.
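The claiming behaviour in that answer can be sketched in memory. This is a simulation of the idea, not database code: `SkipLockedClaimSketch` is a hypothetical class, and an atomic `putIfAbsent` stands in for the row lock that SKIP LOCKED would take. A publisher walks the unpublished events in order and takes only those nobody else holds, so two publishers end up with disjoint batches and neither waits on the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SkipLockedClaimSketch {
    // eventId -> publisher currently holding the claim (stands in for a row lock)
    private final Map<String, String> locks = new ConcurrentHashMap<>();
    private final List<String> unpublished;

    SkipLockedClaimSketch(List<String> unpublishedIds) {
        this.unpublished = new ArrayList<>(unpublishedIds);
    }

    // Claims up to `limit` events for `publisher`, skipping any already claimed.
    List<String> claim(String publisher, int limit) {
        List<String> claimed = new ArrayList<>();
        for (String id : unpublished) {
            if (claimed.size() == limit) {
                break;
            }
            // putIfAbsent is atomic: only one publisher can win a given event
            if (locks.putIfAbsent(id, publisher) == null) {
                claimed.add(id);
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        SkipLockedClaimSketch outbox =
                new SkipLockedClaimSketch(Arrays.asList("e1", "e2", "e3", "e4"));
        System.out.println(outbox.claim("publisher-A", 2)); // prints [e1, e2]
        System.out.println(outbox.claim("publisher-B", 2)); // prints [e3, e4]
    }
}
```

In a real database the claim is released when the claiming transaction ends, so a publisher that dies mid-batch frees its rows for the next poll.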
FAQ · 4 QUESTIONS

Frequently Asked Questions

01
What is the transactional outbox pattern and why is it needed?
02
What's the difference between transactional outbox and saga pattern?
03
How do I implement the transactional outbox pattern in Spring Boot?
04
What happens if the outbox publisher fails to publish an event?