Senior 4 min · June 25, 2026

Transactional Outbox Pattern: Stop Losing Events in Production — The Definitive Guide

Q: What is the transactional outbox pattern and why is it needed?

It's a pattern that ensures messages are reliably sent after a database transaction by writing them to an outbox table in the same transaction. It's needed to solve the dual-write problem where a service crashes after committing to the DB but before sending a message, causing data loss.

Q: What's the difference between transactional outbox and saga pattern?

The outbox pattern ensures reliable message delivery from a single service. The saga pattern coordinates multiple services in a distributed transaction with compensating actions. Use outbox for reliable messaging, saga for multi-step business processes.

Q: How do I implement the transactional outbox pattern in Spring Boot?

Add an OutboxEvent entity and repository. In your service method annotated with @Transactional, save the business entity and the outbox event. Create a @Scheduled method that polls unpublished events and publishes them via a message template. Use @Transactional on the publisher method to mark events as published atomically.

Q: What happens if the outbox publisher fails to publish an event?

The event remains unpublished in the outbox table. The publisher will retry on the next poll cycle. To prevent infinite retries, implement a retry limit and move events to a dead-letter queue after N failures. Monitor the dead-letter queue for manual intervention.

Transactional outbox pattern explained with production code, failure modes, and debugging.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Lessons pulled from things that broke in production.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren


⚡Quick Answer

Use the transactional outbox pattern when you need to guarantee that a message is sent after a database transaction commits. Write the event to an outbox table in the same DB transaction, then have a background process poll and publish those events to your message broker. Delivery is at-least-once, not exactly-once, so consumers must tolerate duplicates.

Definition

What is Transactional Outbox Pattern?

The transactional outbox pattern ensures reliable message delivery by writing events to a database table within the same transaction as the business operation, then a separate process publishes them to the message broker. This prevents data loss when the broker is unavailable or the service crashes after the DB commit but before the message is sent.


Plain-English First

Imagine you're a bartender who needs to both pour a drink and tell the waitstaff it's ready. If you pour the drink but forget to call out, the waitstaff never picks it up. The outbox pattern is like writing the order on a slip of paper and putting it on a spike — the drink is poured (DB commit) and the order slip is there (outbox). A runner checks the spike every few seconds and yells the order. If the runner crashes, the slip is still on the spike when they restart. No drink gets lost.

You've been there. A payment succeeds in the database, but the confirmation email never sends. Or an order is placed, but the inventory service never gets the message. The root cause? Your service committed the transaction, then tried to publish a message — and crashed between the two. That's the dual-write problem, and it's been burning production systems since the dawn of microservices.

The transactional outbox pattern is the battle-tested solution. Instead of sending messages directly from your business logic, you write them to a database table in the same transaction. A separate process — the publisher — reads that table and sends the messages to the broker. If the publisher crashes, it picks up where it left off. No more lost events.

By the end of this article, you'll be able to implement the transactional outbox pattern in your own services, handle edge cases like duplicate messages and backpressure, and debug the most common production failures. You'll also know exactly when this pattern is overkill and what simpler alternatives exist.

Why You Can't Trust Direct Message Publishing

The naive approach: after your business logic commits the DB transaction, you publish a message to Kafka/RabbitMQ/SQS. This works 99.9% of the time. But that 0.1%? That's your 3am call. The service crashes between commit and publish. Or the broker is briefly unavailable. Or the network times out. The DB says the order is placed, but the rest of the system never knows.

Before the outbox pattern, teams hacked around this with two-phase commits (too slow), delayed retries (complex), or just hoped for the best (production incidents). The outbox pattern is the pragmatic middle ground: atomicity without distributed transactions.


// io.thecodeforge — System Design tutorial

// Naive approach: send message after DB commit
// THIS IS BROKEN — don't do this
public void placeOrder(Order order) {
    try {
        orderRepository.save(order);  // DB commit
        messagePublisher.publish("order.created", order);  // network call — can fail
    } catch (Exception e) {
        log.error("Failed to publish order event", e);
        // Order is saved, but event is lost. No retry.
    }
}

Output

Order saved to DB. Event not published. User gets no confirmation. Support ticket filed at 3am.

Production Trap:

The naive approach fails silently when the message broker is under load. You'll see 'Connection refused' or 'TimeoutException' in logs, but the business operation already committed. The event is gone forever unless you have a compensating transaction.

[Figure: Transactional Outbox Pattern Flow]

The Outbox Pattern: Atomic Writes to the Rescue

The fix is brutally simple: write the event to a database table (the outbox) in the same transaction as your business operation. If the transaction commits, the event is persisted. If it rolls back, the event disappears with the business data. A separate process — the publisher — reads the outbox table and sends events to the broker.

This decouples the reliability of the DB from the reliability of the network. The DB is local, fast, and transactional. The network is none of those things. By writing the event first, you guarantee it won't be lost even if the publisher crashes mid-flight.


// io.thecodeforge — System Design tutorial

// Correct: write event to outbox in same transaction
// rollbackFor: by default Spring rolls back only on unchecked exceptions, and
// writeValueAsString throws the checked JsonProcessingException
@Transactional(rollbackFor = Exception.class)
public void placeOrder(Order order) throws JsonProcessingException {
    orderRepository.save(order);
    OutboxEvent event = new OutboxEvent(
        UUID.randomUUID(),
        "order.created",
        objectMapper.writeValueAsString(order),
        LocalDateTime.now()
    );
    outboxRepository.save(event);
    // Transaction commits when the method returns: order and event are persisted atomically
}

// Publisher: polls outbox and sends events
@Scheduled(fixedDelay = 100)
public void publishOutboxEvents() {
    List<OutboxEvent> events = outboxRepository.findTop100ByPublishedFalseOrderByCreatedAt();
    for (OutboxEvent event : events) {
        try {
            messagePublisher.publish(event.getType(), event.getPayload());
            event.setPublished(true);
            event.setPublishedAt(LocalDateTime.now());
            outboxRepository.save(event);
        } catch (Exception e) {
            log.error("Failed to publish event {}", event.getId(), e);
            // Will be retried on next poll
        }
    }
}

Output

Order saved. Event saved. Publisher sends event within 100ms. If publisher crashes, event is still in outbox and will be picked up on restart.
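The publisher above retries a failing event on every poll, forever. A retry cap that parks the event in a dead-letter list after N failures can be sketched as follows. This is a minimal in-memory sketch, not the article's implementation: `RetryableEvent`, `RetryLimitedPublisher`, the `attempts` counter, and the `Predicate` standing in for the broker are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for an outbox row; "attempts" would be an extra column.
class RetryableEvent {
    final String id;
    int attempts = 0;
    boolean published = false;

    RetryableEvent(String id) {
        this.id = id;
    }
}

class RetryLimitedPublisher {
    private final int maxAttempts;
    private final Predicate<RetryableEvent> broker;   // returns true when the send succeeds
    final List<RetryableEvent> deadLetters = new ArrayList<>();

    RetryLimitedPublisher(int maxAttempts, Predicate<RetryableEvent> broker) {
        this.maxAttempts = maxAttempts;
        this.broker = broker;
    }

    // One poll cycle over the pending rows; returns the rows that need another cycle.
    List<RetryableEvent> pollOnce(List<RetryableEvent> pending) {
        List<RetryableEvent> stillPending = new ArrayList<>();
        for (RetryableEvent event : pending) {
            event.attempts++;
            if (broker.test(event)) {
                event.published = true;
            } else if (event.attempts >= maxAttempts) {
                deadLetters.add(event);       // give up: park for manual inspection
            } else {
                stillPending.add(event);      // retry on the next cycle
            }
        }
        return stillPending;
    }
}
```

In a real service the attempt counter and the dead-letter flag would live in the outbox table, so that they survive a publisher restart.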

Senior Shortcut:

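Give every outbox row a unique event ID and carry that ID in the published message. A unique constraint on (aggregate_id, event_type) only prevents duplicate rows if an aggregate emits each event type at most once; otherwise it rejects legitimate events. Neither constraint makes the publisher idempotent: if it crashes after publishing but before marking the row as published, the next poll will re-publish, and the consumer has to deduplicate via the event ID.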
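The crash window described in the note above (the send succeeds, the marking does not) can be reproduced in a few lines. The following is an in-memory simulation only; `DuplicateDeliveryDemo`, `Row`, and the `crashBeforeMark` flag are hypothetical names, and the "broker" delivers to the consumer synchronously to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory simulation; all class and method names here are hypothetical.
class DuplicateDeliveryDemo {
    static final class Row {
        final String eventId;
        boolean published = false;

        Row(String eventId) {
            this.eventId = eventId;
        }
    }

    final List<String> brokerLog = new ArrayList<>();   // every message the "broker" received
    final Set<String> processedIds = new HashSet<>();   // consumer-side dedup store
    int sideEffects = 0;                                // how often the consumer did real work

    // Publisher step: send, then mark. A crash between the two leaves the row unpublished.
    void publish(Row row, boolean crashBeforeMark) {
        brokerLog.add(row.eventId);
        consume(row.eventId);
        if (!crashBeforeMark) {
            row.published = true;
        }
    }

    // Idempotent consumer: Set.add returns false when the ID was already seen.
    void consume(String eventId) {
        if (processedIds.add(eventId)) {
            sideEffects++;
        }
    }

    static DuplicateDeliveryDemo run() {
        DuplicateDeliveryDemo demo = new DuplicateDeliveryDemo();
        Row row = new Row("evt-1");
        demo.publish(row, true);          // first attempt: sent, but crashed before marking
        if (!row.published) {
            demo.publish(row, false);     // next poll: row is still unpublished, so it is sent again
        }
        return demo;
    }
}
```

After `run()`, the broker log holds the same event twice while the consumer's side effect ran once, which is the behavior an at-least-once publisher plus an idempotent consumer is meant to give.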

[Figure: Outbox Pattern: Atomic Write Flow]

Polling vs CDC: Which Publisher Strategy Wins?

The simplest publisher polls the outbox table every N milliseconds. This works fine for most systems, but has two drawbacks: (1) polling adds latency proportional to the poll interval, and (2) it puts load on the database, especially if you have many rows.

Change Data Capture (CDC) is the alternative. Tools like Debezium read the database's transaction log (WAL in PostgreSQL, binlog in MySQL) and stream changes to Kafka. This gives you low latency (usually well under a second, depending on connector configuration) and no polling queries against the outbox table. The trade-off: you now manage a CDC pipeline, which is another moving part.

My rule of thumb: if your latency requirement is <100ms and you already have Kafka, use CDC. If you're on a simpler stack or latency isn't critical, polling is fine. I've run polling-based outboxes at 500ms intervals handling 10k events/sec without issues.


// io.thecodeforge — System Design tutorial

// Polling publisher with batch processing
@Scheduled(fixedDelay = 500)
public void publishBatch() {
    List<OutboxEvent> batch = outboxRepository.findTop500ByPublishedFalseOrderByCreatedAt();
    if (batch.isEmpty()) return;
    
    for (OutboxEvent event : batch) {
        try {
            messagePublisher.publish(event.getType(), event.getPayload());
            event.setPublished(true);
        } catch (Exception e) {
            log.warn("Retrying event {} later", event.getId());
            // Don't mark as published — will retry
        }
    }
    outboxRepository.saveAll(batch);  // Batch update
}

Output

Processes up to 500 events every 500ms. Failed events are retried on next cycle. Batch save reduces DB round-trips.
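Batch size and poll delay together cap what a single polling publisher can deliver. A back-of-envelope helper, assuming one publisher instance and `fixedDelay` scheduling (the delay starts after the previous run finishes); `PollingThroughput` and the `batchProcessingMs` parameter are hypothetical names for illustration:

```java
// Rough ceiling for one polling publisher; a sizing aid, not a benchmark.
class PollingThroughput {
    // Each cycle costs the time to process the batch plus the fixed delay
    // before the next cycle starts, and moves at most batchSize events.
    static double maxEventsPerSecond(int batchSize, long fixedDelayMs, long batchProcessingMs) {
        return batchSize * 1000.0 / (fixedDelayMs + batchProcessingMs);
    }
}
```

With the values above, `maxEventsPerSecond(500, 500, 0)` is 1,000 events/sec even if publishing took no time at all. Reaching 10,000 events/sec at the same 500ms delay would need a batch of about 5,000, a shorter delay, or several publisher instances.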

Interview Gold:

Interviewers love asking about the trade-off between polling and CDC. The key insight: both are eventually consistent, but polling adds up to one poll interval of delay while CDC is near-real-time. CDC also adds operational complexity: you need to manage Debezium connectors, handle schema changes, and deal with the fact that the transaction log is a finite resource.

[Figure: Polling vs CDC Publisher]

Handling Duplicates: Idempotent Consumers Save Your Sanity

Even with the outbox pattern, duplicates can happen. The publisher might crash after publishing but before marking the event as published. On restart, it re-publishes the same event. Your consumer sees it twice.

The fix is idempotent consumers. Each event carries a unique ID (UUID). The consumer stores processed event IDs in a deduplication table (or uses a Redis set with TTL). Before processing an event, it checks if the ID was already processed. If yes, it skips.

This is not optional. Every production outbox implementation I've seen eventually produces duplicates — network retries, publisher crashes, DB replication lag. Idempotent consumers are your safety net.


// io.thecodeforge — System Design tutorial

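// Idempotent consumer: one Redis key per event ID, each with its own TTL
// (needs java.time.Duration)
public void handleOrderCreated(OrderCreatedEvent event) {
    String eventId = event.getEventId();
    String key = "processed_events:" + eventId;

    // Atomically claim the event ID (SET NX with a 24h expiry).
    // A separate check-then-add lets two concurrent deliveries both pass the check.
    Boolean firstDelivery = redisTemplate.opsForValue()
        .setIfAbsent(key, "1", Duration.ofHours(24));
    if (!Boolean.TRUE.equals(firstDelivery)) {
        log.info("Skipping duplicate event {}", eventId);
        return;
    }

    try {
        // Process event
        inventoryService.reserveStock(event.getOrder());
        emailService.sendConfirmation(event.getOrder());
    } catch (RuntimeException e) {
        // Release the claim so a redelivery can retry
        redisTemplate.delete(key);
        throw e;
    }
}

Output

Duplicate events are silently skipped. Each dedup key carries its own 24-hour TTL, so the store does not grow without bound. If processing fails, the claim is released so a redelivery can retry.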

Never Do This:

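Don't rely on a unique constraint for deduplication in the consumer unless the dedup record is written in the same transaction as the business change. If the dedup record commits but the consumer crashes before finishing the work, the retry hits a constraint violation and the work is never done. If that violation is surfaced as an error instead of being treated as "already processed", the message is redelivered over and over and becomes a poison message.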

When the Outbox Pattern Is Overkill

The outbox pattern adds complexity: an extra table, a publisher process, and deduplication logic. Don't use it if you don't need it.

When to skip it

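You're using a message broker that supports transactions (e.g., Kafka with exactly-once semantics via transactional producers) and the state you are changing lives in that broker. Kafka transactions cover only Kafka reads and writes; they do not include your database, so a DB write followed by a Kafka send is still a dual write. Broker transactions are also slower and more fragile.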
Your system can tolerate occasional message loss. For example, a cache invalidation event can be retried on the next read.
You're building a prototype or internal tool where losing a message means a manual retry.

My rule: if losing a single message costs you money or reputation, use the outbox pattern. Otherwise, keep it simple.

Senior Shortcut:

Before implementing the outbox pattern, ask: 'What happens if this message is lost?' If the answer is 'We'll catch it in the next reconciliation job' or 'It's just a notification', skip the outbox. Save your complexity budget for things that matter.

Production Gotchas: What Will Burn You

I've seen three common failures in production outbox implementations:

Outbox table growth: If the publisher falls behind, the outbox table grows unbounded. This slows down your business transactions because every write also inserts into the outbox. Fix: add a TTL or archive processed events to a separate table. Or use a partitioned table and drop old partitions.
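Deadlocks on the outbox table: If your publisher uses SELECT FOR UPDATE to claim events, and your business transaction also writes to the outbox, you can get deadlocks. Fix: keep the publisher's claiming transaction short (never hold row locks across the broker call) and claim rows with SELECT ... FOR UPDATE SKIP LOCKED where the database supports it, or use optimistic locking with a version column. A separate connection pool for the publisher stops it from starving business transactions of connections, but does not by itself prevent lock conflicts.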
Publisher backpressure: If the message broker is slow, the publisher's poll loop blocks, and events pile up. Fix: hand events to worker threads through a bounded queue and stop enqueuing when the queue is full. The rows stay unpublished in the outbox, so nothing is dropped; the next poll picks them up once the workers catch up.

// io.thecodeforge - System Design tutorial

// Publisher with backpressure: bounded queue, rows stay in the outbox until published
ExecutorService executor = Executors.newFixedThreadPool(4);
BlockingQueue<OutboxEvent> queue = new LinkedBlockingQueue<>(1000);
// IDs already handed to the workers, so a later poll doesn't enqueue them a second time
Set<UUID> inFlight = ConcurrentHashMap.newKeySet();

@Scheduled(fixedDelay = 100)
public void pollAndEnqueue() {
    List<OutboxEvent> events = outboxRepository.findTop100ByPublishedFalseOrderByCreatedAt();
    for (OutboxEvent event : events) {
        if (!inFlight.add(event.getId())) {
            continue;  // already queued or being published
        }
        if (!queue.offer(event)) {
            // Queue full: back off. The row is still unpublished in the outbox,
            // so a later poll will pick it up again.
            inFlight.remove(event.getId());
            log.warn("Outbox queue full, deferring event {}", event.getId());
            break;
        }
    }
}

// Worker threads process the queue
public void startWorkers() {
    for (int i = 0; i < 4; i++) {
        executor.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                OutboxEvent event;
                try {
                    event = queue.take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // shutdown requested
                    return;
                }
                try {
                    messagePublisher.publish(event.getType(), event.getPayload());
                    event.setPublished(true);
                    outboxRepository.save(event);
                } catch (Exception e) {
                    log.error("Failed to publish event {}", event.getId(), e);
                    // Row stays unpublished; a later poll re-enqueues it
                } finally {
                    inFlight.remove(event.getId());
                }
            }
        });
    }
}

Output

Events are enqueued into a bounded queue. When the queue is full, the poller backs off and the remaining rows wait in the outbox for a later cycle. Workers publish concurrently, so ordering across events is not guaranteed and consumers still need to be idempotent.

The Classic Bug:

Forgetting to index the outbox table for the publisher's query. Without an index, the publisher's SELECT does a full table scan, which kills performance as the table grows. In PostgreSQL, a partial index covers only the unpublished rows and stays small: CREATE INDEX idx_outbox_unpublished ON outbox_events (created_at) WHERE published = false; On databases without partial indexes (MySQL, for example), use a composite index on (published, created_at).
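The first gotcha above (outbox table growth) is usually handled by a retention job that removes rows once they have been published for long enough. A minimal in-memory sketch of that rule follows; `OutboxRetention`, `Row`, and `purge` are hypothetical names, and in a real service this would be a scheduled DELETE or a partition drop against the outbox table rather than a list filter.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// In-memory sketch of a retention rule for published outbox rows.
class OutboxRetention {
    static final class Row {
        final String id;
        final boolean published;
        final Instant publishedAt;   // null while the row is unpublished

        Row(String id, boolean published, Instant publishedAt) {
            this.id = id;
            this.published = published;
            this.publishedAt = publishedAt;
        }
    }

    // Keeps unpublished rows and rows published within the retention window; drops the rest.
    static List<Row> purge(List<Row> rows, Instant now, Duration retention) {
        Instant cutoff = now.minus(retention);
        List<Row> kept = new ArrayList<>();
        for (Row row : rows) {
            boolean expired = row.published && row.publishedAt.isBefore(cutoff);
            if (!expired) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

Unpublished rows are never purged regardless of age, since deleting them would lose events the publisher has not delivered yet.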

Production incident (post-mortem, severity: high)

The 3AM Payment That Vanished

Symptom

Users reported successful payments but never received confirmation emails. The payment service logs showed the DB transaction committed, but the message queue had no corresponding event.

Assumption

The team assumed the message broker was dropping messages under load.

Root cause

The payment service sent the message directly after the DB commit, but a thread pool exhaustion caused the send to fail silently. The exception was caught and logged, but the event was never retried.

Fix

Moved to transactional outbox: write the event to an outbox table in the same DB transaction. A scheduled task polls the outbox every 100ms and publishes events. Added a unique constraint on event_id to prevent duplicates.

Key lesson

Never trust a network call after a DB commit.
Always persist the intent to send before the commit.

Production debug guide

Systematic recovery paths for the failure modes engineers actually hit.

Symptom · 01

Outbox table growing unboundedly; events not being published


Fix

1. Check publisher logs for errors (connection refused, timeout). 2. Verify the message broker is healthy. 3. Check if the publisher is running (process alive, no OOM). 4. If publisher is stuck, restart it. 5. If broker is down, fix broker first, then events will drain.

Symptom · 02

Duplicate events being processed by consumers


Fix

1. Check if the publisher marks events as published after sending. 2. Verify the consumer has idempotency logic (dedup table/Redis). 3. If not, add dedup. 4. Repair the effects of duplicates that were already processed (for example, reverse a double stock reservation). Do not delete processed event IDs from the dedup store; that lets the same duplicates through again.

Symptom · 03

High latency in event delivery (minutes instead of seconds)


Fix

1. Check the poll interval configuration. 2. Check if the publisher is CPU-bound or I/O-bound. 3. Increase the number of publisher threads. 4. Consider switching to CDC if latency requirements are strict.
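When chasing delivery latency, one number worth tracking is the age of the oldest unpublished outbox row. The helper below is a small sketch of that calculation; `OutboxLag` and `lag` are hypothetical names, and the list of timestamps is assumed to come from a query over the unpublished rows' created_at values.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Publisher lag: how long the oldest unpublished event has been waiting.
class OutboxLag {
    // Returns Duration.ZERO when there is no backlog.
    static Duration lag(List<Instant> unpublishedCreatedAt, Instant now) {
        Instant oldest = null;
        for (Instant createdAt : unpublishedCreatedAt) {
            if (oldest == null || createdAt.isBefore(oldest)) {
                oldest = createdAt;
            }
        }
        return oldest == null ? Duration.ZERO : Duration.between(oldest, now);
    }
}
```

A lag that stays near the poll interval points at configuration; a lag that keeps growing points at a publisher that cannot keep up with the write rate.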

Transactional Outbox Pattern Triage Cheat Sheet

First-response commands for when things go wrong, copy-paste ready.

Events not being published; outbox table growing

Immediate action

Check publisher logs for errors

Commands

kubectl logs deployment/outbox-publisher --tail=100

SELECT COUNT(*) FROM outbox_events WHERE published = false;

Fix now

Restart publisher: kubectl rollout restart deployment/outbox-publisher


Feature / Aspect	Polling Publisher	CDC Publisher
Latency	Up to one poll interval (e.g., 100ms)	Near-real-time (usually well under a second)
Database Load	SELECT queries every poll interval	Minimal (reads the transaction log, no polling queries)
Operational Complexity	Low (just a scheduled task)	High (Debezium, Kafka Connect)
Durability	Event survives if publisher crashes	Event survives if CDC connector crashes
Best For	Simple systems, low event volume	High-throughput, low-latency systems

Key takeaways

The transactional outbox pattern guarantees message delivery by persisting events in the same DB transaction as the business operation: never trust a network call after a commit.

Always implement idempotent consumers: duplicates are inevitable due to publisher crashes and retries. Use a deduplication store (Redis or DB) with TTL.

Index your outbox table on (published, created_at) with a partial index for unpublished events; otherwise your publisher will full-table-scan and kill performance.

The outbox pattern is overkill for systems that can tolerate occasional message loss. Save complexity for when losing a message costs you money or reputation.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

How does the transactional outbox pattern handle concurrent writes to th...

Q02SENIOR

When would you choose the transactional outbox pattern over a two-phase ...

Q03SENIOR

What happens if the publisher crashes after marking an event as publishe...

Q04JUNIOR

What is the transactional outbox pattern?

Q05SENIOR

You notice that events are being published but the consumer is processin...

Q06SENIOR

How would you design the outbox pattern for a system processing 100k eve...

Q01 of 06SENIOR

How does the transactional outbox pattern handle concurrent writes to the outbox table from multiple service instances?

ANSWER

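Each instance writes to the outbox table in its own transaction. The publisher uses SELECT FOR UPDATE SKIP LOCKED (PostgreSQL) or a similar mechanism to claim events without blocking other instances. This ensures each event is claimed by only one publisher instance at a time. Delivery is still at-least-once: an instance that crashes after publishing but before marking the row causes a re-publish, so consumers must deduplicate.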

