Intermediate 8 min · May 23, 2026

Spring Boot Async Messaging Patterns: Kafka, RabbitMQ & Beyond

Q: Can I use both Kafka and RabbitMQ in the same Spring Boot application?

Yes. Add both spring-kafka and spring-boot-starter-amqp dependencies. Configure separate @Bean for KafkaTemplate and RabbitTemplate with their respective properties. Spring Boot's auto-configuration handles both. A common pattern is Kafka for domain event streaming between services and RabbitMQ for internal task queues or request-reply RPC within a bounded context.

Q: What is the maximum message size for Kafka and RabbitMQ?

Kafka's default maximum message size is about 1MB (message.max.bytes on the broker). It can be raised, for example to 10MB, for larger payloads, but this increases memory pressure on consumers and brokers, and the producer's max.request.size and the consumer's fetch limits must be raised to match. For larger payloads, store the data in S3/GCS and put only the reference in the Kafka message. RabbitMQ's default limit is much higher (128MB in 3.x releases; check max_message_size for your version), but keeping messages small is recommended there too. The same claim-check pattern applies: store large payloads externally and reference them in the message.
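A minimal sketch of the claim-check idea, assuming an in-memory map as a stand-in for S3/GCS. The class name ClaimCheck and the "blob/" reference format are hypothetical; a real implementation would call the object store's client and handle expiry of stored payloads.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Claim-check sketch: the broker message carries a reference, not the payload. */
final class ClaimCheck {

    // Hypothetical stand-in for an object store such as S3 or GCS.
    private final Map<String, byte[]> blobStore = new ConcurrentHashMap<>();

    /** Producer side: stores the payload externally and returns the small reference to publish. */
    String checkIn(byte[] largePayload) {
        String reference = "blob/" + UUID.randomUUID();
        blobStore.put(reference, largePayload);
        return reference;
    }

    /** Consumer side: resolves the reference from the message back to the payload. */
    byte[] checkOut(String reference) {
        byte[] payload = blobStore.get(reference);
        if (payload == null) {
            throw new IllegalStateException("No payload stored for " + reference);
        }
        return payload;
    }
}
```

The producer publishes only the returned reference, so the message stays far below the broker's size limit regardless of payload size.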

Q: How do I monitor Kafka consumer lag in production?

Use kafka-consumer-groups.sh --describe for CLI inspection. For production observability, use Kafka Exporter or Confluent's JMX metrics exposed via Micrometer to Prometheus, then build a Grafana dashboard on the lag metric (kafka_consumergroup_lag in Kafka Exporter). As a starting point, alert at lag > 1000 (warning) and lag > 10000 (critical), then tune the thresholds to your topic's throughput. For RabbitMQ, use the management plugin REST API (/api/queues) or the rabbitmq-prometheus plugin.

Q: Should I use Spring Cloud Stream instead of Spring Kafka directly?

Spring Cloud Stream adds a binder abstraction that lets you swap Kafka for RabbitMQ by changing a dependency. Use it if you are building a framework or expect to support multiple brokers. For a specific-broker application, use Spring Kafka or Spring AMQP directly — you get better control over broker-specific features (Kafka transactions, RabbitMQ exchange types) without the abstraction overhead.

Q: How do I handle ordering in Kafka when I have multiple consumer instances?

Kafka guarantees ordering only within a partition. Ensure all events for a given entity use the same partition key: kafkaTemplate.send(topic, entityId.toString(), event). With N partitions and M consumer instances (M ≤ N), Kafka assigns each partition to exactly one consumer — all events for one entity go to one partition, processed by one consumer thread, in order. If M > N, some consumer instances will have no partitions assigned and will be idle.
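The key-to-partition mapping can be sketched as follows. This is an illustration only: Kafka's default partitioner hashes the serialized key with murmur2, while this sketch uses a plain array hash, so the partition numbers it produces will not match a real cluster. What carries over is the property that matters for ordering: the same key always maps to the same partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class KeyPartitioning {

    // Illustrative hash only: Kafka's default partitioner uses murmur2 over the
    // serialized key, so real partition numbers will differ from these.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 6;
        // Every event keyed by the same order ID maps to the same partition.
        System.out.println("order-123 -> partition " + partitionFor("order-123", partitions));
        System.out.println("order-123 -> partition " + partitionFor("order-123", partitions));
    }
}
```

Note that the mapping depends on the partition count, so adding partitions to an existing topic changes where new events for a key land.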

Q: What is the difference between at-most-once, at-least-once, and exactly-once delivery?

At-most-once: messages may be lost, never duplicated. Achieved by auto-commit with Kafka. Appropriate for metrics/telemetry where occasional loss is acceptable. At-least-once: messages are never lost but may be duplicated. The standard for production systems — manual commit + retry on failure. Consumers must be idempotent. Exactly-once: no loss, no duplicates. Requires Kafka transactions (idempotent producer + transactional consumer) or an idempotency table in the consumer's database. The most expensive guarantee.
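A minimal sketch of an idempotent consumer, assuming an in-memory set as a stand-in for the idempotency table. The class and method names are hypothetical. In a real service the ID insert and the business write would need to happen in the same database transaction, otherwise a crash between them reintroduces the duplicate or the loss.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class IdempotentConsumer {

    // Stand-in for an idempotency table with a unique constraint on message ID.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final AtomicLong balance = new AtomicLong();

    /** Returns true if the message was applied, false if it was a duplicate delivery. */
    boolean onMessage(String messageId, long amount) {
        if (!processedIds.add(messageId)) {
            return false; // redelivery of an already-processed message
        }
        balance.addAndGet(amount);
        return true;
    }

    long balance() {
        return balance.get();
    }
}
```

With this in place, at-least-once redelivery changes the outcome only the first time a given message ID is seen.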

Master async messaging in Spring Boot: request-reply with correlation IDs, fire-and-forget, pub-sub, Kafka vs RabbitMQ trade-offs, and backpressure handling.

Naren Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Notes here come from systems that actually shipped.

✓ Production

production tested

July 04, 2026

last updated

Before you start⏱ 25 min

✓Solid grasp of fundamentals
✓Comfortable reading code examples
✓Basic production concepts

⚡Quick Answer

Fire-and-forget sends a message and moves on — no response expected, highest throughput
Request-reply adds a correlation ID and a reply topic/queue so the sender can match the response
Pub-sub decouples publishers from subscribers — multiple consumers receive the same event
Kafka excels at ordered, high-throughput, replayable event streams; RabbitMQ excels at flexible routing and low-latency RPC
Backpressure must be handled explicitly: bounded queues, consumer rate limiting, and dead-letter queues

✦ Definition~90s read

What is Async Messaging Patterns in Spring Boot?

Asynchronous messaging is a communication pattern where the sender deposits a message into an intermediate broker and immediately continues execution, without waiting for the receiver to process the message or return a response. The broker durably stores the message until a consumer is ready to process it, providing temporal decoupling (sender and receiver need not be available simultaneously) and load levelling (bursts of messages are absorbed by the broker rather than overwhelming the consumer).

In Spring Boot, the two dominant brokers are Apache Kafka and RabbitMQ. Kafka models messages as an immutable, ordered, partitioned log — messages are retained for a configurable period (e.g., 7 days) and consumers track their position (offset) in the log, enabling replay.

RabbitMQ models messages as work items in queues — messages are typically deleted after acknowledgement, making replay impossible but making per-message routing, prioritisation, and TTL easy to configure.

Spring's messaging abstractions (@KafkaListener, @RabbitListener, @SendTo, ReplyingKafkaTemplate) reduce boilerplate significantly, but production-grade messaging requires understanding the underlying broker semantics: delivery guarantees (at-most-once, at-least-once, exactly-once), ordering guarantees (per-partition in Kafka, per-queue in RabbitMQ), and failure handling (DLQ, retry policies, poison pill detection).

Plain-English First

Async messaging is like dropping a letter in a post box instead of making a phone call. Fire-and-forget is a flyer in the letterbox — you walk away immediately. Request-reply is a registered letter with a return address — you wait for the postman to bring an answer. Pub-sub is a newspaper — one publisher, many readers, each getting their own copy on their own schedule.

Synchronous HTTP calls are the instinctive choice when building microservices APIs. They are easy to reason about, easy to test, and return results immediately. But in production, synchronous calls become fragility vectors: a slow downstream service holds a thread, a timeout cascade can take down an entire call chain, and a service restart loses all in-flight requests. Async messaging inverts this dynamic — the caller deposits work into a broker and continues processing immediately, decoupled from the receiver's availability.

Spring Boot's messaging ecosystem is mature and rich. Spring Kafka wraps the Kafka client with @KafkaListener, KafkaTemplate, and deep integration with Spring transactions and the application context lifecycle. Spring AMQP provides the same for RabbitMQ with additional routing primitives that Kafka's topic model does not offer. Spring Cloud Stream adds an abstraction layer that lets you switch brokers by changing a dependency.

The choice of pattern — fire-and-forget, request-reply, or pub-sub — shapes the entire architecture of a feature. Fire-and-forget is appropriate when the caller does not need a result: sending a welcome email, logging an audit event, triggering a background report. Request-reply is appropriate when the caller does need a result but can tolerate higher latency than a synchronous call: an order enrichment service that calls a pricing service asynchronously while computing other fields in parallel. Pub-sub is appropriate when multiple downstream systems need to react to the same event: an OrderPlaced event consumed by inventory, billing, analytics, and fulfilment.

The choice of broker shapes throughput, ordering guarantees, and routing capabilities. Kafka's partitioned log is the choice for high-throughput event streaming (millions of events per second), ordered processing within a partition, and event replay (new consumers can read from the beginning of a topic). RabbitMQ's exchange/queue model is the choice for flexible routing (topic exchanges, fanout exchanges, header routing), low-latency RPC patterns, and priority queues. Both are mature, production-proven, and have excellent Spring Boot support.

Backpressure is the often-overlooked consequence of async messaging. If producers publish faster than consumers can process, the broker queue grows without bound until it exhausts memory or disk. Explicit backpressure mechanisms — bounded queues, consumer concurrency limits, rate limiting, and dead-letter queues — are essential production requirements, not optional enhancements.

Fire-and-Forget: Maximising Throughput

Fire-and-forget is the simplest and highest-throughput async pattern. The producer publishes a message to the broker and immediately continues execution without waiting for processing confirmation beyond the broker's acknowledgement (acks=all for Kafka). It is appropriate when the business operation is complete once the event is durably recorded, regardless of downstream processing.

Common use cases: audit logging, welcome emails, analytics events, cache invalidation signals, search index updates. The key property these share is that the producer does not need to know whether or when the consumer processes the event, and a processing failure in the consumer can be handled independently (DLQ + alerting) without affecting the producer's success path.

In Spring Boot with Kafka, fire-and-forget uses KafkaTemplate.send(), which returns a CompletableFuture (a ListenableFuture before Spring Kafka 3.0). For true fire-and-forget you can ignore the future: the send still proceeds in the background, but any failure goes unnoticed. For production resilience, add a callback to log send failures: future.whenComplete((result, ex) -> { if (ex != null) log.error(...); }). This does not block the producer but gives visibility into broker-level failures.
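The callback shape can be sketched without a broker. In this sketch the BiFunction is a hypothetical stand-in for an asynchronous send such as KafkaTemplate.send(topic, key, value); the class name and the failure counter are illustrative assumptions, and a real service would log through its logging framework and export the counter as a metric.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

final class FireAndForgetPublisher {

    // Hypothetical stand-in for an async send such as KafkaTemplate.send(topic, key, value).
    private final BiFunction<String, String, CompletableFuture<Void>> sender;
    private final AtomicInteger failedSends = new AtomicInteger();

    FireAndForgetPublisher(BiFunction<String, String, CompletableFuture<Void>> sender) {
        this.sender = sender;
    }

    /** Publishes without blocking; failures are counted and logged in a callback. */
    void publish(String key, String payload) {
        sender.apply(key, payload).whenComplete((ignored, ex) -> {
            if (ex != null) {
                failedSends.incrementAndGet();
                System.err.println("Send failed for key " + key + ": " + ex.getMessage());
            }
        });
    }

    int failedSends() {
        return failedSends.get();
    }
}
```

The caller never waits on the future, so the success path stays as fast as a plain send while failures still become visible.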

With RabbitMQ, fire-and-forget uses RabbitTemplate.convertAndSend(). Publisher confirms should be enabled to detect broker-level rejections, and publisher returns (the mandatory flag) to detect unroutable messages. Without them, a message can be silently dropped if the exchange does not have a matching queue.

Critical producer configuration for Kafka reliability: acks=all (wait for all in-sync replicas), retries=Integer.MAX_VALUE with idempotent producer enabled, and max.in.flight.requests.per.connection=5 (safe with idempotent producer). This prevents message loss and duplicate production without sacrificing throughput.
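The settings named above can be collected as plain producer properties. This is a sketch: the class and method names are hypothetical, and serializers and any security settings would still need to be added before the properties are passed to a producer or a Spring ProducerFactory.

```java
import java.util.Properties;

final class ReliableProducerConfig {

    /** Producer settings named in the text, keyed by the Kafka client property names. */
    static Properties reliableProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("acks", "all");                 // wait for all in-sync replicas
        props.put("enable.idempotence", "true");  // broker de-duplicates producer retries
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        props.put("max.in.flight.requests.per.connection", "5"); // upper limit with idempotence
        return props;
    }
}
```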

acks=1 is Not Safe for Production

acks=1 (the producer default before Kafka 3.0) acknowledges the message as soon as the partition leader receives it. If the leader fails before replication, the message is lost. Use acks=all (or acks=-1) with min.insync.replicas=2 for production topics that carry business events.

Production Insight

Enable idempotent producer (enable.idempotence=true) — it prevents duplicates on retry with zero throughput cost, and it requires acks=all automatically.

Key Takeaway

Fire-and-forget is only safe with acks=all, idempotent producer enabled, and a send-failure callback for visibility — 'fire' does not mean 'ignore errors'.

thecodeforge.io

Spring Boot Async Messaging

Request-Reply with Correlation IDs

Request-reply over a message broker gives you the asynchronous decoupling benefit while still returning a result to the caller. The pattern works by attaching a correlation ID to the outgoing request message and a reply-to topic or queue. The consumer processes the request and publishes its response to the reply-to destination, including the original correlation ID. The caller matches the incoming response to the outstanding request using the correlation ID.
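The matching step can be sketched as a small registry of pending requests. This is not how ReplyingKafkaTemplate is implemented internally; it is a broker-free illustration in which the BiConsumer is a hypothetical transport that publishes the request together with its correlation ID, and onReply is what a reply listener would call.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

final class CorrelationRegistry {

    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    // Hypothetical transport: (correlationId, payload) -> publish to the request destination.
    private final BiConsumer<String, String> requestPublisher;

    CorrelationRegistry(BiConsumer<String, String> requestPublisher) {
        this.requestPublisher = requestPublisher;
    }

    /** Sends a request and returns a future completed by the matching reply or by a timeout. */
    CompletableFuture<String> request(String payload, long timeoutMillis) {
        String correlationId = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(correlationId, reply);
        // Remove the entry however the future ends so the map cannot leak.
        reply.whenComplete((r, ex) -> pending.remove(correlationId));
        reply.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
        requestPublisher.accept(correlationId, payload);
        return reply;
    }

    /** Called by the reply listener; returns false for unknown or expired correlation IDs. */
    boolean onReply(String correlationId, String replyPayload) {
        CompletableFuture<String> reply = pending.get(correlationId);
        return reply != null && reply.complete(replyPayload);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Callers attach continuations to the returned future instead of blocking on it, and late replies for timed-out requests are simply dropped.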

Spring Kafka provides ReplyingKafkaTemplate, which automates this pattern. It creates a reply consumer, generates correlation IDs, serialises/deserialises request and reply, and blocks (or uses a CompletableFuture) until the response arrives or a timeout expires. It is conceptually clean but carries a production warning: blocking on .get() consumes a thread for the duration of the wait. Under load, this can exhaust thread pools and cause cascading timeouts.

The production-safe approach is to use non-blocking callbacks: replyingTemplate.sendAndReceive(record).whenComplete((reply, ex) -> { ... }). This releases the calling thread immediately and processes the reply on a separate callback thread.

RabbitMQ's RabbitTemplate.convertSendAndReceive() implements the same pattern for AMQP. RabbitMQ is often preferred for request-reply because its exchange/queue model supports per-request reply queues naturally, and its low-latency routing makes round trips shorter.

When to use request-reply over Kafka vs direct HTTP: use async request-reply when the caller can proceed with other work while waiting, or when the downstream service needs to batch requests for efficiency. Use HTTP when you need the response synchronously in the same request context and latency must be minimised.

Never Block on .get() Under Load

ReplyingKafkaTemplate.sendAndReceive().get() blocks a thread for up to the timeout duration. With 200 concurrent requests and a 10s timeout, you need 200 threads minimum. Use .thenApply() callbacks instead, or wrap in a virtual thread (Java 21+) to avoid blocking platform threads.

Production Insight

Set a per-request correlation ID in MDC before sending and restore it in the reply callback — this ties the request and its async reply in your distributed traces.

Key Takeaway

Request-reply over Kafka is powerful but thread-expensive if blocking — always use non-blocking callbacks in production and wrap with a circuit breaker.

Pub-Sub: Event-Driven Fan-Out

Pub-sub (publish-subscribe) is the pattern where a producer publishes an event to a topic or exchange, and any number of independent consumers receive and process their own copy of the event. The producer has no knowledge of its consumers — it simply publishes the fact that something happened. This is the core pattern of event-driven microservices.

In Kafka, pub-sub is implemented with consumer groups. Each consumer group gets its own copy of every message in the topic. Within a consumer group, each partition is processed by exactly one consumer — this provides horizontal scalability up to the number of partitions. To add a new subscriber (e.g., an analytics service wants to know about orders), simply create a new consumer group with a new groupId — no changes to the producer or other consumers required.

In RabbitMQ, pub-sub is implemented with fanout exchanges or topic exchanges. A fanout exchange routes every message to all bound queues. A topic exchange routes based on routing key patterns (orders.# routes all order events). Adding a new subscriber means binding a new queue to the exchange — again, no producer changes required.

The critical operational difference: Kafka retains messages for a configurable period (7 days by default), so a new consumer group can replay historical events. RabbitMQ does not — messages are deleted after acknowledgement, so a new queue only receives messages published after it was bound. This makes Kafka the choice for event sourcing and audit trails; RabbitMQ the choice for transient notifications.

Backpressure in pub-sub: if one consumer group is slow, it accumulates lag independently without affecting other consumer groups (Kafka) or other queues (RabbitMQ). Monitor per-consumer-group lag independently and scale slow consumers separately.

One Consumer Group Per Logical Consumer

Every distinct application that needs to process a Kafka topic must have its own consumer groupId. Sharing a group splits the partitions between applications, so each message goes to only one of them. This is usually a misconfiguration bug: with two applications sharing a group, each one silently misses roughly half of the messages.
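The effect of sharing a group can be shown with a small simulation. It assumes a simplified round-robin partition assignment and evenly loaded partitions; real Kafka assignors may distribute partitions differently, but each partition still has exactly one owner per group. The class and method names are hypothetical.

```java
final class ConsumerGroupSim {

    /**
     * Messages seen by each member of ONE consumer group, assuming a simplified
     * round-robin partition assignment (real Kafka assignors may distribute differently).
     */
    static int[] messagesSeenPerMember(int partitions, int messagesPerPartition, int members) {
        int[] seen = new int[members];
        for (int p = 0; p < partitions; p++) {
            seen[p % members] += messagesPerPartition; // each partition has exactly one owner
        }
        return seen;
    }

    public static void main(String[] args) {
        // Two applications wrongly sharing one group: each sees only part of the topic.
        int[] shared = messagesSeenPerMember(4, 100, 2);
        // Each application in its own group: it is the only member and sees everything.
        int[] ownGroup = messagesSeenPerMember(4, 100, 1);
        System.out.println("shared group: " + shared[0] + " and " + shared[1]);
        System.out.println("own group: " + ownGroup[0]);
    }
}
```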

Production Insight

For fan-out to N services, prefer Kafka topics over RabbitMQ fanout exchanges — Kafka's replay capability means you can add new subscribers retroactively without missing historical events.

Key Takeaway

In Kafka pub-sub, adding a new subscriber is a zero-change operation for the producer — just register a new consumer group with a unique groupId.

Kafka vs RabbitMQ: Choosing the Right Broker

Kafka and RabbitMQ are both battle-tested message brokers, but their architectures suit different use cases. Choosing the wrong one creates friction that compounds over time.

Kafka's core abstraction is the immutable, partitioned, replicated log. Producers append to the end of the log. Consumers read at their own pace and track their position (offset). Messages are retained regardless of consumption — this enables replay, event sourcing, and adding new consumers without data loss. Kafka is the choice when: (1) throughput exceeds 100K messages/second, (2) message ordering within an entity is required, (3) event replay is needed (new service onboarding, disaster recovery, debugging), (4) the use case maps to event streaming (CDC, audit logs, ML feature pipelines).

RabbitMQ's core abstraction is the exchange/queue routing model. Messages are routed from exchanges to queues based on routing keys and bindings. Messages are deleted after consumer acknowledgement. RabbitMQ is the choice when: (1) flexible routing is needed (route order events to different queues based on country, priority, or product category), (2) per-message TTL or priority is needed, (3) request-reply RPC patterns dominate (low-latency round trips), (4) the team is more familiar with AMQP semantics.

Kafka disadvantages: operational complexity (ZooKeeper or KRaft mode, partition rebalancing, offset management), high storage cost if topics are large and retention is long, no per-message TTL or priority, minimum useful cluster size is 3 brokers. RabbitMQ disadvantages: no message replay after acknowledgement, throughput ceiling around 50K messages/second per queue, complex cluster topologies for high availability.

Hybrid architectures are common: use Kafka for domain events and event streaming, use RabbitMQ for task queues and low-latency RPC within a bounded context.

Never Use Auto-Commit in Production Kafka Consumers

enable.auto.commit=true commits offsets on a timer, regardless of whether messages were successfully processed. If the consumer crashes after the commit but before processing, those messages are permanently lost. Always use enable.auto.commit=false and commit manually after successful processing (or rely on Spring's AckMode.RECORD or BATCH).

Production Insight

Start with RabbitMQ if your team is smaller — it is easier to operate and debug. Migrate to Kafka when replay, ordering guarantees, or throughput >50K/s becomes a requirement.

Key Takeaway

Kafka for event streaming, ordering, and replay; RabbitMQ for flexible routing, low-latency RPC, and priority queues — use both in the same system when needed.

Backpressure: Preventing Consumer Overload

Backpressure is the mechanism by which a slow consumer signals to the producer to slow down or by which the system limits the rate of message production to match the rate of consumption. Without backpressure, an unconstrained producer can fill the broker queue until disk or memory is exhausted, causing either message loss (if the broker drops messages) or broker failure.

In Kafka, consumer-side backpressure is controlled by max.poll.records (how many records the consumer receives per poll), the processing time per message, and consumer concurrency. If the consumer takes 100ms per message, one thread processes at most 10 records/second, and a 500-record poll takes about 50 seconds to work through. Keeping up with a producer that publishes at 10K/second at that per-message cost would need around 1,000 consumer threads, which is only possible with at least 1,000 partitions, so in practice you cut the per-message cost (batch the database writes, remove synchronous downstream calls) as well as adding concurrency. Monitor consumer lag as the primary backpressure indicator.
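The capacity arithmetic is worth doing explicitly before choosing a partition count. A minimal sketch, assuming sequential processing on each consumer thread and a constant per-record cost; the class and method names are hypothetical.

```java
final class ConsumerCapacity {

    /** Maximum records per second a single consumer thread can process. */
    static double recordsPerSecondPerThread(double millisPerRecord) {
        return 1000.0 / millisPerRecord;
    }

    /** Seconds needed to work through one poll batch of maxPollRecords. */
    static double secondsPerPoll(int maxPollRecords, double millisPerRecord) {
        return maxPollRecords * millisPerRecord / 1000.0;
    }

    /** Consumer threads needed to keep up with the producer (also the minimum partition count). */
    static long threadsNeeded(double producerRecordsPerSecond, double millisPerRecord) {
        return (long) Math.ceil(producerRecordsPerSecond / recordsPerSecondPerThread(millisPerRecord));
    }
}
```

At 100 ms per record a thread manages 10 records per second, so a 10K/second producer needs 1,000 threads. The time per poll batch also has to stay below max.poll.interval.ms, or the consumer is removed from the group and a rebalance starts.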

In RabbitMQ, backpressure is controlled by prefetchCount (how many unacknowledged messages the broker delivers to a consumer). Setting prefetchCount=1 means the broker waits for an ack before sending the next message — maximum backpressure, minimum throughput. Setting prefetchCount=50 allows the consumer to buffer 50 messages locally — higher throughput but more messages in flight if the consumer crashes.

Producer-side rate limiting with Spring Cloud Gateway or Resilience4j RateLimiter prevents the producer from overwhelming the broker in the first place. For internal services, use a Semaphore or a bounded BlockingQueue at the producer to limit in-flight requests.
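A Semaphore-based limit on in-flight sends can be sketched as below. The Function is a hypothetical stand-in for an asynchronous broker send, and returning false is one possible policy; a caller could equally block, queue, or shed the request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

final class BoundedPublisher {

    private final Semaphore inFlight;
    // Hypothetical stand-in for an async broker send.
    private final Function<String, CompletableFuture<Void>> sender;

    BoundedPublisher(int maxInFlight, Function<String, CompletableFuture<Void>> sender) {
        this.inFlight = new Semaphore(maxInFlight);
        this.sender = sender;
    }

    /** Returns false (caller should back off or shed load) when the in-flight limit is reached. */
    boolean tryPublish(String payload) {
        if (!inFlight.tryAcquire()) {
            return false;
        }
        try {
            // The permit is returned when the broker acknowledges or the send fails.
            sender.apply(payload).whenComplete((ok, ex) -> inFlight.release());
            return true;
        } catch (RuntimeException e) {
            inFlight.release(); // the send never started, so give the permit back
            throw e;
        }
    }
}
```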

WebFlux (Project Reactor) provides reactive backpressure natively: a Flux<Message> pulled from the broker propagates backpressure signals upstream — if the consumer's flatMap is slow, the Flux stops pulling new messages until demand exists. This is the most elegant backpressure solution for high-throughput streaming pipelines.

Monitor Consumer Lag as Your Primary SLA Metric

Define an SLO on consumer lag: e.g., 'p95 lag must be below 500 messages'. Alert at 80% of the threshold. If lag consistently grows, it is a capacity signal — add more consumer instances (up to the partition count) before the queue becomes unmanageable.

Production Insight

Consumer lag is the most important operational metric for async systems — treat sustained lag growth the same way you treat rising HTTP error rates: it is a production incident.

Key Takeaway

Backpressure must be explicit — set concurrency limits, prefetch counts, and poll limits; then monitor consumer lag as the primary health signal.

Choreography vs Orchestration Sagas with Messaging

Sagas implemented via messaging are one of the most important applications of async messaging patterns in microservices. A choreography saga uses domain events to coordinate between services with no central controller. An orchestration saga uses a central saga service that sends commands and waits for responses.

Choreography with Kafka: each service publishes domain events to well-known topics. Other services subscribe and react. The saga 'flow' is encoded in the event subscriptions. Example: OrderService publishes OrderPlaced → InventoryService listens, reserves stock, publishes StockReserved → PaymentService listens, charges card, publishes PaymentProcessed → FulfillmentService listens, ships. Compensation flows in reverse: PaymentFailed → InventoryService listens to PaymentFailed, publishes StockReleased.

Orchestration with Kafka: the saga orchestrator publishes commands to each service's command topic. Each service processes the command and publishes a result event. The orchestrator listens for result events and advances the saga state machine. Spring State Machine or Axon Framework can implement the state machine; Temporal provides durable workflow execution with retries and timeouts built in.
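The orchestrator's state machine can be sketched in plain Java. The event names StockReserved, PaymentProcessed, PaymentFailed and StockReleased come from the flow described above; the state names, the command names and OrderShipped are hypothetical. A real orchestrator would persist the state after every transition and publish the returned command to the target service's command topic.

```java
final class OrderSaga {

    enum State { AWAITING_STOCK, AWAITING_PAYMENT, AWAITING_SHIPMENT, COMPLETED, RELEASING_STOCK, FAILED }

    private final String sagaId; // carried on every command and event as the correlation ID
    private State state = State.AWAITING_STOCK;

    OrderSaga(String sagaId) {
        this.sagaId = sagaId;
    }

    /** Advances the saga on a result event; returns the next command to publish, or null. */
    String onEvent(String eventType) {
        switch (state) {
            case AWAITING_STOCK:
                if (eventType.equals("StockReserved")) { state = State.AWAITING_PAYMENT; return "ChargeCardCommand"; }
                break;
            case AWAITING_PAYMENT:
                if (eventType.equals("PaymentProcessed")) { state = State.AWAITING_SHIPMENT; return "ShipOrderCommand"; }
                if (eventType.equals("PaymentFailed")) { state = State.RELEASING_STOCK; return "ReleaseStockCommand"; }
                break;
            case AWAITING_SHIPMENT:
                if (eventType.equals("OrderShipped")) { state = State.COMPLETED; }
                break;
            case RELEASING_STOCK:
                if (eventType.equals("StockReleased")) { state = State.FAILED; }
                break;
            default:
                break; // terminal states ignore further events
        }
        return null; // duplicates and out-of-place events leave the state unchanged
    }

    State state() { return state; }

    String sagaId() { return sagaId; }
}
```

Ignoring unexpected events keeps the orchestrator safe under at-least-once delivery, where the same result event can arrive twice.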

For messaging-based sagas, correlation ID is essential — every command and event in the saga chain must carry the saga ID so the orchestrator (or any debugging tool) can correlate the entire flow. Include the saga ID in the Kafka message key for partitioned ordering of saga events.

Use Separate Topics for Commands and Events

Commands are directed at a specific service (ReserveStockCommand → inventory service only). Events are facts broadcast to all interested parties (StockReserved → any subscriber). Mixing them in one topic creates routing confusion and makes it impossible to give services independent consumption control. Use a topic naming convention: <service>.commands and <service>.events.

Production Insight

Store saga state in a dedicated database table with the current step, started_at, and updated_at — this is your operational dashboard when a saga gets stuck at 2 AM.

Key Takeaway

Orchestrated sagas provide visibility into complex multi-step flows at the cost of a central orchestrator; choreography provides simplicity at the cost of implicit flow visibility.

JMS Transaction Boundaries: Don't Lose Messages on a Crash

You think async makes you safe? Think again. Without proper transaction management, a message can be consumed and then vanish when your database write fails. That's a data loss incident waiting to happen.

Spring's JMS integration gives you two knobs: sessionTransacted (local JMS transaction) and a JtaTransactionManager (XA two-phase commit). For most services, sessionTransacted is enough — it ties the message acknowledgment to your listener's success. If your listener throws a RuntimeException, the message goes back on the queue.

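But here's the gotcha: unless your database write is also in the same transaction, you can still lose data. If the JMS acknowledgement is committed before your PostgreSQL write and the service crashes between the two commits, your customer's order goes to the void. Commit in the opposite order and the same crash gives you a redelivered message instead, so the listener must tolerate duplicates.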

For serious guarantees, use a JtaTransactionManager and a broker like ActiveMQ Artemis or IBM MQ that supports XA. Your platform team will thank you when you don't wake them at 3 AM.

OrderListener.java (Java)

// io.thecodeforge — java tutorial
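import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

@Component
public class OrderListener {

    // Order, OrderRepository and PaymentService are application types defined elsewhere
    private final OrderRepository orderRepository;
    private final PaymentService paymentService;

    public OrderListener(OrderRepository orderRepository, PaymentService paymentService) {
        this.orderRepository = orderRepository;
        this.paymentService = paymentService;
    }

    @Transactional
    @JmsListener(destination = "orders.new")
    public void onOrder(Order order) {
        // The database write rolls back if anything below throws
        orderRepository.save(order);
        paymentService.charge(order.getAmount());
        // If paymentService throws, the exception reaches the listener container;
        // with a transacted session the JMS message is NOT acknowledged
        // and it gets redelivered automatically
    }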
}

Output

Message stays on queue if listener fails.

No ack until transaction commits.

Production Trap:

Setting sessionTransacted=true but not annotating your listener with @Transactional? You get local JMS rollback, but your DB write is a separate transaction. That's partial rollback — worse than no rollback.

Key Takeaway

sessionTransacted with @Transactional ties the message acknowledgement and the data change to the same listener outcome; strict atomicity across broker and database still needs JTA/XA.

Message Selectors: Stop Wasting CPU on Irrelevant Messages

Every message your consumer ignores is still a message you paid to deserialize, validate, and discard. That's throughput you're burning.

JMS message selectors let you filter messages at the broker level. The broker only delivers messages that match your expression. You consume zero CPU for messages you don't care about.

The selector is a SQL-like expression on the JMS message headers (JMSType, JMSDeliveryMode) and your custom message properties. You cannot filter on the message body; that ship sailed in the JMS 1.1 spec.

When does this matter? When you have a shared queue for multiple event types (e.g., 'order.created', 'order.cancelled') and your listener only processes one type. Instead of a generic listener with a switch statement, use selectors to partition the work. Your team gets smaller, faster consumers that never touch the wrong event.

But careful: selectors add broker-side overhead. If 90% of your messages match, the broker is doing pointless work. Profile before you over-optimize.

FilteredConsumer.java (Java)

// io.thecodeforge — java tutorial
import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;

@Component
public class FilteredConsumer {

    @JmsListener(
        destination = "events",
        selector = "eventType = 'order.created'"
    )
    public void onOrderCreated(String payload) {
        System.out.println("Only sees order.created: " + payload);
    }

    @JmsListener(
        destination = "events",
        selector = "eventType = 'order.cancelled'"
    )
    public void onOrderCancelled(String payload) {
        System.out.println("Only sees order.cancelled: " + payload);
    }
}

Output

Broker delivers only matching messages.

No CPU wasted on deserialization of irrelevant events.

Selector Gotcha:

Selector expressions evaluate null properties as false. If a message lacks the 'eventType' header, it won't match ANY selector. Always set the header on every message.

Key Takeaway

Message selectors move filtering from your app to the broker — only pay for messages you process.

● Production incident · Post-mortem · Severity: high

ReplyingKafkaTemplate Timeout Storm During Kafka Broker Restart

Symptom

Alert at 11:43: order service error rate spiked to 94%. Logs showed thousands of 'Reply timed out after 10000ms' errors from ReplyingKafkaTemplate. All order creation attempts were failing. Payment and inventory services were healthy.

Assumption

The on-call engineer assumed a downstream service was down. They checked payment and inventory service health — both green. They restarted order service pods — no improvement. They did not check Kafka broker status.

Root cause

The Kafka platform team was performing a rolling broker restart for a configuration change. During the partition leader election (typically 5-30 seconds per broker), producers were blocked waiting for leader assignment. All in-flight ReplyingKafkaTemplate requests (synchronously blocking with .get(10, SECONDS)) were blocking threads. When the timeout expired, all threads threw simultaneously, causing a thread pool exhaustion spiral. New requests could not acquire threads and failed immediately.

Fix

Immediate: increased ReplyingKafkaTemplate timeout to 30s; reduced concurrency to prevent thread pool exhaustion; added circuit breaker around all async request-reply calls. Architectural: replaced synchronous ReplyingKafkaTemplate with async CompletableFuture-based request-reply with non-blocking callback; added Kafka broker health to the order service readiness probe.

Key lesson

Request-reply over Kafka is not truly async if you block on the response with .get().
Use non-blocking callbacks.
Add circuit breakers around all broker-dependent operations.
Never assume that because your application is healthy, its dependencies are.

Production debug guide: symptom → root cause → fix (5 entries)

Symptom · 01

Consumer group lag growing — messages accumulating faster than consumed

→

Fix

First, measure lag with kafka-consumer-groups.sh or Confluent Control Center. If lag is growing on all partitions, the consumer is too slow — profile the consumer logic for slow DB queries or downstream HTTP calls. If lag is growing on only some partitions, suspect a poison-pill message (one message causing repeated exceptions that block the partition). Check the consumer logs for repeated ERROR messages on the same offset. Move the blocking message to the DLQ by setting a retry backoff policy and a DLQ topic in the ConcurrentKafkaListenerContainerFactory. Then scale up consumer replicas (up to the number of partitions).
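The retry-then-park behaviour can be sketched without a broker. In Spring Kafka this role is typically played by a DefaultErrorHandler combined with a DeadLetterPublishingRecoverer; the class below is a hypothetical, simplified illustration in which a list stands in for the DLQ topic and the backoff delay between attempts is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class RetryThenDeadLetter {

    private final int maxAttempts;
    private final Consumer<String> handler;
    private final List<String> deadLetters = new ArrayList<>(); // stand-in for a DLQ topic

    RetryThenDeadLetter(int maxAttempts, Consumer<String> handler) {
        this.maxAttempts = maxAttempts;
        this.handler = handler;
    }

    /** Processes one record; after maxAttempts failures it is parked so the partition can advance. */
    void process(String record) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(record);
                return; // success: move on to the next record
            } catch (RuntimeException e) {
                // A real setup would wait (back off) here before the next attempt.
            }
        }
        deadLetters.add(record);
    }

    List<String> deadLetters() {
        return deadLetters;
    }
}
```

The important property is that a poison-pill record is attempted a bounded number of times and then set aside, so records behind it on the same partition are no longer blocked.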

Symptom · 02

ReplyingKafkaTemplate requests timing out sporadically

→

Fix

Check the reply topic consumer group lag: if the reply consumer is behind, responses are arriving but not being matched in time. Verify the replyTopic is correct and the consumer is running. If lag is zero but timeouts persist, the downstream service is responding slowly, so check its latency metrics. Increase the default reply timeout only after measuring the p95 response time of the downstream. Add a Resilience4j TimeLimiter and CircuitBreaker so a slow reply does not block threads indefinitely. Consider replacing blocking .get() with a non-blocking .thenApply() callback.

Symptom · 03

RabbitMQ queue growing — consumers not processing messages

→

Fix

Check consumer count with rabbitmqctl list_consumers. If consumers are zero, the Spring listener container has stopped. Look in the application logs for 'SimpleMessageListenerContainer' errors and for a fatal condition such as a missing queue or an authentication failure; an ordinary exception thrown by the listener normally causes the message to be rejected (and by default requeued) rather than stopping the container. If consumers exist but the queue keeps growing, check prefetchCount: if it is set to 1 (the default before Spring AMQP 2.0; later versions default to 250), throughput is limited to one message per consumer per round trip. Increase prefetchCount to 10-50 for batch workloads. Also check if messages are being NACK'd and re-queued in a tight loop (poison pill).

Symptom · 04

Messages processed out of order despite using Kafka

→

Fix

Kafka guarantees ordering only within a partition. If your topic has more than one partition, messages from different partitions are interleaved when consumed. Verify that all messages for a given entity (e.g., all events for orderId=123) are published to the same partition by using the entity ID as the message key: kafkaTemplate.send(topic, orderId.toString(), event). This ensures all events for one order land in the same partition and are processed in order. Do not use null keys for order-sensitive event streams.
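Why a stable key keeps one entity on one partition can be shown with a simplified partitioner. This is a sketch under stated assumptions: `partitionFor` hashes the key bytes with the JDK's Arrays.hashCode as a stand-in, whereas Kafka's default partitioner uses a murmur2 hash, so the actual partition numbers in a cluster will differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyPartitioning {

    /**
     * Simplified stand-in for a key-based partitioner: hash the key bytes, then
     * take the result modulo the partition count. Not Kafka's real hash function.
     */
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions); // floorMod keeps the result non-negative
    }

    public static void main(String[] args) {
        int partitions = 6;
        int first = partitionFor("order-123", partitions);
        int second = partitionFor("order-123", partitions);
        // The same key always maps to the same partition, so this prints true.
        System.out.println(first == second);
    }
}
```

The mapping depends on the partition count, so adding partitions to a topic can move a key to a different partition and break ordering for events already in flight.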

Symptom · 05

Dead-letter queue filling up — messages repeatedly failing

→

Fix

Inspect the DLQ messages — look at the x-death headers (RabbitMQ) or the original offset (Kafka) to understand which messages are failing. Reproduce the failure locally with a sample message. Common causes: deserialization error (schema changed, use versioned Avro), NPE in business logic (new field added to event, consumer not null-safe), downstream service timeout (consumer making synchronous HTTP call that times out). Fix the consumer bug, then replay DLQ messages after verification.

★ Debug Cheat Sheet

Immediate diagnostic commands for async messaging incidents

Kafka consumer group lagging

Immediate action

Measure lag per partition and identify if all or some partitions are affected

Commands

kafka-consumer-groups.sh --bootstrap-server kafka:9092 --describe --group my-consumer-group

kubectl logs -l app=my-consumer --since=5m | grep -c 'ERROR'

Fix now

If lag on all partitions: scale consumer replicas. If lag on one partition: find and DLQ the poison-pill message blocking that partition.

Kafka vs RabbitMQ Feature Comparison

Feature	Apache Kafka	RabbitMQ
Max throughput	Millions/sec (partitioned)	~50K/sec per queue
Message retention	Time or size-based (days/weeks)	Until acknowledged (no replay)
Ordering	Per-partition guarantee	Per-queue guarantee
Routing	Topic + partition key only	Exchange/routing key (powerful)
Request-reply RPC	Possible (ReplyingKafkaTemplate)	Native (convertSendAndReceive)
Message priority	Not supported	Per-queue priority (0-255)
Per-message TTL	Not supported	Supported
Event replay	Yes (offset rewind)	No
Minimum cluster	3 brokers (production)	1 node (quorum queues across 3+ nodes for HA)
Spring Boot support	Spring Kafka	Spring AMQP
Best for	Event streaming, audit logs, CDC	Task queues, RPC, flexible routing

Key takeaways

Fire-and-forget is only safe with acks=all, an idempotent producer, and send-failure logging; silent drops are a production risk.

Never block on ReplyingKafkaTemplate.get() under load; use non-blocking CompletableFuture callbacks and add circuit breakers.

Each logically distinct Kafka consumer must have a unique consumer groupId; sharing one splits partitions and causes silent message loss.

Consumer lag is the primary health metric for async systems; alert before it becomes unmanageable.

Kafka for event streaming and replay; RabbitMQ for flexible routing and low-latency RPC. Both can coexist in one system.

Poison-pill messages must be detected and routed to a DLT; without a DLT, one bad message blocks an entire partition indefinitely.

Common mistakes to avoid

6 patterns

Using enable.auto.commit=true in Kafka consumers

Symptom

Messages lost after consumer crash — offsets committed before processing completes

Fix

Set spring.kafka.consumer.enable-auto-commit: false and use Spring's AckMode.RECORD so the offset is committed only after each record has been processed successfully
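The effect of commit ordering can be illustrated with a small simulation. This is a simplified model, not Kafka client code: the class `CommitOrdering`, the in-memory log, and the crash point are hypothetical, and real auto-commit runs on a timer rather than once per record.

```java
import java.util.ArrayList;
import java.util.List;

class CommitOrdering {

    /**
     * Simulates one consumer that crashes while handling the record at crashIndex,
     * then restarts from the last committed offset. Returns every record that was
     * fully processed across both runs.
     */
    static List<String> run(List<String> log, int crashIndex, boolean commitBeforeProcessing) {
        List<String> processed = new ArrayList<>();
        int committed = 0; // next offset to read after a restart

        // First run: stops at crashIndex without finishing that record.
        for (int offset = 0; offset < log.size(); offset++) {
            if (commitBeforeProcessing) {
                committed = offset + 1;
            }
            if (offset == crashIndex) {
                break; // crash before processing completes
            }
            processed.add(log.get(offset));
            if (!commitBeforeProcessing) {
                committed = offset + 1;
            }
        }
        // Second run: resume from the committed offset.
        for (int offset = committed; offset < log.size(); offset++) {
            processed.add(log.get(offset));
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "c", "d");
        System.out.println(run(log, 2, true));  // prints [a, b, d]: "c" is lost
        System.out.println(run(log, 2, false)); // prints [a, b, c, d]: "c" is redelivered
    }
}
```

Committing after processing gives at-least-once delivery, so the handler should be idempotent in case a record is redelivered after a crash.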

Blocking on ReplyingKafkaTemplate.get() in a web request thread

Symptom

Thread pool exhaustion under load — all threads blocked waiting for replies; new requests rejected with 503

Fix

Use .thenApply() non-blocking callback or run in a virtual thread (Java 21+); add a Resilience4j TimeLimiter around the reply future

Sharing a Kafka consumer groupId between two different services

Symptom

Each service processes only ~50% of messages (Kafka splits partitions between group members)

Fix

Each logically distinct consumer must have a unique groupId — use the service name as the groupId
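The partition split can be illustrated with a toy assignment function. This sketch assumes a plain round-robin rule and hypothetical service names; Kafka's real assignors are more involved, but the outcome for a shared groupId is the same in kind: each member sees only part of the topic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupAssignment {

    /** Toy round-robin assignment of partitions to the members of ONE consumer group. */
    static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two different services sharing one groupId: each sees half of the topic.
        System.out.println(assign(List.of("orders", "billing"), 6));
        // prints {orders=[0, 2, 4], billing=[1, 3, 5]}

        // Separate groupIds: each service forms its own group and owns every partition.
        System.out.println(assign(List.of("orders"), 6));  // prints {orders=[0, 1, 2, 3, 4, 5]}
        System.out.println(assign(List.of("billing"), 6)); // prints {billing=[0, 1, 2, 3, 4, 5]}
    }
}
```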

Not setting a dead-letter queue for failed messages

Symptom

Poison-pill messages cause the consumer to retry indefinitely, blocking all subsequent messages on that partition

Fix

Configure DefaultErrorHandler with DeadLetterPublishingRecoverer and a FixedBackOff(interval, maxAttempts); monitor DLQ size as an alert
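The behaviour this configuration aims for, bounded retries followed by dead-lettering, can be sketched without any Spring classes. The class `BoundedRetry` and its in-memory dead-letter list are hypothetical stand-ins, and the back-off wait is omitted; this is not the DefaultErrorHandler implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BoundedRetry {

    /**
     * Processes records in order. A record whose handler fails maxAttempts times
     * is added to deadLetters so the records behind it are not blocked.
     */
    static List<String> consume(List<String> records, Consumer<String> handler,
                                int maxAttempts, List<String> deadLetters) {
        List<String> succeeded = new ArrayList<>();
        for (String record : records) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    handler.accept(record);
                    done = true;
                } catch (RuntimeException ex) {
                    // A real error handler would wait for the back-off interval here.
                }
            }
            if (done) {
                succeeded.add(record);
            } else {
                deadLetters.add(record);
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        List<String> deadLetters = new ArrayList<>();
        List<String> ok = consume(List.of("ok-1", "poison", "ok-2"), record -> {
            if (record.equals("poison")) {
                throw new IllegalStateException("cannot process " + record);
            }
        }, 3, deadLetters);
        System.out.println(ok);          // prints [ok-1, ok-2]
        System.out.println(deadLetters); // prints [poison]
    }
}
```

Without the attempt limit, the inner loop would never leave the poison record, which is the blocked-partition symptom described above.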

Publishing events with null keys in Kafka when ordering matters

Symptom

Events for the same entity arrive out of order because they land on random partitions

Fix

Always use the entity ID (orderId, customerId) as the Kafka message key — this guarantees all events for one entity go to the same partition

Not enabling publisher confirms in RabbitMQ

Symptom

Messages silently dropped when exchange has no bound queue — producer receives no error

Fix

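Set spring.rabbitmq.publisher-confirm-type: correlated, spring.rabbitmq.publisher-returns: true, and spring.rabbitmq.template.mandatory: true in application.yml; add a ConfirmCallback and a ReturnsCallback to the RabbitTemplate to log failures. Unroutable messages are reported through the returns callback, not the confirm.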

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR

What is the difference between fire-and-forget, request-reply, and pub-s...

Q02 · JUNIOR

Why should you never use enable.auto.commit=true in a production Kafka c...

Q03 · SENIOR

When would you choose Kafka over RabbitMQ?

Q04 · SENIOR

How does backpressure work in Kafka consumers?

Q05 · SENIOR

Explain the correlation ID pattern in request-reply messaging.

Q06 · SENIOR

What is a poison-pill message and how do you handle it?

Q07 · SENIOR

How do you implement exactly-once processing in Kafka?

Q08 · SENIOR

How would you design a request-reply pattern over Kafka that does not bl...

Q09 · SENIOR

How do you handle schema evolution in Kafka message payloads?

Q01 of 09 · JUNIOR

What is the difference between fire-and-forget, request-reply, and pub-sub messaging patterns?

ANSWER

Fire-and-forget: producer sends a message and does not wait for a response. Used for audit events, emails, analytics. Request-reply: producer sends a message with a correlation ID and reply-to address, then waits (blocking or async) for a response from the consumer. Used for async RPC where the result is needed. Pub-sub: producer publishes to a topic; multiple independent consumers each receive the message in their own consumer group. Used for event-driven fan-out where multiple services react to the same event.
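The correlation ID mechanism behind request-reply can be sketched with a map of pending futures. This is a minimal illustration under stated assumptions: `CorrelationRegistry` and its method names are hypothetical, and the broker is left out, so `onReply` is called directly where a reply listener would normally call it.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class CorrelationRegistry {

    // Pending requests, keyed by the correlation ID sent with each request.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Called by the requester before sending; the returned future completes when the reply arrives. */
    CompletableFuture<String> register(String correlationId) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(correlationId, reply);
        return reply;
    }

    /** Called by the reply listener; returns false when no request is waiting for this ID. */
    boolean onReply(String correlationId, String body) {
        CompletableFuture<String> reply = pending.remove(correlationId);
        if (reply == null) {
            return false; // unknown ID, duplicate reply, or request already abandoned
        }
        reply.complete(body);
        return true;
    }

    public static void main(String[] args) {
        CorrelationRegistry registry = new CorrelationRegistry();
        CompletableFuture<String> reply = registry.register("req-1");
        reply.thenAccept(body -> System.out.println("reply for req-1: " + body));

        registry.onReply("req-1", "approved"); // prints reply for req-1: approved
        System.out.println(registry.onReply("req-1", "approved again")); // prints false
    }
}
```

A production version would also remove entries when a request times out, otherwise abandoned requests accumulate in the map.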

Naren Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Notes here come from systems that actually shipped.

Last updated: July 04, 2026
