Senior 10 min · May 23, 2026

Spring Boot Async Messaging Patterns: Kafka, RabbitMQ & Beyond

Master async messaging in Spring Boot: request-reply with correlation IDs, fire-and-forget, pub-sub, Kafka vs RabbitMQ trade-offs, and backpressure handling.

Naren · Founder
Quick Answer
  • Fire-and-forget sends a message and moves on — no response expected, highest throughput
  • Request-reply adds a correlation ID and a reply topic/queue so the sender can match the response
  • Pub-sub decouples publishers from subscribers — multiple consumers receive the same event
  • Kafka excels at ordered, high-throughput, replayable event streams; RabbitMQ excels at flexible routing and low-latency RPC
  • Backpressure must be handled explicitly: bounded queues, consumer rate limiting, and dead-letter queues
Definition
What is Spring Boot Async Messaging Patterns?

Asynchronous messaging is a communication pattern where the sender deposits a message into an intermediate broker and immediately continues execution, without waiting for the receiver to process the message or return a response. The broker durably stores the message until a consumer is ready to process it, providing temporal decoupling (sender and receiver need not be available simultaneously) and load levelling (bursts of messages are absorbed by the broker rather than overwhelming the consumer).

In Spring Boot, the two dominant brokers are Apache Kafka and RabbitMQ. Kafka models messages as an immutable, ordered, partitioned log — messages are retained for a configurable period (e.g., 7 days) and consumers track their position (offset) in the log, enabling replay.

RabbitMQ models messages as work items in queues — messages are typically deleted after acknowledgement, making replay impossible but making per-message routing, prioritisation, and TTL easy to configure.

Spring's messaging abstractions (@KafkaListener, @RabbitListener, @SendTo, ReplyingKafkaTemplate) reduce boilerplate significantly, but production-grade messaging requires understanding the underlying broker semantics: delivery guarantees (at-most-once, at-least-once, exactly-once), ordering guarantees (per-partition in Kafka, per-queue in RabbitMQ), and failure handling (DLQ, retry policies, poison pill detection).

Plain-English First

Async messaging is like dropping a letter in a post box instead of making a phone call. Fire-and-forget is a flyer in the letterbox — you walk away immediately. Request-reply is a registered letter with a return address — you wait for the postman to bring an answer. Pub-sub is a newspaper — one publisher, many readers, each getting their own copy on their own schedule.

Synchronous HTTP calls are the instinctive choice when building microservices APIs. They are easy to reason about, easy to test, and return results immediately. But in production, synchronous calls become fragility vectors: a slow downstream service holds a thread, a timeout cascade can take down an entire call chain, and a service restart loses all in-flight requests. Async messaging inverts this dynamic — the caller deposits work into a broker and continues processing immediately, decoupled from the receiver's availability.

Spring Boot's messaging ecosystem is mature and rich. Spring Kafka wraps the Kafka client with @KafkaListener, KafkaTemplate, and deep integration with Spring transactions and the application context lifecycle. Spring AMQP provides the same for RabbitMQ with additional routing primitives that Kafka's topic model does not offer. Spring Cloud Stream adds an abstraction layer that lets you switch brokers by swapping the binder dependency and its configuration.

The choice of pattern — fire-and-forget, request-reply, or pub-sub — shapes the entire architecture of a feature. Fire-and-forget is appropriate when the caller does not need a result: sending a welcome email, logging an audit event, triggering a background report. Request-reply is appropriate when the caller does need a result but can tolerate higher latency than a synchronous call: an order enrichment service that calls a pricing service asynchronously while computing other fields in parallel. Pub-sub is appropriate when multiple downstream systems need to react to the same event: an OrderPlaced event consumed by inventory, billing, analytics, and fulfilment.

The choice of broker shapes throughput, ordering guarantees, and routing capabilities. Kafka's partitioned log is the choice for high-throughput event streaming (millions of events per second), ordered processing within a partition, and event replay (new consumers can read from the beginning of a topic). RabbitMQ's exchange/queue model is the choice for flexible routing (topic exchanges, fanout exchanges, header routing), low-latency RPC patterns, and priority queues. Both are mature, production-proven, and have excellent Spring Boot support.

Backpressure is the often-overlooked consequence of async messaging. If producers publish faster than consumers can process, the broker queue grows without bound until it exhausts memory or disk. Explicit backpressure mechanisms — bounded queues, consumer concurrency limits, rate limiting, and dead-letter queues — are essential production requirements, not optional enhancements.

Fire-and-Forget: Maximising Throughput

Fire-and-forget is the simplest and highest-throughput async pattern. The producer publishes a message to the broker and immediately continues execution without waiting for processing confirmation beyond the broker's acknowledgement (acks=all for Kafka). It is appropriate when the business operation is complete once the event is durably recorded, regardless of downstream processing.

Common use cases: audit logging, welcome emails, analytics events, cache invalidation signals, search index updates. The key property these share is that the producer does not need to know whether or when the consumer processes the event, and a processing failure in the consumer can be handled independently (DLQ + alerting) without affecting the producer's success path.

In Spring Boot with Kafka, fire-and-forget uses KafkaTemplate.send(), which returns a CompletableFuture in Spring Kafka 3.x (a ListenableFuture in 2.x). For true fire-and-forget you can ignore the future, but then a failed send (for example, retries exhausted during a broker outage) goes unnoticed. For production resilience, add a callback to log send failures: future.whenComplete((result, ex) -> { if (ex != null) log.error(...); }). This does not block the producer but gives visibility into broker-level failures.
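The callback idea can be tried without a broker. The sketch below is only an illustration: `fakeSend` is a hypothetical stand-in for a send call that returns a CompletableFuture, and `failedSends` is an assumed counter you might export as a metric. The point it shows is that the caller never calls get() or join(), yet a failed send is still recorded.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Broker-free sketch: fakeSend is a hypothetical stand-in for a real send call. */
final class FireAndForgetSketch {

    // Hypothetical counter for failed sends (something you could alert on).
    static final AtomicInteger failedSends = new AtomicInteger();

    // Stand-in for a broker send: fails for blank payloads, succeeds otherwise.
    static CompletableFuture<String> fakeSend(String topic, String payload) {
        if (payload == null || payload.isBlank()) {
            return CompletableFuture.failedFuture(
                    new IllegalStateException("broker rejected record for " + topic));
        }
        return CompletableFuture.completedFuture(topic + " accepted");
    }

    // Fire-and-forget: attach a callback and return; never call get() or join().
    static void publish(String topic, String payload) {
        fakeSend(topic, payload).whenComplete((result, ex) -> {
            if (ex != null) {
                failedSends.incrementAndGet();
                System.err.println("send failed: " + ex.getMessage());
            }
        });
    }

    public static void main(String[] args) {
        publish("audit-events", "user-42 logged in");
        publish("audit-events", "");
        System.out.println("failed sends: " + failedSends.get());
    }
}
```

With a real template the shape is the same: the future comes from the send call and the callback body decides what to log or count.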

With RabbitMQ, fire-and-forget uses RabbitTemplate.convertAndSend(). Enable publisher confirms to detect messages the broker did not accept, and publisher returns (the mandatory flag) to detect unroutable messages. Without returns, a message published to an exchange with no matching queue binding is silently dropped.

Critical producer configuration for Kafka reliability: acks=all (wait for all in-sync replicas), retries=Integer.MAX_VALUE with idempotent producer enabled, and max.in.flight.requests.per.connection=5 (safe with idempotent producer). This prevents message loss and duplicate production without sacrificing throughput.
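As a rough illustration, the settings named above can be collected into a plain java.util.Properties object of the kind a Kafka producer accepts. The property keys are the standard Kafka producer configuration names; the class name and the `kafka:9092` address are placeholders chosen for this sketch.

```java
import java.util.Properties;

/** Collects the reliability-related producer settings discussed above. */
final class ReliableProducerProps {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Wait for all in-sync replicas before a send counts as successful
        props.put("acks", "all");
        // Deduplicate retried sends on the broker side
        props.put("enable.idempotence", "true");
        // Keep retrying transient failures
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        // Up to 5 in-flight requests keeps ordering with idempotence enabled
        props.put("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("kafka:9092"); // placeholder address
        System.out.println("acks=" + props.getProperty("acks"));
        System.out.println("enable.idempotence=" + props.getProperty("enable.idempotence"));
    }
}
```

In a Spring Boot application the same keys are usually set under spring.kafka.producer in application configuration rather than built by hand.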

acks=1 is Not Safe for Production
Kafka clients before 3.0 default to acks=1, which acknowledges the message as soon as the partition leader receives it (3.0 and later default to acks=all). If the leader fails before replication, the message is lost. Use acks=all (or acks=-1) with min.insync.replicas=2 for production topics that carry business events.
Production Insight
Enable idempotent producer (enable.idempotence=true): it prevents duplicates caused by producer retries at negligible throughput cost, and it requires acks=all.
Key Takeaway
Fire-and-forget is only safe with acks=all, idempotent producer enabled, and a send-failure callback for visibility — 'fire' does not mean 'ignore errors'.

Request-Reply with Correlation IDs

Request-reply over a message broker gives you the asynchronous decoupling benefit while still returning a result to the caller. The pattern works by attaching a correlation ID to the outgoing request message and a reply-to topic or queue. The consumer processes the request and publishes its response to the reply-to destination, including the original correlation ID. The caller matches the incoming response to the outstanding request using the correlation ID.

Spring Kafka provides ReplyingKafkaTemplate, which automates this pattern. It creates a reply consumer, generates correlation IDs, serialises/deserialises request and reply, and blocks (or uses a CompletableFuture) until the response arrives or a timeout expires. It is conceptually clean but carries a production warning: blocking on .get() consumes a thread for the duration of the wait. Under load, this can exhaust thread pools and cause cascading timeouts.

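The production-safe approach is to use non-blocking callbacks: replyingTemplate.sendAndReceive(record).whenComplete((reply, ex) -> { ... }). This releases the calling thread immediately. The callback runs later on the thread that completes the future (normally the reply listener container's consumer thread), so keep it short or hand heavy work to an executor.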
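The matching step at the heart of request-reply can be sketched with the JDK alone. This is not how ReplyingKafkaTemplate is implemented internally; it is a minimal illustration of the idea, and `CorrelationRegistry`, `register`, and `onReply` are names invented for the example. Each outstanding request is a CompletableFuture stored under its correlation ID, completed when a reply with that ID arrives, and failed by a timeout otherwise.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Matches replies to outstanding requests by correlation ID (illustrative only). */
final class CorrelationRegistry {

    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Registers an outstanding request and returns the correlation ID to send with it. */
    String register(CompletableFuture<String> replyFuture, long timeoutMs) {
        String correlationId = UUID.randomUUID().toString();
        pending.put(correlationId, replyFuture);
        // Fail the future if no reply arrives in time; forget the entry either way
        replyFuture.orTimeout(timeoutMs, TimeUnit.MILLISECONDS)
                   .whenComplete((reply, ex) -> pending.remove(correlationId));
        return correlationId;
    }

    /** Called by the reply listener; returns false for unknown or expired IDs. */
    boolean onReply(String correlationId, String payload) {
        CompletableFuture<String> future = pending.remove(correlationId);
        return future != null && future.complete(payload);
    }

    int outstanding() {
        return pending.size();
    }

    public static void main(String[] args) {
        CorrelationRegistry registry = new CorrelationRegistry();
        CompletableFuture<String> reply = new CompletableFuture<>();
        // Non-blocking: the callback runs when the reply is matched
        reply.thenAccept(body -> System.out.println("reply received: " + body));
        String correlationId = registry.register(reply, 5_000);
        registry.onReply(correlationId, "price=42.00");
    }
}
```

The caller never blocks: it registers the future, sends the request with the returned ID, and reacts in the callback.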

RabbitMQ's RabbitTemplate.convertSendAndReceive() implements the same pattern for AMQP. RabbitMQ is often preferred for request-reply because its exchange/queue model supports per-request reply queues naturally, and its low-latency routing makes round trips shorter.

When to use request-reply over Kafka vs direct HTTP: use async request-reply when the caller can proceed with other work while waiting, or when the downstream service needs to batch requests for efficiency. Use HTTP when you need the response synchronously in the same request context and latency must be minimised.

Never Block on .get() Under Load
ReplyingKafkaTemplate.sendAndReceive().get() blocks a thread for up to the timeout duration. With 200 concurrent requests and a 10s timeout, you need 200 threads minimum. Use .thenApply() callbacks instead, or wrap in a virtual thread (Java 21+) to avoid blocking platform threads.
Production Insight
Set a per-request correlation ID in MDC before sending and restore it in the reply callback — this ties the request and its async reply in your distributed traces.
Key Takeaway
Request-reply over Kafka is powerful but thread-expensive if blocking — always use non-blocking callbacks in production and wrap with a circuit breaker.

Pub-Sub: Event-Driven Fan-Out

Pub-sub (publish-subscribe) is the pattern where a producer publishes an event to a topic or exchange, and any number of independent consumers receive and process their own copy of the event. The producer has no knowledge of its consumers — it simply publishes the fact that something happened. This is the core pattern of event-driven microservices.

In Kafka, pub-sub is implemented with consumer groups. Each consumer group gets its own copy of every message in the topic. Within a consumer group, each partition is processed by exactly one consumer — this provides horizontal scalability up to the number of partitions. To add a new subscriber (e.g., an analytics service wants to know about orders), simply create a new consumer group with a new groupId — no changes to the producer or other consumers required.
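To make the group semantics concrete, here is a deliberately simplified model of partition assignment. Real Kafka assignors (range, cooperative sticky, and so on) distribute partitions differently in detail; the round-robin rule and the class name below are assumptions for illustration. What carries over is the invariant: within one group, each partition has exactly one owner.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model: spreads partitions over the members of ONE consumer group. */
final class GroupAssignmentSketch {

    static Map<String, List<Integer>> assign(List<String> members, int partitions) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String member : members) {
            owned.put(member, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            // Each partition gets exactly one owner inside the group
            owned.get(members.get(p % members.size())).add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        // Two services wrongly sharing one groupId: each sees half the partitions
        System.out.println(assign(List.of("billing", "analytics"), 6));
        // Each service in its own group: each sees every partition
        System.out.println(assign(List.of("billing"), 6));
        System.out.println(assign(List.of("analytics"), 6));
    }
}
```

The first call models a shared group, the other two model separate groups, which is the configuration pub-sub needs.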

In RabbitMQ, pub-sub is implemented with fanout exchanges or topic exchanges. A fanout exchange routes every message to all bound queues. A topic exchange routes based on routing key patterns (orders.# routes all order events). Adding a new subscriber means binding a new queue to the exchange — again, no producer changes required.
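Topic-exchange patterns follow simple rules: routing keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more words. The matcher below is a small re-implementation of those rules for experimentation, not RabbitMQ's own code; the class and method names are made up for this sketch.

```java
/** Illustrative matcher for AMQP topic-exchange binding patterns. */
final class TopicPatternMatcher {

    static boolean matches(String pattern, String routingKey) {
        return match(pattern.split("\\."), 0, routingKey.split("\\."), 0);
    }

    private static boolean match(String[] pattern, int i, String[] key, int j) {
        if (i == pattern.length) {
            return j == key.length;            // both exhausted: match
        }
        if (pattern[i].equals("#")) {
            // '#' may absorb zero or more words
            for (int next = j; next <= key.length; next++) {
                if (match(pattern, i + 1, key, next)) {
                    return true;
                }
            }
            return false;
        }
        if (j == key.length) {
            return false;                      // pattern needs a word, key has none left
        }
        boolean wordMatches = pattern[i].equals("*") || pattern[i].equals(key[j]);
        return wordMatches && match(pattern, i + 1, key, j + 1);
    }

    public static void main(String[] args) {
        System.out.println(matches("orders.#", "orders.created.eu"));
        System.out.println(matches("orders.*", "orders.created.eu"));
    }
}
```

A binding of orders.# therefore catches every order event, while orders.* only catches keys with exactly one word after orders.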

The critical operational difference: Kafka retains messages for a configurable period (7 days by default), so a new consumer group can replay historical events. RabbitMQ does not — messages are deleted after acknowledgement, so a new queue only receives messages published after it was bound. This makes Kafka the choice for event sourcing and audit trails; RabbitMQ the choice for transient notifications.

Backpressure in pub-sub: if one consumer group is slow, it accumulates lag independently without affecting other consumer groups (Kafka) or other queues (RabbitMQ). Monitor per-consumer-group lag independently and scale slow consumers separately.

One Consumer Group Per Logical Consumer
Every distinct application that needs to process a Kafka topic must have its own consumer groupId. Sharing a group splits the partitions between applications, so each message goes to only one of them. This is usually a misconfiguration bug: with two applications in one group, each sees only about half of the messages, and nothing in the logs flags the other half as missing.
Production Insight
For fan-out to N services, prefer Kafka topics over RabbitMQ fanout exchanges — Kafka's replay capability means you can add new subscribers retroactively without missing historical events.
Key Takeaway
In Kafka pub-sub, adding a new subscriber is a zero-change operation for the producer — just register a new consumer group with a unique groupId.

Kafka vs RabbitMQ: Choosing the Right Broker

Kafka and RabbitMQ are both battle-tested message brokers, but their architectures suit different use cases. Choosing the wrong one creates friction that compounds over time.

Kafka's core abstraction is the immutable, partitioned, replicated log. Producers append to the end of the log. Consumers read at their own pace and track their position (offset). Messages are retained regardless of consumption — this enables replay, event sourcing, and adding new consumers without data loss. Kafka is the choice when: (1) throughput exceeds 100K messages/second, (2) message ordering within an entity is required, (3) event replay is needed (new service onboarding, disaster recovery, debugging), (4) the use case maps to event streaming (CDC, audit logs, ML feature pipelines).

RabbitMQ's core abstraction is the exchange/queue routing model. Messages are routed from exchanges to queues based on routing keys and bindings. Messages are deleted after consumer acknowledgement. RabbitMQ is the choice when: (1) flexible routing is needed (route order events to different queues based on country, priority, or product category), (2) per-message TTL or priority is needed, (3) request-reply RPC patterns dominate (low-latency round trips), (4) the team is more familiar with AMQP semantics.

Kafka disadvantages: operational complexity (ZooKeeper or KRaft mode, partition rebalancing, offset management), high storage cost if topics are large and retention is long, no per-message TTL or priority, minimum useful cluster size is 3 brokers. RabbitMQ disadvantages: no message replay after acknowledgement, throughput ceiling around 50K messages/second per queue, complex cluster topologies for high availability.

Hybrid architectures are common: use Kafka for domain events and event streaming, use RabbitMQ for task queues and low-latency RPC within a bounded context.

Never Use Auto-Commit in Production Kafka Consumers
enable.auto.commit=true commits offsets on a timer, regardless of whether messages were successfully processed. If the consumer crashes after the commit but before processing, those messages are permanently lost. Always use enable.auto.commit=false and commit manually after successful processing (or rely on Spring's AckMode.RECORD or BATCH).
Production Insight
Start with RabbitMQ if your team is smaller — it is easier to operate and debug. Migrate to Kafka when replay, ordering guarantees, or throughput >50K/s becomes a requirement.
Key Takeaway
Kafka for event streaming, ordering, and replay; RabbitMQ for flexible routing, low-latency RPC, and priority queues — use both in the same system when needed.

Backpressure: Preventing Consumer Overload

Backpressure is the mechanism by which a slow consumer signals to the producer to slow down or by which the system limits the rate of message production to match the rate of consumption. Without backpressure, an unconstrained producer can fill the broker queue until disk or memory is exhausted, causing either message loss (if the broker drops messages) or broker failure.

In Kafka, consumer-side backpressure is controlled by max.poll.records (how many records each poll() returns), the processing time per batch, and consumer concurrency. If the consumer fetches 500 records per poll and takes 100 ms to process that batch, it can process 5,000 records/second per thread. If the producer publishes at 10K/second, you need at least 2 consumer threads (concurrency is capped by the number of partitions). Monitor consumer lag as the primary backpressure indicator.
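The sizing arithmetic is easy to get wrong, so it helps to write it down. The sketch below assumes the 100 ms figure is the time to process one polled batch of 500 records; it also shows how different the answer is if 100 ms is the cost of a single record. The class and method names are illustrative.

```java
/** Back-of-the-envelope consumer sizing. */
final class ConsumerSizing {

    /** Records per second one consumer thread can process. */
    static double perThreadRate(int recordsPerBatch, double batchMillis) {
        return recordsPerBatch * 1000.0 / batchMillis;
    }

    /** Minimum number of consumer threads needed to keep up with the producer. */
    static int threadsNeeded(double producerRatePerSecond, double perThreadRate) {
        return (int) Math.ceil(producerRatePerSecond / perThreadRate);
    }

    public static void main(String[] args) {
        double batchRate = perThreadRate(500, 100);   // 500 records per 100 ms batch
        System.out.println(batchRate + " records/s per thread");
        System.out.println(threadsNeeded(10_000, batchRate) + " threads for 10K/s");

        double slowRate = perThreadRate(1, 100);      // 100 ms per single record
        System.out.println(threadsNeeded(10_000, slowRate) + " threads if each record takes 100 ms");
    }
}
```

Under the per-batch assumption two threads suffice; under the per-record assumption you would need 1,000 threads and as many partitions, which usually means the processing itself has to get faster.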

In RabbitMQ, backpressure is controlled by prefetchCount (how many unacknowledged messages the broker delivers to a consumer). Setting prefetchCount=1 means the broker waits for an ack before sending the next message — maximum backpressure, minimum throughput. Setting prefetchCount=50 allows the consumer to buffer 50 messages locally — higher throughput but more messages in flight if the consumer crashes.

Producer-side rate limiting with Spring Cloud Gateway or Resilience4j RateLimiter prevents the producer from overwhelming the broker in the first place. For internal services, use a Semaphore or a bounded BlockingQueue at the producer to limit in-flight requests.
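The Semaphore approach might look like the following sketch. `BoundedSender` and `trySend` are hypothetical names, and the supplier stands in for whatever asynchronous send call the producer uses. A permit is taken before each send and returned when the send's future completes, so at most maxInFlight sends are outstanding at once.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Caps the number of in-flight sends at the producer (illustrative). */
final class BoundedSender {

    private final Semaphore inFlight;

    BoundedSender(int maxInFlight) {
        this.inFlight = new Semaphore(maxInFlight);
    }

    /** Returns empty when no slot is free, so the caller can shed or delay work. */
    <T> Optional<CompletableFuture<T>> trySend(Supplier<CompletableFuture<T>> send) {
        if (!inFlight.tryAcquire()) {
            return Optional.empty();
        }
        try {
            // Release the slot when the send finishes, successfully or not
            return Optional.of(send.get().whenComplete((result, ex) -> inFlight.release()));
        } catch (RuntimeException e) {
            inFlight.release();   // the send never started
            throw e;
        }
    }

    int freeSlots() {
        return inFlight.availablePermits();
    }

    public static void main(String[] args) {
        BoundedSender sender = new BoundedSender(2);
        CompletableFuture<String> first = new CompletableFuture<>();
        sender.trySend(() -> first);
        sender.trySend(() -> new CompletableFuture<String>());
        boolean thirdAccepted = sender.trySend(() -> new CompletableFuture<String>()).isPresent();
        System.out.println("third accepted: " + thirdAccepted);
        first.complete("ack");
        System.out.println("free slots: " + sender.freeSlots());
    }
}
```

Returning an empty Optional rather than blocking leaves the policy (reject, queue, or retry later) to the caller.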

WebFlux (Project Reactor) provides reactive backpressure natively: a Flux<Message> pulled from the broker propagates backpressure signals upstream — if the consumer's flatMap is slow, the Flux stops pulling new messages until demand exists. This is the most elegant backpressure solution for high-throughput streaming pipelines.
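Reactor itself is a third-party library, but the demand-driven model it implements (Reactive Streams) also ships in the JDK as java.util.concurrent.Flow. The sketch below uses that JDK API as a stand-in to show the mechanism: the subscriber requests one item at a time, and the publisher's bounded buffer (size 4 here, an arbitrary choice) makes submit() block when the subscriber falls behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

final class FlowBackpressureSketch {

    /** Requests one item at a time, so nothing new arrives until the last item is handled. */
    static final class OneAtATimeSubscriber implements Flow.Subscriber<String> {
        final List<String> processed = new ArrayList<>();
        final CountDownLatch finished = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(1);           // initial demand
        }
        @Override public void onNext(String item) {
            processed.add(item);               // "process" the message
            subscription.request(1);           // ask for exactly one more
        }
        @Override public void onError(Throwable error) { finished.countDown(); }
        @Override public void onComplete() { finished.countDown(); }
    }

    static List<String> run(int messageCount) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
        // submit() blocks the producer while the subscriber's buffer is full
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>(executor, 4)) {
            publisher.subscribe(subscriber);
            for (int i = 0; i < messageCount; i++) {
                publisher.submit("msg-" + i);
            }
        }
        try {
            subscriber.finished.await();       // wait for onComplete
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdown();
        }
        return subscriber.processed;
    }

    public static void main(String[] args) {
        System.out.println(run(10).size() + " messages processed");
    }
}
```

A Reactor pipeline expresses the same contract through operators instead of a hand-written subscriber.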

Monitor Consumer Lag as Your Primary SLA Metric
Define an SLO on consumer lag: e.g., 'p95 lag must be below 500 messages'. Alert at 80% of the threshold. If lag consistently grows, it is a capacity signal — add more consumer instances (up to the partition count) before the queue becomes unmanageable.
Production Insight
Consumer lag is the most important operational metric for async systems — treat sustained lag growth the same way you treat rising HTTP error rates: it is a production incident.
Key Takeaway
Backpressure must be explicit — set concurrency limits, prefetch counts, and poll limits; then monitor consumer lag as the primary health signal.
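A lag SLO of the kind described above ("p95 below 500 messages, alert at 80%") reduces to a few lines of arithmetic. The sketch below uses the nearest-rank percentile over a non-empty window of lag samples; the class name, method names, and sample values are all invented for illustration.

```java
import java.util.Arrays;

/** Evaluates a consumer-lag SLO over a window of lag samples (illustrative). */
final class LagSloCheck {

    /** Nearest-rank percentile; samples must be non-empty. */
    static long percentile(long[] samples, int percent) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (percent * sorted.length + 99) / 100;   // ceil(percent/100 * n)
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Alerts once p95 lag reaches 80% of the SLO threshold. */
    static boolean shouldAlert(long[] lagSamples, long sloThreshold) {
        return percentile(lagSamples, 95) * 10 >= sloThreshold * 8;
    }

    public static void main(String[] args) {
        long[] lag = new long[20];
        for (int i = 0; i < lag.length; i++) {
            lag[i] = (i + 1) * 10L;            // made-up samples: 10, 20, ..., 200
        }
        System.out.println("p95 lag: " + percentile(lag, 95));
        System.out.println("alert: " + shouldAlert(lag, 500));
    }
}
```

In practice the samples would come from your metrics system, and the alert rule would live there rather than in application code.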

Choreography vs Orchestration Sagas with Messaging

Sagas implemented via messaging are one of the most important applications of async messaging patterns in microservices. A choreography saga uses domain events to coordinate between services with no central controller. An orchestration saga uses a central saga service that sends commands and waits for responses.

Choreography with Kafka: each service publishes domain events to well-known topics. Other services subscribe and react. The saga 'flow' is encoded in the event subscriptions. Example: OrderService publishes OrderPlaced → InventoryService listens, reserves stock, publishes StockReserved → PaymentService listens, charges card, publishes PaymentProcessed → FulfillmentService listens, ships. Compensation flows in reverse: if PaymentService publishes PaymentFailed, InventoryService listens for it, releases the reserved stock, and publishes StockReleased.

Orchestration with Kafka: the saga orchestrator publishes commands to each service's command topic. Each service processes the command and publishes a result event. The orchestrator listens for result events and advances the saga state machine. Spring State Machine or Axon Framework can implement the state machine; Temporal provides durable workflow execution with retries and timeouts built in.
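The orchestrator's state machine can be small. The sketch below models the order flow from the choreography example as a pure transition function; the state and event names are assumptions made for this illustration, and a real orchestrator would also persist the state and publish the next command after each transition.

```java
/** Minimal transition function for an order saga (illustrative state and event names). */
final class OrderSagaSketch {

    enum State { AWAITING_STOCK, AWAITING_PAYMENT, AWAITING_SHIPMENT, COMPENSATING, COMPLETED, FAILED }

    enum Event { STOCK_RESERVED, STOCK_REJECTED, PAYMENT_PROCESSED, PAYMENT_FAILED, STOCK_RELEASED, ORDER_SHIPPED }

    /** Returns the next saga state, or throws if the event is not expected in this state. */
    static State next(State current, Event event) {
        switch (current) {
            case AWAITING_STOCK:
                if (event == Event.STOCK_RESERVED) return State.AWAITING_PAYMENT;
                if (event == Event.STOCK_REJECTED) return State.FAILED;
                break;
            case AWAITING_PAYMENT:
                if (event == Event.PAYMENT_PROCESSED) return State.AWAITING_SHIPMENT;
                // Payment failed after stock was reserved: compensate first
                if (event == Event.PAYMENT_FAILED) return State.COMPENSATING;
                break;
            case COMPENSATING:
                if (event == Event.STOCK_RELEASED) return State.FAILED;
                break;
            case AWAITING_SHIPMENT:
                if (event == Event.ORDER_SHIPPED) return State.COMPLETED;
                break;
            default:
                break;   // COMPLETED and FAILED are terminal
        }
        throw new IllegalStateException(event + " is not valid in state " + current);
    }

    public static void main(String[] args) {
        State state = State.AWAITING_STOCK;
        state = next(state, Event.STOCK_RESERVED);
        state = next(state, Event.PAYMENT_FAILED);
        state = next(state, Event.STOCK_RELEASED);
        System.out.println("saga ended in state " + state);
    }
}
```

Keeping the transition logic free of I/O makes the compensation paths easy to unit-test.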

For messaging-based sagas, correlation ID is essential — every command and event in the saga chain must carry the saga ID so the orchestrator (or any debugging tool) can correlate the entire flow. Include the saga ID in the Kafka message key for partitioned ordering of saga events.
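Why the key matters can be seen in a few lines. Kafka's default partitioner hashes the serialized key (with murmur2) and maps the hash onto the partition count; the sketch below substitutes String.hashCode() for simplicity, so the actual partition numbers will differ from a real cluster, but the property that matters is the same: one key always maps to one partition.

```java
/** Simplified key-to-partition mapping (String.hashCode instead of Kafka's murmur2). */
final class KeyPartitioning {

    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String sagaId = "saga-7f3a";   // hypothetical saga ID used as the message key
        System.out.println(partitionFor(sagaId, 12) == partitionFor(sagaId, 12));
    }
}
```

Note that the mapping depends on the partition count, so adding partitions to a topic changes where existing keys land.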

Use Separate Topics for Commands and Events
Commands are directed at a specific service (ReserveStockCommand → inventory service only). Events are facts broadcast to all interested parties (StockReserved → any subscriber). Mixing them in one topic creates routing confusion and makes it impossible to give services independent consumption control. Use a topic naming convention: <service>.commands and <service>.events.
Production Insight
Store saga state in a dedicated database table with the current step, started_at, and updated_at — this is your operational dashboard when a saga gets stuck at 2 AM.
Key Takeaway
Orchestrated sagas provide visibility into complex multi-step flows at the cost of a central orchestrator; choreography provides simplicity at the cost of implicit flow visibility.

JMS Transaction Boundaries: Don't Lose Messages on a Crash

You think async makes you safe? Think again. Without proper transaction management, a message can be consumed and then vanish when your database write fails. That's a data loss incident waiting to happen.

Spring's JMS integration gives you two knobs: sessionTransacted (local JMS transaction) and a JtaTransactionManager (XA two-phase commit). For most services, sessionTransacted is enough — it ties the message acknowledgment to your listener's success. If your listener throws a RuntimeException, the message goes back on the queue.

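But here's the gotcha: unless your database write is part of the same transaction, the two commits can still diverge. If the PostgreSQL commit succeeds and the process crashes before the JMS acknowledgement, the message is redelivered and the order is processed twice. If the acknowledgement goes through first and the database write then fails, your customer's order goes to the void.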

For serious guarantees, use a JtaTransactionManager and a broker like ActiveMQ Artemis or IBM MQ that supports XA. Your platform team will thank you when you don't wake them at 3 AM.

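OrderListener.java
import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

// Order, OrderRepository and PaymentService are application types assumed to exist.
@Component
public class OrderListener {

    private final OrderRepository orderRepository;
    private final PaymentService paymentService;

    public OrderListener(OrderRepository orderRepository, PaymentService paymentService) {
        this.orderRepository = orderRepository;
        this.paymentService = paymentService;
    }

    @Transactional
    @JmsListener(destination = "orders.new")
    public void onOrder(Order order) {
        // The database write rolls back if anything below throws
        orderRepository.save(order);
        paymentService.charge(order.getAmount());
        // If paymentService throws, the exception reaches the listener container:
        // the JMS message is NOT acknowledged and the broker redelivers it
    }
}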
Output
Message stays on queue if listener fails.
No ack until transaction commits.
Production Trap:
Setting sessionTransacted=true but not annotating your listener with @Transactional? You get local JMS rollback, but your DB write is a separate transaction. That's partial rollback — worse than no rollback.
Key Takeaway
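sessionTransacted with @Transactional ties message acknowledgement to listener success, so message and data changes normally live or die together; strict atomicity across broker and database still requires XA.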

Message Selectors: Stop Wasting CPU on Irrelevant Messages

Every message your consumer ignores is still a message you paid to deserialize, validate, and discard. That's throughput you're burning.

JMS message selectors let you filter messages at the broker level. The broker only delivers messages that match your expression. You consume zero CPU for messages you don't care about.

The selector is a SQL-like expression over the JMS message headers (JMSType, JMSDeliveryMode) and your custom message properties (String, numeric, or boolean). You cannot filter on the message body; that ship sailed in the JMS 1.1 spec.

When does this matter? When you have a shared queue for multiple event types (e.g., 'order.created', 'order.cancelled') and your listener only processes one type. Instead of a generic listener with a switch statement, use selectors to partition the work. Your team gets smaller, faster consumers that never touch the wrong event.

But careful: selectors add broker-side overhead. If 90% of your messages match, the broker is doing pointless work. Profile before you over-optimize.

FilteredConsumer.java
import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;

@Component
public class FilteredConsumer {

    @JmsListener(
        destination = "events",
        selector = "eventType = 'order.created'"
    )
    public void onOrderCreated(String payload) {
        System.out.println("Only sees order.created: " + payload);
    }

    @JmsListener(
        destination = "events",
        selector = "eventType = 'order.cancelled'"
    )
    public void onOrderCancelled(String payload) {
        System.out.println("Only sees order.cancelled: " + payload);
    }
}
Output
Broker delivers only matching messages.
No CPU wasted on deserialization of irrelevant events.
Selector Gotcha:
A selector comparison against a missing property evaluates to unknown, and the message is not selected. If a message lacks the 'eventType' property, neither listener above receives it. Always set the property on every message.
Key Takeaway
Message selectors move filtering from your app to the broker — only pay for messages you process.
Production incident · Post-mortem · Severity: high

ReplyingKafkaTemplate Timeout Storm During Kafka Broker Restart

Symptom
Alert at 11:43: order service error rate spiked to 94%. Logs showed thousands of 'Reply timed out after 10000ms' errors from ReplyingKafkaTemplate. All order creation attempts were failing. Payment and inventory services were healthy.
Assumption
The on-call engineer assumed a downstream service was down. They checked payment and inventory service health — both green. They restarted order service pods — no improvement. They did not check Kafka broker status.
Root cause
The Kafka platform team was performing a rolling broker restart for a configuration change. During the partition leader election (typically 5-30 seconds per broker), producers were blocked waiting for leader assignment. All in-flight ReplyingKafkaTemplate requests (synchronously blocking with .get(10, SECONDS)) were blocking threads. When the timeout expired, all threads threw simultaneously, causing a thread pool exhaustion spiral. New requests could not acquire threads and failed immediately.
Fix
Immediate: increased ReplyingKafkaTemplate timeout to 30s; reduced concurrency to prevent thread pool exhaustion; added circuit breaker around all async request-reply calls. Architectural: replaced synchronous ReplyingKafkaTemplate with async CompletableFuture-based request-reply with non-blocking callback; added Kafka broker health to the order service readiness probe.
Key lesson
  • Request-reply over Kafka is not truly async if you block on the response with .get().
  • Use non-blocking callbacks.
  • Add circuit breakers around all broker-dependent operations.
  • Never assume that because your application is healthy, its dependencies are.
Production debug guide: symptom → root cause → fix (5 entries)
Symptom · 01
Consumer group lag growing — messages accumulating faster than consumed
Fix
First, measure lag with kafka-consumer-groups.sh or Confluent Control Center. If lag is growing on all partitions, the consumer is too slow: profile the consumer logic for slow DB queries or downstream HTTP calls. If lag is growing on only some partitions, suspect a poison-pill message (one message causing repeated exceptions that block the partition). Check the consumer logs for repeated ERROR messages on the same offset. Move the blocking message to the DLQ by configuring a DefaultErrorHandler with a retry back-off and a DeadLetterPublishingRecoverer on the ConcurrentKafkaListenerContainerFactory. Then scale up consumer replicas (up to the number of partitions).
Symptom · 02
ReplyingKafkaTemplate requests timing out sporadically
Fix
Check the reply topic consumer group lag: if the reply consumer is behind, responses are arriving but not being matched in time. Verify the replyTopic is correct and the consumer is running. If lag is zero but timeouts persist, the downstream service is responding slowly, so check its latency metrics. Increase the default reply timeout only after measuring the p95 response time of the downstream. Add a Resilience4j TimeLimiter and a CircuitBreaker so that slow replies fail fast instead of tying up threads. Consider replacing blocking .get() with a non-blocking .thenApply() callback.
Symptom · 03
RabbitMQ queue growing — consumers not processing messages
Fix
Check consumer count with rabbitmqctl list_consumers. If consumers are zero, the Spring listener container has stopped or never started: check application logs for 'SimpleMessageListenerContainer' errors such as a missing queue or an authentication failure, which the container treats as fatal by default. An ordinary exception thrown by the listener does not stop the container; by default the message is rejected and requeued, which can produce a tight redelivery loop (poison pill). If consumers exist but the backlog grows, check prefetchCount: Spring AMQP 2.0 and later default to 250, but if it has been set to 1, throughput is limited to one message per consumer per round trip. For batch workloads, a prefetchCount of 10-50 is a reasonable starting point. Also check whether messages are being NACK'd and re-queued in a tight loop.
Symptom · 04
Messages processed out of order despite using Kafka
Fix
Kafka guarantees ordering only within a partition. If the topic has more than one partition, messages from different partitions are interleaved. Verify that all messages for a given entity (e.g., all events for orderId=123) are published to the same partition by using the entity ID as the message key: kafkaTemplate.send(topic, orderId.toString(), event). This ensures all events for one order land in the same partition and are processed in order. Do not use null keys for order-sensitive event streams.
Symptom · 05
Dead-letter queue filling up — messages repeatedly failing
Fix
Inspect the DLQ messages — look at the x-death headers (RabbitMQ) or the original offset (Kafka) to understand which messages are failing. Reproduce the failure locally with a sample message. Common causes: deserialization error (schema changed, use versioned Avro), NPE in business logic (new field added to event, consumer not null-safe), downstream service timeout (consumer making synchronous HTTP call that times out). Fix the consumer bug, then replay DLQ messages after verification.
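The retry-then-dead-letter behaviour that keeps a poison pill from blocking a partition can be modelled in a few lines. This is a broker-free sketch with invented names (`RetryThenDeadLetter`, `process`); a real error handler would also wait between attempts and publish the failed record to a dead-letter topic or queue rather than to an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Retries a failing message a bounded number of times, then parks it (illustrative). */
final class RetryThenDeadLetter {

    final List<String> deadLetters = new ArrayList<>();
    private final int maxAttempts;

    RetryThenDeadLetter(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Returns true if the handler succeeded, false if the message was dead-lettered. */
    boolean process(String message, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return true;
            } catch (RuntimeException e) {
                System.err.println("attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        deadLetters.add(message);   // give up so later messages are not blocked
        return false;
    }

    public static void main(String[] args) {
        RetryThenDeadLetter pipeline = new RetryThenDeadLetter(3);
        Consumer<String> handler = msg -> {
            if (msg.contains("poison")) {
                throw new IllegalArgumentException("cannot parse " + msg);
            }
        };
        pipeline.process("order-1", handler);
        pipeline.process("poison-pill", handler);
        pipeline.process("order-2", handler);   // still processed after the bad message
        System.out.println("dead letters: " + pipeline.deadLetters);
    }
}
```

The important property is the last call: the message behind the poison pill is still handled.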
Debug Cheat Sheet: immediate diagnostic commands for async messaging incidents
Kafka consumer group lagging
Immediate action
Measure lag per partition and identify if all or some partitions are affected
Commands
kafka-consumer-groups.sh --bootstrap-server kafka:9092 --describe --group my-consumer-group
kubectl logs -l app=my-consumer --since=5m | grep -c 'ERROR'
Fix now
If lag on all partitions: scale consumer replicas. If lag on one partition: find and DLQ the poison-pill message blocking that partition.
RabbitMQ queue depth growing
Immediate action
Check queue stats and consumer count
Commands
rabbitmqctl list_queues name messages messages_ready messages_unacknowledged consumers
rabbitmqctl list_consumers queue_name consumer_tag ack_required prefetch_count
Fix now
If consumers=0: restart listener container. If consumers>0 but slow: increase prefetchCount and/or add consumer replicas.
Messages in dead-letter queue
Immediate action
Inspect DLQ message headers to understand failure reason
Commands
kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic orders.DLT --from-beginning --max-messages 5 --property print.headers=true
rabbitmqadmin get queue=my-queue.dlq count=5
Fix now
Identify the failure type from headers (deserialization vs business logic vs timeout), fix the consumer, then replay DLQ messages
ReplyingKafkaTemplate timeout errors
Immediate action
Check reply topic lag and downstream service latency
Commands
kafka-consumer-groups.sh --bootstrap-server kafka:9092 --describe --group my-app-reply-consumer
kubectl top pods -l app=downstream-service
Fix now
If reply consumer lag>0: consumer thread is blocked, check for slow DB/HTTP calls in reply handler. If lag=0: downstream is slow, add circuit breaker.
Kafka vs RabbitMQ Feature Comparison

| Feature | Apache Kafka | RabbitMQ |
| --- | --- | --- |
| Max throughput | Millions/sec (partitioned) | ~50K/sec per queue |
| Message retention | Time or size-based (days/weeks) | Until acknowledged (no replay) |
| Ordering | Per-partition guarantee | Per-queue guarantee |
| Routing | Topic + partition key only | Exchange/routing key (powerful) |
| Request-reply RPC | Possible (ReplyingKafkaTemplate) | Native (convertSendAndReceive) |
| Message priority | Not supported | Per-queue priority (0-255) |
| Per-message TTL | Not supported | Supported |
| Event replay | Yes (offset rewind) | No |
| Minimum cluster | 3 brokers (production) | 1 node (mirrored for HA) |
| Spring Boot support | Spring Kafka | Spring AMQP |
| Best for | Event streaming, audit logs, CDC | Task queues, RPC, flexible routing |

Key takeaways

1. Fire-and-forget is only safe with acks=all, an idempotent producer, and send-failure logging; silent drops are a production risk.
2. Never block on the future returned by ReplyingKafkaTemplate.sendAndReceive() under load; use non-blocking CompletableFuture callbacks and add circuit breakers.
3. Each logically distinct Kafka consumer must have a unique consumer groupId; services that share one split the partitions between them, so each silently misses part of the stream.
4. Consumer lag is the primary health metric for async systems; alert on it before it becomes unmanageable.
5. Kafka for event streaming and replay; RabbitMQ for flexible routing and low-latency RPC. Both can coexist in one system.
6. Poison-pill messages must be detected and routed to a DLT; without one, a single bad message blocks an entire partition indefinitely.
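The poison-pill takeaway is easiest to see in a broker-free model: retry a failing record a bounded number of times, then park it so the records behind it can proceed. The class and field names below are hypothetical, and this is only a sketch of the idea, not Spring Kafka's error-handler implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** In-memory model of bounded retries followed by dead-lettering. */
final class RetryThenDeadLetter {

    final List<String> processed = new ArrayList<>();
    final List<String> deadLetters = new ArrayList<>();

    /** Processes records in order; a record that keeps failing is parked after maxAttempts. */
    void consume(List<String> records, Consumer<String> handler, int maxAttempts) {
        for (String record : records) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    handler.accept(record);
                    processed.add(record);
                    done = true;
                } catch (RuntimeException e) {
                    // A real consumer would back off here before the next attempt.
                }
            }
            if (!done) {
                // Park the record so the ones queued behind it are not blocked.
                deadLetters.add(record);
            }
        }
    }
}
```

With an unbounded retry loop in place of `maxAttempts`, the first record that always fails would keep every later record on the same partition from being processed.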

Common mistakes to avoid

6 patterns
×

Using enable.auto.commit=true in Kafka consumers

Symptom
Messages lost after consumer crash — offsets committed before processing completes
Fix
Set spring.kafka.consumer.enable-auto-commit: false and use Spring's AckMode.RECORD (spring.kafka.listener.ack-mode: record) so each offset is committed only after that record has been processed successfully
×

Blocking on the ReplyingKafkaTemplate reply future (sendAndReceive(...).get()) in a web request thread

Symptom
Thread pool exhaustion under load — all threads blocked waiting for replies; new requests rejected with 503
Fix
Use a non-blocking .thenApply() callback on the reply future, or run the blocking call on a virtual thread (Java 21+); wrap the reply future in a Resilience4j TimeLimiter
×

Sharing a Kafka consumer groupId between two different services

Symptom
Each service processes only ~50% of messages (Kafka splits partitions between group members)
Fix
Each logically distinct consumer must have a unique groupId — use the service name as the groupId
×

Not setting a dead-letter queue for failed messages

Symptom
Poison-pill messages cause the consumer to retry indefinitely, blocking all subsequent messages on that partition
Fix
Configure DefaultErrorHandler with a DeadLetterPublishingRecoverer and a FixedBackOff(interval, maxAttempts), and alert when the DLQ size grows
×

Publishing events with null keys in Kafka when ordering matters

Symptom
Events for the same entity arrive out of order because they land on random partitions
Fix
Always use the entity ID (orderId, customerId) as the Kafka message key. All events for one entity then go to the same partition, as long as the topic's partition count does not change
×

Not enabling publisher confirms and returns in RabbitMQ

Symptom
Messages silently dropped when exchange has no bound queue — producer receives no error
Fix
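Set spring.rabbitmq.publisher-confirm-type: correlated, spring.rabbitmq.publisher-returns: true, and spring.rabbitmq.template.mandatory: true in application.yml; register a ConfirmCallback and a ReturnsCallback on the RabbitTemplate to log failures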
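Of the mistakes above, the null-key one is the easiest to show in a few lines: a keyed record is placed by hashing its key, so the same key always lands on the same partition. The sketch below uses `Arrays.hashCode` as a stand-in hash, which is an assumption for illustration; Kafka's default partitioner uses a different hash (murmur2) over the serialized key, and `KeyPartitioning` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative key-hash partitioner; NOT Kafka's actual hashing algorithm. */
final class KeyPartitioning {

    /** Maps a non-null key to a partition index in [0, partitionCount). */
    static int partitionFor(String key, int partitionCount) {
        // Stand-in hash over the key bytes (assumption; Kafka uses murmur2).
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, partitionCount);
    }
}
```

Calling `partitionFor` twice with the same key and partition count returns the same index, which is the property per-entity ordering relies on. Changing `partitionCount` can remap existing keys, and records with a null key have no such anchor at all.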
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR
What is the difference between fire-and-forget, request-reply, and pub-s...
Q02 · JUNIOR
Why should you never use enable.auto.commit=true in a production Kafka c...
Q03 · SENIOR
When would you choose Kafka over RabbitMQ?
Q04 · SENIOR
How does backpressure work in Kafka consumers?
Q05 · SENIOR
Explain the correlation ID pattern in request-reply messaging.
Q06 · SENIOR
What is a poison-pill message and how do you handle it?
Q07 · SENIOR
How do you implement exactly-once processing in Kafka?
Q08 · SENIOR
How would you design a request-reply pattern over Kafka that does not bl...
Q09 · SENIOR
How do you handle schema evolution in Kafka message payloads?
Q01 of 09 · JUNIOR

What is the difference between fire-and-forget, request-reply, and pub-sub messaging patterns?

ANSWER
Fire-and-forget: the producer sends a message and does not wait for a response. Used for audit events, emails, and analytics.
Request-reply: the producer sends a message with a correlation ID and a reply-to address, then waits (blocking or async) for the consumer's response. Used for async RPC where the result is needed.
Pub-sub: the producer publishes to a topic, and multiple independent consumers, each in its own consumer group, receive every message. Used for event-driven fan-out where several services react to the same event.
FAQ · 6 QUESTIONS

Frequently Asked Questions

01
Can I use both Kafka and RabbitMQ in the same Spring Boot application?
02
What is the maximum message size for Kafka and RabbitMQ?
03
How do I monitor Kafka consumer lag in production?
04
Should I use Spring Cloud Stream instead of Spring Kafka directly?
05
How do I handle ordering in Kafka when I have multiple consumer instances?
06
What is the difference between at-most-once, at-least-once, and exactly-once delivery?