Message Queue Components Explained — Producers, Brokers, Consumers and Beyond
- A message queue decouples services by letting a producer drop a message and move on — the consumer picks it up whenever it's ready, regardless of whether the producer is still running
- Producers serialize and publish; brokers store and route; consumers process and acknowledge — these three form the core pipeline and each has independent failure modes
- Ack after success, Nack with requeue=False to route poison messages to the DLQ — never requeue a message that will never succeed
- RabbitMQ deletes messages after ACK (work queue model); Kafka retains them as an immutable, time-ordered log (streaming model) — the architectural difference is significant, not cosmetic
- Prefetch count = 1 prevents one fast consumer from hoarding all in-flight messages while slower consumers sit idle with nothing to process
- The biggest trap: auto-ack means the broker deletes the message the instant it is delivered, before your code does anything — a crash mid-processing loses it permanently with zero trace
Production Debug Guide — symptom-to-action mapping for common MQ failures in production.

Queue depth growing — consumers not draining despite appearing to run:

```shell
docker exec forge-rabbitmq rabbitmqctl list_consumers
docker exec forge-rabbitmq rabbitmqctl list_queues name messages messages_unacknowledged consumers
```

Poison message looping — same message reprocessed hundreds of times, consumer CPU spikes:

```shell
docker exec forge-rabbitmq rabbitmqctl list_queues name messages_ready messages_unacknowledged
docker logs forge-rabbitmq --tail 200 | grep -i 'dead'
```

Messages published successfully — producer logs confirm — but consumer receives nothing:

```shell
docker exec forge-rabbitmq rabbitmqctl list_bindings
docker exec forge-rabbitmq rabbitmqctl list_queues name messages
```
Modern software systems are rarely a single monolith running by itself. They are a constellation of services — payments, notifications, inventory, analytics — all needing to communicate. When service A calls service B synchronously and B is slow, A slows down too. When B crashes, A crashes or returns an error to the user. That tight coupling is a hidden time bomb in nearly every high-traffic system, and message queues are the first tool experienced engineers reach for.
A message queue solves one fundamental problem: it lets two services communicate without either needing to know the other's current availability or processing speed. The sender drops a message into the queue and moves on immediately. The receiver picks it up whenever it is ready. This tiny architectural shift unlocks fault tolerance, horizontal scalability, and independent deployability — the three properties every distributed system must eventually achieve.
By the end of this guide you will understand every moving part inside a message queue system — producers, brokers, queues versus topics, consumers, consumer groups, acknowledgements, and dead-letter queues — well enough to design one from scratch on a whiteboard, explain the trade-offs clearly under interview pressure, and avoid the subtle bugs that trip up engineers who only understand the happy path.
In 2026, message queues underpin everything from payment processing to LLM inference pipelines. The fundamentals have not changed, but the operational expectations have — consumers are expected to be idempotent by default, DLQs are expected to have alerting from day one, and the choice between RabbitMQ and Kafka is increasingly driven by replay requirements rather than throughput alone.
The Producer — Implementing Reliable Event Egress
The producer is the entry point for all data entering your distributed system, and its reliability story is more nuanced than most tutorials suggest. A producer that simply calls publish and moves on is adequate for logging and analytics. A producer handling payment events, order creation, or anything with financial or regulatory consequences needs to think carefully about three things: persistence, confirmation, and the atomicity gap between a database write and a queue publish.
Persistence is about what happens when the broker restarts. A non-durable message lives only in memory — broker restart, message gone. For anything that matters, use delivery mode 2 (persistent), which tells the broker to flush the message to disk before acknowledging receipt. The trade-off is disk I/O on every publish, which reduces maximum throughput but eliminates the silent data loss vector.
Confirmation is about knowing the broker actually received the message. The standard publish call is fire-and-forget — if the network drops between the publish and the broker's receipt, you never know. Publisher confirms close that gap: the broker sends an acknowledgement back to the producer once it has taken responsibility for the message. This adds latency per message but gives you a guarantee that the message is in the broker's hands.
The atomicity gap is the hardest problem. If your producer writes a row to Postgres and then publishes an event to RabbitMQ, there is a window where the database write succeeds but the queue publish fails — or the process is killed between the two operations. The downstream services never learn that the order exists. The Outbox pattern closes this gap by writing the event into a table inside the same database transaction, with a separate polling process handling the actual publish to the queue.
```java
package io.thecodeforge;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;
import com.rabbitmq.client.ConfirmListener;
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Production-grade producer with durable messaging and publisher confirms.
 * io.thecodeforge naming conventions throughout.
 *
 * Two critical settings:
 * 1. durable=true on the queue declaration — survives broker restart
 * 2. PERSISTENT_TEXT_PLAIN delivery mode — message flushed to disk before broker ACKs
 *
 * Publisher confirms (channel.confirmSelect()) ensure we know when the broker
 * has actually received and persisted the message, not just buffered it.
 */
public class OrderProducer {

    private static final String QUEUE_NAME = "order_processing";

    // Track outstanding unconfirmed publishes: sequenceNumber -> message payload
    // In production this would trigger a retry or fallback path on nack
    private final ConcurrentNavigableMap<Long, String> outstandingConfirms =
            new ConcurrentSkipListMap<>();

    public void publishOrder(String customerId, double amount) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        factory.setUsername("forge_user");
        factory.setPassword("forge_password");

        // One Connection per process — creating a new Connection per publish
        // adds 100-200ms of TCP handshake overhead and exhausts file descriptors at scale.
        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // durable=true: this queue survives a broker restart
            // exclusive=false: allow multiple consumers
            // autoDelete=false: queue persists when the last consumer disconnects
            channel.queueDeclare(QUEUE_NAME, true, false, false, null);

            // Enable publisher confirms — the broker will ack or nack each message
            // so we know it was received and persisted, not just buffered in memory
            channel.confirmSelect();

            // Wire up confirm listeners before publishing
            channel.addConfirmListener(new ConfirmListener() {
                @Override
                public void handleAck(long deliveryTag, boolean multiple) {
                    if (multiple) {
                        outstandingConfirms.headMap(deliveryTag + 1).clear();
                    } else {
                        outstandingConfirms.remove(deliveryTag);
                    }
                    System.out.printf(" [Forge] Broker confirmed delivery for tag: %d%n", deliveryTag);
                }

                @Override
                public void handleNack(long deliveryTag, boolean multiple) {
                    // Broker rejected the message — trigger your retry or fallback path
                    String nacked = outstandingConfirms.get(deliveryTag);
                    System.err.printf(" [Forge] Broker NACK'd message, triggering fallback: %s%n", nacked);
                    // In production: write to outbox table for later retry
                }
            });

            // Generate a unique event ID that consumers use for idempotency checks
            String eventId = UUID.randomUUID().toString();
            String payload = String.format(
                "{\"event_id\":\"%s\", \"customer_id\":\"%s\", \"amount\":%.2f, \"currency\":\"USD\"}",
                eventId, customerId, amount
            );

            // Track this publish before sending so the confirm listener can correlate
            long sequenceNumber = channel.getNextPublishSeqNo();
            outstandingConfirms.put(sequenceNumber, payload);

            // PERSISTENT_TEXT_PLAIN = Delivery Mode 2 + Content-Type text/plain
            // Mode 2 tells the broker to write to disk before acknowledging
            // Without this, a broker crash between receipt and flush loses the message
            channel.basicPublish(
                "",          // default exchange — routes directly to named queue
                QUEUE_NAME,  // routing key for default exchange = queue name
                MessageProperties.PERSISTENT_TEXT_PLAIN,
                payload.getBytes(StandardCharsets.UTF_8)
            );

            System.out.printf(" [Forge] Published order event %s for customer %s%n", eventId, customerId);

            // Block until the broker confirms. waitForConfirms returns false if any
            // message was nacked, and throws TimeoutException after 5 seconds.
            // In high-throughput scenarios, use async confirms and batch publishing instead.
            if (!channel.waitForConfirms(5000)) {
                throw new RuntimeException("Broker nacked the message — it was not safely persisted");
            }
        }
    }
}
```
[Forge] Broker confirmed delivery for tag: 1
- Write the event to an 'outbox' table inside the same database transaction as your business logic — if the transaction rolls back, the event disappears with it
- A separate polling process reads unpublished outbox rows and publishes them to the message queue — decoupling the database commit from the queue publish
- Once published successfully, the poller marks the outbox row as sent or deletes it — the idempotent marker prevents double-publish on poller restart
- This pattern guarantees atomicity: either both the data write and the event publish succeed, or neither does — no phantom events, no silent drops
- The trade-off is added latency equal to the polling interval (typically 100ms to 5 seconds) and an outbox table that needs periodic cleanup
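The poller's loop is simple enough to sketch end to end. The snippet below simulates the outbox table in memory purely to illustrate the claim-publish-mark cycle and the idempotent "sent" marker; the class and method names are invented for this sketch, and in production the rows would live in your database (written in the same transaction as the business data) and the publish would go to the real broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Sketch of the Outbox poller algorithm with an in-memory stand-in for the
 * outbox table. All names here are illustrative, not from a real library.
 */
public class OutboxPollerSketch {

    static final class OutboxRow {
        final String eventId;
        final String payload;
        boolean published;  // the idempotent "sent" marker
        OutboxRow(String eventId, String payload) {
            this.eventId = eventId;
            this.payload = payload;
        }
    }

    // One polling pass: publish every unsent row, then mark it sent.
    // Marking AFTER the publish means a crash in between causes a re-publish
    // on restart — which is exactly why consumers must dedupe by event_id.
    static int pollOnce(List<OutboxRow> table, Consumer<String> publishToBroker) {
        int published = 0;
        for (OutboxRow row : table) {
            if (row.published) continue;  // already sent — skip on poller restart
            publishToBroker.accept(row.payload);
            row.published = true;         // in production: UPDATE ... SET published_at = NOW()
            published++;
        }
        return published;
    }

    public static void main(String[] args) {
        List<OutboxRow> table = new ArrayList<>(List.of(
            new OutboxRow("evt-1", "{\"order\":1}"),
            new OutboxRow("evt-2", "{\"order\":2}")
        ));
        List<String> broker = new ArrayList<>();

        System.out.println(pollOnce(table, broker::add)); // 2 — both rows published
        System.out.println(pollOnce(table, broker::add)); // 0 — second pass is a no-op
        System.out.println(broker.size());                // 2 — no double-publish
    }
}
```

The second `pollOnce` call doing nothing is the point: the sent marker makes the poller safe to restart at any moment, at the cost of an occasional duplicate publish that the consumer's idempotency check absorbs.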
The Broker — Orchestrating Infrastructure with Docker
The broker is the central post office of your distributed system — every message passes through it, and its configuration determines the durability, routing, and delivery guarantees your system provides. Senior engineers containerise the broker with Docker Compose for a simple reason: dev-prod parity. When your local RabbitMQ instance has the same exchanges, bindings, and policies as production, you catch routing bugs and configuration mismatches before they hit staging.
The compose file below includes a named volume for data persistence — without it, a docker-compose down wipes every message in every queue, which is fine for unit tests and genuinely dangerous for integration tests where you are verifying retry and DLQ behaviour. The management UI on port 15672 is not just an admin panel — during an incident it is your primary visibility into queue depth, consumer count, message rates, and memory usage. Know how to read it under pressure before you need it.
One often-overlooked aspect of broker configuration is the Dead Letter Exchange. Defining the DLX in your compose file alongside the primary queue means every developer's local environment has the same failure routing as production. Discovering that your DLQ configuration is wrong during a production incident is several orders of magnitude worse than discovering it locally.
```yaml
version: '3.8'

services:
  thecodeforge-broker:
    image: rabbitmq:3.13-management
    container_name: forge-rabbitmq
    ports:
      - "5672:5672"    # AMQP protocol — application connections
      - "15672:15672"  # Management UI — use for debugging queue depth and consumer health
    environment:
      RABBITMQ_DEFAULT_USER: forge_user
      RABBITMQ_DEFAULT_PASS: forge_password
      # Load exchange and queue definitions at startup for dev-prod parity
      RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS: "-rabbitmq_management load_definitions /etc/rabbitmq/definitions.json"
    volumes:
      # Named volume ensures messages survive docker-compose down and up
      # Without this, every restart wipes the queue — dangerous for DLQ testing
      - rabbitmq_data:/var/lib/rabbitmq
      # Pre-load exchange and queue definitions on startup for dev-prod parity
      - ./rabbitmq-definitions.json:/etc/rabbitmq/definitions.json:ro
    healthcheck:
      # Wait for the broker to be ready before starting dependent services
      test: ["CMD", "rabbitmq-diagnostics", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 20s

volumes:
  rabbitmq_data:
    # Explicit driver declaration makes the volume's purpose visible to future engineers
    driver: local
```
# forge-rabbitmq started. Waiting for healthcheck...
# rabbitmq-diagnostics ping succeeded after 15s
# Broker ready at amqp://forge_user:forge_password@localhost:5672
# Management UI at http://localhost:15672 (forge_user / forge_password)
#
# Verify broker is healthy:
# docker exec forge-rabbitmq rabbitmqctl status
# docker exec forge-rabbitmq rabbitmqctl list_queues name messages consumers
- Queue (point-to-point): one message, one consumer — correct for task distribution like image resizing, email sending, or payment processing where you want exactly one service to act
- Topic or fanout (pub/sub): one message, every subscriber — correct for domain events like ORDER_PLACED where inventory, notifications, and analytics all need to react independently
- Adding a new microservice to a topic requires zero producer code changes — bind a new queue to the existing exchange and the service starts receiving events
- Competing consumers on a single queue scale horizontally and naturally — add more consumer instances and the broker distributes work across all of them
- Rule of thumb: if more than one downstream service will ever need the event, use a topic exchange from day one — retrofitting it later requires coordinating multiple teams
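One detail worth internalising for topic exchanges is the wildcard semantics of routing keys: `*` matches exactly one dot-separated word and `#` matches zero or more. The sketch below reimplements that matching rule in plain Java purely as an illustration of the semantics — the real matching happens inside the broker, and the class name is made up for this example.

```java
/**
 * Illustration of topic-exchange routing-key matching as RabbitMQ defines it:
 * '*' matches exactly one dot-separated word, '#' matches zero or more words.
 * A teaching sketch, not broker code.
 */
public class TopicMatch {

    public static boolean matches(String pattern, String routingKey) {
        return matchWords(pattern.split("\\."), 0, routingKey.split("\\."), 0);
    }

    private static boolean matchWords(String[] p, int pi, String[] k, int ki) {
        if (pi == p.length) return ki == k.length;  // pattern exhausted: match iff key is too
        if (p[pi].equals("#")) {
            // '#' absorbs zero or more words — try every possible split point
            for (int skip = ki; skip <= k.length; skip++) {
                if (matchWords(p, pi + 1, k, skip)) return true;
            }
            return false;
        }
        if (ki == k.length) return false;           // key exhausted but pattern is not
        if (p[pi].equals("*") || p[pi].equals(k[ki])) {  // '*' = any single word
            return matchWords(p, pi + 1, k, ki + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("order.*", "order.created"));     // true
        System.out.println(matches("order.*", "order.created.eu"));  // false — '*' is one word
        System.out.println(matches("order.#", "order.created.eu"));  // true — '#' is many
        System.out.println(matches("#", "anything.at.all"));         // true — catch-all binding
    }
}
```

The practical consequence: binding `order.#` today means the inventory service keeps receiving events even when the producer later adds more specific keys like `order.created.eu`, with zero coordination between teams.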
The Consumer — Ensuring Idempotency and Manual Acknowledgement
Consumers are where processing happens and where most data loss actually occurs — not in the broker, not in the producer. The configuration decisions you make in the consumer determine whether your system has at-least-once delivery with retry guarantees or auto-ack with silent data loss disguised as reliability.
The most dangerous configuration is auto_ack=True. The broker interprets delivery as confirmation that the message was processed successfully. If your Java process receives the message, starts processing, and then gets killed by the OOM killer between the payment gateway call and the database commit, the message is gone. The broker has already deleted it. There is no retry, no DLQ entry, and no trace in any log — only a successful charge in the gateway and a missing order in your database.
Manual acknowledgement changes the contract: the broker keeps the message in an unacknowledged state until your code explicitly calls basicAck. Only call basicAck after every side effect has succeeded — not after receipt, not after the external API call, but after the database commit that makes the operation durable.
The second non-negotiable for production consumers is idempotency. At-least-once delivery means duplicates are guaranteed over time. A network blip between your ack call and the broker receiving it causes redelivery. A consumer restart with unacked messages causes redelivery. Assume it will happen and build accordingly — check the event_id against a processed_events table before acting, and ack without processing if the event was already handled.
Prefetch count rounds out the three required settings. Without it, RabbitMQ uses an unbounded prefetch and may deliver 500 messages to one fast consumer while three other consumers sit idle. Set basicQos(1) to enforce fair dispatch — the broker only sends a new message to a consumer after the previous one is acknowledged.
```java
package io.thecodeforge;

import com.rabbitmq.client.*;
import java.nio.charset.StandardCharsets;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/**
 * Production-grade consumer with:
 * 1. Manual acknowledgement — only ack after all side effects succeed
 * 2. Idempotency check — safe to receive the same message twice
 * 3. Prefetch=1 — fair dispatch across all consumer instances
 * 4. DLQ routing on permanent failure — no infinite poison message loops
 */
public class OrderConsumer {

    private static final String QUEUE_NAME = "order_processing";
    private static final String DB_URL = "jdbc:postgresql://localhost:5432/forge";
    private static final String DB_USER = "forge_user";
    private static final String DB_PASS = "forge_password";

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        factory.setUsername("forge_user");
        factory.setPassword("forge_password");

        // Enable automatic connection recovery — reconnects after network interruptions
        // without requiring consumer code to restart or re-register callbacks
        factory.setAutomaticRecoveryEnabled(true);
        factory.setNetworkRecoveryInterval(5000); // retry every 5 seconds

        // Note: this is com.rabbitmq.client.Connection — do not import java.sql.Connection,
        // or the name clash breaks compilation
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Declare the queue with the same parameters as the producer
        // If the queue already exists with different params, this throws — catch it early
        channel.queueDeclare(QUEUE_NAME, true, false, false, null);

        // CRITICAL: prefetch=1 prevents one consumer from hoarding all messages
        // The broker will not deliver a new message until the previous one is acked
        // This forces round-robin distribution across all running consumer instances
        channel.basicQos(1);

        System.out.println(" [Forge] Consumer waiting for messages on: " + QUEUE_NAME);

        DeliverCallback deliverCallback = (consumerTag, delivery) -> {
            String body = new String(delivery.getBody(), StandardCharsets.UTF_8);
            long deliveryTag = delivery.getEnvelope().getDeliveryTag();
            String eventId = extractEventId(body);
            System.out.printf(" [Forge] Received event %s%n", eventId);

            try {
                // Step 1: Idempotency check — has this event been processed before?
                // Required because at-least-once delivery means duplicates WILL arrive
                if (isAlreadyProcessed(eventId)) {
                    System.out.printf(" [Forge] Duplicate event %s — acking without processing%n", eventId);
                    channel.basicAck(deliveryTag, false);
                    return;
                }

                // Step 2: Execute business logic — all of this must succeed or fail atomically
                processOrder(body);

                // Step 3: Mark as processed in the idempotency table
                // Do this inside the same DB transaction as processOrder() in production
                markAsProcessed(eventId);

                // Step 4: Only ack AFTER all side effects are committed to durable storage
                // If we crash between processOrder() and basicAck(), the broker redelivers
                // The idempotency check in Step 1 handles that redelivery safely
                channel.basicAck(deliveryTag, false);
                System.out.printf(" [Forge] Successfully processed and acked event %s%n", eventId);

            } catch (Exception error) {
                System.err.printf(" [Forge] Failed to process event %s: %s%n", eventId, error.getMessage());
                // requeue=false: send to DLQ instead of requeueing
                // requeue=true would create an infinite loop for permanent failures
                // The DLQ preserves the message for manual inspection and replay
                channel.basicNack(deliveryTag, false, false);
            }
        };

        // autoAck=false: broker keeps message unacked until we call basicAck or basicNack
        // This is the single most important setting — never set this to true in production
        channel.basicConsume(QUEUE_NAME, false, deliverCallback,
            consumerTag -> System.out.println(" [Forge] Consumer cancelled: " + consumerTag)
        );
    }

    private static void processOrder(String json) throws Exception {
        // In production: parse JSON, call payment gateway, write to DB in one transaction
        System.out.printf(" [Forge] Processing order payload: %s%n", json);
        Thread.sleep(50); // simulate processing time
    }

    private static boolean isAlreadyProcessed(String eventId) throws Exception {
        try (var conn = DriverManager.getConnection(DB_URL, DB_USER, DB_PASS);
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT 1 FROM processed_events WHERE event_id = ? LIMIT 1")) {
            ps.setString(1, eventId);
            ResultSet rs = ps.executeQuery();
            return rs.next();
        }
    }

    private static void markAsProcessed(String eventId) throws Exception {
        try (var conn = DriverManager.getConnection(DB_URL, DB_USER, DB_PASS);
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO processed_events (event_id, processed_at) VALUES (?, NOW()) ON CONFLICT DO NOTHING")) {
            ps.setString(1, eventId);
            ps.executeUpdate();
        }
    }

    private static String extractEventId(String json) {
        // In production use Jackson or Gson for JSON parsing
        // This simple extraction is for illustration clarity only
        int start = json.indexOf('"', json.indexOf("event_id") + 10) + 1;
        int end = json.indexOf('"', start);
        return json.substring(start, end);
    }
}
```
[Forge] Received event 550e8400-e29b-41d4-a716-446655440000
[Forge] Processing order payload: {"event_id":"550e8400...", "customer_id":"cust-123", "amount":99.99}
[Forge] Successfully processed and acked event 550e8400-e29b-41d4-a716-446655440000
# On redelivery of the same event_id:
[Forge] Received event 550e8400-e29b-41d4-a716-446655440000
[Forge] Duplicate event 550e8400-e29b-41d4-a716-446655440000 — acking without processing
Dead-Letter Queues — Handling Failures Without Blocking the Pipeline
A Dead Letter Queue is where messages go when they cannot be processed — not to be forgotten, but to be preserved for diagnosis and recovery. The DLQ's value is not that it catches failures (your nack logic does that), but that it isolates them from the primary queue so the happy path continues processing at full speed while the failed messages wait for human attention.
In RabbitMQ, a Dead Letter Exchange is a first-class feature configured on the primary queue via arguments. When a message is nacked with requeue=False, or expires its TTL, or exceeds the queue's maximum length, the broker automatically routes it to the configured dead-letter exchange, which then delivers it to the DLQ. This requires no code changes in the consumer beyond using requeue=False — the broker handles the routing.
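In code, that wiring is nothing more than an arguments map on the primary queue declaration. The sketch below builds that map: `x-dead-letter-exchange` and `x-dead-letter-routing-key` are RabbitMQ's actual queue-argument names, while the exchange and queue names are illustrative. The declaration calls are shown as comments because they require a live channel.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of dead-letter wiring via queue arguments.
 * The argument keys are real RabbitMQ names; the exchange/queue
 * names ("forge.dlx" etc.) are examples for this article.
 */
public class DlqWiring {

    public static Map<String, Object> dlqArguments(String dlxName, String dlqRoutingKey) {
        Map<String, Object> args = new HashMap<>();
        // Where nacked (requeue=false), TTL-expired, or overflowed messages are routed
        args.put("x-dead-letter-exchange", dlxName);
        // Optional: override the routing key used when dead-lettering
        args.put("x-dead-letter-routing-key", dlqRoutingKey);
        return args;
    }

    public static void main(String[] args) {
        Map<String, Object> queueArgs = dlqArguments("forge.dlx", "order_processing.dead");

        // With a live Channel, the full wiring would look like:
        // channel.exchangeDeclare("forge.dlx", "direct", true);
        // channel.queueDeclare("order_processing.dlq", true, false, false, null);
        // channel.queueBind("order_processing.dlq", "forge.dlx", "order_processing.dead");
        // channel.queueDeclare("order_processing", true, false, false, queueArgs);
        System.out.println(queueArgs);
    }
}
```

Because these are queue arguments, they must be identical everywhere the queue is declared — a mismatch between producer, consumer, and the definitions file causes a declaration error at startup, which is exactly the kind of bug dev-prod parity catches early.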
Archiving DLQ messages to a relational database table adds context that the message broker cannot preserve on its own — the source queue name, the failure reason, the retry count, the timestamp of each attempt. Without this context, when someone opens the RabbitMQ management UI a week later and sees 47 messages in the DLQ, they have no idea which ones are related, which ones can be replayed, or what went wrong.
The alerting is not optional. A DLQ that accumulates silently for days is just deferred data loss. Every message that enters the DLQ represents a failure that a customer or downstream system is already experiencing. Treat DLQ growth exactly like a 500 error rate spike — investigate immediately, not during the next weekly review.
```sql
-- io.thecodeforge: Schema for auditing and replaying DLQ extractions
-- This table provides the context that the message broker cannot preserve:
-- which queue failed, why it failed, how many times it was attempted,
-- and whether it has been replayed or resolved.
CREATE TABLE dlq_archive (
    id                BIGSERIAL PRIMARY KEY,
    original_event_id UUID NOT NULL,
    source_queue      VARCHAR(255) NOT NULL,   -- which queue the message came from
    payload           JSONB NOT NULL,          -- full message body for replay
    failure_reason    TEXT NOT NULL,           -- exception message or error code
    retry_count       INT NOT NULL DEFAULT 0,
    first_failed_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_failed_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    resolved_at       TIMESTAMPTZ,             -- null until an engineer resolves it
    resolution_notes  TEXT                     -- what was done to fix or discard
);

-- Fast lookup by event_id for idempotency checks and deduplication during replay
CREATE UNIQUE INDEX idx_dlq_event_id ON dlq_archive(original_event_id);

-- For ops dashboards: filter by source queue and unresolved status
CREATE INDEX idx_dlq_source_unresolved ON dlq_archive(source_queue, resolved_at)
    WHERE resolved_at IS NULL;

-- Query: find all unresolved failures for the order_processing queue in the last 24 hours
SELECT original_event_id,
       failure_reason,
       retry_count,
       first_failed_at,
       payload->>'customer_id' AS customer_id,
       payload->>'amount'      AS amount
FROM dlq_archive
WHERE source_queue = 'order_processing'
  AND resolved_at IS NULL
  AND first_failed_at >= NOW() - INTERVAL '24 hours'
ORDER BY first_failed_at DESC;

-- Replay query: select fixable messages to republish to the original queue
-- After fixing the consumer bug, republish these payloads with new event_ids
SELECT original_event_id, payload
FROM dlq_archive
WHERE source_queue = 'order_processing'
  AND failure_reason LIKE '%connection timeout%'  -- transient errors worth replaying
  AND resolved_at IS NULL;
```
-- Partial index for unresolved messages keeps ops dashboard queries fast
-- even as the archive grows to millions of rows over months.
--
-- Example unresolved failures query result:
-- original_event_id | failure_reason | retry_count | customer_id
-- 550e8400-e29b-41d4-a716-446655440000 | DB connection timeout | 3 | cust-123
-- 7c9e6679-7425-40de-944b-e07fc1f90ae7 | JSON parse error: ... | 1 | cust-456
| Feature / Aspect | RabbitMQ (Queue-based) | Apache Kafka (Log-based) |
|---|---|---|
| Delivery model | Push — broker pushes messages to consumers as they become available | Pull — consumers poll the broker at their own pace, which naturally provides backpressure |
| Message retention | Deleted after ACK — once processed and acknowledged, the message is gone | Retained for a configurable period (hours, days, weeks) regardless of consumer acknowledgement |
| Message replay | Not supported natively — once a message is acked it cannot be replayed without re-publishing | Core feature — any consumer group can seek to offset 0 and replay the entire event history |
| Throughput ceiling | Approximately 50,000 messages per second per node under ideal conditions | Millions of messages per second per node — designed for extreme throughput, with the trade-off that lower-level concerns (partitions, offsets, batching) become the client's responsibility |
| Ordering guarantee | FIFO per queue — messages in a single queue are delivered in order | FIFO per partition — ordering is guaranteed within a partition; global ordering requires a single partition at the cost of throughput |
| Consumer isolation | Competing consumers share one queue — each message delivered to exactly one consumer | Each consumer group maintains independent offsets — truly independent fan-out with no interference between groups |
| Best for | Task queues, RPC patterns, complex routing logic with multiple exchanges, work distribution | Event streaming, audit logs, data pipelines, event sourcing, analytics, and any use case requiring replay |
| Operational complexity | Low — single binary, easy setup, excellent management UI out of the box | Higher — requires partition tuning, consumer group coordination, and offset management; KRaft mode removes ZooKeeper in recent versions |
| Dead-letter support | First-class feature via x-dead-letter-exchange queue argument — automatic routing on nack or TTL expiry | Convention-based retry and dead-letter topic pattern — requires explicit consumer logic to route to retry or DLT topics |
| Message size practical limit | 128MB default, configurable — but large messages should be stored externally with the queue carrying a reference | 1MB default, configurable — same advice applies: use the queue for metadata and reference, not for payloads |
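The retention rows are the crux of the table, and the difference can be made concrete with a toy model: a work queue that deletes on ack versus an append-only log with per-consumer-group offsets. This is an illustration of the two models only — not RabbitMQ or Kafka client code, and every name in it is invented for the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy models of queue-based vs log-based retention. */
public class RetentionModels {

    // Queue model: ack removes the message permanently
    static final class WorkQueue {
        private final Deque<String> messages = new ArrayDeque<>();
        void publish(String m) { messages.addLast(m); }
        String deliver() { return messages.peekFirst(); }
        void ack() { messages.pollFirst(); }  // gone for good — no replay possible
        int depth() { return messages.size(); }
    }

    // Log model: messages are append-only; each consumer group owns an offset
    static final class Log {
        private final List<String> entries = new ArrayList<>();
        private final Map<String, Integer> offsets = new HashMap<>();
        void publish(String m) { entries.add(m); }
        String poll(String group) {
            int off = offsets.getOrDefault(group, 0);
            if (off >= entries.size()) return null;  // caught up
            offsets.put(group, off + 1);             // advance offset; entry stays
            return entries.get(off);
        }
        void seekToBeginning(String group) { offsets.put(group, 0); }  // replay
    }

    public static void main(String[] args) {
        WorkQueue q = new WorkQueue();
        q.publish("order-1");
        q.deliver();
        q.ack();
        System.out.println(q.depth());              // 0 — acked message is unrecoverable

        Log log = new Log();
        log.publish("order-1");
        System.out.println(log.poll("billing"));    // order-1
        System.out.println(log.poll("billing"));    // null — caught up
        log.seekToBeginning("billing");
        System.out.println(log.poll("billing"));    // order-1 again — replay works
        System.out.println(log.poll("analytics"));  // order-1 — independent group offset
    }
}
```

The last two lines of `main` are the whole argument for Kafka when replay matters: consumption is a pointer move, not a deletion, so history survives and groups never interfere with each other.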
🎯 Key Takeaways
- Producers fire-and-forget — they hand the message to the broker and return immediately. This latency independence between services is the core architectural benefit, and it is why message queues are the first tool experienced engineers reach for when synchronous coupling creates reliability problems.
- The acknowledgement model is your delivery guarantee: only ack after all side effects succeed and are committed to durable storage. Use nack with requeue=False for permanent failures to route to the DLQ. Auto-ack is a data loss mechanism with no legitimate use in any consumer that does real work.
- Queue means one consumer gets the message — correct for tasks. Topic or fanout means every subscriber gets the message — correct for domain events. Choose based on whether multiple downstream services need to react to the same event, not based on expected volume.
- A DLQ without depth alerting is deferred data loss — set the alert threshold at one message, not some arbitrary number that gives you false confidence. Treat DLQ growth exactly like a 500 error rate spike: investigate immediately, fix the root cause, and replay selectively after the fix is deployed.
⚠ Common Mistakes to Avoid

- Running consumers with auto_ack=True: the broker deletes the message on delivery, so a crash mid-processing loses it permanently with no trace
- Acking before side effects are committed: the ack belongs after the database commit, not after receipt or the external API call
- Nacking permanent failures with requeue=True: a message that will never succeed loops forever and spikes consumer CPU; use requeue=False so the broker routes it to the DLQ
- Skipping the idempotency check: at-least-once delivery guarantees duplicates over time, and a consumer that does not dedupe by event_id will double-charge and double-email on redelivery
- Leaving prefetch unbounded: one fast consumer hoards hundreds of in-flight messages while the other consumers sit idle
- Letting the DLQ grow silently: a DLQ without depth alerting is just deferred data loss
- Opening a new broker connection per publish: each one costs a TCP handshake and a file descriptor, and neither survives scale
Interview Questions on This Topic
- Q: How do you handle backpressure in a system where the producer is significantly faster than the consumer? (Senior)
- Q: Explain the difference between at-least-once and exactly-once delivery semantics. Which does Kafka provide, and what are the caveats? (Senior)
- Q: Design a distributed retry pattern. How do you avoid thundering herd problems when a downstream service recovers from an outage? (Mid-level)
- Q: What is the difference between a message queue and an event streaming platform like Kafka? (Junior)
Frequently Asked Questions
What is the difference between a message queue and an event streaming platform like Kafka?
A message queue like RabbitMQ deletes a message after a consumer acknowledges it. The message exists only until it is processed — then it is gone. This is ideal for work distribution: process this payment, send this email. Kafka retains all messages as an immutable log for a configurable retention period. Any consumer group can replay from any offset at any time without affecting other groups. Use a queue for task distribution where exactly one service should process each message. Use Kafka for event streaming, audit logs, or any use case where replay capability or multiple independent consumers on the same event stream are requirements.
What happens if a consumer crashes before acknowledging a message?
With manual acknowledgement, the broker detects the lost connection via heartbeat timeout and automatically redelivers the unacknowledged message to another available consumer. This is at-least-once delivery — it guarantees no data loss but means the same message can be received more than once. This is why every consumer performing non-idempotent operations must check the event_id against a processed_events table before acting. A consumer that does not handle duplicates will produce incorrect side effects — double charges, duplicate emails — on redelivery.
Can I use a database as a message queue?
You can, but it does not scale and creates operational problems at volume. Databases are optimised for transactional reads and updates — not for the constant write-then-delete pattern of queue consumption, which causes index fragmentation, table bloat, and lock contention under load. Polling-based database queues also introduce artificial latency. That said, the Outbox pattern uses a database table as a transient event store — not as a queue itself, but as an atomicity bridge between a business transaction and a real message broker. The distinction matters: the outbox table is a staging area, not the delivery mechanism.
What is the Outbox pattern and when should I use it?
The Outbox pattern writes events to a database table inside the same transaction as your business logic. A separate poller process reads that table and publishes to the message queue, then marks the row as sent. Use it whenever a database write and a queue publish must succeed or fail together atomically. Without it, the window between a successful database commit and the subsequent queue publish can contain a process crash — leaving you with data that exists but no downstream service will ever know about. The trade-off is additional latency equal to the poller's interval and an outbox table that needs periodic cleanup, but for payment events, order creation, and any operation with financial or compliance implications, that trade-off is always worth making.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.