Junior 9 min · May 23, 2026

Kafka with Spring Boot: Complete Beginner to Production Guide

Master Apache Kafka with Spring Boot — KafkaTemplate, @KafkaListener, consumer groups, partitions, JsonSerializer, and lag monitoring in production.

N
Naren · Founder
Plain-English first. Then code. Then the interview question.
About
 ● Production Incident 🔎 Debug Guide ⚙ Triage Commands
Quick Answer
  • Use KafkaTemplate.send(topic, key, value) to produce messages from any Spring bean
  • Annotate methods with @KafkaListener(topics="orders", groupId="order-service") to consume
  • Register a NewTopic @Bean to auto-create topics with partition/replication config
  • Configure JsonSerializer/JsonDeserializer for automatic POJO serialization
  • Monitor consumer lag with kafka-consumer-groups.sh --describe to detect processing backlogs
✦ Definition~90s read
What is Kafka with Spring Boot?

Apache Kafka is a distributed event streaming platform built around an immutable, append-only log. Messages are organized into topics, each split into one or more partitions — the unit of parallelism. Every message written to a partition receives a monotonically increasing offset.

Think of Kafka like a massive, ordered logbook in a busy post office.

Consumers track their progress by committing offsets, allowing them to resume exactly where they left off after a restart or crash. Messages are retained on disk for a configurable retention period regardless of whether they have been consumed, making Kafka fundamentally different from traditional queues where messages disappear after acknowledgment.

A consumer group is a set of consumer instances sharing a group.id. Kafka assigns each partition to exactly one consumer within the group at any point in time — this is the guarantee that enables parallel, ordered processing without duplication.

Add more consumers to the group (up to the partition count) and throughput scales linearly. Add more partitions and you can scale further, though repartitioning a live topic requires careful coordination.

Spring Boot's spring-kafka library wraps the official Kafka Java client with idiomatic Spring abstractions. KafkaTemplate provides a fluent, thread-safe API for sending messages with optional callbacks for async acknowledgment. @KafkaListener integrates with Spring's message listener container infrastructure, supporting concurrent consumers, error handlers, retry policies, and transaction participation — all configurable through application.yml or dedicated @Bean definitions.

Plain-English First

Think of Kafka like a massive, ordered logbook in a busy post office. Producers drop letters (messages) into named mailboxes (topics), which are split into numbered slots (partitions) for parallel delivery. Consumer groups are teams of postal workers — each worker handles a different slot, and Kafka remembers exactly which letter each team last processed, so nothing gets lost or delivered twice.

Every high-traffic production system eventually hits the wall: the monolith that tries to do everything synchronously collapses under load. An order placed on Black Friday triggers an inventory check, a fraud scan, an email, a push notification, and a loyalty-points calculation — all in one HTTP request. The timeout cascades. Users see errors. The on-call engineer gets paged at 2 AM.

Apache Kafka solves this by decoupling producers from consumers through a distributed, durable, ordered log. Instead of calling seven downstream services synchronously, you write one message to a Kafka topic and go back to serving the next request. Each downstream service consumes at its own pace, retries independently, and scales horizontally without touching the producer.

Spring Boot's spring-kafka autoconfiguration makes wiring Kafka into a production service remarkably straightforward. A KafkaTemplate bean is ready to inject the moment you add the dependency and configure spring.kafka.bootstrap-servers. An @KafkaListener annotation turns any method into a message handler with full Spring context — transactions, security, metrics — all available.

But the ease of getting started hides the operational complexity lurking underneath. Consumer group rebalancing can pause processing for seconds. Poorly chosen partition counts bottleneck throughput. A single slow consumer can cause the entire group to fall behind, piling up unbounded lag. Without proper observability — consumer lag dashboards, offset monitoring, dead-letter queues — you won't know you have a problem until your database is backed up by hours of unprocessed events.

This guide walks through every layer: the Spring Boot autoconfiguration that bootstraps the connection, producing and consuming type-safe POJOs with JSON serialization, managing topics programmatically, and monitoring consumer lag in production using real Kafka CLI commands. Every concept is grounded in production patterns used at scale.

Setting Up Spring Boot with Kafka: Autoconfiguration Deep Dive

Spring Boot's autoconfiguration for Kafka activates the moment spring-kafka is on the classpath and spring.kafka.bootstrap-servers is set. Behind the scenes, KafkaAutoConfiguration creates a KafkaAdmin bean (for topic management), a KafkaTemplate bean (for producing), and a ConcurrentKafkaListenerContainerFactory bean (for consuming). You rarely need to declare these manually.

The producer factory uses spring.kafka.producer.* properties to build the underlying KafkaProducer. Key properties: key-serializer defaults to StringSerializer, value-serializer should be set to JsonSerializer for POJO payloads. The consumer factory mirrors this with key-deserializer and value-deserializer. Spring Boot maps spring.kafka.consumer.group-id directly to the Kafka group.id property, so all @KafkaListener methods in the application share the same group unless overridden.

For multi-environment setups, externalise all Kafka config in application.yml and use Spring profiles. In local development, a Testcontainers KafkaContainer or an embedded broker (@EmbeddedKafka from spring-kafka-test) provides a realistic test environment without a running cluster. In production, point bootstrap-servers at your MSK or Confluent endpoint and layer in SSL/SASL configuration via spring.kafka.ssl.* and spring.kafka.security.protocol.

One subtle autoconfiguration detail: KafkaAdmin.autoCreateTopics defaults to true, meaning any NewTopic @Bean you declare will be created automatically on startup if it doesn't exist, and the partition/replication configuration will be applied to new topics but NOT retroactively updated on existing ones. This is a common source of confusion when changing partition counts — Kafka doesn't support reducing partitions, and the NewTopic bean won't add partitions to an existing under-partitioned topic without explicit intervention via the AdminClient API.

NewTopic Beans Don't Update Existing Topics
If your orders topic already exists with 3 partitions and you change the NewTopic bean to 6 partitions, Spring Boot will NOT add partitions automatically. You must use kafka-topics.sh --alter or the AdminClient API. Never rely on NewTopic beans for partition changes on live topics.
Production Insight
In production, always set auto-offset-reset: earliest for new consumer groups so they process historical messages from the beginning. Use latest only for real-time-only consumers where historical data has no value. Misconfiguring this causes new deployments to silently skip months of messages.
Key Takeaway
Spring Boot autoconfiguration wires Kafka with minimal config, but understanding what each property controls prevents silent data loss in production.

Producing Messages with KafkaTemplate

The KafkaTemplate<K, V> is Spring Kafka's primary API for sending messages. It wraps the native KafkaProducer and adds Spring-friendly abstractions: a default topic, header management, and a CompletableFuture<SendResult<K, V>> return type for async result handling.

The most important overload in production is send(String topic, K key, V value). The key is critical: Kafka routes all messages with the same key to the same partition, guaranteeing order for that key. For an order service, use orderId as the key — all events for a given order (created, paid, shipped) land on the same partition and are processed in sequence by the same consumer.

The returned CompletableFuture resolves when the broker acknowledges the message according to the acks setting. With acks=all (the production default), resolution means all in-sync replicas have persisted the message — durable and safe. Never call .send() and discard the future in production code; at minimum, attach a callback to log send failures. Better yet, wire a global ProducerListener bean that records metrics on every success and failure.

For high-throughput scenarios, KafkaTemplate supports batching transparently via the producer's batch.size and linger.ms settings. Setting linger.ms=5 causes the producer to wait up to 5ms before flushing a batch, dramatically increasing throughput at the cost of slightly higher latency. In microservices with bursty traffic patterns, this trade-off is almost always worth it. Enable compression.type=snappy or lz4 alongside batching to reduce network and broker storage overhead by 60-70% for JSON payloads.

Transactional producers require a transactionIdPrefix on the ProducerFactory and a KafkaTransactionManager bean. Inside a @Transactional method, all KafkaTemplate.send() calls participate in a single atomic transaction — either all messages are committed or none are, even across multiple topics. This is the foundation for exactly-once semantics when combined with isolation.level=read_committed on consumers.

Never Swallow the Send Future
Calling kafkaTemplate.send(topic, key, value) without attaching a callback or calling .get() means producer errors (broker unreachable, topic not found, serialization failure) are silently discarded. In production, every send future must have a whenComplete callback that at minimum logs the failure and increments an error metric.
Production Insight
For order-critical messages, always configure a ProducerInterceptor that adds a unique message ID header. This enables idempotent consumers to detect and skip duplicates without relying solely on Kafka's idempotent producer setting.
Key Takeaway
Choose your message key based on the ordering unit (orderId, userId, sessionId) — it directly determines which partition receives the message and which consumer processes it.

Consuming Messages with @KafkaListener

The @KafkaListener annotation transforms a Spring-managed method into a Kafka message handler. The listener container factory polls the broker on a background thread, deserializes the record, and invokes your method. Spring's full DI context is active — you can inject repositories, call other services, and participate in transactions exactly as you would in a REST controller.

The method signature is flexible. Accept ConsumerRecord<K, V> for access to offset, partition, headers, and timestamp. Accept just the value type for clean domain code. Inject @Header(KafkaHeaders.RECEIVED_PARTITION) or @Header(KafkaHeaders.OFFSET) for individual header values. Accept Acknowledgment ack when using AckMode.MANUAL to control exactly when the offset is committed.

The concurrency attribute on @KafkaListener (or on the container factory) controls how many consumer threads the application runs. With concurrency=3 on a topic with 6 partitions, Spring starts 3 KafkaMessageListenerContainer instances, each consuming 2 partitions. Setting concurrency higher than the partition count wastes threads — the excess consumers sit idle waiting for a rebalance that gives them a partition. A common production pattern is to set partition count equal to the maximum expected consumer concurrency.

Partition assignment and group coordination happen automatically, but you can influence behavior with @TopicPartition on @KafkaListener for static partition assignment — useful for consumers that need to maintain local state (e.g., windowed aggregations) without rebalance disruption. Static assignment bypasses the group coordinator and requires careful management when scaling, so use it judiciously.

Error handling in @KafkaListener follows a layered model: the method itself can throw, triggering the container's error handler. The default DefaultErrorHandler retries with exponential backoff and then sends the record to a dead-letter topic (if configured). Understanding this pipeline is essential — an unconfigured error handler means a poison message will retry infinitely, blocking the partition.

Concurrency Must Not Exceed Partition Count
Setting concurrency=10 on a topic with 3 partitions starts 10 threads but only 3 will ever receive messages. The other 7 consume memory and CPU for zero throughput benefit. Always align concurrency with your topic's partition count.
Production Insight
Use separate groupId values for different business purposes consuming the same topic (e.g., order-service for fulfillment, analytics-service for reporting). Each group maintains independent offsets and scales independently — this is Kafka's broadcast pattern.
Key Takeaway
@KafkaListener is powerful but requires understanding the concurrency-to-partition relationship and error handler pipeline to avoid silent message loss or infinite retry loops.

JSON Serialization: JsonSerializer and JsonDeserializer

Raw Kafka messages are byte arrays. Spring Kafka's JsonSerializer and JsonDeserializer use Jackson to convert between Java objects and JSON bytes automatically, making your @KafkaListener methods receive typed POJOs instead of byte arrays.

On the producer side, JsonSerializer serializes the POJO using Jackson's ObjectMapper. By default, it adds a spring_json_header_types message header containing the fully qualified Java class name of the value. This header is read by JsonDeserializer on the consumer to determine the target deserialization type without explicit configuration — convenient for same-codebase producer/consumer pairs.

The header-based type resolution breaks in cross-service communication. If the producer is a Python service or a different Java package structure, the header will reference a class that doesn't exist in the consumer's classpath. Configure explicit type mapping using spring.json.type.mapping=order:com.example.events.OrderEvent on both sides — this maps a logical alias to the concrete class, decoupling producer and consumer class hierarchies.

For schema evolution, use Jackson's @JsonIgnoreProperties(ignoreUnknown = true) on your event classes. This allows consumers to handle new fields added by producers without throwing deserialization errors — a critical resilience pattern for independently deployable services. Avoid removing fields or changing field types without a coordinated migration plan, as Kafka retains historical messages that will be deserialized by both old and new consumers.

Customize the shared ObjectMapper by declaring an ObjectMapper @Bean with JavaTimeModule registered (for LocalDateTime, Instant, etc.) and WRITE_DATES_AS_TIMESTAMPS disabled. Spring Kafka's JsonSerializer and JsonDeserializer pick up the primary ObjectMapper bean automatically in Spring Boot 3.x, ensuring consistent date serialization across your entire application.

Always Set Trusted Packages in Production
Without spring.json.trusted.packages or addTrustedPackages(), JsonDeserializer throws IllegalArgumentException: The class ... is not in the trusted packages. Avoid setting * (trust all) in production — enumerate the specific packages your consumers need to deserialize from.
Production Insight
Define a canonical event schema module (a shared library) that both producer and consumer depend on. Version it with SemVer and release additive-only changes. This eliminates class-not-found deserialization errors across service boundaries.
Key Takeaway
Decouple producer and consumer class paths using spring.json.type.mapping and annotate all event records with @JsonIgnoreProperties(ignoreUnknown = true) for safe schema evolution.

Topic Management with NewTopic Bean and KafkaAdmin

Managing Kafka topics programmatically through Spring eliminates the need for manual kafka-topics.sh commands in deployment pipelines. The NewTopic @Bean pattern integrates topic creation into the application lifecycle — on startup, KafkaAdmin checks each NewTopic bean against the broker and creates missing topics.

Partition count is the most consequential topic configuration decision. Partitions determine max parallelism: a topic with 6 partitions can be consumed by at most 6 consumers in the same group simultaneously. Under-partitioning is a performance ceiling that cannot be easily raised in production without a migration plan. Over-partitioning wastes broker resources (each partition requires memory, file handles, and replication overhead). A practical rule: estimate your peak consumer throughput requirement, divide by throughput per consumer, and multiply by 2 for headroom.

Replication factor should be 3 in any production cluster with at least 3 brokers. This tolerates one broker failure without data loss and maintains availability for reads and writes. Setting min.insync.replicas=2 on the topic alongside acks=all on the producer ensures that at least 2 replicas must acknowledge each write before the producer considers it complete — a strong durability guarantee.

Retention configuration (retention.ms, retention.bytes) controls how long messages remain available for consumption after being written. Event sourcing use cases often need retention.ms=-1 (infinite retention) or compaction (cleanup.policy=compact) so the latest value per key is always retained. For ephemeral event streams, 7-day retention is a common default that balances storage cost with the ability to reprocess recent history after an incident.

Beyond NewTopic beans, KafkaAdmin exposes a programmatic API for dynamic topic management at runtime — useful for multi-tenant architectures where tenant onboarding triggers topic creation, or for integration tests that need to create and delete topics per test run.

Partition Count Cannot Be Reduced
Once a topic is created with N partitions, you can only increase — never decrease — partition count. Increasing partitions redistributes future messages but does NOT move existing messages, which can break ordering guarantees for keys that relied on the old partition assignment. Plan partition counts carefully before going to production.
Production Insight
Use Spring profiles to set different partition counts per environment: 1 partition locally, 3 in staging, 6-12 in production. Inject @Value("${kafka.topics.orders.partitions:6}") into the NewTopic bean to keep environment differences in configuration, not code.
Key Takeaway
Invest time in partition count planning before production — it's the one Kafka configuration you can only make harder to change later, never easier.

Consumer Lag Monitoring in Production

Consumer lag is the gap between the latest message offset in a partition and the offset the consumer group has committed. A lag of zero means the consumer is keeping up in real time. Growing lag signals that consumers are processing slower than producers are writing — a leading indicator of problems before they become user-visible.

The canonical CLI command for lag inspection is kafka-consumer-groups.sh --bootstrap-server <broker> --describe --group <group>. It shows each partition's LOG-END-OFFSET (latest), CURRENT-OFFSET (committed), LAG (the difference), and which consumer instance currently owns the partition. Run this during incidents to instantly identify whether lag is uniform (a throughput problem) or isolated to specific partitions (possibly a poison message or a crashed consumer).

For ongoing monitoring, expose Kafka consumer metrics through Spring Boot Actuator and Micrometer. Add spring.kafka.listener.observation-enabled=true and the Micrometer registry of your choice. The kafka.consumer.fetch-manager.records-lag gauge updates per-poll and flows into Prometheus/Datadog/CloudWatch without additional instrumentation. Set an alert rule: if lag on any critical consumer group exceeds N * (acceptable processing time in seconds), page the on-call engineer.

In Kubernetes environments, the Kafka Lag Exporter (Lightbend) or Confluent's built-in metrics reporter surfaces per-consumer-group lag as a Prometheus metric that integrates with KEDA for autoscaling. Configure a ScaledObject that adds consumer replicas when lag exceeds a threshold and removes them when lag drops — this creates a self-regulating consumer pool that handles traffic bursts without manual intervention.

Beyond lag, monitor records-consumed-rate, fetch-latency-avg, and commit-latency-avg as leading indicators. A spike in fetch-latency-avg suggests broker-side pressure or network issues. A spike in commit-latency-avg with growing lag suggests the consumer is processing fast but offset commits are slow — often caused by a blocked @Transactional boundary or a network partition to ZooKeeper (pre-KRaft) or the coordinator broker.

Lag of Zero Does Not Mean Healthy Processing
A consumer can have zero lag while silently discarding messages (e.g., catching exceptions and committing offsets without actually processing). Always track business-level metrics (orders processed per second, payment events handled) alongside Kafka lag. Lag is a delivery metric, not a processing success metric.
Production Insight
Set up a KEDA ScaledObject on your consumer Deployment with a kafka trigger pointing at your consumer group and topic. When lag exceeds 500 messages, KEDA automatically adds consumer pods; when lag drops to zero, it scales back down. This eliminates manual scaling entirely for most bursty workloads.
Key Takeaway
Consumer lag is the single most important Kafka operational metric. Alert on it early, automate response with KEDA, and always correlate it with business-level processing success metrics.
● Production incidentPOST-MORTEMseverity: high

Consumer Group Rebalance Storm Brings Order Processing to a Halt

Symptom
After a rolling deployment of the order-service, consumer lag on the orders topic spiked from near-zero to 12,000 messages. The Grafana dashboard showed all consumer instances reporting no progress for exactly 4 minutes before recovering. No errors appeared in application logs.
Assumption
The team assumed the lag spike was caused by the new code being slower at processing orders — a performance regression in the new release.
Root cause
The session.timeout.ms was left at the Kafka default of 10 seconds, and max.poll.interval.ms was 5 minutes. During rolling deployment, each pod restart triggered a group rebalance. Because the team deployed all 6 pods in rapid succession without waiting for stabilization, the group entered a continuous rebalance loop — no consumer could complete a full rebalance cycle before the next pod restart initiated another one. During rebalance, no consumer processes messages.
Fix
Set spring.kafka.consumer.properties.session.timeout.ms=45000 and stagger pod restarts with a 60-second wait between each. Alternatively, use CooperativeStickyAssignor (set partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor) which performs incremental rebalances, allowing unaffected partitions to keep processing during the transition.
Key lesson
  • Rebalance storms are invisible in application logs but catastrophic for throughput.
  • Monitor the kafka.consumer.fetch-manager.records-lag-max metric and alert when lag exceeds SLA-bound thresholds.
  • Always configure CooperativeStickyAssignor in services that perform rolling deployments.
Production debug guideSymptom → root cause → fix5 entries
Symptom · 01
Consumer lag grows indefinitely but no errors in logs
Fix
Check if the consumer group is stuck in a rebalance loop using kafka-consumer-groups.sh --describe. If the 'STATE' column shows 'PreparingRebalance' or 'CompletingRebalance' for more than 30 seconds, a rebalance storm is likely. Reduce deployment cadence and switch to CooperativeStickyAssignor. Also check max.poll.records — if each poll takes longer than max.poll.interval.ms, the broker evicts the consumer from the group silently.
Symptom · 02
Messages appear to be processed twice (duplicate processing)
Fix
Duplicate processing occurs when the consumer crashes after processing but before committing the offset. Verify enable.auto.commit=false and that your @KafkaListener method (or AckMode) explicitly acknowledges after successful processing. For idempotent consumption, store a processed message ID in Redis or a database unique constraint before doing business logic. Never commit offsets inside a try-catch that swallows exceptions.
Symptom · 03
KafkaTemplate.send() returns a future but messages never appear in the topic
Fix
The send future completes on the producer's I/O thread, not the calling thread. Call .get(5, TimeUnit.SECONDS) on the returned CompletableFuture to surface producer errors that are otherwise silently swallowed. Check acks configuration — with acks=0, the broker never confirms receipt. Also verify the topic exists; with spring.kafka.admin.auto-create-topics=false, sending to a nonexistent topic fails silently in some broker configurations.
Symptom · 04
JsonDeserializer throws ClassCastException or deserialization errors
Fix
Kafka's JsonDeserializer embeds the target type in the message header by default (spring_json_header_types). If producer and consumer use different package structures or class names, deserialization fails. Set spring.kafka.consumer.properties.spring.json.trusted.packages=* and explicitly configure spring.kafka.consumer.value-deserializer-class=org.springframework.kafka.support.serializer.JsonDeserializer with spring.kafka.consumer.properties.spring.json.value.default.type=com.example.OrderEvent to bypass header-based type resolution.
Symptom · 05
Topic has fewer active consumers than partitions
Fix
Run kafka-consumer-groups.sh --describe --group <group> and inspect the 'CONSUMER-ID' column. Partitions assigned to a blank consumer-ID are unassigned — no consumer is processing them. This happens when the consumer count drops below partition count due to crashes or misconfigured replica counts. Scale up consumer pods to match partition count, or reduce concurrency in @KafkaListener if running multiple virtual consumers per pod.
★ Kafka Debug Cheat SheetRun these commands on any Kafka host or from a pod with kafka-tools installed. Replace broker address and group/topic names as needed.
Consumer lag is high — need current offset vs end offset
Immediate action
Describe the consumer group to see lag per partition
Commands
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group order-service
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group order-service --verbose
Fix now
Compare LOG-END-OFFSET vs CURRENT-OFFSET per partition. Scale consumers if lag is uniform across all partitions. If lag is only on specific partitions, check for poison messages or slow processing on that consumer instance.
Not sure if messages are reaching the topic+
Immediate action
Consume from the beginning of the topic to verify message presence
Commands
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic orders --from-beginning --max-messages 10
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders
Fix now
If the topic describe shows 0 partitions or the consumer returns nothing, the topic may not exist or messages are being produced to a different topic name. Check spring.kafka.template.default-topic and producer configuration.
Need to reset consumer group offset to reprocess messages+
Immediate action
Stop all consumers in the group first, then reset offsets
Commands
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group order-service --reset-offsets --to-earliest --topic orders --execute
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group order-service --reset-offsets --to-datetime 2026-05-23T00:00:00.000 --topic orders --execute
Fix now
Use --to-earliest to reprocess all messages or --to-datetime for a specific point in time. Always verify with --dry-run first. Ensure idempotent processing before reprocessing to avoid duplicate side effects.
Topic throughput or partition distribution looks uneven+
Immediate action
Check partition leader distribution and replication status
Commands
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders
kafka-log-dirs.sh --bootstrap-server localhost:9092 --topic-list orders --describe | grep -v '^{' | head -50
Fix now
Uneven partition leaders indicate broker imbalance. Run kafka-leader-election.sh or enable auto.leader.rebalance.enable=true. If one partition has significantly more data, your key distribution is skewed — review the message key strategy in KafkaTemplate.send(topic, key, value).
Kafka vs. Traditional Message Queue vs. REST Synchronous Call
AspectKafka TopicRabbitMQ/SQS QueueREST Synchronous
Message retentionConfigurable (days to forever)Deleted after ACKNot stored
Consumer scalabilityUp to partition count per groupUnlimited consumers competingLimited by upstream capacity
Message orderingGuaranteed per partition/keyBest-effort (FIFO queues: per queue)Request order only
Multiple consumersYes — independent consumer groupsNo — competing consumersNo — one response per call
Replay historical dataYes — reset offsetsNo — messages gone after ACKNo
Producer-consumer couplingNone — async, decoupledMinimal — asyncStrong — synchronous
Throughput ceilingMillions/sec (horizontal scale)Tens of thousands/secLimited by service capacity
Best forEvent streaming, audit logs, CDCTask queues, work distributionCRUD APIs, request-response

Key takeaways

1
Kafka decouples producers and consumers through a durable, partitioned log
the key architectural primitive that enables independent scaling and fault isolation
2
The message key is not just metadata
it controls partition assignment and therefore ordering guarantees and consumer locality for stateful processing
3
Consumer group rebalances pause all processing in the group; use CooperativeStickyAssignor and stagger deployments to minimize rebalance frequency and duration
4
Always configure an error handler (DefaultErrorHandler with DeadLetterPublishingRecoverer)
a poison message on an unprotected listener blocks the entire partition indefinitely
5
Consumer lag is the primary production health metric for Kafka
instrument it in Prometheus, alert on thresholds, and automate scaling with KEDA

Common mistakes to avoid

7 patterns
×

Using the same consumer group for different business purposes

Symptom
Two different services fighting over messages — one service processes a message, the other never sees it
Fix
Assign unique groupId values to each consuming service. Messages are broadcast to all groups but load-balanced within a group. analytics-service and order-service consuming the same orders topic need different group IDs.
×

Setting partition count lower than max expected consumer concurrency

Symptom
Throughput plateaus despite adding more consumer pods — some pods sit idle with no partitions assigned
Fix
Set partition count to the maximum number of consumers you'll ever run concurrently. A 3-partition topic can never utilize more than 3 consumers in one group, regardless of how many pods you deploy.
×

Not handling the send future result in KafkaTemplate.send()

Symptom
Messages silently disappear under broker pressure, network issues, or topic-not-found errors with no errors in application logs
Fix
Always attach .whenComplete((result, ex) -> { if (ex != null) { log.error(...); } }) to the returned CompletableFuture. For critical messages, use .get(timeout) for synchronous confirmation.
×

Leaving enable.auto.commit=true with manual business logic

Symptom
Duplicate processing after consumer restarts — messages are acknowledged before your database write completes
Fix
Set spring.kafka.consumer.enable-auto-commit=false and use AckMode.MANUAL_IMMEDIATE. Commit the offset only after your business logic succeeds. This prevents the offset from advancing past messages you haven't actually processed.
×

Not configuring an error handler on the listener container

Symptom
A single malformed message (poison pill) causes the consumer to retry infinitely, blocking the partition and growing lag to the millions
Fix
Configure DefaultErrorHandler with a FixedBackOff or ExponentialBackOff and a dead-letter topic fallback. After N retries, the message is sent to a .DLT topic and processing of subsequent messages resumes.
×

Running consumer concurrency higher than partition count

Symptom
Excess consumer threads are started but receive no messages; CPU and memory wasted; occasional NullPointerExceptions from unassigned consumer instances
Fix
Set concurrency in @KafkaListener or the container factory to be <= partition count. Idle consumer threads are not just wasteful — they participate in rebalances, adding latency.
×

Forgetting to trust packages in JsonDeserializer

Symptom
Application starts but throws IllegalArgumentException: The class X is not in the trusted packages on first message received
Fix
Add spring.kafka.consumer.properties.spring.json.trusted.packages=com.example.events to application.yml. For cross-org topics, enumerate all producer packages explicitly rather than using *.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01JUNIOR
What is a Kafka consumer group and how does it enable parallel processin...
Q02JUNIOR
How does message ordering work in Kafka, and how do you ensure related m...
Q03SENIOR
What happens during a consumer group rebalance, and how can you minimize...
Q04SENIOR
Explain the difference between acks=0, acks=1, and acks=all in Kafka pro...
Q05SENIOR
How does Spring Kafka's DefaultErrorHandler work, and what happens to a ...
Q06SENIOR
How would you implement exactly-once semantics in a Kafka-based Spring B...
Q07SENIOR
How do you handle schema evolution in Kafka events when multiple service...
Q08SENIOR
You see consumer lag growing on one specific partition but not others. W...
Q09JUNIOR
What is the role of the message key in Kafka, and what are the consequen...
Q01 of 09JUNIOR

What is a Kafka consumer group and how does it enable parallel processing?

ANSWER
A consumer group is a set of consumer instances that share a group.id. Kafka assigns each topic partition to exactly one consumer within the group at any time, enabling parallel processing where each consumer handles a subset of partitions. Adding consumers to the group (up to the partition count) increases throughput linearly. Multiple groups can consume the same topic independently — each group maintains its own committed offsets, effectively broadcasting the same events to multiple downstream systems.
FAQ · 6 QUESTIONS

Frequently Asked Questions

01
How many partitions should I create for a new topic?
02
What is the difference between @KafkaListener and a Spring Integration Kafka channel?
03
How do I test @KafkaListener methods without a running Kafka broker?
04
When should I use Kafka vs. RabbitMQ for a new project?
05
How do I handle Kafka in a transactional context with a database?
06
What does auto-offset-reset=earliest vs latest mean in production?
🔥

That's Messaging. Mark it forged?

9 min read · try the examples if you haven't

Previous
Inter-service Communication in Microservices
1 / 5 · Messaging
Next
Kafka Producer and Consumer in Spring Boot