Intermediate 8 min · May 23, 2026

Kafka with Spring Boot: Complete Beginner to Production Guide

Q: How many partitions should I create for a new topic?

Start with the maximum number of consumers you plan to run concurrently in one group, because the partition count is that group's parallelism ceiling. A common formula: partitions = (target throughput in MB/s) / (throughput per consumer in MB/s), plus 50% headroom over your estimated peak. For most microservices starting out, 6 partitions with replication factor 3 is a safe default that allows scaling to 6 concurrent consumers while tolerating broker failures. Remember: you can add partitions later but cannot remove them, and adding partitions changes the key-to-partition mapping, so new messages for a key may land on a different partition than its earlier messages, which breaks per-key ordering across the change.
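The formula above can be sketched as a small helper. The class and method names are hypothetical, and the throughput figures in `main` are made-up inputs rather than measurements.

```java
// Hypothetical helper for the sizing rule of thumb; not part of any Kafka or Spring API.
final class PartitionSizing {

    /**
     * partitions = ceil((target / perConsumer) * (1 + headroom)), never below 1.
     * headroom is a fraction, e.g. 0.5 for 50%.
     */
    static int partitionsFor(double targetMbPerSec, double perConsumerMbPerSec, double headroom) {
        if (targetMbPerSec <= 0 || perConsumerMbPerSec <= 0 || headroom < 0) {
            throw new IllegalArgumentException("throughputs must be positive, headroom non-negative");
        }
        double consumersNeeded = targetMbPerSec / perConsumerMbPerSec;
        return Math.max(1, (int) Math.ceil(consumersNeeded * (1.0 + headroom)));
    }

    public static void main(String[] args) {
        // Illustrative inputs: 30 MB/s target, 10 MB/s per consumer, 50% headroom.
        System.out.println(partitionsFor(30.0, 10.0, 0.5)); // prints 5
    }
}
```

Treat the result as a starting point: per-consumer throughput has to come from a load test of your own listener, not from a guess.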

Q: What is the difference between @KafkaListener and a Spring Integration Kafka channel?

`@KafkaListener` is a lightweight annotation-based consumer that integrates Kafka messages directly into your Spring service methods. It's the right choice for most microservice use cases. Spring Integration Kafka channels are part of the heavier Spring Integration framework, designed for complex message routing, transformation pipelines, and enterprise integration patterns (EIP). Use Spring Integration when you need sophisticated routing logic, protocol bridging, or you're already using Spring Integration for other messaging. For straightforward consume-and-process patterns, `@KafkaListener` is simpler and more maintainable.

Q: How do I test @KafkaListener methods without a running Kafka broker?

Use `@EmbeddedKafka` from `spring-kafka-test`. Annotate your test class with `@EmbeddedKafka(partitions = 1, topics = {"orders"})` and add `@SpringBootTest`. The embedded broker starts in-process, supports full producer/consumer interaction, and is cleaned up after the test. For unit tests that don't need broker interaction, use `@MockBean KafkaTemplate` and call the listener method directly with a hand-built `ConsumerRecord`. Because listeners run asynchronously, never assert immediately after sending: wait with a `CountDownLatch` or `Awaitility`, or subscribe a test consumer with `EmbeddedKafkaBroker.consumeFromAnEmbeddedTopic()` and read the records you expect.

Q: When should I use Kafka vs. RabbitMQ for a new project?

Choose Kafka when: you need message replay (reprocessing historical events), you need high throughput (millions of messages/second), you have multiple independent consumers of the same event stream, or you're building event sourcing / CDC pipelines. Choose RabbitMQ when: you need flexible routing logic (topic exchanges, header matching), your primary use case is task queues with competing consumers, message volume is moderate, or you need priority queues and per-message TTL. RabbitMQ is operationally simpler for small-to-medium scale. Kafka excels at scale and durability but requires more operational investment.

Q: How do I handle Kafka in a transactional context with a database?

For consume-then-write-to-DB patterns, older guides recommend the `ChainedKafkaTransactionManager`, which wraps both a `KafkaTransactionManager` and a `JpaTransactionManager`. It has been deprecated since Spring for Apache Kafka 2.7, and it never provided a truly atomic commit: the two transactions are committed one after the other, so a failure between the commits can leave the database write and the Kafka offset out of step. Make the listener idempotent so that a redelivered record is harmless. For produce-then-consume patterns, be aware that Kafka transactions only span Kafka operations; database operations are not part of the Kafka transaction. The safest pattern is outbox: write your event to a database table in the same transaction as your business data, then a separate process (Debezium, a scheduler) reads the outbox and publishes to Kafka, providing at-least-once delivery tied to your DB transaction.

Q: What does auto-offset-reset=earliest vs latest mean in production?

`earliest` (the behaviour you get from the console consumer's `--from-beginning` flag) means a consumer group with no committed offset starts reading from the oldest retained message in the topic, including all historical retained messages. `latest` means such a group starts with messages published after its consumers first join and skips everything older. The setting applies only when there is no valid committed offset (a new group, or an offset that has aged out of retention); a group that has committed offsets resumes from them regardless. In production, `earliest` is usually safer for new deployments: you won't miss events that arrived while the consumer was being deployed. Use `latest` for monitoring or alerting consumers where historical events have no operational value, like a live dashboard showing current system state.
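The decision can be pictured with a small model. This is a simplified sketch of the rule, not the Kafka client's actual code, and all names in it are hypothetical.

```java
import java.util.OptionalLong;

// Simplified model of where a consumer starts reading a partition.
final class OffsetResetDemo {

    enum ResetPolicy { EARLIEST, LATEST }

    /**
     * A valid committed offset always wins. The reset policy is consulted only
     * when there is no committed offset or it falls outside the retained log.
     */
    static long startingOffset(OptionalLong committed, ResetPolicy policy,
                               long logStartOffset, long logEndOffset) {
        if (committed.isPresent()
                && committed.getAsLong() >= logStartOffset
                && committed.getAsLong() <= logEndOffset) {
            return committed.getAsLong();
        }
        return policy == ResetPolicy.EARLIEST ? logStartOffset : logEndOffset;
    }

    public static void main(String[] args) {
        // New group, log retains offsets 100..500.
        System.out.println(startingOffset(OptionalLong.empty(), ResetPolicy.EARLIEST, 100, 500)); // prints 100
        System.out.println(startingOffset(OptionalLong.empty(), ResetPolicy.LATEST, 100, 500));   // prints 500
        // Existing group: the policy is ignored.
        System.out.println(startingOffset(OptionalLong.of(250), ResetPolicy.LATEST, 100, 500));   // prints 250
    }
}
```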

Master Apache Kafka with Spring Boot: KafkaTemplate, @KafkaListener, consumer groups, partitions, JsonSerializer, and lag monitoring in production.

Naren Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Drawn from code that ran under real load.

✓ Production tested
Last updated: July 04, 2026
1,697 articles · all by Naren

Before you start ⏱ 25 min

✓ Solid grasp of fundamentals
✓ Comfortable reading code examples
✓ Basic production concepts

⚡ Quick Answer

Use KafkaTemplate.send(topic, key, value) to produce messages from any Spring bean
Annotate methods with @KafkaListener(topics="orders", groupId="order-service") to consume
Register a NewTopic @Bean to auto-create topics with partition/replication config
Configure JsonSerializer/JsonDeserializer for automatic POJO serialization
Monitor consumer lag with kafka-consumer-groups.sh --describe to detect processing backlogs

✦ Definition · ~90s read

What is Apache Kafka with Spring Boot?

Apache Kafka is a distributed event streaming platform built around an immutable, append-only log. Messages are organized into topics, each split into one or more partitions — the unit of parallelism. Every message written to a partition receives a monotonically increasing offset.

Consumers track their progress by committing offsets, allowing them to resume exactly where they left off after a restart or crash. Messages are retained on disk for a configurable retention period regardless of whether they have been consumed, making Kafka fundamentally different from traditional queues where messages disappear after acknowledgment.

A consumer group is a set of consumer instances sharing a group.id. Kafka assigns each partition to exactly one consumer within the group at any point in time — this is the guarantee that enables parallel, ordered processing without duplication.

Add more consumers to the group (up to the partition count) and throughput scales linearly. Add more partitions and you can scale further, though repartitioning a live topic requires careful coordination.

Spring Boot's spring-kafka library wraps the official Kafka Java client with idiomatic Spring abstractions. KafkaTemplate provides a fluent, thread-safe API for sending messages with optional callbacks for async acknowledgment. @KafkaListener integrates with Spring's message listener container infrastructure, supporting concurrent consumers, error handlers, retry policies, and transaction participation — all configurable through application.yml or dedicated @Bean definitions.

Plain-English First

Think of Kafka like a massive, ordered logbook in a busy post office. Producers drop letters (messages) into named mailboxes (topics), which are split into numbered slots (partitions) for parallel delivery. Consumer groups are teams of postal workers — each worker handles a different slot, and Kafka remembers exactly which letter each team last processed, so nothing gets lost or delivered twice.

Every high-traffic production system eventually hits the wall: the monolith that tries to do everything synchronously collapses under load. An order placed on Black Friday triggers an inventory check, a fraud scan, an email, a push notification, and a loyalty-points calculation — all in one HTTP request. The timeout cascades. Users see errors. The on-call engineer gets paged at 2 AM.

Apache Kafka solves this by decoupling producers from consumers through a distributed, durable, ordered log. Instead of calling seven downstream services synchronously, you write one message to a Kafka topic and go back to serving the next request. Each downstream service consumes at its own pace, retries independently, and scales horizontally without touching the producer.

Spring Boot's spring-kafka autoconfiguration makes wiring Kafka into a production service remarkably straightforward. A KafkaTemplate bean is ready to inject the moment you add the dependency and configure spring.kafka.bootstrap-servers. An @KafkaListener annotation turns any method into a message handler with full Spring context — transactions, security, metrics — all available.

But the ease of getting started hides the operational complexity lurking underneath. Consumer group rebalancing can pause processing for seconds. Poorly chosen partition counts bottleneck throughput. A single slow consumer can cause the entire group to fall behind, piling up unbounded lag. Without proper observability — consumer lag dashboards, offset monitoring, dead-letter queues — you won't know you have a problem until your database is backed up by hours of unprocessed events.

This guide walks through every layer: the Spring Boot autoconfiguration that bootstraps the connection, producing and consuming type-safe POJOs with JSON serialization, managing topics programmatically, and monitoring consumer lag in production using real Kafka CLI commands. Every concept is grounded in production patterns used at scale.

Setting Up Spring Boot with Kafka: Autoconfiguration Deep Dive

Spring Boot's autoconfiguration for Kafka activates as soon as spring-kafka is on the classpath. The spring.kafka.bootstrap-servers property defaults to localhost:9092, so set it explicitly for anything other than a local broker. Behind the scenes, KafkaAutoConfiguration creates a KafkaAdmin bean (for topic management), a KafkaTemplate bean (for producing), and a ConcurrentKafkaListenerContainerFactory bean (for consuming). You rarely need to declare these manually.

The producer factory uses spring.kafka.producer.* properties to build the underlying KafkaProducer. Key properties: key-serializer defaults to StringSerializer, value-serializer should be set to JsonSerializer for POJO payloads. The consumer factory mirrors this with key-deserializer and value-deserializer. Spring Boot maps spring.kafka.consumer.group-id directly to the Kafka group.id property, so all @KafkaListener methods in the application share the same group unless overridden.

For multi-environment setups, externalise all Kafka config in application.yml and use Spring profiles. In local development, a Testcontainers KafkaContainer or an embedded broker (@EmbeddedKafka from spring-kafka-test) provides a realistic test environment without a running cluster. In production, point bootstrap-servers at your MSK or Confluent endpoint and layer in SSL/SASL configuration via spring.kafka.ssl.* and spring.kafka.security.protocol.

One subtle autoconfiguration detail: KafkaAdmin's autoCreate property defaults to true, meaning any NewTopic @Bean you declare is created automatically on startup if the topic doesn't exist. For a topic that already exists, the bean's replication factor and topic configs are not applied retroactively by default. Partition counts are a common source of confusion: Kafka doesn't support reducing partitions, and whether KafkaAdmin increases the partition count of an existing topic to match the bean depends on the spring-kafka version (recent versions attempt the increase and log it at INFO), so check the startup log instead of assuming either outcome.

Don't Rely on NewTopic Beans to Change Existing Topics

If your orders topic already exists with 3 partitions and you change the NewTopic bean to 6 partitions, do not assume the result: depending on the spring-kafka version, the admin may attempt the increase at startup or only log the mismatch. Make partition changes on live topics deliberately, with kafka-topics.sh --alter or the AdminClient API, and never as a side effect of a deployment.

Production Insight

In production, always set auto-offset-reset: earliest for new consumer groups so they process historical messages from the beginning. Use latest only for real-time-only consumers where historical data has no value. Misconfiguring this causes new deployments to silently skip months of messages.

Key Takeaway

Spring Boot autoconfiguration wires Kafka with minimal config, but understanding what each property controls prevents silent data loss in production.

thecodeforge.io

Spring Boot Kafka Basics

Producing Messages with KafkaTemplate

The KafkaTemplate<K, V> is Spring Kafka's primary API for sending messages. It wraps the native KafkaProducer and adds Spring-friendly abstractions: a default topic, header management, and a CompletableFuture<SendResult<K, V>> return type for async result handling.

The most important overload in production is send(String topic, K key, V value). The key is critical: Kafka routes all messages with the same key to the same partition, guaranteeing order for that key. For an order service, use orderId as the key — all events for a given order (created, paid, shipped) land on the same partition and are processed in sequence by the same consumer.

The returned CompletableFuture resolves when the broker acknowledges the message according to the acks setting. With acks=all (the default for Kafka clients since 3.0, and the setting to use in production), the leader acknowledges only after all in-sync replicas have received the message, which is the durable and safe choice. Never call .send() and discard the future in production code; at minimum, attach a callback to log send failures. Better yet, wire a global ProducerListener bean that records metrics on every success and failure.

For high-throughput scenarios, KafkaTemplate supports batching transparently via the producer's batch.size and linger.ms settings. Setting linger.ms=5 causes the producer to wait up to 5ms before flushing a batch, which can increase throughput considerably at the cost of slightly higher latency. In microservices with bursty traffic patterns, this trade-off is almost always worth it. Enable compression.type=snappy or lz4 alongside batching to reduce network and broker storage overhead; repetitive JSON payloads usually compress well, but measure the ratio with your own data.

Transactional producers require a transactionIdPrefix on the ProducerFactory and a KafkaTransactionManager bean. Inside a @Transactional method, all KafkaTemplate.send() calls participate in a single atomic transaction — either all messages are committed or none are, even across multiple topics. This is the foundation for exactly-once semantics when combined with isolation.level=read_committed on consumers.

Never Swallow the Send Future

Calling kafkaTemplate.send(topic, key, value) without attaching a callback or calling .get() means producer errors (broker unreachable, topic not found, serialization failure) are silently discarded. In production, every send future must have a whenComplete callback that at minimum logs the failure and increments an error metric.
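A minimal sketch of the callback described above, assuming Spring for Apache Kafka 3.x (where send returns a CompletableFuture) and a String payload. The class name OrderEventPublisher and the topic name are illustrative, and the metric increment is left as a comment because the metrics setup is not shown in this guide.

```java
import java.util.concurrent.CompletableFuture;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.stereotype.Service;

@Service
public class OrderEventPublisher {

    private static final Logger log = LoggerFactory.getLogger(OrderEventPublisher.class);

    private final KafkaTemplate<String, String> kafkaTemplate;

    public OrderEventPublisher(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    /** Sends the payload keyed by orderId and never discards the outcome. */
    public CompletableFuture<SendResult<String, String>> publish(String orderId, String payload) {
        return kafkaTemplate.send("orders", orderId, payload)
                .whenComplete((result, ex) -> {
                    if (ex != null) {
                        // Broker unreachable, serialization failure, timeout, and so on.
                        log.error("Send failed for order {}", orderId, ex);
                        // Increment an error counter here (metrics wiring not shown).
                    } else {
                        log.debug("Sent order {} to partition {} at offset {}", orderId,
                                result.getRecordMetadata().partition(),
                                result.getRecordMetadata().offset());
                    }
                });
    }
}
```

Returning the future lets a caller that needs synchronous confirmation block on it, while fire-and-forget callers still get the failure logged.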

Production Insight

For order-critical messages, always configure a ProducerInterceptor that adds a unique message ID header. This enables idempotent consumers to detect and skip duplicates without relying solely on Kafka's idempotent producer setting.

Key Takeaway

Choose your message key based on the ordering unit (orderId, userId, sessionId) — it directly determines which partition receives the message and which consumer processes it.

Consuming Messages with @KafkaListener

The @KafkaListener annotation transforms a Spring-managed method into a Kafka message handler. The listener container factory polls the broker on a background thread, deserializes the record, and invokes your method. Spring's full DI context is active — you can inject repositories, call other services, and participate in transactions exactly as you would in a REST controller.

The method signature is flexible. Accept ConsumerRecord<K, V> for access to offset, partition, headers, and timestamp. Accept just the value type for clean domain code. Inject @Header(KafkaHeaders.RECEIVED_PARTITION) or @Header(KafkaHeaders.OFFSET) for individual header values. Accept Acknowledgment ack when using AckMode.MANUAL to control exactly when the offset is committed.

The concurrency attribute on @KafkaListener (or on the container factory) controls how many consumer threads the application runs. With concurrency=3 on a topic with 6 partitions, Spring starts 3 KafkaMessageListenerContainer instances, each consuming 2 partitions. Setting concurrency higher than the partition count wastes threads — the excess consumers sit idle waiting for a rebalance that gives them a partition. A common production pattern is to set partition count equal to the maximum expected consumer concurrency.

Partition assignment and group coordination happen automatically, but you can assign partitions statically through the topicPartitions attribute of @KafkaListener, which takes @TopicPartition entries. This is useful for consumers that need to maintain local state (e.g., windowed aggregations) without rebalance disruption. Static assignment bypasses the group coordinator and requires careful management when scaling, so use it judiciously.

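Error handling in @KafkaListener follows a layered model: the method itself can throw, triggering the container's error handler. Out of the box, DefaultErrorHandler redelivers the failed record up to 9 more times with no delay between attempts (a FixedBackOff of 0 ms), then logs the failure and skips the record. For production you typically supply your own back-off (for example exponential) and a recoverer such as DeadLetterPublishingRecoverer that publishes exhausted records to a dead-letter topic. Understanding this pipeline is essential: with the defaults, a poison message is dropped after the retries with only a log line, and a record that cannot even be deserialized is retried indefinitely and blocks its partition unless the deserializer is wrapped in ErrorHandlingDeserializer.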
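A sketch of configuring that pipeline explicitly. It assumes Spring Boot's auto-configured container factory, which picks up a single error handler bean of this type, and an auto-configured KafkaTemplate to publish dead letters. The retry numbers and the choice of IllegalArgumentException as non-retryable are illustrative only.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaOperations;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.kafka.support.ExponentialBackOffWithMaxRetries;

@Configuration
public class ListenerErrorHandlingConfig {

    @Bean
    public DefaultErrorHandler errorHandler(KafkaOperations<Object, Object> template) {
        // After retries are exhausted, publish the failed record to a dead-letter topic
        // (by convention the original topic name plus a DLT suffix; check your version).
        var recoverer = new DeadLetterPublishingRecoverer(template);

        // 4 retries: 1s, 2s, 4s, 8s, capped at 10s between attempts.
        var backOff = new ExponentialBackOffWithMaxRetries(4);
        backOff.setInitialInterval(1_000L);
        backOff.setMultiplier(2.0);
        backOff.setMaxInterval(10_000L);

        var handler = new DefaultErrorHandler(recoverer, backOff);
        // Retrying will not fix a bad payload, so send these straight to the recoverer.
        handler.addNotRetryableExceptions(IllegalArgumentException.class);
        return handler;
    }
}
```

The dead-letter topic must exist (or broker auto-creation must be enabled), and something should monitor it; a DLT nobody reads is just a slower way to lose messages.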

Concurrency Must Not Exceed Partition Count

Setting concurrency=10 on a topic with 3 partitions starts 10 threads but only 3 will ever receive messages. The other 7 consume memory and CPU for zero throughput benefit. Always align concurrency with your topic's partition count.
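The arithmetic behind this warning can be shown with a toy assignment. The round-robin spread below is a simplified stand-in for Kafka's real assignors, and the class name is hypothetical; only the counts matter.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: spreads partitions 0..n-1 over consumers round-robin.
final class PartitionSpread {

    static List<List<Integer>> assign(int partitions, int consumers) {
        if (partitions < 0 || consumers <= 0) {
            throw new IllegalArgumentException("need at least one consumer and no negative partitions");
        }
        List<List<Integer>> owned = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            owned.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            owned.get(p % consumers).add(p);
        }
        return owned;
    }

    /** Consumers that end up with no partition at all. */
    static long idleConsumers(int partitions, int consumers) {
        return assign(partitions, consumers).stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        System.out.println(idleConsumers(3, 10)); // prints 7
        System.out.println(assign(6, 3));         // prints [[0, 3], [1, 4], [2, 5]]
    }
}
```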

Production Insight

Use separate groupId values for different business purposes consuming the same topic (e.g., order-service for fulfillment, analytics-service for reporting). Each group maintains independent offsets and scales independently — this is Kafka's broadcast pattern.

Key Takeaway

@KafkaListener is powerful but requires understanding the concurrency-to-partition relationship and error handler pipeline to avoid silent message loss or infinite retry loops.

JSON Serialization: JsonSerializer and JsonDeserializer

Raw Kafka messages are byte arrays. Spring Kafka's JsonSerializer and JsonDeserializer use Jackson to convert between Java objects and JSON bytes automatically, making your @KafkaListener methods receive typed POJOs instead of byte arrays.

On the producer side, JsonSerializer serializes the POJO using Jackson's ObjectMapper. By default, it adds a __TypeId__ message header containing the fully qualified Java class name of the value. This header is read by JsonDeserializer on the consumer to determine the target deserialization type without explicit configuration, which is convenient for same-codebase producer/consumer pairs.

The header-based type resolution breaks in cross-service communication. If the producer is a Python service or a different Java package structure, the header will reference a class that doesn't exist in the consumer's classpath. Configure explicit type mapping using spring.json.type.mapping=order:com.example.events.OrderEvent on both sides — this maps a logical alias to the concrete class, decoupling producer and consumer class hierarchies.

For schema evolution, use Jackson's @JsonIgnoreProperties(ignoreUnknown = true) on your event classes. This allows consumers to handle new fields added by producers without throwing deserialization errors — a critical resilience pattern for independently deployable services. Avoid removing fields or changing field types without a coordinated migration plan, as Kafka retains historical messages that will be deserialized by both old and new consumers.

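Customize the ObjectMapper by declaring an ObjectMapper @Bean with JavaTimeModule registered (for LocalDateTime, Instant, etc.) and WRITE_DATES_AS_TIMESTAMPS disabled. Note that when JsonSerializer and JsonDeserializer are configured by class name in properties, the Kafka client instantiates them itself and they use an ObjectMapper created internally by Spring Kafka, not your bean. To get consistent date serialization across your entire application, construct the serializer and deserializer with your ObjectMapper and pass those instances to the producer and consumer factories.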
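One way to make the ObjectMapper wiring explicit is to build the deserializer yourself and hand the instance to the consumer factory. This is a sketch under assumptions: OrderEvent is a hypothetical record defined only for the example, the broker address and group id are placeholders, and type headers are ignored so the target type is fixed on the consumer side.

```java
import java.util.HashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.support.serializer.ErrorHandlingDeserializer;
import org.springframework.kafka.support.serializer.JsonDeserializer;

@Configuration
public class JsonConsumerConfig {

    /** Hypothetical event type, used only for this sketch. */
    public record OrderEvent(String orderId, String status) { }

    @Bean
    public ConsumerFactory<String, OrderEvent> orderConsumerFactory(ObjectMapper objectMapper) {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-service");

        // Uses the application's ObjectMapper bean; 'false' ignores type headers,
        // so every value is read as OrderEvent regardless of what the producer sent.
        JsonDeserializer<OrderEvent> json =
                new JsonDeserializer<>(OrderEvent.class, objectMapper, false);

        // ErrorHandlingDeserializer turns a deserialization failure into something the
        // container's error handler can deal with, instead of failing inside poll().
        return new DefaultKafkaConsumerFactory<>(
                props,
                new StringDeserializer(),
                new ErrorHandlingDeserializer<>(json));
    }
}
```

Because type headers are ignored here, trusted-package configuration does not come into play; it matters when the consumer resolves the class from the producer's header.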

Always Set Trusted Packages in Production

Without spring.json.trusted.packages or addTrustedPackages(), JsonDeserializer throws IllegalArgumentException: The class ... is not in the trusted packages. Avoid setting * (trust all) in production — enumerate the specific packages your consumers need to deserialize from.

Production Insight

Define a canonical event schema module (a shared library) that both producer and consumer depend on. Version it with SemVer and release additive-only changes. This eliminates class-not-found deserialization errors across service boundaries.

Key Takeaway

Decouple producer and consumer class paths using spring.json.type.mapping and annotate all event records with @JsonIgnoreProperties(ignoreUnknown = true) for safe schema evolution.

Topic Management with NewTopic Bean and KafkaAdmin

Managing Kafka topics programmatically through Spring eliminates the need for manual kafka-topics.sh commands in deployment pipelines. The NewTopic @Bean pattern integrates topic creation into the application lifecycle — on startup, KafkaAdmin checks each NewTopic bean against the broker and creates missing topics.

Partition count is the most consequential topic configuration decision. Partitions determine max parallelism: a topic with 6 partitions can be consumed by at most 6 consumers in the same group simultaneously. Under-partitioning is a performance ceiling that cannot be easily raised in production without a migration plan. Over-partitioning wastes broker resources (each partition requires memory, file handles, and replication overhead). A practical rule: estimate your peak consumer throughput requirement, divide by throughput per consumer, and multiply by 2 for headroom.

Replication factor should be 3 in any production cluster with at least 3 brokers. Setting min.insync.replicas=2 on the topic alongside acks=all on the producer ensures that at least 2 replicas must acknowledge each write before the producer considers it complete. Together, these settings tolerate one broker failure without losing acknowledged data and without losing availability for reads and writes.

Retention configuration (retention.ms, retention.bytes) controls how long messages remain available for consumption after being written. Event sourcing use cases often need retention.ms=-1 (infinite retention) or compaction (cleanup.policy=compact) so the latest value per key is always retained. For ephemeral event streams, 7-day retention is a common default that balances storage cost with the ability to reprocess recent history after an incident.

Beyond NewTopic beans, KafkaAdmin exposes a programmatic API for dynamic topic management at runtime — useful for multi-tenant architectures where tenant onboarding triggers topic creation, or for integration tests that need to create and delete topics per test run.

Partition Count Cannot Be Reduced

Once a topic is created with N partitions, you can only increase — never decrease — partition count. Increasing partitions redistributes future messages but does NOT move existing messages, which can break ordering guarantees for keys that relied on the old partition assignment. Plan partition counts carefully before going to production.
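Why the ordering guarantee breaks can be seen with modular arithmetic. The sketch is deliberately simplified: Kafka's default partitioner hashes the serialized key bytes with murmur2, whereas this example takes an already-computed hash as input. The class name is hypothetical.

```java
// Simplified illustration of hash-based routing; not Kafka's partitioner.
final class KeyRouting {

    /** Partition chosen for a key hash when the topic has numPartitions partitions. */
    static int partitionFor(int keyHash, int numPartitions) {
        return Math.floorMod(keyHash, numPartitions);
    }

    public static void main(String[] args) {
        int keyHash = 10; // stands in for the hash of some orderId
        System.out.println(partitionFor(keyHash, 3)); // prints 1
        System.out.println(partitionFor(keyHash, 6)); // prints 4
    }
}
```

Events for that key written before the change sit on partition 1, later ones go to partition 4, and two different consumers may now process them in parallel.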

Production Insight

Use Spring profiles to set different partition counts per environment: 1 partition locally, 3 in staging, 6-12 in production. Inject @Value("${kafka.topics.orders.partitions:6}") into the NewTopic bean to keep environment differences in configuration, not code.
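A sketch of such a bean using Spring Kafka's TopicBuilder. The kafka.topics.orders.replicas property is a hypothetical addition alongside the partitions property mentioned above, and the 7-day retention is only the common default discussed earlier.

```java
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class TopicConfiguration {

    private static final long SEVEN_DAYS_MS = 7L * 24 * 60 * 60 * 1000;

    @Bean
    public NewTopic ordersTopic(
            @Value("${kafka.topics.orders.partitions:6}") int partitions,
            @Value("${kafka.topics.orders.replicas:3}") int replicas) {
        return TopicBuilder.name("orders")
                .partitions(partitions)
                .replicas(replicas)
                // With acks=all on the producer, 2 replicas must confirm each write.
                .config(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2")
                .config(TopicConfig.RETENTION_MS_CONFIG, String.valueOf(SEVEN_DAYS_MS))
                .build();
    }
}
```

On a single-broker local setup the profile would need to override replicas to 1 and min.insync.replicas accordingly, otherwise topic creation or writes will fail.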

Key Takeaway

Invest time in partition count planning before production — it's the one Kafka configuration you can only make harder to change later, never easier.

Consumer Lag Monitoring in Production

Consumer lag is the gap between the latest message offset in a partition and the offset the consumer group has committed. A lag of zero means the consumer is keeping up in real time. Growing lag signals that consumers are processing slower than producers are writing — a leading indicator of problems before they become user-visible.

The canonical CLI command for lag inspection is kafka-consumer-groups.sh --bootstrap-server <broker> --describe --group <group>. It shows each partition's LOG-END-OFFSET (latest), CURRENT-OFFSET (committed), LAG (the difference), and which consumer instance currently owns the partition. Run this during incidents to instantly identify whether lag is uniform (a throughput problem) or isolated to specific partitions (possibly a poison message or a crashed consumer).

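For ongoing monitoring, expose Kafka consumer metrics through Spring Boot Actuator and Micrometer. With Actuator and a Micrometer registry of your choice on the classpath, Spring Boot binds the Kafka client's own metrics automatically; spring.kafka.listener.observation-enabled=true additionally enables Micrometer observations (timers and tracing) around listener invocations. The client's records-lag metrics (exposed through Micrometer under names such as kafka.consumer.fetch.manager.records.lag) flow into Prometheus/Datadog/CloudWatch without additional instrumentation. Set an alert rule: if lag on any critical consumer group exceeds (normal consumption rate in messages per second) × (acceptable processing delay in seconds), page the on-call engineer.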
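The numbers involved are simple enough to sketch. The helper below is hypothetical; it mirrors the LOG-END-OFFSET and CURRENT-OFFSET columns of the CLI output, and it reads the alert rule as consumption rate times acceptable delay, which is one interpretation of the rule rather than a standard formula.

```java
import java.util.Map;

// Hypothetical helper mirroring the columns of kafka-consumer-groups.sh --describe.
final class LagMath {

    /** lag = LOG-END-OFFSET - CURRENT-OFFSET, floored at zero. */
    static long lag(long logEndOffset, long committedOffset) {
        return Math.max(0L, logEndOffset - committedOffset);
    }

    /** Sums lag over partitions; each value is {logEndOffset, committedOffset}. */
    static long totalLag(Map<Integer, long[]> offsetsByPartition) {
        long total = 0;
        for (long[] offsets : offsetsByPartition.values()) {
            total += lag(offsets[0], offsets[1]);
        }
        return total;
    }

    /** Lag that corresponds to being 'acceptableDelaySeconds' behind at the usual rate. */
    static long alertThreshold(long messagesPerSecond, long acceptableDelaySeconds) {
        return messagesPerSecond * acceptableDelaySeconds;
    }

    public static void main(String[] args) {
        System.out.println(lag(1500L, 1200L));         // prints 300
        System.out.println(alertThreshold(200L, 60L)); // prints 12000
    }
}
```

A per-partition view matters as much as the total: 300 messages of lag spread evenly points at throughput, while 300 on a single partition points at a stuck record or a dead consumer.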

In Kubernetes environments, the Kafka Lag Exporter (Lightbend) or Confluent's built-in metrics reporter surfaces per-consumer-group lag as a Prometheus metric that integrates with KEDA for autoscaling. Configure a ScaledObject that adds consumer replicas when lag exceeds a threshold and removes them when lag drops — this creates a self-regulating consumer pool that handles traffic bursts without manual intervention.

Beyond lag, monitor records-consumed-rate, fetch-latency-avg, and commit-latency-avg as leading indicators. A spike in fetch-latency-avg suggests broker-side pressure or network issues. A spike in commit-latency-avg with growing lag suggests the consumer is processing fast but offset commits are slow, often caused by a blocked @Transactional boundary or by trouble reaching the group coordinator broker (consumers commit offsets to the coordinator, not to ZooKeeper).

Lag of Zero Does Not Mean Healthy Processing

A consumer can have zero lag while silently discarding messages (e.g., catching exceptions and committing offsets without actually processing). Always track business-level metrics (orders processed per second, payment events handled) alongside Kafka lag. Lag is a delivery metric, not a processing success metric.

Production Insight

Set up a KEDA ScaledObject on your consumer Deployment with a kafka trigger pointing at your consumer group and topic. When lag exceeds 500 messages, KEDA automatically adds consumer pods; when lag drops to zero, it scales back down. This eliminates manual scaling entirely for most bursty workloads.

Key Takeaway

Consumer lag is the single most important Kafka operational metric. Alert on it early, automate response with KEDA, and always correlate it with business-level processing success metrics.

Why Your Consumer Falls Silent: Listener Containers and Thread Safety

You deployed a new consumer. It starts fine. Then it stops processing for no obvious reason. No errors. No lag. Just silence. That's not a Kafka issue—that's a listener container misconfiguration.

Spring wraps your @KafkaListener in a ConcurrentMessageListenerContainer. That container manages the threads that poll Kafka. The concurrency setting controls how many threads run in parallel. Each thread gets its own Kafka consumer instance.

Here's the trap: by default, concurrency is 1. Even if you have 10 partitions on your topic, a single thread in that instance polls and processes all 10. Nothing sits idle, but every record is handled sequentially, so your throughput tanks and you get no warning.

Set concurrency explicitly. Size it so that the total across all instances of the consumer group matches your partition count: with 6 partitions and 3 application instances, concurrency 2 per instance gives 6 consumers, one per partition. Anything beyond that only adds idle threads.

Also: every consumer thread in the container shares the same error handler. In Spring Kafka 2.8 and later that is DefaultErrorHandler, which replaced the older SeekToCurrentErrorHandler. Configure it explicitly so that a failing record is retried a bounded number of times and then recovered instead of stalling its partition.

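ConsumerConfig.java (Java)

// io.thecodeforge: kafka listener container config
// OrderEvent is the application's own event class (not shown here).
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class ConsumerConfig {

    private static final Logger log = LoggerFactory.getLogger(ConsumerConfig.class);

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, OrderEvent>
            kafkaListenerContainerFactory(
                    ConsumerFactory<String, OrderEvent> consumerFactory) {

        var factory = new ConcurrentKafkaListenerContainerFactory<String, OrderEvent>();
        factory.setConsumerFactory(consumerFactory);
        // Threads per application instance; the total across all instances
        // should not exceed the topic's partition count.
        factory.setConcurrency(6);
        factory.setCommonErrorHandler(
                new DefaultErrorHandler(
                        (record, exception) -> {
                            // Recoverer: runs once the retries below are exhausted.
                            // Log here, or publish the record to a dead-letter topic.
                            log.error("Failed record: {}", record.value(), exception);
                        },
                        new FixedBackOff(1000L, 3L) // 1s delay, 3 retries
                )
        );
        return factory;
    }
}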

Output

Container created with concurrency=6

Each consumer thread polls partition(s) independently

Error handler will retry up to 3 times with 1s backoff

Production Trap:

If you have 10 partitions but concurrency=3, three threads share all 10 partitions (three or four each), so throughput is bounded by those three threads. Raise total concurrency toward the partition count when lag grows, but there is no benefit in going past it, and every extra thread costs memory and CPU.

Key Takeaway

Set listener concurrency explicitly so that the total across all instances of the consumer group matches, and does not exceed, the topic's partition count.

Custom Serializers Are a Lie: The Real Cost of JSON

JSON serialization looks clean in development. In production, it burns you. Every timestamp that deserializes as a string instead of Instant. Every null field that breaks the entire batch. Every extra byte that doubles network cost.

Here's the hard truth: Kafka stores bytes. Your JsonSerializer writes JSON strings. Your JsonDeserializer parses them back. The problem is type erasure. Kafka doesn't know your payload class at runtime unless you tell it.

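Use a TypeReference or explicit deserializer. Don't rely on generic Jackson type inference. When you have records from 5 months ago with a different schema, deserialization can fail. Without protection, the consumer fails on the same record on every poll and the partition stalls; with a skip-on-error setup, the record is dropped. If nobody is watching the logs, that is data loss.

For production, switch to Avro or Protobuf with Schema Registry. If you must use JSON, wrap your deserializer with a try-catch that logs the raw bytes, or wrap it in Spring Kafka's ErrorHandlingDeserializer so the failure reaches the container's error handler. Then you can debug what actually arrived.

And be careful with AutoOffsetReset.LATEST for critical pipelines. It does not apply on an ordinary restart, because the group resumes from its committed offsets, but a group with no committed offset (first deployment, renamed group.id, expired offsets) skips everything published before it joined. Use EARLIEST, but only after testing that your deserializer can handle historical records.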

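SafeJsonDeserializer.java (Java)

// io.thecodeforge: resilient JSON deserializer
import java.nio.charset.StandardCharsets;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SafeJsonDeserializer<T> implements Deserializer<T> {

    private static final Logger log = LoggerFactory.getLogger(SafeJsonDeserializer.class);

    private final ObjectMapper mapper = new ObjectMapper();
    private final Class<T> targetClass;

    public SafeJsonDeserializer(Class<T> targetClass) {
        this.targetClass = targetClass;
    }

    @Override
    public T deserialize(String topic, byte[] data) {
        if (data == null) return null; // tombstone or empty value
        try {
            return mapper.readValue(data, targetClass);
        } catch (Exception e) {
            // Log the raw payload before failing so the bad record can be inspected
            log.error("Deserialization failed for topic {}: raw={}", topic,
                    new String(data, StandardCharsets.UTF_8), e);
            throw new SerializationException("Cannot deserialize to " + targetClass, e);
        }
    }
}

// Usage: this class has no no-arg constructor, so it cannot be configured by class
// name in application.yml. Pass an instance to the consumer factory instead, e.g.
// new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(),
//         new SafeJsonDeserializer<>(OrderEvent.class));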

Output

// On deserialization failure, logs raw bytes to console

// Does not silently drop records

Production Trap:

Don't run a bare JSON deserializer in production without error handling around it. A schema change from 3 months ago can stall or crash your consumer with zero visibility into the raw payload.

Key Takeaway

Wrap your JSON deserializer with error logging of raw bytes. Better yet, use Avro or Protobuf with Schema Registry for any production system.

● Production incident · POST-MORTEM · severity: high

Consumer Group Rebalance Storm Brings Order Processing to a Halt

Symptom

After a rolling deployment of the order-service, consumer lag on the orders topic spiked from near-zero to 12,000 messages. The Grafana dashboard showed all consumer instances reporting no progress for exactly 4 minutes before recovering. No errors appeared in application logs.

Assumption

The team assumed the lag spike was caused by the new code being slower at processing orders — a performance regression in the new release.

Root cause

The session.timeout.ms was left at 10 seconds (the client default before Kafka 3.0), and max.poll.interval.ms was 5 minutes. During rolling deployment, each pod restart triggered a group rebalance. Because the team deployed all 6 pods in rapid succession without waiting for stabilization, the group entered a continuous rebalance loop: no consumer could complete a full rebalance cycle before the next pod restart initiated another one. With the default eager rebalance protocol, no consumer in the group processes messages while a rebalance is in progress.

Fix

Set spring.kafka.consumer.properties.session.timeout.ms=45000 and stagger pod restarts with a 60-second wait between each. Alternatively, use CooperativeStickyAssignor (set partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor) which performs incremental rebalances, allowing unaffected partitions to keep processing during the transition.

Key lesson

Rebalance storms are invisible in application logs but catastrophic for throughput.
Monitor the consumer's records-lag-max metric (JMX group consumer-fetch-manager-metrics, exposed by Micrometer as kafka.consumer.fetch.manager.records.lag.max) and alert when lag exceeds SLA-bound thresholds.
Always configure CooperativeStickyAssignor in services that perform rolling deployments.
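
The settings from the fix can be written down as plain consumer properties. This is a minimal sketch using only java.util.Properties; the class name RebalanceSafeConsumerProps and its build method are hypothetical, and the 45-second value is the one chosen in this incident, not a universal recommendation. The keys themselves are the standard Kafka consumer configuration names.

```java
import java.util.Properties;

// Hypothetical helper: collects the rebalance-related settings from the fix above.
final class RebalanceSafeConsumerProps {

    static Properties build(String groupId) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        // Give a restarting pod 45 s before the coordinator declares it dead.
        props.setProperty("session.timeout.ms", "45000");
        // Cooperative (incremental) rebalancing: unaffected partitions keep processing.
        props.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        build("order-service").forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a Spring Boot service the same keys would typically sit under spring.kafka.consumer.properties.*, as in the fix above.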

Production debug guide · Symptom → root cause → fix · 5 entries

Symptom · 01

Consumer lag grows indefinitely but no errors in logs

→

Fix

Check whether the consumer group is stuck in a rebalance loop using kafka-consumer-groups.sh --describe --group <group> --state. If the STATE column shows PreparingRebalance or CompletingRebalance for more than 30 seconds, a rebalance storm is likely. Reduce deployment cadence and switch to CooperativeStickyAssignor. Also check max.poll.records: if processing one polled batch takes longer than max.poll.interval.ms, the consumer is removed from the group and its partitions are reassigned, often without any application-level error.

Symptom · 02

Messages appear to be processed twice (duplicate processing)

→

Fix

Duplicate processing occurs when the consumer crashes after processing but before committing the offset. Verify enable.auto.commit=false and that your @KafkaListener method (or AckMode) explicitly acknowledges after successful processing. For idempotent consumption, store a processed message ID in Redis or a database unique constraint before doing business logic. Never commit offsets inside a try-catch that swallows exceptions.
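
A minimal sketch of the idempotent-consumption idea, assuming every message carries a unique ID. The in-memory set is only a stand-in for the Redis key or database unique constraint mentioned above (it would not survive a restart), and IdempotentProcessor and processOnce are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

final class IdempotentProcessor {

    // Stand-in for a durable store; Set.add is atomic and returns false for a duplicate.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    /** Runs the business logic once per message ID; returns false if the ID was already seen. */
    boolean processOnce(String messageId, String payload, Consumer<String> businessLogic) {
        if (!processedIds.add(messageId)) {
            return false; // duplicate delivery, skip it
        }
        try {
            businessLogic.accept(payload);
            return true;
        } catch (RuntimeException e) {
            // Release the ID so a redelivery can retry, then let the failure propagate.
            processedIds.remove(messageId);
            throw e;
        }
    }
}
```

The exception is rethrown rather than swallowed, so the offset is not committed for a message whose business logic failed.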

Symptom · 03

KafkaTemplate.send() returns a future but messages never appear in the topic

→

Fix

The send future completes on the producer's I/O thread, not the calling thread, so a failure never reaches your code unless you inspect the future. Call .get(5, TimeUnit.SECONDS) on the returned CompletableFuture to surface producer errors that are otherwise silently swallowed. Check the acks configuration: with acks=0, the broker never confirms receipt. Also verify that the topic exists; when topic auto-creation is disabled on the broker (auto.create.topics.enable=false), a send to a nonexistent topic fails only after the metadata wait (max.block.ms) times out.

Symptom · 04

JsonDeserializer throws ClassCastException or deserialization errors

→

Fix

Spring Kafka's JsonSerializer writes the producer's fully qualified class name into a type header (__TypeId__) by default, and JsonDeserializer uses that header to pick the target class. If producer and consumer use different package structures or class names, deserialization fails. Set spring.kafka.consumer.value-deserializer=org.springframework.kafka.support.serializer.JsonDeserializer, spring.kafka.consumer.properties.spring.json.value.default.type=com.example.OrderEvent, and spring.kafka.consumer.properties.spring.json.use.type.headers=false to bypass header-based type resolution. Set spring.kafka.consumer.properties.spring.json.trusted.packages to the packages you actually consume (for example com.example); * also works but trusts every package.

Symptom · 05

Topic has fewer active consumers than partitions

→

Fix

Run kafka-consumer-groups.sh --describe --group <group> and inspect the CONSUMER-ID column. Fewer consumers than partitions is fine in itself, because each consumer can own several partitions. A partition whose CONSUMER-ID is blank (shown as -) has no owner, so nothing is processing it. This happens when consumers have crashed or the pod replica count is misconfigured. Scale consumer pods up, or raise concurrency in @KafkaListener to run more consumer threads per pod, keeping the total at or below the partition count.

★ Kafka Debug Cheat Sheet

Run these commands on any Kafka host or from a pod with kafka-tools installed. Replace broker address and group/topic names as needed.

Consumer lag is high: need current offset vs end offset

Immediate action

Describe the consumer group to see lag per partition

Commands

kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group order-service

kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group order-service --verbose

Fix now

Compare LOG-END-OFFSET vs CURRENT-OFFSET per partition. Scale consumers if lag is uniform across all partitions. If lag is only on specific partitions, check for poison messages or slow processing on that consumer instance.
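
The comparison above is simple arithmetic: lag per partition is LOG-END-OFFSET minus CURRENT-OFFSET. The sketch below classifies lag as uniform or skewed; the LagReport name and the "more than twice the average" skew rule are illustrative assumptions, not part of any Kafka tooling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LagReport {

    /** Lag for one partition: records written but not yet committed by the group. */
    static long lag(long logEndOffset, long currentOffset) {
        return Math.max(0, logEndOffset - currentOffset);
    }

    /** True when one partition's lag exceeds twice the average (an assumed rule of thumb). */
    static boolean isSkewed(Map<Integer, Long> lagByPartition) {
        double average = lagByPartition.values().stream()
                .mapToLong(Long::longValue).average().orElse(0);
        long max = lagByPartition.values().stream()
                .mapToLong(Long::longValue).max().orElse(0);
        return average > 0 && max > 2 * average;
    }

    public static void main(String[] args) {
        Map<Integer, Long> lags = new LinkedHashMap<>();
        lags.put(0, lag(5_000, 4_990));
        lags.put(1, lag(5_200, 5_195));
        lags.put(2, lag(9_000, 3_000)); // one partition far behind the others
        System.out.println("lag per partition: " + lags);
        System.out.println("skewed: " + isSkewed(lags));
    }
}
```

Uniform lag points toward scaling consumers; skewed lag points toward a poison message or a slow instance on the lagging partition.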

Kafka vs. Traditional Message Queue vs. REST Synchronous Call

Aspect	Kafka Topic	RabbitMQ/SQS Queue	REST Synchronous
Message retention	Configurable (days to forever)	Deleted after ACK	Not stored
Consumer scalability	Up to partition count per group	Unlimited consumers competing	Limited by upstream capacity
Message ordering	Guaranteed per partition/key	Best-effort (FIFO queues: per queue)	Request order only
Multiple consumers	Yes — independent consumer groups	No — competing consumers	No — one response per call
Replay historical data	Yes — reset offsets	No — messages gone after ACK	No
Producer-consumer coupling	None — async, decoupled	Minimal — async	Strong — synchronous
Throughput ceiling	Millions/sec (horizontal scale)	Tens of thousands/sec	Limited by service capacity
Best for	Event streaming, audit logs, CDC	Task queues, work distribution	CRUD APIs, request-response

⚙ Quick Reference

2 commands from this guide

File	Command / Code	Purpose
ConsumerConfig.java	@Configuration	Why Your Consumer Falls Silent
SafeJsonDeserializer.java	public class SafeJsonDeserializer<T> implements Deserializer<T> {	Custom Serializers Are a Lie

Key takeaways

Kafka decouples producers and consumers through a durable, partitioned log, the key architectural primitive that enables independent scaling and fault isolation.

The message key is not just metadata: it controls partition assignment and therefore ordering guarantees and consumer locality for stateful processing.

Consumer group rebalances pause all processing in the group; use CooperativeStickyAssignor and stagger deployments to minimize rebalance frequency and duration.

Always configure an error handler (DefaultErrorHandler with DeadLetterPublishingRecoverer): a poison message on an unprotected listener blocks the entire partition indefinitely.

Consumer lag is the primary production health metric for Kafka: instrument it in Prometheus, alert on thresholds, and automate scaling with KEDA.

Common mistakes to avoid

7 patterns

Using the same consumer group for different business purposes

Symptom

Two different services fighting over messages — one service processes a message, the other never sees it

Fix

Assign unique groupId values to each consuming service. Messages are broadcast to all groups but load-balanced within a group. analytics-service and order-service consuming the same orders topic need different group IDs.

Setting partition count lower than max expected consumer concurrency

Symptom

Throughput plateaus despite adding more consumer pods — some pods sit idle with no partitions assigned

Fix

Set partition count to the maximum number of consumers you'll ever run concurrently. A 3-partition topic can never utilize more than 3 consumers in one group, regardless of how many pods you deploy.
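
To see why extra consumers sit idle, here is a toy model that spreads partitions over the consumers of one group round-robin. It is a simplified illustration, not Kafka's actual assignor, and AssignmentSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class AssignmentSketch {

    /** Spreads partitions 0..partitions-1 over consumers round-robin; list index = consumer number. */
    static List<List<Integer>> assign(int partitions, int consumers) {
        if (consumers <= 0) {
            throw new IllegalArgumentException("need at least one consumer");
        }
        List<List<Integer>> owned = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            owned.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            owned.get(p % consumers).add(p);
        }
        return owned;
    }

    /** Consumers that end up with no partitions and therefore do no work. */
    static long idleConsumers(int partitions, int consumers) {
        return assign(partitions, consumers).stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        System.out.println(assign(3, 5));        // two of the five lists are empty
        System.out.println(idleConsumers(3, 5)); // number of idle consumers
    }
}
```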

Not handling the send future result in KafkaTemplate.send()

Symptom

Messages silently disappear under broker pressure, network issues, or topic-not-found errors with no errors in application logs

Fix

Always attach .whenComplete((result, ex) -> { if (ex != null) { log.error(...); } }) to the returned CompletableFuture. For critical messages, use .get(timeout) for synchronous confirmation.
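
Both options can be sketched with the JDK alone. The futures here are created by hand as stand-ins for the CompletableFuture that KafkaTemplate.send() returns; SendResultHandling and its method names are hypothetical, and the AtomicReference takes the place of a log.error call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

final class SendResultHandling {

    /** Attaches a callback so a failed send is recorded instead of vanishing. */
    static <T> CompletableFuture<T> recordFailures(CompletableFuture<T> sendFuture,
                                                   AtomicReference<Throwable> lastError) {
        return sendFuture.whenComplete((result, ex) -> {
            if (ex != null) {
                lastError.set(ex); // in a real service: log.error("send failed", ex)
            }
        });
    }

    /** Blocks for confirmation; returns false on failure or timeout. */
    static <T> boolean confirmed(CompletableFuture<T> sendFuture, long timeoutSeconds) {
        try {
            sendFuture.get(timeoutSeconds, TimeUnit.SECONDS);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        }
    }
}
```

The callback form keeps the calling thread free; the blocking form trades throughput for certainty and is best reserved for critical messages.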

Leaving enable.auto.commit=true with manual business logic

Symptom

Messages are skipped or processed twice after consumer restarts, because offsets are committed on a timer rather than when your database write completes

Fix

Set spring.kafka.consumer.enable-auto-commit=false and use AckMode.MANUAL_IMMEDIATE. Commit the offset only after your business logic succeeds. This prevents the offset from advancing past messages you haven't actually processed.

Not configuring an error handler on the listener container

Symptom

A single malformed message (poison pill) causes the consumer to retry infinitely, blocking the partition and growing lag to the millions

Fix

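Configure DefaultErrorHandler with a FixedBackOff or ExponentialBackOff and a DeadLetterPublishingRecoverer as the dead-letter fallback. After N retries, the message is sent to a .DLT topic and processing of subsequent messages resumes. If the failure happens during deserialization, also wrap the deserializer in ErrorHandlingDeserializer so the exception reaches the error handler instead of failing every poll.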
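
The retry-then-dead-letter flow can be modeled in a few lines. This is a simplified sketch of the idea only, not Spring Kafka's DefaultErrorHandler: back-off delays are omitted, the dead-letter list stands in for a .DLT topic, and RetryThenDeadLetter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class RetryThenDeadLetter {

    final List<String> deadLetters = new ArrayList<>(); // stand-in for the .DLT topic
    private final int maxAttempts;

    RetryThenDeadLetter(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /** Processes records in order; a record that fails maxAttempts times is parked, not retried forever. */
    void consume(List<String> records, Consumer<String> listener) {
        for (String record : records) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    listener.accept(record);
                    break; // success, move on to the next record
                } catch (RuntimeException e) {
                    if (attempt == maxAttempts) {
                        deadLetters.add(record); // retries exhausted, park the record
                    }
                }
            }
        }
    }
}
```

The point of the pattern is visible in the loop: the poison record costs a bounded number of attempts, and the records behind it are still processed.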

Running consumer concurrency higher than partition count

Symptom

Excess consumer threads are started but are assigned no partitions and receive no messages, wasting CPU and memory

Fix

Set concurrency in @KafkaListener or the container factory to be <= partition count. Idle consumer threads are not just wasteful — they participate in rebalances, adding latency.

Forgetting to trust packages in JsonDeserializer

Symptom

Application starts but throws IllegalArgumentException: The class X is not in the trusted packages on first message received

Fix

Add spring.kafka.consumer.properties.spring.json.trusted.packages=com.example.events to application.yml. For cross-org topics, enumerate all producer packages explicitly rather than using *.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR

What is a Kafka consumer group and how does it enable parallel processin...

Q02 · JUNIOR

How does message ordering work in Kafka, and how do you ensure related m...

Q03 · SENIOR

What happens during a consumer group rebalance, and how can you minimize...

Q04 · SENIOR

Explain the difference between acks=0, acks=1, and acks=all in Kafka pro...

Q05 · SENIOR

How does Spring Kafka's DefaultErrorHandler work, and what happens to a ...

Q06 · SENIOR

How would you implement exactly-once semantics in a Kafka-based Spring B...

Q07 · SENIOR

How do you handle schema evolution in Kafka events when multiple service...

Q08 · SENIOR

You see consumer lag growing on one specific partition but not others. W...

Q09 · JUNIOR

What is the role of the message key in Kafka, and what are the consequen...

Q01 of 09 · JUNIOR

What is a Kafka consumer group and how does it enable parallel processing?

ANSWER

A consumer group is a set of consumer instances that share a group.id. Kafka assigns each topic partition to exactly one consumer within the group at any time, enabling parallel processing where each consumer handles a subset of partitions. Adding consumers to the group (up to the partition count) increases throughput linearly. Multiple groups can consume the same topic independently — each group maintains its own committed offsets, effectively broadcasting the same events to multiple downstream systems.

FAQ · 6 QUESTIONS

Frequently Asked Questions

How many partitions should I create for a new topic?

What is the difference between @KafkaListener and a Spring Integration Kafka channel?

How do I test @KafkaListener methods without a running Kafka broker?

When should I use Kafka vs. RabbitMQ for a new project?

How do I handle Kafka in a transactional context with a database?

What does auto-offset-reset=earliest vs latest mean in production?

Naren Founder & Principal Engineer

20+ years shipping production Java in banking & fintech. Drawn from code that ran under real load.

July 04, 2026

last updated
