Kafka with Spring Boot: Complete Beginner to Production Guide
Master Apache Kafka with Spring Boot — KafkaTemplate, @KafkaListener, consumer groups, partitions, JsonSerializer, and lag monitoring in production.
- Use KafkaTemplate.send(topic, key, value) to produce messages from any Spring bean
- Annotate methods with @KafkaListener(topics="orders", groupId="order-service") to consume
- Register a NewTopic @Bean to auto-create topics with partition/replication config
- Configure JsonSerializer/JsonDeserializer for automatic POJO serialization
- Monitor consumer lag with kafka-consumer-groups.sh --describe to detect processing backlogs
Think of Kafka like a massive, ordered logbook in a busy post office. Producers drop letters (messages) into named mailboxes (topics), which are split into numbered slots (partitions) for parallel delivery. Consumer groups are teams of postal workers: each worker handles a different slot, and Kafka remembers which letter each team last processed, so a team that stops and restarts picks up where it left off instead of losing letters.
Every high-traffic production system eventually hits the wall: the monolith that tries to do everything synchronously collapses under load. An order placed on Black Friday triggers an inventory check, a fraud scan, an email, a push notification, and a loyalty-points calculation — all in one HTTP request. The timeout cascades. Users see errors. The on-call engineer gets paged at 2 AM.
Apache Kafka solves this by decoupling producers from consumers through a distributed, durable, ordered log. Instead of calling seven downstream services synchronously, you write one message to a Kafka topic and go back to serving the next request. Each downstream service consumes at its own pace, retries independently, and scales horizontally without touching the producer.
Spring Boot's spring-kafka autoconfiguration makes wiring Kafka into a production service remarkably straightforward. A KafkaTemplate bean is ready to inject the moment you add the dependency and configure spring.kafka.bootstrap-servers. An @KafkaListener annotation turns any method into a message handler with full Spring context — transactions, security, metrics — all available.
But the ease of getting started hides the operational complexity lurking underneath. Consumer group rebalancing can pause processing for seconds. Poorly chosen partition counts bottleneck throughput. A single slow consumer can cause the entire group to fall behind, piling up unbounded lag. Without proper observability — consumer lag dashboards, offset monitoring, dead-letter queues — you won't know you have a problem until your database is backed up by hours of unprocessed events.
This guide walks through every layer: the Spring Boot autoconfiguration that bootstraps the connection, producing and consuming type-safe POJOs with JSON serialization, managing topics programmatically, and monitoring consumer lag in production using real Kafka CLI commands. Every concept is grounded in production patterns used at scale.
Setting Up Spring Boot with Kafka: Autoconfiguration Deep Dive
Spring Boot's autoconfiguration for Kafka activates the moment spring-kafka is on the classpath; spring.kafka.bootstrap-servers tells it where the brokers are (it falls back to localhost:9092 when unset). Behind the scenes, KafkaAutoConfiguration creates a KafkaAdmin bean (for topic management), a KafkaTemplate bean (for producing), and a ConcurrentKafkaListenerContainerFactory bean (for consuming). You rarely need to declare these manually.
The producer factory uses spring.kafka.producer.* properties to build the underlying KafkaProducer. Key properties: key-serializer defaults to StringSerializer, value-serializer should be set to JsonSerializer for POJO payloads. The consumer factory mirrors this with key-deserializer and value-deserializer. Spring Boot maps spring.kafka.consumer.group-id directly to the Kafka group.id property, so all @KafkaListener methods in the application share the same group unless overridden.
For multi-environment setups, externalise all Kafka config in application.yml and use Spring profiles. In local development, a Testcontainers KafkaContainer or an embedded broker (@EmbeddedKafka from spring-kafka-test) provides a realistic test environment without a running cluster. In production, point bootstrap-servers at your MSK or Confluent endpoint and layer in SSL/SASL configuration via spring.kafka.ssl.* and spring.kafka.security.protocol.
One subtle autoconfiguration detail: KafkaAdmin.autoCreateTopics defaults to true, meaning any NewTopic @Bean you declare is created automatically on startup if it doesn't exist. For a topic that already exists, KafkaAdmin never recreates it or reduces its partition count (Kafka doesn't support reducing partitions), and the replication factor is left as it is. If the bean declares more partitions than the existing topic has, current Spring Kafka versions ask the broker to add the missing partitions. This is a common source of confusion when changing partition counts, because adding partitions changes which partition a given key maps to.

If the orders topic already exists with 3 partitions and you change the NewTopic bean to 6 partitions, the increase happens implicitly at the next startup, and keyed messages may start landing on different partitions than before. Make partition changes on live topics deliberately, with kafka-topics.sh --alter or the AdminClient API, rather than as a side effect of a NewTopic bean.

Use auto-offset-reset: earliest for new consumer groups so they process historical messages from the beginning. Use latest only for real-time-only consumers where historical data has no value. Misconfiguring this causes new deployments to silently skip months of messages.

Producing Messages with KafkaTemplate
The KafkaTemplate<K, V> is Spring Kafka's primary API for sending messages. It wraps the native KafkaProducer and adds Spring-friendly abstractions: a default topic, header management, and a CompletableFuture<SendResult<K, V>> return type for async result handling.
The most important overload in production is send(String topic, K key, V value). The key is critical: Kafka routes all messages with the same key to the same partition, guaranteeing order for that key. For an order service, use orderId as the key — all events for a given order (created, paid, shipped) land on the same partition and are processed in sequence by the same consumer.
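The key-to-partition routing described above can be illustrated with a small JDK-only sketch. It is a simplified stand-in: Kafka's default partitioner hashes the serialized key with murmur2, whereas this example uses Arrays.hashCode purely to show the property that matters (equal keys always map to the same partition). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Simplified stand-in for key-based routing; NOT Kafka's murmur2 partitioner. */
final class KeyRoutingSketch {

    /** Maps a key to a partition index in the range [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Arrays.hashCode is deterministic for equal byte arrays;
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 6;
        // Every event for the same order id resolves to the same partition.
        for (String event : new String[] {"created", "paid", "shipped"}) {
            System.out.println("order-1042 " + event + " -> partition "
                    + partitionFor("order-1042", partitions));
        }
    }
}
```

Note that the result depends on numPartitions, which is why changing the partition count of a live topic can move a key to a different partition and break per-key ordering across the change.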
The returned CompletableFuture resolves when the broker acknowledges the message according to the acks setting. With acks=all (the production default), resolution means all in-sync replicas have persisted the message — durable and safe. Never call .send() and discard the future in production code; at minimum, attach a callback to log send failures. Better yet, wire a global ProducerListener bean that records metrics on every success and failure.
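The callback pattern described above might look like the following sketch. To stay runnable without a broker, it uses a plain CompletableFuture as a stand-in for the future returned by KafkaTemplate.send(); the helper name withFailureHandling and the sendFailures counter are hypothetical, and a real service would log through its logging framework and increment a Micrometer counter instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

final class SendCallbackSketch {

    /** Stand-in for an error metric such as a Micrometer counter. */
    static final AtomicLong sendFailures = new AtomicLong();

    /** Attaches failure handling to a send future and returns the resulting stage. */
    static <T> CompletableFuture<T> withFailureHandling(CompletableFuture<T> sendFuture, String key) {
        return sendFuture.whenComplete((result, ex) -> {
            if (ex != null) {
                // At minimum: record the failure and make it visible.
                sendFailures.incrementAndGet();
                System.err.println("Send failed for key " + key + ": " + ex.getMessage());
            }
        });
    }

    public static void main(String[] args) {
        // A successful send leaves the failure counter untouched.
        withFailureHandling(CompletableFuture.completedFuture("ack"), "order-1");
        // A failed send is counted and reported instead of being silently dropped.
        withFailureHandling(
                CompletableFuture.failedFuture(new IllegalStateException("broker unreachable")),
                "order-2");
        System.out.println("failures = " + sendFailures.get());
    }
}
```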
For high-throughput scenarios, KafkaTemplate supports batching transparently via the producer's batch.size and linger.ms settings. Setting linger.ms=5 causes the producer to wait up to 5ms before flushing a batch, dramatically increasing throughput at the cost of slightly higher latency. In microservices with bursty traffic patterns, this trade-off is almost always worth it. Enable compression.type=snappy or lz4 alongside batching to reduce network and broker storage overhead; for repetitive JSON payloads the savings can reach 60-70%, though the actual ratio depends on payload shape and batch size.
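As a rough illustration of the batching settings above, the sketch below collects them in a plain map keyed by the Kafka producer config names. The values are examples rather than recommendations (batch.size in particular is an assumption; the client default is 16384 bytes), and the class name is hypothetical. In a Spring application the same keys could be supplied under spring.kafka.producer.properties.* or passed to a DefaultKafkaProducerFactory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ProducerTuningSketch {

    /** Builds producer settings for batching and compression, keyed by Kafka config name. */
    static Map<String, Object> batchingProps() {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("acks", "all");             // wait for all in-sync replicas
        props.put("linger.ms", 5);            // wait up to 5 ms to fill a batch
        props.put("batch.size", 32_768);      // illustrative value; client default is 16384 bytes
        props.put("compression.type", "lz4"); // compress whole batches, not single records
        return props;
    }

    public static void main(String[] args) {
        batchingProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```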
Transactional producers require a transactionIdPrefix on the ProducerFactory and a KafkaTransactionManager bean. Inside a @Transactional method, all KafkaTemplate.send() calls participate in a single atomic transaction — either all messages are committed or none are, even across multiple topics. This is the foundation for exactly-once semantics when combined with isolation.level=read_committed on consumers.
Calling kafkaTemplate.send(topic, key, value) without attaching a callback or calling .get() means producer errors (broker unreachable, topic not found, serialization failure) are silently discarded. In production, every send future must have a whenComplete callback that at minimum logs the failure and increments an error metric.

Consider registering a ProducerInterceptor that adds a unique message ID header. This enables idempotent consumers to detect and skip duplicates without relying solely on Kafka's idempotent producer setting.

Consuming Messages with @KafkaListener
The @KafkaListener annotation transforms a Spring-managed method into a Kafka message handler. The listener container factory polls the broker on a background thread, deserializes the record, and invokes your method. Spring's full DI context is active — you can inject repositories, call other services, and participate in transactions exactly as you would in a REST controller.
The method signature is flexible. Accept ConsumerRecord<K, V> for access to offset, partition, headers, and timestamp. Accept just the value type for clean domain code. Inject @Header(KafkaHeaders.RECEIVED_PARTITION) or @Header(KafkaHeaders.OFFSET) for individual header values. Accept Acknowledgment ack when using AckMode.MANUAL to control exactly when the offset is committed.
The concurrency attribute on @KafkaListener (or on the container factory) controls how many consumer threads the application runs. With concurrency=3 on a topic with 6 partitions, Spring starts 3 KafkaMessageListenerContainer instances, each consuming 2 partitions. Setting concurrency higher than the partition count wastes threads — the excess consumers sit idle waiting for a rebalance that gives them a partition. A common production pattern is to set partition count equal to the maximum expected consumer concurrency.
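The arithmetic behind the concurrency guidance above can be sketched as follows. It assumes partitions of a single topic are spread evenly across the listener threads, which matches the outcome of the common assignors for one topic but is a simplification of the real assignment protocol; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class ConcurrencySketch {

    /** Number of partitions each consumer thread ends up with under an even spread. */
    static int[] partitionsPerConsumer(int concurrency, int partitions) {
        if (concurrency < 1 || partitions < 0) {
            throw new IllegalArgumentException("concurrency must be >= 1 and partitions >= 0");
        }
        int[] counts = new int[concurrency];
        for (int p = 0; p < partitions; p++) {
            counts[p % concurrency]++; // deal partitions out like cards
        }
        return counts;
    }

    /** Consumer threads that receive no partition at all. */
    static long idleConsumers(int concurrency, int partitions) {
        return Arrays.stream(partitionsPerConsumer(concurrency, partitions))
                .filter(count -> count == 0)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(partitionsPerConsumer(3, 6))); // [2, 2, 2]
        System.out.println(idleConsumers(10, 3));                         // 7
    }
}
```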
Partition assignment and group coordination happen automatically, but you can influence behavior with @TopicPartition on @KafkaListener for static partition assignment — useful for consumers that need to maintain local state (e.g., windowed aggregations) without rebalance disruption. Static assignment bypasses the group coordinator and requires careful management when scaling, so use it judiciously.
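Error handling in @KafkaListener follows a layered model: the method itself can throw, triggering the container's error handler. Out of the box, DefaultErrorHandler redelivers the failed record several times with no delay (a FixedBackOff of 0 ms with 9 retries), then logs it and moves on to the next record. For production, configure it with an explicit FixedBackOff or ExponentialBackOff and a DeadLetterPublishingRecoverer so that exhausted records land on a dead-letter topic instead of only in a log line. Understanding this pipeline is essential: a record that cannot even be deserialized never reaches the listener, and without an ErrorHandlingDeserializer such a poison message can fail on every poll, blocking the partition.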
Setting concurrency=10 on a topic with 3 partitions starts 10 threads but only 3 will ever receive messages. The other 7 consume memory and CPU for zero throughput benefit. Always align concurrency with your topic's partition count.

Use distinct groupId values for different business purposes consuming the same topic (e.g., order-service for fulfillment, analytics-service for reporting). Each group maintains independent offsets and scales independently; this is Kafka's broadcast pattern.

JSON Serialization: JsonSerializer and JsonDeserializer
Raw Kafka messages are byte arrays. Spring Kafka's JsonSerializer and JsonDeserializer use Jackson to convert between Java objects and JSON bytes automatically, making your @KafkaListener methods receive typed POJOs instead of byte arrays.
On the producer side, JsonSerializer serializes the POJO using Jackson's ObjectMapper. By default, it adds a __TypeId__ message header containing the fully qualified Java class name of the value. This header is read by JsonDeserializer on the consumer to determine the target deserialization type without explicit configuration, which is convenient for same-codebase producer/consumer pairs.
The header-based type resolution breaks in cross-service communication. If the producer is a Python service or a different Java package structure, the header will reference a class that doesn't exist in the consumer's classpath. Configure explicit type mapping using spring.json.type.mapping=order:com.example.events.OrderEvent on both sides — this maps a logical alias to the concrete class, decoupling producer and consumer class hierarchies.
For schema evolution, use Jackson's @JsonIgnoreProperties(ignoreUnknown = true) on your event classes. This allows consumers to handle new fields added by producers without throwing deserialization errors — a critical resilience pattern for independently deployable services. Avoid removing fields or changing field types without a coordinated migration plan, as Kafka retains historical messages that will be deserialized by both old and new consumers.
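Customize the ObjectMapper by declaring an ObjectMapper @Bean with JavaTimeModule registered (for LocalDateTime, Instant, etc.) and WRITE_DATES_AS_TIMESTAMPS disabled. Be aware that serializers named in spring.kafka.producer.value-serializer and spring.kafka.consumer.value-deserializer are instantiated by the Kafka client rather than by Spring, so they do not see this bean. For consistent date serialization across your entire application, construct JsonSerializer and JsonDeserializer yourself with the shared ObjectMapper and pass them to the ProducerFactory and ConsumerFactory.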
Without spring.json.trusted.packages or addTrustedPackages(), JsonDeserializer throws IllegalArgumentException: The class ... is not in the trusted packages. Avoid setting * (trust all) in production; enumerate the specific packages your consumers need to deserialize from.

Configure spring.json.type.mapping and annotate all event records with @JsonIgnoreProperties(ignoreUnknown = true) for safe schema evolution.

Topic Management with NewTopic Bean and KafkaAdmin
Managing Kafka topics programmatically through Spring eliminates the need for manual kafka-topics.sh commands in deployment pipelines. The NewTopic @Bean pattern integrates topic creation into the application lifecycle — on startup, KafkaAdmin checks each NewTopic bean against the broker and creates missing topics.
Partition count is the most consequential topic configuration decision. Partitions determine max parallelism: a topic with 6 partitions can be consumed by at most 6 consumers in the same group simultaneously. Under-partitioning is a performance ceiling that cannot be easily raised in production without a migration plan. Over-partitioning wastes broker resources (each partition requires memory, file handles, and replication overhead). A practical rule: estimate your peak consumer throughput requirement, divide by throughput per consumer, and multiply by 2 for headroom.
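The sizing rule above (peak throughput divided by per-consumer throughput, doubled for headroom) can be written down as a small helper. The throughput figures in main are made-up inputs for illustration, and the class and method names are hypothetical.

```java
final class PartitionSizingSketch {

    /**
     * Applies the rule of thumb: consumers needed at peak, times 2 for headroom.
     * Both rates must use the same unit, for example messages per second.
     */
    static int recommendedPartitions(double peakRate, double perConsumerRate) {
        if (peakRate <= 0 || perConsumerRate <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        int consumersNeeded = (int) Math.ceil(peakRate / perConsumerRate);
        return consumersNeeded * 2;
    }

    public static void main(String[] args) {
        // 12,000 msg/s at peak, 2,500 msg/s per consumer: ceil(4.8) = 5 consumers, 10 partitions.
        System.out.println(recommendedPartitions(12_000, 2_500)); // 10
    }
}
```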
Replication factor should be 3 in any production cluster with at least 3 brokers. This tolerates one broker failure without data loss and maintains availability for reads and writes. Setting min.insync.replicas=2 on the topic alongside acks=all on the producer ensures that at least 2 replicas must acknowledge each write before the producer considers it complete — a strong durability guarantee.
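The relationship between replication factor and min.insync.replicas described above reduces to simple arithmetic: with acks=all, writes keep succeeding as long as at least min.insync.replicas replicas are in sync. A minimal sketch, with hypothetical names:

```java
final class DurabilitySketch {

    /** Brokers that can be lost before acks=all writes to the topic start failing. */
    static int tolerableBrokerFailuresForWrites(int replicationFactor, int minInSyncReplicas) {
        if (minInSyncReplicas < 1 || minInSyncReplicas > replicationFactor) {
            throw new IllegalArgumentException("need 1 <= min.insync.replicas <= replication factor");
        }
        return replicationFactor - minInSyncReplicas;
    }

    public static void main(String[] args) {
        System.out.println(tolerableBrokerFailuresForWrites(3, 2)); // 1
        System.out.println(tolerableBrokerFailuresForWrites(3, 3)); // 0: any failure blocks writes
    }
}
```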
Retention configuration (retention.ms, retention.bytes) controls how long messages remain available for consumption after being written. Event sourcing use cases often need retention.ms=-1 (infinite retention) or compaction (cleanup.policy=compact) so the latest value per key is always retained. For ephemeral event streams, 7-day retention is a common default that balances storage cost with the ability to reprocess recent history after an incident.
Beyond NewTopic beans, KafkaAdmin exposes a programmatic API for dynamic topic management at runtime — useful for multi-tenant architectures where tenant onboarding triggers topic creation, or for integration tests that need to create and delete topics per test run.
Inject @Value("${kafka.topics.orders.partitions:6}") into the NewTopic bean to keep environment differences in configuration, not code.

Consumer Lag Monitoring in Production
Consumer lag is the gap between the latest message offset in a partition and the offset the consumer group has committed. A lag of zero means the consumer is keeping up in real time. Growing lag signals that consumers are processing slower than producers are writing — a leading indicator of problems before they become user-visible.
The canonical CLI command for lag inspection is kafka-consumer-groups.sh --bootstrap-server <broker> --describe --group <group>. It shows each partition's LOG-END-OFFSET (latest), CURRENT-OFFSET (committed), LAG (the difference), and which consumer instance currently owns the partition. Run this during incidents to instantly identify whether lag is uniform (a throughput problem) or isolated to specific partitions (possibly a poison message or a crashed consumer).
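The LAG column is just LOG-END-OFFSET minus CURRENT-OFFSET per partition. The sketch below reproduces that calculation and picks out the worst partition, which is the distinction between uniform and isolated lag mentioned above. The offsets in main are invented sample values and the class name is hypothetical.

```java
final class LagSketch {

    /** Lag for one partition: latest offset minus committed offset, never negative. */
    static long lag(long logEndOffset, long committedOffset) {
        return Math.max(0L, logEndOffset - committedOffset);
    }

    /** Sum of lag across partitions; arrays are indexed by partition number. */
    static long totalLag(long[] logEndOffsets, long[] committedOffsets) {
        long total = 0;
        for (int i = 0; i < logEndOffsets.length; i++) {
            total += lag(logEndOffsets[i], committedOffsets[i]);
        }
        return total;
    }

    /** Index of the partition with the highest lag (first one wins on ties). */
    static int worstPartition(long[] logEndOffsets, long[] committedOffsets) {
        int worst = 0;
        for (int i = 1; i < logEndOffsets.length; i++) {
            if (lag(logEndOffsets[i], committedOffsets[i])
                    > lag(logEndOffsets[worst], committedOffsets[worst])) {
                worst = i;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        long[] logEnd = {1500, 1480, 9200};
        long[] committed = {1500, 1475, 3100};
        System.out.println(totalLag(logEnd, committed));       // 6105
        System.out.println(worstPartition(logEnd, committed)); // 2: lag is isolated, not uniform
    }
}
```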
For ongoing monitoring, expose Kafka consumer metrics through Spring Boot Actuator and Micrometer. Add the Micrometer registry of your choice, and set spring.kafka.listener.observation-enabled=true if you also want listener observations. Micrometer publishes the client's records-lag metric as the kafka.consumer.fetch.manager.records.lag gauge (kafka_consumer_fetch_manager_records_lag in Prometheus), which updates as the consumer polls and flows into Prometheus/Datadog/CloudWatch without additional instrumentation. Set an alert rule: if lag on any critical consumer group exceeds N * (acceptable processing delay in seconds), where N is the expected message rate per second, page the on-call engineer.
In Kubernetes environments, the Kafka Lag Exporter (Lightbend) or Confluent's built-in metrics reporter surfaces per-consumer-group lag as a Prometheus metric that integrates with KEDA for autoscaling. Configure a ScaledObject that adds consumer replicas when lag exceeds a threshold and removes them when lag drops — this creates a self-regulating consumer pool that handles traffic bursts without manual intervention.
Beyond lag, monitor records-consumed-rate, fetch-latency-avg, and commit-latency-avg as leading indicators. A spike in fetch-latency-avg suggests broker-side pressure or network issues. A spike in commit-latency-avg with growing lag suggests the consumer is processing fast but offset commits are slow, often caused by a blocked @Transactional boundary or network trouble between the consumer and the group coordinator broker.
Configure a KEDA ScaledObject with a kafka trigger pointing at your consumer group and topic. When lag exceeds 500 messages, KEDA automatically adds consumer pods; when lag drops to zero, it scales back down. This eliminates manual scaling entirely for most bursty workloads.

Consumer Group Rebalance Storm Brings Order Processing to a Halt
Consumer lag on the orders topic spiked from near-zero to 12,000 messages. The Grafana dashboard showed all consumer instances reporting no progress for exactly 4 minutes before recovering. No errors appeared in application logs.

session.timeout.ms was left at 10 seconds (the Kafka client default before version 3.0), and max.poll.interval.ms was 5 minutes. During rolling deployment, each pod restart triggered a group rebalance. Because the team deployed all 6 pods in rapid succession without waiting for stabilization, the group entered a continuous rebalance loop: no consumer could complete a full rebalance cycle before the next pod restart initiated another one. During an eager rebalance, no consumer processes messages.

Set spring.kafka.consumer.properties.session.timeout.ms=45000 and stagger pod restarts with a 60-second wait between each. Alternatively, use CooperativeStickyAssignor (set partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor), which performs incremental rebalances, allowing unaffected partitions to keep processing during the transition.

- Rebalance storms are invisible in application logs but catastrophic for throughput.
- Monitor the kafka.consumer.fetch.manager.records.lag.max metric and alert when lag exceeds SLA-bound thresholds.
- Always configure CooperativeStickyAssignor in services that perform rolling deployments.
Run kafka-consumer-groups.sh --describe --group <group> --state. If the STATE column shows PreparingRebalance or CompletingRebalance for more than 30 seconds, a rebalance storm is likely. Reduce deployment cadence and switch to CooperativeStickyAssignor. Also check max.poll.records: if processing one polled batch takes longer than max.poll.interval.ms, the consumer is removed from the group without an obvious error.

Check that enable.auto.commit=false is set and that your @KafkaListener method (or AckMode) explicitly acknowledges after successful processing. For idempotent consumption, store a processed message ID in Redis or a database unique constraint before doing business logic. Never commit offsets inside a try-catch that swallows exceptions.

KafkaTemplate.send() returns a future but messages never appear in the topic
Call .get(5, TimeUnit.SECONDS) on the returned CompletableFuture to surface producer errors that are otherwise silently swallowed. Check acks configuration: with acks=0, the broker never confirms receipt. Also verify the topic exists; with spring.kafka.admin.auto-create=false, sending to a nonexistent topic fails silently in some broker configurations.

JsonSerializer embeds the target type in the message header by default (__TypeId__). If producer and consumer use different package structures or class names, deserialization fails. Set spring.kafka.consumer.value-deserializer=org.springframework.kafka.support.serializer.JsonDeserializer together with spring.kafka.consumer.properties.spring.json.value.default.type=com.example.OrderEvent and spring.kafka.consumer.properties.spring.json.use.type.headers=false to bypass header-based type resolution. List the event package in spring.kafka.consumer.properties.spring.json.trusted.packages rather than using *.

Run kafka-consumer-groups.sh --describe --group <group> and inspect the CONSUMER-ID column. Partitions with a blank consumer ID are unassigned: no consumer is processing them. This typically shows up after consumers crash or when replica counts are misconfigured, so that no live group member owns those partitions. Scale up consumer pods to match partition count, or reduce concurrency in @KafkaListener if running multiple virtual consumers per pod.

kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group order-service
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group order-service --verbose

Key takeaways
Common mistakes to avoid
Using the same consumer group for different business purposes
Assign distinct groupId values to each consuming service. Messages are broadcast to all groups but load-balanced within a group. analytics-service and order-service consuming the same orders topic need different group IDs.

Setting partition count lower than max expected consumer concurrency

Not handling the send future result in KafkaTemplate.send()
Attach .whenComplete((result, ex) -> { if (ex != null) { log.error(...); } }) to the returned CompletableFuture. For critical messages, use .get(timeout) for synchronous confirmation.

Leaving enable.auto.commit=true with manual business logic
Set spring.kafka.consumer.enable-auto-commit=false and use AckMode.MANUAL_IMMEDIATE. Commit the offset only after your business logic succeeds. This prevents the offset from advancing past messages you haven't actually processed.

Not configuring an error handler on the listener container
Configure a DefaultErrorHandler with a FixedBackOff or ExponentialBackOff and a dead-letter topic fallback. After N retries, the message is sent to a .DLT topic and processing of subsequent messages resumes.

Running consumer concurrency higher than partition count
Set concurrency in @KafkaListener or the container factory to be <= partition count. Idle consumer threads are not just wasteful; they also participate in rebalances, adding latency.

Forgetting to trust packages in JsonDeserializer
Symptom: IllegalArgumentException: The class X is not in the trusted packages on first message received.
Add spring.kafka.consumer.properties.spring.json.trusted.packages=com.example.events to application.yml. For cross-org topics, enumerate all producer packages explicitly rather than using *.

Interview Questions on This Topic
What is a Kafka consumer group and how does it enable parallel processing?
A consumer group is a set of consumers that share the same group.id. Kafka assigns each topic partition to exactly one consumer within the group at any time, enabling parallel processing where each consumer handles a subset of partitions. Adding consumers to the group (up to the partition count) increases throughput roughly linearly. Multiple groups can consume the same topic independently: each group maintains its own committed offsets, effectively broadcasting the same events to multiple downstream systems.

Frequently Asked Questions