Advanced 14 min · March 06, 2026

Apache Kafka Basics

Apache Kafka — Why Rebalance Storms Kill Consumer Groups

Q: How many Kafka partitions should I create for a new topic?

A practical starting point is to divide your target throughput (MB/s) by the throughput a single consumer can handle (MB/s), then multiply by 3 to give yourself room to scale. Never go below your expected peak consumer count. For example: if you expect 200 MB/s of write traffic and a single consumer can process 20 MB/s, you need at least 10 partitions for current load. Multiplying by 3 gives you 30 partitions to handle traffic spikes and future growth without repartitioning. Remember: you can add partitions later, but you cannot remove them, so err toward more partitions — the overhead of extra partitions is low until you get into the thousands (1,000+ partitions creates ZooKeeper/KRaft metadata pressure).

Q: What is the difference between Kafka offset and a message ID?

An offset is a monotonically increasing integer that Kafka assigns per partition — it's a position in the log, not a globally unique message identifier. The same offset number (e.g., offset 42) exists independently in every partition of a topic. Offsets are Kafka-internal bookmarks; they tell a consumer where it is in a partition, not which specific message it has. If you need a globally unique message ID (for deduplication, idempotent processing, or cross-partition correlation), you must include it in the message payload itself. Common practice: use a UUID or a composite of timestamp + sequence number + producer ID as the message key or in a dedicated 'messageId' field.

Q: Is Kafka a database? Can I query it like one?

Kafka is not a database in the traditional sense — it doesn't support secondary indexes, arbitrary queries, or point lookups by key (without scanning the entire log). It's a distributed, append-only event log optimised for sequential writes and sequential reads. However, Kafka Streams and ksqlDB layer SQL-like query and aggregation capabilities on top of Kafka topics, and log compaction lets Kafka retain only the latest value per key, making it behave like a changelog for a key-value store. For true ad-hoc query capabilities, you'd normally stream Kafka data into a separate analytics store (Elasticsearch, ClickHouse, Druid) or a data warehouse (BigQuery, Snowflake) rather than query Kafka directly.

Q: How do I choose between Kafka and RabbitMQ for a new project?

Kafka is for event streaming and historical replay. Choose Kafka when: you need to replay messages, retain data for days/weeks, support multiple consumer groups reading the same data independently, handle very high throughput (>100 MB/s), or your architecture is event-driven with logs as the source of truth. RabbitMQ is for work queues and transient messages. Choose RabbitMQ when: messages are transient and deleted after consumption, you need complex routing (headers, topics, direct), you need message priority or per-message TTL, or your throughput is under 50 MB/s and latency under 10ms is critical. The two are not direct competitors — teams often use both: RabbitMQ for request/response patterns and background jobs, Kafka for event sourcing, audit logs, and streaming analytics.

One slow consumer caused 45 min of Black Friday downtime.

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Drawn from code that ran under real load.

✓ Production

production tested

July 18, 2026

last updated

2,466

articles · all by Naren

Before you start⏱ 30 min

✓Deep production experience
✓Understanding of internals and trade-offs
✓Experience debugging complex systems

● Production Incident 🔎 Debug Guide ⚙ Triage Commands

⚡Quick Answer

Kafka is a distributed, partitioned, append-only commit log — not a message queue. Data persists, consumers read by offset, messages are not deleted after consumption
Core components: topics (named logs), partitions (parallelism units), offsets (sequence numbers), consumer groups (shared load), brokers (servers)
Performance magic: sequential append writes + zero-copy sendfile — a single broker can saturate a 10Gbps NIC
Partition count is your permanent max parallelism: you can never have more active consumers than partitions in a group
Production trap: consumer rebalance storms when max.poll.interval.ms is too low — the group stops all consumption and oscillates
Biggest mistake: acks=all without min.insync.replicas=2 — you think writes are durable but a single broker loss can still lose data

✦ Definition~90s read

What is Apache Kafka Basics?

Apache Kafka is a distributed event streaming platform built on a deceptively simple idea: an append-only log. Unlike traditional message queues that delete messages after consumption, Kafka persists everything to disk and lets consumers track their position independently via offsets.

★

Imagine a massive newspaper printing plant that runs 24/7.

This design decouples producers from consumers entirely — you can replay historical data, add new consumers without changing producers, and achieve throughput measured in millions of messages per second per cluster. Kafka exists because relational databases and traditional message brokers (RabbitMQ, ActiveMQ) couldn't handle the scale, durability, and replay requirements of modern event-driven architectures at companies like LinkedIn, Uber, and Netflix.

Kafka's fundamental unit of parallelism is the partition. A topic is split into partitions, each an ordered, immutable sequence of records. Consumer groups distribute partitions across their members — each partition is consumed by exactly one consumer in the group.

This is the parallelism contract: if you have 10 partitions and 3 consumers, 2 consumers handle 3 partitions each and one handles 4. The moment a consumer joins, leaves, or fails, Kafka triggers a rebalance — a coordinated reassignment of partitions. Rebalance storms happen when this process cascades: one consumer's failure triggers a rebalance, which causes other consumers to miss heartbeats during the reassignment, triggering more rebalances.

In large clusters (100+ partitions, 20+ consumers), this can stall consumption for minutes.

Kafka's durability model revolves around replication and the In-Sync Replica (ISR) set. Each partition has a leader and N followers; producers write to the leader, followers replicate asynchronously. The ISR is the set of replicas that are fully caught up with the leader.

When a broker dies, the controller (one broker elected via ZooKeeper) picks a new leader from the ISR. The common misconception is that ZooKeeper is the bottleneck — in practice, it handles leader elections in milliseconds. The real bottleneck is the controller's single-threaded state machine processing partition reassignments and rebalance requests.

Misconfigured session timeouts, too many partitions per broker, or aggressive rebalance timeouts turn a single broker failure into a cluster-wide outage.

Plain-English First

Imagine a massive newspaper printing plant that runs 24/7. Instead of delivering one paper to one house at a time, it prints millions of copies, sorts them into labeled bundles (topics), loads each bundle onto separate delivery trucks (partitions), and lets any number of paperboys (consumers) grab from their assigned truck without slowing anyone else down. If a truck breaks down, a spare steps in immediately — nobody misses their morning paper. Kafka is that printing plant, but for data.

Every modern distributed system eventually hits the same wall: services need to talk to each other faster than synchronous HTTP allows, and they need to do it reliably even when parts of the system go down. A payment service can't wait for analytics to finish before confirming a transaction. An inventory system can't drop an event just because the warehouse app is restarting. The moment you have more than two services exchanging real-time data, you need a message broker — and Kafka has become the industry's default answer.

Kafka solves a deceptively hard problem: durable, ordered, high-throughput, multi-consumer event streaming. Traditional message queues like RabbitMQ delete a message once it's consumed. Kafka keeps it, indexed by offset, for as long as you tell it to. This means you can replay events, onboard new consumers without re-producing data, and audit exactly what happened and when. That's not a queue — it's a distributed, append-only log. Understanding the difference separates engineers who use Kafka from engineers who understand Kafka.

By the end you'll be able to reason about partition assignment and why it determines your throughput ceiling, explain exactly how consumer group rebalancing works and when it silently kills your throughput, configure producers for durability versus latency trade-offs, and walk into any production incident knowing which knobs to turn first.

Apache Kafka — The Log That Never Sleeps

Apache Kafka is a distributed event store and stream-processing platform built around an immutable, partitioned commit log. Producers append records to topics; consumers read from offsets they control. The core mechanic is that each partition is an ordered, replayable sequence — no fan-out, no broker-side filtering. This design gives Kafka O(1) disk reads per partition and horizontal scaling by adding partitions.

In practice, Kafka decouples data producers from consumers via a pull model. Consumers within a group coordinate to assign partitions — each partition goes to exactly one consumer in the group. This is where the rebalance protocol enters: when a consumer joins, leaves, or fails, the group triggers a rebalance that stops all consumers, revokes partitions, and reassigns them. During a rebalance, no messages are processed. A large group with many partitions can take seconds to minutes to stabilize.

Use Kafka when you need a durable, replayable, high-throughput event backbone — think order processing, metrics pipelines, or CDC feeds. It shines where multiple downstream systems must consume the same stream independently. But its rebalance semantics mean you must design for partition count, consumer count, and session timeouts carefully. A misconfigured group with 50 partitions and 10 consumers can rebalance for 30 seconds every time a single consumer restarts.

⚠ Rebalance Is Not Free

A rebalance stops all consumers in the group — not just the one that left. A single flaky consumer can bring the entire group to a halt repeatedly.

📊 Production Insight

Teams with 100+ partitions and 20 consumers see a single consumer's transient GC pause trigger a full-group rebalance lasting 45 seconds. Symptom: lag spikes across all partitions every few minutes, even though only one consumer is unhealthy. Rule: set session.timeout.ms to at least 3x your max expected GC pause, and use cooperative rebalancing (KIP-429) to avoid stop-the-world rebalances.

🎯 Key Takeaway

Kafka's log is immutable and partitioned — consumers read at their own offset, not from a broker push.

Rebalances are global stop-the-world events that scale with partition count, not consumer count.

Design for partition count to be a multiple of consumer count, and always use cooperative rebalancing in production.

thecodeforge.io

Apache Kafka Basics

The Append-Only Log: Why Kafka's Core Data Structure Changes Everything

Most engineers learn Kafka by learning its API. That's backwards. To really understand it, start with the log — a data structure so simple it's almost boring, yet so powerful it underpins everything from databases (WAL in PostgreSQL) to distributed consensus (Raft).

A Kafka topic is not a queue. It's a named, partitioned, append-only log. When a producer sends a message, Kafka appends it to the end of the log and assigns it a monotonically increasing integer: the offset. Consumers don't pop messages off — they read forward from an offset they track themselves. This is the first mental model shift you need to make.

Because the log is append-only and offset-indexed, reads are sequential I/O — the fastest kind of disk operation. Kafka exploits this aggressively with zero-copy I/O (sendfile syscall), meaning data moves from disk to the network socket without ever touching user-space memory. A single Kafka broker can saturate a 10Gbps NIC handling millions of messages per second, not because Kafka is magic, but because it's built around hardware's natural strengths.

Each partition is an independent log stored as a set of segment files on disk. The active segment is open for writes. Older segments are immutable and eligible for deletion or compaction based on your retention policy. Understanding this file structure is critical when you're debugging disk space alerts or tuning log compaction.

io/thecodeforge/kafka/kafka_topic_and_log_structure.shBASH

# ── Step 1: Create a topic with 3 partitions and replication factor 3 ──
# More partitions = more parallelism, but also more overhead.
# Rule of thumb: partitions = (target throughput MB/s) / (single-partition throughput MB/s)
kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create \
  --topic order-events \
  --partitions 3 \
  --replication-factor 3 \
  --config retention.ms=604800000    # 7 days retention
  --config segment.bytes=536870912   # Roll a new segment file at 512MB

# ── Step 2: Inspect the physical log files on the broker ──
# Each partition gets its own directory: <topic>-<partition-number>
ls -lh /var/kafka-logs/order-events-0/
# Expected output:
# 00000000000000000000.index   <- sparse offset index for fast seek
# 00000000000000000000.log     <- the actual message data, append-only
# 00000000000000000000.timeindex <- maps timestamps to offsets
# leader-epoch-checkpoint      <- tracks leader changes for safe consumer recovery

# ── Step 3: Dump raw messages from a partition segment to inspect offsets ──
# kafka-dump-log is your best friend for debugging corruption or offset gaps
kafka-dump-log.sh \
  --files /var/kafka-logs/order-events-0/00000000000000000000.log \
  --print-data-log | head -20

# ── Step 4: Check topic configuration and current low/high watermarks ──
# High watermark = last offset fully replicated to all ISR members
# Log end offset = last offset written (may not be replicated yet)
kafka-log-dirs.sh \
  --bootstrap-server localhost:9092 \
  --topic-list order-events \
  --describe | python3 -m json.tool

🔥Why the High Watermark Matters More Than You Think:

Consumers can only read up to the high watermark — not the log end offset. If your ISR (In-Sync Replica) set shrinks because a follower is slow, the high watermark stalls and consumer lag appears to spike even though producers are still writing. Check ISR sizes first when you see sudden unexplained consumer lag.

📊 Production Insight

Sequential disk writes are faster than random writes on SSDs — Kafka's log structure is engineered for this.

Zero-copy sendfile moves data from disk to network without touching user memory, eliminating a full copy.

Rule: Never fight Kafka's append-only nature. Random access patterns and message-level deletes will destroy your throughput.

🎯 Key Takeaway

Kafka's performance advantage is architectural, not magic — sequential append-only disk writes plus zero-copy I/O let a single broker saturate a 10Gbps NIC.

Fight this architecture with random access and you will lose throughput fast.

Rule: treat Kafka as an append-only log with immutable segments. Writes are free; overwrites and deletes are expensive.

Kafka Compaction Decision Tree

IfTopic stores immutable event data (logs, clicks, transactions)

→

UseUse deletion compaction: cleanup.policy=delete, retention by time or size. Each event is independent and never updated.

IfTopic stores changelog — latest state per key matters (user profiles, inventory counts)

→

UseUse log compaction: cleanup.policy=compact. Kafka retains only the latest value per key. Deleting a key requires adding a tombstone record with null value.

IfTopic has very low volume (< 1000 messages/day)

→

UseUse segment.bytes=107374182 (100MB) and retention.ms=86400000 (1 day) only. Compaction overhead is wasted.

IfSegment file corruption suspected or offsets missing

→

UseRun kafka-dump-log on the segment. Corrupt segment? Reassign replicas to force a healthy replica to become leader, then stop the corrupt broker and let it rebuild from the healthy replica.

IfBroker runs out of disk space despite retention policy

→

UseCheck for active segments that cannot be deleted. A partition with very low volume may never roll a segment. Force roll with kafka-log-dirs.sh or set log.roll.ms to a sane value (604800000 = 7 days).

Partitions, Consumer Groups, and the Parallelism Contract You Must Honour

Here's the rule that most engineers learn the hard way: within a consumer group, each partition is assigned to exactly one consumer at any moment. One consumer can read from multiple partitions, but one partition cannot be read by multiple consumers in the same group simultaneously. This is Kafka's ordering guarantee — and it's also your parallelism ceiling.

If you have a topic with 6 partitions and spin up 8 consumers in the same group, 2 of those consumers will sit completely idle. You've paid for 8 processes and get the throughput of 6. Conversely, if you have 6 partitions and 2 consumers, each consumer reads from 3 partitions — perfectly fine, just not maximally parallel.

Consumer group rebalancing is where production systems silently degrade. A rebalance happens when a consumer joins, leaves, or is considered dead by the group coordinator broker. During a rebalance, all consumption stops — this is the stop-the-world event of the Kafka world. A slow consumer that exceeds max.poll.interval.ms (default 5 minutes) triggers a rebalance even if the consumer process is healthy. This is one of the most misdiagnosed issues in Kafka production systems.

Kafka 2.4+ introduced Incremental Cooperative Rebalancing (the StickyAssignor strategy), which lets consumers release only the partitions they need to give up rather than all of them. This eliminates the full stop-the-world pause for most real rebalances. If you're on an older assignor, you're leaving throughput on the table.

io/thecodeforge/kafka/OrderEventConsumer.javaJAVA

100

101

102

103

104

105

106

107

package io.thecodeforge.kafka;

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.errors.WakeupException;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;

public class OrderEventConsumer {

    // Shared shutdown flag — set from a shutdown hook so the poll loop exits cleanly
    private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);

    public void startConsuming() {
        Properties consumerConfig = new Properties();

        consumerConfig.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker-1:9092,kafka-broker-2:9092");
        consumerConfig.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processing-service"); // Consumer group name
        consumerConfig.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerConfig.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // CRITICAL: How long between poll() calls before the broker declares this consumer dead.
        // If your business logic per batch takes > 5 min, raise this — or better, move processing off the poll thread.
        consumerConfig.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000"); // 5 minutes

        // CRITICAL: How many records to fetch per partition per poll() call.
        // Lower this if your processing is slow to avoid triggering max.poll.interval.ms
        consumerConfig.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");

        // Use CooperativeStickyAssignor to avoid full stop-the-world rebalances (Kafka 2.4+)
        consumerConfig.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
            CooperativeStickyAssignor.class.getName());

        // Disable auto-commit — we commit manually AFTER processing succeeds
        // Auto-commit is at-most-once: it can commit before your DB write finishes
        consumerConfig.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        // Start reading from the earliest available offset if no committed offset exists
        consumerConfig.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> orderConsumer = new KafkaConsumer<>(consumerConfig)) {

            // Register a ConsumerRebalanceListener to log rebalance events in production
            orderConsumer.subscribe(Collections.singletonList("order-events"),
                new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(java.util.Collection<org.apache.kafka.common.TopicPartition> partitions) {
                        // Called BEFORE partitions are taken away — commit offsets here to avoid reprocessing
                        System.out.printf("[REBALANCE] Revoking partitions: %s — committing offsets now%n", partitions);
                        orderConsumer.commitSync(); // Synchronous commit before giving partitions up
                    }

                    @Override
                    public void onPartitionsAssigned(java.util.Collection<org.apache.kafka.common.TopicPartition> partitions) {
                        System.out.printf("[REBALANCE] Assigned partitions: %s%n", partitions);
                    }
                });

            // Register shutdown hook so Ctrl+C triggers a clean wakeup instead of an abrupt kill
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                shutdownRequested.set(true);
                orderConsumer.wakeup(); // Causes the blocking poll() to throw WakeupException
            }));

            while (!shutdownRequested.get()) {
                try {
                    // poll() does two things: fetches records AND sends heartbeats to the group coordinator
                    ConsumerRecords<String, String> orderBatch = orderConsumer.poll(Duration.ofMillis(500));

                    for (ConsumerRecord<String, String> orderRecord : orderBatch) {
                        System.out.printf("Partition=%d | Offset=%d | Key=%s | Value=%s%n",
                            orderRecord.partition(),
                            orderRecord.offset(),       // Monotonically increasing per partition
                            orderRecord.key(),
                            orderRecord.value());

                        processOrder(orderRecord.value()); // Your actual business logic
                    }

                    if (!orderBatch.isEmpty()) {
                        // Synchronous commit: blocks until broker acknowledges.
                        // Guarantees at-least-once delivery — your processing must be idempotent.
                        orderConsumer.commitSync();
                        System.out.printf("Committed offsets for %d records%n", orderBatch.count());
                    }

                } catch (WakeupException shutdownSignal) {
                    // Expected during shutdown — exit the loop cleanly
                    if (!shutdownRequested.get()) throw shutdownSignal;
                    System.out.println("Shutdown signal received — exiting poll loop");
                }
            }
        }
    }

    private void processOrder(String orderJson) {
        // Simulate order processing — in production this writes to a DB, calls downstream services, etc.
        System.out.printf("Processing order: %s%n", orderJson);
    }

    public static void main(String[] args) {
        new OrderEventConsumer().startConsuming();
    }
}

⚠ Watch Out: The Silent Rebalance Loop

If your consumer group is stuck in a continuous rebalance loop, check max.poll.interval.ms vs your actual processing time per batch first. Then check session.timeout.ms — if your broker's group.min.session.timeout.ms is higher than your configured value, the broker silently ignores your setting and uses its minimum. Run kafka-consumer-groups.sh --describe and watch the STATE field: 'PreparingRebalance' appearing repeatedly is your smoking gun.

📊 Production Insight

Consumer rebalancing stops ALL consumption in the group. A 30-second rebalance on a 6-partition topic is 30 seconds of zero throughput.

The default RangeAssignor revokes all partitions from all consumers during any rebalance, even a single consumer join/leave.

Rule: upgrade to CooperativeStickyAssignor (Kafka 2.4+). It only moves the partitions that actually need to move. The stop-the-world pause disappears.

🎯 Key Takeaway

Partition count is your permanent parallelism ceiling — you can never have more active consumers than partitions in a group, and you cannot reduce partition count without creating a new topic and migrating data.

Over-provision partitions by at least 3x your current consumer count to leave room for growth.

Rule: partition count = max_throughput_MBps / single_partition_throughput_MBps, then multiply by 3. Err on the side of more partitions.

Consumer Lag Investigation Tree

IfLag is growing, all consumers at < 80% CPU

→

UseYou're I/O bound or network bound. Check consumer's downstream system latency (DB calls, HTTP requests). Increase fetch.max.bytes to get more data per poll. Add more consumers — but only if you have partitions available.

IfLag is growing, at least one consumer at 100% CPU

→

UseYou're CPU bound. Optimize message processing logic. Move to a compiled language for the consumer if currently in Python/Ruby. Partition the workload differently — low-cardinality keys cause hot partitions.

IfLag spikes every few minutes, then returns to zero

→

UseRebalance storm. Check max.poll.interval.ms vs processing time. Check group state with kafka-consumer-groups.sh. Switch to CooperativeStickyAssignor. Move heavy processing off the poll thread.

IfOne specific partition has much higher lag than others

→

UseHot partition. Check message keys — low-cardinality keys (country, region) create imbalance. Check producer key distribution with kafka-run-class GetOffsetShell per partition. Redesign key to include salt: composite key with higher cardinality.

IfLag is zero but downstream system sees no messages

→

UseCommitted offset moved without processing. Check auto-commit setting. If true, set to false. If false, verify commitSync() is called AFTER processing, not before. Look for exceptions between processing and commit.

thecodeforge.io

Apache Kafka Basics

Producer Durability vs Latency: The acks, retries, and idempotence Triangle

Every Kafka producer configuration is really a negotiation between three forces: how fast you want to send, how sure you want to be the message arrived, and how much you'll tolerate duplicates. Getting this wrong is expensive in production — either you lose data silently or you process orders twice.

The acks setting controls acknowledgement behaviour. acks=0 means fire and forget — fastest, no guarantees. acks=1 means the partition leader acknowledges — fast, but if the leader dies before replication, the message is lost. acks=all (or -1) means every in-sync replica (ISR) must acknowledge — durable, but only as strong as your min.insync.replicas setting.

Here's the trap: acks=all with min.insync.replicas=1 is effectively the same as acks=1, because only one replica needs to be in sync. The correct production configuration is acks=all combined with min.insync.replicas=2 on a cluster with replication factor 3. This tolerates one broker failure with zero data loss.

Idempotent producers (enable.idempotence=true), available since Kafka 0.11, solve duplicate delivery during retries. The broker assigns each producer a PID (Producer ID) and tracks a sequence number per partition. If a retry arrives with the same sequence number, the broker deduplicates it. This gives you exactly-once semantics at the producer level with zero application-side logic. Always enable this in production — the overhead is negligible.

io/thecodeforge/kafka/DurableOrderEventProducer.javaJAVA

100

101

102

103

package io.thecodeforge.kafka;

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class DurableOrderEventProducer {

    public static void main(String[] args) throws ExecutionException, InterruptedException {

        Properties producerConfig = new Properties();

        producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker-1:9092,kafka-broker-2:9092,kafka-broker-3:9092");
        producerConfig.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerConfig.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // ── DURABILITY CONFIG ──────────────────────────────────────────────────
        // Require ALL in-sync replicas to acknowledge before the write is confirmed.
        // Combined with min.insync.replicas=2 on the broker/topic, this means
        // the write survives a single broker failure.
        producerConfig.put(ProducerConfig.ACKS_CONFIG, "all");

        // Enable idempotent producer: assigns this producer a unique PID.
        // Broker deduplicates retried messages using PID + partition + sequence number.
        // REQUIRES: acks=all, retries > 0, max.in.flight.requests.per.connection <= 5
        producerConfig.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        // How many times to retry a failed send before giving up.
        // With idempotence enabled, retries are safe — no duplicates.
        producerConfig.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);

        // Max unacknowledged requests per connection.
        // Must be <= 5 when idempotence is enabled.
        // Higher values increase throughput but with idempotence, 5 is the max safe value.
        producerConfig.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");

        // ── THROUGHPUT / LATENCY TUNING ───────────────────────────────────────
        // linger.ms: How long to wait for more messages before sending a batch.
        // 0 = send immediately (low latency, small batches).
        // 20 = wait 20ms, batch more records, fewer network round-trips (higher throughput).
        producerConfig.put(ProducerConfig.LINGER_MS_CONFIG, "20");

        // batch.size: Max bytes per batch per partition before sending regardless of linger.ms.
        // Default is 16KB — increasing to 64KB or 128KB dramatically improves throughput
        // when producing at high volume.
        producerConfig.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536"); // 64KB

        // compression.type: Compresses batches before sending. 'snappy' is CPU-light.
        // 'lz4' is faster. 'zstd' gives best compression ratio (Kafka 2.1+).
        // Compression happens client-side — reduces network and disk usage.
        producerConfig.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");

        // buffer.memory: Total memory the producer uses to buffer records before sending.
        // If the buffer fills up (broker is slow), send() will block for max.block.ms,
        // then throw a TimeoutException.
        producerConfig.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "67108864"); // 64MB

        try (KafkaProducer<String, String> orderProducer = new KafkaProducer<>(producerConfig)) {

            String orderId = "orderId-9821";
            String orderPayload = "{\"status\":\"PLACED\",\"amount\":149.99,\"customerId\":\"cust-4477\"}";

            // Using the orderId as the message key ensures all events for the same order
            // always land on the same partition — preserving per-order event ordering.
            ProducerRecord<String, String> orderRecord = new ProducerRecord<>(
                "order-events",   // topic
                orderId,          // key — determines partition via murmur2 hash
                orderPayload      // value
            );

            // ── OPTION A: Async send with callback (preferred for high throughput) ──
            orderProducer.send(orderRecord, (metadata, sendException) -> {
                if (sendException != null) {
                    // In production: push to a dead-letter topic or alert on-call
                    System.err.printf("FAILED to send order %s: %s%n", orderId, sendException.getMessage());
                } else {
                    System.out.printf("[ASYNC] Sent to topic=%s partition=%d offset=%d timestamp=%d%n",
                        metadata.topic(),
                        metadata.partition(),
                        metadata.offset(),
                        metadata.timestamp());
                }
            });

            // ── OPTION B: Sync send — blocks until ack or exception ──
            // Use when you MUST know the write succeeded before proceeding (e.g., financial transactions)
            try {
                RecordMetadata syncMetadata = orderProducer.send(orderRecord).get(); // Blocks here
                System.out.printf("[SYNC] Confirmed at partition=%d offset=%d%n",
                    syncMetadata.partition(),
                    syncMetadata.offset());
            } catch (ExecutionException kafkaSendError) {
                System.err.println("[SYNC] Send failed: " + kafkaSendError.getCause().getMessage());
            }

            // flush() ensures all buffered messages are sent before the try-with-resources closes the producer
            orderProducer.flush();
        }
    }
}

💡Pro Tip: Partition Key Design Is a Throughput Decision

Using a high-cardinality key like orderId distributes load evenly across partitions — ideal. Using a low-cardinality key like region or country creates hot partitions: one partition gets 80% of traffic while others sit idle. If you're seeing one partition's offset lag growing while others are flat, suspect a hot-key problem. Consider a composite key (region + customerId) or a custom partitioner that salts the key.

📊 Production Insight

acks=all with min.insync.replicas=1 is a false sense of security. Lose the leader and writes fail.

Idempotent producers (enable.idempotence=true) add 2-5% latency overhead but eliminate duplicate delivery during retries.

Rule: For financial data: replication factor=3, min.insync.replicas=2, acks=all, enable.idempotence=true. For analytics logs: acks=1, compression=zstd, and accept occasional loss.

🎯 Key Takeaway

acks=all is necessary but not sufficient for durability — you must also set min.insync.replicas=2 at the topic level.

Without MISR=2, acks=all with only one in-sync replica is functionally identical to acks=1.

Rule: Always set enable.idempotence=true in production. The overhead is small and eliminated duplicates are worth it.

Producer Configuration Strategy

IfFinancial transactions, inventory updates, any data where loss is unacceptable

→

Useacks=all, min.insync.replicas=2 (topic config), replication factor=3, enable.idempotence=true, retries=MAX_INT

IfBusiness metrics, analytics, logs — loss is tolerable, throughput is priority

→

Useacks=1, enable.idempotence=false, compression=lz4 or zstd, linger.ms=50 for batch efficiency

IfDevelopment/testing environment

→

Useacks=0, no compression, no idempotence. Speed over durability — data loss is acceptable in non-production.

IfMessages must be processed in strict order per key (e.g., user events)

→

Usemax.in.flight.requests.per.connection=1. This serialises all requests, killing throughput. Alternative: enable.idempotence=true (limits to 5 in-flight) which is usually sufficient for ordering.

IfProducer buffer filling frequently — send() blocks

→

UseIncrease buffer.memory from 32MB to 128MB. Increase max.block.ms to 120000. Check if broker is slow — if so, increase broker's num.io.threads.

Replication, ISR, and What Actually Happens When a Broker Dies

Replication in Kafka is leader-follower. Each partition has one leader and N-1 followers spread across brokers. Producers and consumers always talk to the leader — followers exist purely for fault tolerance. The leader tracks which followers are caught up: if a follower hasn't fetched within replica.lag.time.max.ms (default 30 seconds), it's removed from the ISR. When the leader fails, the controller broker (elected via Zookeeper or KRaft) promotes one of the ISR members to leader.

This is where min.insync.replicas (often called MISR) becomes critical. With replication factor 3 and min.insync.replicas=2, you can lose one broker and still accept writes. Lose two brokers and the topic partition becomes unavailable for writes — producers get a NotEnoughReplicasException. This is intentional: Kafka chooses consistency over availability in this scenario, which is the correct trade-off for financial data.

Unclean leader election (unclean.leader.election.enable) is the escape hatch. If all ISR members are dead and this setting is true, Kafka will elect an out-of-sync replica as leader — potentially losing committed messages. This defaults to false in Kafka 0.11+ for exactly this reason. Only enable it for topics where availability beats data integrity.

Kafka 2.8+ introduced KRaft mode, replacing Zookeeper with a Raft-based metadata quorum built into Kafka itself. This eliminates the operational complexity of running a separate Zookeeper ensemble and dramatically speeds up controller failover — from tens of seconds to under a second.

io/thecodeforge/kafka/kafka_replication_diagnostics.shBASH

# ── Check ISR status for all partitions of a topic ──
# Under-replicated partitions (ISR < replication factor) are a red alert
kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --topic order-events

# Expected healthy output:
# Topic: order-events  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
# Topic: order-events  Partition: 1  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
# Topic: order-events  Partition: 2  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2

# ── Simulate broker 2 going down, then check ISR shrinkage ──
# After replica.lag.time.max.ms (30s), broker 2 is removed from ISR
kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --topic order-events

# Output after broker 2 failure:
# Topic: order-events  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,3  <-- broker 2 dropped!
# Topic: order-events  Partition: 1  Leader: 3  Replicas: 2,3,1  Isr: 3,1  <-- leader elected from ISR
# Topic: order-events  Partition: 2  Leader: 3  Replicas: 3,1,2  Isr: 3,1

# ── Check consumer group lag — the most important production health metric ──
# LAG = LOG-END-OFFSET minus CURRENT-OFFSET per partition
# Growing lag means consumers can't keep up with producer throughput
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --group order-processing-service

# Healthy output (lag near zero):
# GROUP                    TOPIC        PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID
# order-processing-service order-events 0          1047            1047            0    consumer-1-uuid
# order-processing-service order-events 1          892             892             0    consumer-2-uuid
# order-processing-service order-events 2          1103            1103            0    consumer-1-uuid

# ── Set min.insync.replicas at the topic level (overrides broker default) ──
# This is the safest place to set it — per topic, not globally
kafka-configs.sh \
  --bootstrap-server localhost:9092 \
  --alter \
  --entity-type topics \
  --entity-name order-events \
  --add-config min.insync.replicas=2

# ── Verify the config was applied ──
kafka-configs.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --entity-type topics \
  --entity-name order-events

⚠ Watch Out: Under-Replicated Partitions Are a Time Bomb

An under-replicated partition isn't a crisis on its own — but it means you've lost your fault-tolerance buffer. If the remaining replica goes down while you're under-replicated, that partition becomes offline and you lose write availability. Set up a Prometheus alert on kafka_server_replicamanager_underreplicatedpartitions > 0 and treat it as P1 regardless of time of day.

📊 Production Insight

ISR (In-Sync Replica) shrinkage happens silently. One slow broker can drop from ISR without any alert.

min.insync.replicas is a topic-level config, not a broker default. Set it per topic based on data criticality.

Rule: Monitor under-replicated partitions with a P1 alert. If ISR size < replication factor for more than 5 minutes, investigate immediately. An under-replicated partition cannot survive another broker loss.

🎯 Key Takeaway

Replication factor sets the total copies. ISR tracks copies that are fully caught up.

min.insync.replicas sets the minimum ISR size required for writes — the true durability knob.

Rule: For production critical topics: RF=3, MISR=2. For analytics: RF=2, MISR=1. Never run RF=1 in production unless you accept data loss.

Replication and ISR Decision Tree

IfTopic contains financial or transactional data where loss is unacceptable

→

Usereplication.factor=3, min.insync.replicas=2. Tolerates one broker failure without downtime or data loss. Can survive with one remaining replica but no further failures.

IfTopic contains logs or metrics where loss is tolerable, availability matters more

→

Usereplication.factor=2, min.insync.replicas=1. Faster writes, slightly lower durability. Data loss possible if the leader fails before replication to the single follower.

IfISR size < replication.factor consistently — one follower always lagging

→

UseIncrease replica.lag.time.max.ms from 30s to 60s. Monitor follower disk I/O and network latency. If a broker is permanently slow, replace it or move its replicas to another broker.

IfYou have a single broker in production (development only)

→

Usereplication.factor=1, min.insync.replicas=1. No fault tolerance. Any broker failure causes data loss and downtime. Do not run production workloads on a single broker.

IfMultiple brokers down simultaneously

→

UseIf down brokers exceed (replication.factor - min.insync.replicas), writes will fail with NotEnoughReplicasException. This is intentional. Kafka chooses consistency over availability. To restore, bring brokers back online or (dangerously) set min.insync.replicas=1 temporarily.

Leader Election: Why ZooKeeper Isn't the Bottleneck You Think (But Your Controller Configuration Is)

Every Kafka cluster has exactly one controller broker. That controller is responsible for electing new partition leaders when a broker dies. The election itself is fast — ZooKeeper handles the ephemeral znode check, and the controller picks the first in-sync replica from the ISR list. That decision happens in milliseconds.

What kills you is the controller failover when the controller broker itself goes down. All brokers compete for the /controller znode in ZooKeeper. The winner becomes the new controller and immediately processes pending partition reassignments. If you have thousands of partitions, that batch operation can block metadata requests from producers and consumers for seconds.

You tune zookeeper.session.timeout.ms to detect failures faster. You set auto.leader.rebalance.enable=true only if you want the controller to continuously push leaders back to preferred replicas — which adds load. The real optimization? Keep your partition count sane. A broker with 10,000 partitions is a limp hand grenade. Start raising alarms at 4,000 per broker.

LeaderElectionConfig.sqlSQL

// io.thecodeforge — database tutorial

// Check the current controller broker
SELECT * FROM kafka_broker_roles WHERE role = 'controller';
-- Output: broker_id=2, host=kafka-prod-02.internal

// View partition leader distribution
SELECT topic, partition, leader, replicas, isr
FROM kafka_partition_state
WHERE topic = 'order_events';

// Sample output for a partition with a recent leader change
-- topic: order_events
-- partition: 7
-- leader: 3
-- replicas: [1,2,3]
-- isr: [2,3]
// Notice leader=3 but broker 1 is out of sync. Leader election picked 3 from ISR.

// Tune controller failover detection (server.properties)
-- zookeeper.session.timeout.ms=6000
-- zookeeper.connection.timeout.ms=18000
-- controlled.shutdown.enable=true

Output

broker_id | host

----------|-------------------

2 | kafka-prod-02.internal

topic | partition | leader | replicas | isr

--------------|-----------|--------|-----------|-------

order_events | 7 | 3 | [1,2,3] | [2,3]

⚠ Production Trap:

Setting zookeeper.session.timeout.ms below 6 seconds on a cluster under heavy load triggers false-positive broker removals. The controller kicks out a broker that's just GC-pausing. You lose partitions, you lose availability. Measure your broker GC pause times first.

🎯 Key Takeaway

The controller is a single point of coordination. Keep partition count under 4,000 per broker, tune session timeout to survive GC, and never underestimate the cost of a controller failover cascade.

Consumer Rebalancing: The Silent Throttle That Wastes Your Parallelism Gains

You set up a consumer group with 12 consumers, you have 12 partitions, you expect 12-way parallelism. Then the group coordinator (one of the brokers) triggers a rebalance because a consumer’s heartbeat timed out. For the duration of the rebalance — 'stop the world' — every consumer in that group stops processing. All 12 partitions go dark.

The new consumer protocol (cooperative rebalancing) fixes the worst of this. Instead of revoking all partitions, it revokes a subset. Consumers keep processing the partitions they still own. You enable it with partition.assignment.strategy=CooperativeStickyAssignor. But here is the catch: the coordinator still serializes assignment changes. Your consumers are idle during the assignment computation.

The hard truth: rebalancing frequency is a function of consumer instability. Short-lived consumers, consumer crashes, or network blips trigger cascading rebalances. Keep session.timeout.ms at 45 seconds and heartbeat.interval.ms at 3 seconds. Don't let a misconfigured health check kill your consumer every 30 seconds. Each rebalance costs your entire group its throughput for a window.

Also, monitor kafka.consumer:type=consumer-coordinator-metrics for rebalance rate and time since last rebalance. If you see more than one rebalance per minute, you have a problem.

ConsumerRebalanceMetrics.sqlSQL

// io.thecodeforge — database tutorial

// Query consumer group state from Kafka's internal __consumer_offsets topic
SELECT group_id, state, partition, current_offset, log_end_offset
FROM kafka_group_state
WHERE group_id = 'order_processor_v2';

-- Example state: 'Stable' or 'PreparingRebalance'
// If state is 'PreparingRebalance', consumers are NOT processing.

// Check rebalance rate in the last 5 minutes (JMX metric)
-- Metric: kafka.consumer:type=consumer-coordinator-metrics
-- Attribute: rebalance-rate
-- Value: 2.3 (rebalances per minute) => BAD

// Sample output:
-- group_id: order_processor_v2
-- state: Stable
-- partition: 4
-- current_offset: 1423508
-- log_end_offset: 1423512
// This partition is nearly caught up. Good.

// But if state flips to PreparingRebalance, offsets stall.

Output

group_id | state | partition | current_offset | log_end_offset

--------------------|----------|-----------|----------------|----------------

order_processor_v2 | Stable | 4 | 1423508 | 1423512

order_processor_v2 | Stable | 5 | 890123 | 890125

order_processor_v2 | Stable | 6 | 2045678 | 2045682

💡Senior Shortcut:

Wrap your consumer's poll loop in a graceful shutdown hook. Send a 'leave group' request before the consumer terminates. This triggers a quick, cooperative rebalance instead of waiting for session timeout. Saves your team 5–15 seconds of downtime per consumer restart.

🎯 Key Takeaway

A rebalance is a global bottleneck for your consumer group. Use cooperative sticky assignor, keep consumers alive, and treat rebalance rate as an SLO.

Kafka CLI Basics: The Commands That Actually Matter in Production

You don't need a UI to run Kafka—you need a terminal and three commands that will diagnose 90% of your cluster issues before they hit production. The Kafka CLI tools are ugly, but they're fast and they don't lie.

Start with kafka-console-producer to blast test data into a topic: it's your canary. Then kafka-console-consumer with --from-beginning to verify everything arrived—or your consumer group isn't stuck. The real money is kafka-consumer-groups --describe. That single command shows you lag per partition. If lag grows, you have a consumer bottleneck or a partition count mismatch.

Don't fall for the trap of using kafka-topics --describe to check health. That's metadata, not data. The logs tell the truth. If you're debugging, grep for "LEADER_NOT_AVAILABLE" or "NOT_ENOUGH_REPLICAS" in broker logs. That's where production fires start.

Kafka_Basic_Ops.sqlSQL

// io.thecodeforge — database tutorial

// Create a topic with 3 partitions, replication factor 2
kafka-topics \
  --bootstrap-server broker1:9092 \
  --create \
  --topic orders_raw \
  --partitions 3 \
  --replication-factor 2

// Produce a single record with a key for co-partitioning
echo "order_id:abc123,event:created" | \
  kafka-console-producer \
  --bootstrap-server broker1:9092 \
  --topic orders_raw \
  --property parse.key=true \
  --property key.separator=:

// Describe consumer lag to find slow partitions
kafka-consumer-groups \
  --bootstrap-server broker1:9092 \
  --group order_processor \
  --describe

Output

GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG

order_processor orders_raw 0 1420 1420 0

order_processor orders_raw 1 1389 1390 1

order_processor orders_raw 2 1411 1411 0

⚠ Production Trap:

Never use kafka-console-consumer with --group in production unless you intend to join that consumer group. You'll mess up offset commits and cause rebalances on real services.

🎯 Key Takeaway

kafka-consumer-groups --describe shows partition lag—the only metric that tells you if your consumers are falling behind.

Scalability: Adding Partitions Does Not Magically Speed You Up

Everyone thinks "more partitions = more throughput." That's a half-truth that will cost you. Kafka partitions are the unit of parallelism—each partition can only be consumed by one consumer in a group. So if you have 3 partitions and 10 consumers, 7 sit idle. The bottleneck isn't partitions—it's the number of consumer threads that can actually process data concurrently.

Here's the real math: max throughput = (number of partitions) × (single partition throughput). Single partition throughput is bounded by disk I/O and network bandwidth on that broker. Adding partitions spreads load across brokers if your topic is well-distributed, but adding partitions on a single broker just adds seek contention.

Production rule: set partition count to at least 2× your peak expected consumer count. Overshooting wastes broker resources—each partition is a file with leader election overhead. And never use kafka-topics --alter to increase partitions on a keyed topic unless you understand that keys will no longer map to the same partitions. Your joins will silently break.

Scalability_Check.sqlSQL

// io.thecodeforge — database tutorial

// Check per-broker partition distribution
kafka-topics \
  --bootstrap-server broker1:9092 \
  --describe \
  --topic orders_raw

// Output shows partition-to-broker mapping
Topic: orders_raw  PartitionCount: 3  ReplicationFactor: 2
  Topic: orders_raw  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
  Topic: orders_raw  Partition: 1  Leader: 2  Replicas: 2,3  Isr: 2,3
  Topic: orders_raw  Partition: 2  Leader: 3  Replicas: 3,1  Isr: 3,1

// Simulate adding 3 more partitions (DANGER if keyed!)
kafka-topics \
  --bootstrap-server broker1:9092 \
  --alter \
  --topic orders_raw \
  --partition-count 6

Output

WARNING: If partitions are increased for a topic that uses a key, the partition logic will change. Existing keys will NOT be reassigned.

⚠ Senior Shortcut:

If you need more throughput, first increase consumer parallelism (threads per instance or more instances). Only increase partitions if your consumer group already has idle instances and partitions are the bottleneck.

🎯 Key Takeaway

Throughput = partitions × per-partition rate. Add consumers, not partitions, unless partitions are fully saturated.

Log Aggregation: Kafka The Unsexy Firehose That Replaced Syslog

Log aggregation was the original Kafka use case—LinkedIn built it to replace a mountain of point-to-point syslog pipes. Every service dumped structured logs to a single Kafka topic, and consumers parsed, filtered, and stored them to HDFS, Elasticsearch, or S3. It worked because Kafka decoupled producers (anything that logs) from consumers (anything that stores or alerts).

The pattern is dead simple: each service writes to a topic like logs_app_ with a JSON payload containing timestamp, level, service, trace_id, and message. Your consumers read from the topic, filter on log level, and route to the right sink. Need to add a new monitoring tool? Spin up a new consumer group. No changes to producers.

Here's the trap: Kafka is fast, but log aggregation at scale means high throughput with small messages. Each log line is maybe 500 bytes. Batching matters. Tune batch.size and linger.ms on producers to avoid a flood of tiny I/O operations. Otherwise your brokers thrash on disk writes and your aggregation pipeline collapses under metadata overhead.

Log_Aggregation_Producer.sqlSQL

// io.thecodeforge — database tutorial

// Producer config for log aggregation (batch-heavy)
producer:
  bootstrap.servers: "broker1:9092,broker2:9092"
  acks: "1"                  # tolerate rare loss for throughput
  batch.size: 65536          # 64KB batches for small log messages
  linger.ms: 100             # wait up to 100ms to fill a batch
  compression.type: "gzip"  # logs compress 10:1 on text
  key.serializer: "org.apache.kafka.common.serialization.StringSerializer"
  value.serializer: "org.apache.kafka.common.serialization.StringSerializer"

// Single log record (keyed by service name for order per service)
Key: "auth-service"
Value: "{\"ts\":\"2025-03-15T10:30:00Z\",\"level\":\"ERROR\",\"service\":\"auth-service\",\"trace_id\":\"xyz789\",\"message\":\"DynamoDB timeout on user lookup\"}"

Output

[2025-03-15 10:30:01] Produced to partition 2, offset 14200: auth-service | ERROR | DynamoDB timeout

💡Production Trap:

Don't use acks=all for log aggregation. Losing a few log lines is acceptable. Acks=all kills throughput. Set acks=1, compress with gzip, and accept eventual delivery.

🎯 Key Takeaway

Kafka log aggregation = one topic, many consumers. Batch aggressively and compress—logs are data you can sacrifice for speed.

Real-World Applications Beyond Log Aggregation

Kafka's append-only log and replayability make it the backbone for mission-critical systems far beyond simple log shipping. Financial institutions use Kafka for real-time fraud detection — each transaction is a message consumed by multiple applications: one checks velocity limits, another cross-references blacklists, and a third updates risk scores. E-commerce platforms rely on Kafka for order processing state machines. When a customer places an order, Kafka streams events through inventory reservation, payment authorization, and shipping dispatch, all in separate consumer groups. Healthcare systems use Kafka for patient monitoring data pipelines, where sensor readings from thousands of devices must be processed in real-time to alert clinicians. The key pattern is the decoupling of producers from consumers — each system writes its events once, and any number of downstream systems replay or filter those events independently. Kafka replaces fragile point-to-point integrations with a durable, ordered event backbone that scales horizontally. Any system where data loss is unacceptable and multiple consumers need the same stream is a candidate for Kafka.

kafka_use_case_order_tracking.sqlSQL

// io.thecodeforge — database tutorial

-- Simulated Kafka order lifecycle stream
CREATE STREAM orders (
  order_id STRING KEY,
  user_id STRING,
  product_id STRING,
  amount DOUBLE,
  status STRING
) WITH (
  kafka_topic = 'orders',
  value_format = 'JSON',
  partitions = 6
);

-- Consumer: fraud detection
SELECT order_id, user_id, amount
FROM orders
WHERE status = 'PENDING'
  AND amount > 10000
EMIT CHANGES;

Output

order_id=P001, user_id=U99, amount=15000.0

⚠ Production Trap:

Don't put all business logic into a single consumer. Split responsibilities across consumer groups to avoid coupling failure modes — a slow fraud check should not delay payment processing.

🎯 Key Takeaway

Kafka excels when multiple independent systems need to consume the same event stream without polling or tight coupling.

Advanced Kafka Features That Save Your Architecture

Most engineers stop at basic produce/consume patterns, but Kafka's advanced features solve real production headaches. Idempotent producers, enabled via enable.idempotence=true, guarantee exactly-once semantics for writes — no duplicate messages even during retries, which prevents financial double-counting. Transactional messaging extends this across multiple partitions: wrap a batch of messages in a transaction so they all commit or none do, critical for atomic updates across topics. Compacted topics retain only the latest value per key, not the full history — ideal for restoring state from a changelog without reprocessing old events. Tiered storage offloads older log segments to cheaper object stores (S3, HDFS), reducing broker disk pressure while keeping data accessible. KSQL (Kafka Streams SQL) lets you define streaming joins, aggregations, and windowing with plain SQL, saving you from writing Java Streams code for simple transformations. The MirrorMaker 2 pattern enables cross-datacenter replication with active-active topologies, but beware of cyclic replication loops. These features exist because real systems need transactions, state restoration, and infinite retention without crashing brokers.

kafka_transactional_producer.sqlSQL

// io.thecodeforge — database tutorial

-- Kafka transactional producer setup (conceptual SQL)
CREATE PRODUCER transaction_producer WITH (
  transactional.id = 'order-payment-xact',
  enable.idempotence = true
);

BEGIN TRANSACTION;
INSERT INTO orders (order_id, status) VALUES ('O123', 'PAID');
INSERT INTO payments (order_id, amount) VALUES ('O123', 49.99);
COMMIT;

Output

Transaction committed: order O123 and payment 49.99 visible atomically.

⚠ Production Trap:

Transactional producers require acks=all and min.insync.replicas=2. Without proper replication, a single broker failure can abort uncommitted transactions.

🎯 Key Takeaway

Enable idempotence and transactional producers to guarantee exactly-once semantics for critical financial and order processing streams.

Activity Tracking at Scale: Kafka's Original Purpose

Before Kafka became the universal event bus, LinkedIn used it to solve a simple problem: track every user action — page views, clicks, likes, shares — and make that data available to dozens of systems without overwhelming databases. This is still one of Kafka's most powerful yet underused applications. Each user action is a message in an activity topic partitioned by user ID. Multiple consumers read the same stream for different purposes: one writes to a real-time dashboard for active user counts, another feeds a recommendation engine, a third sends data to a data lake for weekly analytics. The key insight is that Kafka decouples the capture of activity from its consumption. You never worry about backpressure from a slow consumer because Kafka buffers messages on disk. You can add new consumers years later and replay past activity to backfill models — a capability impossible with traditional pub/sub systems. Activity tracking demands high throughput (millions of events per second) and low latency (sub-second from click to dashboard). Kafka handles both when you configure sufficient partitions (at least 50 per broker) and use batching wisely. Every web application generating user events should consider Kafka before building custom polling solutions.

kafka_activity_stream.sqlSQL

// io.thecodeforge — database tutorial

CREATE STREAM user_activity (
  user_id STRING KEY,
  event_type STRING,
  page STRING,
  timestamp BIGINT
) WITH (
  kafka_topic = 'user_activity',
  partitions = 12,
  value_format = 'AVRO'
);

-- Realtime active user count
SELECT user_id, COUNT(*) AS actions
FROM user_activity
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY user_id
HAVING COUNT(*) > 10
EMIT CHANGES;

Output

user_id=U42, actions=15 (window 2024-03-15T10:00:00Z)

⚠ Production Trap:

Do not use single-character event type names ('v' for view). They save bytes but make debugging impossible. Always use descriptive, human-readable event types.

🎯 Key Takeaway

Activity tracking is Kafka's original use case — capture once, consume by many, replay anytime without database overload.

Installation: Kafka Without the Pain

Installing Kafka isn't about running a script and forgetting it; it's about understanding the moving parts that make your cluster reliable. Why does the installation method matter? Because your environment dictates how ZooKeeper, brokers, and your CLI tools communicate. For macOS, use Homebrew: brew install kafka. This pulls both ZooKeeper and the broker binaries. On Windows, avoid native installs—use WSL2 for Linux parity. Install Java 11+, then download Kafka from Apache. On Linux, extract the tarball and set KAFKA_HOME. The key insight: install ZooKeeper separately (brew or manual) because Kafka's controller is still dependent on it despite KIP-500 progress. Always verify with kafka-broker-api-versions.sh --bootstrap-server localhost:9092. A failed install often means Java version mismatch or ZooKeeper not starting first—check logs, not assumptions.

install_kafka.shSQL

// io.thecodeforge — database tutorial
// 25 lines max
#!/bin/bash
# macOS
brew install kafka
brew services start zookeeper
brew services start kafka
# Linux/WSL2
wget https://dlcdn.apache.org/kafka/3.5.1/kafka_2.13-3.5.1.tgz
tar -xzf kafka_2.13-3.5.1.tgz
cd kafka_2.13-3.5.1
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &
echo "Verify: bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092"

Output

Apache Kafka 3.5.1 is ready.

Waiting for broker...

Broker started on port 9092.

⚠ Production Trap:

Don't use Homebrew Kafka in production—it hides config files in cellar paths. Always use explicit tarball installs for predictable control over log retention and heap settings.

🎯 Key Takeaway

Install Kafka by matching tools to your OS, then validate with broker API checks.

Starting Kafka: The CLI Lifecycle You Must Own

Starting Kafka isn't a single command—it's a chain of decisions that affect durability and replication. Why start this way? Because mixing startup order causes phantom leader failures. On WSL2 and Linux, begin with ZooKeeper: bin/zookeeper-server-start.sh config/zookeeper.properties. It must bind to 0.0.0.0:2181 for cross-container access. Next, start the Kafka broker: bin/kafka-server-start.sh config/server.properties. Tweak log.dirs to a persistent mount (e.g., /data/kafka), not /tmp. For WSL2, use /mnt/c/kafka-logs to survive restarts. After startup, test with bin/kafka-topics.sh --create --topic test --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1. If you see Connection refused, check that ZooKeeper is alive first. The silent killer? advertised.listeners misconfiguration in server.properties—set it to your WSL2 IP (found via ip addr show eth0) or localhost for single-broker dev. Only then does your CLI toolchain work reliably.

start_kafka_cli.sqlSQL

// io.thecodeforge — database tutorial
// 25 lines max
-- Terminal 1: ZooKeeper first
!ZooKeeper on WSL2
bin/zookeeper-server-start.sh config/zookeeper.properties
-- Terminal 2: Kafka broker
bin/kafka-server-start.sh config/server.properties
-- Verify
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
-- Create topic
bin/kafka-topics.sh --create --topic orders --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092
-- Status
bin/kafka-topics.sh --describe --topic orders --bootstrap-server localhost:9092

Output

Topic: orders PartitionCount: 3 ReplicationFactor: 1

Configs: segment.bytes=1073741824

Topic: orders Partition: 0 Leader: 0 Replicas: 0 Isr: 0

Topic: orders Partition: 1 Leader: 0 Replicas: 0 Isr: 0

Topic: orders Partition: 2 Leader: 0 Replicas: 0 Isr: 0

⚠ Production Trap:

Starting Kafka without setting advertised.listeners in server.properties will make your WSL2 consumers fail silently—they can't reach the broker's internal IP.

🎯 Key Takeaway

Start ZooKeeper before Kafka, tune listeners for WSL2, and validate with topic commands.

Console CLI Tools: Producers, Consumers, and Groups

The kafka-console-producer and kafka-console-consumer are your surgical instruments for testing message flow and group behavior. Why use them before code? Because they isolate network and config issues from application logic. Produce a message: echo "order:123" | bin/kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092. The --property parse.key=true --property key.separator=: adds key semantics—without it, you get null keys, killing partition strategy. For consumers, use --group my-group to enable group management: bin/kafka-console-consumer.sh --topic orders --group my-group --from-beginning --bootstrap-server localhost:9092. To inspect group state, use kafka-consumer-groups.sh --group my-group --describe --bootstrap-server localhost:9092. This shows lag (difference between LOG-END-OFFSET and CURRENT-OFFSET). High lag means your consumer is throttled. The trap: --from-beginning with an existing group only works if the group has no committed offsets—reset with --reset-offsets --to-earliest --execute. Master these commands, and you debug Kafka blindfolded.

kafka_cli_demo.sqlSQL

// io.thecodeforge — database tutorial
// 25 lines max
-- Produce with key
bin/kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:
>order:123:confirmed
-- Consume in group
bin/kafka-console-consumer.sh --topic orders --group my-group --bootstrap-server localhost:9092 --from-beginning --property print.key=true --property print.partition
-- Describe group
bin/kafka-consumer-groups.sh --group my-group --describe --bootstrap-server localhost:9092
GROUP           TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
my-group        orders 0          1               2              1
my-group        orders 1          0               1              1

Output

Key: order:123 Partition: 1 Value: confirmed

Consumer group 'my-group' has lag: 2 messages across all partitions.

⚠ Production Trap:

Running --from-beginning on a group with committed offsets does nothing—you must reset offsets via --reset-offsets to reprocess old data.

🎯 Key Takeaway

Use console tools to test keys, groups, and lag—they reveal partition and network issues before any code runs.

Kafka 3.x/4.x Features: KRaft, Elastic Scaling, Tiered Storage

Kafka 3.x introduced KRaft (Kafka Raft), removing the dependency on ZooKeeper for metadata management. This simplifies deployment and improves scalability by using a Raft-based consensus protocol within Kafka itself. In Kafka 4.x, KRaft becomes mandatory, enabling elastic scaling where brokers can be added or removed without full cluster restarts. Tiered Storage, also introduced in 3.x, allows older log segments to be offloaded to cheaper object storage (e.g., S3, GCS) while retaining the ability to consume them. This reduces local storage costs and enables retention of massive datasets. For example, you can configure a topic with local.retention.ms=86400000 (1 day local) and remote.storage.enable=true to keep data in S3 for 30 days. KRaft also improves controller failover speed, reducing rebalance storms. To use KRaft, generate a cluster ID and format storage directories: ./bin/kafka-storage.sh format -t -c config/kraft/server.properties. Then start the controller and broker processes separately or combined. Tiered Storage requires configuring the remote storage system in server.properties and enabling it per topic. These features are critical for modern Kafka deployments, reducing operational overhead and enabling cost-effective long-term data retention.

kraft-setup.shSQL

-- Generate a unique cluster ID
./bin/kafka-storage.sh random-uuid

-- Format storage directories with the cluster ID
./bin/kafka-storage.sh format -t <uuid> -c config/kraft/server.properties

-- Start the controller (if separate)
./bin/kafka-server-start.sh config/kraft/controller.properties

-- Start the broker
./bin/kafka-server-start.sh config/kraft/broker.properties

-- Enable tiered storage for a topic
./bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config 'remote.storage.enable=true,local.retention.ms=86400000'

🔥KRaft is mandatory in Kafka 4.x

📊 Production Insight

In production, use KRaft with at least 3 controllers for high availability. For Tiered Storage, monitor remote storage latency and ensure network bandwidth is sufficient to avoid consumer lag.

🎯 Key Takeaway

KRaft removes ZooKeeper dependency, enabling simpler and more scalable clusters, while Tiered Storage reduces local storage costs by offloading old data to object storage.

Kafka Connect: Source and Sink Connectors

Kafka Connect is a framework for streaming data between Kafka and other systems. Source connectors import data from external systems (e.g., databases, message queues) into Kafka topics, while sink connectors export data from Kafka topics to external systems (e.g., Elasticsearch, HDFS). Connect runs as a distributed service, providing fault tolerance and scalability. For example, the JDBC source connector can capture changes from a PostgreSQL table using incrementing or timestamp columns. Configuration is done via JSON or REST API. A typical source connector config specifies the connector class, connection URL, table name, and mode (e.g., incrementing). Sink connectors like the Elasticsearch sink write data to an index, with options for key and value converters. Connect also supports Single Message Transforms (SMTs) for lightweight data manipulation (e.g., renaming fields, filtering). To use Connect, start a worker process with connect-distributed.sh and submit connector configs via POST /connectors. For production, use at least two workers for redundancy. Connect simplifies integration without writing custom code, making it a cornerstone of event-driven architectures.

jdbc-source-connector.jsonSQL

{
  "name": "postgres-orders-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://localhost:5432/orders_db",
    "connection.user": "connect_user",
    "connection.password": "connect_pass",
    "table.whitelist": "orders",
    "mode": "incrementing",
    "incrementing.column.name": "order_id",
    "topic.prefix": "postgres-",
    "transforms": "RenameField",
    "transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.RenameField.renames": "customer_name:customer"
  }
}

💡Use Single Message Transforms sparingly

📊 Production Insight

Monitor connector offsets and restart connectors automatically using a health check. Use dead letter queues (DLQs) to handle malformed records without stopping the pipeline.

🎯 Key Takeaway

Kafka Connect provides reusable connectors to stream data in and out of Kafka, reducing custom integration code and enabling scalable data pipelines.

Kafka Streams vs ksqlDB: Stream Processing Options

Kafka Streams is a lightweight Java library for building stream processing applications that run within your application's JVM. It provides exactly-once semantics, stateful operations (e.g., aggregations, joins), and fault-tolerant state stores. ksqlDB, on the other hand, is a SQL abstraction layer that runs as a separate server (or embedded) and allows you to express stream processing as SQL queries. It is built on top of Kafka Streams but offers a simpler interface for analysts and developers familiar with SQL. For example, to compute a running count of orders per customer, in Kafka Streams you'd write a Java application with a KStream and groupBy; in ksqlDB, you'd write CREATE TABLE order_counts AS SELECT customer_id, COUNT(*) FROM orders_stream GROUP BY customer_id EMIT CHANGES;. ksqlDB also supports pull queries for low-latency lookups on materialized views. Kafka Streams is more flexible for custom logic and integration with other Java libraries, while ksqlDB is ideal for rapid prototyping and simpler pipelines. Both support exactly-once semantics and can be deployed in production. Choose Kafka Streams for complex event processing or when you need tight integration with existing Java code; choose ksqlDB for SQL-based transformations and when your team is more comfortable with SQL.

ksqldb-example.sqlSQL

-- Create a stream from a Kafka topic
CREATE STREAM orders_stream (
  order_id BIGINT,
  customer_id VARCHAR,
  amount DOUBLE
) WITH (
  KAFKA_TOPIC = 'orders',
  VALUE_FORMAT = 'JSON'
);

-- Create a materialized view with aggregation
CREATE TABLE order_counts AS
SELECT
  customer_id,
  COUNT(*) AS order_count,
  SUM(amount) AS total_amount
FROM orders_stream
GROUP BY customer_id
EMIT CHANGES;

-- Query the materialized view (pull query)
SELECT * FROM order_counts WHERE customer_id = '123';

🔥ksqlDB is built on Kafka Streams

📊 Production Insight

For production, monitor state store sizes and rebalance times. Kafka Streams applications should be deployed with multiple instances for parallelism, and ksqlDB servers should be clustered for high availability.

🎯 Key Takeaway

Kafka Streams offers programmatic control for complex stream processing, while ksqlDB provides SQL-based simplicity for faster development and easier maintenance.

● Production incidentPOST-MORTEMseverity: high

The Rebalance Storm That Killed Black Friday Sales

Symptom

Consumer lag monitoring showed no processing for 45 minutes during Black Friday peak. When lag finally dropped, it then spiked again 5 minutes later. The pattern repeated: complete stall, burst of consumption, stall, burst. kafka-consumer-groups.sh showed STATE=PreparingRebalance repeatedly. 8 consumer instances were in the group, but only 2 actually processed messages at any given time.

Assumption

The team assumed max.poll.interval.ms of 5 minutes was generous — their processing per batch averaged 30 seconds. They didn't account for long-tail latency: occasional batches with large messages or complex enrichment took 4.5 minutes. The consumer sent heartbeats, so the broker kept the session alive, but the group coordinator saw no poll() calls for 4.5 minutes and initiated a rebalance, thinking the consumer was dead.

Root cause

A single consumer in the group occasionally exceeded max.poll.interval.ms by 30 seconds due to an external API call timing out. The group coordinator removed it from the group and triggered a full rebalance (stop-the-world). During the rebalance, the consumer finished processing and called poll() again — but the coordinator had already removed it. The consumer re-joined, triggering another rebalance. This cycle repeated endlessly. The default partition.assignment.strategy (RangeAssignor) forced a full revocation of all partitions, not just the affected ones, making the storm worse.

Fix

Increased max.poll.interval.ms to 10 minutes to cover the worst-case processing time. Switched to CooperativeStickyAssignor (Kafka 2.4+), which only revokes partitions that need to move, not all of them. Moved the slow external API call from the poll thread to a separate thread pool, keeping the poll thread fast. Added a circuit breaker that fails fast on API timeouts instead of waiting the full timeout. After the fix, rebalances dropped from 47 per hour to 0.

Key lesson

max.poll.interval.ms must cover your 99.9th percentile processing time, not the average. Long-tail latency kills consumer groups.
Never place blocking or slow operations on the Kafka poll thread. Offload to a processing thread pool and keep poll() under 100ms.
Upgrade to CooperativeStickyAssignor. The stop-the-world rebalance of RangeAssignor is a production liability for any group with >2 consumers.
Monitor rebalance metrics: kafka.consumer:type=consumer-coordinator-metrics looking for rebalances.total and rebalance.time.total. A rising rebalance count is always a red alert.

Production debug guideSymptom → Action mapping for common Kafka production failures5 entries

Symptom · 01

Consumer group stuck in continuous rebalance loop (PreparingRebalance state repeating)

→

Fix

Check max.poll.interval.ms vs actual per-batch processing time. Run kafka-consumer-groups.sh --describe --group <group> and look for any consumer with high time since last poll. Increase max.poll.interval.ms to cover worst-case processing. Switch from RangeAssignor to CooperativeStickyAssignor by setting partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor in consumer config.

Symptom · 02

Producers getting NotEnoughReplicasException — writes failing intermittently

→

Fix

Check ISR (In-Sync Replica) count per partition with kafka-topics.sh --describe --topic <topic>. If Isr list size < min.insync.replicas, writes will fail. Check broker logs for follower fetch lag. Most common cause: a follower broker is falling behind due to disk I/O saturation or network latency. Increase replica.lag.time.max.ms temporarily, but address the underlying performance issue.

Symptom · 03

Consumer lag growing but consumers are at 100% CPU

→

Fix

You're CPU-bound. Add more consumers — but only if you have partitions available. Check partition count: if 6 partitions and 8 consumers, 2 are idle. You must increase partition count (requires downtime or new topic) or optimize processing logic to reduce CPU per message. Use async processing to batch work or parallelize per-message within the consumer.

Symptom · 04

Messages seem to be processed twice — duplicates in downstream system

→

Fix

You have 'at-least-once' delivery and your consumer crashes after processing but before committing offsets. Check enable.auto.commit — if true, change to false. Implement manual commitSync() ONLY after the entire batch is successfully processed and the downstream transaction is committed. Make downstream operations idempotent using a message-ID deduplication table with TTL.

Symptom · 05

High producer latency — send() blocks for seconds

→

Fix

Check buffer.memory vs actual usage. If the producer's buffer fills because the broker is slow to acknowledge, send() blocks. Monitor record-queue-time metric. Increase buffer.memory from default 32MB to 64MB or 128MB. Check broker request.handler.threads — if saturated, broker is the bottleneck, not producer.

★ Kafka Quick Debug Cheat SheetFast diagnostics for production Kafka issues. Run these commands to confirm the root cause before changing configuration.

Consumer group rebalancing repeatedly−

Immediate action

Check rebalance state and frequency

Commands

kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group <group>

kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group <group> --state

Fix now

Set max.poll.interval.ms=600000 (10 min). Set partition.assignment.strategy=CooperativeStickyAssignor. Move slow processing off the poll thread to a separate executor pool.

Under-replicated partitions (ISR < replication factor)+

Message loss suspected — offset commits without processing+

Producer requests timing out — write latency spikes+

Log compaction not cleaning up old records as expected+

Kafka Configuration Trade-offs

Configuration Aspect	High Throughput (Analytics)	High Durability (Financial)
acks setting	acks=1 (leader only)	acks=all (all ISR members)
min.insync.replicas	1 (effectively same as acks=1)	2 (tolerates 1 broker loss)
enable.idempotence	false (not needed, duplicates OK)	true (exactly-once at producer level)
linger.ms	50-100ms (large batches)	0-5ms (low latency commits)
batch.size	128KB-512KB (maximize batching)	16KB-64KB (flush quickly)
compression.type	lz4 or zstd (heavy compression)	snappy or none (CPU budget for other work)
Consumer: enable.auto.commit	true (at-most-once acceptable)	false (manual commitSync after DB write)
Consumer: max.poll.records	1000+ (process fast in bulk)	50-100 (safer for slow processing)
Replication factor	2 (cost savings)	3+ (survive 1 broker loss with ISR=2)
unclean.leader.election	true (availability > consistency)	false (never lose committed data)

⚙ Quick Reference

18 commands from this guide

File	Command / Code	Purpose
iothecodeforgekafkakafka_topic_and_log_structure.sh	kafka-topics.sh \	The Append-Only Log
iothecodeforgekafkaOrderEventConsumer.java	public class OrderEventConsumer {	Partitions, Consumer Groups, and the Parallelism Contract Yo
iothecodeforgekafkaDurableOrderEventProducer.java	public class DurableOrderEventProducer {	Producer Durability vs Latency
iothecodeforgekafkakafka_replication_diagnostics.sh	kafka-topics.sh \	Replication, ISR, and What Actually Happens When a Broker Di
LeaderElectionConfig.sql	SELECT * FROM kafka_broker_roles WHERE role = 'controller';	Leader Election
ConsumerRebalanceMetrics.sql	SELECT group_id, state, partition, current_offset, log_end_offset	Consumer Rebalancing
Kafka_Basic_Ops.sql	kafka-topics \	Kafka CLI Basics
Scalability_Check.sql	kafka-topics \	Scalability
Log_Aggregation_Producer.sql	producer:	Log Aggregation
kafka_use_case_order_tracking.sql	CREATE STREAM orders (	Real-World Applications Beyond Log Aggregation
kafka_transactional_producer.sql	CREATE PRODUCER transaction_producer WITH (	Advanced Kafka Features That Save Your Architecture
kafka_activity_stream.sql	CREATE STREAM user_activity (	Activity Tracking at Scale
install_kafka.sh	brew install kafka	Installation
start_kafka_cli.sql	!ZooKeeper on WSL2	Starting Kafka
kafka_cli_demo.sql	bin/kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092 -...	Console CLI Tools
kraft-setup.sh	./bin/kafka-storage.sh random-uuid	Kafka 3.x/4.x Features
jdbc-source-connector.json	{	Kafka Connect
ksqldb-example.sql	CREATE STREAM orders_stream (	Kafka Streams vs ksqlDB

Key takeaways

Kafka's performance advantage is architectural, not magic

sequential append-only disk writes plus zero-copy I/O (sendfile syscall) let a single broker saturate a 10Gbps NIC; fight this architecture and you will lose throughput fast

Partition count is your permanent parallelism ceiling

you can never have more active consumers than partitions in a group, and you cannot reduce partition count without creating a new topic and migrating data, so over-provision by at least 3x your current consumer count

Consumer group rebalances stop all consumption

switch to CooperativeStickyAssignor (Kafka 2.4+) to eliminate the stop-the-world pause, and always commit offsets in the onPartitionsRevoked callback to avoid reprocessing after a rebalance

acks=all is necessary but not sufficient for durability

you must also set min.insync.replicas=2 at the topic level; without MISR=2, acks=all with only one in-sync replica is functionally identical to acks=1

Idempotent producers (enable.idempotence=true) eliminate duplicate delivery during retries at 2-5% overhead

enable them in every production workload, no exceptions.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01SENIOR

Explain exactly what happens — at the broker and consumer level — when a...

Q02SENIOR

You have a topic with 12 partitions, replication factor 3, and min.insyn...

Q03SENIOR

A colleague suggests setting acks=all to guarantee no data loss. You agr...

Q04SENIOR

How does Kafka's log compaction differ from deletion-based retention, an...

Q05SENIOR

Walk me through Kafka's consumer rebalancing protocol. What's the differ...

Q01 of 05SENIOR

Explain exactly what happens — at the broker and consumer level — when a Kafka consumer exceeds max.poll.interval.ms. What does the broker do, what does the consumer see, and how does this affect other consumers in the same group?

ANSWER

The consumer's internal heartbeat thread continues running. The group coordinator receives heartbeats, so session.timeout.ms doesn't trigger. However, the coordinator also tracks the last time the consumer called poll(). When max.poll.interval.ms passes without a poll call, the coordinator marks the consumer as dead and initiates a rebalance, removing that consumer from the group. The consumer meanwhile finishes processing its batch and calls poll() again. It discovers that its assignment has been revoked (via a WakeupException on older clients, or an empty partition set on newer ones). It then re-joins the group, triggering ANOTHER rebalance. This creates a continuous loop: the slow consumer is repeatedly removed, rejoins, and removed again. During each rebalance, all consumers in the group stop processing entirely. On the default RangeAssignor, all partitions are revoked from all consumers, causing a full stop-the-world pause. On CooperativeStickyAssignor, only the partitions belonging to the slow consumer are revoked, leaving others processing. The solution is to increase max.poll.interval.ms to cover worst-case processing time, or move processing off the poll thread entirely.

FAQ · 4 QUESTIONS

Frequently Asked Questions

How many Kafka partitions should I create for a new topic?

What is the difference between Kafka offset and a message ID?

Is Kafka a database? Can I query it like one?

How do I choose between Kafka and RabbitMQ for a new project?

Naren Founder & Principal Engineer

20+ years shipping high-throughput database systems. Drawn from code that ran under real load.

✓ Verified

production tested

July 18, 2026

last updated

2,466

articles · all by Naren

🔥

That's NoSQL. Mark it forged?

14 min read · try the examples if you haven't