Senior 5 min · June 25, 2026

Kafka Distributed Log: How to Build Reliable Async Data Pipelines

Q: What is the difference between Kafka and a traditional message queue?

Kafka is a distributed log — messages are stored durably and can be replayed. Traditional message queues delete messages after acknowledgment. Kafka also supports multiple independent consumer groups reading the same stream, while queues typically have competing consumers.

Q: How do I choose the number of partitions for a Kafka topic?

Start with 3 times the number of expected consumers. Monitor consumer lag. If lag grows, double the partition count. Avoid going above 4,000 partitions per broker for high throughput. You can increase partitions later, but doing so changes which partition existing keys map to, and decreasing requires recreating the topic.

Q: How do I prevent duplicate messages in Kafka?

Enable idempotent producers (`enable.idempotence=true`) to prevent duplicates from producer retries. On the consumer side, implement idempotent processing (e.g., deduplicate by event ID) or use Kafka's transactional API for exactly-once semantics.
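Consumer-side deduplication by event ID can be sketched as follows. This is a minimal in-memory illustration, not production code: `DedupingHandler` and its methods are hypothetical names, and a real service would keep the seen-ID set in a durable store updated together with the side effect.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical handler: applies each event ID at most once, even if Kafka redelivers the record.
class DedupingHandler {
    private final Set<String> seenEventIds = new HashSet<>();
    private int applied = 0;

    // Returns true if the event was applied, false if it was a duplicate.
    boolean handle(String eventId, String payload) {
        if (!seenEventIds.add(eventId)) {
            return false; // already processed, skip the side effect
        }
        applied++; // stand-in for the real side effect (database write, API call)
        return true;
    }

    int appliedCount() {
        return applied;
    }
}
```

Calling `handle("evt-1", ...)` twice applies the event once; the second call returns false and does nothing.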

Q: What happens if all in-sync replicas for a partition fail?

If `unclean.leader.election.enable=false` (default), the partition becomes unavailable until at least one in-sync replica comes back. If set to true, an out-of-sync replica can become leader, and any records it had not yet replicated are lost. Enable it only for topics where availability matters more than durability and that data loss is acceptable.

Kafka distributed log internals: partitioning, replication, consumer groups, and production gotchas.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Drawn from code that ran under real load.

✓ Production

production tested

June 25, 2026

last updated

1,663

articles · all by Naren


⚡Quick Answer

Kafka's distributed log is an append-only, partitioned, replicated commit log that stores records in topics. Producers write to partitions, consumers read from them, and the log ensures durability and ordering within a partition.

✦ Definition · ~90s read

What is Kafka and the Distributed Log?

Apache Kafka is a distributed event streaming platform built around a distributed log — an append-only, partitioned, replicated commit log that decouples producers from consumers. It provides durable, ordered, and fault-tolerant message storage.


Plain-English First

Imagine a library where every new book gets added to the end of a shelf, and each shelf is labeled by genre. Multiple librarians (producers) can add books simultaneously to different shelves. Readers (consumers) can pick any shelf and read books in order, from oldest to newest. If a shelf gets too long, you split it into multiple shelves (partitions) but keep the genre label. The library keeps copies of each shelf in different rooms (replication) so if one room burns down, the books aren't lost.

Most developers think Kafka is just a message queue. It's not. It's a distributed log — and that distinction is the difference between a system that survives a 3AM partition failure and one that silently loses data. I've seen a payments service go down because the team treated Kafka like RabbitMQ and ran out of consumer threads. Don't be that team.

The problem Kafka solves is brutally simple: how do you reliably move data between services without tight coupling, data loss, or performance bottlenecks? Before Kafka, teams hacked together databases as message queues, polled for changes, or used point-to-point HTTP calls that failed under load. Kafka's distributed log gives you a single source of truth for event streams.

By the end of this article, you'll understand Kafka's internal architecture — the log, partitions, segments, and replication protocol — well enough to design resilient async pipelines, debug production issues, and avoid the common pitfalls that burn teams. You'll also get battle-tested code for a realistic checkout service.

The Log: Append-Only, Immutable, and Ordered

At its core, Kafka's distributed log is an append-only sequence of records. Each record has a unique offset within its partition. This immutability is what makes Kafka fast: no locks, no random writes, just sequential I/O. The log is split into segments — files on disk that are rolled over when they reach a size or time limit. Old segments are deleted or compacted based on retention policies.

Why does this matter? Because sequential disk writes are far faster than random writes, by orders of magnitude on spinning disks. On fast SSDs, a well-tuned broker can push enough sequential I/O to approach the limit of a 10Gbps NIC. But here's the gotcha: if you have too many partitions (thousands), the sequential I/O advantage fades because the broker is writing to many small files at once and the access pattern starts to look random. I've seen teams create 10,000 partitions and wonder why throughput tanked. Keep partitions per broker under 4,000 for high throughput.

LogSegment.systemdesign

// io.thecodeforge - System Design tutorial
// Simulating Kafka log segment behavior (simplified: one record per offset)

import java.io.IOException;
import java.nio.file.*;

public class LogSegment {
    private final Path directory;
    private final long maxSegmentBytes = 1024 * 1024 * 1024; // 1GB
    private long nextOffset;      // offset assigned to the next appended record
    private Path currentSegment;  // active segment, named after its base offset
    private long currentSize;     // bytes written to the active segment

    public LogSegment(Path directory, long baseOffset) throws IOException {
        this.directory = directory;
        this.nextOffset = baseOffset;
        // Files.write fails if the parent directory does not exist
        Files.createDirectories(directory);
        // Real Kafka zero-pads the base offset to 20 digits in the file name
        this.currentSegment = directory.resolve(baseOffset + ".log");
        this.currentSize = 0;
    }

    public void append(byte[] record) throws IOException {
        // Roll before the write that would push the active segment past its limit
        if (currentSize > 0 && currentSize + record.length > maxSegmentBytes) {
            roll();
        }
        Files.write(currentSegment, record, StandardOpenOption.APPEND, StandardOpenOption.CREATE);
        currentSize += record.length;
        nextOffset++;
    }

    private void roll() {
        // A new segment is named after the offset of the first record it will hold
        currentSegment = directory.resolve(nextOffset + ".log");
        currentSize = 0;
    }

    public static void main(String[] args) throws IOException {
        LogSegment segment = new LogSegment(Paths.get("/tmp/kafka-logs"), 0);
        segment.append("order-123".getBytes());
        segment.append("order-124".getBytes());
        System.out.println("Appended records to segment: " + segment.currentSegment);
        // Output: Appended records to segment: /tmp/kafka-logs/0.log
    }
}

Output

Appended records to segment: /tmp/kafka-logs/0.log
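Because each segment is named after its base offset, finding the file that holds a given offset is a floor lookup over the base offsets. The sketch below illustrates that idea only; `SegmentIndex` is a hypothetical class, not Kafka's internal implementation, and it assumes the 20-digit zero-padded file naming Kafka uses on disk.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index of segment files keyed by base offset.
class SegmentIndex {
    private final TreeMap<Long, String> segmentsByBaseOffset = new TreeMap<>();

    void addSegment(long baseOffset) {
        // Segment file names are the base offset, zero-padded to 20 digits
        segmentsByBaseOffset.put(baseOffset, String.format("%020d.log", baseOffset));
    }

    // Returns the file that would contain the offset, or null if it precedes the first segment.
    String segmentFor(long offset) {
        Map.Entry<Long, String> entry = segmentsByBaseOffset.floorEntry(offset);
        return entry == null ? null : entry.getValue();
    }
}
```

With segments starting at offsets 0, 1000, and 2500, offset 999 resolves to the first file and offset 1000 to the second.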

Production Trap: Too Many Partitions

Each partition is a directory with multiple segment files. With 10,000 partitions, the OS struggles with file handles and page cache. Keep partitions per broker under 4,000 for high throughput, or under 2,000 if using spinning disks.

Figure: Kafka Distributed Log Architecture (thecodeforge.io)

Partitioning: The Key to Parallelism and Ordering

Partitions are the unit of parallelism in Kafka. Each partition is an ordered, immutable log. Producers pick a partition by hashing the record key; records without a key are spread across partitions (round-robin in older clients, sticky per-batch assignment since Kafka 2.4). Consumers read from partitions in order. This gives you a trade-off: within a partition, messages are ordered; across partitions, order is not guaranteed.

Here's the production reality: if you need global ordering, you can only have one partition. That kills throughput. Most systems don't need global ordering — they need ordering per entity (e.g., per user, per order). Use the entity ID as the partition key. I've seen teams use a random key for load balancing and then wonder why user events arrive out of order. Don't do that.
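Why the entity ID works as a key comes down to a deterministic hash: the same key always lands on the same partition. Here is a rough sketch of that mapping. `KeyPartitioner` is a hypothetical name, and the hash is a stand-in; Kafka's default partitioner uses murmur2 over the serialized key, so the actual partition numbers will differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyPartitioner {
    // Stand-in hash for illustration; Kafka's default partitioner uses murmur2 over the key bytes.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result in [0, numPartitions) even for negative hashes
        return Math.floorMod(hash, numPartitions);
    }
}
```

Every call with `"user-42"` and the same partition count returns the same partition, which is what keeps one user's events in order. Change the partition count and the mapping can change too.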

When choosing the number of partitions, consider your throughput requirements and consumer parallelism. A good rule of thumb: start with 3x the number of consumers you plan to run. You can increase partitions later, but existing keys may then hash to a different partition, so per-key ordering can break across the change. Decreasing is not supported without recreating the topic.

PartitionKeyProducer.systemdesign

// io.thecodeforge — System Design tutorial
// Kafka producer with partition key for order-per-user

import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // Wait for all in-sync replicas to acknowledge

        Producer<String, String> producer = new KafkaProducer<>(props);

        // Use userId as key to ensure all events for a user go to same partition
        String userId = "user-42";
        String orderEvent = "{\"orderId\":\"order-789\",\"status\":\"created\"}";
        ProducerRecord<String, String> record = new ProducerRecord<>("orders", userId, orderEvent);

        producer.send(record, (metadata, exception) -> {
            if (exception != null) {
                System.err.println("Failed to send: " + exception.getMessage());
            } else {
                System.out.println("Sent to partition " + metadata.partition() + " at offset " + metadata.offset());
            }
        });

        producer.close();
    }
}

Output

Sent to partition 3 at offset 42

Senior Shortcut: Partition Count Formula

Start with max(3 * expected_consumers, 6). Monitor consumer lag. If lag grows, increase partitions by doubling. Partition count and replication are separate knobs: min.insync.replicas=2 (the broker default is 1) pairs with replication factor 3, and that holds no matter how many partitions the topic has.
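The sizing rule of thumb is simple enough to write down as code. A tiny sketch, with `PartitionSizing` and its method names made up for illustration:

```java
class PartitionSizing {
    // Starting point from the rule of thumb: max(3 * expected consumers, 6)
    static int initialPartitions(int expectedConsumers) {
        return Math.max(3 * expectedConsumers, 6);
    }

    // If lag keeps growing, the suggested next step is to double the count
    static int afterLagGrowth(int currentPartitions) {
        return currentPartitions * 2;
    }
}
```

For example, 4 planned consumers gives 12 partitions, and a single consumer still gets the floor of 6.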

Replication: How Kafka Survives Broker Failures

Kafka replicates each partition across multiple brokers (configurable via replication.factor). One broker is the leader for a partition; the rest are followers. Producers write to the leader; followers replicate the data. If the leader fails, a follower becomes the new leader (controlled by the controller broker).

The key config is min.insync.replicas: the minimum number of in-sync replicas (leader included) that must be present for a write with acks=all to succeed. With acks=all, the leader waits for every replica currently in the ISR before acknowledging; min.insync.replicas=2 guarantees that set contains the leader and at least one follower. This prevents data loss if the leader crashes right after acknowledging, because at least one follower already has the record.

Here's the gotcha: if the number of in-sync replicas drops below min.insync.replicas, the broker rejects acks=all writes (NotEnoughReplicasException). I've seen this happen when a broker goes down for maintenance and the team forgot to adjust min.insync.replicas. A common choice is min.insync.replicas = replication.factor - 1 (for example, 2 with replication factor 3): you keep strong durability and can still lose one broker without blocking writes, but a second failure makes the partition read-only.
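The availability trade-off is easier to reason about as arithmetic. The sketch below models only the acceptance rule described above; `IsrCheck` and its methods are hypothetical helpers, not broker code.

```java
class IsrCheck {
    // acks=all writes are accepted only while the ISR has at least minInsyncReplicas members
    static boolean acceptsAcksAllWrite(int inSyncReplicas, int minInsyncReplicas) {
        return inSyncReplicas >= minInsyncReplicas;
    }

    // How many replicas can drop out of the ISR before acks=all writes are rejected
    static int tolerableFailures(int replicationFactor, int minInsyncReplicas) {
        return replicationFactor - minInsyncReplicas;
    }
}
```

With replication factor 3 and min.insync.replicas=2, one replica can fall out of the ISR and writes continue; when a second drops out, writes are rejected until the ISR recovers.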

ReplicationConfig.systemdesign

// io.thecodeforge — System Design tutorial
// Topic creation with replication settings

// Command to create a topic with replication factor 3 and min.insync.replicas=2
// kafka-topics.sh --bootstrap-server localhost:9092 --create --topic orders \
//   --partitions 6 --replication-factor 3 --config min.insync.replicas=2

// Verify with:
// kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders
// Output:
// Topic: orders	PartitionCount: 6	ReplicationFactor: 3	Configs: min.insync.replicas=2
// 	Topic: orders	Partition: 0	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
// 	Topic: orders	Partition: 1	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
// ...

Output

Topic: orders PartitionCount: 6 ReplicationFactor: 3 Configs: min.insync.replicas=2

Topic: orders Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3

Never Do This: acks=0 in Production

Setting acks=0 means the producer doesn't wait for any acknowledgment. You will lose data on broker failure. I've seen a metrics pipeline use this and lose 20% of events during a rolling restart. Use acks=all with min.insync.replicas=2 for any data you care about.

Figure: Kafka Partition Replication Flow (thecodeforge.io)

Consumer Groups: Scaling Reads Without Losing Order

Consumer groups allow multiple consumers to read from a topic in parallel. Each partition is assigned to exactly one consumer in the group. This ensures ordering within a partition (since only one consumer reads it) while allowing horizontal scaling. If you have more consumers than partitions, some consumers will be idle.
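To see why extra consumers sit idle, here is a simplified round-robin assignment over one topic. `RoundRobinAssignment` is a hypothetical stand-in for Kafka's pluggable assignors, which handle multiple topics and subscriptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RoundRobinAssignment {
    // Returns consumer -> assigned partitions; each partition goes to exactly one consumer.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

With 3 partitions and 4 consumers, the fourth consumer ends up with an empty list: it is a member of the group but has nothing to read.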

The consumer group protocol uses a group coordinator broker and a rebalance protocol. When a consumer joins or leaves, a rebalance triggers, and partitions are reassigned. With the classic eager protocol, all consumers in the group stop processing during the rebalance (stop-the-world). This can cause latency spikes. I've seen teams with hundreds of consumers experience 30-second rebalances. Mitigation: use static group membership (introduced in Kafka 2.3) or cooperative (incremental) rebalancing.

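Another gotcha: if your consumer processes messages slowly, lag grows. If it crashes, the group rebalances, and the new owner of the partition resumes from the last committed offset. With auto-commit, offsets are committed on a timer during poll(), so an offset can be committed for a record that has not finished processing (for example, one handed off to a worker thread), and that record is lost on a crash. Commit manually after processing, not before. This gives at-least-once delivery: records processed after the last commit are redelivered, so handlers must tolerate duplicates.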

ManualCommitConsumer.systemdesign

// io.thecodeforge — System Design tutorial
// Consumer with manual offset commit after processing

import org.apache.kafka.clients.consumer.*;
import java.time.Duration;
import java.util.*;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-processors");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false"); // Manual commit
        props.put("max.poll.records", "100");

        Consumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("orders"));

        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    processOrder(record.value());
                }
                // Commit after successful batch processing
                consumer.commitSync();
            }
        } finally {
            consumer.close();
        }
    }

    private static void processOrder(String orderJson) {
        // Simulate processing (e.g., update database, call API)
        System.out.println("Processing: " + orderJson);
    }
}

Output

Processing: {"orderId":"order-789","status":"created"}

Processing: {"orderId":"order-790","status":"created"}

Interview Gold: Rebalance Protocol

With the eager protocol, all consumers in the group stop processing during a rebalance. The group leader computes the assignment using a partition assignor (range, round-robin, sticky, or cooperative sticky), and the group coordinator distributes it. Cooperative rebalancing (introduced in KIP-429) allows consumers to keep processing partitions that aren't being reassigned. Use partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor for large groups.
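A consumer configuration that combines cooperative rebalancing with static membership might look like the sketch below. The config keys are standard consumer settings; the helper class name, the instance ID scheme, and the 30-second session timeout are assumptions for illustration, so tune them for your deployment.

```java
import java.util.Properties;

class RebalanceFriendlyConfig {
    // Builds consumer properties aimed at fewer and cheaper rebalances.
    static Properties consumerProps(String groupId, String instanceId) {
        Properties props = new Properties();
        props.put("group.id", groupId);
        // Incremental rebalancing: only partitions that actually move are revoked
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        // Static membership: must be unique and stable per consumer instance (e.g., pod name)
        props.put("group.instance.id", instanceId);
        // A static member that restarts within this window does not trigger a rebalance
        props.put("session.timeout.ms", "30000");
        return props;
    }
}
```

Pass the result to `new KafkaConsumer<>(props)` along with the usual bootstrap and deserializer settings.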

Producers: Idempotence, Batching, and Throughput

Kafka producers batch records to improve throughput. The batch.size and linger.ms configs control batching. batch.size is the maximum number of bytes to accumulate in a per-partition batch before sending; linger.ms is the maximum time to wait for more records. Setting linger.ms=5 adds up to 5ms of latency but can double throughput under moderate load.
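The interplay between the two settings can be modeled as "send when full or when the oldest record has waited long enough". This is a simplified sketch with a hypothetical `BatchAccumulator` class and caller-supplied timestamps; the real producer keeps one batch queue per partition and sends from a background thread.

```java
import java.util.ArrayList;
import java.util.List;

class BatchAccumulator {
    private final int batchSizeBytes; // plays the role of batch.size
    private final long lingerMs;      // plays the role of linger.ms
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;
    private long firstAppendMs = -1;

    BatchAccumulator(int batchSizeBytes, long lingerMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
    }

    // Adds a record; returns a batch to send if one is ready, otherwise null.
    List<byte[]> append(byte[] record, long nowMs) {
        if (pending.isEmpty()) {
            firstAppendMs = nowMs; // linger clock starts with the first record of a batch
        }
        pending.add(record);
        pendingBytes += record.length;
        return maybeDrain(nowMs);
    }

    // A batch is ready when it is full or its oldest record has waited lingerMs.
    List<byte[]> maybeDrain(long nowMs) {
        boolean full = pendingBytes >= batchSizeBytes;
        boolean lingered = !pending.isEmpty() && nowMs - firstAppendMs >= lingerMs;
        if (!full && !lingered) {
            return null;
        }
        List<byte[]> batch = new ArrayList<>(pending);
        pending.clear();
        pendingBytes = 0;
        return batch;
    }
}
```

Under heavy load batches fill before the linger timer fires, so linger.ms costs nothing; under light load it caps how long a lone record waits.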

Idempotent producers (enable.idempotence=true) prevent duplicate records caused by retries. When a producer sends a batch and the broker acknowledges but the producer doesn't receive the ack (e.g., network issue), the producer retries. Without idempotence, this can cause duplicate records. With idempotence, the broker deduplicates based on producer ID and sequence number. Always enable idempotence in production — the performance cost is negligible.
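The deduplication idea can be sketched with a map from producer ID to the last sequence number seen. This is a deliberately reduced model with hypothetical names: the real broker tracks sequences per producer and partition, works on batches, and also rejects gaps in the sequence.

```java
import java.util.HashMap;
import java.util.Map;

class IdempotentLog {
    private final Map<Long, Integer> lastSequenceByProducer = new HashMap<>();
    private int appended = 0;

    // Returns true if the record was appended, false if it is a retry the log already holds.
    boolean tryAppend(long producerId, int sequence) {
        Integer last = lastSequenceByProducer.get(producerId);
        if (last != null && sequence <= last) {
            return false; // duplicate from a retry: acknowledge without appending again
        }
        lastSequenceByProducer.put(producerId, sequence);
        appended++;
        return true;
    }

    int size() {
        return appended;
    }
}
```

A retry of sequence 0 from the same producer is dropped, while sequence 0 from a different producer is a new record.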

Here's a gotcha: with idempotence enabled, Kafka preserves ordering within a partition even with max.in.flight.requests.per.connection > 1, as long as the value is at most 5. Without idempotence, any value above 1 means a retried batch can land after a later one, causing out-of-order writes. I've seen a logging pipeline lose ordering because of this. Keep max.in.flight.requests.per.connection=5 (the default) with idempotence for high throughput.

IdempotentProducer.systemdesign

// io.thecodeforge — System Design tutorial
// Producer with idempotence and batching

import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class HighThroughputProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");
        props.put("enable.idempotence", "true"); // Prevent duplicates
        props.put("batch.size", 16384); // 16KB
        props.put("linger.ms", 5); // Wait up to 5ms for more records
        props.put("compression.type", "snappy"); // Reduce network bandwidth

        Producer<String, String> producer = new KafkaProducer<>(props);

        for (int i = 0; i < 1000; i++) {
            String userId = "user-" + (i % 100);
            String event = "{\"eventId\":\"" + i + "\"}";
            producer.send(new ProducerRecord<>("events", userId, event));
        }

        producer.close();
    }
}

Output

(No output — records sent asynchronously)

Senior Shortcut: Compression

Use compression.type=snappy or lz4 for text-heavy data. Snappy gives good compression with low CPU overhead. Gzip compresses more but uses more CPU. For high-throughput pipelines, Snappy is the sweet spot.

When Not to Use Kafka: The Overkill Trap

Kafka is not a message queue. If you need point-to-point messaging with low latency (sub-millisecond) and don't need replay or long-term storage, use RabbitMQ or Redis Streams. If you need a simple work queue for background jobs, use Redis or a database-backed queue. Kafka's strength is in streaming, replay, and multi-subscriber patterns.

I've seen teams use Kafka for a simple email notification service with 10 messages per second. They ended up with a 3-broker cluster, ZooKeeper (or KRaft), and a complex deployment. A single Redis instance would have handled it with less ops overhead. Don't use Kafka unless you need at least two of: replay, multiple consumer groups, or high throughput (>10K msg/s).

The Classic Bug: Using Kafka as a Database

Kafka is not a database. You cannot query records by key efficiently. If you need to look up the latest state for an entity, use a database or a compacted topic with a state store (Kafka Streams). I've seen teams try to scan a topic from the beginning to find a record — that's a full table scan on every request.
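What a compacted topic plus a state store gives you is, conceptually, "the latest value per key". The sketch below replays records in log order to build that view. `CompactedView` is a hypothetical helper; it assumes records are (key, value) pairs and treats a null value as a tombstone, the way log compaction does.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CompactedView {
    // Replays {key, value} records in log order and keeps only the latest value per key.
    // A null value is a tombstone that removes the key.
    static Map<String, String> latestByKey(List<String[]> records) {
        Map<String, String> state = new LinkedHashMap<>();
        for (String[] record : records) {
            String key = record[0];
            String value = record[1];
            if (value == null) {
                state.remove(key);
            } else {
                state.put(key, value);
            }
        }
        return state;
    }
}
```

Build this view once and keep it updated as new records arrive; lookups then hit the map rather than rescanning the topic.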

Figure: Kafka vs. Traditional Message Queues (thecodeforge.io)

● Production incident · POST-MORTEM · severity: high

The 4GB Container That Kept Dying

Symptom

A Kafka broker in a Kubernetes cluster kept crashing with OutOfMemoryError every 6 hours. Heap was 3GB, container limit 4GB.

Assumption

The team assumed a memory leak in the Kafka process and planned to increase the container limit.

Root cause

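The topic had log.retention.bytes and log.segment.bytes both set to 1GB, with log.retention.check.interval.ms=300000. Size-based retention only deletes closed segments, so when the retention limit equals the segment size, each partition can hold well over 1GB before anything is removed, and the cleanup check ran only every 5 minutes. With high throughput, segments accumulated faster than retention could clean them, causing the log directory to grow beyond the container's disk limit, which triggered OOM as the OS tried to page.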

Fix

Set log.retention.bytes to 500MB (524288000) and log.segment.bytes to 256MB (268435456); both settings take plain byte counts. Also set log.retention.check.interval.ms=60000 to clean more frequently. Added disk monitoring with Prometheus.

Key lesson

Always align segment size, retention size, and check interval with your throughput.
Disk pressure kills brokers faster than CPU or memory.
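The reason segment size and retention size have to be tuned together shows up in a small model of size-based retention. This sketch is an approximation with hypothetical names: it deletes the oldest closed segments only while the remaining log would still be at least the retention size, and it never touches the active segment.

```java
import java.util.Deque;

class SizeRetention {
    // segments: sizes in bytes, oldest first; the last entry is the active segment.
    // Removes eligible segments from the deque and returns how many were deleted.
    static int enforce(Deque<Long> segments, long retentionBytes) {
        long total = 0;
        for (long size : segments) {
            total += size;
        }
        int deleted = 0;
        // Only closed segments are eligible; the active segment always stays
        while (segments.size() > 1 && total - segments.peekFirst() >= retentionBytes) {
            total -= segments.pollFirst();
            deleted++;
        }
        return deleted;
    }
}
```

With segments of 100, 100, and 50 units and a retention limit of 100, only the oldest segment is deleted and 150 units stay on disk. Smaller segments relative to the retention limit keep actual usage closer to the configured value.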

Production debug guide

Systematic recovery paths for the failure modes engineers actually hit. 3 entries.

Symptom · 01

Consumer lag growing unbounded

Fix

1. Run kafka-consumer-groups --bootstrap-server localhost:9092 --group my-group --describe to check lag per partition.
2. Check whether the number of consumers equals the number of partitions.
3. If lag is on specific partitions, check whether those partitions have a slow consumer or a hot key.
4. Increase partitions (double) and add consumers.
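The LAG column is the log end offset minus the committed offset, per partition. If you export those numbers, spotting a hot partition is a few lines of code. A sketch with hypothetical names and hand-fed offsets:

```java
import java.util.Map;
import java.util.TreeMap;

class LagReport {
    // Lag per partition = log end offset - committed offset
    static Map<Integer, Long> lagByPartition(Map<Integer, Long> logEndOffsets,
                                             Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> entry : logEndOffsets.entrySet()) {
            // A partition with no committed offset is treated as starting from 0 here
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            lag.put(entry.getKey(), entry.getValue() - committed);
        }
        return lag;
    }

    // Returns the partition with the highest lag, or -1 if the map is empty.
    static int worstPartition(Map<Integer, Long> lag) {
        int worst = -1;
        long max = -1;
        for (Map.Entry<Integer, Long> entry : lag.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                worst = entry.getKey();
            }
        }
        return worst;
    }
}
```

If one partition carries most of the lag while the others sit near zero, adding consumers will not help; look for a hot key or a slow handler on that partition first.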

Symptom · 02

Produce request timeout (TimeoutException)

Fix

1. Check broker CPU and disk I/O.
2. Check min.insync.replicas: if the ISR count drops below it, writes are rejected.
3. Increase request.timeout.ms and delivery.timeout.ms.
4. Ensure the network is stable.

Symptom · 03

NotEnoughReplicasException

Fix

1. Run kafka-topics --bootstrap-server localhost:9092 --describe --topic my-topic to see the ISR.
2. If a broker is down, wait for it to come back or reduce min.insync.replicas temporarily.
3. Check whether unclean.leader.election.enable is false (the default); if so, partitions with no ISR stay unavailable until an in-sync replica returns.

★ Kafka and the Distributed Log Triage Cheat Sheet

First-response commands for when things go wrong, copy-paste ready.

Consumer lag: `LAG` column shows large numbers

Immediate action

Check if consumers are alive and processing

Commands

kafka-consumer-groups --bootstrap-server localhost:9092 --group my-group --describe

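Last resort only: the next command skips every unprocessed record, and the group must be stopped before it will run.

kafka-consumer-groups --bootstrap-server localhost:9092 --group my-group --reset-offsets --to-latest --all-topics --execute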

Fix now

Increase partitions or add consumers. If hot key, repartition with better key.


Feature / Aspect	Kafka (Distributed Log)	RabbitMQ (Message Queue)
Storage model	Append-only log on disk; retains messages for configurable time/size	In-memory or on disk; messages deleted after acknowledgment
Ordering	Strict ordering within a partition; no global ordering	Ordering per queue; can be lost with multiple consumers
Throughput	Millions of messages per second (with proper tuning)	Tens of thousands per second
Replay	Yes — consumers can rewind to any offset	No — messages are deleted after ack
Consumer model	Pull-based; consumer groups for parallelism	Push-based; competing consumers
Use case	Event streaming, log aggregation, data pipelines	Task queues, RPC, point-to-point messaging

Key takeaways

Kafka's distributed log is an append-only, partitioned, replicated commit log, not a message queue. Design your system around this model.

Always use idempotent producers (enable.idempotence=true) and manual offset commits (enable.auto.commit=false) in production to prevent data loss and duplicates.

Partition count is critical: too few limits parallelism, too many kills sequential I/O. Start with 3x expected consumers and monitor.

Kafka is overkill for simple point-to-point messaging or low-throughput task queues. Use RabbitMQ or Redis for those cases.

INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR

How does Kafka handle a broker failure? Describe the leader election pro...

Q02 · SENIOR

When would you choose Kafka over a traditional message queue like Rabbit...

Q03 · SENIOR

What happens when a consumer crashes after processing a message but befo...

Q04 · JUNIOR

What is a Kafka partition and why is it important?

Q05 · SENIOR

You notice consumer lag growing despite having enough consumers. How do ...

Q06 · SENIOR

How would you design a system that processes 1 million events per second...

Q01 of 06 · SENIOR

How does Kafka handle a broker failure? Describe the leader election process.

ANSWER

The controller broker detects the failure via ZooKeeper (or KRaft quorum). It selects a new leader from the in-sync replicas (ISR) for each partition that had the failed broker as leader. If unclean.leader.election.enable=false and no ISR exists, the partition becomes unavailable. The new leader starts accepting writes and followers replicate from it.
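The election rule in this answer can be sketched as a small function. This is a simplified model under stated assumptions: `LeaderElection` is a hypothetical class, replicas are tried in their assigned order, and the real controller also handles epochs, metadata propagation, and ISR updates.

```java
import java.util.List;
import java.util.OptionalInt;

class LeaderElection {
    // replicas: assigned broker IDs in preference order; isr: in-sync broker IDs; live: brokers still up.
    static OptionalInt electLeader(List<Integer> replicas, List<Integer> isr,
                                   List<Integer> live, boolean uncleanAllowed) {
        // Clean election: first live replica that is still in the ISR
        for (int replica : replicas) {
            if (isr.contains(replica) && live.contains(replica)) {
                return OptionalInt.of(replica);
            }
        }
        // Unclean election: any live replica, accepting that unreplicated records are lost
        if (uncleanAllowed) {
            for (int replica : replicas) {
                if (live.contains(replica)) {
                    return OptionalInt.of(replica);
                }
            }
        }
        return OptionalInt.empty(); // partition stays offline
    }
}
```

With replicas 1, 2, 3, an ISR of 1 and 2, and broker 1 down, broker 2 takes over. If the ISR held only broker 1, the partition stays offline unless unclean election is enabled.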

