Senior 5 min · June 25, 2026

Kafka Distributed Log: How to Build Reliable Async Data Pipelines

Kafka distributed log internals: partitioning, replication, consumer groups, and production gotchas.

Naren Founder & Principal Engineer

20+ years shipping large-scale distributed systems. Drawn from code that ran under real load.

Production tested · Last updated June 25, 2026
Quick Answer

Kafka's distributed log is an append-only, partitioned, replicated commit log that stores records in topics. Producers write to partitions, consumers read from them, and the log ensures durability and ordering within a partition.

What is Kafka and the Distributed Log?

Apache Kafka is a distributed event streaming platform built around a distributed log — an append-only, partitioned, replicated commit log that decouples producers from consumers. It provides durable, ordered, and fault-tolerant message storage.

Plain-English First

Imagine a library where every new book gets added to the end of a shelf, and each shelf is labeled by genre. Multiple librarians (producers) can add books simultaneously to different shelves. Readers (consumers) can pick any shelf and read books in order, from oldest to newest. If a shelf gets too long, you split it into multiple shelves (partitions) but keep the genre label. The library keeps copies of each shelf in different rooms (replication) so if one room burns down, the books aren't lost.

Most developers think Kafka is just a message queue. It's not. It's a distributed log — and that distinction is the difference between a system that survives a 3AM partition failure and one that silently loses data. I've seen a payments service go down because the team treated Kafka like RabbitMQ and ran out of consumer threads. Don't be that team.

The problem Kafka solves is brutally simple: how do you reliably move data between services without tight coupling, data loss, or performance bottlenecks? Before Kafka, teams hacked together databases as message queues, polled for changes, or used point-to-point HTTP calls that failed under load. Kafka's distributed log gives you a single source of truth for event streams.

By the end of this article, you'll understand Kafka's internal architecture — the log, partitions, segments, and replication protocol — well enough to design resilient async pipelines, debug production issues, and avoid the common pitfalls that burn teams. You'll also get battle-tested code for a realistic checkout service.

The Log: Append-Only, Immutable, and Ordered

At its core, Kafka's distributed log is an append-only sequence of records. Each record has a unique offset within its partition. This immutability is what makes Kafka fast: no locks, no random writes, just sequential I/O. The log is split into segments — files on disk that are rolled over when they reach a size or time limit. Old segments are deleted or compacted based on retention policies.

Why does this matter? Because sequential disk writes are typically far faster than random writes, and the OS page cache and read-ahead work in favor of an append-only file. With large batches on fast SSDs, a single partition can get close to saturating a 10Gbps NIC, though the exact ceiling depends on record size, compression, and replication settings. But here's the gotcha: with thousands of partitions on one broker, the sequential I/O advantage erodes because writes are spread across thousands of open segment files and the access pattern starts to look random. I've seen teams create 10,000 partitions and wonder why throughput tanked. Keep partitions per broker under 4,000 for high throughput.

LogSegment.systemdesign
// io.thecodeforge — System Design tutorial
// Simulating Kafka log segment behavior

import java.io.IOException;
import java.nio.file.*;

public class LogSegment {
    private final Path directory;
    private long nextOffset;      // offset the next appended record will receive
    private Path currentSegment;
    private long currentSize;
    private final long maxSegmentBytes = 1024L * 1024 * 1024; // 1GB

    public LogSegment(Path directory, long baseOffset) throws IOException {
        this.directory = directory;
        this.nextOffset = baseOffset;
        Files.createDirectories(directory); // Files.write fails if the directory is missing
        this.currentSegment = directory.resolve(baseOffset + ".log");
        this.currentSize = 0;
    }

    public void append(byte[] record) throws IOException {
        // Roll before the write would push a non-empty segment past the size limit
        if (currentSize > 0 && currentSize + record.length > maxSegmentBytes) {
            roll();
        }
        Files.write(currentSegment, record, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        currentSize += record.length;
        nextOffset++; // one offset per record
    }

    // A new segment is named after the offset of the first record it will hold
    private void roll() {
        currentSegment = directory.resolve(nextOffset + ".log");
        currentSize = 0;
    }

    public static void main(String[] args) throws IOException {
        LogSegment segment = new LogSegment(Paths.get("/tmp/kafka-logs"), 0);
        segment.append("order-123".getBytes());
        segment.append("order-124".getBytes());
        System.out.println("Appended records to segment: " + segment.currentSegment);
        // Output: Appended records to segment: /tmp/kafka-logs/0.log
    }
}
Output
Appended records to segment: /tmp/kafka-logs/0.log
Production Trap: Too Many Partitions
Each partition is a directory with multiple segment files. With 10,000 partitions, the OS struggles with file handles and page cache. Keep partitions per broker under 4,000 for high throughput, or under 2,000 if using spinning disks.
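Because each segment file is named after the offset of its first record, finding the segment that holds a given offset reduces to a floor lookup over the base offsets. The sketch below illustrates that idea with a `TreeMap`. The class `SegmentIndex` and its methods are hypothetical names for this illustration, not Kafka's internal API, and the sketch ignores the per-segment index files a real broker also keeps.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical helper: maps segment base offsets to segment files.
class SegmentIndex {
    private final TreeMap<Long, Path> segments = new TreeMap<>();
    private final Path directory;

    SegmentIndex(Path directory) {
        this.directory = directory;
    }

    // Register a segment whose first record has the given base offset.
    void addSegment(long baseOffset) {
        segments.put(baseOffset, directory.resolve(baseOffset + ".log"));
    }

    // The segment containing `offset` is the one with the greatest base
    // offset that is <= offset. Offsets past the end of the log still map
    // to the last segment; a real broker would also check the log end offset.
    Optional<Path> segmentFor(long offset) {
        Map.Entry<Long, Path> entry = segments.floorEntry(offset);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }

    public static void main(String[] args) {
        SegmentIndex index = new SegmentIndex(Paths.get("/tmp/kafka-logs"));
        index.addSegment(0);
        index.addSegment(1000);
        index.addSegment(2000);
        // Offset 1500 falls in the segment that starts at 1000
        System.out.println(index.segmentFor(1500).map(Path::getFileName));
    }
}
```

The lookup is logarithmic in the number of segments, which is one reason a broker can serve reads from arbitrary offsets without scanning the log.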
Figure: Kafka Distributed Log Architecture. Core components for building reliable async data pipelines: append-only log (immutable, ordered record sequence), partitioning (key-based sharding for parallelism), replication (ISR quorum for broker failure), consumer groups (scalable reads with offset tracking), idempotent producers (exactly-once via batch and sequence numbers). Overkill trap: Kafka for simple queues; use lightweight brokers for low-throughput needs.

Partitioning: The Key to Parallelism and Ordering

Partitions are the unit of parallelism in Kafka. Each partition is an ordered, immutable log. Producers write to partitions based on a key; records without a key are spread across partitions (round-robin in older clients, sticky per-batch assignment since Kafka 2.4). Consumers read from partitions in order. This gives you a trade-off: within a partition, messages are ordered; across partitions, order is not guaranteed.

Here's the production reality: if you need global ordering, you can only have one partition. That kills throughput. Most systems don't need global ordering — they need ordering per entity (e.g., per user, per order). Use the entity ID as the partition key. I've seen teams use a random key for load balancing and then wonder why user events arrive out of order. Don't do that.

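When choosing the number of partitions, consider your throughput requirements and consumer parallelism. A good rule of thumb: start with 3x the number of consumers you plan to run. You can increase partitions later, but doing so changes the key-to-partition mapping, so new records for an existing key may land on a different partition than its older records. Decreasing is not supported without recreating the topic.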

PartitionKeyProducer.systemdesign
// io.thecodeforge — System Design tutorial
// Kafka producer with partition key for order-per-user

import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // Wait for all in-sync replicas to acknowledge

        Producer<String, String> producer = new KafkaProducer<>(props);

        // Use userId as key to ensure all events for a user go to same partition
        String userId = "user-42";
        String orderEvent = "{\"orderId\":\"order-789\",\"status\":\"created\"}";
        ProducerRecord<String, String> record = new ProducerRecord<>("orders", userId, orderEvent);

        producer.send(record, (metadata, exception) -> {
            if (exception != null) {
                System.err.println("Failed to send: " + exception.getMessage());
            } else {
                System.out.println("Sent to partition " + metadata.partition() + " at offset " + metadata.offset());
            }
        });

        producer.close();
    }
}
Output
Sent to partition 3 at offset 42
Senior Shortcut: Partition Count Formula
Start with max(3 * expected_consumers, 6). Monitor consumer lag. If lag grows, increase partitions by doubling. Partition count is independent of replication: running replication factor 3 with min.insync.replicas=2 requires at least 3 brokers, not 3 partitions, and min.insync.replicas defaults to 1, so set it to 2 explicitly.
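To make the key-based routing in this section concrete, here is a minimal sketch of how a key maps to a partition and why the mapping shifts when the partition count changes. It assumes a stand-in hash: Kafka's default partitioner hashes the serialized key with murmur2, while this sketch uses `Arrays.hashCode` only to stay dependency-free, so the partition numbers it prints will not match a real cluster. The class `KeyPartitioner` and both method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyPartitioner {
    // Stand-in for Kafka's murmur2-based partitioner (assumption for this
    // sketch): hash the key bytes, then take a non-negative modulus.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    // The article's rule of thumb: max(3 * expected_consumers, 6).
    static int startingPartitionCount(int expectedConsumers) {
        return Math.max(3 * expectedConsumers, 6);
    }

    public static void main(String[] args) {
        int partitions = startingPartitionCount(4); // 12
        for (String user : new String[] {"user-42", "user-42", "user-7"}) {
            // The same key always maps to the same partition
            System.out.println(user + " -> partition " + partitionFor(user, partitions));
        }
        // Growing the topic changes the modulus, so an existing key may move
        System.out.println("user-42 with 24 partitions -> partition "
                + partitionFor("user-42", 24));
    }
}
```

The determinism is what gives per-entity ordering. A random key would scatter one user's events across partitions and lose it.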

Replication: How Kafka Survives Broker Failures

Kafka replicates each partition across multiple brokers (configurable via replication.factor). One broker is the leader for a partition; the rest are followers. Producers write to the leader; followers replicate the data. If the leader fails, a follower becomes the new leader (controlled by the controller broker).

The key config is min.insync.replicas — the minimum number of replicas that must acknowledge a write for it to be considered committed. If you set acks=all and min.insync.replicas=2, a write must be acknowledged by the leader and at least one follower. This prevents data loss if the leader crashes after acknowledging but before the follower replicates.

Here's the gotcha: if the number of in-sync replicas drops below min.insync.replicas, the leader rejects writes from producers using acks=all (NotEnoughReplicasException). Producers using acks=0 or acks=1 are not protected by this setting at all. I've seen this happen when a broker goes down for maintenance and the team forgot to adjust min.insync.replicas. Set min.insync.replicas to replication.factor - 1 (2 with replication factor 3) so you can lose one broker and keep writing, and be prepared for write unavailability if a second replica falls out of sync.

ReplicationConfig.systemdesign
// io.thecodeforge — System Design tutorial
// Topic creation with replication settings

// Command to create a topic with replication factor 3 and min.insync.replicas=2
// kafka-topics.sh --bootstrap-server localhost:9092 --create --topic orders \
//   --partitions 6 --replication-factor 3 --config min.insync.replicas=2

// Verify with:
// kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders
// Output:
// Topic: orders	PartitionCount: 6	ReplicationFactor: 3	Configs: min.insync.replicas=2
// 	Topic: orders	Partition: 0	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
// 	Topic: orders	Partition: 1	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
// ...
Output
Topic: orders PartitionCount: 6 ReplicationFactor: 3 Configs: min.insync.replicas=2
Topic: orders Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Never Do This: acks=0 in Production
Setting acks=0 means the producer doesn't wait for any acknowledgment. You will lose data on broker failure. I've seen a metrics pipeline use this and lose 20% of events during a rolling restart. Use acks=all with min.insync.replicas=2 for any data you care about.
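The interaction between acks and min.insync.replicas described in this section can be captured in a few lines. This is a simplified model, not broker code: `ReplicationRules`, `acceptsWrite`, and `tolerableFailures` are hypothetical names, and the ISR is represented as a plain set of broker IDs.

```java
import java.util.Set;

class ReplicationRules {
    enum Acks { NONE, LEADER, ALL } // acks=0, acks=1, acks=all

    // Only acks=all is gated on ISR size. With acks=0 or acks=1 the leader
    // accepts the write regardless of how many replicas are in sync.
    static boolean acceptsWrite(Acks acks, Set<Integer> isr, int minInsyncReplicas) {
        if (acks != Acks.ALL) {
            return true;
        }
        return isr.size() >= minInsyncReplicas;
    }

    // How many replicas can drop out of the ISR before acks=all writes fail.
    static int tolerableFailures(int replicationFactor, int minInsyncReplicas) {
        return Math.max(0, replicationFactor - minInsyncReplicas);
    }

    public static void main(String[] args) {
        // Replication factor 3, min.insync.replicas=2
        System.out.println(acceptsWrite(Acks.ALL, Set.of(1, 2, 3), 2)); // true
        System.out.println(acceptsWrite(Acks.ALL, Set.of(1, 2), 2));    // true
        System.out.println(acceptsWrite(Acks.ALL, Set.of(1), 2));       // false
        System.out.println(tolerableFailures(3, 2));                    // 1
    }
}
```

With replication factor 3 and min.insync.replicas=3 the tolerance drops to zero, so a single broker restart blocks all acks=all writes. That is the availability cost of maximum durability.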
Figure: Kafka Partition Replication Flow (leader-follower model for fault tolerance). The producer writes only to the partition leader; the leader accepts writes and serves reads; followers 1 and 2 replicate from the leader and stay in sync; on failure a new leader is elected from the ISR. Only in-sync replicas (ISR) can become leader.

Consumer Groups: Scaling Reads Without Losing Order

Consumer groups allow multiple consumers to read from a topic in parallel. Each partition is assigned to exactly one consumer in the group. This ensures ordering within a partition (since only one consumer reads it) while allowing horizontal scaling. If you have more consumers than partitions, some consumers will be idle.

The consumer group protocol uses a group coordinator broker and a rebalance protocol. When a consumer joins or leaves, a rebalance triggers, and partitions are reassigned. During rebalance, all consumers in the group stop processing (stop-the-world). This can cause latency spikes. I've seen teams with hundreds of consumers experience 30-second rebalances. Mitigation: use static group membership (introduced in Kafka 2.3) or cooperative rebalancing (incremental rebalance).

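Another gotcha: if your consumer processes messages slowly, lag grows. If it crashes, the group rebalances, and the consumer that takes over the partition resumes from the last committed offset. With auto-commit, offsets are committed on a timer during poll(), so an offset can be committed before its record has actually been processed (for example, when work is handed to another thread), and a crash at that point skips the record. Commit manually after processing, not before. That gives you at-least-once delivery: records processed after the last commit are redelivered after a crash, so make processing idempotent.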

ManualCommitConsumer.systemdesign
// io.thecodeforge — System Design tutorial
// Consumer with manual offset commit after processing

import org.apache.kafka.clients.consumer.*;
import java.time.Duration;
import java.util.*;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-processors");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false"); // Manual commit
        props.put("max.poll.records", "100");

        Consumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("orders"));

        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    processOrder(record.value());
                }
                // Commit after successful batch processing
                consumer.commitSync();
            }
        } finally {
            consumer.close();
        }
    }

    private static void processOrder(String orderJson) {
        // Simulate processing (e.g., update database, call API)
        System.out.println("Processing: " + orderJson);
    }
}
Output
Processing: {"orderId":"order-789","status":"created"}
Processing: {"orderId":"order-790","status":"created"}
Interview Gold: Rebalance Protocol
During a rebalance, all consumers in the group stop processing. The group coordinator assigns partitions using a partition assignor (range, round-robin, sticky, or cooperative). Cooperative rebalancing (introduced in KIP-429) allows consumers to keep processing partitions that aren't being reassigned. Use partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor for large groups.
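To see why extra consumers sit idle, it helps to sketch what an assignor does. The version below is a deliberately simplified round-robin over a single topic's partitions. `RoundRobinAssignment` is a hypothetical class, and Kafka's real assignors also handle multiple topics, member ordering, and stickiness.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RoundRobinAssignment {
    // Each partition gets exactly one owner in the group; consumers beyond
    // the partition count end up with an empty list.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 4 consumers, 3 partitions: c4 has nothing to read
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3));
        // prints {c1=[0], c2=[1], c3=[2], c4=[]}

        // 2 consumers, 6 partitions: each consumer owns 3
        System.out.println(assign(List.of("c1", "c2"), 6));
        // prints {c1=[0, 2, 4], c2=[1, 3, 5]}
    }
}
```

Every membership change recomputes this map, which is why an eager rebalance pauses the whole group. Cooperative assignors limit the pause to partitions whose owner actually changes.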

Producers: Idempotence, Batching, and Throughput

Kafka producers batch records to improve throughput. The batch.size and linger.ms configs control batching. batch.size is the maximum number of bytes to accumulate in a per-partition batch before sending; linger.ms is the maximum time to wait for more records. Setting linger.ms=5 adds up to 5ms of latency per batch but can noticeably raise throughput under moderate load, because more records share each request.

Idempotent producers (enable.idempotence=true) prevent duplicate records caused by retries. When a producer sends a batch and the broker acknowledges but the producer doesn't receive the ack (e.g., network issue), the producer retries. Without idempotence, this can cause duplicate records. With idempotence, the broker deduplicates based on producer ID and sequence number. Always enable idempotence in production — the performance cost is negligible.

Here's a gotcha: with idempotence enabled, Kafka preserves ordering within a partition even with max.in.flight.requests.per.connection > 1, as long as the value is at most 5. Without idempotence, any value above 1 lets a retried batch land after a later batch, producing out-of-order writes. I've seen a logging pipeline lose ordering because of this. Keep max.in.flight.requests.per.connection=5 with idempotence for high throughput.

IdempotentProducer.systemdesign
// io.thecodeforge — System Design tutorial
// Producer with idempotence and batching

import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class HighThroughputProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");
        props.put("enable.idempotence", "true"); // Prevent duplicates
        props.put("batch.size", 16384); // 16KB
        props.put("linger.ms", 5); // Wait up to 5ms for more records
        props.put("compression.type", "snappy"); // Reduce network bandwidth

        Producer<String, String> producer = new KafkaProducer<>(props);

        for (int i = 0; i < 1000; i++) {
            String userId = "user-" + (i % 100);
            String event = "{\"eventId\":\"" + i + "\"}";
            producer.send(new ProducerRecord<>("events", userId, event));
        }

        producer.close();
    }
}
Output
(No output — records sent asynchronously)
Senior Shortcut: Compression
Use compression.type=snappy or lz4 for text-heavy data. Snappy gives good compression with low CPU overhead. Gzip compresses more but uses more CPU. For high-throughput pipelines, Snappy is the sweet spot.
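The deduplication behind idempotent producers can be modeled in a few lines. This is a simplified sketch of the idea (producer ID plus sequence number), not the broker's implementation: a real broker tracks sequences per batch and per producer epoch. `DedupPartitionLog` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DedupPartitionLog {
    private final List<String> records = new ArrayList<>();
    // producerId -> last sequence number appended to this partition
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    // Returns true if the record was appended, false if it was a duplicate
    // retry that the log had already stored.
    boolean append(long producerId, int sequence, String value) {
        int last = lastSequence.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // retry of something already written: drop it
        }
        if (sequence != last + 1) {
            // A gap means an earlier write is missing; reject rather than reorder
            throw new IllegalStateException(
                    "out-of-order sequence " + sequence + ", expected " + (last + 1));
        }
        records.add(value);
        lastSequence.put(producerId, sequence);
        return true;
    }

    int size() {
        return records.size();
    }

    public static void main(String[] args) {
        DedupPartitionLog log = new DedupPartitionLog();
        System.out.println(log.append(7L, 0, "event-0")); // true
        System.out.println(log.append(7L, 1, "event-1")); // true
        // The ack for sequence 1 was lost, so the producer retries it
        System.out.println(log.append(7L, 1, "event-1")); // false
        System.out.println(log.size());                   // 2
    }
}
```

The retry is acknowledged to the producer without being stored twice, which is why enabling idempotence removes duplicates without changing application code.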

When Not to Use Kafka: The Overkill Trap

Kafka is not a message queue. If you need point-to-point messaging with low latency (sub-millisecond) and don't need replay or long-term storage, use RabbitMQ or Redis Streams. If you need a simple work queue for background jobs, use Redis or a database-backed queue. Kafka's strength is in streaming, replay, and multi-subscriber patterns.

I've seen teams use Kafka for a simple email notification service with 10 messages per second. They ended up with a 3-broker cluster, ZooKeeper (or KRaft), and a complex deployment. A single Redis instance would have handled it with less ops overhead. Don't use Kafka unless you need at least two of: replay, multiple consumer groups, or high throughput (>10K msg/s).

The Classic Bug: Using Kafka as a Database
Kafka is not a database. You cannot query records by key efficiently. If you need to look up the latest state for an entity, use a database or a compacted topic with a state store (Kafka Streams). I've seen teams try to scan a topic from the beginning to find a record — that's a full table scan on every request.
Figure: Kafka vs. Traditional Message Queues (when to choose Kafka vs. RabbitMQ/Redis). Kafka (distributed log): high throughput, replayable log; ordering within partitions; long-term storage, multi-consumer; best for event streaming, ETL. RabbitMQ / Redis: sub-millisecond point-to-point; no replay, transient messages; simple work queue pattern; best for low-latency tasks. Kafka excels at streams; queues win for simple jobs.
Production incident · Post-mortem · Severity: high

The 4GB Container That Kept Dying

Symptom
A Kafka broker in a Kubernetes cluster kept crashing with OutOfMemoryError every 6 hours. Heap was 3GB, container limit 4GB.
Assumption
The team assumed a memory leak in the Kafka process and planned to increase the container limit.
Root cause
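The topic had log.retention.bytes=1GB but log.segment.bytes=1GB and log.retention.check.interval.ms=300000. Size-based retention applies per partition and only removes closed segments, so with the segment size equal to the retention size each partition could hold nearly twice the configured limit before anything was deleted, and the cleaner only checked every 5 minutes. With high throughput, segments accumulated faster than retention could clean them, causing the log directory to grow beyond the container's disk limit, which triggered OOM as the OS tried to page.
Fix
Set log.retention.bytes to 500MB (524288000) and log.segment.bytes to 256MB (268435456); both settings take values in bytes. Also set log.retention.check.interval.ms=60000 to clean more frequently. Added disk monitoring with Prometheus.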
Key lesson
  • Always align segment size, retention size, and check interval with your throughput.
  • Disk pressure kills brokers faster than CPU or memory.
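The arithmetic behind this incident is easier to see in code. The sketch below models size-based retention for one partition under two simplifying assumptions: the active (last) segment is never deleted, and a closed segment is deleted only if the partition would still hold at least the retention size afterwards. `SizeRetention` and `segmentsToDelete` are hypothetical names, and sizes are in MB for readability.

```java
import java.util.ArrayList;
import java.util.List;

class SizeRetention {
    // segmentSizes is ordered oldest first; the last entry is the active
    // segment. Returns the sizes of the segments that retention would delete.
    static List<Long> segmentsToDelete(List<Long> segmentSizes, long retentionBytes) {
        long total = 0;
        for (long size : segmentSizes) {
            total += size;
        }
        List<Long> deleted = new ArrayList<>();
        // Stop before the last element: the active segment is kept (assumption)
        for (int i = 0; i < segmentSizes.size() - 1; i++) {
            long size = segmentSizes.get(i);
            if (total - size < retentionBytes) {
                break; // deleting would drop the partition below the retention size
            }
            deleted.add(size);
            total -= size;
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Segment size == retention size (1024 MB each): one closed segment
        // plus a half-full active one already exceeds the limit, yet nothing
        // is deletable.
        System.out.println(segmentsToDelete(List.of(1024L, 512L), 1024L)); // []

        // Smaller segments (256 MB) with 500 MB retention: the oldest
        // segment is reclaimed much earlier.
        System.out.println(segmentsToDelete(List.of(256L, 256L, 256L, 100L), 500L)); // [256]
    }
}
```

Under these assumptions a partition peaks at roughly the retention size plus one segment, so smaller segments tighten the bound on disk usage.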
Production debug guide
Systematic recovery paths for the failure modes engineers actually hit. 3 entries.
Symptom · 01
Consumer lag growing unbounded
Fix
1. Run kafka-consumer-groups --bootstrap-server localhost:9092 --group my-group --describe to check lag per partition.
2. Check whether the number of consumers equals the number of partitions.
3. If lag is concentrated on specific partitions, check whether those partitions have a slow consumer or a hot key.
4. Increase partitions (double) and add consumers.
Symptom · 02
Produce request timeout (TimeoutException)
Fix
1. Check broker CPU and disk I/O.
2. Check min.insync.replicas: if the ISR count drops below it, acks=all writes are rejected.
3. Increase request.timeout.ms and delivery.timeout.ms.
4. Ensure the network is stable.
Symptom · 03
NotEnoughReplicasException
Fix
1. Run kafka-topics --bootstrap-server localhost:9092 --describe --topic my-topic to see the ISR.
2. If a broker is down, wait for it to come back or reduce min.insync.replicas temporarily.
3. Check whether unclean.leader.election.enable is set to false (default). If so, partitions with no ISR become unavailable.
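The LAG column in the first entry is just the gap between the log end offset and the group's committed offset for each partition. A minimal sketch of that arithmetic follows, useful for spotting a hot partition in exported offsets. `LagReport` and its methods are hypothetical, and treating a missing committed offset as 0 is an assumption of this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

class LagReport {
    // lag(partition) = log end offset - committed offset, floored at 0
    static Map<Integer, Long> lagByPartition(Map<Integer, Long> logEndOffsets,
                                             Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> entry : logEndOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            lag.put(entry.getKey(), Math.max(0L, entry.getValue() - committed));
        }
        return lag;
    }

    // Partition with the largest lag, or -1 if the map is empty. One
    // outlier usually points at a hot key or a slow consumer.
    static int worstPartition(Map<Integer, Long> lag) {
        int worst = -1;
        long max = -1;
        for (Map.Entry<Integer, Long> entry : lag.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                worst = entry.getKey();
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        Map<Integer, Long> end = Map.of(0, 1000L, 1, 1000L, 2, 9000L);
        Map<Integer, Long> committed = Map.of(0, 990L, 1, 1000L, 2, 1200L);
        Map<Integer, Long> lag = lagByPartition(end, committed);
        System.out.println(lag);                 // {0=10, 1=0, 2=7800}
        System.out.println(worstPartition(lag)); // 2
    }
}
```

Evenly distributed lag suggests too little consumer capacity overall. Lag concentrated on one partition suggests a key distribution problem that more consumers will not fix.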
Kafka and the Distributed Log Triage Cheat Sheet
First-response commands for when things go wrong, copy-paste ready.
Consumer lag: `LAG` column shows large numbers
Immediate action
Check if consumers are alive and processing
Commands
kafka-consumer-groups --bootstrap-server localhost:9092 --group my-group --describe
kafka-consumer-groups --bootstrap-server localhost:9092 --group my-group --reset-offsets --to-latest --all-topics --execute   # last resort: the group must be stopped first, and all unread messages are skipped
Fix now
Increase partitions or add consumers. If hot key, repartition with better key.
Producer `TimeoutException`
Immediate action
Check broker availability and ISR
Commands
kafka-broker-api-versions --bootstrap-server localhost:9092
kafka-topics --bootstrap-server localhost:9092 --describe --topic my-topic
Fix now
Increase request.timeout.ms or reduce min.insync.replicas temporarily.
Broker `OutOfMemoryError`
Immediate action
Check disk usage and segment count
Commands
df -h /var/lib/kafka/data
ls /var/lib/kafka/data/my-topic-* | wc -l
Fix now
Reduce log.segment.bytes and log.retention.bytes. Increase JVM heap.
Rebalance happening too often
Immediate action
Check consumer session timeout and heartbeat
Commands
grep 'rebalance' /var/log/kafka/consumer.log
kafka-consumer-groups --bootstrap-server localhost:9092 --group my-group --describe
Fix now
Increase session.timeout.ms and keep heartbeat.interval.ms at about one third of it. Check that max.poll.interval.ms exceeds your worst-case batch processing time. Use cooperative rebalancing.
| Feature / Aspect | Kafka (Distributed Log) | RabbitMQ (Message Queue) |
|---|---|---|
| Storage model | Append-only log on disk; retains messages for configurable time/size | In-memory or on disk; messages deleted after acknowledgment |
| Ordering | Strict ordering within a partition; no global ordering | Ordering per queue; can be lost with multiple consumers |
| Throughput | Millions of messages per second (with proper tuning) | Tens of thousands per second |
| Replay | Yes: consumers can rewind to any offset | No: messages are deleted after ack |
| Consumer model | Pull-based; consumer groups for parallelism | Push-based; competing consumers |
| Use case | Event streaming, log aggregation, data pipelines | Task queues, RPC, point-to-point messaging |

Key takeaways

1. Kafka's distributed log is an append-only, partitioned, replicated commit log, not a message queue. Design your system around this model.
2. Always use idempotent producers (enable.idempotence=true) and manual offset commits (enable.auto.commit=false) in production to prevent data loss and duplicates.
3. Partition count is critical: too few limits parallelism, too many kills sequential I/O. Start with 3x expected consumers and monitor.
4. Kafka is overkill for simple point-to-point messaging or low-throughput task queues. Use RabbitMQ or Redis for those cases.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · SENIOR: How does Kafka handle a broker failure? Describe the leader election pro...
Q02 · SENIOR: When would you choose Kafka over a traditional message queue like Rabbit...
Q03 · SENIOR: What happens when a consumer crashes after processing a message but befo...
Q04 · JUNIOR: What is a Kafka partition and why is it important?
Q05 · SENIOR: You notice consumer lag growing despite having enough consumers. How do ...
Q06 · SENIOR: How would you design a system that processes 1 million events per second...

Q01 of 06 · SENIOR

How does Kafka handle a broker failure? Describe the leader election process.

ANSWER
The controller broker detects the failure via ZooKeeper (or KRaft quorum). It selects a new leader from the in-sync replicas (ISR) for each partition that had the failed broker as leader. If unclean.leader.election.enable=false and no ISR exists, the partition becomes unavailable. The new leader starts accepting writes and followers replicate from it.
FAQ · 4 QUESTIONS

Frequently Asked Questions

01. What is the difference between Kafka and a traditional message queue?
02. How do I choose the number of partitions for a Kafka topic?
03. How do I prevent duplicate messages in Kafka?
04. What happens if all in-sync replicas for a partition fail?