Understanding Kafka Topic Partitions - by Dunith Danushka - Tributary Data - Medium

The document discusses Kafka topic partitions and how they are used to distribute data across brokers for scalability and redundancy. Partitions allow topics to be divided into smaller pieces that can be distributed and replicated across multiple servers. Producers can write to partitions using partition keys to group related data, while consumers read from partitions in order by offset.

Uploaded by

ajay.rathore

14/09/2023, 12:25 Understanding Kafka Topic Partitions | by Dunith Danushka | Tributary Data | Medium

Understanding Kafka Topic Partitions

Everything in Kafka is modeled around partitions. They rule Kafka’s storage,
scalability, replication, and message movement.

Dunith Danushka

Published in Tributary Data
7 min read · Mar 29, 2021

Photo by Meggyn Pomerleau on Unsplash

P.S. I made some edits to reflect the feedback I received from the audience. Thanks for your
valuable contributions. I expect more! :)

https://medium.com/event-driven-utopia/understanding-kafka-topic-partitions-ae40f80552e8 1/17

Everything in Kafka revolves around partitions. They play a crucial role in
structuring Kafka’s storage and the production and consumption of messages.

Understanding partitions helps you learn Kafka faster. This article walks through
the concepts, structure, and behavior of Kafka’s partitions.

Events, Streams, and Kafka Topics

Before diving into partitions, we need to set the stage here. So let’s look at some
high-level concepts and how they relate to partitions.

Events
An event represents a fact that happened in the past. Events are immutable and
never stay in one place. They always travel from one system to another system,
carrying the state changes that happened.

Streams
An event stream represents related events in motion.

Topics
When an event stream enters Kafka, it is persisted as a topic. In Kafka’s universe, a
topic is a materialized event stream. In other words, a topic is a stream at rest.

A topic groups related events together and stores them durably. The closest analogy
for a Kafka topic is a table in a database or a folder in a file system.

Topics are the central concept in Kafka that decouples producers and consumers. A
consumer pulls messages off of a Kafka topic while producers push messages into a
Kafka topic. A topic can have many producers and many consumers.


How events, streams, and topics play together in Kafka

Partitions
Kafka’s topics are divided into several partitions. While the topic is a logical concept
in Kafka, a partition is the smallest storage unit, holding a subset of the records owned
by a topic. Each partition is an append-only log: records are only ever written to its
end. On disk, a partition’s log is stored as a sequence of segment files.

When talking about the content inside a partition, I will use the terms record and
message interchangeably.

Offsets and the ordering of messages

The records in the partitions are each assigned a sequential identifier called the
offset, which is unique for each record within the partition.


The offset is an incremental and immutable number maintained by Kafka. When a
record is written to a partition, it is appended to the end of the log and assigned the
next sequential offset. Offsets are particularly useful for consumers when reading
records from a partition. We’ll come to that at a later point.

The figure below shows a topic with three partitions. Records are being appended
to the end of each one.

Although the messages within a partition are ordered, messages across a topic are
not guaranteed to be ordered.

A topic in Kafka is broken into multiple partitions
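To make offsets concrete, here is a minimal Python sketch of a topic modeled as a set of append-only partition logs. It is a toy in-memory model and does not reflect how a broker actually stores data. The `Topic` and `Partition` classes and the `orders` topic are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy append-only log: records are only ever added at the end."""
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        # The next offset is the current length of the log, so offsets
        # start at 0 and grow by one per record within this partition.
        offset = len(self.records)
        self.records.append(record)
        return offset

    def read(self, offset: int):
        # Reading never removes a record; any offset can be re-read.
        return self.records[offset]


@dataclass
class Topic:
    """Toy topic: a logical name over a fixed set of partitions."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [Partition() for _ in range(self.num_partitions)]


orders = Topic("orders", num_partitions=3)
first = orders.partitions[0].append("order-1")
second = orders.partitions[0].append("order-2")
other = orders.partitions[2].append("order-3")
print(first, second, other)  # 0 1 0  (offsets are per partition)
```

Because every partition counts its offsets independently, offset 0 exists once per partition. This is why ordering holds within a partition but not across the whole topic.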

Partitions are the way that Kafka provides scalability

A Kafka cluster is made of one or more servers. In the Kafka universe, they are
called brokers. Each broker holds a subset of the cluster’s records.


Kafka distributes the partitions of a particular topic across multiple brokers. By
doing so, we get the following benefits.

If we were to put all partitions of a topic on a single broker, the scalability of that
topic would be constrained by that broker’s I/O throughput. A topic could never grow
bigger than the biggest machine in the cluster. By spreading partitions across
multiple brokers, a single topic can be scaled horizontally to provide
performance far beyond a single broker’s ability.

A single topic can be consumed by multiple consumers in parallel. Serving all
partitions from a single broker limits the number of consumers it can support.
Partitions on multiple brokers enable more consumers.

Multiple instances of the same consumer can connect to partitions on different
brokers, allowing very high message processing throughput. Within a consumer
group, each partition is assigned to exactly one consumer instance, although one
instance may own several partitions. This ensures that each record has a clear
processing owner.

Partitions are the way that Kafka provides redundancy.

Kafka can keep more than one copy of the same partition on different brokers. The
number of copies is set by the topic’s replication factor, and each redundant copy is
called a replica. If a broker fails, Kafka can still serve consumers from the replicas of
the partitions that the failed broker hosted.

Partition replication is complex, and it deserves its own post. Next time maybe?

Writing records to partitions

How does a producer decide which partition a record should go to? There are three
ways a producer can make that decision.

Using a partition key to specify the partition

A producer can use a partition key to direct messages to a specific partition. A
partition key can be any value that can be derived from the application context. A
unique device ID or a user ID will make a good partition key.

By default, the partition key is passed through a hashing function, and the hash,
taken modulo the number of partitions, determines the partition assignment. This
ensures that all records produced with the same key arrive at the same partition, as
long as the number of partitions does not change. Specifying a partition key keeps
related events together in the same partition and in the exact order in which they
were sent.
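The key-to-partition mapping can be sketched in a few lines of Python. This is an illustration under assumptions. `partition_for_key` is a hypothetical helper, and it uses CRC32 from the standard library as a stand-in hash. Kafka’s Java client hashes the key bytes with murmur2 instead, so real partition numbers will differ. The property the sketch demonstrates is the same: for a fixed partition count, equal keys always map to the same partition.

```python
import zlib
from collections import defaultdict


def partition_for_key(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index in [0, num_partitions)."""
    # Assumption: CRC32 stands in for the real hash. The Java client's
    # default partitioner uses murmur2, so actual partition numbers differ.
    key_hash = zlib.crc32(key.encode("utf-8"))  # unsigned 32-bit in Python 3
    return key_hash % num_partitions


# Events for two users, interleaved as a producer might send them.
events = [
    ("user-42", "login"),
    ("user-7", "login"),
    ("user-42", "add-to-cart"),
    ("user-7", "logout"),
    ("user-42", "checkout"),
]

partitions = defaultdict(list)
for key, payload in events:
    partitions[partition_for_key(key, num_partitions=6)].append((key, payload))

# Every event for user-42 sits in one partition, in send order.
p = partition_for_key("user-42", num_partitions=6)
user42 = [payload for key, payload in partitions[p] if key == "user-42"]
print(user42)  # ['login', 'add-to-cart', 'checkout']
```

The result is taken modulo `num_partitions`. As a consequence, adding partitions to a topic can move a key to a different partition, so plan for this before resizing a keyed topic.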


Messages with the same partition key will end up at the same partition

Key-based partition assignment can lead to partition skew, and therefore broker skew,
if keys aren’t well distributed.

For example, when customer ID is used as the partition key and one customer
generates 90% of the traffic, one partition will receive 90% of the traffic most of
the time. On small topics, this is negligible; on larger ones, it can sometimes take a
broker down.

When choosing a partition key, ensure that its values are well distributed.
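A quick way to spot skew is to count where each record lands. The sketch below reuses the same CRC32 stand-in hash, which is not Kafka’s murmur2. It runs on hypothetical traffic in which one customer sends 90 of 100 records. `partition_load` is a hypothetical helper for this illustration.

```python
import zlib
from collections import Counter


def partition_for_key(key: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32); Kafka's Java client uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_load(keys, num_partitions):
    """Return the fraction of records that lands on each partition."""
    counts = Counter(partition_for_key(k, num_partitions) for k in keys)
    total = len(keys)
    return {p: counts[p] / total for p in range(num_partitions)}


# Hypothetical traffic: one "whale" customer sends 90 of 100 records.
keys = ["customer-1"] * 90 + [f"customer-{i}" for i in range(2, 12)]
load = partition_load(keys, num_partitions=6)
hottest = max(load.values())
print(f"hottest partition share: {hottest:.0%}")  # at least 90%, whatever the hash
```

Adding partitions does not help here. Every record for `customer-1` still hashes to one partition, so the fix has to come from the choice of key.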

Allowing Kafka to decide the partition

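If a producer doesn’t specify a partition key when producing a record, the producer
client chooses the partition itself. Older clients used round-robin assignment. Since
Kafka 2.4, the default partitioner is a sticky partitioner, which fills a batch for one
partition before switching to another. Either way, keyless records end up spread
across all partitions of the topic over time.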

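However, if no partition key is used, related records can land on different partitions.
Each partition still preserves the order in which its records were appended, but there
is no ordering guarantee for related records spread across partitions.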
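The keyless strategies can be compared with a small simulation. Both functions are simplified models written for this illustration:

- `round_robin` rotates through the partitions record by record.
- `sticky` keeps writing to one partition until a batch of `batch_size` records fills up.

The real sticky partitioner does not rotate through partitions in a fixed order. This version only shows the batching idea.

```python
from itertools import cycle


def round_robin(records, num_partitions):
    """Older keyless behaviour: rotate through partitions one record at a time."""
    assignment = {p: [] for p in range(num_partitions)}
    next_partition = cycle(range(num_partitions))
    for record in records:
        assignment[next(next_partition)].append(record)
    return assignment


def sticky(records, num_partitions, batch_size):
    """Simplified sticky behaviour: fill one batch per partition, then move on."""
    assignment = {p: [] for p in range(num_partitions)}
    for i, record in enumerate(records):
        assignment[(i // batch_size) % num_partitions].append(record)
    return assignment


records = ["step-1", "step-2", "step-3", "step-4"]  # one workflow, in send order
rr = round_robin(records, num_partitions=3)
st = sticky(records, num_partitions=3, batch_size=2)
print(rr)  # {0: ['step-1', 'step-4'], 1: ['step-2'], 2: ['step-3']}
print(st)  # {0: ['step-1', 'step-2'], 1: ['step-3', 'step-4'], 2: []}
```

With `round_robin`, the four steps of one workflow are scattered over three partitions. Each partition is still ordered internally. However, a consumer reading partition 1 sees `step-2` with no guarantee about when `step-1` is read from partition 0.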

The key takeaway is to use a partition key to put related events together in the same
partition in the exact order in which they were sent.

Writing a custom partitioner

In some situations, a producer can use its own partitioner implementation that uses
other business rules to do the partition assignment.
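In the Java client, a custom partitioner is a class that implements the `org.apache.kafka.clients.producer.Partitioner` interface. It is registered through the producer’s `partitioner.class` setting. The Python sketch below only illustrates the kind of business rule such a class might encode. The VIP rule, `VIP_CUSTOMERS`, and `vip_partitioner` are hypothetical, and CRC32 is a stand-in hash.

```python
import zlib

# Hypothetical business rule for illustration: VIP customers get a dedicated
# partition (0) so their records never queue behind other customers' records;
# everyone else is hashed over the remaining partitions.
VIP_CUSTOMERS = {"customer-1", "customer-99"}


def vip_partitioner(key: str, num_partitions: int) -> int:
    """Return a partition index in [0, num_partitions) for a customer key."""
    if num_partitions < 2:
        raise ValueError("need at least 2 partitions: one is reserved for VIPs")
    if key in VIP_CUSTOMERS:
        return 0
    # Stand-in hash (CRC32, not Kafka's murmur2) over partitions 1..num_partitions-1.
    return 1 + zlib.crc32(key.encode("utf-8")) % (num_partitions - 1)


vip = vip_partitioner("customer-1", num_partitions=4)
regular = vip_partitioner("customer-5", num_partitions=4)
print(vip, regular)  # vip is 0; regular is somewhere in 1..3
```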

Reading records from partitions

Unlike many other pub/sub systems, Kafka doesn’t push messages to consumers.
Instead, consumers have to pull messages from Kafka topic partitions. A consumer
connects to the broker that hosts a partition and reads the messages in the order in
which they were written.


The offset of a message works as a consumer-side cursor at this point. The
consumer keeps track of which messages it has already consumed by keeping track
of their offsets. After reading a message, the consumer advances its cursor to the
next offset in the partition and continues. Advancing the cursor, and deciding when
to commit the last processed offset, is the responsibility of the consumer. Kafka does
not track acknowledgements for individual messages. For consumer groups, however,
Kafka does store the committed offsets in an internal topic named __consumer_offsets.

By remembering the offset of the last consumed message for each partition, a
consumer can join a partition at a point of its choosing and resume from there.
That is particularly useful for a consumer to resume reading after recovering
from a crash.
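The cursor-and-commit cycle can be modeled in plain Python. This is a sketch under assumptions. `OffsetStore` is a hypothetical in-memory stand-in for the `__consumer_offsets` topic, and `consume` is not a Kafka client API. The sketch follows the Kafka convention of committing the offset of the next record to read, not the last one processed.

```python
from dataclasses import dataclass, field


@dataclass
class OffsetStore:
    """Hypothetical stand-in for committed offsets, keyed by (group, partition)."""
    committed: dict = field(default_factory=dict)

    def commit(self, group, partition, next_offset):
        self.committed[(group, partition)] = next_offset

    def fetch(self, group, partition):
        # Nothing committed yet: start from offset 0, similar in spirit
        # to auto.offset.reset=earliest.
        return self.committed.get((group, partition), 0)


def consume(log, store, group, partition, max_records):
    """Read up to max_records from the committed position, then commit."""
    position = store.fetch(group, partition)
    processed = []
    while position < len(log) and len(processed) < max_records:
        processed.append(log[position])
        position += 1  # advance the cursor to the next offset
    store.commit(group, partition, position)  # commit the next offset to read
    return processed


log = ["e0", "e1", "e2", "e3", "e4"]  # one partition's records, offsets 0..4
store = OffsetStore()
first = consume(log, store, "billing", 0, max_records=3)
# ...the "billing" consumer crashes and restarts; its committed offset survives...
resumed = consume(log, store, "billing", 0, max_records=3)
other_group = consume(log, store, "audit", 0, max_records=2)
print(first, resumed, other_group)  # ['e0', 'e1', 'e2'] ['e3', 'e4'] ['e0', 'e1']
```

The `audit` group reads the same partition from its own offset, independently of `billing`. This is how one partition can be consumed by several groups at different offsets.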

A partition can be consumed by one or more consumers, for example consumers in
different consumer groups, each reading at its own offset.

Kafka has the concept of consumer groups where several consumers are grouped to
consume a given topic. Consumers in the same consumer group are assigned the
same group-id value.


The consumer group concept ensures that, in normal operation, each message is read
by only one consumer in the group.

When a consumer group consumes the partitions of a topic, Kafka makes sure that
each partition is consumed by exactly one consumer in the group.

The following figure shows the above relationship.

A group of consumers consuming from multiple partitions. Source

Consumer groups enable consumers to parallelize and process messages at very
high throughput. However, the maximum parallelism of a group equals the number
of partitions of that topic.

For example, if you have N + 1 consumers for a topic with N partitions, the first
N consumers will each be assigned a partition and the remaining consumer will be idle.
If one of the N consumers fails, the group rebalances and the idle consumer can take
over its partition. This is a good way to implement hot failover.

The figure below illustrates this.


The key takeaway is that the number of consumers doesn’t govern the degree of
parallelism of a topic; the number of partitions does.
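Partition assignment within a group can be sketched as below. The `assign` function is a simplified round-robin assignor written for this example. Kafka ships several built-in assignors (range, round-robin, sticky, and cooperative sticky) that differ in detail. The sketch only shows the two properties discussed above: each partition has exactly one owner, and consumers beyond the partition count sit idle.

```python
def assign(partitions, consumers):
    """Simplified round-robin assignor: every partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2]               # N = 3 partitions
group = ["c1", "c2", "c3", "c4"]     # N + 1 consumers in one group

before = assign(partitions, group)
print(before)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}  (c4 is idle)

group.remove("c2")                   # c2 fails; the group rebalances
after = assign(partitions, group)
print(after)   # {'c1': [0], 'c3': [1], 'c4': [2]}

# Active consumers never exceed the partition count.
active = sum(1 for owned in before.values() if owned)
print(active)  # 3
```

This naive reassignment also reshuffles a partition that already had a healthy owner: partition 2 moves from c3 to c4. Kafka’s sticky assignors aim to keep such partitions where they are during a rebalance.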

Written by Dunith Danushka

Editor of Tributary Data. Technologist, Writer, Senior Developer Advocate at Redpanda. Opinions are my
own.
