kafka-in-depth
Kafka cluster.
Topic:
consumers.
- Topics act as a way to categorize and organize messages based on
Partitions:
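- Partitions are the physical storage units for a topic. Each topic can be split into one or more partitions, which are numbered starting from 0 (partition 0, partition 1, and so on) within that topic.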
Here's why partitions are used, illustrated with a real-world example:
event represents a user action, and you want to collect and process
events increases, a single server may not be sufficient to handle all the
data. Partitions allow you to distribute the event log across multiple
are strictly ordered. This means that events generated by a single user
each partition. Retention periods are set per topic (for example, with the `retention.ms` topic config) and apply to all of that topic's partitions, so you can keep some data longer than other data, based on your data retention requirements.
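To make "partitions are strictly ordered" concrete, here is a minimal in-memory sketch. It is not how Kafka stores data; `PartitionLog`, `append`, and `read` are hypothetical names for illustration, and the modulo mapping of a user ID to a partition is an assumption. Each appended event gets the next offset, starting from 0, and reading from an offset returns events in the order they were written.

```python
class PartitionLog:
    """Hypothetical in-memory stand-in for a single Kafka partition."""

    def __init__(self) -> None:
        self._events: list[str] = []

    def append(self, event: str) -> int:
        """Append an event to the end of the log and return its offset (starting at 0)."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, from_offset: int = 0) -> list[str]:
        """Return every event from the given offset onward, in write order."""
        return self._events[from_offset:]


# A topic with 3 partitions, each an independent ordered log.
topic = [PartitionLog() for _ in range(3)]

# Simple modulo mapping (illustration only): all of user 42's events land in
# the same partition, so their relative order is preserved.
p = 42 % len(topic)
for action in ["login", "add_to_cart", "checkout"]:
    offset = topic[p].append(f"user42:{action}")
    print(f"partition={p} offset={offset} event=user42:{action}")

print(topic[p].read())
```

Ordering holds only inside one partition; events spread across different partitions have no global order.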
Real-World Usage:
process events in parallel, maintain order within each user group, and
For example, you could use partition 0 for users with user IDs 0-999, partition 1 for users with user IDs 1000-1999, and so on.
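The range scheme above can be sketched as a small mapping function. Note that Kafka's default partitioner does not work this way (it hashes the message key), so a range scheme like this would need a custom partitioner or an explicitly chosen partition number. `range_partition` and its parameters are hypothetical, and the choice of 4 partitions is an assumption.

```python
def range_partition(user_id: int, users_per_partition: int = 1000, num_partitions: int = 4) -> int:
    """Map a user ID to a partition by contiguous ID ranges.

    Partition 0 holds IDs 0-999, partition 1 holds IDs 1000-1999, and so on.
    """
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    partition = user_id // users_per_partition
    if partition >= num_partitions:
        # IDs beyond the last range have no partition to go to.
        raise ValueError(
            f"user_id {user_id} is outside the range covered by {num_partitions} partitions"
        )
    return partition


for uid in [0, 999, 1000, 1999, 3500]:
    print(uid, "->", range_partition(uid))
```

One trade-off of fixed ranges: if some ID ranges are much more active than others, their partitions carry more load.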
read data in parallel from a Kafka topic. A Kafka Consumer Group has
more than one consumer reading from the same topic, there is a high
chance that each message will be read more than once. Kafka solves
Let’s assume that we have a Kafka topic with 4 partitions.
1. Number of consumers = Number of partitions
In this case, each consumer reads data from exactly one partition, which is the ideal case.
2. Number of consumers > Number of partitions
In this case, the extra consumer(s) will remain idle, which leads to poor utilization of resources.
3. Number of consumers < Number of partitions
In this case, one of the consumers will read data from more than one
partition.
Consumer fails for some reason? The whole pipeline will break.
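The three cases above can be reproduced with a simplified sketch of partition assignment inside one consumer group. It follows the idea of Kafka's range assignment strategy for a single topic (sort consumers, hand out contiguous blocks of partitions, give the first few consumers one extra), but it is not Kafka's code: real groups negotiate assignments through a group coordinator and can use other strategies. `assign_partitions` is a hypothetical helper.

```python
def assign_partitions(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Split partitions 0..num_partitions-1 into contiguous blocks, one block per consumer.

    Consumers are sorted by name; the first (num_partitions % len(consumers))
    consumers get one extra partition. Any consumer beyond the partition count
    receives an empty list and sits idle.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    ordered = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(ordered))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, consumer in enumerate(ordered):
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


# A topic with 4 partitions, read by groups of different sizes.
for group_size in (4, 5, 2):
    group = [f"consumer-{n}" for n in range(1, group_size + 1)]
    print(group_size, "consumers:", assign_partitions(4, group))
```

Each partition goes to exactly one consumer in the group, which is what keeps a message from being processed twice within that group. With 4 consumers each gets one partition, with 5 one consumer gets an empty list, and with 2 each consumer reads two partitions. When a consumer leaves or fails, Kafka rebalances the group and redistributes its partitions among the remaining members.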
Real Example:
As people started liking our services, more people started using them,
thus generating many logs per hour. We found that the application
In Kafka, the use of a key when producing a message serves two main
purposes:
Real-World Example:
In this case, you can use the user ID as the key when producing ride
request messages to Kafka. Here's how it works:
So, in simple terms, using keys in Kafka allows you to organize and
process related messages together while still benefiting from Kafka's
parallelism and scalability. It helps you maintain order and efficiency
in your data processing pipeline.
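A minimal sketch of key-based partitioning for the ride-request example: the key (user ID) is hashed and taken modulo the number of partitions, so every message with the same key lands in the same partition and keeps its order. Kafka's default partitioner hashes the serialized key with murmur2; this sketch swaps in CRC32 from the standard library so it has no dependencies, which means the partition numbers will not match a real cluster. `partition_for_key`, the partition count, and the sample events are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for_key(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 here, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 6

# Hypothetical ride-request events as (user_id key, event), in send order.
ride_requests = [
    ("user-17", "request ride"),
    ("user-42", "request ride"),
    ("user-17", "cancel ride"),
    ("user-42", "rate driver"),
    ("user-17", "request ride again"),
]

# Route each event to a partition based only on its key.
partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)
for user_id, event in ride_requests:
    partitions[partition_for_key(user_id, NUM_PARTITIONS)].append((user_id, event))

for p in sorted(partitions):
    print(p, partitions[p])
```

Because the mapping depends on the partition count, adding partitions to an existing topic changes which partition a given key maps to from then on.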
- You can use the `--partition` flag to manually specify the partition
number when producing messages, but this is typically not necessary
unless you have a specific reason to override Kafka's default
partitioning behavior.
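The override described above follows a simple precedence, sketched below under stated assumptions: an explicitly given partition wins, otherwise the key is hashed, otherwise the producer spreads keyless messages across partitions. In the Java client, for instance, a `ProducerRecord` can be constructed with an explicit partition number. `choose_partition` is hypothetical; it uses CRC32 instead of Kafka's murmur2, and plain round-robin for keyless messages, whereas newer Kafka clients use a "sticky" strategy that fills a batch for one partition before switching.

```python
import itertools
import zlib
from typing import Optional

# Shared counter used to rotate keyless messages across partitions.
_round_robin = itertools.count()


def choose_partition(num_partitions: int, key: Optional[str] = None,
                     partition: Optional[int] = None) -> int:
    """Pick a partition: explicit number first, then key hash, then round-robin."""
    if partition is not None:
        if not 0 <= partition < num_partitions:
            raise ValueError(f"partition {partition} does not exist")
        return partition
    if key is not None:
        return zlib.crc32(key.encode("utf-8")) % num_partitions
    return next(_round_robin) % num_partitions


print(choose_partition(3, key="user-17", partition=2))
print(choose_partition(3, key="user-17"))
print([choose_partition(3) for _ in range(3)])
```

Overriding the partition bypasses key hashing, so the "same key, same partition" guarantee only holds if every producer of that key makes the same choice.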
Simple python code
Kafka UseCase - Uber:
For more Details:
https://www.uber.com/en-DE/blog/kafka-async-queuing-with-consumer-proxy/
https://blog.devgenius.io/unraveling-kafka-with-uber-a-real-life-application-of-event-streaming-43c07ab305cc