
INTERNET OF THINGS

TOPIC: APACHE KAFKA


Group members:
Taniya Souza [1DA21CS150]
Srinivasan R [1DA21CS143]
Yashwanth B K [1DA21CS171]
Yashwanth Gowda B [1DA21CS172]

Guide: Prof. Lavanya Santosh, CSE Dept
What is Apache Kafka?
• Apache Kafka is an open-source distributed event-streaming platform.

• Originally developed by LinkedIn and donated to the Apache Software Foundation in 2011.

• Designed to handle high-throughput, low-latency, real-time data streams.


Key Features of Kafka
 Distributed System: Runs as a cluster of brokers for scalability and fault tolerance.

 Durable Storage: Data is stored on disk and replicated across brokers.

 High Throughput: Can handle millions of messages per second.

 Low Latency: Ensures quick delivery of messages.

 Decoupling Systems: Allows independent development and scaling of producers and consumers.
Why Use Kafka?
•Ideal for modern data-driven applications.

•Helps in building real-time analytics systems.

•Serves as a backbone for microservices communication.

•Ensures scalability to handle large datasets.

•Integrates with popular big data frameworks like Spark, Flink, and Hadoop.
Core Functions
• Publish and Subscribe: Enables real-time messaging between producers and consumers through topics.

• Durable Storage: Persistently stores data streams on disk, allowing replay and recovery.

• Scalable Partitioning: Divides topics into partitions for parallel and distributed data processing.

• Fault Tolerance: Ensures data availability and reliability through replication across brokers.

• Real-Time Stream Processing: Processes and analyzes data streams in real time using Kafka Streams or external tools.

Kafka Architecture Overview
• Kafka is a publish-subscribe messaging system with the following components:
• Producers: Publish messages to topics.
• Consumers: Subscribe to topics to consume messages.
• Brokers: Manage the storage and retrieval of messages.
• Topics: Categories to which messages are published.
• Partitions: Break down topics for scalability.
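The components above can be sketched with a toy in-memory model (illustrative only; a real Kafka broker is a networked service that persists messages to disk, and the class and method names here are invented for the sketch):

```python
# Toy in-memory sketch of Kafka's core components (not real Kafka).
class Broker:
    """Stores the messages for the partitions it owns."""
    def __init__(self):
        self.partitions = {}  # (topic, partition) -> list of messages

    def append(self, topic, partition, message):
        self.partitions.setdefault((topic, partition), []).append(message)

    def read(self, topic, partition, offset):
        return self.partitions.get((topic, partition), [])[offset:]


class Producer:
    """Publishes messages to a topic on the broker."""
    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, message, partition=0):
        self.broker.append(topic, partition, message)


class Consumer:
    """Subscribes to topics and tracks how far it has read."""
    def __init__(self, broker):
        self.broker = broker
        self.offsets = {}  # (topic, partition) -> next offset to read

    def poll(self, topic, partition=0):
        offset = self.offsets.get((topic, partition), 0)
        messages = self.broker.read(topic, partition, offset)
        self.offsets[(topic, partition)] = offset + len(messages)
        return messages


broker = Broker()
Producer(broker).send("orders", "order-1")
Producer(broker).send("orders", "order-2")
consumer = Consumer(broker)
print(consumer.poll("orders"))  # ['order-1', 'order-2']
print(consumer.poll("orders"))  # [] -- already consumed
```

Note how the producer and consumer never talk to each other directly; the broker in the middle is what decouples them.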
Kafka Topics
 A topic is a logical channel for data streams.

 Each topic is divided into partitions for parallel processing.

 Data in topics is retained for a configurable period, even after consumption.

 Topics can have configurations for replication and data retention.

 Example: A “Sales Data” topic could have partitions based on regions.
Producers and Consumers
• Producers: Send data to Kafka topics.
  – Push messages to specific partitions.
  – Can define custom partitioning logic (e.g., based on keys).

• Consumers: Read data from topics.
  – Join consumer groups for parallel processing.
  – Kafka ensures that each partition is read by one consumer in a group.
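The one-consumer-per-partition rule can be illustrated with a hypothetical round-robin assignment (real Kafka uses pluggable assignor strategies such as range or cooperative-sticky, handled by the group coordinator):

```python
def assign_partitions(partitions, consumers):
    """Round-robin a topic's partitions across the consumers of one group."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

# 6 partitions shared by a 2-consumer group:
# each partition is read by exactly one consumer.
result = assign_partitions(list(range(6)), ["c1", "c2"])
print(result)  # {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
```

Adding a third consumer to the group would redistribute the six partitions two apiece, which is how consumer groups scale reads horizontally.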


Brokers and Clusters
• A Kafka cluster consists of multiple brokers.

• Brokers: Handle storage and management of data

streams.

• Each broker handles a subset of partitions.

• Collaborate to provide fault tolerance and scalability.

• Clusters use ZooKeeper (or KRaft, in newer versions)

for managing configurations and leader election.


Kafka Partitions
 Topics are divided into partitions to distribute data

and allow parallelism.

 Data Placement: Messages in partitions are stored

in the order they arrive.

 Key-Based Partitioning: Ensures that messages

with the same key go to the same partition.

 Example: A “User Activity” topic could have partitions

for different user IDs.
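Key-based placement boils down to hashing the key modulo the partition count. Kafka's default partitioner uses murmur2 on the key bytes; `crc32` stands in here so the sketch stays in the standard library:

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition: the same key always lands on the
    same partition (Kafka's default uses murmur2; crc32 stands in here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All events for one user hash to one partition,
# so their relative order is preserved.
assert partition_for("user-42", 3) == partition_for("user-42", 3)
```

This is also why changing the number of partitions on a live topic breaks key-to-partition affinity: the modulus changes, so existing keys may map elsewhere.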


Offset and Message Ordering
• Offset: A unique identifier for each message in a partition.

• Used to keep track of consumed messages.

• Kafka guarantees message order within a partition but not across partitions.

• Consumers can reset offsets for reprocessing messages.
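Offset bookkeeping can be sketched as follows (a simplification: a real consumer commits its offsets back to Kafka and seeks via the client API; the `OffsetTracker` class here is invented for illustration):

```python
class OffsetTracker:
    """Tracks a consumer's position in one partition's log."""
    def __init__(self):
        self.offset = 0  # offset of the next message to read

    def consume(self, log, max_records=2):
        batch = log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Reset the offset, e.g. to reprocess earlier messages."""
        self.offset = offset


log = ["evt-0", "evt-1", "evt-2", "evt-3"]  # one partition, in arrival order
tracker = OffsetTracker()
print(tracker.consume(log))  # ['evt-0', 'evt-1']
print(tracker.consume(log))  # ['evt-2', 'evt-3']
tracker.seek(0)              # rewind for reprocessing
print(tracker.consume(log))  # ['evt-0', 'evt-1'] again
```

Because each partition keeps its own log and offset sequence, ordering holds within one partition but nothing relates offsets across partitions.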


Durability and Replication
•Kafka ensures durability by replicating data across brokers.

•Leader Replica: Handles all read and write requests for a partition.

•Follower Replicas: Maintain copies and take over if the leader fails.

•Acknowledgments: Producers can configure how many replicas must confirm

a message before it's considered successful.
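The acknowledgment rule can be sketched as a tiny simulation (heavily simplified: real Kafka expresses this through the producer's `acks` setting of 0, 1, or `all` together with in-sync replica tracking; the `Partition` class here is invented for illustration):

```python
class Partition:
    """One partition with a leader and follower replicas (simplified)."""
    def __init__(self, num_replicas=3):
        self.replicas = [[] for _ in range(num_replicas)]  # replica 0 = leader

    def produce(self, message, required_acks):
        acks = 0
        for replica in self.replicas:
            replica.append(message)  # leader writes, then followers replicate
            acks += 1
            if acks >= required_acks:
                break  # enough confirmations; lagging followers catch up later
        return acks >= required_acks


p = Partition(num_replicas=3)
assert p.produce("m1", required_acks=1)  # like acks=1: leader alone confirms
assert p.produce("m2", required_acks=3)  # like acks=all: every replica confirms
```

The trade-off is latency versus durability: fewer required acks means faster writes, but a leader crash before followers catch up can lose the message.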


Use Cases of Kafka
•Real-Time Analytics:
Monitor and analyze social media feeds or website activities.
•Log Aggregation:
Centralized logging from distributed systems.
•Event Sourcing:
Capture application changes as a sequence of events.
•Data Integration:
Sync databases and applications.
•Stream Processing:
Process and analyze data in real-time with Kafka Streams or other tools.
Advantages of Kafka
•Scalability: Can scale horizontally by adding brokers.
•Flexibility: Works with multiple programming languages.
•Resilience: Fault-tolerant with replication and partitioning.
•Performance: Handles millions of events per second with low latency.
•Integration: Seamlessly integrates with popular tools like Spark and Flink.
Challenges with Kafka
 Complex Setup: Requires expertise to configure and maintain.

 Resource-Intensive: High memory usage for durability and performance.

 Message Duplication: Can occur without proper configuration.

 Operational Overhead: ZooKeeper dependency in older versions.


SUMMARY
 Apache Kafka is a distributed platform for real-time data streaming and
processing, designed for high-throughput, low-latency, and fault-tolerant
communication.
 Kafka uses topics for organizing data, partitions for scalability, and
replication for reliability, enabling efficient handling of massive data
streams.
 Common applications include real-time analytics, event-driven
architectures, log aggregation, and data integration between diverse
systems.
THANK YOU
