SlideShare a Scribd company logo
Introduction to Kafka
BY DUCAS FRANCIS
The problem
Web
Security
System
Real-time
Monitoring
Logging
System
Other
services
Mobile
API
Job
It’s simple enough at first…
Then it gets a little busy…
And ends up a mess.
The solution
Web
Security
System
Real-time
Monitoring
Logging
System
Other
services
Mobile
API
Job
Pub/Sub
Decouple data pipelines using a pub/sub system
Producers Brokers Consumers
Apache Kafka
A UNIFIED, HIGH-
THROUGHPUT, LOW-LATENCY
PLATFORM FOR HANDLING
REAL-TIME DATA FEEDS
A brief history lesson
 Originally developed at LinkedIn in 2011
 Graduated Apache Incubator in 2012
 Engineers from LinkedIn formed Confluent in 2014
 Up to version 0.9.4 with 0.10 on horizon
Motivation
 Unified platform for all real-time data feeds
 High throughput for high volume streams
 Support periodic data loads from offline systems
 Low latency for traditional messaging
 Support partitioned, distributed, real-time processing
 Guarantee fault-tolerance
Common use cases
 Messaging
 Website activity tracking
 Metrics
 Log aggregation
 Stream processing
 Event sourcing
 Commit log
Benefits of Kafka
 High throughput
 Low latency
 Load balancing
 Fault tolerant
 Guaranteed delivery
 Secure
Performance comparison
Batch performance comparison
Some terminology
 Topic – feed of messages
 Producer – publishes messages to a topic
 Consumer – subscribes to topics and processes the feed of messages
 Broker – server instance that acts in a cluster
@apachekafkapowers
@microsot…
Libraries
 Python – kafka-python / pykafka
 Go – sarama / go_kafka_client / …
 C/C++ - librdkafka / libkafka / …
 .NET – kafka-net (x2) / rdkafka-dotnet / CSharpClient-for-Kafka
 Node.js – kafka-node / sutoiku/node-kafka / ...
 HTTP – kafka-pixy / kafka-rest
 etc.
Architecture
Producer Producer
Broker BrokerBroker
Consumer ConsumerZookeeper
Cluster
x3
Show me the
Kafka!!!
VAGRANT TO THE RESCUE
Anatomy of a topic
 Topics are broken into partitions
 Messages are assigned sequential ID
called and offset
 Data is retained for a configurable
period of time
 Number of partitions can be increased
after creation, but not decreased
 Partitions are assigned to brokers
Each partition is an ordered, immutable sequence of messages that is continually appended to…
a commit log.
Broker
 Kafka service running as part of a cluster
 Receives messages from producers and serves them to consumers
 Coordinated using Zookeeper
 Need odd number for quorum
 Store messages on the file system
 Replicate messages to/from other brokers
 Answer metadata requests about brokers and topics/partitions
 As of 0.9.0 – coordinate consumers
Replication
 Partitions on a topic should be replicated
 Each partition has 1 leader and 0 or more followers
 An In-Sync Replica (ISR) is one that’s communicating with Zookeeper and not too
far behind the leader
 Replication factor can be increased after creation, not decreased
./kafka-topics
--CREATE
--REPLICATION-FACTOR
--PARTITIONS
--DESCRIBE
Producers
 Publishes messages to a topic
 Distributes messages across partitions
 Round-robin
 Key hashing
 Send synchronously or asynchronously to the broker that is the leader for the
partition
 ACKS = 0 (none),1 (leader), -1 (all ISRs)
 Synchronous is obviously slower, but more durable
Testing... Testing…
1 2 3
LET’S SEE HOW FAST WE CAN
PUSH
Consumers
 Read messages from a topic
 Multiple consumers can read from the same topic
 Manage their offsets
 Messages stay on Kafka after they are consumed
Testing... Testing…
1 2 3
LET’S SEE HOW FAST WE CAN
RECEIVE
It’s fast! But why…?
 Efficient protocol based on message set
 Batching messages to reduce network latency and small I/O operations
 Append/chunk messages to increase consumer throughput
 Optimised OS operations
 pagecache
 sendfile()
 Broker services consumers from cache where possible
 End-to-end batch compression
Load balanced consumers
 Distribute load across instances in a group by allocating partitions
 Handle failure by rebalancing partitions to other instances
 Commit their offsets to Kafka
Cluster
Broker 1 Broker 2
P0 P1 P2 P3
Consumer Group 1
C0 C1
Consumer Group 2
C2 C3 C4 C6
Consumer groups and offsets
Cluster
Broker 1 Broker 2
P0 P1 P2 P3
Consumer Group 1
C0 C1
0 1 2 3 4 5 6 7 8 9 10P3
C1
read
C1
commit
C0
read
C0
commit
Guarantees
 Messages sent by a producer to a particular topic’s partition will be appended in
the order they are sent
 A consumer instance sees messages in the order they are stored in the log
 For a topic with replication factor N, we will tolerate up to N-1 server failures
without losing any messages committed to the log
Ordered delivery
 Messages are guaranteed to be delivered in order by partition, NOT topic
M1 M3 M5
M2 M4 M6
P0
P1
 M1 before M3 before M5 – YES
 M1 before M2 – NO
 M2 before M4 before M6 – YES
 M2 before M3 - NO
Enough ALT… now
.NET
USING RDKAFKA-DOTNET
FIN. THANK YOU
Resources
 http://kafka.apache.org/documentation.html
 http://www.confluent.io/
 https://kafka.apache.org/090/configuration.html
 https://github.com/edenhill/librdkafka
 https://github.com/ah-/rdkafka-dotnet
Log compaction
 Keep the most recent payload for a key
 Use cases
 Database change subscription
 Event sourcing
 Journaling for HA
Log compaction
Ad

More Related Content

What's hot (20)

Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
Otávio Carvalho
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
Timothy Spann
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
Dmitry Tolpeko
 
Devoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with KafkaDevoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with Kafka
László-Róbert Albert
 
Kafka connect 101
Kafka connect 101Kafka connect 101
Kafka connect 101
Whiteklay
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Design Patterns for working with Fast Data
Design Patterns for working with Fast DataDesign Patterns for working with Fast Data
Design Patterns for working with Fast Data
MapR Technologies
 
Kafka connect
Kafka connectKafka connect
Kafka connect
Andrew Stevenson
 
Kafka goutam chowdhury-unicom-spark kafka-summit
Kafka goutam chowdhury-unicom-spark kafka-summitKafka goutam chowdhury-unicom-spark kafka-summit
Kafka goutam chowdhury-unicom-spark kafka-summit
Goutam Chowdhury
 
Understanding kafka
Understanding kafkaUnderstanding kafka
Understanding kafka
AmitDhodi
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Jemin Patel
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
Kaufman Ng
 
The best of Apache Kafka Architecture
The best of Apache Kafka ArchitectureThe best of Apache Kafka Architecture
The best of Apache Kafka Architecture
techmaddy
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
Brian Ritchie
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Guozhang Wang
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
confluent
 
Kafka clients and emitters
Kafka clients and emittersKafka clients and emitters
Kafka clients and emitters
Edgar Domingues
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
jimriecken
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
confluent
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
Otávio Carvalho
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
Timothy Spann
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
Dmitry Tolpeko
 
Devoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with KafkaDevoxx Morocco 2016 - Microservices with Kafka
Devoxx Morocco 2016 - Microservices with Kafka
László-Róbert Albert
 
Kafka connect 101
Kafka connect 101Kafka connect 101
Kafka connect 101
Whiteklay
 
Design Patterns for working with Fast Data
Design Patterns for working with Fast DataDesign Patterns for working with Fast Data
Design Patterns for working with Fast Data
MapR Technologies
 
Kafka goutam chowdhury-unicom-spark kafka-summit
Kafka goutam chowdhury-unicom-spark kafka-summitKafka goutam chowdhury-unicom-spark kafka-summit
Kafka goutam chowdhury-unicom-spark kafka-summit
Goutam Chowdhury
 
Understanding kafka
Understanding kafkaUnderstanding kafka
Understanding kafka
AmitDhodi
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
Kaufman Ng
 
The best of Apache Kafka Architecture
The best of Apache Kafka ArchitectureThe best of Apache Kafka Architecture
The best of Apache Kafka Architecture
techmaddy
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
Brian Ritchie
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Guozhang Wang
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
confluent
 
Kafka clients and emitters
Kafka clients and emittersKafka clients and emitters
Kafka clients and emitters
Edgar Domingues
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
jimriecken
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
confluent
 

Viewers also liked (9)

Building a robot with the .Net Micro Framework
Building a robot with the .Net Micro FrameworkBuilding a robot with the .Net Micro Framework
Building a robot with the .Net Micro Framework
Ducas Francis
 
A Day in the Life of a Metro-veloper
A Day in the Life of a Metro-veloperA Day in the Life of a Metro-veloper
A Day in the Life of a Metro-veloper
Ducas Francis
 
Seattle kafka meetup nov 2015 published siphon
Seattle kafka meetup nov 2015 published  siphonSeattle kafka meetup nov 2015 published  siphon
Seattle kafka meetup nov 2015 published siphon
Nitin Kumar
 
Continuous Delivery Pipeline - Patterns and Anti-patterns
Continuous Delivery Pipeline - Patterns and Anti-patternsContinuous Delivery Pipeline - Patterns and Anti-patterns
Continuous Delivery Pipeline - Patterns and Anti-patterns
Sonatype
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
David Groozman
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Samuel Kerrien
 
Apache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKApache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACK
Maxim Shelest
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
confluent
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
Rahul Jain
 
Building a robot with the .Net Micro Framework
Building a robot with the .Net Micro FrameworkBuilding a robot with the .Net Micro Framework
Building a robot with the .Net Micro Framework
Ducas Francis
 
A Day in the Life of a Metro-veloper
A Day in the Life of a Metro-veloperA Day in the Life of a Metro-veloper
A Day in the Life of a Metro-veloper
Ducas Francis
 
Seattle kafka meetup nov 2015 published siphon
Seattle kafka meetup nov 2015 published  siphonSeattle kafka meetup nov 2015 published  siphon
Seattle kafka meetup nov 2015 published siphon
Nitin Kumar
 
Continuous Delivery Pipeline - Patterns and Anti-patterns
Continuous Delivery Pipeline - Patterns and Anti-patternsContinuous Delivery Pipeline - Patterns and Anti-patterns
Continuous Delivery Pipeline - Patterns and Anti-patterns
Sonatype
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Samuel Kerrien
 
Apache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKApache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACK
Maxim Shelest
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
confluent
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
Rahul Jain
 
Ad

Similar to Introduction to Kafka (20)

Apache kafka
Apache kafkaApache kafka
Apache kafka
Ramakrishna kapa
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Srikrishna k
 
Kafka Architecture | Key Components | kafka training online
Kafka Architecture |  Key Components |  kafka training onlineKafka Architecture |  Key Components |  kafka training online
Kafka Architecture | Key Components | kafka training online
Accentfuture
 
apache kafka training online | kafka online training
apache kafka training online | kafka online trainingapache kafka training online | kafka online training
apache kafka training online | kafka online training
Accentfuture
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
Syed Hadoop
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps_Fest
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
Videoguy
 
Connecting mq&kafka
Connecting mq&kafkaConnecting mq&kafka
Connecting mq&kafka
Matt Leming
 
ppt
pptppt
ppt
Videoguy
 
ppt
pptppt
ppt
Videoguy
 
NaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web ServicesNaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web Services
Videoguy
 
NaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web ServicesNaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web Services
Videoguy
 
Collaboration and Grid Technologies
Collaboration and Grid TechnologiesCollaboration and Grid Technologies
Collaboration and Grid Technologies
Videoguy
 
Streaming the platform with Confluent (Apache Kafka)
Streaming the platform with Confluent (Apache Kafka)Streaming the platform with Confluent (Apache Kafka)
Streaming the platform with Confluent (Apache Kafka)
GiuseppeBaccini
 
Kafka RealTime Streaming
Kafka RealTime StreamingKafka RealTime Streaming
Kafka RealTime Streaming
Viyaan Jhiingade
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Shameera Rathnayaka
 
Kafka and ibm event streams basics
Kafka and ibm event streams basicsKafka and ibm event streams basics
Kafka and ibm event streams basics
Brian S. Paskin
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
GeeksLab Odessa
 
Kafka Architecture | Key Components | kafka training online
Kafka Architecture |  Key Components |  kafka training onlineKafka Architecture |  Key Components |  kafka training online
Kafka Architecture | Key Components | kafka training online
Accentfuture
 
apache kafka training online | kafka online training
apache kafka training online | kafka online trainingapache kafka training online | kafka online training
apache kafka training online | kafka online training
Accentfuture
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
Syed Hadoop
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps Fest 2020. Сергій Калінець. Building Data Streaming Platform with Apac...
DevOps_Fest
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
Remarks on Grids e-Science CyberInfrastructure and Peer-to-Peer ...
Videoguy
 
Connecting mq&kafka
Connecting mq&kafkaConnecting mq&kafka
Connecting mq&kafka
Matt Leming
 
NaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web ServicesNaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web Services
Videoguy
 
NaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web ServicesNaradaBrokering Grid Messaging and Applications as Web Services
NaradaBrokering Grid Messaging and Applications as Web Services
Videoguy
 
Collaboration and Grid Technologies
Collaboration and Grid TechnologiesCollaboration and Grid Technologies
Collaboration and Grid Technologies
Videoguy
 
Streaming the platform with Confluent (Apache Kafka)
Streaming the platform with Confluent (Apache Kafka)Streaming the platform with Confluent (Apache Kafka)
Streaming the platform with Confluent (Apache Kafka)
GiuseppeBaccini
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Shameera Rathnayaka
 
Kafka and ibm event streams basics
Kafka and ibm event streams basicsKafka and ibm event streams basics
Kafka and ibm event streams basics
Brian S. Paskin
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
GeeksLab Odessa
 
Ad

Recently uploaded (20)

UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 

Introduction to Kafka

Editor's Notes

  • #7: High throughput – web activity tracking receiving 10’s of events per page hit or interaction. Periodic data loads – every 5min receving 100,000s messages Low latency – pub/sub in ms Distributed – anyone sending or receiving messages should be able to accomplish HA
  • #16: cd ~/Projects/kafka-vagrant vagrant status vagrant up vagrant ssh kafka-1 cat /etc/kafka/server.properties https://kafka.apache.org/090/configuration.html
  • #17: Topic – feed of messages Partition – topics are broken into partitions Messages – written to the end of a partition within a topic and assigned a sequential identifier (a 64bit integer) which is called an offset Data is retained within a partition for a configurable amount of time. The time is defaulted in broker configuration, but can be set per topic. Messages are stored on the file system in segmented files. Number of partitions can be increased after creation, but not decreased. This is because (as mentioned) the messages are stored on the file system on a per-partition basis, so reducing partitions would be effectively deleting data. Partitions are assigned to brokers – not topics. Kafka attempts to balance the number of partitions across the available brokers, which can be manually configured too. This is how kafka attempts to load balance its activity because, in theory, each broker having an equal number of partitions should receive an equal number of send and fetch requests.
  • #18: … The responsibilities of coordination are mixed between ZK and Kafka. Older versions of kafka relied more on ZK, but this is being brought more into the broker and ZK is being used more for service discovery and configuration. Before 0.9.0, consumers were coordinated by ZK and had to have a lot of logic around which partitions were assigned to them. This was changed so that for a new consumer a broker is assigned to be the consumer coordinator and tell the consumers which partitions were assigned to them.
  • #20: cd ~/Downloads/confluent-2.0.0/bin ls ./kafka-topics ./kafka-topics --create --topic perf-test --partitions 10 --replication-factor 3 --zookeeper 192.168.32.11:2181 ./kafka-topics --list --zookeeper 192.168.32.12 ./kafka-topics --describe --topic perf-test --zookeeper 192.168.32.13
  • #22: ./kafka-producer-perf-test # Publish 10k x 4kb messages ./kafka-producer-perf-test --topic perf-test --num-records 10000 --record-size 4096 --throughput 1000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 # Up the throughput for 100k ./kafka-producer-perf-test --topic perf-test --num-records 100000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 # No ACKs ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=0 # ACKs from all ISRs ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=-1 # ACK from leader, use snappy ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=1 compression.type=snappy # linger for 100ms ./kafka-producer-perf-test --topic perf-test --num-records 1000000 --record-size 4096 --throughput 100000 --producer-props bootstrap.servers=192.168.32.21:9092,192.168.32.22:9092,192.168.32.23:9092 acks=1 compression.type=snappy linger.ms=100
  • #24: # Consumer 1M on 5 threads ./kafka-consumer-perf-test --zookeeper 192.168.32.11 --topic perf-test --messages 1000000 --group perf-test --threads 5
  • #25: Modern OSs maintain a page cache and aggressively use main memory for disk caching. By NOT utilizing this and storing an in-memory representation of data you’re effectively doubling up on the amount of memory you’re application is consuming. By utilizing this you’re utilizing all available RAM for caching without GC penalties. It’s also kept in memory even if the application is restarted. This is obviously advantageous when reading messages, but also when writing. Rather than maintain as much as possible in-memory and flush it all out to the file system in a panic when we run out of space, we invert that. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache. Modern unix operating systems offer a highly optimized code path for transferring data out of pagecache to a socket – the sendfile system call. OS reads data from a file into pagecache in kernel space Application reads from kernel space to a user space buffer Application writes data back to kernel space into a socker buffer OS copies from socket buffer to NIC buffer to send over the network sendfile avoids this by instructing the OS to send data directly from the pagecache to the NIC. This means that consumers that are caught up will be served completely from memory.
  • #26: Kafka scales topic consumption by distributing partitions among a consumer group, which is a set of consumers sharing a common group identifier. For each group a broker is selected as the group coordinator. The coordinator is responsible for managing the state of the group. Its main job is to mediate partition assignment when new members arrive, old members depart, and when topic metadata changes. The act of reassigning partitions is known as rebalancing the group.